Commit Graph

95 Commits

Author SHA1 Message Date
Xiaomeng Yang
265fa0ce4d Move math::Axpy function to elementwise lib (#18316)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18316

Move math::Axpy function to elementwise lib
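For reference, Axpy is the BLAS-style update y = alpha * x + y. A minimal sketch of the elementwise form (not the actual Caffe2 kernel, which is templated on data type and execution context and may dispatch to a BLAS backend):

```
#include <cstddef>

// Minimal sketch of the BLAS-style Axpy update: y[i] += alpha * x[i].
// The real math::Axpy is templated and context-aware; this is the
// plain elementwise form for illustration only.
void Axpy(std::size_t n, float alpha, const float* x, float* y) {
  for (std::size_t i = 0; i < n; ++i) {
    y[i] += alpha * x[i];
  }
}
```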

i-am-not-moving-c2-to-c10

Reviewed By: houseroad

Differential Revision: D14574697

fbshipit-source-id: 7cfbb2da295c8966c5328bd6b577cce2638eea62
2019-03-26 12:19:19 -07:00
Xiaomeng Yang
e04c9195b7 Update math::Transpose to support tensor with size > 2G (#17670)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17670

Update math::Transpose to support tensor with size > 2G

i-am-not-moving-c2-to-c10

Differential Revision: D14313624

fbshipit-source-id: 0b4a85b913972e5a8981f0d40d0c539407b98f30
2019-03-20 18:22:21 -07:00
Xiaomeng Yang
3a34f443c5 Separate reduce functions from math (#16929)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16929

Separate CPU reduce functions from math

i-am-not-moving-c2-to-c10

Reviewed By: houseroad

Differential Revision: D13999469

fbshipit-source-id: bd628b15a6e3c1f04cc62aefffb0110690e1c0d1
2019-02-13 17:50:47 -08:00
Xiaomeng Yang
2db847b3a7 Separate elementwise level2 math functions (#16753)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16753

Separate elementwise level2 math functions

i-am-not-moving-c2-to-c10

Reviewed By: houseroad

Differential Revision: D13954928

fbshipit-source-id: 1ca7a5d3da96e32510f502e5e4e79168854bee67
2019-02-07 18:38:26 -08:00
Xiaomeng Yang
598b713660 Separate level1 elementwise functions from math (#16397)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16397

Separate level1 elementwise functions from math

i-am-not-moving-c2-to-c10

Reviewed By: houseroad

Differential Revision: D13830626

fbshipit-source-id: e6e672647076dab8b3b24be181f580a1486250c9
2019-01-30 00:04:12 -08:00
Xiaomeng Yang
866c4e3467 Separate Moments from math and optimize it (#16175)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16175

Separate Moments from math and optimize it

i-am-not-moving-c2-to-c10

Reviewed By: houseroad

Differential Revision: D13742472

fbshipit-source-id: 90757d908d38c98ca69818855aaf68315e525992
2019-01-20 08:53:25 -08:00
Xiaomeng Yang
b436f94b53 Separate affine_channel from math and optimize it (#16135)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16135

Separate affine_channel from math and optimize it

i-am-not-moving-c2-to-c10

Reviewed By: houseroad

Differential Revision: D13727606

fbshipit-source-id: 8980af4afadaf964a18a9da581106fe30896a7e9
2019-01-18 22:40:16 -08:00
bddppq
1a09a2a27f Export PyTorch erf to ONNX Erf and add Caffe2 Erf operator
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16106

Differential Revision: D13709490

Pulled By: bddppq

fbshipit-source-id: 1b5b32261f06543371f7bd7ac9b11957a5eb4ad0
2019-01-17 09:18:08 -08:00
Jerry Zhang
890568a018 Tensor reinitialization codemod - 5/5 (#15884)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15884

Codemod generated with clangr shard mode, 25 files per diff,
To eliminiate partially initialized Tensor, we split the initialization of local Tensor variables into two steps, first declare un uninitialized Tensor, and
call `ReinitializeTensor` to initialize it.
motivation: https://github.com/pytorch/pytorch/pull/12407
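A self-contained mock of the two-step pattern described above (names mirror the Caffe2 API, but the types and signature here are illustrative stand-ins):

```
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for the pattern: a tensor that starts
// uninitialized and is (re)initialized in a second step.
struct Tensor {
  std::vector<long> dims;
  std::vector<float> data;
  bool initialized = false;
};

// Sketch of ReinitializeTensor: allocate only if the shape changed,
// mirroring the "declare, then initialize" codemod described above.
void ReinitializeTensor(Tensor* t, std::vector<long> dims) {
  if (!t->initialized || t->dims != dims) {
    std::size_t n = 1;
    for (long d : dims) n *= static_cast<std::size_t>(d);
    t->dims = std::move(dims);
    t->data.assign(n, 0.f);
    t->initialized = true;
  }
}

int main() {
  Tensor local;                        // step 1: declared uninitialized
  ReinitializeTensor(&local, {4, 3});  // step 2: initialized in place
}
```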

Reviewed By: hyuen

Differential Revision: D13586737

fbshipit-source-id: dc8e49e9f29505b8898bb19f84c1a983f2d811ab
2019-01-10 16:32:26 -08:00
Jongsoo Park
54e8623d26 3D Conv in NHWC layout (#12733)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12733

Conv in NHWC layout only works for 2D images. This has been a pain point when implementing quantized 3D convolution, because we need the NHWC layout for best performance (note that NHWC in general gives better performance on CPU, not just for quantized operators). For example, our quantized ops can measure quantization error operator by operator, but this requires running a shadow fp32 operator, which is not easy when no 3D conv in NHWC layout is available (currently we do the layout conversion on the fly for the shadow fp32 operator, which is error-prone). Some Caffe2 frameworks like brew generate an error when we try to create a 3D conv op in NHWC layout. This was also a blocker for using aibench, because aibench uses brew.

i-am-not-moving-c2-to-c10

Reviewed By: houseroad

Differential Revision: D10333829

fbshipit-source-id: 2d203ee1db833cd3f9d39353219e3894b46c4389
2018-11-04 21:50:09 -08:00
Xiaomeng Yang
3092a69546 Optimize NCHW2NHWC on GPU (#12910)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12910

Optimize NCHW2NHWC on GPU
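For context, NCHW2NHWC permutes the memory layout so that channels become the innermost dimension. A naive CPU reference of the mapping (the commit optimizes the GPU kernel, which this sketch does not represent):

```
#include <cstddef>

// Naive reference for NCHW -> NHWC: out[n][h][w][c] = in[n][c][h][w].
// The optimized GPU kernel in the commit uses coalesced access
// patterns instead of this straightforward loop nest.
void NCHW2NHWC(std::size_t N, std::size_t C, std::size_t H, std::size_t W,
               const float* in, float* out) {
  for (std::size_t n = 0; n < N; ++n)
    for (std::size_t c = 0; c < C; ++c)
      for (std::size_t h = 0; h < H; ++h)
        for (std::size_t w = 0; w < W; ++w)
          out[((n * H + h) * W + w) * C + c] =
              in[((n * C + c) * H + h) * W + w];
}
```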

Reviewed By: houseroad

Differential Revision: D10481163

fbshipit-source-id: 6ddbd0ec9c96965b96aa1b8a006232d6f2b94249
2018-10-22 11:24:29 -07:00
Sebastian Messmer
6476e4598c Rename TypeMeta function pointers (#12306)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12306

In a future diff, I'm going to introduce a non-placement constructor and destructor to TypeMeta.
To make this less ambiguous, this diff first renames the existing ones to PlacementXXX.
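As a rough illustration of what a placement constructor/destructor pair does (a sketch of the general C++ pattern, not the actual TypeMeta code):

```
#include <cstddef>
#include <new>
#include <string>

// Placement functions construct/destroy objects in pre-allocated,
// untyped storage rather than allocating or freeing it themselves.
template <typename T>
void PlacementNew(void* ptr, std::size_t n) {
  T* typed = static_cast<T*>(ptr);
  for (std::size_t i = 0; i < n; ++i) {
    new (typed + i) T();  // placement new: construct in place
  }
}

template <typename T>
void PlacementDelete(void* ptr, std::size_t n) {
  T* typed = static_cast<T*>(ptr);
  for (std::size_t i = 0; i < n; ++i) {
    typed[i].~T();  // destroy without freeing the storage
  }
}

int main() {
  alignas(std::string) unsigned char buf[sizeof(std::string) * 2];
  PlacementNew<std::string>(buf, 2);
  PlacementDelete<std::string>(buf, 2);
}
```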

Reviewed By: dzhulgakov

Differential Revision: D10184117

fbshipit-source-id: 119120ebc718048bdc1d66e0cc4d6a7840e666a4
2018-10-16 16:45:47 -07:00
Yangqing Jia
38f3d1fc40 move flags to c10 (#12144)
Summary:
Still in flux.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12144

Reviewed By: smessmer

Differential Revision: D10140176

Pulled By: Yangqing

fbshipit-source-id: 1a313abed022039333e3925d19f8b3ef2d95306c
2018-10-04 02:09:56 -07:00
Xiaomeng Yang
ec5404a449 Add CUDA version of SpatialBNOp and optimize SpatialBN on CPU (#10888)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10888

Add CUDA version of SpatialBNOp and optimize SpatialBN on CPU

Reviewed By: houseroad

Differential Revision: D9512435

fbshipit-source-id: 6f828c88d56d30dc9a2f98a297a161c35cc511b1
2018-09-06 18:26:13 -07:00
Maxim Naumov
7e0a052a5d Adding synthetic data generation to the filler.h file (#11060)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11060

Adding synthetic data generation to the filler.h file (the exact distribution to be replaced later on).

Reviewed By: highker

Differential Revision: D9417594

fbshipit-source-id: 5d66dfbcb254a5961c36b7d3a081332c7372dac7
2018-09-04 13:40:53 -07:00
Xiaomeng Yang
b3d559cdd1 Optimize WeightedSumOp for two inputs (#11049)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11049

Optimize WeightedSumOp for two inputs
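A hedged sketch of what a fused two-input fast path looks like (illustrative only, not the actual Caffe2 kernel):

```
#include <cstddef>

// Sketch of a two-input fast path: a single fused pass computing
// y[i] = w0 * x0[i] + w1 * x1[i], instead of scale-then-Axpy chains.
void WeightedSum2(std::size_t n, const float* x0, float w0,
                  const float* x1, float w1, float* y) {
  for (std::size_t i = 0; i < n; ++i) {
    y[i] = w0 * x0[i] + w1 * x1[i];
  }
}
```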

Reviewed By: houseroad

Differential Revision: D9566692

fbshipit-source-id: 9aab1f02251d386b6f7d0699ae11eeb2ea2b5b4f
2018-09-01 11:54:55 -07:00
Orion Reblitz-Richardson
edb34434ab More changes for hidden visibility (#10692)
Summary:
Let's run CI tests to see what fails given the changes that just landed in https://github.com/pytorch/pytorch/pull/10624

cc mingzhe09088 ezyang Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10692

Reviewed By: mingzhe09088

Differential Revision: D9423617

Pulled By: orionr

fbshipit-source-id: 3bda1f118d13f8dd8e823727c93167cae747d8cf
2018-08-21 13:39:57 -07:00
Jongsoo Park
cc53807be5 group conv with NHWC layout (#10585)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10585

group conv with NHWC layout

Reviewed By: BIT-silence

Differential Revision: D7547497

fbshipit-source-id: da0ec5a4512c15a0a0d7b79e6ce00c1f8f77f661
2018-08-17 00:39:23 -07:00
Orion Reblitz-Richardson
488ea824ed Additional changes to make GPU builds work (#10507)
Summary:
A continuation of https://github.com/pytorch/pytorch/pull/10504 for GPU, torch, etc. builds.

I was testing with

```
FULL_CAFFE2=1 python setup.py build_deps | tee ~/log.txt
cat ~/log.txt | egrep 'undefined refer' | sort | less
```

I'll rebase on master when Yangqing's changes in 10504 land, but I'm putting this up for some testing.

cc mingzhe09088 anderspapitto ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10507

Reviewed By: Yangqing

Differential Revision: D9359606

Pulled By: orionr

fbshipit-source-id: c2a3683b3ea5839689f5d2661da0bc9055a54cd2
2018-08-16 13:25:27 -07:00
Xiaomeng Yang
87cac4c2f1 Update Im2Col and related functions in preparation for group conv in NHWC order. (#10439)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10439

Update Im2Col and related functions in preparation for group conv in NHWC order.

Reviewed By: houseroad

Differential Revision: D9285344

fbshipit-source-id: 1377b0243acb880d2ad9cf73084529a787dcb97d
2018-08-15 17:10:24 -07:00
Xiaomeng Yang
f57e4ce1d5 Update broadcast with alpha to reduce the number of kernel launches. (#10235)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10235

Update broadcast with alpha to reduce the number of kernel launches.

Reviewed By: houseroad

Differential Revision: D9175824

fbshipit-source-id: 7a463833350a2c84dcfb82f73cf40da403dd59a0
2018-08-04 19:54:20 -07:00
Xiaomeng Yang
57d2d4bcff Optimize reduce ops for 2d and 3d (#9992)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9992

Optimize reduce ops for 2d and 3d

Reviewed By: houseroad

Differential Revision: D9042505

fbshipit-source-id: 62af2125aa6439106293e59bdf6a2b920792fd2d
2018-08-04 13:53:58 -07:00
Igor Milyakov
607688e928 Adding reciprocal operator and a test
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9908

Differential Revision: D9035809

Pulled By: virtan

fbshipit-source-id: bce1db46fd55faeeab18a3b266d25c8beeb08df7
2018-07-27 18:24:43 -07:00
Jerry Zhang
aebf3b47ae Remove template parameter from Tensor (#9939)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9939

Pull Request resolved: https://github.com/facebookresearch/weakly-supervised-action-detection/pull/13

Pull Request resolved: https://github.com/pytorch/translate/pull/166

Pull Request resolved: https://github.com/pytorch/pytorch/pull/9125

Closes https://github.com/pytorch/pytorch/pull/9125

Use inheritance for polymorphism, and remove the template parameter.
This changes the templating at call sites; the core implementations will change later.

Before, the Caffe2 Tensor class was compile-time fixed to bind to a particular device/context. With this change, we make it a runtime property (stored inside the tensor) but preserve the same semantics. For example, one has to specify a device type in order to create a Tensor; there are no uninitialized tensors. More specifically, the changes are (see the sketch after this list):

1. We added an extra argument *DeviceType* to most of the Tensor constructors, e.g. Tensor(DeviceType type).
2. The semantics of the constructor Tensor(const Tensor<SrcContext>& src, ContextForCopy* context) are changed: the second context is passed in to enable us to call the templated Copy function. Previously it could be in a different context than the source and target; now we enforce that the context, if provided, has the same device type as src.
3. To preserve the 'get-or-construct' semantics of Blob, we added a specialized getter Blob::GetMutableTensor that verifies both that the Blob contains a Tensor and that it is of the correct type.
4. The Tensor type is no longer default-constructible (as we don't have unknown-device tensors), so some of the code handling STL containers needs to change.

Note: Some changes are postponed just to keep this diff a bit smaller. Please see `TODO`s.
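A toy mock of the API shift sketched in the list above; the names are illustrative, not the verbatim Caffe2 signatures:

```
#include <cassert>

// Toy mock of the change described above: the device becomes a runtime
// property of the tensor instead of a template parameter.
enum class DeviceType { CPU, CUDA };

class Tensor {
 public:
  explicit Tensor(DeviceType device) : device_(device) {}  // no default ctor
  DeviceType device() const { return device_; }

 private:
  DeviceType device_;
};

int main() {
  Tensor t(DeviceType::CPU);  // device chosen at runtime, stored in the tensor
  assert(t.device() == DeviceType::CPU);
  // Tensor u;  // would not compile: no unknown-device tensors
}
```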

Reviewed By: ezyang, houseroad

Differential Revision: D9024330

fbshipit-source-id: e0b8295d2dc6ebe2963383ded5af799ad17164ba
2018-07-27 10:56:39 -07:00
Jerry Zhang
969b62f276 Revert D8121878: Remove template parameter from Tensor
Differential Revision: D8121878

Original commit changeset: 4a5e9a677ba4

fbshipit-source-id: d8e2c0bb145b52fbcca323b22d1d3346f0b3249e
2018-07-26 14:02:04 -07:00
Jerry Zhang
cd5adc7b5f Remove template parameter from Tensor (#13)
Summary:
Pull Request resolved: https://github.com/facebookresearch/weakly-supervised-action-detection/pull/13

Pull Request resolved: https://github.com/pytorch/translate/pull/166

Pull Request resolved: https://github.com/pytorch/pytorch/pull/9125

Closes https://github.com/pytorch/pytorch/pull/9125

Use inheritance for polymorphism, and remove the template parameter.
This changes the templating at call sites; the core implementations will change later.

Before, the Caffe2 Tensor class was compile-time fixed to bind to a particular device/context. With this change, we make it a runtime property (stored inside the tensor) but preserve the same semantics. For example, one has to specify a device type in order to create a Tensor; there are no uninitialized tensors. More specifically, the changes are:

1. We added an extra argument *DeviceType* to most of the Tensor constructors, e.g. Tensor(DeviceType type).
2. The semantics of the constructor Tensor(const Tensor<SrcContext>& src, ContextForCopy* context) are changed: the second context is passed in to enable us to call the templated Copy function. Previously it could be in a different context than the source and target; now we enforce that the context, if provided, has the same device type as src.
3. To preserve the 'get-or-construct' semantics of Blob, we added a specialized getter Blob::GetMutableTensor that verifies both that the Blob contains a Tensor and that it is of the correct type.
4. The Tensor type is no longer default-constructible (as we don't have unknown-device tensors), so some of the code handling STL containers needs to change.

Note: Some changes are postponed just to keep this diff a bit smaller. Please see `TODO`s.

Reviewed By: xw285cornell

Differential Revision: D8121878

fbshipit-source-id: 4a5e9a677ba4ac82095df959851a054c81eccf81
2018-07-26 10:25:23 -07:00
Xiaomeng Yang
445c17d492 Update CopyMatrix in math (#9792)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9792

Update CopyMatrix in math

Reviewed By: houseroad

Differential Revision: D8982421

fbshipit-source-id: da2056306cde3300124b21eba7a6c2d113111002
2018-07-25 16:10:52 -07:00
Xiaomeng Yang
5df3eae89e Add 1x1 specialization for conv with NCHW order (#9671)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9671

Add 1x1 specialization for conv with NCHW order

Reviewed By: houseroad

Differential Revision: D8944686

fbshipit-source-id: 94bf44f69498b1934b7dfff4c0e989342c7bb61c
2018-07-23 18:54:58 -07:00
James Sun
6de038286a Add random data filler to predictor bench to support production nets (#9520)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9520

Add random data filler to predictor bench to support production nets

Reviewed By: salexspb

Differential Revision: D8712757

fbshipit-source-id: 2c732b2ba71ab210f9222adf94d08442ca71dc03
2018-07-18 00:46:02 -07:00
Mark Richardson
88146484b4 Add support for .norm() pytorch onnx export and ReduceL1/ReduceL2 caffe2 operators (#9299)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9299

ONNX has ReduceL1 and ReduceL2 operators that facilitate this, so we allow PyTorch to export them and Caffe2 to run them.

I only implemented this on CPU so far.
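For reference, the two reductions are ReduceL1(x) = sum_i |x_i| and ReduceL2(x) = sqrt(sum_i x_i^2); a scalar CPU sketch (not the Caffe2 operator implementation):

```
#include <cmath>
#include <cstddef>

// ReduceL1: sum of absolute values.
float ReduceL1(const float* x, std::size_t n) {
  float s = 0.f;
  for (std::size_t i = 0; i < n; ++i) s += std::fabs(x[i]);
  return s;
}

// ReduceL2: Euclidean norm, sqrt of the sum of squares.
float ReduceL2(const float* x, std::size_t n) {
  float s = 0.f;
  for (std::size_t i = 0; i < n; ++i) s += x[i] * x[i];
  return std::sqrt(s);
}
```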

Reviewed By: pjh5

Differential Revision: D8757381

fbshipit-source-id: 68afc9e2f90042a70929b73ace05a499b5c670c7
2018-07-14 10:54:13 -07:00
Orion Reblitz-Richardson
7f33ec55b2 Fix Eigen issue on OS X with CUDA and nvcc compile (#9350)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9350

Re-apply #9270

Breaking this out of #8338

This takes care of the Eigen failure we saw on Mac CUDA builds when BUILD_CAFFE2 and BUILD_ATEN were removed. Fix is to isolate Eigen from headers included by cu files and processed by nvcc. This was worked on with smessmer.

Reviewed By: mingzhe09088

Differential Revision: D8794431

fbshipit-source-id: de656334af46c697802073f8e8d9a6aeb9ca65a7
2018-07-11 14:00:05 -07:00
Xiaomeng Yang
cbcf45274b Move tanh function to math (#9328)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9328

Move tanh function to math

Reviewed By: houseroad

Differential Revision: D8794745

fbshipit-source-id: ea525dedde6f53592b06c2caffd6426688dea5fc
2018-07-11 13:59:50 -07:00
Huamin Li
fb9f9c9ba2 Implement Sinh and Cosh (#9213)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9213

Closes https://github.com/pytorch/pytorch/pull/9213

Added hyperbolic trig functions Sinh and Cosh

Reviewed By: BIT-silence

Differential Revision: D8752566

fbshipit-source-id: 5a58336a5153ec804404b9ac7b10b5662ede3cb7
2018-07-10 18:55:31 -07:00
Mike Kelley
8e6e8098ce Revert D8768025: [pytorch][PR] Fix Eigen issue on OS X with CUDA and nvcc compile
Differential Revision: D8768025

Original commit changeset: 5b34017aeb67

fbshipit-source-id: 6ec892ff483bb9d966eb7138eadc77443972c8f8
2018-07-10 10:24:43 -07:00
Orion Reblitz-Richardson
bbeae24145 Fix Eigen issue on OS X with CUDA and nvcc compile (#9270)
Summary:
Breaking this out of #8338

This takes care of the Eigen failure we saw on Mac CUDA builds when BUILD_CAFFE2 and BUILD_ATEN were removed. Fix is to isolate Eigen from headers included by cu files and processed by nvcc. This was worked on with smessmer.

cc mingzhe09088 smessmer BIT-silence Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9270

Reviewed By: mingzhe09088

Differential Revision: D8768025

Pulled By: orionr

fbshipit-source-id: 5b34017aeb67e35a1b5938d962181ccd4cd37591
2018-07-10 09:25:42 -07:00
Xiaomeng Yang
838fdd6f99 Add Cube and Cbrt Ops (#8991)
Summary:
Closes https://github.com/pytorch/pytorch/pull/8991

Add Cube and Cbrt Ops

Reviewed By: houseroad

Differential Revision: D8678848

fbshipit-source-id: 051dd475e45ad9f1d11a8b32ae3acd1f7459b930
2018-06-28 14:55:30 -07:00
Xiaomeng Yang
9243b64bff
[Caffe2] Update elementwise ops to support numpy-style broadcast (#8070)
* Update elementwise ops to support numpy-style broadcast

Update elementwise ops to support numpy-style broadcast (see the sketch after this list)

* Fix sqrt_op

* Fix compare ops

* Fix gradient test

* Fix optimizer legacy broadcast

* Fix legacy broadcast for elementwise ops

* Skip flaky test

* Fix eigen simple binary op

* Fix attention test

* Fix rnn test

* Fix LSTM test

* Fix tan grad

* Fix schema check
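A minimal sketch of the numpy-style broadcast rule the ops move to (illustrative shape logic only, not the Caffe2 implementation):

```
#include <algorithm>
#include <cstdio>
#include <stdexcept>
#include <utility>
#include <vector>

// Numpy-style broadcast: align shapes from the trailing dimension;
// each pair of sizes must match or one of them must be 1.
std::vector<int> BroadcastShape(std::vector<int> a, std::vector<int> b) {
  if (a.size() < b.size()) std::swap(a, b);
  b.insert(b.begin(), a.size() - b.size(), 1);  // left-pad with 1s
  std::vector<int> out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (a[i] != b[i] && a[i] != 1 && b[i] != 1) {
      throw std::invalid_argument("shapes are not broadcastable");
    }
    out[i] = std::max(a[i], b[i]);
  }
  return out;
}

int main() {
  auto s = BroadcastShape({2, 3, 4}, {3, 1});  // -> {2, 3, 4}
  for (int d : s) std::printf("%d ", d);
}
```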
2018-06-05 15:49:16 -07:00
Yinghai Lu
14ad2e74f1
Add BiasCHW fallback for GPU (#7738) 2018-05-23 16:04:35 -07:00
James Sun
b4d5e67e5f Add asin, acos, tan, atan operators (#7600) 2018-05-16 18:09:26 -07:00
Xiaomeng Yang
921dece2d7
Update Im2ColNd functions (#7505)
Update Im2ColNd functions
2018-05-12 15:59:50 -07:00
Xiaomeng Yang
49f87320ba
[Caffe2] Add full impl of GroupNorm (#7058)
* Add full impl of GroupNorm

* Fix comments in math.h

* Remove unused buffers

* Add #include <array> in gpu version

* Remove unused moments_buffer_

* Make inverse std a template.

* Add detailed comments
2018-04-29 11:26:40 -07:00
Xiaomeng Yang
e27d66a454 Remove Eigen from math CUDA and update algorithm in ReduceTensor and Moments (#6922) 2018-04-24 23:07:35 -07:00
Xiaomeng Yang
34fa355f27
[caffe2] Add Moments to math (#6798)
* Add gpu check for reduce_max

* Add Moments in math

* Update CPU version to avoid the int type being 0

* Update Moments on CPU to same as GPU
2018-04-21 01:03:44 -07:00
Xiaomeng Yang
71c644b005
[caffe2] Add ReduceMinOp and ReduceMaxOp (#6744)
* Add gpu check for reduce_max

* Add ReduceMinOp and ReduceMaxOp

* Merge util functions in reduce_ops and math

* Expose math internal functions
2018-04-19 00:22:23 -07:00
Xiaomeng Yang
be0b7f8c81 Add reduce min and reduce max (#6685) 2018-04-18 10:58:05 -07:00
Orion Reblitz-Richardson
6223bfdb1d Update from Facebook (#6692)
* [GanH][Easy]: Add assertion to adaptive weighting layer

A 0 weight causes numeric instability and exploding NE.

* [Easy] Add cast op before computing norm in diagnose options

As LpNorm only takes floats, we add a manual cast here.

* Introduce a new caching device allocator

`cudaMalloc` and `cudaFree` calls are slow, and become slower the
more GPUs there are. Essentially, they grab a host-wide (not device-wide) lock
because GPU memory is transparently shared across all GPUs. Normally, this
isn't much of a concern since workloads allocate memory upfront, and reuse it
during later computation.

However, under some computation models (specifically, memory conserving
approaches like checkpoint-and-recompute, see
https://medium.com/@yaroslavvb/fitting-larger-networks-into-memory-583e3c758ff9)
this assumption is no longer true. In these situations, `cudaMalloc` and
`cudaFree` are common and frequent. Furthermore, in data parallel contexts,
these calls happen at nearly the same time from all GPUs worsening lock
contention.

A common solution to this problem is to add a custom allocator. In fact,
NVIDIA provides one out of the box: CUB, which Caffe2 already supports.
Unfortunately, the CUB allocator suffers from very high fragmentation. This is
primarily because it is a "buddy" allocator which neither splits nor merges
free cached blocks. Study
https://github.com/NVlabs/cub/blob/1.8.0/cub/util_allocator.cuh#L357 if you
want to convince yourself.

This diff adapts a caching allocator from the Torch codebase
https://github.com/torch/cutorch/blob/master/lib/THC/THCCachingAllocator.cpp
which does splitting and merging and ends up working really well, at least for
workloads like the checkpoint-and-recompute computation models noted above.

I simplified the implementation a little and made it a bit more C++-like. I
also removed a bunch of stream synchronization primitives for this diff. I
plan to add them back in subsequent diffs.
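A minimal host-side sketch of the split-and-merge caching idea (std::malloc stands in for cudaMalloc; no locking, streams, size rounding, or freeing back to the system; not the actual THCCachingAllocator port):

```
#include <cstddef>
#include <cstdlib>
#include <map>

// Caching allocator sketch: freed blocks go to a size-ordered cache;
// Alloc takes the best fit, splitting oversized blocks, and Free
// merges a returned block with an adjacent cached block.
class CachingAllocator {
 public:
  void* Alloc(std::size_t size) {
    // Best fit: smallest cached block that is large enough.
    auto it = free_blocks_.lower_bound(size);
    if (it == free_blocks_.end()) {
      void* p = std::malloc(size);  // stand-in for cudaMalloc
      block_size_[p] = size;
      return p;
    }
    void* p = it->second;
    std::size_t block = it->first;
    free_blocks_.erase(it);
    // Split: return the tail of an oversized block to the cache.
    if (block > size) {
      void* tail = static_cast<char*>(p) + size;
      block_size_[p] = size;
      block_size_[tail] = block - size;
      free_blocks_.emplace(block - size, tail);
    }
    return p;
  }

  void Free(void* p) {
    std::size_t size = block_size_[p];
    // Merge: if the next-adjacent block is also cached, coalesce.
    void* next = static_cast<char*>(p) + size;
    for (auto it = free_blocks_.begin(); it != free_blocks_.end(); ++it) {
      if (it->second == next) {
        size += it->first;
        block_size_.erase(next);
        free_blocks_.erase(it);
        break;
      }
    }
    block_size_[p] = size;
    free_blocks_.emplace(size, p);
  }

 private:
  std::multimap<std::size_t, void*> free_blocks_;  // size -> cached block
  std::map<void*, std::size_t> block_size_;        // ptr -> its size
};
```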

* Report reader progress in fblearner workflows

Integrate with fblearner progress reporting API and add support to report training progress from reader nodes.
If the reader is constructed with batch limits, report based on finished batches vs. total batches. The finished count may exceed the total because we evaluate whether we should stop processing every time we dequeue a split.
If the reader has no limit, report based on finished splits (Hive files) vs. total splits. This is fairly accurate.

* [GanH][Diagnose]: fix plotting

1. GanH diagnose needs to set plot options.
2. The modifier's blob name is used for the metric field and needs to be fixed before generating the net.

* Automatic update of fbcode/onnx to 985af3f5a0f7e7d29bc0ee6b13047e7ead9c90c8

* Make CompositeReader stops as soon as one reader finishes

Previously, CompositeReader called all readers before stopping. This resulted in a flaky test, since the last batch may be read by different threads, resulting in dropped data.

* [dper] make sure loss is not nan

as desc.

* [rosetta2] [mobile-vision] Option to export NHWC order for RoIWarp/RoIAlign

Thanks for finding this, @stzpz and @wangyanghan. It looks like NHWC is more optimized. For OCR it doesn't yet help, since NHWC uses more memory bandwidth, but it will soon become important.

* Intra-op parallel FC operator

Intra-op parallel FC operator

* [C2 Proto] extra info in device option

passing extra information in device option

design doc: https://fb.quip.com/yAiuAXkRXZGx

* Unregister MKL fallbacks for NCHW conversions

* Tracing for more executors

Modified Tracer to work with other executors and add more tracing

* Remove ShiftActivationDevices()

* Check for blob entry iff it is present

When processing placeholder ops, ignore the blob if it is not present in blob_to_device.

* Internalize use of eigen tensor

Move use of eigen tensor out of the header file so we don't get template partial specialization errors when building other libraries.

* feature importance for transformed features.

* - Fix unused parameter warnings

The changes in this diff comment out unused parameters.
This will allow us to enable -Wunused-parameter as an error.

#accept2ship

* add opencv dependencies to caffe2

The video input op requires additional OpenCV packages. This adds them to CMake so that it can build.

* Add clip_by_value option in gradient clipping

Add clip_by_value option in gradient clipping

When the value is bigger than the max or smaller than the min, clip it.

* std::round compat
2018-04-17 23:36:40 -07:00
Xiaomeng Yang
4be34ca0f3 Add broadcast and reduce gradient (#6668)
Add broadcast and reduce gradient
2018-04-17 13:31:13 -07:00
Xiaomeng Yang
cd2112717c
[caffe2] Update math functions with params on host. (#6602)
* Update ReduceMean

Add reduce mean to math

Add reduce mean to math

* sync reduce_ops_test

* Update math_gpu.cu
2018-04-14 21:41:41 -07:00
Xiaomeng Yang
8849bea120 [caffe2] Update ReduceOps (#6497)
* Update ReduceMean

* Add reduce mean to math

* Update cuda flag

* Update Eigen::Tensor ctor

* Remove unused variables

* Skip ReduceTensorGPUTest if no gpus

* Add NOMINMAX for windows

* Fix lpnorm_op in windows
2018-04-11 23:36:05 -07:00
Orion Reblitz-Richardson
1d5780d42c Remove Apache headers from source.
* LICENSE file contains details, so removing from individual source files.
2018-03-27 13:10:18 -07:00