Commit Graph

95 Commits

Author SHA1 Message Date
Will Feng
cdead5ace1 Enable CircleCI for Linux jobs (#12389)
Summary:
Changes in this PR:
1. The intermediate Docker image is shared from the build stage to the test stage through ECR, in order to fix the flaky Caffe2 CUDA tests.
2. There are ~7 Caffe2 operator tests that are flaky only in `caffe2_py2_gcc4_8_ubuntu14_04_test` on CPU. We disable those tests on that config only, which is acceptable because they still run in the other test jobs.

After this PR is merged, CircleCI will run on master automatically, and will run on PRs once the author has rebased their PR onto the newest master (which we will ask all authors to do when we switch off Jenkins for Linux).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12389

Differential Revision: D10224267

Pulled By: yf225

fbshipit-source-id: dd1a90a425c3d13b870d3d328cb301eee2e6e2cd
2018-10-08 17:09:37 -07:00
Jongsoo Park
29610621ec 64B align for avx512 (#11748)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11748

For AVX-512, we need to align at a multiple of 64 B, not 32 B.
Regardless of AVX-512, it is in general a good idea to be cache-line aligned.
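
Not the code from this change (that lives in the C++ allocator), but a rough Python/numpy illustration of the alignment idea: over-allocate and offset the buffer so its data pointer lands on a 64-byte boundary, matching one cache line and one AVX-512 register width.

```
import numpy as np

def aligned_empty(shape, dtype=np.float32, alignment=64):
    """Allocate an array whose data pointer is aligned to `alignment` bytes."""
    dtype = np.dtype(dtype)
    nbytes = int(np.prod(shape)) * dtype.itemsize
    buf = np.empty(nbytes + alignment, dtype=np.uint8)
    offset = (-buf.ctypes.data) % alignment        # bytes to the next boundary
    return buf[offset:offset + nbytes].view(dtype).reshape(shape)

a = aligned_empty((1024,), alignment=64)
assert a.ctypes.data % 64 == 0                     # safe for aligned AVX-512 loads
```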

Reviewed By: ilia-cher

Differential Revision: D9845056

fbshipit-source-id: b1d3ed67749c0c1a64acd5cc230a1279e8023512
2018-09-17 14:08:31 -07:00
Will Feng
c9e66351a7 Port all PyTorch and Caffe2 jobs to CircleCI (#11264)
Summary:
This PR adds all PyTorch and Caffe2 job configs to CircleCI.

Steps for the CircleCI mini-trial:
- [ ] Make sure this PR passes Jenkins CI and fbcode internal tests
- [x] Approve this PR
- [ ] Ask CircleCI to turn up the number of build machines
- [ ] Land this PR so that the new `.circleci/config.yml` will take effect

Several Caffe2 tests are flaky on CircleCI machines and hence skipped when running on CircleCI. A proper fix for them will be worked on after a successful mini-trial.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11264

Differential Revision: D9656793

Pulled By: yf225

fbshipit-source-id: 7832e90018f3dff7651489c04a179d6742168fe1
2018-09-05 16:28:11 -07:00
rohithkrn
f5910c8a36 Add MIOPEN recurrent operator (#10840)
Summary:
The goal of this PR is to enable the MIOpen engine (for HIP devices) for the recurrent operator and also enable the corresponding unit test.
bddppq petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10840

Differential Revision: D9518980

Pulled By: bddppq

fbshipit-source-id: 214661e79a47c5dc6b712ef0fba986bd99db051f
2018-08-27 15:39:56 -07:00
Xiuyan Ni
db96a0951f Add SIMD version to GFTRL optimizer (#9698)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9698

Add SIMD version to GFTRL optimizer

Differential Revision: D8949723

fbshipit-source-id: 835ce2ce49630ae43fc6bac63c545c14b25f5a26
2018-07-30 15:27:24 -07:00
Junjie Bai
bdbbcf068a Temporarily disable test_unique on rocm since it keeps running into segfault (#9872)
Summary:
petrex

https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-clang3.8-rocm1.7.1-ubuntu16.04-test/3758/console
https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-clang3.8-rocm1.7.1-ubuntu16.04-test/3757/console
https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-clang3.8-rocm1.7.1-ubuntu16.04-test/3752/console
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9872

Reviewed By: ezyang

Differential Revision: D9013335

Pulled By: bddppq

fbshipit-source-id: 80490a0fd4a86aa9c8454378c0edddc57d135c4e
2018-07-26 08:34:00 -07:00
Junjie Bai
7af5883860 Enable Python tests on ROCm (#9616)
Summary:
petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9616

Differential Revision: D8960623

Pulled By: bddppq

fbshipit-source-id: bde93bda6230094e6bf4badd8ee79f0688ae1993
2018-07-24 11:37:58 -07:00
Xiuyan Ni
4e5369349f Add FTRL Optimizer with Group Lasso regularizer (#9074)
Summary:
Closes https://github.com/pytorch/pytorch/pull/9074

Implement an optimizer based on the FTRL optimizer which supports a Group
Lasso regularizer. A minimal sketch follows the paper list below.

The relevant paper list for this optimizer:
1. About the FTRL Optimizer: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/41159.pdf,
2. About the group lasso regularizer solver: http://www.cse.cuhk.edu.hk/~king/PUB/ICML2010-Yang-473.pdf
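
As referenced above, a rough numpy sketch of the idea (not the Caffe2 kernel): the usual FTRL-Proximal state update combined with a row-wise (group) soft-threshold, so an entire embedding row is zeroed when its accumulated z-norm is below the regularization strength. Variable names and the exact closed form are illustrative assumptions.

```
import numpy as np

def gftrl_step(w, z, n, g, alpha=0.1, beta=1.0, lambda_g=0.01):
    """One simplified G-FTRL step; rows of `w` are the groups (e.g. embedding rows)."""
    n_new = n + g * g
    sigma = (np.sqrt(n_new) - np.sqrt(n)) / alpha
    z += g - sigma * w
    n[:] = n_new
    # Group soft-threshold: zero out a whole row if its z-norm is small.
    z_norm = np.linalg.norm(z, axis=1, keepdims=True)
    shrink = np.maximum(1.0 - lambda_g / np.maximum(z_norm, 1e-12), 0.0)
    w[:] = -shrink * alpha * z / (beta + np.sqrt(n))
    return w

w = np.zeros((4, 3)); z = np.zeros((4, 3)); n = np.zeros((4, 3))
gftrl_step(w, z, n, g=np.random.randn(4, 3))
```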

Differential Revision: D8623146

fbshipit-source-id: 40e08aa6319d1ad7aa95e8716e3de83b9cfb8452
2018-07-06 13:41:00 -07:00
bddppq
bc4feab3e3
Fix flaky atomic iter test (#7649) 2018-05-17 21:17:29 -07:00
Paul Jesse Hellemn
b875fb281c
Update from facebook (#7451)
* [bootcamp] Improve "Shape" operator to support axes specification

Improve the Shape operator of Caffe2 to support x.shape(tensor, axes), which takes an optional int array "axes" as input. For example, x.shape(tensor, [1, 0]) will return the dimensions for axes 1 and 0 in the specified order. In the current version, the "axes" input allows duplicates and can have arbitrary length.
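
A small numpy sketch of the intended semantics (reference behavior only, not the operator code):

```
import numpy as np

def shape_op(tensor, axes=None):
    """Full shape, or the dimensions selected by `axes` in the given order;
    duplicates and arbitrary length are allowed."""
    full = np.array(tensor.shape, dtype=np.int64)
    return full if axes is None else full[list(axes)]

x = np.zeros((2, 3, 5))
print(shape_op(x))             # [2 3 5]
print(shape_op(x, [1, 0]))     # [3 2]
print(shape_op(x, [2, 2, 0]))  # [5 5 2]
```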

* Back out "Add barrier net that runs before training nets"

Original commit changeset: b373fdc9c30f. Need additional changes to some callers to support barrier failures.

* Change warning to verbose log to reduce log spam

The `LOG(WARNING)` was a bit spammy for regular use, so let's just make it a `VLOG`.

* Extract the shared code from different caffe2_benchmark binaries

The OSS benchmark and Internal benchmark will share most functions in the benchmark.

* Support MFR in sequence training

As titled.

* Make knowledge distillation work using logged prediction features as the teacher label.

1) Add loading raw dense features as the teacher label.
2) Optional calibration function for the teacher label.
3) Add the teacher label into the generic unit test.
4) Deprecate the TTSN workflow version that uses feature_options to configure the teacher label.

* [C2/CUDA]: unjoined cross entropy sigmoid

as desc

* Add async_scheduling executor into deferrable_net_exec_test

Add async_scheduling into tests and fix some exception cases

* Fix Event disabled error

When disabling an event in RNN ops, make sure we don't call Finish on the disabled
event from the op's RunAsync.

* cuda ensure cpu output op can handle both TensorCPU and TensorCUDA

as desc.

* [C2 Core] Infer input device option in C2 hypothesis_test checkers

Improve how we default input blob device options.
Previously they defaulted to wherever the op lives, but that is not necessarily correct.

For example:
CopyCPUToGPU

* [C2 Op]SplitByLengthsOp CPU/GPU implementation

[C2 Op]SplitByLengthsOp CPU/GPU implementation

* fix undefined symbol error

Not sure why we're getting an undefined symbol even with link_whole = True.
Need to figure out why, but we need this workaround for now.

* Add tools in DAIPlayground platform to help debugging models

Add additional tools to allow Playground to override individual methods defined in AnyExp. This allows users to create modules that specifically change certain default method behavior. An example included in this diff is deactivating the test model and checkpointing. When debugging model problems, switching off components helps quickly narrow down the location of the bug. The technique is extensively used in task T27038712 (Steady memory increase in EDPM, eventually resulting in gloo/cuda.cu:34: out of memory).

* add shape and type inference for int8 conversion operator

* Fix flaky test for group_norm

Fix flaky test for group_norm

* Fix group_norm_op_test flaky

Fix group_norm_op_test flaky

* Implementation of composite learning rate policy

In many state-of-the-art deep learning works, people use a simple trick to
schedule the learning rate: use a fixed learning rate until the error plateaus,
then switch to a different fixed learning rate, and so on. In this diff,
we implement a simple version of a composite learning rate. The user gives
a set of learning rate policies and corresponding iteration counts, and the
optimizer changes the learning rate policy based on the number of iterations so far.

For example, the user gives two learning rate policies, FixedLearningRate
and PolyLearningRate, with an iteration count of 1k for the first. For the first 1k
iterations we use FixedLearningRate; for the following iterations, we use PolyLearningRate.
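
A minimal Python sketch of the behavior described above (illustrative only, not the Caffe2 LearningRate op):

```
def composite_lr(iter_num, policies):
    """`policies` is a list of (num_iters, lr_fn); each lr_fn maps the local
    iteration within its phase to a learning rate. The last policy is used
    for all remaining iterations."""
    start = 0
    for i, (num_iters, lr_fn) in enumerate(policies):
        if iter_num < start + num_iters or i == len(policies) - 1:
            return lr_fn(iter_num - start)
        start += num_iters

fixed = lambda t: 0.1                                  # FixedLearningRate
poly = lambda t: 0.1 * max(1 - t / 10000, 0) ** 0.5    # PolyLearningRate-like
print(composite_lr(500, [(1000, fixed), (10000, poly)]))   # 0.1 (first 1k iters)
print(composite_lr(1500, [(1000, fixed), (10000, poly)]))  # poly(500)
```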

* Split two use cases of CachedReader into two classes, DBFileReader and CachedReader

# Use Cases:

1). input: DB file -> output: DatasetReader.

Use DBFileReader.

2). input: Reader -> build cache DB file -> output: DatasetReader.

Use CachedReader.

# Changes to CachedReader:

1). Move db_path to the constructor,
because with a mock reader the cache will always be built ahead of time.

# Changes to tests:

1). Make a separate TestCase class for CachedReader and DBFileReader.

2). Make it possible to add more test functions by adding setUp, tearDown and _make_temp_path.

3). Make delete db_path more general. `db_path` could be a file for `log_file_db`, but could also be a directory for `leveldb`.

* Back out "On Mobile phones, call GlobalInit with no arguments in predictor in case we need to perform initialization"

Original commit changeset: 4489c6133f11

* Fix LARS bug

Fixed a bug in the LARS implementation which caused all subsequent blobs not using LARS to have the LARS learning rate multiplier applied to them.

* [tum] support sparse init & add uniformFill option

as title

* Propagate exception for async nets

Capture the exception when an exception is thrown in async nets and re-throw it after wait().  This allows exceptions to be propagated up to the caller.

This diff was a part of D7752068.  We split the diff so that C2 core files changes are in a separate diff.

* Automatic update of fbcode/onnx to 69894f207dfcd72d1e70497d387201cec327efbc

Previous import was 403ccfbd0161c38f0834413d790bad0874afbf9a

Included changes:
- **[69894f2](https://github.com/onnx/onnx/commit/69894f2)**: Use op schema.all tensor types in random like definitions (#865) <Scott McKay>
- **[b9d6b90](https://github.com/onnx/onnx/commit/b9d6b90)**: Clarify random like operators (#846) <Scott McKay>
- **[fc6b5fb](https://github.com/onnx/onnx/commit/fc6b5fb)**: Refactor shape inference implementation (#855) <anderspapitto>
- **[b7d8dc8](https://github.com/onnx/onnx/commit/b7d8dc8)**: fix cmake warning message (#863) <Eric S. Yu>
- **[f585c5d](https://github.com/onnx/onnx/commit/f585c5d)**: add pytorch-operator test for tile (#831) <Wenhao Hu>
- **[993fe70](https://github.com/onnx/onnx/commit/993fe70)**: add install step (#832) <Eric S. Yu>
- **[68bc26c](https://github.com/onnx/onnx/commit/68bc26c)**: add type inference for traditional ml ops except classifier ops. (#857) <Ke Zhang>
- **[9cc0cda](https://github.com/onnx/onnx/commit/9cc0cda)**: fix string representation of scalar types (#858) <G. Ramalingam>
- **[1078925](https://github.com/onnx/onnx/commit/1078925)**: fix y in pow test case to scalar (#852) <Wenhao Hu>
- **[c66fb6f](https://github.com/onnx/onnx/commit/c66fb6f)**: Add some math function shape inference (#845) <anderspapitto>
- **[ff667d1](https://github.com/onnx/onnx/commit/ff667d1)**: Refactor return type and docs for ONNXIFI_BACKEND_DIRECTX_ID (#853) <Marat Dukhan>
- **[11c6876](https://github.com/onnx/onnx/commit/11c6876)**: clear initializer names when clear initializer (#849) <Wenhao Hu>
- **[73c34ae](https://github.com/onnx/onnx/commit/73c34ae)**: Clarify FeatureVectorizer description. (#843) <Scott McKay>
- **[1befb9b](https://github.com/onnx/onnx/commit/1befb9b)**: Remove useless text in docs (#850) <Lu Fang>
- **[e84788f](https://github.com/onnx/onnx/commit/e84788f)**: Fix SELU attributes' default values (#839) <Lu Fang>
- **[ebac046](https://github.com/onnx/onnx/commit/ebac046)**: Add tile test case (#823) <Wenhao Hu>
- **[8b7a925](https://github.com/onnx/onnx/commit/8b7a925)**: a few more shape inference functions (#772) <anderspapitto>
- **[9718f42](https://github.com/onnx/onnx/commit/9718f42)**: Make the coefficient non optional for LinearClassifier (#836) <Jaliya Ekanayake>
- **[ef083d0](https://github.com/onnx/onnx/commit/ef083d0)**: Add save_tensor and load_tensor functions for Protos (#770) <Lu Fang>
- **[45ceb55](https://github.com/onnx/onnx/commit/45ceb55)**: Check if CMAKE_BUILD_TYPE set before project(). (#812) <Sergii Dymchenko>
- **[4b3d2b0](https://github.com/onnx/onnx/commit/4b3d2b0)**: [WIP] reenable shape inference tests (#834) <anderspapitto>
- **[22d17ee](https://github.com/onnx/onnx/commit/22d17ee)**: RNN tests: LSTM, GRU, SimpleRNN (#739) <Peyman Manikashani>
- **[de65b95](https://github.com/onnx/onnx/commit/de65b95)**: dimension denotation (#443) <Tian Jin>
- **[eccc76e](https://github.com/onnx/onnx/commit/eccc76e)**: fix field number issue in onnx operator proto and enable its build (#829) <Ke Zhang>
- **[d582beb](https://github.com/onnx/onnx/commit/d582beb)**: disable shape inference test to unbreak ci (#830) <Lu Fang>
- **[485b787](https://github.com/onnx/onnx/commit/485b787)**: function proto for composite op. (#802) <Ke Zhang>
- **[cd58928](https://github.com/onnx/onnx/commit/cd58928)**: specify defaults for attributes of Affine op (#820) <G. Ramalingam>
- **[7ee2cf9](https://github.com/onnx/onnx/commit/7ee2cf9)**: merge the dummy backend back into the main one (#743) <anderspapitto>
- **[1c03a5a](https://github.com/onnx/onnx/commit/1c03a5a)**: [Proposal] ONNX Interface for Framework Integration (previously ONNX Backend API) header and docs (#551) <Marat Dukhan>
- **[3769a98](https://github.com/onnx/onnx/commit/3769a98)**: Rename real model test case from VGG-16 to ZFNet (#821) <Lu Fang>

* [C2]ReluN Op

relu n op.

tf reference: https://www.tensorflow.org/api_docs/python/tf/nn/relu6

* Call destructor when assigning a blob value

* Add executor overrides

Add executor overrides flag to enable migration to async_scheduling executor

* Add barrier net that runs before training nets - attempt #2

Add a synchronize barrier net that is run before training nets.  With this net, shards that are faster will wait for the other shards before starting training.  This reduces the chance of the faster shards timing out during Gloo AllReduce.
Removed the explicit data_parallel_model.py synchronize call in the holmes workflow.

This change was landed previously but caused errors for some EDPM workflows - See https://fb.facebook.com/groups/1426530000692545/permalink/1906766366002237/ - because EDPM assumes any call to CreateOrCloneCommonWorld and Gloo ops are wrapped in exception handlers but in this case exception thrown in the barrier init net is not handled.

To address this issue, we add _CreateOrCloneCommonWorld to the param_init_net instead of a new barrier init net.  Since errors in the param_init_net run are handled gracefully with re-rendezvous, this should fix the problem.

* Handle empty nets in async_scheduling

Make sure we don't get stuck on empty nets

* use CUDA_ARCH for conditional compile

* [C2 fix] infer function for ensure_cpu_output_op

* Update group_norm test to reduce flaky test

* Fix lr_multiplier for GPU
2018-05-10 23:14:27 -07:00
Orion Reblitz-Richardson
1d5780d42c Remove Apache headers from source.
* LICENSE file contains details, so removing from individual source files.
2018-03-27 13:10:18 -07:00
mlappelbaum
d11fc90317 Export atomic iter count (#2379)
* Exported AtomicIterOp count

* Exported AtomicIterOp count

* Exported AtomicIterOp count

* Exported AtomicIterOp count

* Exported AtomicIterOp count

* Exported AtomicIterOp count

* Exported AtomicIterOp count

* Add axis to top_k_op. (#2416)

* Revert update on top_k_op

* Add axis to top_k_op

Add axis to top_k_op

* [auto] Update onnx to a8e4648 - Adjust link flags when built in Windows Debug mode (#647)
a8e4648a7d

* [auto] Update onnx to f4acf28 - Remove allowconsumed enforceconsumed from op schema. (#617)
f4acf281ef

* Exported AtomicIterOp count

* Exported AtomicIterOp count

* Exported AtomicIterOp count

* Exported AtomicIterOp count

* Exported AtomicIterOp count

* Exported AtomicIterOp count

* Initialize cpuinfo in the thread pool

The thread pool called cpuinfo_get_processors_count() without initializing cpuinfo. Only by luck did it not make Caffe2 single-threaded: the threadpool is initialized after NNPACK, and NNPACK initializes cpuinfo itself.

This commit also updates cpuinfo to a version that aborts with a fatal error if it's used uninitialized.

* Updated Python Op and Image Pre-Processing Pipeline tutorials && Added CIFAR-10 Part 1 tutorial (#2286)

* Updated Basics tutorial: (1) Added Python 3 support with __future__ statements; (2) Various grammatical/typo fixes and minor refactoring of Markdown

* Added Python 3 support and made minor typo fixes

* Added Python 3 support with future imports, refactored and corrected errors in Markdown, added comments

* Added Python 3 support with future imports, Added use of caffe_translator.py to translate downloaded .caffemodel file to .pb files

* Upgrades to Image Pre-Processing Pipeline tutorial

* Updated Python Op tutorial

* removed markdown with empty links

* Added Part 1 of an end-to-end CIFAR-10 tutorial

* Updated MNIST Dataset and Databases tutorial with python3 support and markdown fixes

* Tweaks to markup, fewer training iterations

* changed permissions of CIFAR10_Part1; typo corrections in Image_Pre-Processing_Pipeline

* Typo corrections in Multi-GPU Training tutorial

* sync Python_Op py_gen with the IPython notebook

* nit typo correction

* [auto] Update onnx to 5cb999d - Minor cleanups to shape inference (#653)
5cb999ddc1

* [auto] Update onnx to ecac1c1 - Merge Rel 1.1.0 branch into master (#657)
ecac1c1624

* Strip down onnx to only pb definitions in mobile build (#2426)

* Exported AtomicIterOp count

* Exported AtomicIterOp count

* Exported AtomicIterOp count

* Exported AtomicIterOp count

* Exported AtomicIterOp count

* Exported AtomicIterOp count

* Exported AtomicIterOp count

* Exported AtomicIterOp count

* Exported AtomicIterOp count
2018-03-26 19:26:09 -07:00
Orion Reblitz-Richardson
0ea8964fd6 Revert "Export number of iterations of AtomicIterOp" (#2359)
* Revert "Use -DCMAKE_BUILD_TYPE=Release for local build by default"

This reverts commit 035c62081f6420405b9f1380cc5d21b4c6ae78f6.

* Revert "Export number of iterations of AtomicIterOp (#2338)"

This reverts commit 91b7a0cb48c6b079e2ca8fd5c26819a003937d76.
2018-03-21 16:11:29 -07:00
mlappelbaum
8346088094 Export number of iterations of AtomicIterOp (#2338)
* Exported AtomicIterOp count

* Exported AtomicIterOp count
2018-03-21 12:39:30 -07:00
Orion Reblitz-Richardson
6aa087d902 Revert "export num iterations of AtomicIter"
This reverts commit be9c8e5591f5d38131b9bdc2249542f27dadc221.
2018-03-20 13:34:22 -07:00
Matan Appelbaum
fac306d3c9 export num iterations of AtomicIter
As title.  Useful for tracking the number of EASGD updates.
2018-03-20 13:34:22 -07:00
Orion Reblitz-Richardson
5c381bbc57 Patch cuda-convnet2 from internal Facebook changes.
* Unfortunately this needs to be manually monkey patched.
* This should get it so GitHub and fbcode versions match.
2018-02-28 14:20:48 -08:00
Pieter Noordhuis
52fa742c51 Revert D6893040: Implementing Pow operator (this merges existing pow with a scalar and new pow with a tensor exponent).
Summary:
This reverts commit 30f614beea6f859fee25ce4f85573142885dde45

bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
cause_a_sev_many_files

Differential Revision:
D6893040

Original commit changeset: 30f614beea6f

fbshipit-source-id: 5e98a24699088283f864efe31234874bdacbe3c3
2018-02-14 10:34:08 -08:00
Maxim Naumov
f7cc8e8822 Implementing Pow operator (this merges existing pow with a scalar and new pow with a tensor exponent).
Summary: The old pow operator has been deleted in math_ops.cc, math_ops.cu and math_ops.h, while the new operator supporting scalar and tensor exponents has been added in pow_op.cc, pow_op.h and elementwise_op.cu.
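
For reference, the merged semantics expressed in numpy terms (illustration only): a scalar exponent and a broadcastable tensor exponent are both accepted.

```
import numpy as np

x = np.array([[1., 2.], [3., 4.]])
print(np.power(x, 2.0))                  # scalar exponent (old behavior)
print(np.power(x, np.array([2., 3.])))   # tensor exponent, broadcast against x
```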

Reviewed By: houseroad

Differential Revision: D6893040

fbshipit-source-id: 30f614beea6f859fee25ce4f85573142885dde45
2018-02-13 17:46:35 -08:00
Huazhong Ning
90543ff13a weighted sampling reader dequeue outputs table index
Summary: Weighted sampling reader dequeue randomly chooses a hive reader to read a mini-batch. This diff allows dequeue to output the index of the randomly chosen table to a specific blob.
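
An illustrative sketch of the dequeue semantics (not the Caffe2 implementation; table names are hypothetical): pick one underlying reader according to its weight and also expose which table the mini-batch came from.

```
import numpy as np

readers = ["table_a", "table_b", "table_c"]   # stand-ins for hive readers
weights = np.array([0.5, 0.3, 0.2])
idx = np.random.choice(len(readers), p=weights)
minibatch_source, table_index_output = readers[idx], idx   # the new index output
print(minibatch_source, table_index_output)
```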

Reviewed By: kennyhorror

Differential Revision: D6621070

fbshipit-source-id: 754b981fc2bcfdb0146d2a0a5b677e7cfe74211b
2018-01-24 19:06:25 -08:00
Pieter Noordhuis
d618c05174 Increase lower bound of values in div test
Summary:
This should translate to a 1% error margin. The gradient checker uses a .5% threshold.
Closes https://github.com/caffe2/caffe2/pull/1766

Differential Revision: D6774077

Pulled By: pietern

fbshipit-source-id: f97c7ffb2ef34fdd71d69320a7fdcf4a6a457715
2018-01-22 09:06:12 -08:00
Ahmed Taei
d1d6c0b12b Add CUDA implementation for ReplaceNaNOp
Reviewed By: jay-mahadeokar

Differential Revision: D6481993

fbshipit-source-id: cb253621795bb9de73d3e8bc1c8fc21b596d88c3
2017-12-05 13:34:51 -08:00
Junjie Bai
3da9d7971d Suppress pytest filter_too_much health check
Summary:
Fix the travis CI
Closes https://github.com/caffe2/caffe2/pull/1524

Reviewed By: dzhulgakov

Differential Revision: D6412499

Pulled By: bddppq

fbshipit-source-id: eaa5942c88d4edd65600d035e31d2300fd8ab3a8
2017-11-27 08:35:27 -08:00
Yan Shang
24e83acbb9 Enable sampling in evaluation
Reviewed By: chocjy

Differential Revision: D6119768

fbshipit-source-id: c8447326008392df70ab10b04f84223cf6d882b1
2017-11-16 14:03:51 -08:00
Andrey Malevich
e13f199452 Switch RNNOp to use NetDef argument for step representation.
Summary: Before this diff, RNNOp used TextFormat for representing steps. This diff changes RNNOp to prefer a NetDef argument instead. To be backward compatible it still supports TextFormat for existing models, though we can compile RNNs without TextFormat as well.

Reviewed By: salexspb

Differential Revision: D5949330

fbshipit-source-id: 9336a8f5ccf30ad8d8e3a7067b9437e1704b1c9f
2017-10-10 22:01:51 -07:00
Di Yu
acc384183a caffe2 operator logit / logit gradient CUDA implementation
Summary: This is the continuation of T20872698 Implement the gradient operator for element-wise Logit

Reviewed By: asaadaldien

Differential Revision: D5969487

fbshipit-source-id: c9bb4222529f9fd9085aa9048b90eb70a63f41f4
2017-10-03 18:48:25 -07:00
Yangqing Jia
8286ce1e3a Re-license to Apache
Summary: Closes https://github.com/caffe2/caffe2/pull/1260

Differential Revision: D5906739

Pulled By: Yangqing

fbshipit-source-id: e482ba9ba60b5337d9165f28f7ec68d4518a0902
2017-09-28 16:22:00 -07:00
Di Yu
711d7137c7 Implement the gradient operator for element-wise Logit
Summary: Implemented logit gradient with eps as an argument.  Added a unit test for it and explored the optimal parameters to run the test.

Reviewed By: asaadaldien

Differential Revision: D5910655

fbshipit-source-id: 44898b784a57c7ad45519b202b1eaf95c1c4d460
2017-09-26 14:49:22 -07:00
Aapo Kyrola
bb08f261f1 EnsureDense/SparseToDense for CUDA
Summary: Add a CUDA version of SparseToDense and register EnsureDense (which is trivial) on CUDA. We need to use atomics because indices can be duplicated. We can later add an option indicating that the indices are unique, and use a faster path in that case.
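
A numpy sketch of why atomics are needed (reference semantics only): rows sharing the same index must be accumulated, so a plain scatter (which keeps only the last write) is not enough.

```
import numpy as np

def sparse_to_dense(indices, values, first_dim):
    out = np.zeros((first_dim,) + values.shape[1:], dtype=values.dtype)
    np.add.at(out, indices, values)   # atomic-add-like accumulation
    return out

idx = np.array([0, 2, 2])             # index 2 is duplicated
vals = np.array([[1., 1.], [2., 2.], [3., 3.]])
print(sparse_to_dense(idx, vals, 4))  # row 2 == [5., 5.]
```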

Reviewed By: jhcross

Differential Revision: D5750893

fbshipit-source-id: 005d1675b127a571aac8474fca62d9633f0c7bff
2017-09-01 09:33:05 -07:00
Ahmed Taei
8af625ede2 Implement gradients for Col2Im and Im2Col operators
Reviewed By: jay-mahadeokar

Differential Revision: D5576385

fbshipit-source-id: a0ca4f704fd861f7cc67079041b1d0772fc66920
2017-08-07 15:51:30 -07:00
Wojciech Glogowski
8f8dccd2ed distance_op_test from hypothesis_test refactored
Summary:
Moved the distance op tests from hypothesis_test to distance_op_test and
refactored them.

Reviewed By: akyrola, asaadaldien

Differential Revision: D5495104

fbshipit-source-id: 4a90c75eabeb380ae9d150d6258e9b5b0fbfc5ca
2017-07-26 13:37:08 -07:00
Wojciech Glogowski
f656e002a7 CosineSimilarity GPU
Reviewed By: asaadaldien, akyrola

Differential Revision: D5476812

fbshipit-source-id: d931a7d8e4a4dfdf22ee18f8b9c755cc21b0e75b
2017-07-25 13:34:01 -07:00
Bangsheng Tang
e5a7891038 dot product using matmul
Summary:
1. PairwiseDotProduct in layers
2. add_axis argument in Concat and Split(just for backward propagtion)

Reviewed By: xianjiec

Differential Revision: D5383208

fbshipit-source-id: 8e18ce371fff2da2da77b1a728142d69cd48e9c3
2017-07-17 23:20:37 -07:00
Aapo Kyrola
f44991b398 add timeout argument to DequeueBlobs; use 10 min timeout for data workers
Summary: As title. This helps with (quite common) cases where data input is stuck for one reason or another, and the net execution never proceeds and is stuck forever.

Reviewed By: andrewwdye

Differential Revision: D5409885

fbshipit-source-id: 840261fd5964408f788fc0f50ece0d74193694ac
2017-07-13 18:52:03 -07:00
Marat Dukhan
2ac9ff5c96 Cos, Sin, and Abs operators
Summary: add Cos, Sin, and Abs operators

Reviewed By: akyrola

Differential Revision: D5307632

fbshipit-source-id: 743c9d289e4d3fd439e4b5385841cdff87d9247a
2017-07-03 22:18:32 -07:00
Thomas Dudziak
5355634dac Dict fixes/improvements and unittest targets for Python 3 in caffe2 core
Summary: As title

Reviewed By: salexspb

Differential Revision: D5316104

fbshipit-source-id: aee43819d817842e5ce6ba3d045a55b1a2491c30
2017-06-29 17:05:41 -07:00
Andrew Tulloch
912ee4e40a Fix test_sparse_to_dense precision failures
Summary: ..

Reviewed By: tomdz

Differential Revision: D5349561

fbshipit-source-id: 4c510905515eb03a64abc36f33d59a1d998c2ab1
2017-06-29 12:48:03 -07:00
Thomas Dudziak
342de07231 Core unit test fixes for Python 3
Summary: As title

Differential Revision: D5291327

fbshipit-source-id: 7dd9279c53ba55d3422c31973ffcec5705787fdf
2017-06-23 13:22:16 -07:00
Luke Yeager
e2107fffba Fixes for test_recurrent in hypothesis_test.py
Summary:
/cc akyrola is it possible this test has been broken ever since 5614816fce?

More generally, why do we still have `hypothesis_test.py` at all? In the case of this test, surely one of these files does more than this one old test:

* `operator_test/cudnn_recurrent_test.py`
* `operator_test/recurrent_network_test.py`
* `operator_test/rnn_cell_test.py`
Closes https://github.com/caffe2/caffe2/pull/843

Differential Revision: D5292109

Pulled By: akyrola

fbshipit-source-id: 6df5df6353a9741d1ae1b796adaab98382857527
2017-06-21 05:35:42 -07:00
Luke Yeager
31e700910d Fix entropy error coming from test_div
Summary:
Working towards https://github.com/caffe2/caffe2/pull/817.

`E           InvalidArgument: Insufficient bytes of entropy to draw requested array.  shape=(4, 2, 5, 1, 3, 5, 5, 1), dtype=float32.  Can you reduce the size or dimensions of the array?  What about using a smaller dtype?  If slow test runs and minimisation are acceptable, you  could increase settings().buffer_size from 8192 to at least 24576000.`

https://travis-ci.org/caffe2/caffe2/jobs/243867951
Closes https://github.com/caffe2/caffe2/pull/828

Differential Revision: D5276723

Pulled By: akyrola

fbshipit-source-id: f7d0e2dd8ef8b6a2354bd4ff7c7446c377c954b4
2017-06-19 13:47:29 -07:00
Po-Yen Chou
5ce9cbae70 Upgrades python/hypothesis_test.py to use brew instead of CNNHelperModel
Summary: Upgrades this file to use brew instead of CNNHelperModel

Reviewed By: harouwu

Differential Revision: D5252089

fbshipit-source-id: 6df4350717c1d42bc4bcc63d255cd422f085ee05
2017-06-15 15:07:56 -07:00
Luke Yeager
f61e4ca070 Fixes in tests to support numpy >= 0.12
Summary:
```
  File "/data/caffe2/install/caffe2/python/hypothesis_test.py", line 1911, in test_batch_to_space
    (w + 2 * pad) / block_size).astype(np.float32)
  File "mtrand.pyx", line 1404, in mtrand.RandomState.randn (numpy/random/mtrand/mtrand.c:19843)
  File "mtrand.pyx", line 1534, in mtrand.RandomState.standard_normal (numpy/random/mtrand/mtrand.c:20368)
  File "mtrand.pyx", line 167, in mtrand.cont0_array (numpy/random/mtrand/mtrand.c:6127)
TypeError: 'float' object cannot be interpreted as an index
```
```
  File "/data/caffe2/install/caffe2/python/operator_test/tile_op_test.py", line 101, in tile_ref
    tiled_data = np.tile(X, tuple(dims))
  File "/data/caffe2/venv/local/lib/python2.7/site-packages/numpy/lib/shape_base.py", line 881, in tile
    return c.reshape(shape_out)
TypeError: only integer scalar arrays can be converted to a scalar index
```
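
The fix, sketched below: newer numpy rejects float dimension arguments, so use integer (floor) division when computing shapes.

```
import numpy as np

w, pad, block_size = 8, 2, 2
# np.random.randn((w + 2 * pad) / block_size)     # TypeError on numpy >= 0.12
x = np.random.randn((w + 2 * pad) // block_size)  # OK: integer dimension
```
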
I also tested to make sure this still works with 0.11.
Closes https://github.com/caffe2/caffe2/pull/787

Differential Revision: D5248087

Pulled By: salexspb

fbshipit-source-id: eff69482a8eabb8ace330003fa326c832b53865f
2017-06-15 14:17:20 -07:00
Jiyan Yang
c7aa8e142d Add gradient to SparseToDense op
Summary: As desc.

Differential Revision: D5169423

fbshipit-source-id: 64c72933c14c3caabfbe0bf85912194a479c24fa
2017-06-09 13:47:21 -07:00
Pooya Davoodi
2c97c98ca7 Enable testing the GPU implementations of Adagrad and Adam
Summary:
Enable testing the GPU implementations of Adagrad and Adam, including sparse versions.
Closes https://github.com/caffe2/caffe2/pull/607

Reviewed By: dzhulgakov

Differential Revision: D5121552

Pulled By: Yangqing

fbshipit-source-id: da6b7dde456237c94cf74d00860e7327b2267eab
2017-06-01 18:10:57 -07:00
Yiming Wu
b070197e8a cuda unique op
Summary:
CUDA unique op, unit test provided; will provide benchmark against CPU

SpeedUp results for synthetic real data. Input of size 20k, range[1, 10million], **~5x** speedup

  CPU 9.05795(ms) Unique
  GPU 1.79434(ms) Unique

SpeedUp results for 5x synthetic data. Input of size 1 million, range[1, 10million] **~13.7x** speedup

  CPU 54.7539(ms) Unique
  GPU 3.99473(ms) Unique

Reviewed By: akyrola

Differential Revision: D5007726

fbshipit-source-id: 0a00c518fd1809d0ae8c6cfcba09b0bd982ffaff
2017-05-11 21:08:10 -07:00
Xiaolong Wang
0d32ab4a45 Refactor FTRL optimizer to allow sending Alpha as input blob
Summary: Split from parent diff

Reviewed By: xianjiec

Differential Revision: D4992993

fbshipit-source-id: 9f8a79023b0c581e84bd5e82e2e730c9e1a86e1e
2017-05-04 22:57:00 -07:00
Kittipat Virochsiri
c34d5a838f Generalize LastNWindowCollector
Summary: Use `CopyItems` so that it accepts any type of tensor. Also, move the cursor to an input blob so that it is checkpoint-friendly. The output is now also part of the input so that inference can work correctly.
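
A rough Python sketch of the collector's semantics and why the cursor matters for checkpointing (names are illustrative, not the Caffe2 op):

```
import numpy as np

def last_n_collect(buffer, cursor, items, n):
    """Keep only the last `n` rows in a circular buffer; `cursor` must be saved
    and restored together with the buffer for checkpointing to work."""
    for item in items:
        buffer[cursor % n] = item
        cursor += 1
    return buffer, cursor

buf = np.zeros((3, 2))                                       # window of last 3 rows
buf, cur = last_n_collect(buf, 0, np.arange(10).reshape(5, 2), 3)
print(buf, cur)                                              # holds the last 3 of 5 rows
```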

Reviewed By: xianjiec

Differential Revision: D4920987

fbshipit-source-id: da532736225ec27f409ff763ff69a0629235151c
2017-05-04 16:05:15 -07:00
Ahmed Taei
561255218a NormalizeOp CUDA implementation
Summary:
Implement NormalizeOp for GPU using CUDA, and rewrite the gradient to be a function of the output,
so it is more efficient, especially for the CUDA implementation.
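
A hedged numpy sketch of the gradient rewrite: for y = x / ||x||, the input gradient can be expressed through the output y as (g - y * (y . g)) / ||x||, where g is the incoming gradient, instead of going through the explicit Jacobian.

```
import numpy as np

x = np.random.randn(6)
g = np.random.randn(6)                 # incoming gradient dL/dy
norm = np.linalg.norm(x)
y = x / norm

grad_from_output = (g - y * np.dot(y, g)) / norm

# Check against the explicit Jacobian (I/||x|| - x x^T / ||x||^3) applied to g.
jac = np.eye(6) / norm - np.outer(x, x) / norm ** 3
assert np.allclose(grad_from_output, jac @ g)
```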

Reviewed By: akyrola

Differential Revision: D4971300

fbshipit-source-id: e0ab66462000988aaf1f26010ea550533d107167
2017-05-01 09:25:30 -07:00
Aapo Kyrola
ed05c28bc6 Speedup SquaredL2Distance CUDA
Summary: Both SquaredL2Distance and SquaredL2DistanceGradient had bad CUDA implementations. Use proper reductions and batched kernels.
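
For reference, the per-example (batched) reduction being computed, sketched in numpy; the 0.5 factor matches my reading of Caffe2's SquaredL2Distance definition and should be treated as an assumption.

```
import numpy as np

x = np.random.randn(32, 128)
y = np.random.randn(32, 128)
dist = 0.5 * np.sum((x - y) ** 2, axis=1)   # one reduction per row, shape (32,)
```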

Reviewed By: asaadaldien

Differential Revision: D4968527

fbshipit-source-id: f7cf82072d38bc127c757c5751863a9439aca8b5
2017-04-28 11:55:59 -07:00
Luke Yeager
09bb91022a Fix tests for ops without a CUDA backend
Summary:
*See https://github.com/caffe2/caffe2/pull/227*

* Logit
* ReplaceNaN
* BatchOneHot
Closes https://github.com/caffe2/caffe2/pull/277

Differential Revision: D4915268

Pulled By: Yangqing

fbshipit-source-id: 77ccb2e7d03e6953e8ca60646987a02868d0ef5b
2017-04-24 15:52:25 -07:00