Summary:
Changes in this PR:
1. Intermediate Docker image is shared from build stage to test stage through ECR, in order to fix the Caffe2 flaky CUDA tests.
2. There are ~7 Caffe2 operator tests that are only flaky in `caffe2_py2_gcc4_8_ubuntu14_04_test` on CPU. This PR disables those tests on that config only, which is acceptable because they still run in other test jobs.
After this PR is merged, CircleCI will be running on master automatically, and will be running on PRs if the author rebased their PR onto the newest master (which we will ask all the authors to do when we switch off Jenkins for Linux).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12389
Differential Revision: D10224267
Pulled By: yf225
fbshipit-source-id: dd1a90a425c3d13b870d3d328cb301eee2e6e2cd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11748
For AVX-512, we need to align to a multiple of 64B, not 32B.
Regardless of AVX-512, it is generally a good idea to be cache-line aligned.
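As a rough illustration of the alignment arithmetic (not the code from this change), the sketch below rounds a byte count up to a 64B boundary. The names `CACHE_LINE_BYTES`, `align_up` and `is_aligned` are hypothetical, and 64B is assumed as the cache-line size, which is typical on x86 but not universal.

```python
# Hypothetical helper names; 64 bytes is one AVX-512 register (512 bits)
# and, on typical x86 parts, one cache line.
CACHE_LINE_BYTES = 64


def align_up(nbytes, alignment=CACHE_LINE_BYTES):
    """Round nbytes up to the next multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (nbytes + alignment - 1) & ~(alignment - 1)


def is_aligned(address, alignment=CACHE_LINE_BYTES):
    """Return True if address is a multiple of alignment."""
    return address % alignment == 0
```

An address that is only 32B-aligned, such as 96, fails the 64B check, which is the case the change is guarding against.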
Reviewed By: ilia-cher
Differential Revision: D9845056
fbshipit-source-id: b1d3ed67749c0c1a64acd5cc230a1279e8023512
Summary:
This PR adds all PyTorch and Caffe2 job configs to CircleCI.
Steps for the CircleCI mini-trial:
- [ ] Make sure this PR passes Jenkins CI and fbcode internal tests
- [x] Approve this PR
- [ ] Ask CircleCI to turn up the number of build machines
- [ ] Land this PR so that the new `.circleci/config.yml` will take effect
Several Caffe2 tests are flaky on CircleCI machines and hence skipped when running on CircleCI. A proper fix for them will be worked on after a successful mini-trial.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11264
Differential Revision: D9656793
Pulled By: yf225
fbshipit-source-id: 7832e90018f3dff7651489c04a179d6742168fe1
Summary:
The goal of this PR is to enable the MIOpen engine (for HIP devices) for the recurrent operator and to enable the corresponding unit test.
bddppq petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10840
Differential Revision: D9518980
Pulled By: bddppq
fbshipit-source-id: 214661e79a47c5dc6b712ef0fba986bd99db051f
* [bootcamp] Improve "Shape" operator to support axes specification
Improve the Caffe2 Shape operator to support x.shape(tensor, axes), which takes an optional int array "axes" as input. For example, x.shape(tensor, [1, 0]) returns the dimensions for axes 1 and 0, in the specified order. In the current version, "axes" may contain duplicates and can have arbitrary length.
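The intended semantics can be sketched in plain Python. This is only a reference model under the description above, not the operator's implementation; `shape_with_axes` is a hypothetical name, and rejecting out-of-range axes is an assumption.

```python
def shape_with_axes(dims, axes=None):
    """Return the full shape, or only the dimensions selected by axes.

    axes may repeat entries and may have any length; the result follows
    the order given in axes.
    """
    dims = list(dims)
    if axes is None:
        return dims
    for axis in axes:
        # Assumption: out-of-range axes are an error.
        if not 0 <= axis < len(dims):
            raise IndexError("axis %d out of range for rank %d" % (axis, len(dims)))
    return [dims[axis] for axis in axes]
```

For a tensor of shape (2, 3, 4), axes [1, 0] select dimensions 3 and 2 in that order.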
* Back out "Add barrier net that runs before training nets"
Original commit changeset: b373fdc9c30f. Need additional changes to some callers to support barrier failures.
* Change warning to verbose log to reduce log spam
The `LOG(WARNING)` was a bit spammy for regular use, so let's just make it a `VLOG`.
* Extract the shared code from different caffe2_benchmark binaries
The OSS benchmark and Internal benchmark will share most functions in the benchmark.
* Support MFR in sequence training
As titled.
* Make knowledge distillation work with using logged prediction feature as teacher label.
1) Add loading raw dense feature as teacher label.
2) Optional calibration function for teacher label
3) Add teacher label into generic unit test
4) Deprecated TTSN workflow version using feature_options to config teacher label
* [C2/CUDA]: unjoined cross entropy sigmoid
as desc
* Add async_scheduling executor into deferrable_net_exec_test
Add async_scheduling into tests and fix some exception cases
* Fix Event disabled error
When disabling event in RNN ops make sure we don't call Finish on disabled
event from op's RunAsync
* cuda ensure cpu output op can handle both TensorCPU and TensorCUDA
as desc.
* [C2 Core] Infer input device option in C2 hypothesis_test checkers
Improve how we default input blob device options.
Previously it defaulted to the device where the op lives, but that is not necessarily correct.
For example:
CopyCPUToGPU
* [C2 Op]SplitByLengthsOp CPU/GPU implementation
* fix undefined symbol error
not sure why we're getting undefined symbol even with link_whole = True
Need to figure out why but need this workaround for now
* Add tools in DAIPlayground platform to help debugging models
Add additional tools to allow Playground to override individual methods defined in AnyExp. This allows the user to create a module that specifically changes certain default method behavior. An example included in this diff is deactivating the test model and checkpointing. When debugging any model problems, switching off components helps me quickly narrow down the location of the bug. The technique is extensively used in task T27038712 (Steady memory increase in EDPM, eventually resulting in gloo/cuda.cu:34: out of memory)
* add shape and type inference for int8 conversion operator
* Fix flaky test for group_norm
* Fix group_norm_op_test flaky
* Implementation of composite learning rate policy
In many state-of-the-art deep learning works, people use a simple trick to
schedule the learning rate: use a fixed learning rate until the error plateaus,
then switch to a different fixed learning rate, and so on. In this diff,
we implement a simple version of the composite learning rate. The user gives
a set of learning rate policies and corresponding iteration counts, and the
optimizer changes the learning rate policy based on the number of iterations so far.
For example, the user gives two learning rate policies, FixedLearningRate
and PolyLearningRate, with an iteration count of 1k. For the first 1k iterations,
we use FixedLearningRate. For the following iterations, we use PolyLearningRate.
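The switching logic described above can be sketched as follows. This is an illustrative model only: `fixed_lr`, `poly_lr` and `CompositeLearningRate` are hypothetical Python stand-ins, the polynomial formula is an assumed common form, and restarting each sub-policy's iteration count at zero is an assumption that the description does not confirm.

```python
def fixed_lr(base_lr):
    """Policy that always returns base_lr."""
    return lambda iteration: base_lr


def poly_lr(base_lr, power, max_iter):
    """Assumed polynomial decay: base_lr * (1 - iter / max_iter) ** power."""
    def policy(iteration):
        progress = min(iteration, max_iter) / max_iter
        return base_lr * (1.0 - progress) ** power
    return policy


class CompositeLearningRate:
    """Pick a sub-policy from the number of iterations run so far."""

    def __init__(self, stages):
        # stages: list of (num_iter, policy). The last policy stays active
        # for all remaining iterations, so its num_iter is not consulted.
        if not stages:
            raise ValueError("at least one stage is required")
        self._stages = list(stages)

    def __call__(self, iteration):
        start = 0
        for num_iter, policy in self._stages[:-1]:
            if iteration < start + num_iter:
                # Assumption: each sub-policy sees a local iteration count.
                return policy(iteration - start)
            start += num_iter
        return self._stages[-1][1](iteration - start)
```

With `CompositeLearningRate([(1000, fixed_lr(0.1)), (1000, poly_lr(0.1, 1.0, 1000))])`, iterations 0 through 999 use the fixed rate and later iterations follow the polynomial decay.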
* Split two use cases of CachedReader into two classes, DBFileReader and CachedReader
# Use Cases:
1). input: DB file -> output: DatasetReader.
Use DBFileReader.
2). input: Reader -> build cache DB file -> output: DatasetReader.
Use CachedReader.
# Changes to CachedReader:
1). Move db_path to the constructor.
Because in the mock reader, the cache will always be built ahead of time.
# Changes to tests:
1). Make a separate TestCase class for CachedReader and DBFileReader.
2). Make it possible to add more test functions by adding setUp, tearDown and _make_temp_path.
3). Make delete db_path more general. `db_path` could be a file for `log_file_db`, but could also be a directory for `leveldb`.
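A minimal sketch of the more general cleanup described in the last item, assuming only that `db_path` may be either a single file or a directory. The helper name `remove_db_path` is hypothetical and is not taken from the test code.

```python
import os
import shutil


def remove_db_path(db_path):
    """Delete db_path whether it is a file, a directory tree, or missing."""
    if os.path.isdir(db_path) and not os.path.islink(db_path):
        # Directory-backed databases (for example leveldb) need a recursive delete.
        shutil.rmtree(db_path)
    elif os.path.lexists(db_path):
        # File-backed databases (for example log_file_db) are a single file.
        os.remove(db_path)
```

Calling it on a path that does not exist is a no-op, which keeps a tearDown method safe to run after a test that failed before creating the database.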
* Back out "On Mobile phones, call GlobalInit with no arguments in predictor in case we need to perform initialization"
Original commit changeset: 4489c6133f11
* Fix LARS bug
Fixed a bug in the LARS implementation which caused all subsequent blobs not using LARS to have the LARS learning rate multiplier applied to them.
* [tum] support sparse init & add uniformFill option
as title
* Propagate exception for async nets
Capture the exception when an exception is thrown in async nets and re-throw it after wait(). This allows exceptions to be propagated up to the caller.
This diff was a part of D7752068. We split the diff so that C2 core files changes are in a separate diff.
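The capture-and-rethrow pattern can be illustrated with a plain Python thread. This is a sketch of the general idea rather than the C2 core change itself; `AsyncTask` and its methods are hypothetical names.

```python
import threading


class AsyncTask:
    """Run fn on a worker thread and surface its exception from wait()."""

    def __init__(self, fn):
        self._fn = fn
        self._error = None
        self._thread = threading.Thread(target=self._run)

    def start(self):
        self._thread.start()
        return self

    def _run(self):
        try:
            self._fn()
        except Exception as exc:
            # Capture instead of letting the worker thread swallow the error.
            self._error = exc

    def wait(self):
        self._thread.join()
        if self._error is not None:
            # Re-raise in the caller's thread so it propagates upward.
            raise self._error
```

Without the re-raise in `wait()`, the caller would see a normal return even though the work failed.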
* Automatic update of fbcode/onnx to 69894f207dfcd72d1e70497d387201cec327efbc
Previous import was 403ccfbd0161c38f0834413d790bad0874afbf9a
Included changes:
- **[69894f2](https://github.com/onnx/onnx/commit/69894f2)**: Use op schema.all tensor types in random like definitions (#865) <Scott McKay>
- **[b9d6b90](https://github.com/onnx/onnx/commit/b9d6b90)**: Clarify random like operators (#846) <Scott McKay>
- **[fc6b5fb](https://github.com/onnx/onnx/commit/fc6b5fb)**: Refactor shape inference implementation (#855) <anderspapitto>
- **[b7d8dc8](https://github.com/onnx/onnx/commit/b7d8dc8)**: fix cmake warning message (#863) <Eric S. Yu>
- **[f585c5d](https://github.com/onnx/onnx/commit/f585c5d)**: add pytorch-operator test for tile (#831) <Wenhao Hu>
- **[993fe70](https://github.com/onnx/onnx/commit/993fe70)**: add install step (#832) <Eric S. Yu>
- **[68bc26c](https://github.com/onnx/onnx/commit/68bc26c)**: add type inference for traditional ml ops except classifier ops. (#857) <Ke Zhang>
- **[9cc0cda](https://github.com/onnx/onnx/commit/9cc0cda)**: fix string representation of scalar types (#858) <G. Ramalingam>
- **[1078925](https://github.com/onnx/onnx/commit/1078925)**: fix y in pow test case to scalar (#852) <Wenhao Hu>
- **[c66fb6f](https://github.com/onnx/onnx/commit/c66fb6f)**: Add some math function shape inference (#845) <anderspapitto>
- **[ff667d1](https://github.com/onnx/onnx/commit/ff667d1)**: Refactor return type and docs for ONNXIFI_BACKEND_DIRECTX_ID (#853) <Marat Dukhan>
- **[11c6876](https://github.com/onnx/onnx/commit/11c6876)**: clear initializer names when clear initializer (#849) <Wenhao Hu>
- **[73c34ae](https://github.com/onnx/onnx/commit/73c34ae)**: Clarify FeatureVectorizer description. (#843) <Scott McKay>
- **[1befb9b](https://github.com/onnx/onnx/commit/1befb9b)**: Remove useless text in docs (#850) <Lu Fang>
- **[e84788f](https://github.com/onnx/onnx/commit/e84788f)**: Fix SELU attributes' default values (#839) <Lu Fang>
- **[ebac046](https://github.com/onnx/onnx/commit/ebac046)**: Add tile test case (#823) <Wenhao Hu>
- **[8b7a925](https://github.com/onnx/onnx/commit/8b7a925)**: a few more shape inference functions (#772) <anderspapitto>
- **[9718f42](https://github.com/onnx/onnx/commit/9718f42)**: Make the coefficient non optional for LinearClassifier (#836) <Jaliya Ekanayake>
- **[ef083d0](https://github.com/onnx/onnx/commit/ef083d0)**: Add save_tensor and load_tensor functions for Protos (#770) <Lu Fang>
- **[45ceb55](https://github.com/onnx/onnx/commit/45ceb55)**: Check if CMAKE_BUILD_TYPE set before project(). (#812) <Sergii Dymchenko>
- **[4b3d2b0](https://github.com/onnx/onnx/commit/4b3d2b0)**: [WIP] reenable shape inference tests (#834) <anderspapitto>
- **[22d17ee](https://github.com/onnx/onnx/commit/22d17ee)**: RNN tests: LSTM, GRU, SimpleRNN (#739) <Peyman Manikashani>
- **[de65b95](https://github.com/onnx/onnx/commit/de65b95)**: dimension denotation (#443) <Tian Jin>
- **[eccc76e](https://github.com/onnx/onnx/commit/eccc76e)**: fix field number issue in onnx operator proto and enable its build (#829) <Ke Zhang>
- **[d582beb](https://github.com/onnx/onnx/commit/d582beb)**: disable shape inference test to unbreak ci (#830) <Lu Fang>
- **[485b787](https://github.com/onnx/onnx/commit/485b787)**: function proto for composite op. (#802) <Ke Zhang>
- **[cd58928](https://github.com/onnx/onnx/commit/cd58928)**: specify defaults for attributes of Affine op (#820) <G. Ramalingam>
- **[7ee2cf9](https://github.com/onnx/onnx/commit/7ee2cf9)**: merge the dummy backend back into the main one (#743) <anderspapitto>
- **[1c03a5a](https://github.com/onnx/onnx/commit/1c03a5a)**: [Proposal] ONNX Interface for Framework Integration (previously ONNX Backend API) header and docs (#551) <Marat Dukhan>
- **[3769a98](https://github.com/onnx/onnx/commit/3769a98)**: Rename real model test case from VGG-16 to ZFNet (#821) <Lu Fang>
* [C2]ReluN Op
relu n op.
tf reference: https://www.tensorflow.org/api_docs/python/tf/nn/relu6
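For reference, the function is assumed here to be the relu6 formula from the linked page generalized to a cap `n`, that is min(max(x, 0), n). The sketch below is a plain-Python model; `relu_n` is a hypothetical name, not the operator's interface.

```python
def relu_n(values, n=6.0):
    """Clamp each value to the range [0, n]; n=6 matches relu6."""
    if n <= 0:
        raise ValueError("n must be positive")
    return [min(max(v, 0.0), n) for v in values]
```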
* Call destructor when assigning a blob value
* Add executor overrides
Add executor overrides flag to enable migration to async_scheduling executor
* Add barrier net that runs before training nets - attempt #2
Add a synchronize barrier net that is run before training nets. With this net, shards that are faster will wait for other shards before starting training. This reduces the chance of the faster shards timing out during GLOO AllReduce.
Removed explicit data_parallel_model.py.synchronize call in holmes workflow.
This change was landed previously but caused errors for some EDPM workflows - See https://fb.facebook.com/groups/1426530000692545/permalink/1906766366002237/ - because EDPM assumes any call to CreateOrCloneCommonWorld and Gloo ops are wrapped in exception handlers but in this case exception thrown in the barrier init net is not handled.
To address this issue, we add _CreateOrCloneCommonWorld to the param_init_net instead of a new barrier init net. Since errors in the param_init_net run are handled gracefully and trigger a re-rendezvous, this should fix the problem.
* Handle empty nets in async_scheduling
Make sure we don't get stuck on empty nets
* use CUDA_ARCH for conditional compile
* [C2 fix] infer function for ensure_cpu_output_op
* Update group_norm test to reduce flaky test
* Fix lr_multiplier for GPU
* Exported AtomicIterOp count
* Add axis to top_k_op. (#2416)
* Revert update on top_k_op
* Add axis to top_k_op
* [auto] Update onnx to a8e4648 - Adjust link flags when built in Windows Debug mode (#647)
a8e4648a7d
* [auto] Update onnx to f4acf28 - Remove allowconsumed enforceconsumed from op schema. (#617)
f4acf281ef
* Initialize cpuinfo in the thread pool
Thread pool called cpuinfo_get_processors_count() without initializing cpuinfo. Only by luck it didn't make Caffe2 single-threaded: threadpool is initialized after NNPACK, and NNPACK initializes cpuinfo itself.
This commit also updates cpuinfo to a version that aborts with a fatal error if it's used uninitialized.
* Updated Python Op and Image Pre-Processing Pipeline tutorials && Added CIFAR-10 Part 1 tutorial (#2286)
* Updated Basics tutorial: (1) Added Python 3 support with __future__ statements; (2) Various grammatical/typo fixes and minor refactoring of Markdown
* Added Python 3 support and made minor typo fixes
* Added Python 3 support with future imports, refactored and corrected errors in Markdown, added comments
* Added Python 3 support with future imports, Added use of caffe_translator.py to translate downloaded .caffemodel file to .pb files
* Upgrades to Image Pre-Processing Pipeline tutorial
* Updated Python Op tutorial
* removed markdown with empty links
* Added Part 1 of an end-to-end CIFAR-10 tutorial
* Updated MNIST Dataset and Databases tutorial with python3 support and markdown fixes
* Tweaks to markup, less training iterations
* changed permissions of CIFAR10_Part1; typo corrections in Image_Pre-Processing_Pipeline
* Typo corrections in Multi-GPU Training tutorial
* sync Python_Op py_gen with the IPython notebook
* nit typo correction
* [auto] Update onnx to 5cb999d - Minor cleanups to shape inference (#653)
5cb999ddc1
* [auto] Update onnx to ecac1c1 - Merge Rel 1.1.0 branch into master (#657)
ecac1c1624
* Strip down onnx to only pb definitions in mobile build (#2426)
* Revert "Use -DCMAKE_BUILD_TYPE=Release for local build by default"
This reverts commit 035c62081f6420405b9f1380cc5d21b4c6ae78f6.
* Revert "Export number of iterations of AtomicIterOp (#2338)"
This reverts commit 91b7a0cb48c6b079e2ca8fd5c26819a003937d76.
Summary:
This reverts commit 30f614beea6f859fee25ce4f85573142885dde45
bypass-lint
An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
cause_a_sev_many_files
Differential Revision: D6893040
Original commit changeset: 30f614beea6f
fbshipit-source-id: 5e98a24699088283f864efe31234874bdacbe3c3
Summary: The old pow operator has been deleted in math_ops.cc, math_ops.cu and math_ops.h, while the new operator supporting scalar and tensor exponent has been added in pow_op.cc, pow_op.h and elementwise_op.cu.
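The two exponent forms can be modelled in plain Python as below. This is a sketch of the behavior described, not the operator code; `elementwise_pow` is a hypothetical name, and requiring a tensor exponent to match the base length (no broadcasting) is an assumption.

```python
def elementwise_pow(base, exponent):
    """Raise each base element to a scalar exponent or a per-element one."""
    if isinstance(exponent, (int, float)):
        # Scalar exponent: the same power is applied to every element.
        return [b ** exponent for b in base]
    if len(exponent) != len(base):
        # Assumption: no broadcasting in this sketch.
        raise ValueError("exponent must be a scalar or match the base length")
    return [b ** e for b, e in zip(base, exponent)]
```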
Reviewed By: houseroad
Differential Revision: D6893040
fbshipit-source-id: 30f614beea6f859fee25ce4f85573142885dde45
Summary: Weighted sampling reader dequeue randomly chooses a hive reader to read a mini-batch. This diff allows dequeue to output the index of the randomly chosen table to a specific blob.
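A minimal sketch of the idea, assuming readers are modelled as zero-argument callables that return a mini-batch. `dequeue`, `readers` and `weights` are hypothetical names, and returning the index stands in for writing it to a blob.

```python
import random


def dequeue(readers, weights, rng=random):
    """Pick one reader at random by weight; return its batch and its index."""
    indices = list(range(len(readers)))
    index = rng.choices(indices, weights=weights, k=1)[0]
    batch = readers[index]()
    # Returning the index lets the caller record which table was sampled.
    return batch, index
```

Passing a seeded `random.Random` instance as `rng` makes the choice reproducible in tests.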
Reviewed By: kennyhorror
Differential Revision: D6621070
fbshipit-source-id: 754b981fc2bcfdb0146d2a0a5b677e7cfe74211b
Summary:
This should translate to a 1% error margin. The gradient checker uses a .5% threshold.
Closes https://github.com/caffe2/caffe2/pull/1766
Differential Revision: D6774077
Pulled By: pietern
fbshipit-source-id: f97c7ffb2ef34fdd71d69320a7fdcf4a6a457715
Summary: Before this diff RNNOp was using TextFormat for representing steps. This diff is changing RNNOp to prefer NetDef argument instead. To be backward compatible it supports TextFormat for existing models, though we can compile RNNs without TextFormat as well.
Reviewed By: salexspb
Differential Revision: D5949330
fbshipit-source-id: 9336a8f5ccf30ad8d8e3a7067b9437e1704b1c9f
Summary: This is the continuation of T20872698 Implement the gradient operator for element-wise Logit
Reviewed By: asaadaldien
Differential Revision: D5969487
fbshipit-source-id: c9bb4222529f9fd9085aa9048b90eb70a63f41f4
Summary: Implemented logit gradient with eps as arg. Add the unit test for it and explored the optimal parameter to run the test.
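As a sketch of the math involved, assume the forward op clamps its input to [eps, 1 - eps] before computing log(x / (1 - x)), so the gradient is 1 / (x (1 - x)) inside that range and zero outside it. The function names and the default eps value are hypothetical and are not taken from the diff.

```python
import math


def logit(x, eps=1e-6):
    """Logit of x, with x clamped to [eps, 1 - eps] (assumed forward op)."""
    x = min(max(x, eps), 1.0 - eps)
    return math.log(x / (1.0 - x))


def logit_grad(x, dy, eps=1e-6):
    """Gradient of logit at x, scaled by the upstream gradient dy."""
    if x < eps or x > 1.0 - eps:
        # The clamp makes the forward output constant here, so no gradient flows.
        return 0.0
    return dy / (x * (1.0 - x))
```

A central finite difference on `logit` away from the clamp boundaries is a simple way to check `logit_grad`, which is roughly what a gradient-checking unit test does.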
Reviewed By: asaadaldien
Differential Revision: D5910655
fbshipit-source-id: 44898b784a57c7ad45519b202b1eaf95c1c4d460
Summary: Make CUDA version of SparseToDense, register EnsureDense (which is trivial) on CUDA. Need to use atomics because indices can be duplicated. We can later add an option to inform if the indices are unique, and use faster path then.
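The sequential model below illustrates why duplicates matter, assuming values at repeated indices are summed. On a GPU each (index, value) pair would typically be handled by a separate thread, so the `+=` on a shared slot has to become an atomic add. `sparse_to_dense` here is a hypothetical one-dimensional reference, not the CUDA kernel.

```python
def sparse_to_dense(indices, values, dense_size):
    """Scatter values into a zero-filled list, summing duplicate indices."""
    dense = [0.0] * dense_size
    for index, value in zip(indices, values):
        # Two entries may target the same slot; in parallel code this
        # read-modify-write is the step that needs an atomic add.
        dense[index] += value
    return dense
```

If the indices are known to be unique, each slot is written at most once and a plain store would be enough, which is the faster path mentioned above.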
Reviewed By: jhcross
Differential Revision: D5750893
fbshipit-source-id: 005d1675b127a571aac8474fca62d9633f0c7bff
Summary:
Moved distance_op_test from hypothesis_test to distance_op_test and
refactored
Reviewed By: akyrola, asaadaldien
Differential Revision: D5495104
fbshipit-source-id: 4a90c75eabeb380ae9d150d6258e9b5b0fbfc5ca
Summary: As title. This helps with (quite common) cases where data input is stuck for one reason or another, and the net execution never proceeds and is stuck forever.
Reviewed By: andrewwdye
Differential Revision: D5409885
fbshipit-source-id: 840261fd5964408f788fc0f50ece0d74193694ac
Summary:
/cc akyrola is it possible this test has been broken ever since 5614816fce?
More generally, why do we still have `hypothesis_test.py` at all? In the case of this test, surely one of these files does more than this one old test:
* `operator_test/cudnn_recurrent_test.py`
* `operator_test/recurrent_network_test.py`
* `operator_test/rnn_cell_test.py`
Closes https://github.com/caffe2/caffe2/pull/843
Differential Revision: D5292109
Pulled By: akyrola
fbshipit-source-id: 6df5df6353a9741d1ae1b796adaab98382857527
Summary:
Working towards https://github.com/caffe2/caffe2/pull/817.
`E InvalidArgument: Insufficient bytes of entropy to draw requested array. shape=(4, 2, 5, 1, 3, 5, 5, 1), dtype=float32. Can you reduce the size or dimensions of the array? What about using a smaller dtype? If slow test runs and minimisation are acceptable, you could increase settings().buffer_size from 8192 to at least 24576000.`
https://travis-ci.org/caffe2/caffe2/jobs/243867951
Closes https://github.com/caffe2/caffe2/pull/828
Differential Revision: D5276723
Pulled By: akyrola
fbshipit-source-id: f7d0e2dd8ef8b6a2354bd4ff7c7446c377c954b4
Summary: Upgrades this file to use brew instead of CNNHelperModel
Reviewed By: harouwu
Differential Revision: D5252089
fbshipit-source-id: 6df4350717c1d42bc4bcc63d255cd422f085ee05
Summary:
```
File "/data/caffe2/install/caffe2/python/hypothesis_test.py", line 1911, in test_batch_to_space
(w + 2 * pad) / block_size).astype(np.float32)
File "mtrand.pyx", line 1404, in mtrand.RandomState.randn (numpy/random/mtrand/mtrand.c:19843)
File "mtrand.pyx", line 1534, in mtrand.RandomState.standard_normal (numpy/random/mtrand/mtrand.c:20368)
File "mtrand.pyx", line 167, in mtrand.cont0_array (numpy/random/mtrand/mtrand.c:6127)
TypeError: 'float' object cannot be interpreted as an index
```
```
File "/data/caffe2/install/caffe2/python/operator_test/tile_op_test.py", line 101, in tile_ref
tiled_data = np.tile(X, tuple(dims))
File "/data/caffe2/venv/local/lib/python2.7/site-packages/numpy/lib/shape_base.py", line 881, in tile
return c.reshape(shape_out)
TypeError: only integer scalar arrays can be converted to a scalar index
```
I also tested to make sure this still works with 0.11.
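Both tracebacks come from a float reaching a place where an integer size is required. One common fix, assumed here rather than taken from the patch, is to use floor division so the value stays an int. The sketch uses plain Python, where `range()` rejects floats in a similar way; `blocks_per_side` and `make_zeros` are hypothetical names.

```python
def blocks_per_side(w, pad, block_size):
    """Number of blocks along one padded side, kept as an int."""
    return (w + 2 * pad) // block_size


def make_zeros(rows, cols):
    """Build a rows x cols grid; raises TypeError if a size is a float."""
    return [[0.0 for _ in range(cols)] for _ in range(rows)]
```

With true division, `(4 + 2 * 1) / 2` is the float `3.0`, and passing it as a size raises a `TypeError`, while `blocks_per_side(4, 1, 2)` gives the int `3`.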
Closes https://github.com/caffe2/caffe2/pull/787
Differential Revision: D5248087
Pulled By: salexspb
fbshipit-source-id: eff69482a8eabb8ace330003fa326c832b53865f
Summary: Use `CopyItems` so that it accepts any type of tensor. Also, move the cursor to input blob so that it's checkpoint friendly. Output is now also part of input so that inference can work correctly.
Reviewed By: xianjiec
Differential Revision: D4920987
fbshipit-source-id: da532736225ec27f409ff763ff69a0629235151c
Summary:
Implement NormalizeOp for GPU using CUDA, and rewrite the gradient to be a function of the output
so it's more efficient, especially for the CUDA implementation.
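To make the "function of the output" idea concrete, the sketch below assumes L2 normalization, y = x / ||x||, which the summary does not spell out. Under that assumption the gradient is dx = (dy - y (y . dy)) / ||x||, so it can be computed from the output y and the norm without re-deriving anything from x. The function names are hypothetical.

```python
import math


def normalize(x):
    """Return (y, norm) with y = x / ||x||_2 (assumed forward op)."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x], norm


def normalize_grad(y, norm, dy):
    """Gradient w.r.t. x, written in terms of the output y and the norm."""
    dot = sum(a * g for a, g in zip(y, dy))
    return [(g - a * dot) / norm for a, g in zip(y, dy)]
```

For x = [3, 4] and dy = [1, 0] this gives y = [0.6, 0.8] and dx = [0.128, -0.096], which matches differentiating x0 / ||x|| by hand.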
Reviewed By: akyrola
Differential Revision: D4971300
fbshipit-source-id: e0ab66462000988aaf1f26010ea550533d107167
Summary: Both SquaredL2Distance and SquaredL2DistanceGradient had bad CUDA implementations. Use proper reductions and batched kernels.
Reviewed By: asaadaldien
Differential Revision: D4968527
fbshipit-source-id: f7cf82072d38bc127c757c5751863a9439aca8b5