Summary:
Not in the same format. Skip at the moment.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9751
Reviewed By: yinghai
Differential Revision: D8965636
Pulled By: houseroad
fbshipit-source-id: 81d39c2f5625c14c0e1ee11408b5f7267b53798f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9458
The goal is to support count_include_pad in Caffe2 ONNX backend. This commit contains the first step - support 4-D tensor cases.
AveragePool with count_include_pad can be expressed as PadImage + AveragePool.
Reviewed By: houseroad
Differential Revision: D8852180
fbshipit-source-id: 4db00e9771be7a000a2d92850dfd066d9c9c38bf
* [fix] fixup the bias multiplier data access issue
Hotfix for failues in conv_transpose
* [D2][Easy]: lint regularizer
lint with black
* [GanH]: Split mu in adaptive weight for diagnose
* [Dper] Add the ability to split FC weights into multiple smaller ones
* fix SumReduceLikeOp for empty blob
as desc.
* add ctc_greedy_decoder for caffe2
ctc_greedy_decoder same as tf's
* Update event callback handling
Allow multiple callbacks per event
* Add WeightedSum layer
The motivation is to do weighted sum in HoNet/crossnet, in the next diff, I'll replace model.Add with model.WeightedSum in
honet: https://fburl.com/f4rmolg2
crossnet: https://fburl.com/v7awn8se, https://fburl.com/63filbnm
* Replicate DAG's behavior
Some callers expect RunAsync to block, replicate that behavior in case of
explicit 'dag' net type
* [dper] layernorm layer
as title
* Override dag, async_dag, async_polling
Overriding dag, async_dag and async_polling with async_scheduling
* Name the thread pools
Caffe thread pools currently inherit the thread names from the thread that starts them, which can be misleading. Give them an explicit name instead.
* [Caffe2] FilleOp should support int64_t dimensions
Change argument type to int64_t for shape argument of FillerOp (used in ConstantFill, XavierFill, etc)
* Remove caffe2/caffe2/contrib/torch/
It's not used anywhere and depends on old lua torch that conflicts with Aten. Given PT1 it's not relevant any more (though it was nice and clever code!)
#accept2ship
* Fix linearWarmup multiplier check
The multiplier needs to be non-negative, not strictly positive.
* Revert D3314316
This is after 2 years and we do not seem to have a use case for this one, so
for the sake of clean API design we should potentially remove this. This would
allow us to potentially pass in arguments to optionally construct an object,
although it is indeed a little bit unclear how we can reuse existing objects if
constructor arguments are passed in. In any case, we may want to remove this
dangling feature.
* Speedup generate proposals by partial_sort.
Speedup generate proposals by partial_sort.
FACEBOOK:
- Saw speed improvement for training with this op.
- Yanghan benchmarked the op on a small dataset and see consistent 100% improvement on speed (6ms -> 3ms) on 420 input resolution. See next diff for details.
* More parallel processing friendly for CPP version of GenerateProposals.
More parallel processing friendly for CPP version of GenerateProposals.
* [DT] [43/n] Lift stop conditions inside reader code back to flow control
1. Split multi_reader function into local_reader and remote_reader
2. Lifted stop conditions inside Limiter back to flow control
3. Split epoch flow building logic into 3 cases:
- single machine (1 reader, 1 trainer on trainer0 node, no PS)
- (1 reader + 1 trainer) on trainer0 node, has PS
- multiple readers, readers do not share nodes with trainers, might have PS or not
* Resolve conflicts for torch/_thnn/utils.py
* [Caffe2] Handle image decoding errors
Image decoding errors can make the whole training fail. This diff is to handle them
1.Catch imdecode exceptions and check if decoded image has zero columns or rows. This is counted as decoding errors.
2.Replace the image with empty in case of error
3.Count the number of errors and throw runtime exception if the rate reaches given number
The empty image data is kept. It might introduce noise in the training data.
* Update MKL exporter to IDEEP ops
TSIA
* [Caffe2] GlobalInit is thread safe, fixing the comment
With the mutex and lock, GlobalInit is thread safe.
Update the comments.
* Back out "Add support for generating ATen files during fbcode build"
Original commit changeset: 28970ddba353
@override-unit-failures
(Note: this ignores all push blocking failures!)
* [DT]: fix predictor save
similar to D6610058, here we add the fix for distributed online training
* Remove net_singlethread_async_gpu.cc
Closes https://github.com/caffe2/caffe2/pull/2528
This removes net_singlethread_async_gpu.cc as part of our effort to clean
CUDAContext and the net executors.
* Inline DFS task execution
Add a DFS inline task execution mode in executor
* Add c10 folder to fbcode
This adds the c10 folder and its test cases to fbcode. Build flags are mostly taken from aten.
* add dependencies for online trainer
Add some dependencies so that the online model can use DataPipeline and PredictionTransform operators
Relevent post: https://fb.intern.facebook.com/groups/1324375037655677/permalink/1740993462660497/
* Resolve conflicts for tools/jit/gen_jit_dispatch.py
* [Fix] sparse regularization in distributed training
* Support advanced pooling options in sum processor
* support advanced pooling options in sum processor
* remove redundant code
* support attention in sum processor
* Improve shard logging in net tracing code
Make it handle arbitrary shard ids instead of just one digit ids.
* [Caffe2] Call GlobalInit in predictor only in mobile
FACEBOOK:
Calling GlobalInit long after the program starts may not be safe. There are issues if the following happens:
User does not call GlobalInit and initFacebook after program starts
User sets a flag manually: https://fburl.com/mcsumw7d
User calls OSS predictor.
OSS predictor calls GlobalInit
GlobalInit calls initFacebook
initFacebook resets all flags: https://fburl.com/tolszha1
Thus, the user manually set flags are overwritten
This would happen anytime GlobalInit is called long after the program starts.
I suppose the intention of the user in this case is not to call GlobalInit throughout the program,
but use Caffe2 regardless (is that desired?)
But adding GlobalInit in the OSS predictor would automatically call GlobalInit when using Caffe2.
This issue doesn't exist in mobile, since initFacebook is not called on mobile.
For now, guard the GlobalInit in predictor for mobile only.
May want to ensure the GlobalInit is always called at the start of the program. @[3501714:kutta] has seen weird issues when not calling GlobalInit at the start of the program on server side. He has made some progress on this.
* resolve conflicts for caffe2/core/logging_is_google_glog.h and test/test_torch.py
* Add empty fix for SumLikeReduceOp
Add empty fix for SumLikeReduceOp
* Revert D7962948: [caffe2][nomnigraph] Concat elim for sparseNN
This reverts commit f7f434dc5c34ca6058b9765d2ef615453d2276a9
@bypass-lint
An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files
* Remove Declarations.yaml
* Include common.h
* Change std::stoi to caffe2::stoi
* Add thread_name.cc to the CMake file
* No need to subtract 1. Fix test segfaults
* Fix NetTest, ObserverTest
Fix tests
(cherry picked from commit 3767e66c3f365596cba3d46d3e7322c933a0ab41)
* CTCGreedyDecoderOp only has CPU implementation, test should only run on CPU
* Add a variable to avoid conversion resizing issue
* [fix] fixup the bias multiplier data access issue
Hotfix for failues in conv_transpose
* [D2][Easy]: lint regularizer
lint with black
* [GanH]: Split mu in adaptive weight for diagnose
* [Dper] Add the ability to split FC weights into multiple smaller ones
* fix SumReduceLikeOp for empty blob
as desc.
* add ctc_greedy_decoder for caffe2
ctc_greedy_decoder same as tf's
* Update event callback handling
Allow multiple callbacks per event
* Add WeightedSum layer
The motivation is to do weighted sum in HoNet/crossnet, in the next diff, I'll replace model.Add with model.WeightedSum in
honet: https://fburl.com/f4rmolg2
crossnet: https://fburl.com/v7awn8se, https://fburl.com/63filbnm
* Replicate DAG's behavior
Some callers expect RunAsync to block, replicate that behavior in case of
explicit 'dag' net type
* [dper] layernorm layer
as title
* Override dag, async_dag, async_polling
Overriding dag, async_dag and async_polling with async_scheduling
* Name the thread pools
Caffe thread pools currently inherit the thread names from the thread that starts them, which can be misleading. Give them an explicit name instead.
* [Caffe2] FilleOp should support int64_t dimensions
Change argument type to int64_t for shape argument of FillerOp (used in ConstantFill, XavierFill, etc)
* Remove caffe2/caffe2/contrib/torch/
It's not used anywhere and depends on old lua torch that conflicts with Aten. Given PT1 it's not relevant any more (though it was nice and clever code!)
#accept2ship
* Fix linearWarmup multiplier check
The multiplier needs to be non-negative, not strictly positive.
* Revert D3314316
This is after 2 years and we do not seem to have a use case for this one, so
for the sake of clean API design we should potentially remove this. This would
allow us to potentially pass in arguments to optionally construct an object,
although it is indeed a little bit unclear how we can reuse existing objects if
constructor arguments are passed in. In any case, we may want to remove this
dangling feature.
* Speedup generate proposals by partial_sort.
Speedup generate proposals by partial_sort.
FACEBOOK:
- Saw speed improvement for training with this op.
- Yanghan benchmarked the op on a small dataset and see consistent 100% improvement on speed (6ms -> 3ms) on 420 input resolution. See next diff for details.
* More parallel processing friendly for CPP version of GenerateProposals.
More parallel processing friendly for CPP version of GenerateProposals.
* [DT] [43/n] Lift stop conditions inside reader code back to flow control
1. Split multi_reader function into local_reader and remote_reader
2. Lifted stop conditions inside Limiter back to flow control
3. Split epoch flow building logic into 3 cases:
- single machine (1 reader, 1 trainer on trainer0 node, no PS)
- (1 reader + 1 trainer) on trainer0 node, has PS
- multiple readers, readers do not share nodes with trainers, might have PS or not
* Resolve conflicts for torch/_thnn/utils.py
* [Caffe2] Handle image decoding errors
Image decoding errors can make the whole training fail. This diff is to handle them
1.Catch imdecode exceptions and check if decoded image has zero columns or rows. This is counted as decoding errors.
2.Replace the image with empty in case of error
3.Count the number of errors and throw runtime exception if the rate reaches given number
The empty image data is kept. It might introduce noise in the training data.
* Update MKL exporter to IDEEP ops
TSIA
* [Caffe2] GlobalInit is thread safe, fixing the comment
With the mutex and lock, GlobalInit is thread safe.
Update the comments.
* Back out "Add support for generating ATen files during fbcode build"
Original commit changeset: 28970ddba353
@override-unit-failures
(Note: this ignores all push blocking failures!)
* [DT]: fix predictor save
similar to D6610058, here we add the fix for distributed online training
* Remove net_singlethread_async_gpu.cc
Closes https://github.com/caffe2/caffe2/pull/2528
This removes net_singlethread_async_gpu.cc as part of our effort to clean
CUDAContext and the net executors.
* Inline DFS task execution
Add a DFS inline task execution mode in executor
* Add c10 folder to fbcode
This adds the c10 folder and its test cases to fbcode. Build flags are mostly taken from aten.
* add dependencies for online trainer
Add some dependencies so that the online model can use DataPipeline and PredictionTransform operators
Relevent post: https://fb.intern.facebook.com/groups/1324375037655677/permalink/1740993462660497/
* Resolve conflicts for tools/jit/gen_jit_dispatch.py
* [Fix] sparse regularization in distributed training
* Support advanced pooling options in sum processor
* support advanced pooling options in sum processor
* remove redundant code
* support attention in sum processor
* Improve shard logging in net tracing code
Make it handle arbitrary shard ids instead of just one digit ids.
* [Caffe2] Call GlobalInit in predictor only in mobile
FACEBOOK:
Calling GlobalInit long after the program starts may not be safe. There are issues if the following happens:
User does not call GlobalInit and initFacebook after program starts
User sets a flag manually: https://fburl.com/mcsumw7d
User calls OSS predictor.
OSS predictor calls GlobalInit
GlobalInit calls initFacebook
initFacebook resets all flags: https://fburl.com/tolszha1
Thus, the user manually set flags are overwritten
This would happen anytime GlobalInit is called long after the program starts.
I suppose the intention of the user in this case is not to call GlobalInit throughout the program,
but use Caffe2 regardless (is that desired?)
But adding GlobalInit in the OSS predictor would automatically call GlobalInit when using Caffe2.
This issue doesn't exist in mobile, since initFacebook is not called on mobile.
For now, guard the GlobalInit in predictor for mobile only.
May want to ensure the GlobalInit is always called at the start of the program. @[3501714:kutta] has seen weird issues when not calling GlobalInit at the start of the program on server side. He has made some progress on this.
* resolve conflicts for caffe2/core/logging_is_google_glog.h and test/test_torch.py
* Add empty fix for SumLikeReduceOp
Add empty fix for SumLikeReduceOp
* Revert D7962948: [caffe2][nomnigraph] Concat elim for sparseNN
This reverts commit f7f434dc5c34ca6058b9765d2ef615453d2276a9
@bypass-lint
An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files
* Remove Declarations.yaml
* Include common.h
* Change std::stoi to caffe2::stoi
* Add thread_name.cc to the CMake file
* No need to subtract 1. Fix test segfaults
* Fix NetTest, ObserverTest
Fix tests
(cherry picked from commit 3767e66c3f365596cba3d46d3e7322c933a0ab41)
* CTCGreedyDecoderOp only has CPU implementation, test should only run on CPU
* Add a variable to avoid conversion resizing issue
* Remove the code per soumith's comments
* Remove the code per soumith's comments
* Remove blank lines in the end of file
* Resolve conflicts for torch/_thnn/utils.py
* Update MKL exporter to IDEEP ops
TSIA
* Back out "Add support for generating ATen files during fbcode build"
Original commit changeset: 28970ddba353
@override-unit-failures
(Note: this ignores all push blocking failures!)
* add dependencies for online trainer
Add some dependencies so that the online model can use DataPipeline and PredictionTransform operators
Relevent post: https://fb.intern.facebook.com/groups/1324375037655677/permalink/1740993462660497/
* Resolve conflicts for tools/jit/gen_jit_dispatch.py
* Support advanced pooling options in sum processor
* support advanced pooling options in sum processor
* remove redundant code
* support attention in sum processor
* resolve conflicts for caffe2/core/logging_is_google_glog.h and test/test_torch.py
* Revert D7962948: [caffe2][nomnigraph] Concat elim for sparseNN
This reverts commit f7f434dc5c34ca6058b9765d2ef615453d2276a9
@bypass-lint
An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files
* Remove Declarations.yaml
* Include common.h
* Change std::stoi to caffe2::stoi
* [caffe2] uprade IDEEP and hotfix for conv op accuracy issue (#8364)
* [IDEEP] Upgrade IDEEP version
Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>
* [IDEEP] Fix accuracy issue in conv op
Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>
* Fix build error due to lack of src in CMakeLists
Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>
* Remove the code per soumith's comments
* [ONNX] Add an ATen fallback pathway for ONNX export (#8273)
* ATen fallback for ONNX export
* Move to enum
* Fix model test
* Add comment
* Address comments
BC interface
* Remove imaginary file (#8415)
* [Caffe2] Enable AMD/MIOPEN ops for Caffe2 (#8306)
* Add hip support for caffe2 core
* Add MIOPEN header/wrapper to caffe2 core
* Add HIP device into caffe2 PB
* top level makefile change for rocm/hip
* makefile scaffolding for AMD/RocM/HIP
* Makefile scafodding for AMD/RocM/HIP; add makefile/utility for HIP files
* caffe2 PB update for AMD/ROCM HIP device
* Add AMD/RocM/Thrust dependency
* HIP threadpool update
* Fix makefile macro
* makefile fix: duplicate test/binary name
* makefile clean-up
* makefile clean-up
* add HIP operator registry
* add utilities for hip device
* Add USE_HIP to config summary
* makefile fix for BUILD_TEST
* merge latest
* Fix indentation
* code clean-up
* Guard builds without HIP and use the same cmake script as PyTorch to find HIP
* Setup rocm environment variables in build.sh (ideally should be done in the docker images)
* setup locale
* set HIP_PLATFORM
* Revert "set HIP_PLATFORM"
This reverts commit 8ec58db2b390c9259220c49fa34cd403568300ad.
* continue the build script environment variables mess
* HCC_AMDGPU_TARGET
* Cleanup the mess, has been fixed in the lastest docker images
* Assign protobuf field hip_gpu_id a new field number for backward compatibility
* change name to avoid conflict
* Fix duplicated thread pool flag
* Refactor cmake files to not add hip includes and libs globally
* Fix the wrong usage of environment variables detection in cmake
* Add MIOPEN CNN operators
* Revert "Add MIOPEN CNN operators"
This reverts commit 6e89ad4385b5b8967a7854c4adda52c012cee42a.
* Add MIOPEN pooling operator
* Add MIOPEN activation operator
* Add MIOPEN softmax operator
* Add MIOPEN spatial batch norm operator
* Add MIOPEN loacl response normalization operator
* Add MIOPEN conv operator
* Clean-up LRN ops
* enable fp16 in MIOPEN pool ops
* Enable fp16 for MIOPEN relu op
* Enable fp16 for MIOPEN spatial batch norm op
* code clean-up
* revert float16 support
* Create Caffe2 python binding for AMD/ROCM/HIP
* Add op fallback for HIP operator
* add hip src/test files in cmake
* exclude hip src/test files
* fix python binding for hip backend
* fix MIOPEN pooling op workspace
* hack to compile miopen operators
* fix include path for MIOPEN ops
* Fix include path
* Add HIP math utilities
* Fix path for HIP math utils
* cmake fix
* Cmake fix / hipcc for hip files
* suppress hipcc warning
* cmake fix /replcae USE_HIP with USE_ROCM
* revert LoadHIP.cmake change
* fix include for thrust/cub-hip
* include path fix for conversion.h
* Updated with latest upstream changes
* clang format fixes
* Context_hip updates
* Fixed typo in rocblas handle get function
* Updated hipified math utils
* Updated math hip test util
* Updated context hip test
* Updated common_hip
* Updated net async dag for HIP
* Added MIOPEN in operator hip test
* fix
* C2 dependencies clean-up
* fix include path for building custom protobuf
* Decouple miopen pool op and conv_pool_op base
* cmake refactor
* fix operator_hip_test
* move all hip/miopen ops files into caffe2/operators/hip
* sanitize cmake
* permission issue
* remove extra parenthesis
* remove artifact from resolving merge conflict
* cont. sanitize cmake files
* fix syntax error
* sanitize conversion.h
* .
* Revert "."
This reverts commit 56020cb0e996a31ae27bf1f8f491955ed0b121b9.
* clang-format
* Enable some reduce operators' ONNX backend tests (#8418)
* fix old comment to point to the right file (#8416)
* Stop pinning nccl version. (#8421)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Expose logsumexp docs and mark log_sum_exp in distributions for internal use (#8428)
* Enable some of the ONNX backend test on broadcasting (#8423)
* Enable some of the ONNX backend test on broadcasting
* enable gemm broadcast
* Expose proto utils and ONNX (#8073)
* Expose proto utils and ONNX from PyTorch libcaffe2.so
* Try to use protobuf from _C.so
* Fix ONNX proto header include
* Adjust order of imports for ONNX until nanopb goes away
* Set and use ONNX_NAMESPACE for PyTorch builds
* Show protobuf summary for all builds
* Add ONNX_NAMESPACE for cpp_build
* Statically link libprotobuf.a into libtorch.so
* Set ONNX_NAMESPACE on Windows build
* Move core/dispatch up as well
* Add /MD flag for Windows build of _C
* Potential Windows fix for ONNX and protobuf
* Add direct linkage from _C to ONNX on Windows
* Only include protobuf wrapper for PyTorch
* Pass extra_compile_args to _nvrtc ext build
* Remove installation of .a files
* Rebase creates some weird situations, revert them manually
* Remove more weird changes due to rebase
* Need to add thread_name.cc after merge
* Added support to run ONNX Upsample operator (mode=nearest) in Caffe2
* adding error checks to upsample
* adding error checks to upsample
* adding error checks to upsample
* changing to np.isclose
* Revert onnx submodule update
* still fixing
* Skip some tests to unbreak CI
* Pass the opset_version to run_node
* Remove the stale check_graph call, caffe2_net_to_onnx_model will invoke check_model
* [bootcamp] Improve "Shape" operator to support axes specification
To improve .shape operator of Caffe2 to support x.shape(tensor, axes), which takes an optional int array "axes" as input. For example, x.shape(tensor, [1, 0]) will return the dimension for axis 1 and 0 following the specified order. For current version, "axes" input allows duplications and can have arbitrary length.
* Back out "Add barrier net that runs before training nets"
Original commit changeset: b373fdc9c30f. Need additional changes to some callers to support barrier failures.
* Change warning to verbose log to reduce log spam
The `LOG(WARNING)` was a bit spammy for regular use so lets just make it a `VLOG`.
* Extract the shared code from different caffe2_benchmark binaries
The OSS benchmark and Internal benchmark will share most functions in the benchmark.
* Support MFR in sequence training
As titled.
* Make knowledge distillation work with using logged prediction feature as teacher label.
1) Add loading raw dense feature as teacher label.
2) Optional calibration function for teacher label
3) Add teacher label into generic unit test
4) Deprecated TTSN workflow version using feature_options to config teacher label
* [C2/CUDA]: unjoined cross entropy sigmoid
as desc
* Add async_scheduling executor into deferrable_net_exec_test
Add async_scheduling into tests and fix some exception cases
* Fix Event disabled error
When disabling event in RNN ops make sure we don't call Finish on disabled
event from op's RunAsync
* cuda ensure cpu output op can handle both TensorCPU and TensorCUDA
as desc.
* [C2 Core] Infer input device option in C2 hypothesis_test checkers
Improve how we default input blob device options.
Previously it defaults as where op lives but it is not necessarily the case.
For example:
CopyCPUToGPU
* [C2 Op]SplitByLengthsOp CPU/GPU implementation
[C2 Op]SplitByLengthsOp CPU/GPU implementation
* fix undefined symbol error
not sure why we're getting undefined symbol even with link_whole = True
Need to figure out why but need this workaround for now
* Add tools in DAIPlayground platform to help debugging models
Add additional tools to allow Plauground override individual method defined in AnyExp. This will allow user to create module that specificly change certain default method behavior. An example included in this diff is deactivating test model and checkpointing. When debugging any model problems, switching off components helps me quickly narrow down the location of the bug. The technique is extensively used in task T27038712 (Steady memory increase in EDPM, eventually resulting in gloo/cuda.cu:34: out of memory)
* add shape and type inference for int8 conversion operator
* Fix flaky test for group_norm
Fix flaky test for group_norm
* Fix group_norm_op_test flaky
Fix group_norm_op_test flaky
* Implementation of composite learning rate policy
In many state-of-the-arts deep learning works, people use a simple trick to
schedule the learning rate: use a fixed learning rate until error plateaus
and then switch to a different fixed learning rate, and so on. In this diff,
we implemented a simple version of the composite learning rate. The user gives
a set of learning rates policies and corresponding iteration nums, and the
optimizer will change the learning rate policy based on the number of iterations so far.
For example, the user give two learning rate policies, one is FixedLearningRate
and PolyLearningRate, with an iteration number of 1k. Then the first 1k iteration,
we use FixedLearningRate. For the following iterations, we use PolyLearningRate.
* Split two use cases of CachedReader into two classes, DBFileReader and CachedReader
# Use Cases:
1). input: DB file -> output: DatasetReader.
Use DBFileReader.
2). input: Reader -> build cache DB file -> output: DatasetReader.
Use CachedReader.
# Changes to CachedReader:
1). Move db_path to the constructor.
Because in mock reader. cache will always be built ahead.
# Changes to tests:
1). Make a separate TestCase class for CachedReader and DBFileReader.
2). Make it possible to add more test functions by adding setUp, tearDown and _make_temp_path.
3). Make delete db_path more general. `db_path` could be a file for `log_file_db`, but could also be a directory for `leveldb`.
* Back out "On Mobile phones, call GlobalInit with no arguments in predictor in case we need to perform initialization"
Original commit changeset: 4489c6133f11
* Fix LARS bug
Fixed a bug in the LARS implementation which caused all subsequent blobs not using LARS to have the LARS learning rate multiplier applied to them.
* [tum] support sparse init & add uniformFill option
as title
* Propagate exception for async nets
Capture the exception when an exception is thrown in async nets and re-throw it after wait(). This allows exceptions to be propagated up to the caller.
This diff was a part of D7752068. We split the diff so that C2 core files changes are in a separate diff.
* Automatic update of fbcode/onnx to 69894f207dfcd72d1e70497d387201cec327efbc
Previous import was 403ccfbd0161c38f0834413d790bad0874afbf9a
Included changes:
- **[69894f2](https://github.com/onnx/onnx/commit/69894f2)**: Use op schema.all tensor types in random like definitions (#865) <Scott McKay>
- **[b9d6b90](https://github.com/onnx/onnx/commit/b9d6b90)**: Clarify random like operators (#846) <Scott McKay>
- **[fc6b5fb](https://github.com/onnx/onnx/commit/fc6b5fb)**: Refactor shape inference implementation (#855) <anderspapitto>
- **[b7d8dc8](https://github.com/onnx/onnx/commit/b7d8dc8)**: fix cmake warning message (#863) <Eric S. Yu>
- **[f585c5d](https://github.com/onnx/onnx/commit/f585c5d)**: add pytorch-operator test for tile (#831) <Wenhao Hu>
- **[993fe70](https://github.com/onnx/onnx/commit/993fe70)**: add install step (#832) <Eric S. Yu>
- **[68bc26c](https://github.com/onnx/onnx/commit/68bc26c)**: add type inference for traditional ml ops except classifier ops. (#857) <Ke Zhang>
- **[9cc0cda](https://github.com/onnx/onnx/commit/9cc0cda)**: fix string representation of scalar types (#858) <G. Ramalingam>
- **[1078925](https://github.com/onnx/onnx/commit/1078925)**: fix y in pow test case to scalar (#852) <Wenhao Hu>
- **[c66fb6f](https://github.com/onnx/onnx/commit/c66fb6f)**: Add some math function shape inference (#845) <anderspapitto>
- **[ff667d1](https://github.com/onnx/onnx/commit/ff667d1)**: Refactor return type and docs for ONNXIFI_BACKEND_DIRECTX_ID (#853) <Marat Dukhan>
- **[11c6876](https://github.com/onnx/onnx/commit/11c6876)**: clear initializer names when clear initializer (#849) <Wenhao Hu>
- **[73c34ae](https://github.com/onnx/onnx/commit/73c34ae)**: Clarify FeatureVectorizer description. (#843) <Scott McKay>
- **[1befb9b](https://github.com/onnx/onnx/commit/1befb9b)**: Remove useless text in docs (#850) <Lu Fang>
- **[e84788f](https://github.com/onnx/onnx/commit/e84788f)**: Fix SELU attributes' default values (#839) <Lu Fang>
- **[ebac046](https://github.com/onnx/onnx/commit/ebac046)**: Add tile test case (#823) <Wenhao Hu>
- **[8b7a925](https://github.com/onnx/onnx/commit/8b7a925)**: a few more shape inference functions (#772) <anderspapitto>
- **[9718f42](https://github.com/onnx/onnx/commit/9718f42)**: Make the coefficient non optional for LinearClassifier (#836) <Jaliya Ekanayake>
- **[ef083d0](https://github.com/onnx/onnx/commit/ef083d0)**: Add save_tensor and load_tensor functions for Protos (#770) <Lu Fang>
- **[45ceb55](https://github.com/onnx/onnx/commit/45ceb55)**: Check if CMAKE_BUILD_TYPE set before project(). (#812) <Sergii Dymchenko>
- **[4b3d2b0](https://github.com/onnx/onnx/commit/4b3d2b0)**: [WIP] reenable shape inference tests (#834) <anderspapitto>
- **[22d17ee](https://github.com/onnx/onnx/commit/22d17ee)**: RNN tests: LSTM, GRU, SimpleRNN (#739) <Peyman Manikashani>
- **[de65b95](https://github.com/onnx/onnx/commit/de65b95)**: dimension denotation (#443) <Tian Jin>
- **[eccc76e](https://github.com/onnx/onnx/commit/eccc76e)**: fix field number issue in onnx operator proto and enable its build (#829) <Ke Zhang>
- **[d582beb](https://github.com/onnx/onnx/commit/d582beb)**: disable shape inference test to unbreak ci (#830) <Lu Fang>
- **[485b787](https://github.com/onnx/onnx/commit/485b787)**: function proto for composite op. (#802) <Ke Zhang>
- **[cd58928](https://github.com/onnx/onnx/commit/cd58928)**: specify defaults for attributes of Affine op (#820) <G. Ramalingam>
- **[7ee2cf9](https://github.com/onnx/onnx/commit/7ee2cf9)**: merge the dummy backend back into the main one (#743) <anderspapitto>
- **[1c03a5a](https://github.com/onnx/onnx/commit/1c03a5a)**: [Proposal] ONNX Interface for Framework Integration (previously ONNX Backend API) header and docs (#551) <Marat Dukhan>
- **[3769a98](https://github.com/onnx/onnx/commit/3769a98)**: Rename real model test case from VGG-16 to ZFNet (#821) <Lu Fang>
* [C2]ReluN Op
relu n op.
tf reference: https://www.tensorflow.org/api_docs/python/tf/nn/relu6
* Call destructor when assigning a blob value
* Add executor overrides
Add executor overrides flag to enable migration to async_scheduling executor
* Add barrier net that runs before training nets - attempt #2
Add a synchonize barrier net that is run before training nets. With this net, shards that are faster will wait for other shards before start training. This reduce chances of the faster shards timing out during GLOO AllReduce.
Removed explicit data_parallel_model.py.synchronize call in holmes workflow.
This change was landed previously but caused errors for some EDPM workflows - See https://fb.facebook.com/groups/1426530000692545/permalink/1906766366002237/ - because EDPM assumes any call to CreateOrCloneCommonWorld and Gloo ops are wrapped in exception handlers but in this case exception thrown in the barrier init net is not handled.
To address this issue, we add _CreateOrCloneCommonWorld to the param_init_net instead of a new barrier init net. Since errors for param_init_net run is handled gracefully and re-rendezvous, it should fixes the problem.
* Handle empty nets in async_scheduling
Make sure we don't get stuck on empty nets
* use CUDA_ARCH for conditional compile
* [C2 fix] infer function for ensure_cpu_output_op
* Update group_norm test to reduce flaky test
* Fix lr_multiplier for GPU