Summary: This diff adds a string equality checking operator.
Test Plan: Unit tests
Differential Revision: D24042344
fbshipit-source-id: c8997c6130e3438f2ae95dae69f76978e2e95527
Summary: Currently GetSingleArgument overflows because it expects an int instead of an int64 when using a 1cycle (hill policy) annealing schedule.
Test Plan:
unittest
buck test caffe2/caffe2/python/operator_test:learning_rate_op_test
Differential Revision: D23938169
fbshipit-source-id: 20d65df800d7a0f1dd9520705af31f63ae716463
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45315
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45314
In D23858329 (721cfbf842), we put the PriorCorrectionCalibrationPrediction unit test in the OSS test file, which causes test failures in the public trunk.
This diff moves it to the FB-only test file.
Test Plan:
```
buck test //caffe2/caffe2/python/operator_test:torch_integration_test -- test_gather_ranges_to_dense_op
buck test //caffe2/caffe2/fb/python/operator_test:torch_integration_test -- test_prior_correct_calibration_prediction_op
```
all pass.
Reviewed By: houseroad
Differential Revision: D23899012
fbshipit-source-id: 1ed97d8702e2765991e6caf5695d4c49353dae82
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45231
There are two operators, `PriorCorrectionCalibrationPrediction` and `GatherRangesToDense`, that are not supported in PT, which prevents GLOW from working.
To unblock, we first use a C2->PT conversion. In the long term, we need to implement PT custom ops.
This diff does this conversion to unblock the current project.
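For context, a minimal sketch of the comparison pattern such torch_integration_test cases follow. The Caffe2 Python calls are standard; the torch.ops._caffe2 op name and signature in the comments are assumptions, not the exact test code:
```
from caffe2.python import core, workspace

def c2_reference(op_name, inputs, outputs, **args):
    # Run the Caffe2 op directly and fetch its outputs as the reference.
    for name, arr in inputs.items():
        workspace.FeedBlob(name, arr)
    workspace.RunOperatorOnce(core.CreateOperator(op_name, list(inputs), outputs, **args))
    return [workspace.FetchBlob(name) for name in outputs]

# The PT side then calls the converted op and compares against the reference, e.g.
# (op name under torch.ops._caffe2 and its exact signature are assumptions):
#   pt_out = torch.ops._caffe2.GatherRangesToDense(data, ranges, ...)
#   numpy.testing.assert_allclose(ref_out, pt_out.numpy(), rtol=1e-4)
```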
Test Plan:
Run the unit tests. The test input is from a current DPER example.
All pass.
```
buck test //caffe2/caffe2/python/operator_test:torch_integration_test -- test_prior_correct_calibration_prediction_op --print-passing-details
> c2 reference output
> [0.14285715 0.27272728 0.39130434 0.5 ]
> PT converted output
> tensor([0.1429, 0.2727, 0.3913, 0.5000])
buck test //caffe2/caffe2/python/operator_test:torch_integration_test -- test_gather_ranges_to_dense_op --print-passing-details
> c2 reference output
> [array([[6, 5, 4, 3], [0, 0, 0, 0]], dtype=int64)]
> PT converted output
> [tensor([[6, 5, 4, 3], [0, 0, 0, 0]])]
```
Reviewed By: allwu, qizzzh
Differential Revision: D23858329
fbshipit-source-id: ed37118ca7f09e1cd0ad1fdec3d37f66dce60dd9
Summary:
There is a fixer in `2to3` that you can target specifically to remove these redundant `__future__` imports; the `caffe2` directory has the most of them:
```2to3 -f future -w caffe2```
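For illustration, the redundant imports that the `future` fixer removes typically look like this at the top of caffe2 Python files (a representative example, not a specific file):
```
# Before: Python 2/3 compatibility boilerplate that is redundant on Python 3
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals

# After running `2to3 -f future -w caffe2`, these lines are simply removed.
```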
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45033
Reviewed By: seemethere
Differential Revision: D23808648
Pulled By: bugra
fbshipit-source-id: 38971900f0fe43ab44a9168e57f2307580d36a38
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44540
Support fp16 as the output type for UniformFill.
Reviewed By: jianyuh
Differential Revision: D23558030
fbshipit-source-id: 53a5b2c92cfe78cd11f55e6ee498e1bd682fe4a1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44089
Add support for fp16 as the input type in the SparseLengthsSum/Mean caffe2 operators.
Reviewed By: xianjiec
Differential Revision: D23436877
fbshipit-source-id: 02fbef2fde17d4b0abea9ca5d17a36aa989f98a0
Summary: Exports the operator to PyTorch, to be made into a low-level module.
Test Plan:
```
buck test //caffe2/caffe2/python/operator_test:torch_integration_test -- test_learning_rate
```
Reviewed By: yf225
Differential Revision: D23545582
fbshipit-source-id: 6b6d9aa6a47b2802ccef0f87c1263c6cc2d2fdf6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43205
A number of tests that forward to `TestLoadSaveBase.load_save` are marked as flaky because they regularly take much longer to start up than hypothesis' default timeout of 200ms. This diff fixes the problem by removing the timeout for `load_save`. This is alright since these tests aren't meant to test the performance of these operators.
I would set the deadline to 60s if I could, but it appears that the caffe2 GitHub CI uses a different version of hypothesis that doesn't allow using `dateutil.timedelta`, so instead of trying to figure out an approach that works on both I've just removed the deadline time.
I've also tagged all existing tasks WRT these failures.
Differential Revision: D23175752
fbshipit-source-id: 324f9ff034df1ac4874797f04f50067149a6ba48
Summary:
1. Fix an illegal memory access issue for the SplitByLengths operator in the CUDA context.
2. Add support for a scaling lengths vector in the SplitByLengths operator.
3. Add support for testing the SplitByLengths operator in the CUDA context.
Example of the SplitByLengths operator processing a scaling lengths vector:
value vector A = [1, 2, 3, 4, 5, 6]
length vector B = [1, 2]
After execution of the SplitByLengths operator, the output should be [1, 2] and [3, 4, 5, 6].
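For illustration, a minimal sketch of exercising the op from Python on the example above; the name of the scaling-lengths argument is an assumption and may differ from the actual op schema:
```
import numpy as np
from caffe2.python import core, workspace

workspace.FeedBlob("A", np.array([1, 2, 3, 4, 5, 6], dtype=np.int32))  # value vector
workspace.FeedBlob("B", np.array([1, 2], dtype=np.int32))              # length vector

# use_scaling_lengths is an assumed arg name: lengths [1, 2] are scaled by
# len(A) / sum(B) = 2, giving splits of size 2 and 4.
op = core.CreateOperator(
    "SplitByLengths", ["A", "B"], ["Y0", "Y1"], use_scaling_lengths=True
)
workspace.RunOperatorOnce(op)
print(workspace.FetchBlob("Y0"))  # expected [1, 2]
print(workspace.FetchBlob("Y1"))  # expected [3, 4, 5, 6]
```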
Test Plan: buck test mode/dev-nosan caffe2/caffe2/python/operator_test:concat_split_op_test
Reviewed By: kennyhorror
Differential Revision: D23079841
fbshipit-source-id: 3700e7f2ee0a5a2791850071fdc16e5b054f8400
Summary:
Enforce double type for the counter value in rowwise_counter.
**Context:**
The existing implementation uses float type for the counter value. But due to the precision limit of a single-precision float [1], we observed in earlier experiments that the counter value can't increment beyond 16777216.0 (i.e., the max value is 16777216.0). We decided to enforce double type to avoid this issue.
[1] https://stackoverflow.com/questions/12596695/why-does-a-float-variable-stop-incrementing-at-16777216-in-c
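For illustration, a minimal numpy sketch of the float32 precision limit described above:
```
import numpy as np

# float32 has a 24-bit significand, so above 2**24 = 16777216 adding 1.0 is lost to rounding.
counter = np.float32(16777216.0)
print(counter + np.float32(1.0) == counter)  # True: the counter stops incrementing

# With double precision the increment is preserved.
counter64 = np.float64(16777216.0)
print(counter64 + 1.0 == counter64)  # False: double keeps counting
```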
Test Plan:
op test
```
ruixliu@devvm1997:~/fbsource/fbcode/caffe2/caffe2/python/operator_test(f0b0b48c)$ buck test :rowwise_counter_test
Trace available for this run at /tmp/testpilot.20200728-083200.729292.log
TestPilot test runner for Facebook. See https://fburl.com/testpilot for details.
Testpilot build revision cd2638f1f47250eac058b8c36561760027d16add fbpkg f88726c8ebde4ba288e1172a348c7f46 at Mon Jul 27 18:11:43 2020 by twsvcscm from /usr/local/fbprojects/packages/testinfra.testpilot/887/t.par
Discovering tests
Running 1 test
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/7881299364977047
✓ caffe2/caffe2/python/operator_test:rowwise_counter_test - test_rowwise_counter (caffe2.caffe2.python.operator_test.rowwise_counter_test.TestRowWiseCounter) 0.265 1/1 (passed)
✓ caffe2/caffe2/python/operator_test:rowwise_counter_test - main 14.414 (passed)
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/7881299364977047
Summary (total time 18.51s):
PASS: 2
FAIL: 0
SKIP: 0
FATAL: 0
TIMEOUT: 0
OMIT: 0
```
optimizer test
```
ruixliu@devvm1997:~/fbsource/fbcode/caffe2/caffe2/python(7d66fbb9)$ buck test :optimizer_test
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/7036874434841896
Summary (total time 64.87s):
PASS: 48
FAIL: 0
SKIP: 24
caffe2/caffe2/python:optimizer_test - testGPUDense (caffe2.caffe2.python.optimizer_test.TestMomentumSgd)
caffe2/caffe2/python:optimizer_test - testGPUDense (caffe2.caffe2.python.optimizer_test.TestGFtrl)
caffe2/caffe2/python:optimizer_test - test_caffe2_cpu_vs_numpy (caffe2.caffe2.python.optimizer_test.TestYellowFin)
caffe2/caffe2/python:optimizer_test - testGPUDense (caffe2.caffe2.python.optimizer_test.TestSparseRAdam)
caffe2/caffe2/python:optimizer_test - testGPUDense (caffe2.caffe2.python.optimizer_test.TestRowWiseAdagradWithCounter)
caffe2/caffe2/python:optimizer_test - testGPUDense (caffe2.caffe2.python.optimizer_test.TestAdagrad)
caffe2/caffe2/python:optimizer_test - test_caffe2_gpu_vs_numpy (caffe2.caffe2.python.optimizer_test.TestYellowFin)
caffe2/caffe2/python:optimizer_test - testDense (caffe2.caffe2.python.optimizer_test.TestRowWiseAdagrad)
caffe2/caffe2/python:optimizer_test - testGPUDense (caffe2.caffe2.python.optimizer_test.TestFtrl)
caffe2/caffe2/python:optimizer_test - testSparse (caffe2.caffe2.python.optimizer_test.TestRmsProp)
...and 14 more not shown...
FATAL: 0
TIMEOUT: 0
OMIT: 0
```
param download test
```
ruixliu@devvm1997:~/fbsource/fbcode/caffe2/caffe2/fb/net_transforms/tests(7ef20a38)$ sudo buck test :param_download_test
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/6473924481526935
```
e2e flow:
f208394929
f207991149
f207967273
ANP notebook to check the counter value loaded from the flows
https://fburl.com/anp/5fdcbnoi
screenshot of the loaded counter (note that counter max is larger than 16777216.0)
{F250926501}
Reviewed By: ellie-wen
Differential Revision: D22711514
fbshipit-source-id: 426fed7415270aa3f276dda8141907534734337f
Summary:
1. Fix an illegal memory access issue for the SplitByLengths operator in the CUDA context.
2. Add support for a scaling lengths vector in the SplitByLengths operator.
3. Add support for testing the SplitByLengths operator in the CUDA context.
Example of the SplitByLengths operator processing a scaling lengths vector:
value vector A = [1, 2, 3, 4, 5, 6]
length vector B = [1, 2]
After execution of the SplitByLengths operator, the output should be [1, 2] and [3, 4, 5, 6].
Test Plan: buck test mode/dev-nosan caffe2/caffe2/python/operator_test:concat_split_op_test
Reviewed By: kennyhorror
Differential Revision: D22780307
fbshipit-source-id: c5ca60ae16b24032cedfa045a421503b713daa6c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42380
[Caffe2] Remove explicit division by zero in SpatialBN training mode
Test Plan: buck test mode/dev-nosan //caffe2/caffe2/python/operator_test:spatial_bn_op_test
Reviewed By: houseroad
Differential Revision: D22873214
fbshipit-source-id: 70b505391b5db02b45fc46ecd7feb303e50c6280
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42151
Previously, our Caffe2 SpatialBN op implementation computed running_var incorrectly, without the unbias coefficient. This should have failed the test because the output differs from cuDNN's output, but our tests were too weak to catch this bug. This diff fixes all of them.
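For reference, a small numpy sketch of a running_var update with the unbias coefficient, assuming the standard momentum-style update that cuDNN uses (not the Caffe2 kernel):
```
import numpy as np

def update_running_var(running_var, batch, momentum=0.9):
    # batch: [N, C] activations, reduced over the batch dimension for simplicity
    n = batch.shape[0]
    biased_var = batch.var(axis=0)            # divides by n
    unbiased_var = biased_var * n / (n - 1)   # unbias coefficient n / (n - 1)
    return momentum * running_var + (1.0 - momentum) * unbiased_var
```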
Test Plan: buck test mode/dev-nosan //caffe2/caffe2/python/operator_test:spatial_bn_op_test
Reviewed By: houseroad
Differential Revision: D22786127
fbshipit-source-id: db80becb67d60c44faae180c7e4257cb136a266d
Summary:
Found while trying to get RocM Caffe2 job green
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42169
Reviewed By: seemethere
Differential Revision: D22791896
Pulled By: malfet
fbshipit-source-id: 9df6233876aec5ead056365499bab970aa7e8bdc
Summary: We need this op to avoid splicing a dense tensor and then using the Mergesinglescaler op.
Test Plan: integrated test with dper2
Differential Revision: D22677523
fbshipit-source-id: f4f9a1f06841b0906ec8cbb435482ae0a89e1721
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41313
This diff backs out the backout diff. The failure was due to C++ `or`
not being supported by MSVC; it is now replaced with `||`.
Original commit changeset: fc7f3f8c968d
Test Plan: Existing unit tests, check github CI.
Reviewed By: malfet
Differential Revision: D22494777
fbshipit-source-id: 3271288919dc3a6bfb82508ab9d021edc910ae45
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40875
This op uses the given num_bins and a spacing strategy to automatically bin and compute the histogram of given matrices.
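A rough numpy sketch of the described behavior, assuming a linear spacing strategy between the observed min and max (illustrative only, not the op's implementation):
```
import numpy as np

def auto_binned_histogram(matrices, num_bins):
    values = np.concatenate([m.ravel() for m in matrices])
    # Linear spacing between the observed min and max is assumed here.
    edges = np.linspace(values.min(), values.max(), num_bins + 1)
    counts, _ = np.histogram(values, bins=edges)
    return counts, edges
```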
Test Plan: Unit tests.
Reviewed By: neha26shah
Differential Revision: D22329069
fbshipit-source-id: 28406b94e284d52d875f73662fc82f93dbc00064
Summary:
The unique op test failure in caffe2 blocks upgrading CI to ROCm 3.5.1. Skipping the test to unblock; will re-enable after root-causing and fixing the issue.
jeffdaily sunway513
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41219
Differential Revision: D22471452
Pulled By: xw285cornell
fbshipit-source-id: 9e503c8b37c0a4b92632f77b2f8a90281a9889c3
Summary:
This PR contains the following updates:
1. MIOpen 3D pooling enabled in Caffe2.
2. Refactored the MIOpen pooling code in caffe2.
3. Enabled unit test cases for 3D pooling.
CC: ezyang jeffdaily ashishfarmer
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38260
Differential Revision: D21524754
Pulled By: xw285cornell
fbshipit-source-id: ddfe09dc585cd61e42eee22eff8348d326fd0c3b
Summary: Export the logit op to PT for better preprocessing perf.
Test Plan:
unit test
Also tested with model re-generation
Reviewed By: houseroad
Differential Revision: D22324611
fbshipit-source-id: 86accb6b4528e5c818d2c3f8c67926f279d158d6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40856
Add a new activation function - Mish: A Self Regularized Non-Monotonic Neural Activation Function https://arxiv.org/abs/1908.08681
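For reference, the activation from the cited paper is mish(x) = x * tanh(softplus(x)); a minimal numpy sketch (not the Caffe2 kernel):
```
import numpy as np

def mish(x):
    # mish(x) = x * tanh(softplus(x)), with softplus(x) = log(1 + exp(x))
    return x * np.tanh(np.log1p(np.exp(x)))

x = np.linspace(-4, 4, 9)
print(mish(x))
```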
Test Plan:
buck test //caffe2/caffe2/python/operator_test:elementwise_ops_test -- 'test_mish'
{F242275183}
Differential Revision: D22158035
fbshipit-source-id: 459c1dd0ac5b515913fc09b5f4cd13dcf095af31
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40925
The normalization operator does not handle empty tensors correctly. This is a fix.
Test Plan: unit tests
Differential Revision: D22330340
fbshipit-source-id: 0bccf925bb768ebb997ed0c88130c5556308087f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40379
The current sum operator doesn't support Long, hence this change modifies the code.
Test Plan: Write a test case
Reviewed By: jspark1105, yinghai
Differential Revision: D21917365
fbshipit-source-id: b37d2c100c70d17d2f89c309e40360ddfab584ee
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/387
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39985
AVX2-optimized 2/4-bit row-wise quantization/dequantization in perfkernels.
This diff slightly changes the numerics of quantization by multiplying with the inverse of the scale instead of dividing by the scale.
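For illustration, a small numpy sketch of the numeric change, using a schematic row-wise n-bit quantization (not the perfkernels code):
```
import numpy as np

def quantize_row(row, bits):
    qmax = (1 << bits) - 1
    minimum, maximum = row.min(), row.max()
    scale = (maximum - minimum) / qmax if maximum > minimum else 1.0

    q_div = np.round((row - minimum) / scale)        # before: divide by scale
    inv_scale = 1.0 / scale
    q_mul = np.round((row - minimum) * inv_scale)    # after: multiply by inverse scale

    # The two can differ by one unit in the last place for some inputs,
    # which is the slight numerics change mentioned above.
    return np.clip(q_div, 0, qmax), np.clip(q_mul, 0, qmax)
```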
Test Plan:
On my devserver:
for i in 2 4 8; do echo $i; buck run mode/opt :fused_rowwise_nbit_conversion_bench -- --bit-rate=$i; done
Before this diff
2-bit
3.35394 ms. 100%. FloatToFused2BitRowwiseQuantized
4-bit
3.60351 ms. 100%. FloatToFused4BitRowwiseQuantized
8-bit
0.434467 ms. 100%. FloatToFused8BitRowwiseQuantized
After this diff
2-bit
0.606386 ms. 100%. FloatToFused2BitRowwiseQuantized
4-bit
0.446683 ms. 100%. FloatToFused4BitRowwiseQuantized
8-bit
0.4349 ms. 100%. FloatToFused8BitRowwiseQuantized
Reviewed By: choudharydhruv, jianyuh
Differential Revision: D22033195
fbshipit-source-id: d3a219e47b8345268d90a160c9314ed0d5b71467
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38574
Adding a sparse L1 and L2 regularization operator to Caffe2. This doesn't work with run_on_loss, only run_after_optimize. Applying it in run_after_optimize rather than run_on_loss was easier to implement, particularly for the L1 norm, which is preferable in some cases and is non-differentiable at zero.
Test Plan: Wrote and ran unit tests in operator_test:sparse_lp_regularizer_test.
Differential Revision: D21003029
fbshipit-source-id: 81070a621752560ce03e320d065ce27807a5d278
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39297
The histogram op doesn't have a GPU implementation and is breaking the CI GPU test. Make the test run CPU-only.
Test Plan: CI
Reviewed By: hwangjeff
Differential Revision: D21800824
fbshipit-source-id: 9c835786f22bac7d420ce610397a6ee69084c19a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38514
This diff introduces the `Histogram` caffe2 op, which computes a histogram tensor for a list of input tensors. The bin edges of the histogram are defined by the arg `bin_edges`.
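A minimal numpy sketch of the intended semantics, assuming counts are accumulated across all input tensors against the fixed `bin_edges` (illustrative only):
```
import numpy as np

def histogram(inputs, bin_edges):
    counts = np.zeros(len(bin_edges) - 1, dtype=np.int64)
    for x in inputs:
        c, _ = np.histogram(x.ravel(), bins=bin_edges)
        counts += c
    return counts

print(histogram([np.array([0.1, 0.5, 2.0]), np.array([1.5])], bin_edges=[0.0, 1.0, 2.0]))
```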
Test Plan: tests
Reviewed By: chocjy
Differential Revision: D21553956
fbshipit-source-id: fc98c8db691d66d2dad57b6ad14867109913cb6f
Summary:
Previously we got a CI issue with the original submission (D21562485), so we backed out the original diff (D21588831). Resubmitting here to reproduce the CI issue and ask the caffe2 devs to take a look.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38566
Original commit changeset: 6dda4b71904d
Test Plan: buck test
Reviewed By: houseroad
Differential Revision: D21589352
fbshipit-source-id: de40ff2884019e14476e31c4c952f24d6e438f5f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38517
as title
Test Plan: buck test
Reviewed By: olittle
Differential Revision: D21562485
fbshipit-source-id: 573419e5a8dae4121d99d5b72ed3960a92db7a54
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37705
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37372
Posted note: [Regularizing SparseNN Against Over-fitting](https://fb.workplace.com/notes/taiqing-wang/regularizing-sparsenn-against-over-fitting/220306075902708/)
**Problem formulation**
L(w) = J(w) + lambda/2 * ||w||^2
J(w) is the empirical loss, and ||w||^2 is the squared L2 norm of the parameters, a.k.a. L2 regularizer.
dL(w)/dw_i = dJ(w)/dw_i + lambda * w_i
dL(w)/dw_i is the gradient of L(w) w.r.t. w_i.
To implement the L2 regularizer, lambda * w_i is added to the gradient of J(w) w.r.t. w_i. lambda is called weight decay in this implementation.
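For illustration, a minimal numpy sketch of the update described above (not the Caffe2 kernel; as noted in the code changes below, weight decay is skipped for 1d bias vectors):
```
import numpy as np

def adagrad_step_with_weight_decay(w, grad, moment, lr, weight_decay, epsilon=1e-5):
    # dL/dw_i = dJ/dw_i + weight_decay * w_i
    g = grad + weight_decay * w
    moment += g * g
    w -= lr * g / (np.sqrt(moment) + epsilon)
    return w, moment
```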
**Code changes**
* In the initialization method of AdagradOptimizer, a new input argument, weight_decay, is added.
* In the _run function of AdagradOptimizer, the weight decay will be skipped for 1d bias vectors.
* In the parameter update functions of Adagrad, the gradient is updated by weight_decay * w_i. The default value for weight_decay is zero.
Test Plan:
```
buck build caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test_weight_decay
./buck-out/gen/caffe2/caffe2/fb/dper/layer_models/tests/split_1/sparse_nn_test_weight_decay#binary.par
```
Reviewed By: jspark1105
Differential Revision: D21258652
fbshipit-source-id: d2366ddcd736a03205a2d16f914703b16d9fce8f
Summary: It has always been skipped for the last 1.5 years (since D10372230 landed).
Test Plan: CI
Reviewed By: ailzhang
Differential Revision: D21036194
fbshipit-source-id: 9ace60b45a123a9372a88310b91f33a69ae8880c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36225
Implemented the [STORM](https://arxiv.org/abs/1905.10018) optimizer operator for dense and sparse cases.
Test Plan:
All newly added unit tests passed using "buck test //caffe2/caffe2/python/operator_test:storm_test".
{F233643713}
Reviewed By: chocjy
Differential Revision: D18702897
fbshipit-source-id: d25eeb492aa2a03c69754d3f076a8239230b3bf4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35763
Adds inference function and test for ScatterAssign
Test Plan: Updated unit test
Reviewed By: yyetim, shunting1986
Differential Revision: D20501079
fbshipit-source-id: 7ec6ef0127a151250dd699c90c2b80c35cfb1fe4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35857
This fixes a lot of common ops for InferBlobShapesAndTypes as well as adds support for testing the inferred shapes and types of gradient ops.
Ops:
* Concat
* Split
* LeakyReLU
* Relu
* Prelu
* Gelu
* Elu
* Sinh, Tanh, Cosh
* Abs
* ... and a number of other simple element wise ops
Test Plan:
Added support to hypothesis test to check the shape and type of gradient ops.
Enabled it for all the ops I fixed the shape and type inference for.
buck test caffe2/caffe2/python/operator_test:
Reviewed By: pradeepd24
Differential Revision: D20806284
fbshipit-source-id: 77f796d9ff208e09e871bdbadf9a0a7c196b77f2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35507
We want to split up the SparseLengthsSumSparse op into an indirection op and the SparseLengthsSum op so that we can lower the latter part. The indirection part is a plain implementation for now.
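A rough numpy sketch of the two-stage idea; the exact semantics of the compressed mapping are an assumption here:
```
import numpy as np

def sparse_lengths_sum_sparse(data, compressed_mapping, indices, lengths):
    # Stage 1 (indirection): remap the sparse ids through the compressed mapping.
    remapped = compressed_mapping[indices]
    # Stage 2: plain SparseLengthsSum over the remapped ids.
    out, offset = [], 0
    for n in lengths:
        out.append(data[remapped[offset:offset + n]].sum(axis=0))
        offset += n
    return np.stack(out)
```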
Test Plan:
```
for i in `seq 10`; do buck test caffe2/caffe2/python/operator_test:lengths_reducer_fused_nbit_rowwise_ops_test -- test_sparse_lengths_sum_rowwise_sparse; done
```
Reviewed By: jspark1105
Differential Revision: D20683478
fbshipit-source-id: 509effe88719d20aa0c4783bbe0ce1f183ee473c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35430
This fixes and adds tests for several commonly used operators.
There's some formatting differences due to running clang-format on one of the files.
Test Plan: buck test //caffe2/caffe2/fb/operators:hypothesis_test //caffe2/caffe2/python/operator_test:utility_ops_test //caffe2/caffe2/python/operator_test:concat_split_op_test
Reviewed By: yyetim
Differential Revision: D20657405
fbshipit-source-id: 51d86d0834003b8ac8d6acb5149ae13d7bbfc6ab
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35346
The weight scale op doesn't have a GPU implementation. This is breaking OSS CI from D20506032. Making it CPU-only.
Test Plan: OSS CI
Reviewed By: ustctf
Differential Revision: D20637440
fbshipit-source-id: 9aa6cce63ce637ab7856788e5d02f527decb2a26
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34903
Reattempt of D20461609
Moving 2/4-bit SLS and row-wise 2/4-bit conversion operator to open source to be used by DLRM
Test Plan: CI
Reviewed By: jianyuh
Differential Revision: D20495304
fbshipit-source-id: 66a99677583f50fd40e29c514710c7b1a8cdbc29
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34783
Moving 2/4-bit SLS and row-wise 2/4-bit conversion operator to open source to be used by DLRM
Test Plan: CI
Reviewed By: yinghai
Differential Revision: D20461609
fbshipit-source-id: b3ef73ff10f2433afe06ffa73fe1145282d9ec4c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33431
Some elementwise operators don't have shape and type inference specified for the output tensor: `BitwiseOr`, `BitwiseAnd`, `BitwiseXor`, `Not`, `Sign`.
This change fixes this issue:
- For `Not` and `Sign` operators, the output has the same type and shape as the input, so `IdenticalTypeAndShapeOfInput` function is used to specify that.
- For bitwise operators created by `CAFFE2_SCHEMA_FOR_BINARY_BITWISE_OP` macro, the type and shape inference rules should be the same as for other binary element-wise operators, so `TensorInferenceFunction(ElementwiseOpShapeInference)` is used to specify that.
Also some tests were modified to ensure that the shape and type are inferred (`ensure_outputs_are_inferred` parameter)
Test Plan:
```
CAFFE2_ASSERT_SHAPEINFERENCE=1 buck test caffe2/caffe2/python/operator_test:elementwise_ops_test
CAFFE2_ASSERT_SHAPEINFERENCE=1 buck test caffe2/caffe2/python/operator_test:math_ops_test
```
Note that the tests have to be executed with `CAFFE2_ASSERT_SHAPEINFERENCE=1` in order to fail upon shape inference failure.
Reviewed By: idning
Differential Revision: D19880164
fbshipit-source-id: 5d7902e045d79e5669e5e98dfb13a39711294939
Summary:
For both the Caffe2 and PyTorch backends, enable 3D convolutions through MIOpen.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33067
Reviewed By: BIT-silence
Differential Revision: D19880495
Pulled By: bddppq
fbshipit-source-id: 8f6f970910654c1c5aa871b48a04c1054875691c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32271
Use the 2-stage EmbeddingSpMDM interface in D19425982 to reduce the overhead of code cache lookup and lock contention.
Fix an issue in sparse_lengths_sum_benchmarks that generated empty indices when the average length is small, e.g., 1.
Test Plan: CI
Reviewed By: dskhudia
Differential Revision: D19425987
fbshipit-source-id: d5c5f0d46e0072403901809c31d516fa0f4b9b31
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32448
Using binary search to compute the value for the given quantile among the input tensors.
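A rough numpy sketch of the binary-search idea; the actual op's tie-breaking and stopping criteria may differ:
```
import numpy as np

def quantile_by_binary_search(tensors, quantile, iters=64):
    # Find the smallest value v in the data range such that the fraction of
    # elements <= v across all input tensors is at least `quantile`.
    values = np.concatenate([t.ravel() for t in tensors])
    lo, hi = values.min(), values.max()
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (values <= mid).mean() >= quantile:
            hi = mid
        else:
            lo = mid
    return hi
```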
Test Plan: Newly added unittests;
Reviewed By: jspark1105
Differential Revision: D19487604
fbshipit-source-id: 0dc6627b78d1310ac35b3f1d53b89cc89a697ece
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32475
As title
Test Plan: CI
Reviewed By: houseroad
Differential Revision: D19508778
fbshipit-source-id: fd9ad63607535980505d155f3e3c3b7c6b95daf7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31612
Count the number of recent updates on rows. Exponential decay is applied to the counter with decay rate r, such that
r^{counter_halflife} = 0.5;
if counter_halflife is nonpositive, this operator is turned off.
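A small sketch of the decay rule stated above (illustrative only):
```
# r^{counter_halflife} = 0.5  =>  r = 0.5 ** (1.0 / counter_halflife)
counter_halflife = 1000
r = 0.5 ** (1.0 / counter_halflife)

counter = 1.0
for _ in range(counter_halflife):
    counter *= r
print(counter)  # ~0.5: the counter has decayed by half after counter_halflife steps
```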
Test Plan: added unittest
Reviewed By: chocjy
Differential Revision: D19217921
fbshipit-source-id: 96d850123e339212cc0e0ef352ea8a1b1bf61dfa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24341
ConvTransposeOp doesn't crash for zero-batch, but it doesn't modify the output blob. This leads to buggy behaviour, especially when running the same network twice with different inputs, or during backprop in training.
Since `ConvTransposeUnpoolBase<Context>::GetOutputSize` seems to work for zero-batch, I remove the check for `input.numel() > 0` and reshape the output blob before returning.
For CudnnConvTransposeGradientOp, it's a bit verbose to set `dfilter` and `dbias`, and it seems cuDNN can handle it, so simply remove the `X.numel() == 0` branch.
Test Plan: buck test mode/dev-nosan caffe2/caffe2/python/operator_test:conv_transpose_test -- --run-disabled
Reviewed By: BIT-silence
Differential Revision: D16807606
fbshipit-source-id: 0d72c5bd8f2e03c34465e7b530cca548d9bdd5e1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19705
Optimizing for the case where consecutive dims that are not broadcast are followed by consecutive dims that are broadcast.
For example, MulGradient(["dC", "A", "B"], ["dA", "dB"], broadcast=True, axis=0) where A.shape == dC.shape == [9508, 80] and B.shape == [80].
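For reference, a small numpy sketch of the broadcasted multiply gradient being optimized, assuming numpy-style trailing alignment of B (not the optimized kernel):
```
import numpy as np

dC = np.random.rand(9508, 80).astype(np.float32)
A = np.random.rand(9508, 80).astype(np.float32)
B = np.random.rand(80).astype(np.float32)

dA = dC * B                  # broadcast multiply over the trailing (broadcast) dim
dB = (dC * A).sum(axis=0)    # reduce the non-broadcast leading dim down to B's shape
```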
Test Plan:
In SKL T6,
Running mul_gradient_benchmark without this optimization
Operator #0 (dA, MulGradient) 11.9119 ms/iter
After this optimization,
Operator #0 (dA, MulGradient) 0.672759 ms/iter
Need to land D15291800 first to fix the unit test error.
Reviewed By: dmudiger
Differential Revision: D15075415
fbshipit-source-id: 0f97be17cf8f1dacbafa34cd637fb8bc1c5e5387
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29167
As titled.
This fix is crucial as multi_channel splitting would create history that has no items (i.e., D == 0), which leads to flow failure.
Test Plan:
Unittest
flow test:
before fix: f148783160
after fix: f149082299
buck test mode/dev-nosan caffe2/caffe2/python/operator_test:softmax_ops_test
Reviewed By: xianjiec
Differential Revision: D18296081
fbshipit-source-id: e0bb2dc2c4e5b465e213f31e5c5ced3a7e1fd574