Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37169
This allows some cleanup of the code below by making lines shorter.
ghstack-source-id: 102773593
Test Plan: Existing tests for interpolate.
Reviewed By: kimishpatel
Differential Revision: D21209988
fbshipit-source-id: cffcdf9a580b15c4f1fa83e3f27b5a69f66bf6f7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37168
It looks like this was made a separate function because of the `dim` argument,
but that argument is always equal to `input.dim() - 2`. Remove the argument
and consolidate all call sites into one. This means the check will now run on
paths that previously didn't call it, but all those cases throw exceptions
anyway.
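As a rough sketch (not the actual helper, and with the checks abbreviated), the consolidated version can derive `dim` itself:
```
def _check_size_scale_factor(input, size, scale_factor):
    # Number of spatial dims: strip the batch and channel dimensions.
    dim = input.dim() - 2
    if size is None and scale_factor is None:
        raise ValueError("either size or scale_factor should be defined")
    if size is not None and scale_factor is not None:
        raise ValueError("only one of size or scale_factor should be defined")
    if scale_factor is not None and isinstance(scale_factor, (list, tuple)) \
            and len(scale_factor) != dim:
        raise ValueError("scale_factor shape must match input shape; "
                         "input is {}D, scale_factor has size {}".format(dim, len(scale_factor)))
```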
ghstack-source-id: 102773596
Test Plan: Existing tests for interpolate.
Reviewed By: kimishpatel
Differential Revision: D21209993
fbshipit-source-id: 2c274a3a6900ebfdb8d60b311a4c3bd956fa7c37
Summary:
Remove the requirement for the axes provided to reorderAxis to come from a Tensor. We were using that to determine the relevant loops, but we can instead determine them by traversing the parents of each provided For.
resistor, does this work for you?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37873
Differential Revision: D21428016
Pulled By: nickgg
fbshipit-source-id: b16b2f41cb443dfc2c6548b7980731d1e7d89a35
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37834
Ported all use sites of the old registration API to use the new Integer operator registration API.
Test Plan: Imported from OSS
Differential Revision: D21415700
Pulled By: MohammadMahdiJavanmard
fbshipit-source-id: 34f18757bad1642e1c485bb30c9771f7b7102230
Summary:
The existing context manager only conditionally enabled profiling mode, which was counterintuitive. When we changed the default executor, this broke internal benchmarking as a result.
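A minimal sketch of the unconditional version, assuming the `torch._C._jit_set_profiling_*` toggles return the previous setting:
```
from contextlib import contextmanager
import torch

@contextmanager
def enable_profiling_mode():
    # Always enable the profiling executor and profiling mode, restoring the
    # previous settings on exit.
    old_executor = torch._C._jit_set_profiling_executor(True)
    old_mode = torch._C._jit_set_profiling_mode(True)
    try:
        yield
    finally:
        torch._C._jit_set_profiling_executor(old_executor)
        torch._C._jit_set_profiling_mode(old_mode)
```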
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37825
Differential Revision: D21404611
Pulled By: eellison
fbshipit-source-id: 306b3c333ef4eb44ab6a6e5ab4e0682e5ce312ce
Summary:
We used to support indexing only through:
- numbers like `x[0, 1]`
- tuple like `x[(0, 1)]`
- tensor like `x[torch.tensor([0, 1])]`
This PR adds support for indexing through a list, which is equivalent to indexing through a tensor:
- `x[[0, 1, 5]]`
- `x[[0, 1], [0, 1]]`
- `x[[[0, 1], [0, 1]], [[0, 1], [0, 1]]]`
Note that for `x[[0, 1, 5]]` we had a bug in the AST conversion code, so we used to treat it like `x[0, 1, 5]`, which means it might accidentally run and produce a wrong result (fixes https://github.com/pytorch/pytorch/issues/37286, fixes https://github.com/pytorch/pytorch/issues/18616). Now that it's fixed, we probably want to mark this as BC-breaking.
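For illustration, the equivalence in eager mode (indices chosen arbitrarily):
```
import torch

x = torch.arange(12).reshape(3, 4)
# List indexing behaves like indexing with an equivalent tensor.
assert torch.equal(x[[0, 2]], x[torch.tensor([0, 2])])  # rows 0 and 2
assert torch.equal(x[[0, 1], [0, 1]],
                   x[torch.tensor([0, 1]), torch.tensor([0, 1])])  # elements (0, 0) and (1, 1)
```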
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37848
Reviewed By: suo
Differential Revision: D21409840
Pulled By: ailzhang
fbshipit-source-id: 6f2d962885c6dc009cb384d98be1822f5ca7a189
Summary:
Now that we have landed float requantization for conv/linear, we no longer need
the constraint that requant_scale < 1.
Removing it.
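For reference, a sketch of the standard requantization algebra (not the kernel code), showing why the scale can legitimately exceed 1:
```
def requant_scale(input_scale, weight_scale, output_scale):
    # The int32 accumulator of a quantized conv/linear is rescaled to the
    # output's quantization parameters by this factor.
    return input_scale * weight_scale / output_scale

# e.g. requant_scale(0.5, 0.4, 0.1) == 2.0, which the old constraint rejected
```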
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37683
Test Plan: Quantization tests
Differential Revision: D21412536
Pulled By: kimishpatel
fbshipit-source-id: c932b5ab3aa40407e9d7f0c877e2fe7fd544f8a7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37922
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D21426425
Pulled By: ezyang
fbshipit-source-id: 9d0d997f608a742668f64e7529c41feb39bec24e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37700
Certain autograd functions can have optional Tensor arguments. For
this purpose it would be nice to support c10::optional<Tensor> as an argument
for C++ autograd functions.
I've added the appropriate overload to ExtractVariables to ensure this works.
For an example, you can look at D21272807 in terms of how this is used.
ghstack-source-id: 103541789
Test Plan: waitforbuildbot
Differential Revision: D21363491
fbshipit-source-id: 0c8665e9bfe279e6b9ab84a889524fea11fa971c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37882
Previously we checked whether a node's inputs and outputs have shape
info only when we tried to merge that node into an existing fusion
group, but we didn't check it for the first node in the group. This PR
fixes that, which resolves a failure in test_milstm_cuda.
Test Plan: Imported from OSS
Reviewed By: Krovatkin
Differential Revision: D21412756
Pulled By: ZolotukhinM
fbshipit-source-id: 3ca30637ab8fe68443adb5fc03f1b8a11085a6a8
Summary:
This pull request enables ahead-of-time compilation of HIPExtensions with ninja by setting the appropriate compilation flags for the ROCm environment. It also enables the unit test for cuda_extensions on ROCm and removes the test for ahead-of-time compilation of extensions with ninja from ROCM_BLACKLIST.
ezyang jeffdaily
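A minimal sketch of an ahead-of-time ninja build (module and file names are hypothetical); on ROCm the same `CUDAExtension`/`BuildExtension` path hipifies the sources and injects the ROCm flags:
```
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name="my_ext",
    ext_modules=[
        # Hypothetical sources; hipified automatically on ROCm builds.
        CUDAExtension("my_ext", ["my_ext.cpp", "my_ext_kernel.cu"]),
    ],
    cmdclass={"build_ext": BuildExtension.with_options(use_ninja=True)},
)
```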
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37800
Differential Revision: D21408148
Pulled By: soumith
fbshipit-source-id: 146f4ffb3418f3534e6ce86805d3fe9c3eae84e1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37815
Generated device-specific wrappers for Tensor factory ops now call
methods on `globalContext()` directly, rather than indirecting
through `globalLegacyTypeDispatch()`, which we can now delete.
Test Plan: Imported from OSS
Differential Revision: D21398294
Pulled By: bhosmer
fbshipit-source-id: b37bc67aa33bfda6f156d441df55ada40e9b814d
Summary:
Helps prevent the following accidental failures:
```
..\caffe2\core\parallel_net_test.cc:303
The difference between ms and 350 is 41, which exceeds kTimeThreshold, where
ms evaluates to 391,
350 evaluates to 350, and
kTimeThreshold evaluates to 40.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37892
Differential Revision: D21417251
Pulled By: malfet
fbshipit-source-id: 300cff7042e466f014850cc7cc406c725d5d0c04
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37515
Previously we classified ops like average pool into the category that doesn't require observation, and
the quantization of these ops was done by swapping them with dequantize ops: https://github.com/pytorch/pytorch/pull/33481
However, that swap happened in finalize, which made finalize a numerics-changing pass: although average pool doesn't require observation, quantized average pool = dequant + float32 average pool + quant, so swapping average pool with dequantize changes numerics. This is not ideal, since we want to restrict the scope of numerics-changing passes.
This PR implements support for this: we classify ops like average pool into a new category and produce quantized average pool through fusion, like we do for other quantized ops. The numerics-changing rewrite then happens only in the insert-quant-dequant pass, so the model has the same numerics before and after finalize. With the new category, the debug-only option (the model before finalize) for quantize_script will actually produce a model that's numerically consistent with the finalized model.
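For illustration, the decomposition in eager mode (a sketch, not the graph pass; the output reuses the input's qparams for simplicity):
```
import torch
import torch.nn.functional as F

xq = torch.quantize_per_tensor(torch.randn(1, 3, 8, 8),
                               scale=0.1, zero_point=0, dtype=torch.quint8)
# Quantized average pool == dequantize -> float32 average pool -> quantize.
y_float = F.avg_pool2d(xq.dequantize(), kernel_size=2)
yq = torch.quantize_per_tensor(y_float, scale=xq.q_scale(),
                               zero_point=xq.q_zero_point(), dtype=torch.quint8)
```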
Test Plan:
python test/test_quantization.py TestQuantizeScriptJitPasses
Imported from OSS
Differential Revision: D21393512
fbshipit-source-id: 5632935fe1a7d76382fda22903d77586a08f0898
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37767
Fixes #37577
Needs tests, and maybe a lint.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D21386704
Pulled By: ezyang
fbshipit-source-id: 082c69f9e1f40dc5ed7d371902a4c498f105d99f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37853
```
Uninitialized value was created by an allocation of 'acc_arr_next' in the stack frame of function '_ZN2at6vec25614vec_reduce_allIfZZNS_6native12_GLOBAL__N_124_vec_log_softmax_lastdimIfEEvPT_S6_llENKUlllE_clEllEUlRNS0_12_GLOBAL__N_16Vec256IfEESB_E_EES5_RKT0_NS9_IS5_EEl'
#0 0xa961530 in float at::vec256::vec_reduce_all<float, void at::native::(anonymous namespace)::_vec_log_softmax_lastdim<float>(float*, float*, long, long)::'lambda'(long, long)::operator()(long, long) const::'lambda'(at::vec256::(anonymous namespace)::Vec256<float>&, at::vec256::(anonymous namespace)::Vec256<float>&)>(void at::native::(anonymous namespace)::_vec_log_softmax_lastdim<float>(float*, float*, long, long)::'lambda'(long, long)::operator()(long, long) const::'lambda'(at::vec256::(anonymous namespace)::Vec256<float>&, at::vec256::(anonymous namespace)::Vec256<float>&) const&, at::vec256::(anonymous namespace)::Vec256<float>, long) xplat/caffe2/aten/src/ATen/cpu/vec256/functional.h:12
```
Test Plan:
passed sanitizer locally after change,
CI green
Differential Revision: D21408120
fbshipit-source-id: b9d058cedf42b3d1d34ce05a42049d402906cd13
Summary:
So we can import torch compiled with CUDA on a CPU-only machine.
Needs tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37811
Differential Revision: D21417082
Pulled By: ezyang
fbshipit-source-id: 7a521b651bca7cbe38269915bd1d1b1bb756b45b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35961
Weight quantization was done incorrectly for LSTMs: the statistics for all weights (across layers) were combined in the observer. This meant that weights for later layers in an LSTM would use sub-optimal scales, impacting accuracy. The problem gets worse as the number of layers increases.
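A small sketch of why combining the statistics hurts, using a symmetric per-tensor scale and illustrative weight ranges:
```
import torch

w_layer0 = torch.randn(256, 64)          # wider weight range
w_layer1 = torch.randn(256, 64) * 0.05   # much narrower weight range

def symmetric_scale(w, qmax=127):
    return w.abs().max().item() / qmax

combined_scale = symmetric_scale(torch.cat([w_layer0.flatten(), w_layer1.flatten()]))
per_layer_scale = symmetric_scale(w_layer1)
# combined_scale is roughly 20x larger than per_layer_scale, so layer 1's
# quantized weights use only a small fraction of the integer range.
```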
ghstack-source-id: 103511725
Test Plan: Will be updated
Differential Revision: D20842145
fbshipit-source-id: a622b012d393e0755970531583950b44f1964413
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37866
Make sure not to check `CUDA_VERSION` if it is not defined.
Test Plan: CI green
Reviewed By: anjali411
Differential Revision: D21408844
fbshipit-source-id: 5a9afe372b3f1fbaf08a7c43fa3e0e654a569d5f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37423
For now, see what breaks on CI
ghstack-source-id: 103508233
Test Plan:
CI
Imported from OSS
Differential Revision: D21310335
fbshipit-source-id: 99d22e61168fcb318b18a16522aabdc0115c1f39
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37422
The test was failing because the version of hypothesis in fbcode was too old to know
about the `width` parameter, so it was trying to generate values larger than float32 can represent. The fix
is to explicitly set the bounds of the floats range for old versions of hypothesis.
For now, re-enable the test and see what breaks in CI.
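A sketch of the two variants (strategy definitions are illustrative):
```
from hypothesis import strategies as st

# Newer hypothesis: width=32 keeps generated values representable in float32.
floats32 = st.floats(allow_nan=False, allow_infinity=False, width=32)

# Older hypothesis has no `width` parameter, so bound the range explicitly.
f32_max = 3.4028234663852886e38
floats32_legacy = st.floats(min_value=-f32_max, max_value=f32_max,
                            allow_nan=False, allow_infinity=False)
```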
ghstack-source-id: 103500358
Test Plan:
CI
```
buck test mode/dev-nosan //caffe2/test:quantization -- 'test_compare_tensor_scalar \(quantization\.test_quantized_op\.TestComparatorOps\)'
```
Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D21310336
fbshipit-source-id: 1a59ab722daa28aab3d6d2d09bc527874942dc36
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37867
This is to work around an internal issue we are hitting with nvcc in ovrsource.
It does not seem to resolve the overload to the correct device version of `isinf` and `isnan` without this fudging of the code.
Test Plan:
CI green,
internal builds pass
Reviewed By: malfet
Differential Revision: D21408263
fbshipit-source-id: 1ff44e088b5c885d729cc95f00cf8fa07e525f6d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37839
Calling `RpcAgent::shutdown` from the TensorpipeAgent will ensure that parent class threads are joined and the atomic is set to False.
ghstack-source-id: 103496383
Test Plan: CI Build - no Tensorpipe Agent tests yet
Differential Revision: D21291974
fbshipit-source-id: 50cab929b021faf7f80e0e8139d0c7d1788a3a6c