pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
Nikita Shulga	5e30c44c03	Update on "Add BUILD_LAZY_CUDA_LINALG option" When enabled, it generates the `torch_cuda_linalg` library, which depends on cusolver and magma and registers dynamic bindings to it from LinearAlgebraStubs Differential Revision: [D33992795](https://our.internmc.facebook.com/intern/diff/D33992795) [ghstack-poisoned]	2022-02-23 12:59:30 +00:00
Nikita Shulga	78fcbfb61e	Update base for Update on "Add BUILD_LAZY_CUDA_LINALG option" When enabled, it generates the `torch_cuda_linalg` library, which depends on cusolver and magma and registers dynamic bindings to it from LinearAlgebraStubs Differential Revision: [D33992795](https://our.internmc.facebook.com/intern/diff/D33992795) [ghstack-poisoned]	2022-02-23 12:59:30 +00:00
Jordan Fix	987f146185	[fx] Improve support for tuple subclasses such as NamedTuple (#73198 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73198 Previously, if an arg to an FX node is a subclass of tuple then it gets sanitized essentially back to that base class. An example here is when setting an arg to be a TensorMetadata object, which is a NamedTuple, it will be set as a tuple instead. - Change `map_aggregate` to repack the tuple to `type(a)` when it's not directly a tuple (try/except for best attempt) - During codegen, call `add_global` for `type(a)` if it's not directly a tuple. - Add an option for an arg to provide a `_custom_fx_repr_fn` for use inside stringifying via `_format_arg` Test Plan: Added unit test coverage, where we inline the named tuple into arg/kwarg. Reviewed By: jamesr66a Differential Revision: D34381888 fbshipit-source-id: bd672a8542e2bba5aa604b448bec920efc256440 (cherry picked from commit `68f99c12dd`)	2022-02-23 11:31:10 +00:00
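The repacking behaviour described in the [fx] commit above (`987f146185`) can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the actual `torch.fx` code: `map_aggregate_sketch` and `TensorMeta` are hypothetical names, and only tuples, lists, and dicts are handled.

```python
from typing import Any, Callable, NamedTuple


def map_aggregate_sketch(a: Any, fn: Callable[[Any], Any]) -> Any:
    """Apply fn to every leaf of a nested tuple/list/dict structure."""
    if isinstance(a, tuple):
        mapped = tuple(map_aggregate_sketch(elem, fn) for elem in a)
        if type(a) is tuple:
            return mapped
        # Tuple subclass (e.g. a NamedTuple): try to rebuild the same type,
        # falling back to a plain tuple if its constructor rejects the args.
        try:
            return type(a)(*mapped)
        except TypeError:
            return mapped
    if isinstance(a, list):
        return [map_aggregate_sketch(elem, fn) for elem in a]
    if isinstance(a, dict):
        return {k: map_aggregate_sketch(v, fn) for k, v in a.items()}
    return fn(a)


class TensorMeta(NamedTuple):  # hypothetical stand-in for a NamedTuple argument
    rows: int
    cols: int


doubled = map_aggregate_sketch(TensorMeta(2, 3), lambda x: x * 2)
print(type(doubled).__name__, doubled)
```

Without the `type(a)(*mapped)` step the result would be a plain `tuple`, which is the sanitizing behaviour the commit message describes as the previous state.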
Jan Zikes	715a0dc5c0	[PyTorch/d2go] fix optim _multi_tensor (#73215 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73215 Fixing an issue in optimizers from _multi_tensor, for `sgd_mt` introduced in `2cb03e926f` Reviewed By: mikaylagawarecki Differential Revision: D34389034 fbshipit-source-id: ede153d52dca15909c6c022853589707f18dc8d1 (cherry picked from commit `cc8a58e584`)	2022-02-23 10:29:48 +00:00
CodemodService FBSourceClangFormatLinterBot	97898e5144	[AutoAccept][Codemod][FBSourceClangFormatLinter] Daily `arc lint --take CLANGFORMAT` Reviewed By: zertosh Differential Revision: D34412981 fbshipit-source-id: a7aa81c0c69bf731db37813f431d9f6ed6a6a355 (cherry picked from commit `a43ea6d9fc`)	2022-02-23 10:29:48 +00:00
CodemodService FBSourceGoogleJavaFormatLinterBot	3b1b4875f1	[AutoAccept][Codemod][FBSourceGoogleJavaFormatLinter] Daily `arc lint --take GOOGLEJAVAFORMAT` Reviewed By: zertosh Differential Revision: D34412756 fbshipit-source-id: da7424025c1d9b82b1f56a030f6b31ba08dd7b8b (cherry picked from commit `736159d415`)	2022-02-23 10:29:48 +00:00
BowenBao	bd4902d81f	[ONNX] Add Squeeze/Unsqueeze dynamic dimensions support when opset >= 13 (#71158 ) * Add Squeeze/Unsqueeze dynamic axes support when opset >= 13 Co-authored-by: hwangdeyu <dejack953@outlook.com> Co-authored-by: Gary Miguel <garymm@garymm.org> Pull Request resolved: https://github.com/pytorch/pytorch/pull/73104	2022-02-23 06:41:15 +00:00
BowenBao	80291dff43	[ONNX] Add torch.nan_to_num and torch.maximum/minimum symbolic (#72090 ) * Add nan_to_num symbolic * Restructure if statements * Add torch.maximum and torch.minimum support * Squash tests * Add dependency on input dtype * Add documentation Pull Request resolved: https://github.com/pytorch/pytorch/pull/73103	2022-02-23 06:38:11 +00:00
BowenBao	40de6b80ee	[ONNX] Add infra for quantized model export and support quantized mobilenet v3 (#72215 ) * Add infrastructure and helper functions to enable future work for other quantized operators and models. * Add export for quantized operators needed by torchvision mobilenet v3 large. * ATen namespace: hardsigmoid, flatten, adaptive_avg_pool, quantize_per_tensor, dequantize. * Quantized namespace: conv2d, conv2d_relu, hardswish, add, mul. * Numerous bug fixes, in unpack_quantized_weight.cpp, symbolic functions, and unit test. Co-authored-by: BowenBao <bowbao@microsoft.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/73102	2022-02-23 06:22:58 +00:00
Michael Melesse	785ebb9d6d	[ROCM] Navi21 Enablement 3: Embedding kernels (#72809 ) Summary: This PR is a follow up to the following prs. https://github.com/pytorch/pytorch/pull/69942 https://github.com/pytorch/pytorch/pull/72682 We are adding support to Navi21 GPUs which have a warpsize of 32. We cannot rely on a constant so we have to dynamically look up the warpsize when launching the kernel on the host side. Inside device functions this is not needed and the compiler can correctly detect the correct warpsize to replace the C10_WARP_SIZE constant. Pull Request resolved: https://github.com/pytorch/pytorch/pull/72809 Reviewed By: mruberry Differential Revision: D34400737 Pulled By: ngimel fbshipit-source-id: 1a1374465d4006e485d4d11531a4c78ddb178cdf (cherry picked from commit `94211fe1f0`)	2022-02-23 04:26:58 +00:00
kshitij12345	299b40de50	[jiterator] stricter static_assert (#72576 ) Summary: * static_assert on `jiterator_stringify` usage in ROCm. * static_assert for `complex<half>` Pull Request resolved: https://github.com/pytorch/pytorch/pull/72576 Reviewed By: ngimel Differential Revision: D34387640 Pulled By: mruberry fbshipit-source-id: d58dbb062c9c301465b9b7e4a56ee3d64baaadf9 (cherry picked from commit `82d2a75519`)	2022-02-23 03:33:26 +00:00
Peter Bell	9ea6db4aca	fft: Fix invalid shape error for complex-to-real transforms (#73012 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/72910 `last_dim_size` is the expected output size for the Hermitian-compressed dimension and must be > 0. The confusingly named `ld` represents the input's last dim size which is calculated as `last_dim_size / 2 + 1` so could never be 0. Pull Request resolved: https://github.com/pytorch/pytorch/pull/73012 Reviewed By: ngimel Differential Revision: D34387147 Pulled By: mruberry fbshipit-source-id: 6b410088efe2a9e117a5c6d8beefda370363dbb0 (cherry picked from commit `f8d771ed36`)	2022-02-23 03:33:26 +00:00
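The size relation mentioned in the fft commit above (`9ea6db4aca`) can be written out as a small helper. This is only a sketch: `onesided_input_length` is a hypothetical name, and the exception type is an assumption; the commit itself only states that `last_dim_size` must be > 0 and that `ld` is `last_dim_size / 2 + 1`.

```python
def onesided_input_length(last_dim_size: int) -> int:
    """Length of the Hermitian-compressed input dimension for a real output
    of length last_dim_size, using integer division as in `last_dim_size / 2 + 1`."""
    if last_dim_size <= 0:
        # The output size is what has to be validated; the exception type here
        # is an assumption made for this sketch.
        raise ValueError(f"invalid output size {last_dim_size}; must be > 0")
    return last_dim_size // 2 + 1


for n in (1, 2, 7, 8):
    print(n, onesided_input_length(n))
```

Because the result is at least 1 for any non-negative `last_dim_size`, checking the derived input length can never catch an invalid output size, which is why the check belongs on `last_dim_size` itself.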
Terry Chen	16e2f5d291	[quant] Add ConvTranspose reference module - Reland #73031 (#73094 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73094 Add ConvTranspose reference module Test Plan: python3 test/test_quantization.py TestQuantizeEagerOps.test_conv_transpose_2d Imported from OSS Reviewed By: jerryzh168 Differential Revision: D34352228 fbshipit-source-id: 03062d6b441bc5a3298ec094f421a69c4c3d5c40 (cherry picked from commit `2f2bdd4fcf`)	2022-02-23 02:31:42 +00:00
Xiao Wang	2051068233	Change how cuda available memory is calculated in largeTensorTest decorator (#72207 ) Summary: Related PR https://github.com/pytorch/pytorch/issues/45332 Related discussion https://github.com/pytorch/pytorch/pull/45332#issuecomment-985996064 Pull Request resolved: https://github.com/pytorch/pytorch/pull/72207 Reviewed By: ngimel Differential Revision: D34387921 Pulled By: mruberry fbshipit-source-id: 2d842a25a5d3d1fc48917ba8fb29ff96d7bc2650 (cherry picked from commit `01a9e980c7`)	2022-02-23 02:31:42 +00:00
Carlos Mocholí	491ee70e6e	Avoid `collections` deprecation warning (#72239 ) Summary: Avoids the following deprecation warning: ```python loss.backward(args, kwargs) /usr/local/lib/python3.7/dist-packages/torch/tensor.py:245: in backward torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs) /usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py:147: in backward allow_unreachable=True, accumulate_grad=True) # allow_unreachable flag /usr/local/lib/python3.7/dist-packages/torch/autograd/function.py:89: in apply return self._forward_cls.backward(self, args) # type: ignore /usr/local/lib/python3.7/dist-packages/torch/nn/parallel/_functions.py:34: in backward return (None,) + ReduceAddCoalesced.apply(ctx.input_device, ctx.num_inputs, *grad_outputs) /usr/local/lib/python3.7/dist-packages/torch/nn/parallel/_functions.py:45: in forward return comm.reduce_add_coalesced(grads_, destination) /usr/local/lib/python3.7/dist-packages/torch/nn/parallel/comm.py:143: in reduce_add_coalesced flat_result = reduce_add(flat_tensors, destination) /usr/local/lib/python3.7/dist-packages/torch/nn/parallel/comm.py:96: in reduce_add nccl.reduce(inputs, output=result, root=root_index) /usr/local/lib/python3.7/dist-packages/torch/cuda/nccl.py:69: in reduce _check_sequence_type(inputs) /usr/local/lib/python3.7/dist-packages/torch/cuda/nccl.py:48: in _check_sequence_type if not isinstance(inputs, collections.Container) or isinstance(inputs, torch.Tensor): _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ name = 'Container' def __getattr__(name): # For backwards compatibility, continue to make the collections ABCs # through Python 3.6 available through the collections module. 
# Note, no new collections ABCs were added in Python 3.7 if name in _collections_abc.__all__: obj = getattr(_collections_abc, name) import warnings warnings.warn("Using or importing the ABCs from 'collections' instead " "of from 'collections.abc' is deprecated since Python 3.3," "and in 3.9 it will stop working", > DeprecationWarning, stacklevel=2) E DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3,and in 3.9 it will stop working /usr/lib/python3.7/collections/__init__.py:52: DeprecationWarning ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/72239 Reviewed By: ngimel Differential Revision: D34387815 Pulled By: mruberry fbshipit-source-id: 30c9b4fe518351bc9a6f211269e27ee3ab73a13c (cherry picked from commit `1f68cdfac5`)	2022-02-23 02:31:42 +00:00
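The warning quoted in the commit above (`491ee70e6e`) points at the usual remedy: import the abstract base classes from `collections.abc` rather than from `collections`. A minimal sketch, assuming the fix takes that route; `is_container` is a hypothetical helper, not the function in `torch/cuda/nccl.py`.

```python
from collections.abc import Container  # not `collections.Container`


def is_container(obj: object) -> bool:
    """True if obj supports the `in` operator via __contains__."""
    return isinstance(obj, Container)


print(is_container([1, 2, 3]), is_container(42))
```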
Peter Bell	facd6f0bea	Unpin librosa and update SciPy pin (#72834 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72834 This removes the upper bound to librosa's pin and updates the scipy pin since librosa 0.9 requires SciPy 1.2 or newer. Test Plan: Imported from OSS Reviewed By: ngimel Differential Revision: D34386898 Pulled By: mruberry fbshipit-source-id: db654bd337b474cd5a2ff8dbb9a659ed272728cf (cherry picked from commit `4790e8180c`)	2022-02-23 02:31:42 +00:00
Peter Bell	0947521268	Update stft tests to support latest librosa (#72833 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72833 Closes #72550 The latest version of librosa breaks backward compatibility in two ways: - Everything except the input tensor is now keyword-only - `pad_mode` now defaults to `'constant'` for zero-padding https://librosa.org/doc/latest/generated/librosa.stft.html This changes the test to match the old behavior even when using the new library and updates the documentation to explicitly say that `torch.stft` doesn't exactly follow the librosa API. This was always true (`torch.stft` has new arguments, a different default window and supports complex input), but it can't hurt to be explicit. Test Plan: Imported from OSS Reviewed By: ngimel Differential Revision: D34386897 Pulled By: mruberry fbshipit-source-id: 6adc23f48fcb368dacf70602e9197726d6b7e0c1 (cherry picked from commit `b5c5ed4196`)	2022-02-23 02:31:42 +00:00
Cody Yu	1ef244e003	Fix tensor.__deepcopy__ for lazy device (#73197 ) Summary: Fixes a small bug where `lazy` was missing from tensor.__deepcopy__, which resulted in a segmentation fault when deep-copying a lazy model. Pull Request resolved: https://github.com/pytorch/pytorch/pull/73197 Reviewed By: jbschlosser Differential Revision: D34394482 Pulled By: wconstab fbshipit-source-id: c84fdb9b3a827677971fd3477a92679d7dbce3c0 (cherry picked from commit `c003d150ce`)	2022-02-23 02:31:42 +00:00
Neeraj Pradhan	af902102e0	Fix discrete sampler test to correctly run Chi2 test (#73251 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73251 Scipy's chisquare test requires that the observed frequencies sum to the same number as the expected frequencies. This modifies `_check_sampler_discrete` to ensure that the two match. See: https://github.com/scipy/scipy/issues/12282 for details. Test Plan: Unit tests pass on platform010 Reviewed By: r-barnes Differential Revision: D34402314 fbshipit-source-id: 995b4ddf668cfb551176d3bd21fb8415dfe96cc1 (cherry picked from commit `d81a133b0d`)	2022-02-23 02:31:42 +00:00
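The constraint behind the Chi2 commit above (`af902102e0`) is that observed and expected counts must have the same total. One way to satisfy it, shown here as a pure-Python sketch, is to scale the expected counts by the observed total. How `_check_sampler_discrete` actually does this is not stated in the commit message; `expected_counts` and `chi2_statistic` are hypothetical helpers.

```python
import math
from typing import List, Sequence


def expected_counts(probs: Sequence[float], observed: Sequence[int]) -> List[float]:
    """Scale probabilities so the expected counts sum to the observed total."""
    total = sum(observed)
    norm = sum(probs)  # guards against probabilities that do not sum exactly to 1
    return [total * p / norm for p in probs]


def chi2_statistic(observed: Sequence[float], expected: Sequence[float]) -> float:
    """Pearson chi-square statistic: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [18, 30, 52]
expected = expected_counts([0.2, 0.3, 0.5], observed)
assert math.isclose(sum(expected), sum(observed))
print(chi2_statistic(observed, expected))
```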
Peter Bell	3d9ec11fea	Quantized LSTM/GRU: Remove legacy API support (#72522 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72522 Ref #72263 for cpp_custom_type_hack removal These overloads were deprecated in #35787 which was in the PyTorch 1.6 release, so the BC period is well expired. cc jamesr66a Test Plan: Imported from OSS Reviewed By: bdhirsh Differential Revision: D34111271 Pulled By: albanD fbshipit-source-id: 0078564188133625ca67137975fd5dd2fa2b4827 (cherry picked from commit `4f9c5a3ed7`)	2022-02-23 01:29:30 +00:00
Eli Uriegas	4267e6e55e	Fix formatting issues for onnx Summary: These are formatting changes automatically done with `arc f` to deal with issues landing the onnx changes in this stack {F703786210} Test Plan: yeah_sandcastle Reviewed By: malfet Differential Revision: D34402111 fbshipit-source-id: 06eb352d1e4f8b1439a580148fe1060fb5c9e102 (cherry picked from commit `7bbf29ed8e`)	2022-02-22 23:31:13 +00:00
BowenBao	cc2aad2ef2	[ONNX] Add symbolic for torch.addcmul (#72126 ) * Add addcmul op * Remove required_grad Pull Request resolved: https://github.com/pytorch/pytorch/pull/73101	2022-02-22 22:48:18 +00:00
BowenBao	28bf2f80cf	Don't call _jit_pass_onnx_function_extraction if export_modules_as_functions is False (#69742 ) * fix clang-format violations * Don't call _jit_pass_onnx_function_extraction if export_modules_as_functions is False It's just wasteful. Pull Request resolved: https://github.com/pytorch/pytorch/pull/73100	2022-02-22 22:43:53 +00:00
francescocastelli	cbb2df541a	Added check for unsupported dispatch key in codegen (#67961 ) Summary: Added a check that the dispatch keys present in native_function.yaml are part of the fixed set of supported dispatch keys; if one is not, codegen signals an error. I also removed two dispatch keys from the function schema `copy_` because they are not supported (SparseHIP, SparseXPU). Pull Request resolved: https://github.com/pytorch/pytorch/pull/67961 Test Plan: this function schema (for example) in native_function.yaml ``` - func: native_norm(Tensor self, Scalar p=2) -> Tensor dispatch: SparseCPU, SparseCUDA, SparseHIP: norm_sparse ``` now generates this error during codegen: `AssertionError: SparseHIP is not a supported dispatch key.` Fixes https://github.com/pytorch/pytorch/issues/66190 Reviewed By: albanD Differential Revision: D34327853 Pulled By: ezyang fbshipit-source-id: 6959d14a7752aefd025baa482d56547b4ed69b4c (cherry picked from commit `26bea380af`)	2022-02-22 22:31:47 +00:00
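The kind of check described in the codegen commit above (`cbb2df541a`) can be sketched as follows. The key set and the `parse_dispatch_keys` helper are assumptions made for illustration; they are not the real codegen's data or API. Only the error text is taken from the commit message.

```python
from typing import List

# Assumed, deliberately incomplete set of supported keys (illustration only).
SUPPORTED_DISPATCH_KEYS = frozenset({"CPU", "CUDA", "SparseCPU", "SparseCUDA"})


def parse_dispatch_keys(entry: str) -> List[str]:
    """Split a 'Key1, Key2: kernel_name' entry and validate every key."""
    keys_part, _, _kernel = entry.partition(":")
    keys = [k.strip() for k in keys_part.split(",")]
    for key in keys:
        if key not in SUPPORTED_DISPATCH_KEYS:
            raise AssertionError(f"{key} is not a supported dispatch key.")
    return keys


try:
    parse_dispatch_keys("SparseCPU, SparseCUDA, SparseHIP: norm_sparse")
except AssertionError as err:
    print(f"AssertionError: {err}")
```

Failing at parse time means an unsupported key is reported when the YAML is read rather than surfacing later as a missing kernel.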
Yedidya Feldblum	7a5b0efc64	[caffe2] fix build failures in optimized builds under clang Summary: There are various possible approaches, but the approach chosen minimizes disruption to source control blame. Addresses: ``` error: Function _ZN23FunctionalTest_Pad_Test8TestBodyEv is too big to optimize [-Werror,-Wignored-optimization-argument] ``` Test Plan: buck2 build mode/opt caffe2/test/cpp/api:functional Reviewed By: jamesr66a Differential Revision: D34027291 fbshipit-source-id: 9dfd771ad56d3d4bc0d41b38b04654c8dae7c006 (cherry picked from commit `d43b5a7ed6`)	2022-02-22 22:31:47 +00:00
Richard Barnes	600f4bf20c	Clean up some unused variable warnings (#73151 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73151 Test Plan: Sandcastle Reviewed By: malfet Differential Revision: D34365492 fbshipit-source-id: d9eaa2e21aacd8ff0b97152e590d83f682df4667 (cherry picked from commit `ca0efc53db`)	2022-02-22 21:30:14 +00:00
hauntsaninja	e9c64168d9	Import packaging.version in torch_version, if available (#71902 ) Summary: Resolves https://github.com/pytorch/pytorch/issues/71280 We used to use `from pkg_resources import packaging`. To recap, this has three potential problems: 1) `pkg_resources` is a really slow import 2) We have an undeclared runtime dependency on `setuptools` 3) We're relying on `pkg_resources`'s secret vendored copy of `packaging`. This is obviously not part of the public API of `pkg_resources`. In https://github.com/pytorch/pytorch/issues/71345 this was made a lazy import, which is great! It means we don't run into these problems as long as users don't use `torch.__version__`. This change additionally helps further address problems 1 and 3, by directly importing `packaging`, if present, and only falling back to the vendored copy in `pkg_resources`. Benchmark for speed difference in a virtual environment with a couple hundred packages installed: ``` λ hyperfine -w 2 'python -c "from pkg_resources import packaging"' 'python -c "import packaging.version"' Benchmark 1: python -c "from pkg_resources import packaging" Time (mean ± σ): 706.7 ms ± 77.1 ms [User: 266.5 ms, System: 156.8 ms] Range (min … max): 627.9 ms … 853.2 ms 10 runs Benchmark 2: python -c "import packaging.version" Time (mean ± σ): 53.8 ms ± 8.5 ms [User: 34.8 ms, System: 14.4 ms] Range (min … max): 46.3 ms … 72.3 ms 53 runs 'python -c "import packaging.version"' ran 13.14 ± 2.52 times faster than 'python -c "from pkg_resources import packaging"' ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/71902 Reviewed By: mikaylagawarecki Differential Revision: D34343145 Pulled By: malfet fbshipit-source-id: a6bd7ecf0cbb6b5c20ab18a22576aa2df9eb3324 (cherry picked from commit `0a249044c8`)	2022-02-22 21:30:14 +00:00
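The "import directly if present, otherwise fall back" approach from the commit above (`e9c64168d9`) is an instance of a general pattern, sketched here with the standard library only. `import_first_available` is a hypothetical helper, and the demo uses a deliberately missing module name plus `json` so that it runs without `packaging` or `setuptools` installed.

```python
import importlib
from types import ModuleType
from typing import Sequence


def import_first_available(names: Sequence[str]) -> ModuleType:
    """Return the first module in names that imports successfully."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # try the next, less preferred candidate
    raise ImportError(f"none of {list(names)} could be imported")


# Preferred candidate is missing, so the fallback is used.
mod = import_first_available(["module_that_does_not_exist_123", "json"])
print(mod.__name__)
```

For the case in the commit, the preferred candidate would be `packaging.version` and the fallback the copy vendored in `pkg_resources`. Since the vendored copy is reached through a different attribute path, real code would need a small adapter around the fallback branch.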
Alban Desmaison	7e919bd3c6	add dry run option and improve test list printing Pull Request resolved: https://github.com/pytorch/pytorch/pull/73208	2022-02-22 20:45:41 +00:00
Samantha Andow	53faf78143	expanded weights without fast rules (#70140 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70140 [Design Doc for Expanded Weights](https://gist.github.com/samdow/fa0a164fec7963f93ff45284989cfc55) <-- gives an overview of the design for Expanded Weights Introduces the ExpandedWeights mechanism and user-facing API without any custom implemented, faster rules. - User facing API is in `_stateless.py` (with documentation) - Testing is in test_expanded_weights - The rest is the implementation of the erroring fallback + the mechanism for being able to register faster per sample grad rules. Only linear is implemented here, but they are all implemented in #70141 Test Plan: Imported from OSS Reviewed By: mikaylagawarecki Differential Revision: D34350950 Pulled By: samdow fbshipit-source-id: 69c664b0bc3dff6951358d79d7e5d94882f7aef2 (cherry picked from commit `ae1620d3b6`)	2022-02-22 20:35:16 +00:00
Alban Desmaison	7807a83f6e	Fix error handling TestSetDefaultMobileCPUAllocator Pull Request resolved: https://github.com/pytorch/pytorch/pull/73207	2022-02-22 19:45:49 +00:00
Nikita Shulga	cfb6c942fe	`scatter_reduce` documentation (#73125 ) Summary: Reland of https://github.com/pytorch/pytorch/issues/68580 (which were milestoned for 1.11) plus partial revert of https://github.com/pytorch/pytorch/pull/72543 Pull Request resolved: https://github.com/pytorch/pytorch/pull/73125 Reviewed By: bdhirsh Differential Revision: D34355217 Pulled By: malfet fbshipit-source-id: 325ecdeaf53183d653b44ee5e6e8839ceefd9200 (cherry picked from commit `71db31748a`)	2022-02-22 19:33:46 +00:00
Nikita Shulga	e12c57a35b	[ONNX] Apply clang-format changes (#73220 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73220 Test Plan: CI Reviewed By: seemethere Differential Revision: D34395058 fbshipit-source-id: dd043f32ba4e33f1ceeffbf432942a850488e628 (cherry picked from commit `c5265e90c7`)	2022-02-22 19:33:46 +00:00
Scott Wolchok	28339ddc25	[PyTorch] Hit fused addmm path in linear() for existing MHA (#72871 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72871 We do this same trick in the native MHA implementation; backport it for purposes of fair comparison. ghstack-source-id: 149526858 Test Plan: CI Reviewed By: ngimel Differential Revision: D34176090 fbshipit-source-id: 8b578c29c4dcf0d85bae74dfbbb82db9a8f32dc7 (cherry picked from commit `fd50170935`)	2022-02-22 19:33:46 +00:00
Nikita Shulga	8625623e86	Update clang-format hash It was out-of-date, rendering lint/clang-format a no-op Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/73225	2022-02-22 19:24:53 +00:00
Eli Uriegas	0bcf190c7a	.github: Create superuser group for GHF Creates the superuser group for GHF to allow for any changes reviewed by these individuals to be automatically merged using our GHF tooling Signed-off-by: Eli Uriegas <eliuriegas@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/73221	2022-02-22 19:14:33 +00:00
Nikita Shulga	9a96604800	Revert D34318185: [pytorch][PR] Ensure that call before redispatch work well for PythonTLSSnapshot Test Plan: revert-hammer Differential Revision: D34318185 (`04c9e52ecc`) Original commit changeset: abc30fe69176 Original Phabricator Diff: D34318185 (`04c9e52ecc`) fbshipit-source-id: ba40c2e1eceb1c4b71ac6edefc64d01e174d9524 (cherry picked from commit `f47961904d`)	2022-02-22 18:31:13 +00:00
Pavithran Ramachandran	932adf26e4	[easy][PyTorch][CleanUp] Removing unused function def (missing function implementation) (#73019 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73019 fb: Code search shows no usage https://www.internalfb.com/code/search?q=repo%3Aall%20writeMobileMetadata&hide_uninteresting=0&hide_tests=0 ghstack-source-id: 149381949 Test Plan: CI Reviewed By: larryliu0820 Differential Revision: D34306823 fbshipit-source-id: b405e5683113bd4ff2e89eec023ae9ebb25c3dc9 (cherry picked from commit `a72621fbbd`)	2022-02-22 17:31:32 +00:00
Vasiliy Kuznetsov	6d86dc5390	dbr quant: store auto_quant_state on the top level model (#72934 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72934 Before this PR, DBR quantization had a limitation on handling user code which iterates over all module children. For example, imagine a forward function such as ``` def forward(self, x): for module in self: x = module(x) return x ``` Before this PR, this code would break with DBR quantization, because we attach `AutoQuantizationState` objects to each child, and those objects live in the child's module hierarchy and will appear in these kinds of iterations, changing the meaning of the user program. This PR reduces the scope of this problem to just the top level module. Instead of attaching `AutoQuantizationState` objects to each child, we register them in a map on the parent. Here is a before and after: ``` // toy model model \|--> child1 // toy model with AutoQuantizationState objects, before this PR model \|--> child1 \| \|--> _auto_quant_state \|--> _auto_quant_state // toy model with AutoQuantizationState objects, after this PR model \|--> child1 \|--> _fqn_to_auto_quant_state_map \|--> ( ) --> _auto_quant_state // of `model` \|--> (child1) --> _auto_quant_state // of `model.child1` ``` Note: `child1._auto_quant_state` works as before for convenience, but the `child1` object now stores a soft link to its `_auto_quant_state` instead of properly registering it in its module hierarchy. This is somewhat hacky. If we need to improve this in the future, we could remove this soft link and refactor the code to call the FQN map instead. Note: if the top level module iterates over its children, things will still be broken. This is less likely, and we will recommend that the user work around this by wrapping their model, or checking for the `AutoQuantizationStateModuleDict` type in their iteration loop. The impact of this change should be an improvement of coverage of user models. 
In fact, we expect this to drive our coverage of torchbenchmark models from 89% to 100%. Test Plan: ``` // previously disabled test cases with user code iterating // over module children are now enabled, with wrappers python test/test_quantization.py -k test_module_calls_items python test/test_quantization.py -k test_vovnet_sequential ``` Reviewed By: dzdang Differential Revision: D34281074 Pulled By: vkuzo fbshipit-source-id: 0e25fc1ec529c47f72478a1875fe43219feac6b1 (cherry picked from commit `4008f89967`)	2022-02-22 17:31:32 +00:00
Andrew Gu	c30659ffcc	[ZeRO] (Reland) Add ctor support for multiple param groups (#72932 ) Summary: Reland of https://github.com/pytorch/pytorch/pull/72578. Overview Windows CI was failing due to the multi-rank single-GPU case (see [here](https://github.com/pytorch/pytorch/runs/5204906995?check_suite_focus=true)). To address this, I - added `common_distributed.skip_if_no_gpu` for `test_multiple_param_groups()` to ensure that each rank can safely call `to(self.device)` -- this targets the expected SPSD use case where each rank has its own GPU; - moved `test_constructor()` back to `TestZeroRedundancyOptimizerSingleRank` to check that the multiple parameter group method for construction works even on a single rank. Test Plan - I checked both tests for CPU, 1 GPU, 2 GPUs, 4 GPUs, and 8 GPUs. - I added the `ciflow/win` label to run the failing Windows CI test. Pull Request resolved: https://github.com/pytorch/pytorch/pull/72932 Reviewed By: rohan-varma Differential Revision: D34281482 Pulled By: awgu fbshipit-source-id: c4fe604ddd9d2c123c3071249741e6b8a6454b6e (cherry picked from commit `6bea9bcc63`)	2022-02-22 16:29:55 +00:00
Facebook Community Bot	1d404727c5	Automated submodule update: FBGEMM (#73061 ) Summary: This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM). New submodule commit: `51344755fe` Pull Request resolved: https://github.com/pytorch/pytorch/pull/73061 Test Plan: Ensure that CI jobs succeed on GitHub before landing. Reviewed By: jspark1105, jiecaoyu Differential Revision: D34331487 fbshipit-source-id: 39cc6d4c0c7a0c8ee26cb385966123990f9e6eda (cherry picked from commit `53919f8173`)	2022-02-22 16:29:55 +00:00
Alban Desmaison	04c9e52ecc	Ensure that call before redispatch work well for PythonTLSSnapshot (#73045 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73045 Reviewed By: zou3519 Differential Revision: D34318185 Pulled By: albanD fbshipit-source-id: abc30fe69176ba474e28bb045406a410e17cfd79 (cherry picked from commit `4d9a305d3a`)	2022-02-22 15:30:07 +00:00
Adam Costarino	849c6a526e	Extrapolated on equiv between linalg @ and solve (#71769 ) Summary: Potentially fixes https://github.com/pytorch/pytorch/issues/71385 similar docstring could also fix https://github.com/pytorch/pytorch/issues/71384 Updated the doc to `torch.linalg.inv` to include nuance around equivalence to `torch.linalg.solve`: Update is below: ``` .. note:: Consider using :func:`torch.linalg.solve` if possible for multiplying a matrix on the left by the inverse, as:: linalg.solve(A, B) == linalg.inv(A) @ B # When B is a matrix It is always preferred to use :func:`~solve` when possible, as it is faster and more numerically stable than computing the inverse explicitly. ``` IvanYashchuk please inform if this is the right direction or over-extrapolation. I can apply the same changes to the `tensorinv` doc to fix https://github.com/pytorch/pytorch/issues/71384. Also in https://github.com/pytorch/pytorch/issues/71384 there was a mention of updating `torch.matmul` error message to indicate the proper tensor shapes, I could also potentially do that in this PR if needed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/71769 Reviewed By: H-Huang Differential Revision: D34242541 Pulled By: mruberry fbshipit-source-id: 40e98dad4d821928d1dea72d4512ee579b690a32 (cherry picked from commit `a0321a5de9`)	2022-02-22 12:29:32 +00:00
Linbin Yu	99bcadced4	improve android instrumentation test and update README Added tests for lite interpreter. By default the run_test.sh will use lite interpreter, unless manually set BUILD_LITE_INTERPRETER=0 Also fixed model generation script for android instrumentation test and README. Verified test can pass for both full jit and lite interpreter. Also tested on emulator and real device using different abis. Lite interpreter ``` ./scripts/build_pytorch_android.sh x86 ./android/run_tests.sh ``` Full JIT ``` BUILD_LITE_INTERPRETER=0 ./scripts/build_pytorch_android.sh x86 BUILD_LITE_INTERPRETER=0 ./android/run_tests.sh ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/72736	2022-02-22 08:05:33 +00:00
Richard Barnes	c2255c36ec	Fix binary search in bisect_percentile_op (#73146 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73146 Binary search can overflow; this fixes it. Test Plan: Sandcastle Reviewed By: meyering Differential Revision: D34365186 fbshipit-source-id: f92a810b49ef5ce345d0b019b584fe3c1f5ae017 (cherry picked from commit `9c2133ec6f`)	2022-02-21 22:30:32 +00:00
Nikita Shulga	56aae5beca	Update on "Add BUILD_LAZY_CUDA_LINALG option" When enabled, it generates the `torch_cuda_linalg` library, which depends on cusolver and magma and registers dynamic bindings to it from LinearAlgebraStubs Differential Revision: [D33992795](https://our.internmc.facebook.com/intern/diff/D33992795) [ghstack-poisoned]	2022-02-21 19:05:02 +00:00
Nikita Shulga	863135a54d	Update base for Update on "Add BUILD_LAZY_CUDA_LINALG option" When enabled, it generates the `torch_cuda_linalg` library, which depends on cusolver and magma and registers dynamic bindings to it from LinearAlgebraStubs Differential Revision: [D33992795](https://our.internmc.facebook.com/intern/diff/D33992795) [ghstack-poisoned]	2022-02-21 19:05:02 +00:00
Nikita Shulga	5dad19fef0	Back out "[pytorch][PR] add BFloat16 sparse operators on CPU: copy, coalesce, sparse_mask, ad…" Summary: Original commit changeset: f1274125234a Original Phabricator Diff: D34343016 (`c6f56599bb`) Test Plan: Abovementioned PR regressed OSS CI Reviewed By: atalman Differential Revision: D34379703 fbshipit-source-id: bc624cfd86249dde2fac635d9b66f08f86b4aed9 (cherry picked from commit `e52827f1ae`)	2022-02-21 18:31:51 +00:00
Taylor Robie	9f541aa3ac	[Profiler] Optimize `reportMemoryUsage` (#71538 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71538 `reportMemoryUsage` is kind of awful. It does a bunch of string writes and such that makes it VERY expensive. Just moving that work off the hot path reduces the overhead for `profile_memory` from ~6.5 us to ~1.2 us. (85% reduction in the kineto contribution to profiling overhead.) Test Plan: Ran ubenchmark with `--op empty --stressTestKineto --kinetoProfileMemory` Reviewed By: swolchok Differential Revision: D32730167 fbshipit-source-id: fe18e8fa3881967cad8fa1c26c71c805e9b034e5 (cherry picked from commit `0d394cb252`)	2022-02-20 23:29:13 +00:00
Richard Barnes	24c91e23d3	Fix nasty bug in bisect_percentile_op (#73147 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73147 Code used `reserve` instead of `resize` leading to platform010 test failures: ``` Trying example: test_bisect_percentil_op_large( self=<caffe2.caffe2.python.operator_test.bisect_percentile_op_test.TestBisectPercentileOp testMethod=test_bisect_percentil_op_large>, N=20, lengths=[2, 2], max_value=100, discrete=False, p=0.0, gc=, dc=[], ) stderr: E0219 13:14:52.601948 995877 JustKnobsConfigeratorLoader.cpp:114] Failed to load config justknobs/movefast/knobs after 55000ms timeout E0219 13:14:52.602150 995877 JustKnobsConfigeratorLoader.cpp:114] Failed to load config justknobs/pytorch/compiler after 55000ms timeout test_bisect_percentil_op_large (caffe2.caffe2.python.operator_test.bisect_percentile_op_test.TestBisectPercentileOp) ... third-party-buck/platform010/build/libgcc/include/c++/trunk/bits/stl_vector.h:1045: std::vector::reference std::vector<int>::operator[](std::vector::size_type) [_Tp = int, _Alloc = std::allocator<int>]: Assertion '__n < this->size()' failed. * Aborted at 1645305292 (Unix time, try 'date -d 1645305292') * * Signal 6 (SIGABRT) (0x8556000f3225) received by PID 995877 (pthread TID 0x7f13a79c51c0) (linux TID 995877) (maybe from PID 995877, UID 34134) (code: -6), stack trace: * W0219 13:14:52.682251 995932 RetryingSender.cpp:433] Failed to make rpc. Sender name: pr-scubasing. Reason: apache::thrift::transport::TTransportException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused. 
@ 000000000000431b folly::symbolizer::(anonymous namespace)::signalHandler(int, siginfo_t, void) ./folly/experimental/symbolizer/SignalHandler.cpp:449 @ 0000000000000000 (unknown) @ 000000000009c9f3 __GI___pthread_kill ``` Test Plan: Sandcastle Reviewed By: luciang Differential Revision: D34365188 fbshipit-source-id: 65dcc23226c59096afd5fb3c338c3bd29c936ec3 (cherry picked from commit `a1d18e3e6a`)	2022-02-20 17:28:35 +00:00
Michael Suo	bf03d93496	Revert D33919683: [FSDP] Implement local_state_dict and load_local_state_dict Test Plan: revert-hammer Differential Revision: D33919683 (`d50643adcd`) Original commit changeset: c9f1b43ce04d Original Phabricator Diff: D33919683 (`d50643adcd`) fbshipit-source-id: c54c181edf8eb6a3bc509ed54d34ffdce11b93f5 (cherry picked from commit `4dfb50cd0d`)	2022-02-20 02:32:48 +00:00

44023 Commits