Commit Graph

44023 Commits

Author SHA1 Message Date
Nikita Shulga
5e30c44c03 Update on "Add BUILD_LAZY_CUDA_LINALG option"
When enabled, it generates the `torch_cuda_linalg` library, which depends on cuSOLVER and MAGMA and registers dynamic bindings to it from LinearAlgebraStubs

Differential Revision: [D33992795](https://our.internmc.facebook.com/intern/diff/D33992795)

[ghstack-poisoned]
2022-02-23 12:59:30 +00:00
Nikita Shulga
78fcbfb61e Update base for Update on "Add BUILD_LAZY_CUDA_LINALG option"
When enabled, it generates the `torch_cuda_linalg` library, which depends on cuSOLVER and MAGMA and registers dynamic bindings to it from LinearAlgebraStubs

Differential Revision: [D33992795](https://our.internmc.facebook.com/intern/diff/D33992795)

[ghstack-poisoned]
2022-02-23 12:59:30 +00:00
Jordan Fix
987f146185 [fx] Improve support for tuple subclasses such as NamedTuple (#73198)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73198

Previously, if an arg to an FX node was an instance of a tuple subclass, it was sanitized back to a plain tuple. For example, when setting an arg to a TensorMetadata object, which is a NamedTuple, it was stored as a tuple instead.

- Change `map_aggregate` to repack the tuple to `type(a)` when it's not directly a tuple (try/except for best attempt)
- During codegen, call `add_global` for `type(a)` if it's not directly a tuple.
- Add an option for an arg to provide a `_custom_fx_repr_fn` for use inside stringifying via `_format_arg`
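
The repacking idea in the first bullet can be sketched without FX. This is a minimal illustration, not the actual `map_aggregate` implementation: `TensorMeta` and `map_aggregate_sketch` are hypothetical names, and the handling of lists and dicts is an assumption added to keep the example self-contained.

```python
from typing import Any, Callable, NamedTuple


class TensorMeta(NamedTuple):
    """Hypothetical stand-in for a NamedTuple argument such as TensorMetadata."""
    shape: tuple
    dtype: str


def map_aggregate_sketch(a: Any, fn: Callable[[Any], Any]) -> Any:
    """Apply fn to every leaf of a nested tuple/list/dict structure."""
    if isinstance(a, tuple):
        mapped = tuple(map_aggregate_sketch(elem, fn) for elem in a)
        if type(a) is tuple:
            return mapped
        try:
            # Best attempt: rebuild the subclass from its positional fields.
            return type(a)(*mapped)
        except TypeError:
            # Subclass constructor has a different signature; fall back to a tuple.
            return mapped
    if isinstance(a, list):
        return [map_aggregate_sketch(elem, fn) for elem in a]
    if isinstance(a, dict):
        return {k: map_aggregate_sketch(v, fn) for k, v in a.items()}
    return fn(a)


meta = TensorMeta(shape=(2, 3), dtype="float32")
doubled = map_aggregate_sketch(meta, lambda x: x * 2 if isinstance(x, int) else x)
```

Here `doubled` is still a `TensorMeta`, so field access such as `doubled.shape` keeps working after the mapping.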

Test Plan: Added unit test coverage, where we inline the named tuple into arg/kwarg.

Reviewed By: jamesr66a

Differential Revision: D34381888

fbshipit-source-id: bd672a8542e2bba5aa604b448bec920efc256440
(cherry picked from commit 68f99c12dd)
2022-02-23 11:31:10 +00:00
Jan Zikes
715a0dc5c0 [PyTorch/d2go] fix optim _multi_tensor (#73215)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73215

Fixing an issue in optimizers from _multi_tensor, for `sgd_mt` introduced in 2cb03e926f

Reviewed By: mikaylagawarecki

Differential Revision: D34389034

fbshipit-source-id: ede153d52dca15909c6c022853589707f18dc8d1
(cherry picked from commit cc8a58e584)
2022-02-23 10:29:48 +00:00
CodemodService FBSourceClangFormatLinterBot
97898e5144 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D34412981

fbshipit-source-id: a7aa81c0c69bf731db37813f431d9f6ed6a6a355
(cherry picked from commit a43ea6d9fc)
2022-02-23 10:29:48 +00:00
CodemodService FBSourceGoogleJavaFormatLinterBot
3b1b4875f1 [AutoAccept][Codemod][FBSourceGoogleJavaFormatLinter] Daily arc lint --take GOOGLEJAVAFORMAT
Reviewed By: zertosh

Differential Revision: D34412756

fbshipit-source-id: da7424025c1d9b82b1f56a030f6b31ba08dd7b8b
(cherry picked from commit 736159d415)
2022-02-23 10:29:48 +00:00
BowenBao
bd4902d81f [ONNX] Add Squeeze/Unsqueeze dynamic dimensions support when opset >= 13 (#71158)
* Add Squeeze/Unsqueeze dynamic axes support when opset >= 13

Co-authored-by: hwangdeyu <dejack953@outlook.com>
Co-authored-by: Gary Miguel <garymm@garymm.org>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73104
2022-02-23 06:41:15 +00:00
BowenBao
80291dff43 [ONNX] Add torch.nan_to_num and torch.maximum/minimum symbolic (#72090)
* Add nan_to_num symbolic

* Restructure if statements

* Add torch.maximum and torch.minimum support

* Squash tests

* Add dependency on input dtype

* Add documentation

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73103
2022-02-23 06:38:11 +00:00
BowenBao
40de6b80ee [ONNX] Add infra for quantized model export and support quantized mobilenet v3 (#72215)
* Add infrastructure and helper functions to enable future work for other quantized operators and models.
* Add export for quantized operators needed by torchvision mobilenet v3 large.
    * ATen namespace: hardsigmoid, flatten, adaptive_avg_pool, quantize_per_tensor, dequantize.
    * Quantized namespace: conv2d, conv2d_relu, hardswish, add, mul.
* Numerous bug fixes, in unpack_quantized_weight.cpp, symbolic functions, and unit test.

Co-authored-by: BowenBao <bowbao@microsoft.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73102
2022-02-23 06:22:58 +00:00
Michael Melesse
785ebb9d6d [ROCM] Navi21 Enablement 3: Embedding kernels (#72809)
Summary:
This PR is a follow-up to the following PRs:
https://github.com/pytorch/pytorch/pull/69942
https://github.com/pytorch/pytorch/pull/72682

We are adding support for Navi21 GPUs, which have a warp size of 32. We cannot rely on a constant, so we have to look up the warp size dynamically when launching the kernel on the host side. Inside device functions this is not needed: the compiler can determine the correct warp size to substitute for the C10_WARP_SIZE constant.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72809

Reviewed By: mruberry

Differential Revision: D34400737

Pulled By: ngimel

fbshipit-source-id: 1a1374465d4006e485d4d11531a4c78ddb178cdf
(cherry picked from commit 94211fe1f0)
2022-02-23 04:26:58 +00:00
kshitij12345
299b40de50 [jiterator] stricter static_assert (#72576)
Summary:
* static_assert on `jiterator_stringify` usage in ROCm.
* static_assert for `complex<half>`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72576

Reviewed By: ngimel

Differential Revision: D34387640

Pulled By: mruberry

fbshipit-source-id: d58dbb062c9c301465b9b7e4a56ee3d64baaadf9
(cherry picked from commit 82d2a75519)
2022-02-23 03:33:26 +00:00
Peter Bell
9ea6db4aca fft: Fix invalid shape error for complex-to-real transforms (#73012)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/72910

`last_dim_size` is the expected output size for the
Hermitian-compressed dimension and must be > 0. The confusingly named
`ld` represents the input's last dim size which is calculated as
`last_dim_size / 2 + 1` so could never be 0.
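
The arithmetic above can be checked with a small sketch. The function name `onesided_input_length` is hypothetical and the error message is an assumption; only the relation `ld = last_dim_size / 2 + 1` (integer division) comes from the description.

```python
def onesided_input_length(last_dim_size: int) -> int:
    """Return the Hermitian-compressed input length for a requested output size."""
    # Validate the requested output size itself: the derived length below
    # is at least 1 even for a bad request, so checking it would miss the error.
    if last_dim_size < 1:
        raise ValueError(f"Invalid number of data points ({last_dim_size}) specified")
    return last_dim_size // 2 + 1


# Integer division means 0 // 2 + 1 == 1, so a check on `ld` can never fire.
ld_for_zero = 0 // 2 + 1
```

For an output size of 8 the input's last dimension has 5 elements, and for 7 it has 4.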

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73012

Reviewed By: ngimel

Differential Revision: D34387147

Pulled By: mruberry

fbshipit-source-id: 6b410088efe2a9e117a5c6d8beefda370363dbb0
(cherry picked from commit f8d771ed36)
2022-02-23 03:33:26 +00:00
Terry Chen
16e2f5d291 [quant] Add ConvTranspose reference module - Reland #73031 (#73094)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73094

Add ConvTranspose reference module

Test Plan:
python3 test/test_quantization.py TestQuantizeEagerOps.test_conv_transpose_2d

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D34352228

fbshipit-source-id: 03062d6b441bc5a3298ec094f421a69c4c3d5c40
(cherry picked from commit 2f2bdd4fcf)
2022-02-23 02:31:42 +00:00
Xiao Wang
2051068233 Change how cuda available memory is calculated in largeTensorTest decorator (#72207)
Summary:
Related PR https://github.com/pytorch/pytorch/issues/45332

Related discussion https://github.com/pytorch/pytorch/pull/45332#issuecomment-985996064

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72207

Reviewed By: ngimel

Differential Revision: D34387921

Pulled By: mruberry

fbshipit-source-id: 2d842a25a5d3d1fc48917ba8fb29ff96d7bc2650
(cherry picked from commit 01a9e980c7)
2022-02-23 02:31:42 +00:00
Carlos Mocholí
491ee70e6e Avoid collections deprecation warning (#72239)
Summary:
Avoids the following deprecation warning:

```python
    loss.backward(*args, **kwargs)
/usr/local/lib/python3.7/dist-packages/torch/tensor.py:245: in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py:147: in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
/usr/local/lib/python3.7/dist-packages/torch/autograd/function.py:89: in apply
    return self._forward_cls.backward(self, *args)  # type: ignore
/usr/local/lib/python3.7/dist-packages/torch/nn/parallel/_functions.py:34: in backward
    return (None,) + ReduceAddCoalesced.apply(ctx.input_device, ctx.num_inputs, *grad_outputs)
/usr/local/lib/python3.7/dist-packages/torch/nn/parallel/_functions.py:45: in forward
    return comm.reduce_add_coalesced(grads_, destination)
/usr/local/lib/python3.7/dist-packages/torch/nn/parallel/comm.py:143: in reduce_add_coalesced
    flat_result = reduce_add(flat_tensors, destination)
/usr/local/lib/python3.7/dist-packages/torch/nn/parallel/comm.py:96: in reduce_add
    nccl.reduce(inputs, output=result, root=root_index)
/usr/local/lib/python3.7/dist-packages/torch/cuda/nccl.py:69: in reduce
    _check_sequence_type(inputs)
/usr/local/lib/python3.7/dist-packages/torch/cuda/nccl.py:48: in _check_sequence_type
    if not isinstance(inputs, collections.Container) or isinstance(inputs, torch.Tensor):
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

name = 'Container'

    def __getattr__(name):
        # For backwards compatibility, continue to make the collections ABCs
        # through Python 3.6 available through the collections module.
        # Note, no new collections ABCs were added in Python 3.7
        if name in _collections_abc.__all__:
            obj = getattr(_collections_abc, name)
            import warnings
            warnings.warn("Using or importing the ABCs from 'collections' instead "
                          "of from 'collections.abc' is deprecated since Python 3.3,"
                          "and in 3.9 it will stop working",
>                         DeprecationWarning, stacklevel=2)
E           DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3,and in 3.9 it will stop working

/usr/lib/python3.7/collections/__init__.py:52: DeprecationWarning
```
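
A minimal sketch of the kind of change that avoids this warning, assuming the fix is to reference the ABC through `collections.abc`. `FakeTensor` is a hypothetical stand-in for `torch.Tensor` so the example runs without torch, and the error message is illustrative.

```python
import collections.abc


class FakeTensor:
    """Hypothetical stand-in for torch.Tensor (it also defines __contains__)."""

    def __contains__(self, item):
        return False


def check_sequence_type_sketch(inputs):
    # collections.abc.Container is the non-deprecated location of the ABC;
    # accessing collections.Container is what triggered the warning above.
    if not isinstance(inputs, collections.abc.Container) or isinstance(inputs, FakeTensor):
        raise TypeError("Inputs should be a collection of tensors")
```

A list of tensors passes the check, while a bare tensor or a non-container value raises `TypeError`.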

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72239

Reviewed By: ngimel

Differential Revision: D34387815

Pulled By: mruberry

fbshipit-source-id: 30c9b4fe518351bc9a6f211269e27ee3ab73a13c
(cherry picked from commit 1f68cdfac5)
2022-02-23 02:31:42 +00:00
Peter Bell
facd6f0bea Unpin librosa and update SciPy pin (#72834)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72834

This removes the upper bound to librosa's pin and updates the scipy
pin since librosa 0.9 requires SciPy 1.2 or newer.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D34386898

Pulled By: mruberry

fbshipit-source-id: db654bd337b474cd5a2ff8dbb9a659ed272728cf
(cherry picked from commit 4790e8180c)
2022-02-23 02:31:42 +00:00
Peter Bell
0947521268 Update stft tests to support latest librosa (#72833)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72833

Closes #72550

The latest version of librosa breaks backward compatibility in two
ways:
- Everything except the input tensor is now keyword-only
- `pad_mode` now defaults to `'constant'` for zero-padding

https://librosa.org/doc/latest/generated/librosa.stft.html

This changes the test to match the old behavior even when using the new
library and updates the documentation to explicitly say that
`torch.stft` doesn't exactly follow the librosa API. This was always
true (`torch.stft` has new arguments, a different default window,
and supports complex input), but it can't hurt to be explicit.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D34386897

Pulled By: mruberry

fbshipit-source-id: 6adc23f48fcb368dacf70602e9197726d6b7e0c1
(cherry picked from commit b5c5ed4196)
2022-02-23 02:31:42 +00:00
Cody Yu
1ef244e003 Fix tensor.__deepcopy__ for lazy device (#73197)
Summary:
A small bug: `lazy` was missing from tensor.__deepcopy__, which results in a segmentation fault when deep-copying a lazy model.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73197

Reviewed By: jbschlosser

Differential Revision: D34394482

Pulled By: wconstab

fbshipit-source-id: c84fdb9b3a827677971fd3477a92679d7dbce3c0
(cherry picked from commit c003d150ce)
2022-02-23 02:31:42 +00:00
Neeraj Pradhan
af902102e0 Fix discrete sampler test to correctly run Chi2 test (#73251)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73251

SciPy's chisquare test requires that the observed frequencies sum to the same number as the expected frequencies. This modifies `_check_sampler_discrete` to ensure that the two match. See https://github.com/scipy/scipy/issues/12282 for details.
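
One way to make the two totals agree is to scale the expected probabilities by the observed sample count. The sketch below is an assumption about the general technique, not the code in `_check_sampler_discrete`; both helper names are hypothetical and the statistic is computed by hand so the example needs no SciPy.

```python
from typing import List, Sequence


def expected_counts(probs: Sequence[float], observed: Sequence[int]) -> List[float]:
    """Scale probabilities so the expected counts sum to the observed total."""
    total = sum(observed)
    norm = sum(probs)  # renormalize in case the probabilities are truncated
    return [total * p / norm for p in probs]


def chi2_statistic(observed: Sequence[int], expected: Sequence[float]) -> float:
    """Pearson chi-square statistic: sum of (O - E)^2 / E over all bins."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [48, 32, 20]
expected = expected_counts([0.5, 0.3, 0.2], observed)
stat = chi2_statistic(observed, expected)
```

With these inputs the expected counts are approximately 50, 30 and 20, which sum to the same total as the observed counts.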

Test Plan: Unit tests pass on platform010

Reviewed By: r-barnes

Differential Revision: D34402314

fbshipit-source-id: 995b4ddf668cfb551176d3bd21fb8415dfe96cc1
(cherry picked from commit d81a133b0d)
2022-02-23 02:31:42 +00:00
Peter Bell
3d9ec11fea Quantized LSTM/GRU: Remove legacy API support (#72522)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72522

Ref #72263 for cpp_custom_type_hack removal

These overloads were deprecated in #35787 which was in the PyTorch 1.6
release, so the BC period is well expired.

cc jamesr66a

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D34111271

Pulled By: albanD

fbshipit-source-id: 0078564188133625ca67137975fd5dd2fa2b4827
(cherry picked from commit 4f9c5a3ed7)
2022-02-23 01:29:30 +00:00
Eli Uriegas
4267e6e55e Fix formatting issues for onnx
Summary:
These are formatting changes automatically done with `arc f` to deal with issues landing the onnx changes in this stack

Test Plan: yeah_sandcastle

Reviewed By: malfet

Differential Revision: D34402111

fbshipit-source-id: 06eb352d1e4f8b1439a580148fe1060fb5c9e102
(cherry picked from commit 7bbf29ed8e)
2022-02-22 23:31:13 +00:00
BowenBao
cc2aad2ef2 [ONNX] Add symbolic for torch.addcmul (#72126)
* Add addcmul op

* Remove required_grad

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73101
2022-02-22 22:48:18 +00:00
BowenBao
28bf2f80cf Don't call _jit_pass_onnx_function_extraction if export_modules_as_functions is False (#69742)
* fix clang-format violations

* Don't call _jit_pass_onnx_function_extraction if export_modules_as_functions is False

It's just wasteful.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73100
2022-02-22 22:43:53 +00:00
francescocastelli
cbb2df541a Added check for unsupported dispatch key in codegen (#67961)
Summary:
Added a check for the dispatch keys present in native_functions.yaml: they must be part of the fixed set of dispatch keys; if not, codegen signals an error. I also removed two dispatch keys from the function schema of copy_ because they are not supported (SparseHIP, SparseXPU).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67961

Test Plan:
this function schema (for example) in native_functions.yaml
```
- func: native_norm(Tensor self, Scalar p=2) -> Tensor
  dispatch:
    SparseCPU, SparseCUDA, SparseHIP: norm_sparse
```
now generates this error during codegen:  `AssertionError: SparseHIP is not a supported dispatch key.`
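
The check can be pictured with a short sketch. It is not the codegen's actual code: `SUPPORTED_DISPATCH_KEYS` is a hypothetical, deliberately tiny subset and `parse_dispatch_keys` is an illustrative helper; only the error message format is taken from the example above.

```python
from typing import List

# Hypothetical subset of the fixed set of dispatch keys, for illustration only.
SUPPORTED_DISPATCH_KEYS = frozenset({"CPU", "CUDA", "SparseCPU", "SparseCUDA"})


def parse_dispatch_keys(entry: str) -> List[str]:
    """Split a comma-separated dispatch entry and reject unknown keys."""
    keys = [key.strip() for key in entry.split(",")]
    for key in keys:
        if key not in SUPPORTED_DISPATCH_KEYS:
            raise AssertionError(f"{key} is not a supported dispatch key.")
    return keys
```

Calling `parse_dispatch_keys("SparseCPU, SparseCUDA, SparseHIP")` raises the `AssertionError` shown above, while `parse_dispatch_keys("SparseCPU, SparseCUDA")` returns both keys.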

Fixes https://github.com/pytorch/pytorch/issues/66190

Reviewed By: albanD

Differential Revision: D34327853

Pulled By: ezyang

fbshipit-source-id: 6959d14a7752aefd025baa482d56547b4ed69b4c
(cherry picked from commit 26bea380af)
2022-02-22 22:31:47 +00:00
Yedidya Feldblum
7a5b0efc64 [caffe2] fix build failures in optimized builds under clang
Summary:
There are various possible approaches, but the approach chosen minimizes disruption to source control blame.

Addresses:
```
error: Function _ZN23FunctionalTest_Pad_Test8TestBodyEv is too big to optimize [-Werror,-Wignored-optimization-argument]
```

Test Plan: buck2 build mode/opt caffe2/test/cpp/api:functional

Reviewed By: jamesr66a

Differential Revision: D34027291

fbshipit-source-id: 9dfd771ad56d3d4bc0d41b38b04654c8dae7c006
(cherry picked from commit d43b5a7ed6)
2022-02-22 22:31:47 +00:00
Richard Barnes
600f4bf20c Clean up some unused variable warnings (#73151)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73151

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D34365492

fbshipit-source-id: d9eaa2e21aacd8ff0b97152e590d83f682df4667
(cherry picked from commit ca0efc53db)
2022-02-22 21:30:14 +00:00
hauntsaninja
e9c64168d9 Import packaging.version in torch_version, if available (#71902)
Summary:
Resolves https://github.com/pytorch/pytorch/issues/71280

We used to use `from pkg_resources import packaging`. To recap, this has
three potential problems:
1) `pkg_resources` is a really slow import
2) We have an undeclared runtime dependency on `setuptools`
3) We're relying on `pkg_resources`'s secret vendored copy of
   `packaging`. This is obviously not part of the public API of
   `pkg_resources`.

In https://github.com/pytorch/pytorch/issues/71345 this was made a lazy import, which is great! It means we don't
run into these problems as long as users don't use `torch.__version__`.

This change additionally helps further address problems 1 and 3, by
directly importing `packaging`, if present, and only falling back to the
vendored copy in `pkg_resources`.
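
The "try the direct import first, then fall back" pattern can be sketched generically. `import_first_available` is a hypothetical helper, not the code in `torch_version`; it only shows the control flow and makes no claim about where the vendored copy lives.

```python
import importlib
from types import ModuleType
from typing import Optional


def import_first_available(*module_names: str) -> Optional[ModuleType]:
    """Return the first module that imports successfully, or None."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Covers ModuleNotFoundError too; move on to the next candidate.
            continue
    return None


# Prefer the standalone package; a slower fallback would be listed after it.
version_module = import_first_available("packaging.version")
```

Listing the cheap import first means the expensive fallback is only paid for when the preferred package is absent.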

Benchmark for speed difference in a virtual environment with a couple
hundred packages installed:
```
λ hyperfine -w 2 'python -c "from pkg_resources import packaging"' 'python -c "import packaging.version"'
Benchmark 1: python -c "from pkg_resources import packaging"
  Time (mean ± σ):     706.7 ms ±  77.1 ms    [User: 266.5 ms, System: 156.8 ms]
  Range (min … max):   627.9 ms … 853.2 ms    10 runs

Benchmark 2: python -c "import packaging.version"
  Time (mean ± σ):      53.8 ms ±   8.5 ms    [User: 34.8 ms, System: 14.4 ms]
  Range (min … max):    46.3 ms …  72.3 ms    53 runs
  'python -c "import packaging.version"' ran
   13.14 ± 2.52 times faster than 'python -c "from pkg_resources import packaging"'
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71902

Reviewed By: mikaylagawarecki

Differential Revision: D34343145

Pulled By: malfet

fbshipit-source-id: a6bd7ecf0cbb6b5c20ab18a22576aa2df9eb3324
(cherry picked from commit 0a249044c8)
2022-02-22 21:30:14 +00:00
Alban Desmaison
7e919bd3c6 add dry run option and improve test list printing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73208
2022-02-22 20:45:41 +00:00
Samantha Andow
53faf78143 expanded weights without fast rules (#70140)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70140

[Design Doc for Expanded Weights](https://gist.github.com/samdow/fa0a164fec7963f93ff45284989cfc55) <-- gives an overview of the design for Expanded Weights

Introduces the ExpandedWeights mechanism and user-facing API without any custom implemented, faster rules.
 - User facing API is in `_stateless.py` (with documentation)
 - Testing is in test_expanded_weights
 - The rest is the implementation of the erroring fallback + the mechanism for being able to register faster per sample grad rules. Only linear is implemented here, but they are all implemented in #70141

Test Plan: Imported from OSS

Reviewed By: mikaylagawarecki

Differential Revision: D34350950

Pulled By: samdow

fbshipit-source-id: 69c664b0bc3dff6951358d79d7e5d94882f7aef2
(cherry picked from commit ae1620d3b6)
2022-02-22 20:35:16 +00:00
Alban Desmaison
7807a83f6e Fix error handling TestSetDefaultMobileCPUAllocator
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73207
2022-02-22 19:45:49 +00:00
Nikita Shulga
cfb6c942fe scatter_reduce documentation (#73125)
Summary:
Reland of https://github.com/pytorch/pytorch/issues/68580 (which was milestoned for 1.11) plus a partial revert of https://github.com/pytorch/pytorch/pull/72543

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73125

Reviewed By: bdhirsh

Differential Revision: D34355217

Pulled By: malfet

fbshipit-source-id: 325ecdeaf53183d653b44ee5e6e8839ceefd9200
(cherry picked from commit 71db31748a)
2022-02-22 19:33:46 +00:00
Nikita Shulga
e12c57a35b [ONNX] Apply clang-format changes (#73220)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73220

Test Plan: CI

Reviewed By: seemethere

Differential Revision: D34395058

fbshipit-source-id: dd043f32ba4e33f1ceeffbf432942a850488e628
(cherry picked from commit c5265e90c7)
2022-02-22 19:33:46 +00:00
Scott Wolchok
28339ddc25 [PyTorch] Hit fused addmm path in linear() for existing MHA (#72871)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72871

We do this same trick in the native MHA implementation; backport it for purposes of fair comparison.
ghstack-source-id: 149526858

Test Plan: CI

Reviewed By: ngimel

Differential Revision: D34176090

fbshipit-source-id: 8b578c29c4dcf0d85bae74dfbbb82db9a8f32dc7
(cherry picked from commit fd50170935)
2022-02-22 19:33:46 +00:00
Nikita Shulga
8625623e86 Update clang-format hash
It was out-of-date, rendering lint/clang-format a no-op

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73225
2022-02-22 19:24:53 +00:00
Eli Uriegas
0bcf190c7a .github: Create superuser group for GHF
Creates the superuser group for GHF to allow for any changes reviewed by
these individuals to be automatically merged using our GHF tooling

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73221
2022-02-22 19:14:33 +00:00
Nikita Shulga
9a96604800 Revert D34318185: [pytorch][PR] Ensure that call before redispatch work well for PythonTLSSnapshot
Test Plan: revert-hammer

Differential Revision:
D34318185 (04c9e52ecc)

Original commit changeset: abc30fe69176

Original Phabricator Diff: D34318185 (04c9e52ecc)

fbshipit-source-id: ba40c2e1eceb1c4b71ac6edefc64d01e174d9524
(cherry picked from commit f47961904d)
2022-02-22 18:31:13 +00:00
Pavithran Ramachandran
932adf26e4 [easy][PyTorch][CleanUp] Removing unused function def (missing function implementation) (#73019)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73019

fb: Code search shows no usage https://www.internalfb.com/code/search?q=repo%3Aall%20writeMobileMetadata&hide_uninteresting=0&hide_tests=0
ghstack-source-id: 149381949

Test Plan: CI

Reviewed By: larryliu0820

Differential Revision: D34306823

fbshipit-source-id: b405e5683113bd4ff2e89eec023ae9ebb25c3dc9
(cherry picked from commit a72621fbbd)
2022-02-22 17:31:32 +00:00
Vasiliy Kuznetsov
6d86dc5390 dbr quant: store auto_quant_state on the top level model (#72934)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72934

Before this PR, DBR quantization had a limitation on handling user
code which iterates over all module children. For example, imagine
a forward function such as

```
def forward(self, x):
    for module in self:
        x = module(x)
    return x
```

Before this PR, this code would break with DBR quantization, because
we attach `AutoQuantizationState` objects to each child, and those
objects live in the child's module hierarchy and will appear in
these kinds of iterations, changing the meaning of the user program.

This PR reduces the scope of this problem to just the top level module.
Instead of attaching `AutoQuantizationState` objects to each child,
we register them in a map on the parent. Here is a before and after:

```
// toy model
model
 |--> child1

// toy model with AutoQuantizationState objects, before this PR
model
 |--> child1
 |  |--> _auto_quant_state
 |--> _auto_quant_state

// toy model with AutoQuantizationState objects, after this PR
model
 |--> child1
 |--> _fqn_to_auto_quant_state_map
    |--> ( ) --> _auto_quant_state // of `model`
    |--> (child1) --> _auto_quant_state // of `model.child1`
```

Note: `child1._auto_quant_state` works as before for convenience,
but the `child1` object now stores a soft link to its `_auto_quant_state`
instead of properly registering it in its module hierarchy. This is
somewhat hacky. If we need to improve this in the future, we could
remove this soft link and refactor the code to call the FQN map
instead.

Note: if the top level module iterates over its children, things will
still be broken. This is less likely, and we will recommend that the
user work around this by wrapping their model, or checking for the
`AutoQuantizationStateModuleDict` type in their iteration loop.

The impact of this change should be an improvement of coverage
of user models. In fact, we expect this to drive our coverage of
torchbenchmark models from 89% to 100%.
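
The before/after layouts can be mimicked with plain Python objects. This is a toy model under stated assumptions: `ToyModule`, `QuantState`, `attach_per_module` and `attach_on_root` are hypothetical names and none of this is the DBR implementation; it only shows why per-child state leaks into child iteration while a map on the root does not.

```python
class ToyModule:
    """Toy module whose iteration yields its registered children."""

    def __init__(self, **children):
        self.children = dict(children)

    def __iter__(self):
        return iter(self.children.values())


class QuantState:
    """Stand-in for an AutoQuantizationState object."""


def attach_per_module(module):
    # Old layout: every module registers its own state as an extra child.
    for child in list(module.children.values()):
        attach_per_module(child)
    module.children["_auto_quant_state"] = QuantState()


def attach_on_root(root):
    # New layout: one map on the root, keyed by fully qualified name (FQN).
    fqn_map = {}

    def visit(module, fqn):
        fqn_map[fqn] = QuantState()
        for name, child in module.children.items():
            visit(child, f"{fqn}.{name}" if fqn else name)

    visit(root, "")
    root.children["_fqn_to_auto_quant_state_map"] = fqn_map
    return fqn_map


old_model = ToyModule(child1=ToyModule(a=ToyModule(), b=ToyModule()))
attach_per_module(old_model)
new_model = ToyModule(child1=ToyModule(a=ToyModule(), b=ToyModule()))
state_map = attach_on_root(new_model)
```

Iterating over `old_model.children["child1"]` yields three items (the state object is mixed in), whereas the same loop over `new_model.children["child1"]` yields only the two real children.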

Test Plan:
```
// previously disabled test cases with user code iterating
// over module children are now enabled, with wrappers
python test/test_quantization.py -k test_module_calls_items
python test/test_quantization.py -k test_vovnet_sequential
```

Reviewed By: dzdang

Differential Revision: D34281074

Pulled By: vkuzo

fbshipit-source-id: 0e25fc1ec529c47f72478a1875fe43219feac6b1
(cherry picked from commit 4008f89967)
2022-02-22 17:31:32 +00:00
Andrew Gu
c30659ffcc [ZeRO] (Reland) Add ctor support for multiple param groups (#72932)
Summary:
Reland of https://github.com/pytorch/pytorch/pull/72578.

**Overview**
Windows CI was failing due to the multi-rank single-GPU case (see [here](https://github.com/pytorch/pytorch/runs/5204906995?check_suite_focus=true)).

To address this, I
- added `common_distributed.skip_if_no_gpu` for `test_multiple_param_groups()` to ensure that each rank can safely call `to(self.device)` -- this targets the expected SPSD use case where each rank has its own GPU;
- moved `test_constructor()` back to `TestZeroRedundancyOptimizerSingleRank` to check that the multiple parameter group method for construction works even on a single rank.

**Test Plan**
- I checked both tests for CPU, 1 GPU, 2 GPUs, 4 GPUs, and 8 GPUs.
- I added the `ciflow/win` label to run the failing Windows CI test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72932

Reviewed By: rohan-varma

Differential Revision: D34281482

Pulled By: awgu

fbshipit-source-id: c4fe604ddd9d2c123c3071249741e6b8a6454b6e
(cherry picked from commit 6bea9bcc63)
2022-02-22 16:29:55 +00:00
Facebook Community Bot
1d404727c5 Automated submodule update: FBGEMM (#73061)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: 51344755fe

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73061

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: jspark1105, jiecaoyu

Differential Revision: D34331487

fbshipit-source-id: 39cc6d4c0c7a0c8ee26cb385966123990f9e6eda
(cherry picked from commit 53919f8173)
2022-02-22 16:29:55 +00:00
Alban Desmaison
04c9e52ecc Ensure that call before redispatch work well for PythonTLSSnapshot (#73045)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73045

Reviewed By: zou3519

Differential Revision: D34318185

Pulled By: albanD

fbshipit-source-id: abc30fe69176ba474e28bb045406a410e17cfd79
(cherry picked from commit 4d9a305d3a)
2022-02-22 15:30:07 +00:00
Adam Costarino
849c6a526e Extrapolated on equiv between linalg @ and solve (#71769)
Summary:
Potentially fixes https://github.com/pytorch/pytorch/issues/71385; a similar docstring could also fix https://github.com/pytorch/pytorch/issues/71384.

Updated the docs for `torch.linalg.inv` to include nuance around its equivalence to `torch.linalg.solve`.

The update is below:
```
.. note::
    Consider using :func:`torch.linalg.solve` if possible for multiplying a matrix on the left by
    the inverse, as::

        linalg.solve(A, B) == linalg.inv(A) @ B  # When B is a matrix

    It is always preferred to use :func:`~solve` when possible, as it is faster and more
    numerically stable than computing the inverse explicitly.
```

IvanYashchuk, please let me know whether this is the right direction or an over-extrapolation. I can apply the same changes to the `tensorinv` doc to fix https://github.com/pytorch/pytorch/issues/71384. Also, https://github.com/pytorch/pytorch/issues/71384 mentions updating the `torch.matmul` error message to indicate the proper tensor shapes; I could potentially do that in this PR too if needed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71769

Reviewed By: H-Huang

Differential Revision: D34242541

Pulled By: mruberry

fbshipit-source-id: 40e98dad4d821928d1dea72d4512ee579b690a32
(cherry picked from commit a0321a5de9)
2022-02-22 12:29:32 +00:00
Linbin Yu
99bcadced4 improve android instrumentation test and update README
Added tests for the lite interpreter. By default, run_tests.sh uses the lite interpreter unless BUILD_LITE_INTERPRETER=0 is set manually.

Also fixed the model generation script for the android instrumentation test and the README.

Verified that the tests pass for both full JIT and the lite interpreter. Also tested on an emulator and a real device using different ABIs.

Lite interpreter
```
./scripts/build_pytorch_android.sh x86
./android/run_tests.sh
```

Full JIT
```
BUILD_LITE_INTERPRETER=0 ./scripts/build_pytorch_android.sh x86
BUILD_LITE_INTERPRETER=0 ./android/run_tests.sh
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72736
2022-02-22 08:05:33 +00:00
Richard Barnes
c2255c36ec Fix binary search in bisect_percentile_op (#73146)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73146

Binary search can overflow; this fixes it.
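
The summary does not say which expression overflowed. One common source in binary search is the midpoint `(lo + hi) / 2` with fixed-width integers, so the sketch below illustrates that case as an assumption. Python integers do not overflow, so `wrap_int32` is a hypothetical helper that simulates 32-bit wraparound.

```python
INT32_MAX = 2**31 - 1


def wrap_int32(x: int) -> int:
    """Simulate two's-complement wraparound of a 32-bit signed integer."""
    return (x + 2**31) % 2**32 - 2**31


def naive_mid(lo: int, hi: int) -> int:
    # lo + hi can exceed INT32_MAX and wrap to a negative value.
    return wrap_int32(lo + hi) // 2


def safe_mid(lo: int, hi: int) -> int:
    # hi - lo always fits when 0 <= lo <= hi, so no intermediate overflow.
    return lo + (hi - lo) // 2


lo, hi = 2**30, INT32_MAX - 1
broken = naive_mid(lo, hi)
fixed = safe_mid(lo, hi)
```

With these bounds `broken` is negative, while `fixed` lies between `lo` and `hi` as a midpoint should.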

Test Plan: Sandcastle

Reviewed By: meyering

Differential Revision: D34365186

fbshipit-source-id: f92a810b49ef5ce345d0b019b584fe3c1f5ae017
(cherry picked from commit 9c2133ec6f)
2022-02-21 22:30:32 +00:00
Nikita Shulga
56aae5beca Update on "Add BUILD_LAZY_CUDA_LINALG option"
When enabled, it generates the `torch_cuda_linalg` library, which depends on cuSOLVER and MAGMA and registers dynamic bindings to it from LinearAlgebraStubs

Differential Revision: [D33992795](https://our.internmc.facebook.com/intern/diff/D33992795)

[ghstack-poisoned]
2022-02-21 19:05:02 +00:00
Nikita Shulga
863135a54d Update base for Update on "Add BUILD_LAZY_CUDA_LINALG option"
When enabled, it generates the `torch_cuda_linalg` library, which depends on cuSOLVER and MAGMA and registers dynamic bindings to it from LinearAlgebraStubs

Differential Revision: [D33992795](https://our.internmc.facebook.com/intern/diff/D33992795)

[ghstack-poisoned]
2022-02-21 19:05:02 +00:00
Nikita Shulga
5dad19fef0 Back out "[pytorch][PR] add BFloat16 sparse operators on CPU: copy, coalesce, sparse_mask, ad…"
Summary:
Original commit changeset: f1274125234a

Original Phabricator Diff: D34343016 (c6f56599bb)

Test Plan: Abovementioned PR regressed OSS CI

Reviewed By: atalman

Differential Revision: D34379703

fbshipit-source-id: bc624cfd86249dde2fac635d9b66f08f86b4aed9
(cherry picked from commit e52827f1ae)
2022-02-21 18:31:51 +00:00
Taylor Robie
9f541aa3ac [Profiler] Optimize reportMemoryUsage (#71538)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71538

`reportMemoryUsage` is kind of awful. It does a bunch of string writes and such that makes it VERY expensive. Just moving that work off the hot path reduces the overhead for `profile_memory` from ~6.5 us to ~1.2 us. (85% reduction in the kineto contribution to profiling overhead.)
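
The general idea of moving string work off the hot path can be sketched as follows. This is an illustration of the technique only, not the profiler's code: `MemoryEventLog`, its method names and the output keys are all hypothetical.

```python
from typing import Dict, List, Tuple


class MemoryEventLog:
    """Record raw values on the hot path; build strings only when reporting."""

    def __init__(self) -> None:
        self._raw: List[Tuple[int, int, int]] = []

    def record(self, ptr: int, alloc_size: int, total_allocated: int) -> None:
        # Hot path: a single append, with no formatting or string allocation.
        self._raw.append((ptr, alloc_size, total_allocated))

    def materialize(self) -> List[Dict[str, str]]:
        # Cold path: the expensive string formatting happens once, afterwards.
        return [
            {"Addr": hex(ptr), "Bytes": str(size), "Total Allocated": str(total)}
            for ptr, size, total in self._raw
        ]


log = MemoryEventLog()
log.record(0x10, 256, 256)
log.record(0x10, -256, 0)
events = log.materialize()
```

Each `record` call stays cheap regardless of how verbose the final report is, because the formatting cost is paid once in `materialize`.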

Test Plan: Ran ubenchmark with `--op empty --stressTestKineto --kinetoProfileMemory`

Reviewed By: swolchok

Differential Revision: D32730167

fbshipit-source-id: fe18e8fa3881967cad8fa1c26c71c805e9b034e5
(cherry picked from commit 0d394cb252)
2022-02-20 23:29:13 +00:00
Richard Barnes
24c91e23d3 Fix nasty bug in bisect_percentile_op (#73147)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73147

Code used `reserve` instead of `resize` leading to platform010 test failures:
```
Trying example: test_bisect_percentil_op_large(
    self=<caffe2.caffe2.python.operator_test.bisect_percentile_op_test.TestBisectPercentileOp testMethod=test_bisect_percentil_op_large>,
    N=20,
    lengths=[2, 2],
    max_value=100,
    discrete=False,
    p=0.0,
    gc=,
    dc=[],
)

stderr:
E0219 13:14:52.601948 995877 JustKnobsConfigeratorLoader.cpp:114] Failed to load config justknobs/movefast/knobs after 55000ms timeout
E0219 13:14:52.602150 995877 JustKnobsConfigeratorLoader.cpp:114] Failed to load config justknobs/pytorch/compiler after 55000ms timeout
test_bisect_percentil_op_large (caffe2.caffe2.python.operator_test.bisect_percentile_op_test.TestBisectPercentileOp) ... third-party-buck/platform010/build/libgcc/include/c++/trunk/bits/stl_vector.h:1045: std::vector::reference std::vector<int>::operator[](std::vector::size_type) [_Tp = int, _Alloc = std::allocator<int>]: Assertion '__n < this->size()' failed.
*** Aborted at 1645305292 (Unix time, try 'date -d 1645305292') ***
*** Signal 6 (SIGABRT) (0x8556000f3225) received by PID 995877 (pthread TID 0x7f13a79c51c0) (linux TID 995877) (maybe from PID 995877, UID 34134) (code: -6), stack trace: ***
W0219 13:14:52.682251 995932 RetryingSender.cpp:433] Failed to make rpc. Sender name: pr-scubasing. Reason: apache::thrift::transport::TTransportException: AsyncSocketException: connect failed, type = Socket not open, errno = 111 (Connection refused): Connection refused.
    @ 000000000000431b folly::symbolizer::(anonymous namespace)::signalHandler(int, siginfo_t*, void*)
                       ./folly/experimental/symbolizer/SignalHandler.cpp:449
    @ 0000000000000000 (unknown)
    @ 000000000009c9f3 __GI___pthread_kill
```

Test Plan: Sandcastle

Reviewed By: luciang

Differential Revision: D34365188

fbshipit-source-id: 65dcc23226c59096afd5fb3c338c3bd29c936ec3
(cherry picked from commit a1d18e3e6a)
2022-02-20 17:28:35 +00:00
Michael Suo
bf03d93496 Revert D33919683: [FSDP] Implement local_state_dict and load_local_state_dict
Test Plan: revert-hammer

Differential Revision:
D33919683 (d50643adcd)

Original commit changeset: c9f1b43ce04d

Original Phabricator Diff: D33919683 (d50643adcd)

fbshipit-source-id: c54c181edf8eb6a3bc509ed54d34ffdce11b93f5
(cherry picked from commit 4dfb50cd0d)
2022-02-20 02:32:48 +00:00