Commit Graph

20174 Commits

Author SHA1 Message Date
Rohan Varma
782ee6c7e7 [FSDP][Reland] Implement local_state_dict and load_local_state_dict
1. Implement the framework to allow users to choose among `state_dict`, `local_state_dict`, and `sharded_state_dict`.
2. Implement ShardedTensor-compatible local_state_dict() and load_local_state_dict().
ghstack-source-id: 149625958

Differential Revision: [D34383925](https://our.internmc.facebook.com/intern/diff/D34383925/)

[ghstack-poisoned]
2022-02-23 07:57:34 -08:00
Rohan Varma
4bb27ae7d3 Skip optimizer overlap tests that have issues with NCCL async error handling
Skip these tests which sometimes have issues on unrelated PRs such as
https://github.com/pytorch/pytorch/runs/5291461671?check_suite_focus=true. See
https://github.com/pytorch/pytorch/issues/73259 for additional details.

Differential Revision: [D34404857](https://our.internmc.facebook.com/intern/diff/D34404857/)

[ghstack-poisoned]
2022-02-22 15:53:01 -08:00
Scott Wolchok
28339ddc25 [PyTorch] Hit fused addmm path in linear() for existing MHA (#72871)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72871

We do this same trick in the native MHA implementation; backport it for purposes of fair comparison.
ghstack-source-id: 149526858
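
For reference, the fused path referred to above is the one where a 2-D input with a bias can be computed as a single addmm; a hedged sketch of the equivalence (illustrative, not the actual MHA code):

```python
import torch
import torch.nn.functional as F

x = torch.randn(8, 16)              # 2-D input
w = torch.randn(32, 16)
b = torch.randn(32)

y_linear = F.linear(x, w, b)        # may dispatch to the fused addmm path
y_addmm = torch.addmm(b, x, w.t())  # explicit fused multiply-add
print(torch.allclose(y_linear, y_addmm, atol=1e-6))
```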

Test Plan: CI

Reviewed By: ngimel

Differential Revision: D34176090

fbshipit-source-id: 8b578c29c4dcf0d85bae74dfbbb82db9a8f32dc7
(cherry picked from commit fd50170935)
2022-02-22 19:33:46 +00:00
Pavithran Ramachandran
932adf26e4 [easy][PyTorch][CleanUp] Removing unused function def (missing function implementation) (#73019)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73019

fb: Code search shows no usage https://www.internalfb.com/code/search?q=repo%3Aall%20writeMobileMetadata&hide_uninteresting=0&hide_tests=0
ghstack-source-id: 149381949

Test Plan: CI

Reviewed By: larryliu0820

Differential Revision: D34306823

fbshipit-source-id: b405e5683113bd4ff2e89eec023ae9ebb25c3dc9
(cherry picked from commit a72621fbbd)
2022-02-22 17:31:32 +00:00
Vasiliy Kuznetsov
6d86dc5390 dbr quant: store auto_quant_state on the top level model (#72934)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72934

Before this PR, DBR quantization had a limitation on handling user
code which iterates over all module children. For example, imagine
a forward function such as

```
def forward(self, x):
    for module in self:
        x = module(x)
    return x
```

Before this PR, this code would break with DBR quantization, because
we attach `AutoQuantizationState` objects to each child, and those
objects live in the child's module hierarchy and will appear in
these kinds of iterations, changing the meaning of the user program.

This PR reduces the scope of this problem to just the top level module.
Instead of attaching `AutoQuantizationState` objects to each child,
we register them in a map on the parent. Here is a before and after:

```
// toy model
model
 |--> child1

// toy model with AutoQuantizationState objects, before this PR
model
 |--> child1
 |  |--> _auto_quant_state
 |--> _auto_quant_state

// toy model with AutoQuantizationState objects, after this PR
model
 |--> child1
 |--> _fqn_to_auto_quant_state_map
    |--> ( ) --> _auto_quant_state // of `model`
    |--> (child1) --> _auto_quant_state // of `model.child1`
```

Note: `child1._auto_quant_state` works as before for convenience,
but the `child1` object now stores a soft link to its `_auto_quant_state`
instead of properly registering it in its module hierarchy. This is
somewhat hacky. If we need to improve this in the future, we could
remove this soft link and refactor the code to call the FQN map
instead.

Note: if the top level module iterates over its children, things will
still be broken. This is less likely, and we recommend that users work
around it by wrapping their model, or by checking for the
`AutoQuantizationStateModuleDict` type in their iteration loop, as sketched below.
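
A hedged sketch of the suggested iteration-loop workaround (the type check by name is illustrative; adapt to however the container is exposed):

```python
import torch
import torch.nn as nn

class WrappedSequential(nn.Sequential):
    def forward(self, x):
        for module in self:
            # skip the quantization-state container that DBR quant attaches
            # to the top-level model; the exact type check is illustrative
            if type(module).__name__ == "AutoQuantizationStateModuleDict":
                continue
            x = module(x)
        return x

m = WrappedSequential(nn.Linear(4, 4), nn.ReLU())
print(m(torch.randn(2, 4)).shape)
```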

The impact of this change should be an improvement of coverage
of user models. In fact, we expect this to drive our coverage of
torchbenchmark models from 89% to 100%.

Test Plan:
```
// previously disabled test cases with user code iterating
// over module children are now enabled, with wrappers
python test/test_quantization.py -k test_module_calls_items
python test/test_quantization.py -k test_vovnet_sequential
```

Reviewed By: dzdang

Differential Revision: D34281074

Pulled By: vkuzo

fbshipit-source-id: 0e25fc1ec529c47f72478a1875fe43219feac6b1
(cherry picked from commit 4008f89967)
2022-02-22 17:31:32 +00:00
Andrew Gu
c30659ffcc [ZeRO] (Reland) Add ctor support for multiple param groups (#72932)
Summary:
Reland of https://github.com/pytorch/pytorch/pull/72578.

**Overview**
Windows CI was failing due to the multi-rank single-GPU case (see [here](https://github.com/pytorch/pytorch/runs/5204906995?check_suite_focus=true)).

To address this, I
- added `common_distributed.skip_if_no_gpu` for `test_multiple_param_groups()` to ensure that each rank can safely call `to(self.device)` -- this targets the expected SPSD use case where each rank has its own GPU;
- moved `test_constructor()` back to `TestZeroRedundancyOptimizerSingleRank` to check that the multiple parameter group method for construction works even on a single rank.
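
For context, the relanded feature allows passing multiple parameter groups directly to the constructor. A hedged sketch, assuming the public `ZeroRedundancyOptimizer` signature (`model.backbone` / `model.head` are hypothetical submodules, and an initialized process group is required):

```python
import torch
from torch.distributed.optim import ZeroRedundancyOptimizer

def build_zero_optimizer(model: torch.nn.Module) -> ZeroRedundancyOptimizer:
    # two parameter groups, with a per-group learning-rate override
    param_groups = [
        {"params": model.backbone.parameters()},
        {"params": model.head.parameters(), "lr": 1e-4},
    ]
    return ZeroRedundancyOptimizer(
        param_groups,
        optimizer_class=torch.optim.SGD,
        lr=1e-2,
    )
```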

**Test Plan**
- I checked both tests for CPU, 1 GPU, 2 GPUs, 4 GPUs, and 8 GPUs.
- I added the `ciflow/win` label to run the failing Windows CI test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72932

Reviewed By: rohan-varma

Differential Revision: D34281482

Pulled By: awgu

fbshipit-source-id: c4fe604ddd9d2c123c3071249741e6b8a6454b6e
(cherry picked from commit 6bea9bcc63)
2022-02-22 16:29:55 +00:00
Adam Costarino
849c6a526e Extrapolated on equiv between linalg @ and solve (#71769)
Summary:
Potentially fixes https://github.com/pytorch/pytorch/issues/71385; a similar docstring change could also fix https://github.com/pytorch/pytorch/issues/71384.

Updated the doc for `torch.linalg.inv` to include nuance around the equivalence to `torch.linalg.solve`:

Update is below:
```
.. note::
    Consider using :func:`torch.linalg.solve` if possible for multiplying a matrix on the left by
    the inverse, as::

        linalg.solve(A, B) == linalg.inv(A) @ B  # When B is a matrix

    It is always preferred to use :func:`~solve` when possible, as it is faster and more
    numerically stable than computing the inverse explicitly.
```
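
A quick numerical check of the documented equivalence (illustrative only):

```python
import torch

A = torch.randn(4, 4)
B = torch.randn(4, 3)
lhs = torch.linalg.solve(A, B)       # preferred: solve directly
rhs = torch.linalg.inv(A) @ B        # explicit inverse, then multiply
print(torch.allclose(lhs, rhs, atol=1e-5))
```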

IvanYashchuk please let me know if this is the right direction or an over-extrapolation. I can apply the same changes to the `tensorinv` doc to fix https://github.com/pytorch/pytorch/issues/71384. Also in https://github.com/pytorch/pytorch/issues/71384 there was a mention of updating the `torch.matmul` error message to indicate the proper tensor shapes; I could also do that in this PR if needed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71769

Reviewed By: H-Huang

Differential Revision: D34242541

Pulled By: mruberry

fbshipit-source-id: 40e98dad4d821928d1dea72d4512ee579b690a32
(cherry picked from commit a0321a5de9)
2022-02-22 12:29:32 +00:00
Taylor Robie
9f541aa3ac [Profiler] Optimize reportMemoryUsage (#71538)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71538

`reportMemoryUsage` is kind of awful. It does a bunch of string writes and such that makes it VERY expensive. Just moving that work off the hot path reduces the overhead for `profile_memory` from ~6.5 us to ~1.2 us. (85% reduction in the kineto contribution to profiling overhead.)
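
For context, a minimal usage sketch of the memory-profiling path this affects (the overhead numbers above come from the internal ubenchmark, not from this snippet):

```python
import torch
from torch.profiler import profile, ProfilerActivity

# profile_memory=True is what exercises the reportMemoryUsage path
with profile(activities=[ProfilerActivity.CPU], profile_memory=True) as prof:
    for _ in range(100):
        torch.empty(128, 128)

print(prof.key_averages().table(sort_by="self_cpu_memory_usage", row_limit=5))
```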

Test Plan: Ran ubenchmark with `--op empty --stressTestKineto --kinetoProfileMemory`

Reviewed By: swolchok

Differential Revision: D32730167

fbshipit-source-id: fe18e8fa3881967cad8fa1c26c71c805e9b034e5
(cherry picked from commit 0d394cb252)
2022-02-20 23:29:13 +00:00
Michael Suo
bf03d93496 Revert D33919683: [FSDP] Implement local_state_dict and load_local_state_dict
Test Plan: revert-hammer

Differential Revision:
D33919683 (d50643adcd)

Original commit changeset: c9f1b43ce04d

Original Phabricator Diff: D33919683 (d50643adcd)

fbshipit-source-id: c54c181edf8eb6a3bc509ed54d34ffdce11b93f5
(cherry picked from commit 4dfb50cd0d)
2022-02-20 02:32:48 +00:00
Michael Suo
2a7f9f0600 Revert D34284271: [TLC][checkpoint] Add unit test for StatefulComponentCheckpointAgent
Test Plan: revert-hammer

Differential Revision:
D34284271 (f49a93ba56)

Original commit changeset: 58f84c69782a

Original Phabricator Diff: D34284271 (f49a93ba56)

fbshipit-source-id: 87deabae3c3c10c5a9532825ca33d78c5251958e
(cherry picked from commit 03bc05a970)
2022-02-19 21:28:55 +00:00
Chien-Chin Huang
d50643adcd [FSDP] Implement local_state_dict and load_local_state_dict (#72469)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72469

1. Implement the framework to allow users to choose among `state_dict`, `local_state_dict`, and `sharded_state_dict`.
2. Implement ShardedTensor-compatible local_state_dict() and load_local_state_dict().
ghstack-source-id: 149559985

Test Plan: CI

Reviewed By: rohan-varma

Differential Revision: D33919683

fbshipit-source-id: c9f1b43ce04da7db65c4aebf6ac2c7a0ac5e9de8
(cherry picked from commit 55fd6230c9)
2022-02-19 20:29:27 +00:00
Shihao Xu
f49a93ba56 [TLC][checkpoint] Add unit test for StatefulComponentCheckpointAgent
Summary: as titled

Test Plan:
tsloop --mode-dev-nosan aiplatform/modelstore/client/tests/:stateful_component_checkpoint_agent_test -- --focus --fail-fast

buck build mode/dev-nosan //aiplatform/modelstore/client/tests/:stateful_component_checkpoint_agent_test

./buck-out/gen///aiplatform/modelstore/client/tests//stateful_component_checkpoint_agent_test#binary.par --focus --fail-fast

Reviewed By: xunnanxu

Differential Revision: D34284271

fbshipit-source-id: 58f84c69782a7bdb30bed0a2420c74e7b7487bb9
(cherry picked from commit a1037118f4)
2022-02-19 18:31:20 +00:00
Nikolay Korovaiko
237574db19 add assert to make sure expected number of LTC roots matches what TS … (#73112)
Summary:
…computes

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73112

Reviewed By: mikaylagawarecki

Differential Revision: D34351338

Pulled By: Krovatkin

fbshipit-source-id: 1b3d0f3c801bd095b68d2eff3184ecbefadf7f34
(cherry picked from commit 53b7fc4ad6)
2022-02-19 06:33:08 +00:00
patel-zeel
c837caf5c5 Adding details to kl.py (#72845)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/72765.

- [x] Improved `NotImplementedError` verbosity.
- [x] Automated the docstring generation process.

## Improved `NotImplementedError` verbosity
### Code
```python
import torch

dist = torch.distributions

torch_normal = dist.Normal(loc=0.0, scale=1.0)
torch_mixture = dist.MixtureSameFamily(
    dist.Categorical(torch.ones(5,)
    ),
    dist.Normal(torch.randn(5,), torch.rand(5,)),
)

dist.kl_divergence(torch_normal, torch_mixture)
```
#### Output before this PR
```python
NotImplementedError:
```
#### Output after this PR
```python
NotImplementedError: No KL(p || q) is implemented for p type Normal and q type MixtureSameFamily
```
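
For completeness, users who hit this error can register their own KL for the pair via `register_kl`; a hedged sketch where the Monte-Carlo body is purely illustrative, not a recommendation:

```python
import torch
from torch.distributions import Normal, MixtureSameFamily
from torch.distributions.kl import register_kl

@register_kl(Normal, MixtureSameFamily)
def _kl_normal_mixture(p, q):
    # crude Monte-Carlo estimate of KL(p || q), for illustration only
    x = p.sample((1000,))
    return (p.log_prob(x) - q.log_prob(x)).mean(0)

# after registration, the call from the example above no longer raises
```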

## Automate the docstring generation process
### Docstring before this PR
```python
Compute Kullback-Leibler divergence :math:`KL(p \| q)` between two distributions.

    .. math::

        KL(p \| q) = \int p(x) \log\frac {p(x)} {q(x)} \,dx

    Args:
        p (Distribution): A :class:`~torch.distributions.Distribution` object.
        q (Distribution): A :class:`~torch.distributions.Distribution` object.

    Returns:
        Tensor: A batch of KL divergences of shape `batch_shape`.

    Raises:
        NotImplementedError: If the distribution types have not been registered via
            :meth:`register_kl`.
```
### Docstring after this PR
```python
Compute Kullback-Leibler divergence :math:`KL(p \| q)` between two distributions.

    .. math::

        KL(p \| q) = \int p(x) \log\frac {p(x)} {q(x)} \,dx

    Args:
        p (Distribution): A :class:`~torch.distributions.Distribution` object.
        q (Distribution): A :class:`~torch.distributions.Distribution` object.

    Returns:
        Tensor: A batch of KL divergences of shape `batch_shape`.

    Raises:
        NotImplementedError: If the distribution types have not been registered via
            :meth:`register_kl`.
    KL divergence is currently implemented for the following distribution pairs:
        * :class:`~torch.distributions.Bernoulli` and :class:`~torch.distributions.Bernoulli`
        * :class:`~torch.distributions.Bernoulli` and :class:`~torch.distributions.Poisson`
        * :class:`~torch.distributions.Beta` and :class:`~torch.distributions.Beta`
        * :class:`~torch.distributions.Beta` and :class:`~torch.distributions.ContinuousBernoulli`
        * :class:`~torch.distributions.Beta` and :class:`~torch.distributions.Exponential`
        * :class:`~torch.distributions.Beta` and :class:`~torch.distributions.Gamma`
        * :class:`~torch.distributions.Beta` and :class:`~torch.distributions.Normal`
        * :class:`~torch.distributions.Beta` and :class:`~torch.distributions.Pareto`
        * :class:`~torch.distributions.Beta` and :class:`~torch.distributions.Uniform`
        * :class:`~torch.distributions.Binomial` and :class:`~torch.distributions.Binomial`
        * :class:`~torch.distributions.Categorical` and :class:`~torch.distributions.Categorical`
        * :class:`~torch.distributions.Cauchy` and :class:`~torch.distributions.Cauchy`
        * :class:`~torch.distributions.ContinuousBernoulli` and :class:`~torch.distributions.ContinuousBernoulli`
        * :class:`~torch.distributions.ContinuousBernoulli` and :class:`~torch.distributions.Exponential`
        * :class:`~torch.distributions.ContinuousBernoulli` and :class:`~torch.distributions.Normal`
        * :class:`~torch.distributions.ContinuousBernoulli` and :class:`~torch.distributions.Pareto`
        * :class:`~torch.distributions.ContinuousBernoulli` and :class:`~torch.distributions.Uniform`
        * :class:`~torch.distributions.Dirichlet` and :class:`~torch.distributions.Dirichlet`
        * :class:`~torch.distributions.Exponential` and :class:`~torch.distributions.Beta`
        * :class:`~torch.distributions.Exponential` and :class:`~torch.distributions.ContinuousBernoulli`
        * :class:`~torch.distributions.Exponential` and :class:`~torch.distributions.Exponential`
        * :class:`~torch.distributions.Exponential` and :class:`~torch.distributions.Gamma`
        * :class:`~torch.distributions.Exponential` and :class:`~torch.distributions.Gumbel`
        * :class:`~torch.distributions.Exponential` and :class:`~torch.distributions.Normal`
        * :class:`~torch.distributions.Exponential` and :class:`~torch.distributions.Pareto`
        * :class:`~torch.distributions.Exponential` and :class:`~torch.distributions.Uniform`
        * :class:`~torch.distributions.ExponentialFamily` and :class:`~torch.distributions.ExponentialFamily`
        * :class:`~torch.distributions.Gamma` and :class:`~torch.distributions.Beta`
        * :class:`~torch.distributions.Gamma` and :class:`~torch.distributions.ContinuousBernoulli`
        * :class:`~torch.distributions.Gamma` and :class:`~torch.distributions.Exponential`
        * :class:`~torch.distributions.Gamma` and :class:`~torch.distributions.Gamma`
        * :class:`~torch.distributions.Gamma` and :class:`~torch.distributions.Gumbel`
        * :class:`~torch.distributions.Gamma` and :class:`~torch.distributions.Normal`
        * :class:`~torch.distributions.Gamma` and :class:`~torch.distributions.Pareto`
        * :class:`~torch.distributions.Gamma` and :class:`~torch.distributions.Uniform`
        * :class:`~torch.distributions.Geometric` and :class:`~torch.distributions.Geometric`
        * :class:`~torch.distributions.Gumbel` and :class:`~torch.distributions.Beta`
        * :class:`~torch.distributions.Gumbel` and :class:`~torch.distributions.ContinuousBernoulli`
        * :class:`~torch.distributions.Gumbel` and :class:`~torch.distributions.Exponential`
        * :class:`~torch.distributions.Gumbel` and :class:`~torch.distributions.Gamma`
        * :class:`~torch.distributions.Gumbel` and :class:`~torch.distributions.Gumbel`
        * :class:`~torch.distributions.Gumbel` and :class:`~torch.distributions.Normal`
        * :class:`~torch.distributions.Gumbel` and :class:`~torch.distributions.Pareto`
        * :class:`~torch.distributions.Gumbel` and :class:`~torch.distributions.Uniform`
        * :class:`~torch.distributions.HalfNormal` and :class:`~torch.distributions.HalfNormal`
        * :class:`~torch.distributions.Independent` and :class:`~torch.distributions.Independent`
        * :class:`~torch.distributions.Laplace` and :class:`~torch.distributions.Beta`
        * :class:`~torch.distributions.Laplace` and :class:`~torch.distributions.ContinuousBernoulli`
        * :class:`~torch.distributions.Laplace` and :class:`~torch.distributions.Exponential`
        * :class:`~torch.distributions.Laplace` and :class:`~torch.distributions.Gamma`
        * :class:`~torch.distributions.Laplace` and :class:`~torch.distributions.Laplace`
        * :class:`~torch.distributions.Laplace` and :class:`~torch.distributions.Normal`
        * :class:`~torch.distributions.Laplace` and :class:`~torch.distributions.Pareto`
        * :class:`~torch.distributions.Laplace` and :class:`~torch.distributions.Uniform`
        * :class:`~torch.distributions.LowRankMultivariateNormal` and :class:`~torch.distributions.LowRankMultivariateNormal`
        * :class:`~torch.distributions.LowRankMultivariateNormal` and :class:`~torch.distributions.MultivariateNormal`
        * :class:`~torch.distributions.MultivariateNormal` and :class:`~torch.distributions.LowRankMultivariateNormal`
        * :class:`~torch.distributions.MultivariateNormal` and :class:`~torch.distributions.MultivariateNormal`
        * :class:`~torch.distributions.Normal` and :class:`~torch.distributions.Beta`
        * :class:`~torch.distributions.Normal` and :class:`~torch.distributions.ContinuousBernoulli`
        * :class:`~torch.distributions.Normal` and :class:`~torch.distributions.Exponential`
        * :class:`~torch.distributions.Normal` and :class:`~torch.distributions.Gamma`
        * :class:`~torch.distributions.Normal` and :class:`~torch.distributions.Gumbel`
        * :class:`~torch.distributions.Normal` and :class:`~torch.distributions.Laplace`
        * :class:`~torch.distributions.Normal` and :class:`~torch.distributions.Normal`
        * :class:`~torch.distributions.Normal` and :class:`~torch.distributions.Pareto`
        * :class:`~torch.distributions.Normal` and :class:`~torch.distributions.Uniform`
        * :class:`~torch.distributions.OneHotCategorical` and :class:`~torch.distributions.OneHotCategorical`
        * :class:`~torch.distributions.Pareto` and :class:`~torch.distributions.Beta`
        * :class:`~torch.distributions.Pareto` and :class:`~torch.distributions.ContinuousBernoulli`
        * :class:`~torch.distributions.Pareto` and :class:`~torch.distributions.Exponential`
        * :class:`~torch.distributions.Pareto` and :class:`~torch.distributions.Gamma`
        * :class:`~torch.distributions.Pareto` and :class:`~torch.distributions.Normal`
        * :class:`~torch.distributions.Pareto` and :class:`~torch.distributions.Pareto`
        * :class:`~torch.distributions.Pareto` and :class:`~torch.distributions.Uniform`
        * :class:`~torch.distributions.Poisson` and :class:`~torch.distributions.Bernoulli`
        * :class:`~torch.distributions.Poisson` and :class:`~torch.distributions.Binomial`
        * :class:`~torch.distributions.Poisson` and :class:`~torch.distributions.Poisson`
        * :class:`~torch.distributions.TransformedDistribution` and :class:`~torch.distributions.TransformedDistribution`
        * :class:`~torch.distributions.Uniform` and :class:`~torch.distributions.Beta`
        * :class:`~torch.distributions.Uniform` and :class:`~torch.distributions.ContinuousBernoulli`
        * :class:`~torch.distributions.Uniform` and :class:`~torch.distributions.Exponential`
        * :class:`~torch.distributions.Uniform` and :class:`~torch.distributions.Gamma`
        * :class:`~torch.distributions.Uniform` and :class:`~torch.distributions.Gumbel`
        * :class:`~torch.distributions.Uniform` and :class:`~torch.distributions.Normal`
        * :class:`~torch.distributions.Uniform` and :class:`~torch.distributions.Pareto`
        * :class:`~torch.distributions.Uniform` and :class:`~torch.distributions.Uniform`
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72845

Reviewed By: mikaylagawarecki

Differential Revision: D34344551

Pulled By: soulitzer

fbshipit-source-id: 7a603613a2f56f71138d56399c7c521e2238e8c5
(cherry picked from commit 6b2a51c796)
2022-02-19 06:33:08 +00:00
Andrey Talman
46f9e16afe Documenting cuda 11.5 windows issue (#73013)
Summary:
Adding documentation about compiling extension with CUDA 11.5 and Windows

Example of failure: https://github.com/pytorch/pytorch/runs/4408796098?check_suite_focus=true

 Note: Don't use torch/extension.h with CUDA 11.5 under Windows in your C++ code:
    use the ATen interface instead of the torch interface in all CUDA 11.5 code under Windows. Compilation has been failing with errors due to a bug in nvcc.
    Example use:
        >>> #include <ATen/ATen.h>
        >>> at::Tensor SigmoidAlphaBlendForwardCuda(....)
    Instead of:
        >>> #include <torch/extension.h>
        >>> torch::Tensor SigmoidAlphaBlendForwardCuda(...)
    Currently open issue for the nvcc bug: https://github.com/pytorch/pytorch/issues/69460
    Complete workaround code example: cb170ac024

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73013

Reviewed By: malfet, seemethere

Differential Revision: D34306134

Pulled By: atalman

fbshipit-source-id: 3c5b9d7a89c91bd1920dc63dbd356e45dc48a8bd
(cherry picked from commit 87098e7f17)
2022-02-19 02:34:59 +00:00
Steven Troxler
906d26fb9b [codemod][type-comments] Convert type comments in api.py (#73084)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73084

I'm wrapping up the conversion of type comments to type annotations
in caffe2. The last remaining "bulk" codemod has test failures that
are hard for me to understand, so I'm going to submit PRs for each
module individually which makes it easier to see what's causing
problems.

All the codemods were produced via LibCST and then manually cleaned up.
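
An illustrative before/after of the kind of conversion involved (not taken from api.py):

```python
from typing import List

# before the codemod: Python 2 style type comment
def scale_with_comment(values, factor):
    # type: (List[float], float) -> List[float]
    return [v * factor for v in values]

# after the codemod: inline annotations
def scale_with_annotations(values: List[float], factor: float) -> List[float]:
    return [v * factor for v in values]
```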

Test Plan: Wait for github CI

Reviewed By: H-Huang

Differential Revision: D34344289

fbshipit-source-id: e8e3a13c3d95f6804829f1818fb7f0605e5ba137
(cherry picked from commit 92d47d9cd5)
2022-02-19 00:31:45 +00:00
shubhambhokare1
671c8a459a [ONNX] Add pixel_unshuffle support in opset 9
Currently we are unable to utilize ONNX's SpaceToDepth operator due to the lack of the mode_s attribute, hence we add an alternative symbolic in opset 9 to support pixel_unshuffle.

- Adds support for pixel_unshuffle in opset9
- Adds support for dynamic input shapes for pixel_shuffle and pixel_unshuffle
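
A hedged sketch of exporting a model that uses pixel_unshuffle at opset 9 (model and shapes are illustrative):

```python
import io
import torch

class Unshuffle(torch.nn.Module):
    def forward(self, x):
        # spatial dims must be divisible by downscale_factor
        return torch.nn.functional.pixel_unshuffle(x, downscale_factor=2)

x = torch.randn(1, 3, 8, 8)
with io.BytesIO() as f:
    torch.onnx.export(Unshuffle(), x, f, opset_version=9)
```
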
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72449
2022-02-19 00:15:16 +00:00
Scott Wolchok
79a216ce57 Move native MHA code out of PyTorch core (#72944)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72944

Doesn't make sense to develop it in core right now.
ghstack-source-id: 149456040

Test Plan:
CI

run MHA benchmark in benchmark_transformers.py to make sure it doesn't crash

Reviewed By: zrphercule

Differential Revision: D34283104

fbshipit-source-id: 4f0c7a6bc066f938ceac891320d4cf4c3f8a9cd6
(cherry picked from commit b9df65e97c)
2022-02-18 21:34:06 +00:00
Stephen Oakley
1646a0033d Use irange in PyTorch (#72836)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72836

Replacing increment iterator loops with ranged loops. It allows loops such as for(int i=0;i<10;i++) to be expressed as for(const auto i : c10::irange(10)). This auto-types the loops and adds const-safety to the iteration variable.

Reviewed By: albanD

Differential Revision: D34136539

fbshipit-source-id: 760a70ad43ce6f05630ba8fea261d4dbb699e62e
(cherry picked from commit 0428408d88)
2022-02-18 19:29:07 +00:00
albanD
7fe3f334fb Remove call into python API without GIL being held in c10d (#72928)
Summary:
Fix https://github.com/pytorch/pytorch/issues/26475

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72928

Reviewed By: mikaylagawarecki

Differential Revision: D34317697

Pulled By: albanD

fbshipit-source-id: e13efb98e8c6bf4cbc05181c028d68871a844bf7
(cherry picked from commit c0e0397688)
2022-02-18 19:29:07 +00:00
BowenBao
956bafef8b [onnx export] Add broadcast to matmul shape inference (#70534)
Reuse the same broadcast code from the function `ProcessBroadcastNode`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72990
2022-02-18 18:44:19 +00:00
BowenBao
98f9ff9026 [ONNX] Fix an assertion failure involving Slice (#71965)
Before this change, exporting a model involving Slice to ONNX crashes at `axes[i]` on line 153 if C++ assertions are enabled:
```
/usr/include/c++/11.1.0/bits/stl_vector.h:1045: std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](std::vector<_Tp, _Alloc>::size_type) [with _Tp = long int; _Alloc = std::allocator<long int>; std::vector<_Tp, _Alloc>::reference = long int&; std::vector<_Tp, _Alloc>::size_type = long unsigned int]: Assertion '__n < this->size()' failed.
```
The relevant check is https://github.com/gcc-mirror/gcc/blob/releases/gcc-11.1.0/libstdc++-v3/include/bits/stl_vector.h#L1045, which checks the vector index.

The issue can be reproduced by exporting Mask R-CNN or similar ones. For example,
```Python
import io
import torch
import torchvision as tv

model = tv.models.detection.maskrcnn_resnet50_fpn(pretrained=False)
x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
with io.BytesIO() as f:
    torch.onnx.export(model, x, f, opset_version=11)
```
(extracted from [onnxoptimizer tests](https://github.com/onnx/optimizer/blob/master/onnxoptimizer/test/optimizer_test.py))

Tested environment: Arch Linux x86_64 with pytorch and torchvision installed from [the official repo](https://github.com/archlinux/svntogit-community/blob/packages/python-pytorch/trunk/PKGBUILD) and [AUR](https://aur.archlinux.org/cgit/aur.git/tree/PKGBUILD?h=python-torchvision), respectively.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72989
2022-02-18 18:41:47 +00:00
BowenBao
2791725a84 Integrate full ONNX check into ONNX export API (#71125)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72988
2022-02-18 18:40:09 +00:00
Raghavan Raman
2724e4c039 [Static Runtime] Do not replace with copy variants if TE fuser is enabled (#72946)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72946

The passes that replace ops with copy variants are run after TensorExpr fusion. Due to this, the resulting graph does not conform to the assumptions made in the fuser.

So, even if the flags `use_copy_variants` and `use_maybe_copy_variants` are turned on, the corresponding passes will not be executed if TensorExpr fusion is enabled.

ghstack-source-id: 149429753

Test Plan: Tested locally.

Reviewed By: mikeiovine

Differential Revision: D34283842

fbshipit-source-id: 74edea517a00c85dff0319f9c8b3ac8befe09018
(cherry picked from commit 3798af7f1b)
2022-02-18 18:34:50 +00:00
Raghavan Raman
02afdd54b9 [Static Runtime] Handle fallback graphs that are generated as part of the TE Fuser (#72945)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72945

ghstack-source-id: 149429754

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/static_runtime:static_runtime_cpptest — --gtest_filter=CpuFusion.FallbackGraph
```

Reviewed By: mikeiovine

Differential Revision: D34283840

fbshipit-source-id: 868bd340a50fe691797164524f2400d07998d304
(cherry picked from commit 80f60f2cc0)
2022-02-18 18:34:50 +00:00
BowenBao
5843fea94d [ONNX] Add export support for linalg norm (#66575)
* Add matrix_norm

* Add vector norm

* Fix flake

* Fix flake

* nit fixes

* Nit fixes

* Restructure and add comments
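
A hedged export sketch exercising the new support (illustrative only; the exact opset requirements are not verified here):

```python
import io
import torch

class Norms(torch.nn.Module):
    def forward(self, x):
        return torch.linalg.matrix_norm(x), torch.linalg.vector_norm(x, ord=2)

x = torch.randn(4, 5)
with io.BytesIO() as f:
    torch.onnx.export(Norms(), x, f)
```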

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72987
2022-02-18 18:30:16 +00:00
BowenBao
32f6a1e2a2 [ONNX] First version of quantized model export: Support quantized.Linear (#69232)
Co-authored-by: David Fan <jiafa@microsoft.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72986
2022-02-18 18:27:26 +00:00
BowenBao
a6517c20cf [ONNX] Improve Expand shape inference (#69264)
Extend shape inference support for `Expand` when the value of the `shape` argument is unknown: infer the rank of the output of `Expand` and set its shape to dynamic, as long as the shape of the `shape` argument itself is known.

Without this, shape inference aborts and falls back to the static shape provided by the tracer, which is incorrect in many cases.

Co-authored-by: BowenBao <bowbao@microsoft.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72985
2022-02-18 18:24:28 +00:00
Alban Desmaison
0951cb513a Revert D34342689: Revert D34250357: Sync lazy_tensor_staging back to master
Test Plan: revert-hammer

Differential Revision:
D34342689

Original commit changeset: 43f6da6986f7

Original Phabricator Diff: D34250357 (69389fb542)

fbshipit-source-id: 8a3fb74877e719e9b9577b58027b4e7061a04ef0
(cherry picked from commit c749f08e7a)
2022-02-18 17:31:21 +00:00
Jane Xu
477d1bd6cf Revert D34313425: [quant] Add ConvTranspose reference module
Test Plan: revert-hammer

Differential Revision:
D34313425 (710f12f58e)

Original commit changeset: 3eeec1b24a51

Original Phabricator Diff: D34313425 (710f12f58e)

fbshipit-source-id: aecf9113d2e4cef3ccf4e1a9c4c33b07dc2ad385
(cherry picked from commit 3fcb9cd14d)
2022-02-18 17:31:20 +00:00
Alban Desmaison
86a961af87 Revert D34250357: Sync lazy_tensor_staging back to master
Test Plan: revert-hammer

Differential Revision:
D34250357 (69389fb542)

Original commit changeset: aa7d589f6050

Original Phabricator Diff: D34250357 (69389fb542)

fbshipit-source-id: 43f6da6986f7fc5189d641b7803adc5ada27194c
(cherry picked from commit 3c930a5e4e)
2022-02-18 15:47:37 +00:00
Kevin Tse
f5e201e4e9 [DataPipe] Adding usage examples for IterDataPipes (#73033)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73033
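
Since the PR only adds docstring examples, here is a hedged illustration of the kind of IterDataPipe usage being documented (assuming `IterableWrapper` from `torch.utils.data.datapipes.iter`):

```python
from torch.utils.data.datapipes.iter import IterableWrapper

# chain functional-form datapipes: wrap -> map -> filter
dp = IterableWrapper(range(10)).map(lambda x: x * 2).filter(lambda x: x % 3 == 0)
print(list(dp))  # [0, 6, 12, 18]
```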

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D34313793

Pulled By: NivekT

fbshipit-source-id: 51125be2f79d73d02658b2b1c2691f96be8d4769
(cherry picked from commit 3e3c2df7c6)
2022-02-18 15:12:34 +00:00
Vasiliy Kuznetsov
1c0df26597 eager quant: convert mapping for fused QAT Linear-Bn1d (#72796)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72796

Adds the eager mode convert mapping for the fused QAT Linear-Bn1d module.

Test Plan:
```
python test/test_quantization.py TestQuantizeEagerQATNumerics.test_linear_bn_workflow
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D34213150

fbshipit-source-id: c08b5eb843dea673fd07c6b7b93dcd3ba03eaec2
(cherry picked from commit 722edfe676)
2022-02-18 13:14:56 +00:00
Vasiliy Kuznetsov
e73eaffd3b quant: add QAT fused Linear-Bn1d [1/x]: prepared module (#72431)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72431

Adds support for a fused QAT observed module for `Linear` followed by
`BatchNorm1d`. In this PR, only the support for prepared module with
fake_quants in the right places is added.

A future PR will add support for `convert`, and tests for eager and FX
graph mode workflows.

Similar to conv-bn, we rescale the weight before applying the fake
quant, and undo the rescaling after the linear operation.
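
A hedged sketch of the rescaling trick described above (names and structure are hypothetical, not the actual module internals):

```python
import torch
import torch.nn.functional as F

def qat_linear_bn_forward(x, linear, bn, weight_fake_quant):
    # rescale the weight by the batch-norm factor before fake-quantizing
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
    scaled_weight = weight_fake_quant(linear.weight * scale.reshape(-1, 1))
    out = F.linear(x, scaled_weight)
    # undo the rescaling after the linear op, then apply BN as usual
    out = out / scale.reshape(1, -1)
    if linear.bias is not None:
        out = out + linear.bias
    return bn(out)
```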

Test Plan:
```
python test/test_quantization.py TestQuantizeEagerQATNumerics.test_linear_bn
```

Imported from OSS

Reviewed By: jerryzh168, raghuramank10000

Differential Revision: D34044427

fbshipit-source-id: 47a519173939ca4824d2c6e6ea7a599764a8ed10
(cherry picked from commit bfc75fe078)
2022-02-18 13:14:56 +00:00
Terry Chen
710f12f58e [quant] Add ConvTranspose reference module (#73031)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73031

Add ConvTranspose reference module

Test Plan:
python3 test/test_quantization.py TestQuantizeEagerOps.test_conv_transpose_2d

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D34313425

fbshipit-source-id: 3eeec1b24a51c7951c4d4b0c7dca43a012468b85
(cherry picked from commit 0ee7c1cc39)
2022-02-18 06:29:12 +00:00
Will Constable
69389fb542 Sync lazy_tensor_staging back to master (#72875)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72875

This diff contains changes from several PRs landed to the lazy_tensor_staging branch.
* generating 'fallback' overrides for each codegenned op, useful for debugging
* supports operators which are missing aten:: symbols for op names, instead using their string counterpart
* makes the IR class a base class instead of hardcoding the assumption of TS

It also resolves lint issues and in particular cleans up the following:
* {Type}s shouldn't be passed into isValueType, and using the catch-all base class of CType is nicer than specifying a list of types.

Fixes #72852

Test Plan: test manually on lazy_tensor_staging branch

Reviewed By: shunting314

Differential Revision: D34250357

fbshipit-source-id: aa7d589f605055d5d02bc77c77fa6f1182ff7497
(cherry picked from commit 2f8f5e4971)
2022-02-18 03:49:46 +00:00
Don Jang
39fb771423 [Static Runtime] Report static op statistics from graph when input size is zero (#73032)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73032

Currently, ptvsc2_predictor_bench reports nothing when the input size is zero. However, Static Runtime's module creation produces some useful information just from loading a model.

This change reports static op statistics when the given input's size is zero. In addition, this enables reporting the out-variant coverage percentage, which is crucial for establishing the baseline performance of Static Runtime.

Test Plan: - Ran `ptvsc2_predictor_bench` with this change as seen above.

Reviewed By: mikeiovine

Differential Revision: D34294803

fbshipit-source-id: 80c02199075dae9280657d6edecc7c679c1c27f4
(cherry picked from commit 83aec141a2)
2022-02-17 23:58:32 +00:00
Raghavan Raman
6d33852685 [NNC] TensorExprKernel state should not be modified on calls to run methods (#73028)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73028

A typical use case for `TensorExprKernel` is to create the kernel once and call it multiple times, possibly in parallel. For the parallel calls to work, we need to ensure that the run() method calls do not change any state in `TensorExprKernel`.

Before this change, the `run()` method was modifying the sizes and strides vectors when dynamic shapes were present. This manifested as a data race when running a model with Static Runtime.
ghstack-source-id: 149398820

Test Plan:
```
buck build mode/dev-asan //caffe2/test/cpp/tensorexpr:tensorexpr
./buck-out/dev/gen/caffe2/test/cpp/tensorexpr/tensorexpr --gtest_filter="DynamicShapes.MultiThreadedExecution"
```

Reviewed By: eellison

Differential Revision: D34287960

fbshipit-source-id: d311f3c5a66c5d5de4e1deaeaa01816b53e9906e
(cherry picked from commit 161568bfae)
2022-02-17 23:14:27 +00:00
Joel Schlosser
f670179c0a Fix doc regressions for various modules and functional forms (#73014)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73014

Fixes #72501
Fixes #72502
Fixes #72503
Fixes #72504
Fixes #72505
Fixes #72506
Fixes #72507
Fixes #72509
Fixes #72510

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D34305640

Pulled By: jbschlosser

fbshipit-source-id: 62f341633fdb0316eaa346cf7247865290eb830a
(cherry picked from commit 8362d264e7)
2022-02-17 22:40:18 +00:00
vfdev
af3ca50291 Fixed docstring typo for nn.Module.get_submodule (#73018)
Summary:
Description:
- Fixed docstring typo for nn.Module.get_submodule

Without the fix, the example output is invisible in the rendered docs: https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.get_submodule

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73018

Reviewed By: davidberard98

Differential Revision: D34310091

Pulled By: jbschlosser

fbshipit-source-id: e35aef2b7479bdd81fb6b7ddd203bd71798769e1
(cherry picked from commit e4944e1f8e)
2022-02-17 22:40:18 +00:00
Rohan Varma
209a948896 [Reland][FSDP] Implement apply() (#72925)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72925

Reland with a fix to add the owner string in the test file.
ghstack-source-id: 149280348

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D34273858

fbshipit-source-id: 2174c1d71fcc5148282d94e375071a50b92114f2
(cherry picked from commit 158762bbb3)
2022-02-17 21:50:03 +00:00
Chen Lai
2c916ef198 More update on the guidance (#72818)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72818

ghstack-source-id: 149395630

Test Plan: CI

Reviewed By: raziel

Differential Revision: D34226823

fbshipit-source-id: e31b71110e8e94bd9fabe25a388f0d4a9b9d0ca7
(cherry picked from commit 57e9b034aa)
2022-02-17 20:05:17 +00:00
Mike Iovine
d1c5f9e439 [JIT][SR] Introduce prim::IfThenElse (#72587)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72587

This pattern frequently appears in a few graphs:

```
%result = prim::If(%condition)
  block0():
    -> (%a)
  block1():
    -> (%b)
```

This is slow, particularly in static runtime. Static runtime creates memory planners/block runners for each sub-block, which eats up a lot of memory and introduces a lot of extra overhead for this relatively simple operation.

This diff introduces a new op that replaces nodes like the above with a single op meant to act like a ternary operator:

```
%result = prim::IfThenElse(%condition, %a, %b)
```

Test Plan: New unit tests

Reviewed By: eellison

Differential Revision: D34091789

fbshipit-source-id: eb6a8c460c39b4c019a1f4ab1f3f1e5b6edc400c
(cherry picked from commit 0f1b335e5b)
2022-02-17 18:22:48 +00:00
Chen Lai
cee84f4051 fix model dump for the lowered module (#72866)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72866

https://github.com/pytorch/pytorch/pull/71597 adds a wrapper `torch.jit.LoweredWrapper`, which breaks model dump. This change fixes model_dump in the notebook.
ghstack-source-id: 149311636

Test Plan:
CI and test with N509022

Before:

{F701413403}

After:

{F701412963}

Reviewed By: iseeyuan

Differential Revision: D34247216

fbshipit-source-id: 695b02b03675fae596bb450441b327e4cdcffe9c
(cherry picked from commit d46a82a4c1)
2022-02-17 07:09:44 +00:00
Jordan Fix
540cb5fee2 [graph_manipulation] Unpack list of outputs (#72940)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72940

att

Reviewed By: jackm321

Differential Revision: D34282062

fbshipit-source-id: 743710c18e1f38286d1b91c91868bb22c760f3ca
(cherry picked from commit fd2bdd189d)
2022-02-17 06:38:52 +00:00
Don Jang
5ea74b4996 [Static Runtime] Remove ProcessedNode::num_outputs_ (#72592)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72592

Only code paths that are not perf-critical read `ProcessedNode::num_outputs_`, and it is a static property of the op that the `ProcessedNode` instance is executing.

Therefore, it's better to move `ProcessedNode::num_outputs_` into `ProcessedFunction::num_outputs_` and let `ProcessedNode` access it via `ProcessedNode::fn_` for its occasional use. Note that this prevents duplicating num_outputs_ per node and per Static Runtime instance, since `ProcessedFunction` instances are shared across all runtime instances.

It's confirmed by local instrumentation that this change reduces `sizeof(ProcessedNode)` by 14%:

- Before: sizeof(ProcessedNode): 56
- After: sizeof(ProcessedNode): 48

Test Plan: `buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: mikeiovine

Differential Revision: D33984792

fbshipit-source-id: e29ffc97b799e679215f42e1e85cd3fcd7e88983
(cherry picked from commit 0f7003f4df)
2022-02-17 05:09:17 +00:00
Pearu Peterson
456d96d544 Generate static docstrings for torch._masked functions. (#72865)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72865

Fixes #72636

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D34286183

Pulled By: cpuhrsch

fbshipit-source-id: 9cf81bfed6ba8c82593f6a1d9e0b20d0a083310d
(cherry picked from commit 0a3f57896b)
2022-02-17 02:44:16 +00:00
Philip Meier
1f74e082e2 only compare attributes for meta tensors (#72508)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72508

Todo:

- [x] document this behavior
- [x] add tests

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D34262452

Pulled By: ezyang

fbshipit-source-id: bc5c9653d5c3ad5c6efccc9c8e0efc0d28e15104
(cherry picked from commit 233142c88e)
2022-02-17 02:33:08 +00:00
Philip Meier
b5f2574f36 no longer coalesce sparse COO tensors before comparison (#69751)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69751

cc nikitaved pearu cpuhrsch IvanYashchuk

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D34262453

Pulled By: ezyang

fbshipit-source-id: e2e62d2aa03fc569d2951c880960b256f5dc4aaa
(cherry picked from commit cb6b0ef719)
2022-02-17 02:33:08 +00:00
Vitaly Fedyunin
81fbeea760 Add docstrings to native_channel_shuffle (#72919)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72919

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D34274717

Pulled By: VitalyFedyunin

fbshipit-source-id: fa42f91ef2335e2594b19ef65d914c711f7a94fd
(cherry picked from commit a6f6fe9112)
2022-02-17 02:33:08 +00:00