Commit Graph

21846 Commits

Edward Z. Yang
b7215de32f prod ref
It turns out the prim is implemented incorrectly, as torch.prod does not accept
a dim list, so I added a little stub for this.
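A minimal sketch of the idea (the helper name and shape are assumptions, not the actual PR code): reduce over a dim list by applying `torch.prod` once per dim.

```
import torch

def prod_dims(a: torch.Tensor, dims) -> torch.Tensor:
    # torch.prod only reduces over a single dim, so iterate over the dim list,
    # reducing the highest dim first so the remaining indices stay valid.
    for d in sorted(dims, reverse=True):
        a = torch.prod(a, dim=d)
    return a

x = torch.arange(1.0, 9.0).reshape(2, 2, 2)
print(prod_dims(x, [0, 2]))  # reduces dims 0 and 2, leaving shape (2,)
```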

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78461

Approved by: https://github.com/ngimel
2022-05-31 14:18:49 +00:00
Edward Z. Yang
e562ed0964 Register PrimTorch sum as a decomposition.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78460

Approved by: https://github.com/ngimel
2022-05-31 14:18:49 +00:00
Nikita Shulga
8f7e3791ef Make PyTorch importable on python-3.7.0 (#78500)
By stringifying "typing.OrderedDict", as [`typing.OrderedDict`](https://docs.python.org/3.10/library/typing.html#typing.OrderedDict) was only introduced in Python 3.7.2+
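A sketch of the pattern (illustrative function, not the PyTorch source): quoting the annotation turns it into a forward reference that is stored as a string rather than evaluated at import time, so Python 3.7.0/3.7.1, which lack `typing.OrderedDict`, can still import the module.

```
import collections
import typing

def first_key(d: "typing.OrderedDict[str, int]") -> str:
    # The quoted annotation is never evaluated at import time.
    return next(iter(d))

print(first_key(collections.OrderedDict(a=1, b=2)))  # "a"
```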

See similar fix in 21a82fb519

Partially addresses https://github.com/pytorch/pytorch/issues/78499

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78500
Approved by: https://github.com/atalman
2022-05-31 06:11:30 +00:00
Jason Ansel
dabf8f0569 Populate the torch._decomp table on import (#78476)
#78041 broke TorchInductor, because of:
```
>>> from torch import _decomp
>>> import torch
>>> _decomp.get_decompositions([torch.ops.aten.leaky_relu])
{}
>>> import torch._refs.nn.functional
>>> _decomp.get_decompositions([torch.ops.aten.leaky_relu])
{<OpOverload(op='aten.leaky_relu', overload='default')>: <function leaky_relu at 0x7f5a39b56c10>, <OpOverload(op='aten.leaky_relu', overload='out')>: <function leaky_relu at 0x7f5a39b56c10>}
```

cc @Chillee

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78476
Approved by: https://github.com/Chillee
2022-05-31 03:46:38 +00:00
Shawn Zhong
a468941355 Fix jiterator doc format (#78471)
Current docs do not show the code example properly:
https://pytorch.org/docs/master/generated/torch.cuda.jiterator._create_jit_fn.html
https://pytorch.org/docs/master/generated/torch.cuda.jiterator._create_multi_output_jit_fn.html

This PR fixes the formatting issue:
https://docs-preview.pytorch.org/78471/generated/torch.cuda.jiterator._create_jit_fn.html
https://docs-preview.pytorch.org/78471/generated/torch.cuda.jiterator._create_multi_output_jit_fn.html
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78471
Approved by: https://github.com/ngimel
2022-05-31 03:44:52 +00:00
Bairen Yi
b6672b10e1 Fix incorrect decomposition for native_dropout (#77933)
Quick sanity check: it should be the identity function if p=0.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77933
Approved by: https://github.com/Chillee
2022-05-30 20:08:48 +00:00
Yukio Siraichi
3f334f0dfd Fix asarray documentation formatting (#78485)
Fixes #78290

Here's a screenshot of the modified doc:
![asarray](https://user-images.githubusercontent.com/3337141/170971723-aafe20a9-8e51-420f-ae98-67dc2df579a2.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78485
Approved by: https://github.com/ngimel
2022-05-30 19:28:10 +00:00
Wang, Eikan
11b9a81e02 [NNC] channels last propagation within NNC fusion group (#76948)
Decide the memory layout propagation policy and propagate it within the NNC fusion group. The memory layout propagation policy can be `Contiguous` or `Channels-last contiguous`.
 - `Contiguous`: Convert the non-contiguous (including channels-last contiguous) input tensors to contiguous and generate a contiguous output `Buf` for the lowering function.
 - `Channels-last contiguous`: Convert the input tensors to channels-last contiguous and generate a channels-last contiguous output `Buf` for the lowering function.

Currently, the rule is simple: if all the input and output tensors of the NNC fusion group are channels-last contiguous, the propagated memory layout is `Channels-last contiguous`. Otherwise, it is `Contiguous`, the same as the current behavior. In other words, this PR provides a fast path for channels-last, and the optimization is conservative since its trigger conditions are strict.
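As an illustration of the policy decision (a plain-Python sketch; the real logic lives in the NNC C++ fusion code), channels-last is picked only when every input and output tensor of the fusion group is channels-last contiguous:

```
import torch

def pick_memory_format(tensors):
    # Channels-last only when *all* tensors in the fusion group are channels-last contiguous.
    if all(t.is_contiguous(memory_format=torch.channels_last) for t in tensors):
        return torch.channels_last
    return torch.contiguous_format

x = torch.randn(2, 3, 8, 8).to(memory_format=torch.channels_last)
y = torch.randn(2, 3, 8, 8).to(memory_format=torch.channels_last)
print(pick_memory_format([x, y]))                        # torch.channels_last
print(pick_memory_format([x, torch.randn(2, 3, 8, 8)]))  # torch.contiguous_format
```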
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76948
Approved by: https://github.com/ZolotukhinM
2022-05-30 18:31:49 +00:00
Andrew Or
c7b4eec233 [Quant][fx][bc-breaking] Replace qconfig_dict with a config object (#78452)
**Summary:** Previously, FX graph mode quantization configurations
were specified through a dictionary of qconfigs. However, this
API was not in line with other core APIs in PyTorch. This commit
replaces this dictionary with a config object that users will
create and pass to prepare and convert. This leads to better
type safety and better user experience in notebook settings
due to improved auto completion.

The new API is as follows:

```
from torch.ao.quantization import QConfigMapping
from torch.ao.quantization.quantize_fx import prepare_fx

qconfig_mapping = (QConfigMapping()
    .set_global(qconfig)
    .set_object_type(torch.nn.Linear, qconfig)
    .set_module_name_regex("foo.*bar", qconfig)
    .set_module_name("mod", qconfig))

prepare_fx(model, qconfig_mapping)
```

For backwards compatibility, `prepare_fx`, `prepare_qat_fx`,
and `convert_fx` will continue to accept qconfig_dicts, which
will be converted to `QConfigMapping`s internally.

Note that this commit does not modify existing tests to use the
new API; they will continue to pass in qconfig_dict as before,
which still works but triggers a deprecation warning. This will
be handled in a future commit.

**Test Plan:**
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

**Reviewers:** jerryzh168, vkuzo

**Subscribers:** jerryzh168, vkuzo

Differential Revision: D36747998

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78452
Approved by: https://github.com/jerryzh168
2022-05-30 18:30:07 +00:00
Allen Goodman
64e0d0c4fe Laguerre polynomial (#78366)
Adds:

```Python
laguerre_polynomial_l(input, n, *, out=None) -> Tensor
```

Laguerre polynomial $L_{n}(\text{input})$.

## Derivatives

Recommended $k$-derivative formula with respect to $\text{input}$:

$$\frac{d^{k}}{d\,\text{input}^{k}} L_{n}(\text{input}) = (-1)^{k} \times L_{n - k}^{k}(\text{input})$$

where $L_{n}^{\alpha}$ is the associated Laguerre polynomial.
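## Example

A usage sketch, mirroring the plots in the companion Chebyshev PRs (assuming the op is exposed under `torch.special` like the other polynomial ops in this series):

```Python
import torch
import matplotlib.pyplot

x = torch.linspace(0.0, 10.0, 256)

matplotlib.pyplot.plot(x, torch.special.laguerre_polynomial_l(x, 5))
```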
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78366
Approved by: https://github.com/mruberry
2022-05-30 17:24:00 +00:00
Mike Ruberry
089203f8bc Updates floor_divide to perform floor division (#78411)
Fixes https://github.com/pytorch/pytorch/issues/43874

This PR changes floor_divide to perform floor division instead of truncation division.

This is a BC-breaking change, but it's a "bug fix," and we've already warned users for several releases this behavior would change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78411
Approved by: https://github.com/ngimel
2022-05-29 21:28:45 +00:00
Kurt Mohler
e9afb43676 Add meta device support to _UntypedStorage and _TypedStorage (#78008)
Fixes #77885

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78008
Approved by: https://github.com/ezyang
2022-05-28 15:33:45 +00:00
Allen Goodman
9dc6d42c18 Probabilist’s Hermite polynomial (#78357)
Adds:

```Python
hermite_polynomial_he(input, n, *, out=None) -> Tensor
```
Probabilist’s Hermite polynomial $He_{n}(\text{input})$.

If $n = 0$, $1$ is returned. If $n = 1$, $\text{input}$ is returned. Otherwise, the recursion:

$$He_{n + 1}(\text{input}) = \text{input} \times He_{n}(\text{input}) - n \times He_{n - 1}(\text{input})$$

is evaluated.
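A quick numerical check of this recursion (a usage sketch; assumes the op is exposed as `torch.special.hermite_polynomial_he`, matching the other polynomial ops in this series):

```Python
import torch

x = torch.linspace(-2.0, 2.0, 5)
n = 3
lhs = torch.special.hermite_polynomial_he(x, n + 1)
rhs = x * torch.special.hermite_polynomial_he(x, n) - n * torch.special.hermite_polynomial_he(x, n - 1)
print(torch.allclose(lhs, rhs))  # expected: True
```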

## Derivatives

Recommended $k$-derivative formula with respect to $\text{input}$:

$$\frac{d^{k}}{d\,\text{input}^{k}} He_{n}(\text{input}) = \frac{n!}{(n - k)!}He_{n - k}(\text{input}).$$
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78357
Approved by: https://github.com/mruberry
2022-05-28 13:56:12 +00:00
Allen Goodman
18273c39da Physicist’s Hermite polynomial (#78352)
Adds:

```Python
hermite_polynomial_h(input, n, *, out=None) -> Tensor
```
Physicist’s Hermite polynomial $H_{n}(\text{input})$.

If $n = 0$, $1$ is returned. If $n = 1$, $\text{input}$ is returned. Otherwise, the recursion:

$$H_{n + 1}(\text{input}) = 2 \times \text{input} \times H_{n}(\text{input}) - H_{n - 1}(\text{input})$$

is evaluated.

## Derivatives

Recommended $k$-derivative formula with respect to $\text{input}$:

$$\frac{d^{k}}{d\,\text{input}^{k}} H_{n}(\text{input}) = 2^{k} \times \frac{n!}{(n - k)!}H_{n - k}(\text{input})$$
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78352
Approved by: https://github.com/mruberry
2022-05-28 02:26:30 +00:00
Ryan Spring
2df1da09e1 Add Elementwise unary ops 4 references (#78216)
Add reference implementations for `nan_to_num, positive, sigmoid, signbit, tanhshrink`
Add prims for `minimum_value(dtype)` and `maximum_value(dtype)`
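As an illustration, a minimal sketch of what the tanhshrink reference computes, written against public ops (the actual ref is built on `torch._refs` primitives):

```
import torch

def tanhshrink_ref(a: torch.Tensor) -> torch.Tensor:
    # tanhshrink(x) = x - tanh(x)
    return a - torch.tanh(a)

x = torch.randn(4)
print(torch.allclose(tanhshrink_ref(x), torch.nn.functional.tanhshrink(x)))  # True
```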
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78216
Approved by: https://github.com/mruberry
2022-05-27 21:55:34 +00:00
Allen Goodman
40a6cc6cc6 Chebyshev polynomial of the second kind (#78293)
Adds:

```Python
chebyshev_polynomial_u(input, n, *, out=None) -> Tensor
```

Chebyshev polynomial of the second kind $U_{n}(\text{input})$.

If $n = 0$, $1$ is returned. If $n = 1$, $2 \times \text{input}$ is returned. If $n < 6$ or $|\text{input}| > 1$ the recursion:

$$U_{n + 1}(\text{input}) = 2 \times \text{input} \times U_{n}(\text{input}) - U_{n - 1}(\text{input})$$

is evaluated. Otherwise, the explicit trigonometric formula:

$$\frac{\text{sin}((n + 1) \times \text{arccos}(\text{input}))}{\text{sin}(\text{arccos}(\text{input}))}$$

is evaluated.

## Derivatives

Recommended first derivative formula with respect to $\text{input}$:

$$\frac{(-1 - n)\times U_{-1 + n}(\text{input}) + n \times \text{input} \times U_{n}(\text{input})}{-1 + \text{input}^{2}}.$$

Recommended $k$-derivative formula with respect to $\text{n}$:

$$\frac{\text{arccos}(\text{input})^{k} \times \text{sin}(\frac{k \times \pi}{2} + (1 + n) \times \text{arccos}(\text{input}))}{\sqrt{1 - \text{input}^{2}}}.$$

## Example

```Python
x = torch.linspace(-1.0, 1.0, 256)

matplotlib.pyplot.plot(x, torch.special.chebyshev_polynomial_u(x, 10))
```

![image](https://user-images.githubusercontent.com/315821/170352780-12af63d3-ce31-4948-8b68-8ecc37c71ac5.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78293
Approved by: https://github.com/mruberry
2022-05-27 18:32:11 +00:00
Aidyn-A
4963d41f9d Add logsumexp to AMP autocast (#76330)
Add `logsumexp` function to AMP rules.

This PR fixes an issue described in [PyTorch forum](https://discuss.pytorch.org/t/kl-divergence-negative-with-amp/149312).
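A usage sketch of the behavior this enables (assuming `logsumexp` lands on the fp32 autocast list, as the linked issue suggests; needs a CUDA device to run):

```
import torch

x = torch.randn(8, 16, device="cuda", dtype=torch.float16)
with torch.autocast(device_type="cuda"):
    out = torch.logsumexp(x, dim=-1)
print(out.dtype)  # expected: torch.float32 once logsumexp is on the fp32 cast list
```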

cc @ptrblck @mcarilli
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76330
Approved by: https://github.com/mcarilli, https://github.com/ptrblck, https://github.com/ngimel
2022-05-27 17:26:20 +00:00
jjsjann123
1a9a1b8b5e fixing typo (#78417)
primtorch prod is mistakenly using `_sum_doc`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78417
Approved by: https://github.com/malfet
2022-05-27 17:10:15 +00:00
Pearu Peterson
8c88a55d44 Fix sparse BSR tensor validation.
Also adds bits to support dense dimensions for Sparse Compressed tensors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78359

Approved by: https://github.com/cpuhrsch
2022-05-27 13:26:35 +00:00
Jerry Zhang
85f308275e [fx2trt] Fix dummy weight initialization in conv1d converter (#78402)
Summary:
As titled; currently it errors out with the following error:
```
---> 72         dummy_weight = trt.Weights(weight_shape)
     73         layer = network.add_convolution_nd(
     74             input=input_val,
TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
    1. tensorrt.tensorrt.Weights(type: tensorrt.tensorrt.DataType = <DataType.FLOAT: 0>)
    2. tensorrt.tensorrt.Weights(a: numpy.ndarray)
```
full error: https://www.internalfb.com/phabricator/paste/view/P503598381
We need to pass around a numpy ndarray instead of a shape here.

This also adds conv1d support in backend_config_dict for TensorRT.

Test Plan:
```
buck test mode/opt deeplearning/trt/fx2trt_oss/test/converters:test_convolution
```

```
buck test mode/opt deeplearning/trt/fx2trt_oss/test/quant:test_quant_trt
```

Differential Revision: D36721313

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78402
Approved by: https://github.com/842974287
2022-05-27 04:48:45 +00:00
Justin Chu
299fbbccec [ONNX] Fix check_training_mode in symbolic_helper (#78376)
`check_training_mode` always warned that an op is set to training because it was comparing an int `op_train_mode` with an Enum `GLOBALS.training_mode`. This PR fixes the behavior.
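A minimal illustration of the bug class (the names here are hypothetical, not the actual symbolic_helper code):

```
from enum import Enum

class TrainingMode(Enum):
    EVAL = 0
    TRAINING = 1

op_train_mode = 1
# Comparing a plain int to an Enum member is always False (Enum is not IntEnum),
# so a check written this way warns unconditionally.
print(op_train_mode == TrainingMode.TRAINING)        # False
print(op_train_mode == TrainingMode.TRAINING.value)  # True
```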
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78376
Approved by: https://github.com/garymm
2022-05-27 00:38:16 +00:00
Max Podkorytov
2679755bdc [static-runtime] out variant for aten::max (#78271)
Summary: Previously the op was auto-generated, but it only covered the pointwise overload of aten::max. This adds support for the reduction overloads, both overall and along a dim.

Test Plan: Added a unit test

Differential Revision: D36656378

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78271
Approved by: https://github.com/mikeiovine
2022-05-26 23:29:27 +00:00
yuguo68
6ee072a324 fix missing dim out of range check for logcumsumexp_cuda with empty source tensor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78284

Approved by: https://github.com/ngimel
2022-05-26 22:05:59 +00:00
Allen Goodman
029bbe4995 Chebyshev polynomial of the first kind (#78196)
Adds:

```Python
chebyshev_polynomial_t(input, n, *, out=None) -> Tensor
```

Chebyshev polynomial of the first kind $T_{n}(\text{input})$.

If $n = 0$, $1$ is returned. If $n = 1$, $\text{input}$ is returned. If $n < 6$ or $|\text{input}| > 1$ the recursion:

$$T_{n + 1}(\text{input}) = 2 \times \text{input} \times T_{n}(\text{input}) - T_{n - 1}(\text{input})$$

is evaluated. Otherwise, the explicit trigonometric formula:

$$T_{n}(\text{input}) = \text{cos}(n \times \text{arccos}(\text{input}))$$

is evaluated.

## Derivatives

Recommended $k$-derivative formula with respect to $\text{input}$:

$$2^{-1 + k} \times n \times \Gamma(k) \times C_{-k + n}^{k}(\text{input})$$

where $C$ is the Gegenbauer polynomial.

Recommended $k$-derivative formula with respect to $\text{n}$:

$$\text{arccos}(\text{input})^{k} \times \text{cos}(\frac{k \times \pi}{2} + n \times \text{arccos}(\text{input})).$$

## Example

```Python
x = torch.linspace(-1, 1, 256)

matplotlib.pyplot.plot(x, torch.special.chebyshev_polynomial_t(x, 10))
```

![image](https://user-images.githubusercontent.com/315821/170125525-60415735-4d49-4cbd-9278-26286413f635.png)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78196
Approved by: https://github.com/mruberry
2022-05-26 21:06:44 +00:00
kshitij12345
8bd8f62812 [primTorch] refs: margin_ranking_loss, hinge_embedding_loss (#78057)
Refs for `nn.functional.margin_ranking_loss` and `nn.functional.hinge_embedding_loss`.
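A rough sketch of what the margin_ranking_loss ref computes, written against public ops for illustration (the actual ref is built on `torch._refs`):

```
import torch

def margin_ranking_loss_ref(input1, input2, target, margin=0.0, reduction="mean"):
    # loss = max(0, -target * (input1 - input2) + margin)
    loss = torch.clamp_min(-target * (input1 - input2) + margin, 0)
    if reduction == "mean":
        return loss.mean()
    if reduction == "sum":
        return loss.sum()
    return loss

a, b = torch.randn(5), torch.randn(5)
t = torch.sign(torch.randn(5))
print(torch.allclose(margin_ranking_loss_ref(a, b, t),
                     torch.nn.functional.margin_ranking_loss(a, b, t)))  # True
```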
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78057
Approved by: https://github.com/mruberry
2022-05-26 21:01:57 +00:00
Kshiteej K
4e1f41f66a [docs][nn] conv: complex support note (#78351)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78351
Approved by: https://github.com/anjali411, https://github.com/jbschlosser
2022-05-26 20:33:36 +00:00
Aidyn-A
31016eb81e [primTorch] Elementwise Binary Ops I (#78023)
This PR is a result of collaboration with @rdspring1 and @mruberry on primTorch.

It adds the following prims:
- `fmax`
- `fmin`
- `fmod`

And adds the following refs:
- `fmax`
- `fmin`
- `fmod`
- `logical_xor`

The work is in progress as there are some tests that fail.
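For example, a minimal sketch of the logical_xor ref in terms of existing ops (illustrative only; the actual ref is written against `torch._refs`):

```
import torch

def logical_xor_ref(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    return a.bool() ^ b.bool()

x = torch.tensor([True, False, True, False])
y = torch.tensor([True, True, False, False])
print(torch.equal(logical_xor_ref(x, y), torch.logical_xor(x, y)))  # True
```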
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78023
Approved by: https://github.com/mruberry
2022-05-26 20:22:27 +00:00
BowenBao
483bb4f0cb [ONNX] Extract export verification as standalone api from unittest
The verification logic is refactored and extracted from
`test_pytorch_onnx_onnxruntime.py`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76929

Approved by: https://github.com/justinchuby, https://github.com/garymm
2022-05-26 19:49:19 +00:00
leslie-fang-intel
1a41cd8f97 Conv BN folding data type issue when conv has no bias (#78241)
PR https://github.com/pytorch/pytorch/pull/77042 fixed the folded conv-bn data type issue but missed the case where the original conv has no bias input.
In this PR:

- Fix the folded conv-bn bias data type issue: when the conv has no bias but its weight has a lower-precision datatype, the newly generated bias should have the same data type as the conv's weight.
- Move the Autocast JIT Trace UT from `test_jit.py` to `test_jit_autocast.py`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78241
Approved by: https://github.com/davidberard98
2022-05-26 18:42:17 +00:00
Gao, Xiang
5ecd30e857 [primTorch] Rename is_finite->isfinite (#78211)
`isfinite` sounds like a better name, because PyTorch, C++, and NumPy all use this name rather than `is_finite`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78211
Approved by: https://github.com/ngimel, https://github.com/mruberry
2022-05-26 16:17:51 +00:00
Brian Hirsh
7ff091fc4e move Functionalize dispatch key closer to backends
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77132

Approved by: https://github.com/ezyang, https://github.com/zou3519
2022-05-26 16:15:43 +00:00
Brian Hirsh
5cc258ec9e make block_diag composite compliant
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77716

Approved by: https://github.com/zou3519
2022-05-26 16:15:42 +00:00
Bin Bao
29189d2ba8 [LT] Add IR reusing support for manually-implemented ops
Summary: Add CanBeReused methods for manually-implemented ops and replace MakeNode with
ReuseOrMakeNode.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77616

Approved by: https://github.com/JackCaoG, https://github.com/wconstab
2022-05-26 16:04:47 +00:00
Ayaka Mikazuki
91a4fe0777 [docs] Move a sentence from nn.Transformer to nn.TransformerEncoder (#78337)
`nn.Transformer` cannot be used to implement BERT, while `nn.TransformerEncoder` can, so this PR moves the sentence 'Users can build the BERT model with corresponding parameters.' from `nn.Transformer` to `nn.TransformerEncoder`.
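For reference, the encoder-only stack the note points users to (standard `torch.nn` usage):

```
import torch.nn as nn

# A BERT-style, encoder-only model is built from TransformerEncoder rather than nn.Transformer.
encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=6)
```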

Fixes #68053
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78337
Approved by: https://github.com/jbschlosser
2022-05-26 15:44:38 +00:00
Nikita Vedeneev
3924d56fae BCE loss: forward-over-reverse AD support (#77852)
Umbrella issue: https://github.com/pytorch/pytorch/issues/75432

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77852
Approved by: https://github.com/soulitzer
2022-05-26 14:36:52 +00:00
Jane Xu
50930604cf Hackily use addSkip to track flakiness in common_utils.py (#78292)
### Problem:
The current way we detect flakiness is by aggregating results at the end of a job, which has worked so far but is inefficient and potentially inaccurate. We have also been dedicating a workflow step to doing this analysis at the end of every job.

### Solution:
This PR uses unittest.TestResult's addSkip method, which adds a skipped test every time we detect something is flaky. This way, we no longer need to aggregate anything and we can easily scan through the test reports and filter for skipped tests with flaky = True. Not only is this much faster to query for, it rids us of needing to figure out janky aggregation logic.
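A minimal illustration of the mechanism (plain unittest, not the actual common_utils.py code):

```
import unittest

class _Demo(unittest.TestCase):
    def test_noop(self):
        pass

result = unittest.TestResult()
test = _Demo("test_noop")
# Record a skip with a descriptive reason; a report consumer can later filter
# skipped tests whose reason encodes flaky = True.
result.addSkip(test, "flaky=True: failed, then passed on retry")
print(result.skipped)
```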

### Test plan:
I simulated a flaky test locally (test_async_python) and observed that:
With overriding signal ON (so flaky test = green):
- Successes are reported just as they normally are, with no skips. [override_signal_normal_success.txt](https://github.com/pytorch/pytorch/files/8774012/override_signal_normal_success.txt)
- Failures fail and are reported as they are with no skips. [override_signal_all_fails.txt](https://github.com/pytorch/pytorch/files/8774010/override_signal_all_fails.txt)
- Flaky tests have expected failures + a success + a skip denoting the correct information. [override_signal_1_1.txt](https://github.com/pytorch/pytorch/files/8774005/override_signal_1_1.txt)
 and [override_signal_2_1.txt](https://github.com/pytorch/pytorch/files/8774007/override_signal_2_1.txt)

With overriding signal OFF:
- Successes are reported just as they normally are, with no skips. [report_only_one_success.txt](https://github.com/pytorch/pytorch/files/8774019/report_only_one_success.txt)
- Failures fail and are reported as they are with no skips. [report_only_all_fails.txt](https://github.com/pytorch/pytorch/files/8774018/report_only_all_fails.txt)
- Flaky tests have failures + unexpected successes + a skip denoting the correct
 information. [report_only_3_1.txt](https://github.com/pytorch/pytorch/files/8774015/report_only_3_1.txt)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78292
Approved by: https://github.com/suo
2022-05-26 14:06:57 +00:00
Louis Feng
18d46ea9fd [PyTorch] Integrate Execution Graph Observer into PyTorch Profiler (#75358)
Test Plan:
```
buck build mode/dev-nosan caffe2/test:profiler --show-output
buck-out/gen/caffe2/test/profiler#binary.par test_profiler.TestExecutionGraph.test_execution_graph
```

Example output: P491658589

Differential Revision: D35342394

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75358
Approved by: https://github.com/robieta
2022-05-26 08:06:27 +00:00
Hui Guo
d12bf9fd75 [static_runtime] Add auto-generated view ops (#77106)
Summary: This includes the generated view ops from D36258767.

Test Plan: buck run mode/opt //caffe2/benchmarks/static_runtime:static_runtime_cpptest

Differential Revision: D36258968

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77106
Approved by: https://github.com/alanwaketan, https://github.com/tenpercent
2022-05-26 03:13:59 +00:00
Rohan Varma
a0b3814433 Clean prefixes when searching for params / buffers to ignore (#78278)
Co-authored with: @awgu

When `state_dict` has a prefix attached to it, the current logic for ignoring parameters and buffers does not work since it doesn't account for this prefix. To fix this, we make the following changes:

- Clean the key if it starts with the prefix (see the sketch after this list). Note that not all keys may start with the prefix, e.g. if the current module's state_dict_post_hook is running, the previous module's `state_dict` has already been computed, and that previous module is on the same level of the hierarchy as the current module.
- This prefixing makes it so that it is no longer correct to override a child module's ignored params and buffers with the root FSDP instance's (this wouldn't work if child FSDP instances had ignored modules and the root didn't, for example). We fix this by having each parent know about the ignored modules of its children and computing fully qualified names for the ignored params and buffers.
- This means that a particular FSDP instance knows the names (in fully qualified form) of itself and its children that it needs to ignore. It doesn't know about parent ignored params and buffers, but it doesn't need to store that data.
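A sketch of the key-cleaning step from the first bullet (hypothetical helper, not the FSDP source):

```
def _clean_prefix(key: str, prefix: str) -> str:
    # Strip the state_dict prefix only when it is actually present.
    return key[len(prefix):] if key.startswith(prefix) else key

print(_clean_prefix("model.layer1.weight", "model."))  # "layer1.weight"
print(_clean_prefix("other.layer1.weight", "model."))  # unchanged
```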
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78278
Approved by: https://github.com/awgu
2022-05-26 02:43:03 +00:00
Zachary DeVito
b6920405da reorder checks to shave 1 us off no-op dispatch time
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78316

Approved by: https://github.com/Chillee, https://github.com/ezyang
2022-05-26 02:27:33 +00:00
Jerry Zhang
8225f42a8a [quant][fx][equalization] Fix example_inputs follow ups in test_equalize_fx
Summary:
As a followup to https://github.com/pytorch/pytorch/pull/76496, we defined model-specific example_inputs
for the test models in common_quantization.py and used them in test_equalize_fx.

Test Plan:
python test/test_quantization.py TestEqualizeFx

Reviewers:

Subscribers:

Tasks:

Tags:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78314

Approved by: https://github.com/vkuzo
2022-05-26 01:42:24 +00:00
johnlu
c1cbe3bad3 Enhance the _rebuild_qtensor to support other device type other than CPU (#78234)
## Motivation
There is a bug in torch._utils._rebuild_qtensor when restoring a qtensor from a pickle for a non-CPU device type: the tensor is created on the CPU device but set to a storage that may be of a different device type.

## Solution
Create the qtensor based on the storage device type.
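A sketch of the idea (hypothetical helper shown with a plain tensor; the real fix is in torch._utils._rebuild_qtensor and builds a quantized tensor):

```
import torch

def _rebuild_on_storage_device(storage):
    # Create the empty tensor on the storage's device instead of defaulting to CPU,
    # then attach the storage to it.
    t = torch.tensor([], dtype=storage.dtype, device=storage.device)
    t.set_(storage)
    return t

s = torch.ones(4).storage()  # a CPU storage here; the same code works for other devices
print(_rebuild_on_storage_device(s))
```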

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78234
Approved by: https://github.com/ezyang
2022-05-26 01:36:37 +00:00
Jerry Zhang
7ea5fa3dd4 [reland][quant] Add utility function get_fqn_to_example_inputs
Summary:
After https://github.com/pytorch/pytorch/pull/77608 `example_inputs` is required input for `prepare_fx` and `prepare_qat_fx`.
This makes quantizing submodules harder, so we added this utility function to get a dictionary from fqn to submodule example_inputs

Example Call:

```
example_inputs = (tensor0,)
get_fqn_to_example_inputs(m, example_inputs)
```

Example output:
```
{
   "linear1": (tensor1,),
   "linear2": (tensor2,),
   "sub": (tensor3,),
   "sub.linear1": (tensor4,),
   ...
}
```

Test Plan:
python test/test_quantization.py TestUtils

Reviewers:

Subscribers:

Tasks:

Tags:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78286

Approved by: https://github.com/dzdang
2022-05-25 23:31:51 +00:00
mikeiovine
56c23f5633 [SR] Out variant for embedding_bag_byte_unpack
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77661

Add an out variant and wrapper in static runtime.

I just added the declaration with the others in `qembeddingbag.h` for now (rather than properly adding the out variant to the torch library). This can be fixed in a followup.

Differential Revision: [D36449840](https://our.internmc.facebook.com/intern/diff/D36449840/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D36449840/)!

Approved by: https://github.com/tenpercent
2022-05-25 23:24:11 +00:00
Xiang Gao
3b70a7c294 [primTorch] impl_nvfuser for unary ops - 1 (#78220)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78220
Approved by: https://github.com/mruberry
2022-05-25 22:59:48 +00:00
fduwjj
141238a889 [PT-D] Enable nan_to_num op for sharded tensor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78223

Approved by: https://github.com/pritamdamania87
2022-05-25 18:03:42 +00:00
Wonjoo Lee
593d66e1b3 Add lazy shape inference for logical boolean ops (#78004)
Add lazy shape inference for logical boolean ops
- logical_and
- logical_not
- logical_or
- logical_xor

Uses helper functions defined at https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/ExpandUtils.h#L21 to infer the shape.
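The resulting shape rule, illustrated with the eager ops (the lazy shape functions must produce the same shapes):

```
import torch

a = torch.zeros(2, 1, 4, dtype=torch.bool)
b = torch.zeros(3, 1, dtype=torch.bool)
print(torch.logical_and(a, b).shape)  # torch.Size([2, 3, 4]) -- broadcast of both inputs
print(torch.logical_not(a).shape)     # torch.Size([2, 1, 4]) -- same shape as the input
```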
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78004
Approved by: https://github.com/wconstab
2022-05-25 17:48:40 +00:00
Tristan Rice
ebba4219ae torch/distributed: move WorkerInfo registration into libtorch instead of libtorch_python (#78028)
Summary:
This moves torch::class_<WorkerInfo> into `rpc_agent.cpp` so it gets registered in libtorch instead of libtorch_python. This is intermediate work toward getting torch::deploy to load an unmodified copy of libtorch. The current RPC setup is incompatible due to duplicate registrations.

```
unknown file: Failure
C++ exception with description "Exception Caught inside torch::deploy embedded library:
Custom class with name __torch__.torch.classes.dist_rpc.WorkerInfo is already registered. Ensure that registration with torch::class_ is only called once.
Exception raised from registerCustomClass at ../aten/src/ATen/core/custom_class.cpp:61 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x3e (0x7f3bd9adb92e in /home/tristanr/venvs/multipy/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x5c (0x7f3bd9ab7068 in /home/tristanr/venvs/multipy/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #2: torch::registerCustomClass(std::shared_ptr<c10::ClassType>) + 0x110 (0x7f3bc2258980 in /home/tristanr/venvs/multipy/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #3: torch::detail::class_base::class_base(std::string const&, std::string const&, std::string, std::type_info const&, std::type_info const&) + 0x3b9 (0x7f3bc225a419 in /home/tristanr/venvs/multipy/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #4: [0x7f3ba45cfea1]
frame #5: <unknown function> + 0x1b5334 (0x5652bdab9334 in ./test_deploy)
frame #6: <unknown function> + 0x1b4f3e (0x5652bdab8f3e in ./test_deploy)
frame #7: <unknown function> + 0x1b519b (0x5652bdab919b in ./test_deploy)
frame #8: loadSearchFile(char const*) + 0x23e (0x7f3ba62f37f8 in /tmp/torch_deploy9ATEFg)
frame #9: deploy_set_self + 0x51 (0x7f3ba62f38f9 in /tmp/torch_deploy9ATEFg)
frame #10: torch::deploy::Interpreter::Interpreter(torch::deploy::InterpreterManager*, std::shared_ptr<torch::deploy::Environment>) + 0x274 (0x5652bdaaa790 in ./test_deploy)
frame #11: void __gnu_cxx::new_allocator<torch::deploy::Interpreter>::construct<torch::deploy::Interpreter, torch::deploy::InterpreterManager*, std::shared_ptr<torch::deploy::Environment>&>(torch::deploy::Interpreter*, torch::deploy::InterpreterManager*&&, std::shared_ptr<torch::deploy::Environment>&) + 0x81 (0x5652bdaaf58b in ./test_deploy)
frame #12: void std::allocator_traits<std::allocator<torch::deploy::Interpreter> >::construct<torch::deploy::Interpreter, torch::deploy::InterpreterManager*, std::shared_ptr<torch::deploy::Environment>&>(std::allocator<torch::deploy::Interpreter>&, torch::deploy::Interpreter*, torch::deploy::InterpreterManager*&&, std::shared_ptr<torch::deploy::Environment>&) + 0x4a (0x5652bdaae320 in ./test_deploy)
frame #13: void std::vector<torch::deploy::Interpreter, std::allocator<torch::deploy::Interpreter> >::_M_realloc_insert<torch::deploy::InterpreterManager*, std::shared_ptr<torch::deploy::Environment>&>(__gnu_cxx::__normal_iterator<torch::deploy::Interpreter*, std::vector<torch::deploy::Interpreter, std::allocator<torch::deploy::Interpreter> > >, torch::deploy::InterpreterManager*&&, std::shared_ptr<torch::deploy::Environment>&) + 0xee (0x5652bdaae4a0 in ./test_deploy)
frame #14: void std::vector<torch::deploy::Interpreter, std::allocator<torch::deploy::Interpreter> >::emplace_back<torch::deploy::InterpreterManager*, std::shared_ptr<torch::deploy::Environment>&>(torch::deploy::InterpreterManager*&&, std::shared_ptr<torch::deploy::Environment>&) + 0xb6 (0x5652bdaad258 in ./test_deploy)
frame #15: torch::deploy::InterpreterManager::InterpreterManager(unsigned long, std::shared_ptr<torch::deploy::Environment>) + 0x123 (0x5652bdaa83b1 in ./test_deploy)
frame #16: TorchpyTest_InitTwice_Test::TestBody() + 0x65 (0x5652bda075a9 in ./test_deploy)
frame #17: void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) + 0x65 (0x5652bda944b7 in ./test_deploy)
frame #18: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) + 0x5a (0x5652bda8cfe7 in ./test_deploy)
frame #19: testing::Test::Run() + 0x100 (0x5652bda68622 in ./test_deploy)
frame #20: testing::TestInfo::Run() + 0x10f (0x5652bda68fb3 in ./test_deploy)
frame #21: testing::TestSuite::Run() + 0x121 (0x5652bda6980d in ./test_deploy)
frame #22: testing::internal::UnitTestImpl::RunAllTests() + 0x38e (0x5652bda756e6 in ./test_deploy)
frame #23: bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) + 0x65 (0x5652bda9586b in ./test_deploy)
frame #24: bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) + 0x5a (0x5652bda8e0f7 in ./test_deploy)
frame #25: testing::UnitTest::Run() + 0xc9 (0x5652bda73fd1 in ./test_deploy)
frame #26: RUN_ALL_TESTS() + 0x11 (0x5652bda169fa in ./test_deploy)
frame #27: main + 0x27 (0x5652bda10ce2 in ./test_deploy)
frame #28: <unknown function> + 0x2d310 (0x7f3bc0431310 in /usr/lib/libc.so.6)
frame #29: __libc_start_main + 0x81 (0x7f3bc04313c1 in /usr/lib/libc.so.6)
frame #30: _start + 0x25 (0x5652bda063b5 in ./test_deploy)
```

Test Plan: CI

Differential Revision: D36564258

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78028
Approved by: https://github.com/rohan-varma
2022-05-25 17:46:39 +00:00
Andrew Gu
8412f209f0 [FSDP] Remove unneeded padding logic for optim state dict
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78208

Approved by: https://github.com/rohan-varma
2022-05-25 17:22:03 +00:00
Vasiliy Kuznetsov
53e05ad4b2 ns for fx: remove restriction on nodes with no args and only kwargs
Summary:

Removes the restriction from NS for FX on handling nodes which have
no positional arguments, such as `F.linear(input=x, weight=w, bias=b)`.

In order to achieve this, we delete all places in the code which
were doing things like

```
node.args[0]
```

And replace them with

```
_get_normalized_nth_input(node, gm, 0)
```

The `_get_normalized_nth_input` function is a best-effort way to
get the nth normalized input.

This is needed because some FX tools output nodes normalized to
be kwargs only, and we need to be able to handle this in NS.

Test plan:

```
python test/test_quantization.py -k test_linear_kwargs_shadow
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78181

Approved by: https://github.com/z-a-f, https://github.com/hx89
2022-05-25 17:00:39 +00:00