Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73748
This adds CPU-only slow test jobs, which previously would never run.
Includes fixes/skips for slow tests that fail; these need to be skipped now because they previously never ran.
Test Plan: Imported from OSS
Reviewed By: malfet
Differential Revision: D34628803
Pulled By: davidberard98
fbshipit-source-id: c090ab7bf7bda9e24ec5cdefa6fd35c6310dbac0
(cherry picked from commit 06f7a94a57cc7023e9c5442be8298d20cd011144)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73564
While maintaining API backward compatibility, add an optional output parameter to `split_module()` that returns a mapping from the new qualified names in the modules after the split to the old qualified names in the original module.
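A minimal usage sketch of the new output parameter (assuming it is a dict passed in as `qualname_map` and populated in place by `split_module`):
```python
import torch
from torch.fx import symbolic_trace
from torch.fx.passes.split_module import split_module

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 4)

    def forward(self, x):
        return self.linear(x).relu()

m = M()
traced = symbolic_trace(m)
qualname_map = {}
split = split_module(
    traced, m,
    split_callback=lambda node: 0,  # put every node into a single partition
    qualname_map=qualname_map,
)
print(qualname_map)  # e.g. {'submod_0.linear': 'linear'} -- new qualname -> old qualname
```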
Test Plan:
1. Added a test (test_split_qualname_mapping) to test_fx_experimental.py to check the returned qualname mapping
```
$ python test_fx_experimental.py
...
Ran 1084 tests in 73.464s
OK (skipped=531, expected failures=4)
```
2. Ask test_fx.py to accept split_module's new signature
```
$ python test_fx.py --accept
```
Reviewed By: jamesr66a
Differential Revision: D34541792
fbshipit-source-id: e8ec7e77ec884e4db7cad0c0593e31861c76e42d
(cherry picked from commit d2e5a95a353ee5fb52cdba065f127489e9df47ae)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61608
See #61544 for an example of issues created by functional wrappers. In this
case, these are directly wrapping the native function with no added
functionality. One exception was `bilinear` which was just missing the default
argument in C++, but was otherwise the same.
I've kept the symbol `torch.functional.istft` because it looks like public API,
but it could just as easily be moved to `_torch_docs.py`.
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D31401361
Pulled By: albanD
fbshipit-source-id: 162b74d0b2d4f2e5c4834687a94541960cefdd52
(cherry picked from commit 700cd73ca1)
Summary:
These tests were not actually running because they were defined in the local scope of another test.
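An illustration of the anti-pattern being fixed (hypothetical test names): a test class defined inside another test's body is never collected by unittest, so its cases silently never run.
```python
import unittest

class TestOuter(unittest.TestCase):
    def test_something(self):
        # Defined in the local scope of test_something, so unittest's collector
        # never sees it -- test_never_runs silently never executes.
        class TestInner(unittest.TestCase):
            def test_never_runs(self):
                raise AssertionError("this would never fire")
        self.assertTrue(True)
```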
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71885
Reviewed By: scottxu0730
Differential Revision: D33806251
Pulled By: jansel
fbshipit-source-id: 48a2d7b472f160759ef55e6fff1f8890511e3345
(cherry picked from commit 9ae14efb25)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71016
I found out that `split_module` doesn't preserve default values for arguments. In trying to fix that, I noticed that `Graph.placeholder` doesn't make it easy to add a default argument when making a placeholder. This PR addresses both of those issues
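A hedged sketch of the placeholder side of the fix (assuming `Graph.placeholder` accepts a `default_value` argument, as this PR adds): the generated `forward()` carries the default through.
```python
import torch
from torch.fx import Graph, GraphModule

g = Graph()
x = g.placeholder('x')
y = g.placeholder('y', default_value=2)  # placeholder with a default argument
g.output(g.call_function(torch.add, (x, y)))

gm = GraphModule(torch.nn.Module(), g)
print(gm.code)            # generated forward should read roughly: def forward(self, x, y = 2)
print(gm(torch.ones(3)))  # y defaults to 2 -> tensor([3., 3., 3.])
```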
Test Plan: Imported from OSS
Reviewed By: ansley
Differential Revision: D33482218
Pulled By: jamesr66a
fbshipit-source-id: 57ebcdab25d267333fb1034994e08fc1bdb128ee
Summary:
Earlier, we were only testing inputs of shape `(5,)` for `nn.functional.dropout`, but since it's used a lot, I feel it's a good idea to test a few more shapes, including scalars. This PR:
1. Revises sample inputs for `nn.functional.dropout`
2. Adds an OpInfo for `nn.functional.dropout2d`.
A note regarding the documentation:
Looks like `nn.functional.dropout2d` also supports inputs of shape `(H, W)` in addition to `(N, C, H, W)` / `(C, H, W)`, but the [documentation](https://pytorch.org/docs/stable/generated/torch.nn.Dropout2d.html#torch.nn.Dropout2d) doesn't mention the `(H, W)` case. Should that be revised, or am I missing something here? (Filed an issue here: https://github.com/pytorch/pytorch/issues/67892)
```python
# A 2D tensor is a valid input for Dropout2d
In [11]: tensor = torch.randn((3, 4), device='cpu', dtype=torch.float32)
In [12]: dropout2d = torch.nn.Dropout2d(p=0.5)
In [13]: dropout2d(tensor)
Out[13]:
tensor([[-0.1026, -0.0000, -0.0000, -0.0000],
        [-1.5647,  0.0000, -0.0000, -0.5820],
        [-0.0000, -3.2080,  0.1164, -3.6780]])
```
Issue Tracker: https://github.com/pytorch/pytorch/issues/54261
cc: mruberry zou3519
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67891
Reviewed By: mrshenli
Differential Revision: D32628527
Pulled By: mruberry
fbshipit-source-id: 4c9b89550f1d49526e294378ce107eba9f29cabb
Summary:
As per title.
While working on this I discovered several issues with these methods related to grad instabilities; I will file them and link them here later. Getting these to pass all the tests despite the discovered issues was quite painful, sorry for the delay, mruberry!
cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69107
Reviewed By: zou3519
Differential Revision: D32920341
Pulled By: mruberry
fbshipit-source-id: 15b33e2b46acdcbff8a37d8e43e381eb55d1a296
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67357
This PR adds OpInfos for:
- new_ones, new_zeros, new_full, new_empty
- rand_like, randint_like
I forgot to add the _like functions in a previous PR, so here they are.
Test Plan: - wait for tests
Reviewed By: mruberry
Differential Revision: D31969533
Pulled By: zou3519
fbshipit-source-id: 236d70d66e82f1d6f8e5254b55ca2a37b54c9494
Summary:
All of the pooling modules except MaxUnpool and LPPool return either a
Tensor or [Tensor, Tensor]. The current type annotations are inaccurate,
and prevent scripting the module if `return_indices` is set to `True`.
There's not a great way to make this agree with mypy because the
overload is dependent on the value of return_indices, an attribute.
I tried changing the annotations from `Tensor` to
`Union[Tensor, Tuple[Tensor, Tensor]]`, but that breaks a bunch of uses
that have return_indices=False.
For example, this breaks:
4e94e84f65/torch/nn/modules/container.py (L139)
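A minimal sketch (hypothetical modules, not the linked code) of why the `Union` annotation breaks callers that assume a plain `Tensor` under TorchScript:
```python
import torch
from typing import Tuple, Union

class Pool(torch.nn.Module):
    # Widened annotation, as in the attempted fix
    def forward(self, x: torch.Tensor) -> Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]:
        return torch.nn.functional.max_pool2d(x, 2)

class Consumer(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + 1

class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.pool = Pool()
        self.consumer = Consumer()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.pool(x)          # statically a Union under the widened annotation
        return self.consumer(y)   # TorchScript rejects a Union where a Tensor is expected

# torch.jit.script(Model())  # would fail to compile with the Union annotation
```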
Also clean up how test names were being constructed in test_jit, since
otherwise we were getting name collisions when there were two tests on
the same nn.Module.
Fixes https://github.com/pytorch/pytorch/issues/45904
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65847
Reviewed By: ZolotukhinM
Differential Revision: D31462517
Pulled By: eellison
fbshipit-source-id: 6f9e8df1be6c75e5e1e9bae07cf3ad3603ba59bd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64282
OpInfos for:
- Tensor.bfloat16, Tensor.bool, Tensor.byte, Tensor.char
- Tensor.double, Tensor.float, Tensor.half, Tensor.int
- Tensor.short, Tensor.long
None of these are supported by TorchScript. Also, the OpInfo autograd
test runner assumes that the operation is not allowed to change the
dtype of the argument, so only Tensor.double has
`supports_autograd=True` (in theory Tensor.bfloat16, Tensor.float,
Tensor.half should be differentiable).
Test Plan: - run tests
Reviewed By: dagitses
Differential Revision: D31452627
Pulled By: zou3519
fbshipit-source-id: b7f272e558558412c47aefe947af7f060dfb45c5
Summary:
OpInfo tracker: https://github.com/pytorch/pytorch/issues/54261
- Eliminate duplicated testing logic in test_autograd
- Moved tests that rely on this testing logic to use OpInfos
- `cat` already has OpInfo (no action needed)
- Created OpInfo for `block_diag` and `broadcast_tensors`
Running into some FX errors. Added op to skip-list and created an issue here: https://github.com/pytorch/pytorch/issues/64997
Both `block_diag` and `broadcast_tensors` are variadic, so skipping `test_variant_consistency_jit` (from comments on other OpInfos, it looks like JIT does not support variadic tensors)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64993
Reviewed By: jbschlosser
Differential Revision: D30961736
Pulled By: soulitzer
fbshipit-source-id: e169305384a683acae1178c4e12e9e214a67226a
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61767
## Changes
- [x] Add `torch.concat` alias to `torch.cat`
- [x] Add OpInfo for `cat`/`concat`
- [x] Fix `test_out` skips (Use `at::native::resize_output` or `at::native::resize_output_check`)
- [x] `cat`/`concat`
- [x] `stack`
- [x] `hstack`
- [x] `dstack`
- [x] `vstack`/`row_stack`
- [x] Remove redundant tests for `cat`/`stack`
~I've not added `cat`/`concat` to OpInfo `op_db` yet, since cat is a little more tricky than other OpInfos (should have a lot of tests) and currently there are no OpInfos for that. I can try to add that in a subsequent PR or maybe here itself, whatever is suggested.~
**Edit**: cat/concat OpInfo has been added.
**Note**: I've added the named tensor support for `concat` alias as well, maybe that's out of spec in `array-api` but it is still useful for consistency in PyTorch.
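A quick sanity check of the alias (a minimal sketch, not part of the added tests):
```python
import torch

a, b = torch.randn(2, 3), torch.randn(2, 3)
# torch.concat dispatches to the same implementation as torch.cat
assert torch.equal(torch.concat((a, b), dim=0), torch.cat((a, b), dim=0))
```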
Thanks to krshrimali for guidance on my first PR :))
cc mruberry rgommers pmeier asmeurer leofang AnirudhDagar asi1024 emcastillo kmaehashi heitorschueroff krshrimali
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62560
Reviewed By: saketh-are
Differential Revision: D30762069
Pulled By: mruberry
fbshipit-source-id: 6985159d1d9756238890488a0ab3ae7699d94337
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63101
Move internal fx2trt to torch/fx/experimental/fx2trt and merge the two TRT interpreters we have right now. cc: mortzur, as this might affect the uru exporting script.
Move oss_acc_tracer to torch/fx/experimental/fx_acc.
Test Plan: CI
Reviewed By: jerryzh168
Differential Revision: D30257909
fbshipit-source-id: 4e374965fbf88d72e91844d9e9b6ff9b98f467d1
Summary:
Reference https://github.com/pytorch/pytorch/issues/50345
`zeta` was already present in the codebase to support computation of `polygamma`.
However, `zeta` only had a `double(double, double)` signature **for CPU** before this PR (which meant that `polygamma` computations were always upcast to `double` for the zeta part).
With this PR, float computations will take place in float and double computations in double.
I have also refactored the code and moved the duplicated code from `Math.cuh` to `Math.h`.
**Note**: For scipy, `q` is optional, and if it is `None` it defaults to `1`, which corresponds to the Riemann zeta function. However, for `torch.special.zeta` I made it mandatory, because it feels odd to me that without `q` this is the Riemann zeta and with `q` it is the general Hurwitz zeta. Sticking to just the general form made more sense, as passing `1` for `q` is trivial.
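A small usage sketch of the exposed operator (values rounded):
```python
import torch

x = torch.tensor([2.0, 3.0])
# Hurwitz zeta with q=1 reduces to the Riemann zeta function:
# zeta(2) = pi**2 / 6 ~ 1.6449, zeta(3) ~ 1.2021
print(torch.special.zeta(x, torch.tensor(1.0)))
```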
Verify:
* [x] Docs https://14234587-65600975-gh.circle-artifacts.com/0/docs/special.html#torch.special.zeta
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59623
Reviewed By: ngimel
Differential Revision: D29348269
Pulled By: mruberry
fbshipit-source-id: a3f9ebe1f7724dbe66de2b391afb9da1cfc3e4bb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60064
This implements a host saturation optimization to maximize the utilization of the available devices.
It uses a greedy heuristic to replicate all partitions on the used devices to another set of idle devices with enough memory.
The added unittest shows an example as follows:
```
partition_0: 192 bytes; partition_1: 48 bytes
dev_0: 200 bytes, [partition_0]
dev_1: 200 bytes, [partition_1]
dev_2: 100 bytes,
dev_3: 100 bytes,
dev_4: 200 bytes,
dev_5: 100 bytes
```
Before host saturation, `partition_0` is assigned to dev_0 and `partition_1` is assigned to dev_1.
After host saturation, `partition_0` is replicated to dev_4 simply because it's the only device that can hold all partitions on dev_0. `partition_1` is replicated to dev_2 because it has the smallest memory that is still large enough to hold all partitions on dev_1.
Test Plan:
```
buck test mode/opt //caffe2/test:test_fx_experimental -- --exact 'caffe2/test:test_fx_experimental - test_saturate_host (test_fx_experimental.TestFXExperimental)'
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/8444249343103429
✓ ListingSuccess: caffe2/test:test_fx_experimental - main (1.322)
✓ Pass: caffe2/test:test_fx_experimental - test_saturate_host (test_fx_experimental.TestFXExperimental) (1.322)
Summary
Pass: 1
ListingSuccess: 1
```
An e2e test will be added to `test_fx_glow.py` in a followup diff.
Reviewed By: gcatron
Differential Revision: D29039998
fbshipit-source-id: 57518aadf668f7f05abd6ff73224c16b5d2a12ac
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60056
Previously we put the whole graph as a single partition onto a device with maximum memory if possible, but the code assumed that the first logical device always has the maximum memory.
This diff fixes this issue and updates the unittest to reflect such a corner case.
Test Plan:
```
buck test mode/opt //caffe2/test:test_fx_experimental -- --exact 'caffe2/test:test_fx_experimental - test_find_single_partition (test_fx_experimental.TestFXExperimental)'
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/6473924507772744
✓ ListingSuccess: caffe2/test:test_fx_experimental - main (1.357)
✓ Pass: caffe2/test:test_fx_experimental - test_find_single_partition (test_fx_experimental.TestFXExperimental) (1.206)
Summary
Pass: 1
ListingSuccess: 1
```
Reviewed By: gcatron
Differential Revision: D29118715
fbshipit-source-id: cac6a1f0d2f47717446dcc80093bbcf362663859
Summary:
This PR fixes a bug in test_fx_experimental where code generated for ops with kwarg-only Tensor parameters would fail to execute because those parameters were passed positionally.
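An illustration of the failure mode with a hypothetical op: Python raises a `TypeError` when a keyword-only parameter is supplied positionally, so the generated code must pass it by name.
```python
import torch

def my_op(x, *, weight):  # hypothetical op with a kwarg-only Tensor parameter
    return x * weight

x, w = torch.ones(3), torch.full((3,), 2.0)
# my_op(x, w)            # TypeError: my_op() takes 1 positional argument but 2 were given
print(my_op(x, weight=w))  # what the generated code needs to emit instead
```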
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58587
Reviewed By: ailzhang
Differential Revision: D28548365
Pulled By: heitorschueroff
fbshipit-source-id: 8f1746053cbad1b11e817b0099db545d8dd22232
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57483
Pull Request resolved: https://github.com/pytorch/glow/pull/5622
Quantized linear has packed parameters. We want to unpack it so that it would be easier for graph optimization and importer to deal with the weight and bias. A customized remapping function is used to unpack quantized linear and map it to acc_op.linear.
Test Plan: `buck test glow/fb/fx/nnpi_importer:test_importer`
Reviewed By: gcatron, jfix71, khabinov
Differential Revision: D27451237
fbshipit-source-id: e46e961734788fd5333e227ca6143fd37c33204e
Summary:
Fixes https://github.com/pytorch/pytorch/issues/57719.
This PR fixes `torch.Tensor{__rsub__, __rdiv__, __rtruediv__, __pow__, __rmatmul__}` to return `NotImplemented` instead of raising a `TypeError`.
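A minimal sketch of the binary-operator protocol this change follows: returning `NotImplemented` (rather than raising) lets Python fall back to the other operand's reflected method.
```python
class A:
    def __sub__(self, other):
        return NotImplemented  # "I don't know how to handle `other`"

class B:
    def __rsub__(self, other):
        return "B handled A - B"

# Python tries A.__sub__ first; NotImplemented makes it fall back to B.__rsub__.
# Raising a TypeError in A.__sub__ would abort that fallback.
print(A() - B())  # -> "B handled A - B"
```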
cc/ mruberry: The first commit of this PR is the same as 1d209db1cc except for the commit message.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57934
Reviewed By: mruberry
Differential Revision: D28351931
Pulled By: albanD
fbshipit-source-id: 985457a44dba24d2496794dfb8c1661cbcd4ff8f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55995
Normalization is currently somewhat broken, but making default arguments visible still appears to work and is useful functionality to rely on. This adds an option to `NormalizeArgs`'s `__init__` called `normalize_to_only_use_kwargs`, which defaults to `True`; if set to `False`, calls keep the same signature as provided but the resolved keyword arguments are additionally recorded in `kwargs`.
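A hedged usage sketch (assuming the transformer lives at `torch.fx.experimental.normalize.NormalizeArgs` and is applied via `.transform()`):
```python
import torch
import torch.fx
from torch.fx.experimental.normalize import NormalizeArgs

def f(x):
    return torch.transpose(x, 0, 1)

traced = torch.fx.symbolic_trace(f)
# Default (True) rewrites calls to use only kwargs; False keeps the call
# signature as provided while still surfacing the resolved kwargs.
normalized = NormalizeArgs(traced, normalize_to_only_use_kwargs=False).transform()
print(normalized.code)
```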
Test Plan: Added test to `test_fx_experimental`.
Reviewed By: 842974287
Differential Revision: D27759448
fbshipit-source-id: 620061fcf46d8549ac70b62aede8b6740aee3778
Summary:
Commandeered from https://github.com/pytorch/pytorch/pull/54563
Primary changes from first PR:
1. Refactored primary `normalize_function` logic into `operator_schemas.py` so that non-FX users can use it.
2. Refactored tests a bit, and added a path to call `normalize_function` directly.
3. Moved check for `boolean_dispatch` so that `torch.lu` also gets properly handled.
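A hedged sketch of calling the refactored helper directly from `operator_schemas.py` (argument layout per this PR's description; the exact signature may differ slightly):
```python
import torch
from torch.fx.operator_schemas import normalize_function

# Normalize a torch.transpose call so every argument becomes an explicit kwarg
pair = normalize_function(
    torch.transpose,
    args=(torch.randn(2, 3), 0, 1),
    kwargs={},
    normalize_to_only_use_kwargs=True,
)
print(pair.kwargs.keys())  # e.g. dict_keys(['input', 'dim0', 'dim1'])
```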
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55992
Reviewed By: mruberry
Differential Revision: D27774396
Pulled By: Chillee
fbshipit-source-id: 7f65632e1d608e4abd55aec5ccbfdc3f67f52b8e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56212
The current design doesn't make it easy to use `node.copy()`. Explicitly copy over the node's meta.
Test Plan: Updated `test_subgraph_creation` in `test_fx_experimental`
Reviewed By: jamesr66a
Differential Revision: D27808477
fbshipit-source-id: 7fe7b6428c830307dbd1e395f16fa2774936d3b3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55405
Pull Request resolved: https://github.com/pytorch/glow/pull/5516
Allows FXIRImport to import quantized models.
This diff doesn't include support for per-channel weights, linear, and conv. Those will be addressed in the next diff.
Test Plan: buck test glow/fb/fx/nnpi_importer:test_importer
Reviewed By: jackm321, jfix71
Differential Revision: D27313543
fbshipit-source-id: bf5c96ef5f2ff1835c09db981e0ceefaec56dd5b
Summary:
Part of https://github.com/pytorch/pytorch/issues/48209
Taken from the docstring:
Performs a set of optimization passes to optimize a model for the purposes of inference. Specifically, the passes that are run are:
1. Conv/BN fusion
2. Dropout removal
3. MKL layout optimizations
The third optimization takes a function `use_mkl_heuristic` that's used to determine whether a subgraph should be explicitly run in MKL layout.
I implemented 2 heuristics:
1. Runs the subgraph in MKL layout if it is larger than 2 nodes.
2. Benchmarks each subgraph with and without MKL layout, and keeps the MKL version if it's faster.
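A hedged end-to-end sketch (assuming the pass is exposed as `torch.fx.experimental.optimization.optimize_for_inference`, an MKLDNN-enabled CPU build, and that torchvision is available; the heuristic plumbing is elided):
```python
import torch
import torchvision.models as models
from torch.fx.experimental.optimization import optimize_for_inference

rn18 = models.resnet18().eval()
optimized = optimize_for_inference(rn18)

inp = torch.randn(10, 3, 224, 224)
with torch.no_grad():
    out = optimized(inp)
```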
### Batch size of 10 and multi-threaded.
Results with the second heuristic are generally as strong as the "jit.freeze" version, except in `densenet` and `vgg`, where it's faster, likely due to the heuristic being better. With the first heuristic, there are some notable gaps, particularly on `inception_v3` and `alexnet`.
```
model          Eager       FX          FX Auto     jit.mkldnn   threads
------------   ---------   ---------   ---------   ----------   -------
custom         0.195614    0.14686     0.15929     0.156442     6
resnet18       0.172012    0.114007    0.119678    0.12945      6
resnet50       0.486463    0.294308    0.299518    0.318121     6
densenet161    0.955309    0.893502    0.882798    1.29315      6
inception_v3   0.38454     0.307076    0.239513    0.233083     6
googlenet      0.229388    0.237486    0.170458    0.174106     6
shufflenet     0.0513613   0.0286739   0.0292908   0.0267209    6
alexnet        0.0709602   0.0768137   0.0660831   0.0650399    6
vgg16          1.053993    0.9013264   0.9360212   1.082820     6
mobilenet      0.12264     0.0970935   0.0936568   0.106314     6
mnasnet        0.0989875   0.0412083   0.0424499   0.0472336    6
resnext        0.476811    0.315428    0.314422    0.343156     6
```
For single-threaded (still running...)
```
model          eager       FX          FX auto     mkl          threads
------------   ---------   ---------   ---------   ---------    -------
custom         0.0401415   0.259863    0.0263152   0.200667     1
resnet18       0.499931    0.382113    0.383711    0.396335     1
resnet50       1.10353     0.911865    0.923645    0.992125     1
densenet161    2.20158     2.39421     2.08204     2.30124      1
inception_v3   0.79161     0.849207    0.703546    0.724492     1
googlenet      0.66896     0.820965    0.515927    0.529414     1
shufflenet     0.0987308   0.0689343   0.0629298   0.0617193    1
alexnet        0.198795    0.19862     0.19325     0.211934     1
vgg16          3.744       3.2499      3.28503     3.31576      1
mobilenet      0.152725    0.14505     0.135555    0.159754     1
mnasnet        0.141983    0.089406    0.089599    0.0956167    1
resnext        1.13778     0.97016     0.955417    0.965376     1
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53805
Reviewed By: gmagogsfm
Differential Revision: D27424611
Pulled By: Chillee
fbshipit-source-id: a39137159de962fba7ca15121dfa9e78c1e01223
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53444
GraphModule construction has two options when constructing the base nn.Module: a dict of names to attrs to assign to the GraphModule, or another nn.Module to copy attrs from.
- For the dict case, add logic to explicitly register plain `torch.Tensor`s that are not `nn.Parameter`s as buffers on the GraphModule, else fall back to `__setattr__` (see the sketch after this list).
- For the other `nn.Module` case, update so that it checks whether the attr to copy in is a buffer in the other module and registers it as such, else falls back to `__setattr__`.
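A minimal sketch of the dict case (a plain tensor attribute should now end up registered as a buffer):
```python
import torch
import torch.fx

g = torch.fx.Graph()
x = g.placeholder('x')
b = g.get_attr('my_buf')
g.output(g.call_function(torch.add, (x, b)))

# Construct the GraphModule from a dict of names -> attrs; the plain tensor
# should show up under named_buffers() rather than as a loose attribute.
gm = torch.fx.GraphModule({'my_buf': torch.zeros(3)}, g)
print(dict(gm.named_buffers()))  # expect {'my_buf': tensor([0., 0., 0.])}
```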
Test Plan: Added tests for fetching params and buffers from a GraphModule using both dict and module `__init__`s
Reviewed By: jamesr66a
Differential Revision: D26860055
fbshipit-source-id: 8d9999f91fef20aaa10969558006fc356247591f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51974
Right now, when an FX `Graph` references an external object, we will emit
code like:
    import foo

    def forward(input: foo.bar.baz):
        ...
This is problematic in a world with `torch.package`, since then name
`foo.bar.baz` may reference a name from any number of packages.
This PR lays the groundwork for FX-package integration by separating the
resolution of external references from the generation of the function
code.
When generating a Graph's Python source, we keep track of all external
references and assign them unique names. At the end, we have a
dictionary mapping names -> actual objects. This becomes the `globals`
namespace we pass to `exec` when installing the forward function in a
`GraphModule`. This is nice because we can always be sure that `exec` is
seeing the same objects that were referenced from the `Graph`, no import
statements needed.
At serialization time, we use a `ModuleEnv` to resolve the globals dict
to a set of import statements that can be run to reproduce the `globals`
namespace. This is only used on serialization/deserialization, and those
functions are expected to check that the import statements are producing
the correct results.
Concretely, the code above will now look like:
    from foo.bar import baz as foo_bar_baz

    def forward(input: foo_bar_baz):
        ...
Test Plan: Imported from OSS
Reviewed By: jamesr66a
Differential Revision: D26340593
Pulled By: suo
fbshipit-source-id: fe247f75205d0a03fd067bdd0f95491e8edf1436
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50151
**Summary**
This commit adds a graph transformation pass that merges several matrix
multiplications that use the same RHS operand into one large matrix
multiplication. The LHS operands from all of the smaller matrix multiplications
are concatenated together and used as an input in the large matrix multiply,
and the result is split in order to obtain the same products as the original
set of matrix multiplications.
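A conceptual sketch of the rewrite using plain tensor ops (not the actual pass):
```python
import torch

W = torch.randn(8, 4)                      # shared RHS operand
a, b = torch.randn(3, 8), torch.randn(5, 8)

# Concatenate the LHS operands, do one large matmul, then split the result
merged = torch.cat([a, b], dim=0) @ W
out_a, out_b = torch.split(merged, [3, 5], dim=0)

assert torch.allclose(out_a, a @ W)
assert torch.allclose(out_b, b @ W)
```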
**Test Plan**
This commit adds a simple unit test with two matrix multiplications that share
the same RHS operand.
`python test/test_fx_experimental.py -k merge_matmul -v`
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D25809409
Pulled By: SplitInfinity
fbshipit-source-id: fb55c044a54dea9f07b71aa60d44b7a8f3966ed0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50120
This commit adds a graph transformation pass that merges several matrix
multiplications that use the same RHS operand into one large matrix
multiplication. The LHS operands from all of the smaller matrix multiplications
are concatenated together and used as an input in the large matrix multiply,
and the result is split in order to obtain the same products as the original
set of matrix multiplications.
Test Plan:
This commit adds a simple unit test with two matrix multiplications that share
the same RHS operand.
`buck test //caffe2/test:fx_experimental`
Reviewed By: jamesr66a
Differential Revision: D25239967
fbshipit-source-id: fb99ad25b7d83ff876da6d19dc4abd112d13001e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47935
Fetch the parameters that are needed for lowering from nn.Module to fx.node for leaf_modules.
Test Plan: A test `test_fetch` is added to test_fx_experimental.py.
Reviewed By: jfix71
Differential Revision: D24957142
fbshipit-source-id: a349bb718bbcb7f543a49f235e071a079da638b7