Commit Graph

65545 Commits

Author SHA1 Message Date
lezcano
acd02a60d5 Add a test making sure we are not importing SymPy when importing torch (#112038)
As per title
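A minimal sketch of the kind of check this adds (the subprocess-based approach below is an assumption for illustration, not the PR's actual test file):

```python
import subprocess
import sys

# Run a fresh interpreter, import torch, and assert SymPy was never pulled in.
check = "import sys, torch; assert 'sympy' not in sys.modules"
subprocess.run([sys.executable, "-c", check], check=True)
```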
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112038
Approved by: https://github.com/malfet, https://github.com/peterbell10
ghstack dependencies: #112035, #112036, #112037
2023-10-26 23:32:27 +00:00
lezcano
47ccf04885 Split SymNode into its own file (#112037)
This PR:

- Moves TrueDiv, LShift, RShift, IsNonOverlappingAndDenseIndicator to `_sympy.functions.py`
- Moves SymNode to `fx.experimental.sym_node`.
  - This file does not have any SymPy dependencies at import time
  - It installs the magic methods in Sym{Bool,Int,Float}.
  - N.b. With this split, we may be able to move Sym{Bool,Int,Float} to this file, and remove quite a few of the hacks around these classes
- Imports `sym_node` in `torch/__init__.py` rather than the whole `symbolic_shapes.py`.
  This breaks the import-time dependency between torch and SymPy
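A small sketch of what the new layout enables (module path taken from the PR description; the assertion is illustrative and assumes the whole stack has landed):

```python
import sys

# SymNode now lives in its own module, which has no SymPy dependency at import time.
from torch.fx.experimental.sym_node import SymNode  # installs Sym{Bool,Int,Float} magic methods

assert "sympy" not in sys.modules
```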

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112037
Approved by: https://github.com/peterbell10
ghstack dependencies: #112035, #112036
2023-10-26 23:32:27 +00:00
lezcano
deac5357db Make proxy_tensor.py not depend on SymPy (#112036)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112036
Approved by: https://github.com/malfet, https://github.com/peterbell10
ghstack dependencies: #112035
2023-10-26 23:32:19 +00:00
lezcano
4f7f46ee35 Move SymDispatchMode to its own file (#112035)
This is just code movement, plus a getter and a setter to break the dependency of SymDispatchMode (and, in turn, ProxySymDispatchMode) on SymPy.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112035
Approved by: https://github.com/peterbell10
2023-10-26 23:32:11 +00:00
PyTorch MergeBot
55ab9932f5 Revert "Constrain sdpa to fx strides (#111721)"
This reverts commit 8a7c3cec78.

Reverted https://github.com/pytorch/pytorch/pull/111721 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is breaking ROCm job in trunk 8a7c3cec78 ([comment](https://github.com/pytorch/pytorch/pull/111721#issuecomment-1782064133))
2023-10-26 23:27:57 +00:00
PyTorch MergeBot
4a94f77c8e Revert "Make numpy/lib vendored tests dynamo traceable (#112147)"
This reverts commit 190b6e4ba8.

Reverted https://github.com/pytorch/pytorch/pull/112147 on behalf of https://github.com/huydhn due to Sorry for reverting this again, but this is failing in trunk 190b6e4ba8 ([comment](https://github.com/pytorch/pytorch/pull/112147#issuecomment-1782056995))
2023-10-26 23:23:49 +00:00
Shunting Zhang
73cc5d1cdd [inductor] benchmark fusion (#108193)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108193
Approved by: https://github.com/jansel
2023-10-26 22:18:37 +00:00
Nikita Shulga
e660bd1422 Re-enable some embedded bag tests (#111712)
They were temporarily disabled in 2019 by https://github.com/pytorch/pytorch/pull/26599

As suggested, the relative tolerance was increased from 0 to 2% when tests use the float16 dtype

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111712
Approved by: https://github.com/huydhn
2023-10-26 22:16:38 +00:00
Evgeni Burovski
190b6e4ba8 Make numpy/lib vendored tests dynamo traceable (#112147)
Follow-up to https://github.com/pytorch/pytorch/pull/112146 and #112141: make the vendored numpy/lib tests dynamo traceable

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112147
Approved by: https://github.com/lezcano
2023-10-26 21:41:22 +00:00
PyTorch MergeBot
abe172e268 Revert "Cleanup error reporting for ProcessGroupNCCL (#111979)"
This reverts commit b29c658265.

Reverted https://github.com/pytorch/pytorch/pull/111979 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing multigpu test in trunk b29c658265 ([comment](https://github.com/pytorch/pytorch/pull/111979#issuecomment-1781919184))
2023-10-26 21:29:40 +00:00
rzou
d91a18c433 Grandfather in torchgen'ed aten ops to torch.Tag.pt2_compliant_tag (#112053)
In torchgen, we add the pt2_compliant_tag to all aten ops.
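A quick sketch of how the grandfathering can be observed (the op choices are arbitrary; the tag name comes from the commit title and the `.tags` attribute from the dispatcher API):

```python
import torch

# Every torchgen'ed aten overload should now carry the pt2_compliant_tag.
assert torch.Tag.pt2_compliant_tag in torch.ops.aten.sin.default.tags
assert torch.Tag.pt2_compliant_tag in torch.ops.aten.add.Tensor.tags
```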

Test Plan:
- new test
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112053
Approved by: https://github.com/soulitzer
2023-10-26 21:21:09 +00:00
Jon Chuang
27cf49549a [dynamo] ExecutorchCallDelegateHigherOrderVariable - add sanity check that input and output tensors are disjoint (#111960)
Fixes https://github.com/pytorch/pytorch/issues/111917

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111960
Approved by: https://github.com/zou3519
2023-10-26 21:13:05 +00:00
Bin Bao
73f36e44fb [aotinductor] Add a debug compile flag (#112021)
Summary: When the debug compile flag is specified, model.so is compiled with "-O0 -g".

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112021
Approved by: https://github.com/chenyang78
ghstack dependencies: #111823
2023-10-26 21:11:08 +00:00
Bin Bao
f66cc67562 [aotinductor] Fix duplicated unbacked symbol declarations (#111823)
Summary: For https://github.com/pytorch/pytorch/issues/111711

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111823
Approved by: https://github.com/ezyang, https://github.com/aakhundov
2023-10-26 21:11:08 +00:00
Lengyue
f839a5627b Add bf16 support to replicate padding (#112099)
Fixes #99433
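A minimal sketch of the newly supported path (the shapes and the functional entry point are illustrative assumptions, not this PR's test):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 2, 4, 4, dtype=torch.bfloat16)
out = F.pad(x, (1, 1, 1, 1), mode="replicate")  # replication padding on a bf16 tensor
print(out.dtype, out.shape)                     # torch.bfloat16 torch.Size([1, 2, 6, 6])
```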

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112099
Approved by: https://github.com/mikaylagawarecki
2023-10-26 20:30:49 +00:00
Elias Ellison
8a7c3cec78 Constrain sdpa to fx strides (#111721)
Fix for https://github.com/pytorch/pytorch/issues/109607. sdpa requires the last-dimension stride to be 1. Add a constraint so that we run the op with the strides observed during tracing.
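An illustration of the stride constraint described above (a sketch only, not the inductor change itself):

```python
import torch
import torch.nn.functional as F

q = torch.randn(2, 4, 8, 16).transpose(-1, -2)  # shape (2, 4, 16, 8), last-dim stride is 16
print(q.stride())
q = q.contiguous()  # the fused kernels want a last-dim stride of 1, which inductor now enforces
out = F.scaled_dot_product_attention(q, q, q)
```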

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111721
Approved by: https://github.com/drisspg, https://github.com/Chillee, https://github.com/jansel
ghstack dependencies: #111976
2023-10-26 20:21:55 +00:00
Justin Yip
1b702b185e [pytorch-vulkan] disable one zero-dim tensor test to fix test (#112087)
Summary:
D50347338 has a bug on Android (not Mac, not devserver).

This diff disables the test for the time being while I identify the actual cause.

Test Plan:
##  Compile on devserver

```
[yipjustin@129360.od ~/fbsource (e415d865c)]$ buck2 build -c ndk.static_linking=true -c pt.enable_qpl=0  --target-platforms=ovr_config//platform/android:arm32-fbsource //xplat/caffe2:pt_vulkan_api_test_binAndroid  --show-output
File changed: fbcode//caffe2/aten/src/ATen/test/vulkan_api_test.cpp
File changed: fbsource//xplat/caffe2/aten/src/ATen/test/vulkan_api_test.cpp
Buck UI: https://www.internalfb.com/buck2/99d47e63-ed6e-4db9-bee2-24909d647b78
Network: Up: 3.2KiB  Down: 67KiB  (reSessionID-459e359b-773c-48a4-b129-81fde7c5e876)
Jobs completed: 4664. Time elapsed: 7.3s.
Cache hits: 100%. Commands: 38 (cached: 38, remote: 0, local: 0)
BUILD SUCCEEDED
fbsource//xplat/caffe2:pt_vulkan_api_test_binAndroid buck-out/v2/gen/fbsource/f1f3f9bed27e143c/xplat/caffe2/__pt_vulkan_api_test_binAndroid__/pt_vulkan_api_test_binAndroid
```

## Run test.
```
adb shell /data/local/tmp/pt_vulkan_api_test_binAndroid | pastry
```

Result: P864940908
```
...
[       OK ] VulkanAPITest.lstm_success (7 ms)
[ RUN      ] VulkanAPITest.lstm_mclareninputs_success
[       OK ] VulkanAPITest.lstm_mclareninputs_success (56 ms)
[ RUN      ] VulkanAPITest.lstm_prepack_success
[       OK ] VulkanAPITest.lstm_prepack_success (7 ms)
[ RUN      ] VulkanAPITest.querypool_flushed_shader_log
xplat/caffe2/aten/src/ATen/test/vulkan_api_test.cpp:7568: Skipped
QueryPool is not available
[  SKIPPED ] VulkanAPITest.querypool_flushed_shader_log (0 ms)
[----------] 391 tests from VulkanAPITest (30715 ms total)
[----------] Global test environment tear-down
[==========] 391 tests from 1 test suite ran. (30715 ms total)
[  PASSED  ] 390 tests.
[  SKIPPED ] 1 test, listed below:
[  SKIPPED ] VulkanAPITest.querypool_flushed_shader_log
  YOU HAVE 7 DISABLED TESTS

```

Reviewed By: liuk22

Differential Revision: D50668570

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112087
Approved by: https://github.com/izaitsevfb, https://github.com/SS-JIA
2023-10-26 19:48:40 +00:00
Yang Chen
5e5329155e [aotinductor] only include -lc10 for non-fbcode case (#112125)
Summary: otherwise, we would break internal uses

Differential Revision: D50681467

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112125
Approved by: https://github.com/swolchok, https://github.com/desertfire, https://github.com/SherlockNoMad
2023-10-26 19:47:08 +00:00
PyTorch MergeBot
3a284dae30 Revert "Do not materialize entire randperm in RandomSampler (#103339)"
This reverts commit d80174e2db.

Reverted https://github.com/pytorch/pytorch/pull/103339 on behalf of https://github.com/kit1980 due to Cause issues on MPS, and also fails without numpy ([comment](https://github.com/pytorch/pytorch/pull/103339#issuecomment-1781705172))
2023-10-26 18:53:14 +00:00
Thiago Crepaldi
b7affa2ac3 Add unit test for ONNX models with torch.distributions.normal.Normal (#111498)
Fixes #111034
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111498
Approved by: https://github.com/justinchuby, https://github.com/BowenBao
2023-10-26 17:57:34 +00:00
ydwu4
8bc0b382fa [HigherOrderOp] Move map_impl to torch.ops.higher_order (#111404)
The purpose of this PR is as titled. Because of some misuse of ghstack, ghimport, and exporting to GitHub from internal, the stack of https://github.com/pytorch/pytorch/pull/111092 is a mess. I'll try to land them one by one. This is a replacement for #111092 and #111400.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111404
Approved by: https://github.com/tugsbayasgalan, https://github.com/zou3519
2023-10-26 16:59:10 +00:00
Huy Do
f6f81a5969 Update get-workflow-job-id to also return job name (#112103)
Then we can use this job name in `filter-test-configs` if it's available. This addresses the issue in which `filter-test-configs` on GitHub runners (MacOS x86) couldn't find the runner log to get the job name. This is expected because GitHub runners are isolated, so a job should not be able to access runner logs, which could contain information from other jobs.

This unblocks all the missing features that depend on running `filter-test-configs` on GitHub runners:
* Rerunning disabled tests and the memory leak check. For example, this would help avoid closing https://github.com/pytorch/pytorch/issues/110980#issuecomment-1779806466 early, since the disabled test can now run properly on MacOS x86
* MacOS x86 jobs can now be disabled or marked as unstable

I keep the current logic of parsing the log as a fallback because it's working fine on self-hosted runners; that also handles the case where `get-workflow-job-id` fails. I also move the rest of `get-workflow-job-id` up before the test step, as in https://github.com/pytorch/pytorch/pull/111483

### Testing

Spot-checked some jobs to confirm they have the correct names:

* MacOS M1 test job https://github.com/pytorch/pytorch/actions/runs/6648305319/job/18065275722?pr=112103#step:10:8
* MacOS x86 build job https://github.com/pytorch/pytorch/actions/runs/6648306305/job/18065138137?pr=112103#step:9:14
* Linux test job https://github.com/pytorch/pytorch/actions/runs/6648300991/job/18065354503?pr=112103#step:13:7
* Windows test job https://github.com/pytorch/pytorch/actions/runs/6648305319/job/18065599500?pr=112103#step:12:7
* MacOS x86 test job https://github.com/pytorch/pytorch/actions/runs/6648306305/job/18066312801#step:10:8
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112103
Approved by: https://github.com/clee2000
2023-10-26 16:42:46 +00:00
PyTorch MergeBot
485cc0faae Revert "[inductor] benchmark fusion (#108193)"
This reverts commit ec0cdcdf6a.

Reverted https://github.com/pytorch/pytorch/pull/108193 on behalf of https://github.com/ZainRizvi due to This test is breaking trunk. In the future please make sure to add the ciflow/trunk label before force merging any PR to ensure your code doesn't break those tests ([comment](https://github.com/pytorch/pytorch/pull/108193#issuecomment-1781473282))
2023-10-26 16:41:20 +00:00
Edward Z. Yang
7da713bbaf Convert evaluate_expr GuardOnDataDependentSymNode into graph break (#111919)
Extracted this failure from
https://github.com/pytorch/pytorch/pull/110155

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111919
Approved by: https://github.com/lezcano
2023-10-26 16:28:00 +00:00
ydwu4
036abd43b3 [dynamo] Preserve node names in export (#111947)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111947
Approved by: https://github.com/ydwu4, https://github.com/zou3519
2023-10-26 16:11:35 +00:00
angelayi
b126adcdee [aotinductor] Pass TorchIR to AOTInductor (#110020)
Updates `_export.aot_compile` to pass a Torch IR graph to inductor, allowing inductor to run the pre_grad_passes and reuse more of inductor's code.
Also updates the API to only return the `so_path`, not the exported program. The pytree call spec is now serialized and placed inside the generated model code. When calling the model, because there is no C++ pytree implementation linked yet, we can access the call specs through `get_call_spec()` and call pytree flatten/unflatten in Python.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110020
Approved by: https://github.com/desertfire
2023-10-26 15:54:31 +00:00
Evgeni Burovski
ed2cc4dd59 TST: make torch_np added tests dynamo traceable (#112149)
Follow-up to https://github.com/pytorch/pytorch/pull/112146, https://github.com/pytorch/pytorch/pull/112141 and https://github.com/pytorch/pytorch/pull/112147: make the added torch_np tests dynamo traceable

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112149
Approved by: https://github.com/lezcano
2023-10-26 15:36:36 +00:00
Joel Schlosser
42e4c648a2 New @decorateIf decorator for param-specific conditional decoration (#112033)
Adds a new decorator `@decorateIf(decorator, predicate_fn)`. Examples:
```python
from torch.testing._internal.common_utils import decorateIf
...

@decorateIf(unittest.skip, lambda params: params["x"] == 2)
@parametrize("x", range(5))
def test_foo(self, x):
    ...

@parametrize("x,y", [(1, 'foo'), (2, 'bar'), (3, 'baz')])
@decorateIf(
    unittest.expectedFailure,
    lambda params: params["x"] == 3 and params["y"] == "baz"
)
def test_bar(self, x, y):
    ...

@decorateIf(
    unittest.expectedFailure,
    lambda params: params["op"].name == "add" and params["dtype"] == torch.float16
)
@ops(op_db)
def test_op_foo(self, device, dtype, op):
    ...

@decorateIf(
    unittest.skip,
    lambda params: params["module_info"].module_cls is torch.nn.Linear and \
        params["device"] == "cpu"
)
@modules(module_db)
def test_module_foo(self, device, dtype, module_info):
    ...
```

Follow-up for per-param decoration based on https://github.com/pytorch/pytorch/issues/79161#issuecomment-1152487359
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112033
Approved by: https://github.com/clee2000, https://github.com/pmeier
2023-10-26 14:39:59 +00:00
Yang Chen
7671be8108 [aotinductor] allow generating default args in fbcode (#112085)
Summary:
Previously, we wanted to maintain forward compatibility by skipping
default args in the serialized artifacts in fbcode. However, some of our shim
interfaces require default values to be set. Discussed with Sherlock offline,
and we decided to allow serializing default args into the C++ wrapper code
for now. We will refine this part if we see a real FC requirement.

Test Plan: ci

Differential Revision: D50638663

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112085
Approved by: https://github.com/SherlockNoMad
2023-10-26 14:17:54 +00:00
lezcano
c8a5bb451e Do not import sympy within torch._prims_common (#112034)
This is the first of a few PRs that avoid importing SymPy at import time.
The pitch here is that we (almost!) do not have SymPy on our API, so
this should be feasible.

This should speed up torch imports by a good 15%, as per
https://dev-discuss.pytorch.org/t/delving-into-what-happens-when-you-import-torch/1589

In this PR we just move a few global imports into local imports.
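The pattern is roughly the following (an illustrative sketch with made-up function names, not the PR's exact diff):

```python
# Before: module-level import, paid on every `import torch`.
#   import sympy
#   def is_symbolic(x):
#       return isinstance(x, sympy.Basic)

# After: local import, only paid when the symbolic code path actually runs.
def is_symbolic(x):
    import sympy
    return isinstance(x, sympy.Basic)
```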
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112034
Approved by: https://github.com/ezyang
2023-10-26 12:53:25 +00:00
Jon Chuang
d6724a51f9 [dynamo] md5 hash non compile_ignored configs (#111298)
fixes: https://github.com/pytorch/pytorch/issues/111235

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111298
Approved by: https://github.com/ezyang
ghstack dependencies: #111303
2023-10-26 10:59:10 +00:00
Cao E
1c89ea7f72 Add Half support for softmax and log_softmax on CPU (#103315)
Add Half support for softmax and log_softmax on CPU.
Note: This introduces a correctness issue with MPS https://github.com/pytorch/pytorch/issues/111416 and https://github.com/pytorch/pytorch/issues/111479.
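A minimal sketch of the newly supported CPU path (not the PR's own test):

```python
import torch

x = torch.randn(8, dtype=torch.half)   # float16 tensor on CPU
print(torch.softmax(x, dim=0))         # now supported natively in Half
print(torch.log_softmax(x, dim=0))
```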

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103315
Approved by: https://github.com/jgong5, https://github.com/mikaylagawarecki, https://github.com/malfet
2023-10-26 08:38:54 +00:00
dshi7
fbff99ffea Add regex matching to Inductor all2all collective unit tests (#112077)
Fixes #111776

Support `check_regex` in FileCheck() by adding `find_regex` to `struct TORCH_API StringCordView`.
The call site accepts std::regex syntax.

However, I haven't figured out submatch IDs yet.
For example, "buf5[0], buf6_inputs[0]" is still considered a match.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112077
Approved by: https://github.com/yf225
2023-10-26 08:29:30 +00:00
XiaobingSuper
395614c1a4 keep sync bn training flag same with converted bn's training flag (#111998)
When converting BN to SyncBN, we need to keep SyncBN's training flag the same as the original BN's flag. The motivation: the given model may have some BN layers set to training mode and others not, and after converting to SyncBN we do not want to change that behavior.
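A small sketch of the preserved behavior (the layer layout below is illustrative):

```python
import torch.nn as nn

m = nn.Sequential(nn.BatchNorm2d(4), nn.BatchNorm2d(4))
m[1].eval()                                        # mixed training/eval BN layers
sync = nn.SyncBatchNorm.convert_sync_batchnorm(m)
print(sync[0].training, sync[1].training)          # expected: True False
```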

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111998
Approved by: https://github.com/mikaylagawarecki
2023-10-26 08:18:08 +00:00
chilli
e38347f490 Readded device_assert skipping in index and index_put (and also added copy to noop pass) (#112093)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112093
Approved by: https://github.com/oulgen, https://github.com/lezcano
ghstack dependencies: #111990
2023-10-26 07:54:44 +00:00
Jon Chuang
d090c18fca [dynamo] annotate config with @compile_ignored (#111303)
Fixes: #111221

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111303
Approved by: https://github.com/ezyang
2023-10-26 05:41:29 +00:00
Jez Ng
89bd17552d [dynamo] Enable typechecking for funcname_cache.py (#112031)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112031
Approved by: https://github.com/Skylion007
ghstack dependencies: #111894, #111992
2023-10-26 04:54:16 +00:00
Jez Ng
413baa1b25 [dynamo] Enable typechecking for codegen.py (#111992)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111992
Approved by: https://github.com/Skylion007, https://github.com/eellison
ghstack dependencies: #111894
2023-10-26 04:54:16 +00:00
Jez Ng
e67d2c9825 [dynamo] Enable typechecking for allowed_functions.py (#111894)
Motivation: MYPYNOFOLLOW currently typechecks almost all inductor files
and some dynamo files as well. However, it has `follow_imports=skip`
enabled which greatly nerfs its effectiveness. I would like to enable
import following for all the files currently checked by MYPYNOFOLLOW.
But that leads to a lot of new errors in other files.

I can exclude errors from files in other directories, but it is somewhat
difficult to do that for dynamo and inductor files themselves. Thus I am
making sure all the dynamo files typecheck first.

Note on changes: I could not type the return value of
`make_function_id_set` since it was returning a class defined in the
function body. Thus I deleted `make_function_id_set` and replaced it
with a direct construction of the `FunctionIdSet` instead.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111894
Approved by: https://github.com/Skylion007, https://github.com/eellison
2023-10-26 04:54:16 +00:00
Nikita Shulga
b61efe1c2b Fix `torch.[size|stride](dim=None)` invocation (#111991)
Per the documentation, one should be able to explicitly pass the dim argument as None to get the tensor's sizes/strides across all dimensions, but before this change it was incorrectly interpreted as a named-tensor call.

Modify the `size` and `stride` signatures generated by `gen_pyi.py` to highlight that the overload with `None` returns a Tuple, while the one with `dim: _int` returns an `int`.

Add a regression test to validate the behavior, and remove the check for asserts from two named-tensor tests (NamedTensors are dead, aren't they?)
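A quick sketch of the fixed behavior:

```python
import torch

t = torch.randn(2, 3)
print(t.size(dim=None))    # torch.Size([2, 3]), no longer misread as a named-tensor call
print(t.stride(dim=None))  # (3, 1)
print(t.size(dim=1))       # 3
```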

Fixes https://github.com/pytorch/pytorch/issues/111944
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111991
Approved by: https://github.com/zou3519
2023-10-26 04:14:35 +00:00
Shunting Zhang
ec0cdcdf6a [inductor] benchmark fusion (#108193)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108193
Approved by: https://github.com/jansel
2023-10-26 04:14:22 +00:00
Jon Chuang
edafe2ddb9 [dynamo] Be stricter about HigherOrderOperator kwargs (#111938)
kwargs need to be handled carefully in speculate subgraph. We should be clearer about the contract of what the inputs are.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111938
Approved by: https://github.com/zou3519
2023-10-26 03:51:30 +00:00
Brian Hirsh
2aaa7e542c AOTAutograd: avoid intermediate_base logic when all aliased outputs came from a multi_output_view (#111411)
Partially addresses https://github.com/pytorch/pytorch/issues/111081

This fixes the majority of the slowness from https://fb.workplace.com/groups/1405155842844877/permalink/7491314274228973/. In particular, the type of example that suffers the most perf-wise in AOTAutograd looks like this:
```
import torch

@torch.compile
def f(x):
    intermediate = x.mul(2)
    outs = intermediate.unbind(0)
    return *outs,  # return the unbound views as a tuple

x = torch.randn(50, 50, requires_grad=True)
outs = f(x)
sum(outs).sum().backward()
```

There are 50 output tensors in the above function, that all alias each other. AOTAutograd will dutifully exercise its intermediate base [logic](https://github.com/pytorch/pytorch/blob/main/torch/_functorch/aot_autograd.py#L294), and try to regenerate the aliases outside of the compiled `autograd.Function` at runtime, to ensure that the autograd engine is aware of the aliasing.

In this case, this will result in **50 AsStridedBackward nodes in the backward**, because we will fall back to using as_strided to generate each of those 50 outputs. The current PR as is (somewhat unsafely) ensures that the backward graph consists of a single `UnbindBackward`, or a call to `aten.cat()`.

I left a long comment in the code describing the situation, but the core idea is that **autograd does not let you mutate grad_fn of tensor aliases that come from multi-output views**. So if we have `k` outputs that alias each other, but `k-1` of them are aliases that came from multi-output views, then in eager mode, it would not be possible to mutate one of the aliases in a way that would change the grad_fn of any of the other aliases, without causing an error in the backward. So the claim I'm making is that if we hide this aliasing from the autograd engine, then it is impossible for the user to perform any mutations that would cause autograd metadata to diverge between torch.compile and eager in a way that isn't an error in eager mode.

To be fair, I think that taking the approach outlined in https://docs.google.com/document/d/1DlfFq8TKbuAn2zyJxLfoW-X1qkkm5PLdHFtySo03QAk/edit would also help us avoid the as_strided calls in this particularly egregious case, **and** keep the autograd error messages. This relies on both pre-dispatch functionalization being fully hardened **and** adding some pretty invasive changes to AOTAutograd though, and is probably at least several months out.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111411
Approved by: https://github.com/ezyang
2023-10-26 02:54:50 +00:00
Jeff Daily
28c0b07d19 [ROCm] remove HCC references (#111975)
- rename `__HIP_PLATFORM_HCC__` to `__HIP_PLATFORM_AMD__`
- rename `HIP_HCC_FLAGS` to `HIP_CLANG_FLAGS`
- rename `PYTORCH_HIP_HCC_LIBRARIES` to `PYTORCH_HIP_LIBRARIES`
- workaround in tools/amd_build/build_amd.py until submodules are updated

These symbols have had a long deprecation cycle and will finally be removed in ROCm 6.0.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111975
Approved by: https://github.com/ezyang, https://github.com/hongxiayang
2023-10-26 02:39:10 +00:00
Kurt Mohler
f1785373c0 Add torch.utils.deterministic.fill_uninitialized_memory flag (#111377)
Part of #109802
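A sketch of the new flag named in the title (its semantics are described in #109802; the usage below is an assumption, not taken from this PR):

```python
import torch

torch.use_deterministic_algorithms(True)
# The flag (default True) controls whether deterministic mode fills freshly
# allocated memory, e.g. from torch.empty, with a known value:
print(torch.utils.deterministic.fill_uninitialized_memory)
torch.utils.deterministic.fill_uninitialized_memory = False
```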

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111377
Approved by: https://github.com/albanD
2023-10-26 02:39:06 +00:00
Hongtao Yu
7a3a00bb0b [inductor] Remove redundant views (#111773)
As a follow-up to https://github.com/pytorch/pytorch/pull/110740, this patch enables removing redundant complex views to allow more operation fusing.

E.g., given

```
import torch

@torch.compile
def foo(X, Y):
    # X and Y are complex tensors, so each add goes through dtype views (see the generated code below)
    Z = X + Y
    A = X + Y
    return A + Z
```

the generated code is:

```
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 6
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.arange(0, XBLOCK)[:]
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tmp1 = tl.load(in_ptr1 + (x0), xmask)
    tmp2 = tmp0 + tmp1
    tmp3 = tmp2 + tmp2
    tl.store(out_ptr0 + (x0), tmp3, xmask)
''')

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    assert_size_stride(arg0_1, (3, ), (1, ))
    assert_size_stride(arg1_1, (3, ), (1, ))
    with torch.cuda._DeviceGuard(0):
        torch.cuda.set_device(0) # no-op to ensure context
        # Source Nodes: [A], Original ATen: [aten.add]
        buf0 = aten.view.dtype(arg0_1, torch.float32)
        del arg0_1
        buf1 = buf0
        del buf0
        # Source Nodes: [A], Original ATen: [aten.add]
        buf2 = aten.view.dtype(arg1_1, torch.float32)
        del arg1_1
        buf3 = buf2
        del buf2
        buf4 = empty_strided((6, ), (1, ), device='cuda', dtype=torch.float32)
        # Source Nodes: [add_2], Original ATen: [aten.add]
        stream0 = get_cuda_stream(0)
        triton_poi_fused_add_0.run(buf1, buf3, buf4, 6, grid=grid(6), stream=stream0)
        del buf1
        del buf3
        # Source Nodes: [add_2], Original ATen: [aten.add]
        buf5 = aten.view.dtype(buf4, torch.complex64)
        del buf4
        buf6 = buf5
        del buf5
        return (buf6, )
```

whereas previously the generated code was:

```
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 6
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.arange(0, XBLOCK)[:]
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tmp1 = tl.load(in_ptr1 + (x0), xmask)
    tmp2 = tmp0 + tmp1
    tl.store(out_ptr0 + (x0), tmp2, xmask)

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    assert_size_stride(arg0_1, (3, ), (1, ))
    assert_size_stride(arg1_1, (3, ), (1, ))
    with torch.cuda._DeviceGuard(0):
        torch.cuda.set_device(0) # no-op to ensure context
        # Source Nodes: [A], Original ATen: [aten.add]
        buf0 = aten.view.dtype(arg0_1, torch.float32)
        buf1 = buf0
        del buf0
        # Source Nodes: [A], Original ATen: [aten.add]
        buf2 = aten.view.dtype(arg1_1, torch.float32)
        buf3 = buf2
        del buf2
        buf4 = empty_strided((6, ), (1, ), device='cuda', dtype=torch.float32)
        # Source Nodes: [A], Original ATen: [aten.add]
        stream0 = get_cuda_stream(0)
        triton_poi_fused_add_0.run(buf1, buf3, buf4, 6, grid=grid(6), stream=stream0)
        del buf1
        del buf3
        # Source Nodes: [A], Original ATen: [aten.add]
        buf5 = aten.view.dtype(buf4, torch.complex64)
        buf6 = buf5
        del buf5
        # Source Nodes: [add_2], Original ATen: [aten.add]
        buf7 = aten.view.dtype(buf6, torch.float32)
        del buf6
        buf8 = buf7
        del buf7
        # Source Nodes: [Z], Original ATen: [aten.add]
        buf9 = aten.view.dtype(arg0_1, torch.float32)
        del arg0_1
        buf10 = buf9
        del buf9
        # Source Nodes: [Z], Original ATen: [aten.add]
        buf11 = aten.view.dtype(arg1_1, torch.float32)
        del arg1_1
        buf12 = buf11
        del buf11
        buf13 = buf4; del buf4  # reuse
        # Source Nodes: [Z], Original ATen: [aten.add]
        triton_poi_fused_add_0.run(buf10, buf12, buf13, 6, grid=grid(6), stream=stream0)
        del buf10
        del buf12
        # Source Nodes: [Z], Original ATen: [aten.add]
        buf14 = aten.view.dtype(buf13, torch.complex64)
        buf15 = buf14
        del buf14
        # Source Nodes: [add_2], Original ATen: [aten.add]
        buf16 = aten.view.dtype(buf15, torch.float32)
        del buf15
        buf17 = buf16
        del buf16
        buf18 = buf13; del buf13  # reuse
        # Source Nodes: [add_2], Original ATen: [aten.add]
        triton_poi_fused_add_0.run(buf8, buf17, buf18, 6, grid=grid(6), stream=stream0)
        del buf17
        del buf8
        # Source Nodes: [add_2], Original ATen: [aten.add]
        buf19 = aten.view.dtype(buf18, torch.complex64)
        del buf18
        buf20 = buf19
        del buf19
        return (buf20, )
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111773
Approved by: https://github.com/jansel
2023-10-26 02:37:17 +00:00
Zhengxu Chen
64d75f72d4 [fx] Add a faster method for inserting positional argument. (#111974)
Summary:
Traditionally, when a user wants to update the arguments of an FX node, the only way is to call the setter of the .args property on the node. This may be problematic when we insert a lot of arguments: because of the semantics of the setter method, it has worst-case O(n) complexity.

Adding a new insert_arg gives us two benefits:
1. The operation is guaranteed to be O(1) cost.
2. Users can express the intention more directly, instead of writing code like `node.args = (arg,) + node.args`
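A sketch of the new API (the `insert_arg(idx, arg)` signature is assumed from the description above):

```python
import torch
from torch import fx

g = fx.Graph()
x = g.placeholder("x")
y = g.placeholder("y")
add = g.call_function(torch.add, (y,))
add.insert_arg(0, x)                 # O(1); previously: add.args = (x,) + add.args
g.output(add)

gm = fx.GraphModule(torch.nn.Module(), g)
print(gm(torch.ones(2), torch.ones(2)))   # tensor([2., 2.])
```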

Test Plan: caffe2/test:fx -- -r test_insert_arg

Reviewed By: suo

Differential Revision: D50574435

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111974
Approved by: https://github.com/angelayi
2023-10-26 02:30:42 +00:00
Pritam Damania
b29c658265 Cleanup error reporting for ProcessGroupNCCL (#111979)
Continuing some of the work from https://github.com/pytorch/pytorch/pull/108191, I realized that the majority of errors raised from ProcessGroupNCCL were just generic RuntimeErrors.

In this PR, I've added appropriate error types to all the exceptions raised from ProcessGroupNCCL.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111979
Approved by: https://github.com/fduwjj
2023-10-26 01:39:54 +00:00
chilli
74adb4cccc Updated flop counter to accept pytree inputs/outputs (#111990)
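A sketch of the counter used on a module whose forward takes and returns a dict, i.e. the pytree-shaped inputs/outputs this PR is about (the `FlopCounterMode` usage below is an assumption, not taken from this PR):

```python
import torch
from torch.utils.flop_counter import FlopCounterMode

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = torch.nn.Linear(16, 16)

    def forward(self, inputs):                 # pytree (dict) input
        return {"out": self.lin(inputs["x"])}  # pytree (dict) output

m = M()
with FlopCounterMode(m, display=False) as counter:
    m({"x": torch.randn(4, 16)})
print(counter.get_total_flops())
```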
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111990
Approved by: https://github.com/ezyang
2023-10-26 01:25:27 +00:00
PyTorch MergeBot
d641450180 Revert "[cpu][inductor] improve cpu vec implementations of log (#111898)"
This reverts commit b570320364.

Reverted https://github.com/pytorch/pytorch/pull/111898 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/111898#issuecomment-1780263780))
2023-10-26 01:12:19 +00:00