Commit Graph

2074 Commits

Author SHA1 Message Date
Adnan Akhundov
809ff3b274 Add host-side Triton TMA support to Dynamo (#137677)
This adds Dynamo tracing support for the host-side Triton TMA API (see `create_2d_tma_descriptor` calls on the host in the [Triton tutorial](https://triton-lang.org/main/getting-started/tutorials/09-persistent-matmul.html#sphx-glr-getting-started-tutorials-09-persistent-matmul-py)). A few notes:

- Here we assume the availability of the host-side TMA API added to upstream Triton in https://github.com/triton-lang/triton/pull/4498. As of time of writing, this is not a part of the PT2 OSS Triton pin (although back-ported internally). OSS Triton pin update should be done in December 2024.
- To capture the chain of calls `t.data_ptr() --> create_{1d,2d}_tma_descriptor(ptr, ...) --> kernel[grid](tma_desc, ...)`, we add three new variable trackers: `DataPtrVariable`, `CreateTMADescriptorVariable` (for the function), `TMADescriptorVariable` (for TMA descriptor object). This is to maintain the path back from the Triton kernel to the Tensor from which the TMA descriptor has been created.
- The newly introduced variables have `reconstruct` methods used in case of graph breaks.
- The `tma_descriptor_metadata` extracted from the captured `create_{1d,2d}_tma_descriptor` calls is propagated through the HOPs in Dynamo and AOTAutograd to be used by the downstream compiler (e.g., Inductor). See the unit tests for how the captured HOP arguments look like.
- In the Dynamo-captured fx graph, we replace the TMA descriptor arguments of the Triton kernel by the underlying Tensors, to be able to track the input/output relationships in terms of Tensors.
- In the Triton kernel mutation analysis pass (in AOTAutograd), we use the `tt.experimental_descriptor_store` TTIR op to detect mutations of the underlying tensors via TMA descriptors. So that downstream AOTAutograd can perform functionalizations as required.
- JIT Inductor and AOT Inductor support will be implemented in follow-up PRs.

Differential Revision: [D64404928](https://our.internmc.facebook.com/intern/diff/D64404928)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137677
Approved by: https://github.com/zou3519
2024-10-16 02:18:48 +00:00
Isuru Fernando
08ce3aac62 Cache some ValueRanges (#137438)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137438
Approved by: https://github.com/ezyang
2024-10-13 19:23:34 +00:00
xangma
fe8d66d9a6 Faster Faster BatchSampler (#137423)
Builds upon #76951.

Benchmarking code is the same as in #76950.

AMD Ryzen Threadripper PRO 3995WX:
```
  batch_size  drop_last      origin     new  speedup
------------  -----------  --------  ------  ---------
           4  True           0.94    0.5706  64.74%
           4  False          0.9745  0.9468  2.93%
           8  True           0.7423  0.3715  99.82%
           8  False          0.7974  0.5666  40.73%
          64  True           0.5394  0.2085  158.76%
          64  False          0.6083  0.2697  125.51%
         640  True           0.5448  0.1985  174.41%
         640  False          0.7085  0.2308  206.91%
        6400  True           0.5554  0.2028  173.88%
        6400  False          0.7711  0.2109  265.60%
       64000  True           0.556   0.2091  165.82%
       64000  False          0.7803  0.2078  275.58%
```

When `drop_last == True`, it uses `zip` to speed things up.
When `drop_last == False`, it uses `itertools` to speed things up.

`itertools` was the fastest way I could find that deals with the last batch if it is smaller than `batch_size`. I have a pure python method too, but it is slower when `batch_size` is 4 or 8, so I have committed the `itertools` version for now.

Happy to chat further about this change :-) I understand you may not want to introduce the `itertools` package into [sampler.py](https://github.com/pytorch/pytorch/blob/main/torch/utils/data/sampler.py).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137423
Approved by: https://github.com/Skylion007
2024-10-13 09:36:03 +00:00
Xiaodong Wang
eea1f79a1d [AMD] use rccl.h instead of rccl/rccl.h (#135472)
Summary: We hipify NCCLUtils.h from nccl.h to rccl/rccl.h. This follows the format of the rocm rpm suite (the header is in include/rccl/rccl.h), however the source code is just src/rccl.h. Using the rccl/rccl.h will make us find the rpm's header but not the src code's header.

Test Plan:
buck run mode/opt-amd-gpu -c hpc_comms.use_rccl=develop -c fbcode.split-dwarf=True  --config rccl.build_rdma_core=true --config rccl.adhoc_brcm=true  //aps_models/ads/icvr:icvr_launcher -- mode=local_ctr_cvr_cmf_rep_1000x_v1_no_atom   data_loader.dataset.table_ds=[2024-09-04]   data_loader.dataset.batch_size=512  max_ind_range=10

w/o this diff, it'll show 2.18 nccl version

Differential Revision: D62371434

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135472
Approved by: https://github.com/jeffdaily, https://github.com/cenzhaometa
2024-10-10 08:55:57 +00:00
Edward Z. Yang
d9f4a7d3f9 Simplify find_localzeros (#133325)
Instead of doing an N^2 connected thing, only do simplifications for binary max/min, and for very simple situations.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Differential Revision: [D64135230](https://our.internmc.facebook.com/intern/diff/D64135230)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133325
Approved by: https://github.com/albanD
2024-10-10 00:52:50 +00:00
PyTorch MergeBot
16a2c2cfd4 Revert "Introduce torch.sym_sum (#136429)"
This reverts commit 90bed32b98.

Reverted https://github.com/pytorch/pytorch/pull/136429 on behalf of https://github.com/ezyang due to fails internal stuff ([comment](https://github.com/pytorch/pytorch/pull/136429#issuecomment-2403335147))
2024-10-09 20:08:01 +00:00
Bob Ren
36133f39db Tensorify compute on Python scalars (#136674)
Signed-off-by: Bob Ren <bobrenfb.com>

Comandeered from https://github.com/pytorch/pytorch/pull/130228 as I'm helping @ezyang w/ shipping dynamic float arguments in PT2. This starts with supporting torch.ops.aten.mul. I'll stack on top support for other operators in subsequent PRs to keep this scoped to the mechanics of the fx pass.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136674
Approved by: https://github.com/ezyang
2024-10-09 18:51:41 +00:00
Edward Z. Yang
1aac1ffce1 Don't generate implicit value ranges for missing symbols. (#136667)
Instead, callback to a missing handler when needed. This greatly speeds things up with the value ranges dict is large. The missing handler is needed because nested ints don't have VRs, but symbolic sizes involving them occasionally show up in compute.

```
TORCHDYNAMO_EXTENDED_DEBUG_CREATE_SYMBOL="s11" TORCH_LOGS=dynamic PYTORCH_TEST_WITH_DYNAMO=1 python test/test_nestedtensor.py TestNestedTensorAutogradCPU.test_dropout_backward_jagged_cpu
```

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136667
Approved by: https://github.com/isuruf
ghstack dependencies: #136429
2024-10-08 18:12:57 +00:00
Edward Z. Yang
90bed32b98 Introduce torch.sym_sum (#136429)
Partially addresses https://github.com/pytorch/pytorch/issues/128150

When you have big sums of values, we end up computing long chains of
binary addition in our FX graph representation.  Not only is this ugly,
it also is quadratic, as the sympy.Add constructor is O(N) in number
of arguments.  Instead, ensure that we maintain the summation as a
single FX node so we can do the entire addition all in one go.

update_hint_regression benchmark, before and after:

```
update_hint_regression,compile_time_instruction_count,2648328980
update_hint_regression,compile_time_instruction_count,2563748678
```

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136429
Approved by: https://github.com/isuruf
2024-10-08 18:12:57 +00:00
PyTorch MergeBot
7e8dace0de Revert "[ROCm] remove caffe2 from hipify (#137157)"
This reverts commit 40d8260745.

Reverted https://github.com/pytorch/pytorch/pull/137157 on behalf of https://github.com/xw285cornell due to this is breaking internal where we still use caffe2 ([comment](https://github.com/pytorch/pytorch/pull/137157#issuecomment-2400466131))
2024-10-08 17:45:45 +00:00
Jeff Daily
40d8260745 [ROCm] remove caffe2 from hipify (#137157)
- Remove all "MasqueradingAsCUDA" files and classes.
- Do not rename "CUDA" classes to "HIP".

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137157
Approved by: https://github.com/eqy
2024-10-05 12:48:54 +00:00
Michal Gallus
79562f3af8 [ROCm] Modify hipify script to work with Windows paths (#135360)
This change modifies the `hipify_python.py` script to properly detect all directories, `include` and `ignore` paths during hipification process on Windows, by changing the path syntax convention to a UNIX-like one.

Since in many places the script assumes a UNIX-like convention by using paths with forward slashes `/`, I decided to accommodate for it by converting Windows paths to UNIX-like ones. By doing it so, the number of changes to the file is limited. Moreover this early-on unification allows for the rest of the code to have a battle-tested linux-like behaviour.

Another option would be to use `Path` object from `pathlib` to represent all paths in the script, however, it would impact a broader share of a code and would hence require a more meticulous evaluation in terms of non-altered logic and edge cases.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135360
Approved by: https://github.com/jeffdaily, https://github.com/jithunnair-amd
2024-10-04 23:43:43 +00:00
Jeff Daily
c7b0d4b148 raw_alloc ignores PYTORCH_NO_CUDA_MEMORY_CACHING (#131114)
raw_alloc is used by cudnn, miopen, thrust, and tunableop.  Without this PR, the env var for disabling the caching allocator will only partially work.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131114
Approved by: https://github.com/eqy, https://github.com/houseroad, https://github.com/albanD

Co-authored-by: Nichols A. Romero <nick.romero@amd.com>
2024-10-04 15:36:29 +00:00
PyTorch MergeBot
0d1701f310 Revert "raw_alloc ignores PYTORCH_NO_CUDA_MEMORY_CACHING (#131114)"
This reverts commit 7001907480.

Reverted https://github.com/pytorch/pytorch/pull/131114 on behalf of https://github.com/PaliC due to failing internal builds ([comment](https://github.com/pytorch/pytorch/pull/131114#issuecomment-2390615007))
2024-10-03 06:22:55 +00:00
Jeff Daily
7001907480 raw_alloc ignores PYTORCH_NO_CUDA_MEMORY_CACHING (#131114)
raw_alloc is used by cudnn, miopen, thrust, and tunableop.  Without this PR, the env var for disabling the caching allocator will only partially work.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131114
Approved by: https://github.com/eqy, https://github.com/houseroad, https://github.com/albanD

Co-authored-by: Nichols A. Romero <nick.romero@amd.com>
2024-10-02 16:27:15 +00:00
PyTorch MergeBot
7303716005 Revert "Simplify find_localzeros (#133325)"
This reverts commit 99f90c379e.

Reverted https://github.com/pytorch/pytorch/pull/133325 on behalf of https://github.com/ezyang due to https://fb.workplace.com/groups/gpuinference/permalink/2921405651341417/ ([comment](https://github.com/pytorch/pytorch/pull/133325#issuecomment-2385832600))
2024-10-01 13:25:03 +00:00
Edward Z. Yang
cc8f1cddd4 Turn on type-checking in torch.fx.experimental.symbolic_shapes (#136972)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136972
Approved by: https://github.com/Skylion007
ghstack dependencies: #136934, #136935
2024-10-01 13:22:10 +00:00
PyTorch MergeBot
8982906502 Revert "Turn on type-checking in torch.fx.experimental.symbolic_shapes (#136972)"
This reverts commit 3ff2d93d9f.

Reverted https://github.com/pytorch/pytorch/pull/136972 on behalf of https://github.com/ezyang due to need to back out for merge conflict ([comment](https://github.com/pytorch/pytorch/pull/136972#issuecomment-2384182244))
2024-09-30 21:35:08 +00:00
Jez Ng
71aac59e93 Add Triton CPU as an Inductor backend (#133408)
The goal is to use Inductor-generated kernels to stress test the new Triton CPU backend.

Differential Revision: [D63298968](https://our.internmc.facebook.com/intern/diff/D63298968)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133408
Approved by: https://github.com/jansel, https://github.com/blaine-rister, https://github.com/malfet
2024-09-30 20:24:52 +00:00
Edward Z. Yang
3ff2d93d9f Turn on type-checking in torch.fx.experimental.symbolic_shapes (#136972)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136972
Approved by: https://github.com/Skylion007
ghstack dependencies: #136917, #136934, #136935
2024-09-30 18:04:36 +00:00
Edward Z. Yang
99f90c379e Simplify find_localzeros (#133325)
Instead of doing an N^2 connected thing, only do simplifications for binary max/min, and for very simple situations.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133325
Approved by: https://github.com/albanD
2024-09-28 02:38:31 +00:00
albanD
e4571e7025 Add abi flags to cpp_extension cache folder (#136890)
This is to avoid cache confusion between normal vs pydebug vs nogil builds in cpp extensions which can lead to catastrophic ABI issues.
This is rare today for people to run both normal and pydebug on the same machine, but we expect quite a few people will run normal and nogil on the same machine going forward.

This is tested locally by running each version alternatively.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136890
Approved by: https://github.com/colesbury
2024-09-28 00:49:56 +00:00
PyTorch MergeBot
36428f91e9 Revert "Add Triton CPU as an Inductor backend (#133408)"
This reverts commit 31c0467594.

Reverted https://github.com/pytorch/pytorch/pull/133408 on behalf of https://github.com/int3 due to internal tests failing ([comment](https://github.com/pytorch/pytorch/pull/133408#issuecomment-2379692517))
2024-09-27 16:54:27 +00:00
Jez Ng
31c0467594 Add Triton CPU as an Inductor backend (#133408)
The goal is to use Inductor-generated kernels to stress test the new Triton CPU backend.

Differential Revision: [D63298968](https://our.internmc.facebook.com/intern/diff/D63298968)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133408
Approved by: https://github.com/jansel, https://github.com/blaine-rister, https://github.com/malfet
2024-09-26 15:35:26 +00:00
Ramana Sundararaman
be4b7e8131 Param fixes in docstring (#136097)
Fixes wrong param names in docstrings. cc: @kit1980

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136097
Approved by: https://github.com/ezyang
2024-09-21 18:56:34 +00:00
Bob Ren
7f9c06462f fix mypi in utils/_sympy/functions.py (#136339)
Signed-off-by: Bob Ren <bobren@fb.com>

Turns out older versions of python, in particular 3.8 shows errors that 3.12 doesn't. For posterity these are the steps I took to reproduce:

```
conda create -n py38 python=3.8
conda activate py38
pip install -r requirements.txt
lintrunner init
dmypy restart && lintrunner --all-files --take MYPY
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136339
Approved by: https://github.com/Skylion007
ghstack dependencies: #136205
2024-09-20 18:39:16 +00:00
Bob Ren
8d9c42735a Type _sympy/functions.py [1/n] (#136205)
Signed-off-by: Bob Ren <bobren@fb.com>

I was chatting with @jamesjwu about strategies to learn the code and he suggested adding types to some files. This stack of PRs adds types to _sympy/functions.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136205
Approved by: https://github.com/Skylion007, https://github.com/jamesjwu
2024-09-19 17:15:53 +00:00
Igor Sugak
bce52d0b60 [CODEMOD][caffe2] use npt.NDArray instead of np.ndarray in type annotations (#136288)
Summary:
To facilitate PSS-2 upgrade, this uses `ndt.NDArray` instead of `nd.ndarray` in type annotations. In Numpy-1.19 (PSS-1) it's an alias to `nd.ndarray` -- a noop.
In Numpy-1.24, `ndt.NDArray` a proper generic type, and without this change uses of `nd.ndarray` generate this Pyre type error:
```counterexample
 Invalid type parameters [24]: Generic type `np.ndarray` expects 2 type parameters.
```

Test Plan: Sandcastle plus visual inspection

Differential Revision: D62977370

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136288
Approved by: https://github.com/kit1980
2024-09-19 12:40:36 +00:00
Aaron Gokaslan
31715be72a [BE]: Update mypy to 1.11.2 (#133816)
Updates mypy to 1.11.1 to improve type inference

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133816
Approved by: https://github.com/ezyang
2024-09-16 19:44:11 +00:00
PyTorch MergeBot
d0cebedb31 Revert "Add Triton CPU as an Inductor backend (#133408)"
This reverts commit e498b02b47.

Reverted https://github.com/pytorch/pytorch/pull/133408 on behalf of https://github.com/jeanschmidt due to Broke internal signals, see D62737208 for more details ([comment](https://github.com/pytorch/pytorch/pull/133408#issuecomment-2353623816))
2024-09-16 18:33:33 +00:00
PyTorch MergeBot
3117f2cf67 Revert "[BE]: Update mypy to 1.11.2 (#133816)"
This reverts commit 55299cfc22.

Reverted https://github.com/pytorch/pytorch/pull/133816 on behalf of https://github.com/jeanschmidt due to seems to have broken https://github.com/pytorch/pytorch/actions/runs/10865710499/job/30155699792 on main ([comment](https://github.com/pytorch/pytorch/pull/133816#issuecomment-2352377684))
2024-09-16 09:11:16 +00:00
Bob Ren
a5eb43d8b4 Add TensorReferenceAnalysis and some tests (#135886)
Split out and modified from https://github.com/pytorch/pytorch/pull/130228. There were a bunch of subtle bugs eg. sometimes we need to use torch.ops.aten.{operator}.Tensor vs other times using torch.ops.aten.{operator}.default. Or in the case of pow we need to use Tensor_Tensor. I figured it'd be easier to split out adding TensorReferenceAnalysis and add some tests and do the actual integration in a separate diff.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135886
Approved by: https://github.com/ezyang
2024-09-14 23:09:40 +00:00
Jez Ng
e498b02b47 Add Triton CPU as an Inductor backend (#133408)
The goal is to use Inductor-generated kernels to stress test the new Triton CPU backend.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133408
Approved by: https://github.com/jansel
2024-09-14 21:45:19 +00:00
Aaron Gokaslan
55299cfc22 [BE]: Update mypy to 1.11.2 (#133816)
Updates mypy to 1.11.1 to improve type inference

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133816
Approved by: https://github.com/ezyang
2024-09-14 21:40:36 +00:00
Isuru Fernando
8c738c9270 Improve performance of sympy_generic_le (#135622)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135622
Approved by: https://github.com/ezyang
ghstack dependencies: #135621
2024-09-11 16:20:03 +00:00
xinan.lin
67735d1ee8 [Inductor] Generalize is_cuda to specific device_type to make cpp_wrapper mode be extensible (#134693)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134693
Approved by: https://github.com/ezyang, https://github.com/EikanWang, https://github.com/jansel
2024-09-10 10:11:13 +00:00
Michael Lazos
041960a1ce [Dynamo] Automatically in-graph traceable tensor subclass ctors (#135151)
Fixes https://github.com/pytorch/pytorch/issues/114389

Previously, dynamo would attempt to trace through the `__init__` of traceable tensor subclasses, since their constructors are AOT dispatcher traceable by definition, dynamo should automatically put these in the graph like we do for any other tensors. Not doing this is difficult because dynamo would need to apply mutations post tensor subclass creation in the graph.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135151
Approved by: https://github.com/bdhirsh
2024-09-06 12:23:38 +00:00
Avik Chaudhuri
8bfd4916d6 fast path for sympy gcd in floordiv (#134880)
Summary:
Re-implementation of https://github.com/pytorch/pytorch/pull/134150, which was reverted because of some internal tests hanging (case B). The original motivation was to get some other internal test unstuck (case A).

The root cause is that sympy.gcd is both very clever as well as can blow up in some cases. This PR introduces a fast path with an appropriate fallback to sympy.gcd that ensures that both cases A and B go through.

Test Plan:
See the included test for specific examples.
Also https://fb.workplace.com/groups/1075192433118967/posts/1491493248155548/?comment_id=1491938994777640&reply_comment_id=1492622821375924

Differential Revision: D62043315

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134880
Approved by: https://github.com/ezyang
2024-09-04 14:56:49 +00:00
Edward Z. Yang
6c5669903f Fix Invalid NaN comparison due to infinity-zero multiply on latest sympy (#135044)
Fixes https://github.com/pytorch/pytorch/issues/133735

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135044
Approved by: https://github.com/zou3519
2024-09-04 14:13:09 +00:00
Pian Pawakapan
0c7856973b [export] enumerate unsupported sympy.Functions (#134271) (#134598)
Summary:
There's 2 concepts of unsupported sympy.Functions in symbolic_shapes:
1) unsupported by the export solver, meaning the solver doesn't know how to provide useful fixes for those functions
2) unsupported by the sympy interpreter - meaning we can't reify them into FX nodes because the functions aren't present in PythonReferenceAnalysis

This splits the current call into a call for each version, with the Export solver the only user of 1). For 1), we enumerate the functions in _sympy/functions.py, and subtract the functions we know we can support. For 2) there's only 3 functions we've seen pop up in test cases.

cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10

Differential Revision: D61863394

Pulled By: pianpwk

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134598
Approved by: https://github.com/angelayi
2024-08-28 00:34:38 +00:00
PyTorch MergeBot
5b392d22c6 Revert "fix stuck floordiv (#134150)"
This reverts commit 92c4771853.

Reverted https://github.com/pytorch/pytorch/pull/134150 on behalf of https://github.com/anijain2305 due to compile time regression internal ([comment](https://github.com/pytorch/pytorch/pull/134150#issuecomment-2313230404))
2024-08-27 18:23:44 +00:00
PyTorch MergeBot
141a9c7204 Revert "[export] enumerate unsupported sympy.Functions (#134271)"
This reverts commit ddd71e3479.

Reverted https://github.com/pytorch/pytorch/pull/134271 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/134271#issuecomment-2311353460))
2024-08-27 00:45:00 +00:00
Pian Pawakapan
ddd71e3479 [export] enumerate unsupported sympy.Functions (#134271)
There's 2 concepts of unsupported sympy.Functions in symbolic_shapes:
1) unsupported by the export solver, meaning the solver doesn't know how to provide useful fixes for those functions
2) unsupported by the sympy interpreter - meaning we can't reify them into FX nodes because the functions aren't present in PythonReferenceAnalysis

This splits the current call into a call for each version, with the Export solver the only user of 1). For 1), we enumerate the functions in _sympy/functions.py, and subtract the functions we know we can support. For 2) there's only 3 functions we've seen pop up in test cases.

Differential Revision: D61677956

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134271
Approved by: https://github.com/avikchaudhuri
2024-08-26 22:44:12 +00:00
soulitzer
a23dae22d5 Update AC pass use_reentrant message (#134472)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134472
Approved by: https://github.com/albanD
2024-08-26 21:57:38 +00:00
albanD
2588b5e51a Move module_tracker to logging for confused hierarchy (#134467)
Fixes https://github.com/pytorch/pytorch/issues/134242

Make sure to never raise an error when confused. Logs for confusion can be enabled with `TORCH_LOGS="torch.utils.module_tracker"` or the usual python systems.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134467
Approved by: https://github.com/malfet
2024-08-26 19:39:08 +00:00
Avik Chaudhuri
92c4771853 fix stuck floordiv (#134150)
Summary: Fixes https://github.com/pytorch/pytorch/issues/134133

Test Plan:
Tested on the small repro in the linked issue with different lengths N (replacing 100), recording N vs. time taken in nanoseconds:
10 127268319
20 220839662
30 325463125
40 429259441
50 553136055
60 670799769
70 999170514
80 899014103
90 997168902
100 1168202035
110 1388556619
120 1457488235
130 1609816470
140 2177889877
150 1917560313
160 2121096113
170 2428502334
180 4117450755
190 4003068224

So N ~ 200 takes ~5s. Previously even smaller N would go for >1 min.

Didn't add a perf test because ezyang is planning to build a benchmark.

Also tested on https://www.internalfb.com/diff/D61560171, which now gets past the stuck point.

Differential Revision: D61619660

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134150
Approved by: https://github.com/ezyang
2024-08-26 07:27:59 +00:00
Edward Z. Yang
326db8af4c Replace sympy Min/Max with reimplementations (#133319)
Sympy's implementation of Min/Max displays asymptotically bad behavior on `TORCH_COMPILE_CPROFILE=1 python torchrec/distributed/tests/test_pt2_multiprocess.py TestPt2Train.test_compile_multiprocess`. Evidence profile:

![image](https://github.com/user-attachments/assets/142301e9-3a18-4370-b9db-19b32ece7ee8)

On this test case, we spend 42% of all time compiling the network on ShapeEnv.replace, which in turn spends all of its time in xreplace.

The problem appears to be find_localzeros call. By vendoring the implementations of Min/Max, we can potentially reduce the cost of this operation.

The implementation is copy-pasted sympy/functions/elementary/miscellaneous.py but with some adjustments:

* I deleted logic related to differentatiation, evalf and heaviside, as it's not relevant to PyTorch reasoning
* There's some massaging to appease PyTorch's linters, including a lot of noqa and type: ignore (which I could potentially refactor away with substantive changes, but that's better as its own change)
* I deleted the second loop iteration for is_connected, as an attempt at initial optimization (this also simplifies the port, since I can omit some code). I'll comment at that point what the exact difference is.

Before this change, the test in question takes 100s with 40 features; post this change, afterwards, it takes only 69s.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133319
Approved by: https://github.com/Skylion007
2024-08-25 05:05:59 +00:00
Aaron Orenstein
d95aedf5fd [BE] typing for decorators - fx/_compatibility (part 1) (#134202)
Part of #134054.

This corresponds to the pytorch mypy changes from D61493706. Updating takes so
long and touches so many files that it's impossible to land as a whole without conflicting with some other intermediate change.
So landing these 'type: ignore' for pytorch in advance of them actually being needed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134202
Approved by: https://github.com/Skylion007
2024-08-22 17:07:33 +00:00
PyTorch MergeBot
2db28a9611 Revert "[BE]: Update Typeguard to TypeIs for better type inference (#133814)"
This reverts commit bce0caba78.

Reverted https://github.com/pytorch/pytorch/pull/133814 on behalf of https://github.com/ezyang due to root cause of internal failures not addressed ([comment](https://github.com/pytorch/pytorch/pull/133814#issuecomment-2302466444))
2024-08-21 16:13:34 +00:00
blazej-smorawski
585c049fa3 Fix Extension attribute name in CppExtension example (#134046)
Hi! It seems there's a typo in `CppExtension` example. I think it should say `extra_link_args` instead of `extra_link_flags`. Not that I spent a few hours debugging missing kernels inside a library's fatbin or anything :D.

Please see `Extension` definition inside setuptools:
ebddeb36f7/setuptools/_distutils/extension.py (L62)

Thanks!
Błażej

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134046
Approved by: https://github.com/soulitzer
2024-08-21 13:58:16 +00:00