Commit Graph

2605 Commits

Author SHA1 Message Date
Soumith Chintala
8e478d4fb1 Add Alban and Piotr into Core Maintainers (#130903)
See official announcement here: https://dev-discuss.pytorch.org/t/alban-desmaison-and-piotr-bialecki-are-now-pytorch-core-maintainers/2280

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130903
Approved by: https://github.com/albanD, https://github.com/Skylion007
2024-07-20 16:02:42 +00:00
Li-Huai (Allan) Lin
125be005eb [Docs] Fix fake tensor doc (#131205)
Fix this: `# AttributeError: 'FakeTensorMode' object has no attribute 'from_real_tensor'`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131205
Approved by: https://github.com/eellison
2024-07-19 17:59:45 +00:00
Jerry Zhang
793b17ebcb Add numeric_debugger top level APIs (#130643)
Summary:
Add three top level APIs for numeric debugger in pt2e flow that can log intermediate output in the model
and calculate summary for metric comparisons between nodes in two graphs

* `prepare_for_propagation_comparison`
* `extract_results_from_loggers`
* `compare_results`

Test Plan:
python test/test_quantization.py -k test_prepare_for_propagation_comparison
python test/test_quantization.py -k test_extract_results_from_loggers

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130643
Approved by: https://github.com/dulinriley, https://github.com/tarun292
2024-07-18 20:54:18 +00:00
redwrasse
63a0a65df9 Define 'zero-preserving unary functions' in docs (#130804)
Make explicit the definition of 'zero-preserving unary functions' in the sparse tensors documentation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130804
Approved by: https://github.com/soulitzer
2024-07-18 13:30:29 +00:00
drisspg
2b43d339fe Make FlexAttention API public (#130755)
# Summary

Makes the prototype API flex_attention public

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130755
Approved by: https://github.com/Chillee
2024-07-16 16:21:25 +00:00
Xuehai Pan
a3abfa5cb5 [BE][Easy][1/19] enforce style for empty lines in import segments (#129752)
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter.

You can review these PRs via:

```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129752
Approved by: https://github.com/ezyang, https://github.com/malfet
2024-07-16 00:42:56 +00:00
Jerry Zhang
b893aa71ca Rename generate_numeric_debug_handle to numeric_debugger (#130590)
Summary:
att

Test Plan:
CI

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130590
Approved by: https://github.com/dulinriley, https://github.com/tarun292
2024-07-15 22:42:27 +00:00
Yu, Guangye
7cd48df2da Refine the logic of device construction when only device index is given (#129119)
# Motivation
Before this PR, device construction was `cuda` type when only a device index was given. It also returns the `PrivateUser1` type if a `PrivateUser1` type is registered.
```bash
>>> import torch
>>> device = torch.device(0)
>>> device.type
'cuda'
>>> a = torch.tensor([1, 2])
>>> b = a.to(0)
>>> b
tensor([1, 2], device='cuda:0')
```
It works well on CUDA GPU. But it will raise unexpected information and error running on XPU.
```bash
>>> import torch
>>> device = torch.device(0)
>>> device.type
'cuda'
>>> a = torch.tensor([1, 2])
>>> b = a.to(0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/xxx/pytorch/torch/cuda/__init__.py", line 302, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
```
With this PR, refine the logic to use the currently available device type instead.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129119
Approved by: https://github.com/albanD, https://github.com/gujinghui, https://github.com/EikanWang
ghstack dependencies: #129463, #129205, #129363
2024-07-15 14:34:29 +00:00
Yu, Guangye
9cae2160f5 Introduce the concept of Accelerators to PyTorch doc (#129363)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129363
Approved by: https://github.com/EikanWang, https://github.com/gujinghui, https://github.com/albanD
ghstack dependencies: #129463, #129205
2024-07-15 14:24:46 +00:00
Mikayla Gawarecki
7c289c2a5c Add torch.serialization.safe_globals context manager (#127939)
Add context manager mentioned in https://github.com/pytorch/pytorch/pull/127808#pullrequestreview-2096298486

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127939
Approved by: https://github.com/albanD
2024-07-12 20:38:43 +00:00
rzou
9c69684af8 [custom_ops] expose torch.library.register_torch_dispatch (#130261)
This is the API for defining the interaction between a torch_dispatch
class and a custom op. Taking API bikeshedding.

Test Plan:
- new tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130261
Approved by: https://github.com/albanD
ghstack dependencies: #130064
2024-07-12 14:13:01 +00:00
Shangdi Yu
fb9bc6d74a [custom op] add doc for CustomOpDef.set_kernel_enabled (#130406)
<img width="1067" alt="Screenshot 2024-07-09 at 6 14 55 PM" src="https://github.com/pytorch/pytorch/assets/22356083/941751f8-8e12-43cb-8477-c739476e0096">
<img width="965" alt="Screenshot 2024-07-09 at 6 14 59 PM" src="https://github.com/pytorch/pytorch/assets/22356083/aa9be099-f26c-45a3-8a14-742a2bb7c28b">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130406
Approved by: https://github.com/zou3519
2024-07-11 15:47:35 +00:00
Shangdi Yu
a4576dad34 [reland][custom ops] infer schema (#130079)
Fixes #129617

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130079
Approved by: https://github.com/zou3519
2024-07-11 03:39:07 +00:00
PyTorch MergeBot
86bca69c5f Revert "[custom_ops] expose torch.library.register_torch_dispatch (#130261)"
This reverts commit bb9a73f767.

Reverted https://github.com/pytorch/pytorch/pull/130261 on behalf of https://github.com/izaitsevfb due to depends on #130064 which needs to be reverted ([comment](https://github.com/pytorch/pytorch/pull/130261#issuecomment-2221569707))
2024-07-10 21:43:28 +00:00
PyTorch MergeBot
e14a0f45ed Revert "[reland][custom ops] infer schema (#130079)"
This reverts commit bef085bdfa.

Reverted https://github.com/pytorch/pytorch/pull/130079 on behalf of https://github.com/izaitsevfb due to depends on #130064 which needs to be reverted ([comment](https://github.com/pytorch/pytorch/pull/130079#issuecomment-2221561483))
2024-07-10 21:40:16 +00:00
Shangdi Yu
bef085bdfa [reland][custom ops] infer schema (#130079)
Fixes #129617

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130079
Approved by: https://github.com/zou3519
2024-07-10 16:18:36 +00:00
rzou
bb9a73f767 [custom_ops] expose torch.library.register_torch_dispatch (#130261)
This is the API for defining the interaction between a torch_dispatch
class and a custom op. Taking API bikeshedding.

Test Plan:
- new tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130261
Approved by: https://github.com/albanD
ghstack dependencies: #130064
2024-07-09 21:11:27 +00:00
Yuanhao Ji
312652c325 [RFC] Add support for device extension autoloading (#127074)
Fixes #122468

- Load device extensions at the end of `torch/__init__.py`
- Enabled by default, or you can disable it with `TORCH_DEVICE_BACKEND_AUTOLOAD=0`

run test:

```python
python test/run_test.py -i test_autoload_enable
python test/run_test.py -i test_autoload_disable
```

doc:

https://docs-preview.pytorch.org/pytorch/pytorch/127074/miscellaneous_environment_variables.html

co-author:  @jgong5 @bsochack @bkowalskiINTEL @jczaja @FFFrog @hipudding

Co-authored-by: albanD <desmaison.alban@gmail.com>
Co-authored-by: Jiong Gong <jiong.gong@intel.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127074
Approved by: https://github.com/albanD, https://github.com/jgong5
2024-07-09 06:14:13 +00:00
PyTorch MergeBot
44a773c121 Revert "[custom ops] infer schema (#130079)"
This reverts commit 3fe324ffb6.

Reverted https://github.com/pytorch/pytorch/pull/130079 on behalf of https://github.com/huydhn due to The test_public_bindings failure looks legit 3fe324ffb6 ([comment](https://github.com/pytorch/pytorch/pull/130079#issuecomment-2215420957))
2024-07-08 22:02:29 +00:00
Shangdi Yu
3fe324ffb6 [custom ops] infer schema (#130079)
Fixes #129617

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130079
Approved by: https://github.com/zou3519
2024-07-08 20:46:23 +00:00
Kurt Mohler
e590168865 Enable sharing meta tensors between processes (#129520)
Fixes #129436

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129520
Approved by: https://github.com/ezyang
2024-07-04 20:29:48 +00:00
Li-Huai (Allan) Lin
42f3d7e948 [MPS] Add mps profiler env vars to docs (#129552)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129552
Approved by: https://github.com/malfet
ghstack dependencies: #129451
2024-07-04 06:44:48 +00:00
Zhengxu Chen
042d764872 [export] Update example inputs format for DB. (#129982)
Summary: To give user a simpler example code, we are getting rid of ExportArgs in favor of example_args and example_kwargs.

Test Plan: CI

Differential Revision: D59288920

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129982
Approved by: https://github.com/angelayi
2024-07-03 17:53:15 +00:00
Edward Z. Yang
29c68df600 Stop immediately specializing common constants 0/1 for plain int (#128327)
Fixes https://github.com/pytorch/pytorch/issues/128319

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128327
Approved by: https://github.com/lezcano
ghstack dependencies: #129983
2024-07-03 16:41:51 +00:00
Howard Huang
4eb449f7dc [pipelining] add small logging section to docs (#129368)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129368
Approved by: https://github.com/wconstab
2024-07-02 18:19:28 +00:00
Haoci Zhang
1ad683033b Implemented flexible PP schedule (#129597)
Enabled some cases to work where num_microbatches % pp_size != 0. Using the flex_pp schedule, we will have

num_rounds = max(1, n_microbatches // pp_group_size) and it works as long as n_microbatches % num_rounds is 0. As a few examples, support

pp_group_size = 4, n_microbatches = 10. We will have num_rounds = 2 and n_microbatches % 2 is 0.
pp_group_size = 4, n_microbatches = 3. We will have num_rounds = 1 and n_microbatches % 1 is 0.

Moved over from PiPPy (https://github.com/pytorch/PiPPy/pull/1129)

Tested using the config in (1), schedule looks like the following graph:

```
=========== ALL_RANK_ACTIONS ===========
         Rank 0  Rank 1  Rank 2  Rank 3
Step 00: F0_s0   None    None    None
Step 01: F1_s0   F0_s1   None    None
Step 02: F2_s0   F1_s1   F0_s2   None
Step 03: F3_s0   F2_s1   F1_s2   F0_s3
Step 04: F4_s0   F3_s1   F2_s2   F1_s3
Step 05: F0_s4   F4_s1   F3_s2   F2_s3
Step 06: F1_s4   F0_s5   F4_s2   F3_s3
Step 07: F2_s4   F1_s5   F0_s6   F4_s3
Step 08: F3_s4   F2_s5   F1_s6   F0_s7
Step 09: F4_s4   F3_s5   None    B0_s7
Step 10: F5_s0   None    F2_s6   F1_s7
Step 11: None    None    B0_s6   B1_s7
Step 12: None    F4_s5   F3_s6   F2_s7
Step 13: None    B0_s5   B1_s6   B2_s7
Step 14: F6_s0   F5_s1   F4_s6   F3_s7
Step 15: B0_s4   B1_s5   B2_s6   B3_s7
Step 16: F7_s0   F6_s1   F5_s2   F4_s7
Step 17: B1_s4   B2_s5   B3_s6   B4_s7
Step 18: F8_s0   F7_s1   F6_s2   F5_s3
Step 19: B2_s4   B3_s5   B4_s6   B0_s3
Step 20: F9_s0   F8_s1   F7_s2   F6_s3
Step 21: B3_s4   B4_s5   B0_s2   B1_s3
Step 22: F5_s4   F9_s1   F8_s2   F7_s3
Step 23: B4_s4   B0_s1   B1_s2   B2_s3
Step 24: F6_s4   F5_s5   F9_s2   F8_s3
Step 25: B0_s0   B1_s1   B2_s2   B3_s3
Step 26: F7_s4   F6_s5   F5_s6   F9_s3
Step 27: B1_s0   B2_s1   B3_s2   B4_s3
Step 28: F8_s4   F7_s5   F6_s6   F5_s7
Step 29: B2_s0   B3_s1   B4_s2   B5_s7
Step 30: F9_s4   F8_s5   F7_s6   F6_s7
Step 31: B3_s0   B4_s1   B5_s6   B6_s7
Step 32: None    F9_s5   F8_s6   F7_s7
Step 33: B4_s0   B5_s5   B6_s6   B7_s7
Step 34: None    None    F9_s6   F8_s7
Step 35: B5_s4   B6_s5   B7_s6   B8_s7
Step 36: None    None    None    F9_s7
Step 37: B6_s4   B7_s5   B8_s6   B9_s7
Step 38: None    None    None    None
Step 39: B7_s4   B8_s5   B9_s6   B5_s3
Step 40: None    None    None    None
Step 41: B8_s4   B9_s5   B5_s2   B6_s3
Step 42: None    None    None    None
Step 43: B9_s4   B5_s1   B6_s2   B7_s3
Step 44: None    None    None    None
Step 45: B5_s0   B6_s1   B7_s2   B8_s3
Step 46: None    None    None    None
Step 47: B6_s0   B7_s1   B8_s2   B9_s3
Step 48: None    None    None
Step 49: B7_s0   B8_s1   B9_s2
Step 50: None    None
Step 51: B8_s0   B9_s1
Step 52: None
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129597
Approved by: https://github.com/H-Huang
2024-07-02 07:54:38 +00:00
eqy
f845a7a91a [cuDNN][SDPA] Remove TORCH_CUDNN_SDPA_ENABLED=1, enable cuDNN SDPA by default on H100 and 2nd on other archs >= sm80 (#125343)
Looks like one of the first failures seen is `test_causal_variants_compile_causal_variant_CausalVariant_LOWER_RIGHT_shape0_cuda` when `test_causal_variants_causal_variant_CausalVariant_LOWER_RIGHT_shape0_cuda` passes.

What seems interesting here is that the `torch.compile` version fails while the eager version passes. Not sure what the difference would be here...

Nevertheless, is there a recommended mechanism to skip cuDNN SDPA as a backend for this test? CC @drisspg

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125343
Approved by: https://github.com/Skylion007
2024-06-30 19:22:16 +00:00
PyTorch MergeBot
3d96217891 Revert "[BE][Easy] use pathlib.Path instead of dirname / ".." / pardir (#129374)"
This reverts commit 9e1f3ecaa7.

Reverted https://github.com/pytorch/pytorch/pull/129374 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is still failing with the same error ([comment](https://github.com/pytorch/pytorch/pull/129374#issuecomment-2197801405))
2024-06-29 00:47:15 +00:00
PyTorch MergeBot
999eec8dea Revert "[cuDNN][SDPA] Remove TORCH_CUDNN_SDPA_ENABLED=1, enable cuDNN SDPA by default on H100 and 2nd on other archs >= sm80 (#125343)"
This reverts commit b7e7a4cb01.

Reverted https://github.com/pytorch/pytorch/pull/125343 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it seems to break some test_transformer running on internal A100 and V100 ([comment](https://github.com/pytorch/pytorch/pull/125343#issuecomment-2196202003))
2024-06-28 06:03:54 +00:00
Xuehai Pan
9e1f3ecaa7 [BE][Easy] use pathlib.Path instead of dirname / ".." / pardir (#129374)
Changes by apply order:

1. Replace all `".."` and `os.pardir` usage with `os.path.dirname(...)`.
2. Replace nested `os.path.dirname(os.path.dirname(...))` call with `str(Path(...).parent.parent)`.
3. Reorder `.absolute()` ~/ `.resolve()`~ and `.parent`: always resolve the path first.

    `.parent{...}.absolute()` -> `.absolute().parent{...}`

4. Replace chained `.parent x N` with `.parents[${N - 1}]`: the code is easier to read (see 5.)

    `.parent.parent.parent.parent` -> `.parents[3]`

5. ~Replace `.parents[${N - 1}]` with `.parents[${N} - 1]`: the code is easier to read and does not introduce any runtime overhead.~

    ~`.parents[3]` -> `.parents[4 - 1]`~

6. ~Replace `.parents[2 - 1]` with `.parent.parent`: because the code is shorter and easier to read.~

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129374
Approved by: https://github.com/justinchuby, https://github.com/malfet
2024-06-28 00:35:15 +00:00
Li-Huai (Allan) Lin
84ad5452f6 [MPS] Fused SGD optimizer (#129350)
```
[-------------------------------------- Fused SGD --------------------------------------]
                                                          |  Fused: True  |  Fused: False
1 threads: ------------------------------------------------------------------------------
      numel: 1024, num_tensors: 100, momentum: True       |        2      |       15
      numel: 1024, num_tensors: 100, momentum: False      |        2      |        5
      numel: 65536, num_tensors: 100, momentum: True      |        3      |       16
      numel: 65536, num_tensors: 100, momentum: False     |        2      |        5
      numel: 1048576, num_tensors: 100, momentum: True    |       11      |       16
      numel: 1048576, num_tensors: 100, momentum: False   |        8      |        6
      numel: 1024, num_tensors: 500, momentum: True       |       29      |       70
      numel: 1024, num_tensors: 500, momentum: False      |       20      |       24
      numel: 65536, num_tensors: 500, momentum: True      |       33      |       76
      numel: 65536, num_tensors: 500, momentum: False     |       22      |       26
      numel: 1048576, num_tensors: 500, momentum: True    |       70      |       80
      numel: 1048576, num_tensors: 500, momentum: False   |       43      |       40
      numel: 1024, num_tensors: 1000, momentum: True      |      108      |      139
      numel: 1024, num_tensors: 1000, momentum: False     |       72      |       48
      numel: 65536, num_tensors: 1000, momentum: True     |      116      |      150
      numel: 65536, num_tensors: 1000, momentum: False    |       77      |       52
      numel: 1048576, num_tensors: 1000, momentum: True   |      190      |      170
      numel: 1048576, num_tensors: 1000, momentum: False  |      120      |       50
```

```python
def profile_fused_sgd():
    from torch.optim.sgd import sgd
    import torch.utils.benchmark as benchmark

    import itertools

    def profile(fn, params, grads, momentum_buffer_list, fused):
        fn(
            params,
            grads,
            momentum_buffer_list,
            momentum=True if len(momentum_buffer_list) > 0 else False,
            dampening=0.0,
            nesterov=False,
            foreach=False,
            fused=fused,
            lr=1e-3,
            weight_decay=.0,
            maximize=False,
            grad_scale=None,
            found_inf=None,
        )
        torch.mps.synchronize()

    device = "mps"

    results = []

    for num_tensors, numel, momentum in itertools.product([100, 500, 1000], [1024, 65536, 1048576], [True, False]):
        sublabel = f"numel: {numel}, num_tensors: {num_tensors}, momentum: {momentum}"
        print(sublabel)
        params, grads = [[torch.arange(numel, dtype=torch.float32, device=device) + (numel * i) for i in range(num_tensors)] for _ in range(2)]
        momentum_buffer_list = [torch.arange(numel, dtype=torch.float32, device=device) + (numel * i) for i in range(num_tensors)] if momentum else []
        fn = sgd

        for fused in [True, False]:

            t = benchmark.Timer(
                    stmt='profile(fn, params, grads, momentum_buffer_list, fused)',
                    label='Fused SGD',
                    sub_label=sublabel,
                    globals=locals(),
                    description= f"Fused: {fused}",
                ).blocked_autorange(min_run_time=5)
            results.append(t)

    compare = benchmark.Compare(results)
    compare.trim_significant_figures()
    compare.colorize(rowwise=True)
    compare.print()
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129350
Approved by: https://github.com/janeyx99
ghstack dependencies: #129006, #129008, #129007, #129105
2024-06-27 04:37:14 +00:00
PyTorch MergeBot
895316119d Revert "[BE][Easy] use pathlib.Path instead of dirname / ".." / pardir (#129374)"
This reverts commit 0314c4c101.

Reverted https://github.com/pytorch/pytorch/pull/129374 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it causes lots of internal build failures where they fail to find hipify module ([comment](https://github.com/pytorch/pytorch/pull/129374#issuecomment-2192437052))
2024-06-26 19:03:57 +00:00
Shangdi Yu
cca85c96cd [export] minor typo fix (#129543)
Fixes a typo in torch.export doc.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129543
Approved by: https://github.com/angelayi
2024-06-26 18:35:31 +00:00
Eddie Yan
b7e7a4cb01 [cuDNN][SDPA] Remove TORCH_CUDNN_SDPA_ENABLED=1, enable cuDNN SDPA by default on H100 and 2nd on other archs >= sm80 (#125343)
Looks like one of the first failures seen is `test_causal_variants_compile_causal_variant_CausalVariant_LOWER_RIGHT_shape0_cuda` when `test_causal_variants_causal_variant_CausalVariant_LOWER_RIGHT_shape0_cuda` passes.

What seems interesting here is that the `torch.compile` version fails while the eager version passes. Not sure what the difference would be here...

Nevertheless, is there a recommended mechanism to skip cuDNN SDPA as a backend for this test? CC @drisspg

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125343
Approved by: https://github.com/Skylion007
2024-06-26 00:49:18 +00:00
Zhengxu Chen
e58ef5b65f [export] Rewrite exportdb formatting. (#129260)
Summary: It'll be easier to generate examples if the code doesn't depend on exportdb library.

Test Plan: CI

Differential Revision: D58886554

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129260
Approved by: https://github.com/tugsbayasgalan
2024-06-25 21:04:53 +00:00
Li-Huai (Allan) Lin
71ebe5121a [MPS] Fast math env var (#129007)
Allow users to decide whether they want to have fast math enabled via env var
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129007
Approved by: https://github.com/malfet
ghstack dependencies: #129006, #129008
2024-06-25 13:52:07 +00:00
Xuehai Pan
0314c4c101 [BE][Easy] use pathlib.Path instead of dirname / ".." / pardir (#129374)
Changes by apply order:

1. Replace all `".."` and `os.pardir` usage with `os.path.dirname(...)`.
2. Replace nested `os.path.dirname(os.path.dirname(...))` call with `str(Path(...).parent.parent)`.
3. Reorder `.absolute()` ~/ `.resolve()`~ and `.parent`: always resolve the path first.

    `.parent{...}.absolute()` -> `.absolute().parent{...}`

4. Replace chained `.parent x N` with `.parents[${N - 1}]`: the code is easier to read (see 5.)

    `.parent.parent.parent.parent` -> `.parents[3]`

5. ~Replace `.parents[${N - 1}]` with `.parents[${N} - 1]`: the code is easier to read and does not introduce any runtime overhead.~

    ~`.parents[3]` -> `.parents[4 - 1]`~

6. ~Replace `.parents[2 - 1]` with `.parent.parent`: because the code is shorter and easier to read.~

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129374
Approved by: https://github.com/justinchuby, https://github.com/malfet
2024-06-25 08:28:38 +00:00
Will Constable
2f8b301c32 Clean up distributed/CONTRIBUTING.md (#128450)
Click [here](cf6c88af48/torch/distributed/CONTRIBUTING.md) to see the rendered version of the file in this PR

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128450
Approved by: https://github.com/wanchaol
2024-06-22 02:41:22 +00:00
rzou
311fadb1fb [docs] Redirect custom ops landing page to the correct place (#129177)
I'm moving it to pytorch/tutorials
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129177
Approved by: https://github.com/albanD
2024-06-21 13:31:32 +00:00
cyy
5c676bb8b3 Remove Caffe2 handling from onnx_unpack_quantized_weights (#129021)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129021
Approved by: https://github.com/justinchuby, https://github.com/albanD
2024-06-21 06:16:44 +00:00
Jing Xu
5fba5d83f0 add xpu for amp (#127276)
As support for Intel GPU has been upstreamed, this PR is to add the XPU-related contents to AMP doc.

Co-authored-by: Yu, Guangye <guangye.yu@intel.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127276
Approved by: https://github.com/dvrogozh, https://github.com/albanD, https://github.com/malfet
2024-06-20 21:49:35 +00:00
Zhengxu Chen
65286883d4 [export] reland "experimental joint graph API." (#129081)
Summary: previous diff got reverted despite CI was green.

Test Plan: CI

Differential Revision: D58790048

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129081
Approved by: https://github.com/tugsbayasgalan
2024-06-20 16:50:53 +00:00
Oguz Ulgen
54b0006cb2 Evaluate symexprs on load path of cache not write (#128997)
When caching is enabled, an internal model fails with
```
assert_size_stride(bmm_9, (17, s0, 512), (54784, 512, 1))
AssertionError: expected size 17==17, stride 57344==54784 at dim=0
```
looking at this model, the exact problem is when the cache is hit on the forward graph, the generated code for backward fails since the strides of the outputs of forward, passed to backward as inputs, are not what we expected.

This PR changes the evaluation logic so that we defer evaluation of output stride exprs to load path as opposed to eagerly doing it on save path.

I have not been able to come up with a unit test repro for this problem.

Differential Revision: [D58796503](https://our.internmc.facebook.com/intern/diff/D58796503)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128997
Approved by: https://github.com/ezyang
2024-06-20 08:55:12 +00:00
Li-Huai (Allan) Lin
19f3abcde4 [Docs][MPS] Add mps environment variable table (#129008)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129008
Approved by: https://github.com/malfet
ghstack dependencies: #129006
2024-06-20 03:30:35 +00:00
PyTorch MergeBot
df94d57c0a Revert "[export] experimental joint graph API. (#128847)"
This reverts commit 0707811286.

Reverted https://github.com/pytorch/pytorch/pull/128847 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/128847#issuecomment-2179326891))
2024-06-19 19:04:36 +00:00
Zhengxu Chen
0707811286 [export] experimental joint graph API. (#128847)
Summary:
WARNING: This API is highly unstable and will be subject to change in the future.

Add a protoype to "decompose" an ExportedProgram into a joint graph form, so that we can compute the gradients on this graph.

Test Plan: buck test mode/opt caffe2/torch/fb/export:test_experimental

Differential Revision: D55657917

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128847
Approved by: https://github.com/tugsbayasgalan
2024-06-19 16:45:27 +00:00
Li-Huai (Allan) Lin
0fc603ece4 [optim] Fused implementation stability table (#129006)
I'd like to discuss the criteria that we regard an implementation as stable. If there is no existing standard, my initial proposal would be a 6 month period after the commit to regard it as stable. As a result, now Adam and AdamW on CUDA would be considered as stable, while the rest are of beta.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129006
Approved by: https://github.com/malfet
2024-06-19 16:29:49 +00:00
soulitzer
1877b7896c [checkpoint] Clean up selective activation checkpoint and make public (#125795)
### bc-breaking for existing users of the private API:
- Existing policy functions must now change their return value to be [CheckpointPolicy](c0b40ab42e/torch/utils/checkpoint.py (L1204-L1230))  Enum instead of bool.
   - To restore previous behavior, return `PREFER_RECOMPUTE` instead of `False` and `{PREFER,MUST}_SAVE` instead of `True` depending whether you prefer the compiler to override your policy.
- Policy function now accepts a `ctx` object instead of `mode` for its first argument.
   - To restore previous behavior, `mode = "recompute" if ctx.is_recompute else "forward"`.
- Existing calls to `_pt2_selective_checkpoint_context_fn_gen` must be renamed to `create_selective_checkpoint_contexts `. The way you use the API remains the same. It would've been nice to do something different (not make the user have to use functools.partial?), but this was the easiest to compile (idk if this should actually be a constraint).

Related doc: https://docs.google.com/document/d/1BKyizkZPdri9mHqdDOLAUpkI7SbbKfLHRFVVpK9ZWqo/edit

Memory considerations:
- As with the existing SAC, cached values are cleared upon first use.
- We error if the user wishes to backward a second time on a region forwarded with SAC enabled.

In-place:
- We use version counting to enforce that if any cached tensor has been mutated. In-place operations not mutating cached tensors are allowed.
- `allow_cache_entry_mutation=True` can be passed to disable this check (useful in the case of auto AC where the user is cleverly also saves the output of the in-place)

Randomness, views
- Currently in this PR, we don't do anything special for randomness or views, the author of the policy function is expected to handle them properly. (Would it would be beneficial to error? - we either want to save all or recompute all random tensors)

Tensor object preservation
- ~We guarantee that if a tensor does not requires grad, and it is saved, then what you get out is the same tensor object.~ UPDATE: We guarantee that if a tensor is of non-differentiable dtype AND it is not a view, and it is saved, then what you get out is the same tensor object. This is a nice guarantee for nested tensors which care about the object identity of of the offsets tensor.

Policy function
- Enum values are `{MUST,PREFER}_{SAVE,RECOMPUTE}` (bikeshed welcome). Alternatively there was `{SAVE,RECOMPUTE}_{NON_,}OVERRIDABLE`. The former was preferred bc it seemed clearer that two `MUST` clashing should error, versus it is ambiguous whether two `NON_OVERRIDABLE` being stacked should silently ignore or error.
- The usage of Enum today. There actually is NO API to stack SAC policies today. The only thing the Enum should matter for in the near term is the compiler. The stacking SAC policy would be useful if someone wants to implement something like simple FSDP, but it is not perfect because with a policy of `PREFER_SAVE` you are actually saving more than autograd would save normally (would be fixed with AC v3).
- The number of times we call the policy_fn is something that should be documented as part of public API. We call the policy function for all ops except ~~detach~~ UPDATE :  metadata ops listed in `torch.utils.checkpoint.SAC_IGNORED_OPS`) because these ops may be called a different number of times by AC itself between forward and recompute.
- The policy function can be a stateful object (we do NOT make separate copies of this object for forward/recompute, the user is expected to handle that via is_recompute see below).
Tensors guaranteed to be the same tensor as-is
- Policy function signature takes ctx object as its first argument. The ctx function is an object encapsulating info that may be useful to the user, it currently only holds "is_recompute". Adding this indirection gives us flexibility to add more attrs later if necessary.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125795
Approved by: https://github.com/Chillee, https://github.com/fmassa
2024-06-18 18:18:50 +00:00
Boyuan Feng
43998711a7 [CUDAGraph] add more docs for cudagraph trees (#127963)
This PR adds more documentation for CUDAGraph Trees, including
- Iteration Support
- Input Mutation Support
- Dynamic Shape Support
- NCCL Support
- Reasons for Skipping CUDAGraph

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127963
Approved by: https://github.com/eellison
2024-06-18 02:07:07 +00:00
ibartol
c6b180a316 Created docs (and example) for cudart function in torch.cuda (#128741)
Fixes #127908

## Description

Created docs to document the torch.cuda.cudart function to solve the issue #127908.
I tried to stick to the [guidelines to document a function](https://github.com/pytorch/pytorch/wiki/Docstring-Guidelines#documenting-a-function) but I was not sure if there is a consensus on how to handle the docs of a function that calls an internal function. So I went ahead and tried what the function will raise, etc. from the user endpoint and documented it (i.e. I am giving what actually _lazy_init() will raise).

Updated PR from #128298 since I made quite a big mistake in my branch. I apologize for the newbie mistake.

### Summary of Changes

- Added docs for torch.cuda.cudart
- Added the cudart function in the autosummary of docs/source/cuda.rst

## Checklist
- [X] The issue that is being fixed is referred in the description
- [X] Only one issue is addressed in this pull request
- [X] Labels from the issue that this PR is fixing are added to this pull request
- [X] No unnecesary issues are included into this pull request

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128741
Approved by: https://github.com/msaroufim
2024-06-17 16:50:37 +00:00