PyTorch MergeBot
e4b5645f83
Revert "Add wrappers for synchronous GPUDirect Storage APIs ( #130633 )"
...
This reverts commit 5b5e0698a5 .
Reverted https://github.com/pytorch/pytorch/pull/130633 on behalf of https://github.com/clee2000 due to breaking a lot of jobs and build rules internally D60085885, possibly needs to update some bazel build? ([comment](https://github.com/pytorch/pytorch/pull/130633#issuecomment-2245806738 ))
2024-07-23 17:19:34 +00:00
PyTorch MergeBot
b435d84261
Revert "[custom ops] Add register_vmap for custom ops ( #130589 )"
...
This reverts commit 074b420641 .
Reverted https://github.com/pytorch/pytorch/pull/130589 on behalf of https://github.com/atalman due to Please fix lint and reland ([comment](https://github.com/pytorch/pytorch/pull/130589#issuecomment-2244092174 ))
2024-07-23 01:44:44 +00:00
Shangdi Yu
074b420641
[custom ops] Add register_vmap for custom ops (#130589)
Fixes #130284
Fixes #130653
- Add `torch.library.register_vmap` for custom ops (usage sketched below).
- Add `register_vmap` for the operators in custom_op_db.
- Make `torch.autograd.Function` support kwarg-only arguments for vmap.
- Test operators in op_db with `tests/test_vmap`.
- Change `test_vmap` to allow a custom `out_dim` and allow `None` in `out_dim` when testing.
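A minimal usage sketch of the new registration API; the `mylib::numpy_cube` op below is illustrative, not from this PR:
```python
import torch

@torch.library.custom_op("mylib::numpy_cube", mutates_args=())
def numpy_cube(x: torch.Tensor) -> torch.Tensor:
    return torch.from_numpy(x.numpy(force=True) ** 3).to(x.device)

@numpy_cube.register_fake
def _(x):
    return torch.empty_like(x)

# The vmap rule receives the batch dims of each input and returns
# (output, out_dims); cubing is elementwise, so the batch dim carries through.
def numpy_cube_vmap(info, in_dims, x):
    return numpy_cube(x), in_dims[0]

torch.library.register_vmap("mylib::numpy_cube", numpy_cube_vmap)

x = torch.randn(3, 4)
assert torch.vmap(numpy_cube)(x).shape == (3, 4)
```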
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130589
Approved by: https://github.com/zou3519
2024-07-23 00:54:52 +00:00
Mikayla Gawarecki
5b5e0698a5
Add wrappers for synchronous GPUDirect Storage APIs (#130633)
Based in part on https://github.com/NVIDIA/apex/pull/1774
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130633
Approved by: https://github.com/albanD
2024-07-22 14:51:24 +00:00
PyTorch MergeBot
26383a6cc0
Revert "Added and_masks and or_masks utilities ( #131073 )"
...
This reverts commit 92bb323d36 .
Reverted https://github.com/pytorch/pytorch/pull/131073 on behalf of https://github.com/albanD due to The docs build fails here and in trunk ([comment](https://github.com/pytorch/pytorch/pull/131073#issuecomment-2242997958 ))
2024-07-22 13:44:55 +00:00
chilli
92bb323d36
Added and_masks and or_masks utilities (#131073)
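A hedged sketch of how these utilities compose flex_attention `mask_mod` callables (the mask functions are illustrative):
```python
import torch
from torch.nn.attention.flex_attention import and_masks, create_block_mask

def causal(b, h, q_idx, kv_idx):
    return q_idx >= kv_idx

def sliding_window(b, h, q_idx, kv_idx):
    return q_idx - kv_idx <= 128

# and_masks / or_masks combine mask_mod callables with logical AND / OR
mask_mod = and_masks(causal, sliding_window)
# create_block_mask defaults to device="cuda"; pass device="cpu" on CPU-only builds
block_mask = create_block_mask(mask_mod, B=None, H=None, Q_LEN=1024, KV_LEN=1024)
```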
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131073
Approved by: https://github.com/drisspg
ghstack dependencies: #130871, #130904
2024-07-22 11:48:03 +00:00
Soumith Chintala
8e478d4fb1
Add Alban and Piotr into Core Maintainers (#130903)
See official announcement here: https://dev-discuss.pytorch.org/t/alban-desmaison-and-piotr-bialecki-are-now-pytorch-core-maintainers/2280
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130903
Approved by: https://github.com/albanD, https://github.com/Skylion007
2024-07-20 16:02:42 +00:00
Li-Huai (Allan) Lin
125be005eb
[Docs] Fix fake tensor doc (#131205)
Fix this: `# AttributeError: 'FakeTensorMode' object has no attribute 'from_real_tensor'`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131205
Approved by: https://github.com/eellison
2024-07-19 17:59:45 +00:00
Jerry Zhang
793b17ebcb
Add numeric_debugger top level APIs (#130643)
Summary:
Add three top-level APIs for the numeric debugger in the pt2e flow that can log intermediate outputs in the model and calculate summaries for metric comparisons between corresponding nodes in two graphs:
* `prepare_for_propagation_comparison`
* `extract_results_from_loggers`
* `compare_results`
Test Plan:
python test/test_quantization.py -k test_prepare_for_propagation_comparison
python test/test_quantization.py -k test_extract_results_from_loggers
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130643
Approved by: https://github.com/dulinriley, https://github.com/tarun292
2024-07-18 20:54:18 +00:00
redwrasse
63a0a65df9
Define 'zero-preserving unary functions' in docs (#130804)
Make explicit the definition of 'zero-preserving unary functions' in the sparse tensors documentation.
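For context, a unary function f is zero-preserving iff f(0) == 0, which is what lets it operate on only the specified values of a sparse tensor. A small illustration:
```python
import torch

# sin(0) == 0, so torch.sin keeps the sparsity pattern intact; functions
# like cos or exp (which map 0 to 1) do not have this property.
s = torch.tensor([[0.0, 2.0], [0.0, 0.0]]).to_sparse()
print(torch.sin(s))  # still sparse: zeros remain unspecified
```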
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130804
Approved by: https://github.com/soulitzer
2024-07-18 13:30:29 +00:00
drisspg
2b43d339fe
Make FlexAttention API public (#130755)
# Summary
Makes the prototype API flex_attention public
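A minimal sketch of calling the now-public API, assuming a CUDA build (the bias function is illustrative):
```python
import torch
from torch.nn.attention.flex_attention import flex_attention

def rel_bias(score, b, h, q_idx, kv_idx):
    # score_mod adjusts the attention score per (query, key) position
    return score + (kv_idx - q_idx)

# shapes: (batch, heads, seq_len, head_dim)
q, k, v = (torch.randn(2, 8, 1024, 64, device="cuda") for _ in range(3))
out = flex_attention(q, k, v, score_mod=rel_bias)
```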
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130755
Approved by: https://github.com/Chillee
2024-07-16 16:21:25 +00:00
Xuehai Pan
a3abfa5cb5
[BE][Easy][1/19] enforce style for empty lines in import segments (#129752)
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by the linter.
You can review these PRs via:
```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129752
Approved by: https://github.com/ezyang, https://github.com/malfet
2024-07-16 00:42:56 +00:00
Jerry Zhang
b893aa71ca
Rename generate_numeric_debug_handle to numeric_debugger (#130590)
Summary:
As titled.
Test Plan:
CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130590
Approved by: https://github.com/dulinriley, https://github.com/tarun292
2024-07-15 22:42:27 +00:00
Yu, Guangye
7cd48df2da
Refine the logic of device construction when only device index is given (#129119)
# Motivation
Before this PR, a device constructed from only a device index defaulted to the `cuda` type (or to the `PrivateUse1` type if a `PrivateUse1` backend was registered).
```python
>>> import torch
>>> device = torch.device(0)
>>> device.type
'cuda'
>>> a = torch.tensor([1, 2])
>>> b = a.to(0)
>>> b
tensor([1, 2], device='cuda:0')
```
It works well on a CUDA GPU, but it raises a confusing error when running on XPU.
```python
>>> import torch
>>> device = torch.device(0)
>>> device.type
'cuda'
>>> a = torch.tensor([1, 2])
>>> b = a.to(0)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/xxx/pytorch/torch/cuda/__init__.py", line 302, in _lazy_init
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
```
With this PR, the logic is refined to use the currently available device type instead (illustrated below).
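An illustration of the refined behavior, assuming a machine with a non-CUDA accelerator such as XPU:
```python
import torch

# After this change, an index-only device resolves to the currently
# available accelerator rather than unconditionally to CUDA.
device = torch.device(0)
print(device.type)  # 'xpu' on an XPU build, 'cuda' on a CUDA build
b = torch.tensor([1, 2]).to(0)
print(b.device)     # e.g. device(type='xpu', index=0)
```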
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129119
Approved by: https://github.com/albanD, https://github.com/gujinghui, https://github.com/EikanWang
ghstack dependencies: #129463, #129205, #129363
2024-07-15 14:34:29 +00:00
Yu, Guangye
9cae2160f5
Introduce the concept of Accelerators to PyTorch doc (#129363)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129363
Approved by: https://github.com/EikanWang, https://github.com/gujinghui, https://github.com/albanD
ghstack dependencies: #129463, #129205
2024-07-15 14:24:46 +00:00
Mikayla Gawarecki
7c289c2a5c
Add torch.serialization.safe_globals context manager (#127939)
Add context manager mentioned in https://github.com/pytorch/pytorch/pull/127808#pullrequestreview-2096298486
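A minimal usage sketch (the `Container` class is illustrative):
```python
import torch

class Container:
    def __init__(self, t):
        self.t = t

torch.save(Container(torch.ones(2)), "data.pt")

# weights_only=True restricts unpickling to an allowlist; the context
# manager temporarily extends that allowlist with the given globals.
with torch.serialization.safe_globals([Container]):
    obj = torch.load("data.pt", weights_only=True)
```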
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127939
Approved by: https://github.com/albanD
2024-07-12 20:38:43 +00:00
rzou
9c69684af8
[custom_ops] expose torch.library.register_torch_dispatch (#130261)
This is the API for defining the interaction between a torch_dispatch class and a custom op. Open to bikeshedding on the API.
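A hedged usage sketch based on the pattern this API follows (the op and mode are illustrative):
```python
import torch
from torch.utils._python_dispatch import TorchDispatchMode

@torch.library.custom_op("mylib::foo", mutates_args=())
def foo(x: torch.Tensor) -> torch.Tensor:
    return x.clone()

class MyMode(TorchDispatchMode):
    def __torch_dispatch__(self, func, types, args=(), kwargs=None):
        return func(*args, **(kwargs or {}))

# The registered rule runs instead of the kernel when mylib::foo is
# called while MyMode is active.
@torch.library.register_torch_dispatch("mylib::foo", MyMode)
def _(mode, func, types, args, kwargs):
    (x,) = args
    return x + 1

x = torch.randn(3)
with MyMode():
    y = foo(x)
assert torch.allclose(y, x + 1)
```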
Test Plan:
- new tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130261
Approved by: https://github.com/albanD
ghstack dependencies: #130064
2024-07-12 14:13:01 +00:00
Shangdi Yu
fb9bc6d74a
[custom op] add doc for CustomOpDef.set_kernel_enabled (#130406)
(Screenshots of the rendered `set_kernel_enabled` documentation.)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130406
Approved by: https://github.com/zou3519
2024-07-11 15:47:35 +00:00
Shangdi Yu
a4576dad34
[reland][custom ops] infer schema (#130079)
Fixes #129617
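A hedged sketch of the schema-inference entry point; the exact public signature and output format are assumptions, not confirmed by this commit:
```python
import torch

def mysin(x: torch.Tensor) -> torch.Tensor:
    return torch.sin(x)

# The operator schema string is inferred from the Python type annotations.
schema = torch.library.infer_schema(mysin, mutates_args=())
print(schema)  # expected: "(Tensor x) -> Tensor"
```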
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130079
Approved by: https://github.com/zou3519
2024-07-11 03:39:07 +00:00
PyTorch MergeBot
86bca69c5f
Revert "[custom_ops] expose torch.library.register_torch_dispatch ( #130261 )"
...
This reverts commit bb9a73f767 .
Reverted https://github.com/pytorch/pytorch/pull/130261 on behalf of https://github.com/izaitsevfb due to depends on #130064 which needs to be reverted ([comment](https://github.com/pytorch/pytorch/pull/130261#issuecomment-2221569707 ))
2024-07-10 21:43:28 +00:00
PyTorch MergeBot
e14a0f45ed
Revert "[reland][custom ops] infer schema ( #130079 )"
...
This reverts commit bef085bdfa .
Reverted https://github.com/pytorch/pytorch/pull/130079 on behalf of https://github.com/izaitsevfb due to depends on #130064 which needs to be reverted ([comment](https://github.com/pytorch/pytorch/pull/130079#issuecomment-2221561483 ))
2024-07-10 21:40:16 +00:00
Shangdi Yu
bef085bdfa
[reland][custom ops] infer schema (#130079)
Fixes #129617
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130079
Approved by: https://github.com/zou3519
2024-07-10 16:18:36 +00:00
rzou
bb9a73f767
[custom_ops] expose torch.library.register_torch_dispatch (#130261)
This is the API for defining the interaction between a torch_dispatch class and a custom op. Open to bikeshedding on the API.
Test Plan:
- new tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130261
Approved by: https://github.com/albanD
ghstack dependencies: #130064
2024-07-09 21:11:27 +00:00
Yuanhao Ji
312652c325
[RFC] Add support for device extension autoloading (#127074)
Fixes #122468
- Load device extensions at the end of `torch/__init__.py`
- Enabled by default; disable with `TORCH_DEVICE_BACKEND_AUTOLOAD=0` (packaging sketch below)
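A hedged packaging sketch for an out-of-tree backend; the package name is hypothetical, and the `torch.backends` entry-point group is taken from the autoloading docs:
```python
# setup.py of a hypothetical out-of-tree backend package "torch_foo"
from setuptools import setup

setup(
    name="torch_foo",
    entry_points={
        # torch/__init__.py discovers this group on import and calls the
        # entry point, unless TORCH_DEVICE_BACKEND_AUTOLOAD=0 is set
        "torch.backends": ["torch_foo = torch_foo:_autoload"],
    },
)
```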
Run tests:
```bash
python test/run_test.py -i test_autoload_enable
python test/run_test.py -i test_autoload_disable
```
doc:
https://docs-preview.pytorch.org/pytorch/pytorch/127074/miscellaneous_environment_variables.html
co-author: @jgong5 @bsochack @bkowalskiINTEL @jczaja @FFFrog @hipudding
Co-authored-by: albanD <desmaison.alban@gmail.com>
Co-authored-by: Jiong Gong <jiong.gong@intel.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127074
Approved by: https://github.com/albanD, https://github.com/jgong5
2024-07-09 06:14:13 +00:00
PyTorch MergeBot
44a773c121
Revert "[custom ops] infer schema ( #130079 )"
...
This reverts commit 3fe324ffb6 .
Reverted https://github.com/pytorch/pytorch/pull/130079 on behalf of https://github.com/huydhn due to The test_public_bindings failure looks legit 3fe324ffb6 ([comment](https://github.com/pytorch/pytorch/pull/130079#issuecomment-2215420957 ))
2024-07-08 22:02:29 +00:00
Shangdi Yu
3fe324ffb6
[custom ops] infer schema (#130079)
Fixes #129617
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130079
Approved by: https://github.com/zou3519
2024-07-08 20:46:23 +00:00
Kurt Mohler
e590168865
Enable sharing meta tensors between processes (#129520)
Fixes #129436
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129520
Approved by: https://github.com/ezyang
2024-07-04 20:29:48 +00:00
Li-Huai (Allan) Lin
42f3d7e948
[MPS] Add mps profiler env vars to docs (#129552)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129552
Approved by: https://github.com/malfet
ghstack dependencies: #129451
2024-07-04 06:44:48 +00:00
Zhengxu Chen
042d764872
[export] Update example inputs format for DB. (#129982)
Summary: To give users simpler example code, we are getting rid of ExportArgs in favor of example_args and example_kwargs.
Test Plan: CI
Differential Revision: D59288920
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129982
Approved by: https://github.com/angelayi
2024-07-03 17:53:15 +00:00
Edward Z. Yang
29c68df600
Stop immediately specializing common constants 0/1 for plain int (#128327)
Fixes https://github.com/pytorch/pytorch/issues/128319
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128327
Approved by: https://github.com/lezcano
ghstack dependencies: #129983
2024-07-03 16:41:51 +00:00
Howard Huang
4eb449f7dc
[pipelining] add small logging section to docs (#129368)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129368
Approved by: https://github.com/wconstab
2024-07-02 18:19:28 +00:00
Haoci Zhang
1ad683033b
Implemented flexible PP schedule (#129597)
Enables cases where num_microbatches % pp_size != 0. With the flex_pp schedule, we have num_rounds = max(1, n_microbatches // pp_group_size), and it works as long as n_microbatches % num_rounds == 0. Two examples:
- pp_group_size = 4, n_microbatches = 10: num_rounds = 2, and 10 % 2 == 0.
- pp_group_size = 4, n_microbatches = 3: num_rounds = 1, and 3 % 1 == 0.
Moved over from PiPPy (https://github.com/pytorch/PiPPy/pull/1129)
Tested using the config in the first example; the schedule looks like the following:
```
=========== ALL_RANK_ACTIONS ===========
Rank 0 Rank 1 Rank 2 Rank 3
Step 00: F0_s0 None None None
Step 01: F1_s0 F0_s1 None None
Step 02: F2_s0 F1_s1 F0_s2 None
Step 03: F3_s0 F2_s1 F1_s2 F0_s3
Step 04: F4_s0 F3_s1 F2_s2 F1_s3
Step 05: F0_s4 F4_s1 F3_s2 F2_s3
Step 06: F1_s4 F0_s5 F4_s2 F3_s3
Step 07: F2_s4 F1_s5 F0_s6 F4_s3
Step 08: F3_s4 F2_s5 F1_s6 F0_s7
Step 09: F4_s4 F3_s5 None B0_s7
Step 10: F5_s0 None F2_s6 F1_s7
Step 11: None None B0_s6 B1_s7
Step 12: None F4_s5 F3_s6 F2_s7
Step 13: None B0_s5 B1_s6 B2_s7
Step 14: F6_s0 F5_s1 F4_s6 F3_s7
Step 15: B0_s4 B1_s5 B2_s6 B3_s7
Step 16: F7_s0 F6_s1 F5_s2 F4_s7
Step 17: B1_s4 B2_s5 B3_s6 B4_s7
Step 18: F8_s0 F7_s1 F6_s2 F5_s3
Step 19: B2_s4 B3_s5 B4_s6 B0_s3
Step 20: F9_s0 F8_s1 F7_s2 F6_s3
Step 21: B3_s4 B4_s5 B0_s2 B1_s3
Step 22: F5_s4 F9_s1 F8_s2 F7_s3
Step 23: B4_s4 B0_s1 B1_s2 B2_s3
Step 24: F6_s4 F5_s5 F9_s2 F8_s3
Step 25: B0_s0 B1_s1 B2_s2 B3_s3
Step 26: F7_s4 F6_s5 F5_s6 F9_s3
Step 27: B1_s0 B2_s1 B3_s2 B4_s3
Step 28: F8_s4 F7_s5 F6_s6 F5_s7
Step 29: B2_s0 B3_s1 B4_s2 B5_s7
Step 30: F9_s4 F8_s5 F7_s6 F6_s7
Step 31: B3_s0 B4_s1 B5_s6 B6_s7
Step 32: None F9_s5 F8_s6 F7_s7
Step 33: B4_s0 B5_s5 B6_s6 B7_s7
Step 34: None None F9_s6 F8_s7
Step 35: B5_s4 B6_s5 B7_s6 B8_s7
Step 36: None None None F9_s7
Step 37: B6_s4 B7_s5 B8_s6 B9_s7
Step 38: None None None None
Step 39: B7_s4 B8_s5 B9_s6 B5_s3
Step 40: None None None None
Step 41: B8_s4 B9_s5 B5_s2 B6_s3
Step 42: None None None None
Step 43: B9_s4 B5_s1 B6_s2 B7_s3
Step 44: None None None None
Step 45: B5_s0 B6_s1 B7_s2 B8_s3
Step 46: None None None None
Step 47: B6_s0 B7_s1 B8_s2 B9_s3
Step 48: None None None
Step 49: B7_s0 B8_s1 B9_s2
Step 50: None None
Step 51: B8_s0 B9_s1
Step 52: None
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129597
Approved by: https://github.com/H-Huang
2024-07-02 07:54:38 +00:00
eqy
f845a7a91a
[cuDNN][SDPA] Remove TORCH_CUDNN_SDPA_ENABLED=1, enable cuDNN SDPA by default on H100 and 2nd on other archs >= sm80 (#125343)
Looks like one of the first failures seen is `test_causal_variants_compile_causal_variant_CausalVariant_LOWER_RIGHT_shape0_cuda` when `test_causal_variants_causal_variant_CausalVariant_LOWER_RIGHT_shape0_cuda` passes.
What seems interesting here is that the `torch.compile` version fails while the eager version passes. Not sure what the difference would be here...
Nevertheless, is there a recommended mechanism to skip cuDNN SDPA as a backend for this test? CC @drisspg
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125343
Approved by: https://github.com/Skylion007
2024-06-30 19:22:16 +00:00
PyTorch MergeBot
3d96217891
Revert "[BE][Easy] use pathlib.Path instead of dirname / ".." / pardir ( #129374 )"
...
This reverts commit 9e1f3ecaa7 .
Reverted https://github.com/pytorch/pytorch/pull/129374 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is still failing with the same error ([comment](https://github.com/pytorch/pytorch/pull/129374#issuecomment-2197801405 ))
2024-06-29 00:47:15 +00:00
PyTorch MergeBot
999eec8dea
Revert "[cuDNN][SDPA] Remove TORCH_CUDNN_SDPA_ENABLED=1, enable cuDNN SDPA by default on H100 and 2nd on other archs >= sm80 ( #125343 )"
...
This reverts commit b7e7a4cb01 .
Reverted https://github.com/pytorch/pytorch/pull/125343 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it seems to break some test_transformer running on internal A100 and V100 ([comment](https://github.com/pytorch/pytorch/pull/125343#issuecomment-2196202003 ))
2024-06-28 06:03:54 +00:00
Xuehai Pan
9e1f3ecaa7
[BE][Easy] use pathlib.Path instead of dirname / ".." / pardir (#129374)
Changes in apply order (a before/after sketch follows the list):
1. Replace all `".."` and `os.pardir` usage with `os.path.dirname(...)`.
2. Replace nested `os.path.dirname(os.path.dirname(...))` calls with `str(Path(...).parent.parent)`.
3. Reorder `.absolute()` ~~/ `.resolve()`~~ and `.parent`: always resolve the path first.
   `.parent{...}.absolute()` -> `.absolute().parent{...}`
4. Replace chained `.parent x N` with `.parents[${N - 1}]`: the code is easier to read (see 5.)
   `.parent.parent.parent.parent` -> `.parents[3]`
5. ~~Replace `.parents[${N - 1}]` with `.parents[${N} - 1]`: the code is easier to read and does not introduce any runtime overhead.~~
   ~~`.parents[3]` -> `.parents[4 - 1]`~~
6. ~~Replace `.parents[2 - 1]` with `.parent.parent`: because the code is shorter and easier to read.~~
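A minimal before/after sketch of the pattern this PR applies:
```python
import os.path
from pathlib import Path

# before: walk up the tree with os.path and ".."
repo_root = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", ".."))

# after: resolve the path first, then index .parents
# (parents[2] is three levels up from the file, i.e. two above its directory)
repo_root = str(Path(__file__).absolute().parents[2])
```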
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129374
Approved by: https://github.com/justinchuby, https://github.com/malfet
2024-06-28 00:35:15 +00:00
Li-Huai (Allan) Lin
84ad5452f6
[MPS] Fused SGD optimizer (#129350)
```
[-------------------------------------- Fused SGD --------------------------------------]
| Fused: True | Fused: False
1 threads: ------------------------------------------------------------------------------
numel: 1024, num_tensors: 100, momentum: True | 2 | 15
numel: 1024, num_tensors: 100, momentum: False | 2 | 5
numel: 65536, num_tensors: 100, momentum: True | 3 | 16
numel: 65536, num_tensors: 100, momentum: False | 2 | 5
numel: 1048576, num_tensors: 100, momentum: True | 11 | 16
numel: 1048576, num_tensors: 100, momentum: False | 8 | 6
numel: 1024, num_tensors: 500, momentum: True | 29 | 70
numel: 1024, num_tensors: 500, momentum: False | 20 | 24
numel: 65536, num_tensors: 500, momentum: True | 33 | 76
numel: 65536, num_tensors: 500, momentum: False | 22 | 26
numel: 1048576, num_tensors: 500, momentum: True | 70 | 80
numel: 1048576, num_tensors: 500, momentum: False | 43 | 40
numel: 1024, num_tensors: 1000, momentum: True | 108 | 139
numel: 1024, num_tensors: 1000, momentum: False | 72 | 48
numel: 65536, num_tensors: 1000, momentum: True | 116 | 150
numel: 65536, num_tensors: 1000, momentum: False | 77 | 52
numel: 1048576, num_tensors: 1000, momentum: True | 190 | 170
numel: 1048576, num_tensors: 1000, momentum: False | 120 | 50
```
```python
def profile_fused_sgd():
    from torch.optim.sgd import sgd
    import torch.utils.benchmark as benchmark
    import itertools

    def profile(fn, params, grads, momentum_buffer_list, fused):
        fn(
            params,
            grads,
            momentum_buffer_list,
            momentum=True if len(momentum_buffer_list) > 0 else False,
            dampening=0.0,
            nesterov=False,
            foreach=False,
            fused=fused,
            lr=1e-3,
            weight_decay=0.0,
            maximize=False,
            grad_scale=None,
            found_inf=None,
        )
        torch.mps.synchronize()

    device = "mps"
    results = []
    for num_tensors, numel, momentum in itertools.product([100, 500, 1000], [1024, 65536, 1048576], [True, False]):
        sublabel = f"numel: {numel}, num_tensors: {num_tensors}, momentum: {momentum}"
        print(sublabel)
        params, grads = [[torch.arange(numel, dtype=torch.float32, device=device) + (numel * i) for i in range(num_tensors)] for _ in range(2)]
        momentum_buffer_list = [torch.arange(numel, dtype=torch.float32, device=device) + (numel * i) for i in range(num_tensors)] if momentum else []
        fn = sgd
        for fused in [True, False]:
            t = benchmark.Timer(
                stmt='profile(fn, params, grads, momentum_buffer_list, fused)',
                label='Fused SGD',
                sub_label=sublabel,
                globals=locals(),
                description=f"Fused: {fused}",
            ).blocked_autorange(min_run_time=5)
            results.append(t)
    compare = benchmark.Compare(results)
    compare.trim_significant_figures()
    compare.colorize(rowwise=True)
    compare.print()
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129350
Approved by: https://github.com/janeyx99
ghstack dependencies: #129006, #129008, #129007, #129105
2024-06-27 04:37:14 +00:00
PyTorch MergeBot
895316119d
Revert "[BE][Easy] use pathlib.Path instead of dirname / ".." / pardir ( #129374 )"
...
This reverts commit 0314c4c101 .
Reverted https://github.com/pytorch/pytorch/pull/129374 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it causes lots of internal build failures where they fail to find hipify module ([comment](https://github.com/pytorch/pytorch/pull/129374#issuecomment-2192437052 ))
2024-06-26 19:03:57 +00:00
Shangdi Yu
cca85c96cd
[export] minor typo fix (#129543)
Fixes a typo in the torch.export doc.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129543
Approved by: https://github.com/angelayi
2024-06-26 18:35:31 +00:00
Eddie Yan
b7e7a4cb01
[cuDNN][SDPA] Remove TORCH_CUDNN_SDPA_ENABLED=1, enable cuDNN SDPA by default on H100 and 2nd on other archs >= sm80 (#125343)
Looks like one of the first failures seen is `test_causal_variants_compile_causal_variant_CausalVariant_LOWER_RIGHT_shape0_cuda` when `test_causal_variants_causal_variant_CausalVariant_LOWER_RIGHT_shape0_cuda` passes.
What seems interesting here is that the `torch.compile` version fails while the eager version passes. Not sure what the difference would be here...
Nevertheless, is there a recommended mechanism to skip cuDNN SDPA as a backend for this test? CC @drisspg
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125343
Approved by: https://github.com/Skylion007
2024-06-26 00:49:18 +00:00
Zhengxu Chen
e58ef5b65f
[export] Rewrite exportdb formatting. (#129260)
Summary: It'll be easier to generate examples if the code doesn't depend on the exportdb library.
Test Plan: CI
Differential Revision: D58886554
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129260
Approved by: https://github.com/tugsbayasgalan
2024-06-25 21:04:53 +00:00
Li-Huai (Allan) Lin
71ebe5121a
[MPS] Fast math env var (#129007)
Allow users to decide whether they want fast math enabled via an environment variable.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129007
Approved by: https://github.com/malfet
ghstack dependencies: #129006, #129008
2024-06-25 13:52:07 +00:00
Xuehai Pan
0314c4c101
[BE][Easy] use pathlib.Path instead of dirname / ".." / pardir (#129374)
Changes in apply order:
1. Replace all `".."` and `os.pardir` usage with `os.path.dirname(...)`.
2. Replace nested `os.path.dirname(os.path.dirname(...))` calls with `str(Path(...).parent.parent)`.
3. Reorder `.absolute()` ~~/ `.resolve()`~~ and `.parent`: always resolve the path first.
   `.parent{...}.absolute()` -> `.absolute().parent{...}`
4. Replace chained `.parent x N` with `.parents[${N - 1}]`: the code is easier to read (see 5.)
   `.parent.parent.parent.parent` -> `.parents[3]`
5. ~~Replace `.parents[${N - 1}]` with `.parents[${N} - 1]`: the code is easier to read and does not introduce any runtime overhead.~~
   ~~`.parents[3]` -> `.parents[4 - 1]`~~
6. ~~Replace `.parents[2 - 1]` with `.parent.parent`: because the code is shorter and easier to read.~~
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129374
Approved by: https://github.com/justinchuby, https://github.com/malfet
2024-06-25 08:28:38 +00:00
Will Constable
2f8b301c32
Clean up distributed/CONTRIBUTING.md (#128450)
Click [here](cf6c88af48/torch/distributed/CONTRIBUTING.md) to see the rendered version of the file in this PR
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128450
Approved by: https://github.com/wanchaol
2024-06-22 02:41:22 +00:00
rzou
311fadb1fb
[docs] Redirect custom ops landing page to the correct place (#129177)
I'm moving it to pytorch/tutorials
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129177
Approved by: https://github.com/albanD
2024-06-21 13:31:32 +00:00
cyy
5c676bb8b3
Remove Caffe2 handling from onnx_unpack_quantized_weights (#129021)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129021
Approved by: https://github.com/justinchuby, https://github.com/albanD
2024-06-21 06:16:44 +00:00
Jing Xu
5fba5d83f0
add xpu for amp (#127276)
As support for Intel GPU has been upstreamed, this PR adds the XPU-related contents to the AMP doc.
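A minimal sketch of the documented usage, assuming a PyTorch build with Intel GPU (XPU) support:
```python
import torch

model = torch.nn.Linear(8, 8).to("xpu")
inp = torch.randn(2, 8, device="xpu")

# autocast runs eligible ops in the lower-precision dtype on XPU
with torch.autocast(device_type="xpu", dtype=torch.bfloat16):
    out = model(inp)
print(out.dtype)  # torch.bfloat16
```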
Co-authored-by: Yu, Guangye <guangye.yu@intel.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127276
Approved by: https://github.com/dvrogozh, https://github.com/albanD, https://github.com/malfet
2024-06-20 21:49:35 +00:00
Zhengxu Chen
65286883d4
[export] reland "experimental joint graph API." (#129081)
Summary: The previous diff got reverted despite CI being green.
Test Plan: CI
Differential Revision: D58790048
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129081
Approved by: https://github.com/tugsbayasgalan
2024-06-20 16:50:53 +00:00
Oguz Ulgen
54b0006cb2
Evaluate symexprs on load path of cache not write (#128997)
When caching is enabled, an internal model fails with
```
assert_size_stride(bmm_9, (17, s0, 512), (54784, 512, 1))
AssertionError: expected size 17==17, stride 57344==54784 at dim=0
```
Looking at this model, the exact problem is: when the cache is hit on the forward graph, the generated code for the backward fails, since the strides of the forward's outputs, which are passed to the backward as inputs, are not what we expected.
This PR changes the evaluation logic so that we defer evaluation of output stride exprs to the load path, as opposed to eagerly doing it on the save path.
I have not been able to come up with a unit test repro for this problem.
Differential Revision: [D58796503](https://our.internmc.facebook.com/intern/diff/D58796503)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128997
Approved by: https://github.com/ezyang
2024-06-20 08:55:12 +00:00
Li-Huai (Allan) Lin
19f3abcde4
[Docs][MPS] Add mps environment variable table (#129008)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129008
Approved by: https://github.com/malfet
ghstack dependencies: #129006
2024-06-20 03:30:35 +00:00