pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Mikayla Gawarecki	c3d099ddd1	[BE][Easy] Add hooks to doc for Optimizer base class (#131628 ) Happened to notice this was missing from the base class (but is rendering for the other optimizers like Adam etc.) when I wanted to link the state_dict hooks for https://discuss.pytorch.org/t/global-not-per-param-optimizer-state/206769 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131628 Approved by: https://github.com/janeyx99	2024-07-25 15:07:08 +00:00
Jane Xu	9c4cf866c2	Adafactor forloop basic impl (#129905 ) #109581 At this point, the vanilla implementation (the default) is good. Docs: https://docs-preview.pytorch.org/pytorch/pytorch/129905/generated/torch.optim.Adafactor.html#torch.optim.Adafactor Specifically, the impl in this PR, which attempts to replicate the paper, ``` optim = torch.optim.Adafactor([weight]) ``` is close enough to https://pytorch-optimizers.readthedocs.io/en/latest/optimizer/#pytorch_optimizer.AdaFactor ``` optim_c = AdaFactor([weight], betas=(0, 0.999), scale_parameter=False) ``` is close enough to https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/Adafactor ``` optim = keras.optimizers.Adafactor(learning_rate=0.01) ``` The three results respectively for the same randomly generated weights: ``` # ours tensor([[ 0.3807594, -0.3912092], [ 0.0762539, 0.5377805], [ 0.2459473, 0.4662207]]) # pytorch-optimizer tensor([[ 0.3807592, -0.3912172], [ 0.0762507, 0.5377818], [ 0.2459457, 0.4662213]]) # keras array([[ 0.38076326, -0.39121315], [ 0.0762547 , 0.5377859 ], [ 0.24594972, 0.46622536]], dtype=float32) ``` This gives me confidence to move forward in speeding up the implementation now that a baseline has been established. If you're curious about differences: * keras assigns step_size (rho_t in their code) to `min(lr, 1 / sqrt(step)` whereas the OG impl uses a hardcoded 0.01 instead of lr. We do the same thing as keras, but our lr default is 0.01. * We differ from the pytorch-optimizers default in that our default will not track momentum (thus `beta1=0`) and we do not apply parameter scaling. <details> Keras collab: https://colab.research.google.com/drive/1i3xF8ChL7TWKJGV_5v_5nMhXKnYmQQ06?usp=sharing My script repro: ``` import torch from pytorch_optimizer import AdaFactor torch.set_printoptions(precision=7) weight = torch.tensor([[ 0.37697506, -0.39500135], [ 0.07246649, 0.53399765], [ 0.24216151, 0.46243715]], dtype=torch.float32) # bias = torch.tensor([0, 0], dtype=torch.float32) weight.grad = torch.tensor([[-0.5940447, -0.7743838], [-0.5940447, -0.7743838], [-0.5940447, -0.7743838]], dtype=torch.float32) # bias.grad = torch.tensor([-2.5027974, 1.5422692], dtype=torch.float32) weight_c = weight.clone() weight_c.grad = weight.grad.clone() optim = torch.optim.Adafactor([weight]) optim.step() print(weight) optim_c = AdaFactor([weight_c], betas=(0, 0.999), scale_parameter=False) optim_c.step() print(weight_c) ``` <details> Pull Request resolved: https://github.com/pytorch/pytorch/pull/129905 Approved by: https://github.com/albanD	2024-07-25 13:17:19 +00:00
Haoci Zhang	8fe5b93667	support zb1p and zb2p algorithms (#130752 ) Previously, we have proved that ZB2P is not truly zero bubble when num_local_stages exceed 4 and so only ZB1P was supported. We did a few tweaks to the ZB2P to really make it zero bubble. Algorithm and proof is attached. [zero_bubble.pdf](https://github.com/user-attachments/files/16238738/zero_bubble.pdf) Pull Request resolved: https://github.com/pytorch/pytorch/pull/130752 Approved by: https://github.com/H-Huang	2024-07-24 17:58:46 +00:00
Jun Luo	abb313b466	[torch.mtia] Noop set_rng_state and get_rng_state APIs (#130873 ) Summary: As title Test Plan: CI tests Reviewed By: joebos Differential Revision: D59036602 Pull Request resolved: https://github.com/pytorch/pytorch/pull/130873 Approved by: https://github.com/hanzlfs	2024-07-24 01:52:21 +00:00
Shangdi Yu	68c725a094	[custom ops] Add register_vmap for custom ops (#130589 ) Fixes #130284 Fixes #130653 - Add `torch.library.register_vmap` to custom ops - Add `register_vmap` for operators in ops in custom_op_db. - Make `torch.autograd.Function` support kwarg-only kwargs for vmap - test operators in op_db with `tests/test_vmap`. - change `test_vmap` to allow custom `out_dim` and allow "None" in `out_dim` when testing. Pull Request resolved: https://github.com/pytorch/pytorch/pull/130589 Approved by: https://github.com/zou3519	2024-07-23 17:48:38 +00:00
PyTorch MergeBot	e4b5645f83	Revert "Add wrappers for synchronous GPUDirect Storage APIs (#130633 )" This reverts commit `5b5e0698a5`. Reverted https://github.com/pytorch/pytorch/pull/130633 on behalf of https://github.com/clee2000 due to breaking a lot of jobs and build rules internally D60085885, possibly needs to update some bazel build? ([comment](https://github.com/pytorch/pytorch/pull/130633#issuecomment-2245806738))	2024-07-23 17:19:34 +00:00
PyTorch MergeBot	b435d84261	Revert "[custom ops] Add register_vmap for custom ops (#130589 )" This reverts commit `074b420641`. Reverted https://github.com/pytorch/pytorch/pull/130589 on behalf of https://github.com/atalman due to Please fix lint and reland ([comment](https://github.com/pytorch/pytorch/pull/130589#issuecomment-2244092174))	2024-07-23 01:44:44 +00:00
Shangdi Yu	074b420641	[custom ops] Add register_vmap for custom ops (#130589 ) Fixes #130284 Fixes #130653 - Add `torch.library.register_vmap` to custom ops - Add `register_vmap` for operators in ops in custom_op_db. - Make `torch.autograd.Function` support kwarg-only kwargs for vmap - test operators in op_db with `tests/test_vmap`. - change `test_vmap` to allow custom `out_dim` and allow "None" in `out_dim` when testing. Pull Request resolved: https://github.com/pytorch/pytorch/pull/130589 Approved by: https://github.com/zou3519	2024-07-23 00:54:52 +00:00
Mikayla Gawarecki	5b5e0698a5	Add wrappers for synchronous GPUDirect Storage APIs (#130633 ) Based in part on https://github.com/NVIDIA/apex/pull/1774 Pull Request resolved: https://github.com/pytorch/pytorch/pull/130633 Approved by: https://github.com/albanD	2024-07-22 14:51:24 +00:00
PyTorch MergeBot	26383a6cc0	Revert "Added and_masks and or_masks utilities (#131073 )" This reverts commit `92bb323d36`. Reverted https://github.com/pytorch/pytorch/pull/131073 on behalf of https://github.com/albanD due to The docs build fails here and in trunk ([comment](https://github.com/pytorch/pytorch/pull/131073#issuecomment-2242997958))	2024-07-22 13:44:55 +00:00
chilli	92bb323d36	Added and_masks and or_masks utilities (#131073 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/131073 Approved by: https://github.com/drisspg ghstack dependencies: #130871, #130904	2024-07-22 11:48:03 +00:00
Soumith Chintala	8e478d4fb1	Add Alban and Piotr into Core Maintainers (#130903 ) See official announcement here: https://dev-discuss.pytorch.org/t/alban-desmaison-and-piotr-bialecki-are-now-pytorch-core-maintainers/2280 Pull Request resolved: https://github.com/pytorch/pytorch/pull/130903 Approved by: https://github.com/albanD, https://github.com/Skylion007	2024-07-20 16:02:42 +00:00
Li-Huai (Allan) Lin	125be005eb	[Docs] Fix fake tensor doc (#131205 ) Fix this: `# AttributeError: 'FakeTensorMode' object has no attribute 'from_real_tensor'` Pull Request resolved: https://github.com/pytorch/pytorch/pull/131205 Approved by: https://github.com/eellison	2024-07-19 17:59:45 +00:00
Jerry Zhang	793b17ebcb	Add numeric_debugger top level APIs (#130643 ) Summary: Add three top level APIs for numeric debugger in pt2e flow that can log intermediate output in the model and calculate summary for metric comparisons between nodes in two graphs * `prepare_for_propagation_comparison` * `extract_results_from_loggers` * `compare_results` Test Plan: python test/test_quantization.py -k test_prepare_for_propagation_comparison python test/test_quantization.py -k test_extract_results_from_loggers Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/130643 Approved by: https://github.com/dulinriley, https://github.com/tarun292	2024-07-18 20:54:18 +00:00
redwrasse	63a0a65df9	Define 'zero-preserving unary functions' in docs (#130804 ) Make explicit the definition of 'zero-preserving unary functions' in the sparse tensors documentation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/130804 Approved by: https://github.com/soulitzer	2024-07-18 13:30:29 +00:00
drisspg	2b43d339fe	Make FlexAttention API public (#130755 ) # Summary Makes the prototype API flex_attention public Pull Request resolved: https://github.com/pytorch/pytorch/pull/130755 Approved by: https://github.com/Chillee	2024-07-16 16:21:25 +00:00
Xuehai Pan	a3abfa5cb5	[BE][Easy][1/19] enforce style for empty lines in import segments (#129752 ) See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter. You can review these PRs via: ```bash git diff --ignore-all-space --ignore-blank-lines HEAD~1 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/129752 Approved by: https://github.com/ezyang, https://github.com/malfet	2024-07-16 00:42:56 +00:00
Jerry Zhang	b893aa71ca	Rename generate_numeric_debug_handle to numeric_debugger (#130590 ) Summary: att Test Plan: CI Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/130590 Approved by: https://github.com/dulinriley, https://github.com/tarun292	2024-07-15 22:42:27 +00:00
Yu, Guangye	7cd48df2da	Refine the logic of device construction when only device index is given (#129119 ) # Motivation Before this PR, device construction was `cuda` type when only a device index was given. It also returns the `PrivateUser1` type if a `PrivateUser1` type is registered. ```bash >>> import torch >>> device = torch.device(0) >>> device.type 'cuda' >>> a = torch.tensor([1, 2]) >>> b = a.to(0) >>> b tensor([1, 2], device='cuda:0') ``` It works well on CUDA GPU. But it will raise unexpected information and error running on XPU. ```bash >>> import torch >>> device = torch.device(0) >>> device.type 'cuda' >>> a = torch.tensor([1, 2]) >>> b = a.to(0) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/home/xxx/pytorch/torch/cuda/__init__.py", line 302, in _lazy_init raise AssertionError("Torch not compiled with CUDA enabled") AssertionError: Torch not compiled with CUDA enabled ``` With this PR, refine the logic to use the currently available device type instead. Pull Request resolved: https://github.com/pytorch/pytorch/pull/129119 Approved by: https://github.com/albanD, https://github.com/gujinghui, https://github.com/EikanWang ghstack dependencies: #129463, #129205, #129363	2024-07-15 14:34:29 +00:00
Yu, Guangye	9cae2160f5	Introduce the concept of Accelerators to PyTorch doc (#129363 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/129363 Approved by: https://github.com/EikanWang, https://github.com/gujinghui, https://github.com/albanD ghstack dependencies: #129463, #129205	2024-07-15 14:24:46 +00:00
Mikayla Gawarecki	7c289c2a5c	Add torch.serialization.safe_globals context manager (#127939 ) Add context manager mentioned in https://github.com/pytorch/pytorch/pull/127808#pullrequestreview-2096298486 Pull Request resolved: https://github.com/pytorch/pytorch/pull/127939 Approved by: https://github.com/albanD	2024-07-12 20:38:43 +00:00
rzou	9c69684af8	[custom_ops] expose torch.library.register_torch_dispatch (#130261 ) This is the API for defining the interaction between a torch_dispatch class and a custom op. Taking API bikeshedding. Test Plan: - new tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/130261 Approved by: https://github.com/albanD ghstack dependencies: #130064	2024-07-12 14:13:01 +00:00
Shangdi Yu	fb9bc6d74a	[custom op] add doc for CustomOpDef.set_kernel_enabled (#130406 ) <img width="1067" alt="Screenshot 2024-07-09 at 6 14 55 PM" src="https://github.com/pytorch/pytorch/assets/22356083/941751f8-8e12-43cb-8477-c739476e0096"> <img width="965" alt="Screenshot 2024-07-09 at 6 14 59 PM" src="https://github.com/pytorch/pytorch/assets/22356083/aa9be099-f26c-45a3-8a14-742a2bb7c28b"> Pull Request resolved: https://github.com/pytorch/pytorch/pull/130406 Approved by: https://github.com/zou3519	2024-07-11 15:47:35 +00:00
Shangdi Yu	a4576dad34	[reland][custom ops] infer schema (#130079 ) Fixes #129617 Pull Request resolved: https://github.com/pytorch/pytorch/pull/130079 Approved by: https://github.com/zou3519	2024-07-11 03:39:07 +00:00
PyTorch MergeBot	86bca69c5f	Revert "[custom_ops] expose torch.library.register_torch_dispatch (#130261 )" This reverts commit `bb9a73f767`. Reverted https://github.com/pytorch/pytorch/pull/130261 on behalf of https://github.com/izaitsevfb due to depends on #130064 which needs to be reverted ([comment](https://github.com/pytorch/pytorch/pull/130261#issuecomment-2221569707))	2024-07-10 21:43:28 +00:00
PyTorch MergeBot	e14a0f45ed	Revert "[reland][custom ops] infer schema (#130079 )" This reverts commit `bef085bdfa`. Reverted https://github.com/pytorch/pytorch/pull/130079 on behalf of https://github.com/izaitsevfb due to depends on #130064 which needs to be reverted ([comment](https://github.com/pytorch/pytorch/pull/130079#issuecomment-2221561483))	2024-07-10 21:40:16 +00:00
Shangdi Yu	bef085bdfa	[reland][custom ops] infer schema (#130079 ) Fixes #129617 Pull Request resolved: https://github.com/pytorch/pytorch/pull/130079 Approved by: https://github.com/zou3519	2024-07-10 16:18:36 +00:00
rzou	bb9a73f767	[custom_ops] expose torch.library.register_torch_dispatch (#130261 ) This is the API for defining the interaction between a torch_dispatch class and a custom op. Taking API bikeshedding. Test Plan: - new tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/130261 Approved by: https://github.com/albanD ghstack dependencies: #130064	2024-07-09 21:11:27 +00:00
Yuanhao Ji	312652c325	[RFC] Add support for device extension autoloading (#127074 ) Fixes #122468 - Load device extensions at the end of `torch/__init__.py` - Enabled by default, or you can disable it with `TORCH_DEVICE_BACKEND_AUTOLOAD=0` run test: ```python python test/run_test.py -i test_autoload_enable python test/run_test.py -i test_autoload_disable ``` doc: https://docs-preview.pytorch.org/pytorch/pytorch/127074/miscellaneous_environment_variables.html co-author: @jgong5 @bsochack @bkowalskiINTEL @jczaja @FFFrog @hipudding Co-authored-by: albanD <desmaison.alban@gmail.com> Co-authored-by: Jiong Gong <jiong.gong@intel.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/127074 Approved by: https://github.com/albanD, https://github.com/jgong5	2024-07-09 06:14:13 +00:00
PyTorch MergeBot	44a773c121	Revert "[custom ops] infer schema (#130079 )" This reverts commit `3fe324ffb6`. Reverted https://github.com/pytorch/pytorch/pull/130079 on behalf of https://github.com/huydhn due to The test_public_bindings failure looks legit `3fe324ffb6` ([comment](https://github.com/pytorch/pytorch/pull/130079#issuecomment-2215420957))	2024-07-08 22:02:29 +00:00
Shangdi Yu	3fe324ffb6	[custom ops] infer schema (#130079 ) Fixes #129617 Pull Request resolved: https://github.com/pytorch/pytorch/pull/130079 Approved by: https://github.com/zou3519	2024-07-08 20:46:23 +00:00
Kurt Mohler	e590168865	Enable sharing meta tensors between processes (#129520 ) Fixes #129436 Pull Request resolved: https://github.com/pytorch/pytorch/pull/129520 Approved by: https://github.com/ezyang	2024-07-04 20:29:48 +00:00
Li-Huai (Allan) Lin	42f3d7e948	[MPS] Add mps profiler env vars to docs (#129552 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/129552 Approved by: https://github.com/malfet ghstack dependencies: #129451	2024-07-04 06:44:48 +00:00
Zhengxu Chen	042d764872	[export] Update example inputs format for DB. (#129982 ) Summary: To give user a simpler example code, we are getting rid of ExportArgs in favor of example_args and example_kwargs. Test Plan: CI Differential Revision: D59288920 Pull Request resolved: https://github.com/pytorch/pytorch/pull/129982 Approved by: https://github.com/angelayi	2024-07-03 17:53:15 +00:00
Edward Z. Yang	29c68df600	Stop immediately specializing common constants 0/1 for plain int (#128327 ) Fixes https://github.com/pytorch/pytorch/issues/128319 Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/128327 Approved by: https://github.com/lezcano ghstack dependencies: #129983	2024-07-03 16:41:51 +00:00
Howard Huang	4eb449f7dc	[pipelining] add small logging section to docs (#129368 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/129368 Approved by: https://github.com/wconstab	2024-07-02 18:19:28 +00:00
Haoci Zhang	1ad683033b	Implemented flexible PP schedule (#129597 ) Enabled some cases to work where num_microbatches % pp_size != 0. Using the flex_pp schedule, we will have num_rounds = max(1, n_microbatches // pp_group_size) and it works as long as n_microbatches % num_rounds is 0. As a few examples, support pp_group_size = 4, n_microbatches = 10. We will have num_rounds = 2 and n_microbatches % 2 is 0. pp_group_size = 4, n_microbatches = 3. We will have num_rounds = 1 and n_microbatches % 1 is 0. Moved over from PiPPy (https://github.com/pytorch/PiPPy/pull/1129) Tested using the config in (1), schedule looks like the following graph: ``` =========== ALL_RANK_ACTIONS =========== Rank 0 Rank 1 Rank 2 Rank 3 Step 00: F0_s0 None None None Step 01: F1_s0 F0_s1 None None Step 02: F2_s0 F1_s1 F0_s2 None Step 03: F3_s0 F2_s1 F1_s2 F0_s3 Step 04: F4_s0 F3_s1 F2_s2 F1_s3 Step 05: F0_s4 F4_s1 F3_s2 F2_s3 Step 06: F1_s4 F0_s5 F4_s2 F3_s3 Step 07: F2_s4 F1_s5 F0_s6 F4_s3 Step 08: F3_s4 F2_s5 F1_s6 F0_s7 Step 09: F4_s4 F3_s5 None B0_s7 Step 10: F5_s0 None F2_s6 F1_s7 Step 11: None None B0_s6 B1_s7 Step 12: None F4_s5 F3_s6 F2_s7 Step 13: None B0_s5 B1_s6 B2_s7 Step 14: F6_s0 F5_s1 F4_s6 F3_s7 Step 15: B0_s4 B1_s5 B2_s6 B3_s7 Step 16: F7_s0 F6_s1 F5_s2 F4_s7 Step 17: B1_s4 B2_s5 B3_s6 B4_s7 Step 18: F8_s0 F7_s1 F6_s2 F5_s3 Step 19: B2_s4 B3_s5 B4_s6 B0_s3 Step 20: F9_s0 F8_s1 F7_s2 F6_s3 Step 21: B3_s4 B4_s5 B0_s2 B1_s3 Step 22: F5_s4 F9_s1 F8_s2 F7_s3 Step 23: B4_s4 B0_s1 B1_s2 B2_s3 Step 24: F6_s4 F5_s5 F9_s2 F8_s3 Step 25: B0_s0 B1_s1 B2_s2 B3_s3 Step 26: F7_s4 F6_s5 F5_s6 F9_s3 Step 27: B1_s0 B2_s1 B3_s2 B4_s3 Step 28: F8_s4 F7_s5 F6_s6 F5_s7 Step 29: B2_s0 B3_s1 B4_s2 B5_s7 Step 30: F9_s4 F8_s5 F7_s6 F6_s7 Step 31: B3_s0 B4_s1 B5_s6 B6_s7 Step 32: None F9_s5 F8_s6 F7_s7 Step 33: B4_s0 B5_s5 B6_s6 B7_s7 Step 34: None None F9_s6 F8_s7 Step 35: B5_s4 B6_s5 B7_s6 B8_s7 Step 36: None None None F9_s7 Step 37: B6_s4 B7_s5 B8_s6 B9_s7 Step 38: None None None None Step 39: B7_s4 B8_s5 B9_s6 B5_s3 Step 40: None None None None Step 41: B8_s4 B9_s5 B5_s2 B6_s3 Step 42: None None None None Step 43: B9_s4 B5_s1 B6_s2 B7_s3 Step 44: None None None None Step 45: B5_s0 B6_s1 B7_s2 B8_s3 Step 46: None None None None Step 47: B6_s0 B7_s1 B8_s2 B9_s3 Step 48: None None None Step 49: B7_s0 B8_s1 B9_s2 Step 50: None None Step 51: B8_s0 B9_s1 Step 52: None ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/129597 Approved by: https://github.com/H-Huang	2024-07-02 07:54:38 +00:00
eqy	f845a7a91a	[cuDNN][SDPA] Remove `TORCH_CUDNN_SDPA_ENABLED=1`, enable cuDNN SDPA by default on H100 and 2nd on other archs >= sm80 (#125343 ) Looks like one of the first failures seen is `test_causal_variants_compile_causal_variant_CausalVariant_LOWER_RIGHT_shape0_cuda` when `test_causal_variants_causal_variant_CausalVariant_LOWER_RIGHT_shape0_cuda` passes. What seems interesting here is that the `torch.compile` version fails while the eager version passes. Not sure what the difference would be here... Nevertheless, is there a recommended mechanism to skip cuDNN SDPA as a backend for this test? CC @drisspg Pull Request resolved: https://github.com/pytorch/pytorch/pull/125343 Approved by: https://github.com/Skylion007	2024-06-30 19:22:16 +00:00
PyTorch MergeBot	3d96217891	Revert "[BE][Easy] use `pathlib.Path` instead of `dirname` / `".."` / `pardir` (#129374 )" This reverts commit `9e1f3ecaa7`. Reverted https://github.com/pytorch/pytorch/pull/129374 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is still failing with the same error ([comment](https://github.com/pytorch/pytorch/pull/129374#issuecomment-2197801405))	2024-06-29 00:47:15 +00:00
PyTorch MergeBot	999eec8dea	Revert "[cuDNN][SDPA] Remove `TORCH_CUDNN_SDPA_ENABLED=1`, enable cuDNN SDPA by default on H100 and 2nd on other archs >= sm80 (#125343 )" This reverts commit `b7e7a4cb01`. Reverted https://github.com/pytorch/pytorch/pull/125343 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it seems to break some test_transformer running on internal A100 and V100 ([comment](https://github.com/pytorch/pytorch/pull/125343#issuecomment-2196202003))	2024-06-28 06:03:54 +00:00
Xuehai Pan	9e1f3ecaa7	[BE][Easy] use `pathlib.Path` instead of `dirname` / `".."` / `pardir` (#129374 ) Changes by apply order: 1. Replace all `".."` and `os.pardir` usage with `os.path.dirname(...)`. 2. Replace nested `os.path.dirname(os.path.dirname(...))` call with `str(Path(...).parent.parent)`. 3. Reorder `.absolute()` ~/ `.resolve()`~ and `.parent`: always resolve the path first. `.parent{...}.absolute()` -> `.absolute().parent{...}` 4. Replace chained `.parent x N` with `.parents[${N - 1}]`: the code is easier to read (see 5.) `.parent.parent.parent.parent` -> `.parents[3]` 5. ~Replace `.parents[${N - 1}]` with `.parents[${N} - 1]`: the code is easier to read and does not introduce any runtime overhead.~ ~`.parents[3]` -> `.parents[4 - 1]`~ 6. ~Replace `.parents[2 - 1]` with `.parent.parent`: because the code is shorter and easier to read.~ Pull Request resolved: https://github.com/pytorch/pytorch/pull/129374 Approved by: https://github.com/justinchuby, https://github.com/malfet	2024-06-28 00:35:15 +00:00
Li-Huai (Allan) Lin	84ad5452f6	[MPS] Fused SGD optimizer (#129350 ) ``` [-------------------------------------- Fused SGD --------------------------------------] \| Fused: True \| Fused: False 1 threads: ------------------------------------------------------------------------------ numel: 1024, num_tensors: 100, momentum: True \| 2 \| 15 numel: 1024, num_tensors: 100, momentum: False \| 2 \| 5 numel: 65536, num_tensors: 100, momentum: True \| 3 \| 16 numel: 65536, num_tensors: 100, momentum: False \| 2 \| 5 numel: 1048576, num_tensors: 100, momentum: True \| 11 \| 16 numel: 1048576, num_tensors: 100, momentum: False \| 8 \| 6 numel: 1024, num_tensors: 500, momentum: True \| 29 \| 70 numel: 1024, num_tensors: 500, momentum: False \| 20 \| 24 numel: 65536, num_tensors: 500, momentum: True \| 33 \| 76 numel: 65536, num_tensors: 500, momentum: False \| 22 \| 26 numel: 1048576, num_tensors: 500, momentum: True \| 70 \| 80 numel: 1048576, num_tensors: 500, momentum: False \| 43 \| 40 numel: 1024, num_tensors: 1000, momentum: True \| 108 \| 139 numel: 1024, num_tensors: 1000, momentum: False \| 72 \| 48 numel: 65536, num_tensors: 1000, momentum: True \| 116 \| 150 numel: 65536, num_tensors: 1000, momentum: False \| 77 \| 52 numel: 1048576, num_tensors: 1000, momentum: True \| 190 \| 170 numel: 1048576, num_tensors: 1000, momentum: False \| 120 \| 50 ``` ```python def profile_fused_sgd(): from torch.optim.sgd import sgd import torch.utils.benchmark as benchmark import itertools def profile(fn, params, grads, momentum_buffer_list, fused): fn( params, grads, momentum_buffer_list, momentum=True if len(momentum_buffer_list) > 0 else False, dampening=0.0, nesterov=False, foreach=False, fused=fused, lr=1e-3, weight_decay=.0, maximize=False, grad_scale=None, found_inf=None, ) torch.mps.synchronize() device = "mps" results = [] for num_tensors, numel, momentum in itertools.product([100, 500, 1000], [1024, 65536, 1048576], [True, False]): sublabel = f"numel: {numel}, num_tensors: {num_tensors}, momentum: {momentum}" print(sublabel) params, grads = [[torch.arange(numel, dtype=torch.float32, device=device) + (numel * i) for i in range(num_tensors)] for _ in range(2)] momentum_buffer_list = [torch.arange(numel, dtype=torch.float32, device=device) + (numel * i) for i in range(num_tensors)] if momentum else [] fn = sgd for fused in [True, False]: t = benchmark.Timer( stmt='profile(fn, params, grads, momentum_buffer_list, fused)', label='Fused SGD', sub_label=sublabel, globals=locals(), description= f"Fused: {fused}", ).blocked_autorange(min_run_time=5) results.append(t) compare = benchmark.Compare(results) compare.trim_significant_figures() compare.colorize(rowwise=True) compare.print() ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/129350 Approved by: https://github.com/janeyx99 ghstack dependencies: #129006, #129008, #129007, #129105	2024-06-27 04:37:14 +00:00
PyTorch MergeBot	895316119d	Revert "[BE][Easy] use `pathlib.Path` instead of `dirname` / `".."` / `pardir` (#129374 )" This reverts commit `0314c4c101`. Reverted https://github.com/pytorch/pytorch/pull/129374 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it causes lots of internal build failures where they fail to find hipify module ([comment](https://github.com/pytorch/pytorch/pull/129374#issuecomment-2192437052))	2024-06-26 19:03:57 +00:00
Shangdi Yu	cca85c96cd	[export] minor typo fix (#129543 ) Fixes a typo in torch.export doc. Pull Request resolved: https://github.com/pytorch/pytorch/pull/129543 Approved by: https://github.com/angelayi	2024-06-26 18:35:31 +00:00
Eddie Yan	b7e7a4cb01	[cuDNN][SDPA] Remove `TORCH_CUDNN_SDPA_ENABLED=1`, enable cuDNN SDPA by default on H100 and 2nd on other archs >= sm80 (#125343 ) Looks like one of the first failures seen is `test_causal_variants_compile_causal_variant_CausalVariant_LOWER_RIGHT_shape0_cuda` when `test_causal_variants_causal_variant_CausalVariant_LOWER_RIGHT_shape0_cuda` passes. What seems interesting here is that the `torch.compile` version fails while the eager version passes. Not sure what the difference would be here... Nevertheless, is there a recommended mechanism to skip cuDNN SDPA as a backend for this test? CC @drisspg Pull Request resolved: https://github.com/pytorch/pytorch/pull/125343 Approved by: https://github.com/Skylion007	2024-06-26 00:49:18 +00:00
Zhengxu Chen	e58ef5b65f	[export] Rewrite exportdb formatting. (#129260 ) Summary: It'll be easier to generate examples if the code doesn't depend on exportdb library. Test Plan: CI Differential Revision: D58886554 Pull Request resolved: https://github.com/pytorch/pytorch/pull/129260 Approved by: https://github.com/tugsbayasgalan	2024-06-25 21:04:53 +00:00
Li-Huai (Allan) Lin	71ebe5121a	[MPS] Fast math env var (#129007 ) Allow users to decide whether they want to have fast math enabled via env var Pull Request resolved: https://github.com/pytorch/pytorch/pull/129007 Approved by: https://github.com/malfet ghstack dependencies: #129006, #129008	2024-06-25 13:52:07 +00:00
Xuehai Pan	0314c4c101	[BE][Easy] use `pathlib.Path` instead of `dirname` / `".."` / `pardir` (#129374 ) Changes by apply order: 1. Replace all `".."` and `os.pardir` usage with `os.path.dirname(...)`. 2. Replace nested `os.path.dirname(os.path.dirname(...))` call with `str(Path(...).parent.parent)`. 3. Reorder `.absolute()` ~/ `.resolve()`~ and `.parent`: always resolve the path first. `.parent{...}.absolute()` -> `.absolute().parent{...}` 4. Replace chained `.parent x N` with `.parents[${N - 1}]`: the code is easier to read (see 5.) `.parent.parent.parent.parent` -> `.parents[3]` 5. ~Replace `.parents[${N - 1}]` with `.parents[${N} - 1]`: the code is easier to read and does not introduce any runtime overhead.~ ~`.parents[3]` -> `.parents[4 - 1]`~ 6. ~Replace `.parents[2 - 1]` with `.parent.parent`: because the code is shorter and easier to read.~ Pull Request resolved: https://github.com/pytorch/pytorch/pull/129374 Approved by: https://github.com/justinchuby, https://github.com/malfet	2024-06-25 08:28:38 +00:00
Will Constable	2f8b301c32	Clean up distributed/CONTRIBUTING.md (#128450 ) Click [here](`cf6c88af48/torch/distributed/CONTRIBUTING.md`) to see the rendered version of the file in this PR Pull Request resolved: https://github.com/pytorch/pytorch/pull/128450 Approved by: https://github.com/wanchaol	2024-06-22 02:41:22 +00:00
rzou	311fadb1fb	[docs] Redirect custom ops landing page to the correct place (#129177 ) I'm moving it to pytorch/tutorials Pull Request resolved: https://github.com/pytorch/pytorch/pull/129177 Approved by: https://github.com/albanD	2024-06-21 13:31:32 +00:00

1 2 3 4 5 ...

2616 Commits