pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 00:21:07 +01:00

Author	SHA1	Message	Date
William Wen	be172d2a60	[pt2, docs] Add new PT2 troubleshooting doc (#138620 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/138620 Approved by: https://github.com/ezyang Co-authored-by: Svetlana Karslioglu <svekars@meta.com>	2024-11-09 01:17:39 +00:00
Bin Bao	63a0d6587e	[AOTI] Update the OSS tutorial (#139956 ) Summary: Update the OSS tutorial to use the new aoti_compile_and_package and aoti_load_package APIs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/139956 Approved by: https://github.com/angelayi ghstack dependencies: #139955	2024-11-08 20:46:57 +00:00
Jerry Zhang	1fcc99c6bf	Update quantization.rst (#139824 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/139824 Approved by: https://github.com/svekars	2024-11-08 02:34:50 +00:00
John MacCormick	81d077cca2	Fix to modules.rst: indent line with activation functions (#139667 ) At line 205, I believe the code `x = self.activations[act](x)` should be indented so that it is in the body of the for loop. Otherwise, applying the four linear modules has the same effect as applying a single linear module, in the sense that it is still just a linear map so there is no point in having four of them. In other words, each layer of this network should have a nonlinearity. Pull Request resolved: https://github.com/pytorch/pytorch/pull/139667 Approved by: https://github.com/malfet	2024-11-08 01:12:52 +00:00
Tongzhou Wang	22dd17c7bb	[doc] fixing missing colon in custom op doc (#140060 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/140060 Approved by: https://github.com/malfet	2024-11-07 23:48:44 +00:00
Mikayla Gawarecki	2ee91db03d	Add APIs to separate norm calculation and gradient scaling in `nn.utils.clip_grad_norm_` (#139662 ) Fixes https://github.com/pytorch/pytorch/issues/139467 Refactor `nn.utils.clip_grad_norm_` into `nn.utils.get_total_norm` and then `nn.utils.clip_grads_with_norm_` . `clip_grad_norm_` now calls into these two new ops, `get_total_norm` is generalized (rather than `get_grad_norm` due to the discussion on the issue from @awgu) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139662 Approved by: https://github.com/H-Huang	2024-11-07 23:13:23 +00:00
Shangdi Yu	83e36a6bfa	AOTI Minifier (#139351 ) See documentation at https://docs-preview.pytorch.org/pytorch/pytorch/139351/torch.compiler_aot_inductor_minifier.html. Add a minifier for AOTI. Test Plan: python test/inductor/test_minifier.py Pull Request resolved: https://github.com/pytorch/pytorch/pull/139351 Approved by: https://github.com/desertfire	2024-11-07 21:43:44 +00:00
Tom Fogal	b5286ba207	Small fix to Python rendering in documentation. (#138281 ) The text was being rendered as normal text but I believe was meant to be code. Pull Request resolved: https://github.com/pytorch/pytorch/pull/138281 Approved by: https://github.com/janeyx99	2024-11-07 20:48:47 +00:00
Will Constable	2b400236c2	[DCP] Cross-link DCP doc to tutorials (#139776 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139776 Approved by: https://github.com/mhorowitz, https://github.com/LucasLLC, https://github.com/fduwjj ghstack dependencies: #139938	2024-11-07 02:19:49 +00:00
Jay Zhang	99deedff57	[ONNX] Describe memory usage of TorchDynamo-based exporter. (#139388 ) Add new documentation to show one memory usage benefit brought by the TorchDynamo-based ONNX exporter. Also add a unit test to make sure the TorchDynamo-based ONNX exporter works well under FakeTensorMode. Pull Request resolved: https://github.com/pytorch/pytorch/pull/139388 Approved by: https://github.com/xadupre	2024-11-06 17:29:11 +00:00
Tongzhou Wang	faab564bda	[doc] Fix grammar in export.ir_spec.rst (#139584 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139584 Approved by: https://github.com/zou3519	2024-11-05 23:26:36 +00:00
Ryan Guo	693a0a1bd4	[dynamo][NFC] Rename `mutable_local` and add documentation (#139339 ) This patch addresses the renaming part of #133027, specifically, it renames the following and adds documentation for relevant classes. 1. `VariableTracker.mutable_local` to `mutation_type` 2. `MutableLocal` to `ValueMutationNew` 3. `MutableSideEffects` to `ValueMutationExisting` 4. `MutableLocalSource` to `SourceType` 5. `MutableLocalSource.Local` to `New` Note that (2), (3) and (5) are mainly to bring consistency between them and `AttributeMutationNew`, `AttributeMutationExisting`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/139339 Approved by: https://github.com/jansel, https://github.com/mlazos, https://github.com/anijain2305	2024-11-05 19:11:41 +00:00
Henry Tsang	350bc2a166	[export] Add support for symbool to make it usable for torch.cond (#138765 ) # Why? I want the following code to work. minimal repro: ``` class M(torch.nn.Module): def forward(self, dilate_flag): return dilate_flag.item() input1 = (torch.tensor([1], dtype=torch.bool, device="cuda"),) model = M().cuda() ep = torch.export.export(model, input1, strict=True) path = torch._inductor.aot_compile(ep.module(), input1) aot_model = torch._export.aot_load(path, device="cuda") actual_output = aot_model(input1) ``` error: AssertionError: Encountered an unsupported object of type <class 'torch.SymBool'> while writing the metadata for exported program second error will be handled by https://github.com/pytorch/pytorch/pull/138760 # Motivation I could technically bypass it with a torch.int tensor. However, it doesn't work with torch.cond. I want the following to work. It would also require https://github.com/pytorch/pytorch/pull/138760 for aot compile to work. ``` class M(torch.nn.Module): def __init__(self) -> None: super().__init__() self.dilate_flag = 0 def forward(self, dilate_flag): self.dilate_flag = dilate_flag.item() def true_fn(dilate_flag): return dilate_flag.clone() def false_fn(dilate_flag): return dilate_flag.clone() torch.cond( self.dilate_flag, true_fn, false_fn, (dilate_flag,), ) return self.dilate_flag input1 = (torch.tensor([1], dtype=torch.bool, device="cuda"),) input2 = (torch.tensor([0], dtype=torch.bool, device="cuda"),) inputs = (input1, input2) model = M().cuda() for input in inputs: expected_output = model(input) ep = torch.export.export(model, input, strict=False) path = torch._inductor.aot_compile(ep.module(), input) aot_model = torch._export.aot_load(path, device="cuda") actual_output = aot_model(*input) assert ( expected_output == actual_output ), f"henry they are not equal {expected_output} != {actual_output}" ``` Differential Revision: D64867504 Pull Request resolved: https://github.com/pytorch/pytorch/pull/138765 Approved by: https://github.com/ydwu4	2024-11-04 23:31:49 +00:00
Jane Xu	514c466cd9	Redirect the custom ops landing page :D (#139634 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139634 Approved by: https://github.com/zou3519	2024-11-04 22:25:15 +00:00
Will Constable	3d93caf664	[c10d] Add thread-safety initialization warning (#139638 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139638 Approved by: https://github.com/kwen2501, https://github.com/c-p-i-o, https://github.com/XilunWu	2024-11-04 21:38:47 +00:00
Edward Z. Yang	585dbfa583	Profile guided optimization for automatic_dynamic (#139001 ) Previously: https://github.com/pytorch/pytorch/pull/138052 but the implementation is done from scratch, so I open a new PR. This implements the ability to save and load profiles of automatic dynamic decisions, so on subsequent runs we can directly make something automatically dynamic. Unlike the previous implementation, this cache is never enabled by default; instead, you have to specify a "job id" that says it's OK to share results. We will be able to automatically populate this id for internal MAST jobs but for generic OSS users you will have to explicitly opt into it. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/139001 Approved by: https://github.com/oulgen	2024-11-03 06:29:57 +00:00
PyTorch MergeBot	92d7f29e59	Revert "Profile guided optimization for automatic_dynamic (#139001 )" This reverts commit `f6be44c74e`. Reverted https://github.com/pytorch/pytorch/pull/139001 on behalf of https://github.com/ezyang due to more fbcode errors ([comment](https://github.com/pytorch/pytorch/pull/139001#issuecomment-2452985581))	2024-11-02 13:11:04 +00:00
Edward Z. Yang	f6be44c74e	Profile guided optimization for automatic_dynamic (#139001 ) Previously: https://github.com/pytorch/pytorch/pull/138052 but the implementation is done from scratch, so I open a new PR. This implements the ability to save and load profiles of automatic dynamic decisions, so on subsequent runs we can directly make something automatically dynamic. Unlike the previous implementation, this cache is never enabled by default; instead, you have to specify a "job id" that says it's OK to share results. We will be able to automatically populate this id for internal MAST jobs but for generic OSS users you will have to explicitly opt into it. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Differential Revision: [D65065497](https://our.internmc.facebook.com/intern/diff/D65065497) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139001 Approved by: https://github.com/oulgen	2024-11-02 11:50:11 +00:00
PyTorch MergeBot	8d1eaa3da6	Revert "Profile guided optimization for automatic_dynamic (#139001 )" This reverts commit `a6630bcf87`. Reverted https://github.com/pytorch/pytorch/pull/139001 on behalf of https://github.com/ezyang due to internal code triggers import cycle ([comment](https://github.com/pytorch/pytorch/pull/139001#issuecomment-2452833882))	2024-11-02 03:38:15 +00:00
Mikayla Gawarecki	a979318ef7	Add section to serialization note re weights_only (#139433 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139433 Approved by: https://github.com/malfet ghstack dependencies: #138936, #139221	2024-11-01 21:51:50 +00:00
Edward Z. Yang	a6630bcf87	Profile guided optimization for automatic_dynamic (#139001 ) Previously: https://github.com/pytorch/pytorch/pull/138052 but the implementation is done from scratch, so I open a new PR. This implements the ability to save and load profiles of automatic dynamic decisions, so on subsequent runs we can directly make something automatically dynamic. Unlike the previous implementation, this cache is never enabled by default; instead, you have to specify a "job id" that says it's OK to share results. We will be able to automatically populate this id for internal MAST jobs but for generic OSS users you will have to explicitly opt into it. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Differential Revision: [D65065497](https://our.internmc.facebook.com/intern/diff/D65065497) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139001 Approved by: https://github.com/oulgen	2024-11-01 21:43:25 +00:00
Mikayla Gawarecki	ea0e09b3f3	Add utility to get all unsafe globals in checkpoint (no pickletools dependency) (#139221 ) Fixes https://github.com/pytorch/pytorch/issues/129698 https://github.com/pytorch/pytorch/pull/139106 without pickletools Pull Request resolved: https://github.com/pytorch/pytorch/pull/139221 Approved by: https://github.com/malfet ghstack dependencies: #138936	2024-11-01 19:31:39 +00:00
bskrlj	8e27833e30	Ensure SWA boundary conditions w.r.t. definition (#133773 ) According to the documentation, decay is a number in [0,1] range,[ i.e.](https://pytorch.org/docs/stable/optim.html) ``` Decay is a parameter between 0 and 1 that controls how fast the averaged parameters are decayed. If not provided to get_ema_multi_avg_fn, the default is 0.999. ``` An inspection of `swa_utils.py` indicates there are no checks for invalid values of `decay`. Adding asserts as suggested in this PR ensures valid compute range (one way to enforce correct behavior, there are perhaps more suitable ones). Papers `torch` cites for reference idea/implementation also consider exclusively this range (e.g., https://arxiv.org/pdf/2310.04415). Fixes https://github.com/pytorch/pytorch/issues/133772 Pull Request resolved: https://github.com/pytorch/pytorch/pull/133773 Approved by: https://github.com/janeyx99	2024-10-31 18:24:08 +00:00
Nhat Minh Luu	261d90c18f	Add docs page for `torch.inf` and `torch.nan` (#138430 ) Fixes #131040 ## Description Add docs for `torch.inf` and `torch.nan`, ## Checklist - [x] The issue that is being fixed is referred in the description (see above "Fixes #ISSUE_NUMBER") - [x] Only one issue is addressed in this pull request - [x] Labels from the issue that this PR is fixing are added to this pull request - [x] No unnecessary issues are included into this pull request. Pull Request resolved: https://github.com/pytorch/pytorch/pull/138430 Approved by: https://github.com/ezyang	2024-10-31 05:46:46 +00:00
Boyuan Feng	68134a320e	[Flex Attention] Paged Attention (#137164 ) This PR adds paged attention for flex attention. Pull Request resolved: https://github.com/pytorch/pytorch/pull/137164 Approved by: https://github.com/drisspg	2024-10-29 17:05:22 +00:00
Jeff Daily	7c7b2d89ba	[ROCm] set hipblas workspace (#138791 ) Fixes #138532. This brings hipblas behavior in line with cublas behavior with respect to setting the workspace to an allocation from the caching allocator as well as the env var HIPBLAS_WORKSPACE_CONFIG. Pull Request resolved: https://github.com/pytorch/pytorch/pull/138791 Approved by: https://github.com/naromero77amd, https://github.com/eqy, https://github.com/malfet Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>	2024-10-29 01:37:55 +00:00
Svetlana Karslioglu	e00ead400c	Add a temporary Survey about the search (#139096 ) - Add a link to the new search survey - Add .css classes needed for the search banner Pull Request resolved: https://github.com/pytorch/pytorch/pull/139096 Approved by: https://github.com/seemethere, https://github.com/cjyabraham	2024-10-28 23:43:25 +00:00
Joel Schlosser	8ba9063002	FlexAttention support for NJT (#136792 ) This PR adds FlexAttention + NJT support. In particular: * To handle raggedness, treats the packed sequence dim of input NJTs as a giant "stacked sequence". To ensure user `score_mod` / `mask_mod` functions can still be written in the original NJT sequence space, this PR handles conversions for indices within the giant "stacked sequence" -> sequence relative indices automatically. * Provides `py_impls` for `NestedTensor` to the HOPs for flex attention forward / backward that simply wrap / unwrap NJTs appropriately * Adds barebones `new_empty()` support to NJT since FlexAttention utilizes this repeatedly; right now, only `new_empty()` with a shape of `()` is supported * Tests that FlexAttention with a causal mask matches causal SDPA * Adds a new public API for FlexAttention usage: * `create_nested_block_mask(mask_mod, B, H, njt, BLOCK_SIZE, _compile)` - NJT analogue for `create_block_mask()` that utilizes the `njt`'s ragged structure to create an appropriately-sized block mask (e.g. `(1, 1, total_seqlen, total_seqlen)`). This function handles the index conversion from "stacked sequence" space -> relative sequence space. * Minor note: as this is a public API, this function is purposefully named with "nested" instead of "njt" to keep the latter as an informal, mostly internal-only term. Example usage: ```python def causal_mask(b, h, q_idx, kv_idx): return q_idx >= kv_idx query = ... # NJT of shape (B, H, S, D) key = ... # NJT of shape (B, H, S, D) value = ... # NJT of shape (B, H, S, D) # create_nested_block_mask() automatically converts indices from "stacked sequence" space -> relative sequence space block_mask = create_nested_block_mask(causal_mask, 1, 1, query) # block mask conceptual shape is (B, H, sum(S), sum(S)) output = flex_attention(query, key, value, block_mask=block_mask) def causal_score_mod(score, b, h, q_idx, kv_idx): return torch.where(q_idx >= kv_idx, score, float("-inf")) # flex_attention() automatically converts indices from "stacked sequence" space -> relative sequence space for NJT inputs output2 = flex_attention(query, key, value, score_mod=causal_score_mod) ``` TODO: ~~Determine the right level of abstraction for public API helpers + move them alongside other helpers~~ Verify this with others though * ~~Some cleanup~~ * ~~`njt_score_mod_adapter`~~ * ~~Q: should `create_njt_block_mask()` call `njt_mask_mod_adapter()` so we don't need two calls?~~ * Can we avoid materializing the `sum(s)` length `seq_idx` used for conversion between stacked sequence -> sequence relative indices? * Not for now, although future work may deepen the integration between Flex + NJT (possibly requiring custom templates). We should try to cache this though. * ~~Demonstrate non-causal mask~~ * Support non-contiguous NJTs with holes (booted to future PR) Pull Request resolved: https://github.com/pytorch/pytorch/pull/136792 Approved by: https://github.com/drisspg ghstack dependencies: #138841	2024-10-28 20:01:27 +00:00
Wouter Devriendt	bae3426af7	reimport pr137735 due to merging check issues (#138959 ) This is a cherry-pick from #137735 by @mikaylagawarecki , that cannot be merged due to a (wrongly) failing check for codev @diff-train-skip-merge Pull Request resolved: https://github.com/pytorch/pytorch/pull/138959 Approved by: https://github.com/mikaylagawarecki	2024-10-27 16:31:34 +00:00
Yu, Guangye	40c098f731	Introduce a device-agnostic runtime API design (#132204 ) # Motivation According to [[RFC]A device-agnostic Python runtime API design for stream-based accelerators](https://github.com/pytorch/pytorch/issues/128403), this PR intends to introduce a device-agnostic runtime API design. I personally prefer the Simple Version APIs that no longer accept the device type as an input argument. It means we will leverage `getAccelerator` to fetch the current accelerator. And it is flexible to expand these APIs to handle multiple types of accelerator scenarios. The design does NOT break the previous design philosophies. I also believe that namespace torch.accelerator is better. It lets users know that the APIs they are calling are running on an accelerator rather than CPU. This is important. Meanwhile, we can follow a simple API design principle: 1. Device-agnostic APIs should be placed under the torch.accelerator namespace and not accept a device_type optional parameter. 2. Device-specific APIs should be placed under device-specific submodules. 3. APIs required by both CPU and accelerators should be placed under the torch namespace and accept a device_type optional parameter. Also, I list the pros and cons of Simple Version here: Pros: - `torch.accelerator.foo` will have the same input argument as `torch.xxx.foo`, bringing a better user experience; - more concise, making it easier for developers to write device-agnostic code. Cons: - no obvious drawbacks. # Additional Context I list the new APIs here: ```python torch.accelerator.is_available() -> bool: torch.accelerator.current_accelerator() -> torch.device: torch.accelerator.device_count() -> int: torch.accelerator.current_device_idx() -> int: torch.accelerator.set_device_idx(device: Union[torch.device, str, int, None]) -> None: torch.accelerator.current_stream(device: Union[torch.device, str, int, None]) -> torch.Stream: torch.accelerator.set_stream(stream: torch.Stream) -> None: torch.accelerator.synchronize(device: Union[torch.device, str, int, None]) -> None: ``` According to the discussion with Alban, we decided to rename `set_device` to `set_device_idx` and `current_device` to `current_device_idx` to be more explicit. A separate PR will add support for device and stream context managers. Pull Request resolved: https://github.com/pytorch/pytorch/pull/132204 Approved by: https://github.com/EikanWang, https://github.com/abhilash1910, https://github.com/gujinghui, https://github.com/albanD	2024-10-27 10:37:09 +00:00
Laith Sakka	ed313a5ca2	Introduce torch.sym_add, variadic add (#138660 ) Tested internally here: https://www.internalfb.com/diff/D64057744 This is a reland after previous internal failures. main change is ``` if min is None and max is None: torch._check_is_size(size) return ``` Partially addresses https://github.com/pytorch/pytorch/issues/128150 When you have big sums of values, we end up computing long chains of binary addition in our FX graph representation. Not only is this ugly, it also is quadratic, as the sympy.Add constructor is O(N) in number of arguments. Instead, ensure that we maintain the summation as a single FX node so we can do the entire addition all in one go. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/138660 Approved by: https://github.com/ezyang, https://github.com/bobrenjc93	2024-10-23 17:42:41 +00:00
Laith Sakka	662d07e93e	Remove parallel_and and parallel_or (#138135 ) Not used, suggested by @ezyang Pull Request resolved: https://github.com/pytorch/pytorch/pull/138135 Approved by: https://github.com/ezyang	2024-10-23 00:22:22 +00:00
Nikita Shulga	d1be61ce4e	Update copyrights to 2024 (#138638 ) Spiritual successor of https://github.com/pytorch/pytorch/pull/119413 + CPP docs copyright update as well Fixes https://github.com/pytorch/pytorch/issues/138630 Pull Request resolved: https://github.com/pytorch/pytorch/pull/138638 Approved by: https://github.com/atalman	2024-10-22 21:00:58 +00:00
Syed Tousif Ahmed	03c72976a5	Properly uses ref-counting for torch.cuda.use_mem_pool (#133600 ) This PR refactors some ref-counting functionality out of `beginAllocateToPool` and `releasePool`. The ref-counting logic is then used in construction and destruction of `torch.cuda.MemPool`. The `use_count` variable in the CUDACachingAllocator is essentially a refcount of how many context managers are using the pool. Since we are now lifting up the MemPool abstraction to the user, the MemPool object itself now needs to hold an extra reference as well. Part of https://github.com/pytorch/pytorch/issues/124807. Pull Request resolved: https://github.com/pytorch/pytorch/pull/133600 Approved by: https://github.com/eqy, https://github.com/ezyang	2024-10-22 03:21:53 +00:00
Mikayla Gawarecki	e24871eb3c	Add environment variable to force no weights_only load (#138225 ) In preparation for `weights_only` flip, if users don't have access to the `torch.load` call Pull Request resolved: https://github.com/pytorch/pytorch/pull/138225 Approved by: https://github.com/albanD	2024-10-21 23:26:15 +00:00
Justin Chu	c6609ece84	[ONNX] Remove deprecated export_to_pretty_string (#137790 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137790 Approved by: https://github.com/titaiwangms, https://github.com/xadupre ghstack dependencies: #137789	2024-10-21 18:17:48 +00:00
Tugsbayasgalan Manlaibaatar	1f32a1fb80	Replace torch.export default decomp table to be lazily populated (#137650 ) In this PR, we implement a lazy dictionary for export decomp behaviour for the following reasons: 1. Custom op loading can happen after import time, as a result, the decomp table might not be able to pick up the decomp. Therefore we try to delay materialization as late as possible. I intentionally separated out the core_aten_decomp to not have any custom CIA ops in this PR to mitigate the risk of getting reverted but in the future, core_aten_decomp under torch/_decomp will exist as an alias to the official export table (torch.export.default_decompositions) Differential Revision: [D64140807](https://our.internmc.facebook.com/intern/diff/D64140807) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137650 Approved by: https://github.com/justinchuby, https://github.com/bdhirsh	2024-10-18 19:28:52 +00:00
Svetlana Karslioglu	9c2a80322a	Add Programmable Google Search (#137716 ) - Adding the code for the programmable Google search - Adding the CSS overrides. Pull Request resolved: https://github.com/pytorch/pytorch/pull/137716 Approved by: https://github.com/seemethere, https://github.com/albanD Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2024-10-18 18:18:16 +00:00
ErezYosef	5a81475884	Documentation Update: Fix Missing Whitespace in Optimizer Docs (#138321 ) ### Description: This PR addresses a minor [formatting issue identified in a previous contribution to the Optimizer documentation](https://github.com/pytorch/pytorch/pull/134107#discussion_r1800833948). Specifically, it fixes the missing whitespace after `param_names` in the section on utilizing named parameters to load the optimizer state dict. You can find the related docs here: [Optimizer Documentation](https://pytorch.org/docs/main/optim.html#how-to-utilize-named-parameters-to-load-optimizer-state-dict). @janeyx99 Pull Request resolved: https://github.com/pytorch/pytorch/pull/138321 Approved by: https://github.com/janeyx99	2024-10-18 15:41:43 +00:00
Yu, Guangye	8cda774a03	Add torch.xpu.get_arch_list and torch.xpu.get_gencode_flags for XPU (#137773 ) # Motivation Add `torch.xpu.get_arch_list()` and `torch.xpu.get_gencode_flags()` methods that return architecture list and AOT flags to preserve what flags PyTorch XPU was built with. Pull Request resolved: https://github.com/pytorch/pytorch/pull/137773 Approved by: https://github.com/EikanWang, https://github.com/albanD	2024-10-18 02:28:08 +00:00
Zheng, Zhaoqiong	7ba706c74e	update get start xpu (#137479 ) 1. respect the comment from the community, downgrade the "Beta" to "Prototype" for the first xpu release with wheel 2. add wheels installation of torchaudio & torchvision for nightly on Windows Pull Request resolved: https://github.com/pytorch/pytorch/pull/137479 Approved by: https://github.com/atalman, https://github.com/malfet	2024-10-16 17:36:29 +00:00
PyTorch MergeBot	dd32a32cb6	Revert "Expose option to disable CRC-32 computation during `torch.save` (#137735 )" This reverts commit `534fa96f2d`. Reverted https://github.com/pytorch/pytorch/pull/137735 on behalf of https://github.com/clee2000 due to failing internally D64438525, probably needs gating ([comment](https://github.com/pytorch/pytorch/pull/137735#issuecomment-2417412264))	2024-10-16 17:03:06 +00:00
William Wen	4c8718d8e7	[dynamo] add torch.compiler.set_stance (#137504 ) Attempt # 2 at https://github.com/pytorch/pytorch/pull/132926 to implement https://github.com/pytorch/pytorch/issues/123771. Implement a new `torch.compiler.set_stance` function that can force `torch.compile` regions to run eagerly. See added tests for usage examples. Pull Request resolved: https://github.com/pytorch/pytorch/pull/137504 Approved by: https://github.com/yf225, https://github.com/jansel	2024-10-16 16:18:25 +00:00
Howard Huang	75109682b6	[Pipelining] Refactor Interleaved1F1B and ZeroBubble (#137783 ) NOTE: this PR removes `ScheduleFlexibleInterleaved1F1B`, let me know if there are any concerns. `ScheduleFlexibleInterleaved1F1B` is a superset of `Interleaved1F1B` and uses most of the same implementation, but relaxes the condition that `n_microbatches % pp_size == 0`. This refactors the implementation into `Interleaved1F1B` and then removes it since it is confusing to have both schedules with similar names. This also refactors the zero bubble logic to belong in the `ZeroBubble` schedule class. Pull Request resolved: https://github.com/pytorch/pytorch/pull/137783 Approved by: https://github.com/wconstab	2024-10-16 03:05:14 +00:00
Jane Xu	eaec72d1e6	Link directly to new Custom Ops Landing Page (#137933 ) e.g., click on first link in https://docs-preview.pytorch.org/pytorch/pytorch/137933/library.html#testing-custom-ops Pull Request resolved: https://github.com/pytorch/pytorch/pull/137933 Approved by: https://github.com/zou3519	2024-10-15 21:18:21 +00:00
Mikayla Gawarecki	534fa96f2d	Expose option to disable CRC-32 computation during `torch.save` (#137735 ) Option only works in open source, not internal Pull Request resolved: https://github.com/pytorch/pytorch/pull/137735 Approved by: https://github.com/albanD	2024-10-15 19:30:02 +00:00
PyTorch MergeBot	2831af39c4	Revert "[ONNX] Remove deprecated export_to_pretty_string (#137790 )" This reverts commit `d0628a7e39`. Reverted https://github.com/pytorch/pytorch/pull/137790 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/137789#issuecomment-2414632100))	2024-10-15 17:40:06 +00:00
Alex Baden	39d21ed803	[Inductor] Update AttrsDescriptor instantiation for Triton changes (#137458 ) The `AttrsDescriptor` class has been present in Triton for almost a year now (introduced [here](`72c9833927`)), so we should be able to rely on it existing. I am in the process of supporting the new `AttrsDescriptor` class and @jansel suggested I split changes to the existing class out separately to make sure nothing breaks removing the legacy attribute descriptor attributes. Initially I attempted to remove the branching around detecting whether `AttrsDescriptor` exists but that breaks because PyTorch must build without Triton. So, I went back and updated for the naming introduced in the commit linked above, and also removed two unused attributes `divisible_by_8` and `ids_to_fold` which were removed in Feb 2024 (https://github.com/triton-lang/triton/pull/3122 and https://github.com/triton-lang/triton/pull/3080 respectively). With these changes only the internal workings of the `AttrsDescriptor` class will differ between supported Triton versions, but the data stored will remain consistent. Pull Request resolved: https://github.com/pytorch/pytorch/pull/137458 Approved by: https://github.com/jansel	2024-10-14 20:20:29 +00:00
ErezYosef	197601eeea	Add Support for Tracking Parameter Names (named_parameters) in Optimizer State Dict (#134107 ) A proposal addressing Issue #1489: Optimizer should track parameter names and not id. (also mentioned in here: [[RFC] Introducing FQNs/clarity eyeglasses to optim state_dict](https://dev-discuss.pytorch.org/t/rfc-introducing-fqns-clarity-to-optim-state-dict/1552)) ## Summary This PR introduces a backward-compatible enhancement where optimizers track parameter names instead of just their id. Optimizers can be initialized with `named_parameters()` as: ```python optimizer = optim.SGD(model.named_parameters(), lr=0.01, momentum=0.9) ``` This allows for greater clarity and ease when handling optimizers, as the parameters' names are preserved within the optimizer’s `state_dict` as: ``` state_dict = { 'state': { 0: {'momentum_buffer': tensor(...), ...}, 1: {'momentum_buffer': tensor(...), ...}, }, 'param_groups': [ { 'lr': 0.01, 'weight_decay': 0, ... 'params': [0,1], 'param_names': ['layer.weight', 'layer.bias'] (optional) } ] } ``` Loading `state_dict` is not changed (backward-compatible) and the `param_names` key will be ignored. ## Key Features #### Named Parameters in Optimizer Initialization: Optimizers can accept the output of `model.named_parameters()` during initialization, allowing them to store parameter names directly. #### Parameter Names in `state_dict`: The parameter names are saved as a list in the optimizer’s `state_dict` with key `param_names`, alongside the `params` indices, ensuring seamless tracking of both names and parameters. ## Backward Compatibility #### No Breaking Changes: This change is fully backward-compatible. The added `param_names` key in the optimizer's `state_dict` is ignored when loading a state to the optimizer. #### Customization with Hooks: For more control, the loaded state_dict can be modified using a custom `register_load_state_dict_pre_hook`, providing flexibility for different design needs. ## Documentation Updates Please refer to the documentation changes for more details on how this feature is implemented and how it can be used effectively. ## Solution Example: A suggested solution to the problem mentioned in #1489, for the same parameters but in a different order. The following `register_load_state_dict_pre_hook` should be added to the optimizer before loading to enable loading the state dict : ```python def adapt_state_dict_ids(optimizer, state_dict): # assuming a single param group. current_state_group = optimizer.state_dict()['param_groups'][0] loaded_state_group = state_dict['param_groups'][0] # same number of params, same names, only different ordering current_state_name_to_id_mapping = {} # mapping -- param_name: id for i, name in enumerate(current_state_group['param_names']): current_state_name_to_id_mapping[name] = current_state_group['params'][i] # changing the ids of the loaded state dict to match the order of the given state dict. for i, name in enumerate(current_state_group['param_names']): loaded_state_group['params'][i] = current_state_name_to_id_mapping[name] return state_dict ``` In this code, the loaded `state_dict` ids are adapted to match the order of the current optimizer `state_dict`. Both the previous and the current optimizers are required to be initialized with `named_parameters()` to have the 'param_names' key in the dict. ### Note This is my first contribution to PyTorch, and I wish to receive feedback or suggestions for improvement. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134107 Approved by: https://github.com/janeyx99 Co-authored-by: Jane (Yuan) Xu <31798555+janeyx99@users.noreply.github.com>	2024-10-14 19:24:44 +00:00
Justin Chu	d0628a7e39	[ONNX] Remove deprecated export_to_pretty_string (#137790 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/137790 Approved by: https://github.com/titaiwangms ghstack dependencies: #137789	2024-10-11 20:10:04 +00:00
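
Note on 197601eeea (param_names in the optimizer state dict): the name-based reordering idea from that commit message can be sketched on plain dictionaries, without PyTorch. This is a rough illustration, not the PR's implementation. It assumes a single param group, identical name sets on both sides, and that loading pairs saved and current parameters by position within the group. The helper name `reorder_loaded_group` is hypothetical.

```python
def reorder_loaded_group(current_group, loaded_group):
    """Return a copy of loaded_group reordered to follow current_group's names.

    Hypothetical helper: both arguments are plain dicts shaped like one entry
    of state_dict['param_groups'], each carrying 'params' and 'param_names'.
    """
    current_names = current_group["param_names"]
    loaded_names = loaded_group["param_names"]
    if sorted(current_names) != sorted(loaded_names):
        raise ValueError("param_names differ between current and loaded groups")

    # Map each saved parameter name to the id it had in the loaded state dict.
    loaded_name_to_id = dict(zip(loaded_names, loaded_group["params"]))

    # Copy so the caller's dict is left untouched, then reorder by current names.
    reordered = dict(loaded_group)
    reordered["params"] = [loaded_name_to_id[name] for name in current_names]
    reordered["param_names"] = list(current_names)
    return reordered


current = {"params": [0, 1], "param_names": ["layer.weight", "layer.bias"]}
loaded = {"lr": 0.01, "params": [0, 1], "param_names": ["layer.bias", "layer.weight"]}
fixed = reorder_loaded_group(current, loaded)
print(fixed["params"], fixed["param_names"])
```

After the reordering, position i in the loaded group refers to the same named parameter as position i in the current group, so the saved ids in 'state' still point at the right entries. In a real optimizer this logic would live in a hook registered with `register_load_state_dict_pre_hook`, as the commit message describes.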


2953 Commits