pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Jesse Cai	aea771de30	[core][pruning][sparse][feature] SparseSemiStructured tensor subclass (#102135 ) This PR adds in support for semi-structured sparsity via a tensor subclass. It currently uses the CUTLASS kernels merged in PR #100881. In the future we plan to add in cuSPARSELt support (see the other PRs in the stack), which will give us larger performance gains. This PR adds in 2 things: - a Tensor subclass, `SparseSemiStructuredTensor` to store the sparse tensor in compressed form and override `__torch_dispatch__`. - a conversion function that takes in a dense tensor and a semi-structured sparse bool mask and creates an instance of the subclass. SparseSemiStructuredTensor The subclass stores the dense tensor in a contiguous flattened tensor for future compatibility with cuSPARSELt, which expects this format. Note that the CUTLASS kernels do not have this limitation, as the specified values and the metadata are passed separately in `_structured_sparse_linear`. In the future we can use the cuSPARSELt bindings [here](https://github.com/pytorch/pytorch/pull/103700) for faster matmul, better dtype coverage, and relaxed shape constraints. Since we currently don't have a way to go back from the sparse representation to the dense representation, and we store the weights in compressed form, we don't have a great way to handle .t(). Instead, we keep track of how often we've called transpose on our tensor, and if it's an unexpected number we throw an error. When the first argument is sparse, we expect an even number of calls to transpose, while when the second argument is sparse, we expect an odd number of calls. This is because we support second argument sparse matrix multiplications by using transpose properties. to_sparse_semi_structured This is a conversion function to convert a dense tensor and a semi-structured sparse bool mask into a subclass. 
Currently, we must pass in a bool mask, since we can't infer it because there may be additional zero elements in the dense tensor, so `tensor != 0` is not 2:4 sparse. Once we add either a method to derive the mask from the dense tensor or cuSPARSELt, we will no longer need to pass in the mask. cuSPARSELt has its own helper functions to create the metadata mask. User Details We have implemented support for the following ops for `torch.float16` and `torch.int8`: ``` torch.addmm(bias, dense, sparse.t()) torch.mm(dense, sparse) torch.mm(sparse, dense) aten.linear.default aten.t.default aten.t.detach ``` The end user interface to accelerate a nn.Linear module with the subclass would look like this: ``` from torch.sparse import to_sparse_semi_structured mask = torch.Tensor([0, 0, 1, 1]).tile(128, 32).cuda().bool() linear = Model(128, 128).half().cuda() linear.weight = nn.Parameter(to_sparse_semi_structured(linear.weight, mask=linear.weight.bool())) ``` This also updates tests and the `torch.sparse` module docstring to reflect these changes. Pull Request resolved: https://github.com/pytorch/pytorch/pull/102135 Approved by: https://github.com/albanD	2023-06-27 02:37:00 +00:00
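The point about `tensor != 0` not being a valid 2:4 mask can be illustrated with a small pure-Python sketch. It assumes that a 2:4 mask keeps exactly two entries in every contiguous group of four; the helper names `is_2_4_sparse` and `nonzero_mask` are hypothetical and are not part of `torch.sparse`.

```python
def is_2_4_sparse(mask_row):
    """Return True if every contiguous group of 4 entries keeps exactly 2.

    Assumption: a 2:4 mask selects exactly two positions per group of four.
    """
    if len(mask_row) % 4 != 0:
        return False
    return all(
        sum(1 for kept in mask_row[i:i + 4] if kept) == 2
        for i in range(0, len(mask_row), 4)
    )


def nonzero_mask(row):
    """Mask derived from the values alone, i.e. the `tensor != 0` approach."""
    return [value != 0 for value in row]


# A row that was pruned 2:4 and has no other zeros.
pruned = [0.0, 0.0, 1.5, -2.0, 0.0, 0.0, 0.3, 0.7]
# Same pruning pattern, but one kept weight happens to be exactly zero.
extra_zero = [0.0, 0.0, 1.5, 0.0, 0.0, 0.0, 0.3, 0.7]
# The explicit mask still describes the intended 2:4 pattern.
explicit = [False, False, True, True] * 2

pruned_ok = is_2_4_sparse(nonzero_mask(pruned))
extra_zero_ok = is_2_4_sparse(nonzero_mask(extra_zero))
explicit_ok = is_2_4_sparse(explicit)
```

In this sketch the mask inferred from `extra_zero` has only one kept entry in its first group, which is why an explicit mask is passed instead of being derived from the values.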
Mikayla Gawarecki	981f24e806	Add docstring to torch.serialization.register_package (#104046 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/104046 Approved by: https://github.com/albanD	2023-06-26 23:28:32 +00:00
PyTorch MergeBot	bfa08a1c67	Revert "[core][pruning][sparse][feature] SparseSemiStructured tensor subclass (#102135 )" This reverts commit `cf5262a84f`. Reverted https://github.com/pytorch/pytorch/pull/102135 on behalf of https://github.com/huydhn due to Sorry for reverting your PR but test_sparse_semi_structured.py::TestSparseSemiStructuredCUDA::test_mm_sparse_first_NT_cuda_int8 is failing CUDA trunk jobs `cf5262a84f`. This looks like a landrace ([comment](https://github.com/pytorch/pytorch/pull/102135#issuecomment-1608423849))	2023-06-26 22:54:16 +00:00
Jesse Cai	cf5262a84f	[core][pruning][sparse][feature] SparseSemiStructured tensor subclass (#102135 ) This PR adds in support for semi-structured sparsity via a tensor subclass. It currently uses the CUTLASS kernels merged in PR #100881. In the future we plan to add in cuSPARSELt support (see the other PRs in the stack), which will give us larger performance gains. This PR adds in 2 things: - a Tensor subclass, `SparseSemiStructuredTensor` to store the sparse tensor in compressed form and override `__torch_dispatch__`. - a conversion function that takes in a dense tensor and a semi-structured sparse bool mask and creates an instance of the subclass. SparseSemiStructuredTensor The subclass stores the dense tensor in a contiguous flattened tensor for future compatibility with cuSPARSELt, which expects this format. Note that the CUTLASS kernels do not have this limitation, as the specified values and the metadata are passed separately in `_structured_sparse_linear`. In the future we can use the cuSPARSELt bindings [here](https://github.com/pytorch/pytorch/pull/103700) for faster matmul, better dtype coverage, and relaxed shape constraints. Since we currently don't have a way to go back from the sparse representation to the dense representation, and we store the weights in compressed form, we don't have a great way to handle .t(). Instead, we keep track of how often we've called transpose on our tensor, and if it's an unexpected number we throw an error. When the first argument is sparse, we expect an even number of calls to transpose, while when the second argument is sparse, we expect an odd number of calls. This is because we support second argument sparse matrix multiplications by using transpose properties. to_sparse_semi_structured This is a conversion function to convert a dense tensor and a semi-structured sparse bool mask into a subclass. 
Currently, we must pass in a bool mask, since we can't infer it because there may be additional zero elements in the dense tensor, so `tensor != 0` is not 2:4 sparse. Once we add either a method to derive the mask from the dense tensor or cuSPARSELt, we will no longer need to pass in the mask. cuSPARSELt has its own helper functions to create the metadata mask. User Details We have implemented support for the following ops for `torch.float16` and `torch.int8`: ``` torch.addmm(bias, dense, sparse.t()) torch.mm(dense, sparse) torch.mm(sparse, dense) aten.linear.default aten.t.default aten.t.detach ``` The end user interface to accelerate a nn.Linear module with the subclass would look like this: ``` from torch.sparse import to_sparse_semi_structured mask = torch.Tensor([0, 0, 1, 1]).tile(128, 32).cuda().bool() linear = Model(128, 128).half().cuda() linear.weight = nn.Parameter(to_sparse_semi_structured(linear.weight, mask=linear.weight.bool())) ``` This also updates tests and the `torch.sparse` module docstring to reflect these changes. Pull Request resolved: https://github.com/pytorch/pytorch/pull/102135 Approved by: https://github.com/albanD	2023-06-26 21:30:43 +00:00
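The "transpose properties" mentioned for the sparse-second-argument case presumably refer to the identity (A·B) = (Bᵀ·Aᵀ)ᵀ, which moves the sparse operand into the first position. The following is a minimal pure-Python sketch of that identity on nested lists; it is an illustration only and does not reproduce the subclass's actual dispatch logic.

```python
def transpose(matrix):
    """Transpose a matrix stored as a list of rows."""
    return [list(column) for column in zip(*matrix)]


def matmul(a, b):
    """Plain dense matrix product of two list-of-rows matrices."""
    b_columns = transpose(b)
    return [
        [sum(x * y for x, y in zip(row, column)) for column in b_columns]
        for row in a
    ]


dense = [[1, 2], [3, 4], [5, 6]]   # 3x2, plays the role of the dense operand
sparse = [[0, 7, 0], [8, 0, 9]]    # 2x3, plays the role of the sparse operand

# Direct product with the sparse operand as the second argument.
direct = matmul(dense, sparse)

# Same result with the sparse operand first: (sparse^T @ dense^T)^T.
via_transpose = transpose(matmul(transpose(sparse), transpose(dense)))
```

Because the rewritten product applies one transpose to the sparse operand, an odd transpose count is what one would expect to see on it in the second-argument case, which matches the parity check described in the commit message.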
Sergii Dymchenko	adf9595c2f	Update CODEOWNERS (#103934 ) Remove users that no longer have write access to the repo, resolving CODEOWNERS errors. Pull Request resolved: https://github.com/pytorch/pytorch/pull/103934 Approved by: https://github.com/ZainRizvi, https://github.com/atalman, https://github.com/malfet	2023-06-26 19:29:29 +00:00
ZhaoqiongZ	7cef7195f6	[draft] Update Multiprocessing best practices with CPU device (#103229 ) Fixes [#102498](https://github.com/pytorch/pytorch/issues/102498) Pull Request resolved: https://github.com/pytorch/pytorch/pull/103229 Approved by: https://github.com/mingfeima, https://github.com/svekars, https://github.com/jgong5	2023-06-25 06:26:40 +00:00
Zachary DeVito	afc788a99c	Re-land _cycleviz.py: visualize reference cycles holding cuda memory (#104051 ) Reference cycles are freed by the cycle collector rather than being cleaned up when the objects in the cycle first become unreachable. If a cycle points to a tensor, the CUDA memory for that tensor will not be freed until garbage collection runs. Accumulation of CUDA allocations can lead to out of memory errors (OOMs), as well as non-deterministic allocation behavior which is harder to debug. This visualizer installs a garbage collection hook to look for cycles containing CUDA tensors and saves a visualization of the garbage: ``` from torch.cuda._cycleviz import warn_tensor_cycles warn_tensor_cycles() # do some work that results in a cycle getting garbage collected # ... > WARNING:root:Reference cycle includes a CUDA Tensor see visualization of cycle /tmp/tmpeideu9gl.html ``` Reland to make windows skip the test. This reverts commit `7b3b6dd426`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/104051 Approved by: https://github.com/aaronenyeshi, https://github.com/malfet	2023-06-23 13:44:58 +00:00
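The behaviour this visualizer targets can be reproduced without CUDA. The sketch below uses only the standard `gc` and `weakref` modules and a hypothetical `Holder` class as a stand-in for an object that owns a tensor; it assumes CPython's reference-counting plus cycle-collector model.

```python
import gc
import weakref


class Holder:
    """Hypothetical stand-in for an object that owns a large buffer."""

    def __init__(self):
        self.payload = bytearray(1024)
        self.me = self  # self-reference: the object is part of a cycle


def make_cycle():
    """Create a cyclic object and return only a weak reference to it."""
    holder = Holder()
    return weakref.ref(holder)


gc.disable()  # keep the automatic collector from running mid-example
ref = make_cycle()
# The last strong name is gone, but the cycle keeps the refcount above zero.
alive_before = ref() is not None
gc.collect()  # the cycle collector is what actually frees it
alive_after = ref() is not None
gc.enable()
```

If `payload` were a CUDA tensor, its memory would stay allocated for the whole window between the object becoming unreachable and the collector running.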
PyTorch MergeBot	7b3b6dd426	Revert "_cycleviz.py: visualize reference cycles holding cuda memory (#102656 )" This reverts commit `dba67f71c9`. Reverted https://github.com/pytorch/pytorch/pull/102656 on behalf of https://github.com/huydhn due to Sorry for reverting your PR. But I think the change is failing on Windows CUDA https://github.com/pytorch/pytorch/actions/runs/5341701630/jobs/9683293600 ([comment](https://github.com/pytorch/pytorch/pull/102656#issuecomment-1603035364))	2023-06-22 17:16:47 +00:00
albanD	4143b6b89b	Add torch_dispatch and modes to extending.rst note (#102087 ) The following subjects are not in this PR and will be done in a follow up: - Go through torch_function section and update to the latest phrasing and link to the proper new sections - Go through torch.library and custom device docs to add links to the new sections as appropriate - Top level explanations on which component should be used Pull Request resolved: https://github.com/pytorch/pytorch/pull/102087 Approved by: https://github.com/janeyx99	2023-06-22 12:56:35 +00:00
Zachary DeVito	dba67f71c9	_cycleviz.py: visualize reference cycles holding cuda memory (#102656 ) Reference cycles are freed by the cycle collector rather than being cleaned up when the objects in the cycle first become unreachable. If a cycle points to a tensor, the CUDA memory for that tensor will not be freed until garbage collection runs. Accumulation of CUDA allocations can lead to out of memory errors (OOMs), as well as non-deterministic allocation behavior which is harder to debug. This visualizer installs a garbage collection hook to look for cycles containing CUDA tensors and saves a visualization of the garbage: ``` from torch.cuda._cycleviz import warn_tensor_cycles warn_tensor_cycles() # do some work that results in a cycle getting garbage collected # ... > WARNING:root:Reference cycle includes a CUDA Tensor see visualization of cycle /tmp/tmpeideu9gl.html ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/102656 Approved by: https://github.com/aaronenyeshi	2023-06-22 04:00:28 +00:00
Michael Suo	a475ea4542	[fx] change from #users to num_users in graph printout (#101140 ) `#users` means stuff in various chat apps, which makes it annoying to copypasta graphs into them. Pull Request resolved: https://github.com/pytorch/pytorch/pull/101140 Approved by: https://github.com/ezyang	2023-06-20 21:24:32 +00:00
PyTorch MergeBot	e031dd23b0	Revert "To add brief intro for CPU backend optimization (#103666 )" This reverts commit `013ffe457e`. Reverted https://github.com/pytorch/pytorch/pull/103666 on behalf of https://github.com/huydhn due to Failing doc tests in trunk `013ffe457e` ([comment](https://github.com/pytorch/pytorch/pull/103666#issuecomment-1599301270))	2023-06-20 18:33:01 +00:00
Zaili Wang	013ffe457e	To add brief intro for CPU backend optimization (#103666 ) This PR is about adding brief introduction for x86 CPU backend optimization. Per previous discussion, the former PR #103307 was closed and creating this one, the contents are put into a new file. @Guobing-Chen @jgong5 @mingfeima @jingxu10 please help review, thanks. Pull Request resolved: https://github.com/pytorch/pytorch/pull/103666 Approved by: https://github.com/jgong5, https://github.com/malfet	2023-06-20 17:35:22 +00:00
leslie-fang-intel	9832cfbbfe	Quantization oneDNN backend only support VNNI CPU (#103653 ) Summary - Update the quantization document that default qconfig with oneDNN backend is recommended to be used on CPUs with Vector Neural Network Instruction support. - Add the warning message when user uses default qconfig with oneDNN backend on CPU without Vector Neural Network Instruction support. Pull Request resolved: https://github.com/pytorch/pytorch/pull/103653 Approved by: https://github.com/jgong5, https://github.com/malfet	2023-06-19 09:50:07 +00:00
albanD	918fe519a0	Use the new analytics ID (#103766 ) Re: https://github.com/pytorch/pytorch.github.io/issues/1397 Following the migration to latest google analytics FYI @malfet Pull Request resolved: https://github.com/pytorch/pytorch/pull/103766 Approved by: https://github.com/svekars	2023-06-16 23:21:08 +00:00
Edward Z. Yang	bc6ec97e02	Switch dynamic_shapes to True by default (#103597 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/103597 Approved by: https://github.com/voznesenskym	2023-06-15 15:16:20 +00:00
Mark Saroufim	ea384cd377	torch.compiler public namespace (#102182 ) # torch.compiler public API ## Goal The goal of this document is to describe the public facing API for torchdynamo and torchinductor. Today both dynamo and torchinductor are in `torch/_dynamo` and `torch/_inductor` namespace with the only public function `torch.compile()` which is directly placed in `torch/__init__.py` This poses a few problems for users trying to take dependencies on PyTorch 2.0 1. Unclear BC guarantees 2. No builtin discovery mechanism outside of reading the source code 3. No hard requirements for docstrings or type annotations Most importantly it mixes two personas the PyTorch 2.0 developer vs the PyTorch 2.0 customer so this is an attempt to address this. We draw a lot of inspiration from the `functorch` migration to the `func` namespace. ## Alternate names We did discuss some other alternative names 1. `torch.compile` -> problem is this would break BC on the existing `torch.compile` function 2. `torch.dynamo` -> `dynamo` is so far not something we've deliberately hidden from users but problem is now figuring out what it's `_dynamo` vs `dynamo` might be confusing 3. 
`torch.compiler` -> 1 would be better but to keep BC this is a good compromise # The general approach ## Proposal 1 In https://github.com/pytorch/pytorch/blob/main/torch/_dynamo/__init__.py We have function called `reset()`, this function is essential if users are trying to `torch.compile()` a model under different settings ```python # in _dynamo/ def reset(): do_reset_stuff() ``` Instead we propose ```python # in compiler/ def reset(): do_reset_stuff() # As in copy paste the logic from _dynamo.reset # in _dynamo/ import warnings import inspect def reset(): function_name = inspect.currentframe().f_code.co_name warnings.warn(f"{function_name} is deprecated, use compiler.{function_name} instead", DeprecationWarning) return compiler.reset() ``` ## Proposal 2 ```python # in compiler/ def reset(): “”” Docstrings here “”” _dynamo.reset() # in _dynamo/ No changes ``` Consensus so far seems to be proposal 2 since fewer warnings will be less jarring and it’ll make it quite easy to merge the public API ## Docstrings The above was an example of a function that has no inputs or outputs but there are other functions which could use an improvement in their docstrings, for example allow_in_graph actually works over lists of functions but that’s not mentioned anywhere in the example only if you read the source code. def allow_in_graph(fn): """ Customize which functions TorchDynamo will include in the generated graph. Similar to `torch.fx.wrap()`. Parameters: fn (callable or list/tuple): The function(s) to be allowed in the graph. Returns: callable or list/tuple: The input function(s) included in the graph. Examples: Customize inclusion of a single function: :: torch._dynamo.allow_in_graph(my_custom_function) Customize inclusion of multiple functions: :: torch._dynamo.allow_in_graph([my_custom_function1, my_custom_function2]) @torch._dynamo.optimize(...) def fn(a): x = torch.add(x, 1) x = my_custom_function(x) x = torch.add(x, 1) return x fn(...) 
Notes: The `allow_in_graph` function allows customization of which functions TorchDynamo includes in the generated graph. It can be used to include specific functions that are not automatically captured by TorchDynamo. If `fn` is a list or tuple, `allow_in_graph` will be called recursively on each element in the sequence. Once a function is allowed in the graph using `allow_in_graph`, it will be captured in the graph generated by TorchDynamo. This customization enables more fine-grained control over the functions included in the graph. Note that `allow_in_graph` expects the input `fn` to be a callable. """ if isinstance(fn, (list, tuple)): return [allow_in_graph(x) for x in fn] assert callable(fn), "allow_in_graph expects a callable" allowed_functions._allowed_function_ids.add(id(fn)) allowed_functions._disallowed_function_ids.remove(id(fn)) return fn So to make the API public, we’d have to write similar docstrings for all public functions we’d like to create. The benefit of this approach is that 1. No BC risks, internal and external users relying on our tooling can slowly wean off the private functions. 2. We will also have to write correct docstrings which will automatically make our documentation easier to maintain and render correctly on pytorch.org 3. We already have some BC guarantees already, we don’t kill OptimizedModule, we rejected the PR to change the config system The con of this approach is that Will be stuck with some potentially suboptimal functions/classes that you can’t kill ## Testing strategy If the approach is to mostly make a public function call an already tested private function then all we need to do is ensure that the function signatures don't change ## Which functions should be in the public API Our heuristic for deciding whether something should be public or not is are users already relying on it for lack of other options or have we recommended some non public functions for users to debug their PT 2.0 programs. 
Heuristic for not putting something in public is that it’s an experimental subsystem with the goal of turning it on by default, it’s very core dev centric, meta centric, a bunch of different configs that should be batched into a single user facing one, or something that needs to be renamed because the name is confusing #### Top level `torch.compile()` -> already is a public API it does require some minor improvements like having configs be passed in to any backend and not just inductor (EDIT: This was already done https://github.com/pytorch/pytorch/pull/99645l) and renaming `mode=reduce-overhead` to `mode=cudagraph` To make sure that PT 2.0 is supported with a given pytorch version users can create a new public function and this would replace the need for `try/except` blocks around `import torch._dynamo` that has been populating user code. ```python def pt2_enabled(): if hasattr(torch, 'compile'): return True else: return False ``` For all of the below they will be translated to `torch.compiler.function_name()` #### From _dynamo As a starting point we looked at https://github.com/pytorch/pytorch/blob/main/torch/_dynamo/__init__.py and we suggest redefining these functions in `pytorch/torch/compiler/__init__.py` It might also make sense to split them over multiple files and import them in `__init__.py` but because the number of functions is small it'd probably be fine to add them all into a single compiler/__init__.py until this list becomes larger 1. `reset()` 2. `allow_in_graph()` 10. `list_backends()` 12. `compile()`: torch.compile() would be mostly a shell function passing arguments to torch.compiler.compile() 13. `assume_constant_result()`: TODO: Double check how this is useful 15. `torch._dynamo.disable()` Some notable omissions 11. `explain()`: We need to clean up the output for this function, make it a data class and pretty printable 1. `forbid_in_graph()`: Considered adding this but should instead consolidate on `disallow_in_graph` 2. 
`optimize_assert()`: Already covered by `torch.compile(fullgraph=True)` 3. `check_if_dynamo_supported()`: this would be supplanted by pt2_enabled() 4. `compilation_metrics`, `graph_breaks_reasons` ..: would all be accessed via `torch.compiler.explain()` 5. `replay` does not seem useful to end customers 6. `graph_break()`: Mostly useful for debugging or unit tests 9. `register_backend()`: End users will just pass a string backend to torch.compile, only devs will create new backends 10. `export()` : Eventually this needs to be public but for now it’s not ready so just highlighting that it will be in the public API eventually 11. `disallow_in_graph()`: Usage is limited 12. `mark_static()`: we can keep this private until dynamic=True is recommended in stable 13. `mark_dynamic()`: we can keep this private until dynamic=True is recommended in trunk 14. 8. `OptimizedModule`: This is the only class that we'd expose but is crucial since users are running code like `if isinstance(mod, OptimizedModule): torch.save(mod._orig_mod)` EDIT: because we fixed pickling we no longer need to expose this 15. `is_compiling()`: Still not clear how this is useful to end users There are also config variables which we need to expose https://github.com/pytorch/pytorch/blob/main/torch/_dynamo/config.py Some of our configs are useful dev flags, others are to gate experimental functionality and others are essential debugging tools and we separate out the essential debugging and logging tools to a public facing config. TODO: I still need to think of a good way of porting the config in a BC way here are some ideas 1. Just make all passes available and controllable via `torch.compile(options={})` but only show docstrings for the ones users should care about. The current problem with our config system is we have 3 ways of setting them once via `options={}`, environment variables and variables in `config.py`, it'd be worth settling on one source of truth and have that be the public API. 
The configs we should make public are 1. `log_file_name` 2. `verbose` 3. `cache_size_limit` 4. `repro_level` and `repro_after`: Although we can rename these to minifier and give human readable names to the levels Everything else should stay private in particular 1. `print_graph_breaks`, `print_specializations`: should be supplanted by `explain()` for public users 2. dynamic shape configs : Users should only have to worry about `torch.compile(dynamic=True/False)` 3. The distributed flags, hook or guard configs: If we tell a user to use FSDP and DDP then the flag should be enabled by default or be in a private namespace 4. The fbcode flags: Obviously no need to be user facing 5. Skip/Allow lists: Not something normal users should play around with #### From _inductor Very little of inductor should be exposed in a public facing API, our core audience as in people writing models mostly just need information on what certain passes mean and how to control them at a high level and they can do this with `torch.compile(options={})` so the goal here should be more to make available passes clearer and ideally consolidate them into `torch.compile()` docstrings or modes. There are some exceptions though from https://github.com/pytorch/pytorch/blob/main/torch/_inductor/__init__.py 1. `list_mode_options()` 2. `list_options()`: this needs an additional pass to hide internal or debug options For both of these we’d rename them to compiler.inductor_list_mode_options and compiler.inductor_list_options() since they would be in the same init file as the one for dynamo Notable omissions 1. `_inductor.compile()`: Because users are coming in with their own fx graph, they are likely developers 2. `_inductor.aot_compile()`: Again this is about capturing and modifying fx graphs so users APIs don't need to be public However the configs are a slightly different story, because we can choose to either 1. Make all configs public 2. Make some configs public and keep most of the private ones. 
If public config is set it should override the private version 3. Make all configs controllable via `torch.compile(options={})` but make list_options() hide more things For now 3 seems like the most reasonable choice with some high level configs we’ll keep like TORCH_COMPILE_DEBUG Regardless here's what should probably be public or advertised more 1. `disable_progress` and verbose_progress: Combine and enable by default 2. `fallback_random`: We could make the case this shouldn't be public if a top level deterministic mode enables this 3. `profile_bandwidth`: Or could make the case that this should be in TORCH_COMPILE_DEBUG Notable omissions 1. Any config that would generally improve performance for most that we should probably enable by default but might be disabled in the short term because of stability: example `epilogue_fusion`, `pattern_matcher`, `reordering` 2. Autotuning flags: Should just sit behind `torch.compile(mode="max-autotune")` like `max_autotune`, `max_autotune_gemm` 3. `coordinate_descent_tuning`: This one I'm a bit mixed about, maybe it should also just fall into `mode="max-autotune"` 4. `trace`: `TORCH_COMPILE_DEBUG` is the best flag for all of this 5. `triton.cudagraphs`: Default should be `torch.compile(mode="reduce-overhead")` - I'd go further and rename the `mode=cudagraph` and we can keep reduce-overhead for BC reasons 6. `triton_unique_kernel_names`: Mostly useful for devs debugging 7. `dce`: which doesn't really do anything 8. `shape_padding`: Elias is working on enabling this by default in which case we also remove it ## Mechanics This PR would include the public functions with their docstrings Another PR will take a stab at the configs And for work where the APIs are still being cleaned up whether it's minifier or escape hatches, export or dynamic shapes, aot_inductor etc.. 
we’ll keep them private until a public commitment can be made Pull Request resolved: https://github.com/pytorch/pytorch/pull/102182 Approved by: https://github.com/jansel, https://github.com/albanD	2023-06-13 19:52:17 +00:00
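Proposal 2 and the testing strategy in this message (a thin, documented public function that delegates to an already tested private one, plus a check that the signatures stay aligned) can be sketched in plain Python. The names `_private_reset`, `_state`, and `signatures_match` are hypothetical stand-ins for illustration and do not exist in PyTorch.

```python
import inspect

# Hypothetical module-level state standing in for compiler caches.
_state = {"cache": ["compiled_fn"]}


def _private_reset():
    """Hypothetical stand-in for a private implementation such as `_dynamo.reset`."""
    _state["cache"].clear()


def reset():
    """Public entry point: carries the docstring, delegates the work.

    Mirrors the shape of Proposal 2: no logic is copied and the private
    function is left unchanged.
    """
    return _private_reset()


def signatures_match(public_fn, private_fn):
    """Check that the wrapper did not drift from the function it wraps."""
    return inspect.signature(public_fn) == inspect.signature(private_fn)


reset()
cache_after_reset = list(_state["cache"])
wrapper_in_sync = signatures_match(reset, _private_reset)
```

A signature comparison like this is one inexpensive way to cover the "ensure that the function signatures don't change" requirement; the behaviour itself remains covered by the private function's existing tests.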
Michael Lazos	6c6c897d6b	Add graph break logging option instead of config flag (#103202 ) Make graph break logging a logging option vs a config setting Pull Request resolved: https://github.com/pytorch/pytorch/pull/103202 Approved by: https://github.com/yanboliang, https://github.com/anijain2305	2023-06-12 19:52:31 +00:00
shaoyf42	443edb9015	[DOCS][DDP]Fix the simple of saving and reloading PowerSGD state and hook. (#102721 ) Fix the simple of saving and reloading PowerSGD state and hook. Pull Request resolved: https://github.com/pytorch/pytorch/pull/102721 Approved by: https://github.com/H-Huang	2023-06-10 00:15:00 +00:00
Weiming Zhao	28f43c767c	Fix outdated log settings in doc (#102285 ) (#102286 ) Replace torch._dynamo.config.loglevel=<level> with torch._logging.set_logs(dynamo=<level>) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/102286 Approved by: https://github.com/msaroufim, https://github.com/Neilblaze	2023-06-07 18:07:20 +00:00
David Berard	038955f489	torch.compile docs: "Profiling to understand torch.compile performance" (#102862 ) Docs on how to use torch.profiler.profile to understand torch.compile performance. Pull Request resolved: https://github.com/pytorch/pytorch/pull/102862 Approved by: https://github.com/eellison	2023-06-06 22:00:36 +00:00
Eli Uriegas	e26f5b2ac7	docs: Render bullet points correctly (#103021 ) This wasn't rendering correctly on the website, this should make it so that the bullet points actually show correctly now. Signed-off-by: Eli Uriegas <eliuriegas@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/103021 Approved by: https://github.com/albanD	2023-06-06 00:22:49 +00:00
Elias Ellison	4479e2fa19	fix profiling ref in side panel (#103014 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/103014 Approved by: https://github.com/msaroufim	2023-06-05 21:19:51 +00:00
Elias Ellison	d89c719160	Fix torch.compile side panels refs (#102407 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/102407 Approved by: https://github.com/msaroufim	2023-06-05 20:08:40 +00:00
PyTorch MergeBot	258d398eec	Revert "torch.compiler public namespace (#102182 )" This reverts commit `b5840f99c3`. Reverted https://github.com/pytorch/pytorch/pull/102182 on behalf of https://github.com/DanilBaibak due to Break internal build ([comment](https://github.com/pytorch/pytorch/pull/102182#issuecomment-1576144551))	2023-06-05 06:52:37 +00:00
Mark Saroufim	b5840f99c3	torch.compiler public namespace (#102182 ) # torch.compiler public API ## Goal The goal of this document is to describe the public facing API for torchdynamo and torchinductor. Today both dynamo and torchinductor are in `torch/_dynamo` and `torch/_inductor` namespace with the only public function `torch.compile()` which is directly placed in `torch/__init__.py` This poses a few problems for users trying to take dependencies on PyTorch 2.0 1. Unclear BC guarantees 2. No builtin discovery mechanism outside of reading the source code 3. No hard requirements for docstrings or type annotations Most importantly it mixes two personas the PyTorch 2.0 developer vs the PyTorch 2.0 customer so this is an attempt to address this. We draw a lot of inspiration from the `functorch` migration to the `func` namespace. ## Alternate names We did discuss some other alternative names 1. `torch.compile` -> problem is this would break BC on the existing `torch.compile` function 2. `torch.dynamo` -> `dynamo` is so far not something we've deliberately hidden from users but problem is now figuring out what it's `_dynamo` vs `dynamo` might be confusing 3. 
`torch.compiler` -> 1 would be better but to keep BC this is a good compromise # The general approach ## Proposal 1 In https://github.com/pytorch/pytorch/blob/main/torch/_dynamo/__init__.py we have a function called `reset()`; this function is essential if users are trying to `torch.compile()` a model under different settings ```python # in _dynamo/ def reset(): do_reset_stuff() ``` Instead we propose ```python # in compiler/ def reset(): do_reset_stuff() # As in copy paste the logic from _dynamo.reset # in _dynamo/ import warnings import inspect def reset(): function_name = inspect.currentframe().f_code.co_name warnings.warn(f"{function_name} is deprecated, use compiler.{function_name} instead", DeprecationWarning) return compiler.reset() ``` ## Proposal 2 ```python # in compiler/ def reset(): """ Docstrings here """ _dynamo.reset() # in _dynamo/: no changes ``` Consensus so far seems to be proposal 2 since fewer warnings will be less jarring and it’ll make it quite easy to merge the public API ## Docstrings The above was an example of a function that has no inputs or outputs but there are other functions which could use an improvement in their docstrings, for example allow_in_graph actually works over lists of functions but that’s not mentioned anywhere in the example only if you read the source code. def allow_in_graph(fn): """ Customize which functions TorchDynamo will include in the generated graph. Similar to `torch.fx.wrap()`. Parameters: fn (callable or list/tuple): The function(s) to be allowed in the graph. Returns: callable or list/tuple: The input function(s) included in the graph. Examples: Customize inclusion of a single function: :: torch._dynamo.allow_in_graph(my_custom_function) Customize inclusion of multiple functions: :: torch._dynamo.allow_in_graph([my_custom_function1, my_custom_function2]) @torch._dynamo.optimize(...) def fn(a): x = torch.add(a, 1) x = my_custom_function(x) x = torch.add(x, 1) return x fn(...) 
Notes: The `allow_in_graph` function allows customization of which functions TorchDynamo includes in the generated graph. It can be used to include specific functions that are not automatically captured by TorchDynamo. If `fn` is a list or tuple, `allow_in_graph` will be called recursively on each element in the sequence. Once a function is allowed in the graph using `allow_in_graph`, it will be captured in the graph generated by TorchDynamo. This customization enables more fine-grained control over the functions included in the graph. Note that `allow_in_graph` expects the input `fn` to be a callable. """ if isinstance(fn, (list, tuple)): return [allow_in_graph(x) for x in fn] assert callable(fn), "allow_in_graph expects a callable" allowed_functions._allowed_function_ids.add(id(fn)) allowed_functions._disallowed_function_ids.remove(id(fn)) return fn So to make the API public, we’d have to write similar docstrings for all public functions we’d like to create. The benefit of this approach is that 1. No BC risks, internal and external users relying on our tooling can slowly wean off the private functions. 2. We will also have to write correct docstrings which will automatically make our documentation easier to maintain and render correctly on pytorch.org 3. We already have some BC guarantees: we don’t kill OptimizedModule, and we rejected the PR to change the config system The con of this approach is that we will be stuck with some potentially suboptimal functions/classes that we can’t kill ## Testing strategy If the approach is to mostly make a public function call an already tested private function then all we need to do is ensure that the function signatures don't change ## Which functions should be in the public API Our heuristic for deciding whether something should be public is whether users are already relying on it for lack of other options, or whether we have recommended some non-public functions for users to debug their PT 2.0 programs. 
Heuristic for not putting something in public is that it’s an experimental subsystem with the goal of turning it on by default, it’s very core dev centric, meta centric, a bunch of different configs that should be batched into a single user facing one, or something that needs to be renamed because the name is confusing #### Top level `torch.compile()` -> already is a public API it does require some minor improvements like having configs be passed in to any backend and not just inductor (EDIT: This was already done https://github.com/pytorch/pytorch/pull/99645l) and renaming `mode=reduce-overhead` to `mode=cudagraph` To make sure that PT 2.0 is supported with a given pytorch version users can create a new public function and this would replace the need for `try/except` blocks around `import torch._dynamo` that has been populating user code. ```python def pt2_enabled(): if hasattr(torch, 'compile'): return True else: return False ``` For all of the below they will be translated to `torch.compiler.function_name()` #### From _dynamo As a starting point we looked at https://github.com/pytorch/pytorch/blob/main/torch/_dynamo/__init__.py and we suggest redefining these functions in `pytorch/torch/compiler/__init__.py` It might also make sense to split them over multiple files and import them in `__init__.py` but because the number of functions is small it'd probably be fine to add them all into a single compiler/__init__.py until this list becomes larger 1. `reset()` 2. `allow_in_graph()` 10. `list_backends()` 12. `compile()`: torch.compile() would be mostly a shell function passing arguments to torch.compiler.compile() 13. `assume_constant_result()`: TODO: Double check how this is useful 15. `torch._dynamo.disable()` Some notable omissions 11. `explain()`: We need to clean up the output for this function, make it a data class and pretty printable 1. `forbid_in_graph()`: Considered adding this but should instead consolidate on `disallow_in_graph` 2. 
`optimize_assert()`: Already covered by `torch.compile(fullgraph=True)` 3. `check_if_dynamo_supported()`: this would be supplanted by pt2_enabled() 4. `compilation_metrics`, `graph_breaks_reasons` ..: would all be accessed via `torch.compiler.explain()` 5. `replay` does not seem useful to end customers 6. `graph_break()`: Mostly useful for debugging or unit tests 9. `register_backend()`: End users will just pass a string backend to torch.compile, only devs will create new backends 10. `export()`: Eventually this needs to be public but for now it’s not ready so just highlighting that it will be in the public API eventually 11. `disallow_in_graph()`: Usage is limited 12. `mark_static()`: we can keep this private until dynamic=True is recommended in stable 13. `mark_dynamic()`: we can keep this private until dynamic=True is recommended in trunk 14. `OptimizedModule`: This is the only class that we'd expose but is crucial since users are running code like `if isinstance(mod, OptimizedModule): torch.save(mod._orig_mod)` EDIT: because we fixed pickling we no longer need to expose this 15. `is_compiling()`: Still not clear how this is useful to end users There are also config variables which we need to expose https://github.com/pytorch/pytorch/blob/main/torch/_dynamo/config.py Some of our configs are useful dev flags, others are to gate experimental functionality and others are essential debugging tools and we separate out the essential debugging and logging tools to a public facing config. TODO: I still need to think of a good way of porting the config in a BC way; here are some ideas 1. Just make all passes available and controllable via `torch.compile(options={})` but only show docstrings for the ones users should care about. The current problem with our config system is we have 3 ways of setting them: via `options={}`, environment variables and variables in `config.py`; it'd be worth settling on one source of truth and having that be the public API. 
The configs we should make public are 1. `log_file_name` 2. `verbose` 3. `cache_size_limit` 4. `repro_level` and `repro_after`: Although we can rename these to minifier and give human readable names to the levels Everything else should stay private in particular 1. `print_graph_breaks`, `print_specializations`: should be supplanted by `explain()` for public users 2. dynamic shape configs : Users should only have to worry about `torch.compile(dynamic=True/False)` 3. The distributed flags, hook or guard configs: If we tell a user to use FSDP and DDP then the flag should be enabled by default or be in a private namespace 4. The fbcode flags: Obviously no need to be user facing 5. Skip/Allow lists: Not something normal users should play around with #### From _inductor Very little of inductor should be exposed in a public facing API, our core audience as in people writing models mostly just need information on what certain passes mean and how to control them at a high level and they can do this with `torch.compile(options={})` so the goal here should be more to make available passes clearer and ideally consolidate them into `torch.compile()` docstrings or modes. There are some exceptions though from https://github.com/pytorch/pytorch/blob/main/torch/_inductor/__init__.py 1. `list_mode_options()` 2. `list_options()`: this needs an additional pass to hide internal or debug options For both of these we’d rename them to compiler.inductor_list_mode_options and compiler.inductor_list_options() since they would be in the same init file as the one for dynamo Notable omissions 1. `_inductor.compile()`: Because users coming in with their own fx graph are likely developers 2. `_inductor.aot_compile()`: Again this is about capturing and modifying fx graphs so users APIs don't need to be public However the configs are a slightly different story, because we can choose to either 1. Make all configs public 2. Make some configs public and keep most of the private ones. 
If public config is set it should override the private version 3. Make all configs controllable via `torch.compile(options={})` but make list_options() hide more things For now 3 seems like the most reasonable choice with some high level configs we’ll keep like TORCH_COMPILE_DEBUG Regardless here's what should probably be public or advertised more 1. `disable_progress` and `verbose_progress`: Combine and enable by default 2. `fallback_random`: We could make the case this shouldn't be public if a top level deterministic mode enables this 3. `profile_bandwidth`: Or could make the case that this should be in TORCH_COMPILE_DEBUG Notable omissions 1. Any config that would generally improve performance for most that we should probably enable by default but might be disabled in the short term because of stability: example `epilogue_fusion`, `pattern_matcher`, `reordering` 2. Autotuning flags: Should just sit behind `torch.compile(mode="max-autotune")` like `max_autotune`, `max_autotune_gemm` 3. `coordinate_descent_tuning`: This one I'm a bit mixed about, maybe it should also just fall into `mode="max-autotune"` 4. `trace`: `TORCH_COMPILE_DEBUG` is the best flag for all of this 5. `triton.cudagraphs`: Default should be `torch.compile(mode="reduce-overhead")` - I'd go further and rename it to `mode=cudagraph`, and we can keep reduce-overhead for BC reasons 6. `triton_unique_kernel_names`: Mostly useful for devs debugging 7. `dce`: which doesn't really do anything 8. `shape_padding`: Elias is working on enabling this by default in which case we also remove it ## Mechanics This PR would include the public functions with their docstrings Another PR will take a stab at the configs And for work where the APIs are still being cleaned up whether it's the minifier or escape hatches, export or dynamic shapes, aot_inductor etc.. 
we’ll keep them private until a public commitment can be made Pull Request resolved: https://github.com/pytorch/pytorch/pull/102182 Approved by: https://github.com/jansel	2023-06-02 14:38:55 +00:00
Weiming Zhao	b76af5f9a6	Fix broken link in Dynamo's guards doc (#102183 ) (#102185 ) This PR fixes broken link for the code referenced in the guards doc. Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/102185 Approved by: https://github.com/mikaylagawarecki, https://github.com/ezyang	2023-06-02 14:36:28 +00:00
Thomas J. Fan	0d17bd5fa4	DOC Fixes unpacking issue in dynamo explain docs (#101761 ) This PR updates the docs to be consistent with `torch.explain` which currently returns 6 items: `bfb3941ad8/torch/_dynamo/eval_frame.py (L622-L629)` Pull Request resolved: https://github.com/pytorch/pytorch/pull/101761 Approved by: https://github.com/desertfire	2023-05-25 22:32:15 +00:00
Elias Ellison	aa83a52742	Profiling doc (#101895 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/101895 Approved by: https://github.com/msaroufim, https://github.com/shunting314	2023-05-25 04:57:38 +00:00
Elias Ellison	4692ea76a0	Fine grained apis docs (#101897 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/101897 Approved by: https://github.com/msaroufim	2023-05-23 19:03:44 +00:00
Elias Ellison	2bce7c8f46	CUDAGraph trees doc (#101902 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/101902 Approved by: https://github.com/msaroufim	2023-05-23 03:35:43 +00:00
Ramil Nugmanov	2ae87a1f87	missed StackDataset documentation (#101927 ) The new dataset class added by #101338 was missing from the documentation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/101927 Approved by: https://github.com/kit1980	2023-05-22 21:12:16 +00:00
Ren Pang	a630328695	Fix Backend docs search items (#101214 ) Fixes #100944 ## New [screenshot] [screenshot] ## Old [screenshot] [screenshot] Pull Request resolved: https://github.com/pytorch/pytorch/pull/101214 Approved by: https://github.com/albanD	2023-05-22 14:58:38 +00:00
Rickey K. Liang	807d81155f	[CUDA][CUBLAS] Fix BF16 reduced precision reduction note in Numerical accuracy docs (#101884 ) Fixes #100966 Ref #101044 Align implementation and documentation. (This is what's previously missed from the above issue and PR) Pull Request resolved: https://github.com/pytorch/pytorch/pull/101884 Approved by: https://github.com/eqy, https://github.com/ezyang	2023-05-21 17:38:00 +00:00
Mark Saroufim	3666ca9d97	Dynamic Shape Doc (#101885 ) ### 🤖 Generated by Copilot at 2f25c1e > _Dynamic shapes guide_ > _`TorchDynamo` and `TorchInductor`_ > _Learn from data flow_ Thanks @ezyang Pull Request resolved: https://github.com/pytorch/pytorch/pull/101885 Approved by: https://github.com/eellison, https://github.com/ezyang	2023-05-19 21:43:22 +00:00
Mark Saroufim	ff5b9428aa	Fake Tensor Docs (#101882 ) ### 🤖 Generated by Copilot at 75f33ae > _Fake tensors help_ > _compile and optimize code_ > _`PT2` in autumn_ Thanks @ezyang Pull Request resolved: https://github.com/pytorch/pytorch/pull/101882 Approved by: https://github.com/eellison, https://github.com/ezyang	2023-05-19 21:39:34 +00:00
Mark Saroufim	581d13a069	Add Logging Doc to compile index (#101888 ) ### 🤖 Generated by Copilot at ba85a41 > _`logging` module_ > _documents PyTorch events_ > _cutting through the fog_ Thanks @mlazos Pull Request resolved: https://github.com/pytorch/pytorch/pull/101888 Approved by: https://github.com/eellison	2023-05-19 21:29:25 +00:00
Mark Saroufim	2dd33c71c1	Docs for torchcompile and functorch (#101881 ) ### 🤖 Generated by Copilot at b5f48b6 > _`torch.compile` docs_ > _Add a new section for `func`_ > _Winter of features_ Thanks @zou3519 Pull Request resolved: https://github.com/pytorch/pytorch/pull/101881 Approved by: https://github.com/eellison, https://github.com/zou3519	2023-05-19 21:23:43 +00:00
Jane Xu	cde597efa1	[docs] Warn that GradScaler can scale under 1 (#101569 ) Completes action item 1 in https://github.com/pytorch/pytorch/issues/99640 Pull Request resolved: https://github.com/pytorch/pytorch/pull/101569 Approved by: https://github.com/ngimel	2023-05-16 23:56:07 +00:00
PyTorch MergeBot	66eef31444	Revert "[fx] change from #users to num_users in graph printout (#101140 )" This reverts commit `e568c5a18d`. Reverted https://github.com/pytorch/pytorch/pull/101140 on behalf of https://github.com/jeanschmidt due to There are internal changes to this commit that are preventing landing, so I am reverting to unblock the diff train ([comment](https://github.com/pytorch/pytorch/pull/101140#issuecomment-1547989487))	2023-05-15 14:35:22 +00:00
Ramin Azarmehr	0be53d83fc	[MPS] Add support for MPSProfiler Python bindings (#101002 ) - Added torch.mps.profiler.[start() and stop()] APIs with RST documentation - Added test case in test_mps Pull Request resolved: https://github.com/pytorch/pytorch/pull/101002 Approved by: https://github.com/malfet	2023-05-12 21:55:34 +00:00
Yueming Hao	a12b640dc9	Fix typos in troubleshooting.rst (#101305 ) There are several typos in the troubleshooting documentation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/101305 Approved by: https://github.com/desertfire	2023-05-12 21:05:13 +00:00
Ran Ding	b5c8d0359c	Update autograd.rst (#101007 ) Fixes #ISSUE_NUMBER typo fix and small change to improve clarity Pull Request resolved: https://github.com/pytorch/pytorch/pull/101007 Approved by: https://github.com/lezcano, https://github.com/anjali411	2023-05-12 11:47:51 +00:00
Michael Suo	e568c5a18d	[fx] change from #users to num_users in graph printout (#101140 ) `#users` means stuff in various chat apps, which makes it annoying to copypasta graphs into them. Pull Request resolved: https://github.com/pytorch/pytorch/pull/101140 Approved by: https://github.com/ezyang	2023-05-12 04:34:01 +00:00
eqy	33f3dca6b5	[CUDA][CUBLAS] Fix BF16 reduced precision reduction note in docs (#101044 ) #100966 CC @ngimel @ezyang Pull Request resolved: https://github.com/pytorch/pytorch/pull/101044 Approved by: https://github.com/ngimel	2023-05-10 06:50:58 +00:00
eqy	6e2efd16d8	[CUDA][CUBLAS] Add cuBLAS workspace allocation behavior to docs (#100919 ) Adding to the docs for now, hopefully we can move to `cudaMallocAsync`-backed cuBLAS workspaces soon which should alleviate the recent confusion around `cuBLAS` "leaking" memory through workspaces. Pull Request resolved: https://github.com/pytorch/pytorch/pull/100919 Approved by: https://github.com/ngimel	2023-05-10 06:40:26 +00:00
fduwjj	953aa6d90e	[TP] Enable more generic attn in Tensor Parallelism (#100508 ) To make TP more generic for the Attention module, we introduce a new col/rowwise parallel style. The idea is that we only run DTensor ops on the Col/Rowwise sharded part and leave the rest of the ATen ops to Tensor ops. This behavior is now the default for the Colwise and Rowwise parallel styles. To customize it, users can always pass in a different prepare_input or prepare_output. Pull Request resolved: https://github.com/pytorch/pytorch/pull/100508 Approved by: https://github.com/wanchaol	2023-05-07 18:15:49 +00:00
Michael Lazos	850556ed6e	Add "all" option to logging (#100664 ) Adds the long-promised "all" option to logging. Pull Request resolved: https://github.com/pytorch/pytorch/pull/100664 Approved by: https://github.com/lezcano	2023-05-06 01:11:18 +00:00
Michael Lazos	c525440ba3	Logging documentation updates (#100595 ) Updated the logging.rst with info about the env var. Pull Request resolved: https://github.com/pytorch/pytorch/pull/100595 Approved by: https://github.com/msaroufim, https://github.com/lezcano	2023-05-04 21:54:02 +00:00
Animesh Jain	8994d9e610	[dynamo] Hide guard_fail_hook behind a flag to improve cache lookup time (+10% DebertaV2) (#100590 ) For TorchDynamo eager backend, DebertaV2 speedup improves from 0.77x to 0.87x. Pull Request resolved: https://github.com/pytorch/pytorch/pull/100590 Approved by: https://github.com/voznesenskym, https://github.com/wconstab	2023-05-04 18:52:21 +00:00


2132 Commits