pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Jesse Cai	aea771de30	[core][pruning][sparse][feature] SparseSemiStructured tensor subclass (#102135 ) This PR adds in support for semi-structured sparsity via a tensor subclass. It currently uses the CUTLASS kernels merged in PR #100881. In the future we plan to add in cuSPARSELt support (see the other PRs in the stack), which will give us larger performance gains. This PR adds in 2 things: - a Tensor subclass, `SparseSemiStructuredTensor` to store the sparse tensor in compressed form and override `__torch_dispatch__`. - a conversion function that takes in a dense tensor and a semi-structured sparse bool mask and creates an instance of the subclass. SparseSemiStructuredTensor The subclass stores the dense tensor in a contiguous flattened tensor for future compatibility with cuSPARSELt, which expects this format. Note that the CUTLASS kernels do not have this limitation, as the specified values and the metadata are passed separately in `_structured_sparse_linear`. In the future we can use the cuSPARSELt bindings [here](https://github.com/pytorch/pytorch/pull/103700) for faster matmul, better dtype coverage, and relaxed shape constraints. Since we currently don't have a way to go back from the sparse representation to the dense representation, and we store the weights in compressed form, we don't have a great way to handle .t(). Instead, we keep track of how often we've called transpose on our tensor, and if it's an unexpected number we throw an error. When the first argument is sparse, we expect an even number of calls to transpose, while when the second argument is sparse, we expect an odd number of calls. This is because we support second argument sparse matrix multiplications by using transpose properties. to_sparse_semi_structured This is a conversion function to convert a dense tensor and a semi-structured sparse bool mask into a subclass. 
Currently, we must pass in a bool mask, since we can't infer it because there may be additional zero elements in the dense tensor, so `tensor != 0` is not 2:4 sparse. Once we add either a method to derive the mask from the dense tensor or cuSPARSELt, we no longer need to pass in the mask. cuSPARSELt has its own helper functions to create the metadata mask. User Details We have implemented support for the following ops for `torch.float16` and `torch.int8`: ``` torch.addmm(bias, dense, sparse.t()) torch.mm(dense, sparse) torch.mm(sparse, dense) aten.linear.default aten.t.default aten.t.detach ``` The end user interface to accelerate a nn.Linear module with the subclass would look like this: ``` from torch.sparse import to_sparse_semi_structured mask = torch.Tensor([0, 0, 1, 1]).tile(128, 32).cuda().bool() linear = Model(128, 128).half().cuda() linear.weight = nn.Parameter(to_sparse_semi_structured(linear.weight, mask=linear.weight.bool())) ``` This also updates tests and the `torch.sparse` module docstring to reflect these changes. Pull Request resolved: https://github.com/pytorch/pytorch/pull/102135 Approved by: https://github.com/albanD	2023-06-27 02:37:00 +00:00
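The commit message above notes that `tensor != 0` is not a valid 2:4 mask when the dense tensor contains extra zeros. The following is a minimal pure-Python sketch of that point, not code from the PR. The helper names `is_2_4_sparse` and `magnitude_mask` are hypothetical, and keeping the two largest-magnitude entries per group of four is an assumed selection rule used only for illustration; the PR itself requires the caller to supply the mask.

```python
def is_2_4_sparse(mask_row):
    """Return True if every group of 4 mask entries keeps exactly 2."""
    if len(mask_row) % 4 != 0:
        return False
    return all(sum(mask_row[i:i + 4]) == 2 for i in range(0, len(mask_row), 4))


def magnitude_mask(row):
    """Hypothetical helper: keep the 2 largest-magnitude entries per group of 4.

    Ties are broken in favor of the lower index (sorted() is stable).
    """
    mask = []
    for i in range(0, len(row), 4):
        group = row[i:i + 4]
        keep = sorted(range(len(group)), key=lambda j: -abs(group[j]))[:2]
        mask.extend(j in keep for j in range(len(group)))
    return mask


# A row with "additional" zeros: the first group has only one nonzero value,
# the second group has three.
row = [0.0, 0.0, 0.0, 1.5, 0.3, -2.0, 0.0, 0.9]

nonzero_mask = [v != 0 for v in row]   # analogue of `tensor != 0`
explicit_mask = magnitude_mask(row)    # an explicitly constructed 2:4 mask

print(is_2_4_sparse(nonzero_mask))
print(is_2_4_sparse(explicit_mask))
```

The nonzero pattern fails the check because its groups keep one and three entries respectively, which is why an explicit mask has to be passed alongside the dense tensor.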
Mikayla Gawarecki	981f24e806	Add docstring to torch.serialization.register_package (#104046 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/104046 Approved by: https://github.com/albanD	2023-06-26 23:28:32 +00:00
PyTorch MergeBot	bfa08a1c67	Revert "[core][pruning][sparse][feature] SparseSemiStructured tensor subclass (#102135 )" This reverts commit `cf5262a84f`. Reverted https://github.com/pytorch/pytorch/pull/102135 on behalf of https://github.com/huydhn due to Sorry for reverting your PR but test_sparse_semi_structured.py::TestSparseSemiStructuredCUDA::test_mm_sparse_first_NT_cuda_int8 is failing CUDA trunk jobs `cf5262a84f`. This looks like a landrace ([comment](https://github.com/pytorch/pytorch/pull/102135#issuecomment-1608423849))	2023-06-26 22:54:16 +00:00
Jesse Cai	cf5262a84f	[core][pruning][sparse][feature] SparseSemiStructured tensor subclass (#102135 ) This PR adds in support for semi-structured sparsity via a tensor subclass. It currently uses the CUTLASS kernels merged in PR #100881. In the future we plan to add in cuSPARSELt support (see the other PRs in the stack), which will give us larger performance gains. This PR adds in 2 things: - a Tensor subclass, `SparseSemiStructuredTensor` to store the sparse tensor in compressed form and override `__torch_dispatch__`. - a conversion function that takes in a dense tensor and a semi-structured sparse bool mask and creates an instance of the subclass. SparseSemiStructuredTensor The subclass stores the dense tensor in a contiguous flattened tensor for future compatibility with cuSPARSELt, which expects this format. Note that the CUTLASS kernels do not have this limitation, as the specified values and the metadata are passed separately in `_structured_sparse_linear`. In the future we can use the cuSPARSELt bindings [here](https://github.com/pytorch/pytorch/pull/103700) for faster matmul, better dtype coverage, and relaxed shape constraints. Since we currently don't have a way to go back from the sparse representation to the dense representation, and we store the weights in compressed form, we don't have a great way to handle .t(). Instead, we keep track of how often we've called transpose on our tensor, and if it's an unexpected number we throw an error. When the first argument is sparse, we expect an even number of calls to transpose, while when the second argument is sparse, we expect an odd number of calls. This is because we support second argument sparse matrix multiplications by using transpose properties. to_sparse_semi_structured This is a conversion function to convert a dense tensor and a semi-structured sparse bool mask into a subclass. 
Currently, we must pass in a bool mask, since we can't infer it because there may be additional zero elements in the dense tensor, so `tensor != 0` is not 2:4 sparse. Once we add either a method to derive the mask from the dense tensor or cuSPARSELt, we no longer need to pass in the mask. cuSPARSELt has its own helper functions to create the metadata mask. User Details We have implemented support for the following ops for `torch.float16` and `torch.int8`: ``` torch.addmm(bias, dense, sparse.t()) torch.mm(dense, sparse) torch.mm(sparse, dense) aten.linear.default aten.t.default aten.t.detach ``` The end user interface to accelerate a nn.Linear module with the subclass would look like this: ``` from torch.sparse import to_sparse_semi_structured mask = torch.Tensor([0, 0, 1, 1]).tile(128, 32).cuda().bool() linear = Model(128, 128).half().cuda() linear.weight = nn.Parameter(to_sparse_semi_structured(linear.weight, mask=linear.weight.bool())) ``` This also updates tests and the `torch.sparse` module docstring to reflect these changes. Pull Request resolved: https://github.com/pytorch/pytorch/pull/102135 Approved by: https://github.com/albanD	2023-06-26 21:30:43 +00:00
Sergii Dymchenko	adf9595c2f	Update CODEOWNERS (#103934 ) Remove users that no longer have write access to the repo, resolving CODEOWNERS errors. Pull Request resolved: https://github.com/pytorch/pytorch/pull/103934 Approved by: https://github.com/ZainRizvi, https://github.com/atalman, https://github.com/malfet	2023-06-26 19:29:29 +00:00
ZhaoqiongZ	7cef7195f6	[draft] Update Multiprocessing best practices with CPU device (#103229 ) Fixes [#102498](https://github.com/pytorch/pytorch/issues/102498) Pull Request resolved: https://github.com/pytorch/pytorch/pull/103229 Approved by: https://github.com/mingfeima, https://github.com/svekars, https://github.com/jgong5	2023-06-25 06:26:40 +00:00
Zachary DeVito	afc788a99c	Re-land _cycleviz.py: visualize reference cycles holding cuda memory (#104051 ) Reference cycles are freed by the cycle collector rather than being cleaned up when the objects in the cycle first become unreachable. If a cycle points to a tensor, the CUDA memory for that tensor will not be freed until garbage collection runs. Accumulation of CUDA allocations can lead to out of memory errors (OOMs), as well as non-deterministic allocation behavior which is harder to debug. This visualizer installs a garbage collection hook to look for cycles containing CUDA tensors and saves a visualization of the garbage: ``` from torch.cuda._cycleviz import warn_tensor_cycles warn_tensor_cycles() # do some work that results in a cycle getting garbage collected # ... > WARNING:root:Reference cycle includes a CUDA Tensor see visualization of cycle /tmp/tmpeideu9gl.html ``` Reland to make windows skip the test. This reverts commit `7b3b6dd426`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/104051 Approved by: https://github.com/aaronenyeshi, https://github.com/malfet	2023-06-23 13:44:58 +00:00
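The behavior described in the commit message above (objects in a reference cycle stay alive until the cycle collector runs) can be reproduced with the standard library alone. This is a minimal sketch, assuming CPython; the `Node` class is a hypothetical stand-in for an object that owns a CUDA tensor, and the `gc.callbacks` hook shown here is the general standard-library mechanism, not a claim about how `_cycleviz.py` is implemented.

```python
import gc
import weakref


class Node:
    """Hypothetical stand-in for an object holding a large resource."""

    def __init__(self):
        self.other = None


def make_cycle():
    # a and b reference each other, so reference counting alone never frees them.
    a, b = Node(), Node()
    a.other, b.other = b, a
    return weakref.ref(a)  # a weak reference does not keep `a` alive


phases = []


def on_gc(phase, info):
    # gc.callbacks entries are called with phase "start" or "stop".
    phases.append(phase)


gc.disable()  # rule out an automatic collection during the demo
gc.callbacks.append(on_gc)
try:
    ref = make_cycle()
    alive_before = ref() is not None  # cycle is unreachable but not yet freed
    gc.collect()
    alive_after = ref() is not None   # the cycle collector has reclaimed it
finally:
    gc.callbacks.remove(on_gc)
    gc.enable()

print(alive_before, alive_after, phases)
```

If `Node` held GPU memory, that memory would remain allocated for the whole window between the objects becoming unreachable and the collection pass.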
PyTorch MergeBot	7b3b6dd426	Revert "_cycleviz.py: visualize reference cycles holding cuda memory (#102656 )" This reverts commit `dba67f71c9`. Reverted https://github.com/pytorch/pytorch/pull/102656 on behalf of https://github.com/huydhn due to Sorry for reverting your PR. But I think the change is failing on Windows CUDA https://github.com/pytorch/pytorch/actions/runs/5341701630/jobs/9683293600 ([comment](https://github.com/pytorch/pytorch/pull/102656#issuecomment-1603035364))	2023-06-22 17:16:47 +00:00
albanD	4143b6b89b	Add torch_dispatch and modes to extending.rst note (#102087 ) The following subjects are not in this PR and will be done in a follow up: - Go through torch_function section and update to the latest phrasing and link to the proper new sections - Go through torch.library and custom device docs to add links to the new sections as appropriate - Top level explanations on which component should be used Pull Request resolved: https://github.com/pytorch/pytorch/pull/102087 Approved by: https://github.com/janeyx99	2023-06-22 12:56:35 +00:00
Zachary DeVito	dba67f71c9	_cycleviz.py: visualize reference cycles holding cuda memory (#102656 ) Reference cycles are freed by the cycle collector rather than being cleaned up when the objects in the cycle first become unreachable. If a cycle points to a tensor, the CUDA memory for that tensor will not be freed until garbage collection runs. Accumulation of CUDA allocations can lead to out of memory errors (OOMs), as well as non-deterministic allocation behavior which is harder to debug. This visualizer installs a garbage collection hook to look for cycles containing CUDA tensors and saves a visualization of the garbage: ``` from torch.cuda._cycleviz import warn_tensor_cycles warn_tensor_cycles() # do some work that results in a cycle getting garbage collected # ... > WARNING:root:Reference cycle includes a CUDA Tensor see visualization of cycle /tmp/tmpeideu9gl.html ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/102656 Approved by: https://github.com/aaronenyeshi	2023-06-22 04:00:28 +00:00
Michael Suo	a475ea4542	[fx] change from #users to num_users in graph printout (#101140 ) `#users` means stuff in various chat apps, which makes it annoying to copypasta graphs into them. Pull Request resolved: https://github.com/pytorch/pytorch/pull/101140 Approved by: https://github.com/ezyang	2023-06-20 21:24:32 +00:00
PyTorch MergeBot	e031dd23b0	Revert "To add brief intro for CPU backend optimization (#103666 )" This reverts commit `013ffe457e`. Reverted https://github.com/pytorch/pytorch/pull/103666 on behalf of https://github.com/huydhn due to Failing doc tests in trunk `013ffe457e` ([comment](https://github.com/pytorch/pytorch/pull/103666#issuecomment-1599301270))	2023-06-20 18:33:01 +00:00
Zaili Wang	013ffe457e	To add brief intro for CPU backend optimization (#103666 ) This PR adds a brief introduction to x86 CPU backend optimization. Per previous discussion, the former PR #103307 was closed and this one was created, with the contents put into a new file. @Guobing-Chen @jgong5 @mingfeima @jingxu10 please help review, thanks. Pull Request resolved: https://github.com/pytorch/pytorch/pull/103666 Approved by: https://github.com/jgong5, https://github.com/malfet	2023-06-20 17:35:22 +00:00
leslie-fang-intel	9832cfbbfe	Quantization oneDNN backend only support VNNI CPU (#103653 ) Summary - Update the quantization document that default qconfig with oneDNN backend is recommended to be used on CPUs with Vector Neural Network Instruction support. - Add the warning message when user uses default qconfig with oneDNN backend on CPU without Vector Neural Network Instruction support. Pull Request resolved: https://github.com/pytorch/pytorch/pull/103653 Approved by: https://github.com/jgong5, https://github.com/malfet	2023-06-19 09:50:07 +00:00
albanD	918fe519a0	Use the new analytics ID (#103766 ) Re: https://github.com/pytorch/pytorch.github.io/issues/1397 Following the migration to latest google analytics FYI @malfet Pull Request resolved: https://github.com/pytorch/pytorch/pull/103766 Approved by: https://github.com/svekars	2023-06-16 23:21:08 +00:00
Edward Z. Yang	bc6ec97e02	Switch dynamic_shapes to True by default (#103597 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/103597 Approved by: https://github.com/voznesenskym	2023-06-15 15:16:20 +00:00
Mark Saroufim	ea384cd377	torch.compiler public namespace (#102182 ) # torch.compiler public API ## Goal The goal of this document is to describe the public facing API for torchdynamo and torchinductor. Today both dynamo and torchinductor are in `torch/_dynamo` and `torch/_inductor` namespace with the only public function `torch.compile()` which is directly placed in `torch/__init__.py` This poses a few problems for users trying to take dependencies on PyTorch 2.0 1. Unclear BC guarantees 2. No builtin discovery mechanism outside of reading the source code 3. No hard requirements for docstrings or type annotations Most importantly it mixes two personas the PyTorch 2.0 developer vs the PyTorch 2.0 customer so this is an attempt to address this. We draw a lot of inspiration from the `functorch` migration to the `func` namespace. ## Alternate names We did discuss some other alternative names 1. `torch.compile` -> problem is this would break BC on the existing `torch.compile` function 2. `torch.dynamo` -> `dynamo` is so far not something we've deliberately hidden from users but problem is now figuring out what it's `_dynamo` vs `dynamo` might be confusing 3. 
`torch.compiler` -> 1 would be better but to keep BC this is a good compromise # The general approach ## Proposal 1 In https://github.com/pytorch/pytorch/blob/main/torch/_dynamo/__init__.py We have a function called `reset()`, this function is essential if users are trying to `torch.compile()` a model under different settings ```python # in _dynamo/ def reset(): do_reset_stuff() ``` Instead we propose ```python # in compiler/ def reset(): do_reset_stuff() # As in copy paste the logic from _dynamo.reset # in _dynamo/ import warnings import inspect def reset(): function_name = inspect.currentframe().f_code.co_name warnings.warn(f"{function_name} is deprecated, use compiler.{function_name} instead", DeprecationWarning) return compiler.reset() ``` ## Proposal 2 ```python # in compiler/ def reset(): """ Docstrings here """ _dynamo.reset() # in _dynamo/ No changes ``` Consensus so far seems to be proposal 2 since fewer warnings will be less jarring and it’ll make it quite easy to merge the public API ## Docstrings The above was an example of a function that has no inputs or outputs but there are other functions which could use an improvement in their docstrings, for example allow_in_graph actually works over lists of functions but that’s not mentioned anywhere in the example only if you read the source code. def allow_in_graph(fn): """ Customize which functions TorchDynamo will include in the generated graph. Similar to `torch.fx.wrap()`. Parameters: fn (callable or list/tuple): The function(s) to be allowed in the graph. Returns: callable or list/tuple: The input function(s) included in the graph. Examples: Customize inclusion of a single function: :: torch._dynamo.allow_in_graph(my_custom_function) Customize inclusion of multiple functions: :: torch._dynamo.allow_in_graph([my_custom_function1, my_custom_function2]) @torch._dynamo.optimize(...) def fn(a): x = torch.add(x, 1) x = my_custom_function(x) x = torch.add(x, 1) return x fn(...) 
Notes: The `allow_in_graph` function allows customization of which functions TorchDynamo includes in the generated graph. It can be used to include specific functions that are not automatically captured by TorchDynamo. If `fn` is a list or tuple, `allow_in_graph` will be called recursively on each element in the sequence. Once a function is allowed in the graph using `allow_in_graph`, it will be captured in the graph generated by TorchDynamo. This customization enables more fine-grained control over the functions included in the graph. Note that `allow_in_graph` expects the input `fn` to be a callable. """ if isinstance(fn, (list, tuple)): return [allow_in_graph(x) for x in fn] assert callable(fn), "allow_in_graph expects a callable" allowed_functions._allowed_function_ids.add(id(fn)) allowed_functions._disallowed_function_ids.remove(id(fn)) return fn So to make the API public, we’d have to write similar docstrings for all public functions we’d like to create. The benefit of this approach is that 1. No BC risks, internal and external users relying on our tooling can slowly wean off the private functions. 2. We will also have to write correct docstrings which will automatically make our documentation easier to maintain and render correctly on pytorch.org 3. We already have some BC guarantees already, we don’t kill OptimizedModule, we rejected the PR to change the config system The con of this approach is that Will be stuck with some potentially suboptimal functions/classes that you can’t kill ## Testing strategy If the approach is to mostly make a public function call an already tested private function then all we need to do is ensure that the function signatures don't change ## Which functions should be in the public API Our heuristic for deciding whether something should be public or not is are users already relying on it for lack of other options or have we recommended some non public functions for users to debug their PT 2.0 programs. 
Heuristic for not putting something in public is that it’s an experimental subsystem with the goal of turning it on by default, it’s very core dev centric, meta centric, a bunch of different configs that should be batched into a single user facing one, or something that needs to be renamed because the name is confusing #### Top level `torch.compile()` -> already is a public API it does require some minor improvements like having configs be passed in to any backend and not just inductor (EDIT: This was already done https://github.com/pytorch/pytorch/pull/99645l) and renaming `mode=reduce-overhead` to `mode=cudagraph` To make sure that PT 2.0 is supported with a given pytorch version users can create a new public function and this would replace the need for `try/except` blocks around `import torch._dynamo` that has been populating user code. ```python def pt2_enabled(): if hasattr(torch, 'compile'): return True else: return False ``` For all of the below they will be translated to `torch.compiler.function_name()` #### From _dynamo As a starting point we looked at https://github.com/pytorch/pytorch/blob/main/torch/_dynamo/__init__.py and we suggest redefining these functions in `pytorch/torch/compiler/__init__.py` It might also make sense to split them over multiple files and import them in `__init__.py` but because the number of functions is small it'd probably be fine to add them all into a single compiler/__init__.py until this list becomes larger 1. `reset()` 2. `allow_in_graph()` 10. `list_backends()` 12. `compile()`: torch.compile() would be mostly a shell function passing arguments to torch.compiler.compile() 13. `assume_constant_result()`: TODO: Double check how this is useful 15. `torch._dynamo.disable()` Some notable omissions 11. `explain()`: We need to clean up the output for this function, make it a data class and pretty printable 1. `forbid_in_graph()`: Considered adding this but should instead consolidate on `disallow_in_graph` 2. 
`optimize_assert()`: Already covered by `torch.compile(fullgraph=True)` 3. `check_if_dynamo_supported()`: this would be supplanted by pt2_enabled() 4. `compilation_metrics`, `graph_breaks_reasons` ..: would all be accessed via `torch.compiler.explain()` 5. `replay` does not seem useful to end customers 6. `graph_break()`: Mostly useful for debugging or unit tests 9. `register_backend()`: End users will just pass a string backend to torch.compile, only devs will create new backends 10. `export()` : Eventually this needs to be public but for now it’s not ready so just highlighting that it will be in the public API eventually 11. `disallow_in_graph()`: Usage is limited 12. `mark_static()`: we can keep this private until dynamic=True is recommended in stable 13. `mark_dynamic()`: we can keep this private until dynamic=True is recommended in trunk 14. 8. `OptimizedModule`: This is the only class that we'd expose but is crucial since users are running code like `if isinstance(mod, OptimizedModule): torch.save(mod._orig_mod)` EDIT: because we fixed pickling we no longer need to expose this 15. `is_compiling()`: Still not clear how this is useful to end users There are also config variables which we need to expose https://github.com/pytorch/pytorch/blob/main/torch/_dynamo/config.py Some of our configs are useful dev flags, others are to gate experimental functionality and others are essential debugging tools and we separate out the essential debugging and logging tools to a public facing config. TODO: I still need to think of a good way of porting the config in a BC way here are some ideas 1. Just make all passes available and controllable via `torch.compile(options={})` but only show docstrings for the ones users should care about. The current problem with our config system is we have 3 ways of setting them once via `options={}`, environment variables and variables in `config.py`, it'd be worth settling on one source of truth and have that be the public API. 
The configs we should make public are 1. `log_file_name` 2. `verbose` 3. `cache_size_limit` 4. `repro_level` and `repro_after`: Although we can rename these to minifier and give human readable names to the levels Everything else should stay private in particular 1. `print_graph_breaks`, `print_specializations`: should be supplanted by `explain()` for public users 2. dynamic shape configs : Users should only have to worry about `torch.compile(dynamic=True/False)` 3. The distributed flags, hook or guard configs: If we tell a user to use FSDP and DDP then the flag should be enabled by default or be in a private namespace 4. The fbcode flags: Obviously no need to be user facing 5. Skip/Allow lists: Not something normal users should play around with #### From _inductor Very little of inductor should be exposed in a public facing API, our core audience as in people writing models mostly just need information on what certain passes mean and how to control them a high level and they can do this with `torch.compile(options={})` so the goal here should be more to make available passes clearer and ideally consolidate them into `torch.compile()` docstrings or modes. There are some exceptions though from https://github.com/pytorch/pytorch/blob/main/torch/_inductor/__init__.py 1. `list_mode_options()` 2. `list_options()`: this needs an additional pass to hide internal or debug options For both of these we’d rename them to compiler.inductor_list_mode_options and compiler.inductor_list_options() since they would be in the same init file as the one for dynamo Notable omissions 1. `_inductor.compile()`: Because of users are coming in with their own fx graph, they are likely developers 2. `_inductor.aot_compile()`:Again this is about capturing and modifying fx graphs so users APIs don't need to be public However the configs are a slightly different story, because we can choose to either 1. Make all configs public 2. Make some configs public and keep most of the private ones. 
If public config is set it should override the private version 3. Make all configs controllable via `torch.compile(options={})` but make list_options() hide more things For now 3 seems like the most reasonable choice with some high level configs we’ll keep like TORCH_COMPILE_DEBUG Regardless here's what should probably be public or advertised more 1. `disable_progress` and verbose_progress: Combine and enable by default 2. `fallback_random`: We could make the case this shouldn't be public if a top level deterministic mode enables this 3. `profile_bandwidth`: Or could make the case that this should be in TORCH_COMPILE_DEBUG Notable omissions 1. Any config that would generally improve performance for most that we should probably enable by default but might be disabled in the short term because of stability: example `epilogue_fusion`, `pattern_matcher`, `reordering` 2. Autotuning flags: Should just sit behind `torch.compile(mode="max-autotune")` like `max_autotune`, `max_autotune_gemm` 3. `coordinate_descent_tuning`: This one I'm a bit mixed about, maybe it should just also fall into `mode="max-autotune"` 4. `trace`: `TORCH_COMPILE_DEBUG` is the best flag for all of this 5. `triton.cudagraphs`: Default should be `torch.compile(mode="reduce-overhead")` - I'd go further and rename the `mode=cudagraph` and we can keep reduce-overhead for BC reasons 6. `triton_unique_kernel_names`: Mostly useful for devs debugging 7. `dce`: which doesn't really do anything 8. `shape_padding`: Elias is working on enabling this by default in which case we also remove it ## Mechanics This PR would include the public functions with their docstrings Another PR will take a stab at the configs And for work where the APIs are still being cleaned up whether it's minifier or escape hatches, export or dynamic shapes, aot_inductor etc.. 
we’ll keep them private until a public commitment can be made Pull Request resolved: https://github.com/pytorch/pytorch/pull/102182 Approved by: https://github.com/jansel, https://github.com/albanD	2023-06-13 19:52:17 +00:00
Michael Lazos	6c6c897d6b	Add graph break logging option instead of config flag (#103202 ) Make graph break logging a logging option vs a config setting Pull Request resolved: https://github.com/pytorch/pytorch/pull/103202 Approved by: https://github.com/yanboliang, https://github.com/anijain2305	2023-06-12 19:52:31 +00:00
shaoyf42	443edb9015	[DOCS][DDP]Fix the simple of saving and reloading PowerSGD state and hook. (#102721 ) Fix the simple of saving and reloading PowerSGD state and hook. Pull Request resolved: https://github.com/pytorch/pytorch/pull/102721 Approved by: https://github.com/H-Huang	2023-06-10 00:15:00 +00:00
Weiming Zhao	28f43c767c	Fix outdated log settings in doc (#102285 ) (#102286 ) Replace torch._dynamo.config.loglevel=<level> with torch._logging.set_logs(dynamo=<level>) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/102286 Approved by: https://github.com/msaroufim, https://github.com/Neilblaze	2023-06-07 18:07:20 +00:00
David Berard	038955f489	torch.compile docs: "Profiling to understand torch.compile performance" (#102862 ) Docs on how to use torch.profiler.profile to understand torch.compile performance. Pull Request resolved: https://github.com/pytorch/pytorch/pull/102862 Approved by: https://github.com/eellison	2023-06-06 22:00:36 +00:00
Eli Uriegas	e26f5b2ac7	docs: Render bullet points correctly (#103021 ) This wasn't rendering correctly on the website, this should make it so that the bullet points actually show correctly now. Signed-off-by: Eli Uriegas <eliuriegas@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/103021 Approved by: https://github.com/albanD	2023-06-06 00:22:49 +00:00
Elias Ellison	4479e2fa19	fix profiling ref in side panel (#103014 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/103014 Approved by: https://github.com/msaroufim	2023-06-05 21:19:51 +00:00
Elias Ellison	d89c719160	Fix torch.compile side panels refs (#102407 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/102407 Approved by: https://github.com/msaroufim	2023-06-05 20:08:40 +00:00
PyTorch MergeBot	258d398eec	Revert "torch.compiler public namespace (#102182 )" This reverts commit `b5840f99c3`. Reverted https://github.com/pytorch/pytorch/pull/102182 on behalf of https://github.com/DanilBaibak due to Break internal build ([comment](https://github.com/pytorch/pytorch/pull/102182#issuecomment-1576144551))	2023-06-05 06:52:37 +00:00
Mark Saroufim	b5840f99c3	torch.compiler public namespace (#102182 ) # torch.compiler public API ## Goal The goal of this document is to describe the public facing API for torchdynamo and torchinductor. Today both dynamo and torchinductor are in `torch/_dynamo` and `torch/_inductor` namespace with the only public function `torch.compile()` which is directly placed in `torch/__init__.py` This poses a few problems for users trying to take dependencies on PyTorch 2.0 1. Unclear BC guarantees 2. No builtin discovery mechanism outside of reading the source code 3. No hard requirements for docstrings or type annotations Most importantly it mixes two personas the PyTorch 2.0 developer vs the PyTorch 2.0 customer so this is an attempt to address this. We draw a lot of inspiration from the `functorch` migration to the `func` namespace. ## Alternate names We did discuss some other alternative names 1. `torch.compile` -> problem is this would break BC on the existing `torch.compile` function 2. `torch.dynamo` -> `dynamo` is so far not something we've deliberately hidden from users but problem is now figuring out what it's `_dynamo` vs `dynamo` might be confusing 3. 
`torch.compiler` -> 1 would be better but to keep BC this is a good compromise # The general approach ## Proposal 1 In https://github.com/pytorch/pytorch/blob/main/torch/_dynamo/__init__.py We have a function called `reset()`, this function is essential if users are trying to `torch.compile()` a model under different settings ```python # in _dynamo/ def reset(): do_reset_stuff() ``` Instead we propose ```python # in compiler/ def reset(): do_reset_stuff() # As in copy paste the logic from _dynamo.reset # in _dynamo/ import warnings import inspect def reset(): function_name = inspect.currentframe().f_code.co_name warnings.warn(f"{function_name} is deprecated, use compiler.{function_name} instead", DeprecationWarning) return compiler.reset() ``` ## Proposal 2 ```python # in compiler/ def reset(): """ Docstrings here """ _dynamo.reset() # in _dynamo/ No changes ``` Consensus so far seems to be proposal 2 since fewer warnings will be less jarring and it’ll make it quite easy to merge the public API ## Docstrings The above was an example of a function that has no inputs or outputs but there are other functions which could use an improvement in their docstrings, for example allow_in_graph actually works over lists of functions but that’s not mentioned anywhere in the example only if you read the source code. def allow_in_graph(fn): """ Customize which functions TorchDynamo will include in the generated graph. Similar to `torch.fx.wrap()`. Parameters: fn (callable or list/tuple): The function(s) to be allowed in the graph. Returns: callable or list/tuple: The input function(s) included in the graph. Examples: Customize inclusion of a single function: :: torch._dynamo.allow_in_graph(my_custom_function) Customize inclusion of multiple functions: :: torch._dynamo.allow_in_graph([my_custom_function1, my_custom_function2]) @torch._dynamo.optimize(...) def fn(a): x = torch.add(x, 1) x = my_custom_function(x) x = torch.add(x, 1) return x fn(...) 
Notes: The `allow_in_graph` function allows customization of which functions TorchDynamo includes in the generated graph. It can be used to include specific functions that are not automatically captured by TorchDynamo. If `fn` is a list or tuple, `allow_in_graph` will be called recursively on each element in the sequence. Once a function is allowed in the graph using `allow_in_graph`, it will be captured in the graph generated by TorchDynamo. This customization enables more fine-grained control over the functions included in the graph. Note that `allow_in_graph` expects the input `fn` to be a callable. """ if isinstance(fn, (list, tuple)): return [allow_in_graph(x) for x in fn] assert callable(fn), "allow_in_graph expects a callable" allowed_functions._allowed_function_ids.add(id(fn)) allowed_functions._disallowed_function_ids.remove(id(fn)) return fn So to make the API public, we’d have to write similar docstrings for all public functions we’d like to create. The benefit of this approach is that 1. No BC risks, internal and external users relying on our tooling can slowly wean off the private functions. 2. We will also have to write correct docstrings which will automatically make our documentation easier to maintain and render correctly on pytorch.org 3. We already have some BC guarantees already, we don’t kill OptimizedModule, we rejected the PR to change the config system The con of this approach is that Will be stuck with some potentially suboptimal functions/classes that you can’t kill ## Testing strategy If the approach is to mostly make a public function call an already tested private function then all we need to do is ensure that the function signatures don't change ## Which functions should be in the public API Our heuristic for deciding whether something should be public or not is are users already relying on it for lack of other options or have we recommended some non public functions for users to debug their PT 2.0 programs. 
Heuristic for not putting something in public is that it’s an experimental subsystem with the goal of turning it on by default, it’s very core dev centric, meta centric, a bunch of different configs that should be batched into a single user facing one, or something that needs to be renamed because the name is confusing #### Top level `torch.compile()` -> already is a public API it does require some minor improvements like having configs be passed in to any backend and not just inductor (EDIT: This was already done https://github.com/pytorch/pytorch/pull/99645l) and renaming `mode=reduce-overhead` to `mode=cudagraph` To make sure that PT 2.0 is supported with a given pytorch version users can create a new public function and this would replace the need for `try/except` blocks around `import torch._dynamo` that has been populating user code. ```python def pt2_enabled(): if hasattr(torch, 'compile'): return True else: return False ``` For all of the below they will be translated to `torch.compiler.function_name()` #### From _dynamo As a starting point we looked at https://github.com/pytorch/pytorch/blob/main/torch/_dynamo/__init__.py and we suggest redefining these functions in `pytorch/torch/compiler/__init__.py` It might also make sense to split them over multiple files and import them in `__init__.py` but because the number of functions is small it'd probably be fine to add them all into a single compiler/__init__.py until this list becomes larger 1. `reset()` 2. `allow_in_graph()` 10. `list_backends()` 12. `compile()`: torch.compile() would be mostly a shell function passing arguments to torch.compiler.compile() 13. `assume_constant_result()`: TODO: Double check how this is useful 15. `torch._dynamo.disable()` Some notable omissions 11. `explain()`: We need to clean up the output for this function, make it a data class and pretty printable 1. `forbid_in_graph()`: Considered adding this but should instead consolidate on `disallow_in_graph` 2. 
`optimize_assert()`: Already covered by `torch.compile(fullgraph=True)` 3. `check_if_dynamo_supported()`: this would be supplanted by pt2_enabled() 4. `compilation_metrics`, `graph_breaks_reasons` ..: would all be accessed via `torch.compiler.explain()` 5. `replay` does not seem useful to end customers 6. `graph_break()`: Mostly useful for debugging or unit tests 9. `register_backend()`: End users will just pass a string backend to torch.compile, only devs will create new backends 10. `export()` : Eventually this needs to be public but for now it’s not ready so just highlighting that it will be in the public API eventually 11. `disallow_in_graph()`: Usage is limited 12. `mark_static()`: we can keep this private until dynamic=True is recommended in stable 13. `mark_dynamic()`: we can keep this private until dynamic=True is recommended in trunk 14. `OptimizedModule`: This is the only class that we'd expose but is crucial since users are running code like `if isinstance(mod, OptimizedModule): torch.save(mod._orig_mod)` EDIT: because we fixed pickling we no longer need to expose this 15. `is_compiling()`: Still not clear how this is useful to end users There are also config variables which we need to expose https://github.com/pytorch/pytorch/blob/main/torch/_dynamo/config.py Some of our configs are useful dev flags, others are to gate experimental functionality and others are essential debugging tools and we separate out the essential debugging and logging tools to a public facing config. TODO: I still need to think of a good way of porting the config in a BC way, here are some ideas 1. Just make all passes available and controllable via `torch.compile(options={})` but only show docstrings for the ones users should care about. The current problem with our config system is we have 3 ways of setting them: via `options={}`, environment variables and variables in `config.py`, it'd be worth settling on one source of truth and have that be the public API. 
The configs we should make public are 1. `log_file_name` 2. `verbose` 3. `cache_size_limit` 4. `repro_level` and `repro_after`: Although we can rename these to minifier and give human readable names to the levels Everything else should stay private in particular 1. `print_graph_breaks`, `print_specializations`: should be supplanted by `explain()` for public users 2. dynamic shape configs : Users should only have to worry about `torch.compile(dynamic=True/False)` 3. The distributed flags, hook or guard configs: If we tell a user to use FSDP and DDP then the flag should be enabled by default or be in a private namespace 4. The fbcode flags: Obviously no need to be user facing 5. Skip/Allow lists: Not something normal users should play around with #### From _inductor Very little of inductor should be exposed in a public facing API, our core audience as in people writing models mostly just need information on what certain passes mean and how to control them at a high level and they can do this with `torch.compile(options={})` so the goal here should be more to make available passes clearer and ideally consolidate them into `torch.compile()` docstrings or modes. There are some exceptions though from https://github.com/pytorch/pytorch/blob/main/torch/_inductor/__init__.py 1. `list_mode_options()` 2. `list_options()`: this needs an additional pass to hide internal or debug options For both of these we’d rename them to compiler.inductor_list_mode_options and compiler.inductor_list_options() since they would be in the same init file as the one for dynamo Notable omissions 1. `_inductor.compile()`: Because users are coming in with their own fx graph, they are likely developers 2. `_inductor.aot_compile()`: Again this is about capturing and modifying fx graphs so user APIs don't need to be public However the configs are a slightly different story, because we can choose to either 1. Make all configs public 2. Make some configs public and keep most of the private ones. 
If public config is set it should override the private version 3. Make all configs controllable via `torch.compile(options={})` but make list_options() hide more things For now 3 seems like the most reasonable choice with some high level configs we’ll keep like TORCH_COMPILE_DEBUG Regardless here's what should probably be public or advertised more 1. `disable_progress` and verbose_progress: Combine and enable by default 2. `fallback_random`: We could make the case this shouldn't be public if a top level deterministic mode enables this 3. `profile_bandwidth`: Or could make the case that this should be in TORCH_COMPILE_DEBUG Notable omissions 1. Any config that would generally improve performance for most users that we should probably enable by default but might be disabled in the short term because of stability: example `epilogue_fusion`, `pattern_matcher`, `reordering` 2. Autotuning flags: Should just sit behind `torch.compile(mode="max-autotune")` like `max_autotune`, `max_autotune_gemm` 3. `coordinate_descent_tuning`: This one I'm a bit mixed about, maybe it should also fall under `mode="max-autotune"` 4. `trace`: `TORCH_COMPILE_DEBUG` is the best flag for all of this 5. `triton.cudagraphs`: Default should be `torch.compile(mode="reduce-overhead")` - I'd go further and rename the mode to `mode=cudagraph` and we can keep reduce-overhead for BC reasons 6. `triton_unique_kernel_names`: Mostly useful for devs debugging 7. `dce`: which doesn't really do anything 8. `shape_padding`: Elias is working on enabling this by default in which case we also remove it ## Mechanics This PR would include the public functions with their docstrings Another PR will take a stab at the configs And for work where the APIs are still being cleaned up whether it's the minifier or escape hatches, export or dynamic shapes, aot_inductor etc.. 
we’ll keep them private until a public commitment can be made Pull Request resolved: https://github.com/pytorch/pytorch/pull/102182 Approved by: https://github.com/jansel	2023-06-02 14:38:55 +00:00
Weiming Zhao	b76af5f9a6	Fix broken link in Dynamo's guards doc (#102183 ) (#102185 ) This PR fixes broken link for the code referenced in the guards doc. Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/102185 Approved by: https://github.com/mikaylagawarecki, https://github.com/ezyang	2023-06-02 14:36:28 +00:00
Thomas J. Fan	0d17bd5fa4	DOC Fixes unpacking issue in dynamo explain docs (#101761 ) This PR updates the docs to be consistent with `torch.explain` which currently returns 6 items: `bfb3941ad8/torch/_dynamo/eval_frame.py (L622-L629)` Pull Request resolved: https://github.com/pytorch/pytorch/pull/101761 Approved by: https://github.com/desertfire	2023-05-25 22:32:15 +00:00
Elias Ellison	aa83a52742	Profiling doc (#101895 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/101895 Approved by: https://github.com/msaroufim, https://github.com/shunting314	2023-05-25 04:57:38 +00:00
Elias Ellison	4692ea76a0	Fine grained apis docs (#101897 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/101897 Approved by: https://github.com/msaroufim	2023-05-23 19:03:44 +00:00
Elias Ellison	2bce7c8f46	CUDAGraph trees doc (#101902 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/101902 Approved by: https://github.com/msaroufim	2023-05-23 03:35:43 +00:00
Ramil Nugmanov	2ae87a1f87	missed StackDataset documentation (#101927 ) New dataset class added by #101338 missed in documentation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/101927 Approved by: https://github.com/kit1980	2023-05-22 21:12:16 +00:00
Ren Pang	a630328695	Fix Backend docs search items (#101214 ) Fixes #100944 ## New <img width="1142" alt="image" src="https://github.com/pytorch/pytorch/assets/13214530/79102f2e-8a8f-4169-be53-9248397e653c"> <img width="765" alt="image" src="https://github.com/pytorch/pytorch/assets/13214530/4e5f17e7-a445-4822-ac8a-0d73c9ed71ee"> ## Old <img width="1341" alt="image" src="https://github.com/pytorch/pytorch/assets/13214530/985b4ec9-6d11-4962-8619-3c14ec09c3d9"> <img width="1112" alt="image" src="https://github.com/pytorch/pytorch/assets/13214530/e8dcf1a9-73e7-4fd6-8adc-eb036b1bb87b"> Pull Request resolved: https://github.com/pytorch/pytorch/pull/101214 Approved by: https://github.com/albanD	2023-05-22 14:58:38 +00:00
Rickey K. Liang	807d81155f	[CUDA][CUBLAS] Fix BF16 reduced precision reduction note in Numerical accuracy docs (#101884 ) Fixes #100966 Ref #101044 Align implementation and documentation. (This is what's previously missed from the above issue and PR) Pull Request resolved: https://github.com/pytorch/pytorch/pull/101884 Approved by: https://github.com/eqy, https://github.com/ezyang	2023-05-21 17:38:00 +00:00
Mark Saroufim	3666ca9d97	Dynamic Shape Doc (#101885 ) <!-- copilot:poem --> ### <samp>🤖 Generated by Copilot at 2f25c1e</samp> > _Dynamic shapes guide_ > _`TorchDynamo` and `TorchInductor`_ > _Learn from data flow_ Thanks @ezyang Pull Request resolved: https://github.com/pytorch/pytorch/pull/101885 Approved by: https://github.com/eellison, https://github.com/ezyang	2023-05-19 21:43:22 +00:00
Mark Saroufim	ff5b9428aa	Fake Tensor Docs (#101882 ) <!-- copilot:poem --> ### <samp>🤖 Generated by Copilot at 75f33ae</samp> > _Fake tensors help_ > _compile and optimize code_ > _`PT2` in autumn_ Thanks @ezyang Pull Request resolved: https://github.com/pytorch/pytorch/pull/101882 Approved by: https://github.com/eellison, https://github.com/ezyang	2023-05-19 21:39:34 +00:00
Mark Saroufim	581d13a069	Add Logging Doc to compile index (#101888 ) <!-- copilot:poem --> ### <samp>🤖 Generated by Copilot at ba85a41</samp> > _`logging` module_ > _documents PyTorch events_ > _cutting through the fog_ Thanks @mlazos Pull Request resolved: https://github.com/pytorch/pytorch/pull/101888 Approved by: https://github.com/eellison	2023-05-19 21:29:25 +00:00
Mark Saroufim	2dd33c71c1	Docs for torchcompile and functorch (#101881 ) <!-- copilot:poem --> ### <samp>🤖 Generated by Copilot at b5f48b6</samp> > _`torch.compile` docs_ > _Add a new section for `func`_ > _Winter of features_ Thanks @zou3519 Pull Request resolved: https://github.com/pytorch/pytorch/pull/101881 Approved by: https://github.com/eellison, https://github.com/zou3519	2023-05-19 21:23:43 +00:00
Jane Xu	cde597efa1	[docs] Warn that GradScaler can scale under 1 (#101569 ) Completes action item 1 in https://github.com/pytorch/pytorch/issues/99640 Pull Request resolved: https://github.com/pytorch/pytorch/pull/101569 Approved by: https://github.com/ngimel	2023-05-16 23:56:07 +00:00
PyTorch MergeBot	66eef31444	Revert "[fx] change from #users to num_users in graph printout (#101140 )" This reverts commit `e568c5a18d`. Reverted https://github.com/pytorch/pytorch/pull/101140 on behalf of https://github.com/jeanschmidt due to There are internal changes to this commit that are preventing landing, so I am reverting to unblock the diff train ([comment](https://github.com/pytorch/pytorch/pull/101140#issuecomment-1547989487))	2023-05-15 14:35:22 +00:00
Ramin Azarmehr	0be53d83fc	[MPS] Add support for MPSProfiler Python bindings (#101002 ) - Added torch.mps.profiler.[start() and stop()] APIs with RST documentation - Added test case in test_mps Pull Request resolved: https://github.com/pytorch/pytorch/pull/101002 Approved by: https://github.com/malfet	2023-05-12 21:55:34 +00:00
Yueming Hao	a12b640dc9	Fix typos in troubleshooting.rst (#101305 ) There are several typos in the troubleshooting documentation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/101305 Approved by: https://github.com/desertfire	2023-05-12 21:05:13 +00:00
Ran Ding	b5c8d0359c	Update autograd.rst (#101007 ) Fixes #ISSUE_NUMBER typo fix and small change to improve clarity Pull Request resolved: https://github.com/pytorch/pytorch/pull/101007 Approved by: https://github.com/lezcano, https://github.com/anjali411	2023-05-12 11:47:51 +00:00
Michael Suo	e568c5a18d	[fx] change from #users to num_users in graph printout (#101140 ) `#users` means stuff in various chat apps, which makes it annoying to copypasta graphs into them. Pull Request resolved: https://github.com/pytorch/pytorch/pull/101140 Approved by: https://github.com/ezyang	2023-05-12 04:34:01 +00:00
eqy	33f3dca6b5	[CUDA][CUBLAS] Fix BF16 reduced precision reduction note in docs (#101044 ) #100966 CC @ngimel @ezyang Pull Request resolved: https://github.com/pytorch/pytorch/pull/101044 Approved by: https://github.com/ngimel	2023-05-10 06:50:58 +00:00
eqy	6e2efd16d8	[CUDA][CUBLAS] Add cuBLAS workspace allocation behavior to docs (#100919 ) Adding to the docs for now, hopefully we can move to `cudaMallocAsync`-backed cuBLAS workspaces soon which should alleviate the recent confusion around `cuBLAS` "leaking" memory through workspaces. Pull Request resolved: https://github.com/pytorch/pytorch/pull/100919 Approved by: https://github.com/ngimel	2023-05-10 06:40:26 +00:00
fduwjj	953aa6d90e	[TP] Enable more generic attn in Tensor Parallelism (#100508 ) To make TP more generic for Attention module, we come up with this new col/rowwise parallel style. Basically, the idea behind is that: We only do DTensor op for Col/Rowwise sharded part. For the rest of ATen ops, we will leave it to Tensor ops. And we set this behavior as default for Colwise and Rowwise parallel style. If people want to customize it, they can always pass in different prepare_input or prepare_output Pull Request resolved: https://github.com/pytorch/pytorch/pull/100508 Approved by: https://github.com/wanchaol	2023-05-07 18:15:49 +00:00
Michael Lazos	850556ed6e	Add "all" option to logging (#100664 ) Adds the long-promised "all" option to logging. Pull Request resolved: https://github.com/pytorch/pytorch/pull/100664 Approved by: https://github.com/lezcano	2023-05-06 01:11:18 +00:00
Michael Lazos	c525440ba3	Logging documentation updates (#100595 ) Updated the logging.rst with info about the env var. Pull Request resolved: https://github.com/pytorch/pytorch/pull/100595 Approved by: https://github.com/msaroufim, https://github.com/lezcano	2023-05-04 21:54:02 +00:00
Animesh Jain	8994d9e610	[dynamo] Hide guard_fail_hook behind a flag to improve cache lookup time (+10% DebertaV2) (#100590 ) For TorchDynamo eager backend, DebertaV2 speedup improves from 0.77x to 0.87x. Pull Request resolved: https://github.com/pytorch/pytorch/pull/100590 Approved by: https://github.com/voznesenskym, https://github.com/wconstab	2023-05-04 18:52:21 +00:00
Bin Bao	edebad81a9	Add a rst doc for the performance dashboard (#100592 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/100592 Approved by: https://github.com/msaroufim, https://github.com/huydhn	2023-05-04 18:28:09 +00:00
Richard Barnes	9c185b6b46	[codemod] Replace hasattr with getattr in caffe2/docs/source/notes/extending.rst (#100598 ) Summary: The pattern ``` X.Y if hasattr(X, "Y") else Z ``` can be replaced with ``` getattr(X, "Y", Z) ``` The [getattr](https://www.w3schools.com/python/ref_func_getattr.asp) function gives more succinct code than the [hasattr](https://www.w3schools.com/python/ref_func_hasattr.asp) function. Please use it when appropriate. This diff is very low risk. Green tests indicate that you can safely Accept & Ship. Test Plan: Sandcastle Differential Revision: D44886464 Pull Request resolved: https://github.com/pytorch/pytorch/pull/100598 Approved by: https://github.com/Skylion007	2023-05-04 16:36:15 +00:00
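The `hasattr` to `getattr` rewrite described in #100598 above can be illustrated with plain Python. The `Config` class and its attributes are hypothetical and only serve the example.

```python
class Config:
    """Hypothetical object that defines `retries` but not `timeout`."""

    retries = 3


cfg = Config()

# Verbose pattern: looks the attribute up twice when it exists.
timeout_verbose = cfg.timeout if hasattr(cfg, "timeout") else 30

# Equivalent, shorter pattern: one lookup with a fallback value.
timeout_short = getattr(cfg, "timeout", 30)

# When the attribute exists, the fallback is ignored.
retries = getattr(cfg, "retries", 1)

print(timeout_verbose, timeout_short, retries)
```

One difference worth keeping in mind: the default passed to `getattr` is always evaluated, while the conditional expression evaluates its fallback only when the attribute is missing, so the two forms are interchangeable only when the fallback is cheap and free of side effects.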
Angela Yi	8eb82135d1	[docs] Docs for writing ATen IR passes + FX Pattern matching (#100577 ) I'm not really sure where to put this...maybe just link it somewhere in torch.compile docs? Pull Request resolved: https://github.com/pytorch/pytorch/pull/100577 Approved by: https://github.com/msaroufim	2023-05-04 05:17:10 +00:00
shibo	6aeb85add8	add checkpoint support for custom device (#99626 ) Fixes #ISSUE_NUMBER 1. Add checkpoint support for custom devices. 2. Add a device argument: I wanted to add a `device="cuda"` parameter to the `forward` function of `CheckpointFunction` so that the device type can be specified at the call site, but the `apply` function of `torch.autograd.Function` does not support `kwargs`, so I added a variable named `_device` instead. Pull Request resolved: https://github.com/pytorch/pytorch/pull/99626 Approved by: https://github.com/soulitzer	2023-05-04 00:23:42 +00:00
vfdev-5	6a12f10b08	Publicly exposing `torch.backends.cpu.get_cpu_capability()` (#100164 ) Description: - As suggested by Nikita, created `torch.backends.cpu` submodule and exposed `get_cpu_capability`. - In the torchvision Resize method we want to know the current CPU capability in order to pick the appropriate codepath depending on CPU capabilities. The newly coded vectorized resize of uint8 images on AVX2-supported CPUs is now faster than the older way (uint8->float->resize->uint8). However, on non-AVX hardware (e.g. Mac M1) certain configs are slower using native uint8. Pull Request resolved: https://github.com/pytorch/pytorch/pull/100164 Approved by: https://github.com/albanD, https://github.com/malfet	2023-05-03 19:02:07 +00:00
Svetlana Karslioglu	d425da8bf3	Replace master with main in links and docs/conf.py (#100176 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/100176 Approved by: https://github.com/albanD, https://github.com/malfet	2023-05-02 18:20:32 +00:00
Hirochika Matsumoto	f143c92739	[docs] Fix typo in get-started.rst (#100355 ) This PR changes `""nvprims_nvfuser"` which should be a typo to `"nvprims_nvfuser"`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/100355 Approved by: https://github.com/Skylion007, https://github.com/kit1980	2023-05-02 00:29:53 +00:00
BowenBao	c94b6a6712	[ONNX] Introduce 'diagnostics' to 'dynamo_export' api (#99668 ) Summary * Introduce `DiagnosticContext` to `torch.onnx.dynamo_export`. * Remove `DiagnosticEngine` in preparations to update 'diagnostics' in `dynamo_export` to drop dependencies on global diagnostic context. No plans to update `torch.onnx.export` diagnostics. Next steps * Separate `torch.onnx.export` diagnostics and `torch.onnx.dynamo_export` diagnostics. * Drop dependencies on global diagnostic context. https://github.com/pytorch/pytorch/pull/100219 * Replace 'print's with 'logger.log'. Pull Request resolved: https://github.com/pytorch/pytorch/pull/99668 Approved by: https://github.com/justinchuby, https://github.com/abock	2023-05-01 19:58:49 +00:00
pbialecki	8fe91d16b0	Remove CUDA 11.6 note from complex docs (#100118 ) Removes note in the complex docs pointing to the CUDA 11.6 wheels introduced in https://github.com/pytorch/pytorch/pull/80363. Background: this warning was added via https://github.com/pytorch/pytorch/issues/79876 which pointed out a slow compilation time in 11.3. The 11.6 pip wheels were thus recommended but are no longer built, as our current support is 11.7, 11.8 (and 12.1 experimental in nightlies). The note is confusing users as it doesn't explain why 11.6 is needed. Reference: https://discuss.pytorch.org/t/complex-numbers-cuda-11-6-documentation-warning/178588/1 Pull Request resolved: https://github.com/pytorch/pytorch/pull/100118 Approved by: https://github.com/msaroufim	2023-04-27 16:26:27 +00:00
milesial	45bf3f6216	Optimized EMA implementation (#94820 ) This PR proposes an optimized way to do Exponential Moving Average (EMA), which is faster than the current way using `swa_utils.AveragedModel` described in https://pytorch.org/docs/stable/optim.html#custom-averaging-strategies. This implementation is asynchronous, and is built as an optimizer wrapper so that the EMA weight update happens without any additional CPU/GPU sync, just after optimizer steps, and with limited code changes. Example usage: ``` model = Model().to(device) opt = torch.optim.Adam(model.parameters()) opt = EMAOptimizer(opt, device, 0.9999) for epoch in range(epochs): training_loop(model, opt) regular_eval_accuracy = evaluate(model) with opt.swap_ema_weights(): ema_eval_accuracy = evaluate(model) ``` Here are some benchmarks (time per iteration) on various torchvision models: \|model\|this PR iteration time \|swa_utils.AveragedModel iteration time\| iteration speedup \| \|-----\|-----------------------------\|-----------------------\|---------------------------------------------\| \| \| \| \| \| \|regnet_x_1_6gf\|62.73 \|67.998 \|1.08 \| \|regnet_x_3_2gf\|101.75 \|109.422 \|1.08 \| \|regnet_x_400mf\|25.13 \|32.005 \|1.27 \| \|regnet_x_800mf\|33.01 \|37.466 \|1.13 \| \|regnet_x_8gf\|128.13 \|134.868 \|1.05 \| \|regnet_y_16gf\|252.91 \|261.292 \|1.03 \| \|regnet_y_1_6gf\|72.14 \|84.22 \|1.17 \| \|regnet_y_3_2gf\|99.99 \|109.296 \|1.09 \| \|regnet_y_400mf\|29.53 \|36.506 \|1.24 \| \|regnet_y_800mf\|37.82 \|43.634 \|1.15 \| \|regnet_y_8gf\|196.63 \|203.317 \|1.03 \| \|resnet101\|128.80 \|137.434 \|1.07 \| \|resnet152\|182.85 \|196.498 \|1.07 \| \|resnet18\|29.06 \|29.975 \|1.03 \| \|resnet34\|50.73 \|53.443 \|1.05 \| \|resnet50\|76.88 \|80.602 \|1.05 \| \|resnext101_32x8d\|277.29 \|280.759 \|1.01 \| \|resnext101_64x4d\|269.56 \|281.052 \|1.04 \| \|resnext50_32x4d\|100.73 \|101.102 \|1.00 \| \|shufflenet_v2_x0_5\|10.56 \|15.419 \|1.46 \| \|shufflenet_v2_x1_0\|13.11 \|18.525 \|1.41 \| 
\|shufflenet_v2_x1_5\|18.05 \|23.132 \|1.28 \| \|shufflenet_v2_x2_0\|25.04 \|30.008 \|1.20 \| \|squeezenet1_1\|14.26 \|14.325 \|1.00 \| \|swin_b\|264.52 \|274.613 \|1.04 \| \|swin_s\|180.66 \|188.914 \|1.05 \| \|swin_t\|108.62 \|112.632 \|1.04 \| \|swin_v2_s\|220.29 \|231.153 \|1.05 \| \|swin_v2_t\|127.27 \|133.586 \|1.05 \| \|vgg11\|95.52 \|103.714 \|1.09 \| \|vgg11_bn\|106.49 \|120.711 \|1.13 \| \|vgg13\|132.94 \|147.063 \|1.11 \| \|vgg13_bn\|149.73 \|165.256 \|1.10 \| \|vgg16\|158.19 \|172.865 \|1.09 \| \|vgg16_bn\|177.04 \|192.888 \|1.09 \| \|vgg19\|184.76 \|194.194 \|1.05 \| \|vgg19_bn\|203.30 \|213.334 \|1.05 \| \|vit_b_16\|217.31 \|219.748 \|1.01 \| \|vit_b_32\|69.47 \|75.692 \|1.09 \| \|vit_l_32\|223.20 \|258.487 \|1.16 \| \|wide_resnet101_2\|267.38 \|279.836 \|1.05 \| \|wide_resnet50_2\|145.06 \|154.918 \|1.07 \| You can see that in all cases it is faster than using `AveragedModel`. In fact in many cases, adding EMA does not add any overhead since the computation is hidden behind the usual iteration flow. This is a similar implementation to the one currently in [NVIDIA NeMo](https://github.com/NVIDIA/NeMo). If the team is interested in merging this, let me know and I'll add some documentation similar to `swa_utils` and tests. Credits to @szmigacz for the implementation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94820 Approved by: https://github.com/janeyx99	2023-04-26 18:02:11 +00:00
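To make the EMA idea in #94820 above concrete, here is a minimal, synchronous sketch of the update rule and of temporarily swapping in the averaged weights for evaluation. It is not the PR's implementation, which is asynchronous and wraps a PyTorch optimizer; `SimpleEMA` is a hypothetical class operating on plain Python floats, and only the method name `swap_ema_weights` is borrowed from the usage example in the message.

```python
from contextlib import contextmanager


class SimpleEMA:
    """Hypothetical EMA tracker over a mutable list of float weights."""

    def __init__(self, weights, decay):
        self.weights = weights    # live weights, mutated by training
        self.decay = decay
        self.ema = list(weights)  # shadow copy holding the running averages

    def update(self):
        # ema <- decay * ema + (1 - decay) * weight, applied after each step.
        self.ema = [
            self.decay * e + (1.0 - self.decay) * w
            for e, w in zip(self.ema, self.weights)
        ]

    @contextmanager
    def swap_ema_weights(self):
        # Evaluate with the averaged weights, then restore the live ones.
        live = list(self.weights)
        self.weights[:] = self.ema
        try:
            yield
        finally:
            self.weights[:] = live


weights = [0.0, 10.0]
tracker = SimpleEMA(weights, decay=0.5)
weights[0] = 4.0  # pretend an optimizer step changed the first weight
tracker.update()

with tracker.swap_ema_weights():
    inside = list(weights)  # averaged weights are visible here
after = list(weights)       # live weights are back
print(inside, after)
```

A decay of 0.5 is used only to keep the arithmetic easy to follow; the message's own example uses 0.9999.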
Chris Gottbrath	f0e28b1cb9	Adding the maintainers approved in 2023Q1 Core Maintainers meeting (#98520 ) Added Nikita to Core Maintainers Merged MKLDNN with CPU Performance Renamed CUDA to GPU Performance Added Jiong to Compiler and CPU Performance Added Xiaobing to CPU Performance Marking Vitaly and Jian Hui as Emeritus Pull Request resolved: https://github.com/pytorch/pytorch/pull/98520 Approved by: https://github.com/ezyang, https://github.com/soumith, https://github.com/dzhulgakov	2023-04-24 17:58:18 +00:00
Kurt Mohler	1e8cf6ad7f	Add documentation for `torch._logging.set_logs` (#99219 ) Part of #98871 Pull Request resolved: https://github.com/pytorch/pytorch/pull/99219 Approved by: https://github.com/mlazos, https://github.com/lezcano	2023-04-24 08:06:57 +00:00
BowenBao	51742a467d	[ONNX] Fix missing import numpy for docs example (#99663 ) Fixes https://github.com/pytorch/pytorch/issues/99408 Pull Request resolved: https://github.com/pytorch/pytorch/pull/99663 Approved by: https://github.com/justinchuby	2023-04-21 04:06:45 +00:00
Simon Seo	9f95032101	Fix broken links in contribution_guide.rst (#99295 ) mainly from `master` to `main` Pull Request resolved: https://github.com/pytorch/pytorch/pull/99295 Approved by: https://github.com/kit1980	2023-04-20 22:20:56 +00:00
Will Constable	e6aa8e0729	Test and document dynamo backward hooks support (#99382 ) No new support added, but backward hooks are working and now there is a test and some documentation about the limitations (hooks firing after whole graph). Pull Request resolved: https://github.com/pytorch/pytorch/pull/99382 Approved by: https://github.com/yanboliang	2023-04-18 03:03:29 +00:00
Will Constable	6eab5e88c8	Graph-break on allowed modules if they have hooks (#97184 ) Allowed modules are stuck into dynamo's fx graph as call_module nodes, without dynamo doing any tracing of the module. This means during AOT trace time, hooks will fire during tracing when the call_module is executed, but the hooks themselves will disappear after that and not be present in the compiled program. (worse, if they performed any tensor operations, those would get traced so you could end up with part of the hook's functionality). To circumvent this, there are two options for 'allowed modules' with hooks. 1) don't treat them as 'allowed' - trace into them 2) graph-break, so the module is no longer part of the dynamo trace at all (1) will fail for users that opted into allowed modules because they know their module has problems being traced by dynamo. (2) causes graph breaks on common modules such as nn.Linear, just because they are marked as 'allowed'. It would help matters if we could differentiate between types of allowed modules (A) allowed to avoid overheads - used for common ops like nn.Linear (B) allowed to avoid dynamo graphbreaks caused by unsupported code Ideally, we'd use method (1) for group (A) and (2) for (B). For now, graph-break on all cases of allowed modules. Pull Request resolved: https://github.com/pytorch/pytorch/pull/97184 Approved by: https://github.com/jansel	2023-04-15 01:46:15 +00:00
BowenBao	606ce5b653	[ONNX] Introduce Input/Ouptut adapter; Switch to 'DynamoExporter' (#98421 ) Summary * Introduce input/output adapter. Due to design differences, the input/output formats of a PyTorch model and its exported ONNX model are often not the same. E.g., `None` inputs are allowed for a PyTorch model, but are not supported by ONNX. Nested constructs of tensors are allowed for a PyTorch model, but only flattened tensors are supported by ONNX, etc. The new input/output adapter is exported with the model, providing an interface to automatically convert and validate the input/output format. * As suggested by #98251, provide extension for unwrapping user defined python classes for `dynamo.export` based exporter. Unblock huggingface models. * Re-wire tests to run through `DynamoExporter` w/ `dynamo_export` api. Kept `DynamoOptimizeExporter` in the tests for now for coverage of this change. Pull Request resolved: https://github.com/pytorch/pytorch/pull/98421 Approved by: https://github.com/justinchuby, https://github.com/titaiwangms, https://github.com/thiagocrepaldi	2023-04-15 01:13:00 +00:00
PyTorch MergeBot	dda7ce4bb3	Revert "[core][pruning][be] Rename sparsifier folder to pruner (#98758 )" This reverts commit `778fd1922a`. Reverted https://github.com/pytorch/pytorch/pull/98758 on behalf of https://github.com/jcaip due to https://www.internalfb.com/diff/D44905951 need to fix broken import in fbcode	2023-04-13 16:30:47 +00:00
Tugsbayasgalan Manlaibaatar	39fd7f945f	Add Symbool support in python to C++ translation (#98453 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/98453 Approved by: https://github.com/ezyang	2023-04-12 03:21:57 +00:00
Mark Saroufim	bc8cb62bcb	torch.compile benchmark utility (#97699 ) I've had many exchanges that look like this https://github.com/rasbt/faster-pytorch-blog/pull/2 so this is an attempt to make this problem easier Pull Request resolved: https://github.com/pytorch/pytorch/pull/97699 Approved by: https://github.com/ezyang	2023-04-12 03:02:06 +00:00
soulitzer	367051e47e	[docs] Add missing functions to autograd.rst (#98854 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/98854 Approved by: https://github.com/albanD	2023-04-11 20:45:49 +00:00
Jesse Cai	778fd1922a	[core][pruning][be] Rename sparsifier folder to pruner (#98758 ) Summary: att Test Plan: ``` python test/test_ao_sparsity.py ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/98758 Approved by: https://github.com/jerryzh168	2023-04-11 17:26:29 +00:00
Edward Z. Yang	b8b840be3d	Convert logging f-strings to use % format, part five (#98765 ) This does some annoying but simple cases by hand. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/98765 Approved by: https://github.com/wanchaol	2023-04-11 13:17:59 +00:00
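For readers unfamiliar with the conversion named in the entry above, the following sketch uses only the standard `logging` module. The logger name and message are made up for illustration. A common motivation for the `%` style (not stated in the commit message itself) is that the arguments are interpolated only when a record is actually emitted.

```python
import io
import logging

# Capture log output in memory so the example is self-contained.
stream = io.StringIO()
logger = logging.getLogger("percent_format_demo")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(logging.StreamHandler(stream))

name, count = "conv1", 3

# Before: the f-string is built eagerly, even though DEBUG is filtered out here.
logger.debug(f"module {name} traced {count} times")

# After: the message template and arguments are passed separately.
logger.debug("module %s traced %d times", name, count)  # filtered, never formatted
logger.info("module %s traced %d times", name, count)   # emitted and formatted

output = stream.getvalue()
print(output, end="")
```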
Guspan Tanadi	ab385bd49e	docs: Linking ResNeXt PyTorch Hub Pipeline (#98689 ) Introducing ResNeXt model as link to PyTorch Hub see Skip connections section. Handle issue in #98690. Pull Request resolved: https://github.com/pytorch/pytorch/pull/98689 Approved by: https://github.com/zou3519, https://github.com/kit1980	2023-04-11 02:20:26 +00:00
Will Constable	390c51bf87	Skip nnmodule hook guards by default (#98371 ) This PR makes basic nnmodule forward hooks work by default, without any overhead. But it leaves silent correctness issues if users modify/remove their hooks later, thus also emits a warning. - the usual case is to not use hooks, so avoid guard overhead here - registering any hook before compile will trigger a warning about hook support - registering a hook later (or removing one) requires user knowledge and opting in, currently this isn't warnable (but maybe we can observe compiled nnmodules to make it warnable). Why skip hook guards by default instead of not tracing __call__/hooks by default? - avoid having a mode flag that alters dynamo tracing behavior (harder to test both codepaths in CI with full coverage) - the most basic hook usecase (registering a hook before compile, and never removing it) will work by default with this PR, while it would require enablement and incur overhead in the 'not tracing __call__' proposal. Pull Request resolved: https://github.com/pytorch/pytorch/pull/98371 Approved by: https://github.com/jansel	2023-04-07 15:10:51 +00:00
BJ Hargrave	555ab310dc	Add itemsize and nbytes properties to Tensor (#98322 ) Adds properties for itemsize and nbytes to Tensor matching the properties in NumPy. Fixes https://github.com/pytorch/pytorch/issues/12728 Pull Request resolved: https://github.com/pytorch/pytorch/pull/98322 Approved by: https://github.com/ezyang	2023-04-05 12:11:55 +00:00
Aaron Bockover	558e5a240e	Introduce torch.onnx.dynamo_export API (#97920 ) This is the first phase of the new ONNX exporter API for exporting from TorchDynamo and FX, and represents the beginning of a new era for exporting ONNX from PyTorch. The API here is a starting point upon which we will layer more capability and expressiveness in subsequent phases. This first phase introduces the following into `torch.onnx`: ```python dynamo_export( model: torch.nn.Module, /, model_args, export_options: Optional[ExportOptions] = None, model_kwargs, ) -> ExportOutput: ... class ExportOptions: opset_version: Optional[int] = None dynamic_shapes: Optional[bool] = None logger: Optional[logging.Logger] = None class ExportOutputSerializer(Protocol): def serialize( self, export_output: ExportOutput, destination: io.BufferedIOBase, ) -> None: ... class ExportOutput: model_proto: onnx.ModelProto def save( self, destination: Union[str, io.BufferedIOBase], , serializer: Optional[ExportOutputSerializer] = None, ) -> None: ... ``` In addition to the API in the first commit on this PR, we have a few experiments for exporting Dynamo and FX to ONNX that this PR rationalizes through the new Exporter API and adjusts tests to use the new API. - A base `FXGraphModuleExporter` exporter from which all derive: - `DynamoExportExporter`: uses dynamo.export to acquire FX graph - `DynamoOptimizeExporter`: uses dynamo.optimize to acquire FX graph - `FXSymbolicTraceExporter`: uses FX symbolic tracing The `dynamo_export` API currently uses `DynamoOptimizeExporter`. ### Next Steps (subsequent PRs): * Combine `DynamoExportExporter` and `DynamoOptimizeExporter` into a single `DynamoExporter`. * Make it easy to test `FXSymbolicTraceExporter` through the same API; eventually `FXSymbolicTraceExporter` goes away entirely when the Dynamo approach works for large models. We want to keep `FXSymbolicTraceExporter` around for now for experimenting and internal use. 
* Parameterize (on `ExportOptions`) and consolidate Dynamo exporter tests. - This PR intentionally leaves the existing tests unchanged as much as possible except for the necessary plumbing. * Subsequent API phases: - Diagnostics - Registry, dispatcher, and Custom Ops - Passes - Dynamic shapes Fixes #94774 Pull Request resolved: https://github.com/pytorch/pytorch/pull/97920 Approved by: https://github.com/justinchuby, https://github.com/titaiwangms, https://github.com/thiagocrepaldi, https://github.com/shubhambhokare1	2023-04-04 18:13:29 +00:00
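The `ExportOutputSerializer` protocol described in the entry above can be illustrated without PyTorch or ONNX installed. In this sketch everything except the `serialize(export_output, destination)` shape is an assumption: `FakeExportOutput` is a stand-in that holds raw bytes instead of an `onnx.ModelProto`, `RawBytesSerializer` is hypothetical, and the way `save` handles a string path is a guess at reasonable behavior, not the library's code.

```python
import io
from dataclasses import dataclass
from typing import Optional, Protocol, Union


class Serializer(Protocol):
    """Mirrors the serialize() shape quoted in the commit message."""

    def serialize(
        self, export_output: "FakeExportOutput", destination: io.BufferedIOBase
    ) -> None: ...


class RawBytesSerializer:
    """Hypothetical serializer: writes the stand-in payload unchanged."""

    def serialize(self, export_output, destination):
        destination.write(export_output.model_bytes)


@dataclass
class FakeExportOutput:
    """Stand-in for ExportOutput; bytes replace onnx.ModelProto."""

    model_bytes: bytes

    def save(
        self,
        destination: Union[str, io.BufferedIOBase],
        *,
        serializer: Optional[Serializer] = None,
    ) -> None:
        serializer = serializer or RawBytesSerializer()
        if isinstance(destination, str):
            # Assumed behavior: a string is treated as a file path.
            with open(destination, "wb") as f:
                serializer.serialize(self, f)
        else:
            serializer.serialize(self, destination)


buffer = io.BytesIO()
FakeExportOutput(b"fake-model").save(buffer)
print(buffer.getvalue())
```

Because the protocol is structural, any object with a matching `serialize` method can be passed as `serializer` without inheriting from a base class.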
Richard Zou	6b9e22f3f6	Clarify the saving of intermediates in the "extending torch.func" docs (#98020 ) Fixes https://github.com/pytorch/pytorch/issues/97260 We got some feedback that the page reads like "in order to save an input for backward, you must return it as an output of the autograd.Function.forward". Doing so actually raises an error (on master and as of 2.1), but results in an ambiguous situation on 2.0.0. To avoid more users running into this, we clarify the documentation so it doesn't read like the above and clearly mentions that you can save things from the inputs or outputs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/98020 Approved by: https://github.com/soulitzer, https://github.com/kshitij12345	2023-03-31 13:57:37 +00:00
drisspg	a5b6f10c5d	Fix format bug in NT docs (#97998 ) Fixes a formatting bug in the NT docs Pull Request resolved: https://github.com/pytorch/pytorch/pull/97998 Approved by: https://github.com/jbschlosser	2023-03-31 01:00:25 +00:00
Driss Guessous	5a81508bb6	Add NestedTensor ops: logical_not, logical_not_, masked_fill (#97934 ) # Summary <!-- copilot:summary --> ### <samp>🤖 Generated by Copilot at 7954302</samp> This pull request adds support for `logical_not` and `masked_fill` operations on nested tensors, which are tensors that can have tensors as elements. It modifies the `native_functions.yaml` file to dispatch these operations to the nested tensor backend, implements the logic for these operations in `NestedTensorBinaryOps.cpp` and `NestedTensorUnaryOps.cpp`, adds documentation in `nested.rst`, and adds tests in `test_nestedtensor.py`. ## Description <!-- copilot:walkthrough --> ### <samp>🤖 Generated by Copilot at 7954302</samp> * Implement `logical_not` operation on nested tensors ([link](https://github.com/pytorch/pytorch/pull/97934/files?diff=unified&w=0#diff-2f3dbd85efb9b5172f2264eedd3be47dd765e6ab7cc8bf3ade5e62c28ae35991R1164), [link](https://github.com/pytorch/pytorch/pull/97934/files?diff=unified&w=0#diff-2f3dbd85efb9b5172f2264eedd3be47dd765e6ab7cc8bf3ade5e62c28ae35991R1172), [link](https://github.com/pytorch/pytorch/pull/97934/files?diff=unified&w=0#diff-f7c94671810b3ce652f9ad5458518cb7bbd67e8bf7e84e0a2fba641d878ba7c5R45-R56), [link](https://github.com/pytorch/pytorch/pull/97934/files?diff=unified&w=0#diff-c8b131d009badb3f92031b2aaa6e7f93a793f13caee278ea78e1c57d78c0399eR203), [link](https://github.com/pytorch/pytorch/pull/97934/files?diff=unified&w=0#diff-6eef496a8ec635930b6e52507358e069c80021f3535b8737d39e14ffc38950c0L854-R867)) - Add `NestedTensor_logical_not` and `NestedTensor_logical_not_` functions to `native_functions.yaml` for CPU and CUDA dispatch ([link](https://github.com/pytorch/pytorch/pull/97934/files?diff=unified&w=0#diff-2f3dbd85efb9b5172f2264eedd3be47dd765e6ab7cc8bf3ade5e62c28ae35991R1164), [link](https://github.com/pytorch/pytorch/pull/97934/files?diff=unified&w=0#diff-2f3dbd85efb9b5172f2264eedd3be47dd765e6ab7cc8bf3ade5e62c28ae35991R1172)) - Define 
`NestedTensor_logical_not` and `NestedTensor_logical_not_` functions in `NestedTensorUnaryOps.cpp` using `map_nt` and `get_buffer` ([link](https://github.com/pytorch/pytorch/pull/97934/files?diff=unified&w=0#diff-f7c94671810b3ce652f9ad5458518cb7bbd67e8bf7e84e0a2fba641d878ba7c5R45-R56)) - Document `torch.logical_not` function for nested tensors in `nested.rst` ([link](https://github.com/pytorch/pytorch/pull/97934/files?diff=unified&w=0#diff-c8b131d009badb3f92031b2aaa6e7f93a793f13caee278ea78e1c57d78c0399eR203)) - Add subtest for `logical_not` function in `test_activations` method in `TestNestedTensorDeviceType` class in `test_nestedtensor.py` ([link](https://github.com/pytorch/pytorch/pull/97934/files?diff=unified&w=0#diff-6eef496a8ec635930b6e52507358e069c80021f3535b8737d39e14ffc38950c0L854-R867)) * Implement `masked_fill` operation on nested tensors ([link](https://github.com/pytorch/pytorch/pull/97934/files?diff=unified&w=0#diff-2f3dbd85efb9b5172f2264eedd3be47dd765e6ab7cc8bf3ade5e62c28ae35991R7439), [link](https://github.com/pytorch/pytorch/pull/97934/files?diff=unified&w=0#diff-f847e41e3d373230df0b25574e993ec0e6b699bf16796b3df9ae9fb518048e25L210-R224), [link](https://github.com/pytorch/pytorch/pull/97934/files?diff=unified&w=0#diff-c8b131d009badb3f92031b2aaa6e7f93a793f13caee278ea78e1c57d78c0399eR197), [link](https://github.com/pytorch/pytorch/pull/97934/files?diff=unified&w=0#diff-6eef496a8ec635930b6e52507358e069c80021f3535b8737d39e14ffc38950c0R677-R688), [link](https://github.com/pytorch/pytorch/pull/97934/files?diff=unified&w=0#diff-6eef496a8ec635930b6e52507358e069c80021f3535b8737d39e14ffc38950c0R2515-R2528)) - Add `NestedTensor_masked_fill` function to `native_functions.yaml` for CPU and CUDA dispatch ([link](https://github.com/pytorch/pytorch/pull/97934/files?diff=unified&w=0#diff-2f3dbd85efb9b5172f2264eedd3be47dd765e6ab7cc8bf3ade5e62c28ae35991R7439)) - Define `NestedTensor_masked_fill` function in `NestedTensorBinaryOps.cpp` using 
`NestedTensor_elementwise_Tensor` ([link](https://github.com/pytorch/pytorch/pull/97934/files?diff=unified&w=0#diff-f847e41e3d373230df0b25574e993ec0e6b699bf16796b3df9ae9fb518048e25L210-R224)) - Document `torch.Tensor.masked_fill` function for nested tensors in `nested.rst` ([link](https://github.com/pytorch/pytorch/pull/97934/files?diff=unified&w=0#diff-c8b131d009badb3f92031b2aaa6e7f93a793f13caee278ea78e1c57d78c0399eR197)) - Add test case for `masked_fill` function in `TestNestedTensorDeviceType` class in `test_nestedtensor.py` ([link](https://github.com/pytorch/pytorch/pull/97934/files?diff=unified&w=0#diff-6eef496a8ec635930b6e52507358e069c80021f3535b8737d39e14ffc38950c0R677-R688)) - Add test case for backward pass of `masked_fill` function in `TestNestedTensorAutograd` class in `test_nestedtensor.py` ([link](https://github.com/pytorch/pytorch/pull/97934/files?diff=unified&w=0#diff-6eef496a8ec635930b6e52507358e069c80021f3535b8737d39e14ffc38950c0R2515-R2528)) * Improve error message for unsupported element-wise binary operations on nested dense tensors ([link](https://github.com/pytorch/pytorch/pull/97934/files?diff=unified&w=0#diff-f847e41e3d373230df0b25574e993ec0e6b699bf16796b3df9ae9fb518048e25L142-R150)) - Modify `NestedTensor_elementwise_Tensor` function in `NestedTensorBinaryOps.cpp` to include operation name in error message ([link](https://github.com/pytorch/pytorch/pull/97934/files?diff=unified&w=0#diff-f847e41e3d373230df0b25574e993ec0e6b699bf16796b3df9ae9fb518048e25L142-R150)) Pull Request resolved: https://github.com/pytorch/pytorch/pull/97934 Approved by: https://github.com/cpuhrsch	2023-03-30 08:14:39 +00:00
Driss Guessous	f603873c1b	add various NT ops needed for testing (#97837 ) # Summary Add some Simple unary and binary NT ops - Sub - sgn - abs Pull Request resolved: https://github.com/pytorch/pytorch/pull/97837 Approved by: https://github.com/cpuhrsch	2023-03-29 23:43:37 +00:00
vfdev	0f424f7f05	Fixed broken link to troubleshooting.html docs page (#97330 ) Seen first in error message: ``` [2023-03-22 10:30:39,786] torch._dynamo.convert_frame: [WARNING] torch._dynamo hit config.cache_size_limit (64) function: '<resume in paste_mask_in_image>' (/vision/torchvision/models/detection/roi_heads.py:407) reasons: w == 857 to diagnose recompilation issues, see https://pytorch.org/docs/master/dynamo/troubleshooting.html. [2023-03-22 10:30:40,036] torch._dynamo.convert_frame: [WARNING] torch._dynamo hit config.cache_size_limit (64) function: '<resume in paste_mask_in_image>' (/vision/torchvision/models/detection/roi_heads.py:406) reasons: ___stack0 == 207 to diagnose recompilation issues, see https://pytorch.org/docs/master/dynamo/troubleshooting.html. ``` Broken link: - https://pytorch.org/docs/master/dynamo/troubleshooting.html. Good link: - https://pytorch.org/docs/master/compile/troubleshooting.html Pull Request resolved: https://github.com/pytorch/pytorch/pull/97330 Approved by: https://github.com/zou3519	2023-03-22 16:40:21 +00:00
Mikayla Gawarecki	b04363ead4	[easy] Expose documentation for a few global nn.Module hooks (#97185 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/97185 Approved by: https://github.com/albanD	2023-03-21 20:09:29 +00:00
Kazuaki Ishizaki	50ed38a7eb	Fix typo under docs directory (#97202 ) This PR fixes typo in `.rst` files under docs directory. Pull Request resolved: https://github.com/pytorch/pytorch/pull/97202 Approved by: https://github.com/kit1980	2023-03-21 01:24:10 +00:00
Driss Guessous	a269e5fa04	Add forward and backward support for silu to NestedTensors (#97181 ) # Summary Add forward and backward support for silu to NestedTensors - Add forward support to silu - Add forward support to silu_ - Add backward support to silu - Add to NT docs - Add tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/97181 Approved by: https://github.com/cpuhrsch, https://github.com/jbschlosser	2023-03-20 23:46:12 +00:00
Mark Saroufim	6110effa86	Rework torch.compile docs (#96706 ) Chatted with @stas00 on Slack and here are some great improvements he suggested to the compile docs - [x] Rename `dynamo` folder to `compile` - [x] Link `compile` docstring on `torch.html` to main index page for compile - [x] Create a new index page that describes why people should care - [x] easy perf, memory reduction, 1 line - [x] Short benchmark table - [x] How to guide - [x] TOC that links to the more technical pages folks have written, make the existing docs we have a Technical overview - [x] Highlight the new APIs for `torch._inductor.list_options()` and `torch._inductor.list_mode_options()` - clarify these are inductor specific and add more prose around which ones are most interesting He also highlighted an interesting way to think about who is reading this doc we have - [x] End users, that just want things to run fast - [x] Library maintainers wrapping torch.compile which would care for example about understanding when in their code they should compile a model, which backends are supported - [x] Debuggers whose needs are somewhat addressed by the troubleshooting guide and FAQ but those could be dramatically reworked to say what we expect to break And in a separate PR I'll work on the below with @SherlockNoMad - [ ] Authors of new backends that care about how to plug into dynamo or inductor layer so need to explain some more internals like - [ ] IR - [ ] Where to plug in, dynamo? inductor? triton? Pull Request resolved: https://github.com/pytorch/pytorch/pull/96706 Approved by: https://github.com/svekars	2023-03-15 04:41:13 +00:00
Bin Bao	f03db8d6cb	[reland2][inductor] Add an AOT compilation mode for Inductor CPP backend (#96520 ) Summary: This is a reland of https://github.com/pytorch/pytorch/pull/94822. Solved the long compilation issue for inductor cpp tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/96520 Approved by: https://github.com/huydhn, https://github.com/malfet	2023-03-14 16:10:54 +00:00
eqy	6e3e22d58c	[CUDA][cuFFT] Minor fix for cuFFT plan cache docs (#96373 ) The attributes described in the docs require indexing in to the plan cache manager, as there is a separate plan cache per device. CC @ptrblck @ngimel Pull Request resolved: https://github.com/pytorch/pytorch/pull/96373 Approved by: https://github.com/ngimel	2023-03-14 00:28:14 +00:00
Driss Guessous	f330281fb2	Add torch.nn.LayerNorm() to documented list of supported nested tensor ops (#96434 ) Layer norm is supported and this updates the documentation to reflect that. Pull Request resolved: https://github.com/pytorch/pytorch/pull/96434 Approved by: https://github.com/cpuhrsch, https://github.com/jbschlosser	2023-03-13 23:16:09 +00:00
Joel Schlosser	30d56dd8c1	Support randn_like() for NT (#96528 ) To satisfy an internal ask. Pull Request resolved: https://github.com/pytorch/pytorch/pull/96528 Approved by: https://github.com/mikaylagawarecki, https://github.com/cpuhrsch	2023-03-13 19:39:51 +00:00
Kiuk Chung	55a1bd3fc6	[PT-D] Update CODEOWNERS, merge_rules, and Persons-of-Interest for to… (#96321 ) Synchronize CODEOWNERS, merge_rules, and POI files to reflect kiukchung and d4l3k (Tristan Rice) as one of the maintainers for the distributed module. Pull Request resolved: https://github.com/pytorch/pytorch/pull/96321 Approved by: https://github.com/d4l3k, https://github.com/albanD, https://github.com/malfet	2023-03-13 17:38:43 +00:00
Joel Schlosser	024ea1a21e	Support zeros_like() for NT (#96527 ) This is used for the fake tensor fallbacks. Pull Request resolved: https://github.com/pytorch/pytorch/pull/96527 Approved by: https://github.com/cpuhrsch	2023-03-13 15:15:08 +00:00
Rishub Tamirisa	f3b8638074	Adding nn.ZeroPad1d and nn.ZeroPad3d (#96295 ) Fixes #95796 ### Implementation Adds python implementation for `nn.ZeroPad1d` and `nn.ZeroPad3d` in `torch/nn/modules/padding.py`. Adds cpp implementation for `nn::ZeroPad1d` and `nn::ZeroPad3d` in the following 3 files, refactored with templates similarly to `nn::ConstantPad`'s implementation: - `torch/csrc/api/include/torch/nn/modules/padding.h` - `torch/csrc/api/include/torch/nn/options/padding.h` - `torch/csrc/api/src/nn/modules/padding.cpp` Also added relevant definitions in `torch/nn/modules/__init__.py`. ### Testing Adds the following tests: - cpp tests of similar length and structure as `ConstantPad` and the existing `ZeroPad2d` impl in `test/cpp/api/modules.cpp` - cpp API parity tests in `torch/testing/_internal/common_nn.py` - module init tests in `test/test_module_init.py` Also added relevant definitions in `test/cpp_api_parity/parity-tracker.md` Pull Request resolved: https://github.com/pytorch/pytorch/pull/96295 Approved by: https://github.com/soulitzer	2023-03-10 03:51:41 +00:00
Joel Schlosser	7324aef9a8	Add torch.empty_like() to documented list of supported nested tensor ops (#96211 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/96211 Approved by: https://github.com/drisspg	2023-03-07 23:33:34 +00:00
Iris	a7698a8260	[DCP] Add DCP FSDP sharded_state_dict checkpoint example to DCP .rst file (#95517 ) As title. Pull Request resolved: https://github.com/pytorch/pytorch/pull/95517 Approved by: https://github.com/kumpera	2023-03-03 18:09:10 +00:00
Svetlana Karslioglu	004bcffc6a	Fix formatting (#95906 ) Fixing list formatting by adding a missing blank line: Before: ![Screenshot 2023-03-02 at 3 17 28 PM (2)](https://user-images.githubusercontent.com/5317992/222585127-9b6ed4dd-4719-4756-b2ac-1ba6e8f97b87.png) After: ![Screenshot 2023-03-02 at 3 16 48 PM (2)](https://user-images.githubusercontent.com/5317992/222585172-3ef35a48-641f-4b73-9f7b-f419a122196b.png) Pull Request resolved: https://github.com/pytorch/pytorch/pull/95906 Approved by: https://github.com/orionr	2023-03-03 16:18:12 +00:00
Michael Lazos	184fb9f11d	Small doc update for torch_compile_debug (#95809 ) Updates the troubleshooting documentation with the folder structure of the debug directory Pull Request resolved: https://github.com/pytorch/pytorch/pull/95809 Approved by: https://github.com/msaroufim	2023-03-02 00:25:28 +00:00
Mark Saroufim	f7b26bdd22	Remove mention of dynamo.optimize() in docs (#95802 ) This should be self containable to merge but other stuff that's been bugging me is * Instructions on debugging IMA issues * Dynamic shape instructions * Explaining config options better Will look at adding a config options doc Pull Request resolved: https://github.com/pytorch/pytorch/pull/95802 Approved by: https://github.com/svekars	2023-03-01 23:24:09 +00:00
ajithvallabai	e9c70b0b20	Fix typo and grammatical errors in community docs and dynamo docs (#95692 ) Fixes typo and grammatical errors in community docs and dynamo docs Pull Request resolved: https://github.com/pytorch/pytorch/pull/95692 Approved by: https://github.com/H-Huang	2023-03-01 18:10:46 +00:00
ajithvallabai	3944e7c3e8	Fix grammatical errors in contribution guide (#95454 ) Fixed following errors in contribution guide. "deep neural networks using a on tape-based autograd systems." to "deep neural networks using a tape-based autograd systems." "the best entrance point and are great places to start." to "the best entrance points and are great places to start." Pull Request resolved: https://github.com/pytorch/pytorch/pull/95454 Approved by: https://github.com/ezyang	2023-02-28 03:44:40 +00:00
Svetlana Karslioglu	d7146e7870	Update copyright (#95652 ) Updating the copyright to reflect on the website. Pull Request resolved: https://github.com/pytorch/pytorch/pull/95652 Approved by: https://github.com/atalman	2023-02-27 23:15:55 +00:00
Jane Xu	b215af2db8	[optim] Add general documentation on our algorithm defaults (#95391 ) I added a section + table under Algorithms https://docs-preview.pytorch.org/95391/optim.html?highlight=optim#module-torch.optim <img width="725" alt="image" src="https://user-images.githubusercontent.com/31798555/221246256-99325a27-9016-407b-a9fe-404d61e41a82.png"> Pull Request resolved: https://github.com/pytorch/pytorch/pull/95391 Approved by: https://github.com/albanD	2023-02-24 21:35:30 +00:00
Mark Saroufim	9f707f164e	Add more GPU metric instrumentation (#91717 ) Fixes https://github.com/pytorch/serve/issues/1937 A fairly common query I see folks running while using PyTorch is `nvidia-smi --format=csv,noheader,nounits --query-gpu=utilization.gpu,utilization.memory,memory.total,memory.used,temperature.gpu,power.draw,clocks.current.sm,clocks.current.memory -l 10` Existing metrics we have * For kernel utilization: `torch.cuda.utilization()` * For memory utilization we have them under `torch.cuda.memory`, e.g. the memory allocated with `torch.cuda.memory.memory_allocated()` * For total available memory we have `torch.cuda.get_device_properties(0).total_memory` Which means the only metrics we're missing are * Temperature: now in `torch.cuda.temperature()` * Power draw: now in `torch.cuda.power()` * Clock speed: now in `torch.cuda.clock_speed()` With some important details on each * Clock speed settings: I picked the SM clock domain which is documented here https://docs.nvidia.com/deploy/nvml-api/group__nvmlDeviceEnumvs.html#group__nvmlDeviceEnumvs_1g805c0647be9996589fc5e3f6ff680c64 * Temperature: I use `pynvml.nvmlDeviceGetTemperature(handle, 0)` where 0 refers to the GPU die temperature Pull Request resolved: https://github.com/pytorch/pytorch/pull/91717 Approved by: https://github.com/ngimel	2023-02-24 00:38:03 +00:00
Atharva Kavitkar	627282fa6c	Corrected grammar in contribution guide (#93014 ) Corrected the grammar of a sentence in "Implementing Features or Fixing Bugs" section of the contribution guide. Before: Issues that are labeled first-new-issue, low, or medium priority provide the best entrance point are great places to start. After: Issues that are labeled first-new-issue, low, or medium priority provide the best entrance point _and_ are great places to start. Pull Request resolved: https://github.com/pytorch/pytorch/pull/93014 Approved by: https://github.com/albanD, https://github.com/kit1980	2023-02-24 00:22:14 +00:00
fduwjj	b209d8fa0d	[PT-D][Sequence Parallelism] Enable DTensor based Naive sequence parallelism (#94369 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/94369 Approved by: https://github.com/wanchaol	2023-02-16 21:21:00 +00:00
Wanchao Liang	cd9ca4c73f	[tp] additional doc fixes (#94786 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/94786 Approved by: https://github.com/fduwjj	2023-02-15 21:25:26 +00:00
Yaoyao Ding	57b22bc6d8	[Dynamo] Backend registration with ``entry_points`` (#93873 ) Fixes #91824 This PR adds a new dynamo backend registration mechanism through ``entry_points``. The ``entry_points`` of a package provide a way for the package to register a plugin for another one. The docs of the new mechanism: ![image](https://user-images.githubusercontent.com/23381083/216133221-18cf18e2-6ad6-4cf7-8da2-9b9b883389c8.png) (the typo '...named "my_backend" that has been...' has been fixed to '...named "my_compiler" that has been...') # Discussion ## About the test I did not add a test for this PR as it is hard either to install a fake package during a test or to manually hack the entry points function by replacing it with a fake one. I have tested this PR offline with the hidet compiler and it works fine. Please let me know if you have any good idea to test this PR. ## About the dependency of ``importlib_metadata`` This PR will add a dependency ``importlib_metadata`` for python < 3.10 because the modern usage of ``importlib`` became stable in that Python version (see the documentation of the importlib package [here](https://docs.python.org/3/library/importlib.html)). For python < 3.10, the package ``importlib_metadata`` implements the feature of ``importlib``. The current PR will hint the user to install ``importlib_metadata`` if their python version < 3.10. ## About the name and docs Please let me know what you think of the name ``torch_dynamo_backend`` as the entry point group name and of the documentation of this registration mechanism. Pull Request resolved: https://github.com/pytorch/pytorch/pull/93873 Approved by: https://github.com/malfet, https://github.com/jansel	2023-02-14 15:44:25 +00:00
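As a rough illustration of the discovery side of the mechanism described above, the sketch below lists entry points in a group using only the standard library. It is not the code from the PR: `discover_backends` is a hypothetical helper, the group name is the one proposed in the commit message (the name finally shipped may differ), and `my_package` in the comment is invented.

```python
from importlib.metadata import entry_points

# Group name as proposed in the commit message; treat it as an assumption.
GROUP = "torch_dynamo_backend"

# A plugin package would declare something like this in its pyproject.toml
# (my_package is a made-up name):
#
#   [project.entry-points."torch_dynamo_backend"]
#   my_compiler = "my_package:my_compiler"


def discover_backends(group: str = GROUP) -> dict:
    """Hypothetical helper: map entry point names to EntryPoint objects."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ selection interface.
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: a plain dict keyed by group name.
        selected = eps.get(group, [])
    return {ep.name: ep for ep in selected}


backends = discover_backends()
print(sorted(backends))
```

Calling `backends["my_compiler"].load()` would import the plugin module and return the registered object; the sketch stops short of loading so that listing stays side-effect free.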
fduwjj	39511697d4	[PT-D][BE] Update 2D parallelism API name and docs (#94771 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/94771 Approved by: https://github.com/wanchaol	2023-02-14 08:13:15 +00:00
PyTorch MergeBot	28ed0bdb37	Revert "[tp] additional doc fixes (#94786 )" This reverts commit `7522ca55f1`. Reverted https://github.com/pytorch/pytorch/pull/94786 on behalf of https://github.com/huydhn due to Sorry for reverting your PR, but the doc failure looks related and they are also failing in trunk `7522ca55f1`	2023-02-14 05:43:37 +00:00
Wanchao Liang	7522ca55f1	[tp] additional doc fixes (#94786 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/94786 Approved by: https://github.com/fduwjj	2023-02-14 04:52:04 +00:00
Wanchao Liang	2db12e3844	[tp] minor update to TP docs (#94748 ) minor update to TP docs for beta release Pull Request resolved: https://github.com/pytorch/pytorch/pull/94748 Approved by: https://github.com/fduwjj	2023-02-13 21:54:19 +00:00
Quajak	c0e7077674	Fix link in docs (#94686 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/94686 Approved by: https://github.com/kit1980	2023-02-13 20:42:24 +00:00
Ramin Azarmehr	b57e6fdb50	[MPS] Enable Memory Leak Detection for test_mps.py (#94646 ) - To check for Memory Leaks in `test_mps.py`, set the env-variable `PYTORCH_TEST_MPS_MEM_LEAK_CHECK=1` when running test_mps.py (used CUDA code as reference). - Added support for the following new python interfaces in MPS module: `torch.mps.[empty_cache(), set_per_process_memory_fraction(), current_allocated_memory(), driver_allocated_memory()]` - Renamed `_is_mps_on_macos_13_or_newer()` to `_mps_is_on_macos_13_or_newer()`, and `_is_mps_available()` to `_mps_is_available()` to be consistent in naming with prefix `_mps`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94646 Approved by: https://github.com/malfet	2023-02-13 17:56:24 +00:00
Mikayla Gawarecki	5ce1fad711	Add rnn.unpad_sequence and rnn.unpack_sequence to documentation (#94316 ) Fix #76064 Pull Request resolved: https://github.com/pytorch/pytorch/pull/94316 Approved by: https://github.com/jbschlosser	2023-02-13 17:47:10 +00:00
Ramin Azarmehr	bdd8f518d7	[MPS] Add Python Module Bindings for the MPS backend (#94417 ) - This PR is a prerequisite for the upcoming Memory Leak Detection PR. - Enable global manual seeding via `torch.manual_seed()` + test case - Add `torch.mps.synchronize()` to wait for MPS stream to finish + test case - Enable the following python interfaces for MPS: `torch.mps.[get_rng_state(), set_rng_state(), synchronize(), manual_seed(), seed()]` - Added some test cases in test_mps.py - Added `mps.rst` to document the `torch.mps` module. - Fixed the failure with `test_public_bindings.py` Description of new files added: - `torch/csrc/mps/Module.cpp`: implements `torch._C` module functions for `torch.mps` and `torch.backends.mps`. - `torch/mps/__init__.py`: implements Python bindings for `torch.mps` module. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94417 Approved by: https://github.com/albanD	2023-02-12 21:22:30 +00:00
Xuehai Pan	8d45f555d7	[BE] [1/3] Rewrite `super()` calls in caffe2 and benchmarks (#94587 ) Rewrite Python built-in class `super()` calls. Only non-semantic changes should be applied. - #94587 - #94588 - #94592 Also, methods with only a `super()` call are removed: ```diff class MyModule(nn.Module): - def __init__(self): - super().__init__() - def forward(self, ...): ... ``` Some cases that change the semantics should be kept unchanged. E.g.: `f152a79be9/caffe2/python/net_printer.py (L184-L190)` `f152a79be9/test/test_jit_fuser_te.py (L2628-L2635)` Pull Request resolved: https://github.com/pytorch/pytorch/pull/94587 Approved by: https://github.com/ezyang	2023-02-11 18:19:48 +00:00
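The rewrite described in the entry above can be shown with plain classes. The class names are invented, and the last case is only an assumed example of the kind of call that "changes the semantics"; the linked files in the commit message were not consulted.

```python
class Base:
    def __init__(self) -> None:
        self.initialized = True


class Old(Base):
    def __init__(self) -> None:
        # Explicit two-argument form, the spelling the rewrite replaces.
        super(Old, self).__init__()


class New(Base):
    def __init__(self) -> None:
        # Zero-argument form; resolves to the same method here.
        super().__init__()


class Bare(Base):
    # An __init__ that only forwarded to super() is redundant and can be removed.
    pass


class A:
    def who(self) -> str:
        return "A"


class B(A):
    def who(self) -> str:
        return "B"


class C(B):
    def who(self) -> str:
        # super(B, self) deliberately skips B; rewriting it to super()
        # would return "B" instead, so such calls must stay untouched.
        return super(B, self).who()


results = (Old().initialized, New().initialized, Bare().initialized, C().who())
print(results)
```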
PyTorch MergeBot	4fe365774a	Revert "[MPS] Add Python Module Bindings for the MPS backend (#94417 )" This reverts commit `beb4f5bf39`. Reverted https://github.com/pytorch/pytorch/pull/94417 on behalf of https://github.com/huydhn due to Sorry for reverting your PR, but it seems to break MacOS test in trunk `bae397ec63`	2023-02-11 05:24:45 +00:00
Ramin Azarmehr	beb4f5bf39	[MPS] Add Python Module Bindings for the MPS backend (#94417 ) - This PR is a prerequisite for the upcoming Memory Leak Detection PR. - Enable global manual seeding via `torch.manual_seed()` + test case - Add `torch.mps.synchronize()` to wait for MPS stream to finish + test case - Enable the following python interfaces for MPS: `torch.mps.[get_rng_state(), set_rng_state(), synchronize(), manual_seed(), seed()]` - Added some test cases in test_mps.py - Added `mps.rst` to document the `torch.mps` module. - Fixed the failure with `test_public_bindings.py` Description of new files added: - `torch/csrc/mps/Module.cpp`: implements `torch._C` module functions for `torch.mps` and `torch.backends.mps`. - `torch/mps/__init__.py`: implements Python bindings for `torch.mps` module. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94417 Approved by: https://github.com/albanD	2023-02-10 23:18:41 +00:00
Driss Guessous	70026aaad6	[SDPA] update type hint for scaled_dot_product_attention and documentation (#94008 ) # Summary - Adds type hinting support for SDPA - Updates the documentation adding warnings and notes on the context manager - Adds scaled_dot_product_attention to the non-linear activation function section of nn.functional docs Pull Request resolved: https://github.com/pytorch/pytorch/pull/94008 Approved by: https://github.com/cpuhrsch	2023-02-10 18:02:43 +00:00
Xuehai Pan	a229b4526f	[BE] Prefer dash over underscore in command-line options (#94505 ) Preferring dash over underscore in command-line options. Add `--command-arg-name` to the argument parser. The old arguments with underscores `--command_arg_name` are kept for backward compatibility. Both dashes and underscores are used in the PyTorch codebase. Some argument parsers only have dashes or only have underscores in arguments. For example, the `torchrun` utility for distributed training only accepts underscore arguments (e.g., `--master_port`). Dashes are more common in other command-line tools, and they appear to be the default choice in the Python standard library: `argparse.BooleanOptionalAction`: `4a9dff0e5a/Lib/argparse.py (L893-L895)` ```python class BooleanOptionalAction(Action): def __init__(...): if option_string.startswith('--'): option_string = '--no-' + option_string[2:] _option_strings.append(option_string) ``` It adds `--no-argname`, not `--no_argname`. Also, typing `_` requires pressing the shift or caps-lock key, whereas typing `-` does not. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94505 Approved by: https://github.com/ezyang, https://github.com/seemethere	2023-02-09 20:16:49 +00:00
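One way to get the backward-compatible behavior described in the entry above is to register both spellings on the same `argparse` argument. This is a sketch of the general technique, not necessarily how each PyTorch parser was changed; the option name is the placeholder used in the commit message and the default value is arbitrary.

```python
import argparse

parser = argparse.ArgumentParser(prog="demo")
# The dashed spelling comes first, so it is the one shown in --help;
# the underscore spelling is kept as an alias for old scripts.
parser.add_argument(
    "--command-arg-name",
    "--command_arg_name",
    type=int,
    default=0,
    help="example option accepted in both spellings",
)

new_style = parser.parse_args(["--command-arg-name", "1234"])
old_style = parser.parse_args(["--command_arg_name", "1234"])

# argparse derives the attribute name from the first long option,
# converting dashes to underscores.
print(new_style.command_arg_name, old_style.command_arg_name)
```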
Xuehai Pan	69e0bda999	[BE] Import `Literal`, `Protocol`, and `Final` from standard library `typing` as of Python 3.8+ (#94490 ) Changes: 1. `typing_extensions -> typing-extensions` in dependency. Use dash rather than underscore to fit the [PEP 503: Normalized Names](https://peps.python.org/pep-0503/#normalized-names) convention. ```python import re def normalize(name): return re.sub(r"[-_.]+", "-", name).lower() ``` 2. Import `Literal`, `Protocol`, and `Final` from standard library as of Python 3.8+ 3. Replace `Union[Literal[XXX], Literal[YYY]]` with `Literal[XXX, YYY]`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94490 Approved by: https://github.com/ezyang, https://github.com/albanD	2023-02-09 19:17:49 +00:00
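A small sketch of changes 1 and 3 from #94490, using only the standard library. The aliases `Verbose` and `Compact` and the helper `literal_values` are hypothetical names introduced for illustration; the sketch assumes Python 3.8 or newer.

```python
import re
from typing import Literal, Union, get_args, get_origin


def normalize(name: str) -> str:
    # PEP 503: runs of "-", "_" and "." collapse to a single "-", lowercased.
    return re.sub(r"[-_.]+", "-", name).lower()


# The verbose spelling that the PR replaces, and the compact spelling it prefers.
Verbose = Union[Literal["sum"], Literal["mean"]]
Compact = Literal["sum", "mean"]


def literal_values(tp) -> tuple:
    """Collect the allowed values from a Literal or a Union of Literals."""
    if get_origin(tp) is Literal:
        return get_args(tp)
    # Otherwise assume a Union whose members are themselves Literals.
    return tuple(v for member in get_args(tp) for v in literal_values(member))


print(normalize("typing_extensions"))
print(literal_values(Verbose), literal_values(Compact))
```

Both spellings admit the same set of values, so the compact form is purely a readability change.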
double7	685108b201	[docs] Fix incorrect wrapping of function (#94446 ) The sample code in the documentation incorrectly wraps the function decorator. To fix this, update the attributes of `func` based on `torch_function`. Fixes #94305 Pull Request resolved: https://github.com/pytorch/pytorch/pull/94446 Approved by: https://github.com/ezyang	2023-02-09 16:01:10 +00:00
kshitij12345	4f3858c6d8	[functorch] linearize (#94173 ) Fixes https://github.com/pytorch/functorch/issues/724 TODO: * [x] Docs NOTE: `const_fold` pass raises UserWarning -> https://github.com/pytorch/pytorch/issues/94374 Pull Request resolved: https://github.com/pytorch/pytorch/pull/94173 Approved by: https://github.com/Chillee	2023-02-09 15:45:08 +00:00
PyTorch MergeBot	e0e4f1a890	Revert "[functorch] linearize (#94173 )" This reverts commit `b6b9e1e6e0`. Reverted https://github.com/pytorch/pytorch/pull/94173 on behalf of https://github.com/kshitij12345 due to Broke lint runner	2023-02-09 09:22:39 +00:00
Kshiteej K	b6b9e1e6e0	[functorch] linearize (#94173 ) Fixes https://github.com/pytorch/functorch/issues/724 TODO: * [x] Docs NOTE: `const_fold` pass raises UserWarning -> https://github.com/pytorch/pytorch/issues/94374 Pull Request resolved: https://github.com/pytorch/pytorch/pull/94173 Approved by: https://github.com/Chillee	2023-02-09 08:57:05 +00:00
fduwjj	41e3189222	[PT-D][Tensor parallelism] Add documentations for TP (#94421 ) This is far from complete and we will definitely polish it down the road. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94421 Approved by: https://github.com/wz337	2023-02-09 02:31:06 +00:00
Vasiliy Kuznetsov	a9f57db607	AO migration: migrate .rst files to new locations (#94211 ) Summary: Migrates the PyTorch documentation to point to the new locations of AO code. Context: https://github.com/pytorch/pytorch/issues/81667 Process: 1. run https://gist.github.com/vkuzo/c38d4ba201604579d7d316ec4a4692e7 for automated replacement 2. manually fix the doc build errors (by removing the module declarations which are now duplicate) Test plan: CI Pull Request resolved: https://github.com/pytorch/pytorch/pull/94211 Approved by: https://github.com/jerryzh168	2023-02-07 02:32:23 +00:00
Jason Ansel	e071d72f3c	Tag dynamo backends as debug/experimental (#93878 ) Hides debug/experimental backends by default. Before: ``` torch._dynamo.list_backends() ['aot_eager', 'aot_eager_decomp_partition', 'aot_torchxla_trace_once', 'aot_torchxla_trivial', 'aot_ts', 'aot_ts_nvfuser', 'cudagraphs', 'dynamo_accuracy_minifier_backend', 'dynamo_minifier_backend', 'eager', 'inductor', 'ipex', 'nvprims_aten', 'nvprims_nvfuser', 'onnxrt', 'tensorrt', 'torchxla_trace_once', 'torchxla_trivial', 'ts', 'tvm'] ``` After: ``` torch._dynamo.list_backends() ['aot_ts_nvfuser', 'cudagraphs', 'inductor', 'ipex', 'nvprims_nvfuser', 'onnxrt', 'tensorrt', 'tvm'] ``` Fixes https://github.com/pytorch/pytorch/issues/93733 Pull Request resolved: https://github.com/pytorch/pytorch/pull/93878 Approved by: https://github.com/voznesenskym	2023-02-04 00:50:51 +00:00
Svetlana Karslioglu	5197496799	Add a private API banner (#93996 ) Add a banner that will appear on all pages where the last segment of the URL starts with an underscore "_". Example pages: * https://pytorch.org/docs/master/_dynamo.html * https://pytorch.org/docs/master/_modules/torch/_jit_internal.html Sample screenshots: <img width="885" alt="Screenshot 2023-02-03 at 1 13 47 PM" src="https://user-images.githubusercontent.com/5317992/216711948-6ba35d38-da8f-4145-9580-bafc921a1df5.png"> <img width="871" alt="Screenshot 2023-02-03 at 1 12 51 PM" src="https://user-images.githubusercontent.com/5317992/216711951-877a760e-3449-4593-b81c-14bf3b9943da.png"> Pull Request resolved: https://github.com/pytorch/pytorch/pull/93996 Approved by: https://github.com/malfet, https://github.com/albanD	2023-02-03 21:40:15 +00:00
Jason Ansel	5d709af59a	Rename aot_cudagraphs to cudagraphs (#93821 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/93821 Approved by: https://github.com/ezyang	2023-02-03 21:01:27 +00:00
Svetlana Karslioglu	3b7140d938	Add the new submission form (#94000 ) Adding the new form for submitting topics on quarterly maintainers meetings. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94000 Approved by: https://github.com/orionr	2023-02-03 16:46:30 +00:00
soulitzer	77cbaedd5c	[docs] Add section about tensor hooks on in-place in autograd note (#93116 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/93116 Approved by: https://github.com/albanD	2023-02-01 17:35:21 +00:00
Ivan Kobzarev	9daca46dc4	[jit][await] Apply review comments (#93284 ) Differential Revision: [D42849920](https://our.internmc.facebook.com/intern/diff/D42849920) Pull Request resolved: https://github.com/pytorch/pytorch/pull/93284 Approved by: https://github.com/malfet	2023-02-01 07:22:06 +00:00
Svetlana Karslioglu	218d4eac56	Remove submission form (#93287 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/93287 Approved by: https://github.com/orionr	2023-01-31 23:41:16 +00:00
akhilkedia	129a1bc715	Minor error in docs regarding execution time (#93258 ) The previous sentence seemed to imply that sparse may not always be helpful, ie, your execution time may increase when using sparse. But the docs mentioned otherwise. A simple re-ordering of two words in the documentation to better align with the contextual sentiment. Pull Request resolved: https://github.com/pytorch/pytorch/pull/93258 Approved by: https://github.com/cpuhrsch	2023-01-31 23:32:42 +00:00
Ivan Yashchuk	fba13d94a1	Remove deprecated torch.symeig (#70988 ) The time has come to remove deprecated linear algebra related functions. This PR removes `torch.symeig`. - [x] XLA PR: https://github.com/pytorch/xla/pull/4498 Pull Request resolved: https://github.com/pytorch/pytorch/pull/70988 Approved by: https://github.com/lezcano, https://github.com/kit1980, https://github.com/malfet	2023-01-31 11:59:11 +00:00
William Wen	2a6e085704	Update custom backend docs (#92721 ) Title. Pull Request resolved: https://github.com/pytorch/pytorch/pull/92721 Approved by: https://github.com/jansel	2023-01-30 23:54:49 +00:00
Ivan Kobzarev	2fc73622f8	[jit] Support Awaitable type (#90863 ) We want to make TorchRec sharded models TorchScriptable. TorchRec sharded models use the generic types Awaitable[W] and LazyAwaitable[W] (https://github.com/pytorch/torchrec/blob/main/torchrec/distributed/types.py#L212). In a sharded model those types are used instead of the contained type W, and carry an initialization function that produces an object of type W. At the moment when the first attribute of W is requested, `LazyAwaitable[W]` will call its initialization function (on the same stack), cache the result inside and work transparently as an object of W. So we can think about it as a delayed object initialization. To support this behavior in TorchScript, we propose a new TorchScript type, `Await`. In eager mode it works the same as `LazyAwaitable[W]` in TorchRec, being dynamically typed and acting as a type `W` while it is `Await[W]`. Within TorchScript it is `Await[W]` and can only be explicitly converted to W, using the special function `torch.jit.awaitable_wait(aw)`. Creation of this `Await[W]` is done via another special function `torch.jit.awaitable(func, args)`. The semantics are close to `torch.jit.Future`, fork, wait and use the same jit mechanics (inline fork Closures) with the difference that it does not start this function in parallel on fork. It is only stored as a lambda inside IValue that will be called on the same thread when `torch.jit.awaitable_wait` is called. For example (more examples in this PR `test/jit/test_await.py`) ``` def delayed(z: Tensor) -> Tensor: return z * 3 @torch.jit.script def fn(x: Tensor): aw: Await[int] = torch.jit._awaitable(delayed, 99) a = torch.eye(2) b = torch.jit._awaitable_wait(aw) return a + b + x ``` Functions semantics: `_awaitable(func -> Callable[Tuple[...], W], *args, **kwargs) -> Await[W]` Creates Await object, owns args and kwargs. Once `_awaitable_wait` is called, executes function func and owns the result of the function. 
Following `_awaitable_wait` calls will return this result from the first function call. `_awaitable_wait(Await[W]) -> W` Returns either the cached result of W if it is not the first `_awaitable_wait` call to this Await object, or calls the specified function if it is the first. `_awaitable_nowait(W) -> Await[W]` Creates a trivial Await[W] wrapper on the specified object, to be type compliant for the corner cases. Differential Revision: [D42502706](https://our.internmc.facebook.com/intern/diff/D42502706) Pull Request resolved: https://github.com/pytorch/pytorch/pull/90863 Approved by: https://github.com/davidberard98	2023-01-30 17:38:59 +00:00
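The delayed-initialization semantics described in #90863 can be illustrated in plain Python, without TorchScript. This is a toy model only: `ToyAwait`, `toy_awaitable`, `toy_awaitable_wait`, and `toy_awaitable_nowait` are hypothetical stand-ins that mirror the described behavior (run once on first wait, on the calling thread, then cache), not the `torch.jit` implementation.

```python
from typing import Any, Callable, Generic, List, TypeVar

W = TypeVar("W")
_UNSET = object()  # sentinel, so that a legitimate None result can still be cached


class ToyAwait(Generic[W]):
    """Owns a function and its arguments; runs the function on the first wait."""

    def __init__(self, func: Callable[..., W], *args: Any, **kwargs: Any) -> None:
        self._func = func
        self._args = args
        self._kwargs = kwargs
        self._result: Any = _UNSET

    def wait(self) -> W:
        # First call: execute on the calling thread and keep the result.
        # Later calls: return the cached result without re-running func.
        if self._result is _UNSET:
            self._result = self._func(*self._args, **self._kwargs)
        return self._result


def toy_awaitable(func: Callable[..., W], *args: Any, **kwargs: Any) -> ToyAwait:
    return ToyAwait(func, *args, **kwargs)


def toy_awaitable_wait(aw: ToyAwait) -> Any:
    return aw.wait()


def toy_awaitable_nowait(value: W) -> ToyAwait:
    # Trivial wrapper around an already-computed value.
    return ToyAwait(lambda: value)


calls: List[int] = []


def delayed(z: int) -> int:
    calls.append(z)  # record each real execution
    return z * 3


aw = toy_awaitable(delayed, 99)
calls_before_wait = list(calls)  # nothing has run yet
first = toy_awaitable_wait(aw)
second = toy_awaitable_wait(aw)
print(calls_before_wait, first, second, calls)
```

Unlike a fork/wait pair, nothing starts when the object is created; the work happens only when the result is first requested.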
Edward Z. Yang	c7b03010ec	Split the aot/dynamo TORCHDYNAMO_REPRO_AFTER cases (#93226 ) I often copy paste this line and it is annoying to have to modify the inside to select aot/dynamo Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/93226 Approved by: https://github.com/desertfire	2023-01-30 14:23:16 +00:00
Felix Divo	219e9533f0	Improve autograd doc on complex numbers (#93065 ) A tiny change to fix formatting and clarify a bit in [this section](https://pytorch.org/docs/stable/notes/autograd.html#what-are-complex-derivatives). Pull Request resolved: https://github.com/pytorch/pytorch/pull/93065 Approved by: https://github.com/albanD	2023-01-27 09:36:38 +00:00
Sherlock Huang	a6ac922eab	Rename Canonical Aten IR to Core Aten IR (#92904 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/92904 Approved by: https://github.com/bdhirsh	2023-01-25 05:12:23 +00:00
PyTorch MergeBot	acdd462b1a	Revert "Remove deprecated torch.symeig (#70988 )" This reverts commit `d70ed68162`. Reverted https://github.com/pytorch/pytorch/pull/70988 on behalf of https://github.com/kit1980 due to Failing XLA tests, forward fix unsuccessful	2023-01-24 19:03:40 +00:00
Rodrigo Kumpera	9e56378ef2	Add documentation for DCP. (#92813 ) This populates the website with some basic documentation. It's far from ideal as we should include some basic usage example. Pull Request resolved: https://github.com/pytorch/pytorch/pull/92813 Approved by: https://github.com/wz337	2023-01-24 17:21:51 +00:00
Ivan Yashchuk	d70ed68162	Remove deprecated torch.symeig (#70988 ) The time has come to remove deprecated linear algebra related functions. This PR removes `torch.symeig`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/70988 Approved by: https://github.com/lezcano, https://github.com/kit1980	2023-01-23 22:51:40 +00:00
Kazuaki Ishizaki	d40a4540d6	Fix typo under docs directory (#92762 ) This PR fixes typo and URL (`http -> https`) in `rst` files under `docs` directory Pull Request resolved: https://github.com/pytorch/pytorch/pull/92762 Approved by: https://github.com/H-Huang	2023-01-23 18:07:22 +00:00
Masaki Kozuki	30876229a7	[mta] Backward of unary foreach functions (#89591 ) as per title, this PR defines backward of those. This doesn't implement forward-mode automatic differentiation as [the current codegen](`a747326423/tools/autograd/gen_variable_type.py (L1513)`) doesn't seem to handle `ArrayRef<Tensor>`. Rel: - https://github.com/pytorch/pytorch/issues/53796 - https://github.com/pytorch/pytorch/issues/58833 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89591 Approved by: https://github.com/albanD	2023-01-23 08:28:06 +00:00
Edward Z. Yang	85a1f0223a	Add a warning about performance cost of set_default_device (#92703 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/92703 Approved by: https://github.com/albanD	2023-01-21 02:23:13 +00:00
Edward Z. Yang	5c6f5439b7	Implement SymBool (#92149 ) We have known for a while that we should in principle support SymBool as a separate concept from SymInt and SymFloat (in particular, every distinct numeric type should get its own API). However, recent work with unbacked SymInts in, e.g., https://github.com/pytorch/pytorch/pull/90985 has made this a priority to implement. The essential problem is that our logic for computing the contiguity of tensors performs branches on the passed in input sizes, and this causes us to require guards when constructing tensors from unbacked SymInts. Morally, this should not be a big deal because we only really care about the regular (non-channels-last) contiguity of the tensor, which should be guaranteed since most people aren't calling `empty_strided` on the tensor. However, because we store a bool (not a SymBool, which did not exist prior to this PR) on TensorImpl, we are forced to immediately compute these values, even if the value ends up not being used at all. In particular, even when a user allocates a contiguous tensor, we still must compute channels-last contiguity (as some contiguous tensors are also channels-last contiguous, but others are not.) This PR implements SymBool, and makes TensorImpl use SymBool to store the contiguity information in ExtraMeta. There are a number of knock-on effects, which I now discuss below. * I introduce a new C++ type SymBool, analogous to SymInt and SymFloat. This type supports logical and, logical or and logical negation. I support the bitwise operations on this class (but not the conventional logic operators) to make it clear that logical operations on SymBool are NOT short-circuiting. I also, for now, do NOT support implicit conversion of SymBool to bool (creating a guard in this case). This does not matter too much in practice, as in this PR I did not modify the equality operations (e.g., `==` on SymInt) to return SymBool, so all preexisting implicit guards did not need to be changed. 
I also introduced symbolic comparison functions `sym_eq`, etc. on SymInt to make it possible to create SymBool. The current implementation of comparison functions makes it unfortunately easy to accidentally introduce guards when you do not mean to (as both `s0 == s1` and `s0.sym_eq(s1)` are valid spellings of equality operation); in the short term, I intend to prevent excess guarding in this situation by unit testing; in the long term making the equality operators return SymBool is probably the correct fix. * ~~I modify TensorImpl to store SymBool for the `is_contiguous` fields and friends on `ExtraMeta`. In practice, this essentially meant reverting most of the changes from https://github.com/pytorch/pytorch/pull/85936 . In particular, the fields on ExtraMeta are no longer strongly typed; at the time I was particularly concerned about the giant lambda I was using as the setter getting a desynchronized argument order, but now that I have individual setters for each field the only "big list" of boolean arguments is in the constructor of ExtraMeta, which seems like an acceptable risk. The semantics of TensorImpl are now that we guard only when you actually attempt to access the contiguity of the tensor via, e.g., `is_contiguous`. By and large, the contiguity calculation in the implementations now needs to be duplicated (as the boolean version can short circuit, but the SymBool version cannot); you should carefully review the duplicate new implementations. I typically use the `identity` template to disambiguate which version of the function I need, and rely on overloading to allow for implementation sharing. The changes to the `compute_` functions are particularly interesting; for most of the functions, I preserved their original non-symbolic implementation, and then introduce a new symbolic implementation that is branch-less (making use of our new SymBool operations). 
However, `compute_non_overlapping_and_dense` is special, see next bullet.~~ This appears to cause performance problems, so I am leaving this to an update PR. * (Update: the Python side pieces for this are still in this PR, but they are not wired up until later PRs.) While the contiguity calculations are relatively easy to write in a branch-free way, `compute_non_overlapping_and_dense` is not: it involves a sort on the strides. While in principle we can still make it go through by using a data oblivious sorting network, this seems like too much complication for a field that is likely never used (because typically, it will be obvious that a tensor is non overlapping and dense, because the tensor is contiguous.) So we take a different approach: instead of trying to trace through the logic computation of non-overlapping and dense, we instead introduce a new opaque operator IsNonOverlappingAndDenseIndicator which represents all of the compute that would have been done here. This function returns an integer 0 if `is_non_overlapping_and_dense` would have returned `False`, and an integer 1 otherwise, for technical reasons (Sympy does not easily allow defining custom functions that return booleans). The function itself only knows how to evaluate itself if all of its arguments are integers; otherwise it is left unevaluated. This means we can always guard on it (as `size_hint` will always be able to evaluate through it), but otherwise its insides are left a black box. We typically do NOT expect this custom function to show up in actual boolean expressions, because we will typically shortcut it due to the tensor being contiguous. It's possible we should apply this treatment to all of the other `compute_` operations, more investigation necessary. 
As a technical note, because this operator takes a pair of a list of SymInts, we need to support converting `ArrayRef<SymNode>` to Python, and I also unpack the pair of lists into a single list because I don't know if Sympy operations can actually validly take lists of Sympy expressions as inputs. See for example `_make_node_sizes_strides` * On the Python side, we also introduce a SymBool class, and update SymNode to track bool as a valid pytype. There is some subtlety here: bool is a subclass of int, so one has to be careful about `isinstance` checks (in fact, in most cases I replaced `isinstance(x, int)` with `type(x) is int` for expressly this reason.) Additionally, unlike C++, I do NOT define bitwise inverse on SymBool, because it does not do the correct thing when run on booleans, e.g., `~True` is `-2`. (For that matter, they don't do the right thing in C++ either, but at least in principle the compiler can warn you about it with `-Wbool-operation`, and so the rule is simple in C++; only use logical operations if the types are statically known to be SymBool). Alas, logical negation is not overrideable, so we have to introduce `sym_not` which must be used in place of `not` whenever a SymBool can turn up. To avoid confusion with `__not__` which may imply that `operators.__not__` might be acceptable to use (it isn't), our magic method is called `__sym_not__`. The other bitwise operators `&` and `|` do the right thing with booleans and are acceptable to use. * There is some annoyance working with booleans in Sympy. Unlike int and float, booleans live in their own algebra and they support fewer operations than regular numbers. In particular, `sympy.expand` does not work on them. To get around this, I introduce `safe_expand` which only calls expand on operations which are known to be expandable. TODO: this PR appears to greatly regress performance of symbolic reasoning. 
In particular, `python test/functorch/test_aotdispatch.py -k max_pool2d` performs really poorly with these changes. Need to investigate. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/92149 Approved by: https://github.com/albanD, https://github.com/Skylion007	2023-01-21 02:21:56 +00:00
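The Python-side subtleties listed in #92149 (bool being a subclass of int, and `not` not being overridable) can be shown with a toy class. This is a sketch under stated assumptions: `ToySymBool`, `_show`, and `is_plain_int` are hypothetical, and this `sym_not` only mimics the dispatch-through-`__sym_not__` idea the commit describes; it is not the PyTorch implementation.

```python
class ToySymBool:
    """Hypothetical symbolic boolean: records an expression instead of a value."""

    def __init__(self, expr: str) -> None:
        self.expr = expr

    def __and__(self, other: object) -> "ToySymBool":
        # `&` is overridable, and both operands are always evaluated
        # (no short-circuiting), which is what a traced expression needs.
        return ToySymBool(f"({self.expr} & {_show(other)})")

    def __or__(self, other: object) -> "ToySymBool":
        return ToySymBool(f"({self.expr} | {_show(other)})")

    def __sym_not__(self) -> "ToySymBool":
        return ToySymBool(f"Not({self.expr})")

    def __bool__(self) -> bool:
        # `not x` and `if x:` go through __bool__, which must return a real bool.
        # A symbolic value cannot do that without committing to a concrete answer.
        raise TypeError("converting a symbolic bool to bool would require a guard")


def _show(x: object) -> str:
    return x.expr if isinstance(x, ToySymBool) else repr(x)


def sym_not(x):
    # Dispatch to the symbolic method when present; otherwise plain `not`.
    if hasattr(x, "__sym_not__"):
        return x.__sym_not__()
    return not x


def is_plain_int(x: object) -> bool:
    # isinstance(True, int) is True, so an exact type check is needed
    # to tell ints apart from bools.
    return type(x) is int


s0, s1 = ToySymBool("s0"), ToySymBool("s1")
combined = sym_not(s0 & s1)
print(combined.expr, sym_not(True), is_plain_int(True), is_plain_int(3))
```

Writing `not s0` here raises `TypeError`, which is the toy analogue of needing a guard; `sym_not(s0)` keeps the expression symbolic instead.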
Will Constable	a2b8e891f6	Fix/modernize dynamo docs (#92572 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/92572 Approved by: https://github.com/ezyang	2023-01-19 16:15:31 +00:00
Edward Z. Yang	6420fecdc4	Introduce sym_min and sym_max (#92107 ) It turns out our old max/min implementation didn't do anything, because `__max__` and `__min__` are not actually magic methods in Python. So I give 'em the `sym_` treatment, similar to the other non-overrideable builtins. NB: I would like to use `sym_max` when computing contiguous strides but this appears to make `python test/functorch/test_aotdispatch.py -v -k test_aot_autograd_symbolic_exhaustive_nn_functional_max_pool2d_cpu_float32` run extremely slowly. Needs investigating. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/92107 Approved by: https://github.com/albanD, https://github.com/voznesenskym, https://github.com/Skylion007	2023-01-18 20:57:27 +00:00
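The observation in #92107 that `__max__` is not a magic method can be checked directly. In this sketch `ToySymInt`, `_name`, the shared `log`, and the `__sym_max__` hook name are assumptions made for illustration (the commit message only says the builtins get the `sym_` treatment); this `sym_max` is a toy, not the PyTorch function.

```python
import builtins
from typing import List

log: List[str] = []  # records which special methods actually get invoked


class ToySymInt:
    """Hypothetical symbolic integer identified by a name."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __max__(self, other: object) -> "ToySymInt":
        # Looks like a magic method, but builtins.max never looks it up.
        log.append("__max__")
        return self

    def __gt__(self, other: object) -> bool:
        # builtins.max works through ordinary comparisons instead.
        log.append("__gt__")
        return False

    def __sym_max__(self, other: object) -> "ToySymInt":
        return ToySymInt(f"Max({self.name}, {_name(other)})")


def _name(x: object) -> str:
    return x.name if isinstance(x, ToySymInt) else repr(x)


def sym_max(a, b):
    # Prefer the symbolic hook on either operand; fall back to the builtin.
    if hasattr(a, "__sym_max__"):
        return a.__sym_max__(b)
    if hasattr(b, "__sym_max__"):
        return b.__sym_max__(a)
    return builtins.max(a, b)


s0, s1 = ToySymInt("s0"), ToySymInt("s1")
picked = max(s0, s1)  # goes through __gt__, never through __max__
symbolic = sym_max(s0, s1)
print(log, symbolic.name, sym_max(2, 5))
```

Because the builtin only compares its arguments, a free function such as `sym_max` is the only place where a symbolic result can be constructed.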


2232 Commits