Changed the API of `pipeline()` to take a microbatch instead of the full batch as example args (a call sketch follows below).
Main purpose is to:
- make this API more atomic;
- decouple the tracing frontend from runtime info like `num_chunks`.
Side effects:
- Creates the opportunity to vary `num_chunks` across schedules built from the same `pipe` object.
- The user has to create an example microbatch input.
- Chunk spec handling is now moved entirely to the runtime side.
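A rough sketch of the new call pattern (the module and sizes are made up for illustration, and the keyword names are assumed from the `torch.distributed.pipelining` frontend, so they may differ slightly):
```
import torch
from torch.distributed.pipelining import pipeline

# Hypothetical model and batch, for illustration only.
model = torch.nn.Sequential(torch.nn.Linear(512, 512), torch.nn.ReLU(), torch.nn.Linear(512, 512))
full_batch = torch.randn(32, 512)
num_chunks = 4  # now a runtime/schedule concern, not a tracing-time argument

# New: trace with a single example microbatch instead of the full batch + num_chunks.
example_microbatch = full_batch[: full_batch.size(0) // num_chunks]
pipe = pipeline(model, mb_args=(example_microbatch,))
```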
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128163
Approved by: https://github.com/H-Huang
Renaming `ManualPipelineStage` to remove the "Manual" part. I needed to replace the existing `PipelineStage`, which takes the `pipe` argument, so I have renamed that one to `TracerPipelineStage`. @kwen2501 will remove it entirely in favor of adding a util to `Pipe` that creates the stage directly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128157
Approved by: https://github.com/wconstab
Summary:
Part of the work helping export's automatic dynamic shapes / dynamic shapes refining based on suggested fixes.
Introduces a util function `refine_dynamic_shapes_from_suggested_fixes()` that takes a `ConstraintViolationError` message containing suggested dynamic shapes fixes, along with the original dynamic shapes spec, and returns the refined spec. It is written so that the suggested fixes emitted by export can be parsed and used directly.
Example usage for the automatic dynamic shapes workflow:
```
# export, fail, parse & refine suggested fixes, re-export
import torch
from torch.export import export
from torch.export.dynamic_shapes import refine_dynamic_shapes_from_suggested_fixes

try:
    export(model, inps, dynamic_shapes=dynamic_shapes)
except torch._dynamo.exc.UserError as exc:
    new_shapes = refine_dynamic_shapes_from_suggested_fixes(exc.msg, dynamic_shapes)
    export(model, inps, dynamic_shapes=new_shapes)
```
For examples of behavior, see the added test and docstring. Will take suggestions for renaming the function to something else 😅
Test Plan: test_export tests
Differential Revision: D57409142
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127436
Approved by: https://github.com/avikchaudhuri
Fixes #126367.
## Description
Fixed a broken link in the pytorch/docs/source/torch.compiler_faq.rst doc and deleted a few extraneous words, as described in the issue referenced above.
## Checklist
- [X] The issue that is being fixed is referred in the description
- [X] Only one issue is addressed in this pull request
- [X] Labels from the issue that this PR is fixing are added to this pull request
- [X] No unnecessary issues are included in this pull request
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127938
Approved by: https://github.com/msaroufim
For a masked `tl.load` operation, the Triton language specifies that values masked out (i.e. where the mask evaluates to false) are undefined in the output of the load. Triton provides an optional `other` parameter which, when included, provides an explicit value to use for masked out values from the load. If the output from a masked load without the `other` parameter is used in a conditional, unexpected behavior can occur.
Despite the language specification, all Triton backends currently in use by PyTorch Inductor (NVIDIA, AMD, and Intel) zero-initialize masked loads when `other` is not present (we recently changed the Intel backend behavior to match NVIDIA and AMD because that's what our users expect, even if it doesn't follow the Triton spec to the letter). This PR attempts to "future-proof" Inductor for new backends (or perhaps changes in the current backends: we did not see any performance change from zero-initializing in the Intel XPU backend, but one could imagine compiler optimizations that remove paths depending on undefined values) by adding an explicit `other` in instances where later conditionals depend on the `tl.load` output.

I also removed an exception to the `other` behavior for boolean loads, which was put in place for a Triton bug that should now be fixed. I added `other` to the getting-started documentation as a hint that masked load behavior requires explicit initialization, even though I don't expect undefined values to cause the example code to fail if the underlying output is not zero-initialized. Finally, I added `other` to the `make_load` function in `select_algorithm.py`, though I wasn't able to determine whether that function is actually being called.
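As an illustration (a minimal hand-written sketch, not the code Inductor generates), passing `other` explicitly makes the masked-out lanes well-defined before the loaded value feeds a conditional:
```
import triton
import triton.language as tl

@triton.jit
def abs_kernel(in_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    offsets = tl.program_id(0) * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    # Without `other`, lanes where `mask` is false are undefined per the Triton spec;
    # `other=0.0` pins them to a known value before the conditional below.
    x = tl.load(in_ptr + offsets, mask=mask, other=0.0)
    y = tl.where(x > 0, x, -x)  # conditional that depends on the loaded values
    tl.store(out_ptr + offsets, y, mask=mask)
```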
Fixes #126535
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127311
Approved by: https://github.com/jansel
We create a new landing page for PyTorch custom ops (suggested by
jansel). All of our error messages will link here, and I'll work with
the docs team to see if we can boost SEO for this page.
NB: the landing page links some non-searchable webpages. Two of those
(the Python custom ops tutorial and C++ custom ops tutorial) will turn
into actual webpages when PyTorch 2.4 comes around. I'll make the third one
(the Custom Operators Manual) once it stabilizes (we continuously add new
things to it, and its length means that we might want to create a custom
website for it to make the presentation more digestible).
Test Plan:
- view docs preview.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127400
Approved by: https://github.com/jansel
ghstack dependencies: #127291, #127292
The `usort` config in `pyproject.toml` has no effect due to a typo. Fixing the typo makes `usort` do more and generates the changes in this PR. Except for `pyproject.toml`, all changes are generated by `lintrunner -a --take UFMT --all-files`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127126
Approved by: https://github.com/kit1980
ghstack dependencies: #127122, #127123, #127124, #127125
The `usort` config in `pyproject.toml` has no effect due to a typo. Fixing the typo makes `usort` do more and generates the changes in this PR. Except for `pyproject.toml`, all changes are generated by `lintrunner -a --take UFMT --all-files`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127124
Approved by: https://github.com/Skylion007
ghstack dependencies: #127122, #127123
# Motivation
## for `torch.amp.GradScaler`,
- `torch.cpu.amp.GradScaler(args...)` is completely equivalent to `torch.amp.GradScaler("cpu", args...)`.
- `torch.cuda.amp.GradScaler(args...)` is completely equivalent to `torch.amp.GradScaler("cuda", args...)`.
So we intend to deprecate them and **strongly recommend** that developers use `torch.amp.GradScaler`.
## for `custom_fwd` and `custom_bwd`,
these decorators are a good solution for making a custom function run correctly, with or without autocast effects, in an autocast-enabled region, and the mechanism can be shared by other backends, like CPU and XPU.
So we generalize them to be device-agnostic, put them in `torch/amp/autocast_mode.py`, and re-expose them as `torch.amp.custom_fwd` and `torch.amp.custom_bwd`. Meanwhile, we deprecate `torch.cuda.amp.custom_fwd` and `torch.cuda.amp.custom_bwd`.
# Additional Context
Added UTs to cover the deprecation warnings.
No additional UTs are needed to cover the functionality of `torch.amp.custom_f/bwd`; the existing UTs that previously covered `torch.cuda.amp.custom_f/bwd` cover them.
To facilitate review, we separated these code changes into two PRs: the first PR covers `torch.amp.GradScaler`, and the follow-up covers `custom_fwd` and `custom_bwd`.
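A minimal sketch of the recommended replacements, assuming the device-generic entry points described above (the deprecated forms are noted in comments):
```
import torch

# Device-generic scaler; equivalent to the deprecated torch.cuda.amp.GradScaler().
scaler = torch.amp.GradScaler("cuda")

class ScaledMul(torch.autograd.Function):
    @staticmethod
    @torch.amp.custom_fwd(device_type="cuda")  # was: torch.cuda.amp.custom_fwd
    def forward(ctx, a, b):
        ctx.save_for_backward(a, b)
        return a * b

    @staticmethod
    @torch.amp.custom_bwd(device_type="cuda")  # was: torch.cuda.amp.custom_bwd
    def backward(ctx, grad_out):
        a, b = ctx.saved_tensors
        return grad_out * b, grad_out * a
```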
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126527
Approved by: https://github.com/jgong5, https://github.com/gujinghui, https://github.com/janeyx99, https://github.com/EikanWang
Summary:
1. Define an explicit `use_agent_store` on rdzv handlers. Handlers that set it to true can share the store.
2. Instead of the agent coordinating master_addr/master_port values, the logic is now encapsulated by the *rdzv_handler*: `RendezvousInfo` will carry a `RendezvousStoreInfo` object that handlers must return.
- Depending on the implementation they can either:
  - point to the existing store (and are expected to set `use_agent_store` to true, per point 1). Client code will rely on the `TORCHELASTIC_USE_AGENT_STORE` env variable to know whether the store is shared.
  - build args that `torch.distributed.init_process_group` can bootstrap from by creating a new store.
Additional points:
- When the TCPStore is shared, it should be wrapped in a PrefixStore to qualify/scope the namespace for other use cases.
- The `next_rendezvous` signature changed to return an instance of `RendezvousInfo` instead of a `(store, rank, world_size)` tuple, for extensibility (a usage sketch follows below).
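A hypothetical sketch of consuming the new return value (attribute names are assumed from the description above, not verified against the final API):
```
# Before: store, rank, world_size = rdzv_handler.next_rendezvous()
rdzv_info = rdzv_handler.next_rendezvous()
store, rank, world_size = rdzv_info.store, rdzv_info.rank, rdzv_info.world_size

# Bootstrap info (e.g. master addr/port) is carried by a RendezvousStoreInfo object.
bootstrap = rdzv_info.bootstrap_store_info
```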
Why:
- Reduce moving parts
- easier to swap implementation
- improve tractability
- addressing perf/debuggability will benefit all use cases
Test Plan: CI
Differential Revision: D57055235
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125743
Approved by: https://github.com/d4l3k
Rule is enforced by #126103.
The rule:
- If `torch.a.b` defines a public class `C` (i.e. one to be exposed in the torch API namespace), then `torch.a.b` must be a public path, i.e. with no `_`.
- `torch.a.b` should ideally have an `__all__` that defines what gets imported when the module is imported.
- All other definitions in `torch.a.b` that you don't want to expose should have a `_` prefix (see the sketch below).
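An illustrative module layout that follows the rule (the names here are hypothetical):
```
# torch/a/b.py
__all__ = ["C"]  # names exposed when the module is imported


class C:
    """Public class, exposed in the torch API namespace."""


def _helper():
    """Private helper; the leading underscore keeps it out of the public API."""
```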
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126812
Approved by: https://github.com/wconstab
The padded dense -> jagged conversion op has the signature:
```
_fbgemm_dense_to_jagged_forward(Tensor dense, Tensor[] offsets, SymInt? total_L=None) -> Tensor
```
When `total_L` is not specified, the meta registration has a data-dependent output shape (based on `offsets[0][-1]`). Returning an unbacked SymInt here should work in theory, but traceable wrapper subclass support was missing in the later code that handles deferred runtime asserts. This PR fixes that.
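For context, a hedged sketch of the general pattern for a data-dependent fake/meta registration (the op name and schema here are hypothetical, not the actual FBGEMM registration):
```
import torch

# Hypothetical op, defined only so the registration below has something to attach to.
torch.library.define("mylib::dense_to_jagged_forward",
                     "(Tensor dense, Tensor[] offsets, SymInt? total_L=None) -> Tensor")

@torch.library.register_fake("mylib::dense_to_jagged_forward")
def _(dense, offsets, total_L=None):
    if total_L is None:
        # Data-dependent output length: allocate an unbacked SymInt instead of peeking at offsets.
        total_L = torch.library.get_ctx().new_dynamic_size()
    return dense.new_empty((total_L, dense.shape[-1]))
```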
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126198
Approved by: https://github.com/ezyang
#### Conditions for allowlisting tensor subclasses
We allow tensor subclass types that
(1) Do not override `__setstate__`, `__getattr__`, `__setattr__`, `__get__`, `__set__` or `__getattribute__` of `torch.Tensor` (`torch.Tensor` does not have a definition of `__getattr__`, `__get__` or `__set__` so we check that these are `None`)
(2) Use the generic `tp_alloc`
(3) Are in a module that *has been imported by the user*
to be pushed onto the stack as strings by `GLOBAL` instructions, while storing the type in a dict.
The strings will be converted back to the classes as appropriate when executing `REBUILD` with `_rebuild_from_type_v2`.
*Note that we use `inspect.getattr_static(sys.modules[module], name)` to get the class/function, as this method claims to have no code execution.*
The rationale for the 3 conditions above is as follows:
The rebuild func provided by `Tensor.__reduce_ex__` is `torch._tensor._rebuild_from_type_v2`, which is defined as follows (note the call to `getattr`, `Tensor.__setstate__`, and the call to `as_subclass`, as well as the call to `_set_obj_state`, which calls `setattr`):
4e66aaa010/torch/_tensor.py (L57-L71)
`as_subclass` is implemented with a call to `THPVariable_NewWithVar`
that will eventually call `tp_alloc` here
4e66aaa010/torch/csrc/autograd/python_variable.cpp (L2053)
The `func` arg to `_rebuild_from_type_v2` for wrapper subclasses is `Tensor.rebuild_wrapper_subclass`, which will similarly call into `THPVariable_NewWithVar` and hit the above `tp_alloc`
**Note that we do not call `tp_init` or `tp_new` (i.e. `cls.__init__` or `cls.__new__`) when unpickling**
### How we check that something is a tensor subclass / constraints around imports
In order to check whether `bla` is a tensor subclass in the bytecode `GLOBAL module.name`, we need to do an `issubclass` check, which entails converting the global string to the appropriate type. We *do not* arbitrarily import modules, but we will perform this check as long as the given subclass (given by `module.name`) has already been imported by the user (i.e. `module in sys.modules` and `issubclass(getattr(sys.modules[module], name), torch.Tensor)`).
This PR also allowlisted `torch._utils._rebuild_wrapper_subclass` and `torch.device` (used by `_rebuild_wrapper_subclass`)
### API for allowlisting
This PR also adds `torch.serialization.{add/get/clear}_safe_globals`, which enables users to allowlist globals they have deemed safe and to manipulate this list (for example, they could allowlist a tensor subclass with a custom `__setstate__` after checking that it is safe).
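A minimal usage sketch of the new API (the subclass and module below are hypothetical):
```
import torch
from mypackage.tensors import MyTensorSubclass  # hypothetical user-defined subclass

# After verifying the class is safe to load, allowlist it for weights_only loading.
torch.serialization.add_safe_globals([MyTensorSubclass])
print(torch.serialization.get_safe_globals())

state = torch.load("checkpoint.pt", weights_only=True)

torch.serialization.clear_safe_globals()
```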
Next steps:
- Add testing and allowlist required classes for all in-core tensor subclasses (e.g. `DTensor`, `FakeTensor` etc.)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124331
Approved by: https://github.com/albanD
This does a few things that were originally a few PRs but I am on a new machine and don't have ghstack.
If it is too problematic to review, I can re-split, just let me know.
This does:
- Clean up context manager use in test_flop_counter
- Remove the need for the `mod` argument in FlopCounterMode and warn when it is passed
- Re-implement a Module tracker from scratch using global forward Module hooks and multi_grad_hook (we cannot use global backward Module hooks because they don't look for nested Tensors and they're custom-Function based instead of multi_grad_hook based).
- Update FlopCounterMode to use the new ModuleTracker. The existing test suite passes as-is (the only changes there are new tests and the refactoring mentioned above); a minimal usage sketch follows below.
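A small usage sketch of the updated mode (no `mod` argument; module attribution comes from the new tracker), assuming a toy model for illustration:
```
import torch
from torch.utils.flop_counter import FlopCounterMode

model = torch.nn.Linear(64, 64)
x = torch.randn(8, 64)

with FlopCounterMode(display=False) as flop_counter:
    model(x).sum().backward()
print(flop_counter.get_total_flops())
```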
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125352
Approved by: https://github.com/mikaylagawarecki
`torch.utils.benchmark.Compare` is not directly exposed in the torch.utils.benchmark documentation.
I think this is a valuable resource to add, since it can help people embrace the torch benchmark way of doing things and help people build documentation around it.
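For reference, a small sketch of the kind of usage the documentation would cover (sizes and labels are made up):
```
import torch.utils.benchmark as benchmark

results = []
for n in (128, 256):
    timer = benchmark.Timer(
        stmt="x @ x",
        setup=f"import torch; x = torch.randn({n}, {n})",
        label="matmul",
        sub_label=f"{n}x{n}",
        description="eager",
    )
    results.append(timer.blocked_autorange(min_run_time=0.2))

benchmark.Compare(results).print()
```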
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125009
Approved by: https://github.com/mikaylagawarecki
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):
* #124944
* #124939
* __->__ #122965
Differential Revision: [D55493240](https://our.internmc.facebook.com/intern/diff/D55493240/)
*This PR is now ready for merge and is not an RFC*
Major choices are:
- the introduction of the `AsyncStager` protocol;
- removed `executor` from the parameters;
- left async as a separate method (for now).
This proposal seeks to add extension points to dcp.async_save, allowing users to:
- Specify a specific staging method when calling async_save
- Allow a vehicle for also making the staging method async, to allow for cases where we may want to overlap staging with the training loop (e.g., overlap d2h copies with training and only synchronize at the optim.step)
- Potentially specify the execution method for doing async_save in parallel. For example some users may prefer a subprocess over a thread to avoid GIL issues.
A totally reasonable alternative to this entire proposal is to expect users who want this level of customization
to write their own custom async save methods. Here's an example which addresses the issues mentioned
in PR comments.
```
def custom_async_save(...):
    # this step accomplishes staging and includes the usual 'planning' calls (issue 1)
    buffered_writer = CpuBufferedWriter()  # this is stateful, contains a copy of state_dict
    dcp.save(state_dict, storage_writer=buffered_writer)

    final_storage_writer = FileSystemWriter()
    mp.spawn(  # issue 2 is gone, do whatever you want here
        dcp.save,  # or some custom sub-process method which calls dcp.save under the hood
        buffered_writer.state_dict,  # lots of ways to do this, not really the most important part
        checkpoint_id=checkpoint_id,
        storage_writer=storage_writer,
        planner=planner,
        process_group=process_group,  # this actually wouldn't work, but again not the point
    )
    # leaving out the rest of the details for managing your extra special subprocess.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122965
Approved by: https://github.com/daulet-askarov
Fixes #124528
Going over the options for our MapAllocator and what they do, I don't think any of the others need to be piped up to `torch.load`
4f29103749/aten/src/ATen/MapAllocator.h (L8-L16)
~However, I wonder if this `MmapVisibility(Enum)` is a good way to represent "or-ing" together of `mmap` flags if we want to extend it in the future. I looked over the flags for [`mmap(2)`](https://man7.org/linux/man-pages/man2/mmap.2.html), and could not immediately see how most of them would be useful for `torch.load` (would maybe `MAP_LOCKED` (like `mlock`) or `MAP_HUGE` ever be worthwhile?)~
Instead, this PR uses the flags provided by the Python `mmap` library so that we can extend the allowed flags and pipe them down to the C++ `mmap` call if there is a need for other flags in the future.
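A hedged sketch of the intended usage, assuming the `set_default_mmap_options` entry point added in this change (the flag choice is purely illustrative):
```
import mmap
import torch

# Assumed API: configure the flags used for the underlying mmap call in torch.load(..., mmap=True).
torch.serialization.set_default_mmap_options(mmap.MAP_SHARED)
state_dict = torch.load("checkpoint.pt", mmap=True, weights_only=True)
```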
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124889
Approved by: https://github.com/albanD
This PR introduces a new way of building `dynamic_shapes` for export. The idea is to build up a mapping from input tensors to the dynamic shapes that should be assigned to their corresponding fake tensors.
This mapping is automatically converted to the current form of `dynamic_shapes`, which must exactly match the structure of inputs. We do this by using pytree utils.
With the current `dynamic_shapes`, we had to be careful about user-defined classes that are registered with pytree, since such classes are not necessarily polymorphic containers; they may be fine containing tensors, but not dynamic shapes. Thus we had decided to allow input instances of such classes to be associated with dynamic shapes in flattened form. This decision needs to be mirrored in this PR as well. To make it easier to keep these code paths in sync, we refactor the current recursive procedure for associating inputs with dynamic shapes to use the same pytree utils. This needs minor fixes to a few tests where `dynamic_shapes` did not exactly match the structure of inputs.
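A hedged sketch of the mapping-based form, assuming a `ShapesCollection`-style helper in `torch.export` (the exact names may not match the final API):
```
import torch
from torch.export import Dim, ShapesCollection, export

class M(torch.nn.Module):
    def forward(self, x, y):
        return x[:, :1] + y

x = torch.randn(4, 8)
y = torch.randn(4, 1)

batch = Dim("batch")
shapes = ShapesCollection()
shapes[x] = (batch, None)  # keyed on the input tensor itself
shapes[y] = (batch, None)

# The mapping is converted to a spec matching the input structure via pytree utils.
ep = export(M(), (x, y), dynamic_shapes=shapes)
```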
Differential Revision: D56551992
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124898
Approved by: https://github.com/zhxchen17
Summary:
This makes barrier and rank operations linear instead of quadratic with the number of workers. This drastically improves performance for rendezvous when running with over 1000 hosts.
This uses 2 approaches for different areas:
* local rank assignment: each worker does 1 set and 1 get; local ranks are assigned on the rank 0 host in an O(n) operation, which keeps total store operations linear in the number of workers.
* exit_barrier: use a counter and a final flag so each worker has to do at most 1 set, 1 get, and 1 add (an illustrative sketch follows below).
At 4000 hosts, we see torchelastic able to run in as little as 10 seconds, down from 373 seconds.
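An illustrative sketch of the counter-plus-flag barrier pattern over a c10d store (not the actual torchelastic implementation; names are made up):
```
import torch.distributed as dist

def exit_barrier(store: dist.Store, world_size: int, key: str = "exit_barrier") -> None:
    # Each worker performs at most one add, one set, and one wait.
    arrived = store.add(key, 1)
    if arrived == world_size:
        store.set(f"{key}.done", "1")  # last arrival flips the completion flag
    store.wait([f"{key}.done"])        # everyone blocks on a single key
```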
Test Plan:
This is testing using many small tests running on a remote cluster.
{D56549942}
```
torchx run --scheduler mast -- --image=torchelastic_benchmark --j=4000x1
```
Differential Revision: D56605193
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124982
Approved by: https://github.com/kiukchung, https://github.com/kurman