pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Xilun Wu	a66107a30c	[DTensor][Random] Introduce CudaRNGStateTracker to maintain parallel RNG state for DTensor (#103235 ) # Change This PR adds two classes to DTensor: 1. `CudaRNGStateTracker`: `CudaRNGStateTracker` stores Random Number Generator (RNG) state (a `ByteTensor` object) in a `dict`, mapping from a corresponding tag to each state tensor. It also provides a set of convenient utility methods to help access/modify the state tensors. The most important interface is `_distribute_region` which will be used when DTensor executes a random op (an operator that calls RNG). 2. `OffsetBasedRNGTracker`: This subclass of `CudaRNGStateTracker` defines the default policy of how RNG states should be shared and synchronized among all ranks to respect the semantics of DTensor random operators. # Warning - With `Multi-threaded ProcessGroup`, the global variable `_rng_tracker` will be shared among threads (ranks) and cause issues. We need to figure out a compatible solution for that. - The RNG state may be asynchronous outside of participating ranks. It is harmless in our current use case of submesh though. Pull Request resolved: https://github.com/pytorch/pytorch/pull/103235 Approved by: https://github.com/wanchaol	2023-06-27 19:00:25 +00:00
Wanchao Liang	70eccdbf92	[dtensor] add necessary logging to APIs and components (#101994 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/101994 Approved by: https://github.com/wz337	2023-05-23 18:17:54 +00:00
Wanchao Liang	a1aa32e204	[dtensor] tensor ops to use strategy based sharding prop (#100607 ) This is the first in a series of PRs that adopt operator impls to use a strategy based approach, each op utilizes OpStrategy and PlacementStrategy to generate their own strategy. By utilizing the strategy based approach along with the op graph, we could enable more advanced op implementation (decomp is possible), and turn the sharding prop to be more like a constraint satisfaction problem. This PR alone only adds some basic tensor op strategies, and it directly works on the op graph that was used for metadata propagation. The tensor ops added in this PR mainly follow one of the arg strategies. The next set of PRs would add more op strategies to other ops. Pull Request resolved: https://github.com/pytorch/pytorch/pull/100607 Approved by: https://github.com/XilunWu	2023-05-11 02:47:20 +00:00
Xilun Wu	964c7e3e85	[BE][DTensor] fix DTensor equal op (#99014 ) ## What problem this PR solves? #97170 fixed `equal` operator return type (old: Tensor, now: bool) by giving it the correct sharding propagation. This is consistent with the `aten::equal` op. However, the correctness only stays at the local result level: * `equal` op returns True if the local copy of dtensor A equals the local copy of dtensor B This is not the correct semantic of `equal` which should return True if all local copies of A are equal to the corresponding local copies of B. ## What is this PR? 1. For non-participating ranks, if the return type is scalar, `local_results` is set to `None` which means the default value is a reduced result of participating ranks only. 2. For all ranks, if the return type is scalar and the `op_call` is `aten::equal` (because `aten::equal` is the only function that returns scalar value and needs communication), all gather the `local_results` within the `default pg` and reduce on them with `operator.and_`. The result will be the new `local_result`. ## Result/Impact For non-participating ranks and the return type is scalar: 1. op is `aten::equal`, the return value is the same as on all other ranks 2. op is not `aten::equal`, the return value is None. Before this PR, this will raise "NotImplementedError" but has not been tested. For participating ranks and the return type is scalar: 1. op is `aten::equal`, the return value is the equality of two dtensor operands - True if all copies are equal, False otherwise. 2. op is not `aten::equal`, simply the local computation result. Pull Request resolved: https://github.com/pytorch/pytorch/pull/99014 Approved by: https://github.com/wanchaol	2023-04-18 03:22:44 +00:00
Xilun Wu	e686a1e1b3	[DTensor][2/N] add Philox offset adjustment logic in operator_dispatch (#98199 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/98199 Approved by: https://github.com/wanchaol	2023-04-10 23:57:04 +00:00
Yifu Wang	0830808dde	[spmd expansion] speed up expansion by ~5x (#98389 ) According to profiling, the top two expensive operations in spmd expansion are propagate_op_sharding and make_fx (for every dispatcher op node). This PR makes the following changes to speed up spmd expansion: - We are unnecessarily doing propagate_op_sharding twice for every op. Remove one. - When no tensor redistribution is required, we only need to update non-tensor args of the node according to op_schema and avoid building a GraphModule just for the node. On a DDP use case + foreach Adam, this change speeds up spmd expansion by ~5x (~10 min -> ~2 min). Pull Request resolved: https://github.com/pytorch/pytorch/pull/98389 Approved by: https://github.com/mrshenli	2023-04-05 16:31:40 +00:00
Shen Li	02179827cb	[Easy] Include SPMD and DTensor files in UFMT checks (#98148 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/98148 Approved by: https://github.com/fegin	2023-04-02 15:34:49 +00:00
Shen Li	e8d39606eb	[SPMD] Enable fused Adam in full train step tracing (#98113 ) Differential Revision: [](https://our.internmc.facebook.com/intern/diff/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/98113 Approved by: https://github.com/yifuwang, https://github.com/fegin	2023-04-01 15:54:13 +00:00
Shen Li	bccf2ef0ce	Format DTensor dispatch.py and _meta_registrations.py (#98114 ) Format-only changes with black and lintrunner to prepare for the commit on top. Differential Revision: [D44603809](https://our.internmc.facebook.com/intern/diff/D44603809) Pull Request resolved: https://github.com/pytorch/pytorch/pull/98114 Approved by: https://github.com/yifuwang, https://github.com/fegin	2023-04-01 15:54:13 +00:00
Wanchao Liang	e9c4904915	[dtensor] remove custom dispatch op (#95629 ) Since we removed all custom dispatch ops, we can safely delete this table as we won't use it for other purposes Pull Request resolved: https://github.com/pytorch/pytorch/pull/95629 Approved by: https://github.com/XilunWu	2023-03-28 02:25:45 +00:00
Xilun Wu	c2d7508276	[DTensor] default value for DTensor ops on non-participating devices (#95852 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/95852 Approved by: https://github.com/wanchaol	2023-03-23 19:30:02 +00:00
Wanchao Liang	2a1cb9640c	[dtensor] support creating DTensor in submesh (#95458 ) This PR supports creating DTensor in a submesh, if the rank is not participating in the mesh, we assign the local tensor to be empty tensor, and do nothing in the operator dispatch Differential Revision: [D43643577](https://our.internmc.facebook.com/intern/diff/D43643577) Pull Request resolved: https://github.com/pytorch/pytorch/pull/95458 Approved by: https://github.com/XilunWu	2023-02-28 17:54:26 +00:00
Wanchao Liang	bb9a05b116	[dtensor] use tracing for metadata prop (#95456 ) This PR uses tracing for metadata prop, so that we can get correct shape/stride metadata without manual calculation by ourselves. The follow up PR on this would be to adopt tracing for the sharding prop itself Differential Revision: [D43643578](https://our.internmc.facebook.com/intern/diff/D43643578) Pull Request resolved: https://github.com/pytorch/pytorch/pull/95456 Approved by: https://github.com/XilunWu	2023-02-28 17:54:22 +00:00
Aaron Gokaslan	67d9790985	[BE] Apply almost all remaining flake8-comprehension checks (#94676 ) Applies the remaining flake8-comprehension fixes and checks. This change replaces all remaining unnecessary generator expressions with list/dict/set comprehensions which are more succinct, performant, and better supported by our torch.jit compiler. It also removes useless generators such as `set(a for a in b)`, resolving it into just the set call. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94676 Approved by: https://github.com/ezyang	2023-02-12 01:01:25 +00:00
Wanchao Liang	b072245178	[dtensor][4/N] refactor dispatching logic and add propagator (#90733 ) This PR refactors the dispatching logic to make it cleaner, and isolates the sharding propagation logic out to a separate class. This is so that we can implement more complicated propagation features later. Differential Revision: [D42876251](https://our.internmc.facebook.com/intern/diff/D42876251) Pull Request resolved: https://github.com/pytorch/pytorch/pull/90733 Approved by: https://github.com/XilunWu, https://github.com/fduwjj	2023-02-01 05:02:11 +00:00
Wanchao Liang	c55f6973e4	[dtensor][3/N] move OpSchema and types to a separate file (#90732 ) This PR moves OpSchema and types to a separate file to resolve circular dependency better, this is part of refactor on dispatching logic to enable more complicated features Pull Request resolved: https://github.com/pytorch/pytorch/pull/90732 Approved by: https://github.com/XilunWu	2023-01-18 07:16:23 +00:00
Wanchao Liang	a1186d6af9	[dtensor][1/N] add __hash__ to device_mesh and dtensor_spec (#90731 ) This PR adds __hash__ to device_mesh and dtensor_spec to allow things like dict indexing Pull Request resolved: https://github.com/pytorch/pytorch/pull/90731 Approved by: https://github.com/XilunWu, https://github.com/fduwjj	2023-01-18 07:16:21 +00:00
Wanchao Liang	2c2cce73d4	[dtensor] remove torchgen function schema and parse manually (#90106 ) This PR gets rid of torchgen FunctionSchema parsing and parses it manually, it should resolve torchgen package issue and also provide some perf wins when running DTensor eagerly Pull Request resolved: https://github.com/pytorch/pytorch/pull/90106 Approved by: https://github.com/awgu	2022-12-06 05:45:00 +00:00
Wanchao Liang	bf23e0bdbd	[dtensor] ufmt distributed._tensor (#89967 ) cmd: `ufmt format torch/distributed/_tensor` copy from Andrew: Notes For VSCode users, Install ufmt: https://pypi.org/project/ufmt/ Install VSCode ufmt extension: https://marketplace.visualstudio.com/items?itemName=omnilib.ufmt Include in settings.json: ``` { "[python]": { "editor.defaultFormatter": "omnilib.ufmt", "editor.formatOnSave": true, }, } ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/89967 Approved by: https://github.com/fduwjj	2022-12-01 20:58:13 +00:00
fduwjj	009dd3c4af	[PT-D][Tensor Parallel] Add more test cases when we use use_orig_params for FSDP wrapping (#89779 ) Differential Revision: [D41600656](https://our.internmc.facebook.com/intern/diff/D41600656) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89779 Approved by: https://github.com/wanchaol	2022-11-30 06:34:58 +00:00
Wanchao Liang	4b945967de	[dtensor] PART 2: move DTensor abstraction and APIs to core distributed (#88176 ) This PR moves the core DTensor abstraction and high level APIs to torch.distributed._tensor folder, which includes the following: 1. DTensor class 2. high level APIs (distribute_tensor/module) 3. dispatching logic 4. redistribute logic part of https://github.com/pytorch/pytorch/issues/88838 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88176 Approved by: https://github.com/fduwjj	2022-11-16 08:07:41 +00:00

21 Commits
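
Note on 964c7e3e85 (#99014): the commit message describes gathering every rank's local `equal` result and reducing with `operator.and_`, with `None` standing for non-participating ranks. The following is a minimal pure-Python sketch of that reduction only, not the code from the PR. The function name `reduce_equal` and the plain list standing in for the all-gathered results are assumptions made for illustration; no process group or DTensor API is involved.

```python
import operator
from functools import reduce
from typing import List, Optional


def reduce_equal(local_results: List[Optional[bool]]) -> bool:
    """Combine per-rank `equal` results into one global answer.

    `local_results` is a hypothetical stand-in for what an all-gather over
    the default process group would return: one entry per rank, with None
    for ranks that do not participate in the mesh.
    """
    # Only participating ranks contribute to the reduced result.
    participating = [r for r in local_results if r is not None]
    # Logical AND across ranks: True only if every local comparison was True.
    return reduce(operator.and_, participating, True)


# Ranks 0 and 1 hold equal shards, rank 2 is outside the submesh.
all_equal = reduce_equal([True, True, None])
# One rank sees a mismatch, so the global result is False.
any_mismatch = reduce_equal([True, False, None])
```

Because every rank applies the same reduction to the same gathered list, participating and non-participating ranks end up with an identical return value, which matches the Result/Impact description in the commit message.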