Commit Graph

15 Commits

PyTorch MergeBot
41adec3c59 Revert "Switch to native functional collective by default (#120370)"
This reverts commit 1f1bc0e6ac.

Reverted https://github.com/pytorch/pytorch/pull/120370 on behalf of https://github.com/yifuwang due to broke CI ([comment](https://github.com/pytorch/pytorch/pull/120370#issuecomment-1965362938))
2024-02-26 21:55:13 +00:00
Yifu Wang
1f1bc0e6ac Switch to native functional collective by default (#120370)
This enables native functional collectives by default. After this PR:
- The Python APIs remain backward compatible. Users will receive a deprecation warning if they use `(ranks, tag)` as the process group identifier (see the sketch after this list).
- Collectives will be captured as `_c10d_functional` ops in post-grad fx graphs. The change will not affect end-users, but it will impact `torch-xla`, which has implemented an all-reduce backend based on the existing `c10d_functional` IR. The migration for `torch-xla` use cases is excluded from this PR and will be coordinated separately (see communications in #93173).
- Collectives will be lowered to and codegen'd by new Inductor collective IRs (`ir._CollectiveKernel` and `ir._WaitKernel`). This change will not affect end-users.
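
Not from the PR itself - a hedged sketch of the user-facing call after this change, assuming a process group has already been initialized (e.g. via torchrun):

```
# Minimal sketch; assumes torch.distributed is initialized (e.g. via torchrun).
import torch
import torch.distributed as dist
import torch.distributed._functional_collectives as funcol

t = torch.ones(4)

# Recommended: identify the group with a ProcessGroup (or a DeviceMesh).
out = funcol.all_reduce(t, "sum", dist.group.WORLD)

# Passing the legacy ranks-plus-tag identifier still works after this PR, but
# now emits a deprecation warning (the exact legacy spelling is omitted here).

# The result synchronizes lazily; using it (or wait_tensor) forces the wait.
print(out.sum())
```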

Testing performed:
- We have been running a set of representative unit tests with both the new native funcol and the old py funcol in CI. These tests will continue to run with the old py funcol after this PR, so it remains covered until it is removed.
- Manually verified with e2e llama model training with DTensor + functional collectives (https://github.com/fairinternal/xlformers/tree/pt2_llm/pt2d#create-your-local-development-env).

Fallback mechanism:
- Introduced a temporary environment variable `TORCH_DISABLE_NATIVE_FUNCOL` that allows users to fall back to the previous implementation. We don't expect the migration to break anything; the mechanism is a safety measure to reduce potential disruption in case the PR causes unforeseen breakages.
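
A minimal sketch of using the fallback; exactly when the variable is read is an assumption, so here it is set before torch is imported:

```
# Hedged sketch: fall back to the previous (Python) funcol implementation.
import os
os.environ["TORCH_DISABLE_NATIVE_FUNCOL"] = "1"  # or export it in the launching shell

import torch  # noqa: E402 - imported after the env var is set on purpose
```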

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120370
Approved by: https://github.com/wconstab, https://github.com/yf225
2024-02-24 09:38:26 +00:00
Yifu Wang
637cf4a3f2 Test parametrization utils for native funcol migration (#119950)
```
Between the time we switch to the native funcol by default and the time when
we are confident that we can remove the legacy implementation, we want to
ensure that the legacy funcol remains covered by unit tests. This is to
prepare for any potential (but unlikely) reverts. The following utilities
help achieve this goal.

run_with_{native,legacy}_funcol - mark a test to run with only
{native,legacy} funcol. These decorators are for impl specific tests (e.g.
verifying generated code with FileCheck).

run_with_both_funcol_impls - parametrize a test to run with both legacy and
native funcol.

run_with_both_funcol_impls_with_arg - same as run_with_both_funcol_impls, but
passes `enable_native_funcol` to the test so impl specific checks can be
carried out.
```

This PR also marks some tests we want to cover in this fashion. More tests will be marked in subsequent PRs.
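
As a rough illustration of the parametrization pattern, here is a simplified stand-in for `run_with_both_funcol_impls_with_arg` (not the actual utility added by this PR):

```
# Simplified stand-in for run_with_both_funcol_impls_with_arg, for illustration
# only; the real decorator lives in PyTorch's distributed test utilities.
import functools
import unittest


def run_with_both_funcol_impls_with_arg(test):
    """Run the wrapped test twice, passing enable_native_funcol as an argument."""
    @functools.wraps(test)
    def wrapper(self, *args, **kwargs):
        for enable_native_funcol in (False, True):
            with self.subTest(enable_native_funcol=enable_native_funcol):
                test(self, *args, enable_native_funcol=enable_native_funcol, **kwargs)
    return wrapper


class FuncolSmokeTest(unittest.TestCase):
    @run_with_both_funcol_impls_with_arg
    def test_placeholder(self, enable_native_funcol):
        # A real test would issue collectives here and branch on the flag for
        # impl-specific checks (e.g. FileCheck on generated code).
        self.assertIn(enable_native_funcol, (False, True))


if __name__ == "__main__":
    unittest.main()
```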

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119950
Approved by: https://github.com/wanchaol
ghstack dependencies: #119881
2024-02-19 02:46:03 +00:00
Yifu Wang
40786ca509 Handle unwaited work objects on process termination (#119881)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119881
Approved by: https://github.com/wconstab
2024-02-19 02:46:02 +00:00
Yifu Wang
8f82a44a5b Run device mesh tests with native funcol enabled (#118437)
### Summary

Run the relevant tests in `test/distributed/_tensor/test_dtensor_compile.py` and `test/distributed/test_device_mesh.py` with native funcol enabled, in addition to running them with it disabled.

All tests except `test_tp_compile_comm_reordering` pass. This is expected: the native funcol produces slightly different IR, so the reordering pass needs to be adjusted. That test is disabled for now.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118437
Approved by: https://github.com/LucasLLC
ghstack dependencies: #118910, #118911
2024-02-04 04:11:11 +00:00
Yifu Wang
697ca4f292 Preliminary DeviceMesh + native c10d functional integration (#118423)
### Summary
- Added `group_name` as the third field in `dim_group_infos`.
- `DeviceMeshTest` now runs both w/ and w/o `_USE_NATIVE_C10D_FUNCTIONAL=1` in CI.

### Other fixes
- Convert `reduceOp` to lower case before passing it into c10d_functional ops.
- Added a finalizer to handle unwaited collectives (this mirrors the treatment for Python functional collective ops).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118423
Approved by: https://github.com/wanchaol, https://github.com/LucasLLC, https://github.com/wconstab
2024-01-31 04:36:12 +00:00
Yifu Wang
b778f44e97 Allow using native c10d_functional via _functional_collectives (#113057)
This diff introduces an env var `_USE_NATIVE_C10D_FUNCTIONAL` that tells `_functional_collectives` to use the native `c10d_functional` ops. The Python version and the native version will co-exist until we completely switch to the native version after more testing and verification.

NOTE: `DeviceMesh` support for native `c10d_functional` will be added in a subsequent PR.
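
A minimal opt-in sketch (when exactly the variable is read is an assumption, so it is set before torch is imported):

```
# Hedged sketch: opt in to the native c10d_functional ops via the env var.
import os
os.environ["_USE_NATIVE_C10D_FUNCTIONAL"] = "1"  # or export it in the launching shell

import torch  # noqa: E402
import torch.distributed._functional_collectives as funcol
# Subsequent funcol calls (all_reduce, all_gather_tensor, ...) now dispatch to
# the native c10d_functional ops instead of the Python implementation.
```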

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113057
Approved by: https://github.com/LucasLLC, https://github.com/wconstab, https://github.com/wanchaol
2024-01-30 02:34:25 +00:00
Edward Z. Yang
46712b019d Enable local_partial_types (#118467)
When using dmypy, this setting is enabled and cannot be turned off. Force it for regular mypy too.
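
For context, a small sketch of the pattern this setting affects (adapted from the mypy docs, not from this PR):

```
# With local_partial_types enabled, mypy will not combine a bare `None`
# assignment in one scope with an assignment in another scope to infer an
# Optional type, so an explicit annotation is required.
from typing import Optional


class Counter:
    value = None  # partial type; stays None under local_partial_types

    def bump(self) -> None:
        self.value = 1  # error under local_partial_types: needs an annotation


class CounterFixed:
    value: Optional[int] = None  # explicit annotation works in both modes

    def bump(self) -> None:
        self.value = 1  # OK
```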

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118467
Approved by: https://github.com/Skylion007
ghstack dependencies: #118414, #118418, #118432
2024-01-28 13:38:22 +00:00
Chien-Chin Huang
50db2aa70a [funcol][BE] Apply ufmt to _functional_collectives.py and turn on lintrunner for functional_collective (#115648)
No logic change, just formatting.

Differential Revision: [D51857236](https://our.internmc.facebook.com/intern/diff/D51857236/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115648
Approved by: https://github.com/wconstab, https://github.com/wz337
ghstack dependencies: #115523, #115302
2023-12-13 11:19:29 +00:00
Lucas Pasqualin
1d56e7b5af Adds broadcast to functional collectives (#112668)
Adds `broadcast` to functional collectives, including inductor support.

Test with `python test_inductor_collectives.py -- TestCollectivesMultiProc.test_broadcast_inductor`
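
A hedged usage sketch; the `(tensor, src, group)` signature is assumed to mirror the other functional collectives rather than quoted from this PR:

```
# Hedged sketch of the functional broadcast (signature is an assumption).
import torch
import torch.distributed as dist
import torch.distributed._functional_collectives as funcol


def broadcast_from_rank0(t: torch.Tensor) -> torch.Tensor:
    # Functional collectives return a new tensor instead of mutating in place,
    # which is what lets Inductor trace and fuse them.
    return funcol.broadcast(t, src=0, group=dist.group.WORLD)
```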

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112668
Approved by: https://github.com/wanchaol, https://github.com/wconstab
2023-11-09 15:47:52 +00:00
Edward Z. Yang
f274c7b32c Add functional collective all_to_all_single and support it in Inductor (#110195)
Copy of https://github.com/pytorch/pytorch/pull/106655 from yf225, rebased on top of the item() support changes.
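
A hedged usage sketch; the split-size arguments are assumed to follow the eager `dist.all_to_all_single` convention:

```
# Hedged sketch of the functional all_to_all_single (argument order assumed to
# follow the eager dist.all_to_all_single convention).
import torch
import torch.distributed as dist
import torch.distributed._functional_collectives as funcol


def exchange_evenly(t: torch.Tensor) -> torch.Tensor:
    world = dist.get_world_size()
    split = [t.shape[0] // world] * world  # one equal chunk per rank
    return funcol.all_to_all_single(t, split, split, dist.group.WORLD)
```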

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110195
Approved by: https://github.com/Skylion007
2023-10-05 23:11:51 +00:00
Rodrigo Kumpera
bbf03561a9 [functional collectives] Move back to registering finalizers on wrappers. (#107250)
We cannot use inner tensors for finalizers, as they are not collectable until waited on.

This PR adds a bunch of tests for the observable behavior we want, including the
necessary scaffolding for testing whether collectives have been waited on.
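
A sketch of the unwaited-result scenario these tests exercise (assumes an initialized process group; not code from this PR):

```
# The result of a collective is dropped without being used or waited on; the
# finalizer registered on the wrapper must issue the wait so in-flight work is
# not leaked. Assumes an initialized process group.
import torch
import torch.distributed as dist
import torch.distributed._functional_collectives as funcol


def fire_and_forget(t: torch.Tensor) -> None:
    out = funcol.all_reduce(t, "sum", dist.group.WORLD)
    del out  # never waited here; the wrapper's finalizer handles it
```
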
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107250
Approved by: https://github.com/wconstab
2023-08-17 21:08:28 +00:00
Wanchao Liang
5c48ff20b5 AsyncCollectiveTensor: dont sync on view ops (#105240)
AsyncCollectiveTensor is a tensor subclass that is meant to "delay synchronization" when you call into the functional collectives APIs. It does this (if I understand correctly) by internally holding an "unsynchronized" version of the tensor, which is the result of the communication op, and internally calling `.wait()` to synchronize the data the next time it is used.

Previously, these wait() calls would happen immediately, because `AsyncCollectiveTensor` gets wrapped by `DTensor()`, which calls `.detach()` on its inner tensor, immediately causing the sync (code: 1518d5eec4/torch/distributed/_tensor/api.py (L207))

AsyncCollectiveTensor shouldn't need to synchronize when you detach() it, though - in fact, it should be fine to avoid synchronizing for any view op, since view ops only need the metadata, not the actual data. This PR updates `AsyncCollectiveTensor` to delay `wait()` calls whenever the subclass encounters a view op.

Added some light testing that runs some DTensor compute followed by view ops and confirms that the output is still an `AsyncCollectiveTensor` when we call `.to_local()`.
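
Roughly the shape of that check, as a hedged sketch (not the actual test; assumes a multi-rank launch with the process group already set up):

```
# Hedged sketch: DTensor compute triggers a functional collective, and a
# subsequent view op should no longer force a wait after this PR.
import torch
import torch.distributed as dist
from torch.distributed._functional_collectives import AsyncCollectiveTensor
from torch.distributed._tensor import DTensor, DeviceMesh, Replicate, Shard

mesh = DeviceMesh("cpu", list(range(dist.get_world_size())))
x = DTensor.from_local(torch.randn(4, 8), mesh, [Shard(0)])

local = x.redistribute(mesh, [Replicate()]).to_local()  # all_gather under the hood
local = local.view(-1)  # view op: should not trigger wait()
print(isinstance(local, AsyncCollectiveTensor))  # expected to remain True
```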

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105240
Approved by: https://github.com/wanchaol, https://github.com/fduwjj, https://github.com/wconstab
2023-08-11 19:20:25 +00:00
Wanchao Liang
f026b32008 [device_mesh][BE] reduce_scatter fallback to funcol and remove from DM (#105642)
For reasons similar to https://github.com/pytorch/pytorch/pull/105605.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105642
Approved by: https://github.com/kumpera, https://github.com/wz337, https://github.com/fduwjj
2023-07-27 01:33:05 +00:00
Will Constable
d64bada876 Refactor funcol for readability and dynamo tracing (#104387)
Move eager kernel impls to a separate file, which is easier to read
(users may be confused by two versions of each kernel living in the same file)
and makes it easier, for now, to set a dynamo policy that traces only the first file.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104387
Approved by: https://github.com/wanchaol, https://github.com/fduwjj, https://github.com/kumpera
2023-07-06 23:29:49 +00:00