pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Yifu Wang	b778f44e97	Allow using native c10d_functional via _functional_collectives (#113057) This diff introduces an env var `_USE_NATIVE_C10D_FUNCTIONAL` that tells `_functional_collective` to use native `c10d_functional` ops. The Python version and the native version will co-exist until we completely switch to the native version after more testing and verification. NOTE: `DeviceMesh` support for native `c10d_functional` will be added in a subsequent PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/113057 Approved by: https://github.com/LucasLLC, https://github.com/wconstab, https://github.com/wanchaol	2024-01-30 02:34:25 +00:00
Yifu Wang	7d0ad6e870	Make native c10d_functional ops work with AOTInductor (#113735) Summary: - Revised `c10d_functional` ops to conform to https://github.com/pytorch/pytorch/tree/main/aten/src/ATen/native#func - Modified `get_cpp_op_schema()` to handle mutable args and aliasing returns Pull Request resolved: https://github.com/pytorch/pytorch/pull/113735 Approved by: https://github.com/desertfire ghstack dependencies: #113438	2023-12-22 08:12:13 +00:00
Yifu Wang	718b576e2c	Port all_to_all_single to native c10d_functional (#113438) Summary: - Ported `all_to_all_single` to native c10d_functional - Added Inductor support for the native `all_to_all_single` via the new collective IR's `create_out_of_place()` - Since the new collective IR derives from `FallbackKernel` which implements a generic `free_unbacked_symbols`, no additional unbacked symbol handling for all_to_all_single is required Pull Request resolved: https://github.com/pytorch/pytorch/pull/113438 Approved by: https://github.com/yf225, https://github.com/ezyang	2023-12-22 08:12:13 +00:00
rzou	a06832f911	Grandfather in c10d_functional ops to pt2_compliant (#113049) This PR also adds the ability to specify Tags for more `m.def(` overloads. Test Plan: - new test Pull Request resolved: https://github.com/pytorch/pytorch/pull/113049 Approved by: https://github.com/williamwen42	2023-11-07 12:55:05 +00:00
PyTorch MergeBot	1fea599d9a	Revert "Grandfather in c10d_functional ops to pt2_compliant (#113049)" This reverts commit `fe8570a1fe`. Reverted https://github.com/pytorch/pytorch/pull/113049 on behalf of https://github.com/clee2000 due to something in the stack broke distributed and inductor, pretty sure its this one ([comment](https://github.com/pytorch/pytorch/pull/113049#issuecomment-1797298969))	2023-11-07 02:34:13 +00:00
rzou	fe8570a1fe	Grandfather in c10d_functional ops to pt2_compliant (#113049) This PR also adds the ability to specify Tags for more `m.def(` overloads. Test Plan: - new test Pull Request resolved: https://github.com/pytorch/pytorch/pull/113049 Approved by: https://github.com/williamwen42 ghstack dependencies: #113036	2023-11-06 23:43:23 +00:00
Yifu Wang	ec18ef62f4	Native c10d_functional ops (#110570) This PR introduces a native version of c10d_functional ops. The main goal is to add collective support in AOTInductor and allow collective ops to work in multi-threaded native runtimes. The native version also incorporates API improvements we wished to implement in Python c10d_functional: - Removed `ranks` and `group_size` from collective op signatures, which proved to be redundant. - Use tensor storage as opposed to `void*` to resolve in-flight work. The native process group registration/resolution mechanism is only used for native c10d_functional in this PR. It will become the single source of truth in upcoming PRs. The upcoming PRs will implement Inductor/AOTInductor support for c10d_functional, after which native c10d_functional will replace Python c10d_functional. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110570 Approved by: https://github.com/wanchaol	2023-10-25 22:56:06 +00:00

7 Commits
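
Commit b778f44e97 describes an env var, `_USE_NATIVE_C10D_FUNCTIONAL`, that switches `_functional_collective` between the Python and native `c10d_functional` ops while both co-exist. The general pattern of an env-var-gated switch might be sketched as follows. This is an illustration only, not PyTorch's code: the helper names are hypothetical, and the assumption that the value `"1"` enables the native path is not stated in the commit message.

```python
import os

# Name of the env var, as given in the commit message for #113057.
FLAG = "_USE_NATIVE_C10D_FUNCTIONAL"


def use_native(environ=None):
    """Return True when the flag is set.

    Assumption: the value "1" means enabled. The commit message does not
    say how the variable is actually parsed.
    """
    env = os.environ if environ is None else environ
    return env.get(FLAG) == "1"


def pick_backend(environ=None):
    """Hypothetical helper: name the implementation that would be used."""
    return "native" if use_native(environ) else "python"


if __name__ == "__main__":
    # Reads the real process environment when run as a script.
    print(pick_backend())
```

Passing the environment as an argument keeps the switch easy to test without mutating `os.environ`. In this sketch the flag is read on every call, so changing the variable takes effect immediately; a real implementation might instead read it once at import time.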