Commit Graph

7 Commits

Author SHA1 Message Date
Yifu Wang
b778f44e97 Allow using native c10d_functional via _functional_collectives (#113057)
This diff introduces an env var `_USE_NATIVE_C10D_FUNCTIONAL` that tells `_functional_collectives` to use the native `c10d_functional` ops. The Python and native versions will co-exist until we switch entirely to the native version after more testing and verification.

NOTE: `DeviceMesh` support for native `c10d_functional` will be added in a subsequent PR.
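
Below is a hedged usage sketch (not taken from the PR) of how the env var might be exercised; the single-process setup details and the exact point at which the flag is read are assumptions.

```python
import os

# Assumption: the flag is read when _functional_collectives is imported, so set it first.
os.environ["_USE_NATIVE_C10D_FUNCTIONAL"] = "1"

import torch
import torch.distributed as dist
import torch.distributed._functional_collectives as funcol

# Assumes the usual env:// rendezvous variables (MASTER_ADDR, RANK, WORLD_SIZE, ...) are set.
dist.init_process_group("gloo")

t = torch.ones(4)
# Dispatches to the native c10d_functional ops when the flag is set, otherwise to the
# Python implementation; wait_tensor resolves the in-flight collective.
out = funcol.all_reduce(t, "sum", dist.group.WORLD)
out = funcol.wait_tensor(out)
```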

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113057
Approved by: https://github.com/LucasLLC, https://github.com/wconstab, https://github.com/wanchaol
2024-01-30 02:34:25 +00:00
Yifu Wang
7d0ad6e870 Make native c10d_functional ops work with AOTInductor (#113735)
Summary:
- Revised `c10d_functional` ops to conform to https://github.com/pytorch/pytorch/tree/main/aten/src/ATen/native#func
- Modified `get_cpp_op_schema()` to handle mutable args and aliasing returns (see the schema sketch below)
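
A minimal sketch of the schema style involved, using a made-up namespace and op rather than the PR's actual collectives: the native-functions format referenced above marks a mutated argument and an aliasing return with `Tensor(a!)`, which is exactly the kind of signature `get_cpp_op_schema()` has to translate for AOTInductor.

```python
import torch

# Made-up namespace/op for illustration only; not the PR's schemas.
lib = torch.library.Library("demo_funcol", "DEF")
# "Tensor(a!)" marks an argument that is mutated in place; the return carries the same
# annotation, so it aliases that argument instead of owning fresh storage.
lib.define("inplace_scale_(Tensor(a!) input, float factor) -> Tensor(a!)")

def inplace_scale_(input: torch.Tensor, factor: float) -> torch.Tensor:
    input.mul_(factor)  # mutate in place, matching the (a!) annotation
    return input        # the return aliases the mutated argument

lib.impl("inplace_scale_", inplace_scale_, "CompositeExplicitAutograd")

x = torch.ones(3)
y = torch.ops.demo_funcol.inplace_scale_(x, 2.0)
assert y.data_ptr() == x.data_ptr()  # return and input share storage
```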

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113735
Approved by: https://github.com/desertfire
ghstack dependencies: #113438
2023-12-22 08:12:13 +00:00
Yifu Wang
718b576e2c Port all_to_all_single to native c10d_functional (#113438)
Summary:
- Ported `all_to_all_single` to native c10d_functional
- Added Inductor support for the native `all_to_all_single` via the new collective IR's `create_out_of_place()` (see the usage sketch below)
- Since the new collective IR derives from `FallbackKernel`, which implements a generic `free_unbacked_symbols`, no additional unbacked-symbol handling is required for `all_to_all_single`
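
A rough usage sketch under assumptions not stated in the commit (an initialized process group, and a tensor whose numel divides evenly across ranks); it only illustrates how the functional `all_to_all_single` can be captured by `torch.compile` so that Inductor lowers it through the collective IR:

```python
import torch
import torch.distributed as dist
import torch.distributed._functional_collectives as funcol

def exchange(t: torch.Tensor) -> torch.Tensor:
    world = dist.get_world_size()
    # Even splits for illustration; real callers compute per-rank split sizes.
    splits = [t.numel() // world] * world
    out = funcol.all_to_all_single(t, splits, splits, group=dist.group.WORLD)
    return out * 2  # follow-up compute that Inductor schedules after the wait

compiled_exchange = torch.compile(exchange)
```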

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113438
Approved by: https://github.com/yf225, https://github.com/ezyang
2023-12-22 08:12:13 +00:00
rzou
a06832f911 Grandfather in c10d_functional ops to pt2_compliant (#113049)
This PR also adds the ability to specify Tags for more `m.def(`
overloads.
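
As a rough Python-side analogue of that tagging (the PR itself extends the C++ `m.def(` overloads; the namespace and op below are made up, and the `tags` keyword assumes a PyTorch version recent enough to expose it on `torch.library.Library.define`):

```python
import torch

lib = torch.library.Library("demo_tags", "DEF")  # made-up namespace for illustration
# Attach the pt2_compliant tag at definition time, analogous to grandfathering the
# c10d_functional ops in on the C++ side.
lib.define("times_two(Tensor x) -> Tensor", tags=(torch.Tag.pt2_compliant_tag,))
lib.impl("times_two", lambda x: x * 2, "CompositeExplicitAutograd")

# The tag is visible on the registered overload, which is what lets the PT2 stack
# treat the op as torch.compile-compliant.
assert torch.Tag.pt2_compliant_tag in torch.ops.demo_tags.times_two.default.tags
```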

Test Plan:
- new test
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113049
Approved by: https://github.com/williamwen42
2023-11-07 12:55:05 +00:00
PyTorch MergeBot
1fea599d9a Revert "Grandfather in c10d_functional ops to pt2_compliant (#113049)"
This reverts commit fe8570a1fe.

Reverted https://github.com/pytorch/pytorch/pull/113049 on behalf of https://github.com/clee2000 because something in the stack broke distributed and inductor; pretty sure it's this one ([comment](https://github.com/pytorch/pytorch/pull/113049#issuecomment-1797298969))
2023-11-07 02:34:13 +00:00
rzou
fe8570a1fe Grandfather in c10d_functional ops to pt2_compliant (#113049)
This PR also adds the ability to specify Tags for more `m.def(`
overloads.

Test Plan:
- new test
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113049
Approved by: https://github.com/williamwen42
ghstack dependencies: #113036
2023-11-06 23:43:23 +00:00
Yifu Wang
ec18ef62f4 Native c10d_functional ops (#110570)
This PR introduces a native version of c10d_functional ops. The main goal is to add collective support in AOTInductor and allow collective ops to work in multi-threaded native runtimes.

The native version also incorporates API improvements we had wanted to make in Python c10d_functional:

- Removed `ranks` and `group_size` from collective op signatures, which had proven redundant.
- Uses tensor storage instead of `void*` to resolve in-flight work (sketched below).
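
A comment-only sketch of the signature change those bullets describe; the exact op namespaces and argument names are from memory and may differ across versions:

```python
# Legacy Python c10d_functional: the group is spelled out redundantly per call.
#   out = torch.ops.c10d_functional.all_reduce(t, "sum", tag, ranks, group_size)
#
# Native c10d_functional: the group is identified by name alone, and in-flight work is
# keyed on the output tensor's storage, so a later wait can resolve it.
#   out = torch.ops._c10d_functional.all_reduce(t, "sum", group_name)
#   out = torch.ops._c10d_functional.wait_tensor(out)
```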

The native process group registration/resolution mechanism is only used by native c10d_functional in this PR. It will become the single source of truth in upcoming PRs.

The upcoming PRs will implement Inductor/AOTInductor support for c10d_functional, after which native c10d_functional will replace Python c10d_functional.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110570
Approved by: https://github.com/wanchaol
2023-10-25 22:56:06 +00:00