Update ruff to 0.4.1.
This version fixes many false negatives and false positives, is 20-40% faster, and includes various other bug fixes.
The table below shows the execution time of ruff lint and ruff format in milliseconds for v0.3 and v0.4, courtesy of https://astral.sh/blog/ruff-v0.4.0
| Repository | Linter (v0.3) | Linter (v0.4) | Formatter (v0.3) | Formatter (v0.4) |
|----------------------------------------------------|---------------|---------------|------------------|------------------|
| [pytorch/pytorch](https://github.com/pytorch/pytorch) | 328.7 | 251.8 | 351.1 | 274.9 |
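
As a quick sanity check on the "20-40% faster" claim, the relative reduction for the pytorch/pytorch row can be computed directly from the table. This is only a small arithmetic sketch; the helper name `percent_reduction` is made up for illustration.

```python
# Timings in milliseconds, copied from the pytorch/pytorch row of the table.
timings_ms = {
    "linter": (328.7, 251.8),      # (v0.3, v0.4)
    "formatter": (351.1, 274.9),   # (v0.3, v0.4)
}


def percent_reduction(before: float, after: float) -> float:
    """Return how much smaller `after` is than `before`, as a percentage."""
    return (before - after) / before * 100


# Round to one decimal place for readability.
reductions = {
    name: round(percent_reduction(before, after), 1)
    for name, (before, after) in timings_ms.items()
}
print(reductions)
```

For this repository the reduction comes out at roughly 23% for the linter and 22% for the formatter, which is at the lower end of the quoted range.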
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124549
Approved by: https://github.com/ezyang
Fixes a bug where `_ProcessGroupWrapper` is referenced without first checking whether gloo is available. This fails on PyTorch builds that do not include gloo because `_ProcessGroupWrapper` is only exposed through pybind when building with `USE_GLOO=1`. As a result, creating a new process group fails with a `NameError` when NCCL is the only available backend.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124233
Approved by: https://github.com/rohan-varma, https://github.com/d4l3k
Summary:
This env var was introduced to safely roll out the behavior change in destroy
process group (e.g., calling ncclCommAbort). Now that this behavior change
has been rolled out, we no longer need the env var and should remove
it to keep the code clean.
Test Plan:
Modified and existing unit tests pass.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124334
Approved by: https://github.com/wconstab
Summary:
As part of unifying process group identifiers, log <group_name, group_desc> instead of the pg uid in the profiler.
- group_name remains the unique identifier, e.g. "0", "1"
- group_desc will be the user-specified name, e.g. "fsdp".
Reviewed By: aaronenyeshi, kwen2501
Differential Revision: D55610682
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124035
Approved by: https://github.com/aaronenyeshi
Summary:
We need a way to let users set a custom description for a process group, e.g. FSDP, PP.
Here are several use cases for a user-specified group_desc:
- Logging: we can easily match a log line and understand what the collective/pg is used for.
- PyTorch traces (e.g. Kineto, Execution Trace) can benefit from the PG desc, since trace analysis and benchmarks will be able to easily differentiate PG purposes such as FSDP and PP.
- Debugging lower-layer collectives (e.g. NCCL): we will be able to expose the PG desc to the NCCL communicator so NCCL-layer operations can be easily correlated to a PG.
Solution: Add a group_desc field to c10d.
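To illustrate the logging use case, here is a minimal sketch of a record that carries both a unique name and a human-readable description. The `GroupInfo` class, its default value, and `format_collective_log` are hypothetical names for illustration only, not the c10d API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupInfo:
    """Hypothetical record pairing a unique group name with a description."""

    group_name: str                   # unique identifier, e.g. "0", "1"
    group_desc: str = "unspecified"   # user-chosen label, e.g. "fsdp" (placeholder default)


def format_collective_log(info: GroupInfo, op: str) -> str:
    """Build a log line that can be matched back to the group's purpose."""
    return f"{op} on pg name={info.group_name} desc={info.group_desc}"


fsdp_group = GroupInfo(group_name="1", group_desc="fsdp")
print(format_collective_log(fsdp_group, "all_gather"))
```

With the description in the log line, two groups that differ only by an opaque name can be told apart at a glance.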
Differential Revision: D55781850
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123472
Approved by: https://github.com/kwen2501
Summary:
Pass the Python c10d group_name to the C++ ProcessGroupNCCL so that the pg name is consistent across layers.
Also record pg_name in the flight recorder entry.
Differential Revision: D55597200
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123117
Approved by: https://github.com/wconstab
Summary:
Minor logging cleanup in distributed library
1. Don't use "f" formatted strings - address linter issues.
2. Nits: Make use of unused `e` (error) in a few logs.
3. Change info->debug as asked in issue #113545
4. Nit: rename log -> logger in a few files for consistency
5. Fix a linter error.
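Items 1 to 3 above can be sketched with the standard `logging` module. This is an illustrative example rather than code from the PR; the logger name and the `try_connect` helper are made up, and the failure is simulated.

```python
import logging

# Hypothetical logger name; item 4 suggests calling the variable `logger`.
logger = logging.getLogger("distributed_example")


def try_connect(host: str, port: int) -> bool:
    """Simulate a failed connection attempt and log it."""
    # Low-importance detail goes to debug instead of info (item 3).
    logger.debug("Attempting to connect to %s:%d", host, port)
    try:
        raise ConnectionError("connection refused")  # simulated failure
    except ConnectionError as e:
        # Pass arguments lazily instead of using an f-string (item 1),
        # and include the caught error in the message (item 2).
        logger.warning("Failed to connect to %s:%d: %s", host, port, e)
        return False


logging.basicConfig(level=logging.DEBUG)
try_connect("localhost", 29500)
```

Passing the arguments separately lets `logging` skip string formatting entirely when the level is disabled.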
Test Plan:
1. Local build passes.
2. Linter is happy.
Reviewers: wanchaol
Pull Request resolved: https://github.com/pytorch/pytorch/pull/122921
Approved by: https://github.com/wanchaol
Summary: Process group config is essential for analyzing collective patterns. We have already added it to Execution Trace; now expose this information in Kineto as well.
Differential Revision: D53557965
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119443
Approved by: https://github.com/kwen2501
Summary:
This PR tries to resolve issue #119215.
Process group shutdown (and hence ncclCommAbort) is now called in the
destroy_process_group APIs for the corresponding PGs. In the
destructor of ProcessGroup, we avoid calling abort/ncclCommAbort.
Instead, the destructor only checks whether the user has already called destroy_process_group explicitly. If
not, the destructor logs a warning that encourages users to do so
in order to clean up the PG's resources.
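The lifecycle described above can be sketched in plain Python. This is an analogy only: the real change lives in the C++ ProcessGroup destructor, and `FakeProcessGroup` plus this local `destroy_process_group` are hypothetical stand-ins.

```python
import logging

logger = logging.getLogger("pg_lifecycle_example")  # hypothetical logger name


class FakeProcessGroup:
    """Stand-in for a process group that owns communicator resources."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.destroyed = False

    def shutdown(self) -> None:
        # Explicit cleanup path; this is where an abort call would happen.
        self.destroyed = True

    def __del__(self) -> None:
        # The destructor never aborts; it only warns if cleanup was skipped.
        if not self.destroyed:
            logger.warning(
                "Process group %s was not destroyed explicitly; "
                "call destroy_process_group() to release its resources.",
                self.name,
            )


def destroy_process_group(pg: FakeProcessGroup) -> None:
    """Explicit teardown that the user is expected to call."""
    pg.shutdown()


pg = FakeProcessGroup("0")
destroy_process_group(pg)
del pg  # no warning, because teardown was explicit
```

Keeping the abort out of the destructor avoids doing heavy teardown work at an unpredictable point during object destruction.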
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119250
Approved by: https://github.com/minsii, https://github.com/kwen2501
The NCCL backend requires CUDA (including devices) to be available. When it is not, don't use that backend by default; this avoids the following error when creating a CPU-only device mesh:
> RuntimeError: ProcessGroupNCCL is only supported with GPUs, no GPUs found!
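
The idea can be sketched as a small selection function. This is a simplified illustration under assumptions, not the code from the PR: `default_backend_config` is a hypothetical helper, and in real code the flag would typically come from something like `torch.cuda.is_available()`.

```python
def default_backend_config(cuda_available: bool) -> dict:
    """Map device types to backends, skipping NCCL when no GPU is usable."""
    config = {"cpu": "gloo"}
    if cuda_available:
        # Only register NCCL when CUDA devices are actually present.
        config["cuda"] = "nccl"
    return config


print(default_backend_config(cuda_available=False))
print(default_backend_config(cuda_available=True))
```

On a CPU-only machine the NCCL backend is never selected, so the error above cannot be triggered by the default choice.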
Fixes #117746
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119149
Approved by: https://github.com/kwen2501
Fixes https://github.com/pytorch/pytorch/issues/118129
Suppressions automatically added with
```
import re

# Read the type checker output, one error per line.
with open("error_file.txt", "r") as f:
    errors = f.readlines()

# Map each file path to {line number: error code}.
error_lines = {}
for error in errors:
    match = re.match(r"(.*):(\d+):\d+: error:.*\[(.*)\]", error)
    if match:
        file_path, line_number, error_type = match.groups()
        if file_path not in error_lines:
            error_lines[file_path] = {}
        error_lines[file_path][int(line_number)] = error_type

# Append a suppression comment to every offending line.
for file_path, lines in error_lines.items():
    with open(file_path, "r") as f:
        code = f.readlines()
    for line_number, error_type in sorted(lines.items(), key=lambda x: x[0], reverse=True):
        code[line_number - 1] = code[line_number - 1].rstrip() + f" # type: ignore[{error_type}]\n"
    with open(file_path, "w") as f:
        f.writelines(code)
```
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Co-authored-by: Catherine Lee <csl@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118533
Approved by: https://github.com/Skylion007, https://github.com/zou3519
Simplifies and optimizes dict construction using the `fromkeys` classmethod ctor. This also makes it really obvious when all the keys will have the same static value, which could be a bug if unintentional. It is also significantly faster than using a dict comprehension. The rule is in preview, but I am adding a forward fix for when it becomes stable.
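A short sketch of the rewrite, using made-up keys. It also shows the shared-value behavior mentioned above, which matters when the value is mutable.

```python
keys = ["reduce", "gather", "send"]

# Before: a dict comprehension that assigns the same constant to every key.
by_comprehension = {key: None for key in keys}

# After: the `fromkeys` classmethod makes the shared value explicit.
by_fromkeys = dict.fromkeys(keys)
flags = dict.fromkeys(keys, False)

# Pitfall: the value object is shared, so a mutable default is shared too.
shared = dict.fromkeys(keys, [])
shared["reduce"].append(1)

print(by_fromkeys == by_comprehension)
print(flags)
print(shared)
```

When each key needs its own mutable value, a comprehension such as `{key: [] for key in keys}` is still the right tool.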
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118637
Approved by: https://github.com/albanD
Follow-up to https://github.com/pytorch/pytorch/pull/118359: clarify whether ``src`` and ``dst`` are based on the global pg or the sub pg
* update c10d docstring: ``src`` / ``dst`` are based on the global pg regardless of the ``group`` argument
* communication ops with a ``dst`` argument: ``reduce``, ``gather_object``, ``gather``, ``send``, ``isend``
* communication ops with a ``src`` argument: ``irecv``, ``recv``, ``broadcast``, ``broadcast_object_list``, ``scatter``, ``scatter_object_list``
* ``pytest test/distributed/test_c10d_nccl.py -k subgroup``
3 collectives are for picklable objects (``gather_object``, ``broadcast_object_list``, ``scatter_object_list``). There are 2 ways to set the device:
* use the device argument: it's implemented in ``broadcast_object_list``; maybe worth implementing in the other 2
* ``torch.cuda.set_device(global_rank)``
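The ``src`` / ``dst`` convention can be illustrated without any distributed setup. In this sketch a subgroup is modeled as a plain list of global ranks, and `to_global_rank` is a hypothetical helper; `torch.distributed.get_global_rank` serves a similar purpose for real process groups.

```python
def to_global_rank(group_ranks: list, group_rank: int) -> int:
    """Translate a rank within a subgroup into the global rank."""
    return group_ranks[group_rank]


# A subgroup made of global ranks 2 and 3 in a world of size 4.
subgroup_ranks = [2, 3]

# "First member of the subgroup" is group rank 0 but global rank 2.
# Collectives expect the global value for ``src`` / ``dst``.
dst = to_global_rank(subgroup_ranks, 0)
print(dst)
```

Passing the group-local rank 0 as ``dst`` here would address a process that is not even a member of the subgroup.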
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118593
Approved by: https://github.com/wconstab
Fixes https://github.com/pytorch/pytorch/issues/118129
Suppressions automatically added with
```
import re

# Read the type checker output, one error per line.
with open("error_file.txt", "r") as f:
    errors = f.readlines()

# Map each file path to {line number: error code}.
error_lines = {}
for error in errors:
    match = re.match(r"(.*):(\d+):\d+: error:.*\[(.*)\]", error)
    if match:
        file_path, line_number, error_type = match.groups()
        if file_path not in error_lines:
            error_lines[file_path] = {}
        error_lines[file_path][int(line_number)] = error_type

# Append a suppression comment to every offending line.
for file_path, lines in error_lines.items():
    with open(file_path, "r") as f:
        code = f.readlines()
    for line_number, error_type in sorted(lines.items(), key=lambda x: x[0], reverse=True):
        code[line_number - 1] = code[line_number - 1].rstrip() + f" # type: ignore[{error_type}]\n"
    with open(file_path, "w") as f:
        f.writelines(code)
```
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118533
Approved by: https://github.com/Skylion007, https://github.com/zou3519
This diff introduces an env var `_USE_NATIVE_C10D_FUNCTIONAL` that tells `_functional_collective` to use native `c10d_functional` ops. The Python version and the native version will co-exist until we completely switch to the native version after more testing and verification.
NOTE: `DeviceMesh` support for native `c10d_functional` will be added in a subsequent PR.
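An env-var switch of this kind can be sketched as below. The variable name comes from the description above, but the accepted values are an assumption (here only `"1"` enables the native path), and both helper functions are hypothetical.

```python
import os


def use_native_c10d_functional(environ=os.environ) -> bool:
    """Return True when the env var opts in to the native ops.

    Assumption: only the value "1" enables the native path.
    """
    return environ.get("_USE_NATIVE_C10D_FUNCTIONAL", "0") == "1"


def pick_implementation(environ=os.environ) -> str:
    """Choose which implementation a functional collective would dispatch to."""
    return "native" if use_native_c10d_functional(environ) else "python"


print(pick_implementation({}))
print(pick_implementation({"_USE_NATIVE_C10D_FUNCTIONAL": "1"}))
```

Taking the environment as a parameter keeps the switch easy to exercise in tests without mutating the process environment.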
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113057
Approved by: https://github.com/LucasLLC, https://github.com/wconstab, https://github.com/wanchaol
Summary:
Adds a JK killswitch check and configures the env for enabling the PyTorch
NCCL flight recorder. Note: this only enables recording events in memory, not
dumping them.
Test Plan: CI test
Reviewed By: zdevito
Differential Revision: D52920092
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118451
Approved by: https://github.com/malfet
Addresses #118337 somewhat; we probably need to update the docs. Let's first
confirm what behavior we want.
Identifies a couple of confusing things:
1) the 'dst' arg for many collectives is always a 'global' rank regardless
of whether a subgroup is passed in. This needs a doc update
2) gather_object has a strong dependency on setting the cuda device;
could we make that smoother?
3) gather_object also should be happy with an empty list on the dst
side, imo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118359
Approved by: https://github.com/weifengpy