The current call passes `['/actual/path']` to `os.walk`, which ends up as a string pointing to no existing path and thus silently yields an empty traversal. There is an unused function just above that handles this correctly, so presumably that is what was supposed to be called.
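A minimal repro of the failure mode (the path is taken from the description; the behavior shown is standard `os.walk`):
```python
import os

# The stringified list names no real path on disk.
bad_path = str(['/actual/path'])   # "['/actual/path']"

# os.walk passes errors to its onerror callback (None by default),
# so the FileNotFoundError is silently discarded.
print(list(os.walk(bad_path)))     # [] -- empty traversal, no exception
```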
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126103
Approved by: https://github.com/suo
This PR fixes `torch.backends.xeon.run_cpu` behavior when it is launched from `torchrun` with the `--nproc-per-node` parameter.
As a CPU launcher, `run_cpu` binds cores to each instance it launches using `numactl`, assigning cores evenly across instances.
However, if we use `torchrun` with `--nproc-per-node` to start multiple `run_cpu` processes, each `run_cpu` process assumes it can use all the CPU cores, so the processes compete for cores. This results in poor performance.
This PR recognizes the environment variables `LOCAL_WORLD_SIZE` and `LOCAL_RANK` set by `torchrun`, and uses this information to further shard the cores bound to each instance. With this PR, when launched by `torchrun --nproc-per-node ...`, different CPU cores are bound to different workers, which maximizes CPU utilization and application performance.
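A minimal sketch of the even core-sharding idea (not the actual `run_cpu` implementation; the binding itself is done via `numactl`/`taskset`):
```python
import os

# torchrun sets these for every worker it spawns.
local_world_size = int(os.environ.get("LOCAL_WORLD_SIZE", "1"))
local_rank = int(os.environ.get("LOCAL_RANK", "0"))

# Split all cores evenly across the torchrun workers so they never overlap.
all_cores = list(range(os.cpu_count()))
per_worker = len(all_cores) // local_world_size
my_cores = all_cores[local_rank * per_worker:(local_rank + 1) * per_worker]
# run_cpu would then bind its instances only to my_cores.
```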
The specific use case this PR enables is using TorchServe with DeepSpeed tensor parallelism. In this case, TorchServe runs `torchrun --nproc-per-node <tp_size>` to start the tensor-parallel workers it needs. When running TorchServe on a multi-socket CPU server with DeepSpeed tensor parallelism, this PR is needed to achieve the best performance.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123711
Approved by: https://github.com/jingxu10, https://github.com/ezyang
Automatic fixes that replace certain list comprehensions with generator expressions where they are immediately consumed. This is preview functionality in ruff for rule C419, and it was applied automatically.
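An illustrative example of the kind of rewrite C419 performs: when `any()`/`all()` consume the result immediately, a generator expression avoids materializing an intermediate list.
```python
nums = [1, -2, 3]

# before: builds a throwaway list
all_positive = all([n > 0 for n in nums])

# after (ruff C419 autofix): short-circuits without the list
all_positive = all(n > 0 for n in nums)
```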
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123960
Approved by: https://github.com/malfet
Summary:
Pulling out logging parameters into a logs specs abstraction that can be overridden (follow-up changes will cover a possible override mechanism).
Why?
Right now the logging approach is quite rigid:
- Requires the log directory to exist and not be empty
- Will create a temp dir otherwise
- Creates a subdir for the run
- Creates a subdir for each attempt
- Creates files named stdout.log, stderr.log, error.json
In some instances, users would like to customize this behavior, including file names, based on context. And we do already have a mechanism to template the multiplexed teed output prefix.
With the current changes, users can create a custom logs spec that can use environment variables to change the behavior.
Notes:
Made `LaunchConf.logs_specs` an optional field that will be bound to a `DefaultLogsSpecs` instance. A large number of clients (code) use the API directly without using the torchrun API. For those cases, we have to explicitly pass a `LogsSpecs` implementation if we would like to override it. For regular torchrun users, we can use the pluggable approach proposed in the follow-up change.
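A minimal sketch of what passing an explicit logs specs to the launcher API could look like (assuming the post-change `LaunchConfig` and `DefaultLogsSpecs` names; the log directory is hypothetical):
```python
from torch.distributed.elastic.multiprocessing import DefaultLogsSpecs
from torch.distributed.launcher.api import LaunchConfig

# Clients calling the launcher API directly can bind their own logs specs;
# omitting logs_specs falls back to a DefaultLogsSpecs instance.
config = LaunchConfig(
    min_nodes=1,
    max_nodes=1,
    nproc_per_node=2,
    logs_specs=DefaultLogsSpecs(log_dir="/tmp/my_run_logs"),  # hypothetical dir
)
```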
Test Plan: CI + unit tests
Differential Revision: D54176265
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120691
Approved by: https://github.com/ezyang
Fixes #112633
Fixed errors relating to pydocstyle in the following files. The remaining errors are not covered in this issue. `torch/utils/dlpack.py` was not modified, as its errors relate to the function signature in the first line of the docstring, which must be kept as-is for proper Sphinx interpretation.
```python
def from_dlpack(ext_tensor: Any) -> 'torch.Tensor':
    """from_dlpack(ext_tensor) -> Tensor
    .....
    """
```
pydocstyle torch/utils/_contextlib.py --count
before: 4
after: 0
pydocstyle torch/backends/mps/__init__.py --count
before: 8
after: 1
**remaining errors**
```
torch/backends/mps/__init__.py:1 at module level:
D104: Missing docstring in public package
```
pydocstyle torch/backends/xeon/run_cpu.py --count
before: 13
after: 1
**remaining errors**
```
torch/backends/xeon/run_cpu.py:864 in public function `main`:
D103: Missing docstring in public function
```
pydocstyle torch/backends/cpu/__init__.py --count
before: 2
after: 1
**remaining errors**
```
torch/backends/cpu/__init__.py:1 at module level:
D104: Missing docstring in public package
```
pydocstyle torch/utils/cpp_backtrace.py --count
before: 4
after: 1
**remaining errors**
```
torch/utils/cpp_backtrace.py:1 at module level:
D100: Missing docstring in public module
```
pydocstyle torch/utils/bundled_inputs.py --count
before: 8
after: 1
**remaining errors**
```
torch/utils/bundled_inputs.py:1 at module level:
D100: Missing docstring in public module
```
pydocstyle torch/utils/file_baton.py --count
before: 8
after: 1
**remaining errors**
```
torch/utils/file_baton.py:1 at module level:
D100: Missing docstring in public module
```
pydocstyle torch/utils/mobile_optimizer.py --count
before: 6
after: 1
**remaining errors**
```
torch/utils/mobile_optimizer.py:8 in public class `LintCode`:
D101: Missing docstring in public class
```
pydocstyle torch/backends/opt_einsum/__init__.py --count
before: 7
after: 5
**remaining errors**
```
torch/backends/opt_einsum/__init__.py:1 at module level:
D104: Missing docstring in public package
torch/backends/opt_einsum/__init__.py:67 in public function `set_flags`:
D103: Missing docstring in public function
torch/backends/opt_einsum/__init__.py:77 in public function `flags`:
D103: Missing docstring in public function
torch/backends/opt_einsum/__init__.py:93 in public class `OptEinsumModule`:
D101: Missing docstring in public class
torch/backends/opt_einsum/__init__.py:94 in public method `__init__`:
D107: Missing docstring in __init__
```
pydocstyle torch/utils/_device.py --count
before: 9
after: 6
**remaining errors**
```
torch/utils/_device.py:58 in public class `DeviceContext`:
D101: Missing docstring in public class
torch/utils/_device.py:59 in public method `__init__`:
D107: Missing docstring in __init__
torch/utils/_device.py:62 in public method `__enter__`:
D105: Missing docstring in magic method
torch/utils/_device.py:68 in public method `__exit__`:
D105: Missing docstring in magic method
torch/utils/_device.py:73 in public method `__torch_function__`:
D105: Missing docstring in magic method
torch/utils/_device.py:80 in public function `device_decorator`:
D103: Missing docstring in public function
```
pydocstyle torch/utils/_freeze.py --count
before: 15
after: 7
**remaining errors**
```
torch/utils/_freeze.py:77 in public function `indent_msg`:
D103: Missing docstring in public function
torch/utils/_freeze.py:89 in public class `FrozenModule`:
D101: Missing docstring in public class
torch/utils/_freeze.py:100 in public class `Freezer`:
D101: Missing docstring in public class
torch/utils/_freeze.py:101 in public method `__init__`:
D107: Missing docstring in __init__
torch/utils/_freeze.py:106 in public method `msg`:
D102: Missing docstring in public method
torch/utils/_freeze.py:185 in public method `get_module_qualname`:
D102: Missing docstring in public method
torch/utils/_freeze.py:206 in public method `compile_string`:
D102: Missing docstring in public method
```
pydocstyle torch/utils/throughput_benchmark.py --count
before: 25
after: 8
**remaining errors**
```
torch/utils/throughput_benchmark.py:1 at module level:
D100: Missing docstring in public module
torch/utils/throughput_benchmark.py:27 in public class `ExecutionStats`:
D101: Missing docstring in public class
torch/utils/throughput_benchmark.py:28 in public method `__init__`:
D107: Missing docstring in __init__
torch/utils/throughput_benchmark.py:33 in public method `latency_avg_ms`:
D102: Missing docstring in public method
torch/utils/throughput_benchmark.py:37 in public method `num_iters`:
D102: Missing docstring in public method
torch/utils/throughput_benchmark.py:46 in public method `total_time_seconds`:
D102: Missing docstring in public method
torch/utils/throughput_benchmark.py:50 in public method `__str__`:
D105: Missing docstring in magic method
torch/utils/throughput_benchmark.py:94 in public method `__init__`:
D107: Missing docstring in __init__
```
pydocstyle torch/utils/hooks.py --count
before: 14
after: 11
**remaining errors**
```
torch/utils/hooks.py:1 at module level:
D100: Missing docstring in public module
torch/utils/hooks.py:23 in public method `__init__`:
D107: Missing docstring in __init__
torch/utils/hooks.py:34 in public method `remove`:
D102: Missing docstring in public method
torch/utils/hooks.py:44 in public method `__getstate__`:
D105: Missing docstring in magic method
torch/utils/hooks.py:50 in public method `__setstate__`:
D105: Missing docstring in magic method
torch/utils/hooks.py:64 in public method `__enter__`:
D105: Missing docstring in magic method
torch/utils/hooks.py:67 in public method `__exit__`:
D105: Missing docstring in magic method
torch/utils/hooks.py:82 in public function `warn_if_has_hooks`:
D103: Missing docstring in public function
torch/utils/hooks.py:103 in public method `__init__`:
D107: Missing docstring in __init__
torch/utils/hooks.py:188 in public method `setup_input_hook`:
D102: Missing docstring in public method
torch/utils/hooks.py:197 in public method `setup_output_hook`:
D102: Missing docstring in public method
```
pydocstyle torch/utils/_traceback.py --count
before: 19
after: 14
**remaining errors**
```
torch/utils/_traceback.py:47 in public function `report_compile_source_on_error`:
D103: Missing docstring in public function
torch/utils/_traceback.py:160 in public class `CapturedTraceback`:
D101: Missing docstring in public class
torch/utils/_traceback.py:163 in public method `__init__`:
D107: Missing docstring in __init__
torch/utils/_traceback.py:167 in public method `cleanup`:
D102: Missing docstring in public method
torch/utils/_traceback.py:170 in public method `summary`:
D102: Missing docstring in public method
torch/utils/_traceback.py:182 in public method `__getstate__`:
D105: Missing docstring in magic method
torch/utils/_traceback.py:190 in public method `extract`:
D205: 1 blank line required between summary line and description (found 0)
torch/utils/_traceback.py:190 in public method `extract`:
D400: First line should end with a period (not 't')
torch/utils/_traceback.py:213 in public method `format`:
D205: 1 blank line required between summary line and description (found 0)
torch/utils/_traceback.py:213 in public method `format`:
D400: First line should end with a period (not 'f')
torch/utils/_traceback.py:213 in public method `format`:
D401: First line should be in imperative mood (perhaps 'Format', not 'Formats')
torch/utils/_traceback.py:224 in public method `format_all`:
D200: One-line docstring should fit on one line with quotes (found 3)
torch/utils/_traceback.py:247 in private function `_extract_symbolized_tb`:
D205: 1 blank line required between summary line and description (found 0)
torch/utils/_traceback.py:247 in private function `_extract_symbolized_tb`:
D400: First line should end with a period (not 'f')
```
pydocstyle torch/utils/mkldnn.py --count
before: 28
after: 26
**remaining errors**
```
torch/utils/mkldnn.py:1 at module level:
D100: Missing docstring in public module
torch/utils/mkldnn.py:4 in public class `MkldnnLinear`:
D101: Missing docstring in public class
torch/utils/mkldnn.py:5 in public method `__init__`:
D107: Missing docstring in __init__
torch/utils/mkldnn.py:19 in public method `__getstate__`:
D105: Missing docstring in magic method
torch/utils/mkldnn.py:23 in public method `__setstate__`:
D105: Missing docstring in magic method
torch/utils/mkldnn.py:29 in public method `forward`:
D102: Missing docstring in public method
torch/utils/mkldnn.py:75 in public class `MkldnnConv1d`:
D101: Missing docstring in public class
torch/utils/mkldnn.py:76 in public method `__init__`:
D107: Missing docstring in __init__
torch/utils/mkldnn.py:82 in public method `__setstate__`:
D105: Missing docstring in magic method
torch/utils/mkldnn.py:88 in public class `MkldnnConv2d`:
D101: Missing docstring in public class
torch/utils/mkldnn.py:89 in public method `__init__`:
D107: Missing docstring in __init__
torch/utils/mkldnn.py:100 in public method `__setstate__`:
D105: Missing docstring in magic method
torch/utils/mkldnn.py:110 in public class `MkldnnConv3d`:
D101: Missing docstring in public class
torch/utils/mkldnn.py:111 in public method `__init__`:
D107: Missing docstring in __init__
torch/utils/mkldnn.py:122 in public method `__setstate__`:
D105: Missing docstring in magic method
torch/utils/mkldnn.py:133 in public class `MkldnnBatchNorm`:
D101: Missing docstring in public class
torch/utils/mkldnn.py:136 in public method `__init__`:
D107: Missing docstring in __init__
torch/utils/mkldnn.py:155 in public method `__getstate__`:
D105: Missing docstring in magic method
torch/utils/mkldnn.py:163 in public method `__setstate__`:
D105: Missing docstring in magic method
torch/utils/mkldnn.py:171 in public method `forward`:
D102: Missing docstring in public method
torch/utils/mkldnn.py:184 in public class `MkldnnPrelu`:
D101: Missing docstring in public class
torch/utils/mkldnn.py:185 in public method `__init__`:
D107: Missing docstring in __init__
torch/utils/mkldnn.py:190 in public method `__getstate__`:
D105: Missing docstring in magic method
torch/utils/mkldnn.py:194 in public method `__setstate__`:
D105: Missing docstring in magic method
torch/utils/mkldnn.py:199 in public method `forward`:
D102: Missing docstring in public method
torch/utils/mkldnn.py:205 in public function `to_mkldnn`:
D103: Missing docstring in public function
```
pydocstyle torch/utils/weak.py --count
before: 32
after: 30
**remaining errors**
```
torch/utils/weak.py:1 at module level:
D100: Missing docstring in public module
torch/utils/weak.py:42 in public class `WeakIdRef`:
D101: Missing docstring in public class
torch/utils/weak.py:45 in public method `__init__`:
D107: Missing docstring in __init__
torch/utils/weak.py:54 in public method `__call__`:
D102: Missing docstring in public method
torch/utils/weak.py:61 in public method `__hash__`:
D105: Missing docstring in magic method
torch/utils/weak.py:64 in public method `__eq__`:
D105: Missing docstring in magic method
torch/utils/weak.py:84 in public class `WeakIdKeyDictionary`:
D101: Missing docstring in public class
torch/utils/weak.py:87 in public method `__init__`:
D107: Missing docstring in __init__
torch/utils/weak.py:131 in public method `__delitem__`:
D105: Missing docstring in magic method
torch/utils/weak.py:135 in public method `__getitem__`:
D105: Missing docstring in magic method
torch/utils/weak.py:138 in public method `__len__`:
D105: Missing docstring in magic method
torch/utils/weak.py:145 in public method `__repr__`:
D105: Missing docstring in magic method
torch/utils/weak.py:148 in public method `__setitem__`:
D105: Missing docstring in magic method
torch/utils/weak.py:151 in public method `copy`:
D102: Missing docstring in public method
torch/utils/weak.py:162 in public method `__deepcopy__`:
D105: Missing docstring in magic method
torch/utils/weak.py:172 in public method `get`:
D102: Missing docstring in public method
torch/utils/weak.py:175 in public method `__contains__`:
D105: Missing docstring in magic method
torch/utils/weak.py:182 in public method `items`:
D102: Missing docstring in public method
torch/utils/weak.py:189 in public method `keys`:
D102: Missing docstring in public method
torch/utils/weak.py:198 in public method `values`:
D102: Missing docstring in public method
torch/utils/weak.py:216 in public method `popitem`:
D102: Missing docstring in public method
torch/utils/weak.py:224 in public method `pop`:
D102: Missing docstring in public method
torch/utils/weak.py:228 in public method `setdefault`:
D102: Missing docstring in public method
torch/utils/weak.py:231 in public method `update`:
D102: Missing docstring in public method
torch/utils/weak.py:241 in public method `__ior__`:
D105: Missing docstring in magic method
torch/utils/weak.py:245 in public method `__or__`:
D105: Missing docstring in magic method
torch/utils/weak.py:252 in public method `__ror__`:
D105: Missing docstring in magic method
torch/utils/weak.py:262 in public method `__eq__`:
D105: Missing docstring in magic method
torch/utils/weak.py:276 in public method `__init__`:
D107: Missing docstring in __init__
torch/utils/weak.py:280 in public method `__call__`:
D102: Missing docstring in public method
```
@mikaylagawarecki @jbschlosser @svekars
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113311
Approved by: https://github.com/ezyang
Enables two ruff rules derived from pylint:
* PLR1722 replaces any `exit()` calls with `sys.exit()`. `exit()` is only designed for use in REPL contexts and may not always be defined, since it is injected by the `site` module; always using the version from the `sys` module is more reliable.
* PLW3301 replaces nested `min`/`max` calls with flattened versions (i.e., `min(a, min(b, c))` => `min(a, b, c)`). The new form is more idiomatic and more efficient. Both rules are illustrated in the sketch below.
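An illustrative before/after for the two rules:
```python
import sys

# PLR1722: prefer sys.exit() over the site-module exit() helper.
def fail(msg: str) -> None:
    print(msg, file=sys.stderr)
    sys.exit(1)            # was: exit(1)

# PLW3301: flatten nested min()/max() calls.
a, b, c = 3, 1, 2
smallest = min(a, b, c)    # was: min(a, min(b, c))
```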
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109461
Approved by: https://github.com/ezyang
- port https://github.com/intel-innersource/frameworks.ai.pytorch.ipex-cpu/pull/740 to `run_cpu`
- addresses the use case in https://github.com/pytorch/serve/pull/2166 where `numactl` is unavailable (e.g., it requires `privileged` mode)

This PR automatically tries `taskset` if `numactl` core binding doesn't work.
Reference:
`taskset` is added to adapt to launcher use cases such as Docker, where `numactl` must be run in `privileged` mode, and where `privileged` mode "wont work for deployments like sagemaker for example", as raised by TorchServe. Please see the [torchserve ipex docker discussion](https://github.com/pytorch/serve/pull/1401#issuecomment-1090817704) for reference. To address such use cases, `taskset` can be used in place of `numactl` to set core affinity. Note that, unlike `numactl`, `taskset` does not provide memory binding to local memory; however, memory binding may not be needed in these use cases, which typically do not span multiple sockets. Hence we can automatically try `taskset` if `numactl` doesn't work.
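A minimal sketch of the fallback (not the actual `run_cpu` logic; the core range and workload are hypothetical):
```python
import subprocess

cores = "0-7"                                # cores assigned to one instance
workload = ["python", "-c", "print('ok')"]  # stand-in for the real command

try:
    # numactl binds both CPU cores (-C) and local memory (-m).
    subprocess.run(["numactl", f"-C{cores}", "-m", "0", *workload], check=True)
except (OSError, subprocess.CalledProcessError):
    # taskset sets CPU affinity only; no memory binding, but enough for
    # single-socket deployments where numactl cannot be used.
    subprocess.run(["taskset", "-c", cores, *workload], check=True)
```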
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96011
Approved by: https://github.com/jgong5, https://github.com/malfet
Prefer dashes over underscores in command-line options. This adds `--command-arg-name` forms to the argument parsers; the old underscore arguments (`--command_arg_name`) are kept for backward compatibility.
Both dashes and underscores are used in the PyTorch codebase. Some argument parsers have only dashes or only underscores in their arguments. For example, the `torchrun` utility for distributed training only accepts underscore arguments (e.g., `--master_port`). Dashes are more common in other command-line tools, and they look to be the default choice in the Python standard library:
`argparse.BooleanOptionalAction`: 4a9dff0e5a/Lib/argparse.py (L893-L895)
```python
class BooleanOptionalAction(Action):
    def __init__(...):
        if option_string.startswith('--'):
            option_string = '--no-' + option_string[2:]
            _option_strings.append(option_string)
```
It adds `--no-argname`, not `--no_argname`. Also, typing `_` requires pressing the Shift key, unlike `-`. The backward-compatible aliasing is sketched below.
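A minimal `argparse` sketch of the backward-compatible aliasing (the option name is hypothetical): register the dashed spelling as canonical and keep the underscored spelling as an alias.
```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--master-port", "--master_port",  # both spellings map to the same option
    dest="master_port", type=int, default=29500,
)

# The old underscore spelling still parses.
args = parser.parse_args(["--master_port", "29501"])
print(args.master_port)  # 29501
```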
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94505
Approved by: https://github.com/ezyang, https://github.com/seemethere