Commit Graph

110 Commits

Author SHA1 Message Date
Xuehai Pan
b77406a9ec [BE][CI] bump ruff to 0.8.4 (#143753)
Changes:

1. Bump `ruff` from 0.7.4 to 0.8.4
2. Change `%`-formatted strings to f-strings
3. Change arguments with the `__`-prefix to positional-only arguments with the `/` separator in function signatures (see the representative diff below)
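
A representative (made-up) before/after for changes 2 and 3:

```diff
-def fetch(__key, default=None):
-    print("fetching %s" % __key)
+def fetch(key, /, default=None):
+    print(f"fetching {key}")
```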

Pull Request resolved: https://github.com/pytorch/pytorch/pull/143753
Approved by: https://github.com/Skylion007
2024-12-24 12:24:10 +00:00
Tom Ritchford
c0582fd0f8 Remove unused Python variables in torch/[b-z]* (#136963)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136963
Approved by: https://github.com/ezyang
2024-10-19 16:45:22 +00:00
Yuxin Wu
8c3ab21490 multiprocessing.spawn: allow a grace period when shutdown (#131278)
When one process fails, others are immediately killed. This prevents other processes from doing necessary cleanups or dumping debug information (in particular, the NCCL flight recorder).

This PR adds a grace period. Default behavior is unchanged.
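
A minimal sketch of the shutdown behavior described, using illustrative names (`procs`, `grace_period`) rather than the exact torch.multiprocessing API:

```py
import time

def shutdown(procs, grace_period=None):
    for p in procs:
        p.terminate()  # SIGTERM first, so workers can run cleanups / dump debug info
    if grace_period is not None:
        deadline = time.monotonic() + grace_period
        for p in procs:
            p.join(max(0.0, deadline - time.monotonic()))
    for p in procs:
        if p.is_alive():  # still running after the grace period
            p.kill()      # escalate to SIGKILL
        p.join()
```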
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131278
Approved by: https://github.com/albanD
2024-10-07 12:37:34 +00:00
Kiuk Chung
abd16a8c64 [torch/multiprocessing] Use multiprocessing.reduction.register() instead of ForkingPickler.register() to register custom tensor and storage reductions (#135030)
Right now `multiprocessing.reduction.register()` is simply an alias for `multiprocessing.reduction.ForkingPickler.register()`
(https://github.com/python/cpython/blame/main/Lib/multiprocessing/reduction.py#L56), but the top-level `register()` function exposes fewer internal details of the `multiprocessing.reduction` module.
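
A small sketch of what registering a custom reduction through the top-level alias looks like (`Handle` is a made-up example type, not a torch class):

```py
from multiprocessing import reduction

class Handle:
    def __init__(self, fd):
        self.fd = fd

def rebuild_handle(fd):
    return Handle(fd)

def reduce_handle(obj):
    # return (rebuild_fn, args) so the receiving process can reconstruct the object
    return (rebuild_handle, (obj.fd,))

# equivalent to reduction.ForkingPickler.register(Handle, reduce_handle)
reduction.register(Handle, reduce_handle)
```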
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135030
Approved by: https://github.com/albanD
2024-09-16 20:07:29 +00:00
Aaron Gokaslan
31715be72a [BE]: Update mypy to 1.11.2 (#133816)
Updates mypy to 1.11.2 to improve type inference

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133816
Approved by: https://github.com/ezyang
2024-09-16 19:44:11 +00:00
PyTorch MergeBot
3117f2cf67 Revert "[BE]: Update mypy to 1.11.2 (#133816)"
This reverts commit 55299cfc22.

Reverted https://github.com/pytorch/pytorch/pull/133816 on behalf of https://github.com/jeanschmidt due to seems to have broken https://github.com/pytorch/pytorch/actions/runs/10865710499/job/30155699792 on main ([comment](https://github.com/pytorch/pytorch/pull/133816#issuecomment-2352377684))
2024-09-16 09:11:16 +00:00
Aaron Gokaslan
55299cfc22 [BE]: Update mypy to 1.11.2 (#133816)
Updates mypy to 1.11.2 to improve type inference

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133816
Approved by: https://github.com/ezyang
2024-09-14 21:40:36 +00:00
Jia Li
20b62fed21 Create processes in parallel in mp.start_processes for forkserver (#134629)
Summary:
This fixes the PyTorch issue filed at https://github.com/pytorch/pytorch/issues/133010.
One way to fix this problem is to enable parallel process start in mp.start_processes (see the sketch below).
What else is in this diff:
- refactored the api_test test case, which was repeating a lot of tests due to inheritance
- added a unit test for forkserver with parallel start enabled
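
A rough sketch of the parallel-start idea (illustrative only; the real change lives inside `torch.multiprocessing.start_processes`):

```py
import multiprocessing as mp
from concurrent.futures import ThreadPoolExecutor

def start_processes_parallel(target, nprocs, start_method="forkserver"):
    ctx = mp.get_context(start_method)
    procs = [ctx.Process(target=target, args=(rank,)) for rank in range(nprocs)]
    # Process.start() is slow for forkserver; issuing the starts from a thread
    # pool lets the per-process startup latency overlap instead of adding up.
    with ThreadPoolExecutor(max_workers=nprocs) as pool:
        list(pool.map(lambda p: p.start(), procs))
    return procs
```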

Test Plan: Added unit tests

Differential Revision: D61878552

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134629
Approved by: https://github.com/d4l3k
2024-08-28 21:34:32 +00:00
PyTorch MergeBot
adcce538b7 Revert "Allow mp.start_processes to create processes in parallel (#133707)"
This reverts commit 3546628a2a.

Reverted https://github.com/pytorch/pytorch/pull/133707 on behalf of https://github.com/ZainRizvi due to sorry but trunk has been consistently broken since this PR was merged. See: [GH job link](https://github.com/pytorch/pytorch/actions/runs/10529617600/job/29191757055) [HUD commit link](3546628a2a) ([comment](https://github.com/pytorch/pytorch/pull/133707#issuecomment-2310709523))
2024-08-26 17:31:10 +00:00
Jia Li
3546628a2a Allow mp.start_processes to create processes in parallel (#133707)
Summary:
Background discussion in https://fb.workplace.com/groups/319878845696681/posts/1226087421742481
and the PyTorch issue filed at https://github.com/pytorch/pytorch/issues/133010.

One way to fix this problem is to add an option on the PyTorch side to start processes in parallel.

Test Plan: Tested the aps run that hit the problem; things are in parallel now (next diff)

Differential Revision: D61301603

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133707
Approved by: https://github.com/d4l3k, https://github.com/ezyang
2024-08-23 17:11:20 +00:00
Xuehai Pan
f3fce597e9 [BE][Easy][17/19] enforce style for empty lines in import segments in torch/[a-c]*/ and torch/[e-n]*/ (#129769)
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by the linter.

You can review these PRs via:

```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129769
Approved by: https://github.com/ezyang
2024-08-04 10:24:09 +00:00
Oguz Ulgen
72d2dba992 Add None return type to init (#132335)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132335
Approved by: https://github.com/albanD
2024-08-01 15:26:45 +00:00
Kurt Mohler
e590168865 Enable sharing meta tensors between processes (#129520)
Fixes #129436

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129520
Approved by: https://github.com/ezyang
2024-07-04 20:29:48 +00:00
Tristan Rice
7c370d2fb0 expose set_thread_name to Python and set thread names (#128448)
This adds a new multiprocessing method `_set_thread_name` and calls it from the torchelastic and dataloader main functions. This allows better monitoring of processes, as we can separate elastic and dataloading processes from the main training process.

Threads named:

* torchrun/elastic
* PyTorch dataloader worker processes + pin memory thread
* TCPStore
* ProcessGroupNCCL background threads
* WorkerServer httpserver thread

Test plan:

```
$ torchrun --nnodes 1 --nproc_per_node 1 --no-python /bin/bash -c 'ps -eL | grep pt_'
3264281 3264281 pts/45   00:00:02 pt_elastic
3264281 3267950 pts/45   00:00:00 pt_elastic
```

dataloading

```py
import torch
import time

from torch.utils.data import (
    DataLoader,
    Dataset,
)

class NoopDataset(Dataset):
    def __getitem__(self, index):
        return index

    def __len__(self):
        return 10

dataloader = DataLoader(NoopDataset(), num_workers=2)

for i, x in enumerate(dataloader):
    print(i, x)
    time.sleep(10000)
```

```
$ python3 ~/scripts/dataloader_test.py
$ ps -eL | grep pt_
1228312 1228312 pts/45   00:00:02 pt_main_thread
1228312 1230058 pts/45   00:00:00 pt_main_thread
1228312 1230059 pts/45   00:00:00 pt_main_thread
1230052 1230052 pts/45   00:00:00 pt_data_worker
1230052 1230198 pts/45   00:00:00 pt_data_worker
1230052 1230740 pts/45   00:00:00 pt_data_worker
1230055 1230055 pts/45   00:00:00 pt_data_worker
1230055 1230296 pts/45   00:00:00 pt_data_worker
1230055 1230759 pts/45   00:00:00 pt_data_worker
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128448
Approved by: https://github.com/c-p-i-o, https://github.com/andrewkho, https://github.com/rsdcastro
2024-06-13 16:38:23 +00:00
Xuehai Pan
83bb9b7c53 [BE] explicitly export subpackage torch.utils (#128342)
Resolves #126401

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128342
Approved by: https://github.com/Skylion007
ghstack dependencies: #127707
2024-06-13 04:39:16 +00:00
Aaron Orenstein
038b927590 Flip default value for mypy disallow_untyped_defs [7/11] (#127844)
See #127836 for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127844
Approved by: https://github.com/oulgen
ghstack dependencies: #127842, #127843
2024-06-08 18:49:45 +00:00
Xuehai Pan
67ef2683d9 [BE] wrap deprecated function/class with typing_extensions.deprecated (#127689)
Use `typing_extensions.deprecated` for deprecation annotation if possible. Otherwise, add `category=FutureWarning` to `warnings.warn("message")` if the category is missing.

Note that only warnings whose messages contain `[Dd]eprecat(ed|ion)` are updated in this PR.
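
A small sketch of the two patterns (the function names below are made up for illustration):

```py
import warnings
from typing_extensions import deprecated

def new_api():
    ...

@deprecated("old_api() is deprecated, use new_api() instead.", category=FutureWarning)
def old_api():
    return new_api()

def legacy_helper():
    # where the decorator does not fit, the existing warning just gains a category
    warnings.warn(
        "legacy_helper() is deprecated, use new_api() instead.",
        FutureWarning,
        stacklevel=2,
    )
    return new_api()
```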

Resolves #126888

- #126888

This PR is split from PR #126898.

- #126898

------

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127689
Approved by: https://github.com/Skylion007
2024-06-02 12:30:43 +00:00
PyTorch MergeBot
033e733021 Revert "[BE] wrap deprecated function/class with typing_extensions.deprecated (#126898)"
This reverts commit 749a132fb0.

Reverted https://github.com/pytorch/pytorch/pull/126898 on behalf of https://github.com/fbgheith due to switching typing-extensions=4.3.0 to 4.9.0 causes internal failure ([comment](https://github.com/pytorch/pytorch/pull/126898#issuecomment-2142884456))
2024-05-31 19:47:24 +00:00
Xuehai Pan
749a132fb0 [BE] wrap deprecated function/class with typing_extensions.deprecated (#126898)
Use `typing_extensions.deprecated` for deprecation annotation if possible. Otherwise, add `category=FutureWarning` to `warnings.warn("message")` if the category is missing.

Note that only warnings whose messages contain `[Dd]eprecat(ed|ion)` are updated in this PR.

UPDATE: Use `FutureWarning` instead of `DeprecationWarning`.

Resolves #126888

- #126888

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126898
Approved by: https://github.com/albanD
2024-05-29 12:09:27 +00:00
Aaron Gokaslan
34910f87f0 [BE]: Update ruff to v0.4.4 (#125031)
Update ruff version to 0.4.4. This version mostly has bugfixes for the new parser and also updates the f-string rule so it can apply more fixes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125031
Approved by: https://github.com/albanD, https://github.com/malfet
2024-05-12 20:02:37 +00:00
Chip Turner
2ed47fecc5 Robustify torch.multiprocessing.spawn error reporting to be less deadlock prone (#114688)
multiprocessing.Queue relies on, among other things, background threads to send messages between processes. This works in the happy path, but it can cause issues if a process exits in a way that bypasses atexit handlers, or crashes, because the writer to the Queue can terminate while the reader is still blocked reading the queue. The reader sees the queue as non-empty, yet even with a timeout it will actually block forever.

An example of a Queue deadlock is here: https://gist.github.com/chipturner/342f72341f087737befe9df84d0e41ce

Since the error reporting case here is a simple one-shot message from the dying child to the parent, we can just use a file-based rendezvous.  This eliminates the deadlock when a large traceback is still being flushed to the network when a child exits.
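
A simplified sketch of such a file-based, one-shot exchange (names and structure are illustrative, not the exact torch.multiprocessing.spawn code):

```py
import os
import pickle
import traceback

def child_wrapper(fn, error_file):
    try:
        fn()
    except Exception:
        # the dying child writes its traceback to a plain file; no feeder
        # threads are involved, so nothing can deadlock on exit
        with open(error_file, "wb") as f:
            pickle.dump(traceback.format_exc(), f)
        raise

def read_child_error(error_file):
    # the parent reads the file only after observing the child's exit code
    if not os.path.exists(error_file) or os.path.getsize(error_file) == 0:
        return None
    with open(error_file, "rb") as f:
        return pickle.load(f)
```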

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114688
Approved by: https://github.com/suo, https://github.com/yifuwang
2023-12-09 03:36:43 +00:00
Satendra Gera
9521331ba5 [pytorch] Multiprocessing api to use sigkill if sigterm doesn't kill the process (#115219)
Summary:
[pytorch] Multiprocessing api to use sigkill if sigterm doesn't kill the process
We have seen a handful of jobs training stuck where one of the trainer goes down
while others are stuck in c++ land and hence not handling the sigterm.
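
A sketch of the escalation logic described (hypothetical helper, not the exact torch API): try a polite SIGTERM first, then SIGKILL if the process ignores it.

```py
import logging
import os
import signal
import time

def terminate_process(pid, timeout=30):
    os.kill(pid, signal.SIGTERM)  # polite request first
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            os.kill(pid, 0)       # raises OSError once the pid is gone
        except OSError:
            return
        time.sleep(0.1)
    logging.warning(
        "Unable to shutdown process %d via SIGTERM, forcefully exiting via SIGKILL", pid
    )
    os.kill(pid, signal.SIGKILL)  # the process ignored SIGTERM; force it
```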

Test Plan: Manually validated by attaching gdb to one of the processes and sending a kill -9 to another. Saw the log ```WARNING] Unable to shutdown process 4422 via Signals.SIGTERM, forcefully exiting via Signals.SIGKILL```

Differential Revision: D51862545

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115219
Approved by: https://github.com/wconstab, https://github.com/fduwjj
2023-12-08 02:26:19 +00:00
Chip Turner
d7160c9223 Handle potential ValueError exception when stringifying signals (#114696)
On some systems it is possible to receive a signal that does not have a name. Rare, but possible. This change prevents our error handler from crashing and instead properly reports the signal.
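
A sketch of the defensive lookup (the helper name is made up): `signal.Signals(signum)` raises `ValueError` for a number with no named member, so fall back to the raw value instead of crashing the error handler.

```py
import signal

def signal_name(signum):
    try:
        return signal.Signals(signum).name
    except ValueError:
        return f"<unknown signal {signum}>"
```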

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114696
Approved by: https://github.com/xmfan
2023-12-07 02:10:30 +00:00
Pearu Peterson
0bd4d1f4ab Add sparse tensors support to dataloader. (#112842)
Fixes https://github.com/pytorch/pytorch/issues/106837

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112842
Approved by: https://github.com/cpuhrsch, https://github.com/gokulavasan
2023-11-19 16:05:27 +00:00
IvanLauLinTiong
91c90f232a Fix docstring errors in reductions.py, spawn.py, pool.py, parameter.py, cpp.py, grad.py, __init__.py, profiler.py, queue.py, graph.py (#113052)
Fixes #112595
- `torch/autograd/profiler.py` </br>
**Before: 37**

```
torch/autograd/profiler.py:1 at module level:
        D100: Missing docstring in public module
torch/autograd/profiler.py:91 in public class `profile`:
        D205: 1 blank line required between summary line and description (found 0)
torch/autograd/profiler.py:175 in public method `__init__`:
        D107: Missing docstring in __init__
torch/autograd/profiler.py:261 in public method `config`:
        D102: Missing docstring in public method
torch/autograd/profiler.py:272 in public method `__enter__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:290 in public method `__exit__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:308 in public method `__repr__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:313 in public method `__str__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:322 in public method `table`:
        D102: Missing docstring in public method
torch/autograd/profiler.py:346 in public method `export_chrome_trace`:
        D102: Missing docstring in public method
torch/autograd/profiler.py:355 in public method `export_stacks`:
        D102: Missing docstring in public method
torch/autograd/profiler.py:361 in public method `key_averages`:
        D102: Missing docstring in public method
torch/autograd/profiler.py:368 in public method `total_average`:
        D102: Missing docstring in public method
torch/autograd/profiler.py:377 in public method `self_cpu_time_total`:
        D205: 1 blank line required between summary line and description (found 0)
torch/autograd/profiler.py:377 in public method `self_cpu_time_total`:
        D400: First line should end with a period (not 'f')
torch/autograd/profiler.py:555 in public class `record_function`:
        D205: 1 blank line required between summary line and description (found 0)
torch/autograd/profiler.py:555 in public class `record_function`:
        D400: First line should end with a period (not 'f')
torch/autograd/profiler.py:591 in public method `__init__`:
        D107: Missing docstring in __init__
torch/autograd/profiler.py:602 in public method `__enter__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:608 in public method `__exit__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:625 in private method `_call_end_callbacks_on_future`:
        D205: 1 blank line required between summary line and description (found 0)
torch/autograd/profiler.py:625 in private method `_call_end_callbacks_on_future`:
        D400: First line should end with a period (not 'c')
torch/autograd/profiler.py:707 in public method `__init__`:
        D107: Missing docstring in __init__
torch/autograd/profiler.py:712 in public method `__enter__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:733 in public method `__exit__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:826 in public method `__init__`:
        D107: Missing docstring in __init__
torch/autograd/profiler.py:831 in public method `__enter__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:853 in public method `__exit__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:863 in public function `load_nvprof`:
        D401: First line should be in imperative mood (perhaps 'Open', not 'Opens')
torch/autograd/profiler.py:874 in public method `__init__`:
        D107: Missing docstring in __init__
torch/autograd/profiler.py:877 in public method `see`:
        D102: Missing docstring in public method
torch/autograd/profiler.py:883 in public function `parse_nvprof_trace`:
        D103: Missing docstring in public function
torch/autograd/profiler.py:951 in public class `KinetoStepTracker`:
        D205: 1 blank line required between summary line and description (found 0)
torch/autograd/profiler.py:991 in public method `init_step_count`:
        D102: Missing docstring in public method
torch/autograd/profiler.py:995 in public method `erase_step_count`:
        D102: Missing docstring in public method
torch/autograd/profiler.py:1000 in public method `increment_step`:
        D205: 1 blank line required between summary line and description (found 0)
torch/autograd/profiler.py:1023 in public method `current_step`:
        D102: Missing docstring in public method
37
```

**After: 27**

```
torch/autograd/profiler.py:1 at module level:
        D100: Missing docstring in public module
torch/autograd/profiler.py:176 in public method `__init__`:
        D107: Missing docstring in __init__
torch/autograd/profiler.py:262 in public method `config`:
        D102: Missing docstring in public method
torch/autograd/profiler.py:273 in public method `__enter__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:291 in public method `__exit__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:309 in public method `__repr__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:314 in public method `__str__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:323 in public method `table`:
        D102: Missing docstring in public method
torch/autograd/profiler.py:347 in public method `export_chrome_trace`:
        D102: Missing docstring in public method
torch/autograd/profiler.py:356 in public method `export_stacks`:
        D102: Missing docstring in public method
torch/autograd/profiler.py:362 in public method `key_averages`:
        D102: Missing docstring in public method
torch/autograd/profiler.py:369 in public method `total_average`:
        D102: Missing docstring in public method
torch/autograd/profiler.py:593 in public method `__init__`:
        D107: Missing docstring in __init__
torch/autograd/profiler.py:604 in public method `__enter__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:610 in public method `__exit__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:708 in public method `__init__`:
        D107: Missing docstring in __init__
torch/autograd/profiler.py:713 in public method `__enter__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:734 in public method `__exit__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:827 in public method `__init__`:
        D107: Missing docstring in __init__
torch/autograd/profiler.py:832 in public method `__enter__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:854 in public method `__exit__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:875 in public method `__init__`:
        D107: Missing docstring in __init__
torch/autograd/profiler.py:878 in public method `see`:
        D102: Missing docstring in public method
torch/autograd/profiler.py:884 in public function `parse_nvprof_trace`:
        D103: Missing docstring in public function
torch/autograd/profiler.py:993 in public method `init_step_count`:
        D102: Missing docstring in public method
torch/autograd/profiler.py:997 in public method `erase_step_count`:
        D102: Missing docstring in public method
torch/autograd/profiler.py:1025 in public method `current_step`:
        D102: Missing docstring in public method
27
```

- `torch/autograd/graph.py` </br>
**Before: 22**

```
torch/autograd/graph.py:1 at module level:
        D100: Missing docstring in public module
torch/autograd/graph.py:24 in public class `Node`:
        D101: Missing docstring in public class
torch/autograd/graph.py:27 in public method `name`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
torch/autograd/graph.py:42 in public method `next_functions`:
        D102: Missing docstring in public method
torch/autograd/graph.py:47 in public method `metadata`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
torch/autograd/graph.py:56 in public method `register_hook`:
        D401: First line should be in imperative mood (perhaps 'Register', not 'Registers')
torch/autograd/graph.py:94 in public method `register_prehook`:
        D401: First line should be in imperative mood (perhaps 'Register', not 'Registers')
torch/autograd/graph.py:129 in public method `__subclasshook__`:
        D105: Missing docstring in magic method
torch/autograd/graph.py:147 in public function `get_gradient_edge`:
        D205: 1 blank line required between summary line and description (found 0)
torch/autograd/graph.py:147 in public function `get_gradient_edge`:
        D400: First line should end with a period (not 'f')
torch/autograd/graph.py:147 in public function `get_gradient_edge`:
        D401: First line should be in imperative mood; try rephrasing (found 'This')
torch/autograd/graph.py:166 in public function `increment_version`:
        D205: 1 blank line required between summary line and description (found 0)
torch/autograd/graph.py:166 in public function `increment_version`:
        D400: First line should end with a period (not 'd')
torch/autograd/graph.py:166 in public function `increment_version`:
        D401: First line should be in imperative mood; try rephrasing (found 'This')
torch/autograd/graph.py:243 in public method `__init__`:
        D107: Missing docstring in __init__
torch/autograd/graph.py:251 in public method `__enter__`:
        D105: Missing docstring in magic method
torch/autograd/graph.py:256 in public method `__exit__`:
        D105: Missing docstring in magic method
torch/autograd/graph.py:261 in public class `save_on_cpu`:
        D205: 1 blank line required between summary line and description (found 0)
torch/autograd/graph.py:261 in public class `save_on_cpu`:
        D400: First line should end with a period (not 'e')
torch/autograd/graph.py:303 in public method `__init__`:
        D107: Missing docstring in __init__
torch/autograd/graph.py:365 in public function `register_multi_grad_hook`:
        D401: First line should be in imperative mood (perhaps 'Register', not 'Registers')
torch/autograd/graph.py:588 in public function `allow_mutation_on_saved_tensors`:
        D400: First line should end with a period (not 'd')
22
```

**After: 8**

```
torch/autograd/graph.py:1 at module level:
        D100: Missing docstring in public module
torch/autograd/graph.py:24 in public class `Node`:
        D101: Missing docstring in public class
torch/autograd/graph.py:42 in public method `next_functions`:
        D102: Missing docstring in public method
torch/autograd/graph.py:129 in public method `__subclasshook__`:
        D105: Missing docstring in magic method
torch/autograd/graph.py:244 in public method `__init__`:
        D107: Missing docstring in __init__
torch/autograd/graph.py:252 in public method `__enter__`:
        D105: Missing docstring in magic method
torch/autograd/graph.py:257 in public method `__exit__`:
        D105: Missing docstring in magic method
torch/autograd/graph.py:303 in public method `__init__`:
        D107: Missing docstring in __init__
8
```

- `torch/multiprocessing/pool.py` </br>
**Before: 6**

```
torch/multiprocessing/pool.py:1 at module level:
        D100: Missing docstring in public module
torch/multiprocessing/pool.py:7 in public function `clean_worker`:
        D103: Missing docstring in public function
torch/multiprocessing/pool.py:18 in public class `Pool`:
        D205: 1 blank line required between summary line and description (found 0)
torch/multiprocessing/pool.py:18 in public class `Pool`:
        D209: Multi-line docstring closing quotes should be on a separate line
torch/multiprocessing/pool.py:29 in private method `_repopulate_pool`:
        D205: 1 blank line required between summary line and description (found 0)
torch/multiprocessing/pool.py:29 in private method `_repopulate_pool`:
        D400: First line should end with a period (not ',')
6
```

**After: 2**

```
torch/multiprocessing/pool.py:1 at module level:
        D100: Missing docstring in public module
torch/multiprocessing/pool.py:7 in public function `clean_worker`:
        D103: Missing docstring in public function
2
```

- `torch/multiprocessing/queue.py` </br>
**Before: 11**

```
torch/multiprocessing/queue.py:1 at module level:
        D100: Missing docstring in public module
torch/multiprocessing/queue.py:8 in public class `ConnectionWrapper`:
        D205: 1 blank line required between summary line and description (found 0)
torch/multiprocessing/queue.py:8 in public class `ConnectionWrapper`:
        D209: Multi-line docstring closing quotes should be on a separate line
torch/multiprocessing/queue.py:8 in public class `ConnectionWrapper`:
        D400: First line should end with a period (not 'o')
torch/multiprocessing/queue.py:11 in public method `__init__`:
        D107: Missing docstring in __init__
torch/multiprocessing/queue.py:14 in public method `send`:
        D102: Missing docstring in public method
torch/multiprocessing/queue.py:19 in public method `recv`:
        D102: Missing docstring in public method
torch/multiprocessing/queue.py:23 in public method `__getattr__`:
        D105: Missing docstring in magic method
torch/multiprocessing/queue.py:29 in public class `Queue`:
        D101: Missing docstring in public class
torch/multiprocessing/queue.py:30 in public method `__init__`:
        D107: Missing docstring in __init__
torch/multiprocessing/queue.py:38 in public class `SimpleQueue`:
        D101: Missing docstring in public class
11
```

**After: 8**

```
torch/multiprocessing/queue.py:1 at module level:
        D100: Missing docstring in public module
torch/multiprocessing/queue.py:10 in public method `__init__`:
        D107: Missing docstring in __init__
torch/multiprocessing/queue.py:13 in public method `send`:
        D102: Missing docstring in public method
torch/multiprocessing/queue.py:18 in public method `recv`:
        D102: Missing docstring in public method
torch/multiprocessing/queue.py:22 in public method `__getattr__`:
        D105: Missing docstring in magic method
torch/multiprocessing/queue.py:28 in public class `Queue`:
        D101: Missing docstring in public class
torch/multiprocessing/queue.py:29 in public method `__init__`:
        D107: Missing docstring in __init__
torch/multiprocessing/queue.py:37 in public class `SimpleQueue`:
        D101: Missing docstring in public class
8
```

- `torch/multiprocessing/reductions.py` </br>
**Before: 31**

```
torch/multiprocessing/reductions.py:1 at module level:
        D100: Missing docstring in public module
torch/multiprocessing/reductions.py:24 in public class `StorageWeakRef`:
        D209: Multi-line docstring closing quotes should be on a separate line
torch/multiprocessing/reductions.py:31 in public method `__init__`:
        D107: Missing docstring in __init__
torch/multiprocessing/reductions.py:38 in public method `from_weakref`:
        D102: Missing docstring in public method
torch/multiprocessing/reductions.py:44 in public method `expired`:
        D102: Missing docstring in public method
torch/multiprocessing/reductions.py:47 in public method `__del__`:
        D105: Missing docstring in magic method
torch/multiprocessing/reductions.py:50 in public method `__hash__`:
        D105: Missing docstring in magic method
torch/multiprocessing/reductions.py:53 in public method `__eq__`:
        D105: Missing docstring in magic method
torch/multiprocessing/reductions.py:60 in public class `SharedCache`:
        D400: First line should end with a period (not 'f')
torch/multiprocessing/reductions.py:62 in public method `__init__`:
        D107: Missing docstring in __init__
torch/multiprocessing/reductions.py:75 in public method `get`:
        D102: Missing docstring in public method
torch/multiprocessing/reductions.py:79 in public method `__setitem__`:
        D105: Missing docstring in magic method
torch/multiprocessing/reductions.py:85 in public method `free_dead_references`:
        D102: Missing docstring in public method
torch/multiprocessing/reductions.py:99 in public function `rebuild_event`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:103 in public function `reduce_event`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:108 in public function `rebuild_tensor`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:121 in public function `rebuild_cuda_tensor`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:189 in public function `reduce_tensor`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:347 in public function `rebuild_nested_tensor`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:364 in public function `reduce_nested_tensor`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:389 in public function `fd_id`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:397 in public function `storage_from_cache`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:404 in public function `rebuild_storage_fd`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:417 in public function `rebuild_storage_filename`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:437 in public function `rebuild_storage_empty`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:441 in public function `rebuild_typed_storage`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:446 in public function `reduce_typed_storage`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:450 in public function `rebuild_typed_storage_child`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:455 in public function `reduce_typed_storage_child`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:459 in public function `reduce_storage`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:488 in public function `init_reductions`:
        D103: Missing docstring in public function
31
```

**After: 29**

```
torch/multiprocessing/reductions.py:1 at module level:
        D100: Missing docstring in public module
torch/multiprocessing/reductions.py:32 in public method `__init__`:
        D107: Missing docstring in __init__
torch/multiprocessing/reductions.py:39 in public method `from_weakref`:
        D102: Missing docstring in public method
torch/multiprocessing/reductions.py:45 in public method `expired`:
        D102: Missing docstring in public method
torch/multiprocessing/reductions.py:48 in public method `__del__`:
        D105: Missing docstring in magic method
torch/multiprocessing/reductions.py:51 in public method `__hash__`:
        D105: Missing docstring in magic method
torch/multiprocessing/reductions.py:54 in public method `__eq__`:
        D105: Missing docstring in magic method
torch/multiprocessing/reductions.py:63 in public method `__init__`:
        D107: Missing docstring in __init__
torch/multiprocessing/reductions.py:76 in public method `get`:
        D102: Missing docstring in public method
torch/multiprocessing/reductions.py:80 in public method `__setitem__`:
        D105: Missing docstring in magic method
torch/multiprocessing/reductions.py:86 in public method `free_dead_references`:
        D102: Missing docstring in public method
torch/multiprocessing/reductions.py:100 in public function `rebuild_event`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:104 in public function `reduce_event`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:109 in public function `rebuild_tensor`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:122 in public function `rebuild_cuda_tensor`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:190 in public function `reduce_tensor`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:348 in public function `rebuild_nested_tensor`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:365 in public function `reduce_nested_tensor`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:390 in public function `fd_id`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:398 in public function `storage_from_cache`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:405 in public function `rebuild_storage_fd`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:418 in public function `rebuild_storage_filename`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:438 in public function `rebuild_storage_empty`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:442 in public function `rebuild_typed_storage`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:447 in public function `reduce_typed_storage`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:451 in public function `rebuild_typed_storage_child`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:456 in public function `reduce_typed_storage_child`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:460 in public function `reduce_storage`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:489 in public function `init_reductions`:
        D103: Missing docstring in public function
29
```

- `torch/multiprocessing/spawn.py` </br>
**Before: 19**

```
torch/multiprocessing/spawn.py:1 at module level:
        D100: Missing docstring in public module
torch/multiprocessing/spawn.py:11 in public class `ProcessException`:
        D101: Missing docstring in public class
torch/multiprocessing/spawn.py:14 in public method `__init__`:
        D107: Missing docstring in __init__
torch/multiprocessing/spawn.py:20 in public method `__reduce__`:
        D105: Missing docstring in magic method
torch/multiprocessing/spawn.py:25 in public class `ProcessRaisedException`:
        D205: 1 blank line required between summary line and description (found 0)
torch/multiprocessing/spawn.py:25 in public class `ProcessRaisedException`:
        D400: First line should end with a period (not 'n')
torch/multiprocessing/spawn.py:30 in public method `__init__`:
        D107: Missing docstring in __init__
torch/multiprocessing/spawn.py:40 in public class `ProcessExitedException`:
        D205: 1 blank line required between summary line and description (found 0)
torch/multiprocessing/spawn.py:40 in public class `ProcessExitedException`:
        D400: First line should end with a period (not 'l')
torch/multiprocessing/spawn.py:47 in public method `__init__`:
        D107: Missing docstring in __init__
torch/multiprocessing/spawn.py:59 in public method `__reduce__`:
        D105: Missing docstring in magic method
torch/multiprocessing/spawn.py:85 in public class `ProcessContext`:
        D101: Missing docstring in public class
torch/multiprocessing/spawn.py:86 in public method `__init__`:
        D107: Missing docstring in __init__
torch/multiprocessing/spawn.py:93 in public method `pids`:
        D102: Missing docstring in public method
torch/multiprocessing/spawn.py:97 in public method `join`:
        D205: 1 blank line required between summary line and description (found 0)
torch/multiprocessing/spawn.py:97 in public method `join`:
        D401: First line should be in imperative mood (perhaps 'Try', not 'Tries')
torch/multiprocessing/spawn.py:166 in public class `SpawnContext`:
        D101: Missing docstring in public class
torch/multiprocessing/spawn.py:167 in public method `__init__`:
        D107: Missing docstring in __init__
torch/multiprocessing/spawn.py:180 in public function `start_processes`:
        D103: Missing docstring in public function
19
```

**After: 13**

```
torch/multiprocessing/spawn.py:1 at module level:
        D100: Missing docstring in public module
torch/multiprocessing/spawn.py:11 in public class `ProcessException`:
        D101: Missing docstring in public class
torch/multiprocessing/spawn.py:14 in public method `__init__`:
        D107: Missing docstring in __init__
torch/multiprocessing/spawn.py:20 in public method `__reduce__`:
        D105: Missing docstring in magic method
torch/multiprocessing/spawn.py:27 in public method `__init__`:
        D107: Missing docstring in __init__
torch/multiprocessing/spawn.py:41 in public method `__init__`:
        D107: Missing docstring in __init__
torch/multiprocessing/spawn.py:53 in public method `__reduce__`:
        D105: Missing docstring in magic method
torch/multiprocessing/spawn.py:79 in public class `ProcessContext`:
        D101: Missing docstring in public class
torch/multiprocessing/spawn.py:80 in public method `__init__`:
        D107: Missing docstring in __init__
torch/multiprocessing/spawn.py:87 in public method `pids`:
        D102: Missing docstring in public method
torch/multiprocessing/spawn.py:161 in public class `SpawnContext`:
        D101: Missing docstring in public class
torch/multiprocessing/spawn.py:162 in public method `__init__`:
        D107: Missing docstring in __init__
torch/multiprocessing/spawn.py:175 in public function `start_processes`:
        D103: Missing docstring in public function
13
```

- `torch/multiprocessing/__init__.py` </br>
**Before: 5**

```
torch/multiprocessing/__init__.py:1 at module level:
        D205: 1 blank line required between summary line and description (found 0)
torch/multiprocessing/__init__.py:1 at module level:
        D400: First line should end with a period (not '`')
torch/multiprocessing/__init__.py:57 in public function `set_sharing_strategy`:
        D401: First line should be in imperative mood (perhaps 'Set', not 'Sets')
torch/multiprocessing/__init__.py:69 in public function `get_sharing_strategy`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
torch/multiprocessing/__init__.py:74 in public function `get_all_sharing_strategies`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
5
```

**After: 0**

- `torch/nn/__init__.py` </br>
**Before: 3**

```
torch/nn/__init__.py:1 at module level:
        D104: Missing docstring in public package
torch/nn/__init__.py:14 in public function `factory_kwargs`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/__init__.py:14 in public function `factory_kwargs`:
        D400: First line should end with a period (not 'd')
3
```

**After: 1**

```
torch/nn/__init__.py:1 at module level:
        D104: Missing docstring in public package
1
```

- `torch/nn/cpp.py` </br>
**Before: 16**

```
torch/nn/cpp.py:7 in public class `OrderedDictWrapper`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/cpp.py:7 in public class `OrderedDictWrapper`:
        D400: First line should end with a period (not 'e')
torch/nn/cpp.py:16 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/cpp.py:21 in public method `cpp_dict`:
        D102: Missing docstring in public method
torch/nn/cpp.py:27 in public method `items`:
        D102: Missing docstring in public method
torch/nn/cpp.py:30 in public method `keys`:
        D102: Missing docstring in public method
torch/nn/cpp.py:33 in public method `values`:
        D102: Missing docstring in public method
torch/nn/cpp.py:36 in public method `__iter__`:
        D105: Missing docstring in magic method
torch/nn/cpp.py:39 in public method `__len__`:
        D105: Missing docstring in magic method
torch/nn/cpp.py:42 in public method `__contains__`:
        D105: Missing docstring in magic method
torch/nn/cpp.py:45 in public method `__getitem__`:
        D105: Missing docstring in magic method
torch/nn/cpp.py:50 in public class `ModuleWrapper`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/cpp.py:50 in public class `ModuleWrapper`:
        D400: First line should end with a period (not 'd')
torch/nn/cpp.py:55 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/cpp.py:83 in public method `training`:
        D102: Missing docstring in public method
torch/nn/cpp.py:90 in public method `__repr__`:
        D105: Missing docstring in magic method
16
```

**After: 12**

```
torch/nn/cpp.py:16 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/cpp.py:21 in public method `cpp_dict`:
        D102: Missing docstring in public method
torch/nn/cpp.py:27 in public method `items`:
        D102: Missing docstring in public method
torch/nn/cpp.py:30 in public method `keys`:
        D102: Missing docstring in public method
torch/nn/cpp.py:33 in public method `values`:
        D102: Missing docstring in public method
torch/nn/cpp.py:36 in public method `__iter__`:
        D105: Missing docstring in magic method
torch/nn/cpp.py:39 in public method `__len__`:
        D105: Missing docstring in magic method
torch/nn/cpp.py:42 in public method `__contains__`:
        D105: Missing docstring in magic method
torch/nn/cpp.py:45 in public method `__getitem__`:
        D105: Missing docstring in magic method
torch/nn/cpp.py:52 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/cpp.py:80 in public method `training`:
        D102: Missing docstring in public method
torch/nn/cpp.py:87 in public method `__repr__`:
        D105: Missing docstring in magic method
12
```

- `torch/nn/grad.py` </br>
**Before: 10**

```
torch/nn/grad.py:1 at module level:
        D400: First line should end with a period (not 'e')
torch/nn/grad.py:8 in public function `conv1d_input`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/grad.py:8 in public function `conv1d_input`:
        D401: First line should be in imperative mood (perhaps 'Compute', not 'Computes')
torch/nn/grad.py:40 in public function `conv1d_weight`:
        D401: First line should be in imperative mood (perhaps 'Compute', not 'Computes')
torch/nn/grad.py:71 in public function `conv2d_input`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/grad.py:71 in public function `conv2d_input`:
        D401: First line should be in imperative mood (perhaps 'Compute', not 'Computes')
torch/nn/grad.py:103 in public function `conv2d_weight`:
        D401: First line should be in imperative mood (perhaps 'Compute', not 'Computes')
torch/nn/grad.py:134 in public function `conv3d_input`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/grad.py:134 in public function `conv3d_input`:
        D401: First line should be in imperative mood (perhaps 'Compute', not 'Computes')
torch/nn/grad.py:166 in public function `conv3d_weight`:
        D401: First line should be in imperative mood (perhaps 'Compute', not 'Computes')
10
```

**After: 0**

- `torch/nn/parameter.py` </br>
**Before: 17**

```
torch/nn/parameter.py:1 at module level:
        D100: Missing docstring in public module
torch/nn/parameter.py:14 in public class `Parameter`:
        D204: 1 blank line required after class docstring (found 0)
torch/nn/parameter.py:33 in public method `__new__`:
        D102: Missing docstring in public method
torch/nn/parameter.py:54 in public method `__deepcopy__`:
        D105: Missing docstring in magic method
torch/nn/parameter.py:62 in public method `__repr__`:
        D105: Missing docstring in magic method
torch/nn/parameter.py:65 in public method `__reduce_ex__`:
        D105: Missing docstring in magic method
torch/nn/parameter.py:84 in public class `UninitializedTensorMixin`:
        D101: Missing docstring in public class
torch/nn/parameter.py:105 in public method `materialize`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/parameter.py:125 in public method `shape`:
        D102: Missing docstring in public method
torch/nn/parameter.py:132 in public method `share_memory_`:
        D102: Missing docstring in public method
torch/nn/parameter.py:138 in public method `__repr__`:
        D105: Missing docstring in magic method
torch/nn/parameter.py:141 in public method `__reduce_ex__`:
        D105: Missing docstring in magic method
torch/nn/parameter.py:149 in public method `__torch_function__`:
        D105: Missing docstring in magic method
torch/nn/parameter.py:164 in public function `is_lazy`:
        D103: Missing docstring in public function
torch/nn/parameter.py:186 in public method `__new__`:
        D102: Missing docstring in public method
torch/nn/parameter.py:191 in public method `__deepcopy__`:
        D105: Missing docstring in magic method
torch/nn/parameter.py:217 in public method `__new__`:
        D102: Missing docstring in public method
17
```

**After: 15**

```
torch/nn/parameter.py:1 at module level:
        D100: Missing docstring in public module
torch/nn/parameter.py:34 in public method `__new__`:
        D102: Missing docstring in public method
torch/nn/parameter.py:55 in public method `__deepcopy__`:
        D105: Missing docstring in magic method
torch/nn/parameter.py:63 in public method `__repr__`:
        D105: Missing docstring in magic method
torch/nn/parameter.py:66 in public method `__reduce_ex__`:
        D105: Missing docstring in magic method
torch/nn/parameter.py:85 in public class `UninitializedTensorMixin`:
        D101: Missing docstring in public class
torch/nn/parameter.py:127 in public method `shape`:
        D102: Missing docstring in public method
torch/nn/parameter.py:134 in public method `share_memory_`:
        D102: Missing docstring in public method
torch/nn/parameter.py:140 in public method `__repr__`:
        D105: Missing docstring in magic method
torch/nn/parameter.py:143 in public method `__reduce_ex__`:
        D105: Missing docstring in magic method
torch/nn/parameter.py:151 in public method `__torch_function__`:
        D105: Missing docstring in magic method
torch/nn/parameter.py:166 in public function `is_lazy`:
        D103: Missing docstring in public function
torch/nn/parameter.py:188 in public method `__new__`:
        D102: Missing docstring in public method
torch/nn/parameter.py:193 in public method `__deepcopy__`:
        D105: Missing docstring in magic method
torch/nn/parameter.py:219 in public method `__new__`:
        D102: Missing docstring in public method
15
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113052
Approved by: https://github.com/mikaylagawarecki, https://github.com/soulitzer
2023-11-10 21:19:17 +00:00
Joel Schlosser
43ea782af3 Multiprocessing support for NT (#110292)
Fixes #110161

Allows NTs to be used in DataLoaders with `num_workers > 1`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110292
Approved by: https://github.com/cpuhrsch, https://github.com/albanD
2023-10-10 21:58:19 +00:00
PyTorch MergeBot
dac895c10a Revert "Multiprocessing support for NT (#110292)"
This reverts commit f17fe89e14.

Reverted https://github.com/pytorch/pytorch/pull/110292 on behalf of https://github.com/kit1980 due to Causes CUDA memory leaks ([comment](https://github.com/pytorch/pytorch/pull/110292#issuecomment-1749852095))
2023-10-06 01:07:40 +00:00
Joel Schlosser
f17fe89e14 Multiprocessing support for NT (#110292)
Fixes #110161

Allows NTs to be used in DataLoaders with `num_workers > 1`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110292
Approved by: https://github.com/cpuhrsch, https://github.com/albanD
2023-10-05 15:04:48 +00:00
PyTorch MergeBot
7e6cf04a84 Revert "Multiprocessing support for NT (#110292)"
This reverts commit 881e7304d6.

Reverted https://github.com/pytorch/pytorch/pull/110292 on behalf of https://github.com/jbschlosser due to Address review comments ([comment](https://github.com/pytorch/pytorch/pull/110292#issuecomment-1743524901))
2023-10-02 18:27:13 +00:00
Joel Schlosser
881e7304d6 Multiprocessing support for NT (#110292)
Fixes #110161

Allows NTs to be used in DataLoaders with `num_workers > 1`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110292
Approved by: https://github.com/cpuhrsch
ghstack dependencies: #110219
2023-10-02 18:14:34 +00:00
Edward Z. Yang
3bf922a6ce Apply UFMT to low traffic torch modules (#106249)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106249
Approved by: https://github.com/Skylion007
2023-07-29 23:37:30 +00:00
Justin Chu
4cc1745b13 [BE] f-stringify torch/ and scripts (#105538)
This PR is a follow-up in the pyupgrade series, converting more strings to f-strings using `flynt`.

- https://docs.python.org/3/reference/lexical_analysis.html#f-strings
- https://pypi.org/project/flynt/

Command used:

```
flynt torch/ -ll 120
flynt scripts/ -ll 120
flynt tools/ -ll 120
```

and excluded `collect_env.py`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105538
Approved by: https://github.com/ezyang, https://github.com/malfet
2023-07-21 19:35:24 +00:00
Aaron Gokaslan
738ba13b35 [BE]: enable PLE error codes in ruff and fix bugs (#101079)
Enables Pylint error codes implemented in ruff. These are un-opinionated static analysis checks on Python code that find common bugs. After running all the PLE error codes implemented in ruff, I fixed the bugs, added a few ignores for malformed Python code that is part of our JIT test scripts, and finally added a few ignores for a false positive on PLE0605, with an issue submitted upstream to fix it in ruff: https://github.com/charliermarsh/ruff/issues/4345 .

Common bugs found here include malformed logging format calls, bad string format calls, invalid escape sequences, and more.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101079
Approved by: https://github.com/malfet
2023-05-11 23:57:25 +00:00
Elias Ellison
5c8fea5647 Reduce overhead in CUDAGraph Trees (#98529)
Significantly reduces the overhead of constructing Tensors and Storages and of checking Storage liveness. Removes the regression for the HF models I tested and removes 75% of the overhead of the extremely overhead-bound resnet50 training we have in torchbench (0.91x base commit, 1.02x torchinductor default, 1.16x this PR, 1.25x previous cudagraphs impl).

This PR takes care of all of the lower hanging fruit.

- Computes storage aliasing at record time instead of at runtime. We no longer need a runtime storage cache, and can instead index directly into the existing alias if there is one, or construct a new Storage.

- Moves the heavyweight C++ calls into a batch: getting storage weakrefs and constructing tensors.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98529
Approved by: https://github.com/jansel, https://github.com/ngimel
2023-04-07 05:46:08 +00:00
PyTorch MergeBot
73b7702b7e Revert "FIX make sure we import the correct object from multiprocessing (#81862)"
This reverts commit 701cdbb6a5.

Reverted https://github.com/pytorch/pytorch/pull/81862 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally
2023-03-22 17:22:47 +00:00
tommoral
701cdbb6a5 FIX make sure we import the correct object from multiprocessing (#81862)
Fixes #44687.

The issue was that the Process object is not the one from the `_default_context` which should be `loky` when nesting `loky` calls.

This is a revamp of #53282 that was reverted because it broke some other tests.
How can I run the failing tests so I can see why this breaks?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81862
Approved by: https://github.com/VitalyFedyunin, https://github.com/janeyx99
2023-03-21 14:48:17 +00:00
Xuehai Pan
5b1cedacde [BE] [2/3] Rewrite super() calls in functorch and torch (#94588)
Rewrite Python built-in class `super()` calls. Only non-semantic changes should be applied.

- #94587
- #94588
- #94592

Also, methods with only a `super()` call are removed:

```diff
class MyModule(nn.Module):
-   def __init__(self):
-       super().__init__()
-
    def forward(self, ...):
        ...
```

Some cases that change the semantics should be kept unchanged. E.g.:

f152a79be9/caffe2/python/net_printer.py (L184-L190)

f152a79be9/test/test_jit_fuser_te.py (L2628-L2635)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94588
Approved by: https://github.com/ezyang, https://github.com/albanD
2023-02-10 21:16:33 +00:00
Aaron Gokaslan
8fce9a09cd [BE]: pyupgrade Python to 3.8 - imports and object inheritance only (#94308)
Apply parts of pyupgrade to torch (starting with the safest changes).
This PR only does two things: it removes the need to inherit from `object` and removes unused `__future__` imports.
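
A representative (made-up) diff for the two changes:

```diff
-from __future__ import unicode_literals
-
-class Module(object):
+class Module:
     def forward(self, x):
         return x
```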

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94308
Approved by: https://github.com/ezyang, https://github.com/albanD
2023-02-07 21:10:56 +00:00
Nikita Shulga
5976f0bdfe Set min supported Python version to 3.8 (#93155)
Also, grep for `if sys.version_info .cond. (3, 8)` checks and replace them with the appropriate action.
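
A representative (made-up) example of that version-guard cleanup:

```diff
-if sys.version_info >= (3, 8):
-    from typing import Protocol
-else:
-    from typing_extensions import Protocol
+from typing import Protocol
```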

This is a last in a series of PRs that moved CI/CD away from testing PyTorch behavior against Python-3.7.

Fixes https://github.com/pytorch/pytorch/issues/80513

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93155
Approved by: https://github.com/huydhn
2023-01-29 18:28:46 +00:00
Kurt Mohler
ee28b865ee Deprecate TypedStorage, its derived classes, and all of their public methods (#85303)
Part of #85302

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85303
Approved by: https://github.com/ezyang
2022-11-08 18:11:01 +00:00
Kurt Mohler
14d0296e5c Rename _Typed/_UntypedStorage to Typed/UntypedStorage and update docs (#82438)
### Description

Since the major changes for `_TypedStorage` and `_UntypedStorage` are now complete, they can be renamed to be public.

`TypedStorage._untyped()` is renamed to `TypedStorage.untyped()`.

Documentation for storages is improved as well.

### Issue
Fixes #82436

### Testing
N/A

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82438
Approved by: https://github.com/ezyang
2022-07-30 19:37:08 +00:00
ProGamerGov
357b7d589c Fix docstring inconsistencies: string -> str, boolean -> bool (#82410)
### Description

Throughout the PyTorch docs and codebase, the `string` type in docstrings is referred to by two separate names. This leads to inconsistent docs, as you can see here: https://pytorch.org/docs/stable/generated/torch.nn.Conv3d.html#torch.nn.Conv3d

This PR fixes this issue by ensuring that all mentions of the string type in docstrings, are using the same format that Sphinx generates hyperlinks for.

### Testing
No testing should be required for this change

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82410
Approved by: https://github.com/jbschlosser
2022-07-28 21:29:57 +00:00
PyTorch MergeBot
3c9479dc30 Revert "FIX make sure we import the correct object from multiprocessing (#53282)"
This reverts commit e103d6af3d.

Reverted https://github.com/pytorch/pytorch/pull/53282 on behalf of https://github.com/janeyx99 due to Sorry, reverting as this breaks 10.2 tests on trunk e103d6af3d
2022-07-20 20:28:39 +00:00
tomMoral
e103d6af3d FIX make sure we import the correct object from multiprocessing (#53282)
Fixes #44687.

The issue was that the Process object is not the one from the `_default_context` which should be `loky` when nesting `loky` calls.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53282
Approved by: https://github.com/VitalyFedyunin
2022-07-20 16:52:03 +00:00
Elias Ellison
1058b47562 Weak-ref-ify MetaConverter and FakeTensorConverter (#80544)
Make `MetaConverter` and `FakeTensorConverter` hold weak references to their memoized tensors, and also have `MetaConverter` hold a weak reference to Tensor storage. Otherwise it can be tricky for users to make sure all existing FakeTensors or FakeTensorModes are deleted, which would leak memory.

I ran into https://github.com/pytorch/pytorch/issues/7733 which I was able to get around with the following (see comment from code):

```
# torch.Tensors cannot be used as a key in a dictionary
# because they define a custom __eq__ function which when used
# to resolve hash collisions will throw when comparing tensors:
# "RuntimeError: bool value of Tensor with more than one value is ambiguous."
# To avoid that, we use an object which will hold a Tensor and use
# its id for both hashing and equality.
# In order to use this as a weak key reference, we cannot
# simply use weakref.WeakKeyDictionary because the newly constructed
# WeakTensorRefKey's only use would be in the dictionary, so it would have
# no strong references.
# To get around this issue, we can use it as a normal key, and then set
# `weakref.finalize` to delete the key when its contained tensor dies.
```
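
A generic sketch of the pattern those comments describe (illustrative, not the exact torch code): key the memo by an id-based wrapper object and let `weakref.finalize` drop the entry once the wrapped tensor dies.

```py
import weakref

class WeakTensorRefKey:
    def __init__(self, tensor):
        self._id = id(tensor)

    def __hash__(self):
        return self._id

    def __eq__(self, other):
        return isinstance(other, WeakTensorRefKey) and self._id == other._id

tensor_memo = {}

def memoize(tensor, value):
    key = WeakTensorRefKey(tensor)
    tensor_memo[key] = value
    # once `tensor` is garbage collected, remove its entry from the memo
    weakref.finalize(tensor, tensor_memo.pop, key, None)
```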

While for the tensor memo we can set a `weakref.finalize` callback that will remove the corresponding `WeakTensorRefKey` from the tensor memo, at the point that this callback is invoked the tensor storage is not yet deallocated. See comment from code:

```
# [expired-storages]
# NB: even though the tensor has died,
# the deallocation of its storage can take longer,
# even when the storage has no other uses/views.
# In this case, the StorageWeakRef object will be kept alive
# longer than it needs to be, however the storage itself
# will be deallocated. We retain the possibly dead storages
# and periodically check if any of them are expired and
# can be freed.
```

partial fix for https://github.com/pytorch/torchdynamo/issues/468
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80544
Approved by: https://github.com/ezyang
2022-06-29 23:36:35 +00:00
Kurt Mohler
cecb2ad95e Restore old names for private funcs in legacy storages (#77861)
Followup from #75459

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77861
Approved by: https://github.com/ezyang
2022-05-20 02:03:34 +00:00
Kurt Mohler
aea6e2c396 Merge torch.cuda._UntypedStorage into torch._UntypedStorage (#75459)
Fixes #74933

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75459
Approved by: https://github.com/ezyang
2022-05-19 13:54:39 +00:00
Kurt Mohler
79ddc72b85 Virtualize <type>Storage classes (#66970)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/66228

cc ezyang bhosmer smessmer ljk53 bdhirsh

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66970

Reviewed By: bdhirsh

Differential Revision: D33245612

Pulled By: ezyang

fbshipit-source-id: 4c61c2cb029e2b94b0e68927c377d3e1c358dd7c
(cherry picked from commit d29fcdfb4bc2cc17b1795d4349e4b56fa0d1cf12)
2022-03-22 23:44:48 +00:00
Kurt Mohler
8e7fe87630 Rename Typed/UntypedStorage to _Typed/_UntypedStorage (#72540)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72540

Reviewed By: jbschlosser

Differential Revision: D34216823

Pulled By: bdhirsh

fbshipit-source-id: 1bc9930ab582771ebf02308e035576cd1a0dbe47
(cherry picked from commit 329238f612)
2022-02-15 23:53:01 +00:00
epwalsh
14d3d29b16 make ProcessException pickleable (#70118)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/70116

Happy to add tests if you let me know the best place to put them.

cc VitalyFedyunin

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70118

Reviewed By: malfet

Differential Revision: D33255899

Pulled By: ejguan

fbshipit-source-id: 41d495374182eb28bb8bb421e890eca3bddc077b
2021-12-30 09:09:55 -08:00