Commit Graph

90 Commits

Author SHA1 Message Date
Chip Turner
2ed47fecc5 Robustify torch.multiprocessing.spawn error reporting to be less deadlock prone (#114688)
multiprocessing.Queue relies on, among other things, background threads to send messages between processes.  This works in the happy path but can cause issues if a process is exiting by bypassing atexit handlers or crashing because the writer to the Queue can terminate while the reader is blocked reading the queue.  The reader sees the queue as non-empty yet even with a timeout will actually block forever.

An example of a Queue deadlock is here: https://gist.github.com/chipturner/342f72341f087737befe9df84d0e41ce

Since the error reporting case here is a simple one-shot message from the dying child to the parent, we can just use a file-based rendezvous.  This eliminates the deadlock when a large traceback is still being flushed to the network when a child exits.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114688
Approved by: https://github.com/suo, https://github.com/yifuwang
2023-12-09 03:36:43 +00:00
Satendra Gera
9521331ba5 [pytorch] Multiprocessing api to use sigkill if sigterm doesn't kill the process (#115219)
Summary:
[pytorch] Multiprocessing api to use sigkill if sigterm doesn't kill the process
We have seen a handful of jobs training stuck where one of the trainer goes down
while others are stuck in c++ land and hence not handling the sigterm.

Test Plan: Manually validated by attaching gdb to one of the processes and sent a kill -9 to another. Saw the log ```WARNING] Unable to shutdown process 4422 via Signals.SIGTERM, forcefully exiting via Signals.SIGKILL```

Differential Revision: D51862545

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115219
Approved by: https://github.com/wconstab, https://github.com/fduwjj
2023-12-08 02:26:19 +00:00
Chip Turner
d7160c9223 Handle potential ValueError exception when stringifying signals (#114696)
On some systems it is possible to receive a signal that does not have a name.  Rare, but possible.  This prevents our error handler from crashing and instead properly reports the signal.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114696
Approved by: https://github.com/xmfan
2023-12-07 02:10:30 +00:00
Pearu Peterson
0bd4d1f4ab Add sparse tensors support to dataloader. (#112842)
Fixes https://github.com/pytorch/pytorch/issues/106837

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112842
Approved by: https://github.com/cpuhrsch, https://github.com/gokulavasan
2023-11-19 16:05:27 +00:00
IvanLauLinTiong
91c90f232a Fix docstring errors in reductions.py, spawn.py, pool.py, parameter.py, cpp.py, grad.py, __init__.py, profiler.py, queue.py, graph.py (#113052)
Fixes #112595
- `torch/autograd/profiler.py` </br>
**Before: 37**

```
torch/autograd/profiler.py:1 at module level:
        D100: Missing docstring in public module
torch/autograd/profiler.py:91 in public class `profile`:
        D205: 1 blank line required between summary line and description (found 0)
torch/autograd/profiler.py:175 in public method `__init__`:
        D107: Missing docstring in __init__
torch/autograd/profiler.py:261 in public method `config`:
        D102: Missing docstring in public method
torch/autograd/profiler.py:272 in public method `__enter__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:290 in public method `__exit__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:308 in public method `__repr__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:313 in public method `__str__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:322 in public method `table`:
        D102: Missing docstring in public method
torch/autograd/profiler.py:346 in public method `export_chrome_trace`:
        D102: Missing docstring in public method
torch/autograd/profiler.py:355 in public method `export_stacks`:
        D102: Missing docstring in public method
torch/autograd/profiler.py:361 in public method `key_averages`:
        D102: Missing docstring in public method
torch/autograd/profiler.py:368 in public method `total_average`:
        D102: Missing docstring in public method
torch/autograd/profiler.py:377 in public method `self_cpu_time_total`:
        D205: 1 blank line required between summary line and description (found 0)
torch/autograd/profiler.py:377 in public method `self_cpu_time_total`:
        D400: First line should end with a period (not 'f')
torch/autograd/profiler.py:555 in public class `record_function`:
        D205: 1 blank line required between summary line and description (found 0)
torch/autograd/profiler.py:555 in public class `record_function`:
        D400: First line should end with a period (not 'f')
torch/autograd/profiler.py:591 in public method `__init__`:
        D107: Missing docstring in __init__
torch/autograd/profiler.py:602 in public method `__enter__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:608 in public method `__exit__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:625 in private method `_call_end_callbacks_on_future`:
        D205: 1 blank line required between summary line and description (found 0)
torch/autograd/profiler.py:625 in private method `_call_end_callbacks_on_future`:
        D400: First line should end with a period (not 'c')
torch/autograd/profiler.py:707 in public method `__init__`:
        D107: Missing docstring in __init__
torch/autograd/profiler.py:712 in public method `__enter__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:733 in public method `__exit__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:826 in public method `__init__`:
        D107: Missing docstring in __init__
torch/autograd/profiler.py:831 in public method `__enter__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:853 in public method `__exit__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:863 in public function `load_nvprof`:
        D401: First line should be in imperative mood (perhaps 'Open', not 'Opens')
torch/autograd/profiler.py:874 in public method `__init__`:
        D107: Missing docstring in __init__
torch/autograd/profiler.py:877 in public method `see`:
        D102: Missing docstring in public method
torch/autograd/profiler.py:883 in public function `parse_nvprof_trace`:
        D103: Missing docstring in public function
torch/autograd/profiler.py:951 in public class `KinetoStepTracker`:
        D205: 1 blank line required between summary line and description (found 0)
torch/autograd/profiler.py:991 in public method `init_step_count`:
        D102: Missing docstring in public method
torch/autograd/profiler.py:995 in public method `erase_step_count`:
        D102: Missing docstring in public method
torch/autograd/profiler.py:1000 in public method `increment_step`:
        D205: 1 blank line required between summary line and description (found 0)
torch/autograd/profiler.py:1023 in public method `current_step`:
        D102: Missing docstring in public method
37
```

**After: 27**

```
torch/autograd/profiler.py:1 at module level:
        D100: Missing docstring in public module
torch/autograd/profiler.py:176 in public method `__init__`:
        D107: Missing docstring in __init__
torch/autograd/profiler.py:262 in public method `config`:
        D102: Missing docstring in public method
torch/autograd/profiler.py:273 in public method `__enter__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:291 in public method `__exit__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:309 in public method `__repr__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:314 in public method `__str__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:323 in public method `table`:
        D102: Missing docstring in public method
torch/autograd/profiler.py:347 in public method `export_chrome_trace`:
        D102: Missing docstring in public method
torch/autograd/profiler.py:356 in public method `export_stacks`:
        D102: Missing docstring in public method
torch/autograd/profiler.py:362 in public method `key_averages`:
        D102: Missing docstring in public method
torch/autograd/profiler.py:369 in public method `total_average`:
        D102: Missing docstring in public method
torch/autograd/profiler.py:593 in public method `__init__`:
        D107: Missing docstring in __init__
torch/autograd/profiler.py:604 in public method `__enter__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:610 in public method `__exit__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:708 in public method `__init__`:
        D107: Missing docstring in __init__
torch/autograd/profiler.py:713 in public method `__enter__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:734 in public method `__exit__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:827 in public method `__init__`:
        D107: Missing docstring in __init__
torch/autograd/profiler.py:832 in public method `__enter__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:854 in public method `__exit__`:
        D105: Missing docstring in magic method
torch/autograd/profiler.py:875 in public method `__init__`:
        D107: Missing docstring in __init__
torch/autograd/profiler.py:878 in public method `see`:
        D102: Missing docstring in public method
torch/autograd/profiler.py:884 in public function `parse_nvprof_trace`:
        D103: Missing docstring in public function
torch/autograd/profiler.py:993 in public method `init_step_count`:
        D102: Missing docstring in public method
torch/autograd/profiler.py:997 in public method `erase_step_count`:
        D102: Missing docstring in public method
torch/autograd/profiler.py:1025 in public method `current_step`:
        D102: Missing docstring in public method
27
```

- `torch/autograd/graph.py` </br>
**Before: 22**

```
torch/autograd/graph.py:1 at module level:
        D100: Missing docstring in public module
torch/autograd/graph.py:24 in public class `Node`:
        D101: Missing docstring in public class
torch/autograd/graph.py:27 in public method `name`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
torch/autograd/graph.py:42 in public method `next_functions`:
        D102: Missing docstring in public method
torch/autograd/graph.py:47 in public method `metadata`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
torch/autograd/graph.py:56 in public method `register_hook`:
        D401: First line should be in imperative mood (perhaps 'Register', not 'Registers')
torch/autograd/graph.py:94 in public method `register_prehook`:
        D401: First line should be in imperative mood (perhaps 'Register', not 'Registers')
torch/autograd/graph.py:129 in public method `__subclasshook__`:
        D105: Missing docstring in magic method
torch/autograd/graph.py:147 in public function `get_gradient_edge`:
        D205: 1 blank line required between summary line and description (found 0)
torch/autograd/graph.py:147 in public function `get_gradient_edge`:
        D400: First line should end with a period (not 'f')
torch/autograd/graph.py:147 in public function `get_gradient_edge`:
        D401: First line should be in imperative mood; try rephrasing (found 'This')
torch/autograd/graph.py:166 in public function `increment_version`:
        D205: 1 blank line required between summary line and description (found 0)
torch/autograd/graph.py:166 in public function `increment_version`:
        D400: First line should end with a period (not 'd')
torch/autograd/graph.py:166 in public function `increment_version`:
        D401: First line should be in imperative mood; try rephrasing (found 'This')
torch/autograd/graph.py:243 in public method `__init__`:
        D107: Missing docstring in __init__
torch/autograd/graph.py:251 in public method `__enter__`:
        D105: Missing docstring in magic method
torch/autograd/graph.py:256 in public method `__exit__`:
        D105: Missing docstring in magic method
torch/autograd/graph.py:261 in public class `save_on_cpu`:
        D205: 1 blank line required between summary line and description (found 0)
torch/autograd/graph.py:261 in public class `save_on_cpu`:
        D400: First line should end with a period (not 'e')
torch/autograd/graph.py:303 in public method `__init__`:
        D107: Missing docstring in __init__
torch/autograd/graph.py:365 in public function `register_multi_grad_hook`:
        D401: First line should be in imperative mood (perhaps 'Register', not 'Registers')
torch/autograd/graph.py:588 in public function `allow_mutation_on_saved_tensors`:
        D400: First line should end with a period (not 'd')
22
```

**After: 8**

```
torch/autograd/graph.py:1 at module level:
        D100: Missing docstring in public module
torch/autograd/graph.py:24 in public class `Node`:
        D101: Missing docstring in public class
torch/autograd/graph.py:42 in public method `next_functions`:
        D102: Missing docstring in public method
torch/autograd/graph.py:129 in public method `__subclasshook__`:
        D105: Missing docstring in magic method
torch/autograd/graph.py:244 in public method `__init__`:
        D107: Missing docstring in __init__
torch/autograd/graph.py:252 in public method `__enter__`:
        D105: Missing docstring in magic method
torch/autograd/graph.py:257 in public method `__exit__`:
        D105: Missing docstring in magic method
torch/autograd/graph.py:303 in public method `__init__`:
        D107: Missing docstring in __init__
8
```

- `torch/multiprocessing/pool.py` </br>
**Before: 6**

```
torch/multiprocessing/pool.py:1 at module level:
        D100: Missing docstring in public module
torch/multiprocessing/pool.py:7 in public function `clean_worker`:
        D103: Missing docstring in public function
torch/multiprocessing/pool.py:18 in public class `Pool`:
        D205: 1 blank line required between summary line and description (found 0)
torch/multiprocessing/pool.py:18 in public class `Pool`:
        D209: Multi-line docstring closing quotes should be on a separate line
torch/multiprocessing/pool.py:29 in private method `_repopulate_pool`:
        D205: 1 blank line required between summary line and description (found 0)
torch/multiprocessing/pool.py:29 in private method `_repopulate_pool`:
        D400: First line should end with a period (not ',')
6
```

**After: 2**

```
torch/multiprocessing/pool.py:1 at module level:
        D100: Missing docstring in public module
torch/multiprocessing/pool.py:7 in public function `clean_worker`:
        D103: Missing docstring in public function
2
```

- `torch/multiprocessing/queue.py` </br>
**Before: 11**

```
torch/multiprocessing/queue.py:1 at module level:
        D100: Missing docstring in public module
torch/multiprocessing/queue.py:8 in public class `ConnectionWrapper`:
        D205: 1 blank line required between summary line and description (found 0)
torch/multiprocessing/queue.py:8 in public class `ConnectionWrapper`:
        D209: Multi-line docstring closing quotes should be on a separate line
torch/multiprocessing/queue.py:8 in public class `ConnectionWrapper`:
        D400: First line should end with a period (not 'o')
torch/multiprocessing/queue.py:11 in public method `__init__`:
        D107: Missing docstring in __init__
torch/multiprocessing/queue.py:14 in public method `send`:
        D102: Missing docstring in public method
torch/multiprocessing/queue.py:19 in public method `recv`:
        D102: Missing docstring in public method
torch/multiprocessing/queue.py:23 in public method `__getattr__`:
        D105: Missing docstring in magic method
torch/multiprocessing/queue.py:29 in public class `Queue`:
        D101: Missing docstring in public class
torch/multiprocessing/queue.py:30 in public method `__init__`:
        D107: Missing docstring in __init__
torch/multiprocessing/queue.py:38 in public class `SimpleQueue`:
        D101: Missing docstring in public class
11
```

**After: 8**

```
torch/multiprocessing/queue.py:1 at module level:
        D100: Missing docstring in public module
torch/multiprocessing/queue.py:10 in public method `__init__`:
        D107: Missing docstring in __init__
torch/multiprocessing/queue.py:13 in public method `send`:
        D102: Missing docstring in public method
torch/multiprocessing/queue.py:18 in public method `recv`:
        D102: Missing docstring in public method
torch/multiprocessing/queue.py:22 in public method `__getattr__`:
        D105: Missing docstring in magic method
torch/multiprocessing/queue.py:28 in public class `Queue`:
        D101: Missing docstring in public class
torch/multiprocessing/queue.py:29 in public method `__init__`:
        D107: Missing docstring in __init__
torch/multiprocessing/queue.py:37 in public class `SimpleQueue`:
        D101: Missing docstring in public class
8
```

- `torch/multiprocessing/reductions.py` </br>
**Before: 31**

```
torch/multiprocessing/reductions.py:1 at module level:
        D100: Missing docstring in public module
torch/multiprocessing/reductions.py:24 in public class `StorageWeakRef`:
        D209: Multi-line docstring closing quotes should be on a separate line
torch/multiprocessing/reductions.py:31 in public method `__init__`:
        D107: Missing docstring in __init__
torch/multiprocessing/reductions.py:38 in public method `from_weakref`:
        D102: Missing docstring in public method
torch/multiprocessing/reductions.py:44 in public method `expired`:
        D102: Missing docstring in public method
torch/multiprocessing/reductions.py:47 in public method `__del__`:
        D105: Missing docstring in magic method
torch/multiprocessing/reductions.py:50 in public method `__hash__`:
        D105: Missing docstring in magic method
torch/multiprocessing/reductions.py:53 in public method `__eq__`:
        D105: Missing docstring in magic method
torch/multiprocessing/reductions.py:60 in public class `SharedCache`:
        D400: First line should end with a period (not 'f')
torch/multiprocessing/reductions.py:62 in public method `__init__`:
        D107: Missing docstring in __init__
torch/multiprocessing/reductions.py:75 in public method `get`:
        D102: Missing docstring in public method
torch/multiprocessing/reductions.py:79 in public method `__setitem__`:
        D105: Missing docstring in magic method
torch/multiprocessing/reductions.py:85 in public method `free_dead_references`:
        D102: Missing docstring in public method
torch/multiprocessing/reductions.py:99 in public function `rebuild_event`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:103 in public function `reduce_event`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:108 in public function `rebuild_tensor`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:121 in public function `rebuild_cuda_tensor`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:189 in public function `reduce_tensor`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:347 in public function `rebuild_nested_tensor`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:364 in public function `reduce_nested_tensor`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:389 in public function `fd_id`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:397 in public function `storage_from_cache`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:404 in public function `rebuild_storage_fd`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:417 in public function `rebuild_storage_filename`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:437 in public function `rebuild_storage_empty`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:441 in public function `rebuild_typed_storage`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:446 in public function `reduce_typed_storage`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:450 in public function `rebuild_typed_storage_child`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:455 in public function `reduce_typed_storage_child`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:459 in public function `reduce_storage`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:488 in public function `init_reductions`:
        D103: Missing docstring in public function
31
```

**After: 29**

```
torch/multiprocessing/reductions.py:1 at module level:
        D100: Missing docstring in public module
torch/multiprocessing/reductions.py:32 in public method `__init__`:
        D107: Missing docstring in __init__
torch/multiprocessing/reductions.py:39 in public method `from_weakref`:
        D102: Missing docstring in public method
torch/multiprocessing/reductions.py:45 in public method `expired`:
        D102: Missing docstring in public method
torch/multiprocessing/reductions.py:48 in public method `__del__`:
        D105: Missing docstring in magic method
torch/multiprocessing/reductions.py:51 in public method `__hash__`:
        D105: Missing docstring in magic method
torch/multiprocessing/reductions.py:54 in public method `__eq__`:
        D105: Missing docstring in magic method
torch/multiprocessing/reductions.py:63 in public method `__init__`:
        D107: Missing docstring in __init__
torch/multiprocessing/reductions.py:76 in public method `get`:
        D102: Missing docstring in public method
torch/multiprocessing/reductions.py:80 in public method `__setitem__`:
        D105: Missing docstring in magic method
torch/multiprocessing/reductions.py:86 in public method `free_dead_references`:
        D102: Missing docstring in public method
torch/multiprocessing/reductions.py:100 in public function `rebuild_event`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:104 in public function `reduce_event`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:109 in public function `rebuild_tensor`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:122 in public function `rebuild_cuda_tensor`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:190 in public function `reduce_tensor`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:348 in public function `rebuild_nested_tensor`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:365 in public function `reduce_nested_tensor`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:390 in public function `fd_id`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:398 in public function `storage_from_cache`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:405 in public function `rebuild_storage_fd`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:418 in public function `rebuild_storage_filename`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:438 in public function `rebuild_storage_empty`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:442 in public function `rebuild_typed_storage`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:447 in public function `reduce_typed_storage`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:451 in public function `rebuild_typed_storage_child`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:456 in public function `reduce_typed_storage_child`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:460 in public function `reduce_storage`:
        D103: Missing docstring in public function
torch/multiprocessing/reductions.py:489 in public function `init_reductions`:
        D103: Missing docstring in public function
29
```

- `torch/multiprocessing/spawn.py` </br>
**Before: 19**

```
torch/multiprocessing/spawn.py:1 at module level:
        D100: Missing docstring in public module
torch/multiprocessing/spawn.py:11 in public class `ProcessException`:
        D101: Missing docstring in public class
torch/multiprocessing/spawn.py:14 in public method `__init__`:
        D107: Missing docstring in __init__
torch/multiprocessing/spawn.py:20 in public method `__reduce__`:
        D105: Missing docstring in magic method
torch/multiprocessing/spawn.py:25 in public class `ProcessRaisedException`:
        D205: 1 blank line required between summary line and description (found 0)
torch/multiprocessing/spawn.py:25 in public class `ProcessRaisedException`:
        D400: First line should end with a period (not 'n')
torch/multiprocessing/spawn.py:30 in public method `__init__`:
        D107: Missing docstring in __init__
torch/multiprocessing/spawn.py:40 in public class `ProcessExitedException`:
        D205: 1 blank line required between summary line and description (found 0)
torch/multiprocessing/spawn.py:40 in public class `ProcessExitedException`:
        D400: First line should end with a period (not 'l')
torch/multiprocessing/spawn.py:47 in public method `__init__`:
        D107: Missing docstring in __init__
torch/multiprocessing/spawn.py:59 in public method `__reduce__`:
        D105: Missing docstring in magic method
torch/multiprocessing/spawn.py:85 in public class `ProcessContext`:
        D101: Missing docstring in public class
torch/multiprocessing/spawn.py:86 in public method `__init__`:
        D107: Missing docstring in __init__
torch/multiprocessing/spawn.py:93 in public method `pids`:
        D102: Missing docstring in public method
torch/multiprocessing/spawn.py:97 in public method `join`:
        D205: 1 blank line required between summary line and description (found 0)
torch/multiprocessing/spawn.py:97 in public method `join`:
        D401: First line should be in imperative mood (perhaps 'Try', not 'Tries')
torch/multiprocessing/spawn.py:166 in public class `SpawnContext`:
        D101: Missing docstring in public class
torch/multiprocessing/spawn.py:167 in public method `__init__`:
        D107: Missing docstring in __init__
torch/multiprocessing/spawn.py:180 in public function `start_processes`:
        D103: Missing docstring in public function
19
```

**After: 13**

```
torch/multiprocessing/spawn.py:1 at module level:
        D100: Missing docstring in public module
torch/multiprocessing/spawn.py:11 in public class `ProcessException`:
        D101: Missing docstring in public class
torch/multiprocessing/spawn.py:14 in public method `__init__`:
        D107: Missing docstring in __init__
torch/multiprocessing/spawn.py:20 in public method `__reduce__`:
        D105: Missing docstring in magic method
torch/multiprocessing/spawn.py:27 in public method `__init__`:
        D107: Missing docstring in __init__
torch/multiprocessing/spawn.py:41 in public method `__init__`:
        D107: Missing docstring in __init__
torch/multiprocessing/spawn.py:53 in public method `__reduce__`:
        D105: Missing docstring in magic method
torch/multiprocessing/spawn.py:79 in public class `ProcessContext`:
        D101: Missing docstring in public class
torch/multiprocessing/spawn.py:80 in public method `__init__`:
        D107: Missing docstring in __init__
torch/multiprocessing/spawn.py:87 in public method `pids`:
        D102: Missing docstring in public method
torch/multiprocessing/spawn.py:161 in public class `SpawnContext`:
        D101: Missing docstring in public class
torch/multiprocessing/spawn.py:162 in public method `__init__`:
        D107: Missing docstring in __init__
torch/multiprocessing/spawn.py:175 in public function `start_processes`:
        D103: Missing docstring in public function
13
```

- `torch/multiprocessing/__init__.py` </br>
**Before: 0**

```
torch/multiprocessing/__init__.py:1 at module level:
        D205: 1 blank line required between summary line and description (found 0)
torch/multiprocessing/__init__.py:1 at module level:
        D400: First line should end with a period (not '`')
torch/multiprocessing/__init__.py:57 in public function `set_sharing_strategy`:
        D401: First line should be in imperative mood (perhaps 'Set', not 'Sets')
torch/multiprocessing/__init__.py:69 in public function `get_sharing_strategy`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
torch/multiprocessing/__init__.py:74 in public function `get_all_sharing_strategies`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
5
```

**After: 0**

- `torch/nn/__init__.py` </br>
**Before: 3**

```
torch/nn/__init__.py:1 at module level:
        D104: Missing docstring in public package
torch/nn/__init__.py:14 in public function `factory_kwargs`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/__init__.py:14 in public function `factory_kwargs`:
        D400: First line should end with a period (not 'd')
3
```

**After: 1**

```
torch/nn/__init__.py:1 at module level:
        D104: Missing docstring in public package
1
```

- `torch/nn/cpp.py` </br>
**Before: 16**

```
torch/nn/cpp.py:7 in public class `OrderedDictWrapper`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/cpp.py:7 in public class `OrderedDictWrapper`:
        D400: First line should end with a period (not 'e')
torch/nn/cpp.py:16 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/cpp.py:21 in public method `cpp_dict`:
        D102: Missing docstring in public method
torch/nn/cpp.py:27 in public method `items`:
        D102: Missing docstring in public method
torch/nn/cpp.py:30 in public method `keys`:
        D102: Missing docstring in public method
torch/nn/cpp.py:33 in public method `values`:
        D102: Missing docstring in public method
torch/nn/cpp.py:36 in public method `__iter__`:
        D105: Missing docstring in magic method
torch/nn/cpp.py:39 in public method `__len__`:
        D105: Missing docstring in magic method
torch/nn/cpp.py:42 in public method `__contains__`:
        D105: Missing docstring in magic method
torch/nn/cpp.py:45 in public method `__getitem__`:
        D105: Missing docstring in magic method
torch/nn/cpp.py:50 in public class `ModuleWrapper`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/cpp.py:50 in public class `ModuleWrapper`:
        D400: First line should end with a period (not 'd')
torch/nn/cpp.py:55 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/cpp.py:83 in public method `training`:
        D102: Missing docstring in public method
torch/nn/cpp.py:90 in public method `__repr__`:
        D105: Missing docstring in magic method
16
```

**After: 12**

```
torch/nn/cpp.py:16 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/cpp.py:21 in public method `cpp_dict`:
        D102: Missing docstring in public method
torch/nn/cpp.py:27 in public method `items`:
        D102: Missing docstring in public method
torch/nn/cpp.py:30 in public method `keys`:
        D102: Missing docstring in public method
torch/nn/cpp.py:33 in public method `values`:
        D102: Missing docstring in public method
torch/nn/cpp.py:36 in public method `__iter__`:
        D105: Missing docstring in magic method
torch/nn/cpp.py:39 in public method `__len__`:
        D105: Missing docstring in magic method
torch/nn/cpp.py:42 in public method `__contains__`:
        D105: Missing docstring in magic method
torch/nn/cpp.py:45 in public method `__getitem__`:
        D105: Missing docstring in magic method
torch/nn/cpp.py:52 in public method `__init__`:
        D107: Missing docstring in __init__
torch/nn/cpp.py:80 in public method `training`:
        D102: Missing docstring in public method
torch/nn/cpp.py:87 in public method `__repr__`:
        D105: Missing docstring in magic method
12
```

- `torch/nn/grad.py` </br>
**Before: 10**

```
torch/nn/grad.py:1 at module level:
        D400: First line should end with a period (not 'e')
torch/nn/grad.py:8 in public function `conv1d_input`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/grad.py:8 in public function `conv1d_input`:
        D401: First line should be in imperative mood (perhaps 'Compute', not 'Computes')
torch/nn/grad.py:40 in public function `conv1d_weight`:
        D401: First line should be in imperative mood (perhaps 'Compute', not 'Computes')
torch/nn/grad.py:71 in public function `conv2d_input`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/grad.py:71 in public function `conv2d_input`:
        D401: First line should be in imperative mood (perhaps 'Compute', not 'Computes')
torch/nn/grad.py:103 in public function `conv2d_weight`:
        D401: First line should be in imperative mood (perhaps 'Compute', not 'Computes')
torch/nn/grad.py:134 in public function `conv3d_input`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/grad.py:134 in public function `conv3d_input`:
        D401: First line should be in imperative mood (perhaps 'Compute', not 'Computes')
torch/nn/grad.py:166 in public function `conv3d_weight`:
        D401: First line should be in imperative mood (perhaps 'Compute', not 'Computes')
10
```

**After: 0**

- `torch/nn/parameter.py` </br>
**Before: 17**

```
torch/nn/parameter.py:1 at module level:
        D100: Missing docstring in public module
torch/nn/parameter.py:14 in public class `Parameter`:
        D204: 1 blank line required after class docstring (found 0)
torch/nn/parameter.py:33 in public method `__new__`:
        D102: Missing docstring in public method
torch/nn/parameter.py:54 in public method `__deepcopy__`:
        D105: Missing docstring in magic method
torch/nn/parameter.py:62 in public method `__repr__`:
        D105: Missing docstring in magic method
torch/nn/parameter.py:65 in public method `__reduce_ex__`:
        D105: Missing docstring in magic method
torch/nn/parameter.py:84 in public class `UninitializedTensorMixin`:
        D101: Missing docstring in public class
torch/nn/parameter.py:105 in public method `materialize`:
        D205: 1 blank line required between summary line and description (found 0)
torch/nn/parameter.py:125 in public method `shape`:
        D102: Missing docstring in public method
torch/nn/parameter.py:132 in public method `share_memory_`:
        D102: Missing docstring in public method
torch/nn/parameter.py:138 in public method `__repr__`:
        D105: Missing docstring in magic method
torch/nn/parameter.py:141 in public method `__reduce_ex__`:
        D105: Missing docstring in magic method
torch/nn/parameter.py:149 in public method `__torch_function__`:
        D105: Missing docstring in magic method
torch/nn/parameter.py:164 in public function `is_lazy`:
        D103: Missing docstring in public function
torch/nn/parameter.py:186 in public method `__new__`:
        D102: Missing docstring in public method
torch/nn/parameter.py:191 in public method `__deepcopy__`:
        D105: Missing docstring in magic method
torch/nn/parameter.py:217 in public method `__new__`:
        D102: Missing docstring in public method
17
```

**After: 15**

```
torch/nn/parameter.py:1 at module level:
        D100: Missing docstring in public module
torch/nn/parameter.py:34 in public method `__new__`:
        D102: Missing docstring in public method
torch/nn/parameter.py:55 in public method `__deepcopy__`:
        D105: Missing docstring in magic method
torch/nn/parameter.py:63 in public method `__repr__`:
        D105: Missing docstring in magic method
torch/nn/parameter.py:66 in public method `__reduce_ex__`:
        D105: Missing docstring in magic method
torch/nn/parameter.py:85 in public class `UninitializedTensorMixin`:
        D101: Missing docstring in public class
torch/nn/parameter.py:127 in public method `shape`:
        D102: Missing docstring in public method
torch/nn/parameter.py:134 in public method `share_memory_`:
        D102: Missing docstring in public method
torch/nn/parameter.py:140 in public method `__repr__`:
        D105: Missing docstring in magic method
torch/nn/parameter.py:143 in public method `__reduce_ex__`:
        D105: Missing docstring in magic method
torch/nn/parameter.py:151 in public method `__torch_function__`:
        D105: Missing docstring in magic method
torch/nn/parameter.py:166 in public function `is_lazy`:
        D103: Missing docstring in public function
torch/nn/parameter.py:188 in public method `__new__`:
        D102: Missing docstring in public method
torch/nn/parameter.py:193 in public method `__deepcopy__`:
        D105: Missing docstring in magic method
torch/nn/parameter.py:219 in public method `__new__`:
        D102: Missing docstring in public method
15
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113052
Approved by: https://github.com/mikaylagawarecki, https://github.com/soulitzer
2023-11-10 21:19:17 +00:00
Joel Schlosser
43ea782af3 Multiprocessing support for NT (#110292)
Fixes #110161

Allows NTs to be used in DataLoaders with `num_workers > 1`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110292
Approved by: https://github.com/cpuhrsch, https://github.com/albanD
2023-10-10 21:58:19 +00:00
PyTorch MergeBot
dac895c10a Revert "Multiprocessing support for NT (#110292)"
This reverts commit f17fe89e14.

Reverted https://github.com/pytorch/pytorch/pull/110292 on behalf of https://github.com/kit1980 due to Causes CUDA memory leaks ([comment](https://github.com/pytorch/pytorch/pull/110292#issuecomment-1749852095))
2023-10-06 01:07:40 +00:00
Joel Schlosser
f17fe89e14 Multiprocessing support for NT (#110292)
Fixes #110161

Allows NTs to be used in DataLoaders with `num_workers > 1`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110292
Approved by: https://github.com/cpuhrsch, https://github.com/albanD
2023-10-05 15:04:48 +00:00
PyTorch MergeBot
7e6cf04a84 Revert "Multiprocessing support for NT (#110292)"
This reverts commit 881e7304d6.

Reverted https://github.com/pytorch/pytorch/pull/110292 on behalf of https://github.com/jbschlosser due to Address review comments ([comment](https://github.com/pytorch/pytorch/pull/110292#issuecomment-1743524901))
2023-10-02 18:27:13 +00:00
Joel Schlosser
881e7304d6 Multiprocessing support for NT (#110292)
Fixes #110161

Allows NTs to be used in DataLoaders with `num_workers > 1`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110292
Approved by: https://github.com/cpuhrsch
ghstack dependencies: #110219
2023-10-02 18:14:34 +00:00
Edward Z. Yang
3bf922a6ce Apply UFMT to low traffic torch modules (#106249)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106249
Approved by: https://github.com/Skylion007
2023-07-29 23:37:30 +00:00
Justin Chu
4cc1745b13 [BE] f-stringify torch/ and scripts (#105538)
This PR is a follow up on the pyupgrade series to convert more strings to use f-strings using `flynt`.

- https://docs.python.org/3/reference/lexical_analysis.html#f-strings
- https://pypi.org/project/flynt/

Command used:

```
flynt torch/ -ll 120
flynt scripts/ -ll 120
flynt tools/ -ll 120
```

and excluded `collect_env.py`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105538
Approved by: https://github.com/ezyang, https://github.com/malfet
2023-07-21 19:35:24 +00:00
Aaron Gokaslan
738ba13b35 [BE]: enable PLE error codes in ruff and fix bugs (#101079)
Enables PyLint error codes implemented in ruff. These are un-opinionated static analysis checks on Python code that finds common bugs. After running all the PLE error codes that are implemented in ruff, I fixed the bugs, added a few ignores for malformed Python code that is part of our JIT test script, and finally added a few ignores for a false positive on PLE0605 and submitted an issue upstream to fix in ruff https://github.com/charliermarsh/ruff/issues/4345 .

Common bugs found here include analysis for malformed logging format calls, bad string format calls, invalid escape sequences, and more.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101079
Approved by: https://github.com/malfet
2023-05-11 23:57:25 +00:00
Elias Ellison
5c8fea5647 Reduce overhead in CUDAGraph Trees (#98529)
Significantly reduces overhead of constructing Tensors and Storages and checking Storage Liveness. Removes the regression for HF models that I tested and removes 75% of overhead of the extremely overhead bound resnet50 training we have in torchbench. (.91x base commit, 1.02x torchinductor default, 1.16x this PR, 1.25 previous cudagraphs impl).

This PR takes care of all of the lower hanging fruit.

- Computes storage aliasing at record time instead of during at runtime. We no longer need to use a runtime storage cache, and can instead index directly into the existing alias if there is one, or construct a new Storage

- Moves the heavyweight C++ calls into a batch - getting storage weakrefs and constructing tensors

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98529
Approved by: https://github.com/jansel, https://github.com/ngimel
2023-04-07 05:46:08 +00:00
PyTorch MergeBot
73b7702b7e Revert "FIX make sure we import the correct object from multiprocessing (#81862)"
This reverts commit 701cdbb6a5.

Reverted https://github.com/pytorch/pytorch/pull/81862 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally
2023-03-22 17:22:47 +00:00
tommoral
701cdbb6a5 FIX make sure we import the correct object from multiprocessing (#81862)
Fixes #44687.

The issue was that the Process object is not the one from the `_default_context` which should be `loky` when nesting `loky` calls.

This is a revamp of #53282 that was reverted because it broke some other tests.
How can I run the failing tests so I can see why this breaks?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81862
Approved by: https://github.com/VitalyFedyunin, https://github.com/janeyx99
2023-03-21 14:48:17 +00:00
Xuehai Pan
5b1cedacde [BE] [2/3] Rewrite super() calls in functorch and torch (#94588)
Rewrite Python built-in class `super()` calls. Only non-semantic changes should be applied.

- #94587
- #94588
- #94592

Also, methods with only a `super()` call are removed:

```diff
class MyModule(nn.Module):
-   def __init__(self):
-       super().__init__()
-
    def forward(self, ...):
        ...
```

Some cases that change the semantics should be kept unchanged. E.g.:

f152a79be9/caffe2/python/net_printer.py (L184-L190)

f152a79be9/test/test_jit_fuser_te.py (L2628-L2635)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94588
Approved by: https://github.com/ezyang, https://github.com/albanD
2023-02-10 21:16:33 +00:00
Aaron Gokaslan
8fce9a09cd [BE]: pyupgrade Python to 3.8 - imports and object inheritance only (#94308)
Apply parts of pyupgrade to torch (starting with the safest changes).
This PR only does two things: removes the need to inherit from object and removes unused future imports.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94308
Approved by: https://github.com/ezyang, https://github.com/albanD
2023-02-07 21:10:56 +00:00
Nikita Shulga
5976f0bdfe Set min supported Python version to 3.8 (#93155)
Also, grep for `if sys.version_info .cond. (3, 8)` and replaces them with appropriate action.

This is a last in a series of PRs that moved CI/CD away from testing PyTorch behavior against Python-3.7.

Fixes https://github.com/pytorch/pytorch/issues/80513

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93155
Approved by: https://github.com/huydhn
2023-01-29 18:28:46 +00:00
Kurt Mohler
ee28b865ee Deprecate TypedStorage, its derived classes, and all of their public methods (#85303)
Part of #85302

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85303
Approved by: https://github.com/ezyang
2022-11-08 18:11:01 +00:00
Kurt Mohler
14d0296e5c Rename _Typed/_UntypedStorage to Typed/UntypedStorage and update docs (#82438)
### Description

Since the major changes for `_TypedStorage` and `_UntypedStorage` are now complete, they can be renamed to be public.

`TypedStorage._untyped()` is renamed to `TypedStorage.untyped()`.

Documentation for storages is improved as well.

### Issue
Fixes #82436

### Testing
N/A

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82438
Approved by: https://github.com/ezyang
2022-07-30 19:37:08 +00:00
ProGamerGov
357b7d589c Fix docstring inconsistencies: string -> str, boolean -> bool (#82410)
### Description

Throughout the PyTorch docs and codebase, the `string` type in docstrings is referred to by two separate names. This leads to inconsistent docs, like you can see here: https://pytorch.org/docs/stable/generated/torch.nn.Conv3d.html#torch.nn.Conv3d

This PR fixes this issue by ensuring that all mentions of the string type in docstrings, are using the same format that Sphinx generates hyperlinks for.

### Testing
No testing should be required for this change

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82410
Approved by: https://github.com/jbschlosser
2022-07-28 21:29:57 +00:00
PyTorch MergeBot
3c9479dc30 Revert "FIX make sure we import the correct object from multiprocessing (#53282)"
This reverts commit e103d6af3d.

Reverted https://github.com/pytorch/pytorch/pull/53282 on behalf of https://github.com/janeyx99 due to Sorry, reverting as this breaks 10.2 tests on trunk e103d6af3d
2022-07-20 20:28:39 +00:00
tomMoral
e103d6af3d FIX make sure we import the correct object from multiprocessing (#53282)
Fixes #44687.

The issue was that the Process object is not the one from the `_default_context` which should be `loky` when nesting `loky` calls.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53282
Approved by: https://github.com/VitalyFedyunin
2022-07-20 16:52:03 +00:00
Elias Ellison
1058b47562 Weak-ref-ify MetaConverter and FakeTensorConverter (#80544)
Make `MetaConverter` and `FakeTensorConverter` hold weak references to their memoized tensors, and also have `MetaConverter` hold weak reference to Tensor storage. Otherwise it can be tricky for users to make sure all existing FakeTensors or FakeTensorModes are deleted which otherwise will leak memory.

I ran into https://github.com/pytorch/pytorch/issues/7733 which I was able to get around with the following (see comment from code):

```
# torch.Tensors cannot be used as a key in a dictionary
# because they define a custom __eq__ function which when used
# to resolve hash collisions will throw when comparing tensors:
# "RuntimeError: bool value of Tensor with more than one value is ambiguous."
# To avoid that, we use an object which will hold a Tensor and use
# its id for both hashing and equality.
# In order to use this as a weak key reference, we cannot
# simply use weakref.WeakKeyDictionary because the newly constructed
# WeakTensorRefKey only use would be a dictionary so it would have no strong
# references.
# To get around this issue, we can use it as a normal key, and then set
# `weakref.finalize` to delete the key when its contained tensor dies.
```

While for the tensor memo we can set a `weakref.finalize` callback that will remove the corresponding `WeakTensorRefKey` from the tensor memo, at the point that this callback is invoked the tensor storage is not yet deallocated.. See comment from code:

```
# [expired-storages]
# NB: even though the tensor has died,
# the deallocation of its storage can take longer,
# even when the storage has no other uses/views.
# In this case, the StorageWeakRef object will be kept alive
# longer than it needs to be, however the storage itself
# will be deallocated. We retain the possibly dead storages
# and periodically check if any of them are expired and
# can be freed.
```

partial fix for https://github.com/pytorch/torchdynamo/issues/468
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80544
Approved by: https://github.com/ezyang
2022-06-29 23:36:35 +00:00
Kurt Mohler
cecb2ad95e Restore old names for private funcs in legacy storages (#77861)
Followup from #75459

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77861
Approved by: https://github.com/ezyang
2022-05-20 02:03:34 +00:00
Kurt Mohler
aea6e2c396 Merge torch.cuda._UntypedStorage into torch._UntypedStorage (#75459)
Fixes #74933

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75459
Approved by: https://github.com/ezyang
2022-05-19 13:54:39 +00:00
Kurt Mohler
79ddc72b85 Virtualize <type>Storage classes (#66970)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/66228

cc ezyang bhosmer smessmer ljk53 bdhirsh

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66970

Reviewed By: bdhirsh

Differential Revision: D33245612

Pulled By: ezyang

fbshipit-source-id: 4c61c2cb029e2b94b0e68927c377d3e1c358dd7c
(cherry picked from commit d29fcdfb4bc2cc17b1795d4349e4b56fa0d1cf12)
2022-03-22 23:44:48 +00:00
Kurt Mohler
8e7fe87630 Rename Typed/UntypedStorage to _Typed/_UntypedStorage (#72540)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72540

Reviewed By: jbschlosser

Differential Revision: D34216823

Pulled By: bdhirsh

fbshipit-source-id: 1bc9930ab582771ebf02308e035576cd1a0dbe47
(cherry picked from commit 329238f612)
2022-02-15 23:53:01 +00:00
epwalsh
14d3d29b16 make ProcessException pickleable (#70118)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/70116

Happy to add tests if you let me know the best place to put them.

cc VitalyFedyunin

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70118

Reviewed By: malfet

Differential Revision: D33255899

Pulled By: ejguan

fbshipit-source-id: 41d495374182eb28bb8bb421e890eca3bddc077b
2021-12-30 09:09:55 -08:00
Kurt Mohler
5883523c1d Remove dtype from torch.Storage and use only torch.ByteStorage (#62030)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62030

Remove dtype tracking from Python Storage interface, remove all the different `<type>Storage` classes except for `ByteStorage`, and update serialization accordingly, while maintaining as much FC/BC as possible

Fixes https://github.com/pytorch/pytorch/issues/47442

* **THE SERIALIZATION FORMAT IS FULLY FC/BC.** We worked very hard to make sure this is the case. We will probably want to break FC at some point to make the serialization structure of tensors make more sense, but not today.
* There is now only a single torch.ByteStorage class. Methods like `Tensor.set_` no longer check that the dtype of storage is appropriate.
* As we no longer know what dtype of a storage is, we've **removed** the size method from Storage, replacing it with nbytes. This is to help catch otherwise silent errors where you confuse number of elements with number of bytes.
* `Storage._new_shared` takes a `nbytes` kwarg and will reject previous positional only calls.  `Storage._new_with_file` and `_set_from_file` require explicit element size arguments.
* It's no longer possible to convert storages to different types using the float/double/etc methods. Instead, do the conversion using a tensor.
* It's no longer possible to allocate a typed storage directly using FloatStorage/DoubleStorage/etc constructors. Instead, construct a tensor and extract its storage. The classes still exist but they are used purely for unpickling.
* The preexisting serialization format stores dtype with storage, and in fact this dtype is used to determine the dtype of the tensor overall.
 To accommodate this case, we introduce a new TypedStorage concept that exists only during unpickling time which is used to temporarily store the dtype so we can construct a tensor. **If you overrode the handling of pickling/unpickling, you MUST add handling for TypedStorage** or your serialization code will degrade to standard file-based serialization.

Original pull request: https://github.com/pytorch/pytorch/pull/59671

Reviewed By: soulitzer, ngimel

Differential Revision: D29466819

Pulled By: ezyang

fbshipit-source-id: 4a14e5d3c2b08e06e558683d97f7378a3180b00e
2021-10-05 13:50:34 -07:00
Kaushik B
ba07aaf211 Fix typo in warning for spawn method (#57927)
Summary:
Fix typo in warning for spawn method

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57927

Reviewed By: suo

Differential Revision: D28326390

Pulled By: bdhirsh

fbshipit-source-id: b0c12b1020d713865687f94f28ab2873ae260c23
2021-05-10 13:12:38 -07:00
Vasiliy Alekseev
bac4cfd54d Fix mp serialization for integer nn.Parameter on CUDA (#56529)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/56342

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56529

Reviewed By: albanD

Differential Revision: D27896094

Pulled By: ngimel

fbshipit-source-id: fe817781eb7139ea57c78acfd56e7c11b61eb4ed
2021-04-22 16:21:04 -07:00
Sam Estep
4753100a3b Un-ignore F403 in .flake8 (#55838)
Summary:
Generally wildcard imports are bad for the reasons described here: https://www.flake8rules.com/rules/F403.html

This PR replaces wildcard imports with an explicit list of imported items where possible, and adds a `# noqa: F403` comment in the other cases (mostly re-exports in `__init__.py` files).

This is a prerequisite for https://github.com/pytorch/pytorch/issues/55816, because currently [`tools/codegen/dest/register_dispatch_key.py` simply fails if you sort its imports](https://github.com/pytorch/pytorch/actions/runs/742505908).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55838

Test Plan: CI. You can also run `flake8` locally.

Reviewed By: jbschlosser

Differential Revision: D27724232

Pulled By: samestep

fbshipit-source-id: 269fb09cb4168f8a51fd65bfaacc6cda7fb87c34
2021-04-13 09:24:07 -07:00
Robert Gmyr
93bbbeccf7 Make SharedCache thread-safe (#53750)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53731

Make SharedCache thread-safe by using explicit locks instead of relying on atomicity of certain Python operations

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53750

Reviewed By: malfet

Differential Revision: D27304793

Pulled By: albanD

fbshipit-source-id: 7c62babe4357bed57df3056fbda6801fb6168846
2021-03-25 06:35:03 -07:00
Richard Barnes
b89827b73f Drop unused imports (#49972)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49972

From
```
./python/libcst/libcst codemod remove_unused_imports.RemoveUnusedImportsWithGlean --no-format caffe2/
```

Test Plan: Standard sandcastle tests

Reviewed By: xush6528

Differential Revision: D25727352

fbshipit-source-id: 6b90717e161aeb1da8df30e67d586101d35d7d5f
2021-01-13 12:26:17 -08:00
Hugo van Kemenade
473e78c0fa Remove redundant code for unsupported Python versions (#49486)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49486

Remove code for Python 3.5 and lower.

There's more that can be removed/modernised, but sticking mainly to redundant version checks here, to keep the diff/PR smaller.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46579

Reviewed By: zou3519

Differential Revision: D24453571

Pulled By: ezyang

fbshipit-source-id: c2cfcf05d6c5f65df64d89c331692c9aec09248e
2021-01-06 12:45:46 -08:00
Samuel Marks
e6779d4357 [*.py] Rename "Arguments:" to "Args:" (#49736)
Summary:
I've written custom parsers and emitters for everything from docstrings to classes and functions. However, I recently came across an issue when I was parsing/generating from the TensorFlow codebase: inconsistent use of `Args:` and `Arguments:` in its docstrings.

```sh
(pytorch#c348fae)$ for name in 'Args:' 'Arguments:'; do
    printf '%-10s %04d\n' "$name" "$(rg -IFtpy --count-matches "$name" | paste -s -d+ -- | bc)"; done
Args:      1095
Arguments: 0336
```

It is easy enough to extend my parsers to support both variants, however it looks like `Arguments:` is wrong anyway, as per:

  - https://google.github.io/styleguide/pyguide.html#doc-function-args @ [`ddccc0f`](https://github.com/google/styleguide/blob/ddccc0f/pyguide.md)

  - https://chromium.googlesource.com/chromiumos/docs/+/master/styleguide/python.md#describing-arguments-in-docstrings @ [`9fc0fc0`](https://chromium.googlesource.com/chromiumos/docs/+/9fc0fc0/styleguide/python.md)

  - https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html @ [`c0ae8e3`](https://github.com/sphinx-contrib/napoleon/blob/c0ae8e3/docs/source/example_google.rst)

Therefore, only `Args:` is valid. This PR replaces them throughout the codebase.

PS: For related PRs, see tensorflow/tensorflow/pull/45420

PPS: The trackbacks automatically appearing below are sending the same changes to other repositories in the [PyTorch](https://github.com/pytorch) organisation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49736

Reviewed By: albanD

Differential Revision: D25710534

Pulled By: soumith

fbshipit-source-id: 61e8ff01abb433e9f78185c2d1d0cbd7c22c1619
2020-12-28 09:34:47 -08:00
Guilherme Leobas
cf92b0f3a0 add type annotations to multiprocessing module (#47756)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47757

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47756

Reviewed By: malfet

Differential Revision: D24970773

Pulled By: ezyang

fbshipit-source-id: b0b9edb9cc1057829c6320e78174c6d5f7a77477
2020-11-16 13:05:49 -08:00
Chester Liu
17a6bc7c1b Cleanup unused code for Python < 3.6 (#47822)
Summary:
I think these can be safely removed since the min version of supported Python is now 3.6

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47822

Reviewed By: smessmer

Differential Revision: D24954936

Pulled By: ezyang

fbshipit-source-id: 5d4b2aeb78fc97d7ee4abaf5fb2aae21bf765e8b
2020-11-13 21:37:01 -08:00
Aliaksandr Ivanou
3ffd2af8cd Add exception classification to torch.multiprocessing.spawn (#45174)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45174

Introduce different types of exceptions that map to different failures
of torch.multiprocessing.spawn. The change introduces three different exception types:
ProcessRaisedException - occurs when the process initiated by spawn raises an exception
ProcessExitedException - occurs when the process initiated by spawn exits
The following logic will allow frameworks that use mp.spawn to categorize failures.
This can be helpful for tracking metrics and enhancing logs.

Test Plan: Imported from OSS

Reviewed By: taohe

Differential Revision: D23889400

Pulled By: tierex

fbshipit-source-id: 8849624c616230a6a81158c52ce0c18beb437330
2020-10-09 12:59:41 -07:00
Xiang Gao
20ac736200 Remove py2 compatible future imports (#44735)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44735

Reviewed By: mruberry

Differential Revision: D23731306

Pulled By: ezyang

fbshipit-source-id: 0ba009a99e475ddbe22981be8ac636f8a1c8b02f
2020-09-16 12:55:57 -07:00
David Reiss
e75fb4356b Remove (most) Python 2 support from Python code (#35615)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35615

Python 2 has reached end-of-life and is no longer supported by PyTorch.
Now we can clean up a lot of cruft that we put in place to support it.
These changes were all done manually, and I skipped anything that seemed
like it would take more than a few seconds, so I think it makes sense to
review it manually as well (though using side-by-side view and ignoring
whitespace change might be helpful).

Test Plan: CI

Differential Revision: D20842886

Pulled By: dreiss

fbshipit-source-id: 8cad4e87c45895e7ce3938a88e61157a79504aed
2020-04-22 09:23:14 -07:00
Kiuk Chung
7314f1c281 [torch/multiprocessing] Update documentation indicating that start_method is ignored for mp.spawn() (#33070)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33070

`start_method` parameter is intentionally ignored for `mp.spawn()`. Document this fact and point the user to `start_processes` if they want to use a different `start_method`.

Test Plan:
Warning message looks like:
```
main.py:8: UserWarning: This method only supports start_method=spawn (got: fork).
To use a different start_method use:
         torch.multiprocessing.start_process(...)
  warnings.warn(msg)
```

Reviewed By: ailzhang

Differential Revision: D19780235

fbshipit-source-id: 4599cd18c3ba6cc401810efe4f390290ffa8023b
2020-02-07 15:26:00 -08:00
Brian Wignall
f326045b37 Fix typos, via a Levenshtein-type corrector (#31523)
Summary:
Should be non-semantic.

Uses https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines to find likely typos, with https://github.com/bwignall/typochecker to help automate the checking.

Uses an updated version of the tool used in https://github.com/pytorch/pytorch/pull/30606 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31523

Differential Revision: D19216749

Pulled By: mrshenli

fbshipit-source-id: 7fd489cb9a77cd7e4950c1046f925d57524960ea
2020-01-17 16:03:19 -08:00
peterjc123
6486bdfb90 Fix os.register_at_fork not defined on Windows (#30809)
Summary:
According to https://docs.python.org/3.8/library/os.html#os.register_at_fork, this function is only available in Unix platforms.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30809

Differential Revision: D18828777

Pulled By: bddppq

fbshipit-source-id: 3325a984da488bb0a80a5c27131553fbcf78921f
2019-12-05 13:36:53 -08:00
Peter Bell
dcd1216efe Force early initialization of OpenMP in forked children (#29006)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/28389

Intel's OpenMP implementation sets the thread affinity on the first call to an OpenMP function after a fork. By adding an atfork handler we can force this to happen before a user tries to set the affinity in their own DataLoader `worker_init_fn`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29006

Differential Revision: D18782456

Pulled By: ezyang

fbshipit-source-id: ce0b515256da0cf18ceb125e0cdec99a3311bbd3
2019-12-03 15:23:31 -08:00
Ailing Zhang
a997f224ac Add torch.multiprocessing.create_processes
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28493

Differential Revision: D18766066

Pulled By: ailzhang

fbshipit-source-id: 7f424c8fae3012be2416cf9bc72ee2dde40c1f89
2019-12-03 10:38:19 -08:00
Brian Wignall
e7fe64f6a6 Fix typos (#30606)
Summary:
Should be non-semantic.

Uses https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines to find likely typos.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30606

Differential Revision: D18763028

Pulled By: mrshenli

fbshipit-source-id: 896515a2156d062653408852e6c04b429fc5955c
2019-12-02 20:17:42 -08:00
なるみ
d83389d327 Ignore F401 in all __init__.py without putting noqa (#25823)
Summary:
By adding `per-file-ignores = __init__.py: F401` into `.flake8` with `flake8>=3.7`, we can ignore F410 in all `__init__.py` without putting `# noqa: F401` line by line.

http://flake8.pycqa.org/en/latest/user/options.html?highlight=per-file-ignores#cmdoption-flake8-per-file-ignores
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25823

Differential Revision: D17252182

Pulled By: soumith

fbshipit-source-id: 87b174075b79e4078953a7521bd1a8f82405646b
2019-10-23 15:28:13 -07:00