Summary:
Currently, each process (main and workers) prints a traceback on `KeyboardInterrupt`, and the main process also prints
```
RuntimeError: DataLoader worker (pid 46045) exited unexpectedly with exit code 1. Details are lost due to multiprocessing. Rerunning with num_workers=0 may give better error trace.
```
due to our SIGCHLD handler.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11718
Differential Revision: D9840844
Pulled By: SsnL
fbshipit-source-id: 1a05060bb02907fef5aac3f274d2c84f9f42d187
Summary:
The old `torch.distributed` will go to `torch.distributed.deprecated`.
The old DDP will go to `torch.nn.parallel.deprecated`.
Now `torch.nn.parallel.DDP` will use the c10d DDP.
Now `torch.distributed` will use the c10d frontend API.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11405
Reviewed By: pietern
Differential Revision: D9733733
Pulled By: teng-li
fbshipit-source-id: d6a3f3e73f8d3a7fcb1f4baef53c78063b8cbb08
Summary:
`Process.start()` actually takes some time, as it needs to start a
process and pass the arguments over via a pipe. Therefore, we
only add a worker to the `self.workers` list after it has started, so
that we do not call `.join()` on it if the program dies before it starts.
Otherwise, `__del__` tries to join it and gets:
AssertionError: can only join a started process.
Example trace when such error happens:
```py
[unrelated]
File "/private/home/ssnl/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 500, in __iter__
return _DataLoaderIter(self)
File "/private/home/ssnl/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 292, in __init__
w.start()
File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/process.py", line 112, in start
self._popen = self._Popen(self)
File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/context.py", line 277, in _Popen
return Popen(process_obj)
File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
self._launch(process_obj)
File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/popen_fork.py", line 70, in _launch
self.pid = os.fork()
KeyboardInterrupt
Exception ignored in: <function _DataLoaderIter.__del__ at 0x7fa704d5aa60>
Traceback (most recent call last):
File "/private/home/ssnl/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 398, in __del__
self._shutdown_workers()
File "/private/home/ssnl/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 392, in _shutdown_workers
w.join()
File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/process.py", line 139, in join
assert self._popen is not None, 'can only join a started process'
AssertionError: can only join a started process
```
No test is included because this is hard to trigger reliably.
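A minimal sketch of the "register only after `start()` returns" ordering described above. `WorkerPool`, `launch`, and `shutdown` are hypothetical names for illustration, not the actual `_DataLoaderIter` code, and threads stand in for worker processes:

```python
import threading


class WorkerPool:
    """Hypothetical pool: a worker is recorded only once start() has returned."""

    def __init__(self):
        self.workers = []

    def launch(self, make_worker, count):
        for _ in range(count):
            w = make_worker()
            w.daemon = True
            w.start()               # may be interrupted, e.g. by KeyboardInterrupt
            self.workers.append(w)  # reached only if start() returned normally

    def shutdown(self):
        # Every entry was started, so join() cannot hit the
        # "can only join a started process" assertion.
        for w in self.workers:
            w.join()


pool = WorkerPool()
pool.launch(lambda: threading.Thread(target=lambda: None), 2)
pool.shutdown()
```

If `start()` raises partway through, the half-created worker never reaches `self.workers`, so a later cleanup pass simply skips it.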
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11432
Reviewed By: ezyang
Differential Revision: D9735430
Pulled By: SsnL
fbshipit-source-id: a8912d9bb4063f210d6236267b178173810e2351
Summary:
There is no reason a user should need an extra import to use `DistributedSampler`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10671
Differential Revision: D9395189
Pulled By: SsnL
fbshipit-source-id: 8f41d93813c8fb52fe012f76980c6a261a8db9b2
Reopening #6606 with fix for TEST_CUDA import issue on Windows and improvement to how we wait for manager exit in test_manager_unclean_exit. Loop tested on the Windows CI multiple times to make sure this actually fixes the CUDA OOM issue.
* Terminate dataloader workers properly when parent process is SIGKILL'ed
* Wait for worker processes to finish before shutting down manager process
* Add test for checking proper worker exit
* cosmetic change
* Test only if CUDA exists
* Don't call multiprocessing.set_start_method() in Python 2
* import TEST_CUDA only when we are in __main__
* Tune JOIN_TIMEOUT
* handle os.getppid() == 0 case
* Reset to original JOIN_TIMEOUT
* Use WaitForSingleObject() to check parent process status on Windows
* Fix TEST_CUDA import
* clean up
* Check main process only when index_queue.get() times out
* Change index_queues to multiprocessing.Queue
* Move manager checking logic to watchdog class
* Fix bugs in dataloader
* Fix TEST_CUDA import issue
* Don't import TEST_CUDA from common_nn
* Use event to signal manager exit in test
* fix lint
* Add comments
* Codemod to update our codebase to 0.4 standard
* Update some of the test scripts
* remove Variable in test_clip_grad_value
* fix _symbolic_override_wrapper_maker
* Scope variables inside the dataloader
This frees the memory consumed by batches inside the dataloader. It's pretty useful for long-lived data loaders.
* Update dataloader.py
Enable TensorDataset to accept any number of tensors while keeping backward compatibility.
* Enable TensorDataset to get any number of tensors
* Update dataset.py
Fix syntax error on Python 2.7
* Add several tests for TensorDataset
* Fix whitespaces
* Simplify args
* Update dataset.py
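A rough sketch of what "any number of tensors" means for the TensorDataset change above. `MultiSequenceDataset` is a hypothetical stand-in that uses plain Python sequences instead of tensors; it is not the code in dataset.py:

```python
class MultiSequenceDataset:
    """Hypothetical stand-in: wraps any number of equally long sequences."""

    def __init__(self, *sequences):
        if not sequences:
            raise ValueError("at least one sequence is required")
        length = len(sequences[0])
        if any(len(s) != length for s in sequences):
            raise ValueError("all sequences must have the same length")
        self.sequences = sequences

    def __getitem__(self, index):
        # One sample is the index-th element of every wrapped sequence.
        return tuple(s[index] for s in self.sequences)

    def __len__(self):
        return len(self.sequences[0])


dataset = MultiSequenceDataset([1, 2, 3], "abc", [0.1, 0.2, 0.3])
sample = dataset[1]
```

The two-sequence (data, target) case keeps working unchanged, which is the compatibility the commit message refers to.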
* Added the `ind_worker_queue` parameter to `data.DataLoader`. It makes preprocessing deterministic.
DataLoader in multiprocessing mode may behave non-deterministically. Even if the random seed is fixed, each subprocess may receive tasks in an unstable order, because I/O time varies while data loads. If you apply augmentation during data loading, this makes results irreproducible. See https://discuss.pytorch.org/t/deterministic-non-deterministic-results-with-pytorch/9087
To fix this issue, I have added an individual queue for each worker. In this case each worker gets its tasks in a stable order, so each subprocess produces stable results.
To reproduce the issue, set `ind_worker_queue` to False and run the script several times.
Code to reproduce issue is in the corresponding PR.
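For illustration only, the per-worker queue idea might be sketched like this. Round-robin assignment and the name `assign_round_robin` are assumptions made for this sketch; the message above only says that each worker gets its own queue:

```python
from collections import deque


def assign_round_robin(batches, num_workers):
    """Hypothetical helper: give every worker its own queue of (index, batch)."""
    queues = [deque() for _ in range(num_workers)]
    for batch_idx, batch in enumerate(batches):
        # The owning worker depends only on the batch index, never on timing.
        queues[batch_idx % num_workers].append((batch_idx, batch))
    return queues


queues = assign_round_robin([[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]], num_workers=2)
```

With a single shared queue, whichever worker finishes its I/O first takes the next task, so the task-to-worker mapping changes from run to run. With one queue per worker, the mapping is fixed before any work starts.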
* TestIndividualWorkerQueue added to DataLoader tests
* Review fixes
* "Simplify" code by removing itertools
* Rebase conflicts fix
* Review fixes
* Fixed shutdown behavior
* Removed ind_worker_queue flag.
* Rebase on master
* Disable tests that use DataLoader with multiple workers (#5322)
This replaces the torch.Tensor constructors with factories that produce
Variables. Similarly, functions on the torch module (e.g. torch.randn)
now return Variables.
To keep the PR to a reasonable size, I've left most of the unused tensor
code. Subsequent PRs will remove the dead code, clean-up calls to
torch.autograd.Variable, and rename Variable to Tensor everywhere.
There are some breaking changes because Variable and Tensors had
slightly different semantics. There's a list of those changes here:
https://github.com/pytorch/pytorch/wiki/Breaking-Changes-from-Variable-and-Tensor-merge
* Add default PyTorch seeding and worker_init_fn to DataLoader
* generate seed using current RNG each time
* worker_seed <- main_proc_RNG_generated_seed + worker_id
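The seeding rule in the bullets above can be sketched as follows. Python's `random` module stands in for the main-process RNG here, and `derive_worker_seeds` is a hypothetical helper, not a PyTorch function:

```python
import random


def derive_worker_seeds(num_workers, main_rng):
    """Hypothetical helper: worker_seed = base_seed + worker_id."""
    # A fresh base seed is drawn from the main-process RNG on every call,
    # so each new pass over the data sees different worker seeds.
    base_seed = main_rng.randrange(2 ** 32)
    return [base_seed + worker_id for worker_id in range(num_workers)]


main_rng = random.Random(0)
first_epoch = derive_worker_seeds(4, main_rng)
second_epoch = derive_worker_seeds(4, main_rng)
```

Because the base seed comes from the main RNG, seeding the main process once is enough to make every worker's seed reproducible across runs.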