Summary:
Resubmit #20698 which got messed up.
Idea is that when PyTorch is used in a custom build environment (e.g. Facebook), it's useful to track usage of various APIs centrally. This PR introduces a simple, very lightweight mechanism to do so - only the first invocation of a trigger point is logged. This is significantly more lightweight than #18235, so we can afford to put logging in e.g. TensorImpl.
Also adds an initial list of trigger points. Trigger points are added in such a way that no static initialization triggers them, i.e. just linking with libtorch.so will not cause any logging. Further suggestions of what to log are welcome.
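Roughly, the idea is the following (a Python-level sketch with made-up names; the real mechanism lives in the C++ core, and thread-safety is omitted for brevity):
```py
# Each trigger point is identified by a string key; the logging callback,
# if one is installed by the build environment, fires at most once per key.
_seen_events = set()
_logger = None  # installed by the custom build environment to forward events


def set_api_usage_logger(fn):
    global _logger
    _logger = fn


def log_api_usage_once(event):
    # Cheap early-out: after the first hit for `event`, this is a set lookup.
    if _logger is None or event in _seen_events:
        return
    _seen_events.add(event)
    _logger(event)


# At a trigger point, e.g. the first time a DataLoader is constructed:
# log_api_usage_once("python.data_loader")
```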
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20745
Differential Revision: D15429196
Pulled By: dzhulgakov
fbshipit-source-id: a5e41a709a65b7ebccc6b95f93854e583cf20aca
Summary:
It's been hard to understand how workers are launched and what code runs in the worker vs. the main process, especially on Windows, which leads to many of our samples failing. This explains when workers run and how to make code work on Windows as well.
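For reference, the Windows-safe pattern the docs describe boils down to this (dataset and sizes here are just placeholders): everything that worker processes must not re-execute, in particular creating and iterating the DataLoader, goes under the `__main__` guard, because Windows starts workers with spawn and re-imports the main module in each of them.
```py
import torch
from torch.utils.data import DataLoader, TensorDataset


def main():
    dataset = TensorDataset(torch.arange(100, dtype=torch.float32))
    # num_workers > 0 spawns worker processes; on Windows each of them
    # re-imports this module, so the loop below must be guarded.
    loader = DataLoader(dataset, batch_size=10, num_workers=2)
    for (batch,) in loader:
        print(batch.shape)


if __name__ == '__main__':
    main()
```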
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18091
Differential Revision: D15083766
Pulled By: soumith
fbshipit-source-id: 8a7e60defc8a72ec63874f657d7d5267d951dccf
Summary:
Also:
1. Bump the multiprocessing test timeout, following the Python core tests.
2. Fix one type of flakiness in `test_proper_exit`.
3. Add trace reporting via `faulthandler` when the loader process hangs in `test_proper_exit` (see the sketch below).
4. Give `test_proper_exit` another try.
I'll heavily retest this.
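The trace reporting in item 3 is along these lines (a sketch; the timeout and `run_test` are placeholders, not the actual test code):
```py
import faulthandler

HANG_TIMEOUT = 60.0  # seconds; placeholder value


def run_test():
    pass  # placeholder for the actual test body


# If the process is still running after HANG_TIMEOUT seconds, dump the
# traceback of every thread to stderr, so a hang leaves a trace in the CI
# log instead of timing out silently (the process itself keeps running).
faulthandler.dump_traceback_later(HANG_TIMEOUT, exit=False)
try:
    run_test()
finally:
    faulthandler.cancel_dump_traceback_later()
```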
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19421
Differential Revision: D15063728
Pulled By: ezyang
fbshipit-source-id: 4e0d992622e11053c44a9ec237b88b9a28a4472c
Summary:
Renewed attempt at https://github.com/pytorch/pytorch/pull/14171
From the original PR:
> Currently, the pin_memory_batch function in the dataloader will return a batch comprised of any unrecognized type without pinning the data, because it doesn't know how.
>
>This behavior was preventing us from overlapping data prefetching in Mask-RCNN, whose custom collate_fn returns a custom batch type.
The old PR allowed the user to implement batch pinning for custom batch and data types by passing a custom pin function to the dataloader. slayton58 suggested a cleaner approach: allow the user to define a `pin_memory` method on their custom types, and have `pin_memory_batch` [check for the presence of that method](https://github.com/pytorch/pytorch/pull/16743/files#diff-9f154cbd884fe654066b1621fad654f3R56) in the incoming batch as a fallback. I've updated the test and docstrings accordingly.
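A minimal sketch of that approach (class and function names here are made up, not the ones in the PR): the custom batch type exposes a `pin_memory` method, and the loader invokes it when `pin_memory=True`.
```py
import torch


class CustomBatch:
    def __init__(self, samples):
        # Stack the individual sample tensors into a single batch tensor.
        self.data = torch.stack(samples, dim=0)

    def pin_memory(self):
        # Pin the underlying storage and return self, so the loader hands
        # back the same (now pinned) batch object.
        self.data = self.data.pin_memory()
        return self


def collate_wrapper(samples):
    return CustomBatch(samples)


# Usage: DataLoader(dataset, collate_fn=collate_wrapper, pin_memory=True)
```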
The old PR was merged but then reverted due to weird CUDA OOM errors on Windows that may or may not have been related. I have no idea why my changes would cause such errors (then or now), but it's something to keep an eye out for.
cc fmassa and yf225, who were my POCs on the old PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16743
Differential Revision: D13991745
Pulled By: ezyang
fbshipit-source-id: 74e71f62a03be453b4caa9f5524e9bc53467fa17
Summary:
1. Improve the error message to give better debugging info
2. Increase the timeout
3. Also apply the Windows worker-failure detection mechanism on non-Windows platforms, for better robustness (see the sketch below)
Attempt to fix #14501
cc ezyang
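The worker-failure detection in item 3 works roughly like this (a simplified sketch with assumed names and timeout, not the exact implementation): instead of blocking forever on its index queue, each worker wakes up periodically and checks that the main process is still alive.
```py
import os
import queue

MP_STATUS_CHECK_INTERVAL = 5.0  # seconds between liveness checks; assumed value


def worker_loop(index_queue, parent_pid, process_task):
    while True:
        try:
            # Wake up periodically instead of blocking forever.
            task = index_queue.get(timeout=MP_STATUS_CHECK_INTERVAL)
        except queue.Empty:
            # On Unix a worker is re-parented when its parent dies, so a
            # changed ppid means the main process is gone; on Windows a
            # handle on the parent process is checked instead.
            if os.getppid() != parent_pid:
                return
            continue
        if task is None:  # sentinel: normal shutdown
            return
        process_task(task)
```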
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16249
Differential Revision: D13784702
Pulled By: ezyang
fbshipit-source-id: 09a7cff83ab9edce561ed69f9fb555ab35d1275f
Summary:
Same as #14668, and was approved there.
ailzhang, please apply this patch to Horizon's `data_streamer.py`: https://gist.github.com/SsnL/020fdb3d6b7016d81b6ba1d04cc41459 Thank you!
Below is the original description at #14668:
As I am working on the tasks in https://github.com/pytorch/pytorch/issues/13023, I realized how unreadable the code is, because all functions to be run in multiprocessing must be at the top (module-global) level. Adding more functionality to `dataloader.py` will only make things worse.
So in this PR, I refactor `dataloader.py` and move much of it into `data._utils`. E.g., the `_worker_loop` and related methods are now in `data._utils.worker`, signal-handling code in `data._utils.signal_handling`, collating code in `data._utils.collate`, etc. This split, IMHO, makes the code much clearer. I will base my future changes to DataLoader on top of this.
No functionality is changed, except that I added `torch._six.queue`.
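Roughly, the resulting layout is (only the submodules named above, with paths inferred from the description; the `default_collate` import below assumes it now lives in the new `collate` submodule):
```py
# torch/utils/data/dataloader.py             - DataLoader and iterator logic
# torch/utils/data/_utils/worker.py          - _worker_loop and related helpers
# torch/utils/data/_utils/signal_handling.py - signal handling code
# torch/utils/data/_utils/collate.py         - collating code
#
# Internal helpers are then reached through the _utils package, e.g.:
from torch.utils.data import _utils

batch = _utils.collate.default_collate([1, 2, 3])
```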
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15331
Reviewed By: yf225
Differential Revision: D13503120
Pulled By: ailzhang
fbshipit-source-id: 94df16b4d80ad1102c437cde0d5a2e62cffe1f8e
Summary:
As I am working on the tasks in https://github.com/pytorch/pytorch/issues/13023, I realized how unreadable the code is, because all functions to be run in multiprocessing must be at the top (module-global) level. Adding more functionality to `dataloader.py` will only make things worse.
So in this PR, I refactor `dataloader.py` and move much of it into `data._utils`. E.g., the `_worker_loop` and related methods are now in `data._utils.worker`, signal-handling code in `data._utils.signal_handling`, collating code in `data._utils.collate`, etc. This split, IMHO, makes the code much clearer. I will base my future changes to DataLoader on top of this.
No functionality is changed, except that I added `torch._six.queue`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14668
Reviewed By: soumith
Differential Revision: D13289919
Pulled By: ailzhang
fbshipit-source-id: d701bc7bb48f5dd7b163b5be941a9d27eb277a4c
Summary:
Currently, the `pin_memory_batch` function in the dataloader will return a batch comprised of any unrecognized type without pinning the data, because it doesn't know how.
This behavior was preventing us from overlapping data prefetching in Mask-RCNN, whose custom `collate_fn` returns a custom batch type.
The present PR adds the ability for the user to pass a `pin_fn` alongside any custom `collate_fn` to handle such custom types.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14171
Differential Revision: D13166669
Pulled By: soumith
fbshipit-source-id: ca965f9841d4a259b3ca4413c8bd0d8743d433ab
Summary:
I struggled with yet another DataLoader hang for the entire evening. After numerous experiments, I realized that it is unsafe to do anything while Python is shutting down. Unfortunately, we also implement our DataLoader cleanup logic in `__del__`, a function that may or may not be called during shutdown, and if called, may or may not be called before core library resources are freed.
Fortunately, we already set all our workers and the pin_memory_thread as daemonic. So when Python is shutting down, we can just make `__del__` a no-op and rely on the automatic termination of daemonic children.
An `atexit` hook is used to detect Python exit.
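A minimal sketch of the shutdown detection (names simplified; not the exact dataloader code):
```py
import atexit

_python_exit_status = False


def _set_python_exit_flag():
    global _python_exit_status
    _python_exit_status = True


atexit.register(_set_python_exit_flag)


class _LoaderIterSketch:
    def _shutdown_workers(self):
        pass  # placeholder for the real cleanup logic

    def __del__(self):
        if _python_exit_status:
            # Interpreter is shutting down: do nothing and rely on the
            # automatic termination of the daemonic workers and
            # pin_memory_thread.
            return
        self._shutdown_workers()
```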
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12700
Differential Revision: D10419027
Pulled By: SsnL
fbshipit-source-id: 5753e70d03e69eb1c9ec4ae2154252d51e2f79b0
Summary:
The current behavior is that each process (main and workers) will print a trace from `KeyboardInterrupt`. And the main process will also print
```
RuntimeError: DataLoader worker (pid 46045) exited unexpectedly with exit code 1. Details are lost due to multiprocessing. Rerunning with num_workers=0 may give better error trace.
```
due to our SIGCHLD handler.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11718
Differential Revision: D9840844
Pulled By: SsnL
fbshipit-source-id: 1a05060bb02907fef5aac3f274d2c84f9f42d187
Summary:
`Process.start()` actually takes some time, as it needs to start a process and pass the arguments over via a pipe. Therefore, we only add a worker to the `self.workers` list after it has started, so that we do not call `.join()` on it if the program dies before the worker starts and `__del__` tries to join it, which would raise:
AssertionError: can only join a started process.
Example trace when such an error happens:
```py
[unrelated]
File "/private/home/ssnl/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 500, in __iter__
return _DataLoaderIter(self)
File "/private/home/ssnl/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 292, in __init__
w.start()
File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/process.py", line 112, in start
self._popen = self._Popen(self)
File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/context.py", line 277, in _Popen
return Popen(process_obj)
File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
self._launch(process_obj)
File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/popen_fork.py", line 70, in _launch
self.pid = os.fork()
KeyboardInterrupt
Exception ignored in: <function _DataLoaderIter.__del__ at 0x7fa704d5aa60>
Traceback (most recent call last):
File "/private/home/ssnl/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 398, in __del__
self._shutdown_workers()
File "/private/home/ssnl/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 392, in _shutdown_workers
w.join()
File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/process.py", line 139, in join
assert self._popen is not None, 'can only join a started process'
AssertionError: can only join a started process
```
No test, because this is hard to reliably trigger.
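An illustrative sketch of the ordering fix (simplified; `_dummy_worker` stands in for the real `_worker_loop`):
```py
import multiprocessing


def _dummy_worker():
    pass  # stands in for the real _worker_loop


def start_workers(num_workers):
    workers = []
    for _ in range(num_workers):
        w = multiprocessing.Process(target=_dummy_worker)
        w.daemon = True
        w.start()            # may be interrupted, e.g. by KeyboardInterrupt
        workers.append(w)    # record the worker only once it has started,
                             # so cleanup never joins an unstarted process
    return workers
```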
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11432
Reviewed By: ezyang
Differential Revision: D9735430
Pulled By: SsnL
fbshipit-source-id: a8912d9bb4063f210d6236267b178173810e2351
Reopening #6606 with a fix for the TEST_CUDA import issue on Windows and an improvement to how we wait for manager exit in test_manager_unclean_exit. Loop-tested on the Windows CI multiple times to make sure this actually fixes the CUDA OOM issue.
* Terminate dataloader workers properly when parent process is SIGKILL'ed
* Wait for worker processes to finish before shutting down manager process
* Add test for checking proper worker exit
* cosmetic change
* Test only if CUDA exists
* Don't call multiprocessing.set_start_method() in Python 2
* import TEST_CUDA only when we are in __main__
* Tune JOIN_TIMEOUT
* handle os.getppid() == 0 case
* Reset to original JOIN_TIMEOUT
* Use WaitForSingleObject() to check parent process status on Windows
* Fix TEST_CUDA import
* clean up
* Check main process only when index_queue.get() times out
* Change index_queues to multiprocessing.Queue
* Move manager checking logic to watchdog class
* Fix bugs in dataloader
* Fix TEST_CUDA import issue
* Don't import TEST_CUDA from common_nn
* Use event to signal manager exit in test
* fix lint
* Add comments
* Codemod to update our codebase to 0.4 standard
* Update some of the test scripts
* remove Variable in test_clip_grad_value
* fix _symbolic_override_wrapper_maker
* Scope variables inside the dataloader
This clears up the memory consumed by batches inside the dataloader. It's pretty useful for long-lived data loaders.
* Update dataloader.py
Added an ind_worker_queue parameter to data.DataLoader. It makes preprocessing deterministic.
DataLoader in multiprocessing mode may cause a non-determinism issue: even if the random seed is frozen, each subprocess may receive tasks in an unstable order. This is caused by differing I/O times while the data loads. If you use augmentation while loading data, this makes results unreproducible. See https://discuss.pytorch.org/t/deterministic-non-deterministic-results-with-pytorch/9087
To fix this issue I have added an individual queue for each worker. In this case each worker gets tasks in a stable order, so each subprocess produces stable results.
To reproduce the issue you may change ind_worker_queue to False and run the script several times.
Code to reproduce the issue is in the corresponding PR.
* TestIndividualWorkerQueue added to DataLoader tests
* Review fixes
* "Simplify" code by removing itertools
* Rebase conflicts fix
* Review fixes
* Fixed shutdown behavior
* Removed ind_worker_queue flag.
* Rebase on master
* Disable tests that use DataLoader with multiple workers (#5322)
* Add default PyTorch seeding and worker_init_fn to DataLoader
* generate seed using current RNG each time
* worker_seed <- main_proc_RNG_generated_seed + worker_id
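A sketch of the seeding scheme in the last two bullets (simplified; the helper name and its use are illustrative):
```py
import torch


def make_worker_seeds(num_workers):
    # The main process draws one base seed from its current RNG state ...
    base_seed = torch.empty((), dtype=torch.int64).random_().item()
    # ... and worker i is seeded with base_seed + i before loading starts.
    return [base_seed + worker_id for worker_id in range(num_workers)]
```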