Summary:
Current behavior is that each process (main and workers) prints a traceback from `KeyboardInterrupt`, and the main process also prints
```
RuntimeError: DataLoader worker (pid 46045) exited unexpectedly with exit code 1. Details are lost due to multiprocessing. Rerunning with num_workers=0 may give better error trace.
```
due to our SIGCHLD handler.
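A minimal sketch of the scenario described above (`MyDataset` is a placeholder dataset, not part of this PR): interrupting a multi-worker loader raises `KeyboardInterrupt` in every process, a worker failure reaches the main process only as the generic RuntimeError quoted above, and `num_workers=0` runs `__getitem__` in the main process so the original traceback stays visible.
```python
import torch
from torch.utils.data import DataLoader, Dataset

class MyDataset(Dataset):  # placeholder dataset for illustration
    def __len__(self):
        return 1000

    def __getitem__(self, idx):
        # Imagine slow I/O or augmentation here; pressing Ctrl+C while this
        # runs raises KeyboardInterrupt in both the main and worker processes.
        return torch.randn(3, 224, 224), idx % 10

loader = DataLoader(MyDataset(), batch_size=32, num_workers=4)

# Re-running with num_workers=0 executes __getitem__ in the main process,
# so the real worker-side traceback is shown instead of the RuntimeError.
debug_loader = DataLoader(MyDataset(), batch_size=32, num_workers=0)
```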
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11718
Differential Revision: D9840844
Pulled By: SsnL
fbshipit-source-id: 1a05060bb02907fef5aac3f274d2c84f9f42d187
Summary:
`Process.start()` actually takes some time, as it needs to start a
process and pass the arguments over via a pipe. Therefore, we
only add a worker to the `self.workers` list after it has started, so
that if the program dies before a worker starts, `__del__` does not
try to `.join()` it and fail with:
AssertionError: can only join a started process.
Example trace when such error happens:
```py
[unrelated]
File "/private/home/ssnl/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 500, in __iter__
return _DataLoaderIter(self)
File "/private/home/ssnl/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 292, in __init__
w.start()
File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/process.py", line 112, in start
self._popen = self._Popen(self)
File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/context.py", line 277, in _Popen
return Popen(process_obj)
File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
self._launch(process_obj)
File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/popen_fork.py", line 70, in _launch
self.pid = os.fork()
KeyboardInterrupt
Exception ignored in: <function _DataLoaderIter.__del__ at 0x7fa704d5aa60>
Traceback (most recent call last):
File "/private/home/ssnl/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 398, in __del__
self._shutdown_workers()
File "/private/home/ssnl/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 392, in _shutdown_workers
w.join()
File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/process.py", line 139, in join
assert self._popen is not None, 'can only join a started process'
AssertionError: can only join a started process
```
No test because this is hard to trigger reliably.
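A hedged sketch of the pattern described above (an illustration, not the actual `_DataLoaderIter` code): a worker is appended to the bookkeeping list only after `start()` has returned, so cleanup never joins an unstarted process.
```python
import multiprocessing

def _worker_loop(index_queue):
    # Minimal stand-in for the real worker loop.
    while True:
        task = index_queue.get()
        if task is None:
            break

class IterSketch(object):
    def __init__(self, num_workers):
        self.index_queues = [multiprocessing.Queue() for _ in range(num_workers)]
        self.workers = []
        for q in self.index_queues:
            w = multiprocessing.Process(target=_worker_loop, args=(q,))
            w.daemon = True
            # start() forks and pipes the arguments over, so a
            # KeyboardInterrupt can land here; record the worker only once
            # it has actually started.
            w.start()
            self.workers.append(w)

    def _shutdown_workers(self):
        for q in self.index_queues:
            q.put(None)
        for w in self.workers:
            # Everything in self.workers has been started, so join() cannot
            # raise "can only join a started process".
            w.join()

    def __del__(self):
        self._shutdown_workers()
```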
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11432
Reviewed By: ezyang
Differential Revision: D9735430
Pulled By: SsnL
fbshipit-source-id: a8912d9bb4063f210d6236267b178173810e2351
Reopening #6606 with a fix for the TEST_CUDA import issue on Windows and an improvement to how we wait for manager exit in test_manager_unclean_exit. Loop-tested on the Windows CI multiple times to make sure this actually fixes the CUDA OOM issue.
* Terminate dataloader workers properly when parent process is SIGKILL'ed
* Wait for worker processes to finish before shutting down manager process
* Add test for checking proper worker exit
* cosmetic change
* Test only if CUDA exists
* Don't call multiprocessing.set_start_method() in Python 2
* import TEST_CUDA only when we are in __main__
* Tune JOIN_TIMEOUT
* handle os.getppid() == 0 case
* Reset to original JOIN_TIMEOUT
* Use WaitForSingleObject() to check parent process status on Windows
* Fix TEST_CUDA import
* clean up
* Check main process only when index_queue.get() times out
* Change index_queues to multiprocessing.Queue
* Move manager checking logic to watchdog class (see the sketch after this list)
* Fix bugs in dataloader
* Fix TEST_CUDA import issue
* Don't import TEST_CUDA from common_nn
* Use event to signal manager exit in test
* fix lint
* Add comments
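A hedged sketch of the watchdog idea from the list above (an illustration of the approach, not the code added by this PR): the worker checks whether the manager process is still alive whenever its index queue `get()` times out, and exits if the parent was SIGKILL'ed.
```python
import os
import queue

class ManagerWatchdog(object):
    """Checks whether the main (manager) process is still alive."""
    def __init__(self):
        self.manager_pid = os.getppid()

    def is_alive(self):
        # If the parent was SIGKILL'ed, this worker is re-parented and
        # getppid() no longer matches; on Windows the equivalent check uses
        # WaitForSingleObject() on a handle to the parent process.
        return os.getppid() == self.manager_pid

def worker_loop(index_queue, poll_interval=5.0):
    watchdog = ManagerWatchdog()
    while True:
        try:
            task = index_queue.get(timeout=poll_interval)
        except queue.Empty:
            # Check the manager only when get() times out, as noted above.
            if not watchdog.is_alive():
                break
            continue
        if task is None:  # shutdown sentinel from the main process
            break
        # ... fetch the batch for `task` and put it on the result queue ...
```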
* Codemod to update our codebase to 0.4 standard
* Update some of the test scripts
* remove Variable in test_clip_grad_value
* fix _symbolic_override_wrapper_maker
* Scope variables inside the dataloader
This clears up the memory consumed by batches inside the dataloader. It's pretty useful for long-lived data loaders (a sketch of the idea follows).
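A hedged illustration of the scoping idea, assuming a long-lived consumer that keeps a DataLoader iterator around (this shows the general pattern, not this PR's actual diff): batches are kept in local scope instead of on the long-lived object, so their memory can be reclaimed between steps.
```python
import torch
from torch.utils.data import DataLoader, TensorDataset

loader = DataLoader(TensorDataset(torch.randn(1024, 16)), batch_size=64)

class LongLivedConsumer(object):
    def __init__(self, loader):
        self.it = iter(loader)

    def step(self):
        # The batch stays a local variable: once step() returns, this object
        # holds no reference to it, so its memory can be reclaimed promptly.
        # (The anti-pattern would be self.current_batch = x, which keeps
        # every batch alive on the object between calls.)
        (x,) = next(self.it)
        return x.sum().item()

consumer = LongLivedConsumer(loader)
print(consumer.step())
```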
* Update dataloader.py
Added an ind_worker_queue parameter to data.DataLoader. It makes preprocessing deterministic.
DataLoader in multiprocessing mode may cause a non-determinism issue: even if random_seed is fixed, each subprocess may receive tasks in an unstable order because data-loading I/O time varies. If you use augmentation while loading data, this makes results unreproducible. See https://discuss.pytorch.org/t/deterministic-non-deterministic-results-with-pytorch/9087
To fix this issue, I have added an individual queue for each worker, so each worker gets its tasks in a stable order and therefore produces stable results.
To reproduce the issue, you may change ind_worker_queue to False and run the script several times.
Code to reproduce issue is in the corresponding PR.
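A hedged sketch of the per-worker queue idea described above (helper names here are illustrative, not the final API): instead of one shared index queue that workers drain in an I/O-dependent order, each worker gets its own queue and batches are assigned round-robin, so worker i always processes the same indices regardless of timing.
```python
import multiprocessing

def make_worker_queues(num_workers):
    # One index queue per worker instead of a single shared queue.
    return [multiprocessing.Queue() for _ in range(num_workers)]

def dispatch_batches(batch_indices, queues):
    for i, indices in enumerate(batch_indices):
        # Round-robin: batch i always goes to worker i % num_workers, so each
        # worker sees a stable stream of tasks and consumes its RNG state in
        # a reproducible order.
        queues[i % len(queues)].put((i, indices))

# Example: 8 batches over 2 workers -> worker 0 always gets batches 0, 2, 4, 6
# and worker 1 always gets batches 1, 3, 5, 7, on every run.
queues = make_worker_queues(2)
dispatch_batches([[0, 1], [2, 3], [4, 5], [6, 7],
                  [8, 9], [10, 11], [12, 13], [14, 15]], queues)
```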
* TestIndividualWorkerQueue added to DataLoader tests
* Review fixes
* "Simplify" code by removing itertools
* Rebase conflicts fix
* Review fixes
* Fixed shutdown behavior
* Removed ind_worker_queue flag.
* Rebase on master
* Disable tests that use DataLoader with multiple workers (#5322)
* Add default PyTorch seeding and worker_init_fn to DataLoader (see the sketch after this list)
* generate seed using current RNG each time
* worker_seed <- main_proc_RNG_generated_seed + worker_id
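A hedged sketch of the seeding scheme in the bullets above (an illustration of the scheme, not the exact code added by the PR): the main process draws a base seed from its current RNG each time an iterator is created, and each worker is seeded with base_seed + worker_id before the optional worker_init_fn runs.
```python
import random
import torch

def make_base_seed():
    # Drawn from the main process RNG each time an iterator is created, so
    # different epochs get different worker seeds unless the user seeds torch.
    return torch.LongTensor(1).random_().item()

def seed_worker(base_seed, worker_id, worker_init_fn=None):
    worker_seed = base_seed + worker_id
    torch.manual_seed(worker_seed)
    random.seed(worker_seed)
    if worker_init_fn is not None:
        # The user hook still runs after the default seeding.
        worker_init_fn(worker_id)

# Usage: the main process calls make_base_seed() once per iterator, and each
# worker process calls seed_worker(base_seed, worker_id) before loading data.
base_seed = make_base_seed()
seed_worker(base_seed, worker_id=0)
```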
This saves an extra memory copy, which speeds up data loading a bit
(5-10% with accimage).
As part of this change:
* torch.cat accepts keyword argument out
* specifying out=None is treated like not specifying out (see the sketch below)
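A minimal illustration of the behavior listed above: writing the concatenated result directly into a preallocated buffer avoids one extra copy when collating samples into a batch.
```python
import torch

samples = [torch.randn(3, 4) for _ in range(8)]

# Preallocate the output buffer once (e.g. in shared or pinned memory when
# collating) and let torch.cat write into it directly.
out = torch.empty(8 * 3, 4)
batch = torch.cat(samples, 0, out=out)   # fills `out` and returns it

# out=None behaves exactly like omitting the argument.
batch2 = torch.cat(samples, 0, out=None)
assert torch.equal(batch, batch2)
```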