pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 00:21:07 +01:00

Author	SHA1	Message	Date
Tongzhou Wang	c30790797f	Minor data loader doc improvements Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11821 Differential Revision: D9948292 Pulled By: SsnL fbshipit-source-id: 01c21c129423c0f7844b403e665a8fe021a9c820	2018-09-19 15:33:25 -07:00
Tongzhou Wang	8e76dcf173	Prevent raising KeyboardInterrupt in worker (#11718 ) Summary: Current behavior is that each process (main and workers) will print trace from `KeyboardInterrupt`. And the main process will also print ``` RuntimeError: DataLoader worker (pid 46045) exited unexpectedly with exit code 1. Details are lost due to multiprocessing. Rerunning with nm_workers=0 may give better error trace. ``` due to our SIGCLD handler. Pull Request resolved: https://github.com/pytorch/pytorch/pull/11718 Differential Revision: D9840844 Pulled By: SsnL fbshipit-source-id: 1a05060bb02907fef5aac3f274d2c84f9f42d187	2018-09-14 16:09:35 -07:00
Jeff Smith	05e06f7de2	migrating deprecated calls without abc module for containers (#11515 ) Summary: Implementing #10540. Pull Request resolved: https://github.com/pytorch/pytorch/pull/11515 Reviewed By: apaszke Differential Revision: D9771045 Pulled By: jeffreyksmithjr fbshipit-source-id: 85ea39abaa9b465805a969f122b626b11fc85ef6	2018-09-13 15:09:22 -07:00
Tongzhou Wang	57f149a861	Only join pin_memory_thread after it started (#11599 ) Summary: Same reason as in #11432 . Example error: ``` Exception ignored in: <function _DataLoaderIter.__del__ at 0x7fa06963cf28> Traceback (most recent call last): File "/private/home/ssnl/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 405, in __del__ self._shutdown_workers() File "/private/home/ssnl/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 401, in _shutdown_workers self.pin_memory_thread.join() AttributeError: '_DataLoaderIter' object has no attribute 'pin_memory_thread' ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/11599 Differential Revision: D9801143 Pulled By: SsnL fbshipit-source-id: 520590a21f56fa381fcac621457a7544d3fba47e	2018-09-13 09:40:49 -07:00
Teng Li	0988bbad2d	C10d release to torch.distributed for PT1 (#11405 ) Summary: The old `torch.distributed` will go to `torch.distributed.deprecated` The old DDP will go to `torch.nn.parallel.deprecated` Now `torch.nn.parallel.DDP` will use c10d DDP Now `torch.distributed` will use C10d frontend API Pull Request resolved: https://github.com/pytorch/pytorch/pull/11405 Reviewed By: pietern Differential Revision: D9733733 Pulled By: teng-li fbshipit-source-id: d6a3f3e73f8d3a7fcb1f4baef53c78063b8cbb08	2018-09-10 23:27:22 -07:00
Tongzhou Wang	560d6efd3a	Only join started dataloader workers (#11432 ) Summary: `Process.start()` actually takes some time, as it needs to start a process and pass the arguments over via a pipe. Therefore, we only add a worker to the self.workers list after it has started, so that we do not call `.join()` on it if the program dies before it starts; otherwise `__del__` tries to join it and gets: AssertionError: can only join a started process. Example trace when such error happens: ```py [unrelated] File "/private/home/ssnl/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 500, in __iter__ return _DataLoaderIter(self) File "/private/home/ssnl/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 292, in __init__ w.start() File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/process.py", line 112, in start self._popen = self._Popen(self) File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/context.py", line 223, in _Popen return _default_context.get_context().Process._Popen(process_obj) File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/context.py", line 277, in _Popen return Popen(process_obj) File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__ self._launch(process_obj) File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/popen_fork.py", line 70, in _launch self.pid = os.fork() KeyboardInterrupt Exception ignored in: <function _DataLoaderIter.__del__ at 0x7fa704d5aa60> Traceback (most recent call last): File "/private/home/ssnl/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 398, in __del__ self._shutdown_workers() File "/private/home/ssnl/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 392, in _shutdown_workers w.join() File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/process.py", line 139, in join assert self._popen is not None, 'can only join a started process' AssertionError: can only join a started process ``` No test because it is hard to trigger reliably. Pull Request resolved: https://github.com/pytorch/pytorch/pull/11432 Reviewed By: ezyang Differential Revision: D9735430 Pulled By: SsnL fbshipit-source-id: a8912d9bb4063f210d6236267b178173810e2351	2018-09-09 12:55:51 -07:00
Wei Yang	7f9fd1cc26	allow RandomSampler to sample with replacement (#9911 ) Summary: fixes #7908 Pull Request resolved: https://github.com/pytorch/pytorch/pull/9911 Reviewed By: yf225 Differential Revision: D9023223 Pulled By: weiyangfb fbshipit-source-id: 68b199bef3940b7205d0fdad75e7c46e6fe65ba7	2018-08-28 10:52:25 -07:00
Chetter2	5ca2713a8b	Fix performance of WeightedRandomSampler (#10636 ) Summary: Since https://github.com/pytorch/pytorch/pull/8958 was merged, the BatchSampler samples 0d tensors from WeightedRandomSampler instead of integers, which significantly reduces performance. This PR fixes it the same way https://github.com/pytorch/pytorch/pull/10361 fixed DistributedSampler. Pull Request resolved: https://github.com/pytorch/pytorch/pull/10636 Differential Revision: D9423869 Pulled By: zou3519 fbshipit-source-id: f94da2d4cccf70e63beea6cfc3d1230b5610ae44	2018-08-22 13:15:48 -07:00
Tongzhou Wang	108b657159	Import DistributedSampler in utils/data/__init__ (#10671 ) Summary: There is no reason users should need an extra import to use DistributedSampler. Pull Request resolved: https://github.com/pytorch/pytorch/pull/10671 Differential Revision: D9395189 Pulled By: SsnL fbshipit-source-id: 8f41d93813c8fb52fe012f76980c6a261a8db9b2	2018-08-19 16:55:13 -07:00
Alex Sergeev	18d2fcde7a	Fix performance of DistributedSampler per #8958 Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10361 Differential Revision: D9240798 Pulled By: ezyang fbshipit-source-id: dc4cfe79612f711bbcff34a147877df6a5f7b89f	2018-08-09 12:54:37 -07:00
Tongzhou Wang	04f381650e	Resubmit: Fix dataloader hang when it is not completely iterated (#10366 ) Summary: https://github.com/pytorch/pytorch/pull/9655 Pull Request resolved: https://github.com/pytorch/pytorch/pull/10366 Differential Revision: D9237393 Pulled By: SsnL fbshipit-source-id: fabfad7f371ba33300098f6b885c0e3f26c3e14a	2018-08-09 00:10:24 -07:00
Tongzhou Wang	a7f183f971	Revert "Fix dataloader hang when it is not completely iterated (#9655 )" (#9804 ) Summary: This reverts commit `9ee5133651`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/9804 Reviewed By: ezyang Differential Revision: D8987780 Pulled By: SsnL fbshipit-source-id: 75ad70b0b8d672d0b35235fa248b187be64b68e5	2018-07-25 10:10:30 -07:00
Tongzhou Wang	9ee5133651	Fix dataloader hang when it is not completely iterated (#9655 ) Summary: second trial of https://github.com/pytorch/pytorch/pull/7140 cc csarofeen Let's see if this works. It passes everything locally. Pull Request resolved: https://github.com/pytorch/pytorch/pull/9655 Differential Revision: D8940177 Pulled By: SsnL fbshipit-source-id: 8d6340fc9f7355c71e1e26b262da166402faa158	2018-07-22 20:38:27 -07:00
Dmitriy Serdyuk	ba8e133844	Refactor batch sampler (#8958 ) Summary: Fixes #8652, fixes #8957 Closes https://github.com/pytorch/pytorch/pull/8958 Reviewed By: ezyang Differential Revision: D8668253 Pulled By: soumith fbshipit-source-id: 663d461621511166f29cfcc902e6c2a71befa647	2018-06-27 16:06:47 -07:00
tvn	146b951ec5	Fix seeding random module in DataLoader (#7886 ) * fix seeding random module * make base seed int * follow 0.4 idiom * add a test for random seeding	2018-05-29 15:55:04 -04:00
Gao, Xiang	d7c32df67f	move Subset, random_split to data, use sequence in some places. (#7816 )	2018-05-25 12:50:50 +02:00
Gao, Xiang	42e5e12750	make BatchSampler subclass of Sampler, and expose (#7707 )	2018-05-19 21:29:03 +02:00
Maxim Berman	03767b66db	Add FileNotFoundError to torch._six (#7524 ) Add FileNotFoundError for compatibility with Python 2 and use in dataloader. Fixes pytorch/pytorch#6932	2018-05-12 20:54:26 -04:00
vfdev	6363faf184	Fix issue #7209 in DataLoader (#7265 )	2018-05-04 10:51:46 +02:00
Thomas Viehmann	1b0ad8678b	import *Sampler to utils.data (Better fix than #6982 ) (#7007 )	2018-04-27 10:18:29 +02:00
Thomas Viehmann	5dc5a71d74	Improve error message (Sampler location) Fixes #6917 (#6982 ) Thank you @ruotianluo for reporting!	2018-04-26 10:58:27 -04:00
Will Feng	e8bdbdaa27	Terminate dataloader workers properly when parent process is SIGKILL'ed (#6779 ) Reopening #6606 with fix for TEST_CUDA import issue on Windows and improvement to how we wait for manager exit in test_manager_unclean_exit. Loop tested on the Windows CI multiple times to make sure this actually fixes the CUDA OOM issue. * Terminate dataloader workers properly when parent process is SIGKILL'ed * Wait for worker processes to finish before shutting down manager process * Add test for checking proper worker exit * cosmetic change * Test only if CUDA exists * Don't call multiprocessing.set_start_method() in Python 2 * import TEST_CUDA only when we are in __main__ * Tune JOIN_TIMEOUT * handle os.getppid() == 0 case * Reset to original JOIN_TIMEOUT * Use WaitForSingleObject() to check parent process status on Windows * Fix TEST_CUDA import * clean up * Check main process only when index_queue.get() times out * Change index_queues to multiprocessing.Queue * Move manager checking logic to watchdog class * Fix bugs in dataloader * Fix TEST_CUDA import issue * Don't import TEST_CUDA from common_nn * Use event to signal manager exit in test * fix lint * Add comments	2018-04-22 23:03:54 -04:00
gchanan	4c5b95a433	Revert "Terminate dataloader workers properly when parent process is SIGKILL'ed (#6606 )" (#6772 ) This reverts commit `8d6a50aaeb`.	2018-04-19 14:28:48 -04:00
Tongzhou Wang	072d49f787	Fix import error sometimes happening in dataloader when exiting Python (#6671 ) * Fix import error sometimes happening in dataloader when exiting Python * address comments	2018-04-19 06:56:39 -04:00
Will Feng	8d6a50aaeb	Terminate dataloader workers properly when parent process is SIGKILL'ed (#6606 ) * Terminate dataloader workers properly when parent process is SIGKILL'ed * Wait for worker processes to finish before shutting down manager process * Add test for checking proper worker exit * cosmetic change * Test only if CUDA exists * Don't call multiprocessing.set_start_method() in Python 2 * import TEST_CUDA only when we are in __main__ * Tune JOIN_TIMEOUT * handle os.getppid() == 0 case * Reset to original JOIN_TIMEOUT * Use WaitForSingleObject() to check parent process status on Windows * Fix TEST_CUDA import * clean up * Check main process only when index_queue.get() times out * Change index_queues to multiprocessing.Queue * Move manager checking logic to watchdog class * Fix bugs in dataloader * Fix TEST_CUDA import issue	2018-04-18 20:41:33 -04:00
Tongzhou Wang	1c01eabd3c	Codemod to update our codebase to 0.4 standard (#6641 ) * Codemod to update our codebase to 0.4 standard * Update some of the test scripts * remove Variable in test_clip_grad_value * fix _symbolic_override_wrapper_maker	2018-04-17 22:06:54 -04:00
Sanjeev Satheesh	f15f3ca1af	Scope variables inside the dataloader (#6673 ) * Scope variables inside the dataloader This clears up the memory consumed by batches inside the dataloader. It's pretty useful for long-lived data loaders. * Update dataloader.py	2018-04-17 17:48:12 -04:00
Tongzhou Wang	1f0b07cddc	fix typos in sampler.py (#6525 )	2018-04-11 17:27:25 -04:00
Tongzhou Wang	6b7ec95abb	Link relevant FAQ section in DataLoader docs (#6476 ) * Link FAQ section on workers returning same random numbers in DataLoader docs * explicitly mention section names	2018-04-11 13:41:46 -04:00
Tongzhou Wang	efc91d8c6d	Add arg checks in torch.utils.data.Sampler classes (#6249 ) Fixes #6168 * add arg checks in torch.utils.data.Sampler * add check for positive-ness	2018-04-04 23:07:31 -04:00
Tongzhou Wang	60a16e5663	Set dataloader.batch_size = None when batch_sampler is given (#6108 )	2018-03-30 10:01:09 +02:00
Jason Park	64e2c03bea	Enable TensorDataset to get any number of tensors (#6038 ) Keeping compatibility, enable TensorDataset to get any number of tensors. * Enable TensorDataset to get any number of tensors * Update dataset.py Fix syntax error on python 2.7 * Add several test for tensordataset * Fix whitespaces * Simplify args * Update dataset.py	2018-03-28 11:20:50 -04:00
peterjc123	ae4362bc6a	Fix memory leak when using multiple workers on Windows (#5585 )	2018-03-28 10:35:28 +02:00
AlexanderRadionov	831780390c	Fixed non-determinate preprocessing on DataLoader (#4640 ) Added ind_worker_queue parameter to data.DataLoader. It makes preprocessing deterministic. DataLoader in multiprocessing mode may produce non-deterministic results: even if random_seed is fixed, each subprocess may get tasks in an unstable order. This is caused by varying I/O times during data loading. If you use augmentation during data loading, this makes results unreproducible. See https://discuss.pytorch.org/t/deterministic-non-deterministic-results-with-pytorch/9087 To fix this issue I have added an individual queue for each worker, so each worker gets tasks in a stable order and each subprocess produces stable results. To reproduce the issue you may set ind_worker_queue to False and run the script several times. Code to reproduce the issue is in the corresponding PR. * TestIndividualWorkerQueue added to DataLoader tests * Review fixes * "Simplify" code by removing itertools * Rebase conflicts fix * Review fixes * Fixed shutdown behavior * Removed ind_worker_queue flag. * Rebase on master * Disable tests that use DataLoader with multiple workers (#5322)	2018-03-23 17:43:59 -04:00
Tongzhou Wang	04461fa289	Prefix DataLoaderIter with underscore to discourage subclassing (#5619 )	2018-03-08 11:09:51 +01:00
Will Feng	a90b695590	Disallow num_workers > 0 for DataLoader on Windows (#5591 ) Using DataLoader with num_workers > 0 is known to cause CUDA out-of-memory issues on Windows. This issue has already been noted in #4092.	2018-03-07 10:21:03 -05:00
Tongzhou Wang	392fc8885c	add faq on cuda memory management and dataloader (#5378 )	2018-02-27 18:35:30 -05:00
Achal Dave	8327982904	Set python random seed in workers (#5415 ) * Set python random seed in workers * Import random	2018-02-27 03:16:10 -05:00
Tongzhou Wang	1ff537ca71	Ignore FileNotFoundError when shutting down in data_queue.get (#5380 ) * Ignore FileNotFoundError when shutting down in data_queue.get * Address @apaszke comments	2018-02-24 13:32:13 -05:00
Sam Gross	30ec06c140	Merge Variable and Tensor classes (#5225 ) This replaces the torch.Tensor constructors with factories that produce Variables. Similarly, functions on the torch module (e.g. torch.randn) now return Variables. To keep the PR to a reasonable size, I've left most of the unused tensor code. Subsequent PRs will remove the dead code, clean-up calls to torch.autograd.Variable, and rename Variable to Tensor everywhere. There are some breaking changes because Variable and Tensors had slightly different semantics. There's a list of those changes here: https://github.com/pytorch/pytorch/wiki/Breaking-Changes-from-Variable-and-Tensor-merge	2018-02-23 18:03:31 -05:00
Tongzhou Wang	64a9ecae02	Dataloader issues (#4643 ) * EINTR and kill by loader fix * addressed @apaszke 's comments * remove EINTR handling and add test if we are in main thread before setting SIGCHLD	2018-01-29 01:18:17 +01:00
Tongzhou Wang	0ac58d53b8	ATen conv param expansion; InstanceNorm use_running_stats fix (#4544 ) * fix instancenorm and aten conv param expansion * addressed colesbury 's comments * improve conv input shape check	2018-01-10 17:36:26 -05:00
Alykhan Tejani	18a866aedd	Add random_split to torch.utils.data.dataset (#4435 )	2018-01-02 18:56:49 +01:00
Christian Sarofeen	bc6bd62bd6	Fix distributed dataloader so it pins memory to current GPU not GPU 0.	2017-12-19 13:39:06 +01:00
Tongzhou Wang	5cc26c0c90	Add default PyTorch seeding and worker_init_fn to DataLoader (#4018 ) * Add default PyTorch seeding and worker_init_fn to DataLoader * generate seed using current RNG each time * worker_seed <- main_proc_RNG_generated_seed + worker_id	2017-12-18 02:19:08 -05:00
Jon Crall	5c13c6962c	Raise errors when num_workers == 0 in DataLoader (#4019 )	2017-12-05 11:07:43 -08:00
Alykhan Tejani	5571d0187e	Accept longs in default_collate for dataloader in python 2 (#4001 )	2017-12-04 09:50:57 -08:00
SsnL	1661370ac5	Signal handling in DataLoader workers; Timeout option (#3474 )	2017-11-29 23:52:14 +01:00
Mikhail Korobov	754f3d3fe8	fixed a typo in ConcatDataset.cumulative_sizes attribute name	2017-11-24 11:07:51 +01:00
Ozan Çağlayan	dd6d04ddf2	doc: Normalize all true/false in docstrings to ``True\|False`` (#3593 ) * doc: Normalize all true/false in docstrings to ``True\|False`` This makes them more apparent in the documentation. * doc: fix flake8	2017-11-09 08:12:29 -05:00
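
Several entries above (5cc26c0c90, 8327982904, 146b951ec5) concern how DataLoader workers are seeded; 5cc26c0c90 summarizes the scheme as `worker_seed <- main_proc_RNG_generated_seed + worker_id`, with a fresh base seed generated from the current RNG each time. The following is a minimal stdlib-only sketch of that scheme, not PyTorch's code. Assumptions: the function names `make_base_seed`, `worker_seed` and `worker_loop` are hypothetical, workers are simulated as plain function calls rather than processes, the 63-bit base seed width is an arbitrary choice, and only Python's `random` module is seeded (the real DataLoader also seeds torch's generator).

```python
import random


def make_base_seed(main_rng: random.Random) -> int:
    # Hypothetical helper: draw a base seed from the main-process RNG.
    # Drawing it anew for every iterator gives each epoch fresh worker seeds.
    return main_rng.getrandbits(63)  # 63 bits is an assumption for this sketch


def worker_seed(base_seed: int, worker_id: int) -> int:
    # worker_seed <- main_proc_RNG_generated_seed + worker_id
    return base_seed + worker_id


def worker_loop(base_seed, worker_id, n_samples, worker_init_fn=None):
    # Hypothetical stand-in for a worker: seed first, then call the user hook.
    random.seed(worker_seed(base_seed, worker_id))
    if worker_init_fn is not None:
        worker_init_fn(worker_id)
    # Stand-in for per-sample random augmentation.
    return [random.random() for _ in range(n_samples)]
```

Because each worker gets a distinct seed derived from one main-process draw, workers no longer return identical random streams, while a fixed main-process seed still makes the whole run reproducible.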

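Commits 560d6efd3a and 57f149a861 fix shutdown code that joined workers (or the pin_memory_thread) which had never been started. The pattern is to register a worker only after `start()` has returned, and to create the container before anything can fail. The sketch below is a hypothetical illustration using `threading.Thread` instead of `multiprocessing.Process` so it runs without forking; `WorkerGroup`, `start` and `fail_at` are invented names. Joining an unstarted `Thread` raises `RuntimeError`, the thread-side analogue of the `AssertionError: can only join a started process` quoted above.

```python
import threading


class WorkerGroup:
    """Hypothetical owner of background workers (not a PyTorch class)."""

    def __init__(self):
        # Created up front so shutdown()/__del__ can always iterate it.
        self.workers = []
        self._stop = threading.Event()

    def start(self, num_workers, fail_at=None):
        for i in range(num_workers):
            w = threading.Thread(target=self._stop.wait, daemon=True)
            if i == fail_at:
                # Simulate an interrupt arriving while worker i is being started.
                raise KeyboardInterrupt
            w.start()
            # Register the worker only after start() has returned, so shutdown()
            # never calls join() on a worker that was never started.
            self.workers.append(w)

    def shutdown(self):
        self._stop.set()  # let every started worker's wait() return
        for w in self.workers:
            w.join()
        self.workers = []

    def __del__(self):
        self.shutdown()
```

If `fail_at` interrupts startup, `shutdown()` joins only the workers that actually started, instead of failing on the unstarted one.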

87 Commits