Summary:
Otherwise you may see errors like
```
Exception ignored in: <function _MultiProcessingDataLoaderIter.__del__ at 0x000001F99F5CB9D8>
Traceback (most recent call last):
File "C:\Users\Divyansh J\Anaconda3\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 883, in __del__
self._shutdown_workers()
File "C:\Users\Divyansh J\Anaconda3\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 860, in _shutdown_workers
if self.workers_status[worker_id]:
IndexError: list index out of range
```
e.g. https://discuss.pytorch.org/t/how-to-construct-dataset-with-iterator-for-multi-process-dataloader/49612/5
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23761
Differential Revision: D16644687
Pulled By: soumith
fbshipit-source-id: a60e847431264525079456ff422317af1ac2be4b
Summary:
When an exception occurs in one of the modules passed to `parallel_apply()`, it is caught and re-raised in the main thread. This preserves the original exception type and message, but has the traceback point at the position where it's re-raised, rather than the original point of failure.
This PR saves the exception information required to generate the traceback, and includes the original traceback in the message of the exception raised in the main thread.
Before:
```
...
File ".../torch/nn/parallel/data_parallel.py", line 153, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File ".../torch/nn/parallel/parallel_apply.py", line 84, in parallel_apply
raise output
RuntimeError: expected type torch.FloatTensor but got torch.cuda.FloatTensor
```
After:
```
...
File ".../torch/nn/parallel/data_parallel.py", line 153, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File ".../torch/nn/parallel/parallel_apply.py", line 88, in parallel_apply
''.join(traceback.format_exception(*exc_info)))
RuntimeError: Caught exception in replica 0. Original traceback and message:
Traceback (most recent call last):
...
File "../models/foo.py", line 319, in bar
baz = asdf / ghij[:, np.newaxis]
RuntimeError: expected type torch.FloatTensor but got torch.cuda.FloatTensor
```
I took care to raise an exception of the original type (in case the calling code checks for that), but with the message replaced. This helped me find a bug that did not occur outside `data_parallel()`.
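A condensed sketch of the mechanism (hypothetical names, not the actual `parallel_apply` code): each worker thread stores `sys.exc_info()` on failure, and the main thread re-raises an exception of the original type with the formatted worker traceback embedded in the message.
```
import sys
import threading
import traceback

def run_replicas(modules, inputs):
    """Apply each module to its input in a thread, preserving worker tracebacks."""
    outputs = [None] * len(modules)
    errors = [None] * len(modules)   # holds sys.exc_info() tuples on failure

    def worker(i, module, inp):
        try:
            outputs[i] = module(inp)
        except Exception:
            errors[i] = sys.exc_info()   # (exc_type, exc_value, traceback)

    threads = [threading.Thread(target=worker, args=(i, m, x))
               for i, (m, x) in enumerate(zip(modules, inputs))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    for i, exc_info in enumerate(errors):
        if exc_info is not None:
            exc_type = exc_info[0]
            # Re-raise the original exception type so callers that catch a specific
            # type still work, but embed the worker's formatted traceback in the message.
            raise exc_type("Caught exception in replica {}. Original traceback and message:\n{}"
                           .format(i, ''.join(traceback.format_exception(*exc_info))))
    return outputs
```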
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18055
Differential Revision: D16444972
Pulled By: zhangguanheng66
fbshipit-source-id: ec436c9d4677fad18106a8046cfa835a20a101ce
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22479
In some cases, for example when training on CTR data, we would like to start training from old samples and finish on the most recent samples.
This diff adds an option to disable shuffling in `DistributedSampler` to accommodate this use case.
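A minimal usage sketch of the new option (toy dataset; `num_replicas`/`rank` are passed explicitly here instead of being read from an initialized process group):
```
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Toy "chronological" dataset standing in for CTR training data.
train_dataset = TensorDataset(torch.arange(100).float())

# shuffle=False keeps samples in their original order within each rank's shard.
sampler = DistributedSampler(train_dataset, num_replicas=4, rank=0, shuffle=False)
loader = DataLoader(train_dataset, batch_size=32, sampler=sampler)
```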
Reviewed By: soumith
Differential Revision: D16100388
fbshipit-source-id: 35566581f5250040b2db5ec408a63037b47a9f5d
Summary:
This is a modified version of https://github.com/pytorch/pytorch/pull/14705, since the commit structure for that PR is quite messy.
1. Add `IterableDataset`.
2. So we now have two data loading modes: `Iterable` and `Map`.
   1. `Iterable` if the `dataset` is an instance of `IterableDataset`
   2. `Map` otherwise
3. Add better support for non-batch loading (i.e., `batch_size=None` and `batch_sampler=None`). This is useful for things like bulk loading.
4. Refactor `DataLoaderIter` into two classes, `_SingleProcessDataLoaderIter` and `_MultiProcessingDataLoaderIter`. Rename some methods to be more generic, e.g., `get_batch` -> `get_data`.
5. Add `torch.utils.data.get_worker_info`, which returns worker information in a worker process (e.g., worker id, dataset object copy, etc.) and can be used in `IterableDataset.__iter__` and `worker_init_fn` for per-worker configuration (see the sketch after this list).
6. Add `ChainDataset`, the analog of `ConcatDataset` for `IterableDataset`.
7. Import `torch.utils.data` in `torch/__init__.py`.
8. Add data loader examples and documentation.
9. Use `get_worker_info` to detect whether we are in a worker process in `default_collate`.
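A small sketch of the iterable-style mode and `get_worker_info` (hypothetical dataset, modeled on the new documentation): the range is sharded across workers so that each worker yields a disjoint slice.
```
import math
from torch.utils.data import DataLoader, IterableDataset, get_worker_info

class RangeIterableDataset(IterableDataset):
    def __init__(self, start, end):
        self.start, self.end = start, end

    def __iter__(self):
        info = get_worker_info()
        if info is None:                      # single-process data loading
            lo, hi = self.start, self.end
        else:                                 # in a worker process: shard the range
            per_worker = int(math.ceil((self.end - self.start) / float(info.num_workers)))
            lo = self.start + info.id * per_worker
            hi = min(lo + per_worker, self.end)
        return iter(range(lo, hi))

ds = RangeIterableDataset(0, 10)
# Iterable-style loading; with num_workers=2 each worker yields its own shard.
print(list(DataLoader(ds, num_workers=2)))
```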
Closes https://github.com/pytorch/pytorch/issues/17909, https://github.com/pytorch/pytorch/issues/18096, https://github.com/pytorch/pytorch/issues/19946, and some of https://github.com/pytorch/pytorch/issues/13023
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19228
Reviewed By: bddppq
Differential Revision: D15058152
fbshipit-source-id: 9e081a901a071d7e4502b88054a34b450ab5ddde
Summary:
Resubmit of #20698, which got messed up.
The idea is that when PyTorch is used in a custom build environment (e.g. Facebook), it's useful to track usage of various APIs centrally. This PR introduces a simple, very lightweight mechanism to do so: only the first invocation of a trigger point is logged. This is significantly lighter weight than #18235, so we can afford to put logging in e.g. TensorImpl.
Also adds an initial list of trigger points. Trigger points are added in such a way that no static initialization triggers them, i.e. just linking against libtorch.so will not cause any logging. Further suggestions of what to log are welcome.
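The actual trigger points live in the C++ and Python internals; the idea, in a rough Python sketch with made-up names, is just to log each trigger point at most once per process:
```
_logged_apis = set()

def log_api_usage_once(api_name, logger=print):
    # Only the first invocation of a given trigger point is logged, so hot
    # code paths pay essentially nothing after the first call.
    if api_name in _logged_apis:
        return
    _logged_apis.add(api_name)
    logger("API used: {}".format(api_name))

def my_api():
    log_api_usage_once("my_module.my_api")   # hypothetical trigger point
```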
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20745
Differential Revision: D15429196
Pulled By: dzhulgakov
fbshipit-source-id: a5e41a709a65b7ebccc6b95f93854e583cf20aca
Summary:
This is an attempt to isolate unrelated changes from #19228 for easier review.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20150
Differential Revision: D15314891
Pulled By: ezyang
fbshipit-source-id: 8c429747ba83ad5aca4cdd8f8086bcf65a326921
Summary:
It's been hard to understand how workers are launched and what code runs in the worker vs. the main process, especially on Windows, which leads to many of our samples failing. This explains when workers run and how to make code work on Windows as well.
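For example, the pattern the docs now call out: on Windows, workers are spawned rather than forked and re-import the main module, so creating a `DataLoader` with `num_workers > 0` should be guarded so it only runs in the main process (toy dataset below).
```
import torch
from torch.utils.data import DataLoader, TensorDataset

def main():
    dataset = TensorDataset(torch.arange(100).float())
    # Spawned workers re-import this module; keeping loader creation inside
    # main() and behind the __main__ guard prevents it from running again
    # in every worker process.
    loader = DataLoader(dataset, batch_size=10, num_workers=2)
    for batch, in loader:
        print(batch.shape)

if __name__ == '__main__':
    main()
```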
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18091
Differential Revision: D15083766
Pulled By: soumith
fbshipit-source-id: 8a7e60defc8a72ec63874f657d7d5267d951dccf
Summary:
Also:
1. Bump the multiprocessing test timeout, following Python core tests.
2. Fix one type of flakiness in `test_proper_exit`.
3. Add traceback reporting via `faulthandler` when the loader process hangs in `test_proper_exit` (see the sketch below).
4. Give `test_proper_exit` another try.
I'll heavily retest this.
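A minimal sketch of the `faulthandler` idea from point 3 (standard-library calls only; the `time.sleep` stands in for a test body that might hang):
```
import faulthandler
import time

# If we are still stuck after 5 seconds, dump all thread tracebacks to stderr
# instead of hanging silently (exit=False keeps the process running).
faulthandler.dump_traceback_later(5, exit=False)

time.sleep(10)   # stand-in for the loader test that may hang

faulthandler.cancel_dump_traceback_later()
```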
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19421
Differential Revision: D15063728
Pulled By: ezyang
fbshipit-source-id: 4e0d992622e11053c44a9ec237b88b9a28a4472c
Summary:
Fix:
- the order of `Arguments` in the `RandomSampler` doc
- the meaningless check of `replacement`'s type.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19113
Differential Revision: D15013081
Pulled By: ezyang
fbshipit-source-id: 39e367f42841de6814b1214eb9df7b75f14f747e
Summary:
I haven't had a chance to rigorously try these out yet, so don't merge yet.
Closes #18725.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18963
Differential Revision: D14832897
Pulled By: ezyang
fbshipit-source-id: 4780e7a34126bc66ddbfd9d808dfc9e0edd77e68
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18598
ghimport-source-id: c74597e5e7437e94a43c163cee0639b20d0d0c6a
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18598 Turn on F401: Unused import warning.**
This was requested by someone at Facebook; this lint is turned
on for Facebook by default. "Sure, why not."
I had to noqa a number of imports in __init__. Hypothetically
we're supposed to use __all__ in this case, but I was too lazy
to fix it. Left for future work.
Be careful! flake8-2 and flake8-3 behave differently with respect to
import resolution for `# type:` comments: flake8-3 will report such an
import as unused; flake8-2 will not. For now, I just noqa'd all these sites.
All the changes were done by hand.
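For illustration, the kind of site that needed a `noqa` (made-up snippet, not a line from the diff): an import referenced only from a `# type:` comment, which flake8-3 flags as unused while flake8-2 does not.
```
from typing import List  # noqa: F401  (only used in the "# type:" comment below)

def squares(n):
    # type: (int) -> List[int]
    return [i * i for i in range(n)]
```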
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: D14687478
fbshipit-source-id: 30d532381e914091aadfa0d2a5a89404819663e3
Summary:
Indices in `Subset` were previously stored as tensors; they are now passed as a list in `random_split` to ensure integer indexing.
fixes: #17466
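A small sketch of the behavior after the fix (toy dataset): `Subset.indices` is now a plain Python list, so integer indexing works.
```
import torch
from torch.utils.data import TensorDataset, random_split

dataset = TensorDataset(torch.arange(10).float())
train, val = random_split(dataset, [8, 2])

# Indices are a Python list of ints rather than a tensor, so plain integer
# indexing into the underlying dataset works as expected.
print(train.indices[:3])
print(train[0])
```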
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17649
Differential Revision: D14400250
Pulled By: soumith
fbshipit-source-id: cd20a959f33773c4babf8e861ea37ec61c2713a0
Summary:
Currently, when you pass a negative index to a `Dataset` created with `ConcatDataset`, it simply passes that index to the first dataset in the list. So if, for example, we took `concatenated_dataset[-1]`, this will give us the last entry of the *first* dataset, rather than the last entry of the *last* dataset, as we would expect.
This is a simple fix to support the expected behavior for negative indices.
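Roughly, the fix amounts to normalizing a negative index against the total length before locating the constituent dataset (a sketch, not the exact diff):
```
import bisect

def concat_getitem(datasets, cumulative_sizes, idx):
    # Normalize a negative index against the total length first, so that
    # concatenated[-1] maps to the last entry of the *last* dataset.
    if idx < 0:
        if -idx > cumulative_sizes[-1]:
            raise ValueError("absolute value of index should not exceed dataset length")
        idx = cumulative_sizes[-1] + idx
    dataset_idx = bisect.bisect_right(cumulative_sizes, idx)
    sample_idx = idx if dataset_idx == 0 else idx - cumulative_sizes[dataset_idx - 1]
    return datasets[dataset_idx][sample_idx]

# Two datasets of lengths 3 and 2, so cumulative_sizes = [3, 5]:
print(concat_getitem([[10, 11, 12], [20, 21]], [3, 5], -1))   # -> 21
```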
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15756
Reviewed By: ezyang
Differential Revision: D14081811
Pulled By: fmassa
fbshipit-source-id: a7783fd3fd9e1a8c00fd076c4978ca39ad5a8a2a
Summary:
Renewed attempt at https://github.com/pytorch/pytorch/pull/14171
From the original PR:
> Currently, the pin_memory_batch function in the dataloader will return a batch comprised of any unrecognized type without pinning the data, because it doesn't know how.
>
> This behavior was preventing us from overlapping data prefetching in Mask-RCNN, whose custom collate_fn returns a custom batch type.
The old PR allowed the user to implement batch pinning for custom batch and data types by passing a custom pin function to the dataloader. slayton58 suggested a cleaner approach: allow the user to define a `pin_memory` method on their custom types, and have `pin_memory_batch` [check for the presence of that method](https://github.com/pytorch/pytorch/pull/16743/files#diff-9f154cbd884fe654066b1621fad654f3R56) in the incoming batch as a fallback. I've updated the test and docstrings accordingly.
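For example, the pattern from the updated docstrings (illustrative class name; actually pinning the tensors requires a CUDA-enabled build):
```
import torch
from torch.utils.data import DataLoader, TensorDataset

class SimpleCustomBatch:
    def __init__(self, batch):
        transposed = list(zip(*batch))
        self.inp = torch.stack(transposed[0], 0)
        self.tgt = torch.stack(transposed[1], 0)

    def pin_memory(self):
        # pin_memory_batch() finds and calls this method on the custom type.
        self.inp = self.inp.pin_memory()
        self.tgt = self.tgt.pin_memory()
        return self

def collate_wrapper(batch):
    return SimpleCustomBatch(batch)

dataset = TensorDataset(torch.arange(10).float().view(5, 2), torch.arange(5).float())
loader = DataLoader(dataset, batch_size=2, collate_fn=collate_wrapper, pin_memory=True)
```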
The old PR was merged but then reverted due to weird CUDA OOM errors on Windows that may or may not have been related. I have no idea why my changes would cause such errors (then or now), but it's something to keep an eye out for.
fmassa and yf225 were my POCs on the old PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16743
Differential Revision: D13991745
Pulled By: ezyang
fbshipit-source-id: 74e71f62a03be453b4caa9f5524e9bc53467fa17
Summary:
1. Improve the error message for better debugging info.
2. Increase the timeout.
3. Also apply the Windows worker-failure detection mechanism on non-Windows platforms, for better robustness.
Attempt to fix #14501.
cc ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16249
Differential Revision: D13784702
Pulled By: ezyang
fbshipit-source-id: 09a7cff83ab9edce561ed69f9fb555ab35d1275f
Summary:
Since #1323, tensors are shared via shared memory, but this feature is not active for numpy arrays.
This PR fixes that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14534
Differential Revision: D13561649
Pulled By: soumith
fbshipit-source-id: b6bc9e99fb91e8b675c2ef131fba9fa11c1647c0
Summary:
Same as #14668, and was approved there.
ailzhang, please apply this patch to Horizon's `data_streamer.py`: https://gist.github.com/SsnL/020fdb3d6b7016d81b6ba1d04cc41459 Thank you!
Below is the original description at #14668:
As I am working on tasks in https://github.com/pytorch/pytorch/issues/13023, I realized how unreadable the code is, because all functions to be run in multiprocessing must be at the top global level. Adding more functionality to `dataloader.py` will only make things worse.
So in this PR, I refactor `dataloader.py` and move much of it into `data._utils`. E.g., the `_worker_loop` and related methods are now in `data._utils.worker`, signal handling code in `data._utils.signal_handling`, collating code in `data._utils.collate`, etc. This split, IMHO, makes the code much clearer. I will base my future changes to DataLoader on top of this.
No functionality is changed, except that I added `torch._six.queue`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15331
Reviewed By: yf225
Differential Revision: D13503120
Pulled By: ailzhang
fbshipit-source-id: 94df16b4d80ad1102c437cde0d5a2e62cffe1f8e
Summary:
As I am working on tasks in https://github.com/pytorch/pytorch/issues/13023, I realized how unreadable the code is, because all functions to be run in multiprocessing must be at the top global level. Adding more functionality to `dataloader.py` will only make things worse.
So in this PR, I refactor `dataloader.py` and move much of it into `data._utils`. E.g., the `_worker_loop` and related methods are now in `data._utils.worker`, signal handling code in `data._utils.signal_handling`, collating code in `data._utils.collate`, etc. This split, IMHO, makes the code much clearer. I will base my future changes to DataLoader on top of this.
No functionality is changed, except that I added `torch._six.queue`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14668
Reviewed By: soumith
Differential Revision: D13289919
Pulled By: ailzhang
fbshipit-source-id: d701bc7bb48f5dd7b163b5be941a9d27eb277a4c
Summary:
Currently, the `pin_memory_batch` function in the dataloader will return a batch comprised of any unrecognized type without pinning the data, because it doesn't know how.
This behavior was preventing us from overlapping data prefetching in Mask-RCNN, whose custom `collate_fn` returns a custom batch type.
The present PR adds the ability for the user to pass a `pin_fn` alongside any custom `collate_fn` to handle such custom types.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14171
Differential Revision: D13166669
Pulled By: soumith
fbshipit-source-id: ca965f9841d4a259b3ca4413c8bd0d8743d433ab
Summary:
I struggled with yet another DataLoader hang for the entire evening. After numerous experiments, I realized that it is unsafe to do anything when Python is shutting down. We also, unfortunately, implement our DataLoader clean-up logic in `__del__`, a function that may or may not be called during shutdown, and, if called, may or may not be called before core library resources are freed.
Fortunately, we are already setting all our workers and pin_memory_thread as daemonic. So in case of Python shutting down, we can just do a no-op in `__del__` and rely on the automatic termination of daemonic children.
An `atexit` hook is used to detect Python exit.
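A stripped-down sketch of the pattern (illustrative names, not the actual DataLoader code): an `atexit` hook flips a module-level flag, and `__del__` becomes a no-op once the interpreter is shutting down, leaving the daemonic workers to be terminated automatically.
```
import atexit

_python_exit_status = False

def _set_python_exit_flag():
    global _python_exit_status
    _python_exit_status = True

atexit.register(_set_python_exit_flag)

class _LoaderIter(object):
    def _shutdown_workers(self):
        pass   # signal workers, join queues, etc.

    def __del__(self):
        # During interpreter shutdown, the machinery needed here may already
        # be gone, so do nothing and rely on daemonic workers/threads dying
        # with the main process.
        if _python_exit_status:
            return
        self._shutdown_workers()
```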
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12700
Differential Revision: D10419027
Pulled By: SsnL
fbshipit-source-id: 5753e70d03e69eb1c9ec4ae2154252d51e2f79b0
Summary:
Modifies the DistributedSampler logic. Now each process samples elements at a fixed stride, instead of taking a consecutive section.
This eliminates the possibility that the DataLoader uses padded data while dropping real data. That happens when:
1. DistributedSampler pads the data; and
2. the DataLoader's `drop_last` is effectively true and it drops fewer samples than the number of padded ones.
In the example below, data points (10, 11, 12) are padded in by duplicating samples (1, 2, 3).
The old sampler drops legitimate original data (3, 6, 9) and introduces duplicates (10, 11) into the training set, while the new sampler logic samples the correct data points from the dataset.
This example has been added to the dataloader unit test.
Example:
```
data after shuffle:  1, 2, 3, 4, 5, 6, 7, 8, 9
padded data:         1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
old sampler:            -> DataLoader with (batch_size=2 and drop_last=True)
p 1:  1,  2,  3          1, 2
p 2:  4,  5,  6          4, 5
p 3:  7,  8,  9          7, 8
p 4: 10, 11, 12         10, 11
new sampler:            ->
p 1:  1,  5,  9          1, 5
p 2:  2,  6, 10          2, 6
p 3:  3,  7, 11          3, 7
p 4:  4,  8, 12          4, 8
```
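In code, the change boils down to how each rank slices the padded index list, roughly (a sketch with a hypothetical helper):
```
def shard_indices(indices, rank, num_replicas):
    # Old behavior: consecutive blocks, i.e. indices[rank * n : (rank + 1) * n].
    # New behavior: a strided slice, so each rank takes every num_replicas-th index.
    return indices[rank::num_replicas]

padded = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]   # 10-12 are duplicated padding
for rank in range(4):
    print(rank + 1, shard_indices(padded, rank, 4))
# p 1 -> [1, 5, 9], p 2 -> [2, 6, 10], p 3 -> [3, 7, 11], p 4 -> [4, 8, 12]
```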
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12474
Differential Revision: D10260410
Pulled By: SsnL
fbshipit-source-id: 710856571260f42ce25955b81a5b8008e04938cf
Summary:
Current behavior is that each process (main and workers) will print a traceback from `KeyboardInterrupt`. And the main process will also print
```
RuntimeError: DataLoader worker (pid 46045) exited unexpectedly with exit code 1. Details are lost due to multiprocessing. Rerunning with num_workers=0 may give better error trace.
```
due to our SIGCLD handler.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11718
Differential Revision: D9840844
Pulled By: SsnL
fbshipit-source-id: 1a05060bb02907fef5aac3f274d2c84f9f42d187
Summary:
The old `torch.distributed` will go to `torch.distributed.deprecated`
The old DDP will go to `torch.nn.parallel.deprecated`
Now `torch.nn.parallel.DDP` will use c10d DDP
Now `torch.distributed` will use C10d frontend API
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11405
Reviewed By: pietern
Differential Revision: D9733733
Pulled By: teng-li
fbshipit-source-id: d6a3f3e73f8d3a7fcb1f4baef53c78063b8cbb08