Commit Graph

187 Commits

PyTorch MergeBot
3d7428d9ac Revert "[lint] upgrade mypy to latest version"
This reverts commit 9bf18aab94.

Reverted https://github.com/pytorch/pytorch/pull/76753 on behalf of https://github.com/suo
2022-05-03 20:01:18 +00:00
Michael Suo
9bf18aab94 [lint] upgrade mypy to latest version
Fixes https://github.com/pytorch/pytorch/issues/75927.

Had to fix some bugs and add some ignores.

To check if clean:
```
lintrunner --paths-cmd='git grep -Il .' --take MYPY,MYPYSTRICT
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76753

Approved by: https://github.com/malfet
2022-05-03 19:43:28 +00:00
Erjia Guan
0289ab2cec Fix data-related public API (#368)
Summary:
X-link: https://github.com/pytorch/data/pull/368

This PR aims to expose the right data-related API.

There are two more changes in this PR that convert public APIs to private ones:
`check_lambda_fn` -> `_check_lambda_fn`
`deprecation_warning` -> `_deprecation_warning`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76143

Reviewed By: albanD, NivekT

Differential Revision: D35798311

Pulled By: ejguan

fbshipit-source-id: b13fded5c88a533c706702fb2070c918c839dca4
(cherry picked from commit 0b534b829a2e90e1e533951c6d334fdeaa9358b9)
2022-04-21 17:27:05 -07:00
Jeeja
45bbc4c028 Update Dataloader with default parameter device (#65402)
Summary:
pin_memory has an optional device parameter to specify
which device to pin memory for. Even with that change,
the DataLoader works only for the CUDA backend. To add
support for other backends that support pinned memory,
the DataLoader is updated to take device as an optional parameter.

Fixes #{issue number}
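A hedged usage sketch; `pin_memory_device` is the kwarg name this landed under in later releases, so treat it as an assumption for this commit:
```py
import torch
from torch.utils.data import DataLoader, TensorDataset

ds = TensorDataset(torch.randn(8, 3))
# pin_memory still defaults to pinning for CUDA; the optional device
# string lets other backends with pinned-memory support opt in.
dl = DataLoader(ds, batch_size=4, pin_memory=True, pin_memory_device="cuda")
```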

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65402

Reviewed By: zou3519

Differential Revision: D32282204

Pulled By: VitalyFedyunin

fbshipit-source-id: e2e09876969af108d0db38af7c2d1b2f1cfa9858
(cherry picked from commit 3b76e151964fce442e27fe8fb5c37af930da4fa1)
2022-04-21 01:33:53 +00:00
Philip Meier
04db1b874f prevent overriding shuffle settings in DataLoader for datapipes
Fixes https://github.com/pytorch/data/issues/295

Follow-up to https://github.com/pytorch/pytorch/pull/75014#issuecomment-1091921305. We only need to update locations where we actually check `shuffle` for identity with a boolean value, i.e. `shuffle is False`. For bool-ish checks like `if shuffle:`, `None` behaves just like `False`.
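A minimal sketch of the distinction being relied on here (plain Python):

```py
shuffle = None

# Bool-ish check: `None` behaves just like `False`...
if shuffle:
    raise AssertionError("not reached for None or False")

# ...but an identity check can tell them apart, letting `None` mean
# "no explicit setting" while `False` means "explicitly disabled".
assert shuffle is not False
```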

`IterDataPipe`'s are currently not mentioned in the docstring. Since this change only applies to them, I didn't update it. LMK, if I should do that.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75505
Approved by: https://github.com/ejguan
2022-04-12 18:26:33 +00:00
Philip Meier
3c10987692 don't add extra shuffle in DataLoader2 if one is present
Without this, `DataLoader2` will just add a `Shuffler` to the end of the datapipe if `shuffle=True`:

```py
from torch.utils.data.dataloader_experimental import DataLoader2

from torchdata.datapipes.iter import IterableWrapper, IterDataPipe, Shuffler

class Sorter(IterDataPipe):
    def __init__(self, datapipe):
        self.datapipe = datapipe

    def __iter__(self):
        return iter(sorted(self.datapipe))

data = list(range(1000))
dp = IterableWrapper(data)
dp = Shuffler(dp).set_shuffle(False)
dp = Sorter(dp)

dl2 = DataLoader2(dp, shuffle=True, batch_size=None)

assert list(dl2) == data  # fails unless you hit a lucky random seed
```

This example is somewhat nonsensical, but it demonstrates that we cannot simply add a `Shuffler`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75014
Approved by: https://github.com/ejguan
2022-04-05 19:53:08 +00:00
amin-nejad
cce831c805 Fix misleading DataLoader docstring
Fixes description of `prefetch_factor` argument to `DataLoader` as discussed in #58030
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74558
Approved by: https://github.com/NivekT
2022-03-28 17:54:48 +00:00
Evren Tumer
7534525735 Reset worker cycle iterator for determinism across runs (#73675)
Summary:
Reset worker cycle iterator for determinism across runs

Fixes https://github.com/pytorch/pytorch/issues/73603

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73675

Reviewed By: bdhirsh

Differential Revision: D34688704

Pulled By: ejguan

fbshipit-source-id: 7bab11f0b9f59645d9b168fa11d92dc7c2c4d34e
(cherry picked from commit eb5fd559224988f9967528e154cf37c5031fe7c2)
2022-03-09 14:55:07 +00:00
Erjia Guan
67a275c293 Fix persistent worker exits before pin_memory thread (#71579)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71579

Fixes #1551

As the comment in the code explains, register a function to terminate persistent workers.
By holding a reference to these workers in `atexit`, we prevent the Python interpreter from killing these persistent worker processes before the `pin_memory_thread` exits.
And if users explicitly kill the DataLoader iterator, the function registered in `atexit` becomes a no-op.
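A minimal sketch of the mechanism, with hypothetical helper names (illustrative only, not the actual loader internals):
```py
import atexit
import multiprocessing as mp

def _terminate_worker(w: mp.Process) -> None:
    # No-op if the iterator already shut the worker down explicitly.
    if w.is_alive():
        w.terminate()
        w.join()

def _spawn_persistent_worker(target) -> mp.Process:
    w = mp.Process(target=target, daemon=True)
    w.start()
    # Registering (and thus referencing) the worker in atexit keeps the
    # interpreter from reaping it before dependent threads have exited.
    atexit.register(_terminate_worker, w)
    return w
```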

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D33896537

Pulled By: ejguan

fbshipit-source-id: 36b57eac7523d8aa180180c2b61fc693ea4638ae
(cherry picked from commit 05add2ae0f)
2022-02-01 23:57:17 +00:00
Nikita Shulga
86aefdc082 Revert D33694867: Fix persistent worker exits before pin_memory thread
Test Plan: revert-hammer

Differential Revision:
D33694867 (e2191e7084)

Original commit changeset: 0847f4d424a0

Original Phabricator Diff: D33694867 (e2191e7084)

fbshipit-source-id: 5f28616700d8647cbe468a9e300724a7f0c6cc15
(cherry picked from commit 3d8125ba6d)
2022-01-22 00:09:28 +00:00
Erjia Guan
e2191e7084 Fix persistent worker exits before pin_memory thread (#71579)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71579

Fixes #1551

As the comment in the code explains, register a function to terminate persistent workers. Using `atexit` makes sure termination of persistent workers always happens at the end (after pin_memory_thread exits).
We need such a mechanism because the Python interpreter may clean up worker processes before the DataLoader iterator in some rare cases.

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D33694867

Pulled By: ejguan

fbshipit-source-id: 0847f4d424a0cd6b3c0be8235d505415970254e8
(cherry picked from commit 18ad4621af)
2022-01-21 20:31:16 +00:00
Erjia Guan
0721fc6474 Decouple MapDataPipe from Dataset (#70991)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70991

Test Plan: Imported from OSS

Reviewed By: dagitses

Differential Revision: D33477680

Pulled By: ejguan

fbshipit-source-id: d3e89492e921a96791319f35052a229684ddf7cf
2022-01-07 14:28:41 -08:00
Kevin Tse
b67eaec853 [DataLoader] more clearly expose 'default_collate' and 'default_convert' to users (#69862)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69862

Fixes #69445

cc SsnL VitalyFedyunin ejguan NivekT

Test Plan: Imported from OSS

Reviewed By: ejguan, ngimel

Differential Revision: D33068792

Pulled By: NivekT

fbshipit-source-id: ef9791acdc23d014b8761fa7420062d454ce8969
2021-12-14 11:18:26 -08:00
Vitaly Fedyunin
d90012689f [DataPipe] Control shuffle settings from DataLoader2 (#65756)
Summary:
Makes `shuffle` DataPipe sensitive to DataLoader(2) `shuffle` kwarg.
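A hedged usage sketch, assuming a torch version where the loader propagates its `shuffle` kwarg into the datapipe graph:
```py
from torch.utils.data import DataLoader
from torch.utils.data.datapipes.iter import IterableWrapper

dp = IterableWrapper(range(10)).shuffle()  # Shuffler sits in the graph
# The Shuffler is toggled by the loader's kwarg instead of being hard-coded:
dl = DataLoader(dp, batch_size=None, shuffle=False)
print(list(dl))  # expected: [0, 1, ..., 9] once shuffle=False is honored
```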

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65756

Reviewed By: albanD

Differential Revision: D31344867

Pulled By: VitalyFedyunin

fbshipit-source-id: e0084e0ac193ac784d6298328ca1222745681347
2021-12-14 07:35:26 -08:00
Erjia Guan
060e41eafa Forward fix type hint for DataLoader (#66001)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66001

Test Plan: Imported from OSS

Reviewed By: NivekT

Differential Revision: D31340565

Pulled By: ejguan

fbshipit-source-id: d05ae42ebf93f61d781dc5d81ef0222e24f5acb3
2021-10-01 15:48:45 -07:00
Michael Suo
21da6ae9ce suppress mypy error (#66003)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66003

Differential Revision:
D31340874
D31340874

Test Plan: Imported from OSS

Reviewed By: seemethere

Pulled By: suo

fbshipit-source-id: d9ef0f40625fe5ff21f8a5e044d5a75400367dc2
2021-10-01 09:17:42 -07:00
Roman Shapovalov
fc52f1293e Improve pytorch type hints (Dataloader, trig functions)
Summary:
This is to fix Pyre errors in our applications:
* calling `tensor.cos()` etc.
* creating a data loader with batch sampler that is `List[List[int]]`.

Test Plan: TODO: rebase the diffs and run Pyre.

Reviewed By: ejguan

Differential Revision: D31309564

fbshipit-source-id: 1c6f3070d7570260de170e2fe2153d277b246745
2021-10-01 06:53:57 -07:00
Adam J. Stewart
e5ab0d1013 DataLoader: allow non-integer Samplers (#63500)
Summary:
Not entirely sure how to use TypeVar but if someone could give me a hint it would be appreciated. Also let me know if you want me to add tests so we can make sure non-integer samplers actually work. It seems like `test/test_dataloader.py` is the correct location but that's a big file.

Fixes https://github.com/pytorch/pytorch/issues/63483
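A hedged sketch of what a non-integer sampler can look like once `Sampler` is generic (the class here is illustrative):

```py
from typing import Iterator, List
from torch.utils.data import Sampler

class GroupSampler(Sampler[List[int]]):
    """Yields lists of indices rather than single ints."""
    def __init__(self, groups: List[List[int]]) -> None:
        self.groups = groups

    def __iter__(self) -> Iterator[List[int]]:
        return iter(self.groups)

    def __len__(self) -> int:
        return len(self.groups)

# Usable wherever a batch sampler is expected, e.g.:
# DataLoader(ds, batch_sampler=GroupSampler([[0, 1], [2, 3]]))
```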

ejguan

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63500

Reviewed By: mruberry

Differential Revision: D30403689

Pulled By: ejguan

fbshipit-source-id: 464e09e5aad3215b94a29cc5e21cb4b10ec136e3
2021-08-19 14:55:46 -07:00
Victor Bittorf
91c076eadc Add TorchVitals for DataLoader (#60959)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60959

Add TorchVitals for DataLoader; this indicates that the data loader was enabled.

This is a no-op if TORCH_VITALS environment variable is not set.

Test Plan: buck test mode/dbg caffe2/test:torch -- --regex vitals

Reviewed By: VitalyFedyunin

Differential Revision: D29445146

fbshipit-source-id: d5778fff3dafb3c0463fec7a498bff4905597518
2021-06-29 14:08:32 -07:00
Philip Meier
d5988c5eca remove unused type: ignore directives (#60006)
Summary:
During development it is common practice to put `type: ignore` comments on lines that are correct, but that `mypy` doesn't recognize as such. This often stems from the fact that the `mypy` version in use wasn't able to handle the pattern.

With every new release `mypy` gets better at handling complex code. In addition to fixing all the previously accepted but now failing patterns, we should also revisit all `type: ignore` comments to see if they are still needed. Fortunately, we don't need to do this manually: by adding `warn_unused_ignores = True` to the configuration, `mypy` will error out whenever it encounters a `type: ignore` that is no longer needed.
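A small illustration of what the flag catches (run with `mypy --warn-unused-ignores`, the CLI form of the same option); the lines here are hypothetical:

```py
x: int = 1  # type: ignore  # stale: the assignment now type-checks cleanly,
# so mypy reports: Unused "type: ignore" comment

y: int = "oops"  # type: ignore  # still needed: suppresses a real error
```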

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60006

Reviewed By: jbschlosser, malfet

Differential Revision: D29133237

Pulled By: albanD

fbshipit-source-id: 41e82edc5cd5affa7ccedad044b59b94dad4425a
2021-06-18 07:23:31 -07:00
Erjia Guan
8cf85a1152 [DataLoader][doc] Randomness for base_seed generator and NumPy seed (#56528)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56528

Tried to search across internal and external usage of DataLoader. People haven't started to use `generator` for `DataLoader`.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D27908487

Pulled By: ejguan

fbshipit-source-id: 14c83ed40d4ba4dc988b121968a78c2732d8eb93
2021-04-22 09:40:45 -07:00
Erjia Guan
aec83ff45e [DataLoader] Add Numpy seeding to worker of DataLoader (#56488)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56488

Considering the number of requests for this feature, introduce NumPy seeding as the default within each worker of the DataLoader.

## BC-breaking Note:
- By introducing a default numpy.random seeding strategy for DataLoader workers, users no longer need to manually set the seed for workers via `worker_init_fn` (a sketch of that manual recipe follows this list). This PR does not affect users who currently use `worker_init_fn` to set a customized seed for workers.
- DataLoader will preserve reproducibility for users who use numpy.random within their Dataset.
- Multiprocessing (without `worker_init_fn` to define a seed for NumPy):
  - Start method `spawn`: Each worker now has a seed for NumPy's RNG, rather than a seed derived from the import time of the NumPy module, which made the DataLoader lose reproducibility.
  - Start method `fork`: Each worker not only gets the same benefit as with `spawn`, but also has a different NumPy seed by default, rather than inheriting the same seed.
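For reference, a minimal sketch (assuming NumPy is used inside the dataset) of the manual per-worker seeding that `worker_init_fn` users wrote before this change:

```py
import numpy as np
import torch

def worker_init_fn(worker_id: int) -> None:
    # Derive a distinct NumPy seed per worker from the torch-provided
    # base seed; this is the recipe the new default makes unnecessary.
    np.random.seed(torch.initial_seed() % 2**32)
```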

Using the following Dataset and script as an example:
```py
class RandomDataset(Dataset):
    def __getitem__(self, ind):
        item = [ind, np.random.randint(1, 10000)]
        return item

    def __len__(self):
        return 20

if __name__ == '__main__':
    ctx = mp.get_context('fork')
    ds = RandomDataset()
    g = torch.Generator()
    g.manual_seed(0)
    dl = DataLoader(ds, 2, shuffle=False, num_workers=4, multiprocessing_context=ctx, generator=g)

    epochs = 2
    for _ in range(epochs):
        for batch in dl:
            print(batch)
        print("====" * 10)
```

### 1.8.1:
Each worker generates the same random result per iteration, and the seed is reset to the same value for each epoch.
```py
tensor([[   0, 7449],
        [   1, 1519]])
tensor([[   2, 7449],
        [   3, 1519]])
tensor([[   4, 9645],
        [   5, 2387]])
tensor([[   6, 9645],
        [   7, 2387]])
tensor([[   8, 3118],
        [   9, 4552]])
=========================
tensor([[   0, 7449],
        [   1, 1519]])
tensor([[   2, 7449],
        [   3, 1519]])
tensor([[   4, 9645],
        [   5, 2387]])
tensor([[   6, 9645],
        [   7, 2387]])
tensor([[   8, 3118],
        [   9, 4552]])
=========================
```

### This PR:
Each worker has a different seed at the beginning and is re-seeded for each epoch.
```py
tensor([[   0, 8715],
        [   1, 5555]])
tensor([[   2, 6379],
        [   3, 1432]])
tensor([[   4, 3271],
        [   5, 5132]])
tensor([[   6, 4287],
        [   7, 1104]])
tensor([[   8, 8682],
        [   9, 1699]])
=========================
tensor([[   0, 1374],
        [   1,  996]])
tensor([[   2,  143],
        [   3, 3507]])
tensor([[   4, 5887],
        [   5, 4730]])
tensor([[   6, 7274],
        [   7,  738]])
tensor([[   8, 6374],
        [   9, 1572]])
=========================
```

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D27908486

Pulled By: ejguan

fbshipit-source-id: 5f313a30563bedeb88be214fa4beca0cefe9e4f4
2021-04-22 09:39:33 -07:00
Sam Estep
75024e228c Add lint for unqualified type: ignore (#56290)
Summary:
The other half of https://github.com/pytorch/pytorch/issues/56272.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56290

Test Plan:
CI should pass on the tip of this PR, and we know that the lint works because the following CI runs (before this PR was finished) failed:

- https://github.com/pytorch/pytorch/runs/2384511062
- https://github.com/pytorch/pytorch/actions/runs/765036024

Reviewed By: seemethere

Differential Revision: D27867219

Pulled By: samestep

fbshipit-source-id: e648f07b6822867e70833e23ddafe7fb7eaca235
2021-04-21 08:07:23 -07:00
Zhiyuan Chen
7d4e9bdba1 Add type hint for SequentialSampler (#56374)
Summary:
Add type hint for SequentialSampler

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56374

Reviewed By: heitorschueroff

Differential Revision: D27884528

Pulled By: ejguan

fbshipit-source-id: 68eb900643098565743245c843e76e464f981458
2021-04-20 14:45:52 -07:00
danielgordon10
7f1693d95e Fix type hints of the callable arguments for DataLoader (#52924)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/52806

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52924

Reviewed By: malfet

Differential Revision: D26694894

Pulled By: ejguan

fbshipit-source-id: 55734ec9684caa90f1e599b65659b7c57047f802
2021-02-27 07:45:49 -08:00
Chester Liu
58eb23378f Clean up usage of torch._six partially (#49785)
Summary:
See https://github.com/pytorch/pytorch/issues/42919

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49785

Reviewed By: mruberry

Differential Revision: D25963833

Pulled By: bugra

fbshipit-source-id: 11c90d6b8d3f206c9d0a4d8621b773beb10c6ba2
2021-02-08 13:58:34 -08:00
Tongzhou Wang
54ce171f16 Fix persistent_workers + pin_memory (#48543)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48370 https://github.com/pytorch/pytorch/issues/47445

cc emcastillo who authored the original functionality.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48543

Reviewed By: bdhirsh

Differential Revision: D25277474

Pulled By: ejguan

fbshipit-source-id: 1967002124fb0fff57caca8982bc7df359a059a2
2021-01-08 07:04:10 -08:00
Hugo van Kemenade
473e78c0fa Remove redundant code for unsupported Python versions (#49486)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49486

Remove code for Python 3.5 and lower.

There's more that can be removed/modernised, but sticking mainly to redundant version checks here, to keep the diff/PR smaller.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46579

Reviewed By: zou3519

Differential Revision: D24453571

Pulled By: ezyang

fbshipit-source-id: c2cfcf05d6c5f65df64d89c331692c9aec09248e
2021-01-06 12:45:46 -08:00
Samuel Marks
e6779d4357 [*.py] Rename "Arguments:" to "Args:" (#49736)
Summary:
I've written custom parsers and emitters for everything from docstrings to classes and functions. However, I recently came across an issue when I was parsing/generating from the TensorFlow codebase: inconsistent use of `Args:` and `Arguments:` in its docstrings.

```sh
(pytorch#c348fae)$ for name in 'Args:' 'Arguments:'; do
    printf '%-10s %04d\n' "$name" "$(rg -IFtpy --count-matches "$name" | paste -s -d+ -- | bc)"; done
Args:      1095
Arguments: 0336
```

It is easy enough to extend my parsers to support both variants; however, it looks like `Arguments:` is wrong anyway, as per:

  - https://google.github.io/styleguide/pyguide.html#doc-function-args @ [`ddccc0f`](https://github.com/google/styleguide/blob/ddccc0f/pyguide.md)

  - https://chromium.googlesource.com/chromiumos/docs/+/master/styleguide/python.md#describing-arguments-in-docstrings @ [`9fc0fc0`](https://chromium.googlesource.com/chromiumos/docs/+/9fc0fc0/styleguide/python.md)

  - https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html @ [`c0ae8e3`](https://github.com/sphinx-contrib/napoleon/blob/c0ae8e3/docs/source/example_google.rst)

Therefore, only `Args:` is valid. This PR replaces them throughout the codebase.

PS: For related PRs, see tensorflow/tensorflow/pull/45420

PPS: The trackbacks automatically appearing below are sending the same changes to other repositories in the [PyTorch](https://github.com/pytorch) organisation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49736

Reviewed By: albanD

Differential Revision: D25710534

Pulled By: soumith

fbshipit-source-id: 61e8ff01abb433e9f78185c2d1d0cbd7c22c1619
2020-12-28 09:34:47 -08:00
Tom McClintock
a3aafea076 Fixed a typo in dataloader.py. (#49437)
Summary:
This small PR fixes a one character typo in the docstring for `DataLoader`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49437

Reviewed By: ngimel

Differential Revision: D25665971

Pulled By: mrshenli

fbshipit-source-id: b60f975f1e3bf0bb8f88e39f490f716c602f087e
2020-12-21 10:27:24 -08:00
Teng Gao
1c31f76297 Add high level profiling trace for dataloading and optimizer (#47655)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/47441

To give user more information about python level functions in profiler traces, we propose to instrument on the following functions:

```
_BaseDataLoaderIter.__next__
Optimizer.step
Optimizer.zero_grad
```

Because the record_function already uses if (!active) to check whether the profiler is enabled, so we don't explicitly call torch.autograd._profiler_enabled() before each instrument.
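For illustration, a hedged sketch of the pattern using the public `record_function` API (the wrapper class here is illustrative, not the actual call site):

```py
from torch.profiler import record_function

class _InstrumentedIter:
    """Wraps an iterator so each `__next__` shows up as a named range."""
    def __init__(self, it):
        self._it = it

    def __iter__(self):
        return self

    def __next__(self):
        # record_function is effectively free when no profiler is active,
        # which is why no explicit enabled-check is needed.
        with record_function("_BaseDataLoaderIter.__next__"):
            return next(self._it)
```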

Acknowledgement: nbcsm, guotuofeng, gunandrose4u , guyang3532 , mszhanyi

Pull Request resolved: https://github.com/pytorch/pytorch/pull/47655

Reviewed By: smessmer

Differential Revision: D24960386

Pulled By: ilia-cher

fbshipit-source-id: 2eb655789e2e2f506e1b8f95ad3d470c83281102
2020-12-09 00:13:56 -08:00
Tongzhou Wang
1112773cf5 Fix unintended error when worker force kill happens #43455 (#43462)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/43455

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43462

Reviewed By: bdhirsh

Differential Revision: D25277759

Pulled By: VitalyFedyunin

fbshipit-source-id: 0bb0d87374c0403853d71aac2c242374bfc7acf2
2020-12-02 21:42:16 -08:00
SsnL
4abca9067b Fix dataloader hang with large sampler (#48669)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48666

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48669

Reviewed By: zhangguanheng66

Differential Revision: D25255763

Pulled By: VitalyFedyunin

fbshipit-source-id: d06421f52bb1d00cdf8025f1a2ba0d1f9284731a
2020-12-02 09:07:30 -08:00
lixinyu
67b7e751e6 add warning if DataLoader is going to create an excessive number of threads (#46867)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46867

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D24545540

Pulled By: glaringlee

fbshipit-source-id: a3bef0d417e535b8ec0bb33f39cfa2308aadfff0
2020-10-30 07:54:23 -07:00
Vitaly Fedyunin
31ee5d8d8b Adding information how to control randomness with DataLoader (#45749)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45749

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D24088407

Pulled By: VitalyFedyunin

fbshipit-source-id: 398b73ec5e8c83000ebc692001da847fc0aaa48f
2020-10-12 16:57:58 -07:00
Emilio Castillo
5472426b9f Reset DataLoader workers instead of creating new ones (#35795)
Summary:
This PR needs discussion as it changes the behavior of `DataLoader`. It can be closed if it's not considered good practice.

Currently, the `DataLoader` spawns a new `_BaseDataLoaderIter` object every epoch.
In the case of the multiprocess DataLoader, the worker processes are re-created every epoch and each makes a copy of the original `Dataset` object.
If users want to cache data or do some tracking on their datasets, all their data is wiped out every epoch. Notice that this doesn't happen when the number of workers is 0, giving some inconsistency between the multiprocess and serial data loaders.

This PR keeps the `_BaseDataLoaderIter` object alive and just resets it between epochs, so the workers remain active, and so do their own `Dataset` objects. People seem to file issues about this often.
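A usage sketch of the behavior this enables; `persistent_workers` is the kwarg the feature ultimately shipped under (stated as an assumption here):

```py
import torch
from torch.utils.data import DataLoader, Dataset

class CachingDataset(Dataset):
    def __init__(self):
        self.cache = {}  # only survives epochs if workers are kept alive

    def __len__(self):
        return 8

    def __getitem__(self, i):
        if i not in self.cache:
            self.cache[i] = torch.randn(3)
        return self.cache[i]

# Workers (and their Dataset copies, including `cache`) are reset between
# epochs instead of being re-created from scratch.
dl = DataLoader(CachingDataset(), num_workers=2, persistent_workers=True)
```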

Pull Request resolved: https://github.com/pytorch/pytorch/pull/35795

Reviewed By: ailzhang

Differential Revision: D23426612

Pulled By: VitalyFedyunin

fbshipit-source-id: e16950036bae35548cd0cfa78faa06b6c232a2ea
2020-09-01 11:48:00 -07:00
Akihiro Nitta
f17d7a5556 Fix exception chaining in torch/ (#43836)
Summary:
## Motivation
Fixes https://github.com/pytorch/pytorch/issues/43770.

## Description of the change
This PR fixes exception chaining only in files under `torch/` where appropriate.
To fix exception chaining, I used either:
1. `raise new_exception from old_exception` where `new_exception` itself seems not descriptive enough to debug or `old_exception` delivers valuable information.
2. `raise new_exception from None` where raising both of `new_exception` and `old_exception` seems a bit noisy and redundant.
I subjectively chose which one to use from the above options.

## List of lines containing raise in except clause:
I wrote [this simple script](https://gist.github.com/akihironitta/4223c1b32404b36c1b349d70c4c93b4d) using [ast](https://docs.python.org/3.8/library/ast.html#module-ast) to list lines where `raise`ing in `except` clause.

- [x] 000739c31a/torch/jit/annotations.py (L35)
- [x] 000739c31a/torch/jit/annotations.py (L150)
- [x] 000739c31a/torch/jit/annotations.py (L158)
- [x] 000739c31a/torch/jit/annotations.py (L231)
- [x] 000739c31a/torch/jit/_trace.py (L432)
- [x] 000739c31a/torch/nn/utils/prune.py (L192)
- [x] 000739c31a/torch/cuda/nvtx.py (L7)
- [x] 000739c31a/torch/utils/cpp_extension.py (L1537)
- [x] 000739c31a/torch/utils/tensorboard/_pytorch_graph.py (L292)
- [x] 000739c31a/torch/utils/data/dataloader.py (L835)
- [x] 000739c31a/torch/utils/data/dataloader.py (L849)
- [x] 000739c31a/torch/utils/data/dataloader.py (L856)
- [x] 000739c31a/torch/testing/_internal/common_utils.py (L186)
- [x] 000739c31a/torch/testing/_internal/common_utils.py (L189)
- [x] 000739c31a/torch/testing/_internal/common_utils.py (L424)
- [x] 000739c31a/torch/testing/_internal/common_utils.py (L1279)
- [x] 000739c31a/torch/testing/_internal/common_utils.py (L1283)
- [x] 000739c31a/torch/testing/_internal/common_utils.py (L1356)
- [x] 000739c31a/torch/testing/_internal/common_utils.py (L1388)
- [x] 000739c31a/torch/testing/_internal/common_utils.py (L1391)
- [ ] 000739c31a/torch/testing/_internal/common_utils.py (L1412)
- [x] 000739c31a/torch/testing/_internal/codegen/random_topo_test.py (L310)
- [x] 000739c31a/torch/testing/_internal/codegen/random_topo_test.py (L329)
- [x] 000739c31a/torch/testing/_internal/codegen/random_topo_test.py (L332)
- [x] 000739c31a/torch/testing/_internal/jit_utils.py (L183)
- [x] 000739c31a/torch/testing/_internal/common_nn.py (L4789)
- [x] 000739c31a/torch/onnx/utils.py (L367)
- [x] 000739c31a/torch/onnx/utils.py (L659)
- [x] 000739c31a/torch/onnx/utils.py (L892)
- [x] 000739c31a/torch/onnx/utils.py (L897)
- [x] 000739c31a/torch/serialization.py (L108)
- [x] 000739c31a/torch/serialization.py (L754)
- [x] 000739c31a/torch/distributed/rpc/_testing/faulty_agent_backend_registry.py (L76)
- [x] 000739c31a/torch/distributed/rpc/backend_registry.py (L260)
- [x] 000739c31a/torch/distributed/distributed_c10d.py (L184)
- [x] 000739c31a/torch/_utils_internal.py (L57)
- [x] 000739c31a/torch/hub.py (L494)
- [x] 000739c31a/torch/contrib/_tensorboard_vis.py (L16)
- [x] 000739c31a/torch/distributions/lowrank_multivariate_normal.py (L100)
- [x] 000739c31a/torch/distributions/constraint_registry.py (L142)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43836

Reviewed By: ailzhang

Differential Revision: D23431212

Pulled By: malfet

fbshipit-source-id: 5f7f41b391164a5ad0efc06e55cd58c23408a921
2020-08-31 20:26:23 -07:00
Ralf Gommers
bcab2d6848 Add type annotations for cpp_extension, utils.data, signal_handling (#42647)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42647

Reviewed By: ezyang

Differential Revision: D22967041

Pulled By: malfet

fbshipit-source-id: 35e124da0be56934faef56834a93b2b400decf66
2020-08-06 09:42:07 -07:00
yl-to
1b55e2b043 add prefetch_factor for multiprocessing prefetching process (#41130)
Summary:
fix https://github.com/pytorch/pytorch/issues/40604
Add a parameter to DataLoader to configure the per-worker prefetch count.
Before this edit, the prefetching process always prefetched 2 * num_workers data items; this commit makes that configurable, e.g., you can specify prefetching 10 * num_workers data items.
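Usage sketch of the new knob:
```py
import torch
from torch.utils.data import DataLoader, TensorDataset

ds = TensorDataset(torch.arange(100))
# Each of the 4 workers keeps up to 10 batches in flight (4 * 10 total),
# instead of the previous fixed 2 per worker.
dl = DataLoader(ds, batch_size=5, num_workers=4, prefetch_factor=10)
```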

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41130

Reviewed By: izdeby

Differential Revision: D22705288

Pulled By: albanD

fbshipit-source-id: 2c483fce409735fef1351eb5aa0b033f8e596561
2020-07-24 08:38:13 -07:00
SsnL
1922f2212a Make IterableDataset dataloader.__len__ warning clearer (#41175)
Summary:
Based on discussion with jlucier (https://github.com/pytorch/pytorch/pull/38925#issuecomment-655859195). The `batch_size` change isn't made because the data loader only has the notion of a `batch_sampler`, not a batch size. If `batch_size`-dependent sharding is needed, users can still access it from their own code.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41175

Differential Revision: D22456525

Pulled By: zou3519

fbshipit-source-id: 5281fcf14807f219de06e32107d5fe7d5b6a8623
2020-07-09 13:49:29 -07:00
Wojciech Baranowski
0e09511af9 type annotations for dataloader, dataset, sampler (#39392)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/38913

Pull Request resolved: https://github.com/pytorch/pytorch/pull/39392

Reviewed By: anjali411

Differential Revision: D22102489

Pulled By: zou3519

fbshipit-source-id: acb68d9521145f0b047214d62b5bdc5a0d1b9be4
2020-07-07 07:16:18 -07:00
Tongzhou Wang
019eeb3183 Kill DataLoader worker when we can't join (#39869)
Summary:
There are still occasional reports of DataLoader workers not exiting (e.g., https://github.com/pytorch/pytorch/issues/39570). Before we figure out why, we should just kill them if the join times out, to prevent hanging.
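A minimal sketch of the join-then-kill pattern (generic multiprocessing, not the actual loader internals):
```py
import multiprocessing as mp

def _shutdown(worker: mp.Process, timeout: float = 5.0) -> None:
    worker.join(timeout)
    if worker.is_alive():
        # The join timed out; kill the worker rather than hang forever.
        worker.terminate()
        worker.join()
```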
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39869

Differential Revision: D22018501

Pulled By: ezyang

fbshipit-source-id: 66a00d0f5b3e303b6106b336949176b3ff8ac8ae
2020-06-15 11:18:23 -07:00
ShawnZhong
c8c53c802e Add generator= kwarg for DataLoader & random samplers (#39737)
Summary:
Fix https://github.com/pytorch/pytorch/issues/39572

Add `generator=` kwarg for DataLoader & random samplers
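Usage sketch:

```py
import torch
from torch.utils.data import DataLoader, TensorDataset

ds = TensorDataset(torch.arange(10))
g = torch.Generator()
g.manual_seed(0)
# Shuffling order is reproducible and isolated from the global RNG state.
dl = DataLoader(ds, batch_size=2, shuffle=True, generator=g)
```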

cc: SsnL, deeppatel4557, albanD, mitar
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39737

Differential Revision: D22019132

Pulled By: albanD

fbshipit-source-id: 835e08b86c5396bc0b0e41057661306b15394d6e
2020-06-15 07:01:20 -07:00
Daiming Yang
0b90b9cdd3 Allow shuffle when auto-batching disabled in DataLoader (#39865)
Summary:
Fix https://github.com/pytorch/pytorch/issues/35761
cc SsnL

Note: closed the other PR for this new branch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39865

Differential Revision: D22003612

Pulled By: ezyang

fbshipit-source-id: 26aecd1b298fe99d3924f4c8157cd6cae2561c7c
2020-06-11 15:17:46 -07:00
Donna Choi
3d2fce6bc3 Change len(DataLoader) for IterableDataset (#38925)
Summary:
Fix https://github.com/pytorch/pytorch/issues/36176

One-liner change to ensure that ```len(loader) == (len(dataset) // batch_size)``` for IterableDataset.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38925

Differential Revision: D21731587

Pulled By: ezyang

fbshipit-source-id: 59a086165a004c0c1c8a1ee0776b1444bd26de23
2020-05-27 11:56:41 -07:00
SsnL
b5868b2833 Relax sampler check in BatchSampler (#38403)
Summary:
Since the check was added in https://github.com/pytorch/pytorch/pull/6249, one cannot pass an iterable as a sampler to the data loader anymore, which was a very handy feature (e.g., https://github.com/pytorch/pytorch/issues/1337). I think the check should be removed for two reasons:
1. It is too strict. There is no reason that it should not be a general iterable.
2. It is inconsistent. In `DataLoader` (the main place where people use samplers), you can pass a general iterable as `batch_sampler` but not as `sampler`, due to this check (see the sketch below).
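A minimal sketch of the now-allowed usage:
```py
import torch
from torch.utils.data import DataLoader, TensorDataset

ds = TensorDataset(torch.arange(10))
# Any iterable of indices works as `sampler` once the check is relaxed,
# matching what `batch_sampler` already accepted.
dl = DataLoader(ds, sampler=[7, 3, 3, 0])
print([int(batch[0]) for batch in dl])  # [7, 3, 3, 0]
```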
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38403

Differential Revision: D21555958

Pulled By: soumith

fbshipit-source-id: c7267bb99a31edd8f2750689205d6edc5dab5cff
2020-05-13 22:24:29 -07:00
Wojciech Baranowski
69e3ee2d5f DataLoader: properly diagnose exceeding file descriptor limit (#34768)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/973

Common failure scenario:
* DataLoader creates workers and communicates with them through SHMs
* Workers send back through an AF_UNIX socket file descriptors to SHMs containing data
* The limit of open files gets fully used
* A FD gets stripped from a socket message coming back from a worker, without the worker knowing this.
* This causes a `RuntimeError: received 0 items of ancdata` in the standard `multiprocessing` package
* The exception is not handled by PyTorch and so is presented to the users.

After this change the user will see

```
Traceback (most recent call last):
  File "/home/wbaranowski/git/Quansight/pytorch/torch/utils/data/dataloader.py", line 761, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "/home/wbaranowski/miniconda3/envs/pytorch-cuda-dev/lib/python3.6/multiprocessing/queues.py", line 113, in get
    return _ForkingPickler.loads(res)
  File "/home/wbaranowski/git/Quansight/pytorch/torch/multiprocessing/reductions.py", line 294, in rebuild_storage_fd
    fd = df.detach()
  File "/home/wbaranowski/miniconda3/envs/pytorch-cuda-dev/lib/python3.6/multiprocessing/resource_sharer.py", line 58, in detach
    return reduction.recv_handle(conn)
  File "/home/wbaranowski/miniconda3/envs/pytorch-cuda-dev/lib/python3.6/multiprocessing/reduction.py", line 184, in recv_handle
    return recvfds(s, 1)[0]
  File "/home/wbaranowski/miniconda3/envs/pytorch-cuda-dev/lib/python3.6/multiprocessing/reduction.py", line 162, in recvfds
    len(ancdata))
RuntimeError: received 0 items of ancdata

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/wbaranowski/git/Quansight/pytorch/torch/utils/data/dataloader.py", line 787, in _try_get_data
    fs = [tempfile.NamedTemporaryFile() for i in range(10)]
  File "/home/wbaranowski/git/Quansight/pytorch/torch/utils/data/dataloader.py", line 787, in <listcomp>
    fs = [tempfile.NamedTemporaryFile() for i in range(10)]
  File "/home/wbaranowski/miniconda3/envs/pytorch-cuda-dev/lib/python3.6/tempfile.py", line 551, in NamedTemporaryFile
    (fd, name) = _mkstemp_inner(dir, prefix, suffix, flags, output_type)
  File "/home/wbaranowski/miniconda3/envs/pytorch-cuda-dev/lib/python3.6/tempfile.py", line 262, in _mkstemp_inner
    fd = _os.open(file, flags, 0o600)
OSError: [Errno 24] Too many open files: '/tmp/tmpnx_f6v_f'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test_shm_leak.py", line 56, in <module>
    worker_init_fn=worker_init_fn
  File "/home/wbaranowski/git/Quansight/pytorch/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/home/wbaranowski/git/Quansight/pytorch/torch/utils/data/dataloader.py", line 861, in _next_data
    idx, data = self._get_data()
  File "/home/wbaranowski/git/Quansight/pytorch/torch/utils/data/dataloader.py", line 828, in _get_data
    success, data = self._try_get_data()
  File "/home/wbaranowski/git/Quansight/pytorch/torch/utils/data/dataloader.py", line 791, in _try_get_data
    "Too many open files. Communication with the"
RuntimeError: Too many open files. Communication with the workers is no longer possible. Please increase the limit using `ulimit -n` in the shell or change the sharing strategy by calling `torch.multiprocessing.set_sharing_strategy('file_system')` at the beginning of your code
```
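The workaround the new message points at, as a usage sketch:
```py
import torch.multiprocessing as mp

# Name shared memory by file path instead of passing file descriptors
# over sockets, so the open-file limit no longer applies per sample.
mp.set_sharing_strategy('file_system')
```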
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34768

Differential Revision: D20538053

Pulled By: ezyang

fbshipit-source-id: be4425cf2fa02aff61619b2b829c153cb1a867cb
2020-04-14 07:10:57 -07:00
Hong Xu
817e4f9ef1 Correct a ValueError in dataloader to TypeError (#36244)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/36244

Differential Revision: D20963949

Pulled By: ezyang

fbshipit-source-id: 8c6aa4831021788052269e7aa8282d11eba4e085
2020-04-10 09:03:58 -07:00
Tongzhou Wang
4ef854b4b4 Fix potential hang when exiting main process (#33721)
Summary:
The following script reproduces the hang
```py
import multiprocessing, logging
logger = multiprocessing.log_to_stderr()
logger.setLevel(multiprocessing.SUBDEBUG)

import torch

class Dataset:
    def __len__(self):
        return 23425

    def __getitem__(self, idx):
        return torch.randn(3, 128, 128), idx % 100

ds = Dataset()
trdl = torch.utils.data.DataLoader(ds, batch_size=64, num_workers=300, pin_memory=True, shuffle=True)

for e in range(1000):
    for ii, (x, y) in enumerate(trdl):
        print(f'tr {e: 5d} {ii: 5d} avg y={y.mean(dtype=torch.double).item()}')
        if ii % 2 == 0:
            print("="*200 + "BEFORE ERROR" + "="*200)
            1/0
```

The process hangs at joining the putting thread of `data_queue` in the **main process**. The root cause is that too many items are put in the queue from the **worker processes**, and the `put` at 062ac6b472/torch/utils/data/dataloader.py (L928) blocks in a background thread. The `pin_memory_thread` exits once `pin_memory_thread_done_event` is set, without ever getting the `(None, None)` sentinel. Hence, the main process needs the same treatment as the workers get at
062ac6b472/torch/utils/data/_utils/worker.py (L198).

After the patch, the script finishes correctly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33721

Differential Revision: D20089209

Pulled By: ezyang

fbshipit-source-id: e73fbfdd7631afe1ce5e1edd05dbdeb7b85ba961
2020-02-25 07:04:41 -08:00
Tongzhou Wang
c37de32b23 Enable len(dataloader) for iterable dataset (#23587)
Summary:
Copy-paste comment from code for reasoning:

```
            # NOTE [ IterableDataset and __len__ ]
            #
            # For `IterableDataset`, `__len__` could be inaccurate when one naively
            # does multi-processing data loading, since the samples will be duplicated.
            # However, no real use case should be actually using that behavior, so
            # it should count as a user error. We should generally trust user
            # code to do the proper thing (e.g., configure each replica differently
            # in `__iter__`), and give us the correct `__len__` if they choose to
            # implement it (this will still throw if the dataset does not implement
            # a `__len__`).
            #
            # To provide a further warning, we track if `__len__` was called on the
            # `DataLoader`, save the returned value in `self._len_called`, and warn
            # if the iterator ends up yielding more than this number of samples.
```

Fixes https://github.com/pytorch/pytorch/issues/30184
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23587

Differential Revision: D18852625

Pulled By: ailzhang

fbshipit-source-id: aea8d4d70c7f21aaa69b35908a6f43026493d826
2019-12-06 15:38:05 -08:00
Nathan Goldbaum
f522bde121 Replace references to _DataLoaderIter with _BaseDataLoaderIter (#27105)
Summary:
Back in April, malmaud added type annotations for `dataloader.py`. However, at about the same time, SsnL in https://github.com/pytorch/pytorch/issues/19228 replaced `_DataLoaderIter` with `_BaseDataLoaderIter` and two subclasses, `_SingleProcessDataLoaderIter`, and `_MultiProcessingDataLoaderIter`. However - probably because these changes happened in parallel at roughly the same time, the type stubs and several other references in the codebase were never updated to match this refactoring.

I've gone ahead and done the updates to reflect the refactoring in https://github.com/pytorch/pytorch/issues/19228, which fixes the specific type stub/implementation mismatch pointed out in https://github.com/pytorch/pytorch/issues/26673, although not the broader problem that pytorch doesn't have a test to make sure that the `.pyi` type stub files match the real API defined in `.py` files.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27105

Differential Revision: D17813641

Pulled By: ezyang

fbshipit-source-id: ed7ac025c8d6ad3f298dd073347ec83bb4b6600c
2019-10-08 12:09:02 -07:00
Michael Kuchnik
e5d9a5e5be Fix typo in docs.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26263

Differential Revision: D17397190

Pulled By: ezyang

fbshipit-source-id: 62e3c4c3021c728a3314262528579676d605a81e
2019-09-17 07:46:49 -07:00
SsnL
df9d8f9032 Fix no auto batching bugs: cannot bulk load; not work with namedtuple (#26065)
Summary:
see title
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26065

Differential Revision: D17392851

Pulled By: soumith

fbshipit-source-id: 468cd41c8e03d689ff2e0261d948e28daad6bfaf
2019-09-16 07:22:31 -07:00
Tongzhou Wang
928754b67d make more iterator attributes private (#23744)
Summary:
1. Prefixed underscores to any `DataLoaderIter` attribute that is not part of the data loader ctor argument list.
2. Prefixed `DataLoader.dataset_kind` with underscore because it only makes sense with the private enum `_DatasetKind`, and is an implementation detail.
3. Disallow setting `DataLoader.dataset` and `DataLoader.batch_sampler` after initializing a `DataLoader` because they affect other attributes in `__init__`.

These changes should not have a major BC-breaking effect, since the big changes are on the iterator class and most users don't even store it. I searched GitHub for `pin_memory_thread` and (while I didn't look through all result pages) the results I see are forks of pytorch and blog posts on how the data loader works.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23744

Differential Revision: D16732507

Pulled By: ezyang

fbshipit-source-id: 9f04d000b4200b8047f31eaa3473780b66cebd26
2019-08-09 11:43:00 -07:00
SsnL
ed19580dc4 Fix dataloader._shutdown_workers if not all workers are started (#23761)
Summary:
Otherwise you may see errors like
```
Exception ignored in: <function _MultiProcessingDataLoaderIter.__del__ at 0x000001F99F5CB9D8>
Traceback (most recent call last):
  File "C:\Users\Divyansh J\Anaconda3\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 883, in __del__
    self._shutdown_workers()
  File "C:\Users\Divyansh J\Anaconda3\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 860, in _shutdown_workers
    if self.workers_status[worker_id]:
IndexError: list index out of range
```

e.g. https://discuss.pytorch.org/t/how-to-construct-dataset-with-iterator-for-multi-process-dataloader/49612/5
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23761

Differential Revision: D16644687

Pulled By: soumith

fbshipit-source-id: a60e847431264525079456ff422317af1ac2be4b
2019-08-07 09:06:11 -07:00
Tongzhou Wang
0539462ca2 Fix pin_memory_thread not exiting quickly (#23646)
Summary:
fixes https://github.com/pytorch/pytorch/issues/23642
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23646

Differential Revision: D16600874

Pulled By: soumith

fbshipit-source-id: 50f0828d774a558d6f21e9dd21135906bd5be128
2019-08-01 15:24:14 -07:00
SsnL
e982e46de3 Add multiprocessing_context= argument to DataLoader (#22990)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/22131
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22990

Differential Revision: D16539052

Pulled By: colesbury

fbshipit-source-id: b1c48ae2fb54065dd96a67be263254129e02eaa2
2019-07-29 12:58:40 -07:00
Jan Schlüter
0bc90194fb Catch and print exception traceback in parallel_apply() workers (#18055)
Summary:
When an exception occurs in one of the modules passed to `parallel_apply()`, it is caught and re-raised in the main thread. This preserves the original exception type and message, but has the traceback point at the position where it's re-raised, rather than the original point of failure.

This PR saves the exception information required to generate the traceback, and includes the original traceback in the message of the exception raised in the main thread.

Before:
```
  ...
  File ".../torch/nn/parallel/data_parallel.py", line 153, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File ".../torch/nn/parallel/parallel_apply.py", line 84, in parallel_apply
    raise output
RuntimeError: expected type torch.FloatTensor but got torch.cuda.FloatTensor
```

After:
```
  ...
  File ".../torch/nn/parallel/data_parallel.py", line 153, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File ".../torch/nn/parallel/parallel_apply.py", line 88, in parallel_apply
    ''.join(traceback.format_exception(*exc_info)))
RuntimeError: Caught exception in replica 0. Original traceback and message:
Traceback (most recent call last):
  ...
  File "../models/foo.py", line 319, in bar
    baz = asdf / ghij[:, np.newaxis]
RuntimeError: expected type torch.FloatTensor but got torch.cuda.FloatTensor
```

I took care to raise an exception of the original type (in case the main code checks for that), but replaced the message. It helped me find a bug that did not occur outside `data_parallel()`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18055

Differential Revision: D16444972

Pulled By: zhangguanheng66

fbshipit-source-id: ec436c9d4677fad18106a8046cfa835a20a101ce
2019-07-26 11:41:22 -07:00
Tongzhou Wang
e4b75c6580 Fix typo in dataloader.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23132

Differential Revision: D16402759

Pulled By: ezyang

fbshipit-source-id: 9500570f6b7492a67a2af853bfb63a5667e6b7b5
2019-07-23 08:45:47 -07:00
Arul
43d36415b9 torch.utils.data.Dataloader: documentation about RNG state consumption (#22540)
Summary:
the outcome from the pytorch forum issue: https://discuss.pytorch.org/t/dataloader-problem-problem-arises-when-shuffle-true/45631

The discussion is here: https://github.com/pytorch/pytorch/pull/20749
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22540

Differential Revision: D16131777

Pulled By: ezyang

fbshipit-source-id: 566deda1b44dc7fae54250e9b508d120851a2848
2019-07-08 08:22:04 -07:00
Tongzhou Wang
058beae411 Add IterableDataset (#19228)
Summary:
This is a modified version of https://github.com/pytorch/pytorch/pull/14705 since commit structure for that PR is quite messy.

1. Add `IterableDataset`.
2. So we have 2 data loader modes: `Iterable` and `Map`.

    1. `Iterable` if the `dataset` is an instance of `IterableDataset`
    2. `Map` otherwise

3. Add better support for non-batch loading (i.e., `batch_size=None` and `batch_sampler=None`). This is useful for doing things like bulk loading.
4. Refactor `DataLoaderIter` into two classes, `_SingleProcessDataLoaderIter` and `_MultiProcessingDataLoaderIter`. Rename some methods to be more generic, e.g., `get_batch` -> `get_data`.
5. Add `torch.utils.data.get_worker_info`, which returns worker information in a worker process (e.g., worker id, dataset obj copy, etc.) and can be used in `IterableDataset.__iter__` and `worker_init_fn` to do per-worker configuration (see the sketch after this list).
6. Add `ChainDataset`, which is the analog of `ConcatDataset` for `IterableDataset`.
7. Import torch.utils.data in `torch/__init__.py`.
8. Data loader examples and documentation.
9. Use `get_worker_info` to detect whether we are in a worker process in `default_collate`.

Closes https://github.com/pytorch/pytorch/issues/17909, https://github.com/pytorch/pytorch/issues/18096, https://github.com/pytorch/pytorch/issues/19946, and some of https://github.com/pytorch/pytorch/issues/13023
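A minimal sketch of per-worker configuration with `get_worker_info` (item 5 above), mirroring the documented sharding pattern:

```py
import math
from torch.utils.data import DataLoader, IterableDataset, get_worker_info

class RangeDataset(IterableDataset):
    def __init__(self, start: int, end: int):
        self.start, self.end = start, end

    def __iter__(self):
        info = get_worker_info()
        if info is None:  # single-process loading: yield everything
            return iter(range(self.start, self.end))
        # In a worker: shard the range so workers don't duplicate samples.
        per_worker = int(math.ceil((self.end - self.start) / info.num_workers))
        lo = self.start + info.id * per_worker
        return iter(range(lo, min(lo + per_worker, self.end)))

# batch_size=None uses the new non-batch loading mode.
dl = DataLoader(RangeDataset(0, 10), num_workers=2, batch_size=None)
```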
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19228

Reviewed By: bddppq

Differential Revision: D15058152

fbshipit-source-id: 9e081a901a071d7e4502b88054a34b450ab5ddde
2019-06-20 20:12:44 -07:00
jpgard
0556141339 fix small typo muliprocessing -> multiprocessing (#20998)
Summary:
Minor typo fix in docstring.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20998

Differential Revision: D15514698

Pulled By: soumith

fbshipit-source-id: a9ceb557251ff5868e810331195243b6a8717851
2019-05-27 21:36:13 -07:00
Dmytro Dzhulgakov
c25e33789e Lightweight at-most-once logging for API usage (#20745)
Summary:
Resubmit #20698 which got messed up.

Idea is that when PyTorch is used in a custom build environment (e.g. Facebook), it's useful to track usage of various APIs centrally. This PR introduces a simple very lightweight mechanism to do so - only first invocation of a trigger point would be logged. This is significantly more lightweight than #18235 and thus we can allow to put logging in e.g. TensorImpl.

Also adds an initial list of trigger points. Trigger points are added in such a way that no static initialization triggers them, i.e. just linking with libtorch.so will not cause any logging. Further suggestions of what to log are welcomed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20745

Differential Revision: D15429196

Pulled By: dzhulgakov

fbshipit-source-id: a5e41a709a65b7ebccc6b95f93854e583cf20aca
2019-05-23 23:17:59 -07:00
Edward Z. Yang
9b1dbffba5
Re-sync with internal repository (#20702) 2019-05-20 09:22:57 -04:00
Dmytro Dzhulgakov
d3059b9c49 Lightweight logging for once-only API usage 2019-05-19 23:04:40 -07:00
Michael Antonov
698103cdd6 DataLoader docs update to describe how workers are managed, including Windows. (#18091)
Summary:
It's been hard to understand how workers are launched and what code runs in the worker vs. the main process, especially on Windows, which leads to many of our samples failing. This explains when workers run and how to make code work on Windows as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18091

Differential Revision: D15083766

Pulled By: soumith

fbshipit-source-id: 8a7e60defc8a72ec63874f657d7d5267d951dccf
2019-04-26 16:01:30 -07:00
SsnL
5e62ee2b97 Fix no SIGCHLD checking in DataLoaderIter._shutdown_workers (#19421)
Summary:
Also

1. Bump multiprocessing test timeout following python core tests
2. Fix one type of flakiness in `test_proper_exit`.
3. Add trace reporting when loader process hangs in `test_proper_exit` using `faulthandler`.
3. Give `test_proper_exit` another try.

I'll heavily retest this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19421

Differential Revision: D15063728

Pulled By: ezyang

fbshipit-source-id: 4e0d992622e11053c44a9ec237b88b9a28a4472c
2019-04-24 08:06:58 -07:00
Stas Bekman
c0a2452ffe multiline KeyError msg python bug workaround (#18557)
Summary:
make multiline KeyError msg readable by working around a python bug https://bugs.python.org/issue2651

discussion: https://github.com/pytorch/pytorch/issues/16647
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18557

Differential Revision: D14681086

Pulled By: soumith

fbshipit-source-id: acbd13a823302c854c3d364028ed414fd8ce6bc8
2019-03-29 07:04:20 -07:00
Daniel
e5742494f6 Minor typo
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16980

Differential Revision: D14033686

Pulled By: gchanan

fbshipit-source-id: 9f7967defc6795640e14157d0b701b185061741f
2019-02-12 08:02:04 -08:00
Michael Carilli
0742874643 Allow dataloader to accept a custom memory pinning function (#16743)
Summary:
Renewed attempt at https://github.com/pytorch/pytorch/pull/14171

From the original PR:
> Currently, the pin_memory_batch function in the dataloader will return a batch comprised of any unrecognized type without pinning the data, because it doesn't know how.
>
>This behavior was preventing us from overlapping data prefetching in Mask-RCNN, whose custom collate_fn returns a custom batch type.

The old PR allowed the user to implement batch pinning for custom batch and data types by passing a custom pin function to the dataloader.  slayton58 suggested a cleaner approach:  allow the user to define a `pin_memory` method on their custom types, and have `pin_memory_batch` [check for the presence of that method](https://github.com/pytorch/pytorch/pull/16743/files#diff-9f154cbd884fe654066b1621fad654f3R56) in the incoming batch as a fallback.  I've updated the test and docstrings accordingly.
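A sketch of the resulting protocol on a custom batch type, following the documented pattern:

```py
import torch

class CustomBatch:
    def __init__(self, samples):
        self.data = torch.stack(samples)

    def pin_memory(self):
        # The loader's pinning logic checks for this method as a fallback
        # for batch types it doesn't recognize.
        self.data = self.data.pin_memory()
        return self

def collate_fn(samples):
    return CustomBatch(samples)
```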

The old PR was merged but then reverted due to weird cuda OOM errors on windows that may or may not have been related.  I have no idea why my changes would cause such errors (then or now) but it's something to keep an eye out for.

fmassa and yf225 who were my POCs on the old PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16743

Differential Revision: D13991745

Pulled By: ezyang

fbshipit-source-id: 74e71f62a03be453b4caa9f5524e9bc53467fa17
2019-02-10 19:37:53 -08:00
SsnL
4aae89fa7b Make test_proper_exit more robust (#16249)
Summary:
1. Improve error message for better debugging info
2. Increase timeout
3. Also apply the windows worker failure detection mechanism on non-Windows platforms, for better robustness

Attempt to fix #14501

cc ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16249

Differential Revision: D13784702

Pulled By: ezyang

fbshipit-source-id: 09a7cff83ab9edce561ed69f9fb555ab35d1275f
2019-01-25 08:25:05 -08:00
SsnL
9b5ec2a076 Fix TestDataLoader.test_proper_exit (#15665)
Summary:
Currently, in `test_proper_exit`,
1. we do not kill the correct input `pid` in the `kill_pid` function
fe15d6a2c2/test/test_dataloader.py (L325-L329)
2. the Windows command that detects process status doesn't actually work
fe15d6a2c2/test/test_dataloader.py (L641-L646)
3. `worker_error` and `worker_kill` cases (sometimes?) are not tested because the workers may exit naturally due to the pre-fetching mechanism and a too small `dataset size / batch size`.

In this PR, I, in separate commits:
1. Install `psutil` (a python package specifically built for process monitoring) on some CI builds. (Linux builds installation are done in https://github.com/pietern/pytorch-dockerfiles/pull/29 https://github.com/pietern/pytorch-dockerfiles/pull/30  https://github.com/pytorch/ossci-job-dsl/pull/36 and https://github.com/pytorch/pytorch/pull/15795).
2. Rewrite `test_proper_exit` with `psutil` so we

    1. do not rely on the hacky `is_process_alive` fe15d6a2c2/test/test_dataloader.py (L640-L653)
   2. increase the #task per worker so `worker_error` and `worker_kill` properly trigger
   3. test error message content to ensure that the loader exits with correct message corresponding to each exiting scenario.

3. Fix Windows data loader not having any mechanism to detect worker failures.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15665

Differential Revision: D13615527

Pulled By: soumith

fbshipit-source-id: cfb2f67837d2d87928a53f00b4d20f09754b7949
2019-01-10 08:47:27 -08:00
SsnL
9217bde807 Refactor dataloader.py (#15331)
Summary:
Same as #14668, and was approved there.

ailzhang , please apply this patch to Horizon's `data_streamer.py`: https://gist.github.com/SsnL/020fdb3d6b7016d81b6ba1d04cc41459 Thank you!

Below is the original description at #14668:

As I am working on tasks in https://github.com/pytorch/pytorch/issues/13023, I realized how unreadable the code is because all functions to be run in multiprocessing must be at top global level. Adding more functionalities to `dataloader.py` will only make things worse.

So in this PR, I refactor `dataloader.py` and move much of it into `data._utils`. E.g., the `_worker_loop` and related methods are now in `data._utils.worker`, signal handling code in `data._utils.signal_handling`, collating code in `data._utils.collate`, etc. This split, IMHO, makes code much clearer. I will base my future changes to DataLoader on top of this.

No functionality is changed, except that  I added `torch._six.queue`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15331

Reviewed By: yf225

Differential Revision: D13503120

Pulled By: ailzhang

fbshipit-source-id: 94df16b4d80ad1102c437cde0d5a2e62cffe1f8e
2018-12-19 12:36:03 -08:00
Derek Kim
656b565a0f Trivial comment correction in dataloader (#15276)
Summary:
Trivial comment correction in dataloader
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15276

Differential Revision: D13477324

Pulled By: soumith

fbshipit-source-id: 2a74a014999655d129311d611f2a09411339cb13
2018-12-15 10:59:00 -08:00
Ailing Zhang
38eb1beff5 Revert D13289919: [pytorch][PR] [DataLoader] Refactor dataloader.py
Differential Revision:
D13289919

Original commit changeset: d701bc7bb48f

fbshipit-source-id: c350c491fefa98a0a7c0cf22cb832e78aeb15c3d
2018-12-04 20:25:16 -08:00
SsnL
16558a1e9d Refactor dataloader.py (#14668)
Summary:
As I am working on tasks in https://github.com/pytorch/pytorch/issues/13023, I realized how unreadable the code is because all functions to be run in multiprocessing must be at top global level. Adding more functionalities to `dataloader.py` will only make things worse.

So in this PR, I refactor `dataloader.py` and move much of it into `data._utils`. E.g., the `_worker_loop` and related methods are now in `data._utils.worker`, signal handling code in `data._utils.signal_handling`, collating code in `data._utils.collate`, etc. This split, IMHO, makes code much clearer. I will base my future changes to DataLoader on top of this.

No functionality is changed, except that  I added `torch._six.queue`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14668

Reviewed By: soumith

Differential Revision: D13289919

Pulled By: ailzhang

fbshipit-source-id: d701bc7bb48f5dd7b163b5be941a9d27eb277a4c
2018-12-04 09:53:41 -08:00
Will Feng
5918de8e84 Revert D13166669: [pytorch][PR] Allow dataloader to accept a custom memory pinning function
Differential Revision:
D13166669

Original commit changeset: ca965f9841d4

fbshipit-source-id: 0836b4f50f73ba01c97491a719660f02e36f20ad
2018-11-26 14:55:04 -08:00
Michael Carilli
7557a993ab Allow dataloader to accept a custom memory pinning function (#14171)
Summary:
Currently, the `pin_memory_batch` function in the dataloader will return any batch of an unrecognized type without pinning its data, because it doesn't know how.

This behavior was preventing us from overlapping data prefetching in Mask-RCNN, whose custom `collate_fn` returns a custom batch type.

The present PR adds the ability for the user to pass a `pin_fn` alongside any custom `collate_fn` to handle such custom types.
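
A minimal sketch of the pattern this enables; the `pin_fn` parameter comes from the description above, but the exact signature is an assumption (and note this PR was reverted shortly after):
```py
import torch

class CustomBatch:
    """A batch type the default pinning logic doesn't recognize."""
    def __init__(self, samples):
        self.data = torch.stack([s[0] for s in samples])
        self.target = torch.tensor([s[1] for s in samples])

def collate_wrapper(samples):
    return CustomBatch(samples)

def pin_fn(batch):
    # Hypothetical hook: teach the loader how to pin our custom type.
    batch.data = batch.data.pin_memory()
    batch.target = batch.target.pin_memory()
    return batch
```
(Later PyTorch releases took a different route: a custom batch class defines its own `pin_memory()` method, which `pin_memory=True` then calls automatically.)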
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14171

Differential Revision: D13166669

Pulled By: soumith

fbshipit-source-id: ca965f9841d4a259b3ca4413c8bd0d8743d433ab
2018-11-23 08:12:43 -08:00
Tongzhou Wang
034c969f3c Simply exit DataLoader when Python is dying (#12700)
Summary:
I struggled with yet another DataLoader hang for the entire evening. After numerous experiments, I realized that it is unsafe to do anything when Python is shutting down. We also unfortunately implement our DataLoader clean-up logic in `__del__`, a function that may or may not be called during shutdown, and if called, may or may not be called before core library resources are freed.

Fortunately, we are already setting all our workers and pin_memory_thread as daemonic. So when Python is shutting down, we can just make `__del__` a no-op and rely on the automatic termination of daemonic children.

An `atexit` hook is used to detect Python exit.
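
A minimal sketch of the exit-detection pattern described above; the names are illustrative, not the actual dataloader internals:
```py
import atexit

python_exit_status = False  # flipped once interpreter shutdown begins

def _set_python_exit_flag():
    global python_exit_status
    python_exit_status = True

atexit.register(_set_python_exit_flag)

class _LoaderIterSketch:
    def __del__(self):
        if python_exit_status:
            # Interpreter is dying: do nothing; daemonic workers and the
            # pin_memory_thread are torn down automatically.
            return
        self._shutdown_workers()

    def _shutdown_workers(self):
        pass  # normal, orderly cleanup path
```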
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12700

Differential Revision: D10419027

Pulled By: SsnL

fbshipit-source-id: 5753e70d03e69eb1c9ec4ae2154252d51e2f79b0
2018-10-16 22:05:33 -07:00
Tongzhou Wang
11c31aef04 Prevent hanging in data loader altogether
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11985

Differential Revision: D10202374

Pulled By: SsnL

fbshipit-source-id: 1ab1a07185f78a104f9b05930a87ef5a32f431e4
2018-10-09 09:54:19 -07:00
Tongzhou Wang
c30790797f Minor data loader doc improvements
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11821

Differential Revision: D9948292

Pulled By: SsnL

fbshipit-source-id: 01c21c129423c0f7844b403e665a8fe021a9c820
2018-09-19 15:33:25 -07:00
Tongzhou Wang
8e76dcf173 Prevent raising KeyboardInterrupt in worker (#11718)
Summary:
Current behavior is that each process (main and workers) will print a traceback from `KeyboardInterrupt`. The main process will also print
```
RuntimeError: DataLoader worker (pid 46045) exited unexpectedly with exit code 1. Details are lost due to multiprocessing. Rerunning with num_workers=0 may give better error trace.
```
due to our SIGCHLD handler.
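
A sketch of the usual fix for this class of problem, and presumably what the PR does in spirit: swallow `KeyboardInterrupt` inside the worker loop so only the main process reports it (illustrative, not the exact code):
```py
def _worker_loop(index_queue, data_queue, dataset, collate_fn):
    try:
        while True:
            task = index_queue.get()
            if task is None:  # sentinel: main process asked us to quit
                break
            idx, batch_indices = task
            data_queue.put((idx, collate_fn([dataset[i] for i in batch_indices])))
    except KeyboardInterrupt:
        # SIGINT is delivered to the whole process group; exit quietly and
        # let the main process print the (single) traceback.
        pass
```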
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11718

Differential Revision: D9840844

Pulled By: SsnL

fbshipit-source-id: 1a05060bb02907fef5aac3f274d2c84f9f42d187
2018-09-14 16:09:35 -07:00
Jeff Smith
05e06f7de2 migrating deprecated calls without abc module for containers (#11515)
Summary:
Implementing #10540.
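
The deprecation in question is importing container ABCs straight from `collections`, which Python 3.3+ moved to `collections.abc`; a sketch of the shim pattern (the alias `container_abcs` matches what `torch._six` used, the helper is hypothetical):
```py
try:
    import collections.abc as container_abcs  # Python 3.3+
except ImportError:
    import collections as container_abcs      # Python 2 fallback

def is_sequence(obj):
    # Hypothetical helper: works on both interpreters without the
    # DeprecationWarning that `collections.Sequence` raises on Python 3.7+.
    return isinstance(obj, container_abcs.Sequence)
```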
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11515

Reviewed By: apaszke

Differential Revision: D9771045

Pulled By: jeffreyksmithjr

fbshipit-source-id: 85ea39abaa9b465805a969f122b626b11fc85ef6
2018-09-13 15:09:22 -07:00
Tongzhou Wang
57f149a861 Only join pin_memory_thread after it started (#11599)
Summary:
Same reason as in #11432.

Example error:
```
Exception ignored in: <function _DataLoaderIter.__del__ at 0x7fa06963cf28>
Traceback (most recent call last):
  File "/private/home/ssnl/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 405, in __del__
    self._shutdown_workers()
  File "/private/home/ssnl/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 401, in _shutdown_workers
    self.pin_memory_thread.join()
AttributeError: '_DataLoaderIter' object has no attribute 'pin_memory_thread'
```
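
The fix is the obvious guard; a sketch (attribute name taken from the traceback above, the rest illustrative):
```py
class _LoaderIterSketch:
    def _shutdown_workers(self):
        # __init__ may have raised before the thread was created, so
        # guard the join instead of assuming the attribute exists.
        if hasattr(self, 'pin_memory_thread'):
            self.pin_memory_thread.join()
```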
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11599

Differential Revision: D9801143

Pulled By: SsnL

fbshipit-source-id: 520590a21f56fa381fcac621457a7544d3fba47e
2018-09-13 09:40:49 -07:00
Tongzhou Wang
560d6efd3a Only join started dataloader workers (#11432)
Summary:
`Process.start()` actually takes some time, as it needs to start a
process and pass the arguments over via a pipe. Therefore, we
only add a worker to the self.workers list after it has started, so
that we do not call `.join()` on an unstarted process: if the program
dies before start() completes and `__del__` tries to join it, we get:
    AssertionError: can only join a started process.
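
A sketch of the resulting ordering (illustrative, self-contained):
```py
import multiprocessing

def _worker_loop():
    pass  # stand-in for the real worker body

if __name__ == '__main__':
    num_workers = 2
    workers = []
    for _ in range(num_workers):
        w = multiprocessing.Process(target=_worker_loop)
        w.daemon = True
        # Record the worker only once start() has returned: if start()
        # raises (e.g. KeyboardInterrupt during os.fork), the unstarted
        # process never enters `workers`, so joining `workers` is safe.
        w.start()
        workers.append(w)
    for w in workers:
        w.join()
```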

Example trace when such error happens:
```py
[unrelated]
  File "/private/home/ssnl/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 500, in __iter__
    return _DataLoaderIter(self)
  File "/private/home/ssnl/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 292, in __init__
    w.start()
  File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/process.py", line 112, in start
    self._popen = self._Popen(self)
  File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/context.py", line 277, in _Popen
    return Popen(process_obj)
  File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/popen_fork.py", line 70, in _launch
    self.pid = os.fork()
KeyboardInterrupt
Exception ignored in: <function _DataLoaderIter.__del__ at 0x7fa704d5aa60>
Traceback (most recent call last):
  File "/private/home/ssnl/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 398, in __del__
    self._shutdown_workers()
  File "/private/home/ssnl/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 392, in _shutdown_workers
    w.join()
  File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/process.py", line 139, in join
    assert self._popen is not None, 'can only join a started process'
AssertionError: can only join a started process
```

No test, because this is hard to trigger reliably.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11432

Reviewed By: ezyang

Differential Revision: D9735430

Pulled By: SsnL

fbshipit-source-id: a8912d9bb4063f210d6236267b178173810e2351
2018-09-09 12:55:51 -07:00
Tongzhou Wang
04f381650e Resubmit: Fix dataloader hang when it is not completely iterated (#10366)
Summary:
https://github.com/pytorch/pytorch/pull/9655
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10366

Differential Revision: D9237393

Pulled By: SsnL

fbshipit-source-id: fabfad7f371ba33300098f6b885c0e3f26c3e14a
2018-08-09 00:10:24 -07:00
Tongzhou Wang
a7f183f971 Revert "Fix dataloader hang when it is not completely iterated (#9655)" (#9804)
Summary:
This reverts commit 9ee5133651.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9804

Reviewed By: ezyang

Differential Revision: D8987780

Pulled By: SsnL

fbshipit-source-id: 75ad70b0b8d672d0b35235fa248b187be64b68e5
2018-07-25 10:10:30 -07:00
Tongzhou Wang
9ee5133651 Fix dataloader hang when it is not completely iterated (#9655)
Summary:
Second attempt at https://github.com/pytorch/pytorch/pull/7140.

cc csarofeen. Let's see if this works; it passes everything locally.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9655

Differential Revision: D8940177

Pulled By: SsnL

fbshipit-source-id: 8d6340fc9f7355c71e1e26b262da166402faa158
2018-07-22 20:38:27 -07:00
tvn
146b951ec5 Fix seeding random module in DataLoader (#7886)
* fix seeding random module

* make base seed int

* follow 0.4 idiom

* add a test for random seeding
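
A sketch of the pattern the bullets describe: derive an integer per-worker seed from a base seed and use it to seed Python's `random` module in the worker (names are illustrative):
```py
import random
import torch

def _seed_worker(base_seed, worker_id):
    seed = base_seed + worker_id  # distinct, reproducible seed per worker
    random.seed(seed)             # the module this commit starts seeding
    torch.manual_seed(seed)
```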
2018-05-29 15:55:04 -04:00
Maxim Berman
03767b66db Add FileNotFoundError to torch._six (#7524)
Add FileNotFoundError for compatibility with Python 2 and use it in the
dataloader. Fixes pytorch/pytorch#6932
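
A sketch of the compatibility alias (the Python 2 branch maps onto `IOError`, its closest built-in; illustrative):
```py
import sys

if sys.version_info[0] == 2:
    FileNotFoundError = IOError  # closest Python 2 equivalent
else:
    import builtins
    FileNotFoundError = builtins.FileNotFoundError
```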
2018-05-12 20:54:26 -04:00
vfdev
6363faf184 Fix issue #7209 in DataLoader (#7265) 2018-05-04 10:51:46 +02:00
Thomas Viehmann
1b0ad8678b import *Sampler to utils.data (Better fix than #6982) (#7007) 2018-04-27 10:18:29 +02:00
Will Feng
e8bdbdaa27 Terminate dataloader workers properly when parent process is SIGKILL'ed (#6779)
Reopening #6606 with a fix for the TEST_CUDA import issue on Windows and an improvement to how we wait for manager exit in test_manager_unclean_exit. Loop-tested on the Windows CI multiple times to make sure this actually fixes the CUDA OOM issue.

* Terminate dataloader workers properly when parent process is SIGKILL'ed

* Wait for worker processes to finish before shutting down manager process

* Add test for checking proper worker exit

* cosmetic change

* Test only if CUDA exists

* Don't call multiprocessing.set_start_method() in Python 2

* import TEST_CUDA only when we are in __main__

* Tune JOIN_TIMEOUT

* handle os.getppid() == 0 case

* Reset to original JOIN_TIMEOUT

* Use WaitForSingleObject() to check parent process status on Windows

* Fix TEST_CUDA import

* clean up

* Check main process only when index_queue.get() times out

* Change index_queues to multiprocessing.Queue

* Move manager checking logic to watchdog class (see the sketch after this list)

* Fix bugs in dataloader

* Fix TEST_CUDA import issue

* Don't import TEST_CUDA from common_nn

* Use event to signal manager exit in test

* fix lint

* Add comments
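
A sketch of the watchdog idea referenced in the list above; the details here are illustrative (Unix variant; the Windows variant uses WaitForSingleObject as noted above):
```py
import os

class ManagerWatchdog:
    """Lets a worker notice that the main ("manager") process died, even
    when the parent was SIGKILL'ed and had no chance to clean up."""

    def __init__(self):
        self.manager_pid = os.getppid()
        self.manager_dead = False

    def is_alive(self):
        if not self.manager_dead:
            # If we were reparented (e.g. to init), our parent is gone.
            self.manager_dead = os.getppid() != self.manager_pid
        return not self.manager_dead
```
Per the bullets above, the worker only polls the watchdog when `index_queue.get()` times out, so the check adds no overhead on the hot path.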
2018-04-22 23:03:54 -04:00
gchanan
4c5b95a433 Revert "Terminate dataloader workers properly when parent process is SIGKILL'ed (#6606)" (#6772)
This reverts commit 8d6a50aaeb.
2018-04-19 14:28:48 -04:00
Tongzhou Wang
072d49f787 Fix import error sometimes happening in dataloader when exiting Python (#6671)
* Fix import error sometimes happening in dataloader when exiting Python

* address comments
2018-04-19 06:56:39 -04:00
Will Feng
8d6a50aaeb Terminate dataloader workers properly when parent process is SIGKILL'ed (#6606)
* Terminate dataloader workers properly when parent process is SIGKILL'ed

* Wait for worker processes to finish before shutting down manager process

* Add test for checking proper worker exit

* cosmetic change

* Test only if CUDA exists

* Don't call multiprocessing.set_start_method() in Python 2

* import TEST_CUDA only when we are in __main__

* Tune JOIN_TIMEOUT

* handle os.getppid() == 0 case

* Reset to original JOIN_TIMEOUT

* Use WaitForSingleObject() to check parent process status on Windows

* Fix TEST_CUDA import

* clean up

* Check main process only when index_queue.get() times out

* Change index_queues to multiprocessing.Queue

* Move manager checking logic to watchdog class

* Fix bugs in dataloader

* Fix TEST_CUDA import issue
2018-04-18 20:41:33 -04:00
Tongzhou Wang
1c01eabd3c Codemod to update our codebase to 0.4 standard (#6641)
* Codemod to update our codebase to 0.4 standard

* Update some of the test scripts

* remove Variable in test_clip_grad_value

* fix _symbolic_override_wrapper_maker
2018-04-17 22:06:54 -04:00
Sanjeev Satheesh
f15f3ca1af Scope variables inside the dataloader (#6673)
* Scope variables inside the dataloader

This clears up the memory consumed by batches inside the dataloader. It's pretty useful for long-lived data loaders.

* Update dataloader.py
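
The idea is simply to keep batch-sized objects out of long-lived frames; a sketch of the pattern (illustrative):
```py
class _LoaderIterSketch:
    """Illustrates the fix: don't keep the last batch alive as state."""

    def __init__(self, loader):
        self._it = iter(loader)

    def next_batch(self):
        # Return directly rather than stashing the batch on `self`
        # (e.g. `self.batch = ...`): the previous batch can then be
        # freed as soon as the caller drops it, which matters for
        # long-lived data loaders.
        return next(self._it)
```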
2018-04-17 17:48:12 -04:00
Tongzhou Wang
6b7ec95abb Link relevant FAQ section in DataLoader docs (#6476)
* Link FAQ section on workers returning same random numbers in DataLoader docs

* explicitly mention section names
2018-04-11 13:41:46 -04:00
Tongzhou Wang
60a16e5663 Set dataloader.batch_size = None when batch_sampler is given (#6108) 2018-03-30 10:01:09 +02:00
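
A sketch of the constructor logic that one-liner implies (argument handling is illustrative):
```py
class DataLoaderSketch:
    def __init__(self, dataset, batch_size=1, batch_sampler=None):
        if batch_sampler is not None:
            # batch_sampler fully determines batching, so reporting a
            # batch_size would be misleading; expose None instead.
            batch_size = None
        self.dataset = dataset
        self.batch_size = batch_size
        self.batch_sampler = batch_sampler
```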