Summary:
Otherwise you may see errors like
```
Exception ignored in: <function _MultiProcessingDataLoaderIter.__del__ at 0x000001F99F5CB9D8>
Traceback (most recent call last):
File "C:\Users\Divyansh J\Anaconda3\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 883, in __del__
self._shutdown_workers()
File "C:\Users\Divyansh J\Anaconda3\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py", line 860, in _shutdown_workers
if self.workers_status[worker_id]:
IndexError: list index out of range
```
e.g. https://discuss.pytorch.org/t/how-to-construct-dataset-with-iterator-for-multi-process-dataloader/49612/5
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23761
Differential Revision: D16644687
Pulled By: soumith
fbshipit-source-id: a60e847431264525079456ff422317af1ac2be4b
Summary:
When an exception occurs in one of the modules passed to `parallel_apply()`, it is caught and re-raised in the main thread. This preserves the original exception type and message, but has the traceback point at the position where it's re-raised, rather than the original point of failure.
This PR saves the exception information required to generate the traceback, and includes the original traceback in the message of the exception raised in the main thread.
Before:
```
...
File ".../torch/nn/parallel/data_parallel.py", line 153, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File ".../torch/nn/parallel/parallel_apply.py", line 84, in parallel_apply
raise output
RuntimeError: expected type torch.FloatTensor but got torch.cuda.FloatTensor
```
After:
```
...
File ".../torch/nn/parallel/data_parallel.py", line 153, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File ".../torch/nn/parallel/parallel_apply.py", line 88, in parallel_apply
''.join(traceback.format_exception(*exc_info)))
RuntimeError: Caught exception in replica 0. Original traceback and message:
Traceback (most recent call last):
...
File "../models/foo.py", line 319, in bar
baz = asdf / ghij[:, np.newaxis]
RuntimeError: expected type torch.FloatTensor but got torch.cuda.FloatTensor
```
I took care to raise an exception of the original type (in case the calling code checks for that), but with the message replaced. This helped me find a bug that did not occur outside `data_parallel()`.
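A condensed sketch of the mechanism (hypothetical names, not the actual `parallel_apply` code): each worker thread stores `sys.exc_info()` on failure, and the main thread re-raises an exception of the original type with the formatted worker traceback embedded in the message.
```
import sys
import threading
import traceback

def run_replicas(modules, inputs):
    """Apply each module to its input in a thread, preserving worker tracebacks."""
    outputs = [None] * len(modules)
    errors = [None] * len(modules)   # holds sys.exc_info() tuples on failure

    def worker(i, module, inp):
        try:
            outputs[i] = module(inp)
        except Exception:
            errors[i] = sys.exc_info()   # (exc_type, exc_value, traceback)

    threads = [threading.Thread(target=worker, args=(i, m, x))
               for i, (m, x) in enumerate(zip(modules, inputs))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    for i, exc_info in enumerate(errors):
        if exc_info is not None:
            exc_type = exc_info[0]
            # Re-raise the original exception type so callers that catch a specific
            # type still work, but embed the worker's formatted traceback in the message.
            raise exc_type("Caught exception in replica {}. Original traceback and message:\n{}"
                           .format(i, ''.join(traceback.format_exception(*exc_info))))
    return outputs
```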
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18055
Differential Revision: D16444972
Pulled By: zhangguanheng66
fbshipit-source-id: ec436c9d4677fad18106a8046cfa835a20a101ce
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22479
In some cases, for example when training on CTR data, we would like to start training from old samples and finish on the most recent samples.
This diff adds an option to disable shuffling in `DistributedSampler` to accommodate this use case.
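A minimal usage sketch of the new option (toy dataset; `num_replicas`/`rank` are passed explicitly here instead of being read from an initialized process group):
```
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Toy "chronological" dataset standing in for CTR training data.
train_dataset = TensorDataset(torch.arange(100).float())

# shuffle=False keeps samples in their original order within each rank's shard.
sampler = DistributedSampler(train_dataset, num_replicas=4, rank=0, shuffle=False)
loader = DataLoader(train_dataset, batch_size=32, sampler=sampler)
```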
Reviewed By: soumith
Differential Revision: D16100388
fbshipit-source-id: 35566581f5250040b2db5ec408a63037b47a9f5d
Summary:
This is a modified version of https://github.com/pytorch/pytorch/pull/14705, since the commit structure for that PR is quite messy.
1. Add `IterableDataset`.
2. So we now have two data loading modes: `Iterable` and `Map`.
   1. `Iterable` if the `dataset` is an instance of `IterableDataset`
   2. `Map` otherwise
3. Add better support for non-batch loading (i.e., `batch_size=None` and `batch_sampler=None`). This is useful for things like bulk loading.
4. Refactor `DataLoaderIter` into two classes, `_SingleProcessDataLoaderIter` and `_MultiProcessingDataLoaderIter`. Rename some methods to be more generic, e.g., `get_batch` -> `get_data`.
5. Add `torch.utils.data.get_worker_info`, which returns worker information in a worker process (e.g., worker id, dataset object copy, etc.) and can be used in `IterableDataset.__iter__` and `worker_init_fn` for per-worker configuration (see the sketch after this list).
6. Add `ChainDataset`, the analog of `ConcatDataset` for `IterableDataset`.
7. Import `torch.utils.data` in `torch/__init__.py`.
8. Add data loader examples and documentation.
9. Use `get_worker_info` to detect whether we are in a worker process in `default_collate`.
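A small sketch of the iterable-style mode and `get_worker_info` (hypothetical dataset, modeled on the new documentation): the range is sharded across workers so that each worker yields a disjoint slice.
```
import math
from torch.utils.data import DataLoader, IterableDataset, get_worker_info

class RangeIterableDataset(IterableDataset):
    def __init__(self, start, end):
        self.start, self.end = start, end

    def __iter__(self):
        info = get_worker_info()
        if info is None:                      # single-process data loading
            lo, hi = self.start, self.end
        else:                                 # in a worker process: shard the range
            per_worker = int(math.ceil((self.end - self.start) / float(info.num_workers)))
            lo = self.start + info.id * per_worker
            hi = min(lo + per_worker, self.end)
        return iter(range(lo, hi))

ds = RangeIterableDataset(0, 10)
# Iterable-style loading; with num_workers=2 each worker yields its own shard.
print(list(DataLoader(ds, num_workers=2)))
```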
Closes https://github.com/pytorch/pytorch/issues/17909, https://github.com/pytorch/pytorch/issues/18096, https://github.com/pytorch/pytorch/issues/19946, and some of https://github.com/pytorch/pytorch/issues/13023
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19228
Reviewed By: bddppq
Differential Revision: D15058152
fbshipit-source-id: 9e081a901a071d7e4502b88054a34b450ab5ddde
Summary:
Resubmit of #20698, which got messed up.
The idea is that when PyTorch is used in a custom build environment (e.g. Facebook), it's useful to track usage of various APIs centrally. This PR introduces a simple, very lightweight mechanism to do so: only the first invocation of a trigger point is logged. This is significantly lighter weight than #18235, so we can afford to put logging in e.g. TensorImpl.
Also adds an initial list of trigger points. Trigger points are added in such a way that no static initialization triggers them, i.e. just linking against libtorch.so will not cause any logging. Further suggestions of what to log are welcome.
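The actual trigger points live in the C++ and Python internals; the idea, in a rough Python sketch with made-up names, is just to log each trigger point at most once per process:
```
_logged_apis = set()

def log_api_usage_once(api_name, logger=print):
    # Only the first invocation of a given trigger point is logged, so hot
    # code paths pay essentially nothing after the first call.
    if api_name in _logged_apis:
        return
    _logged_apis.add(api_name)
    logger("API used: {}".format(api_name))

def my_api():
    log_api_usage_once("my_module.my_api")   # hypothetical trigger point
```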
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20745
Differential Revision: D15429196
Pulled By: dzhulgakov
fbshipit-source-id: a5e41a709a65b7ebccc6b95f93854e583cf20aca
Summary:
This is an attempt to isolate unrelated changes from #19228 for easier review.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20150
Differential Revision: D15314891
Pulled By: ezyang
fbshipit-source-id: 8c429747ba83ad5aca4cdd8f8086bcf65a326921
Summary:
It's been hard to understand how workers are launched and what code runs in the worker vs. the main process, especially on Windows, which leads to many of our samples failing. This explains when workers run and how to make code work on Windows as well.
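For example, the pattern the docs now call out: on Windows, workers are spawned rather than forked and re-import the main module, so creating a `DataLoader` with `num_workers > 0` should be guarded so it only runs in the main process (toy dataset below).
```
import torch
from torch.utils.data import DataLoader, TensorDataset

def main():
    dataset = TensorDataset(torch.arange(100).float())
    # Spawned workers re-import this module; keeping loader creation inside
    # main() and behind the __main__ guard prevents it from running again
    # in every worker process.
    loader = DataLoader(dataset, batch_size=10, num_workers=2)
    for batch, in loader:
        print(batch.shape)

if __name__ == '__main__':
    main()
```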
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18091
Differential Revision: D15083766
Pulled By: soumith
fbshipit-source-id: 8a7e60defc8a72ec63874f657d7d5267d951dccf
Summary:
Also:
1. Bump the multiprocessing test timeout, following Python core tests.
2. Fix one type of flakiness in `test_proper_exit`.
3. Add traceback reporting via `faulthandler` when the loader process hangs in `test_proper_exit` (see the sketch below).
4. Give `test_proper_exit` another try.
I'll heavily retest this.
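A minimal sketch of the `faulthandler` idea from point 3 (standard-library calls only; the `time.sleep` stands in for a test body that might hang):
```
import faulthandler
import time

# If we are still stuck after 5 seconds, dump all thread tracebacks to stderr
# instead of hanging silently (exit=False keeps the process running).
faulthandler.dump_traceback_later(5, exit=False)

time.sleep(10)   # stand-in for the loader test that may hang

faulthandler.cancel_dump_traceback_later()
```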
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19421
Differential Revision: D15063728
Pulled By: ezyang
fbshipit-source-id: 4e0d992622e11053c44a9ec237b88b9a28a4472c
Summary:
Fix:
- the order of `Arguments` in the `RandomSampler` doc
- the meaningless check of `replacement`'s type.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19113
Differential Revision: D15013081
Pulled By: ezyang
fbshipit-source-id: 39e367f42841de6814b1214eb9df7b75f14f747e
Summary:
I haven't had a chance to rigorously try these out yet, so don't merge yet.
Closes #18725.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18963
Differential Revision: D14832897
Pulled By: ezyang
fbshipit-source-id: 4780e7a34126bc66ddbfd9d808dfc9e0edd77e68
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18598
ghimport-source-id: c74597e5e7437e94a43c163cee0639b20d0d0c6a
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18598 Turn on F401: Unused import warning.**
This was requested by someone at Facebook; this lint is turned
on for Facebook by default. "Sure, why not."
I had to noqa a number of imports in __init__. Hypothetically
we're supposed to use __all__ in this case, but I was too lazy
to fix it. Left for future work.
Be careful! flake8-2 and flake8-3 behave differently with respect to
import resolution for `# type:` comments: flake8-3 will report such an
import as unused; flake8-2 will not. For now, I just noqa'd all these sites.
All the changes were done by hand.
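For illustration, the kind of site that needed a `noqa` (made-up snippet, not a line from the diff): an import referenced only from a `# type:` comment, which flake8-3 flags as unused while flake8-2 does not.
```
from typing import List  # noqa: F401  (only used in the "# type:" comment below)

def squares(n):
    # type: (int) -> List[int]
    return [i * i for i in range(n)]
```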
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: D14687478
fbshipit-source-id: 30d532381e914091aadfa0d2a5a89404819663e3
Summary:
Indices in `Subset` were previously stored as tensors; they are now passed as a list in `random_split` to ensure integer indexing.
fixes: #17466
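A small sketch of the behavior after the fix (toy dataset): `Subset.indices` is now a plain Python list, so integer indexing works.
```
import torch
from torch.utils.data import TensorDataset, random_split

dataset = TensorDataset(torch.arange(10).float())
train, val = random_split(dataset, [8, 2])

# Indices are a Python list of ints rather than a tensor, so plain integer
# indexing into the underlying dataset works as expected.
print(train.indices[:3])
print(train[0])
```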
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17649
Differential Revision: D14400250
Pulled By: soumith
fbshipit-source-id: cd20a959f33773c4babf8e861ea37ec61c2713a0
Summary:
Currently, when you pass a negative index to a `Dataset` created with `ConcatDataset`, it simply passes that index to the first dataset in the list. So if, for example, we took `concatenated_dataset[-1]`, this will give us the last entry of the *first* dataset, rather than the last entry of the *last* dataset, as we would expect.
This is a simple fix to support the expected behavior for negative indices.
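Roughly, the fix amounts to normalizing a negative index against the total length before locating the constituent dataset (a sketch, not the exact diff):
```
import bisect

def concat_getitem(datasets, cumulative_sizes, idx):
    # Normalize a negative index against the total length first, so that
    # concatenated[-1] maps to the last entry of the *last* dataset.
    if idx < 0:
        if -idx > cumulative_sizes[-1]:
            raise ValueError("absolute value of index should not exceed dataset length")
        idx = cumulative_sizes[-1] + idx
    dataset_idx = bisect.bisect_right(cumulative_sizes, idx)
    sample_idx = idx if dataset_idx == 0 else idx - cumulative_sizes[dataset_idx - 1]
    return datasets[dataset_idx][sample_idx]

# Two datasets of lengths 3 and 2, so cumulative_sizes = [3, 5]:
print(concat_getitem([[10, 11, 12], [20, 21]], [3, 5], -1))   # -> 21
```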
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15756
Reviewed By: ezyang
Differential Revision: D14081811
Pulled By: fmassa
fbshipit-source-id: a7783fd3fd9e1a8c00fd076c4978ca39ad5a8a2a
Summary:
Renewed attempt at https://github.com/pytorch/pytorch/pull/14171
From the original PR:
> Currently, the pin_memory_batch function in the dataloader will return a batch comprised of any unrecognized type without pinning the data, because it doesn't know how.
>
> This behavior was preventing us from overlapping data prefetching in Mask-RCNN, whose custom collate_fn returns a custom batch type.
The old PR allowed the user to implement batch pinning for custom batch and data types by passing a custom pin function to the dataloader. slayton58 suggested a cleaner approach: allow the user to define a `pin_memory` method on their custom types, and have `pin_memory_batch` [check for the presence of that method](https://github.com/pytorch/pytorch/pull/16743/files#diff-9f154cbd884fe654066b1621fad654f3R56) in the incoming batch as a fallback. I've updated the test and docstrings accordingly.
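For example, the pattern from the updated docstrings (illustrative class name; actually pinning the tensors requires a CUDA-enabled build):
```
import torch
from torch.utils.data import DataLoader, TensorDataset

class SimpleCustomBatch:
    def __init__(self, batch):
        transposed = list(zip(*batch))
        self.inp = torch.stack(transposed[0], 0)
        self.tgt = torch.stack(transposed[1], 0)

    def pin_memory(self):
        # pin_memory_batch() finds and calls this method on the custom type.
        self.inp = self.inp.pin_memory()
        self.tgt = self.tgt.pin_memory()
        return self

def collate_wrapper(batch):
    return SimpleCustomBatch(batch)

dataset = TensorDataset(torch.arange(10).float().view(5, 2), torch.arange(5).float())
loader = DataLoader(dataset, batch_size=2, collate_fn=collate_wrapper, pin_memory=True)
```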
The old PR was merged but then reverted due to weird CUDA OOM errors on Windows that may or may not have been related. I have no idea why my changes would cause such errors (then or now), but it's something to keep an eye out for.
fmassa and yf225 were my POCs on the old PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16743
Differential Revision: D13991745
Pulled By: ezyang
fbshipit-source-id: 74e71f62a03be453b4caa9f5524e9bc53467fa17
Summary:
1. Improve the error message for better debugging info.
2. Increase the timeout.
3. Also apply the Windows worker-failure detection mechanism on non-Windows platforms, for better robustness.
Attempt to fix #14501.
cc ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16249
Differential Revision: D13784702
Pulled By: ezyang
fbshipit-source-id: 09a7cff83ab9edce561ed69f9fb555ab35d1275f
Summary:
Since #1323, tensors are shared via shared memory, but this feature is not active for numpy arrays.
This PR fixes that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14534
Differential Revision: D13561649
Pulled By: soumith
fbshipit-source-id: b6bc9e99fb91e8b675c2ef131fba9fa11c1647c0
Summary:
Same as #14668, and was approved there.
ailzhang, please apply this patch to Horizon's `data_streamer.py`: https://gist.github.com/SsnL/020fdb3d6b7016d81b6ba1d04cc41459 Thank you!
Below is the original description at #14668:
As I am working on tasks in https://github.com/pytorch/pytorch/issues/13023, I realized how unreadable the code is, because all functions to be run in multiprocessing must be at the top global level. Adding more functionality to `dataloader.py` will only make things worse.
So in this PR, I refactor `dataloader.py` and move much of it into `data._utils`. E.g., the `_worker_loop` and related methods are now in `data._utils.worker`, signal handling code in `data._utils.signal_handling`, collating code in `data._utils.collate`, etc. This split, IMHO, makes the code much clearer. I will base my future changes to DataLoader on top of this.
No functionality is changed, except that I added `torch._six.queue`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15331
Reviewed By: yf225
Differential Revision: D13503120
Pulled By: ailzhang
fbshipit-source-id: 94df16b4d80ad1102c437cde0d5a2e62cffe1f8e
Summary:
As I am working on tasks in https://github.com/pytorch/pytorch/issues/13023, I realized how unreadable the code is, because all functions to be run in multiprocessing must be at the top global level. Adding more functionality to `dataloader.py` will only make things worse.
So in this PR, I refactor `dataloader.py` and move much of it into `data._utils`. E.g., the `_worker_loop` and related methods are now in `data._utils.worker`, signal handling code in `data._utils.signal_handling`, collating code in `data._utils.collate`, etc. This split, IMHO, makes the code much clearer. I will base my future changes to DataLoader on top of this.
No functionality is changed, except that I added `torch._six.queue`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14668
Reviewed By: soumith
Differential Revision: D13289919
Pulled By: ailzhang
fbshipit-source-id: d701bc7bb48f5dd7b163b5be941a9d27eb277a4c
Summary:
Currently, the `pin_memory_batch` function in the dataloader will return a batch comprised of any unrecognized type without pinning the data, because it doesn't know how.
This behavior was preventing us from overlapping data prefetching in Mask-RCNN, whose custom `collate_fn` returns a custom batch type.
The present PR adds the ability for the user to pass a `pin_fn` alongside any custom `collate_fn` to handle such custom types.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14171
Differential Revision: D13166669
Pulled By: soumith
fbshipit-source-id: ca965f9841d4a259b3ca4413c8bd0d8743d433ab
Summary:
I struggled with yet another DataLoader hang for the entire evening. After numerous experiments, I realized that it is unsafe to do anything when Python is shutting down. We also, unfortunately, implement our DataLoader clean-up logic in `__del__`, a function that may or may not be called during shutdown, and, if called, may or may not be called before core library resources are freed.
Fortunately, we are already setting all our workers and pin_memory_thread as daemonic. So in case of Python shutting down, we can just do a no-op in `__del__` and rely on the automatic termination of daemonic children.
An `atexit` hook is used to detect Python exit.
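A stripped-down sketch of the pattern (illustrative names, not the actual DataLoader code): an `atexit` hook flips a module-level flag, and `__del__` becomes a no-op once the interpreter is shutting down, leaving the daemonic workers to be terminated automatically.
```
import atexit

_python_exit_status = False

def _set_python_exit_flag():
    global _python_exit_status
    _python_exit_status = True

atexit.register(_set_python_exit_flag)

class _LoaderIter(object):
    def _shutdown_workers(self):
        pass   # signal workers, join queues, etc.

    def __del__(self):
        # During interpreter shutdown, the machinery needed here may already
        # be gone, so do nothing and rely on daemonic workers/threads dying
        # with the main process.
        if _python_exit_status:
            return
        self._shutdown_workers()
```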
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12700
Differential Revision: D10419027
Pulled By: SsnL
fbshipit-source-id: 5753e70d03e69eb1c9ec4ae2154252d51e2f79b0
Summary:
Modifies the DistributedSampler logic. Now each process samples elements at a fixed stride, instead of taking a consecutive section.
This eliminates the possibility that the DataLoader uses padded data while dropping real data. That happens when:
1. DistributedSampler pads the data; and
2. the DataLoader's `drop_last` is effectively true and it drops fewer samples than the number of padded ones.
In the example below, data points (10, 11, 12) are padded in by duplicating samples (1, 2, 3).
The old sampler drops legitimate original data (3, 6, 9) and introduces duplicates (10, 11) into the training set, while the new sampler logic samples the correct data points from the dataset.
This example has been added to the dataloader unit test.
Example:
```
data after shuffle:  1, 2, 3, 4, 5, 6, 7, 8, 9
padded data:         1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
old sampler:            -> DataLoader with (batch_size=2 and drop_last=True)
p 1:  1,  2,  3          1, 2
p 2:  4,  5,  6          4, 5
p 3:  7,  8,  9          7, 8
p 4: 10, 11, 12         10, 11
new sampler:            ->
p 1:  1,  5,  9          1, 5
p 2:  2,  6, 10          2, 6
p 3:  3,  7, 11          3, 7
p 4:  4,  8, 12          4, 8
```
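In code, the change boils down to how each rank slices the padded index list, roughly (a sketch with a hypothetical helper):
```
def shard_indices(indices, rank, num_replicas):
    # Old behavior: consecutive blocks, i.e. indices[rank * n : (rank + 1) * n].
    # New behavior: a strided slice, so each rank takes every num_replicas-th index.
    return indices[rank::num_replicas]

padded = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]   # 10-12 are duplicated padding
for rank in range(4):
    print(rank + 1, shard_indices(padded, rank, 4))
# p 1 -> [1, 5, 9], p 2 -> [2, 6, 10], p 3 -> [3, 7, 11], p 4 -> [4, 8, 12]
```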
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12474
Differential Revision: D10260410
Pulled By: SsnL
fbshipit-source-id: 710856571260f42ce25955b81a5b8008e04938cf
Summary:
Current behavior is that each process (main and workers) will print a traceback from `KeyboardInterrupt`. And the main process will also print
```
RuntimeError: DataLoader worker (pid 46045) exited unexpectedly with exit code 1. Details are lost due to multiprocessing. Rerunning with num_workers=0 may give better error trace.
```
due to our SIGCLD handler.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11718
Differential Revision: D9840844
Pulled By: SsnL
fbshipit-source-id: 1a05060bb02907fef5aac3f274d2c84f9f42d187
Summary:
The old `torch.distributed` will go to `torch.distributed.deprecated`
The old DDP will go to `torch.nn.parallel.deprecated`
Now `torch.nn.parallel.DDP` will use c10d DDP
Now `torch.distributed` will use C10d frontend API
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11405
Reviewed By: pietern
Differential Revision: D9733733
Pulled By: teng-li
fbshipit-source-id: d6a3f3e73f8d3a7fcb1f4baef53c78063b8cbb08