pytorch/docs/source/notes
Tongzhou Wang 058beae411 Add IterableDataset (#19228)
Summary:
This is a modified version of https://github.com/pytorch/pytorch/pull/14705, since the commit structure of that PR is quite messy.

1. Add `IterableDataset`.
2. So we have 2 data loader modes: `Iterable` and `Map`.

    1. `Iterable` if the `dataset` is an instance of `IterableDataset`
    2. `Map` otherwise

3. Add better support for non-batch loading (i.e., `batch_size=None` and `batch_sampler=None`). This is useful for things like bulk loading.
4. Refactor `DataLoaderIter` into two classes, `_SingleProcessDataLoaderIter` and `_MultiProcessingDataLoaderIter`. Rename some methods to be more generic, e.g., `get_batch` -> `get_data`.
5. Add `torch.utils.data.get_worker_info`, which returns worker information in a worker process (e.g., worker id, dataset object copy, etc.) and can be used in `IterableDataset.__iter__` and `worker_init_fn` to do per-worker configuration.
6. Add `ChainDataset`, which is the analog of `ConcatDataset` for `IterableDataset`.
7. Import `torch.utils.data` in `torch/__init__.py`.
8. Add data loader examples and documentation.
9. Use `get_worker_info` to detect whether we are in a worker process in `default_collate`.
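
As a rough illustration of how the pieces listed above fit together, here is a minimal sketch. It is not code from this PR: `RangeDataset` is a hypothetical example class, and the contiguous per-worker sharding is just one possible way to use `get_worker_info` inside `IterableDataset.__iter__`.

```python
import math

from torch.utils.data import ChainDataset, DataLoader, IterableDataset, get_worker_info


class RangeDataset(IterableDataset):
    """Hypothetical example dataset yielding the integers in [start, end)."""

    def __init__(self, start, end):
        super().__init__()
        self.start = start
        self.end = end

    def __iter__(self):
        info = get_worker_info()  # None when called in the main process
        if info is None:
            lo, hi = self.start, self.end
        else:
            # Assumed sharding scheme: give each worker one contiguous slice,
            # so that workers do not yield duplicate samples.
            per_worker = math.ceil((self.end - self.start) / info.num_workers)
            lo = self.start + info.id * per_worker
            hi = min(lo + per_worker, self.end)
        return iter(range(lo, hi))


# ChainDataset iterates its member IterableDatasets one after another.
chained = ChainDataset([RangeDataset(0, 3), RangeDataset(10, 12)])

# batch_size=None disables automatic batching: samples come out one by one.
unbatched = [int(x) for x in DataLoader(chained, batch_size=None)]

# With a batch size, samples are collated as they come off the iterator.
batched = [b.tolist() for b in DataLoader(chained, batch_size=2)]
```

Both loaders here run in the main process (`num_workers=0`), so `get_worker_info()` returns `None` and each `RangeDataset` yields its full range. With `num_workers > 0`, each worker gets its own copy of the dataset and, under the sharding assumed above, would iterate only its own slice.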

Closes https://github.com/pytorch/pytorch/issues/17909, https://github.com/pytorch/pytorch/issues/18096, https://github.com/pytorch/pytorch/issues/19946, and some of https://github.com/pytorch/pytorch/issues/13023
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19228

Reviewed By: bddppq

Differential Revision: D15058152

fbshipit-source-id: 9e081a901a071d7e4502b88054a34b450ab5ddde
2019-06-20 20:12:44 -07:00
autograd.rst [docs] Update autograd notes (#6769) 2018-04-19 13:34:14 -04:00
broadcasting.rst [docs] Update broadcasting and cuda semantics notes (#6904) 2018-04-24 13:41:24 -04:00
cuda.rst Add IterableDataset (#19228) 2019-06-20 20:12:44 -07:00
extending.rst Update extension docs, fix Fold/Unfold docs (#9239) 2018-07-08 19:09:39 -07:00
faq.rst Use "length of the RNN input" instead of "length of the RNN" 2019-05-24 09:03:50 -07:00
multiprocessing.rst Add IterableDataset (#19228) 2019-06-20 20:12:44 -07:00
randomness.rst Update randomness.rst (#21337) 2019-06-04 07:38:00 -07:00
serialization.rst code syntax error in document (serialization.rst) (#937) 2017-03-06 10:06:04 -05:00
windows.rst Add magma for CUDA 10.1 to Windows docs 2019-04-29 10:13:21 -07:00