Summary:
This is a modified version of https://github.com/pytorch/pytorch/pull/14705, since the commit structure of that PR is quite messy.
1. Add `IterableDataset`.
2. So we now have two data-loading modes: `Iterable` and `Map`:
   1. `Iterable` if the `dataset` is an instance of `IterableDataset`
   2. `Map` otherwise
3. Add better support for non-batch loading (i.e., `batch_size=None` and `batch_sampler=None`). This is useful for doing things like bulk loading.
4. Refactor `DataLoaderIter` into two classes, `_SingleProcessDataLoaderIter` and `_MultiProcessingDataLoaderIter`. Rename some methods to be more generic, e.g., `get_batch` -> `get_data`.
5. Add `torch.utils.data.get_worker_info`, which returns worker information in a worker process (e.g., worker id, a copy of the dataset object, etc.) and can be used in `IterableDataset.__iter__` and `worker_init_fn` for per-worker configuration (see the sketch after this list).
6. Add `ChainDataset`, the analog of `ConcatDataset` for `IterableDataset`.
7. Import `torch.utils.data` in `torch/__init__.py`.
8. Add data loader examples and documentation.
9. Use `get_worker_info` to detect whether we are in a worker process in `default_collate`.
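A minimal sketch of how these pieces fit together; the `MyIterableDataset` class and the range bounds are hypothetical, chosen only for illustration:

```python
# Sketch: an IterableDataset that shards its range across loader workers
# using the new torch.utils.data.get_worker_info.
import math

import torch
from torch.utils.data import ChainDataset, DataLoader, IterableDataset, get_worker_info


class MyIterableDataset(IterableDataset):
    """Streams integers in [start, end), sharded across loader workers."""

    def __init__(self, start, end):
        super().__init__()
        self.start = start
        self.end = end

    def __iter__(self):
        info = get_worker_info()
        if info is None:
            # Single-process loading: yield the whole range.
            iter_start, iter_end = self.start, self.end
        else:
            # Worker process: each of the num_workers dataset copies claims
            # a disjoint slice, so no sample is yielded twice.
            per_worker = int(math.ceil((self.end - self.start) / float(info.num_workers)))
            iter_start = self.start + info.id * per_worker
            iter_end = min(iter_start + per_worker, self.end)
        return iter(range(iter_start, iter_end))


# batch_size=None turns off automatic batching (the new non-batch mode).
loader = DataLoader(MyIterableDataset(0, 10), num_workers=2, batch_size=None)
print(list(loader))

# ChainDataset lazily chains IterableDatasets, as ConcatDataset does for
# map-style datasets.
chained = ChainDataset([MyIterableDataset(0, 5), MyIterableDataset(5, 10)])
```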
Closes https://github.com/pytorch/pytorch/issues/17909, https://github.com/pytorch/pytorch/issues/18096, https://github.com/pytorch/pytorch/issues/19946, and some of https://github.com/pytorch/pytorch/issues/13023
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19228
Reviewed By: bddppq
Differential Revision: D15058152
fbshipit-source-id: 9e081a901a071d7e4502b88054a34b450ab5ddde
Summary:
Previously, indices in `Subset` were stored as tensors; they are now passed as a list in `random_split` to ensure integer indexing.
fixes: #17466
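A quick sketch of the fixed behavior (the dataset contents here are arbitrary):

```python
import torch
from torch.utils.data import TensorDataset, random_split

ds = TensorDataset(torch.arange(10))
train, val = random_split(ds, [8, 2])

# Subset.indices is now a plain list of ints rather than a tensor,
# so indexing the underlying dataset uses integer indices.
print(type(train.indices))  # <class 'list'>
print(train[0])             # a (tensor,) tuple from the wrapped dataset
```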
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17649
Differential Revision: D14400250
Pulled By: soumith
fbshipit-source-id: cd20a959f33773c4babf8e861ea37ec61c2713a0
Summary:
Currently, when you pass a negative index to a `Dataset` created with `ConcatDataset`, it simply passes that index to the first dataset in the list. So if, for example, we take `concatenated_dataset[-1]`, this gives us the last entry of the *first* dataset, rather than the last entry of the *last* dataset, as we would expect.
This is a simple fix to support the expected behavior for negative indices.
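A short sketch of the corrected behavior (the two datasets here are arbitrary):

```python
import torch
from torch.utils.data import TensorDataset, ConcatDataset

first = TensorDataset(torch.arange(0, 5))   # items 0..4
last = TensorDataset(torch.arange(5, 10))   # items 5..9
combined = ConcatDataset([first, last])

# A negative index now counts from the end of the whole concatenation,
# so -1 is the last entry of the *last* dataset, not the first one.
assert combined[-1][0].item() == 9
```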
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15756
Reviewed By: ezyang
Differential Revision: D14081811
Pulled By: fmassa
fbshipit-source-id: a7783fd3fd9e1a8c00fd076c4978ca39ad5a8a2a
Keeping backward compatibility, enable `TensorDataset` to accept any number of tensors.
* Enable `TensorDataset` to accept any number of tensors
* Update dataset.py
Fix syntax error on Python 2.7
* Add several tests for `TensorDataset`
* Fix whitespace
* Simplify args
* Update dataset.py
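A brief sketch of the variadic constructor (the tensors here are arbitrary); each item is a tuple with one element per tensor, and all tensors must share the same first dimension:

```python
import torch
from torch.utils.data import TensorDataset

inputs = torch.randn(100, 3)
targets = torch.randint(0, 2, (100,))
weights = torch.rand(100)

# Any number of tensors may be passed, as long as size(0) matches.
ds = TensorDataset(inputs, targets, weights)
x, y, w = ds[0]   # one element per constructor argument
print(len(ds))    # 100
```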
Here's the command I used to invoke autopep8 (in parallel!):
```
git ls-files | grep '\.py$' | xargs -n1 -P`nproc` autopep8 -i
```
Several rules are ignored in setup.cfg. The goal is to let autopep8
handle everything it can handle safely, and to disable any rules
that are tricky or controversial to address. We may want to come back
and re-enable some of these rules later, but I'm trying to make this
patch as safe as possible.
Also configures flake8 to match pep8's behavior.
Also configures TravisCI to check the whole project for lint.