Commit Graph

16 Commits

Author SHA1 Message Date
Tongzhou Wang
fde75a33e1 update IterableDataset doc to be consistent with current behavior
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22230

Differential Revision: D15994680

Pulled By: ezyang

fbshipit-source-id: 9e47e8369aa08a550987c4468ce75aa7650ee1d4
2019-06-26 06:49:22 -07:00
Tongzhou Wang
058beae411 Add IterableDataset (#19228)
Summary:
This is a modified version of https://github.com/pytorch/pytorch/pull/14705 since commit structure for that PR is quite messy.

1. Add `IterableDataset`.
2. So we now have two data-loading modes: `Iterable` and `Map`.

    1. `Iterable` if the `dataset` is an instance of `IterableDataset`
    2. `Map` otherwise

3. Add better support for non-batched loading (i.e., `batch_size=None` and `batch_sampler=None`). This is useful for things like bulk loading.
4. Refactor `DataLoaderIter` into two classes, `_SingleProcessDataLoaderIter` and `_MultiProcessingDataLoaderIter`. Rename some methods to be more generic, e.g., `get_batch` -> `get_data`.
5. Add `torch.utils.data.get_worker_info`, which returns worker information in a worker process (e.g., worker id, a copy of the dataset object, etc.) and can be used in `IterableDataset.__iter__` and `worker_init_fn` to do per-worker configuration.
6. Add `ChainDataset`, the analog of `ConcatDataset` for `IterableDataset`.
7. Import `torch.utils.data` in `torch/__init__.py`.
8. Add data loader examples and documentation.
9. Use `get_worker_info` to detect whether we are in a worker process in `default_collate`.
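The per-worker configuration that `get_worker_info` enables is typically used to shard an iterable dataset so that each worker yields a disjoint slice. A minimal, framework-free sketch of that sharding arithmetic (the `WorkerInfo` fields `id` and `num_workers` are real; `shard_range` itself is only illustrative):

```python
def shard_range(start, end, worker_id, num_workers):
    """Split the half-open range [start, end) as evenly as possible
    across num_workers workers; returns this worker's slice."""
    total = end - start
    per_worker = total // num_workers
    remainder = total % num_workers
    # Earlier workers each take one extra item until the remainder is used up.
    lo = start + worker_id * per_worker + min(worker_id, remainder)
    hi = lo + per_worker + (1 if worker_id < remainder else 0)
    return range(lo, hi)
```

Inside `IterableDataset.__iter__`, one would call `torch.utils.data.get_worker_info()` and, if it is not `None`, iterate only over `shard_range(start, end, info.id, info.num_workers)`.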

Closes https://github.com/pytorch/pytorch/issues/17909, https://github.com/pytorch/pytorch/issues/18096, https://github.com/pytorch/pytorch/issues/19946, and some of https://github.com/pytorch/pytorch/issues/13023
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19228

Reviewed By: bddppq

Differential Revision: D15058152

fbshipit-source-id: 9e081a901a071d7e4502b88054a34b450ab5ddde
2019-06-20 20:12:44 -07:00
bhushan
a6c4ea66dd Passing indices as a list to Subset instead of Tensor (#17649)
Summary:
Indices in `Subset` were previously stored as tensors; they are now passed as a list in `random_split` to ensure integer indexing into the underlying dataset.
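A minimal pure-Python stand-in for the idea (the real class lives in `torch.utils.data`; this sketch only shows why storing indices as a plain list guarantees integer indexing):

```python
class Subset:
    """Stand-in for torch.utils.data.Subset: a view of a dataset at fixed indices."""

    def __init__(self, dataset, indices):
        self.dataset = dataset
        # Storing indices as a plain list (not a tensor) guarantees that
        # self.indices[i] is an int, so the wrapped dataset is integer-indexed.
        self.indices = list(indices)

    def __getitem__(self, idx):
        return self.dataset[self.indices[idx]]

    def __len__(self):
        return len(self.indices)
```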

fixes: #17466
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17649

Differential Revision: D14400250

Pulled By: soumith

fbshipit-source-id: cd20a959f33773c4babf8e861ea37ec61c2713a0
2019-03-10 09:23:53 -07:00
jayleverett
016f212357 fix behavior of ConcatDataset w/ negative indices (#15756)
Summary:
Currently, when you pass a negative index to a `Dataset` created with `ConcatDataset`, it simply passes that index to the first dataset in the list. So if, for example, we took `concatenated_dataset[-1]`, this will give us the last entry of the *first* dataset, rather than the last entry of the *last* dataset, as we would expect.

This is a simple fix to support the expected behavior for negative indices.
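The lookup logic can be sketched in plain Python (function name and signature are illustrative; `cumulative_sizes` mirrors the real `ConcatDataset.cumulative_sizes` attribute):

```python
import bisect

def concat_getitem(datasets, cumulative_sizes, idx):
    """Index into a concatenation of datasets, supporting negative indices.

    cumulative_sizes[i] is the total length of datasets[0..i] inclusive.
    """
    if idx < 0:
        if -idx > cumulative_sizes[-1]:
            raise IndexError("absolute value of index exceeds dataset length")
        # The fix: normalize the negative index against the *total* length
        # before locating which constituent dataset it falls in.
        idx += cumulative_sizes[-1]
    dataset_idx = bisect.bisect_right(cumulative_sizes, idx)
    sample_idx = idx if dataset_idx == 0 else idx - cumulative_sizes[dataset_idx - 1]
    return datasets[dataset_idx][sample_idx]
```

With this normalization, index `-1` resolves to the last entry of the *last* dataset, as expected.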
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15756

Reviewed By: ezyang

Differential Revision: D14081811

Pulled By: fmassa

fbshipit-source-id: a7783fd3fd9e1a8c00fd076c4978ca39ad5a8a2a
2019-02-14 13:02:54 -08:00
Gao, Xiang
d7c32df67f move Subset, random_split to data, use sequence at some places. (#7816) 2018-05-25 12:50:50 +02:00
Jason Park
64e2c03bea Enable TensorDataset to get any number of tensors (#6038)
While keeping backward compatibility, enable `TensorDataset` to accept any number of tensors.

* Enable TensorDataset to get any number of tensors

* Update dataset.py

Fix syntax error on python 2.7

* Add several test for tensordataset

* Fix whitespaces

* Simplify args

* Update dataset.py
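A pure-Python stand-in for the variadic design (the real `torch.utils.data.TensorDataset` takes tensors; lists of equal length illustrate the same shape contract):

```python
class TensorDataset:
    """Stand-in for the variadic TensorDataset: zips any number of
    equal-length sequences, yielding one tuple per index."""

    def __init__(self, *tensors):
        assert all(len(tensors[0]) == len(t) for t in tensors), \
            "all inputs must share the same first-dimension length"
        self.tensors = tensors

    def __getitem__(self, index):
        return tuple(t[index] for t in self.tensors)

    def __len__(self):
        return len(self.tensors[0])
```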
2018-03-28 11:20:50 -04:00
Alykhan Tejani
18a866aedd Add random_split to torch.utils.data.dataset (#4435) 2018-01-02 18:56:49 +01:00
Mikhail Korobov
754f3d3fe8 fixed a typo in ConcatDataset.cumulative_sizes attribute name 2017-11-24 11:07:51 +01:00
Tzu-Wei Huang
618026e999 implements operator + for Dataset class (#3180)
* implements operator + for Dataset class

* check for exact equivalent
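A minimal sketch of the operator (in PyTorch, `Dataset.__add__` returns a `ConcatDataset`; the `ListDataset` helper here is purely hypothetical):

```python
class Dataset:
    """Base class: `a + b` concatenates two datasets."""

    def __add__(self, other):
        return ConcatDataset([self, other])


class ListDataset(Dataset):
    """Hypothetical helper wrapping a plain list as a dataset."""

    def __init__(self, items):
        self.items = list(items)

    def __getitem__(self, idx):
        return self.items[idx]

    def __len__(self):
        return len(self.items)


class ConcatDataset(Dataset):
    """Simplified concatenation: walks constituents to resolve an index."""

    def __init__(self, datasets):
        self.datasets = datasets

    def __len__(self):
        return sum(len(d) for d in self.datasets)

    def __getitem__(self, idx):
        for d in self.datasets:
            if idx < len(d):
                return d[idx]
            idx -= len(d)
        raise IndexError(idx)
```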
2017-10-29 01:19:59 +05:30
Sasank Chilamkurthy
bbf2c6a084 Fix ConcatDataset docs (#2355)
* Fix ConcatDataset docs

so that sphinx-napoleon parses it right.

* Fix WeightedRandomSampler docs
2017-08-23 09:47:57 -04:00
Valentin Haenel
d592e188f7 port of ConcatDataset (#1902) 2017-06-27 12:31:56 -04:00
zhtvk
4d37ef878c Remove view on data and target tensors of dim 1 in TensorDataset (#609) 2017-02-09 22:06:39 +01:00
Luke Yeager
e7c1e6a8e3 [pep8] Fix most lint automatically with autopep8
Here's the command I used to invoke autopep8 (in parallel!):

    git ls-files | grep '\.py$' | xargs -n1 -P`nproc` autopep8 -i

Several rules are ignored in setup.cfg. The goal is to let autopep8
handle everything which it can handle safely, and to disable any rules
which are tricky or controversial to address. We may want to come back
and re-enable some of these rules later, but I'm trying to make this
patch as safe as possible.

Also configures flake8 to match pep8's behavior.

Also configures TravisCI to check the whole project for lint.
2017-01-28 01:15:51 +01:00
Adam Paszke
4cc11066b2 Add torch.utils.data docs and improve notes (#460)
* Add torch.utils.data docs and improve notes
2017-01-17 14:51:05 -05:00
Adam Lerer
a1f5fe6a8f Add multiprocess data loader + improvements to torch.utils.data 2016-09-30 16:23:43 -04:00
Adam Paszke
ee85fe1a9c Initial utils implementation 2016-09-08 18:49:48 -07:00