pytorch

OSSForks/pytorch

Fork 0

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Commit Graph

Author	SHA1	Message	Date
Peter Goldsborough	a022fd2d6b	Implement DataLoader (#11918 ) Summary: This PR implements a DataLoader API for the C++ frontend. The components present in this API largely match the Python API. It consists of: - `Dataset`s: Conceptually a function from a set of indices to a batch of examples; - `Transform`s: A functional transformation of a dataset. A `Map<D, T>` for Dataset `D` and transform `T` is itself a dataset; - `Sampler`s: Specify a strategy for generating indices for a new batch; - A `DataLoader`, with the ability to automatically parallelize fetching of samples across multiple worker threads; Note that collation functions fall naturally out of the `Map<Dataset, Transform>` abstraction. Things that are missing right now that maybe should be added: - Memory pinning for CUDA tensors The API was designed to be generalizable to almost any kind of dataset, transform or sampling strategy, while providing a convenient API out of the box. To achieve this, it is quite heavily templatized on various possible input types. There are many parts to this PR! Right now, I would like feedback on: - Your impression of the general usability of the API; - Your impression of which parts seem too complex or overthought; - The implementation of the parallelization aspects of the DataLoader. I've followed the Python implementation in some matters, but also differ in others. I think my implementation is a little cleaner and decouples components slightly better than the Python dataloader. I haven't added too many comments yet, as this is fresh out of the oven. Let me know if anything is unclear from the code itself. There also aren't any tests yet. I will write a comprehensive test suite once we agree on the API and implementation. apaszke ezyang The controller you requested could not be found. pietern Pull Request resolved: https://github.com/pytorch/pytorch/pull/11918 Reviewed By: ezyang Differential Revision: D9998881 Pulled By: goldsborough fbshipit-source-id: 22cf357b63692bea42ddb1cc2abc71dae5030aea	2018-10-22 10:22:41 -07:00

Author

SHA1

Message

Date

Peter Goldsborough

a022fd2d6b

Implement DataLoader (#11918 )

Summary:
This PR implements a DataLoader API for the C++ frontend.

The components present in this API largely match the Python API. It consists of:
- `Dataset`s: Conceptually a function from a set of indices to a batch of examples;
- `Transform`s: A functional transformation of a dataset. A `Map<D, T>` for Dataset `D` and transform `T` is itself a dataset;
- `Sampler`s: Specify a strategy for generating indices for a new batch;
- A `DataLoader`, with the ability to automatically parallelize fetching of samples across multiple worker threads;

Note that collation functions fall naturally out of the `Map<Dataset, Transform>` abstraction.

Things that are missing right now that maybe should be added:
- Memory pinning for CUDA tensors

The API was designed to be generalizable to almost any kind of dataset, transform or sampling strategy, while providing a convenient API out of the box. To achieve this, it is quite heavily templatized on various possible input types.

There are many parts to this PR! Right now, I would like feedback on:
- Your impression of the general usability of the API;
- Your impression of which parts seem too complex or overthought;
- The implementation of the parallelization aspects of the DataLoader. I've followed the Python implementation in some matters, but also differ in others. I think my implementation is a little cleaner and decouples components slightly better than the Python dataloader.

I haven't added too many comments yet, as this is fresh out of the oven. Let me know if anything is unclear from the code itself.

There also aren't any tests yet. I will write a comprehensive test suite once we agree on the API and implementation.

apaszke ezyang The controller you requested could not be found. pietern
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11918

Reviewed By: ezyang

Differential Revision: D9998881

Pulled By: goldsborough

fbshipit-source-id: 22cf357b63692bea42ddb1cc2abc71dae5030aea

2018-10-22 10:22:41 -07:00

1 Commits