pytorch/test/cpp/api
Peter Goldsborough a022fd2d6b Implement DataLoader (#11918)
Summary:
This PR implements a DataLoader API for the C++ frontend.

The components present in this API largely match the Python API. It consists of:
- `Dataset`s: Conceptually a function from a set of indices to a batch of examples;
- `Transform`s: A functional transformation of a dataset. A `Map<D, T>` for Dataset `D` and transform `T` is itself a dataset;
- `Sampler`s: Specify a strategy for generating indices for a new batch;
- A `DataLoader`, with the ability to automatically parallelize fetching of samples across multiple worker threads;

Note that collation functions fall naturally out of the `Map<Dataset, Transform>` abstraction.

Things that are missing right now that maybe should be added:
- Memory pinning for CUDA tensors

The API was designed to be generalizable to almost any kind of dataset, transform or sampling strategy, while providing a convenient API out of the box. To achieve this, it is quite heavily templatized on various possible input types.

There are many parts to this PR! Right now, I would like feedback on:
- Your impression of the general usability of the API;
- Your impression of which parts seem too complex or overthought;
- The implementation of the parallelization aspects of the DataLoader. I've followed the Python implementation in some matters, but also differ in others. I think my implementation is a little cleaner and decouples components slightly better than the Python dataloader.

I haven't added too many comments yet, as this is fresh out of the oven. Let me know if anything is unclear from the code itself.

There also aren't any tests yet. I will write a comprehensive test suite once we agree on the API and implementation.

apaszke ezyang The controller you requested could not be found. pietern
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11918

Reviewed By: ezyang

Differential Revision: D9998881

Pulled By: goldsborough

fbshipit-source-id: 22cf357b63692bea42ddb1cc2abc71dae5030aea
2018-10-22 10:22:41 -07:00
..
any.cpp Remove caffe2::Tensor::capacity_nbytes, at::Tensor::to##name##Data, (#11876) 2018-09-24 10:40:10 -07:00
CMakeLists.txt Implement DataLoader (#11918) 2018-10-22 10:22:41 -07:00
cursor.cpp Rewrite C++ API tests in gtest (#11953) 2018-09-21 21:28:16 -07:00
dataloader.cpp Implement DataLoader (#11918) 2018-10-22 10:22:41 -07:00
expanding-array.cpp Rewrite C++ API tests in gtest (#11953) 2018-09-21 21:28:16 -07:00
integration.cpp Remove caffe2::Tensor::capacity_nbytes, at::Tensor::to##name##Data, (#11876) 2018-09-24 10:40:10 -07:00
jit.cpp Remove caffe2::Tensor::capacity_nbytes, at::Tensor::to##name##Data, (#11876) 2018-09-24 10:40:10 -07:00
memory.cpp Move exception to C10 (#12354) 2018-10-15 13:33:18 -07:00
misc.cpp Remove caffe2::Tensor::capacity_nbytes, at::Tensor::to##name##Data, (#11876) 2018-09-24 10:40:10 -07:00
module.cpp Move exception to C10 (#12354) 2018-10-15 13:33:18 -07:00
modules.cpp Remove caffe2::Tensor::capacity_nbytes, at::Tensor::to##name##Data, (#11876) 2018-09-24 10:40:10 -07:00
optim_baseline.h Lazily create tensors in optim_baseline (#12301) 2018-10-04 10:55:53 -07:00
optim_baseline.py Lazily create tensors in optim_baseline (#12301) 2018-10-04 10:55:53 -07:00
optim.cpp Lazily create tensors in optim_baseline (#12301) 2018-10-04 10:55:53 -07:00
ordered-dict.cpp Rewrite C++ API tests in gtest (#11953) 2018-09-21 21:28:16 -07:00
parallel.cpp Move exception to C10 (#12354) 2018-10-15 13:33:18 -07:00
README.md Rewrite C++ API tests in gtest (#11953) 2018-09-21 21:28:16 -07:00
rnn.cpp Remove caffe2::Tensor::capacity_nbytes, at::Tensor::to##name##Data, (#11876) 2018-09-24 10:40:10 -07:00
sequential.cpp Rewrite C++ API tests in gtest (#11953) 2018-09-21 21:28:16 -07:00
serialize.cpp Revamp and document serialization, support streams (#12421) 2018-10-15 15:47:59 -07:00
static.cpp Rewrite C++ API tests in gtest (#11953) 2018-09-21 21:28:16 -07:00
support.h Move JIT tests to gtest (#12030) 2018-10-06 23:09:44 -07:00
tensor_cuda.cpp Rewrite C++ API tests in gtest (#11953) 2018-09-21 21:28:16 -07:00
tensor_options_cuda.cpp Rewrite C++ API tests in gtest (#11953) 2018-09-21 21:28:16 -07:00
tensor_options.cpp Support additional device types (#12293) 2018-10-05 13:15:05 -07:00
tensor.cpp Remove caffe2::Tensor::capacity_nbytes, at::Tensor::to##name##Data, (#11876) 2018-09-24 10:40:10 -07:00

C++ Frontend Tests

In this folder live the tests for PyTorch's C++ Frontend. They use the GoogleTest test framework.

CUDA Tests

To make a test runnable only on platforms with CUDA, you should suffix your test with _CUDA, e.g.

TEST(MyTestSuite, MyTestCase_CUDA) { }

To make it runnable only on platforms with at least two CUDA machines, suffix it with _MultiCUDA instead of _CUDA, e.g.

TEST(MyTestSuite, MyTestCase_MultiCUDA) { }

There is logic in main.cpp that detects the availability and number of CUDA devices and supplies the appropriate negative filters to GoogleTest.

Integration Tests

Integration tests use the MNIST dataset. You must download it by running the following command from the PyTorch root folder:

$ python tools/download_mnist.py -d test/cpp/api/mnist

The required paths will be referenced as test/cpp/api/mnist/... in the test code, so you must run the integration tests from the PyTorch root folder.