# C++ Frontend Tests

In this folder live the tests for PyTorch's C++ frontend. They use the [GoogleTest](https://github.com/google/googletest) test framework.
## CUDA Tests

To make a test runnable only on platforms with CUDA, suffix its name with
`_CUDA`, e.g.

```cpp
TEST(MyTestSuite, MyTestCase_CUDA) { }
```
To make it runnable only on platforms with at least two CUDA devices, suffix
it with `_MultiCUDA` instead of `_CUDA`, e.g.

```cpp
TEST(MyTestSuite, MyTestCase_MultiCUDA) { }
```
There is logic in `main.cpp` that detects the availability and number of CUDA
devices and supplies the appropriate negative filters to GoogleTest.
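As a rough illustration, the filtering could look like the sketch below. The calls `torch::cuda::is_available()` and `torch::cuda::device_count()` are real C++ frontend APIs, but the surrounding structure is illustrative and not a copy of the actual `main.cpp`:

```cpp
#include <gtest/gtest.h>
#include <torch/cuda.h>

int main(int argc, char** argv) {
  ::testing::InitGoogleTest(&argc, argv);
  // A filter starting with '-' means: run everything except tests
  // matching the listed patterns. This sketch overwrites any
  // user-supplied --gtest_filter; a fuller implementation would merge
  // its negative patterns with the existing filter instead.
  if (!torch::cuda::is_available()) {
    // No CUDA devices at all: skip every CUDA test.
    ::testing::GTEST_FLAG(filter) = "-*_CUDA:*_MultiCUDA";
  } else if (torch::cuda::device_count() < 2) {
    // Only one device: skip just the multi-device tests.
    ::testing::GTEST_FLAG(filter) = "-*_MultiCUDA";
  }
  return RUN_ALL_TESTS();
}
```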
## Integration Tests

Integration tests use the MNIST dataset. You must download it by running the
following command from the PyTorch root folder:

```sh
$ python tools/download_mnist.py -d test/cpp/api/mnist
```

The required paths will be referenced as `test/cpp/api/mnist/...` in the test
code, so you must run the integration tests from the PyTorch root folder.
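For reference, here is a minimal sketch of how a test might consume that relative path through the C++ frontend's MNIST dataset. The test name and assertions are hypothetical; only `torch::data::datasets::MNIST` is part of the actual API:

```cpp
#include <torch/torch.h>
#include <gtest/gtest.h>

TEST(IntegrationTest, MNISTSmoke) {
  // The relative path is resolved against the current working directory,
  // which is why these tests must be launched from the PyTorch root folder.
  auto dataset = torch::data::datasets::MNIST("test/cpp/api/mnist");
  ASSERT_TRUE(dataset.size().has_value());
  ASSERT_GT(*dataset.size(), 0);
}
```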