Commit Graph

7 Commits

Author SHA1 Message Date
Edward Yang
0478d32cb8 Move AlignOf, SmallVector and ArrayRef to c10.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13916

Reviewed By: smessmer

Differential Revision: D13046722

fbshipit-source-id: 1583d3170d60e22f0a535cd1fd56bdf928186f5d
2018-11-14 11:13:16 -08:00
Peter Goldsborough
5151d33287 Unflake the ordering enforcement test (#13919)
Summary:
Attempts to unflake the dataloader ordering enforcement test. I think the issue was that the `thread_counter` variable was not atomic. I've made it atomic, and also global just to make it a bit clearer.

Fixes https://github.com/pytorch/pytorch/issues/13634

colesbury SsnL ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13919

Differential Revision: D13051718

Pulled By: goldsborough

fbshipit-source-id: b9f7f6317701a8b861a1d5c6a9b2b17b44782561
2018-11-13 21:05:02 -08:00
Peter Goldsborough
393ad6582d Use torch:: instead of at:: in all C++ APIs (#13523)
Summary:
In TorchScript and C++ extensions we currently advocate a mix of `torch::` and `at::` namespace usage. In the C++ frontend I had instead exported all symbols from `at::` and some from `c10::` into the `torch::` namespace. This is far, far easier for users to understand, and also avoid bugs around creating tensors vs. variables. The same should from now on be true for the TorchScript C++ API (for running and loading models) and all C++ extensions.

Note that since we're just talking about typedefs, this change does not break any existing code.

Once this lands I will update stuff in `pytorch/tutorials` too.

zdevito ezyang gchanan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13523

Differential Revision: D12942787

Pulled By: goldsborough

fbshipit-source-id: 76058936bd8707b33d9e5bbc2d0705fc3d820763
2018-11-06 14:32:25 -08:00
Peter Goldsborough
8fafa7b6ac Remove size() from BatchDataset and templatize IndexType (#12960)
Summary:
This PR brings to changes to the recently landed C++ Frontend dataloader:

1. Removes the `size()` method from `BatchDataset`. This makes it cleaner to implement unsized ("infinite stream") datasets. The method was not used much beyond initial configuration.
2. Makes the index type of a dataset a template parameter of `BatchDataset` and `Sampler`. This essentially allows custom index types instead of only `vector<size_t>`. This greatly improves flexibility.

See the `InfiniteStreamDataset` and `TestIndex` datasets in the tests for what this enables.

Some additional minor updates and code movements too.

apaszke SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12960

Differential Revision: D12893342

Pulled By: goldsborough

fbshipit-source-id: ef03ea0f11a93319e81fba7d52a0ef1a125d3108
2018-11-05 17:13:09 -08:00
Peter Goldsborough
c21471c77f Sampler serialization and deserialization (#12999)
Summary:
Implements serialization and deserialization for samplers in the C++ frontend dataloader.

apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12999

Differential Revision: D10859676

Pulled By: goldsborough

fbshipit-source-id: cd132100fd35323e5a3df33e314511750806f48d
2018-10-26 12:20:51 -07:00
Dmytro Dzhulgakov
49046239f2 Change explicit usages of at::optional to c10::optional (#13082)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13082

Follow up of D10511254. For these cases we can move to preferred `optional` without namespace right away.

Reviewed By: ezyang, Yangqing

Differential Revision: D10844117

fbshipit-source-id: 99a59e692fb4b236b299579f937f1536d443d899
2018-10-25 15:17:53 -07:00
Peter Goldsborough
a022fd2d6b Implement DataLoader (#11918)
Summary:
This PR implements a DataLoader API for the C++ frontend.

The components present in this API largely match the Python API. It consists of:
- `Dataset`s: Conceptually a function from a set of indices to a batch of examples;
- `Transform`s: A functional transformation of a dataset. A `Map<D, T>` for Dataset `D` and transform `T` is itself a dataset;
- `Sampler`s: Specify a strategy for generating indices for a new batch;
- A `DataLoader`, with the ability to automatically parallelize fetching of samples across multiple worker threads;

Note that collation functions fall naturally out of the `Map<Dataset, Transform>` abstraction.

Things that are missing right now that maybe should be added:
- Memory pinning for CUDA tensors

The API was designed to be generalizable to almost any kind of dataset, transform or sampling strategy, while providing a convenient API out of the box. To achieve this, it is quite heavily templatized on various possible input types.

There are many parts to this PR! Right now, I would like feedback on:
- Your impression of the general usability of the API;
- Your impression of which parts seem too complex or overthought;
- The implementation of the parallelization aspects of the DataLoader. I've followed the Python implementation in some matters, but also differ in others. I think my implementation is a little cleaner and decouples components slightly better than the Python dataloader.

I haven't added too many comments yet, as this is fresh out of the oven. Let me know if anything is unclear from the code itself.

There also aren't any tests yet. I will write a comprehensive test suite once we agree on the API and implementation.

apaszke ezyang The controller you requested could not be found. pietern
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11918

Reviewed By: ezyang

Differential Revision: D9998881

Pulled By: goldsborough

fbshipit-source-id: 22cf357b63692bea42ddb1cc2abc71dae5030aea
2018-10-22 10:22:41 -07:00