Summary:
Attempts to unflake the dataloader ordering enforcement test. I think the issue was that the `thread_counter` variable was not atomic. I've made it atomic, and also global just to make it a bit clearer.
Fixes https://github.com/pytorch/pytorch/issues/13634
colesbury SsnL ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13919
Differential Revision: D13051718
Pulled By: goldsborough
fbshipit-source-id: b9f7f6317701a8b861a1d5c6a9b2b17b44782561
Summary:
In TorchScript and C++ extensions we currently advocate a mix of `torch::` and `at::` namespace usage. In the C++ frontend I had instead exported all symbols from `at::` and some from `c10::` into the `torch::` namespace. This is far, far easier for users to understand, and also avoid bugs around creating tensors vs. variables. The same should from now on be true for the TorchScript C++ API (for running and loading models) and all C++ extensions.
Note that since we're just talking about typedefs, this change does not break any existing code.
Once this lands I will update stuff in `pytorch/tutorials` too.
zdevito ezyang gchanan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13523
Differential Revision: D12942787
Pulled By: goldsborough
fbshipit-source-id: 76058936bd8707b33d9e5bbc2d0705fc3d820763
Summary:
This PR brings to changes to the recently landed C++ Frontend dataloader:
1. Removes the `size()` method from `BatchDataset`. This makes it cleaner to implement unsized ("infinite stream") datasets. The method was not used much beyond initial configuration.
2. Makes the index type of a dataset a template parameter of `BatchDataset` and `Sampler`. This essentially allows custom index types instead of only `vector<size_t>`. This greatly improves flexibility.
See the `InfiniteStreamDataset` and `TestIndex` datasets in the tests for what this enables.
Some additional minor updates and code movements too.
apaszke SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12960
Differential Revision: D12893342
Pulled By: goldsborough
fbshipit-source-id: ef03ea0f11a93319e81fba7d52a0ef1a125d3108
Summary:
Implements serialization and deserialization for samplers in the C++ frontend dataloader.
apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12999
Differential Revision: D10859676
Pulled By: goldsborough
fbshipit-source-id: cd132100fd35323e5a3df33e314511750806f48d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13082
Follow up of D10511254. For these cases we can move to preferred `optional` without namespace right away.
Reviewed By: ezyang, Yangqing
Differential Revision: D10844117
fbshipit-source-id: 99a59e692fb4b236b299579f937f1536d443d899
Summary:
This PR implements a DataLoader API for the C++ frontend.
The components present in this API largely match the Python API. It consists of:
- `Dataset`s: Conceptually a function from a set of indices to a batch of examples;
- `Transform`s: A functional transformation of a dataset. A `Map<D, T>` for Dataset `D` and transform `T` is itself a dataset;
- `Sampler`s: Specify a strategy for generating indices for a new batch;
- A `DataLoader`, with the ability to automatically parallelize fetching of samples across multiple worker threads;
Note that collation functions fall naturally out of the `Map<Dataset, Transform>` abstraction.
Things that are missing right now that maybe should be added:
- Memory pinning for CUDA tensors
The API was designed to be generalizable to almost any kind of dataset, transform or sampling strategy, while providing a convenient API out of the box. To achieve this, it is quite heavily templatized on various possible input types.
There are many parts to this PR! Right now, I would like feedback on:
- Your impression of the general usability of the API;
- Your impression of which parts seem too complex or overthought;
- The implementation of the parallelization aspects of the DataLoader. I've followed the Python implementation in some matters, but also differ in others. I think my implementation is a little cleaner and decouples components slightly better than the Python dataloader.
I haven't added too many comments yet, as this is fresh out of the oven. Let me know if anything is unclear from the code itself.
There also aren't any tests yet. I will write a comprehensive test suite once we agree on the API and implementation.
apaszke ezyang The controller you requested could not be found. pietern
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11918
Reviewed By: ezyang
Differential Revision: D9998881
Pulled By: goldsborough
fbshipit-source-id: 22cf357b63692bea42ddb1cc2abc71dae5030aea