Summary:
This is a stacked PR based on https://github.com/pytorch/pytorch/pull/14454.
It enables restoring the storage to the appropriate device.
~~[TODO]: add/modify appropriate tests~~ Done
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14711
Reviewed By: dzhulgakov
Differential Revision: D13315746
Pulled By: houseroad
fbshipit-source-id: fe6f24a45c35e88fd1a2eebc09950d4430fac185
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14414
The previous functions were CUDA-centric and led to lots of places
where we improperly assumed that CUDA is the only game in town (it's not).
Best to delete them.
What are your alternatives? This diff fixes some use sites, which may give
you some ideas. In particular, "given a device type, give me the
current device for that device type" might be a good function to enshrine
for real.
Reviewed By: gchanan
Differential Revision: D13218540
fbshipit-source-id: 2f42cd6b9bdab4930d25166b8041c9466a1c6e0a
Summary:
Make Samplers optionally accept a new size in their `reset()` method. This helps a dataloader or dataset reset the sampler for an epoch or a chunk of data with a different size.
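A minimal sketch of the idea, assuming a hypothetical `SequentialSampler` whose `reset` takes a `std::optional<std::size_t>` (the real torch::data sampler signatures may differ): calling `reset()` with no argument just rewinds, while passing a size also changes how many indices the next pass yields.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical sampler: yields consecutive index batches over [0, size).
class SequentialSampler {
 public:
  explicit SequentialSampler(std::size_t size) : size_(size) {}

  // Rewind to the start; optionally adopt a new dataset size
  // (e.g. the next chunk of data is larger or smaller).
  void reset(std::optional<std::size_t> new_size = std::nullopt) {
    if (new_size) {
      size_ = *new_size;
    }
    index_ = 0;
  }

  // Returns up to batch_size indices, or nullopt once exhausted.
  std::optional<std::vector<std::size_t>> next(std::size_t batch_size) {
    if (index_ >= size_) {
      return std::nullopt;
    }
    const std::size_t end = std::min(index_ + batch_size, size_);
    std::vector<std::size_t> batch;
    for (; index_ < end; ++index_) {
      batch.push_back(index_);
    }
    return batch;
  }

 private:
  std::size_t size_;
  std::size_t index_ = 0;
};
```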
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13870
Differential Revision: D13240120
Pulled By: soumith
fbshipit-source-id: 19c53f8be13c0fdcf504f0637b0d3e6009a8e599
Summary:
I noticed the test `DataLoaderTest.CanDereferenceIteratorMultipleTimes` doesn't test proper progression of the iterator. I also added a test for using `std::copy`.
Fixes https://github.com/pytorch/pytorch/issues/14276
ebetica ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14045
Differential Revision: D13092187
Pulled By: goldsborough
fbshipit-source-id: 57698ec00fa7b914b159677a4ab38b6b25c2860b
Summary:
This PR adds a `SharedDataset` to the C++ frontend data API, which allows wrapping a shared_ptr to a dataset into a class that conforms to the `Dataset` interface (with `get_batch`). This enables use cases where a custom dataset is (1) thread-safe and (2) expensive to copy. All workers will reference a single instance of this dataset. No additional copies are incurred.
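A rough sketch of the wrapping idea, with hypothetical names (`SquaresDataset`, `SharedDataset`, `make_shared_dataset`) rather than the actual torch::data classes: copying the wrapper copies only the `shared_ptr`, so every copy forwards `get_batch` to the same instance.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical dataset: returns the square of each requested index.
struct SquaresDataset {
  std::vector<int> get_batch(const std::vector<std::size_t>& indices) {
    std::vector<int> batch;
    for (std::size_t i : indices) {
      batch.push_back(static_cast<int>(i * i));
    }
    return batch;
  }
};

// Copies of this wrapper share one underlying dataset instance.
// No locking is added: the wrapped dataset must be thread-safe itself.
template <typename UnderlyingDataset>
class SharedDataset {
 public:
  explicit SharedDataset(std::shared_ptr<UnderlyingDataset> dataset)
      : dataset_(std::move(dataset)) {}

  // Forwards to the shared instance; conforms to the get_batch interface.
  auto get_batch(const std::vector<std::size_t>& indices) {
    return dataset_->get_batch(indices);
  }

  UnderlyingDataset& operator*() { return *dataset_; }
  long use_count() const { return dataset_.use_count(); }

 private:
  std::shared_ptr<UnderlyingDataset> dataset_;
};

// Constructs the dataset once, in place, behind a shared_ptr.
template <typename D, typename... Args>
SharedDataset<D> make_shared_dataset(Args&&... args) {
  return SharedDataset<D>(std::make_shared<D>(std::forward<Args>(args)...));
}
```

This is also why thread safety matters: the wrapper adds no locking, so the wrapped dataset has to provide it.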
jaliyae apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13800
Differential Revision: D13075610
Pulled By: goldsborough
fbshipit-source-id: 4ffdfd7959d49b042c0e254110085f62a0bfeb6c
Summary:
This reverts commit 37cb357d8d.
Trying to see whether this unbreaks master.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14082
Differential Revision: D13095888
Pulled By: bddppq
fbshipit-source-id: c728f80f233b4d9daaf65f43202d8104651029a9
Summary:
Deletes the `OptionsGuard` from ATen. This works towards the goal of reworking `DefaultTensorOptions`. `OptionsGuard` is troublesome because it relies on mutating thread local state. This PR updates the code that relied on it and then deletes the `OptionsGuard`.
ezyang gchanan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13738
Differential Revision: D13000962
Pulled By: goldsborough
fbshipit-source-id: c8143ee75070c2280f5fd1d9af86f8ce14279b72
Summary:
I think this will be it. For one, the previous test was broken because it returned the thread id instead of the sample index (which is the thing whose ordering is enforced). Just increasing the number of threads from 4 to 10 made this very obvious. I also think there is a race condition, which may or may not have surfaced, in that nothing stopped one worker from getting multiple batches, which would break the whole ordering logic. I've added a barrier struct so that workers wait until all workers are in the `get_batch` function before actually doing anything.
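A barrier of the kind described can be sketched with a mutex and a condition variable. This `Barrier` class and the `all_workers_saw_everyone` demo are illustrative assumptions, not the test's actual code: no worker returns from `wait()` until all `n` workers have arrived.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// One-shot barrier: each participant calls wait(); nobody proceeds until
// all `target` participants have arrived.
class Barrier {
 public:
  explicit Barrier(std::size_t target) : counter_(target) {}

  void wait() {
    std::unique_lock<std::mutex> lock(mutex_);
    if (counter_ > 0 && --counter_ == 0) {
      cv_.notify_all();  // last arrival releases everyone
    } else {
      cv_.wait(lock, [this] { return counter_ == 0; });
    }
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  std::size_t counter_;
};

// Demo: every worker increments `arrived`, waits at the barrier, then reads
// `arrived`. Because nobody passes the barrier early, every read sees n.
bool all_workers_saw_everyone(std::size_t n) {
  Barrier barrier(n);
  std::atomic<std::size_t> arrived{0};
  std::atomic<bool> ok{true};
  std::vector<std::thread> workers;
  for (std::size_t i = 0; i < n; ++i) {
    workers.emplace_back([&] {
      arrived.fetch_add(1);
      barrier.wait();
      if (arrived.load() != n) ok = false;
    });
  }
  for (auto& t : workers) t.join();
  return ok.load();
}
```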
Fixes https://github.com/pytorch/pytorch/issues/14002
ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14038
Differential Revision: D13088132
Pulled By: goldsborough
fbshipit-source-id: 4bded63756c6a49502ee07ef8709a03073e7e05f
Summary:
Attempts to unflake the dataloader ordering enforcement test. I think the issue was that the `thread_counter` variable was not atomic. I've made it atomic, and also global just to make it a bit clearer.
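A small sketch of why atomicity matters here (the `count_from_threads` helper is hypothetical; only the `thread_counter` name comes from the PR): with a plain integer, concurrent `++` is a data race and increments can be lost, whereas `std::atomic::fetch_add` makes each increment indivisible, so the final count is exact.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Global counter shared by all worker threads.
std::atomic<std::size_t> thread_counter{0};

// Hypothetical helper: `threads` workers each increment the counter
// `increments` times; returns the final value.
std::size_t count_from_threads(std::size_t threads, std::size_t increments) {
  thread_counter = 0;
  std::vector<std::thread> workers;
  for (std::size_t t = 0; t < threads; ++t) {
    workers.emplace_back([increments] {
      for (std::size_t i = 0; i < increments; ++i) {
        thread_counter.fetch_add(1);  // atomic read-modify-write
      }
    });
  }
  for (auto& w : workers) w.join();
  return thread_counter.load();
}
```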
Fixes https://github.com/pytorch/pytorch/issues/13634
colesbury SsnL ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13919
Differential Revision: D13051718
Pulled By: goldsborough
fbshipit-source-id: b9f7f6317701a8b861a1d5c6a9b2b17b44782561
Summary:
This PR adds Windows support for the C++ frontend. A lot of declarations were missing `TORCH_API` macros, and lots of code just did not compile on MSVC.
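For readers unfamiliar with export macros, here is a sketch of the usual pattern such macros follow. `MYLIB_API` and `MYLIB_BUILD_MAIN_LIB` are hypothetical names, and PyTorch's real `TORCH_API` definition lives in its own headers and may differ: MSVC only exports a symbol from a DLL when it is marked `__declspec(dllexport)`, and consumers must see `__declspec(dllimport)`, so a public declaration that lacks the macro can fail to link from outside the DLL on Windows even when the same code works elsewhere.

```cpp
#include <cassert>

// This translation unit plays the role of the library itself.
#define MYLIB_BUILD_MAIN_LIB

// Hypothetical export macro in the style of TORCH_API.
#if defined(_WIN32)
#if defined(MYLIB_BUILD_MAIN_LIB)
#define MYLIB_API __declspec(dllexport)  // building the DLL: export
#else
#define MYLIB_API __declspec(dllimport)  // using the DLL: import
#endif
#else
#define MYLIB_API __attribute__((visibility("default")))  // GCC/Clang
#endif

// Every public declaration carries the macro...
MYLIB_API int add_one(int x);

// ...and the definition would normally live in the library's .cpp file.
int add_one(int x) { return x + 1; }
```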
ebetica ezyang orionr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11716
Reviewed By: orionr
Differential Revision: D13038253
Pulled By: goldsborough
fbshipit-source-id: c8e5a45efd26117aeb99e768b56fcd5a89fcb9f8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13342
This PR introduces a few new concepts:
- DeviceGuardImplInterface, and implementations for CPU and CUDA, which
provide a generic interface to device and stream state,
without requiring a direct dependency on the code in question.
- InlineDeviceGuard, a general template for generating both specialized
and dynamically dispatched device guard implementations. Dynamic
dispatch is done by specializing it on a VirtualGuardImpl.
- Provide a device-independent DeviceGuard class, which can be used even
from CPU code. It uses the aforementioned dynamic dispatch.
- CUDA-specialized CUDAGuard class, which avoids dynamic dispatch
but can only be used from CUDA code.
- StreamGuard, which is the same as above, but for streams rather than
devices.
- Optional variants of all the aforementioned guards, which are a no-op if
no device/stream is specified
- CUDAMultiStreamGuard, specifically for the case when we want to set
a device on every guard.
There are some subtle semantic changes, which have been thoroughly documented
in the class definition.
BC-breaking changes:
- Move constructor/assignment have been removed from all device guard
implementations.
- In some cases where you previously wrote 'set_device' (or 'set_stream'), you now must write
'reset_device', because if you switch devices/device types, the stream/device on the
previous device is unset. This is different from previous behavior.
- CUDAGuard no longer handles streams, or multiple streams. Use CUDAStreamGuard
or CUDAMultiStreamGuard as appropriate for your use case.
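To make the guard concepts above concrete, here is a heavily simplified sketch. `GuardImplInterface`, `FakeGuardImpl`, `DeviceGuard` and `OptionalDeviceGuard` below are illustrative stand-ins with integer device indices, not the c10 classes: the guard dispatches through a virtual interface, restores the original device on destruction, deletes its move operations, and the optional variant is a no-op when given `std::nullopt`.

```cpp
#include <cassert>
#include <optional>

// Hypothetical backend interface for querying/setting the current device.
struct GuardImplInterface {
  virtual ~GuardImplInterface() = default;
  virtual int getDevice() const = 0;
  virtual void setDevice(int index) = 0;
};

// Fake backend that just remembers the "current" device index.
struct FakeGuardImpl final : GuardImplInterface {
  int current = 0;
  int getDevice() const override { return current; }
  void setDevice(int index) override { current = index; }
};

// RAII guard: switches device on construction, restores it on destruction.
// It only talks to the virtual interface, so it needs no backend headers.
class DeviceGuard {
 public:
  DeviceGuard(GuardImplInterface& impl, int index)
      : impl_(impl), original_(impl.getDevice()) {
    impl_.setDevice(index);
  }
  ~DeviceGuard() { impl_.setDevice(original_); }
  DeviceGuard(const DeviceGuard&) = delete;
  DeviceGuard& operator=(const DeviceGuard&) = delete;
  DeviceGuard(DeviceGuard&&) = delete;
  DeviceGuard& operator=(DeviceGuard&&) = delete;

  int original_device() const { return original_; }

 private:
  GuardImplInterface& impl_;
  int original_;
};

// Optional variant: does nothing when no device is given.
class OptionalDeviceGuard {
 public:
  OptionalDeviceGuard(GuardImplInterface& impl, std::optional<int> index) {
    if (index) guard_.emplace(impl, *index);
  }

 private:
  std::optional<DeviceGuard> guard_;
};
```

A CUDA-specialized guard would call the backend directly instead of through the virtual interface, trading generality for the absence of dynamic dispatch.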
Reviewed By: dzhulgakov
Differential Revision: D12849620
fbshipit-source-id: f61956256f0b12be754b3234fcc73c2abc1be04e
Summary:
We have an MNIST reader in the C++ data API, so we can get rid of the custom one currently implemented in the integration tests.
ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13737
Differential Revision: D12990936
Pulled By: goldsborough
fbshipit-source-id: 125a1910ec91d53dbf121570fc9eec6ccfba0477
Summary:
This is a precursor diff to Python <-> C++ frontend integration; I have a follow-up PR coming for that. This PR changes the C++ frontend module interface to replace the custom "cursor"s I introduced some time ago with `OrderedDict`. I introduced cursors at the time as a convenient way of applying functions and query operations on a module's parameters, buffers and modules, allowing things like `module.parameters().map(my_func)`. However, I noticed that (1) this functionality is easily implementable on top of a regular data structure and (2) more importantly, using OrderedDicts is much, much easier for Python integration. This is especially true given that ScriptModule today also uses OrderedDict. Since C++ frontend modules and ScriptModules will soon share as many implementation details as possible, it is overall the best move to ditch the custom cursor data structure and pervasively use OrderedDict everywhere.
For this I did:
1. Changed the C++ frontend module interface to more closely match the Python one by providing `parameters()`, `named_parameters()` and other methods Python provides. This is very important for the following diff which binds these into Python for inter-op with Python modules.
2. In lieu of the `Cursor::apply()` method I added `nn::Module::apply`. This again is one more unifying step between Python and C++, since Python modules have an apply function too.
3. Deleted all uses of Cursor.
4. Tidied and beefed up the `OrderedDict` class. In particular, I made `OrderedDict::Item` store an `std::pair` under the hood, because that is trivial to bind into Python and saved me a lot of headaches. `key` and `value` become methods instead of fields, which they should have been from the very start anyway, because that allows exactly these kinds of changes, per the usual software engineering principle of encapsulation.
5. Added many tests for the OrderedDict use in `nn::Module`.
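A minimal sketch of the `OrderedDict` shape described in point 4, assuming a hypothetical standalone class rather than the real `torch::OrderedDict`: each `Item` wraps a `std::pair`, and `key()`/`value()` are accessor methods, so the storage can change without breaking callers.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical, minimal insertion-ordered dictionary (not the torch class).
template <typename Key, typename Value>
class OrderedDict {
 public:
  // Each entry stores a std::pair under the hood; key() and value() are
  // methods, so the storage can change without breaking callers.
  class Item {
   public:
    Item(Key key, Value value) : pair_(std::move(key), std::move(value)) {}
    const Key& key() const { return pair_.first; }
    Value& value() { return pair_.second; }
    const Value& value() const { return pair_.second; }
    const std::pair<Key, Value>& pair() const { return pair_; }

   private:
    std::pair<Key, Value> pair_;
  };

  // Appends a new entry; duplicate keys are rejected.
  Value& insert(Key key, Value value) {
    if (index_.count(key) != 0) {
      throw std::invalid_argument("duplicate key");
    }
    index_.emplace(key, items_.size());
    items_.emplace_back(std::move(key), std::move(value));
    return items_.back().value();
  }

  const Value& operator[](const Key& key) const {
    auto it = index_.find(key);
    if (it == index_.end()) {
      throw std::out_of_range("key not found");
    }
    return items_[it->second].value();
  }

  // Keys in insertion order, like Python's OrderedDict.keys().
  std::vector<Key> keys() const {
    std::vector<Key> result;
    for (const Item& item : items_) result.push_back(item.key());
    return result;
  }

  std::size_t size() const { return items_.size(); }
  typename std::vector<Item>::const_iterator begin() const { return items_.begin(); }
  typename std::vector<Item>::const_iterator end() const { return items_.end(); }

 private:
  std::vector<Item> items_;                     // insertion order
  std::unordered_map<Key, std::size_t> index_;  // key -> position in items_
};
```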
ebetica ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13427
Differential Revision: D12894092
Pulled By: goldsborough
fbshipit-source-id: 715770c95a9643753a1db26d7f9da9a78619a15d
Summary:
In TorchScript and C++ extensions we currently advocate a mix of `torch::` and `at::` namespace usage. In the C++ frontend I had instead exported all symbols from `at::` and some from `c10::` into the `torch::` namespace. This is far, far easier for users to understand, and also avoids bugs around creating tensors vs. variables. The same should from now on be true for the TorchScript C++ API (for running and loading models) and all C++ extensions.
Note that since we're just talking about typedefs, this change does not break any existing code.
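A small sketch of why re-exporting via `using` declarations is non-breaking. The namespaces `lowlevel` and `frontend` are hypothetical stand-ins for `at::` and `torch::`: the using-declarations introduce aliases rather than new types, so the old and new spellings name exactly the same entity.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins: `lowlevel` plays the role of at::/c10::,
// and `frontend` the role of torch::.
namespace lowlevel {
struct Tensor {
  int value = 0;
};
inline Tensor ones() { return Tensor{1}; }
}  // namespace lowlevel

namespace frontend {
// Re-exports: aliases for the same entities, not new types or functions.
using lowlevel::Tensor;
using lowlevel::ones;
}  // namespace frontend

// Old and new spellings are literally the same type, so nothing breaks.
static_assert(std::is_same<frontend::Tensor, lowlevel::Tensor>::value,
              "re-exported names alias the original type");
```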
Once this lands I will update stuff in `pytorch/tutorials` too.
zdevito ezyang gchanan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13523
Differential Revision: D12942787
Pulled By: goldsborough
fbshipit-source-id: 76058936bd8707b33d9e5bbc2d0705fc3d820763
Summary:
This PR brings two changes to the recently landed C++ Frontend dataloader:
1. Removes the `size()` method from `BatchDataset`. This makes it cleaner to implement unsized ("infinite stream") datasets. The method was not used much beyond initial configuration.
2. Makes the index type of a dataset a template parameter of `BatchDataset` and `Sampler`. This essentially allows custom index types instead of only `vector<size_t>`. This greatly improves flexibility.
See the `InfiniteStreamDataset` and `TestIndex` datasets in the tests for what this enables.
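A sketch of what a templated request type allows, using hypothetical class names (`CountingStream`, `Squares`) rather than the test datasets mentioned above: an unsized stream can take a plain batch size as its request and never report a `size()`, while an indexed dataset keeps `std::vector<size_t>`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical base: the request type is a template parameter, so a
// dataset is not tied to std::vector<size_t> and need not have a size().
template <typename Batch, typename BatchRequest>
class BatchDataset {
 public:
  virtual ~BatchDataset() = default;
  virtual Batch get_batch(BatchRequest request) = 0;
};

// Unsized "infinite stream": the request is just how many examples to
// produce, and the stream never runs out.
class CountingStream : public BatchDataset<std::vector<int>, std::size_t> {
 public:
  std::vector<int> get_batch(std::size_t batch_size) override {
    std::vector<int> batch;
    for (std::size_t i = 0; i < batch_size; ++i) batch.push_back(next_++);
    return batch;
  }

 private:
  int next_ = 0;
};

// Classic indexed dataset: the request is a vector of indices.
class Squares
    : public BatchDataset<std::vector<int>, std::vector<std::size_t>> {
 public:
  std::vector<int> get_batch(std::vector<std::size_t> indices) override {
    std::vector<int> batch;
    for (std::size_t i : indices) batch.push_back(static_cast<int>(i * i));
    return batch;
  }
};
```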
Some additional minor updates and code movements too.
apaszke SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12960
Differential Revision: D12893342
Pulled By: goldsborough
fbshipit-source-id: ef03ea0f11a93319e81fba7d52a0ef1a125d3108
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13275
This resulted in a bunch of knock-on changes, which I will now
describe:
- s/original_index/original_device/
- s/last_index/last_device/
- A bunch of places that used set_index, now use CUDAGuard (which does have
set_index) because they were CUDA-specific code.
Major caveat: DeviceGuard doesn't *actually* work for non-CUDA/CPU devices. To make
that happen, I plan on totally replacing the implementation of DeviceGuard; what
I mostly care about here is wrangling the API into an acceptable state.
Reviewed By: gchanan
Differential Revision: D12832080
fbshipit-source-id: 7de068c7cec35663dc8a533026a626331336e61d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13232
DeviceGuard should be device agnostic, which means that it shouldn't
assume that int64_t means select the CUDA device.
Reviewed By: gchanan
Differential Revision: D10858024
fbshipit-source-id: b40e8337e4046906fd8f83a95e6206367fb29dbe
Summary:
ezyang on the template hack
smessmer on SFINAE of the `TensorOptions(Device)`
goldsborough on the C++ API test changes
zdevito on the `jit` codegen changes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13146
Reviewed By: ezyang
Differential Revision: D12823809
Pulled By: SsnL
fbshipit-source-id: 98d65c401c98fda1c6fa358e4538f86c6495abdc
Summary:
Implements serialization and deserialization for samplers in the C++ frontend dataloader.
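A minimal sketch of what sampler serialization buys you, assuming a hypothetical sampler with `save`/`load` methods over standard streams (the real C++ frontend serializes through its own archive types): the sampler's position survives a round trip, so iteration can resume where it stopped.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>

// Hypothetical sequential sampler; its state is its size and position.
class SequentialSampler {
 public:
  explicit SequentialSampler(std::size_t size) : size_(size) {}

  // Writes the next index into `index`; returns false when exhausted.
  bool next(std::size_t& index) {
    if (index_ >= size_) {
      return false;
    }
    index = index_++;
    return true;
  }

  // Serialize/deserialize the state as whitespace-separated text.
  void save(std::ostream& out) const { out << size_ << ' ' << index_ << ' '; }
  void load(std::istream& in) { in >> size_ >> index_; }

 private:
  std::size_t size_;
  std::size_t index_ = 0;
};
```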
apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12999
Differential Revision: D10859676
Pulled By: goldsborough
fbshipit-source-id: cd132100fd35323e5a3df33e314511750806f48d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12768
Note: DefaultTensorOptions no longer fits in 64 bits.
I kept functions that take ScalarType as input to minimize changes for now.
Reviewed By: ezyang
Differential Revision: D10419671
fbshipit-source-id: 9cc8c5982fde9ff243e03d55c0c52c2aa2c7efd8
Summary:
Does
```cpp
namespace torch {
using c10::optional;
using c10::nullopt;
}
```
so that users can stay oblivious to the ATen/c10 changes happening in the background, and also don't have to deal with multiple namespaces (which is very confusing).
ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12927
Differential Revision: D10510630
Pulled By: goldsborough
fbshipit-source-id: e456264f2fbca3eda277712de11cdd8acc77fbd4
Summary:
This PR adds an optional type to ATen native functions, autograd, the JIT schema and the Python arg parser, and closes #9513. It allows us to use optional default values (including None) in function signatures and implementations like clamp, etc., and also lets us remove the python_default_init hack.
Follow up:
remove python_default_init completely.
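A sketch of the optional-default pattern in plain C++17, using a hypothetical `clamp_values` function over `std::vector<double>` rather than the ATen `clamp` signature: each bound is a `std::optional`, and omitting it (the C++ analogue of passing `None`) simply skips that side.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical clamp with optional bounds: an empty optional means
// "no bound on this side", the C++ counterpart of passing None.
std::vector<double> clamp_values(std::vector<double> values,
                                 std::optional<double> min = std::nullopt,
                                 std::optional<double> max = std::nullopt) {
  for (double& v : values) {
    if (min) v = std::max(v, *min);
    if (max) v = std::min(v, *max);
  }
  return values;
}
```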
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12582
Differential Revision: D10417423
Pulled By: wanchaol
fbshipit-source-id: 1c80f0727bb528188b47c595629e2996be269b89
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13082
Follow-up to D10511254. For these cases we can move to the preferred `optional` without a namespace right away.
Reviewed By: ezyang, Yangqing
Differential Revision: D10844117
fbshipit-source-id: 99a59e692fb4b236b299579f937f1536d443d899
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12991
Remove the file proxying. Until we can land `using namespace c10` everywhere, we just keep the one-off namespace proxy. The follow-up diff is going to replace explicit `at::optional` with plain `optional` usage.
Reviewed By: ezyang, Yangqing
Differential Revision: D10511254
fbshipit-source-id: 8297c61d7e9810ae215a18869a6ec9b63f55d202
Summary:
This PR implements a DataLoader API for the C++ frontend.
The components present in this API largely match the Python API. It consists of:
- `Dataset`s: Conceptually a function from a set of indices to a batch of examples;
- `Transform`s: A functional transformation of a dataset. A `Map<D, T>` for Dataset `D` and transform `T` is itself a dataset;
- `Sampler`s: Specify a strategy for generating indices for a new batch;
- A `DataLoader`, with the ability to automatically parallelize fetching of samples across multiple worker threads;
Note that collation functions fall naturally out of the `Map<Dataset, Transform>` abstraction.
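To illustrate how the pieces compose, here is a single-threaded sketch with hypothetical types (`RangeDataset`, `SumTransform`, `Map`, `SequentialSampler`, `load_epoch`). It is not the API from this PR and omits worker threads entirely, but it shows a transform acting as collation: `Map<D, T>` is itself a dataset, so the loader loop needs no separate collation hook.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Dataset: maps a set of indices to a batch (each example is its index).
struct RangeDataset {
  std::vector<int> get_batch(const std::vector<std::size_t>& indices) const {
    std::vector<int> batch;
    for (std::size_t i : indices) batch.push_back(static_cast<int>(i));
    return batch;
  }
};

// Transform: here a "collation" that reduces a batch to its sum.
struct SumTransform {
  int operator()(const std::vector<int>& batch) const {
    int sum = 0;
    for (int x : batch) sum += x;
    return sum;
  }
};

// Map<D, T>: its batches are T applied to D's batches, so it is a dataset.
template <typename D, typename T>
struct Map {
  D dataset;
  T transform;
  auto get_batch(const std::vector<std::size_t>& indices) const {
    return transform(dataset.get_batch(indices));
  }
};

template <typename D, typename T>
Map<D, T> map_dataset(D dataset, T transform) {
  return Map<D, T>{std::move(dataset), std::move(transform)};
}

// Sampler: produces the indices for the next batch (empty when done).
struct SequentialSampler {
  std::size_t size;
  std::size_t index = 0;
  std::vector<std::size_t> next(std::size_t batch_size) {
    std::vector<std::size_t> batch;
    while (batch.size() < batch_size && index < size) batch.push_back(index++);
    return batch;
  }
};

// Single-threaded "loader": drains the sampler, fetching one batch per step.
template <typename D>
auto load_epoch(const D& dataset, std::size_t size, std::size_t batch_size) {
  using Batch = decltype(dataset.get_batch(std::vector<std::size_t>{}));
  std::vector<Batch> batches;
  SequentialSampler sampler{size};
  for (auto indices = sampler.next(batch_size); !indices.empty();
       indices = sampler.next(batch_size)) {
    batches.push_back(dataset.get_batch(indices));
  }
  return batches;
}
```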
Things that are missing right now and may need to be added:
- Memory pinning for CUDA tensors
The API was designed to be generalizable to almost any kind of dataset, transform or sampling strategy, while providing a convenient API out of the box. To achieve this, it is quite heavily templatized on various possible input types.
There are many parts to this PR! Right now, I would like feedback on:
- Your impression of the general usability of the API;
- Your impression of which parts seem too complex or overthought;
- The implementation of the parallelization aspects of the DataLoader. I've followed the Python implementation in some respects but differ in others. I think my implementation is a little cleaner and decouples components slightly better than the Python dataloader.
I haven't added too many comments yet, as this is fresh out of the oven. Let me know if anything is unclear from the code itself.
There also aren't any tests yet. I will write a comprehensive test suite once we agree on the API and implementation.
apaszke ezyang pietern
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11918
Reviewed By: ezyang
Differential Revision: D9998881
Pulled By: goldsborough
fbshipit-source-id: 22cf357b63692bea42ddb1cc2abc71dae5030aea
Summary:
This PR does three things:
1. Add support for serializing to `ostream` and deserializing from `istream`s in addition to files. This is after https://github.com/pytorch/pytorch/pull/11932 added support for streams in `torch::jit::ExportModule` and `torch::jit::load`.
2. Update the internal interface for how things get serialized into archives (e.g. use the more idiomatic `operator<<` instead of a `save` method). *The external interface does not change*.
3. Add documentation.
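A sketch of stream-based archives using `operator<<`/`operator>>`, with hypothetical `OutputArchive`/`InputArchive` classes that handle only `std::vector<float>` (the real archives serialize tensors and modules). Because they target `std::ostream`/`std::istream`, the same code works for files and in-memory buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical archive that writes to any std::ostream (file, buffer, ...).
class OutputArchive {
 public:
  explicit OutputArchive(std::ostream& out) : out_(out) {}

  // Idiomatic streaming instead of a save() method; returns *this to chain.
  OutputArchive& operator<<(const std::vector<float>& values) {
    out_ << values.size();
    for (float v : values) out_ << ' ' << v;
    out_ << '\n';
    return *this;
  }

 private:
  std::ostream& out_;
};

// Hypothetical counterpart that reads from any std::istream.
class InputArchive {
 public:
  explicit InputArchive(std::istream& in) : in_(in) {}

  InputArchive& operator>>(std::vector<float>& values) {
    std::size_t count = 0;
    in_ >> count;
    values.resize(count);
    for (float& v : values) in_ >> v;
    return *this;
  }

 private:
  std::istream& in_;
};
```

The text encoding here relies on default stream precision and is only exact for short values like those in the test; a real archive would use a binary or protobuf format.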
ezyang ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12421
Reviewed By: ezyang
Differential Revision: D10248529
Pulled By: goldsborough
fbshipit-source-id: 6cde6abd0174e3fbf3579c05376a32db0b53755f
Summary:
There is still some work to be done:
- Move logging and unify AT_WARN with LOG(ERROR).
- A few header files are still being plumbed through and need cleaning.
- caffe2::EnforceNotMet aliasing is not done yet.
- The macros need to be unified. See c10/util/Exception.h
This is mainly a codemod and does not cause functional changes. If your job fails and you trace the failure back to this diff, it can usually be fixed by one of the following approaches:
(1) add //caffe2/c10:c10 to your dependency (or transitive dependency).
(2) change objects such as at::Error and at::Optional to the c10 namespace.
(3) change functions to the c10 namespace. In particular, caffe2::MakeString is not overridden by the unified c10::str function. Nothing else changes.
Please kindly consider not reverting this diff - it involves multiple rounds of rebasing and the fix is usually simple. Contact jiayq@ or AI Platform Dev for details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12354
Reviewed By: orionr
Differential Revision: D10238910
Pulled By: Yangqing
fbshipit-source-id: 7794d5bf2797ab0ca6ebaccaa2f7ebbd50ff8f32
Summary:
In our #better-engineering quest to remove all uses of Catch in favor of gtest, this PR ports the JIT tests to gtest. After #11846 lands, we will be able to delete Catch.
I don't claim to use/write these tests much (though I wrote the custom operator tests), so please do scrutinize whether you want to write tests in the way I propose. Basically:
1. One function declaration per "test case" in test/cpp/jit/test.h
2. One definition in test/cpp/jit/test.cpp
3. If you want to be able to run it in Python, add it to `runJitTests()` which is called from Python tests
4. If you want to be able to run it in C++, add a `JIT_TEST` line in test/cpp/jit/gtest.cpp
Notice also I was able to share support code between C++ frontend and JIT tests, which is healthy.
ezyang apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12030
Differential Revision: D10207745
Pulled By: goldsborough
fbshipit-source-id: d4bae087e4d03818b72b8853cd5802d79a4cf32e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12103
This defers lookup of defaults to the site where we read
out of TensorOptions. THIS IS A BC-BREAKING BEHAVIOR CHANGE,
but we expect that the bulk of OptionsGuard uses do not allocate TensorOptions
inside the OptionsGuard region and then use them outside of the region
(the situation where behavior could change).
I also optimize the size of TensorOptions by rearranging fields, so that we
always fit in two 64-bit words.
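A sketch of what deferring the default lookup means, using a hypothetical `Options` struct with a single `dtype` field and an RAII guard over a thread-local default (stand-ins for TensorOptions and OptionsGuard): the default is consulted when the value is read, so options created inside a guard region but read after it see the outer default. This is exactly the behavior change described above.

```cpp
#include <cassert>
#include <optional>

// Hypothetical thread-local default, standing in for the state that
// OptionsGuard used to mutate (0 = "float", 1 = "double", say).
thread_local int default_dtype = 0;

// RAII guard that temporarily overrides the thread-local default.
class DefaultDtypeGuard {
 public:
  explicit DefaultDtypeGuard(int dtype) : saved_(default_dtype) {
    default_dtype = dtype;
  }
  ~DefaultDtypeGuard() { default_dtype = saved_; }
  DefaultDtypeGuard(const DefaultDtypeGuard&) = delete;
  DefaultDtypeGuard& operator=(const DefaultDtypeGuard&) = delete;

 private:
  int saved_;
};

// Options record only what was set explicitly; the default is looked up
// when the value is read, not when the Options object is created.
struct Options {
  std::optional<int> dtype;
  int dtype_or_default() const { return dtype.value_or(default_dtype); }
};
```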
Reviewed By: goldsborough
Differential Revision: D10052523
fbshipit-source-id: f454a15b4dbf8cd17bc902ab7d2016f2f689ed13
Summary:
Tensors cannot be created globally because of static initialization order issues. So tensors for the optim_baseline test must be created lazily instead. This is fine because these functions will only be called once (in the respective test).
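A sketch of the difference, using `FakeTensor` (a `std::vector<float>`) as a hypothetical stand-in for a tensor. The lazy version builds the value when the function is called, which is fine for a function called once per test; a function-local static is an alternative when repeated calls should share one object.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a tensor.
using FakeTensor = std::vector<float>;

// Avoid: a namespace-scope object is constructed during static
// initialization, whose order relative to other translation units
// (e.g. ones that set up the tensor library) is unspecified.
//   const FakeTensor kExpectedWeights = {0.1f, 0.2f, 0.3f};

// Lazy: build the value when the function is called. Each call creates
// a fresh object, which is fine if it is called once per test.
FakeTensor expected_weights() {
  return FakeTensor{0.1f, 0.2f, 0.3f};
}

// Alternative: a function-local static is built on first use and then
// shared; C++11 guarantees its initialization is thread-safe.
const FakeTensor& shared_expected_weights() {
  static const FakeTensor weights{0.1f, 0.2f, 0.3f};
  return weights;
}
```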
ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12301
Differential Revision: D10201008
Pulled By: goldsborough
fbshipit-source-id: 59a041f437354e7c6600e5655b3e2d0647dbde9e
Summary:
This PR is a large codemod to rewrite all C++ API tests with GoogleTest (gtest) instead of Catch.
You can largely trust me to have correctly code-modded the tests, so it's not required to review every one of the 2000+ changed lines. However, additional things I changed were:
1. Moved the cmake parts for these tests into their own `CMakeLists.txt` under `test/cpp/api` and called `add_subdirectory` from `torch/CMakeLists.txt`
2. Fixed DataParallel tests, which weren't being compiled because `USE_CUDA` wasn't being set correctly at all.
3. Updated README
ezyang ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11953
Differential Revision: D9998883
Pulled By: goldsborough
fbshipit-source-id: affe3f320b0ca63e7e0019926a59076bb943db80
Summary:
This PR serves two purposes:
1. Design an abstraction over a serialization scheme for C++ modules, optimizers and tensors in general,
2. Add serialization to the ONNX/PyTorch proto format.
This is currently a rough prototype I coded up today, to get quick feedback.
For this I propose the following serialization interface within the C++ API:
```cpp
#include <string>

// Assumes a torch header declaring torch::Tensor is already included.
namespace torch { namespace serialize {
class Reader {
 public:
  virtual ~Reader() = default;
  virtual void read(const std::string& key, Tensor& tensor, bool is_buffer = false) = 0;
  virtual void finish() { }
};
class Writer {
 public:
  virtual ~Writer() = default;
  virtual void write(const std::string& key, const Tensor& tensor, bool is_buffer = false) = 0;
  virtual void finish() { }
};
}} // namespace torch::serialize
```
There are then subclasses of these two for (1) Cereal and (2) Protobuf (called the "DefaultWriter" and "DefaultReader" to hide the implementation details). See `torch/serialize/cereal.h` and `torch/serialize/default.h`. This abstraction and subclassing for these two allows us to:
1. Provide a Cereal-less serialization scheme that we can ship and iterate on going forward,
2. Provide no-friction backwards compatibility with existing C++ API uses, mainly StarCraft.
The user-facing API is (conceptually):
```cpp
void torch::save(const Module& module, Writer& writer);
void torch::save(const Optimizer& optimizer, Writer& writer);
void torch::read(Module& module, Reader& reader);
void torch::read(Optimizer& optimizer, Reader& reader);
```
with implementations for both optimizers and modules that write into the `Writer` and read from the `Reader`.
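To show how a backend plugs into the proposed interface, here is a hypothetical in-memory implementation. `Tensor` is a `std::vector<float>` stand-in so the sketch compiles on its own, and `MemoryWriter`, `MemoryReader`, `TinyModule`, `save` and `load` are illustrative names, not the torch API. `save` and `load` only see the abstract `Writer`/`Reader`, so swapping in another backend would not change them.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Stand-in for torch::Tensor so the sketch is self-contained.
using Tensor = std::vector<float>;

// Interfaces mirroring the proposal above.
class Reader {
 public:
  virtual ~Reader() = default;
  virtual void read(const std::string& key, Tensor& tensor, bool is_buffer = false) = 0;
  virtual void finish() {}
};

class Writer {
 public:
  virtual ~Writer() = default;
  virtual void write(const std::string& key, const Tensor& tensor, bool is_buffer = false) = 0;
  virtual void finish() {}
};

// Hypothetical in-memory backend: writer and reader share one map.
using Store = std::map<std::string, Tensor>;

class MemoryWriter : public Writer {
 public:
  explicit MemoryWriter(Store& store) : store_(store) {}
  void write(const std::string& key, const Tensor& tensor, bool /*is_buffer*/) override {
    store_[key] = tensor;
  }

 private:
  Store& store_;
};

class MemoryReader : public Reader {
 public:
  explicit MemoryReader(const Store& store) : store_(store) {}
  void read(const std::string& key, Tensor& tensor, bool /*is_buffer*/) override {
    auto it = store_.find(key);
    if (it == store_.end()) {
      throw std::runtime_error("missing key: " + key);
    }
    tensor = it->second;
  }

 private:
  const Store& store_;
};

// Hypothetical module with two named parameters.
struct TinyModule {
  Tensor weight;
  Tensor bias;
};

// Save/load see only the abstract interfaces, not the backend.
void save(const TinyModule& module, Writer& writer) {
  writer.write("weight", module.weight);
  writer.write("bias", module.bias);
  writer.finish();
}

void load(TinyModule& module, Reader& reader) {
  reader.read("weight", module.weight);
  reader.read("bias", module.bias);
  reader.finish();
}
```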
ebetica ezyang zdevito dzhulgakov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11619
Differential Revision: D9984664
Pulled By: goldsborough
fbshipit-source-id: e03afaa646221546e7f93bb8dfe3558e384a5847
Summary:
The second part of T32009899
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11556
Differential Revision: D9888224
Pulled By: zrphercule
fbshipit-source-id: cb0d0ba5d9c7ad601ee3bce0d932ce9cbbc40908
Summary:
1. Document the Sequential module in the C++ API at both a high level (why does this exist?) and a low level (how do I use it?).
2. Change the Sequential tests to be in a style that makes them easier to convert to gtest. No code changes.
ebetica ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11648
Differential Revision: D9834526
Pulled By: goldsborough
fbshipit-source-id: 39f2f5c6cbbf8ed5a1b69986978c8ef127036de1