Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22004
In future, we want all dicts/lists to store information about the types they contain.
This is only possible if the creation API doesn't allow creating lists/dicts without type information.
This diff removes some call sites that don't specify type information and have it specify type information.
Reviewed By: dzhulgakov
Differential Revision: D15906387
fbshipit-source-id: 64766a2534b52c221e8a5501a85eaad13812e7bd
Summary:
This change adds one advanced support for cross-chunk shuffling.
For training with static dataset, the default configuration is at user's disposal. However, in some user cases, over each epoch, new data is added to the current dataset, thus the dataset's size is dynamically changing/increasing. In order to mix the new data and the old data for better random sampling, one approach is to shuffle examples from more than 1 chunks. This feature is supported with this change. By specifying the `cross_chunk_shuffle_count_` on construction, advanced user can specify how many chunks to shuffle example from.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22347
Differential Revision: D16081378
Pulled By: zhangguanheng66
fbshipit-source-id: fd001dfb9e66947839adecfb9893156fbbce80d0
Summary:
When dealing with large scale dataset, it is handy if we can save the dataset status and resume later. Especially in cases where some unexpected crash happens, user don't need to start over the whole dataset from begining. Instead, they can reload it from the last checkpoint.
This change adds support for checkpoint save/load logic in ChunkDataset.
On ChunkDataset construction, user can specify a file name from which to load the checkpoint. If it is empty, default to start from fresh; otherwise the ChunkDataset will 'fast forward' the chunk sampler to the corresponding checkpoint.
The user can also call ChunkDataset::save() to serialize current status to a file, which can be used later.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21889
Differential Revision: D16024582
Pulled By: ailzhang
fbshipit-source-id: 1862ab5116f94c9d29da174ce04a91041d06cad5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22084
For DictPtr/ListPtr, default construction was disallowed because it was ambigious if it's supposed to create an empty list or a nullptr.
But since we renamed them to Dict/List, we can now allow default construction without ambiguity.
Differential Revision: D15948098
fbshipit-source-id: 942a9235b51608d1870ee4a2f2f0a5d0d45ec6e6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21937
This changes call sites to use the new naming scheme
Reviewed By: zdevito
Differential Revision: D15892404
fbshipit-source-id: 8d32aa90a0ead1066688166478f299fde9c2c133
Summary:
Resolves https://github.com/pytorch/lockdown/issues/18
This implements NamedTuple by taking advantage of the existing `names` field in `TupleType`.
TODO: This currently doesn't retain the NamedTuple-ness through serialization. Discussed with suo offline, we can probably make a way to define an anonymous NamedTuple in script (e.g. `NamedTuple('Foo', [('a', int), ('b', float), ('c', List[float])])` and serialize that
TODO: implement support for calling the constructor with kwargs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21428
Differential Revision: D15741564
Pulled By: jamesr66a
fbshipit-source-id: c077cbcea1880675ca6deb340a9ec78f824a136c
Summary:
This renames the CMake `caffe2` target to `torch`, as well as renaming `caffe2_gpu` to `torch_gpu` (and likewise for other gpu target variants). Many intermediate variables that don't manifest as artifacts of the build remain for now with the "caffe2" name; a complete purge of `caffe2` from CMake variable names is beyond the scope of this PR.
The shell `libtorch` library that had been introduced as a stopgap in https://github.com/pytorch/pytorch/issues/17783 is again flattened in this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20774
Differential Revision: D15769965
Pulled By: kostmo
fbshipit-source-id: b86e8c410099f90be0468e30176207d3ad40c821
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21177
- Integrate c10::ListPtr into IValue and the c10 dispatcher.
- Streamline conversion to/from IValue. Before, we had IValue::to<> and kernel_functor.h had its own ivalue_to_arg_type and return_type_to_ivalue. They are now unified. Also, this means that nested types like Dicts of Lists of Optional of Dict of ... do work as expected now
Differential Revision: D15476433
fbshipit-source-id: bde9df80df20091aa8e6ae17ba7e90abd149b954
Summary:
Fixes#19540
CC nmerrill67
C++ data parallel was using Module.clone() to create module replicas on every destination device. However, clone() does not set up gradient edges to point from replicas to the original module. As a result, the gradient will not be aggregated into the original module. This commit fixes the the problem by manually setting gradient edges from every parameter X in every replica to the same parameter X in the original module.
## Failed Attempt
Initially I tried implementing what we did in `replicate.py`, which
1. create module replicas
2. use Python `Broadcast` autograd function to broadcast every parameter in the original module to all destination devices.
3. assign the broadcast result params to module replicas' `_parameters` dict.
This works in Python because derived module member field params (e.g., `Linear.weight`) and base module `_parameters` (e.g., `Linear._parameters['weight']`) are referencing the same parameter instance. Assigning one of them will apply to both. However, in C++, even though I can modify Module's `parameters_ `values and gradient edges to point to the broadcast source, I cannot touch the weight and bias member fields in Linear, because replicate cannot (and should not) add special-case handlers to every different module. (See `Linear` [.h](https://github.com/pytorch/pytorch/blob/master/torch/csrc/api/include/torch/nn/modules/linear.h), [.cpp](https://github.com/pytorch/pytorch/blob/master/torch/csrc/api/src/nn/modules/linear.cpp)) Although they initially point to the same `TensorImpl` instance, after assigning to `Module.parameters_['weight']`, it will be different from `Linear.weight`.
## Solution Options
gchanan and I had several discussions on this issue and figured two solutions to this problem.
### Option One [implemented in this PR]
Replicate the module in two steps:
1. call `Module.clone()` to create a module replica on every destination device.
2. manually setting gradient edges from every parameter in every replica to the same parameter in the original module.
* Pro: Does not need to change any existing module, and relatively easier to implement
* Con: It is a little hackish.
### Options Two
Implement a `Replicatable` class (similar to `Cloneable`), and make it a friend class of `Module`. For more details see `Note [Replicating Modules]` in the code change.
* Pro: Maybe this aligns more with our existing approach implemented in `Cloneable`?
* Con: Require changes to every existing module.
I am inclined to go with option one, because `replicate` will only be used on data parallel. I feel it is too big an overkill if we have to change all existing module implementations due to a data parallel requirement.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20910
Differential Revision: D15556426
Pulled By: mrshenli
fbshipit-source-id: aa836290ec657b32742e2bea80bd0ac2404ef3b0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20669
Before, Dict was a value type, i.e. copying it did a deep copy.
Unfortunately, this doesn't work well with storing and passing Dicts around in IValues because IValues are reference types.
This diff changes Dict to be a reference type.
Reviewed By: dzhulgakov
Differential Revision: D15404911
fbshipit-source-id: dc990d3eb7cae044b74dd0253f8b704dde6a6c86
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20372
Implement a Dict type that allows us to abstract away from the concrete implementation used.
The API is similar to std::unordered_map, but behind the scenes we can switch to any map implementation we like. ska::flat_hash_map, google dense map, or any future map implementation with better performance.
Switching such an implementation choice does not have to break backwards compatibility of kernel code using the Dict type.
Reviewed By: zdevito
Differential Revision: D15298234
fbshipit-source-id: b5ad368a9e9516030805cd8f5f1b02e3986933c0
Summary:
This PR is an intermediate step toward the ultimate goal of eliminating "caffe2" in favor of "torch". This PR moves all of the files that had constituted "libtorch.so" into the "libcaffe2.so" library, and wraps "libcaffe2.so" with a shell library named "libtorch.so". This means that, for now, `caffe2/CMakeLists.txt` becomes a lot bigger, and `torch/CMakeLists.txt` becomes smaller.
The torch Python bindings (`torch_python.so`) still remain in `torch/CMakeLists.txt`.
The follow-up to this PR will rename references to `caffe2` to `torch`, and flatten the shell into one library.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17783
Differential Revision: D15284178
Pulled By: kostmo
fbshipit-source-id: a08387d735ae20652527ced4e69fd75b8ff88b05
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19976
Implement a Dict type that allows us to abstract away from the concrete implementation used.
The API is similar to std::unordered_map, but behind the scenes we can switch to any map implementation we like. ska::flat_hash_map, google dense map, or any future map implementation with better performance.
Switching such an implementation choice does not have to break backwards compatibility of kernel code using the Dict type.
Reviewed By: li-roy
Differential Revision: D15156384
fbshipit-source-id: b9313ec4dd9acb3b6a0035345b6ba4f2a437d1e5
Summary:
This PR restricts the BatchType template argument of ChunkDataReader to STL
vectors only. Internally, ChunkDataReader was assuming BatchType was a
vector, but the user could pass any type to the template argument,
leading to compiling issues during CPP extensions.
Additionally to the proposed API change, this PR adds missing include
headers to chunk.h. Currently the current implementation works but if
users try to create C++ extensions that implements new ChunkDataReaders
to be along with the existing ChunkDataset, the build will fail due to
the missing headers.
In terms of functionality, nothing has changed. This PR simply makes the
implementation slightly more robust for future extensions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19485
Differential Revision: D15261725
Pulled By: soumith
fbshipit-source-id: 38c9465d665392ae6a2d12c5a520a4f501e1a6ca
Summary:
Currently, the Python API doesn't serialize layers that don't have weights (such as `nn.ReLU` and `nn.MaxPool2d`e.g. in https://github.com/pytorch/vision/blob/master/torchvision/models/densenet.py#L80-L81). If one saves a model that contains weight-less layers in Python and tries to load it into C++, the C++ module loading code (`torch::load(...)`) will throw an error complaining that the expected layers are not found in the serialized file (e.g. https://github.com/pytorch/vision/pull/728#issuecomment-480974175). This PR solves the problem by ignoring layers that are not serializable (which currently only include `nn::Functional`) in the C++ module serialization code (`torch::save(...)` and `torch::load(...)`), and the user is expected to use `nn::Functional` to wrap the weight-less layers so that they can be ignored when serializing / deserializing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19740
Differential Revision: D15100575
Pulled By: yf225
fbshipit-source-id: 956481a2355d1de45341585abedda05e35d2ee8b
Summary:
In the distributed training development work, we need to be able to serialize a `std::vector` of `torch::Tensor`s. This PR adds support for serializing `std::vector<torch::Tensor>`.
cc. mrshenli
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19677
Differential Revision: D15069860
Pulled By: yf225
fbshipit-source-id: 505147e5f5fea78be1bf60fb8418bc187dbc2a98
Summary:
Fixes#18518
I changed the C++ API torch::nn::init::orthogonal_ implementation to match the Python implementation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18915
Differential Revision: D14851833
Pulled By: ezyang
fbshipit-source-id: 45b5e9741582777c203e9ebed564ab3ac1f94baf
Summary:
Previously, we were not able to assign names to `nn::Sequential`'s submodules. This PR adds this feature to match the Python API. Example use:
```cpp
Sequential sequential(named_submodule({
{"linear", Linear(10, 3)},
{"conv2d", Conv2d(1, 2, 3)},
{"dropout", Dropout(0.5)},
{"batchnorm", BatchNorm(5)},
{"embedding", Embedding(4, 10)},
{"lstm", LSTM(4, 5)}
}));
```
It also enables loading parameters of Python `nn.Sequential` module with custom submodules names into C++ frontend, unblocking https://github.com/pytorch/vision/pull/728#issuecomment-466661344.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17552
Differential Revision: D14246834
Pulled By: yf225
fbshipit-source-id: 3030b5c5d68f6dd5d3e37ac4b4f98dc6d6d9ba72
Summary:
Check for Tuple Matching in isSubvalueOf, since they may contain container types that need to be recursed within isSubvalueOf
Fix for https://github.com/pytorch/pytorch/issues/17650
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17687
Differential Revision: D14324642
Pulled By: eellison
fbshipit-source-id: 7f1e019875286b2640a3b9c003d1635dda8cf543
Summary:
The chunk buffer had a possibility to hang when no data is read and the buffer size is lower than chunk size. We detected this while running with larger dataset and hence the fix. I added a test to mimic the situation and validated that the fix is working. Thank you Xueyun for finding this issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17409
Differential Revision: D14198546
Pulled By: soumith
fbshipit-source-id: b8ca43b0400deaae2ebb6601fdc65b47f32b0554
Summary:
Currently there is a mismatch in naming between Python BatchNorm `running_var` and C++ BatchNorm `running_variance`, which causes JIT model parameters loading to fail (https://github.com/pytorch/vision/pull/728#issuecomment-466067138):
```
terminate called after throwing an instance of 'c10::Error'
what(): No such serialized tensor 'running_variance' (read at /home/shahriar/Build/pytorch/torch/csrc/api/src/serialize/input-archive.cpp:27)
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x85 (0x7f2d92d32f95 in /usr/local/lib/libc10.so)
frame #1: torch::serialize::InputArchive::read(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, at::Tensor&, bool) + 0xdeb (0x7f2d938551ab in /usr/local/lib/libtorch.so.1)
frame #2: torch::nn::Module::load(torch::serialize::InputArchive&) + 0x98 (0x7f2d9381cd08 in /usr/local/lib/libtorch.so.1)
frame #3: torch::nn::Module::load(torch::serialize::InputArchive&) + 0xf9 (0x7f2d9381cd69 in /usr/local/lib/libtorch.so.1)
frame #4: torch::nn::Module::load(torch::serialize::InputArchive&) + 0xf9 (0x7f2d9381cd69 in /usr/local/lib/libtorch.so.1)
frame #5: torch::nn::operator>>(torch::serialize::InputArchive&, std::shared_ptr<torch::nn::Module> const&) + 0x32 (0x7f2d9381c7b2 in /usr/local/lib/libtorch.so.1)
frame #6: <unknown function> + 0x2b16c (0x5645f4d1916c in /home/shahriar/Projects/CXX/build-TorchVisionTest-Desktop_Qt_5_12_1_GCC_64bit-Debug/TorchVisionTest)
frame #7: <unknown function> + 0x27a3c (0x5645f4d15a3c in /home/shahriar/Projects/CXX/build-TorchVisionTest-Desktop_Qt_5_12_1_GCC_64bit-Debug/TorchVisionTest)
frame #8: <unknown function> + 0x2165c (0x5645f4d0f65c in /home/shahriar/Projects/CXX/build-TorchVisionTest-Desktop_Qt_5_12_1_GCC_64bit-Debug/TorchVisionTest)
frame #9: <unknown function> + 0x1540b (0x5645f4d0340b in /home/shahriar/Projects/CXX/build-TorchVisionTest-Desktop_Qt_5_12_1_GCC_64bit-Debug/TorchVisionTest)
frame #10: __libc_start_main + 0xf3 (0x7f2d051dd223 in /usr/lib/libc.so.6)
frame #11: <unknown function> + 0x1381e (0x5645f4d0181e in /home/shahriar/Projects/CXX/build-TorchVisionTest-Desktop_Qt_5_12_1_GCC_64bit-Debug/TorchVisionTest)
```
Renaming C++ BatchNorm `running_variance` to `running_var` should fix this problem.
This is a BC-breaking change, but it should be easy for end user to rename `running_variance` to `running_var` in their call sites.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17371
Reviewed By: goldsborough
Differential Revision: D14172775
Pulled By: yf225
fbshipit-source-id: b9d3729ec79272a8084269756f28a8f7c4dd16b6
Summary:
Adding two distrbuted samplers, Random and Sequential to the mix. Similar to python counterpart, DistributedSampler introduces a new method `set_epoch(size_t epoch)` which can be use to shuffle data determinstically between distributed processes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16910
Differential Revision: D14130980
Pulled By: soumith
fbshipit-source-id: ec08b7130c01e2fc6dc3693f7ac622a0a6d60f10
Summary:
/cc goldsborough
Working on #14582
The corresponding python implementations are at: [pytorch/torch/nn/init.py](6302e4001a/torch/nn/init.py (L261-L327))
Here is my initial implementation of Kaiming Initialization. I have not been able to figure out how to successfully run tests locally so I haven't added any yet.
A couple questions:
- Are the enums defined in the right place? I copied their names from Python, but do you prefer different naming conventions for C++?
- To run tests locally do I use `python setup.py test`? Can I run just a subset of the tests somehow?
- Should I add my tests at [test/cpp/api/misc.cpp](https://github.com/pytorch/pytorch/blob/master/test/cpp/api/misc.cpp#L47-L54)?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14718
Differential Revision: D14049159
Pulled By: goldsborough
fbshipit-source-id: 966ac5126875936e69b185b5041f16476ed4cf70
Summary:
libshm_manager doesn't need to depend on all of libtorch. It only uses tiny tempfile.h which can be moved to c10. I could just duplicate the file too, but it's not worth it as c10 is small enough.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17019
Differential Revision: D14052688
Pulled By: dzhulgakov
fbshipit-source-id: 8797d15f8c7c49c49d40b7ab2f43aa3bf6becb0c
Summary:
Previously, the ChunkBuffer depends on the remaining chunk count to signal end of dataloading. This does not work with distributed samplers where each sampler only loads a subset of chunks. This refactor remove the dependency on the remaining chunk count at the ChunkBuffer.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16868
Differential Revision: D14066517
Pulled By: goldsborough
fbshipit-source-id: 293dfe282ceff326dff0876c2f75c2ee4f4463e2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16751
This was made more complicated by the fact that ivalue::IntList
is a thing. So I had to fix all of the sites where we referring
to IValue post facto.
The following codemods were run, in this order:
```
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in IntList IntArrayRef
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in IntArrayRef::create IntList::create
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in ivalue::IntArrayRef ivalue::IntList
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in Tag::IntArrayRef Tag::IntList
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in isIntArrayRef isIntList
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in toIntArrayRef toIntList
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in 'Shared<IntArrayRef>' 'Shared<IntList>'
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in 'intrusive_ptr<IntArrayRef>' 'intrusive_ptr<IntList>'
```
Some manual fixups were done afterwards; they can be reviewed separately
at https://github.com/pytorch/pytorch/pull/16752
Reviewed By: dzhulgakov
Differential Revision: D13954363
fbshipit-source-id: b5c40aacba042402155a2f5a229fa6db7992ac64
Summary:
Fixes https://github.com/pytorch/pytorch/issues/16326
Previously we didn't handle module inputs which included Generic Lists. When checking whether a generic list if a subvalue of the input arg type, I currently recurse on every element of the list. This shouldn't be too slow since the innermost list will be specialized and we won't have to check it's elements.
E.g. Tensor[][] -> GenericList [TensorList ].
The error message could be improved, but extracting the complete type of nested lists would have to deal with unifying types across lists / empty lists & typevars so I'm going to save that for a follow up PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16482
Differential Revision: D13882582
Pulled By: eellison
fbshipit-source-id: 3609bc572f0ee9ebf20a77ea5ebc8fa3b165e24b