Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17585
Create a sugared value that represents a class during initialization. This is
so that assignments to attributes correctly define attributes in __init__ but
raise an error elsewhere.
Reviewed By: shannonzhu
Differential Revision: D14263403
fbshipit-source-id: 09b2feeb272302f00a79c2a0302fbdf5483aed6a
Summary:
Last batch of IR expect files removed. Includes some removal of expect files that are no longer used.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17886
Differential Revision: D14414435
Pulled By: eellison
fbshipit-source-id: 0bfd7ce66ac2f72a57f15f45ebd60b95e80b6c16
Summary:
Check for Tuple Matching in isSubvalueOf, since they may contain container types that need to be recursed within isSubvalueOf
Fix for https://github.com/pytorch/pytorch/issues/17650
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17687
Differential Revision: D14324642
Pulled By: eellison
fbshipit-source-id: 7f1e019875286b2640a3b9c003d1635dda8cf543
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17594
The original version of this broke things because a concurrent change raced with it in CI.
Reviewed By: ezyang
Differential Revision: D14266663
fbshipit-source-id: e8ac5dfcb7349b4f2c425d9f0eabbfc964314063
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17511
AliasTracker was doing bookkeeping for three concepts: the points-to graph,
writes, and wildcards.
This PR makes AliasTracker's job clearer: it keeps track of the points-to
graph. Thus it has been renamed MemoryDAG. Write and wildcard information were
pulled back into AliasDb as part of this—I may decide to pull them into their
own little modules since I don't want the alias analysis stuff to get too
bloated.
This refactor is necessary because we want to start tracking information for
aliasing elements that _aren't_ first-class IR Values (e.g. the "stuff" inside
a list). So MemoryDAG can't know too much about Values
Reviewed By: houseroad
Differential Revision: D14231251
fbshipit-source-id: 6cd98ae6fced8d6c1522c2454da77c3c1b2b0504
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17480
This was always part of our "spec" but not implemented
Reviewed By: houseroad
Differential Revision: D14214301
fbshipit-source-id: 118db320b43ec099dc3e730c67d39487474c23ea
Summary:
The chunk buffer had a possibility to hang when no data is read and the buffer size is lower than chunk size. We detected this while running with larger dataset and hence the fix. I added a test to mimic the situation and validated that the fix is working. Thank you Xueyun for finding this issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17409
Differential Revision: D14198546
Pulled By: soumith
fbshipit-source-id: b8ca43b0400deaae2ebb6601fdc65b47f32b0554
Summary:
This PR removes a few size of `self` that passed from forward pass to backward pass when `self` is already required in backward pass. This could be reason that cause the potential slow down in #16689 . I will attach a few perf numbers (still a bit volatile among runs tho) I got in the comment.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17187
Differential Revision: D14179512
Pulled By: ailzhang
fbshipit-source-id: 5f3b1f6f26a3fef6dec15623b940380cc13656fa
Summary:
Creates a new shared type parser to be shared between the IR parser and the Schema Parser.
Also adds parsing of CompleteTensorType and DimensionedTensorType, and feature-gates that for the IRParser.
Renames the existing type_parser for python annotations, python_type_parser, and names the new one jit_type_parser.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17383
Differential Revision: D14186438
Pulled By: eellison
fbshipit-source-id: bbd5e337917d8862c7c6fa0a0006efa101c76afe
Summary:
Currently there is a mismatch in naming between Python BatchNorm `running_var` and C++ BatchNorm `running_variance`, which causes JIT model parameters loading to fail (https://github.com/pytorch/vision/pull/728#issuecomment-466067138):
```
terminate called after throwing an instance of 'c10::Error'
what(): No such serialized tensor 'running_variance' (read at /home/shahriar/Build/pytorch/torch/csrc/api/src/serialize/input-archive.cpp:27)
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x85 (0x7f2d92d32f95 in /usr/local/lib/libc10.so)
frame #1: torch::serialize::InputArchive::read(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, at::Tensor&, bool) + 0xdeb (0x7f2d938551ab in /usr/local/lib/libtorch.so.1)
frame #2: torch::nn::Module::load(torch::serialize::InputArchive&) + 0x98 (0x7f2d9381cd08 in /usr/local/lib/libtorch.so.1)
frame #3: torch::nn::Module::load(torch::serialize::InputArchive&) + 0xf9 (0x7f2d9381cd69 in /usr/local/lib/libtorch.so.1)
frame #4: torch::nn::Module::load(torch::serialize::InputArchive&) + 0xf9 (0x7f2d9381cd69 in /usr/local/lib/libtorch.so.1)
frame #5: torch::nn::operator>>(torch::serialize::InputArchive&, std::shared_ptr<torch::nn::Module> const&) + 0x32 (0x7f2d9381c7b2 in /usr/local/lib/libtorch.so.1)
frame #6: <unknown function> + 0x2b16c (0x5645f4d1916c in /home/shahriar/Projects/CXX/build-TorchVisionTest-Desktop_Qt_5_12_1_GCC_64bit-Debug/TorchVisionTest)
frame #7: <unknown function> + 0x27a3c (0x5645f4d15a3c in /home/shahriar/Projects/CXX/build-TorchVisionTest-Desktop_Qt_5_12_1_GCC_64bit-Debug/TorchVisionTest)
frame #8: <unknown function> + 0x2165c (0x5645f4d0f65c in /home/shahriar/Projects/CXX/build-TorchVisionTest-Desktop_Qt_5_12_1_GCC_64bit-Debug/TorchVisionTest)
frame #9: <unknown function> + 0x1540b (0x5645f4d0340b in /home/shahriar/Projects/CXX/build-TorchVisionTest-Desktop_Qt_5_12_1_GCC_64bit-Debug/TorchVisionTest)
frame #10: __libc_start_main + 0xf3 (0x7f2d051dd223 in /usr/lib/libc.so.6)
frame #11: <unknown function> + 0x1381e (0x5645f4d0181e in /home/shahriar/Projects/CXX/build-TorchVisionTest-Desktop_Qt_5_12_1_GCC_64bit-Debug/TorchVisionTest)
```
Renaming C++ BatchNorm `running_variance` to `running_var` should fix this problem.
This is a BC-breaking change, but it should be easy for end user to rename `running_variance` to `running_var` in their call sites.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17371
Reviewed By: goldsborough
Differential Revision: D14172775
Pulled By: yf225
fbshipit-source-id: b9d3729ec79272a8084269756f28a8f7c4dd16b6
Summary:
Trying to land again, make prim::None into a case of prim::Constant. Reverted the previous landing because it broke an important onnx export test.
https://github.com/pytorch/pytorch/pull/16160
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17186
Differential Revision: D14115304
Pulled By: eellison
fbshipit-source-id: 161435fc30460b4e116cdd62c7b2e5b94581dcb7
Summary:
Adding two distrbuted samplers, Random and Sequential to the mix. Similar to python counterpart, DistributedSampler introduces a new method `set_epoch(size_t epoch)` which can be use to shuffle data determinstically between distributed processes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16910
Differential Revision: D14130980
Pulled By: soumith
fbshipit-source-id: ec08b7130c01e2fc6dc3693f7ac622a0a6d60f10
Summary:
It might need some cleaning up and might be missing some features, but it should be already working for most cases.
This PR is based on top of PR16986 (so please review only the last commit here).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16987
Differential Revision: D14074577
Pulled By: ZolotukhinM
fbshipit-source-id: 712b598f423265655f574bb9903e2066628eaad3
Summary:
Currently the converters are very straightforward, i.e. there is no code for trying to
preserve semantics, we're purely perform conversion from one format to another.
Two things that we might want to add/change:
1. Add semantic conversion as well (but probably it would be a good idea to keep
it separate as a temporary thing).
2. Make sure we don't mess with value names, as they are crucial for current
uses of NetDefs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17123
Differential Revision: D14090244
Pulled By: ZolotukhinM
fbshipit-source-id: 07175fa9235582e1d1da5f10a42a5c1280b1b394
Summary:
This change simplifies analysis done on constants since prim::None does not need to be handled separately now. To check if a constant node is None, use node->isNone().
Next step will be to remove prim::Undefined.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16160
Differential Revision: D14109636
Pulled By: eellison
fbshipit-source-id: d26fd383976163a2ddd4c24984bd672a541cc876
Summary:
/cc goldsborough
Working on #14582
The corresponding python implementations are at: [pytorch/torch/nn/init.py](6302e4001a/torch/nn/init.py (L261-L327))
Here is my initial implementation of Kaiming Initialization. I have not been able to figure out how to successfully run tests locally so I haven't added any yet.
A couple questions:
- Are the enums defined in the right place? I copied their names from Python, but do you prefer different naming conventions for C++?
- To run tests locally do I use `python setup.py test`? Can I run just a subset of the tests somehow?
- Should I add my tests at [test/cpp/api/misc.cpp](https://github.com/pytorch/pytorch/blob/master/test/cpp/api/misc.cpp#L47-L54)?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14718
Differential Revision: D14049159
Pulled By: goldsborough
fbshipit-source-id: 966ac5126875936e69b185b5041f16476ed4cf70
Summary:
- Moved a few functions from `autograd` namespace to `aten` namespace to be visible from JIT nativeResolver.
- Added a hack to loop up keyword only argument. Will add proper support for kw only later
- Simulate function overload in aten using `_<number>` as function name suffix.
- Even `forward` returns multiple outputs like in `kthvalue`, there's at most one requires grad that we currently support.
- Removed the `TensorList` related ops here since partial `TensorList` support is prone to bugs. Our symbolic diff for `cat` was never tested with autodiff, and it seems broken. Need to find another proper way to support these ops(either by properly supporting `TensorList` or sth like `prim::ConstantChunk` and leave them for next PR.
Ops supported in this PR:
```
erf
expand_as
index
kthvalue
mean
permute
pow
rsub
select
sqrt
squeeze
t
to
topk
transpose
view
var
embedding
logsumexp
// grad is None
_dim_arange
contiguous
nonzero
ones_like
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16689
Differential Revision: D14020806
Pulled By: ailzhang
fbshipit-source-id: a5e2c144a7be5a0d39d7ac5f93cb402ec12503a5
Summary:
libshm_manager doesn't need to depend on all of libtorch. It only uses tiny tempfile.h which can be moved to c10. I could just duplicate the file too, but it's not worth it as c10 is small enough.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17019
Differential Revision: D14052688
Pulled By: dzhulgakov
fbshipit-source-id: 8797d15f8c7c49c49d40b7ab2f43aa3bf6becb0c
Summary:
Currently the converters are very straightforward, i.e. there is no code for trying to
preserve semantics, we're purely perform conversion from one format to another.
Two things that we might want to add/change:
1. Add semantic conversion as well (but probably it would be a good idea to keep
it separate as a temporary thing).
2. Make sure we don't mess with value names, as they are crucial for current
uses of NetDefs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16967
Differential Revision: D14062537
Pulled By: ZolotukhinM
fbshipit-source-id: 88b184ee7276779e5e9152b149d69857515ad98a
Summary:
Previously, the ChunkBuffer depends on the remaining chunk count to signal end of dataloading. This does not work with distributed samplers where each sampler only loads a subset of chunks. This refactor remove the dependency on the remaining chunk count at the ChunkBuffer.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16868
Differential Revision: D14066517
Pulled By: goldsborough
fbshipit-source-id: 293dfe282ceff326dff0876c2f75c2ee4f4463e2
Summary:
(review top commit only).
As expected, fork/wait introduces some corner cases into the alias analysis. The comments inline should describe the changes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16671
Differential Revision: D13963219
Pulled By: suo
fbshipit-source-id: 2bec6fc03a4989cf309fbb9473f3f2ffe2c31431
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16751
This was made more complicated by the fact that ivalue::IntList
is a thing. So I had to fix all of the sites where we referring
to IValue post facto.
The following codemods were run, in this order:
```
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in IntList IntArrayRef
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in IntArrayRef::create IntList::create
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in ivalue::IntArrayRef ivalue::IntList
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in Tag::IntArrayRef Tag::IntList
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in isIntArrayRef isIntList
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in toIntArrayRef toIntList
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in 'Shared<IntArrayRef>' 'Shared<IntList>'
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in 'intrusive_ptr<IntArrayRef>' 'intrusive_ptr<IntList>'
```
Some manual fixups were done afterwards; they can be reviewed separately
at https://github.com/pytorch/pytorch/pull/16752
Reviewed By: dzhulgakov
Differential Revision: D13954363
fbshipit-source-id: b5c40aacba042402155a2f5a229fa6db7992ac64
Summary:
This PR reworks the mutability API to be simpler (updates passes to use "mayAlias" calls) and improves the caching logic.
The difference is that we now directly express the idea of a "memory location." Leaves in the alias trackers points-to graph are considered unique memory locations, and mayAlias questions can be boiled down whether two values share a leaf.
To speed up queries, some basic path compression has been added.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16605
Differential Revision: D13952738
Pulled By: suo
fbshipit-source-id: cfc7fb2b23369f1dc425d1d8ca2c753c193d95dd
Summary:
I went through my build log and did what I thought were reasonable fixes to all the C++ compilation warnings that came up
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16411
Differential Revision: D13901006
Pulled By: jamesr66a
fbshipit-source-id: 02df4e3e5a5c8dd9e69ac9f065cd3f2a80645033
Summary:
Start splitting up these tests so we don't have a massive test file. Doesn't change how you run them, since `gtest.cpp` and `no-gtest.cpp` will still collect everything.
Renamed `tests.h` to `test_misc.h` to vaguely discourage people from adding yet more stuff to it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16536
Reviewed By: zdevito, eellison
Differential Revision: D13882215
Pulled By: suo
fbshipit-source-id: 61cf97f3c2c50703dcf6a3a34da01415ecb7e7d6
Summary:
Fixes https://github.com/pytorch/pytorch/issues/16326
Previously we didn't handle module inputs which included Generic Lists. When checking whether a generic list if a subvalue of the input arg type, I currently recurse on every element of the list. This shouldn't be too slow since the innermost list will be specialized and we won't have to check it's elements.
E.g. Tensor[][] -> GenericList [TensorList ].
The error message could be improved, but extracting the complete type of nested lists would have to deal with unifying types across lists / empty lists & typevars so I'm going to save that for a follow up PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16482
Differential Revision: D13882582
Pulled By: eellison
fbshipit-source-id: 3609bc572f0ee9ebf20a77ea5ebc8fa3b165e24b
Summary:
This PR changes the way we store aliasing information from a "set" approach to a "points-to" analysis. Set-based approaches lose information in ways that make it difficult to do "live" updates to the alias DB as one as mutating the graph.
The tradeoff is that simple queries get more expensive, since they require traversing the points-to graph to answer most questions. In practice, this is unlikely to be that costly since we don't have massive aliasing chains, but we could create an approximation/caching layer if this becomes a problem.
My rough plan is:
1. This PR, switching to a points-to graph
2. Make it "live": analyzing a node should record all the edges the node added, so that we can rollback when the node is destroyed.
3. Reduce wildcard scope: we can make the wildcard a special vertex that points to anything that we're not "sure" about; namely, things that have been put inside lists, or graph inputs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16386
Differential Revision: D13855117
Pulled By: suo
fbshipit-source-id: f009f58143173c275501624eb105d07ab60fe5e1
Summary:
This PR contains the implementation of chunk dataset, with the API proposed in PR https://github.com/pytorch/pytorch/pull/15562
A chunk dataset is derived from StatefulDataset. It utilizes worker threads to prefetches chunk data, splits it into batches and caches them into a queue. When get_batch is called from dataloader, batch data is retrieved from the queue, and data in new chunks will be pushed for later following batches.
Chunk dataset uses two samplers (chunk_sampler and example_sampler) to perform sampling. The chunk_sampler decides which chunk to load, and example_sampler shuffles the examples inside a specific chunk. More detail of this sampling approach can be found here: http://martin.zinkevich.org/publications/nips2010.pdf
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15932
Differential Revision: D13868688
Pulled By: soumith
fbshipit-source-id: a43000c478ca2a3c64cc84b3626d6b8b1ad9a07e
Summary:
This PR inlines `Attributes` into `Node`. It helps to cleanup the code a little as everything is one place (some of the cleanups are included in the PR).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16098
Differential Revision: D13717637
Pulled By: ZolotukhinM
fbshipit-source-id: c54ae65178a95a01354688921a9ccb1ca699f8eb
Summary:
In Python, you can use the call operator to invoke the `forward()` method of a module. In C++ this was currently not possible, because I couldn't figure out how to deduce the return type of a module's `forward()` method under the constraint that `forward()` may not exist at all (since the base module class in C++ does not mandate a `forward()` method). I now figured it out, so the call operator can be used.
ezyang ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15831
Differential Revision: D13652676
Pulled By: goldsborough
fbshipit-source-id: ccab45a15215dda56460e560f0038781b539135f
Summary:
This is the first of several PRs to simplify AliasDb usage.
- Hide the concept wildcards from users. They are too hard to think about and too easy to forget about.
- Start moving "mutability-safe" graph mutation methods into AliasDb (right now, the various methods that deal with topological move).
Eventually I want to create a "mutability-aware" handle to the graph. If you only use that handle to transform the graph, you can be sure that all transformations are safe with respect to mutability.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15656
Differential Revision: D13615492
Pulled By: suo
fbshipit-source-id: 5c39a157b4ea76f1f976315d06a314a89cc4f22f
Summary:
Wanted to use `Tensor.isnan` in C++, figured it'd be nice to have, so I made it into a tiny native function.
gchanan ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15722
Differential Revision: D13591315
Pulled By: goldsborough
fbshipit-source-id: a78bd22101fde87a0257f759b9bfcf3b4208f5fa
Summary:
The PR clang-formats everything in `torch/csrc/jit/` and adds it to the pre-commit hook.
Here is a list of non-mechanical changes:
- I went over each file and fixed up whenever I could tell that clang-format was clobbering comment formatting.
- Made the macros in register_prim_ops a little more clang-format friendly by omitting trailing commas
- Refactored autodiff.cpp to use a helper class with explicit state rather than a bunch of capturing lambdas
- Small improvements to the precommit hook clang-format
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15524
Differential Revision: D13547989
Pulled By: suo
fbshipit-source-id: 3ff1541bb06433ccfe6de6e33f29227a2b5bb493
Summary:
Currently re-implements the dataloader for stateful datasets. Outstanding work:
- Refactor DataLoader and DataLoader2 to have common base classes and only differ in specifi pieces of logic,
- Figure out how to not duplicate the `MapDataset` logic for stateful vs. non-stateful
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15096
Differential Revision: D13522043
Pulled By: goldsborough
fbshipit-source-id: 08e461ca51783047f11facc4d27dfa2e4f1e4c2a
Summary:
(otherwise len is not resolvable using torch::jit::compile)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15488
Differential Revision: D13539991
Pulled By: zdevito
fbshipit-source-id: 3ba85fa7b1adb163f9229c568f7997d22321903d
Summary:
This PR enables autodiff to use the forward/backward graph compiled from python code, instead of using symbolic gradients(modifying the original graph directly).
We put the map in a separate .h file for now to wait for the native_functions.yaml and derivatives.yaml merge. This should ideally go into native_functions.yaml eventually.
This PR should be enough to unblock us for now, we can start writing gradients for aten functions in python.
Differential Revision: D13494635
Pulled By: ailzhang
fbshipit-source-id: f8d51a15243ac46afd09d930c573ccdfcd9fdaaf
Summary:
This makes DCE more granular by tracking live values/aliases through the graph (rather than just nodes). So we can be more aggressive in DCE around control flow blocks. For example, in:
```
%a0 = aten::foo()
%b = aten::foo()
%a2, %b2 = prim::If(%cond) {
block0() {
%a1 = aten::foo(%.0)
%b1 = aten::foo(%b)
} -> (%a1, %b1)
}
return (%a2)
```
we will now dce all the `%b` stuff.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14910
Differential Revision: D13476445
Pulled By: suo
fbshipit-source-id: 2bf5db19711c07dde946697a4f4b270bd8baf791
Summary:
This PR fixes around 250 places in the codebase where we were making unnecessary copies of objects (some large, some small).
ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15026
Differential Revision: D13458784
Pulled By: goldsborough
fbshipit-source-id: be5148b2ce09493588d70952e6f6d6ff5ec5199b
Summary:
Before this PR, loop unrolling + the graph fuser was creating multiple
FusionGroups with the same bodies (with different variable names) for
JIT LSTMs. Each FusionGroup got registered to a separate fusion key;
each key resulted in a different compilation for the same
specializations.
This PR makes it so that when registering FusionGroups with the fusion
compiler, the compiler first checks the KernelSpec cache to see if the
FusionGroup's graph exists already. If it does, then return the
corresponding KernelSpec's key to share compiled kernels.
In addition, graphs in the KernelSpec cache are canonicalized before
being cached. I added a flag to the canonicalize pass to remove unique
names of values.
This shortens the compile time for a JIT LSTM (seq_len of 100, loop
unroll factor of 8) from 5.3s to 2.3s. Most of this compile time is
running the graph fuser and/or fusion compiler; while this PR
makes it so that there is only one unique kernel in the forward pass,
there are a lot of different kernels (6) in the backward pass
(after loop unrolling) that should be investigated.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14541
Differential Revision: D13324487
Pulled By: zou3519
fbshipit-source-id: b841d82ed35a959b5cfc72db033bf5a7b42cc4fb
Summary:
Fixes a bug where (de-)/serializing a hierarchy of submodules where one submodule doesn't have any parameters, but its submodules do, doesn't get properly loaded. This had to do with the fact that the old protobuf format couldn't store empty parameters.
Fixes https://github.com/pytorch/pytorch/issues/14891
soumith ezyang ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15033
Differential Revision: D13411322
Pulled By: goldsborough
fbshipit-source-id: 2ef73b2aa93fa9e46b1cbe1fd47d9f134d6016d5
Summary:
Removing the deprecated functions in `torch/csrc/variable_tensor_functions.h` (like `torch::CPU`) and corresponding implementations from `torch/csrc/torch.cpp` from master after the release.
ezyang gchanan soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15003
Differential Revision: D13418086
Pulled By: goldsborough
fbshipit-source-id: a0accdf6f7b0efa1ec07ac7b74b86ff2da37543f
Summary:
Replaces the `DefaultTensorOptions` with just a global default dtype that you can set and get like in Python.
Also, calls `set_default_dtype` in the implementation of `torch.set_default_dtype`. Right now these two default values are separate but will always be the same. Should we just bind `set_default_dtype` into Python? I think that might be good to do in a separate PR though.
ezyang gchanan
Also CC colesbury who wanted to do this for ATen for a while? What do you think about it?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13748
Differential Revision: D13340207
Pulled By: goldsborough
fbshipit-source-id: 2689b09eb137fabb3a92d1ad1635782bee9398e8
Summary:
**Review only the last commit.**
This commit adds a few optimizations to AD, that let us dramatically
reduce the number of sizes we capture from forward.
We now:
- collapse chains of SumToSize
- avoid capturing sizes of tensors that are captured anyway
- more aggressively DCE the reverse code
- run CSE on the primal code to deduplicate `aten::size` calls
cc zou3519 zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14758
Differential Revision: D13324440
Pulled By: zou3519
fbshipit-source-id: 45ccbc13605adcef2b461840c6089d3200000c72
Summary:
Make `at::_local_scalar` more "official" by renaming it to `item()`.
gchanan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13676
Differential Revision: D13003020
Pulled By: goldsborough
fbshipit-source-id: 0ac25f5237fb81a1576304a0a02f840ff44168a4
Summary:
This is a stack PR based on https://github.com/pytorch/pytorch/pull/14454.
It enables the restoring the storage to appropriate device.
~~[TODO]: add/modify appropriate tests~~ Done
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14711
Reviewed By: dzhulgakov
Differential Revision: D13315746
Pulled By: houseroad
fbshipit-source-id: fe6f24a45c35e88fd1a2eebc09950d4430fac185
Summary:
Previously symbolic AD formulas assumed that no broadcasting happened,
and would return gradients of incorrect shapes (possibly leading to
silent errors later).
Fixes a few bugs (known and unknown):
- #11736
- ArgumentSpec didn't compute the input types correctly [(it didn't advance the offset for non-tensor args)](https://github.com/pytorch/pytorch/pull/14485/files#diff-4fd3157a056596aefb8cdf41022a208bR153)
- Symbolic AD could suffer from use after free (dangling pointers in grad map), because [`EliminateDeadCode` could have removed nodes](https://github.com/pytorch/pytorch/pull/14485/files#diff-25d33ad1ed6855684dec79d927ca6142L781) that referenced gradients of certain values.
- Undefined behavior in `aten::size`
During my tests I've also found a few new problems, and I have opened issues for them:
- FusionGroup seems to think that cat nodes broadcast their inputs (#14483)
- `prim::ConstantChunk` derivative formula doesn't handle undefined inputs (#14484)
This patch unfortunately deoptimizes some of our code (Fusion doesn't happen past chunk nodes, and outputs more tensors only because we have to get their size). I know how to fix those issues, but wanted to fix this terrible bug quickly.
cc zou3519 zdevito ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14485
Reviewed By: eellison
Differential Revision: D13312888
Pulled By: suo
fbshipit-source-id: ad46bfb4d0a306ad9451002f8270f7a790f72d58
Summary:
Previously symbolic AD formulas assumed that no broadcasting happened,
and would return gradients of incorrect shapes (possibly leading to
silent errors later).
Fixes a few bugs (known and unknown):
- #11736
- ArgumentSpec didn't compute the input types correctly [(it didn't advance the offset for non-tensor args)](https://github.com/pytorch/pytorch/pull/14485/files#diff-4fd3157a056596aefb8cdf41022a208bR153)
- Symbolic AD could suffer from use after free (dangling pointers in grad map), because [`EliminateDeadCode` could have removed nodes](https://github.com/pytorch/pytorch/pull/14485/files#diff-25d33ad1ed6855684dec79d927ca6142L781) that referenced gradients of certain values.
- Undefined behavior in `aten::size`
During my tests I've also found a few new problems, and I have opened issues for them:
- FusionGroup seems to think that cat nodes broadcast their inputs (#14483)
- `prim::ConstantChunk` derivative formula doesn't handle undefined inputs (#14484)
This patch unfortunately deoptimizes some of our code (Fusion doesn't happen past chunk nodes, and outputs more tensors only because we have to get their size). I know how to fix those issues, but wanted to fix this terrible bug quickly.
cc zou3519 zdevito ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14485
Differential Revision: D13280899
Pulled By: soumith
fbshipit-source-id: 80cc5ec9331be80e1bb9ddfe85b81c2b997e0b0c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14414
The previous functions were CUDA-centric, and lead to lots of places
where we improperly assumed that CUDA is the only game in town (it's not).
Best to delete them.
What are your alternatives? This diff fix some use sites which may give
you some ideas. In particular, the "given a device type, give me the
current device for that device type" might be a good function to enshrine
for real.
Reviewed By: gchanan
Differential Revision: D13218540
fbshipit-source-id: 2f42cd6b9bdab4930d25166b8041c9466a1c6e0a
Summary:
Stacked on zip commit because it also changes expect files, read only the last commit.
This reduces the number of ways we can print a Type from 3 (python_str, str, operator<<) to 2.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14657
Differential Revision: D13288912
Pulled By: zdevito
fbshipit-source-id: f8dd610cea798c511c1d4327395bba54b1aa1697
Summary:
Make Samplers optionally accept new size in their reset() method. This helps dataloader or dataset to reset the sampler for an epoch or a chunk of data with different sizes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13870
Differential Revision: D13240120
Pulled By: soumith
fbshipit-source-id: 19c53f8be13c0fdcf504f0637b0d3e6009a8e599
Summary:
I noticed the test `DataLoaderTest.CanDereferenceIteratorMultipleTimes` doesn't test proper progression of the iterator. I also added a test for using `std::copy`.
Fixes https://github.com/pytorch/pytorch/issues/14276
ebetica ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14045
Differential Revision: D13092187
Pulled By: goldsborough
fbshipit-source-id: 57698ec00fa7b914b159677a4ab38b6b25c2860b
Summary:
This PR is deceptively large because of an indenting change. The actual change is small; I will highlight it inline
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14325
Differential Revision: D13183296
Pulled By: suo
fbshipit-source-id: fcbf6d5317954694ec83e6b8cc1c989f2d8ac298
Summary:
First draft of an alias analysis pass. It's a big PR unfortunately; a rough table of contents/suggested order of review:
1. `AliasAnalysis` pass, which traverses the graph and builds an `AliasDb`. The basic strategy is to assign alias information to every value of mutable type (list/tuple/tensor), and use the alias annotations of each node's schema to assign alias info to the outputs based on the alias info the inputs. Nodes that aren't explicitly schematized have hand-written analysis rules.
2. Integration of aliasing information into `moveBefore/AfterTopologicallyValid()`. Basically, we pass in an alias DB when we ask for moveBefore/After. Similar to how we can boil down dependency analysis to "what nodes use this node", we can boil down mutability analysis to "what nodes write to an alias set input/output'd by this node".
3. Integration of alias analysis to optimization passes that need it. Right now, it is `GraphFuser`, `CreateAutodiffSubgraphs`, constant prop, and CSE. Not sure if any others need it.
- Testing; still figuring out the best way to do this.
- Eventually we want to integrate the alias db into the graph, but we shouldn't do that until we can guarantee that the information can stay up to date with mutations.
- Do the same thing `python_printer` did for operators and force people to register alias analyzers if they can't schematize their op.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14018
Differential Revision: D13144906
Pulled By: suo
fbshipit-source-id: 1bc964f9121a504c237cef6dfeea6b233694de6a
Summary:
zdevito soumith
Sorry about the previous PR, had some git issues. This is the same exact code as the previous PR but updated w.r.t pytorch/master.
fixes#13254
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14181
Differential Revision: D13117688
Pulled By: soumith
fbshipit-source-id: 044840b2c7a0101ef43dd16655fd9a0f9981f53f
Summary:
This PR adds a `SharedDataset` to the C++ frontend data API, which allows wrapping a shared_ptr to a dataset into a class that conforms to the `Dataset` interface (with `get_batch`). This enables use cases where a custom dataset is (1) thread-safe and (2) expensive to copy. All workers will reference a single instance of this dataset. No additional copies are incurred.
jaliyae apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13800
Differential Revision: D13075610
Pulled By: goldsborough
fbshipit-source-id: 4ffdfd7959d49b042c0e254110085f62a0bfeb6c
Summary:
This reverts commit 37cb357d8d.
Try to see if it unbreaks master
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14082
Differential Revision: D13095888
Pulled By: bddppq
fbshipit-source-id: c728f80f233b4d9daaf65f43202d8104651029a9
Summary:
Deletes the `OptionsGuard` from ATen. This works towards the goal of reworking `DefaultTensorOptions`. `OptionsGuard` is troublesome because it relies on mutating thread local state. This PR fixes those code locations and then deletes the `OptionsGuard`.
ezyang gchanan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13738
Differential Revision: D13000962
Pulled By: goldsborough
fbshipit-source-id: c8143ee75070c2280f5fd1d9af86f8ce14279b72
Summary:
I think this will be it. So for one, the previous test was bullshit because it was returning the thread id instead of the sample index (which is the thing whose ordering is enforced). Just turning up the number of threads to 10 from 4 made this very obvious. I also think there is a race condition, which may or may not have surfaced, in that there was nothing stopping one worker to get multiple batches, which would screw with the whole ordering logic. I've added a barrier struct such that workers wait for all workers to be in the `get_batch` function before actually doing something.
Fixes https://github.com/pytorch/pytorch/issues/14002
ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14038
Differential Revision: D13088132
Pulled By: goldsborough
fbshipit-source-id: 4bded63756c6a49502ee07ef8709a03073e7e05f
Summary:
* Add hooks to get a callback whenever a valid graph is produced in the compiler or through tracing. These hooks can be used to pretty_print and then reparse every graph our tests produce to check that the serialization function works correctly. Currently this is guarded by an environment variable since there are a few remaining failures.
* Fix printing bugs: True and False rather than 1 and 0, print 0. for floating point zero
* Change behavior of NoneType. It is now no longer a subtype of Optional but instead implicitly converts to it, returning a prim::Node with an Option[T] type for some specific T. This allows functions like `_unwrap_optional` to correctly match against a None while still deriving the right type.
* Fix a bug where empty blocks did not correctly emit "pass" in printer.
* Fix a bug where prim::Undefine sometimes cannot be printed as None because it is being used in a schema-less op. This should be fixable once Optional[T] always uses the same None object.
* Other minor printing bugs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13959
Reviewed By: jamesr66a
Differential Revision: D13073519
Pulled By: zdevito
fbshipit-source-id: 4167a6b614f2e87b4d21823275a26be5ba4fc3dd
Summary:
Grab bag of additions to alias annotations that were useful when writing the alias analysis pass. Not very organized since these were mostly split off from that PR.
- Switch alias sets to actual sets, since we will want to union them.
- Correctly parse alias set unions `a|b`, and correctly parse wildcards
- Move writes into `AliasInfo`, which cleans up some code that was passing a `writes` vector everywhere and simplifies tracking aliased writes during analysis.
- Change Tensor list extraction ops to return wildcard tensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13632
Differential Revision: D13088038
Pulled By: suo
fbshipit-source-id: 49dc5d0e9cd4895427fea3a87b0ec325bd5fe437
Summary:
Migrate the `CreateAutodiffSubgraphs` pass to use topologically-safe moves instead of DynamicDAG. This is to unify the interface that we use for determining safe node moves to prepare for mutability.
The pass looks a lot like GraphFuser now, and there's a lot of code duplication. I plan to pull common stuff out into a "subgraph manipulation utils" thing, but didn't want to clutter this PR.
Future steps:
- Get rid of code duplication (see above)
- Use DynamicDAG to back the `moveBefore/After` calls.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13862
Differential Revision: D13072871
Pulled By: suo
fbshipit-source-id: 92e7880ef444e0aefd51df60964bba7feaf42ae0
Summary:
Attempts to unflake the dataloader ordering enforcement test. I think the issue was that the `thread_counter` variable was not atomic. I've made it atomic, and also global just to make it a bit clearer.
Fixes https://github.com/pytorch/pytorch/issues/13634
colesbury SsnL ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13919
Differential Revision: D13051718
Pulled By: goldsborough
fbshipit-source-id: b9f7f6317701a8b861a1d5c6a9b2b17b44782561
Summary:
This PR adds Windows support for the C++ frontend. A lot of declarations were missing `TORCH_API` macros, and lots of code just did not compile on MSVC.
ebetica ezyang orionr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11716
Reviewed By: orionr
Differential Revision: D13038253
Pulled By: goldsborough
fbshipit-source-id: c8e5a45efd26117aeb99e768b56fcd5a89fcb9f8
Summary:
Extend `isAfter` to work for nodes in different blocks. This is useful if we want to ask a question like "are any of the uses of value `v` after this node", since uses may be inside inner blocks.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13855
Differential Revision: D13030528
Pulled By: suo
fbshipit-source-id: f681405396f3ec68eec1a2cb92e40873921a4b78
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13342
This PR introduces a few new concepts:
- DeviceGuardImplInterface, and implementations for CPU and CUDA, which
provide a generic interface for interfacing with device and stream state,
without requiring a direct dependency on the code in question.
- InlineDeviceGuard, a general template for generating both specialized
and dynamically dispatched device guard implementations. Dynamic
dispatch is done by specializing it on a VirtualGuardImpl.
- Provide a device-independent DeviceGuard class, which can be used even
from CPU code. It uses the aforementioned dynamic dispatch.
- CUDA-specialized CUDAGuard class, which doesn't have a dynamic dispatch
but can only be used from CUDA.
- StreamGuard, which is the same as above, but for streams rather than
devices.
- Optional variants of all the aforementioned guards, which are a no-op if
no device/stream is specified
- CUDAMultiStreamGuard, specifically for the case when we want to set
a device on every guard.
There are some subtle semantic changes, which have been thoroughly documented
in the class definition.
BC-breaking changes:
- Move constructor/assignment have been removed from all device guard
implementations.
- In some cases where you previously wrote 'set_device' (or 'set_stream'), you now must write
'reset_device', because if you switch devices/device types, the stream/device on the
previous device is unset. This is different from previous behavior.
- CUDAGuard no longer handles streams, or multiple streams. Use CUDAStreamGuard
or CUDAMultiStreamGuard as appropriate for your use case.
Reviewed By: dzhulgakov
Differential Revision: D12849620
fbshipit-source-id: f61956256f0b12be754b3234fcc73c2abc1be04e
Summary:
We have an MNIST reader in the C++ data API, so we can get rid of the custom one currently implemented in the integration tests.
ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13737
Differential Revision: D12990936
Pulled By: goldsborough
fbshipit-source-id: 125a1910ec91d53dbf121570fc9eec6ccfba0477
Summary:
In particular, this was breaking the logic for cudnn algorithm to fall back to a less memory hungry algorithm if the selected one OOM when creating the workspace.
c10::Error are subclass of `std::exception` and not `std::runtime_error`.
I removed `runtime_error` in all places in our code and replaced them with `const exception`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13665
Differential Revision: D12958396
Pulled By: soumith
fbshipit-source-id: af557efd9887b013140113d3067de157ffcf8465
Summary:
This is a pre-cursor diff to Python <-> C++ frontend integration -- I have a follow-up PR coming for that. This PR changes the C++ frontend module interface to replace the custom "cursor"s I introduced some time ago with `OrderedDict`. I introduced cursors at the time as a convenient way of applying functions and query operations on a modules' parameters, buffers and modules, allowing things like `module.parameters().map(my_func)`. However, I noticed that (1) this functionality is easily implement-able on top of a regular data structure and (2) more importantly, using OrderedDicts is much, much easier for Python integration. This is especially true given that ScriptModule today also uses OrderedDict. Since C++ frontend modules and ScriptModules will soon too share as many implementation details as possible, it is overall the best move to ditch the custom cursor datastructure and pervasively use OrderedDict everywhere.
For this I did:
1. Changed the C++ frontend module interface to more closely match the Python one by providing `parameters()`, `named_parameters()` and other methods Python provides. This is very important for the following diff which binds these into Python for inter-op with Python modules.
2. In lieu of the `Cursor::apply()` method I added `nn::Module::apply`. This again is one more unifying step between Python and C++, since Python modules have an apply function too.
3. Deleted all uses of Cursor.
4. Tidied and beefed up the `OrderedDict` class. In particular, I made `OrderedDict::Item` store an `std::pair` under the hood, because that is trivial to bind into Python and saved me a lot of headaches. `key` and `value` become methods instead of fields, which they should have been from the very start anyway because it allows exactly these kinds of changes, as per usual good software engineering principle of encapsulation.
5. Added many tests for the OrderedDict use in `nn::Module`.
ebetica ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13427
Differential Revision: D12894092
Pulled By: goldsborough
fbshipit-source-id: 715770c95a9643753a1db26d7f9da9a78619a15d
Summary:
In TorchScript and C++ extensions we currently advocate a mix of `torch::` and `at::` namespace usage. In the C++ frontend I had instead exported all symbols from `at::` and some from `c10::` into the `torch::` namespace. This is far, far easier for users to understand, and also avoid bugs around creating tensors vs. variables. The same should from now on be true for the TorchScript C++ API (for running and loading models) and all C++ extensions.
Note that since we're just talking about typedefs, this change does not break any existing code.
Once this lands I will update stuff in `pytorch/tutorials` too.
zdevito ezyang gchanan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13523
Differential Revision: D12942787
Pulled By: goldsborough
fbshipit-source-id: 76058936bd8707b33d9e5bbc2d0705fc3d820763