Summary:
This PR inlines `Attributes` into `Node`. It helps to cleanup the code a little as everything is one place (some of the cleanups are included in the PR).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16098
Differential Revision: D13717637
Pulled By: ZolotukhinM
fbshipit-source-id: c54ae65178a95a01354688921a9ccb1ca699f8eb
Summary:
In Python, you can use the call operator to invoke the `forward()` method of a module. In C++ this was currently not possible, because I couldn't figure out how to deduce the return type of a module's `forward()` method under the constraint that `forward()` may not exist at all (since the base module class in C++ does not mandate a `forward()` method). I now figured it out, so the call operator can be used.
ezyang ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15831
Differential Revision: D13652676
Pulled By: goldsborough
fbshipit-source-id: ccab45a15215dda56460e560f0038781b539135f
Summary:
This is the first of several PRs to simplify AliasDb usage.
- Hide the concept wildcards from users. They are too hard to think about and too easy to forget about.
- Start moving "mutability-safe" graph mutation methods into AliasDb (right now, the various methods that deal with topological move).
Eventually I want to create a "mutability-aware" handle to the graph. If you only use that handle to transform the graph, you can be sure that all transformations are safe with respect to mutability.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15656
Differential Revision: D13615492
Pulled By: suo
fbshipit-source-id: 5c39a157b4ea76f1f976315d06a314a89cc4f22f
Summary:
Wanted to use `Tensor.isnan` in C++, figured it'd be nice to have, so I made it into a tiny native function.
gchanan ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15722
Differential Revision: D13591315
Pulled By: goldsborough
fbshipit-source-id: a78bd22101fde87a0257f759b9bfcf3b4208f5fa
Summary:
The PR clang-formats everything in `torch/csrc/jit/` and adds it to the pre-commit hook.
Here is a list of non-mechanical changes:
- I went over each file and fixed up whenever I could tell that clang-format was clobbering comment formatting.
- Made the macros in register_prim_ops a little more clang-format friendly by omitting trailing commas
- Refactored autodiff.cpp to use a helper class with explicit state rather than a bunch of capturing lambdas
- Small improvements to the precommit hook clang-format
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15524
Differential Revision: D13547989
Pulled By: suo
fbshipit-source-id: 3ff1541bb06433ccfe6de6e33f29227a2b5bb493
Summary:
Currently re-implements the dataloader for stateful datasets. Outstanding work:
- Refactor DataLoader and DataLoader2 to have common base classes and only differ in specifi pieces of logic,
- Figure out how to not duplicate the `MapDataset` logic for stateful vs. non-stateful
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15096
Differential Revision: D13522043
Pulled By: goldsborough
fbshipit-source-id: 08e461ca51783047f11facc4d27dfa2e4f1e4c2a
Summary:
(otherwise len is not resolvable using torch::jit::compile)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15488
Differential Revision: D13539991
Pulled By: zdevito
fbshipit-source-id: 3ba85fa7b1adb163f9229c568f7997d22321903d
Summary:
This PR enables autodiff to use the forward/backward graph compiled from python code, instead of using symbolic gradients(modifying the original graph directly).
We put the map in a separate .h file for now to wait for the native_functions.yaml and derivatives.yaml merge. This should ideally go into native_functions.yaml eventually.
This PR should be enough to unblock us for now, we can start writing gradients for aten functions in python.
Differential Revision: D13494635
Pulled By: ailzhang
fbshipit-source-id: f8d51a15243ac46afd09d930c573ccdfcd9fdaaf
Summary:
This makes DCE more granular by tracking live values/aliases through the graph (rather than just nodes). So we can be more aggressive in DCE around control flow blocks. For example, in:
```
%a0 = aten::foo()
%b = aten::foo()
%a2, %b2 = prim::If(%cond) {
block0() {
%a1 = aten::foo(%.0)
%b1 = aten::foo(%b)
} -> (%a1, %b1)
}
return (%a2)
```
we will now dce all the `%b` stuff.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14910
Differential Revision: D13476445
Pulled By: suo
fbshipit-source-id: 2bf5db19711c07dde946697a4f4b270bd8baf791
Summary:
This PR fixes around 250 places in the codebase where we were making unnecessary copies of objects (some large, some small).
ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15026
Differential Revision: D13458784
Pulled By: goldsborough
fbshipit-source-id: be5148b2ce09493588d70952e6f6d6ff5ec5199b
Summary:
Before this PR, loop unrolling + the graph fuser was creating multiple
FusionGroups with the same bodies (with different variable names) for
JIT LSTMs. Each FusionGroup got registered to a separate fusion key;
each key resulted in a different compilation for the same
specializations.
This PR makes it so that when registering FusionGroups with the fusion
compiler, the compiler first checks the KernelSpec cache to see if the
FusionGroup's graph exists already. If it does, then return the
corresponding KernelSpec's key to share compiled kernels.
In addition, graphs in the KernelSpec cache are canonicalized before
being cached. I added a flag to the canonicalize pass to remove unique
names of values.
This shortens the compile time for a JIT LSTM (seq_len of 100, loop
unroll factor of 8) from 5.3s to 2.3s. Most of this compile time is
running the graph fuser and/or fusion compiler; while this PR
makes it so that there is only one unique kernel in the forward pass,
there are a lot of different kernels (6) in the backward pass
(after loop unrolling) that should be investigated.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14541
Differential Revision: D13324487
Pulled By: zou3519
fbshipit-source-id: b841d82ed35a959b5cfc72db033bf5a7b42cc4fb
Summary:
Fixes a bug where (de-)/serializing a hierarchy of submodules where one submodule doesn't have any parameters, but its submodules do, doesn't get properly loaded. This had to do with the fact that the old protobuf format couldn't store empty parameters.
Fixes https://github.com/pytorch/pytorch/issues/14891
soumith ezyang ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15033
Differential Revision: D13411322
Pulled By: goldsborough
fbshipit-source-id: 2ef73b2aa93fa9e46b1cbe1fd47d9f134d6016d5
Summary:
Removing the deprecated functions in `torch/csrc/variable_tensor_functions.h` (like `torch::CPU`) and corresponding implementations from `torch/csrc/torch.cpp` from master after the release.
ezyang gchanan soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15003
Differential Revision: D13418086
Pulled By: goldsborough
fbshipit-source-id: a0accdf6f7b0efa1ec07ac7b74b86ff2da37543f
Summary:
Replaces the `DefaultTensorOptions` with just a global default dtype that you can set and get like in Python.
Also, calls `set_default_dtype` in the implementation of `torch.set_default_dtype`. Right now these two default values are separate but will always be the same. Should we just bind `set_default_dtype` into Python? I think that might be good to do in a separate PR though.
ezyang gchanan
Also CC colesbury who wanted to do this for ATen for a while? What do you think about it?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13748
Differential Revision: D13340207
Pulled By: goldsborough
fbshipit-source-id: 2689b09eb137fabb3a92d1ad1635782bee9398e8
Summary:
**Review only the last commit.**
This commit adds a few optimizations to AD, that let us dramatically
reduce the number of sizes we capture from forward.
We now:
- collapse chains of SumToSize
- avoid capturing sizes of tensors that are captured anyway
- more aggressively DCE the reverse code
- run CSE on the primal code to deduplicate `aten::size` calls
cc zou3519 zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14758
Differential Revision: D13324440
Pulled By: zou3519
fbshipit-source-id: 45ccbc13605adcef2b461840c6089d3200000c72
Summary:
Make `at::_local_scalar` more "official" by renaming it to `item()`.
gchanan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13676
Differential Revision: D13003020
Pulled By: goldsborough
fbshipit-source-id: 0ac25f5237fb81a1576304a0a02f840ff44168a4
Summary:
This is a stack PR based on https://github.com/pytorch/pytorch/pull/14454.
It enables the restoring the storage to appropriate device.
~~[TODO]: add/modify appropriate tests~~ Done
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14711
Reviewed By: dzhulgakov
Differential Revision: D13315746
Pulled By: houseroad
fbshipit-source-id: fe6f24a45c35e88fd1a2eebc09950d4430fac185
Summary:
Previously symbolic AD formulas assumed that no broadcasting happened,
and would return gradients of incorrect shapes (possibly leading to
silent errors later).
Fixes a few bugs (known and unknown):
- #11736
- ArgumentSpec didn't compute the input types correctly [(it didn't advance the offset for non-tensor args)](https://github.com/pytorch/pytorch/pull/14485/files#diff-4fd3157a056596aefb8cdf41022a208bR153)
- Symbolic AD could suffer from use after free (dangling pointers in grad map), because [`EliminateDeadCode` could have removed nodes](https://github.com/pytorch/pytorch/pull/14485/files#diff-25d33ad1ed6855684dec79d927ca6142L781) that referenced gradients of certain values.
- Undefined behavior in `aten::size`
During my tests I've also found a few new problems, and I have opened issues for them:
- FusionGroup seems to think that cat nodes broadcast their inputs (#14483)
- `prim::ConstantChunk` derivative formula doesn't handle undefined inputs (#14484)
This patch unfortunately deoptimizes some of our code (Fusion doesn't happen past chunk nodes, and outputs more tensors only because we have to get their size). I know how to fix those issues, but wanted to fix this terrible bug quickly.
cc zou3519 zdevito ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14485
Reviewed By: eellison
Differential Revision: D13312888
Pulled By: suo
fbshipit-source-id: ad46bfb4d0a306ad9451002f8270f7a790f72d58
Summary:
Previously symbolic AD formulas assumed that no broadcasting happened,
and would return gradients of incorrect shapes (possibly leading to
silent errors later).
Fixes a few bugs (known and unknown):
- #11736
- ArgumentSpec didn't compute the input types correctly [(it didn't advance the offset for non-tensor args)](https://github.com/pytorch/pytorch/pull/14485/files#diff-4fd3157a056596aefb8cdf41022a208bR153)
- Symbolic AD could suffer from use after free (dangling pointers in grad map), because [`EliminateDeadCode` could have removed nodes](https://github.com/pytorch/pytorch/pull/14485/files#diff-25d33ad1ed6855684dec79d927ca6142L781) that referenced gradients of certain values.
- Undefined behavior in `aten::size`
During my tests I've also found a few new problems, and I have opened issues for them:
- FusionGroup seems to think that cat nodes broadcast their inputs (#14483)
- `prim::ConstantChunk` derivative formula doesn't handle undefined inputs (#14484)
This patch unfortunately deoptimizes some of our code (Fusion doesn't happen past chunk nodes, and outputs more tensors only because we have to get their size). I know how to fix those issues, but wanted to fix this terrible bug quickly.
cc zou3519 zdevito ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14485
Differential Revision: D13280899
Pulled By: soumith
fbshipit-source-id: 80cc5ec9331be80e1bb9ddfe85b81c2b997e0b0c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14414
The previous functions were CUDA-centric, and lead to lots of places
where we improperly assumed that CUDA is the only game in town (it's not).
Best to delete them.
What are your alternatives? This diff fix some use sites which may give
you some ideas. In particular, the "given a device type, give me the
current device for that device type" might be a good function to enshrine
for real.
Reviewed By: gchanan
Differential Revision: D13218540
fbshipit-source-id: 2f42cd6b9bdab4930d25166b8041c9466a1c6e0a
Summary:
Stacked on zip commit because it also changes expect files, read only the last commit.
This reduces the number of ways we can print a Type from 3 (python_str, str, operator<<) to 2.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14657
Differential Revision: D13288912
Pulled By: zdevito
fbshipit-source-id: f8dd610cea798c511c1d4327395bba54b1aa1697
Summary:
Make Samplers optionally accept new size in their reset() method. This helps dataloader or dataset to reset the sampler for an epoch or a chunk of data with different sizes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13870
Differential Revision: D13240120
Pulled By: soumith
fbshipit-source-id: 19c53f8be13c0fdcf504f0637b0d3e6009a8e599
Summary:
I noticed the test `DataLoaderTest.CanDereferenceIteratorMultipleTimes` doesn't test proper progression of the iterator. I also added a test for using `std::copy`.
Fixes https://github.com/pytorch/pytorch/issues/14276
ebetica ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14045
Differential Revision: D13092187
Pulled By: goldsborough
fbshipit-source-id: 57698ec00fa7b914b159677a4ab38b6b25c2860b
Summary:
This PR is deceptively large because of an indenting change. The actual change is small; I will highlight it inline
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14325
Differential Revision: D13183296
Pulled By: suo
fbshipit-source-id: fcbf6d5317954694ec83e6b8cc1c989f2d8ac298
Summary:
First draft of an alias analysis pass. It's a big PR unfortunately; a rough table of contents/suggested order of review:
1. `AliasAnalysis` pass, which traverses the graph and builds an `AliasDb`. The basic strategy is to assign alias information to every value of mutable type (list/tuple/tensor), and use the alias annotations of each node's schema to assign alias info to the outputs based on the alias info the inputs. Nodes that aren't explicitly schematized have hand-written analysis rules.
2. Integration of aliasing information into `moveBefore/AfterTopologicallyValid()`. Basically, we pass in an alias DB when we ask for moveBefore/After. Similar to how we can boil down dependency analysis to "what nodes use this node", we can boil down mutability analysis to "what nodes write to an alias set input/output'd by this node".
3. Integration of alias analysis to optimization passes that need it. Right now, it is `GraphFuser`, `CreateAutodiffSubgraphs`, constant prop, and CSE. Not sure if any others need it.
- Testing; still figuring out the best way to do this.
- Eventually we want to integrate the alias db into the graph, but we shouldn't do that until we can guarantee that the information can stay up to date with mutations.
- Do the same thing `python_printer` did for operators and force people to register alias analyzers if they can't schematize their op.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14018
Differential Revision: D13144906
Pulled By: suo
fbshipit-source-id: 1bc964f9121a504c237cef6dfeea6b233694de6a
Summary:
zdevito soumith
Sorry about the previous PR, had some git issues. This is the same exact code as the previous PR but updated w.r.t pytorch/master.
fixes#13254
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14181
Differential Revision: D13117688
Pulled By: soumith
fbshipit-source-id: 044840b2c7a0101ef43dd16655fd9a0f9981f53f
Summary:
This PR adds a `SharedDataset` to the C++ frontend data API, which allows wrapping a shared_ptr to a dataset into a class that conforms to the `Dataset` interface (with `get_batch`). This enables use cases where a custom dataset is (1) thread-safe and (2) expensive to copy. All workers will reference a single instance of this dataset. No additional copies are incurred.
jaliyae apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13800
Differential Revision: D13075610
Pulled By: goldsborough
fbshipit-source-id: 4ffdfd7959d49b042c0e254110085f62a0bfeb6c
Summary:
This reverts commit 37cb357d8d.
Try to see if it unbreaks master
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14082
Differential Revision: D13095888
Pulled By: bddppq
fbshipit-source-id: c728f80f233b4d9daaf65f43202d8104651029a9
Summary:
Deletes the `OptionsGuard` from ATen. This works towards the goal of reworking `DefaultTensorOptions`. `OptionsGuard` is troublesome because it relies on mutating thread local state. This PR fixes those code locations and then deletes the `OptionsGuard`.
ezyang gchanan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13738
Differential Revision: D13000962
Pulled By: goldsborough
fbshipit-source-id: c8143ee75070c2280f5fd1d9af86f8ce14279b72
Summary:
I think this will be it. So for one, the previous test was bullshit because it was returning the thread id instead of the sample index (which is the thing whose ordering is enforced). Just turning up the number of threads to 10 from 4 made this very obvious. I also think there is a race condition, which may or may not have surfaced, in that there was nothing stopping one worker to get multiple batches, which would screw with the whole ordering logic. I've added a barrier struct such that workers wait for all workers to be in the `get_batch` function before actually doing something.
Fixes https://github.com/pytorch/pytorch/issues/14002
ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14038
Differential Revision: D13088132
Pulled By: goldsborough
fbshipit-source-id: 4bded63756c6a49502ee07ef8709a03073e7e05f
Summary:
* Add hooks to get a callback whenever a valid graph is produced in the compiler or through tracing. These hooks can be used to pretty_print and then reparse every graph our tests produce to check that the serialization function works correctly. Currently this is guarded by an environment variable since there are a few remaining failures.
* Fix printing bugs: True and False rather than 1 and 0, print 0. for floating point zero
* Change behavior of NoneType. It is now no longer a subtype of Optional but instead implicitly converts to it, returning a prim::Node with an Option[T] type for some specific T. This allows functions like `_unwrap_optional` to correctly match against a None while still deriving the right type.
* Fix a bug where empty blocks did not correctly emit "pass" in printer.
* Fix a bug where prim::Undefine sometimes cannot be printed as None because it is being used in a schema-less op. This should be fixable once Optional[T] always uses the same None object.
* Other minor printing bugs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13959
Reviewed By: jamesr66a
Differential Revision: D13073519
Pulled By: zdevito
fbshipit-source-id: 4167a6b614f2e87b4d21823275a26be5ba4fc3dd
Summary:
Grab bag of additions to alias annotations that were useful when writing the alias analysis pass. Not very organized since these were mostly split off from that PR.
- Switch alias sets to actual sets, since we will want to union them.
- Correctly parse alias set unions `a|b`, and correctly parse wildcards
- Move writes into `AliasInfo`, which cleans up some code that was passing a `writes` vector everywhere and simplifies tracking aliased writes during analysis.
- Change Tensor list extraction ops to return wildcard tensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13632
Differential Revision: D13088038
Pulled By: suo
fbshipit-source-id: 49dc5d0e9cd4895427fea3a87b0ec325bd5fe437
Summary:
Migrate the `CreateAutodiffSubgraphs` pass to use topologically-safe moves instead of DynamicDAG. This is to unify the interface that we use for determining safe node moves to prepare for mutability.
The pass looks a lot like GraphFuser now, and there's a lot of code duplication. I plan to pull common stuff out into a "subgraph manipulation utils" thing, but didn't want to clutter this PR.
Future steps:
- Get rid of code duplication (see above)
- Use DynamicDAG to back the `moveBefore/After` calls.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13862
Differential Revision: D13072871
Pulled By: suo
fbshipit-source-id: 92e7880ef444e0aefd51df60964bba7feaf42ae0
Summary:
Attempts to unflake the dataloader ordering enforcement test. I think the issue was that the `thread_counter` variable was not atomic. I've made it atomic, and also global just to make it a bit clearer.
Fixes https://github.com/pytorch/pytorch/issues/13634
colesbury SsnL ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13919
Differential Revision: D13051718
Pulled By: goldsborough
fbshipit-source-id: b9f7f6317701a8b861a1d5c6a9b2b17b44782561
Summary:
This PR adds Windows support for the C++ frontend. A lot of declarations were missing `TORCH_API` macros, and lots of code just did not compile on MSVC.
ebetica ezyang orionr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11716
Reviewed By: orionr
Differential Revision: D13038253
Pulled By: goldsborough
fbshipit-source-id: c8e5a45efd26117aeb99e768b56fcd5a89fcb9f8
Summary:
Extend `isAfter` to work for nodes in different blocks. This is useful if we want to ask a question like "are any of the uses of value `v` after this node", since uses may be inside inner blocks.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13855
Differential Revision: D13030528
Pulled By: suo
fbshipit-source-id: f681405396f3ec68eec1a2cb92e40873921a4b78
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13342
This PR introduces a few new concepts:
- DeviceGuardImplInterface, and implementations for CPU and CUDA, which
provide a generic interface for interfacing with device and stream state,
without requiring a direct dependency on the code in question.
- InlineDeviceGuard, a general template for generating both specialized
and dynamically dispatched device guard implementations. Dynamic
dispatch is done by specializing it on a VirtualGuardImpl.
- Provide a device-independent DeviceGuard class, which can be used even
from CPU code. It uses the aforementioned dynamic dispatch.
- CUDA-specialized CUDAGuard class, which doesn't have a dynamic dispatch
but can only be used from CUDA.
- StreamGuard, which is the same as above, but for streams rather than
devices.
- Optional variants of all the aforementioned guards, which are a no-op if
no device/stream is specified
- CUDAMultiStreamGuard, specifically for the case when we want to set
a device on every guard.
There are some subtle semantic changes, which have been thoroughly documented
in the class definition.
BC-breaking changes:
- Move constructor/assignment have been removed from all device guard
implementations.
- In some cases where you previously wrote 'set_device' (or 'set_stream'), you now must write
'reset_device', because if you switch devices/device types, the stream/device on the
previous device is unset. This is different from previous behavior.
- CUDAGuard no longer handles streams, or multiple streams. Use CUDAStreamGuard
or CUDAMultiStreamGuard as appropriate for your use case.
Reviewed By: dzhulgakov
Differential Revision: D12849620
fbshipit-source-id: f61956256f0b12be754b3234fcc73c2abc1be04e
Summary:
We have an MNIST reader in the C++ data API, so we can get rid of the custom one currently implemented in the integration tests.
ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13737
Differential Revision: D12990936
Pulled By: goldsborough
fbshipit-source-id: 125a1910ec91d53dbf121570fc9eec6ccfba0477
Summary:
In particular, this was breaking the logic for cudnn algorithm to fall back to a less memory hungry algorithm if the selected one OOM when creating the workspace.
c10::Error are subclass of `std::exception` and not `std::runtime_error`.
I removed `runtime_error` in all places in our code and replaced them with `const exception`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13665
Differential Revision: D12958396
Pulled By: soumith
fbshipit-source-id: af557efd9887b013140113d3067de157ffcf8465
Summary:
This is a pre-cursor diff to Python <-> C++ frontend integration -- I have a follow-up PR coming for that. This PR changes the C++ frontend module interface to replace the custom "cursor"s I introduced some time ago with `OrderedDict`. I introduced cursors at the time as a convenient way of applying functions and query operations on a modules' parameters, buffers and modules, allowing things like `module.parameters().map(my_func)`. However, I noticed that (1) this functionality is easily implement-able on top of a regular data structure and (2) more importantly, using OrderedDicts is much, much easier for Python integration. This is especially true given that ScriptModule today also uses OrderedDict. Since C++ frontend modules and ScriptModules will soon too share as many implementation details as possible, it is overall the best move to ditch the custom cursor datastructure and pervasively use OrderedDict everywhere.
For this I did:
1. Changed the C++ frontend module interface to more closely match the Python one by providing `parameters()`, `named_parameters()` and other methods Python provides. This is very important for the following diff which binds these into Python for inter-op with Python modules.
2. In lieu of the `Cursor::apply()` method I added `nn::Module::apply`. This again is one more unifying step between Python and C++, since Python modules have an apply function too.
3. Deleted all uses of Cursor.
4. Tidied and beefed up the `OrderedDict` class. In particular, I made `OrderedDict::Item` store an `std::pair` under the hood, because that is trivial to bind into Python and saved me a lot of headaches. `key` and `value` become methods instead of fields, which they should have been from the very start anyway because it allows exactly these kinds of changes, as per usual good software engineering principle of encapsulation.
5. Added many tests for the OrderedDict use in `nn::Module`.
ebetica ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13427
Differential Revision: D12894092
Pulled By: goldsborough
fbshipit-source-id: 715770c95a9643753a1db26d7f9da9a78619a15d
Summary:
In TorchScript and C++ extensions we currently advocate a mix of `torch::` and `at::` namespace usage. In the C++ frontend I had instead exported all symbols from `at::` and some from `c10::` into the `torch::` namespace. This is far, far easier for users to understand, and also avoid bugs around creating tensors vs. variables. The same should from now on be true for the TorchScript C++ API (for running and loading models) and all C++ extensions.
Note that since we're just talking about typedefs, this change does not break any existing code.
Once this lands I will update stuff in `pytorch/tutorials` too.
zdevito ezyang gchanan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13523
Differential Revision: D12942787
Pulled By: goldsborough
fbshipit-source-id: 76058936bd8707b33d9e5bbc2d0705fc3d820763
Summary:
This PR brings to changes to the recently landed C++ Frontend dataloader:
1. Removes the `size()` method from `BatchDataset`. This makes it cleaner to implement unsized ("infinite stream") datasets. The method was not used much beyond initial configuration.
2. Makes the index type of a dataset a template parameter of `BatchDataset` and `Sampler`. This essentially allows custom index types instead of only `vector<size_t>`. This greatly improves flexibility.
See the `InfiniteStreamDataset` and `TestIndex` datasets in the tests for what this enables.
Some additional minor updates and code movements too.
apaszke SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12960
Differential Revision: D12893342
Pulled By: goldsborough
fbshipit-source-id: ef03ea0f11a93319e81fba7d52a0ef1a125d3108
Summary:
Built on top of #13108, so please review only the last commit.
This makes the graph fuser ignore input types (device/scalar type) when considering graphs for fusion, making it much more robust to shape-prop failures. Those properties are now checked at run time, as part of the kernel validation. This should enable graph fusions in `jit_premul` and `jit_multilayer` timelines in our benchmarks.
One regression is that I've disabled fusions of comparison ops (and `type_as`). That's because there's really no good way to ensure that those are really valid, and are a source of bugs (I filed #13384).
cc ngimel mruberry zdevito zou3519
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13387
Differential Revision: D12888104
Pulled By: zou3519
fbshipit-source-id: c233ea599679c34ac70fb4d8b8497c60aad9e480
Summary:
This PR principally redesigns the fuser's logical flow to be hierarchical, with device-independent logic directing (relatively little) device-specific logic. This design is based on reviews of XLA, TVM, internal design review at NVIDIA and discussions with fuser owners at Facebook. To further vet the design I have begun developing the next significant PR (extended fusion logic) on top of this architecture and it has made the work significantly easier. This PR also improves fuser modularity, which should make it easier for others to contribute to. Unfortunately, this PR is large and its nature has made breaking it into smaller pieces challenging. Future PRs should be smaller.
The fusion flow is now:
- Fusions are "registered" and "upfront compilation" occurs. The fusion specifications, which includes the graph, go into a thread-safe device-independent cache. Upfront compilation generates some information used later during shape inference.
- Fusions are run, which passes them to an executor that performs shape inference, requests an instantiated fusion from the specification's thread-safe store, and launches them. Launch logic eventually defers to device-specific logic.
- Fusions not previously instantiated are compiled. Compilation is device-specific and arg-specific. Compilation logic eventually defers to device-specific logic.
- If the fusion could not be run because fusion on the requested device is disabled or shape inference fails a fallback is invoked.
This flow can be thought of as PyTorch IR -> Device-Independent Fusion Logic -> Device-Specific Fusion Logic. The current upstream logic is, by contrast, PyTorch IR -> Device-Specific Logic -> Device-Independent Logic, which results in needless code duplication and lack of conceptual clarity. That was my mistake when splitting the fuser off from the rest of the jit and our reviews since then have been incredibly helpful in understanding why the approach in this PR is better.
This PR does not only move code around. It also fixes few couple bugs and makes some logical/code changes.
Bug fixes:
- thread-safety is improved with caches preventing concurrent access
- the nvrtc version is now reviewed to determine the appropriate compute architecture to compile for, fixing a bug that would cause runtime errors if a user's nvrtc didn't support the compute architecture their gpu reported
- an issue with DeviceGuard not setting the device properly and failing silently is worked-around (ezyang mentioned he was reviewing the dynamic registration DeviceGuard uses, which may resolve the issue)
Code/Logical changes:
- "const" now appears many more places (note: I cast const away in operator.h because of some obscure build issues -- I think we should be able to fix this and will take a look while this goes through testing)
- The new flow allowed some redundant code to be removed (AnnotatedGraph is gone, for example, and the more straightforward flow eliminated duplication of effort elsewhere)
- Fallback logic is now also invoked if a fusion is requested on a device that cannot handle fusions
- Use of macros to determine which files are compiled is reduced (though they may come back if the Windows build is unhappy)
- There is no more "common" code or folder, the device-independent logic being at the forefront of the fuser replaces and improves upon the goal of sharing code
apaszke who I promised naming rights to
zdevito who correctly pointed out that the device-independent logic should be the bulk of what the fuser is doing
ngimel who contributed to the design of this architecture
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13108
Reviewed By: gchanan, fmassa
Differential Revision: D12850608
Pulled By: soumith
fbshipit-source-id: 24e2df6dfa97591ee36aeca8944519678c301fa3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13275
This resulted in a bunch of knock-on changes, which I will now
describe:
- s/original_index/original_device/
- s/last_index/last_device/
- A bunch of places that used set_index, now use CUDAGuard (which does have
set_index) because they were CUDA-specific code.
Major caveat: DeviceGuard doesn't *actually* work non-CUDA/CPU devices, To make
that happen, I plan on totally replacing the implementation of DeviceGuard; what
I mostly care about here is wrangling the API into an acceptable state.
Reviewed By: gchanan
Differential Revision: D12832080
fbshipit-source-id: 7de068c7cec35663dc8a533026a626331336e61d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13232
DeviceGuard should be device agnostic, which means that it shouldn't
assume that int64_t means select the CUDA device.
Reviewed By: gchanan
Differential Revision: D10858024
fbshipit-source-id: b40e8337e4046906fd8f83a95e6206367fb29dbe
Summary:
ezyang on the template hack
smessmer on SFINAE of the `TensorOptions(Device)`
goldsborough on the C++ API test changes
zdevito on the `jit` codegen changes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13146
Reviewed By: ezyang
Differential Revision: D12823809
Pulled By: SsnL
fbshipit-source-id: 98d65c401c98fda1c6fa358e4538f86c6495abdc
Summary:
Future now is an IValue. prim::Wait now is replaced by aten::wait
This PR is built on top of #12925
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12976
Differential Revision: D10861483
Pulled By: highker
fbshipit-source-id: 9e17926a625bc502fb12335ef9ce819f25776be7
Summary:
Add new methods to move a node before/after another node while preserving data data dependencies.
Any suggestions for a pithier name for the methods would be appreciated 😃
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13026
Differential Revision: D10854574
Pulled By: QueryConnectionException
fbshipit-source-id: b42751cac18d1e23940e35903c8e6a54a395292e
Summary:
Implements serialization and deserialization for samplers in the C++ frontend dataloader.
apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12999
Differential Revision: D10859676
Pulled By: goldsborough
fbshipit-source-id: cd132100fd35323e5a3df33e314511750806f48d
Summary:
This commit is a minimial initial pass at adding inplace and _out variants to the JIT.
It changes gen_jit_dispatch.py to add bindings for these operators, and it also
supplements the FunctionSchema with alias information for these operators and for
viewing operators.
Tests are very minimal and will need to be improved in future commits.
Notes:
* Custom operator tests needed to be changed since _out variants add overloads, which
the custom operator pipeline does not handle when called from python. This commit
registers special test ops in the _test namespace for this purpose.
* Extends the schema parser to parse alias annotations more robustly.
* Extends FunctionSchema with `writes()` a set of alias set names that the op will write to,
and `annotatedType()` which will return AnnotatedType objects which contain the alias_set
information that was parsed from the schema.
* Disables all optimizations in graph executor when a mutable operator is found. This
is something that will be improved in the future but is necessary for correctness now.
* Adds annotate_ops to gen_jit_dispatch which adds aliasing information to all of the
aten ops.
* Adds AnnotatedType to the type hierarchy which is used to mark List and Tensor types
with their alias_set. These types only appear in schema when you call annotatedType
and are erased from types in normal use.
* Extends jit::Type with .containedTypes() and .withContained(new_types). The first returns all types contained
within the type (e.g. T for T[], or {T,L} for a tuple (T, L)). The second constructs a new
version of the same type, replacing the contained types with new_types. This simplifies
a lot of logic for recursively cleaning up types.
* Refactor List[T] into a common part that is shared with Annotated[T] and can be shared
with Optional[T] and Future[T] when they are merged.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13093
Differential Revision: D10848176
Pulled By: zdevito
fbshipit-source-id: d057f23eeb99cde8881129b42d3f151ed5e7655d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12768
Note: DefaultTensorOptions no longer fits in 64-bits.
I kept functions that take ScalarType as input to minimize changes for now.
Reviewed By: ezyang
Differential Revision: D10419671
fbshipit-source-id: 9cc8c5982fde9ff243e03d55c0c52c2aa2c7efd8
Summary:
Does
```cpp
namespace torch {
using c10::optional;
using c10::nullopt;
}
```
So that users can be oblivious of our changes with ATen/c10 happening in the background, and also don't have to deal with multiple namespaces (which is very confusing).
ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12927
Differential Revision: D10510630
Pulled By: goldsborough
fbshipit-source-id: e456264f2fbca3eda277712de11cdd8acc77fbd4
Summary:
This PR adds optional type to ATen native, autograd, JIT schema and Python Arg parser, closes#9513. It allows us to use optional default values (including None) for function signature and implementations like clamp, etc., and also let us remove the python_default_init hack.
Follow up:
remove python_default_init completely.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12582
Differential Revision: D10417423
Pulled By: wanchaol
fbshipit-source-id: 1c80f0727bb528188b47c595629e2996be269b89
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13082
Follow up of D10511254. For these cases we can move to preferred `optional` without namespace right away.
Reviewed By: ezyang, Yangqing
Differential Revision: D10844117
fbshipit-source-id: 99a59e692fb4b236b299579f937f1536d443d899
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12991
Remove the file proxying. Before we can do land `using namespace c10` everywhere, we just keep the one off namespace proxy. The follow up diff is going to replace explicit at::optional but keep just `optional` usage
Reviewed By: ezyang, Yangqing
Differential Revision: D10511254
fbshipit-source-id: 8297c61d7e9810ae215a18869a6ec9b63f55d202
Summary:
Removes test_jit.cpp, which was supposed to have been deleted in https://github.com/pytorch/pytorch/pull/12030
I had to move zou3519's dynamic DAG tests into `test/cpp/jit/tests.h` too. No other changes to `test_jit.cpp` seem to have happened in the meantime.
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12988
Differential Revision: D10854320
Pulled By: goldsborough
fbshipit-source-id: 7ab533e6e494e34a16ce39bbe62b1150e48fcb58
Summary:
We are beginning to use this class in a wider reaching set of use-cases. This PR refactors it so that we always access schema properties through methods. This will make adding extra information like alias information easier (i.e. we can a version of `type()` that returns the type with alias information and another version that returns a type without that information).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12967
Differential Revision: D10502674
Pulled By: zdevito
fbshipit-source-id: a88783ed8f20ab3be6460c12da95f9f940891c44
Summary:
This PR implements a DataLoader API for the C++ frontend.
The components present in this API largely match the Python API. It consists of:
- `Dataset`s: Conceptually a function from a set of indices to a batch of examples;
- `Transform`s: A functional transformation of a dataset. A `Map<D, T>` for Dataset `D` and transform `T` is itself a dataset;
- `Sampler`s: Specify a strategy for generating indices for a new batch;
- A `DataLoader`, with the ability to automatically parallelize fetching of samples across multiple worker threads;
Note that collation functions fall naturally out of the `Map<Dataset, Transform>` abstraction.
Things that are missing right now that maybe should be added:
- Memory pinning for CUDA tensors
The API was designed to be generalizable to almost any kind of dataset, transform or sampling strategy, while providing a convenient API out of the box. To achieve this, it is quite heavily templatized on various possible input types.
There are many parts to this PR! Right now, I would like feedback on:
- Your impression of the general usability of the API;
- Your impression of which parts seem too complex or overthought;
- The implementation of the parallelization aspects of the DataLoader. I've followed the Python implementation in some matters, but also differ in others. I think my implementation is a little cleaner and decouples components slightly better than the Python dataloader.
I haven't added too many comments yet, as this is fresh out of the oven. Let me know if anything is unclear from the code itself.
There also aren't any tests yet. I will write a comprehensive test suite once we agree on the API and implementation.
apaszke ezyang The controller you requested could not be found. pietern
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11918
Reviewed By: ezyang
Differential Revision: D9998881
Pulled By: goldsborough
fbshipit-source-id: 22cf357b63692bea42ddb1cc2abc71dae5030aea
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12792
This is a follow up diff after D10238910.
Only non-codemod change is the removal of ATen/Error.h and ATen/core/Error.h. Other files are basically changing the inclusion path + clang format for inclusion order.
Reviewed By: bddppq
Differential Revision: D10437824
fbshipit-source-id: 7f885f80ab5827468d1351cfb2765d0e3f555a69
Summary:
This PR does three things:
1. Add support for serializing to `ostream` and deserializing from `istream`s in addition to files. This is after https://github.com/pytorch/pytorch/pull/11932 added support for streams in `torch::jit::ExportModule` and `torch::jit::load`.
2. Update the internal interface for how things get serialized into archives (e.g. use the more idiomatic `operator<<` instead of a `save` method). *The external interface does not change*.
3. Add documentation.
ezyang ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12421
Reviewed By: ezyang
Differential Revision: D10248529
Pulled By: goldsborough
fbshipit-source-id: 6cde6abd0174e3fbf3579c05376a32db0b53755f
Summary:
... they are basically the same class and I didn't see it in the initial PR. I also got resolvers back onto std::functions by keeping the function_table logic local to defineMethodInModules.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12589
Differential Revision: D10383103
Pulled By: zdevito
fbshipit-source-id: 1b0a85eb4f112bc28256cac44446d671d803d3a2
Summary:
This commit adds the hooks in schema parser for futures, options,
mutable alias sets, marking writes, and named output arguments that
need to exist for other upcoming work.
This also fixes that problem where you could not declare Lists of Lists.
Implementation of most of these features is left NYI. This commit should
avoid merge conflicts for these individual features on the schema parser.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12585
Differential Revision: D10382229
Pulled By: zdevito
fbshipit-source-id: 41d794e58ca462cf3a389861c533c68944dc560b
Summary:
There are still a few work to be done:
- Move logging and unify AT_WARN with LOG(ERROR).
- A few header files are still being plumbed through, need cleaning.
- caffe2::EnforceNotMet aliasing is not done yet.
- need to unify the macros. See c10/util/Exception.h
This is mainly a codemod and not causing functional changes. If you find your job failing and trace back to this diff, usually it can be fixed by the following approaches:
(1) add //caffe2/c10:c10 to your dependency (or transitive dependency).
(2) change objects such as at::Error, at::Optional to the c10 namespace.
(3) change functions to the c10 namespace. Especially, caffe2::MakeString is not overridden by the unified c10::str function. Nothing else changes.
Please kindly consider not reverting this diff - it involves multiple rounds of rebasing and the fix is usually simple. Contact jiayq@ or AI Platform Dev for details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12354
Reviewed By: orionr
Differential Revision: D10238910
Pulled By: Yangqing
fbshipit-source-id: 7794d5bf2797ab0ca6ebaccaa2f7ebbd50ff8f32
Summary:
In our #better-engineering quest of removing all uses of catch in favor of gtest, this PR ports JIT tests to gtest. After #11846 lands, we will be able to delete catch.
I don't claim to use/write these tests much (though I wrote the custom operator tests) so please do scrutinize whether you will want to write tests in the way I propose. Basically:
1. One function declaration per "test case" in test/cpp/jit/test.h
2. One definition in test/cpp/jit/test.cpp
3. If you want to be able to run it in Python, add it to `runJitTests()` which is called from Python tests
4. If you want to be able to run it in C++, add a `JIT_TEST` line in test/cpp/jit/gtest.cpp
Notice also I was able to share support code between C++ frontend and JIT tests, which is healthy.
ezyang apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12030
Differential Revision: D10207745
Pulled By: goldsborough
fbshipit-source-id: d4bae087e4d03818b72b8853cd5802d79a4cf32e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12103
This defers lookup of defaults to the site where we read
out of TensorOptions. THIS IS A BC-BREAKING BEHAVIOR CHANGE,
but we expect the bulk of uses of OptionsGuard don't allocate TensorOptions
inside the OptionsGuard region, and then use it outside of the region
(the situation where behavior could change.)
I also optimize the size of TensorOptions by rearranging fields, so that we
always fit in two 64-bit words.
Reviewed By: goldsborough
Differential Revision: D10052523
fbshipit-source-id: f454a15b4dbf8cd17bc902ab7d2016f2f689ed13
Summary:
Tensors cannot be created globally because of static initialization order issues. So tensors for the optim_baseline test must be created lazily instead. This is fine because these functions will only be called once (in the respective test).
ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12301
Differential Revision: D10201008
Pulled By: goldsborough
fbshipit-source-id: 59a041f437354e7c6600e5655b3e2d0647dbde9e
Summary:
This PR is a large codemod to rewrite all C++ API tests with GoogleTest (gtest) instead of Catch.
You can largely trust me to have correctly code-modded the tests, so it's not required to review every of the 2000+ changed lines. However, additional things I changed were:
1. Moved the cmake parts for these tests into their own `CMakeLists.txt` under `test/cpp/api` and calling `add_subdirectory` from `torch/CMakeLists.txt`
2. Fixing DataParallel tests which weren't being compiled because `USE_CUDA` wasn't correctly being set at all.
3. Updated README
ezyang ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11953
Differential Revision: D9998883
Pulled By: goldsborough
fbshipit-source-id: affe3f320b0ca63e7e0019926a59076bb943db80
Summary:
This PR serves two purposes:
1. Design an abstraction over a serialization scheme for C++ modules, optimizers and tensors in general,
2. Add serialization to the ONNX/PyTorch proto format.
This is currently a rough prototype I coded up today, to get quick feedback.
For this I propose the following serialization interface within the C++ API:
```cpp
namespace torch { namespace serialize {
class Reader {
public:
virtual ~Reader() = default;
virtual void read(const std::string& key, Tensor& tensor, bool is_buffer = false) = 0;
virtual void finish() { }
};
class Writer {
public:
virtual ~Reader() = default;
virtual void writer(const std::string& key, const Tensor& tensor, bool is_buffer = false) = 0;
virtual void finish() { }
};
}} // namespace torch::serialize
```
There are then subclasses of these two for (1) Cereal and (2) Protobuf (called the "DefaultWriter" and "DefaultReader" to hide the implementation details). See `torch/serialize/cereal.h` and `torch/serialize/default.h`. This abstraction and subclassing for these two allows us to:
1. Provide a cereal-less serialization forward that we can ship and iterate on going forward,
2. Provide no-friction backwards compatibility with existing C++ API uses, mainly StarCraft.
The user-facing API is (conceptually):
```cpp
void torch::save(const Module& module, Writer& writer);
void torch::save(const Optimizer& optimizer, Writer& writer);
void torch::read(Module& module, Reader& reader);
void torch::read(Optimizer& optimizer, Reader& reader);
```
with implementations for both optimizers and modules that write into the `Writer` and read from the `Reader`
ebetica ezyang zdevito dzhulgakov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11619
Differential Revision: D9984664
Pulled By: goldsborough
fbshipit-source-id: e03afaa646221546e7f93bb8dfe3558e384a5847
Summary:
The second part of T32009899
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11556
Differential Revision: D9888224
Pulled By: zrphercule
fbshipit-source-id: cb0d0ba5d9c7ad601ee3bce0d932ce9cbbc40908
Summary:
1. Document the Sequential module in the C++ API at a high, why-does-this-exist, and low, how-to-use, level
2. Change the Sequential tests to be in a style that makes them easier to convert to gtest. No code changes.
ebetica ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11648
Differential Revision: D9834526
Pulled By: goldsborough
fbshipit-source-id: 39f2f5c6cbbf8ed5a1b69986978c8ef127036de1
Summary:
In order to comply with Python's rules on implicit casting of
non-booleans to booleans, this PR removes implicit casting in favor of
explicit casts via `bool()`
cc zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11503
Differential Revision: D9780869
Pulled By: driazati
fbshipit-source-id: c753acaca27f4e79dddf424c6b04674f44a6aad9
Summary:
Documents the `AnyModule` class in the C++ API.
Also changed the API to be friendlier by default. Calling `AnyModule::forward` used to return an `AnyModule::Value` which you had to call `.get<T>()` on to cast to a concrete type. I changed the name of that `forward` method to `any_forward` and instead made `forward` templated on a `ReturnType` template parameter which you can supply to do the `.get<T>` cast for you automatically. I default this parameter to `torch::Tensor` so that it can often be omitted. So where you used to have to write
```cpp
any_module.forward(...).get<int>();
any_module.forward(...).get<torch::Tensor>();
```
you now write
```cpp
any_module.forward<int>(...);
any_module.forward(...);
```
ebetica ezyang soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11580
Differential Revision: D9798626
Pulled By: goldsborough
fbshipit-source-id: 060b4ea28facaffc417f53b80b846a9dff9acb73
Summary:
This PR:
1. Documents `BatchNorm`,
2. Makes a number of API changes after reconsidering some quirks:
1. The default value for the `stateful` parameter used to be `false`, but the most common usage of `BatchNorm` out of the wild is certainly stateful, and the default in Python is also statefulness. So we change the default to stateful.
2. The `pure_forward` function used to use the internal running mean and variance variables instead of the ones supplied to that function call when `stateful` was true, which certainly seems odd. When you call `pure_forward` you would certainly expect the values you pass explicitly to be used. This is now fixed.
3. Adds tests for `BatchNorm`, finally.
ebetica apaszke ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11484
Reviewed By: pjh5
Differential Revision: D9779618
Pulled By: goldsborough
fbshipit-source-id: 59ba760e085c01454b75644b24b22317b688e459
Summary:
This PR does two things:
1. Replaces the implementation of the `Dropout` module with a call to the ATen function,
2. Replaces `Dropout2d` with a new `FeatureDropout` module that shall take the place of `Dropout2d` and `Dropout3d`. I contemplated calling it `Dropout2d` and making `Dropout3d` an alias for it, but similar to our decision for `BatchNorm{1,2,3}d` (c.f. https://github.com/pytorch/pytorch/pull/9188), we can deviate from Python PyTorch in favor of the ideal-world solution, which is to have a single module, since both actually just call `feature_dropout`.
I also replaced the implementation of `dropout3d` with a call to `dropout2d` in Python. The code is the same and it's easier for developers to parse than having to manually match the tokens to make sure it's really 100% the same code (which it is, if I matched the tokens correctly).
ebetica ezyang SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11458
Differential Revision: D9756603
Pulled By: goldsborough
fbshipit-source-id: fe847cd2cda2b6da8b06779255d76e32a974807c
Summary:
This PR cleans up the `at::Tensor` class by removing all methods that start with an underscore in favor of functions in the `at::` namespace. This greatly cleans up the `Tensor` class and makes it clearer what is the public and non-public API.
For this I changed `native_functions.yaml` and `Declarations.cwrap` to make all underscore methods `variant: function` (or add such a statement to begin with), and then fixed all code locations using the underscore methods.
ezyang colesbury gchanan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11152
Differential Revision: D9683607
Pulled By: goldsborough
fbshipit-source-id: 97f869f788fa56639c05a439e2a33be49f10f543
Summary:
This lets you compile builtin functions from C++ without having a dependence on Python
```cpp
auto module = torch::jit::compile(JIT"(
def my_script_method(x, y):
return torch.relu(x) + y
)");
IValue result = module->run_method("my_script_method", 1, 2);
```
goldsborough zdevito apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10847
Differential Revision: D9543461
Pulled By: driazati
fbshipit-source-id: 6160dae094030ca144a0df93cb9f26aa78c8cf27
Summary:
Linting `torch/csrc/` (non-recursive) and `torch/csrc/autograd` (non-recursive).
Fixed things like:
- `typedef` vs `using`
- Use `.empty()` instead of comparing with empty string/using `.size() == 0`
- Use range for loops instead of old style loops (`modernize-`)
- Remove some `virtual` + `override`
- Replace `stdint.h` with `cstdint`
- Replace `return Type(x, y)` with `return {x, y}`
- Use boolean values (`true`/`false`) instead of numbers (1/0)
- More ...
ezyang apaszke cpuhrsch
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11050
Differential Revision: D9597505
Pulled By: goldsborough
fbshipit-source-id: cb0fb4793ade885a8dbf4b10484487b84c64c7f2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11189
Replaces it with an operator TensorOptions() method on
Type, reestablishing the implicit conversion. I originally
wanted to get rid of the implicit conversion entirely, but
there were a *lot* of use-sites, so I added it back to avoid
a huge codemod. In this patch, I only had to fix sites that
used the optional device_index API.
Reviewed By: cpuhrsch
Differential Revision: D9628281
fbshipit-source-id: 5fe2a68eefb77a3c9bb446f03a94ad723ef90210
Summary:
We don't generate a corresponding Type implementations for them,
so this doesn't do anything at the moment.
We don't plan on supporting complex32 in the near future, but
it is added to reserve the name and number in case we do at
some point in the future.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11173
Reviewed By: SsnL
Differential Revision: D9627477
Pulled By: ezyang
fbshipit-source-id: f49a44ab1c92d8a33130c249ac7b234f210a65e6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11023
I'd like TensorOptions to not know anything about Context, so I can
move it to ATen/core without pulling in Context. To do this, the
type() method has to go, since it consults the context to get a Type.
Reviewed By: cpuhrsch
Differential Revision: D9562467
fbshipit-source-id: 61a18a76eb042a5e70b64b963501e9d68c25d4f0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11101
I'd like to invert the dependency between Tensor and TensorOptions
(such that Tensor includes TensorOptions); to do this, I'd prefer
there to not be a Tensor constructor. Eventually, all references
of Tensor will disappear from TensorOptions.h
Reviewed By: cpuhrsch
Differential Revision: D9585627
fbshipit-source-id: dd4a28b2c06b1e55f629762915f03c2b6c34d840
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11096
To discourage willy-nilly use, and make it clearer that it
is not a Variable
Reviewed By: cpuhrsch
Differential Revision: D9583699
fbshipit-source-id: 4fbde0c01ae3deb2c7ef8c125a9028f089b203ae
Summary:
This is along the way of removing Tensor as a member of the tagged union in Scalar. This simplifies ordering dependencies, because currently Scalar and Tensor both depend on each other (so we introduce a TensorBase). Also, this API isn't particularly useful publicly: we can't autograd through Scalars, so you still need a Tensor overload basically everywhere anyway.
I'm undecided what the final API should be here. We could keep a Tensor constructor on Scalar, but have it generate a local scalar; this is convenient but given this API used to be non-synchronizing, it may not be the best.
For now, I'm just using _local_scalar, which is clear, although we should get rid of the prefix _ if that's the API we intend to promote.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10852
Reviewed By: ezyang
Differential Revision: D9496766
Pulled By: gchanan
fbshipit-source-id: 16f39b57536b9707132a5a4d915650c381bb57db
Summary:
apaszke recently ported RNNs from Python into ATen, which means we can replace our implementation in the C++ API (written by ebetica) with the ATen implementation, which cleans up a lot of code (+99, -323). Thanks apaszke!
I also added the `bidirectional` and `batch_first` options to the C++ API RNN options, just because why not.
apaszke ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10761
Differential Revision: D9443885
Pulled By: goldsborough
fbshipit-source-id: b6ef7566b9ced2b2f0b2e1f46c295b6f250c65a8
Summary:
```
Use intrusive_ptr in Storage; replace unique_ptr<Storage> with Storage
This patch does two major changes:
- It replaces the use of Retainable in Storage with a new implementation
based on intrusive_ptr. This will be necessary because Caffe2 will
be using this class to implement intrusive_ptrs, and we need to
line these up for the merge. One good thing about the new implementation is
that the default copy/move constructors/assignment operators and destructor
work automatically, instead of needing to be hardcoded into Storage/Tensor.
- It replaces all places where we returned std::unique_ptr<Storage> with
Storage, collapsing an unnecessary double indirection that is no longer
necessary now that we have correctly working copy/move constructors.
I didn't initially want to do step (2), but it was very important to
eliminate all bare uses of new Storage and new StorageImpl, and this making
the API change was the most straightforward way to do this.
HOW TO FIX YOUR CODE IN THE NEW API
- You no longer need to dereference the result of tensor.storage() to pass
it to set. So, instead of:
x.set_(*y.storage());
just write:
x.set_(y.storage());
- If you were accessing methods on StorageImpl via the pImpl() method, you
must use the dot operator to run pImpl(). Even better; just drop pImpl,
we now have method forwarding. So, instead of:
storage->pImpl()->data();
just do:
storage->data();
// storage.pImpl()->data() works too but is not as recommended
- storage->getDevice() is no more; instead use storage->device().index()
MISC CODE UPDATES
- retain, release, weak_retain, weak_release and weak_lock are now
reimplemented using the "blessed API", and renamed to make it
clearer that their use is discouraged.
- nvcc OS X and general OS X portability improvements to intrusive_ptr
- A new comment in intrusive_ptr describing how stack allocated
intrusive_ptr_targets work differently than heap allocated ones
from c10::make_intrusive
CAVEAT EMPTOR
- THStorage_weakRetain used to work on strong pointers, but it NO LONGER
works with intrusive_ptr. You must reclaim the strong pointer into a
real strong pointer, construct a weak pointer from it, and then release
the strong and weak pointers. See StorageSharing.cpp for an example.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10488
Reviewed By: gchanan
Differential Revision: D9306134
Pulled By: ezyang
fbshipit-source-id: 02d58ef62dab8e4da6131e1a24834a65c21048e2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10478
- Removed Backend constructor from Device, and fixed all
use-sites to use DeviceType::CPU instead of kCPU, or
use a new function backendToDeviceType to perform
the conversion.
- New method device_type() on Type; it gives you the
underlying device type, e.g., CPU for SparseCPU.
- We add backward compatibility for kCPU/kCUDA uses,
by introducing a new special type which is implicitly
convertible to both DeviceType and Backend. As long as
you don't define a function that's overloaded on both
DeviceType and Backend (but not on BackendOrDeviceType),
the implicit conversions will ensure that uses
of at::Device(at::kCPU) keep working. We fixed use-sites in
the library, but did NOT fix sites in the test code, so that
we can exercise this BC code.
Reviewed By: Yangqing
Differential Revision: D9301861
fbshipit-source-id: 9a9d88620500715c7b37e655b4fd761f6dd72716
Summary:
This PR removes the `using Tensor = autograd::Variable;` alias from `torch/tensor.h`, which means `torch::Tensor` is now `at::Tensor`. This PR fixes up some last uses of `.data()` and tidies up the resulting code. For example, I was able to remove `TensorListView` such that code like
```
auto loss = torch::stack(torch::TensorListView(policy_loss)).sum() +
torch::stack(torch::TensorListView(value_loss)).sum();
```
is now
```
auto loss = torch::stack(policy_loss).sum() + torch::stack(value_loss).sum();
```
CC jgehring
ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10516
Differential Revision: D9324691
Pulled By: goldsborough
fbshipit-source-id: a7c1cb779c9c829f89cea55f07ac539b00c78449
Summary:
After talking to users of the C++ API we found that having the tensor type be `autograd::Variable` causes more complications than having it be `at::Tensor`. It used to be a problem because `at::Tensor` didn't have the "autograd API" of variable (e.g. `detach()` or `grad()` methods), but those methods are now on `at::Tensor`. As such, we want to make a last big breaking change to have the tensor type be `at::Tensor`, while factory methods like `torch::ones` will return `Variable`s disguised as `at::Tensor`. This will make many things easier, like calling functions in ATen that take vectors of tensors.
This PR makes a small step in this direction by updating the optimizer classes to not use `.data()` on `Variable` to access the underlying `at::Tensor`. Using `.data()` is effectively a hack to work around our modification rules for tensors that require grad. The proper way of doing things is to use `with torch.no_grad` or equivalently `NoGradGuard` in C++ to guard in-place operations.
The next step can then simply redefine `torch::Tensor` to be `at::Tensor`. This transition should be smooth, since all methods available on `Variable` are at this point available on `at::Tensor`.
For this PR I:
1. Modified the implementations of optimizers to not use `.data()`. This means the implementations are now different from PyTorch, which still uses the legacy method of using `.data`.
2. To properly verify (1), I added more fine-grained test cases to our optimizer tests, e.g. `SGD` with and without `weight_decay`, then with `nesterov` etc. Generally more tests = more happy!
3. Minor cleanup of the optimizer codebase
ebetica apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10490
Differential Revision: D9318229
Pulled By: goldsborough
fbshipit-source-id: fb386700f37840542bc5d323f308ea88fe5ea5c5
Summary:
This PR provides 4 fixes / features:
1. torch::nn::Cloneable inherits virtually from torch::nn::Module. We want to pass around a module with new functions, and the best way to do this is to do a diamond inheritance pattern, i.e.
```c++
struct MySuperModuleImpl : virtual public torch::nn::Module {
virtual void myFunction() = 0;
}
struct MySuperModule : public torch::nn::Cloneable<MySuperModule>, MySuperModuleImple {};
struct MyModule : public MySuperModule<MyModule> {
void myFunction() override;
};
```
This way, we can simply pass around MySuperModuleImpl around instead of torch::nn::Module.
2. Optimizer options are public now, since there's no way to decay the LR or modify it during training otherwise
3. Serialization functions creates autograd history and calls copy_! Bad!
4. Optimizers did not create buffers after add_parameters was called.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9837
Reviewed By: goldsborough
Differential Revision: D9199746
Pulled By: ebetica
fbshipit-source-id: 76d6b22e589a42637b7cc0b5bcd3c6b6662fb299
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10130
Update some include paths to make them internally consistent
Reviewed By: ezyang
Differential Revision: D9119906
fbshipit-source-id: b44e5cab8e8e795ee18afe9ffc6caf1f2b413467
Summary:
ebetica made me aware that `nn::Module::clone()` always clones to the current device (usually CPU) instead of preserving the device of each parameter. This PR changes the signature of `clone` from
`shared_ptr<Module> clone()`
to
`shared_ptr<Module> clone(optional<Device> device = nullopt)`
with semantics of:
1. If a `device` is given, all parameters/buffers are moved to that device,
2. If no `device` is supplied (default), parameters/buffers retain their device.
ezyang apaszke ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9609
Differential Revision: D8957367
Pulled By: goldsborough
fbshipit-source-id: 0d409ae645ed2b8d97d6fc060240de2f3d4bc6c8
Summary:
I renamed the variable in the `Embedding` module from `weight` to `table` a few months ago, because it seemed like a more meaningful name. Turns out it's not such a good idea because it deviates from PyTorch, which unnecessarily breaks C++->Python translated code.
ebetica ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9720
Differential Revision: D8955647
Pulled By: goldsborough
fbshipit-source-id: 77228b07d2b733866e8cdecaa6d0686eef4cc3ea
Summary:
This PR adds the functional version of `DataParallel` (i.e. `data_parallel`) to the C++ frontend.
For this, I had to:
1. Add "differentiable" versions of scatter and gather, which perform their inverse operation in the backward pass, to C++. I've added them under `torch/csrc/autograd/functions/comm.{h,cpp}`. I had to move some utilities from `VariableType.cpp` into `torch/csrc/autograd/functions/utils.h`, and changed them a bit to fix the `const_cast`s for which there were `TODO`s,
2. Implement the `replicate`, `parallel_apply` and the combining `data_parallel` functions in C++.
`replicate` is implemented based on our existing `clone()` interface, along with the ability to set the current device via `at::OptionsGuard` (so nice).
`parallel_apply` is implemented using `at::parallel_for` (CC cpuhrsch) and [follows the code from PyTorch](https://github.com/pytorch/pytorch/blob/master/torch/nn/parallel/parallel_apply.py).
Added lots of tests for these things.
apaszke ezyang ebetica colesbury
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9234
Differential Revision: D8865182
Pulled By: goldsborough
fbshipit-source-id: 4f1fecf2b3f3bc1540c071dfb2d23dd45de433e4
Summary:
In our pimpl system, default constructing a module holder default constructs the contained module. This means `Linear linear;` is ill-formed, since `Linear` doesn't have a default constructor. Instead we require `Linear linear = nullptr;` to get the empty state of the `Linear`. This PR makes the error message for the ill-formed case nicer.
I had to change the forwarding constructors of most of our modules for this, but that's a minor adjustment.
E.g.
```
Linear linear;
In file included from /home/psag/pytorch/pytorch/torch/csrc/api/include/torch/nn/module.h:5:0,
from /home/psag/pytorch/pytorch/test/cpp/api/module.cpp:3:
/home/psag/pytorch/pytorch/torch/csrc/api/include/torch/nn/pimpl.h: In instantiation of ‘torch::nn::ModuleHolder<Contained>::ModuleHolder() [with Contained = torch::nn::LinearImpl]’:
/home/psag/pytorch/pytorch/torch/csrc/api/include/torch/nn/modules/dropout.h:45:1: required from here
/home/psag/pytorch/pytorch/torch/csrc/api/include/torch/nn/pimpl.h:46:5: error: static assertion failed: You are trying to default construct a module which has no default constructor. Use = nullptr to give it the empty state (like an empt
y std::shared_ptr).
static_assert(
```
ebetica ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9565
Differential Revision: D8903666
Pulled By: goldsborough
fbshipit-source-id: 5e6b788921a27a44359db89afdc2b057facc5cec
Summary:
THCStream was recently moved to ATen by mruberry: https://github.com/pytorch/pytorch/pull/8997. This PR now introduces a guard class that replaces `AutoStream` from `torch/csrc/` and also uses this new stream interface.
I had to extend the `CUDAStream` interface with unchecked calls, so that we can reset the stream without throwing an exception in the guard's destructor.
colesbury apaszke ezyang
Fixes https://github.com/pytorch/pytorch/issues/7800
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9277
Differential Revision: D8865183
Pulled By: goldsborough
fbshipit-source-id: 67c9bc09629d92fa5660286b5eec08fde9108cd7
Summary:
ebetica asked for a way to add parameters to `Optimizer`s after they are created.
ebetica ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9472
Differential Revision: D8872176
Pulled By: goldsborough
fbshipit-source-id: 39a4032c519a6d3b458dd3596361b04afea10365
Summary:
I noticed that `Sequential::clone()` does not work. This is because `Sequential` does not use `reset()` which is normally where modules have to initialize and register its submodules. Further, this is because of the way `Sequential` allows its modules to be passed in the constructor, which doesn't work with `reset()` (since it does "late" initialization).
I've added some better error messages inside `Cloneable::clone()` which makes this kind of mistake clearer for other users, and tests for `Sequential::clone()`.
I also had to give `AnyModule` a deep `clone()` method.
ebetica ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9372
Differential Revision: D8865189
Pulled By: goldsborough
fbshipit-source-id: b81586e0d3157cd3c4265b19ac8dd87c5d8dcf94
Summary:
To allow our C++ customers to use our initialization methods as well, this PR moves some of the code from `torch.nn.init` to ATen, calls it from Python, and adds equivalent code to the C++ frontend.
Notes:
1. Happy to hear thoughts on whether it's ok to have e.g. `torch.nn.init.dirac_` *and* `torch.dirac_` (the former has a `no_grad` guard). We have this for `ones_` and stuff too, so I don't mind it.
2. I left the exception checking in Python because they throw `ValueError`s while ATen errors show as `RuntimeError`s. I imagine this would break users' error handling if someone were to have a `try`-`except` handler for `ValueError` (or maybe it's a far fetch)
EDIT: After discussions with zdevito, the PR now simply duplicates the code in C++ exclusively for the C++ API, and we leave the Python code as-is (to make it easier for people to read/modify).
ebetica ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9295
Differential Revision: D8813793
Pulled By: goldsborough
fbshipit-source-id: 4b969f3f75952c1be4e837e19e23b8098e5fbd4b
Summary:
In the C++ API, `Sequential` currently was not refcounted itself, but stored `shared_ptr<AnyModule>` to get the reference semantics. This is unfortunate because most modules in the API are accessed via `->`, e.g. `Linear l(1, 2); l->forward(...);`. `Sequential` was different in that it had value semantics itself, thus was accessed via `.`.
This PR makes `Sequential` store `AnyModule` (without extra indirection), and uses the same pImpl mechanism we use for all other modules to make `Sequential` have reference semantics itself. This makes it consistent with the rest of the library. It also removes one level of indirection inside of `Sequential`, which is cool.
One thing I had to change was that the `ModuleHolder` with which the whole pImpl thing is implemented previously did some tricks to make `Linear(3, 4)` actually construct `Linear(LinearOptions(3, 4))`. This doesn't work well with `Sequential` since it takes a variadic parameter pack. Instead, I made `ModuleHolder` forward all arguments to the underlying module, and then further pushed the trick to forward parameters to modules' options types into the actual Modules. This adds one constructor per Module in the library. This is not something user modules have to do (unless they want this nice forwarding themselves). It makes the code simpler overall.
ezyang ebetica apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9151
Reviewed By: ezyang
Differential Revision: D8809298
Pulled By: goldsborough
fbshipit-source-id: da68452c3de912fbc67af330ba93b5220de6909f
Summary:
Added a way to `dynamic_cast` an `nn::Module` and get a pointer to it. `nn::Module::is<T>` just checked if the return value of the `dynamic_cast` was nullptr, so I got rid of `is<T>` since it's equivalent to `as<T> != nullptr`(or just `as<T>` due to boolean conversion).
We're now at
```
if (auto* conv = module.as<nn::Conv2d>()) {
conv->weight.data().normal_(0.0, 0.02);
} else if (auto* bn = module.as<nn::BatchNorm>()) {
bn->weight.data().normal_(1.0, 0.02);
bn->bias.data().fill_(0);
}
```
ezyang apaszke ebetica
Closes https://github.com/pytorch/pytorch/pull/9149
Differential Revision: D8735954
Pulled By: goldsborough
fbshipit-source-id: e2b8f6f0cea16a621f8bc0807a33cc7651d25154
Summary:
There is no way to concatenate two `Sequential`s in Python, but it's also easier to do in an immutable fashion by just writing `Sequential(first.modules() + second.modules())`. Concatenating vectors isn't as easy in C++, so I think it's fair to save users some for loops by giving them `Sequential::extend()`.
apaszke ebetica ezyang
CC jamespinkerton
Closes https://github.com/pytorch/pytorch/pull/9116
Reviewed By: ezyang
Differential Revision: D8719630
Pulled By: goldsborough
fbshipit-source-id: 840d7ac70755350e6202b493c531e30ecbb6546f
Summary:
The goal of this PR was to add support for dropout descriptors in the C++ API's RNN class.
The end result is a 4x-5x speedup for our RNN integration tests since they can now use cuDNN instead of autograd when dropout is set.
To achieve this, I had to move `_cudnn_init_dropout_state` to the `TensorOptions` API.
I also fixed a bug around `RNN::cuda()` not flattening parameters for cuDNN.
ebetica ezyang
Closes https://github.com/pytorch/pytorch/pull/9012
Reviewed By: pjh5
Differential Revision: D8689786
Pulled By: goldsborough
fbshipit-source-id: 44fb191f5a38e41c4ded5417306b5bbc012cd56c
Summary:
When initializing weights for my C++ model, I had to write
```cpp
void initialize_weights(nn::Module& module) {
if (module.name().find("Conv2d") != std::string::npos) {
module.parameters()["weight"].data().normal_(0.0, 0.02);
} else if (module.name().find("BatchNorm") != std::string::npos) {
auto parameters = module.parameters();
parameters["weight"].data().normal_(1.0, 0.02);
parameters["bias"].data().fill_(0);
}
}
```
The string-based module determination is not very nice, and not very C++-y. So I created `nn::Module::is<T>` which does a `dynamic_cast` inside. It also handles the `ModuleHolder` vs. `Module` distinction.
It now becomes
```cpp
if (module.is<nn::Conv2d>()) {
module.parameters()["weight"].data().normal_(0.0, 0.02);
} else if (module.is<nn::BatchNorm>()) {
auto parameters = module.parameters();
parameters["weight"].data().normal_(1.0, 0.02);
parameters["bias"].data().fill_(0);
}
```
ebetica ezyang apaszke
Closes https://github.com/pytorch/pytorch/pull/8970
Differential Revision: D8677476
Pulled By: goldsborough
fbshipit-source-id: 053294e19b6a58cce868167596c89639f7de91c2
Summary:
Operations on `Variable`s (or `torch::Tensor`) usually return `at::Tensor`. This is usually fine, but the `AnyModule` used in the implementation of `torch::Sequential` is very picky about types, and does not understand implicit conversions like this. This means that `sequential.forward(at_tensor_that_is_actually_a_variable)` will fail unless you wrap `at_tensor_that_is_actually_a_variable` with `torch::Tensor`.
This PR adds a special case to `AnyModule` that will convert an `at::Tensor` to `torch::Tensor` when the tensor is really a variable, and else just pass the `at::Tensor`. This is a nice little usability improvement for the often-used `Sequential` class.
ebetica ezyang
Closes https://github.com/pytorch/pytorch/pull/8968
Reviewed By: ezyang
Differential Revision: D8670407
Pulled By: goldsborough
fbshipit-source-id: 3635ed6ed28238f3900ce4a876d07f1b11713831
Summary:
Sets the random seed at the start of C++ tests so that everything is super deterministic.
I made sure we only generate random values from torch instead of `std::`, so that this seed always applies. I.e. I do:
```
torch::randint(2, {2}, at::kInt64)
```
instead of
```
std::rand() % 2
```
Also got rid of the tests that test the random seeding, since it would interfere here. And the test is not useful since we just use ATen's seeding mechanism, which should work.
Fixes #7288#7286#7289
ebetica ezyang
Closes https://github.com/pytorch/pytorch/pull/8903
Differential Revision: D8667269
Pulled By: goldsborough
fbshipit-source-id: a833e86e156d5e68dae8c53a4b1c433cb0608b6c
Summary:
This PR is the final step to making `torch::` the only namespace users of the C++ API ever see. Basically, I did:
``` cpp
namespace torch {
using namespace at;
}
```
And then changed `torch::` to `at::` almost everywhere. This worked surprisingly well out of the box. So users can now write `torch::relu` and `torch::log_softmax` and `torch::conv2d` instead of having to know when to use `at::` and when `torch::`. This is happy!
Another thing I did was to have `using Dtype = at::ScalarType`, which will be the eventual name anyway.
ebetica ezyang apaszke zdevito
Closes https://github.com/pytorch/pytorch/pull/8911
Reviewed By: ezyang
Differential Revision: D8668230
Pulled By: goldsborough
fbshipit-source-id: a72ccb70fca763c396c4b0997d3c4767c8cf4fd3
* Better forward methods in C++ API
capitalize error message in test_torch.test_flatten
Support for operator()
* Add operator() to Functional
* Get rid of SigmoidLinear
* Add BoundFunction to FunctionalImpl
* Remove macro from conv because it makes errors more nasty
* Rework optim folder
* Removed TORCH_OPTIMIZER_CLASS macro
* Got rid of CRTP/Impl
* Removed TORCH_AUTOGRAD_KWARG
* Differentiate between Optimizer and LossClosureOptimizer
* Make Optimizers parameters based instead of model based
* Allow construction of optimizer from arbitrary vector
* Added test for zero grad
* Added test for external parameter vectors
* Now comparing against baseline values
* Documentation
* Post rebase fixes
* Different strategy for creating and accessing buffers in optimizers
* Fix member ordering
* Bag of fixes
* Rename tensor_range.h to tensor_list_view.h
* Post rebase fixes
* Rename torch::tensor namespace to torch::tensors due to name conflict
* Avoid recursion in Module::to
* Created DefaultTensorOptions
* Fix TensorOptions() call which was interpreted as function decl
* Fix empty OptionsGuard
* Make options_ and mutex_ in DefaultTensorOptions class static because of dynamic linker issues
* Make DefaultOptions thread local
* Created TORCH_MODULE macro
Rewrote Linear
Rewrote Dropout and added default constructor to TORCH_MODULE macro
Turned TORCH_MODULE contens into a proper base class
Added some documentation
Got rid of the old Dropout module
Got rid of the old Embedding module
Got rid of the old BatchNorm module
Got rid of the old Conv module
Fixing optimizers
Rebase
Removed old RNN modules and the TORCH_ATTR macro
Removed temporary P:: namespace
Added cloning behavior to all modules
Got rid of some get() calls
self review nits
Remove noexcept from ModuleHolder methods that can throw
Remove spaces
Add missing override to reset() methods
Added examples to documentation in pimpl.h
* Post rebase fixes
* Created TensorOptions
Storing the type in TensorOptions to solve the Variable problem
Created convenience creation functions for TensorOptions and added tests
Converted zeros to TensorOptions
Converted rand to TensorOptions
Fix codegen for TensorOptions and multiple arguments
Put TensorOptions convenience functions into torch namespace too
All factory functions except *_like support TensorOptions
Integrated with recent JIT changes
Support *_like functions
Fix in place modification
Some cleanups and fixes
Support sparse_coo_tensor
Fix bug in Type.cpp
Fix .empty calls in C++ API
Fix bug in Type.cpp
Trying to fix device placement
Make AutoGPU CPU compatible
Remove some auto_gpu.h uses
Fixing some headers
Fix some remaining CUDA/AutoGPU issues
Fix some AutoGPU uses
Fixes to dispatch_tensor_conversion
Reset version of new variables to zero
Implemented parsing device strings
Random fixes to tests
Self review cleanups
flake8
Undo changes to variable.{h,cpp} because they fail on gcc7.2
Add [cuda] tag to tensor_options_cuda.cpp
Move AutoGPU::set_index_from into .cpp file because Windows is stupid and sucks
Fix linker error in AutoGPU.cpp
Fix bad merge conflict in native_functions.yaml
Fixed caffe2/contrib/aten
Fix new window functions added to TensorFactories.cpp
* Removed torch::TensorOptions
Added code to generate wrapper functions for factory methods
Add implicit constructor from Backend to TensorOptions
Remove Var() from C++ API and use torch:: functions
Use torch:: functions more subtly in C++ API
Make AutoGPU::set_device more exception safe
Check status directly in DynamicCUDAHooksInterface
Rename AutoGPU to DeviceGuard
Removed set_requires_grad from python_variables.h and warn appropriately in Variable::set_requires_grad
remove python_default_init: self.type()
Add back original factory functions, but with deprecation warnings
Disable DeviceGuard for a couple functions in ATen
Remove print statement
Fix DeviceGuard construction from undefined tensor
Fixing CUDA device compiler issues
Moved as many methods as possible into header files
Dont generate python functions for deprecated factories
Remove merge conflict artefact
Fix tensor_options_cuda.cpp
Fix set_requires_grad not being checked
Fix tensor_new.h
TEMPORARILY put some methods in .cpp files to see if it solves issues on windows and mac
Fix bug in DeviceGuard.h
Missing includes
TEMPORARILY moving a few more methods into .cpp to see if it fixes windows
Fixing linker errors
* Fix up SummaryOps to use new factories
Undo device agnostic behavior of DeviceGuard
Use -1 instead of optional for default device index
Also move DeviceGuard methods into header
Fixes around device index after optional -> int32_t switch
Fix use of DeviceGuard in new_with_tensor_copy
Fix tensor_options.cpp
* Fix Type::copy(
* Remove test_non_float_params from ONNX tests
* Set requires_grad=False in ONNX tests that use ints
* Put layout/dtype/device on Tensor
* Post merge fixes
* Change behavior of DeviceGuard to match AutoGPU
* Fix C++ API integration tests
* Fix flip functions
* Add backward() to Tensor and Variable
* Add at:: in front of Tensor
* Trying to not move optional to appease windows?
* Move implementation into cpp file
* Undo some formatting changes
* Implemented fused builder based construction mechanism
* "weights" -> "weight"
* Use int64_t instead of size_t everywhere in RNN
* Extracted Conv::ExpandingSize into its own thing
* Rename TORCH_PARAMETER to TORCH_ATTR
* Added documentation
* Fix weight names in batchnorm module
* Adding LBFGS to cpp API
* Adding stop conditions
* Test cases now passing and adding closure to all algs
* Addressing code review
* Set seeds to make optim tests more deterministic
* Add name() to C++ modules
* Use RTTI to get module name by default
* Add functional.cpp to CMakeLists.txt
* Call typeid() inside name() instead of constructor
* Add tests and use default constructor
* Rename autograd namespace to torch and change torch.h into python.h
* Pave the way for torch::nn::Module
* Reorganize module code structure
* Undo ONNX update
* Remove sleef submodule
* Rename autograd namespace to torch and change torch.h into python.h
* Include torch.h instead of python.h in test/cpp/api
* Change some mentions of torch.h to python.h in C++ extensions
* Set paths directly, without find_path
* Dump autogradpp into PyTorch
* Fixed up CMake for autogradpp/C++ API
* Made cereal a submodule
* Change search location of autogradpps mnist directory
* Add test_api to CI
* Download MNIST from the internet instead of storing in repo
* Fix warnings