Summary:
The init-list form of `at::indexing::Slice` (i.e. `tensor.index({{1, None, 2}, ...})` instead of `tensor.index({Slice(1, None, 2), ...})`) in C++ API can be easily confused with the list-form indexing in Python API (e.g. `tensor[[1, 3, 2], ...]`), which is not good from readability perspective. This PR removes the init-list form of `at::indexing::Slice` to make the API less confusing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34255
Test Plan: Imported from GitHub, without a `Test Plan:` line.
Differential Revision: D20290166
Pulled By: yf225
fbshipit-source-id: abbcbeca0b179219e5e1f196a33ef8aec87ebb76
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34122
Earlier work added support for async rpc cases when RecordFunction's
end callbacks might be called in a different thread; in addition some
extra care was needed to handle pointer to parent function;
This PR makes RecordFunction aware of potentially multiple threads in
use, as well as removes unused parent() call and restricts current()
RecordFunction to scope-based record functions (RECORD_FUNCTION macro)
Test Plan: unit tests
Differential Revision: D20297709
Pulled By: ilia-cher
fbshipit-source-id: 46a59e1b2eea0bbd8a59630385e193b38d30f9d1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33294
1. Serialize bytecode of __setstate__ and run it when loading the model.
2. One use case is quantization. To test this use case a few operators are registered temporarily for lite interpreter. The "_" prefix registration will be removed when the operators are all migrated to mobile.
Test Plan: Imported from OSS
Differential Revision: D20162898
Pulled By: iseeyuan
fbshipit-source-id: 7a3180807bf38fbce594d86993896861f12bb58c
Summary:
**Summary**
There is often a need to create a Tensor when writing IR by hand for JIT
optimisation pass unit tests. The only options for this today are real
Tensor creation functions like `aten::ones`. Any test that uses these functions
must also use the same default arguments as the Python/C++ API, which means
that all of the tests have to be updated when the API is updated. This commit
introduces a new primitive, `prim::MakeTestTensor` with schema `() -> Tensor` that
should be used in unit tests instead of real Tensor creation functions. This new
primitive has no public-facing API, so the maintenance burden is much lower.
**Testing**
This commit updates the alias analysis and DCE tests to use `prim::MakeTestTensor` instead of
`aten::rand`, `aten::ones`, and `aten::zeros`.
```
$ ./bin/test_jit
CUDA not available. Disabling CUDA and MultiCUDA tests
Note: Google Test filter = *-*_CUDA:*_MultiCUDA
[==========] Running 75 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 75 tests from JitTest
[ RUN ] JitTest.ADFormulas
[ OK ] JitTest.ADFormulas (82 ms)
[ RUN ] JitTest.Attributes
[ OK ] JitTest.Attributes (0 ms)
...
...
...
[ RUN ] JitTest.LiteInterpreterPrim
[ OK ] JitTest.LiteInterpreterPrim (0 ms)
[ RUN ] JitTest.LiteInterpreterLoadOrigJit
[ OK ] JitTest.LiteInterpreterLoadOrigJit (2 ms)
[----------] 75 tests from JitTest (150 ms total)
[----------] Global test environment tear-down
[==========] 75 tests from 1 test case ran. (150 ms total)
[ PASSED ] 75 tests.
```
**Fixes**
This pull request fixes https://github.com/pytorch/pytorch/issues/33500.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33914
Differential Revision: D20150304
Pulled By: SplitInfinity
fbshipit-source-id: c88f5289055a02dc20b7a5dcdf87469f9816d020
Summary:
Stacked PRs
* #33474 - [jit] Remove list specializations from pickler
* **#33255 - [jit] Add type tags to lists/dicts in pickle**
This adds a global call to `torch.jit._pickle.restore_type_tags` for
lists and dicts so that we can preserve their types after serialization.
](https://our.intern.facebook.com/intern/diff/19868637/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33255
Pulled By: driazati
Reviewed By: xman1979, Tianshu-Bao
Differential Revision: D19868637
fbshipit-source-id: 2f1826e6679a786ca209198690269f399a542c04
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33834
This changes how we report Tracebacks to make them more clear when
there are both serialized and non-serialized ranges. It now looks like:
```
Traceback (most recent call last):
File "foo.py", line 25, in <module>
s2(a, b)
File "/scratch/zdevito/pytorch/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
File "code/__torch__.py", line 7, in forward
x: Tensor,
y: Tensor) -> Tensor:
return (self).bar(x, y, )
~~~~~~~~~ <--- HERE
def bar(self: __torch__.Moo,
x: Tensor,
File "code/__torch__.py", line 11, in bar
x: Tensor,
y: Tensor) -> Tensor:
_0 = (self).baz(x, y, )
~~~~~~~~~ <--- HERE
_1 = torch.ones([3], dtype=None, layout=None, device=None, pin_memory=None)
return torch.add(_0, _1, alpha=1)
File "code/__torch__.py", line 17, in baz
x: Tensor,
y: Tensor) -> Tensor:
return torch.add(x, y, alpha=1)
~~~~~~~~~ <--- HERE
Traceback of TorchScript, original code (most recent call last):
File "foo.py", line 11, in forward
def forward(self, x, y):
return self.bar(x, y)
~~~~~~~~ <--- HERE
File "foo.py", line 9, in bar
def bar(self, x, y):
return self.baz(x, y) + torch.ones(3)
~~~~~~~~ <--- HERE
File "foo.py", line 7, in baz
def baz(self, x, y):
return x + y
~~~~~ <--- HERE
RuntimeError: The size of tensor a (4) must match the size of tensor b (5) at non-singleton dimension 1
```
It follows Python convension of having the most important information last
and reading from the bottom up.
Changes:
* Moved the error message to the end, to copy Python
* Report original traceback separate from serialized traceback
* Make sure root functions have names in the interpreter trace.
Test Plan: Imported from OSS
Differential Revision: D20126136
Pulled By: zdevito
fbshipit-source-id: fd01f9985e5d74e04c4d064c02e8bc320f4fac13
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34035
Bug for the conditon check in https://github.com/pytorch/pytorch/pull/24342, realized we don't have tests in either
python or cpp to catch this, so added testes for both python and cpp.
Thanks hczhu on capturing it!
Test Plan: Imported from OSS
Differential Revision: D20198837
Pulled By: wanchaol
fbshipit-source-id: 33846a14c0a8e7aac2e8328189d10c38a0d7e6ee
Summary:
**Summary**
There is often a need to create a Tensor when writing IR by hand for JIT
optimisation pass unit tests. The only options for this today are real
Tensor creation functions like `aten::ones`. Any test that uses these functions
must also use the same default arguments as the Python/C++ API, which means
that all of the tests have to be updated when the API is updated. This commit
introduces a new primitive, `prim::MakeTestTensor` with schema `() -> Tensor` that
should be used in unit tests instead of real Tensor creation functions. This new
primitive has no public-facing API, so the maintenance burden is much lower.
**Testing**
This commit updates the alias analysis and DCE tests to use `prim::MakeTestTensor` instead of
`aten::rand`, `aten::ones`, and `aten::zeros`.
```
$ ./bin/test_jit
CUDA not available. Disabling CUDA and MultiCUDA tests
Note: Google Test filter = *-*_CUDA:*_MultiCUDA
[==========] Running 75 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 75 tests from JitTest
[ RUN ] JitTest.ADFormulas
[ OK ] JitTest.ADFormulas (82 ms)
[ RUN ] JitTest.Attributes
[ OK ] JitTest.Attributes (0 ms)
...
...
...
[ RUN ] JitTest.LiteInterpreterPrim
[ OK ] JitTest.LiteInterpreterPrim (0 ms)
[ RUN ] JitTest.LiteInterpreterLoadOrigJit
[ OK ] JitTest.LiteInterpreterLoadOrigJit (2 ms)
[----------] 75 tests from JitTest (150 ms total)
[----------] Global test environment tear-down
[==========] 75 tests from 1 test case ran. (150 ms total)
[ PASSED ] 75 tests.
```
**Fixes**
This pull request fixes https://github.com/pytorch/pytorch/issues/33500.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33595
Differential Revision: D20127441
Pulled By: SplitInfinity
fbshipit-source-id: 56da4f23ac46335227254f606c6481718108f378
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33711Fixed#33480
This makes `dist_autograd.backward` and `dist_optimizer.step` functional by making the user explicitly pass in the `context_id` as opposed to relying on the confusing thread_local context_id.
This diff incorporates these API changes and all places where these functions are called.
More concretely, this code:
```
with dist_autograd.context():
# Forward pass.
dist_autograd.backward([loss.sum()])
dist_optim.step()
```
should now be written as follows:
```
with dist_autograd.context() as context_id:
# Forward pass.
dist_autograd.backward(context_id, [loss.sum()])
dist_optim.step(context_id)
```
Test Plan: Ensuring all existing dist_autograd and dist_optimizer tests pass with the new API. Also added a new test case for input checking.
Differential Revision: D20011710
fbshipit-source-id: 216e12207934a2a79c7223332b97c558d89d4d65
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30426
This PR adds `assert_tensor_equal` and `assert_tensor_not_equal` to `test/cpp/api/support.h`, as better functions for testing whether two tensors are equal / not equal.
Test Plan: Imported from OSS
Differential Revision: D18695900
Pulled By: yf225
fbshipit-source-id: c19b9bc4c4e84d9f444015023649d27618fcbdf5
Summary:
Most of the function implementation and test code are translated from the Python version.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33652
Differential Revision: D20052211
Pulled By: yf225
fbshipit-source-id: ce6767db54364f91ef4f06674239a12278c2752a
Summary:
This PR adds the following items:
- **1st item**: `ArrayRef<TensorIndex>` and `std::initializer_list<TensorIndex>` overloads for `Tensor::index` and `Tensor::index_put_`, to be used specifically for multi-dim indexing purpose.
Design rationale:
* C++ `Tensor::index` and `Tensor::index_put_` are both existing tensor APIs, and they currently (before this PR) only accept a list of tensors (i.e. `ArrayRef<Tensor>`) as indices. If we change their signatures to also accept non-tensors as indices (i.e. `ArrayRef<TensorIndex>`, and `TensorIndex` is convertible from `Tensor` / `Slice` / `None` / `Ellipsis`), it would slow down the original code path (since now it has to go through more steps), which is undesirable.
To get around this problem, the proposed solution is to keep the original `ArrayRef<Tensor>` overload, and add `ArrayRef<TensorIndex>` and `std::initializer_list<TensorIndex>` overloads to `Tensor::index` and `Tensor::index_put_`. This way, the original code path won’t be affected, and the tensor multi-dim indexing API is only used when the user explicitly pass an `ArrayRef<TensorIndex>` or a braced-init-list of `TensorIndex`-convertible types to `Tensor::index` and `Tensor::index_put_` .
Note that the above proposed solution would still affect perf for the user’s original `Tensor::index` or `Tensor::index_put_` call sites that use a braced-init-list of tensors as input, e.g. `tensor.index({...})` or `tensor.index_put_({...}, value)`, since now such function calls would take the multi-dim indexing path instead of the original advanced indexing path. However, there are only two instances of this in our codebase (one in ATen cpp test, one in a C++ API nn init function), and they can be easily changed to explicitly use `ArrayRef<Tensor>` as input (I changed them in this PR). For external user’s code, since this is part of the C++ frontend which is still considered experimental, we will only talk about this change in the release note, and ask users to switch to using `ArrayRef<Tensor>` explicitly if they want to keep using the original advanced indexing code path.
- **2nd item**: Mechanisms for parsing `ArrayRef<TensorIndex>` indices and performing indexing operations (mirroring the functions in `torch/csrc/autograd/python_variable_indexing.cpp`).
- **3rd item**: Simple tests to demonstrate that the `Tensor::index()` and `Tensor::index_put_()` APIs work. I will add more tests after the first few PRs are reviewed.
- **4th item**: Merge Python/C++ indexing code paths, for code simplicity. I tested locally and found that there is no perf regression resulting from the merge. I will get more concrete numbers for common use cases when we settle on the overall design.
This PR supersedes https://github.com/pytorch/pytorch/pull/30425.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32841
Differential Revision: D19919692
Pulled By: yf225
fbshipit-source-id: 7467e64f97fc0e407624809dd183c95ea16b1482
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32791
When a registered operator has varags (ends with ... in its schema),
the interpreter now appends the number of arguments to the top of
the stack before invoking the operator. This allows the removal of more
uses of Node* in the interpreter.
This PR also then cleans up the constructors for Operator to make
it more likely someone chooses the correct one. After making these ops:
```
USES NODE: prim::TupleUnpack(...) -> (...)
USES NODE: prim::TupleSlice(...) -> (...)
USES NODE: prim::TupleConstruct(...) -> (...)
USES NODE: prim::ListUnpack(...) -> (...)
USES NODE: prim::ListConstruct(...) -> (...)
USES NODE: prim::DictConstruct(...) -> (...)
USES NODE: prim::Constant() -> (...)
USES NODE: prim::isinstance(...) -> (...)
USES NODE: prim::CreateObject(...) -> (...)
USES NODE: prim::fork(...) -> (...)
USES NODE: aten::warn(str message, *, int stacklevel=2) -> () # need stack level information, so ideally in interpreter so it can look at the stack
```
Into interpreter primitives, we can remove all but two constructors for operators:
one that is (schema_string, operation), and one that is (symbol, op_creator) for
the remaining weird primitives.
Test Plan: Imported from OSS
Differential Revision: D19673158
Pulled By: zdevito
fbshipit-source-id: 95442a001538a6f53c1db4a210f8557ef118de66
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33027
This PR allows default arguments in module's forward method to be skipped when module is used in `torch::nn::Sequential`, by introducing the `FORWARD_HAS_DEFAULT_ARGS` macro and requiring that all modules that have default arguments in its forward method must have a corresponding `FORWARD_HAS_DEFAULT_ARGS` macro call.
Fixes issue mentioned in https://github.com/pytorch/pytorch/issues/30931#issuecomment-564144468.
Test Plan: Imported from OSS
Differential Revision: D19777815
Pulled By: yf225
fbshipit-source-id: 73282fcf63377530063e0092a9d84b6c139d2e32
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33026
This PR contains necessary changes to prepare for https://github.com/pytorch/pytorch/pull/33027. It exposes the following classes to public:
1. `torch::nn::AnyValue`, because if the user has optional arguments in their module's forward method, they must also use the `FORWARD_HAS_DEFAULT_ARGS` macro and pass in the default values for those optional arguments wrapped by `torch::nn::AnyValue`.
2. `torch::nn::AnyModuleHolder`, because `torch::nn::Module` needs to declare it as a friend class for it to be able to access `torch::nn::Module`'s protected methods such as `_forward_has_default_args` / `_forward_num_required_args` / `_forward_populate_default_args`.
Test Plan: Imported from OSS
Differential Revision: D19777814
Pulled By: yf225
fbshipit-source-id: 1c9d5aa24f0689154752c426a83ee98f64c9d02f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33243
If a file does not exist in an archive, PyTorchStreamReader throws an exception. However, when PyTorchStreamReader is destructed another exception is thrown while processing the first exception. As a result of this double exception there is SIGABORT.
Thanks dreiss for catching this bug and suggesting the fix. It happened when he used _load_for_mobile to load a torch script file without bytecode session. A unittest is added to test this case.
Test Plan: Imported from OSS
Differential Revision: D19859205
Pulled By: iseeyuan
fbshipit-source-id: 8f96b6256f1a1f933fce1c256d64604c7e9269e4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33068
The version counter is already tracked if we use pytorch's functions but not if the user unpack the Tensor and modifies it by hand or with a third party library.
Test Plan: Imported from OSS
Differential Revision: D19791564
Pulled By: albanD
fbshipit-source-id: a73c0f73d8fd0c0e5bf838f14bed54fa66937840
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32506
In this PR, we've introduced a `retain_graph` parameter to distributed
autograd similar to `torch.autograd.backward`.
In terms of design, this parameter is sent over RPC to all nodes and is used to
create the GraphTask on the local nodes. This enables us to run
`dist_autograd.backward()` multiple times in the same context.
The use case currently for this is to benchmark only the backward pass for
distributed autograd. We'd like to measure the QPS for the backward pass and as
a result, running a single forward pass and multiple backward passes in a loop
is one way to benchmark backward pass performance.
ghstack-source-id: 97868900
Test Plan: waitforbuildbot
Differential Revision: D19521288
fbshipit-source-id: 7ad8521059fd400d7b5a6ab77ce56e1927ced90a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32326
Now that we have type-level granularity we can improve `mayContainAlias` queries. Each new values is initialized as containing the wildcard set of each contained mutable type. Whenever a value is added to a container it is set to the wildcard set. Now, to check if any two values contain overlapping values, we can just check if the `containedMemoryLocations` of two sets overlap.
Test Plan: Imported from OSS
Differential Revision: D19563262
Pulled By: eellison
fbshipit-source-id: c6d7489749c14b2054a6d50ef75baca699ada471
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32251
Previously wildcard sets were associated by TypeKind, meaning all Lists were in one alias set, all Classes were in one alias set, etc. We can improve analysis by bucketing wildcard sets by TypePtr instead. Any two mutable types which can unify should be in the same wildcard set bucket.
This also allows us do much simpler `mayContainAlias` analysis, and also improves `analyzeConservative` analysis because now we can recurse through all contained memory locations and mark writes, instead of just recursing only level deep in contained elements.
Test Plan: Imported from OSS
Differential Revision: D19563263
Pulled By: eellison
fbshipit-source-id: 371a37d1a8596abc6c53f41c09840b6c140ea362