Summary:
The block and thread extent calculations in `cuda_codegen` should use `int64_t` instead of `int`. The updated test, `test_dynamic_shapes`, fails without this change.
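A minimal sketch of why the extent type matters (hypothetical helper, not the actual `cuda_codegen` code): with dynamic shapes, a launch extent accumulated in a 32-bit `int` can silently overflow once the element count exceeds `INT_MAX`.
```
#include <cstdint>
#include <vector>

// Accumulate the launch extent in int64_t, not int: with dynamic shapes
// the product of dimension sizes can exceed INT_MAX and silently wrap.
int64_t flattened_extent(const std::vector<int64_t>& dims) {
  int64_t extent = 1;
  for (int64_t d : dims) {
    extent *= d;
  }
  return extent;
}
```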
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71428
Reviewed By: samdow
Differential Revision: D33640374
Pulled By: navahgar
fbshipit-source-id: 64c340ad2a9a1fa1fe066cf1c5dfc3b546b7be6d
(cherry picked from commit 6ea546ce11)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70613
This refactors `at::detail::empty_cpu` to use only `TensorBase` so you
can construct tensors without including `Tensor.h`. It also adds a
`TensorOptions` overload to reduce friction for operators migrating from
the `at::empty` API.
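A hedged sketch of the kind of call site this enables (header paths and the exact overload set are assumptions, not verbatim from this diff):
```
#include <ATen/core/TensorBase.h>
#include <ATen/EmptyTensor.h>

// Construct a tensor while depending only on TensorBase, without
// pulling in the full Tensor.h header (signature assumed).
at::TensorBase make_scratch_buffer() {
  return at::detail::empty_cpu({2, 3}, at::ScalarType::Float);
}
```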
Test Plan: Imported from OSS
Reviewed By: samdow
Differential Revision: D33623682
Pulled By: ngimel
fbshipit-source-id: 7a7b08bc2ed06830a3d698197a0c8389a096dc1d
(cherry picked from commit 2e17ad0bbd)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71291
This commit removes torch::lazy::convertShapes since it's no longer used.
In addition, it replaces the numel logic within LTCTensorImpl.
Test Plan:
./build/bin/test_lazy
CI in lazy_tensor_staging branch
Reviewed By: wconstab
Differential Revision: D33575084
Pulled By: alanwaketan
fbshipit-source-id: b104ef39fd552822e1f4069eab2cb942d48423a6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71275
Currently it takes more than 10 minutes to run the conformance test. Instead, we should use a parametrized test to shard it into segments that can run in parallel; an illustrative sketch follows.
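An illustrative gtest sketch of the sharding approach (fixture name and case count are hypothetical, not the actual `LiteInterpreterDynamicTypeTestFixture` implementation):
```
#include <gtest/gtest.h>
#include <cstddef>

constexpr size_t kNumCases = 1000;  // hypothetical total case count
constexpr size_t kNumShards = 10;

class ConformanceShard : public ::testing::TestWithParam<size_t> {};

// Each instance covers a strided slice of the case space, so the ten
// shards can run in parallel instead of as one long serial test.
TEST_P(ConformanceShard, Conformance) {
  for (size_t i = GetParam(); i < kNumCases; i += kNumShards) {
    // run conformance case i here
  }
}

INSTANTIATE_TEST_SUITE_P(
    LiteInterpreterDynamicType,
    ConformanceShard,
    ::testing::Range(size_t{0}, kNumShards));
```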
ghstack-source-id: 146990608
Test Plan:
```
[zhxchen17@devbig560.ftw3 /data/users/zhxchen17/fbsource/fbcode] buck test mode/dev-tsan //caffe2/test/cpp/jit:jit -- -r 'LiteInterpreterDynamicTypeTestFixture'
Building... 34.9 sec (99%) 12110/12111 jobs, 0/12111 updated
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: ebea52b3-7c7f-46be-9f69-18e2e7b040cc
Trace available for this run at /tmp/tpx-20220113-113635.717778/trace.log
RemoteExecution session id: reSessionID-ebea52b3-7c7f-46be-9f69-18e2e7b040cc-tpx
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/4222124735827748
✓ ListingSuccess: caffe2/test/cpp/jit:jit : 431 tests discovered (11.173)
✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/0 (51.331)
✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/1 (65.614)
✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/3 (76.875)
✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/5 (77.271)
✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/4 (78.871)
✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/6 (78.984)
✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/7 (84.068)
✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/2 (85.198)
✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/8 (88.815)
✓ Pass: caffe2/test/cpp/jit:jit - Conformance/LiteInterpreterDynamicTypeTestFixture.Conformance/9 (90.332)
Summary
Pass: 10
ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/4222124735827748
```
Reviewed By: qihqi
Differential Revision: D33570442
fbshipit-source-id: 5c49e03b0f88068d444c84b4adeaaf45433ce1fa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69853
We can implement this overload more efficiently.
ghstack-source-id: 146924693
Test Plan:
patched alias_analysis tests
The time reported by Static Runtime to initialize a predictor, given the ctr_mobile_feed local_ro net, is 9.5s instead of 10.5s.
Reviewed By: mikeiovine
Differential Revision: D33039731
fbshipit-source-id: 52559d678e9eb00e335b9e0db304e7a5840ea397
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70201
Included functions:
save_mobile_module -> saves a mobile::Module to a flatbuffer
load_mobile_module_from_file -> loads a flatbuffer into a mobile::Module
parse_mobile_module -> parses a mobile::Module from raw bytes or a deserialized flatbuffer module object
Compared to previous attempts, this diff only adds flatbuffer to the cmake target and leaves the fbcode/xplat ones unchanged.
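A hedged round-trip sketch using the three functions listed above (header path and exact signatures are assumptions):
```
#include <torch/csrc/jit/mobile/module.h>

// Sketch: save a mobile::Module to a flatbuffer file and load it back.
void round_trip(const torch::jit::mobile::Module& m) {
  save_mobile_module(m, "model.ff");                       // Module -> flatbuffer
  auto loaded = load_mobile_module_from_file("model.ff");  // flatbuffer -> Module
  // parse_mobile_module would instead take already-loaded bytes
  (void)loaded;
}
```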
Test Plan: unittest
Reviewed By: malfet, gmagogsfm
Differential Revision: D33239362
fbshipit-source-id: b9ca36b83d6af2d78cc50b9eb9e2a6fa7fce0763
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67781
Update `LLVMCodeGen` in NNC to use the given kernel function name while emitting code.
This was earlier committed as D31445799 (c30dc52739) and got reverted as part of a stack of diffs that included a cache for `PyTorchLLVMJIT`, which was the likely culprit.
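A hedged sketch of what the change enables (header path and constructor argument order are assumptions; the real `LLVMCodeGen` interface may differ):
```
#include <torch/csrc/jit/tensorexpr/llvm_codegen.h>
#include <memory>
#include <vector>

using namespace torch::jit::tensorexpr;

// Sketch: pass an explicit kernel function name so the emitted LLVM
// function gets a meaningful symbol instead of a generic default.
std::unique_ptr<LLVMCodeGen> make_codegen(
    StmtPtr stmt, const std::vector<CodeGen::BufferArg>& args) {
  return std::make_unique<LLVMCodeGen>(
      stmt, args, at::kCPU, "fused_mul_add_kernel");
}
```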
Test Plan:
```
buck test mode/opt //caffe2/test/cpp/tensorexpr:tensorexpr -- --exact 'caffe2/test/cpp/tensorexpr:tensorexpr - LLVM.CodeGenKernelFuncName'
```
Reviewed By: ZolotukhinM, bdhirsh
Differential Revision: D32145958
fbshipit-source-id: 5f4e0400c4fa7cabce5b91e6de2a294fa0cad88e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69567
This exposes torch.monitor events and stats to Python via pybind11 bindings over the underlying C++ implementation.
* The registration interface is a tad different since it takes a lambda function in Python, whereas in C++ it's a full class.
* This makes a small number of changes to the counter interfaces: since there's no way to create an initializer list at runtime, they now also take a vector.
* Only double-based stats are provided in Python since the interface is intended for high-level stats where float imprecision shouldn't be an issue. This can be changed down the line if the need arises.
```
from datetime import datetime
from torch.monitor import Event, log_event, register_event_handler

events = []

def handler(event):
    events.append(event)

handle = register_event_handler(handler)
# note: later in this diff `type` becomes `name` and `metadata` becomes `data`
log_event(Event(type="torch.monitor.TestEvent", timestamp=datetime.now(), metadata={"foo": 1.0}))
```
D32969391 is now included in this diff. This cleans up the naming for events: `type` is now `name`, `message` is gone, and `metadata` is renamed to `data`.
Test Plan: buck test //caffe2/test:monitor //caffe2/test/cpp/monitor:monitor
Reviewed By: kiukchung
Differential Revision: D32924141
fbshipit-source-id: 563304c2e3261a4754e40cca39fc64c5a04b43e8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70464
Add handling of strided input tensors to dynamic fusion. This is done with the same set of input striding specializations as https://github.com/pytorch/pytorch/pull/60684/:
```
S_ONE, // STRIDE_ONE: packed
S_CONT, // STRIDE_CONTIGUOUS: stride[i + 1] * sizes[i + 1]
S_TRAN_CONT, // STRIDE_TRANSPOSED_CONTIGUOUS: stride[i-1] * sizes[i-1]
S_AS_ARG, // STRIDE_AS_ARG: stride passed in as runtime value
```
and then two additional specializations for a) contiguous tensors and b) channels-last tensors. Channels-last is a common case and we should optimize for it. Additionally, tensors natively store whether they are contiguous/channels-last contiguous, which makes it faster to check whether tensors follow this pattern.
Output striding will be done in a follow up.
The striding is stored on both the TensorGroup node and on the guard node. The striding descriptors are stored as a vector of strings on the node for debuggability and to make use of storing IValues as attributes on nodes.
As an example:
```
%8 : Double(10, 11, 12, 13, strides=[1716, 1, 143, 11], requires_grad=0, device=cpu) = prim::TensorExprGroup_0[symbolic_shape_inputs=[-37, -36, -35, -34], striding_inputs_desc=[["TENSOR_CONT_CHANNELS_LAST"]]](%x, %24, %23, %22, %21)
```
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D33458649
Pulled By: eellison
fbshipit-source-id: c42616d3c683d70f6258180d23d3841a31a6030d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71144
This wasn't being used anywhere. It was originally intended for the SR flow but we're doing something else now.
Test Plan: Imported from OSS
Reviewed By: navahgar, ZolotukhinM
Differential Revision: D33521061
Pulled By: eellison
fbshipit-source-id: 0574698a2b7409df6feb703f81e806d886225307
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70509
TypeFactory will construct DynamicType when building on Edge platforms. We use this facility to make FunctionSchema return DynamicType all the time for OptionalType. We don't explicitly use DynamicTypeFactory everywhere because that would require too many changes and split the entire ATen codebase.
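A hedged illustration of the factory mechanism described above (the alias names and `create` call are assumptions based on this description, not verbatim from the diff):
```
#include <ATen/core/jit_type.h>

// Sketch: one factory alias switches every call site between the two
// Type families, so code stays written against a single TypeFactory name.
#ifdef C10_MOBILE
using TypeFactory = c10::DynamicTypeFactory;  // Edge builds: DynamicType
#else
using TypeFactory = c10::DefaultTypeFactory;  // server builds: full JIT types
#endif

c10::TypePtr make_optional_int() {
  // FunctionSchema can return Optional types through the same call:
  return TypeFactory::create<c10::OptionalType>(c10::IntType::get());
}
```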
ghstack-source-id: 146818621
Test Plan: CI
Reviewed By: iseeyuan
Differential Revision: D33306737
fbshipit-source-id: d7ce00b438f7c03b43945d578280cfd254b1f634
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70233
Make the type parser produce DynamicType for all base types that don't have type arguments, and return a DynamicType pointer from IValue::type().
ghstack-source-id: 146818622
Test Plan: no behavior change.
Reviewed By: iseeyuan
Differential Revision: D33137219
fbshipit-source-id: 1612c924f5619261ebb21359936309b41b2754f5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70212
Use DynamicType instead of ListType all over the place in Lite Interpreter. Namely, we need to modify the following places:
1. Type parser which produces the Type constants.
2. IValue::type() which returns reflected Type from IValues.
3. Helper functions to construct the container value.
4. Typechecks which test whether a type instance is a particular container type.
ghstack-source-id: 146818619
Test Plan: CI
Reviewed By: iseeyuan
Differential Revision: D33176931
fbshipit-source-id: 9144787f5fc4778538e5c665946974eb6171a2e6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70202
Use DynamicType instead of DictType all over the place in Lite Interpreter. Namely, we need to modify the following places:
1. Type parser which produces the Type constants.
2. IValue::type() which returns reflected Type from IValues.
3. Helper functions to construct the container value.
4. Typechecks which test whether a type instance is a particular container type.
ghstack-source-id: 146735648
Test Plan: no behavior change.
Reviewed By: iseeyuan
Differential Revision: D33137257
fbshipit-source-id: 971bf431658c422ea9353cc32cdab66e98876e9d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70871
We had previously handled reusing memory in the optimized kernel execution path, but had not yet handled it when we hit the unoptimized fallback.
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D33458652
Pulled By: eellison
fbshipit-source-id: 4eb62181ed02c95813a99638f5e2d0f9347b5c08
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70338
Today Unpickler is used by both server and mobile for deserializing models, and it always falls back to the mobile parser when the user provides no type resolver. However, this is not intended, as the server and mobile type parsers support different things. In this diff we provide a default fallback using the script parser and opt all mobile cases out of it.
ghstack-source-id: 146727330
Test Plan: CI
Reviewed By: iseeyuan
Differential Revision: D33284352
fbshipit-source-id: 997c4f110b36eee6596e8f23f6a87bf91a4197ed
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68137
A small step toward replacing existing OptionalType usage with DynamicType in the Edge runtime.
ghstack-source-id: 146670520
Test Plan: CI
Reviewed By: iseeyuan
Differential Revision: D32264617
fbshipit-source-id: 62d3ffad40901842deac19ca2098ea5ca132e718
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69482
Add a test to enumerate a number of JIT type combinations and see if their subtyping behavior is preserved in the new DynamicType system.
ghstack-source-id: 146670526
Test Plan: buck test mode/opt //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.DynamicType'
Reviewed By: gmagogsfm
Differential Revision: D32891263
fbshipit-source-id: 728211b39778e93db011b69b0a4047df78a8fc5b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70535
This also fixes handling of inputs that happen to be outputs (they require a copy).
Test Plan: Imported from OSS
Reviewed By: pbelevich
Differential Revision: D33399116
Pulled By: ZolotukhinM
fbshipit-source-id: 9845838eb653b82ae47b527631b51893990d5319
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66515
These passes should not be used generally, as they change the API of the
model's forward method, but they help with experimenting with the model
and ironing out all the kinks before it can be compiled properly. In the
long run we should ideally provide a better way to enable such
experiments.
Differential Revision: D31590862
Test Plan: Imported from OSS
Reviewed By: navahgar
Pulled By: ZolotukhinM
fbshipit-source-id: 74ded34c6c871d4cafa29f43dc27c7e71daff8fc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70339
When a Python program is translated to TorchScript, the Python exception type is dropped. This makes users' lives hard when they need to categorize errors based on more than just the exception message.
Here we make the change so that when we raise a Python exception, we record the exception's fully qualified class name. Later, when the TorchScript is interpreted, a special exception CustomJITException is thrown. Users can get the Python class name from CustomJITException::getPythonClassName, as sketched below.
Note that this diff does not customize the mapping from C++ exceptions to Python exceptions. It's left to users to do whatever mapping they want.
The code under scripts/shunting is just my own experimental code. I can split it out if requested.
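A hedged sketch of the intended usage (CustomJITException and getPythonClassName are the names given in this summary; the surrounding API is illustrative):
```
#include <torch/script.h>
#include <string>
#include <vector>

// Sketch: categorize TorchScript errors by the recorded Python class
// name instead of parsing exception messages.
void run_and_classify(torch::jit::Module& module, std::vector<c10::IValue> inputs) {
  try {
    module.forward(std::move(inputs));
  } catch (const CustomJITException& e) {
    const std::string cls = e.getPythonClassName();
    if (cls == "builtins.ValueError") {
      // route to a ValueError-specific handler
    }
  }
}
```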
ghstack-source-id: 146221879
Test Plan: buck test mode/opt //caffe2/test:jit
Reviewed By: gmagogsfm
Differential Revision: D33282878
fbshipit-source-id: 910f67a764519f1053a48589d1a34df69001525d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70062
This commit upstreams LTCTensorImpl from the lazy_tensor_staging branch.
It inherits from c10::TensorImpl and thus manages the lifetime/storage
of LazyTensor.
Test Plan: ./build/bin/test_lazy --gtest_filter=LazyTensorImplTest.*
Reviewed By: desertfire
Differential Revision: D33171186
Pulled By: alanwaketan
fbshipit-source-id: 6af9f91cc7c7e997f120cb89a7bcd6785c03ace0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69477
This diff adds a new run method to `TensorExprKernel` that takes
preallocated output tensors as arguments and stores the results in them; see the sketch below.
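A hedged usage sketch (method name per the test plan below; the argument shape is an assumption, and the real signature may differ):
```
#include <torch/csrc/jit/tensorexpr/kernel.h>
#include <vector>

// Hypothetical call shape: the caller allocates outputs and the kernel
// stores results directly in them, avoiding per-call allocation.
void run_into(torch::jit::tensorexpr::TensorExprKernel& kernel,
              const std::vector<at::Tensor>& inputs,
              std::vector<at::Tensor>& outputs) {
  kernel.runWithAllocatedOutputs(inputs, outputs);  // signature assumed
}
```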
ghstack-source-id: 146107009
Test Plan: buck test mode/dev-nosan //caffe2/test/cpp/tensorexpr:tensorexpr -- --exact 'caffe2/test/cpp/tensorexpr:tensorexpr - Kernel.RunWithAllocatedOutputs'
Reviewed By: ZolotukhinM
Differential Revision: D32823890
fbshipit-source-id: edc1f4839785124048b034060feb71cb8c1be34f