Summary:
This diff is reverting D34455360 (61d6c43864)
D34455360 (61d6c43864) is making the following tests to fail and this revert diff is either the revert of the blame diff or the revert of the stack of diffs that need to be reverted to revert the blame diff
Tests affected:
- https://www.internalfb.com/intern/test/562950004334605/
Multisect link:
https://www.internalfb.com/intern/testinfra/multisect/756170
Test Plan: NA
Reviewed By: zhxchen17
Differential Revision: D34596156
fbshipit-source-id: a465bca0094db3caf6130c80f1ed49eea981359b
(cherry picked from commit ef5e5578c64ce9827570757fb016aafa9c782c6a)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72301
First step in resolving #35026.
This adds `PythonRecordFunction` which is a `torch::CustomClassHolder`
for `at::RecordFunction` to keep the ATen code free of torch includes.
And adds new unused internal API functions
`_record_function_enter_new` which return the torchbind object.
Once the FC period is expired, `torch.profiler.record_function` will
be updated to use this new internal API. Then once BC period is
expired, the cpp_custom_type_hack-based API can be removed.
Test Plan: Imported from OSS
Reviewed By: dagitses
Differential Revision: D34586311
Pulled By: robieta
fbshipit-source-id: d3eb9ffad7b348548a2b22c75203a92d1cb5115b
(cherry picked from commit 92d2ca808e5fbd20c9d6645dcabc3f059f9ef2d3)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71230
DBR quantization uses `torch.Tensor.as_subclass` frequently. When
the quantized model is traced with `torch.jit.trace`, these calls appear
in the resulting graph as `aten::alias`. This PR adds a pass to remove
these calls from the graph, for two reasons:
1. ease of debugging (these calls do nothing)
2. less work for downstream passes (for example, converting to ONNX currently breaks if these alias calls are present)
For now, we have to inline the graph in order for `aliasDb` to determine
safety properly. In the future, we may choose to relax this if there is
a need for it.
Test Plan:
Test plan is pretty basic for now, it can be improved in future PRs.
```
python test/test_quantization.py TestQuantizeDBR.test_jit_tracing_removes_aliases
```
Reviewed By: eellison
Differential Revision: D33552387
Pulled By: vkuzo
fbshipit-source-id: 681a33ddfff394a91e971263ac593afd93c5ea78
(cherry picked from commit 0f8412725d0c6fd9ef1072a50d4203465aa5d1f9)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72889
The script along with the GRAPH_EXPORT macro will allow for an easy way to extract IR from logs. One use case in this diff is to extract the fusion groups from nvfuser, so that the fusions can be tested individually.
Usage (e.g. for nvfuser test)
1. Write some test.py file that uses nvfuser
2. `PYTORCH_JIT_LOG_LEVEL=">>graph_fuser" python3 test.py 2>&1 | tee output.txt`
3. `python3 pytorch/scripts/jit/log_extract.py output.txt --nvfuser`
This will run with and without nvfuser to compare the output.
Alternatively, use `--output` to dump the IR so that it can be used in other applications.
Currently, only `--output` works (since generating input tensors is not supported)
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D34440189
Pulled By: davidberard98
fbshipit-source-id: fca0f619200ee37aba34bb39b69e6c640c263e26
(cherry picked from commit eb319166075db160f1628f0de545641fbecde8be)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73368
debug_pkl file inside of pytorch's .pt file consists of a list of SourceRanges. Each SourceRange points to a Source which is a stack track, filename, and start, end numbers. Those are emitted in debug_pkl file as strings.
Since many SourceRange shares the same source, the string for trace can be deduped.
The newer format saves a set of unique traces in a tuple, then each SourceRange will save the offset of it's trace w.r.t. position in that tuple. (i.e. manually applying dictionary compression).
The above helps with smaller file size. On loading, if we copy each trace to Source as string the runtime memory would still blowup.
To mitigate this, we use SourceView directly instead of source which will take the reference of string inside of Deserializer and make that into string_view. This is safe because Deserializer is hold by Unpickler by shared_ptr, and Unpickler is also hold by shared_ptr by another Source object. That Source object will be alive during the model construction.
Test Plan:
unit test
Took original file (312271638_930.predictor.disagg.local); loaded with `torch.jit.load` save again with `torch.jit.save`. Unzip both, look at contents:
```
[qihan@devvm5585.vll0 ~]$ du archive -h
4.0K archive/xl_model_weights
3.7M archive/extra
8.0K archive/code/__torch__/caffe2/torch/fb/model_transform/splitting
8.0K archive/code/__torch__/caffe2/torch/fb/model_transform
8.0K archive/code/__torch__/caffe2/torch/fb
8.0K archive/code/__torch__/caffe2/torch
8.0K archive/code/__torch__/caffe2
20M archive/code/__torch__/torch/fx/graph_module
20M archive/code/__torch__/torch/fx
8.0K archive/code/__torch__/torch/classes
20M archive/code/__torch__/torch
20M archive/code/__torch__
20M archive/code
2.7M archive/constants
35M archive
[qihan@devvm5585.vll0 ~]$ du resaved -h
4.0K resaved/extra
8.0K resaved/code/__torch__/caffe2/torch/fb/model_transform/splitting
8.0K resaved/code/__torch__/caffe2/torch/fb/model_transform
8.0K resaved/code/__torch__/caffe2/torch/fb
8.0K resaved/code/__torch__/caffe2/torch
8.0K resaved/code/__torch__/caffe2
1.3M resaved/code/__torch__/torch/fx/graph_module
1.3M resaved/code/__torch__/torch/fx
8.0K resaved/code/__torch__/torch/classes
1.4M resaved/code/__torch__/torch
1.4M resaved/code/__torch__
1.4M resaved/code
2.7M resaved/constants
13M resaved
[qihan@devvm5585.vll0 ~]$
```
Reviewed By: gmagogsfm
Differential Revision: D34455360
fbshipit-source-id: 8cc716f9bba7183746b1b4ecc33a2de34ac503b9
(cherry picked from commit f1a04730fc9ac8fdab6c8e4c44cb5529e42090e4)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73329
There is a quantization use case for having better alias analysis with function calls remaining. This does the relatively dumb approach of getting the inlined graph of each function call, and then analyzing that subgraph. Since we need a unique single analysis of every `Value*`, for every function call make a copy of the graph for every analysis past the first. This is relatively slow, but given the limited use case here should work well enough (and is no slower than calling the inlining pass).
cc vkuzo
Test Plan: Imported from OSS
Reviewed By: davidberard98
Differential Revision: D34451424
Pulled By: eellison
fbshipit-source-id: b7c7e54679d723f5ded1e11ffb32eb6d2176431d
(cherry picked from commit 81a42b31522b890311a3f512448b372c4ebbefd1)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72596
debug_pkl file inside of pytorch's .pt file consists of a list of SourceRanges. Each SourceRange points to a Source which is a stack track, filename, and start, end numbers. Those are emitted in debug_pkl file as strings.
Since many SourceRange shares the same source, the string for trace can be deduped.
The newer format saves a set of unique traces in a tuple, then each SourceRange will save the offset of it's trace w.r.t. position in that tuple. (i.e. manually applying dictionary compression).
The above helps with smaller file size. On loading, if we copy each trace to Source as string the runtime memory would still blowup.
To mitigate this, we use SourceView directly instead of source which will take the reference of string inside of Deserializer and make that into string_view. This is safe because Deserializer is hold by Unpickler by shared_ptr, and Unpickler is also hold by shared_ptr by another Source object. That Source object will be alive during the model construction.
Test Plan:
unit test
Took original file (312271638_930.predictor.disagg.local); loaded with `torch.jit.load` save again with `torch.jit.save`. Unzip both, look at contents:
```
[qihan@devvm5585.vll0 ~]$ du archive -h
4.0K archive/xl_model_weights
3.7M archive/extra
8.0K archive/code/__torch__/caffe2/torch/fb/model_transform/splitting
8.0K archive/code/__torch__/caffe2/torch/fb/model_transform
8.0K archive/code/__torch__/caffe2/torch/fb
8.0K archive/code/__torch__/caffe2/torch
8.0K archive/code/__torch__/caffe2
20M archive/code/__torch__/torch/fx/graph_module
20M archive/code/__torch__/torch/fx
8.0K archive/code/__torch__/torch/classes
20M archive/code/__torch__/torch
20M archive/code/__torch__
20M archive/code
2.7M archive/constants
35M archive
[qihan@devvm5585.vll0 ~]$ du resaved -h
4.0K resaved/extra
8.0K resaved/code/__torch__/caffe2/torch/fb/model_transform/splitting
8.0K resaved/code/__torch__/caffe2/torch/fb/model_transform
8.0K resaved/code/__torch__/caffe2/torch/fb
8.0K resaved/code/__torch__/caffe2/torch
8.0K resaved/code/__torch__/caffe2
1.3M resaved/code/__torch__/torch/fx/graph_module
1.3M resaved/code/__torch__/torch/fx
8.0K resaved/code/__torch__/torch/classes
1.4M resaved/code/__torch__/torch
1.4M resaved/code/__torch__
1.4M resaved/code
2.7M resaved/constants
13M resaved
[qihan@devvm5585.vll0 ~]$
```
Reviewed By: JasonHanwen
Differential Revision: D33994011
fbshipit-source-id: 8e6224c6e942e91c3403f686c8f0937d1002ed41
(cherry picked from commit a7014dd4029308c95007f362a57c31796d686647)
Summary:
Based on past PRs, here is an non-exhaustive list of files to consider for extension. The PR is not meant to be final. Based on feedback and discussion, files could be dropped from the list, or PR could be updated to move code around such that extension is no longer needed.
List of files below and description:
* These files are for converting from IR to ONNX proto. These should be used only for ONNX.
```
"torch/csrc/jit/serialization/export.*",
"torch/csrc/jit/serialization/onnx.*",
```
* This file is touched whenever pass signature is updated.
```
"torch/_C/__init__.pyi.in",
```
* These files are touched whenever pass signature is updated. Somehow it's been convention that onnx passes are also added here, but it could be possible to move them. Let me know what you think.
~~"torch/csrc/jit/python/init.cpp",~~
~~"torch/csrc/jit/python/script_init.cpp",~~
Update: Bowen will move onnx passes to files under onnx folder.
* ~~Touched when need new attr::xxx, or onnx::xxx.~~
~~"aten/src/ATen/core/interned_strings.h"~~
Update: Nikita will help separate this file.
malfet
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72297
Reviewed By: H-Huang
Differential Revision: D34254666
Pulled By: malfet
fbshipit-source-id: 032cfa590cbedf4648b7335fe8f09a2380ab14cb
(cherry picked from commit 88653eadbf)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72899
Reland D33282878 (911d527b87). This is the frontend change.
ghstack-source-id: 149204031
Test Plan: Refer to D33282878 (911d527b87). Also check CI
Reviewed By: gmagogsfm
Differential Revision: D34252127
fbshipit-source-id: 27b17ddd4d05d904eb91fd9ee094d9121f00e388
(cherry picked from commit 1d276baca3)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70471
Reland D33282878 (911d527b87). This is the frontend change.
ghstack-source-id: 149114933
Test Plan: Refer to D33282878 (911d527b87). Also check CI
Reviewed By: gmagogsfm
Differential Revision: D33342569
fbshipit-source-id: 57984ac67ae2c56c38f72d3b1fb69105901fb472
(cherry picked from commit b47cc935ee)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69547
ScriptModule export introduces duplicated ONNX initializers for shared weights, unnecessarily increases ONNX model size. This PR de-duplicates ONNX initializers for model exported in eval mode, by checking if the underlying tensors share the same `data_ptr`, `strides` and `sizes`.
Test Plan: Imported from OSS
Reviewed By: msaroufim
Differential Revision: D32994271
Pulled By: malfet
fbshipit-source-id: 10ac66638b6255890875272472aa9ed07a5b1d9a
Co-authored-by: BowenBao <bowbao@microsoft.com>
(cherry picked from commit d7cbde940c)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68491
* Allows implementing symbolic functions for domains other than `aten`, for example `prim`, in symbolic_opset#.py.
* Allows symbolic function to access extra context if needed, through `SymbolicFunctionState`.
* Particularly, the `prim::PythonOp` special case can access node without the need of passing node through inputs. Updates will be made downstreams, and in a follow-up PR we will remove the previous workaround in exporter.
* `prim::Loop`, `prim::If`, etc are now moved outside of `_run_symbolic_function` from utils.py, and to symbolic_opset9.py.
Motivation for this change:
- Better maintainability and reducing complexity. Easier to add symbolic for operators, both simple and complex ones (that need additional context), without the former needing to know the existence of the latter.
- The design idea was long outdated. prim ops are no longer rare special cases, and they shouldn't all be handled inside `_run_symbolic_function`. As a result this function becomes too clumsy. There were also prim ops symbolic added in symbolic_opset#.py with signature `prim_[opname]`, creating separation and confusion.
Test Plan: Imported from OSS
Reviewed By: jansel
Differential Revision: D32483782
Pulled By: malfet
fbshipit-source-id: f9affc31b1570af30ffa6668da9375da111fd54a
Co-authored-by: BowenBao <bowbao@microsoft.com>
(cherry picked from commit 1e04ffd2fd)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70465
These tests check to ensure that
(a) the result after nnc fusion (of a single op) is the same as the
unfused op
(b) for certain ops where fusion is expected to occur, ensure that
fusion does actually occur
Test Plan: Imported from OSS
Reviewed By: wenleix
Differential Revision: D33595240
Pulled By: davidberard98
fbshipit-source-id: e2e17a921bc30c313e92e8e5bbc6c1b5fcd14bc1
(cherry picked from commit b1ba221acc)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71651
The only tests that regress are because chunk NYI, the other tests that I touched were passing just because the `assertAllFused` wasn't working correctly. That, and we're no longer compiling conv/matmul w dynamic shapes
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D33801500
Pulled By: eellison
fbshipit-source-id: 074118ab4a975b7db876a4fcdfb9483afb879e79
(cherry picked from commit abaa7948c1)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71650
*
Refactors PE so there is a current fusion strategy set, which will take in a vector of e.g. [(STATIC, 2), (DYNAMIC, 10)] which means fuse two static invocations then fuse 10 dynamic ones, then stop specializing.
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D33801501
Pulled By: eellison
fbshipit-source-id: ebc7ac3c57e35a3b9bb15ab751f0aa1d25cc9bd5
(cherry picked from commit 8dd89088d3)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71578
Use more robust way of extracting upgrader min and max versions
Test Plan: omgitsgreen
Reviewed By: cccclai
Differential Revision: D33690113
fbshipit-source-id: 79a964acb26d7ca1354e104710a285b8da3f46d1
(cherry picked from commit 9e316ee5c1)
Summary:
The model generation script will check the model version, to ensure the developer run the script before they change operator
Previously, the version use the old model version. However, it's hard for developer to know the old version number. In this change, it use the current max operator version to check. It's less strict, but more developer friendly
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71894
ghstack-source-id: 147769215
Test Plan:
first time run:
```
chenlai@devvm5615:~/fbsource/fbcode(b82243650)$ buck run mode/opt //caffe2/torch/fb/mobile/upgrader_codegen:upgrader_test_models_gen
Parsing buck files: finished in 0.7 sec
Downloaded 0/2 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules)
Building: finished in 21.6 sec (100%) 11547/11547 jobs, 2/11547 updated
Total time: 22.4 sec
BUILD SUCCEEDED
TestVersionedDivTensorExampleV7() aten::div.Tensor
INFO:test.jit.fixtures_srcs.generate_models:Processing TestVersionedDivTensorExampleV7
INFO:test.jit.fixtures_srcs.generate_models:Generating model test_versioned_div_tensor_example_v7 and it's save to /data/users/chenlai/fbsource/fbcode/caffe2/test/jit/fixtures/test_versioned_div_tensor_example_v7.ptl
chenlai@devvm5615:~/fbsource/fbcode(b82243650)$
```
second time run:
```
chenlai@devvm5615:~/fbsource/fbcode(b82243650)$ rm caffe2/test/jit/fixtures/test_versioned_div_tensor_example_v4.ptl
chenlai@devvm5615:~/fbsource/fbcode(b82243650)$ buck run mode/opt //caffe2/torch/fb/mobile/upgrader_codegen:upgrader_test_models_gen
Action graph will be rebuilt because files have been added or removed.
Parsing buck files: finished in 2.0 sec
Building... 17.4 sec (99%) 9289/9290 jobs, 0/9290 updated
TestVersionedDivTensorExampleV7() aten::div.Tensor
INFO:test.jit.fixtures_srcs.generate_models:Processing TestVersionedDivTensorExampleV7
INFO:test.jit.fixtures_srcs.generate_models:Model test_versioned_div_tensor_example_v7 already exists, skipping
chenlai@devvm5615:~/fbsource/fbcode(b82243650)$ jf s
```
Reviewed By: tugsbayasgalan
Differential Revision: D33804737
fbshipit-source-id: 7424b81a700703bdf896ec606c2dac8df6dbf8a6
(cherry picked from commit 44b4e37d30)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68945
This PR enables the Python conversion functions for `Storage` (specifically `UntypedStorage`) and also cleans up some remnants of the deprecated typed storages from `DynamicTypes.cpp`.
ghstack-source-id: 147245110
Test Plan: Run the existing unit and integration tests.
Reviewed By: albanD
Differential Revision: D32676505
fbshipit-source-id: 3a3f6db4fb0da5c78dd406c96ab70bdc37015521
(cherry picked from commit d6427b94cf)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71443
cogwheel test inline_cvr_infer_canary_pyper_model_publish is timing out.
The convert_fx call takes > 20 mins for local and local_ro sub modules, which used to take ~ 2 mins.
Test Plan:
Fblearn flow run
* the following cmd took 1113 seconds before the diff and 5002 seconds after.
flow-cli clone-locally 320014219 --run-as-secure-group pytorch_at_scale --operators pyper_model_publish_workflow.pyper_model_publish_workflow.process_torch_package_model_files.process_non_sparse_parameters[0]
Cogwheel test
* Cogwheel test with packages in B3588 (the last good run) took 4694.48s
* Cogwheel test with packages in B3590 (the first timeout) took 13975.83s
* Cogwheel test with the following packages took 4535.04s
* all packages in B3588 except the model publish
* the model publish built with D33469839 (043e84b3d2) reversed (created D33633570)
Reviewed By: albanD, jerryzh168
Differential Revision: D33633570
fbshipit-source-id: dc5e777c48a90c551641a3f79126461f6a60449e
(cherry picked from commit 03ab65023a)
Summary:
`diff_type` kind of naturally should be `ptrdiff_t`, as `ssize_t` is actually defined [here](https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_types.h.html) as :
> The type ssize_t shall be capable of storing values at least in the range [-1, {SSIZE_MAX}].
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71271
Reviewed By: atalman
Differential Revision: D33569304
Pulled By: malfet
fbshipit-source-id: 57dafed5fc42a1f91cdbed257e76cec4fdfbbebe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70464
Add handling of strided input tensors to dynamic fusion. This is done with the same set of input striding specializations as https://github.com/pytorch/pytorch/pull/60684/:
```
S_ONE, // STRIDE_ONE: packed
S_CONT, // STRIDE_CONTIGUOUS: stride[i + 1] * sizes[i + 1]
S_TRAN_CONT, // STRIDE_TRANSPOSED_CONTIGUOUS: stride[i-1] * sizes[i-1]
S_AS_ARG, // STRIDE_AS_ARG: stride passed in as runtime value
```
and then two additional specializations for a) contiguous tensor and b) channels-last tensor. channels-last is a common case and we should optimize for it. additionally, tensors natively store whether they are contiguous/channels-last contiguous, which makes it faster to check if tensors follow this pattern.
Output striding will be done in a follow up.
The striding is stored on both the TensorGroup node and on the guard node. The striding descriptors are stored as a vector of strings on the node for debugability and to make use of storing ivalues as attributes on nodes.
As an example:
```
%8 : Double(10, 11, 12, 13, strides=[1716, 1, 143, 11], requires_grad=0, device=cpu) = prim::TensorExprGroup_0[symbolic_shape_inputs=[-37, -36, -35, -34], striding_inputs_desc=[["TENSOR_CONT_CHANNELS_LAST"]](%x, %24, %23, %22, %21)```
```
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D33458649
Pulled By: eellison
fbshipit-source-id: c42616d3c683d70f6258180d23d3841a31a6030d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69645
As noted in code comment:
existing device operator is registered with input name `a`, which prevents torch.device(type="cuda") from working. add shim-layer here
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D33515231
Pulled By: eellison
fbshipit-source-id: c04af8158a9568a20cd5fbbbd573f6efab98fd60
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67254
Fixes https://github.com/pytorch/pytorch/issues/65997
BC breaking:
`output = torch.ops._test.leaky_relu(self=torch.tensor(-1.0))` now fails with the error `TypeError: __call__() got multiple values for argument 'self'` since we call into `OpOverloadBundle`'s `__call__` method that has `self` bound to it as its first argument.
Follow up work:
1. disallow `default` as an overload name for aten operators.
2. Add a method to obtain a list of all overloads (exclude the ones registered by JIT)
3. Add methods/properties to `OpOverload` to access more schema information (types of input and output args etc)
cc ezyang gchanan
Test Plan: Imported from OSS
Reviewed By: pbelevich
Differential Revision: D33469839
Pulled By: anjali411
fbshipit-source-id: c3fc43460f1c7c9651c64b4d46337be21c400621
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65645
This is a retry of PR: https://github.com/pytorch/pytorch/pull/59492
Latest Changes: Added more tests, added the getOrCreateDB pattern, updated logic to remove unnecessary checks
addressed all comments.
Adding code to find common expressions from the two subblocks of an if
operation and hoist them before the if block.
This also allows Dead Code Elimination to
then eliminate some if blocks.
Test Plan: python test_jit.py TestIfHoisting
Reviewed By: eellison
Differential Revision: D33302065
Pulled By: Gamrix
fbshipit-source-id: a5a184a480cf07354359aaca344c6e27b687a3c2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68136
DynamicType is an extension to existing server JIT types. Today using normal server types on Edge is a bit problematic because in embedded environments we don't need the full spectrum of types but we still build with these unneeded dependencies.
Is it possible to just get rid of unneeded JIT types from Edge builds? It's not easy to do so at this moment. For example, on Edge we don't support Union type, but we have to pull in the dependency of Union type because Optional type is being supported which inherits from Union type, so Union type has to be included in the build. Although we could split Union type and Optional type, it could be argued that the root cause is every time we use anything inheriting from `c10::Type`, we don't have the direct evidence of how much dependency we pull in, because we do virtual calls and we don't know what exactly we're calling with server JIT types. If we don't know, it's highly possible that the linker doesn't know either so it cannot effectively strip unused methods.
To address this problem, one option is to implement a separate `DynamicType` which has simpler behavior and doesn't store different types as different symbols in binary but rather raw data (or "tag"). This could increase the binary size by several KBs, so I included several binary size reductions in the same stack, hoping at least we don't regress the binary size.
Currently `DynamicType` inherits from `c10::Type` because I want to reduce the migration cost of `DynamicType` by making it interfacing with existing server JIT types. In the future `DynamicType` should be implemented as a separate class without relying on `c10::Type` to make things both simpler and leaner.
ghstack-source-id: 146670522
Test Plan: in the next diff.
Reviewed By: VitalyFedyunin
Differential Revision: D32264615
fbshipit-source-id: 180eb0998a14eacc1d8b28db39870d84fcc17d5b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69579
This should help us avoid reference counting overhead on singleton Type subclasses without a major rewrite of the Type subsystem.
ghstack-source-id: 146643993
Test Plan:
Ran //caffe2/caffe2/fb/high_perf_models/pytorch/benchmark_framework_overheads:cpp_benchmark with arguments `--op empty -niter 40 --stressTestRecordFunction --captureRecordFunctionInputs` on devbig with turbo off.
Before:
```
I1206 13:47:15.037441 1201670 bench.cpp:144] Mean 0.737675
I1206 13:47:15.037463 1201670 bench.cpp:145] Median 0.736725
I1206 13:47:15.037468 1201670 bench.cpp:146] Min 0.722897
I1206 13:47:15.037473 1201670 bench.cpp:147] stddev 0.00508187
I1206 13:47:15.037482 1201670 bench.cpp:148] stddev / mean 0.00688903
```
After:
```
I1206 13:48:16.830123 1205612 bench.cpp:144] Mean 0.66988
I1206 13:48:16.830150 1205612 bench.cpp:145] Median 0.663956
I1206 13:48:16.830157 1205612 bench.cpp:146] Min 0.65986
I1206 13:48:16.830164 1205612 bench.cpp:147] stddev 0.0335928
I1206 13:48:16.830171 1205612 bench.cpp:148] stddev / mean 0.0501475
```
Static runtime startup is also improved; for CMF local_ro, time to initialize a predictor went from 10.01s to 9.59s.
(Note: I wish I had a production workload to demonstrate the advantage of this on. I tried ctr_mobile_feed local_ro net but it was neutral. Anything that manipulates types or List/Dict a lot might be promising.)
Reviewed By: suo
Differential Revision: D32923880
fbshipit-source-id: c82ed6689b3598e61047fbcb2149982173127ff0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67254
Fixes https://github.com/pytorch/pytorch/issues/65997
TODO: disallow `default` as an overload name for aten operators.
BC breaking:
`output = torch.ops._test.leaky_relu(self=torch.tensor(-1.0))` now fails with the error `TypeError: __call__() got multiple values for argument 'self'` since we call into `OpOverloadBundle`'s `__call__` method that has `self` bound to it as its first argument.
cc ezyang gchanan
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D33262228
Pulled By: anjali411
fbshipit-source-id: 600dbf511514ea9b41aea3e6b1bc1102dab08909
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68691
TraceType is a sharded file, so by only including specific operator
headers, we ensure that changing one (non-method) operator only needs
one shard to be re-compiled.
This also changes all the included autograd and jit headers from
including `ATen/ATen.h` to just including `ATen/core/Tensor.h`.
Test Plan: Imported from OSS
Reviewed By: gchanan
Differential Revision: D33336948
Pulled By: albanD
fbshipit-source-id: 4e40371592b9a5a7e7fcd1d8cecae11ffb873113