Commit Graph

1896 Commits

Author SHA1 Message Date
Michael Suo
032d1ace1d [ci] disable flaky MobileProfiler.Backend test
This test is flaky, normally I'd disable using the disable bot but it
doesn't support cpp.

[skip ci]

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78320

Approved by: https://github.com/malfet
2022-05-26 03:22:55 +00:00
Shunting Zhang
26d9386f67 Make string serialization of C++ FunctionSchema consistent with torchgen.model.FunctionSchema
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77926

There is a discrepancy between the string representation of C++ FunctionSchema and torchgen.model.FunctionSchema.
The latter does not add parentheses around the return types if there is only a single item,
but the C++ FunctionSchema always adds the parentheses.

Make them consistent so we can convert one type to the other via its string representation and parse method.
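
A minimal sketch of the formatting rule described above, assuming return types are already available as strings. `formatReturns` is a hypothetical helper for illustration, not the function touched by this PR, and printing an empty return list as `()` is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of PyTorch): joins return type strings and
// wraps them in parentheses unless there is exactly one return type.
std::string formatReturns(const std::vector<std::string>& returns) {
  if (returns.size() == 1) {
    return returns[0];  // single return: no parentheses
  }
  std::string out = "(";
  for (std::size_t i = 0; i < returns.size(); ++i) {
    if (i > 0) {
      out += ", ";
    }
    out += returns[i];
  }
  out += ")";
  return out;
}
```

With this rule a single `Tensor` return prints as `Tensor`, while two returns print as `(Tensor, Tensor)`, so the string can be round-tripped through either parser.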

Differential Revision: [D36535924](https://our.internmc.facebook.com/intern/diff/D36535924/)

Approved by: https://github.com/bdhirsh
2022-05-24 19:39:26 +00:00
John Clow
c82fb7a67f Adding support for upper and lower bound functions in SSA
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77389

Approved by: https://github.com/eellison
2022-05-20 23:58:40 +00:00
Nikolay Korovaiko
df1f9b9840 Implement sym_sizes to create proper IR for sym ints representing tensor sizes (#77756)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77756
Approved by: https://github.com/desertfire
2022-05-20 05:39:03 +00:00
PyTorch MergeBot
e9d660c331 Revert "Revert "Revert "Implement sym_sizes to create proper IR for sym ints representing tensor sizes (#76836)"""
This reverts commit acf7136a52.

Reverted https://github.com/pytorch/pytorch/pull/77719 on behalf of https://github.com/suo
2022-05-18 05:06:50 +00:00
Edward Z. Yang
acf7136a52 Revert "Revert "Implement sym_sizes to create proper IR for sym ints representing tensor sizes (#76836)""
This reverts commit c35bd8d423.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77719

Approved by: https://github.com/Chillee, https://github.com/malfet
2022-05-18 03:25:43 +00:00
PyTorch MergeBot
c35bd8d423 Revert "Implement sym_sizes to create proper IR for sym ints representing tensor sizes (#76836)"
This reverts commit fc4c3c9bc7.

Reverted https://github.com/pytorch/pytorch/pull/76836 on behalf of https://github.com/suo
2022-05-18 02:45:25 +00:00
Han Qi (qihqi)
3822a472ef Python function to extract information on mobile::Module from flatbuffer (#77624)
Summary:
Includes following refactor:
1. common loading on operator validation that is dup'd in pickle and
   flatbuffer loader moved to function.h/cpp
2. Allow loading of a function without wiring operator.

This function will be used to implement get_bundled_input and friends
for flatbuffer.

Test Plan: contbuild & OSS CI, see 69fa49f123

Reviewed By: cccclai

Differential Revision: D36348549

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77624
Approved by: https://github.com/cccclai
2022-05-18 00:42:57 +00:00
Nikolay Korovaiko
fc4c3c9bc7 Implement sym_sizes to create proper IR for sym ints representing tensor sizes (#76836)
LTC Tensors now create real IR (SizeNode) for sym_sizes() in LTCTensorImpl.cpp.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76836
Approved by: https://github.com/ezyang
2022-05-18 00:40:42 +00:00
Michael Suo
7f1e331b34 Make SymInt constructor explicit
Since we plan to have a bunch of code that is sensitive to whether or
not a SymInt contains a symbolic shape or not, it seems like a bad idea
to have an implicit constructor.

For example, code like:
```
sizes_and_strides_.stride_at_unchecked(dim) = 0;
```

would sail through, and the `0` would get implicitly promoted to a
SymInt.

This is a tradeoff though: it makes code that handles `SymInt`s more
clunky as `int64_t`s and integer literals need to be explicitly wrapped
in `SymInt` before being used.
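
To illustrate the tradeoff, here is a small sketch with a hypothetical stand-in class (`SymIntSketch` is not the real `SymInt`, which carries more state). It only shows the effect of marking the constructor `explicit`.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for SymInt, reduced to a single integer payload.
class SymIntSketch {
 public:
  explicit SymIntSketch(std::int64_t value) : value_(value) {}
  std::int64_t value() const { return value_; }

 private:
  std::int64_t value_;
};

// With an explicit constructor, an integer no longer converts silently...
static_assert(!std::is_convertible<std::int64_t, SymIntSketch>::value,
              "no implicit promotion from int64_t");
// ...but wrapping it on purpose still works.
static_assert(std::is_constructible<SymIntSketch, std::int64_t>::value,
              "explicit wrapping is allowed");

// Callers now have to spell out the wrapping at each use site.
inline SymIntSketch zeroStride() { return SymIntSketch(0); }
```

An assignment such as `SymIntSketch s = 0;` would fail to compile here, which is the behavior the message argues for.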

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77666

Approved by: https://github.com/ezyang
2022-05-17 22:28:35 +00:00
Bin Bao
25c6ebd12c Revert "Revert "[LT] Codegen ReuseNode for supported ops""
Summary: Fixed a XLC build failure by generating an always-return-false
default CanBeReused method.

This reverts commit 3cade9d454.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77513

Approved by: https://github.com/alanwaketan
2022-05-16 20:14:42 +00:00
Wang, Eikan
e5a5cd149f Simplify IfThenElse and CompareSelect within for-loop (#76793)
Analyze the range to determine if a condition cannot be satisfied. Suppose the for-loop body contains `IfThenElse` or `CompareSelect` while the condition of the two statements depends on the for-loop index `Var`. In that case, we will analyze the range to check whether the condition could always be satisfied or not. If the condition is deterministic, simplify the logic.
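
A rough sketch of the idea, assuming a loop index over the half-open range `[start, stop)` and a condition of the form `i < bound` with concrete integers. The names `CondResult` and `analyzeLessThan` are hypothetical; the actual NNC simplifier works on IR expressions rather than plain integers.

```cpp
#include <cassert>
#include <cstdint>

enum class CondResult { AlwaysTrue, AlwaysFalse, Unknown };

// Decides whether `i < bound` has a fixed outcome for every i in [start, stop).
CondResult analyzeLessThan(std::int64_t start, std::int64_t stop,
                           std::int64_t bound) {
  if (start >= stop) {
    return CondResult::Unknown;  // empty loop: nothing to decide
  }
  if (stop - 1 < bound) {
    return CondResult::AlwaysTrue;  // even the largest index is below bound
  }
  if (start >= bound) {
    return CondResult::AlwaysFalse;  // even the smallest index is not below
  }
  return CondResult::Unknown;  // the outcome changes inside the range
}
```

When the result is `AlwaysTrue` or `AlwaysFalse`, the `IfThenElse` or `CompareSelect` can be replaced by the branch that is always taken.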
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76793
Approved by: https://github.com/huiguoo
2022-05-15 20:21:28 +00:00
PyTorch MergeBot
3cade9d454 Revert "[LT] Codegen ReuseNode for supported ops"
This reverts commit 6066e5929f.

Reverted https://github.com/pytorch/pytorch/pull/76738 on behalf of https://github.com/malfet
2022-05-14 00:33:10 +00:00
Bin Bao
6066e5929f [LT] Codegen ReuseNode for supported ops
Summary:
1. Update the codegen script to add a TrieCache lookup (ReuseNode)
before creating a new IR node. The following is an example of generated
code:

```
    at::Tensor LazyNativeFunctions::add(const at::Tensor & self, const at::Tensor & other, const at::Scalar & alpha) {
        ...
        torch::lazy::NodePtr node = torch::lazy::ReuseNode<AddTensor>(lazy_self->GetIrValue(), lazy_other->GetIrValue(), node_alpha);
        if (!node) {
            auto out_meta = at::meta::add(self, other, alpha);
            std::vector<Shape> shapes{Shape(out_meta.scalar_type(), out_meta.sizes().vec())};
            TORCH_INTERNAL_ASSERT(shapes.size() == 1);
            if(symbolicShapeEnabled()){
                std::vector<jit::IValue> inputs = { self, other, alpha };
                char* schema_str = "aten::add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor";
                applySymbolicShapesOnLT(schema_str, inputs, shapes);
            }

            node = torch::lazy::MakeNode<AddTensor>(lazy_self->GetIrValue(), lazy_other->GetIrValue(), node_alpha, std::move(shapes));
            CacheNode(node);
        }
        ...
    }
```
2. TrieCache lookup depends on each IR node subclass to provide its own
comparison function. The following is an example of generated code:

```
  bool CanBeReused(const torch::lazy::Value& self, const torch::lazy::Value& other, const torch::lazy::Value& alpha) const {
    size_t i = 0;
    return (operand(i++) == self &&
        operand(i++) == other &&
        operand(i++) == alpha);
  }
```

3. DeviceData is specially handled.

4. Non-codegen op changes are coming in a separate PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76738

Approved by: https://github.com/JackCaoG, https://github.com/wconstab
2022-05-13 19:13:58 +00:00
yanbing-j
4f82f439d1 Enable BFloat16 ELU, SELU and CELU in CPU path (#62546)
Enable BFloat16 ELU, SELU and CELU in CPU path. SELU and CELU will call ELU implementation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62546
Approved by: https://github.com/frank-wei
2022-05-12 16:56:57 +00:00
Xiang Gao
cc9d0f309e lshift and rshift stop support floating types (#77146)
Fixes #74358

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77146
Approved by: https://github.com/ngimel
2022-05-11 22:29:30 +00:00
Bin Bao
8f5cdc6d5d Revert "Revert "[LT] Store OpKind for each IR subclass in a static field""
Summary: Re-land https://github.com/pytorch/pytorch/pull/76711 by
fixing internal build errors.
Generate class-level opkind as a static method instead of a static
member.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77102

Approved by: https://github.com/wconstab, https://github.com/JackCaoG, https://github.com/antoniojkim
2022-05-11 12:27:05 +00:00
John Clow
26e2936edc [JIT SSA] Added testing for the Cat Op in LazyTensor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76552

Approved by: https://github.com/Krovatkin
2022-05-09 22:11:14 +00:00
PyTorch MergeBot
7eaf4780ba Revert "[LT] Store OpKind for each IR subclass in a static field"
This reverts commit ac37ddc795.

Reverted https://github.com/pytorch/pytorch/pull/76711 on behalf of https://github.com/malfet
2022-05-09 20:50:09 +00:00
Fuqiang Zhang
bd573389f6 [Bootcamp]Add option for flatbuffer loader to copy memory to individual tensors (#76986)
Summary: Add option for flatbuffer loader to copy memory to individual tensors, to allow freeing memory without waiting for all tensor runs to complete.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76986
Approved by: https://github.com/qihqi
2022-05-09 17:29:30 +00:00
Bin Bao
ac37ddc795 [LT] Store OpKind for each IR subclass in a static field
Summary: Currently OpKind is stored as an object field called op_ for each IR
node, and one usage of op_ is to avoid dynamic_cast in NodeCast when we
need to downcast a base-node pointer into a concrete sub-node pointer.
As a result, we need to construct and pass in an op when downcasting
nodes, and this becomes quite annoying when we start to implement the
trie-based IR node reusing. More importantly, the op for each subclass
should be unique for that subclass and thus making it a const static field
is a more logical design.

In this PR, we still keep the object-level op_ for easier XLA adoption. As
future work, we can come back to remove op_, make the op() method
virtual, and get rid of OpKind in all the node constructors.
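
A minimal sketch of the design, with hypothetical names (`NodeSketch`, `AddSketch`, `MulSketch`, `nodeCast`) and plain integers standing in for OpKind. It keeps an object-level `op_` while each subclass also exposes a class-level kind, so a downcast needs no op argument and no `dynamic_cast`.

```cpp
#include <cassert>

// Hypothetical base node: still stores an object-level op kind.
class NodeSketch {
 public:
  explicit NodeSketch(int op) : op_(op) {}
  virtual ~NodeSketch() = default;
  int op() const { return op_; }

 private:
  int op_;
};

// Each subclass exposes one class-level op kind, unique to that subclass.
class AddSketch : public NodeSketch {
 public:
  static int ClassOpKind() { return 1; }
  AddSketch() : NodeSketch(ClassOpKind()) {}
};

class MulSketch : public NodeSketch {
 public:
  static int ClassOpKind() { return 2; }
  MulSketch() : NodeSketch(ClassOpKind()) {}
};

// Downcast by comparing the stored kind with the target's class-level kind.
template <typename T>
const T* nodeCast(const NodeSketch* node) {
  return node->op() == T::ClassOpKind() ? static_cast<const T*>(node)
                                        : nullptr;
}
```

The caller writes `nodeCast<AddSketch>(node)` without constructing an op to pass in, which is the inconvenience the message describes.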

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76711

Approved by: https://github.com/wconstab, https://github.com/JackCaoG
2022-05-06 19:14:46 +00:00
David Berard
6c615a21a0 [NVFuser] prep for on-by-default
1. fix tests that expected nvfuser off-by-default behavior
2. skip nvfuser if getExecutorMode() == false

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76937

Approved by: https://github.com/eellison
2022-05-06 18:18:53 +00:00
Bin Bao
f05710dd40 [LT] Add a trie data structure for caching IR nodes
Summary: TrieCache provides a way to look up an IR node before we
actually create it. If the lookup hits in TrieCache, we reuse the
existing node and move the current pointer in TrieCache to point to that
node; if the lookup misses, we create a new node and insert it into TrieCache.
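
The lookup behavior above can be sketched roughly as follows. All names here (`IrNodeSketch`, `TrieNodeSketch`, `TrieCacheSketch`, `lookupOrInsert`, `resetCurrent`) are hypothetical, nodes are matched by op name only, and resetting the current pointer between traces is an assumption about how such a cache would be reused.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an IR node, identified by its op name only.
struct IrNodeSketch {
  std::string op;
};

struct TrieNodeSketch {
  std::shared_ptr<IrNodeSketch> ir;
  std::vector<std::unique_ptr<TrieNodeSketch>> children;
};

class TrieCacheSketch {
 public:
  TrieCacheSketch() : root_(new TrieNodeSketch()), current_(root_.get()) {}

  // Looks for `op` among the children of the current position. On a hit the
  // existing IR node is reused; on a miss a new one is created and inserted.
  // Either way the current pointer advances to the matching trie node.
  std::shared_ptr<IrNodeSketch> lookupOrInsert(const std::string& op) {
    for (auto& child : current_->children) {
      if (child->ir->op == op) {
        current_ = child.get();
        ++hits_;
        return child->ir;
      }
    }
    std::unique_ptr<TrieNodeSketch> fresh(new TrieNodeSketch());
    fresh->ir = std::make_shared<IrNodeSketch>(IrNodeSketch{op});
    current_->children.push_back(std::move(fresh));
    current_ = current_->children.back().get();
    return current_->ir;
  }

  // Assumed helper: start the next trace from the root again.
  void resetCurrent() { current_ = root_.get(); }
  int hits() const { return hits_; }

 private:
  std::unique_ptr<TrieNodeSketch> root_;
  TrieNodeSketch* current_;
  int hits_ = 0;
};
```

Replaying the same op sequence after `resetCurrent()` returns the same node objects, while a diverging op creates a new branch at the point of divergence.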

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76542

Approved by: https://github.com/wconstab, https://github.com/JackCaoG
2022-05-04 23:48:03 +00:00
Wang, Eikan
429a80dded [NNC] Lowering function generates the output buffer with the specified stride (#76529)
Summary:
Pass stride information to lowering function to generate the output buffer with proper memory layout.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76529

Reviewed By: ZolotukhinM

Differential Revision: D36116712

Pulled By: IvanKobzarev

fbshipit-source-id: d3901f756b3710ecce172d6db3ecb0b7c12fb929
(cherry picked from commit b6cd53c91c01db36ea0e99167dc0ce0ae1d3aa23)
2022-05-04 20:04:22 +00:00
Bin Bao
f8a4780eb2 [LT] Move MakeNode into ir_builder.h
Summary: Move MakeNode into ir_builder.h to avoid circular header
reference later when introducing a trie cache for IR node lookup.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76482

Approved by: https://github.com/wconstab
2022-05-03 14:53:19 +00:00
Elias Ellison
e5a55af305 Reland reland
Reland of https://github.com/pytorch/pytorch/pull/76397 and https://github.com/pytorch/pytorch/pull/76493

This time I'll get it right 😢
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76539
Approved by: https://github.com/davidberard98, https://github.com/osalpekar
2022-04-28 20:41:55 +00:00
PyTorch MergeBot
a5bc02aeb2 Revert "[JIT] Register decomp reland"
This reverts commit 81b9cb741c.

Reverted https://github.com/pytorch/pytorch/pull/76397 on behalf of https://github.com/osalpekar
2022-04-28 03:33:29 +00:00
Antonio Kim
f3f327e103 Decouple LTC from TS Backend using Lazy IR Builder
Next stage of breaking up https://github.com/pytorch/pytorch/pull/74710

IR builder class introduced to decouple the explicit usage of `TsNode` in core lazy tensors.

Requires https://github.com/pytorch/pytorch/pull/75324 to be merged in first.

**Background**
- there are ~ 5 special ops used in lazy core but defined as :public {Backend}Node.  (DeviceData, Expand, Scalar...)
- we currently require all nodes derive from {Backend}Node, so that backends can make this assumption safely
- it is hard to have shared 'IR classes' in core/ because they depend on 'Node'

**Motivation**

1. avoid copy-paste of "special" node classes for each backend
2. in general decouple and remove all dependencies that LTC has on the TS backend

**Summary of changes**
- new 'IRBuilder' interface that knows how to make 5 special ops
- move 'special' node classes to `ts_backend/`
- implement TSIRBuilder that makes the special TS Nodes
- new backend interface API to get the IRBuilder
- update core code to call the builder

CC: @wconstab @JackCaoG @henrytwo

Partially Fixes #74628

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75433
Approved by: https://github.com/wconstab
2022-04-28 02:07:02 +00:00
Jiewen Tan
a28b132bc2 Revert D35860266: [pytorch][PR] Update torch::lazy::BackendDevice to have a new default ordinal
Test Plan: revert-hammer

Differential Revision:
D35860266 (f9d07ae644)

Original commit changeset: 554ebe16a068

Original Phabricator Diff: D35860266 (f9d07ae644)

fbshipit-source-id: 325c54aa2e87e51134115213352b3d33a81b7edf
(cherry picked from commit bbd74bf34a534d1b87aadff9790038e3dbbfa9c8)
2022-04-27 18:11:24 +00:00
Elias Ellison
81b9cb741c [JIT] Register decomp reland
Reland of https://github.com/pytorch/pytorch/pull/76252
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76397
Approved by: https://github.com/davidberard98
2022-04-26 23:17:18 +00:00
PyTorch MergeBot
2d72cb3373 Revert "[JIT] Allow registering Decompositions"
This reverts commit d9f0774f98.

Reverted https://github.com/pytorch/pytorch/pull/76252 on behalf of https://github.com/zengk95
2022-04-26 04:47:05 +00:00
Elias Ellison
d9f0774f98 [JIT] Allow registering Decompositions
- Allow registering custom decompositions
- Add easier API for invoking decompositions
- Shorten API names (no users yet)

I am doing these as one pr because they are fairly short/simple and because github first does not support ghstack yet.

cc @Chillee @zou3519
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76252
Approved by: https://github.com/davidberard98
2022-04-26 03:00:35 +00:00
Nikolay Korovaiko
bb60cac25a E2E SymInt example narrow_copy
This **roughly** corresponds to Goal 3.2 in https://docs.google.com/document/d/1iiLNwR5ohAsw_ymfnOpDsyF6L9RTUaHMpD8YLw-jxEw/edit#

Namely:

It adds the following:

* SymbolicIntNode interface
* LazySymbolicIntNode implementation
* Lazy `narrow_copy` implementation
* Need add support for SymInt in codegen
* Test (below)

```cpp
TEST(LazyDynamicOpsTest, NarrowCopy) {
  auto x = torch::rand({5, 10, 10}).to(kLazy);
  const size_t Y_DIM = 3;
  const size_t X_DIM_INDEX = 2;
  auto y = torch::rand({Y_DIM}).to(kLazy);
  auto ly = torch::lazy::TryGetLtcTensor(y);
  auto dim_node = MakeNode<SizeNode>(ly->GetIrValue(), 0);
  auto lmn = new torch::lazy::SymbolicIntNode(dim_node);
  auto z = x.narrow_copy(X_DIM_INDEX, 0, lmn->toSymInt());
  AllClose(z.cpu(), x.cpu().narrow_copy(X_DIM_INDEX, 0, Y_DIM));
}
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75759
Approved by: https://github.com/wconstab
2022-04-26 02:40:27 +00:00
Wonjoo Lee
f9d07ae644 Update torch::lazy::BackendDevice to have a new default ordinal (#76264)
Summary:
Fixes https://github.com/pytorch/xla/issues/3490. Updates `torch::lazy::BackendDevice` with changes below:

1. Remove the no-op string constructor.
2. Update default ordinal to `-1`.
3. Add a `is_valid` function to check if `ordinal` is valid/non-default (`ordinal >= 0`).
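
The three changes can be pictured with a small sketch. `BackendDeviceSketch` is a hypothetical stand-in that models only the ordinal, not the real `torch::lazy::BackendDevice`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in that models only the ordinal handling.
class BackendDeviceSketch {
 public:
  BackendDeviceSketch() = default;  // ordinal stays at the default of -1
  explicit BackendDeviceSketch(std::int64_t ordinal) : ordinal_(ordinal) {}

  std::int64_t ordinal() const { return ordinal_; }

  // Valid means the ordinal was set to something non-default (>= 0).
  bool is_valid() const { return ordinal_ >= 0; }

 private:
  std::int64_t ordinal_ = -1;
};
```

A default-constructed device reports `is_valid() == false`, so callers can distinguish "no ordinal chosen" from ordinal `0`.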

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76264

Reviewed By: mrshenli

Differential Revision: D35860266

Pulled By: alanwaketan

fbshipit-source-id: 554ebe16a0683d37b00270c4f35163bf690bfe28
(cherry picked from commit b941d10e8545dfecfb34e4d5c24a29a1cc49bc4b)
2022-04-25 23:57:18 +00:00
zengk95
1d55518198 Revert "[nnc] Strides to Tensor (#72962)"
This reverts commit 939060925f.

Fixes https://github.com/pytorch/vision/issues/5873

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76332
Approved by: https://github.com/seemethere
2022-04-25 19:50:00 +00:00
Ivan Kobzarev
939060925f [nnc] Strides to Tensor (#72962)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72962

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM, cpuhrsch

Differential Revision: D34589306

Pulled By: IvanKobzarev

fbshipit-source-id: ecee5249760ecc0c8b2edb1842b90218899bc944
(cherry picked from commit 9e310c4c67389da30da89126d838ffe3864aba6f)
2022-04-23 19:35:15 +00:00
Prem
7557407653 Added directory check before saving in C++ API
Fixes #75177

Couldn't find any utility method to get directory name in pytorch repo, hence creating a function for that.
Let me know if a new function is not needed.

I also referred [this](https://github.com/pytorch/pytorch/blob/master/c10/test/util/tempfile_test.cpp#L15) for directory check.

Also I am using TORCH_CHECK to show the error. This is highly verbose with the entire stack visible. Is there any alternative for the same so that it is easier to read? This could happen frequently, so a small and concise error would be more helpful here.
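
For context, a directory-name helper of the kind mentioned above could look roughly like this. `parentDirectory` is a hypothetical name and this is not the function added by the PR; it only splits on the last path separator.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the directory part of `path`, or an empty
// string when the path has no directory component.
std::string parentDirectory(const std::string& path) {
  const std::size_t pos = path.find_last_of("/\\");
  if (pos == std::string::npos) {
    return "";  // bare file name, e.g. "model.pt"
  }
  if (pos == 0) {
    return path.substr(0, 1);  // file directly under the root
  }
  return path.substr(0, pos);
}
```

A save routine could then check that the returned directory exists before opening the output file, and report a short error if it does not.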
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75681
Approved by: https://github.com/albanD
2022-04-22 20:04:41 +00:00
Wang, Eikan
ef0873327e [NNC] Add utility functions to check channels-last contiguous (#75938)
Summary:
The `Buf` uses `std::vector<ExprHandle>` to represent its strides. The `ExprHandle` could be an immediate value or a mathematical expression with variables involved both for the static shape and dynamic shape. So it is hard to directly deduce the channels-last contiguous layout based on the numerical calculation. Hence, the utility functions of this PR are based on the pattern match to check whether the `Buf` is channels-last contiguous.
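
For comparison, the purely numerical check is straightforward when all sizes and strides are concrete integers. The sketch below covers only that simple case for a 4-D NCHW buffer and ignores special handling of size-1 dimensions; `isChannelsLastContiguous` is a hypothetical name, and the PR itself pattern matches on `ExprHandle` instead.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Concrete-integer check: for sizes {N, C, H, W}, a channels-last contiguous
// layout has strides {H*W*C, 1, W*C, C}.
bool isChannelsLastContiguous(const std::vector<std::int64_t>& sizes,
                              const std::vector<std::int64_t>& strides) {
  if (sizes.size() != 4 || strides.size() != 4) {
    return false;
  }
  const std::int64_t C = sizes[1];
  const std::int64_t H = sizes[2];
  const std::int64_t W = sizes[3];
  return strides[1] == 1 && strides[3] == C && strides[2] == W * C &&
         strides[0] == H * W * C;
}
```

With symbolic strides these products are expressions rather than numbers, which is why the message opts for matching the expression structure.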

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75938

Reviewed By: cpuhrsch

Differential Revision: D35724091

Pulled By: ZolotukhinM

fbshipit-source-id: f79ae21749d0aad8601f0434b52df88602ff09bf
(cherry picked from commit 3712bbbe4bea57c5c1abe1eafde4b8778e13e0c4)
2022-04-22 06:42:39 -07:00
Antonio Kim
2c2c13d21b Decouple Lazy Node Shape Cache (#75324)
Summary:
Next stage of breaking up https://github.com/pytorch/pytorch/pull/74710

Move shape cache implementation to the backend interface. Also, clean up some of the hashing logic in the base node class.

CC: wconstab JackCaoG henrytwo

Partially Fixes https://github.com/pytorch/pytorch/issues/74628

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75324

Reviewed By: anjali411

Differential Revision: D35730823

Pulled By: wconstab

fbshipit-source-id: cf6fa326319b9324e5f422a78817b6fb5bf7e9b8
(cherry picked from commit faec5043df56639e2fd23de2d91ae796e4f3df70)
2022-04-21 17:27:05 -07:00
Nikolay Korovaiko
69e048b090 List of SymInt rebase on master
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75115
Approved by: https://github.com/ezyang
2022-04-20 02:09:55 +00:00
Elias Ellison
f65eb09d6b [JIT] Move Shape Function definition to python
Moves jit shape function registration to python. Like jit decompositions, a script must be run after adding new definitions which serializes them in a c++ file.

This was a request so that torch-mlir could define functions in python and upstream their shape functions. cc @silvasean  @makslevental
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75546
Approved by: https://github.com/davidberard98
2022-04-19 20:59:44 +00:00
Taylor Robie
a5e338a826 [RecordFunction] More effecient machinery to determine which callbacks to run. (#75807)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75807

There is a tension in RecordFunction between two use cases:
1) In the normal eager path we don't run any callbacks, so we need to bail out of the profiling path as soon as possible to minimize eager overhead.
2) When profiling we want to determine which callbacks to run as efficiently as possible to minimize instrumentation overhead.

The confounding factor in all of this is sampling callbacks because they change which callbacks will run on each call, even in steady state operation. This has traditionally been handled with a two stage procedure: first we flip a coin to determine if a sampled callback *might* run. If false (which it usually is), do nothing. This solves (1). If true, check to see if we need to build the full callback set or if it was a false positive. This procedure has two negative effects:
* It forces us to rebuild the set of callbacks to run on every step when profiling
* It leaks the sampling abstraction, requiring other parts of the code to bump certain values and forces RecordFunction to lazily initialize.

This change introduces a multi-level cache which can (in the common case) quickly determine which callbacks *will* run, rather than if callbacks *might* run. This means that rather than call `shouldRunRecordFunction`, we can simply get the callbacks for an invocation and check if they are empty. (And completely removes the pre-sampling heuristic.) Another major benefit of the new cache structure is that it allows thread-safe registration and unregistration of global callbacks.

It's worth briefly discussing how this maintains eager performance. In the standard eager case (only sampling callbacks registered) the cache first checks that the global callbacks haven't changed (atomic read), decrements a counter to see if a sampling callback fired, and then returns the active callbacks which is simply a SmallVector of pointer pairs and a couple POD values (scope, needs inputs/outputs/ids). The biggest cost according to perf is the SmallVector logic; we could consider adopting a hard limit on active callbacks; more than half a dozen callbacks *running* in a single step would be quite a lot. But the total cost relative to `PYTORCH_DISABLE_PER_OP_PROFILING` is only ~10ns, so debatable if it's worth it to switch to `std::array`.

The primary change is in `record_function.cpp`, which has a more detailed description of the new cache structure. `record_function.h` has some minor changes to align with the new calling convention and the remaining files are simply changes to the call sites.

Future work:
  * RecordFunction no longer needs to be lazily initialized.
  * We can deprecate the disable/reenable APIs, since we can now safely add and remove global callbacks.

Test Plan:
I tested eager mode performance using the overhead benchmark and found that the non-profiled path was unaffected. However the no-op observer dropped from 0.41us to 0.37us (0.25us if no observers are active) which is about 1/3rd reduction in the cost of the callback selection machinery.

I also added several C++ unit tests, as the core RecordFunction machinery (especially sampling) was largely untested.

Reviewed By: swolchok, davidberard98

Differential Revision: D35276158

fbshipit-source-id: 35135f444724fba4eb97c0ae7f3f710f0f9016fd
(cherry picked from commit 9e359b87422c18f2a195185f32e7e85c82f956fd)
2022-04-19 20:46:16 +00:00
Han Qi
b34b192d6b Reland "Make debug_pkl smaller by only emitting unique traces." (#73368)
Summary:
## Original commit message:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73368

debug_pkl file inside of pytorch's .pt file consists of a list of SourceRanges. Each SourceRange points to a Source which is a stack trace, filename, and start, end numbers. Those are emitted in debug_pkl file as strings.
Since many SourceRanges share the same source, the string for trace can be deduped.
The newer format saves a set of unique traces in a tuple, then each SourceRange will save the offset of its trace w.r.t. position in that tuple. (i.e. manually applying dictionary compression).
The above helps with smaller file size. On loading, if we copy each trace to Source as string the runtime memory would still blow up.
To mitigate this, we use SourceView directly instead of source which will take the reference of string inside of Deserializer and make that into string_view. This is safe because Deserializer is held by Unpickler by shared_ptr, and Unpickler is also held by shared_ptr by another Source object. That Source object will be alive during the model construction.
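
The dictionary compression mentioned above can be sketched as follows, assuming traces are plain strings. `DedupedTraces` and `dedupeTraces` are hypothetical names; the real serializer emits pickle tuples rather than C++ vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical result type: each distinct trace is stored once, and every
// source range keeps only an index into that list.
struct DedupedTraces {
  std::vector<std::string> unique;
  std::vector<std::size_t> index;
};

DedupedTraces dedupeTraces(const std::vector<std::string>& traces) {
  DedupedTraces out;
  std::unordered_map<std::string, std::size_t> seen;
  for (const auto& trace : traces) {
    auto it = seen.find(trace);
    if (it == seen.end()) {
      // First occurrence: record its position in the unique list.
      it = seen.emplace(trace, out.unique.size()).first;
      out.unique.push_back(trace);
    }
    out.index.push_back(it->second);
  }
  return out;
}
```

The original trace for source range `i` is recovered as `unique[index[i]]`, so repeated traces cost one integer each instead of a full string.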

Test Plan:
## Original Test plan
unit test

Took original file (312271638_930.predictor.disagg.local); loaded it with `torch.jit.load` and saved it again with `torch.jit.save`. Unzipped both and looked at the contents:
```
[qihan@devvm5585.vll0 ~]$ du archive -h
4.0K    archive/xl_model_weights
3.7M    archive/extra
8.0K    archive/code/__torch__/caffe2/torch/fb/model_transform/splitting
8.0K    archive/code/__torch__/caffe2/torch/fb/model_transform
8.0K    archive/code/__torch__/caffe2/torch/fb
8.0K    archive/code/__torch__/caffe2/torch
8.0K    archive/code/__torch__/caffe2
20M     archive/code/__torch__/torch/fx/graph_module
20M     archive/code/__torch__/torch/fx
8.0K    archive/code/__torch__/torch/classes
20M     archive/code/__torch__/torch
20M     archive/code/__torch__
20M     archive/code
2.7M    archive/constants
35M     archive
[qihan@devvm5585.vll0 ~]$ du resaved -h
4.0K    resaved/extra
8.0K    resaved/code/__torch__/caffe2/torch/fb/model_transform/splitting
8.0K    resaved/code/__torch__/caffe2/torch/fb/model_transform
8.0K    resaved/code/__torch__/caffe2/torch/fb
8.0K    resaved/code/__torch__/caffe2/torch
8.0K    resaved/code/__torch__/caffe2
1.3M    resaved/code/__torch__/torch/fx/graph_module
1.3M    resaved/code/__torch__/torch/fx
8.0K    resaved/code/__torch__/torch/classes
1.4M    resaved/code/__torch__/torch
1.4M    resaved/code/__torch__
1.4M    resaved/code
2.7M    resaved/constants
13M     resaved
[qihan@devvm5585.vll0 ~]$
```
## Additional test:
`buck test mode/dev-tsan //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- --exact 'caffe2/benchmarks/static_runtime:static_runtime_cpptest - StaticRuntime.to'` passes

 test jest.fbios.startup_cold_start.local.simulator f333356873 -

Differential Revision: D35196883

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74869
Approved by: https://github.com/gmagogsfm
2022-04-18 22:34:21 +00:00
Han Qi
7d5c07830d Add upgrader related logic to flatbuffer (#71451)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71451

title

Test Plan: unittest

Reviewed By: tugsbayasgalan

Differential Revision: D33593056

fbshipit-source-id: c48d6ad50e6e2f757b68525dfe07693711b95840
(cherry picked from commit 8e09e20c1dafcdbdb45c2d1574da68a32e54a3a5)
2022-04-17 18:51:23 +00:00
Nikita Shulga
fe8eff3711 Revert "Add upgrader related logic to flatbuffer"
This reverts commit dfae96171a.
2022-04-17 11:38:59 -07:00
Han Qi
dfae96171a Add upgrader related logic to flatbuffer
Summary: title

Test Plan: unittest

Differential Revision: D33593056

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71451
Approved by: https://github.com/tugsbayasgalan
2022-04-16 02:04:48 +00:00
Raghavan Raman
c2d5f6a5a4 [nnc] Update bounds overlap analysis to identify non-overlaps even with symbolic bounds
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74658

Approved by: https://github.com/ZolotukhinM
2022-04-14 20:24:03 +00:00
Raghavan Raman
d8ad1a579f [nnc] Fuse loops that have variable bounds
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74346

Approved by: https://github.com/ZolotukhinM
2022-04-14 20:24:03 +00:00
Jiewen Tan
ab0d9b18e9 [LT] Support Tensor.is_alias_of
Summary:
Tensor.is_alias_of relies on Storage to perform. However, LTCTensorImpl was
not implemented with that in mind. This commit adds a fake storage to LazyTensor
as a marker to mark LazyTensors that point to the same storage. The reason
why it's not done at LTCTensorImpl is that LazyTensor maintains the view ops/alias
logic in LazyTensor class instead of relying on TensorImpl to do the check.
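
A minimal sketch of the marker idea, with hypothetical names (`FakeStorage`, `LazyTensorSketch`). It assumes aliasing is decided purely by whether two tensors share the same marker object.

```cpp
#include <cassert>
#include <memory>

// Empty marker type: it holds no data and only provides identity.
struct FakeStorage {};

class LazyTensorSketch {
 public:
  LazyTensorSketch() : storage_(std::make_shared<FakeStorage>()) {}

  // A view shares the marker of the tensor it was created from.
  LazyTensorSketch view() const {
    LazyTensorSketch v;
    v.storage_ = storage_;
    return v;
  }

  // Two tensors alias each other when they point at the same marker.
  bool is_alias_of(const LazyTensorSketch& other) const {
    return storage_ == other.storage_;
  }

 private:
  std::shared_ptr<FakeStorage> storage_;
};
```

Views of views keep the original marker, so `b.view().is_alias_of(a)` holds whenever `b` is itself a view of `a`.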

Test Plan:
./build/bin/test_lazy --gtest_filter=LazyOpsTest.IsAliasOf

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75246

Approved by: https://github.com/bdhirsh
2022-04-14 07:28:03 +00:00
Nikolay Korovaiko
ce842f43f2 Relanding shape cache (75400) (#75710)
Summary:
https://github.com/pytorch/pytorch/pull/75400

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75710

Reviewed By: malfet

Differential Revision: D35598920

Pulled By: Krovatkin

fbshipit-source-id: 2bbbb3d0c24214b5dbb4ca605e7daa94671f96b0
(cherry picked from commit 572f2f9df5bfd73cd7b83536f619bc86d820ccd8)
2022-04-13 17:17:30 +00:00