Commit Graph

401 Commits

Author SHA1 Message Date
Meghan Lele
05b802d4e0 [pytorch] Bring back RemoveInplaceOps() (#62200)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62200

This commit brings back the `RemoveInplaceOps` pass removed in D29523283 (dec5aa2260) that apparently had a bunch of internal users.

Test Plan: danthe3rd

Reviewed By: danthe3rd

Differential Revision: D29833316

fbshipit-source-id: 6cf13d463ab0a5e50ba3eb3243f79a9c51623809
2021-07-28 12:00:38 -07:00
Kimish Patel
026cfe85b4 Fix InlinedCallStack annotation to account for module calling its own (#61791)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61791

methods from forward

During inlining we attach an InlinedCallStack to the nodes being inlined. In
the process we attach module information as well, such that if a
CallMethod is being inlined we know which class instance and class type
the method belongs to. However, a CallMethod can be calling a method of
the same object to which the graph belongs, e.g.:

```
def forward(self, input):
  x = input + 10
  return self.forward_impl_(x, input)
```
Here `forward_impl_` is a method defined on the same class in which `forward`
is defined. The existing module hierarchy annotation would mislabel this as an
unknown instance, since the method is not associated with the output of a
GetAttr node (it would be if we had called `self.conv.forward_impl_`, for
example).
The change in this PR resolves this by creating a placeholder name "SELF"
for the module instance, indicating that you can traverse the InlinedCallStack
backwards to find the first node with name != SELF, which is the name
of the object.
e.g.:
TOP(ResNet)::forward.SELF(ResNet)::_forward_impl.layer1(Sequential)::forward.0(BasicBlock)::forward.conv1(Conv2d)::forward.SELF(Conv2d)::_conv_forward
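
As a hedged illustration (pure-string sketch; the helper below is hypothetical), recovering the instance name for a SELF frame just means walking the dotted callstack backwards to the first non-SELF entry:

```python
def instance_name(hierarchy: str) -> str:
    # Frames look like "name(Type)::method" and are joined with ".".
    # Skip SELF placeholders from the right; the first real name wins.
    for frame in reversed(hierarchy.split(".")):
        name = frame.split("(")[0]
        if name != "SELF":
            return name
    return "TOP"

# instance_name("conv1(Conv2d)::forward.SELF(Conv2d)::_conv_forward")
# returns "conv1": the Conv2d instance whose method was inlined.
```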

Test Plan:
Add test

Imported from OSS

Reviewed By: larryliu0820

Differential Revision: D29745443

fbshipit-source-id: 1525e41df53913341c4c36a56772454782a0ba93
2021-07-26 15:00:57 -07:00
Richard Barnes
ee44d73e59 Modernize override (#61744)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61744

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D29717320

fbshipit-source-id: 6eea4295ee2e5572ab337620be412376fcc2f3cc
2021-07-23 23:04:46 -07:00
Nikita Shulga
a9b0a921d5 Disable avoid-non-const-global-variables lint check (#62008)
Summary:
As the GoogleTest `TEST` macro is non-compliant with it, as is `DEFINE_DISPATCH`.

All changes but the ones to `.clang-tidy` are generated using following script:
```
for i in `find . -type f -iname "*.c*" -or -iname "*.h" | xargs grep cppcoreguidelines-avoid-non-const-global-variables | cut -f1 -d: | sort | uniq`; do
  sed -i "/\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)/d" "$i"
done
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62008

Reviewed By: driazati, r-barnes

Differential Revision: D29838584

Pulled By: malfet

fbshipit-source-id: 1b2f8602c945bd4ce50a9bfdd204755556e31d13
2021-07-22 18:04:40 -07:00
Michael Suo
04043d681e [package] fix storage serialization collision (#61806)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61806

Currently, if you do `save_pickle` on a ScriptModule, then `save_pickle`
on a tensor, this would result in a `0.storage` record being written
*twice* to the zip archive. This would cause weird bugs on the
deserializing side (this presented as an ASAN-detected heap buffer overflow
because we tried to read more memory from a tensor than we actually
had).

Turns out this was because when we did:
```
self.storage_context = self.script_module_serializer.storage_context()
```
it returned a new copy of the storage context, so we weren't actually
assigning unique names to tensors!!

This PR fixes the issue by making `(De)SerializationStorageContext`
non-copyable and fixing up the parts of the bindings that returned it by
copy.
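
A hedged repro sketch of the failing pattern (file name, package names, and extern rules are arbitrary/simplified):

```python
import torch
from torch.package import PackageExporter

scripted = torch.jit.script(torch.nn.Linear(2, 2))

with PackageExporter("out.pt") as exporter:
    exporter.extern("**")  # keep the sketch self-contained
    # Before this fix, both calls below could claim the same
    # "0.storage" record, corrupting the archive.
    exporter.save_pickle("model", "mod.pkl", scripted)
    exporter.save_pickle("tensors", "t.pkl", torch.rand(3))
```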

Differential Revision: D29748969

Test Plan: Imported from OSS

Reviewed By: Lilyjjo

Pulled By: suo

fbshipit-source-id: c2f89ab270e07e7a111fb35c545b5e07b804dc3c
2021-07-19 18:22:36 -07:00
Meghan Lele
5144381b1d [pytorch][JIT] Widen exception caught by ScriptList casting (#61520)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61520

This commit widens the exception caught by the try-catch block that checks if
an object passed to a scripted function is a `ScriptList`. It turns out that
there are internal tests that do not throw a `py::cast_error` so catching only
that is not sufficient.

Test Plan: Ran the failing tests in T94889011.

Reviewed By: Chillee

Differential Revision: D29560815

fbshipit-source-id: 442258f8997146d833a9d5db923e1f6359f2bfdd
2021-07-12 23:20:58 -07:00
Gary Miguel
dec5aa2260 [JIT] clean up (#60390)
Summary:
* Minor: spelling, grammar.
* Add calls to `GRAPH_DUMP()` where they were missing.
* Add or expand a few comments.
* Move a few comments to seemingly more appropriate spots.
* In canonicalize_graph_fuser_ops.cpp inline `runnableInputs()` since it
  was only called in one place and had a misleading comment and
  confusing name.
* In `PeepholeOptimizeImpl::optimizeBlock()`, set `changed = true;` when
  removing `aten::is_complex`. Pretty sure its absence was a bug.
* Delete unused `_jit_pass_remove_inplace_ops` and its
  implementation `RemoveInplaceOps()`.
* In `preprocessCaffe2Ops()`, remove redundant check for nested optional
  types. It was already checked in `checkONNXCompatibility()`.
* In `EncoderBase::AddAttribute`, log the unexpected attribute kind.
  I don't remember the repro case now but I did hit this error at some
  point and this additional logging made it easier to understand.
* In `fuseConvBatchNorm()` in eval_peephole.cpp, consistently use
  camelCase instead of snake_case for local variables.
* Add curly braces around the bodies of if and loops.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60390

Reviewed By: Krovatkin

Differential Revision: D29523283

Pulled By: SplitInfinity

fbshipit-source-id: 4e16c5648616f53da07d68dab7fdf252e06a0752
2021-07-09 16:28:27 -07:00
BowenBao
95a7f3ccfe [ONNX] Fix shape inference for large model (#59320) (#60244)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60244

Perform the 2GB size check for protocol buffer serialization at a later time, to avoid false alarms in cases like shape inference where no serialization actually happens.

Test Plan: Imported from OSS

Reviewed By: zou3519, ZolotukhinM

Differential Revision: D29494910

Pulled By: SplitInfinity

fbshipit-source-id: 4c36d26de9a94e5d6cf78f332d4dffc46588ebf0

Co-authored-by: BowenBao <bowbao@microsoft.com>
2021-07-08 16:29:22 -07:00
Meghan Lele
4a2e8b53bb [JIT] Add `torch._C.ScriptList` (#52832)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52832

**Summary**
This commit adds `torch._C.ScriptList`, a list type that has reference
semantics across the Python/TorchScript boundary. That is, modifications
made in TorchScript to instances of `torch._C.ScriptList`
are visible in Python even when it is not returned from the function.

`torch._C.ScriptList` is implemented using a modified version of pybind's
`stl_bind.h`-style bindings attached to `ScriptList` and `ScriptListIterator`,
wrapper classes around `c10::impl::GenericList` and
`c10::impl::GenericList::iterator`. These bindings allow instances of
`torch._C.ScriptList` to be used as if they were
regular `list`s in Python. Reference semantics are achieved by simply
retrieving the `IValue` contained in `ScriptList` in `toIValue` (invoked
when converting Python arguments to `IValues` before calling TorchScript
code).
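
A hedged behavior sketch (construction by passing a plain list to `torch.jit.script`, mirroring the `ScriptDict` flow described elsewhere in this log):

```python
from typing import List

import torch

@torch.jit.script
def append_one(xs: List[int]) -> None:
    xs.append(1)

xs = torch.jit.script([0])  # assumed to return a torch._C.ScriptList
append_one(xs)
print(list(xs))  # [0, 1]: the TorchScript-side mutation is visible in Python
```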

**Test Plan**
This commit adds `TestScriptList` to `test_list_dict.py`, a set of tests
that check that all of the common list operations are supported
and that instances have reference semantics across the
Python/TorchScript boundary.

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D29478121

Pulled By: SplitInfinity

fbshipit-source-id: 652cc25cfa37debe28db9527504846f22abd8b54
2021-07-01 20:28:13 -07:00
Mike Guo
6ecc1a4c4f Make pytorch clang-tidy clean (#60649)
Summary:
This PR suppresses clang-tidy warnings in the codebase (for now) so that we can re-enable clang-tidy checks on master.

I ran this script to add the `NOLINTNEXTLINE` comments (on a devserver):
```bash
python3 setup.py develop

# Uses same script that's run on CI and adds the -j (parallel), -s (add comments), -k (continue if diagnostic errors are found) options
python3 tools/clang_tidy.py \
  -j \
  -s \
  -k \
  -v \
  --paths torch/csrc/ \
  -g"-torch/csrc/jit/passes/onnx/helper.cpp" \
  -g"-torch/csrc/jit/passes/onnx/shape_type_inference.cpp" \
  -g"-torch/csrc/jit/serialization/onnx.cpp" \
  -g"-torch/csrc/jit/serialization/export.cpp" \
  -g"-torch/csrc/jit/serialization/import.cpp" \
  -g"-torch/csrc/jit/serialization/import_legacy.cpp" \
  -g"-torch/csrc/onnx/init.cpp" \
  -g"-torch/csrc/cuda/nccl.*" \
  -g"-torch/csrc/cuda/python_nccl.cpp" \
  -g"-torch/csrc/autograd/FunctionsManual.cpp" \
  -g"-torch/csrc/generic/*.cpp" \
  -g"-torch/csrc/jit/codegen/cuda/runtime/*" \
  -g"-torch/csrc/deploy/interpreter/interpreter.cpp" \
  -g"-torch/csrc/deploy/interpreter/interpreter.h" \
  -g"-torch/csrc/deploy/interpreter/interpreter_impl.h" \
  -g"-torch/csrc/deploy/interpreter/test_main.cpp"
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60649

Test Plan: Verified changes by re-running the script (without the `-s` option) and seeing no warnings/errors.

Reviewed By: walterddr, janeyx99

Differential Revision: D29504258

Pulled By: 1ntEgr8

fbshipit-source-id: 78310b30ee8213b73ddb4771ad874665323e7a4e
2021-07-01 12:21:07 -07:00
Meghan Lele
6c1c1111de [JIT] Add reference semantics to TorchScript classes (#44324)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44324

**Summary**
This commit adds reference semantics to TorchScript class types;
modifications made to them within TorchScript will be visible in Python.
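
A hedged sketch of the behavior this enables (class and function names arbitrary):

```python
import torch

@torch.jit.script
class Counter(object):
    def __init__(self):
        self.n = 0

@torch.jit.script
def bump(c: Counter) -> None:
    c.n += 1

c = Counter()
bump(c)
print(c.n)  # 1: the mutation made inside TorchScript is visible in Python
```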

**Test Plan**
This commit adds a unit test to `TestClassType` that checks that
modifications made to a class type instance passed into TorchScript are
visible in Python after executing the scripted function or module.

**Fixes**
This commit closes #41421.

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D24912807

Pulled By: SplitInfinity

fbshipit-source-id: d64ac6211012425b040b987e3358253016e84ca0
2021-06-30 14:27:17 -07:00
Mengwei Liu
10fc58620e [PyTorch][NASProfiler] Add moduleHierarchy Python API to print out hierarchical information about a Node (#60384)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60384

Currently, inlining a module graph drops the module hierarchy info on the Python side. Here we retrieve the module hierarchy from the C++ side and expose it through a new Python API on Node called `moduleHierarchy()`.

Test Plan:
Usage:
```
torch._C._jit_pass_inline(module.graph)
torch._C._jit_pass_propagate_shapes_on_graph(module.graph)
node = module.graph.findNode("quantized::conv2d_relu")
'top(' + module.original_name + ').' + node.moduleHierarchy() + '.' + node.kind()
```
Output:
```
'top(QuantWrapper).module(FBNetHR).0(Sequential).xif0_0(ConvBNRelu).conv(ConvReLU2d).quantized::conv2d_relu'
```

Reviewed By: kimishpatel

Differential Revision: D29252169

fbshipit-source-id: 74163a87f919e061e5e75dfebc4c5cdbe8489d93
2021-06-30 01:32:31 -07:00
Bert Maher
93772792e3 [nnc] Get rid of fuser trigger counters (#57334)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57334

Here's a possibly controversial PR.  These counters got in the way of
generalizing the fuser tests to handle arbitrary devices, and I guess I'm just
generally skeptical that they provide much value.  While it's true that they let us
observe whether fusion groups were created, we already have assertions based on
the shape of the graph, and I'm not sure that I trust those any less than these
counters.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D29471484

Pulled By: bertmaher

fbshipit-source-id: f6d76f6e72dbfb581acff1d834b0c74500941b57
2021-06-29 22:22:15 -07:00
Lily Johnson
0dd90cceaf [package] track storages across lifetime of PackageExporter (#59735)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59735

1. Fixes the ABA storage-identity problem during serialization for `torch.package` by keeping a reference to serialized storages for the lifetime of the `PackageExporter`, preventing reuse of a memory address. Achieved by extending the logic used to solve mobile's version of the same issue.
2. Adds determinism to the naming scheme of serialized storages in export code paths which utilize `tensor_cdata_naming_scheme` (introduced a 2nd mapping in `StorageContext`; it now maps `storage cdata ptr` -> `unique id` and `unique id` -> `c10::Storage`)
3. Additionally uses the presence of a storage in the `StorageContext` instance as a marker for whether a storage has been serialized, removing the need to scan the `PythonStreamWriter` for the presence of the storage's serialization file

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D29075276

Pulled By: Lilyjjo

fbshipit-source-id: 15a5c30b1de99c5bd7079388f2db9b6ece2eca12
2021-06-29 14:16:54 -07:00
Ansley Ussery
0fbc471d10 Support default values on NamedTuple fields (#54682)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54682

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D27327241

Pulled By: ansley

fbshipit-source-id: 76546f1770d50ebc3435bba3b74540e3c6be8a1c
2021-06-26 15:18:21 -07:00
Hariom Narang
9d1d799034 Added API to change logging levels for JIT (#58821)
Summary:
Description:
- Before this, the logging level could only be changed via the env
variable "PYTORCH_JIT_LOG_LEVEL"
    - The level can now be changed from Python
- Have not added stream configuration for now
- Configuration is stored in a singleton class managing the options

Issue Link: https://github.com/pytorch/pytorch/issues/54188

Gotchas:
- Created separate functions
`::torch::jit::get_jit_logging_levels/set_jit_logging_levels` instead of
using the singleton class's method directly
    - This is because when running test cases, two different instances
    of the singleton are created for the test suite and the actual code
    (`jit_log.cpp`)
    - On using these methods directly, `is_enabled` calls the singleton
    in `jit_log.cpp` while we are setting the config using another
    singleton
    - See: https://stackoverflow.com/questions/55467246/my-singleton-can-be-called-multiple-times

API (usage sketch below):
- To set the level: `torch._C._jit_set_logging_option("level")`
- To get the level: `torch._C._jit_get_logging_option()`
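
A hedged usage sketch of the two functions above:

```python
import torch

old = torch._C._jit_get_logging_option()
torch._C._jit_set_logging_option(">dead_code_elimination")
# ... trace or script something; [DUMP]/[UPDATE] logs go to stderr ...
torch._C._jit_set_logging_option(old)  # restore the previous setting
```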

Testing:
- UTs were added for C++
- A very simple UT was added for python to just check if the API is
being called correctly
- The API was checked by running trace in a sample python file
    - Set env variable to "" and used `_jit_set_logging_option` in python to set the variable to `>dead_code_elimination`
    - The error output had logs of form [DUMP..] [UPDATE...] etc

Fixes https://github.com/pytorch/pytorch/issues/54188

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58821

Reviewed By: soulitzer

Differential Revision: D29116712

Pulled By: ZolotukhinM

fbshipit-source-id: 8f2861ee2bd567fb63b405953d035ca657a3200f
2021-06-21 16:10:49 -07:00
Richard Barnes
b162d95e46 Fix a number of lint perf and safety issues in torch (#59897)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59897

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D29037012

fbshipit-source-id: 7c16286d5fc2b67964fb65f8374dfff4d1a7aefb
2021-06-15 13:14:51 -07:00
Meghan Lele
d9d7d5e24a [torch] Remove migration warning for ScriptDict
Summary:
This commit removes the warning that suggests that users script their
dictionaries before passing them into TorchScript code. The ScriptDict feature
is not fully ready, so it does not make sense to recommend this yet.

Test Plan:
Sandcastle.

In addition, the PyPER test broken by the original diff passes:

```
buck test mode/opt //caffe2/torch/fb/training_toolkit/backend/tests:test_model_materializer_full_sync_lwt -- --exact 'caffe2/torch/fb/training_toolkit/backend/tests:test_model_materializer_full_sync_lwt - caffe2.torch.fb.training_toolkit.backend.tests.test_model_materializer_full_sync_lwt.ModelMaterializerFullSyncLwtTest: test_materialization_determinism_cpu' --run-disabled
```

Differential Revision: D28891351

fbshipit-source-id: 2a3a00cde935d670fb1dc7fd8c709ae9c2ad8cdc
2021-06-03 20:55:40 -07:00
Bin Bao
add291cf66 [JIT] Add a phase to perform inplace<->functional conversion for activation operators (#57477)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57477

Currently the conversion only deals with activation operators. The legality check is somewhat strict for now.
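
A hedged sketch of the functional-to-inplace direction (the pass binding name is inferred from the test names below):

```python
import torch

def f(x):
    y = x + 1            # y is a temporary, so mutating it in place is legal
    return torch.relu(y)

g = torch.jit.script(f).graph
torch._C._jit_pass_functional_to_inplace_activation(g)
print(g)  # expect aten::relu to be rewritten as aten::relu_
```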

Test Plan:
```
python test/test_jit.py -k test_functional_to_inplace_activation
python test/test_jit.py -k test_inplace_to_functional_activation
```

Reviewed By: mrshenli

Differential Revision: D28155153

Pulled By: desertfire

fbshipit-source-id: df092830c4dff3ce9578ff76285eb7a566b7d81b
2021-06-03 06:43:23 -07:00
Richard Barnes
3979cb0656 irange for size_t (#55320)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55320

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D27572577

fbshipit-source-id: 97710fd2bb1303006b05828a0d1343b0b59ccb03
2021-06-03 01:04:13 -07:00
Meghan Lele
484d53f4a0 [torch][JIT] Warn only once when using unscripted dictionary (#59287)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59287

D27211605 added a warning in `toIValue` that tells users to script their
dictionaries before passing them to TorchScript functions in order to get some
performance benefits and reference semantics. However, this warning is emitted
every time `toIValue` is called (e.g. whenever a dictionary is passed to a
TorchScript function), which can lead to noisy log output. This diff changes
this to use `TORCH_WARN_ONCE` instead.

Test Plan: Sandcastle, OSS CI.

Reviewed By: hyuen

Differential Revision: D28824468

fbshipit-source-id: e651eade4380abaf77c6c8a81ec4e565b0c2c714
2021-06-02 11:41:37 -07:00
eellison
d8cbba3ee2 [JIT] Disable Complete Shape Inlining For Testing Purposes (#56966)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56966

This PR adds a toggle to shape analysis that prevents inlining complete tensor shapes as constants into the shape compute graph, which is a good stress test of the partial evaluation pipeline.

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D28444664

Pulled By: eellison

fbshipit-source-id: a62e424515a8837a4b596546efa93af5e8e61f10
2021-05-27 17:57:48 -07:00
eellison
f66fbb1e2e Add unary/binary ops necessary for mobilenet (#56828)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56828

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D28444660

Pulled By: eellison

fbshipit-source-id: 656673e6139550f2752c0d3ac2fb8731f4bf9bbb
2021-05-27 17:56:30 -07:00
Meghan Lele
b14c3205fd [JIT] Add torch._C.ScriptDict (#52659)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52659

**Summary**
This commit adds `torch._C.ScriptDict`, a dictionary type that has reference
semantics across the Python/TorchScript boundary. That is, modifications
made to instances of `torch._C.ScriptDict` in TorchScript are visible in
Python even when it is not returned from the function. Instances can be
constructed by passing an instance of a Python dictionary to
`torch.jit.script`. In the case of an empty dictionary, its type is
assumed to be `Dict[str, Tensor]` to be consistent with the handling of
empty dictionaries in TorchScript source code.

`torch._C.ScriptDict` is implemented using a modified version of pybind's
`stl_bind.h`-style bindings attached to `ScriptDict`, `ScriptDictIterator` and
`ScriptDictKeyIterator`, wrapper classes around `c10::impl::GenericDict` and
`c10::impl::GenericDict::iterator`. These bindings allow instances of
`torch._C.ScriptDict` to be used as if they were regular `dict`s in Python.
Reference semantics are achieved by simply retrieving the `IValue` contained
in `ScriptDict` in `toIValue` (invoked when converting Python arguments to
`IValues` before calling TorchScript code).
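
A hedged behavior sketch based on the description above:

```python
from typing import Dict

import torch

d = torch.jit.script({"a": torch.ones(1)})  # returns a torch._C.ScriptDict

@torch.jit.script
def add_key(d: Dict[str, torch.Tensor]) -> None:
    d["b"] = torch.zeros(1)

add_key(d)
print(d["b"])  # the key added inside TorchScript is visible in Python
```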

**Test Plan**
This commit adds `TestScriptDict` to `test_list_dict.py`, a set of tests
that check that all of the common dictionary operations are supported
and that instances have reference semantics across the
Python/TorchScript boundary.

Differential Revision: D27211605

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Pulled By: SplitInfinity

fbshipit-source-id: 446d4e5328375791aa73eb9e8b04dfe3465af960
2021-05-27 10:25:30 -07:00
Ansley Ussery
5268b5a29a Add parsing logic for Tuple[()] annotation (#58340)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58340

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D28459502

Pulled By: ansley

fbshipit-source-id: 4bb188448d66269b42b068858b895debac86e9ee
2021-05-25 12:12:43 -07:00
Kimish Patel
e067675167 [Pytorch] Provide API to preserve source range and callstack information during graph rewrite (#58300)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58300

Current state: graph rewriting that fuses nodes or adds nodes can
result in new nodes without the debug information that was available in
the original nodes. Thus we lose this information during graph rewrites.

This PR changes the graph rewriting API to let the user specify how the values
in the replacement pattern map to values in the pattern to be matched.
The graph rewriter will then copy the source range and inlined callstack
from the matched nodes onto the nodes being inserted.

(Note: this ignores all push blocking failures!)

Test Plan:
python test/test_jit.py
TestJit.test_pattern_based_rewrite_with_source_range_preserved

Imported from OSS

Reviewed By: malfet

Differential Revision: D28512465

fbshipit-source-id: 863173c29de726be85b3acbd3ddf3257eea36d13
2021-05-25 09:18:59 -07:00
Meghan Lele
0b8931fe4b [torch][JIT] Predicate uses of RPC APIs on torch.distributed.rpc.is_available() (#58887)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58887

There are some callsites of `torch.distributed.rpc.XXX` APIs that are compiled
or not based on `USE_RPC`. However, `torch::deploy`, at least for now,
is compiled with `USE_RPC=1`, but the `torch.distributed.rpc.XXX` APIs used by
the aforementioned pieces of code are not available (i.e.
`torch.distributed.rpc.is_available()` returns `False`). This can cause
Torchscript compilation to fail, even if the code being compiled doesn't use
RPC.

This commit fixes this problem (at least temporarily) by predicating the use
of all these `torch.distributed.rpc` APIs on the value of
`torch.distributed.rpc.is_available()`.
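
Sketched, the guard pattern this amounts to (module-level usage hypothetical):

```python
import torch.distributed.rpc as rpc

if rpc.is_available():
    # Only touch RPC APIs (and let TorchScript compile code using them)
    # when the backend actually exists.
    from torch.distributed.rpc import RRef
```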

Test Plan: Ran packaged XLM-R model with C++ benchmark.

Reviewed By: suo

Differential Revision: D28660925

fbshipit-source-id: fbff7c7ef9596549105e79f702987a53b04ba6f9
2021-05-24 21:53:53 -07:00
Zhengxu Chen
2b0ec9c3cf Reapply "[jit] Implement ScriptProfile to collect instruction profiles." (#58783)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58783

This reverts commit fc804b5def.

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D28617037

Pulled By: zhxchen17

fbshipit-source-id: 645de2ede20500a5c218d6ec3c7faae94de37a14
2021-05-24 18:23:21 -07:00
Jacob Szwejbka
1c5f63d86d [Pytorch Edge] Model Ops compatibility api (#57501)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57501

Add an api `_get_model_ops_and_info` to get the root operators and versioning info of a model, in both cxx and python; the input can come from a file path or a buffer.
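
A hedged usage sketch (import path assumed; the commit names only the API):

```python
from torch.jit.mobile import _get_model_ops_and_info

ops_info = _get_model_ops_and_info("model.ptl")  # a buffer works too
for op, info in ops_info.items():  # assuming a dict of op name -> info
    print(op, info)
```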
ghstack-source-id: 129620112

Test Plan: unit test.

Reviewed By: xcheng16, raziel

Differential Revision: D28162765

fbshipit-source-id: 4413c1e906b8a872e4a717d849da37347adbbea4
2021-05-24 12:00:06 -07:00
Elias Ellison
5313bafd31 [JIT] integer value refinement (#56438)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56438

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D27924239

Pulled By: eellison

fbshipit-source-id: ace54fcb594853f30c242369ea203b0eb5527ac1
2021-05-21 08:51:01 -07:00
Elias Ellison
5cebf29b4e Add list len refinement (#55926)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55926

This is necessary for code like conv2d, where we wish to share the generic convolution shape-function logic with conv2d but, for conv2d, always infer that the output has dimension 4. I'm also hoping the refinement algorithm here could be refactored out and used to support refining tensor types from user annotations. I have a lengthy comment explaining how this works, and the logic outside of data structures is pretty small and contained. Additionally, you might check out https://fb.quip.com/X7EVAdQ99Zzm for a very similar description of how to refine values based on comparison operators.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D27750997

Pulled By: eellison

fbshipit-source-id: d962415af519ac37ebc9de88f2e1ea60a1374f7c
2021-05-21 08:50:54 -07:00
Elias Ellison
9fd2306036 Add handling of symbolic shapes (#55925)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55925

This sets up the initial handling of symbolic shapes. As in the test, it doesn't work perfectly yet because it needs a couple of other optimization passes. The basic description is pretty simple: we resolve tensor dimension indices to the same Value *, and before extracting the output Tensor shape we substitute in symbolic shapes. We don't substitute during optimization because they are represented as negative numbers, and we don't want them inadvertently used in constant prop or something else.
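
A toy stand-in for the encoding described above (the real logic is C++; negative ids can never be concrete sizes, which is what keeps constant prop from folding them):

```python
sym_ids = {}

def sym_dim(name: str) -> int:
    # Equal symbols map to the same negative sentinel: -1, -2, ...
    if name not in sym_ids:
        sym_ids[name] = -(len(sym_ids) + 1)
    return sym_ids[name]

shape = [sym_dim("batch"), 3, sym_dim("batch")]  # [-1, 3, -1]
```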

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D27750996

Pulled By: eellison

fbshipit-source-id: 6984e7276b578f96b00fc2025cef0e13f594b6e6
2021-05-21 08:50:52 -07:00
Elias Ellison
f39471a171 Initial Symbolic Shape Analysis (#54809)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54809

I'm going to post on dev-discuss soon with a more thorough explanation of the design and advantages of this shape analysis, so I'm leaving out that for now.

There is still a ton left to do; I'm posting this initial version so we can get something on master that multiple people can work on. Many remaining steps to do:

- [ ] Add symbolic shapes support
- [ ] Bind shape functions for operators in C++
- [ ] Make classes of operators share the same shape function (e.g. pointwise, broadcast two inputs)
- [ ] Refactor APIs
- [ ] Only iteratively optimize shape function while a change has been made
- [ ] Expand coverage to common ops
- [ ] Add shape analysis pass on Graph that handles Ifs and Loops
- [ ] Allow concurrent reads to the operator map
- [ ] Successive applications of same inputs to same shape function (e.g. series of pointwise ops)

For this review, I am mostly looking for comments related to the implementation of symbolic_shape_analysis.cpp, with the caveats listed above. I am not really looking for comments related to api/registration/graph-level analysis, as those are all planned to be changed. I am fine landing this as is or waiting until the necessary components of the TODOs above are finished.

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D27750998

Pulled By: eellison

fbshipit-source-id: 4338b99e8651df076291c6b781c0e36a1bcbec03
2021-05-21 08:49:46 -07:00
Edward Yang
fc804b5def Revert D28133579: [jit] Implement ScriptProfile to collect instruction profiles.
Test Plan: revert-hammer

Differential Revision:
D28133579 (034a238bab)

Original commit changeset: e7e30e961513

fbshipit-source-id: 5a7756468b4f2eeed24d2abb7b52ab46d081a95e
2021-05-21 08:18:40 -07:00
Zhengxu Chen
034a238bab [jit] Implement ScriptProfile to collect instruction profiles. (#57397)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57397

Introduces two main classes in C++ runtime:

ScriptProfile is the implementation for enabling and disabling interpreter
profiling in C++. This should only be used from Python, and we will add
a corresponding Python API in the next diff.

InstructionSpan is a utility class to instrument execution of each single
instruction. A start timestamp is recorded in the constructor, and an end
timestamp is recorded in the destructor. During destruction, this will send
runtime data to all enabled ScriptProfile instances.

Test Plan:
build/bin/test_jit --gtest_filter='ScriptProfileTest.Basic'

Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D28133579

fbshipit-source-id: e7e30e96151367022793ab3ad323f01c51ad4a3b
2021-05-20 14:11:03 -07:00
Raghavan Raman
3fe72d30dc [NNC] Optimize conditionals that correspond to the form generated for aten::cat op. (#57673)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57673

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D28231374

Pulled By: navahgar

fbshipit-source-id: 1777a63df4e5ebed6d515683bd772a88be465b3a
2021-05-18 14:23:48 -07:00
Luca Wehrstedt
5a238eb96e Fix deadlock in Future due to lock inversion with GIL (#58382)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58382

Calling markCompleted on a Future now first acquires the Future's mutex (as usual) but then sometimes tries to acquire the GIL during the DataPtr extraction while still holding the Future's mutex. (This happens when the value passed to markCompleted is a Python object). This can cause a deadlock if someone else calls any of the other methods of Future while holding the GIL.

There are two solutions to this: avoid holding the Future's mutex when extracting DataPtrs, or avoid holding the GIL while invoking the Future's methods. In this PR I'm going for the latter, because it's a very simple immediate fix, but I believe this is brittle and that we should probably also consider the former fix.
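
A framework-independent sketch of the inversion (pure-Python stand-ins for the GIL and the Future's mutex):

```python
import threading

gil = threading.Lock()           # stand-in for the GIL
future_mutex = threading.Lock()  # stand-in for the Future's mutex

def mark_completed_with_py_value():
    with future_mutex:           # Future's mutex first...
        with gil:                # ...then the GIL for DataPtr extraction
            pass

def call_future_method_holding_gil():
    with gil:                    # caller already holds the GIL...
        with future_mutex:       # ...locks taken in the opposite order
            pass                 # -> classic deadlock recipe
```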
ghstack-source-id: 129105358

Test Plan: The repro in https://github.com/pytorch/pytorch/issues/58239 now doesn't deadlock.

Reviewed By: mrshenli

Differential Revision: D28472816

fbshipit-source-id: 1bc9bca426dd004f9eb2568db1ffd38f014450e2
2021-05-17 10:53:19 -07:00
Lillian Johnson
9403fe17ce [torch.package/TorchScript] logic to enable sharing of tensors on load (#57573)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57573

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D28226975

Pulled By: Lilyjjo

fbshipit-source-id: bc8cb3e8052fa18336c437e0601d8b0028fd1895
2021-05-14 08:21:43 -07:00
Lillian Johnson
3ad11803f7 [torch.Package/TorchScript] ScriptModuleSerializer add unified format (#56299)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56299

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D27832545

Pulled By: Lilyjjo

fbshipit-source-id: 1b2880a8458f99bd66a8c9656c5ca700f43cffe8
2021-05-14 08:21:40 -07:00
Lillian Johnson
07de11c26d [torch.Package/TorchScript] TS serialization importer to handle unified format (#54891)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54891

Changed TorchScript's jit/serialization importer logic to handle both the original TS serialization format and the new unified TS format

Original TS file format:
```
resnet.pt
├── data  # tensor data
│   ├── 94286146172688
│   ├── 94286146172784
│   └── ...
├── code/  # TorchScript code
│   ├── __torch__
│   │   ├── torch
│   │   │   └── nn ...
│   │   └── torchvision ...
│   ├── __torch__.py
│   └── __torch__.py.debug_pkl
├── data.pkl  # the ScriptModule object, pickled by the TS pickler
├── version  # version metadata
├── constants.pkl  # any tensor constants present in the TS code
└── extra
     ├── name_of_file
     └── foo
```

Unified file format:
```
─── package_name.pt
    ├── .data
    │   ├── ts_code # code shared between models
    │   │   ├── 0
    │   │   │   ├── constants.pkl
    │   │   │   └── data.pkl
    │   │   ├── 1
    │   │   │   ├── constants.pkl
    │   │   │   └── data.pkl
    │   │   └── code
    │   │       ├── __torch__
    │   │       │   ├── torch
    │   │       │   │   └── nn ...
    │   │       │   └── torchvision ...
    │   │       ├── __torch__.py
    │   │       └── __torch__.py.debug_pkl
    │   ├── 0.storage
    │   ├── 1.storage
    │   ├── <many more storages>
    │   ├── 201.storage
    │   ├── extern_modules
    │   └── version
    └── res
        ├── mod.pkl  # maps to ts_id 0 and .data/ts_code/0
        └── mod2.pkl # maps to ts_id 1 and .data/ts_code/1
```

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D27832548

Pulled By: Lilyjjo

fbshipit-source-id: 4a6e84c3a9bac8eed6a4e4afc2ac76dd691858b0
2021-05-14 08:20:34 -07:00
Dhruv Matani
38e606d056 [RFC] Add method torch.jit._clone_module_with_class (#56152)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56152

Currently, the Bundled Inputs API mutates the module in-place. It adds class methods and not instance methods. This results in a small problem: one can't re-run an already-executed cell in Bento if the class has already been subject to bundled inputs.

In addition, there is no way to add bundled inputs to a module that already has bundled inputs added. This API provides a way to solve this problem as well: it adds an `ignored_methods` argument to the call to `clone()`, allowing the implementation of bundled inputs to pass in the methods it will add as `ignored_methods`, so that when it does try to add those methods, it will be able to do so successfully.

We'll have to be careful when ignoring those methods during the call to `torch.jit._clone_module_with_class` since any bundled input that relies on a user-provided method will need to be preserved and not ignored during the clone.
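
A hedged call sketch (the commit names the method and the `ignored_methods` idea; the exact signature and the ignored method name are assumptions):

```python
import torch

scripted = torch.jit.script(torch.nn.Linear(2, 2))
# Hypothetical: ignore the methods that bundled-inputs will re-add on the clone.
cloned = torch.jit._clone_module_with_class(scripted, ["get_all_bundled_inputs"])
```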

Looking for feedback on whether this is an acceptable direction.
ghstack-source-id: 128908360

Test Plan:
Added unit test and ran it as `buck test //caffe2/test:mobile`

Also see this Bento Notebook: https://www.internalfb.com/intern/anp/view/?id=550829

Reviewed By: gmagogsfm

Differential Revision: D27788394

fbshipit-source-id: 48109cd4583506d4efdb345e4ba31385db23a273
2021-05-13 22:31:05 -07:00
BowenBao
346dc88bfa [ONNX] Support registering custom export for prim::PythonOp from torch.autograd.Function (#55630) (#57600)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57600

Demo script:

```python
import torch

class MyReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input, scalar_tuple, scalar, scalar_list):
        ctx.save_for_backward(input)
        return input.clamp(min=scalar)
    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input, None, None, None  # no grads for the non-tensor args

class MyModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear_a = torch.nn.Linear(2, 2)
        self.linear_b = torch.nn.Linear(2, 2)
        self.relu = MyReLU.apply
    def forward(self, x):
        h = self.linear_a(x)
        h = self.relu(h, (5, 3), 2, [1, 2, 3])
        h = self.linear_b(h)
        return h

"""
User define how to export prim::PythonOp into custom op.
"""
def symbolic_pythonop(g, n, *args, **kwargs):
    # Print information:
    print('arguments of ', kwargs['name'], ':')
    print('original node: ', n)
    for i, out in enumerate(n.outputs()):
        print('original output {}: {}, requires grad: {}'.format(i, out, out.requiresGrad()))
    import torch.onnx.symbolic_helper as sym_helper
    for i, arg in enumerate(args):
        print('arg {}: {}, requires grad: {}'.format(i, arg, arg.requiresGrad() if sym_helper._is_value(arg) else False))
    for k, v in kwargs.items():
        print('key: ', k, ' v: ', v)

    # TODO: all inputs (tensors and scalars) are in args.
    #       backend can define CustomDomain::PythonOp and how info are stored however it deem fit.
    return g.op("CustomDomain::PythonOp", args[0], name_s=kwargs['name'])

torch.onnx.register_custom_op_symbolic("::prim_PythonOp", symbolic_pythonop, 9)

# Define input.
x = torch.tensor([[0.3971, 0.7544],
                  [0.5695, 0.4388]], requires_grad=True)

model = MyModule()
# Forward.
y = model(x)

torch.onnx.export(model, (x,), 'model.onnx', opset_version=12, verbose=True)
```

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D28393528

Pulled By: SplitInfinity

fbshipit-source-id: e0d55b7c737c5916fda08a3b26b3306037f970df

Co-authored-by: BowenBao <bowbao@microsoft.com>
2021-05-13 13:42:49 -07:00
neginraoof
1de3525ca8 [ONNX] Handle PackedParams inputs for _propagate_and_assign_input_shapes (#56449) (#57079)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57079

Testing onnx 1.9 release, we see that the old bug is triggered for the caffe2 test:
`pytest test/onnx/test_pytorch_onnx_caffe2_quantized.py::TestQuantizedOps::test_small_model`
This is because the graph inputs
```python
graph(%x.1 : Tensor,
      %conv1._packed_params : __torch__.torch.classes.quantized.Conv2dPackedParamsBase,
      %conv2._packed_params : __torch__.torch.classes.quantized.Conv2dPackedParamsBase,
      %fc.bias : Float(10, strides=[1], requires_grad=0, device=cpu),
      %fc.weight : Float(10, 72, strides=[72, 1], requires_grad=0, device=cpu)):
```
contains `Conv2dPackedParamsBase`, which is a PackedParams.
When we flatten, a PackedParams input flattens into several tensors, so the shape inference for the inputs became misaligned.
This PR records how many tensors got flattened out of PackedParams and skips by that number rather than by 1, which makes the UT pass.
Note that the tuple case should still follow the original logic.

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D28393949

Pulled By: malfet

fbshipit-source-id: 98d48aad27e5ca03fb10d260f8e625478d996ee2

Co-authored-by: David <jiafa@microsoft.com>
2021-05-12 15:20:26 -07:00
Chen Lai
8c04593c0a [PyTorch Edge] Add backport to export old bytecode models (#56802)
Summary:
Add an api to backport a model from version n to version i. It accepts an input model (file or buffer) and outputs a model (file or buffer) with the expected bytecode version.

In this change, the input is a model that can come from a file or buffer. The output is a model that can be either a file path or a buffer.

When backport fails, the function returns false with a warning message:
```
/Users/chenlai/pytorch/cmake-build-debug/bin/test_jit --gtest_filter=LiteInterpreterTest.BackPortByteCodeModelV4:LiteInterpreterTest/*.BackPortByteCodeModelV4:*/LiteInterpreterTest.BackPortByteCodeModelV4/*:*/LiteInterpreterTest/*.BackPortByteCodeModelV4 --gtest_color=no
Testing started at 2:32 PM ...
CUDA not available. Disabling CUDA and MultiCUDA tests

[W backport.cpp:419] Warning: Backport doesn't support backport to version3 (function _backport_for_mobile_impl)
Process finished with exit code 0
```
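
A hedged Python usage sketch (the helper is exposed via `torch.jit.mobile` upstream; exact signature assumed):

```python
from torch.jit.mobile import _backport_for_mobile

# Returns True on success; on failure it returns False and emits a
# warning like the one shown above.
ok = _backport_for_mobile("model_v5.ptl", "model_v4.ptl", 4)
```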

## Test
1. Run both `caffe2/test/cpp/jit/test_lite_interpreter.cpp` and `caffe2/test/mobile/test_bytecode.py`.
2. Run all prod models with backport api.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56802

ghstack-source-id: 128425510

Test Plan: CI

Reviewed By: raziel, iseeyuan

Differential Revision: D27844651

fbshipit-source-id: 8a803cf6c76433ee0a3049b1a5570585d569f8d6
2021-05-07 18:14:33 -07:00
Luca Wehrstedt
36e47af58b Pass reference to parent future in callbacks (#57635)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57635

Note: this PR looks massive, but it's just one simple change, codemodded many times.

In many cases, a callback needs to access the value/error produced by the parent future. In Python this was easy because the callback was invoked with the parent future as an argument, and could thus inspect it. In C++ the callbacks didn't take any arguments, so in many cases we worked around this by capturing the future in its own callback. This is risky (it leads to a reference cycle and thus a memory leak) and must be done carefully (spoiler: sometimes we weren't).
ghstack-source-id: 128296580

Test Plan: CI

Reviewed By: wanchaol

Differential Revision: D28178783

fbshipit-source-id: 6de02c4568be42123372edc008f630d5ddae0081
2021-05-07 03:59:18 -07:00
Luca Wehrstedt
8e9bbd3113 Make DataPtr extraction in CUDAFuture faster for Python values (#56918)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56918

Re-importing a Python module each time is a bit expensive, and it's unnecessary because this is a private module which won't change and thus we can cache the value once we first extract it.

ghstack-source-id: 128184666

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D27985910

fbshipit-source-id: be40ae9b67ab8ea6c07bc2cb9a78d2c2c30b35d3
2021-05-06 01:12:53 -07:00
Yi Huang (Symphony)
ba78bf1363 [standaloneRunner] fix another GIL mutithreading issue exposed by torch::jit::toIValue() (#57688)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57688

P412982836 says that `torch::jit::toIValue()` will also touch the GIL through `torch::jit::createGenericDict()` (P412848640),
so we have to move `torch::jit::toIValue()` out of multithreaded execution.

Reviewed By: hyuen

Differential Revision: D28236527

fbshipit-source-id: 43a33dbcfc828cc42c5e1230c8f5cb415bf7bde4
2021-05-05 21:41:04 -07:00
Chen Lai
fb9a32b7b4 [PyTorch][Edge] Add api to get bytecode model version (#56801)
Summary:
Add an api `_get_bytecode_version` to get the version number of a bytecode model, in both cxx and python; the input can come from either a file path or a buffer.
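
A hedged usage sketch (upstream the helper lives in `torch.jit.mobile`; the exact exported name is assumed):

```python
from torch.jit.mobile import _get_model_bytecode_version

version = _get_model_bytecode_version("model.ptl")  # a buffer works too
print(version)  # e.g. 4
```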
## Test
CI (new added unit test will run as part of `pytorch_core-buck`)

1. run test_lite_interpreter.cpp
2. `python test/mobile/test_bytecode.py`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56801

ghstack-source-id: 128169647

Test Plan:
CI (new added unit test will run as part of `pytorch_core-buck`)

1. run test_lite_interpreter.cpp
2. `python test/mobile/test_bytecode.py`

Reviewed By: iseeyuan

Differential Revision: D27961417

fbshipit-source-id: f786cc9573d855feecff0b4fe8e5363e25f5728c
2021-05-05 09:17:26 -07:00
Luca Wehrstedt
58bc003487 Add pybind type caster for c10::Device (#57292)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57292

In Future (and soon in other places too) we need to receive a list of devices from Python-land. We don't want to just take their indices, because we need full devices in order to infer the type from them. torch.device is not defined through pybind; it's defined through a plain `PyModule_AddObject` call with CPython, so pybind isn't naturally able to understand and convert it. However, we can provide a custom type caster which fixes that. We already have this for at::Tensor, at::Generator, ...
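
A hedged Python-side illustration of what the caster unlocks (the `devices=` kwarg on `torch.futures.Future` is assumed here, and it needs a CUDA build):

```python
import torch

# torch.device objects can now cross the pybind boundary directly,
# e.g. when constructing a device-aware Future.
fut = torch.futures.Future(devices=[torch.device("cuda:0")])
```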
ghstack-source-id: 127916268

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28092732

fbshipit-source-id: 1c31d0b85a4d5c9e7bde8161efbb7574d505157c
2021-05-01 16:11:10 -07:00
Scott Wolchok
b87d3fa432 [PyTorch][jit] Don't allow create() on singleton types (#56807)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56807

If I understand correctly, there's no reason to create your own instance of these global singleton types.
ghstack-source-id: 127312270

Test Plan: CI

Reviewed By: SplitInfinity

Differential Revision: D27973447

fbshipit-source-id: f12df69d185f1baaa45f2ac6eac70570a7a65912
2021-04-30 10:28:50 -07:00