Commit Graph

600 Commits

Author SHA1 Message Date
Michael Andreas Dagitses
acd072967a canonicalize includes of form <aten/src/ATen/...>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78033

This was never intended to be supported.

@override-unit-failures
(Note: this ignores all push blocking failures!)

Differential Revision: [D36567054](https://our.internmc.facebook.com/intern/diff/D36567054/)

Approved by: https://github.com/kit1980
2022-06-16 17:46:45 +00:00
Nikolay Korovaiko
8ef6356f26 Reland PySymInt (#79617)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79617
Approved by: https://github.com/Chillee
2022-06-16 04:18:06 +00:00
PyTorch MergeBot
b8db0a0475 Revert "Python Bindings for SymInts (#78135)"
This reverts commit d332724071.

Reverted https://github.com/pytorch/pytorch/pull/78135 on behalf of https://github.com/ezyang due to broke torchvision tests
2022-06-15 13:52:14 +00:00
Nikolay Korovaiko
d332724071 Python Bindings for SymInts (#78135)
This PR adds support for `SymInt`s in python. Namely,
* `THPVariable_size` now returns `sym_sizes()`
* python arg parser is modified to parse PyObjects into ints and `SymbolicIntNode`s
* pybind11 bindings for `SymbolicIntNode` are added, so size expressions can be traced
* a large number of tests added to demonstrate how to implement python symints.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78135
Approved by: https://github.com/ezyang
2022-06-14 02:17:59 +00:00
goldenxuett
2f7ed05f22 Retry - [JIT] Add mutation checks for tensor inputs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79316

Approved by: https://github.com/davidberard98
2022-06-13 18:16:50 +00:00
Michael Andreas Dagitses
ab2ca95dd1 turn on -Werror=unused-variable in our Bazel CPU build
Summary:
We also fix any existing issues. Note that we only do this for the CPU
build because nvcc is considered a C++ toolchain but it does not have
the same flag support. Adding flags to the GPU build will cause nvcc
errors.

Test Plan: Built locally, rely on CI to confirm.

Reviewers: malfet

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79156

Approved by: https://github.com/seemethere, https://github.com/osalpekar, https://github.com/albanD
2022-06-11 02:46:34 +00:00
anjali411
38350acf8f Autogen Tags enum, and allow specifying tags while defining an op
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79322

Approved by: https://github.com/albanD
2022-06-11 00:29:32 +00:00
PyTorch MergeBot
b712467cd1 Revert "Add mutation checks for tensor inputs"
This reverts commit 83c0a2bc38.

Reverted https://github.com/pytorch/pytorch/pull/79078 on behalf of https://github.com/davidberard98 due to broke bazel build-and-test, see https://github.com/pytorch/pytorch/runs/6836001002?check_suite_focus=true
2022-06-10 20:15:30 +00:00
goldenxuett
83c0a2bc38 Add mutation checks for tensor inputs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79078

Approved by: https://github.com/davidberard98, https://github.com/Krovatkin
2022-06-10 18:17:33 +00:00
Luka Mushkudiani
c0a7c1d02e Expose _export_data from C++ to Python (#79207)
Summary:
https://www.internalfb.com/code/fbsource/[477a5768452957f87e56044169de47f051197567]/fbcode/caffe2/torch/csrc/jit/mobile/train/export_data.cpp
export_data is used to serialize data.

I bound this method to Python with pybind11.

Test Plan:
Wrote a file pybind_check.py which checks if the binding works.

Then, tried to read the produced data file from C++ with "torch::jit::_load_parameters" and checked that content matched.

Differential Revision: D37029253

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79207
Approved by: https://github.com/qihqi
2022-06-10 00:41:33 +00:00
Yanan Cao (PyTorch)
67badf0d5c Add missing QSCheme IValue conversion logic (#78862)
Differential Revision: D36913736

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78862
Approved by: https://github.com/suo
2022-06-07 08:34:17 +00:00
goldenxuett
eb49dde9cf Disable TracerWarnings on NNC opinfo tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78756

Approved by: https://github.com/davidberard98
2022-06-03 18:11:12 +00:00
Elias Ellison
26d273959c Add Caching of Conversion to Fake/Meta tensors in FakeTensorMode
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78090

Approved by: https://github.com/ezyang
2022-06-03 13:56:00 +00:00
PyTorch MergeBot
954522a485 Revert "Autogen Tags enum, and allow specifying tags while defining an op"
This reverts commit 9476a78f37.

Reverted https://github.com/pytorch/pytorch/pull/77313 on behalf of https://github.com/malfet due to Broke OSS buck builds, see 9476a78f37
2022-06-03 01:53:53 +00:00
anjali411
9476a78f37 Autogen Tags enum, and allow specifying tags while defining an op
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77313

Approved by: https://github.com/ezyang, https://github.com/albanD
2022-06-03 01:13:44 +00:00
Pavithran Ramachandran
9b81e81771 [PyTorchEdge] Extend Flatbuffer to get mobile_info for NMLML workflows
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78306

Extends to Flatbuffer the feature already available with pickle that lets the NMLML system read mobile model info from the `extra_files` directory.

Differential Revision: [D36609548](https://our.internmc.facebook.com/intern/diff/D36609548/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D36609548/)!

Approved by: https://github.com/iseeyuan
2022-06-01 20:09:09 +00:00
Tugsbayasgalan Manlaibaatar
c7e9eea915 Expose is_out to python
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78591

Approved by: https://github.com/zhxchen17
2022-06-01 07:39:24 +00:00
Elias Ellison
678213ead2 Fake Tensor Part 1
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77969

Approved by: https://github.com/ezyang
2022-05-31 16:20:35 +00:00
Edward Z. Yang
6b273444c4 Add logit ref; allow non-refs to be called in refs.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77816

Approved by: https://github.com/mruberry
2022-05-21 02:35:14 +00:00
Elias Ellison
05ce0f9be6 Add option to disable autocast pass
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77566

Approved by: https://github.com/anijain2305, https://github.com/davidberard98
2022-05-18 14:57:25 +00:00
David Berard
d0dc7cb774 Reland "[JIT] during freezing, cast optional bias to half if weight is half"
Original PR: #77295

Original commit message:
On GPU, conv errors if not all its inputs have the same dtype.

In the case of autocasting during freezing, what we see is:
1) inputs to conv are cast to half
2) inputs to batchnorm are not cast, so many are still floats
3) we try to fold conv + batchnorm, by finding different weight and bias such that conv(input, new_weight, new_bias) is equivalent to the original conv -> batchnorm.

If conv previously had an optional bias, then during freezing we will temporarily create a zero-valued bias as a placeholder for conv_bias. We want to construct it to have the same dtype as the weight input to conv, to avoid errors on GPU.

Reland changes:
There's a memory leak from the CUDA caching allocator that is a side effect of this fix. The memory leak causes the test to fail, though for some reason it didn't fail on CI in the last PR. This skips the tests for now.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77617

Approved by: https://github.com/eellison
2022-05-17 12:25:26 +00:00
PyTorch MergeBot
246078e251 Revert "[JIT] during freezing, cast optional bias to half if weight is half"
This reverts commit 2547be5135.

Reverted https://github.com/pytorch/pytorch/pull/77295 on behalf of https://github.com/malfet
2022-05-17 00:34:51 +00:00
Tugsbayasgalan Manlaibaatar
31d9f7c303 Move other div variants to upgraders map
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73586

Approved by: https://github.com/gmagogsfm
2022-05-16 22:32:15 +00:00
David Berard
2547be5135 [JIT] during freezing, cast optional bias to half if weight is half
On GPU, conv errors if not all its inputs have the same dtype.

In the case of autocasting during freezing, what we see is:
1) inputs to conv are cast to half
2) inputs to batchnorm are not cast, so many are still floats
3) we try to fold conv + batchnorm, by finding different weight and bias such that conv(input, new_weight, new_bias) is equivalent to the original conv -> batchnorm.

If conv previously had an optional bias, then during freezing we will temporarily create a zero-valued bias as a placeholder for conv_bias. We want to construct it to have the same dtype as the weight input to conv, to avoid errors on GPU.
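
The folding step described above can be sketched in plain Python. This is an illustration only, not the code from this PR: the function names are hypothetical, the conv is reduced to a per-channel dot product, and the standard batchnorm folding formula is assumed. Plain Python floats cannot show the dtype mismatch itself, so the comment only marks where the placeholder bias is created.

```python
import math


def fold_conv_bn(weight, bias, bn_mean, bn_var, bn_gamma, bn_beta, eps=1e-5):
    """Fold per-channel batchnorm statistics into conv weight and bias.

    weight: one row of input weights per output channel.
    bias:   per-channel biases, or None if the conv had no bias.
    """
    if bias is None:
        # Zero-valued placeholder. In the real pass this is the point where
        # the placeholder should take its dtype from the conv weight.
        bias = [0.0] * len(weight)
    new_weight, new_bias = [], []
    for row, b, mean, var, gamma, beta in zip(
        weight, bias, bn_mean, bn_var, bn_gamma, bn_beta
    ):
        scale = gamma / math.sqrt(var + eps)
        new_weight.append([w * scale for w in row])
        new_bias.append((b - mean) * scale + beta)
    return new_weight, new_bias


def conv(x, weight, bias):
    """Simplified conv: one dot product per output channel."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def batchnorm(y, mean, var, gamma, beta, eps=1e-5):
    """Inference-mode batchnorm applied per channel."""
    return [
        g * (yi - m) / math.sqrt(v + eps) + bt
        for yi, m, v, g, bt in zip(y, mean, var, gamma, beta)
    ]
```

With these helpers, `conv(x, *fold_conv_bn(w, None, mean, var, gamma, beta))` should match `batchnorm(conv(x, w, zeros), mean, var, gamma, beta)` up to floating-point error.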

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77295

Approved by: https://github.com/eellison
2022-05-16 22:18:47 +00:00
max
25a6aabe71 Expose permute inputs (#77391)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77391
Approved by: https://github.com/eellison
2022-05-13 22:18:51 +00:00
Hongxia Yang
8d34a8325d TorchScript to support capability to rethrow the original python exception (#77093)
Summary:
In order to categorize exceptions/errors, the observability/migration team faced the problem that the exception is currently shown as a RuntimeError, which makes it hard to categorize.

The solution to this problem is to be able to get the original Python exception's class name and message, and hopefully to recreate a Python exception from them.
To support this approach, we did the following in this diff:

(1) TorchScript to translate JITException so that it does not show as RuntimeError
(2) record python exception class name, original message during translation.

Then, later, the python exception can be reconstructed.

(3) Added a new decorator to reconstruct the python exception and then rethrow it.
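
A rough sketch of step (3), assuming the translated error carries the original class name and message as attributes. `TranslatedError` and `rethrow_original` are hypothetical names for illustration and are not the API added by this diff; only builtin exception classes are resolved here.

```python
import builtins
import functools


class TranslatedError(RuntimeError):
    """Hypothetical stand-in for an error that records the original exception."""

    def __init__(self, class_name, message):
        super().__init__(f"{class_name}: {message}")
        self.original_class_name = class_name
        self.original_message = message


def rethrow_original(fn):
    """Rebuild and raise the recorded Python exception when it can be resolved."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except TranslatedError as err:
            exc_type = getattr(builtins, err.original_class_name, None)
            if isinstance(exc_type, type) and issubclass(exc_type, Exception):
                raise exc_type(err.original_message) from err
            raise  # unknown class name: keep the translated error

    return wrapper
```

A function decorated with `@rethrow_original` that raises `TranslatedError("ValueError", "bad input")` would surface to the caller as `ValueError("bad input")`.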

Test Plan:
buck test //caffe2/torch/fb/translate_exception/tests:test_rethrow mode/dev-tsan
```
More details at https://www.internalfb.com/intern/buck/build/1180a788-3767-48e5-a64d-06d284b91a17
BUILD SUCCEEDED
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 24ae6c7c-a647-404e-8f12-d12c762bf728
Trace available for this run at /tmp/tpx-20220507-195320.698499-24ae6c7c-a647-404e-8f12-d12c762bf728/trace.log
RemoteExecution session id: reSessionID-24ae6c7c-a647-404e-8f12-d12c762bf728-tpx
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/8162774413147962
    ✓ ListingSuccess: caffe2/torch/fb/translate_exception/tests:test_rethrow : 3 tests discovered (27.233)
    ✓ Pass: caffe2/torch/fb/translate_exception/tests:test_rethrow - test_one_parameter (test_rethrow.TestTranslateRethrowPythonException) (28.467)
    ✓ Pass: caffe2/torch/fb/translate_exception/tests:test_rethrow - test_no_parameter (test_rethrow.TestTranslateRethrowPythonException) (28.495)
    ✓ Pass: caffe2/torch/fb/translate_exception/tests:test_rethrow - test_2_parameter_with_torch_script_only (test_rethrow.TestTranslateRethrowPythonException) (28.708)
Summary
  Pass: 3
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/8162774413147962

```

Differential Revision: D36166520

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77093
Approved by: https://github.com/qihqi
2022-05-13 16:40:25 +00:00
David Berard
0925597707 [JIT] Support for ParameterDict getattr
Adds support for scripting ParameterDicts and getattr() on them. It does
not support iterating on ParameterDicts because torch/nn/container.py
implementation of ParameterDict.items() uses a generator, which is not
supported by torchscript. torch/nn/container.py would need to be updated
so that iter gets correctly registered in python_sugared_value.cpp

Added a test in test_module_containers.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77143

Approved by: https://github.com/eellison
2022-05-13 01:03:25 +00:00
Henry Tu
f6eb811786 Add RefineTypes JIT pass for Tuple (#76919)
Consider the following JIT graph, where the types of `%a` and `%b` are out of sync with tuple `%c`.
Before:
```
graph(%a : Float(123), %b : Float(4, 5, 6)):
    %c : (Tensor, Tensor) = prim::TupleConstruct(%a, %b)
    return (%c)
```
After:
```
graph(%a : Float(123), %b : Float(4, 5, 6)):
    %c : (Float(123), Float(4, 5, 6)) = prim::TupleConstruct(%a, %b)
    return (%c)
```
This PR adds a pass `RefineTypes(...)` to update all such instances with the correct type. This is also available via Python by using `torch._C._jit_pass_refine_types(...)`.
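
The idea behind the pass can be modeled in a few lines of plain Python. This is a toy model, not the C++ implementation: types are strings, the graph is a dict, and `refine_tuple_types` is a hypothetical name. It assumes tuple constructions are listed in topological order.

```python
def refine_tuple_types(value_types, tuple_constructs):
    """Recompute each tuple's type from the current types of its elements.

    value_types:      value name -> type (a string, or a tuple of types).
    tuple_constructs: output name -> list of input names, in topological order.
    """
    refined = dict(value_types)
    for out, inputs in tuple_constructs.items():
        refined[out] = tuple(refined[name] for name in inputs)
    return refined


types = {
    "a": "Float(123)",
    "b": "Float(4, 5, 6)",
    "c": ("Tensor", "Tensor"),  # stale: out of sync with a and b
}
refined = refine_tuple_types(types, {"c": ["a", "b"]})
```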

A unit test has been added for unnamed tuples, but no test exists for `NamedTuple` (though it was tested manually) since it isn't supported by the parser:
```
RuntimeError:
unknown type specifier:

        graph(%a : Float(123), %b : Float(4, 5, 6)):
          %c : NamedTuple(Tensor : Tuple, Tensor : Tuple) = prim::TupleConstruct(%a, %b)
               ~~~~~~~~~~ <--- HERE
          return (%c)
```

cc: @ke1337 @antoniojkim @wconstab @eellison
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76919
Approved by: https://github.com/eellison
2022-05-12 00:48:39 +00:00
Edward Z. Yang
0a14a4c280 Register prims as operators.
This makes prims look as if they were defined in native_functions.yaml
but they're still all written in Python.  You now need to give a full
schema string for your prims.  The returned prim object is now a
torch.ops.prim overload (prims are not allowed to be overloaded,
so we return the overload, not the overload packet, for speed).

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77117

Approved by: https://github.com/mruberry, https://github.com/albanD
2022-05-11 16:38:14 +00:00
Han Qi
41ff6f8c49 make has_bundled_input work for flatbuffer (#76854)
Summary: title

Test Plan: unit test

Differential Revision: D36120947

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76854
Approved by: https://github.com/Jack-Khuu
2022-05-09 23:04:08 +00:00
Edward Z. Yang
f2eed9400d Register PrimTorch refs as decompositions.
For the most part, PrimTorch refs have the same signature as their
ATen equivalents.  I modify most PrimTorch refs to register themselves
as decompositions, using the prim name they wrap to find the aten name
(except for a few cases where the prim/aten names mismatch).  There are
some exclusions, falling into one of two categories:

- The torch equivalent was already implemented as a CompositeImplicitAutograd
  decomposition in C++

- The ref doesn't support enough features (e.g., the real deal has more
  kwargs / overloads than are currently implemented)

PrimTorch refs are written as a single function that supports all
overloads, and this style is convenient for cases where we have a bundle
of overloads for what morally is a single overload with a Union type
on an argument (which we ought to have supported in
native_functions.yaml but blah); to support registering a single decomp
for all the overloads, we modify register_decomposition to register
to ALL overloads if you pass it an overload packet.  This is technically
BC breaking but no tests started failing because of it.
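
The "register to ALL overloads" behavior can be sketched with a toy registry. `OverloadPacket`, `decomposition_table`, and this `register_decomposition` are simplified stand-ins written for illustration; they are assumptions and do not reproduce the real PyTorch classes or signatures.

```python
class OverloadPacket:
    """Hypothetical stand-in for an operator that has several overloads."""

    def __init__(self, name, overload_names):
        self.name = name
        self.overload_names = list(overload_names)

    def overloads(self):
        return [f"{self.name}.{o}" for o in self.overload_names]


decomposition_table = {}


def register_decomposition(op):
    """Register for one overload (a string) or for every overload in a packet."""
    targets = op.overloads() if isinstance(op, OverloadPacket) else [op]

    def decorator(fn):
        for target in targets:
            decomposition_table[target] = fn
        return fn

    return decorator


add_packet = OverloadPacket("aten.add", ["Tensor", "Scalar"])


@register_decomposition(add_packet)
def add_ref(a, b):
    # One function that serves every overload of the packet.
    return a + b
```

Passing a packet fans the same function out to each overload, which is why a single ref written against a Union-like argument can cover the whole bundle.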

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76835

Approved by: https://github.com/Chillee, https://github.com/mruberry
2022-05-06 20:11:45 +00:00
sanchitintel
4ee29d6033 [Reland take-2] Add JIT graph fuser for oneDNN Graph API (v0.5)
Re-landing #68111/#74596

## Description
v0.5 PR of this [RFC](https://github.com/pytorch/pytorch/issues/49444).

On the basis of #50256, the below improvements are included:

 * The [v0.5 release branch](https://github.com/oneapi-src/oneDNN/releases/tag/graph-v0.5) of the oneDNN Graph API is used
 * The fuser now works with the profiling graph executor. We have inserted type check nodes to guard the profiled tensor properties.

 ### User API:
The optimization pass is disabled by default. Users could enable it by:

```
 torch.jit.enable_onednn_fusion(True)
```
`torch.jit.freeze` should be used after tracing (recommended) or scripting a model.

 ### Performance:
 [pytorch/benchmark](https://github.com/pytorch/benchmark) tool is used to compare the performance:

 * SkyLake 8180 (1 socket of 28 cores):
   ![image](https://user-images.githubusercontent.com/65992142/151162305-05e44425-a24e-4d5e-94e1-743b40b87a8c.png)
* SkyLake 8180 (single thread):
   ![image](https://user-images.githubusercontent.com/65992142/151162528-69f90b79-d08d-46b8-8775-d80a6ccbce8a.png)
   \* By mapping hardswish to oneDNN Graph, it’s 8% faster than PyTorch JIT (NNC + OFI)
   \** We expect performance gain after mapping transpose, contiguous & view to oneDNN graph ops

 ### Directory structure of the integration code
 Fuser-related code is placed under:

 ```
 torch/csrc/jit/codegen/onednn/
 ```

 Optimization pass registration is done in:

 ```
 torch/csrc/jit/passes/onednn_graph_fuser.h
 ```

 CMake for the integration code is in:

 ```
 caffe2/CMakeLists.txt
 cmake/public/mkldnn.cmake
 cmake/Modules/FindMKLDNN.cmake
 ```

 ## Limitations
 * In this PR, we only support PyTorch-oneDNN-Graph integration on the Linux platform. Support on Windows and macOS will be enabled as a next step.
 * We have only optimized the inference use-case.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76622
Approved by: https://github.com/eellison
2022-05-05 16:57:03 +00:00
Edward Z. Yang
3a6da16a5a Return all overloads for an operator in _jit_get_operation
This allows us to provide OpOverloadPacket.overloads method that
lists all of the overloads.

This isn't tested; will be exercised in the next PR.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76814

Approved by: https://github.com/mruberry
2022-05-04 23:49:47 +00:00
BowenBao
679fc90cdb [ONNX] Support optional type (#68793) (#73284)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73284

Some important ops won't support optional type until opset 16,
so we can't fully test things end-to-end, but I believe this should
be all that's needed. Once ONNX Runtime supports opset 16,
we can do more testing and fix any remaining bugs.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D34625646

Pulled By: malfet

fbshipit-source-id: 537fcbc1e9d87686cc61f5bd66a997e99cec287b

Co-authored-by: BowenBao <bowbao@microsoft.com>
Co-authored-by: neginraoof <neginmr@utexas.edu>
Co-authored-by: Nikita Shulga <nshulga@fb.com>
(cherry picked from commit 822e79f31ae54d73407f34f166b654f4ba115ea5)
2022-05-04 20:24:30 +00:00
David Berard
e33f3229a2 [NVFuser] environment variable to turn nvfuser on or off (#76485)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76485

Adds an environment variable `PYTORCH_JIT_ENABLE_NVFUSER` for
controlling whether or not nvfuser is enabled. This required changing
the PassManager behavior to support the case where nvfuser gets enabled
by default when PYTORCH_JIT_ENABLE_NVFUSER=1.

Previously the solution for turning nvfuser on or off was to use the
PassManager to register or un-register the pass. That works fine if the
pass starts off _disabled_, but causes issues once we try to enable the
pass by default.

The main issue with enabling by default is with the validation check to
see whether NVFuser can be turned on. The check relies on
at::globalContext().hasCUDA(), which requires CUDAHooks to be registered
before hasCUDA() will work correctly. At static initialization time it's
difficult to ensure that CUDAHooks will be registered _before_ we
attempt to register the nvfuser pass. In OSS it worked fine, but in
internal builds it would fail on ROCm builds.

To fix this, we switch the control of NVFuser enablement to a check in
the pass. i.e. previously, we enabled/disabled nvfuser by registering or
de-registering the pass in pass manager; now, the pass is always
registered in pass manager, and enablement is done by a check within the
nvfuser pass.
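
The change of control described above (always register, decide inside the pass) follows a general pattern that can be sketched in Python. Everything here apart from the `PYTORCH_JIT_ENABLE_NVFUSER` variable name is a hypothetical stand-in: the real pass is C++, and the exact enablement rule is an assumption made for the example.

```python
import os

registered_passes = []


def cuda_available():
    """Hypothetical probe; stands in for a check that is only reliable after startup."""
    return False


def nvfuser_enabled(environ=os.environ, probe=cuda_available):
    """Decide enablement lazily, when the pass runs, not at registration time."""
    return environ.get("PYTORCH_JIT_ENABLE_NVFUSER") == "1" and probe()


def nvfuser_pass(graph, environ=os.environ, probe=cuda_available):
    if not nvfuser_enabled(environ, probe):
        return graph  # the pass stays registered but does nothing
    return graph + ["fused"]


# Registration is unconditional, so it needs no hardware probe at import time.
registered_passes.append(nvfuser_pass)
```

Because the probe runs only when the pass executes, its result no longer depends on the order in which static initializers happened to run.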

Remaining TODO: Connect this with NNC so that in cases where NNC is
available but not NVFuser (i.e. on AMD gpus), NNC can be turned on
automatically.

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D35982618

Pulled By: davidberard98

fbshipit-source-id: fd5b76bc0b8c8716c96fdc04bebfb15026a7ef60
(cherry picked from commit ff14603ff5ac8d9b6c749c4f111f4a8be8023b7f)
2022-05-03 23:05:40 +00:00
PyTorch MergeBot
3dcd67a1b3 Revert "[Re-landing 68111] Add JIT graph fuser for oneDNN Graph API (Preview4.1)"
This reverts commit 8b11d81058.

Reverted https://github.com/pytorch/pytorch/pull/74596 on behalf of https://github.com/janeyx99
2022-04-29 15:40:17 +00:00
chunyuan
8b11d81058 [Re-landing 68111] Add JIT graph fuser for oneDNN Graph API (Preview4.1)
Re-landing https://github.com/pytorch/pytorch/pull/68111

## Description
Preview4 PR of this [RFC](https://github.com/pytorch/pytorch/issues/49444).

On the basis of https://github.com/pytorch/pytorch/pull/50256, the below improvements are included:

- The [preview4 release branch](https://github.com/oneapi-src/oneDNN/releases/tag/graph-v0.4.1) of the oneDNN Graph API is used
- The fuser now works with the profiling graph executor. We have inserted type check nodes to guard the profiled tensor properties.

### User API:
The optimization pass is disabled by default. Users could enable it by:
```
torch.jit.enable_onednn_fusion(True)
```

### Performance:
[pytorch/benchmark](https://github.com/pytorch/benchmark) tool is used to compare the performance:
- SkyLake 8180 (1 socket of 28 cores):

  ![image](https://user-images.githubusercontent.com/65992142/151162305-05e44425-a24e-4d5e-94e1-743b40b87a8c.png)

- SkyLake 8180 (single thread):

  ![image](https://user-images.githubusercontent.com/65992142/151162528-69f90b79-d08d-46b8-8775-d80a6ccbce8a.png)
 \* By mapping hardswish to oneDNN Graph, it’s 8% faster than PyTorch JIT (NNC + OFI)
  \** We expect performance gain after mapping transpose, contiguous & view to oneDNN graph ops

### Directory structure of the integration code
Fuser-related code is placed under:
```
torch/csrc/jit/codegen/onednn/
```

Optimization pass registration is done in:
```
torch/csrc/jit/passes/onednn_graph_fuser.h
```

CMake for the integration code is:
```
caffe2/CMakeLists.txt
```

## Limitations

- In this PR, we only support the optimization on the Linux platform. Support on Windows and macOS will be enabled as the next step.
- We have only optimized the inference use case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74596
Approved by: https://github.com/malfet
2022-04-29 01:01:33 +00:00
Elias Ellison
e5a55af305 Reland reland
Reland of https://github.com/pytorch/pytorch/pull/76397 and https://github.com/pytorch/pytorch/pull/76493

This time I'll get it right 😢
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76539
Approved by: https://github.com/davidberard98, https://github.com/osalpekar
2022-04-28 20:41:55 +00:00
PyTorch MergeBot
a5bc02aeb2 Revert "[JIT] Register decomp reland"
This reverts commit 81b9cb741c.

Reverted https://github.com/pytorch/pytorch/pull/76397 on behalf of https://github.com/osalpekar
2022-04-28 03:33:29 +00:00
Elias Ellison
81b9cb741c [JIT] Register decomp reland
Reland of https://github.com/pytorch/pytorch/pull/76252
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76397
Approved by: https://github.com/davidberard98
2022-04-26 23:17:18 +00:00
Kevin Stephano
b17b2b1cc7 Add NVFuser Python Frontend
New functionality.

1. Adds Pybind11 bindings for NVFuser.
2. Requires a build file change and JIT python file change outside of NVFuser's code area.

Example:
```
import torch

from torch._C._nvfuser import Fusion, FusionDefinition

# Construct and Define Fusion
fusion = Fusion()

with FusionDefinition(fusion) as fd:
    t0 = fd.define_tensor(3)
    t1 = fd.define_tensor(1)
    s0 = fd.define_scalar()

    fd.add_input(t0)
    fd.add_input(t1)
    fd.add_input(s0)

    c0 = fd.define_constant(3.0)

    t1_b = fd.Ops.broadcast(t1, [True, True, False])
    t2 = fd.Ops.add(t0, t1_b)
    t3 = fd.Ops.mul(t2, c0)
    t4 = fd.Ops.mul(t3, s0)
    t5 = fd.Ops.relu(t4)
    t6 = fd.Ops.sum(t5, [-1], False)

    fd.add_output(t6)

fusion.print_ir()

# Execute Fusion
input1 = torch.ones(2, 4, 8, device='cuda')
input2 = torch.ones(8, device='cuda')

# Kernel compilation should be cached for the 2nd iteration
# with input tensors of the same shape
for _ in range(5):
    outputs = fusion.execute([input1, input2, 2.0])

print(outputs[0])
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76353
Approved by: https://github.com/csarofeen, https://github.com/mruberry
2022-04-26 06:10:19 +00:00
PyTorch MergeBot
2d72cb3373 Revert "[JIT] Allow registering Decompositions"
This reverts commit d9f0774f98.

Reverted https://github.com/pytorch/pytorch/pull/76252 on behalf of https://github.com/zengk95
2022-04-26 04:47:05 +00:00
Elias Ellison
d9f0774f98 [JIT] Allow registering Decompositions
- Allow registering custom decompositions
- Add easier API for invoking decompositions
- Shorten API names (no users yet)

I am doing these as one pr because they are fairly short/simple and because github first does not support ghstack yet.

cc @Chillee @zou3519
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76252
Approved by: https://github.com/davidberard98
2022-04-26 03:00:35 +00:00
David Berard
82421b0fb8 [JIT] support parameterlist iteration
Followup to https://github.com/pytorch/pytorch/pull/75479.

This adds support for iterating through parameterlists

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76140

Approved by: https://github.com/tugsbayasgalan
2022-04-21 18:51:27 +00:00
David Berard
272890998e [JIT] pass more exception info through the JIT interpreter
If TORCH_SHOW_CPP_STACKTRACES=1, then dump e.what() into the RuntimeError, which should make it easier to debug exceptions that happen within interpreted sections.

Test:
```patch
diff --git a/test/cpp/jit/test_dce.cpp b/test/cpp/jit/test_dce.cpp
index 6f9161d0d9..7c574787cf 100644
--- a/test/cpp/jit/test_dce.cpp
+++ b/test/cpp/jit/test_dce.cpp
@@ -3,6 +3,10 @@
 #include <torch/csrc/jit/ir/irparser.h>
 #include <torch/csrc/jit/passes/dead_code_elimination.h>
 #include <torch/csrc/jit/testing/file_check.h>
+#include <torch/csrc/jit/runtime/interpreter.h>
+#include <test/cpp/jit/test_utils.h>
+
+#include <ATen/ATen.h>

 namespace torch {
 namespace jit {
@@ -48,5 +52,30 @@ graph():
   // Check that dead code elimin
   testing::FileCheck().run(input, *graph);
 }
+
+TEST(EliminateDeadCodeTest, interpreterfailure) {
+  const std::string input = R"IR(
+graph(%x.1 : Tensor):
+  %2 : int = prim::Constant[value=128]() # /data/users/dberard/scripts/DGB/sz.py:4:38
+  %3 : int = prim::Constant[value=256]() # /data/users/dberard/scripts/DGB/sz.py:4:43
+  %5 : int = prim::Constant[value=1]() # /data/users/dberard/scripts/DGB/sz.py:4:53
+  %4 : int[] = prim::ListConstruct(%2, %3)
+  %6 : Tensor[] = aten::split_with_sizes(%x.1, %4, %5) # /data/users/dberard/scripts/DGB/sz.py:4:11
+  return (%6)
+)IR";
+  auto graph = std::make_shared<Graph>();
+  parseIR(input, graph.get());
+
+  //auto stack = createStack({at::randn({2, 383}, at::kCPU)});
+  auto stack = createStack({at::Tensor{}});
+
+  Code code(graph, "");
+  InterpreterState interpreter{code};
+  interpreter.run(stack);
+  ASSERT_EQ(2, stack.size());
+  ASSERT_FALSE(stack[0].toTensor().defined());
+  ASSERT_FALSE(stack[1].toTensor().defined());
+}
+
 } // namespace jit
 } // namespace torch
```

^ use this to repro the interpreter issue: `TORCH_SHOW_CPP_STACKTRACES=1 ./bin/test_jit --gtest_filter="EliminateDeadCodeTest.interpreterfailure"` and the stack trace is shown.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75682

Approved by: https://github.com/eellison
2022-04-21 18:26:49 +00:00
jishaomin
91e9fcf5b0 sup torch script parameterlist
Fixes #61176

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75479
Approved by: https://github.com/davidberard98
2022-04-20 20:53:07 +00:00
Elias Ellison
0c671c15ec [JIT] Remove CSE Hoisting
This has led to a couple of bugs, and I don't think the additional complexity was worth keeping in the codebase.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75756
Approved by: https://github.com/davidberard98
2022-04-19 20:59:25 +00:00
Han Qi
b34b192d6b Reland "Make debug_pkl smaller by only emitting unique traces." (#73368)
Summary:
## Original commit message:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73368

The debug_pkl file inside PyTorch's .pt file consists of a list of SourceRanges. Each SourceRange points to a Source, which is a stack trace, a filename, and start and end numbers. Those are emitted in the debug_pkl file as strings.
Since many SourceRanges share the same Source, the trace string can be deduplicated.
The newer format saves the set of unique traces in a tuple; each SourceRange then saves the offset of its trace within that tuple (i.e., manually applying dictionary compression).
This reduces the file size. On loading, however, copying each trace into a Source as a string would still blow up runtime memory.
To mitigate this, we use SourceView directly instead of Source; it takes a reference to the string inside the Deserializer and makes it a string_view. This is safe because the Deserializer is held by the Unpickler via shared_ptr, and the Unpickler is in turn held via shared_ptr by another Source object. That Source object stays alive during model construction.
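
The deduplication scheme can be illustrated with a small Python sketch. This models only the idea of storing unique traces once and referring to them by offset; the function names and the `(trace, start, end)` tuple layout are assumptions for the example, not the actual debug_pkl format.

```python
def compress_source_ranges(ranges):
    """ranges: list of (trace, start, end). Returns (unique_traces, compact_ranges)."""
    unique, index, compact = [], {}, []
    for trace, start, end in ranges:
        if trace not in index:
            index[trace] = len(unique)  # offset of this trace in the tuple
            unique.append(trace)
        compact.append((index[trace], start, end))
    return tuple(unique), compact


def expand_source_ranges(unique, compact):
    """Inverse of compress_source_ranges."""
    return [(unique[i], start, end) for i, start, end in compact]


ranges = [
    ("model.py: forward", 0, 10),
    ("model.py: forward", 10, 25),
    ("utils.py: helper", 3, 8),
    ("model.py: forward", 25, 40),
]
unique, compact = compress_source_ranges(ranges)
```

Each long trace string is written once, and every range carries only a small integer offset in its place.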

Test Plan:
## Original Test plan
unit test

Took the original file (312271638_930.predictor.disagg.local), loaded it with `torch.jit.load`, and saved it again with `torch.jit.save`. Unzipped both and looked at the contents:
```
[qihan@devvm5585.vll0 ~]$ du archive -h
4.0K    archive/xl_model_weights
3.7M    archive/extra
8.0K    archive/code/__torch__/caffe2/torch/fb/model_transform/splitting
8.0K    archive/code/__torch__/caffe2/torch/fb/model_transform
8.0K    archive/code/__torch__/caffe2/torch/fb
8.0K    archive/code/__torch__/caffe2/torch
8.0K    archive/code/__torch__/caffe2
20M     archive/code/__torch__/torch/fx/graph_module
20M     archive/code/__torch__/torch/fx
8.0K    archive/code/__torch__/torch/classes
20M     archive/code/__torch__/torch
20M     archive/code/__torch__
20M     archive/code
2.7M    archive/constants
35M     archive
[qihan@devvm5585.vll0 ~]$ du resaved -h
4.0K    resaved/extra
8.0K    resaved/code/__torch__/caffe2/torch/fb/model_transform/splitting
8.0K    resaved/code/__torch__/caffe2/torch/fb/model_transform
8.0K    resaved/code/__torch__/caffe2/torch/fb
8.0K    resaved/code/__torch__/caffe2/torch
8.0K    resaved/code/__torch__/caffe2
1.3M    resaved/code/__torch__/torch/fx/graph_module
1.3M    resaved/code/__torch__/torch/fx
8.0K    resaved/code/__torch__/torch/classes
1.4M    resaved/code/__torch__/torch
1.4M    resaved/code/__torch__
1.4M    resaved/code
2.7M    resaved/constants
13M     resaved
[qihan@devvm5585.vll0 ~]$
```
## Additional test:
`buck test mode/dev-tsan //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- --exact 'caffe2/benchmarks/static_runtime:static_runtime_cpptest - StaticRuntime.to'` passes

 test jest.fbios.startup_cold_start.local.simulator f333356873 -

Differential Revision: D35196883

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74869
Approved by: https://github.com/gmagogsfm
2022-04-18 22:34:21 +00:00
John Clow
f281d83d77 Moving Remove Tensor Type Specializations to after custom passes
This is to allow for Intel folks to use type information in their custom passes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71748

Approved by: https://github.com/eellison
2022-04-11 22:12:01 +00:00
Emma Blink
ca056cc918 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D35543681

fbshipit-source-id: 0453f35c2a39299df172dc2b4fc77fb73963bb97
(cherry picked from commit aae11d9628a1cf7fd88a2113191f31e979750bc8)
2022-04-11 13:48:41 +00:00