Commit Graph

36 Commits

Author SHA1 Message Date
Mikhail Zolotukhin
765bf8f03d Remove duplicate bindings from torch/csrc/jit/python/init.cpp. (#36492)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/36492

Test Plan: Imported from OSS

Differential Revision: D20995235

Pulled By: ZolotukhinM

fbshipit-source-id: 6afa3a956e57c2fb94bb29d332177be73a2bac2a
2020-04-13 12:28:32 -07:00
Kimish Patel
d559a47933 Enable relu fusion with prepacked linear/conv. (#35705)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35705

Introduces a pass for relu fusion.

Test Plan:
python test/test_xnnpack_integration.py

Imported from OSS

Differential Revision: D20746592

fbshipit-source-id: 6c22f60a20e9121618c85077b9b58fb8d4082b3b
2020-04-03 15:38:45 -07:00
Mikhail Zolotukhin
af5121f62a Invoke TensorExpr fuser pass from a graph executor. (#35913)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35913

The pass itself is still disabled by default, but with this change we
don't need to register it as a custom pass anymore. It allows us to
control its behavior with env variables more easily.

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D20827189

Pulled By: ZolotukhinM

fbshipit-source-id: e74d90b5e46422e7ab7bc40974a805220da50fbc
2020-04-03 12:20:26 -07:00
Christian Sarofeen
6d24f8fe21 Infrastructure for a new CUDA Fuser (#34785)
Summary:
**Summary:** This PR contains the infrastructure of a new CUDA fuser. This CUDA fuser is based on many of the same principles of TensorExpressions and Halide, however the implementation is ground up. The fusion pass itself is similar to the default CUDA fuser, however, it has undergone some refactoring and is using the new code generation infrastructure. For those who are interested in how the code generation in this PR works, I would recommend reviewing _test/cpp/jit/test_gpu_fusion.cpp_ as well as the long comment section at the beginning of _torch/csrc/jit/codegen/cuda/transform_replay.h_  One of the largest differences between our approach and that of TVM/Halide, is the concept of "TensorView". TensorView from a high level should be thought of similarly to how we think of working with Tensors in PyTorch. It's an N-D object which can undergo transformations that change its dimensionality. Dimensionality changes are done through the operations split/merge/reorder/computeAt. These transformations are similar to split/fuse/reorder/compute_at of TVM, they modify how a tensor is iterated over to generate GPU code. Interestingly, in our scheme these transformations are applied to tensors and only impact how that tensor is generated.

**Warning:** This PR is purposefully not feature complete with the current fuser. We wanted to separate out the infrastructure from the fusion capabilities. Once in, smaller incremental PRs will be submitted to expand capabilities of the fuser.

**Short term goals:**

Parity with current CUDA fuser (including performance):
- Dynamic shapes (no recompilation)
- Implicit handling of braodcast (broadcasted tensors are treated as tensors of the braodcasted size in the generated code)
- Dropout

**Mid-term goals:**

- Transposes fused with pointwise operations where transpose involves only 2 axes (across the fused operation).
- 1-D reductions fused with pointwise operations
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34785

Reviewed By: ZolotukhinM

Differential Revision: D20650977

Pulled By: soumith

fbshipit-source-id: ee39c95a880e1b9822e874ed4cc180971572bf63
2020-04-02 09:22:42 -07:00
Supriya Rao
a090de380c [quant][graph] Add quant fusion for dynamic quantization (#35586)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35586

This pass fuses the choose_qparams-quant-dequant sequence
Fusion for weight tensor is the same as static quant.

Test Plan:
python test/test_quantize_script.py

Imported from OSS

Differential Revision: D20755680

fbshipit-source-id: b7443770642b6e6fa0fa9da8a44637e9b2d4df70
2020-03-30 23:34:56 -07:00
Supriya Rao
1f7ee7b6b7 [quant][graph] Add pass to insert quant dequant for dynamic quantization (#35448)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35448

Add _choose_qparams_per_tensor which returns scale and zero_point similar to the dynamic quantization in the operator

Test Plan:
python test/test_quantize_script.py

Imported from OSS

Differential Revision: D20755679

fbshipit-source-id: c9066d8f1bb3e331809be26c4be806faafc9b981
2020-03-30 23:33:32 -07:00
Jerry Zhang
6fc2403951 [quant][graphmode] qconfig_dict support None (#35336)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35336

Test Plan:
python test/test_quantization.py

Imported from OSS

Differential Revision: D20655302

fbshipit-source-id: b453f3240ac487aa29629953b4d71274dbbc25fc
2020-03-29 12:47:47 -07:00
Nikolay Korovaiko
9e22d15f14 Enable tensorexpr cpp tests in CI. try #2 (#35454)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35454

Differential Revision: D20665160

Pulled By: Krovatkin

fbshipit-source-id: e04cbe92b2ee5a3288f3c4e5c83533bfea85bf85
2020-03-27 12:09:55 -07:00
Bram Wasti
a3e10d2a17 Expose enablement of TensorExpr fuser as env variable (#35341)
Summary:
This commit allows one to use an environment variable to enable the fuser in torch/csrc/jit/tensorexpr/

```
PYTORCH_TENSOREXPR=1 python benchmark.py
```

This commit also changes the registration to happen by default, removing the requirement for the python exposed "_jit_register_tensorexpr_fuser"
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35341

Reviewed By: ZolotukhinM

Differential Revision: D20676348

Pulled By: bwasti

fbshipit-source-id: 4c997cdc310e7567c03905ebff72b3e8a4c2f464
2020-03-26 14:31:57 -07:00
Meghan Lele
6384c2d81b [JIT] clang-format JIT code (#35115)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35115

This commit runs the newly added tools/clang_format.py on the JIT
codebase and includes all of the formatting changes thus produced.

Testing:
Ran the script, CI.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D20568523

Pulled By: SplitInfinity

fbshipit-source-id: e09bdb982ccf090eecfb7c7b461b8d0681eef82b
2020-03-26 11:24:51 -07:00
Suraj Menon
aa01a95c6d Revert D20630760: [pytorch][PR] Enable NNC tests vol. i. add test_tensorexpr.py tests [WIP]
Test Plan: revert-hammer

Differential Revision:
D20630760

Original commit changeset: 7d2f27aca6b1

fbshipit-source-id: 28ac92b3390651a4a67061d6ebf208515b9b9463
2020-03-25 20:34:46 -07:00
Kimish Patel
dc2c4d02f9 Add a wrapper to wrap all optimization for mobile. (#35227)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35227

This wraps.
1. Conv BN folding (not mobile specific)
2. insert XNNPACK conv2d/Linear ops
3. Remove prepacking ops.

Test Plan: Imported from OSS

Differential Revision: D20603562

fbshipit-source-id: ff373af7112c070ec6198bac51845282e09ff1f8
2020-03-25 19:21:14 -07:00
Nikolay Korovaiko
f3a5081bd4 Enable NNC tests vol. i. add test_tensorexpr.py tests [WIP] (#34897)
Summary:
This  PR add tensorexpr cpp tests to test_jit.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34897

Differential Revision: D20630760

Pulled By: Krovatkin

fbshipit-source-id: 7d2f27aca6b1e23e3ffed1c765d8f590688118e3
2020-03-25 17:23:48 -07:00
Elias Ellison
aab4beb87f [JIT] Pass To Safely Remove Aten Inplace Ops (#33186)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33186

This helps create larger functional graphs. It has the potential to increase memory use, so in order to land this on by default we would probably also do a reuse of buffers pass.

This is currently O(n * | Removed Nodes | ) because we have to rebuild the alias Db each time we make a change. This pass is critical to creating functional graphs, so this might be a compelling use case to build incremental updates to alias Db.

Test Plan: Imported from OSS

Differential Revision: D20603189

Pulled By: eellison

fbshipit-source-id: 105db52bf38e02188ca6df6d36294466d3309a0a
2020-03-24 23:45:58 -07:00
Elias Ellison
5b2f8cef08 [JIT] Functional Graph Pass (#33020)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33020

This is a pass to create functional blocks. The other PRs in the stack help avoid some of the limitations that are are often found in graphs. It's possible that this would work well with a graph that is frozen. Follow up work items that will help this pass:

- We don't currently have any capacity in alias analysis to tell whether a Value that came from the wildcard set "re-escapes" back into the wildcard set.
- More comments on the semantics of the graph and correctness conditions
- We could consider using dynamic dag if the perf of this is a limitation.
- potential make Functional Graphs Functional Blocks instead, so that we do not repeatedly copy constants, also to make IR read easier.

Test Plan: Imported from OSS

Differential Revision: D20603188

Pulled By: eellison

fbshipit-source-id: 6822a6e65f4cc2676f8f6445fe8aa1cb858ebeeb
2020-03-24 23:44:18 -07:00
davidriazati
44622bbda9 [jit] Add lazy script decorator (#34935)
Summary:
Stacked PRs
 * #34938 - [jit] Remove stray `script`
 * **#34935 - [jit] Add lazy script decorator**

Some users maintain libraries of code that is largely trace-able but not
script-able. However, some functions may need to be `torch.jit.script`ed if
they contain control flow so the tracer will use the compiler version.
This however impacts library start up time as in #33418, so this PR adds
a workaround in the form of a `torch.jit._lazy_script_while_tracing`
that will only initialize the compiler if the function is called while
actually tracing.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/34935

Pulled By: driazati

Differential Revision: D20569778

fbshipit-source-id: d87c88c02b1abc86b283729ab8db94285d7d4853
2020-03-24 13:43:18 -07:00
Supriya Rao
55019d357e [quant][graphmode] Add observers for dynamic quant (#35121)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35121

For dynamic quantization we insert observers at the input to mimic the quatization of activations that happens in the operator
Observer for weight is inserted similar to static quant

Test Plan:
python test/test_quantize_script.py

Sample output for single layer FC

.graph(%self : __torch__.___torch_mangle_4.M,
      %x.2 : Tensor):
  %_observer_1 : __torch__.torch.quantization.observer.MinMaxObserver = prim::GetAttr[name="_observer_1"](%self)
  %x.1 : Tensor = prim::CallMethod[name="forward"](%_observer_1, %x.2)
  %2 : __torch__.torch.nn.modules.linear.___torch_mangle_5.Linear = prim::GetAttr[name="fc"](%self)
  %3 : Tensor = prim::CallMethod[name="forward"](%2, %x.1) # test/test_quantize_script.py:19:23
  return (%3)

graph(%self : __torch__.torch.nn.modules.linear.___torch_mangle_5.Linear,
      %input.1 : Tensor):
 %2 : Function = prim::Constant[name="linear"]()
 %3 : Tensor = prim::GetAttr[name="weight"](%self)
 %_observer_0 : __torch__.torch.quantization.observer.MinMaxObserver = prim::GetAttr[name="_observer_0"](%self)
 %7 : Tensor = prim::CallMethod[name="forward"](%_observer_0, %3)
 %4 : Tensor = prim::GetAttr[name="bias"](%self)
 %5 : Tensor = prim::CallFunction(%2, %input.1, %7, %4) # /home/supriyar/miniconda3/envs/pytorch_py3/lib/python3.7/site-packages/torch/nn/modules/linear.py:87:15
 return (%5)

Imported from OSS

Differential Revision: D20599144

fbshipit-source-id: 9a8fa0e8655b9908826b981dce8a11d86efce5df
2020-03-24 10:54:16 -07:00
Kimish Patel
e1c092fe3a Changes to transition to generic API for ops with weight prepacking (#35010)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35010

semantics.

This PR moves all the xnnpack specific interfces to a generic interface.
Accordingly removes xnnpac specific reference from API and some variable
names.
What has not yet changed:

TODO:
USE_XNNPACK is still used. This can be removed where no XNNPACK
specific things are done. e.g., RegisterOpContext.cpp and
xnnpack_rewrite.cpp.
Also the filename and structure also remains. Some of the generic class
definition can be moved non-XNNPACK specific folder.

Test Plan:
python test/test_xnnpack_integration.py

Imported from OSS

Differential Revision: D20526416

fbshipit-source-id: 2e1725345c44bbb26bdc448097a7384eca121387
2020-03-22 08:31:53 -07:00
Wanchao Liang
c21fde6421 [jit] make jit/rpc share the same PythonFutureWrapper (#35039)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35039

This is the initial step towards merging ivalue future and rpc future

Test Plan: Imported from OSS

Differential Revision: D20537164

Pulled By: wanchaol

fbshipit-source-id: d4f148c88e49ed6b0881ca4b4dd945ea24166183
2020-03-20 22:35:34 -07:00
Mikhail Zolotukhin
12f0052eee Add TensorExpr Fuser tests (resubmit). (#35085)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35085

Test Plan: Imported from OSS

Differential Revision: D20552334

Pulled By: ZolotukhinM

fbshipit-source-id: 628fcf4719a879f18978ff8a0a64afbb045df645
2020-03-20 13:19:31 -07:00
Natalia Gimelshein
3c90a90730 Revert D20540599: Add TensorExpr Fuser tests.
Test Plan: revert-hammer

Differential Revision:
D20540599

Original commit changeset: ced9b6657fe7

fbshipit-source-id: e8fa11f20207c35f39b3fbe6f45fc627715377c1
2020-03-19 18:37:32 -07:00
Mikhail Zolotukhin
7b59f41009 Add TensorExpr Fuser tests. (#35052)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35052

Differential Revision: D20540599

Test Plan: Imported from OSS

Pulled By: ZolotukhinM

fbshipit-source-id: ced9b6657fe72bca61833ab5d59bdaddcacd114b
2020-03-19 14:31:54 -07:00
Mikhail Zolotukhin
95833a49e6 [TensorExpr] Pull changes from bertmaher/pytorch_fusion. (#34842)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34842

This PR (hopefully the last one of such kind) is merging changes from a
side branch where tensor expessions based fuser work has been done so
far. This PR is is a squashed version of changes in the side branch,
which is available here: https://github.com/bertmaher/pytorch

Differential Revision: D20478208

Test Plan: Imported from OSS

Pulled By: ZolotukhinM

fbshipit-source-id: 21556e009f1fd88099944732edba72ac40e9b9c0
2020-03-17 11:02:48 -07:00
Mikhail Zolotukhin
35e7efeb9a [TensorExpr] Add CUDA codegen. (#34227)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34227

This PR adds a CUDA support to tensor expressions.

Differential Revision: D20251836

Test Plan: Imported from OSS

Pulled By: ZolotukhinM

fbshipit-source-id: ab36a55834cceff30c8371fef6cca1054a32f017
2020-03-16 11:49:29 -07:00
Mikhail Zolotukhin
42b2c8c65d [TensorExpr] Add a fuser pass based on tensor expressions. (#34226)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34226

LLVM and Cuda backends are added in subsequent PRs, so at this point the fuser is pretty useless, but it still can be tested and its logic is not going to change with addition of the codegens.

Differential Revision: D20251838

Test Plan: Imported from OSS

Pulled By: ZolotukhinM

fbshipit-source-id: 82b0d221fa89904ed526689d02a6c7676a8ce8de
2020-03-16 11:49:24 -07:00
peter
24c9e61e79 Enable JIT tests on Windows (#27029)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27029

Reviewed By: eellison

Differential Revision: D20458664

Pulled By: jamesr66a

fbshipit-source-id: 22be918543703869f471e89b3478423198351bf3
2020-03-16 11:26:21 -07:00
Kimish Patel
4da5569300 Pass to remove prepacking ops. (#34319)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34319

Removes prepacking ops and install them as attributes of the top level
module. Needs to run freezing as the first pass.

Test Plan:
python test/test_xnnpack_integration.py

Imported from OSS

Differential Revision: D20290726

fbshipit-source-id: 633ceaa867ff7d5c8e69bd814c0362018394cb3a
2020-03-14 12:53:31 -07:00
Kimish Patel
7dd5da2026 JIT pass to insert XNNPACK ops (#34048)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34048

Rewrites the graph to insert xnnpack prepack and packed run ops for
conv2d and linear.

Test Plan:
python test/test_xnnpack_integration.py

Imported from OSS

Differential Revision: D20185658

fbshipit-source-id: c4c073c912ad33e822e7beb4ed86c9f895129d55
2020-03-14 12:53:27 -07:00
Zachary DeVito
52005b551c invokeOperatorFromPython: support overloaded operator calling (#34671)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34671

Like the python arg parser, this tries to convert to the schema in order.
It introduces schema_match_exception which gets thrown when the schema doesn't match,
allowing the overload handler to try the next option.

Behavior will not 100% match the schema argument parser but should work for
simple cases using custom binding.

Test Plan: Imported from OSS

Differential Revision: D20432206

Pulled By: zdevito

fbshipit-source-id: 280839a2205ea3497db3a9b5741fccc1e2bff9a8
2020-03-13 18:46:03 -07:00
Michael Suo
af3a7e2b50 [jit] small cleanups after script:: removal (#34677)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34677

1. Remove remaining uses of `script::` namespace from the codebase,
2. Add one more typedef for `script::ExtraFilesMap` which is part of the
public interface.

Pull Request resolved: #34580

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D20431739

Pulled By: suo

fbshipit-source-id: a29d369c755b6506c53447ca1f286b6339222c9a
2020-03-13 17:56:16 -07:00
Jerry Zhang
90ca7a1feb [quant][graphmode] Add Finalize function that inlines graph and produce quantized ops (#33927)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33927

Test Plan:
test will be added in later PRs

Imported from OSS

Differential Revision: D20354879

fbshipit-source-id: 03976f4b86c46dbdc4e45764a1e72f1a3855a404
2020-03-12 14:52:58 -07:00
Michael Suo
c235be42dd [jit] kill script namespace (#34515)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34515

Once upon a time we thought this was necessary. In reality it is not, so
removing it.

For backcompat, our public interface (defined in `api/`) still has
typedefs to the old `script::` names.

There was only one collision: `Pass` as a `Stmt` and `Pass` as a graph
transform. I renamed one of them.

Test Plan: Imported from OSS

Differential Revision: D20353503

Pulled By: suo

fbshipit-source-id: 48bb911ce75120a8c9e0c6fb65262ef775dfba93
2020-03-11 23:32:48 -07:00
Jerry Zhang
2e7eef41ac [quant][graphmode] Swap quantized functional linear with aten::linear (#33853)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33853

Quant fusion relies on inline, but inline will break the CallFunction("linaer", ...) into a if block
it will be hard to recognize this block and swap it with quantized::linear, in order to
preserve the op, we will swap all quantized functional linear into aten::linear.
They might produce different backward graph, but this is called in the step before we get quantized
model, so it shouldn't affect anything.
We'll integrate this with convert_script later in the new "finalize_quant" API

Test Plan:
python test/test_jit.py

Imported from OSS

Differential Revision: D20343873

fbshipit-source-id: 423e03bf893b79267d2dc97bc997ee1bfe54ec0f
2020-03-09 15:45:20 -07:00
Jie
2b79bab029 [CUDA_FUSER] Fork CUDA fuser (#33527)
Summary:
Separating CUDA fuser from CPU fuser.

1. New node in IR - prim::CudaFusionGroup:
   This enables the cuda fuser to co-exist along side the old fuser. Allows us
   to incrementally build and expand cuda fuser.

2. copied FuseGraph optimization passes to CudaFuserGraph:
   We will re-factor & reuse Chunk/Concat in the old fuser logic, which is
   handled in the optimization pass at this moment. Unfortunately many code in
   the pass is tightly binded with the legacy fuser, which makes code sharing
   difficult.
   The CudaFusionGraph will support only a subset of operations comparing to
   legacy fuser (CUDA only). It is registered as a custom pass post fusion via
     ```torch._C._jit_register_cuda_fuser()```
   To have it in effect, you should also turn off fusion on GPU via
     ```torch._C._jit_override_can_fuse_on_gpu(False)```

3. We don't have codegen in this PR yet (WIP). Currently we just fall back to
   the old fuser.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33527

Differential Revision: D20171598

Pulled By: ZolotukhinM

fbshipit-source-id: 9a3c0f06f46da7eaa80ae7551c04869f5b03ef71
2020-03-04 20:25:08 -08:00
Zino Benaissa
cab8772c6c Freezing Torchscript modules (#32178)
Summary:
This patch enables folding GetAttr nodes with their corresponding
values. _jit_pass_freeze_module API returns a new TorchScipt module
where all function calls and get attributes are inlined.
Usage:

frozen_model = torch._C._freeze_module(scrited_model._c)
frozen_model.forward(...)

This API currently optimizes the forward method. We will follow up to
to preserve and optimize methods and attributes that are annotated as
 torch.jit.interface.

Several future improvements to JIT optimizations are required to maximize
clean up/de-sugar the graph and eliminate redundancies.
Ideally, we want to produce a graph that can easily be lowered to
GLOW and other low-level backends.
__
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32178

Differential Revision: D19419640

Pulled By: bzinodev

fbshipit-source-id: 52baffaba9bca2cd60a8e747baa68d57711ad42b
2020-03-02 11:38:36 -08:00
Michael Suo
dbe850af5b [jit] do the code reorg (#33851)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33851

Rationale and context described in #33828.

Script to reproduce the move:
https://gist.github.com/suo/16cbefaaeb67ca5a7c6caffd49b7f6e9
ghstack-source-id: 99079645

Test Plan: Make sure CI passes

Reviewed By: jamesr66a

Differential Revision: D20133869

fbshipit-source-id: 390e9241a9c85366d9005c492ac31f10aa96488e
2020-02-27 13:02:51 -08:00