Commit Graph

655 Commits

Author SHA1 Message Date
Edward Z. Yang
2a332afbf4 Add SymFloat, support SymInt to SymFloat conversion (#84284)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84284
Approved by: https://github.com/albanD
2022-09-03 01:30:32 +00:00
Elias Ellison
97b2dff600 Add Initial Support For Fake Tensor Constant Tracking (#84387)
Adds support for constant tensor tracking within FakeTensors. Copy-pasta'ing from `proxy_tensor.py` why this is useful:
```
# In some circumstances, we will be tracing in a situation where a tensor
# is *statically* known to be a constant (currently, this only happens if
# you run torch.tensor; deterministic factory functions like torch.arange
# don't get this treatment).  When the tensor in question is small, it's
# helpful to due constant propagation in case we call item() (in which
# case we can return the constant value that is known, rather than give
# an error.)
```

This PR only attempts to add support for the tracing scenarios where we run each operation linearly - aot autograd, torchdynamo. It does not yet handle how constant tensors should be handled as part of the persistent fx graph. Additionally, it does not yet attempt to de-duplicate or interact with ProxyMode's only constant tensor handling.

Edit: plan is to rely on functionalization for fx graph
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84387
Approved by: https://github.com/ezyang
2022-09-02 02:43:04 +00:00
Horace He
6a3ecda5a2 Started storing faketensor/symbolic shape metadata on FX nodes in make_fx (#84114)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84114
Approved by: https://github.com/SherlockNoMad
2022-08-31 04:39:48 +00:00
Edward Z. Yang
ad44670fa1 Back out "Revert D38984222: Don't introduce new overload for SymInt (#83628)" (#84173)
Also Back out "Revert D39075159: [acc_tensor] Use SymIntArrayRef for overloaded empty.memory_format's signature"

Original commit changeset: dab4a9dba4fa
Original commit changeset: dcaf16c037a9

Original Phabricator Diff: D38984222
Original Phabricator Diff: D39075159

Also update Metal registrations for C++ registration changes.

Also update NNPI registration to account for tightened schema checking

Differential Revision: [D39084762](https://our.internmc.facebook.com/intern/diff/D39084762/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D39084762/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84173
Approved by: https://github.com/Krovatkin
2022-08-29 18:01:07 +00:00
Kimish Patel
cfd18e105f [Pytorch][Ondevice quantization] Add device side API to convert model (#83807)
Summary:
This diff adds device side API which will convert the model to its
quantized equivalent. THe input model must have been prepared AOT for
quantization.

API is implemented by:
- Running reset obervers
- Running observe method
- Running quantize method
- And replacing method, e.g. forward, with its quantized equivalent.

Test Plan:
test/quantization/jit/test_ondevice_quantization.py

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D38889818](https://our.internmc.facebook.com/intern/diff/D38889818)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83807
Approved by: https://github.com/iseeyuan
2022-08-29 17:57:38 +00:00
Kimish Patel
5c7e801c50 [pytorch][on device quant] Finalize method for ondevice quant (#83571)
Summary:
After inserting quant dequant nodes in the graph, we need
1. Insert packed param creation and quantized op
2. Create packed_params attribute in the top module. For this we need
graph that inlined except for calculate_qparams method calls. But they
can be inlined too. So perhaps we need to make sure no other callmethods
exist.
3. Insert SetAttr for the packed param
4. Insert GetAttr for the packed param
5. Use GetAttr output for quantized op where applicable, e.g.
linear_dynamic

The above is added to quantize_<method-name> method created inprevious
step. Once the above steps are done clone the method into
quantized_<method-name>

Modify quantize_<method-name>:
1. Remove all outputs from the method.
2. Run dce
3. Remove all inputs from the method except self.

Modify quantized_<method-name>:
1. Remove all packed_param setAttr nodes.
2. Run dce.

This should result in removal of all nodes that generate packed param.

Test Plan: To be written

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D38771416](https://our.internmc.facebook.com/intern/diff/D38771416)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83571
Approved by: https://github.com/jerryzh168
2022-08-29 17:53:11 +00:00
Kimish Patel
446afb5f9f [On Device Quantization][pytorch]Make insert_quant_dequant support ondevice ptq (#83570)
Summary:
This diff adds a way to:
- clone previously observed method
- Add calls to observer's calculate_qparams methods
- Extract the scale and zero point
- Use them to insert quant dequant nodes

Now for forward method we have
- observe_forward
- quantize_forward

observe_forward is used post training to observer statistics. In the
case of dynamic PTQ this requires just running that method once to
update weight observer statistics.

quantize_forward method will be used to use the observer
statistics to calculate quantization parameters and apply that to quant
dequant op.

Subsequent diffs will replace dequant + op with their quantized op
counter parts and replace quantize ops with relevant packed params class
where possible

Test Plan:
To be written

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D38771419](https://our.internmc.facebook.com/intern/diff/D38771419)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83570
Approved by: https://github.com/jerryzh168
2022-08-29 17:51:00 +00:00
Kimish Patel
9189edb3b3 [Quantization][Pytorch] On device quantization support part 1 (#83568)
Summary:
TO support on device quantization this diff introduces observer
insertion. Specifically observers are inserted by adding new method with
prefix observ_.

Intent is that post training, this method will be run to record
statistics

Test Plan:
test_ondevice_quantization.py

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D38771417](https://our.internmc.facebook.com/intern/diff/D38771417)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83568
Approved by: https://github.com/jerryzh168
2022-08-29 17:22:30 +00:00
Ivan Yashchuk
3aae6ff1e1 Add nvprims.var_mean (#83508)
This PR adds nvfuser-specific primitive - `var_mean`.
Interpretation `torch.var_mean` -> `torch.ops.nvprims.var_mean` is handled by `TorchRefsNvfuserCapabilityMode` context manager.

I moved some helper code from `_prims/__init__.py` to `_prims_common`. Correctness is tested with OpInfo tests (see `PythonRefInfo("ops.nvprims.var_mean"`).

Layer norm reference now uses `torch.var_mean` instead of `torch._refs.var_mean` to allow interception. Here's a simple comparison of performance with this PR and master (on 3080ti):
```py
import torch
from torch._prims.context import TorchRefsNvfuserCapabilityMode
from torch.fx.experimental.proxy_tensor import make_fx
from torch._prims.executor import execute

def func(a):
    return torch.native_layer_norm(a, (1024,), None, None, 1e-6)

a = torch.randn(10, 512, 1024, dtype=torch.float16, device="cuda")

with TorchRefsNvfuserCapabilityMode():
    gm = make_fx(func)(a)

for _ in range(10):
    execute(gm, a, executor="strictly_nvfuser");
```
run with `PYTORCH_NVFUSER_DUMP=dump_eff_bandwidth python script.py`
```py
# WITH THIS PR
# kernel1 run in 0.032768 ms, achieved: 641.25 GB/s
# kernel1 run in 0.033792 ms, achieved: 621.818 GB/s
# kernel1 run in 0.032768 ms, achieved: 641.25 GB/s
# kernel1 run in 0.032608 ms, achieved: 644.396 GB/s
# kernel1 run in 0.031744 ms, achieved: 661.935 GB/s
# kernel1 run in 0.031744 ms, achieved: 661.935 GB/s
# kernel1 run in 0.032768 ms, achieved: 641.25 GB/s
# kernel1 run in 0.03072 ms, achieved: 684 GB/s
# kernel1 run in 0.031744 ms, achieved: 661.935 GB/s
# kernel1 run in 0.031744 ms, achieved: 661.935 GB/s

# ON MASTER
# kernel1 run in 0.05632 ms, achieved: 373.091 GB/s
# kernel1 run in 0.044032 ms, achieved: 477.209 GB/s
# kernel1 run in 0.044032 ms, achieved: 477.209 GB/s
# kernel1 run in 0.044032 ms, achieved: 477.209 GB/s
# kernel1 run in 0.043808 ms, achieved: 479.649 GB/s
# kernel1 run in 0.043008 ms, achieved: 488.571 GB/s
# kernel1 run in 0.044032 ms, achieved: 477.209 GB/s
# kernel1 run in 0.043008 ms, achieved: 488.571 GB/s
# kernel1 run in 0.043008 ms, achieved: 488.571 GB/s
# kernel1 run in 0.043008 ms, achieved: 488.571 GB/s
```
So this PR gives about 35% improvement in performance using nvfuser executor with this specific normalized shape.

Also this PR fixes https://github.com/pytorch/pytorch/issues/83506 (see the change in `torch/csrc/jit/python/pybind_utils.cpp`).

Ref. https://github.com/pytorch/pytorch/issues/80187

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83508
Approved by: https://github.com/ngimel
2022-08-28 18:45:25 +00:00
PyTorch MergeBot
b159a5230f Revert "Add nvprims.var_mean (#83508)"
This reverts commit 7e7694b661.

Reverted https://github.com/pytorch/pytorch/pull/83508 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally
2022-08-28 11:30:27 +00:00
Ivan Yashchuk
7e7694b661 Add nvprims.var_mean (#83508)
This PR adds nvfuser-specific primitive - `var_mean`.
Interpretation `torch.var_mean` -> `torch.ops.nvprims.var_mean` is handled by `TorchRefsNvfuserCapabilityMode` context manager.

I moved some helper code from `_prims/__init__.py` to `_prims_common`. Correctness is tested with OpInfo tests (see `PythonRefInfo("ops.nvprims.var_mean"`).

Layer norm reference now uses `torch.var_mean` instead of `torch._refs.var_mean` to allow interception. Here's a simple comparison of performance with this PR and master (on 3080ti):
```py
import torch
from torch._prims.context import TorchRefsNvfuserCapabilityMode
from torch.fx.experimental.proxy_tensor import make_fx
from torch._prims.executor import execute

def func(a):
    return torch.native_layer_norm(a, (1024,), None, None, 1e-6)

a = torch.randn(10, 512, 1024, dtype=torch.float16, device="cuda")

with TorchRefsNvfuserCapabilityMode():
    gm = make_fx(func)(a)

for _ in range(10):
    execute(gm, a, executor="strictly_nvfuser");
```
run with `PYTORCH_NVFUSER_DUMP=dump_eff_bandwidth python script.py`
```py
# WITH THIS PR
# kernel1 run in 0.032768 ms, achieved: 641.25 GB/s
# kernel1 run in 0.033792 ms, achieved: 621.818 GB/s
# kernel1 run in 0.032768 ms, achieved: 641.25 GB/s
# kernel1 run in 0.032608 ms, achieved: 644.396 GB/s
# kernel1 run in 0.031744 ms, achieved: 661.935 GB/s
# kernel1 run in 0.031744 ms, achieved: 661.935 GB/s
# kernel1 run in 0.032768 ms, achieved: 641.25 GB/s
# kernel1 run in 0.03072 ms, achieved: 684 GB/s
# kernel1 run in 0.031744 ms, achieved: 661.935 GB/s
# kernel1 run in 0.031744 ms, achieved: 661.935 GB/s

# ON MASTER
# kernel1 run in 0.05632 ms, achieved: 373.091 GB/s
# kernel1 run in 0.044032 ms, achieved: 477.209 GB/s
# kernel1 run in 0.044032 ms, achieved: 477.209 GB/s
# kernel1 run in 0.044032 ms, achieved: 477.209 GB/s
# kernel1 run in 0.043808 ms, achieved: 479.649 GB/s
# kernel1 run in 0.043008 ms, achieved: 488.571 GB/s
# kernel1 run in 0.044032 ms, achieved: 477.209 GB/s
# kernel1 run in 0.043008 ms, achieved: 488.571 GB/s
# kernel1 run in 0.043008 ms, achieved: 488.571 GB/s
# kernel1 run in 0.043008 ms, achieved: 488.571 GB/s
```
So this PR gives about 35% improvement in performance using nvfuser executor with this specific normalized shape.

Also this PR fixes https://github.com/pytorch/pytorch/issues/83506 (see the change in `torch/csrc/jit/python/pybind_utils.cpp`).

Ref. https://github.com/pytorch/pytorch/issues/80187

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83508
Approved by: https://github.com/ngimel
2022-08-27 09:05:20 +00:00
PyTorch MergeBot
c7edcd6968 Revert "Don't introduce new overload for SymInt (#83628)"
This reverts commit 9790d90e4b.

Reverted https://github.com/pytorch/pytorch/pull/83628 on behalf of https://github.com/malfet due to Breaks internal builds, see D39076487
2022-08-27 01:23:17 +00:00
Edward Z. Yang
9790d90e4b Don't introduce new overload for SymInt (#83628)
Previously, we introduced new SymInt overloads for every function we wanted.  This led to a lot of boilerplate, and also a lot of confusion about how the overloads needed to be implemented.

This PR takes a simpler but more risky approach: just take the original function and changes its ints to SymInts.

This is BC-breaking in the following ways:

* The C++ API for registering implementations for aten operators will change from int64_t to SymInt whenever you make this change. Code generated registrations in PyTorch do not change as codegen handles the translation automatically, but manual registrations will need to follow the change.  Typically, if you now accept a SymInt where you previously only took int64_t, you have to convert it back manually.  This will definitely break XLA, see companion PR https://github.com/pytorch/xla/pull/3914 Note that not all dispatch keys get the automatic translation; all the composite keys and Meta keys are modified to take SymInt directly (because they should handle them directly), and so there are adjustments for this.

This is not BC-breaking in the following ways:

* The user facing C++ API remains compatible.  Even if a function changes from int to SymInt, the default C++ binding still takes only ints.  (e.g., at::empty(IntArrayRef, ...).  To call with SymInts, you must call at::empty_symint instead. This involved adding two more signatures to CppSignatureGroup; in many cases I refactored code to iterate over all signatures in the group instead of hard-coding the two that previously existed.
* This is TorchScript compatible; internally we treat SymInts as ints so there is no change to what happens at runtime in TorchScript. In particular, it's OK to reference an empty schema by its old type (using int types), as long as you're not doing string equality (which you shouldn't be), these parse to the same underyling type.

Structure of the PR:

* The general strategy of this PR is that, even when you write `SymInt` inside `native_functions.yaml`, sometimes, we will treat it *as if* it were an `int`. This idea pervades the codegen changes, where we have a translation from SymInt to c10::SymInt or int64_t, and this is controlled by a symint kwarg which I added and then audited all call sites to decide which I wanted. Here are some of the major places where we pick one or the other:
  * The C++ FunctionSchema representation represents `SymInt` as `int`. There are a few places we do need to know that we actually have a SymInt and we consult `real_type()` to get the real type in this case. In particular:
    * When we do schema validation of C++ operator registration, we must compare against true schema (as the C++ API will provide `c10::SymInt`, and this will only be accepted if the schema is `SymInt`. This is handled with cloneWithRealTypes before we check for schema differences.
    * In `toIValue` argument parsing, we parse against the true schema value. For backwards compatibility reasons, I do still accept ints in many places where Layout/SymInt/etc were expected. (Well, accepting int where SymInt is expected is not BC, it's just the right logic!)
  * In particular, because SymInt never shows up as type() in FunctionSchema, this means that we no longer need a dedicated Tag::SymInt. This is good, because SymInts never show up in mobile anyway.
* Changes to functorch/aten are mostly about tracking changes to the C++ API registration convention. Additionally, since SymInt overloads no longer exist, registrations for SymInt implementations are deleted. In many cases, the old implementations did not properly support SymInts; I did not add any new functionality with this PR, but I did try to annotate with TODOs where this is work to do. Finally, because the signature of `native::` API changed from int to SymInt, I need to find alternative APIs for people who were directly calling these functions to call. Typically, I insert a new dispatch call when perf doesn't matter, or use `at::compositeexplicitautograd` namespace to handle other caes.
* The change to `make_boxed_from_unboxed_functor.h` is so that we accept a plain IntList IValue anywhere a SymIntList is expected; these are read-only arguments so covariant typing is OK.
* I change how unboxing logic works slightly. Previously, we interpret the C++ type for Layout/etc directly as IntType JIT type, which works well because the incoming IValue is tagged as an integer. Now, we interpret the C++ type for Layout as its true type, e.g., LayoutType (change to `jit_type.h`), but then we accept an int IValue for it anyway. This makes it symmetric with SymInt, where we interpret the C++ type as SymIntType, and then accept SymInt and int IValues for it.
* I renamed the `empty.names` overload to `empty_names` to make it less confusing (I kept mixing it up with the real empty overload)
* I deleted the `empty.SymInt` overload, which ended up killing a pile of functions. (This was originally a separate PR but the profiler expect test was giving me grief so I folded it in.)
* I deleted the LazyDynamicOpsTest tests. These were failing after these changes, and I couldn't figure out why they used to be passing: they make use of `narrow_copy` which didn't actually support SymInts; they were immediately converted to ints.
* I bashed LTC into working. The patches made here are not the end of the story. The big problem is that SymInt translates into Value, but what if you have a list of SymInt? This cannot be conveniently represented in the IR today, since variadic Values are not supported. To work around this, I translate SymInt[] into plain int[] (this is fine for tests because LTC dynamic shapes never actually worked); but this will need to be fixed for proper LTC SymInt support. The LTC codegen also looked somewhat questionable; I added comments based on my code reading.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83628
Approved by: https://github.com/albanD, https://github.com/bdhirsh
2022-08-26 01:35:40 +00:00
PyTorch MergeBot
a7edf71360 Revert "Don't introduce new overload for SymInt (#83628)"
This reverts commit 8fae7027b3.

Reverted https://github.com/pytorch/pytorch/pull/83628 on behalf of https://github.com/malfet due to breaking internal builds, see https://www.internalfb.com/diff/D38984222
2022-08-25 00:49:40 +00:00
Edward Z. Yang
8fae7027b3 Don't introduce new overload for SymInt (#83628)
Previously, we introduced new SymInt overloads for every function we wanted.  This led to a lot of boilerplate, and also a lot of confusion about how the overloads needed to be implemented.

This PR takes a simpler but more risky approach: just take the original function and changes its ints to SymInts.

This is BC-breaking in the following ways:

* The C++ API for registering implementations for aten operators will change from int64_t to SymInt whenever you make this change. Code generated registrations in PyTorch do not change as codegen handles the translation automatically, but manual registrations will need to follow the change.  Typically, if you now accept a SymInt where you previously only took int64_t, you have to convert it back manually.  This will definitely break XLA, see companion PR https://github.com/pytorch/xla/pull/3914 Note that not all dispatch keys get the automatic translation; all the composite keys and Meta keys are modified to take SymInt directly (because they should handle them directly), and so there are adjustments for this.

This is not BC-breaking in the following ways:

* The user facing C++ API remains compatible.  Even if a function changes from int to SymInt, the default C++ binding still takes only ints.  (e.g., at::empty(IntArrayRef, ...).  To call with SymInts, you must call at::empty_symint instead. This involved adding two more signatures to CppSignatureGroup; in many cases I refactored code to iterate over all signatures in the group instead of hard-coding the two that previously existed.
* This is TorchScript compatible; internally we treat SymInts as ints so there is no change to what happens at runtime in TorchScript. In particular, it's OK to reference an empty schema by its old type (using int types), as long as you're not doing string equality (which you shouldn't be), these parse to the same underyling type.

Structure of the PR:

* The general strategy of this PR is that, even when you write `SymInt` inside `native_functions.yaml`, sometimes, we will treat it *as if* it were an `int`. This idea pervades the codegen changes, where we have a translation from SymInt to c10::SymInt or int64_t, and this is controlled by a symint kwarg which I added and then audited all call sites to decide which I wanted. Here are some of the major places where we pick one or the other:
  * The C++ FunctionSchema representation represents `SymInt` as `int`. There are a few places we do need to know that we actually have a SymInt and we consult `real_type()` to get the real type in this case. In particular:
    * When we do schema validation of C++ operator registration, we must compare against true schema (as the C++ API will provide `c10::SymInt`, and this will only be accepted if the schema is `SymInt`. This is handled with cloneWithRealTypes before we check for schema differences.
    * In `toIValue` argument parsing, we parse against the true schema value. For backwards compatibility reasons, I do still accept ints in many places where Layout/SymInt/etc were expected. (Well, accepting int where SymInt is expected is not BC, it's just the right logic!)
  * In particular, because SymInt never shows up as type() in FunctionSchema, this means that we no longer need a dedicated Tag::SymInt. This is good, because SymInts never show up in mobile anyway.
* Changes to functorch/aten are mostly about tracking changes to the C++ API registration convention. Additionally, since SymInt overloads no longer exist, registrations for SymInt implementations are deleted. In many cases, the old implementations did not properly support SymInts; I did not add any new functionality with this PR, but I did try to annotate with TODOs where this is work to do. Finally, because the signature of `native::` API changed from int to SymInt, I need to find alternative APIs for people who were directly calling these functions to call. Typically, I insert a new dispatch call when perf doesn't matter, or use `at::compositeexplicitautograd` namespace to handle other caes.
* The change to `make_boxed_from_unboxed_functor.h` is so that we accept a plain IntList IValue anywhere a SymIntList is expected; these are read-only arguments so covariant typing is OK.
* I change how unboxing logic works slightly. Previously, we interpret the C++ type for Layout/etc directly as IntType JIT type, which works well because the incoming IValue is tagged as an integer. Now, we interpret the C++ type for Layout as its true type, e.g., LayoutType (change to `jit_type.h`), but then we accept an int IValue for it anyway. This makes it symmetric with SymInt, where we interpret the C++ type as SymIntType, and then accept SymInt and int IValues for it.
* I renamed the `empty.names` overload to `empty_names` to make it less confusing (I kept mixing it up with the real empty overload)
* I deleted the `empty.SymInt` overload, which ended up killing a pile of functions. (This was originally a separate PR but the profiler expect test was giving me grief so I folded it in.)
* I deleted the LazyDynamicOpsTest tests. These were failing after these changes, and I couldn't figure out why they used to be passing: they make use of `narrow_copy` which didn't actually support SymInts; they were immediately converted to ints.
* I bashed LTC into working. The patches made here are not the end of the story. The big problem is that SymInt translates into Value, but what if you have a list of SymInt? This cannot be conveniently represented in the IR today, since variadic Values are not supported. To work around this, I translate SymInt[] into plain int[] (this is fine for tests because LTC dynamic shapes never actually worked); but this will need to be fixed for proper LTC SymInt support. The LTC codegen also looked somewhat questionable; I added comments based on my code reading.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83628
Approved by: https://github.com/albanD, https://github.com/bdhirsh
2022-08-23 22:04:07 +00:00
Nikolay Korovaiko
5b621205f4 Revert "Revert "adding a custom caster for c10::SymInt (#82692)"" (#83223)
This should fix the MacOS build errors and reland #82692
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83223
Approved by: https://github.com/albanD
2022-08-12 00:46:50 +00:00
David Berard
1f99bdfcc4 [JIT] Retry - Support scripting torch.is_autocast_enabled() (#82394)
This adds an `aten::is_autocast_enabled` op into the jit runtime so that
autocasting ops can be scripted and called from within jit.

Differential Revision: [D38294040](https://our.internmc.facebook.com/intern/diff/D38294040)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82394
Approved by: https://github.com/eellison
2022-08-10 18:26:17 +00:00
goldenxuett
2b6905413e [JIT] Add SchemaCheckMode OpInfo test (#82442)
- Move test_schema_check to torch/test directory.
- Add opInfo test for SchemaCheckMode to check all operator schemas
- Add various changes (using isClose instead of equals, skipping complex number cases for certain ops, etc...) in order to have test_schema_check pass.

Differential Revision: [D38437946](https://our.internmc.facebook.com/intern/diff/D38437946)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82442
Approved by: https://github.com/davidberard98
2022-08-09 23:13:43 +00:00
PyTorch MergeBot
daeea7d2c3 Revert "adding a custom caster for c10::SymInt (#82692)"
This reverts commit dee63f4f7b.

Reverted https://github.com/pytorch/pytorch/pull/82692 on behalf of https://github.com/seemethere due to Broke internal builds, see [logs](https://www.internalfb.com/intern/sandcastle/job/4503600373141339/insights)
2022-08-09 22:17:41 +00:00
Edward Z. Yang
988bd0173c Add OpOverload.decompose API (#83075)
This allows you to directly call into the CompositeImplicitAutograd
implementation of an operator, *without* changing any aspects of the
dispatcher state.  In particular, you can use this to recursively call
into a decomposition, dispatching back to your tensor subclass/mode
as desired.

Hypothetically, we should also make these available in the
decompositions dictionary, but I'm leaving this as future work as
enumerating these decompositions is annoying (as operators are lazily
registered.)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83075
Approved by: https://github.com/albanD
2022-08-09 18:53:19 +00:00
Tugsbayasgalan Manlaibaatar
b4b60c2a2e Get rid of ENABLE_UPGRADERS macro (#77574)
Since it's been a while after we merged the upgrader design and we haven't encountered any issues, let's get rid of the macro for safe rollout
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77574
Approved by: https://github.com/gmagogsfm
2022-08-09 05:33:14 +00:00
Nikolay Korovaiko
dee63f4f7b adding a custom caster for c10::SymInt (#82692)
### Description
Adding a custom caster for `c10::SymInt`. This simplifies handling of c10::SymInt on C++/Pytorch boundary. Namely, removing if statements to handle the union nature (e.g. SymIntNode, int) of c10::SymInt.

### Issue
<!-- Link to Issue ticket or RFP -->

### Testing
<!-- How did you test your change? -->

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82692
Approved by: https://github.com/ezyang
2022-08-08 21:40:53 +00:00
Nikolay Korovaiko
8b20e47974 add integer divison for symints (#82791)
### Description
This PR brings integer division (floor) to symints + tests.

### Issue

https://github.com/orgs/pytorch/projects/17/views/2

### Testing
added two tests to TestPySymInts

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82791
Approved by: https://github.com/ezyang
2022-08-04 20:00:51 +00:00
Ivan Yashchuk
ec67c6abbe Add torch.ops.nvprims namespace for nvFuser-specific prims (#82155)
New namespace `torch.ops.nvprims` is meant for specific to the nvFuser set of primitives. All `impl_nvfuser` attributes are removed from `torch.ops.prims` functions.

`NvfuserPrimsMode()` context manager can be used for automatic rewrite of `torch.ops.prims` calls to `torch.ops.nvprims` when possible.

The previous way to test whether a prim would be executable with nvFuser was to test `impl_nvfuser is not None`, now all functions in the `torch.ops.nvprims` namespace are supposed to have the `impl_nvfuser` attribute and hence all are executable by nvFuser.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82155
Approved by: https://github.com/jjsjann123, https://github.com/ngimel
2022-08-04 16:51:56 +00:00
Edward Z. Yang
df69660832 Revert "Revert "Add a lint rule for torch/csrc/util/pybind.h include (#82552)"" (#82599)
This reverts commit 532b8a9e00.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82599
Approved by: https://github.com/albanD
2022-08-02 19:37:02 +00:00
Andres Lugo-Reyes
f1a1356907 [ROCm] Enable/fix unit tests test_stream_args and test_event_args (#82346)
### Description
Removed some stubbed out code that was necessary for ROCm builds to support JIT compilation of Event and Stream classes. Original motivation for the code to be stubbed out in the ROCm case was likely due to this pull request:
https://github.com/pytorch/pytorch/pull/48020
In this PR, the include statement at the at the top of cuda.h was incorrectly pointed to aten/src/ATen/cuda/CUDAEvent.h when it should have been set to ATen/cuda/CUDAEvent.h. This error caused the hipification process of build_amd.py to not hipify this include statement correctly, causing errors. The include statement in question was subsequently fixed in the following commit:
acd072967a

This PR re-introduces the stubbed out code to the ROCm build and "unskips" the associated unit tests.

### Testing
Note: bullets prepended by ROCm were tested on systems with AMD GPUs while the others were tested with NVIDIA GPUs.
- apply commit
- (ROCm)`python tools/amd_build/build_amd.py`
- `python setup.py develop`
- (ROCm)`PYTORCH_TEST_WITH_ROCM=1 python test/test_jit.py TestCUDA.test_event_args`
- (ROCm)`PYTORCH_TEST_WITH_ROCM=1 python test/test_jit.py TestCUDA.test_stream_args`
- `python test/test_jit.py TestCUDA.test_event_args`
- `python test/test_jit.py TestCUDA.test_stream_args`
- Confirm tests pass in all scenarios

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82346
Approved by: https://github.com/malfet
2022-08-01 22:55:15 +00:00
PyTorch MergeBot
532b8a9e00 Revert "Add a lint rule for torch/csrc/util/pybind.h include (#82552)"
This reverts commit 9465c0e0b5.

Reverted https://github.com/pytorch/pytorch/pull/82552 on behalf of https://github.com/zengk95 due to This seems to be breaking windows binary wheels
2022-08-01 20:25:35 +00:00
Edward Z. Yang
9465c0e0b5 Add a lint rule for torch/csrc/util/pybind.h include (#82552)
We define specializations for pybind11 defined templates
(in particular, PYBIND11_DECLARE_HOLDER_TYPE) and consequently
it is important that these specializations *always* be #include'd
when making use of pybind11 templates whose behavior depends on
these specializations, otherwise we can cause an ODR violation.

The easiest way to ensure that all the specializations are always
loaded is to designate a header (in this case, torch/csrc/util/pybind.h)
that ensures the specializations are defined, and then add a lint
to ensure this header is included whenever pybind11 headers are
included.

The existing grep linter didn't have enough knobs to do this
conveniently, so I added some features.  I'm open to suggestions
for how to structure the features better.  The main changes:

- Added an --allowlist-pattern flag, which turns off the grep lint
  if some other line exists.  This is used to stop the grep
  lint from complaining about pybind11 includes if the util
  include already exists.

- Added --match-first-only flag, which lets grep only match against
  the first matching line.  This is because, even if there are multiple
  includes that are problematic, I only need to fix one of them.
  We don't /really/ need this, but when I was running lintrunner -a
  to fixup the preexisting codebase it was annoying without this,
  as the lintrunner overall driver fails if there are multiple edits
  on the same file.

I excluded any files that didn't otherwise have a dependency on
torch/ATen, this was mostly caffe2 and the valgrind wrapper compat
bindings.

Note the grep replacement is kind of crappy, but clang-tidy lint
cleaned it up in most cases.

See also https://github.com/pybind/pybind11/issues/4099

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82552
Approved by: https://github.com/albanD
2022-08-01 17:16:58 +00:00
Edward Z. Yang
50e8abbcad Change SymIntNode into an intrusive pointer (#82548)
This will make the pointer type a single word, which is important
for packing it into an int64_t

This time, this diff doesn't segfault when you build with DEBUG mode; more details at https://github.com/pybind/pybind11/issues/4099

Signed-off-by: Edward Z. Yang <ezyangfb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82548
Approved by: https://github.com/albanD
2022-08-01 15:07:21 +00:00
PyTorch MergeBot
3b9cbb1738 Revert "Change SymIntNode into an intrusive pointer (#82432)"
This reverts commit 7be44f8158.

Reverted https://github.com/pytorch/pytorch/pull/82432 on behalf of https://github.com/ezyang due to segfaults on test but not caught in CI
2022-07-29 20:08:59 +00:00
Edward Z. Yang
7be44f8158 Change SymIntNode into an intrusive pointer (#82432)
This will make the pointer type a single word, which is important
for packing it into an int64_t

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82432
Approved by: https://github.com/albanD, https://github.com/Krovatkin
2022-07-29 17:32:54 +00:00
William Tambellini
6e56629efa [JIT] JIT script init verbose assert (#80495)
Log the sizes of inputs in the assert of setInputTensorTypes(...)
in jit/python/script_init.cpp for easy debugging.
Helps/close:
https://github.com/pytorch/pytorch/issues/72763
Fixes #72763

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80495
Approved by: https://github.com/davidberard98
2022-07-29 00:50:18 +00:00
Edward Z. Yang
34bdd46e6e Rename shared_ptr<SymIntNodeImpl> to SymIntNode (#82355)
Makes code a lot more compact!  It also makes it possible to swap out
the shared ptr implementation, which I am about to do next.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82355
Approved by: https://github.com/Krovatkin
2022-07-28 18:27:45 +00:00
Edward Z. Yang
fd5ac1e6b5 Rename SymbolicIntNode to SymIntNodeImpl (#82350)
Done via

```
git grep -l 'SymbolicIntNode' | xargs sed -i 's/SymbolicIntNode/SymIntNodeImpl/g'
```

Reasoning for the change:

* Sym is shorter than Symbolic, and consistent with SymInt
* You usually will deal in shared_ptr<...>, so we're going to
  reserve the shorter name (SymIntNode) for the shared pointer.

But I don't want to update the Python name, so afterwards I ran

```
 git grep -l _C.SymIntNodeImpl | xargs sed -i 's/_C.SymIntNodeImpl/_C.SymIntNode/'
```

and manually fixed up the binding code

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82350
Approved by: https://github.com/Krovatkin
2022-07-28 18:27:45 +00:00
PyTorch MergeBot
554b4060aa Revert "[JIT] Support scripting torch.is_autocast_enabled() (#81305)"
This reverts commit bcc9084bc4.

Reverted https://github.com/pytorch/pytorch/pull/81305 on behalf of https://github.com/malfet due to Broke lite-intepreter builds, see https://github.com/pytorch/pytorch/runs/7550084494?check_suite_focus=true
2022-07-28 00:02:53 +00:00
David Berard
bcc9084bc4 [JIT] Support scripting torch.is_autocast_enabled() (#81305)
This adds an `aten::is_autocast_enabled` op into the jit runtime so that
autocasting ops can be scripted and called from within jit.

Differential Revision: [D37901585](https://our.internmc.facebook.com/intern/diff/D37901585)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81305
Approved by: https://github.com/qihqi, https://github.com/eellison
2022-07-27 22:32:08 +00:00
albanD
4b7de26556 Fix C API to be compatible with latest 3.11 beta (#81242)
Based off https://github.com/pytorch/pytorch/pull/80511 with extra changes:
- Update pybind to the latest release as it contains some needed fixes
- Extend the compat header to do reduce changes in code
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81242
Approved by: https://github.com/malfet, https://github.com/mattip
2022-07-27 08:37:10 +00:00
goldenxuett
d576a7dc97 [JIT] Fix python binding error with empty containers in init.cpp (#81786)
- toTypeInferredIValue will throw an error when given an empty container because it isn't able to tell what kind of container it is. Thus empty containers are ignored in addArgumentValue/s, overlaps, and is_alias_of.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81786
Approved by: https://github.com/davidberard98
2022-07-23 05:50:39 +00:00
goldenxuett
c9497886fd [JIT] Modify is_mutable in FunctionSchema and SchemaInfo to have SchemaArgument parameter instead of index (#81784)
- Modify the is_mutable(size_t index) overload to become is_mutable(const SchemaArgument& argument) due to cases where one might want to check the mutability of either input or output arguments.
- Refactored all calls to the function to use this new overload
- Tested through is_mutable() tests in test_schema_info.cpp
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81784
Approved by: https://github.com/davidberard98
2022-07-20 22:09:56 +00:00
goldenxuett
21a4be34cd [JIT] Enchance training ops check to be more inclusive and account for possible pybind exceptions (#81782)
- Modified is_mutable python binding to accept a string instead of a string_view for better python compatibility.
- Modified argument value adding python bindings to deal with input/self edge case due to inconsistencies in how the first variable is named.
- Modified _is_alias_of and created _contains_alias_of python bindings to accurately find out if values are aliasing, or contain an alias.
- Fixed is_mutable implementation to cover all ops that have mutable optional arguments. (These are all the ops that have the optional arguments 'running_mean' and 'running_var' along with either 'train', 'training' or 'use_input_stats.'
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81782
Approved by: https://github.com/davidberard98
2022-07-20 22:09:54 +00:00
goldenxuett
8e454cc702 [JIT] Add SchemaInfo python bindings to init.cpp (#81518)
- Added python bindings for SchemaInfo class, SchemaArgument struct, and SchemaArgType enum.
- Tested that argument values are added correctly to SchemaInfo binding in test_schema_check.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81518
Approved by: https://github.com/davidberard98
2022-07-19 22:33:19 +00:00
Edward Z. Yang
7e60e315da Add support for Generator conversion to/from IValue (#81697)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81697
Approved by: https://github.com/anjali411
2022-07-19 16:50:10 +00:00
Horace He
97938d872e Added a couple more symint magic methods + symbolic shape infra (#81086)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81086
Approved by: https://github.com/ezyang
2022-07-16 23:47:58 +00:00
albanD
1afb804f26 Improve wrapper subclass detection for serialization (#81105)
Fixes https://github.com/pytorch/pytorch/issues/80983

Also fix a small bug uncovered by the new test where creating memory_view for 0-sized inputs is not valid and is now skipped

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81105
Approved by: https://github.com/ezyang
2022-07-11 14:02:37 +00:00
Edward Z. Yang
91b0250606 Remove dead code from torch.ops torch function handling (#80993)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80993
Approved by: https://github.com/anjali411
2022-07-06 22:56:18 +00:00
Edward Z. Yang
421f04dd02 Only allow numbers as tensors if operator was explicitly allowlisted so (#80587)
Fixes https://github.com/pytorch/pytorch/issues/80508

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80587
Approved by: https://github.com/ngimel
2022-06-30 18:59:38 +00:00
Rodrigo Kumpera
b4e491798c Avoid temporary buffers for tensors with torch.save. (#80404)
Fix torch.save _open_zipfile_writer optimization that uses a c++ stream when `f` is a os.PathLike.
This fastpath requires that we don't `open()` in python if possible, so don't do it unconditionally.

Fix PyTorchStreamWriter construction binding that takes a buffer object.
Use py::memoryview instead of py::bytes as the former doesn't copy the data.

Validated with a trivial benchmark that calls torch.save in a loop 20x with a 10M elements float32 tensor
either on cpu or cuda. Saved to /dev/null.

Tried two variants 'str' and 'open'
    In 'str' we pass the string "/dev/null" to torch.save.
    In 'open' we pass `open("/dev/null", "wb")` to torch.save.

Timing in seconds.

Before this patch:
str-cpu :: 0.757
open-cpu :: 0.757
str-cuda :: 1.367
open-cuda :: 1.366

After this patch:
str-cpu :: 0.256
open-cpu :: 0.251
str-cuda :: 0.896
open-cuda :: 0.834

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80404
Approved by: https://github.com/jamesr66a
2022-06-30 00:19:42 +00:00
Horace He
7850a328b4 Revert "Revert "parse pysymints to IValues (#80066)"" (#80419)
This is a reland of https://github.com/pytorch/pytorch/pull/80066 with the relative path changed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80419
Approved by: https://github.com/Krovatkin
2022-06-28 17:21:34 +00:00
PyTorch MergeBot
0322ecc3fd Revert "parse pysymints to IValues (#80066)"
This reverts commit f532b3a619.

Reverted https://github.com/pytorch/pytorch/pull/80066 on behalf of https://github.com/seemethere due to Uses relative includes which causes internal builds to fail
2022-06-24 20:15:09 +00:00
Nikolay Korovaiko
f532b3a619 parse pysymints to IValues (#80066)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80066
Approved by: https://github.com/Chillee
2022-06-23 19:51:08 +00:00