- Support storing SymFloat in IValue
- Add SymFloat to JIT type system (erases to float)
- Printing support for SymFloat
- add/sub/mul/truediv operator support for SymFloat
- Support truediv on integers, it returns a SymFloat
- Support parsing SymFloat from Python object
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85411
Approved by: https://github.com/albanD
No changes to contents, just moving things out of header.
I only moved the stuff I suspected I'd be editing; maybe more
things from this header could migrate out.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84919
Approved by: https://github.com/suo
Also Back out "Revert D39075159: [acc_tensor] Use SymIntArrayRef for overloaded empty.memory_format's signature"
Original commit changeset: dab4a9dba4fa
Original commit changeset: dcaf16c037a9
Original Phabricator Diff: D38984222
Original Phabricator Diff: D39075159
Also update Metal registrations for C++ registration changes.
Also update NNPI registration to account for tightened schema checking
Differential Revision: [D39084762](https://our.internmc.facebook.com/intern/diff/D39084762/)
**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D39084762/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84173
Approved by: https://github.com/Krovatkin
This PR adds nvfuser-specific primitive - `var_mean`.
Interpretation `torch.var_mean` -> `torch.ops.nvprims.var_mean` is handled by `TorchRefsNvfuserCapabilityMode` context manager.
I moved some helper code from `_prims/__init__.py` to `_prims_common`. Correctness is tested with OpInfo tests (see `PythonRefInfo("ops.nvprims.var_mean"`).
Layer norm reference now uses `torch.var_mean` instead of `torch._refs.var_mean` to allow interception. Here's a simple comparison of performance with this PR and master (on 3080ti):
```py
import torch
from torch._prims.context import TorchRefsNvfuserCapabilityMode
from torch.fx.experimental.proxy_tensor import make_fx
from torch._prims.executor import execute
def func(a):
return torch.native_layer_norm(a, (1024,), None, None, 1e-6)
a = torch.randn(10, 512, 1024, dtype=torch.float16, device="cuda")
with TorchRefsNvfuserCapabilityMode():
gm = make_fx(func)(a)
for _ in range(10):
execute(gm, a, executor="strictly_nvfuser");
```
run with `PYTORCH_NVFUSER_DUMP=dump_eff_bandwidth python script.py`
```py
# WITH THIS PR
# kernel1 run in 0.032768 ms, achieved: 641.25 GB/s
# kernel1 run in 0.033792 ms, achieved: 621.818 GB/s
# kernel1 run in 0.032768 ms, achieved: 641.25 GB/s
# kernel1 run in 0.032608 ms, achieved: 644.396 GB/s
# kernel1 run in 0.031744 ms, achieved: 661.935 GB/s
# kernel1 run in 0.031744 ms, achieved: 661.935 GB/s
# kernel1 run in 0.032768 ms, achieved: 641.25 GB/s
# kernel1 run in 0.03072 ms, achieved: 684 GB/s
# kernel1 run in 0.031744 ms, achieved: 661.935 GB/s
# kernel1 run in 0.031744 ms, achieved: 661.935 GB/s
# ON MASTER
# kernel1 run in 0.05632 ms, achieved: 373.091 GB/s
# kernel1 run in 0.044032 ms, achieved: 477.209 GB/s
# kernel1 run in 0.044032 ms, achieved: 477.209 GB/s
# kernel1 run in 0.044032 ms, achieved: 477.209 GB/s
# kernel1 run in 0.043808 ms, achieved: 479.649 GB/s
# kernel1 run in 0.043008 ms, achieved: 488.571 GB/s
# kernel1 run in 0.044032 ms, achieved: 477.209 GB/s
# kernel1 run in 0.043008 ms, achieved: 488.571 GB/s
# kernel1 run in 0.043008 ms, achieved: 488.571 GB/s
# kernel1 run in 0.043008 ms, achieved: 488.571 GB/s
```
So this PR gives about 35% improvement in performance using nvfuser executor with this specific normalized shape.
Also this PR fixes https://github.com/pytorch/pytorch/issues/83506 (see the change in `torch/csrc/jit/python/pybind_utils.cpp`).
Ref. https://github.com/pytorch/pytorch/issues/80187
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83508
Approved by: https://github.com/ngimel
This PR adds nvfuser-specific primitive - `var_mean`.
Interpretation `torch.var_mean` -> `torch.ops.nvprims.var_mean` is handled by `TorchRefsNvfuserCapabilityMode` context manager.
I moved some helper code from `_prims/__init__.py` to `_prims_common`. Correctness is tested with OpInfo tests (see `PythonRefInfo("ops.nvprims.var_mean"`).
Layer norm reference now uses `torch.var_mean` instead of `torch._refs.var_mean` to allow interception. Here's a simple comparison of performance with this PR and master (on 3080ti):
```py
import torch
from torch._prims.context import TorchRefsNvfuserCapabilityMode
from torch.fx.experimental.proxy_tensor import make_fx
from torch._prims.executor import execute
def func(a):
return torch.native_layer_norm(a, (1024,), None, None, 1e-6)
a = torch.randn(10, 512, 1024, dtype=torch.float16, device="cuda")
with TorchRefsNvfuserCapabilityMode():
gm = make_fx(func)(a)
for _ in range(10):
execute(gm, a, executor="strictly_nvfuser");
```
run with `PYTORCH_NVFUSER_DUMP=dump_eff_bandwidth python script.py`
```py
# WITH THIS PR
# kernel1 run in 0.032768 ms, achieved: 641.25 GB/s
# kernel1 run in 0.033792 ms, achieved: 621.818 GB/s
# kernel1 run in 0.032768 ms, achieved: 641.25 GB/s
# kernel1 run in 0.032608 ms, achieved: 644.396 GB/s
# kernel1 run in 0.031744 ms, achieved: 661.935 GB/s
# kernel1 run in 0.031744 ms, achieved: 661.935 GB/s
# kernel1 run in 0.032768 ms, achieved: 641.25 GB/s
# kernel1 run in 0.03072 ms, achieved: 684 GB/s
# kernel1 run in 0.031744 ms, achieved: 661.935 GB/s
# kernel1 run in 0.031744 ms, achieved: 661.935 GB/s
# ON MASTER
# kernel1 run in 0.05632 ms, achieved: 373.091 GB/s
# kernel1 run in 0.044032 ms, achieved: 477.209 GB/s
# kernel1 run in 0.044032 ms, achieved: 477.209 GB/s
# kernel1 run in 0.044032 ms, achieved: 477.209 GB/s
# kernel1 run in 0.043808 ms, achieved: 479.649 GB/s
# kernel1 run in 0.043008 ms, achieved: 488.571 GB/s
# kernel1 run in 0.044032 ms, achieved: 477.209 GB/s
# kernel1 run in 0.043008 ms, achieved: 488.571 GB/s
# kernel1 run in 0.043008 ms, achieved: 488.571 GB/s
# kernel1 run in 0.043008 ms, achieved: 488.571 GB/s
```
So this PR gives about 35% improvement in performance using nvfuser executor with this specific normalized shape.
Also this PR fixes https://github.com/pytorch/pytorch/issues/83506 (see the change in `torch/csrc/jit/python/pybind_utils.cpp`).
Ref. https://github.com/pytorch/pytorch/issues/80187
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83508
Approved by: https://github.com/ngimel
Previously, we introduced new SymInt overloads for every function we wanted. This led to a lot of boilerplate, and also a lot of confusion about how the overloads needed to be implemented.
This PR takes a simpler but more risky approach: just take the original function and changes its ints to SymInts.
This is BC-breaking in the following ways:
* The C++ API for registering implementations for aten operators will change from int64_t to SymInt whenever you make this change. Code generated registrations in PyTorch do not change as codegen handles the translation automatically, but manual registrations will need to follow the change. Typically, if you now accept a SymInt where you previously only took int64_t, you have to convert it back manually. This will definitely break XLA, see companion PR https://github.com/pytorch/xla/pull/3914 Note that not all dispatch keys get the automatic translation; all the composite keys and Meta keys are modified to take SymInt directly (because they should handle them directly), and so there are adjustments for this.
This is not BC-breaking in the following ways:
* The user facing C++ API remains compatible. Even if a function changes from int to SymInt, the default C++ binding still takes only ints. (e.g., at::empty(IntArrayRef, ...). To call with SymInts, you must call at::empty_symint instead. This involved adding two more signatures to CppSignatureGroup; in many cases I refactored code to iterate over all signatures in the group instead of hard-coding the two that previously existed.
* This is TorchScript compatible; internally we treat SymInts as ints so there is no change to what happens at runtime in TorchScript. In particular, it's OK to reference an empty schema by its old type (using int types), as long as you're not doing string equality (which you shouldn't be), these parse to the same underyling type.
Structure of the PR:
* The general strategy of this PR is that, even when you write `SymInt` inside `native_functions.yaml`, sometimes, we will treat it *as if* it were an `int`. This idea pervades the codegen changes, where we have a translation from SymInt to c10::SymInt or int64_t, and this is controlled by a symint kwarg which I added and then audited all call sites to decide which I wanted. Here are some of the major places where we pick one or the other:
* The C++ FunctionSchema representation represents `SymInt` as `int`. There are a few places we do need to know that we actually have a SymInt and we consult `real_type()` to get the real type in this case. In particular:
* When we do schema validation of C++ operator registration, we must compare against true schema (as the C++ API will provide `c10::SymInt`, and this will only be accepted if the schema is `SymInt`. This is handled with cloneWithRealTypes before we check for schema differences.
* In `toIValue` argument parsing, we parse against the true schema value. For backwards compatibility reasons, I do still accept ints in many places where Layout/SymInt/etc were expected. (Well, accepting int where SymInt is expected is not BC, it's just the right logic!)
* In particular, because SymInt never shows up as type() in FunctionSchema, this means that we no longer need a dedicated Tag::SymInt. This is good, because SymInts never show up in mobile anyway.
* Changes to functorch/aten are mostly about tracking changes to the C++ API registration convention. Additionally, since SymInt overloads no longer exist, registrations for SymInt implementations are deleted. In many cases, the old implementations did not properly support SymInts; I did not add any new functionality with this PR, but I did try to annotate with TODOs where this is work to do. Finally, because the signature of `native::` API changed from int to SymInt, I need to find alternative APIs for people who were directly calling these functions to call. Typically, I insert a new dispatch call when perf doesn't matter, or use `at::compositeexplicitautograd` namespace to handle other caes.
* The change to `make_boxed_from_unboxed_functor.h` is so that we accept a plain IntList IValue anywhere a SymIntList is expected; these are read-only arguments so covariant typing is OK.
* I change how unboxing logic works slightly. Previously, we interpret the C++ type for Layout/etc directly as IntType JIT type, which works well because the incoming IValue is tagged as an integer. Now, we interpret the C++ type for Layout as its true type, e.g., LayoutType (change to `jit_type.h`), but then we accept an int IValue for it anyway. This makes it symmetric with SymInt, where we interpret the C++ type as SymIntType, and then accept SymInt and int IValues for it.
* I renamed the `empty.names` overload to `empty_names` to make it less confusing (I kept mixing it up with the real empty overload)
* I deleted the `empty.SymInt` overload, which ended up killing a pile of functions. (This was originally a separate PR but the profiler expect test was giving me grief so I folded it in.)
* I deleted the LazyDynamicOpsTest tests. These were failing after these changes, and I couldn't figure out why they used to be passing: they make use of `narrow_copy` which didn't actually support SymInts; they were immediately converted to ints.
* I bashed LTC into working. The patches made here are not the end of the story. The big problem is that SymInt translates into Value, but what if you have a list of SymInt? This cannot be conveniently represented in the IR today, since variadic Values are not supported. To work around this, I translate SymInt[] into plain int[] (this is fine for tests because LTC dynamic shapes never actually worked); but this will need to be fixed for proper LTC SymInt support. The LTC codegen also looked somewhat questionable; I added comments based on my code reading.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83628
Approved by: https://github.com/albanD, https://github.com/bdhirsh
Previously, we introduced new SymInt overloads for every function we wanted. This led to a lot of boilerplate, and also a lot of confusion about how the overloads needed to be implemented.
This PR takes a simpler but more risky approach: just take the original function and changes its ints to SymInts.
This is BC-breaking in the following ways:
* The C++ API for registering implementations for aten operators will change from int64_t to SymInt whenever you make this change. Code generated registrations in PyTorch do not change as codegen handles the translation automatically, but manual registrations will need to follow the change. Typically, if you now accept a SymInt where you previously only took int64_t, you have to convert it back manually. This will definitely break XLA, see companion PR https://github.com/pytorch/xla/pull/3914 Note that not all dispatch keys get the automatic translation; all the composite keys and Meta keys are modified to take SymInt directly (because they should handle them directly), and so there are adjustments for this.
This is not BC-breaking in the following ways:
* The user facing C++ API remains compatible. Even if a function changes from int to SymInt, the default C++ binding still takes only ints. (e.g., at::empty(IntArrayRef, ...). To call with SymInts, you must call at::empty_symint instead. This involved adding two more signatures to CppSignatureGroup; in many cases I refactored code to iterate over all signatures in the group instead of hard-coding the two that previously existed.
* This is TorchScript compatible; internally we treat SymInts as ints so there is no change to what happens at runtime in TorchScript. In particular, it's OK to reference an empty schema by its old type (using int types), as long as you're not doing string equality (which you shouldn't be), these parse to the same underyling type.
Structure of the PR:
* The general strategy of this PR is that, even when you write `SymInt` inside `native_functions.yaml`, sometimes, we will treat it *as if* it were an `int`. This idea pervades the codegen changes, where we have a translation from SymInt to c10::SymInt or int64_t, and this is controlled by a symint kwarg which I added and then audited all call sites to decide which I wanted. Here are some of the major places where we pick one or the other:
* The C++ FunctionSchema representation represents `SymInt` as `int`. There are a few places we do need to know that we actually have a SymInt and we consult `real_type()` to get the real type in this case. In particular:
* When we do schema validation of C++ operator registration, we must compare against true schema (as the C++ API will provide `c10::SymInt`, and this will only be accepted if the schema is `SymInt`. This is handled with cloneWithRealTypes before we check for schema differences.
* In `toIValue` argument parsing, we parse against the true schema value. For backwards compatibility reasons, I do still accept ints in many places where Layout/SymInt/etc were expected. (Well, accepting int where SymInt is expected is not BC, it's just the right logic!)
* In particular, because SymInt never shows up as type() in FunctionSchema, this means that we no longer need a dedicated Tag::SymInt. This is good, because SymInts never show up in mobile anyway.
* Changes to functorch/aten are mostly about tracking changes to the C++ API registration convention. Additionally, since SymInt overloads no longer exist, registrations for SymInt implementations are deleted. In many cases, the old implementations did not properly support SymInts; I did not add any new functionality with this PR, but I did try to annotate with TODOs where this is work to do. Finally, because the signature of `native::` API changed from int to SymInt, I need to find alternative APIs for people who were directly calling these functions to call. Typically, I insert a new dispatch call when perf doesn't matter, or use `at::compositeexplicitautograd` namespace to handle other caes.
* The change to `make_boxed_from_unboxed_functor.h` is so that we accept a plain IntList IValue anywhere a SymIntList is expected; these are read-only arguments so covariant typing is OK.
* I change how unboxing logic works slightly. Previously, we interpret the C++ type for Layout/etc directly as IntType JIT type, which works well because the incoming IValue is tagged as an integer. Now, we interpret the C++ type for Layout as its true type, e.g., LayoutType (change to `jit_type.h`), but then we accept an int IValue for it anyway. This makes it symmetric with SymInt, where we interpret the C++ type as SymIntType, and then accept SymInt and int IValues for it.
* I renamed the `empty.names` overload to `empty_names` to make it less confusing (I kept mixing it up with the real empty overload)
* I deleted the `empty.SymInt` overload, which ended up killing a pile of functions. (This was originally a separate PR but the profiler expect test was giving me grief so I folded it in.)
* I deleted the LazyDynamicOpsTest tests. These were failing after these changes, and I couldn't figure out why they used to be passing: they make use of `narrow_copy` which didn't actually support SymInts; they were immediately converted to ints.
* I bashed LTC into working. The patches made here are not the end of the story. The big problem is that SymInt translates into Value, but what if you have a list of SymInt? This cannot be conveniently represented in the IR today, since variadic Values are not supported. To work around this, I translate SymInt[] into plain int[] (this is fine for tests because LTC dynamic shapes never actually worked); but this will need to be fixed for proper LTC SymInt support. The LTC codegen also looked somewhat questionable; I added comments based on my code reading.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83628
Approved by: https://github.com/albanD, https://github.com/bdhirsh
### Description
Adding a custom caster for `c10::SymInt`. This simplifies handling of c10::SymInt on C++/Pytorch boundary. Namely, removing if statements to handle the union nature (e.g. SymIntNode, int) of c10::SymInt.
### Issue
<!-- Link to Issue ticket or RFP -->
### Testing
<!-- How did you test your change? -->
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82692
Approved by: https://github.com/ezyang
Done via
```
git grep -l 'SymbolicIntNode' | xargs sed -i 's/SymbolicIntNode/SymIntNodeImpl/g'
```
Reasoning for the change:
* Sym is shorter than Symbolic, and consistent with SymInt
* You usually will deal in shared_ptr<...>, so we're going to
reserve the shorter name (SymIntNode) for the shared pointer.
But I don't want to update the Python name, so afterwards I ran
```
git grep -l _C.SymIntNodeImpl | xargs sed -i 's/_C.SymIntNodeImpl/_C.SymIntNode/'
```
and manually fixed up the binding code
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82350
Approved by: https://github.com/Krovatkin
This makes prims look as if they were defined in native_functions.yaml
but they're still all written in Python. You now need to give a full
schema string for your prims. The returned prim object is now
torch.ops.prim overload (prims are not allowed to be overloaded,
so we return the overload, not the overload packet, for speed.)
Signed-off-by: Edward Z. Yang <ezyangfb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77117
Approved by: https://github.com/mruberry, https://github.com/albanD
For the most part, PrimTorch refs have the same signature as their
ATen equivalents. I modify most PrimTorch refs to register themselves
as decompositions, using the prim name they wrap to find the aten name
(except for a few cases where the prim/aten names mismatch). There are
some exclusions, falling into one of two categories:
- The torch equivalent was already implemented as a CompositeImplicitAutograd
decomposition in C++
- The ref doesn't support enough features (e.g., the real deal has more
kwargs / overloads than are currently implemented)
PrimTorch refs are written as a single function that supports all
overloads, and this style is convenient for cases where we have a bundle
of overloads for what morally is a single overload with a Union type
on an argument (which we ought to have supported in
native_functions.yaml but blah); to support registering a single decomp
for all the overloads, we modify register_decomposition to register
to ALL overloads if you pass it an overload packet. This is technically
BC breaking but no tests started failing because of it.
Signed-off-by: Edward Z. Yang <ezyangfb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76835
Approved by: https://github.com/Chillee, https://github.com/mruberry
Summary:
This PR introduces `SymInt` type to Pytorch which will be used by LTC and AOTAutograd for tracing size arithmetic and tests.
`SymInt` is a C++ union structure [int64_t, SymbolicIntNode*] that wraps around an int64_t field where the value of the field could be an index into a list of `shared_ptr<SymbolicIntNode>` or a real int.
This PR doesn't add any support for actually tracing symbolic ints. i.e. data_ for now can only contain real ints.
```
Goal 1: just to show we can add a type to PyTorch core. (wraps int) LANDEABLE
Finalize the naming - symint
Want the name to be short
Does invoke “size” - NO
SInt/SymInt/SymbolicInt
SInt could mean signed int
sym_int or symint or SymInt (originally it was “int”; capitalized implies object semantics, whereas lowercase implies value semantics)
JIT schema - symint
C++ - symint
```
See more details here: https://docs.google.com/document/d/1iiLNwR5ohAsw_ymfnOpDsyF6L9RTUaHMpD8 (d843f63f2a)YLw-jxEw
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74861
Reviewed By: qihqi, ngimel
Differential Revision: D35226230
Pulled By: Krovatkin
fbshipit-source-id: 34acf342bd50fcaa4d8d5dd49c2fd6a98823a5b3
(cherry picked from commit 218643f63ef181cabb92d13a6e837eb64f2dda3c)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73874
These get triggered when you are doing normal stuff with sparse
tensors and `__torch_dispatch__`, but it all works fine. No need
to warn.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: bdhirsh
Differential Revision: D34707395
Pulled By: ezyang
fbshipit-source-id: 3492c03abb1df1e925af3855dbf772784405d8c1
(cherry picked from commit 95e5981b304abf0367740906c238b29cadeea41c)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68945
This PR enables the Python conversion functions for `Storage` (specifically `UntypedStorage`) and also cleans up some remnants of the deprecated typed storages from `DynamicTypes.cpp`.
ghstack-source-id: 147245110
Test Plan: Run the existing unit and integration tests.
Reviewed By: albanD
Differential Revision: D32676505
fbshipit-source-id: 3a3f6db4fb0da5c78dd406c96ab70bdc37015521
(cherry picked from commit d6427b94cf)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68136
DynamicType is an extension to existing server JIT types. Today using normal server types on Edge is a bit problematic because in embedded environments we don't need the full spectrum of types but we still build with these unneeded dependencies.
Is it possible to just get rid of unneeded JIT types from Edge builds? It's not easy to do so at this moment. For example, on Edge we don't support Union type, but we have to pull in the dependency of Union type because Optional type is being supported which inherits from Union type, so Union type has to be included in the build. Although we could split Union type and Optional type, it could be argued that the root cause is every time we use anything inheriting from `c10::Type`, we don't have the direct evidence of how much dependency we pull in, because we do virtual calls and we don't know what exactly we're calling with server JIT types. If we don't know, it's highly possible that the linker doesn't know either so it cannot effectively strip unused methods.
To address this problem, one option is to implement a separate `DynamicType` which has simpler behavior and doesn't store different types as different symbols in binary but rather raw data (or "tag"). This could increase the binary size by several KBs, so I included several binary size reductions in the same stack, hoping at least we don't regress the binary size.
Currently `DynamicType` inherits from `c10::Type` because I want to reduce the migration cost of `DynamicType` by making it interfacing with existing server JIT types. In the future `DynamicType` should be implemented as a separate class without relying on `c10::Type` to make things both simpler and leaner.
ghstack-source-id: 146670522
Test Plan: in the next diff.
Reviewed By: VitalyFedyunin
Differential Revision: D32264615
fbshipit-source-id: 180eb0998a14eacc1d8b28db39870d84fcc17d5b
Summary:
As per title. This in particular allows to more easily override backward function for which the underlying backend returns `None`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67793
Reviewed By: zou3519
Differential Revision: D32242962
Pulled By: albanD
fbshipit-source-id: 6e114def90ee9499161e1303d301ba7fd003ff89
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65345
FooType::get() can return a const reference. Inconveniently, converting shared_ptr<FooType> to shared_ptr<Type> requires a copy & refcount bump, so to properly take advantage of this in unshapedType() we need to take a const Type& in isSubtypeOf(), which is good practice anyway -- don't require a shared_ptr if you don't need to take ownership.
ghstack-source-id: 140044165
Test Plan:
CI
perf says c10::unshapedType time decreased from 2.8% to 2.2% during static runtime startup, though I expect this to be generally beneficial.
Reviewed By: hlu1
Differential Revision: D31027361
fbshipit-source-id: 676feb81db9f74ad7b8651d8774f4ecb4cfa6ab8
Summary:
This PR is created to replace https://github.com/pytorch/pytorch/pull/53180 PR stack, which has all the review discussions. Reason for needing a replacement is due to a messy Sandcastle issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64234
Reviewed By: gmagogsfm
Differential Revision: D30656444
Pulled By: ansley
fbshipit-source-id: 77536c8bcc88162e2c72636026ca3c16891d669a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61520
This commit widens the exception caught by the try-catch block that checks if
an object passed to a scripted function is a `ScriptList`. It turns out that
there are internal tests that do not throw a `py::cast_error` so catching only
that is not sufficient.
Test Plan: Ran the failing tests in T94889011.
Reviewed By: Chillee
Differential Revision: D29560815
fbshipit-source-id: 442258f8997146d833a9d5db923e1f6359f2bfdd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52832
**Summary**
This commit adds `torch._C.ScriptList`, a list type that has reference
semantics across the Python/TorchScript boundary. That is, modifications
made in TorchScript to instances of `torch._C.ScriptList`
are visible in Python even when it is not returned from the function.
`torch._C.ScriptList` is implemented using a modified version of pybind's
`stl_bind.h`-style bindings attached to `ScriptList` and `ScriptListIterator`,
wrapper classes around `c10::impl::GenericList` and
`c10::impl::GenericList::iterator`. These bindings allow instances of
`torch._C.ScriptList` to be used as if it were a
regular `list` in Python. Reference semantics are achieved by simply
retrieving the `IValue` contained in `ScriptList` in `toIValue` (invoked
when converting Python arguments to `IValues` before calling TorchScript
code).
**Test Plan**
This commit adds `TestScriptList` to `test_list_dict.py`, a set of tests
that check that all of the common list operations are supported
and that instances have reference semantics across the
Python/TorchScript boundary.
Test Plan: Imported from OSS
Reviewed By: gmagogsfm
Differential Revision: D29478121
Pulled By: SplitInfinity
fbshipit-source-id: 652cc25cfa37debe28db9527504846f22abd8b54
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44324
**Summary**
This commit adds reference semantics to TorchScript class types;
modifications made to them within TorchScript will be visible in Python.
**Test Plan**
This commit adds a unit test to `TestClassType` that checks that
modifications made to a class type instance passed into TorchScript are
visible in Python after executing the scripted function or module.
**Fixes**
This commit closes#41421.
Test Plan: Imported from OSS
Reviewed By: gmagogsfm
Differential Revision: D24912807
Pulled By: SplitInfinity
fbshipit-source-id: d64ac6211012425b040b987e3358253016e84ca0
Summary:
This commit removes the warning that suggests that users script their
dictionaries before passing them into TorchScript code. The ScriptDict feature
is not fully ready, so it does not make sense to recommend this yet.
Test Plan:
Sandcastle.
In addition, the PyPER test broken by the original diff passes:
```
buck test mode/opt //caffe2/torch/fb/training_toolkit/backend/tests:test_model_materializer_full_sync_lwt -- --exact 'caffe2/torch/fb/training_toolkit/backend/tests:test_model_materializer_full_sync_lwt - caffe2.torch.fb.training_toolkit.backend.tests.test_model_materializer_full_sync_lwt.ModelMaterializerFullSyncLwtTest: test_materialization_determinism_cpu' --run-disabled
```
Differential Revision: D28891351
fbshipit-source-id: 2a3a00cde935d670fb1dc7fd8c709ae9c2ad8cdc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59287
D27211605 added a warning in `toIValue` that warns users to script their
dictionaries before passing them to TorchScript functions in order to get some
performance benefits and reference semantics. However, this warning is emitted
every time `toIValue` is called (e.g. when a dictionary is passed to
TorchScript function), which can lead to noisy log output. This diff changes
this changes to use `TORCH_WARN_ONCE` instead.
Test Plan: Sandcastle, OSS CI.
Reviewed By: hyuen
Differential Revision: D28824468
fbshipit-source-id: e651eade4380abaf77c6c8a81ec4e565b0c2c714
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52659
**Summary**
This commit adds `torch._C.ScriptDict`, a dictionary type that has reference
semantics across the Python/TorchScript boundary. That is, modifications
made to instances of `torch._C.ScriptDict` in TorchScript are visible in
Python even when it is not returned from the function. Instances can be
constructed by passing an instance of a Python dictionary to
`torch.jit.script`. In the case of an empty dictionary, its type is
assumed to be `Dict[str, Tensor]` to be consistent with the handling of
empty dictionaries in TorchScript source code.
`torch._C.ScriptDict` is implemented using a modified version of pybind's `stl_bind.h`-style bindings attached to `ScriptDict`, `ScriptDictIterator` and `ScriptDictKeyIterator`, wrapper classes around `c10::impl::GenericDict` and `c10::impl::GenericDict::iterator`. These bindings allow instances of `torch._C.ScriptDict` to be used as if it were a regular `dict` Python. Reference semantics are achieved by simply retrieving the `IValue` contained in `ScriptDict` in `toIValue` (invoked when converting Python arguments to `IValues` before calling TorchScript code).
**Test Plan**
This commit adds `TestScriptDict` to `test_list_dict.py`, a set of tests
that check that all of the common dictionary operations are supported
and that instances have reference semantics across the
Python/TorchScript boundary.
Differential Revision:
D27211605
D27211605
Test Plan: Imported from OSS
Reviewed By: gmagogsfm
Pulled By: SplitInfinity
fbshipit-source-id: 446d4e5328375791aa73eb9e8b04dfe3465af960
Summary:
Previously we might have gotten segfaults and all, now it raises an exception.
Thread safety hasn't been an objective.
I have a followup to expand the Python interface for the API.
Fixes https://github.com/pytorch/pytorch/issues/49969.
wanchaol
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50326
Reviewed By: pbelevich
Differential Revision: D26096234
Pulled By: gmagogsfm
fbshipit-source-id: 5425772002eb4deb3830ed51eaa3964f22505840
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51340
**Summary**
`toIValue` assumes that any value passed for an argument of type
`torch.device` is a valid device object, even when it is not. This can
lead to device type arguments of functions being assigned incorrect
values (see #51098).
This commit adds an explicit check that the passed in object is indeed a
`torch.device` using `THPDevice_Check` and only then does is it
converted to an `IValue`. Since implicit conversion from strings to
devices is generally allowed, if `THPDevice_Check` fails, it is assumed
that the object is a string and an `IValue` containing a `c10::Device`
containing the passed in string is returned.
**Test Plan**
This commit adds a unit test to `test_jit.py` to test that invalid
strings passed as devices are not longer silently accepted.
**Fixes**
This commit fixes#51098.
Test Plan: Imported from OSS
Reviewed By: pbelevich
Differential Revision: D26187190
Pulled By: SplitInfinity
fbshipit-source-id: 48c990203431da30f9f09381cbec8218d763325b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50228
`fastmod -m 'expect(<((at|c10)::)?\w+Type>\(\)\s*)->'
'expectRef${1}.'`
Presuming it builds, this is a safe change: the result of `expect()`
wasn't being saved anywhere, so we didn't need it, so we can take a
reference instead of a new `shared_ptr`.
ghstack-source-id: 119782961
Test Plan: CI
Reviewed By: SplitInfinity
Differential Revision: D25837374
fbshipit-source-id: 86757b70b1520e3dbaa141001e7976400cdd3b08
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50255
**Summary**
TorchScript classes are copied attribute-by-attribute from a py::object into
a `jit::Object` in `toIValue`, which is called when copying objects from
Python into TorchScript. However, if an attribute of the class cannot be
converted, the error thrown is a standard pybind error that is hard to
act on.
This commit adds code to `toIValue` to convert each attribute to an
`IValue` inside a try-catch block, throwing a `cast_error` containing
the name of the attribute and the target type if the conversion fails.
**Test Plan**
This commit adds a unit test to `test_class_type.py`
based on the code in the issue that commit fixes.
**Fixes**
This commit fixes#46341.
Test Plan: Imported from OSS
Reviewed By: pbelevich, tugsbayasgalan
Differential Revision: D25854183
Pulled By: SplitInfinity
fbshipit-source-id: 69d6e49cce9144af4236b8639d8010a20b7030c0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48840
The CUDAFuture class needs to inspect the values it contains in order to extract its tensors (in fact, the DataPtrs backing those). These are needed first to determine what CUDA devices back those tensors, so that an event for each such device can be recorded; and later to record these DataPtrs with the CUDA caching allocator if they are used in other streams.
This became complicated when Python was added to the mix, because to inspect a Python object we need to acquire the GIL, but we couldn't do so from code that was supposed to also work in C++-only mode. The solution was for users to provide a custom way to extract DataPtrs, so that the PythonFutureWrapper could install such a custom Python-aware one. This was the DataPtr extractor.
In https://github.com/pytorch/pytorch/pull/48502 a different suggestion was proposed. At its root, it consists in adding support for IValues of type PyObject to the visit() and getSubValues() methods. In order to deal with the GIL, we do this through a virtual method: PyObjectHolder, which is the base class, is available also in C++-only mode, and thus defines this method but leaves it unimplemented; ConcretePyObjectHolder, which is the subclass, is only included in Python mode, and thus it can implement that method, acquire the GIL, and do what it's supposed to.
In my opinion, this approach is just brilliant! Thank wanchaol for proposing it! It hides the complexity of dealing with Python inside getSubValues(), where it can be done properly, thus simplifying enormously the CUDAFuture and the PythonFutureWrapper classes.
ghstack-source-id: 118704935
Test Plan: Unit tests
Reviewed By: wanchaol
Differential Revision: D25334355
fbshipit-source-id: 3f1d3bf6e6e8505a114c877fb9a6fcc3f68d91d3