As part of this work, a new `AutocastIPU` dispatch key has been added.
There's an existing PR, #85043, to make `Autocast` a proper per-backend functionality key, but it ran into issues with layering with other functionality keys and went stale.
This has been tested in the out-of-tree IPU PyTorch backend.
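For illustration, a hedged sketch of how this surfaces to users (this assumes the out-of-tree Graphcore IPU backend is installed and that `torch.autocast` accepts `device_type="ipu"` in your build; it is a sketch, not a verified snippet):
```python
import torch

# With the AutocastIPU dispatch key registered, autocast regions can be
# entered for the IPU device much like for CPU/CUDA. Requires the
# out-of-tree IPU backend; "ipu" device strings are otherwise unavailable.
model = torch.nn.Linear(8, 8).to("ipu")
x = torch.randn(4, 8).to("ipu")

with torch.autocast(device_type="ipu", dtype=torch.half):
    y = model(x)
```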
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103890
Approved by: https://github.com/albanD
We want to make TorchRec sharded models TorchScriptable.
TorchRec sharded models use the generic types `Awaitable[W]` and `LazyAwaitable[W]` (https://github.com/pytorch/torchrec/blob/main/torchrec/distributed/types.py#L212).
In a sharded model these types are used in place of the contained type `W`, holding an initialization function that produces an object of type `W`.
The moment the first attribute of `W` is requested, `LazyAwaitable[W]` calls its initialization function (on the same stack), caches the result internally, and from then on works transparently as an object of `W`. We can think of it as delayed object initialization.
To support this behavior in TorchScript, we propose a new TorchScript type: `Await`.
In eager mode it works the same as `LazyAwaitable[W]` in TorchRec: it is dynamically typed and acts as a `W` while being an `Await[W]`.
Within TorchScript it is an `Await[W]` and can only be explicitly converted to `W` using the special function `torch.jit._awaitable_wait(aw)`.
An `Await[W]` is created via another special function, `torch.jit._awaitable(func, *args)`.
The semantics are close to `torch.jit.Future`, fork, and wait, and use the same JIT mechanics (inline fork closures), with the difference that the function is not started in parallel on fork. It is only stored as a lambda inside the IValue and is called on the same thread when `torch.jit._awaitable_wait` is called.
For example (more examples in `test/jit/test_await.py` in this PR):
```
def delayed(z: int) -> int:
    return z * 3

@torch.jit.script
def fn(x: Tensor):
    aw: Await[int] = torch.jit._awaitable(delayed, 99)
    a = torch.eye(2)
    b = torch.jit._awaitable_wait(aw)
    return a + b + x
```
Function semantics:
`_awaitable(func: Callable[..., W], *args, **kwargs) -> Await[W]`
Creates an `Await` object that owns `args` and `kwargs`. When `_awaitable_wait` is called, `func` is executed and the `Await` object owns its result. Subsequent `_awaitable_wait` calls return the result cached from that first call.
`_awaitable_wait(Await[W]) -> W`
Returns the cached result if this is not the first `_awaitable_wait` call on this `Await` object; otherwise calls the stored function.
`_awaitable_nowait(W) -> Await[W]`
Creates a trivial `Await[W]` wrapper around the specified object, to stay type compliant in corner cases.
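For completeness, a minimal sketch of `_awaitable_nowait` (a hedged example; anything beyond the function names described above, such as the exact scripting behavior, is an assumption):
```python
import torch
from torch import Tensor

@torch.jit.script
def use_nowait(x: Tensor) -> Tensor:
    # Wrap an already-computed value so downstream code can treat it
    # uniformly as an Await; waiting on it simply returns the value.
    aw = torch.jit._awaitable_nowait(x * 2)
    return torch.jit._awaitable_wait(aw) + 1
```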
Differential Revision: [D42502706](https://our.internmc.facebook.com/intern/diff/D42502706)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90863
Approved by: https://github.com/davidberard98
This refactor was prompted by challenges handling mixed int/float
operations in C++. A previous version of this patch
added overloads for each permutation of int/float and was unwieldy
(https://github.com/pytorch/pytorch/pull/87722/). This PR takes a different
approach.
The general outline of the patch is to combine the C++ types SymIntNode
and SymFloatNode into a single type, SymNode. This type is erased; we
no longer know statically in C++ whether we have an int or a float and have to test
it with the is_int()/is_float() virtual methods. This has a number of
knock-on effects.
- We no longer have C++ classes to bind to Python. Instead, we take an
entirely new approach to our Python API, where we have a SymInt/SymFloat
class defined entirely in Python, which hold a SymNode (which corresponds
to the C++ SymNode). However, SymNode is not pybind11-bound; instead,
it lives as-is in Python, and is wrapped into C++ SymNode using PythonSymNode
when it goes into C++. This implies a userland rename.
In principle, it is also possible for the canonical implementation of SymNode
to be written in C++, and then bound to Python with pybind11 (we have
this code, although it is commented out.) However, I did not implement
this as we currently have no C++ implementations of SymNode.
Because we do return SymInt/SymFloat from C++ bindings, the C++ binding
code needs to know how to find these classes. Currently, this is done
just by manually importing torch and getting the attributes.
- Because SymInt/SymFloat are lightweight Python wrappers, __sym_dispatch__ now
takes SymInt/SymFloat, rather than SymNode, bringing it in line with how
__torch_dispatch__ works.
Some miscellaneous improvements:
- SymInt now has a constructor that takes SymNode. Note that this
constructor is ambiguous if you pass in a subclass of SymNode,
so an explicit downcast is necessary. This means toSymFloat/toSymInt
are no more. This is a mild optimization as it means rvalue reference
works automatically.
- We uniformly use the caster for c10::SymInt/SymFloat, rather than
going the long way via the SymIntNode/SymFloatNode.
- Removed some unnecessary toSymInt/toSymFloat calls in normalize_*
functions, pretty sure this doesn't do anything.
- guard_int is now a free function, since to guard on an int you cannot
assume the method exists. A function can handle both int and SymInt
inputs.
- We clean up the magic method definition code for SymInt/SymFloat/SymNode.
ONLY the user classes (SymInt/SymFloat) get magic methods; SymNode gets
plain methods; this is to help avoid confusion between the two types (a toy sketch of this layering follows below).
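To make the layering concrete, here is a purely illustrative toy sketch in Python (the `Toy*` class names are hypothetical, not the real implementation): the user-facing wrapper carries the magic methods and delegates to a type-erased node that only exposes plain methods like is_int()/is_float().
```python
class ToyNode:
    """Type-erased node: may carry either an int or a float payload."""

    def __init__(self, payload):
        self.payload = payload

    def is_int(self) -> bool:
        return isinstance(self.payload, int)

    def is_float(self) -> bool:
        return isinstance(self.payload, float)

    # Plain (non-magic) methods only; the node never defines __add__ etc.
    def add(self, other: "ToyNode") -> "ToyNode":
        return ToyNode(self.payload + other.payload)


class ToySymInt:
    """User-facing wrapper: holds a node and exposes the magic methods."""

    def __init__(self, node: ToyNode):
        self.node = node

    def __add__(self, other: "ToySymInt") -> "ToySymInt":
        return ToySymInt(self.node.add(other.node))


result = ToySymInt(ToyNode(3)) + ToySymInt(ToyNode(4))
print(result.node.payload, result.node.is_int())  # 7 True
```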
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
cc @jansel @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87817
Approved by: https://github.com/albanD, https://github.com/anjali411
- Support storing SymFloat in IValue
- Add SymFloat to JIT type system (erases to float)
- Printing support for SymFloat
- add/sub/mul/truediv operator support for SymFloat
- Support truediv on integers; it returns a SymFloat
- Support parsing SymFloat from Python object
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85411
Approved by: https://github.com/albanD
Record stack trace information for each allocated segment in the allocator.
It takes around 1.5us to record 50 stack frames of context.
Since invoking a PyTorch operator takes around 8us, this adds minimal overhead, but we still leave it disabled by default so that we can test it more on real workloads first.
Stack information is kept both for allocated blocks and for the last allocation that used each now-inactive block. We could potentially also keep around the _first_ allocation that caused the block to be allocated from CUDA.
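For context, a hedged usage sketch of how recording might be enabled and inspected (the exact helper names, e.g. `_record_memory_history`, and the snapshot field layout vary across PyTorch versions and are assumptions here):
```python
import torch

# Opt in to recording allocation stack traces; it is disabled by default.
torch.cuda.memory._record_memory_history(True)

x = torch.randn(1024, 1024, device="cuda")

# Snapshot the allocator state; with recording enabled, allocated blocks
# also carry the captured stack frames.
segments = torch.cuda.memory_snapshot()
print(len(segments), "segments recorded")
```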
Potential follow-ups:
* Stack frame entries are small (16 bytes), but the list of frames is not compressed even though most frames will share some entries. So far this doesn't produce huge dumps (7MB for one real workload that uses all memory on the GPU), but it could be made much smaller through compression.
* The code to format the information is slow (a few seconds) because it uses Python and FlameGraph.pl.
* Things allocated during the backward pass have no stack frames because they are run on another C++ thread.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82146
Approved by: https://github.com/albanD
Done via
```
git grep -l 'SymbolicIntNode' | xargs sed -i 's/SymbolicIntNode/SymIntNodeImpl/g'
```
Reasoning for the change:
* Sym is shorter than Symbolic, and consistent with SymInt
* You will usually deal in shared_ptr<...>, so we're reserving the
shorter name (SymIntNode) for the shared pointer.
But I don't want to update the Python name, so afterwards I ran
```
git grep -l _C.SymIntNodeImpl | xargs sed -i 's/_C.SymIntNodeImpl/_C.SymIntNode/'
```
and manually fixed up the binding code.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82350
Approved by: https://github.com/Krovatkin
The `allowlist_for_publicAPI.json` file allows specifying the modules that
are being migrated. However, the exceptions in that file are only
applied to the original entry. This PR changes
`test_correct_module_names` to extend the `allow_dict` with the modules
that are being migrated.
## Example Scenario
Assume there is an "allow list" for some module `torch.foo`:
```json
{
  "torch.foo": [
    "Any",
    "Optional"
  ]
}
```
Assume that the module is also being migrated to `torch.bar`, with
a `*` import in the original location (such as `from torch.bar import *`):
```json
{
  "being_migrated": {
    "torch.foo": "torch.bar"
  },
  "torch.foo": [
    "Any",
    "Optional"
  ],
  "torch.bar": [
    "Any",
    "Optional"
  ]
}
```
In that case, both `torch.foo` and `torch.bar` must have the same list
of exceptions. One way to achieve this is to require developers to add a
new allow list entry to the JSON file for every migration. Instead,
this PR creates the duplicate entry automatically, so exceptions apply to both
`torch.foo` and `torch.bar` (a sketch of that logic follows the example below).
With this PR, we don't need to modify anything beyond the `being_migrated` entry:
```json
{
  "being_migrated": {
    "torch.foo": "torch.bar"
  },
  "torch.foo": [
    "Any",
    "Optional"
  ]
}
```
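For illustration, a hedged sketch of the duplication logic described above (the function and variable names are illustrative; the real change lives in `test_correct_module_names` and may differ):
```python
def extend_allow_dict(allow_dict: dict) -> dict:
    """Mirror allow-list exceptions from modules that are being migrated
    onto their new locations, so both old and new paths are covered."""
    for old_mod, new_mod in allow_dict.get("being_migrated", {}).items():
        if old_mod in allow_dict and new_mod not in allow_dict:
            allow_dict[new_mod] = list(allow_dict[old_mod])
    return allow_dict


# Example: exceptions for torch.foo are duplicated for torch.bar.
allow_dict = {
    "being_migrated": {"torch.foo": "torch.bar"},
    "torch.foo": ["Any", "Optional"],
}
print(extend_allow_dict(allow_dict)["torch.bar"])  # ['Any', 'Optional']
```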
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82022
Approved by: https://github.com/albanD
- This PR adds the nondeterministic tag to tags.yaml to mark functions that may not return the same outputs when run with identical inputs.
- The tag is added to the functions in native_functions.yaml that are specified as nondeterministic by aliasdb in https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/ir/ir.cpp#L1146
- **There may therefore be ops that are nondeterministic but do not yet carry the nondeterministic tag. The plan is to create a test bench to determine which ops in native_functions.yaml are nondeterministic and add the tag to qualifying functions in a later PR.**
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81440
Approved by: https://github.com/anjali411
If a module is being migrated, a common practice is to temporarily support
the old location. That can break the assertion that the `__module__`
of a function points to the same location where the function is defined.
## Example
1. Assume there is `torch/nn/quantized/functional.py`
2. The file is copied to `torch/ao/nn/quantized/functional.py`
3. The old location is changed to contain `from torch.ao.nn.quantized.functional import *`
In such a situation, importing from the old location yields a `__module__`
pointing to the new `torch/ao/nn/...` location, which breaks the
current test.
## What changed
This PR adds the following:
1. Added a key `"being_migrated"` to the `allowlist_for_publicAPI.json`
2. Added a check in the `test_public_bindings.py` to check if the JSON file has the `"being_migrated"` key.
## How to add migration entries
1. Add an entry to `"being_migrated"`. For the example above, add `"torch.nn.quantized.functional": "torch.ao.nn.quantized.functional"`.
2. Rename any existing keys for the old location. For example, if there is an existing entry `"torch.nn.quantized.functional": [...]` outside of `"being_migrated"`, change it to `"torch.ao.nn.quantized.functional": [...]` (see the example below).
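For example, a hedged sketch of the resulting JSON (the exception list contents here are placeholders, not the actual entries for this module):
```json
{
  "being_migrated": {
    "torch.nn.quantized.functional": "torch.ao.nn.quantized.functional"
  },
  "torch.ao.nn.quantized.functional": [
    "Any",
    "Optional"
  ]
}
```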
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81314
Approved by: https://github.com/anjali411
This PR adds support for `SymInt`s in Python. Namely,
* `THPVariable_size` now returns `sym_sizes()`
* the Python arg parser is modified to parse PyObjects into ints and `SymbolicIntNode`s
* pybind11 bindings for `SymbolicIntNode` are added, so size expressions can be traced
* a large number of tests are added to demonstrate how to implement Python symints.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78135
Approved by: https://github.com/ezyang
Following feedback, this improves the error messages produced by the public API bindings test.
The new message looks like:
```
# torch.nn.intrinsic.modules._FusedModule:
- Is public: it is inside the module's (`torch.nn.intrinsic.modules`) `__all__`
- Does NOT look public: because it starts with `_` (`_FusedModule`)
- You can do either of these two things to fix this problem:
- To make it NOT public: remove it from the modules's (`torch.nn.intrinsic.modules`) `__all__`
- To make it look public: remove the `_` at the beginning of the name
# torch.ao.nn.sparse.quantized.dynamic.linear.LinearBlockSparsePattern:
- Is public: it is an attribute that does not start with `_` on a module that does not have `__all__` defined
- Does NOT look public: because its `__module__` attribute (`torch.ao.nn.sparse.quantized.utils`) is not within the torch library or does not start with the submodule where it is defined (`torch.ao.nn.sparse.quantized.dynamic.linear`)
- You can do either of these two things to fix this problem:
- To make it NOT public: either define a `__all__` for `torch.ao.nn.sparse.quantized.dynamic.linear` or add a `_` at the beginning of the name
- To make it look public: make sure the `__module__` is properly set and points to a submodule of `torch.ao.nn.sparse.quantized.dynamic.linear`
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76261
Approved by: https://github.com/NivekT