We want to make TorchRec sharded models TorchScriptable.
TorchRec sharded models use the generic types Awaitable[W] and LazyAwaitable[W] (https://github.com/pytorch/torchrec/blob/main/torchrec/distributed/types.py#L212).
In a sharded model those types are used in place of the contained type W, holding an initialization function that produces an object of type W.
When the first attribute of W is requested, `LazyAwaitable[W]` calls its initialization function (on the same stack), caches the result inside, and from then on works transparently as an object of W. So we can think of it as delayed object initialization.
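For intuition, a minimal eager-mode sketch of this pattern (a hypothetical simplification, not TorchRec's actual implementation):
```
import torch

class LazyAwaitable:
    """Minimal sketch: defer init_fn until the wrapped object is first used."""

    def __init__(self, init_fn):
        self._init_fn = init_fn
        self._obj = None

    def __getattr__(self, name):
        # Only reached for attributes not found on the wrapper itself;
        # materialize W on first use, then delegate transparently.
        if self._obj is None:
            self._obj = self._init_fn()
        return getattr(self._obj, name)

t = LazyAwaitable(lambda: torch.eye(2))
print(t.shape)  # first attribute access triggers initialization
```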
To support this behavior in TorchScript, we propose a new TorchScript type: `Await`.
In eager mode it works the same way as `LazyAwaitable[W]` in TorchRec: it is dynamically typed, acting as type `W` while it is `Await[W]`.
Within TorchScript it remains `Await[W]` and can only be converted to W explicitly, using the special function `torch.jit._awaitable_wait(aw)`.
Creation of an `Await[W]` is done via another special function, `torch.jit._awaitable(func, *args)`.
The semantics are close to `torch.jit.Future` with fork/wait, and the same JIT mechanics are used (inlined fork closures), with the difference that the function is not started in parallel on creation. It is only stored as a lambda inside the IValue and is called on the same thread when `torch.jit._awaitable_wait` is called.
For example (more examples in `test/jit/test_await.py` in this PR):
```
def delayed(z: int) -> int:
    return z * 3

@torch.jit.script
def fn(x: Tensor):
    aw: Await[int] = torch.jit._awaitable(delayed, 99)
    a = torch.eye(2)
    b = torch.jit._awaitable_wait(aw)
    return a + b + x
```
Function semantics:
`_awaitable(func: Callable[..., W], *args, **kwargs) -> Await[W]`
Creates an Await object that owns args and kwargs. Once `_awaitable_wait` is called, it executes func and owns the result of the function. Subsequent `_awaitable_wait` calls return that result from the first function call.
`_awaitable_wait(Await[W]) -> W`
Returns the cached result of type W if this is not the first `_awaitable_wait` call on this Await object; otherwise executes the stored function.
`_awaitable_nowait(W) -> Await[W]`
Creates a trivial `Await[W]` wrapper around the specified object, to be type compliant in corner cases.
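For illustration, a minimal sketch of the `_awaitable_nowait` corner case, where a function must return `Await[Tensor]` but the value is already computed (the `Await` import path follows the PR's tests; treat it as an assumption):
```
import torch
from torch import Tensor
from torch._awaits import _Await as Await

@torch.jit.script
def to_await(x: Tensor) -> Await[Tensor]:
    # The value is already available; wrap it so the return type is
    # Await[Tensor] without deferring any work.
    return torch.jit._awaitable_nowait(x * 2)
```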
Differential Revision: [D42502706](https://our.internmc.facebook.com/intern/diff/D42502706)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90863
Approved by: https://github.com/davidberard98
Not only is this change usually shorter and more readable, it can also yield better performance: `size()` is not always a constant-time operation (for example, on linked lists), but `empty()` always is.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93236
Approved by: https://github.com/malfet
Summary:
// A non-owning pointer to a type. When a class gets inserted as a constant
// into a graph, if we used a strong pointer we would have a circular reference
// from Object -> CompilationUnit and CompilationUnit -> Graph (which owns the
// Constant Object)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65442
Reviewed By: ezyang
Differential Revision: D31101962
Pulled By: eellison
fbshipit-source-id: f1c1cfbe5a8d16a832cad7ba46e2a57a98670083
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63414
Fixes a misuse of a raw pointer here: the stack is never nullable.
ghstack-source-id: 136938318
Test Plan:
compiles.
Imported from OSS
Reviewed By: ejguan
Differential Revision: D30375410
fbshipit-source-id: 9d65b620bb76d90d886c800f54308520095d58ee
Summary:
This drops the `cppcoreguidelines-avoid-non-const-global-variables` check, as the GoogleTest `TEST` macro is non-compliant with it, as is `DEFINE_DISPATCH`.
All changes but the ones to `.clang-tidy` are generated using following script:
```
for i in `find . -type f -iname "*.c*" -or -iname "*.h"|xargs grep cppcoreguidelines-avoid-non-const-global-variables|cut -f1 -d:|sort|uniq`; do sed -i "/\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)/d" $i; done
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62008
Reviewed By: driazati, r-barnes
Differential Revision: D29838584
Pulled By: malfet
fbshipit-source-id: 1b2f8602c945bd4ce50a9bfdd204755556e31d13
Summary:
* Minor: spelling, grammar.
* Add calls to `GRAPH_DUMP()` where they were missing.
* Add or expand a few comments.
* Move a few comments to seemingly more appropriate spots.
* In canonicalize_graph_fuser_ops.cpp inline `runnableInputs()` since it
was only called in one place and had a misleading comment and
confusing name.
* In `PeepholeOptimizeImpl::optimizeBlock()`, set `changed = true;` when
removing `aten::is_complex`. Pretty sure its absence was a bug.
* Delete unused `_jit_pass_remove_inplace_ops` and its
implementation `RemoveInplaceOps()`.
* In `preprocessCaffe2Ops()`, remove redundant check for nested optional
types. It was already checked in `checkONNXCompatibility()`.
* In `EncoderBase::AddAttribute`, log the unexpected attribute kind.
I don't remember the repro case now but I did hit this error at some
point and this additional logging made it easier to understand.
* In `fuseConvBatchNorm()` in eval_peephole.cpp, consistently use
camelCase instead of snake_case for local variables.
* Add curly braces around the bodies of if and loops.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60390
Reviewed By: Krovatkin
Differential Revision: D29523283
Pulled By: SplitInfinity
fbshipit-source-id: 4e16c5648616f53da07d68dab7fdf252e06a0752
Summary:
This is an automatic change generated by the following script:
```
#!/usr/bin/env python3
from subprocess import check_output, check_call
import os

def get_compiled_files_list():
    import json
    with open("build/compile_commands.json") as f:
        data = json.load(f)
    files = [os.path.relpath(node['file']) for node in data]
    for idx, fname in enumerate(files):
        if fname.startswith('build/') and fname.endswith('.DEFAULT.cpp'):
            files[idx] = fname[len('build/'):-len('.DEFAULT.cpp')]
    return files

def run_clang_tidy(fname):
    check_call(["python3", "tools/clang_tidy.py", "-c", "build", "-x", fname, "-s"])
    changes = check_output(["git", "ls-files", "-m"])
    if len(changes) == 0:
        return
    check_call(["git", "commit", "--all", "-m", f"NOLINT stubs for {fname}"])

def main():
    git_files = check_output(["git", "ls-files"]).decode("ascii").split("\n")
    compiled_files = get_compiled_files_list()
    for idx, fname in enumerate(git_files):
        if fname not in compiled_files:
            continue
        if fname.startswith("caffe2/contrib/aten/"):
            continue
        print(f"[{idx}/{len(git_files)}] Processing {fname}")
        run_clang_tidy(fname)

if __name__ == "__main__":
    main()
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56892
Reviewed By: H-Huang
Differential Revision: D27991944
Pulled By: malfet
fbshipit-source-id: 5415e1eb2c1b34319a4f03024bfaa087007d7179
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54640
If we are running constant propagation on a graph that has no operators with constant inputs and non-mutable inputs/outputs, we do not need to initialize an alias db. This is going to be used to speed up symbolic shape analysis.
Test Plan: Imported from OSS
Reviewed By: nikithamalgifb
Differential Revision: D27340863
Pulled By: eellison
fbshipit-source-id: 087b2a33b42c58fa5dae405d652b056d0f1d72e7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54110
dictConstruct doesn't need to make its caller have a `shared_ptr<DictType>`. It also doesn't need to do extra `shared_ptr` copies into the `key_type` and `value_type` locals.
ghstack-source-id: 124150642
Test Plan: fitsships
Reviewed By: ezyang
Differential Revision: D27101782
fbshipit-source-id: 3c632ad9d8f1bd7bdf37f517a86aca27bd41548a
Summary:
Pull Request resolved: https://github.com/pytorch/glow/pull/5062
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45556
User-defined classes can be used as constants. This is useful when freezing and removing the module from the graph, as sketched below.
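A hedged illustration of the pattern (names are invented; assumes `torch.jit.freeze`, which can fold module attributes such as a scripted class instance into the graph as constants):
```
import torch
import torch.nn as nn

@torch.jit.script
class Scale:
    def __init__(self, s: float):
        self.s = s

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.cfg = Scale(2.0)  # user-defined (scripted) class instance

    def forward(self, x):
        return x * self.cfg.s

# Freezing can treat self.cfg as a constant and drop the attribute.
frozen = torch.jit.freeze(torch.jit.script(M()).eval())
```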
Test Plan: waitforsadcastle
Reviewed By: eellison
Differential Revision: D23994974
fbshipit-source-id: 5b4a5c91158aa7f22df39d71f2658afce1d29317
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46686
I was trying to page this code back in after a while and some things
stuck out as unnecessarily confusing.
1. Improve documentation of closures and fork stuff to be more accurate
to how we use them today.
2. Change `prim::LocalVariableScope` to `prim::ListComprehension`. It is
only ever used for list comprehensions, and in general the nodes
emitted by `ir_emitter` should correspond to concrete operations or
language features rather than semantic constraints.
3. Change the somewhat mysterious "inputs" and "attributes" argument
names throughout the codebase to be the more obvious "args" and "kwargs"
that they generally represent (I think "inputs" and "attributes" come
from the AST naming).
Test Plan: Imported from OSS
Reviewed By: navahgar, jamesr66a
Differential Revision: D24464197
Pulled By: suo
fbshipit-source-id: 1f4b1475b58b5690a0b204e705caceff969533b4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43633
In the backward graph, _grad_sum_to_size is inserted whenever a possibly broadcasting op is called:
`"aten::_grad_sum_to_size(Tensor(a) self, int[]? size) -> Tensor(a)"`
If a broadcast occurred, a sum is called; otherwise the second input is None and it is a no-op. Most of the time it's a no-op (in the fast RNNs benchmark, more than 90% of the time).
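In Python terms, the op behaves roughly like the following sketch (for intuition only; the real implementation is the aten kernel):
```
from typing import List, Optional

from torch import Tensor

def grad_sum_to_size(self: Tensor, size: Optional[List[int]]) -> Tensor:
    # If no broadcast occurred in the forward pass, `size` is None and
    # this is a no-op; otherwise, sum the gradient back down to `size`.
    if size is None:
        return self
    return self.sum_to_size(size)
```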
We can get rid of this op by profiling the optionality of the second input. I added `prim::profile_optional` to do this, which counts the number of times it saw a None value and the number of times it saw a value present. When specializing the backward graph, we insert checks for values we profiled as None, and in the optimized block can remove the grad_sum_to_size calls that use those values.
In the future we may revisit this when NNC supports reductions and we want to replace grad_sum_to_size with sums as well, but I think this is worth landing now.
Test Plan: Imported from OSS
Reviewed By: bwasti, ZolotukhinM
Differential Revision: D23358809
Pulled By: eellison
fbshipit-source-id: a30a148ca581370789d57ba082d23cbf7ef2cd4d
Summary:
Raise and assert used to have a hard-coded error message "Exception"; the user-provided error message was ignored. This PR adds support for representing the user's error message in TorchScript.
This breaks backward compatibility because now we actually need to script the user's error message, which can potentially contain unscriptable expressions. Such programs can break when scripted, but saved models will still continue to work.
Increased an op count in test_mobile_optimizer.py because we now need aten::format to form the actual exception message.
This is built upon a WIP PR: https://github.com/pytorch/pytorch/pull/34112 by driazati
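A small illustrative snippet (not from the PR) of the kind of message that is now preserved:
```
import torch

@torch.jit.script
def check_positive(x: int) -> int:
    if x <= 0:
        # The user-provided message is scripted; str.format lowers to
        # aten::format and is evaluated at raise time.
        raise ValueError("expected a positive value, got {}".format(x))
    return x
```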
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41907
Reviewed By: ngimel
Differential Revision: D22778301
Pulled By: gmagogsfm
fbshipit-source-id: 2b94f0db4ae9fe70c4cd03f4048e519ea96323ad
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37034
c10 takes a Stack* in boxed functions while JIT took Stack&.
c10 doesn't return anything while JIT returns an int which is always zero.
This changes JIT to follow the c10 behavior.
ghstack-source-id: 106834069
Test Plan: unit tests
Differential Revision: D20567950
fbshipit-source-id: 1a7aea291023afc52ae706957e9a5ca576fbb53b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39497
Previously, we didn't consider side effects at all when moving nodes in alias analysis. It is never valid to reorder a node with a side effect. This has led to bugs when used with bailouts.
Unfortunately this might cause regressions, but the prior behavior wasn't correct :/
Test Plan: Imported from OSS
Differential Revision: D21963774
Pulled By: eellison
fbshipit-source-id: 656995d1b82534eca65437ed4e397b2bf08a4dec
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35474
I had previously tried to optimize getMutableTypePtr calls by not recursing through container types, but it turns out there are a few uses of container types which refine their contained elements.
That attempt was in #35301.
Now I am optimizing calls by caching TypePtr -> mutable TypePtr conversions. Now that we are doing caching, none of the functions marked as const are really const anymore; previously, many of the const functions actually mutated internal state, such as rebuildWriteCache.
One kind of annoying thing is that there is a general API for querying mutability, isMutableType, that doesn't use the cache, and an internal one that does, isMutableTypeInternal. It would be nice if I could call isMutableType within alias analysis and have it dispatch to the internal function, but I'm not sure how to do that.
getMutableTypePtr showed up as 12% of the first run of FairSeq, so this is a function worth optimizing.
Test Plan: Imported from OSS
Differential Revision: D20873493
Pulled By: eellison
fbshipit-source-id: 1b42bb58ba4142c118a6bc47a26978cd7fd0ac79
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35115
This commit runs the newly added tools/clang_format.py on the JIT
codebase and includes all of the formatting changes thus produced.
Testing:
Ran the script, CI.
Test Plan: Imported from OSS
Reviewed By: eellison
Differential Revision: D20568523
Pulled By: SplitInfinity
fbshipit-source-id: e09bdb982ccf090eecfb7c7b461b8d0681eef82b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33265
This removes the need for isinstance to keep track of list and tuple
separately by introducing AnyListType and AnyTupleType into the JIT
type system to be the common supertype of any lists or tuples.
This allows us to remove the weird flags from the interpreter for
the isinstance operator.
Test Plan: Imported from OSS
Differential Revision: D19883933
Pulled By: zdevito
fbshipit-source-id: f998041b42d8b4554c5b99f4d95d1d42553c4d81
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32889
Common primitive ops that have special inputs make it very hard to
serialize the bytecode for mobile because information about how the
op behaves is hidden in the Node*. This changes how we handle the following
ops so that they are encoded as their own interpreter bytecodes.
```
USES NODE: prim::TupleUnpack(...) -> (...)
USES NODE: prim::TupleSlice(...) -> (...)
USES NODE: prim::TupleConstruct(...) -> (...)
USES NODE: prim::ListUnpack(...) -> (...)
USES NODE: prim::ListConstruct(...) -> (...)
USES NODE: prim::DictConstruct(...) -> (...)
USES NODE: prim::Constant() -> (...)
USES NODE: prim::isinstance(...) -> (...)
USES NODE: prim::CreateObject(...) -> (...)
USES NODE: prim::fork(...) -> (...)
USES NODE: aten::warn(str message, *, int stacklevel=2) -> () # need stack level information, so ideally in interpreter so it can look at the stack
```
This leaves a state where the _only_ remaining Node*-consuming builtins
are things that are only introduced during JIT optimization and will
not appear in mobile code.
Serialization of bytecode can now be made to directly write the CodeImpl
object without modification.
Test Plan: Imported from OSS
Differential Revision: D19673157
Pulled By: zdevito
fbshipit-source-id: 7b8c633d38a4c783b250fbdb222705e71a83ad26
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32804
Constants are interpreter primitives so the op was not actually used.
This cleans up some of the logic around it.
This also fixes constant prop such that failures to look up an op
do not silently stop constant propagation. Instead, only errors
inside the op implementation itself will do this.
Test Plan: Imported from OSS
Differential Revision: D19673156
Pulled By: zdevito
fbshipit-source-id: 7beee59a6a67a6c2f8261d86bd505280fefa999e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32682
This moves code around so that operator.h/cpp no longer requires a full
definition of Node* nor does it include alias analysis or the pretty printer.
This should make it possible to include in the mobile build.
Functionality for checking if operators match Node and to look up
and operator for a Node have moved to the Node object.
Test Plan: Imported from OSS
Differential Revision: D19615386
Pulled By: zdevito
fbshipit-source-id: e38bdf29971183597ef940d061c06ba56e71d9c5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31840
The next PR in this stack makes tuples insertable as constants, so we can remove special handling of tuples in constant propagation.
Test Plan: Imported from OSS
Differential Revision: D19439515
Pulled By: eellison
fbshipit-source-id: c58f153157f1d4eee4c1242decc4f36e41c1aa05
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30544
Run constant propagation upon compilation only on ops with non-aliasing inputs and outputs. This speeds up the first run of `torchvision.models.resnet18` by over 50% and speeds up compilation by about 25% (although the effects didn't seem additive with https://github.com/pytorch/pytorch/pull/30503, so I'm going to land this PR first and then see if caching still has a sizable impact).
Running constant prop only with non-aliasing types does a lot of graph cleanup by removing constant ifs and a bunch of other smaller ops. It also avoids all the jitter problems we had when we tried running full constant prop previously. Because it is idempotent it doesn't jitter, and it doesn't jitter graphs constructed from tracing because tracing doesn't emit any ops that only involve non-aliasing inputs.
Full constant prop isn't idempotent because which ops get run depends on the state of mutation in the alias db, which will often change upon successive iterations of constant propagation, and because it affects graphs constructed from tracing.
Edit: if we were okay with running constant propagation on graphs constructed from tracing (potentially making them hard to debug), an alternative would be to run constant propagation until the graph reaches a fixed point.
Test Plan: Imported from OSS
Differential Revision: D18833607
Pulled By: eellison
fbshipit-source-id: 92a0adb4882d67ed5a0db5c279f5e122aeeba54a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28255
Add support for treating Sequentials, ModuleLists, and ModuleDicts as iterables.
As previously, when emitting a for loop over a Module Container we unroll the for loop over all elements. We require that any Sugared Value in an iterable with a Module Container have a statically determinable length.
Otherwise, if you zipped over a list of varying length and an nn.Sequential that alternated between returning a Tensor and a Dictionary, the output type would change based on the length of the list.
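A small usage sketch of what this enables (illustrative module; the scripted loop is unrolled over the container's elements):
```
import torch
import torch.nn as nn

class Stack(nn.Module):
    def __init__(self):
        super().__init__()
        # len() of the container is statically determinable, so the
        # scripted for-loop below can be unrolled over all elements.
        self.layers = nn.ModuleList([nn.Linear(4, 4) for _ in range(3)])

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

scripted = torch.jit.script(Stack())
```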
Fixes #17179, https://github.com/pytorch/pytorch/issues/27401, and https://github.com/pytorch/pytorch/issues/27506.
Test Plan: Imported from OSS
Reviewed By: ZolotukhinM
Differential Revision: D18278124
Pulled By: eellison
fbshipit-source-id: aca336a5b8da89c756b1f0884883649510cbde3c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25361
Previously we had a different None object for each type T so that
unwrap optional could still recover the type T from it. After a few
months of having this conversion behavior, it has become clear that
only the unwrap optional operators cause this problem. Furthermore, it
is beneficial to have NoneType <: Optional[T] because this is how IValues
work (in particular the None IValue is not tagged). This patch makes the
necessary changes to do this. In particular, it special-cases unwrap optional
in export so that it annotates the None to make sure we can recover the type.
This also changes how matching and evaluating type values work so that we
can consider None matchable to type Optional[T], even though we cannot
derive T from that match.
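In user-facing terms, the subtyping means a bare None can flow anywhere an Optional[T] is expected; a minimal illustration (not from the patch):
```
from typing import Optional

import torch

@torch.jit.script
def first_or_default(x: Optional[int]) -> int:
    # None is matchable to Optional[int]; the refinement below lets the
    # compiler treat x as int after the None check.
    if x is None:
        return 0
    return x

print(first_or_default(None), first_or_default(5))  # 0 5
```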
Test Plan: Imported from OSS
Differential Revision: D17103072
Pulled By: zdevito
fbshipit-source-id: 37678ed3e5ce54f2eb3ee4dff2734a39f0bee028
Summary:
Don't throw in constant propagation, since the op we're running may not be reached. Previously we would only catch `c10::Error`; however, it's hard to ensure that the entire codebase doesn't throw any other types of errors, and some errors map nicely to Python errors, like `std::index_error` to IndexError.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25270
Differential Revision: D17102545
Pulled By: eellison
fbshipit-source-id: 9fd485821743ad882e5c6fc912ca47b0b001b0e9
Summary:
Replaces https://github.com/pytorch/pytorch/pull/21501 because ghimport had errors, which I couldn't figure out, when I tried to import the stack :'(
This has the two commits that were previously accepted, plus the merge commit.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22561
Differential Revision: D16135743
Pulled By: eellison
fbshipit-source-id: f0a98842ccb334c7ceab04d1437e09dc76be0eb1
Summary:
Create an uninitialized IValue. This will be needed for Breaks & Continues to match up if-block outputs for values that are guaranteed not to be used but need to escape the block scope. It is not exposed to users.
This was previously part of final returns, but I was asked to make a separate PR for it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21387
Differential Revision: D15745124
Pulled By: eellison
fbshipit-source-id: ae6a6f766b4a70a71b9033987a630cfbf044e296