Summary:
Check whether the codegen'd alias annotations actually track alias creation and writes correctly. This could be made more exhaustive, but it's good enough for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14588
Differential Revision: D13312653
Pulled By: suo
fbshipit-source-id: 98de1610ea86deada71957c75c222fff331a0888
Summary:
This PR:
1. Handle None value attr in the WeakScriptModuleProxy
2. add back module tests that now passing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14715
Differential Revision: D13313573
Pulled By: wanchaol
fbshipit-source-id: a6b7892707350290a6d69b6f6270ad089bfc954b
Summary:
To enable self.training in script modules, this PR automatically adds a buffer called 'training' if a script method requests self.training. Assignment to self.training is overloaded to assign both to the boolean property and the tensor value.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14719
Differential Revision: D13310569
Pulled By: zdevito
fbshipit-source-id: 406387bb602f8ce5794eeff37642863c75928be5
Summary:
[ note: stacked on expect files changes, will unstack once they land ]
This adds DeviceObjType (cannot use DeviceType it is already an enum)
to the type hierarchy and an isDevice/toDevice pair to IValue.
Previous hacks which used an int[] to represent Device are removed
and at::Device is used instead.
Note: the behavior or .to is only a subset of python, we need to
fix the aten op so that it accepts Option[Device] and Optional[ScalarType].
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14666
Reviewed By: suo
Differential Revision: D13290405
Pulled By: zdevito
fbshipit-source-id: 68b4381b292f5418a6a46aaa077f1c902750b134
Summary:
This PR is a part of task to unblock standard library export. Basically we want enable the ability to meta program IF stmt to dynamically emit different branches base on `cond`. This is primarily used to disable certain branch compilation on If, like the below
```python
import torch
class Test(torch.jit.ScriptModule):
def __init__(self, b = None):
self.b = b
def forward(self, input):
x = input
if self.b is not None:
x = self.b(input)
return x
Test()(torch.randn(2, 3))
```
This is also the first step for us to bridge the gap between none simple value and any sugared value in JIT.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14533
Differential Revision: D13310526
Pulled By: wanchaol
fbshipit-source-id: 78d1a8127acda5e44d2a8a88f7627c43d29ff244
Summary:
We align the restore logic to `torch.load`, we try to restore to the right device, and if the device is not available, an exception is raised. We allow user to remap the device through a parameter `map_location`, it can be 1) a string like 'cuda:0`, `cpu`, 2) a device, torch.device('cpu'), 3) a dict, {'cuda:1', 'cuda:0'}, and a function, and its signature looks like string map_location(tensor, saved_device_string).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14454
Reviewed By: zrphercule
Differential Revision: D13271956
Pulled By: houseroad
fbshipit-source-id: dfd6b6049b0dc07549ddeddf2dea03ac53ba6d49
Summary:
Previously symbolic AD formulas assumed that no broadcasting happened,
and would return gradients of incorrect shapes (possibly leading to
silent errors later).
Fixes a few bugs (known and unknown):
- #11736
- ArgumentSpec didn't compute the input types correctly [(it didn't advance the offset for non-tensor args)](https://github.com/pytorch/pytorch/pull/14485/files#diff-4fd3157a056596aefb8cdf41022a208bR153)
- Symbolic AD could suffer from use after free (dangling pointers in grad map), because [`EliminateDeadCode` could have removed nodes](https://github.com/pytorch/pytorch/pull/14485/files#diff-25d33ad1ed6855684dec79d927ca6142L781) that referenced gradients of certain values.
- Undefined behavior in `aten::size`
During my tests I've also found a few new problems, and I have opened issues for them:
- FusionGroup seems to think that cat nodes broadcast their inputs (#14483)
- `prim::ConstantChunk` derivative formula doesn't handle undefined inputs (#14484)
This patch unfortunately deoptimizes some of our code (Fusion doesn't happen past chunk nodes, and outputs more tensors only because we have to get their size). I know how to fix those issues, but wanted to fix this terrible bug quickly.
cc zou3519 zdevito ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14485
Differential Revision: D13280899
Pulled By: soumith
fbshipit-source-id: 80cc5ec9331be80e1bb9ddfe85b81c2b997e0b0c
Summary:
This PR makes DCE a little smarter in the presence of mutable ops. Previously mutable ops could never be cleaned up, now they can be cleaned up if we can prove there are no live uses of any alias sets that the op writes to.
This behavior is optional; if you pass DCE a block instead of a graph, it will do the same thing as before. Also changed `InlineAutographSubgraph` to use the common subgraph utils.
Tested on traced ResNet, and it gets rid of the dead code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14601
Differential Revision: D13309118
Pulled By: suo
fbshipit-source-id: dac2791e7d2ecf219ae717a2759b83c1e927f254
Summary:
This PR is a part of task to unblock standard library export. Basically we want enable the ability to meta program IF stmt to dynamically emit different branches base on `cond`. This is primarily used to disable certain branch compilation on If, like the below
```python
import torch
class Test(torch.jit.ScriptModule):
def __init__(self, b = None):
self.b = b
def forward(self, input):
x = input
if self.b is not None:
x = self.b(input)
return x
Test()(torch.randn(2, 3))
```
This is also the first step for us to bridge the gap between none simple value and any sugared value in JIT.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14533
Differential Revision: D13272203
Pulled By: wanchaol
fbshipit-source-id: 44a545abb766bbd39b762a6e19f9ebaa295e324b
Summary:
Stacked on zip commit because it also changes expect files, read only the last commit.
This reduces the number of ways we can print a Type from 3 (python_str, str, operator<<) to 2.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14657
Differential Revision: D13288912
Pulled By: zdevito
fbshipit-source-id: f8dd610cea798c511c1d4327395bba54b1aa1697
Summary:
After consulting with Owen, who pointed out the existence of the miniz library, I decided to take one last shot at using zip as our container format.
miniz makes this surprisingly feasible and I think the benefits of using zip are large enough that we should do it.
This replaces our custom container format with a zip archive, preserving all of the
desirable features of our custom format, such as append-oriented writing, and
mmap'able tensor data while adding a bunch of debugging advantages:
1. You can unzip and explore the container to debug what is going on with a model.
2. You can edit the model using a text editor (e.g. change the definition of a method,
or editing the json-serialized meta-data), re-zip the file use OSX's native 'Compress'
option, and re-load the result into pytorch. Note: this enables you to, e.g., print-debug
serialized models.
3. We can easily enable features like compression in the future.
4. Stock python , without pytorch installed, and other programming languages
can reasonably consume this format,using json and zipfile packages, which enables
people to build tools like visualizers without those visualizers depending on pytorch.
This will be especially useful if you want to, for instance, write a visualizer in javascript.
Notes:
* This add miniz (https://github.com/richgel999/miniz) as a dependency. miniz is a self-contained
library for reading/writing zipfiles that unlike other zip libraries also includes libz
compatible compress/decompress support. It is a single header and a single C file without
any other dependencies. Note that the instructions for miniz explicitly state:
> Please use the files from the releases page in your projects. Do not use the git checkout directly!
So we have checked in the 'release' source. Miniz supports zip64, and its API is amenable
to doing zip-align style things to align data.
* Removes 'size' from RecordRef. This allows you to edit files in the zip archive without
editing the meta-data file. Very important if you want to print-debug serialized models.
* PyTorchStreamReader/PyTorchStreamWriter keep mostly the same API (though keys become strings)
However, their implementation is completely swapped out to use miniz.
* Code exists to check for the old magic number to give a decent warning to our preview users
after we change the format.
* Container version information is now put in a stand-alone 'version' file in the archive
and serves a similar purpose to the other container version info.
* All files in the zip archive start at 64-byte boundaries, using an approach similar to
zip-align. Tests check that this property remains true. While the writer does this,
the reader doesn't depend on it, allowing user-created archives that can use compression,
and do not have to align data.
* Added test to check for > 4GB files and archives. Disabled by default because it takes
almost 2 minutes to run.
* torchscript files are now optional: if a submodule does not have methods, it will
not be written.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14521
Reviewed By: jamesr66a
Differential Revision: D13252945
Pulled By: zdevito
fbshipit-source-id: 01209294c0f6543d0fd716f85a38532249c52f8c
Summary:
Remove no_grad_embedding_renorm_ from aten. Setting the derivatives of the inputs to false has different semantics from calling with no_grad(), because it will not error if an input is modified and then has it's grad accessed.
Instead, make a custom op, and use NoGradGuard.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14639
Differential Revision: D13285604
Pulled By: eellison
fbshipit-source-id: c7d343fe8f22e369669e92799f167674f124ffe7
Summary:
This moves `new_module_tests` from `test_nn.py` to `common_nn.py` so
that they can be used in `test_jit.py` without running any of
`test_nn.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14578
Differential Revision: D13268286
Pulled By: driazati
fbshipit-source-id: 6e8654a4c29ab754d656ac83820c14d1c1843e03
Summary:
Stacked on https://github.com/pytorch/pytorch/pull/14378, only look at the last commit.
This changes the way methods are defined in TorchScript archives to use
PythonPrint rather than ONNX protobufs.
It also updates torch.proto to directly document the tensor data
structure actually being serialized.
Notes:
* because PythonPrint prints all the methods at once per module, this
removes MethodDef in favor of a single torchscript_area and a separate
caffe2_graphs entry. Note that NetDef's already have method names,
so there is no need or a separate method name entry.
* This switches cpp/pickle area to RecordRef (references to a file in
the container format) since it is possible the data in these arenas
may be large and not suited to json ouput.
* Removes 'annotations' -- annotations should be re-added on the first
commit that actually has a practical use for them. In the current state
it is unlikely they are representing the right information.
* Some expect files have changed because PythonPrint is preserving more
debug name information for parameter names.
* MethodEncoder (the ONNX output format) has been deleted. There is still
some cleanup possible combining EncoderBase and GraphEncode now that there
is only a single pathway using EncoderBase.
* This incorporates the changes from #14397
to define TensorDef
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14400
Reviewed By: suo
Differential Revision: D13231800
Pulled By: zdevito
fbshipit-source-id: af5c1152d0bd6bca8b06c4703f59b161bb19f571
Summary:
To convert `max_unpool` functions to weak script, this PR adds support
for `T` as default arguments for `BroadcastingListN[T]`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14361
Differential Revision: D13192231
Pulled By: driazati
fbshipit-source-id: a25b75a0e88ba3dfa22d6a83775e9778d735e249
Summary:
This PR adds weak modules for all activation modules and uses `test_nn` module tests to test weak modules that have been annotated with `weak_module` and therefore are in `torch._jit_internal._weak_types`
Also depends on #14379
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14238
Differential Revision: D13252887
Pulled By: driazati
fbshipit-source-id: e9638cf74089884a32b8f0f38396cf432c02c988
Summary:
Resubmitting PR #14415
The tests added for Embedding + EmbeddingBag had random numbers as input, which affected the random number generator & caused the flakey test to break.
Everything but the last two commits have already been accepted
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14509
Differential Revision: D13247917
Pulled By: eellison
fbshipit-source-id: ea6963c47f666c07687787e2fa82020cddc6aa15
Summary:
Add support for Embedding and EmbeddingBag in script. Both functions require with torch.no_grad(), which we don't have any plans to support in the near future. To work around this, I added a embedding_renorm function without derivatives.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14415
Reviewed By: wanchaol
Differential Revision: D13219647
Pulled By: eellison
fbshipit-source-id: c90706aa6fbd48686eb10f3efdb65844be7b8717
Summary:
This PR adds weak modules for all activation modules and uses `test_nn` module tests to test weak modules that have been annotated with `weak_module` and therefore are in `torch._jit_internal._weak_types`
Also depends on #14379
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14238
Differential Revision: D13192230
Pulled By: driazati
fbshipit-source-id: 36488960b6c91448b38c0fa65422539a93af8c5e
Summary:
This PR allows to overload functions based on the value of a parameter (so long as it is a constant). See max_pool1d for an example usage.
This is the first step in enabling the use of max_pool functions for the standard library that can return `Tensor` or `Tuple[Tensor, Tensor]` based on the `return_indices` flag. This will give the JIT identical results to the Python versions of the functions.
Fixes#14081
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14425
Differential Revision: D13222104
Pulled By: driazati
fbshipit-source-id: 8cb676b8b13ebcec3262234698edf4a7d7dcbbe1
Summary:
Port AffineGrid to C++, because script does not support compiling Function classes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14392
Differential Revision: D13219698
Pulled By: eellison
fbshipit-source-id: 3ddad8a84c72010b5a6c6f7f9712be614202faa6
Summary:
Stacked on #14176, review only the last commit.
* Print parameters to methods as self.weight rather than as extra inputs.
* Print entire set of methods out as a single string
* Update test code to test the module-at-a-time export/import
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14378
Differential Revision: D13198463
Pulled By: zdevito
fbshipit-source-id: 3fab02e8239cfd6f40d6ab6399047bd02cf0a8c8
Summary:
In #14239 we fixed ONNX_ATEN.
In order to make sure its correctness in the future, we should add related test case.
We use torch.fmod() to test ONNX_ATEN.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14259
Differential Revision: D13204610
Pulled By: zrphercule
fbshipit-source-id: e4660c346e5edd201f1458b7d74d7dfac49b94c7
Summary:
This PR adds a `try_outplace` option to the tracer. When `try_outplace` is true, the tracer will attempt to out-of-place ops (similar to how things are done today). When it's false, the correct in-place op is emitted.
I made `try_outplace` false by default, but flipped it to true for ONNX export utils. zdevito jamesr66a, anywhere else I should preserve the existing behavior?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14254
Reviewed By: eellison
Differential Revision: D13166691
Pulled By: suo
fbshipit-source-id: ce39fdf73ac39811c55100e567466d53108e856b
Summary:
[Stacked commit, only review the last commit]
This PR adds support for printing default values in python printing as well as the logic
for parsing default values back in using the parser. For simplicity, this PR simply
creates a subgraph of the constant expressions and then runs that graph to generate the defaults.
A more lightweight approach should be possible later, but would require more machinery.
To make reading code in the printer easier, this also add ir_views.h.
Similar to tree_views.h these classes can provide views of some commonly used IR nodes
that have complicated structure and common operations on that structure.
Currently it has only read-only views for prim::If and prim::Loop,
but we should eventually add helpers to manipulate If/Loop nodes as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14176
Differential Revision: D13198455
Pulled By: zdevito
fbshipit-source-id: dc99ab9692804ccaedb60a55040c0b89ac7a6a6d
Summary:
This adds scalar type support to the fuser, both internally (instead of auto / assuming float) and for the inputs/outputs.
We can now fuse things with input / output of arbitrary scalar type, in particular comparisons and where work well. So it fixes#13384 by returning the right type tensor (and adds a test where byte and double tensors are returned).
The type inference is done by re-calling PropagateTensorShapeOnNode in the compilation, I would venture that it isn't prohibitively expensive compared to the actual compilation. (Propagation was fixed for where to return the second argument's type and amended to handle FusedConcat.)
I'm not sure how to add a check for the code generated by the fuser, but I am not sure we absolutely need to (we'd see if it is invalid / produces wrong results).
Thanks in particular to apaszke, fmassa, mruberry for advice and encouragement! All the errors are my own.
I have discussed order of PRs briefly with mruberry, if this goes in before he submits the PR, he graciously agreed to rebasing his, but I'd happily rebase, too.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14336
Differential Revision: D13202620
Pulled By: soumith
fbshipit-source-id: 855159e261fa15f21aca3053bfc05fb3f720a8ef
Summary:
This PR allows to overload functions based on the value of a parameter (so long as it is a constant). See `max_pool1d` for an example usage.
This is the first step in enabling the use of `max_pool` functions for the standard library that can return `Tensor` or `Tuple[Tensor, Tensor]` based on the `return_indices` flag. This will give the JIT identical results to the Python versions of the functions.
Depends on #14232 for `Optional[BroadcastingList[T]]`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14081
Differential Revision: D13192228
Pulled By: driazati
fbshipit-source-id: fce33c400c1fd06e59747d98507c5fdcd8d4c113
Summary:
Fixes#12290. Also speeds up JIT LSTM forward pass from 8.8ms to 7.8ms; previously, each JIT lstm cell used 2 fused kernels. Now, it only uses one fused kernel (which is how many kernels cudnn uses).
Explanation:
Let f, g, h be fusible ops.
```
x = f(v, w)
z = g(x, y)
a, b = chunk(z)
c = h(a, b)
```
becomes (before this PR):
```
x = f(v, w)
x', y' = broadcast_tensors([x, y])
ax, bx = chunk(x')
ay, by = chunk(y')
a = g(ax, ay)
b = g(bx, by)
c = h(a, b)
```
The graph fuser then puts g, g, and h into one FusionGroup and is unable
to move `x = f(v, w)` into the FusionGroup.
This PR lets the graph fuser move `x = f(v, w)` into the FusionGroup.
It does this by abstracting the broadcast_tensors + multiple chunk nodes
into one intermediate `prim::BroadcastingChunk[chunks, dim]` node.
A `BroadcastingChunk[chunks, dim](*inputs)` node is equivalent to:
- broadcasting all of *inputs
- chunk-ing each broadcasted input into `chunks` chunks along dim `dim`.
Abstracting the broadcasting chunk behavior away, it is now a lot easier
for the graph fuser to move (broadcast + chunk) past an operation. After
this PR, the above graph becomes:
```
x = f(v, w)
ax, bx, ay, by = BroadcastingChunk(x, y)
a = g(ax, ay)
b = g(bx, by)
c = h(a, b)
```
Now, to move `x = f(v, w)` after the BroadcastingChunk, one just needs
to add f's operands to the BroadcastingChunk:
```
ay, by, av, bv, aw, bw = BroadcastingChunk(y, v, w)
ax = f(av, aw)
by = f(bv, bw)
a = g(ax, ay)
b = g(bx, by)
c = h(a, b)
```
cc apaszke mruberry zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14055
Differential Revision: D13159259
Pulled By: zou3519
fbshipit-source-id: 134e9e645c950384d9be6a06a883a10e17a73d7d
Summary:
Fix a mishandling of `foo[a] = b` when `a` was a tensor. We were assigning to a copy of `foo`, not a view of it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14311
Differential Revision: D13196109
Pulled By: suo
fbshipit-source-id: c929401fda7c4a27622d3fe2b11278b08a7f17f1
Summary:
This handles the input pre-multiplication in RNNs, yielding pretty significant speedups in backward times. This pass depends on loop unrolling, so we'll batch only as many elements as the unrolling factor allows.
cc mruberry ngimel zou3519 zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13456
Differential Revision: D12920339
Pulled By: zou3519
fbshipit-source-id: 5bcd6d259c054a6dea02ae09a9fdf9f030856443
Summary:
1. Support `Optional[BroadcastingList1[int]]` like type annotation to accept a int or a list[int]
2. Convert gumbel_softmax, lp pooling weak functions and modules
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14232
Differential Revision: D13164506
Pulled By: wanchaol
fbshipit-source-id: 6c2a2b9a0613bfe907dbb5934122656ce2b05700
Summary:
export - print a method with python_print
import - import a method with import_method
We want to ensure:
export(g) == export(import(export(g)))
That is after after exporting/importing once, the graph will stay exactly
the same. This is less strict that g == import(export(g)) which would
require us to maintain a lot more information about the structure of the
IR and about the names of debug symbols.
This PR addresses this with the following fixes:
* print out double-precision numbers with high enough precision such
that they always parse in the same way
* when creating loop-carried dependencies, sort them
by variable name, ensuring a consistent order
* parse nan correctly
* DCE: remove unused outputs of if statements, and loop-carried dependencies
in loops that are dead both after the loop and inside the body of the
loop.
* Do not set uniqueName for variables whose names are _[0-9]+, these
are probably rare in user code, and we need a way to communicate
that we do not care about a variable name when re-parsing the graph.
Otherwise temporary variable names will jitter around.
* Expand the definition of a constant in printing code to None,
and family.
* Allow re-treeing to work as long as the only thing in its way is a
constant node. These do not have side effects but are sometimes
inserted in a different order when tracing compared to how we print them.
* Print all constant nodes out first in the order in which they are used_val
(or, if they are inlined, ensure they get assigned CONSTANT.cX number
in a consistent order). Cleanup tuples (this is done in the compiler,
but not in the tracer, leading to some tuple indexing jitter if not
done).
* use strtod_l, not std::stod which can throw exceptions
Other:
* Add REL_WITH_DEB_INFO to setup.py. It already existed for the
cmake files. Threading it into setup.py allows us to turn on
debug symbols with optimization everywhere.
* enable round trip testing for all generated graphs. This only adds
~6 seconds to total build time but tests printing for every graph.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14064
Differential Revision: D13094637
Pulled By: zdevito
fbshipit-source-id: 0a1c6912194d965f15d6b0c6cf838ccc551f161d
Summary:
This PR inserts `prim::None` constants for undefined tensors. This comes in the standard library if an `Optional[Tensor]` is statically determined to be `None`:
```python
torch.jit.script
def fn(x=None):
# type: (Optional[Tensor]) -> Tensor
return torch.jit._unwrap_optional(x)
torch.jit.script
def fn2():
# type: () -> Tensor
return fn()
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14120
Differential Revision: D13124625
Pulled By: driazati
fbshipit-source-id: 9eaa82e478c49c503f68ed89d8c770e8273ea569
Summary:
This PR did three things:
1. It export the BatchNorm functional and module, and rewrite some of the components to stay align with the current supported JIT features
2. In the process of export, add necessary compiler support for in_place op aug assign
4. change the test_jit behavior in add_module_test to utilize a single rng state during module initialization
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14016
Differential Revision: D13112064
Pulled By: wanchaol
fbshipit-source-id: 31e3aee5fbb509673c781e7dbb6d8884cfa55d91
Summary:
- This should help TestJit.test_lstm_fusion_concat_cuda
to be less flaky. (Checked on manual_seed 0..99)
Fixes: #14026
- Revert the renaming of test_fused_abs that was introduced
to game the order of tests to avoid the flakiness above.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14094
Differential Revision: D13100174
Pulled By: soumith
fbshipit-source-id: 91bb63b07a960a81dddfc0bf25c67696c0f6c46d