Summary:
Throw a warning when calling `torch.load` on a zip file
Fixes #15570
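A minimal sketch of the kind of check involved (my assumption; the actual detection in the PR may differ): zip archives begin with the `PK\x03\x04` local file header magic, which is enough to decide whether to warn.
```python
# Hypothetical helper, not the code from this PR: detect a zip archive by its
# magic number so torch.load can warn before trying to parse it as a pickle.
def looks_like_zip(path):
    with open(path, "rb") as f:
        return f.read(4) == b"PK\x03\x04"
```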
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15578
Differential Revision: D13555954
Pulled By: driazati
fbshipit-source-id: a37ecdb3dd0c23eff809f86e2f8b74cd48ff7277
Summary:
Changes originally in this PR:
1. Move Variable::Impl data members into TensorImpl as `AutogradMeta` struct
2. Change Variable::Impl functions to use data members in `AutogradMeta` struct
3. Add `shallow_copy_and_detach()` function to each subclass of TensorImpl
4. Do shallow copy when the user calls `make_variable(tensor)` / `make_variable_view(tensor)` / `variable.set_data(tensor)` / `variable.detach()`
Changes moved from https://github.com/pytorch/pytorch/pull/13645:
1. Add a flag to Variable to disallow size/stride/storage_ptr changes from in-place operations such as `resize_` / `resize_as_` / `set_` / `transpose_`, and set this flag to true when people call `tensor.data` in Python.
2. Write text in the docs to actively discourage changing the shape or storage of `tensor_detached` and expecting `tensor` to also be updated.
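A minimal sketch of the point in item 2, assuming the usual `detach()` semantics: the detached tensor shares storage, so value changes are visible through the original, but shape or storage changes on the detached tensor are exactly what the docs now discourage.
```python
import torch

x = torch.ones(2, 3, requires_grad=True)
d = x.detach()     # shares storage with x, but is not tracked by autograd
d[0, 0] = 42.0     # in-place value change is visible through x
print(x[0, 0])     # tensor(42., ...)
print(x.shape)     # torch.Size([2, 3]); do not expect x to follow if d is resized
```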
This is the 1st+2nd PR mentioned in https://github.com/pytorch/pytorch/issues/13638.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13827
Differential Revision: D13507173
Pulled By: yf225
fbshipit-source-id: b177b08438d534a8197e34e1ad4a837e2db0ed6a
Summary:
This adds AD support for adaptive_avg_pool2d, which is necessary for resnet50 in pytorch/vision:master. cc: soumith asuhan dlibenzi
apaszke I saw the autodiff bug you fixed in #15403; it doesn't prevent this PR from passing, so I'll leave it for your PR to fix. :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15459
Differential Revision: D13534732
Pulled By: ailzhang
fbshipit-source-id: 4e48b93e35d5ecfe7bd64b6a132a55b07843f206
Summary:
This PR supports a subclass of programs whose return statements are not final in the graph.
`final_returns.h` contains a comment describing how this is accomplished.
To minimize complexity in `compiler.cpp`, this pass is done as an AST-to-AST rewrite before the compiler runs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15463
Differential Revision: D13538962
Pulled By: zdevito
fbshipit-source-id: 67105ca873351825b4a364092ab1873779f3e462
Summary:
This PR implements infrastructure for post-processing a model to apply int8 quantization to its `nn.Linear` modules. Highlights of the implementation:
1) Inputs and outputs are `float` (quantized and packed internally), but the weight is quantized and packed ahead of time for efficiency. This implementation performs well in small-batch size GEMM calls. It should not be considered a general-purpose quantized GEMM kernel.
2) Weight packing is dependent on machine architecture (e.g. vector register width), so it is done just-in-time. Concretely, it is done on model load for the weights and it is done during operator execution for the input value.
3) Biases are unquantized.
4) We fail loudly if we are attempting to run this on a machine that does not support FBGEMM. This is because we do not want a model's numerics to differ based on which machine it is run on. A model containing these FBGEMM ops *must* be run with FBGEMM.
The API can be seen in the added test case. Highlights are:
1) `torch.jit.quantized.quantize_linear_modules` walks the module hierarchy of the passed-in Module and replaces all `nn.Linear` modules with a new `QuantizedLinear` module, which encapsulates the behavior described above.
2) `_pack()` and `_unpack()` script methods are present on `QuantizedLinear` modules. These methods should be called before serialization and after deserialization, respectively. This ensures that the weight matrix is properly packed for the running machine's architecture. Note that in the long term, we would like to move toward a more Pickle-style serialization technique, rather than having these explicit methods that mutate member values. This is blocked on being able to assign attributes in a ScriptMethod, among other things.
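A hedged usage sketch of the API described above (the added test case is the authoritative reference; behavior depends on FBGEMM support and may change across releases):
```python
import torch
import torch.nn as nn

# Replace every nn.Linear in the module tree with a QuantizedLinear (point 1).
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
qmodel = torch.jit.quantized.quantize_linear_modules(model)

# Inputs and outputs stay float; weights are quantized and packed internally.
out = qmodel(torch.randn(4, 128))
```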
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13777
Differential Revision: D13383276
Pulled By: jamesr66a
fbshipit-source-id: 00f29c9f34544add2b90107e3cf55a287802c344
Summary:
This PR adds enough of the infra for supporting closures (inner script functions) to allow us to express symbolic gradients using them. We do not actually ever run graphs that contain these closures. The symbolic_script infrastructure just extracts them out of the original forward graph and turns them into discrete forward/backward pairs. This cuts down on the type annotations necessary to write forward/backward pairs and aligns closely with the "differentiator" function approach to expressing reverse-mode AD.
Example:
This code:
```
import torch
r = torch.jit.CompilationUnit(
'''
def mul_forward(self, other):
def backward(grad_output):
grad_self = (grad_output * other).sum_to_size(self.size())
grad_other = (grad_output * self).sum_to_size(other.size())
return grad_self, grad_other
return self * other, backward
''')
print(r.module.code)
```
Will produce this graph (pretty printed for clarity):
```
def mul_forward(self,
self: Tensor,
other: Tensor) -> Tuple[Tensor, Tuple[None, Tuple[Tensor, Tensor]]]:
backward = (self.__lambda, (other, self))
return (torch.mul(self, other), backward)
def __lambda(self,
context: Tuple[Tensor, Tensor],
grad_output: Tensor) -> Tuple[Tensor, Tensor]:
other, self, = context
grad_self = torch.sum_to_size(torch.mul(grad_output, other), torch.size(self))
grad_other = torch.sum_to_size(torch.mul(grad_output, self), torch.size(other))
return (grad_self, grad_other)
```
symbolic_script will then modify the graph to remove the unsupported prim::Function node, yielding:
```
def mul_forward(self,
self: Tensor,
other: Tensor) -> Tuple[Tensor, Tuple[None, Tuple[Tensor, Tensor]]]:
return (torch.mul(self, other), (other, self))
def backward(self,
context: Tuple[Tensor, Tensor],
grad_output: Tensor) -> Tuple[Tensor, Tensor]:
other, self, = context
grad_self = torch.sum_to_size(torch.mul(grad_output, other), torch.size(self))
grad_other = torch.sum_to_size(torch.mul(grad_output, self), torch.size(other))
return (grad_self, grad_other)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15411
Differential Revision: D13523340
Pulled By: zdevito
fbshipit-source-id: 4d4a269460e595b16802c00ec55ae00e3e682d49
Summary:
Save error info in the future for parent thread to pick up. Throw the error
when the thread is the root thread.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14523
Differential Revision: D13251756
Pulled By: highker
fbshipit-source-id: b40f9a45665e1a934743f131ec5e8bad5622ce67
Summary:
Changelog:
- Change some expect tests that didn't need to be expect tests to use self.assertAllFused instead.
- Make the fuser tests that weren't using self.assertAllFused use it.
- Minor test renames.
cc apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15134
Differential Revision: D13507481
Pulled By: zou3519
fbshipit-source-id: dd0788530a60bb5ed2f42b961fae3db2b4404b64
Summary:
This PR changes Method (just Method not all graphs) to always have a single
return argument.
This is part 1 in a set of changes that will enable us to have better handling of early return statements.
The simplification that this change provides greatly reduces the work for the next step.
This change makes it so that Method and Python handle multiple returns in the same way:
* 0 - None
* 1 - <single value>
* many - Tuple[...]
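For example, a script function that returns several values now behaves exactly like Python, packing them into a single tuple (minimal sketch):
```python
import torch

@torch.jit.script
def f(x, y):
    return x + y, x - y  # the "many" case: a single Tuple[Tensor, Tensor]

print(f(torch.ones(2), torch.ones(2)))
```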
The result is that a lot of special-case handling in compiler.cpp and its
bindings can be removed. It also fixes several bugs in return handling,
including one where return values were not always checked against their
attributed values.
Notes:
* inferTypeFrom is renamed to be more accurate and discourage use.
* This has uncovered some bugs in other components, which are noted in
the diff.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15289
Differential Revision: D13481649
Pulled By: zdevito
fbshipit-source-id: 0e2242a40bb28cca2d0e8be48bede96195e4858c
Summary:
There is still limitation on this: if a script module is somewhere
in the trace, the inputs/outputs can only be tensors or tuples of
tensors.
Resolves #15052
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15184
Differential Revision: D13457691
Pulled By: highker
fbshipit-source-id: 8fe46afc41357a0eb8eadd83f687b31d074deb0e
Summary:
This PR adds `isinstance` to do static type checking in the JIT.
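A minimal sketch of what this enables, assuming the check is resolved statically against the declared type at compile time:
```python
import torch
from typing import List

@torch.jit.script
def describe(x: List[int]) -> str:
    if isinstance(x, List[int]):  # resolved at compile time, not at run time
        return "list of int"
    return "something else"

print(describe([1, 2, 3]))
```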
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15076
Differential Revision: D13471067
Pulled By: wanchaol
fbshipit-source-id: d39b7ed5db9fcca4b503659d02cf7795950ea8ea
Summary:
`torch.expand` and `torch.ne` are used often in models and this PR adds ONNX export support for them. ArmenAg has created issue https://github.com/pytorch/pytorch/issues/10882 for this.
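A small hedged sketch of exporting a model that exercises both ops (module and shapes are illustrative):
```python
import io
import torch

class M(torch.nn.Module):
    def forward(self, x):
        # expand broadcasts the input; != lowers to ne
        return (x.expand(2, 3) != 0).float()

buf = io.BytesIO()
torch.onnx.export(M(), (torch.randn(1, 3),), buf)
```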
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15050
Differential Revision: D13453036
Pulled By: houseroad
fbshipit-source-id: 4724b4ffcebda6cd6b2acac51d6733cb27318daf
Summary:
`rsplit` doesn't accept keyword arguments in Python 2, so this line raises an error.
Fixes #15135
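Minimal illustration of the incompatibility (pure Python):
```python
# Works on both Python 2 and 3: pass maxsplit positionally.
"a.b.c".rsplit(".", 1)

# Raises TypeError on Python 2, where str.rsplit takes no keyword arguments:
# "a.b.c".rsplit(".", maxsplit=1)
```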
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12732
Differential Revision: D10458630
Pulled By: driazati
fbshipit-source-id: a63e42fbc0e39e4291480775b516c98122ec05a1
Summary:
We need this, for example, to properly call `_unpack` when we have a traced module in the hierarchy
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15101
Differential Revision: D13468467
Pulled By: jamesr66a
fbshipit-source-id: c2b6740b12cde6e23395d12e42d4fc2c4c7ca3f2
Summary:
Adding support for torch.tensor in script.
The input list is typed as t[], because it can be arbitrarily nested. I added a compile-time check that the inner type of the list is a bool, float, or int.
Also adds specialization for boolean lists, which already existed at the IValue level but had not been added to the compiler yet.
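A minimal sketch of what this enables in script (a nested number list whose inner type is int, with an explicit dtype):
```python
import torch

@torch.jit.script
def make() -> torch.Tensor:
    return torch.tensor([[1, 2], [3, 4]], dtype=torch.float)

print(make())
```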
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14913
Differential Revision: D13407930
Pulled By: eellison
fbshipit-source-id: d17f1195a22149d5b0d08d76c89a7fab8444f7c5
Summary:
Before this PR, loop unrolling + the graph fuser was creating multiple
FusionGroups with the same bodies (with different variable names) for
JIT LSTMs. Each FusionGroup got registered to a separate fusion key;
each key resulted in a different compilation for the same
specializations.
This PR makes it so that when registering FusionGroups with the fusion
compiler, the compiler first checks the KernelSpec cache to see if the
FusionGroup's graph exists already. If it does, then return the
corresponding KernelSpec's key to share compiled kernels.
In addition, graphs in the KernelSpec cache are canonicalized before
being cached. I added a flag to the canonicalize pass to remove unique
names of values.
This shortens the compile time for a JIT LSTM (seq_len of 100, loop
unroll factor of 8) from 5.3s to 2.3s. Most of this compile time is
running the graph fuser and/or fusion compiler; while this PR
makes it so that there is only one unique kernel in the forward pass,
there are a lot of different kernels (6) in the backward pass
(after loop unrolling) that should be investigated.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14541
Differential Revision: D13324487
Pulled By: zou3519
fbshipit-source-id: b841d82ed35a959b5cfc72db033bf5a7b42cc4fb
Summary:
Fixes#15119. Before this PR, we were propagating constants through
aten::warn AND running it as a part of shape analysis.
This caused aten::warn to be run regardless of if it is
supposed to be run dynamically. This PR adds an exclusion for aten::warn
in constant propagation and shape analysis, similar to that of prim::RaiseException.
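A minimal sketch of the behavior being preserved: a warning inside a script function should fire (or not) at run time depending on the input, rather than being folded away:
```python
import warnings
import torch

@torch.jit.script
def f(x):
    if bool(x.sum() < 0):
        warnings.warn("negative sum")  # lowered to aten::warn
    return x * 2

f(torch.tensor([-1.0, -2.0]))  # emits the warning
f(torch.tensor([1.0, 2.0]))    # does not
```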
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15124
Differential Revision: D13432815
Pulled By: zou3519
fbshipit-source-id: 15ab533ce2accb2da3fd4e569070c7979ce61708
Summary:
Fixes #15038.
aten::_cast_Float(tensor, non_blocking) support was added in #14336.
Its second argument is a bool, but because we don't support generating values
of type bool in the fuser codegen, the codegen errored out.
aten::_cast_Float in the fuser never actually uses its non_blocking
argument, so another way to fix this would have been a special op for a
fused cast, but we might have fusible ops that do take bool arguments in
the future, so this support is good to have.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15057
Differential Revision: D13432091
Pulled By: zou3519
fbshipit-source-id: 455fe574f5f080aca9a112e346b841a2534a8dc3
Summary:
This PR creates TestFuser inside test_jit.py to be a home for graph fuser
specific tests.
This was a useful exercise because now that all the fuser tests are in
one place, I can spot redundant and bitrotting tests for cleanup in a
future PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15072
Differential Revision: D13421458
Pulled By: zou3519
fbshipit-source-id: 80b1a7712feff75a0c186d1664601c4edbbca694
Summary: Removes all warning spew for the TestJitGenerated tests
Differential Revision: D13420919
fbshipit-source-id: f251c12f923088ccc5daa2984c15003a67cbd1c1
Summary:
This removes FloatToInt style names replacing it with just the destination
name (e.g. FloatToInt -> Float). This makes it more consistent with the
syntax and makes it easier to add type conversions (just add a new
prim::Int op, for instance).
None of these ops get serialized so this should not affect loading of
old models.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14947
Differential Revision: D13408409
Pulled By: zdevito
fbshipit-source-id: d773fe863f14d9de893f686832769f8cc8903a8e
Summary:
This can be used to initialize state that is not necessarily eligible for serialization/is implementation-specific. Concretely, I'm going to use this to pack the weight matrices for quantized Linear modules according to the FBGEMM APIs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14655
Differential Revision: D13404438
Pulled By: jamesr66a
fbshipit-source-id: 2d327cef5520fdd716b5b1b29effd60a049e8a4a
Summary:
Not ready yet, need some comments / help with this. It's good enough for the immediate goals of https://github.com/pytorch/xla (forward + backward trace fusion), but there are at least two issues with it:
1. If we don't allow it, `test/test_jit.py` fails to cover the change.
2. If we allow the weight to be set, running `test/test_jit.py TestJitGenerated.test_nn_nll_loss` fails with:
```
======================================================================
ERROR: test_nn_nll_loss (__main__.TestJitGenerated)
----------------------------------------------------------------------
Traceback (most recent call last):
File "test/test_jit.py", line 10001, in do_test
fn, f_args_variable, kwargs_variable, no_grad=no_grad)
File "test/test_jit.py", line 9360, in check_against_reference
outputs_test = self.runAndSaveRNG(func, recording_inputs, kwargs)
File "test/test_jit.py", line 425, in runAndSaveRNG
results = func(*inputs, **kwargs)
File "test/test_jit.py", line 9298, in script_fn
self.assertExportImport(CU.the_method.graph, tensors)
File "test/test_jit.py", line 415, in assertExportImport
self.assertExportImportModule(m, inputs)
File "test/test_jit.py", line 419, in assertExportImportModule
self.assertEqual(self.runAndSaveRNG(m.forward, inputs),
File "test/test_jit.py", line 425, in runAndSaveRNG
results = func(*inputs, **kwargs)
RuntimeError:
arguments for call are not valid:
for operator aten::nll_loss_backward(Tensor grad_output, Tensor self, Tensor target, Tensor? weight, int reduction, int ignore_index, Tensor total_weight, *, Tensor out) -> Tensor:
expected a value of type Tensor for argument 'total_weight' but found bool
<internally-created-node>
~ <--- HERE
for operator aten::nll_loss_backward(Tensor grad_output, Tensor self, Tensor target, Tensor? weight, int reduction, int ignore_index, Tensor total_weight) -> Tensor:
expected a value of type Tensor for argument 'total_weight' but found bool
<internally-created-node>
~ <--- HERE
for call at:
<internally-created-node>
~ <--- HERE
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14305
Differential Revision: D13356265
Pulled By: ezyang
fbshipit-source-id: 504d783b2d87f923e698a6a4efc0fd9935a94a41
Summary:
This PR adds `None` buffers as parameters (similarly to #14715). It also cleans up a bunch of the `test_jit.py` tests that should be covered by `common_nn.py` and brings in `criterion_tests` to test loss functions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14778
Differential Revision: D13330849
Pulled By: driazati
fbshipit-source-id: 924cc4cf94e0dcd11e811a55222fd2ebc42a9e76
Summary:
This PR:
1. add tests for batchnorm/dropout for train/eval parameter mutation
2. remove training constants from all of our standard library
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14780
Differential Revision: D13331578
Pulled By: wanchaol
fbshipit-source-id: d92ca3ce38cc2888688d50fe015e3e22539a20a5
Summary:
Dealing with so many `aten::size` calls (in particular calls on elements computed inside fusion groups) requires us to do some extra graph processing in the fuser (to compute the sizes by explicit broadcasts, instead of writing the intermediate tensors only to check their size). This restores the forward expects of LSTM and MiLSTM to a single big kernel. Unfortunately the backward is much harder, because as long as we can't prove that the reductions are unnecessary (or if we can't distribute them over the op), we will not be able to fuse them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14558
Differential Revision: D13321748
Pulled By: zou3519
fbshipit-source-id: c04fc2f70d106d2bfb56206b5aec517a93b79d1f
Summary:
This PR adds ResNet to test_jit and converts more nn modules; stacked on #14533 and #14715.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14437
Differential Revision: D13325871
Pulled By: wanchaol
fbshipit-source-id: 6c94a988b36794a373af6541c0c262a07291f7b1
Summary:
Adding Loss modules to script. Some of the modules have an optional tensor parameter. I will wait until wanchao's diff to support optional tensors is landed before landing this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14720
Differential Revision: D13317990
Pulled By: eellison
fbshipit-source-id: 535925bdf126d28d9e7d64077b83ebd836a5beba
Summary:
Only compare the device index if the device has one.
Test the tensor restore with some computation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14734
Reviewed By: dzhulgakov
Differential Revision: D13317949
Pulled By: houseroad
fbshipit-source-id: 26b2f2912a9bbc3b660a62283fb403ddab437e49
Summary:
Previously symbolic AD formulas assumed that no broadcasting happened,
and would return gradients of incorrect shapes (possibly leading to
silent errors later).
Fixes a few bugs (known and unknown):
- #11736
- ArgumentSpec didn't compute the input types correctly [(it didn't advance the offset for non-tensor args)](https://github.com/pytorch/pytorch/pull/14485/files#diff-4fd3157a056596aefb8cdf41022a208bR153)
- Symbolic AD could suffer from use after free (dangling pointers in grad map), because [`EliminateDeadCode` could have removed nodes](https://github.com/pytorch/pytorch/pull/14485/files#diff-25d33ad1ed6855684dec79d927ca6142L781) that referenced gradients of certain values.
- Undefined behavior in `aten::size`
During my tests I've also found a few new problems, and I have opened issues for them:
- FusionGroup seems to think that cat nodes broadcast their inputs (#14483)
- `prim::ConstantChunk` derivative formula doesn't handle undefined inputs (#14484)
This patch unfortunately deoptimizes some of our code (Fusion doesn't happen past chunk nodes, and outputs more tensors only because we have to get their size). I know how to fix those issues, but wanted to fix this terrible bug quickly.
cc zou3519 zdevito ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14485
Reviewed By: eellison
Differential Revision: D13312888
Pulled By: suo
fbshipit-source-id: ad46bfb4d0a306ad9451002f8270f7a790f72d58
Summary:
Add support for interpolate and upsampling in weak_script mode.
Because the function parameters are overloaded, I had to add it as a builtin op. For interpolate,
`size` can be `int?` or `int[]?`, and `scale_factor` can be `float?` or `float[]?`. Every combination of the two parameters needs to be supported.
The same logic applies for upsample_nearest, upsample_bilinear, and upsample.
There are a few fixes that I came to along the way.
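The accepted combinations described above, shown in eager form (the weak-script versions mirror these signatures):
```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)
F.interpolate(x, size=16)                  # int size
F.interpolate(x, size=[16, 16])            # list-of-int size
F.interpolate(x, scale_factor=2.0)         # float scale factor
F.interpolate(x, scale_factor=[2.0, 2.0])  # list-of-float scale factor
```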
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14123
Differential Revision: D13278923
Pulled By: eellison
fbshipit-source-id: e59729034369be4ce4b747291a3d1c74e135b869
Summary:
Check whether the codegen'd alias annotations actually track alias creation and writes correctly. This could be made more exhaustive, but it's good enough for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14588
Differential Revision: D13312653
Pulled By: suo
fbshipit-source-id: 98de1610ea86deada71957c75c222fff331a0888
Summary:
This PR:
1. Handle None value attr in the WeakScriptModuleProxy
2. add back module tests that now passing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14715
Differential Revision: D13313573
Pulled By: wanchaol
fbshipit-source-id: a6b7892707350290a6d69b6f6270ad089bfc954b
Summary:
To enable self.training in script modules, this PR automatically adds a buffer called 'training' if a script method requests self.training. Assignment to self.training is overloaded to assign both to the boolean property and the tensor value.
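A minimal sketch of the resulting behavior (assuming the current ScriptModule API): `self.training` is readable inside a script method, and `.train()`/`.eval()` update it.
```python
import torch

class M(torch.jit.ScriptModule):
    @torch.jit.script_method
    def forward(self, x):
        if self.training:
            return x * 2
        return x

m = M()
m.eval()
print(m(torch.ones(1)))  # tensor([1.])
m.train()
print(m(torch.ones(1)))  # tensor([2.])
```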
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14719
Differential Revision: D13310569
Pulled By: zdevito
fbshipit-source-id: 406387bb602f8ce5794eeff37642863c75928be5
Summary:
[ note: stacked on expect files changes, will unstack once they land ]
This adds DeviceObjType (we cannot use DeviceType because it is already an enum)
to the type hierarchy and an isDevice/toDevice pair to IValue.
Previous hacks which used an int[] to represent Device are removed
and at::Device is used instead.
Note: the behavior of .to is only a subset of Python's; we need to
fix the aten op so that it accepts Optional[Device] and Optional[ScalarType].
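A minimal sketch of what a first-class Device value enables in script (using the subset of `.to` noted above):
```python
import torch

@torch.jit.script
def to_cpu(x: torch.Tensor) -> torch.Tensor:
    return x.to(torch.device("cpu"))

print(to_cpu(torch.ones(2)).device)  # cpu
```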
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14666
Reviewed By: suo
Differential Revision: D13290405
Pulled By: zdevito
fbshipit-source-id: 68b4381b292f5418a6a46aaa077f1c902750b134
Summary:
This PR is a part of the task to unblock standard library export. Basically we want to enable the ability to meta-program the if statement to dynamically emit different branches based on `cond`. This is primarily used to skip compiling certain branches of an if, like the example below:
```python
import torch

class Test(torch.jit.ScriptModule):
    def __init__(self, b=None):
        super(Test, self).__init__()
        self.b = b

    def forward(self, input):
        x = input
        if self.b is not None:
            x = self.b(input)
        return x

Test()(torch.randn(2, 3))
```
This is also the first step for us to bridge the gap between None as a simple value and sugared values in the JIT.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14533
Differential Revision: D13310526
Pulled By: wanchaol
fbshipit-source-id: 78d1a8127acda5e44d2a8a88f7627c43d29ff244
Summary:
We align the restore logic with `torch.load`: we try to restore to the right device, and if the device is not available, an exception is raised. We allow the user to remap the device through a parameter `map_location`; it can be 1) a string like `'cuda:0'` or `'cpu'`, 2) a device, e.g. `torch.device('cpu')`, 3) a dict, e.g. `{'cuda:1': 'cuda:0'}`, or 4) a function whose signature looks like `map_location(tensor, saved_device_string)`.
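A hedged sketch of the string and device forms (the dict and function forms described above follow the same convention as `torch.load`; the file name is illustrative):
```python
import torch

m = torch.jit.script(torch.nn.Linear(2, 2))
torch.jit.save(m, "m.pt")

torch.jit.load("m.pt", map_location="cpu")
torch.jit.load("m.pt", map_location=torch.device("cpu"))
```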
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14454
Reviewed By: zrphercule
Differential Revision: D13271956
Pulled By: houseroad
fbshipit-source-id: dfd6b6049b0dc07549ddeddf2dea03ac53ba6d49
Summary:
Previously symbolic AD formulas assumed that no broadcasting happened,
and would return gradients of incorrect shapes (possibly leading to
silent errors later).
Fixes a few bugs (known and unknown):
- #11736
- ArgumentSpec didn't compute the input types correctly [(it didn't advance the offset for non-tensor args)](https://github.com/pytorch/pytorch/pull/14485/files#diff-4fd3157a056596aefb8cdf41022a208bR153)
- Symbolic AD could suffer from use after free (dangling pointers in grad map), because [`EliminateDeadCode` could have removed nodes](https://github.com/pytorch/pytorch/pull/14485/files#diff-25d33ad1ed6855684dec79d927ca6142L781) that referenced gradients of certain values.
- Undefined behavior in `aten::size`
During my tests I've also found a few new problems, and I have opened issues for them:
- FusionGroup seems to think that cat nodes broadcast their inputs (#14483)
- `prim::ConstantChunk` derivative formula doesn't handle undefined inputs (#14484)
This patch unfortunately deoptimizes some of our code (Fusion doesn't happen past chunk nodes, and outputs more tensors only because we have to get their size). I know how to fix those issues, but wanted to fix this terrible bug quickly.
cc zou3519 zdevito ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14485
Differential Revision: D13280899
Pulled By: soumith
fbshipit-source-id: 80cc5ec9331be80e1bb9ddfe85b81c2b997e0b0c
Summary:
This PR makes DCE a little smarter in the presence of mutable ops. Previously mutable ops could never be cleaned up, now they can be cleaned up if we can prove there are no live uses of any alias sets that the op writes to.
This behavior is optional; if you pass DCE a block instead of a graph, it will do the same thing as before. Also changed `InlineAutographSubgraph` to use the common subgraph utils.
Tested on traced ResNet, and it gets rid of the dead code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14601
Differential Revision: D13309118
Pulled By: suo
fbshipit-source-id: dac2791e7d2ecf219ae717a2759b83c1e927f254
Summary:
This PR is a part of the task to unblock standard library export. Basically we want to enable the ability to meta-program the if statement to dynamically emit different branches based on `cond`. This is primarily used to skip compiling certain branches of an if, like the example below:
```python
import torch

class Test(torch.jit.ScriptModule):
    def __init__(self, b=None):
        super(Test, self).__init__()
        self.b = b

    def forward(self, input):
        x = input
        if self.b is not None:
            x = self.b(input)
        return x

Test()(torch.randn(2, 3))
```
This is also the first step for us to bridge the gap between None as a simple value and sugared values in the JIT.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14533
Differential Revision: D13272203
Pulled By: wanchaol
fbshipit-source-id: 44a545abb766bbd39b762a6e19f9ebaa295e324b
Summary:
Stacked on the zip commit because it also changes expect files; read only the last commit.
This reduces the number of ways we can print a Type from 3 (python_str, str, operator<<) to 2.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14657
Differential Revision: D13288912
Pulled By: zdevito
fbshipit-source-id: f8dd610cea798c511c1d4327395bba54b1aa1697
Summary:
After consulting with Owen, who pointed out the existence of the miniz library, I decided to take one last shot at using zip as our container format.
miniz makes this surprisingly feasible and I think the benefits of using zip are large enough that we should do it.
This replaces our custom container format with a zip archive, preserving all of the
desirable features of our custom format, such as append-oriented writing, and
mmap'able tensor data while adding a bunch of debugging advantages:
1. You can unzip and explore the container to debug what is going on with a model.
2. You can edit the model using a text editor (e.g. change the definition of a method,
or edit the json-serialized meta-data), re-zip the file using OSX's native 'Compress'
option, and re-load the result into pytorch. Note: this enables you to, e.g., print-debug
serialized models.
3. We can easily enable features like compression in the future.
4. Stock Python, without pytorch installed, and other programming languages
can reasonably consume this format using the json and zipfile packages, which enables
people to build tools like visualizers without those visualizers depending on pytorch.
This will be especially useful if you want to, for instance, write a visualizer in javascript.
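For example, a quick way to see points 1 and 4 in practice (the file name is illustrative):
```python
import zipfile
import torch

torch.jit.save(torch.jit.script(torch.nn.Linear(2, 2)), "model.pt")
with zipfile.ZipFile("model.pt") as z:
    print(z.namelist())  # the archive can be listed/unzipped with stock tools
```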
Notes:
* This adds miniz (https://github.com/richgel999/miniz) as a dependency. miniz is a self-contained
library for reading/writing zipfiles that, unlike other zip libraries, also includes libz-compatible
compress/decompress support. It is a single header and a single C file without
any other dependencies. Note that the instructions for miniz explicitly state:
> Please use the files from the releases page in your projects. Do not use the git checkout directly!
So we have checked in the 'release' source. Miniz supports zip64, and its API is amenable
to doing zip-align style things to align data.
* Removes 'size' from RecordRef. This allows you to edit files in the zip archive without
editing the meta-data file. Very important if you want to print-debug serialized models.
* PyTorchStreamReader/PyTorchStreamWriter keep mostly the same API (though keys become strings)
However, their implementation is completely swapped out to use miniz.
* Code exists to check for the old magic number to give a decent warning to our preview users
after we change the format.
* Container version information is now put in a stand-alone 'version' file in the archive
and serves a similar purpose to the other container version info.
* All files in the zip archive start at 64-byte boundaries, using an approach similar to
zip-align. Tests check that this property remains true. While the writer does this,
the reader doesn't depend on it, allowing user-created archives that can use compression,
and do not have to align data.
* Added test to check for > 4GB files and archives. Disabled by default because it takes
almost 2 minutes to run.
* torchscript files are now optional: if a submodule does not have methods, it will
not be written.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14521
Reviewed By: jamesr66a
Differential Revision: D13252945
Pulled By: zdevito
fbshipit-source-id: 01209294c0f6543d0fd716f85a38532249c52f8c
Summary:
Remove no_grad_embedding_renorm_ from aten. Setting the derivatives of the inputs to false has different semantics from calling with no_grad(), because it will not error if an input is modified and then has its grad accessed.
Instead, make a custom op, and use NoGradGuard.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14639
Differential Revision: D13285604
Pulled By: eellison
fbshipit-source-id: c7d343fe8f22e369669e92799f167674f124ffe7
Summary:
This moves `new_module_tests` from `test_nn.py` to `common_nn.py` so
that they can be used in `test_jit.py` without running any of
`test_nn.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14578
Differential Revision: D13268286
Pulled By: driazati
fbshipit-source-id: 6e8654a4c29ab754d656ac83820c14d1c1843e03
Summary:
Stacked on https://github.com/pytorch/pytorch/pull/14378, only look at the last commit.
This changes the way methods are defined in TorchScript archives to use
PythonPrint rather than ONNX protobufs.
It also updates torch.proto to directly document the tensor data
structure actually being serialized.
Notes:
* because PythonPrint prints all the methods at once per module, this
removes MethodDef in favor of a single torchscript_area and a separate
caffe2_graphs entry. Note that NetDef's already have method names,
so there is no need for a separate method name entry.
* This switches cpp/pickle area to RecordRef (references to a file in
the container format) since it is possible the data in these arenas
may be large and not suited to json output.
* Removes 'annotations' -- annotations should be re-added on the first
commit that actually has a practical use for them. In the current state
it is unlikely they are representing the right information.
* Some expect files have changed because PythonPrint is preserving more
debug name information for parameter names.
* MethodEncoder (the ONNX output format) has been deleted. There is still
some cleanup possible combining EncoderBase and GraphEncode now that there
is only a single pathway using EncoderBase.
* This incorporates the changes from #14397
to define TensorDef
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14400
Reviewed By: suo
Differential Revision: D13231800
Pulled By: zdevito
fbshipit-source-id: af5c1152d0bd6bca8b06c4703f59b161bb19f571
Summary:
To convert `max_unpool` functions to weak script, this PR adds support
for `T` as default arguments for `BroadcastingListN[T]`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14361
Differential Revision: D13192231
Pulled By: driazati
fbshipit-source-id: a25b75a0e88ba3dfa22d6a83775e9778d735e249
Summary:
This PR adds weak modules for all activation modules and uses `test_nn` module tests to test weak modules that have been annotated with `weak_module` and therefore are in `torch._jit_internal._weak_types`
Also depends on #14379
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14238
Differential Revision: D13252887
Pulled By: driazati
fbshipit-source-id: e9638cf74089884a32b8f0f38396cf432c02c988
Summary:
Resubmitting PR #14415
The tests added for Embedding + EmbeddingBag had random numbers as input, which affected the random number generator and caused the flaky test to break.
Everything but the last two commits have already been accepted
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14509
Differential Revision: D13247917
Pulled By: eellison
fbshipit-source-id: ea6963c47f666c07687787e2fa82020cddc6aa15
Summary:
Add support for Embedding and EmbeddingBag in script. Both functions require `torch.no_grad()`, which we don't have any plans to support in the near future. To work around this, I added an embedding_renorm function without derivatives.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14415
Reviewed By: wanchaol
Differential Revision: D13219647
Pulled By: eellison
fbshipit-source-id: c90706aa6fbd48686eb10f3efdb65844be7b8717
Summary:
This PR adds weak modules for all activation modules and uses `test_nn` module tests to test weak modules that have been annotated with `weak_module` and therefore are in `torch._jit_internal._weak_types`
Also depends on #14379
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14238
Differential Revision: D13192230
Pulled By: driazati
fbshipit-source-id: 36488960b6c91448b38c0fa65422539a93af8c5e
Summary:
This PR allows overloading functions based on the value of a parameter (so long as it is a constant). See max_pool1d for an example usage.
This is the first step in enabling the use of max_pool functions for the standard library that can return `Tensor` or `Tuple[Tensor, Tensor]` based on the `return_indices` flag. This will give the JIT identical results to the Python versions of the functions.
Fixes #14081
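The eventual goal, sketched below assuming current TorchScript behavior: the return type follows the constant `return_indices` flag, matching eager PyTorch.
```python
import torch
import torch.nn.functional as F

@torch.jit.script
def pool(x: torch.Tensor):
    return F.max_pool1d(x, 2, return_indices=True)  # Tuple[Tensor, Tensor]

out, idx = pool(torch.randn(1, 1, 8))
```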
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14425
Differential Revision: D13222104
Pulled By: driazati
fbshipit-source-id: 8cb676b8b13ebcec3262234698edf4a7d7dcbbe1
Summary:
Port AffineGrid to C++, because script does not support compiling Function classes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14392
Differential Revision: D13219698
Pulled By: eellison
fbshipit-source-id: 3ddad8a84c72010b5a6c6f7f9712be614202faa6
Summary:
Stacked on #14176, review only the last commit.
* Print parameters to methods as self.weight rather than as extra inputs.
* Print entire set of methods out as a single string
* Update test code to test the module-at-a-time export/import
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14378
Differential Revision: D13198463
Pulled By: zdevito
fbshipit-source-id: 3fab02e8239cfd6f40d6ab6399047bd02cf0a8c8
Summary:
In #14239 we fixed ONNX_ATEN.
To make sure of its correctness in the future, we should add a related test case.
We use torch.fmod() to test ONNX_ATEN.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14259
Differential Revision: D13204610
Pulled By: zrphercule
fbshipit-source-id: e4660c346e5edd201f1458b7d74d7dfac49b94c7
Summary:
This PR adds a `try_outplace` option to the tracer. When `try_outplace` is true, the tracer will attempt to emit out-of-place ops (similar to how things are done today). When it's false, the correct in-place op is emitted.
I made `try_outplace` false by default, but flipped it to true for ONNX export utils. zdevito jamesr66a, anywhere else I should preserve the existing behavior?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14254
Reviewed By: eellison
Differential Revision: D13166691
Pulled By: suo
fbshipit-source-id: ce39fdf73ac39811c55100e567466d53108e856b
Summary:
[Stacked commit, only review the last commit]
This PR adds support for printing default values in python printing as well as the logic
for parsing default values back in using the parser. For simplicity, this PR simply
creates a subgraph of the constant expressions and then runs that graph to generate the defaults.
A more lightweight approach should be possible later, but would require more machinery.
To make reading code in the printer easier, this also adds ir_views.h.
Similar to tree_views.h these classes can provide views of some commonly used IR nodes
that have complicated structure and common operations on that structure.
Currently it has only read-only views for prim::If and prim::Loop,
but we should eventually add helpers to manipulate If/Loop nodes as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14176
Differential Revision: D13198455
Pulled By: zdevito
fbshipit-source-id: dc99ab9692804ccaedb60a55040c0b89ac7a6a6d
Summary:
This adds scalar type support to the fuser, both internally (instead of auto / assuming float) and for the inputs/outputs.
We can now fuse things with inputs / outputs of arbitrary scalar type; in particular, comparisons and where work well. So it fixes #13384 by returning a tensor of the right type (and adds a test where byte and double tensors are returned).
The type inference is done by re-calling PropagateTensorShapeOnNode in the compilation, I would venture that it isn't prohibitively expensive compared to the actual compilation. (Propagation was fixed for where to return the second argument's type and amended to handle FusedConcat.)
I'm not sure how to add a check for the code generated by the fuser, but I am not sure we absolutely need to (we'd see if it is invalid / produces wrong results).
Thanks in particular to apaszke, fmassa, mruberry for advice and encouragement! All the errors are my own.
I have discussed order of PRs briefly with mruberry, if this goes in before he submits the PR, he graciously agreed to rebasing his, but I'd happily rebase, too.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14336
Differential Revision: D13202620
Pulled By: soumith
fbshipit-source-id: 855159e261fa15f21aca3053bfc05fb3f720a8ef
Summary:
This PR allows overloading functions based on the value of a parameter (so long as it is a constant). See `max_pool1d` for an example usage.
This is the first step in enabling the use of `max_pool` functions for the standard library that can return `Tensor` or `Tuple[Tensor, Tensor]` based on the `return_indices` flag. This will give the JIT identical results to the Python versions of the functions.
Depends on #14232 for `Optional[BroadcastingList[T]]`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14081
Differential Revision: D13192228
Pulled By: driazati
fbshipit-source-id: fce33c400c1fd06e59747d98507c5fdcd8d4c113
Summary:
Fixes #12290. Also speeds up JIT LSTM forward pass from 8.8ms to 7.8ms; previously, each JIT LSTM cell used 2 fused kernels. Now, it only uses one fused kernel (which is how many kernels cudnn uses).
Explanation:
Let f, g, h be fusible ops.
```
x = f(v, w)
z = g(x, y)
a, b = chunk(z)
c = h(a, b)
```
becomes (before this PR):
```
x = f(v, w)
x', y' = broadcast_tensors([x, y])
ax, bx = chunk(x')
ay, by = chunk(y')
a = g(ax, ay)
b = g(bx, by)
c = h(a, b)
```
The graph fuser then puts g, g, and h into one FusionGroup and is unable
to move `x = f(v, w)` into the FusionGroup.
This PR lets the graph fuser move `x = f(v, w)` into the FusionGroup.
It does this by abstracting the broadcast_tensors + multiple chunk nodes
into one intermediate `prim::BroadcastingChunk[chunks, dim]` node.
A `BroadcastingChunk[chunks, dim](*inputs)` node is equivalent to:
- broadcasting all of *inputs
- chunk-ing each broadcasted input into `chunks` chunks along dim `dim`.
Abstracting the broadcasting chunk behavior away, it is now a lot easier
for the graph fuser to move (broadcast + chunk) past an operation. After
this PR, the above graph becomes:
```
x = f(v, w)
ax, bx, ay, by = BroadcastingChunk(x, y)
a = g(ax, ay)
b = g(bx, by)
c = h(a, b)
```
Now, to move `x = f(v, w)` after the BroadcastingChunk, one just needs
to add f's operands to the BroadcastingChunk:
```
ay, by, av, bv, aw, bw = BroadcastingChunk(y, v, w)
ax = f(av, aw)
bx = f(bv, bw)
a = g(ax, ay)
b = g(bx, by)
c = h(a, b)
```
cc apaszke mruberry zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14055
Differential Revision: D13159259
Pulled By: zou3519
fbshipit-source-id: 134e9e645c950384d9be6a06a883a10e17a73d7d
Summary:
Fix a mishandling of `foo[a] = b` when `a` was a tensor. We were assigning to a copy of `foo`, not a view of it.
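A minimal sketch of the pattern being fixed (the write must land in a view of `foo`, not a copy, so the caller's tensor is updated):
```python
import torch

@torch.jit.script
def set_rows(foo: torch.Tensor, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    foo[a] = b  # indexed assignment with a tensor index
    return foo

x = torch.zeros(4, 2)
set_rows(x, torch.tensor([0, 2]), torch.ones(2, 2))
print(x)  # rows 0 and 2 are ones: x itself was modified
```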
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14311
Differential Revision: D13196109
Pulled By: suo
fbshipit-source-id: c929401fda7c4a27622d3fe2b11278b08a7f17f1
Summary:
This handles the input pre-multiplication in RNNs, yielding pretty significant speedups in backward times. This pass depends on loop unrolling, so we'll batch only as many elements as the unrolling factor allows.
cc mruberry ngimel zou3519 zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13456
Differential Revision: D12920339
Pulled By: zou3519
fbshipit-source-id: 5bcd6d259c054a6dea02ae09a9fdf9f030856443
Summary:
1. Support `Optional[BroadcastingList1[int]]`-like type annotations to accept an int or a list of ints
2. Convert gumbel_softmax and lp pooling functions and modules to weak script
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14232
Differential Revision: D13164506
Pulled By: wanchaol
fbshipit-source-id: 6c2a2b9a0613bfe907dbb5934122656ce2b05700
Summary:
export - print a method with python_print
import - import a method with import_method
We want to ensure:
export(g) == export(import(export(g)))
That is, after exporting/importing once, the graph will stay exactly
the same. This is less strict than g == import(export(g)), which would
require us to maintain a lot more information about the structure of the
IR and about the names of debug symbols.
This PR addresses this with the following fixes:
* print out double-precision numbers with high enough precision such
that they always parse in the same way
* when creating loop-carried dependencies, sort them
by variable name, ensuring a consistent order
* parse nan correctly
* DCE: remove unused outputs of if statements, and loop-carried dependencies
in loops that are dead both after the loop and inside the body of the
loop.
* Do not set uniqueName for variables whose names are _[0-9]+, these
are probably rare in user code, and we need a way to communicate
that we do not care about a variable name when re-parsing the graph.
Otherwise temporary variable names will jitter around.
* Expand the definition of a constant in printing code to None,
and family.
* Allow re-treeing to work as long as the only thing in its way is a
constant node. These do not have side effects but are sometimes
inserted in a different order when tracing compared to how we print them.
* Print all constant nodes out first in the order in which they are used
(or, if they are inlined, ensure they get assigned CONSTANT.cX number
in a consistent order). Cleanup tuples (this is done in the compiler,
but not in the tracer, leading to some tuple indexing jitter if not
done).
* use strtod_l, not std::stod which can throw exceptions
Other:
* Add REL_WITH_DEB_INFO to setup.py. It already existed for the
cmake files. Threading it into setup.py allows us to turn on
debug symbols with optimization everywhere.
* enable round trip testing for all generated graphs. This only adds
~6 seconds to total build time but tests printing for every graph.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14064
Differential Revision: D13094637
Pulled By: zdevito
fbshipit-source-id: 0a1c6912194d965f15d6b0c6cf838ccc551f161d
Summary:
This PR inserts `prim::None` constants for undefined tensors. This comes up in the standard library if an `Optional[Tensor]` is statically determined to be `None`:
```python
@torch.jit.script
def fn(x=None):
    # type: (Optional[Tensor]) -> Tensor
    return torch.jit._unwrap_optional(x)

@torch.jit.script
def fn2():
    # type: () -> Tensor
    return fn()
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14120
Differential Revision: D13124625
Pulled By: driazati
fbshipit-source-id: 9eaa82e478c49c503f68ed89d8c770e8273ea569
Summary:
This PR did three things:
1. It exports the BatchNorm functional and module, and rewrites some of the components to stay aligned with the currently supported JIT features
2. In the process of exporting, add the necessary compiler support for in-place augmented assignment ops
3. Change the test_jit behavior in add_module_test to use a single RNG state during module initialization
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14016
Differential Revision: D13112064
Pulled By: wanchaol
fbshipit-source-id: 31e3aee5fbb509673c781e7dbb6d8884cfa55d91
Summary:
- This should help TestJit.test_lstm_fusion_concat_cuda
to be less flaky. (Checked on manual_seed 0..99)
Fixes: #14026
- Revert the renaming of test_fused_abs that was introduced
to game the order of tests to avoid the flakiness above.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14094
Differential Revision: D13100174
Pulled By: soumith
fbshipit-source-id: 91bb63b07a960a81dddfc0bf25c67696c0f6c46d
Summary:
* Add hooks to get a callback whenever a valid graph is produced in the compiler or through tracing. These hooks can be used to pretty_print and then reparse every graph our tests produce to check that the serialization function works correctly. Currently this is guarded by an environment variable since there are a few remaining failures.
* Fix printing bugs: True and False rather than 1 and 0, print 0. for floating point zero
* Change behavior of NoneType. It is no longer a subtype of Optional but instead implicitly converts to it, returning a prim::Node with an Optional[T] type for some specific T. This allows functions like `_unwrap_optional` to correctly match against a None while still deriving the right type.
* Fix a bug where empty blocks did not correctly emit "pass" in printer.
* Fix a bug where prim::Undefined sometimes cannot be printed as None because it is being used in a schema-less op. This should be fixable once Optional[T] always uses the same None object.
* Other minor printing bugs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13959
Reviewed By: jamesr66a
Differential Revision: D13073519
Pulled By: zdevito
fbshipit-source-id: 4167a6b614f2e87b4d21823275a26be5ba4fc3dd
Summary:
I'm now traveling and don't have access to a good computer to compile and test by myself. Will see the outcome of CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12199
Differential Revision: D13062326
Pulled By: nairbv
fbshipit-source-id: 85873525caa94906ccaf2c739eb4cd55a72a4ffd
Summary:
* Correctly adds annotate when needed for lists
* Parser/Emitter handles octal escapes so we do not fail for some strings.
* more complete keyword list in pretty printer
* floating point numbers are always printed with a decimal to ensure
we never mistake them in parsing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13879
Differential Revision: D13037860
Pulled By: zdevito
fbshipit-source-id: f09ab174fc33402a429b21a5bfaf72e15c802cad
Summary:
Convert some more functions to match up with features added. Some
conversions were unsuccessful but the type line was left in for later.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13707
Differential Revision: D13030210
Pulled By: driazati
fbshipit-source-id: 02d5712779b83b7f18d0d55539e336321335e0cc
Summary:
This PR makes weak modules in `nn.Sequential` get properly compiled
when used
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13889
Differential Revision: D13039559
Pulled By: driazati
fbshipit-source-id: d3266305f0e206b2a19b63230ac2ab8f02faa603
Summary:
* Adds `OptionalType` support for import/export
* Optionals get exported along with their contained type, i.e. 'Optional[int]'
* Allows concrete types and `None` to be passed to an op that takes an optional
* Converts `softmax`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13647
Differential Revision: D12954672
Pulled By: driazati
fbshipit-source-id: 159e9bfb7f3e398bec3912d414c393098cc7455a
Summary:
Get pretty printer ready for use as a serialization format
This PR adds a bunch of functionality to the pretty printer (now called python_printer to reflect
the fact that it will be used to output valid python source). The idea is to get the printer
ready for use as serialization format. This PR does not have tests beyond what the pretty
printer already had. PRs stacked on this one will do round-trip export/import to test this functionality more robustly.
Notes:
* PythonPrinter is an evolution of the original pretty printer. However, much of it has changed so it is best just to
read it as a new implementation. Trying to correlate it to the original implementation is probably not much help.
* The printer tries to get reasonably close to how the original function was likely written, such as
writing expressions rather than making intermediates when possible. We may decide to turn this off
for the actual serialization, but it is useful for pretty printing.
* tensor field access was changed so that prim::device and family have schema
* fixed a bug in the compiler where setUniqueName gets called even when a value already has one.
this sometimes assigned really poor names to graph inputs
* Graph::insert gains an optional range argument to make range-preserving inserts easier.
* prim:: ops that can have schema now have schema. This is because when we parse them back in,
we will need the schema to correctly set their output types.
* there is code in the python printer to complain if you try to add a prim op and do not update the printer.
* BuiltinModule is generalized to take an operator namespace and a version number for work in future commits.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13616
Reviewed By: goldsborough
Differential Revision: D13008252
Pulled By: zdevito
fbshipit-source-id: 32b33bc6410d6ca1c6f02bd6e050f8d5eea32083
Summary:
This PR is a part of task to unblock standard library export.
* We treat None differently from Tensor and other types: when passing None as a Tensor, it becomes an undefined tensor rather than the None IValue.
* Refine the type system so that we have a correct tensor type hierarchy (Dynamic/Tensor/CompleteTensor); Dynamic should be at the top of the inheritance hierarchy.
* It also tries to export bilinear as an example of undefined tensor (None) input.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13650
Differential Revision: D12967026
Pulled By: wanchaol
fbshipit-source-id: 6aedccc7ce2a12fadd13d9e620c03e1260103a5a
Summary:
When tracing scripted functions, we used to only allow Tensor arguments.
This enables tracing script modules with List[Tensor] or Tuple[Tensor, Tensor] arguments (passing
tuples).
Fixes: #13566
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13597
Differential Revision: D12990464
Pulled By: soumith
fbshipit-source-id: fdce3afcb1e09f3c26d6ce834c01bf18d261f47c
Summary:
Upon calling wait(), save the forked thread and the current thread to a
task queue. An idle thread (currently there is only a single one) should
pick a ready task and run until there is nothing left in the task queue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13212
Differential Revision: D12884522
Pulled By: highker
fbshipit-source-id: b3942a0ee63c148e05f5f41bdc73007fa3c3368e
Summary:
This adds torch.jit.annotate for annotating the type of an intermediate.
This is Py2/3 compatible, e.g.:
```
from torch.jit import annotate
from typing import List

@torch.jit.script
def foo():
    a = annotate(List[int], [])
```
This is needed to output valid python programs from our IR. It removes
the need for the empty list constructors.
A future patch can add support to the C++ parser and Python 3,
via desugaring:
```
a : int = b
a = annotate(int, b)
```
But this functionality is not required for serialization so is not added in this patch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13752
Differential Revision: D12989885
Pulled By: zdevito
fbshipit-source-id: 161573a7352094543dc0d33a892f2a3b9103d847
Summary:
This is probably slow but it should make the traces more understandable and make debugging easier. Any suggestions for how to make it faster (i.e. make it so we don't have to traverse all of locals() and globals()) would be appreciated
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13441
Differential Revision: D12879763
Pulled By: jamesr66a
fbshipit-source-id: b84133dc2ef9ca6cfbfaf2e3f9106784cc42951e
Summary:
To support `_Reduction` in the JIT, this PR moves it out to a new file so that it goes through the paths for Python modules in the script compiler, and converts `F.ctc_loss` to weak script.
Depends on #13484 for saving rng state
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13401
Differential Revision: D12868501
Pulled By: driazati
fbshipit-source-id: 23cec0fb135744578c73e31ac825e238db495d27
Summary:
Support things like `foo[0] = bar` in script.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13486
Differential Revision: D12964550
Pulled By: suo
fbshipit-source-id: 3dda8ffd683d1b045787c65bfa0c7d43b0455658
Summary:
Previously, we did not distinguish between `a = b` (simple assignment),
and `a, = b` (tuple destructuring of a singleton tuple).
The second case would fail in the string frontend, and would not unpack
in the python frontend. This patch fixes both issues and also cleans up
the error reporting for unexpected expressions on the LHS.
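Minimal illustration of the two forms (a sketch, assuming current script behavior):
```python
import torch

@torch.jit.script
def g(x: torch.Tensor) -> torch.Tensor:
    a = (x,)   # simple assignment: a is a one-element tuple
    b, = (x,)  # singleton destructuring: b is the tensor itself
    return b

print(g(torch.ones(2)))
```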
Will likely conflict with #13486
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13656
Differential Revision: D12964566
Pulled By: zdevito
fbshipit-source-id: 992b19e5068aef59a78cd23cb0e59a9eeb7755d1
Summary:
Due to a logic bug, tracing is broken for custom ops. Unfortunately, there also weren't any tests for tracing custom ops.
The fix is a single line change of moving `pop(stack, std::get<Is>(arguments)...);` before `node = getTracedNode<Is...>(schema, arguments);`. Other changes are added tests and improved commenting/formatting.
Fixes https://github.com/pytorch/pytorch/issues/13564
CC fmassa
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13654
Differential Revision: D12952887
Pulled By: goldsborough
fbshipit-source-id: 87d256576f787c58e8d8f5c13a0fecd0ec62a602
Summary:
Correctly propagate the error msg from a python op to the JIT interpreter. In the interpreter we wrap the exception and re-throw it as a Runtime Exception. Potentially in a future diff we can throw the same type of python exception as was originally thrown.
Fix for https://github.com/pytorch/pytorch/issues/13560
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13624
Differential Revision: D12948756
Pulled By: eellison
fbshipit-source-id: 94cdf4c376143c5e40dcb9716aefb3c1e2d957db
Summary:
Add a builtin to refine the type of Optional[T] -> T. This is a short-term solution to unblock porting of the standard library.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13599
Reviewed By: driazati, wanchaol
Differential Revision: D12943193
Pulled By: eellison
fbshipit-source-id: 31c893a78d813313bbbc1d8212b5c04e403cfb4d
Summary:
- fixes weights-contiguous requirement for THCUNN Convolutions
- Add tests that conv backward pass works for non-contiguous weights
- fix RNN tests / error messages to be consistent and pass
- relax weight grad precision for fp16 for a particular test
- fix regression of CMAKE_PREFIX_PATH not passing through
- add missing skipIfNoLapack annotations where needed
Differential Revision: D12918456
Pulled By: soumith
fbshipit-source-id: 8642d36bffcc6f2957800d6afa1e10bef2a91d05
Summary:
This PR adds functions defined in `torch._C._nn` as builtin functions (including in-place variants). This allows for the conversion of more functions to weak script.
NB: many `torch.nn.functional` functions will have to be slightly rewritten to avoid early returns (as with `threshold` in this PR)
Converts these functions to weak script (a brief usage sketch follows the list):
* `threshold`
* `relu`
* `hardtanh`
* `relu6`
* `elu`
* `selu`
* `celu`
* `leaky_relu`
* `rrelu`
* `tanh`
* `sigmoid`
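A small usage sketch: once converted, these `torch.nn.functional` ops can be called directly from compiled code (a sketch, not the tests from the PR):
```python
import torch
import torch.nn.functional as F

@torch.jit.script
def activations(x: torch.Tensor) -> torch.Tensor:
    # Each of these resolves to a compiled (weak script / builtin) op in the graph.
    return F.relu6(F.hardtanh(F.elu(x)))

print(activations(torch.randn(4)))
```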
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13322
Differential Revision: D12852203
Pulled By: driazati
fbshipit-source-id: 220670df32cb1ff39d120bdc04aa1bd41209c809
Summary:
This PR does two things:
1. fix the bug in erase_number_type on node inputs
2. handle negative indices for dim-reduce in caffe2
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12888
Reviewed By: houseroad
Differential Revision: D12833486
Pulled By: wanchaol
fbshipit-source-id: c3ceb400d91f0173b73ad95e392b010c3c14db7d
Summary:
To convert `nn.functional.dropout`:
* `_VF` had to be exposed as a Python module, so this PR adds a module class that forwards to `torch._C._VariableFunctions` (see the sketch below)
* RNG state between calls in the tests needed to be made consistent
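A minimal sketch of the forwarding-module trick described above; the module and class names below are illustrative, not the exact ones in the patch:
```python
import sys
import types

import torch

class VFModuleSketch(types.ModuleType):
    """Forwards attribute lookups to torch._C._VariableFunctions."""

    def __init__(self, name: str):
        super().__init__(name)
        self.vf = torch._C._VariableFunctions

    def __getattr__(self, attr):
        return getattr(self.vf, attr)

# Register the forwarding module under an illustrative name.
sys.modules["_vf_sketch"] = VFModuleSketch("_vf_sketch")

import _vf_sketch
print(_vf_sketch.relu(torch.tensor([-1.0, 2.0])))
```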
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13484
Differential Revision: D12929622
Pulled By: driazati
fbshipit-source-id: 78b455db9c8856b94d2dda573fb7dc74d5784f56
Summary:
If there is no return type then the returns of the schema are not
checked against the returns in the graph, so this PR adds an error if
that case is detected.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13525
Differential Revision: D12929524
Pulled By: driazati
fbshipit-source-id: da562e979482393098830bbded26729a2499152a
Summary:
Built on top of #13108, so please review only the last commit.
This makes the graph fuser ignore input types (device/scalar type) when considering graphs for fusion, making it much more robust to shape-prop failures. Those properties are now checked at run time, as part of the kernel validation. This should enable graph fusions in `jit_premul` and `jit_multilayer` timelines in our benchmarks.
One regression is that I've disabled fusions of comparison ops (and `type_as`). That's because there's really no good way to ensure that those are really valid, and they are a source of bugs (I filed #13384).
cc ngimel mruberry zdevito zou3519
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13387
Differential Revision: D12888104
Pulled By: zou3519
fbshipit-source-id: c233ea599679c34ac70fb4d8b8497c60aad9e480
Summary:
There are a couple of subtle bugs in the way varargs is implemented:
1. it fails if you pass 0 arguments, because it doesn't handle the case when there are 0 varargs, and because Operator::matches was not updated.
2. it breaks all the named-based lookups on nodes. For instance node->get<int>(attr::value)
will return a single entry of the varargs if you look it up by name.
Furthermore, it complicates some assumptions about the positional arguments (e.g. they used to be
1-to-1 with node inputs, but with varargs they are not).
Because varargs are only being used for `format`, this diff instead
just allows `format` to take any value as input, regardless of type. It provides a way to set `is_vararg`
from the schema but does not restrict the type of the vararg arguments. This is in line with
the pre-existing behavior for `is_vararg`, so it doesn't require changes to `Operator::matches`.
This also keeps `format` in line with how `print` works, and is closer to the Python implementation of `format`. Note that the implementation
of `format` already worked with arbitrary IValues, so restricting it to strings was just making it more conservative than needed.
This also fixes the implementation of `format` so that it works when there are 0 arguments, or when there is text before or after a format specifier, cases where it previously would not print anything.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13492
Differential Revision: D12896989
Pulled By: zdevito
fbshipit-source-id: 21425bac8edc81709030a7408180494edea0a54b
Summary:
This PR changes the compiler to correctly emit in-place operators for augmented assignments (`+=` and friends); a brief example follows the list below.
- To better match the Python AST structure, add an `AugAssign` tree view and make `Assign` apply only to `=` assignments.
- Emit those `AugAssign` exprs in the compiler, dispatching to in-place aten ops for tensors and lowering to simple assignments for scalar types.
- In order to preserve (suspect) ONNX export semantics, add a pass to lower the in-place operators to out-of-place operators.
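A brief illustration of the new behavior (a sketch, not the tests from the PR):
```python
import torch

@torch.jit.script
def aug_assign(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    x += y   # emitted as the in-place op (add_) for tensors
    n = 0
    n += 1   # lowered to a simple assignment for scalar types
    return x + n
```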
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13364
Differential Revision: D12899734
Pulled By: suo
fbshipit-source-id: bec83be0062cb0235eb129aed78d6110a9e2c146
Summary:
As titled, this PR is a part of tasks to unblock exporting the standard library.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13336
Differential Revision: D12888912
Pulled By: wanchaol
fbshipit-source-id: 6213a17a75a593ae45999994fd9562f29b7d42df
Summary:
Add npair builtins to unblock the standard library. As with broadcasting lists, the only occurrences are with ints/floats.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13473
Differential Revision: D12890844
Pulled By: eellison
fbshipit-source-id: c360bb581d0f967cb51b858b6f964c300992d62a
Summary:
Arguments have an optional fixed-length list field which allows either a list or a single element that will be broadcast to a fixed length.
This PR exposes that as a denotable argument, mostly to cover the many instances in which this is used in the standard library. It appears in the standard library with ints and floats. Since this is not really a pattern we want to promote moving forward, I did not expose this for booleans or tensors.
We could consider making the optional static length part of the list type, instead of the argument, which would make some of this code much nicer.
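A hedged sketch of the annotation as it is spelled in current releases (`BroadcastingList2` in `torch._jit_internal`); the exact spelling in this PR may differ:
```python
from typing import List

import torch
from torch._jit_internal import BroadcastingList2

@torch.jit.script
def as_pair(kernel_size: BroadcastingList2[int]) -> List[int]:
    return [kernel_size[0], kernel_size[1]]

@torch.jit.script
def caller() -> List[int]:
    # Inside script, a single int is broadcast to the fixed-length list [3, 3].
    return as_pair(3)

print(caller())  # [3, 3]
```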
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13142
Differential Revision: D12876047
Pulled By: eellison
fbshipit-source-id: e7359d2a878b4627fc2b9ebc090f9849ee524693
Summary:
Adding assert statements to unblock standard library.
The same limitations that apply to the existing implementation of Exceptions apply to this as well
(no control-flow logic, and we ignore the specific Exception thrown).
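A minimal example of the kind of assert this enables in script:
```python
import torch

@torch.jit.script
def checked_add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # The assert compiles to a raise; the specific exception type is ignored.
    assert x.size(0) == y.size(0), "batch sizes must match"
    return x + y
```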
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13408
Reviewed By: driazati
Differential Revision: D12876451
Pulled By: eellison
fbshipit-source-id: 767ba5a50ba7c5dd6a857ed4845ac076a81cf305
Summary:
Fixes #13376; `len(tensor)` was converting the tensor to a 1-element list and returning 1 every time.
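After the fix, `len` on a tensor behaves like `size(0)`, e.g.:
```python
import torch

@torch.jit.script
def first_dim(x: torch.Tensor) -> int:
    return len(x)  # now x.size(0), no longer always 1

assert first_dim(torch.zeros(5, 3)) == 5
```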
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13398
Differential Revision: D12867630
Pulled By: driazati
fbshipit-source-id: 28f3580a072d763df0980b3149c49d1894842ec9
Summary:
This PR adds `aten::format` as a builtin op for strings with the basic formatting semantics of Python.
It also adds varargs to the schema parser (with the limitation that the varargs item must be the last argument, i.e. `(*args, **kwargs)` is not supported) and to the compiler.
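A small example of the formatting this enables in script (basic `{}` substitution only):
```python
import torch

@torch.jit.script
def describe(rows: int, name: str) -> str:
    return "{} rows in {}".format(rows, name)

print(describe(3, "batch"))  # "3 rows in batch"
```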
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13157
Differential Revision: D12832537
Pulled By: driazati
fbshipit-source-id: 17c1a5615bb286c648fc9e38f2ebe501b064c732
Summary:
This is used commonly in `nn` functions. This PR adds it as a weak
module (and also alters the conversion of weak modules to strong modules
to accept ordinary `object`s)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13057
Differential Revision: D10846618
Pulled By: driazati
fbshipit-source-id: 028b9f852d40e2e53ee85b93282c98cef8cd336b
Summary:
This is a first step towards adding exceptions. We need minimal support in order to begin converting the torch library to weak script mode (which is the main goal here).
Some limitations (documented in the tests and compiler):
1. Cannot assign exceptions to variables
2. Any name after `raise` is treated as a valid Exception
3. No control-flow analysis yet. Below, `a` will be undefined:
```python
if True:
    a = 1
else:
    raise Exception("Hi")
return a
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12789
Differential Revision: D12848936
Pulled By: eellison
fbshipit-source-id: 1f60ceef2381040486123ec797e97d65b074862d
Summary:
Depends on #13072
Adds support for tuples as variables instead of just as literals. Before, tuples would give the error `python value of type 'tuple' cannot be used as a value`. This PR adds a flag on `SugaredValue` to determine if a value is a tuple or not.
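A minimal sketch of what now works (a tuple bound to a name rather than used only as a literal):
```python
from typing import Tuple

import torch

@torch.jit.script
def tuple_as_value(x: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
    pair = (x, x + 1)  # a tuple assigned to a variable
    return pair
```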
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13086
Differential Revision: D10846632
Pulled By: driazati
fbshipit-source-id: 7b5d6ae9426ca3dd476fee3f929357d7b180faa7
Summary:
This PR special cases tensor.storage_offset to avoid dispatches in the
common case. tensor.storage_offset is important for torch.as_strided
performance, because as_strided(sizes, strides) shares an implementation
with as_strided(sizes, strides, storage_offset) and it might not be the
best if there were two separate implementations (including backward
implementations).
This PR reduces times on a tensor.storage_offset
microbenchmark from 22ns to 2ns (these numbers are pretty stable). For
a torch.as_strided benchmark, this PR reduces times from 1042ns to
928ns, roughly a 100ns improvement, but this number is noisy and goes up and
down.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13267
Reviewed By: ezyang
Differential Revision: D12829828
Pulled By: zou3519
fbshipit-source-id: df907731e2398ce2baf1c8b1860a561ccc456f78
Summary:
Looks like a print() snuck in by accident with a recent PR and it's printing a lot of spew when the tests are run.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13280
Differential Revision: D12833449
Pulled By: michaelsuo
fbshipit-source-id: 5b50fd4b03bb73e5ca44cabdc99609c10017ff55
Summary:
Adding string equality comparison and concatenation. Both are used in the standard library.
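For illustration, both operations in a scripted function:
```python
import torch

@torch.jit.script
def greet(name: str) -> str:
    if name == "world":           # string equality comparison
        return "hello, " + name   # string concatenation
    return "hi, " + name

print(greet("world"))
```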
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12992
Differential Revision: D10513681
Pulled By: eellison
fbshipit-source-id: 1f845ef50be7850fdd3366951b20dc2a805c21fd
Summary:
Future is now an IValue. prim::Wait is now replaced by aten::wait.
This PR is built on top of #12925
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12976
Differential Revision: D10861483
Pulled By: highker
fbshipit-source-id: 9e17926a625bc502fb12335ef9ce819f25776be7
Summary:
As titled, this PR is a part of tasks to unblock exporting the standard library
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13161
Differential Revision: D10866927
Pulled By: wanchaol
fbshipit-source-id: 50038dbe6840b097b98cbed9d46a189a64e82302
Summary:
This commit is a minimal initial pass at adding in-place and _out variants to the JIT.
It changes gen_jit_dispatch.py to add bindings for these operators, and it also
supplements the FunctionSchema with alias information for these operators and for
viewing operators.
Tests are very minimal and will need to be improved in future commits.
Notes:
* Custom operator tests needed to be changed since _out variants add overloads, which
the custom operator pipeline does not handle when called from python. This commit
registers special test ops in the _test namespace for this purpose.
* Extends the schema parser to parse alias annotations more robustly.
* Extends FunctionSchema with `writes()`, a set of alias set names that the op will write to,
and `annotatedType()`, which returns AnnotatedType objects containing the alias_set
information that was parsed from the schema.
* Disables all optimizations in graph executor when a mutable operator is found. This
is something that will be improved in the future but is necessary for correctness now.
* Adds annotate_ops to gen_jit_dispatch which adds aliasing information to all of the
aten ops.
* Adds AnnotatedType to the type hierarchy which is used to mark List and Tensor types
with their alias_set. These types only appear in schema when you call annotatedType
and are erased from types in normal use.
* Extends jit::Type with .containedTypes() and .withContained(new_types). The first returns all types contained
within the type (e.g. T for T[], or {T,L} for a tuple (T, L)). The second constructs a new
version of the same type, replacing the contained types with new_types. This simplifies
a lot of logic for recursively cleaning up types.
* Refactor List[T] into a common part that is shared with Annotated[T] and can be shared
with Optional[T] and Future[T] when they are merged.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13093
Differential Revision: D10848176
Pulled By: zdevito
fbshipit-source-id: d057f23eeb99cde8881129b42d3f151ed5e7655d
Summary:
This PR adds optional type to ATen native, autograd, the JIT schema, and the Python arg parser; closes #9513. It allows us to use optional default values (including None) for function signatures and implementations like clamp, etc., and also lets us remove the python_default_init hack (a small example follows below).
Follow-up:
remove python_default_init completely.
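For example, `clamp` can now be called in script with one of its optional bounds simply omitted (defaulting to `None`):
```python
import torch

@torch.jit.script
def clamp_upper(x: torch.Tensor) -> torch.Tensor:
    # `min` is Optional with default None, so only `max` needs to be given.
    return torch.clamp(x, max=1.0)

print(clamp_upper(torch.tensor([0.5, 2.0])))  # tensor([0.5000, 1.0000])
```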
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12582
Differential Revision: D10417423
Pulled By: wanchaol
fbshipit-source-id: 1c80f0727bb528188b47c595629e2996be269b89
Summary:
Adds the attribute name to the error message and fixes the corresponding
test to actually run
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13072
Differential Revision: D10846622
Pulled By: driazati
fbshipit-source-id: a7eee6320c28140c4937ede3d4e4685cfce08d84
Summary:
`warnings.warn` is used commonly throughout `nn.functional`, so this adds
support for it by forwarding its arguments to `print`.
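A small sketch of a scripted function using it:
```python
import warnings

import torch

@torch.jit.script
def safe_relu(x: torch.Tensor) -> torch.Tensor:
    if x.numel() == 0:
        warnings.warn("empty input")  # in script this is forwarded to print
    return torch.relu(x)
```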
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12964
Differential Revision: D10559427
Pulled By: driazati
fbshipit-source-id: 5b591f6f446c906418f9fc7730c17e301f263d9b
Summary:
Adds support for weak script modules that get compiled to `ScriptModule`s once added as a submodule of a `ScriptModule`:
```python
@weak_module
class Test(torch.nn.Module):
    ...
    @weak_script_method
    def forward(self, x):
        ...
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12682
Differential Revision: D10458626
Pulled By: driazati
fbshipit-source-id: 10ae23cb83cdafc4646cee58f399e14b2e60acd4
Summary:
expecttest and test_expecttest are the implementation and tests
for this functionality. I wired it up to the --accept flag,
but there's also a new environment variable EXPECTTEST_ACCEPT
which may be more convenient to trigger. Haven't tested if this
works in fbcode.
There may be a few expect tests which will benefit from inline
treatment, but I just did one to show it works.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12825
Reviewed By: teng-li
Differential Revision: D10448630
Pulled By: ezyang
fbshipit-source-id: 3d339f82e2d00891309620a60e13039fa1ed8b46
Summary:
* Moves the `weak_script` annotation to `torch/_jit_internal.py` to resolve a dependency issue between `torch.jit` and `torch.nn`
* Adds `torch._jit.weak_script` to `tanhshrink` and `softsign`; their tests now pass instead of giving an `unknown builtin op` error
* Blacklists converted `torch.nn.functional` functions from appearing in the builtin op list if they don't actually have corresponding `aten` ops
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12723
Differential Revision: D10452986
Pulled By: driazati
fbshipit-source-id: c7842bc2d3ba0aaf7ca6e1e228523dbed3d63c36
Summary:
Add strings to our set of built-in types for annotations. This is used in the functional library.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12731
Differential Revision: D10453153
Pulled By: eellison
fbshipit-source-id: f54177c0c529f2e09f7ff380ddb476c3545ba5b0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12777
Enables JIT tests in FBCode. Changes pybind11 code to avoid mixing py::args with positionally matched arguments, because old versions of pybind11 leak memory in this case.
Reviewed By: jamesr66a
Differential Revision: D10419708
fbshipit-source-id: 74bc466001b5d363132d1af32e96841b38601827
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12794
common.py is used in the base module for almost all tests in test/. The
name of this file is so common that it can easily conflict with other dependencies
if they happen to have another common.py in the base module. Rename the file to
avoid the conflict.
Reviewed By: orionr
Differential Revision: D10438204
fbshipit-source-id: 6a996c14980722330be0a9fd3a54c20af4b3d380
Summary:
I found a bug in norm() and fixed it (and added tests to make sure it's fixed).
Here is how to reproduce it:
```python
import torch
x = torch.FloatTensor([[10, 12, 13], [4, 0, 12]])
print(torch.norm(x, -40, dim=0, keepdim=True)) #output is tensor([[ 4.0000, 0.0000, 11.9853]])
print(torch.norm(x, float('-inf'), dim=0, keepdim=True)) #output is tensor([[1., 1., 1.]]) which is wrong!
from numpy.linalg import norm as np_norm
x = x.numpy()
print(np_norm(x, ord=-40, axis=0)) #output is array([[4., 0., 11.985261]])
print(np_norm(x, ord=float('-inf'), axis=0)) #output is array([[4., 0., 12.0]])
```
It's related to [#6817](https://github.com/pytorch/pytorch/issues/6817) and [#6969](https://github.com/pytorch/pytorch/pull/6969).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12722
Differential Revision: D10427687
Pulled By: soumith
fbshipit-source-id: 936a7491d1e2625410513ee9c39f8c910e8e6803
Summary:
`where` is declared as:
```
where(Tensor condition, Tensor self, Tensor other)
```
Previously the compiler assumed that self must be the first argument.
But this is not true in practice for `where` and for a few other exceptions.
This changes the compiler to take an explicit self argument which gets matched
to the `self` that appears in the schema.
Note that this requires renaming a variant of pow, which referred to
an exponent Tensor as `self` because otherwise that would cause `t^3`
to match against `t` being the exponent.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12385
Differential Revision: D10364658
Pulled By: zdevito
fbshipit-source-id: 39e030c6912dd19b4b0b9e35fcbabc167b4cc255
Summary:
Add dtype argument to softmax/log_softmax functions.
Computing softmax in fp32 precision is necessary for mixed-precision training, and converting the output of the previous layer into fp32 and then reading it as fp32 in softmax is expensive, both memory- and perf-wise; this PR allows one to avoid it.
For most input data/dtype combinations, input data is converted to dtype and then softmax is computed. If input data is half type and dtype is fp32, kernels with the corresponding template arguments are called.
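For example, with a half-precision input the softmax can be computed in fp32 without an explicit cast of the input first:
```python
import torch
import torch.nn.functional as F

x = torch.randn(4, 8).half()
# Softmax is computed in fp32; the result is fp32 as well.
probs = F.softmax(x, dim=-1, dtype=torch.float32)
print(probs.dtype)  # torch.float32
```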
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11719
Reviewed By: ezyang
Differential Revision: D10175514
Pulled By: zou3519
fbshipit-source-id: 06d285af91a0b659932236d41ad63b787eeed243
Summary:
Implement contiguous as `aten::contiguous` so it can be recorded during tracing. This was causing issues with both the trace checker as well as when a `contiguous()`-ed tensor was used downstream in a view that expected certain strides
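A quick way to see the effect (a sketch; with this change `aten::contiguous` should show up in the traced graph):
```python
import torch

def f(x):
    return x.t().contiguous().view(-1)

traced = torch.jit.trace(f, torch.randn(3, 4))
print(traced.graph)  # the graph now records an aten::contiguous node
```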
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12541
Differential Revision: D10304028
Pulled By: jamesr66a
fbshipit-source-id: dc4c878771d052f5a0e9674f610fdec3c6782c41
Summary:
If any inputs require_grad then the graph executor does differential subgraph slicing. The existing algorithm combines adjacent differentiable Node*.
There are two major motivations. The first is improving fusion opportunities: the graph fusion pass runs after differential subgraph slicing. This means that only nodes that are a part of the same differential subgraph may be considered for fusion. If something like the following happens,
```
y = f(x)
k = not_differentiable_op(m)
z = g(y)
```
and f and g are both fusible and differentiable operations, then they will be inserted into different differential subgraphs and not fused together.
The second is to enable JIT optimizations on backward passes for things like an (automatically) unrolled LSTM. Right now, in an unrolled LSTM, we see something like the following:
```
lstm_cell()
non_differentiable_list_op()
lstm_cell()
non_differentiable_list_op()
lstm_cell()
non_differentiable_list_op()
```
Each lstm_cell itself is differentiable and gets put into a separate differential subgraph. During the backwards pass, each prim::DifferentiableSubgraph has its own graph executor: these graph executors cannot talk to each other. It is better if we combined all of the lstm_cells (where applicable) into one differential subgraph so their backward passes are combined into one graph executor that can perform better optimizations than several separate graph executors.
Think about the computation graph as a DAG where edges are data dependencies and vertices are operations (the nodes). Each vertex is either black or red; a vertex is colored black if it is differentiable and red otherwise. The goal is to contract edges (merge nodes) to have the fewest black vertices remaining such that the graph is still a DAG.
The algorithm is the following:
- Take the Graph& and create a shadow "DynamicDAG" object to wrap Node* and edges. Each Vertex holds multiple Node* (but starts out holding one Node*) and each edge is a data dependency.
- Greedily contract vertices in the DynamicDAG if they are "differentiable". This operation is unrelated to the Graph&.
- A Vertex is "differentiable" if all the nodes it holds is differentiable.
- When contracting vertices, combine their Node* contents.
- The DynamicDAG keeps its vertices in topological order and complains if the contraction is invalid so everything is good.
- Take the DynamicDAG: reorder the nodes in the Graph& to match the topological order in the DynamicDAG.
- Finally, go through each Vertex in the DynamicDAG: if it contains multiple Node* then merge all of them into a prim::DifferentiableGraph.
The DynamicDAG is based off of the dynamic top sort algorithm in [this paper](https://www.doc.ic.ac.uk/~phjk/Publications/DynamicTopoSortAlg-JEA-07.pdf) by Pearce and Kelly.
Each contractEdge(producer, consumer) call is `O(|AR| log |AR| * min(|out_edges(producer)|, |in_edges(consumer)|))`, where `AR` is the "affected region" (defined as the set of nodes that, in topological order, are between producer and consumer). By only considering contractions such that `|ord(producer) - ord(consumer)| < threshold1` and `|out_edges(producer)| < threshold2`, we can make each contractEdge(producer, consumer) call take constant time. The resulting algorithm is linear in the number of nodes.
Added a lot of small test cases.
Looking for suggestions on the following:
- what big computation graphs should I run this on to test how fast or slow it is?
- what things other than correctness should I be thinking about when I test this?
cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12175
Differential Revision: D10302564
Pulled By: zou3519
fbshipit-source-id: 8a94d130d82f8a1713cc28483afef9a72d83d61a
Summary:
There are some action-at-a-distance issues, and not having this disables quantization in C2 for prod use cases.
ref T34831022
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12538
Differential Revision: D10302931
Pulled By: jamesr66a
fbshipit-source-id: 700dc8c5c4297e942171992266ffb67b815be754
Summary:
The value_info proto field was being processed in BuildGraph, but control-flow blocks used buildBlocks instead. This PR moves that step to BuildBlock.
I removed DecoderBase because it was making the code confusing and we never needed it in the first place.
closes #12319
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12351
Differential Revision: D10212411
Pulled By: li-roy
fbshipit-source-id: 47f289a462a1ab7391ff57368185401673980233
Summary:
Add a pass to move all constants to the beginning of the graph, and deduplicate.
This extends https://github.com/pytorch/pytorch/pull/10231 to also handle constants introduced in inlining, constant propagation, etc.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12222
Reviewed By: driazati
Differential Revision: D10201616
Pulled By: eellison
fbshipit-source-id: bc9c5be26868c8b5414257a0d4462de025aeb9bd
Summary:
This PR is the start of weak script mode for functions.
Weak scripts allow you to compile a graph from Python code at runtime by annotating a function with `torch.jit.weak_script`, for use in the JIT without affecting eager execution. Scripts are compiled lazily on the first call in a graph to avoid long Python startup times.
apaszke zdevito ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11963
Differential Revision: D10183451
Pulled By: driazati
fbshipit-source-id: 128750994d5eb148a984f8aba4113525c3e248c8
Summary:
* Replaces `prim::PythonOp` with the name of the function being called
* Delays printing values used in `prim::Return` nodes until the return
node itself, if that is the only place the value is used, in order to remove some
useless assigns
zdevito apaszke ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12179
Differential Revision: D10132661
Pulled By: driazati
fbshipit-source-id: cbc4ac34137ed5872049082e25d19eb1ebc71208
Summary:
If a PythonOp throws an error, it raises an exception to the interpreter and also releases the GIL, which causes [pybind to segfault](https://github.com/potassco/clingo/issues/42).
This fix catches pybind errors while the GIL is still held and throws a `python_error` to re-capture the GIL.
Fixes #12118
apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12243
Differential Revision: D10182787
Pulled By: driazati
fbshipit-source-id: 719d4a7c3294af201e061cf7141bec3ca0fb1f04
Summary:
This PR adds a bool type to `IValue` and puts it into place.
* changes conds for `prim::If` and `prim::Loop` to use `bool` type
* changes operators that take `bool`s to match their native ops
* fixes ambiguous `aten` ops `aten::std` and `aten::var`
* fixes tests in `test_jit.py TestJitGenerated`
```
'test_std_dim',
'test_std_dim_1d',
'test_std_dim_1d_neg0',
'test_std_dim_neg0',
'test_var_dim',
'test_var_dim_1d',
'test_var_dim_1d_neg0',
'test_var_dim_neg0'
```
* adds `prim::BoolToTensor` and `prim::TensorToBool`
apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11834
Differential Revision: D9928570
Pulled By: driazati
fbshipit-source-id: 373c53df2f1a8ffa9e33d9a517002fbeef25f3eb
Summary:
Allow trailing commas in assignment statements or tuples, which also allows single-element tuples.
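A minimal sketch of what now parses:
```python
from typing import Tuple

import torch

@torch.jit.script
def singleton(x: torch.Tensor) -> Tuple[torch.Tensor]:
    y = x + 1,  # trailing comma builds a single-element tuple
    return y
```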
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11723
Differential Revision: D10052162
Pulled By: eellison
fbshipit-source-id: 344d908a3ad942a23ebd9f341794bc9734226aa8
Summary:
In current upstream float scalars are always written into kernels with:
`out << std::scientific << v << "f";`
When the floats are special values like NaN, +inf, or -inf, this produces nonsense that causes compilation to fail. This fix updates the conversion of float scalars to device-specific special values. The appropriate macros are added to the CPU and CUDA resource strings. Note that a NAN macro was not necessary on the CPU since math.h defines NAN.
To verify this fix I updated the test_clamp_fusion test in test_jit.py. I wanted to test -inf, too, but -inf is not currently accepted by the interpreter.
Edit:
Forgot to mention, this partially addresses issue #12067.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12070
Reviewed By: ezyang
Differential Revision: D10044704
Pulled By: soumith
fbshipit-source-id: 8f4a930862d66a7d37d985e3f6a6fb724579e74c
Summary:
This PR replaces the use of `std::FILE` with `istream`/`ostream` for JIT serialization.
It uses this mechanism to add the possibility to serialize to/from binary buffers, in addition to files, both in `libtorch` and from Python.
`getExportImportCopy` in `test_jit.py` has been updated so that both file and buffer codepaths are exercised during tests.
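With this in place, a module can round-trip through an in-memory buffer (shown with the current `torch.jit.save`/`torch.jit.load` Python API):
```python
import io

import torch

class AddOne(torch.nn.Module):
    def forward(self, x):
        return x + 1

m = torch.jit.script(AddOne())

buf = io.BytesIO()
torch.jit.save(m, buf)   # serialize to a binary buffer instead of a file
buf.seek(0)
m2 = torch.jit.load(buf)
print(m2(torch.zeros(2)))
```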
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11932
Differential Revision: D10084303
Pulled By: apaszke
fbshipit-source-id: b850801b3932922fa1dbac6fdaed5063d58bc20d
Summary:
This PR implements the design that we discussed. Changes:
- Added a World token IValue and type. The IValue is basically a dummy struct for now, in the future we may extend it (say, add thread-local state).
- Effectful ops explicitly declare they are mutable by having World tokens as inputs and outputs in their schema.
- Purely functional ops that use mutable values will get "fenced" and the world token will be threaded through the fences
- AnnotateEffects pass which wires up all the world tokens together.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10700
Reviewed By: eellison
Differential Revision: D9547881
Pulled By: michaelsuo
fbshipit-source-id: ebbd786c31f15bf45e2ddb0c188438ff2f5f3c88
Summary:
We generate specialized list operations for int, float, and Tensor lists so that small lists of integers like the arguments to conv do not involve tons of boxing code.
This PR adds a fallback GenericList for List types that contain any other type. It does so by adding type variables to `jit::Type`, and machinery for matching/replacing the type variables during `tryMatchSchema` and operator lookup.
It also modifies the builtin list ops to include a fallback that works on a GenericList object that simply holds IValues. This is distinguished from IValue's tuple type so that conversion to/from Python still happens losslessly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12040
Differential Revision: D10037098
Pulled By: zdevito
fbshipit-source-id: 0c5f2864d12e7d33554bf34cc29e5fb700dde150
Summary:
This PR does a few things:
Previously test_jit.py only tested autograd on backward graphs.
This is because we borrow from test_autograd and construct graphs with a small
number of nodes. Because the number of nodes is small (typically 1-2), those graphs
do not end up containing autodiff subgraphs, so autodiff never gets tested.
This PR enables autodiff testing by doing the following:
- added disableDebugAutodiffSubgraphInlining fn to graph_executor to disable
autodiff subgraph inlining.
- (implementation) added autodiffSubgraphNodeThreshold and autodiffSubgraphInlineThreshold.
These are set to their default values (2, 5) but disableDebugAutodiffSubgraphInlining()
sets both to 1, disabling subgraph inlining and allowing 1-node autodiff subgraphs.
- The relevant backward jit tests disable autodiff subgraph inlining so they
will test the autodiff versions of the operators instead of autograd whenever
an autodiff variant exists.
- We don't run the tests that do inline autodiff subgraphs anymore.
This has no impact on testing correctness because the assumption is
that autograd functions are correct and are tested in test_autograd.py
This allows the graph fuser to work better because a lot of these ops were previously not autodiff-compatible but fusible. As a more concrete example, LSTM backward contains a lot of tensor-scalar operations; these autodiff formulas help its double-backward pass.
Included:
- arithmetic overloads
- abs, acos, asin, atan, ceil, cos, cosh, exp, expm1, floor, fmod, frac, log, log10, log1p, log2, reciprocal, remainder, round, sin, sinh, tan, trunc, rsqrt
TestJitGenerated tests autodiff for all of the added operations.
cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11832
Differential Revision: D10031256
Pulled By: zou3519
fbshipit-source-id: 9daf9900a5ad187743609cd0fbbd10b15411ad93
Summary:
Because we emit a lot of them in our symbolic AD. This brings down the backward time of an LSTM I'm testing from 14.2ms to 12.5ms (a 15% improvement).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11801
Differential Revision: D9916815
Pulled By: apaszke
fbshipit-source-id: 2d9cb886c424ccd43b9f996aad89950d3bddf494
Summary:
Or even taking them as inputs. This prevents optimizations from happening
either inside the differentiable subgraphs or in the surrounding graph.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11809
Differential Revision: D10009680
Pulled By: apaszke
fbshipit-source-id: face638566228e470a6deec48dc2aa3a1cce26d4
Summary:
Previously, aten::view returned a Dynamic type when attr::size was a prim::ListConstruct.
See [this for a repro](https://gist.github.com/zou3519/cbd610472ba3369f556fa612a7d93b28).
This prevented a pre-multiplied LSTM input graph from being fusible (aten::view is necessary
to do the premultiplication).
If aten::view is passed an output of a prim::ListConstruct node, then shape prop should
be able to figure out its TensorType because we statically know the number of inputs to
prim::ListConstruct. This PR implements that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11877
Differential Revision: D9972356
Pulled By: zou3519
fbshipit-source-id: cb87786f6e7f222d4b8f07d8f2a9de34859cb6a5
Summary:
This is pretty important because a common situation of passing LSTM hidden states as a tuple completely trashes performance of a network.
Cleans up all our propagation/undef specialization passes, at a cost of increased complexity of `ArgumentSpec` and `GraphExecutor`. An alternative would be to simply flatten all tuple inputs to a graph ahead of time, but that might just end up being confusing in the future (you never know if you're working with a graph that can have tuple or not).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11863
Differential Revision: D9992814
Pulled By: apaszke
fbshipit-source-id: 0a565a3b23e32f8fa72c0534e07c1ce6187739fc
Summary:
- Disable addmm fusion. The reason for this is explained in the comment.
- Tiny change in `stack.h` that lets us avoid constructing an unnecessary temporary `IValue` on the (C++) stack (it will only get created on the interpreter stack directly).
- Fixed a correctness issue in requires grad propagation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11654
Reviewed By: colesbury
Differential Revision: D9813739
Pulled By: apaszke
fbshipit-source-id: 23e83bc8605802f39bfecf447efad9239b9421c3
Summary:
Spuriously added in #11261.
I had a PR to catch these automatically (#11279), but it had some issues
passing on some CI environments but not others (e.g. for
`test_nn_group_norm`). Any ideas?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11916
Differential Revision: D9992065
Pulled By: driazati
fbshipit-source-id: 05cfa8ed9af939e8ffd5827847ee7bfe0be799b2