Summary:
Also, make `torch.isclose` work with integral tensors and refactor `_check_trace` a bit.
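For illustration (not from the PR), the integral-tensor case now behaves like:
```
import torch

# isclose now accepts integral tensors as well as floating-point ones;
# the comparison is still elementwise against rtol/atol.
a = torch.tensor([1, 2, 3])
b = torch.tensor([1, 2, 4])
torch.isclose(a, b)  # -> [True, True, False] elementwise
```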
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11246
Differential Revision: D9652701
Pulled By: apaszke
fbshipit-source-id: fb0bdbfd1952e45e153541e4d471b423a5659f25
Summary:
there are multiple views of the tensor live.
Also adds recording for copy_, because this is the critical in-place
op where these views will cause LHS indexing to fail.
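A hypothetical sketch (not from the PR) of the LHS-indexing pattern this recording is for:
```
import torch

def fill_first_row(x, y):
    # x[0] is a view produced by select(); the assignment is a copy_
    # into that view, so recording copy_ keeps the trace faithful.
    x[0] = y
    return x
```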
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11129
Differential Revision: D9600195
Pulled By: zdevito
fbshipit-source-id: bfd8f5befa47377e36d704dbdb11023c608fe9a3
Summary:
Adds short-circuit evaluation for AND and OR. The second expression of an AND or OR gets lifted into an if branch, which is conditionally evaluated.
BatchOps was using the expression `dims = dims1 or dims2`, where dims is often an empty tensor. This now throws an error, because dims1 gets cast to a boolean, and you can't convert an empty tensor to a scalar. It now matches the behavior of PyTorch in Python.
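A minimal sketch of that failure mode (illustrative, assuming current PyTorch semantics for bool() on an empty tensor):
```
import torch

dims1 = torch.tensor([])   # empty tensor
dims2 = torch.tensor([0])
# `or` first coerces dims1 to bool; an empty tensor cannot be
# converted to a scalar, so this raises a RuntimeError, matching eager.
dims = dims1 or dims2
```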
One thing that came up: if the second expression of an and/or in Python gets returned, it does not get coerced to a boolean.
`tensor == (False or tensor)`
`tensor == (True and tensor)`
We do not currently support this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11116
Differential Revision: D9618168
Pulled By: eellison
fbshipit-source-id: 93b202be2f222d41f85d38d9c95f04d1749e8343
Summary:
* improve docker packages (install OpenBLAS to have at-compile-time LAPACK functionality w/ optimizations for both Intel and AMD CPUs)
* integrate rocFFT (i.e., enable Fourier functionality)
* fix bugs in ROCm caused by wrong warp size
* enable more test sets, skip the tests that don't work on ROCm yet
* don't disable asserts any longer in hipification
* small improvements
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10893
Differential Revision: D9615053
Pulled By: ezyang
fbshipit-source-id: 864b4d27bf089421f7dfd8065e5017f9ea2f7b3b
Summary:
Previously when we had a slicing expression like `x[0:5, 0]`, where the sliced tensor was of size `5` in dimension 0, we would skip dispatching the actual slice call as an optimization.
This caused incorrect behavior under tracing, as we would not record the slice op and thus if we encountered an input with a different shape while running the trace, we would get incorrect results.
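A minimal sketch of the problematic pattern (illustrative, not from the PR):
```
import torch

def foo(x):
    # for a 5-row example input the slice spans all of dim 0, so the old
    # optimization skipped dispatching aten::slice and the op never
    # entered the trace
    return x[0:5, 0]
```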
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11156
Differential Revision: D9622252
Pulled By: jamesr66a
fbshipit-source-id: 822f2e8f01504e131f53bd9ef51c171c7913a7cc
Summary:
This makes it so `detach` and `detach_` are traceable and also adds a pass to erase them before ONNX export
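A minimal sketch (illustrative) of a function this makes traceable:
```
import torch

def foo(x):
    # aten::detach is now recorded when this is traced; the ONNX export
    # path erases those nodes in a separate pass before emitting the model
    return x.detach() + 1
```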
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11038
Differential Revision: D9588038
Pulled By: jamesr66a
fbshipit-source-id: 263dd3147e24fcb0c716743f37fdb9f84c0015e7
Summary:
**Review last commit only.** Stacked on top of #10949.
This commit fixes a number of issues related to caching the
differentiability status of graphs inside graph executors,
and changes the rules for optimization of differentiable subgraphs.
Previously every one of those was instantiated as a separate graph
executor, but now they are simply heavier-optimized graph regions,
and graph executors are only instantiated for their backward.
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10977
Differential Revision: D9600626
Pulled By: apaszke
fbshipit-source-id: dad09a0f586e396afbd5406319c1cd54fbb8a3d3
Summary:
Operators like aten::chunk used to return a number of tensors, but
now return a list. To make it easier to do shape prop through
aten::chunk and fuse it, I've also introduced prim::ConstantChunk,
which behaves like the previous implementation (has a variable length
output list).
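For illustration, the Python-level behavior is unchanged; only the IR representation of the outputs differs:
```
import torch

x = torch.randn(6, 4)
# chunk's outputs are now modeled in the IR as a single list value
# (unpacked here) rather than a variable number of tensor outputs.
a, b, c = torch.chunk(x, 3, dim=0)
```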
The downside of this PR is that the introduction of more lists to the IR causes the LSTM and MiLSTM graphs to be considered non-differentiable by the graph executor. I verified that they are still optimized correctly, and my next patch (which changes how the specialization/differentiation works) will restore those.
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10949
Reviewed By: zdevito
Differential Revision: D9556823
Pulled By: apaszke
fbshipit-source-id: 33e63b17fc7247cac6cfc05eb7eb9bf069b499ee
Summary:
This was done because it is surprising for a decorator to run a function
rather than wrap it, and it did not simplify the syntax for tracing modules.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11069
Reviewed By: jamesr66a
Differential Revision: D9583192
Pulled By: zdevito
fbshipit-source-id: b914b7ab4c73c255086465a6576eef3a22de1e13
Summary:
Also adds a single-grad whitelist to the JIT test.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10782
Reviewed By: ezyang
Differential Revision: D9583378
Pulled By: erikbrinkman
fbshipit-source-id: 069e5ae68ea7f3524dec39cf1d5fe9cd53941944
Summary:
Things like torch.zeros now appear in traces rather than as constants.
To continue to support our current level of ONNX export, we run
constant prop to turn these back into constants where possible before
export.
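A minimal sketch (illustrative, not from the PR):
```
import torch

def foo(x):
    # when traced, torch.zeros is recorded as an op in the graph rather
    # than baked in as a constant; constant prop folds it back into a
    # constant before ONNX export where possible
    return x + torch.zeros(3)
```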
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10935
Differential Revision: D9527427
Pulled By: zdevito
fbshipit-source-id: 552a8bcc01b911251dab7d7026faafdd7a3c758a
Summary:
TODO: integrate into torch.onnx.export -- separate PR
*Problem:* We have a facility to trace PyTorch operations from Python code, but there are several failure modes where the trace is not representative of the actual underlying computation:
* The tracer encountered dynamic control flow
* Some computation escaped the tracer, and appeared as a Constant tensor node in the graph
* Some stateful function was traced, e.g. someone did an optimization in Python by memoizing function outputs
*Objective*: In an ideal world, this whole process would be automated and the user could trust that the system will magically capture the intended semantics from the program. Realistically speaking, we will likely have to settle for a human-in-the-loop error reporting system, allowing the user to identify problems and modify the source code to allow for tracing.
*Stage 1* (this PR): Output-level checking & graph diff. torch.jit.trace gains a kwarg 'check_inputs', which is a list of tuples of input arguments. We will iterate through the list and trace the function again for each set of check inputs. We'll also interpret the original trace with these inputs and compare output values and graphs, printing a diff of the graph if there is a difference.
Examples:
```
@torch.jit.trace(torch.rand(3, 4), check_inputs=[(torch.rand(4, 5),)])
def foo(x):
    y = torch.arange(0, x.shape[0]).float()
    return x + y.unsqueeze(1)
```
```
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Graphs differed across invocations!
Graph diff:
graph(%0 : Dynamic) {
- %1 : Dynamic = prim::Constant[value= 0 1 2 [ CPULongType{3} ]]()
? ^
+ %1 : Dynamic = prim::Constant[value= 0 1 2 3 [ CPULongType{4} ]]()
? +++ ^
%2 : int = prim::Constant[value=0]()
%3 : Dynamic = aten::_cast_Float(%1, %2)
%4 : int = prim::Constant[value=1]()
%5 : Dynamic = aten::unsqueeze(%3, %4)
%6 : int = prim::Constant[value=1]()
%7 : Dynamic = aten::add(%0, %5, %6)
return (%7);
}
Node diff:
- %1 : Dynamic = prim::Constant[value= 0 1 2 [ CPULongType{3} ]]()
? ^
+ %1 : Dynamic = prim::Constant[value= 0 1 2 3 [ CPULongType{4} ]]()
? +++ ^
Trace source location:
dank.py(5): foo
/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(402): wrapper
dank.py(3): <module>
Check source location:
dank.py(5): foo
/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(281): check_trace
/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(408): wrapper
dank.py(3): <module>
ERROR: Tensor-valued Constant nodes differed in value across invocations. This often indicates that the tracer has encountered untraceable code.
Node:
%1 : Dynamic = prim::Constant[value= 0 1 2 [ CPULongType{3} ]]()
Source Location:
dank.py(5): foo
/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(402): wrapper
dank.py(3): <module>
Comparison exception:
Not equal to tolerance rtol=1e-07, atol=0
(shapes (3,), (4,) mismatch)
x: array([0, 1, 2])
y: array([0, 1, 2, 3])
```
==
```
@torch.jit.trace(torch.rand(3, 4), check_inputs=[(torch.rand(3, 4),)])
def foo(x):
    y = x.data
    return x + y
```
```
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Traced function outputs do not match the Python function outputs.
ERROR: Tensor-valued Constant nodes differed in value across invocations. This often indicates that the tracer has encountered untraceable code.
Node:
%1 : Dynamic = prim::Constant[value=<Tensor>]()
Source Location:
dank.py(6): foo
/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(402): wrapper
dank.py(3): <module>
Comparison exception:
Not equal to tolerance rtol=1e-07, atol=0
(mismatch 100.0%)
x: array([0.397137, 0.956105, 0.169478, 0.560292, 0.392568, 0.108441,
0.97645 , 0.34412 , 0.951246, 0.793061, 0.557595, 0.770245],
dtype=float32)
y: array([0.243178, 0.315964, 0.972041, 0.0215 , 0.927751, 0.457512,
0.951092, 0.97883 , 0.048688, 0.118066, 0.779345, 0.271272],
dtype=float32)
```
==
```
import torch
@torch.jit.trace(torch.rand(3, 4), check_inputs=[(torch.rand(4, 4),)])
def foo(x):
    for _ in range(x.size(0)):
        x = torch.neg(x)
    return x
```
```
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Traced function outputs do not match the Python function outputs.
ERROR: Graphs differed across invocations!
Graph diff:
graph(%0 : Dynamic) {
%1 : Dynamic = aten::neg(%0)
%2 : Dynamic = aten::neg(%1)
%3 : Dynamic = aten::neg(%2)
+ %4 : Dynamic = aten::neg(%3)
- return (%3);
? ^
+ return (%4);
? ^
}
```
==
```
import torch
def foo(x):
    if not hasattr(foo, 'cache'):
        foo.cache = torch.neg(x)
    return x + foo.cache

traced = torch.jit.trace(torch.rand(3, 4), check_inputs=[(torch.rand(3, 4),)])(foo)
```
```
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Traced function outputs do not match the Python function outputs.
ERROR: Graphs differed across invocations!
Graph diff:
graph(%0 : Dynamic) {
- %1 : Dynamic = aten::neg(%0)
+ %1 : Dynamic = prim::Constant[value=<Tensor>]()
%2 : int = prim::Constant[value=1]()
%3 : Dynamic = aten::add(%0, %1, %2)
return (%3);
}
Node diff:
- %1 : Dynamic = aten::neg(%0)
+ %1 : Dynamic = prim::Constant[value=<Tensor>]()
Trace source location:
test.py(5): foo
/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(402): wrapper
test.py(8): <module>
Check source location:
test.py(6): foo
/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(281): check_trace
/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(408): wrapper
test.py(8): <module>
```
The following two examples show instances where program semantics are lost in the Python -> trace transformation, and repeated invocation does not give us useful debug information. Further design is underway for catching these scenarios.
```
import torch
@torch.jit.trace(torch.rand(3, 4), check_inputs=[(torch.rand(3, 4),)])
def foo(x):
    for i in range(3):
        x[i, :] = torch.zeros(4)
    return x
```
```
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Traced function outputs do not match the Python function outputs.
Exception:
Not equal to tolerance rtol=1e-07, atol=0
(mismatch 100.0%)
x: array([0.830221, 0.915481, 0.940281, 0.555241], dtype=float32)
y: array([0., 0., 0., 0.], dtype=float32)
```
==
```
import torch
@torch.jit.trace(torch.rand(3, 4), check_inputs=[(torch.rand(5, 6),)])
def foo(x):
    x.view(-1).add_(-x.view(-1))
    return x
```
```
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Traced function outputs do not match the Python function outputs.
Exception:
Not equal to tolerance rtol=1e-07, atol=0
(mismatch 100.0%)
x: array([0.734441, 0.445327, 0.640592, 0.30076 , 0.891674, 0.124771],
dtype=float32)
y: array([0., 0., 0., 0., 0., 0.], dtype=float32)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10841
Differential Revision: D9499945
Pulled By: jamesr66a
fbshipit-source-id: 1f842a32d0b0645259cc43b29700b86d99c59a45
Summary:
Changes the approach for resolving builtin ops so that the following works
```
add = torch.add
@script
def foo(x):
    return add(x, x)
```
This handles cases when people alias torch and torch.nn.functional to
shorter names.
This works by building a table of id -> builtin name for the known builtin
ops in torch, torch.nn.functional, and for any user-defined
op created by accessing torch.ops.foo.bar.
This allows us to clean up many SugaredValue types in the compiler.
Notes:
* we now consider any attributes on python modules to be constants
(e.g. math.pi, and torch.double).
* fixes a bug where we incorrectly allowed attribute lookup on arbitrary
Python objects. It is now restricted to modules only.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10927
Differential Revision: D9527522
Pulled By: zdevito
fbshipit-source-id: 0280422af08b4b0f48f302766d5a9c0deee47660
Summary:
Previously, when tracing slicing and select, negative indices were normalized, fixing the index to the size of the traced tensor. This change makes the tracing behavior match script, so aten::select with negative indices is emitted.
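A minimal sketch (illustrative):
```
import torch

def last_row(x):
    # when traced, this now records aten::select with index -1 rather
    # than normalizing it to x.size(0) - 1 from the example input
    return x[-1]
```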
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10560
Differential Revision: D9493614
Pulled By: eellison
fbshipit-source-id: ce7a8bae59863723247208d86b9f2948051ccc6c
Summary:
* Fix the necessary pathways so that tuples and lists can be inputs to the script.
* Prevent linear algebra functions from being run in shape prop, because
they frequently error out on nonsense data.
* Favor schema-driven Python input conversion where possible. The
remaining cases where we directly create Stacks without schema are
only for debugging.
* Make the error messages when calling script/trace functions more Pythonic.
* Simplify FlattenTuples -- now that tuples are supported, we can choose to only flatten tuples when needed. This may have to be revisited pending ONNX test results, but is necessary for making tuple I/O work.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10812
Differential Revision: D9477982
Pulled By: zdevito
fbshipit-source-id: ed06fc426e6ef6deb404602a26c435a7fc40ea0c
Summary:
The scalar situation has gotten a lot better and now we can
remove all instances of FIXME_zerol().
cc zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10900
Differential Revision: D9514206
Pulled By: zou3519
fbshipit-source-id: e4e522f324126c5454cd6de14b832d2d1f6cb0ce
Summary:
Please review the expects carefully to make sure there are no regressions. I tried to go over them one by one when they changed, but it's sometimes easy to miss finer details.
Summary of changes:
- Renamed `TensorType` to `CompleteTensorType`. Added a new `TensorType` which records only the scalar type, number of dimensions, and device of a value. The motivation behind the rename is to encourage people to use `CompleteTensorType` less, as most passes will only have limited information available. To make the transition easier, `complete_type->cast<TensorType>()` works, and makes our passes work with both kinds of specialization if they don't need the extra detail.
- Renamed `ArgumentSpec` to `CompleteArgumentSpec`. Added a new `ArgumentSpec`, which matches argument only at the level of the new `TensorType`.
- Shape analysis can process graphs with both `CompleteTensorType` and `TensorType`.
- The fuser was a part that heavily relied on full shape information being available. Now, we simply try to fuse the largest possible graphs, and have to do run-time checks to make sure they match the code we generate. If they don't, we fall back to regular interpretation. The shape checks are implemented using an optimized method exploiting algebraic properties of shapes with broadcasting, and the relations of broadcasting with pointwise ops. A full written proof of correctness of the shape-checking algorithm is included in a comment in `graph_fuser.cpp`.
zdevito ezyang mruberry ngimel csarofeen
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10844
Differential Revision: D9498705
Pulled By: apaszke
fbshipit-source-id: 0c53c2fcebd871cc2a29c260f8d012276479cc61
Summary:
When matching schema, first try to match without adding TensorToNum conversions. Then make another pass where TensorToNum conversions are allowed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10180
Differential Revision: D9438153
Pulled By: eellison
fbshipit-source-id: 80541b5abd06e9d4187e89dda751f44dab6f58c5
Summary:
Part of #10774.
This PR does the following:
- Support ast.ExtSlice in the frontend. This is done by returning a
list of ast.Index and ast.Slice.
- Support multidimensional indexing with ints and slices
The general approach is to desugar multidimensional indexing into
at::slice, at::select operations. This is exactly how normal pytorch
does indexing (by desugaring it into at::slice, at::select, and other ops).
I used [this code](https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/python_variable_indexing.cpp) as reference.
We should be able to copy the rest of this to implement the missing
indexing features in script (indexing with ellipses, tensors, sequences, etc).
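A minimal sketch of the indexing now supported in script:
```
import torch

@torch.jit.script
def foo(x):
    # desugared to at::slice along dim 0 followed by at::select,
    # mirroring how eager-mode indexing is implemented
    return x[1:3, 0]
```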
After I'm done implementing the missing indexing features in future prs, I can try to
templatize python_variable_indexing.cpp so that it can work with both JIT
script and normal pytorch indexing, but right now I'm not sure if that's
a good idea or not.
cc zdevito jamesr66a apaszke wanchaol
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10787
Differential Revision: D9481402
Pulled By: zou3519
fbshipit-source-id: 78c9fa42771a037d157879e23e20b87401cf1837
Summary:
Things like `zeros(1,2,3, dtype=torch.int)` are now supported in the script by altering tryMatchSchema to auto-construct the list `[1,2,3]` when it sees inlined members of the list as the last positional arguments.
I suggest reading the commits individually, since the first two incrementally change how we do tryMatchSchema to get it ready for adding vararg list conversion, while the third actually does the modification.
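A minimal sketch of what now compiles (illustrative):
```
import torch

@torch.jit.script
def foo():
    # the trailing positional ints are auto-packed into the size list
    # [1, 2, 3] when matched against the zeros(int[] size, ...) schema
    return torch.zeros(1, 2, 3, dtype=torch.int)
```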
closes #10632, closes #8516
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10250
Differential Revision: D9478235
Pulled By: zdevito
fbshipit-source-id: 0c48caf7a6184e463d9293d97015e9884758ef9c
Summary:
When emitting if branches, check that the types of each value returned are equivalent. As with reassignment of values, tensors are not forced to be the same shape or subtype.
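A hedged sketch (assuming current script syntax) of code this check rejects:
```
import torch

@torch.jit.script
def foo(x, cond):
    if bool(cond):
        y = x    # Tensor in this branch
    else:
        y = 1    # int in this branch -- now reported as a type mismatch
    return y
```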
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10281
Differential Revision: D9466566
Pulled By: eellison
fbshipit-source-id: 746abdeb34a0f68806b8e73726ad5003b536911c
Summary:
Augassign (i.e., `x += 1`) gets desugared to an assignment of a binop (`x = x + 1`).
Right now we assert that the RHS of the binop is a tensor,
but it really doesn't have to be because we support scalar/scalar ops and also
list-list ops (i.e., `[1, 2] + [2, 3]`).
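A minimal sketch (illustrative):
```
import torch

@torch.jit.script
def foo():
    x = 1
    x += 2          # scalar/scalar: desugars to x = x + 2
    xs = [1, 2]
    xs += [3, 4]    # list/list: desugars to xs = xs + [3, 4]
    return x, xs
```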
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10730
Differential Revision: D9465110
Pulled By: zou3519
fbshipit-source-id: 7b118622701f09ce356aca81b8db743d9611097b
Summary:
ONNX doesn't support this. Instead, we flatten the inputs to the ListConstruct op and inline them into the subsequent usage.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10713
Differential Revision: D9458508
Pulled By: jamesr66a
fbshipit-source-id: 0b41e69320e694bb2f304c6221864a39121e4694
Summary:
This will make the common case more natural (no need to do `_construct_empty_tensor_list()`)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10705
Differential Revision: D9411622
Pulled By: michaelsuo
fbshipit-source-id: 2d91fbc5787426748d6e1c8e7bbeee737544dc96
Summary:
... to avoid slow at::chunk (it is slow due to tensor initialization). Picking up from #10026
This is done through the following:
1) Absorb starting chunks into FusionGroup as a part of the graph fuser
pass.
2) When compiling a kernel, emit a `std::vector<ConcatDesc>` that describes if an input (of the original graph) will be chunked.
3) When launching a kernel, use `std::vector<ConcatDesc>` to chunk an
input tensor on the CPU. This chunk directly takes in an at::Tensor and creates
four TensorInfo structs in-place in the argument list, bypassing the creation of intermediate Tensors.
- Expect test and correctness test to see if a single chunk is fused
by the graph fuser
- Correctness test for a variety of chunks (dimension = beginning,
middle, end) and tensors (contiguous, non-contiguous, edge case
(splitSize = 1) for both CPU/CUDA
- Expect test for multiple chunks fused into the same kernel and
correctness test.
cc zdevito apaszke
LSTM forward pass, 1 layer, 512 hidden size and input size, 100 seq length, requires_grad=False on all inputs and weights.
After changes:
```
thnn cudnn jit
8.8468 6.5797 9.3470
```
Before changes:
```
thnn cudnn jit
9.9221 6.6539 11.2550
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10178
Differential Revision: D9382661
Pulled By: zou3519
fbshipit-source-id: 1f8a749208fbdd45559775ce98cf4eb9558448f8
Summary:
Fixes #10096
If the only thing preventing a simple mappable operator from being fused
into a fusion group is that its Tensor inputs are not of the same shape as the
output, then the graph fuser inserts explicit expand nodes for those
inputs.
This helps the graph fuser not miss out on any fusion opportunities
involving simple mappable operations that have Tensor inputs. This PR
doesn't do anything for the scalar case; that can be addressed later.
Test Plan
- Simple expect test case
- Added expect tests for a raw LSTMCell. The expands help speed up the
forwards pass by allowing more operations to be fused into the LSTMCell's single
FusionGroup.
cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10325
Differential Revision: D9379308
Pulled By: zou3519
fbshipit-source-id: 86d2202eb97e9bb16e511667b7fe177aeaf88245
Summary:
This is on the way to resolving #9940.
Fixes #10501
This PR modifies graph fuser to fuse operations that have constant
scalar arguments. These constant scalar arguments are directly inlined
into the kernel body.
The context for this is that LSTM backward (in particular, sigmoid
backward) has many add(x, 1.) operations. This PR should be sufficient for
LSTM backward to get fused by the graph fuser.
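A hedged sketch of the kind of pointwise expression this enables fusing (the function is illustrative, not from the PR):
```
import torch

@torch.jit.script
def sigmoid_backward_ish(grad, out):
    # the constant scalar 1. is inlined directly into the generated
    # kernel instead of blocking fusion
    return grad * out * (1. - out)
```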
cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10511
Differential Revision: D9378896
Pulled By: zou3519
fbshipit-source-id: 6a7a2987f5b6e8edaaf4b599cd200df33361650f
Summary:
The PR is the first step to integrate the torch.nn library with the JIT. It adds tests for the nn functional interfaces in trace/script mode, and tries to find the differences between torch.nn.functional ops and the ATen ops, to see the work that needs to be done in order to support the full set of nn functionals in script mode.
Some statistics in summary:
- 84 useful functions in torch.nn.functional in total (the number does not include helper funcs and deprecated funcs in torch.nn.functional).
- 7 functions/ops do not support higher-order gradients, so they are excluded from the whole test.
- 36 functions differ from the corresponding ATen op for various reasons. Among those 36 functions, roughly 10-15 are just naming differences or simple transformations using other ops inside the function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10409
Differential Revision: D9350694
Pulled By: wanchaol
fbshipit-source-id: 8fce6f30d8d25ace5a544a57b219fe61f5a092f8
Summary:
Inlines if branches that have constant conditions. When an if node gets inlined, the set of mutated variables returned by its ancestors may change. In the following example the block should
return a mutated set of (a) and not (a, b).
```
if cond:
if True:
a = a - 1
else:
b = b - 1
```
To calculate this, we recursively update mutated variables in if branches from the leaf nodes up.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10084
Reviewed By: michaelsuo
Differential Revision: D9340429
Pulled By: eellison
fbshipit-source-id: b0dd638a5cace9fdec3130460428fca655ce4b98
Summary:
This is the last step in the custom operator implementation: providing a way to build from C++ and Python. For this I:
1. Created a `FindTorch.cmake` taken largely from ebetica with a CMake function to easily create simple custom op libraries
2. Created a `torch/op.h` header for easy inclusion of necessary headers,
3. Created a test directory `pytorch/test/custom_operator` which includes the basic setup for a custom op.
1. It defines an op in `op.{h,cpp}`
2. Registers it with the JIT using `RegisterOperators`
3. Builds it into a shared library via a `CMakeLists.txt`
4. Binds it into Python using a `setup.py`. This step makes use of our C++ extension setup that we already have. No work, yay!
The pure C++ and the Python builds are separate and not coupled in any way.
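A hypothetical sketch of the Python side, using the torch.ops.load_library API from current PyTorch; the library path and op namespace below are made up:
```
import torch

# load the shared library produced by the CMakeLists.txt build
torch.ops.load_library("build/libcustom_op.so")
# call the registered op through the torch.ops namespace
out = torch.ops.custom.my_op(torch.rand(3))
```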
zdevito soumith dzhulgakov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10226
Differential Revision: D9296839
Pulled By: goldsborough
fbshipit-source-id: 32f74cafb6e3d86cada8dfca8136d0dfb1f197a0
Summary:
After this, all combinations of {String frontend, Python AST Frontend}{Python 3-style type annotations, MyPy-style type comments}{Script method, Script function} should properly accept type annotations.
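A minimal sketch of the two annotation styles (illustrative):
```
import torch
from typing import List

@torch.jit.script
def foo(x: torch.Tensor, ys: List[int]) -> torch.Tensor:  # Python 3-style
    return x + ys[0]

@torch.jit.script
def bar(x, y):
    # type: (torch.Tensor, int) -> torch.Tensor
    return x + y
```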
Possible TODOs:
- Clean up the functions marked HACK
- Clean up the Subscript tree-view to better match the Python AST versions
- Can we use this for Python functions? That's the only place annotations.get_signature() is still needed
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10279
Differential Revision: D9319726
Pulled By: jamesr66a
fbshipit-source-id: b13f7d4f066b0283d4fc1421a1abb9305c3b28fa
Summary:
- Exposed get_debug_graph for ScriptModule (gets the debug graph for its
forward Method)
- Added forward/backward expect tests for lstm and milstm cells. These
are intended to prevent regressions
cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10506
Differential Revision: D9316590
Pulled By: zou3519
fbshipit-source-id: 3c2510d8363e9733ccbc5c7cc015cd1d028efecf
Summary:
This commit adds the ability to insert a node with inputs, using the schema to check that the inputs have valid types, fill in any default values, and perform standard implicit conversions. Since it is schema-based, it will discover and use the right overload.
Constructors for `NamedValue` enable it to be constructed from `IValue` constants, so it is possible to use constant values in the input list as well:
```
g.insert(aten::add, {v, 3});
```
Keyword arguments are also supported:
```
g.insert(aten::add, {v}, {{"other", t}, {"scalar", 1}});
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10198
Differential Revision: D9307252
Pulled By: zdevito
fbshipit-source-id: 644620aa85047d1eae1288383a619d50fec44d9b
Summary:
Fixes#10456
The graph fuser was fusing groups containing prim::FusedConcat (the producer) with other ops (the consumer) if the consumer was fusable. For example,
```
import torch
@torch.jit.script
def fn(x, y, z):
    x1 = x + y
    y1 = x - y
    w = torch.cat([x1, y1])
    return w + z

x = torch.randn(2, 2, dtype=torch.float, device='cpu')
y = torch.randn(2, 2, dtype=torch.float, device='cpu')
z = torch.randn(4, 2, dtype=torch.float, device='cpu')
fn(x, y, z)
fn.graph_for(x, y, z)
```
produced the following graph:
```
graph(%x : Float(2, 2)
      %y : Float(2, 2)
      %z : Float(4, 2)) {
  %3 : int = prim::Constant[value=1]()
  %y1 : Float(2, 2) = aten::sub(%x, %y, %3)
  %8 : int = prim::Constant[value=0]()
  %14 : Float(4, 2) = prim::FusionGroup_0[device=-1](%z, %y1, %x, %y)
  return (%14);
}
with prim::FusionGroup_0 = graph(%1 : Float(4, 2)
                                 %5 : Float(2, 2)
                                 %7 : Float(2, 2)
                                 %8 : Float(2, 2)) {
  %11 : int = prim::Constant[value=1]()
  %9 : int = prim::Constant[value=1]()
  %x1 : Float(2, 2) = aten::add(%7, %8, %9)
  %w : Float(4, 2) = prim::FusedConcat[dim=0](%x1, %5)
  %2 : int = prim::Constant[value=1]()
  %3 : Float(4, 2) = aten::add(%w, %1, %2)
  return (%3);
}
```
this is a problem because it violates two invariants:
1) all inputs to the FusionGroup must have the same size
2) prim::FusedConcat's output must not be used inside the FusionGroup
This PR fixes this problem by checking whether the output of a FusionGroup came from a prim::FusedConcat node when deciding whether to fuse the consumer and producer.
If the producer is a value that came from a prim::FusedConcat node in a FusionGroup, then the consumer and producer do not get fused.
cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10466
Differential Revision: D9296686
Pulled By: zou3519
fbshipit-source-id: ed826fa9c436b42c04ca7d4d790cece804c162bd
Summary:
* some small leftovers from the last PR review
* enable more unit test sets for CI
* replace use of hcRNG w/ rocRAND (docker image was already updated w/ newer rocRAND)
* use rocBLAS instead of hipBLAS to allow convergence w/ Caffe2
* use strided_batched gemm interface also from the batched internal interface
* re-enable Dropout.cu as we now have philox w/ rocRAND
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10406
Reviewed By: Jorghi12
Differential Revision: D9277093
Pulled By: ezyang
fbshipit-source-id: 7ef2f6fe4ead77e501ed7aea5c3743afe2466ca2
Summary:
Copy of #10191 because these changes didn't land with the diff.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10394
Differential Revision: D9260816
Pulled By: li-roy
fbshipit-source-id: 7dc16919cfab6221fda1d44e98c5b900cfb40558
Summary:
Previously, `tensor[i:]` was transformed to `tensor[i:-1]`. This incorrectly leaves off the last element. Noticed this when implementing slicing for list types.
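A minimal illustration of the off-by-one:
```
import torch

x = torch.arange(5)
x[2:]    # tensor([2, 3, 4]) -- correct
x[2:-1]  # tensor([2, 3])    -- what the old desugaring produced
```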
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10286
Differential Revision: D9193292
Pulled By: michaelsuo
fbshipit-source-id: df372b815f9a3b8029830dd9e8769f9985a890e7
Summary:
This PR for the ROCm target does the following:
* enable some unit tests on ROCm
* fix a missing static_cast that breaks BatchNorm call on ROCm
* fix BatchNorm to work on ROCm w/ ROCm warp sizes etc
* improve the pyhipify script by introducing kernel scope to some transpilations and other improvements
* fix a linking issue on ROCm
* for more unit test sets: mark currently broken tests broken (to be fixed)
* enable THINLTO (phase one) to parallelize linking
* address the first failing of the elementwise kernel by removing non-working ROCm specialization
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10266
Differential Revision: D9184178
Pulled By: ezyang
fbshipit-source-id: 03bcd1fe4ca4dd3241f09634dbd42b6a4c350297