Commit Graph

293 Commits

Author SHA1 Message Date
Roy Li
9fc22cb772 Add import export step to end to end tests
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10717

Differential Revision: D9562888

Pulled By: li-roy

fbshipit-source-id: 8f5d62fd0a44aca0a41dc10438e7bb91cc2a972a
2018-09-05 09:39:47 -07:00
Adam Paszke
6d6655e6be Port PackedSequences functions to C++ (#11224)
Summary:
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11224

Differential Revision: D9652703

Pulled By: apaszke

fbshipit-source-id: 558e39457e590cad07516e5bb2ecb12789564950
2018-09-05 06:35:15 -07:00
Adam Paszke
b7038f7c37 Treat numerical differences as warnings instead of errors when tracing (#11246)
Summary:
Also, make `torch.isclose` work with integral tensors and refactor `_check_trace` a bit.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11246

Differential Revision: D9652701

Pulled By: apaszke

fbshipit-source-id: fb0bdbfd1952e45e153541e4d471b423a5659f25
2018-09-05 06:35:13 -07:00
Zachary DeVito
1eed7d5f0b Report an error when trying to record a mutable operator when (#11129)
Summary:
there are multiple views of the tensor live.

Also adds recording for copy_ because this is the critical in place
op where these views will cause LHS indexing to fail.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11129

Differential Revision: D9600195

Pulled By: zdevito

fbshipit-source-id: bfd8f5befa47377e36d704dbdb11023c608fe9a3
2018-09-04 13:40:51 -07:00
Elias Ellison
539579aa9a Logical short circuit (#11116)
Summary:
Adding short circuit evaluation to AND or OR. The second expression of and AND or OR gets lifted into an if branch, which is conditionally evaluated.

BatchOps was using the expression `dims = dims1 or dims2`, where dims is often an empty tensor. This nows throws an error, because dims1 gets cast to a boolean, and you can't convert an empty tensor to a scalar. It now matches the behavior of pytorch in python.

One thing that came up is if the second expression in an and/or in python gets returned, it does not get coerced to a boolean.

`tensor == (False or tensor)`
`tensor == (True and tensor)`

We do not currently support this.

edit: wording
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11116

Differential Revision: D9618168

Pulled By: eellison

fbshipit-source-id: 93b202be2f222d41f85d38d9c95f04d1749e8343
2018-09-04 09:25:13 -07:00
iotamudelta
33c7cc13ca improve docker packages, fix bugs, enable tests, enable FFT (#10893)
Summary:
* improve docker packages (install OpenBLAS to have at-compile-time LAPACK functionality w/ optimizations for both Intel and AMD CPUs)
* integrate rocFFT (i.e., enable Fourier functionality)
* fix bugs in ROCm caused by wrong warp size
* enable more test sets, skip the tests that don't work on ROCm yet
* don't disable asserts any longer in hipification
* small improvements
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10893

Differential Revision: D9615053

Pulled By: ezyang

fbshipit-source-id: 864b4d27bf089421f7dfd8065e5017f9ea2f7b3b
2018-09-02 08:54:42 -07:00
James Reed
43e73f85ad Dont optimize slicing dispatch when we are tracing (#11156)
Summary:
Previously when we had a slicing expression like `x[0:5, 0]`, where the sliced tensor was of size `5` in dimension 0, we would skip dispatching the actual slice call as an optimization.

This caused incorrect behavior under tracing, as we would not record the slice op and thus if we encountered an input with a different shape while running the trace, we would get incorrect results.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11156

Differential Revision: D9622252

Pulled By: jamesr66a

fbshipit-source-id: 822f2e8f01504e131f53bd9ef51c171c7913a7cc
2018-09-01 17:13:03 -07:00
James Reed
03c06ec93d Traceable detach (#11038)
Summary:
This makes it so `detach` and `detach_` are traceable and also adds a pass to erase them before ONNX export
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11038

Differential Revision: D9588038

Pulled By: jamesr66a

fbshipit-source-id: 263dd3147e24fcb0c716743f37fdb9f84c0015e7
2018-08-31 16:40:42 -07:00
Adam Paszke
780d2792c5 Warn about non-traceable behavior when tracing (#11088)
Summary:
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11088

Differential Revision: D9585527

Pulled By: apaszke

fbshipit-source-id: 29a03cb152d83b626f748fff4501ac9e139994c2
2018-08-31 14:27:00 -07:00
Adam Paszke
82aeebb3d9 Fix a bug in addmm fusion in the JIT (#11100)
Summary:
Fixes #10839.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11100

Differential Revision: D9585533

Pulled By: apaszke

fbshipit-source-id: 19e2710c8fc113f577faf14c080d8c89afbe23c4
2018-08-31 07:24:34 -07:00
Adam Paszke
00df09b65d Change specialization rules in GraphExecutors (#10977)
Summary:
**Review last commit only.** Stacked on top of #10949.

This commit fixes a number of issues connected to caching
differentiability status of graphs inside graph executors,
and changes the rules for optimization of differentiable subgraphs.
Previously every one of those was instantiated as a separate graph
executor, but now they are simply heavier-optimized graph regions,
and graph executors are only instantiated for their backward.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10977

Differential Revision: D9600626

Pulled By: apaszke

fbshipit-source-id: dad09a0f586e396afbd5406319c1cd54fbb8a3d3
2018-08-30 22:11:01 -07:00
Adam Paszke
f3c3127c67 Don't flatten output lists in the JIT IR (#10949)
Summary:
Operators like aten::chunk used to return a number of tensors, but
now return a list. To make it easier to do shape prop through
aten::chunk and fuse it, I've also introduced prim::ConstantChunk,
which behaves like the previous implementation (has a variable length
output list).

The downside of this PR is that the introduction of more lists to the IR causes the LSTM and MiLSTM graphs to be considered as non-differentiable by the graph executor. I verified that they are still optimize correctly, and my next patch (that changes how the specializations/differentiation works) will restore those.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10949

Reviewed By: zdevito

Differential Revision: D9556823

Pulled By: apaszke

fbshipit-source-id: 33e63b17fc7247cac6cfc05eb7eb9bf069b499ee
2018-08-30 19:54:39 -07:00
Zachary DeVito
93bd291e55 Change torch.jit.trace to no longer be a decorator (#11069)
Summary:
This was done because it surprising for a decorator to run a function
rather than wrap it, and not simplify the syntax for tracing modules.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11069

Reviewed By: jamesr66a

Differential Revision: D9583192

Pulled By: zdevito

fbshipit-source-id: b914b7ab4c73c255086465a6576eef3a22de1e13
2018-08-30 13:56:05 -07:00
Erik Brinkman
611a608517 Add ATen pdist CPU kernel (#10782)
Summary:
Also add single grad whitelist to the jit test
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10782

Reviewed By: ezyang

Differential Revision: D9583378

Pulled By: erikbrinkman

fbshipit-source-id: 069e5ae68ea7f3524dec39cf1d5fe9cd53941944
2018-08-30 11:55:27 -07:00
Zachary DeVito
ae635b16f7 Record tensor factory functions in trace (#10935)
Summary:
Things like torch.zeros now appear in traces rather than constants.

To continue to support our current level of ONNX export, we run
constant prop to turn these back into constants where possible before
export.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10935

Differential Revision: D9527427

Pulled By: zdevito

fbshipit-source-id: 552a8bcc01b911251dab7d7026faafdd7a3c758a
2018-08-29 17:10:24 -07:00
Adam Paszke
d9b74f6540 Make it possible to disable JIT using env variables (#10867)
Summary:
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10867

Differential Revision: D9556882

Pulled By: apaszke

fbshipit-source-id: 04c0ca875d15d37dd9ac05ac7b515cd899ddb7e4
2018-08-29 15:11:05 -07:00
James Reed
beeec47041 Sanity checks for tracing (#10841)
Summary:
TODO: integrate into torch.onnx.export -- separate PR

*Problem:* We have a facility to trace PyTorch operations on Python code, but there are several failure modes where the trace is not representative of the actual underlying computation:

* The tracer encountered dynamic control flow
* Some computation escaped the tracer, and appeared as a Constant tensor node in the graph
* Some stateful function was traced, e.g. someone did an optimization in Python by memoizing function outputs

*Objective*: In an ideal world, this whole process would be automated and the user can trust that the system will magically capture the intended semantics from the program. Realistically speaking, we will likely have to settle with a human-in-the-loop error reporting system, allowing for the user to identify problems and modify the source code to allow for tracing.

*Stage 1* (this PR): Output-level checking & graph diff. torch.jit.trace gains a kwarg 'check_inputs', which is a list of tuples of input arguments. We will iterate through the list and trace the function again for each set of check inputs. We'll also interpret the original trace with these inputs and compare output values and graphs, printing a diff of the graph if there is a difference.

Examples:

```
torch.jit.trace(torch.rand(3, 4), check_inputs=[(torch.rand(4, 5),)])
def foo(x):
    y = torch.arange(0, x.shape[0]).float()
    return x + y.unsqueeze(1)
```

```
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Graphs differed across invocations!
	Graph diff:
		  graph(%0 : Dynamic) {
		-   %1 : Dynamic = prim::Constant[value= 0  1  2 [ CPULongType{3} ]]()
		?                                                              ^
		+   %1 : Dynamic = prim::Constant[value= 0  1  2  3 [ CPULongType{4} ]]()
		?                                                +++              ^
		    %2 : int = prim::Constant[value=0]()
		    %3 : Dynamic = aten::_cast_Float(%1, %2)
		    %4 : int = prim::Constant[value=1]()
		    %5 : Dynamic = aten::unsqueeze(%3, %4)
		    %6 : int = prim::Constant[value=1]()
		    %7 : Dynamic = aten::add(%0, %5, %6)
		    return (%7);
		  }
	Node diff:
		- %1 : Dynamic = prim::Constant[value= 0  1  2 [ CPULongType{3} ]]()
		?                                                            ^
		+ %1 : Dynamic = prim::Constant[value= 0  1  2  3 [ CPULongType{4} ]]()
		?                                              +++              ^
	Trace source location:
		dank.py(5): foo
		/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(402): wrapper
		dank.py(3): <module>
	Check source location:
		dank.py(5): foo
		/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(281): check_trace
		/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(408): wrapper
		dank.py(3): <module>
ERROR: Tensor-valued Constant nodes differed in value across invocations. This often indicates that the tracer has encountered untraceable code.
	Node:
		%1 : Dynamic = prim::Constant[value= 0  1  2 [ CPULongType{3} ]]()
	Source Location:
		dank.py(5): foo
		/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(402): wrapper
		dank.py(3): <module>
	Comparison exception:
		Not equal to tolerance rtol=1e-07, atol=0

		(shapes (3,), (4,) mismatch)
		 x: array([0, 1, 2])
		 y: array([0, 1, 2, 3])

```
==

```
torch.jit.trace(torch.rand(3, 4), check_inputs=[(torch.rand(3, 4),)])
def foo(x):
    y = x.data
    return x + y
```

```
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Traced function outputs do not match the Python function outputs.
ERROR: Tensor-valued Constant nodes differed in value across invocations. This often indicates that the tracer has encountered untraceable code.
	Node:
		%1 : Dynamic = prim::Constant[value=<Tensor>]()
	Source Location:
		dank.py(6): foo
		/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(402): wrapper
		dank.py(3): <module>
	Comparison exception:
		Not equal to tolerance rtol=1e-07, atol=0

		(mismatch 100.0%)
		 x: array([0.397137, 0.956105, 0.169478, 0.560292, 0.392568, 0.108441,
		       0.97645 , 0.34412 , 0.951246, 0.793061, 0.557595, 0.770245],
		      dtype=float32)
		 y: array([0.243178, 0.315964, 0.972041, 0.0215  , 0.927751, 0.457512,
		       0.951092, 0.97883 , 0.048688, 0.118066, 0.779345, 0.271272],
		      dtype=float32)
```

==

```
import torch

torch.jit.trace(torch.rand(3, 4), check_inputs=[(torch.rand(4, 4),)])
def foo(x):
    for _ in range(x.size(0)):
        x = torch.neg(x)
    return x
```

```
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Traced function outputs do not match the Python function outputs.
ERROR: Graphs differed across invocations!
	Graph diff:
		  graph(%0 : Dynamic) {
		    %1 : Dynamic = aten::neg(%0)
		    %2 : Dynamic = aten::neg(%1)
		    %3 : Dynamic = aten::neg(%2)
		+   %4 : Dynamic = aten::neg(%3)
		-   return (%3);
		?            ^
		+   return (%4);
		?            ^
		  }
```

==

```
import torch

def foo(x):
    if not hasattr(foo, 'cache'):
        foo.cache = torch.neg(x)
    return x + foo.cache

traced = torch.jit.trace(torch.rand(3, 4), check_inputs=[(torch.rand(3, 4),)])(foo)
```

```
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Traced function outputs do not match the Python function outputs.
ERROR: Graphs differed across invocations!
	Graph diff:
		  graph(%0 : Dynamic) {
		-   %1 : Dynamic = aten::neg(%0)
		+   %1 : Dynamic = prim::Constant[value=<Tensor>]()
		    %2 : int = prim::Constant[value=1]()
		    %3 : Dynamic = aten::add(%0, %1, %2)
		    return (%3);
		  }
	Node diff:
		- %1 : Dynamic = aten::neg(%0)
		+ %1 : Dynamic = prim::Constant[value=<Tensor>]()
	Trace source location:
		test.py(5): foo
		/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(402): wrapper
		test.py(8): <module>
	Check source location:
		test.py(6): foo
		/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(281): check_trace
		/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(408): wrapper
		test.py(8): <module>
```

The following two examples show instances where program semantics are lost in the Python -> trace transformation, and repeated invocation does not give us useful debug information. Further design in underway for catching these scenarios.

```
import torch

torch.jit.trace(torch.rand(3, 4), check_inputs=[(torch.rand(3, 4),)])
def foo(x):
    for i in range(3):
        x[i, :] = torch.zeros(4)
    return x
```

```
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Traced function outputs do not match the Python function outputs.
Exception:
Not equal to tolerance rtol=1e-07, atol=0

(mismatch 100.0%)
 x: array([0.830221, 0.915481, 0.940281, 0.555241], dtype=float32)
 y: array([0., 0., 0., 0.], dtype=float32)
```

==

```
import torch

torch.jit.trace(torch.rand(3, 4), check_inputs=[(torch.rand(5, 6),)])
def foo(x):
    x.view(-1).add_(-x.view(-1))
    return x
```

```
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Traced function outputs do not match the Python function outputs.
Exception:
Not equal to tolerance rtol=1e-07, atol=0

(mismatch 100.0%)
 x: array([0.734441, 0.445327, 0.640592, 0.30076 , 0.891674, 0.124771],
      dtype=float32)
 y: array([0., 0., 0., 0., 0., 0.], dtype=float32)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10841

Differential Revision: D9499945

Pulled By: jamesr66a

fbshipit-source-id: 1f842a32d0b0645259cc43b29700b86d99c59a45
2018-08-28 20:25:26 -07:00
Zachary DeVito
22c9bc3117 Resolve builtins using a dict rather than by name (#10927)
Summary:
Changes the approach for resolving builtin ops so that the following works

```
add = torch.add
script
def foo(x):
  return add(x, x)
```

This handles cases when people alias torch and torch.nn.functional to
shorter names.

This works by building a table of id -> builtin name for the know builtin
ops in torch, torch.nn.functional, and for any user-defined
op created by accessing in torch.ops.foo.bar

This allows us to clean up many SugaredValue types in the compiler.

Notes:
* we now consider any attributes on python modules to be constants
(e.g. math.pi, and torch.double).
* fixes a bug where we incorrectly allowed attribute lookup on arbitrary
pyton objects. It is now restricted to modules only.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10927

Differential Revision: D9527522

Pulled By: zdevito

fbshipit-source-id: 0280422af08b4b0f48f302766d5a9c0deee47660
2018-08-28 11:25:11 -07:00
Elias Ellison
58b145f515 Fix negative indices in tracer (#10560)
Summary:
Previously when tracing slicing & select negative indices would get normalized, fixing the index to the size of the traced tensor. This makes the behavior the same as script so aten::select with negative indices is emitted.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10560

Differential Revision: D9493614

Pulled By: eellison

fbshipit-source-id: ce7a8bae59863723247208d86b9f2948051ccc6c
2018-08-27 15:19:41 -07:00
Zachary DeVito
6ce799edd6 Tuples/Lists can now be inputs/outputs to script and other simple fixes. (#10812)
Summary:
* Fix the necessary pathways so that tuples and lists can be inputs to the script.

* prevent linear algebra functions from being run in shape prop because
they frequently will error out for nonsense data.

* favor schema-driven python input conversion where possible.
remaining cases where we directly create Stacks without schema are
only for debugging

* Make the error messages when calling script/trace functions more pythonic

* Simplify FlattenTuples -- now that tuples are supported we can choose to only flatten tuples when needed. This may have to be revisited pending onnx test results, but is necessary for making tuple io work.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10812

Differential Revision: D9477982

Pulled By: zdevito

fbshipit-source-id: ed06fc426e6ef6deb404602a26c435a7fc40ea0c
2018-08-27 14:40:40 -07:00
Richard Zou
35beecfe17 fix xfails involving literals (#10905)
Summary:
I missed these in #10900

cc apaszke jamesr66a zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10905

Differential Revision: D9516748

Pulled By: zou3519

fbshipit-source-id: a5c3e3b65a33c339d5c4e9fc160462c3d35705f3
2018-08-27 12:41:06 -07:00
Richard Zou
67f6f930a8 Remove FIXME_zerol() from test_jit.py (#10900)
Summary:
The scalar situation has gotten a lot better and now we can
remove all instances of FIXME_zerol().

cc zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10900

Differential Revision: D9514206

Pulled By: zou3519

fbshipit-source-id: e4e522f324126c5454cd6de14b832d2d1f6cb0ce
2018-08-27 08:55:08 -07:00
Adam Paszke
c8b246abf3 Prevent JIT from overspecializing to every single size configuration (#10844)
Summary:
Please review the expects carefully to make sure there are no regressions. I tried to go over them one by one when they changed, but it's sometimes easy to miss finer details.

Summary of changes:

- Renamed `TensorType` to `CompleteTensorType`. Added a new `TensorType` which records only the scalar type, number of dimensions, and device of a value. The argument behind the rename is to encourage people to use `CompleteTensorType` less, as most passes will only have limited information available. To make transition easier `complete_type->cast<TensorType>()` works, and makes our passes work with both kinds of specialization if they don't need extra the extra detail.
- Renamed `ArgumentSpec` to `CompleteArgumentSpec`. Added a new `ArgumentSpec`, which matches argument only at the level of the new `TensorType`.
- Shape analysis can process graphs with both `CompleteTensorType` and `TensorType`.
- Fuser was a part that heavily relied on full shape information being available. Now, we simply try to fuse the largest possible graphs, and have to do run-time checks to make sure they match the code we generate. If they don't, we fall back to regular interpretation. The shape checks are implementing using an optimized method exploiting algebraic properties of shapes with broadcasting, and the relations of broadcasting with pointwise ops. A full written proof of correctness of the shape checking algorithm is included in a comment in `graph_fuser.cpp`.

zdevito ezyang mruberry ngimel csarofeen
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10844

Differential Revision: D9498705

Pulled By: apaszke

fbshipit-source-id: 0c53c2fcebd871cc2a29c260f8d012276479cc61
2018-08-26 09:54:48 -07:00
Elias Ellison
0ef5cfd28c fix ivalue printing for lists (#10777)
Summary:
Fixing the printing of IValue lists, which didn't work previously.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10777

Differential Revision: D9474264

Pulled By: eellison

fbshipit-source-id: 0c7d6e7ecaa3f7908b131ac9f1036f19ac4f8b4f
2018-08-24 16:02:03 -07:00
Elias Ellison
74e6a666b3 If none of the schema match, add ImplicitTensorToNum conversions where needed. (#10180)
Summary:
When matching schema, first try to match without adding TensorToNum conversions. Then make another pass where TensorToNum conversions are allowed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10180

Differential Revision: D9438153

Pulled By: eellison

fbshipit-source-id: 80541b5abd06e9d4187e89dda751f44dab6f58c5
2018-08-24 16:02:00 -07:00
Richard Zou
ca567862b2 Support multidimensional indexing (#10787)
Summary:
Part of #10774.

This PR does the following:
- Support ast.ExtSlice in the frontend. This is done by returning a
  list of ast.Index and ast.Slice.
- Support multidimensional indexing with ints and slices

The general approach is to desugar multidimensional indexing into
at::slice, at::select operations. This is exactly how normal pytorch
does indexing (by desugaring it into at::slice, at::select, and other ops).

I used [this code](https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/python_variable_indexing.cpp) as reference.
We should be able to copy the rest of this to implement the missing
indexing features in script (indexing with ellipses, tensors, sequences, etc).

After I'm done implementing the missing indexing features in future prs, I can try to
templatize python_variable_indexing.cpp so that it can work with both JIT
script and normal pytorch indexing, but right now I'm not sure if that's
a good idea or not.

cc zdevito jamesr66a apaszke wanchaol
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10787

Differential Revision: D9481402

Pulled By: zou3519

fbshipit-source-id: 78c9fa42771a037d157879e23e20b87401cf1837
2018-08-24 08:10:32 -07:00
Zachary DeVito
3d43a82440 Add support for vararg style functions. (#10250)
Summary:
Things like `zeros(1,2,3, dtype=torch.int)` are now supported in the script by altering tryMatchSchema to auto-construct the list `[1,2,3]` when it sees inlined members of the list as the last positional arguments.

I suggest reading the commits individually, since the first two incrementally change how we do tryMatchSchema to get it ready for adding vararg list conversion, while the third actually does the modification.

closes #10632
closes #8516
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10250

Differential Revision: D9478235

Pulled By: zdevito

fbshipit-source-id: 0c48caf7a6184e463d9293d97015e9884758ef9c
2018-08-23 15:10:36 -07:00
Elias Ellison
5c0eece2fd Force types on values returned from if blocks to be equivalent (#10281)
Summary:
When emitting if Branches, check that the types on each value returned are equivalent. As with reassignment of values, tensors are not forced to be the same shape or subtype.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10281

Differential Revision: D9466566

Pulled By: eellison

fbshipit-source-id: 746abdeb34a0f68806b8e73726ad5003b536911c
2018-08-22 19:55:38 -07:00
Adam Paszke
f72e813c2f Allow tracing functions that take tuples of tensors as inputs (#10637)
Summary:
And return tuples.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10637

Reviewed By: eellison

Differential Revision: D9385892

Pulled By: apaszke

fbshipit-source-id: 542f4444d909fb246d7f1d88d6fb98345de2d431
2018-08-22 15:37:10 -07:00
Richard Zou
6c84f7fea0 Relax RHS type assert for augassign (#10730)
Summary:
Augassign (i.e., `x += 1`) gets desugared to an assignment of a binop (`x = x + 1`).
Right now we assert that the RHS of the binop is a tensor,
but it really doesn't have to be because we support scalar/scalar ops and also
list-list ops (i.e., `[1, 2] + [2, 3]`).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10730

Differential Revision: D9465110

Pulled By: zou3519

fbshipit-source-id: 7b118622701f09ce356aca81b8db743d9611097b
2018-08-22 15:10:33 -07:00
James Reed
6fcac354c5 Erase ListConstruct nodes for ONNX export (#10713)
Summary:
ONNX doesn't support this. Instead flatten the inputs to the ListConstruct op and inline it into the subsequent usage
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10713

Differential Revision: D9458508

Pulled By: jamesr66a

fbshipit-source-id: 0b41e69320e694bb2f304c6221864a39121e4694
2018-08-22 14:39:58 -07:00
Michael Suo
9e75ec11fb Make empty list literals construct empty Tensor[] (#10705)
Summary:
This will make the common case more natural (no need to do `_construct_empty_tensor_list()`)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10705

Differential Revision: D9411622

Pulled By: michaelsuo

fbshipit-source-id: 2d91fbc5787426748d6e1c8e7bbeee737544dc96
2018-08-20 18:28:28 -07:00
James Reed
585e6b581f Allow method-style casts on tensors (#10641)
Summary:
Closes https://github.com/pytorch/pytorch/issues/10631
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10641

Differential Revision: D9407598

Pulled By: jamesr66a

fbshipit-source-id: a0331f4e9e55d92718cde7a1112fe8c705206b1f
2018-08-20 14:10:21 -07:00
Richard Zou
f1420adfe3 Move at::chunk into the graph fuser (#10178)
Summary:
... to avoid slow at::chunk (it is slow due to tensor initialization). Picking up from #10026

This is done through the following:

1) Absorb starting chunks into FusionGroup as a part of the graph fuser
pass.
2) When compiling a kernel, emit a `std::vector<ConcatDesc>` that describes if an input (of the original graph) will be chunked.
3) When launching a kernel, `use std::vector<ConcatDesc>` to chunk an
input tensor on the CPU. This chunk directly takes in an at::Tensor and creates
four TensorInfo structs in-place in the argument list, bypassing the creation of intermediate Tensors.

- Expect test and correctness test to see if a single chunk is fused
  by the graph fuser
- Correctness test for a variety of chunks (dimension = beginning,
  middle, end) and tensors (contiguous, non-contiguous, edge case
  (splitSize = 1) for both CPU/CUDA
- Expect test for multiple chunks fused into the same kernel and
  correctness test.

cc zdevito apaszke

LSTM forward pass, 1 layer, 512 hidden size and input size, 100 seq length, requires_grad=False on all inputs and weights.

After changes:
```
thnn    cudnn   jit
8.8468  6.5797  9.3470
```

Before changes:
```
thnn    cudnn   jit
9.9221  6.6539  11.2550
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10178

Differential Revision: D9382661

Pulled By: zou3519

fbshipit-source-id: 1f8a749208fbdd45559775ce98cf4eb9558448f8
2018-08-18 16:10:11 -07:00
Richard Zou
e29b5a1ea8 graph fuser inserts explicit expands where necessary (#10325)
Summary:
Fixes #10096

If the only thing preventing a simple mappable operator from being fused
into a fusion group is that its Tensor inputs are not of the same shape as the
output, then the graph fuser inserts explicit expand nodes for those
inputs.
This helps the graph fuser not miss out on any fusion opportunities
involving simple mappable operations that have Tensor inputs. This PR
doesn't do anything for the scalar case; that can be addressed later.

Test Plan
- Simple expect test case
- Added expect tests for a raw LSTMCell. The expands help speed up the
  forwards pass by allowing more operations to be fused into the LSTMCell's single
  FusionGroup.

cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10325

Differential Revision: D9379308

Pulled By: zou3519

fbshipit-source-id: 86d2202eb97e9bb16e511667b7fe177aeaf88245
2018-08-17 16:03:46 -07:00
Richard Zou
86c9856d9c Fuse tensor-scalar ops when scalar is constant (#10511)
Summary:
This is on the way to resolving #9940.

Fixes #10501

This PR modifies graph fuser to fuse operations that have constant
scalar arguments. These constant scalar arguments are directly inlined
into the kernel body.

The context for this is that LSTM backward (in particular, sigmoid
backward) has many add(x, 1.) operations. This PR should be sufficient for
LSTM backward to get fused by the graph fuser.

cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10511

Differential Revision: D9378896

Pulled By: zou3519

fbshipit-source-id: 6a7a2987f5b6e8edaaf4b599cd200df33361650f
2018-08-17 14:10:23 -07:00
Wanchao Liang
52058204d6 Add nn functional tests in JIT (#10409)
Summary:
The PR is the first step to integrate torch.nn library with JIT. It adds the tests for nn functional interfaces in trace/script mode, and tries to find out the different between torch.nn.functional ops and the ATen ops, to see the work need to be done in order to support a full set of nn functional in script mode.

Some statistics in summary:

- Totally 84 useful functions in torch.nn.functional (the number does not include helper funcs and deprecated funcs in torch.nn.functional).

- 7 functions/ops does not support higher gradient, so just excluded from the whole test.

- 36 functions is different with the Aten op for different reasons. Among those 36 functions, bunch of them (roughly around 10-15) are just naming difference and simple transformation using other ops inside the function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10409

Differential Revision: D9350694

Pulled By: wanchaol

fbshipit-source-id: 8fce6f30d8d25ace5a544a57b219fe61f5a092f8
2018-08-17 11:09:49 -07:00
Elias Ellison
e190505e84 Adding support for inlining if branches (#10084)
Summary:
Inlining if branches which have constant inputs.  If an if node gets inlined, the set of mutated variables returned by its ancestors may have changed. In the following example the block should
return a mutated set of (a) and not (a, b).

```
if cond:
  if True:
	 a = a - 1
    else:
	b = b - 1
```
To calculate this we recursively update mutate variables in if branches from the leaf nodes up.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10084

Reviewed By: michaelsuo

Differential Revision: D9340429

Pulled By: eellison

fbshipit-source-id: b0dd638a5cace9fdec3130460428fca655ce4b98
2018-08-17 09:48:47 -07:00
Peter Goldsborough
c101a57a74 Build mechanism for custom operators (#10226)
Summary:
This is the last step in the custom operator implementation: providing a way to build from C++ and Python. For this I:

1. Created a `FindTorch.cmake` taken largely from ebetica with a CMake function to easily create simple custom op libraries
2. Created a ` torch/op.h` header for easy inclusion of necessary headers,
3. Created a test directory `pytorch/test/custom_operator` which includes the basic setup for a custom op.
    1. It defines an op in `op.{h,cpp}`
    2. Registers it with the JIT using `RegisterOperators`
    3. Builds it into a shared library via a `CMakeLists.txt`
    4. Binds it into Python using a `setup.py`. This step makes use of our C++ extension setup that we already have. No work, yey!

The pure C++ and the Python builds are separate and not coupled in any way.

zdevito soumith dzhulgakov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10226

Differential Revision: D9296839

Pulled By: goldsborough

fbshipit-source-id: 32f74cafb6e3d86cada8dfca8136d0dfb1f197a0
2018-08-16 18:56:17 -07:00
Owen Anderson
abf85bf0ef Perform CSE across block boundaries. (#10105)
Summary:
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10105

Differential Revision: D9186678

Pulled By: resistor

fbshipit-source-id: 87b63d4fc0c7d394edb4777acdefa8f022a8bf8d
2018-08-16 00:25:36 -07:00
James Reed
32bb4040dd Unified type annotation parsing for script frontends (#10279)
Summary:
After this, all combinations of {String frontend, Python AST Frontend}{Python 3-style type annotations, MyPy-style type comments}{Script method, Script function} should properly accept type annotations.

Possible TODOs:
- Clean up the functions marked HACK
- Clean up the Subscript tree-view to better match the Python AST versions
- Can we use this for Python functions? That's the only place annotations.get_signature() is still needed
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10279

Differential Revision: D9319726

Pulled By: jamesr66a

fbshipit-source-id: b13f7d4f066b0283d4fc1421a1abb9305c3b28fa
2018-08-14 18:13:15 -07:00
Richard Zou
b4462511fd Add LSTMCell backward pass expect tests (#10506)
Summary:
- Exposed get_debug_graph for ScriptModule (gets the debug graph for its
  forward Method)
- Added forward/backward expect tests for lstm and milstm cells. These
  are intended to prevent regressions

cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10506

Differential Revision: D9316590

Pulled By: zou3519

fbshipit-source-id: 3c2510d8363e9733ccbc5c7cc015cd1d028efecf
2018-08-14 11:39:44 -07:00
Zachary DeVito
61bedc96f0 Schema-based creation of graph nodes (#10198)
Summary:
This commit adds the ability to insert a node with inputs, using the schema to check the inputs are valid types, fill in any default values, and perform standard implicit conversions. Since it is schema based, it will discover and use the right overload.
Constructors to `NamedValue` enable it to be constructed using `IValue` constants so it is possible to use constant values in the input list as well:

```
g.insert(aten::add, {v, 3});
```

Keyword arguments are also supported:

```
g.insert(aten::add, {v}, {{"other", t}, {"scalar", 1}});
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10198

Differential Revision: D9307252

Pulled By: zdevito

fbshipit-source-id: 644620aa85047d1eae1288383a619d50fec44d9b
2018-08-14 10:25:38 -07:00
Richard Zou
fed05cf4cf Fix prim::FusedConcat bug (#10466)
Summary:
Fixes #10456

The graph fuser was fusing together groups with prim::FusedConcat (the producer) with other ops (the consumer) if the consumer is fusable. For example,

```
import torch
torch.jit.script
def fn(x, y, z):
    x1 = x + y
    y1 = x - y
    w = torch.cat([x1, y1])
    return w + z

x = torch.randn(2, 2, dtype=torch.float, device='cpu')
y = torch.randn(2, 2, dtype=torch.float, device='cpu')
z = torch.randn(4, 2, dtype=torch.float, device='cpu')
fn(x, y, z)
fn.graph_for(x, y, z)
```
produced the following graph:
```
graph(%x : Float(2, 2)
      %y : Float(2, 2)
      %z : Float(4, 2)) {
  %3 : int = prim::Constant[value=1]()
  %y1 : Float(2, 2) = aten::sub(%x, %y, %3)
  %8 : int = prim::Constant[value=0]()
  %14 : Float(4, 2) = prim::FusionGroup_0[device=-1](%z, %y1, %x, %y)
  return (%14);
}
with prim::FusionGroup_0 = graph(%1 : Float(4, 2)
      %5 : Float(2, 2)
      %7 : Float(2, 2)
      %8 : Float(2, 2)) {
  %11 : int = prim::Constant[value=1]()
  %9 : int = prim::Constant[value=1]()
  %x1 : Float(2, 2) = aten::add(%7, %8, %9)
  %w : Float(4, 2) = prim::FusedConcat[dim=0](%x1, %5)
  %2 : int = prim::Constant[value=1]()
  %3 : Float(4, 2) = aten::add(%w, %1, %2)
  return (%3);
}
```

this is a problem because it violates two invariants:
1) all inputs to the FusionGroup must have the same size
2) prim::FusedConcat's output must not be used inside the FusionGroup

This PR fixes this problem by checking if the output to a FusionGroup came from a prim::FusedConcat node when deciding whether to fuse the consumer and producer.
If the producer is a value that came from a prim::FusedConcat node in a FusionGroup, then consumer & producer do not get fused.

cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10466

Differential Revision: D9296686

Pulled By: zou3519

fbshipit-source-id: ed826fa9c436b42c04ca7d4d790cece804c162bd
2018-08-13 21:09:25 -07:00
iotamudelta
75651d5b58 improve use of ROCm libraries, enable more tests, small fixes (#10406)
Summary:
* some small leftovers from the last PR review
* enable more unit test sets for CI
* replace use of hcRNG w/ rocRAND (docker image was already updated w/ newer rocRAND)
* use rocBLAS instead of hipBLAS to allow convergence w/ Caffe2
* use strided_batched gemm interface also from the batched internal interface
* re-enable Dropout.cu as we now have philox w/ rocRAND
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10406

Reviewed By: Jorghi12

Differential Revision: D9277093

Pulled By: ezyang

fbshipit-source-id: 7ef2f6fe4ead77e501ed7aea5c3743afe2466ca2
2018-08-13 11:39:43 -07:00
Roy Li
e9ad74357e Use serialization container in ir import export (#10394)
Summary:
Copy of #10191 because these changes didn't land with the diff.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10394

Differential Revision: D9260816

Pulled By: li-roy

fbshipit-source-id: 7dc16919cfab6221fda1d44e98c5b900cfb40558
2018-08-10 00:09:30 -07:00
Michael Suo
0950d7a98d support list slicing (#10318)
Summary:
As title.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10318

Differential Revision: D9254351

Pulled By: michaelsuo

fbshipit-source-id: be891a584dc295b5e353f7f5257d64a356fb9586
2018-08-09 17:25:13 -07:00
Michael Suo
b6402648f4 fix off-by-one bug in open-ended slicing (#10286)
Summary:
Previously, `tensor[i:]` was transformed to `tensor[i:-1]`. This incorrectly leaves off the last element. Noticed this when implementing slicing for list types.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10286

Differential Revision: D9193292

Pulled By: michaelsuo

fbshipit-source-id: df372b815f9a3b8029830dd9e8769f9985a890e7
2018-08-07 00:39:42 -07:00
Michael Suo
5a7c710548 Support some basic list operations (#10225)
Summary:
Support a few basic operators:
- eq
- add
- len
- select (indexing)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10225

Differential Revision: D9172338

Pulled By: michaelsuo

fbshipit-source-id: 6e75ec1453b9589b0fb4698598ecdba5a5fccff9
2018-08-07 00:39:40 -07:00
iotamudelta
a38b572de3 enable unit tests and other changes (#10266)
Summary:
This PR for the ROCm target does the following:
* enable some unit tests on ROCm
* fix a missing static_cast that breaks BatchNorm call on ROCm
* fix BatchNorm to work on ROCm w/ ROCm warp sizes etc
* improve the pyhipify script by introducing kernel scope to some transpilations and other improvements
* fix a linking issue on ROCm
* for more unit test sets: mark currently broken tests broken (to be fixed)
* enable THINLTO (phase one) to parallelize linking
* address the first failing of the elementwise kernel by removing non-working ROCm specialization
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10266

Differential Revision: D9184178

Pulled By: ezyang

fbshipit-source-id: 03bcd1fe4ca4dd3241f09634dbd42b6a4c350297
2018-08-06 14:54:01 -07:00