Commit Graph

283 Commits

Author SHA1 Message Date
Adam Paszke
00df09b65d Change specialization rules in GraphExecutors (#10977)
Summary:
**Review last commit only.** Stacked on top of #10949.

This commit fixes a number of issues connected to caching
differentiability status of graphs inside graph executors,
and changes the rules for optimization of differentiable subgraphs.
Previously every one of those was instantiated as a separate graph
executor, but now they are simply heavier-optimized graph regions,
and graph executors are only instantiated for their backward.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10977

Differential Revision: D9600626

Pulled By: apaszke

fbshipit-source-id: dad09a0f586e396afbd5406319c1cd54fbb8a3d3
2018-08-30 22:11:01 -07:00
Adam Paszke
f3c3127c67 Don't flatten output lists in the JIT IR (#10949)
Summary:
Operators like aten::chunk used to return a number of tensors, but
now return a list. To make it easier to do shape prop through
aten::chunk and fuse it, I've also introduced prim::ConstantChunk,
which behaves like the previous implementation (has a variable length
output list).

The downside of this PR is that the introduction of more lists to the IR causes the LSTM and MiLSTM graphs to be considered as non-differentiable by the graph executor. I verified that they are still optimize correctly, and my next patch (that changes how the specializations/differentiation works) will restore those.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10949

Reviewed By: zdevito

Differential Revision: D9556823

Pulled By: apaszke

fbshipit-source-id: 33e63b17fc7247cac6cfc05eb7eb9bf069b499ee
2018-08-30 19:54:39 -07:00
Zachary DeVito
93bd291e55 Change torch.jit.trace to no longer be a decorator (#11069)
Summary:
This was done because it surprising for a decorator to run a function
rather than wrap it, and not simplify the syntax for tracing modules.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11069

Reviewed By: jamesr66a

Differential Revision: D9583192

Pulled By: zdevito

fbshipit-source-id: b914b7ab4c73c255086465a6576eef3a22de1e13
2018-08-30 13:56:05 -07:00
Erik Brinkman
611a608517 Add ATen pdist CPU kernel (#10782)
Summary:
Also add single grad whitelist to the jit test
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10782

Reviewed By: ezyang

Differential Revision: D9583378

Pulled By: erikbrinkman

fbshipit-source-id: 069e5ae68ea7f3524dec39cf1d5fe9cd53941944
2018-08-30 11:55:27 -07:00
Zachary DeVito
ae635b16f7 Record tensor factory functions in trace (#10935)
Summary:
Things like torch.zeros now appear in traces rather than constants.

To continue to support our current level of ONNX export, we run
constant prop to turn these back into constants where possible before
export.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10935

Differential Revision: D9527427

Pulled By: zdevito

fbshipit-source-id: 552a8bcc01b911251dab7d7026faafdd7a3c758a
2018-08-29 17:10:24 -07:00
Adam Paszke
d9b74f6540 Make it possible to disable JIT using env variables (#10867)
Summary:
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10867

Differential Revision: D9556882

Pulled By: apaszke

fbshipit-source-id: 04c0ca875d15d37dd9ac05ac7b515cd899ddb7e4
2018-08-29 15:11:05 -07:00
James Reed
beeec47041 Sanity checks for tracing (#10841)
Summary:
TODO: integrate into torch.onnx.export -- separate PR

*Problem:* We have a facility to trace PyTorch operations on Python code, but there are several failure modes where the trace is not representative of the actual underlying computation:

* The tracer encountered dynamic control flow
* Some computation escaped the tracer, and appeared as a Constant tensor node in the graph
* Some stateful function was traced, e.g. someone did an optimization in Python by memoizing function outputs

*Objective*: In an ideal world, this whole process would be automated and the user can trust that the system will magically capture the intended semantics from the program. Realistically speaking, we will likely have to settle with a human-in-the-loop error reporting system, allowing for the user to identify problems and modify the source code to allow for tracing.

*Stage 1* (this PR): Output-level checking & graph diff. torch.jit.trace gains a kwarg 'check_inputs', which is a list of tuples of input arguments. We will iterate through the list and trace the function again for each set of check inputs. We'll also interpret the original trace with these inputs and compare output values and graphs, printing a diff of the graph if there is a difference.

Examples:

```
torch.jit.trace(torch.rand(3, 4), check_inputs=[(torch.rand(4, 5),)])
def foo(x):
    y = torch.arange(0, x.shape[0]).float()
    return x + y.unsqueeze(1)
```

```
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Graphs differed across invocations!
	Graph diff:
		  graph(%0 : Dynamic) {
		-   %1 : Dynamic = prim::Constant[value= 0  1  2 [ CPULongType{3} ]]()
		?                                                              ^
		+   %1 : Dynamic = prim::Constant[value= 0  1  2  3 [ CPULongType{4} ]]()
		?                                                +++              ^
		    %2 : int = prim::Constant[value=0]()
		    %3 : Dynamic = aten::_cast_Float(%1, %2)
		    %4 : int = prim::Constant[value=1]()
		    %5 : Dynamic = aten::unsqueeze(%3, %4)
		    %6 : int = prim::Constant[value=1]()
		    %7 : Dynamic = aten::add(%0, %5, %6)
		    return (%7);
		  }
	Node diff:
		- %1 : Dynamic = prim::Constant[value= 0  1  2 [ CPULongType{3} ]]()
		?                                                            ^
		+ %1 : Dynamic = prim::Constant[value= 0  1  2  3 [ CPULongType{4} ]]()
		?                                              +++              ^
	Trace source location:
		dank.py(5): foo
		/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(402): wrapper
		dank.py(3): <module>
	Check source location:
		dank.py(5): foo
		/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(281): check_trace
		/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(408): wrapper
		dank.py(3): <module>
ERROR: Tensor-valued Constant nodes differed in value across invocations. This often indicates that the tracer has encountered untraceable code.
	Node:
		%1 : Dynamic = prim::Constant[value= 0  1  2 [ CPULongType{3} ]]()
	Source Location:
		dank.py(5): foo
		/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(402): wrapper
		dank.py(3): <module>
	Comparison exception:
		Not equal to tolerance rtol=1e-07, atol=0

		(shapes (3,), (4,) mismatch)
		 x: array([0, 1, 2])
		 y: array([0, 1, 2, 3])

```
==

```
torch.jit.trace(torch.rand(3, 4), check_inputs=[(torch.rand(3, 4),)])
def foo(x):
    y = x.data
    return x + y
```

```
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Traced function outputs do not match the Python function outputs.
ERROR: Tensor-valued Constant nodes differed in value across invocations. This often indicates that the tracer has encountered untraceable code.
	Node:
		%1 : Dynamic = prim::Constant[value=<Tensor>]()
	Source Location:
		dank.py(6): foo
		/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(402): wrapper
		dank.py(3): <module>
	Comparison exception:
		Not equal to tolerance rtol=1e-07, atol=0

		(mismatch 100.0%)
		 x: array([0.397137, 0.956105, 0.169478, 0.560292, 0.392568, 0.108441,
		       0.97645 , 0.34412 , 0.951246, 0.793061, 0.557595, 0.770245],
		      dtype=float32)
		 y: array([0.243178, 0.315964, 0.972041, 0.0215  , 0.927751, 0.457512,
		       0.951092, 0.97883 , 0.048688, 0.118066, 0.779345, 0.271272],
		      dtype=float32)
```

==

```
import torch

torch.jit.trace(torch.rand(3, 4), check_inputs=[(torch.rand(4, 4),)])
def foo(x):
    for _ in range(x.size(0)):
        x = torch.neg(x)
    return x
```

```
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Traced function outputs do not match the Python function outputs.
ERROR: Graphs differed across invocations!
	Graph diff:
		  graph(%0 : Dynamic) {
		    %1 : Dynamic = aten::neg(%0)
		    %2 : Dynamic = aten::neg(%1)
		    %3 : Dynamic = aten::neg(%2)
		+   %4 : Dynamic = aten::neg(%3)
		-   return (%3);
		?            ^
		+   return (%4);
		?            ^
		  }
```

==

```
import torch

def foo(x):
    if not hasattr(foo, 'cache'):
        foo.cache = torch.neg(x)
    return x + foo.cache

traced = torch.jit.trace(torch.rand(3, 4), check_inputs=[(torch.rand(3, 4),)])(foo)
```

```
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Traced function outputs do not match the Python function outputs.
ERROR: Graphs differed across invocations!
	Graph diff:
		  graph(%0 : Dynamic) {
		-   %1 : Dynamic = aten::neg(%0)
		+   %1 : Dynamic = prim::Constant[value=<Tensor>]()
		    %2 : int = prim::Constant[value=1]()
		    %3 : Dynamic = aten::add(%0, %1, %2)
		    return (%3);
		  }
	Node diff:
		- %1 : Dynamic = aten::neg(%0)
		+ %1 : Dynamic = prim::Constant[value=<Tensor>]()
	Trace source location:
		test.py(5): foo
		/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(402): wrapper
		test.py(8): <module>
	Check source location:
		test.py(6): foo
		/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(281): check_trace
		/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(408): wrapper
		test.py(8): <module>
```

The following two examples show instances where program semantics are lost in the Python -> trace transformation, and repeated invocation does not give us useful debug information. Further design in underway for catching these scenarios.

```
import torch

torch.jit.trace(torch.rand(3, 4), check_inputs=[(torch.rand(3, 4),)])
def foo(x):
    for i in range(3):
        x[i, :] = torch.zeros(4)
    return x
```

```
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Traced function outputs do not match the Python function outputs.
Exception:
Not equal to tolerance rtol=1e-07, atol=0

(mismatch 100.0%)
 x: array([0.830221, 0.915481, 0.940281, 0.555241], dtype=float32)
 y: array([0., 0., 0., 0.], dtype=float32)
```

==

```
import torch

torch.jit.trace(torch.rand(3, 4), check_inputs=[(torch.rand(5, 6),)])
def foo(x):
    x.view(-1).add_(-x.view(-1))
    return x
```

```
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Traced function outputs do not match the Python function outputs.
Exception:
Not equal to tolerance rtol=1e-07, atol=0

(mismatch 100.0%)
 x: array([0.734441, 0.445327, 0.640592, 0.30076 , 0.891674, 0.124771],
      dtype=float32)
 y: array([0., 0., 0., 0., 0., 0.], dtype=float32)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10841

Differential Revision: D9499945

Pulled By: jamesr66a

fbshipit-source-id: 1f842a32d0b0645259cc43b29700b86d99c59a45
2018-08-28 20:25:26 -07:00
Zachary DeVito
22c9bc3117 Resolve builtins using a dict rather than by name (#10927)
Summary:
Changes the approach for resolving builtin ops so that the following works

```
add = torch.add
script
def foo(x):
  return add(x, x)
```

This handles cases when people alias torch and torch.nn.functional to
shorter names.

This works by building a table of id -> builtin name for the know builtin
ops in torch, torch.nn.functional, and for any user-defined
op created by accessing in torch.ops.foo.bar

This allows us to clean up many SugaredValue types in the compiler.

Notes:
* we now consider any attributes on python modules to be constants
(e.g. math.pi, and torch.double).
* fixes a bug where we incorrectly allowed attribute lookup on arbitrary
pyton objects. It is now restricted to modules only.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10927

Differential Revision: D9527522

Pulled By: zdevito

fbshipit-source-id: 0280422af08b4b0f48f302766d5a9c0deee47660
2018-08-28 11:25:11 -07:00
Elias Ellison
58b145f515 Fix negative indices in tracer (#10560)
Summary:
Previously when tracing slicing & select negative indices would get normalized, fixing the index to the size of the traced tensor. This makes the behavior the same as script so aten::select with negative indices is emitted.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10560

Differential Revision: D9493614

Pulled By: eellison

fbshipit-source-id: ce7a8bae59863723247208d86b9f2948051ccc6c
2018-08-27 15:19:41 -07:00
Zachary DeVito
6ce799edd6 Tuples/Lists can now be inputs/outputs to script and other simple fixes. (#10812)
Summary:
* Fix the necessary pathways so that tuples and lists can be inputs to the script.

* prevent linear algebra functions from being run in shape prop because
they frequently will error out for nonsense data.

* favor schema-driven python input conversion where possible.
remaining cases where we directly create Stacks without schema are
only for debugging

* Make the error messages when calling script/trace functions more pythonic

* Simplify FlattenTuples -- now that tuples are supported we can choose to only flatten tuples when needed. This may have to be revisited pending onnx test results, but is necessary for making tuple io work.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10812

Differential Revision: D9477982

Pulled By: zdevito

fbshipit-source-id: ed06fc426e6ef6deb404602a26c435a7fc40ea0c
2018-08-27 14:40:40 -07:00
Richard Zou
35beecfe17 fix xfails involving literals (#10905)
Summary:
I missed these in #10900

cc apaszke jamesr66a zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10905

Differential Revision: D9516748

Pulled By: zou3519

fbshipit-source-id: a5c3e3b65a33c339d5c4e9fc160462c3d35705f3
2018-08-27 12:41:06 -07:00
Richard Zou
67f6f930a8 Remove FIXME_zerol() from test_jit.py (#10900)
Summary:
The scalar situation has gotten a lot better and now we can
remove all instances of FIXME_zerol().

cc zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10900

Differential Revision: D9514206

Pulled By: zou3519

fbshipit-source-id: e4e522f324126c5454cd6de14b832d2d1f6cb0ce
2018-08-27 08:55:08 -07:00
Adam Paszke
c8b246abf3 Prevent JIT from overspecializing to every single size configuration (#10844)
Summary:
Please review the expects carefully to make sure there are no regressions. I tried to go over them one by one when they changed, but it's sometimes easy to miss finer details.

Summary of changes:

- Renamed `TensorType` to `CompleteTensorType`. Added a new `TensorType` which records only the scalar type, number of dimensions, and device of a value. The argument behind the rename is to encourage people to use `CompleteTensorType` less, as most passes will only have limited information available. To make transition easier `complete_type->cast<TensorType>()` works, and makes our passes work with both kinds of specialization if they don't need extra the extra detail.
- Renamed `ArgumentSpec` to `CompleteArgumentSpec`. Added a new `ArgumentSpec`, which matches argument only at the level of the new `TensorType`.
- Shape analysis can process graphs with both `CompleteTensorType` and `TensorType`.
- Fuser was a part that heavily relied on full shape information being available. Now, we simply try to fuse the largest possible graphs, and have to do run-time checks to make sure they match the code we generate. If they don't, we fall back to regular interpretation. The shape checks are implementing using an optimized method exploiting algebraic properties of shapes with broadcasting, and the relations of broadcasting with pointwise ops. A full written proof of correctness of the shape checking algorithm is included in a comment in `graph_fuser.cpp`.

zdevito ezyang mruberry ngimel csarofeen
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10844

Differential Revision: D9498705

Pulled By: apaszke

fbshipit-source-id: 0c53c2fcebd871cc2a29c260f8d012276479cc61
2018-08-26 09:54:48 -07:00
Elias Ellison
0ef5cfd28c fix ivalue printing for lists (#10777)
Summary:
Fixing the printing of IValue lists, which didn't work previously.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10777

Differential Revision: D9474264

Pulled By: eellison

fbshipit-source-id: 0c7d6e7ecaa3f7908b131ac9f1036f19ac4f8b4f
2018-08-24 16:02:03 -07:00
Elias Ellison
74e6a666b3 If none of the schema match, add ImplicitTensorToNum conversions where needed. (#10180)
Summary:
When matching schema, first try to match without adding TensorToNum conversions. Then make another pass where TensorToNum conversions are allowed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10180

Differential Revision: D9438153

Pulled By: eellison

fbshipit-source-id: 80541b5abd06e9d4187e89dda751f44dab6f58c5
2018-08-24 16:02:00 -07:00
Richard Zou
ca567862b2 Support multidimensional indexing (#10787)
Summary:
Part of #10774.

This PR does the following:
- Support ast.ExtSlice in the frontend. This is done by returning a
  list of ast.Index and ast.Slice.
- Support multidimensional indexing with ints and slices

The general approach is to desugar multidimensional indexing into
at::slice, at::select operations. This is exactly how normal pytorch
does indexing (by desugaring it into at::slice, at::select, and other ops).

I used [this code](https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/python_variable_indexing.cpp) as reference.
We should be able to copy the rest of this to implement the missing
indexing features in script (indexing with ellipses, tensors, sequences, etc).

After I'm done implementing the missing indexing features in future prs, I can try to
templatize python_variable_indexing.cpp so that it can work with both JIT
script and normal pytorch indexing, but right now I'm not sure if that's
a good idea or not.

cc zdevito jamesr66a apaszke wanchaol
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10787

Differential Revision: D9481402

Pulled By: zou3519

fbshipit-source-id: 78c9fa42771a037d157879e23e20b87401cf1837
2018-08-24 08:10:32 -07:00
Zachary DeVito
3d43a82440 Add support for vararg style functions. (#10250)
Summary:
Things like `zeros(1,2,3, dtype=torch.int)` are now supported in the script by altering tryMatchSchema to auto-construct the list `[1,2,3]` when it sees inlined members of the list as the last positional arguments.

I suggest reading the commits individually, since the first two incrementally change how we do tryMatchSchema to get it ready for adding vararg list conversion, while the third actually does the modification.

closes #10632
closes #8516
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10250

Differential Revision: D9478235

Pulled By: zdevito

fbshipit-source-id: 0c48caf7a6184e463d9293d97015e9884758ef9c
2018-08-23 15:10:36 -07:00
Elias Ellison
5c0eece2fd Force types on values returned from if blocks to be equivalent (#10281)
Summary:
When emitting if Branches, check that the types on each value returned are equivalent. As with reassignment of values, tensors are not forced to be the same shape or subtype.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10281

Differential Revision: D9466566

Pulled By: eellison

fbshipit-source-id: 746abdeb34a0f68806b8e73726ad5003b536911c
2018-08-22 19:55:38 -07:00
Adam Paszke
f72e813c2f Allow tracing functions that take tuples of tensors as inputs (#10637)
Summary:
And return tuples.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10637

Reviewed By: eellison

Differential Revision: D9385892

Pulled By: apaszke

fbshipit-source-id: 542f4444d909fb246d7f1d88d6fb98345de2d431
2018-08-22 15:37:10 -07:00
Richard Zou
6c84f7fea0 Relax RHS type assert for augassign (#10730)
Summary:
Augassign (i.e., `x += 1`) gets desugared to an assignment of a binop (`x = x + 1`).
Right now we assert that the RHS of the binop is a tensor,
but it really doesn't have to be because we support scalar/scalar ops and also
list-list ops (i.e., `[1, 2] + [2, 3]`).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10730

Differential Revision: D9465110

Pulled By: zou3519

fbshipit-source-id: 7b118622701f09ce356aca81b8db743d9611097b
2018-08-22 15:10:33 -07:00
James Reed
6fcac354c5 Erase ListConstruct nodes for ONNX export (#10713)
Summary:
ONNX doesn't support this. Instead flatten the inputs to the ListConstruct op and inline it into the subsequent usage
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10713

Differential Revision: D9458508

Pulled By: jamesr66a

fbshipit-source-id: 0b41e69320e694bb2f304c6221864a39121e4694
2018-08-22 14:39:58 -07:00
Michael Suo
9e75ec11fb Make empty list literals construct empty Tensor[] (#10705)
Summary:
This will make the common case more natural (no need to do `_construct_empty_tensor_list()`)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10705

Differential Revision: D9411622

Pulled By: michaelsuo

fbshipit-source-id: 2d91fbc5787426748d6e1c8e7bbeee737544dc96
2018-08-20 18:28:28 -07:00
James Reed
585e6b581f Allow method-style casts on tensors (#10641)
Summary:
Closes https://github.com/pytorch/pytorch/issues/10631
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10641

Differential Revision: D9407598

Pulled By: jamesr66a

fbshipit-source-id: a0331f4e9e55d92718cde7a1112fe8c705206b1f
2018-08-20 14:10:21 -07:00
Richard Zou
f1420adfe3 Move at::chunk into the graph fuser (#10178)
Summary:
... to avoid slow at::chunk (it is slow due to tensor initialization). Picking up from #10026

This is done through the following:

1) Absorb starting chunks into FusionGroup as a part of the graph fuser
pass.
2) When compiling a kernel, emit a `std::vector<ConcatDesc>` that describes if an input (of the original graph) will be chunked.
3) When launching a kernel, `use std::vector<ConcatDesc>` to chunk an
input tensor on the CPU. This chunk directly takes in an at::Tensor and creates
four TensorInfo structs in-place in the argument list, bypassing the creation of intermediate Tensors.

- Expect test and correctness test to see if a single chunk is fused
  by the graph fuser
- Correctness test for a variety of chunks (dimension = beginning,
  middle, end) and tensors (contiguous, non-contiguous, edge case
  (splitSize = 1) for both CPU/CUDA
- Expect test for multiple chunks fused into the same kernel and
  correctness test.

cc zdevito apaszke

LSTM forward pass, 1 layer, 512 hidden size and input size, 100 seq length, requires_grad=False on all inputs and weights.

After changes:
```
thnn    cudnn   jit
8.8468  6.5797  9.3470
```

Before changes:
```
thnn    cudnn   jit
9.9221  6.6539  11.2550
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10178

Differential Revision: D9382661

Pulled By: zou3519

fbshipit-source-id: 1f8a749208fbdd45559775ce98cf4eb9558448f8
2018-08-18 16:10:11 -07:00
Richard Zou
e29b5a1ea8 graph fuser inserts explicit expands where necessary (#10325)
Summary:
Fixes #10096

If the only thing preventing a simple mappable operator from being fused
into a fusion group is that its Tensor inputs are not of the same shape as the
output, then the graph fuser inserts explicit expand nodes for those
inputs.
This helps the graph fuser not miss out on any fusion opportunities
involving simple mappable operations that have Tensor inputs. This PR
doesn't do anything for the scalar case; that can be addressed later.

Test Plan
- Simple expect test case
- Added expect tests for a raw LSTMCell. The expands help speed up the
  forwards pass by allowing more operations to be fused into the LSTMCell's single
  FusionGroup.

cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10325

Differential Revision: D9379308

Pulled By: zou3519

fbshipit-source-id: 86d2202eb97e9bb16e511667b7fe177aeaf88245
2018-08-17 16:03:46 -07:00
Richard Zou
86c9856d9c Fuse tensor-scalar ops when scalar is constant (#10511)
Summary:
This is on the way to resolving #9940.

Fixes #10501

This PR modifies graph fuser to fuse operations that have constant
scalar arguments. These constant scalar arguments are directly inlined
into the kernel body.

The context for this is that LSTM backward (in particular, sigmoid
backward) has many add(x, 1.) operations. This PR should be sufficient for
LSTM backward to get fused by the graph fuser.

cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10511

Differential Revision: D9378896

Pulled By: zou3519

fbshipit-source-id: 6a7a2987f5b6e8edaaf4b599cd200df33361650f
2018-08-17 14:10:23 -07:00
Wanchao Liang
52058204d6 Add nn functional tests in JIT (#10409)
Summary:
The PR is the first step to integrate torch.nn library with JIT. It adds the tests for nn functional interfaces in trace/script mode, and tries to find out the different between torch.nn.functional ops and the ATen ops, to see the work need to be done in order to support a full set of nn functional in script mode.

Some statistics in summary:

- Totally 84 useful functions in torch.nn.functional (the number does not include helper funcs and deprecated funcs in torch.nn.functional).

- 7 functions/ops does not support higher gradient, so just excluded from the whole test.

- 36 functions is different with the Aten op for different reasons. Among those 36 functions, bunch of them (roughly around 10-15) are just naming difference and simple transformation using other ops inside the function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10409

Differential Revision: D9350694

Pulled By: wanchaol

fbshipit-source-id: 8fce6f30d8d25ace5a544a57b219fe61f5a092f8
2018-08-17 11:09:49 -07:00
Elias Ellison
e190505e84 Adding support for inlining if branches (#10084)
Summary:
Inlining if branches which have constant inputs.  If an if node gets inlined, the set of mutated variables returned by its ancestors may have changed. In the following example the block should
return a mutated set of (a) and not (a, b).

```
if cond:
  if True:
	 a = a - 1
    else:
	b = b - 1
```
To calculate this we recursively update mutate variables in if branches from the leaf nodes up.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10084

Reviewed By: michaelsuo

Differential Revision: D9340429

Pulled By: eellison

fbshipit-source-id: b0dd638a5cace9fdec3130460428fca655ce4b98
2018-08-17 09:48:47 -07:00
Peter Goldsborough
c101a57a74 Build mechanism for custom operators (#10226)
Summary:
This is the last step in the custom operator implementation: providing a way to build from C++ and Python. For this I:

1. Created a `FindTorch.cmake` taken largely from ebetica with a CMake function to easily create simple custom op libraries
2. Created a ` torch/op.h` header for easy inclusion of necessary headers,
3. Created a test directory `pytorch/test/custom_operator` which includes the basic setup for a custom op.
    1. It defines an op in `op.{h,cpp}`
    2. Registers it with the JIT using `RegisterOperators`
    3. Builds it into a shared library via a `CMakeLists.txt`
    4. Binds it into Python using a `setup.py`. This step makes use of our C++ extension setup that we already have. No work, yey!

The pure C++ and the Python builds are separate and not coupled in any way.

zdevito soumith dzhulgakov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10226

Differential Revision: D9296839

Pulled By: goldsborough

fbshipit-source-id: 32f74cafb6e3d86cada8dfca8136d0dfb1f197a0
2018-08-16 18:56:17 -07:00
Owen Anderson
abf85bf0ef Perform CSE across block boundaries. (#10105)
Summary:
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10105

Differential Revision: D9186678

Pulled By: resistor

fbshipit-source-id: 87b63d4fc0c7d394edb4777acdefa8f022a8bf8d
2018-08-16 00:25:36 -07:00
James Reed
32bb4040dd Unified type annotation parsing for script frontends (#10279)
Summary:
After this, all combinations of {String frontend, Python AST Frontend}{Python 3-style type annotations, MyPy-style type comments}{Script method, Script function} should properly accept type annotations.

Possible TODOs:
- Clean up the functions marked HACK
- Clean up the Subscript tree-view to better match the Python AST versions
- Can we use this for Python functions? That's the only place annotations.get_signature() is still needed
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10279

Differential Revision: D9319726

Pulled By: jamesr66a

fbshipit-source-id: b13f7d4f066b0283d4fc1421a1abb9305c3b28fa
2018-08-14 18:13:15 -07:00
Richard Zou
b4462511fd Add LSTMCell backward pass expect tests (#10506)
Summary:
- Exposed get_debug_graph for ScriptModule (gets the debug graph for its
  forward Method)
- Added forward/backward expect tests for lstm and milstm cells. These
  are intended to prevent regressions

cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10506

Differential Revision: D9316590

Pulled By: zou3519

fbshipit-source-id: 3c2510d8363e9733ccbc5c7cc015cd1d028efecf
2018-08-14 11:39:44 -07:00
Zachary DeVito
61bedc96f0 Schema-based creation of graph nodes (#10198)
Summary:
This commit adds the ability to insert a node with inputs, using the schema to check the inputs are valid types, fill in any default values, and perform standard implicit conversions. Since it is schema based, it will discover and use the right overload.
Constructors to `NamedValue` enable it to be constructed using `IValue` constants so it is possible to use constant values in the input list as well:

```
g.insert(aten::add, {v, 3});
```

Keyword arguments are also supported:

```
g.insert(aten::add, {v}, {{"other", t}, {"scalar", 1}});
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10198

Differential Revision: D9307252

Pulled By: zdevito

fbshipit-source-id: 644620aa85047d1eae1288383a619d50fec44d9b
2018-08-14 10:25:38 -07:00
Richard Zou
fed05cf4cf Fix prim::FusedConcat bug (#10466)
Summary:
Fixes #10456

The graph fuser was fusing together groups with prim::FusedConcat (the producer) with other ops (the consumer) if the consumer is fusable. For example,

```
import torch
torch.jit.script
def fn(x, y, z):
    x1 = x + y
    y1 = x - y
    w = torch.cat([x1, y1])
    return w + z

x = torch.randn(2, 2, dtype=torch.float, device='cpu')
y = torch.randn(2, 2, dtype=torch.float, device='cpu')
z = torch.randn(4, 2, dtype=torch.float, device='cpu')
fn(x, y, z)
fn.graph_for(x, y, z)
```
produced the following graph:
```
graph(%x : Float(2, 2)
      %y : Float(2, 2)
      %z : Float(4, 2)) {
  %3 : int = prim::Constant[value=1]()
  %y1 : Float(2, 2) = aten::sub(%x, %y, %3)
  %8 : int = prim::Constant[value=0]()
  %14 : Float(4, 2) = prim::FusionGroup_0[device=-1](%z, %y1, %x, %y)
  return (%14);
}
with prim::FusionGroup_0 = graph(%1 : Float(4, 2)
      %5 : Float(2, 2)
      %7 : Float(2, 2)
      %8 : Float(2, 2)) {
  %11 : int = prim::Constant[value=1]()
  %9 : int = prim::Constant[value=1]()
  %x1 : Float(2, 2) = aten::add(%7, %8, %9)
  %w : Float(4, 2) = prim::FusedConcat[dim=0](%x1, %5)
  %2 : int = prim::Constant[value=1]()
  %3 : Float(4, 2) = aten::add(%w, %1, %2)
  return (%3);
}
```

this is a problem because it violates two invariants:
1) all inputs to the FusionGroup must have the same size
2) prim::FusedConcat's output must not be used inside the FusionGroup

This PR fixes this problem by checking if the output to a FusionGroup came from a prim::FusedConcat node when deciding whether to fuse the consumer and producer.
If the producer is a value that came from a prim::FusedConcat node in a FusionGroup, then consumer & producer do not get fused.

cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10466

Differential Revision: D9296686

Pulled By: zou3519

fbshipit-source-id: ed826fa9c436b42c04ca7d4d790cece804c162bd
2018-08-13 21:09:25 -07:00
iotamudelta
75651d5b58 improve use of ROCm libraries, enable more tests, small fixes (#10406)
Summary:
* some small leftovers from the last PR review
* enable more unit test sets for CI
* replace use of hcRNG w/ rocRAND (docker image was already updated w/ newer rocRAND)
* use rocBLAS instead of hipBLAS to allow convergence w/ Caffe2
* use strided_batched gemm interface also from the batched internal interface
* re-enable Dropout.cu as we now have philox w/ rocRAND
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10406

Reviewed By: Jorghi12

Differential Revision: D9277093

Pulled By: ezyang

fbshipit-source-id: 7ef2f6fe4ead77e501ed7aea5c3743afe2466ca2
2018-08-13 11:39:43 -07:00
Roy Li
e9ad74357e Use serialization container in ir import export (#10394)
Summary:
Copy of #10191 because these changes didn't land with the diff.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10394

Differential Revision: D9260816

Pulled By: li-roy

fbshipit-source-id: 7dc16919cfab6221fda1d44e98c5b900cfb40558
2018-08-10 00:09:30 -07:00
Michael Suo
0950d7a98d support list slicing (#10318)
Summary:
As title.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10318

Differential Revision: D9254351

Pulled By: michaelsuo

fbshipit-source-id: be891a584dc295b5e353f7f5257d64a356fb9586
2018-08-09 17:25:13 -07:00
Michael Suo
b6402648f4 fix off-by-one bug in open-ended slicing (#10286)
Summary:
Previously, `tensor[i:]` was transformed to `tensor[i:-1]`. This incorrectly leaves off the last element. Noticed this when implementing slicing for list types.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10286

Differential Revision: D9193292

Pulled By: michaelsuo

fbshipit-source-id: df372b815f9a3b8029830dd9e8769f9985a890e7
2018-08-07 00:39:42 -07:00
Michael Suo
5a7c710548 Support some basic list operations (#10225)
Summary:
Support a few basic operators:
- eq
- add
- len
- select (indexing)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10225

Differential Revision: D9172338

Pulled By: michaelsuo

fbshipit-source-id: 6e75ec1453b9589b0fb4698598ecdba5a5fccff9
2018-08-07 00:39:40 -07:00
iotamudelta
a38b572de3 enable unit tests and other changes (#10266)
Summary:
This PR for the ROCm target does the following:
* enable some unit tests on ROCm
* fix a missing static_cast that breaks BatchNorm call on ROCm
* fix BatchNorm to work on ROCm w/ ROCm warp sizes etc
* improve the pyhipify script by introducing kernel scope to some transpilations and other improvements
* fix a linking issue on ROCm
* for more unit test sets: mark currently broken tests broken (to be fixed)
* enable THINLTO (phase one) to parallelize linking
* address the first failing of the elementwise kernel by removing non-working ROCm specialization
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10266

Differential Revision: D9184178

Pulled By: ezyang

fbshipit-source-id: 03bcd1fe4ca4dd3241f09634dbd42b6a4c350297
2018-08-06 14:54:01 -07:00
Peter Goldsborough
0c848f4179 Python integration for custom operators (#10149)
Summary:
Adds the Python path to custom operators, including dynamically loading operations into Python.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10149

Reviewed By: ezyang

Differential Revision: D9158380

Pulled By: goldsborough

fbshipit-source-id: 3edffa639e8d2959e9e80d1bd4f20ab4a1b3ca02
2018-08-06 13:54:48 -07:00
Richard Zou
29406a2c4c Fix shared_ptr refcycle in graph executor (#10222)
Summary:
Fixes #10032

When capturing an output, GraphExecutorAutogradFunction creates
SavedVariable with is_output=False and owns it:
https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/graph_executor.cpp#L87

Constructing SavedVariable with is_output=False makes it own a copy of
the shared_ptr<GraphExecutorAutogradFunction>, which causes a reference
cycle:
6456b944fd/torch/csrc/autograd/saved_variable.cpp (L27)

The solution in this PR is to construct the SavedVariable with
is_output=True if the captured value is an output.

Test Plan

Turn on cuda memory checking for JitTestCase. If the test's name
includes "cuda" or "gpu" in it, the cuda memory checking test happens.

cc zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10222

Reviewed By: ezyang

Differential Revision: D9162995

Pulled By: zou3519

fbshipit-source-id: aeace85a09160c7a7e79cf35f6ac61eac87cbf66
2018-08-04 11:39:10 -07:00
Wanchao Liang
50cf326158 Allow type cast between int and float in Script (#10168)
Summary:
The PR allows int→float and float→int casts. Current we only allow `tensor→int` and `tensor→float` casts.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10168

Differential Revision: D9141163

Pulled By: wanchaol

fbshipit-source-id: 5e5591a98b4985a675641dfc9a385b2a0bf8e208
2018-08-03 10:56:05 -07:00
Michael Suo
13de6e8dfa Make list literals construct ListType (#10193)
Summary:
Previously, `foo = [bar, baz]` would construct a TupleType of fixed arity. This would cause code like:
```
foo = [2]
if True:
    foo = [2, 2]
```
to fail to compile, since `(int)` is not the same as `(int, int)`.

This PR changes things so that list literals construct ListTypes, which can be resized.

Potentially breaking changes introduced:
- Empty list literals are now disallowed, `_constructEmptyFooList()` builtins are required to replace them.
- Iterable variable unpacking where the rhs is a list is now disallowed. (Tuples still work)
- Lists must have a single type.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10193

Differential Revision: D9147166

Pulled By: michaelsuo

fbshipit-source-id: bbd1b97b0b6b7cb0e6f9d6aefa1ee9c731e63039
2018-08-03 00:55:23 -07:00
Roy Li
0e9c6898cb Export modules in ir with google protobuf
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9746

Differential Revision: D9110006

Pulled By: li-roy

fbshipit-source-id: 8b9744c042f822fdfe959a7a7fef3d0baff4f639
2018-08-02 15:54:51 -07:00
Elias Ellison
170d29769b Strings lexing, parsing, implementation in print (#9324)
Summary:
This PR adds strings to the ast and implements them for print statements. Strings are lifted as attributes to the print node. They must be arguments to print itself, not as an argument for an object that is passed to print.  If they are encountered elsewhere a NYI exception will be thrown.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9324

Reviewed By: jramseyer

Differential Revision: D8807128

Pulled By: eellison

fbshipit-source-id: 984401ff458ed18d473c6d1bd86750e56c77d078
2018-08-02 11:09:03 -07:00
James Reed
9c818bfbc7 Refactor PythonValue types + use tryMatchSchema for PythonOp
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10132

Differential Revision: D9121327

Pulled By: jamesr66a

fbshipit-source-id: 6d8bcf6b0dca54106cf9ed740bcff857062a03da
2018-08-02 10:26:58 -07:00
iotamudelta
cfa05706ef ROCm contributions week 29 (#9653)
Summary:
In this changeset:
* improvements to `hipify-python.py`
* marking unit tests broken for ROCm
* reducing the number of jobs for the built to avoid out of memory issues
* switch to Thrust/cub-hip master for the CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9653

Differential Revision: D9117791

Pulled By: ezyang

fbshipit-source-id: a6c3c7b81f2bda9825974bf9bf89a97767244352
2018-08-02 09:09:00 -07:00
Runtian Zhou
70d47f92db Add support for rand_like op in fusion compiler (#9795)
Summary:
Enabled support for generating random numbers in fusion compiler. Currently a philox RNG implemented by Tensorflow is used, as the NVRTC couldn't resolve the curand.h header correctly. The two implementation should have exact same behavior according to our tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9795

Differential Revision: D8999029

Pulled By: SsnL

fbshipit-source-id: f0d2616a699a942e2f370bdb02ac77b9c463d7b8
2018-08-02 08:55:25 -07:00
Chunli Fu
57061d600a Auto-batching IR transformation for control flow (#9392)
Summary:
Implement IR transformation for control flow

- `prim::Constant`: clone to new graph directly
- `prim::NumToTensor`: create a `BatchTensor` from output tensor with `batch_size = 1`
- `prim::TensorToNum`: clone to new graph
- `prim::ListConstruct`: clone to new graph
- `prim::If`: execute both `if_block` and `else_block` and combine results from them using `cond`
- `prim::Loop`:
  - for loop
  - while loop: change while `cond` to `cond_any`, use `cond` to update outputs

test case: hand-written LSTM, greedy search, beam search
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9392

Differential Revision: D8822369

Pulled By: ChunliF

fbshipit-source-id: 8f03c95757d32e8c4580eeab3974fd1bc429a1e5
2018-08-01 22:24:35 -07:00