Commit Graph

33 Commits

Author SHA1 Message Date
Sebastian Messmer
bbc53bffef AliasAnalysisKind::CONSERVATIVE/FROM_SCHEMA (#22175)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22175

- Rename AliasAnalysisKind::DEFAULT to AliasAnalysisKind::CONSERVATIVE
- Introduce AliasAnalysisKind::FROM_SCHEMA that means the alias annotations of the schema should be honored
- Introduce AliasAnalysisKind::INTERNAL_SPECIAL_CASE to be able to run assertions that internal special cased ops are treated correctly

- aten:: and prim:: ops are not treated as special cases anymore, but just use AliasAnalysisKind::FROM_SCHEMA
- There's a set of assertions to ensure that aten:: and prim:: ops are all correctly set up to use AliasAnalysisKind::FROM_SCHEMA. Once this PR lands and passes all tests, we will remove those assertions and open up for the possibility of different AliasAnalysisKind settings for aten:: and prim:: ops

Differential Revision: D15929595

fbshipit-source-id: 7c6a9d4d29e13b8c9a856062cd6fb3f8a46a2e0d
2019-07-25 11:53:51 -07:00
Bram Wasti
82ceeaeca2 Add options to jit's operator constructor (#21315)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21315
ghimport-source-id: 168ddecb333a8cb309e7b859683de9b077123205

Differential Revision: D15614506

Pulled By: bwasti

fbshipit-source-id: ae013a88e2069c38845b5b8ff805db96ab2c29e9
2019-06-03 19:30:22 -07:00
Sebastian Messmer
384d828ea5 Add aliasAnalysis to torch::RegisterOperators() (#21084)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21084

- Now AliasAnalysisKind can be set using the torch::RegisterOperators() API
- This also allows us to remove the last place in torch::jit::RegisterOperators that didn't use c10 yet.

Reviewed By: dzhulgakov

Differential Revision: D15542097

fbshipit-source-id: ea127ecf051a5c1e567e035692deed44e04faa9e
2019-05-31 13:51:07 -07:00
Bram Wasti
035966d538 Add options to Operator to enable registration of alias analysis passes (#19382)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19382
ghimport-source-id: aeaad3b84ea20dd95b38635ca28c5ff657187909

Differential Revision: D14990873

Pulled By: bwasti

fbshipit-source-id: e1292ac8358ca8ff5bad8d8aeaddf06c23e66067
2019-05-06 15:40:13 -07:00
Sebastian Messmer
094678c04b Split function schema parser from operator (#19280)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19280

We want to use the function schema parser from ATen/core, but with as little dependencies as possible.
This diff moves the function schema parser into its own file and removes some of its dependencies.

Reviewed By: dzhulgakov

Differential Revision: D14931651

fbshipit-source-id: c2d787202795ff034da8cba255b9f007e69b4aea
2019-04-17 23:50:15 -07:00
Michael Suo
242743eedb Revert D14901379: [jit] Add options to Operator to enable registration of alias analysis passes
Differential Revision:
D14901379

Original commit changeset: d92a497e280f

fbshipit-source-id: 51d31491ab90907a6c95af5d8a59dff5e5ed36a4
2019-04-17 16:56:14 -07:00
Bram Wasti
3a031c414a Add options to Operator to enable registration of alias analysis passes (#18589)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18589
ghimport-source-id: dab203f6be13bf41963848f5315235b6bbe45c08

Differential Revision: D14901379

Pulled By: bwasti

fbshipit-source-id: d92a497e280f1b0a63b11a9fd8ae9b48bf52e6bf
2019-04-17 13:14:55 -07:00
Sebastian Messmer
be364ac8d7 Specify overload name in function schema (#18037)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18037

The FunctionSchema can now store an overload name and the parser knows how to parse it. Specify like this:

    my_func.overload1(arg1: Tensor) -> Tensor
    my_func.overload2(arg1: Tensor, arg2: Tensor) -> Tensor

Reviewed By: zdevito

Differential Revision: D14467497

fbshipit-source-id: 8832b32f07351bb61090357b17b77a6a2fed3650
2019-03-15 16:58:13 -07:00
Sebastian Messmer
16468a9f45 Automatically register c10 ops with JIT (#16534)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16534

All c10 ops from the c10 dispatcher are now automatically registered with JIT

Reviewed By: dzhulgakov

Differential Revision: D13869275

fbshipit-source-id: 5ab5dec5b983fe661f977f9d29d8036768cdcab6
2019-02-06 21:21:33 -08:00
Sebastian Messmer
ed4776820a Fix includes for ATen/core/stack.h (#16462)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16462

This file was moved, now we change the includes to the new location and remove the proxy header.

Reviewed By: ezyang

Differential Revision: D13847279

fbshipit-source-id: 4617d52fdcfe785cb7b2154460a6686c437abd8b
2019-01-29 23:33:13 -08:00
Mikhail Zolotukhin
47bf30661f Directly include headers from ATen.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16287

Differential Revision: D13792949

Pulled By: ZolotukhinM

fbshipit-source-id: d627d8dc469df048063c70d0b5b8d33fede809a3
2019-01-24 11:22:27 -08:00
Elias Ellison
b1529eeadb Print out operator suggestions for unknown builtin op (#15183)
Summary:
This improves the error message for "unknown builtin op" to suggest similarly named ops.

Currently it prints out all operators with a name within two edits.

Related issue: https://github.com/pytorch/pytorch/issues/13409
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15183

Differential Revision: D13578509

Pulled By: eellison

fbshipit-source-id: 5c73408eda1f7aa456f5bd28790c34df0c76aeca
2019-01-04 13:04:44 -08:00
Michael Suo
f636dc9276 clang format world (#15524)
Summary:
The PR clang-formats everything in `torch/csrc/jit/` and adds it to the pre-commit hook.

Here is a list of non-mechanical changes:
- I went over each file and fixed up whenever I could tell that clang-format was clobbering comment formatting.
- Made the macros in register_prim_ops a little more clang-format friendly by omitting trailing commas
- Refactored autodiff.cpp to use a helper class with explicit state rather than a bunch of capturing lambdas
- Small improvements to the precommit hook clang-format
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15524

Differential Revision: D13547989

Pulled By: suo

fbshipit-source-id: 3ff1541bb06433ccfe6de6e33f29227a2b5bb493
2018-12-26 06:55:01 -08:00
Ailing Zhang
6ab2e7442d Autograd using torchscript (#14604)
Summary:
This PR enables autodiff to use the forward/backward graph compiled from python code, instead of using symbolic gradients(modifying the original graph directly).

We put the map in a separate .h file for now to wait for the native_functions.yaml and derivatives.yaml merge. This should ideally go into native_functions.yaml eventually.

This PR should be enough to unblock us for now, we can start writing gradients for aten functions in python.

Differential Revision: D13494635

Pulled By: ailzhang

fbshipit-source-id: f8d51a15243ac46afd09d930c573ccdfcd9fdaaf
2018-12-18 19:10:57 -08:00
Edward Yang
517c7c9861 Canonicalize all includes in PyTorch. (#14849)
Summary:
Anywhere we used #include "foo.h", we now say #include <foo.h>
Paths are adjusted to be rooted out of aten/src, torch/lib, or
the root level directory.

I modified CMakeLists.txt by hand to remove TH and THC from
the include paths.

I used the following script to do the canonicalization:

```
  import subprocess
  import re
  import os.path

  files = subprocess.check_output(['git', 'ls-files']).decode('utf-8').rstrip().split('\n')
  for fn in files:
      if not any(fn.endswith(suff) for suff in ['.cu', '.cpp', '.in', '.h', '.hpp', '.cu', '.cuh', '.cc']):
          continue
      if not any(fn.startswith(pref) for pref in ["aten/", "torch/"]):
          continue
      with open(fn, 'r') as f:
          c = f.read()
      def fmt(p):
          return "#include <{}>".format(p)
      def repl(m):
          p = m.group(1)
          if p in ["dlfcn.h", "unistd.h", "nvrtc.h", "cuda.h", "cuda_runtime.h", "cstdint", "cudnn.h", "Python.h", "cusparse.h", "cuda_runtime_api.h", "cuda_fp16.h", "cublas_v2.h", "stdint.h", "curand_kernel.h"]:
              return fmt(p)
          if any(p.startswith(pref) for pref in ["torch/csrc", "c10/", "ATen/", "caffe2/", "TH/", "THC/", "Eigen/", "gtest/", "zdl/", "gloo/", "onnx/", "miopen/"]):
              return fmt(p)
          for root in ["aten/src", "torch/lib", ""]:
              for bad_root in [os.path.dirname(fn), "aten/src/TH", "aten/src/THC", "torch/csrc"]:
                  new_p = os.path.relpath(os.path.join(bad_root, p), root)
                  if not new_p.startswith("../") and (os.path.exists(os.path.join(root, new_p)) or os.path.exists(os.path.join(root, new_p + ".in"))):
                      return fmt(new_p)
          print("ERROR: ", fn, p)
          return m.group(0)
      new_c = re.sub(r'#include "([^"]+)"', repl, c)
      if new_c != c:
          print(fn)
          with open(fn, 'w') as f:
              f.write(new_c)
```

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14849

Reviewed By: dzhulgakov

Differential Revision: D13363445

Pulled By: ezyang

fbshipit-source-id: 52361f878a672785f9306c9e9ab2513128092b68
2018-12-08 19:38:30 -08:00
Zachary DeVito
aef9e76283 Get pretty printer ready for use as a serialization format (#13616)
Summary:
Get pretty printer ready for use as a serialization format

This PR adds a bunch of functionality to the pretty printer (now called python_printer to reflect
the fact that it will be used to output valid python source). The idea is to get the printer
ready for use as serialization format.  This PR does not have tests beyond what the pretty
printer already had. PRs stacked on this one will do round-trip export/import to test this functionality more robustly.

Notes:
* PythonPrinter is an evolution of the original pretty printer. However, much of it has changed so it is best just to
  read it as a new implementation. Trying to correlate it to the original implementation is probably not much help.
* The printer tries to get reasonably close to how the original function was likely written, such as
  writing expressions rather than making intermediates when possible. We may decide to turn this off
  for the actual serialization, but it is useful for pretty printing.
* tensor field access was changed so that prim::device and family have schema
* fixed a bug in the compiler where setUniqueName gets called even when a value already has one.
  this sometimes assigned really poor names to graph inputs
* Graph::insert gains an optional range argument to make range-preserving inserts easier.
* prim:: ops that can have schema now have schema. This is because when we parse them back in,
  we will need the schema to correctly set their output types.
* there is code in the python printer to complain if you try to add a prim op and do not update the printer.
* BuiltinModule is generalized to take an operator namespace and a version number for work in future commits.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13616

Reviewed By: goldsborough

Differential Revision: D13008252

Pulled By: zdevito

fbshipit-source-id: 32b33bc6410d6ca1c6f02bd6e050f8d5eea32083
2018-11-12 10:21:30 -08:00
mruberry
6fe089c6ea Hierarchical device independent -> device specific architecture (#13108)
Summary:
This PR principally redesigns the fuser's logical flow to be hierarchical, with device-independent logic directing (relatively little) device-specific logic. This design is based on reviews of XLA, TVM, internal design review at NVIDIA and discussions with fuser owners at Facebook. To further vet the design I have begun developing the next significant PR (extended fusion logic) on top of this architecture and it has made the work significantly easier. This PR also improves fuser modularity, which should make it easier for others to contribute to. Unfortunately, this PR is large and its nature has made breaking it into smaller pieces challenging. Future PRs should be smaller.

The fusion flow is now:

- Fusions are "registered" and "upfront compilation" occurs. The fusion specifications, which includes the graph, go into a thread-safe device-independent cache. Upfront compilation generates some information used later during shape inference.
- Fusions are run, which passes them to an executor that performs shape inference, requests an instantiated fusion from the specification's thread-safe store, and launches them. Launch logic eventually defers to device-specific logic.
- Fusions not previously instantiated are compiled. Compilation is device-specific and arg-specific. Compilation logic eventually defers to device-specific logic.
- If the fusion could not be run because fusion on the requested device is disabled or shape inference fails a fallback is invoked.

This flow can be thought of as PyTorch IR -> Device-Independent Fusion Logic -> Device-Specific Fusion Logic. The current upstream logic is, by contrast, PyTorch IR -> Device-Specific Logic -> Device-Independent Logic, which results in needless code duplication and lack of conceptual clarity. That was my mistake when splitting the fuser off from the rest of the jit and our reviews since then have been incredibly helpful in understanding why the approach in this PR is better.

This PR does not only move code around. It also fixes few couple bugs and makes some logical/code changes.

Bug fixes:
- thread-safety is improved with caches preventing concurrent access
- the nvrtc version is now reviewed to determine the appropriate compute architecture to compile for, fixing a bug that would cause runtime errors if a user's nvrtc didn't support the compute architecture their gpu reported
- an issue with DeviceGuard not setting the device properly and failing silently is worked-around (ezyang mentioned he was reviewing the dynamic registration DeviceGuard uses, which may resolve the issue)

Code/Logical changes:
- "const" now appears many more places (note: I cast const away in operator.h because of some obscure build issues -- I think we should be able to fix this and will take a look while this goes through testing)
- The new flow allowed some redundant code to be removed (AnnotatedGraph is gone, for example, and the more straightforward flow eliminated duplication of effort elsewhere)
- Fallback logic is now also invoked if a fusion is requested on a device that cannot handle fusions
- Use of macros to determine which files are compiled is reduced (though they may come back if the Windows build is unhappy)
- There is no more "common" code or folder, the device-independent logic being at the forefront of the fuser replaces and improves upon the goal of sharing code

apaszke who I promised naming rights to
zdevito who correctly pointed out that the device-independent logic should be the bulk of what the fuser is doing
ngimel who contributed to the design of this architecture
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13108

Reviewed By: gchanan, fmassa

Differential Revision: D12850608

Pulled By: soumith

fbshipit-source-id: 24e2df6dfa97591ee36aeca8944519678c301fa3
2018-10-31 18:13:00 -07:00
Peter Goldsborough
9ea19cb079 Windows CI integration for custom ops (#12928)
Summary:
Resubmission of https://github.com/pytorch/pytorch/pull/11527

ezyang orionr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12928

Differential Revision: D10501342

Pulled By: goldsborough

fbshipit-source-id: 7ce74795aab2f13efeb38f56ce82f53055f5eade
2018-10-23 09:18:09 -07:00
Will Feng
9473e57eca Revert D10444104: [pytorch][PR] Windows CI integration for custom ops
Differential Revision:
D10444104

Original commit changeset: 4c447beeb967

fbshipit-source-id: ead52444aefa27692e3f36dadad986e2313261bd
2018-10-18 14:08:18 -07:00
Peter Goldsborough
12be60cc04 Windows CI integration for custom ops (#11527)
Summary:
This is likely currently broken due to symbol visibility issues, but we will investigate it using this PR.

CC orionr yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11527

Differential Revision: D10444104

Pulled By: goldsborough

fbshipit-source-id: 4c447beeb9671598ecfc846cb5c507ef143459fe
2018-10-18 07:55:05 -07:00
Yangqing Jia
713e706618 Move exception to C10 (#12354)
Summary:
There are still a few work to be done:

- Move logging and unify AT_WARN with LOG(ERROR).
- A few header files are still being plumbed through, need cleaning.
- caffe2::EnforceNotMet aliasing is not done yet.
- need to unify the macros. See c10/util/Exception.h

This is mainly a codemod and not causing functional changes. If you find your job failing and trace back to this diff, usually it can be fixed by the following approaches:

(1) add //caffe2/c10:c10 to your dependency (or transitive dependency).
(2) change objects such as at::Error, at::Optional to the c10 namespace.
(3) change functions to the c10 namespace. Especially, caffe2::MakeString is not overridden by the unified c10::str function. Nothing else changes.

Please kindly consider not reverting this diff - it involves multiple rounds of rebasing and the fix is usually simple. Contact jiayq@ or AI Platform Dev for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/12354

Reviewed By: orionr

Differential Revision: D10238910

Pulled By: Yangqing

fbshipit-source-id: 7794d5bf2797ab0ca6ebaccaa2f7ebbd50ff8f32
2018-10-15 13:33:18 -07:00
James Reed
beeec47041 Sanity checks for tracing (#10841)
Summary:
TODO: integrate into torch.onnx.export -- separate PR

*Problem:* We have a facility to trace PyTorch operations on Python code, but there are several failure modes where the trace is not representative of the actual underlying computation:

* The tracer encountered dynamic control flow
* Some computation escaped the tracer, and appeared as a Constant tensor node in the graph
* Some stateful function was traced, e.g. someone did an optimization in Python by memoizing function outputs

*Objective*: In an ideal world, this whole process would be automated and the user can trust that the system will magically capture the intended semantics from the program. Realistically speaking, we will likely have to settle with a human-in-the-loop error reporting system, allowing for the user to identify problems and modify the source code to allow for tracing.

*Stage 1* (this PR): Output-level checking & graph diff. torch.jit.trace gains a kwarg 'check_inputs', which is a list of tuples of input arguments. We will iterate through the list and trace the function again for each set of check inputs. We'll also interpret the original trace with these inputs and compare output values and graphs, printing a diff of the graph if there is a difference.

Examples:

```
torch.jit.trace(torch.rand(3, 4), check_inputs=[(torch.rand(4, 5),)])
def foo(x):
    y = torch.arange(0, x.shape[0]).float()
    return x + y.unsqueeze(1)
```

```
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Graphs differed across invocations!
	Graph diff:
		  graph(%0 : Dynamic) {
		-   %1 : Dynamic = prim::Constant[value= 0  1  2 [ CPULongType{3} ]]()
		?                                                              ^
		+   %1 : Dynamic = prim::Constant[value= 0  1  2  3 [ CPULongType{4} ]]()
		?                                                +++              ^
		    %2 : int = prim::Constant[value=0]()
		    %3 : Dynamic = aten::_cast_Float(%1, %2)
		    %4 : int = prim::Constant[value=1]()
		    %5 : Dynamic = aten::unsqueeze(%3, %4)
		    %6 : int = prim::Constant[value=1]()
		    %7 : Dynamic = aten::add(%0, %5, %6)
		    return (%7);
		  }
	Node diff:
		- %1 : Dynamic = prim::Constant[value= 0  1  2 [ CPULongType{3} ]]()
		?                                                            ^
		+ %1 : Dynamic = prim::Constant[value= 0  1  2  3 [ CPULongType{4} ]]()
		?                                              +++              ^
	Trace source location:
		dank.py(5): foo
		/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(402): wrapper
		dank.py(3): <module>
	Check source location:
		dank.py(5): foo
		/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(281): check_trace
		/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(408): wrapper
		dank.py(3): <module>
ERROR: Tensor-valued Constant nodes differed in value across invocations. This often indicates that the tracer has encountered untraceable code.
	Node:
		%1 : Dynamic = prim::Constant[value= 0  1  2 [ CPULongType{3} ]]()
	Source Location:
		dank.py(5): foo
		/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(402): wrapper
		dank.py(3): <module>
	Comparison exception:
		Not equal to tolerance rtol=1e-07, atol=0

		(shapes (3,), (4,) mismatch)
		 x: array([0, 1, 2])
		 y: array([0, 1, 2, 3])

```
==

```
torch.jit.trace(torch.rand(3, 4), check_inputs=[(torch.rand(3, 4),)])
def foo(x):
    y = x.data
    return x + y
```

```
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Traced function outputs do not match the Python function outputs.
ERROR: Tensor-valued Constant nodes differed in value across invocations. This often indicates that the tracer has encountered untraceable code.
	Node:
		%1 : Dynamic = prim::Constant[value=<Tensor>]()
	Source Location:
		dank.py(6): foo
		/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(402): wrapper
		dank.py(3): <module>
	Comparison exception:
		Not equal to tolerance rtol=1e-07, atol=0

		(mismatch 100.0%)
		 x: array([0.397137, 0.956105, 0.169478, 0.560292, 0.392568, 0.108441,
		       0.97645 , 0.34412 , 0.951246, 0.793061, 0.557595, 0.770245],
		      dtype=float32)
		 y: array([0.243178, 0.315964, 0.972041, 0.0215  , 0.927751, 0.457512,
		       0.951092, 0.97883 , 0.048688, 0.118066, 0.779345, 0.271272],
		      dtype=float32)
```

==

```
import torch

torch.jit.trace(torch.rand(3, 4), check_inputs=[(torch.rand(4, 4),)])
def foo(x):
    for _ in range(x.size(0)):
        x = torch.neg(x)
    return x
```

```
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Traced function outputs do not match the Python function outputs.
ERROR: Graphs differed across invocations!
	Graph diff:
		  graph(%0 : Dynamic) {
		    %1 : Dynamic = aten::neg(%0)
		    %2 : Dynamic = aten::neg(%1)
		    %3 : Dynamic = aten::neg(%2)
		+   %4 : Dynamic = aten::neg(%3)
		-   return (%3);
		?            ^
		+   return (%4);
		?            ^
		  }
```

==

```
import torch

def foo(x):
    if not hasattr(foo, 'cache'):
        foo.cache = torch.neg(x)
    return x + foo.cache

traced = torch.jit.trace(torch.rand(3, 4), check_inputs=[(torch.rand(3, 4),)])(foo)
```

```
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Traced function outputs do not match the Python function outputs.
ERROR: Graphs differed across invocations!
	Graph diff:
		  graph(%0 : Dynamic) {
		-   %1 : Dynamic = aten::neg(%0)
		+   %1 : Dynamic = prim::Constant[value=<Tensor>]()
		    %2 : int = prim::Constant[value=1]()
		    %3 : Dynamic = aten::add(%0, %1, %2)
		    return (%3);
		  }
	Node diff:
		- %1 : Dynamic = aten::neg(%0)
		+ %1 : Dynamic = prim::Constant[value=<Tensor>]()
	Trace source location:
		test.py(5): foo
		/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(402): wrapper
		test.py(8): <module>
	Check source location:
		test.py(6): foo
		/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(281): check_trace
		/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(408): wrapper
		test.py(8): <module>
```

The following two examples show instances where program semantics are lost in the Python -> trace transformation, and repeated invocation does not give us useful debug information. Further design in underway for catching these scenarios.

```
import torch

torch.jit.trace(torch.rand(3, 4), check_inputs=[(torch.rand(3, 4),)])
def foo(x):
    for i in range(3):
        x[i, :] = torch.zeros(4)
    return x
```

```
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Traced function outputs do not match the Python function outputs.
Exception:
Not equal to tolerance rtol=1e-07, atol=0

(mismatch 100.0%)
 x: array([0.830221, 0.915481, 0.940281, 0.555241], dtype=float32)
 y: array([0., 0., 0., 0.], dtype=float32)
```

==

```
import torch

torch.jit.trace(torch.rand(3, 4), check_inputs=[(torch.rand(5, 6),)])
def foo(x):
    x.view(-1).add_(-x.view(-1))
    return x
```

```
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Traced function outputs do not match the Python function outputs.
Exception:
Not equal to tolerance rtol=1e-07, atol=0

(mismatch 100.0%)
 x: array([0.734441, 0.445327, 0.640592, 0.30076 , 0.891674, 0.124771],
      dtype=float32)
 y: array([0., 0., 0., 0., 0., 0.], dtype=float32)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10841

Differential Revision: D9499945

Pulled By: jamesr66a

fbshipit-source-id: 1f842a32d0b0645259cc43b29700b86d99c59a45
2018-08-28 20:25:26 -07:00
Peter Goldsborough
c101a57a74 Build mechanism for custom operators (#10226)
Summary:
This is the last step in the custom operator implementation: providing a way to build from C++ and Python. For this I:

1. Created a `FindTorch.cmake` taken largely from ebetica with a CMake function to easily create simple custom op libraries
2. Created a ` torch/op.h` header for easy inclusion of necessary headers,
3. Created a test directory `pytorch/test/custom_operator` which includes the basic setup for a custom op.
    1. It defines an op in `op.{h,cpp}`
    2. Registers it with the JIT using `RegisterOperators`
    3. Builds it into a shared library via a `CMakeLists.txt`
    4. Binds it into Python using a `setup.py`. This step makes use of our C++ extension setup that we already have. No work, yey!

The pure C++ and the Python builds are separate and not coupled in any way.

zdevito soumith dzhulgakov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10226

Differential Revision: D9296839

Pulled By: goldsborough

fbshipit-source-id: 32f74cafb6e3d86cada8dfca8136d0dfb1f197a0
2018-08-16 18:56:17 -07:00
Peter Goldsborough
0c848f4179 Python integration for custom operators (#10149)
Summary:
Adds the Python path to custom operators, including dynamically loading operations into Python.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10149

Reviewed By: ezyang

Differential Revision: D9158380

Pulled By: goldsborough

fbshipit-source-id: 3edffa639e8d2959e9e80d1bd4f20ab4a1b3ca02
2018-08-06 13:54:48 -07:00
Peter Goldsborough
87d57dc5f5 Simplified Operator (#10080)
Summary:
zdevito explained that the attributed versions of `Operator`s are no longer necessary. This PR does two things:

1. Removes all code associated with attributed operators,
2. Adds a second kind of state to `Operator` where it is constructed with an `Operation` directly instead of an `OperationCreator`. This will be useful to test custom operators which don't require a node (you can just retrieve it directly).

Now rebased on top of https://github.com/pytorch/pytorch/pull/9801

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10080

Differential Revision: D9113668

Pulled By: goldsborough

fbshipit-source-id: 1276a191c7cf89da1c38488769f2105ce2664750
2018-08-01 09:41:08 -07:00
Zachary DeVito
bf744bea94 Parse and register schema declarations lazily (#9801)
Summary:
This should prevent slow startup times, and will not report as many
errors during static initialization time which are hard to debug

ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9801

Reviewed By: goldsborough

Differential Revision: D8986603

Pulled By: zdevito

fbshipit-source-id: 440d43ab5e8cffe0b15118cb5fda36391ed06dbc
2018-07-31 17:24:24 -07:00
Peter Goldsborough
04939a4745 Match parameter names and = default (#9737)
Summary:
More clang tidy cleanups in `torch/csrc`. This time:

1. `hicpp-use-equals-default` recommends `= default` instead of `{}` for constructors/destructors. This is better practice because it expresses the intent better (https://stackoverflow.com/questions/6502828/what-does-default-mean-after-a-class-function-declaration)
2. `readability-inconsistent-declaration-parameter-name` enforces that parameter names in the declaration match parameter names in the definition. This is just generally useful and can prevent confusion and bugs.

Also updated my script a little bit.

apaszke ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9737

Differential Revision: D9069069

Pulled By: goldsborough

fbshipit-source-id: f7b3f3a4eb4c9fadc30425a153566d3b613a41ae
2018-07-30 14:10:00 -07:00
Adam Paszke
8cb1eef7b9 Unify IR operator representation (stop using attributes in the JIT) (#9807)
Summary:
Based on top of #9763 (first 3 commits belong to that PR). The first commits from this PR are "Stop using attributes ..."

I tried to separate the changes into fairly meaningful commits. I can't split them up into smaller PRs, because everything starts working and all tests pass only after the whole sequence, but hopefully this will make reviewing somewhat easier.

Known issues/regressions/future tasks:
- `aten::lerp` and `aten::clamp` are no longer fusable
- `CreateAutodiffSubgraphs` needs a rewrite
  - It is much more strict now, and will miss a lot of opportunities, especially when viewing ops are involved. Our previous approach was "ignore the assumption on shape availability in gradient formulas to determine differentiability, and hope that shape prop will be robust enough to actually deliver them before we differentiate", which obviously doesn't scale well to more complex cases. We should either work on reducing the size dependency of grad formulas (feasible e.g. for `view`/`reshape`, unfeasible for `squeeze`/`unsqueeze`), or make `CreateAutodiffSubgraphs` integrate some kind of "I could integrate this node into an AD subgraph, but will I be able to infer the shape of its input" reasoning (kind of like a limited shape prop, that doesn't infer anything, and only tells if it *could* infer something).
  - It sometimes creates constant-only (or constants + one node) graphs, which is useless
- Broken `aten::add` in auto-batching, because it gained a non-tensor input. I changed the test for pointwise operations to use `aten::mul` instead, but I needed to disable the LSTM cell test. I'm not sure how scalar constants should be implemented in this case, because I don't fully understand our format. cc: ChunliF
- Graph import does some hacks to recover type of constants. This code should be removed once we'll gain the ability to export the IR along with value types.
- There's still a fair amount of dead code that can be removed. I didn't want to make this diff any bigger, and removing it is an easy task.
- Graph fuser could be improved to use signature matching (possibly using `OperatorSet`) instead of basing on node kinds.
- Manual constant propagation for the `ListConstruct` node in `torch/onnx/utils.py` should be replaced with a proper constant propagation pass (or we should ensure that the one we have handles at least this case before we remove this code).

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9807

Reviewed By: ezyang

Differential Revision: D9004285

Pulled By: apaszke

fbshipit-source-id: fe88026a765f6b687354add034c86402362508b7
2018-07-26 22:11:50 -07:00
Peter Goldsborough
f62bc01dfe Remove TORCH_ASSERT (#9575)
Summary:
I got some tensor->variable conversion exceptions from `torch/csrc/autograd/variable.h`, which used the `TORCH_ASSERTM` macros instead of `AT_CHECK`, so they didn't have backtraces. This was such a substantial loss for debugability that I decided to update the whole codebase to use the backtrace-enabled ATen macros instead of `TORCH_ASSERT` and `JIT_ASSERT`, the latter having been an alias of the former.

ezyang apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9575

Differential Revision: D8924566

Pulled By: goldsborough

fbshipit-source-id: 7a4013b13eec9dbf024cef94cf49fca72f61d441
2018-07-24 18:10:06 -07:00
Zachary DeVito
a949245a86 Switch interpreter to use IValue's primitive int/floats (#9718)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9718

This patch switches the interpreter to use IValue's primitive numbers rather than tensors for computing on integers and floats. In addition to preparing the interpreter for first-class support of other types, this cleans up the handling of primitive numbers, making it possible to just use the normal operator overloading dispatch to find the right implementation for numbers. As a result of this change, a lot of other functionality needed to be updated since it was the first time we use non-tensors in a lot of places in the code base.

Notes:
* Fixes code_template.py so that multi-line strings are indented correctly when used on a standalone line
* Cast operators (`int(x)`) now are functional. Some tests have addition conversions to integers because
we no longer allow implicit tensor -> integer conversions following the same convention as in python
* prim::ListConstruct/createList has been added to the interpreter for creating lists and this has
replaced aten::stack for integers lists
* gen_jit_dispatch.py has been refactored so that non-tensor types use operators on IValues to extract
the primitives
* IValue gains a .to<T> method that is the equivalent of tensor_as but for IValue instead of at::Tensor
* `constant_as<T>` is switched over to using IValues's `.to<T>` method, to make conversion from constant->IValue->C++ type
more consistent. This functionality combined with `toIValue(Value*)` replaces the `tensor_as` and `as_tensor` family of functions.
* conditional expressions (if, loop) and operators related to them are now computed on integers rather than tensors
* IValue gains constructors for constructing from at::Scalar and converting to it. However, IValue itself will always store
the scalars as a double or int64.
* To align with python 3 syntax, TK_INT, TK_FLOAT, and TK_BOOL have been removed from the parser, and int/float/bool are just treated as special identifiers in the compiler,
along with print. These are represented as special sugared values with a `call` method implemented. For int/float/bool this implements casting behavior.
* Dropped shared_from_this from Type/Module. They were not needed and they making debugging harder because they internally throw/catch exceptions.
* Shape propagation has been updated to support running nodes that include floating point primitive types, this required some refactoring of internal functions.
* TensorToNum and NumToTensor have actual implementations as operators now
* regster_prim_ops now contains implementations of math operators for float/int primitive types, and for mixed (prim <+> tensor) versions. This removes the need for special handling in compiler.cpp
* Primitive math is now entirely handled by letting the compiler choose the right overloads. This removes tons of special casing in the compiler.
* incorporates eellison's change to allow casting from return values. Due to the addition of primitive support, the code need slight modifications, so I just pre-merged it here.
* stack.h gains generic vararg versions of push/pop that know how to convert to/from C++ types:

```
at::Tensor a;
at::Scalar b;
pop(stack, a, b);
at::Tensor c = a + b;
push(stack, c);
```
apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9584

Reviewed By: apaszke

Differential Revision: D8910546

Pulled By: zdevito

fbshipit-source-id: 0f3e60d4d22217f196a8f606549430e43b7e7e30
2018-07-23 14:11:11 -07:00
peter
53083b8353 Remove CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS and fix CUDA 8 build on Windows (#9491) (#9491)
Summary:
Fixes #9092.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9491
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9693

Differential Revision: D8946850

Pulled By: ezyang

fbshipit-source-id: bd816f459ab70f6b4a0983305a1ce341bb633707
2018-07-23 06:40:39 -07:00
Adam Paszke
769cb5a640 Add new ways of matching nodes with schemas in the JIT (#9567)
Summary:
**REVIEW LAST COMMIT ONLY**

As discussed in our yesterday's meeting. Nodes can be now matched to particular overloads using the `matches(...)` function:
```cpp
n->matches("aten::type_as(Tensor self, Tensor other) -> Tensor")
```

This also changes the shape prop and peephole passes to use those functions for matching. This fixes a few bugs, makes them much more robust, and prepares us for removal of attributes.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9567

Reviewed By: zdevito

Differential Revision: D8938482

Pulled By: apaszke

fbshipit-source-id: eb2382eeeae99692aada2d78d5d0c87c8ef1545e
2018-07-21 21:39:07 -07:00
Zachary DeVito
efefd1d7cf Unify aten_dispatch and aten_schema into a single operator abstraction with human-readable schema. (#8885)
Summary:
This is a series of two commits that should probably be read separately. They are stacked on top of #9018 since the second commit requires it for correctness.

Commit 1
=======

This commit is the first in a series that will clean up how we handle declaring operators and intrinsics in the JIT to make it more modular and readable. This introduces readable declarations that can be used to register operators and switches gen_jit_dispatch to generate this schema. A follow up PR will remove the dispatch keys like "add-3" and resolve ops directly based on the registered schema, further simplifying the generation process.

* Switches schema over to parsed declarations, in the future this will allow something like:

```
  registry.register_intrinsic("foo(Tensor a, Tensor b) -> Tensor", [](Stack& stack) {
    ...
  })
```

This will allow the scalable registration of intrinsics for lists, tuples, and other ops, as long as meta-data for these ops (e.g. derivatives and size propagation routines).

The declarations resemble those used by PythonArgParser but have been singificantly cleaned up to minimize the number of types that can appear in the declaration. We should strive to get the other parts of PyTorch switched over to this restricted declaration set when possible, but it is too much to do in a single PR. My hope is that eventually we will use a very similar language to describe declarations in C10, and this can serve as a guide for that.

Parsing is done using the script lexer, so it is very robust to whitespace and extensible for future types.

This removes the other way we encoded schema, and makes it easier to see what schema are registered.

Current generated declarations: https://gist.github.com/zdevito/a96a17766fb3a098d69a91ee00abaaf6

* Switches how we handle attempting to use an integer in the place of a fixed-sized int list, such as in conv (e.g. 'int[3] stride=1'). Now that we can statically distinguish between int and Tensor, we handle the expansion as an implicit conversion in the compiler. This allows us to simplify the interpreter since it no longer needs to handle the conversion itself.

* Schema declarations have been changed so that they match the type system in the IR exactly. In particular, attribute_info which was used by liftConstantAttributes has been dropped and constant attributes are lifted purely based on the type of the input. Type conversions in compiler have been simplified due to this change.

* Error highlighting in ErrorReport now only reports at most 20 lines of code, to make reading where an error occurred easier.

Commit 2
=======

This commit unifies aten_dispatch and aten_schema into a single Operator object that both contains schema and implementation information. In the future we can use this object to also contain functionality like shape prop and autodiff needed by all operators. Operators are registered globally, and dispatch logic uses the schema information to figure out which variant to use. Descriptor keys, a frequent source of inscrutable debug errors, have been removed.

* Introduce Operator, to replace TensorOp. Unlike TensorOp, we use Operator for all op implementations, including primitives that may occur in the graphs. The only exceptions are ops that are only known to the interpreter like jumps, and GraphExecutors where we need to record additional debug info.

* Adds a global registry for Operator implementations. aten_dispatch.cpp turns into register_aten_ops.cpp, which registers all the Operators for aten with the operator registry. register_prim_ops.cpp now contains the implementations for primitive operators that used to be in the interpreter. This means that it is now safe to use `getOperation(node)` to lookup the true interpreter function for the node, which will simplify const-propagation passes.

* Remove addInterpreterOpHandler in favor of global operator registry.

* Instead of descriptors, we match Node arguments directly against FunctionSchema describing expected inputs in `matchSchema`. `matchSchema` knows how parse both attributes and positional inputs from a node and match it to the appropriate registered operator. Debug error messages when we try to run an invalid operator are significantly improved: they now automatically display the schema for the op with the same name that are registered.

* Merge aten_schema into regsiter_aten_ops. Each Operator takes a string schema which is parsed to determine when to dispatch to that op.

* Cleans up gen_jit_dispatch.py now that we do not need to write out descriptors.  In particular, skip_scalar_overloads can be removed since Richard's code sorts declarations to put Tensor, Tensor declarations first.

* remove matchSchemaAndLiftConstantAttributes and use emitBuiltinCall instead to remove code duplication

* refactor stack manipulation functions into a separate header file.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8885

Reviewed By: jamesr66a

Differential Revision: D8751048

Pulled By: zdevito

fbshipit-source-id: 312aabfbf88307c5f6ab947b6caf691468b94557
2018-07-10 10:24:48 -07:00