Commit Graph

129 Commits

Author SHA1 Message Date
eellison
393731ab24 Re-land Parsing file check (#18570)
Summary:
The last time I tried to land it there was a merge race with the docs coverage test lol. Re-landing with the fix.

Re-land of https://github.com/pytorch/pytorch/pull/18304
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18570

Differential Revision: D14668859

Pulled By: eellison

fbshipit-source-id: 3825a35ddc6179a0d433d70d22b5c1a96c20b21a
2019-03-29 15:46:59 -07:00
Mikhail Zolotukhin
fca9d9a100 Initial implementation of InsertObserverNodes pass. (#18152)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18152
ghimport-source-id: 1dd5e62c4d93394dcd8d8af2871554575c8d3d1a

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18152 Initial implementation of InsertObserverNodes pass.**
* #18151 Add quant-passes stubs.

gh-metadata: pytorch pytorch 18150 gh/zolotukhinm@gmail.com/2/head

Differential Revision: D14584223

fbshipit-source-id: 30896acc1a8901d22c6a167eb87d2fbaafbbeb6f
2019-03-29 15:08:57 -07:00
Elias Ellison
ff4b6d1a49 Delete batch tensor (#18575)
Summary:
Deleting batch tensor since we are no longer maintaining the project and keeping it functional is blocking other improvements.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18575

Differential Revision: D14671126

Pulled By: eellison

fbshipit-source-id: b42d5b699c4d12171ed95e6d3a977532167f0d2c
2019-03-28 23:13:27 -07:00
Elias Ellison
ffc7158bf2 Revert D14652372: [pytorch][PR] Add parsing to file check
Differential Revision:
D14652372

Original commit changeset: 7430b9d1dc2b

fbshipit-source-id: fa3d0f68515fe53447746469844d2db20c1292e0
2019-03-28 00:12:47 -07:00
Elias Ellison
0daafe0209 Add parsing to file check (#18304)
Summary:
This allows you to embed checks in IR, making the test more readable.

E.g.
```
graph_str = 'graph(%0 : Double(5, 5)):
          # CHECK: aten::relu
          %1 : Double(5, 5) = aten::relu(%0)
          return (%1)'
FileCheck().run(graph_str, parseIR(graph_str))
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18304

Differential Revision: D14652372

Pulled By: eellison

fbshipit-source-id: 7430b9d1dc2b7584704375aac02d7392ecec76a0
2019-03-27 18:16:05 -07:00
Mikhail Zolotukhin
13b95eac55 Add quant-passes stubs. (#18151)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18151
ghimport-source-id: 7d12462971bdf3e5e26a3f150f1fcad05bba1a15

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18152 Initial implementation of InsertObserverNodes pass.
* **#18151 Add quant-passes stubs.**

gh-metadata: pytorch pytorch 18149 gh/zolotukhinm@gmail.com/1/head

Differential Revision: D14584224

fbshipit-source-id: b3d0b5ff797160d5ad23f91f732e627b0129086c
2019-03-25 17:48:54 -07:00
Sebastian Messmer
be364ac8d7 Specify overload name in function schema (#18037)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18037

The FunctionSchema can now store an overload name and the parser knows how to parse it. Specify like this:

    my_func.overload1(arg1: Tensor) -> Tensor
    my_func.overload2(arg1: Tensor, arg2: Tensor) -> Tensor

Reviewed By: zdevito

Differential Revision: D14467497

fbshipit-source-id: 8832b32f07351bb61090357b17b77a6a2fed3650
2019-03-15 16:58:13 -07:00
Elias Ellison
4d2f6f1bbe Remove remaining test jit expects redux (#17924)
Summary:
Trying to reland https://github.com/pytorch/pytorch/pull/17886 since it broke a build and I reverted it
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17924

Differential Revision: D14423842

Pulled By: eellison

fbshipit-source-id: f219e786bd07f7da3b7f9e866981199f5ccf6318
2019-03-12 11:33:34 -07:00
Elias Ellison
f540536dfd Revert D14414435: [pytorch][PR] Remove remaining IR Expect files
Differential Revision:
D14414435

Original commit changeset: 0bfd7ce66ac2

fbshipit-source-id: 02de1814f3c4e581d3798059cee752517b176ed9
2019-03-11 17:36:44 -07:00
Elias Ellison
fd67f6b463 Remove remaining IR Expect files (#17886)
Summary:
Last batch of IR expect files removed. Includes some removal of expect files that are no longer used.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17886

Differential Revision: D14414435

Pulled By: eellison

fbshipit-source-id: 0bfd7ce66ac2f72a57f15f45ebd60b95e80b6c16
2019-03-11 17:32:19 -07:00
Wanchao Liang
ab95b5c6cc Rename prim::Undefined to prim::AutogradZero (#17611)
Summary:
supersedes #17245
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17611

Differential Revision: D14283581

Pulled By: wanchaol

fbshipit-source-id: 8022d02b8a021ea2fee9a18a2c8920eb123200c5
2019-03-01 15:13:18 -08:00
James Reed
d1ed0176df Trace fork and join calls
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16232

Differential Revision: D13772974

Pulled By: jamesr66a

fbshipit-source-id: b2db370271809e26d3301f8cc98eec567db5e62b
2019-01-26 14:42:45 -08:00
Mikhail Zolotukhin
47bf30661f Directly include headers from ATen.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16287

Differential Revision: D13792949

Pulled By: ZolotukhinM

fbshipit-source-id: d627d8dc469df048063c70d0b5b8d33fede809a3
2019-01-24 11:22:27 -08:00
Zachary DeVito
3f6b212e80 Register CPU/CUDA fuser dynamically (#15887)
Summary:
This avoids a bunch of conditional compilation logic
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15887

Reviewed By: eellison

Differential Revision: D13613239

Pulled By: zdevito

fbshipit-source-id: a18fc69676b3ef19b4469ab58d8714d1f6efccbb
2019-01-11 10:50:35 -08:00
Lu Fang
a918f1d9af Adding a hook (wrapper) for non-std stream reader in PyTorchStreamReader (#15551)
Summary:
To implement a stream is very annoying, since it is closely defined with the underlying storage streambuffer.

So in this PR, we add ReadAdapterInterface and PyTorchStreamReader will use it. We implement IStreamAdapter as a wrapper of std::istream. And keep the user interface unchanged.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15551

Reviewed By: zrphercule

Differential Revision: D13568907

Pulled By: houseroad

fbshipit-source-id: 93708cb801248a6c101f35cb14d1631029365c3c
2019-01-04 22:50:07 -08:00
Michael Suo
f636dc9276 clang format world (#15524)
Summary:
The PR clang-formats everything in `torch/csrc/jit/` and adds it to the pre-commit hook.

Here is a list of non-mechanical changes:
- I went over each file and fixed up whenever I could tell that clang-format was clobbering comment formatting.
- Made the macros in register_prim_ops a little more clang-format friendly by omitting trailing commas
- Refactored autodiff.cpp to use a helper class with explicit state rather than a bunch of capturing lambdas
- Small improvements to the precommit hook clang-format
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15524

Differential Revision: D13547989

Pulled By: suo

fbshipit-source-id: 3ff1541bb06433ccfe6de6e33f29227a2b5bb493
2018-12-26 06:55:01 -08:00
Zachary DeVito
056cfaf3ff Method returns a single argument (#15289)
Summary:
This PR changes Method (just Method not all graphs) to always have a single
return argument.

This is part 1 in a set of changes that will enable us to have better handling if early return statements.
The simplification that this change provides greatly reduces the work for the next step.

This change makes it so that Method and Python handle multiple returns in the same way:
* 0 - None
* 1 - <single value>
* many - Tuple[...]

The result is that a lot of special-case handling in compiler.cpp and its
bindings can be removed. It also fixes several bugs in return handling,
including one where return values were not always checked against their
attributed values.

Notes:
* inferTypeFrom is renamed to be more accurate and discourage use.
* This has uncovered some bugs in other components, which are noted in
  the diff.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15289

Differential Revision: D13481649

Pulled By: zdevito

fbshipit-source-id: 0e2242a40bb28cca2d0e8be48bede96195e4858c
2018-12-18 10:44:09 -08:00
Peter Goldsborough
7a61306031 Enable all clang-tidy performance checks (#15198)
Summary:
This PR adds the final set of clang-tidy checks we should add for our codebase: a last set of performance-related checks. Most fixes here are around changing `auto` to `const auto&` in a few places where unnecessary copies were made, and adding `reserve()` calls before loops doing repeated `push_back()`. Also a few cases of calling `std::string::find` with a single-character string literal instead of a single char, which uses a less efficient string search algorithm meant for searching larger substrings.

![image](https://user-images.githubusercontent.com/6429851/49978940-adc1a780-ff01-11e8-99da-a4e431361f07.png)

ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15198

Differential Revision: D13468797

Pulled By: goldsborough

fbshipit-source-id: 2bed1ea1c7c162b7f3e0e1026f17125e88c4d5b2
2018-12-14 13:32:47 -08:00
Richard Zou
b14d6d730a Reuse KernelSpec for FusionGroups with equivalent graphs (#14541)
Summary:
Before this PR, loop unrolling + the graph fuser was creating multiple
FusionGroups with the same bodies (with different variable names) for
JIT LSTMs. Each FusionGroup got registered to a separate fusion key;
each key resulted in a different compilation for the same
specializations.

This PR makes it so that when registering FusionGroups with the fusion
compiler, the compiler first checks the KernelSpec cache to see if the
FusionGroup's graph exists already. If it does, then return the
corresponding KernelSpec's key to share compiled kernels.

In addition, graphs in the KernelSpec cache are canonicalized before
being cached. I added a flag to the canonicalize pass to remove unique
names of values.

This shortens the compile time for a JIT LSTM (seq_len of 100, loop
unroll factor of 8) from 5.3s to 2.3s. Most of this compile time is
running the graph fuser and/or fusion compiler; while this PR
makes it so that there is only one unique kernel in the forward pass,
there are a lot of different kernels (6) in the backward pass
(after loop unrolling) that should be investigated.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14541

Differential Revision: D13324487

Pulled By: zou3519

fbshipit-source-id: b841d82ed35a959b5cfc72db033bf5a7b42cc4fb
2018-12-13 07:54:35 -08:00
Edward Yang
517c7c9861 Canonicalize all includes in PyTorch. (#14849)
Summary:
Anywhere we used #include "foo.h", we now say #include <foo.h>
Paths are adjusted to be rooted out of aten/src, torch/lib, or
the root level directory.

I modified CMakeLists.txt by hand to remove TH and THC from
the include paths.

I used the following script to do the canonicalization:

```
  import subprocess
  import re
  import os.path

  files = subprocess.check_output(['git', 'ls-files']).decode('utf-8').rstrip().split('\n')
  for fn in files:
      if not any(fn.endswith(suff) for suff in ['.cu', '.cpp', '.in', '.h', '.hpp', '.cu', '.cuh', '.cc']):
          continue
      if not any(fn.startswith(pref) for pref in ["aten/", "torch/"]):
          continue
      with open(fn, 'r') as f:
          c = f.read()
      def fmt(p):
          return "#include <{}>".format(p)
      def repl(m):
          p = m.group(1)
          if p in ["dlfcn.h", "unistd.h", "nvrtc.h", "cuda.h", "cuda_runtime.h", "cstdint", "cudnn.h", "Python.h", "cusparse.h", "cuda_runtime_api.h", "cuda_fp16.h", "cublas_v2.h", "stdint.h", "curand_kernel.h"]:
              return fmt(p)
          if any(p.startswith(pref) for pref in ["torch/csrc", "c10/", "ATen/", "caffe2/", "TH/", "THC/", "Eigen/", "gtest/", "zdl/", "gloo/", "onnx/", "miopen/"]):
              return fmt(p)
          for root in ["aten/src", "torch/lib", ""]:
              for bad_root in [os.path.dirname(fn), "aten/src/TH", "aten/src/THC", "torch/csrc"]:
                  new_p = os.path.relpath(os.path.join(bad_root, p), root)
                  if not new_p.startswith("../") and (os.path.exists(os.path.join(root, new_p)) or os.path.exists(os.path.join(root, new_p + ".in"))):
                      return fmt(new_p)
          print("ERROR: ", fn, p)
          return m.group(0)
      new_c = re.sub(r'#include "([^"]+)"', repl, c)
      if new_c != c:
          print(fn)
          with open(fn, 'w') as f:
              f.write(new_c)
```

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14849

Reviewed By: dzhulgakov

Differential Revision: D13363445

Pulled By: ezyang

fbshipit-source-id: 52361f878a672785f9306c9e9ab2513128092b68
2018-12-08 19:38:30 -08:00
Peter Goldsborough
d6c53328f9 Large scale fix of python-related files in torch/csrc/
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14515

Differential Revision: D13247966

Pulled By: goldsborough

fbshipit-source-id: 7a127c508fc576a7a92626dd6b729f660162d628
2018-12-07 13:04:46 -08:00
Michael Suo
95e5a5ae0c basic testing of builtin alias annotations (#14588)
Summary:
Check whether the codegen'd alias annotations actually track alias creation and writes correctly. This could be made more exhaustive, but it's good enough for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14588

Differential Revision: D13312653

Pulled By: suo

fbshipit-source-id: 98de1610ea86deada71957c75c222fff331a0888
2018-12-03 22:31:02 -08:00
Michael Suo
b768db0810 Allow DCE to clean up some mutable ops (#14601)
Summary:
This PR makes DCE a little smarter in the presence of mutable ops. Previously mutable ops could never be cleaned up, now they can be cleaned up if we can prove there are no live uses of any alias sets that the op writes to.

This behavior is optional; if you pass DCE a block instead of a graph, it will do the same thing as before. Also changed `InlineAutographSubgraph` to use the common subgraph utils.

Tested on traced ResNet, and it gets rid of the dead code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14601

Differential Revision: D13309118

Pulled By: suo

fbshipit-source-id: dac2791e7d2ecf219ae717a2759b83c1e927f254
2018-12-03 13:31:08 -08:00
Zachary DeVito
170ff7764f Use a zip archive as our container format (#14521)
Summary:
After consulting with Owen, who pointed out the existence of the miniz library, I decided to take one last shot at using zip as our container format.
miniz makes this surprisingly feasible and I think the benefits of using zip are large enough that we should do it.

This replaces our custom container format with a zip archive, preserving all of the
desirable features of our custom format, such as append-oriented writing, and
mmap'able tensor data while adding a bunch of debugging advantages:

1. You can unzip and explore the container to debug what is going on with a model.
2. You can edit the model using a text editor (e.g. change the definition of a method,
   or editing the json-serialized meta-data), re-zip the file use OSX's native 'Compress'
   option, and re-load the result into pytorch. Note: this enables you to, e.g., print-debug
   serialized models.
3. We can easily enable features like compression in the future.
4. Stock python , without pytorch installed, and other programming languages
   can reasonably consume this format,using json  and zipfile packages, which enables
   people to build tools like visualizers without those visualizers depending on pytorch.
   This will be especially useful if you want to, for instance, write a visualizer in javascript.

Notes:

*  This add miniz (https://github.com/richgel999/miniz) as a dependency. miniz is a self-contained
   library for reading/writing zipfiles that unlike other zip libraries also includes libz
   compatible compress/decompress support. It is a single header and a single C file without
   any other dependencies. Note that the instructions for miniz explicitly state:

   > Please use the files from the releases page in your projects. Do not use the git checkout directly!

   So we have checked in the 'release' source. Miniz supports zip64, and its API is amenable
   to doing zip-align style things to align data.

*  Removes 'size' from RecordRef. This allows you to edit files in the zip archive without
   editing the meta-data file. Very important if you want to print-debug serialized models.

*  PyTorchStreamReader/PyTorchStreamWriter keep mostly the same API (though keys become strings)
   However, their implementation is completely swapped out to use miniz.

*  Code exists to check for the old magic number to give a decent warning to our preview users
   after we change the format.

*  Container version information is now put in a stand-alone 'version' file in the archive
   and serves a similar purpose to the other container version info.

*  All files in the zip archive start at 64-byte boundaries, using an approach similar to
   zip-align. Tests check that this property remains true. While the writer does this,
   the reader doesn't depend on it, allowing user-created archives that can use compression,
   and do not have to align data.

*  Added test to check for > 4GB files and archives. Disabled by default because it takes
   almost 2 minutes to run.

*  torchscript files are now optional: if a submodule does not have methods, it will
   not be written.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14521

Reviewed By: jamesr66a

Differential Revision: D13252945

Pulled By: zdevito

fbshipit-source-id: 01209294c0f6543d0fd716f85a38532249c52f8c
2018-11-30 19:19:29 -08:00
Michael Suo
3fca4bde50 Trace in-place ops (#14254)
Summary:
This PR adds a `try_outplace` option to the tracer. When `try_outplace` is true, the tracer will attempt to out-of-place ops (similar to how things are done today). When it's false, the correct in-place op is emitted.

I made `try_outplace` false by default, but flipped it to true for ONNX export utils. zdevito jamesr66a, anywhere else I should preserve the existing behavior?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14254

Reviewed By: eellison

Differential Revision: D13166691

Pulled By: suo

fbshipit-source-id: ce39fdf73ac39811c55100e567466d53108e856b
2018-11-27 12:40:56 -08:00
Michael Suo
33d091f432 shape analysis fix (#14325)
Summary:
This PR is deceptively large because of an indenting change. The actual change is small; I will highlight it inline
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14325

Differential Revision: D13183296

Pulled By: suo

fbshipit-source-id: fcbf6d5317954694ec83e6b8cc1c989f2d8ac298
2018-11-23 11:24:24 -08:00
Michael Suo
7ea9c674bc migrate subgraph slicing to use moveBefore/moveAfter (#13862)
Summary:
Migrate the `CreateAutodiffSubgraphs` pass to use topologically-safe moves instead of DynamicDAG. This is to unify the interface that we use for determining safe node moves to prepare for mutability.

The pass looks a lot like GraphFuser now, and there's a lot of code duplication. I plan to pull common stuff out into a "subgraph manipulation utils" thing, but didn't want to clutter this PR.

Future steps:
- Get rid of code duplication (see above)
- Use DynamicDAG to back the `moveBefore/After` calls.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13862

Differential Revision: D13072871

Pulled By: suo

fbshipit-source-id: 92e7880ef444e0aefd51df60964bba7feaf42ae0
2018-11-14 17:33:36 -08:00
James Reed
85bde3801b Tracer now records Python variable names (#13441)
Summary:
This is probably slow but it should make the traces more understandable and make debugging easier. Any suggestions for how to make it faster (i.e. make it so we don't have to traverse all of locals() and globals()) would be appreciated
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13441

Differential Revision: D12879763

Pulled By: jamesr66a

fbshipit-source-id: b84133dc2ef9ca6cfbfaf2e3f9106784cc42951e
2018-11-08 13:08:42 -08:00
Michael Suo
5fbaf0eaf8 add augmented assignment ops (#13364)
Summary:
This PR changes the compiler to correctly emit in-place operators for augmented assignments (`+=` and friends).
- To better match the Python AST structure, add an `AugAssign` tree view and make `Assign` apply only to `=` assignments.
- Emit those `AugAssign` exprs in the compiler, dispatching to in-place aten ops for tensors and lowering to simple assignments for scalar types.
- In order to preserve (suspect) ONNX export semantics, add a pass to lower the in-place operators to out-of-place operators.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13364

Differential Revision: D12899734

Pulled By: suo

fbshipit-source-id: bec83be0062cb0235eb129aed78d6110a9e2c146
2018-11-02 00:01:07 -07:00
Adam Paszke
26a8bb62ee Re-enabled mm+add tree batching in the JIT (#13228)
Summary:
I've had to generously increase the range of the CreateADSubgraphs pass, because even though it collapses the RNN loop to a single differentiable subgraphs and a few other nodes, the range uses the distances in the original graph...

cc zdevito zou3519
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13228

Differential Revision: D12871316

Pulled By: zou3519

fbshipit-source-id: 32da6f30f7821e4339034f1a4dec41ed0849abfb
2018-11-01 14:50:17 -07:00
mruberry
6fe089c6ea Hierarchical device independent -> device specific architecture (#13108)
Summary:
This PR principally redesigns the fuser's logical flow to be hierarchical, with device-independent logic directing (relatively little) device-specific logic. This design is based on reviews of XLA, TVM, internal design review at NVIDIA and discussions with fuser owners at Facebook. To further vet the design I have begun developing the next significant PR (extended fusion logic) on top of this architecture and it has made the work significantly easier. This PR also improves fuser modularity, which should make it easier for others to contribute to. Unfortunately, this PR is large and its nature has made breaking it into smaller pieces challenging. Future PRs should be smaller.

The fusion flow is now:

- Fusions are "registered" and "upfront compilation" occurs. The fusion specifications, which includes the graph, go into a thread-safe device-independent cache. Upfront compilation generates some information used later during shape inference.
- Fusions are run, which passes them to an executor that performs shape inference, requests an instantiated fusion from the specification's thread-safe store, and launches them. Launch logic eventually defers to device-specific logic.
- Fusions not previously instantiated are compiled. Compilation is device-specific and arg-specific. Compilation logic eventually defers to device-specific logic.
- If the fusion could not be run because fusion on the requested device is disabled or shape inference fails a fallback is invoked.

This flow can be thought of as PyTorch IR -> Device-Independent Fusion Logic -> Device-Specific Fusion Logic. The current upstream logic is, by contrast, PyTorch IR -> Device-Specific Logic -> Device-Independent Logic, which results in needless code duplication and lack of conceptual clarity. That was my mistake when splitting the fuser off from the rest of the jit and our reviews since then have been incredibly helpful in understanding why the approach in this PR is better.

This PR does not only move code around. It also fixes few couple bugs and makes some logical/code changes.

Bug fixes:
- thread-safety is improved with caches preventing concurrent access
- the nvrtc version is now reviewed to determine the appropriate compute architecture to compile for, fixing a bug that would cause runtime errors if a user's nvrtc didn't support the compute architecture their gpu reported
- an issue with DeviceGuard not setting the device properly and failing silently is worked-around (ezyang mentioned he was reviewing the dynamic registration DeviceGuard uses, which may resolve the issue)

Code/Logical changes:
- "const" now appears many more places (note: I cast const away in operator.h because of some obscure build issues -- I think we should be able to fix this and will take a look while this goes through testing)
- The new flow allowed some redundant code to be removed (AnnotatedGraph is gone, for example, and the more straightforward flow eliminated duplication of effort elsewhere)
- Fallback logic is now also invoked if a fusion is requested on a device that cannot handle fusions
- Use of macros to determine which files are compiled is reduced (though they may come back if the Windows build is unhappy)
- There is no more "common" code or folder, the device-independent logic being at the forefront of the fuser replaces and improves upon the goal of sharing code

apaszke who I promised naming rights to
zdevito who correctly pointed out that the device-independent logic should be the bulk of what the fuser is doing
ngimel who contributed to the design of this architecture
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13108

Reviewed By: gchanan, fmassa

Differential Revision: D12850608

Pulled By: soumith

fbshipit-source-id: 24e2df6dfa97591ee36aeca8944519678c301fa3
2018-10-31 18:13:00 -07:00
Elias Ellison
59f8e8ada7 First step at adding exceptions (#12789)
Summary:
This is a first step towards adding exceptions. We need minimal support in order to begin converting the torch library to weak script mode (which is the main goal here).

Some limitations (that are documented in the tests & compiler):
1. Cannot assign exceptions to variables
2. Any name after raise is being treated as a valid Exception
3. No control flow analysis yet. Below a will be undefined:

if True:
     a = 1
else:
     raise Exception("Hi")
return a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12789

Differential Revision: D12848936

Pulled By: eellison

fbshipit-source-id: 1f60ceef2381040486123ec797e97d65b074862d
2018-10-30 20:25:50 -07:00
James Sun
4d62eef505 Add Future to IValue (#12976)
Summary:
Future now is an IValue. prim::Wait now is replaced by aten::wait

This PR is built on top of #12925
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12976

Differential Revision: D10861483

Pulled By: highker

fbshipit-source-id: 9e17926a625bc502fb12335ef9ce819f25776be7
2018-10-27 10:00:35 -07:00
Lu Fang
9f9f06c937 Improve inline container and add some test (#12993)
Summary:
Added getNextRecord/hasNextRecord methods. Even the model data is stored at the end, we can still read the file from the beginning.

Added gtest to cover reader and writer's code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12993

Reviewed By: yinghai

Differential Revision: D10860086

Pulled By: houseroad

fbshipit-source-id: 01b1380f8f50f5e853fe48a8136e3176eb3b0c29
2018-10-26 12:06:47 -07:00
Zachary DeVito
6c8d47f2af Add methods to FunctionSchema (#12967)
Summary:
We are beginning to use this class in a wider reaching set of use-cases. This PR refactors it so that we always access schema properties through methods. This will make adding extra information like alias information easier (i.e. we can a version of `type()` that returns the type with alias information and another version that returns a type without that information).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12967

Differential Revision: D10502674

Pulled By: zdevito

fbshipit-source-id: a88783ed8f20ab3be6460c12da95f9f940891c44
2018-10-24 10:32:27 -07:00
Lu Fang
e240e89984 move the torch/csrc/jit/serialization.h to caffe2 source folder and rename to inline_container.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12781

Reviewed By: dzhulgakov

Differential Revision: D10436151

Pulled By: houseroad

fbshipit-source-id: 7f59eec21df5acbab0ea693e1a1cd4fa152f05e5
2018-10-18 09:47:19 -07:00
Yangqing Jia
713e706618 Move exception to C10 (#12354)
Summary:
There are still a few work to be done:

- Move logging and unify AT_WARN with LOG(ERROR).
- A few header files are still being plumbed through, need cleaning.
- caffe2::EnforceNotMet aliasing is not done yet.
- need to unify the macros. See c10/util/Exception.h

This is mainly a codemod and not causing functional changes. If you find your job failing and trace back to this diff, usually it can be fixed by the following approaches:

(1) add //caffe2/c10:c10 to your dependency (or transitive dependency).
(2) change objects such as at::Error, at::Optional to the c10 namespace.
(3) change functions to the c10 namespace. Especially, caffe2::MakeString is not overridden by the unified c10::str function. Nothing else changes.

Please kindly consider not reverting this diff - it involves multiple rounds of rebasing and the fix is usually simple. Contact jiayq@ or AI Platform Dev for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/12354

Reviewed By: orionr

Differential Revision: D10238910

Pulled By: Yangqing

fbshipit-source-id: 7794d5bf2797ab0ca6ebaccaa2f7ebbd50ff8f32
2018-10-15 13:33:18 -07:00
James Reed
0f9807ee61 Enable addmm fusion for ONNX export only (#12538)
Summary:
There's some action at a distance issues and not having this is disabling quantization in C2 for prod use cases

ref T34831022
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12538

Differential Revision: D10302931

Pulled By: jamesr66a

fbshipit-source-id: 700dc8c5c4297e942171992266ffb67b815be754
2018-10-11 13:57:50 -07:00
Elias Ellison
00aedfc0e2 constant pooling pass (#12222)
Summary:
Add a pass to move all constants to the beginning of the graph, and deduplicate.

This extends https://github.com/pytorch/pytorch/pull/10231 to also handle constants introduced in inlining, constant propagation, etc.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12222

Reviewed By: driazati

Differential Revision: D10201616

Pulled By: eellison

fbshipit-source-id: bc9c5be26868c8b5414257a0d4462de025aeb9bd
2018-10-08 11:55:02 -07:00
Peter Goldsborough
db8d01b248 Move JIT tests to gtest (#12030)
Summary:
In our #better-engineering quest of removing all uses of catch in favor of gtest, this PR ports JIT tests to gtest. After #11846 lands, we will be able to delete catch.

I don't claim to use/write these tests much (though I wrote the custom operator tests) so please do scrutinize whether you will want to write tests in the way I propose. Basically:

1. One function declaration per "test case" in test/cpp/jit/test.h
2. One definition in test/cpp/jit/test.cpp
3. If you want to be able to run it in Python, add it to `runJitTests()` which is called from Python tests
4. If you want to be able to run it in C++, add a `JIT_TEST` line in test/cpp/jit/gtest.cpp

Notice also I was able to share support code between C++ frontend and JIT tests, which is healthy.

ezyang apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12030

Differential Revision: D10207745

Pulled By: goldsborough

fbshipit-source-id: d4bae087e4d03818b72b8853cd5802d79a4cf32e
2018-10-06 23:09:44 -07:00
Luca Antiga
5be0baefa2 Use streams in JIT serialization, allow JIT serialization to/from buffer (#11932)
Summary:
This PR replaces the use of `std::FILE` with `istream`/`ostream` for JIT serialization.
It uses this mechanism to add the possibility to serialize to/from binary buffers, in addition to files, both in `libtorch` and from Python.

`getExportImportCopy` in `test_jit.py` has been updated so that both file and buffer codepaths are exercised during tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11932

Differential Revision: D10084303

Pulled By: apaszke

fbshipit-source-id: b850801b3932922fa1dbac6fdaed5063d58bc20d
2018-09-28 07:54:27 -07:00
Adam Paszke
51414822f5 Stop moving constants into DifferentiableSubgraphs (#11809)
Summary:
Or even taking them as inputs. This prevents optimizations to happen
either inside the differentiable subgraphs, or in the surrounding graph.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11809

Differential Revision: D10009680

Pulled By: apaszke

fbshipit-source-id: face638566228e470a6deec48dc2aa3a1cce26d4
2018-09-24 13:24:53 -07:00
Adam Paszke
7efbf3a827 Specialize ArgumentSpecs on tuple elements too (#11863)
Summary:
This is pretty important because a common situation of passing LSTM hidden states as a tuple completely trashes performance of a network.

Cleans up all our propagation/undef specialization passes, at a cost of increased complexity of `ArgumentSpec` and `GraphExecutor`. An alternative would be to simply flatten all tuple inputs to a graph ahead of time, but that might just end up being confusing in the future (you never know if you're working with a graph that can have tuple or not).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11863

Differential Revision: D9992814

Pulled By: apaszke

fbshipit-source-id: 0a565a3b23e32f8fa72c0534e07c1ce6187739fc
2018-09-21 14:19:58 -07:00
Mike Ruberry
96d3f968eb Splits CPU and CUDA fusion compilers (#10981)
Summary:
This PR splits the CPU and CUDA fusion compilers, putting them into a new jit/fusers/ directory with jit/fusers/common for common components. In particular:

- A fusion interface is created that allows "fusion handles" to be requested
- The CPU and CUDA fusers implement this interface, with dispatch determined by device
- The fusion compilers, fusion function specializations and resource strings are split
- CPU-specific classes like TempFile and DynamicLibrary are in the CPU fuser
- Common classes likes TensorDesc and the base fusion function class are in jit/fusers/common
- There is still some specialization in jit/fusers/common, but these specializations are small(-ish)
- Updates the build system to remove the dummy interface on Windows and minimize the use of macros

This structure should allow in-flight PRs to easily rebase while providing a clear interface to the fusers.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10981

Reviewed By: soumith

Differential Revision: D9701999

Pulled By: apaszke

fbshipit-source-id: 3b6bec7b97e0444b2a93caa38d9b897f2e68c1b3
2018-09-14 14:05:34 -07:00
Adam Paszke
98e04db955 Implement requires_grad propagation in the JIT (#11586)
Summary:
Previously, we would pretty much assume that all floating point tensors do require grad, which might result in some unnecessary compute.

I don't really like the fact that `TensorType` uses `tensor.is_variable() && tensor.requires_grad()` to infer the value of `requires_grad`, but changing constants to keep variables turns out to be pretty hard. I got halfway there, but it would still need some more work.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11586

Reviewed By: ezyang

Differential Revision: D9813648

Pulled By: apaszke

fbshipit-source-id: 77f77756d18ff7632fca3aa68ce855e1d7f3bdb8
2018-09-13 19:25:26 -07:00
Adam Paszke
a00fa2c614 Release GIL when calling into JIT interpreter
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11541

Differential Revision: D9777909

Pulled By: apaszke

fbshipit-source-id: d0217e203721262f3f131b54ea78f898df0b54ec
2018-09-11 21:55:40 -07:00
Zachary DeVito
7de0332e10 Add initial documentation for JIT (#11357)
Summary:
In addition to documentation, this cleans up a few error message formats.
It also adds infra to find which operators are supported by the JIT automatically, which is then used in the generation of the docs.

The wording and formatting of the docs is not yet polished, but having this will allow our document writers to make faster progress.

Followup PRs will polish the docs and fix formatting issues.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11357

Differential Revision: D9721277

Pulled By: zdevito

fbshipit-source-id: 153a0d5be1efb314511bcfc0cec48643d78ea48b
2018-09-07 14:27:47 -07:00
Richard Zou
68c2e014cb Handling for py2/py3 division differences (#11016)
Summary:
- In Python 2, use of `/` (regardless of int/float/Tensor) causes a compiler error if
  `from __future__ import division` is not imported in the file.
- The / operator is universally set to do "true" division for integers
- Added a `prim::FloorDiv` operator because it is used in loop unrolling.

The error if users use '/' in python 2 without importing from __future__
occurs when building the JIT AST.

cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11016

Differential Revision: D9613527

Pulled By: zou3519

fbshipit-source-id: 0cebf44d5b8c92e203167733692ad33c4ec9dac6
2018-09-05 14:57:38 -07:00
Adam Paszke
00df09b65d Change specialization rules in GraphExecutors (#10977)
Summary:
**Review last commit only.** Stacked on top of #10949.

This commit fixes a number of issues connected to caching
differentiability status of graphs inside graph executors,
and changes the rules for optimization of differentiable subgraphs.
Previously every one of those was instantiated as a separate graph
executor, but now they are simply heavier-optimized graph regions,
and graph executors are only instantiated for their backward.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10977

Differential Revision: D9600626

Pulled By: apaszke

fbshipit-source-id: dad09a0f586e396afbd5406319c1cd54fbb8a3d3
2018-08-30 22:11:01 -07:00
Adam Paszke
f3c3127c67 Don't flatten output lists in the JIT IR (#10949)
Summary:
Operators like aten::chunk used to return a number of tensors, but
now return a list. To make it easier to do shape prop through
aten::chunk and fuse it, I've also introduced prim::ConstantChunk,
which behaves like the previous implementation (has a variable length
output list).

The downside of this PR is that the introduction of more lists to the IR causes the LSTM and MiLSTM graphs to be considered as non-differentiable by the graph executor. I verified that they are still optimize correctly, and my next patch (that changes how the specializations/differentiation works) will restore those.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10949

Reviewed By: zdevito

Differential Revision: D9556823

Pulled By: apaszke

fbshipit-source-id: 33e63b17fc7247cac6cfc05eb7eb9bf069b499ee
2018-08-30 19:54:39 -07:00