Commit Graph

57 Commits

Author SHA1 Message Date
Adam Paszke
00df09b65d Change specialization rules in GraphExecutors (#10977)
Summary:
**Review last commit only.** Stacked on top of #10949.

This commit fixes a number of issues connected to caching
differentiability status of graphs inside graph executors,
and changes the rules for optimization of differentiable subgraphs.
Previously every one of those was instantiated as a separate graph
executor, but now they are simply heavier-optimized graph regions,
and graph executors are only instantiated for their backward.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10977

Differential Revision: D9600626

Pulled By: apaszke

fbshipit-source-id: dad09a0f586e396afbd5406319c1cd54fbb8a3d3
2018-08-30 22:11:01 -07:00
Edward Yang
f7b02b3a68 Change Tensor/TensorImpl to use c10::intrusive_ptr (#10824)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10824

API additions:
- Tensor(c10::intrusive_ptr<TensorImpl,UndefinedTensor>&&)
- Tensor(const c10::intrusive_ptr<TensorImpl,UndefinedTensor>&)
- Tensor::operator=(Tensor&&) && (for completeness sake)
- TensorBase::unsafeGetTensorImpl()
- TensorBase::unsafeReleaseTensorImpl()
- TensorBase::getIntrusivePtr()
- TensorImpl::type_id()
- Tensor::set_data()
- Tensor::is_same(Tensor)
- Tensor::use_count()
- Tensor::type_id()
- Tensor::scalar_type()
- WeakTensor::is_same(WeakTensor)
- intrusive_ptr::weak_use_count()
- weak_intrusive_ptr::weak_use_count()
- c10::raw::intrusive_ptr::{incref,decref,make_weak}
- c10::raw::weak_intrusive_ptr::{incref,decref,lock}

API changes:
- Tensor::pImpl is no longer public (and now named tensor_impl_)
    - Most methods accessed this way are now accessible on Tensor
      maybe_zero_dim() and set_wrapped_number() being prominent exceptions
      (they are now accessed through unsafeGetTensorImpl())
- Type is no longer friend of Tensor
- TensorBase::reset(TensorImpl*) is deleted
- TensorBase::reset(TensorImpl*, bool should_retain) is deleted
- TensorBase::swap(TensorBaseImpl&) is deleted; use std::swap instead
- TensorBase::get() is deleted; use unsafeGetTensorImpl() instead
- TensorBase::detach() is deleted; use unsafeReleaseTensorImpl() instead
- TensorBase::retain() is deleted; use _raw_incref() instead
- TensorBase::release() is deleted; use _raw_decref() instead
- WeakTensor lost most of its methods (it no longer inherits from
  TensorBase)
- TensorImpl::storage() is now a const method
- Tensor(TensorBase) constructor removed, instead
  we go through getIntrusivePtr().  I'm not sure about
  this change; I happened to have accidentally removed the
  TensorBase constructor and decided to fix call sites,
  but I could go the other way.
- detail::set_data() is deleted; use Tensor::set_data() instead
- c10::raw_intrusive_ptr_target removed; use the functions in c10::raw instead.
  (The reason for this change, is that it is invalid to cast an intrusive_ptr_target*
  to a raw_intrusive_ptr_target* to take advantage of the methods. But there is
  no reason the incref/decref methods shouldn't also work on intrusive_ptr_target;
  it is primarily an API consideration. We can be more standards compliant by
  keeping them as functions, which are universally applicable.)
- intrusive_ptr::reclaim() and weak_intrusive_ptr::reclaim() now work on
  pointers of the NullType. (This counts as a bug fix, because the documentation
  specified that pointers produced by release() are valid to reclaim(), and
  a release() on a null intrusive_ptr produces the NullType::singleton())

Bug fixes:
- Dispatch code for mutable references incorrectly returned
  a reference to a value argument (which would immediately
  go out of scope).  They now correctly return a tensor by
  value.
- intrusive_ptr copy/move assignment did not work correctly when
  an object was assigned to itself. We now check for this case and
  no-op if so. (This bug manifested itself as a Tensor mysteriously
  becoming an UndefinedTensor after lines of code like
  'x = x.mul_(y)')

Other changes:
- The checked cast functions in Utils.h have now been
  renamed and detemplatized into checked unwrap functions.
- Added type_id() and scalar_type() methods to Tensor
- pImpl is no longer public
- Documented what the && overloads are doing
- All occurrences of 'new TensorImpl' (and similar spellings, like 'new THTensor')
  have been expunged. This is NO LONGER a valid way to create a new
  tensor, and if you do this, upon your first incref, you will catch an ASSERT
  failure saying that only tensors created by intrusive_ptr::release() are valid
  to reclaim(). Use c10::make_intrusive instead in this situation.
- IValue is adjusted to use intrusive_ptr instead of Retainable, and all
  other sub-classes of Retainable were modified to use intrusive_ptr.
  When doing this, I had to make the constructors of sub-classes like
  ConstantList public, so that c10::make_intrusive could invoke them.  Fortunately,
  if you incorrectly stack allocate a ConstantList, and then try to get an
  intrusive_ptr to it, it will fail, as stack allocated ConstantLists have refcount 0.
- IValue very narrowly sidesteps the problem of handling NullType, as it
  considers intrusive_ptr<TensorImpl> identical to intrusive_ptr<TensorImpl, UndefinedTensor>
  which is not always true. This was always the case, but there's now a comment
  explaining what's going on.

Some MSVC bugs were uncovered during the preparation of this patch.
They are documented as comments in the code.

Reviewed By: gchanan

Differential Revision: D9481140

fbshipit-source-id: 14a8ea0c231ed88b5715fb86d92730926f9f92fc
2018-08-27 16:11:01 -07:00
Adam Paszke
c8b246abf3 Prevent JIT from overspecializing to every single size configuration (#10844)
Summary:
Please review the expects carefully to make sure there are no regressions. I tried to go over them one by one when they changed, but it's sometimes easy to miss finer details.

Summary of changes:

- Renamed `TensorType` to `CompleteTensorType`. Added a new `TensorType` which records only the scalar type, number of dimensions, and device of a value. The argument behind the rename is to encourage people to use `CompleteTensorType` less, as most passes will only have limited information available. To make transition easier `complete_type->cast<TensorType>()` works, and makes our passes work with both kinds of specialization if they don't need extra the extra detail.
- Renamed `ArgumentSpec` to `CompleteArgumentSpec`. Added a new `ArgumentSpec`, which matches argument only at the level of the new `TensorType`.
- Shape analysis can process graphs with both `CompleteTensorType` and `TensorType`.
- Fuser was a part that heavily relied on full shape information being available. Now, we simply try to fuse the largest possible graphs, and have to do run-time checks to make sure they match the code we generate. If they don't, we fall back to regular interpretation. The shape checks are implementing using an optimized method exploiting algebraic properties of shapes with broadcasting, and the relations of broadcasting with pointwise ops. A full written proof of correctness of the shape checking algorithm is included in a comment in `graph_fuser.cpp`.

zdevito ezyang mruberry ngimel csarofeen
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10844

Differential Revision: D9498705

Pulled By: apaszke

fbshipit-source-id: 0c53c2fcebd871cc2a29c260f8d012276479cc61
2018-08-26 09:54:48 -07:00
Edward Yang
19031c68dc Use intrusive_ptr in Storage; replace unique_ptr<Storage> with Storage (#10488)
Summary:
```
Use intrusive_ptr in Storage; replace unique_ptr<Storage> with Storage

This patch does two major changes:

- It replaces the use of Retainable in Storage with a new implementation
  based on intrusive_ptr.  This will be necessary because Caffe2 will
  be using this class to implement intrusive_ptrs, and we need to
  line these up for the merge.  One good thing about the new implementation is
  that the default copy/move constructors/assignment operators and destructor
  work automatically, instead of needing to be hardcoded into Storage/Tensor.

- It replaces all places where we returned std::unique_ptr<Storage> with
  Storage, collapsing an unnecessary double indirection that is no longer
  necessary now that we have correctly working copy/move constructors.

I didn't initially want to do step (2), but it was very important to
eliminate all bare uses of new Storage and new StorageImpl, and this making
the API change was the most straightforward way to do this.

HOW TO FIX YOUR CODE IN THE NEW API

- You no longer need to dereference the result of tensor.storage() to pass
  it to set.  So, instead of:

      x.set_(*y.storage());

  just write:

      x.set_(y.storage());

- If you were accessing methods on StorageImpl via the pImpl() method, you
  must use the dot operator to run pImpl().  Even better; just drop pImpl,
  we now have method forwarding.  So, instead of:

      storage->pImpl()->data();

  just do:

      storage->data();
      // storage.pImpl()->data() works too but is not as recommended

- storage->getDevice() is no more; instead use storage->device().index()

MISC CODE UPDATES

- retain, release, weak_retain, weak_release and weak_lock are now
  reimplemented using the "blessed API", and renamed to make it
  clearer that their use is discouraged.

- nvcc OS X and general OS X portability improvements to intrusive_ptr

- A new comment in intrusive_ptr describing how stack allocated
  intrusive_ptr_targets work differently than heap allocated ones
  from c10::make_intrusive

CAVEAT EMPTOR

- THStorage_weakRetain used to work on strong pointers, but it NO LONGER
  works with intrusive_ptr.  You must reclaim the strong pointer into a
  real strong pointer, construct a weak pointer from it, and then release
  the strong and weak pointers.  See StorageSharing.cpp for an example.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10488

Reviewed By: gchanan

Differential Revision: D9306134

Pulled By: ezyang

fbshipit-source-id: 02d58ef62dab8e4da6131e1a24834a65c21048e2
2018-08-21 21:39:55 -07:00
Gregory Chanan
00f2731112 Merge THTensor into TensorImpl
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10479

Differential Revision: D9315800

Pulled By: gchanan

fbshipit-source-id: b13ef0de3342600b02b54e0700eb02021a9d1a9e
2018-08-16 08:10:06 -07:00
Edward Yang
64235d5c01 Rewrite TensorImpl to use TensorTypeId. (#10278)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10278

Translation to Backend happens immediately before we go into the
Type universe; otherwise we use TensorTypeId.

I allocated TensorTypeId corresponding exactly to existing ATen
Backend.  Only CPUTensorId and CUDATensorId are relevant in the
Caffe2 universe.

Reviewed By: gchanan

Differential Revision: D9184060

fbshipit-source-id: 9d3989c26f70b90f1bbf98b2a96c57e2b0a46597
2018-08-13 11:20:04 -07:00
Edward Yang
41dce17e22 Delete TensorImpl::type_, replace with backend_/scalar_type_/is_variable_ (#10210)
Summary:
The basic game plan is to stop accessing the type_ field directly,
and instead using the stored backend_, scalar_type_ and
is_variable_ to look up the appropriate Type from Context.
Storage of backend_ and scalar_type_ are new.

At some future point in time, I'd like to look at this code
carefully to see if I can get everything in this codepath inlining.
I didn't do it in this patch because there are circular include
problems making things difficult.

Some other details:

- Added Device::backend() which does what it says on the tin

- SparseTensorImpl is temporarily hard-coded to root in at::Context
  for the appropriate context.  If/when we put this in shared code,
  we'll have to break this dep too, but for now it should be OK.

- There's a stupid problem with globalContext() deadlocking if
  you didn't actually initialize it before loading libtorch.so
  (which is bringing along the variable hooks).  I fixed this by
  reordering the static initializers. Fixes #9784

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10210

Differential Revision: D9150697

Pulled By: ezyang

fbshipit-source-id: 89e2006c88688bcfab0dcee82dc369127c198c35
2018-08-03 18:25:19 -07:00
Zachary DeVito
1f78e06f63 Add g.insertConstant and clean up dead attributes code (#10177)
Summary:
* Changes `insertConstant(g, val)` to `g.insertConstant(val)`.
* Moves SourceRange to its own file to enable it.
* Cleans up dead attribute code in schema matching and graph.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10177

Differential Revision: D9137789

Pulled By: zdevito

fbshipit-source-id: 8a73cfb01a576f02e7e4dce019be9c0a0002989d
2018-08-02 20:45:31 -07:00
Edward Yang
9e85a7a9de Back out "[pytorch][PR] [TENSOR MERGE] Delete type_ field from TensorImpl, replaced with backend_/scalar_typ…" (#10169)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10169

Original commit changeset: 2b4d867abfdc

Reviewed By: pjh5, SsnL

Differential Revision: D9135216

fbshipit-source-id: d5c9f12c3a0f75df224c781e1cd1e323cdfbb0d5
2018-08-02 12:39:01 -07:00
Edward Yang
1d427fd6f6 Delete type_ field from TensorImpl, replaced with backend_/scalar_typ… (#9787)
Summary:
…e_/is_variable_

The basic game plan is to stop accessing the type_ field directly,
and instead using the stored backend_, scalar_type_ and
is_variable_ to look up the appropriate Type from Context.
Storage of backend_ and scalar_type_ are new.

At some future point in time, I'd like to look at this code
carefully to see if I can get everything in this codepath inlining.
I didn't do it in this patch because there are circular include
problems making things difficult.

Some other details:

- Added Device::backend() which does what it says on the tin

- SparseTensorImpl is temporarily hard-coded to root in at::Context
  for the appropriate context.  If/when we put this in shared code,
  we'll have to break this dep too, but for now it should be OK.

- There's a stupid problem with globalContext() deadlocking if
  you didn't actually initialize it before loading libtorch.so
  (which is bringing along the variable hooks).  I didn't fix
  it in this PR; it's tracked in #9784

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9787

Reviewed By: cpuhrsch

Differential Revision: D8980971

Pulled By: ezyang

fbshipit-source-id: 2b4d867abfdc3999a836a220c638c109053145a8
2018-08-01 15:34:56 -07:00
Peter Goldsborough
04939a4745 Match parameter names and = default (#9737)
Summary:
More clang tidy cleanups in `torch/csrc`. This time:

1. `hicpp-use-equals-default` recommends `= default` instead of `{}` for constructors/destructors. This is better practice because it expresses the intent better (https://stackoverflow.com/questions/6502828/what-does-default-mean-after-a-class-function-declaration)
2. `readability-inconsistent-declaration-parameter-name` enforces that parameter names in the declaration match parameter names in the definition. This is just generally useful and can prevent confusion and bugs.

Also updated my script a little bit.

apaszke ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9737

Differential Revision: D9069069

Pulled By: goldsborough

fbshipit-source-id: f7b3f3a4eb4c9fadc30425a153566d3b613a41ae
2018-07-30 14:10:00 -07:00
Adam Paszke
8cb1eef7b9 Unify IR operator representation (stop using attributes in the JIT) (#9807)
Summary:
Based on top of #9763 (first 3 commits belong to that PR). The first commits from this PR are "Stop using attributes ..."

I tried to separate the changes into fairly meaningful commits. I can't split them up into smaller PRs, because everything starts working and all tests pass only after the whole sequence, but hopefully this will make reviewing somewhat easier.

Known issues/regressions/future tasks:
- `aten::lerp` and `aten::clamp` are no longer fusable
- `CreateAutodiffSubgraphs` needs a rewrite
  - It is much more strict now, and will miss a lot of opportunities, especially when viewing ops are involved. Our previous approach was "ignore the assumption on shape availability in gradient formulas to determine differentiability, and hope that shape prop will be robust enough to actually deliver them before we differentiate", which obviously doesn't scale well to more complex cases. We should either work on reducing the size dependency of grad formulas (feasible e.g. for `view`/`reshape`, unfeasible for `squeeze`/`unsqueeze`), or make `CreateAutodiffSubgraphs` integrate some kind of "I could integrate this node into an AD subgraph, but will I be able to infer the shape of its input" reasoning (kind of like a limited shape prop, that doesn't infer anything, and only tells if it *could* infer something).
  - It sometimes creates constant-only (or constants + one node) graphs, which is useless
- Broken `aten::add` in auto-batching, because it gained a non-tensor input. I changed the test for pointwise operations to use `aten::mul` instead, but I needed to disable the LSTM cell test. I'm not sure how scalar constants should be implemented in this case, because I don't fully understand our format. cc: ChunliF
- Graph import does some hacks to recover type of constants. This code should be removed once we'll gain the ability to export the IR along with value types.
- There's still a fair amount of dead code that can be removed. I didn't want to make this diff any bigger, and removing it is an easy task.
- Graph fuser could be improved to use signature matching (possibly using `OperatorSet`) instead of basing on node kinds.
- Manual constant propagation for the `ListConstruct` node in `torch/onnx/utils.py` should be replaced with a proper constant propagation pass (or we should ensure that the one we have handles at least this case before we remove this code).

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9807

Reviewed By: ezyang

Differential Revision: D9004285

Pulled By: apaszke

fbshipit-source-id: fe88026a765f6b687354add034c86402362508b7
2018-07-26 22:11:50 -07:00
Adam Paszke
e39c8043dc Make GraphExecutors work on Stacks instead of variable_tensor_lists (#9763)
Summary:
This is blocking the IR operator unification, because I need to be able to pass scalars to backward functions.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9763

Reviewed By: zou3519

Differential Revision: D8978457

Pulled By: apaszke

fbshipit-source-id: 570b4c3409322459cb0f2592069730a7d586ab20
2018-07-26 12:00:27 -07:00
Edward Yang
6cd0174ff5 Reimplement localScalar as a native function. (#9762)
Summary:
I split it into two parts, _local_scalar and _local_scalar_dense (unchecked)
so I could reuse the sparse logic in both paths.

_local_scalar became a method on Tensor to work around a circular
include problem.

This is resurrected copy of #9652
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9762

Differential Revision: D8972348

Pulled By: ezyang

fbshipit-source-id: 2232dbfc8e1286b8a4a1c67d285c13a7771aad4c
2018-07-25 19:09:39 -07:00
Peter Goldsborough
f62bc01dfe Remove TORCH_ASSERT (#9575)
Summary:
I got some tensor->variable conversion exceptions from `torch/csrc/autograd/variable.h`, which used the `TORCH_ASSERTM` macros instead of `AT_CHECK`, so they didn't have backtraces. This was such a substantial loss for debugability that I decided to update the whole codebase to use the backtrace-enabled ATen macros instead of `TORCH_ASSERT` and `JIT_ASSERT`, the latter having been an alias of the former.

ezyang apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9575

Differential Revision: D8924566

Pulled By: goldsborough

fbshipit-source-id: 7a4013b13eec9dbf024cef94cf49fca72f61d441
2018-07-24 18:10:06 -07:00
Edward Yang
1283834600 Devirtualize TensorImpl::toString (#9758)
Summary:
This can hardly be called an improvement (we now print
CPUFloatType instead of CPUFloatTensor) but it was the
simplest way I could think of devirtualizing this function in
the short term.  Probably need some sort of native function
that gives string information about a tensor.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Approved in #9710
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9758

Differential Revision: D8966935

Pulled By: ezyang

fbshipit-source-id: a4641affe0a6153f90cdd9f4f2a1100e46d1a2db
2018-07-24 11:11:33 -07:00
Edward Yang
099b5ba9d1 Tensor merge PRs from July 20 (#9713)
Summary:
Constituent PRs:

- [x] #9553 Remove unnecessary functions from StorageDerived.h (by cpuhrsch, reviewed by ezyang)
- [x] #9588 Use THTensor/Storage for THVoidTensor/Storage (by cpuhrsch , reviewed by gchanan)
- [x] #9627 Delete context from tensor (by ezyang, reviewed by gchanan)
- [x] #9641 Tensor reorganization (by ezyang, reviewed by gchanan )
- [x] #9647 Remove dim_ from THTensor (by cpuhrsch, reviewed by ezyang)
- [x] #9650 Remove context (by cpuhrsch, reviewed by gchanan and ezyang)
- [x] #9715 Fix Windows build in tensor merge PR (by ezyang, reviewed by gchanan and SsnL)

Upcoming PRs which didn't make this cut:

- [x] #9644 Stride move to TensorImpl, and nits (by ezyang, reviewed by gchanan)
- [ ] #9652 Native localScalar  (by ezyang, **UNREVIEWED AND FAILING TESTS**)
- [x] #9710 Devirtualize TensorImpl::toString (by ezyang, reviewed by gchanan)
- [ ] #9654 Use int64_t instead of ptrdiff_t for size / Rename flag to resizable_  (by cpuhrsch, **CHANGES REQUESTED AND FAILING TESTS**)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9713

Reviewed By: gchanan

Differential Revision: D8960882

Pulled By: ezyang

fbshipit-source-id: 99747b2c5462c7ff6809b67aacb4197626408204
2018-07-23 18:00:41 -07:00
Zachary DeVito
a949245a86 Switch interpreter to use IValue's primitive int/floats (#9718)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9718

This patch switches the interpreter to use IValue's primitive numbers rather than tensors for computing on integers and floats. In addition to preparing the interpreter for first-class support of other types, this cleans up the handling of primitive numbers, making it possible to just use the normal operator overloading dispatch to find the right implementation for numbers. As a result of this change, a lot of other functionality needed to be updated since it was the first time we use non-tensors in a lot of places in the code base.

Notes:
* Fixes code_template.py so that multi-line strings are indented correctly when used on a standalone line
* Cast operators (`int(x)`) now are functional. Some tests have addition conversions to integers because
we no longer allow implicit tensor -> integer conversions following the same convention as in python
* prim::ListConstruct/createList has been added to the interpreter for creating lists and this has
replaced aten::stack for integers lists
* gen_jit_dispatch.py has been refactored so that non-tensor types use operators on IValues to extract
the primitives
* IValue gains a .to<T> method that is the equivalent of tensor_as but for IValue instead of at::Tensor
* `constant_as<T>` is switched over to using IValues's `.to<T>` method, to make conversion from constant->IValue->C++ type
more consistent. This functionality combined with `toIValue(Value*)` replaces the `tensor_as` and `as_tensor` family of functions.
* conditional expressions (if, loop) and operators related to them are now computed on integers rather than tensors
* IValue gains constructors for constructing from at::Scalar and converting to it. However, IValue itself will always store
the scalars as a double or int64.
* To align with python 3 syntax, TK_INT, TK_FLOAT, and TK_BOOL have been removed from the parser, and int/float/bool are just treated as special identifiers in the compiler,
along with print. These are represented as special sugared values with a `call` method implemented. For int/float/bool this implements casting behavior.
* Dropped shared_from_this from Type/Module. They were not needed and they making debugging harder because they internally throw/catch exceptions.
* Shape propagation has been updated to support running nodes that include floating point primitive types, this required some refactoring of internal functions.
* TensorToNum and NumToTensor have actual implementations as operators now
* regster_prim_ops now contains implementations of math operators for float/int primitive types, and for mixed (prim <+> tensor) versions. This removes the need for special handling in compiler.cpp
* Primitive math is now entirely handled by letting the compiler choose the right overloads. This removes tons of special casing in the compiler.
* incorporates eellison's change to allow casting from return values. Due to the addition of primitive support, the code need slight modifications, so I just pre-merged it here.
* stack.h gains generic vararg versions of push/pop that know how to convert to/from C++ types:

```
at::Tensor a;
at::Scalar b;
pop(stack, a, b);
at::Tensor c = a + b;
push(stack, c);
```
apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9584

Reviewed By: apaszke

Differential Revision: D8910546

Pulled By: zdevito

fbshipit-source-id: 0f3e60d4d22217f196a8f606549430e43b7e7e30
2018-07-23 14:11:11 -07:00
Zachary DeVito
9ed2190bdb Add a tagged union type that replaces tensor in the interpreter. (#9368)
Summary:
IValue is short for interpreter value. It is used frequently so a short name is important.
This will allow us to implement more non-tensor types in an efficient way and remove
many hacks from the compiler.

This PR is limited. It only introduces IValue and changes interpreter to use it.
Follow up PRs will:
* Change the way aten_ops consume non-tensor types so that integer lists,
  are no longer represented as Tensors.
* Introduce TensorList as a fundamental type and remove all vararg handling in gen_jit_dispatch
* Change the compiler to implement math on primitive numbers rather than converting to tensors.

jamesr66a  apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9368

Reviewed By: ezyang

Differential Revision: D8817598

Pulled By: zdevito

fbshipit-source-id: 29dce80611ce5f6384234de9d12a67861d2b112f
2018-07-16 15:40:22 -07:00
Adam Paszke
b9f575fc33 Remove legacy code from the JIT (#9323)
Summary:
In particular, get rid of backward tracing and CppOp.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9323

Reviewed By: ezyang

Differential Revision: D8795935

Pulled By: apaszke

fbshipit-source-id: fb7a7eeee41902da35f2a8efd77262ca60fd6bbe
2018-07-11 10:25:38 -07:00
Zachary DeVito
efefd1d7cf Unify aten_dispatch and aten_schema into a single operator abstraction with human-readable schema. (#8885)
Summary:
This is a series of two commits that should probably be read separately. They are stacked on top of #9018 since the second commit requires it for correctness.

Commit 1
=======

This commit is the first in a series that will clean up how we handle declaring operators and intrinsics in the JIT to make it more modular and readable. This introduces readable declarations that can be used to register operators and switches gen_jit_dispatch to generate this schema. A follow up PR will remove the dispatch keys like "add-3" and resolve ops directly based on the registered schema, further simplifying the generation process.

* Switches schema over to parsed declarations, in the future this will allow something like:

```
  registry.register_intrinsic("foo(Tensor a, Tensor b) -> Tensor", [](Stack& stack) {
    ...
  })
```

This will allow the scalable registration of intrinsics for lists, tuples, and other ops, as long as meta-data for these ops (e.g. derivatives and size propagation routines).

The declarations resemble those used by PythonArgParser but have been singificantly cleaned up to minimize the number of types that can appear in the declaration. We should strive to get the other parts of PyTorch switched over to this restricted declaration set when possible, but it is too much to do in a single PR. My hope is that eventually we will use a very similar language to describe declarations in C10, and this can serve as a guide for that.

Parsing is done using the script lexer, so it is very robust to whitespace and extensible for future types.

This removes the other way we encoded schema, and makes it easier to see what schema are registered.

Current generated declarations: https://gist.github.com/zdevito/a96a17766fb3a098d69a91ee00abaaf6

* Switches how we handle attempting to use an integer in the place of a fixed-sized int list, such as in conv (e.g. 'int[3] stride=1'). Now that we can statically distinguish between int and Tensor, we handle the expansion as an implicit conversion in the compiler. This allows us to simplify the interpreter since it no longer needs to handle the conversion itself.

* Schema declarations have been changed so that they match the type system in the IR exactly. In particular, attribute_info which was used by liftConstantAttributes has been dropped and constant attributes are lifted purely based on the type of the input. Type conversions in compiler have been simplified due to this change.

* Error highlighting in ErrorReport now only reports at most 20 lines of code, to make reading where an error occurred easier.

Commit 2
=======

This commit unifies aten_dispatch and aten_schema into a single Operator object that both contains schema and implementation information. In the future we can use this object to also contain functionality like shape prop and autodiff needed by all operators. Operators are registered globally, and dispatch logic uses the schema information to figure out which variant to use. Descriptor keys, a frequent source of inscrutable debug errors, have been removed.

* Introduce Operator, to replace TensorOp. Unlike TensorOp, we use Operator for all op implementations, including primitives that may occur in the graphs. The only exceptions are ops that are only known to the interpreter like jumps, and GraphExecutors where we need to record additional debug info.

* Adds a global registry for Operator implementations. aten_dispatch.cpp turns into register_aten_ops.cpp, which registers all the Operators for aten with the operator registry. register_prim_ops.cpp now contains the implementations for primitive operators that used to be in the interpreter. This means that it is now safe to use `getOperation(node)` to lookup the true interpreter function for the node, which will simplify const-propagation passes.

* Remove addInterpreterOpHandler in favor of global operator registry.

* Instead of descriptors, we match Node arguments directly against FunctionSchema describing expected inputs in `matchSchema`. `matchSchema` knows how parse both attributes and positional inputs from a node and match it to the appropriate registered operator. Debug error messages when we try to run an invalid operator are significantly improved: they now automatically display the schema for the op with the same name that are registered.

* Merge aten_schema into regsiter_aten_ops. Each Operator takes a string schema which is parsed to determine when to dispatch to that op.

* Cleans up gen_jit_dispatch.py now that we do not need to write out descriptors.  In particular, skip_scalar_overloads can be removed since Richard's code sorts declarations to put Tensor, Tensor declarations first.

* remove matchSchemaAndLiftConstantAttributes and use emitBuiltinCall instead to remove code duplication

* refactor stack manipulation functions into a separate header file.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8885

Reviewed By: jamesr66a

Differential Revision: D8751048

Pulled By: zdevito

fbshipit-source-id: 312aabfbf88307c5f6ab947b6caf691468b94557
2018-07-10 10:24:48 -07:00
Zachary DeVito
f74207c99f
Allow autograd to work even when the shape of values cannot be determined (#8641)
This commit implements the solution proposed in https://github.com/pytorch/pytorch/issues/8410
to workaround the need to create zero tensors with the same shape as inputs.
It introduces the concept of a LinearBlock which marks places in the code
where we know if all the inputs to the node are zero, then the outputs
to the node are also zero. Autodiff introduces LinearBlocks around
backwards functions, which have this property. specializeUndef then
propagates Undef nodes using this information.

Notes:
* Since we do not always specialize, we have a pass LowerLinearBlocks
that replaces the block with an if statement that dynamically guards
the Undef case.
* We introduce AutogradAdd which is addition that still works when
its inputs might be undefined. In cases where we specialize this will
get removed in favor of a normal add, but there are cases where
gradient graphs do not specialize (e.g. when they are not differentiable,
but a derivative is required) so it is important for this op to be executable.
2018-06-25 18:40:04 -07:00
Richard Zou
8489c4cc6e
Better support for literals in jit script (#8687)
Addresses #8177

A design doc can be found here: [gist](https://gist.github.com/zou3519/4b7f13f03cc9f3612bd9363e6405fa0a) version or [quip](https://fb.quip.com/azL1AqUckBdo) version

General approach:
- Add NumberType, FloatType, IntType to represent Python numbers, floats and ints.
- Emit these types for python literals
- Change aten_schema such that Scalars are NumberType, int64_t and bool are IntType.
- Emit aten::type_as, prim::NumToTensor, and prim::TensorToNum nodes for tensor-number math. (see examples below)
- Erase NumberType,  prim::NumToTensor, and prim::TensorToNum for ONNX export

### Tensor/number math
```
import torch
@torch.jit.script
def fn(x):
    return x + 1
```
```
graph(%x : Dynamic) {
  %1 : int = prim::Constant[value={1}]()
  %2 : Dynamic = prim::NumToTensor(%1)
  %3 : Dynamic = aten::type_as(%2, %x)
  %4 : Dynamic = aten::add[alpha={1}](%x, %4)
  return (%5);
}
```

### Number/Number Math
```
import torch
@torch.jit.script
def fn(zero):
    c = 1 + 1
    return zero + c
```
```
graph(%zero : Dynamic) {
  %1 : int = prim::Constant[value={1}]()
  %2 : int = prim::Constant[value={1}]()
  %3 : Dynamic = prim::num_to_tensor(%1)
  %4 : Dynamic = prim::num_to_tensor(%2)
  %5 : Dynamic = aten::add[alpha={1}](%3, %4)
  %c : int = prim::TensorToNum(%6)  # this is the result of the addition
  ...
  return (%13);
}
```

List of squashed commits:

* Introduce Python Number types

Added: IntType, FloatType, NumberType with
IntType <: NumberType
FloatType <: NumberType

Changed aten_schema so arguments have corresponding types

* Emit a NumberType for python literals.

Also emit a NumberType for Scalar default values.

* Add prim::NumToTensor and prim::TensorToNum

* Add DynamicType -> NumberType implicit cast for bc

* Better ensureTensor error message

* Add ensureTensorOrNumber. Allow passing Number to some functions

Like the range() construct and slices

* Patch IntList to work.

IntList is still a DynamicType in the frontend: a tensor gets built from
a List[int].

Also, IntList[1] is a "union between int and IntList" the way it is
implemented. If the frontend sees an int being passed for an IntList[1]
arg, it converts it to a tensor as well.

* Enforce some order on schemas to avoid overload ambiguity

add(Tensor, Tensor) should appear earlier than add(Tensor, Scalar). This
matches the order in which python_arg_parser parses its arguments.

* Disable std_dim and var_dim tests.

With the new schema information, std(input, keepdim) and std(input, dim)
are ambiguous. This will need to be fixed at a later date.

* Add NumberType erasure pass.

This is used for ONNX export and to ensure that NumberType information
doesn't reach the interpreter

* Add support for mixed tensor/number math ops.

* Tests for new functionality.

Includes:
- Tensor/number math
- number/number math
- EraseNumberTypes pass test

* Patch tests

Update expect tests for:
- decompose_addmm
- loop unrolling tests

Because python numbers are now NumberType, they cannot be returned by
functions anymore. Work around this by using "torch.full", or by adding
a tensor([0]) (taken from FIXME_zerol()). Both approaches are used
because torch.full is more readable, but it is broken in some cases.

* Add erase_number_types to torch/CMakeLists.txt

* Move math back to emitSimpleExpr from emitSugaredExpr

* Remove some dead lines

* Renable some excluded script/trace tests that are fixed.

* Move some tests to expected failure

* Address some comments (more addressing to come)

* Erase relevant aten::type_as nodes in EraseNumberTypes

I also changed it so that EraseNumberTypes is only called for ONNX
export. It is no longer used to prevent
prim::NumToTensor/prim::TensorToNum from reaching shape_analysis or
interpreter.cpp.

shape_analysis infers the type of the output of these nodes to be the
same as their input.

intepreter.cpp treats both of these nodes as no-ops.

* Add reminder to fix std/var

* Call EraseNumberTypes only when exporting a script module

* Update expects after rebase
2018-06-21 15:43:38 -04:00
Peter Goldsborough
372d1d6735
Create ATen tensors via TensorOptions (#7869)
* Created TensorOptions

Storing the type in TensorOptions to solve the Variable problem

Created convenience creation functions for TensorOptions and added tests

Converted zeros to TensorOptions

Converted rand to TensorOptions

Fix codegen for TensorOptions and multiple arguments

Put TensorOptions convenience functions into torch namespace too

All factory functions except *_like support TensorOptions

Integrated with recent JIT changes

Support *_like functions

Fix in place modification

Some cleanups and fixes

Support sparse_coo_tensor

Fix bug in Type.cpp

Fix .empty calls in C++ API

Fix bug in Type.cpp

Trying to fix device placement

Make AutoGPU CPU compatible

Remove some auto_gpu.h uses

Fixing some headers

Fix some remaining CUDA/AutoGPU issues

Fix some AutoGPU uses

Fixes to dispatch_tensor_conversion

Reset version of new variables to zero

Implemented parsing device strings

Random fixes to tests

Self review cleanups

flake8

Undo changes to variable.{h,cpp} because they fail on gcc7.2

Add [cuda] tag to tensor_options_cuda.cpp

Move AutoGPU::set_index_from into .cpp file because Windows is stupid and sucks

Fix linker error in AutoGPU.cpp

Fix bad merge conflict in native_functions.yaml

Fixed caffe2/contrib/aten

Fix new window functions added to TensorFactories.cpp

* Removed torch::TensorOptions

Added code to generate wrapper functions for factory methods

Add implicit constructor from Backend to TensorOptions

Remove Var() from C++ API and use torch:: functions

Use torch:: functions more subtly in C++ API

Make AutoGPU::set_device more exception safe

Check status directly in DynamicCUDAHooksInterface

Rename AutoGPU to DeviceGuard

Removed set_requires_grad from python_variables.h and warn appropriately in Variable::set_requires_grad

remove python_default_init: self.type()

Add back original factory functions, but with deprecation warnings

Disable DeviceGuard for a couple functions in ATen

Remove print statement

Fix DeviceGuard construction from undefined tensor

Fixing CUDA device compiler issues

Moved as many methods as possible into header files

Dont generate python functions for deprecated factories

Remove merge conflict artefact

Fix tensor_options_cuda.cpp

Fix set_requires_grad not being checked

Fix tensor_new.h

TEMPORARILY put some methods in .cpp files to see if it solves issues on windows and mac

Fix bug in DeviceGuard.h

Missing includes

TEMPORARILY moving a few more methods into .cpp to see if it fixes windows

Fixing linker errors

* Fix up SummaryOps to use new factories

Undo device agnostic behavior of DeviceGuard

Use -1 instead of optional for default device index

Also move DeviceGuard methods into header

Fixes around device index after optional -> int32_t switch

Fix use of DeviceGuard in new_with_tensor_copy

Fix tensor_options.cpp

* Fix Type::copy(

* Remove test_non_float_params from ONNX tests

* Set requires_grad=False in ONNX tests that use ints

* Put layout/dtype/device on Tensor

* Post merge fixes

* Change behavior of DeviceGuard to match AutoGPU

* Fix C++ API integration tests

* Fix flip functions
2018-06-16 00:40:35 -07:00
Adam Paszke
9232afeffa
Add code for TensorBoard visualization of JIT GraphExecutors (#8050) 2018-06-02 20:55:25 +02:00
Zachary DeVito
23dd033b51 Factor python dependency out of interpreter (#7970)
* Factor python dependency out of interpreter

* Remove NO_PYTHON for the autograd engine

If there is no python bindings, then a default Engine is constructed
the first time it is requested.

If the python libraries are loaded, then they override the default
accessor and the default engine becomes a python Engine.

Note: it is possible for two engines to be generated if a non-python
one gets created before the python bindings are loaded. This case
is rare, and just results in additional threads being spawned.

* Fixing AlexNet test which is skipped in CI
2018-06-01 16:07:21 -04:00
Zachary DeVito
286cd04a20
JIT cleanup (#7631)
Cleans up dead code in the JIT:

* Remove interpreter_autograd_function
* Remove Handles
* Remove HandleBuilder
* Remove creates_handles, and tracing_autograd_python_function flags
* Remove unused var_args
* Fix submodules
2018-05-21 10:06:29 -07:00
Zachary DeVito
ce69d3110b
Improve script builtin checking using schema (#7311)
Improve script builtin checking using schema

* This add aten_schema.h which provides a barebones amount of type and
  argument information about each builtin operator
* emitBuiltinCall is updated to use this information rather than
  aten_dispatch to ensure the operator is correct.
* handling of keyword and position arguments now matches python behavior
* There is no longer a requirement that kwargs be constant or that the
  attributes of an op must be entirely constant or non-constant
* compiler now constructs a non-attributed version of the op first and
  then turns it into the constant-attribute version if all attributes
  are constants.
* default arguments for builtins now work
* SugaredValue::call and similar functions now have SourceRange information
  for their arguments so that error reporting is more accurate

Notes:
* This does not try to merge the builtin checking with python arg parser.
  Given that we will eventually have C10 schema which will replace aten_schema,
  we will eventually have a C++ description of the schema and working of that
  description directly will be the easiest form to understand.
* python function calls and script method calls do not support keyword arguments yet.
  When we add this support we should refactor the handling in tryEmitSchema
  that resolves keywords into a common function.

* default arguments work
* keyword arguments to builtins work (still need to extend to calling python and other script methods)
* much better error reporting for incorrect builtins

Lift any constants to attributes on nodes when possible

* Schema  is usable internally in the compiler as
  the function signatures of script functions as well as for builtin
  operators.
* Adds a List[T] class to better represent the arguments to cat/stack
  as a type rather than with custom checking.
* Support kwargs for calls of script methods

A future commit will be needed to add support for:
* calls to script _functions_ which are currently are GraphExecutors without schema info.
* kwargs to python functions, which will require refactoring python op
2018-05-14 14:46:36 -07:00
Zachary DeVito
38bc732b2d
[jit] Change interpreter/fuser to work on Variables only (#7489)
* this removes the flag controlling whether the interpreter works on variables.
* now the interpreter _always_ works on variables
* constants in the IR are still _always_ non-variables, and an assert was added to ensure this.
* as_tensor was split into as_variable and as_tensor since it is sometimes used
  to construct constants in the IR
* I tried changing the IR to also always use variables but that change was much more
  cross cutting and fragile and I never got it working
2018-05-11 13:33:47 -07:00
James Reed
4667983f0f
Fixes for interpreter and ONNX export for translation (#7044)
Fixes for interpreter and ONNX export for translation

Address comments
2018-04-27 22:23:57 -07:00
Zachary DeVito
b7487d42a0
Workaround to make PythonOps traced with torch.jit.trace work correctly. (#6738)
The long-term fix is to remove the handling-creating pathways and
remove all the modes from PythonOp making it into an op that simply
calls a PyObject. Right now ONNX expects PythonOp to hold a
nn.Function, not a generic callable, so completely removing the legacy
pathway will also require changes to how ONNX symbolics are found.
2018-04-24 17:21:00 -07:00
Zachary DeVito
d985cf46f1
Add workaround to fix include warnings in Python 2 builds. (#6716) 2018-04-24 12:30:19 -07:00
James Reed
213fa61706 Implement range for loop in script (#5827)
* Implement range for loop in script

* Fix handling of boolean constants

* Use WithInsertPoint

* Allow dynamic max trip count

* fix symbols

* Fix argument order

* fix test

* Add insert{Input,Output} APIs and use them

* Factor out condition stuff

* clang-format

* Address remaining comments

* Fix tests

* Implement script in AST frontend
2018-03-23 11:55:32 -04:00
Edward Z. Yang
acc409396b
Namespaced symbols (#5820)
* Namespaced symbols

- Our interned strings now have structure, "ns::symname" rather than just
  "symname" before.  We support efficient namespace testing for uniques
  by encoding the namespace in one byte in the Symbol internal representation.
  See torch/csrc/jit/interned_strings.h for a more in-depth implementation
  discussion.

- All uses of ksymbol are now attr::symbol (or some appropriate namespace).
  The valid namespaces are prim, attr, onnx and aten.

- Symbol is bound in Python as a qualified string "attr::symbol", EXCEPT for the
  attribute setting/getting API, whose symbols must always be attr
  symbols; they get special cased to assume strings are passed.
  There's a little bit of naughtiness in the implementation, maybe you know
  how to solve it.

- However, the g.op() convenience function assumes that you're generating
  ONNX operators, unless you explicitly qualify.

- All ATen operators and nodes have built-in interned strings generated
  for them, so you should never have to write a string literal ever again.
  The tracing code is adjusted to use it.

- ONNX exporter now properly tests to see that all operators are in
  onnx namespace before accepting the export.  This is way more
  robust than the previous exporter, which would be willing to
  export capitalized operators which were not actually ONNX operators.

- A slight organizational change for symbolic.py; this module now ONLY
  contains aten operators.  In particular, the exporter for Constant
  has moved into utils.py (along with Undefined, from the C++ side),
  since primitive ops get "special treatment."

- The un-inplacing logic in recording is more robust, so that we don't
  delete a trailing underscore from __and__.  This never affected us
  before because we didn't have any tests for it.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-03-16 13:36:11 -04:00
Zachary DeVito
41285edbb6 [jit] add a compiled script module (#5630)
Add script::Module C++ class to represent script modules
switch AST -> IR conversion to work on Modules/Methods rather than raw graphs
function-only AST -> IR conversion is just a simplified case where there is
only one module with a single method and no parameters.
introduce SugaredValue in compiler.h to represent values in scope in a script
function that are not first-class and that get desugared. This is used to
represent the module's self parameter, as well as python function calls,
and method calls on tensor
provide a Python ScriptModule that provides a nice API on top of script::Module
allowing for the definition of script modules with methods, parameters,
and submodules
Not in this PR but intended for the future:

ScriptModule actually subclasses nn.Module, with most methods implemented
Unification of tracedmodule and script module functionality into one container class.

Detailed changelog:

* Switch compiler over to using Module, but don't
use them yet.

* Remove intermediate attribute encoding in compiler

* Create SugaredValue object to handle resolution
of compiled module.

* switch to_ir to modules, implement Select

* hacky python wrappers

* Private ScriptModule

* Add `define` to script module

* Attributes use TK_LIST_LITERAL

this anticipates adding a real list literal expression to the language.

* Add a metaclass to make sure script stubs are registered

* Add a test

* Doc createResolutionCallback

* Docs and minor editing

* Address PR comments

* Document

* Fix unicode issue
2018-03-12 09:52:40 -04:00
Luca Antiga
396637cdd6 Python-free build of autograd + jit (#5356)
This PR adds the possibility to build the C++ parts of autograd and jit, with no dependency on Python.
The goal is to allow taking a PyTorch IR representation (a tree s-expr) and running it with provided inputs.

Prerequisite: build PyTorch so that codegen runs once.
Instructions:

cd tools/cpp_build
bash build_all.sh
This will build libtorchjit and torchjit_test in tools/cpp_build/build/torchjit-build. The latter basically runs the code in test_jit.cpp for now.

While writing the PR, it turned out that a few of Python.h includes were redundant. They were removed here (PyTorch tests still pass on my machine, we'll see CI).

* Introduce Python-free builds of autograd and jit

* Remove NO_PYTHON ifdef in functions/special
2018-03-08 15:13:10 -05:00
Zachary DeVito
806239d6bd
Fix a bug gen_jit_dispatch.py (#5518)
* Fix a bug gen_jit_dispatch.py

The `fromLast` function is confusing to understand since `fromLast(stack, 0)`
was actually invalid whereas `fromLast(stack, 1)` was the last element.
This created off-by-one bugs in gen_jit_dispatch for some operators.

This changes it to `peek(stack, i, N)` which treats the last `N`
elements of the stack as a list, and extracts element `i` of that list.
This usage reflects how `fromLast` was actually being used in the code.

`peekSlice(stack, i, len, N)` similarly treats the last N elements
as a list but extracts a slice. This enables use to get rid of
drop calls and simplify the dispatch logic.
2018-03-02 10:32:02 -08:00
Peter Goldsborough
a5e1b4efc9 Fix warnings in jit (#5499) 2018-03-01 15:15:35 -05:00
James Reed
55c64e5243 Add Python function calls to JIT script (#5445)
* Add Python function calls to script
* Script compiler gains a `Resolver` object that runs when it does not understand a function call. This decouples the python resolution from the conversion to IR.
2018-02-28 19:45:04 -08:00
Zachary DeVito
39608b0180
Add source information to IR nodes (#5449)
* Add source information to IR nodes

SourceRange information from the script is not propagated to IR nodes.
This information is only used in two places now: the interpreter
wraps errors that occur when an instruction executions and shape
propagation now reports errors on the line where it fails:

    Traceback (most recent call last):
      File "test/test_jit.py", line 1655, in test_script_error
        bar(Variable(torch.rand(10), requires_grad=True), Variable(torch.rand(9), requires_grad=True))
    RuntimeError:
    The size of tensor a (10) must match the size of tensor b (9) at non-singleton dimension 0:
    @torch.jit.script
    def bar(c, b):
        return c / b
               ~~~~~ <--- HERE

In the future, shape propagation should really not report any size
errors and instead just not propagate shapes and let the actual
execution fail. However, this is hard to accomplish while we still
depend on running the op to do shape propagation.
2018-02-28 17:06:18 -08:00
Zachary DeVito
05269b582b
[JIT] Support shape propagation with control-flow (#5391)
Support shape propagation with control-flow

* This allows us to enable optimization in the GraphExecutor for most
  script tests.
* Changes Type to always be present (non-null) on a Value, removing `hasType()`
  and `typeOption()`. A new type kind 'DynamicType' now represents when
  a specific type has not been determined.
* If/Loop nodes propagate shapes/types in the simple cases where types of
  outputs do not change depending on where control flows. In other
  cases, we propagate DynamicType to indicate we do not know what
  the shape will be.
* Remove the `cond` input to the body of Loop to simplify handling in
  interpreter and shape propagation.
* Bugfix for zero-dim contiguousStridesOf
2018-02-26 15:24:05 -08:00
Adam Paszke
a0118533ef
Add a print() function to the JIT script (#5274)
Additionally:
- add support for calling functions that are not methods in the Python frontend
- add an end-to-end test for the Python frontend
- add a capture_stdout helper for checking that `print` actually works
2018-02-24 11:15:55 +01:00
Zachary DeVito
8904616028
add control flow to interpreter (#5293)
* Use stacks in the interpreter/aten_dispatch

Rather than have separate input/output lists,
the interpreter now works using a single stack.
Operators in the interpreter push/pop from the stack.
This allows ownership of tensors to transfer directly to an operator,
and an operator can drop the reference to a tensors as soon as it is
no longer needed. This is important for the GraphExecutor op,
which recursively runs the interpreter.

Once autograd is updated to pass variables to Function by value,
we will be able to ensure that we release ownership as soon as possible.

This commit also switches the interpreter to use a fake
tensor 'ContainerTensor' rather than at::Retainable to hold non-tensor
data in the interpreter. This allows us to use std::vector<at::Tensor>
for all registers, which is significantly less confusing than the
OwnedRetainables struct it was replacing.

* Add If and Loop to interpreter

* Preprocess loop to calculate where references to tensor should be dropped
* Add control instructions JumpZ/JumpNZ/Jump
* Switch from explicitly having stage structs to having a single list
  of instructions with Store/Load instructions to take values off the
  initial stack
* Make the interpreter tests executable rather than use expect files
* add a flag to interpreter code so that constants are variables
  if the interpreter is running on variables.

* Add tensor_as to its own file
2018-02-22 19:56:15 -08:00
Peter Goldsborough
702a7f3864 Improve Function interface (#5221)
* Improve Function interface

* Undo tracer changes

* Fix bug in VariableType.set_history

* Rename function_counter and sequence_number to sequence_nr

* Clarify Function documentation

* Replace swap_next_edges with next_edges() getter

* Bring back set_gradient_edge

* Simplify special.cpp

* add_gradient_edge -> create_gradient_edge

* Add mutable getters for pre/post hooks

* Use make_variable with Edge

* Remove remove_gradient_edge in favor of detach_

* Fix documentation and remove create_gradient_edge friend method

* Canonicalize some includes
2018-02-21 16:37:52 -05:00
Peter Goldsborough
2d5fbe6e0d Improve Variable interface (#5127)
* Improve Variable interface

* Address comments from @apaszke and @colesbury

* string ::operator= is not noexcept

* Remove ir.h from tracer_state.h to improve build times

* Make Variable a struct and pack SavedVariable fields

* Implement as_variable_ref

* grad_fn_ptr() -> grad_fn_unsafe()

* Reduce hackiness of set_type hack

* Include variable.h and edge.h in tracer_state.h because it uses them

* class Variable -> struct Variable because Windows cant even

* Make Variable::output_nr uint32_t instead of int

* Add comment about tracing state

* Replaced more static_cast<Variable&> and improve docs

* Remove SavedVariable destructor and construct members in init list

* Clarify docs for Variable

* Variable::set_version -> set_version_counter
2018-02-12 23:26:26 -05:00
Peter Goldsborough
25e946bf78 Replace edge_type with Edge and create Variable::gradient_edge() (#5030) 2018-02-07 10:50:42 -08:00
Zachary DeVito
c308e03f3e
Initial GraphExecutor Implementation. (#4982)
This adds the initial implementation of graph executor for the new JIT design. It includes a few python tests ensuring that nograd, backward, and double-backward cases work for simple examples and some corner cases. More work needs to be done to performance optimize as there are many extra copies and places where we hold onto variables longer than we should. These are noted in the comments.
2018-02-02 17:45:59 -08:00
Adam Paszke
79d15c52cb
Improve the engine support for functional graph execution (#4690)
Previously the side-effect free grad calculation was performed
using callbacks that could also override the decision to run a
function. However this had a few problems e.g. it forced us to iterate
over pretty much all functions in the graph and drop their buffers.

This patch improves the mechanism, by adding explicit support for this
kind of evaluation in execute(). It's safer, and the algorithm used to
decide which nodes have to be evaluated was replaced with a faster one.
2018-01-18 11:20:30 +01:00
Adam Paszke
17148f891f Fix a leak in JIT interpreter 2018-01-03 13:44:49 -05:00
Zachary DeVito
766312b7f2 Further relax VariableFlags, ... and fix bugs (#4244)
* Further relax VariableFlags

* Allow a requires_grad=True trace to be used for a requires_grad=False
  input by computing the gradient but they not connecting it to the
  input.
* Enable CSE to de-duplicate WLM backwards pass code which calls sum twice.
* Fix a bug in the interpreter that frees a register too early when
  it appears twice in a use list.

* [fuser] Follow all outputs to check if fusion is safe

This bug was introduced when we allowed fusion groups
to fuse together. Previously producers were forced to have a single
output, but now producers that are fusion groups can have multiple outputs.
So now we check the uses of all the outputs of a producer.

* [JIT] Fix handling of undefined inputs

It is not legal to call .data() on variable objects whose tensors
are undefined.
2017-12-20 10:36:22 -05:00