Commit Graph

118 Commits

Author SHA1 Message Date
Vitaly Fedyunin
266c1652e6 Back out "Add memory format support to rand_like operator" (#28801)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28801

Original commit changeset: 2a1d47571268
ghstack-source-id: 92748792

Test Plan: buck test language_technology/neural_mt/os/pytorch_translate/test:test_onnx

Reviewed By: ifedan

Differential Revision: D18175304

fbshipit-source-id: ffd61f6e42f256b39b80a6b42d989c238228f25d
2019-10-28 12:44:45 -07:00
Vitaly Fedyunin
04f5325583 Add memory format support to rand_like operator (#27561)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27561

Adds a memory_format keyword argument (positional in C++).

'Preserve' behavior now follows these rules:
1) If the tensor is non-overlapping and dense, the output tensor will have the same strides as the input tensor.
2) If (1) does not hold and the tensor is stored in the channels-last format, the output tensor will also have the channels-last format.
3) The output tensor will be contiguous in all other cases.

---
A dense tensor is a tensor that stores its values in a contiguous block of memory.
A non-overlapping tensor is a tensor whose elements each occupy their own distinct memory location.
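A minimal usage sketch of the rules above, assuming the memory_format keyword is exposed on torch.rand_like as described (note that this commit was backed out in the change above):

```python
import torch

# Sketch only: assumes rand_like accepts memory_format as described in this commit.
x = torch.empty(2, 3, 4, 5).contiguous(memory_format=torch.channels_last)

# Rule 1: x is non-overlapping and dense, so 'preserve' keeps its (channels-last) strides.
y = torch.rand_like(x, memory_format=torch.preserve_format)
print(y.is_contiguous(memory_format=torch.channels_last))  # True

# An explicit format overrides the input layout.
z = torch.rand_like(x, memory_format=torch.contiguous_format)
print(z.is_contiguous())  # True
```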

Test Plan: Imported from OSS

Differential Revision: D17980316

Pulled By: VitalyFedyunin

fbshipit-source-id: 2a1d47571268673de0c6f5ae1b6d4f9110962ab0
2019-10-25 07:29:12 -07:00
Mike Ruberry
ac7996ccd3 Removes SymbolicVariable (#25077)
Summary:
This PR excises the last of SymbolicVariable. There should be no change in functionality. One new test for addmm fusion was added. A case where the peephole optimizer might convert a scalar argument remains untested.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25077

Test Plan: Refactors existing code so mostly covered by current tests. One test for addmm fusion was added.

Differential Revision: D17145334

Pulled By: mruberry

fbshipit-source-id: 6b68faf764f9ee8398b55c43110228ed9faf81eb
2019-08-31 11:19:50 -07:00
Zachary DeVito
bdc57d3833 Merge ProfiledTensorType and TensorType (#24284)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24284

This PR finishes the unification of all Tensor types into a single object.
ProfiledTensorType is renamed to TensorType and the old TensorType is
deleted.

Notes:
* Fixes a bug in merge for VaryingShape by changing its representation to an
  optional list of optional ints (a conceptual sketch follows below).
* Removes ProfiledTensorType::create(type) invocations that can now
  simply be expect calls on the tensor type.
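A conceptual sketch (mine, not the C++ implementation) of how a merge over an optional list of optional ints can behave, where None stands for an unknown rank or an unknown dimension size:

```python
from typing import List, Optional

# Conceptual model of VaryingShape: None = unknown rank,
# a None entry = unknown size in that dimension.
VaryingShape = Optional[List[Optional[int]]]

def merge(a: VaryingShape, b: VaryingShape) -> VaryingShape:
    if a is None or b is None or len(a) != len(b):
        return None  # unknown or disagreeing ranks collapse to "unknown shape"
    return [da if da == db else None for da, db in zip(a, b)]

print(merge([2, 3, None], [2, 4, 5]))  # [2, None, None]
print(merge([2, 3], [2, 3, 4]))        # None
```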

Test Plan: Imported from OSS

Differential Revision: D16794034

Pulled By: zdevito

fbshipit-source-id: 10362398d0bb166d0d385d74801e95d9b87d9dfc
2019-08-20 13:01:28 -07:00
Nikolay Korovaiko
3d15ee1b34 Remove more uses of DimensionedTensorType
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23060

Differential Revision: D16460391

Pulled By: Krovatkin

fbshipit-source-id: b50ee87d22ad18b8cbfff719b199ea876ef172f1
2019-08-01 21:19:28 -07:00
Thomas Viehmann
cf50249bde Disable fusion of grad_sum_to_size (#23372)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/22833

grad_sum_to_size does not commute with AutogradAdd after all because it turns the broadcasting AutogradAdd into a broadcasting add.
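A small numerical illustration (mine, not from the PR) of why a sum-to-size cannot, in general, be moved past a broadcasting add:

```python
import torch

a = torch.randn(2, 3)  # gradient that was broadcast in the forward
b = torch.randn(3)     # gradient that was not

after  = (a + b).sum_to_size(3)       # summing after the broadcasting add double-counts b
before = a.sum_to_size(3) + b         # summing a first, then adding b
print(torch.allclose(after, before))  # False: the two orders disagree
```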

Chillee actually did most of the work tracking this down to the fusion of grad_sum_to_size, and pinged me when he had found the cause. Thank you!

About the choice of removing the fusion completely instead of being more precise:
- We do have grad_sum_to_size elimination, which works for cases where broadcasting does not actually happen in the forward, so the set of cases where fusing grad_sum_to_size is actually beneficial is much smaller than when it was initially proposed.
- There will be less fusion; in terms of the tests, IOU stops being fully fused. I vaguely think that is a case we could handle with refined logic.
- Keeping it would add complexity to the checks for when to merge fusion groups, on top of the complexity this PR removes.
- The future of fusion probably lies in more complete solutions that include reductions (TVM, KeOps, our own, ...).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23372

Differential Revision: D16489930

Pulled By: soumith

fbshipit-source-id: bc0431b0d3eda264c401b634675872c4ce46f0f4
2019-07-25 08:55:33 -07:00
jjsjann123
252710262f (#22775)
Summary:
Passing FusionCallback and Symbol to recursive GraphFuser calls ensures
consistent fusion in nested Blocks.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22775

Differential Revision: D16439979

Pulled By: soumith

fbshipit-source-id: 18d4b13f52b03708b8580c73f75450adbb672ac1
2019-07-25 05:54:03 -07:00
Bram Wasti
05d56bd1b6 Remove hard-coded NVRTC specific constant from fuser header
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22699

Test Plan: Imported from OSS

Differential Revision: D16192290

Pulled By: bwasti

fbshipit-source-id: 4dccaf3e6e0151e86d35474c36e1ddb7f2afb5cf
2019-07-11 13:44:25 -07:00
Thomas Viehmann
17941f9979 JIT: Eliminate SumToSize by using Optional Lists (#18697)
Summary:
This PR eliminates unneeded grad_sum_to_size calls and in particular speeds up the LSTM backward by allowing better fusion.

It consists of two parts:
- In AutoDiff, record broadcasting sizes only if the broadcast output size is different from the input size, otherwise record None.
- The specialization of Optional arguments (#18407) allows us to then eliminate `_grad_sum_to_size(t, None)` in the peephole optimization step.

Thus, in the LSTM case, no SumToSize ops remain in the crucial fusion group. The trick here is that we can specialize on the runtime information from the forward.

I'm testing that different broadcasting situations lead to different graphs.

I didn't move all symbolic_script _grad_sum_to_size to the new logic, but it might be better to do this incrementally, anyway.
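A rough Python sketch (assumptions mine) of the semantics that make the peephole elimination valid: when the recorded size is None, the op is the identity and can be dropped.

```python
import torch
from typing import List, Optional

# Sketch of the intended semantics of _grad_sum_to_size with an optional size:
# None means "no broadcasting happened in the forward", so the op is a no-op.
def grad_sum_to_size(grad: torch.Tensor, size: Optional[List[int]]) -> torch.Tensor:
    if size is None:
        return grad                    # identity -> safe for the peephole pass to remove
    return grad.sum_to_size(*size)

g = torch.randn(4, 3)
print(grad_sum_to_size(g, None).shape)  # torch.Size([4, 3])
print(grad_sum_to_size(g, [3]).shape)   # torch.Size([3])
```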
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18697

Differential Revision: D15482076

Pulled By: wanchaol

fbshipit-source-id: 7f89367e35b8729910077c95c02bccefc8678afb
2019-05-24 11:24:17 -07:00
Wanchao Liang
871c9dcb1d move batchnorm and layernorm fusion to decompose (#20337)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20337
ghimport-source-id: 2196f84f2ef384c1f25587b2fb4bd9dd2f63c2b4

Differential Revision: D15448596

Pulled By: wanchaol

fbshipit-source-id: b66e608f1b72471fc0775aaa4e09f9fa1070fc3c
2019-05-22 18:01:27 -07:00
Bram Wasti
7b733e4fc1 Rebase conflict fix for isFusableDevice (#20251)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20251
ghimport-source-id: 0c8c1847a7979fcd77e4f6618730b170b6b8ce25

Differential Revision: D15262850

Pulled By: bwasti

fbshipit-source-id: 17ecc340a310ddbcce141cfa3ee0efa9660194d2
2019-05-08 12:14:12 -07:00
Bram Wasti
4ca325df87 Add Custom graph fusion (#18588)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18588
ghimport-source-id: f40df177af8b87c73f04bf337f478a62133284cf

Differential Revision: D14901297

Pulled By: bwasti

fbshipit-source-id: 1b6371a5175b3d63dad542b7cc22cb82e8c6cfd0
2019-05-06 23:15:16 -07:00
Mikhail Zolotukhin
8b46938355 Cleanup includes in torch/csrc/jit/* (#19922)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19922
ghimport-source-id: 0434c46bf75621ff79ea27a18a2475e7f13e2487

Differential Revision: D15125015

Pulled By: ZolotukhinM

fbshipit-source-id: 5685edfc94067f62e363a85e9badb7f757b1d321
2019-05-06 13:40:26 -07:00
Zachary DeVito
a425e1cbf8 Remove duplicate inlineCallToCode (#19724)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19724
ghimport-source-id: a68d28ac9bbe62dd61f03bfd9d57f4ef1d0ce9c9

Reviewed By: jamesr66a

Differential Revision: D15078532

Pulled By: zdevito

fbshipit-source-id: bebd34ff6105f538395260b027dc169448b5bc96
2019-04-25 15:53:10 -07:00
Wanchao Liang
c571969148 Fix the insert_guard for norm decomposation (#19646)
Summary:
Move the insert_guard all the way up to the beginning of the decomposition; this fixes the case where we lose the insert-point context after decomposeCommonNormalization but still need to modify the graph.

fixes #19502
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19646

Differential Revision: D15058040

Pulled By: wanchaol

fbshipit-source-id: ebdbf8623ebfe4556c461e1b650e94b905791adb
2019-04-24 23:12:37 -07:00
James Reed
e7fc7c732c Bugfix for fusion device check (#19594)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19594

I missed a callsite

Reviewed By: wanchaol

Differential Revision: D15041457

fbshipit-source-id: eef76ad51bee06a56d31b4ab64f19250fe2ad8f0
2019-04-22 20:55:17 -07:00
James Reed
5be4bee4ff Don't create FusionGroups for known-CPU producer values (#19342)
Summary:
I believe the existing check in FuseGraph was only `false` if PyTorch was built with NO_CUDA=1. Otherwise, we would create fusion groups even on a CPU-only machine running CPU code, which is confusing. Instead, the decision to fuse now depends on whether the producer Value is a known CPU tensor; if it is, we skip fusion.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19342

Differential Revision: D15038351

Pulled By: jamesr66a

fbshipit-source-id: fce9d83929309a7bf14346833f84b996f3e7f6db
2019-04-22 16:57:18 -07:00
Michael Suo
1e94a3bc4d Turn resolver into a class (#19236)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19236
ghimport-source-id: d36705ea5ecff085d0d84ea57bb96d18d7c260dd

Differential Revision: D14928292

Reviewed By: zdevito

Pulled By: suo

fbshipit-source-id: cd038100ac423fa1c19d0547b9e5487a633a2258
2019-04-19 13:01:59 -07:00
Thomas Viehmann
b9291f55bb pow scalar exponent / base autodiff, fusion (#19324)
Summary:
Fixes: #19253

Fixing pow(Tensor, float) is straightforward.
The breakage for pow(float, Tensor) is a bit more subtle to trigger, and fixing it needs `torch.log` (`math.log` didn't work) from the newly merged #19115 (thanks ngimel for pointing out this has landed).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19324

Differential Revision: D15003531

Pulled By: ailzhang

fbshipit-source-id: 8b22138fa27a43806b82886fb3a7b557bbb5a865
2019-04-18 17:58:35 -07:00
Wanchao Liang
a3d3008e73 JIT Layernorm fusion (#18266)
Summary:
Partially fuse layer_norm by decomposing it into the batchnorm kernel that computes the stats, and then fusing the affine operations after the reduce operations. This is similar to the batchnorm fusion apaszke did; it also only works in inference mode for now.
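A conceptual sketch (mine; the real pass reuses the batch-norm stats kernel rather than plain mean/var) of the decomposition: a reduction step for the stats, then pointwise/affine ops that the fuser can handle:

```python
import torch

def decomposed_layer_norm(x, weight, bias, eps=1e-5):
    # Reduction step: compute the stats over the normalized dimension.
    mean = x.mean(dim=-1, keepdim=True)
    var = x.var(dim=-1, unbiased=False, keepdim=True)
    # Everything after the reduction is pointwise/affine and therefore fusable.
    return (x - mean) / torch.sqrt(var + eps) * weight + bias

x = torch.randn(8, 16)
w, b = torch.ones(16), torch.zeros(16)
ref = torch.nn.functional.layer_norm(x, (16,), w, b)
print(torch.allclose(decomposed_layer_norm(x, w, b), ref, atol=1e-6))  # True
```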
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18266

Differential Revision: D14879877

Pulled By: wanchaol

fbshipit-source-id: 0197d8f2a17ec438d3e53f4c411d759c1ae81efe
2019-04-12 14:38:31 -07:00
Zachary DeVito
ef406ee925 First class modules in the compiler, round 2 (#19167)
Summary:
This PR propagates the use of first-class module objects into the compiler. This creates a transitionary state where:

* compiler.cpp creates Graphs where `self` is a Module class and attributes/parameters/buffers/submodules are looked up with `prim::GetAttr`
* GraphExecutor still runs "lowered graphs" where the self object has been removed by a compiler pass `lower_first_class_method`.
* Tracing still creates "lowered graphs", and a pass "lift_lowered_method" creates a first-class method graph for things.

* This PR separates out Method and Function. A script::Function is a pure Graph with no `self` bound. Similar to Python, a script::Method is just a bound `self` and its underlying `script::Function` (see the sketch after the details below).
* This PR also separates CompilationUnit from Module. A CompilationUnit is just a list of named script::Functions. Classes have a CompilationUnit holding the class methods, and Modules also have a CompilationUnit holding their Methods. This avoids the weird circular case Module --has a-> Class -> has a -> Module ...

Details:
* In this transitionary state, we maintain two copies of a Graph, first-class module and lowered. The first-class one has a self argument that is the module's class type. The lowered one is the lowered graph that uses the initial_ivalues inputs.
* When defining lowered methods using `_defined_lowered` we immediately create the first-class equivalent. The reverse is done lazily, creating lowered_methods on demand from the class.
* The two-way conversions will be deleted in a future PR when the executor itself runs first-class objects. However, this requires more changes to (1) the traces, (2) the python bindings, and (3) the onnx export pass, and would make this PR way too large.
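A purely illustrative sketch (names and fields are mine, not the real script::* C++ classes) of the Function / Method / CompilationUnit split described above:

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class Function:          # stands in for script::Function: an unbound graph
    name: str
    graph: str           # placeholder for the real Graph IR

@dataclass
class Method:            # stands in for script::Method: `self` bound to a Function
    self_obj: "Module"
    function: Function

@dataclass
class CompilationUnit:   # a list of named Functions
    functions: Dict[str, Function] = field(default_factory=dict)

@dataclass
class Module:
    cu: CompilationUnit = field(default_factory=CompilationUnit)

    def get_method(self, name: str) -> Method:
        # A Method is just `self` bound to one of the CompilationUnit's Functions.
        return Method(self_obj=self, function=self.cu.functions[name])
```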
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19167

Differential Revision: D14891966

Pulled By: zdevito

fbshipit-source-id: 0b5f03118aa65448a15c7a7818e64089ec93d7ea
2019-04-11 13:55:48 -07:00
Zachary DeVito
f5165ade5b Revert D14842057: Compiler uses first-class modules**
Differential Revision:
D14842057

Original commit changeset: ca6e7b5a4380

fbshipit-source-id: e8f1862a59bf20d5f78648b2fdc53a8b3750ead3
2019-04-11 06:17:01 -07:00
Zachary DeVito
5e1f0b2a07 Compiler uses first-class modules** (#19043)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19043
ghimport-source-id: 0c9e80d5f35654af6d472abd5643bff3e9eb9ddf

Differential Revision: D14842057

Pulled By: zdevito

fbshipit-source-id: ca6e7b5a43805240f40b84d30e54495061067dc0
2019-04-11 00:00:48 -07:00
Roy Ju
a9a29dd63f Fixes error when too many parameters are passed to fused cuda kernel (#18063)
Summary:
Bug fix for https://github.com/pytorch/pytorch/issues/15043, where a large fusion in the JIT produces a kernel with a large number of arguments, exceeding the limit allowed by nvrtc on a CUDA device.
The fix is to check the number of arguments before a cuda kernel is generated. If the number exceeds the limit, take the runFallBack() path.
Add a reduced test from the original issue to keep the test time low. The test would fail without this fix.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18063

Differential Revision: D14691401

Pulled By: soumith

fbshipit-source-id: b98829bc89ed7724e91eda82ae3a5a1151af721a
2019-04-09 22:37:09 -07:00
Wanchao Liang
6c9b312fd4 Add addcmul, lerp to fuser, enable scalar->float specialization in symbolic script (#18081)
Summary:
This PR did two things:

1. Enable scalar->float specialization in symbolic script, so an AD formula that contains a Scalar in its schema should write `float` instead.
2. Add addcmul and lerp to AD and the fuser.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18081

Differential Revision: D14490493

Pulled By: wanchaol

fbshipit-source-id: b3b86d960d5f051b30733bc908b19786111cdaa4
2019-03-25 11:05:45 -07:00
Natalia Gimelshein
ed47b85d3b Allow fusion of float function arguments (#18087)
Summary:
So that functions like `def fn(x, p: float)` can be fused. Fixes #9940 and #11186. Only float (not integer) arguments are fused, to simplify assembling arguments for the fusion launch.
CPU fusion is disabled in CI, so this won't be tested there, but I tested it locally.
cc t-vi, apaszke
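For illustration, a scripted function of the kind that can now be considered for fusion (whether it actually fuses depends on the device and JIT settings):

```python
import torch

@torch.jit.script
def fn(x, p: float):
    # A pointwise expression with a float (not int) scalar argument,
    # the kind of function this change lets the fuser handle.
    return (x * p + 1.0).relu()

x = torch.randn(4, 4)
print(fn(x, 0.5).shape)  # torch.Size([4, 4])
```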
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18087

Differential Revision: D14581206

Pulled By: wanchaol

fbshipit-source-id: ccb0cf79b1751706f9b2cdf1715115eae5a39fb6
2019-03-22 13:52:33 -07:00
Michael Suo
f9820e55af initializing class value (#17585)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17585

Create a sugared value that represents a class during initialization. This is
so that assignments to attributes correctly define attributes in __init__ but
raise an error elsewhere.
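A small TorchScript example of the behavior described, assuming current torch.jit.script class support (the class and names here are illustrative):

```python
import torch

@torch.jit.script
class Counter(object):
    def __init__(self, start: int):
        # Assignments here define the class's attributes.
        self.value = start

    def bump(self) -> int:
        self.value = self.value + 1  # updating an existing attribute is fine
        # self.other = 0             # defining a *new* attribute here would be a compile error
        return self.value
```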

Reviewed By: shannonzhu

Differential Revision: D14263403

fbshipit-source-id: 09b2feeb272302f00a79c2a0302fbdf5483aed6a
2019-03-11 19:13:52 -07:00
Wanchao Liang
ab95b5c6cc Rename prim::Undefined to prim::AutogradZero (#17611)
Summary:
supersedes #17245
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17611

Differential Revision: D14283581

Pulled By: wanchaol

fbshipit-source-id: 8022d02b8a021ea2fee9a18a2c8920eb123200c5
2019-03-01 15:13:18 -08:00
eellison
82aa511146 move prim::None to prim::Constant (again) (#17186)
Summary:
Trying to land this again: make prim::None into a case of prim::Constant. The previous landing was reverted because it broke an important ONNX export test.

https://github.com/pytorch/pytorch/pull/16160
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17186

Differential Revision: D14115304

Pulled By: eellison

fbshipit-source-id: 161435fc30460b4e116cdd62c7b2e5b94581dcb7
2019-02-19 11:45:50 -08:00
Natalia Gimelshein
19117f6a0a reenable rand_like fusion when there is no broadcast (#16087)
Summary:
Reenables rand_like fusion if no tensor is broadcast in the fusion group. This is a sufficient but not necessary condition for fused rand_like to produce correct results, and it has the unpleasant side effect of falling back to the non-fused path if rand_like was optimistically included in a fusion group that also contains a broadcast not necessarily related to rand_like. E.g. before this PR, if the network had (biasAdd -> relu -> dropout), the fuser could fuse biasAdd and relu; now it will try fusing the whole thing (if dropout is expressed via rand_like) and fall back every time.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16087

Differential Revision: D13720232

Pulled By: zou3519

fbshipit-source-id: 1e19203bec4a59257bfc7078b054a19f00fab4ad
2019-02-19 11:12:25 -08:00
Elias Ellison
91c1d728ac Revert D14109636: [pytorch][PR] move prim::None to a case in prim::Constant
Differential Revision:
D14109636

Original commit changeset: d26fd3839761

fbshipit-source-id: c8c8113e2bff49ea93235732603e6ebc89356533
2019-02-15 16:38:12 -08:00
Elias Ellison
7caa21f5ca move prim::None to a case in prim::Constant (#16160)
Summary:
This change simplifies the analysis done on constants, since prim::None does not need to be handled separately now. To check whether a constant node is None, use node->isNone().

Next step will be to remove prim::Undefined.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16160

Differential Revision: D14109636

Pulled By: eellison

fbshipit-source-id: d26fd383976163a2ddd4c24984bd672a541cc876
2019-02-15 16:27:57 -08:00
Ailing Zhang
b0545aa85f maskrcnn & bert AD coverage part 1 (#16689)
Summary:
- Moved a few functions from the `autograd` namespace to the `aten` namespace to be visible from the JIT nativeResolver.
- Added a hack to look up keyword-only arguments. Proper support for keyword-only arguments will be added later.
- Simulate function overloads in aten using `_<number>` as a function name suffix.
- Even when `forward` returns multiple outputs, as in `kthvalue`, there is at most one output requiring grad that we currently support.
- Removed the `TensorList`-related ops here, since partial `TensorList` support is prone to bugs. Our symbolic diff for `cat` was never tested with autodiff, and it seems broken. We need to find a proper way to support these ops (either by properly supporting `TensorList` or something like `prim::ConstantChunk`) and leave them for the next PR.

Ops supported in this PR:
```
erf
expand_as
index
kthvalue
mean
permute
pow
rsub
select
sqrt
squeeze
t
to
topk
transpose
view
var
embedding
logsumexp
// grad is None
_dim_arange
contiguous
nonzero
ones_like
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16689

Differential Revision: D14020806

Pulled By: ailzhang

fbshipit-source-id: a5e2c144a7be5a0d39d7ac5f93cb402ec12503a5
2019-02-14 15:36:39 -08:00
Zachary DeVito
f34192db0f Rename DynamicType -> TensorType (#16787)
Summary:
```
import json
from subprocess import check_call
from pprint import pprint
renames = {
    'c10::TensorType': 'DimentionedTensorType',
    'c10::DynamicType': 'TensorType',
    'c10::TensorTypePtr': 'DimentionedTensorTypePtr',
    'c10::DynamicTypePtr': 'TensorTypePtr',
    'c10::TypeKind::DynamicType': 'TensorType',
    'c10::TypeKind::TensorType': 'DimentionedTensorType',
}

entries = json.loads(open('compile_commands.json', 'r').read())

build = None
sources = []

for e in entries:
    name = e['file']
    if not ('jit' in name or 'ATen/core' in name):
        continue
    build = e['directory']
    sources.append(name)

args = ['clang-rename', '-i', '-force', '-pl']
for name in sorted(renames.keys()):
    args += ['-qualified-name={}'.format(name), '-new-name={}'.format(renames[name])]

for source in sources:
    cmd = args + [source]
    pprint(args)
    check_call(cmd, cwd=build)
    check_call(['git', 'stash', 'push', '-m', 'rename'])
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16787

Differential Revision: D13974132

Pulled By: zdevito

fbshipit-source-id: 8368fd53e17cff83707bbe77f2d7aad74f8ce60e
2019-02-06 17:31:07 -08:00
Thomas Viehmann
20d45c43d7 Get more fusion after autodiff uses SumToSize (#14957)
Summary:
Here is a fresh attempt at getting some fusion back in autodiff-generated graphs in the presence of SumToSize.

- The sum to size operator is now  `aten::_grad_sum_to_size` to allow symbolic script differentiation (and that in turn would need to use this in place of sum_to_size to signal that it strictly operates on gradients). This is also used in the autodiff code, replacing `prim::SumToSize`.
- `_grad_sum_to_size` is now fusable, `cat`s - which are fused afterwards thanks to Adam's simplification of the code - are only fused if there is no `_grad_sum_to_size` in the fusion group.
- I push the `_grad_sum_to_size` out of the fusion group when compiling and record the desired summations in the KernelSpec. The reasoning is the following:
  - As the autodiff is a repeated application of the chain rule, we always have the pattern `grad_in = mm(A, grad_out)`, with A often diagonal for cases interesting to the fuser, whence it is `grad_in = a * grad_out` (a pointwise multiplication). We know that only `grad_out` may have AutodiffGradSumToSize applied, so we can commute AutodiffGradSumToSize with the `mul` (and `div` and `neg` are of similar origin); see the numerical check below.
  - For `type_as` the gradient might be giving the type, so just skip SumToSize.
  - `add` (which was inserted as `prim::AutogradAdd`) adds gradients when the forward used the same value in several places. This is non-broadcasting, so we know that the two arguments have the same sizes as inputs - which is good, so we don't have to do bookkeeping of the two parts.
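A quick numerical check (mine) of the commuting step in the first bullet, under the assumption that the pointwise multiplier already has the reduced shape:

```python
import torch

grad_out = torch.randn(4, 3)  # gradient that may need summing back to the input size
a = torch.randn(3)            # multiplier coming from the diagonal chain-rule factor

lhs = (a * grad_out).sum_to_size(3)  # SumToSize applied after the mul ...
rhs = a * grad_out.sum_to_size(3)    # ... or pushed onto grad_out before it
print(torch.allclose(lhs, rhs))      # True: the mul and the SumToSize commute here
```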

Details:
- During fusion, the Tensor arguments are always kept as the first parameters of the fusion group to accommodate indexing assumptions in the fuser.
- The rewriting of the fusion group to record the necessary output transformation and eliminate `_grad_sum_to_size` from the fusion group is now in the fuser compile step.
- In the execution step, the arguments are split into Tensor / Non-Tensor and the non-tensor args are mostly forgotten about except for doing `sum_to_size` at the end. This would want to be improved if/when we fuse nonconstant scalar arguments.
- In a number of places in the fuser, the non-Tensor arguments to the fusion group needed to be ignored.

Thank you, apaszke for the insightful discussion. All bad ideas and errors are my own.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14957

Differential Revision: D13888173

Pulled By: zou3519

fbshipit-source-id: 071992c876e8b845f2b3e6329ae03a835d39a0ea
2019-01-31 12:24:38 -08:00
Michael Suo
dc84ff1e5a Use a points-to graph for alias analysis (#16386)
Summary:
This PR changes the way we store aliasing information from a "set" approach to a "points-to" analysis. Set-based approaches lose information in ways that make it difficult to do "live" updates to the alias DB as one is mutating the graph.

The tradeoff is that simple queries get more expensive, since they require traversing the points-to graph to answer most questions. In practice, this is unlikely to be that costly since we don't have massive aliasing chains, but we could create an approximation/caching layer if this becomes a problem.
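A toy points-to sketch (data structure only, not the real AliasDb API) showing how "may these two values alias?" becomes a reachability question on the graph:

```python
from collections import defaultdict

points_to = defaultdict(set)  # each value points to the values it may alias into

def add_edge(frm, to):
    points_to[frm].add(to)

def closure(v, seen=None):
    # Collect everything reachable from v through points-to edges.
    seen = seen if seen is not None else set()
    if v in seen:
        return seen
    seen.add(v)
    for w in points_to[v]:
        closure(w, seen)
    return seen

def may_alias(a, b):
    return bool(closure(a) & closure(b))

add_edge("view_of_x", "x")
add_edge("slice_of_x", "x")
print(may_alias("view_of_x", "slice_of_x"))  # True: both reach x
print(may_alias("view_of_x", "y"))           # False
```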

My rough plan is:
1. This PR, switching to a points-to graph
2. Make it "live": analyzing a node should record all the edges the node added, so that we can rollback when the node is destroyed.
3. Reduce wildcard scope: we can make the wildcard a special vertex that points to anything that we're not "sure" about; namely, things that have been put inside lists, or graph inputs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16386

Differential Revision: D13855117

Pulled By: suo

fbshipit-source-id: f009f58143173c275501624eb105d07ab60fe5e1
2019-01-30 11:28:03 -08:00
Mikhail Zolotukhin
47bf30661f Directly include headers from ATen.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16287

Differential Revision: D13792949

Pulled By: ZolotukhinM

fbshipit-source-id: d627d8dc469df048063c70d0b5b8d33fede809a3
2019-01-24 11:22:27 -08:00
Michael Suo
83c054de48 AliasDB interface cleanup (#15656)
Summary:
This is the first of several PRs to simplify AliasDb usage.
- Hide the concept of wildcards from users. They are too hard to think about and too easy to forget about.
- Start moving "mutability-safe" graph mutation methods into AliasDb (right now, the various methods that deal with topological move).

Eventually I want to create a "mutability-aware" handle to the graph. If you only use that handle to transform the graph, you can be sure that all transformations are safe with respect to mutability.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15656

Differential Revision: D13615492

Pulled By: suo

fbshipit-source-id: 5c39a157b4ea76f1f976315d06a314a89cc4f22f
2019-01-11 20:06:53 -08:00
Zachary DeVito
3f6b212e80 Register CPU/CUDA fuser dynamically (#15887)
Summary:
This avoids a bunch of conditional compilation logic
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15887

Reviewed By: eellison

Differential Revision: D13613239

Pulled By: zdevito

fbshipit-source-id: a18fc69676b3ef19b4469ab58d8714d1f6efccbb
2019-01-11 10:50:35 -08:00
Adam Paszke
d580d3583b Simplify cat fusion (#15633)
Summary:
This makes the definition of a "fusable node" much simpler,
as we don't need to keep considering whether something has to be an
"exit node" at every step. The fuser now tries to maximize the
pointwise fusions first, and proceeds to prepending chunks and appending
concats only once a fixed point is reached.

This patch makes the fuser much simpler to reason about,
making it significantly easier to implement features like SumToSize
fusion that improve the performance of derivative graphs.

cc zou3519 mruberry
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15633

Differential Revision: D13575306

Pulled By: zou3519

fbshipit-source-id: 0c55ea61d65d1f1ed3d75a8e1e83bc85a83f3aff
2019-01-11 10:33:42 -08:00
Adam Paszke
d35295c603 JIT Batch Norm fusion (#15897)
Summary:
Resubmit of #15146, which was accidentally reverted.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15897

Differential Revision: D13616093

Pulled By: zou3519

fbshipit-source-id: 0c3a3bec8f9fed57274da9f6c7cf40cbc05cf91a
2019-01-10 12:38:47 -08:00
Topher Lubaway
14b40c0633 Revert D13548303: [pytorch][PR] Add support for batch_norm fusion to the JIT
Differential Revision:
D13548303

Original commit changeset: a2e2e5abc383

fbshipit-source-id: 5b70cdbcbd1cac06eeefb2a939773358c061183c
2019-01-09 08:53:57 -08:00
Adam Paszke
5e1b35bf28 Add support for batch_norm fusion to the JIT (#15146)
Summary:
We don't support reductions yet, but simply decomposing batch_norm
into a kernel that computes the stats, and then fusing everything else
with ReLU and the following pointwise ops, provides nice speedups.

Note that this is only limited to inference mode for now, because we
don't support convolutions and batch norm in AD, so the fuser isn't
applied to those parts.

This commit gives us a 7% end-to-end speedup for ResNet50 with batch size 32. Note that this only applies to inference mode at the moment due to lack of AD support for CNN operations (I'll be adding that soon), and not to the standard `torchvision` models, because they use in-place ops which aren't supported by the fuser (we need a way of proving that de-inplacing them is safe).

cc zou3519 zdevito mruberry ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15146

Differential Revision: D13548303

Pulled By: zou3519

fbshipit-source-id: a2e2e5abc383f637fae19bd1b423f20c2cbc056a
2019-01-08 07:00:19 -08:00
Michael Suo
f636dc9276 clang format world (#15524)
Summary:
The PR clang-formats everything in `torch/csrc/jit/` and adds it to the pre-commit hook.

Here is a list of non-mechanical changes:
- I went over each file and fixed up whenever I could tell that clang-format was clobbering comment formatting.
- Made the macros in register_prim_ops a little more clang-format friendly by omitting trailing commas
- Refactored autodiff.cpp to use a helper class with explicit state rather than a bunch of capturing lambdas
- Small improvements to the precommit hook clang-format
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15524

Differential Revision: D13547989

Pulled By: suo

fbshipit-source-id: 3ff1541bb06433ccfe6de6e33f29227a2b5bb493
2018-12-26 06:55:01 -08:00
Peter Goldsborough
7a61306031 Enable all clang-tidy performance checks (#15198)
Summary:
This PR adds the final set of clang-tidy checks we should add for our codebase: a last set of performance-related checks. Most fixes here are around changing `auto` to `const auto&` in a few places where unnecessary copies were made, and adding `reserve()` calls before loops doing repeated `push_back()`. Also a few cases of calling `std::string::find` with a single-character string literal instead of a single char, which uses a less efficient string search algorithm meant for searching larger substrings.

![image](https://user-images.githubusercontent.com/6429851/49978940-adc1a780-ff01-11e8-99da-a4e431361f07.png)

ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15198

Differential Revision: D13468797

Pulled By: goldsborough

fbshipit-source-id: 2bed1ea1c7c162b7f3e0e1026f17125e88c4d5b2
2018-12-14 13:32:47 -08:00
Natalia Gimelshein
fb140c7828 add erf and erfc to fuser/autodiff
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15139

Differential Revision: D13455690

Pulled By: soumith

fbshipit-source-id: b06e5f5d362869c2e5fa11a52f9450d77c30d4cb
2018-12-13 19:17:40 -08:00
Edward Yang
517c7c9861 Canonicalize all includes in PyTorch. (#14849)
Summary:
Anywhere we used #include "foo.h", we now say #include <foo.h>
Paths are adjusted to be rooted out of aten/src, torch/lib, or
the root level directory.

I modified CMakeLists.txt by hand to remove TH and THC from
the include paths.

I used the following script to do the canonicalization:

```
  import subprocess
  import re
  import os.path

  files = subprocess.check_output(['git', 'ls-files']).decode('utf-8').rstrip().split('\n')
  for fn in files:
      if not any(fn.endswith(suff) for suff in ['.cu', '.cpp', '.in', '.h', '.hpp', '.cu', '.cuh', '.cc']):
          continue
      if not any(fn.startswith(pref) for pref in ["aten/", "torch/"]):
          continue
      with open(fn, 'r') as f:
          c = f.read()
      def fmt(p):
          return "#include <{}>".format(p)
      def repl(m):
          p = m.group(1)
          if p in ["dlfcn.h", "unistd.h", "nvrtc.h", "cuda.h", "cuda_runtime.h", "cstdint", "cudnn.h", "Python.h", "cusparse.h", "cuda_runtime_api.h", "cuda_fp16.h", "cublas_v2.h", "stdint.h", "curand_kernel.h"]:
              return fmt(p)
          if any(p.startswith(pref) for pref in ["torch/csrc", "c10/", "ATen/", "caffe2/", "TH/", "THC/", "Eigen/", "gtest/", "zdl/", "gloo/", "onnx/", "miopen/"]):
              return fmt(p)
          for root in ["aten/src", "torch/lib", ""]:
              for bad_root in [os.path.dirname(fn), "aten/src/TH", "aten/src/THC", "torch/csrc"]:
                  new_p = os.path.relpath(os.path.join(bad_root, p), root)
                  if not new_p.startswith("../") and (os.path.exists(os.path.join(root, new_p)) or os.path.exists(os.path.join(root, new_p + ".in"))):
                      return fmt(new_p)
          print("ERROR: ", fn, p)
          return m.group(0)
      new_c = re.sub(r'#include "([^"]+)"', repl, c)
      if new_c != c:
          print(fn)
          with open(fn, 'w') as f:
              f.write(new_c)
```

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14849

Reviewed By: dzhulgakov

Differential Revision: D13363445

Pulled By: ezyang

fbshipit-source-id: 52361f878a672785f9306c9e9ab2513128092b68
2018-12-08 19:38:30 -08:00
Adam Paszke
8dfebc16cc Improvements for symbolic AD (#14758)
Summary:
**Review only the last commit.**

This commit adds a few optimizations to AD that let us dramatically
reduce the number of sizes we capture from the forward pass.

We now:
- collapse chains of SumToSize (a small numerical check follows below)
- avoid capturing sizes of tensors that are captured anyway
- more aggressively DCE the reverse code
- run CSE on the primal code to deduplicate `aten::size` calls
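A small numerical check (mine) of the "collapse chains of SumToSize" item: when the target sizes shrink along the chain, only the final size matters.

```python
import torch

t = torch.randn(4, 3, 2)

chained   = t.sum_to_size(3, 2).sum_to_size(1, 2)  # two SumToSize ops in a row
collapsed = t.sum_to_size(1, 2)                    # collapsed to the final size
print(torch.allclose(chained, collapsed))          # True
```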

cc zou3519 zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14758

Differential Revision: D13324440

Pulled By: zou3519

fbshipit-source-id: 45ccbc13605adcef2b461840c6089d3200000c72
2018-12-04 20:38:21 -08:00
Adam Paszke
d76fd43294 Reenable all forward-pass fusions that worked before the AD fix (#14558)
Summary:
Dealing with so many `aten::size` calls (in particular calls on elements computed inside fusion groups) requires us to do some extra graph processing in the fuser (to compute the sizes by explicit broadcasts, instead of writing the intermediate tensors only to check their size). This restores the forward expects of LSTM and MiLSTM to a single big kernel. Unfortunately the backward is much harder, because as long as we can't prove that the reductions are unnecessary (or if we can't distribute them over the op), we will not be able to fuse them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14558

Differential Revision: D13321748

Pulled By: zou3519

fbshipit-source-id: c04fc2f70d106d2bfb56206b5aec517a93b79d1f
2018-12-04 15:43:37 -08:00
Adam Paszke
7bc489c827 Disable randn_like fusion in the JIT (#14752)
Summary:
Fixes #14674. We won't have time for a proper fix before the release, so at least disable fusion of nodes that trigger incorrect behavior.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14752

Differential Revision: D13320407

Pulled By: zou3519

fbshipit-source-id: 2400f7c2cd332b957c248e755fdb0dadee68da5d
2018-12-04 08:55:47 -08:00