pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
Vitaly Fedyunin	266c1652e6	Back out "Add memory format support to `rand_like` operator" (#28801 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28801 Original commit changeset: 2a1d47571268 ghstack-source-id: 92748792 Test Plan: buck test language_technology/neural_mt/os/pytorch_translate/test:test_onnx Reviewed By: ifedan Differential Revision: D18175304 fbshipit-source-id: ffd61f6e42f256b39b80a6b42d989c238228f25d	2019-10-28 12:44:45 -07:00
Vitaly Fedyunin	04f5325583	Add memory format support to `rand_like` operator (#27561 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27561 Adds memory_format keyword argument (positional for cpp). 'Preserve' behavior now follows these rules: 1) If the tensor is non-overlapping and dense, the output tensor will have the same strides as the input tensor. 2) If not (1) and the tensor is stored in the channels last format, the output tensor will have the channels last format. 3) The output tensor will be contiguous in all other cases. --- A dense tensor is a tensor that stores its values in a contiguous block of memory. A non-overlapping tensor is a tensor in which each element occupies its own, non-repeated memory location. Test Plan: Imported from OSS Differential Revision: D17980316 Pulled By: VitalyFedyunin fbshipit-source-id: 2a1d47571268673de0c6f5ae1b6d4f9110962ab0	2019-10-25 07:29:12 -07:00
Mike Ruberry	ac7996ccd3	Removes SymbolicVariable (#25077 ) Summary: This PR excises the last of SymbolicVariable. There should be no change in functionality. One new test for addmm fusion was added. A case where the peephole optimizer might convert a scalar argument remains untested. Pull Request resolved: https://github.com/pytorch/pytorch/pull/25077 Test Plan: Refactors existing code so mostly covered by current tests. One test for addmm fusion was added. Differential Revision: D17145334 Pulled By: mruberry fbshipit-source-id: 6b68faf764f9ee8398b55c43110228ed9faf81eb	2019-08-31 11:19:50 -07:00
Zachary DeVito	bdc57d3833	Merge ProfiledTensorType and TensorType (#24284 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24284 This PR finishes the unification of all Tensor types into a single object. ProfiledTensorType is renamed to TensorType and the old TensorType is deleted. Notes: * Fixes bug in merge for VaryingShape by changing its representation to an optional list of optional ints. * Removes ProfiledTensorType::create(type) invocations that can now simply be expect calls on tensor type. Test Plan: Imported from OSS Differential Revision: D16794034 Pulled By: zdevito fbshipit-source-id: 10362398d0bb166d0d385d74801e95d9b87d9dfc	2019-08-20 13:01:28 -07:00
Nikolay Korovaiko	3d15ee1b34	Remove more uses of `DimensionedTensorType` Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23060 Differential Revision: D16460391 Pulled By: Krovatkin fbshipit-source-id: b50ee87d22ad18b8cbfff719b199ea876ef172f1	2019-08-01 21:19:28 -07:00
Thomas Viehmann	cf50249bde	Disable fusion of grad_sum_to_size (#23372 ) Summary: Fixes: https://github.com/pytorch/pytorch/issues/22833 grad_sum_to_size does not commute with AutogradAdd after all because it turns the broadcasting AutogradAdd into a broadcasting add. Chillee actually did most of the work of tracking the bug down to the fusion of grad_sum_to_size and pinged me when he had found the cause. Thank you! About the choice of removing the fusion completely instead of being more precise: - We do have grad_sum_to_size elimination which works for cases where broadcasting does not actually happen in the forward, so the set of cases where fusing grad_sum_to_size is actually beneficial is much smaller than when initially proposed. - There will be less fusion; in terms of the tests, IOU stops being fully fused. I vaguely think that it is a case we could handle with refined logic. - Keeping it would add complexity in checking when to merge fusion groups to the complexities that this PR removes. - The future of fusion probably lies more in more complete solutions including reductions (TVM or KeOps or our own or ...). Pull Request resolved: https://github.com/pytorch/pytorch/pull/23372 Differential Revision: D16489930 Pulled By: soumith fbshipit-source-id: bc0431b0d3eda264c401b634675872c4ce46f0f4	2019-07-25 08:55:33 -07:00
jjsjann123	252710262f	(#22775 ) Summary: passing FusionCallback and Symbol to recursive GraphFuser calls. It ensures consistent fusion in nested Blocks. Pull Request resolved: https://github.com/pytorch/pytorch/pull/22775 Differential Revision: D16439979 Pulled By: soumith fbshipit-source-id: 18d4b13f52b03708b8580c73f75450adbb672ac1	2019-07-25 05:54:03 -07:00
Bram Wasti	05d56bd1b6	Remove hard-coded NVRTC specific constant from fuser header Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22699 Test Plan: Imported from OSS Differential Revision: D16192290 Pulled By: bwasti fbshipit-source-id: 4dccaf3e6e0151e86d35474c36e1ddb7f2afb5cf	2019-07-11 13:44:25 -07:00
Thomas Viehmann	17941f9979	JIT: Eliminate SumToSize by using Optional Lists (#18697 ) Summary: This PR eliminates unneeded grad_sum_to_size and in particular speeds up the LSTM backward by allowing better fusion. It consists of two parts: - In AutoDiff, record broadcasting sizes only if the broadcast output size is different from the input size, otherwise record None. - The specialization of Optional arguments (#18407) allows us to then eliminate `_grad_sum_to_size(t, None)` in the peephole optimization step. Thus, in the LSTM case, no SumToSize remains in the crucial fusion group. The trick here is that we can specialize on the runtime information from the forward. I'm testing that different broadcasting situations lead to different graphs. I didn't move all symbolic_script _grad_sum_to_size to the new logic, but it might be better to do this incrementally, anyway. Pull Request resolved: https://github.com/pytorch/pytorch/pull/18697 Differential Revision: D15482076 Pulled By: wanchaol fbshipit-source-id: 7f89367e35b8729910077c95c02bccefc8678afb	2019-05-24 11:24:17 -07:00
Wanchao Liang	871c9dcb1d	move batchnorm and layernorm fusion to decompose (#20337 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20337 ghimport-source-id: 2196f84f2ef384c1f25587b2fb4bd9dd2f63c2b4 Differential Revision: D15448596 Pulled By: wanchaol fbshipit-source-id: b66e608f1b72471fc0775aaa4e09f9fa1070fc3c	2019-05-22 18:01:27 -07:00
Bram Wasti	7b733e4fc1	Rebase conflict fix for isFusableDevice (#20251 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20251 ghimport-source-id: 0c8c1847a7979fcd77e4f6618730b170b6b8ce25 Differential Revision: D15262850 Pulled By: bwasti fbshipit-source-id: 17ecc340a310ddbcce141cfa3ee0efa9660194d2	2019-05-08 12:14:12 -07:00
Bram Wasti	4ca325df87	Add Custom graph fusion (#18588 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18588 ghimport-source-id: f40df177af8b87c73f04bf337f478a62133284cf Differential Revision: D14901297 Pulled By: bwasti fbshipit-source-id: 1b6371a5175b3d63dad542b7cc22cb82e8c6cfd0	2019-05-06 23:15:16 -07:00
Mikhail Zolotukhin	8b46938355	Cleanup includes in torch/csrc/jit/* (#19922 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19922 ghimport-source-id: 0434c46bf75621ff79ea27a18a2475e7f13e2487 Differential Revision: D15125015 Pulled By: ZolotukhinM fbshipit-source-id: 5685edfc94067f62e363a85e9badb7f757b1d321	2019-05-06 13:40:26 -07:00
Zachary DeVito	a425e1cbf8	Remove duplicate inlineCallToCode (#19724 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19724 ghimport-source-id: a68d28ac9bbe62dd61f03bfd9d57f4ef1d0ce9c9 Reviewed By: jamesr66a Differential Revision: D15078532 Pulled By: zdevito fbshipit-source-id: bebd34ff6105f538395260b027dc169448b5bc96	2019-04-25 15:53:10 -07:00
Wanchao Liang	c571969148	Fix the insert_guard for norm decomposition (#19646 ) Summary: Move the insert_guard all the way up to the beginning of the decomposition. This fixes the case where we lose the insert_point context after decomposeCommonNormalization and still need to modify the graph. Fixes #19502 Pull Request resolved: https://github.com/pytorch/pytorch/pull/19646 Differential Revision: D15058040 Pulled By: wanchaol fbshipit-source-id: ebdbf8623ebfe4556c461e1b650e94b905791adb	2019-04-24 23:12:37 -07:00
James Reed	e7fc7c732c	Bugfix for fusion device check (#19594 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19594 I missed a callsite Reviewed By: wanchaol Differential Revision: D15041457 fbshipit-source-id: eef76ad51bee06a56d31b4ab64f19250fe2ad8f0	2019-04-22 20:55:17 -07:00
James Reed	5be4bee4ff	Don't create FusionGroups for known-CPU producer values (#19342 ) Summary: I believe the existing check in FuseGraph was only `false` if PyTorch was built with NO_CUDA=1. Otherwise, we would create fusion groups even if we're on a CPU-only machine running CPU code. This is confusing. Instead I've made it so that the decision to fuse or not is dependent on if the producer Value is a known CPU tensor. If it is, we skip fusion. Pull Request resolved: https://github.com/pytorch/pytorch/pull/19342 Differential Revision: D15038351 Pulled By: jamesr66a fbshipit-source-id: fce9d83929309a7bf14346833f84b996f3e7f6db	2019-04-22 16:57:18 -07:00
Michael Suo	1e94a3bc4d	Turn resolver into a class (#19236 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19236 ghimport-source-id: d36705ea5ecff085d0d84ea57bb96d18d7c260dd Differential Revision: D14928292 Reviewed By: zdevito Pulled By: suo fbshipit-source-id: cd038100ac423fa1c19d0547b9e5487a633a2258	2019-04-19 13:01:59 -07:00
Thomas Viehmann	b9291f55bb	pow scalar exponent / base autodiff, fusion (#19324 ) Summary: Fixes: #19253 Fixing pow(Tensor, float) is straightforward. The breakage for pow(float, Tensor) is a bit more subtle to trigger, and fixing needs `torch.log` (`math.log` didn't work) from the newly merged #19115 (Thanks ngimel for pointing out this has landed.) Pull Request resolved: https://github.com/pytorch/pytorch/pull/19324 Differential Revision: D15003531 Pulled By: ailzhang fbshipit-source-id: 8b22138fa27a43806b82886fb3a7b557bbb5a865	2019-04-18 17:58:35 -07:00
Wanchao Liang	a3d3008e73	JIT Layernorm fusion (#18266 ) Summary: Partially fuse layer_norm by decomposing layer_norm into the batchnorm kernel that computes the stats, and then fusing the affine operations after the reduce operations, this is similar to the batchnorm fusion that apaszke did, it also only works in inference mode now. Pull Request resolved: https://github.com/pytorch/pytorch/pull/18266 Differential Revision: D14879877 Pulled By: wanchaol fbshipit-source-id: 0197d8f2a17ec438d3e53f4c411d759c1ae81efe	2019-04-12 14:38:31 -07:00
Zachary DeVito	ef406ee925	First class modules in the compiler, round 2 (#19167 ) Summary: This PR propagates where we use first-class module objects into the compiler. This creates a transitionary state where: * compiler.cpp creates Graphs where `self` is a Module class and attributes/parameters/buffers/submodules are looked up with `prim::GetAttr` * GraphExecutor still runs "lowered graphs" where the self object has been removed by a compiler pass `lower_first_class_method`. * Tracing still creates "lowered graphs", and a pass "lift_lowered_method" creates a first-class method graph for things. * This PR separates out Method and Function. A script::Function is a pure Graph with no `self` bound. Similar to Python, a script::Method is just a bound `self` and its underlying `script::Function`. * This PR also separates CompilationUnit from Module. A CompilationUnit is just a list of named script::Functions. Classes have a CompilationUnit holding the class methods, and Modules also have a CompilationUnit holding their Methods. This avoids the weird circular case Module --has a-> Class -> has a -> Module ... Details: * In this transitionary state, we maintain two copies of a Graph, first-class module and lowered. The first-class one has a self argument that is the module's class type. The lowered one is the lowered graph that uses the initial_ivalues inputs. * When defining lowered methods using `_defined_lowered` we immediately create the first-class equivalent. The reverse is done lazily, creating lowered_methods on demand from the class. * The two way conversions will be deleted in a future PR when the executor itself runs first-class objects. However this requires more changes to (1) the traces, (2) the python bindings, and (3) the onnx export pass and would make this PR way too large. Pull Request resolved: https://github.com/pytorch/pytorch/pull/19167 Differential Revision: D14891966 Pulled By: zdevito fbshipit-source-id: 0b5f03118aa65448a15c7a7818e64089ec93d7ea	2019-04-11 13:55:48 -07:00
Zachary DeVito	f5165ade5b	Revert D14842057: Compiler uses first-class modules** Differential Revision: D14842057 Original commit changeset: ca6e7b5a4380 fbshipit-source-id: e8f1862a59bf20d5f78648b2fdc53a8b3750ead3	2019-04-11 06:17:01 -07:00
Zachary DeVito	5e1f0b2a07	Compiler uses first-class modules** (#19043 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19043 ghimport-source-id: 0c9e80d5f35654af6d472abd5643bff3e9eb9ddf Differential Revision: D14842057 Pulled By: zdevito fbshipit-source-id: ca6e7b5a43805240f40b84d30e54495061067dc0	2019-04-11 00:00:48 -07:00
Roy Ju	a9a29dd63f	Fixes error when too many parameters are passed to fused cuda kernel (#18063 ) Summary: Bug fix for https://github.com/pytorch/pytorch/issues/15043, where a large fusion in the JIT produces a number of kernel arguments that exceeds the limit allowed by nvrtc on a cuda device. The fix is to check the number of arguments before a cuda kernel is generated. If the number exceeds the limit, take the runFallBack() path. Add a reduced test from the original issue to keep the test time low. The test would fail without this fix. Pull Request resolved: https://github.com/pytorch/pytorch/pull/18063 Differential Revision: D14691401 Pulled By: soumith fbshipit-source-id: b98829bc89ed7724e91eda82ae3a5a1151af721a	2019-04-09 22:37:09 -07:00
Wanchao Liang	6c9b312fd4	Add addcmul, lerp to fuser, enable scalar->float specialization in symbolic script (#18081 ) Summary: This PR did two things: 1. Enable scalar->float specialization in symbolic script, so AD formula that contains scalar in the schema, should write `float` instead. 2. add addcmul, lerp to AD and fuser. Pull Request resolved: https://github.com/pytorch/pytorch/pull/18081 Differential Revision: D14490493 Pulled By: wanchaol fbshipit-source-id: b3b86d960d5f051b30733bc908b19786111cdaa4	2019-03-25 11:05:45 -07:00
Natalia Gimelshein	ed47b85d3b	Allow fusion of float function arguments (#18087 ) Summary: so that functions like `def fn(x, p:float)` can be fused. Fixes #9940 and #11186. Fuses only float (not integer) arguments to simplify assembling arguments for fusion launch. CPU fusion is disabled in CI and this won't be tested, but I tested it locally. cc t-vi, apaszke Pull Request resolved: https://github.com/pytorch/pytorch/pull/18087 Differential Revision: D14581206 Pulled By: wanchaol fbshipit-source-id: ccb0cf79b1751706f9b2cdf1715115eae5a39fb6	2019-03-22 13:52:33 -07:00
Michael Suo	f9820e55af	initializing class value (#17585 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17585 Create a sugared value that represents a class during initialization. This is so that assignments to attributes correctly define attributes in __init__ but raise an error elsewhere. Reviewed By: shannonzhu Differential Revision: D14263403 fbshipit-source-id: 09b2feeb272302f00a79c2a0302fbdf5483aed6a	2019-03-11 19:13:52 -07:00
Wanchao Liang	ab95b5c6cc	Rename prim::Undefined to prim::AutogradZero (#17611 ) Summary: supersedes #17245 Pull Request resolved: https://github.com/pytorch/pytorch/pull/17611 Differential Revision: D14283581 Pulled By: wanchaol fbshipit-source-id: 8022d02b8a021ea2fee9a18a2c8920eb123200c5	2019-03-01 15:13:18 -08:00
eellison	82aa511146	move prim::None to prim::Constant (again) (#17186 ) Summary: Trying to land again, make prim::None into a case of prim::Constant. Reverted the previous landing because it broke an important onnx export test. https://github.com/pytorch/pytorch/pull/16160 Pull Request resolved: https://github.com/pytorch/pytorch/pull/17186 Differential Revision: D14115304 Pulled By: eellison fbshipit-source-id: 161435fc30460b4e116cdd62c7b2e5b94581dcb7	2019-02-19 11:45:50 -08:00
Natalia Gimelshein	19117f6a0a	reenable rand_like fusion when there is no broadcast (#16087 ) Summary: Reenables rand_like fusion if no tensor is broadcasted in the fusion group. This is a sufficient but not necessary condition for fused rand_like to produce correct results, and it has an unpleasant side effect of falling back to non-fused path if rand_like was optimistically included in the fusion group, but there is a broadcast in the fusion group not necessarily related to rand_like. E.g. before this PR, if the network had (biasAdd -> relu -> dropout), fuser could fuse biasAdd and relu, now it will try fusing the whole thing (if dropout is expressed via rand_like) and fall back every time. Pull Request resolved: https://github.com/pytorch/pytorch/pull/16087 Differential Revision: D13720232 Pulled By: zou3519 fbshipit-source-id: 1e19203bec4a59257bfc7078b054a19f00fab4ad	2019-02-19 11:12:25 -08:00
Elias Ellison	91c1d728ac	Revert D14109636: [pytorch][PR] move prim::None to a case in prim::Constant Differential Revision: D14109636 Original commit changeset: d26fd3839761 fbshipit-source-id: c8c8113e2bff49ea93235732603e6ebc89356533	2019-02-15 16:38:12 -08:00
Elias Ellison	7caa21f5ca	move prim::None to a case in prim::Constant (#16160 ) Summary: This change simplifies analysis done on constants since prim::None does not need to be handled separately now. To check if a constant node is None, use node->isNone(). Next step will be to remove prim::Undefined. Pull Request resolved: https://github.com/pytorch/pytorch/pull/16160 Differential Revision: D14109636 Pulled By: eellison fbshipit-source-id: d26fd383976163a2ddd4c24984bd672a541cc876	2019-02-15 16:27:57 -08:00
Ailing Zhang	b0545aa85f	maskrcnn & bert AD coverage part 1 (#16689 ) Summary: - Moved a few functions from `autograd` namespace to `aten` namespace to be visible from JIT nativeResolver. - Added a hack to look up keyword only argument. Will add proper support for kw only later - Simulate function overload in aten using `_<number>` as function name suffix. - Even if `forward` returns multiple outputs like in `kthvalue`, we currently support at most one output that requires grad. - Removed the `TensorList` related ops here since partial `TensorList` support is prone to bugs. Our symbolic diff for `cat` was never tested with autodiff, and it seems broken. Need to find another proper way to support these ops (either by properly supporting `TensorList` or something like `prim::ConstantChunk`) and leave them for the next PR. Ops supported in this PR: ``` erf expand_as index kthvalue mean permute pow rsub select sqrt squeeze t to topk transpose view var embedding logsumexp // grad is None _dim_arange contiguous nonzero ones_like ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/16689 Differential Revision: D14020806 Pulled By: ailzhang fbshipit-source-id: a5e2c144a7be5a0d39d7ac5f93cb402ec12503a5	2019-02-14 15:36:39 -08:00
Zachary DeVito	f34192db0f	Rename DynamicType -> TensorType (#16787 ) Summary: ``` import json from subprocess import check_call from pprint import pprint renames = { 'c10::TensorType': 'DimentionedTensorType', 'c10::DynamicType': 'TensorType', 'c10::TensorTypePtr': 'DimentionedTensorTypePtr', 'c10::DynamicTypePtr': 'TensorTypePtr', 'c10::TypeKind::DynamicType': 'TensorType', 'c10::TypeKind::TensorType': 'DimentionedTensorType', } entries = json.loads(open('compile_commands.json', 'r').read()) build = None sources = [] for e in entries: name = e['file'] if not ('jit' in name or 'ATen/core' in name): continue build = e['directory'] sources.append(name) args = ['clang-rename', '-i', '-force', '-pl'] for name in sorted(renames.keys()): args += ['-qualified-name={}'.format(name), '-new-name={}'.format(renames[name])] for source in sources: cmd = args + [source] pprint(args) check_call(cmd, cwd=build) check_call(['git', 'stash', 'push', '-m', 'rename']) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/16787 Differential Revision: D13974132 Pulled By: zdevito fbshipit-source-id: 8368fd53e17cff83707bbe77f2d7aad74f8ce60e	2019-02-06 17:31:07 -08:00
Thomas Viehmann	20d45c43d7	Get more fusion after autodiff uses SumToSize (#14957 ) Summary: Here is a fresh attempt at getting some fusion back in autodiff-generated graphs in the presence of SumToSize. - The sum to size operator is now `aten::_grad_sum_to_size` to allow symbolic script differentiation (and that in turn would need to use this in place of sum_to_size to signal that it strictly operates on gradients). This is also used in the autodiff code, replacing `prim::SumToSize`. - `_grad_sum_to_size` is now fusable, `cat`s - which are fused afterwards thanks to Adam's simplification of the code - are only fused if there is no `_grad_sum_to_size` in the fusion group. - I push the `_grad_sum_to_size` out of the fusion group when compiling and record the desired summations in the KernelSpec. The reasoning is the following: - As the autodiff is a repeated application of the chain rule, we always have the pattern `grad_in = mm(A, grad_out)`, with A often diagonal for cases interesting to the fuser, whence it is `grad_in = a * grad_out` (a pointwise multiplication). We know that only `grad_out` may have AutodiffGradSumToSize applied, so we can commute AutodiffGradSumToSize with the `mul` (and `div` and `neg` are of similar origin). - For `type_as` the gradient might be giving the type, so just skip SumToSize, - `add` (which was inserted as `prim::AutogradAdd`) adding gradients when the forward used the same value in several places. This is non-broadcasting, so we know that the two arguments would have the same sizes as inputs - which is good so we don't have to do bookkeeping of the two parts. Details: - During fusion, the Tensor arguments are always kept as the first parameters of the fusion group to accommodate indexing assumptions in the fuser. - The rewriting of the fusion group to record the necessary output transformation and eliminate `_grad_sum_to_size` from the fusion group is now in the fuser compile step. - In the execution step, the arguments are split into Tensor / Non-Tensor and the non-tensor args are mostly forgotten about except for doing `sum_to_size` at the end. This would want to be improved if/when we fuse nonconstant scalar arguments. - In a number of places in the fuser, the non-Tensor arguments to the fusion group needed to be ignored. Thank you, apaszke for the insightful discussion. All bad ideas and errors are my own. Pull Request resolved: https://github.com/pytorch/pytorch/pull/14957 Differential Revision: D13888173 Pulled By: zou3519 fbshipit-source-id: 071992c876e8b845f2b3e6329ae03a835d39a0ea	2019-01-31 12:24:38 -08:00
Michael Suo	dc84ff1e5a	Use a points-to graph for alias analysis (#16386 ) Summary: This PR changes the way we store aliasing information from a "set" approach to a "points-to" analysis. Set-based approaches lose information in ways that make it difficult to do "live" updates to the alias DB as one is mutating the graph. The tradeoff is that simple queries get more expensive, since they require traversing the points-to graph to answer most questions. In practice, this is unlikely to be that costly since we don't have massive aliasing chains, but we could create an approximation/caching layer if this becomes a problem. My rough plan is: 1. This PR, switching to a points-to graph 2. Make it "live": analyzing a node should record all the edges the node added, so that we can rollback when the node is destroyed. 3. Reduce wildcard scope: we can make the wildcard a special vertex that points to anything that we're not "sure" about; namely, things that have been put inside lists, or graph inputs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/16386 Differential Revision: D13855117 Pulled By: suo fbshipit-source-id: f009f58143173c275501624eb105d07ab60fe5e1	2019-01-30 11:28:03 -08:00
Mikhail Zolotukhin	47bf30661f	Directly include headers from ATen. Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16287 Differential Revision: D13792949 Pulled By: ZolotukhinM fbshipit-source-id: d627d8dc469df048063c70d0b5b8d33fede809a3	2019-01-24 11:22:27 -08:00
Michael Suo	83c054de48	AliasDB interface cleanup (#15656 ) Summary: This is the first of several PRs to simplify AliasDb usage. - Hide the concept of wildcards from users. They are too hard to think about and too easy to forget about. - Start moving "mutability-safe" graph mutation methods into AliasDb (right now, the various methods that deal with topological move). Eventually I want to create a "mutability-aware" handle to the graph. If you only use that handle to transform the graph, you can be sure that all transformations are safe with respect to mutability. Pull Request resolved: https://github.com/pytorch/pytorch/pull/15656 Differential Revision: D13615492 Pulled By: suo fbshipit-source-id: 5c39a157b4ea76f1f976315d06a314a89cc4f22f	2019-01-11 20:06:53 -08:00
Zachary DeVito	3f6b212e80	Register CPU/CUDA fuser dynamically (#15887 ) Summary: This avoids a bunch of conditional compilation logic Pull Request resolved: https://github.com/pytorch/pytorch/pull/15887 Reviewed By: eellison Differential Revision: D13613239 Pulled By: zdevito fbshipit-source-id: a18fc69676b3ef19b4469ab58d8714d1f6efccbb	2019-01-11 10:50:35 -08:00
Adam Paszke	d580d3583b	Simplify cat fusion (#15633 ) Summary: That makes the definition of a "fusable node" much simpler, as we don't need to keep considering whether something has to be an "exit node" at every step. The fuser now tries to maximize the pointwise fusions first, and proceeds to prepending chunks and appending concats only once a fix point is reached. This patch makes the fuser much simpler to reason about, which makes it significantly easier to implement features like SumToSize fusion that improve the performance of derivative graphs. cc zou3519 mruberry Pull Request resolved: https://github.com/pytorch/pytorch/pull/15633 Differential Revision: D13575306 Pulled By: zou3519 fbshipit-source-id: 0c55ea61d65d1f1ed3d75a8e1e83bc85a83f3aff	2019-01-11 10:33:42 -08:00
Adam Paszke	d35295c603	JIT Batch Norm fusion (#15897 ) Summary: Resubmit of #15146, which has been accidentally reverted. Pull Request resolved: https://github.com/pytorch/pytorch/pull/15897 Differential Revision: D13616093 Pulled By: zou3519 fbshipit-source-id: 0c3a3bec8f9fed57274da9f6c7cf40cbc05cf91a	2019-01-10 12:38:47 -08:00
Topher Lubaway	14b40c0633	Revert D13548303: [pytorch][PR] Add support for batch_norm fusion to the JIT Differential Revision: D13548303 Original commit changeset: a2e2e5abc383 fbshipit-source-id: 5b70cdbcbd1cac06eeefb2a939773358c061183c	2019-01-09 08:53:57 -08:00
Adam Paszke	5e1b35bf28	Add support for batch_norm fusion to the JIT (#15146 ) Summary: We don't support reductions yet, but simply decomposing batch_norm into a kernel that computes the stats, and then fusing everything else with ReLU and following pointwise ops provides nice speedups. Note that this is only limited to inference mode for now, because we don't support convolutions and batch norm in AD, so the fuser isn't applied to those parts. This commit gives us a 7% end-to-end speedup for ResNet50 with batch size 32. Note that this only applies to inference mode at the moment due to lack of AD support for CNN operations (I'll be adding that soon), and not to the standard `torchvision` models, because they use in-place ops which aren't supported by the fuser (we need a way of proving that de-inplacing them is safe). cc zou3519 zdevito mruberry ngimel Pull Request resolved: https://github.com/pytorch/pytorch/pull/15146 Differential Revision: D13548303 Pulled By: zou3519 fbshipit-source-id: a2e2e5abc383f637fae19bd1b423f20c2cbc056a	2019-01-08 07:00:19 -08:00
Michael Suo	f636dc9276	clang format world (#15524 ) Summary: The PR clang-formats everything in `torch/csrc/jit/` and adds it to the pre-commit hook. Here is a list of non-mechanical changes: - I went over each file and fixed up whenever I could tell that clang-format was clobbering comment formatting. - Made the macros in register_prim_ops a little more clang-format friendly by omitting trailing commas - Refactored autodiff.cpp to use a helper class with explicit state rather than a bunch of capturing lambdas - Small improvements to the precommit hook clang-format Pull Request resolved: https://github.com/pytorch/pytorch/pull/15524 Differential Revision: D13547989 Pulled By: suo fbshipit-source-id: 3ff1541bb06433ccfe6de6e33f29227a2b5bb493	2018-12-26 06:55:01 -08:00
Peter Goldsborough	7a61306031	Enable all clang-tidy performance checks (#15198 ) Summary: This PR adds the final set of clang-tidy checks we should add for our codebase: a last set of performance-related checks. Most fixes here are around changing `auto` to `const auto&` in a few places where unnecessary copies were made, and adding `reserve()` calls before loops doing repeated `push_back()`. Also a few cases of calling `std::string::find` with a single-character string literal instead of a single char, which uses a less efficient string search algorithm meant for searching larger substrings. ![image](https://user-images.githubusercontent.com/6429851/49978940-adc1a780-ff01-11e8-99da-a4e431361f07.png) ezyang apaszke Pull Request resolved: https://github.com/pytorch/pytorch/pull/15198 Differential Revision: D13468797 Pulled By: goldsborough fbshipit-source-id: 2bed1ea1c7c162b7f3e0e1026f17125e88c4d5b2	2018-12-14 13:32:47 -08:00
Natalia Gimelshein	fb140c7828	add erf and erfc to fuser/autodiff Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15139 Differential Revision: D13455690 Pulled By: soumith fbshipit-source-id: b06e5f5d362869c2e5fa11a52f9450d77c30d4cb	2018-12-13 19:17:40 -08:00
Edward Yang	517c7c9861	Canonicalize all includes in PyTorch. (#14849 ) Summary: Anywhere we used #include "foo.h", we now say #include <foo.h> Paths are adjusted to be rooted out of aten/src, torch/lib, or the root level directory. I modified CMakeLists.txt by hand to remove TH and THC from the include paths. I used the following script to do the canonicalization: ``` import subprocess import re import os.path files = subprocess.check_output(['git', 'ls-files']).decode('utf-8').rstrip().split('\n') for fn in files: if not any(fn.endswith(suff) for suff in ['.cu', '.cpp', '.in', '.h', '.hpp', '.cu', '.cuh', '.cc']): continue if not any(fn.startswith(pref) for pref in ["aten/", "torch/"]): continue with open(fn, 'r') as f: c = f.read() def fmt(p): return "#include <{}>".format(p) def repl(m): p = m.group(1) if p in ["dlfcn.h", "unistd.h", "nvrtc.h", "cuda.h", "cuda_runtime.h", "cstdint", "cudnn.h", "Python.h", "cusparse.h", "cuda_runtime_api.h", "cuda_fp16.h", "cublas_v2.h", "stdint.h", "curand_kernel.h"]: return fmt(p) if any(p.startswith(pref) for pref in ["torch/csrc", "c10/", "ATen/", "caffe2/", "TH/", "THC/", "Eigen/", "gtest/", "zdl/", "gloo/", "onnx/", "miopen/"]): return fmt(p) for root in ["aten/src", "torch/lib", ""]: for bad_root in [os.path.dirname(fn), "aten/src/TH", "aten/src/THC", "torch/csrc"]: new_p = os.path.relpath(os.path.join(bad_root, p), root) if not new_p.startswith("../") and (os.path.exists(os.path.join(root, new_p)) or os.path.exists(os.path.join(root, new_p + ".in"))): return fmt(new_p) print("ERROR: ", fn, p) return m.group(0) new_c = re.sub(r'#include "([^"]+)"', repl, c) if new_c != c: print(fn) with open(fn, 'w') as f: f.write(new_c) ``` Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/14849 Reviewed By: dzhulgakov Differential Revision: D13363445 Pulled By: ezyang fbshipit-source-id: 52361f878a672785f9306c9e9ab2513128092b68	2018-12-08 19:38:30 -08:00
Adam Paszke	8dfebc16cc	Improvements for symbolic AD (#14758 ) Summary: Review only the last commit. This commit adds a few optimizations to AD, that let us dramatically reduce the number of sizes we capture from forward. We now: - collapse chains of SumToSize - avoid capturing sizes of tensors that are captured anyway - more aggressively DCE the reverse code - run CSE on the primal code to deduplicate `aten::size` calls cc zou3519 zdevito Pull Request resolved: https://github.com/pytorch/pytorch/pull/14758 Differential Revision: D13324440 Pulled By: zou3519 fbshipit-source-id: 45ccbc13605adcef2b461840c6089d3200000c72	2018-12-04 20:38:21 -08:00
Adam Paszke	d76fd43294	Reenable all forward-pass fusions that worked before the AD fix (#14558 ) Summary: Dealing with so many `aten::size` calls (in particular calls on elements computed inside fusion groups) requires us to do some extra graph processing in the fuser (to compute the sizes by explicit broadcasts, instead of writing the intermediate tensors only to check their size). This restores the forward expects of LSTM and MiLSTM to a single big kernel. Unfortunately the backward is much harder, because as long as we can't prove that the reductions are unnecessary (or if we can't distribute them over the op), we will not be able to fuse them. Pull Request resolved: https://github.com/pytorch/pytorch/pull/14558 Differential Revision: D13321748 Pulled By: zou3519 fbshipit-source-id: c04fc2f70d106d2bfb56206b5aec517a93b79d1f	2018-12-04 15:43:37 -08:00
Adam Paszke	7bc489c827	Disable randn_like fusion in the JIT (#14752 ) Summary: Fixes #14674. We won't have time for a proper fix before the release, so at least disable fusion of nodes that trigger incorrect behavior. Pull Request resolved: https://github.com/pytorch/pytorch/pull/14752 Differential Revision: D13320407 Pulled By: zou3519 fbshipit-source-id: 2400f7c2cd332b957c248e755fdb0dadee68da5d	2018-12-04 08:55:47 -08:00

118 Commits