pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Wanchao Liang	ac00e85e36	Remove undefined tensor in jit script (#16379 ) Summary: This PR is a follow up of #15460, it did the following things: * remove the undefined tensor semantic in jit script/tracing mode * change ATen/JIT schema for at::index and other index related ops with `Tensor?[]` to align with what at::index is really doing and to adopt `optional[tensor]` in JIT * change python_print to correctly print the exported script * register both TensorList and ListOfOptionalTensor in JIT ATen ops to support both * Backward compatibility for `torch.jit.annotate(Tensor, None)` List of follow ups: * remove the undefined tensor semantic in jit autograd, autodiff and grad_of * remove prim::Undefined fully For easy reviews, please turn on `hide white space changes` in diff settings. Pull Request resolved: https://github.com/pytorch/pytorch/pull/16379 Differential Revision: D13855677 Pulled By: wanchaol fbshipit-source-id: 0e21c14d7de250c62731227c81bfbfb7b7da20ab	2019-02-07 11:02:14 -08:00
Mikhail Zolotukhin	0e6123fb8a	Remove dependency on ResourceGuard from IR.h. (#16351 ) Summary: It looks like `WithInsertionPoint` and `WithCurrentScope` can be easily implemented without `ResourceGuard` - that helps readability and removes one more dependency. Is there anything I'm missing? Pull Request resolved: https://github.com/pytorch/pytorch/pull/16351 Differential Revision: D13821826 Pulled By: ZolotukhinM fbshipit-source-id: b203200b345fb5508a97dc8656e6f51cde4cc21f	2019-01-29 00:21:32 -08:00
Mikhail Zolotukhin	47bf30661f	Directly include headers from ATen. Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16287 Differential Revision: D13792949 Pulled By: ZolotukhinM fbshipit-source-id: d627d8dc469df048063c70d0b5b8d33fede809a3	2019-01-24 11:22:27 -08:00
Michael Suo	f636dc9276	clang format world (#15524 ) Summary: The PR clang-formats everything in `torch/csrc/jit/` and adds it to the pre-commit hook. Here is a list of non-mechanical changes: - I went over each file and fixed up whenever I could tell that clang-format was clobbering comment formatting. - Made the macros in register_prim_ops a little more clang-format friendly by omitting trailing commas - Refactored autodiff.cpp to use a helper class with explicit state rather than a bunch of capturing lambdas - Small improvements to the precommit hook clang-format Pull Request resolved: https://github.com/pytorch/pytorch/pull/15524 Differential Revision: D13547989 Pulled By: suo fbshipit-source-id: 3ff1541bb06433ccfe6de6e33f29227a2b5bb493	2018-12-26 06:55:01 -08:00
Zachary DeVito	0368054a6d	Split up compiler.cpp (#15355 ) Summary: This separates the different parts of compiler.cpp to make their relationship more clear. In particular it adds: * sugared_value.{h,cpp} - all the public SugaredValues that the compiler defines and a few that were inside compiler.cpp * type_parser.{h, cpp} - Turns TreeRef's defining types into TypePtr * schema_matching.{h, cpp} - infrastructure for matching arguments against overloaded schema and emitting builtin operators with a particular schema. Retains: * compiler.{h, cpp} - now responsible simply for the `defineMethodsInModule` infra structure. Some utility functions like inlineCallTo have moved to ir.h. Only thing that is not a move is some changes in module.h/cpp that remove multiple returns from `Method::emit_call_to`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/15355 Reviewed By: suo, wanchaol Differential Revision: D13507524 Pulled By: zdevito fbshipit-source-id: 69ec936a9ff1a383c12a883616346b219c72e393	2018-12-18 19:43:35 -08:00
David Riazati	f3bff2d500	Add RNNCell modules to Script standard library (#14695 ) Summary: Adds RNNCell modules to script standard lib cc apaszke for argument_spec changes Pull Request resolved: https://github.com/pytorch/pytorch/pull/14695 Differential Revision: D13467680 Pulled By: driazati fbshipit-source-id: 13a14da87714325cc4c3d49e5fde8a850d5d757b	2018-12-18 17:28:28 -08:00
James Sun	e37a22128e	Allow tracing with fork/wait (#15184 ) Summary: There is still limitation on this: if a script module is somewhere in the trace, the inputs/outputs can only be tensors or tuples of tensors. resolves #15052 Pull Request resolved: https://github.com/pytorch/pytorch/pull/15184 Differential Revision: D13457691 Pulled By: highker fbshipit-source-id: 8fe46afc41357a0eb8eadd83f687b31d074deb0e	2018-12-17 20:34:26 -08:00
Peter Goldsborough	7a61306031	Enable all clang-tidy performance checks (#15198 ) Summary: This PR adds the final set of clang-tidy checks we should add for our codebase: a last set of performance-related checks. Most fixes here are around changing `auto` to `const auto&` in a few places where unnecessary copies were made, and adding `reserve()` calls before loops doing repeated `push_back()`. Also a few cases of calling `std::string::find` with a single-character string literal instead of a single char, which uses a less efficient string search algorithm meant for searching larger substrings. ![image](https://user-images.githubusercontent.com/6429851/49978940-adc1a780-ff01-11e8-99da-a4e431361f07.png) ezyang apaszke Pull Request resolved: https://github.com/pytorch/pytorch/pull/15198 Differential Revision: D13468797 Pulled By: goldsborough fbshipit-source-id: 2bed1ea1c7c162b7f3e0e1026f17125e88c4d5b2	2018-12-14 13:32:47 -08:00
Edward Yang	517c7c9861	Canonicalize all includes in PyTorch. (#14849 ) Summary: Anywhere we used #include "foo.h", we now say #include <foo.h> Paths are adjusted to be rooted out of aten/src, torch/lib, or the root level directory. I modified CMakeLists.txt by hand to remove TH and THC from the include paths. I used the following script to do the canonicalization: ``` import subprocess import re import os.path files = subprocess.check_output(['git', 'ls-files']).decode('utf-8').rstrip().split('\n') for fn in files: if not any(fn.endswith(suff) for suff in ['.cu', '.cpp', '.in', '.h', '.hpp', '.cu', '.cuh', '.cc']): continue if not any(fn.startswith(pref) for pref in ["aten/", "torch/"]): continue with open(fn, 'r') as f: c = f.read() def fmt(p): return "#include <{}>".format(p) def repl(m): p = m.group(1) if p in ["dlfcn.h", "unistd.h", "nvrtc.h", "cuda.h", "cuda_runtime.h", "cstdint", "cudnn.h", "Python.h", "cusparse.h", "cuda_runtime_api.h", "cuda_fp16.h", "cublas_v2.h", "stdint.h", "curand_kernel.h"]: return fmt(p) if any(p.startswith(pref) for pref in ["torch/csrc", "c10/", "ATen/", "caffe2/", "TH/", "THC/", "Eigen/", "gtest/", "zdl/", "gloo/", "onnx/", "miopen/"]): return fmt(p) for root in ["aten/src", "torch/lib", ""]: for bad_root in [os.path.dirname(fn), "aten/src/TH", "aten/src/THC", "torch/csrc"]: new_p = os.path.relpath(os.path.join(bad_root, p), root) if not new_p.startswith("../") and (os.path.exists(os.path.join(root, new_p)) or os.path.exists(os.path.join(root, new_p + ".in"))): return fmt(new_p) print("ERROR: ", fn, p) return m.group(0) new_c = re.sub(r'#include "([^"]+)"', repl, c) if new_c != c: print(fn) with open(fn, 'w') as f: f.write(new_c) ``` Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/14849 Reviewed By: dzhulgakov Differential Revision: D13363445 Pulled By: ezyang fbshipit-source-id: 52361f878a672785f9306c9e9ab2513128092b68	2018-12-08 19:38:30 -08:00
Adam Paszke	a60368982b	Batch more matrix multiplies (#13456 ) Summary: This handles the input pre-multiplication in RNNs, yielding pretty significant speedups in backward times. This pass depends on loop unrolling, so we'll batch only as many elements as the unrolling factor allows. cc mruberry ngimel zou3519 zdevito Pull Request resolved: https://github.com/pytorch/pytorch/pull/13456 Differential Revision: D12920339 Pulled By: zou3519 fbshipit-source-id: 5bcd6d259c054a6dea02ae09a9fdf9f030856443	2018-11-26 09:20:35 -08:00
Michael Suo	33d091f432	shape analysis fix (#14325 ) Summary: This PR is deceptively large because of an indenting change. The actual change is small; I will highlight it inline Pull Request resolved: https://github.com/pytorch/pytorch/pull/14325 Differential Revision: D13183296 Pulled By: suo fbshipit-source-id: fcbf6d5317954694ec83e6b8cc1c989f2d8ac298	2018-11-23 11:24:24 -08:00
Michael Suo	b149456645	alias analysis (#14018 ) Summary: First draft of an alias analysis pass. It's a big PR unfortunately; a rough table of contents/suggested order of review: 1. `AliasAnalysis` pass, which traverses the graph and builds an `AliasDb`. The basic strategy is to assign alias information to every value of mutable type (list/tuple/tensor), and use the alias annotations of each node's schema to assign alias info to the outputs based on the alias info the inputs. Nodes that aren't explicitly schematized have hand-written analysis rules. 2. Integration of aliasing information into `moveBefore/AfterTopologicallyValid()`. Basically, we pass in an alias DB when we ask for moveBefore/After. Similar to how we can boil down dependency analysis to "what nodes use this node", we can boil down mutability analysis to "what nodes write to an alias set input/output'd by this node". 3. Integration of alias analysis to optimization passes that need it. Right now, it is `GraphFuser`, `CreateAutodiffSubgraphs`, constant prop, and CSE. Not sure if any others need it. - Testing; still figuring out the best way to do this. - Eventually we want to integrate the alias db into the graph, but we shouldn't do that until we can guarantee that the information can stay up to date with mutations. - Do the same thing `python_printer` did for operators and force people to register alias analyzers if they can't schematize their op. Pull Request resolved: https://github.com/pytorch/pytorch/pull/14018 Differential Revision: D13144906 Pulled By: suo fbshipit-source-id: 1bc964f9121a504c237cef6dfeea6b233694de6a	2018-11-21 17:48:46 -08:00
Michael Suo	7ea9c674bc	migrate subgraph slicing to use `moveBefore/moveAfter` (#13862 ) Summary: Migrate the `CreateAutodiffSubgraphs` pass to use topologically-safe moves instead of DynamicDAG. This is to unify the interface that we use for determining safe node moves to prepare for mutability. The pass looks a lot like GraphFuser now, and there's a lot of code duplication. I plan to pull common stuff out into a "subgraph manipulation utils" thing, but didn't want to clutter this PR. Future steps: - Get rid of code duplication (see above) - Use DynamicDAG to back the `moveBefore/After` calls. Pull Request resolved: https://github.com/pytorch/pytorch/pull/13862 Differential Revision: D13072871 Pulled By: suo fbshipit-source-id: 92e7880ef444e0aefd51df60964bba7feaf42ae0	2018-11-14 17:33:36 -08:00
Thomas Viehmann	9ffabcfcaa	Use nested variant of getValueTrace to allow more flexible tracing script modules (#13597 ) Summary: When tracing scripted functions, we used to only allow Tensor arguments. This enables tracing script modules with List[Tensor] or Tuple[Tensor, Tensor] arguments (passing tuples). Fixes: #13566 Pull Request resolved: https://github.com/pytorch/pytorch/pull/13597 Differential Revision: D12990464 Pulled By: soumith fbshipit-source-id: fdce3afcb1e09f3c26d6ce834c01bf18d261f47c	2018-11-09 06:24:02 -08:00
Adam Paszke	f6ff5d8934	Append parameters when checking graphs for TorchScript Methods (#13553 ) Summary: Also, add an assertion in the GraphExecutor to make sure we don't access memory out of bounds. Pull Request resolved: https://github.com/pytorch/pytorch/pull/13553 Differential Revision: D12924796 Pulled By: soumith fbshipit-source-id: ea2a134084538484178b8ebad33d6716a8e1d633	2018-11-05 16:07:36 -08:00
Peter Goldsborough	0479517325	Add modernize-* checks to clang-tidy (#13196 ) Summary: Enables almost all `modernize-*` checks in clang-tidy. This warns against things such as: - Use of `const std::string&` instead of new-style `std::string` + move, - Using old-style loops instead of range-for loops, - Use of raw `new` - Use of `push_back` instead of `emplace_back` - Use of `virtual` together with `override` (`override` is sufficient) ezyang Pull Request resolved: https://github.com/pytorch/pytorch/pull/13196 Differential Revision: D12891837 Pulled By: goldsborough fbshipit-source-id: 4d0f782a09eb391ee718d3d66f74c095ee121c09	2018-11-02 20:30:40 -07:00
Michael Suo	5fbaf0eaf8	add augmented assignment ops (#13364 ) Summary: This PR changes the compiler to correctly emit in-place operators for augmented assignments (`+=` and friends). - To better match the Python AST structure, add an `AugAssign` tree view and make `Assign` apply only to `=` assignments. - Emit those `AugAssign` exprs in the compiler, dispatching to in-place aten ops for tensors and lowering to simple assignments for scalar types. - In order to preserve (suspect) ONNX export semantics, add a pass to lower the in-place operators to out-of-place operators. Pull Request resolved: https://github.com/pytorch/pytorch/pull/13364 Differential Revision: D12899734 Pulled By: suo fbshipit-source-id: bec83be0062cb0235eb129aed78d6110a9e2c146	2018-11-02 00:01:07 -07:00
Michael Suo	57e162da56	Switch mutable lists to new mutable schema (#13406 ) Summary: Goodbye, World! This PR removes the world tokens and associated pass and switches lists over to the new mutability/aliasing annotations. Should resolve #12780 since we are disabling optimization pending alias analysis. Pull Request resolved: https://github.com/pytorch/pytorch/pull/13406 Differential Revision: D12886463 Pulled By: suo fbshipit-source-id: e64e55905aebdcad273b39862df3209f823f5408	2018-11-01 19:41:04 -07:00
mruberry	6fe089c6ea	Hierarchical device independent -> device specific architecture (#13108 ) Summary: This PR principally redesigns the fuser's logical flow to be hierarchical, with device-independent logic directing (relatively little) device-specific logic. This design is based on reviews of XLA, TVM, internal design review at NVIDIA and discussions with fuser owners at Facebook. To further vet the design I have begun developing the next significant PR (extended fusion logic) on top of this architecture and it has made the work significantly easier. This PR also improves fuser modularity, which should make it easier for others to contribute to. Unfortunately, this PR is large and its nature has made breaking it into smaller pieces challenging. Future PRs should be smaller. The fusion flow is now: - Fusions are "registered" and "upfront compilation" occurs. The fusion specifications, which includes the graph, go into a thread-safe device-independent cache. Upfront compilation generates some information used later during shape inference. - Fusions are run, which passes them to an executor that performs shape inference, requests an instantiated fusion from the specification's thread-safe store, and launches them. Launch logic eventually defers to device-specific logic. - Fusions not previously instantiated are compiled. Compilation is device-specific and arg-specific. Compilation logic eventually defers to device-specific logic. - If the fusion could not be run because fusion on the requested device is disabled or shape inference fails a fallback is invoked. This flow can be thought of as PyTorch IR -> Device-Independent Fusion Logic -> Device-Specific Fusion Logic. The current upstream logic is, by contrast, PyTorch IR -> Device-Specific Logic -> Device-Independent Logic, which results in needless code duplication and lack of conceptual clarity. That was my mistake when splitting the fuser off from the rest of the jit and our reviews since then have been incredibly helpful in understanding why the approach in this PR is better. This PR does not only move code around. It also fixes few couple bugs and makes some logical/code changes. Bug fixes: - thread-safety is improved with caches preventing concurrent access - the nvrtc version is now reviewed to determine the appropriate compute architecture to compile for, fixing a bug that would cause runtime errors if a user's nvrtc didn't support the compute architecture their gpu reported - an issue with DeviceGuard not setting the device properly and failing silently is worked-around (ezyang mentioned he was reviewing the dynamic registration DeviceGuard uses, which may resolve the issue) Code/Logical changes: - "const" now appears many more places (note: I cast const away in operator.h because of some obscure build issues -- I think we should be able to fix this and will take a look while this goes through testing) - The new flow allowed some redundant code to be removed (AnnotatedGraph is gone, for example, and the more straightforward flow eliminated duplication of effort elsewhere) - Fallback logic is now also invoked if a fusion is requested on a device that cannot handle fusions - Use of macros to determine which files are compiled is reduced (though they may come back if the Windows build is unhappy) - There is no more "common" code or folder, the device-independent logic being at the forefront of the fuser replaces and improves upon the goal of sharing code apaszke who I promised naming rights to zdevito who correctly pointed out that the device-independent logic should be the bulk of what the fuser is doing ngimel who contributed to the design of this architecture Pull Request resolved: https://github.com/pytorch/pytorch/pull/13108 Reviewed By: gchanan, fmassa Differential Revision: D12850608 Pulled By: soumith fbshipit-source-id: 24e2df6dfa97591ee36aeca8944519678c301fa3	2018-10-31 18:13:00 -07:00
Zachary DeVito	ce0d3e9b35	Bind inplace and _out variants into JIT (#13093 ) Summary: This commit is a minimial initial pass at adding inplace and _out variants to the JIT. It changes gen_jit_dispatch.py to add bindings for these operators, and it also supplements the FunctionSchema with alias information for these operators and for viewing operators. Tests are very minimal and will need to be improved in future commits. Notes: * Custom operator tests needed to be changed since _out variants add overloads, which the custom operator pipeline does not handle when called from python. This commit registers special test ops in the _test namespace for this purpose. * Extends the schema parser to parse alias annotations more robustly. * Extends FunctionSchema with `writes()` a set of alias set names that the op will write to, and `annotatedType()` which will return AnnotatedType objects which contain the alias_set information that was parsed from the schema. * Disables all optimizations in graph executor when a mutable operator is found. This is something that will be improved in the future but is necessary for correctness now. * Adds annotate_ops to gen_jit_dispatch which adds aliasing information to all of the aten ops. * Adds AnnotatedType to the type hierarchy which is used to mark List and Tensor types with their alias_set. These types only appear in schema when you call annotatedType and are erased from types in normal use. * Extends jit::Type with .containedTypes() and .withContained(new_types). The first returns all types contained within the type (e.g. T for T[], or {T,L} for a tuple (T, L)). The second constructs a new version of the same type, replacing the contained types with new_types. This simplifies a lot of logic for recursively cleaning up types. * Refactor List[T] into a common part that is shared with Annotated[T] and can be shared with Optional[T] and Future[T] when they are merged. Pull Request resolved: https://github.com/pytorch/pytorch/pull/13093 Differential Revision: D10848176 Pulled By: zdevito fbshipit-source-id: d057f23eeb99cde8881129b42d3f151ed5e7655d	2018-10-26 10:37:20 -07:00
Elias Ellison	00aedfc0e2	constant pooling pass (#12222 ) Summary: Add a pass to move all constants to the beginning of the graph, and deduplicate. This extends https://github.com/pytorch/pytorch/pull/10231 to also handle constants introduced in inlining, constant propagation, etc. Pull Request resolved: https://github.com/pytorch/pytorch/pull/12222 Reviewed By: driazati Differential Revision: D10201616 Pulled By: eellison fbshipit-source-id: bc9c5be26868c8b5414257a0d4462de025aeb9bd	2018-10-08 11:55:02 -07:00
Zachary DeVito	bd09ab6687	Remove stages from IR, they are not longer used Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12352 Differential Revision: D10219743 Pulled By: zdevito fbshipit-source-id: 4d9441dc3748616f9b1f0734c65ec1a7abb0d663	2018-10-05 13:58:15 -07:00
Michael Suo	7f35e92af2	mutable lists (#10700 ) Summary: This PR implements the design that we discussed. Changes: - Added a World token IValue and type. The IValue is basically a dummy struct for now, in the future we may extend it (say, add thread-local state). - Effectful ops explicitly declare they are mutable by having World tokens as inputs and outputs in their schema. - Purely functional ops that use mutable values will get "fenced" and the world token will be threaded through the fences - AnnotateEffects pass which wires up all the world tokens together. Pull Request resolved: https://github.com/pytorch/pytorch/pull/10700 Reviewed By: eellison Differential Revision: D9547881 Pulled By: michaelsuo fbshipit-source-id: ebbd786c31f15bf45e2ddb0c188438ff2f5f3c88	2018-09-27 19:25:13 -07:00
Richard Zou	c8a0b11b7f	add autodiff expressions for common operations (#11832 ) Summary: This PR does a few things: Previously test_jit.py only tested autograd on backward graphs. This is because we borrow from test_autograd and construct graphs with a small number of nodes. Because the number of nodes is small (typically 1-2), those graph do not end up containing autodiff subgraphs, so autodiff never gets tested. This PR enables autodiff testing by doing the following: - added disableDebugAutodiffSubgraphInlining fn to graph_executor to disable autodiff subgraph inlining. - (implementation) added autodiffSubgraphNodeThreshold and autodiffSubgraphInlineThreshold. These are set to their default values (2, 5) but disableDebugAutodiffSubgraphInlining() sets both to 1, disabling subgraph inlining and allowing 1-node autodiff subgraphs. - The relevant backward jit tests disable autodiff subgraph inlining so they will test the autodiff versions of the operators instead of autograd whenever an autodiff variant exists. - We don't run the tests that do inline autodiff subgraphs anymore. This has no impact on testing correctness because the assumption is that autograd functions are correct and are tested in test_autograd.py This allows the graph fuser to work better because a lot of these ops were previously not autodiff-compatible but fusible. On a more concrete example, lstm backward contains a lot of tensor-scalar operations; these autodiff formulas help its double backward pass. Included: - arithmetic overloads - abs, acos, asin, atan, ceil, cos, cosh, exp, expm1, floor, fmod, frac, log, log10, log1p, log2 reciprocal, remainder, round, sin, sinh, tan, trunc, rsqrt TestJitGenerated tests autodiff for all of the added operations. cc apaszke zdevito Pull Request resolved: https://github.com/pytorch/pytorch/pull/11832 Differential Revision: D10031256 Pulled By: zou3519 fbshipit-source-id: 9daf9900a5ad187743609cd0fbbd10b15411ad93	2018-09-26 08:10:04 -07:00
Adam Paszke	7efbf3a827	Specialize ArgumentSpecs on tuple elements too (#11863 ) Summary: This is pretty important because a common situation of passing LSTM hidden states as a tuple completely trashes performance of a network. Cleans up all our propagation/undef specialization passes, at a cost of increased complexity of `ArgumentSpec` and `GraphExecutor`. An alternative would be to simply flatten all tuple inputs to a graph ahead of time, but that might just end up being confusing in the future (you never know if you're working with a graph that can have tuple or not). Pull Request resolved: https://github.com/pytorch/pytorch/pull/11863 Differential Revision: D9992814 Pulled By: apaszke fbshipit-source-id: 0a565a3b23e32f8fa72c0534e07c1ce6187739fc	2018-09-21 14:19:58 -07:00
Adam Paszke	98e04db955	Implement requires_grad propagation in the JIT (#11586 ) Summary: Previously, we would pretty much assume that all floating point tensors do require grad, which might result in some unnecessary compute. I don't really like the fact that `TensorType` uses `tensor.is_variable() && tensor.requires_grad()` to infer the value of `requires_grad`, but changing constants to keep variables turns out to be pretty hard. I got halfway there, but it would still need some more work. Pull Request resolved: https://github.com/pytorch/pytorch/pull/11586 Reviewed By: ezyang Differential Revision: D9813648 Pulled By: apaszke fbshipit-source-id: 77f77756d18ff7632fca3aa68ce855e1d7f3bdb8	2018-09-13 19:25:26 -07:00
Adam Paszke	0ddbe668cd	Improve shape analysis to cover all most commonly used ops (#11358 ) Summary: [Here's a list](https://gist.github.com/apaszke/f0821840bdcc67a977832dc58acc1b85) of ops that are in `register_aten_ops.cpp`, but aren't supported in shape prop. Everything else should work now. Pull Request resolved: https://github.com/pytorch/pytorch/pull/11358 Differential Revision: D9753693 Pulled By: apaszke fbshipit-source-id: efeae0126ce16cb56b8797fc5246405588bcae3c	2018-09-11 06:02:39 -07:00
Adam Paszke	3081c8ea1d	Lower trivial differentiable subgraphs (#11110 ) Summary: zdevito Pull Request resolved: https://github.com/pytorch/pytorch/pull/11110 Differential Revision: D9616408 Pulled By: apaszke fbshipit-source-id: f1ae77d698bf0ada32f2c1c3f587e46a4f57a867	2018-08-31 14:55:10 -07:00
Adam Paszke	00df09b65d	Change specialization rules in GraphExecutors (#10977 ) Summary: Review last commit only. Stacked on top of #10949. This commit fixes a number of issues connected to caching differentiability status of graphs inside graph executors, and changes the rules for optimization of differentiable subgraphs. Previously every one of those was instantiated as a separate graph executor, but now they are simply heavier-optimized graph regions, and graph executors are only instantiated for their backward. zdevito Pull Request resolved: https://github.com/pytorch/pytorch/pull/10977 Differential Revision: D9600626 Pulled By: apaszke fbshipit-source-id: dad09a0f586e396afbd5406319c1cd54fbb8a3d3	2018-08-30 22:11:01 -07:00
Adam Paszke	f3c3127c67	Don't flatten output lists in the JIT IR (#10949 ) Summary: Operators like aten::chunk used to return a number of tensors, but now return a list. To make it easier to do shape prop through aten::chunk and fuse it, I've also introduced prim::ConstantChunk, which behaves like the previous implementation (has a variable length output list). The downside of this PR is that the introduction of more lists to the IR causes the LSTM and MiLSTM graphs to be considered as non-differentiable by the graph executor. I verified that they are still optimize correctly, and my next patch (that changes how the specializations/differentiation works) will restore those. zdevito Pull Request resolved: https://github.com/pytorch/pytorch/pull/10949 Reviewed By: zdevito Differential Revision: D9556823 Pulled By: apaszke fbshipit-source-id: 33e63b17fc7247cac6cfc05eb7eb9bf069b499ee	2018-08-30 19:54:39 -07:00
Richard Zou	29406a2c4c	Fix shared_ptr refcycle in graph executor (#10222 ) Summary: Fixes #10032 When capturing an output, GraphExecutorAutogradFunction creates SavedVariable with is_output=False and owns it: https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/graph_executor.cpp#L87 Constructing SavedVariable with is_output=False makes it own a copy of the shared_ptr<GraphExecutorAutogradFunction>, which causes a reference cycle: `6456b944fd/torch/csrc/autograd/saved_variable.cpp (L27)` The solution in this PR is to construct the SavedVariable with is_output=True if the captured value is an output. Test Plan Turn on cuda memory checking for JitTestCase. If the test's name includes "cuda" or "gpu" in it, the cuda memory checking test happens. cc zdevito Pull Request resolved: https://github.com/pytorch/pytorch/pull/10222 Reviewed By: ezyang Differential Revision: D9162995 Pulled By: zou3519 fbshipit-source-id: aeace85a09160c7a7e79cf35f6ac61eac87cbf66	2018-08-04 11:39:10 -07:00
Adam Paszke	1f13453b4d	Slightly relax the constraints on argument and return types to script functions (#9969 ) Summary: This lays out initial support for taking and returning a richer set of types than only tensors. Floats and ints are already valid, lists are straightforward to add, tuples need some discussion. Based on top of #9948. Review only the last commit. zdevito Pull Request resolved: https://github.com/pytorch/pytorch/pull/9969 Reviewed By: zdevito Differential Revision: D9076973 Pulled By: apaszke fbshipit-source-id: 5a1fe912ea6b79ab2bfd0dcce265eb05855b5ff0	2018-07-31 14:25:29 -07:00
Elias Ellison	e57cb4a1b2	Add a Constant Propagation Pass to the JIT (#8808 ) Summary: Adding a constant propagation pass to the JIT. I have added examples to the expect files. There are a couple of special cases which have not been implemented here. IF nodes with constant conditions can be inlined with the correct block. WHILE nodes can be removed if the condition is false. I have added a test for each case in test_jit.py file as expected failures. To be consistent with DCE, python ops & CPP ops are treated as not having side-effects. Pull Request resolved: https://github.com/pytorch/pytorch/pull/8808 Reviewed By: wanchaol Differential Revision: D8906770 Pulled By: eellison fbshipit-source-id: 10ad796d89f80b843566c9ddad6a0abd1f3dc74c	2018-07-30 15:54:31 -07:00
Peter Goldsborough	04939a4745	Match parameter names and = default (#9737 ) Summary: More clang tidy cleanups in `torch/csrc`. This time: 1. `hicpp-use-equals-default` recommends `= default` instead of `{}` for constructors/destructors. This is better practice because it expresses the intent better (https://stackoverflow.com/questions/6502828/what-does-default-mean-after-a-class-function-declaration) 2. `readability-inconsistent-declaration-parameter-name` enforces that parameter names in the declaration match parameter names in the definition. This is just generally useful and can prevent confusion and bugs. Also updated my script a little bit. apaszke ezyang Pull Request resolved: https://github.com/pytorch/pytorch/pull/9737 Differential Revision: D9069069 Pulled By: goldsborough fbshipit-source-id: f7b3f3a4eb4c9fadc30425a153566d3b613a41ae	2018-07-30 14:10:00 -07:00
Adam Paszke	8cb1eef7b9	Unify IR operator representation (stop using attributes in the JIT) (#9807 ) Summary: Based on top of #9763 (first 3 commits belong to that PR). The first commits from this PR are "Stop using attributes ..." I tried to separate the changes into fairly meaningful commits. I can't split them up into smaller PRs, because everything starts working and all tests pass only after the whole sequence, but hopefully this will make reviewing somewhat easier. Known issues/regressions/future tasks: - `aten::lerp` and `aten::clamp` are no longer fusable - `CreateAutodiffSubgraphs` needs a rewrite - It is much more strict now, and will miss a lot of opportunities, especially when viewing ops are involved. Our previous approach was "ignore the assumption on shape availability in gradient formulas to determine differentiability, and hope that shape prop will be robust enough to actually deliver them before we differentiate", which obviously doesn't scale well to more complex cases. We should either work on reducing the size dependency of grad formulas (feasible e.g. for `view`/`reshape`, unfeasible for `squeeze`/`unsqueeze`), or make `CreateAutodiffSubgraphs` integrate some kind of "I could integrate this node into an AD subgraph, but will I be able to infer the shape of its input" reasoning (kind of like a limited shape prop, that doesn't infer anything, and only tells if it could infer something). - It sometimes creates constant-only (or constants + one node) graphs, which is useless - Broken `aten::add` in auto-batching, because it gained a non-tensor input. I changed the test for pointwise operations to use `aten::mul` instead, but I needed to disable the LSTM cell test. I'm not sure how scalar constants should be implemented in this case, because I don't fully understand our format. cc: ChunliF - Graph import does some hacks to recover type of constants. This code should be removed once we'll gain the ability to export the IR along with value types. - There's still a fair amount of dead code that can be removed. I didn't want to make this diff any bigger, and removing it is an easy task. - Graph fuser could be improved to use signature matching (possibly using `OperatorSet`) instead of basing on node kinds. - Manual constant propagation for the `ListConstruct` node in `torch/onnx/utils.py` should be replaced with a proper constant propagation pass (or we should ensure that the one we have handles at least this case before we remove this code). zdevito Pull Request resolved: https://github.com/pytorch/pytorch/pull/9807 Reviewed By: ezyang Differential Revision: D9004285 Pulled By: apaszke fbshipit-source-id: fe88026a765f6b687354add034c86402362508b7	2018-07-26 22:11:50 -07:00
Adam Paszke	e39c8043dc	Make GraphExecutors work on Stacks instead of variable_tensor_lists (#9763 ) Summary: This is blocking the IR operator unification, because I need to be able to pass scalars to backward functions. zdevito Pull Request resolved: https://github.com/pytorch/pytorch/pull/9763 Reviewed By: zou3519 Differential Revision: D8978457 Pulled By: apaszke fbshipit-source-id: 570b4c3409322459cb0f2592069730a7d586ab20	2018-07-26 12:00:27 -07:00
James Reed	0b16b03b98	Plumb type annotations through script compilation (new) (#9547 ) Summary: Supersedes https://github.com/pytorch/pytorch/pull/9405 Pull Request resolved: https://github.com/pytorch/pytorch/pull/9547 Reviewed By: zdevito Differential Revision: D8900327 Pulled By: jamesr66a fbshipit-source-id: a00a94615af4fbaec98ee3ede0cb54bcfd9108dd	2018-07-25 17:10:14 -07:00
Peter Goldsborough	f62bc01dfe	Remove TORCH_ASSERT (#9575 ) Summary: I got some tensor->variable conversion exceptions from `torch/csrc/autograd/variable.h`, which used the `TORCH_ASSERTM` macros instead of `AT_CHECK`, so they didn't have backtraces. This was such a substantial loss for debugability that I decided to update the whole codebase to use the backtrace-enabled ATen macros instead of `TORCH_ASSERT` and `JIT_ASSERT`, the latter having been an alias of the former. ezyang apaszke zdevito Pull Request resolved: https://github.com/pytorch/pytorch/pull/9575 Differential Revision: D8924566 Pulled By: goldsborough fbshipit-source-id: 7a4013b13eec9dbf024cef94cf49fca72f61d441	2018-07-24 18:10:06 -07:00
Adam Paszke	aa7af94656	Make JIT tracing a thread-local property (#9414 ) Summary: As in the title. Lets us simplify a lot of code. Depends on #9363, so please review only the last commit. zdevito Pull Request resolved: https://github.com/pytorch/pytorch/pull/9414 Reviewed By: zdevito Differential Revision: D8836496 Pulled By: apaszke fbshipit-source-id: 9b3c3d1f001a9dc522f8478abc005b6b86cfa3e3	2018-07-19 19:09:39 -07:00
Zachary DeVito	9ed2190bdb	Add a tagged union type that replaces tensor in the interpreter. (#9368 ) Summary: IValue is short for interpreter value. It is used frequently so a short name is important. This will allow us to implement more non-tensor types in an efficient way and remove many hacks from the compiler. This PR is limited. It only introduces IValue and changes interpreter to use it. Follow up PRs will: * Change the way aten_ops consume non-tensor types so that integer lists, are no longer represented as Tensors. * Introduce TensorList as a fundamental type and remove all vararg handling in gen_jit_dispatch * Change the compiler to implement math on primitive numbers rather than converting to tensors. jamesr66a apaszke Pull Request resolved: https://github.com/pytorch/pytorch/pull/9368 Reviewed By: ezyang Differential Revision: D8817598 Pulled By: zdevito fbshipit-source-id: 29dce80611ce5f6384234de9d12a67861d2b112f	2018-07-16 15:40:22 -07:00
Mary McBreen	483ae8cb5d	Replaces const ref with && for apply (#9175 ) Summary: Addresses https://github.com/pytorch/pytorch/issues/5011 Tested with python test/test_autograd.py Pull Request resolved: https://github.com/pytorch/pytorch/pull/9175 Reviewed By: zdevito Differential Revision: D8736377 Pulled By: marymcbreen fbshipit-source-id: ff86f427f7b2cf0cab5912e7f32812bd0f49a712	2018-07-12 08:31:59 -07:00
Adam Paszke	b9f575fc33	Remove legacy code from the JIT (#9323 ) Summary: In particular, get rid of backward tracing and CppOp. Pull Request resolved: https://github.com/pytorch/pytorch/pull/9323 Reviewed By: ezyang Differential Revision: D8795935 Pulled By: apaszke fbshipit-source-id: fb7a7eeee41902da35f2a8efd77262ca60fd6bbe	2018-07-11 10:25:38 -07:00
Zachary DeVito	f74207c99f	Allow autograd to work even when the shape of values cannot be determined (#8641 ) This commit implements the solution proposed in https://github.com/pytorch/pytorch/issues/8410 to workaround the need to create zero tensors with the same shape as inputs. It introduces the concept of a LinearBlock which marks places in the code where we know if all the inputs to the node are zero, then the outputs to the node are also zero. Autodiff introduces LinearBlocks around backwards functions, which have this property. specializeUndef then propagates Undef nodes using this information. Notes: * Since we do not always specialize, we have a pass LowerLinearBlocks that replaces the block with an if statement that dynamically guards the Undef case. * We introduce AutogradAdd which is addition that still works when its inputs might be undefined. In cases where we specialize this will get removed in favor of a normal add, but there are cases where gradient graphs do not specialize (e.g. when they are not differentiable, but a derivative is required) so it is important for this op to be executable.	2018-06-25 18:40:04 -07:00
Richard Zou	8489c4cc6e	Better support for literals in jit script (#8687 ) Addresses #8177 A design doc can be found here: [gist](https://gist.github.com/zou3519/4b7f13f03cc9f3612bd9363e6405fa0a) version or [quip](https://fb.quip.com/azL1AqUckBdo) version General approach: - Add NumberType, FloatType, IntType to represent Python numbers, floats and ints. - Emit these types for python literals - Change aten_schema such that Scalars are NumberType, int64_t and bool are IntType. - Emit aten::type_as, prim::NumToTensor, and prim::TensorToNum nodes for tensor-number math. (see examples below) - Erase NumberType, prim::NumToTensor, and prim::TensorToNum for ONNX export ### Tensor/number math ``` import torch @torch.jit.script def fn(x): return x + 1 ``` ``` graph(%x : Dynamic) { %1 : int = prim::Constant[value={1}]() %2 : Dynamic = prim::NumToTensor(%1) %3 : Dynamic = aten::type_as(%2, %x) %4 : Dynamic = aten::add[alpha={1}](%x, %4) return (%5); } ``` ### Number/Number Math ``` import torch @torch.jit.script def fn(zero): c = 1 + 1 return zero + c ``` ``` graph(%zero : Dynamic) { %1 : int = prim::Constant[value={1}]() %2 : int = prim::Constant[value={1}]() %3 : Dynamic = prim::num_to_tensor(%1) %4 : Dynamic = prim::num_to_tensor(%2) %5 : Dynamic = aten::add[alpha={1}](%3, %4) %c : int = prim::TensorToNum(%6) # this is the result of the addition ... return (%13); } ``` List of squashed commits: * Introduce Python Number types Added: IntType, FloatType, NumberType with IntType <: NumberType FloatType <: NumberType Changed aten_schema so arguments have corresponding types * Emit a NumberType for python literals. Also emit a NumberType for Scalar default values. * Add prim::NumToTensor and prim::TensorToNum * Add DynamicType -> NumberType implicit cast for bc * Better ensureTensor error message * Add ensureTensorOrNumber. Allow passing Number to some functions Like the range() construct and slices * Patch IntList to work. IntList is still a DynamicType in the frontend: a tensor gets built from a List[int]. Also, IntList[1] is a "union between int and IntList" the way it is implemented. If the frontend sees an int being passed for an IntList[1] arg, it converts it to a tensor as well. * Enforce some order on schemas to avoid overload ambiguity add(Tensor, Tensor) should appear earlier than add(Tensor, Scalar). This matches the order in which python_arg_parser parses its arguments. * Disable std_dim and var_dim tests. With the new schema information, std(input, keepdim) and std(input, dim) are ambiguous. This will need to be fixed at a later date. * Add NumberType erasure pass. This is used for ONNX export and to ensure that NumberType information doesn't reach the interpreter * Add support for mixed tensor/number math ops. * Tests for new functionality. Includes: - Tensor/number math - number/number math - EraseNumberTypes pass test * Patch tests Update expect tests for: - decompose_addmm - loop unrolling tests Because python numbers are now NumberType, they cannot be returned by functions anymore. Work around this by using "torch.full", or by adding a tensor([0]) (taken from FIXME_zerol()). Both approaches are used because torch.full is more readable, but it is broken in some cases. * Add erase_number_types to torch/CMakeLists.txt * Move math back to emitSimpleExpr from emitSugaredExpr * Remove some dead lines * Renable some excluded script/trace tests that are fixed. * Move some tests to expected failure * Address some comments (more addressing to come) * Erase relevant aten::type_as nodes in EraseNumberTypes I also changed it so that EraseNumberTypes is only called for ONNX export. It is no longer used to prevent prim::NumToTensor/prim::TensorToNum from reaching shape_analysis or interpreter.cpp. shape_analysis infers the type of the output of these nodes to be the same as their input. intepreter.cpp treats both of these nodes as no-ops. * Add reminder to fix std/var * Call EraseNumberTypes only when exporting a script module * Update expects after rebase	2018-06-21 15:43:38 -04:00
ngimel	8d674c0d51	add comparison operators to jit (#8058 ) * add comparison operators to jit * try to fix CI * address review comments * fix type of comparison ops result * address review comments * fix indentation * add comments * require type_as to have non-dynamic tensor arg * Typo (should check if template argument of type_as, inputs()[1], is tensor) * Use .at() instead of [] * Use .at() again	2018-06-14 09:30:25 -04:00
Adam Paszke	f45a3d5558	Add a loop unrolling pass to PyTorch JIT (#7672 )	2018-06-06 09:36:12 +02:00
Adam Paszke	9232afeffa	Add code for TensorBoard visualization of JIT GraphExecutors (#8050 )	2018-06-02 20:55:25 +02:00
James Reed	1f94a6eab3	[JIT] Fission and fusion passes for addmm (#7938 ) * Addmm decomposition pass * Addmm peephole pass * Fix handling of output shape in fusion pass * Add DCE to the peephole passes * add comments * maybe bugfix? * Fix GPU tests * fix py2/3 test issue	2018-05-30 18:06:58 -04:00
Adam Paszke	b45f2ff1ae	Remove CompiledFunction + clean up JIT tests (#7421 )	2018-05-16 20:03:04 +02:00
Zachary DeVito	38bc732b2d	[jit] Change interpreter/fuser to work on Variables only (#7489 ) * this removes the flag controlling whether the interpreter works on variables. * now the interpreter _always_ works on variables * constants in the IR are still _always_ non-variables, and an assert was added to ensure this. * as_tensor was split into as_variable and as_tensor since it is sometimes used to construct constants in the IR * I tried changing the IR to also always use variables but that change was much more cross cutting and fragile and I never got it working	2018-05-11 13:33:47 -07:00

1 2

66 Commits