Summary:
To illustrate the benefits of this commit, I'll use the time/iter I got from one of the JIT benchmarks on my machine (a rough measurement sketch follows the table).
| Run | Time |
|----------------------------------------------|-------------------------|
| No profiler | 45ms |
| With profiler | 56ms |
| Use `clock_gettime` instead of `std::chrono` | 48ms |
| Touch all pages on block allocation | 48ms (less jitter) |
| Use `const char*` instead of `std::string` | 47ms (even less jitter) |
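As a rough sketch of how such a per-iteration measurement can be taken (this is not the actual benchmark harness; `model` and `example_input` are placeholders):

import time
import torch.autograd.profiler as profiler

def time_per_iter(model, example_input, iters=100, use_profiler=False):
    # Time the forward pass with and without the autograd profiler enabled,
    # to estimate the profiler's per-iteration overhead.
    start = time.perf_counter()
    for _ in range(iters):
        if use_profiler:
            with profiler.profile():
                model(example_input)
        else:
            model(example_input)
    return (time.perf_counter() - start) / iters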
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11773
Differential Revision: D9886858
Pulled By: apaszke
fbshipit-source-id: 58f926f09e95df0b11ec687763a72b06b66991d0
Summary:
This eliminates the need for any heuristics regarding stack size limits.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11534
Differential Revision: D9779866
Pulled By: resistor
fbshipit-source-id: 96753eead7904bbdc2869fb01f7bd42141032347
Summary:
Prior to this diff, there were two ways of compiling the bulk of the torch codebase, with no interaction between them - you had to pick one or the other.
1) with setup.py. This method
- used the setuptools C extension functionality
- worked on all platforms
- did not build test_jit/test_api binaries
- did not include the C++ api
- always included python functionality
- produced _C.so
2) with cpp_build. This method
- used CMake
- did not support Windows or ROCm
- was capable of building the test binaries
- included the C++ api
- did not build the python functionality
- produced libtorch.so
This diff combines the two.
1) cpp_build/CMakeLists.txt has become torch/CMakeLists.txt. This build
- is CMake-based
- works on all platforms
- builds the test binaries
- includes the C++ api
- does not include the python functionality
- produces libtorch.so
2) the setup.py build
- compiles the python functionality
- calls into the CMake build to build libtorch.so
- produces _C.so, which has a dependency on libtorch.so
In terms of code changes, this mostly means extending the CMake build to support the full variety of environments and platforms. There are also a small number of changes related to the fact that there are now two shared objects - in particular, Windows requires annotating some symbols with dllimport/dllexport, and doesn't allow exposing thread_local globals directly.
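As a very rough, hypothetical sketch of the shape of that interaction (not the actual build scripts in this diff), the setup.py side essentially shells out to CMake to produce libtorch.so before building _C.so against it:

import os
import subprocess

def build_libtorch(source_dir, build_dir):
    # Hypothetical helper, for illustration only: configure and build the
    # CMake project that produces libtorch.so, which _C.so then links against.
    os.makedirs(build_dir, exist_ok=True)
    subprocess.check_call(["cmake", source_dir], cwd=build_dir)
    subprocess.check_call(["cmake", "--build", "."], cwd=build_dir)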
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8792
Reviewed By: ezyang
Differential Revision: D8764181
Pulled By: anderspapitto
fbshipit-source-id: abec43834f739049da25f4583a0794b38eb0a94f
This changes type(tensor) to return `torch.Tensor` instead of
`torch.autograd.Variable`.
This requires a few implementation changes:
- torch.Tensor is now a regular Python class instead of a
pseudo-factory like torch.FloatTensor/torch.DoubleTensor
- torch.autograd.Variable is just a shell with a __new__ function.
Since no instances are constructed it doesn't have any methods.
- Adds torch.get_default_dtype(), since torch.Tensor.dtype now returns the attribute
  descriptor <attribute 'dtype' of 'torch._C._TensorBase' objects> rather than a
  usable default dtype (see the sketch after this list)
- Remove some uses of mega-header THP.h
- Use HANDLE_TH_ERRORS in functions that may throw
- Move NumPy includes to common header
- Delete unused allocator
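A quick sketch of the user-visible behavior after this change:

import torch

t = torch.ones(2, 2)
print(type(t))                      # <class 'torch.Tensor'>, not torch.autograd.Variable
print(isinstance(t, torch.Tensor))  # True
print(torch.get_default_dtype())    # torch.float32, unless the default was changed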
This removes volatile from Variable. The functionality is mostly
replaced by a global (thread-local) flag, which is controlled by
torch.set_grad_enabled() and the context manager torch.no_grad().
In C++, the flag is exposed through GradMode::is_enabled() and GradMode::set_enabled()
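A small sketch of the Python-side usage:

import torch
from torch.autograd import Variable

x = Variable(torch.ones(3), requires_grad=True)

with torch.no_grad():          # gradients are not recorded inside this block
    y = x * 2
print(y.requires_grad)         # False

torch.set_grad_enabled(False)  # flips the thread-local flag directly
print((x * 2).requires_grad)   # False
torch.set_grad_enabled(True)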
Fixes #3627
* CUDA mode profiler fixes
* Enable multi-gpu CUDA tracing
We need to record per-device start events because event timing
comparison only works for events on the same device.
* Coarse-grained CPU-CUDA syncing of timelines
Record a __cuda_start event used to synchronize CPU and CUDA timings.
This requires running some warm-up event records to ensure the
call to event record for the __cuda_start event doesn't take
longer than normal.
* Fix syncing
* Fix CUDA build and lint
* Add cudaEvent support to the profiler
This adds the ability to record CUDA timings using cudaEventRecord
in the profiler. Since it doesn't require nvprof, it is easier
to run than the nvprof path (see the usage sketch below).
This also records a thread id for each event, which will make
tracing results easier to understand.
* Add flow arrows from cpu to cuda event
* Fix the no-CUDA build
* Review comments
* Move CUDA checks to one place
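A rough usage sketch (`model` and `inp` are placeholders for a module and input already on a CUDA device):

import torch
from torch.autograd import profiler

with profiler.profile(use_cuda=True) as prof:
    out = model(inp)  # placeholder model/input on a CUDA device

# Per-op CPU and CUDA times, including the cudaEventRecord-based timings.
print(prof.key_averages().table(sort_by="cuda_time_total"))
# Chrome trace with per-event thread ids and CPU->CUDA flow arrows.
prof.export_chrome_trace("trace.json")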
It is not an /expression/ we trace, but a /graph/: that is,
a closed expression which knows its parameters. Knowing the list
of parameters is helpful and removes a hack when interpreting.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Although ANF style developments traditionally stratify syntactic
classes into atomic (Arg) and complex (Expr) expressions, where
atomic expressions could be variables, constants or lambdas, Zach has
successfully convinced me that we should do away with the variant here and
always require arguments to be variables. There are a few reasons for
this:
1) Tensor constants, not currently supported, could be modeled using a
"Constant" instruction, removing the need for them to be representable
directly inline. An inline constant is marginally more convenient
for peephole optimizations, but since we have gone full ANF, we are going
to need to be able to see across def-uses in any case, and it is not
too much worse to need to handle constants this way. By the way,
Swift Intermediate Language also made a similar choice, see
the slide on "Literal Instructions" in
http://llvm.org/devmtg/2015-10/slides/GroffLattner-SILHighLevelIR.pdf
2) Scalar constants, which are quite important for passing non-tensor
arguments to Python operators, are now stored out-of-band as NON
first-class values. This more closely matches the ToffeeIR design,
and makes it clear what parameters are "first class" (tensors only)
and which ones are not. However, we need to be able to unswizzle
the separate scalar/tensor lists into a unified list in the correct
format; this is what PyFunctionCConv is for (a sketch follows this list).
Also, Locals got renamed to Tuple.
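Purely as an illustration of the unswizzling (a hypothetical helper, not the actual PyFunctionCConv code): given a mask of which positions are first-class tensors, the out-of-band scalars and the tensor arguments are merged back into a single positional list.

def unswizzle(is_tensor, tensors, scalars):
    # Hypothetical illustration only. is_tensor marks which positions of the
    # original call were tensors; the rest come from the out-of-band scalars.
    tensors, scalars = iter(tensors), iter(scalars)
    return [next(tensors) if t else next(scalars) for t in is_tensor]

# e.g. for op(x, 2, y, "sum") with tensors x, y:
# unswizzle([True, False, True, False], [x, y], [2, "sum"]) == [x, 2, y, "sum"]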
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Previously, our AST was a DAG, where shared Nodes indicated a computation
should be reused. This commit rewrites the IR into a new functional
representation which represents sharing explicitly using variable
bindings.
We offer a few justifications for this new style:
1. The new representation is not all that different from the
old one; it is about as easy to construct, and the lack of an
explicit graph doesn't negatively impact our ability to interpret
the graph, since we've chosen, as a matter of design, to NOT have
the IR participate in the actual execution of a graph.
2. The new let-binding representation has an implicit ordering,
which we can use to conveniently keep track of the original order
in which the trace was recorded. This automatically gives us a topsort,
and gives us an easier-to-read textual representation of our
IR:
%14 = Embedding %11, %0, -1, None, 2, False, False
%15 = Dropout %14, 0.2, True, False
%16 = Index %12, 0
%17 = Index %12, 1
%18 = Index %13, 0
%19 = Index %13, 1
%20 = Index %15, 0
%21 = Linear %20, %1, %3
%22 = Linear %16, %2, %4
3. It moves us closer to a Futhark style language
(http://futhark-lang.org/publications/pldi17.pdf).
Major aspects of the diff
- Node is replaced with Expr and Arg, a pair of mutually recursive
structures which represent our new language. In BNF, the language
looks like this:
a ::= c | %i
e ::= %i, ... = e
    | PyOp e, ...
    | Ret %i, ...
Technically, Ret is not actually a return (no control flow is involved);
it just tuples up a series of tensors (identified by variables).
One important invariant is that locals are always tensors; they
are never constants (this is asymmetric with Args.)
- Arguments support Python constants. This is an important piece because
many operators take extra Python literals like integers and tuples in
order to specify extra parameters about how an operator operates. Adding
this was essential to getting word_language_model to work.
- As both Expr and Arg have multiple variants, there is new infrastructure
for doing case analysis on the variants using ExprVisitor and ArgVisitor. The
strategy here is adapted from WebAssembly's visitors, although we have
generalized to permit arbitrary argument forwarding, which is necessary
to support tail-recursive visitor calls. TCO is important because our
interpreter may recurse arbitrarily deep into a stack of nested lets.
If users wish, they can also manually case on the type tag.
- Tracing is now turned on and off using _tracer_enter/_tracer_exit in
torch._C. _tracer_enter accepts a list of variables which are to be
treated as arguments; _tracer_exit accepts the list of traced variables
which should be returned when you reexecute the trace, and returns
the trace expression which can be reexecuted. GlobalTracingState
is a global variable which tracks whether or not we are tracing.
- You use run_forward to execute a trace on some set of parameters.
- When under tracing, variables keep track, via trace_local, of what
their names in the IR are.
Here is a simple runner which leaks memory but can be used to JIT models:
import torch.autograd.function as F
import torch._C
from torch.autograd import Variable  # needed for run_forward below

def jit(model):
    import types
    real_forward = model.forward
    def forward(self, *args):
        def flatten(x):
            return tuple(F._iter_variables(x))
        if not hasattr(self, "saved_trace"):
            # First call: trace the real forward and save the resulting expression.
            torch._C._tracer_enter(tuple(self.parameters()) + flatten(args))
            out = real_forward(*args)
            self.saved_trace = torch._C._tracer_exit(flatten(out))
            self.saved_outs = out
            return out
        else:
            # Later calls: re-execute the saved trace on the new inputs.
            flat_out = Variable._execution_engine.run_forward(self.saved_trace, tuple(self.parameters()) + flatten(args))
            return F._unflatten(flat_out, self.saved_outs)
    # Bind the tracing forward onto the model instance.
    model.forward = types.MethodType(forward, model)
    return model
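Hypothetical usage of the runner (the first call records the trace, later calls re-execute it):

import torch
import torch.nn as nn
from torch.autograd import Variable

model = jit(nn.Linear(3, 3))
x = Variable(torch.randn(1, 3))
y1 = model(x)  # traced on the first call
y2 = model(x)  # re-executes the saved trace via run_forward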
Major problems:
- Sanity checking is spotty at best, especially when users pass in variables.
- The interpreter leaks tensor memory from the store. When we add back def-use
we should be able to deallocate tensors as soon as we know they are no longer
necessary.
- The interpreter needs to reach feature parity with the old execution engine.
From there, we need to see if backwards can be subsumed as well.
- I still have no confidence that memory is managed correctly everywhere.
This requires a close look.
- Rather than return an *open* expression as a trace, we should return a
*lambda* instead, which knows about how many formal parameters it
requires.
- The IR is not introspectable from Python at the moment, but this is simply a
matter of implementing all the binding code.
- The tracer is NOT reentrant (you can't trace while you're inside a trace.)
Furthermore, no sanity checking is done if you try to incorrectly reuse
things from one trace in another.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Implement BatchNorm double backwards as a Python function called directly from C++ (see the sketch below).
This will be converted to C++ code once ATen is integrated with autograd.
* Some performance improvements via in-place ops and reusing calculations.
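A minimal sketch of what double backwards through BatchNorm looks like from the Python side:

import torch
from torch.autograd import Variable

bn = torch.nn.BatchNorm1d(3)
x = Variable(torch.randn(4, 3), requires_grad=True)
y = bn(x)
# First backward, kept differentiable so it can itself be differentiated.
grad_x, = torch.autograd.grad(y.sum(), x, create_graph=True)
# Second backward: differentiates BatchNorm's backward formula.
grad_grad_x, = torch.autograd.grad(grad_x.sum(), x)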