pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Adam Paszke	3e4a777e44	Correct JIT interpreter autograd function (#3760 )	2017-11-19 21:48:22 +01:00
Zachary DeVito	cc7f09a372	Add cudaEvent support to the profiler (#3734 ) * Add cudaEvent support to the profiler This adds the ability to record cuda timings using cudaEventRecord in the profiler. Since it doesn't require nvprof it is easier to run than the nvprof path. This also records a thread id for each event, which will make tracing results easier to understand * Add flow arrows from cpu to cuda event * Fix no cuda build * Review comments * Move CUDA checks to one place	2017-11-16 13:58:09 -08:00
Soumith Chintala	99037d627d	fix OSX cuda build (#3722 )	2017-11-15 16:38:18 -05:00
Zachary DeVito	e43ff32192	Add a JIT interpreter (#3634 ) * Add a JIT interpreter The separate interpreter is used to run graphs with a lower overhead than converting them to autograd graphs. Some notes: * does not support Handles/PythonOp/CppOp, these will be in a future commit * jit_closure.cpp still exists and we fall back to it for now when it cannot handle something because of PythonOp/CppOp * In order to support retain_graph=True, the interpreter can be cloned, creating a copy that can be run with different arguments. This is assumed to be the non-standard case so cloning is not particularly optimized. No tensor _data_ is copied, but the at::Tensor list in the interpreter is. If we hit problems, there is a lot we could do (such as register allocation) to minimize the stuff that needs to be copied. * Uses a pImpl pattern to keep implementation details out of its header file. * Modifies the way getTensorOp works so that it reads/writes to already-existing vectors, this prevents needing to realloc these buffers each time. * Timings are here: https://gist.github.com/zdevito/5a20ac29fb1b9e449e693b67dc478127 This reduces overhead to about the same as running it in python. It is about 10us faster to run the same thing using ATen directly. * Code Mod Interpreter -> InterpreterState Function -> Code Add other requested comments. * RegList -> ListHandle<T> Change the RegList functions to be safer by identifying the type of each argument list, and checking that list insert does not try to add to two different lists at once. * Use exactly equal for interp tests	2017-11-13 22:09:53 -08:00
Sam Gross	4fa94793dd	Bump version in master (#3605 )	2017-11-11 18:49:19 -05:00
peter	7160fb0801	Fix setup scripts for Windows CUDA builds	2017-11-11 13:05:35 +01:00
Adam Paszke	1f1612ee37	Move _CompiledMixin to C++	2017-11-10 16:31:44 +01:00
Soumith Chintala	285ce10dbe	fix linking order of nvrtc to force no-as-needed (#3583 )	2017-11-08 22:05:09 -05:00
Edward Z. Yang	d2784b6e5b	Link ATen against CuDNN when available. (#3582 ) Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-11-08 20:20:53 -05:00
peterjc123	aa911939a3	Improve Windows Compatibility (for csrc/scripts) (#2941 )	2017-11-08 19:51:35 +01:00
Adam Paszke	621fbd5c4e	Move flattening/unflattening JIT logic to C	2017-11-06 19:42:44 -05:00
Sam Gross	fde355f7d4	Allow in-place operations on views (#3384 ) Allow in-place operations on views Adds VariableViewImpl, a subclass of VariableImpl which has a pointer to the base Variable on which it is a view. In-place operations on views change the grad_fn of the base. Note that in-place operations only work on views that are the first output of the function that created them. All C++/ATen implemented functions have this behavior, but it's possible to write Python-implemented autograd functions that do not. In-place operations on these views will raise an exception. Fixes #3313	2017-11-06 18:19:56 -05:00
Zach DeVito	f6dac327df	build fixes	2017-11-02 19:53:36 -04:00
Zach DeVito	88d56cc198	fix setup.py paths	2017-11-02 19:53:36 -04:00
Zach DeVito	5aa5b572e4	update build so that all of TH* is in libATen	2017-11-02 19:53:36 -04:00
Sam Gross	afdf50cafe	Move jit/assert.h to csrc/assertions.h (#3442 ) I've kept JIT_ASSERT as an alias to TORCH_ASSERT, which we can use throughout the C++ code.	2017-11-02 13:26:51 -04:00
Soumith Chintala	fc7a68d147	fix lint	2017-11-02 07:36:58 -04:00
Soumith Chintala	4108feb27d	fix OSX cuda build	2017-11-02 07:15:24 -04:00
Trevor Killeen	0e38d3bbb3	remove thpp library (#3405 )	2017-11-01 11:57:09 -04:00
Trevor Killeen	b544882335	ATen in THD (Part I) (#2288 ) * enable size from ATen type * temp commit aten thd * port copy, math * port random * changes after rebase * lapack bind * thd and csrc compile * fix min/max reductions in DataChannelTCP * clean up changes * re-enable tensor constructors * port MPI to at::Tensor * fix storage methods to not cast to thpp storage ptrs	2017-11-01 09:59:02 -04:00
Edward Z. Yang	d4abaa4b9e	Move ONNX broadcast fusion into separate ONNX pass, fixes verbose printing. This breaks a lot of the onnx-pytorch tests because the abstraction barriers are not respected. I'll spin up a patch for that separately. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-11-01 09:49:53 -04:00
Soumith Chintala	91af122d43	add no-as-needed for THRTC	2017-11-01 04:25:42 -07:00
Soumith Chintala	88d9ebc850	lazy-load nvrtc and libcuda (#3408 )	2017-11-01 06:07:03 -04:00
Adam Cécile	a5dbc254f8	if git is not installed at all, no subprocess exception will be raised (#3379 )	2017-10-30 18:37:12 -04:00
Edward Z. Yang	40f7f6e095	Improve handling of 'expand' (broadcasting) in JIT and ONNX The pieces: - I improved the lint / asserts to catch some bugs which I committed while working on my export. There are two new properties which the linter checks now: (1) "Anticipated uses". If a node says that it is used by M, M better appear later in the topsort. Previously, we only checked if it was in all_nodes. (2) If you are a select node, you better be a multi-type node; if you're not a select node, you better not be! And you should never have an input that is multi-type. - There is a new peephole optimization pass, for simple, local transformations to graphs. Right now, it implements a simple optimization: remove 'expand' invocations that are no-ops (the size before matches the size after), but we can add other things to it later. I needed this for ONNX because no-op expands show up in the left-hand argument, which we don't support. - There is now a broadcast fuser, which fuses ATen expand ops into broadcastable ONNX ops (Add, Div, Mul, Pow, Sub, Gemm.) It only fuses when the original size is a suffix of the new size, as per the ONNX spec. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-10-29 23:50:34 -04:00
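The two shape rules named in the commit above (a no-op expand, and the suffix condition for broadcast fusion) can be sketched in plain Python. The function names are hypothetical, and the checks only restate what the commit message says; this is not the code of the actual pass.

```python
def is_noop_expand(size_before, size_after):
    """An expand is a no-op when the size before matches the size after."""
    return tuple(size_before) == tuple(size_after)


def can_fuse_broadcast(original_size, expanded_size):
    """Return True when original_size is a suffix of expanded_size.

    This is the only condition the commit message states; nothing else
    about ONNX broadcasting is modeled here.
    """
    original = tuple(original_size)
    expanded = tuple(expanded_size)
    if len(original) > len(expanded):
        return False
    # An empty size is trivially a suffix; handled separately because
    # expanded[-0:] would return the whole tuple.
    return len(original) == 0 or expanded[-len(original):] == original
```

For example, a tensor of size (3,) expanded to (2, 3) satisfies the suffix condition, while a tensor of size (2,) expanded to (2, 3) does not.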
Maxim Berman	7b00adf5d3	Add CUDNN_LIB_DIR in rpath (#3255 ) * Add CUDNN_LIB_DIR in link -rpath * insert CUDNN_LIB_PATH in front of rpath	2017-10-28 00:13:53 -04:00
Adam Paszke	61afb0d519	Autogenerate ATen dispatch for JIT nodes	2017-10-27 02:40:09 +05:30
Sam Gross	67839ce7bc	Delete unused Softmax code (#3220 ) Softmax and LogSoftmax are automatically bound and dispatched through VariableType.	2017-10-21 20:51:27 +02:00
Edward Z. Yang	67612cba09	Add -Wno-missing-braces Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-10-19 23:04:19 -04:00
Sam Gross	f1f64c8d07	Generate autograd functions for NN / more refactors (#3136 ) Generate autograd functions for NN and implement more derivatives in derivatives.yaml A big refactor of gen_variable_type.py	2017-10-19 15:03:26 -04:00
Adam Paszke	98e67448fa	Large Softmax and LogSoftmax refactor - Cleaned up THNN and THCUNN code and kernels - Improved THCUNN kernel performance 5x, making it match cuDNN performance - Added support for computing softmax over arbitrary dims NOTE: The default dim for 3D inputs is now 1 (used to be 0) - Both functions now accept inputs with arbitrarily many dimensions - Autograd functions no longer save the input (it's unnecessary) - Added cuDNN bindings for softmax, but they are unused as THCUNN matches or even exceeds cuDNN performance	2017-10-19 19:51:10 +02:00
Trevor Killeen	dcb457fdd9	add support for using nnpack when installed via conda (#3155 ) * add support for using nnpack when installed via conda * unify nnpack discovery between conda and user	2017-10-18 20:11:13 +02:00
Richard Zou	0f4ae13f05	Better cudnn version checking (#3132 )	2017-10-16 20:59:18 +02:00
Richard Zou	1322f9a272	Add cudnn version to torch.version	2017-10-13 23:58:25 +02:00
Francisco Massa	f093545919	Add compiled CUDA version in torch.version.cuda	2017-10-10 10:16:14 -04:00
Soumith Chintala	efe91fb9c1	delete redundant python nccl code	2017-10-09 22:24:18 -04:00
Soumith Chintala	4d62933529	add initial NCCL C bindings	2017-10-09 22:24:18 -04:00
Soumith Chintala	b7e258f81e	link specific versioned System NCCL, rather than generic file	2017-10-09 22:24:18 -04:00
Trevor Killeen	029252fb3b	NNPACK bindings for Convolution (#2826 ) * skeleton commit for building and linking nnpack library in PyTorch * first stab at conv forward binding + integration * bind NNPACK gradient kernels * move nnpack forward, input gradient calls deeper * nnpack conv api mimics nn * fix symbol error; use memory across calls * clean up warnings, add shape checking, thread safety, configurable thread specification * add batch size threshold, also bind for single-element batch for the future	2017-10-04 13:48:14 -04:00
Adam Paszke	437d3af7bf	Add CUDNN_INCLUDE_DIR before CUDA directories in setup.py	2017-10-03 10:06:47 -04:00
Sam Gross	de757805fc	Implement some autograd functions using ATen (#2805 ) This adds some generated autograd functions implemented in C++, which are generated from derivatives.yaml. It also generates Python bindings for the Variable methods. The generated files are: Functions.cpp/h: subclasses of torch::autograd::Function VariableType.cpp/h: The at::Type for autograd Variables python_variable_methods.cpp: Python bindings to torch::autograd::Variable python_variable_methods_dispatch.h: wrapper which releases GIL and sets the CUDA device python_functions.cpp/h: exposes generated autograd functions as Python objects The generated functions are mostly shadowed by the definitions in variable.py. We'll remove the Python implementations in favor of the generated C++ implementations in a subsequent commit.	2017-09-26 17:08:00 -04:00
Adam Paszke	b7849662b5	Always regenerate nn wrappers after rebuilding THNN and THCUNN	2017-09-25 23:21:30 -04:00
Adam Paszke	411e1469e0	Add tools for autograd profiling	2017-09-25 23:21:30 -04:00
Soumith Chintala	f4eca7c94d	make CUDA_HOME take precedence over all other CUDA detection methods (#2863 )	2017-09-25 18:17:40 -04:00
Soumith Chintala	5be06230f9	cleanup external NCCL detection, add NCCL_ROOT_DIR / NCCL_LIB_DIR mechanism	2017-09-25 11:28:59 -04:00
Edward Z. Yang	bf9ab91779	Indicate if the last invocation of setup.py was debug or not. How to use: import torch.version print(torch.version.debug) Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-09-22 18:33:47 -04:00
Lu Fang	0a1ac8bfe5	create a cse pass, with very naive support.	2017-09-22 17:06:27 -04:00
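As a rough illustration of what a naive common-subexpression-elimination pass does, here is a minimal Python sketch. The node representation (a tuple of output name, op name, and input names) and the function name are assumptions made for this example; the commit's C++ pass operates on the JIT IR and may differ in every detail.

```python
def eliminate_common_subexpressions(nodes):
    """Remove nodes that recompute an earlier (op, inputs) combination.

    nodes: list of (output_name, op, input_names) in topological order.
    Returns (kept_nodes, replacements), where replacements maps each
    eliminated output name to the earlier output computing the same value.
    Assumes every op is pure and deterministic, hence "naive".
    """
    seen = {}          # (op, inputs) -> output name of the first occurrence
    replacements = {}  # eliminated output -> surviving output
    kept = []
    for out, op, inputs in nodes:
        # Redirect inputs that referred to already-eliminated nodes.
        inputs = tuple(replacements.get(name, name) for name in inputs)
        key = (op, inputs)
        if key in seen:
            replacements[out] = seen[key]
        else:
            seen[key] = out
            kept.append((out, op, inputs))
    return kept, replacements
```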
Edward Z. Yang	670ec4bc59	Split Type into its own header file. No other substantive changes. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-09-20 12:24:27 -04:00
Adam Paszke	28828e033f	Make certain functions traceable	2017-09-19 10:53:32 -04:00
Adam Paszke	b708b6de8d	Add ONNX pass (JIT trace initialization)	2017-09-19 10:53:32 -04:00
Adam Paszke	0e53fe3a41	Put ONNX files where they belong	2017-09-19 10:53:32 -04:00
Adam Paszke	8dae433de8	Move JIT passes to a separate directory	2017-09-19 10:53:32 -04:00
Sam Gross	80d229b0e7	Refactor THPUtils_invalidArguments into separate file	2017-09-13 19:18:02 -04:00
Peter Ruch	0a9f93e43c	add env var for python executable	2017-09-13 17:49:08 -04:00
Soumith Chintala	19cfda761c	write THD link libraries to text file and read it in setup.py to link dependencies correctly (#2711 )	2017-09-12 20:56:36 -04:00
Sam Gross	1290e586fb	Use at::Tensor based autograd Variable (#2676 ) Variable is now a subclass of at::Tensor backed by a VariableImpl* pImpl. The implementation of the ATen functions is defined in the auto-generated VariableType.h/cpp file. Currently, only functions which fall through to the base type, such as sizes() and isCuda() are implemented. Differentiable ops like add() and mul() will be added in a subsequent PR.	2017-09-12 11:36:01 -04:00
Soumith Chintala	cf2c7ca998	add THPP linkage when building THD (#2687 )	2017-09-11 08:53:38 -04:00
Edward Z. Yang	459cc5a346	Check for nanopb and pybind11 submodules as well. (#2660 ) Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-09-07 13:24:31 -04:00
Soumith Chintala	84095f9512	add linux guard	2017-09-07 11:57:49 -04:00
Soumith Chintala	894c05fd22	fix static linkage and make THD statically linked	2017-09-07 11:54:18 -04:00
Zach DeVito	6d8d5bab4c	Codemod Toffee -> ONNX, toffee -> onnx. Change file names to match	2017-09-06 13:45:39 -04:00
Edward Z. Yang	d59714e3b1	Code review comment changes. - Reduce setup.py diff. - Expunge WITH_TOFFEE from codebase. - Elaborate on a comment. - Move gen_toffee.sh to tools - Delete densenet test. - Use 'using' to inherit a constructor. - Delete outdated comment. - Comment about why primspecs can return fewer outputs. - Remove dead, commented out includes. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-09-05 17:48:55 -04:00
Edward Z. Yang	7ac6d67a4e	Add nanopb to list of dep_libs. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-09-05 17:48:55 -04:00
Adam Paszke	594f98ce16	Support multi-stage AutogradClosures	2017-09-05 17:48:55 -04:00
Edward Z. Yang	605ef38831	Explicitly override CMAKE_DEBUG_POSTFIX for nanopb build. If it's not set, CMAKE_DEBUG_POSTFIX sets it to 'd' which means the static library gets named something different when built in debug mode. This is annoying because it means if you build in debug mode, the library is in a different place. Rather than teach the build system to find the correct name, just set this POSTFIX so names don't change. Also, update setup.py to look for the non-debug archive. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-09-05 17:48:55 -04:00
Edward Z. Yang	de6ef65be5	Port to nanopb. General strategy: - nanopb is statically linked into PyTorch. It must be built with -fPIC. - Generated nanopb files for toffee.proto are checked into our repo. - Because nanopb generated protobufs are C only, we wrote a wrapper around it to give a Google C++ style interface. More on this shortly. How does the wrapper work? - It's called "micropb" because it is less small than nanopb :) - nanopb requires all variable-length fields to be written out using a "callbacks" mechanism. - We wrote pre-canned callbacks for all of the types ToffeeIR writes out and lists; these are micropb_callback and micropb_callback_list. These operate simply by dynamically allocating and storing the data to be written out in data (this defeats the purpose of the callback mechanism, but it's easy to implement) - Finally some boilerplate to actually implement the wrapper classes and have owning pointers to the actual data. Testing strategy: - Take the serialized protobuf from nanopb, parse it again with ToffeeIR and print it. Worked with all of test_jit.py! These tests don't run without 'toffee' being installed. TODO: - Update CI to install ToffeeIR, so we can run the Toffee tests in CI - Update E2E with Caffe2 tests so that they work with new stuff. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-09-05 17:48:55 -04:00
Zach DeVito	a3fdb281d1	Python wrapper for Node IR using pybind11 Supports almost all of the IR API.	2017-09-05 17:48:55 -04:00
Adam Paszke	fa308b3183	Improve backward tracing	2017-09-05 17:48:55 -04:00
Zach DeVito	57b7370aab	switch NodeKind over to Symbol type.	2017-09-05 17:48:55 -04:00
Zach DeVito	d7d74428a3	batchnorm hacking	2017-09-05 17:48:55 -04:00
Edward Z. Yang	db79be82ab	Move Toffee for C++ functions back to autograd. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-09-05 17:48:55 -04:00
Edward Z. Yang	e1b345d81b	More alexnet things as primspec. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-09-05 17:48:55 -04:00
Edward Z. Yang	6f6fe177f1	Make Toffee optional. Unbreaks CI. The general strategy: - We put all the toffee files in torch/csrc/toffee; they will only be added when toffee is enabled - Toffee is enabled if torch/lib/ToffeeIR is present (since we don't have a submodule/subtree thing going on) - The most prevalent place you will need to use WITH_TOFFEE is for primspec definitions on C++ autograd functions. There is a macro HAS_PRIMSPEC to ameliorate optionally defining primspec() virtual overrides on Function classes. HasPrimspec is always available but will be a zero field class when Toffee is disabled. NB: We might revert this commit in the future if we figure out a way to unconditionally enable Toffee that everyone likes. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-09-05 17:48:55 -04:00
Edward Z. Yang	4b1f182199	Disable C++ Python conversion code. We want all the conversion code to live in one place. Away it goes! This means that alexnet protobuf no longer works. It will start working again when we port changes. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-09-05 17:48:55 -04:00
Edward Z. Yang	dd58b145c3	Toffee graph exporting for PyTorch. This commit adds a new exporter pass which takes a graph and returns a string of the human-readable protobuf representation of a model. We have two strategies for how conversions are implemented: - If a Python autograd function has a primspec static method, we invoke it to get the Toffee conversion. Use torch.toffee.op to generate the format expected to be returned. The particular data representation is opaque and subject to change in the future. - Otherwise, there's a giant if statement in the exporter, which manually uses the JIT IR C++ API and Toffee IR C++ protobuf API to convert. You must check out a copy of the ToffeeIR repo https://github.com/ProjectToffee/ToffeeIR at torch/lib; at the moment we don't have a subtree/submodule set up. Technical debt in this commit: - To get protobuf headers in scope, we unconditionally add $CONDA_PREFIX/include to the include path. This needs to be replaced with a more robust mechanism. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-09-05 17:48:55 -04:00
Adam Paszke	7f60a18293	Add initial support for backward tracing	2017-09-05 17:48:55 -04:00
Adam Paszke	1c4538e017	Trace C functions	2017-09-05 17:48:55 -04:00
Adam Paszke	233a66dcbe	Remove SimpleMap from JIT IR	2017-09-05 17:48:55 -04:00
Zach DeVito	f5e414862a	cuda guards for fusion compiler	2017-09-05 17:48:55 -04:00
Zach DeVito	50e51eaa7f	Fusion of simple map operations using nvrtc. Approach is based on the approach of THC's pointwiseApply{1,2,3} family of kernels, but doesn't have any dependencies on that code. Adjacent contiguous dimensions of input tensors are compressed to reduce the complexity of indexing math. For the completely contiguous case, the indexing logic simplifies to just the linear index. In simple tests, this code matched or beat the equivalent from THC.	2017-09-05 17:48:55 -04:00
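The dimension-compression idea described in the commit above can be sketched in plain Python: two adjacent dimensions are merged whenever stepping the outer one is equivalent to stepping past the whole inner one. The function name is hypothetical and this is only an illustration of the indexing arithmetic, not the fusion compiler's code.

```python
def collapse_contiguous_dims(sizes, strides):
    """Merge adjacent dimensions that are contiguous with each other.

    An outer dimension with stride s_outer can absorb the next dimension
    (size n, stride s) when s_outer == s * n. A fully contiguous tensor
    collapses to one dimension whose stride is the innermost stride, so
    indexing reduces to the linear index.
    """
    out_sizes, out_strides = [], []
    for size, stride in zip(sizes, strides):
        if out_sizes and out_strides[-1] == stride * size:
            # Fold this dimension into the previous (outer) one.
            out_sizes[-1] *= size
            out_strides[-1] = stride
        else:
            out_sizes.append(size)
            out_strides.append(stride)
    return out_sizes, out_strides
```

A contiguous tensor with sizes (2, 3, 4) and strides (12, 4, 1) collapses to a single dimension of size 24 with stride 1. With strides (24, 4, 1), only the two inner dimensions merge.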
Adam Paszke	f270973937	Add JIT IR -> Autograd IR converter	2017-09-05 17:48:55 -04:00
Zach DeVito	48945a435d	IR modifications to make mutation possible. Nodes are in an intrusive doubly-linked list. Methods added to manipulate inputs etc.	2017-09-05 17:48:55 -04:00
Edward Z. Yang	8215860d2f	Add an assert wrapper for easy porting. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-09-05 17:48:55 -04:00
Adam Paszke	ea05ac8f41	Move JIT-related files to jit dir. Remove IR interpreter	2017-09-05 17:48:55 -04:00
Zach DeVito	1325fa511c	JIT IR including use-def chains and updated comments.	2017-09-05 17:48:55 -04:00
Edward Z. Yang	a797ab9343	Rewrite AST to a new, more functional representation. Previously, our AST was a DAG, where shared Nodes indicated a computation should be reused. This commit rewrites the IR into a new functional representation which represents sharing explicitly using variable bindings. We offer a few justifications for this new style: 1. The new representation is not all that different from the old one; it is about as easy to construct, and the lack of an explicit graph doesn't negatively impact our ability to interpret the graph, since we've chosen, as a matter of design, to NOT have the IR participate in the actual execution of a graph. 2. The new let-binding representation has an implicit ordering, which we can use to conveniently keep track of the original order the trace showed up as. This automatically gives us a topsort, and gives us an easier to read textual representation of our IR: %14 = Embedding %11, %0, -1, None, 2, False, False %15 = Dropout %14, 0.2, True, False %16 = Index %12, 0 %17 = Index %12, 1 %18 = Index %13, 0 %19 = Index %13, 1 %20 = Index %15, 0 %21 = Linear %20, %1, %3 %22 = Linear %16, %2, %4 3. It moves us closer to a Futhark style language (http://futhark-lang.org/publications/pldi17.pdf). Major aspects of the diff - Node is replaced with Expr and Arg, a pair of mutually recursive structures which represent our new language. In BNF, the language looks like this: a ::= c \| %i e ::= %i, ... = e \| PyOp e, ... \| Ret %i, ... Technically, Ret is not actually a return (no control flow is involved), it just tuples up a series of tensors (identified by variables). One important invariant is that locals are always tensors; they are never constants (this is asymmetric with Args.) - Arguments support Python constants. This is an important piece because many operators take extra Python literals like integers and tuples in order to specify extra parameters about how an operator operates. 
Adding this was essential to getting word_language_model to work. - As both Expr and Arg have multiple variants, there is new infrastructure for doing case on the variants using ExprVisitor and ArgVisitor. The strategy here is adapted from WebAssembly's visitors, although we have generalized to permit arbitrary argument forwarding, which is necessary to support tail-recursive visitor calls. TCO is important because our interpreter may recurse arbitrarily deep into a stack of nested lets. If users wish, they can also manually case on the type tag. - Tracing is now turned on and off using _tracer_enter/_tracer_exit in torch._C. _tracer_enter accepts a list of variables which are to be treated as arguments; _tracer_exit accepts the list of traced variables which should be returned when you reexecute the trace, and returns the trace expression which can be reexecuted. GlobalTracingState is a global variable which tracks whether or not we are tracing or not. - You use run_forward to execute a trace on some set of parameters. - When under tracing, variables keep track, via trace_local, what the name of their variables in the IR are. Here is a simple runner which leaks memory but can be used to JIT models: import torch.autograd.function as F import torch._C def jit(model): import types real_forward = model.forward def forward(self, args): def flatten(x): return tuple(F._iter_variables(x)) if not hasattr(self, "saved_trace"): torch._C._tracer_enter(tuple(self.parameters()) + flatten(args)) out = real_forward(args) self.saved_trace = torch._C._tracer_exit(flatten(out)) self.saved_outs = out return out else: flat_out = Variable._execution_engine.run_forward(self.saved_trace, tuple(self.parameters()) + flatten(args)) return F._unflatten(flat_out, self.saved_outs) Major problems: - Sanity checking is spotty at best, especially when users pass in variables. - The interpreter leaks tensor memory from the store. 
When we add back def-use we should be able to deallocate tensors as soon as we know they are no longer necessary. - The interpreter needs to reach feature parity with the old execution engine. From there, we need to see if backwards can be subsumed as well. - I still have no confidence in having memory managed everything correctly. This requires a close look. - Rather than return an open expression as a trace, we should return a lambda instead, which knows about how many formal parameters it requires. - The IR is not introspectable from Python at the moment, but this is simply a matter of implementing all the binding code. - The tracer is NOT reentrant (you can't trace while you're inside a trace.) Furthermore, no sanity checking is done if you try to incorrectly reuse things from one trace in another. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-09-05 17:48:55 -04:00
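To make the let-binding representation from the commit above more concrete, here is a small Python sketch in which a trace is an ordered list of bindings, arguments are either locals (`%i`) or Python constants, and evaluation simply walks the list in order. All names (`Local`, `Let`, `format_let`, `run`) are hypothetical and chosen for this illustration; the commit's C++ `Expr`/`Arg` structures and visitors are not reproduced here.

```python
from collections import namedtuple

# A reference to a local such as %14; locals always hold computed values.
Local = namedtuple("Local", ["index"])
# One binding: "%i, ... = op arg, ..." where args are Locals or constants.
Let = namedtuple("Let", ["outputs", "op", "args"])


def format_let(binding):
    """Render a binding in the textual style shown in the commit message."""
    def show(arg):
        return "%{}".format(arg.index) if isinstance(arg, Local) else repr(arg)
    lhs = ", ".join("%{}".format(i) for i in binding.outputs)
    rhs = ", ".join(show(a) for a in binding.args)
    return "{} = {} {}".format(lhs, binding.op, rhs)


def run(bindings, ops, inputs, returns):
    """Evaluate bindings in listed order, which is already a topsort.

    ops maps op names to Python callables; inputs maps local index -> value;
    returns lists the locals to hand back as a tuple.
    """
    env = dict(inputs)
    for binding in bindings:
        values = [env[a.index] if isinstance(a, Local) else a
                  for a in binding.args]
        result = ops[binding.op](*values)
        if len(binding.outputs) == 1:
            result = (result,)
        for index, value in zip(binding.outputs, result):
            env[index] = value
    return tuple(env[i] for i in returns)
```

Because the environment here never drops entries, this sketch has the same shortcoming the commit lists under "Major problems": values stay alive until the run ends.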
Edward Z. Yang	e1b7872fc2	Make it possible to access IR from Python. Also, add a new trace_fn field to attach forward IR to Variables. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-09-05 17:48:55 -04:00
Edward Z. Yang	c5faaf69d8	Initial IR representation for forward trace. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-09-05 17:48:55 -04:00
hongyi-zhang	bf013f4c99	fix Python 2 gloo install (#2597 )	2017-09-02 20:05:37 -04:00
Edward Z. Yang	a03e5cb409	Remind users to submodule update. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-08-30 16:14:38 -04:00
Sam Gross	966fdbd93a	Add commands to re-build individual libraries. (#2506 ) When working on PyTorch dependencies we often want to rebuild only that dependency and the Python extension. You can now do that by running: python setup.py build_thc to only re-build THC	2017-08-23 07:16:05 -04:00
Thomas Viehmann	7c04f11d88	search for ldconfig in /sbin for nccl detection (#2276 )	2017-08-03 05:32:21 +05:30
Zachary DeVito	43c944acbd	Remove dead THPP code that has been replaced with ATen objects. (#2235 ) THPP usage is now isolated in THD.	2017-07-29 08:07:41 +05:30
Trevor Killeen	c304d04fc6	Replace thpp::Tensor with ATen Tensor in autograd csrc (#2170 )	2017-07-28 10:18:37 -04:00
Soumith Chintala	ea6f9a26b8	fix version number	2017-07-20 13:30:53 -04:00
Soumith Chintala	09abaa2189	make keepdim backcompat warnings emit in autograd as well (#2157 )	2017-07-20 01:48:05 -04:00
Soumith Chintala	a5c2546c0f	version bump	2017-07-19 12:34:43 -07:00
Soumith Chintala	b660303a16	Static linking against libstdc++ in Binary Build mode	2017-07-19 12:19:36 -04:00
Soumith Chintala	169ca67a4e	Adding Spatial Transformers w/CuDNN support	2017-07-12 14:32:06 -04:00
Zach DeVito	ab3d85c410	add build commands for ATen	2017-07-11 10:35:03 -04:00
Trevor Killeen	6df23b418d	mark tools as excluded in find_packages (#1915 )	2017-06-29 13:49:56 -04:00
Trevor Killeen	cb4eaa9c5d	TensorLib/Aten --> changes required in pytorch	2017-06-22 12:55:55 -04:00
gchanan	a64560c22e	Remove flattening for torch.dot (#1781 )	2017-06-16 02:15:33 +02:00
Edward Z. Yang	3ada9da808	Make csrc -Werror clean. (#1795 ) Primary things I had to fix: - Suppress _XOPEN_SOURCE warnings by ensuring that Python.h is included first, because it always unconditionally defines this macro. - Turn off strict aliasing, because Python 2 doesn't work with strict aliasing. - Workaround setuptools bug, where it's incorrectly passing -Wstrict-prototypes to C++ compilers (where this doesn't make any sense) To compile csrc with -Werror, run `CFLAGS="-Werror" python setup.py build_ext` Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-06-13 20:18:09 -04:00
Adam Paszke	714351ff39	Officially enable process-group mode	2017-06-12 22:02:11 -04:00
Gregory Chanan	65b23f146e	Add broadcasting support for copy_, simplify code generation by moving a lot of currently generated code to expand_utils.	2017-06-11 05:37:59 -04:00
Gregory Chanan	6a40acb4f0	Add Broadcast plugin.	2017-06-11 05:37:59 -04:00
Edward Z. Yang	ba690d5607	Add support for NVTX functions. (#1748 )	2017-06-10 18:26:58 +02:00
Adam Paszke	8ea7c87c29	Improve init methods	2017-06-02 23:42:11 +02:00
Adam Paszke	702a2e3bc5	Make Variables not subclass Function anymore Because of this Variables can no longer appear in the graph. Every usage of a leaf Variable will leave an AccumulateGrad function that has no outputs, but modifies var.grad as a side effect.	2017-05-01 16:44:56 -04:00
Adam Paszke	2ca787fcf4	Refactor attribute names in autograd	2017-05-01 16:44:56 -04:00
Soumith Chintala	2197e4c766	version bump	2017-05-01 15:54:52 -04:00
Adam Paszke	9169f60a84	Parallelize TensorMethods.cpp builds (#1400 )	2017-04-29 09:07:21 -04:00
Soumith Chintala	24e5a9057e	Revert "Parallelize TensorMethods.cpp builds (#1364 )" (#1390 ) This reverts commit `060048bcd8`.	2017-04-28 07:59:40 -04:00
Adam Paszke	060048bcd8	Parallelize TensorMethods.cpp builds (#1364 )	2017-04-28 07:45:21 -04:00
albanD	f0c7124420	Allow support for negative dimension argument for all functions	2017-04-06 16:37:00 -07:00
Soumith Chintala	1c391f6f93	bump version	2017-03-29 10:08:34 -04:00
Sam Gross	b9379cfab7	Use cuDNN and NCCL symbols from _C library (#1017 ) This ensures that we use the same library at the C++ level and with Python ctypes. It moves the searching for the correct library from run-time to compile-time.	2017-03-16 16:10:17 -04:00
Low Kian Seong	2f5c215d34	Update setup.py (#981 ) Adding `description` to `setup.py`	2017-03-11 12:14:07 -05:00
Sam Gross	15a9fbdedb	Merge pull request #881 from colesbury/parallelize_backwards Parallelize autograd backwards	2017-03-06 16:57:19 -05:00
soumith	76f7d749e4	bump version	2017-03-05 08:49:52 -08:00
Sam Gross	34ce58c909	Parallelize backwards	2017-03-03 11:26:00 -08:00
Adam Paszke	0db9c63300	Use library_dirs in setup.py	2017-02-20 23:28:31 -08:00
Adam Paszke	1bdc28161a	Add torch.__version__	2017-02-17 10:40:08 +05:30
Dr. Kashif Rasul	8d90ab2d9b	compile with cudart (#737 )	2017-02-14 06:40:35 +05:30
Sam Gross	bd5303010d	Refactor autograd package to separate Python dependencies. (#662 ) The core autograd Variable, Function, and Engine no longer depend on the Python API. This lets us implement functions in C++. In the future, we can also multithread engine and release the GIL for most of the non-Python backwards.	2017-02-13 16:00:16 -08:00
Sam Gross	f8fb25e0a2	Add generic bindings to THNN and THCUNN (#645 ) Adds bindings using thpp::Tensor to THNN and THCUNN. This allows calling into those APIs without knowing the concrete types of the tensor arguments.	2017-01-31 13:23:02 -05:00
Adam Paszke	79232c24e2	Fixes after rebase	2017-01-31 01:58:09 +01:00
Janusz Marcinkiewicz	76520512e7	DataChannel tests rewrite (#42 ); DataChannel `isend` and `irecv` implementation (#44 )	2017-01-31 01:58:09 +01:00
Adam Paszke	60d1852c7b	Major improvements to master-worker mode * Fixed all undefined symbol errors * Implemented storage interface and THStorage class * RPC improvements * Code refactor	2017-01-31 01:58:09 +01:00
Filip Binkiewicz	9fc3c5e4d2	THDTensor constructors implemented + some minor fixes	2017-01-31 01:58:09 +01:00
Adam Paszke	55632d81d2	Add Python wrappers for process group mode	2017-01-31 01:58:09 +01:00
Adam Paszke	9c411513bf	Patch distutils crash when linking with ccache	2017-01-28 00:28:33 +01:00
Luke Yeager	2ad967dbe4	Fix pep8 in setup.py with "autopep8 -i setup.py"	2017-01-25 22:23:22 -05:00
Sam Gross	c9db9c2317	Add C++ tensor library (from THD fork) (#526 )	2017-01-20 15:23:34 -05:00
Sam Gross	9302f860ae	Remove unused file TensorDocstrings.cpp (#481 ) Tensor docstrings are created in _tensor_docs.py	2017-01-18 13:34:40 -05:00
soumith	57a2ccf777	PYTORCH_BUILD_VERSION to setup.py	2017-01-17 17:51:16 -08:00
soumith	e4812b3903	add binary version to setup.py	2017-01-17 14:14:01 -08:00
Sam Gross	fd92470e23	Add cuDNN bindings for BatchNorm (#421 )	2017-01-07 15:35:24 -05:00
Zeming Lin	59d66e6963	Sparse Library (#333 )	2017-01-05 00:43:41 +01:00
Soumith Chintala	6a2785aef7	remove link_prefix from linker arguments (#395 )	2017-01-02 12:37:52 -05:00
Soumith Chintala	b650a45b9c	fix botched merge in setup.py	2016-12-31 16:55:53 -05:00
Soumith Chintala	b5dc36f278	explicitly linking against v1 libs to avoid lua-torch conflicts (#386 )	2016-12-31 10:30:36 -05:00
Adam Paszke	08d346df9c	Print libraries used for building the extension	2016-12-15 00:47:55 +01:00
Adam Paszke	28f0cf6cee	Add docstring support to cwrap (#295 )	2016-12-11 23:25:14 +01:00
Adam Paszke	cb849524f3	Improve cuDNN detection at build time	2016-12-01 23:14:41 +01:00
Adam Paszke	ebc70f7919	Look for libcudart in default CUDA installation paths (#195)	2016-11-02 19:36:10 -04:00
Adam Paszke	ef557761dd	Allow to not use all function outputs in autograd	2016-10-31 22:47:09 +01:00
Sam Gross	ad5fdef6ac	Make every user-visible Tensor have a Storage (#179)	2016-10-31 12:12:22 -04:00
Sam Gross	f2d7e94948	Use torch.Size for Tensor sizes and tuple for strides. See issue #20. The torch.Size class is a tuple subclass which distinguishes sizes from other tuples so that torch.Tensor(size) is interpreted as size instead of data.	2016-10-28 19:37:09 +02:00
Sam Gross	ad2d413c0b	Add C++ bindings for cuDNN (#167). The Python ctypes bindings overhead was high enough that it slowed down multi-gpu training when using 4+ Maxwell GPUs.	2016-10-26 19:51:48 -04:00
Soumith Chintala	140c65e52b	fixing python setup.py clean	2016-10-21 23:20:02 -04:00
Sam Gross	79ead42ade	Add CUDA Stream and Event API (#133)	2016-10-18 12:15:57 -04:00
Adam Paszke	0325e2f646	Major autograd refactor. Improves autograd performance by more than 2x and fixes a couple of bugs. All core functions have been moved to C.	2016-10-13 17:17:49 -07:00
Adam Paszke	2acee24332	Add keyword argument support to most tensor functions	2016-10-13 12:32:04 -04:00
Adam Paszke	96f61bff30	Add LAPACK functions	2016-10-08 20:37:37 -07:00
Sam Gross	e8a5f00866	Auto GPU for CUNN (#71)	2016-09-30 14:04:53 -04:00
Adam Paszke	941cf4e63d	Add ffi utils for user C extensions	2016-09-29 09:35:56 -07:00
Sam Gross	cb5d4e836f	Lazy load CUDA and THNN modules (#64)	2016-09-28 19:29:53 -04:00
Adam Paszke	52ed57352a	Free GIL in C functions	2016-09-27 15:22:20 -07:00
Soumith Chintala	1cf87e8a0b	OSX + Python 2 build fixes	2016-09-25 19:26:13 -04:00
Adam Paszke	ddf1598ef8	Add a method for catching exceptions thrown in ctypes	2016-09-25 12:25:54 -07:00
Adam Paszke	06ab3f962f	Refactor _C extension to export some utilities	2016-09-21 08:36:54 -07:00
soumith	65d4055366	adding static linking on binary builds	2016-09-13 10:34:13 -07:00
Sam Gross	1486d880b0	Add Storage.from_buffer. The from_buffer is similar to numpy's frombuffer. It decodes a Python buffer object into a Storage object. For byte and char storages, it simply copies the bytes.	2016-09-07 15:32:33 -07:00
Soumith Chintala	4cffa2219a	build fixes for OSX	2016-09-06 22:06:06 -04:00
Adam Paszke	f9d186d33a	Add initial version of multiprocessing module	2016-08-31 19:46:08 -07:00
Adam Paszke	686e8d32e2	Add torch.save and torch.load	2016-08-23 07:51:55 -07:00
Adam Paszke	8d933cbfc4	Fixes for OS X	2016-08-22 22:45:35 -04:00
Adam Paszke	4c51a523c8	Add super basic CUDA autodetection	2016-08-19 14:23:53 -07:00
Adam Paszke	b06c000478	Fix <3.5 compatibility and travis configuration	2016-08-16 21:11:10 -07:00
Adam Paszke	207d6ae60d	Override build commands in setup.py	2016-08-14 20:47:27 -07:00
Adam Paszke	1902bc0bfb	Interface with numpy	2016-08-13 20:19:17 -07:00
Adam Paszke	9fff8e7392	Fixes for changes in libs	2016-08-12 22:02:57 -07:00
Adam Paszke	ef7364b80e	Fix Python 2.7 compatibility	2016-08-12 18:26:10 -07:00
Adam Paszke	12bed8dc0d	Add CUDA device selection	2016-08-12 07:46:46 -07:00
Adam Paszke	e9f9fd3727	Major refactor	2016-08-10 09:24:53 -07:00
Adam Paszke	652a31b714	Add build scripts for libraries	2016-08-04 14:12:31 -07:00
Adam Paszke	6df0ae5d35	Add cunn	2016-08-02 09:20:18 -07:00
Adam Paszke	2f342af22f	Move optim to legacy	2016-08-01 12:01:46 -04:00
Adam Paszke	ae40bcd58c	Base for nn conversion	2016-07-22 22:21:29 -04:00
Adam Paszke	554a1d8336	Add optim	2016-07-21 16:42:06 -04:00
Adam Paszke	bc7bd7a8b3	Add unit tests and fix detected bugs	2016-07-21 13:46:59 -04:00
Adam Paszke	3a44259b32	Add support for CUDA	2016-07-19 10:45:59 -04:00
Adam Paszke	cf90bee8af	Enable parallel builds	2016-07-18 23:56:50 -04:00
Adam Paszke	3cec305524	Restructure python code	2016-06-23 22:55:05 +02:00
Adam Paszke	077bfbde03	Add all constructors for Tensor and Storage	2016-06-19 23:45:41 +02:00
Adam Paszke	4f66ea42af	Add random-related Tensor methods	2016-06-18 21:36:10 +02:00
Soumith Chintala	5ee3358a92	python 2 support	2016-06-08 19:14:57 -04:00
Adam Paszke	449ac4ca2a	Add torch.* functions	2016-05-09 19:14:40 +02:00
Adam Paszke	7567a0bb13	Add cwrap	2016-05-07 15:28:13 +02:00
Adam Paszke	c3b3df9f22	Add utilities and cleanup Tensor wrappers	2016-05-06 15:04:57 +02:00
Adam Paszke	842e1b6358	Add exception handling	2016-05-05 20:58:13 +02:00
Adam Paszke	f4b3554d9e	Refactor generic/Tensor.c and add Short objects	2016-05-03 21:20:54 +02:00
Adam Paszke	690d470c71	Add Storage.py template	2016-05-03 15:13:12 +02:00
Adam Paszke	b0d90e3688	Add templated __init__	2016-05-02 23:54:59 +02:00
Adam Paszke	731041cb6a	Initial commit	2016-05-02 23:19:57 +02:00

647 Commits