pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
Sam Gross	60c03bc09c	Implement apply_, map_, and map2_ in Variable (#4057 )	2017-12-07 14:48:56 -05:00
Sam Gross	d0cabbde74	Implement Variable.from_numpy (#4043 ) Implements from_numpy using ATen tensors. Variable.from_numpy is a convenient placeholder for the variant that returns Variables until we merge Tensor and Variable. The behavior is slightly changed: - from_numpy() on an empty array now returns an empty tensor instead of throwing an exception. The shape may not be preserved. - CharTensor(ndarray) used to throw an exception. It now copies the ndarray. Copying is implemented via ATen toType.	2017-12-06 14:08:56 -05:00
Sam Gross	38f13447bc	Implement Variable.tolist() (#4038 ) Tensor.tolist() now dispatches through Variable.tolist() so that we only have one code path to test until we merge Variable and Tensor.	2017-12-06 12:35:05 -05:00
Sam Gross	5241cdf546	Implement Variable.numpy() (#4006 ) Implement Variable.numpy() and dispatch Tensor.numpy() through Variable.numpy() Variable.numpy() is disallowed on variables that require grad.	2017-12-05 14:24:11 -05:00
Zachary DeVito	9e46fca424	Use ninja as the cmake backend as well.	2017-12-04 14:16:26 -05:00
Zach DeVito	f72fe0624d	Add a CPU Fuser (single core) This adds a simple fusion backend for the CPU. * Refactors CompiledFusionFunction to have two subclasses that handle the compilation details of each backend. * emit-compile-link-run cycle for the CPU * simple single core loop to run the operation * lift CUDA-only restrictions in the fuser, checks that fusion groups are only on a single backend.	2017-12-04 14:13:44 -05:00
Zach DeVito	710f6d6958	Fix warnings and add alert to enable ninja when developing.	2017-12-03 04:49:41 +01:00
Edward Z. Yang	1c0fbd27a1	CuDNN bindings rewrite (into ATen) (#3666 ) * Comprehensive rewrite of Torch CuDNN bindings / a bit of ATen infra The executive summary is that this moves the torch/csrc/cudnn library into ATen, adding a number of new cudnn_ methods to ATen for batchnorm, convolution, affine grid generator and grid sampler. ATen infra changes: - TensorGeometry was moved to ATen - TensorGeometry was modified to make its interface resemble that of Tensor; in particular, sizes is no longer a field, it's a method. - AT_CUDA_ENABLED macro is set via ATen/Config.h header which is generated at cmake configure time. Fixes https://github.com/zdevito/ATen/issues/168 - Change AT_CUDA_ENABLED macro to be a function macro, so that we error if it is not defined - Introduce a new TensorArg class, which is a Tensor plus a little metadata. This helps us give good error messages when checking dimensions/shapes of tensors. Fixes https://github.com/zdevito/ATen/issues/169 - Also introduce a TensorGeometryArg class, for when you don't need the actual tensor data (which is most of the time.) - Add ATen/Check.h, which contains a number of utility functions for testing shapes, types and devices of input tensors. This will be particulary useful for native methods, which don't get code generated input testing code. These functions take a 'CheckedFrom' argument, at the moment just a string, which specifies some extra information about what function was doing the actual checking; this greatly improves error messages. - Many check functions take initializer lists, which let you test that all tensors have some property. This API is peculiar, in that we IGNORE undefined tensors in this case. This is handled by filterDefined. - Add AT_CUDNN_ENABLED macro - CuDNN linking from ATen was improved; for example, we now actually add the CuDNN headers to our include path. 
- Add some missing override specifiers to some methods - We now actually build tests with CUDA functionality accessible (previously, AT_CUDA_ENABLED was not defined, meaning that the headers were missing all CUDA-only functionality.) - Native functions now support giving explicit names to return outputs in yaml. This makes it possible to hook into the NN autogenerated derivatives codepath using native functions. CuDNN rewrite changes: - torch/csrc/cudnn now uses ATen (rather than passing around THVoidTensor) and lives in ATen. This lets us remove tensorPointer shenanigans. The functions are exposed to ATen as native functions described in aten/src/ATen/cudnn/cuDNN.yaml - ATen now builds and links against CuDNN when enabled. The cmake package script was taken from Caffe2. - Some header reorganization was done to help reduce dependencies on headers (this reorg is no longer used but I've kept it) - Rename CHECK to CUDNN_CHECK - Rip out old shape/type testing code in favor of modern ATen/Check.h interface using TensorArg. In many cases, increase the robustness of the checking code. - Change the inputs of the public facing functions, so that they can be bound by ATen - Delete THCState; this is retrieved from the global ATen context - Delete cudnnHandle_t, this is retrieved from the global Handles.h - Delete cudnnDataType_t, this is retrieved from the Tensor type - Delete Convolution class, instead its constituent arguments are passed individually - Change functions to return tensors, rather than take an appropriately sized output tensor as an input. - Redo how transposed convolution / backward convolution is implemented (knock on effect of returning tensors). Previously it was assumed that you would always pass an appropriately sized output tensor, but we don't want to do this anymore. For backwards, we instead give the desired output tensor (input, really) size, because that is readily available. 
For transposed* convolution, however, we take output_padding, and otherwise do the shape calculation. - Redo how legacy group convolution is implemented (knock on effect from porting cudnn to ATen.) Previously, group convolution was implemented by manually constructing sizes and strides and then outputting appropriate, with macros switching between individual groups and all-at-once based on CuDNN version. Now, the code looks exactly what you'd expect: there's a top-level wrapping function that supports group convolution no matter the version of CuDNN, and a low-level wrapper which supports only what CuDNN supports. The top-level function conditions on CuDNN version, and invokes the low-level interface 1 or n times. - There is now a debugging printer for tensor descriptors. - Convolution struct is replaced with ConvolutionArgs, which is not part of the public API but is used internally to conveniently pass around all of the arguments needed for Convolution. - Add some constexprs for well-known dimensions, reduce amount of magic numbers in code. - Put 'deterministic' in to ConvParams. Fixes #3659 - Lots more comments. - Some pessimizations, in the name of code clarity: - The descriptors are initialized on every invocation of convolution forward/backward. Previously, the descriptors were cached, so that you didn't have to initialize them again on backwards. This is difficult to support in the ATen interface so I didn't support it. - Legacy group convolution initializes its workspace for every group it performs. I did not feel motivated to fix this because the legacy codepath is already quite slow. - Affine grid generator and grid sampler automatically call contiguous on their arguments as necessary. 
- Batchnorm input checking is greatly beefed up, it now checks for the following input characteristics: - Definedness - GPU location - Type - Contiguity - Size PyTorch binding code changes - batchnorm now uses consistent var/data naming - batchnorm and convolution make use of new ATen bindings - Affine grid generator and grid sampler make use of ATen CuDNN bindings via derivatives.yaml. This means I had to restructure the code a little, since the THNN bindings still go through a legacy Python class. - I fixed some warnings: - s/friend class/friend struct/ on InterpreterStateImpl - Removed pessimizing move 'detached' in torch/csrc/autograd/variable.cpp - Removed unused pack_list on Scalar Signed-off-by: Edward Z. Yang <ezyang@fb.com> GCC 4.8 buildfix Signed-off-by: Edward Z. Yang <ezyang@fb.com> Add TensorGeometry to ATen.h Signed-off-by: Edward Z. Yang <ezyang@fb.com> CUDNN_CHECK Signed-off-by: Edward Z. Yang <ezyang@fb.com> Update TODO comment Signed-off-by: Edward Z. Yang <ezyang@fb.com> Delete return in cudnn_grid_sampler Signed-off-by: Edward Z. Yang <ezyang@fb.com> s/cudnnSetStreamToCurrent/setCuDNNStreamToCurrent/g Signed-off-by: Edward Z. Yang <ezyang@fb.com> Don't allocate a new vector when filtering defined. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Remove Check overloads, convert to pass references. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Some more microbenchmarking. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-11-30 23:06:58 -05:00
Zachary DeVito	70ca83793d	Add support to emit compile_commands.json from CMake/ninja files.	2017-11-30 13:47:27 -05:00
Zachary DeVito	0e54c3a989	Significantly speed up the incremental build. This commit adds code to setup.py to use ninja to manage C++ and code generator dependencies rather than use raw setuptools. This is based on similar code added to ONNX. Enabled optionally when ninja is installed. On my computer speed for a do-nothing build drops from 10s to 1.5 seconds. Speed of other compilation steps is significantly improved as well. Dependencies are tracked correctly so the need for ccache is reduced.	2017-11-30 13:47:27 -05:00
Zachary DeVito	929a11f920	Add interpreter support for Handles/PythonOp/CppOp (#3866 ) * Add interpreter support for Handles/PythonOp/CppOp This treats Handles as a first-class type in the interpreter since this turned out to be conceptually simpler than treating them as a separate concept, which requires a second channel for register allocating and moving data from one op to the next. Notes: * The refcounting nature of tensors is factored into its own base type so that it can be shared with other refcounted types such as handle. * Some methods redundant with TensorBase have been deleted from Tensor * The interpreter uses raw refcounted handles. In addition to being able to treat Tensors and Handles as the same base object, it removes a lot of redundant refcounting as objects moved from tensors to input/ output lists. * aten_dispatch has been updated to work directly on the raw refcounted lists to avoid refcounting and duplicate lists. * Removing jit_closure.cpp, The interpreter can now handle all pathways. * Functions like `unsafeToTensorShare` describe how ownership transfers in the interpreter. The `Steal` variants take rvalue references as arguments, and invalidate those arguments to prevent potential problems. * Make TensorTemporary is not a subtype relationship because it is too easy to do something horribly unsafe: ``` void foo(at::Tensor bar) { // bar destructor call release on a temporary! } foo(TensorTemporary(retainable)); // structure slicing! ```	2017-11-29 11:38:57 -05:00
Sam Gross	4518793aa2	Implement indexing in ATen (#3725 ) Implements basic and advanced indexing using ATen tensors/variables. Basic indexing is translated at the Python-binding level (python_variable_indexing.cpp) to slice/squeeze/unsqueeze/select calls. Advanced indexing is implemented in ATen in terms of take() and put() calls.	2017-11-21 13:19:00 -05:00
Scott Stevenson	a9ef76b9c6	Reflect renaming of OS X to macOS (#3795 )	2017-11-20 16:52:10 -05:00
Adam Paszke	3e4a777e44	Correct JIT interpreter autograd function (#3760 )	2017-11-19 21:48:22 +01:00
Zachary DeVito	cc7f09a372	Add cudaEvent support to the profiler (#3734 ) * Add cudaEvent support to the profiler This adds the ability to record cuda timings using cudaEventRecord in the profiler. Since it doesn't require nvprof it is easier to run than the nvprof path. This also records a thread id for each event, which will make tracing results easier to understand * Add flow arrows from cpu to cuda event * Fix no cuda build * Review comments * Move CUDA checks to one place	2017-11-16 13:58:09 -08:00
Soumith Chintala	99037d627d	fix OSX cuda build (#3722 )	2017-11-15 16:38:18 -05:00
Zachary DeVito	e43ff32192	Add a JIT interpreter (#3634 ) * Add a JIT interpreter The separate interpreter is used to graphs with a lower overhead than converting them to autograd graphs. Some notes: * does not support Handles/PythonOp/CppOp, these will be in a future commit * jit_closure.cpp still exists and we fall back to it for now when cannot handle something because of PythonOp/CppOp * In order to support retain_graph=True, the interpreter can be cloned, creating a copy that can be run with different arguments. This is assumed to be the non-standard case so cloning is not particularly optimized. No tensor _data_ is copied, but the at::Tensor list in the interpreter is. If we hit problems, there is a lot we could do (such as register allocation) to minimize the stuff that needs to be copied. * Uses a pImpl pattern to keep implementation details out of its header file. * Modifies the way getTensorOp works so that it reads/writes to already-existing vectors, this prevents needing to realloc these buffers each time. * Timings are here: https://gist.github.com/zdevito/5a20ac29fb1b9e449e693b67dc478127 This reduces overhead to about the same as running it in python. It is about 10us faster to run the same thing using ATen directly. * Code Mod Interpreter -> InterpreterState Function -> Code Add other requested comments. * RegList -> ListHandle<T> Change the RegList functions to be safer by identifying the type of each argument list, and checking that list insert does not try to add to two different lists at once. * Use exactly equal for interp tests	2017-11-13 22:09:53 -08:00
Sam Gross	4fa94793dd	Bump version in master (#3605 )	2017-11-11 18:49:19 -05:00
peter	7160fb0801	Fix setup scripts for Windows CUDA builds	2017-11-11 13:05:35 +01:00
Adam Paszke	1f1612ee37	Move _CompiledMixin to C++	2017-11-10 16:31:44 +01:00
Soumith Chintala	285ce10dbe	fix linking order of nvrtc to force no-as-needed (#3583 )	2017-11-08 22:05:09 -05:00
Edward Z. Yang	d2784b6e5b	Link ATen against CuDNN when available. (#3582 ) Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-11-08 20:20:53 -05:00
peterjc123	aa911939a3	Improve Windows Compatibility (for csrc/scripts) (#2941 )	2017-11-08 19:51:35 +01:00
Adam Paszke	621fbd5c4e	Move flattening/unflattening JIT logic to C	2017-11-06 19:42:44 -05:00
Sam Gross	fde355f7d4	Allow in-place operations on views (#3384 ) Allow in-place operations on views Adds VariableViewImpl, a subclass of VariableImpl which has a pointer to the base Variable on which it is a view. In-place operations on views change the grad_fn of the base. Note that in-place operations only work on views that are the first output of the function that created them. All C++/ATen implemented functions have this behavior, but it's possible to write Python-implemented autograd functions that do not. In-place operations on these view will raise an exception. Fixes #3313	2017-11-06 18:19:56 -05:00
Zach DeVito	f6dac327df	build fixes	2017-11-02 19:53:36 -04:00
Zach DeVito	88d56cc198	fix setup.py paths	2017-11-02 19:53:36 -04:00
Zach DeVito	5aa5b572e4	update build so that all of TH* is in libATen	2017-11-02 19:53:36 -04:00
Sam Gross	afdf50cafe	Move jit/assert.h to csrc/assertions.h (#3442 ) I've kept JIT_ASSERT as an alias to TORCH_ASSERT, which we can use throughout the C++ code.	2017-11-02 13:26:51 -04:00
Soumith Chintala	fc7a68d147	fix lint	2017-11-02 07:36:58 -04:00
Soumith Chintala	4108feb27d	fix OSX cuda build	2017-11-02 07:15:24 -04:00
Trevor Killeen	0e38d3bbb3	remove thpp library (#3405 )	2017-11-01 11:57:09 -04:00
Trevor Killeen	b544882335	ATen in THD (Part I) (#2288 ) * enable size from ATen type * temp commit aten thd * port copy, math * port random * changes after rebase * lapack bind * thd and csrc compile * fix min/max reductions in DataChannelTCP * clean up changes * re-enable tensor constructors * port MPI to at::Tensor * fix storage methods to not cast to thpp storage ptrs	2017-11-01 09:59:02 -04:00
Edward Z. Yang	d4abaa4b9e	Move ONNX broadcast fusion into separate ONNX pass, fixes verbose printing. This breaks a lot of the onnx-pytorch tests because the abstraction barriers are not respected. I'll spin up a patch for that separately. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-11-01 09:49:53 -04:00
Soumith Chintala	91af122d43	add no-as-needed for THRTC	2017-11-01 04:25:42 -07:00
Soumith Chintala	88d9ebc850	lazy-load nvrtc and libcuda (#3408 )	2017-11-01 06:07:03 -04:00
Adam Cécile	a5dbc254f8	if git is not installed at all, no subprocess exception will be raised (#3379 )	2017-10-30 18:37:12 -04:00
Edward Z. Yang	40f7f6e095	Improve handling of 'expand' (broadcasting) in JIT and ONNX The pieces: - I improved the lint / asserts to catch some bugs which I committed while working on my export. There are two new properties which the linter checks now: (1) "Anticipated uses". If a node says that is used by M, M better appear later in the topsort. Previously, we only checked if it was in all_nodes. (2) If you are a select node, you better be a multi-type node; if you're not a select node, you better not be! And you should never have an input that is multi-type. - There is a new peephole optimization pass, for simple, local transformations to graphs. Right now, it implements a simple optimization: remove 'expand' invocations that are no-ops (the size before matches the size after), but we can add other things to it later. I needed this for ONNX because no-op expands show up in the left-hand argument, which we don't support. - There is now a broadcast fuser, which fuses ATen expand ops into broadcastable ONNX ops (Add, Div, Mul, Pow, Sub, Gemm.) It only fuses when the original size is a suffix of the new size, as per the ONNX spec. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-10-29 23:50:34 -04:00
Maxim Berman	7b00adf5d3	Add CUDNN_LIB_DIR in rpath (#3255 ) * Add CUDNN_LIB_DIR in link -rpath * insert CUDNN_LIB_PATH in front of rpath	2017-10-28 00:13:53 -04:00
Adam Paszke	61afb0d519	Autogenerate ATen dispatch for JIT nodes	2017-10-27 02:40:09 +05:30
Sam Gross	67839ce7bc	Delete unused Softmax code (#3220 ) Softmax and LogSoftmax are automatically bound and dispatched through VariableType.	2017-10-21 20:51:27 +02:00
Edward Z. Yang	67612cba09	Add -Wno-missing-braces Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2017-10-19 23:04:19 -04:00
Sam Gross	f1f64c8d07	Generate autograd functions for NN / more refactors (#3136 ) Generate autograd functions for NN and implement more derivatives in derivatives.yaml A big refactor of gen_variable_type.py	2017-10-19 15:03:26 -04:00
Adam Paszke	98e67448fa	Large Softmax and LogSoftmax refactor - Cleaned up THNN and THCUNN code and kernels - Improved THCUNN kernel performance 5x, making it match cuDNN performance - Added support for computing softmax over arbitrary dims NOTE: The default dim for 3D inputs is now 1 (used to be 0) - Both functions now accept inputs with arbitrarily many dimensions - Autograd functions no longer save the input (it's unnecessary) - Added cuDNN bindings for softmax, but they are unused as THCUNN matches or even exceeds cuDNN performance	2017-10-19 19:51:10 +02:00
Trevor Killeen	dcb457fdd9	add support for using nnpack when installed via conda (#3155 ) * add support for using nnpack when installed via conda * unify nnpack discovery between conda and user	2017-10-18 20:11:13 +02:00
Richard Zou	0f4ae13f05	Better cudnn version checking (#3132 )	2017-10-16 20:59:18 +02:00
Richard Zou	1322f9a272	Add cudnn version to torch.version	2017-10-13 23:58:25 +02:00
Francisco Massa	f093545919	Add compiled CUDA version in torch.version.cuda	2017-10-10 10:16:14 -04:00
Soumith Chintala	efe91fb9c1	delete redundant python nccl code	2017-10-09 22:24:18 -04:00
Soumith Chintala	4d62933529	add initial NCCL C bindings	2017-10-09 22:24:18 -04:00


210 Commits
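
Commit 40f7f6e095 above describes two shape conditions: the peephole pass removes 'expand' invocations whose size before matches the size after, and the broadcast fuser only fuses when the original size is a suffix of the new size. A minimal sketch of those two checks on plain size tuples follows. The function names `is_noop_expand` and `is_suffix_broadcast` are hypothetical and do not come from the PyTorch source; the actual passes operate on JIT graph nodes, not tuples.

```python
from typing import Sequence


def is_noop_expand(before: Sequence[int], after: Sequence[int]) -> bool:
    """Return True when an expand leaves the size unchanged.

    Hypothetical helper: models the condition under which the peephole
    pass described in the commit message would drop the expand.
    """
    return tuple(before) == tuple(after)


def is_suffix_broadcast(original: Sequence[int], expanded: Sequence[int]) -> bool:
    """Return True when `original` equals the trailing dims of `expanded`.

    Hypothetical helper: models the "original size is a suffix of the
    new size" condition stated for the broadcast fuser.
    """
    original, expanded = tuple(original), tuple(expanded)
    if len(original) > len(expanded):
        return False
    if not original:
        # An empty size is trivially a suffix of any size.
        return True
    return expanded[-len(original):] == original
```

For example, a size of `(3,)` expanded to `(2, 3)` satisfies the suffix condition, while `(2, 1)` expanded to `(2, 3)` does not, so under this reading only the first would be a candidate for fusion.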