pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Thomas Viehmann	f98b778086	Fix forward and backward for norm/renorm with infty norm (fixes #6817 ) (#6969 )	2018-04-26 12:54:53 +02:00
gchanan	3d907ef78e	Consistently check 'out' variants against specified dtype/layout/device parameters. (#6973 ) We were previously doing this in the most common cases, but not consistently.	2018-04-25 22:46:42 -04:00
Soumith Chintala	333e8c9b22	any/all returns LongTensor, make test expect that (#6957 )	2018-04-25 14:05:29 -04:00
Tao He	39d4814933	Make any and all on ByteTensor behave like sum/prod. (#4627 )	2018-04-25 10:25:38 +02:00
cpuhrsch	a8bdb561b7	Fix reductions on some contiguous tensors where size(dim) == 1 (#6815 )	2018-04-22 13:55:55 -04:00
Richard Zou	d1a992a85e	Disallow chunks that are <= 0 in torch.chunk (#6761 ) Fixes #6759. Before, `tensor.chunk(0)` would cause a divide by 0. `tensor.chunk(-1)` would throw an error complaining that "split_size needs to be positive". This PR changes it so that the error message makes it clear that `chunks` has to be greater than 0.	2018-04-19 18:31:14 -04:00
MRuberry	9c47eb5548	Fixes test_torch.py so that all tests pass on Volta hardware. (#6736 ) Issue: "python3 test_cuda.py" currently results in a failure when using Volta hardware. The failure is in test_advancedindex, and is caused by two "sub-tests." At line 4651 a series of indices are used to compare PyTorch's and Numpy's indexing behavior. At least two of these indices index the same element of the reference tensor multiple times. These are: [slice(None), [[2]], [[0, 3], [4, 4]]] [slice(None), [[0, 1], [1, 0]], [[2, 3], [3, 0]]] The first index selects the 5th element of the third row twice, and the second index selects the 4th element of the second row twice. This causes the test to attempt to update the same index with two distinct values simultaneously. On my machine the Numpy created tensor will always take the "latter" of these two values, while the Volta tensor will always take the "former." (Not to say this behavior is guaranteed by either framework.) The fix is to remove these two indices from test_torch.py. This causes all tests to pass. While updating test_torch.py I also noticed that assert_get_eq(tensor, indexer) had a bug where it was referring to "reference" instead of "tensor." This bug had no impact on behavior. The fix is to have this function refer to its input tensor, "tensor," instead. All tests still pass after this fix.	2018-04-18 22:44:14 -04:00
Adam Paszke	d26ab68485	Sort declarations when generating Python bindings (#6701 ) * Sort declarations when generating Python bindings This helps resolve ambiguities in argument parsing according to any rules we will need. For now, this allows us to make scalar operations more conservative wrt. argument types, but makes them commutative again. * Fix inconsistencies between mod with tensor and scalar * Fix a stupid mistake	2018-04-18 21:51:35 -04:00
Thomas Viehmann	bd0cc7d364	Implement torch.einsum (fixes #1889 ) (#6307 ) * start at generic trilinear * Implement einsum (fixes #1889) This provides a simple implementation of einsum. It is built on top of the work for computing bilinear (#6110). It uses a naive left-to-right resolution at the moment. Autograd is able to differentiate by itself. The obvious unsupported feature is taking diagonals (einsum('ii->i', (a,))). * add tests and docs * fix flake8 * clean diff * rebase on current master to resolve conflicting String wrapping * clean up after rebase * better commentary in einsum and sumproduct_pair * don't say fixme if it's fixed and rename num_outputs to num_output_dims * adapt python wrapper to use std::string instead of String to avoid typedef at::String * typos and some vector to array conversion * fix accidental python<->python3 change * really fix bad rebase	2018-04-18 13:41:27 +02:00
Francisco Massa	feb8522f99	randperm supports n=0 (#6656 ) This makes it compatible with arange and numpy.random.permutation	2018-04-17 19:03:57 +02:00
gchanan	30849eb668	Bind 0-dim variables without requires grad to int64/double similar to how we do with Scalar. (#6637 ) Note: - Only integral scalar types bind to int64 - Both integral and floating point scalar types bind to double (same rules as python numbers).	2018-04-17 09:54:49 -04:00
Du Phan	c345212c86	Support gpu triangle solve (#6648 ) * add cuda trtrs * remove queue * add test trtrs	2018-04-17 14:33:39 +02:00
gchanan	5ed3f3347a	Add dtypes (with reasonable defaults) to sum, prod, cumsum, cumprod. (#6573 ) * Add dtypes (with reasonable defaults) to sum, prod, cumsum, cumprod. This adds optional dtypes to torch.sum, torch.prod, torch.cumsum, torch.cumprod. By default, the dtype is torch.int64 for integral types, and the dtype of the input for floating point types. * Don't use optional<ScalarType>, because the jit can't handle it yet. Instead, we manually build the overloads. This is fairly painful because of default arguments, but should be easy to pull out once the jit can handle optional<ScalarType>. * Fix keepdim with out parameters. * Fix _cudnn_rnn_flatten_weight. * If dtype is provided to an out function, make sure it matches the dtype of the result. * Fix typo.	2018-04-16 23:52:59 -04:00
gchanan	d7cb78478f	Split set_default_tensor_type(dtype) into set_default_dtype(dtype). (#6599 ) * Split set_default_tensor_type(dtype) into set_default_dtype(dtype). * Fix flake8. The difference between this one and set_default_tensor_type is that it only sets the scalar type. What determines the type + device of a tensor returned from a factory function with defaults is the default tensor type + the current device (if the default tensor type is cuda). This just changes the scalar type of the default tensor type. We do eventually want to deprecate set_default_tensor_type; it is not clear how to do that in a sensible and backwards compatible way.	2018-04-16 13:49:00 -04:00
gchanan	46374ad5c8	Add tensor.to(device) method. (#6588 ) * Add tensor.on(device) and tensor.on_device_as(tensor) methods. * Rename {'on', 'on_device_as'} -> 'to'. * Fix test ordinal. * Fix device ordinal again.	2018-04-16 10:50:34 -04:00
Richard Zou	6c0f74089f	More precise digamma (#6517 ) * More precise digamma Fixes #6190. This is a rebase of #3955 with some tweaks for better performance around poles. The code is ported over from cephes with permission. By itself, the cephes code returns inf for the poles. For better performance around the poles with float32, one intermediate step is always computed with double precision, regardless of dtype. This step does `PI / tan(PI * input)`. This is necessary because small (1e-6) rounding errors for the inputs to tan have strong effects on the output (ie, the derivative of tan is very large at some points). * Replace usages of finite-differences digamma with newly implemented digamma * Better behavior near and at poles * ScalarConvert -> scalar_cast for readability	2018-04-13 11:49:09 -04:00
Tongzhou Wang	8aa0ae3836	Support arbitrary number of batch dimensions in *FFT (#6528 )	2018-04-12 15:03:22 -04:00
gchanan	749d51414a	Separate cuda-ness from dtype. (#6470 ) * Separate cuda-ness from dtype. There are no longer torch.cuda.int64, etc; only torch.int64 that correspond to at::ScalarType. At the python arg parser level, the corresponding ATen type is selected from the combination of (ScalarType, Layout, Device). There is also currently unused code in here for supporting ScalarType in native_functions; this will be used for specifying aggregate types on reduction functions. * Fix test_autograd. * Add defaults to randint_like. * Track is_cuda in py tensor types. * Fix test_sparse. * Fix multiprocessing. * Fix rnn. * Fix test_nn. * Fix flake8.	2018-04-12 14:05:44 -04:00
Tongzhou Wang	ca09e4a3c5	Fix THTensor_(take) negative index check (#6482 ) * fix THTensor_(take) negative index check * add tests * rename to invalidIdxPos	2018-04-11 12:12:35 -04:00
Tongzhou Wang	0dff2b5e35	[fft] [3 of 3] Implements backward of fft ifft rfft irfft (#5537 ) * change irfft signal_sizes arg to be the last * add docs for fft, ifft, rfft, irfft; update doc for stft * fix typo in window function docs * improve gradcheck error message * implement backward of fft, ifft, rfft, irfft * add grad tests for fft, ifft, rfft, irfft * fix nits and typos from #6118 * address comments	2018-04-10 22:09:36 -04:00
Tongzhou Wang	930f181255	Fix fft when any of the input dimensions is not aligned (#6118 ) * fix fft when any of the input dimensions is not like complex type; add test for ifft+fft * clarify the comments * Address comments: add note; add helper function * use at::nullopt * add notes on conjugate symmetry; fix complex-to-real cloning condition (should be advanced data layout rather than base_istride) * add at::sum_intlist and at::prod_intlist * revert optional<vector> helper due to windows compiler error	2018-04-10 13:11:05 -04:00
albanD	bb097e2a50	[pytorch] Fix signed random_ (#6463 ) * Fix cpu signed random * fix gpu signed tensor * add test for signed random_ * cleaner tests * fix lint	2018-04-10 13:07:04 -04:00
Naman Jain	acb7df11a2	Add torch.randint and torch.randint_like functions (#6136 ) Adds randint and randint_like to TensorFactories.cpp	2018-04-10 12:08:21 -04:00
Zhou Chang	d0f395f744	[pytorch] Fix clamp is missing kwarg out (#6028 ) (#6418 ) torch.clamp is out from template code, add it manually, same with auto generated code.	2018-04-09 13:39:31 -04:00
gchanan	87e369111a	Add string-style devices to all tensors. (#6283 ) * Add string-style devices to all tensors. Previously, tensors only had a 'get_device' method which would throw an exception on a CPU tensor. This made it necessary to if/else code that was meant to be device agnostic. This PR implements the following: 1) Adds a 'device' property to all tensors that returns a string representation of the device for all tensors. For cpu tensors this is 'cpu'. For cuda tensors this is 'cuda:X', where X is the cuda device ordinal. 2) Adds a DeviceSpec class. This is just a helper class for separating device_type and device_index specification and to allow partial specification. For example, you can call DeviceSpec('cuda'), DeviceSpec('cuda:0'), DeviceSpec('cuda', 1). Also has backwards compatibility support for specifying integers, which are treated as cuda devices. DeviceSpecs have the following properties: a) device_type: string representation of the device type (i.e. 'cpu' or 'cuda') b) device_index: integer for the device index (None if not specified) c) cuda_device_index: for backwards compatibility; behaves roughly like `get_device` did previously. I.e. if a function previously took integers for cuda devices, it can now take DeviceSpecs (or strings), and can maintain the old functionality by calling `old_index = DeviceSpec(old).cuda_device_index`. 3) tensor methods and torch. functions that took integer devices can now take integers, strings, or DeviceSpecs. For example: torch.randn((2,3), dtype=torch.cuda.float32, device='cuda:1') TODO in future PRs: A) Split out cuda from dtype so you don't need to overspecify cuda-ness B) We currently only support strings/DeviceSpecs in tensor methods and torch. functions. We should have equivalents torch.cuda.device(...), torch.cuda.device_of, etc. at the torch. level that work on strings/DeviceSpecs * Add deviceInt64 to python arg parser. * device_str. * Remove device_str. * remove device prefix from attributes. 
* Use const char * instead of string. * Move autogpu index out of Device. * comment on is_default. * Rename torch.DeviceSpec to torch.device. * comment. * Fix tests. * Fix flake8. * Fix sparse_coo_tensor parameter name. * Improve error message. * Remove device_ prefix from C++ device object. * Allocate static strings. * Return not implemented from rich compare. * Move torch::Device to THPDevice. * Remove cuda index. * Py_RETURN_NOTIMPLEMENTED doesn't exist in python2.	2018-04-06 15:12:05 -04:00
Tongzhou Wang	29c69f049e	add test for old tensor serialization (#6275 )	2018-04-05 17:00:30 -04:00
Vishwak Srinivasan	0aa35780bf	[ready] Implement log2 and log10 in PyTorch (#6272 ) * Implemented log2 and log10 * Re-add incorrectly removed files * Fix minor bugs * Fix log1p docs * Add a try-except for python2 math module in log2 test * Revert changes made to aten/doc/* * Fix docstring errors * Fix windows build	2018-04-05 14:28:37 -04:00
Peter Goldsborough	9ba70856a1	Add max_values and argmax convenience functions to ATen (#6201 ) * Add max_values and argmax convenience functions to ATen * Add documentation for torch.argmax/argmin and skip max_values * Add tests for argmax/argmin * Dont default the dim argument * Use dim=0 in test_torch.py for argmax tests * Implement argmin() and argmax() without dim * Call .contiguous() before .view(-1)	2018-04-04 15:53:26 -04:00
Sam Gross	6b3a4637d6	Make the tensor type torch.Tensor instead of torch.autograd.Variable (#5785 ) This changes type(tensor) to return `torch.Tensor` instead of `torch.autograd.Variable`. This requires a few implementation changes: - torch.Tensor is now a regular Python class instead of a pseudo-factory like torch.FloatTensor/torch.DoubleTensor - torch.autograd.Variable is just a shell with a __new__ function. Since no instances are constructed it doesn't have any methods. - Adds torch.get_default_dtype() since torch.Tensor.dtype returns <attribute 'dtype' of 'torch._C._TensorBase' objects>	2018-04-03 16:29:25 -04:00
Sam Gross	4a9e02fc2f	Reduce flakiness of math tests in test_torch.py (#6200 ) This compares the torch function against the reference math function on a relatively small set of inputs, including integers, extremes of some common functions, zero, a few numbers from randn and a few numbers near 1e6. The idea here is not to be completely exhaustive, but rather quickly expose the most common bugs. For exhaustive checks, we should evaluate torch functions against all ~4e9 possible float32 values. We compare the torch function evaluated against contiguous and non-contiguous inputs and large vs. small tensors. Also: - Make torch.allclose work with nan and +/-inf - Add torch.isclose (like numpy.isclose) - Add torch.testing.assert_allclose (like numpy.testing.assert_allclose)	2018-04-03 13:51:47 -04:00
gchanan	4c81282c33	Introduce torch.layout and split layout from dtypes. (#6145 ) * Introduce torch.layout and split layout from dtypes. Tensors (and tensor types) now have a 'layout' attribute that returns either 'torch.strided' or 'torch.sparse_coo'. Previously, dtypes were 1-to-1 with ATen types/PyTensorTypes; the impetus behind this decision was to make things easy in the common case (i.e. specifying a type in a factory function). But this doesn't really follow for sparsity, which isn't a common case. It also doesn't properly represent the concept of a dtype, which in numpy are proper scalar types (i.e. roughly the type returned from indexing the last dimension of an n-d array). But this should be the same whether or not the tensor is represented via strides, sparsity, etc. This is accomplished by: 1) having the dtype of tensor return the (device-type, scalar-type) combination, i.e. torch.cuda.float32, so both torch.cuda.FloatTensor and torch.cuda.sparse.FloatTensor have the same dtype 2) Adding a layout parameter to python functions, where the combination of (dtype, layout) maps to an ATen type that is used for dispatch. * Formatting, make init throw python_error. * Fix cuda not enabled error message. * Fix test.	2018-04-02 14:07:50 -04:00
cpuhrsch	bc1b4c8912	ByteTensor sum test (#6042 )	2018-03-30 10:58:38 -04:00
Sam Gross	e4c0bb1809	Speed up sum over a dimension (#6026 ) Perf numbers: https://gist.github.com/colesbury/9e28dd7b0f27b0b019f68adbd4bd4b88 I've changed the dispatch stub so that it doesn't require every kernel to be compiled for every instruction set. Kernel implementations are stored in the stub's table with the REGISTER_DISPATCH macro. I've also moved vec256 to its own folder and split up the specializations before they get too unwieldy. Change UnaryOpsKernel to use new DispatchStub - Prefer signed integers. Mixing signed and unsigned integers is a pain and ATen mostly uses signed integers (int64_t). - Use inline lambda instead of struct for UnaryOps - Rename partial load overload "load_partial"	2018-03-29 18:13:43 -04:00
cpuhrsch	f5d0d947c1	Exp, log, sin, cos vectorized (#6078 ) Measured perf using this script: https://paste.fedoraproject.org/paste/yJiXU3AZGHuyjTVRWlj5OQ	2018-03-29 13:24:44 -04:00
Tongzhou Wang	ecd5de0f36	[fft][2 of 3] Forward for fft methods (#5856 ) * implement fft ifft rfft irfft * add tests for fft ifft rfft irfft	2018-03-28 18:44:29 -04:00
gchanan	6ae0576e1c	Remove dtypes from legacy tensor.new(...) (#6081 ) This is in preparation for splitting out sparsity (layout) from dtypes; it's complex to maintain these and tensor.new(...) is a legacy API in any case.	2018-03-28 18:37:21 -04:00
cpuhrsch	bde2f6b298	ATen Unary Ops (#6030 ) Implements a few unary operations for which there are AVX intrinsics. The perf comparison script is here: https://paste.fedoraproject.org/paste/f1adcJhpGtzDNWImS34XzQ	2018-03-27 20:39:28 -04:00
gchanan	db53389761	Add numpy.array-like type inference to torch.tensor. (#5997 ) * Add numpy.array-like type inference to torch.tensor. * Temporary fix for int/double types. * Treat python floats as the default (scalar) dtype. * Also make 0-length sequences the default scalar type and add more tests. * Add type inference to sparse_coo_tensor. * Fix sparse test. * Remove allow_variables. * Check numpy platform bits. * Address review comments. * Make suggested changes to constraints. * More checking windows builds. * Fix test for windows.	2018-03-27 15:27:23 -04:00
Richard Zou	9923701a0d	Fix crash when cat-ing empty cuda tensors (#5971 ) Fixes #5739. The CUDA path for `torch.cat` was missing a check for the case where all input tensors are empty.	2018-03-23 22:22:39 -04:00
Richard Zou	8e22ef0cb2	Support legacy empty tensor behavior in cat (#5889 ) * Support legacy empty tensor behavior in cat Continuing from #5837: Fixes #5332. Currently, the following behavior happens with torch.cat: ``` import torch x = torch.randn(4, 3, 32, 32) empty = torch.Tensor([]) res1 = torch.cat([x, empty], dim=1) res2 = torch.cat([empty, x], dim=1) ``` However, at some point in the past, res1 and res2 were equal. This PR supports the legacy behavior of ignoring empty tensors when concatenating a list of tensors, until we have empty tensors that can have arbitrary shape, at which point we'll stop supporting this behavior. * Address comments	2018-03-23 11:53:31 -04:00
cpuhrsch	befd9642bf	py3 - use loop instead of map for test_torch:test_cpu_parallel (#5940 )	2018-03-22 11:28:29 -04:00
cpuhrsch	e9f144b3e8	parallel_for_2d fix and guarding avx/avx2 compilation (#5926 ) Fix for #5921. I'm adding support for compilers that don't support -mavx -mavx2 by revisiting the dispatch code.	2018-03-22 01:14:56 -04:00
Vedanuj Goswami	08b1324ec2	Fix integer overflow in remainder operator (#5906 ) * Fix integer overflow in remainder * Fix remainder operator in CUDA * Add tests for remainder integer overflow * Add has_different_sign static function	2018-03-20 22:05:34 -04:00
Richard Zou	cf2e176049	Fix error message for cat-ing zero-dim tensors (#5819 ) Fixes #5552 * Fix error message for cat-ing zero-dim tensors * Address comments	2018-03-19 16:06:27 -04:00
Tongzhou Wang	940a0ab67b	Add logdet and slogdet (#5393 ) * 1. Add logdet and slogdet in ATen side 2. Previously, det can return result with incorrect sign upon seeing symmetric matrices. This is caused by the wrong assumption I had on SVD (when input is symmetric U=V^T). This fixes it. 3. Moreover, after fixing 2 now QR is always needed for det forward. So I moved SVD to backward call. Since this is a specific variant of SVD, it is named as _svd_with_positive_UV_det, with derivative.yaml entry being svd_backward. 4. Updated/added backward functions for det, logdet and slogdet, which uses _svd_with_positive_UV_det and svd_backward inside. 5. Optimized svd_backward: a. Avoid unnecessary kernels when only sigma has gradient (this is the usual case, and also true with det backward functions). b. Fix SVD double backward by avoiding a nan. 1. Add/update grad checks for det, logdet, and slogdet. 2. Fix an incorrect check for dim_args_idx in test_autograd.py 3. Add option to only test a subset of output values, specified by test_output_indices, for cases like slogdet where only the second output is differentiable. 4. Add better doc for the test generating list. * Add/improve output tests for det, logdet and slogdet Add a scaling to random matrices so closeness checks are more robust * Remove unnecessary Variable wrappers in some test files * Add logdet slogdet docs * Improve an err msg in THTensorLapack.c * add inverse-based backward for invertible matrices use svd only for non-invertible case, so don't need the special variant anymore * use LU rather than QR	2018-03-16 09:23:00 -04:00
cpuhrsch	5fa3aac610	ATen ReduceOps (#5776 ) #5481 was reverted due to a strange test bug. This PR attempts to fix that. This diff adds vectorization to ATen. It uses intel intrinsics to build a general vec256 class, that represents types of 256bit width. These can then be treated like regular variables. Using those it implements torch.sum() for the contiguous case. It uses Intel TBB for multithreading, which allows workstealing and chunks the reduction operations based on an experimentally chosen value (_THRESHOLD). It uses cpuinfo to pick the right code depending on the host's capabilities. The kernels are implemented under native/cpu. Each .cpp file is compiled with -avx, -avx2 and no additional flags. A macro is used to append AVX, AVX2 or NONE to the function name. The header then needs to define the functions three times, one for each capability. This could be improved by either changing the cmake file a bit or possibly generating source code using a Python script etc. For the non-contiguous case this defaults to the current implementation within TH. For CUDA it entirely defaults to the implementation within THC. There probably needs to be a bit of a debate around the design decisions here, the additional dependencies, parallelization strategy, clarity, etc. The numerical results also diverge from numpy with larger tensors, which is expected since we're summing, for example, 8 numbers and then adding the result to the running sum, instead of each number one by one. But there might be something to be said about accumulating into a double for floats or the degree of divergence, the behavior with respect to CUDA, etc. I wrote a [small Python script]( https://github.com/cpuhrsch/benchmark/blob/sumall/benchmarks/sum_bench.py) to compare the results with numpy numerically as well as on timing. I ran this script to create timings both on master and this branch. 
Here is the command for 1 core `OMP_NUM_THREAD=1 taskset -c 0 python sum_bench.py --enable_numpy 200` Here is the command for all cores `python sum_bench.py --enable_numpy 200` Here are the results of each: [Master, 1 core](https://paste.fedoraproject.org/paste/Nho9JzHpPVK9av8a6mByjQ) [This branch, 1 core](https://paste.fedoraproject.org/paste/6xLHkYvcVJx9z~5MoHxN4w) [Master, all cores](https://paste.fedoraproject.org/paste/5l3V1d5zGqvJcMXIUteMRw) [This branch, all cores](https://paste.fedoraproject.org/paste/J4RuDU-0Drz0aZwtphQwEA) To test the command is `python sum_bench.py --test 200` [This branch, test results](https://paste.fedoraproject.org/paste/kTEoUC~oWgXA6XWMAfNfNw) For this test we look at the average absolute value of the differences. This does not take into account the relative magnitude of the numbers. The numbers are sampled from a standard normal distribution. In terms of performance this diff should bring PyTorch on par with Numpy and usually exceed it by 1.5 to 2x.	2018-03-15 12:09:28 -04:00
Edward Z. Yang	cadeb0cb17	Revert "ATen ReduceOps (#5481 )" (#5765 ) * Revert "ATen ReduceOps (#5481)" This reverts commit `310c3735b9`. * Revert "Check that new cpuinfo and tbb submodules exist (#5714)" This reverts commit `1a23c9901d`.	2018-03-13 23:50:16 -04:00
Richard Zou	542fbcc127	Add optimization to norm for common norms (#5722 )	2018-03-12 19:54:49 -04:00
Sam Gross	a2641500bf	Implement torch.reshape and Tensor.reshape (#5575 ) * Implement torch.reshape and Tensor.reshape This implements reshape which has similar semantics to numpy.reshape. It will return a view of the source tensor if possible. Otherwise, it returns a copy. * Remove in-place reshape_ that was an alias for resize_ * Update documentation	2018-03-12 16:20:40 -04:00
gchanan	f6c708f869	Ensure torch.tensor and Tensor.new_tensor copy numpy data. (#5713 )	2018-03-12 16:20:10 -04:00
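
The last two entries (a2641500bf and f6c708f869) both concern when data is shared and when it is copied: `reshape` returns a view where possible and a copy otherwise, while `torch.tensor` always copies NumPy data. The following is a minimal sketch of that behavior, assuming a current PyTorch and NumPy install; the variable names are illustrative and not taken from the commits.

```python
import numpy as np
import torch

# Reshaping a contiguous tensor can be expressed as a view, so no data moves.
base = torch.arange(6)
view = base.reshape(2, 3)
shares_memory = view.data_ptr() == base.data_ptr()

# A transposed tensor is non-contiguous; flattening it cannot be a view,
# so reshape falls back to allocating a copy.
transposed = view.t()
flat = transposed.reshape(6)
copied = flat.data_ptr() != base.data_ptr()

# torch.tensor copies the NumPy buffer, so later writes to the array
# are not visible through the tensor.
arr = np.array([1, 2, 3])
t = torch.tensor(arr)
arr[0] = 100
first = t[0].item()

print(shares_memory, copied, first)
```

Comparing `data_ptr()` values is only a rough check for shared storage, but it is enough to distinguish the view case from the copy case here.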


292 Commits
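
Several entries above (4c81282c33, 749d51414a, 87e369111a, 46374ad5c8) describe splitting dtype, layout, and device into independent tensor attributes. As a rough illustration of where that work ended up, the sketch below uses the present-day `torch.device` API, whose attribute names (`type`, `index`) differ from the intermediate `DeviceSpec` names mentioned in the commit message; it assumes only that PyTorch is installed and does not require a GPU.

```python
import torch

# A device object can be built from a string or from (type, index).
# Constructing it does not touch the hardware, so this works without CUDA.
cpu = torch.device("cpu")
gpu1 = torch.device("cuda:1")
same_gpu = torch.device("cuda", 1)

# dtype, layout, and device are specified and reported independently.
t = torch.zeros(2, 3, dtype=torch.float32, device=cpu)
moved = t.to("cpu")  # Tensor.to accepts a device string or a torch.device

print(cpu.type, cpu.index)    # index is None when not specified
print(gpu1.type, gpu1.index)
print(t.dtype, t.layout, t.device)
```

Because the device is a separate attribute, device-agnostic code can pass a single `torch.device` around instead of branching on whether a tensor lives on the CPU.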