pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Hameer Abbasi	3d46e02ea1	Add __torch_function__ for methods (#37091)
Summary: According to pytorch/rfcs#3

From the goals in the RFC:

1. Support subclassing `torch.Tensor` in Python (done here)
2. Preserve `torch.Tensor` subclasses when calling `torch` functions on them (done here)
3. Use the PyTorch API with `torch.Tensor`-like objects that are _not_ `torch.Tensor` subclasses (done in https://github.com/pytorch/pytorch/issues/30730)
4. Preserve `torch.Tensor` subclasses when calling `torch.Tensor` methods. (done here)
5. Propagating subclass instances correctly also with operators, using views/slices/indexing/etc. (done here)
6. Preserve subclass attributes when using methods or views/slices/indexing. (done here)
7. A way to insert code that operates on both functions and methods uniformly (so we can write a single function that overrides all operators). (done here)
8. The ability to give external libraries a way to also define functions/methods that follow the `__torch_function__` protocol. (will be addressed in a separate PR)

This PR makes the following changes:

1. Adds the `self` argument to the arg parser.
2. Dispatches on `self` as well if `self` is not `nullptr`.
3. Adds a `torch._C.DisableTorchFunction` context manager to disable `__torch_function__`.
4. Adds a `torch::torch_function_enabled()` and `torch._C._torch_function_enabled()` to check the state of `__torch_function__`.
5. Dispatches all `torch._C.TensorBase` and `torch.Tensor` methods via `__torch_function__`.

TODO:

- [x] Sequence Methods
- [x] Docs
- [x] Tests

Closes https://github.com/pytorch/pytorch/issues/28361
Benchmarks in https://github.com/pytorch/pytorch/pull/37091#issuecomment-633657778
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37091
Reviewed By: ngimel
Differential Revision: D22765678
Pulled By: ezyang
fbshipit-source-id: 53f8aa17ddb8b1108c0997f6a7aa13cb5be73de0	2020-08-05 20:44:13 -07:00
Michael Carilli	8066fba226	[RELAND2] Change AccumulateGrad to yield `.grad`s that match weights' memory layout (#40358 ) Summary: https://github.com/pytorch/pytorch/pull/40129 fixed the error responsible for the first revert, but exposed another error in the same test. This PR is intended as the "master copy" for merge, and it runs on full CI. Two other PRs (restricted to run on a small subset of CI) supporting debugging DDP failures/hangs with multiple devices per process (`test_c10d.py:DistributedDataParallelTest.test_grad_layout_1devicemodule_2replicaperprocess`). - https://github.com/pytorch/pytorch/pull/40290 tries the test with purely rowmajor contiguous params on an untouched master. In other words https://github.com/pytorch/pytorch/pull/40290 contains none of this PR's diffs aside from the test itself. - https://github.com/pytorch/pytorch/pull/40178, for comparison, tries the test with this PR's diffs. Both fail the same way, indicating failure is unrelated to this PR's other diffs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/40358 Differential Revision: D22165785 Pulled By: albanD fbshipit-source-id: ac7cdd79af5c080ab74341671392dca8e717554e	2020-06-22 17:13:21 -07:00
Alban Desmaison	08227fea4f	Revert D22079377: [pytorch][PR] [RELAND] Change AccumulateGrad to yield `.grad`s that match weights' memory layout Test Plan: revert-hammer Differential Revision: D22079377 Original commit changeset: 9bd2b7e0c34f fbshipit-source-id: c22cc349d790caa574eace0d63980854c33e5a59	2020-06-17 10:17:27 -07:00
Michael Carilli	1ec8ece2b9	[RELAND] Change AccumulateGrad to yield `.grad`s that match weights' memory layout (#40129 ) Summary: https://github.com/pytorch/pytorch/pull/34904 was reverted because it had a misconfigured 4 GPU test that for some reason wasn't caught by external CI ([example failure](https://app.circleci.com/pipelines/github/pytorch/pytorch/181719/workflows/cfb37cd9-9a0c-4738-898b-d683934cd308/jobs/5868948/steps)). This PR reverts the revert, and adds diffs that should repair the misconfigured test. Pull Request resolved: https://github.com/pytorch/pytorch/pull/40129 Differential Revision: D22079377 Pulled By: albanD fbshipit-source-id: 9bd2b7e0c34fdaf887497b52037cfe82cba709c1	2020-06-17 09:02:54 -07:00
Xiang Gao	d57ca73c53	Remove item and data_ptr for std::complex (#39838 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/39838 Differential Revision: D22068251 Pulled By: ezyang fbshipit-source-id: d1f0e1ff98290a139f1a080a9f7a1258943cd3ad	2020-06-16 11:13:54 -07:00
Alban Desmaison	f1e575a0bf	Revert D20496044: [pytorch][PR] Change AccumulateGrad to yield `.grad`s that match weights' memory layout Test Plan: revert-hammer Differential Revision: D20496044 Original commit changeset: 248d680f4b1b fbshipit-source-id: 6462b25e3fb9c8596c1da443389089f09c32df4d	2020-06-16 10:38:40 -07:00
Michael Carilli	2beb9690c3	Change AccumulateGrad to yield `.grad`s that match weights' memory layout (#34904)
Summary: Currently, whether `AccumulateGrad` [steals](`67cb018462/torch/csrc/autograd/functions/accumulate_grad.h (L42)`) or [clones](`67cb018462/torch/csrc/autograd/functions/accumulate_grad.h (L80)`) an incoming gradient, the gradient ends up rowmajor contiguous, regardless of its param's layout. If the param's layout is channels last, or otherwise not rowmajor contiguous, later kernels that apply gradients to params are forced into an uncoalesced memory access pattern for either the param or the gradient. This may not sound like a big deal but for any binary op on large tensors it's a >3X increase in gmem traffic => 3X slowdown.

The present PR changes `AccumulateGrad` to prefer, where possible, stashing gradients that match their params' layouts (["Gradient Layout Contract"](https://github.com/pytorch/pytorch/pull/34904/files#diff-ef1a56d24f66b280dcdb401502d6a796R29-R38)).

Allowing `AccumulateGrad` to stash non-rowmajor-contiguous grads means DDP allreduces and DP reduces must allow non-rowmajor-contiguous grads. This PR extends DDP and DP to allow gradients with non-rowmajor-contiguous strides as long as their layout is nonoverlapping and dense.

For good measure, I include changes that allow all five nccl primitives (allreduce, reduce, broadcast, allgather, reducescatter) to act on non-rowmajor-contiguous tensors (again as long as each input's layout is nonoverlapping and dense, and as long as all tensors participating in a given collective have the same layout). The primitive comm changes aren't necessary to enable the DDP changes, but I wasn't sure this would end up true until I had written both sets of changes. I think primitive comm enablement is reasonable to keep in the PR, especially since the code for it is simple.

Channels last params will be a major beneficiary of this PR, but I don't see it as a channels-last-specific fix. The spirit is layout matching in general:

- Grads should be stashed with memory layouts matching their params.
- Src and dst tensors on opposite ends of collectives should have matching dense layouts.

This PR also updates autograd docs to describe potential BC-breaking changes below.

## BC notes

ngimel albanD gchanan

#### BC-breaking

In the common case where the user lets AccumulateGrad decide grad layouts, strides for grads of dense but non-rowmajor-contiguous params will change. Any user code that was accustomed to `view(-1)`ing these grads will break.

Also, the circumstances under which a grad can be stolen directly from the backward function that created it, as opposed to deep-copied by AccumulateGrad, have changed. In most cases we expect silent performance improvement, because we expect channels-last-aware backward kernels will create channels last gradients for channels last params. Now those can be stolen, whereas before this PR they were cloned and made rowmajor contiguous. IMO this is a mild BC breakage. Param backward hooks still see grads come in with whatever format the backward kernel gave them. The only BC breakage potential I see is if user code relies somehow on a grad in a hook having or not having the same deep memory as the eventual `param.grad`. Any such users hopefully know they're off the edge of the map and understand how to update their expectations.

#### BC escape hatches

At alband's recommendation, this PR's changes to AccumulateGrad do not alter the pre-PR code's decisions about whether grad is accumulated in or out of place. Accumulations of new grads onto an existing `.grad` attribute were (usually) in-place before this PR and remain in-place after this PR, keeping the existing `.grad`'s layout. After this PR, if the user wants to force accumulation into a grad with a particular layout, they can preset `param.grad` to a zeroed tensor with the desired strides or call `grad.contiguous(desired format)`. This likely won't be as performant as letting AccumulateGrad establish grad layouts by cloning or stealing grads with contract-compliant strides, but at least users have a control point.

One limitation (present before this PR and unchanged by this PR): Presetting `param.grad` does not ensure in-place accumulation all the time. For example, if `create_graph=True`, or if incoming `new_grad` is dense and existing `variable_grad` is sparse, accumulation occurs out of place, and the out-of-place result may not match the existing grad's strides.

----------------------------

I also noticed some potential DDP improvements that I considered out of scope but want to mention for visibility:

1. make sure Reducer's ops sync with AccumulateGrad streams
2. ~to reduce CPU overhead and incur fewer kernel launches, lazily create flat `contents` tensors by a single `cat` kernel only when a bucket is full, instead of `copy_`ing grads into `contents` individually as soon as they are received.~ PR includes a [minor change](https://github.com/pytorch/pytorch/pull/34904/files#diff-c269190a925a4b0df49eda8a8f6c5bd3R312-R315) to divide grads while copying them into flat buffers, instead of copying them in, then dividing separately. Without cat+div fusion, div-while-copying is the best we can do.
3. https://github.com/pytorch/pytorch/issues/38942

Pull Request resolved: https://github.com/pytorch/pytorch/pull/34904
Differential Revision: D20496044
Pulled By: albanD
fbshipit-source-id: 248d680f4b1bf77b0a986451844ec6e254469217	2020-06-16 08:43:31 -07:00
Gregory Chanan	287f3b746e	Remove Backend -> THPLayout mapping. (#37527 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/37527 This is yet another place that needs to be updated for adding a new "Backend" and is unnecessary. Instead, just use layout_from_backend and have a map from Layout -> THPLayout. Other changes: - rename torch::getDtype and torch::getLayout to torch::getTHPDtype and torch::getTHPLayout since e.g. for layout you are both passing in and returning a "layout" type. - add NumOptions to Layout to match the dtype/ScalarType formulation. Test Plan: Imported from OSS Differential Revision: D21309836 Pulled By: gchanan fbshipit-source-id: ede0e4f3bf7ff2cd04a9b17df020f0d4fd654ba3	2020-04-30 11:11:09 -07:00
Pritam Damania	05e98149ae	Refactor lambda post hook. (#37025 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/37025 This allows us to reuse this framework in other places. Test Plan: buck test mode/dev-nosan caffe2/torch/fb/distributed/model_parallel/tests:test_dist_optim -- test_optimizer_hook Differential Revision: D20958327 fbshipit-source-id: 2a37dae3687fea8820427e174900111b58673194	2020-04-23 15:29:34 -07:00
Supriya Rao	032c27cff7	[quant][graph] Add _choose_qparams function for graph mode (#35235 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35235 For dynamic quantization in graph mode, we need an operator that returns the qparams of the tensor similar to the linear_dynamic quantized op Test Plan: python test/test_quantized_tensor.py TestQuantizedTensor.test_choose_qparams Imported from OSS Differential Revision: D20608793 fbshipit-source-id: b923b2620421b32d05f4097db0d6153d53198221	2020-03-25 10:33:21 -07:00
Edward Yang	0c91ebb694	Delete all trivial uses of make_variable. (#29213 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29213 A trivial use of make_variable is one where requires_grad=False. This transformation is not technically semantics preserving, as make_variable will create a shallow copy of the tensor in question; however, I am guessing that we have the invariant that we don't actually make use of this shallow copy in a nontrivial way. There were some cases where the surrounding code expected a Variable proper to be returned; I retained those sites. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Differential Revision: D18353503 Pulled By: ezyang fbshipit-source-id: 57fe34d82e009c0cc852266fb0b79d6d9c62bb03	2019-11-13 07:43:41 -08:00
Jie	fdab1cf0d4	NHWC support in cuDNN BatchNorm & Conv2d (#29361) Summary: This reverts `9a9bb448ee` and fixes the broken case that caused the previous commit to be reverted. Details of the fix: modified aten/src/ATen/native/Convolution.cpp to call contiguous on 3D input tensors. This prevents the code path from accidentally recognizing the input as having channels-last strides, due to unsqueezing of a permuted 3D tensor. Pull Request resolved: https://github.com/pytorch/pytorch/pull/29361 Differential Revision: D18371964 Pulled By: VitalyFedyunin fbshipit-source-id: a5985f4687b37e183649fa35b8ccdb50368ebfdf	2019-11-07 10:39:58 -08:00
Vitaly Fedyunin	9a9bb448ee	Revert cudnn changes #23861 (#29329)
Summary: Broken case:
```python
import torch

x = torch.randn(192,16,50).cuda()
x = x.permute(0,2,1).contiguous().permute(0,2,1)
m = torch.nn.Conv1d(
    in_channels=16,
    out_channels=32,
    kernel_size=2,
    bias=True,
).cuda()

m(x)
```
This reverts commit `8160f390cf`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29329
Differential Revision: D18357674
Pulled By: VitalyFedyunin
fbshipit-source-id: cdd7e77e8dcbfc5f2ab3df54eb53ccfbf703b245	2019-11-06 17:38:46 -08:00
Jie	8160f390cf	(#23861 ) Summary: Added nhwc support for: 1. cudnn_batch_norm & cudnn_batch_norm_backward 2. cudnn_convolution_forward & cudnn_convolution_backward 3. cudnn_convolution_transpose & cudnn_convolution_transpose_backward patching suggest_memory_format for convolution suggest_memory_format has ambiguous meaning for two cases: 1. tensor with NCHW where C = 1. we could use stride of C as a hint to tell the intended memory format. 2. tensor with NCHW where H == W == 1. there's no way to identify the intended memory format from strides. Currently we fallback to NCHW whenever we see contiguous tensor. Hence avoiding ambiguity for some of the special cases. Pull Request resolved: https://github.com/pytorch/pytorch/pull/23861 Differential Revision: D18263434 Pulled By: VitalyFedyunin fbshipit-source-id: dd9f69576ec12fec879cd87a3d446931371360d9	2019-11-04 09:11:50 -08:00
Pavel Belevich	46f96d1538	C++ API parity: at::Tensor::requires_grad_ Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26332 Test Plan: Imported from OSS Differential Revision: D17427575 Pulled By: pbelevich fbshipit-source-id: 5500169a4fa0ef9cc2a7272e13b6e2d89df09260	2019-10-24 13:24:18 -07:00
Vitaly Fedyunin	d39ab0312a	Add memory_format support `to` and `type` operators (#27107) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27107 Adds memory_format keyword argument (positional for cpp). 'Preserve' behavior now follows these rules: 1) If the tensor is non-overlapping and dense, the output tensor will have the same strides as the input tensor. 2) If not (1) and the tensor is stored in the channels last format, the output tensor will have the channels last format. 3) The output tensor will be contiguous in all other cases. --- A dense tensor is a tensor that stores its values in a contiguous block of memory. A non-overlapping tensor is a tensor in which each element occupies its own memory location, with no location shared between elements. Test Plan: Imported from OSS Differential Revision: D17931062 Pulled By: VitalyFedyunin fbshipit-source-id: 2c5dd3dd05bf58a9a29f25562cd45190b009c3f9	2019-10-15 12:55:56 -07:00
Brian Vaughan	002c250139	Expose a torch.result_type and simplify tensor iterator Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26012 Test Plan: Imported from OSS Differential Revision: D17556197 Pulled By: nairbv fbshipit-source-id: c0be3ac9e99fecc26a181e301defc1942bc6708c	2019-09-25 06:52:23 -07:00
Dmytro Dzhulgakov	8c1354c31b	Implement more support for per-channel quantization (#26240 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26240 In particular adds support for empty/empty_like which is needed for memory layouts to work. Test Plan: Imported from OSS Differential Revision: D17443220 Pulled By: dzhulgakov fbshipit-source-id: 9c9e25981999c0edaf40be104a5741e9c62a1333	2019-09-19 13:39:17 -07:00
Jerry Zhang	94f903654c	Add qscheme() method (#20608 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20608 Exposing QScheme in python as Python objects like `torch.qscheme.per_tensor_affine` etc. Reviewed By: zafartahirov Differential Revision: D15364354 fbshipit-source-id: 4d6a96d67e9ead051cf4a8f934553a8c7232fdb7	2019-06-14 16:29:29 -07:00
Will Feng	a45b79d23f	Fix wrap(at::Scalar) (#18632)
Summary: Problem:
```cpp
// This function expects a `Variable` as input
inline PyObject* wrap(at::Tensor tensor) {
  return THPVariable_Wrap(Variable(std::move(tensor)));
}

inline PyObject* wrap(at::Scalar scalar) {
  // This function calls `wrap(at::Tensor tensor)` (the function above), but since
  // `scalar_to_tensor(...)` returns a `Tensor` and not a `Variable`, the call to
  // `wrap(at::Tensor tensor)` will fail with "Tensor that was converted to Variable
  // was not actually a Variable", which is not what we want.
  return wrap(scalar_to_tensor(scalar));
}
```
The right fix is to call `make_variable(...)` with the tensor returned from `scalar_to_tensor(scalar)`.
This unblocks https://github.com/pytorch/pytorch/pull/18230 as it is the only patch that hits this code path now. All other native functions that return Scalar (such as `item()` or `_local_scalar_dense()`) either have a custom-defined implementation that doesn't go through this path, or are not exposed to Python at all.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18632
Differential Revision: D14689293
Pulled By: yf225
fbshipit-source-id: be7ba5d3de83a69533a2997de97ad92989ff78ee	2019-03-30 11:36:11 -07:00
Roy Li	7aae51cded	Replace tensor.type().scalarType() calls with tensor.scalar_type() Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17515 Reviewed By: ezyang Differential Revision: D14233250 fbshipit-source-id: 6c7af8d2291c0c2b148001b30cf03834f34366c0	2019-03-08 14:08:18 -08:00
Alex Şuhan	e157a6432f	Fix Python device type property for XLA and MSNPU Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17361 Differential Revision: D14243546 Pulled By: soumith fbshipit-source-id: b7498968f72e3d97de5bf6e5b44c5a59b6913acb	2019-02-28 13:36:19 -08:00
Adam Paszke	7157be8622	Add special ops for BatchNorm symbolic differentiation (#15403) Summary: The main problem with differentiating batch norm statically is that we make a lot of complex run-time decisions about the backend we choose. Then, the autograd derivatives are implemented for every backend separately, which makes sense, because they might be saving buffers containing different values. To resolve the issue, the forward op returns an index of the chosen backend, and the backward function takes it as an argument, so that it knows how to interpret the buffers. Pull Request resolved: https://github.com/pytorch/pytorch/pull/15403 Differential Revision: D14098815 Pulled By: ailzhang fbshipit-source-id: 7fcd3e6e0566433e81fe8286fb441c1ecaf198ad	2019-02-15 15:40:28 -08:00
Xiang Gao	c5e1b469be	Return namedtuples from torch.* function with multiple return arguments for C++ operators (#15429)
Summary: Partially fixes: https://github.com/pytorch/pytorch/issues/394
Implementation detail: Codegen is modified to generate code that looks like the following:
```C++
static PyObject * THPVariable_svd(PyObject* self_, PyObject* args, PyObject* kwargs)
{
  HANDLE_TH_ERRORS
  static PythonArgParser parser({
    "svd(Tensor input, bool some=True, bool compute_uv=True, *, TensorList[3] out=None)",
  }, /*traceable=*/true);

  ParsedArgs<6> parsed_args;
  auto r = parser.parse(args, kwargs, parsed_args);
  static PyStructSequence_Field fields0[] = {
    {"U", ""}, {"S", ""}, {"V", ""}, {nullptr}
  };
  static PyStructSequence_Desc desc0 = {
    "torch.return_types.svd_out", nullptr,
    fields0, 3
  };
  static PyTypeObject type0;
  static bool namedtuple_type_initialized0 = false;
  if (!namedtuple_type_initialized0) {
    PyStructSequence_InitType(&type0, &desc0);
    namedtuple_type_initialized0 = true;
  }
  static PyStructSequence_Field fields1[] = {
    {"U", ""}, {"S", ""}, {"V", ""}, {nullptr}
  };
  static PyStructSequence_Desc desc1 = {
    "torch.return_types.svd", nullptr,
    fields1, 3
  };
  static PyTypeObject type1;
  static bool namedtuple_type_initialized1 = false;
  if (!namedtuple_type_initialized1) {
    PyStructSequence_InitType(&type1, &desc1);
    namedtuple_type_initialized1 = true;
  }
  if (r.idx == 0) {
    if (r.isNone(3)) {
      return wrap(&type1, dispatch_svd(r.tensor(0), r.toBool(1), r.toBool(2)));
    } else {
      auto results = r.tensorlist_n<3>(3);
      return wrap(&type0, dispatch_svd(r.tensor(0), r.toBool(1), r.toBool(2), results[0], results[1], results[2]));
    }
  }
  Py_RETURN_NONE;
  END_HANDLE_TH_ERRORS
}
```
Types are defined as static members of `THPVariable_${op_name}` functions, and initialized the first time the function is called.
When parsing function prototypes in `native_functions.yaml`, the parser will set the specified name as `field_name` when it sees things like `-> (Tensor t1, ...)`. These field names will be the field names of the namedtuple. The class of each namedtuple will be named `torch.return_types.${op_name}`.
In some Python 2 versions, `PyStructSequence` is not a subtype of tuple, so we have to create some functions to check whether an object is a tuple or namedtuple for compatibility.
Operators in `native_functions.yaml` are changed such that only `max` and `svd` are generated as namedtuples. Tests are added for these two operators to check that the return value works as expected. Docs for these two ops are also updated to explicitly mention that the return value is a namedtuple. More ops will be added in later PRs.
On Windows builds the linker is unable to resolve `PyStructSequence_UnnamedField`, and a workaround is added to deal with this case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15429
Differential Revision: D13709678
Pulled By: ezyang
fbshipit-source-id: 23a511c9436977098afc49374e9a748b6e30bccf	2019-01-22 11:12:18 -08:00
James Reed	acbd9c49b0	Direct FBGEMM integration into ATen (#13777)
Summary: This PR implements infrastructure for post-processing a model to apply int8 quantization to its `nn.Linear` modules. Highlights of the implementation:

1) Inputs and outputs are `float` (quantized and packed internally), but the weight is quantized and packed ahead of time for efficiency. This implementation performs well in small-batch size GEMM calls. It should not be considered a general-purpose quantized GEMM kernel.
2) Weight packing is dependent on machine architecture (e.g. vector register width), so it is done just-in-time. Concretely, it is done on model load for the weights and it is done during operator execution for the input value.
3) Biases are unquantized.
4) We fail loudly if we are attempting to run this on a machine that does not support FBGEMM. This is because we do not want a model's numerics to differ based on which machine it is run on. A model containing these FBGEMM ops must be run with FBGEMM.

The API can be seen in the added test case. Highlights are:

1) `torch.jit.quantized.quantize_linear_modules` walks the module hierarchy of the passed-in Module and replaces all `nn.Linear` modules with a new `QuantizedLinear` module, which encapsulates the behavior described above.
2) `_pack()` and `_unpack()` script methods are present on `QuantizedLinear` modules. These methods should be called before serialization and after deserialization, respectively. This ensures that the weight matrix is properly packed for the running machine's architecture. Note that in the long term, we would like to move toward a more Pickle-style serialization technique, rather than having these explicit methods that mutate member values. This is blocked on being able to assign attributes in a ScriptMethod, among other things.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/13777
Differential Revision: D13383276
Pulled By: jamesr66a
fbshipit-source-id: 00f29c9f34544add2b90107e3cf55a287802c344	2018-12-21 10:35:51 -08:00
Edward Yang	517c7c9861	Canonicalize all includes in PyTorch. (#14849)
Summary: Anywhere we used #include "foo.h", we now say #include <foo.h>
Paths are adjusted to be rooted out of aten/src, torch/lib, or the root level directory.
I modified CMakeLists.txt by hand to remove TH and THC from the include paths.
I used the following script to do the canonicalization:
```
import subprocess
import re
import os.path

files = subprocess.check_output(['git', 'ls-files']).decode('utf-8').rstrip().split('\n')
for fn in files:
    if not any(fn.endswith(suff) for suff in ['.cu', '.cpp', '.in', '.h', '.hpp', '.cu', '.cuh', '.cc']):
        continue
    if not any(fn.startswith(pref) for pref in ["aten/", "torch/"]):
        continue
    with open(fn, 'r') as f:
        c = f.read()
    def fmt(p):
        return "#include <{}>".format(p)
    def repl(m):
        p = m.group(1)
        if p in ["dlfcn.h", "unistd.h", "nvrtc.h", "cuda.h", "cuda_runtime.h", "cstdint", "cudnn.h", "Python.h", "cusparse.h", "cuda_runtime_api.h", "cuda_fp16.h", "cublas_v2.h", "stdint.h", "curand_kernel.h"]:
            return fmt(p)
        if any(p.startswith(pref) for pref in ["torch/csrc", "c10/", "ATen/", "caffe2/", "TH/", "THC/", "Eigen/", "gtest/", "zdl/", "gloo/", "onnx/", "miopen/"]):
            return fmt(p)
        for root in ["aten/src", "torch/lib", ""]:
            for bad_root in [os.path.dirname(fn), "aten/src/TH", "aten/src/THC", "torch/csrc"]:
                new_p = os.path.relpath(os.path.join(bad_root, p), root)
                if not new_p.startswith("../") and (os.path.exists(os.path.join(root, new_p)) or os.path.exists(os.path.join(root, new_p + ".in"))):
                    return fmt(new_p)
        print("ERROR: ", fn, p)
        return m.group(0)
    new_c = re.sub(r'#include "([^"]+)"', repl, c)
    if new_c != c:
        print(fn)
        with open(fn, 'w') as f:
            f.write(new_c)
```
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14849
Reviewed By: dzhulgakov
Differential Revision: D13363445
Pulled By: ezyang
fbshipit-source-id: 52361f878a672785f9306c9e9ab2513128092b68	2018-12-08 19:38:30 -08:00
Thomas Viehmann	ba25e13782	Forbid Module.to with copy argument. (#12617 ) Summary: Module.to uses the Tensor.to parsing facility. It should not, however, accept "copy" as a keyword/fourth positional argument. See #12571 for discussion. Thank you SsnL for noticing. Pull Request resolved: https://github.com/pytorch/pytorch/pull/12617 Differential Revision: D10392053 Pulled By: ezyang fbshipit-source-id: b67a5def7993189b4b47193abc7b741b7d07512c	2018-10-16 20:31:44 -07:00
Yangqing Jia	713e706618	Move exception to C10 (#12354)
Summary: There is still some work to be done:
- Move logging and unify AT_WARN with LOG(ERROR).
- A few header files are still being plumbed through and need cleaning.
- caffe2::EnforceNotMet aliasing is not done yet.
- Need to unify the macros. See c10/util/Exception.h

This is mainly a codemod and does not cause functional changes. If you find your job failing and trace it back to this diff, it can usually be fixed by one of the following approaches: (1) add //caffe2/c10:c10 to your dependency (or transitive dependency). (2) change objects such as at::Error, at::Optional to the c10 namespace. (3) change functions to the c10 namespace. Especially, caffe2::MakeString is not overridden by the unified c10::str function. Nothing else changes.
Please kindly consider not reverting this diff - it involves multiple rounds of rebasing and the fix is usually simple. Contact jiayq@ or AI Platform Dev for details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12354
Reviewed By: orionr
Differential Revision: D10238910
Pulled By: Yangqing
fbshipit-source-id: 7794d5bf2797ab0ca6ebaccaa2f7ebbd50ff8f32	2018-10-15 13:33:18 -07:00
Thomas Viehmann	0cf3c1ce66	Add copy= keyword to Tensor.to (#12571 ) Summary: Fixes: #12454 Pull Request resolved: https://github.com/pytorch/pytorch/pull/12571 Differential Revision: D10356994 Pulled By: SsnL fbshipit-source-id: d87416078a5a8e5ffa690cd73c09fa6b4e16aa25	2018-10-12 02:10:44 -07:00
Edward Yang	ac9268f25d	Conversions to and from complex numbers. (#11420)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11420
Surprisingly tricky! Here are the major pieces:
- We grow an even more ludicrous macro AT_FORALL_SCALAR_TYPES_WITH_COMPLEX_EXCEPT_COMPLEX_HALF which does what it says on the tin. This is because I was too lazy to figure out how to define the necessary conversions in and out of ComplexHalf without triggering ambiguity problems. It doesn't seem to be as simple as just Half. Leave it for when someone actually wants this.
- Scalar can now hold std::complex<double>. Internally, it is stored as double[2] because nvcc chokes on a non-POD type inside a union.
- overflow() checking is generalized to work with complex. When converting to std::complex<T>, all we need to do is check for overflow against T. When converting from complex, we must check (1) if To is not complex, that imag() == 0 and (2) for overflow componentwise.
- convert() is generalized to work with complex<->real conversions. Complex to real drops the imaginary component; we rely on overflow checking to tell if this actually loses fidelity. To get the specializations and overloads to work out, we introduce a new Converter class that actually is specializable.
- Complex scalars convert into Python complex numbers.
- This probably fixes complex tensor printing, but there is no way to test this right now.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Reviewed By: cpuhrsch
Differential Revision: D9697878
Pulled By: ezyang
fbshipit-source-id: 181519e56bbab67ed1e5b49c691b873e124d7946	2018-09-08 16:39:43 -07:00
Gregory Chanan	6219c4a28f	Make Scalar::toTensor a free function, move Scalar to ATen/core. Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11125 Reviewed By: ezyang Differential Revision: D9599798 Pulled By: gchanan fbshipit-source-id: 2fec682c109013a82788dfba13f4d30b2945d3f4	2018-09-04 16:25:57 -07:00
Peter Goldsborough	47492ed451	[C++ API] Bag of fixes (#8843 ) * Bag of fixes * Rename tensor_range.h to tensor_list_view.h * Post rebase fixes * Rename torch::tensor namespace to torch::tensors due to name conflict * Avoid recursion in Module::to	2018-06-25 21:11:49 -07:00
Peter Goldsborough	372d1d6735	Create ATen tensors via TensorOptions (#7869 ) * Created TensorOptions Storing the type in TensorOptions to solve the Variable problem Created convenience creation functions for TensorOptions and added tests Converted zeros to TensorOptions Converted rand to TensorOptions Fix codegen for TensorOptions and multiple arguments Put TensorOptions convenience functions into torch namespace too All factory functions except _like support TensorOptions Integrated with recent JIT changes Support _like functions Fix in place modification Some cleanups and fixes Support sparse_coo_tensor Fix bug in Type.cpp Fix .empty calls in C++ API Fix bug in Type.cpp Trying to fix device placement Make AutoGPU CPU compatible Remove some auto_gpu.h uses Fixing some headers Fix some remaining CUDA/AutoGPU issues Fix some AutoGPU uses Fixes to dispatch_tensor_conversion Reset version of new variables to zero Implemented parsing device strings Random fixes to tests Self review cleanups flake8 Undo changes to variable.{h,cpp} because they fail on gcc7.2 Add [cuda] tag to tensor_options_cuda.cpp Move AutoGPU::set_index_from into .cpp file because Windows is stupid and sucks Fix linker error in AutoGPU.cpp Fix bad merge conflict in native_functions.yaml Fixed caffe2/contrib/aten Fix new window functions added to TensorFactories.cpp * Removed torch::TensorOptions Added code to generate wrapper functions for factory methods Add implicit constructor from Backend to TensorOptions Remove Var() from C++ API and use torch:: functions Use torch:: functions more subtly in C++ API Make AutoGPU::set_device more exception safe Check status directly in DynamicCUDAHooksInterface Rename AutoGPU to DeviceGuard Removed set_requires_grad from python_variables.h and warn appropriately in Variable::set_requires_grad remove python_default_init: self.type() Add back original factory functions, but with deprecation warnings Disable DeviceGuard for a couple functions in ATen Remove print 
statement Fix DeviceGuard construction from undefined tensor Fixing CUDA device compiler issues Moved as many methods as possible into header files Dont generate python functions for deprecated factories Remove merge conflict artefact Fix tensor_options_cuda.cpp Fix set_requires_grad not being checked Fix tensor_new.h TEMPORARILY put some methods in .cpp files to see if it solves issues on windows and mac Fix bug in DeviceGuard.h Missing includes TEMPORARILY moving a few more methods into .cpp to see if it fixes windows Fixing linker errors * Fix up SummaryOps to use new factories Undo device agnostic behavior of DeviceGuard Use -1 instead of optional for default device index Also move DeviceGuard methods into header Fixes around device index after optional -> int32_t switch Fix use of DeviceGuard in new_with_tensor_copy Fix tensor_options.cpp * Fix Type::copy( * Remove test_non_float_params from ONNX tests * Set requires_grad=False in ONNX tests that use ints * Put layout/dtype/device on Tensor * Post merge fixes * Change behavior of DeviceGuard to match AutoGPU * Fix C++ API integration tests * Fix flip functions	2018-06-16 00:40:35 -07:00
Tongzhou Wang	c0a419e6ba	Add non_blocking to Tensor/Module.to (#7312 ) * Add non_blocking to Tensor/Module.to * flake8 * Add argparse tests * cpp parse * Use C++ parser * use a common parse function with Tensor.to * fix test_jit * use THPObjectPtr * increase refcount for None, True, and False * address comments * address comments	2018-06-04 18:46:52 -04:00
gchanan	7abdc303c6	Don't allow requires_grad to be set on integer Tensor constructors in… (#7185 ) * Don't allow requires_grad to be set on integer Tensor constructors in tensor_new. * Fix autograd test. * Fix test_distributions. * Fix test_jit. * Fix NN tests.	2018-05-18 19:45:10 +02:00
Zachary DeVito	d985cf46f1	Add workaround to fix include warnings in Python 2 builds. (#6716 )	2018-04-24 12:30:19 -07:00
gchanan	d0b0edf27a	Add a requires_grad_() function to tensors. (#6771 )	2018-04-19 13:47:24 -04:00
gchanan	4c81282c33	Introduce torch.layout and split layout from dtypes. (#6145 ) * Introduce torch.layout and split layout from dtypes. Tensors (and tensor types) now have a 'layout' attribute that returns either 'torch.strided' or 'torch.sparse_coo'. Previously, dtypes were 1-to-1 with ATen types/PyTensorTypes; the impetus behind this decision was to make things easy in the common case (i.e. specifying a type in a factory function). But this doesn't really follow for sparsity, which isn't a common case. It also doesn't properly represent the concept of a dtype, which in numpy are proper scalar types (i.e. roughly the type returned from indexing the last dimension of an n-d array). But this should be the same whether or not the tensor is represented via strides, sparsity, etc. This is accomplished by: 1) having the dtype of tensor return the (device-type, scalar-type) combination, i.e. torch.cuda.float32, so both torch.cuda.FloatTensor and torch.cuda.sparse.FloatTensor have the same dtype 2) Adding a layout parameter to python functions, where the combination of (dtype, layout) maps to an ATen type that is used for dispatch. * Formatting, make init throw python_error. * Fix cuda not enabled error message. * Fix test.	2018-04-02 14:07:50 -04:00
gchanan	5edf6b2037	Add numpy-style dtypes to Variable factories. (#5245 ) * Add numpy-style dtypes to Variable factories. 1) Add numpy-style dtypes corresponding to torch tensor types. These are: torch.float16, torch.float32, torch.float64, torch.uint8, torch.int8, torch.int16, torch.int32, torch.int64 as well as torch.cuda, torch.sparse, and torch.cuda.sparse equivalents. 2) Adds "legacy" names for the above dtypes that correspond more closely to existing tensor names. These are: torch.half, torch.float, torch.double, torch.short, torch.int, torch.long. torch.byte and torch.char don't exist because they either don't match numpy semantics or differ on different architectures. 3) Adds a "dtype" parameter to Variable factories (e.g. zeros, ones) that allows the user to specify the type without changing the default tensor type. 4) Adds a "dtype" getter to Variables that return the canonical dtype from 1) This PR is missing the following useful features that should be added in the future: A) We only add the "dtype" parameter to auto-generated factories; hand-written factories like in tensor_new.cpp don't support this yet. B) We don't allow type conversions to use dtypes; that should be added to type(param) or a new function. C) We don't yet have a "device" parameter for these factories; right now, they will only create Variables on the default device. * backend_to_string can be private. * Define python binding argument indexes in a more simple way. * add all_declared_types, still need to hook it up to THPDType. * Fix all_declared_types for missing types (it's Sparse + Half). * Ensure cuda dtypes are created even if compiled with NO_CUDA=1. * Fix case where dtype is provided but dispatch is via namespace. This happens in ones_like, empty_like, randn_like. 
There is some question if we should do: 1) at::ones_like(tensor).toType(dtype) 2) at::ones_like(tensor.toType(dtype)) I did the former because this matches with the numpy documentation, i.e.: "Overrides the data type of the result." and it's easier to implement. Note that the above causes an extra copy, either of the input or output. Here's a better implementation: 1) Make zeros_like, ones_like native functions that take an optional type (named dtype?). 2) Match the type argument with the dtype, so we don't have two different parameters. 3) Call at::zeros_like(input, type) -> at::native::zeros_like(input, type) -> type.zeros(input.sizes()) * Don't return from maybe_initialize_cuda. * Don't leak DType name. * Address cpp review comments. * Share code between sparse and non-sparse test_dtypes. * Rewrite _like functions as native function with explicit type parameter. * Use type 'Type' instead of 'dtype' for consistency. * Address review comments. * Handle arg_idx when there is requires_grad but no dtype in python_binding_arguments.	2018-02-20 11:04:14 -05:00
Edward Z. Yang	7bd2db997e	Port cuDNN RNN bindings to ATen (#4881 ) * Add transpose() to TensorGeometry. This code is dead; I briefly used it in my RNN patchset but eventually rewrote it to not be necessary. However, it seemed like a useful gadget so I kept it. In general, it seems that it would be useful for TensorGeometry to support all operations that Tensor does, but it only computes the changes to sizes/strides instead of actually doing the computation. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Turn on wrap_dim behavior for TensorGeometry Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Support for hard-coded differentiable outputs. Some outputs of functions are nondifferentiable, and should always be returned with requires_grad=False. Traditionally, we have used the presence of 'grad' to signal that only the first output is differentiable, and the rest are not, but cudnn_rnn (to be implemented) breaks this pattern; its first three outputs are differentiable, but its last output is a buffer that is just consumed by backwards. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * TensorGeometry constructor from just sizes The sizes are assumed to form a contiguous tensor, and we compute the strides we would get in that case. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Support saving TensorList for backwards. There is some back story here. Saved TensorList in backwards will be used by cudnn_rnn, and it is worth asking, why is it necessary to save a list of tensors? Indeed, technically speaking a list of tensors is not necessary, we only need to save the sizes of each of the weight tensors. (We need the sizes because cuDNN is only going to blast the derivative of weights into a flat buffer, but we need to match the sizes of the views into the buffer when we eventually return the derivatives.) 
However, it was surprisingly awful trying to implement passing just sizes, because as non-Tensor arguments, the JIT interpreter generation code is expected to handle all non-Tensor arguments as attributes in the trace, and our attributes struct doesn't actually know how to do arrays of arrays. Saved TensorList code was much easier to get working, so that's what this patch does. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * MatrixRef - an ArrayRef with a stride, making it a 2D ArrayRef. Like ArrayRef, this class does not own the underlying data, it is expected to be used in situations where the data resides in some other buffer. This is intended to be trivially copyable, so it should be passed by value. For now, 2D only (so the copies are actually cheap, without having to write a SmallVector class) and contiguous only (so we can return non-strided ArrayRef on index). The intended use-case (not in this commit) is to make it easier to work with RNN weights, which are num_weights x num_layers matrix of parameters. P.S. dimension 0 indexes rows, dimension 1 indexes columns Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Generalize getDataType in Descriptors.h Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Change copy_range to take Tensor, and change cat_tensors_backward accordingly Should a backward function return a Variable or a Tensor? For the most part, all of our backward functions return Tensor, except cat_tensors_backward, which returns a variable_list (which is really the only thing that matters, because Tensor and Variable are interconvertible). But this is kind of weird, because it means that you can't implement a backwards in ATen that returns a std::vector<Tensor>, and then hook it up transparently with the derivatives code. So I switched it over. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Support 5-ary return Tensor tuple. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Support code generation with mixed Tensor/TensorList in output. 
I don't think I ended up using this in cudnn_rnn, but this seems it might be useful for someone else later. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Support 4-ary boolean array Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Add support for retain_variables in tools/autograd/derivatives.yaml 'retain_variables', a bool which is true if a user has specified that saved variables should be retained in case the backwards is run again later. This allows an optimization where we can destroy saved buffers if we know variables are not going to be retained, e.g., it is (will be) used by _cudnn_rnn Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Lazily initialize cuDNN descriptors Previously, cuDNN descriptors were eagerly allocated as soon as a FooDescriptor object was created. However, in some uses of TensorDescriptor, this is problematic: some tensors are optional and cuDNN's API expects to be given a nullptr TensorDescriptor in this case, not an uninitialized (but allocated) descriptor. Lazily initializing the descriptors makes it less likely for us to use uninitialized memory and matches the usual semantics of unique_ptr. It's good sense! Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Port cuDNN RNNs to ATen. This brings three new functions: - _cudnn_rnn_flatten_weight: flatten a matrix of weight tensors into a single contiguous weight buffer as required by cuDNN - _cudnn_rnn: run RNN forwards - _cudnn_rnn_backward: run RNN backwards RNNs have a lot of parameters, so we restructured what was previously a single 'fn' object that recorded all the parameters into three objects: RNNDescriptorParams, TensorDescriptorListParams and DropoutDescriptorParams. We make use of MatrixRef to organize the weight tensors (which are weight/bias x number of layers), but I did not teach the codegen how to pass these as arguments/return values natively, so instead a MatrixRef is passed as its constituent ArrayRef and int64_t stride0. 
cudnn_rnn has three differentiable outputs and one nondifferentiable one, so it makes use of the support for hard-coded differentiable outputs. I haven't deleted all of the descriptor code from Python, because dropout initialization still goes through this codepath, that should be fixed soon but I don't see it as essential for this PR. This commit also removes the last use of NestedIOFunction from PyTorch. There are some shenanigans with cuDNN dropout descriptor initialization, see below: Note [cuDNN dropout descriptor initialization] ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In most cases, setting descriptors in cuDNN is cheap (e.g., cudnnSetTensorNdDescriptor). However, this is not the case for cudnnSetDropoutDescriptor: in cuDNN 6/7 (and possibly others) it does an expensive precomputation to initialize the random number generator states. In cuDNN 6, this is the ONLY official mechanism to initialize a dropout descriptor, which means that law-abiding clients were expected to generate a dropout descriptor once and cache it. However, our ATen interface is (1) stateless (so we can't cache the descriptors) and (2) does not accept arbitrary user types in its interface (so we can't pass the descriptor in). This puts us in a pickle. In cuDNN 7, a new function, cudnnRestoreDropoutDescriptor was added, which forgoes the expensive initialization process, and can initialize the descriptor with a pre-initialized state CUDA tensor. This is great, because it means we can simply pass in the state tensor and then initialize the descriptor internally. Unfortunately, this function is not available in cuDNN 6. To work around this, we break the cuDNN abstraction barrier, and have the struct layout of the underlying dropout descriptor. With this struct, we can reimplement cudnnRestoreDropoutDescriptor from scratch. Great! Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Fix cuDNN 7 behavior. Signed-off-by: Edward Z. 
Yang <ezyang@fb.com> * Delete some unused, controversial methods from MatrixRef. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Add missing filter_dim_a slice Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Replace nested for-loop with itertools.chain. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * CR comment on mut_desc() Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Refactor DropoutDescriptor API. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Use cached CurrentDeviceProperties from Context. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Document _cudnn_rnn outputs. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Improve fmap docs, convert some functions to use it. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Move IndexRange to autograd/function.h Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Elaborate on CUDNN_STATUS_INVALID_VALUE return some more. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Add an all-in-one setter for RNNDescriptorParams. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Print what the unrecognized RNN mode was Signed-off-by: Edward Z. Yang <ezyang@fb.com> * RNN TensorDescriptor improvements - Have an explicit size/stride overload for set TensorDescriptor, so you don't have to create a goofy view to feed in. - Change the padding to 3D rather than 5D, which is all you actually need (it's just 2D that is not supported by cuDNN API.) Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Fix implementation of cudnnRestoreDropoutDescriptor, plus test. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Better comments about input layout. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Add comment about no-DropoutDescriptor argument RNNDescriptor function. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Rename vocab_size back to input_size. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Don't use backslash in comment. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Bugfix for contiguous TensorGeometry calculation. Signed-off-by: Edward Z. 
Yang <ezyang@fb.com> * Don't allocate a dummy tensor when setting TensorDescriptor for flatten_weight. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Make contiguity errors more user-friendly. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * s/fn.dropout.train/fn_train/ Signed-off-by: Edward Z. Yang <ezyang@fb.com> * s/_cudnn_rnn_backward_grad/_cudnn_rnn_backward_input/ Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Make dcx properly undefined when not required. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Remove old TODO. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Add state size check in cudnnRestoreDropoutDescriptor Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Explicitly narrow int64_t to size_t Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Restore copyParams comment. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Update benchmark numbers, and slight engineering improvements. Signed-off-by: Edward Z. Yang <ezyang@fb.com> * Typofix. Signed-off-by: Edward Z. Yang <ezyang@fb.com>	2018-02-05 13:54:11 -05:00
Tongzhou Wang	c681b03d37	Add determinant function on variable; Add backward on svd (#3816 ) * determinant on variable * svd bwd	2017-12-01 13:22:46 -05:00
gchanan	157f949cef	Implement python scalar conversions via ATen; allow localScalar if numel == 1 (#3908 ) * Have localScalar work with all 1 element tensors, not just scalars. Also have toCFloat, etc. call localScalar so 1 element tensors work as well. * Implement python number conversions. * Implement __bool__, __nonzero__ as ATen functions. * Remove merge artifacts. * Simplify by dispatching to toCDouble.	2017-11-28 12:56:51 -05:00
Adam Paszke	65e0d5bad8	Fix void* wrapping in autograd codegen Also, add assertions here and there to make sure bad things never happen again.	2017-11-24 13:33:13 +01:00
Sam Gross	d496f9b20c	Ensure that Variables are at least one-dim in VariableType (#3609 ) Previously, we checked that Variables were at least one dimensional in the Python binding (wrap_outputs.h) and in the backwards functions. This was necessary because some Tensor functions returned Scalar types, which must be zero dimensional. This moves the wrapping logic into VariableType.	2017-11-09 17:34:24 -05:00
Gregory Chanan	5fd93b56fd	[master] Don't expose 0-dim tensors to Variable API.	2017-11-07 15:15:42 -05:00
Gregory Chanan	fe0ac0f7d0	Support native functions in C++ autograd automatically.	2017-10-24 19:33:37 -04:00
Sam Gross	f1f64c8d07	Generate autograd functions for NN / more refactors (#3136 ) Generate autograd functions for NN and implement more derivatives in derivatives.yaml A big refactor of gen_variable_type.py	2017-10-19 15:03:26 -04:00

47 Commits