Commit Graph

358 Commits

Author SHA1 Message Date
Vitaly Fedyunin
b7c830b916 Revert "Adding pin_memory kwarg to zeros, ones, empty,... (#18854)
Summary:
This reverts commit c484cf43a0.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18854

Differential Revision: D14778393

Pulled By: VitalyFedyunin

fbshipit-source-id: 4b5a1f5b1c091bbc4a8e75614734cc011d26b452
2019-04-05 06:25:33 -07:00
Jerry Zhang
dfcd7b0185 QTensor (#18230)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18230

Implementing minimum qtensor API to unblock other workstreams in quantization

Changes:
- Added Quantizer which represents different quantization schemes
- Added qint8 as a data type for QTensor
- Added a new ScalarType QInt8
- Added QTensorImpl for QTensor
- Added following user facing APIs
  - quantize_linear(scale, zero_point)
  - dequantize()
  - q_scale()
  - q_zero_point()

Reviewed By: dzhulgakov

Differential Revision: D14524641

fbshipit-source-id: c1c0ae0978fb500d47cdb23fb15b747773429e6c
2019-04-03 13:17:11 -07:00
Vitaly Fedyunin
c484cf43a0 Adding pin_memory kwarg to zeros, ones, empty, ... tensor constructors. (#18455)
Summary:
Make it possible to construct a pinned memory tensor without creating a storage first and without calling pin_memory() function. It is also faster, as copy operation is unnecessary.

Supported functions:
```python
torch.rand_like(t, pin_memory=True)
torch.randn_like(t, pin_memory=True)
torch.empty_like(t, pin_memory=True)
torch.full_like(t, 4, pin_memory=True)
torch.zeros_like(t, pin_memory=True)
torch.ones_like(t, pin_memory=True)
torch.tensor([10,11], pin_memory=True)
torch.randn(3, 5, pin_memory=True)
torch.rand(3, pin_memory=True)
torch.zeros(3, pin_memory=True)
torch.randperm(3, pin_memory=True)
torch.empty(6, pin_memory=True)
torch.ones(6, pin_memory=True)
torch.eye(6, pin_memory=True)
torch.arange(3, 5, pin_memory=True)
```

Part of the bigger: `Remove Storage` plan.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18455

Reviewed By: ezyang

Differential Revision: D14672084

Pulled By: VitalyFedyunin

fbshipit-source-id: 9d0997ec00f59500ee018f8b851934d334012124
2019-04-02 08:48:19 -07:00
Vishwak Srinivasan
d859031ebf Rename btrifact* to lu (#18435)
Summary:
Changelog:

- Renames `btrifact` and `btrifact_with_info` to `lu`to remain consistent with other factorization methods (`qr` and `svd`).
- Now, we will only have one function and methods named `lu`, which performs `lu` decomposition. This function takes a get_infos kwarg, which when set to True includes a infos tensor in the tuple.
- Rename all tests, fix callsites
- Create a tentative alias for `lu` under the name `btrifact` and `btrifact_with_info`, and add a deprecation warning to not promote usage.
- Add the single batch version for `lu` so that users don't have to unsqueeze and squeeze for a single square matrix (see changes in determinant computation in `LinearAlgebra.cpp`)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18435

Differential Revision: D14680352

Pulled By: soumith

fbshipit-source-id: af58dfc11fa53d9e8e0318c720beaf5502978cd8
2019-03-29 00:34:30 -07:00
vishwakftw
291746f110 Rename trtrs to triangular_solve (#18213)
Summary:
Changelog:
- Renames `trtrs` to `triangular_solve` to remain consistent with `cholesky_solve` and `solve`.
- Rename all tests, fix callsites
- Create a tentative alias for `triangular_solve` under the name `trtrs`, and add a deprecation warning to not promote usage.
- Move `isnan` to _torch_docs.py
- Remove unnecessary imports
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18213

Differential Revision: D14566902

Pulled By: ezyang

fbshipit-source-id: 544f57c29477df391bacd5de700bed1add456d3f
2019-03-21 14:27:21 -07:00
Gao, Xiang
7e6220393f Cleanup arg{min, max} (#17103)
Summary:
Why do we need this workaround? `PythonArgParser` handles these two cases well.

The discussion started at https://github.com/pytorch/pytorch/pull/6201#issuecomment-378724406. The conclusion at that time by goldsborough was:

> Because we wanted to allow `dim=None` in Python and route to a different function. Essentially the problem was wanting to wrap the C++ function in Python. AFAIK there is no way of translating `dim=None` behavior into C++? So Richard and I came up with this strategy

Maybe at that time `PythonArgParser` was not powerful enough to handle the routing of two function with same name but different C++ signature.

Will keep an eye on the CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17103

Differential Revision: D14523503

Pulled By: VitalyFedyunin

fbshipit-source-id: cae3e2678062da2eccd93b51d4050578c7a9ab80
2019-03-20 16:28:27 -07:00
Vishwak Srinivasan
421b508d55 Rename gesv to solve (#18060)
Summary:
Changelog:

- Renames `gesv` to `solve` to remain consistent with `cholesky_solve`.
- Rename all tests, fix callsites
- Create a tentative alias for `solve` under the name `gesv`, and add a deprecated warning to not promote usage.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18060

Differential Revision: D14503117

Pulled By: zou3519

fbshipit-source-id: 99c16d94e5970a19d7584b5915f051c030d49ff5
2019-03-18 16:04:24 -07:00
vishwakftw
f268370b42 torch.btrifact for tensors with greater than 3 dimensions (#14964)
Summary:
Motivation:
- Earlier, `torch.btrifact` could not handle tensors with greater than 3 dimensions. This is because of the check:
>   AT_CHECK(THTensor_(nDimension)(a) == 3, "expected 3D tensor, got size: ", a->sizes());

What is in this PR?:
- Move `btrifact` to ATen
- Remove relation to TH/THC.
- Handle tensors with more than three dimensions
- Tests
- Docs modifications: added a note about the non-pivoting variant.

[blocked due to old magma-cuda binaries]
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14964

Differential Revision: D14405106

Pulled By: soumith

fbshipit-source-id: f051f5d6aaa45f85836a2867176c065733563184
2019-03-12 01:46:07 -07:00
Xiang Gao
2e5a8cee82 Customize the printing of namedtuple return (#17136)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/17112
```python
print("good", torch.randn(5,5,5).max(1))
print("terrible", torch.randn(5,5,10).max(1))
print("not as good", torch.randn(5,5,500).max(1))
print ("old behaviour = gold standard")
print(tuple(torch.randn(5,5,5).max(1)))
print(tuple(torch.randn(5,5,10).max(1)))
print(tuple(torch.randn(5,5,500).max(1)))
```
now gives
```
>>> import torch
>>> print("good", torch.randn(5,5,5).max(1))
good torch.return_types.max(
values=tensor([[ 1.2821,  1.8063,  1.8075,  1.3082, -0.1267],
        [ 0.3437,  0.7353,  1.2619,  0.7557,  1.6662],
        [ 0.8583,  1.8906,  1.0246,  1.7598,  1.1184],
        [ 1.7821,  0.0230,  0.9452,  1.0318,  1.0823],
        [ 0.4116, -0.0379, -0.1843,  1.4129,  1.8796]]),
indices=tensor([[4, 4, 3, 2, 1],
        [1, 2, 4, 1, 1],
        [2, 4, 0, 2, 1],
        [0, 2, 0, 3, 1],
        [0, 4, 4, 4, 4]]))
>>> print("terrible", torch.randn(5,5,10).max(1))
terrible torch.return_types.max(
values=tensor([[ 2.1272,  1.3664,  2.2067,  1.3974, -0.0883,  1.2505,  1.0074,  1.1217,
          0.3849,  0.6936],
        [ 0.6288, -0.4560,  1.2748,  1.5482,  1.2777,  1.6874,  0.7151,  0.6041,
          1.3572,  1.6232],
        [ 1.6703,  1.0075,  1.6480,  2.2839,  1.3390,  0.4938,  1.6449,  1.7628,
          0.8141,  2.5714],
        [ 0.7079,  1.8677,  3.2478,  1.5591,  2.4870,  0.8635, -0.1450,  1.6923,
          1.4924,  1.6298],
        [ 2.4056,  0.8002,  0.9317,  0.7455,  0.7866,  2.1191,  0.3492,  1.2095,
          1.8637,  1.7470]]),
indices=tensor([[1, 1, 0, 0, 0, 0, 3, 4, 4, 4],
        [4, 2, 2, 1, 2, 2, 3, 1, 1, 3],
        [0, 3, 3, 0, 2, 1, 4, 1, 0, 1],
        [4, 1, 3, 0, 3, 2, 0, 1, 4, 3],
        [1, 0, 3, 2, 1, 0, 0, 1, 0, 1]]))
>>> print("not as good", torch.randn(5,5,500).max(1))
not as good torch.return_types.max(
values=tensor([[ 0.3877,  0.7873,  1.8701,  ...,  0.5971,  1.6103, -0.3435],
        [ 1.1300,  2.2418,  1.4239,  ...,  1.3943,  0.3872,  1.6475],
        [ 2.0656,  1.3136,  0.9896,  ...,  2.3918,  0.8226,  1.0517],
        [ 1.1054,  0.9945,  1.0561,  ...,  2.1039,  1.1524,  3.0304],
        [ 1.5041,  2.2809,  1.0883,  ...,  0.8504,  2.4774,  1.1041]]),
indices=tensor([[4, 3, 1,  ..., 1, 4, 0],
        [4, 4, 4,  ..., 3, 0, 3],
        [3, 0, 1,  ..., 2, 2, 4],
        [0, 1, 1,  ..., 4, 2, 2],
        [1, 0, 4,  ..., 2, 0, 2]]))
>>> print ("old behaviour = gold standard")
old behaviour = gold standard
>>> print(tuple(torch.randn(5,5,5).max(1)))
(tensor([[ 1.1908,  1.1807,  1.3151,  1.7184,  0.3556],
        [ 0.3798,  0.9213,  0.3001,  1.3087,  2.2419],
        [ 1.4233,  1.4814,  1.9900,  1.7744,  1.3059],
        [ 1.0026, -0.0330,  1.3061,  1.8730,  2.0685],
        [ 1.3041,  1.6458,  1.3449,  1.8948,  3.6206]]), tensor([[0, 4, 3, 4, 0],
        [1, 1, 4, 0, 4],
        [4, 1, 0, 3, 3],
        [1, 2, 1, 4, 0],
        [3, 3, 0, 3, 3]]))
>>> print(tuple(torch.randn(5,5,10).max(1)))
(tensor([[-0.1232,  0.8275,  0.6732,  1.1223,  0.8247,  1.2851,  1.6009,  1.9979,
          1.9109,  0.7313],
        [ 0.2260,  0.5922,  1.6928,  0.6024,  2.1158,  3.0619,  0.5653,  0.7426,
          0.8316,  0.6346],
        [ 0.4319,  0.2231,  0.5255,  1.7620,  1.1657,  0.8875,  0.5782,  0.6506,
          0.5032,  1.7097],
        [ 0.4137,  1.7265,  1.4260,  2.0301,  1.2244,  0.7128,  2.6345,  0.7230,
          1.3553,  1.6508],
        [ 1.0684,  1.7195,  1.4068,  0.7076, -0.0242,  0.8474,  0.8754,  1.7108,
          0.2188,  1.1584]]), tensor([[0, 1, 3, 4, 2, 3, 4, 2, 1, 0],
        [1, 4, 0, 0, 3, 2, 0, 0, 3, 3],
        [2, 3, 1, 1, 4, 0, 1, 4, 4, 4],
        [0, 4, 1, 3, 2, 0, 2, 0, 3, 1],
        [1, 0, 0, 0, 0, 3, 3, 3, 2, 0]]))
>>> print(tuple(torch.randn(5,5,500).max(1)))
(tensor([[0.9395, 1.5572, 1.8797,  ..., 2.0494, 0.8202, 0.9623],
        [1.7937, 0.7225, 1.8836,  ..., 0.7927, 1.4976, 1.1813],
        [0.8558, 1.6943, 1.4192,  ..., 0.8327, 1.9661, 0.4197],
        [1.2993, 1.4995, 0.9357,  ..., 0.7810, 1.3030, 2.6216],
        [1.4206, 1.8315, 1.0338,  ..., 1.4312, 1.3198, 1.5233]]), tensor([[0, 4, 3,  ..., 3, 0, 2],
        [0, 1, 0,  ..., 0, 4, 3],
        [3, 4, 3,  ..., 3, 0, 0],
        [3, 2, 3,  ..., 1, 2, 1],
        [1, 2, 4,  ..., 3, 1, 3]]))
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17136

Differential Revision: D14250021

Pulled By: VitalyFedyunin

fbshipit-source-id: aae72f03b35980063b1ac1f07b8353eddb0c8b93
2019-02-28 13:07:26 -08:00
Christian Puhrsch
e47aeede32 Use name for output variables instead of out in JIT (#17386)
Summary:
This adds 88 matches.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17386

Differential Revision: D14179139

Pulled By: cpuhrsch

fbshipit-source-id: 2c3263b8e4d084db84791e53290e8c8b1b7aecd5
2019-02-27 14:03:33 -08:00
Adam Paszke
7157be8622 Add special ops for BatchNorm symbolic differentiation (#15403)
Summary:
The main problem there is with differentiating batch norm statically
is that we make a lot of complex run-time decisions about the backend
we choose. Then, the autograd derivatives are implemented for every
backend separately, which makes sense, because they might be saving
buffers containing different values. To resolve the issue, the forward
op returns an index of the chosen backend, and the backward function
takes it as an argument, such that it knows how to interpret the buffers.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15403

Differential Revision: D14098815

Pulled By: ailzhang

fbshipit-source-id: 7fcd3e6e0566433e81fe8286fb441c1ecaf198ad
2019-02-15 15:40:28 -08:00
Edward Yang
4404762d7d Rename IntList to IntArrayRef. (#16751)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16751

This was made more complicated by the fact that ivalue::IntList
is a thing.  So I had to fix all of the sites where we referring
to IValue post facto.

The following codemods were run, in this order:

```
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in IntList IntArrayRef
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in IntArrayRef::create IntList::create
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in ivalue::IntArrayRef ivalue::IntList
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in Tag::IntArrayRef Tag::IntList
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in isIntArrayRef isIntList
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in toIntArrayRef toIntList
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in 'Shared<IntArrayRef>' 'Shared<IntList>'
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in 'intrusive_ptr<IntArrayRef>' 'intrusive_ptr<IntList>'
```

Some manual fixups were done afterwards; they can be reviewed separately
at https://github.com/pytorch/pytorch/pull/16752

Reviewed By: dzhulgakov

Differential Revision: D13954363

fbshipit-source-id: b5c40aacba042402155a2f5a229fa6db7992ac64
2019-02-05 14:54:34 -08:00
Roy Li
4c803f4ebd Expose backend extensions to python
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16582

Reviewed By: gchanan

Differential Revision: D13887539

fbshipit-source-id: 8755babf2e3e849af974655f2f3a91740efe977e
2019-02-01 11:00:18 -08:00
Lu Fang
b1b00f329e Fix the flake8 linter
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16549

Reviewed By: bddppq

Differential Revision: D13877435

Pulled By: houseroad

fbshipit-source-id: dbe575ba3f6dd30d27ac6aa5eec2eea025063540
2019-01-30 09:36:00 -08:00
Thomas Viehmann
6a6983ed7f create type hint stub files for module torch (#12500)
Summary:
We have:

- This is an initial stab at creating a type stub `torch/__init__.pyi` .
- This is only tested on Python 3, since that's the only Python version mypy
  works on.
- So far, we only aim at doing this for torch functions and torch.Tensor.
- Quite a few methods and functions have to be typed manually. These are
  done in `torch/__init__.pyi.in`

For me, PyCharm (the non-paid one) didn't seem to indicate errors in the .pyi when opening and seemed to be able to get the type hint for the few functions I tried, but I don't use PyCharm for my usual PyTorch activities, so I didn't extensively try this out.

An example of a generated PYI is at [this gist](https://gist.github.com/ezyang/bf9b6a5fa8827c52152858169bcb61b1).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12500

Differential Revision: D13695553

Pulled By: ezyang

fbshipit-source-id: 4566c71913ede4e4c23ebc4a72c17151f94e8e21
2019-01-29 12:14:17 -08:00
Wanchao Liang
c6503a4205 Revert D13540278: [pytorch][PR] Unhide unique from C++, make unique partially scriptable
Differential Revision:
D13540278

Original commit changeset: 3768c76a90b0

fbshipit-source-id: 7a31c239f9dca6ff467344d99820095addcae9d7
2019-01-22 12:22:40 -08:00
Xiang Gao
c5e1b469be Return namedtuples from torch.* function with multiple return arguments for C++ operators (#15429)
Summary:
Partially fixes: https://github.com/pytorch/pytorch/issues/394

Implementation detail:

Codegen is modified to generate codes that looks like below:
```C++
static PyObject * THPVariable_svd(PyObject* self_, PyObject* args, PyObject* kwargs)
{
  HANDLE_TH_ERRORS
  static PythonArgParser parser({
    "svd(Tensor input, bool some=True, bool compute_uv=True, *, TensorList[3] out=None)",
  }, /*traceable=*/true);

  ParsedArgs<6> parsed_args;
  auto r = parser.parse(args, kwargs, parsed_args);
  static PyStructSequence_Field fields0[] = {
    {"U", ""}, {"S", ""}, {"V", ""}, {nullptr}
  };
  static PyStructSequence_Desc desc0 = {
    "torch.return_types.svd_out", nullptr,
    fields0, 3
  };
  static PyTypeObject type0;
  static bool namedtuple_type_initialized0 = false;
  if (!namedtuple_type_initialized0) {
    PyStructSequence_InitType(&type0, &desc0);
    namedtuple_type_initialized0 = true;
  }
  static PyStructSequence_Field fields1[] = {
    {"U", ""}, {"S", ""}, {"V", ""}, {nullptr}
  };
  static PyStructSequence_Desc desc1 = {
    "torch.return_types.svd", nullptr,
    fields1, 3
  };
  static PyTypeObject type1;
  static bool namedtuple_type_initialized1 = false;
  if (!namedtuple_type_initialized1) {
    PyStructSequence_InitType(&type1, &desc1);
    namedtuple_type_initialized1 = true;
  }
  if (r.idx == 0) {
    if (r.isNone(3)) {
      return wrap(&type1, dispatch_svd(r.tensor(0), r.toBool(1), r.toBool(2)));
    } else {
      auto results = r.tensorlist_n<3>(3);
      return wrap(&type0, dispatch_svd(r.tensor(0), r.toBool(1), r.toBool(2), results[0], results[1], results[2]));
    }
  }
  Py_RETURN_NONE;
  END_HANDLE_TH_ERRORS
}
```
Types are defined as static member of `THPVariable_${op_name}` functions, and initialized at the first time the function is called.

When parsing function prototypes in `native_functions.yaml`, the parser will set the specified name as `field_name` when see things like `-> (Tensor t1, ...)`. These field names will be the field names of namedtuple. The class of namedtuples will be named `torch.return_types.${op_name}`.

In some python 2, `PyStructSequence` is not a subtype of tuple, so we have to create some functions to check if an object is a tuple or namedtuple for compatibility issue.

Operators in `native_functions.yaml` are changed such that only `max` and `svd` are generated as namedtuple. Tests are added for these two operators to see if the return value works as expected. Docs for these two ops are also updated to explicitly mention the return value is a namedtuple. More ops will be added in later PRs.

There is some issue with Windows build of linker unable to resolve `PyStructSequence_UnnamedField`, and some workaround is added to deal with this case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15429

Differential Revision: D13709678

Pulled By: ezyang

fbshipit-source-id: 23a511c9436977098afc49374e9a748b6e30bccf
2019-01-22 11:12:18 -08:00
Xiang Gao
bed7db7772 Unhide unique from C++, make unique partially scriptable (#15256)
Summary:
This PR does three things:

~~Allow `int64_t?` in function schema,  which provide an elegant way of implementing null-able int arguments, as discussed in https://github.com/pytorch/pytorch/pull/15208#pullrequestreview-185230081~~

~~Originally implemented in https://github.com/pytorch/pytorch/pull/15235~~

~~Example:~~

```yaml
- func: myop(Tensor self, int64_t? dim=None) -> Tensor
  variants: function
```

~~cc: zou3519~~

Edit: implemented in https://github.com/pytorch/pytorch/pull/15234

Previously tried in https://github.com/pytorch/pytorch/pull/12064. There was a problem that C++ does not have kwarg support, which makes it confusing to know whether `unique(t, 1)` actually means `unique(t, dim=1)` or `unique(t, sorted=1)`.

Now I think I have a better idea on how to implement this: there are two ATen operators: `unique` and `unique_dim`. `unique` has the same signature as in python, and exported to both python and C++. `unique_dim` has signature `unique_dim(tensor, dim, sorted=False, return_inverse=False)`, and only exported to C++, which could be used more naturally for a C++ user.

Differential Revision: D13540278

Pulled By: wanchaol

fbshipit-source-id: 3768c76a90b0881f565a1f890459ebccbdfe6ecd
2019-01-21 12:31:37 -08:00
James Reed
acbd9c49b0 Direct FBGEMM integraton into ATen (#13777)
Summary:
This PR implements infrastructure for post-processing a model to apply int8 quantization to its `nn.Linear` modules. Highlights of the implementation:

1) Inputs and outputs are `float` (quantized and packed internally), but the weight is quantized and packed ahead of time for efficiency. This implementation performs well in small-batch size GEMM calls. It should not be considered a general-purpose quantized GEMM kernel.
2) Weight packing is dependent on machine architecture (e.g. vector register width), so it is done just-in-time. Concretely, it is done on model load for the weights and it is done during operator execution for the input value.
3) Biases are unquantized
4) We fail loudly if we are attempting to run this on a machine that does not support FBGEMM. This is because we do not want a model's numerics to differ based on which machine it is run on. A model containing these FBGEMM ops *must* be run with FBGEMM

The API can be seen in the added test case. Highlights are:
1) `torch.jit.quantized.quantize_linear_modules` walks the module hierarchy of the passed-in Module and replaces all `nn.Linear` modules with a new `QuantizedLinear` module, which encapsulates the behavior described above.
2) `_pack()` and `_unpack()` script methods are present on `QuantizedLinear` modules. These methods should be called before serialization and after deserialization, respectively. This ensures that the weight matrix is properly packed for the running machine's architecture. Note that in the long term, we would like to move toward a more Pickle-style serialization technique, rather than having these explicit methods that mutate member values. This is blocked on being able to assign attributes in a ScriptMethod, among other things.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13777

Differential Revision: D13383276

Pulled By: jamesr66a

fbshipit-source-id: 00f29c9f34544add2b90107e3cf55a287802c344
2018-12-21 10:35:51 -08:00
Wanchao Liang
b89b46abfb Remove python_default_init from ATen and use Optional (#15234)
Summary:
Optional clean up. This PR remove python_default_init from the yaml files, and the code-gen, and utilize optional type to do the work.

This also fix the bug in the #13149 to correctly adopt as_strided backward.

Fixes #9941
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15234

Differential Revision: D13502044

Pulled By: wanchaol

fbshipit-source-id: 774b61fc4414482cf11d56e22bd0275aefb352a4
2018-12-19 21:38:50 -08:00
Tugrul Ates
560530aeec Optional ScalarType support for native functions & JIT (#15154)
Summary:
For #6593 and #9515

This completes the support for optional<ScalarType> in native, JIT and autograd.

Note: Mostly following the existing implementation for optional<Scalar> that was added in https://github.com/pytorch/pytorch/pull/12582.

This PR introduces a way to make functions accept an optional dtype and it will unblock #9515 by allowing the `dtype` param for type promotion interface:
```
func: name(inputs, *, ScalarType? dtype=None, Casting casting=same_kind)
```

An alternative approach could have been using `ScalarType::Undefined` for the same purpose but without optional, though it would have been a bit hacky.
```
func: name(inputs, *, ScalarType dtype=Undefined, Casting casting=same_kind)
```

Here's an example use of this in action: 971f69eac6

There are already a bunch of native functions that were getting optional `dtype` through function overloading. https://github.com/pytorch/pytorch/pull/15133 is the attempt to migrate all of those. I will send those changes separately after this since some functions (e.g. sum) need quite a bit of change in the codebase. See the commits over there.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15154

Differential Revision: D13457760

Pulled By: tugrulates

fbshipit-source-id: 706134f0bd578683edd416b96329b49a1ba8ab48
2018-12-19 10:45:35 -08:00
Shen Li
90f9e8103c Implement torch.tril_indices and torch.triu_indices (#12653) (#14904)
Summary:
This is an optimized implementation that does the following:

1. created an empty Tensor of correct size.
2. fill the Tensor with correct values.

The following three designs to fill in the Tensor result in roughly the same performance. Hence, the 2nd option is taken for simpler code, and to return contiguous tensors.

1. Sequential: fill row coordinates first, then columns. This results in two for-loop and more arithmetic operations.
2. Interleaved: fill in index coordinates one by one, which jumps between the two output Tensor rows in every iteration.
3. Transpose: create a n X 2 Tensor, fill the Tensor sequentially, and then transpose it.

<img width="352" alt="screen shot 2018-12-10 at 3 54 39 pm" src="https://user-images.githubusercontent.com/16999635/49769172-07bd3580-fc94-11e8-8164-41839185e9f9.png">

NOTE:

This implementation returns a 2D tensor, instead of a tuple of two tensors. It means that users will not be able to do the following:

```python
x = torch.ones(3, 3)
i = torch.tril_indices(3, 3)
x[i]  # need to first convert the 2D tensor into a tuple of two 1D tensors.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14904

Reviewed By: zou3519

Differential Revision: D13433027

Pulled By: mrshenli

fbshipit-source-id: 41c876aafcf584832d7069f7c5929ffb59e0ae6a
2018-12-12 15:40:14 -08:00
Peter Goldsborough
875be849e9 Rename _local_scalar to item() (#13676)
Summary:
Make `at::_local_scalar` more "official" by renaming it to `item()`.

gchanan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13676

Differential Revision: D13003020

Pulled By: goldsborough

fbshipit-source-id: 0ac25f5237fb81a1576304a0a02f840ff44168a4
2018-12-04 13:19:26 -08:00
Wei Yang
12558019a8 backward for sparse.addmm(D, S, D, alpha, beta) -> D (#13345)
Summary:
- introduce `sparse.addmm()` with backward for sparse matrix input for https://github.com/pytorch/pytorch/issues/12308
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13345

Differential Revision: D13094070

Pulled By: weiyangfb

fbshipit-source-id: 136c08c3ca9bafb20577b60dd43d31c3e5cd5461
2018-11-26 17:47:48 -08:00
Gregory Chanan
b6edd7bbb4 Support 'python_module' of 'nn' in native functions. (#14126)
Summary:
Also move mse_loss, binary_cross_entropy, l1_loss to use this functionality.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14126

Reviewed By: ezyang

Differential Revision: D13109975

Pulled By: gchanan

fbshipit-source-id: 0b29dc8cf222d25db14da7532d8dc096a988a0ec
2018-11-19 14:13:25 -08:00
vishwakftw
a30ade1139 Batched cholesky decomposition (#14017)
Summary:
Implements batching for the Cholesky decomposition.

Performance could be improved with a dedicated batched `tril` and `triu` op, which is also impeding autograd operations.

Changes made:
- batching code
- tests in `test_torch.py`, `test_cuda.py` and `test_autograd.py`.
- doc string modification
- autograd modification
- removal of `_batch_potrf` in `MultivariateNormal`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14017

Differential Revision: D13087945

Pulled By: ezyang

fbshipit-source-id: 2386db887140295475ffc247742d5e9562a42f6e
2018-11-17 10:49:15 -08:00
Gregory Chanan
ce6192a21f Don't python bind _thnn_ functions. (#14101)
Summary:
This is needed for moving nn functions to native functions, but since some functions are already named
this way, I'm going to stop binding pre-emptively so we can check if there are any current dependencies.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14101

Differential Revision: D13102219

Pulled By: gchanan

fbshipit-source-id: 6bbcca33a03ab1bf648f1b73cadfe84339fa3050
2018-11-16 17:18:08 -08:00
Vishwak Srinivasan
7b2fb012a8 Make potrs batched (#13453)
Summary:
- This is a straightforward PR, building up on the batch inverse PR, except for one change:
  - The GENERATE_LINALG_HELPER_n_ARGS macro has been removed, since it is not very general and the resulting code is actually not very copy-pasty.

Billing of changes:
- Add batching for `potrs`
- Add relevant tests
- Modify doc string

Minor changes:
- Remove `_gesv_single`, `_getri_single` from `aten_interned_strings.h`.
- Add test for CUDA `potrs` (2D Tensor op)
- Move the batched shape checking to `LinearAlgebraUtils.h`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13453

Reviewed By: soumith

Differential Revision: D12942039

Pulled By: zou3519

fbshipit-source-id: 1b8007f00218e61593fc415865b51c1dac0b6a35
2018-11-09 15:16:26 -08:00
vishwakftw
1fe8278559 Batched Inverse (#9949)
Summary:
Complete billing of changes:

Related to Batch Inverse:
- [x] Add batched inverse (CPU)
- [x] Add batched inverse (CUDA)
- [x] Modify autograd entry
- [x] Add tests
  - [x] test_autograd
  - [x] test_cuda
  - [x] test_torch
- [x] Modify docs
- [x] Remove `_batch_inverse` in `MultivariateNormal`.
- [x] Allow batch matrices as inputs for negative powers in `matrix_power`

Miscellaneous modifications:
- [x] Move all batch operations to BatchLinearAlgebra.cpp/.cu and provide general framework for adding more batch ops.
- [x] Add a RAII structure for MAGMA queue management.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9949

Differential Revision: D10559089

Pulled By: zou3519

fbshipit-source-id: 7da24977f8a79d97dd42883302e13e708c1726e4
2018-10-27 23:42:46 -07:00
Richard Zou
4870b1b68f Speed up tensor.resize_(sizes) when tensor has correct size (#12824)
Summary:
While using gbenchmark, I found `tensor.resize_({0})` would take 300ns
if tensor already has the correct size. This is important for
`at::empty({0})` perf because `at::empty` always calls `resize_`, which
in turn is a important for JIT perf: the fusion compiler creates empty
tensors and then `resize_`s them to computed sizes. Most of the 300ns is
due to DeviceGuard (200ns)

Summary of findings:
- `at::empty({0}, cuda)`: 851ns
- `empty_tensor.resize({0})`: 308ns
- `DeviceGuard(tensor)`: ctor + dtor: 200ns (Going to look into this
  next because it impacts `resize_` perf).
- vdispatch overhead (`tensor.resize_()` vs
  `at::native::resize__cuda(tensor)`): ~10ns

This PR rips out the TH `resize_` implementation and adds it to ATen
with the following modifications:
- DeviceGuard used only after the same-size check.
- Same-size check rewritten for simplicity. The new check doesn't
affect perf.
- empty_cpu / empty_cuda avoid the dispatch overhead to
tensor.resize_.

Timing with this PR:
- `at::empty({0}, cuda)`: 363ns
- `empty_tensor.resize_({0})`: 17ns

Future:
- Investigate `resize_(sizes)` slowness when `tensor.sizes() != sizes`
- Should tell resize_as_ to use the new resize_ implementation...
(because resize_as_ is in TH, it is calling the old TH resize_)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12824

Differential Revision: D10449209

Pulled By: zou3519

fbshipit-source-id: cecae5e6caf390017c07cd44a8eaf2fa6e3fdeb6
2018-10-25 21:09:41 -07:00
Wanchao Liang
4e1c64caee Add c10::optional to type syntax (#12582)
Summary:
This PR adds optional type to ATen native, autograd, JIT schema and Python Arg parser, closes #9513. It allows us to use optional default values (including None) for function signature and implementations like clamp, etc., and also let us remove the python_default_init hack.

Follow up:

remove python_default_init completely.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12582

Differential Revision: D10417423

Pulled By: wanchaol

fbshipit-source-id: 1c80f0727bb528188b47c595629e2996be269b89
2018-10-25 16:08:29 -07:00
Tongzhou Wang
46162ccdb9 Autograd indices/values and sparse_coo ctor (#13001)
Summary:
Reopen of #11253 after fixing bug in index_select
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13001

Differential Revision: D10514987

Pulled By: SsnL

fbshipit-source-id: 399a83a1d3246877a3523baf99aaf1ce8066f33f
2018-10-24 10:00:22 -07:00
Gregory Chanan
7d24985852 Kill is_type_dispatched. (#12684)
Summary:
All factory functions are now implemeneted in terms of TensorOptions, which is passed through Type, if necessary.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12684

Differential Revision: D10390224

Pulled By: gchanan

fbshipit-source-id: fb536271735e6e0e542f021e407529998b0482eb
2018-10-16 07:05:49 -07:00
Yangqing Jia
713e706618 Move exception to C10 (#12354)
Summary:
There are still a few work to be done:

- Move logging and unify AT_WARN with LOG(ERROR).
- A few header files are still being plumbed through, need cleaning.
- caffe2::EnforceNotMet aliasing is not done yet.
- need to unify the macros. See c10/util/Exception.h

This is mainly a codemod and not causing functional changes. If you find your job failing and trace back to this diff, usually it can be fixed by the following approaches:

(1) add //caffe2/c10:c10 to your dependency (or transitive dependency).
(2) change objects such as at::Error, at::Optional to the c10 namespace.
(3) change functions to the c10 namespace. Especially, caffe2::MakeString is not overridden by the unified c10::str function. Nothing else changes.

Please kindly consider not reverting this diff - it involves multiple rounds of rebasing and the fix is usually simple. Contact jiayq@ or AI Platform Dev for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/12354

Reviewed By: orionr

Differential Revision: D10238910

Pulled By: Yangqing

fbshipit-source-id: 7794d5bf2797ab0ca6ebaccaa2f7ebbd50ff8f32
2018-10-15 13:33:18 -07:00
Wei Yang
572132fb17 copy_(Sparse, Sparse) for sparse tensor (#9005)
Summary:
- fix #8330
- add `torch.copy_(Sparse, Sparse)` with autograd support
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9005

Differential Revision: D8987885

Pulled By: weiyangfb

fbshipit-source-id: b317a41da22ee1eae2835622a0ed28a6771a3a06
2018-09-30 11:55:09 -07:00
Tongzhou Wang
24e958a0a7 Move bernoulli into ATen (#10273)
Summary:
+ https://github.com/pytorch/pytorch/issues/10236 : torch.bernoulli's out kwarg is broken
  fixed in moving `bernoulli_out` to ATen
+ https://github.com/pytorch/pytorch/issues/9917 : BUG torch.bernoulli(p.expand(shape)) is broken
  fixed in moving all `bernoulli` ops in ATen to use the modern apply utils methods
+ https://github.com/pytorch/pytorch/issues/10357 : torch.bernoulli inconsistent gpu/cpu results
  fixed by adding CUDA asserts

In order to use `curand_uniform4`, I made some changes to `CUDAApplyUtils.cuh`. Specifically, I introduced an optional template parameter `int step` to the `CUDA_tensor_applyN` methods, representing that we want to process `step` values at each time for each of the `N` tensors.

The calling convention for `step = 1` (default) isn't changed. But if `step > 1`, the given lambda `op` must take in `int n` as its first argument, representing the number of valid values, because there may not be full `step` values at the boundary. E.g., here is what the `bernoulli(self, p_tensor)` call look like:
```cpp

  // The template argument `4` below indicates that we want to operate on four
  // element at each time. See NOTE [ CUDA_tensor_applyN helpers ] for details.
  at::cuda::CUDA_tensor_apply2<scalar_t, prob_t, 4>(
      ret, p,
      [seeds] __device__(
          int n, scalar_t& v1, scalar_t& v2, scalar_t& v3, scalar_t& v4,
          const prob_t& p1, const prob_t& p2, const prob_t& p3, const prob_t& p4) {
        curandStatePhilox4_32_10_t state;
        curand_init(
            seeds.first,
            blockIdx.x * blockDim.x + threadIdx.x,
            seeds.second,
            &state);
        float4 rand = curand_uniform4(&state);
        switch (n) {
          case 4: {
            assert(0 <= p4 && p4 <= 1);
            v4 = static_cast<scalar_t>(rand.w <= p4);
          }
          case 3: {
            assert(0 <= p3 && p3 <= 1);
            v3 = static_cast<scalar_t>(rand.z <= p3);
          }
          case 2: {
            assert(0 <= p2 && p2 <= 1);
            v2 = static_cast<scalar_t>(rand.y <= p2);
          }
          case 1: {
            assert(0 <= p1 && p1 <= 1);
            v1 = static_cast<scalar_t>(rand.x <= p1);
          }
        }
      }
    );
```

Benchmarking on `torch.rand(200, 300, 400)` 20 times, each time with 20 loops:

post patch
```
➜  ~ numactl --cpunodebind 1 --membind 1 -- taskset -c 12,13,14,15,16,17,18,19,20,21,22,23 env CUDA_LAUNCH_BLOCKING=1 python bern.py
torch.bernoulli(x)
6.841588497161865 +- 0.05413117632269859
torch.bernoulli(xc)
0.05963418632745743 +- 0.0008014909108169377
x.bernoulli_()
0.4024486541748047 +- 0.0021550932433456182
xc.bernoulli_()
0.02167394384741783 +- 2.3818030967959203e-05

```

pre-patch
```
➜  ~ numactl --cpunodebind 1 --membind 1 -- taskset -c 12,13,14,15,16,17,18,19,20,21,22,23 env CUDA_LAUNCH_BLOCKING=1 python bern.py
torch.bernoulli(x)
12.394511222839355 +- 0.0966421514749527
torch.bernoulli(xc)
0.08970972150564194 +- 0.0038722590543329716
x.bernoulli_()
1.654480218887329 +- 0.02364428900182247
xc.bernoulli_()
0.058352887630462646 +- 0.003094920190051198

```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10273

Differential Revision: D9831294

Pulled By: SsnL

fbshipit-source-id: 65e0655a36b90d5278b675d35cb5327751604088
2018-09-19 16:45:47 -07:00
Edward Yang
72822ee6b2 Fix #11430 (CPU only builds raise opaque error message when calling .… (#11533)
Summary:
…cuda())

While I was at it, I audited all other ways I know how we might get a CUDA
type from PyTorch and fixed more constructors which don't work.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11533

Differential Revision: D9775786

Pulled By: ezyang

fbshipit-source-id: cd07cdd375fdf74945539ec475a48bf08cbc0c17
2018-09-14 09:10:08 -07:00
Adam Paszke
62c9d4ac96 Make .to() methods native functions (to fix JIT tracing)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11491

Differential Revision: D9771121

Pulled By: apaszke

fbshipit-source-id: 08d11101fb12093f8cf913b06359adddf3af9da7
2018-09-11 21:55:42 -07:00
Tongzhou Wang
b9b9ae935b Make torch.randint have default dtype int64 (#11040)
Summary:
cc gchanan apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11040

Differential Revision: D9565728

Pulled By: SsnL

fbshipit-source-id: eb5be9609f30c88f52746fa7e13ad71e2856648e
2018-09-08 07:55:06 -07:00
Peter Goldsborough
fb4e8088f3 Remove methods that start with an underscore from at::Tensor (#11152)
Summary:
This PR cleans up the `at::Tensor` class by removing all methods that start with an underscore in favor of functions in the `at::` namespace. This greatly cleans up the `Tensor` class and makes it clearer what is the public and non-public API.

For this I changed `native_functions.yaml` and `Declarations.cwrap` to make all underscore methods `variant: function` (or add such a statement to begin with), and then fixed all code locations using the underscore methods.

ezyang colesbury gchanan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11152

Differential Revision: D9683607

Pulled By: goldsborough

fbshipit-source-id: 97f869f788fa56639c05a439e2a33be49f10f543
2018-09-07 11:55:11 -07:00
Edward Yang
b02b125d16 Rename getMaybeVariableType back to getType. (#11250)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11250

```
codemod -d . --extensions cc,cpp,cu,cuh,h getMaybeVariableType getType
```

Reviewed By: gchanan

Differential Revision: D9648830

fbshipit-source-id: 6b2ac2b1c265ae47722390e6e7f106653077d851
2018-09-07 08:11:50 -07:00
Edward Yang
2c5ae8c4bf Get rid of type() method on TensorOptions; use at::getType instead (#11023)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11023

I'd like TensorOptions to not know anything about Context, so I can
move it to ATen/core without pulling in Context.  To do this, the
type() method has to go, since it consults the context to get a Type.

Reviewed By: cpuhrsch

Differential Revision: D9562467

fbshipit-source-id: 61a18a76eb042a5e70b64b963501e9d68c25d4f0
2018-08-31 14:27:05 -07:00
Edward Yang
750ede7215 Rename getType to getVariableTypeFromBaseType / getVariableType (#11095)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11095

We used getType to mean a lot of things.

- getVariableTypeFromBaseType: given a base Type (non-Variable type)
  compute the Variable Type which corresponds to it.

- getVariableType: like at::getType, but return the Variable type
  rather than the plain type.

This rename makes it clearer at the use-site what things are what,
and will make a subsequent rename of at::getType easier.

Reviewed By: gchanan, cpuhrsch

Differential Revision: D9583630

fbshipit-source-id: 2667ec98e7607bc466920c7415a8c651fd56dfca
2018-08-30 20:11:25 -07:00
Edward Yang
f7b02b3a68 Change Tensor/TensorImpl to use c10::intrusive_ptr (#10824)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10824

API additions:
- Tensor(c10::intrusive_ptr<TensorImpl,UndefinedTensor>&&)
- Tensor(const c10::intrusive_ptr<TensorImpl,UndefinedTensor>&)
- Tensor::operator=(Tensor&&) && (for completeness sake)
- TensorBase::unsafeGetTensorImpl()
- TensorBase::unsafeReleaseTensorImpl()
- TensorBase::getIntrusivePtr()
- TensorImpl::type_id()
- Tensor::set_data()
- Tensor::is_same(Tensor)
- Tensor::use_count()
- Tensor::type_id()
- Tensor::scalar_type()
- WeakTensor::is_same(WeakTensor)
- intrusive_ptr::weak_use_count()
- weak_intrusive_ptr::weak_use_count()
- c10::raw::intrusive_ptr::{incref,decref,make_weak}
- c10::raw::weak_intrusive_ptr::{incref,decref,lock}

API changes:
- Tensor::pImpl is no longer public (and now named tensor_impl_)
    - Most methods accessed this way are now accessible on Tensor
      maybe_zero_dim() and set_wrapped_number() being prominent exceptions
      (they are now accessed through unsafeGetTensorImpl())
- Type is no longer friend of Tensor
- TensorBase::reset(TensorImpl*) is deleted
- TensorBase::reset(TensorImpl*, bool should_retain) is deleted
- TensorBase::swap(TensorBaseImpl&) is deleted; use std::swap instead
- TensorBase::get() is deleted; use unsafeGetTensorImpl() instead
- TensorBase::detach() is deleted; use unsafeReleaseTensorImpl() instead
- TensorBase::retain() is deleted; use _raw_incref() instead
- TensorBase::release() is deleted; use _raw_decref() instead
- WeakTensor lost most of its methods (it no longer inherits from
  TensorBase)
- TensorImpl::storage() is now a const method
- Tensor(TensorBase) constructor removed, instead
  we go through getIntrusivePtr().  I'm not sure about
  this change; I happened to have accidentally removed the
  TensorBase constructor and decided to fix call sites,
  but I could go the other way.
- detail::set_data() is deleted; use Tensor::set_data() instead
- c10::raw_intrusive_ptr_target removed; use the functions in c10::raw instead.
  (The reason for this change, is that it is invalid to cast an intrusive_ptr_target*
  to a raw_intrusive_ptr_target* to take advantage of the methods. But there is
  no reason the incref/decref methods shouldn't also work on intrusive_ptr_target;
  it is primarily an API consideration. We can be more standards compliant by
  keeping them as functions, which are universally applicable.)
- intrusive_ptr::reclaim() and weak_intrusive_ptr::reclaim() now work on
  pointers of the NullType. (This counts as a bug fix, because the documentation
  specified that pointers produced by release() are valid to reclaim(), and
  a release() on a null intrusive_ptr produces the NullType::singleton())

Bug fixes:
- Dispatch code for mutable references incorrectly returned
  a reference to a value argument (which would immediately
  go out of scope).  They now correctly return a tensor by
  value.
- intrusive_ptr copy/move assignment did not work correctly when
  an object was assigned to itself. We now check for this case and
  no-op if so. (This bug manifested itself as a Tensor mysteriously
  becoming an UndefinedTensor after lines of code like
  'x = x.mul_(y)')

Other changes:
- The checked cast functions in Utils.h have now been
  renamed and detemplatized into checked unwrap functions.
- Added type_id() and scalar_type() methods to Tensor
- pImpl is no longer public
- Documented what the && overloads are doing
- All occurrences of 'new TensorImpl' (and similar spellings, like 'new THTensor')
  have been expunged. This is NO LONGER a valid way to create a new
  tensor, and if you do this, upon your first incref, you will catch an ASSERT
  failure saying that only tensors created by intrusive_ptr::release() are valid
  to reclaim(). Use c10::make_intrusive instead in this situation.
- IValue is adjusted to use intrusive_ptr instead of Retainable, and all
  other sub-classes of Retainable were modified to use intrusive_ptr.
  When doing this, I had to make the constructors of sub-classes like
  ConstantList public, so that c10::make_intrusive could invoke them.  Fortunately,
  if you incorrectly stack allocate a ConstantList, and then try to get an
  intrusive_ptr to it, it will fail, as stack allocated ConstantLists have refcount 0.
- IValue very narrowly sidesteps the problem of handling NullType, as it
  considers intrusive_ptr<TensorImpl> identical to intrusive_ptr<TensorImpl, UndefinedTensor>
  which is not always true. This was always the case, but there's now a comment
  explaining what's going on.

Some MSVC bugs were uncovered during the preparation of this patch.
They are documented as comments in the code.

Reviewed By: gchanan

Differential Revision: D9481140

fbshipit-source-id: 14a8ea0c231ed88b5715fb86d92730926f9f92fc
2018-08-27 16:11:01 -07:00
Peter Goldsborough
148ea2a653 Create at::linear (#10799)
Summary:
Resubmission of https://github.com/pytorch/pytorch/pull/10755 with fix for ONNX

ezyang jamesr66a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10799

Differential Revision: D9482168

Pulled By: goldsborough

fbshipit-source-id: 85d4bdfcf0d451f2e7a1c83c5f5415cdd6caacdc
2018-08-24 16:02:08 -07:00
Will Feng
b14f2e899c Preserve sparse tensor shape and dim invariants, and add scalar tensor support (#9279)
Summary:
When 0-sized dimension support is added, we expect an empty sparse tensor to be a 1-dimensional tensor of size `[0]`, with `sparseDims == 1` and `denseDims == 0`. Also, we expect the following invariants to be preserved at all times:

```
_sparseDims + _denseDims = len(shape)
_indices.shape: dimensionality: 2,  shape: (_sparseDims, nnz)
_values.shape:  dimensionality: 1 + _denseDims.  shape: (nnz, shape[_sparseDims:])
```

This PR fixes various places where the invariants are not strictly enforced when 0-sized dimension support is enabled.

Tested and `test_sparse.py` passes locally on both CPU and CUDA with the `USE_TH_SIZE_ZERO_DIM` flag.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9279

Differential Revision: D8936683

Pulled By: yf225

fbshipit-source-id: 12f5cd7f52233d3b26af6edc20b4cdee045bcb5e
2018-08-23 10:10:24 -07:00
James Reed
d40a598777 Back out "[pytorch][PR] Create at::linear" (#10785)
Summary:
Multiple failing external and internal CI signals were ignored when this commit
was landed. goldsborough please fix the text failures and resubmit this change as a
new PR
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10785

Reviewed By: ezyang

Differential Revision: D9466791

Pulled By: jamesr66a

fbshipit-source-id: b260e93bac95d05fd627c64e620b6aefb5045949
2018-08-22 14:39:59 -07:00
Edward Yang
19031c68dc Use intrusive_ptr in Storage; replace unique_ptr<Storage> with Storage (#10488)
Summary:
```
Use intrusive_ptr in Storage; replace unique_ptr<Storage> with Storage

This patch does two major changes:

- It replaces the use of Retainable in Storage with a new implementation
  based on intrusive_ptr.  This will be necessary because Caffe2 will
  be using this class to implement intrusive_ptrs, and we need to
  line these up for the merge.  One good thing about the new implementation is
  that the default copy/move constructors/assignment operators and destructor
  work automatically, instead of needing to be hardcoded into Storage/Tensor.

- It replaces all places where we returned std::unique_ptr<Storage> with
  Storage, collapsing an unnecessary double indirection that is no longer
  necessary now that we have correctly working copy/move constructors.

I didn't initially want to do step (2), but it was very important to
eliminate all bare uses of new Storage and new StorageImpl, and this making
the API change was the most straightforward way to do this.

HOW TO FIX YOUR CODE IN THE NEW API

- You no longer need to dereference the result of tensor.storage() to pass
  it to set.  So, instead of:

      x.set_(*y.storage());

  just write:

      x.set_(y.storage());

- If you were accessing methods on StorageImpl via the pImpl() method, you
  must use the dot operator to run pImpl().  Even better; just drop pImpl,
  we now have method forwarding.  So, instead of:

      storage->pImpl()->data();

  just do:

      storage->data();
      // storage.pImpl()->data() works too but is not as recommended

- storage->getDevice() is no more; instead use storage->device().index()

MISC CODE UPDATES

- retain, release, weak_retain, weak_release and weak_lock are now
  reimplemented using the "blessed API", and renamed to make it
  clearer that their use is discouraged.

- nvcc OS X and general OS X portability improvements to intrusive_ptr

- A new comment in intrusive_ptr describing how stack allocated
  intrusive_ptr_targets work differently than heap allocated ones
  from c10::make_intrusive

CAVEAT EMPTOR

- THStorage_weakRetain used to work on strong pointers, but it NO LONGER
  works with intrusive_ptr.  You must reclaim the strong pointer into a
  real strong pointer, construct a weak pointer from it, and then release
  the strong and weak pointers.  See StorageSharing.cpp for an example.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10488

Reviewed By: gchanan

Differential Revision: D9306134

Pulled By: ezyang

fbshipit-source-id: 02d58ef62dab8e4da6131e1a24834a65c21048e2
2018-08-21 21:39:55 -07:00
Peter Goldsborough
1068ba667c Create at::linear (#10755)
Summary:
The optimized code for `linear()` which uses `addmm` when a bias is given was duplicated three times in the ATen and the C++ API. Let's just have `at::linear` and use that everywhere.

apaszke ezyang (who mentioned this in #10481)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10755

Differential Revision: D9443881

Pulled By: goldsborough

fbshipit-source-id: a64862d1649b5961043d58401625ec267d97d9f3
2018-08-21 19:40:15 -07:00
Wanchao Liang
47c1badf90 Fix the clamp special case and gradient problem on None, add None to JIT (#9596)
Summary:
Supersedes #8925

This PR fixes #8502, it fixes the gradients problem for clamp when passing None to the function, and add support for the NoneLiteral and NoneType in script to enable clamp tests. Now we could have corner cases like:

```python
torch.jit.script
def func():
    x = torch.randn(3, 3, requires_grad=True)
    y = torch.clamp(x, None, 0) # max = 0
    y = torch.clamp(x, min=None, max=0)
```

In both JIT and Aten, we use Scalar(NAN) as a sentinel value when passing None type to function clamp, this is the current way we used to support None type in JIT and to solve the gradient problem when user explicitly passing None into clamp.

In JIT side, we create a tensor(NAN) and undefinedTensor if we encounter None when matching the function schema, and later in the interpreter, it will translate to Scalar(NAN) if needed.

Ideally we don't need clamp_min and clamp_max in ATenNative/Autograd and could only support clamp after this change, but since bunch of other operators (e.g. Activation.cpp, Loss.cpp) is using clamp_min in several places, we will still have the functions available, but all python invocations will only call clamp instead of clamp_min/max (with calling underlying th_max/th_min in clamp).

zdevito jamesr66a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9596

Reviewed By: zdevito

Differential Revision: D8940839

Pulled By: wanchaol

fbshipit-source-id: c543a867b82e0ab8c99384773b173fdde2605d28
2018-07-27 22:54:33 -07:00
Sam Gross
829d763c69 Implement add, sub, mul, div using TensorIterator (#8919)
Summary:
```
This adds TensorIterator, a helper class for computing element-wise
operations that's intended to replace the CPU and CUDA apply utils
functions.

CPU kernels are implemented as functions that operate on strided 1-d
tensors compared to CPUApplyUtils which operated individual elements. This
allows the kernels to handle vectorization, while TensorIterator handles
parallelization and non-coalesced dimensions.

GPU kernels continue to operate on elements, but the number of
specializations is reduced. The contiguous case remains the same. The
non-contiguous case uses a single (reduced) shape for all operands and
the fast integer division from THCIntegerDivider. To avoid extra
specializations for indexing with 64-bits, large operations are split
into smaller operations that can be indexed with 32-bits.

Major semantic changes:

 - No more s_add, s_mul, s_div, or s_sub. Broadcasting is handled by
   TensorIterator. The autograd engine performs the reduction assuming
   standard broadcasting if the gradient shape does not match the
   expected shape. Functions that do not use standard broadcasting rules
   should either continue to trace the expand calls or handle the
   reduction in their derivative formula.

 - Use ONNX v7, which supports broadcasting ops.

Performance impact:

 - Small increased fixed overhead (~0.5 us)
 - Larger overhead for wrapped numbers (~2.5 us)
 - No significant change for ops on contiguous tensors
 - Much faster worst-case performance for non-contiguous GPU tensors
 - Faster CPU bias addition (~2x)
 - Faster GPU bias addition (~30% faster)

Future work:

 - Decrease overhead, especially for wrapping numbers in Tensors
 - Handle general inter-type operations
 - Extend to unary ops and reductions
 - Use buffering for compute-bound operations on non-contiguous tensors
   (pull in from CPUApplyUtils)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8919

Differential Revision: D8677600

Pulled By: colesbury

fbshipit-source-id: 61bc9cc2a36931dfd00eb7153501003fe0584afd
2018-07-27 14:43:24 -07:00
Edward Yang
6cd0174ff5 Reimplement localScalar as a native function. (#9762)
Summary:
I split it into two parts, _local_scalar and _local_scalar_dense (unchecked)
so I could reuse the sparse logic in both paths.

_local_scalar became a method on Tensor to work around a circular
include problem.

This is resurrected copy of #9652
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9762

Differential Revision: D8972348

Pulled By: ezyang

fbshipit-source-id: 2232dbfc8e1286b8a4a1c67d285c13a7771aad4c
2018-07-25 19:09:39 -07:00
Tongzhou Wang
aee9e90abd Fix TestAutograd.test_as_strided (#9538)
Summary:
0. Fixes #9479
1. rewrites `as_strided` as a native function. This is fine because `set_` does the scalar check.
2. allow using `self` in `python_default_init`. Previously `python_variable_methods.cpp` has `self` as an input `PyObject *`, and use `self_` as the unpacked tensor. But `python_torch_functions.cpp` just use `self` as the unpacked tensor, making it impossible to use `self` in `python_default_init`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9538

Differential Revision: D8894556

Pulled By: SsnL

fbshipit-source-id: ca7877b488e12557b7fb94e781346dcb55d3b299
2018-07-19 09:11:13 -07:00
Gregory Chanan
f277645968 Support N-dimensional empty tensors in CPU BLAS and (a selection of) … (#9522)
Summary:
…CPU LAPACK routines.

Note that the LAPACK functions in general require a different approach, because direct calls with size zero dims do not work.
Here I just selected a reasonable subset of LAPACK routines to support.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9522

Reviewed By: ezyang

Differential Revision: D8888180

Pulled By: gchanan

fbshipit-source-id: 16b9013937806d375d83d1c406815765fda00602
2018-07-18 08:25:21 -07:00
Gregory Chanan
f92edf7ef4 N-dimensional empty tensors: indexing, factories, reductions. (#9209)
Summary:
This PR implements and tests N-dimensional empty tensors for indexing, factories, and reductions if compiled with -DUSE_TH_SIZE_ZERO_DIM.

Still remaining to add:
1) TensorShape functions
2) Simple linear algebra functions (matrix multiply variants)
3) Other functions that operate over a dimension (but don't reduce).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9209

Reviewed By: ezyang

Differential Revision: D8751257

Pulled By: gchanan

fbshipit-source-id: 2113374dc7af6caf31a99bf67b3893f130a29e23
2018-07-09 19:40:01 -07:00
Gregory Chanan
168a29f497 Create native wrappers around dimension reduction functions. (#9197)
Summary:
This is necessary for n-dimensional empty tensors, which have special native handling.
Closes https://github.com/pytorch/pytorch/pull/9197

Differential Revision: D8744083

Pulled By: gchanan

fbshipit-source-id: 3cc692a1d62cbeb169681b7c40e3df50e12953b7
2018-07-06 08:11:23 -07:00
Peter Goldsborough
f0772c0ab2 Replace max_pool with max_pool_with_indices (#8946)
Summary:
Re-push from https://github.com/pytorch/pytorch/pull/8892
Closes https://github.com/pytorch/pytorch/pull/8946

Differential Revision: D8666862

Pulled By: goldsborough

fbshipit-source-id: 44cd3d63d347316818a7b0f5f89fce8ff7486736
2018-06-28 16:10:08 -07:00
Orion Reblitz-Richardson
9ec0a2aef4 fbshipit-source-id: ba600fcd2b5cefc7621357bdeb05e24cea02e5af 2018-06-27 04:50:56 -07:00
Peter Goldsborough
290d20b094
Replace max_pool with max_pool_with_indices (#8892)
* Create max_poolXd_with_indices

* Match ATen names in ONNX symbolic
2018-06-26 17:09:30 -07:00
Tongzhou Wang
e6c7b38f94
Cache cufft plans (#8344)
* cache cufft plans

* use an LRU cache

* suffix CuFFTParams members with _

* import print_function for py2

* lint

* fix potential race; add dummy impl for CPU only builds

* cpp formatting; remove nccl makefile change

* Use CUDA hooks instead

* comments and doc

* update the error message

* move LRU cachae to a separate file and native::detail namespace

* update comment

* specify NOTE location in CuFFTPlanCache.h

* update disabled_features.yaml to make amd ci work

* another fix for AMD CI in disabled_features.yaml

* Wrap cufft_plan_cache_* methods in __HIP_PLATFORM_HCC__

* improve the notes

* lint

* revert onnx change

* put back inlining for CUFFT_CHECK
2018-06-22 13:02:34 -04:00
Peter Goldsborough
372d1d6735
Create ATen tensors via TensorOptions (#7869)
* Created TensorOptions

Storing the type in TensorOptions to solve the Variable problem

Created convenience creation functions for TensorOptions and added tests

Converted zeros to TensorOptions

Converted rand to TensorOptions

Fix codegen for TensorOptions and multiple arguments

Put TensorOptions convenience functions into torch namespace too

All factory functions except *_like support TensorOptions

Integrated with recent JIT changes

Support *_like functions

Fix in place modification

Some cleanups and fixes

Support sparse_coo_tensor

Fix bug in Type.cpp

Fix .empty calls in C++ API

Fix bug in Type.cpp

Trying to fix device placement

Make AutoGPU CPU compatible

Remove some auto_gpu.h uses

Fixing some headers

Fix some remaining CUDA/AutoGPU issues

Fix some AutoGPU uses

Fixes to dispatch_tensor_conversion

Reset version of new variables to zero

Implemented parsing device strings

Random fixes to tests

Self review cleanups

flake8

Undo changes to variable.{h,cpp} because they fail on gcc7.2

Add [cuda] tag to tensor_options_cuda.cpp

Move AutoGPU::set_index_from into .cpp file because Windows is stupid and sucks

Fix linker error in AutoGPU.cpp

Fix bad merge conflict in native_functions.yaml

Fixed caffe2/contrib/aten

Fix new window functions added to TensorFactories.cpp

* Removed torch::TensorOptions

Added code to generate wrapper functions for factory methods

Add implicit constructor from Backend to TensorOptions

Remove Var() from C++ API and use torch:: functions

Use torch:: functions more subtly in C++ API

Make AutoGPU::set_device more exception safe

Check status directly in DynamicCUDAHooksInterface

Rename AutoGPU to DeviceGuard

Removed set_requires_grad from python_variables.h and warn appropriately in Variable::set_requires_grad

remove python_default_init: self.type()

Add back original factory functions, but with deprecation warnings

Disable DeviceGuard for a couple functions in ATen

Remove print statement

Fix DeviceGuard construction from undefined tensor

Fixing CUDA device compiler issues

Moved as many methods as possible into header files

Dont generate python functions for deprecated factories

Remove merge conflict artefact

Fix tensor_options_cuda.cpp

Fix set_requires_grad not being checked

Fix tensor_new.h

TEMPORARILY put some methods in .cpp files to see if it solves issues on windows and mac

Fix bug in DeviceGuard.h

Missing includes

TEMPORARILY moving a few more methods into .cpp to see if it fixes windows

Fixing linker errors

* Fix up SummaryOps to use new factories

Undo device agnostic behavior of DeviceGuard

Use -1 instead of optional for default device index

Also move DeviceGuard methods into header

Fixes around device index after optional -> int32_t switch

Fix use of DeviceGuard in new_with_tensor_copy

Fix tensor_options.cpp

* Fix Type::copy(

* Remove test_non_float_params from ONNX tests

* Set requires_grad=False in ONNX tests that use ints

* Put layout/dtype/device on Tensor

* Post merge fixes

* Change behavior of DeviceGuard to match AutoGPU

* Fix C++ API integration tests

* Fix flip functions
2018-06-16 00:40:35 -07:00
Edward Z. Yang
711e5a6ceb
Port THS to ATen. (#8409)
* Port THS to ATen.

The basic structure of the patch:

- All kernels in aten/src/THS got rewritten as native
  functions in aten/src/ATen/native/sparse

  I took the liberty to rename some of the kernels,
  opting for a longer, more transparent names than
  things like 'spaddcmul'.

- Instead of holding fields for sparse tensor in the TH
  C struct THSTensor, they are now held in a C++ class
  SparseTensorImpl (this explains why I had to do this
  all in one go; I can't have *two* reps for sparse
  tensors!)

  Along the way, we change a key internal representation
  invariant: an "empty" sparse tensor has dimI == 1 and
  dimV == 0 (this is different from dimI == 0 and dimV == 0
  we had before); this ensures that we maintain the invariant
  that dim == dimI + dimV.  "Scalar" sparse tensors are
  made illegal, because there really is no way to properly
  express them in COO format.

- Because we haven't ported THCS or any of the traditional
  dense TH implementations, there is a new set of adapter
  functions in native/LegacyBridge.cpp exclusively devoted
  to deciding whether or not to go to the new native implementation
  or back to the legacy TH binding (prefixed with th_).
  The intent is that when everything gets ported, we can
  delete this file.

- I've kept the stubs for all the THS functions, but they now all
  error if you try to actually call them.  Eventually, we should
  replace these with calls to ATen so that everything keeps
  working.

- I gobbled up SparseMM (SparseMM.cpp is no more). It was tasty.

There are some miscellaneous improvements which were needed for other
changes in this patch:

- There is now AT_FORALL_SCALAR_TYPES_EXCEPT_HALF, which does what
  it says on the tin.

- axpy templated function moved to TH/BlasUtils.h, there's a new macro
  which lets you easily forward to all of the TH functions. We also expose
  THBlas_copy.  I'm not terribly pleased with these functions but
  they seem to serve a purpose they need.

- New method on Tensor to get TensorImpl*, unsafeGetTensorImpl

- accessor() is now this-const, since const-correctness on Tensor is a lie

- New toSparse()/toDense() methods on Type; now you can call these
  directly without having to manually apply at::toSparse/toDense
  on the Backend and then running toBackend yourself.

Changes to the kernels:

- Previously, the whole body of all kernels was compiled for
  every supported scalar type.  In our new implementation,
  the scalar dispatch has been pushed into the smallest extent
  which (1) is not in a type loop and (2) requires statically
  knowing the scalar type.  These sites all use
  AT_DISPATCH_ALL_TYPES.  I tried to use lambdas as much as
  possible, but sometimes it was not possible when a OpenMP
  pragma was used.

- Anywhere we tested if the nDimension of a tensor was zero,
  we replaced with a test that numel is zero.  Because, as we
  known, nDimension of zero-size tensors in TH is zero, and
  that's wrong wrong wrong (and not done this way in ATen).

Some subtleties:

- Places where previously fastget1d was used, I now use a
  TensorAccessor.  However, you have to be careful about grabbing
  the accessor, because sometimes you will be accessor'ing
  indices/values and they are empty, which means they will
  be *1D* ("oh, aren't indices always 2D?" Nope. Nyet.)
  So, essentially, it is only safe to grab an accessor *after*
  you have checked that nnz != 0.  All of these shenanigans
  will go away when we properly support zero-size dimensions.

  A few places, we test for this case just by wrapping the loop
  in a conditional on nnz.  Some other places this is not so easy,
  so we instead short-circuit the function with a special case for
  when nnz == 0 (usually, these implementations are degenerate).

- There is a very subtle but important difference between
  _sparse_get_impl(self)->indices() and self._indices();
  the latter may return a view!  This is because nnz is
  not guaranteed to match the dimensions of indices/values;
  you can "truncate" a sparse tensor by setting the nnz.
  Actually, I think this is not a good idea and we should
  enforce a stronger invariant, but for this patch I slavishly
  adhere to the old ways, and as such I have to be very
  careful if I want to resize something, I had better use
  the former and not the latter.

- I had to reimplement broadcasting by hand (thus the s_
  and non-s_ functions in the sparse native files).  There
  is a very important distinction between foo_out and foo_,
  so it is important that the LegacyBridge function always
  call to the lower layer, and not try to avoid boilerplate
  by calling to another LegacyBridge function first.
  I did NOT put broadcasting in LegacyBridge (even though,
  ultimately, that's where it must live), because the th_
  functions which are invoked from LegacyBridge handle
  broadcasting themselves, and I don't want to broadcast
  twice.

- Sparse function MUST explicitly specify the Type they
  dispatch from, otherwise Variable wrapping/unwrapping will
  not work correctly.  If you use _get_sparse_impl, that is
  sufficient to levy this requirement.

- The "has native" tests in LegacyBridge.cpp are not 100%,
  because some of the functions are mixed dense-sparse functions,
  and so you can't just say, "Oh, if it's sparse and CPU, call
  the native sparse implementation."  This is handled on a
  case by case basis.  There is some especially complex
  logic for add(), which has dense-dense, sparse-sparse
  and dense-sparse implementations.

- I added some uses of SparseTensorRef in native_functions.yaml,
  but you will notice that these are all on native_* functions,
  and not the actual, top-level functions.  So the SparseTensorRef
  is purely documentary (helping you not call the wrong overload)
  but there is no magic; we do the wrapping ourselves the hard
  way. (This is in constrast to the TH binding code which is magical.)
  Except for _sparse_mask; _sparse_mask is magical.

- There is a raw_copy_sparse_ method, which is really my way of
  getting around the fact that copy_ has never been implemented
  for sparse tensors (even before this patch), but there IS a
  super secret, internal way of doing these copies that the THS
  code used, and which I needed to get my hands on when I did this
  port.  We should refactor so that either (a) copy_ does support
  sparse-sparse copy natively, or (b) we do this other ways.

- Irritatingly, I must explicitly resize_as_ before copy_ into
  a tensor.  This was not the case with THTensor_(copy) but I don't
  have any direct binding that doesn't have this requirement.

- For some reason, the sparse tensor constructor accepts a scalar
  tensor for the values tensor.  This is kind of weird because
  you always need an nnz-dimension.  However, the old code supported
  this and just expanded it into a 1D size 0 tensor; so we need some
  explicit code to do this.

There are maybe a bit more AT_ASSERTs in some of the kernels
than is wise.  I added them all when I was debugging and was
loathe to remove them.

Some last mile fixes after this commit went into PR

- Move expand outside of dispatch so autograd works (it used to be inside and then we lost all of the recorded broadcasts).
- Hack to duplicate the derivatives for our now two definitions TH and native. Mercifully the derivatives are short.
- Apparently, TH has a special case to make foo_ functions method only, and if you don't do this the Python arg parsing is wrong. We carefully work around this in the native bindings
- Apply DCE to a test_jit case, fixes wobbling due to DCE trick in tracing
- Update test_function's output
- Some last mile fixes for dispatch confusion in sparse_coo_tensor functions.
- New simplified regression test based on failures I saw in ONNX
- Increase tolerance on super resolution test
- More robust dynamic_type normalization, fixes ONNX bug.
  The dynamic_type situation is very delicate; probably need
  to stop having both Scalar and real.
- Make new_with_tensor_sparse more CUDA safe
- Note about CUDA-safety in SparseTensorImpl
- Rename dimI/dimV to sparseDims/denseDims.
- Make localScalar on SparseTensorImpl work.
- Make numel uniformly supported on all types, not just dense
  types
- Add tests for is_nonzero() method (which exercises localScalar)
- Disable constant JIT autogenerated tests, which are fragile and broken
  by this change, but being fixed in a parallel track.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-06-15 17:52:21 -04:00
anderspapitto
fcd9af8a25
changes to support ATen code generation inside fbcode (#8397)
* Back out "Back out "Add support for generating ATen files during fbcode build""

Original commit changeset: 7b8de22d1613

I'm re-sending this diff exactly as it was approved and
committed. Fixes to support @mode/opt will be sent separately for ease
of review.

* Enable building //caffe2:torch with @mode/opt

In @mode/opt, python runs out of a PAR, which breaks a lot of
assumptions in the code about where templates/ folders live relative
to __file__. Rather than introduce hacks with parutil, I simply turn
template_path into a parameter for all the relevant functions and
thread it through from the top level.
2018-06-12 14:57:29 -07:00
Edward Z. Yang
7ed361a466
Rename SparseTensor to SparseTensorRef. (#8237)
I want to introduce using SparseTensor = Tensor (as a documentary
type alias for Tensor), but the name is already taken.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-06-07 11:03:49 -04:00
Seth Hendrickson
e9c33e91d9 Remove python bindings for torch.slice (#7924)
* skip python bindings for slice

* remove tests

* convert slice test to indexing
2018-05-31 13:42:49 -04:00
Thomas Viehmann
8f97cbcf4e remove index from python bindings (fixes: #7639) (#7690) 2018-05-19 20:04:07 +02:00
Richard Zou
71626491c4 Add batched linear solver to torch.gesv() (#6100)
* Add batched linear solver to torch.gesv()

Fixes #3164
Picks up from #4502

I moved `gesv` to ATen.
Adds bindings for MAGMA's `gesv_batched` function for CUDA.
For CPU, runs `THLapack(gesv)` in a for loop.

The new function supports arbitrary batch dimensions (and broadcasting
of those dimensions). For example, the 4-d tensor `A x B x M x M` should
be treated as having batch-size `(A x B)`.

The overhead of creating the magma_queue_t is: ~350000 microseconds
the first time it's called and ~6 microseconds every time after that.

* Tests and docs

* Address comments

* Address comments

* Rebase

* Address comments

* Fix rebase

* Addressed comments

* Address comments

* Address comments

* Addressed comments
2018-05-08 17:06:27 -04:00
Adam Paszke
0829d4502d
Trace size-dependent expressions correctly (#6554)
This makes the JIT tracer much more robust, by allowing it to record
dependencies on tensor sizes. For example, if you were to trace this
function

def fn(x):
    return x.view(x.size(1), -1)

before this patch, then it would embed the actual value of x.size(1)
in the trace as a constant, making it very hard to have e.g. batch size
independent traces. Now, this will correctly record the dependency, and
will retrieve the size of x at every run.
2018-05-04 10:55:39 +02:00
gchanan
681baa9254
Restore warning to torch.range. (#7194)
Also, get rid of warning specification in Declarations.cwrap, which currently has no effect.
2018-05-02 21:53:00 -04:00
gchanan
2a18e7c45b
Have python dispatch respect 'auto_gpu' and 'with_gil'. (#7137) 2018-05-01 13:51:02 -04:00
gchanan
a6bfa16c17
torch.arange: add numpy-style type inference. (#7016)
* torch.arange: add numpy-style type inference.

This is a backwards-compatibility breaking change.

* Fix flake8.

* Use at::optional.

* Remove unneeded header files.

* Use reference wrapper.

* Update arange for test.

* Address review comments.
2018-04-27 15:11:45 -04:00
gchanan
3d907ef78e
Consistently check 'out' variants against specified dtype/layout/device parameters. (#6973)
We were previously doing this in the most common cases, but not consistently.
2018-04-25 22:46:42 -04:00
Adam Paszke
d26ab68485 Sort declarations when generating Python bindings (#6701)
* Sort declarations when generating Python bindings

This helps resolve ambiguities in argument parsing according to
any rules we will need.

For now, this allows us to make scalar operations more conservarive
wrt. argument types, but makes them commutative again.

* Fix inconsistencies between mod with tensor and scalar

* Fix a stupid mistake
2018-04-18 21:51:35 -04:00
Thomas Viehmann
bd0cc7d364 Implement torch.einsum (fixes #1889) (#6307)
* start at generic trilinear

* Implement einsum (fixes #1889)

This provides a simple implementation of einsum. It is built on
top of the work for computing bilinear (#6110).
It uses a naive left-to-right resolution at the moment.
Autograd is able to differentiate by itself.
The obvious unsupported feature is taking diagonals (einsum('ii->i',(a,)).

* add tests and docs

* fix flake8

* clean diff

* rebase on current master to resolve conflicting String wrapping

* clean up after rebase

* better commentary in einsum and sumproduct_pair

* don't say fixme if it's fixed and rename num_outputs to num_output_dims

* adapt python wrapper to use std::string instead of String to avoid typedef at::String

* typos and some vector to array conversion

* fix accidental python<->python3 change

* really fix bad rebase
2018-04-18 13:41:27 +02:00
gchanan
5ed3f3347a Add dtypes (with reasonable defaults) to sum, prod, cumsum, cumprod. (#6573)
* Add dtypes (with reasonable defaults) to sum, prod, cumsum, cumprod.

This adds optional dtypes to torch.sum, torch.prod, torch.cumsum, torch.cumprod.
By default, the dtype is torch.float64 for integral types, and the dtype of the input for floating point types.

* Don't use optional<ScalarType>, because the jit can't handle it yet.

Instead, we manually build the overloads.  This is fairly painful because of default arguments, but should be easy to pull out once the jit can handle optional<ScalarType>.

* Fix keepdim with out parameters.

* Fix _cudnn_rnn_flatten_weight.

* If dtype is provided to an out function, make sure it matches the dtype of the result.

* Fix typo.
2018-04-16 23:52:59 -04:00
gchanan
749d51414a
Separate cuda-ness from dtype. (#6470)
* Separate cuda-ness from dtype.

There are no longer torch.cuda.int64, etc; only torch.int64 that correspond to at::ScalarType.
At the python arg parser level, the corresponding ATen type is selected from the combination of (ScalarType, Layout, Device).

There is also currently unused code in here for support ScalarType in native_functions; this will be used for specifying aggregate types
on reduction functions.

* Fix test_autograd.

* Add defaults to randint_like.

* Track is_cuda in py tensor types.

* Fix test_sparse.

* Fix multiprocessing.

* Fix rnn.

* Fix test_nn.

* Fix flake8.
2018-04-12 14:05:44 -04:00
gchanan
87e369111a
Add string-style devices to all tensors. (#6283)
* Add string-style devices to all tensors.

Previously, tensors only had a 'get_device' method which would throw an exception on a CPU tensor.   This made it necessary to if/else code that
was meant to be device agnostic.

This PR implements the following:
1) Adds a 'device' property to all tensors that returns a string representation of the device for all tensors.
For cpu tensors this is 'cpu'.  For cuda tensors this is 'cuda:X', where X is the cuda device ordinal.

2) Adds a DeviceSpec class.  This is just a helper class for separating device_type and device_index specification and to allow partial specification.
For example, you can call DeviceSpec('cuda'), DeviceSpec('cuda:0'), DeviceSpec('cuda', 1).
Also has backwards compatibility support for specifying integers, which are treated as cuda devices.

DeviceSpecs have the following properties:
a) device_type: string representation of the device type (i.e. 'cpu' or 'cuda')
b) device_index: integer for the device index (None if not specified)
c) cuda_device_index: for backwards compatibility; behaves roughly like `get_device` did previously.  I.e. if a function previously took integers for cuda devices,
it can now take DeviceSpecs (or strings), and can maintain the old functionality by calling `old_index = DeviceSpec(old).cuda_device_index`.

3) tensor methods and torch. functions that took integer devices can now take integers, strings, or DeviceSpecs.  For example:
torch.randn((2,3), dtype=torch.cuda.float32, device='cuda:1')

TODO in future PRs:
A) Split out cuda from dtype so you don't need to overspecify cuda-ness
B) We currently only support strings/DeviceSpecs in tensor methods and torch. functions.  We should have equivalents torch.cuda.device(...), torch.cuda.device_of, etc.
at the torch. level that work on strings/DeviceSpecs

* Add deviceInt64 to python arg parser.

* device_str.

* Remove device_str.

* remove device prefix from attributes.

* Use const char * instead of string.

* Move autogpu index out of Device.

* comment on is_default.

* Rename torch.DeviceSpec to torch.device.

* comment.

* Fix tests.

* Fix flake8.

* Fix sparse_coo_tensor parameter name.

* Improve error message.

* Remove device_ prefix from C++ device object.

* Allocate static strings.

* Return not implemented from rich compare.

* Move torch::Device to THPDevice.

* Remove cuda index.

* Py_RETURN_NOTIMPLEMENTED doesn't exist in python2.
2018-04-06 15:12:05 -04:00
Peter Goldsborough
9ba70856a1 Add max_values and argmax convenience functions to ATen (#6201)
* Add max_values and argmax convenience functions to ATen

* Add documentation for torch.argmax/argmin and skip max_values

* Add tests for argmax/argmin

* Dont default the dim argument

* Use dim=0 in test_torch.py for argmax tests

* Implement argmin()  and argmax() without dim

* Call .contiguous() before .view(-1)
2018-04-04 15:53:26 -04:00
gchanan
4c81282c33
Introduce torch.layout and split layout from dtypes. (#6145)
* Introduce torch.layout and split layout from dtypes.

Tensors (and tensor types) now have a 'layout' attribute that returns either 'torch.strided' or 'torch.sparse_coo'.

Previously, dtypes were 1-to-1 with ATen types/PyTensorTypes; the impetus behind this decision was to make things easy in the common case
(i.e. specifying a type in a factory function).  But this doesn't really follow for sparity, which isn't a common case.

It also doesn't properly represent the concept or a dtype, which in numpy are proper scalar types (i.e. roughly the type returned from indexing the
last dimension of an n-d array).  But this should be the same whether or not the tensor is represented via strides, sparsity, etc.

This is accomplished by:
1) having the dtype of tensor return the (device-type, scalar-type) combination, i.e. torch.cuda.float32, so both
   torch.cuda.FloatTensor and torch.cuda.sparse.FloatTensor have the same dtype
2) Adding a layout parameter to python functions, where the combination of (dtype, layout) maps to an ATen type that is used for dispatch.

* Formatting, make init throw python_error.

* Fix cuda not enabled error message.

* Fix test.
2018-04-02 14:07:50 -04:00
Tongzhou Wang
d2c0f8bb57 avoid generating torch.*_backward_(input|weight|bias) (#6114) 2018-03-30 15:23:56 -04:00
gchanan
df039e2998
Unify handling of type_dispatched_args in gen_python_functions. (#6088)
This is just to simplify the handling, there is no generated code difference.
2018-03-28 22:23:20 -04:00
Richard Zou
e3e0c34390 Unify error checking for tesnor.index_copy_ (#5642) 2018-03-22 20:07:15 -04:00
gchanan
a3442f62bc
Support native namespace functions with type dispatch. (#5576)
* Support native namespace functions with type dispatch.

Use 'ones' as an example.  Note this is a "halfway" solution; i.e. the call chain is:
at::ones(shape, dtype) -> dtype.ones(shape, dtype) -> CPUFloatType.ones(shape, dtype) -> at::native::ones(shape, dtype)

The "nicer" solution would probably be something like:
at::ones(shape, dtype) -> dtype.ones(shape) -> CPUFloatType.ones(shape) -> at::native::ones(shape, this)

* Fix type inference.

* Fix test install.

* Fix extensions.

* Put dtype argument at the beginning.

* Fix extension.cpp.

* Fix rnn.

* Move zeros in the same manner.

* Fix cuda.

* Change randn.

* Change rand.

* Change randperm.

* Fix aten contrib.

* Resize in randperm_out.

* Implement eye.

* Fix sparse zeros.

* linspace, logspace.

* arange.

* range.

* Remove type dispatch from gen_python_functions.

* Properly generate maybe_init_cuda for type dispatch functions not named type.

* Don't duplicate dtype, this parameters for native type dispatched functions.

* Call VariableType factory methods from the base type so it gets version number 0.

* Address review comments.
2018-03-09 10:52:53 -05:00
gchanan
0f86f64398
Add support for device python arguments with constructors. (#5384)
* Add support for device python arguments with constructors.

* Fix flake8.

* Simplify device handling.

* Dont use torch._C._VariableFunctions.

* Handle default values for functions that have tensor args (e.g. ones_like).
2018-02-28 14:41:57 -05:00
Sam Gross
ebd32f7bcd
Check that parsed_args contains enough space for all parameters (#5467) 2018-02-28 14:34:04 -05:00
gchanan
f4cfd9bbfc
Don't python bind 'tensor' or 'sparse_coo_tensor'. (#5390)
These are internal ATen functions; we have better python APIs.
2018-02-26 11:06:25 -05:00
Sam Gross
30ec06c140
Merge Variable and Tensor classes (#5225)
This replaces the torch.Tensor constructors with factories that produce
Variables. Similarly, functions on the torch module (e.g. torch.randn)
now return Variables.

To keep the PR to a reasonable size, I've left most of the unused tensor
code. Subsequent PRs will remove the dead code, clean-up calls to
torch.autograd.Variable, and rename Variable to Tensor everywhere.

There are some breaking changes because Variable and Tensors had
slightly different semantics. There's a list of those changes here:

 https://github.com/pytorch/pytorch/wiki/Breaking-Changes-from-Variable-and-Tensor-merge
2018-02-23 18:03:31 -05:00
gchanan
0878c6d4d7
Various dtype improvements. (#5321)
* Various dtype improvements.

1) Add dtypes to the new data-based constructors: Variable.new_tensor and torch.autograd.variable.
2) In the python signatures, use Type instead of Dtype to match	the C++ signatures; the error messages still print as dtype.
3) Handle / add a better error message when a dtype is used when ATen was not compiled with that type (e.g. cuda types).
4) Move cuda_lazy_init to its own file.

A later commit will add support to the legacy constructors as well.

* Move implementation of lazy_init to cpp.

* Fix parsed_arg size.
2018-02-21 17:37:59 -05:00
gchanan
5edf6b2037
Add numpy-style dtypes to Variable factories. (#5245)
* Add numpy-style dtypes to Variable factories.

1) Add numpy-style dtypes corresponding to torch tensor types.  These are:
torch.float16, torch.float32, torch.float64, torch.uint8, torch.int8, torch.int16, torch.int32, torch.int64
as well as torch.cuda, torch.sparse, and torch.cuda.sparse equivalents.

2) Adds "legacy" names for the above dtypes that correspond more closely to existing tensor names.  These are:
torch.half, torch.float, torch.double, torch.short, torch.int, torch.long.
torch.byte and torch.char don't exist because they either don't match numpy semantics or differ on different architectures.

3) Adds a "dtype" parameter to Variable factories (e.g. zeros, ones) that allows the user to specify the type without changing the default tensor type.

4) Adds a "dtype" getter to Variables that return the canonical dtype from 1)

This PR is missing the following useful features that should be added in the future:
A) We only add the "dtype" parameter to auto-generated factories; hand-written factories like in tensor_new.cpp don't support this yet.

B) We don't allow type conversions to use dtypes; that should be added to type(param) or a new function.

C) We don't yet have a "device" parameter for these factories; right now, they will only create Variables on the default device.

* backend_to_string can be private.

* Define python binding argument indexes in a more simple way.

* add all_declared_types, still need to hook it up to THPDType.

* Fix all_declared_types for missing types (it's Sparse + Half).

* Ensure cuda dtypes are created even if compiled with NO_CUDA=1.

* Fix case where dtype is provided but dispatch is via namespace.

This happens in ones_like, empty_like, randn_like.

There is some question if we should do:
1) at::ones_like(tensor).toType(dtype)
2) at::ones_like(tensor.toType(dtype))

I did the former because this matches with the numpy documentation, i.e.:
"Overrides the data type of the result." and it's easier to implement.

Note that the above causes an extra copy, either of the input or output.
Here's a better implementation:
1) Make zeros_like, ones_like native functions that take an optional type (named dtype?).
2) Match the type argument with the dtype, so we don't have two different parameters.
3) Call at::zeros_like(input, type) -> at::native::zeros_like(input, type) -> type.zeros(input.sizes())

* Don't return from maybe_initialize_cuda.

* Don't leak DType name.

* Address cpp review comments.

* Share code between sparse and non-sparse test_dtypes.

* Rewrite _like functions as native function with explicit type parameter.

* Use type 'Type' instead of 'dtype' for consistency.

* Address review comments.

* Handle arg_idx when there is requires_grad but no dtype in python_binding_arguments.
2018-02-20 11:04:14 -05:00
Sam Gross
0509f26d41 Speed-up nn.Linear for the 3d input case (#5279)
This adds at::_unsafe_view and uses it in matmul. The _unsafe_view
function is identical to view except that the output is not treated
like a view by the automatic differentiation code. This avoids in-place
modifications triggering the more expensive CopySlices/AsStridedBackward
behavior.

The _unsafe_view function is only safe to use on temporaries that will
be immediately discarded and that do not alias other tensors. Otherwise,
in-place modificatiions may trigger incorrect gradients. The funciton is
not exposed to Python.

See #5169
2018-02-19 19:47:20 -05:00
Sam Gross
c1b98f0841
Add deprecated add_out overload (#5088)
We have a few calls that use this signature on Tensors. This also
updates the binding code to support deprecated xxx_out signatures.
2018-02-06 17:08:23 -05:00
Edward Z. Yang
7bd2db997e
Port cuDNN RNN bindings to ATen (#4881)
* Add transpose() to TensorGeometry.

This code is dead; I briefly used it in my RNN patchset but
eventually rewrote it to not be necessary.  However, it seemed
like a useful gadget so I kept it.  In general, it seems that it
would be useful for TensorGeometry to support all operations that
Tensor does, but it only computes the changes to sizes/strides
instead of actually doing the computation.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Turn on wrap_dim behavior for TensorGeometry

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Support for hard-coded differentiable outputs.

Some outputs of functions are nondifferentiable, and should always
be returned with requires_grad=False.  Traditionally, we have used
the presence of 'grad' to signal that only the first output is
differentiable, and the rest are not, but cudnn_rnn (to be
implemented) breaks this pattern; its first three outputs are differentiable,
but its last output is a buffer that is just consumed by backwards.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* TensorGeometry constructor from just sizes

The sizes are assumed to form a contiguous tensor, and we compute
the strides we would get in that case.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Support saving TensorList for backwards.

There is some back story here.  Saved TensorList in backwards will
be used by cudnn_rnn, and it is worth asking, why is it necessary to
save a list of tensors?  Indeed, *technically* speaking a list of
tensors is not necessary, we only need to save the sizes of each
of the weight tensors.  (We need the sizes because cuDNN is only
going to blast the derivative of weights into a flat buffer, but
we need to match the sizes of the views into the buffer when we
eventually return the derivatives.)

However, it was surprisingly awful trying to implement passing just
sizes, because as non-Tensor arguments, the JIT interpreter generation
code is expected to handle all non-Tensor arguments as attributes in the
trace, and our attributes struct doesn't actually know how to do
arrays of arrays.  Saved TensorList code was much easier to get working,
so that's what this patch does.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* MatrixRef - an ArrayRef with a stride, making it a 2D ArrayRef.

Like ArrayRef, this class does not own the underlying data, it is expected
to be used in situations where the data resides in some other buffer.
This is intended to be trivially copyable, so it should be passed by
value.

For now, 2D only (so the copies are actually cheap, without having
to write a SmallVector class) and contiguous only (so we can
return non-strided ArrayRef on index).

The intended use-case (not in this commit) is to make it easier to
work with RNN weights, which are num_weights x num_layers matrix of
parameters.

P.S. dimension 0 indexes rows, dimension 1 indexes columns

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Generalize getDataType in Descriptors.h

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Change copy_range to take Tensor, and change cat_tensors_backward accordingly

Should a backward function return a Variable or a Tensor?  For the most
part, all of our backward functions return Tensor, except cat_tensors_backward,
which returns a variable_list (which is really the only thing that matters,
because Tensor and Variable are interconvertible).  But this is kind of weird,
because it means that you can't implement a backwards in ATen that returns
a std::vector<Tensor>, and then hook it up transparently with the derivatives
code.  So I switched it over.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Support 5-ary return Tensor tuple.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Support code generation with mixed Tensor/TensorList in output.

I don't think I ended up using this in cudnn_rnn, but this seems
it might be useful for someone else later.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Support 4-ary boolean array

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Add support for retain_variables in tools/autograd/derivatives.yaml

'retain_variables', a bool which is true if a user has specified
that saved variables should be retained in case the backwards is
run again later.  This allows an optimization where we can
destroy saved buffers if we know variables are not going to be retained,
e.g., it is (will be) used by _cudnn_rnn

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Lazily initialize cuDNN descriptors

Previously, cuDNN descriptors were eagerly allocated as soon
as a FooDescriptor object was created.  However, in some uses
of TensorDescriptor, this is problematic: some tensors are optional
and cuDNN's API expects to be given a nullptr TensorDescriptor
in this case, not an uninitialized (but allocated) descriptor.

Lazily initializing the descriptors makes it less likely for
us to use uninitialized memory and matches the usual semantics of
unique_ptr.  It's good sense!

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Port cuDNN RNNs to ATen.

This brings three new functions:
  - _cudnn_rnn_flatten_weight: flatten a matrix of weight tensors into
    a single contiguous weight buffer as required by cuDNN
  - _cudnn_rnn: run RNN forwards
  - _cudnn_rnn_backward: run RNN backwards

RNNs have a lot of parameters, so we restructured what was previously
a single 'fn' object that recorded all the parameters into three
objects: RNNDescriptorParams, TensorDescriptorListParams and
DropoutDescriptorParams.

We make use of MatrixRef to organize the weight tensors (which are
weight/bias x number of layers), but I did not teach the codegen
how to pass these as arguments/return values natively, so instead
a MatrixRef is passed as its constituent ArrayRef and int64_t stride0.

cudnn_rnn has three differentiable outputs and one nondifferentiable
one, so it makes use of the support for hard-coded differentiable outputs.

I haven't deleted all of the descriptor code from Python, because dropout
initialization still goes through this codepath, that should be fixed soon
but I don't see it as essential for this PR.

This commit also removes the last use of NestedIOFunction from PyTorch.

There are some shenanigans with cuDNN dropout descriptor initialization,
see below:

Note [cuDNN dropout descriptor initialization]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In most cases, setting descriptors in cuDNN is cheap (e.g.,
cudnnSetTensorNdDescriptor).  However, this is not the case for
cudnnSetDropoutDescriptor: in cuDNN 6/7 (and possibly others) it does an
expensive precomputation to initialize the random number generator states.  In
cuDNN 6, this is the ONLY official mechanism to initialize a dropout descriptor,
which means that law-abiding clients were expected to generate a dropout
descriptor once and cache it.  However, our ATen interface is (1) stateless (so
we can't cache the descriptors) and (2) does not accept arbitrary user types in
its interface (so we can't pass the descriptor in).  This puts us in a pickle.

In cuDNN 7, a new function, cudnnRestoreDropoutDescriptor was added, which
forgoes the expensive initialization process, and can initialize the
descriptor with a pre-initialized state CUDA tensor.  This is great, because
it means we can simply pass in the state tensor and then initialize the
descriptor internally.  Unfortunately, this function is not available in
cuDNN 6.

To work around this, we break the cuDNN abstraction barrier, and have
the struct layout of the underlaying dropout descriptor.  With this struct,
we can reimplement cudnnRestoreDropoutDescriptor from scratch. Great!

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Fix cuDNN 7 behavior.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Delete some unused, controversial methods from MatrixRef.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Add missing filter_dim_a slice

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Replace nested for-loop with itertools.chain.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* CR comment on mut_desc()

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Refactor DropoutDescriptor API.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Use cached CurrentDeviceProperties from Context.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Document _cudnn_rnn outputs.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Improve fmap docs, convert some functions to use it.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Move IndexRange to autograd/function.h

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Elaborate on CUDNN_STATUS_INVALID_VALUE return some more.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Add an all-in-one setter for RNNDescriptorParams.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Print what the unrecognized RNN mode was

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* RNN TensorDescriptor improvements

- Have an explicit size/stride overload for set TensorDescriptor,
  so you don't have to create a goofy view to feed in.

- Change the padding to 3D rather than 5D, which is all you actually
  need (it's just 2D that is not supported by cuDNN API.)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Fix implementation of cudnnRestoreDropoutDescriptor, plus test.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Better comments about input layout.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Add comment about no-DropoutDescriptor argument RNNDescriptor function.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Rename vocab_size back to input_size.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Don't use backslash in comment.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Bugfix for contiguous TensorGeometry calculation.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Don't allocate a dummy tensor when setting TensorDescriptor for flatten_weight.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Make contiguity errors more user-friendly.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* s/fn.dropout.train/fn_train/

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* s/_cudnn_rnn_backward_grad/_cudnn_rnn_backward_input/

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Make dcx properly undefined when not required.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Remove old TODO.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Add state size check in cudnnRestoreDropoutDescriptor

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Explicitly narrow int64_t to size_t

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Restore copyParams comment.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Update benchmark numbers, and slight engineering improvements.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Typofix.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-02-05 13:54:11 -05:00
Richard Zou
bc11511cda Restore sparse variable transpose_() and t_() (#4779)
* Restore sparse variable transpose_() and t_()

* Add dimension wrapping to transpose_, t_

* Don't expose sparse_raw_resize_ to python
2018-01-23 21:32:40 -05:00
gchanan
c49f0279a6
Add kwarg-only 'requires_grad' parameter to Variable factories. (#4748)
* Add kwarg-only 'requires_grad' parameter to Variable factories.

Functions that create variables, e.g. torch.ones_like currently always return Variables with requires_grad=False;
this is less convenient than the existing Variable constructor that has a requires_grad parameter.  This commit
adds the parameter at the python binding level.

* Fix flake8.

* Address review comments.

* Match set_requires_grad implementation with tensor_new version.
2018-01-22 19:15:11 -05:00
Sam Gross
57549b7e44
Bind functions with out= arguments in VariableType (#4565)
This adds overrides in VariableType for the xxx_out ATen functions and
implements Python bindings. There is no support for automatic
differentiation. If any of the inputs (or outputs) requires grad, then the
function will throw an exception unless it's running in "no-grad" mode.

The bindings for calling torch.xxx functions on Variables are moved to a
different object. Previously, they were static method on VariableBase.
This change prevents users from accidentally calling static methods as if
they were instance methods.
2018-01-17 18:27:42 -05:00
Sam Gross
f8a4b1a266
Split off load_derivatives and gen_autograd_functions from gen_variable_type (#4370) 2017-12-27 18:59:41 -05:00
Tongzhou Wang
d8b2e5d091 Add python only default init expression; Implement stft, hann/hamming/bartlett window. (#4095)
* implement stft

* addressed comments; implemented window functions; added support for python only default initialization
2017-12-18 12:28:23 -05:00
Sam Gross
c813ce3787 Implement Variable._sparse_mask (#4124)
* Implement Variable._sparse_mask

* Use SparseTensor as the dyanmic_type
2017-12-15 17:25:20 -05:00
Sam Gross
aeb7a3668d
Implement Variable.new (#4080) 2017-12-11 15:45:43 -05:00
Tongzhou Wang
c681b03d37 Add determinant function on variable; Add backward on svd (#3816)
* determinant on variable

* svd bwd
2017-12-01 13:22:46 -05:00
Adam Paszke
65e0d5bad8 Fix void* wrapping in autograd codegen
Also, add assertions here and there to make sure bad things
never happen again.
2017-11-24 13:33:13 +01:00
Sam Gross
9cb8b43778
Split off in-place NN functions (#3683)
For example, this splits threshold into threshold(), which is now
never in-place, and threshold_() which is always in-place.

This simplifies the in-place vs. non-in-place logic in
gen_variable_type.py, which was bug-prone.
2017-11-14 12:59:06 -05:00
Zach DeVito
5aa5b572e4 update build so that all of TH* is in libATen 2017-11-02 19:53:36 -04:00
Edward Z. Yang
53fe804322 Make ONNX work with new C++ autograd world.
The general strategy is there is a new module, torch.onnx.symbolic, which
contains a function for every ATen method name with the ONNX translation.
While implementing this, I took the opportunity to expunge all references
of 'g' from the public API; instead, it is managed by a global variable in
torch.onnx which tracks the "current graph".

Other changes:

- If you pass a Tensor to op as an argument, it will now automatically be
  converted into a Constant ONNX node.  This lets us remove needing to
  implement ONNX

- Rename value to other, wherever there is both a Scalar and Tensor overload.
  This way, keyword dispatch can work uniformly in both cases.

- Deleted any autograd Function classes that both had a symbolic and were ported
  to the new C++ autograd implementation.  There may still be some straggling
  classes that didn't have symbolic.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-10-20 15:38:01 -04:00
Sam Gross
f1f64c8d07 Generate autograd functions for NN / more refactors (#3136)
Generate autograd functions for NN and implement more derivatives in derivatives.yaml

A big refactor of gen_variable_type.py
2017-10-19 15:03:26 -04:00
Sam Gross
f29bcab67e Use Declarations.yaml to generate python bindings 2017-10-07 00:41:29 -04:00
Sam Gross
558d26a69e Fix argument indices 2017-10-07 00:41:29 -04:00
Sam Gross
dcb8d0f088 Refactor out python binding generation from gen_variable_type.py
- Also includes some prep work for binding NN functions
2017-10-07 00:41:29 -04:00