Commit Graph

83 Commits

Author SHA1 Message Date
Dylan Bespalko
849c32f8e9 Cpu-strided-complex support for binary-ops (#25534)
Summary:
In-tree changes to pytorch to support complex numbers are being submitted here.
Out-of-tree support for complex numbers is here: [pytorch-cpu-strided-complex extension](https://gitlab.com/pytorch-complex/pytorch-cpu-strided-complex)

Note: These changes do not support AVX/SSE operations on complex tensors.
Changes so far:

- [x]  Added complex support of torch.empty.
- [x]  Added complex support of CopyKernels
- [x]  Added complex support of BinaryOp kernels

Once these changes are applied the rest of the kernels are pretty easy.

ezyang
I have fixed the issues in the original [PR: 25373](https://github.com/pytorch/pytorch/pull/25373).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25534

Differential Revision: D17188390

Pulled By: ezyang

fbshipit-source-id: ade9fb00b2caa89b0f66a4de70a662b62db13a8c
2019-09-04 13:20:52 -07:00
Richard Zou
0dcb8755c8 Implement tensor.set_names_, tensor.names setter
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23172

Test Plan:
- [namedtensor ci]

gh-metadata: pytorch pytorch 23172 gh/zou3519/74/head

Imported from OSS

Differential Revision: D16494364

Pulled By: zou3519

fbshipit-source-id: 8d0e26b33346d4eadba30b2e76610f6d7be7c373
2019-07-26 08:50:49 -07:00
Richard Zou
9817d7e16b Implement named inference rule for torch.sum
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23081

Test Plan:
- New tests [namedtensor ci]

Imported from OSS

Differential Revision: D16419174

Pulled By: zou3519

fbshipit-source-id: 8679f77f121664d0398d7f062a53c0fa37482481
2019-07-26 08:50:40 -07:00
Sam Gross
b1b65f34a9 Make PythonArgs::tensor and PythonArgs::scalar faster (#22782)
Summary:
Speeds up the common case where Tensor is a torch.Tensor (not a
subclass). This reduces the number of executed instructions for a
torch.add(tensor1, tensor2) by ~328 (should be ~65 ns faster).

Note that most of the PythonArgs accessors are too large to be inlined.
We should move most of them to the cpp file.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22782

Differential Revision: D16223592

Pulled By: colesbury

fbshipit-source-id: cc20f8989944389d5a5e3fab033cdd70d581ffb1
2019-07-12 11:57:29 -07:00
Hong Xu
693871ded3 Rename macros and build options NAMEDTENSOR_ENABLED to BUILD_NAMEDTENSOR (#22360)
Summary:
Currently the build system accepts USE_NAMEDTENSOR from the environment
variable and turns it into NAMEDTENSOR_ENABLED when passing to CMake.
This discrepancy does not seem necessary and complicates the build
system. The naming of this build option is also semantically incorrect
("BUILD_" vis-a-vis "USE_").  This commit eradicate this issue before it
is made into a stable release.

The support of NO_NAMEDTENSOR is also removed, since PyTorch has been
quite inconsistent about "NO_*" build options.

 ---

Note: All environment variables with their names starting with `BUILD_` are currently automatically passed to CMake with no need of an additional wrapper.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22360

Differential Revision: D16074509

Pulled By: zou3519

fbshipit-source-id: dc316287e26192118f3c99b945454bc50535b2ae
2019-07-02 11:46:13 -07:00
Roy Li
9c8f9f0ecb Remove many usages of Type (#21941)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21941
ghimport-source-id: f20cca6229daba9eb8652adb3d959266ae081ef1

Test Plan: Imported from OSS

Differential Revision: D15893331

Pulled By: li-roy

fbshipit-source-id: c988b16008ff0e2725a88c6025afd4aabdaca45a
2019-06-30 04:11:28 -07:00
Vitaly Fedyunin
516c7e4456 Adding memory_format to empty and empty_like operators (#20558)
Summary:
Original RFC https://github.com/pytorch/pytorch/issues/19092

To ensure that we are not introducing BC breaking change, empty_like returns contiguous tensor by default.

```python
nCwh = torch.randn(N, C, H, W)
nhwC = nCwh.contiguous(memory_format=torch.channels_last)

new_nCwh = torch.empty_like(nhwC)
new_nCwh.is_contiguous(memory_format=torch.channels_last) == False
```

Now we need a way to preserve memory format in `empty_like`

```python
nCwh = torch.randn(N, C, H, W)
nhwC = nCwh.contiguous(memory_format=torch.channels_last)

new_nhwC = torch.empty_like(nhwC, memory_format=torch.preserve_format)
new_nhwC.is_contiguous(memory_format=torch.channels_last) == True

like_nCwh = torch.empty_like(nCwh, memory_format=torch.preserve_format)
like_nCwh.is_contiguous(memory_format=torch.channels_last) == False
```

Usage of `torch.preserve_format` allows us to avoid `if` constructs.

We can also generate different memory format outputs

```python
nCwh = torch.randn(N, C, H, W)
nhwC = nCwh.contiguous(memory_format=torch.channels_last)

new_nhwC = torch.empty_like(nCwh, memory_format=torch.channels_last)
new_nhwC.is_contiguous(memory_format=torch.channels_last) == True

new_nCwh = torch.empty_like(nhwC, memory_format=torch.contiguous_format)
new_nCwh.is_contiguous(memory_format=torch.channels_last) == False
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20558

Differential Revision: D15502474

Pulled By: VitalyFedyunin

fbshipit-source-id: 2e120d57eefad6fb8e04b8322c79871392f64331
2019-06-26 11:48:27 -07:00
Lara
7b1ffba3bf ArgumentStash for Scalar arguments (#21931)
Summary:
Scalars are being traced as constants.
This PR is to fix this issue.

The ONNX Graph for Test_Full_op() before and after this change:

def Test_Full_op():
  class Test_Full(nn.Module):
    def forward(self, x):
      return torch.full((3, 4), x, dtype=torch.long)
  model = Test_Full()
  x = torch.tensor(12)
  output = model(x)

Before this change:
graph(%input1 : Long()):
%output1 : Float(3, 4) = onnx::Constant[value=<Tensor>]
return (%output1)

After this change:
graph(%input1 : Long()):
%1 : int[] = onnx::Constant[value= 3 4 [ Variable[CPULongType]{2} ]]
%2 : Tensor = onnx::ConstantOfShape[value={0}]
%output1 : Float(3, 4) = onnx::Add(%2, %input1)
return (%output1)

Similar PR : https://github.com/pytorch/pytorch/pull/12939
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21931

Reviewed By: zrphercule

Differential Revision: D15950066

Pulled By: houseroad

fbshipit-source-id: 3470665d88fa34faa600940ef16b069a06002cd5
2019-06-25 15:22:08 -07:00
Richard Zou
4bc89bd5a6 Implement tensor.select(Dimname,int) (#21795)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21795
ghimport-source-id: d13af6078a47de1d6045cfbb7d278c378fe734fe

Test Plan: Imported from OSS

Differential Revision: D15833457

Pulled By: zou3519

fbshipit-source-id: fa52aff25ce0e12f31da3eef83ea948b4f7a5d9f
2019-06-21 16:16:45 -07:00
Jerry Zhang
081cd3a293 Change AT_CHECK to TORCH_CHECK in python_arg_parser.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21887

Differential Revision: D15869483

Pulled By: jerryzh168

fbshipit-source-id: f3d9d73078e7c1c08ec79694105e18084e7f9caf
2019-06-18 10:48:38 -07:00
Jerry Zhang
94f903654c Add qscheme() method (#20608)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20608

Exposing QScheme in python as Python objects like `torch.qscheme.per_tensor_affine` etc.

Reviewed By: zafartahirov

Differential Revision: D15364354

fbshipit-source-id: 4d6a96d67e9ead051cf4a8f934553a8c7232fdb7
2019-06-14 16:29:29 -07:00
Richard Zou
0d6eb209e6 Expose torch.empty(sizes, *, names, ...) to Python (#21648)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21648
ghimport-source-id: 583f155c8ee95967d2f8b9d8df27d94b9e725694

Differential Revision: D15804482

Pulled By: zou3519

fbshipit-source-id: f86520dda479100be2a752e4db8a902167413a83
2019-06-14 11:52:47 -07:00
Zachary DeVito
69aa2b2814 Collapse tracing_state.h into tracer.h (#21563)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21563
ghimport-source-id: de87e5e621da33326a9d2cb8a57d82d355166479

Reviewed By: suo

Differential Revision: D15729499

Pulled By: zdevito

fbshipit-source-id: 17b3e2e71d004f08c4413e80091388ae9ac2df2b
2019-06-09 15:28:29 -07:00
Zachary DeVito
c27cabe2d7 Revert D15719982: Collapse tracing_state.h into tracer.h
Differential Revision:
D15719982

Original commit changeset: 56bb021dd949

fbshipit-source-id: 2eb3e2c9745c35a84ebcc0fc7ac62b5f1fdd6437
2019-06-07 22:20:37 -07:00
Zachary DeVito
8c5f3acfc0 Collapse tracing_state.h into tracer.h (#21513)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21513
ghimport-source-id: 86278929818a8fc65684bd8f2ffac31460772fe9

Reviewed By: jamesr66a

Differential Revision: D15719982

Pulled By: zdevito

fbshipit-source-id: 56bb021dd949668562ea481c5ff0115a9ea2b02e
2019-06-07 20:57:01 -07:00
Brennan Vincent
0a3fb45d3d allow passing Python built-in types as dtypes (#21215)
Summary:
Another simple bit of syntax that NumPy supports and we don't.

Support int, float, and bool.

```python
>>> torch.randn((2,3), dtype=float)
tensor([[-0.1752, -0.3240, -0.6148],
        [ 0.1861,  1.6472,  0.1687]], dtype=torch.float64)
```

A bit confusingly, Python's "float" actually means double, but nothing we can do about that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21215

Differential Revision: D15697012

Pulled By: umanwizard

fbshipit-source-id: 9a38d960a610b8e67023486b0c9265edd3c22246
2019-06-06 13:17:23 -07:00
Edward Yang
c15254d4ab Expunge some more deprecated uses of AT_CHECK.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/21194

Differential Revision: D15576898

fbshipit-source-id: f030195f5bffe0027d4081aece57e2852aaf9ecb
2019-06-05 10:25:25 -07:00
Vitaly Fedyunin
5b78a5eadb Memory format support for contiguous and is_contiguous (#20455)
Summary:
#19975 was separated by 2 PRs.

This one:

Introduce MemoryFormat argument to the `x.is_contiguous(memory_format=torch.channels_last)` and to the `y = x.contiguous(memory_format=torch.channels_last)` functions.

At this moment both functions just operate with strides and doesn't store any tensor state.

(Original RFC #19092)

-----

Expands functionality of two tensor functions `.is_contiguous` and `.contiguous` (both python and c++ api).

Note: We had several complaints about `.to(memory_format)` function, and decided not to support it.

1.  `.contiguous` now support optional keyword-only argument - `memory_format`, which can be either `torch.contiguous_format` or `torch.channels_last`.

    - Using `torch.contiguous_format` will preserve existing `.contiguous()` behavior.

    - Calling `x.contiguous(memory_format=torch.channels_last)` returns new tensor which maintain same semantical layout (NCHW), but have different memory allocation pattern.

        `x.contiguous(memory_format=torch.channels_last)` expects input tensor to be 3d, 4d or 5d; and fails otherwise.

2. `.is_contiguous` now support optional keyword-only argument - `memory_format`, which can be either `torch.contiguous_format` or `torch.channels_last`.

    - `x.is_contiguous(memory_format=torch.contiguous_format)` preserves same functionality as `x.is_contiguous()` and remains unchanged.

    - `x.is_contiguous(memory_format=torch.channels_last)` returns true if A) input tensor is contiguous in memory AND B) allocated in the memory in NWHC (or similar for 3d,5d) format.

Note: By the end of the phase one `x.is_contiguous(memory_format=torch.channels_last)` will calculate state of the Tensor on every call. This functionality going to be updated later.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20455

Differential Revision: D15341577

Pulled By: VitalyFedyunin

fbshipit-source-id: bbb6b4159a8a49149110ad321109a3742383185d
2019-05-16 07:18:24 -07:00
Brennan Vincent
72bb84c518 Provide a few default args for numpy translation (#20451)
Summary:
Add automatic translations for a few argument names that commonly differ between PyTorch and NumPy.

For now, they are as follows:

* `keepdim` -> `keepdims`
* `dim` -> `axis`
* `input` -> (any of `a`, `x`, `x1`)
* `other` -> `x2`

Basic examples:
```python
>>> t=torch.randn(10,10)
>>> torch.sum(x=t, axis=1)
tensor([ 0.5199, -0.3768,  4.3619, -0.9105,  1.1804,  1.0837, -0.9036,  0.2365,
         1.1171, -0.0999])
```
```python
>>> torch.add(x1=5, x2=6)
tensor(11)
```

The additional overhead is zero when using traditional PyTorch argument names, and a few (usually 1) extra PyDict lookups when using NumPy argument names.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20451

Differential Revision: D15337521

Pulled By: umanwizard

fbshipit-source-id: 7a7d389786f4ccf5c86a14ecb2002c61730c51b5
2019-05-15 10:13:17 -07:00
Edward Yang
97e1f07ffc Replace AT_CHECK with TORCH_CHECK [shard 10/10]
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/20436

Reviewed By: jerryzh168

Differential Revision: D15318926

fbshipit-source-id: 71a43070cc50cc174f703ebc595f1d87c6fc1e91
2019-05-15 07:35:37 -07:00
Mikhail Zolotukhin
722eb48ff2 Cleanup includes in torch/csrc/* (#19924)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19924
ghimport-source-id: f7248b16c8e263a7d0ba7975b1fc0b00cb2cf2c0

Differential Revision: D15125018

Pulled By: ZolotukhinM

fbshipit-source-id: 322c7ca53e38ef8b43b5ac5bd747b28bc10379f1
2019-05-06 14:03:18 -07:00
Vitaly Fedyunin
d14abe3aff Add torch.from_file function similar to the Storage.from_file, but returning tensor (#18688)
Summary:
Porting `torch.Storage.from_file(filename, shared, size)` function to `torch.from_file(filename, shared, size, dtype=torch.int)`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18688

Differential Revision: D15012644

Pulled By: VitalyFedyunin

fbshipit-source-id: 3f62ca9e414fad3847fe71b785ff97b5bdc2d2cd
2019-04-24 15:38:56 -07:00
Roy Li
689dd800ed Generate only one Type class per backend (#19295)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19295
ghimport-source-id: 9345110f91f044a449804ddd5116cc9179444a00

Differential Revision: D14948581

Pulled By: li-roy

fbshipit-source-id: a317b03d58d621e8df162918038f7543bfb13ba2
2019-04-21 21:16:14 -07:00
Gao, Xiang
11c89dde55 Allow structseq to be input of operators where tuple is expected (#17208)
Summary:
Currently the following code gives an error on python 2 because `ret` is a structseq which is not a tuple
```python
ret = a.max(dim=0)
ret1 = torch.max(a, dim=0, out=ret)
```

This PR modify tuple check in python arg parser to allow structseq to be input of operators where tuple is expected, which would make the above code work.

Depend on: https://github.com/pytorch/pytorch/pull/17136
Partially fixes: https://github.com/pytorch/pytorch/issues/16813
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17208

Differential Revision: D14280198

Pulled By: VitalyFedyunin

fbshipit-source-id: beffebfd3951c4f5c7c8fe99a5847616a89491f3
2019-03-11 11:33:35 -07:00
Edward Yang
4404762d7d Rename IntList to IntArrayRef. (#16751)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16751

This was made more complicated by the fact that ivalue::IntList
is a thing.  So I had to fix all of the sites where we referring
to IValue post facto.

The following codemods were run, in this order:

```
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in IntList IntArrayRef
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in IntArrayRef::create IntList::create
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in ivalue::IntArrayRef ivalue::IntList
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in Tag::IntArrayRef Tag::IntList
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in isIntArrayRef isIntList
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in toIntArrayRef toIntList
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in 'Shared<IntArrayRef>' 'Shared<IntList>'
codemod -m -d . --extensions cc,cpp,cu,cuh,h,hpp,py,cwrap,yaml,in 'intrusive_ptr<IntArrayRef>' 'intrusive_ptr<IntList>'
```

Some manual fixups were done afterwards; they can be reviewed separately
at https://github.com/pytorch/pytorch/pull/16752

Reviewed By: dzhulgakov

Differential Revision: D13954363

fbshipit-source-id: b5c40aacba042402155a2f5a229fa6db7992ac64
2019-02-05 14:54:34 -08:00
Wanchao Liang
b89b46abfb Remove python_default_init from ATen and use Optional (#15234)
Summary:
Optional clean up. This PR remove python_default_init from the yaml files, and the code-gen, and utilize optional type to do the work.

This also fix the bug in the #13149 to correctly adopt as_strided backward.

Fixes #9941
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15234

Differential Revision: D13502044

Pulled By: wanchaol

fbshipit-source-id: 774b61fc4414482cf11d56e22bd0275aefb352a4
2018-12-19 21:38:50 -08:00
Edward Yang
517c7c9861 Canonicalize all includes in PyTorch. (#14849)
Summary:
Anywhere we used #include "foo.h", we now say #include <foo.h>
Paths are adjusted to be rooted out of aten/src, torch/lib, or
the root level directory.

I modified CMakeLists.txt by hand to remove TH and THC from
the include paths.

I used the following script to do the canonicalization:

```
  import subprocess
  import re
  import os.path

  files = subprocess.check_output(['git', 'ls-files']).decode('utf-8').rstrip().split('\n')
  for fn in files:
      if not any(fn.endswith(suff) for suff in ['.cu', '.cpp', '.in', '.h', '.hpp', '.cu', '.cuh', '.cc']):
          continue
      if not any(fn.startswith(pref) for pref in ["aten/", "torch/"]):
          continue
      with open(fn, 'r') as f:
          c = f.read()
      def fmt(p):
          return "#include <{}>".format(p)
      def repl(m):
          p = m.group(1)
          if p in ["dlfcn.h", "unistd.h", "nvrtc.h", "cuda.h", "cuda_runtime.h", "cstdint", "cudnn.h", "Python.h", "cusparse.h", "cuda_runtime_api.h", "cuda_fp16.h", "cublas_v2.h", "stdint.h", "curand_kernel.h"]:
              return fmt(p)
          if any(p.startswith(pref) for pref in ["torch/csrc", "c10/", "ATen/", "caffe2/", "TH/", "THC/", "Eigen/", "gtest/", "zdl/", "gloo/", "onnx/", "miopen/"]):
              return fmt(p)
          for root in ["aten/src", "torch/lib", ""]:
              for bad_root in [os.path.dirname(fn), "aten/src/TH", "aten/src/THC", "torch/csrc"]:
                  new_p = os.path.relpath(os.path.join(bad_root, p), root)
                  if not new_p.startswith("../") and (os.path.exists(os.path.join(root, new_p)) or os.path.exists(os.path.join(root, new_p + ".in"))):
                      return fmt(new_p)
          print("ERROR: ", fn, p)
          return m.group(0)
      new_c = re.sub(r'#include "([^"]+)"', repl, c)
      if new_c != c:
          print(fn)
          with open(fn, 'w') as f:
              f.write(new_c)
```

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14849

Reviewed By: dzhulgakov

Differential Revision: D13363445

Pulled By: ezyang

fbshipit-source-id: 52361f878a672785f9306c9e9ab2513128092b68
2018-12-08 19:38:30 -08:00
Peter Goldsborough
d6c53328f9 Large scale fix of python-related files in torch/csrc/
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14515

Differential Revision: D13247966

Pulled By: goldsborough

fbshipit-source-id: 7a127c508fc576a7a92626dd6b729f660162d628
2018-12-07 13:04:46 -08:00
Junjie Bai
ba0ebe33c1 Unify device argument parsing between torch and c10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14786

Differential Revision: D13334501

Pulled By: bddppq

fbshipit-source-id: ae3536be1fe0dcd6a1552ec93629ecc9554c0d7c
2018-12-05 18:37:32 -08:00
Peter Goldsborough
875be849e9 Rename _local_scalar to item() (#13676)
Summary:
Make `at::_local_scalar` more "official" by renaming it to `item()`.

gchanan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13676

Differential Revision: D13003020

Pulled By: goldsborough

fbshipit-source-id: 0ac25f5237fb81a1576304a0a02f840ff44168a4
2018-12-04 13:19:26 -08:00
albanD
6c8ac50753 Fix exception catching to catch c10::Error properly (#13665)
Summary:
In particular, this was breaking the logic for cudnn algorithm to fall back to a less memory hungry algorithm if the selected one OOM when creating the workspace.
c10::Error are subclass of `std::exception` and not `std::runtime_error`.

I removed `runtime_error` in all places in our code and replaced them with `const exception`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13665

Differential Revision: D12958396

Pulled By: soumith

fbshipit-source-id: af557efd9887b013140113d3067de157ffcf8465
2018-11-07 11:22:48 -08:00
Sam Gross
a4f00c3d1e Fix error message in tensorlist()
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13392

Differential Revision: D12860921

Pulled By: colesbury

fbshipit-source-id: 86da3ef15d70b0343dc922a3842449001c1afffa
2018-10-31 11:19:56 -07:00
James Reed
db0b5c7ab7 ArgumentStash for int64_t arguments (#12939)
Summary:
Closes https://github.com/pytorch/pytorch/issues/12906. https://github.com/pytorch/pytorch/issues/12580 is still open because the schema is marked as `traceable=false` in the arg parser constructor, I think.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12939

Differential Revision: D10492031

Pulled By: jamesr66a

fbshipit-source-id: ca5376de3997b5fb62b493e2e6a9bb0d6c3b9687
2018-10-29 13:55:24 -07:00
Wanchao Liang
4e1c64caee Add c10::optional to type syntax (#12582)
Summary:
This PR adds optional type to ATen native, autograd, JIT schema and Python Arg parser, closes #9513. It allows us to use optional default values (including None) for function signature and implementations like clamp, etc., and also let us remove the python_default_init hack.

Follow up:

remove python_default_init completely.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12582

Differential Revision: D10417423

Pulled By: wanchaol

fbshipit-source-id: 1c80f0727bb528188b47c595629e2996be269b89
2018-10-25 16:08:29 -07:00
Yangqing Jia
713e706618 Move exception to C10 (#12354)
Summary:
There are still a few work to be done:

- Move logging and unify AT_WARN with LOG(ERROR).
- A few header files are still being plumbed through, need cleaning.
- caffe2::EnforceNotMet aliasing is not done yet.
- need to unify the macros. See c10/util/Exception.h

This is mainly a codemod and not causing functional changes. If you find your job failing and trace back to this diff, usually it can be fixed by the following approaches:

(1) add //caffe2/c10:c10 to your dependency (or transitive dependency).
(2) change objects such as at::Error, at::Optional to the c10 namespace.
(3) change functions to the c10 namespace. Especially, caffe2::MakeString is not overridden by the unified c10::str function. Nothing else changes.

Please kindly consider not reverting this diff - it involves multiple rounds of rebasing and the fix is usually simple. Contact jiayq@ or AI Platform Dev for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/12354

Reviewed By: orionr

Differential Revision: D10238910

Pulled By: Yangqing

fbshipit-source-id: 7794d5bf2797ab0ca6ebaccaa2f7ebbd50ff8f32
2018-10-15 13:33:18 -07:00
Christian Puhrsch
a9e6a673ae Remove caffe2::Tensor::capacity_nbytes, at::Tensor::to##name##Data, (#11876)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11876

Modern C++ api instead of macros, item() is aligned with Python frontend. caffe2::Tensor::capacity_nbytes is effecitvely unused and confusing w.r.t. caffe2::Tensor::nbytes().

codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCByte   "item<uint8_t>"
codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCLong   "item<int64_t>"
codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCInt    "item<int32_t>"
codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCDouble "item<double>"
codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCFloat  "item<float>"

codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toByteData   "data<uint8_t>"
codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toLongData   "data<int64_t>"
codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toIntData    "data<int32_t>"
codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toDoubleData "data<double>"
codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toFloatData  "data<float>"

codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCByte   "item<uint8_t>"
codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCLong   "item<int64_t>"
codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCInt    "item<int32_t>"
codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCDouble "item<double>"
codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCFloat  "item<float>"

codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toByteData   "data<uint8_t>"
codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toLongData   "data<int64_t>"
codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toIntData    "data<int32_t>"
codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toDoubleData "data<double>"
codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toFloatData  "data<float>"

codemod -d caffe2 --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCComplexDouble "item<std::complex<double>>"

codemod -d tc           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCFloat  "item<float>"

Reviewed By: ezyang

Differential Revision: D9948572

fbshipit-source-id: 70c9f5390d92b82c85fdd5f8a5aebca338ab413c
2018-09-24 10:40:10 -07:00
Sebastian Messmer
f4d9fe395d Remove intrusive_ptr::reclaim() in Storage (#11352)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11352

Pushing manual refcounting further back, making things safer.

Reviewed By: ezyang

Differential Revision: D9694327

fbshipit-source-id: befdbcac199225383a93520472ee7c6511a0e9cd
2018-09-14 16:57:10 -07:00
Roger-luo
5bc90b8554 support conversion and dispatch of complex numbers (#11603)
Summary:
- Just a simple fix to support `fill_`
- And a fix for indexing in `pytorch-complex`

Differential Revision: D9804061

Pulled By: ezyang

fbshipit-source-id: 631129b3fa220a9670770b3766f14a8e03633bdf
2018-09-13 11:10:37 -07:00
Adam Paszke
3e665cc29b Improve support for tracing sizes, add more tracer warnings (#11288)
Summary:
Many constructors like `torch.zeros` or `torch.randn` didn't support
size tracing correctly which is fixed by this pass. Same issue has been
fixed in legacy tensor constructors.

Additionally, new tensor constructors, which do not participate in
tracing (most notably `torch.tensor`, `torch.as_tensor` and
`torch.from_numpy`) raise a warning when they are used.

Finally, entering a traceable operation disables the tracing in its body.
This is needed because

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11288

Reviewed By: ezyang

Differential Revision: D9751183

Pulled By: apaszke

fbshipit-source-id: 51444a39d76a3e164adc396c432fd5ee3c8d5f7f
2018-09-10 15:22:48 -07:00
Peter Goldsborough
fb4e8088f3 Remove methods that start with an underscore from at::Tensor (#11152)
Summary:
This PR cleans up the `at::Tensor` class by removing all methods that start with an underscore in favor of functions in the `at::` namespace. This greatly cleans up the `Tensor` class and makes it clearer what is the public and non-public API.

For this I changed `native_functions.yaml` and `Declarations.cwrap` to make all underscore methods `variant: function` (or add such a statement to begin with), and then fixed all code locations using the underscore methods.

ezyang colesbury gchanan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11152

Differential Revision: D9683607

Pulled By: goldsborough

fbshipit-source-id: 97f869f788fa56639c05a439e2a33be49f10f543
2018-09-07 11:55:11 -07:00
Gregory Chanan
6219c4a28f Make Scalar::toTensor a free function, move Scalar to ATen/core.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11125

Reviewed By: ezyang

Differential Revision: D9599798

Pulled By: gchanan

fbshipit-source-id: 2fec682c109013a82788dfba13f4d30b2945d3f4
2018-09-04 16:25:57 -07:00
Edward Yang
f7b02b3a68 Change Tensor/TensorImpl to use c10::intrusive_ptr (#10824)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10824

API additions:
- Tensor(c10::intrusive_ptr<TensorImpl,UndefinedTensor>&&)
- Tensor(const c10::intrusive_ptr<TensorImpl,UndefinedTensor>&)
- Tensor::operator=(Tensor&&) && (for completeness sake)
- TensorBase::unsafeGetTensorImpl()
- TensorBase::unsafeReleaseTensorImpl()
- TensorBase::getIntrusivePtr()
- TensorImpl::type_id()
- Tensor::set_data()
- Tensor::is_same(Tensor)
- Tensor::use_count()
- Tensor::type_id()
- Tensor::scalar_type()
- WeakTensor::is_same(WeakTensor)
- intrusive_ptr::weak_use_count()
- weak_intrusive_ptr::weak_use_count()
- c10::raw::intrusive_ptr::{incref,decref,make_weak}
- c10::raw::weak_intrusive_ptr::{incref,decref,lock}

API changes:
- Tensor::pImpl is no longer public (and now named tensor_impl_)
    - Most methods accessed this way are now accessible on Tensor
      maybe_zero_dim() and set_wrapped_number() being prominent exceptions
      (they are now accessed through unsafeGetTensorImpl())
- Type is no longer friend of Tensor
- TensorBase::reset(TensorImpl*) is deleted
- TensorBase::reset(TensorImpl*, bool should_retain) is deleted
- TensorBase::swap(TensorBaseImpl&) is deleted; use std::swap instead
- TensorBase::get() is deleted; use unsafeGetTensorImpl() instead
- TensorBase::detach() is deleted; use unsafeReleaseTensorImpl() instead
- TensorBase::retain() is deleted; use _raw_incref() instead
- TensorBase::release() is deleted; use _raw_decref() instead
- WeakTensor lost most of its methods (it no longer inherits from
  TensorBase)
- TensorImpl::storage() is now a const method
- Tensor(TensorBase) constructor removed, instead
  we go through getIntrusivePtr().  I'm not sure about
  this change; I happened to have accidentally removed the
  TensorBase constructor and decided to fix call sites,
  but I could go the other way.
- detail::set_data() is deleted; use Tensor::set_data() instead
- c10::raw_intrusive_ptr_target removed; use the functions in c10::raw instead.
  (The reason for this change, is that it is invalid to cast an intrusive_ptr_target*
  to a raw_intrusive_ptr_target* to take advantage of the methods. But there is
  no reason the incref/decref methods shouldn't also work on intrusive_ptr_target;
  it is primarily an API consideration. We can be more standards compliant by
  keeping them as functions, which are universally applicable.)
- intrusive_ptr::reclaim() and weak_intrusive_ptr::reclaim() now work on
  pointers of the NullType. (This counts as a bug fix, because the documentation
  specified that pointers produced by release() are valid to reclaim(), and
  a release() on a null intrusive_ptr produces the NullType::singleton())

Bug fixes:
- Dispatch code for mutable references incorrectly returned
  a reference to a value argument (which would immediately
  go out of scope).  They now correctly return a tensor by
  value.
- intrusive_ptr copy/move assignment did not work correctly when
  an object was assigned to itself. We now check for this case and
  no-op if so. (This bug manifested itself as a Tensor mysteriously
  becoming an UndefinedTensor after lines of code like
  'x = x.mul_(y)')

Other changes:
- The checked cast functions in Utils.h have now been
  renamed and detemplatized into checked unwrap functions.
- Added type_id() and scalar_type() methods to Tensor
- pImpl is no longer public
- Documented what the && overloads are doing
- All occurrences of 'new TensorImpl' (and similar spellings, like 'new THTensor')
  have been expunged. This is NO LONGER a valid way to create a new
  tensor, and if you do this, upon your first incref, you will catch an ASSERT
  failure saying that only tensors created by intrusive_ptr::release() are valid
  to reclaim(). Use c10::make_intrusive instead in this situation.
- IValue is adjusted to use intrusive_ptr instead of Retainable, and all
  other sub-classes of Retainable were modified to use intrusive_ptr.
  When doing this, I had to make the constructors of sub-classes like
  ConstantList public, so that c10::make_intrusive could invoke them.  Fortunately,
  if you incorrectly stack allocate a ConstantList, and then try to get an
  intrusive_ptr to it, it will fail, as stack allocated ConstantLists have refcount 0.
- IValue very narrowly sidesteps the problem of handling NullType, as it
  considers intrusive_ptr<TensorImpl> identical to intrusive_ptr<TensorImpl, UndefinedTensor>
  which is not always true. This was always the case, but there's now a comment
  explaining what's going on.

Some MSVC bugs were uncovered during the preparation of this patch.
They are documented as comments in the code.

Reviewed By: gchanan

Differential Revision: D9481140

fbshipit-source-id: 14a8ea0c231ed88b5715fb86d92730926f9f92fc
2018-08-27 16:11:01 -07:00
Gregory Chanan
87a7840fa6 Remove Tensor constructor of Scalar. (#10852)
Summary:
This is along the way of removing Tensor as a member of the tagged union in Scalar.  This simplifies ordering dependencies, because currently Scalar and Tensor both depend on each other (so we introduce a TensorBase).  Also, this API isn't particularly useful publicly: we can't autograd through Scalars, so you still need a Tensor overload basically everywhere anyway.

I'm undecided what the final API should be here.  We could keep a Tensor constructor on Scalar, but have it generate a local scalar; this is convenient but given this API used to be non-synchronizing, it may not be the best.

For now, I'm just using _local_scalar, which is clear, although we should get rid of the prefix _ if that's the API we intend to promote.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10852

Reviewed By: ezyang

Differential Revision: D9496766

Pulled By: gchanan

fbshipit-source-id: 16f39b57536b9707132a5a4d915650c381bb57db
2018-08-24 16:02:05 -07:00
Edward Yang
19031c68dc Use intrusive_ptr in Storage; replace unique_ptr<Storage> with Storage (#10488)
Summary:
```
Use intrusive_ptr in Storage; replace unique_ptr<Storage> with Storage

This patch does two major changes:

- It replaces the use of Retainable in Storage with a new implementation
  based on intrusive_ptr.  This will be necessary because Caffe2 will
  be using this class to implement intrusive_ptrs, and we need to
  line these up for the merge.  One good thing about the new implementation is
  that the default copy/move constructors/assignment operators and destructor
  work automatically, instead of needing to be hardcoded into Storage/Tensor.

- It replaces all places where we returned std::unique_ptr<Storage> with
  Storage, collapsing an unnecessary double indirection that is no longer
  necessary now that we have correctly working copy/move constructors.

I didn't initially want to do step (2), but it was very important to
eliminate all bare uses of new Storage and new StorageImpl, and this making
the API change was the most straightforward way to do this.

HOW TO FIX YOUR CODE IN THE NEW API

- You no longer need to dereference the result of tensor.storage() to pass
  it to set.  So, instead of:

      x.set_(*y.storage());

  just write:

      x.set_(y.storage());

- If you were accessing methods on StorageImpl via the pImpl() method, you
  must use the dot operator to run pImpl().  Even better; just drop pImpl,
  we now have method forwarding.  So, instead of:

      storage->pImpl()->data();

  just do:

      storage->data();
      // storage.pImpl()->data() works too but is not as recommended

- storage->getDevice() is no more; instead use storage->device().index()

MISC CODE UPDATES

- retain, release, weak_retain, weak_release and weak_lock are now
  reimplemented using the "blessed API", and renamed to make it
  clearer that their use is discouraged.

- nvcc OS X and general OS X portability improvements to intrusive_ptr

- A new comment in intrusive_ptr describing how stack allocated
  intrusive_ptr_targets work differently than heap allocated ones
  from c10::make_intrusive

CAVEAT EMPTOR

- THStorage_weakRetain used to work on strong pointers, but it NO LONGER
  works with intrusive_ptr.  You must reclaim the strong pointer into a
  real strong pointer, construct a weak pointer from it, and then release
  the strong and weak pointers.  See StorageSharing.cpp for an example.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10488

Reviewed By: gchanan

Differential Revision: D9306134

Pulled By: ezyang

fbshipit-source-id: 02d58ef62dab8e4da6131e1a24834a65c21048e2
2018-08-21 21:39:55 -07:00
Edward Yang
6bdbad93b9 Refactor Device to not depend on Backend. (#10478)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10478

- Removed Backend constructor from Device, and fixed all
  use-sites to use DeviceType::CPU instead of kCPU, or
  use a new function backendToDeviceType to perform
  the conversion.
- New method device_type() on Type; it gives you the
  underlying device type, e.g., CPU for SparseCPU.
- We add backward compatibility for kCPU/kCUDA uses,
  by introducing a new special type which is implicitly
  convertible to both DeviceType and Backend.  As long as
  you don't define a function that's overloaded on both
  DeviceType and Backend (but not on BackendOrDeviceType),
  the implicit conversions will ensure that uses
  of at::Device(at::kCPU) keep working. We fixed use-sites in
  the library, but did NOT fix sites in the test code, so that
  we can exercise this BC code.

Reviewed By: Yangqing

Differential Revision: D9301861

fbshipit-source-id: 9a9d88620500715c7b37e655b4fd761f6dd72716
2018-08-18 17:39:14 -07:00
Peter Goldsborough
04939a4745 Match parameter names and = default (#9737)
Summary:
More clang tidy cleanups in `torch/csrc`. This time:

1. `hicpp-use-equals-default` recommends `= default` instead of `{}` for constructors/destructors. This is better practice because it expresses the intent better (https://stackoverflow.com/questions/6502828/what-does-default-mean-after-a-class-function-declaration)
2. `readability-inconsistent-declaration-parameter-name` enforces that parameter names in the declaration match parameter names in the definition. This is just generally useful and can prevent confusion and bugs.

Also updated my script a little bit.

apaszke ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9737

Differential Revision: D9069069

Pulled By: goldsborough

fbshipit-source-id: f7b3f3a4eb4c9fadc30425a153566d3b613a41ae
2018-07-30 14:10:00 -07:00
Sam Gross
829d763c69 Implement add, sub, mul, div using TensorIterator (#8919)
Summary:
```
This adds TensorIterator, a helper class for computing element-wise
operations that's intended to replace the CPU and CUDA apply utils
functions.

CPU kernels are implemented as functions that operate on strided 1-d
tensors compared to CPUApplyUtils which operated individual elements. This
allows the kernels to handle vectorization, while TensorIterator handles
parallelization and non-coalesced dimensions.

GPU kernels continue to operate on elements, but the number of
specializations is reduced. The contiguous case remains the same. The
non-contiguous case uses a single (reduced) shape for all operands and
the fast integer division from THCIntegerDivider. To avoid extra
specializations for indexing with 64-bits, large operations are split
into smaller operations that can be indexed with 32-bits.

Major semantic changes:

 - No more s_add, s_mul, s_div, or s_sub. Broadcasting is handled by
   TensorIterator. The autograd engine performs the reduction assuming
   standard broadcasting if the gradient shape does not match the
   expected shape. Functions that do not use standard broadcasting rules
   should either continue to trace the expand calls or handle the
   reduction in their derivative formula.

 - Use ONNX v7, which supports broadcasting ops.

Performance impact:

 - Small increased fixed overhead (~0.5 us)
 - Larger overhead for wrapped numbers (~2.5 us)
 - No significant change for ops on contiguous tensors
 - Much faster worst-case performance for non-contiguous GPU tensors
 - Faster CPU bias addition (~2x)
 - Faster GPU bias addition (~30% faster)

Future work:

 - Decrease overhead, especially for wrapping numbers in Tensors
 - Handle general inter-type operations
 - Extend to unary ops and reductions
 - Use buffering for compute-bound operations on non-contiguous tensors
   (pull in from CPUApplyUtils)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8919

Differential Revision: D8677600

Pulled By: colesbury

fbshipit-source-id: 61bc9cc2a36931dfd00eb7153501003fe0584afd
2018-07-27 14:43:24 -07:00
Peter Goldsborough
47492ed451
[C++ API] Bag of fixes (#8843)
* Bag of fixes

* Rename tensor_range.h to tensor_list_view.h

* Post rebase fixes

* Rename torch::tensor namespace to torch::tensors due to name conflict

* Avoid recursion in Module::to
2018-06-25 21:11:49 -07:00
Peter Goldsborough
372d1d6735
Create ATen tensors via TensorOptions (#7869)
* Created TensorOptions

Storing the type in TensorOptions to solve the Variable problem

Created convenience creation functions for TensorOptions and added tests

Converted zeros to TensorOptions

Converted rand to TensorOptions

Fix codegen for TensorOptions and multiple arguments

Put TensorOptions convenience functions into torch namespace too

All factory functions except *_like support TensorOptions

Integrated with recent JIT changes

Support *_like functions

Fix in place modification

Some cleanups and fixes

Support sparse_coo_tensor

Fix bug in Type.cpp

Fix .empty calls in C++ API

Fix bug in Type.cpp

Trying to fix device placement

Make AutoGPU CPU compatible

Remove some auto_gpu.h uses

Fixing some headers

Fix some remaining CUDA/AutoGPU issues

Fix some AutoGPU uses

Fixes to dispatch_tensor_conversion

Reset version of new variables to zero

Implemented parsing device strings

Random fixes to tests

Self review cleanups

flake8

Undo changes to variable.{h,cpp} because they fail on gcc7.2

Add [cuda] tag to tensor_options_cuda.cpp

Move AutoGPU::set_index_from into .cpp file because Windows is stupid and sucks

Fix linker error in AutoGPU.cpp

Fix bad merge conflict in native_functions.yaml

Fixed caffe2/contrib/aten

Fix new window functions added to TensorFactories.cpp

* Removed torch::TensorOptions

Added code to generate wrapper functions for factory methods

Add implicit constructor from Backend to TensorOptions

Remove Var() from C++ API and use torch:: functions

Use torch:: functions more subtly in C++ API

Make AutoGPU::set_device more exception safe

Check status directly in DynamicCUDAHooksInterface

Rename AutoGPU to DeviceGuard

Removed set_requires_grad from python_variables.h and warn appropriately in Variable::set_requires_grad

remove python_default_init: self.type()

Add back original factory functions, but with deprecation warnings

Disable DeviceGuard for a couple functions in ATen

Remove print statement

Fix DeviceGuard construction from undefined tensor

Fixing CUDA device compiler issues

Moved as many methods as possible into header files

Dont generate python functions for deprecated factories

Remove merge conflict artefact

Fix tensor_options_cuda.cpp

Fix set_requires_grad not being checked

Fix tensor_new.h

TEMPORARILY put some methods in .cpp files to see if it solves issues on windows and mac

Fix bug in DeviceGuard.h

Missing includes

TEMPORARILY moving a few more methods into .cpp to see if it fixes windows

Fixing linker errors

* Fix up SummaryOps to use new factories

Undo device agnostic behavior of DeviceGuard

Use -1 instead of optional for default device index

Also move DeviceGuard methods into header

Fixes around device index after optional -> int32_t switch

Fix use of DeviceGuard in new_with_tensor_copy

Fix tensor_options.cpp

* Fix Type::copy(

* Remove test_non_float_params from ONNX tests

* Set requires_grad=False in ONNX tests that use ints

* Put layout/dtype/device on Tensor

* Post merge fixes

* Change behavior of DeviceGuard to match AutoGPU

* Fix C++ API integration tests

* Fix flip functions
2018-06-16 00:40:35 -07:00
Edward Z. Yang
711e5a6ceb
Port THS to ATen. (#8409)
* Port THS to ATen.

The basic structure of the patch:

- All kernels in aten/src/THS got rewritten as native
  functions in aten/src/ATen/native/sparse

  I took the liberty to rename some of the kernels,
  opting for a longer, more transparent names than
  things like 'spaddcmul'.

- Instead of holding fields for sparse tensor in the TH
  C struct THSTensor, they are now held in a C++ class
  SparseTensorImpl (this explains why I had to do this
  all in one go; I can't have *two* reps for sparse
  tensors!)

  Along the way, we change a key internal representation
  invariant: an "empty" sparse tensor has dimI == 1 and
  dimV == 0 (this is different from dimI == 0 and dimV == 0
  we had before); this ensures that we maintain the invariant
  that dim == dimI + dimV.  "Scalar" sparse tensors are
  made illegal, because there really is no way to properly
  express them in COO format.

- Because we haven't ported THCS or any of the traditional
  dense TH implementations, there is a new set of adapter
  functions in native/LegacyBridge.cpp exclusively devoted
  to deciding whether or not to go to the new native implementation
  or back to the legacy TH binding (prefixed with th_).
  The intent is that when everything gets ported, we can
  delete this file.

- I've kept the stubs for all the THS functions, but they now all
  error if you try to actually call them.  Eventually, we should
  replace these with calls to ATen so that everything keeps
  working.

- I gobbled up SparseMM (SparseMM.cpp is no more). It was tasty.

There are some miscellaneous improvements which were needed for other
changes in this patch:

- There is now AT_FORALL_SCALAR_TYPES_EXCEPT_HALF, which does what
  it says on the tin.

- axpy templated function moved to TH/BlasUtils.h, there's a new macro
  which lets you easily forward to all of the TH functions. We also expose
  THBlas_copy.  I'm not terribly pleased with these functions but
  they seem to serve a purpose they need.

- New method on Tensor to get TensorImpl*, unsafeGetTensorImpl

- accessor() is now this-const, since const-correctness on Tensor is a lie

- New toSparse()/toDense() methods on Type; now you can call these
  directly without having to manually apply at::toSparse/toDense
  on the Backend and then running toBackend yourself.

Changes to the kernels:

- Previously, the whole body of all kernels was compiled for
  every supported scalar type.  In our new implementation,
  the scalar dispatch has been pushed into the smallest extent
  which (1) is not in a type loop and (2) requires statically
  knowing the scalar type.  These sites all use
  AT_DISPATCH_ALL_TYPES.  I tried to use lambdas as much as
  possible, but sometimes it was not possible when a OpenMP
  pragma was used.

- Anywhere we tested if the nDimension of a tensor was zero,
  we replaced with a test that numel is zero.  Because, as we
  known, nDimension of zero-size tensors in TH is zero, and
  that's wrong wrong wrong (and not done this way in ATen).

Some subtleties:

- Places where previously fastget1d was used, I now use a
  TensorAccessor.  However, you have to be careful about grabbing
  the accessor, because sometimes you will be accessor'ing
  indices/values and they are empty, which means they will
  be *1D* ("oh, aren't indices always 2D?" Nope. Nyet.)
  So, essentially, it is only safe to grab an accessor *after*
  you have checked that nnz != 0.  All of these shenanigans
  will go away when we properly support zero-size dimensions.

  A few places, we test for this case just by wrapping the loop
  in a conditional on nnz.  Some other places this is not so easy,
  so we instead short-circuit the function with a special case for
  when nnz == 0 (usually, these implementations are degenerate).

- There is a very subtle but important difference between
  _sparse_get_impl(self)->indices() and self._indices();
  the latter may return a view!  This is because nnz is
  not guaranteed to match the dimensions of indices/values;
  you can "truncate" a sparse tensor by setting the nnz.
  Actually, I think this is not a good idea and we should
  enforce a stronger invariant, but for this patch I slavishly
  adhere to the old ways, and as such I have to be very
  careful if I want to resize something, I had better use
  the former and not the latter.

- I had to reimplement broadcasting by hand (thus the s_
  and non-s_ functions in the sparse native files).  There
  is a very important distinction between foo_out and foo_,
  so it is important that the LegacyBridge function always
  call to the lower layer, and not try to avoid boilerplate
  by calling to another LegacyBridge function first.
  I did NOT put broadcasting in LegacyBridge (even though,
  ultimately, that's where it must live), because the th_
  functions which are invoked from LegacyBridge handle
  broadcasting themselves, and I don't want to broadcast
  twice.

- Sparse function MUST explicitly specify the Type they
  dispatch from, otherwise Variable wrapping/unwrapping will
  not work correctly.  If you use _get_sparse_impl, that is
  sufficient to levy this requirement.

- The "has native" tests in LegacyBridge.cpp are not 100%,
  because some of the functions are mixed dense-sparse functions,
  and so you can't just say, "Oh, if it's sparse and CPU, call
  the native sparse implementation."  This is handled on a
  case by case basis.  There is some especially complex
  logic for add(), which has dense-dense, sparse-sparse
  and dense-sparse implementations.

- I added some uses of SparseTensorRef in native_functions.yaml,
  but you will notice that these are all on native_* functions,
  and not the actual, top-level functions.  So the SparseTensorRef
  is purely documentary (helping you not call the wrong overload)
  but there is no magic; we do the wrapping ourselves the hard
  way. (This is in constrast to the TH binding code which is magical.)
  Except for _sparse_mask; _sparse_mask is magical.

- There is a raw_copy_sparse_ method, which is really my way of
  getting around the fact that copy_ has never been implemented
  for sparse tensors (even before this patch), but there IS a
  super secret, internal way of doing these copies that the THS
  code used, and which I needed to get my hands on when I did this
  port.  We should refactor so that either (a) copy_ does support
  sparse-sparse copy natively, or (b) we do this other ways.

- Irritatingly, I must explicitly resize_as_ before copy_ into
  a tensor.  This was not the case with THTensor_(copy) but I don't
  have any direct binding that doesn't have this requirement.

- For some reason, the sparse tensor constructor accepts a scalar
  tensor for the values tensor.  This is kind of weird because
  you always need an nnz-dimension.  However, the old code supported
  this and just expanded it into a 1D size 0 tensor; so we need some
  explicit code to do this.

There are maybe a bit more AT_ASSERTs in some of the kernels
than is wise.  I added them all when I was debugging and was
loathe to remove them.

Some last mile fixes after this commit went into PR

- Move expand outside of dispatch so autograd works (it used to be inside and then we lost all of the recorded broadcasts).
- Hack to duplicate the derivatives for our now two definitions TH and native. Mercifully the derivatives are short.
- Apparently, TH has a special case to make foo_ functions method only, and if you don't do this the Python arg parsing is wrong. We carefully work around this in the native bindings
- Apply DCE to a test_jit case, fixes wobbling due to DCE trick in tracing
- Update test_function's output
- Some last mile fixes for dispatch confusion in sparse_coo_tensor functions.
- New simplified regression test based on failures I saw in ONNX
- Increase tolerance on super resolution test
- More robust dynamic_type normalization, fixes ONNX bug.
  The dynamic_type situation is very delicate; probably need
  to stop having both Scalar and real.
- Make new_with_tensor_sparse more CUDA safe
- Note about CUDA-safety in SparseTensorImpl
- Rename dimI/dimV to sparseDims/denseDims.
- Make localScalar on SparseTensorImpl work.
- Make numel uniformly supported on all types, not just dense
  types
- Add tests for is_nonzero() method (which exercises localScalar)
- Disable constant JIT autogenerated tests, which are fragile and broken
  by this change, but being fixed in a parallel track.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-06-15 17:52:21 -04:00