Commit Graph

525 Commits

Author SHA1 Message Date
Jacie Fan
a7796bc24d CUDA histogram implementation
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15842

Reviewed By: zou3519

Differential Revision: D13868982

Pulled By: jaciefan

fbshipit-source-id: bce81dc121c4538d204047506f8f14d0b4d8f905
2019-01-30 11:36:20 -08:00
Sebastian Messmer
7c66ad7455 Add test case for calling c10 ops from pytorch
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16062

Reviewed By: ezyang

Differential Revision: D13628955

fbshipit-source-id: f6ed3f07db2675bd9ae9251da990ca7b8c963717
2019-01-29 18:22:52 -08:00
SsnL
ded6fb0293 Add stack & cat support for CPU Half (#16389)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/6968

Needed for #14705
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16389

Differential Revision: D13861446

Pulled By: gchanan

fbshipit-source-id: 7b8700b95aaf252d9669693dbddccb2302e58409
2019-01-29 13:06:29 -08:00
Junjie Bai
17d7818578 Fix lint errors introduced in pytorch/pytorch@ceece5d (#16454)
Summary:
ifedan

```
./test/common_utils.py:748:1: E302 expected 2 blank lines, found 1
./test/test_torch.py:1235:5: E303 too many blank lines (2)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16454

Differential Revision: D13844905

Pulled By: bddppq

fbshipit-source-id: 3dc7c740d86310a8efc9864d7c7798fda8257a21
2019-01-28 11:29:11 -08:00
Igor Fedan
ceece5dd0f CPU implementation of torch.cdist (#16168)
Summary:
cdist is used for calculating distances between collections of observations.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16168

Differential Revision: D13739147

Pulled By: ifedan

fbshipit-source-id: 9419c2c166891ac7db40672c72f17848f0b446f9
2019-01-28 09:16:32 -08:00
Xiang Gao
c5e1b469be Return namedtuples from torch.* function with multiple return arguments for C++ operators (#15429)
Summary:
Partially fixes: https://github.com/pytorch/pytorch/issues/394

Implementation detail:

Codegen is modified to generate codes that looks like below:
```C++
static PyObject * THPVariable_svd(PyObject* self_, PyObject* args, PyObject* kwargs)
{
  HANDLE_TH_ERRORS
  static PythonArgParser parser({
    "svd(Tensor input, bool some=True, bool compute_uv=True, *, TensorList[3] out=None)",
  }, /*traceable=*/true);

  ParsedArgs<6> parsed_args;
  auto r = parser.parse(args, kwargs, parsed_args);
  static PyStructSequence_Field fields0[] = {
    {"U", ""}, {"S", ""}, {"V", ""}, {nullptr}
  };
  static PyStructSequence_Desc desc0 = {
    "torch.return_types.svd_out", nullptr,
    fields0, 3
  };
  static PyTypeObject type0;
  static bool namedtuple_type_initialized0 = false;
  if (!namedtuple_type_initialized0) {
    PyStructSequence_InitType(&type0, &desc0);
    namedtuple_type_initialized0 = true;
  }
  static PyStructSequence_Field fields1[] = {
    {"U", ""}, {"S", ""}, {"V", ""}, {nullptr}
  };
  static PyStructSequence_Desc desc1 = {
    "torch.return_types.svd", nullptr,
    fields1, 3
  };
  static PyTypeObject type1;
  static bool namedtuple_type_initialized1 = false;
  if (!namedtuple_type_initialized1) {
    PyStructSequence_InitType(&type1, &desc1);
    namedtuple_type_initialized1 = true;
  }
  if (r.idx == 0) {
    if (r.isNone(3)) {
      return wrap(&type1, dispatch_svd(r.tensor(0), r.toBool(1), r.toBool(2)));
    } else {
      auto results = r.tensorlist_n<3>(3);
      return wrap(&type0, dispatch_svd(r.tensor(0), r.toBool(1), r.toBool(2), results[0], results[1], results[2]));
    }
  }
  Py_RETURN_NONE;
  END_HANDLE_TH_ERRORS
}
```
Types are defined as static member of `THPVariable_${op_name}` functions, and initialized at the first time the function is called.

When parsing function prototypes in `native_functions.yaml`, the parser will set the specified name as `field_name` when see things like `-> (Tensor t1, ...)`. These field names will be the field names of namedtuple. The class of namedtuples will be named `torch.return_types.${op_name}`.

In some python 2, `PyStructSequence` is not a subtype of tuple, so we have to create some functions to check if an object is a tuple or namedtuple for compatibility issue.

Operators in `native_functions.yaml` are changed such that only `max` and `svd` are generated as namedtuple. Tests are added for these two operators to see if the return value works as expected. Docs for these two ops are also updated to explicitly mention the return value is a namedtuple. More ops will be added in later PRs.

There is some issue with Windows build of linker unable to resolve `PyStructSequence_UnnamedField`, and some workaround is added to deal with this case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15429

Differential Revision: D13709678

Pulled By: ezyang

fbshipit-source-id: 23a511c9436977098afc49374e9a748b6e30bccf
2019-01-22 11:12:18 -08:00
Shen Li
1ff864712b Port legacy any(*) to ATen
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15547

Differential Revision: D13549495

Pulled By: mrshenli

fbshipit-source-id: 09a065a8ffa7d73f409759b779c7314cc87f4853
2019-01-18 10:32:19 -08:00
Gregory Chanan
595f767880 Revert batched pdist, improve existing kernel, add test (#15901)
Summary:
1) Reverts https://github.com/pytorch/pytorch/pull/12302 which added support for batched pdist. Except I kept the (non-batched) test improvements that came with that PR, because they are nice to have.  Motivation: https://github.com/pytorch/pytorch/issues/15511
2) For the non-batched pdist, improved the existing kernel by forcing fp64 math and properly checking cuda launch errors
3) Added a 'large tensor' test that at least on my machine, fails on the batch pdist implementation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15901

Reviewed By: ezyang

Differential Revision: D13616730

Pulled By: gchanan

fbshipit-source-id: 620d3f9b9acd492dc131bad9d2ff618d69fc2954
2019-01-17 10:44:43 -08:00
Shen Li
a2af554e6f Port legacy all(*) to ATen (#15540)
Summary:
Questions:

1. ~This PR disables `common_dtype` computation [in `TensorIterator.cpp`](https://github.com/mrshenli/pytorch/blob/all/aten/src/ATen/native/TensorIterator.cpp#L489-L491) for `all*` operators. The reason is that, [this code](https://github.com/mrshenli/pytorch/blob/all/aten/src/ATen/native/TensorIterator.cpp#L120) otherwise complains type mismatch, where the `op.tensor` is `type Variable[CPUByteType]` while the `op` is `CPUByteType`. I am not sure if this is the right solution for this problem.~

2. Should I clean up all occurrences of `_th_all` and `_th_all_out` (and `logicalAnd`, `logicalAndAll`)?

3. Do I need to implement derivatives for `all`?

gchanan

Benchmark:

<img width="590" alt="screen shot 2018-12-26 at 3 24 31 pm" src="https://user-images.githubusercontent.com/16999635/50456505-e9596a00-0922-11e9-844e-00c4b4aad7ca.png">

<img width="587" alt="screen shot 2018-12-26 at 3 26 10 pm" src="https://user-images.githubusercontent.com/16999635/50456509-ef4f4b00-0922-11e9-96bf-0a30c8574fe7.png">

<img width="590" alt="screen shot 2018-12-26 at 3 26 54 pm" src="https://user-images.githubusercontent.com/16999635/50456510-ef4f4b00-0922-11e9-8a63-e47988843cc8.png">

<img width="589" alt="screen shot 2018-12-26 at 3 27 16 pm" src="https://user-images.githubusercontent.com/16999635/50456511-ef4f4b00-0922-11e9-9004-2518aebcdc6e.png">
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15540

Differential Revision: D13548938

Pulled By: mrshenli

fbshipit-source-id: 5a2e5eef1047decb4c79906cb9f3332034908c9c
2019-01-16 09:06:26 -08:00
Xiang Gao
1065e7cd24 Add itertools.{prod, combinations, combinations_with_replacement} like op to pytorch (#9393)
Summary:
closes https://github.com/pytorch/pytorch/issues/7580
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9393

Differential Revision: D13659628

Pulled By: zou3519

fbshipit-source-id: 3a233befa785709395a793ba8833413be394a6fd
2019-01-15 08:31:22 -08:00
Brennan Vincent
bc233fe405 var for multiple dimensions (#15892)
Summary:
Timings are the same as for `std` .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15892

Differential Revision: D13651173

Pulled By: umanwizard

fbshipit-source-id: a26bf1021dd972aa9e3e60fb901cd4983bfa190f
2019-01-14 20:17:42 -08:00
Christian Puhrsch
d33159a426 Undo norm optimizations and add more documentation for parallel.h (#15885)
Summary:
See https://github.com/pytorch/pytorch/issues/15602
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15885

Differential Revision: D13614841

Pulled By: cpuhrsch

fbshipit-source-id: 5d3e45f499d36ac287dbbc2e45798aa51eb5bfdf
2019-01-11 13:32:35 -08:00
Brennan Vincent
70dd44f6a8 Match NumPy by considering NaNs to be larger than any number when sorting (#15886)
Summary:
Fixes #15764
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15886

Differential Revision: D13612971

Pulled By: umanwizard

fbshipit-source-id: 91f552a25d1fd108f2f0b10e09a0ce0364f8c21e
2019-01-11 08:14:11 -08:00
Gregory Chanan
b7cdeb3fc3 Port empty_strided to ATen. (#15948)
Summary:
Turns out this has basically been implemented already in Resize.h / Resize.cuh.
Also added some testing, basically just to check that empty_strided behaves equivalently to as_strided.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15948

Differential Revision: D13631098

Pulled By: gchanan

fbshipit-source-id: eb0e04eead45e4cff393ebde340f9d265779e185
2019-01-11 07:58:05 -08:00
vishwakftw
b4c3268b23 Batched upper triangular, lower triangular (#15257)
Summary:
Changelog:

- Implements `triu` and `tril` for batches of 2D tensors.
- Remove TH/THC binding for `tril`
- Fix CUDA implementation
- Update docstrings for tril and triu.
- Remove mask-based `triu` and `tril` in cholesky forward and backward.
- Remove batched tril in torch.distributions.utils
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15257

Differential Revision: D13613888

Pulled By: mrshenli

fbshipit-source-id: 0949a05b9b8e974c1acfaf02a6284848ec5cc1c4
2019-01-09 19:46:39 -08:00
zou3519
f0c2a9a7b6 Add torch.bincount() test case on sliced tensor (#15835)
Summary:
This was causing a problem in #15735 but appears to have been fixed.
Adding this test to prevent regressions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15835

Differential Revision: D13600282

Pulled By: zou3519

fbshipit-source-id: d9939e74d372be71c50122a5f6a615fbd7fa4df6
2019-01-09 07:31:19 -08:00
vishwakftw
95febdfacc Add is_floating_point to docs (#15704)
Summary:
Fixes #15700 .

Changelog:

- Expose torch.*.is_floating_point to docs

Differential Revision: D13580734

Pulled By: zou3519

fbshipit-source-id: 76edb4af666c08237091a2cebf53d9ba5e6c8909
2019-01-07 10:43:22 -08:00
mruberry
b6a8c45f57 Removes print statements from test_torch.py (#15747)
Summary:
These print statements do not affect the test, and tests (generally) shouldn't print.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15747

Differential Revision: D13587289

Pulled By: soumith

fbshipit-source-id: c758793c9e35faf02bacba6c7c6d072f7c40453f
2019-01-05 09:07:27 -08:00
Shen Li
efc3d6b65d Fix vec256 inversion (#15659)
Summary:
soumith zou3519

I was browsing the code, and think `vec256_int.h` might need a minor revision, but not 100% sure.

1. It currently invert the result by `XOR` with 0. Should it `XOR` with 1 instead?
~2. AVX2 logical operations would set all bits in a byte/word/... to `1` if the condition holds. So functions, such as `_mm256_cmpeq_epi64 ` would return `0/-1` instead of `0/1`. Should it be masked with `1` to make sure it returns 0/1?~

~Would I be correct if I assume that the code revised below is not yet activated, but will be after we port legacy code to ATen?~
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15659

Differential Revision: D13565929

Pulled By: mrshenli

fbshipit-source-id: 8ae3daf256c3d915dd855a2215c95275e899ea8c
2019-01-02 21:32:44 -08:00
surgan12
b52420742d clamp fixes (#15479)
Summary: fix to #15338 .

Differential Revision: D13564343

Pulled By: soumith

fbshipit-source-id: be64b572945533e10ae6f627d335b47f093720a3
2019-01-01 23:12:17 -08:00
vishwakftw
7bb41e3953 Make btriunpack work for high dimensional batches and faster than before (#15286)
Summary:
Changelog:
- Optimize btriunpack by using `torch.where` instead of indexing, inplace operations instead of out place operations and avoiding costly permutations by computing the final permutation over a list.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15286

Differential Revision: D13562038

Pulled By: soumith

fbshipit-source-id: e2c94cfab5322bf1d24bf56d7b056619f553acc6
2018-12-30 12:42:07 -08:00
Vishwak Srinivasan
9c8d8eab9d Remove TH/THC link for gesv (#15510)
Summary:
This PR removes the TH/THC binding for gesv.

Changelog:
- Remove TH/THC binding
- Port single matrix case to ATen
- Enable test_gesv for CUDA as well
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15510

Differential Revision: D13559990

Pulled By: soumith

fbshipit-source-id: 9da2825e94d3103627e719709e6b1f8b521a07fb
2018-12-28 16:54:27 -08:00
Will Feng
7b87ecae37 Move autograd metadata from VariableImpl to TensorImpl (#13827)
Summary:
Changes originally in this PR:
1. Move Variable::Impl data members into TensorImpl as `AutogradMeta` struct
2. Change Variable::Impl functions to use data members in `AutogradMeta` struct
3. Add `shallow_copy_and_detach()` function to each subclass of TensorImpl
4. Do shallow copy when the user calls `make_variable(tensor)` / `make_variable_view(tensor)` / `variable.set_data(tensor)` / `variable.detach()`

Changes moved from https://github.com/pytorch/pytorch/pull/13645:
1. Add a flag to Variable to disallow size/stride/storage_ptr changes from in-place operations such as `resize_` / `resize_as_` / `set_` / `transpose_`, and set this flag to true when people call `tensor.data` in Python.
2. Write text in the docs to actively discourage changing the shape or storage of `tensor_detached` and expecting `tensor` to also be updated.

This is the 1st+2nd PR mentioned in https://github.com/pytorch/pytorch/issues/13638.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13827

Differential Revision: D13507173

Pulled By: yf225

fbshipit-source-id: b177b08438d534a8197e34e1ad4a837e2db0ed6a
2018-12-26 16:34:24 -08:00
Frank Zhang
d4712ee218 Added correct isinf handling for Integral tensors (#15489)
Summary:
Currently torch.isinf on integral tensor will raise RuntimeError: value cannot be converted to type int16_t without overflow: inf.
This pr will suppress the error and return false(0) for all integral tensors. The behavior will also be consistent with np.isinf
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15489

Reviewed By: zou3519

Differential Revision: D13540786

Pulled By: flashhack

fbshipit-source-id: e730dea849da6a59f3752d347bcfbadfd12c6483
2018-12-26 06:36:09 -08:00
SsnL
521894c490 Allow converting char tensor to numpy; add [fi]info.min (#15046)
Summary:
https://github.com/pytorch/pytorch/pull/14710 with test fixed.

Also added `finfo.min` and `iinfo.min` to get castable tensors.

cc soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15046

Reviewed By: soumith

Differential Revision: D13429388

Pulled By: SsnL

fbshipit-source-id: 9a08004419c83bc5ef51d03b6df3961a9f5dbf47
2018-12-24 09:11:24 -08:00
Gao, Xiang
a47749cb28 Add at::one_hot (#15208)
Summary: Closes: https://github.com/pytorch/pytorch/issues/15060

Differential Revision: D13528014

Pulled By: ezyang

fbshipit-source-id: 5a18689a4c5638d92f9390c91517f741e5396293
2018-12-20 14:24:58 -08:00
Shen Li
06a7cb5901 Implementing cuda kernel for tril_indices and triu_indices (#15203)
Summary:
Followup PR of #14904, and the stretch goal of #12653.

Directly calculate coordinates in the original tensor using column index in the result tensor. Every GPU thread takes care of a column (two numbers) in the output tensor.

The implementation detects and handles precision loss during calculating the square root of a `int64_t` variable, and supports tensors with up to `row * column = 2 ^ 59` numbers.

Algorithm details are describe in [comments of TensorFactories.cu](23ddb6f58a/aten/src/ATen/native/cuda/TensorFactories.cu (L109-L255)).

zou3519
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15203

Reviewed By: zou3519

Differential Revision: D13517695

Pulled By: mrshenli

fbshipit-source-id: 86b305d22cac08c8962a3b0cf8e9e620b7ec33ea
2018-12-20 10:23:38 -08:00
Erik Brinkman
8db44eda01 Add support for batched pdist (#12302)
Summary:
This updates pdist to work for batched inputs, and updates the
documentation to reflect issues raised.

closes #9406
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12302

Reviewed By: ezyang

Differential Revision: D13528485

Pulled By: erikbrinkman

fbshipit-source-id: 63d93a6e1cc95b483fb58e9ff021758b341cd4de
2018-12-20 09:41:08 -08:00
Brennan Vincent
7a764fe270 multi-dim standard deviation for CUDA. (#14990)
Summary:
This is the CUDA version of #14535 .
It refactors Reduce.cuh to allow more general classes of reductions to be performed -- we no longer assume that the temporary data returned during reduction is just one scalar, and instead allow an arbitrary accumulate type.
We also allow 64-bit indexing when necessary, since in general we will no longer be able to accumulate directly in the output. (In the cases when we can, we continue to split the tensors until they can be addressed with 32-bits, as before).
As an initial use-case, we implement `std` in multiple dimensions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14990

Differential Revision: D13405097

Pulled By: umanwizard

fbshipit-source-id: a56c24dc2fd5326d417632089bd3f5c4f9f0d2cb
2018-12-20 08:56:32 -08:00
vishwakftw
41e7e1bc40 Rename potrs to cholesky_solve (#15334)
Summary:
Changelog:
- Renames `potrs` to `cholesky_solve` to remain consistent with Tensorflow and Scipy (not really, they call their function chol_solve)
- Default argument for upper in cholesky_solve is False. This will allow a seamless interface between `cholesky` and `cholesky_solve`, since the `upper` argument in both function are the same.
- Rename all tests
- Create a tentative alias for `cholesky_solve` under the name `potrs`, and add deprecated warning to not promote usage.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15334

Differential Revision: D13507724

Pulled By: soumith

fbshipit-source-id: b826996541e49d2e2bcd061b72a38c39450c76d0
2018-12-19 12:31:24 -08:00
Gregory Chanan
2469f7e02e Port torch.linspace to ATen and parallelize it on CPU.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15320

Reviewed By: ezyang

Differential Revision: D13498995

Pulled By: gchanan

fbshipit-source-id: fba655d51d978fffaa53a5e4cae4a99ebfb0eddc
2018-12-18 15:01:49 -08:00
vishwakftw
214f46faf5 Fix bincount for non-contiguous inputs on CPU (#15109)
Summary:
Fixes #15058.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15109

Differential Revision: D13447448

Pulled By: soumith

fbshipit-source-id: 56e8d42934538fb00465105a2c5ccfeb7c18a651
2018-12-13 09:44:20 -08:00
Tyler Moncur
895cb8fcea Fix resize for edge case tensors (#14874)
Summary:
Certain tensor shapes failed when being resized. This pull request addresses the bug found in #13404.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14874

Differential Revision: D13429788

Pulled By: soumith

fbshipit-source-id: 8aa6451dbadce46d6d1c47a01cb26e6559bcfc8c
2018-12-12 19:56:23 -08:00
Shen Li
90f9e8103c Implement torch.tril_indices and torch.triu_indices (#12653) (#14904)
Summary:
This is an optimized implementation that does the following:

1. created an empty Tensor of correct size.
2. fill the Tensor with correct values.

The following three designs to fill in the Tensor result in roughly the same performance. Hence, the 2nd option is taken for simpler code, and to return contiguous tensors.

1. Sequential: fill row coordinates first, then columns. This results in two for-loop and more arithmetic operations.
2. Interleaved: fill in index coordinates one by one, which jumps between the two output Tensor rows in every iteration.
3. Transpose: create a n X 2 Tensor, fill the Tensor sequentially, and then transpose it.

<img width="352" alt="screen shot 2018-12-10 at 3 54 39 pm" src="https://user-images.githubusercontent.com/16999635/49769172-07bd3580-fc94-11e8-8164-41839185e9f9.png">

NOTE:

This implementation returns a 2D tensor, instead of a tuple of two tensors. It means that users will not be able to do the following:

```python
x = torch.ones(3, 3)
i = torch.tril_indices(3, 3)
x[i]  # need to first convert the 2D tensor into a tuple of two 1D tensors.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14904

Reviewed By: zou3519

Differential Revision: D13433027

Pulled By: mrshenli

fbshipit-source-id: 41c876aafcf584832d7069f7c5929ffb59e0ae6a
2018-12-12 15:40:14 -08:00
Brennan Vincent
f36a84b71b fix some tests that I accidentally disabled (#15077)
Summary:
While moving these scenarios into `_test_dim_ops` I accidentally left an empty loop in the actual tests, causing them to do nothing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15077

Differential Revision: D13428759

Pulled By: umanwizard

fbshipit-source-id: 08f53068981d9192c1408878b168e9053f4dc92e
2018-12-12 09:25:34 -08:00
Edward Yang
d30b6bf3b6 Revert D13306052: [pytorch][PR] Allow converting CharTensor to np arrays
Differential Revision:
D13306052

Original commit changeset: 202d038f139c

fbshipit-source-id: 11f6bdd687f8ea5ce2e5f28f48d19449a5c403eb
2018-12-10 10:36:17 -08:00
SsnL
54d5c53826 Support torch.load with encoding (#14743)
Summary:
Addresses a common compatibility issue when loading Py2 checkpoints in Py3 regarding to bytes.

E.g.,
[1] https://github.com/pytorch/pytorch/issues/5994,
[2] https://github.com/CSAILVision/places365/issues/25,
[3] https://discuss.pytorch.org/t/how-to-load-a-saved-model-trained-on-pytorch-0-3-1-python-2-7-on-pyorch-1-0-python-3-7/31212
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14743

Reviewed By: weiyangfb

Differential Revision: D13350888

Pulled By: soumith

fbshipit-source-id: 2df4e828a8b70509118a355307ca3ebe51e108f6
2018-12-10 08:07:36 -08:00
SsnL
9b2bd284b3 Convert int8 numpy array to CharTensor (#14700)
Summary:
When rewriting `default_collate`, I noticed that `from_numpy` and `as_tensor` and `tensor` all do not work on `np.int8` arrays.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14700

Reviewed By: weiyangfb

Differential Revision: D13305297

Pulled By: soumith

fbshipit-source-id: 2937110f65ed714ee830d50098db292238e9b2a9
2018-12-10 07:39:06 -08:00
SsnL
e1b5dbf699 Allow converting CharTensor to np arrays (#14710)
Summary:
The other direction of #14700

cc soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14710

Reviewed By: weiyangfb

Differential Revision: D13306052

Pulled By: soumith

fbshipit-source-id: 202d038f139cf05e01069ff8d05268c66354c983
2018-12-10 07:35:28 -08:00
vishwakftw
fc30e2782c Remove deprecated info argument in btrifact (#14935)
Summary:
As specified in title.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14935

Differential Revision: D13394449

Pulled By: soumith

fbshipit-source-id: 569d59414f3a1a43ea641bded4b5433eb53e3490
2018-12-09 15:59:30 -08:00
Brennan Vincent
25110d61fb Implement std for multiple dimensions on CPU devices. (#14535)
Summary:
Tested on a tensor with 1 billion elements and 3 dimensions on a powerful, highly
multi-core Linux machine.

parallelized: All operations (e.g., `t.std(1)`) that could be done in the old code are now several times faster. All
new operations (e.g., `t.std((0,2))` are significantly faster than the NumPy equivalents.
`t.std((0, 1, 2))`, a new operation, is logically equivalent to the
old `t.std()`, but faster.

serial: The above comment about old operationos now being faster still
holds, but `t.std((t1, ..., tn))` is now a few
times slower than `t.std()`. If this turns out to be important, we can
special-case that to use the old algorithm.

The approach is to create a new method, `TensorIterator::foreach_reduced_elt`,
valid for `TensorIterator`s that represent a dimension reduction. This
method calls a supplied function for each element in the output,
supplying it with the input elements that correspond to that output.

Given that primitive, we can implement reductions like the following pseudocode:

If there is more than one output element:
```
PARALLEL FOR EACH element IN output:
    accumulator = identity
    SERIAL FOR EACH data_point IN element.corresponding_input:
        accumulator.update(data_point)
    element = accumulator.to_output()
```

If there is only one output element, we still want to parallelize, so we
do so along the *input* instead:

```
accumulators[n_threads]
PARALLEL FOR EACH input_chunk IN input.chunks():
    accumulators[thread_num()] = identity
    SERIAL FOR EACH data_point IN input_chunk:
        accumulators[thread_num()].update_with_data(data_point)
accumulator = identity
SERIAL FOR EACH acc in accumulators:
    accumulator.update_with_other_accumulator(acc)
output_element = accumulator.to_output()
```

Note that accumulators and data points do not have to be the same type
in general, since it might be necessary to track arbitrary amounts of
data at intermediate stages.

For example, for `std`, we use a parallel version of Welford's
algorithm, which requies us to track the mean, second moment, and number
of elements, so the accumulator type for `std` contains three pieces of
data.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14535

Differential Revision: D13283887

Pulled By: umanwizard

fbshipit-source-id: 8586b7bf00bf9f663c55d6f8323301e257f5ec3f
2018-12-07 20:16:04 -08:00
Johannes M Dieterich
52942e1f09 Enable unit tests known to work on ROCm (#14011)
Summary:
* Enable unit tests known to work on ROCm.
* Disable a few that are known to be flaky for the time being.
* Use std::abs for Half
* No more special casing for ROCm in TensorMathReduce
* Document an important detail for a hardcoded block size w.r.t. ROCm in TensorMathReduce

ezyang bddppq for awareness
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14011

Differential Revision: D13387679

Pulled By: bddppq

fbshipit-source-id: 4177f2a57b09d866ccbb82a24318f273e3292f71
2018-12-07 18:57:32 -08:00
Jan Schlüter
1c8d41a08d Allow linspace and logspace with steps=1 and start != end like numpy (#14748)
Summary:
`torch.linspace(0, 1, 1)` fails with `RuntimeError: invalid argument 3: invalid number of points at ../aten/src/TH/generic/THTensorMoreMath.cpp:2119`, while `np.linspace(0, 1, 1)` works fine.
Looking at the code, there is even a comment by gchanan asking: "NumPy allows you to pass different points even if n <= 1 -- should we?"
I would say "yes". Currently, I would need to handle the case of `steps == 1` or `steps == 0` separately, making sure to change the `end` when calling `torch.linspace`. This is impractical. If we support `start != end`, there are two possibilities for the result: Either we ensure the first value in the resulting sequence always equals `start`, or we ensure the last value in the resulting sequence always equals `end`. Numpy chose the former, which also allows it to support a boolean `endpoint` flag. I'd say we should follow numpy.

This PR adapts `linspace` and `logspace` to mimic the behavior of numpy, adapts the tests accordingly, and extends the docstrings to make clear what happens when passing `steps=1`.

If you decide against this PR, the error message should become explicit about what I did wrong, and the documentation should be extended to mention this restriction.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14748

Differential Revision: D13356136

Pulled By: ezyang

fbshipit-source-id: db85b8f0a98a5e24b3acd766132ab71c91794a82
2018-12-06 09:30:55 -08:00
Junjie Bai
ba0ebe33c1 Unify device argument parsing between torch and c10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14786

Differential Revision: D13334501

Pulled By: bddppq

fbshipit-source-id: ae3536be1fe0dcd6a1552ec93629ecc9554c0d7c
2018-12-05 18:37:32 -08:00
Richard Zou
1921816f85 Fix clamp when min/max are both None (#14716)
Summary:
Before this PR, tensor.clamp() would return an empty tensor if min and
max were not specified. This is a regression from 0.4.1, which would
throw an error. This PR restores that error message.

Fixes #14470
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14716

Differential Revision: D13311031

Pulled By: zou3519

fbshipit-source-id: 87894db582d5749eaccfc22ba06aac4e10983880
2018-12-04 07:07:09 -08:00
Roy Li
0786dfee7c Move THTensor_(copy) to aten (#13603)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13603
P
Moved vectorized CPU copy to aten. Notable changes mainly in _copy_same_type_.

Reviewed By: ezyang

Differential Revision: D12936031

fbshipit-source-id: 00d28813e3160595e73d104f76685e13154971c1
2018-11-30 11:12:54 -08:00
Brennan Vincent
c638f379b3 Make mean function work across multiple dimensions. (#14252)
Summary:
Multi-dimensional `sum` is already implemented, and it's trivial to implement `mean` in terms of `sum`, so just do it.

Bonus: Fix incomplete language in the `torch.sum` documentation which doesn't take into account multiple dimensions when describing `unsqueeze` (at the same time as introducing similar language in `torch.mean`).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14252

Differential Revision: D13161157

Pulled By: umanwizard

fbshipit-source-id: c45da692ba83c0ec80815200c5543302128da75c
2018-11-28 06:53:09 -08:00
Francisco Massa
68251fb931 Fix half tensor printing plus speedup large tensor printing (#14418)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/14344 and https://github.com/pytorch/pytorch/issues/6863

The slowdown was due to the fact that we were only summarizing the tensor (for computing the number of digits to print) if its first dimension was larger than the threshold. It now goes over all the dimensions.

Some quick runtime analysis:

Before this PR:
```python
In [1]: import torch; a = torch.rand(1, 1700, 34, 50)

In [2]: %timeit str(a)
13.6 s ± 84.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

After this PR

```python
In [1]: import torch; a = torch.rand(1, 1700, 34, 50)

In [2]: %timeit str(a)
2.08 ms ± 395 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [3]: b = a.cuda()

In [4]: %timeit str(b)
8.39 ms ± 45.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14418

Reviewed By: weiyangfb

Differential Revision: D13226950

Pulled By: soumith

fbshipit-source-id: 19eb4b855db4c8f891d0925a9c56ae8a2824bb23
2018-11-28 06:13:06 -08:00
Brian Vaughan
a0def0b57e check for invalid ranges in torch.arange
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13915

Differential Revision: D13222110

Pulled By: nairbv

fbshipit-source-id: fcff1ad058fbf792d0fdf4aa75d77f22e3b7483b
2018-11-27 20:38:56 -08:00
Brian Vaughan
b08a186153 roll along multiple dimensions
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13874

Differential Revision: D13223669

Pulled By: nairbv

fbshipit-source-id: 1678d52529c326fa4a0614d0994b1820ad12bc04
2018-11-27 20:32:30 -08:00