Summary:
Resubmit of https://github.com/pytorch/pytorch/issues/58811, Closes gh-24745
The existing PR (gh-50655) has been stalled because `TensorIterator` doesn't guarantee iteration order in the same way that `TH_TENSOR_APPLY` does. For contiguous test cases this isn't an issue; but it breaks down for example with channels last format. I resolve this by adding a new `TensorIteratorConfig` parameter, `enforce_linear_iteration`, which disables dimension reordering. I've also added a test case for non-contiguous tensors to verify this works.
This PR also significantly improves performance by adding multithreading support to the algorithm. As part of this, I wrote a custom `count_nonzero` that gives per-thread counts which is necessary to write the outputs in the right location.
| Shape | Before | After (1 thread) | After (8 threads) |
|:----------:|--------:|-----------------:|------------------:|
| 256,128,32 | 2610 us | 2150 us | 551 us |
| 128,128,32 | 1250 us | 1020 us | 197 us |
| 64,128,32 | 581 us | 495 us | 99 us |
| 32,128,32 | 292 us | 255 us | 83 us |
| 16,128,32 | 147 us | 126 us | 75 us |
| 8,128,32 | 75 us | 65 us | 65 us |
| 4,128,32 | 39 us | 33 us | 33 us |
| 2,128,32 | 20 us | 18 us | 18 us |
| 1,128,32 | 11 us | 9 us | 9 us |
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59149
Reviewed By: mruberry
Differential Revision: D28817466
Pulled By: ngimel
fbshipit-source-id: f08f6c003c339368fd53dabd28e9ada9e59de732
Summary:
Closes gh-24745
The existing PR (gh-50655) has been stalled because `TensorIterator` doesn't guarantee iteration order in the same way that `TH_TENSOR_APPLY` does. For contiguous test cases this isn't an issue; but it breaks down for example with channels last format. I resolve this by adding a new `TensorIteratorConfig` parameter, `enforce_linear_iteration`, which disables dimension reordering. I've also added a test case for non-contiguous tensors to verify this works.
This PR also significantly improves performance by adding multithreading support to the algorithm. As part of this, I wrote a custom `count_nonzero` that gives per-thread counts which is necessary to write the outputs in the right location.
| Shape | Before | After (1 thread) | After (8 threads) |
|:----------:|--------:|-----------------:|------------------:|
| 256,128,32 | 2610 us | 2220 us | 496 us |
| 128,128,32 | 1250 us | 976 us | 175 us |
| 64,128,32 | 581 us | 486 us | 88 us |
| 32,128,32 | 292 us | 245 us | 80 us |
| 16,128,32 | 147 us | 120 us | 71 us |
| 8,128,32 | 75 us | 61 us | 61 us |
| 4,128,32 | 39 us | 32 us | 32 us |
| 2,128,32 | 20 us | 17 us | 17 us |
| 1,128,32 | 11 us | 9 us | 9 us |
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58811
Reviewed By: anjali411
Differential Revision: D28700259
Pulled By: ngimel
fbshipit-source-id: 9b279ca7c36d8e348b7e5e4be0dd159e05aee159
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54153
Currently, sparse tensors only support real floating point tensors. Complex support is added in this PR for CPU/CUDA.
- [x] add complex support (torch.cfloat and torch.cdouble) to torch.sparse_coo_tensor constructors
- [x] add complex support to coalesce function
- [x] add complex support to to_dense function
- [x] add complex support to to_sparse function
- [x] add complex support to sparse_add function
- [x] add unit tests
Note: This PR contains only complex support for torch.sparse_coo_tensor fordward function and the related ops used with this function (coalesce, to_dense, to_sparse, and sparse_add). The following PRs in ghstack should cover other sparse operations to have a more complex sparse support, specifically related with the use of specific APIs for accelerated linear algebra.
Note: Before using ghstack the original PR was #50984
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D27765618
Pulled By: ezyang
fbshipit-source-id: a9cdd31d5c7a7dafd790f6cc148f3df26e884c89
Summary:
Fixes https://github.com/pytorch/pytorch/issues/52253
In the issue reproducer we can replace `torch.sparse.sum(S)` with `S.coalesce()` and get the same memory leak. The reason is that calling `coalesce()` on an already coalesced tensor returns `self`. With autograd, the result gets it's `grad_fn` set to a node that contains a reference to the input tensor, creating a reference cycle. Cloning the tensor fixes this, so `coalesce` always returns a new tensor.
As an aside, `torch.sparse.sum(S)` doesn't need to coalesce. The result should be the same either way.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52874
Reviewed By: bdhirsh
Differential Revision: D27246997
Pulled By: albanD
fbshipit-source-id: 0fe6c11043501a7874a50982afd42964f47470d3
Summary:
Stack:
* https://github.com/pytorch/pytorch/issues/54954 Fixed OpInfo jit tests failing for TensorList inputs
* __#54922 Added support for TensorList inputs in OpInfo__
Updated OpInfo to accept either a `Tensor` or `TensorList` as `sample.input` and added workarounds to make this work with gradcheck.
Note: JIT testing support for TensorList inputs will be added in a follow up PR.
Fixes https://github.com/pytorch/pytorch/issues/51996
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54922
Reviewed By: H-Huang
Differential Revision: D27448952
Pulled By: heitorschueroff
fbshipit-source-id: 3f24a56f6180eb2d044dcfc89ba59fce8acfe278
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54034Fixes#53544
I had to touch a bunch of lines but the refactoring was fairly
mechanical. Here's how it works.
The basic concept behind this PR is that tensor_new.cpp was previously
abusing DispatchKey when it actually meant TensorOptions. The provided
DispatchKey argument to most of the constructor functions typically
comes from torch::tensors::get_default_dispatch_key(); it doesn't
really make sense for people to set the default dispatch key, but
this got grandfathered in due to the old API set_default_tensor_type
(where the "Type" concept got refactored into "DispatchKey" concept
over time). See also #53124. But the upshot is that, semantically,
what we refer to as the default dispatch key really is more like
torch.set_default_tensor_type(torch.Tensor) versus
torch.set_default_tensor_type(torch.cuda.Tensor): clearly the user
wants to do something about *construction* of the tensor, and
TensorOptions captures that exactly.
So, how exactly to translate from one to the other?
- Sources (things that used to PRODUCE DispatchKey)
- Most top level functions take a DispatchKey as their argument. I
use the new function dispatchKeyToTensorOptions to convert it into
a TensorOptions
- typeIdWithDefault now produces a TensorOptions (probably could do
with a rename, though I didn't)
- Sinks (things that used to CONSUME DispatchKey)
- Previously, the function options() was typically used to convert the
DispatchKey into a TensorOptions. Now its replacement build_options
just takes a TensorOptions and sets some extra fields on it.
Irritatingly, I can't just replace
`build_options(options, scalar_type, device)` with
`options.dtype(scalar_type).device(device)` because the semantics
are slightly different: if device is nullopt, we should preserve
the usage of the device specified in options (what options.device()
does is overwrite the device unconditionally; e.g., if device is
nullopt, unset device from options)
- The other major sink for DispatchKey was `internal_new_from_data`,
but it turns out it only really extracts the device type from
the dispatch key. Now it just pulls out the device from
TensorOptions.
- To actually do the translation of DispatchKey to TensorOptions, I
introduce new functions dispatchKeyToLayout (replicating
layout_from_backend--there are still a few uses of this function
so I couldn't delete it) and dispatchKeyToDeviceType (replacing
computeDeviceType)
- In all internal functions, whenever DispatchKey is taken as an argument,
I instead take TensorOptions as an argument, and pass it along.
- Anywhere `legacyExtractDispatchKey(other.key_set())` equality was
previously used, I now do `other.options().type_equal()`, which
is the intended BC for doing "backend to backend" comparisons
- There are a few places in the sparse constructors where we allocated
a tensor for values, and then read out the dispatch key from the
result to allocate the keys. As best as I can tell, this is totally
equivalent to just passing in the options to both values and indices
(the only difference is dtype, which is captured via a separate
argument)
This refactor doesn't really go far enough: for example, there are now
functions that take both TensorOptions and ScalarType, when really
the TensorOptions can capture this all. I kept it solely just
s/DispatchKey/TensorOptions/ to reduce the number of possible bugs;
also, a lot of this will be mooted by a proper fix to #53124.
Even with this limited refactor, the payoff is sweet. I can delete:
- backendToCPU
- backendToXPU
- backendToCUDA
- backendToHIP
- backendToBackendOfDeviceType
The reason I can do this is because I can simply overwrite layout in TensorOptions
to do the conversion, rather than having to type out each backend case
explicitly.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: bhosmer
Differential Revision: D27109509
Pulled By: ezyang
fbshipit-source-id: 91d16cfbc390127770362ac04fb43f7e070077e9
Summary:
Context: https://github.com/pytorch/pytorch/pull/53299#discussion_r587882857
These are the only hand-written parts of this diff:
- the addition to `.github/workflows/lint.yml`
- the file endings changed in these four files (to appease FB-internal land-blocking lints):
- `GLOSSARY.md`
- `aten/src/ATen/core/op_registration/README.md`
- `scripts/README.md`
- `torch/csrc/jit/codegen/fuser/README.md`
The rest was generated by running this command (on macOS):
```
git grep -I -l ' $' -- . ':(exclude)**/contrib/**' ':(exclude)third_party' | xargs gsed -i 's/ *$//'
```
I looked over the auto-generated changes and didn't see anything that looked problematic.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53406
Test Plan:
This run (after adding the lint but before removing existing trailing spaces) failed:
- https://github.com/pytorch/pytorch/runs/2043032377
This run (on the tip of this PR) succeeded:
- https://github.com/pytorch/pytorch/runs/2043296348
Reviewed By: walterddr, seemethere
Differential Revision: D26856620
Pulled By: samestep
fbshipit-source-id: 3f0de7f7c2e4b0f1c089eac9b5085a58dd7e0d97
Summary:
checks sizes of sparse tensors when comparing them in assertEqual.
Removes additional checks in safeCoalesce, safeCoalesce should not be a test for `.coalesce()` function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47148
Reviewed By: mruberry
Differential Revision: D24823127
Pulled By: ngimel
fbshipit-source-id: 9303a6ff74aa3c9d9207803d05c0be2325fe392a
Summary:
Fixes https://github.com/pytorch/pytorch/issues/43699
- Changed the order of `TORCH_CHECK` and `if (options.layout() == kSparse && self.is_sparse())`
inside `empty_like` method.
- [x] Added tests
EDIT:
More details on that and why we can not take zeros_like approach.
Python code :
```python
res = torch.zeros_like(input_coalesced, memory_format=torch.preserve_format)
```
is routed to
```c++
// TensorFactories.cpp
Tensor zeros_like(
const Tensor& self,
const TensorOptions& options,
c10::optional<c10::MemoryFormat> optional_memory_format) {
if (options.layout() == kSparse && self.is_sparse()) {
auto res = at::empty({0}, options); // to be resized
res.sparse_resize_and_clear_(
self.sizes(), self.sparse_dim(), self.dense_dim());
return res;
}
auto result = at::empty_like(self, options, optional_memory_format);
return result.zero_();
}
```
and passed to `if (options.layout() == kSparse && self.is_sparse())`
When we call in Python
```python
res = torch.empty_like(input_coalesced, memory_format=torch.preserve_format)
```
it is routed to
```c++
Tensor empty_like(
const Tensor& self,
const TensorOptions& options_,
c10::optional<c10::MemoryFormat> optional_memory_format) {
TORCH_CHECK(
!(options_.has_memory_format() && optional_memory_format.has_value()),
"Cannot set memory_format both in TensorOptions and explicit argument; please delete "
"the redundant setter.");
TensorOptions options =
self.options()
.merge_in(options_)
.merge_in(TensorOptions().memory_format(optional_memory_format));
TORCH_CHECK(
!(options.layout() != kStrided &&
optional_memory_format.has_value()),
"memory format option is only supported by strided tensors");
if (options.layout() == kSparse && self.is_sparse()) {
auto result = at::empty({0}, options); // to be resized
result.sparse_resize_and_clear_(
self.sizes(), self.sparse_dim(), self.dense_dim());
return result;
}
```
cc pearu
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44058
Reviewed By: albanD
Differential Revision: D23672494
Pulled By: mruberry
fbshipit-source-id: af232274dd2b516dd6e875fc986e3090fa285658
Summary:
This PR:
- updates div to perform true division
- makes torch.true_divide an alias of torch.div
This follows on work in previous PyTorch releases that first deprecated div performing "integer" or "floor" division, then prevented it by throwing a runtime error.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42907
Reviewed By: ngimel
Differential Revision: D23622114
Pulled By: mruberry
fbshipit-source-id: 414c7e3c1a662a6c3c731ad99cc942507d843927
Summary:
Refactor comnon pattern of (torch.cuda.version and [int(x) for x in torch.cuda.version.split(".")] >= [a, b]) into `_get_torch_cuda_version()` function
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42626
Reviewed By: seemethere
Differential Revision: D22956149
Pulled By: malfet
fbshipit-source-id: 897c55965e53b477cd20f69e8da15d90489035de
Summary:
**BC-Breaking Note:**
BC breaking changes in the case where keepdim=True. Before this change, when calling `torch.norm` with keepdim=True and p='fro' or p=number, leaving all other optional arguments as their default values, the keepdim argument would be ignored. Also, any time `torch.norm` was called with p='nuc', the result would have one fewer dimension than the input, and the dimensions could be out of order depending on which dimensions were being reduced. After the change, for each of these cases, the result has the same number and order of dimensions as the input.
**PR Summary:**
* Fix keepdim behavior
* Throw descriptive errors for unsupported sparse norm args
* Increase unit test coverage for these cases and for complex inputs
These changes were taken from part of PR https://github.com/pytorch/pytorch/issues/40924. That PR is not going to be merged because it overrides `torch.norm`'s interface, which we want to avoid. But these improvements are still useful.
Issue https://github.com/pytorch/pytorch/issues/24802
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41956
Reviewed By: albanD
Differential Revision: D22837455
Pulled By: mruberry
fbshipit-source-id: 509ecabfa63b93737996f48a58c7188b005b7217
Summary:
**BC-Breaking Note**
This PR changes the behavior of the torch.tensor, torch.as_tensor, and sparse constructors. When given a tensor as input and a device is not explicitly specified, these constructors now always infer their device from the tensor. Historically, if the optional dtype kwarg was provided then these constructors would not infer their device from tensor inputs. Additionally, for the sparse ctor a runtime error is now thrown if the indices and values tensors are on different devices and the device kwarg is not specified.
**PR Summary**
This PR's functional change is a single line:
```
auto device = device_opt.has_value() ? *device_opt : (type_inference ? var.device() : at::Device(computeDeviceType(dispatch_key)));
```
=>
```
auto device = device_opt.has_value() ? *device_opt : var.device();
```
in `internal_new_from_data`. This line entangled whether the function was performing type inference with whether it inferred its device from an input tensor, and in practice meant that
```
t = torch.tensor((1, 2, 3), device='cuda')
torch.tensor(t, dtype=torch.float64)
```
would return a tensor on the CPU, not the default CUDA device, while
```
t = torch.tensor((1, 2, 3), device='cuda')
torch.tensor(t)
```
would return a tensor on the device of `t`!
This behavior is niche and odd, but came up while aocsa was fixing https://github.com/pytorch/pytorch/issues/40648.
An additional side affect of this change is that the indices and values tensors given to a sparse constructor must be on the same device, or the sparse ctor must specify the dtype kwarg. The tests in test_sparse.py have been updated to reflect this behavior.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41984
Reviewed By: ngimel
Differential Revision: D22721426
Pulled By: mruberry
fbshipit-source-id: 909645124837fcdf3d339d7db539367209eccd48
Summary:
This updates assertEqual and assertEqual-like functions to either require both or neither of atol and rtol be specified. This should improve clarity around handling precision in the test suite, and it allows us to remove the legacy positional atol argument from assertEqual. In addition, the "message" kwarg is replace with a kwarg-only "msg" argument whose name is consistent with unittest's assertEqual argument.
In the future we could make "msg" an optional third positional argument to be more consistent with unittest's assertEqual, but requiring it be specified should be clear, and we can easily update the signature to make "msg" an optional positional argument in the future, too.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38872
Differential Revision: D21740237
Pulled By: mruberry
fbshipit-source-id: acbc027aa1d7877a49664d94db9a5fff91a07042
Summary:
This updates assertEqual and assertEqual-like functions to either require both or neither of atol and rtol be specified. This should improve clarity around handling precision in the test suite, and it allows us to remove the legacy positional atol argument from assertEqual. In addition, the "message" kwarg is replace with a kwarg-only "msg" argument whose name is consistent with unittest's assertEqual argument.
In the future we could make "msg" an optional third positional argument to be more consistent with unittest's assertEqual, but requiring it be specified should be clear, and we can easily update the signature to make "msg" an optional positional argument in the future, too.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38872
Differential Revision: D21717199
Pulled By: mruberry
fbshipit-source-id: 9feb856f94eee911b44f6c7140a1d07c1b026d3a
Summary:
This PR implements softmax support for sparse tensors.
The sparse softmax is related to dense softmax when the values of unspecified sparse tensor entries are taken to be `-inf` that will have the effect of "zero entries ignored". This relation is used for testing the correctness of results here.
Resolves https://github.com/pytorch/pytorch/issues/23651 for CPU.
- [x] sparse softmax
- [x] CPU C++ implementation
- [x] unittests
- [x] update softmax documentation
- [x] autograd support
- [x] sparse log_softmax
- [x] CPU C++ implementation
- [x] unittests
- [x] update log_softmax documentation
- [x] autograd support
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36305
Differential Revision: D21566540
Pulled By: ezyang
fbshipit-source-id: a632ea69c38622f960721482e442efeb8d0a54fc
Summary:
This pull request fixes and re-enables two of the tests disabled in https://github.com/pytorch/pytorch/issues/37427
1. `test_sparse_add_out_bfloat16` in test_sparse.py fixed to use updated `atol` argument instead of `prec` for `assertEqual`
2. The conversion of `flt_min` to `int64` is divergent on HIP compared to numpy. The change removes that conversion from the `test_float_to_int_conversion_finite` test case in test_torch.py
cc: ezyang jeffdaily
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37616
Differential Revision: D21379876
Pulled By: ezyang
fbshipit-source-id: 2bfb41d67874383a01330c5d540ee516b3b07dcc
Summary:
Fixes https://github.com/pytorch/pytorch/issues/34964
Sparse cuda add was implemented by just concatenating the indices and values for the tensor. If called repeatedly in a tight loop this will let `nnz` grow unbounded. In the worst case of `x.add_(x)` it grows exponentially.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36030
Differential Revision: D20873504
Pulled By: zou3519
fbshipit-source-id: d90ed8dda0c89571fb89e358757b5dde299513df
Summary:
This pull request disables the unit tests that were observed to be failing once `test2` was enabled. These tests will be one by one looked at and fixed at the earliest, but until then disabling them to unblock `test2`
The pull request also disables fftPlanDestroy for rocFFT to avoid double-freeing FFT handles
cc: ezyang jeffdaily
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37427
Differential Revision: D21302909
Pulled By: ezyang
fbshipit-source-id: ecadda3778e65b7f4f97e24b932b96b9ce928616