Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72402
The original PR had an array-out-of-bounds access in `DispatchKeyExtractor.cpp`, that wasn't caught by ASAN and appeared to only manifest in a subset of android internal tests. After fixing the OOB access (and adding more asserts), I confirmed that the android internal test passes.
Reland of D33255193 (20b8653dfa)
ghstack-source-id: 148830728
Test Plan:
Steps to test:
(1) connect to a mobile OD
(2) run `one_world android emulator android-29` in a terminal to start the android emulator
(3) In a separate terminal, run the test: `buck test //fbandroid/instrumentation_tests/com/facebook/pytorch/bi_xray:instrumentation_test -c test.external_runner=tpx -- --regex 'testBIXRayModel.*PyTorchBIXRayInstrumentationTest' --force-remote-execution --run-disabled`
I also ran `buck test fbandroid/mode/dbg //fbandroid/instrumentation_tests/com/facebook/pytorch/bi_xray:instrumentation_test`, which failed before and passed after the PR.
Reviewed By: albanD
Differential Revision: D34034848
fbshipit-source-id: 9677ee2c0a1afd1183896f7055009445712523c5
Summary: I think this diff stack broke all the related tasks below.
Test Plan:
For our failing tests:
buck test //fbandroid/instrumentation_tests/com/facebook/pytorch/bi_xray:instrumentation_test -c test.external_runner=tpx -- --regex 'testBIXRayModel.*PyTorchBIXRayInstrumentationTest' --force-remote-execution --run-disabled
For the ubn:
Not really sure what to do, trying to build the app and see if I can use an effect?
Reviewed By: shoumikhin
Differential Revision: D34018849
fbshipit-source-id: 3571718cb6621931af931b494e0a70d6e0164e65
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71091
Fixes https://github.com/pytorch/pytorch/issues/65394
The masked sum on a full input tensor (of any layout) with an all-true mask is the same as the sum on the strided input tensor (after applying `to_dense` to sparse inputs).
Since masked sum uses `torch.sparse.sum` then, for the simplicity of masked reductions implementations, its reduction behavior ought to be defined by the behavior of the `torch.sum`. This PR implements the behavioral connection with respect to the directional summation of empty sparse tensors that correspond to all-zero strided tensors.
cc nikitaved pearu cpuhrsch
Test Plan: Imported from OSS
Reviewed By: davidberard98
Differential Revision: D33651750
Pulled By: cpuhrsch
fbshipit-source-id: 703891bff88c8da6270b4272f5d2da81688db67d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68887Closes#46988, closes#46987, closes#46761
By "simple" I mean operators that map 0->0 so we can implement it by
just re-dispatching on the values tensor. That does mean we have `sin`
but not `cos` for example, but without fill value support this is the
best that can be done.
Most of these don't support autograd because the derivative formulas
use unsupported operators.
cc nikitaved pearu cpuhrsch IvanYashchuk
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D32734911
Pulled By: cpuhrsch
fbshipit-source-id: 203ab105799f3d2d682b01ca3d6b18e7c994776a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68887Closes#46988, closes#46987, closes#46761
By "simple" I mean operators that map 0->0 so we can implement it by
just re-dispatching on the values tensor. That does mean we have `sin`
but not `cos` for example, but without fill value support this is the
best that can be done.
Most of these don't support autograd because the derivative formulas
use unsupported operators.
cc nikitaved pearu cpuhrsch IvanYashchuk
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D32706197
Pulled By: cpuhrsch
fbshipit-source-id: 65e1acb3645737ca7bdb7f2db739d8e118906f4b
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67904.
- Create a sparse tensor when the sparse layout is given even if the input tensor is not sparse.
cc nikitaved pearu cpuhrsch IvanYashchuk
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68108
Reviewed By: anjali411
Differential Revision: D32316269
Pulled By: cpuhrsch
fbshipit-source-id: 923dbd4dc7c74f51f7cdbafb2375a30271a6a886
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64181
This PR replaces all the calls to:
- `transpose(-2, -1)` or `transpose(-1, -2)` by `mT()` in C++ and `mT` in Python
- `conj().transpose(-2, -1)` or `transpose(-2, -1).conj()` or `conj().transpose(-1, -2)` or `transpose(-1, -2).conj()` by `mH()` in C++ and `mH` in Python.
It also simplifies two pieces of code, and fixes one bug where a pair
of parentheses were missing in the function `make_symmetric_matrices`.
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D31692896
Pulled By: anjali411
fbshipit-source-id: e9112c42343663d442dc5bd53ff2b492094b434a
Summary:
This PR enables Half, BFloat16, ComplexFloat, and ComplexDouble support for matrix-matrix multiplication of COO sparse matrices.
The change is applied only to CUDA 11+ builds.
`cusparseSpGEMM` also supports `CUDA_C_16F` (complex float16) and `CUDA_C_16BF` (complex bfloat16). PyTorch also supports the complex float16 dtype (`ScalarType::ComplexHalf`), but there is no convenient dispatch, so this dtype is omitted in this PR.
cc nikitaved pearu cpuhrsch IvanYashchuk ezyang anjali411 dylanbespalko mruberry Lezcano
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59980
Reviewed By: ngimel
Differential Revision: D30994115
Pulled By: cpuhrsch
fbshipit-source-id: 4f55b99e8e25079d6273b4edf95ad6fa85aeaf24
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63554
Following https://github.com/pytorch/pytorch/pull/61840#issuecomment-884087809, this deprecates all the dtype getters publicly exposed in the `torch.testing` namespace. The reason for this twofold:
1. If someone is not familiar with the C++ dispatch macros PyTorch uses, the names are misleading. For example `torch.testing.floating_types()` will only give you `float32` and `float64` skipping `float16` and `bfloat16`.
2. The dtype getters provide very minimal functionality that can be easily emulated by downstream libraries.
We thought about [providing an replacement](https://gist.github.com/pmeier/3dfd2e105842ad0de4505068a1a0270a), but ultimately decided against it. The major problem is BC: by keeping it, either the namespace is getting messy again after a new dtype is added or we need to somehow version the return values of the getters.
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D30662206
Pulled By: mruberry
fbshipit-source-id: a2bdb10ab02ae665df1b5b76e8afa9af043bbf56
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60548
`Tensor.__floordiv__` was indirectly deprecated by deprecation of `torch.floor_divide` (see https://github.com/pytorch/pytorch/issues/43874). Deprecating it directly provides clearer feedback.
Repro:
```
import torch
x = torch.tensor(0)
x // 1
```
Before this change, a deprecation warning was triggered within the C++ implementation of floor_divide:
```
UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at ../aten/src/ATen/native/BinaryOps.cpp:571.)
return torch.floor_divide(self, other)
```
After this change, the warning instead cites the user's offending line of Python code:
```
UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
x // 1
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64034
Reviewed By: mruberry
Differential Revision: D30658010
Pulled By: saketh-are
fbshipit-source-id: b0e6c5008d741897509d102f4a89efb47de4aa2a
Summary:
This PR enables Half, BFloat16, ComplexFloat, and ComplexDouble support for matrix-matrix multiplication of COO sparse matrices.
The change is applied only to CUDA 11+ builds.
`cusparseSpGEMM` also supports `CUDA_C_16F` (complex float16) and `CUDA_C_16BF` (complex bfloat16). PyTorch also supports the complex float16 dtype (`ScalarType::ComplexHalf`), but there is no convenient dispatch, so this dtype is omitted in this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59980
Reviewed By: ngimel
Differential Revision: D29699456
Pulled By: cpuhrsch
fbshipit-source-id: 407ae53392acb2f92396a62a57cbaeb0fe6e950b
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59916
This fixes two problems with sparse multiplication
- 0d-dense * sparse was creating a non-sparse output and failing.
- dense * sparse or sparse * dense is not supported, but would emit an unhelpful error message
<details>
<summary> unhelpful error message </summary>
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NotImplementedError: Could not run 'aten::_nnz' with arguments from the 'CPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::_nnz' is only available for these backends: [SparseCPU, SparseCUDA, SparseCsrCPU, SparseCsrCUDA, BackendSelect, Python, Named, Conjugate, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradXPU, AutogradMLC, AutogradHPU, AutogradNestedTensor, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, UNKNOWN_TENSOR_TYPE_ID, Autocast, Batched, VmapMode].
SparseCPU: registered at aten/src/ATen/RegisterSparseCPU.cpp:961 [kernel]
SparseCUDA: registered at aten/src/ATen/RegisterSparseCUDA.cpp:1092 [kernel]
SparseCsrCPU: registered at aten/src/ATen/RegisterSparseCsrCPU.cpp:202 [kernel]
SparseCsrCUDA: registered at aten/src/ATen/RegisterSparseCsrCUDA.cpp:229 [kernel]
BackendSelect: fallthrough registered at ../aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Python: registered at ../aten/src/ATen/core/PythonFallbackKernel.cpp:38 [backend fallback]
Named: registered at ../aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
Conjugate: registered at ../aten/src/ATen/ConjugateFallback.cpp:118 [backend fallback]
ADInplaceOrView: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:60 [backend fallback]
AutogradOther: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradCPU: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradCUDA: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradXLA: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradXPU: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradMLC: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradHPU: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradNestedTensor: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradPrivateUse1: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradPrivateUse2: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
AutogradPrivateUse3: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:11202 [autograd kernel]
Tracer: registered at ../torch/csrc/autograd/generated/TraceType_2.cpp:10254 [kernel]
UNKNOWN_TENSOR_TYPE_ID: fallthrough registered at ../aten/src/ATen/autocast_mode.cpp:446 [backend fallback]
Autocast: fallthrough registered at ../aten/src/ATen/autocast_mode.cpp:285 [backend fallback]
Batched: registered at ../aten/src/ATen/BatchingRegistrations.cpp:1016 [backend fallback]
VmapMode: fallthrough registered at ../aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
</details>
Also added tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61723
Reviewed By: ezyang
Differential Revision: D29962639
Pulled By: cpuhrsch
fbshipit-source-id: 5455680ddfa91d5cc9925174d0fd3107c40f5b06
Summary:
The `ops` decorator provides a way to parameterize a test across a given list of ops. This would be useful for modules as well (e.g. a `modules` decorator), but the mechanism by which this is accomplished is specific to ops. In the details, the `ops` decorator tags a test function with the metadata needed (list of ops, `dtypes`) and the actual tests are generated according to this metadata during the call to `instantiate_device_type_tests()`.
This PR makes this mechanism more generic, allowing for test parameterization across arbitrary dimensions. This makes a `modules` decorator (or any similar type of decorator) straightforward to implement without changes to the device-specific test instantiation logic.
One caveat is that, since this is implemented where the old `ops` decorator was (within `instantiate_device_type_tests()`), this only works for tests instantiated using the device-specific instantiation logic. Longer term, even device-specific test instantiation could be treated as an optional parameterization across device types, but this PR takes a low-risk approach for now. In practice, this just means that a `device` kwarg is required for all test signatures used with the mechanism.
The `ops` decorator has been refactored to use the generic mechanism and works the same as before, with one difference: when `OpDTypes.none` is specified, the test signature no longer needs an unused `dtype` kwarg. This is a nice bonus that demonstrates the added flexibility of a generic parameterization mechanism. The refactored form also has the bonus that all op-specific test generation logic is contained within the `ops` decorator class, improving readability.
Behind the scenes, the generic mechanism is a base decorator class (`_TestParameterizer`) from which `ops` derives. The core functionality is in the `_parameterize_test()` method, which takes in a test function and returns a generator that produces parameterized tests, including names and parameter kwargs to pass to them. Using the `ops` decorator results in a set of op-specific tests from a given generic test.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60233
Reviewed By: iramazanli
Differential Revision: D29494995
Pulled By: jbschlosser
fbshipit-source-id: a14446488c106094fafcaa75ccf8e9e3faf33bfc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59553
Added a test for 0x0 sparse coo input for sparse_unary_ufuncs.
This test fails for `conj` on master.
Modified `unsupportedTypes` for test_sparse_consistency, complex dtypes
pass, but float16 doesn't pass for `conj` because `to_dense()` doesn't
work with float16.
Fixes https://github.com/pytorch/pytorch/issues/59549
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D28968215
Pulled By: anjali411
fbshipit-source-id: 44e99f0ce4aa45b760d79995a021e6139f064fea
Summary:
Resubmit of https://github.com/pytorch/pytorch/issues/58811, Closes gh-24745
The existing PR (gh-50655) has been stalled because `TensorIterator` doesn't guarantee iteration order in the same way that `TH_TENSOR_APPLY` does. For contiguous test cases this isn't an issue; but it breaks down for example with channels last format. I resolve this by adding a new `TensorIteratorConfig` parameter, `enforce_linear_iteration`, which disables dimension reordering. I've also added a test case for non-contiguous tensors to verify this works.
This PR also significantly improves performance by adding multithreading support to the algorithm. As part of this, I wrote a custom `count_nonzero` that gives per-thread counts which is necessary to write the outputs in the right location.
| Shape | Before | After (1 thread) | After (8 threads) |
|:----------:|--------:|-----------------:|------------------:|
| 256,128,32 | 2610 us | 2150 us | 551 us |
| 128,128,32 | 1250 us | 1020 us | 197 us |
| 64,128,32 | 581 us | 495 us | 99 us |
| 32,128,32 | 292 us | 255 us | 83 us |
| 16,128,32 | 147 us | 126 us | 75 us |
| 8,128,32 | 75 us | 65 us | 65 us |
| 4,128,32 | 39 us | 33 us | 33 us |
| 2,128,32 | 20 us | 18 us | 18 us |
| 1,128,32 | 11 us | 9 us | 9 us |
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59149
Reviewed By: mruberry
Differential Revision: D28817466
Pulled By: ngimel
fbshipit-source-id: f08f6c003c339368fd53dabd28e9ada9e59de732
Summary:
Closes gh-24745
The existing PR (gh-50655) has been stalled because `TensorIterator` doesn't guarantee iteration order in the same way that `TH_TENSOR_APPLY` does. For contiguous test cases this isn't an issue; but it breaks down for example with channels last format. I resolve this by adding a new `TensorIteratorConfig` parameter, `enforce_linear_iteration`, which disables dimension reordering. I've also added a test case for non-contiguous tensors to verify this works.
This PR also significantly improves performance by adding multithreading support to the algorithm. As part of this, I wrote a custom `count_nonzero` that gives per-thread counts which is necessary to write the outputs in the right location.
| Shape | Before | After (1 thread) | After (8 threads) |
|:----------:|--------:|-----------------:|------------------:|
| 256,128,32 | 2610 us | 2220 us | 496 us |
| 128,128,32 | 1250 us | 976 us | 175 us |
| 64,128,32 | 581 us | 486 us | 88 us |
| 32,128,32 | 292 us | 245 us | 80 us |
| 16,128,32 | 147 us | 120 us | 71 us |
| 8,128,32 | 75 us | 61 us | 61 us |
| 4,128,32 | 39 us | 32 us | 32 us |
| 2,128,32 | 20 us | 17 us | 17 us |
| 1,128,32 | 11 us | 9 us | 9 us |
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58811
Reviewed By: anjali411
Differential Revision: D28700259
Pulled By: ngimel
fbshipit-source-id: 9b279ca7c36d8e348b7e5e4be0dd159e05aee159
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54153
Currently, sparse tensors only support real floating point tensors. Complex support is added in this PR for CPU/CUDA.
- [x] add complex support (torch.cfloat and torch.cdouble) to torch.sparse_coo_tensor constructors
- [x] add complex support to coalesce function
- [x] add complex support to to_dense function
- [x] add complex support to to_sparse function
- [x] add complex support to sparse_add function
- [x] add unit tests
Note: This PR contains only complex support for torch.sparse_coo_tensor fordward function and the related ops used with this function (coalesce, to_dense, to_sparse, and sparse_add). The following PRs in ghstack should cover other sparse operations to have a more complex sparse support, specifically related with the use of specific APIs for accelerated linear algebra.
Note: Before using ghstack the original PR was #50984
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D27765618
Pulled By: ezyang
fbshipit-source-id: a9cdd31d5c7a7dafd790f6cc148f3df26e884c89
Summary:
Fixes https://github.com/pytorch/pytorch/issues/52253
In the issue reproducer we can replace `torch.sparse.sum(S)` with `S.coalesce()` and get the same memory leak. The reason is that calling `coalesce()` on an already coalesced tensor returns `self`. With autograd, the result gets it's `grad_fn` set to a node that contains a reference to the input tensor, creating a reference cycle. Cloning the tensor fixes this, so `coalesce` always returns a new tensor.
As an aside, `torch.sparse.sum(S)` doesn't need to coalesce. The result should be the same either way.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52874
Reviewed By: bdhirsh
Differential Revision: D27246997
Pulled By: albanD
fbshipit-source-id: 0fe6c11043501a7874a50982afd42964f47470d3
Summary:
Stack:
* https://github.com/pytorch/pytorch/issues/54954 Fixed OpInfo jit tests failing for TensorList inputs
* __#54922 Added support for TensorList inputs in OpInfo__
Updated OpInfo to accept either a `Tensor` or `TensorList` as `sample.input` and added workarounds to make this work with gradcheck.
Note: JIT testing support for TensorList inputs will be added in a follow up PR.
Fixes https://github.com/pytorch/pytorch/issues/51996
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54922
Reviewed By: H-Huang
Differential Revision: D27448952
Pulled By: heitorschueroff
fbshipit-source-id: 3f24a56f6180eb2d044dcfc89ba59fce8acfe278
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54034Fixes#53544
I had to touch a bunch of lines but the refactoring was fairly
mechanical. Here's how it works.
The basic concept behind this PR is that tensor_new.cpp was previously
abusing DispatchKey when it actually meant TensorOptions. The provided
DispatchKey argument to most of the constructor functions typically
comes from torch::tensors::get_default_dispatch_key(); it doesn't
really make sense for people to set the default dispatch key, but
this got grandfathered in due to the old API set_default_tensor_type
(where the "Type" concept got refactored into "DispatchKey" concept
over time). See also #53124. But the upshot is that, semantically,
what we refer to as the default dispatch key really is more like
torch.set_default_tensor_type(torch.Tensor) versus
torch.set_default_tensor_type(torch.cuda.Tensor): clearly the user
wants to do something about *construction* of the tensor, and
TensorOptions captures that exactly.
So, how exactly to translate from one to the other?
- Sources (things that used to PRODUCE DispatchKey)
- Most top level functions take a DispatchKey as their argument. I
use the new function dispatchKeyToTensorOptions to convert it into
a TensorOptions
- typeIdWithDefault now produces a TensorOptions (probably could do
with a rename, though I didn't)
- Sinks (things that used to CONSUME DispatchKey)
- Previously, the function options() was typically used to convert the
DispatchKey into a TensorOptions. Now its replacement build_options
just takes a TensorOptions and sets some extra fields on it.
Irritatingly, I can't just replace
`build_options(options, scalar_type, device)` with
`options.dtype(scalar_type).device(device)` because the semantics
are slightly different: if device is nullopt, we should preserve
the usage of the device specified in options (what options.device()
does is overwrite the device unconditionally; e.g., if device is
nullopt, unset device from options)
- The other major sink for DispatchKey was `internal_new_from_data`,
but it turns out it only really extracts the device type from
the dispatch key. Now it just pulls out the device from
TensorOptions.
- To actually do the translation of DispatchKey to TensorOptions, I
introduce new functions dispatchKeyToLayout (replicating
layout_from_backend--there are still a few uses of this function
so I couldn't delete it) and dispatchKeyToDeviceType (replacing
computeDeviceType)
- In all internal functions, whenever DispatchKey is taken as an argument,
I instead take TensorOptions as an argument, and pass it along.
- Anywhere `legacyExtractDispatchKey(other.key_set())` equality was
previously used, I now do `other.options().type_equal()`, which
is the intended BC for doing "backend to backend" comparisons
- There are a few places in the sparse constructors where we allocated
a tensor for values, and then read out the dispatch key from the
result to allocate the keys. As best as I can tell, this is totally
equivalent to just passing in the options to both values and indices
(the only difference is dtype, which is captured via a separate
argument)
This refactor doesn't really go far enough: for example, there are now
functions that take both TensorOptions and ScalarType, when really
the TensorOptions can capture this all. I kept it solely just
s/DispatchKey/TensorOptions/ to reduce the number of possible bugs;
also, a lot of this will be mooted by a proper fix to #53124.
Even with this limited refactor, the payoff is sweet. I can delete:
- backendToCPU
- backendToXPU
- backendToCUDA
- backendToHIP
- backendToBackendOfDeviceType
The reason I can do this is because I can simply overwrite layout in TensorOptions
to do the conversion, rather than having to type out each backend case
explicitly.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: bhosmer
Differential Revision: D27109509
Pulled By: ezyang
fbshipit-source-id: 91d16cfbc390127770362ac04fb43f7e070077e9
Summary:
Context: https://github.com/pytorch/pytorch/pull/53299#discussion_r587882857
These are the only hand-written parts of this diff:
- the addition to `.github/workflows/lint.yml`
- the file endings changed in these four files (to appease FB-internal land-blocking lints):
- `GLOSSARY.md`
- `aten/src/ATen/core/op_registration/README.md`
- `scripts/README.md`
- `torch/csrc/jit/codegen/fuser/README.md`
The rest was generated by running this command (on macOS):
```
git grep -I -l ' $' -- . ':(exclude)**/contrib/**' ':(exclude)third_party' | xargs gsed -i 's/ *$//'
```
I looked over the auto-generated changes and didn't see anything that looked problematic.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53406
Test Plan:
This run (after adding the lint but before removing existing trailing spaces) failed:
- https://github.com/pytorch/pytorch/runs/2043032377
This run (on the tip of this PR) succeeded:
- https://github.com/pytorch/pytorch/runs/2043296348
Reviewed By: walterddr, seemethere
Differential Revision: D26856620
Pulled By: samestep
fbshipit-source-id: 3f0de7f7c2e4b0f1c089eac9b5085a58dd7e0d97