Summary: Removes older `torch.stack`-based logic in favor of `torch.diagonal()` and `torch.diag_embed()`. I see a 100x speedup in my application, where my batched matrix has shape `(800, 32, 32)`.

```py
import torch
from torch.distributions import constraints, transform_to

x = torch.randn(800, 32, 32, requires_grad=True)

# Before this PR:
%%timeit
transform_to(constraints.lower_cholesky)(x).sum().backward()
# 579 ms ± 34.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# After this PR:
%%timeit
transform_to(constraints.lower_cholesky)(x).sum().backward()
# 4.5 ms ± 241 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/24131
Differential Revision: D16764035
Pulled By: ezyang
fbshipit-source-id: 170cdb0d924cdc94cd5ad3b75d1427404718d437
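For context, `transform_to(constraints.lower_cholesky)` maps an unconstrained batch of square matrices to lower-triangular matrices with positive diagonals. Below is a minimal sketch of the vectorized pattern the summary describes, built from `Tensor.tril()`, `torch.diagonal()`, and `torch.diag_embed()`. This is an illustration, not the PR's exact code; in particular, `exp()` as the positivity map on the diagonal is an assumption here.

```py
import torch

def lower_cholesky_sketch(x):
    # Strictly lower-triangular part of each matrix in the batch.
    strict_lower = x.tril(-1)
    # Extract each matrix's diagonal (shape (batch, n)) and make it positive.
    # exp() is assumed here; the actual transform may use a different map.
    pos_diag = torch.diagonal(x, dim1=-2, dim2=-1).exp()
    # diag_embed() re-inserts the positive diagonal as a batch of diagonal
    # matrices, so the whole transform runs as a few batched kernels instead
    # of per-matrix torch.stack bookkeeping.
    return strict_lower + torch.diag_embed(pos_diag)

x = torch.randn(800, 32, 32)
y = lower_cholesky_sketch(x)
assert torch.equal(y, y.tril())                               # lower triangular
assert bool((torch.diagonal(y, dim1=-2, dim2=-1) > 0).all())  # positive diagonal
```

Because every step is a single batched op, the backward pass also runs as batched kernels rather than a Python-level loop over matrices, which is where the reported speedup comes from.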
| Name |
| --- |
| `_thnn` |
| `autograd` |
| `backends` |
| `contrib` |
| `csrc` |
| `cuda` |
| `distributed` |
| `distributions` |
| `for_onnx` |
| `jit` |
| `legacy` |
| `lib` |
| `multiprocessing` |
| `nn` |
| `onnx` |
| `optim` |
| `quantization` |
| `sparse` |
| `testing` |
| `utils` |
| `__config__.py` |
| `__future__.py` |
| `__init__.py` |
| `__init__.pyi.in` |
| `_classes.py` |
| `_jit_internal.py` |
| `_ops.py` |
| `_six.py` |
| `_storage_docs.py` |
| `_tensor_docs.py` |
| `_tensor_str.py` |
| `_torch_docs.py` |
| `_utils_internal.py` |
| `_utils.py` |
| `abi-check.cpp` |
| `CMakeLists.txt` |
| `custom_class.h` |
| `extension.h` |
| `functional.py` |
| `hub.py` |
| `namedtensor.py` |
| `py.typed` |
| `quasirandom.py` |
| `random.py` |
| `README.txt` |
| `script.h` |
| `serialization.py` |
| `storage.py` |
| `tensor.py` |
Note [TH abstraction violation]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TH/THC provide some hpp headers, which are proper C++ headers rather than C
headers. These headers serve double duty as *internal implementation detail*
headers, whose contents should largely not be used by external clients.

Ideally, we would not install these headers at all; instead, you should use
public functions (in headers like `THTensor.h`, NOT `THTensor.hpp`) to
manipulate these structs.

However, there are a few places in torch/csrc where we violate this
abstraction. They are marked with a pointer to this note. Each of those sites
will have to be refactored when we refactor the guts of THTensor and related
structures.