pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 00:21:07 +01:00

Author	SHA1	Message	Date
kshitij12345	ffd76d11c9	[fix] take : backward batching rule (#95772 ) Fixes https://github.com/pytorch/pytorch/issues/95738 Pull Request resolved: https://github.com/pytorch/pytorch/pull/95772 Approved by: https://github.com/zou3519	2023-03-30 17:18:17 +00:00
Li-Huai (Allan) Lin	7776653a0c	Add linear gradgrad (#97151 ) Fixes #92206 Pull Request resolved: https://github.com/pytorch/pytorch/pull/97151 Approved by: https://github.com/albanD	2023-03-30 07:25:02 +00:00
Edward Z. Yang	32fdd44577	SymIntify maybe_multiply (#97675 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/97675 Approved by: https://github.com/albanD	2023-03-27 23:20:23 +00:00
Joel Schlosser	0d66db1b2a	Implement last dim split_with_sizes for NT (forward only, non-SymInt-ified) (#97446 ) This is needed for the HSTU model. Details: * ~~NT `chunk` now calls into NT `split_with_sizes` since the latter is more general~~ (removed; they're totally separate) * Throws for backward * Only operates over the last dim (`dim=-1`) Pull Request resolved: https://github.com/pytorch/pytorch/pull/97446 Approved by: https://github.com/cpuhrsch	2023-03-23 22:17:06 +00:00
Driss Guessous	98a5cf090d	[SDPA] Remove the chunk_grad from mem-eff attention (#96880 ) # Summary There exists an optimization within the scaled_dot_product_efficient backward attention path to, under the right conditions, output grad_q, grad_k, grad_v all as aliases of the same storage. This was done to optimize for the hot path where mha does packed linear_projection -> chunk -> (view stuff) -> sdpa. The thought was that chunk-> would be able to "trivially" cat inputs to chunk.backward(). However, upon closer inspection, chunk.backward will call `cat` regardless of the inputs, so this is not being utilized. I validated this by profiling on main and then on this branch; both traces were the same, with `split.backward()` calling into cat. Pull Request resolved: https://github.com/pytorch/pytorch/pull/96880 Approved by: https://github.com/cpuhrsch	2023-03-17 21:28:25 +00:00
Driss Guessous	7ec0d6f006	Moves SDPA backward helper native function to functionsmanual.cpp (#95821 ) ## Summary chunk_grad_outputs should have been created within functionsmanual.cpp to begin with. This removes it as a native function and adds to its appropriate home. Pull Request resolved: https://github.com/pytorch/pytorch/pull/95821 Approved by: https://github.com/cpuhrsch, https://github.com/albanD	2023-03-14 17:49:07 +00:00
Nikita Shulga	82daf98151	[Sparse] Move `SparseTensorUtils.` to `native/` (#96696 ) Fixes internal linking problem after `DECLARE_DISPATCH` was introduced in SparseTensorUtils.cpp, but implemented inside the native library. Also, fix `sign-unsigned` compare in `_flatten_indices_impl` Followups: Move code declared/implemented in `SparseTensorUtils.` to `at::native` namespace Pull Request resolved: https://github.com/pytorch/pytorch/pull/96696 Approved by: https://github.com/albanD	2023-03-14 02:56:52 +00:00
Kazuaki Ishizaki	69aa6b4bb9	fix typo in comments under torch/csrc/autograd (#96061 ) This PR fixes typos in comments of `.cpp` and `.h` files under `torch/csrc/autograd` directory Pull Request resolved: https://github.com/pytorch/pytorch/pull/96061 Approved by: https://github.com/soulitzer	2023-03-06 18:05:14 +00:00
Masaki Kozuki	49f6849f58	Fix codegen logic for foreach derivatives (#95263 ) Follow-up to https://github.com/pytorch/pytorch/pull/93901. Unexpected numerical mismatches observed in some foreach functions' backward results seemed to be caused by the wrong order of `IndexRangeGenerator::range` calls. This PR makes `args_with_derivatives` follow the same (or a similar) order as `foreach_native_function.func.arguments.flat_non_out` --- what the current master generates for `_foreach_mul.List`: ```cpp variable_list ForeachMulBackward0List::apply(variable_list&& grads) { std::lock_guard<std::mutex> lock(mutex_); TORCH_CHECK(!other_released_, ERR_BACKWARD_TWICE); TORCH_CHECK(!self_released_, ERR_BACKWARD_TWICE); IndexRangeGenerator gen; auto other_ix = gen.range(other_size_); auto self_ix = gen.range(self_size_); variable_list grad_inputs(gen.size()); auto other = unpack_list(other_); auto self = unpack_list(self_); if (task_should_compute_output({ other_ix })) { std::vector<Tensor> grad_result; grad_result.reserve(grads.size()); for (const auto & i : c10::irange(grads.size())) { grad_result.emplace_back(mul_tensor_backward(grads[i], self[i], other[i].scalar_type())); } copy_range(grad_inputs, other_ix, grad_result); } if (task_should_compute_output({ self_ix })) { std::vector<Tensor> grad_result; grad_result.reserve(grads.size()); for (const auto & i : c10::irange(grads.size())) { grad_result.emplace_back(mul_tensor_backward(grads[i], other[i], self[i].scalar_type())); } copy_range(grad_inputs, self_ix, grad_result); } return grad_inputs; } ``` with this PR the generated backward is ```cpp variable_list ForeachMulBackward0List::apply(variable_list&& grads) { std::lock_guard<std::mutex> lock(mutex_); TORCH_CHECK(!self_released_, ERR_BACKWARD_TWICE); TORCH_CHECK(!other_released_, ERR_BACKWARD_TWICE); IndexRangeGenerator gen; auto self_ix = gen.range(self_size_); <----- diff auto other_ix = gen.range(other_size_); <----- diff variable_list grad_inputs(gen.size()); auto self = unpack_list(self_); auto other = unpack_list(other_); if (task_should_compute_output({ other_ix })) { std::vector<Tensor> grad_result; grad_result.reserve(grads.size()); for (const auto & i : c10::irange(grads.size())) { grad_result.emplace_back(mul_tensor_backward(grads[i], self[i], other[i].scalar_type())); } copy_range(grad_inputs, other_ix, grad_result); } if (task_should_compute_output({ self_ix })) { std::vector<Tensor> grad_result; grad_result.reserve(grads.size()); for (const auto & i : c10::irange(grads.size())) { grad_result.emplace_back(mul_tensor_backward(grads[i], other[i], self[i].scalar_type())); } copy_range(grad_inputs, self_ix, grad_result); } return grad_inputs; } ``` The change is to fix the order of `self_ix` and `other_ix`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/95263 Approved by: https://github.com/soulitzer	2023-03-04 20:03:54 +00:00
Edward Z. Yang	fb10e66d35	Bulk convert numel() to sym_numel() in FunctionsManual (#95543 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/95543 Approved by: https://github.com/ngimel, https://github.com/Skylion007	2023-02-27 13:46:13 +00:00
Peter Bell	bc438af6fe	std/var: support floating point correction value (#94073 ) Ref https://github.com/pytorch/pytorch/issues/61492#issuecomment-1413003480 The array API specifies correction to be `Union[int, float]` while we currently only support integers. https://data-apis.org/array-api/latest/API_specification/generated/array_api.std.html As std/var is calculated currently, the final count of elements is already done in floating point so we can make the correction floating point without any loss of precision or generality. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94073 Approved by: https://github.com/ezyang	2023-02-23 05:50:45 +00:00
Li-Huai (Allan) Lin	b6a1c238bd	[MPS] Remove mps specialized path in BCE backward (#95220 ) Remove mps specialized path in BCE backward as `logit` op has been implemented for mps. Pull Request resolved: https://github.com/pytorch/pytorch/pull/95220 Approved by: https://github.com/soulitzer	2023-02-22 19:43:53 +00:00
kshitij12345	311b20aae1	[fix] torch.pow handle real negative base and complex exponent (#95198 ) Fixes https://github.com/pytorch/pytorch/issues/89903 https://github.com/pytorch/pytorch/issues/95111 Pull Request resolved: https://github.com/pytorch/pytorch/pull/95198 Approved by: https://github.com/albanD, https://github.com/ngimel	2023-02-21 18:36:20 +00:00
min-jean-cho	92f3feabaa	fix torch.var backward when n==correction (#94546 ) Fixes #94184 This PR, as discussed in [comment](https://github.com/pytorch/pytorch/issues/94184#issuecomment-1422128166), returns `x.grad` of same shape as `x`, and filled with `NaN` when the gradient of `torch.var(unbiased=True)` is `NaN`. The gradient of unbiased variance is `NaN` (undefined, divide by zero in the denom `N-1`, where `N` is the number of samples) when `N` is 1 (i.e., there's one sample only -- product of dim is 1 such as `[1]`, `[1,...,1]`). Pull Request resolved: https://github.com/pytorch/pytorch/pull/94546 Approved by: https://github.com/soulitzer	2023-02-13 23:38:38 +00:00
Yanming Wang	9bef1ebb9e	Fix div by fp64 scalar issue on xla device (#94459 ) This PR fixes https://github.com/pytorch/xla/issues/4574. I'll create a separate test PR in pytorch/xla repo. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94459 Approved by: https://github.com/ezyang	2023-02-10 17:57:47 +00:00
cyy	1a32db15e7	Some performance fixes (#94034 ) Applies some performance fixes Pull Request resolved: https://github.com/pytorch/pytorch/pull/94034 Approved by: https://github.com/Skylion007	2023-02-04 02:17:48 +00:00
Nikita Vedeneev	b484d17c24	_sparse_coo_tensor_with_dims_and_tensors backward: simplify and optimize (#91704 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/91704 Approved by: https://github.com/albanD, https://github.com/cpuhrsch	2023-02-01 09:02:25 +00:00
cyy	4d51c8532c	Some simple fixes (#93221 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/93221 Approved by: https://github.com/Skylion007	2023-01-30 05:14:03 +00:00
Aaron Gokaslan	0247ed27cc	Apply Clang-Tidy readability-container-size-empty (#93236 ) Not only is this change usually shorter and more readable, it also can yield better performance. size() is not always a constant time operation (such as on LinkedLists), but empty() always is. Pull Request resolved: https://github.com/pytorch/pytorch/pull/93236 Approved by: https://github.com/malfet	2023-01-29 23:28:19 +00:00
mfkasim1	75cfc0be21	Logcumsumexp for CPU (#93153 ) Partial work from #90847, in the direction of solving #89205. Most of the content is from #90847, but this is only for CPU, so hopefully it does not increase the build time by a lot. tag: @albanD, @malfet Pull Request resolved: https://github.com/pytorch/pytorch/pull/93153 Approved by: https://github.com/malfet, https://github.com/Skylion007	2023-01-27 22:29:33 +00:00
cyy	e292ddff4e	More clang-tidy fixes (#92944 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/92944 Approved by: https://github.com/Skylion007	2023-01-25 19:11:51 +00:00
PyTorch MergeBot	9b23fd378f	Revert "Logcumsumexp for complex in CPU and CUDA (#90847 )" This reverts commit `64985123e4`. Reverted https://github.com/pytorch/pytorch/pull/90847 on behalf of https://github.com/malfet due to Reverting to decrease build time, let's discuss the alternatives here	2023-01-24 20:49:08 +00:00
Aaron Gokaslan	8c8cd9539d	Add missing moves to torch autograd (#92772 ) Applies some additional std::move functions to torch/csrc/autograd to opportunities that were found via static analysis. Pull Request resolved: https://github.com/pytorch/pytorch/pull/92772 Approved by: https://github.com/ezyang	2023-01-24 02:01:52 +00:00
Nikita Vedeneev	9f381c9b7f	sparse_sparse_matmul: simplify backward (#91712 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/91712 Approved by: https://github.com/albanD	2023-01-23 19:24:28 +00:00
mfkasim1	64985123e4	Logcumsumexp for complex in CPU and CUDA (#90847 ) Another PR towards solving #89205. What's in this PR: * The implementation of forward `logcumsumexp` for complex numbers in CPU & CUDA * The tests on forward call of `logcumsumexp` for complex numbers * The implementation of backward `logcumsumexp` for complex numbers What's missing: * The test on backward gradient of `logcumsumexp` (it complains `RuntimeError: logcumsumexp does not support automatic differentiation for outputs with complex dtype.` and I don't know how to solve the error or where to put the test for the backward computation). If possible, I'd like this to be done in this PR. It's really tricky to handle the edge cases here (i.e. the ones involving `inf`), but I've tried my best to add comments explaining the reasoning behind my decisions in this PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/90847 Approved by: https://github.com/albanD	2023-01-20 15:10:50 +00:00
Peter Bell	4058dedf21	Replace log(1 + x) with log1p(x) (#92114 ) `log1p` offers better precision near zero since `(1 + x) - 1` truncates any values less than the float epsilon to zero. For `soft_margin_loss` this also requires one fewer kernel invocation which for numel=1e7 gives me a 1.2x speedup on CUDA and a 1.1x speedup on CPU. Pull Request resolved: https://github.com/pytorch/pytorch/pull/92114 Approved by: https://github.com/ngimel, https://github.com/lezcano	2023-01-18 10:43:56 +00:00
Peter Bell	fb1427ea8f	squeeze: allow squeezing multiple dimensions at once (#89017 ) Ref #70924 This addresses part 1 of the issue, allowing `torch.squeeze` to be passed a tuple of dimensions. e.g. ```python x.squeeze(0).squeeze(0) ``` can now be written ```python x.squeeze((0, 1)) ``` (assuming x has at least 2 dimensions) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89017 Approved by: https://github.com/albanD	2023-01-17 14:20:15 +00:00
David Berard	d7dc1c2fd5	Support zero dimensions in softmax decompositions (#91322 ) The eager implementation of softmax supports computation along zero dimensions, but many of the other implementations did not, including: * decompositions & refs (this was causing dynamo failures) * forward AD for logsumexp * MPS log_softmax_backward This PR handles the `input.numel() == 0` cases separately to avoid running `amax()`, which fails for zero dimensions, and updates opinfos. example of "computation along zero dimensions": ```python # example of where import torch t = torch.rand((4, 0, 0)) print("~") print(torch.nn.functional.softmax(t, dim=-1)) # this passes print("~") torch._refs.softmax(t, dim=-1) # this fails print("~") ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/91322 Approved by: https://github.com/lezcano	2023-01-11 09:35:43 +00:00
PyTorch MergeBot	df4b3b13bc	Revert "squeeze: allow squeezing multiple dimensions at once (#89017 )" This reverts commit `e26cb06681`. Reverted https://github.com/pytorch/pytorch/pull/89017 on behalf of https://github.com/mehtanirav due to Internal breakages	2023-01-05 19:25:08 +00:00
Peter Bell	e26cb06681	squeeze: allow squeezing multiple dimensions at once (#89017 ) Ref #70924 This addresses part 1 of the issue, allowing `torch.squeeze` to be passed a tuple of dimensions. e.g. ```python x.squeeze(0).squeeze(0) ``` can now be written ```python x.squeeze((0, 1)) ``` (assuming x has at least 2 dimensions) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89017 Approved by: https://github.com/albanD	2023-01-04 14:40:56 +00:00
lezcano	d5163f5206	Fix NumPy broadcasting in lstsq_backward (#91460 ) Fixes https://github.com/pytorch/pytorch/issues/77225 Pull Request resolved: https://github.com/pytorch/pytorch/pull/91460 Approved by: https://github.com/albanD	2022-12-30 10:49:20 +00:00
lezcano	051d16a2f7	Fix NumPy-compat broadcasting in the derivative of linalg.solve (#91456 ) Fixes https://github.com/pytorch/pytorch/issues/89761 Pull Request resolved: https://github.com/pytorch/pytorch/pull/91456 Approved by: https://github.com/albanD	2022-12-30 10:49:20 +00:00
lezcano	484dd40022	Implement PReLU in a compositional way (#91238 ) The PReLU implementation was all over the place. This led to a number of bugs like https://github.com/pytorch/pytorch/issues/68760. We fix it by: - Keeping the weird broadcasting logic it has as a CompositeImplicit kernel that calls into a second kernel - This second kernel is just a good-ol' pointwise kernel. - We implement the derivative for the pointwise kernel via TI as well for speed. - We implement the second derivative for the pointwise kernel and the forward AD derivatives compositionally This fixes a number of issues: - We don't perform copies any more when the inputs are not contiguous - The derivatives are now correct - We fix vmap and many other functorch-related issues. - CPU and CUDA now share the relevant broadcasting logic - The implementation is about 1/3 the length. Fixes https://github.com/pytorch/pytorch/issues/68760 Fixes https://github.com/pytorch/pytorch/issues/89895 Pull Request resolved: https://github.com/pytorch/pytorch/pull/91238 Approved by: https://github.com/kshitij12345, https://github.com/jbschlosser, https://github.com/albanD	2022-12-30 10:42:30 +00:00
lezcano	5b223c43ec	Avoid calling allclose in the backward if there are tensor subclasses (#91444 ) `allclose` is data-dependent (it returns a bool), so it does not play well with functorch. We are skipping that check in the context of subclasses to avoid hard errors. Partially fixes https://github.com/pytorch/pytorch/issues/90499 Pull Request resolved: https://github.com/pytorch/pytorch/pull/91444 Approved by: https://github.com/albanD	2022-12-28 19:12:50 +00:00
Nikita Karetnikov	cc11edb084	[aot_autograd] symintify `logsumexp` (#91442 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/91442 Approved by: https://github.com/albanD	2022-12-28 18:06:26 +00:00
Nikita Vedeneev	3870a9e28d	to_sparse_XXX: backward support (#90281 ) As per title. Fixes https://github.com/pytorch/pytorch/issues/85226 Pull Request resolved: https://github.com/pytorch/pytorch/pull/90281 Approved by: https://github.com/cpuhrsch, https://github.com/soulitzer	2022-12-14 09:05:17 +00:00
soulitzer	98a9235dce	Fix prelu ref when a.ndim < 2 (#89809 ) Fixes https://github.com/pytorch/pytorch/issues/89560 Previously the test case for "input is 1-D or scalar + weight is not scalar" did not exist; adding it introduced some failures: - forward AD (fixed in this PR) - vmap (filed https://github.com/pytorch/pytorch/issues/89895) - ref/meta (fixed this PR, though this also regresses nvFuser support) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89809 Approved by: https://github.com/ngimel	2022-12-12 23:55:31 +00:00
Aaron Gokaslan	7541c9f8be	[Fix]: remove unnecessary copies in aten, c10, and torch bindings (#90629 ) Applies various automated fixes that reduces the number of spurious copies in torch, aten, and c10. I also inlined any default dtors that would have made the type trivially destructible. Follow up to #89000 Pull Request resolved: https://github.com/pytorch/pytorch/pull/90629 Approved by: https://github.com/ezyang	2022-12-12 17:05:52 +00:00
Richard Zou	4b1053497c	[vmap] Prepend "legacy" to files for old vmap implementation (#90324 ) We have an older torch.vmap implementation. It is no longer supported. It still needs to exist somewhere for the sake of BC with torch.autograd.functional. This PR makes it clear what files are meant for implementing the old vmap implementation. I've seen a couple of PRs recently adding support for the old vmap implementation, so this will lessen the confusion. Test Plan: - CI Pull Request resolved: https://github.com/pytorch/pytorch/pull/90324 Approved by: https://github.com/samdow	2022-12-07 18:46:15 +00:00
Nikita Karetnikov	4cb6bbbe27	Symintify `embedding` (#89327 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89327 Approved by: https://github.com/ezyang	2022-11-24 03:25:00 +00:00
Andrew M. James	a41f70603a	Round out rad2deg sparse support (#88442 ) - Add sparse coo dispatch - Modify backward to work with sparse compressed layouts - Enable sparse_compressed autograd testing - Correct layout support attributes on OpInfo Pull Request resolved: https://github.com/pytorch/pytorch/pull/88442 Approved by: https://github.com/cpuhrsch	2022-11-17 06:00:23 +00:00
Kazuaki Ishizaki	e0c194f10b	Fix typos in messages under torch (#88961 ) This PR fixes typos in messages and params in C++ source and header files under the `torch` directory. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88961 Approved by: https://github.com/albanD	2022-11-14 19:06:41 +00:00
Brian Hirsh	a16ced03c9	reland "fix as_strided_scatter_backward (#87646 )" (#88342 ) This reverts commit `71fb763e54`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88342 Approved by: https://github.com/zou3519	2022-11-07 15:00:58 +00:00
Andrew M. James	ff6770a9a1	enable backward for log1p (sparse layouts) (#88155 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88155 Approved by: https://github.com/cpuhrsch	2022-11-04 20:59:26 +00:00
Andrew M. James	6938dd0b2c	Support sparse inputs to deg2rad (#88156 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88156 Approved by: https://github.com/cpuhrsch	2022-11-04 20:59:26 +00:00
PyTorch MergeBot	71fb763e54	Revert "fix as_strided_scatter_backward (#87646 )" This reverts commit `f9d7985851`. Reverted https://github.com/pytorch/pytorch/pull/87646 on behalf of https://github.com/huydhn due to Sorry for reverting your PR, but I think this one or one of the PRs in the stack breaks bionic-cuda11.7 on trunk `70782981f0`	2022-11-02 16:54:36 +00:00
Brian Hirsh	f9d7985851	fix as_strided_scatter_backward (#87646 ) as_strided_scatter's derivative formula was broken - instead of making a "mask" of 1's and 0's, it would effectively make a mask of 1's and uninitialized memory. Fixes https://github.com/pytorch/pytorch/issues/88105 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87646 Approved by: https://github.com/albanD	2022-11-02 14:36:49 +00:00
albanD	8a9aca7b8d	Reland 2 Many symintifications (#87604 ) (#87980 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87980 Approved by: https://github.com/ezyang	2022-10-28 13:40:11 +00:00
PyTorch MergeBot	8b4d95759c	Revert "Many symintifications (#87604 )" This reverts commit `777e6a2c51`. Reverted https://github.com/pytorch/pytorch/pull/87604 on behalf of https://github.com/weiwangmeta due to breaking internal builds	2022-10-28 03:00:11 +00:00
albanD	777e6a2c51	Many symintifications (#87604 ) Adds expand_inplace conv conv_double_backward convolution adaptive_avg_pool2d_symint _embedding_bag_backward_symint cudnn_grid_sampler cuda 32 bit indexing nll_loss / nll_loss_2d tensor split pooling same mode cudnn_is_acceptable storage nbytes Pull Request resolved: https://github.com/pytorch/pytorch/pull/87604 Approved by: https://github.com/ezyang	2022-10-26 17:33:53 +00:00
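
The `log1p` change listed above (4058dedf21) rests on a floating-point detail that is easy to check by hand. The following is a minimal sketch in plain Python using only the standard `math` module, not the PyTorch kernels the commit touches; the helper name `naive_log1p` is hypothetical and exists only for this illustration.

```python
import math


def naive_log1p(x: float) -> float:
    """Compute log(1 + x) the direct way (hypothetical helper for illustration)."""
    # Forming 1.0 + x first rounds away any x much smaller than float epsilon.
    return math.log(1.0 + x)


tiny = 1e-20  # far below the float64 epsilon (about 2.2e-16)

# 1.0 + tiny rounds to exactly 1.0, so the direct form loses the input entirely.
naive = naive_log1p(tiny)

# math.log1p evaluates log(1 + x) without first forming 1 + x.
accurate = math.log1p(tiny)

print(naive, accurate)
```

The direct form collapses to zero for small inputs, while `math.log1p` keeps a result close to `x` itself, which is the precision argument the commit message makes.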

371 Commits
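
Two of the commits listed above concern the `correction` term in `std`/`var`: bc438af6fe allows it to be a float, and 92f3feabaa handles the case where the sample count equals the correction. The arithmetic can be sketched in plain Python as below. This is an illustration of the formula only, under the assumption of a non-empty input, and is not PyTorch's implementation; `var_with_correction` is a hypothetical name.

```python
import math
from typing import Sequence


def var_with_correction(xs: Sequence[float], correction: float = 1.0) -> float:
    """Variance with divisor (N - correction); hypothetical helper, assumes xs is non-empty."""
    n = len(xs)
    mean = sum(xs) / n
    sq_dev = sum((x - mean) ** 2 for x in xs)
    denom = n - correction  # already a floating-point quantity if correction is a float
    if denom == 0:
        # Mirror IEEE division: 0/0 is NaN, nonzero/0 is infinite.
        return math.nan if sq_dev == 0 else math.inf
    return sq_dev / denom


print(var_with_correction([1.0, 2.0, 3.0, 4.0], 1))    # integer correction
print(var_with_correction([1.0, 2.0, 3.0, 4.0], 0.5))  # float correction
print(var_with_correction([3.0], 1))                   # N == correction
```

With a single sample and `correction=1` the divisor is zero and the result is undefined, which is why the gradient in that case is filled with `NaN` rather than given a finite value.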