pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
isalia20	37fe8015ac	softshrink nan fixes (#138421 ) Fixes #138385 . Currently contains fixes for cpu and cuda. Will add fixes to mps as well soon if my mac can build it from source.(Had some issues with building it on my linux pc due to limited memory) Pull Request resolved: https://github.com/pytorch/pytorch/pull/138421 Approved by: https://github.com/mikaylagawarecki	2024-11-21 23:06:08 +00:00
Michael Diggin	723498aab8	Gaussian nll loss scalar variance support (#138931 ) Fixes #138747 Adds support for `variance` being a Tensor or a float in `gaussian_nll_loss` to avoid a cpu-gpu sync point in the loss function, when the variance is a static tensor like `<scalar>*torch.ones_like(input)` Pull Request resolved: https://github.com/pytorch/pytorch/pull/138931 Approved by: https://github.com/mikaylagawarecki	2024-11-21 18:20:09 +00:00
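The scalar-variance case above can be illustrated with a minimal pure-Python sketch of the Gaussian negative log likelihood (constant term omitted, mean reduction). The helper name `gaussian_nll` and its list-based inputs are assumptions for illustration only; this is not the `gaussian_nll_loss` implementation. When the variance is a plain float, the negativity check is an ordinary Python comparison, which is presumably why no device synchronization is needed.

```python
import math


def gaussian_nll(inputs, targets, var, eps=1e-6):
    """Mean Gaussian NLL without the constant term.

    `var` may be a single float or a list with one entry per input
    (hypothetical helper, not the PyTorch API).
    """
    if isinstance(var, (int, float)):
        # Scalar variance: validate once on the host, then broadcast.
        if var < 0:
            raise ValueError("var has negative entry/entries")
        var = [float(var)] * len(inputs)
    elif any(v < 0 for v in var):
        raise ValueError("var has negative entry/entries")

    total = 0.0
    for x, t, v in zip(inputs, targets, var):
        v = max(v, eps)  # clamp for numerical stability
        total += 0.5 * (math.log(v) + (x - t) ** 2 / v)
    return total / len(inputs)


# A scalar variance and the equivalent constant list give the same loss.
scalar_loss = gaussian_nll([0.5, 2.5], [0.0, 2.0], 0.25)
list_loss = gaussian_nll([0.5, 2.5], [0.0, 2.0], [0.25, 0.25])
```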
PyTorch MergeBot	109f8274a8	Revert "Add NHWC support for group normalization (#126635 )" This reverts commit `ed0e63e938`. Reverted https://github.com/pytorch/pytorch/pull/126635 on behalf of https://github.com/kit1980 due to Reverted internally at Meta, see D65979564 ([comment](https://github.com/pytorch/pytorch/pull/126635#issuecomment-2480130943))	2024-11-15 23:38:15 +00:00
Nikita Shulga	0f739b8f66	[Codemod] `skipIfMps`->`skipIfMPS` (#140562 ) As `MPS` is an acronym that stands for Metal Performance Shaders Also to closer align with `skipCUDAIf` not `skipCudaIf` Pull Request resolved: https://github.com/pytorch/pytorch/pull/140562 Approved by: https://github.com/ZainRizvi, https://github.com/r-barnes	2024-11-13 19:45:08 +00:00
zeshengzong	cb71bcc542	Replace clone.detach with detach.clone (#140264 ) Fixes #64532 As stated in the issue, replace `clone.detach` with `detach.clone` Pull Request resolved: https://github.com/pytorch/pytorch/pull/140264 Approved by: https://github.com/soulitzer	2024-11-13 07:01:02 +00:00
Mikayla Gawarecki	2ee91db03d	Add APIs to separate norm calculation and gradient scaling in `nn.utils.clip_grad_norm_` (#139662 ) Fixes https://github.com/pytorch/pytorch/issues/139467 Refactor `nn.utils.clip_grad_norm_` into `nn.utils.get_total_norm` and then `nn.utils.clip_grads_with_norm_` . `clip_grad_norm_` now calls into these two new ops, `get_total_norm` is generalized (rather than `get_grad_norm` due to the discussion on the issue from @awgu) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139662 Approved by: https://github.com/H-Huang	2024-11-07 23:13:23 +00:00
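The two-step split described above (compute the total norm first, then scale the gradients with it) can be sketched in plain Python. The function names `total_norm` and `scale_grads_` and the nested-list gradients are hypothetical stand-ins, not the `nn.utils.get_total_norm` / `nn.utils.clip_grads_with_norm_` signatures; the `1e-6` term in the clip coefficient is an assumption modelled on the usual `clip_grad_norm_` formula.

```python
import math


def total_norm(grads, norm_type=2.0):
    """Norm over all gradient entries, treated as one flat vector."""
    flat = [abs(g) for grad in grads for g in grad]
    if math.isinf(norm_type):
        return max(flat)
    return sum(v ** norm_type for v in flat) ** (1.0 / norm_type)


def scale_grads_(grads, max_norm, norm):
    """Scale gradients in place so their total norm is at most max_norm."""
    coef = min(max_norm / (norm + 1e-6), 1.0)  # never scale gradients up
    for grad in grads:
        for i in range(len(grad)):
            grad[i] *= coef
    return coef


grads = [[3.0, 0.0], [0.0, 4.0]]
norm_before = total_norm(grads)  # step 1: measure
coef = scale_grads_(grads, max_norm=1.0, norm=norm_before)  # step 2: scale
norm_after = total_norm(grads)
```

Separating the steps lets a caller log or reuse the norm, or compute it over a different set of tensors, before deciding how to scale.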
Danial Javady	ed0e63e938	Add NHWC support for group normalization (#126635 ) Fixes #111824 Currently, if the user specifies their group normalization to be of NHWC format, pytorch will default to NCHW tensors and convert. This conversion is not immediately obvious to the user unless they check the format themselves, which is not intuitive. This PR adds support for NHWC for cuda by adding the necessary kernels. Pull Request resolved: https://github.com/pytorch/pytorch/pull/126635 Approved by: https://github.com/eqy, https://github.com/mikaylagawarecki	2024-11-07 01:12:08 +00:00
Donald Tolley	c1e7d85ce6	Add Weighted Loss Functions to PyTorch : WMSE, WMAE, and Weighted Huber Loss (#132049 ) #### Summary This pull request introduces new weighted loss functions to the PyTorch library: `weighted_huber_loss`, `wmse_loss`, and `wmae_loss`. These functions allow for precise control over the influence of each sample during training, important for imbalanced data or when certain samples are more significant than others. #### Changes - `weighted_huber_loss`: Huber loss modified to incorporate weights, providing a balance between L1 and L2 loss based on the `delta` parameter. - `wmse_loss` (Weighted Mean Squared Error): Applies weights to the standard MSE loss, useful for emphasizing certain samples in regression tasks. - `wmae_loss` (Weighted Mean Absolute Error): Adjusts MAE loss calculation by including weights, ideal for datasets with outliers. #### Code Details - Input Validation: Ensures `input`, `target`, and `weights` tensors match in size to prevent broadcasting errors. - Reduction Options: Supports `none`, `mean`, and `sum` reductions to suit various computational needs. - Backward Compatibility: Maintains support for deprecated arguments `size_average` and `reduce`, while encouraging use of the `reduction` argument. #### Usage Example ```python import torch input = torch.tensor([0.5, 2.5, 2.0], dtype=torch.float32) target = torch.tensor([0.0, 2.0, 1.5], dtype=torch.float32) weights = torch.tensor([1.0, 0.5, 1.5], dtype=torch.float32) loss = weighted_huber_loss(input, target, weights, delta=1.0) print(loss) ``` --- Feedback on these implementations is welcome; please let me know if further modifications are required. Resolves #132465 Pull Request resolved: https://github.com/pytorch/pytorch/pull/132049 Approved by: https://github.com/mikaylagawarecki Co-authored-by: mikaylagawarecki <mikaylagawarecki@gmail.com>	2024-10-31 21:59:43 +00:00
Siddharth Kotapati	e27c0048db	Enable additional tests for MPS CI runs (#134356 ) As part of the follow up for https://github.com/pytorch/pytorch/issues/133520, adapting existing unused tests for use in MPS CI runs. Focusing on nhwc & other memory formatting tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/134356 Approved by: https://github.com/malfet, https://github.com/eqy, https://github.com/huydhn	2024-10-04 21:52:38 +00:00
Nikita Shulga	c6192f32f1	[MPS] Add upsample_bicubic2d as Metal op (#136123 ) More or less literal copy-n-paste of `c33b0580e6/aten/src/ATen/native/cuda/UpSampleBicubic2d.cu (L24)` and `c33b0580e6/aten/src/ATen/native/cuda/UpSampleBicubic2d.cu (L99)` Missing `uint8` implementation mimics CUDA behavior Initial version coded live in https://www.youtube.com/watch?v=shi6Kb5xxvk Later refinements: - Switch from 2D dispatch to 1D one (to match CUDA behavior) - Added batch + channel loops - Fixed scale computation to match align corners behavior - Added backward implementation Backward implementation, again, mimics CUDA, so it has precision issues for `torch.half` as well as a somewhat slow simulation of atomic adds using atomic compare and exchange of the pair of adjacent values, i.e. ```metal template <typename T> static inline void atomic_add_helper( device atomic<int>* data, long offset, float value) { auto ptr = data + (offset >> 1); auto old = atomic_load_explicit(ptr, memory_order_relaxed); union { int i; T t[2]; } val; do { val.i = old; val.t[offset & 1] += static_cast<T>(value); } while (!atomic_compare_exchange_weak_explicit( ptr, &old, val.i, memory_order_relaxed, memory_order_relaxed)); } ``` Bump basic Metal language version to 3.0, as it's supported on MacOS13 and that's the first version that has `atomic_float` Pull Request resolved: https://github.com/pytorch/pytorch/pull/136123 Approved by: https://github.com/albanD	2024-09-24 18:58:11 +00:00
Xinya Zhang	74fd1bf965	[ROCm] Update to AOTriton 0.7b (#134498 ) Notable changes: 1. Enable CudaGraph related tests 2. Fix UT problems 3. EXPERIMENTAL Navi31 support. User should enable Navi31 support with Env Var `TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1` Known Problem: 1. `test/test_transformers.py` will have massive failures and/or NaN outputs with `--use-pytest` + Update: Confirmed that skipping `class TestSDPAPrivateUse1Only` fixes the problem with `--use-pytest` Note: AOTriton 0.7b adds support for nested tensors + SDPA but needs more work (and consequently a separate PR) to enable it. Fixes #133540 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134498 Approved by: https://github.com/pruthvistony, https://github.com/jeffdaily, https://github.com/malfet	2024-09-11 20:34:01 +00:00
Roy Hvaara	23b1486185	[MPS] Allow nan mean reduction in `nll_loss` (#135434 ) This PR allows results from `nll_loss` to be `nan`, which is the same behavior as with CUDA and CPU https://github.com/pytorch/pytorch/pull/64572#issuecomment-926504162. Fixes #134431 Ref #64572 #119108 Pull Request resolved: https://github.com/pytorch/pytorch/pull/135434 Approved by: https://github.com/malfet	2024-09-10 08:37:59 +00:00
Guilherme Leobas	136e28f616	Enable forward AD in functional.affine_grid (#135494 ) Fixes #121411 Pull Request resolved: https://github.com/pytorch/pytorch/pull/135494 Approved by: https://github.com/zou3519, https://github.com/soulitzer	2024-09-10 00:07:07 +00:00
CaoE	dfb2b661f7	Use float data type for Half var_sum in batchnorm stats updating on CPU (#126525 ) Using float data type for Half `var_sum` in batchnorm stats updating on CPU to avoid `var_sum` overflow since the representation range of Half is small. Pull Request resolved: https://github.com/pytorch/pytorch/pull/126525 Approved by: https://github.com/jgong5, https://github.com/peterbell10	2024-09-09 15:31:38 +00:00
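The overflow described above can be reproduced with a small standard-library sketch. It emulates Half rounding with the `struct` format code `"e"` (IEEE 754 binary16) and compares accumulating a sum of squared deviations in Half against accumulating in a wider type. The helper names are hypothetical, and Python's double-precision `float` stands in for the 32-bit float used by the actual CPU kernel.

```python
import struct


def to_half(x):
    """Round x to the nearest Half value; out-of-range values become inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return float("inf")


def sum_sq_dev(values, mean, accumulate_in_half):
    """Sum of squared deviations, optionally rounding the accumulator to Half."""
    acc = 0.0
    for v in values:
        acc = acc + (v - mean) ** 2
        if accumulate_in_half:
            acc = to_half(acc)  # the largest finite Half value is 65504
    return acc


# Inputs are representable in Half, but their squared sum is not.
values = [to_half(v) for v in (-200.0, 200.0, -200.0, 200.0)]
half_sum = sum_sq_dev(values, 0.0, accumulate_in_half=True)
wide_sum = sum_sq_dev(values, 0.0, accumulate_in_half=False)
```

Each squared deviation here is 40000, so the second addition already exceeds the Half range, while the wider accumulator holds the full sum.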
Roy Hvaara	5a69e0ebbe	[MPS] Update decorator comments with issue ref (#135448 ) Updating the comments with references to better places for context now that the bugs have been identified. xref #135442 #135447 #134184 Pull Request resolved: https://github.com/pytorch/pytorch/pull/135448 Approved by: https://github.com/ezyang	2024-09-09 15:18:52 +00:00
Nikita Shulga	f95085fd91	[BE][MPS] Prefer xfail to skip (#134858 ) This essentially undoes large skips on everything but MacOS Sequoia to nn.modules made by https://github.com/pytorch/pytorch/pull/128393 Instead it uses existing `xfail`, but guards it on `_macos15_or_newer` boolean Before the change if run on MacOS 14: ``` % python3 ../test/test_modules.py -v -k Hardswish 2>&1\|tail -n3 Ran 57 tests in 0.053s OK (skipped=32) ``` After ``` % python3 ../test/test_modules.py -v -k Hardswish 2>&1\|tail -n3 Ran 57 tests in 0.229s OK (skipped=10, expected failures=2) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/134858 Approved by: https://github.com/janeyx99	2024-08-31 00:29:48 +00:00
Nikita Shulga	8de0d7690c	Use newer `toAccumulateType` signature in `Normalization.cpp` (#134540 ) This fixes BatchNorm behavior when called with empty tensors on the MPS backend. Removed `expectedFailureMPS` in test_nn.py, deleted expected failure in `test_mps.py` and adjusted `skipIfMPS` to `expectedFailureMPS` in BatchNorm2d OpInfo decorator, but restricted it only to the memory format tests Test Plan: CI + `python3 -c "import torch; print(torch.nn.BatchNorm2d(3, device='mps')(torch.rand(0, 3, 2, 2, device='mps')))"` Fixes https://github.com/pytorch/pytorch/issues/134423 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134540 Approved by: https://github.com/Skylion007, https://github.com/albanD	2024-08-27 18:09:20 +00:00
Roy Hvaara	1565940114	[MPS] Add `test/test_nn.py` to test suite (#134184 ) This PR increases test coverage by including the tests in `test/test_nn.py` in the test suite of MPS. Some of the tests are decorated with `@expectedFailureMPS` for various reasons. Either that the op is not implemented, or that the outputs do not align. Those tests that contain differing results should be investigated further to rule out any live bugs. ```bash $ python test/run_test.py --mps --verbose -k TestNN Running test batch 'tests to run' cost 84.76 seconds ``` Ref #133520 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134184 Approved by: https://github.com/albanD, https://github.com/malfet	2024-08-26 23:48:23 +00:00
drisspg	fb26b84390	Update fused kernels and call _safe_softmax from SDPA (#133882 ) # UPDATE: This is take 3 of https://github.com/pytorch/pytorch/pull/131863 which was landed via co dev but not applying correctly # Summary Changes the stance of SDPA on what to do for fully masked out rows ## Current Behavior Several PyTorch users have expressed frustration over this issue: - https://github.com/pytorch/pytorch/issues/41508 - https://github.com/pytorch/pytorch/issues/103749 - https://github.com/pytorch/pytorch/issues/103963 These are significant issues with extensive discussion but no satisfactory resolution. The PyTorch team's consensus, as stated here: https://github.com/pytorch/pytorch/issues/24816#issuecomment-524415617 Can be paraphrased as follows: When passing in fully masked out rows, attention becomes ambiguous. We have two main options: 1. Uniformly attend to all values: ```python scores[masked_out_rows] = 1 / len(row) out[masked_out_rows] = 1 / len(row) * value ``` 2. Decide that attention between no queries (masked) and no keys (masked) is meaningless: ```python output[fully_masked_rows] = NaN ``` We went with option 2. Partially because it was easier to implement, but also people argued that users can slice the output to remove the NaNs: ```Python >fill_value = -float("inf") >row0 = torch.randn(4) >row1 = torch.tensor([fill_value for _ in range(4)]) >matrix = torch.stack([row0, row1]).requires_grad_(True) >out = torch.softmax(matrix, 1) >out = out[0] >print(out) tensor([0.5377, 0.2729, 0.0692, 0.1201]) ``` Cool, problem solved. But what happens when you call backwards.. ```Python >out.backward(torch.ones_like(out)) >print(matrix.grad) tensor([[3.0957e-08, 1.4157e-08, 7.7802e-10, 1.3713e-08], [ nan, nan, nan, nan]]) ``` Those pesky NaNs are back! ## Why do we see NaNs today? 
The core of the problem revolves around using softmax function in sdpa: ```python > row = torch.tensor([(-float("inf")) for _ in range(4)]) > torch.softmax(row, 0) tensor([nan, nan, nan, nan]) ``` ## Quick Aside: Masking in Attention Attention itself doesn't have a concept of masking. The `sdpa` function has an argument called `attn_mask`, which would be more accurately named `attn_bias`. This is because we don't actually "mask" entries when computing attention. Instead, due to implementation details([performance](https://github.com/pytorch/pytorch/issues/25110#issuecomment-524519087)), we add a value to the masked-out query/key pairs. We use a large negative number (typically -inf) to decrease the attention weight, as softmax assigns more weight to larger values. ## Alternative Approaches If we use a very large negative number instead of -inf: ```python > row = torch.tensor([(-1e6) for _ in range(4)]) > torch.softmax(row, 0) tensor([0.2500, 0.2500, 0.2500, 0.2500]) ``` However if users always remembered to "slice" out their outputs i.e.: ```Python >fill_value = -1e6 >... >out.backward(torch.ones_like(out)) >print(matrix.grad) tensor([[-0.0563, -0.0564, 0.1613, -0.0486], [ 0.0000, 0.0000, 0.0000, 0.0000]]) ``` This would bring us back into a better state. ## A Third Option We don't necessarily need to alter the behavior of softmax for -inf or very large negative numbers. The fundamental goal is to exclude certain query/key pairs from attention, regardless of the underlying implementation. This PR implements the new semantic for masking w/ attention in fully masked-out rows: ```python out[masked_out_rows] = 0 ``` Important Note: This idea isn't entirely new. The [MaskedTensor](https://pytorch.org/tutorials/prototype/maskedtensor_overview#safe-softmax) prototype, a tensor subclass, was designed to handle such cases. However, it remains a prototype feature and hasn't gained widespread adoption. ## Details This PR stack does 3 things: 1. 
Adds a PRIVATE _safe_softmax op 2. Updates semantic for flash_cpu fused kernel 3. Updates semantic for efficient_cuda fused kernel _safe_softmax is not supposed to be used generically and is only meant to be used within the context of SDPA. Due to this fact, instead of decomposing softmax and checking for -inf rows, we "cheat" and use nan_to_num. Why do I think this is okay? (please find a counter point if avail) There are multiple ways NaNs can emerge. For the fully masked out rows case nan_to_num works. But what if there were other NaNs, wouldn't this silently remove them? The only case in which this can happen is if the input itself had a NaN or an Inf. For example: ```Python a = torch.ones([4], requires_grad=False, dtype=torch.float16) a[1] = torch.finfo(torch.float16).max print(a.softmax(-1)) ``` Will return `tensor([0., 1., 0., 0.], dtype=torch.float16)` Where ```Python a = torch.ones([4], requires_grad=False, dtype=torch.float16) a[1] = float("inf") a.softmax(-1) ``` returns: `tensor([nan, nan, nan, nan], dtype=torch.float16)` If we don't want to even allow for the possibility of "inf" or "NaN" attention scores being converted to 0, then we can implement it something like this ```Python max = torch.max(a, dim=-1, keepdim=True) exp = torch.exp(a - max.values) denom = torch.sum(exp, dim=-1, keepdim=True) softmax = exp / denom softmax = torch.where(max.values == float('-inf'), 0.0, softmax) ``` however we would be paying for this in math performance. ## Why Now I think one point that has substantially changed where PyTorch should lie on this argument is the fact that we have fused implementations for SDPA now. And these fused implementations allow us to easily and performantly support this new semantic. Differential Revision: [D61418679](https://our.internmc.facebook.com/intern/diff/D61418679) Pull Request resolved: https://github.com/pytorch/pytorch/pull/133882 Approved by: https://github.com/soulitzer	2024-08-19 18:53:11 +00:00
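The difference between the old and new semantics for a fully masked row can be shown with a standard-library sketch operating on a single row of scores. The names `naive_softmax` and `safe_softmax` are hypothetical; the second function follows the explicit "check for a `-inf` maximum" variant described above, not the `nan_to_num` shortcut or the private `_safe_softmax` op itself.

```python
import math

NEG_INF = float("-inf")


def naive_softmax(row):
    """Standard max-subtracted softmax; an all -inf row yields NaNs."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]  # -inf - (-inf) is NaN
    denom = sum(exps)
    return [e / denom for e in exps]


def safe_softmax(row):
    """Softmax that returns zeros for a fully masked (all -inf) row."""
    m = max(row)
    if m == NEG_INF:
        return [0.0] * len(row)
    exps = [math.exp(x - m) for x in row]
    denom = sum(exps)
    return [e / denom for e in exps]


masked_row = [NEG_INF] * 4
partly_masked_row = [0.0, NEG_INF, 0.0, NEG_INF]

naive_out = naive_softmax(masked_row)
safe_out = safe_softmax(masked_row)
partial_out = safe_softmax(partly_masked_row)
```

Rows with at least one unmasked entry are unaffected: masked positions receive zero weight and the remaining weights still sum to one.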
Mikayla Gawarecki	d9576c9440	Fix failures when default is flipped for weights_only (#127627 ) Tests on XLA shard not fixed yet but there is an issue here https://github.com/pytorch/xla/issues/7799 Pull Request resolved: https://github.com/pytorch/pytorch/pull/127627 Approved by: https://github.com/albanD ghstack dependencies: #132349	2024-08-16 00:22:43 +00:00
PyTorch MergeBot	cfec69e2a1	Revert "Update fused kernels and call _safe_softmax from SDPA (#131863 )" This reverts commit `caba37e99b`. Reverted https://github.com/pytorch/pytorch/pull/131863 on behalf of https://github.com/izaitsevfb due to breaks executorch test executorch/backends/apple/coreml:test - test_vit_skip_conv (executorch.backends.apple.coreml.test.test_coreml_partitioner.TestCoreMLPartitioner) ([comment](https://github.com/pytorch/pytorch/pull/131863#issuecomment-2291855634))	2024-08-15 17:55:07 +00:00
drisspg	caba37e99b	Update fused kernels and call _safe_softmax from SDPA (#131863 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/131863 Approved by: https://github.com/jbschlosser, https://github.com/Chillee	2024-08-13 23:37:50 +00:00
PyTorch MergeBot	4cca18d5b6	Revert "Update fused kernels and call _safe_softmax from SDPA (#131863 )" This reverts commit `e61def65d5`. Reverted https://github.com/pytorch/pytorch/pull/131863 on behalf of https://github.com/albanD due to Broke forward AD tests in main ([comment](https://github.com/pytorch/pytorch/pull/131863#issuecomment-2286432628))	2024-08-13 14:44:08 +00:00
drisspg	e61def65d5	Update fused kernels and call _safe_softmax from SDPA (#131863 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/131863 Approved by: https://github.com/jbschlosser	2024-08-13 00:51:55 +00:00
PyTorch MergeBot	31ef900a65	Revert "added persistent option to buffers and namedbuffers (#132994 )" This reverts commit `8707c6dfac`. Reverted https://github.com/pytorch/pytorch/pull/132994 on behalf of https://github.com/PaliC due to breaking internal pyre tests ([comment](https://github.com/pytorch/pytorch/pull/132994#issuecomment-2278487672))	2024-08-09 18:14:53 +00:00
Randolf Scholz	8707c6dfac	added persistent option to buffers and namedbuffers (#132994 ) Fixes #85235 Alternative to PR https://github.com/pytorch/pytorch/pull/129655, implements 3-valued option (None or bool). - adds keyword only argument `persistent: Optional[bool] = None` to `nn.Module.buffers` - updated docstrings slightly. - added test. Pull Request resolved: https://github.com/pytorch/pytorch/pull/132994 Approved by: https://github.com/mikaylagawarecki	2024-08-08 21:39:01 +00:00
Oguz Ulgen	221350e3a4	Add None return type to init -- tests (#132352 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/132352 Approved by: https://github.com/ezyang ghstack dependencies: #132335, #132351	2024-08-01 15:44:51 +00:00
ekamiti	9e473fd868	Make adding Buffers more like adding Parameters (#125971 ) Add similar semantics for creating a buffer object similar to creating a parameter. This is done by introducing a new Buffer class that can be used for type disambiguation. The underlying functionality of registering a buffer remains the same as the register_buffer method has not been changed. The persistent parameter in the Buffer type is to indicate whether a buffer object should be persistent or not. Other non-test changes have to do with getting the new Buffer type recognized by inductor and dynamo. Remaining changes are test changes to make sure that the Buffer type can be used as a drop in replacement for register_buffer as it just leads to register_buffer being called. The addition of this new functionality still allows for normal tensors to be used as buffers so these changes are intended to be backwards compatible. Fixes #35735 Co-authored-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/125971 Approved by: https://github.com/albanD, https://github.com/anijain2305, https://github.com/mlazos	2024-07-31 10:32:40 +00:00
Xuehai Pan	973037be6a	[BE][Easy] apply autofix for ruff rules unnecessary-collection-call (C408): `list()` / `tuple()` / `dict()` (#130199 ) This PR changes the empty collection factory call to Python literals: - `list()` -> `[]` - `tuple()` -> `()` - `dict()` -> `{}` The Python literals are more performant and safer. For example, the bytecode for building an empty dictionary: ```bash $ python3 -m dis - <<EOS import collections d1 = {} d2 = dict() dict = collections.OrderedDict d3 = dict() EOS ``` ```text 0 0 RESUME 0 1 2 LOAD_CONST 0 (0) 4 LOAD_CONST 1 (None) 6 IMPORT_NAME 0 (collections) 8 STORE_NAME 0 (collections) 3 10 BUILD_MAP 0 12 STORE_NAME 1 (d1) 4 14 PUSH_NULL 16 LOAD_NAME 2 (dict) 18 CALL 0 26 STORE_NAME 3 (d2) 6 28 LOAD_NAME 0 (collections) 30 LOAD_ATTR 8 (OrderedDict) 50 STORE_NAME 2 (dict) 7 52 PUSH_NULL 54 LOAD_NAME 2 (dict) 56 CALL 0 64 STORE_NAME 5 (d3) 66 RETURN_CONST 1 (None) ``` The dict literal `{}` only has one bytecode `BUILD_MAP`, while the factory call `dict()` has three `PUSH_NULL + LOAD_NAME + CALL`. Also, the factory call is not safe if users override the `dict` name in `locals` or `globals` (see the example of replacing with `OrderedDict` above). Pull Request resolved: https://github.com/pytorch/pytorch/pull/130199 Approved by: https://github.com/malfet	2024-07-11 17:30:28 +00:00
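Both claims above (the literal compiles to a `BUILD_MAP` instruction with no name lookup, and the factory call depends on whatever `dict` currently resolves to) can be checked with the standard `dis` module. This is a small sketch; the helper `opnames` is hypothetical, and exact opcode names other than `BUILD_MAP` vary between CPython versions, so only that one is inspected.

```python
import collections
import dis


def opnames(source):
    """Return the opcode names for a single Python expression."""
    code = compile(source, "<example>", "eval")
    return [ins.opname for ins in dis.get_instructions(code)]


literal_ops = opnames("{}")      # built directly by BUILD_MAP
call_ops = opnames("dict()")     # looks up the name `dict`, then calls it

# If `dict` is rebound, the factory call silently changes meaning,
# while the literal still builds a plain dict.
namespace = {"dict": collections.OrderedDict}
shadowed = eval("dict()", namespace)
literal = eval("{}", namespace)
```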
Mikayla Gawarecki	dfd1d1971e	Fix warning when pickle.load torch.Storage (#130246 ) Fixes https://github.com/pytorch/pytorch/issues/130242 Since `torch.save` does not use pickle for storages, the `torch.load` in `_load_from_bytes` should not ever be called when `torch.load`-ing a checkpoint. Setting weights_only=False explicitly in `_load_from_bytes` to avoid the weights_only warning when using the pickle module Pull Request resolved: https://github.com/pytorch/pytorch/pull/130246 Approved by: https://github.com/albanD	2024-07-11 02:40:29 +00:00
Huy Do	f78b79daaa	Forward fix the missing torch.nn.Module.set_submodule from D59140215 (#130075 ) Summary: This is to forward fix D59140215 from a PyTorch open source contributor T194074371. On PyTorch side, we need to use isinstance instead of type when checking for nn.Module. This is the same way get_submodule is currently implemented. Test Plan: `buck2 test 'fbcode//mode/opt' fbcode//dper3/dper3/core/tests:module_test` Differential Revision: D59254638 Pull Request resolved: https://github.com/pytorch/pytorch/pull/130075 Approved by: https://github.com/mikaylagawarecki	2024-07-04 17:46:56 +00:00
Eddie Yan	8755e035d2	[CUDA][Pooling] Fix 64-bit indexing in `avg_pool_2d` backward attempt 2 (#129818 ) Somehow the original PR was missing the `CUDA_KERNEL_LOOP_TYPE` change??? Thanks @johnc-keen @Chillee for the great repro! (#129785) Pull Request resolved: https://github.com/pytorch/pytorch/pull/129818 Approved by: https://github.com/Chillee, https://github.com/Skylion007	2024-06-30 16:52:33 +00:00
Yang Cao	9f29a2291c	Feat: Updated torch.nn.Modules.set_submodules() (#127714 ) modified: torch/nn/modules/module.py Implemented feature request by #127712. Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/127714 Approved by: https://github.com/mikaylagawarecki	2024-06-27 06:38:54 +00:00
Mikayla Gawarecki	303ad8d7f5	Add warning for weights_only (#129239 ) Also changes default for `weights_only` to `None` per comment below (hence the `suppress-bc-linter` tag) Pull Request resolved: https://github.com/pytorch/pytorch/pull/129239 Approved by: https://github.com/albanD, https://github.com/malfet	2024-06-26 14:20:19 +00:00
PyTorch MergeBot	b1f486aff9	Revert "Add warning for weights_only (#129239 )" This reverts commit `381ce0821c`. Reverted https://github.com/pytorch/pytorch/pull/129239 on behalf of https://github.com/huydhn due to Sorry for reverting your change but I am seeing some test_nn failures from ROCm `381ce0821c`, trying to revert this to see if trunk recovers ([comment](https://github.com/pytorch/pytorch/pull/129239#issuecomment-2189812903))	2024-06-25 19:30:07 +00:00
eqy	8bfd9e9815	[cuDNN] Graph-capturable cuDNN CTCLoss (#128271 ) cuDNN v8.x added a graph-capturable CTCLoss, which slots "neatly" into the `Tensor` variant ~~WIP as cuDNN has a restriction on the max target length (255), but this is not checkable in the graph-capture case, so the UX around warnings/error-messages here might need to be tuned...~~ Currently checks restriction on max target length during warmup run(s), and bails out during capture if this constraint was violated during warmup. CC @ptrblck Pull Request resolved: https://github.com/pytorch/pytorch/pull/128271 Approved by: https://github.com/ezyang, https://github.com/malfet	2024-06-25 06:01:50 +00:00
Mikayla Gawarecki	381ce0821c	Add warning for weights_only (#129239 ) Also changes default for `weights_only` to `None` per comment below (hence the `suppress-bc-linter` tag) Pull Request resolved: https://github.com/pytorch/pytorch/pull/129239 Approved by: https://github.com/albanD ghstack dependencies: #129244, #129251	2024-06-25 04:19:44 +00:00
PyTorch MergeBot	1c75ddff35	Revert "[cuDNN] Graph-capturable cuDNN CTCLoss (#128271 )" This reverts commit `40e8675fcb`. Reverted https://github.com/pytorch/pytorch/pull/128271 on behalf of https://github.com/malfet due to This makes PyTorch buildable only with CuDNN v9 ([comment](https://github.com/pytorch/pytorch/pull/128271#issuecomment-2183576996))	2024-06-21 23:29:20 +00:00
Eddie Yan	40e8675fcb	[cuDNN] Graph-capturable cuDNN CTCLoss (#128271 ) cuDNN v8.x added a graph-capturable CTCLoss, which slots "neatly" into the `Tensor` variant ~~WIP as cuDNN has a restriction on the max target length (255), but this is not checkable in the graph-capture case, so the UX around warnings/error-messages here might need to be tuned...~~ Currently checks restriction on max target length during warmup run(s), and bails out during capture if this constraint was violated during warmup. CC @ptrblck Pull Request resolved: https://github.com/pytorch/pytorch/pull/128271 Approved by: https://github.com/ezyang	2024-06-21 21:40:23 +00:00
Mikayla Gawarecki	a2d4fea872	[easy] Move state_dict hooks tests to test_module_hooks and decorate tests that call load_state_dict with swap (#126906 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/126906 Approved by: https://github.com/albanD	2024-06-10 21:50:17 +00:00
laithsakka	3a2d0755a4	enable test_ParameterList with dynamo if nn module inlining enabled only (#128308 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/128308 Approved by: https://github.com/anijain2305	2024-06-10 21:25:40 +00:00
Mikayla Gawarecki	65aa16f968	Revert "Default XLA to use swap_tensors path in nn.Module._apply (#126814 )" (#128170 ) https://github.com/pytorch/pytorch/issues/128165 :( This reverts commit `a7b1dd82ff`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/128170 Approved by: https://github.com/drisspg, https://github.com/albanD	2024-06-07 01:44:14 +00:00
Mikayla Gawarecki	a7b1dd82ff	Default XLA to use swap_tensors path in nn.Module._apply (#126814 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/126814 Approved by: https://github.com/JackCaoG, https://github.com/albanD ghstack dependencies: #127313	2024-06-04 21:40:49 +00:00
PyTorch MergeBot	17dea09b15	Revert "Default XLA to use swap_tensors path in nn.Module._apply (#126814 )" This reverts commit `bfdec93395`. Reverted https://github.com/pytorch/pytorch/pull/126814 on behalf of https://github.com/izaitsevfb due to suspicious build instructions count regression, see [D58015016](https://www.internalfb.com/diff/D58015016) ([comment](https://github.com/pytorch/pytorch/pull/126814#issuecomment-2143545818))	2024-06-01 18:46:16 +00:00
Mikayla Gawarecki	bfdec93395	Default XLA to use swap_tensors path in nn.Module._apply (#126814 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/126814 Approved by: https://github.com/JackCaoG, https://github.com/albanD ghstack dependencies: #127313	2024-05-30 18:28:13 +00:00
Mikayla Gawarecki	cd06ae0cb8	Relax use_count constraints for swap_tensors when AccumulateGrad holds a reference (#127313 ) ### Before this PR: `torch.utils.swap_tensors(a, b)` required the `use_count` of `a` and `b` to be 1 ```python a = torch.randn(2, 3, requires_grad=True) b = torch.randn(2, 4) out = a * 2 out.sum().backward() # Calling swap_tensors here would fail due to the reference held by AccumulateGrad node, which is not cleaned up after backward # torch.utils.swap_tensors(a, b) del out # Calling swap_tensors here would pass torch.utils.swap_tensors(a, b) ``` ### After this PR: `torch.utils.swap_tensors(a, b)` requires the `use_count` of `a` and `b` to be 1 or 2 IF the second reference is held by `AccumulateGrad` A pre-hook will be registered on the `AccumulateGrad` node so that it will fail if it is called (i.e. if user attempts to backward through the graph). ```python a = torch.randn(2, 3, requires_grad=True) b = torch.randn(2, 4) out = a * 2 out.sum().backward() # Calling swap_tensors here is ok torch.utils.swap_tensors(a, b) # If we ever backward to the AccumulateGrad node it will error that it was poisoned by swap_tensors ``` ### Application to `nn.Module` This issue is especially pertinent in context of `nn.Module` where parameters will have `AccumulateGrad` nodes initialized after forward. Specifically, this is intended to address https://github.com/pytorch/pytorch/pull/126814#issuecomment-2127777866. Previously, this would fail at the `m.cpu()` but we want users to be able to do something like the following, and instead raise an error if the user ever attempts to backward through the poisoned `AccumulateGrad` node ```python import torch import torch.nn as nn m = nn.Linear(3, 5) inp = torch.randn(2, 3) out = m(inp) out.sum().backward() m.cpu() ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/127313 Approved by: https://github.com/soulitzer	2024-05-30 07:06:55 +00:00
JackCaoG	38a33c3202	don't call .item in onehot for XLA (#127335 ) We found that `nn.functional.one_hot` will cause a graph break due to the item call in the native implementation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127335 Approved by: https://github.com/ezyang	2024-05-29 20:37:26 +00:00
dan_the_3rd	c133665d4a	[CUDA] Parallelize upsampling OPS across the batch/channel dimension. (#127082 ) This can make this operation 200x+ faster on modern GPUs for small grid sizes, as otherwise this kernel is scheduled with a single block (!) Tested on A100 with: ``` python test/test_nn.py TestNNDeviceTypeCUDA ``` Benchmarks FW Ran on A100 / bf16

## Forward pass benchmarks

| batch size | input size | output size | before runtime (mem bandwidth) | after runtime (mem bandwidth) | speedup |
|------------|------------|-------------|------------------|-----------------|---------|
| 768 | 16x16 | 6x6 | 5855us (0.07 GB/s) | 38us (10 GB/s) | 154x |
| 768 | 16x16 | 7x7 | 5214us (0.08 GB/s) | 37us (11 GB/s) | 138x |
| 768 | 16x16 | 14x14 | 2314us (0.27 GB/s) | 36us (17 GB/s) | 63x |
| 768 | 16x16 | 16x16 | 1232us (0.59 GB/s) | 33us (21 GB/s) | 36x |
| 768 | 32x32 | 6x6 | 19442us (0.07 GB/s) | 98us (15 GB/s) | 197x |
| 768 | 32x32 | 7x7 | 16918us (0.09 GB/s) | 89us (17 GB/s) | 188x |
| 768 | 32x32 | 14x14 | 6023us (0.28 GB/s) | 69us (25 GB/s) | 86x |
| 768 | 32x32 | 16x16 | 3455us (0.52 GB/s) | 55us (32 GB/s) | 62x |
| 768 | 48x48 | 6x6 | 38597us (0.08 GB/s) | 179us (18 GB/s) | 214x |
| 768 | 48x48 | 7x7 | 34700us (0.09 GB/s) | 163us (20 GB/s) | 211x |
| 768 | 48x48 | 14x14 | 10647us (0.33 GB/s) | 112us (31 GB/s) | 94x |
| 768 | 48x48 | 16x16 | 7388us (0.49 GB/s) | 100us (36 GB/s) | 73x |
| 768 | 64x64 | 6x6 | 76288us (0.07 GB/s) | 310us (19 GB/s) | 246x |
| 768 | 64x64 | 7x7 | 54981us (0.1 GB/s) | 257us (23 GB/s) | 213x |
| 768 | 64x64 | 14x14 | 16565us (0.37 GB/s) | 169us (36 GB/s) | 97x |
| 768 | 64x64 | 16x16 | 12037us (0.51 GB/s) | 141us (43 GB/s) | 84x |
| 1024 | 16x16 | 6x6 | 8123us (0.06 GB/s) | 44us (12 GB/s) | 183x |
| 1024 | 16x16 | 7x7 | 7017us (0.08 GB/s) | 45us (12 GB/s) | 155x |
| 1024 | 16x16 | 14x14 | 3150us (0.27 GB/s) | 45us (18 GB/s) | 69x |
| 1024 | 16x16 | 16x16 | 1695us (0.57 GB/s) | 41us (23 GB/s) | 40x |
| 1024 | 32x32 | 6x6 | 25918us (0.07 GB/s) | 120us (16 GB/s) | 214x |
| 1024 | 32x32 | 7x7 | 22622us (0.09 GB/s) | 108us (18 GB/s) | 208x |
| 1024 | 32x32 | 14x14 | 8245us (0.28 GB/s) | 87us (26 GB/s) | 94x |
| 1024 | 32x32 | 16x16 | 4599us (0.53 GB/s) | 68us (35 GB/s) | 67x |
| 1024 | 48x48 | 6x6 | 51486us (0.08 GB/s) | 219us (20 GB/s) | 234x |
| 1024 | 48x48 | 7x7 | 46501us (0.09 GB/s) | 202us (22 GB/s) | 229x |
| 1024 | 48x48 | 14x14 | 14280us (0.33 GB/s) | 145us (32 GB/s) | 98x |
| 1024 | 48x48 | 16x16 | 9877us (0.49 GB/s) | 125us (39 GB/s) | 79x |
| 1024 | 64x64 | 6x6 | 101731us (0.07 GB/s) | 378us (20 GB/s) | 268x |
| 1024 | 64x64 | 7x7 | 73465us (0.1 GB/s) | 320us (24 GB/s) | 229x |
| 1024 | 64x64 | 14x14 | 22109us (0.37 GB/s) | 218us (37 GB/s) | 101x |
| 1024 | 64x64 | 16x16 | 16081us (0.51 GB/s) | 178us (46 GB/s) | 90x |
| 1536 | 16x16 | 6x6 | 12546us (0.06 GB/s) | 61us (13 GB/s) | 205x |
| 1536 | 16x16 | 7x7 | 11064us (0.07 GB/s) | 63us (13 GB/s) | 175x |
| 1536 | 16x16 | 14x14 | 4839us (0.26 GB/s) | 62us (20 GB/s) | 77x |
| 1536 | 16x16 | 16x16 | 2630us (0.55 GB/s) | 59us (24 GB/s) | 44x |
| 1536 | 32x32 | 6x6 | 38898us (0.07 GB/s) | 170us (17 GB/s) | 227x |
| 1536 | 32x32 | 7x7 | 34079us (0.09 GB/s) | 155us (19 GB/s) | 219x |
| 1536 | 32x32 | 14x14 | 12632us (0.27 GB/s) | 124us (28 GB/s) | 101x |
| 1536 | 32x32 | 16x16 | 6900us (0.53 GB/s) | 98us (37 GB/s) | 70x |
| 1536 | 48x48 | 6x6 | 77272us (0.08 GB/s) | 316us (21 GB/s) | 243x |
| 1536 | 48x48 | 7x7 | 70153us (0.09 GB/s) | 291us (23 GB/s) | 240x |
| 1536 | 48x48 | 14x14 | 21500us (0.33 GB/s) | 208us (34 GB/s) | 103x |
| 1536 | 48x48 | 16x16 | 14851us (0.49 GB/s) | 181us (40 GB/s) | 81x |
| 1536 | 64x64 | 6x6 | 152669us (0.07 GB/s) | 548us (21 GB/s) | 278x |
| 1536 | 64x64 | 7x7 | 110348us (0.1 GB/s) | 466us (25 GB/s) | 236x |
| 1536 | 64x64 | 14x14 | 33350us (0.36 GB/s) | 316us (38 GB/s) | 105x |
| 1536 | 64x64 | 16x16 | 24173us (0.51 GB/s) | 263us (47 GB/s) | 91x |
| 4096 | 16x16 | 6x6 | 34638us (0.06 GB/s) | 138us (16 GB/s) | 249x |
| 4096 | 16x16 | 7x7 | 31590us (0.07 GB/s) | 144us (16 GB/s) | 218x |
| 4096 | 16x16 | 14x14 | 13203us (0.26 GB/s) | 149us (23 GB/s) | 88x |
| 4096 | 16x16 | 16x16 | 7328us (0.53 GB/s) | 143us (27 GB/s) | 51x |
| 4096 | 32x32 | 6x6 | 103802us (0.07 GB/s) | 405us (19 GB/s) | 256x |
| 4096 | 32x32 | 7x7 | 91354us (0.08 GB/s) | 372us (22 GB/s) | 245x |
| 4096 | 32x32 | 14x14 | 34501us (0.26 GB/s) | 312us (29 GB/s) | 110x |
| 4096 | 32x32 | 16x16 | 18465us (0.52 GB/s) | 247us (39 GB/s) | 74x |

## Backward pass benchmarks

| batch size | input size | output size | before runtime (mem bandwidth) | after runtime (mem bandwidth) | speedup |
|------------|------------|-------------|------------------|-----------------|---------|
| 768 | 16x16 | 6x6 | 78656us (0.0 GB/s) | 323us (1 GB/s) | 243x |
| 768 | 16x16 | 7x7 | 67167us (0.0 GB/s) | 292us (1 GB/s) | 230x |
| 768 | 16x16 | 14x14 | 27478us (0.02 GB/s) | 229us (2 GB/s) | 119x |
| 768 | 16x16 | 16x16 | 131us (5.59 GB/s) | 56us (13 GB/s) | 2x |
| 768 | 32x32 | 6x6 | 271752us (0.0 GB/s) | 888us (1 GB/s) | 305x |
| 768 | 32x32 | 7x7 | 224110us (0.0 GB/s) | 813us (1 GB/s) | 275x |
| 768 | 32x32 | 14x14 | 85365us (0.02 GB/s) | 450us (3 GB/s) | 189x |
| 768 | 32x32 | 16x16 | 67700us (0.02 GB/s) | 360us (5 GB/s) | 187x |
| 768 | 48x48 | 6x6 | 593709us (0.0 GB/s) | 1988us (1 GB/s) | 298x |
| 768 | 48x48 | 7x7 | 485566us (0.0 GB/s) | 1694us (1 GB/s) | 286x |
| 768 | 48x48 | 14x14 | 164059us (0.02 GB/s) | 897us (3 GB/s) | 182x |
| 768 | 48x48 | 16x16 | 134317us (0.02 GB/s) | 674us (5 GB/s) | 199x |
| 768 | 64x64 | 6x6 | 1026651us (0.0 GB/s) | 3360us (1 GB/s) | 305x |
| 768 | 64x64 | 7x7 | 770901us (0.0 GB/s) | 2584us (2 GB/s) | 298x |
| 768 | 64x64 | 14x14 | 277850us (0.02 GB/s) | 1556us (3 GB/s) | 178x |
| 768 | 64x64 | 16x16 | 236245us (0.02 GB/s) | 1144us (5 GB/s) | 206x |
| 1024 | 16x16 | 6x6 | 106638us (0.0 GB/s) | 341us (1 GB/s) | 312x |
| 1024 | 16x16 | 7x7 | 90886us (0.0 GB/s) | 314us (1 GB/s) | 288x |
| 1024 | 16x16 | 14x14 | 36572us (0.02 GB/s) | 292us (2 GB/s) | 124x |
| 1024 | 16x16 | 16x16 | 171us (5.69 GB/s) | 56us (17 GB/s) | 3x |
| 1024 | 32x32 | 6x6 | 356900us (0.0 GB/s) | 936us (2 GB/s) | 380x |
| 1024 | 32x32 | 7x7 | 299139us (0.0 GB/s) | 870us (2 GB/s) | 343x |
| 1024 | 32x32 | 14x14 | 113205us (0.02 GB/s) | 576us (4 GB/s) | 196x |
| 1024 | 32x32 | 16x16 | 90886us (0.02 GB/s) | 458us (5 GB/s) | 198x |
| 1024 | 48x48 | 6x6 | 786896us (0.0 GB/s) | 2127us (2 GB/s) | 369x |
| 1024 | 48x48 | 7x7 | 640515us (0.0 GB/s) | 1837us (2 GB/s) | 348x |
| 1024 | 48x48 | 14x14 | 218720us (0.02 GB/s) | 1152us (4 GB/s) | 189x |
| 1024 | 48x48 | 16x16 | 178827us (0.02 GB/s) | 863us (5 GB/s) | 207x |
| 1024 | 64x64 | 6x6 | 1379991us (0.0 GB/s) | 3589us (2 GB/s) | 384x |
| 1024 | 64x64 | 7x7 | 1047466us (0.0 GB/s) | 2774us (2 GB/s) | 377x |
| 1024 | 64x64 | 14x14 | 370139us (0.02 GB/s) | 1999us (4 GB/s) | 185x |
| 1024 | 64x64 | 16x16 | 316501us (0.02 GB/s) | 1470us (5 GB/s) | 215x |
| 1536 | 16x16 | 6x6 | 159057us (0.0 GB/s) | 477us (1 GB/s) | 332x |
| 1536 | 16x16 | 7x7 | 135578us (0.0 GB/s) | 441us (1 GB/s) | 306x |
| 1536 | 16x16 | 14x14 | 53002us (0.02 GB/s) | 400us (3 GB/s) | 132x |
| 1536 | 16x16 | 16x16 | 252us (5.79 GB/s) | 55us (26 GB/s) | 4x |
| 1536 | 32x32 | 6x6 | 545653us (0.0 GB/s) | 1323us (2 GB/s) | 412x |
| 1536 | 32x32 | 7x7 | 447491us (0.0 GB/s) | 1248us (2 GB/s) | 358x |
| 1536 | 32x32 | 14x14 | 173491us (0.02 GB/s) | 787us (4 GB/s) | 220x |
| 1536 | 32x32 | 16x16 | 136395us (0.02 GB/s) | 633us (5 GB/s) | 215x |
| 1536 | 48x48 | 6x6 | 1198639us (0.0 GB/s) | 3057us (2 GB/s) | 392x |
| 1536 | 48x48 | 7x7 | 985549us (0.0 GB/s) | 2645us (2 GB/s) | 372x |
| 1536 | 48x48 | 14x14 | 331419us (0.02 GB/s) | 1581us (4 GB/s) | 209x |
| 1536 | 48x48 | 16x16 | 270972us (0.02 GB/s) | 1186us (6 GB/s) | 228x |
| 1536 | 64x64 | 6x6 | 2094282us (0.0 GB/s) | 5214us (2 GB/s) | 401x |
| 1536 | 64x64 | 7x7 | 1593449us (0.0 GB/s) | 4086us (2 GB/s) | 389x |
| 1536 | 64x64 | 14x14 | 559244us (0.02 GB/s) | 2828us (4 GB/s) | 197x |
| 1536 | 64x64 | 16x16 | 469471us (0.02 GB/s) | 2057us (6 GB/s) | 228x |
| 4096 | 16x16 | 6x6 | 430494us (0.0 GB/s) | 1008us (2 GB/s) | 427x |
| 4096 | 16x16 | 7x7 | 360346us (0.0 GB/s) | 1015us (2 GB/s) | 354x |
| 4096 | 16x16 | 14x14 | 142868us (0.02 GB/s) | 988us (3 GB/s) | 144x |
| 4096 | 16x16 | 16x16 | 658us (5.93 GB/s) | 56us (69 GB/s) | 11x |
| 4096 | 32x32 | 6x6 | 1425928us (0.0 GB/s) | 2796us (2 GB/s) | 509x |
| 4096 | 32x32 | 7x7 | 1188862us (0.0 GB/s) | 2906us (2 GB/s) | 409x |
| 4096 | 32x32 | 14x14 | 464286us (0.02 GB/s) | 1965us (4 GB/s) | 236x |
| 4096 | 32x32 | 16x16 | 363903us (0.02 GB/s) | 1588us (6 GB/s) | 229x |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127082 Approved by: https://github.com/fmassa	2024-05-24 21:17:12 +00:00
PyTorch MergeBot	b36e390b6c	Revert "Default XLA to use swap_tensors path in nn.Module._apply (#126814 )" This reverts commit `eb41ed5d90`. Reverted https://github.com/pytorch/pytorch/pull/126814 on behalf of https://github.com/mikaylagawarecki due to broke xla ci ([comment](https://github.com/pytorch/pytorch/pull/126814#issuecomment-2127719337))	2024-05-23 17:43:06 +00:00
Mikayla Gawarecki	eb41ed5d90	Default XLA to use swap_tensors path in nn.Module._apply (#126814 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/126814 Approved by: https://github.com/JackCaoG, https://github.com/albanD	2024-05-23 15:43:32 +00:00


1520 Commits