Refactor release-only changes into a two-step execution.
1. Step ``tag-docker-images.sh``: tags the latest Docker images for the current release. This step takes about 30 minutes to complete. It may fail due to space issues on the local host or HTTP connection errors when pulling images, and should be rerun if it fails.
2. Step ``apply-release-changes.sh``: applies the release-only changes and prepares a PR with them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121728
Approved by: https://github.com/jeanschmidt
It looks like it was commented out because the original implementation was not sufficiently portable. I had to do some rewrites to the innards to make it portable. No Windows nanoseconds support because I'm lazy.
I tested by running `build/bin/TCPStoreTest` and observing the log messages there. I am actually not sure how to look at the log messages from Python though.
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121384
Approved by: https://github.com/Skylion007, https://github.com/malfet
Summary:
`np.asscalar` was deprecated and removed in a recent NumPy release. It used to be implemented as follows, and the recommended alternative is to call `item()` directly:
```python
def asscalar(a):
    return a.item()
```
This fixes all of the references.
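For reference, a minimal before/after sketch of the migration (purely illustrative):
```python
import numpy as np

a = np.array([1.5])

# Before (np.asscalar was removed in recent NumPy releases):
#   value = np.asscalar(a)

# After: call item() directly to get the Python scalar.
value = a.item()
assert isinstance(value, float)
```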
Test Plan: visual inspection and automated tests
Differential Revision: D54697760
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121545
Approved by: https://github.com/malfet
Summary:
## Context
This changeset lays the foundations for supporting dynamic shapes in the ExecuTorch Vulkan delegate by allowing tensors to be resized in one of two ways:
1. Discarding the underlying `vkImage` or `vkBuffer` and reallocating a new `vkImage` or `vkBuffer` with updated sizes. This method is intended for when the current `vkImage` or `vkBuffer` is not large enough to contain the new sizes.
2. Updating the tensor's size metadata without reallocating any new resources. This allows shaders to interpret the underlying `vkImage` or `vkBuffer` as if it were smaller than it actually is, and allows command buffers to be preserved when sizes change.
Test Plan: Check CI. Tests have also been added to `vulkan_compute_api_test` that test the two methods of tensor resizing.
Differential Revision: D54728401
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121598
Approved by: https://github.com/jorgep31415
Summary:
We don't want people to move to NCCL exp without explicit opt-in. It seems that sparse allreduce was accidentally called, and people were confused about whether they should use NCCL exp instead.
Update the error message to explicitly say that sparse_allreduce is not supported.
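For context, a hedged sketch of the call path that surfaces this error (assumes an already-initialized NCCL process group, e.g. via torchrun; tensor values are illustrative):
```python
import torch
import torch.distributed as dist

# Build a small sparse COO tensor on the GPU.
indices = torch.tensor([[0, 2]])
values = torch.tensor([1.0, 2.0])
sparse = torch.sparse_coo_tensor(indices, values, size=(4,), device="cuda")

# With the default NCCL backend this is expected to fail, now with an error
# message that explicitly states sparse all_reduce is not supported.
dist.all_reduce(sparse)
```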
Test Plan: sandcastle
Differential Revision: D54759307
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121644
Approved by: https://github.com/awgu
Summary:
X-link: https://github.com/pytorch/executorch/pull/2308
Note: The initial purpose of this PR is to gather suggestions and feedback regarding better alternatives, if any.
At present, the dequantize op for the decomposed quantized Tensor representation, e.g. dequantize_per_tensor(), assumes the output dtype is torch.float and hence does not have the output dtype in its operator argument list. However, this op signature becomes unusable when that assumption breaks: if the output dtype is different from torch.float, there is no way to specify it during dequantization.
This change is aimed at generalizing the signature of dequantize ops like dequantize_per_tensor() for wider use-cases where the output dtype can be different from torch.float and needs to be passed during dequantization. The proposal is to use an additional argument named 'output_dtype' to solve the problem. However, we would also like suggestions and feedback regarding any better alternative that could be used instead.
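For illustration, a hedged reference sketch of the generalized signature being proposed (the argument names follow the description above; ordering, defaults, and the Python-level form are assumptions, not the final operator schema):
```python
import torch

def dequantize_per_tensor(
    input: torch.Tensor,
    scale: float,
    zero_point: int,
    quant_min: int,
    quant_max: int,
    dtype: torch.dtype,                        # dtype of the quantized input, e.g. torch.int8
    output_dtype: torch.dtype = torch.float,   # proposed: dtype of the dequantized output
) -> torch.Tensor:
    # Reference semantics only: affine dequantization, cast to the requested output dtype.
    return ((input.to(torch.float32) - zero_point) * scale).to(output_dtype)
```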
cc jerryzh168 jianyuh raghuramank100 jamesr66a vkuzo jgong5 Xia-Weiwen leslie-fang-intel
Reviewed By: digantdesai
Differential Revision: D53590486
Pulled By: manuelcandales
Co-authored-by: kausik <kmaiti@habana.ai>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121450
Approved by: https://github.com/jerryzh168
Summary: Previously, we bailed out of the Triton kernel analysis pass when seeing a `tt.reduce` op. In this PR, we support the op and don't bail out anymore.
Test Plan: This is a bit tricky, as the extension is added to the MLIR walk-based analysis code path, which is active only when the MLIR bindings added in https://github.com/openai/triton/pull/3191 are available. So for now I've run `test_argmax` and `test_reduce_sum` manually with a newer Triton version than the current pin. When the pin updates, we'll make those tests official (left a TODO comment).
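For reference, a hedged sketch of the kind of user-defined Triton kernel whose reduction produces a `tt.reduce` op in the Triton IR (the kernel below is illustrative, not one of the actual tests):
```python
import triton
import triton.language as tl

@triton.jit
def row_sum_kernel(x_ptr, out_ptr, N, BLOCK: tl.constexpr):
    offs = tl.arange(0, BLOCK)
    mask = offs < N
    x = tl.load(x_ptr + offs, mask=mask, other=0.0)
    total = tl.sum(x, axis=0)  # this reduction lowers to a tt.reduce op
    tl.store(out_ptr, total)
```
When such a kernel is called from a `torch.compile`-d function, the analysis pass now walks through the `tt.reduce` op instead of bailing out.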
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121706
Approved by: https://github.com/jansel
Reduces the `torch.compile(backend="eager")` compilation time for this code
~~~
def fn(x):
    for _ in range(10000):
        # x = torch.sin(x)
        x = torch.ops.aten.sin(x)
        # x = sin(x)
    return x
~~~
from 18 seconds to 12 seconds.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121031
Approved by: https://github.com/jansel
In this PR, we create another dynamic test class for TestExport tests that basically serializes/deserializes the pre-dispatch IR. I encountered 4 additional failures, but 3 of them are due to a different operator showing up in the graph, and only one is a legitimate failure, which is tracked by another task internally.
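Conceptually, the new test class round-trips each exported program through serialization before running the usual assertions. A hedged sketch of that flow using the public export APIs (the test class itself relies on internal pre-dispatch export helpers):
```python
import io
import torch
from torch.export import export, save, load

class M(torch.nn.Module):
    def forward(self, x):
        return x.sin() + 1

x = torch.randn(4)
ep = export(M(), (x,))   # produce an ExportedProgram
buf = io.BytesIO()
save(ep, buf)            # serialize
buf.seek(0)
ep2 = load(buf)          # deserialize and re-run
torch.testing.assert_close(ep2.module()(x), M()(x))
```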
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121678
Approved by: https://github.com/angelayi
ghstack dependencies: #121652
This PR enables `test_addmm_sizes_all_sparse_csr_k_*_n_*_m_*_cuda_complex128` for ROCm for the trivial cases (m, n, or k = 0).
CUSPARSE_SPMM_COMPLEX128_SUPPORTED is also used for `test_addmm_all_sparse_csr` and `test_sparse_matmul`, and both of them are skipped for ROCm by `@skipIfRocm` or `@skipCUDAIf(not _check_cusparse_spgemm_available())`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120504
Approved by: https://github.com/jithunnair-amd, https://github.com/ezyang
This patch addresses the major limitations in our previous [PR #115981](https://github.com/pytorch/pytorch/pull/115981) through the new dedicated repository [AOTriton](https://github.com/ROCm/aotriton):
- [x] Only supports MI200 series GPUs (i.e., `gcnArchName == gfx90a:sramecc+:xnack-`).
  * MI300X is now supported. More architectures will be added once Triton supports them.
- [x] Only supports power-of-two sequence lengths.
  * Arbitrary sequence lengths are now supported.
- [ ] No support for varlen APIs.
  * varlen APIs will be supported in the next release of AOTriton.
- [x] Only supports head dimensions 16, 32, 64, and 128.
  * Arbitrary head dimensions <= 256 are now supported.
- [x] Performance is still being optimized.
  * Kernels are selected according to autotune information from Triton.
Other improvements from AOTriton include:
* More flexible Tensor storage layouts
* A more flexible API
This is a more extensive fix to #112997
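As a usage note, a hedged sketch of how the flash-attention path addressed above is typically exercised on a supported ROCm GPU (shapes, dtypes, and the backend-selection context manager are illustrative of existing PyTorch APIs, not something added by this patch):
```python
import torch
import torch.nn.functional as F

# Force the flash backend so the Triton-based kernels are used (on supported GPUs).
q, k, v = (torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.float16) for _ in range(3))
with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False):
    out = F.scaled_dot_product_attention(q, k, v)
```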
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121561
Approved by: https://github.com/malfet, https://github.com/atalman
Summary: For OEMAE, this contributes 14% of the total DPER pass perf gain.
Test Plan:
Run test cases
Run the oemae lower benchmark with and without this fix. FLOP/s: 29 -> 34.
Reviewed By: frank-wei
Differential Revision: D54711064
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121674
Approved by: https://github.com/frank-wei
Eventually, we should just have one unified way to check for parity between a `DTensor`-sharded model and a replicated model. This PR is a small refactor to work toward that. One current gap to use this `check_sharded_parity` function for 2D is that FSDP's `(Shard(0), Shard(0))` layout differs from that of the `DTensor` APIs since FSDP shards on dim-0 after TP shards on dim-0.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121357
Approved by: https://github.com/weifengpy
ghstack dependencies: #121360
Introduce `conditional_gil_scoped_release` and use it in `wrap_pybind_function*` to avoid a runtime branch, making the code cleaner and faster.
@albanD This is the GIL change extracted from #112607 as discussed.
Also fixes a potential use of a moved-from object introduced in #116560:
- `f` is captured by value in a lambda that may be called multiple times
- After `std::move(f)`, the lambda is no longer safe to call
CC @cyyever for that change
Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116695
Approved by: https://github.com/albanD, https://github.com/Skylion007
Summary: I plan to enable the FX graph cache for more inductor unit tests. This PR does some refactoring to prepare by moving the `TestCase` base class to `torch._inductor.test_case` (which mirrors the existing `torch._dynamo.test_case`). In a subsequent diff, I'll modify tests importing `torch._dynamo.test_case.TestCase` to use `torch._inductor.test_case.TestCase` instead.
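A hedged sketch of the intended usage after the move (exports are assumed to mirror `torch._dynamo.test_case`; the test body is a placeholder):
```python
from torch._inductor.test_case import TestCase, run_tests

class MyFxGraphCacheTest(TestCase):
    def test_something(self):
        # Real tests exercise torch.compile with the FX graph cache enabled.
        self.assertTrue(True)

if __name__ == "__main__":
    run_tests()
```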
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121520
Approved by: https://github.com/eellison
Summary: Does not change the weights structure, so it is compatible with const folding and realtime weights updates.
Test Plan: run added test cases
Reviewed By: frank-wei
Differential Revision: D53843428
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121617
Approved by: https://github.com/frank-wei
Summary: Taking the rightmost part of the FQN can cause name conflicts when there are multiple instances of the same class. Changed to replace "." in the FQN with "_" to avoid invalid syntax in input args.
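A tiny illustrative sketch of the renaming scheme (the helper name is hypothetical):
```python
def mangle_fqn(fqn: str) -> str:
    # Keep the full qualified name but replace "." with "_", so two instances of
    # the same class (e.g. "encoder.linear" and "decoder.linear") no longer
    # collide and the result stays a valid identifier for input args.
    return fqn.replace(".", "_")

assert mangle_fqn("encoder.linear") == "encoder_linear"
assert mangle_fqn("encoder.linear") != mangle_fqn("decoder.linear")
```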
Test Plan: added test case
Differential Revision: D54435230
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121145
Approved by: https://github.com/zhxchen17
After this, the sam_fast benchmark can now be run in the pytorch repo:
```
SEGMENT_ANYTHING_FAST_USE_FLASH_4=0 python benchmarks/dynamo/torchbench.py --inference --amp --performance --backend=inductor --explain --only sam_fast
```
sam_fast is designed for inference only, with CUDA and AMP enabled. The code adds these restrictions to the benchmark.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121420
Approved by: https://github.com/oulgen, https://github.com/msaroufim
Summary:
This fixes a case left incomplete by https://github.com/pytorch/pytorch/pull/106229.
The object uses __prepare_scriptable__ correctly inside of torch.jit.script(),
but the closure that is obtained below uses the non-prepared version.
This causes issues when the prepared and non-prepared versions are in different Python modules.
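For context, a hedged minimal sketch of the hook involved (the module and method bodies are illustrative):
```python
import torch

class MyModule(torch.nn.Module):
    def forward(self, x):
        return x + 1

    def __prepare_scriptable__(self):
        # torch.jit.script() calls this hook when it is defined and scripts the
        # returned object instead of `self`; the fix ensures the closure captured
        # during scripting also refers to this prepared version.
        return self

scripted = torch.jit.script(MyModule())
print(scripted(torch.ones(2)))
```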
Test Plan:
```
buck2 run mode/opt caffe2/test:jit -- -r test_decorator
```
Differential Revision: D54308741
Re-exporting, as #120806 and #121307 were not properly merged.
Co-authored-by: Daniel Herrera <dherrera@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121553
Approved by: https://github.com/huydhn, https://github.com/seemethere
Update the torch_xla pin to a more recent one (03/08/2024). We need to make sure the torch_xla pin stays up-to-date so that PyTorch can be tested against an up-to-date version of torch_xla.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121529
Approved by: https://github.com/atalman
Follow-up to #119326, addressing this comment: https://github.com/pytorch/pytorch/pull/119326#issuecomment-1939428705:
> I'd like to propose a slightly different approach. We could check if scipy is version `1.12.0`. If so, overload `scipy_cumulative_trapezoid` with a function that specifically checks `t.shape[axis] == 0`, and in that case return an array of the same shape as `t`, which is the expected behavior as far as I understand. That way, we're not just skipping the test cases
I would like to add that the version check is not necessary, as the outcome is the same in either case.
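A hedged sketch of the overload described in the quoted comment (the helper name follows the quote; the exact handling in the test file may differ):
```python
import numpy as np
from scipy.integrate import cumulative_trapezoid

def scipy_cumulative_trapezoid(y, x=None, axis=-1):
    # scipy 1.12.0 rejects zero-length inputs, so short-circuit: for an empty
    # axis, return an empty array of the same shape as the input.
    if y.shape[axis] == 0:
        return np.empty_like(y)
    return cumulative_trapezoid(y, x=x, axis=axis)
```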
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121541
Approved by: https://github.com/nWEIdia, https://github.com/albanD
Fixes round-robin sharding when there are no test times and sort_by_time=False.
Adds more tests to test_test_selections for sort_by_time=False.
Adds more checks to test_split_shards_random for serial/parallel ordering and the ordering of tests.
Refactors duplicated code.
Tested locally by running `python test/run_test.py --shard 3 5` with no test times downloaded and checked that it wasn't an empty list.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121022
Approved by: https://github.com/huydhn, https://github.com/osalpekar