pytorch/torch/testing/_internal
Nikita Shulga 3aecf2dc52 [MPS] Extend index_put to half precision floats (#151869)
This is done by reusing `c10/metal/atomic.h`.
This also fixes `GPUTests.test_index_put_fallback[12]_mps`, which is unrolled by inductor, so no dedicated atomic_add support is needed.

TODOs:
 - Get rid of the indexing kernel and compute the indices directly when the kernel is run
 - Simulate atomic_add for int64 types as a series of int32 atomic-add-and-fetch operations
 - Set up tolerances correctly to pass float16/bfloat16 tests (as CPU always takes the sequential strategy)
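
The int64 TODO above can be illustrated with a minimal sketch in plain Python. This is not the Metal implementation from the PR; it only shows the carry arithmetic one might use to split a 64-bit add into two 32-bit add-and-fetch steps. The helper names (`add_fetch_u32`, `add_i64_as_two_u32`, `to_i64`) and the `[low, high]` word layout are assumptions made for illustration, and the list update stands in for a real atomic operation.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def add_fetch_u32(words, idx, value):
    """Hypothetical stand-in for a 32-bit atomic add-and-fetch (wraps mod 2**32)."""
    words[idx] = (words[idx] + value) & MASK32
    return words[idx]


def add_i64_as_two_u32(words, value):
    """Add a signed 64-bit `value` to the [low, high] word pair in `words`."""
    value &= MASK64                      # two's-complement view of the addend
    lo, hi = value & MASK32, value >> 32
    new_lo = add_fetch_u32(words, 0, lo)
    # The low word wrapped around exactly when the result is smaller than the addend.
    carry = 1 if new_lo < lo else 0
    add_fetch_u32(words, 1, hi + carry)  # propagate the carry into the high word


def to_i64(words):
    """Reassemble the word pair into a signed 64-bit Python int."""
    raw = (words[1] << 32) | words[0]
    return raw - (1 << 64) if raw >= (1 << 63) else raw


acc = [0, 0]                             # [low, high] words of the accumulator
for delta in (0xFFFFFFFF, 1, -1):
    add_i64_as_two_u32(acc, delta)
print(to_i64(acc))
```

Because addition is commutative, the final value does not depend on the order in which concurrent updates land, although a reader observing the two words between the low and high steps could see a torn intermediate value.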

Pull Request resolved: https://github.com/pytorch/pytorch/pull/151869
Approved by: https://github.com/Skylion007, https://github.com/dcci
2025-04-22 22:00:08 +00:00
codegen
data
distributed Fix DTensorTestBase to barrier with device ids (#150896) 2025-04-22 20:22:55 +00:00
generated
opinfo Avoid overflow in vector_norm for scalar input (#144073) 2025-04-07 17:10:10 +00:00
optests
test_module
__init__.py
autocast_test_lists.py
autograd_function_db.py
check_kernel_launches.py
common_cuda.py [ROCm] opportunistic fastatomics for ReduceAdd operations for MI300 GPUs (#146264) 2025-04-22 21:55:40 +00:00
common_device_type.py Fix setUpClass() / tearDownClass() for device-specific tests (#151129) 2025-04-16 02:18:42 +00:00
common_dist_composable.py
common_distributed.py Reapply "ProcessGroupGloo: support lazy_init (#150801)" (#151031) 2025-04-11 01:58:35 +00:00
common_dtype.py
common_fsdp.py
common_jit.py
common_methods_invocations.py Make torch._chunk_cat support non-contiguous inputs (#151263) 2025-04-16 04:18:46 +00:00
common_mkldnn.py [BE]: Apply ruff PERF403 to use dict comprehensions more often (#149257) 2025-03-18 00:46:07 +00:00
common_modules.py
common_mps.py [MPS] Extend index_put to half precision floats (#151869) 2025-04-22 22:00:08 +00:00
common_nn.py
common_optimizers.py
common_pruning.py
common_quantization.py [Codemod][AddExplicitStrictExportForTrainingInferenceArg] caffe2/ (#149595) 2025-04-03 23:50:13 +00:00
common_quantized.py wire torch._scaled_mm with fp4 operands to the cublas nvfp4 kernel (#148792) 2025-03-27 17:32:20 +00:00
common_subclass.py
common_utils.py Code Clean: Using the new builtin function provides by python 3.8 later (#150839) 2025-04-10 01:17:39 +00:00
composite_compliance.py [pytree] add APIs to determine a class is a namedtuple or PyStructSequence (#113257) 2025-04-01 10:40:43 +00:00
custom_op_db.py
custom_tensor.py
dist_utils.py
dynamo_test_failures.py
fake_config_module.py
fake_config_module2.py
fake_config_module3.py
hop_db.py
hypothesis_utils.py
inductor_utils.py cpp_wrapper: Fix even more tests (#147225) 2025-04-07 14:20:06 +00:00
jit_metaprogramming_utils.py
jit_utils.py
logging_tensor.py
logging_utils.py
quantization_torch_package_models.py
static_module.py
subclasses.py
torchbind_impls.py Fakify torchbind objects in compile_fx and add tests for SigridTransformsInstanceTorchBind (#149529) 2025-03-21 18:58:28 +00:00
triton_utils.py [AOTInductor] Fix autotuning code's codegen (#150522) 2025-04-03 00:08:19 +00:00
two_tensor.py Support subclass constructor capturing in export (#147014) 2025-03-16 18:19:19 +00:00