| Name | Last commit message | Last commit date |
| --- | --- | --- |
| cutlass_extensions | Clean up of CUTLASS_VERSION (#152947) | 2025-05-08 08:32:34 +00:00 |
| linalg | [1/N][Fix] Fix typo in aten folder (#166126) | 2025-10-27 15:34:39 +00:00 |
| AbsKernel.cu | | |
| Activation.cpp | c10::string_view -> std::string_view in aten (#141903) | 2024-12-07 23:23:52 +00:00 |
| Activation.h | Modernize C++ code in aten/src/ATen/ (#141424) | 2024-11-24 02:15:19 +00:00 |
| ActivationEluKernel.cu | | |
| ActivationGeluKernel.cu | | |
| ActivationGluKernel.cu | | |
| ActivationHardshrinkKernel.cu | | |
| ActivationHardsigmoidKernel.cu | [ROCm] fix hardsigmoid op (#162758) | 2025-09-12 15:07:13 +00:00 |
| ActivationHardswishKernel.cu | Fix torch.nn.functional.hardswish gradients corner case (#148049) | 2025-03-14 18:53:10 +00:00 |
| ActivationHardtanhKernel.cu | | |
| ActivationLeakyReluKernel.cu | | |
| ActivationLogSigmoidKernel.cu | | |
| ActivationMishKernel.cu | | |
| ActivationPreluKernel.cu | | |
| ActivationSiluKernel.cu | | |
| ActivationSoftplusKernel.cu | | |
| ActivationSoftshrinkKernel.cu | softshrink nan fixes (#138421) | 2024-11-21 23:06:08 +00:00 |
| ActivationThresholdKernel.cu | | |
| AdaptiveAveragePooling.cu | [Doc fix] fix spelling of enough (#159587) | 2025-08-01 01:50:57 +00:00 |
| AdaptiveAveragePooling3d.cu | Fix incorrect stride handling in adaptive_avg_pool3d (#157326) | 2025-07-01 03:03:48 +00:00 |
| AdaptiveMaxPooling2d.cu | | |
| AdaptiveMaxPooling3d.cu | | |
| airy_ai.cu | | |
| AmpKernels.cu | Fix broken URLs (#152237) | 2025-04-27 09:56:42 +00:00 |
| AveragePool2d.cu | [CUDA][avgpool2d] Fix backward launch bounds again for sm100, sm120 (#150640) | 2025-04-04 13:05:40 +00:00 |
| AveragePool3d.cu | | |
| bessel_j1.cu | | |
| bessel_j0.cu | | |
| bessel_y1.cu | | |
| bessel_y0.cu | | |
| BinaryBitwiseOpsKernels.cu | | |
| BinaryDivFloorKernel.cu | | |
| BinaryDivTrueKernel.cu | | |
| BinaryDivTruncKernel.cu | | |
| BinaryGeometricKernels.cu | | |
| BinaryInternal.h | | |
| BinaryLogicalOpsKernels.cu | | |
| BinaryMiscBackwardOpsKernels.cu | | |
| BinaryMiscOpsKernels.cu | | |
| BinaryMulKernel.cu | | |
| BinaryRemainderKernel.cu | | |
| BinaryShiftOpsKernels.cu | | |
| Blas.cpp | [CUDA][cuBLASLt] addmm -- extend bias fusions to cases with (1 by n) shapes (#166307) | 2025-10-31 14:30:41 +00:00 |
| block_reduce.cuh | [ROCm] Remove use of warpsize on host-side compilation (#156979) | 2025-07-01 04:55:31 +00:00 |
| Bucketization.cu | [CUDA][64-bit indexing] Fix some existing problematic int64_t _ = blockIdx.* * blockDim.* code (#142010) | 2024-12-19 00:55:11 +00:00 |
| chebyshev_polynomial_t.cu | | |
| chebyshev_polynomial_u.cu | | |
| chebyshev_polynomial_v.cu | | |
| chebyshev_polynomial_w.cu | | |
| Col2Im.cu | | |
| CompareEQKernel.cu | | |
| CompareKernels.cu | | |
| ComplexKernel.cu | | |
| CompositeRandomAccessor.h | | |
| ConvolutionMM2d.cu | | |
| Copy.cu | [PyTorch] Use events from pool in copy_device_to_device (#165647) | 2025-10-28 05:19:05 +00:00 |
| Copy.h | | |
| CopysignKernel.cu | | |
| CrossKernel.cu | | |
| cuBlasCommonArgs.h | [1/2] Split cublasCommonArgs into its own file (#166313) | 2025-10-28 16:35:32 +00:00 |
| CUDAJitLoops.cuh | [ATen][CUDA] Implement 128 bit vectorization v2 (#145746) | 2025-01-31 06:42:08 +00:00 |
| CUDALoops.cuh | Update workaround to old CUDA bug (#164354) (#165984) | 2025-10-21 19:09:43 +00:00 |
| CUDAScalar.cu | [ROCm] delete un-needed workaround for tensor.item() (#158486) | 2025-07-23 00:31:57 +00:00 |
| CuFFTPlanCache.h | [Doc fix] fix spelling of enough (#159587) | 2025-08-01 01:50:57 +00:00 |
| CuFFTUtils.h | [ATen][CUDA][cuFFT] Guard against deprecated error codes (#159466) | 2025-07-30 21:10:32 +00:00 |
| CumminmaxKernel.cu | | |
| CumprodKernel.cu | | |
| CumsumKernel.cu | | |
| cutlass_common.cuh | [CUTLASS] [CUDA] SM100 GroupMM (#156203) | 2025-06-28 23:02:00 +00:00 |
| DepthwiseConv2d.cu | Work around buggy use_const_ref_for_mutable_tensors (#145530) | 2025-01-24 14:38:49 +00:00 |
| DepthwiseConv3d.cu | | |
| DeviceSqrt.cuh | | |
| DilatedMaxPool2d.cu | Turn some const variables into constexpr in C++ code (#165401) | 2025-10-17 13:24:46 +00:00 |
| DilatedMaxPool3d.cu | | |
| DistanceKernel.cu | [CUDA][64-bit indexing] Fix some existing problematic int64_t _ = blockIdx.* * blockDim.* code (#142010) | 2024-12-19 00:55:11 +00:00 |
| DistributionBernoulli.cu | | |
| DistributionCauchyKernel.cu | | |
| DistributionExponentialKernel.cu | | |
| DistributionGeometricKernel.cu | | |
| DistributionLogNormalKernel.cu | | |
| DistributionNormal.cu | | |
| DistributionRandomKernel.cu | | |
| Distributions.cpp | | |
| Distributions.cu | | |
| Distributions.h | | |
| DistributionTemplates.h | [1/N][Fix] Fix typo in aten folder (#166126) | 2025-10-27 15:34:39 +00:00 |
| DistributionUniform.cu | | |
| Dropout.cu | [ATen][CUDA] Implement 128 bit vectorization v2 (#145746) | 2025-01-31 06:42:08 +00:00 |
| Embedding.cu | Remove CUDA 11 workarounds for CUB_SUPPORTS_SCAN_BY_KEY and CUB_SUPPORTS_UNIQUE_BY_KEY (#164637) | 2025-10-18 20:05:54 +00:00 |
| EmbeddingBackwardKernel.cu | Remove CUDA 11 workarounds for CUB_SUPPORTS_SCAN_BY_KEY and CUB_SUPPORTS_UNIQUE_BY_KEY (#164637) | 2025-10-18 20:05:54 +00:00 |
| EmbeddingBackwardKernel.cuh | | |
| EmbeddingBag.cu | Remove CUDA 11 workarounds for CUB_SUPPORTS_SCAN_BY_KEY and CUB_SUPPORTS_UNIQUE_BY_KEY (#164637) | 2025-10-18 20:05:54 +00:00 |
| Equal.cpp | | |
| FillKernel.cu | | |
| FlattenIndicesKernel.cu | | |
| ForeachBinaryOpList.cu | [BE][Ez]: Prevent copies of std::vector in CUDA ForeachOps (#163416) | 2025-09-21 05:24:13 +00:00 |
| ForeachBinaryOpScalar.cu | [BE][Ez]: Prevent copies of std::vector in CUDA ForeachOps (#163416) | 2025-09-21 05:24:13 +00:00 |
| ForeachBinaryOpScalarList.cu | [BE][Ez]: Prevent copies of std::vector in CUDA ForeachOps (#163416) | 2025-09-21 05:24:13 +00:00 |
| ForeachBinaryOpScalarTensor.cu | [BE][Ez]: Prevent copies of std::vector in CUDA ForeachOps (#163416) | 2025-09-21 05:24:13 +00:00 |
| ForeachFunctors.cuh | Revert "handling special case for pow(3) for GPU (#157537)" | 2025-08-19 22:57:45 +00:00 |
| ForeachMinMaxFunctors.cuh | | |
| ForeachPointwiseOp.cu | [BE][Ez]: Prevent copies of std::vector in CUDA ForeachOps (#163416) | 2025-09-21 05:24:13 +00:00 |
| ForeachReduceOp.cu | chunk_size should always be int64_t for Foreach functors (#156872) | 2025-06-27 22:35:34 +00:00 |
| ForeachTernaryOp.cu | [BE][Ez]: Prevent copies of std::vector in CUDA ForeachOps (#163416) | 2025-09-21 05:24:13 +00:00 |
| ForeachUnaryOp.cu | [BE][Ez]: Prevent copies of std::vector in CUDA ForeachOps (#163416) | 2025-09-21 05:24:13 +00:00 |
| FractionalMaxPool2d.cu | | |
| FractionalMaxPool3d.cu | [CUDA][64-bit indexing] Fix some existing problematic int64_t _ = blockIdx.* * blockDim.* code (#142010) | 2024-12-19 00:55:11 +00:00 |
| FunctionOfAMatrixUtilsKernel.cu | | |
| fused_adagrad_impl.cu | Split out C++ code from fused adagrad PR (#159008) | 2025-07-26 00:36:59 +00:00 |
| fused_adagrad_impl.cuh | Split out C++ code from fused adagrad PR (#159008) | 2025-07-26 00:36:59 +00:00 |
| fused_adagrad_utils.cuh | [BugFix] chunk_size should always be int64_t (#165971) | 2025-10-21 19:52:47 +00:00 |
| fused_adam_amsgrad_impl.cu | | |
| fused_adam_amsgrad_impl.cuh | | |
| fused_adam_impl.cu | | |
| fused_adam_impl.cuh | | |
| fused_adam_utils.cuh | chunk_size should always be int64_t for Foreach functors (#156872) | 2025-06-27 22:35:34 +00:00 |
| fused_adamw_amsgrad_impl.cu | | |
| fused_adamw_amsgrad_impl.cuh | | |
| fused_adamw_impl.cu | | |
| fused_adamw_impl.cuh | | |
| FusedAdagradKernel.cu | Split out C++ code from fused adagrad PR (#159008) | 2025-07-26 00:36:59 +00:00 |
| FusedAdamKernel.cu | [5/N] Apply bugprone-unchecked-optional-access (#143111) | 2024-12-15 01:07:28 +00:00 |
| FusedAdamWKernel.cu | [5/N] Apply bugprone-unchecked-optional-access (#143111) | 2024-12-15 01:07:28 +00:00 |
| FusedSgdKernel.cu | chunk_size should always be int64_t for Foreach functors (#156872) | 2025-06-27 22:35:34 +00:00 |
| GcdLcmKernel.cu | | |
| GridSampler.cpp | | |
| GridSampler.cu | | |
| GridSampler.cuh | | |
| GridSampler.h | Modernize C++ code in aten/src/ATen/ (#141424) | 2024-11-24 02:15:19 +00:00 |
| group_norm_kernel.cu | [CUDA][64-bit indexing] Fix some existing problematic int64_t _ = blockIdx.* * blockDim.* code (#142010) | 2024-12-19 00:55:11 +00:00 |
| GroupedBlas.cpp | Add MXFP4 grouped gemm support via. FBGEMM kernels (#166530) | 2025-10-30 16:46:11 +00:00 |
| GroupMM.cu | improve shape checks for grouped_mm (#159666) | 2025-08-02 00:12:25 +00:00 |
| GroupMM.h | bf16 grouped gemm (#150374) | 2025-04-06 04:53:24 +00:00 |
| GroupMMCommon.cuh | improve shape checks for grouped_mm (#159666) | 2025-08-02 00:12:25 +00:00 |
| hermite_polynomial_h.cu | | |
| hermite_polynomial_he.cu | | |
| IGammaKernel.cu | Turn some const variables into constexpr in C++ code (#165401) | 2025-10-17 13:24:46 +00:00 |
| Im2Col.cu | | |
| im2col.cuh | [BE] Remove unusued channels arg in col2im (#142336) | 2024-12-09 01:49:41 +00:00 |
| Indexing.cu | [ROCm] Adjust grid size for non-unit stride backwards indexing (#165026) | 2025-10-10 16:36:38 +00:00 |
| IndexKernel.cpp | [4/N] Avoid copy in std::get (#142285) | 2024-12-09 07:59:35 +00:00 |
| IndexKernel.cu | [CUDA] fix indexing on large tensor causing nvalid configuration argument (#164049) | 2025-09-29 06:07:35 +00:00 |
| IndexKernel.h | Modernize C++ code in aten/src/ATen/ (#141424) | 2024-11-24 02:15:19 +00:00 |
| IndexKernelUtils.cu | Add a compile-time flag to trigger verbose logging for device-side asserts (#166171) | 2025-10-30 19:43:46 +00:00 |
| IndexKernelUtils.h | Support more dtypes for input, indices in gather (#151822) | 2025-05-01 16:35:23 +00:00 |
| int4mm.cu | Remove old ROCm version checks and branches (#166111) | 2025-10-27 05:32:54 +00:00 |
| int8mm.cu | [WOQ] Integrate CUDA support for int8pack_mm woq optimization pattern (#161680) | 2025-09-17 10:24:13 +00:00 |
| jit_utils.cpp | [1/N][Fix] Fix typo in aten folder (#166126) | 2025-10-27 15:34:39 +00:00 |
| jit_utils.h | add the torch.float8_e8m0fnu dtype to PyTorch (#147466) | 2025-02-20 13:55:42 +00:00 |
| JitLoops.cuh | | |
| KernelUtils.cuh | Remove old ROCm version checks and branches (#166111) | 2025-10-27 05:32:54 +00:00 |
| laguerre_polynomial_l.cu | | |
| LaunchUtils.h | | |
| layer_norm_kernel.cu | [ROCm] Disable __builtin_amdgcn_rcpf for gfx90a (#166454) | 2025-10-30 23:39:00 +00:00 |
| legendre_polynomial_p.cu | | |
| Lerp.cu | Fix torch.lerp RuntimeError when weight is CPU scalar while input & end are CUDA tensor (#141820) | 2024-12-09 18:14:54 +00:00 |
| LinearAlgebra.cu | | |
| LinearAlgebraStubs.cpp | [1/N][Fix] Fix typo in aten folder (#166126) | 2025-10-27 15:34:39 +00:00 |
| LogAddExpKernel.cu | | |
| LogcumsumexpKernel.cu | Remove old workaround in launch_logcumsumexp_cuda_kernel (#164567) | 2025-10-03 18:07:02 +00:00 |
| Loops.cuh | Simplify c10::guts::apply (#164566) | 2025-10-22 00:47:43 +00:00 |
| Loss.cu | Removed ROCM ifdef that governs thread count + smem parallel reduction. (#149779) | 2025-03-29 04:27:54 +00:00 |
| LossCTC.cu | [CUDA] Decrease launch bounds of CTCLoss backward for blackwell (#159522) | 2025-08-05 19:26:25 +00:00 |
| Math.cuh | Turn some const variables into constexpr in C++ code (#165401) | 2025-10-17 13:24:46 +00:00 |
| MaxMinElementwiseKernel.cu | | |
| MaxUnpooling.cu | [BUG] MaxUnpool2d/3d should check output dim before accessing its elements (#163507) | 2025-09-22 21:36:48 +00:00 |
| MemoryAccess.cuh | [ROCm] Improve vectorized elementwise kernel performance in MI300X (#153634) | 2025-05-27 20:49:32 +00:00 |
| MiscUtils.h | Enable modernize-use-default-member-init (#149046) | 2025-04-09 11:57:24 +00:00 |
| MixedDtypesLinear.cu | Remove outdated CUDA 11 conditions (#154313) | 2025-05-28 08:44:58 +00:00 |
| modified_bessel_i1.cu | | |
| modified_bessel_i0.cu | | |
| modified_bessel_k1.cu | | |
| modified_bessel_k0.cu | | |
| MultiLabelMarginCriterion.cu | | |
| MultiMarginLoss.cu | [CUDA] Fix missing __syncthreads in MultiMarginLoss backward (#158994) | 2025-07-24 20:47:29 +00:00 |
| MultinomialKernel.cu | [ROCm] Remove use of warpsize on host-side compilation (#156979) | 2025-07-01 04:55:31 +00:00 |
| MultiTensorApply.cuh | | |
| NaiveConvolutionTranspose2d.cu | | |
| NaiveConvolutionTranspose3d.cu | | |
| NaiveDilatedConvolution.cu | | |
| NLLLoss2d.cu | [cuda] fix nll_loss2d backward bounds check with reduction=none (#165247) | 2025-10-20 06:25:11 +00:00 |
| Nonzero.cu | Remove C++ and test branches for CUDA<12 (#163443) | 2025-09-22 18:20:08 +00:00 |
| Normalization.cu | Add assertion to align with cuda (#153233) | 2025-05-23 07:32:43 +00:00 |
| Normalization.cuh | Use std::min for #166021 (#166195) | 2025-10-27 17:57:44 +00:00 |
| PersistentSoftmax.cuh | Improve softmax's perf in cuda (#144679) | 2025-01-23 00:02:57 +00:00 |
| PointwiseOpsKernel.cu | Remove outdated CUDA 11 conditions (#154313) | 2025-05-28 08:44:58 +00:00 |
| Pow.cuh | Workaround ATen SFINAE under libc++ (#161101) | 2025-08-21 00:55:58 +00:00 |
| PowKernel.cu | Revert "handling special case for pow(3) for GPU (#157537)" | 2025-08-19 22:57:45 +00:00 |
| Randperm.cu | | |
| Randperm.cuh | [4/N] Avoid copy in std::get (#142285) | 2024-12-09 07:59:35 +00:00 |
| RangeFactories.cu | [CUDA][MPS] Fix torch.arange bound validation for large float inputs (#154320) | 2025-06-05 14:51:25 +00:00 |
| RecordStream.cu | | |
| Reduce.cu | | |
| Reduce.cuh | [ATen] Fix CUDA reduction warp shuffle order (#164790) | 2025-10-21 00:09:13 +00:00 |
| ReduceAMinMaxKernel.cu | | |
| ReduceArgMaxKernel.cu | | |
| ReduceArgMinKernel.cu | | |
| ReduceLogicKernel.cu | | |
| ReduceMaxValuesKernel.cu | | |
| ReduceMinValuesKernel.cu | | |
| ReduceMomentKernel.cu | [ATen] Vectorize 8 elements on 16 bit data types for sum/mean (#165055) | 2025-10-17 13:39:36 +00:00 |
| ReduceNormKernel.cu | | |
| ReduceOps.cpp | | |
| ReduceOps.h | Modernize C++ code in aten/src/ATen/ (#141424) | 2024-11-24 02:15:19 +00:00 |
| ReduceSumProdKernel.cu | [ATen] Vectorize 8 elements on 16 bit data types for sum/mean (#165055) | 2025-10-17 13:39:36 +00:00 |
| reduction_template.cuh | [ATen] Fix CUDA reduction warp shuffle order (#164790) | 2025-10-21 00:09:13 +00:00 |
| ReflectionPad.cu | [CUDA] fix reflection padding for large batch size (#165942) | 2025-10-21 21:07:38 +00:00 |
| RenormKernel.cu | | |
| Repeat.cu | Add CUDA_KERNEL_ASSERT_PRINTF, a more flexible CUDA_KERNEL_ASSERT_MSG (#160129) | 2025-09-16 00:23:48 +00:00 |
| ReplicationPadding.cu | [CUDA][64-bit indexing] Fix some existing problematic int64_t _ = blockIdx.* * blockDim.* code (#142010) | 2024-12-19 00:55:11 +00:00 |
| Resize.cpp | | |
| Resize.h | Enable modernize-use-default-member-init (#149046) | 2025-04-09 11:57:24 +00:00 |
| RNN.cu | [ROCm] missing AT_CUDA_CHECK for cub and SoftMax (#149883) | 2025-03-25 23:22:32 +00:00 |
| RowwiseScaledMM.cu | [cutlass] Prep for cutlass upgrade by ignoring Wunused-but-set-variable (#159276) | 2025-07-29 04:40:24 +00:00 |
| RowwiseScaledMM.h | | |
| RreluWithNoise.cu | [4/N] Avoid copy in std::get (#142285) | 2024-12-09 07:59:35 +00:00 |
| scaled_modified_bessel_k1.cu | | |
| scaled_modified_bessel_k0.cu | | |
| ScaledBlas.cpp | Revert "Add CUDA MXFP4 scaled mm support via. FBGEMM (#166526)" | 2025-10-31 21:10:28 +00:00 |
| ScaledGroupMM.cu | improve shape checks for grouped_mm (#159666) | 2025-08-02 00:12:25 +00:00 |
| ScaledGroupMM.h | [WIP] Initial implementation of Grouped Gemm API (#148531) | 2025-03-11 21:49:46 +00:00 |
| ScanKernels.cpp | Implement deterministic scan (#140887) | 2024-11-19 23:43:26 +00:00 |
| ScanKernels.h | | |
| ScanUtils.cuh | Remove outdated CUDA 11 conditions (#154313) | 2025-05-28 08:44:58 +00:00 |
| ScatterGatherKernel.cu | Add a compile-time flag to trigger verbose logging for device-side asserts (#166171) | 2025-10-30 19:43:46 +00:00 |
| SegmentReduce.cu | [CD] Add CUDA 13.0 Windows build (#161663) | 2025-09-01 15:27:17 +00:00 |
| Shape.cu | Fix: nDims is mutated inside the loop in Shape.cu (#165446) | 2025-10-15 02:32:15 +00:00 |
| shifted_chebyshev_polynomial_t.cu | | |
| shifted_chebyshev_polynomial_u.cu | | |
| shifted_chebyshev_polynomial_v.cu | | |
| shifted_chebyshev_polynomial_w.cu | | |
| SoftMax.cu | [ROCm] Remove use of warpsize on host-side compilation (#156979) | 2025-07-01 04:55:31 +00:00 |
| Sort.cpp | [ROCm] Fix sort for non-standard bool (#147459) | 2025-03-06 00:23:02 +00:00 |
| Sort.cu | [ROCm] Use IPT=8 for block radix sort (#147657) | 2025-02-26 04:22:16 +00:00 |
| Sort.h | | |
| SortImpl.cu | | |
| Sorting.cpp | Fix race condition and make CUDA kthvalue deterministic (#165762) | 2025-10-25 00:45:57 +00:00 |
| Sorting.cu | Fix race condition and make CUDA kthvalue deterministic (#165762) | 2025-10-25 00:45:57 +00:00 |
| Sorting.h | Modernize C++ code in aten/src/ATen/ (#141424) | 2024-11-24 02:15:19 +00:00 |
| SortingCommon.cuh | Recover non-standard bool test for msort (#139870) | 2024-11-11 02:00:34 +00:00 |
| SortingRadixSelect.cuh | | |
| SortStable.cu | Allow at::native::offset_t to be offset using operator+= (#164570) | 2025-10-15 01:40:54 +00:00 |
| SortStable.h | | |
| SortUtils.cuh | | |
| SparseBinaryOpIntersectionKernel.cu | | |
| SparseMM.cu | | |
| SpectralOps.cpp | Remove unnecessary "static" for definitions in anonymous namespace (#165035) | 2025-10-11 00:04:23 +00:00 |
| SpectralOps.cu | [1/N] Remove inclusion of ATen/core/Array.h (#122064) | 2024-11-18 08:50:28 +00:00 |
| spherical_bessel_j0.cu | | |
| StepKernel.cu | | |
| SummaryOps.cu | Non-deterministic alert in histc_cuda for floating types only (#151701) | 2025-04-24 21:16:46 +00:00 |
| TensorCompare.cpp | | |
| TensorCompare.cu | Add FP8 support for eye (#139974) | 2024-12-24 10:00:23 +00:00 |
| TensorFactories.cu | [CUDA][64-bit indexing] Fix some existing problematic int64_t _ = blockIdx.* * blockDim.* code (#142010) | 2024-12-19 00:55:11 +00:00 |
| TensorModeKernel.cpp | | |
| TensorModeKernel.cu | [ROCm] Remove use of warpsize on host-side compilation (#156979) | 2025-07-01 04:55:31 +00:00 |
| TensorModeKernel.cuh | Remove outdated CUDA 11 conditions (#154313) | 2025-05-28 08:44:58 +00:00 |
| TensorModeKernel.h | Modernize C++ code in aten/src/ATen/ (#141424) | 2024-11-24 02:15:19 +00:00 |
| TensorShape.cu | Make torch._chunk_cat support non-contiguous inputs (#151263) | 2025-04-16 04:18:46 +00:00 |
| TensorShapeCUDA.cpp | | |
| TensorTopK.cpp | Remove CUDA 11 workarounds for CUB_SUPPORTS_SCAN_BY_KEY and CUB_SUPPORTS_UNIQUE_BY_KEY (#164637) | 2025-10-18 20:05:54 +00:00 |
| TensorTopK.cu | Remove CUDA 11 workarounds for CUB_SUPPORTS_SCAN_BY_KEY and CUB_SUPPORTS_UNIQUE_BY_KEY (#164637) | 2025-10-18 20:05:54 +00:00 |
| TensorTopK.h | Modernize C++ code in aten/src/ATen/ (#141424) | 2024-11-24 02:15:19 +00:00 |
| TensorTransformations.cu | [CUDA][64-bit indexing] Fix some existing problematic int64_t _ = blockIdx.* * blockDim.* code (#142010) | 2024-12-19 00:55:11 +00:00 |
| thread_constants.h | Revert "[CUDA] Only use vec128 if CUDA version is newer than 12.8 (#150705)" | 2025-04-08 16:29:05 +00:00 |
| TriangularOps.cu | [cuda] fix triu/tril int32 overflow for large matrices (#164705) | 2025-10-20 07:17:41 +00:00 |
| UnaryComplexKernels.cu | | |
| UnaryFractionKernels.cu | | |
| UnaryGammaKernels.cu | | |
| UnaryGeometricAcoshKernel.cu | | |
| UnaryGeometricAcosKernel.cu | | |
| UnaryGeometricAsinhKernel.cu | | |
| UnaryGeometricAsinKernel.cu | | |
| UnaryGeometricAtanhKernel.cu | | |
| UnaryGeometricAtanKernel.cu | | |
| UnaryGeometricCoshKernel.cu | | |
| UnaryGeometricCosKernel.cu | | |
| UnaryGeometricSinhKernel.cu | | |
| UnaryGeometricSinKernel.cu | | |
| UnaryGeometricTanhKernel.cu | disable jiterator for complex tan and tanh (#165250) | 2025-10-29 04:59:01 +00:00 |
| UnaryGeometricTanKernel.cu | disable jiterator for complex tan and tanh (#165250) | 2025-10-29 04:59:01 +00:00 |
| UnaryLogKernels.cu | | |
| UnaryOpsKernel.cu | | |
| UnarySignKernels.cu | | |
| UnarySpecialOpsKernel.cu | | |
| UnfoldBackwardKernel.cu | | |
| Unique.cu | | |
| UniqueCub.cu | [ATen][CUDA][CUB] Implement changes to CCCL (CUB/Thrust/LibCUDACXX) usage in ATen (#153373) | 2025-06-28 05:44:52 +00:00 |
| UniqueCub.cuh | | |
| UpSample.cuh | Turn some const variables into constexpr in C++ code (#165401) | 2025-10-17 13:24:46 +00:00 |
| UpSampleBicubic2d.cu | | |
| UpSampleBilinear2d.cu | [ROCm] new implementation of upsample_bilinear2d_backward (#164572) | 2025-10-25 02:39:24 +00:00 |
| UpSampleLinear1d.cu | | |
| UpSampleNearest1d.cu | | |
| UpSampleNearest2d.cu | [64-bit][CUDA] Upsample2D 64-bit indexing fix attempt 2 (#141923) | 2025-01-04 02:30:38 +00:00 |
| UpSampleNearest3d.cu | [64-bit] Int64 casting for UpSampleNearest3D (#144865) | 2025-01-29 19:30:09 +00:00 |
| UpSampleTrilinear3d.cu | | |
| ValidateCompressedIndicesKernel.cu | | |
| vol2col.cuh | | |
| WeightNorm.cu | | |
| ZetaKernel.cu | | |