pytorch/test/cpp/api
yewentao256 fd6655a0f5 Feature: Implement support for cudnn_batch_norm_out kernel to replace the autogen approach. (#123020)
Fixes #115611

The autogen out-variant kernel can introduce a redundant copy, so we implement the kernel directly to improve efficiency.
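
To make the cost concrete, here is a minimal sketch of the extra copy an autogen `out=` wrapper pays compared to a hand-written out kernel. It uses `torch::sigmoid` / `torch::sigmoid_out` purely as stand-ins; the kernel this PR actually adds is `cudnn_batch_norm_out`:

```c++
#include <torch/torch.h>

int main() {
    auto x = torch::rand({4});
    auto out = torch::empty({4});

    // Autogen-style out wrapper: the functional kernel allocates a fresh
    // result, which is then copied into the caller's buffer.
    auto tmp = torch::sigmoid(x);
    out.copy_(tmp); // the redundant copy

    // Hand-written out variant: writes into the caller's buffer directly.
    torch::sigmoid_out(out, x);
    return 0;
}
```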

Test Case:

```c++
#include <torch/torch.h>
#include <iostream>
#include <ATen/ATen.h>
#include <ATen/cuda/CUDAContext.h>

int main() {
    auto input = torch::rand({2, 3, 4, 4}, torch::device(torch::kCUDA));
    auto weight = torch::randn({3}, torch::device(torch::kCUDA));
    auto bias = torch::randn({3}, torch::device(torch::kCUDA));
    auto running_mean = torch::zeros({3}, torch::device(torch::kCUDA));
    auto running_var = torch::ones({3}, torch::device(torch::kCUDA));

    bool training = true;
    double exponential_average_factor = 0.1;
    double epsilon = 1e-5;

    auto output = torch::empty_like(input);
    auto save_mean = torch::empty({3}, torch::device(torch::kCUDA));
    auto save_var = torch::empty({3}, torch::device(torch::kCUDA));
    auto reserve = torch::empty({0}, torch::device(torch::kCUDA)); // empty placeholder for the cuDNN reserve space

    // Out variant: writes its results into the preallocated buffers.
    at::native::cudnn_batch_norm_out(
        input, weight, bias, running_mean, running_var, training,
        exponential_average_factor, epsilon, output, save_mean, save_var,
        reserve);
    // Functional variant: allocates and returns (output, save_mean, save_var, reserve).
    auto outputs = at::native::cudnn_batch_norm(
        input, weight, bias, running_mean, running_var, training,
        exponential_average_factor, epsilon);

    bool is_close_output = torch::allclose(output, std::get<0>(outputs));
    bool is_close_save_mean = torch::allclose(save_mean, std::get<1>(outputs));
    bool is_close_save_var = torch::allclose(save_var, std::get<2>(outputs));
    bool is_close_reserve = torch::allclose(reserve, std::get<3>(outputs));

    std::cout << "Is output close: " << is_close_output << std::endl;
    std::cout << "Is save_mean close: " << is_close_save_mean << std::endl;
    std::cout << "Is save_var close: " << is_close_save_var << std::endl;
    std::cout << "Is reserve close: " << is_close_reserve << std::endl;

    return 0;
}
```

Please CC @albanD

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123020
Approved by: https://github.com/andrewor14, https://github.com/eqy, https://github.com/albanD
2025-08-04 22:40:33 +00:00
any.cpp
autograd.cpp
CMakeLists.txt
dataloader.cpp [Lint] Update clang-format to 19.1.4 (#153889) 2025-05-20 14:12:46 +00:00
dispatch.cpp
enum.cpp
expanding-array.cpp
fft.cpp
functional.cpp
grad_mode.cpp
inference_mode.cpp
init_baseline.h
init_baseline.py
init.cpp
integration.cpp
ivalue.cpp
jit.cpp
memory.cpp
meta_tensor.cpp
misc.cpp
module.cpp
moduledict.cpp
modulelist.cpp
modules.cpp [2/N] Fix cppcoreguidelines-init-variables suppression (#146237) 2025-06-19 23:26:42 +00:00
namespace.cpp
nested_int.cpp
nested.cpp
nn_utils.cpp
operations.cpp
optim_baseline.h
optim_baseline.py
optim.cpp
ordered_dict.cpp
parallel_benchmark.cpp
parallel.cpp
parameterdict.cpp
parameterlist.cpp
README.md
rnn.cpp
sequential.cpp
serialize.cpp
special.cpp
static.cpp
support.cpp
support.h
tensor_cuda.cpp Feature: Implement support for cudnn_batch_norm_out kernel to replace the autogen approach. (#123020) 2025-08-04 22:40:33 +00:00
tensor_flatten.cpp
tensor_indexing.cpp
tensor_options_cuda.cpp
tensor_options.cpp
tensor.cpp [Lint] Update clang-format to 19.1.4 (#153889) 2025-05-20 14:12:46 +00:00
torch_include.cpp
transformer.cpp [BE][3/6] fix typos in test/ (#157637) 2025-07-17 12:08:33 +00:00

# C++ Frontend Tests

In this folder live the tests for PyTorch's C++ Frontend. They use the GoogleTest test framework.

## CUDA Tests

To make a test runnable only on platforms with CUDA, suffix its name with `_CUDA`, e.g.

```c++
TEST(MyTestSuite, MyTestCase_CUDA) { }
```

To make it runnable only on platforms with at least two CUDA devices, suffix it with `_MultiCUDA` instead of `_CUDA`, e.g.

```c++
TEST(MyTestSuite, MyTestCase_MultiCUDA) { }
```
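
A slightly fuller sketch of how these suffixes look with real tensor work in the body (the suite and test names here are hypothetical):

```c++
#include <gtest/gtest.h>
#include <torch/torch.h>

// Runs only when at least one CUDA device is present.
TEST(MyTestSuite, RoundTrip_CUDA) {
  auto cpu = torch::rand({2, 2});
  auto gpu = cpu.to(torch::kCUDA);
  ASSERT_TRUE(torch::allclose(cpu, gpu.cpu()));
}

// Runs only when at least two CUDA devices are present.
TEST(MyTestSuite, CrossDeviceCopy_MultiCUDA) {
  auto a = torch::rand({2, 2}, torch::device(torch::Device(torch::kCUDA, 0)));
  auto b = a.to(torch::Device(torch::kCUDA, 1));
  ASSERT_TRUE(torch::allclose(a.cpu(), b.cpu()));
}
```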

There is logic in main.cpp that detects the availability and number of CUDA devices and supplies the appropriate negative filters to GoogleTest.
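
A rough sketch of how that wiring can look (this is not the actual `main.cpp`; the flag manipulation shown is just one way to build such negative filters):

```c++
#include <gtest/gtest.h>
#include <torch/torch.h>

int main(int argc, char* argv[]) {
  ::testing::InitGoogleTest(&argc, argv);
  // Append negative patterns so tests requiring missing hardware are
  // filtered out instead of failing.
  if (!torch::cuda::is_available()) {
    ::testing::GTEST_FLAG(filter) += ":-*_CUDA:*_MultiCUDA";
  } else if (torch::cuda::device_count() < 2) {
    ::testing::GTEST_FLAG(filter) += ":-*_MultiCUDA";
  }
  return RUN_ALL_TESTS();
}
```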

## Integration Tests

Integration tests use the MNIST dataset. You must download it by running the following command from the PyTorch root folder:

```sh
$ python tools/download_mnist.py -d test/cpp/api/mnist
```

The required paths will be referenced as `test/cpp/api/mnist/...` in the test code, so you must run the integration tests from the PyTorch root folder.
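
For reference, a minimal sketch (standard `torch::data` API; the root path is the one described above) of loading the downloaded data the way an integration test would, run from the PyTorch root folder:

```c++
#include <torch/torch.h>
#include <iostream>

int main() {
  // Root path matches where download_mnist.py placed the files.
  auto dataset = torch::data::datasets::MNIST("test/cpp/api/mnist")
                     .map(torch::data::transforms::Stack<>());
  auto loader =
      torch::data::make_data_loader(std::move(dataset), /*batch_size=*/64);
  for (auto& batch : *loader) {
    // batch.data has shape [64, 1, 28, 28]; batch.target has shape [64].
    std::cout << batch.data.sizes() << std::endl;
    break;
  }
  return 0;
}
```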