Commit Graph

11 Commits

Author SHA1 Message Date
Yiming Zhou
6eb8d9671b Enable torch.nn.functional.batch_norm in test_export_opinfo (#164261)
Summary:
There are actually 2 `nn.functional.batch_norm` in op_db. See https://github.com/pytorch/pytorch/blob/main/torch/testing/_internal/common_methods_invocations.py#L16797-L16831

So previously the test failed at `assert len(ops)==1`

Test Plan: python test/export/test_export_opinfo.py TestExportOnFakeCudaCUDA

Differential Revision: D83581427

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164261
Approved by: https://github.com/SherlockNoMad
2025-10-01 21:56:08 +00:00
Yiming Zhou
937869657e Exporting aten.sdpa with cuda under fake mode on a cuda-less machine (#164162)
Summary:
As titled.

sdpa will select backend based on hardware check, and it fails when exporting with cuda under fake mode on a cuda-less machine.

We guard `at::cuda::is_available()` check before `at::cuda::getCurrentDeviceProperties()` and give warnings.

Test Plan: buck2 run mode/dev-nosan caffe2/test:test_export -- -r nn_functional_scaled_dot_product_attention

Differential Revision: D83496154

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164162
Approved by: https://github.com/SherlockNoMad
2025-09-30 17:21:31 +00:00
Sherlock Huang
3564cd294c Fix TestExportOpInfo (#164184)
Fixes https://github.com/pytorch/pytorch/issues/163699

Pull Request resolved: https://github.com/pytorch/pytorch/pull/164184
Approved by: https://github.com/yiming0416, https://github.com/tugsbayasgalan
2025-09-30 16:12:39 +00:00
Yiming Zhou
2a45f30ae7 Exporting aten.conv with cuda under fake mode on a cuda-less machine (#163912)
Summary:
Improve op coverage of exporting a CUDA model on a CPU-only machine under fake tensor mode.

For `torch.nn.functional.conv2d`, it will `_select_conv_backend` based on input and weight shapes.

When calling into `supportsDepthwiseConvolutionWithCuDNN()`, it calls `at::cuda::getCurrentDeviceProperties()` and fails on a CPU-only machine.

So we check if CUDA is actually enabled first.

Test Plan: TORCH_SHOW_CPP_STACKTRACES=1 buck2 run fbcode//caffe2/test:test_export -- --r nn_functional_conv2d

Reviewed By: angelayi, henryoier

Differential Revision: D80562984

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163912
Approved by: https://github.com/SherlockNoMad
2025-09-26 06:04:20 +00:00
Sherlock Huang
6f9aef5fef [2/n] Support module.to("cuda:0") in FakeTensorMode on cuda-less machine (#163433)
Summary:
To support exporting a cuda model on a CPU-only machine under fake tensor mode.
User commonly need to move sample inputs to the cuda device with .to("cuda:0") or .to("cuda") call.
This diff supports this.

I expect the following pattern to work

```
with FakeTensorMode(allow_non_fake_inputs=True):
    cuda_module = module.to("cuda:0")
    cuda_sample_inputs = tuple([x.to("cuda:0") for x in sample_inputs])

    with torch.no_grad():
        ep = torch.export.export(cuda_module, cuda_sample_inputs)

```

Before
Moving module.to("cuda:0") under fake tensor mode would have parameter on `meta` device.

After
parameters would be on "cuda:0" .

Test Plan: buck2 run  fbcode//caffe2/test:fake_tensor -- --r test_move_module

Reviewed By: mikaylagawarecki

Differential Revision: D80102876

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163433
Approved by: https://github.com/albanD
2025-09-22 20:16:32 +00:00
Sherlock Huang
033b7d1e1a [Reland] Return NoOpDeviceGuardImpl in replace of CudaDeviceGuard when device is not available (#163187)
Reland of #160532

Summary:

To support exporting a cuda model on a CPU-only machine under fake tensor mode. User commonly need to move sample inputs to the cuda device with .to("cuda:0") or .to("cuda") call. This diff supports this.
I expect the following pattern to work
```
with FakeTensorMode(allow_non_fake_inputs=True):
    cuda_module = module.to("cuda:0")
    cuda_sample_inputs = tuple([x.to("cuda:0") for x in sample_inputs])
    with torch.no_grad():
        ep = torch.export.export(cuda_module, cuda_sample_inputs)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163016
Approved by: https://github.com/huydhn

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163187
Approved by: https://github.com/angelayi
2025-09-18 04:46:26 +00:00
PyTorch MergeBot
79fd497423 Revert "[Reland] Return NoOpDeviceGuardImpl in replace of CudaDeviceGuard when device is not available, or cpu-only build (#163016)"
This reverts commit f1eb99e2e4.

Reverted https://github.com/pytorch/pytorch/pull/163016 on behalf of https://github.com/jeffdaily due to broke rocm CI, see export/test_export_opinfo.py::TestExportOnFakeCudaCUDA::test_fake_export_nonzero_cuda_float32 [GH job link](https://github.com/pytorch/pytorch/actions/runs/17787208381/job/50564369696) [HUD commit link](f1eb99e2e4) ([comment](https://github.com/pytorch/pytorch/pull/163016#issuecomment-3303707552))
2025-09-17 16:17:53 +00:00
Sherlock Huang
f1eb99e2e4 [Reland] Return NoOpDeviceGuardImpl in replace of CudaDeviceGuard when device is not available, or cpu-only build (#163016)
Reland of #160532

Summary:

To support exporting a cuda model on a CPU-only machine under fake tensor mode.
User commonly need to move sample inputs to the cuda device with .to("cuda:0") or .to("cuda") call.
This diff supports this.
I expect the following pattern to work
```
with FakeTensorMode(allow_non_fake_inputs=True):
    cuda_module = module.to("cuda:0")
    cuda_sample_inputs = tuple([x.to("cuda:0") for x in sample_inputs])
    with torch.no_grad():
        ep = torch.export.export(cuda_module, cuda_sample_inputs)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/163016
Approved by: https://github.com/huydhn
2025-09-17 05:01:33 +00:00
PyTorch MergeBot
9c93dc8123 Revert "Return NoOpDeviceGuardImpl in replace of CudaDeviceGuard when device is not available, or cpu-only build (#160532)"
This reverts commit a956c4ab1c.

Reverted https://github.com/pytorch/pytorch/pull/160532 on behalf of https://github.com/huydhn due to Reverted internally ([comment](https://github.com/pytorch/pytorch/pull/160532#issuecomment-3287745165))
2025-09-13 07:42:12 +00:00
Sherlock Huang
a956c4ab1c Return NoOpDeviceGuardImpl in replace of CudaDeviceGuard when device is not available, or cpu-only build (#160532)
Summary:

To support exporting a cuda model on a CPU-only machine under fake tensor mode.
User commonly need to move sample inputs to the cuda device with .to("cuda:0") or .to("cuda") call.
This diff supports this.
I expect the following pattern to work
```
with FakeTensorMode(allow_non_fake_inputs=True):
    cuda_module = module.to("cuda:0")
    cuda_sample_inputs = tuple([x.to("cuda:0") for x in sample_inputs])
    with torch.no_grad():
        ep = torch.export.export(cuda_module, cuda_sample_inputs)
```

Test Plan:
CI

Rollback Plan:

Differential Revision: D80181887

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160532
Approved by: https://github.com/henryoier, https://github.com/ezyang
2025-09-13 01:50:51 +00:00
Sherlock Huang
eaa5d9d3d3 Introduce OpInfo test for testing export on fake device (#160694)
Summary: Prepare for the upcoming diffs for exporting on fake cuda device.

Test Plan:
test

Rollback Plan:

Differential Revision: D80304225

Pull Request resolved: https://github.com/pytorch/pytorch/pull/160694
Approved by: https://github.com/dolpm
2025-08-15 07:26:28 +00:00