Commit Graph

34 Commits

Author SHA1 Message Date
Zizeng Meng
a762dd1f67 [Memento] On-demand mode using without torch api (#153171)
Summary:
CUDA Post: https://fb.workplace.com/groups/ai.efficiency.tools.users/permalink/2020094788475989/

# Context
In this diff, we want to enable the on-demand mode of memory snapshot to allow user to trace any remote process via dyno command line.

# Design decision

**How do we send on-demand signal to remote process**
We leverage the dyno-Kineto approach.
Since dyno is running on all machine in Meta, it can send a request to the remote machine to start the Kineto.
Kineto will start another thread for memoryProfiler (https://fburl.com/code/dxsmmrok)

**why we use different approach as CUDA**

On CUDA side, we are using pybind to load torch Module and invoke the python api to start/stop the profiling. However, this requires us to compile the whole torch binary in the predictor which is not recommended by runtime(andruwang)

Thus, we decide to use the CPP api directly to avoid un-necessary dependency

**why the snapshot is saved as json string directly instead of pickle**
Pickle is primarily designed for use with Python and doesn't have well support in cpp. Also, it is hard for user to download the snapshot file and open locally.
Due to the dependency issue, it is hard to import the gzip/pickle library to decode the data. Thus, let's use JSON for now. I will work on the visualizer to fasten the render and support other format later.

**Plan**:
* Now, we will encoded file into gz for MTIA ondemand only and update the visualizer to support both type.
* Update auto-trace and CUDA side to encode in gzip as well
* Fully remove pickle dependency.

Test Plan:
# Remote cogwheel test
Servicelab: https://fburl.com/servicelab/pckux7a3
snapshot file manifold: https://fburl.com/manifold/fnotk18c
snapshot file in pastry: P1805522232

Visualization on D74399684
 {F1977786422}

# Local Predictor Test
url: https://fburl.com/pytorch_memory_visualizer/y06kskkm

 {F1977787329}

Differential Revision: D74179606

Pull Request resolved: https://github.com/pytorch/pytorch/pull/153171
Approved by: https://github.com/sraikund16
2025-05-15 06:07:04 +00:00
Zizeng Meng
861945100e [Kineto] Enable OOM observer (#152160)
Summary:
# Context:
When memory leak happens, it usually trigger the OOM in the later iterations. The snapshot of full iteration will be huge and hard to interpret.
On CUDA side, they provide OOM observer which generates snapshot when OOM happens with latest 1,500,000 entries for debugging.

In this diff, we want to implement the feature on MTIA side

Test Plan:
Run this test with last diff in the stack.
```
buck run @//mode/opt  kineto/libkineto/fb/mtia/integration_tests:mtia_memory_auto_trace_test
```

As shown, the memory_snapshot is generated when oom happens
Log: P1794792326
Snapshot: https://fburl.com/pytorch_memory_visualizer/lx73y6s3 {F1977402355}

Differential Revision: D71993315

Pull Request resolved: https://github.com/pytorch/pytorch/pull/152160
Approved by: https://github.com/sraikund16
2025-04-27 15:56:44 +00:00
Yu, Guangye
b0810168a3 Generalize poison fork logic for each device backend (#144664)
# Motivation
Generalize the posion_fork code to make it reusable across different devices.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144664
Approved by: https://github.com/EikanWang, https://github.com/albanD
2025-04-13 09:54:30 +00:00
PyTorch MergeBot
a0ab243c3a Revert "Generalize poison fork logic for each device backend (#144664)"
This reverts commit 83bd0b63b5.

Reverted https://github.com/pytorch/pytorch/pull/144664 on behalf of https://github.com/atalman due to failing internal tests ([comment](https://github.com/pytorch/pytorch/pull/144664#issuecomment-2795157082))
2025-04-10 21:02:14 +00:00
Yu, Guangye
83bd0b63b5 Generalize poison fork logic for each device backend (#144664)
# Motivation
Generalize the posion_fork code to make it reusable across different devices.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144664
Approved by: https://github.com/EikanWang, https://github.com/albanD
2025-04-10 02:34:53 +00:00
PyTorch MergeBot
bf1132c196 Revert "Generalize poison fork logic for each device backend (#144664)"
This reverts commit d86c14156d.

Reverted https://github.com/pytorch/pytorch/pull/144664 on behalf of https://github.com/atalman due to failing periodic test: python test/test_cpp_extensions_mtia_backend.py TestCppExtensionMTIABackend.test_device_context ([comment](https://github.com/pytorch/pytorch/pull/144664#issuecomment-2784506104))
2025-04-07 20:09:53 +00:00
Yu, Guangye
d86c14156d Generalize poison fork logic for each device backend (#144664)
# Motivation
Generalize the posion_fork code to make it reusable across different devices.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144664
Approved by: https://github.com/EikanWang, https://github.com/albanD
2025-04-07 02:06:21 +00:00
Pat Vignola
42bd4a09a3 [MTIA] Add _mtia_getCurrentRawStream to MTIA module (#149436)
Summary: The FlexAttention path generates code that uses this function. Although streams are not used yet in Triton-MTIA, adding this now allows us to not branch out just for MTIA and generate different code.

Test Plan: CI

Reviewed By: chaos5958

Differential Revision: D70072057

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149436
Approved by: https://github.com/chaos5958
2025-03-19 05:17:51 +00:00
Pat Vignola
213eea216a [MTIA] Add _mtia_maybeExchangeDevice to MTIA module (#149340)
Summary: The FlexAttention path uses `_maybe_exchange_device`, so it will be needed eventually for MTIA as well.

Test Plan: `buck2 test fbcode//mtia/host_runtime/torch_mtia/tests:test_torch_mtia_api -- test_maybe_exchange_device`

Reviewed By: chaos5958

Differential Revision: D70072063

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149340
Approved by: https://github.com/chaos5958
2025-03-18 15:15:12 +00:00
Pat Vignola
769f19bf95 [MTIA] Add _mtia_exchangeDevice to MTIA module (#149322)
Summary: The FlexAttention path uses `_exchange_device`, so it will be needed eventually for MTIA as well.

Test Plan: `buck2 test fbcode//mtia/host_runtime/torch_mtia/tests:test_torch_mtia_api -- test_exchange_device`

Reviewed By: chaos5958

Differential Revision: D70072059

Pull Request resolved: https://github.com/pytorch/pytorch/pull/149322
Approved by: https://github.com/chaos5958
2025-03-17 19:31:10 +00:00
cyy
8fa81a6066 Enable misc-use-internal-linkage check and apply fixes (#148948)
Enables clang-tidy rule [`misc-use-internal-linkage`](https://clang.llvm.org/extra/clang-tidy/checks/misc/use-internal-linkage.html). This new check was introduced in Clang-Tidy 18 and is available due to recent update of Clang-Tidy 19.

The check marks functions and variables used only in the translation unit as static. Therefore undesired symbols are not leaked into other units, more link time optimisations are possible and the resulting binaries may be smaller.

The detected violations were mostly fixed by using static. In other cases, the symbols were indeed consumed by others files, then their declaring headers were included. Still some declarations were wrong and have been fixed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/148948
Approved by: https://github.com/Skylion007
2025-03-12 14:22:56 +00:00
Hyunho Yeo
5f621c5879 [MTIA] (3/n) Implement PyTorch APIs to query/reset device peak memory usage (#146710)
Summary: Public summary (shared with Github): This diff implements a C++-Python binding to enable `reset_peak_memory_stats`.

Test Plan: The test is implemented in the following diff.

Reviewed By: yuhc

Differential Revision: D68988673

Pull Request resolved: https://github.com/pytorch/pytorch/pull/146710
Approved by: https://github.com/nautsimon
2025-02-10 16:57:09 +00:00
cyy
29f52e3972 [2/N] Remove unnecessary once flag usage (#145057)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145057
Approved by: https://github.com/albanD
2025-01-23 09:48:46 +00:00
Hyunho Yeo
057d9aff39 [S481486] [MTIA] Correct mtia.device_count() API (#145338)
Summary:
Prev: Count the number of "general" accelerators

Curr: Count the number of MTIA devices by using the MTIA runtime API

Test Plan:
```
buck test //mtia/host_runtime/torch_mtia/tests:test_torch_mtia_api -- -r  test_get_device_count
```

https://www.internalfb.com/intern/testinfra/testrun/8162774572631995

Reviewed By: BoyueZheng

Differential Revision: D68472668

Pull Request resolved: https://github.com/pytorch/pytorch/pull/145338
Approved by: https://github.com/BoyueZheng, https://github.com/egienvalue
2025-01-22 17:45:15 +00:00
cyy
b0be30dd79 [19/N] Fix extra warnings brought by clang-tidy-17 (#144448)
Apply more clang-tidy fixes. There was a bug introduced by #144014 due to incorrect namespace concatenation which is reverted here.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144448
Approved by: https://github.com/albanD
2025-01-09 15:58:05 +00:00
cyy
d0070ca07e [18/N] Fix extra warnings brought by clang-tidy-17 (#144014)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/144014
Approved by: https://github.com/Skylion007, https://github.com/albanD
2025-01-08 17:21:55 +00:00
Zizeng Meng
eb67dd3e2d [3/N][Memory Profiling] Add memory profiling function for MTIA hooks (#142149)
Design Doc: https://fburl.com/gdoc/47zpuweb
Prototyping:  D66469341

In this diff, we implement two new mtia hooks to start/stop profiler and export the memory snapshot.

In next diff, we will integrate the mtia backend with profiler python api

Differential Revision: [D66823583](https://our.internmc.facebook.com/intern/diff/D66823583/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/142149
Approved by: https://github.com/nautsimon
2024-12-18 11:58:23 +00:00
Hyunho Yeo
d70b7029c8 [MTIA] Support torch.mtia.empty_cache() (#141533)
Summary: As title

Test Plan:
Passed a local unit test: `buck2 test //mtia/host_runtime/torch_mtia/tests:test_torch_mtia_api`

https://www.internalfb.com/intern/testinfra/testrun/4785074861101240

Reviewed By: nautsimon

Differential Revision: D66481778

Pull Request resolved: https://github.com/pytorch/pytorch/pull/141533
Approved by: https://github.com/nautsimon
2024-11-28 02:24:19 +00:00
Edward Yang
b14269dcfb Make Context to be Device-agnostic Step by Step (1/N) (#136519) (#138155)
Summary:
- make init to be device-agnostic and move it to AcceleratorHooksInterface
- refactoring context related to device initialization

Original pull request: https://github.com/pytorch/pytorch/pull/136519

Test Plan: contbuild & OSS CI, see 4a8e49389c

Reviewed By: malfet

Differential Revision: D64471142

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138155
Approved by: https://github.com/malfet, https://github.com/bobrenjc93
2024-10-17 20:58:56 +00:00
PyTorch MergeBot
d4d687ffb2 Revert "Make Context to be Device-agnostic Step by Step (1/N) (#136519)"
This reverts commit 4a8e49389c.

Reverted https://github.com/pytorch/pytorch/pull/136519 on behalf of https://github.com/clee2000 due to breaking internal tests related to MITA, @ezyang has a forward fix? ([comment](https://github.com/pytorch/pytorch/pull/136519#issuecomment-2414588302))
2024-10-15 17:19:16 +00:00
FFFrog
4a8e49389c Make Context to be Device-agnostic Step by Step (1/N) (#136519)
----

- make init to be device-agnostic and move it to AcceleratorHooksInterface
- refactoring context related to device initialization

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136519
Approved by: https://github.com/ezyang, https://github.com/EikanWang, https://github.com/guangyey
2024-10-13 12:38:02 +00:00
PyTorch MergeBot
079f909263 Revert "Make Context to be Device-agnostic Step by Step (1/N) (#136519)"
This reverts commit be0b75256a.

Reverted https://github.com/pytorch/pytorch/pull/136519 on behalf of https://github.com/jovianjaison due to this pr is causing errors internally ([comment](https://github.com/pytorch/pytorch/pull/136519#issuecomment-2405781093))
2024-10-10 18:32:17 +00:00
FFFrog
be0b75256a Make Context to be Device-agnostic Step by Step (1/N) (#136519)
- make init to be device-agnostic and move it to AcceleratorHooksInterface
- refactoring context related to device initialization

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136519
Approved by: https://github.com/ezyang, https://github.com/EikanWang, https://github.com/guangyey
2024-10-09 02:13:36 +00:00
cyy
0c0d8c8ff0 [1/N] Fix extra warnings brought by clang-tidy-17 (#137407)
Before we can use clang-tidy-17
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137407
Approved by: https://github.com/Skylion007, https://github.com/aaronenyeshi
2024-10-07 17:53:59 +00:00
Trung Truong
cc365fdd7b [MTIA] Support torch.cuda.get_device_capability equivalent API on MTIA (#135889)
Summary:
Mirror `get_device_capability` on MTIA per https://fburl.com/gdoc/p4lo5avn

At the moment, both the major and minor version are just 0

Test Plan:
Unit test: `buck2 test //mtia/host_runtime/torch_mtia/tests:test_torch_mtia_api`

https://www.internalfb.com/intern/testinfra/testconsole/testrun/1688850109958190/

Differential Revision: D62595296

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135889
Approved by: https://github.com/egienvalue
2024-09-17 17:42:56 +00:00
Simon Mahns
dcb03106b7 [Land Internally] MTIA equivalent of torch.cuda.memory_stats (#132007)
Summary: as title

Test Plan: pytorch ci failing: https://github.com/pytorch/pytorch/issues/131962

Differential Revision: D60335413

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132007
Approved by: https://github.com/hanzlfs, https://github.com/egienvalue
2024-07-29 20:47:18 +00:00
PyTorch MergeBot
b343644f3a Revert "MTIA equivalent of torch.cuda.memory_stats (#131673)"
This reverts commit 513ce5f69a.

Reverted https://github.com/pytorch/pytorch/pull/131673 on behalf of https://github.com/clee2000 due to linked internal diff has internal changes, not sure what happened here, but this shouldn't have been merged externally without also merging the internal diff ([comment](https://github.com/pytorch/pytorch/pull/131673#issuecomment-2251749644))
2024-07-26 00:54:37 +00:00
Simon Mahns
513ce5f69a MTIA equivalent of torch.cuda.memory_stats (#131673)
Summary: Adding MTIA equivalent of `torch.cuda.memory_stats`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131673
Approved by: https://github.com/egienvalue
2024-07-25 21:59:59 +00:00
Jun Luo
1fd7496ab2 [MTIA] Fix synchronize API (#128714)
Reviewed By: fenypatel99

Differential Revision: D58590313

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128714
Approved by: https://github.com/aaronenyeshi
2024-06-17 21:58:46 +00:00
egienvalue
73744a2c00 torch.mtia module for MTIA device backend (#123612)
MTIA device has its own Module in PyTorch now.
torch.mtia has following APIs similar to other backends. The lazy_init is also supported.
```
__all__ = [
    "init",
    "is_available",
    "synchronize",
    "device_count",
    "current_device",
    "current_stream",
    "default_stream",
    "set_stream",
    "stream",
    "device",
]

```
------------
For device management. We expand AccleratorHooksInterface to support generic device management and it can be used in both C++ and PyThon.
```
def _accelerator_hooks_device_count() -> _int: ...
def _accelerator_hooks_set_current_device(device_index: _int) -> None: ...
def _accelerator_hooks_get_current_device() -> _int : ...
def _accelerator_hooks_exchange_device(device_index: _int) -> _int : ...
def _accelerator_hooks_maybe_exchange_device(device_index: _int) -> _int : ...
```

---------
Adding get_device_module API to retrieve device modules for different device types.
```
def get_device_module(device: Optional[Union[torch.device, str]] = None)
```
---------

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123612
Approved by: https://github.com/albanD
ghstack dependencies: #123611
2024-04-26 16:17:54 +00:00
PyTorch MergeBot
e04c7b19f4 Revert "torch.mtia module for MTIA device backend (#123612)"
This reverts commit 381653de63.

Reverted https://github.com/pytorch/pytorch/pull/123612 on behalf of https://github.com/jeffdaily due to this PR broke ROCm with message RuntimeError: Cannot have MTIA with other devices ([comment](https://github.com/pytorch/pytorch/pull/123612#issuecomment-2077649762))
2024-04-25 16:06:46 +00:00
egienvalue
381653de63 torch.mtia module for MTIA device backend (#123612)
MTIA device has its own Module in PyTorch now.
torch.mtia has following APIs similar to other backends. The lazy_init is also supported.
```
__all__ = [
    "init",
    "is_available",
    "synchronize",
    "device_count",
    "current_device",
    "current_stream",
    "default_stream",
    "set_stream",
    "stream",
    "device",
]

```
------------
For device management. We expand AccleratorHooksInterface to support generic device management and it can be used in both C++ and PyThon.
```
def _accelerator_hooks_device_count() -> _int: ...
def _accelerator_hooks_set_current_device(device_index: _int) -> None: ...
def _accelerator_hooks_get_current_device() -> _int : ...
def _accelerator_hooks_exchange_device(device_index: _int) -> _int : ...
def _accelerator_hooks_maybe_exchange_device(device_index: _int) -> _int : ...
```

---------
Adding get_device_module API to retrieve device modules for different device types.
```
def get_device_module(device: Optional[Union[torch.device, str]] = None)
```
---------

Differential Revision: [D56443356](https://our.internmc.facebook.com/intern/diff/D56443356)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123612
Approved by: https://github.com/albanD
ghstack dependencies: #123611
2024-04-24 20:51:20 +00:00
PyTorch MergeBot
929242a15c Revert "torch.mtia module for MTIA device backend (#123612)"
This reverts commit d7e1bf9ff9.

Reverted https://github.com/pytorch/pytorch/pull/123612 on behalf of https://github.com/jeffdaily due to This broke ROCm. see test_overrides.py ([comment](https://github.com/pytorch/pytorch/pull/123611#issuecomment-2067363780))
2024-04-19 22:44:26 +00:00
egienvalue
d7e1bf9ff9 torch.mtia module for MTIA device backend (#123612)
MTIA device has its own Module in PyTorch now.
torch.mtia has following APIs similar to other backends. The lazy_init is also supported.
```
__all__ = [
    "init",
    "is_available",
    "synchronize",
    "device_count",
    "current_device",
    "current_stream",
    "default_stream",
    "set_stream",
    "stream",
    "device",
]

```
------------
For device management. We expand AccleratorHooksInterface to support generic device management and it can be used in both C++ and PyThon.
```
def _accelerator_hooks_device_count() -> _int: ...
def _accelerator_hooks_set_current_device(device_index: _int) -> None: ...
def _accelerator_hooks_get_current_device() -> _int : ...
def _accelerator_hooks_exchange_device(device_index: _int) -> _int : ...
def _accelerator_hooks_maybe_exchange_device(device_index: _int) -> _int : ...
```

---------
Adding get_device_module API to retrieve device modules for different device types.
```
def get_device_module(device: Optional[Union[torch.device, str]] = None)
```
---------
@exported-using-ghexport

Differential Revision: [D52923602](https://our.internmc.facebook.com/intern/diff/D52923602/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123612
Approved by: https://github.com/albanD
ghstack dependencies: #123611
2024-04-18 17:38:06 +00:00