Mirror of https://github.com/zebrajr/pytorch.git
synced 2025-12-07 12:21:27 +01:00
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55647

This adds [breakpad](https://github.com/google/breakpad), which comes with out-of-the-box utilities to register a signal handler that writes out a minidump on an unhandled exception. Right now this is opt-in via `torch.utils.enable_minidump_collection()`, but in the future it could be on by default. Size-wise this adds about 500 KB to `libtorch_cpu.so` (187275968 B to 187810016 B, a difference of 534048 B).

```bash
$ cat <<EOF > test.py
import torch

torch.utils.enable_minidump_collection()

# temporary util that just segfaults
torch._C._crash()
EOF

$ python test.py
Wrote minidump to /tmp/pytorch_crashes/6a829041-50e9-4247-ea992f99-a74cf47a.dmp
fish: “python test.py” terminated by signal SIGSEGV (Address boundary error)

$ minidump-2-core /tmp/pytorch_crashes/6a829041-50e9-4247-ea992f99-a74cf47a.dmp -o core.dmp
$ gdb python core.dmp
... commence debugging ...
```

Right now, exceptions that get passed up to Python don't trigger the signal handler, which by default only handles [these](https://github.com/google/breakpad/blob/main/src/client/linux/handler/exception_handler.cc#L115) signals. It would be possible for PyTorch exceptions to explicitly write a minidump when passed up to Python (perhaps only when the exception is unhandled).

Test Plan: Imported from OSS

Reviewed By: ailzhang

Differential Revision: D27679767

Pulled By: driazati

fbshipit-source-id: 1ab3b5160b6dc405f5097eb25acc644d533358d7
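The summary closes by floating the idea of writing a minidump for exceptions that reach Python, perhaps only when they are unhandled. On the Python side, the "only when unhandled" part maps naturally onto `sys.excepthook`, which the interpreter calls only for exceptions that propagate to the top level uncaught. This is a sketch under stated assumptions, not part of the PR: `install_unhandled_exception_dump` and the `write_minidump` callback are hypothetical names, and the callback stands in for whatever native call would actually produce the dump.

```python
import sys
from types import TracebackType
from typing import Callable, Optional, Type


def install_unhandled_exception_dump(
    write_minidump: Callable[[BaseException], None],
) -> Callable[[], None]:
    """Call ``write_minidump`` for exceptions that reach the top level unhandled.

    ``write_minidump`` is a hypothetical stand-in for a native hook that would
    write the actual dump. Returns a function that restores the previous hook.
    """
    previous_hook = sys.excepthook

    def hook(
        exc_type: Type[BaseException],
        exc_value: BaseException,
        exc_tb: Optional[TracebackType],
    ) -> None:
        # Only uncaught exceptions arrive here, so handled ones never dump.
        # Ctrl-C is a deliberate interruption, not a crash, so skip it.
        if not issubclass(exc_type, KeyboardInterrupt):
            write_minidump(exc_value)
        # Keep the normal traceback output (or any hook installed earlier).
        previous_hook(exc_type, exc_value, exc_tb)

    sys.excepthook = hook

    def restore() -> None:
        sys.excepthook = previous_hook

    return restore
```

Wrapping the existing hook rather than replacing it keeps the usual traceback on stderr. Exceptions raised in non-main threads go through `threading.excepthook` instead, so a fuller version would wrap that hook as well.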
19 lines
582 B
Python
import os
import sys
import pathlib

import torch

DEFAULT_MINIDUMP_DIR = "/tmp/pytorch_crashes"


def enable_minidump_collection(directory=DEFAULT_MINIDUMP_DIR):
    """Write a minidump to ``directory`` if the process crashes (Linux only)."""
    if sys.platform != "linux":
        raise RuntimeError("Minidump collection is currently only implemented for Linux platforms")

    if directory == DEFAULT_MINIDUMP_DIR:
        # The default location is created on demand.
        pathlib.Path(directory).mkdir(parents=True, exist_ok=True)
    elif not os.path.exists(directory):
        # A caller-supplied directory must already exist.
        raise RuntimeError(f"Directory does not exist: {directory}")

    # Pass the directory to the C++ side, which sets up the breakpad handler.
    torch._C._enable_minidump_collection(directory)  # type: ignore
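Each crash writes a `.dmp` file with a generated name into the collection directory (as in the `Wrote minidump to ...` line of the transcript), so a small helper can pick out the newest one to hand to `minidump-2-core`. This is a sketch, assuming dumps keep the `.dmp` suffix and that the most recently modified file belongs to the latest crash; `latest_minidump` is a hypothetical helper, not part of `torch.utils`. It repeats `DEFAULT_MINIDUMP_DIR` so it runs without importing torch.

```python
import os
from pathlib import Path
from typing import Optional, Union

# Same default as torch.utils above, duplicated so this sketch is standalone.
DEFAULT_MINIDUMP_DIR = "/tmp/pytorch_crashes"


def latest_minidump(
    directory: Union[str, os.PathLike] = DEFAULT_MINIDUMP_DIR,
) -> Optional[Path]:
    """Return the most recently modified ``*.dmp`` file in ``directory``.

    Hypothetical helper. Returns None if the directory is missing or holds
    no minidumps.
    """
    dumps = [p for p in Path(directory).glob("*.dmp") if p.is_file()]
    if not dumps:
        return None
    # Pick the newest file by modification time.
    return max(dumps, key=lambda p: p.stat().st_mtime)
```

The returned path is what the transcript passes to `minidump-2-core` before loading the resulting core file into `gdb`.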