Summary:
We currently build breakpad from [this fork](https://github.com/driazati/breakpad) to include extra logic to restore signal handlers that were previously present. With some [new additions](https://github.com/google/breakpad/compare/main...driazati:main) this fork now includes a CMake based build, so we can add breakpad as a proper dependency rather than rely on including it in Docker images as a system library which is error prone (we have a bunch of images) and hard to extend to MacOS / Windows. This also includes some changes to the crash handling code to support MacOS / Windows in a similar way to Linux.
```python
import torch
# On Windows this writes crashes to C:\Users\<user>\AppData\pytorch_crashes
# On MacOS/Linux this writes crashes to /tmp/pytorch_crashes
torch.utils._crash_handler.enable_minidumps()
# Easy way to cause a segfault and trigger the handler
torch.bincount(input=torch.tensor([9223372036854775807]))
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63186
Reviewed By: malfet, seemethere
Differential Revision: D30318404
Pulled By: driazati
fbshipit-source-id: 0d7daf3701cfaba5451cc529a0730272ab1eb1dc
Summary:
This PR
* adds the breakpad build to most of the remaining docker images (except the mobile + slim ones)
* pins to a [fork of breakpad](https://github.com/google/breakpad/compare/master...driazati:master?expand=1) to enable dasiy chaining on signal handlers
* renames the API to be nicer
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59236
Reviewed By: malfet
Differential Revision: D28792511
Pulled By: driazati
fbshipit-source-id: 83723e74b7f0a00e1695210ac2620a0c91ab4bf2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55647
This adds [breakpad](https://github.com/google/breakpad) which comes with out-of-the-box utilities to register a signal handler that writes out a minidump on an unhandled exception. Right now this is gated behind a flag in `torch.utils`, but in the future it could be on by default. Sizewise this adds aboute 500k to `libtorch_cpu.so` (187275968 B to 187810016 B).
```bash
$ cat <<EOF > test.py
import torch
torch.utils.enable_minidump_collection()
# temporary util that just segfaults
torch._C._crash()
EOF
$ python test.py
Wrote minidump to /tmp/pytorch_crashes/6a829041-50e9-4247-ea992f99-a74cf47a.dmp
fish: “python test.py” terminated by signal SIGSEGV (Address boundary error)
$ minidump-2-core /tmp/pytorch_crashes/6a829041-50e9-4247-ea992f99-a74cf47a.dmp -o core.dmp
$ gdb python core.dmp
... commence debugging ...
```
Right now all exceptions that get passed up to Python don't trigger the signal handler (which by default only
handles [these](https://github.com/google/breakpad/blob/main/src/client/linux/handler/exception_handler.cc#L115)). It would be possible for PyTorch exceptions to explicitly write a minidump when passed up to Python (maybe only when the exception is unhandled or something).
Test Plan: Imported from OSS
Reviewed By: ailzhang
Differential Revision: D27679767
Pulled By: driazati
fbshipit-source-id: 1ab3b5160b6dc405f5097eb25acc644d533358d7