If PyTorch is installed system-wide (via the OS package manager) or by an alternative package manager like `uv`, pip may not be available, causing an error in `collect_env`.
However, it is still possible to collect exactly the same list using the `importlib` API, which is always available.
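A minimal sketch of the `importlib`-based approach (function name and the default patterns here are illustrative, not the exact `collect_env` code):

```python
from importlib import metadata

def list_relevant_packages(patterns=("torch", "numpy", "mypy")):
    """Return name==version strings for installed distributions whose
    name contains any of the given substrings; no pip required."""
    found = []
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name and any(p in name.lower() for p in patterns):
            found.append(f"{name}=={dist.version}")
    return sorted(found)
```

`importlib.metadata` is in the standard library since Python 3.8, so this works regardless of how the environment was provisioned.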
Fixes #144615
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144616
Approved by: https://github.com/malfet
Updates flake8 to v6.1.0 and fixes a few lints using sed and some ruff tooling.
- Replace `assert(0)` with `raise AssertionError()`
- Remove extraneous parentheses, e.g.:
  - `assert(a == b)` -> `assert a == b`
  - `if(x > y or y < z):` -> `if x > y or y < z:`
  - `return('...')` -> `return '...'`
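An illustration of the rewrites above (both forms behave identically; the fixed style is simply idiomatic Python):

```python
def fail_fast():
    # before: assert(0)
    raise AssertionError()

def pick(x, y, z):
    # before: if(x > y or y < z):
    if x > y or y < z:
        # before: return('gt')
        return 'gt'
    return 'le'
```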
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116591
Approved by: https://github.com/albanD, https://github.com/malfet
I'm looking to repurpose some logic in `torch.utils.collect_env` for the `geowatch` package. I'm mostly able to just use this script as a library, which is great because it reduces code in my package. However, the issue is that the package patterns that are relevant to torch are hard-coded inside of `get_conda_packages` and `get_pip_packages`.
The changes I made are simple. I defined the default package patterns as two global sets, and I added an argument to each function that lets the user customize exactly what package patterns are relevant. If they are not specified the defaults are used.
I was considering extending the power of the patterns by utilizing `fnmatch`, `re` (or [xdev.pattern](https://github.com/Erotemic/xdev/blob/main/xdev/patterns.py) which abstracts them both), but instead I opted to just use the existing `__contains__` test to keep things simple.
From torch's perspective this should make maintaining this file slightly easier: to update the relevant packages, the developer now updates two neighboring top-level globals instead of two separate local variables. It does add an argument to two functions, and that argument isn't used in torch itself, so there is a case for removing it; users *could* still exert some control by modifying the globals. I think the way I did it balances the tradeoffs well, though.
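A hedged sketch of the shape of the change (the names and stand-in package lines are illustrative, not the real implementation):

```python
# Defaults live at module level, so a developer updates one place.
DEFAULT_PIP_PATTERNS = {"torch", "numpy", "mypy"}

def filter_relevant_packages(lines, patterns=None):
    # Callers may pass their own patterns; torch itself uses the defaults.
    if patterns is None:
        patterns = DEFAULT_PIP_PATTERNS
    # The simple `in` (__contains__) test from the existing code is kept.
    return [line for line in lines if any(p in line for p in patterns)]
```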
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112993
Approved by: https://github.com/zou3519
Printing just the device name is not helpful when investigating PyTorch issues filed for specific AMD GPUs, as the support/issue might depend on the gfx arch, which is part of the gcnArchName property.
`torch.cuda.get_device_properties(0).gcnArchName` will print the value of the `gcnArchName` property, e.g.:
```
>>> torch.cuda.get_device_properties(0).gcnArchName
'gfx906:sramecc+:xnack-'
```
```
root@6f064e3c19fb:/data/pytorch/test# python ../torch/utils/collect_env.py
...
GPU models and configuration: AMD Radeon Graphics(gfx906:sramecc+:xnack-)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107477
Approved by: https://github.com/albanD
Remove unnecessary collection casts (redundant calls to `list`, `tuple`, and `dict`) and simplify calls to the `sorted` builtin. This should strictly improve speed and readability.
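Examples of the kind of simplification meant here (illustrative, not the actual diff):

```python
items = {3, 1, 2}

# before: sorted(list(items)) -- the intermediate list() is redundant,
# since sorted() accepts any iterable and already returns a new list
ordered = sorted(items)

# before: dict(list(d.items())) -- dict() consumes the items view directly
d = {"a": 1, "b": 2}
copied = dict(d.items())
```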
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94323
Approved by: https://github.com/albanD
This PR sets CUDA_MODULE_LOADING if it's not set by the user. By default, it sets it to "LAZY".
It was tested using the following commands:
```
python -c "import torch; tensor=torch.randn(20, 16, 50, 100).cuda(); free, total = torch.cuda.cudart().cudaMemGetInfo(0); print(total-free)"
```
which shows a memory usage of: 287,047,680 bytes
vs
```
CUDA_MODULE_LOADING="DEFAULT" python -c "import torch; tensor=torch.randn(20, 16, 50, 100).cuda(); free, total = torch.cuda.cudart().cudaMemGetInfo(0); print(total-free)"
```
which shows 666,632,192 bytes.
C++ implementation is needed for the libtorch users (otherwise it could have been a pure python functionality).
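For reference, the Python-side equivalent of the default-setting logic might look like this sketch (the actual change lives in C++ so that libtorch users benefit too):

```python
import os

# Respect an explicit user setting; otherwise default to lazy loading.
os.environ.setdefault("CUDA_MODULE_LOADING", "LAZY")
```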
cc: @ptrblck @ngimel @malfet
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85692
Approved by: https://github.com/malfet
Users were reporting errors when trying to use collect_env with
older versions of Python. This adds a test to ensure that we maintain
compatibility of this script with older Python versions.
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78946
Approved by: https://github.com/janeyx99
Summary:
Fixes a bug where collect_env.py could not be run without torch installed.
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74342
Reviewed By: malfet, janeyx99
Differential Revision: D34943464
Pulled By: seemethere
fbshipit-source-id: dbaa0004b88cb643a9c6426c9ea7c5be3d3c9ef5
(cherry picked from commit 4f39ebb823f88df0c3902db15deaffc6ba481cb3)
Summary:
- Target SHA1: ae108ef49aa5623b896fc93d4298c49d1750d9ba
- Make USE_XNNPACK an option dependent on a CMake minimum version of 3.12
- Print USE_XNNPACK under the CMake options summary, and print the
availability from collect_env.py
- Skip XNNPACK-based tests when XNNPACK is not available
- Add a SkipIfNoXNNPACK wrapper to skip tests
- Update the CMake version for the xenial-py3.7-gcc5.4 image to 3.12.4
- This is required for the backwards-compatibility test.
The PyTorch op schema is XNNPACK-dependent; see
aten/src/ATen/native/xnnpack/RegisterOpContextClass.cpp for an
example. The nightly version is assumed to have USE_XNNPACK=ON,
so with this change we ensure that the test build can also
have XNNPACK.
- HACK: skipping test_xnnpack_integration tests on ROCm
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72642
Reviewed By: kimishpatel
Differential Revision: D34456794
Pulled By: digantdesai
fbshipit-source-id: 85dbfe0211de7846d8a84321b14fdb061cd6c037
(cherry picked from commit 6cf48e7b64d6979962d701b5d493998262cc8bfa)
Summary:
Invoking `pip` or `pip3` yields the list of packages for whichever `pip` alias is first on the path, rather than for the interpreter currently being executed. Changed `get_pip_packages` to invoke `sys.executable` with `-mpip` instead.
Also, add mypy to the list of packages of interest.
Discovered while looking at https://github.com/pytorch/pytorch/issues/63279
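The fix can be sketched as follows (a minimal stand-alone version, not the exact `get_pip_packages` code):

```python
import subprocess
import sys

def get_pip_list():
    # Query the pip that belongs to *this* interpreter, not whatever
    # `pip`/`pip3` happens to be first on PATH.
    proc = subprocess.run(
        [sys.executable, "-mpip", "list", "--format=freeze"],
        capture_output=True, text=True,
    )
    return proc.stdout.splitlines()
```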
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63321
Reviewed By: walterddr
Differential Revision: D30342099
Pulled By: malfet
fbshipit-source-id: fc8d17cf2ddcf18236cfde5c1b9edb4e72804ee0
Summary:
Fixes https://github.com/pytorch/pytorch/issues/35901
This change is designed to prevent fragmentation in the Caching Allocator. Permissive block splitting in the allocator allows very large blocks to be split into many pieces. Once split too finely, it is unlikely that all pieces will be free at the same time, so the original allocation can never be returned. Anecdotally, we've seen a model run out of memory failing to allocate a 50 MB block on a 32 GB card while the caching allocator was holding 13 GB of 'split free blocks'.
Approach:
- Large blocks above a certain size are designated "oversize". This limit is currently set one decade above the 'large' threshold, at 200 MB
- Oversize blocks cannot be split
- Oversize blocks must closely match the requested size (e.g. a 200 MB request will match an existing 205 MB block, but not a 300 MB block)
- In lieu of splitting oversize blocks, there is a mechanism to quickly free a single oversize block (to the system allocator) so that an appropriately sized block can be allocated. This is activated under memory pressure and prevents `release_cached_blocks()` from triggering
Initial performance tests show this is similar or quicker than the original strategy. Additional tests are ongoing.
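A toy illustration of the close-match rule (the threshold and slack values are illustrative, not the allocator's real constants):

```python
MB = 1024 * 1024
OVERSIZE_THRESHOLD = 200 * MB  # blocks above this size are never split

def oversize_block_fits(requested, block_size, slack=20 * MB):
    # An oversize free block only serves requests close to its own size,
    # so a 300 MB block is not consumed by a 200 MB request.
    return block_size >= requested and block_size - requested <= slack
```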
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44742
Reviewed By: zou3519
Differential Revision: D29186394
Pulled By: ezyang
fbshipit-source-id: c88918836db3f51df59de6d1b3e03602ebe306a9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55647
This adds [breakpad](https://github.com/google/breakpad), which comes with out-of-the-box utilities to register a signal handler that writes out a minidump on an unhandled exception. Right now this is gated behind a flag in `torch.utils`, but in the future it could be on by default. Size-wise this adds about 500 KB to `libtorch_cpu.so` (187275968 B to 187810016 B).
```bash
$ cat <<EOF > test.py
import torch
torch.utils.enable_minidump_collection()
# temporary util that just segfaults
torch._C._crash()
EOF
$ python test.py
Wrote minidump to /tmp/pytorch_crashes/6a829041-50e9-4247-ea992f99-a74cf47a.dmp
fish: “python test.py” terminated by signal SIGSEGV (Address boundary error)
$ minidump-2-core /tmp/pytorch_crashes/6a829041-50e9-4247-ea992f99-a74cf47a.dmp -o core.dmp
$ gdb python core.dmp
... commence debugging ...
```
Right now, exceptions that get passed up to Python do not trigger the signal handler (which by default only
handles [these](https://github.com/google/breakpad/blob/main/src/client/linux/handler/exception_handler.cc#L115)). It would be possible for PyTorch exceptions to explicitly write a minidump when passed up to Python (perhaps only when the exception is unhandled).
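One way the "explicit minidump on unhandled Python exception" idea could be prototyped is via `sys.excepthook`; `write_minidump` below is a hypothetical stand-in, not an existing torch API:

```python
import sys

dump_log = []

def write_minidump():
    # Hypothetical: would call into the breakpad handler; here we just
    # record that a dump was requested.
    dump_log.append("/tmp/pytorch_crashes/example.dmp")

def dumping_excepthook(exc_type, exc, tb):
    write_minidump()
    sys.__excepthook__(exc_type, exc, tb)

# Only unhandled exceptions reach the hook, matching the idea above.
sys.excepthook = dumping_excepthook
```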
Test Plan: Imported from OSS
Reviewed By: ailzhang
Differential Revision: D27679767
Pulled By: driazati
fbshipit-source-id: 1ab3b5160b6dc405f5097eb25acc644d533358d7
Summary:
Fixes https://github.com/pytorch/pytorch/issues/35901
This change is designed to prevent fragmentation in the Caching Allocator. Permissive block splitting in the allocator allows very large blocks to be split into many pieces. Once split too finely, it is unlikely that all pieces will be free at the same time, so the original allocation can never be returned. Anecdotally, we've seen a model run out of memory failing to allocate a 50 MB block on a 32 GB card while the caching allocator was holding 13 GB of 'split free blocks'.
Approach:
- Large blocks above a certain size are designated "oversize". This limit is currently set one decade above the 'large' threshold, at 200 MB
- Oversize blocks cannot be split
- Oversize blocks must closely match the requested size (e.g. a 200 MB request will match an existing 205 MB block, but not a 300 MB block)
- In lieu of splitting oversize blocks, there is a mechanism to quickly free a single oversize block (to the system allocator) so that an appropriately sized block can be allocated. This is activated under memory pressure and prevents `release_cached_blocks()` from triggering
Initial performance tests show this is similar or quicker than the original strategy. Additional tests are ongoing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44742
Reviewed By: ngimel
Differential Revision: D23752058
Pulled By: ezyang
fbshipit-source-id: ccb7c13e3cf8ef2707706726ac9aaac3a5e3d5c8
Summary:
Inspired by https://github.com/pytorch/pytorch/issues/47993, this fixes the import error in `collect_env.py` with older versions of PyTorch where `torch.version` does not have the `hip` attribute.
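The guarded-access pattern for this kind of fix can be sketched like so (`_FakeVersion` stands in for `torch.version` on an old build):

```python
class _FakeVersion:
    # Older torch builds expose cuda but not hip.
    cuda = "10.2"

# Probe with getattr instead of touching the attribute directly, so a
# missing `hip` yields None rather than raising AttributeError.
hip_version = getattr(_FakeVersion, "hip", None)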
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48076
Reviewed By: seemethere, xuzhao9
Differential Revision: D25024352
Pulled By: samestep
fbshipit-source-id: 7dff9d2ab80b0bd25f9ca035d8660f38419cdeca
Summary:
Moved all torch-specific checks under the `if TORCH_AVAILABLE` block.
Embedded the gpu_info dict back into the SystemEnv constructor and deduplicated some code between the HIP and CUDA cases.
Fixes https://github.com/pytorch/pytorch/issues/47397
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47398
Reviewed By: walterddr
Differential Revision: D24740421
Pulled By: malfet
fbshipit-source-id: d0a1fe5b428617cb1a9d027324d24d7371c68d64
Summary:
This adds HIP version info to the `collect_env.py` output.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44106
Reviewed By: VitalyFedyunin
Differential Revision: D23652341
Pulled By: zou3519
fbshipit-source-id: a1f5bce8da7ad27a1277a95885934293d0fd43c5
Summary:
No type annotations can be added to the script, as it still has to be Python 2 compliant.
Make changes to avoid variable type redefinition.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43062
Reviewed By: zou3519
Differential Revision: D23132991
Pulled By: malfet
fbshipit-source-id: 360c02e564398f555273e5889a99f834a5467059
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35615
Python 2 has reached end-of-life and is no longer supported by PyTorch.
Now we can clean up a lot of cruft that we put in place to support it.
These changes were all done manually, and I skipped anything that seemed
like it would take more than a few seconds, so I think it makes sense to
review it manually as well (though using side-by-side view and ignoring
whitespace change might be helpful).
Test Plan: CI
Differential Revision: D20842886
Pulled By: dreiss
fbshipit-source-id: 8cad4e87c45895e7ce3938a88e61157a79504aed