pytorch/docs/source
Daniel Dale ce56ee11fd Extend torch.cuda.is_available() to attempt an NVML-based CUDA availability assessment when explicitly requested by the user (#85951)
Fixes #83973 (This is a substitute PR for https://github.com/pytorch/pytorch/pull/85024)

First of all, thanks for your invaluable contributions to PyTorch everyone!

Given how extensively `torch.cuda.is_available` is used in the PyTorch ecosystem, IMHO it's worthwhile to provide downstream libraries/frameworks/users the ability to alter the default behavior of `torch.cuda.is_available` in the context of their PyTorch usage.

I'm confident there are many current and future use cases that could benefit from a weakened, NVML-based `torch.cuda.is_available` assessment at a downstream framework's explicit direction (thanks @malfet 81da50a972 !). Though one could always patch out the `torch.cuda.is_available` function with another implementation in a downstream library, I think this environment-variable-based configuration option is more convenient, and the cost of including it is quite low.

As discussed in https://github.com/pytorch/pytorch/pull/85024#issuecomment-1261542045, this PR gates the new, non-default NVML-based behavior behind an environment variable (`PYTORCH_NVML_BASED_CUDA_CHK`) that lets a user/framework opt into NVML-based `is_available()` assessments if desired.
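The gating described above can be sketched as follows. This is a simplified illustration of the dispatch, not PyTorch's actual implementation: `_nvml_based_check` and `_default_cuda_check` are hypothetical stand-ins for, respectively, a lightweight NVML device query (which avoids initializing the CUDA runtime, and so avoids poisoning a later fork) and the default `cudaGetDeviceCount`-based check.

```python
import os

def _nvml_based_check() -> bool:
    # Hypothetical stand-in: the real path queries NVML for visible
    # devices without initializing the CUDA runtime in this process.
    return True

def _default_cuda_check() -> bool:
    # Hypothetical stand-in: the default path initializes the CUDA
    # runtime (cudaGetDeviceCount) to count devices.
    return False

def cuda_is_available_sketch() -> bool:
    # Gate on the environment variable introduced by this PR: only
    # when the user explicitly sets it to "1" is the weaker,
    # NVML-based assessment attempted.
    if os.environ.get("PYTORCH_NVML_BASED_CUDA_CHK") == "1":
        return _nvml_based_check()
    return _default_cuda_check()

# Opt in before the first availability check, as a framework would.
os.environ["PYTORCH_NVML_BASED_CUDA_CHK"] = "1"
print(cuda_is_available_sketch())  # → True (NVML path taken in this sketch)
```

The key design point is that the NVML path is strictly opt-in: with the variable unset, behavior is unchanged from the existing `is_available()`.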

Thanks again for your work everyone!
@ngimel @malfet @awaelchli

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85951
Approved by: https://github.com/ngimel
2022-10-12 18:37:50 +00:00
_static
_templates
community [skip-ci] Fixed bad link in build_ci_governance.rst (#85933) 2022-10-03 17:35:44 +00:00
elastic Add watchdog to TorchElastic agent and trainers (#84081) 2022-09-07 00:17:20 +00:00
notes Extend torch.cuda.is_available() to attempt an NVML-based CUDA availability assessment when explicitly requested by the user (#85951) 2022-10-12 18:37:50 +00:00
rpc
scripts [ONNX] Update ONNX documentation to include unsupported operators (#84496) 2022-09-16 23:48:37 +00:00
amp.rst Remove deprecated torch.matrix_rank (#70981) 2022-09-22 17:40:46 +00:00
autograd.rst Change torch.autograd.graph.disable_saved_tensors_hooks to be public API (#85994) 2022-10-03 16:25:01 +00:00
backends.rst Add opteinsum backend to give users control (#86219) 2022-10-05 06:33:25 +00:00
benchmark_utils.rst
bottleneck.rst add itt unit test and docstrings (#84848) 2022-09-28 01:39:58 +00:00
checkpoint.rst
complex_numbers.rst
conf.py Add user facing documentation for CSAN (#84689) 2022-09-09 15:29:34 +00:00
config_mod.rst
cpp_extension.rst
cpp_index.rst
cuda._sanitizer.rst Rework printing tensor aliases in CSAN error message (#85008) 2022-09-21 13:41:52 +00:00
cuda.rst (Re-open) Adds cudaMallocAsync as an alternative backend for the CUDA allocator (#82682) 2022-10-12 03:44:21 +00:00
cudnn_persistent_rnn.rst
cudnn_rnn_determinism.rst
data.rst Extend collate function that can register collate functions to handle specific types (#85748) 2022-09-30 13:30:18 +00:00
ddp_comm_hooks.rst Fix two small typos in ddp_comm_hooks.rst (#82047) 2022-07-23 19:10:57 +00:00
deploy.rst Delete torch::deploy from pytorch core (#85953) 2022-10-06 07:20:16 +00:00
distributed.algorithms.join.rst
distributed.elastic.rst
distributed.optim.rst
distributed.rst Update distributed.rst backend collective support chart (#86406) 2022-10-07 12:59:09 +00:00
distributions.rst
dlpack.rst
docutils.conf
fft.rst
fsdp.rst
futures.rst
fx.rst CSE Pass and common pass Tests (#81742) 2022-07-22 03:45:09 +00:00
hub.rst
index.rst Add torch.nested namespace (#84102) 2022-09-12 16:31:05 +00:00
jit_builtin_functions.rst
jit_language_reference_v2.rst Fix typos in docs (#80602) 2022-08-29 23:32:44 +00:00
jit_language_reference.rst (Re-open) Adds cudaMallocAsync as an alternative backend for the CUDA allocator (#82682) 2022-10-12 03:44:21 +00:00
jit_python_reference.rst
jit_unsupported.rst (Re-open) Adds cudaMallocAsync as an alternative backend for the CUDA allocator (#82682) 2022-10-12 03:44:21 +00:00
jit_utils.rst
jit.rst
library.rst
linalg.rst [Array API] Add linalg.vecdot (#70542) 2022-07-12 14:28:54 +00:00
masked.rst [maskedtensor] first commit, core and creation (#82836) 2022-08-16 20:10:34 +00:00
math-quantizer-equation.png
mobile_optimizer.rst
model_zoo.rst
monitor.rst
multiprocessing.rst
name_inference.rst
named_tensor.rst Add torch.unflatten and improve its docs (#81399) 2022-07-29 15:02:42 +00:00
nested.rst Add python nested_tensor and as_nested_tensor constructors in torch.nested (#85593) 2022-09-28 20:15:02 +00:00
nn.functional.rst
nn.init.rst
nn.rst
onnx_supported_aten_ops.rst [ONNX] Update ONNX documentation to include unsupported operators (#84496) 2022-09-16 23:48:37 +00:00
onnx.rst [ONNX] Update user documentation (#85819) 2022-09-30 19:35:34 +00:00
optim.rst [doc] LR scheduler example fix (#86629) 2022-10-11 21:41:50 +00:00
package.rst Fix typos in torch.package documentation (#82994) 2022-08-08 20:19:17 +00:00
pipeline.rst
profiler.rst Fix ITT unit-tests if PyTorch is compiled with USE_ITT=OFF (#86199) 2022-10-04 21:57:05 +00:00
quantization-accuracy-debugging.rst
quantization-backend-configuration.rst
quantization-support.rst [quant][ao_migration] nn.intrinsic.quantized migration to ao (#86172) 2022-10-08 00:01:38 +00:00
quantization.rst (Re-open) Adds cudaMallocAsync as an alternative backend for the CUDA allocator (#82682) 2022-10-12 03:44:21 +00:00
random.rst
rpc.rst Fix typo under docs directory and RELEASE.md (#85896) 2022-09-29 21:41:59 +00:00
sparse.rst (Re-open) Adds cudaMallocAsync as an alternative backend for the CUDA allocator (#82682) 2022-10-12 03:44:21 +00:00
special.rst [primTorch] special: j0, j1, spherical_j0 (#86049) 2022-10-04 18:21:46 +00:00
storage.rst Fix typos in docs (#80602) 2022-08-29 23:32:44 +00:00
tensor_attributes.rst [docs] Add `torch.channels_last_3d` (#85888) 2022-10-03 17:32:07 +00:00
tensor_view.rst
tensorboard.rst
tensors.rst Remove deprecated torch.lstsq (#70980) 2022-09-23 00:16:55 +00:00
testing.rst Fix links in torch.testing docs (#80353) 2022-07-11 19:15:53 +00:00
torch.ao.ns._numeric_suite_fx.rst
torch.ao.ns._numeric_suite.rst
torch.overrides.rst
torch.rst Add Context Manager for Disabling Multithreading in Backwards, use in aot autograd (#86245) 2022-10-06 03:27:42 +00:00
type_info.rst