pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
Alanna Burke	250e9af4da	Removing per torch.compile audit. (#154572 ) Removing https://pytorch.org/docs/stable/torch.compiler_best_practices_for_backends.html per torch.compile audit Pull Request resolved: https://github.com/pytorch/pytorch/pull/154572 Approved by: https://github.com/williamwen42, https://github.com/svekars	2025-06-03 15:41:52 +00:00
bobrenjc93	33f2d0ff45	add reference to stances from dynamic shapes doc (#154823 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/154823 Approved by: https://github.com/Skylion007, https://github.com/williamwen42 ghstack dependencies: #154802, #154826, #154822	2025-06-02 18:47:19 +00:00
bobrenjc93	d99e9568ec	Add docs for how to mark as unbacked (#154822 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/154822 Approved by: https://github.com/Skylion007 ghstack dependencies: #154802, #154826	2025-06-02 18:30:57 +00:00
bobrenjc93	9fe1b40d17	[ez] add dynamic sources docs (#154826 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/154826 Approved by: https://github.com/Skylion007 ghstack dependencies: #154802	2025-06-02 17:53:30 +00:00
Nikita Shulga	0350c7e72c	[BE] Introduce torch.AcceleratorError (#152023 ) Which inherits from `RuntimeError` and contains `error_code`, which in the case of CUDA should contain the error returned by `cudaGetLastError`.
`torch::detail::_new_accelerator_error_object(c10::AcceleratorError&)` follows the pattern of CPython's [`PyErr_SetString`](`cb8a72b301/Python/errors.c (L282)`), namely:
- Convert cstr into Python string with `PyUnicode_FromString`
- Create new exception object using `PyObject_CallOneArg` just like it's done in [`_PyErr_CreateException`](`cb8a72b301/Python/errors.c (L32)`)
- Set `error_code` property using `PyObject_SetAttrString`
- decref all temporary references
Test that it works and captures CPP backtrace (in addition to CI) by running
```python
import os
os.environ['TORCH_SHOW_CPP_STACKTRACES'] = '1'
import torch
x = torch.rand(10, device="cuda")
y = torch.arange(20, device="cuda")
try:
    x[y] = 2
    print(x)
except torch.AcceleratorError as e:
    print("Exception was raised", e.args[0])
    print("Captured error code is ", e.error_code)
```
which produces the following output
```
Exception was raised CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Exception raised from c10_cuda_check_implementation at /home/ubuntu/pytorch/c10/cuda/CUDAException.cpp:41 (most recent call first): C++ CapturedTraceback: #4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0 #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0 #6 c10::cuda::c10_cuda_check_implementation(int, char const, char const, int, bool) [clone .cold] from CUDAException.cpp:0 #7 void at::native::gpu_kernel_impl<at::native::AbsFunctor<float> >(at::TensorIteratorBase&, at::native::AbsFunctor<float> const&) [clone .isra.0] from tmpxft_000191fc_00000000-6_AbsKernel.cudafe1.cpp:0 #8 at::native::abs_kernel_cuda(at::TensorIteratorBase&) from ??:0 #9 at::Tensor& at::native::unary_op_impl_with_complex_to_float_out<at::native::abs_stub_DECLARE_DISPATCH_type>(at::Tensor&, at::Tensor const&, at::native::abs_stub_DECLARE_DISPATCH_type&, bool) [clone .constprop.0] from UnaryOps.cpp:0 #10 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA_out_abs_out(at::Tensor const&, at::Tensor&) from RegisterCUDA_0.cpp:0 #11 at::_ops::abs_out::call(at::Tensor const&, at::Tensor&) from ??:0 #12 at::native::abs(at::Tensor const&) from ??:0 #13 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (at::Tensor const&), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CompositeExplicitAutograd__abs>, at::Tensor, c10::guts::typelist::typelist<at::Tensor const&> >, at::Tensor (at::Tensor const&)>::call(c10::OperatorKernel, c10::DispatchKeySet, at::Tensor const&) from RegisterCompositeExplicitAutograd_0.cpp:0 #14 
at::_ops::abs::redispatch(c10::DispatchKeySet, at::Tensor const&) from ??:0 #15 torch::autograd::VariableType::(anonymous namespace)::abs(c10::DispatchKeySet, at::Tensor const&) from VariableType_1.cpp:0 #16 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (c10::DispatchKeySet, at::Tensor const&), &torch::autograd::VariableType::(anonymous namespace)::abs>, at::Tensor, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&> >, at::Tensor (c10::DispatchKeySet, at::Tensor const&)>::call(c10::OperatorKernel, c10::DispatchKeySet, at::Tensor const&) from VariableType_1.cpp:0 #17 at::_ops::abs::call(at::Tensor const&) from ??:0 #18 at::native::isfinite(at::Tensor const&) from ??:0 #19 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (at::Tensor const&), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CompositeImplicitAutograd__isfinite>, at::Tensor, c10::guts::typelist::typelist<at::Tensor const&> >, at::Tensor (at::Tensor const&)>::call(c10::OperatorKernel, c10::DispatchKeySet, at::Tensor const&) from RegisterCompositeImplicitAutograd_0.cpp:0 #20 at::_ops::isfinite::call(at::Tensor const&) from ??:0 #21 torch::autograd::THPVariable_isfinite(_object, _object, _object) from python_torch_functions_2.cpp:0 #22 PyObject_CallFunctionObjArgs from ??:0 #23 _PyObject_MakeTpCall from ??:0 #24 _PyEval_EvalFrameDefault from ??:0 #25 _PyObject_FastCallDictTstate from ??:0 #26 _PyStack_AsDict from ??:0 #27 _PyObject_MakeTpCall from ??:0 #28 _PyEval_EvalFrameDefault from ??:0 #29 _PyFunction_Vectorcall from ??:0 #30 _PyEval_EvalFrameDefault from ??:0 #31 _PyFunction_Vectorcall from ??:0 #32 _PyEval_EvalFrameDefault from ??:0 #33 _PyFunction_Vectorcall from ??:0 #34 _PyEval_EvalFrameDefault from ??:0 #35 PyFrame_GetCode from ??:0 #36 PyNumber_Xor from ??:0 #37 PyObject_Str from ??:0 #38 PyFile_WriteObject 
from ??:0 #39 _PyWideStringList_AsList from ??:0 #40 _PyDict_NewPresized from ??:0 #41 _PyEval_EvalFrameDefault from ??:0 #42 PyEval_EvalCode from ??:0 #43 PyEval_EvalCode from ??:0 #44 PyUnicode_Tailmatch from ??:0 #45 PyInit__collections from ??:0 #46 PyUnicode_Tailmatch from ??:0 #47 _PyRun_SimpleFileObject from ??:0 #48 _PyRun_AnyFileObject from ??:0 #49 Py_RunMain from ??:0 #50 Py_BytesMain from ??:0 #51 __libc_init_first from ??:0 #52 __libc_start_main from ??:0 #53 _start from ??:0 Captured error code is 710 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/152023 Approved by: https://github.com/eqy, https://github.com/mradmila, https://github.com/ngimel ghstack dependencies: #154436	2025-06-01 21:02:43 +00:00
Nikita Shulga	f7c09f864a	[Docs] Reformat sparse example (#154785 ) Not sure why, but rst fails to colorize multiline inputs, but works fine for single line commands Test plan: \| [Before](https://docs.pytorch.org/docs/main/sparse.html#construction) \| [After](https://docs-preview.pytorch.org/pytorch/pytorch/154785/sparse.html#construction) \| \| ------------- \| ------------- \| \| <img width="466" alt="image" src="https://github.com/user-attachments/assets/96a5c52a-1804-4d05-a5cf-c10221aaddf6" /> \| <img width="477" alt="image" src="https://github.com/user-attachments/assets/99565288-5c0b-4e8e-bd60-f016ebc207b5" /> \| Fixes https://github.com/pytorch/pytorch/issues/154779 Pull Request resolved: https://github.com/pytorch/pytorch/pull/154785 Approved by: https://github.com/janeyx99, https://github.com/Skylion007	2025-06-01 20:56:14 +00:00
Natalia Gimelshein	f01e628e3b	Resubmit Remove MemPoolContext (#154042 ) (#154746 ) Summary: Per title Test Plan: Added tests + existing tests Differential Revision: D75695030 Pull Request resolved: https://github.com/pytorch/pytorch/pull/154746 Approved by: https://github.com/malfet	2025-05-31 01:21:54 +00:00
PyTorch MergeBot	d173ba5a75	Revert "Remove MemPoolContext (#154042 )" This reverts commit `3b38989b5f`. Reverted https://github.com/pytorch/pytorch/pull/154042 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/154042#issuecomment-2921401100))	2025-05-30 06:53:37 +00:00
bobrenjc93	9c06dff1ce	[multigraph] use specializations in compile_and_call_fx_graph (#153449 ) The goal of this multigraph work is to enable a compiled region that has a single dynamo trace but multiple backend specializations. This work was inspired by vLLM which does this in a somewhat hacky way where they use a custom backend to capture a dynamo graph and then manually invoke compile_fx multiple times to get specialized graphs. There's really two parts of this work: The frontend changes: 1) we introduce an optional kwarg `specialize_on` to mark_{dynamic,unbacked} that takes in a list of specializations. I debated other methods including specifying specializations via decorators, but ultimately decided this approach was more harmonious. The big issue with decorators is the difficulty of composing well with the rest of the torch.compile ecosystem including graph breaks, lazy initialization of variable trackers and symbolic variables, etc. The backend changes (this PR): 1) We capture the backend_specialization specified in the mark_{dynamic,unbacked} API into a SymbolicContext. See changes in `/_dynamo/variables/builder.py` 2) After we are done dynamo tracing, we will lazily (more on this later) invoke `call_user_compiler` up to N + 1 times for N specializations and 1 generic graph. Under the hood this will call compile_fx, which composes nicely with both Async Compile and AOTAutogradCache. We do this by using a context manager to patch in specialization specific axioms into the ShapeEnv before invoking the user compiler. 3) When we have specializations, we install a lazy specialized dispatch function that checks each specialization and dispatches to the first one that matches. Instead of doing all of the specialization compiles up front, we do the compiles lazily. The first time a specialization is invoked, we will do the compilation and save it in a cache so subsequent invocations are fast. If none of the specializations match, we dispatch to the generic graph. 
I decided to do this over returning N different GuardedCodes since 1) it doesn't pollute the dynamo cache (eg. if you have 8 specializations, you would hit the cache limit) 2) it naturally incorporates the hierarchical lattice structure of the guards since the specializations are always necessarily stricter than the generic region's guards. I benchmarked this PR stack with #152596 and found around a 50% reduction when dispatching to the specialized regions: ![495269647_576053105510082_9189856138964956774_n](https://github.com/user-attachments/assets/66030fed-d62e-4d87-940f-aa13c99b1a73) Pull Request resolved: https://github.com/pytorch/pytorch/pull/153449 Approved by: https://github.com/zou3519 ghstack dependencies: #153433	2025-05-30 03:19:49 +00:00
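The lazy specialized dispatch described in the commit above can be sketched in plain Python. This is only an illustration of the idea (check specializations in order, compile on first use, cache, fall back to the generic graph); it is not the Dynamo implementation. The names `make_lazy_dispatch`, `compile_for`, and `toy_compile` are hypothetical, and the "compiled graphs" are ordinary Python callables.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical types: a specialization is modeled as a predicate over the call
# arguments, and a "compiled graph" is any callable.
Predicate = Callable[..., bool]
Compiled = Callable[..., Any]


def make_lazy_dispatch(
    specializations: List[Predicate],
    compile_for: Callable[[int], Compiled],
    generic: Compiled,
) -> Tuple[Callable[..., Any], Dict[int, Compiled]]:
    """Return a dispatcher plus its cache (the cache is exposed only for inspection)."""
    cache: Dict[int, Compiled] = {}

    def dispatch(*args: Any) -> Any:
        # Check each specialization in order and use the first one that matches.
        for idx, matches in enumerate(specializations):
            if matches(*args):
                if idx not in cache:
                    # Compile lazily, the first time this specialization is hit.
                    cache[idx] = compile_for(idx)
                return cache[idx](*args)
        # No specialization matched: fall back to the generic graph.
        return generic(*args)

    return dispatch, cache


compile_log: List[int] = []  # records which specializations were compiled


def toy_compile(idx: int) -> Compiled:
    """Stand-in for an expensive backend compile (an assumption for this sketch)."""
    compile_log.append(idx)
    return lambda n: f"specialized[{idx}]({n})"


dispatch, cache = make_lazy_dispatch(
    specializations=[lambda n: n == 8, lambda n: n % 2 == 0],
    compile_for=toy_compile,
    generic=lambda n: f"generic({n})",
)
```

Calling `dispatch(8)` twice compiles specialization 0 only once; an argument that matches no predicate goes straight to the generic callable without touching the cache.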
nirajkamalk	40abb2b403	Fix deprecated amp APIs in docs (#154553 ) Update usage of deprecated amp APIs. Fixes https://github.com/pytorch/tutorials/issues/3331 Pull Request resolved: https://github.com/pytorch/pytorch/pull/154553 Approved by: https://github.com/Skylion007	2025-05-29 00:05:59 +00:00
Natalia Gimelshein	3b38989b5f	Remove MemPoolContext (#154042 ) Removes MemPoolContext from custom user mempools. The ground truth for which pool should be used is the active pool in graph_pools, and MemPoolContext just introduced an opportunity for the pool pointed to by MemPoolContext and the active pool in graph_pools to go out of sync (see all the asserts in the code that guard against this, and yet it still could happen in a multithreaded scenario; see my recent PRs, #153990). Pull Request resolved: https://github.com/pytorch/pytorch/pull/154042 Approved by: https://github.com/albanD, https://github.com/syed-ahmed	2025-05-28 16:35:48 +00:00
Yuki Kobayashi	f55f2f42a7	Add missing docstring for `sym_ite` (#154201 ) `sym_ite` is listed in [the reference page](https://docs.pytorch.org/docs/stable/torch.html) but has no documentation. Pull Request resolved: https://github.com/pytorch/pytorch/pull/154201 Approved by: https://github.com/Skylion007	2025-05-26 15:59:21 +00:00
bobrenjc93	53ecb8159a	Introduce statically_known_false (#154291 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/154291 Approved by: https://github.com/mengluy0125	2025-05-24 14:23:55 +00:00
Svetlana Karslioglu	1ab2993345	Add a link to transformer_building_blocks tutorial (#154281 ) Cross-link to https://docs.pytorch.org/tutorials/intermediate/transformer_building_blocks.html Pull Request resolved: https://github.com/pytorch/pytorch/pull/154281 Approved by: https://github.com/mikaylagawarecki	2025-05-24 02:50:24 +00:00
Svetlana Karslioglu	ec368a1903	Add sitemap (#154158 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/154158 Approved by: https://github.com/albanD	2025-05-23 18:01:00 +00:00
Shangdi Yu	04a6fe7914	Update provenance tracking doc (#154062 ) Summary: Update the doc to reflect the changes in https://github.com/pytorch/pytorch/pull/153584/files#diff-e0cdb58c0f84f56f20c5433339b6d83c470dcde47847e2328effea6bedd4cd27 and https://github.com/pytorch/tlparse/pull/110 Test Plan: CI Differential Revision: D75155981 Pull Request resolved: https://github.com/pytorch/pytorch/pull/154062 Approved by: https://github.com/svekars, https://github.com/desertfire	2025-05-23 17:09:52 +00:00
Anita Katahoire	996c4d803d	Removing conda references from PyTorch Docs (#152702 ) Addresses #148339 Pull Request resolved: https://github.com/pytorch/pytorch/pull/152702 Approved by: https://github.com/svekars, https://github.com/albanD, https://github.com/atalman	2025-05-20 20:33:28 +00:00
Svetlana Karslioglu	7c9d94e9bb	Redirect mobile_optimizer.rst to executorch (#153664 ) Redirect mobile_optimizer.rst to executorch Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/153664 Approved by: https://github.com/byjlw, https://github.com/malfet	2025-05-20 18:13:45 +00:00
Mikayla Gawarecki	6383ddcfa4	Update serialization docs (#153631 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/153631 Approved by: https://github.com/albanD	2025-05-19 20:22:07 +00:00
Angela Yi	b4fb801b2d	[export] Move PT2 constants to torch::_export (#153206 ) Test Plan: `buck2 test //sigmoid/...` https://www.internalfb.com/intern/testinfra/testrun/1970325119807758 Differential Revision: D74417085 Pull Request resolved: https://github.com/pytorch/pytorch/pull/153206 Approved by: https://github.com/zhxchen17, https://github.com/dolpm	2025-05-17 08:21:59 +00:00
Anthony Shoumikhin	7d39e73c57	Fix more URLs (#153277 ) Or ignore them. Found by running the lint_urls.sh script locally with https://github.com/pytorch/pytorch/pull/153246 Pull Request resolved: https://github.com/pytorch/pytorch/pull/153277 Approved by: https://github.com/malfet	2025-05-14 16:23:50 +00:00
angelayi	d51bc27378	[export] Make draft_export public (#153219 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/153219 Approved by: https://github.com/pianpwk	2025-05-14 02:18:36 +00:00
Svetlana Karslioglu	f136046919	Clean up right nav (#153090 ) - Move community and language binding links to the horizontal bar - Add an intro to the community page. - Fix the link in the ogp_image - Fix the link in the version switcher - Clean up unneeded links Pull Request resolved: https://github.com/pytorch/pytorch/pull/153090 Approved by: https://github.com/albanD	2025-05-12 21:00:45 +00:00
PyTorch MergeBot	fdc387ec7c	Revert "refine fp32 precision api (#125888 )" This reverts commit `4c11b26158`. Reverted https://github.com/pytorch/pytorch/pull/125888 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it seems to cause some failures on ROCm ([comment](https://github.com/pytorch/pytorch/pull/125888#issuecomment-2869274791))	2025-05-11 00:35:46 +00:00
haozhe.zhu	4c11b26158	refine fp32 precision api (#125888 ) Based on the [conversation](https://github.com/pytorch/pytorch/issues/121791), we plan to drop "highest, high, medium" as the way to represent fp32 internal computation data types. Instead, we will directly use the algorithm name to represent it.
### Design Choice: Directly use algorithm names like "TF32", "BF16".
#### Pros
- The names are more informative. 'tf32' is more informative than a simple "high".
- It is easier to extend to new algorithms like `tf32x3`.
#### Cons
- "HIGHEST, HIGH, MEDIUM" indicated the relative precision between different algorithms. However, we can add more documentation to discuss this.
### We provide a layered structure for backends/operators.
('f32' is short for 'fp32_precision')
![image](https://github.com/user-attachments/assets/f89143e5-d6a1-4865-9351-9a50439f5067)
### We provide the following fp32 compute precision settings:
- "ieee": Not allowed to use any other internal computation data types.
- "tf32": Allowed to use tf32 as the internal computation data type.
- "bf16": Allowed to use bf16 as the internal computation data type.
- "none": Precision is not set. Can be overridden by its parent node.
### Overriding Precision Settings
A child node can be overridden by its parent node if it is set to the default.
For current default settings:
```
backend = generic, op = all, precision setting = none
backend = cuda, op = all, precision setting = none
backend = cuda, op = conv, precision setting = tf32
backend = cuda, op = rnn, precision setting = tf32
backend = cuda, op = matmul, precision setting = none
backend = matmul, op = all, precision setting = none
backend = matmul, op = conv, precision setting = none
backend = matmul, op = rnn, precision setting = none
backend = matmul, op = matmul, precision setting = none
```
- If the user sets `torch.backends.mkldnn.fp32_precision="bf16"`, its child nodes `torch.backends.mkldnn.matmul.fp32_precision` / `torch.backends.mkldnn.conv.fp32_precision` / `torch.backends.mkldnn.rnn.fp32_precision` will also be overridden to "bf16".
- If the user sets `torch.backends.fp32_precision="bf16"`, `torch.backends.mkldnn.fp32_precision` and its child nodes will also be overridden to "bf16".
### Backward Compatibility
Since the new API allows users finer-grained control, there will be some conflicts. For example, the previous `torch.backends.cudnn.allow_tf32` is not enough to represent the state `torch.backends.cudnn.rnn.fp32_precision="ieee"` and `torch.backends.cudnn.conv.fp32_precision="tf32"`. Therefore, our goals for backward compatibility are:
- If the user only uses the previous APIs, they will work as before.
- If the user uses the new API to change the state to one that the old API cannot represent, and then tries to access the state through the old API, we will raise a RuntimeError and point the user to the documentation.
### Test Plan
```
python test/test_cuda.py -k test_fp32_precision_with_tf32
python test/test_cuda.py -k test_fp32_precision_with_float32_matmul_precision
python test/test_cuda.py -k test_invalid_status_for_legacy_api
python test/test_mkldnn.py -k test_mlkdnn_get_set
python test/test_mkldnn.py -k test_generic_precision
python test/test_mkldnn.py -k test_invalid
python test/test_mkldnn.py -k test_default_use_parent
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125888 Approved by: https://github.com/jgong5, https://github.com/albanD Co-authored-by: Jiang, Yanbing <yanbing.jiang@intel.com>	2025-05-10 11:13:04 +00:00
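The override rule in the commit above (a node left at "none" takes its setting from its parent) can be illustrated with a small lookup over a toy settings tree. This is a sketch under stated assumptions, not PyTorch's implementation: the `PARENT` table and `effective_precision` are hypothetical names, the node names mirror the commit's mkldnn example, and the sketch assumes an explicitly set child keeps its own value.

```python
from typing import Dict, Optional, Tuple

Node = Tuple[str, str]  # (backend, op)

# Toy hierarchy (an assumption): generic -> mkldnn -> mkldnn ops.
PARENT: Dict[Node, Optional[Node]] = {
    ("generic", "all"): None,
    ("mkldnn", "all"): ("generic", "all"),
    ("mkldnn", "matmul"): ("mkldnn", "all"),
    ("mkldnn", "conv"): ("mkldnn", "all"),
    ("mkldnn", "rnn"): ("mkldnn", "all"),
}


def effective_precision(settings: Dict[Node, str], node: Node) -> str:
    """Walk up from `node` until a setting other than "none" is found."""
    current: Optional[Node] = node
    while current is not None:
        value = settings.get(current, "none")
        if value != "none":
            return value
        current = PARENT[current]
    return "none"  # nothing is set anywhere on the path to the root


# Setting the backend-level node is visible from its unset children.
backend_only = {("mkldnn", "all"): "bf16"}
matmul_precision = effective_precision(backend_only, ("mkldnn", "matmul"))

# A root-level setting reaches unset nodes; an explicitly set child keeps its value.
mixed = {("generic", "all"): "bf16", ("mkldnn", "conv"): "ieee"}
rnn_precision = effective_precision(mixed, ("mkldnn", "rnn"))
conv_precision = effective_precision(mixed, ("mkldnn", "conv"))
```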
soulitzer	9d00f2b375	[autograd][docs] Add more details on why save_for_backward is important in extending autograd note (#153005 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/153005 Approved by: https://github.com/albanD	2025-05-09 16:36:57 +00:00
Shangdi Yu	faff387bfd	Mini tutorial for provenance tracking (#152211 ) as title Pull Request resolved: https://github.com/pytorch/pytorch/pull/152211 Approved by: https://github.com/svekars, https://github.com/eellison, https://github.com/desertfire	2025-05-09 01:41:04 +00:00
Wei Feng	5a8c9c3ab0	[FSDP2][Doc] add pointer to torchtitan (#153079 ) <img width="838" alt="Screenshot 2025-05-08 at 10 51 05 AM" src="https://github.com/user-attachments/assets/4cf43a16-3801-424b-a74f-ede1d41ff052" /> Pull Request resolved: https://github.com/pytorch/pytorch/pull/153079 Approved by: https://github.com/mori360	2025-05-08 22:22:07 +00:00
Yuxin Wu	2cf7fd0d2b	Update docs of saved_tensors_hooks to avoid ref cycle (#153049 ) Fixes #115255 Pull Request resolved: https://github.com/pytorch/pytorch/pull/153049 Approved by: https://github.com/Skylion007, https://github.com/soulitzer	2025-05-07 18:54:56 +00:00
angelayi	60ecc560af	[export] Add draft-export docs (#152637 ) Sample page: https://docs-preview.pytorch.org/pytorch/pytorch/152637/draft_export.html Pull Request resolved: https://github.com/pytorch/pytorch/pull/152637 Approved by: https://github.com/zou3519, https://github.com/svekars	2025-05-07 01:12:45 +00:00
Ti-Tai Wang	5fa5017479	[ONNX] Suggest users setting dynamo=True when exporting (#152478 ) Fixes #152025 Pull Request resolved: https://github.com/pytorch/pytorch/pull/152478 Approved by: https://github.com/justinchuby	2025-05-06 23:18:11 +00:00
Laith Sakka	376529c78b	consolidate guard_or_x and definitely_x (#152463 ) definitely_true is almost the same as guard_or_false; the potential differences are not meaningful enough to justify the existence of both. The same holds for definitely_false: it can be expressed with guard_or_true and guard_or_false. Pull Request resolved: https://github.com/pytorch/pytorch/pull/152463 Approved by: https://github.com/bobrenjc93	2025-05-02 18:08:11 +00:00
Huy Do	3f10091d3c	Clean up conda usage in benchmark scripts (#152552 ) Fixes https://github.com/pytorch/pytorch/issues/152123. * Switch `benchmarks/dynamo/Makefile` to use uv. Note that these scripts are only used locally, so it's kind of ok to keep conda here IMO. But switching to uv is probably nicer to most folks. * Delete some files that are outdated and not used anymore Pull Request resolved: https://github.com/pytorch/pytorch/pull/152552 Approved by: https://github.com/atalman, https://github.com/albanD	2025-04-30 21:27:29 +00:00
Svetlana Karslioglu	e58c73be44	Add latex settings (#152350 ) - Fixes #147027 - Only lualatex can build our 3K pages PDF with reasonable quality, xelatex runs out of memory and pdflatex just fails. - Move notes under the same toctree as python-api which is needed for the PDF but doesn't change how the HTML is generated. This is the produced PDF: [pytorch.pdf](https://github.com/user-attachments/files/19945450/pytorch.pdf) Pull Request resolved: https://github.com/pytorch/pytorch/pull/152350 Approved by: https://github.com/albanD	2025-04-29 19:28:43 +00:00
Zizeng Meng	861945100e	[Kineto] Enable OOM observer (#152160 ) Summary: # Context: When a memory leak happens, it usually triggers the OOM in later iterations. The snapshot of a full iteration will be huge and hard to interpret. On the CUDA side, there is an OOM observer which generates a snapshot when OOM happens, with the latest 1,500,000 entries for debugging. In this diff, we want to implement the feature on the MTIA side. Test Plan: Run this test with the last diff in the stack. ``` buck run @//mode/opt kineto/libkineto/fb/mtia/integration_tests:mtia_memory_auto_trace_test ``` As shown, the memory_snapshot is generated when OOM happens. Log: P1794792326 Snapshot: https://fburl.com/pytorch_memory_visualizer/lx73y6s3 {F1977402355} Differential Revision: D71993315 Pull Request resolved: https://github.com/pytorch/pytorch/pull/152160 Approved by: https://github.com/sraikund16	2025-04-27 15:56:44 +00:00
Anthony Shoumikhin	e2f9759bd0	Fix broken URLs (#152237 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/152237 Approved by: https://github.com/huydhn, https://github.com/malfet	2025-04-27 09:56:42 +00:00
Dan Johnson	d22c4cc353	Add option to use mempool on OOM (#151487 ) MemPool is a separate pool of memory handled by the caching allocator. This PR adds the option to let the caching allocator try to use this pool as a last resort instead of OOMing, by associating a use_on_oom bool with each MemPool.
Usage: Users can optionally specify a ``use_on_oom`` bool (which is False by default) during MemPool creation. If true, then the CUDACachingAllocator will be able to use memory in this pool as a last resort instead of OOMing.
```
pool = torch.cuda.MemPool(allocator, use_on_oom=True)
with torch.cuda.use_mem_pool(pool):
    a = torch.randn(40 * 1024 * 1024, dtype=torch.uint8, device="cuda")
del a
# at the memory limit, this will succeed by using pool's memory in order to avoid the oom
b = torch.randn(40 * 1024 * 1024, dtype=torch.uint8, device="cuda")
```
Testing:
```
python test/test_cuda.py -k test_mempool_limited_memory_with_allocator
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151487 Approved by: https://github.com/eqy, https://github.com/syed-ahmed, https://github.com/ngimel	2025-04-26 04:04:57 +00:00
Yu, Guangye	33c75cae0a	Add torch.accelerator.device_index as accelerator's device switch context (#148864 )
# Motivation
We propose adding support for the Python with statement on `torch.accelerator.device_index` to enable device switching functionality. This enhancement would simplify writing device-agnostic code and provide benefits across all accelerators. Its device-specific counterparts include [`torch.cuda.device`](`00199acdb8/torch/cuda/__init__.py (L482)`) and [`torch.cuda._DeviceGuard`](`00199acdb8/torch/cuda/__init__.py (L469)`).
# Design Philosophy
It accepts either an `Int` or `None` as input. When `None` is passed, no device switch is performed. Supporting `None` is important for compatibility, as it's possible to encounter `None` values from `torch.device.index`. With this PR, we can write:
```python
import torch

src = 0
dst = 1
# Set src to current device
torch.accelerator.set_device_index(src)
with torch.accelerator.device_index(dst):
    # Inside with statement, we set dst to current device
    assert torch.accelerator.get_device_index() == dst
# Here the current device should be src
assert torch.accelerator.get_device_index() == src
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148864 Approved by: https://github.com/albanD	2025-04-25 09:45:25 +00:00
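The behaviour described in the commit above (switch on entry, restore on exit, do nothing for `None`) can be sketched as a plain-Python context manager. This is an illustration only and does not call into torch: the module-level `_current_index` stands in for the accelerator's current device, and the three function names are local stand-ins chosen to mirror the commit's API, not the real implementations.

```python
from contextlib import contextmanager
from typing import Iterator, Optional

_current_index = 0  # stand-in for the accelerator's current device index


def get_device_index() -> int:
    return _current_index


def set_device_index(index: int) -> None:
    global _current_index
    _current_index = index


@contextmanager
def device_index(index: Optional[int]) -> Iterator[None]:
    """Temporarily switch the current device; `None` means no switch."""
    if index is None:
        # Nothing to switch and nothing to restore.
        yield
        return
    previous = get_device_index()
    set_device_index(index)
    try:
        yield
    finally:
        # Restore the previous device even if the body raised.
        set_device_index(previous)


set_device_index(0)
with device_index(1):
    inside = get_device_index()
after = get_device_index()
```

Restoring in a `finally` block keeps the current device consistent when the body raises, which is the usual reason to prefer a context manager over paired set/reset calls.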
Jane Xu	8a9c66bb70	Improve stable library apis per Scott's feedback (#152040 ) Following 3 suggestions: 1. inline at::Tensor arg 2. use uniq ptr of array vs std::vector 3. document the `std::optional<S>()` case Pull Request resolved: https://github.com/pytorch/pytorch/pull/152040 Approved by: https://github.com/swolchok, https://github.com/albanD	2025-04-24 20:51:03 +00:00
ILCSFNO	bd09d87fdb	add Out Notes (#151306 ) Fixes #150181 @albanD Could you please have a check? Build locally without pytorch build: ![Developer-FAQ](https://github.com/user-attachments/assets/351a7e0b-588e-48ae-ad0a-03f427c86e89) Pull Request resolved: https://github.com/pytorch/pytorch/pull/151306 Approved by: https://github.com/albanD	2025-04-24 20:25:09 +00:00
Pian Pawakapan	2ee8de54b1	[dynamic shapes] user-code friendly statically_known_true, has_static_value (#151601 ) Fixes #151480 Allows `statically_known_true` in user code, as well as introducing `has_static_value`, returning True if the input has a static bool/float/int value Pull Request resolved: https://github.com/pytorch/pytorch/pull/151601 Approved by: https://github.com/laithsakka, https://github.com/zou3519, https://github.com/jingsh	2025-04-24 02:53:59 +00:00
Kaiyu Shi	f39a1a43ee	Fix typos in meta.rst (#151979 ) ### Fixes made: - "allow you to the module" → corrected to "allows you to move the module" - "allow" → changed to "allows" to agree with the singular subject "method" Pull Request resolved: https://github.com/pytorch/pytorch/pull/151979 Approved by: https://github.com/colesbury	2025-04-24 01:25:09 +00:00
Syed Tousif Ahmed	334aab0dea	Updates NCCLConfig with QOS variable (#151821 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/151821 Approved by: https://github.com/kwen2501	2025-04-23 00:03:49 +00:00
Scott Wolchok	2f74cffab2	Remove `reinterpret_cast`s with undefined behavior from stable/library.h (#151595 ) There is a list of valid uses of `reinterpret_cast` (see https://en.cppreference.com/w/cpp/language/reinterpret_cast), and the use here was not on the list, hence undefined behavior. Implement what we meant using memcpy, which is well-defined. Differential Revision: [D73200791](https://our.internmc.facebook.com/intern/diff/D73200791/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/151595 Approved by: https://github.com/janeyx99	2025-04-22 20:24:47 +00:00
Svetlana Karslioglu	2fb1326483	Add dates to pages (#151602 ) re: #150873 Pull Request resolved: https://github.com/pytorch/pytorch/pull/151602 Approved by: https://github.com/albanD	2025-04-21 19:53:55 +00:00
Will Constable	bedefa46a9	Document non-pytorch CUDA memory allocation and how to query it (#150880 ) This PR documents the fact that PyTorch does not have visibility into how every CUDA memory allocation happened - it only knows about allocations that went through the pytorch CUDA allocator. It also adds a code snippet showing how to use pynvml to query current GPU memory usage. ## Preview Added a note at the top of "Understanding CUDA Memory Usage" doc: <img width="732" alt="image" src="https://github.com/user-attachments/assets/69e28d2a-841a-4b1b-b886-e96fb5d76582" /> which links to a section below: <img width="733" alt="image" src="https://github.com/user-attachments/assets/cab4f252-9ac2-4fc6-a45d-fdb958fc7dbc" /> Pull Request resolved: https://github.com/pytorch/pytorch/pull/150880 Approved by: https://github.com/kwen2501, https://github.com/ngimel	2025-04-18 03:48:54 +00:00
Kashif Rasul	2ed2cb5805	add generalized pareto distribution (GPD) (#135968 ) Add the GPD as a distribution class Pull Request resolved: https://github.com/pytorch/pytorch/pull/135968 Approved by: https://github.com/albanD Co-authored-by: Alexander März <statmixedmlgit@gmail.com>	2025-04-17 18:51:02 +00:00
Svetlana Karslioglu	cd7bc60e11	Migrate to new theme (#149331 ) - Migrate pytorch docs, cpp docs and functorch docs to the pytorch_sphinx_theme2 - Migrate index.rst to markdown and restructure to use high-level horizontal bar sections Python API, Developer Notes - Added python-api.md which becomes the main container for the API docs. This file will be used to add all api references in the toctree. It would be great to have lint for this file: https://github.com/pytorch/pytorch/issues/150718 - Enabled mermaid sphinx extension and opengraph sphinx extension Pull Request resolved: https://github.com/pytorch/pytorch/pull/149331 Approved by: https://github.com/malfet, https://github.com/atalman, https://github.com/albanD	2025-04-16 21:35:19 +00:00
Pian Pawakapan	6dddd6520d	[dynamic shapes] add sym_and, sym_or (#150456 ) This has been pretty helpful for the size-oblivious rewrite. Wanted the variadic args version to avoid `sym_or(a, sym_or(b, sym_or(c, d)))` in favor of `sym_or(a, b, c, d)`. Happy to change this to ban the 1-arg version. This is better than plain and/or because the whole symbolic expression gets preserved, and if we guard on it or defer as a runtime assert, we preserve all branches. Pull Request resolved: https://github.com/pytorch/pytorch/pull/150456 Approved by: https://github.com/laithsakka	2025-04-14 18:18:06 +00:00
fzyzcjy	50abc1ecc4	Super tiny fix typo (#151212 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/151212 Approved by: https://github.com/Skylion007	2025-04-14 16:47:40 +00:00

2981 Commits