pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 00:21:07 +01:00

Author	SHA1	Message	Date
Richard Barnes	ed327876f5	[codemod] `c10::optional` -> `std::optional` (#126135) Generated by running the following from PyTorch root: ``` find . -regex ".*\.\(cpp\|h\|cu\|hpp\|cc\|cxx\)$" | grep -v "build/" | xargs -n 50 -P 4 perl -pi -e 's/c10::optional/std::optional/' ``` `c10::optional` is just an alias for `std::optional`. This removes usages of that alias in preparation for eliminating it entirely. Pull Request resolved: https://github.com/pytorch/pytorch/pull/126135 Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/albanD, https://github.com/aaronenyeshi	2024-05-14 19:35:51 +00:00
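The shell pipeline in the codemod commit can be re-expressed in plain Python to show exactly which files it touches. This is a minimal sketch, not the tooling used for the PR. The function name `rewrite_optional` is hypothetical. The sketch runs serially rather than in parallel like `xargs -P 4`, and it approximates `grep -v "build/"` by testing each path relative to the root.

```python
from pathlib import Path

# File extensions accepted by the `find -regex` filter in the commit message.
EXTENSIONS = {".cpp", ".h", ".cu", ".hpp", ".cc", ".cxx"}
OLD, NEW = b"c10::optional", b"std::optional"


def rewrite_optional(root: str) -> list[Path]:
    """Hypothetical helper: rewrite c10::optional -> std::optional under root.

    Returns the files that were modified, in sorted order.
    """
    root_path = Path(root)
    changed = []
    for path in sorted(root_path.rglob("*")):
        if not path.is_file() or path.suffix not in EXTENSIONS:
            continue
        # Approximates `grep -v "build/"`: skip any path containing "build/".
        if "build/" in path.relative_to(root_path).as_posix():
            continue
        data = path.read_bytes()
        # Like `perl -p` with s/// and no /g flag: first match per line only.
        lines = data.split(b"\n")
        new_data = b"\n".join(line.replace(OLD, NEW, 1) for line in lines)
        if new_data != data:
            path.write_bytes(new_data)
            changed.append(path)
    return changed
```

Because `s/c10::optional/std::optional/` has no `/g` flag, Perl rewrites only the first match on each line. The sketch mirrors that with `replace(OLD, NEW, 1)`, so under this command a line containing two `c10::optional` uses would keep the second one.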
Frank Lin	249e65b92d	Graph-Safe RNG State Exchange for Tensor Parallelism (#114068) See #113541. The PR allows for registering and controlling multiple RNG states using indices, ensuring cudagraph-safe operations, and includes both C++ and Python API changes to support this functionality. cc @eellison @anijain2305 @jansel @ezyang @ptrblck @csarofeen @mcarilli Pull Request resolved: https://github.com/pytorch/pytorch/pull/114068 Approved by: https://github.com/ezyang, https://github.com/eqy, https://github.com/xuzhao9	2024-03-27 01:14:38 +00:00
PyTorch MergeBot	4dc09d6aa4	Revert "Graph-Safe RNG State Exchange for Tensor Parallelism (#114068)" This reverts commit `e9dcda5cba`. Reverted https://github.com/pytorch/pytorch/pull/114068 on behalf of https://github.com/ezyang due to memory leak in another ci ([comment](https://github.com/pytorch/pytorch/pull/114068#issuecomment-2018044527))	2024-03-25 13:49:04 +00:00
Frank Lin	e9dcda5cba	Graph-Safe RNG State Exchange for Tensor Parallelism (#114068) See #113541. The PR allows for registering and controlling multiple RNG states using indices, ensuring cudagraph-safe operations, and includes both C++ and Python API changes to support this functionality. cc @eellison @anijain2305 @jansel @ezyang @ptrblck @csarofeen @mcarilli Pull Request resolved: https://github.com/pytorch/pytorch/pull/114068 Approved by: https://github.com/ezyang	2024-03-21 01:57:08 +00:00
Elias Ellison	0a9778a372	Expose cudaStreamCaptureMode in CUDA Graphs, use local setting in inductor (#107407) > capture_error_mode (str, optional): specifies the cudaStreamCaptureMode for the graph capture stream. Can be "global", "thread_local" or "relaxed". During cuda graph capture, some actions, such as cudaMalloc, may be unsafe. "global" will error on actions in other threads, "thread_local" will only error for actions in the current thread, and "relaxed" will not error on these actions. Inductor codegen is single-threaded, so it should be safe to enable "thread_local" for inductor's cuda graph capturing. We have seen errors when inductor cudagraphs has been used concurrently with data preprocessing in other threads. Differential Revision: [D48656014](https://our.internmc.facebook.com/intern/diff/D48656014) Pull Request resolved: https://github.com/pytorch/pytorch/pull/107407 Approved by: https://github.com/albanD, https://github.com/eqy	2023-08-25 01:44:26 +00:00
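The choice of capture mode matters in exactly the situation the commit describes: inductor capturing on one thread while a data-preprocessing thread allocates memory. The sketch below is a simplified pure-Python model of that rule. It is not PyTorch or CUDA code, and the names `Capture` and `unsafe_call_errors` are hypothetical. It also assumes every calling thread keeps CUDA's default (global) per-thread interaction mode, which `cudaThreadExchangeStreamCaptureMode` can change.

```python
from dataclasses import dataclass
from typing import Iterable

# Values accepted by capture_error_mode, per the commit message.
MODES = ("global", "thread_local", "relaxed")


@dataclass(frozen=True)
class Capture:
    """An in-progress graph capture (hypothetical model type)."""
    thread: str  # name of the thread that began the capture
    mode: str    # capture_error_mode used when the capture began


def unsafe_call_errors(calling_thread: str, captures: Iterable[Capture]) -> bool:
    """Return True if an unsafe call (e.g. cudaMalloc) made on `calling_thread`
    would error, assuming the calling thread uses the default (global)
    per-thread interaction mode."""
    for cap in captures:
        if cap.mode not in MODES:
            raise ValueError(f"unknown capture mode: {cap.mode!r}")
        if cap.thread == calling_thread and cap.mode != "relaxed":
            # The thread's own capture forbids unsafe calls unless relaxed.
            return True
        if cap.thread != calling_thread and cap.mode == "global":
            # Only "global" captures police actions made in other threads.
            return True
    return False
```

Under this model, moving inductor's capture from "global" to "thread_local" lets the preprocessing thread keep allocating. Unsafe calls made on the capturing thread itself still error, which is acceptable because inductor codegen is single-threaded.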
albanD	d8aa68c683	make sure that our error handling runs with the GIL enabled (#92848) Fixes https://github.com/pytorch/pytorch/issues/92684. I checked the other use cases of this API, and none of them release the GIL. Pull Request resolved: https://github.com/pytorch/pytorch/pull/92848 Approved by: https://github.com/ngimel	2023-01-24 09:30:42 +00:00
eqy	62e450d55f	[CUDA Graphs] Add option to dump a captured graph for debugging (#85519) CC @xwang233 @ptrblck @ngimel Pull Request resolved: https://github.com/pytorch/pytorch/pull/85519 Approved by: https://github.com/ngimel	2022-12-06 22:03:05 +00:00
Peter Bell	3d79ced8cf	wrap_pybind_function: support member function pointers (#88932) This updates `wrap_pybind_function` to use `invoke` and adds the `invoke_traits` object, which is analogous to `function_traits` but, for member functions, includes the class as an explicit argument. To test that this works properly, I've also applied it to the `CUDAGraph` binding code. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88932 Approved by: https://github.com/albanD	2022-11-14 18:47:34 +00:00
Michael Suo	30fb2c4aba	[lint] autoformat test/cpp and torch/csrc Let's have some fun. Pull Request resolved: https://github.com/pytorch/pytorch/pull/78828 Approved by: https://github.com/ezyang	2022-06-11 21:11:16 +00:00
Michael Carilli	8d08b103be	[CUDA graphs] Prototype API and documentation (#63269) Summary: RFC: https://github.com/pytorch/pytorch/issues/61880 Pull Request resolved: https://github.com/pytorch/pytorch/pull/63269 Reviewed By: mruberry Differential Revision: D30596643 Pulled By: ngimel fbshipit-source-id: b1f8061406364b667e2c2d4d30fbce1f0d8456be	2021-08-31 13:34:23 -07:00
Michael Carilli	b27e678dfb	[RELAND] [CUDA graphs] Private mempools for CUDA graphs (#54038) Summary: Resubmit of https://github.com/pytorch/pytorch/pull/51436. Apparently some non-public Windows builds run CUDA tests on the default stream, so I changed a few capture tests to manually ensure all captures happen on non-default streams. Pull Request resolved: https://github.com/pytorch/pytorch/pull/54038 Reviewed By: mruberry Differential Revision: D27068649 Pulled By: ngimel fbshipit-source-id: 4284475fa40ee38c0f8faff05a2faa310cf8a207	2021-03-16 12:13:33 -07:00
Natalia Gimelshein	76129c7cdf	Revert D26993790: [pytorch][PR] [CUDA graphs] Private mempools for CUDA graphs Test Plan: revert-hammer Differential Revision: D26993790 (`90dfdef226`) Original commit changeset: a992eaee1b8c fbshipit-source-id: 6ddb4aedd6154d7d89847aa5a34181158d06a309	2021-03-12 13:07:28 -08:00
Michael Carilli	90dfdef226	[CUDA graphs] Private mempools for CUDA graphs (#51436) Summary: Implements https://github.com/pytorch/pytorch/issues/51075#issuecomment-768884685 and additions discussed offline with ezyang and ngimel. (Calling it "simple" is charitable but it's not too bad). [High level strategy](https://github.com/pytorch/pytorch/pull/51436/files#diff-acc6337586bf9cdcf0a684380779300ec171897d05b8569bf439820dc8c93bd5R57-R82) The current design aggregates stats from private pools with the ordinary pools, which may or may not be what we want. Instead of adding PrivatePools as an internal feature of DeviceAllocator, I could inherit from DeviceAllocator (e.g. `DevicePrivateAllocator : public DeviceAllocator`) and create separate per-graph instances of the inherited class. I'm not sure if that would be better. Graph bindings in Python are almost unchanged from https://github.com/pytorch/pytorch/pull/48875:
```python
# Same bindings as 48875, but now implicitly grabs a private mempool
graph1.capture_begin()
graph1.capture_end()
# pool=... is new. It hints that allocations during graph2's capture may share graph1's mempool
graph2.capture_begin(pool=graph1.pool())
graph2.capture_end()
# graph3 also implicitly creates its own mempool
graph3.capture_begin()
graph3.capture_end()
```
Test plan (other suggestions appreciated):
- [x] Stop maintaining manual references for all the tensors in my existing graphs+RNG tests. If private pools somehow give bad allocations, they should start failing intermittently. They run eager ops and eager allocations mixed with graph replays, so they may expose if eager ops and replays corrupt each other.
- [x] `test_graph_two_successive`: Capture successive graphs, with the second graph using the first graph's result. Try with and without sharing a pool. Check results, also check memory stats to confirm sharing a pool saves memory.
- [x] `test_graph_concurrent_replay`: Capture some graphs in separate private pools, replay them concurrently in different streams, check the results to make sure they don't corrupt each other's memory. Capture some graphs with a shared pool, replay them concurrently in different streams, check results, confirm they DO corrupt each other's memory.
- [x] `test_graph_three_successive`: A three-graph case, checking the safe and unsafe replay patterns in [Restrictions of the Strawman API](https://github.com/pytorch/pytorch/issues/51075).
- [x] `test_graph_memory_stats_and_use_result_after_destroy_graph`: Comprehensively check torch.cuda.memory_stats() changes that result from graph capture and delete. Check that a tensor ref created during capture and held after graph delete stays valid until the tensor itself is deleted.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51436 Reviewed By: mruberry Differential Revision: D26993790 Pulled By: ngimel fbshipit-source-id: a992eaee1b8c23628e7b388a5a3c26e0f80e54da	2021-03-12 11:07:47 -08:00
Michael Carilli	c068180a17	[CUDA graphs] Cuda RNG-safe graph capture and replay bindings (#48875) Summary: Part 2 of https://github.com/pytorch/pytorch/pull/46148 refactor (part 1 was https://github.com/pytorch/pytorch/pull/48694). Contains: a few more CUDAGeneratorImpl diffs to clean up graph capture interaction; capture and replay bindings that interact correctly with CUDAGeneratorImpl; tests. Diffs compile and tests pass on my machine (Ubuntu 20.04, CUDA 11.0), but it needs fine-tuning for many CI builds. See [Note [CUDA Graph-safe RNG states]](`02d89f9f1d/aten/src/ATen/CUDAGeneratorImpl.h (L13-L85)`) for the strategy, based on https://github.com/pytorch/pytorch/pull/46148#issuecomment-724414794. Pull Request resolved: https://github.com/pytorch/pytorch/pull/48875 Reviewed By: zou3519 Differential Revision: D25482654 Pulled By: ngimel fbshipit-source-id: 634dbc4c6c9d7d0d9a62dc81a52d430561f905fe	2020-12-14 10:51:58 -08:00

14 Commits