pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
dllehr-amd	98012e4a59	[ROCm] hipGraph support for pytorch mainline (#88202) With the release of ROCm 5.3, HIP now supports a hipGraph implementation. All necessary backend work and hipification is done to support the same functionality as cudaGraph. Unit tests are modified to support a new TEST_GRAPH feature, which allows us to create a single check for graph support instead of attempting to gather the CUDA level in annotations for every graph test. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88202 Approved by: https://github.com/jithunnair-amd, https://github.com/pruthvistony, https://github.com/malfet	2023-02-14 22:18:56 +00:00
Eddie Yan	bac33ea8b6	[CUDA] Drop CUDA 10 support (#89582) CC @ptrblck @ngimel @malfet Pull Request resolved: https://github.com/pytorch/pytorch/pull/89582 Approved by: https://github.com/malfet, https://github.com/ngimel	2023-01-05 05:11:53 +00:00
Pruthvi Madugundu	085e2f7bdd	[ROCm] Changes not to rely on CUDA_VERSION or HIP_VERSION (#65610) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65610 - Replace HIP_PLATFORM_HCC with USE_ROCM. - Don't rely on CUDA_VERSION or HIP_VERSION; use USE_ROCM and ROCM_VERSION instead. - The next PR will remove the mapping from CUDA_VERSION to HIP_VERSION and from CUDA to HIP in hipify. - HIP_PLATFORM_HCC is deprecated, so HIP_PLATFORM_AMD will be added to support HIP host code compilation on gcc. cc jeffdaily sunway513 jithunnair-amd ROCmSupport amathews-amd Reviewed By: jbschlosser Differential Revision: D30909053 Pulled By: ezyang fbshipit-source-id: 224a966ebf1aaec79beccbbd686fdf3d49267e06	2021-09-29 09:55:43 -07:00
Scott Wolchok	44cc873fba	[PyTorch] Autoformat c10 (#56830) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56830 Opt into formatting on GitHub and format everything. This is a trial run before turning on formatting for more and eventually all of the codebase. Test Plan: CI Reviewed By: zertosh Differential Revision: D27979080 fbshipit-source-id: a80f0c48691c08ae8ca0af06377b87e6a2351151	2021-04-30 21:23:28 -07:00
Michael Carilli	b27e678dfb	[RELAND] [CUDA graphs] Private mempools for CUDA graphs (#54038) Summary: Resubmit of https://github.com/pytorch/pytorch/pull/51436. Apparently some non-public Windows builds run CUDA tests on the default stream, so I changed a few capture tests to manually ensure all captures happen on non-default streams. Pull Request resolved: https://github.com/pytorch/pytorch/pull/54038 Reviewed By: mruberry Differential Revision: D27068649 Pulled By: ngimel fbshipit-source-id: 4284475fa40ee38c0f8faff05a2faa310cf8a207	2021-03-16 12:13:33 -07:00
Natalia Gimelshein	76129c7cdf	Revert D26993790: [pytorch][PR] [CUDA graphs] Private mempools for CUDA graphs Test Plan: revert-hammer Differential Revision: D26993790 (`90dfdef226`) Original commit changeset: a992eaee1b8c fbshipit-source-id: 6ddb4aedd6154d7d89847aa5a34181158d06a309	2021-03-12 13:07:28 -08:00
Michael Carilli	90dfdef226	[CUDA graphs] Private mempools for CUDA graphs (#51436)

Summary: Implements https://github.com/pytorch/pytorch/issues/51075#issuecomment-768884685 and additions discussed offline with ezyang and ngimel. (Calling it "simple" is charitable but it's not too bad).

[High level strategy](https://github.com/pytorch/pytorch/pull/51436/files#diff-acc6337586bf9cdcf0a684380779300ec171897d05b8569bf439820dc8c93bd5R57-R82)

The current design aggregates stats from private pools with the ordinary pools, which may or may not be what we want.

Instead of adding PrivatePools as an internal feature of DeviceAllocator, I could inherit from DeviceAllocator (e.g. `DevicePrivateAllocator : public DeviceAllocator`) and create separate per-graph instances of the inherited class. I'm not sure if that would be better.

Graph bindings in Python are almost unchanged from https://github.com/pytorch/pytorch/pull/48875:

```python
# Same bindings as 48875, but now implicitly grabs a private mempool
graph1.capture_begin()
graph1.capture_end()

# pool=... is new. It hints that allocations during graph2's capture may share graph1's mempool
graph2.capture_begin(pool=graph1.pool())
graph2.capture_end()

# graph3 also implicitly creates its own mempool
graph3.capture_begin()
graph3.capture_end()
```

Test plan (other suggestions appreciated):
- [x] Stop maintaining manual references for all the tensors in my existing graphs+RNG tests. If private pools somehow give bad allocations, they should start failing intermittently. They run eager ops and eager allocations mixed with graph replays, so they may expose if eager ops and replays corrupt each other.
- [x] `test_graph_two_successive`: Capture successive graphs, with the second graph using the first graph's result. Try with and without sharing a pool. Check results, also check memory stats to confirm sharing a pool saves memory.
- [x] `test_graph_concurrent_replay`: Capture some graphs in separate private pools, replay them concurrently in different streams, check the results to make sure they don't corrupt each other's memory. Capture some graphs with a shared pool, replay them concurrently in different streams, check results, confirm they DO corrupt each other's memory.
- [x] `test_graph_three_successive`: A three-graph case, checking the safe and unsafe replay patterns in [Restrictions of the Strawman API](https://github.com/pytorch/pytorch/issues/51075).
- [x] `test_graph_memory_stats_and_use_result_after_destroy_graph`: Comprehensively check torch.cuda.memory_stats() changes that result from graph capture and delete. Check that a tensor ref created during capture and held after graph delete stays valid until the tensor itself is deleted.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51436 Reviewed By: mruberry Differential Revision: D26993790 Pulled By: ngimel fbshipit-source-id: a992eaee1b8c23628e7b388a5a3c26e0f80e54da	2021-03-12 11:07:47 -08:00
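
The pool-sharing pattern described in the private mempools commit (a second graph that consumes the first graph's result and is replayed in capture order) can be sketched roughly as follows. This is a minimal illustration, not code from the PR: it assumes the later public API (`torch.cuda.CUDAGraph`, `torch.cuda.graph`, `CUDAGraph.pool()`, `CUDAGraph.replay()`) rather than the bindings that existed at the time of the commit, and the function name `capture_two_graphs_sharing_pool` is hypothetical. It returns `None` when PyTorch or a CUDA device is not available.

```python
import importlib.util


def capture_two_graphs_sharing_pool():
    """Hypothetical helper: capture two successive graphs that share one mempool.

    Returns the second graph's output as a Python list, or None if PyTorch
    or a CUDA device is unavailable.
    """
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    if not torch.cuda.is_available():
        return None

    static_in = torch.ones(4, device="cuda")

    # Warm up the ops on a side stream before capturing them.
    side = torch.cuda.Stream()
    side.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(side):
        _ = (static_in * 2) + 1
    torch.cuda.current_stream().wait_stream(side)

    g1 = torch.cuda.CUDAGraph()
    g2 = torch.cuda.CUDAGraph()

    # g1 implicitly gets its own private mempool.
    with torch.cuda.graph(g1):
        mid = static_in * 2

    # g2 is hinted to share g1's mempool; it consumes g1's result.
    with torch.cuda.graph(g2, pool=g1.pool()):
        out = mid + 1

    # Refill the static input, then replay in the same order as capture.
    static_in.fill_(3.0)
    g1.replay()
    g2.replay()
    torch.cuda.synchronize()
    return out.cpu().tolist()


if __name__ == "__main__":
    print(capture_two_graphs_sharing_pool())
```

Replaying `g1` and then `g2` matches the capture order, which is the pattern the commit's test plan treats as safe for a shared pool; replaying graphs that share a pool concurrently on different streams is the case the test plan expects to corrupt memory.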

7 Commits