Summary:
ci-all resubmit of https://github.com/pytorch/pytorch/pull/54227.
Tests look good except for a few distributed autograd failures (pytorch_linux_xenial_cuda10_2_cudnn7_py3_multigpu_test) and ROCm failures (pr/pytorch-linux-bionic-rocm4.1-py3.6).
The common denominator in the ROCm failures appears to be multi-GPU activity: some are [multiprocess DDP failures](https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-linux-bionic-rocm4.1-py3.6-test1/8115/console), others are [single-process failures](https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-linux-bionic-rocm4.1-py3.6-test2/8115/console) where the process runs autograd ops that span devices. jeffdaily jithunnair-amd sunway513, could one of you take a look? I expect the streaming backward change will also benefit ROCm.
For debugging the ROCm failures, I think we should ignore the multiprocess/DDP tests and focus on the single-process cases: the root cause is probably the same, and the single-process cases are simpler.
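For context, here is a minimal sketch (my own illustration, not taken from the failing tests) of the single-process, cross-device autograd pattern referred to above; the device indices and tensor shapes are arbitrary:

```python
import torch

def cross_device_backward():
    # Needs at least two visible GPUs (CUDA or ROCm devices).
    if torch.cuda.device_count() < 2:
        return

    x = torch.randn(8, 8, device="cuda:0", requires_grad=True)
    # Moving the intermediate to a second device puts a device-crossing edge
    # in the autograd graph, so backward has to route gradients back across
    # devices (and their streams).
    y = (x * 2).to("cuda:1")
    loss = y.sum()
    loss.backward()
    # The gradient lands back on the original device.
    print(x.grad.device, x.grad.sum().item())

if __name__ == "__main__":
    cross_device_backward()
```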
----------------------------------
Update: the ROCm failures are due to https://github.com/pytorch/pytorch/issues/59750.
| File |
|---|
| amp_examples.rst |
| autograd.rst |
| broadcasting.rst |
| cpu_threading_runtimes.svg |
| cpu_threading_torchscript_inference.rst |
| cpu_threading_torchscript_inference.svg |
| cuda.rst |
| ddp.rst |
| extending.rst |
| faq.rst |
| gradcheck.rst |
| hip.rst |
| large_scale_deployments.rst |
| modules.rst |
| multiprocessing.rst |
| randomness.rst |
| serialization.rst |
| windows.rst |