Currently there is `test_vertical_fusion1`, which fuses entirely during
the lowering stage, so no buffers are realized. This PR adds
`test_scheduler_vertical_fusion1`, which is the same test but with
several intermediate calculations realized, leaving the scheduler
to do the fusion.
To support the test, this PR also adds:
- `metrics.ir_nodes_pre_fusion`, which, when compared with
`generated_kernel_count`, tells us how many nodes were fused.
- `torch._test_inductor_realize`, an identity operator in eager mode
that under inductor also forces its input to be realized (a usage
sketch follows this list).
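For context, here is a minimal sketch of how these two pieces fit together. The compiled function, shapes, and expected counts are illustrative assumptions, not the actual test body:

```python
import torch
from torch._inductor import metrics

def fn(x):
    # torch._test_inductor_realize is an identity op in eager mode, but
    # under inductor it forces its input to be realized into a buffer,
    # so the lowering stage can no longer fuse across this point.
    a = torch._test_inductor_realize(x + 1)
    b = torch._test_inductor_realize(a * 2)
    return b.sin()

metrics.reset()
torch.compile(fn)(torch.randn(8, 8))
# If the scheduler fused everything back into a single kernel, the
# pre-fusion IR node count exceeds the number of generated kernels.
assert metrics.generated_kernel_count == 1
assert metrics.ir_nodes_pre_fusion > metrics.generated_kernel_count
```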
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90014
Approved by: https://github.com/jansel
This improves hf_Bert from 1.139x to 1.21x. Currently, lowmem dropout doesn't work for the nn.Dropout module, so before this change we were recomputing all the dropout masks in a very inefficient kernel. This change instead saves the dropout masks in the dropout kernels where they are first computed.
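For intuition, here is a conceptual sketch of the idea, not inductor's actual kernel-level implementation; `SavedMaskDropout` is a hypothetical helper. A mask saved where it is first computed can be reused in backward, instead of being regenerated by a separate kernel:

```python
import torch

class SavedMaskDropout(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, p):
        # The mask is materialized once, where dropout is first applied,
        # and saved for the backward pass.
        mask = (torch.rand_like(x) >= p).to(x.dtype) / (1 - p)
        ctx.save_for_backward(mask)
        return x * mask

    @staticmethod
    def backward(ctx, grad_out):
        (mask,) = ctx.saved_tensors
        # Backward reuses the saved mask rather than recomputing it.
        return grad_out * mask, None

x = torch.randn(4, 4, requires_grad=True)
SavedMaskDropout.apply(x, 0.5).sum().backward()
```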
cc @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88046
Approved by: https://github.com/Chillee