Commit Graph

10 Commits

Author SHA1 Message Date
Bert Maher
d3d85e1c3b Emit torch.cuda.synchronize() after every kernel call in inductor (#90472)
Debugging illegal memory accesses is hard; even CUDA_LAUNCH_BLOCKING=1
and C10_CUDA_KERNEL_LAUNCH_CHECK don't necessarily guarantee a stack trace
pointing to the right kernel.  This diff adds a config option to force a
CUDA synchronize after every kernel call in inductor, for debugging those
tricky cases.
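The pattern can be sketched in plain Python. This is illustrative only: the flag name, wrapper, and kernel below are hypothetical stand-ins, not inductor's actual config option or launch path.

```python
import functools

# Hypothetical stand-in for the inductor config flag described above.
DEBUG_SYNC_AFTER_KERNEL = True

def maybe_synchronize():
    """Block until all queued GPU work finishes (no-op without torch/CUDA)."""
    try:
        import torch
        if torch.cuda.is_available():
            torch.cuda.synchronize()
    except ImportError:
        pass  # torch not installed; nothing to synchronize

def debug_sync(kernel_fn):
    """Wrap a kernel launch so an async fault surfaces at the offending call."""
    @functools.wraps(kernel_fn)
    def wrapper(*args, **kwargs):
        result = kernel_fn(*args, **kwargs)
        if DEBUG_SYNC_AFTER_KERNEL:
            maybe_synchronize()  # any pending kernel error raises *here*
        return result
    return wrapper

@debug_sync
def my_kernel(x):
    # Placeholder for a real (possibly async) kernel launch.
    return x * 2
```

With the flag on, the stack trace for an illegal memory access points at the wrapper call site of the faulting kernel instead of some later, unrelated synchronization point.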

Differential Revision: [D41744967](https://our.internmc.facebook.com/intern/diff/D41744967/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90472
Approved by: https://github.com/jansel
2022-12-12 04:35:10 +00:00
Peter Bell
4f44877983 [Inductor] Add test for Scheduler fusions (#90014)
Currently there is `test_vertical_fusion1`, which fuses entirely during
the lowering stage, with no buffers realized. This adds
`test_scheduler_vertical_fusion1`, which is the same test but with
several intermediate calculations realized, so that the scheduler is
left to do the fusion.

To support the test, this PR also adds:
- `metrics.ir_nodes_pre_fusion` which when compared with
`generated_kernel_count` tells us how many nodes were fused.
- `torch._test_inductor_realize` which is an identity operator in
eager, but under inductor also forces the input to be realized.
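The relationship between the two metrics can be shown with a small sketch. The metric names mirror the ones described above, but the counting function is a hypothetical stand-in, not inductor's real scheduler logic.

```python
def count_fused_nodes(ir_nodes_pre_fusion: int, generated_kernel_count: int) -> int:
    """Nodes eliminated by fusion = IR nodes before fusion - kernels emitted."""
    assert generated_kernel_count <= ir_nodes_pre_fusion
    return ir_nodes_pre_fusion - generated_kernel_count

# E.g. 10 pointwise IR nodes fused down to 2 generated kernels
# means the scheduler fused away 8 nodes.
fused = count_fused_nodes(ir_nodes_pre_fusion=10, generated_kernel_count=2)
```

A test can then assert on `fused` to verify the scheduler actually performed the expected fusions.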

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90014
Approved by: https://github.com/jansel
2022-12-07 01:33:25 +00:00
Elias Ellison
acd68f9097 [Reland] dont clone args (#89766)
Reland of https://github.com/pytorch/pytorch/pull/89519.

Improves first memory compression on pytorch struct from .55 -> .73. However, it doesn't totally eliminate the overhead from autotuning because of the 250mb cache clearing in triton benchmarking.

Relanding because the previous attempt wasn't accounting for inplace buffer reuse correctly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89766
Approved by: https://github.com/jansel
2022-12-02 17:20:40 +00:00
Soumith Chintala
6f5945e4bb triton supports devices < 7.0, not 6.0 (#90020)
triton is still buggy with Pascal devices, so make the error checker reflect that.

Also, the < 6.0 cutoff never worked, as the `has_triton` definition in utils.py was checking for >= 7.0.
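A minimal sketch of the corrected check, assuming the cutoff described above (compute capability >= 7.0, i.e. Volta or newer, since triton is still buggy on Pascal 6.x). The function name is hypothetical; the real check lives in `has_triton`.

```python
# Compute capability is a (major, minor) pair; tuples compare lexicographically.
MIN_CAPABILITY = (7, 0)

def device_supported(capability: tuple) -> bool:
    """Return True if the (major, minor) compute capability is supported."""
    return capability >= MIN_CAPABILITY

# Pascal (6.x) is rejected; Volta (7.0) and newer are accepted.
```

In real code the capability tuple would come from `torch.cuda.get_device_capability()`.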

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90020
Approved by: https://github.com/yanboliang, https://github.com/anijain2305
2022-12-01 22:01:41 +00:00
Horace He
419ef2cdcf Added utility to count memory reads/written in Inductor (#89203)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89203
Approved by: https://github.com/jansel, https://github.com/ngimel
2022-11-19 04:18:26 +00:00
Michael Voznesensky
3fbf748f21 Assert we have triton before scheduling on triton (#88849)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88849
Approved by: https://github.com/wconstab, https://github.com/ngimel, https://github.com/jansel
2022-11-11 02:30:29 +00:00
Natalia Gimelshein
73c9911fc0 always realize output regardless of the number of reads (#88046)
This improves hf_Bert from 1.139x to 1.21x. Currently lowmem dropout doesn't work for the nn.Dropout module, and before this change we were recomputing all the dropout masks in a very inefficient kernel. This change pushes dropout masks to be saved in the dropout kernels where they are first computed.
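The heuristic change can be sketched as follows. The names and threshold are illustrative, not inductor's actual scheduler code: the point is that outputs are now always realized (materialized to a buffer), so values like dropout masks are saved where first computed rather than recomputed by every consumer.

```python
def should_realize(is_output: bool, num_reads: int, threshold: int = 2) -> bool:
    """Decide whether to materialize a computed value to a buffer."""
    if is_output:
        return True  # after this change: realize outputs regardless of reads
    # Intermediates still use a read-count heuristic (threshold is illustrative).
    return num_reads >= threshold
```

Under the old behavior, an output read only once would have been recomputed instead of saved.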

cc @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @chunyuan-w @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88046
Approved by: https://github.com/Chillee
2022-11-01 15:47:43 +00:00
Jason Ansel
054a2fd6c2 Sync changes from pytorch/torchdynamo (#87013)
This updates to:
6380959be2

Generated with:
https://github.com/pytorch/torchdynamo/blob/main/copy_to_core.sh
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87013
Approved by: https://github.com/voznesenskym
2022-10-15 21:00:57 +00:00
Jason Ansel
8f71e8de7e Sync changes from pytorch/torchdynamo, enable tests (#86950)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86950
Approved by: https://github.com/Chillee
2022-10-14 23:08:58 +00:00
Jason Ansel
c7c09722ad Move TorchDynamo into PyTorch core (#86461)
Context:
https://github.com/pytorch/torchdynamo/issues/1588

This PR moves [TorchDynamo](https://github.com/pytorch/torchdynamo) and TorchInductor into PyTorch core.
- `torchdynamo` becomes `torch._dynamo`
- `torchinductor` becomes `torch._inductor`
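The renames above, written as a simple mapping for reference (the dict itself is just an illustration; the PR performs the move with `copy_to_core.sh`):

```python
# Old standalone module name -> new location inside PyTorch core.
RENAMES = {
    "torchdynamo": "torch._dynamo",
    "torchinductor": "torch._inductor",
}
```

So code that previously did `import torchdynamo` now uses `torch._dynamo` instead.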

This PR was generated by running `copy_to_core.sh` in https://github.com/pytorch/torchdynamo/pull/1538

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86461
Approved by: https://github.com/voznesenskym
2022-10-13 23:18:06 +00:00