Summary: Switch AOTInductor unit tests and integration tests to invoke the same runtime interface. This is only an effort to unify usage of the runtime; scrutiny of the interface itself will come in later PRs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108663
Approved by: https://github.com/ezyang
ghstack dependencies: #108653
**This PR is a 99% copy-paste of Sam Gross's** (@colesbury) work at https://github.com/pytorch/pytorch/pull/100642. Copied from there:
--------
The NN_MODULE guard now subsumes guards on Module attributes. The check_fn will fail if module attributes are changed (such as Module.training), if parameters, submodules, or buffers are added or removed, or if fields are changed on the type itself.
This gives up specificity in the guard check -- if any field is changed the check_fn fails -- for faster overall checks.
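A minimal sketch of the behavior this implies (the model and calls below are illustrative, not from the PR): any mutation of the module fails the coarse guard and triggers a recompile on the next call.
```
# Illustrative only: shows the effect of the coarser NN_MODULE guard.
# Mutating the module (an attribute like .training, or adding/removing a
# parameter, buffer, or submodule) fails check_fn and forces a recompile.
import torch
import torch.nn as nn

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)

    def forward(self, x):
        return self.linear(x)

m = M()
compiled = torch.compile(m)
compiled(torch.randn(2, 4))   # first call: compile and install the NN_MODULE guard

m.training = False            # mutate a module attribute
compiled(torch.randn(2, 4))   # guard check fails, so this call recompiles
```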
-----
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108528
Approved by: https://github.com/ezyang
Move detectron2_fcos_r_50_fpn to amp. The minifier showed the following snippet as causing the divergence, where inductor has better numerics than eager:
```
import torch

def foo(x):
    return x > .2

inp = torch.tensor([.2002], device="cuda", dtype=torch.bfloat16)
print(foo(inp))
print(torch.compile(foo)(inp))
```
doctr_reco_predictor had very minimal divergence (.002 vs .001 required), bumping tolerance here.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108598
Approved by: https://github.com/shunting314
Summary: Move AOTInductor runtime header files into their own subdirectory, to separate them from the to-be-added libtorch C interface.
Reviewed By: frank-wei
Differential Revision: D48905038
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108564
Approved by: https://github.com/frank-wei
# Summary
## PR Dependencies
I don't use ghstack :( this is a PR where it would have been helpful. That being said, I am going to peel off some PRs to make reviewing this easier:
- [x] Separate build flags for Flash and MemEff: #107985
### Description
This pull request updates the version of _scaled_dot_product_flash_attention from version 1 to version 2. The changes are based on the flash attention code originally authored by @tridao.
### Changes Made
The majority of the changes in this pull request involve:
- Copying over the flash_attention sources.
- Updating header files.
- Removing padding and slicing code from within the flash_attention kernel and relocating it to the composite implicit region of the SDPA. This was needed to make the kernel functional and appease autograd.
- Introducing a simple kernel generator to generate different instantiations of the forward and backward flash templates.
- Adding conditional compilation (ifdef) to prevent building when nvcc is invoked with gencode < sm80.
- Introducing a separate dependent option for mem_eff_attention, as flash_attention v2 lacks support for Windows and cannot be built for sm50 generation codes.
- Modifying build.sh to reduce parallelization on sm86 runners and to lower the maximum parallelization on the manywheel builds. This adjustment was made to address out-of-memory issues during the compilation of FlashAttentionV2 sources.
- Adding/Updating tests.
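For context, a small usage sketch (not part of this PR's diff): `torch.nn.functional.scaled_dot_product_attention`, with the flash backend selected via `torch.backends.cuda.sdp_kernel`, is the public path that ends up dispatching to these kernels, assuming an sm80+ GPU, fp16/bf16 inputs, and a supported head dim.
```
# Usage sketch only; requires an sm80+ CUDA GPU and fp16/bf16 inputs.
import torch
import torch.nn.functional as F

q, k, v = (
    torch.randn(8, 16, 1024, 64, device="cuda", dtype=torch.float16)
    for _ in range(3)
)

# Restrict dispatch to the flash backend so the FlashAttention kernels are used.
with torch.backends.cuda.sdp_kernel(
    enable_flash=True, enable_math=False, enable_mem_efficient=False
):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```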
### Notes for Reviewers
This is not a fun review, and I apologize in advance.
Most of the changed files are in the flash_attn/ folder. The only files of interest here, IMO, are:
- aten/src/ATen/native/transformers/cuda/flash_attn/flash_api.cpp
- aten/src/ATen/native/transformers/cuda/flash_attn/kernels/generate_kernels.py (this has been incorporated upstream into the flash-attention GitHub repo)
There are a number of files all related to avoiding OOMs in CI/CD. These are typically shell scripts.
### Follow up items
- Include the updates from e07aa036db and 9e5e8bc91e | https://github.com/pytorch/pytorch/issues/108108
### Work Items
- [x] I don't think Windows will be supported for 3.1.0 - need to update cmake
- [x] Let multi_query/attention pass through and test | UPDATE: I have the fast path implemented here: https://github.com/pytorch/pytorch/pull/106730 but since this will require changes to semantics of math to call repeat_interleave, I think this should be done as a followup.
- [x] Had to drop cutlass back to 3.0.0 to get it to compile. Need to figure out how to upgrade to 3.1.0 and later. Spoke with Tri and he is going to be taking a look. Note: compiling with clang currently errors for the cute headers.
- [x] Update tests to exercise the above codepath
- [x] Still need to disable backward when seq_len % 128 != 0 (Tri beat me to it: a4f148b6ab)
- [x] Add determinism warning to BWD, Tri got to this one as well: 1c41d2b
- [x] Update dispatcher to universally prefer FlashV2
- [x] Update tests to exercise new head_dims
- [x] Move the head_dim padding from kernel to top level composite implicit function in order to make it purely functional
- [x] Create template generator script
- [x] Initial cmake support for building kernels/ folder
- [x] Replay CudaGraph changes
### Results
#### Forward only
The TFLOPs reported here are on an A100 that is underclocked.

#### Forward+Backward
Ran a sweep, and for large compute-bound sizes we see a ~2x performance increase for forward+backward.
<img width="1684" alt="Screenshot 2023-07-20 at 3 47 47 PM" src="https://github.com/pytorch/pytorch/assets/32754868/fdd26e07-0077-4878-a417-f3a418b6fb3b">
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105602
Approved by: https://github.com/huydhn, https://github.com/cpuhrsch
Fixes inference accuracy for `doctr_reco_predictor` and `pyhpc_turbulent_kinetic_energy`.
For the `same(float, float)` comparison we weren't going through the more rigorous tensor comparison path, which takes into account the fp64 base results.
Also, we now return True when the fp64 base results are not well formed (NaN).
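A hedged sketch of the idea (names and tolerances below are illustrative, not the benchmark's actual `same` implementation):
```
# Illustrative sketch: scalar floats are routed through the same rigorous
# comparison logic as tensors, using the fp64 baseline; a NaN fp64 baseline
# means there is no usable reference, so the check passes.
import math

def same_float(actual: float, expected: float, fp64_ref: float, tol: float = 1e-3) -> bool:
    if math.isnan(fp64_ref):
        return True  # fp64 base result is not well formed; accept
    # Compare against the fp64 baseline rather than doing a bare float compare
    # against the (possibly low-precision) eager result.
    return math.isclose(actual, fp64_ref, rel_tol=tol, abs_tol=tol)
```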
I debugged these models and the sources of divergence were innocuous:
`doctr_reco_predictor` - can be fixed by turning off layout optimization or the batch norm decomposition
`pyhpc_turbulent_kinetic_energy` - divergence caused by the fused kernel keeping precision in fp32 instead of casting back and forth between fp32 and bf16. The fused kernel has better precision anyway.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108202
Approved by: https://github.com/jansel
Summary:
Include the constants into AOTInductor .so file.
We do not modify existing API signatures; instead we create the necessary format with the weights lifted out.
Test Plan:
test/inductor/test_aot_inductor.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107718
Approved by: https://github.com/angelayi, https://github.com/eellison
This updates ruff to 0.285, which is faster, better, and fixes a bunch of false negatives with regard to f-strings.
I also enabled RUF017, which looks for accidental quadratic list summation. Luckily, it seems there are no instances of it in our codebase, so I'm enabling it so that it stays that way. :)
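For reference, the kind of pattern RUF017 flags (my own illustrative example, not code from this PR): building a flat list with `sum()` is accidentally quadratic, while reducing with in-place extension stays linear.
```
import functools
import operator

lists = [[1, 2], [3, 4], [5, 6]]

# Quadratic: sum() allocates a brand-new list at every step.
flat_slow = sum(lists, [])

# Linear: extend a single accumulator in place (itertools.chain also works).
flat_fast = functools.reduce(operator.iadd, lists, [])
```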
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107519
Approved by: https://github.com/ezyang
Generate diagnostic reports to monitor the internal stages of the export process. This tool aids in unblocking model exports and debugging the exporter.
#### Settings
~~1. Choose if you want to produce a .sarif file and specify its location.~~
1. Updated: saving the .sarif file should be done via `export_output.save_sarif_log(dst)`, similar to saving the exported ONNX model via `export_output.save(model_dst)` (see the sketch below).
2. Customize diagnostic options:
- Set the desired verbosity for diagnostics.
- Treat warnings as errors.
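A minimal usage sketch; only `export_output.save(...)` and `export_output.save_sarif_log(...)` are taken from this description, while the `torch.onnx.dynamo_export` call and the toy model are assumptions about the surrounding exporter API.
```
import torch

class TinyModel(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x)

export_output = torch.onnx.dynamo_export(TinyModel(), torch.randn(2, 3))
export_output.save("model.onnx")                    # exported ONNX model
export_output.save_sarif_log("diagnostics.sarif")   # SARIF diagnostic report
```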
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106741
Approved by: https://github.com/titaiwangms, https://github.com/justinchuby, https://github.com/malfet
RFC: https://github.com/pytorch/rfcs/pull/54
First commit is the contents of https://github.com/Quansight-Labs/numpy_pytorch_interop/
We have already been using this in core for the last few months as an external dependency. This PR pulls all of it into core.
In the next commits, I do a number of things, in this order:
- Fix a few small issues
- Make the tests that this PR adds pass
- Bend backwards until lintrunner passes
- Remove the optional dependency on `torch_np` and simply rely on the upstreamed code
- Fix a number of dynamo tests that were passing before (they were not testing anything, I think) and are not passing now.
Missing from this PR (but not blocking):
- Have a flag that deactivates tracing NumPy functions and simply breaks. There used to be one but after the merge stopped working and I removed it. @lezcano to investigate.
- https://github.com/pytorch/pytorch/pull/106431#issuecomment-1667079543. @voznesenskym to submit a fix after we merge.
All the tests in `tests/torch_np` take about 75s to run.
This was work by @ev-br, @rgommers, @honno, and me. I did not create this PR via ghstack (which would have been convenient), as this is a collaboration and ghstack doesn't allow for shared contributions.
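As a small illustration (mine, not from the PR) of what this enables once upstreamed: a function written against the NumPy API can be traced by `torch.compile` and executed through the `torch_np`-derived implementation.
```
import numpy as np
import torch

def numpy_fn(x, y):
    # Plain NumPy code; under torch.compile it is traced via the upstreamed layer.
    return np.sum(x * y, axis=0)

compiled_fn = torch.compile(numpy_fn)
out = compiled_fn(np.ones((3, 4)), np.ones((3, 4)))
print(type(out), out)  # still a NumPy array from the caller's point of view
```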
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106211
Approved by: https://github.com/ezyang
https://github.com/pytorch/pytorch/issues/105555
The existing flow first exports and then calls torch._inductor.aot_compile. However, export calls aot_autograd with the core ATen decomposition table, and then torch._inductor.aot_compile calls aot_autograd again with the inductor decomposition table. The second call to aot_autograd is supposedly causing some problems and seems excessive, so instead we will create a new function, torch._export.aot_compile, which exports using the inductor decomposition table, passes the result to inductor's compile_fx_aot, and, because the program has already been exported, avoids calling aot_autograd again.
```
def aot_compile(
    f: Callable,
    args: Tuple[Any],
    kwargs: Optional[Dict[str, Any]] = None,
    constraints: Optional[List[Constraint]] = None,
) -> Tuple[str, ExportedProgram]:
```
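A hypothetical usage sketch that follows the signature above; the exact module path (`torch._export.aot_compile`) and the toy function and inputs are assumptions for illustration.
```
import torch

def f(x, y):
    return x + y

so_path, exported_program = torch._export.aot_compile(
    f,
    args=(torch.randn(8, device="cuda"), torch.randn(8, device="cuda")),
)
# so_path: path to the .so produced via inductor's compile_fx_aot
# exported_program: the ExportedProgram captured with the inductor decomposition table
```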
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105977
Approved by: https://github.com/desertfire, https://github.com/zhxchen17, https://github.com/eellison
Includes stable diffusion, whisper, llama7b and clip
To get this to work I had to pass the HF auth token to all CI jobs; GitHub does not pass secrets from parent to child automatically. There's a likelihood HF will rate limit us; in that case please revert this PR and I'll work on adding a cache next - cc @voznesenskym @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @aakhundov @malfet
Something also changed upstream in torchbench: `hf_Bert` and `hf_Bert_large` are both failing with a dynamic-shape-looking error which I'm not sure how to debug yet, so for now (it felt a bit gross, but) I added a skip, since others are building on top of this work @ezyang
`llamav2_7b_16h` cannot pass accuracy checks because it OOMs when deep-cloning the extra inputs; this seems to mean it does not need to show up in the expected-numbers CSV. We'll figure this out when we update the pin with https://github.com/pytorch/benchmark/pull/1803 cc @H-Huang @xuzhao9 @cpuhrsch
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106009
Approved by: https://github.com/malfet
### Description
As an alternative to PR #105774, which provides a standalone, end-to-end minification script that covers all types of failures and has more functionality, this PR adds the ability to minify models when they fail the eval loop (accuracy checks). Both this PR and the other one can be merged without issue.
### Purpose
The goal is to leverage the minifier to minify models that fail accuracy checks, allowing failed models to be debugged more easily. The ideal use-case is trying to run a model suite on a backend where operator coverage is not known or is limited. If a model compiles but fails the eval loop, having a repro script for it is valuable for any developer trying to fix the issue.
### Functionality
- Create minify flag that minifies models when they fail accuracy check
- Produce minified graph for each model, and save it into repro script
- Move repro script to output directory/base Dynamo directory
- Enable functionality for running an entire model suite (Hugging Face, timm, and TorchBench) by prepending model name to repro script
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106201
Approved by: https://github.com/ezyang