pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Bin Bao	4abfa22812	[aotinductor] Add a perf smoke test for AOTInductor (#110972 ) Summary: To prevent perf regression like the one caused by #110510 Pull Request resolved: https://github.com/pytorch/pytorch/pull/110972 Approved by: https://github.com/chenyang78	2023-10-11 13:30:05 +00:00
Michael Voznesensky	1e7947b3e0	Revert "Reland 3rd try [finishing colesbury's PR 100642] Guard on nn.Module dicts and type (#109323 )" + Forward fixes + test (#110964 ) This reverts commit `f786fbdebd`. Forward fixes Pull Request resolved: https://github.com/pytorch/pytorch/pull/110964 Approved by: https://github.com/ezyang, https://github.com/anijain2305	2023-10-11 05:16:47 +00:00
angelayi	83061ee177	[aotinductor] Fix benchmarks with self.autocast (#110490 ) Fixes https://github.com/pytorch/pytorch/issues/108173 The original error was that there was a type mismatch between the output of eager mode (float16) and from aot_compile (float32). This is because when we run the model eagerly in the benchmarks, we call [self.model_iter_fn](https://github.com/pytorch/pytorch/blob/main/benchmarks/dynamo/common.py#L2072-L2076) to run the model, rather than directly calling the model. In the case of timm models, it calls the model with [self.autocast()](https://github.com/pytorch/pytorch/blob/main/benchmarks/dynamo/timm_models.py#L321-L323), causing the eager model to return a float16 value. However, the model we export with aot_compile does not have the self.autocast context, so it returns float32. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110490 Approved by: https://github.com/desertfire	2023-10-06 02:13:47 +00:00
Xu Zhao	2e31fae5c5	Cleanup the code in the `dynamo` userbenchmark (#110519 ) Summary: Skip importing the modules that are only available in the pytorch source code, not pytorch nightly release. Make dynamo benchmark work on both OSS and internal. X-link: https://github.com/pytorch/benchmark/pull/1960 Test Plan: ``` $ python run_benchmark.py dynamo --only alexnet --training --performance --inductor loading model: 0it [00:05, ?it/s] cuda train alexnet running benchmark: 100%\|█████████████████\| 30/30 [00:00<00:00, 41.46it/s] 1.129x ``` ``` $ buck2 run mode/opt //pytorch/benchmark:run_benchmark -- dynamo --only alexnet --training --inductor --performance --output-directory $HOME loading model: 0it [00:16, ?it/s] running benchmark: 100%\|█████████████████\| 30/30 [00:00<00:00, 37.94it/s] cuda train alexnet 1.120x ``` Differential Revision: D49912006 Pulled By: xuzhao9 Pull Request resolved: https://github.com/pytorch/pytorch/pull/110519 Approved by: https://github.com/desertfire, https://github.com/jansel	2023-10-04 23:26:30 +00:00
Bin Bao	06e88d2cfc	[aotinductor] Remove output_spec from AOTInductorModelCache (#110462 ) Summary: No need to store output_spec as the returned exported.call_spec already contains that. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110462 Approved by: https://github.com/angelayi	2023-10-03 22:29:36 +00:00
BowenBao	6b2c52278e	Benchmark flag to include slowdowns when computing gmean of speedups over eager (#108375 ) `clip(1)` excludes slowdowns by treating them as 1x. Pull Request resolved: https://github.com/pytorch/pytorch/pull/108375 Approved by: https://github.com/jansel	2023-10-02 20:35:08 +00:00
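The effect of `clip(1)` on the geometric mean described in this commit can be sketched in plain Python. This is a minimal illustration, not the benchmark's own code: the helper name `gmean_speedup` and its `include_slowdowns` parameter are hypothetical.

```python
import math


def gmean_speedup(speedups, include_slowdowns=False):
    """Geometric mean of speedups over eager (hypothetical helper)."""
    if not include_slowdowns:
        # Same effect as clip(1): any slowdown (< 1x) is counted as 1x.
        speedups = [max(s, 1.0) for s in speedups]
    return math.exp(sum(math.log(s) for s in speedups) / len(speedups))


speedups = [2.0, 0.5]  # one 2x speedup, one 2x slowdown
clipped = gmean_speedup(speedups)  # the slowdown is treated as 1x
unclipped = gmean_speedup(speedups, include_slowdowns=True)
```

With clipping, the slowdown cannot pull the mean below the remaining speedups; without it, the 2x speedup and the 2x slowdown cancel out.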
atalman	b253fc9c93	Revert "[1/N] Dynamo skipfiles refactor (#109567 )" (#110296 ) This reverts commit `84c5435b29`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110296 Approved by: https://github.com/yanboliang	2023-09-29 20:35:46 +00:00
Simon Fan	88ef126a93	rename nanogpt_generate to nanogpt to also support train (#109746 ) Differential Revision: [D49522940](https://our.internmc.facebook.com/intern/diff/D49522940) Pull Request resolved: https://github.com/pytorch/pytorch/pull/109746 Approved by: https://github.com/msaroufim, https://github.com/malfet, https://github.com/xuzhao9	2023-09-29 17:36:48 +00:00
Bin Bao	f82a29e32b	[inductor] Add CI jobs to test AOTInductor (#108419 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/108419 Approved by: https://github.com/angelayi, https://github.com/jansel	2023-09-28 20:19:25 +00:00
Yanbo Liang	84c5435b29	[1/N] Dynamo skipfiles refactor (#109567 ) This is 1/N of the dynamo skipfiles/allowed_functions refactor, the major change in this PR includes: * Refactor & define the [skipfiles rules](https://github.com/pytorch/pytorch/pull/109567/files#diff-5aa3ce9db729bf0901ea97a5d3cc51924cc8575d9c516c1c8f572a35de92544aR56) and interface * For every ```skipfiles.check```, we return both the check result and the skip/inline reason and log them for debugging. * We found several latent issues/bugs and incorrect implementations in the codebase, but I'm planning to fix them in follow-up PRs to make the refactor decoupled with bug fixes. * More details in the inline comments. Pull Request resolved: https://github.com/pytorch/pytorch/pull/109567 Approved by: https://github.com/ezyang, https://github.com/jansel, https://github.com/anijain2305	2023-09-28 18:36:46 +00:00
PyTorch MergeBot	75462fd870	Revert "[1/N] Dynamo skipfiles refactor (#109567 )" This reverts commit `f8e0ebec8c`. Reverted https://github.com/pytorch/pytorch/pull/109567 on behalf of https://github.com/huydhn due to Many jobs are failing in trunk after this with FILENAME_ALLOWLIST is not defined error `f8e0ebec8c`. This looks like a landrace ([comment](https://github.com/pytorch/pytorch/pull/109567#issuecomment-1738344950))	2023-09-28 02:22:22 +00:00
Yanbo Liang	f8e0ebec8c	[1/N] Dynamo skipfiles refactor (#109567 ) This is 1/N of the dynamo skipfiles/allowed_functions refactor, the major change in this PR includes: * Refactor & define the [skipfiles rules](https://github.com/pytorch/pytorch/pull/109567/files#diff-5aa3ce9db729bf0901ea97a5d3cc51924cc8575d9c516c1c8f572a35de92544aR56) and interface * For every ```skipfiles.check```, we return both the check result and the skip/inline reason and log them for debugging. * We found several latent issues/bugs and incorrect implementations in the codebase, but I'm planning to fix them in follow-up PRs to make the refactor decoupled with bug fixes. * More details in the inline comments. Pull Request resolved: https://github.com/pytorch/pytorch/pull/109567 Approved by: https://github.com/ezyang, https://github.com/jansel, https://github.com/anijain2305	2023-09-28 01:21:59 +00:00
BowenBao	85e408217a	[ONNX] Move out onnx bench bash scripts (#103983 ) Summary: - Remove onnx bench related scripts and `_onnx` folder. - Update `common.py` to include onnx related patches previously under `_onnx` folder. - Update `merge_rules.json` to include bench files. - Added quick sanity onnx bench test to onnx CI. Pull Request resolved: https://github.com/pytorch/pytorch/pull/103983 Approved by: https://github.com/kit1980	2023-09-27 23:54:26 +00:00
rzou	7dbdf3be1e	Fix inductor CI (by updating graph break count) (#110160 ) There was a vision hash update which led to fewer graph breaks. This seems expected to me (because the hash update included https://github.com/pytorch/vision/pull/7944 and nms is used in maskrcnn). Test Plan: - wait for ci Pull Request resolved: https://github.com/pytorch/pytorch/pull/110160 Approved by: https://github.com/ezyang, https://github.com/Chillee	2023-09-27 14:37:36 +00:00
angelayi	57cdad2396	[aotinductor] Update benchmark to include compilation time (#109998 ) Fixes [comment](https://github.com/pytorch/pytorch/pull/109820#pullrequestreview-1638629777) Pull Request resolved: https://github.com/pytorch/pytorch/pull/109998 Approved by: https://github.com/desertfire	2023-09-25 21:30:22 +00:00
angelayi	a565f1bee6	[aotinductor] Skip benchmarks with control flow (#109661 ) Since AOTInductor doesn't support control flow yet, we will skip over tests that are currently failing due to containing control flow in the code. Logs taken from https://hud.pytorch.org/benchmark/compilers?startTime=Tue%2C%2012%20Sep%202023%2022%3A56%3A40%20GMT&stopTime=Tue%2C%2019%20Sep%202023%2022%3A56%3A40%20GMT&granularity=hour&suite=torchbench&mode=inference&dtype=bfloat16&lBranch=main&lCommit=2c1554a0323107d821be3ff13df7833b9f0b960d&rBranch=main&rCommit=47be61e12bd51df27182343d312dc3df485d5559 Errors documented in https://github.com/pytorch/pytorch/issues/105217 Pull Request resolved: https://github.com/pytorch/pytorch/pull/109661 Approved by: https://github.com/desertfire	2023-09-25 18:49:06 +00:00
PyTorch MergeBot	d9627c4264	Revert "[inductor] fix a max-autotune rng state related bug (#109828 )" This reverts commit `3663436db3`. Reverted https://github.com/pytorch/pytorch/pull/109828 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but the rocm failure looks legit. There is also another numpy import error when running dynamo test on CPU ([comment](https://github.com/pytorch/pytorch/pull/109828#issuecomment-1732423883))	2023-09-23 22:35:37 +00:00
Shunting Zhang	3663436db3	[inductor] fix a max-autotune rng state related bug (#109828 ) Fix https://github.com/pytorch/pytorch/issues/109736 . HF pin move causes regression on accuracy check for HF models on the dashboard. Manually reverting the HF PR ( https://github.com/huggingface/transformers/pull/24696/files ) could recover, but this may hide some real issue. I happened to find that using a warm matmul max-autotune cache can work around the issue. Put another way: - making all calls to check_cache miss the cache reproduces the issue - making all calls to check_cache hit the cache works around the issue I did a sort of 'bisect', halving the number of cache misses each time while making sure the issue still reproduced. Luckily, a single cache miss is enough to reproduce the issue. With more debugging, it turns out that it's the call to `torch.randn` on the cuda device causing the problem. The fix is to make sure we restore the rng state when we generate random inputs for max-autotune benchmarking. TBH, I cannot fully explain the root cause although I know it's caused by the rng state change. AOTAutograd already has some logic to preserve rng state. And I cannot repro the issue in unit tests. I have a few guesses why the RNG state is not restored in the first place after we generate random inputs for max-autotune: - maybe AOTAutograd misses some corner case when preserving the rng state - maybe for the failed models, there are some eager fallbacks that are not handled by inductor. And if those fallbacks call random number related APIs, we will see the issue. But again I haven't found a good way to simulate this. Repro: ``` TORCHINDUCTOR_BENCHMARK_KERNEL=1 TORCHINDUCTOR_MAX_AUTOTUNE_GEMM=1 CUDA_VISIBLE_DEVICES=3 time python benchmarks/dynamo/huggingface.py --backend inductor --amp --accuracy --only PLBartForCausalLM --training --cold-start-latency ``` We always repro the issue without the PR but pass the accuracy check with the PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/109828 Approved by: https://github.com/eellison	2023-09-23 00:58:10 +00:00
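The idea behind the fix in this commit, restoring the RNG state after generating random benchmark inputs, can be illustrated with Python's standard `random` module. This is only an analogy under stated assumptions: the actual change concerns the torch/CUDA RNG state, and the names `preserve_rng_state` and `make_benchmark_inputs` are hypothetical.

```python
import random
from contextlib import contextmanager


@contextmanager
def preserve_rng_state():
    """Save the global RNG state on entry and restore it on exit."""
    state = random.getstate()
    try:
        yield
    finally:
        random.setstate(state)


def make_benchmark_inputs(n):
    # Hypothetical stand-in for generating random autotuning inputs.
    # Drawing inside the context manager leaves the caller's RNG untouched.
    with preserve_rng_state():
        return [random.random() for _ in range(n)]


random.seed(0)
expected = random.random()  # first draw with no benchmarking in between

random.seed(0)
make_benchmark_inputs(4)    # consumes random numbers, then restores the state
actual = random.random()    # same draw as `expected`
```

Without the context manager, the second run would draw a different value, which is the kind of divergence that can surface as an accuracy mismatch.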
Mark Saroufim	e2cfbca5ab	Add clip to dynamo runners (#109840 ) CLIP was moved to canary models because we use the multimodal version which depends on torchtext which torchbench deprecated https://github.com/pytorch/benchmark/pull/1837 This issue didn't show up before because we hadn't updated the torchbench pin Pull Request resolved: https://github.com/pytorch/pytorch/pull/109840 Approved by: https://github.com/cpuhrsch	2023-09-22 20:50:57 +00:00
Bin Bao	8856c1628e	[inductor] Change AOTInductor to return output tensors (#109790 ) Summary: Change AOTInductor to directly return output tensors instead of taking pre-allocated output tensors to return the results. This gives several benefits: * It makes sure AOTInductor has the same behavior when managing the output tensors as the default Inductor, which is widely tested and thus more reliable. * As we have debugged before, there are cases where we still have to codegen extra copy_ ops to fill the pre-allocated output tensors, which doesn't make sense for performance. * With the coming enhanced memory planning, this again will make sure the memory planning logic is the same between AOTInductor and Inductor, which will greatly simplify the problem and improve the reliability. This change also combines D49494954 from Yang and https://github.com/pytorch/pytorch/pull/109560 from Angela. Differential Revision: D49502318 Pull Request resolved: https://github.com/pytorch/pytorch/pull/109790 Approved by: https://github.com/chenyang78	2023-09-22 02:31:52 +00:00
Angela Yi	f7ddc54503	[aotinductor] Update performance benchmark code (109560) (#109820 ) Summary: Same as #109560, made a new PR because we need to land from internal. Previously during performance benchmark testing, we would create an AOTInductorModelContainerHandle every time the compiled function is run with new inputs. However, after https://github.com/pytorch/pytorch/pull/108473 we now load the constants needed in the runtime when initializing the AOTInductorModelContainerHandle. This resulted in our benchmarks displaying a ~0.4x speedup. This diff moves the initialization of AOTInductorModelContainerHandle outside of the code where we run the compiled function with different inputs. For example, ``` python benchmarks/dynamo/huggingface.py --performance --cold-start-latency --inference --bfloat16 --export-aot-inductor --disable-cudagraphs --device cuda --total-partitions 3 --partition-id 0 --only AlbertForMaskedLM ``` results in `1.359x` speedup. Specifically, this adds a `create_container_handle` and a `delete_container_handle` function, which need to be called before and after `run`, respectively. We call `create_container_handle` to initialize the AOTInductorModelContainerHandle, call `run` to run the compiled .so with different inputs, and then `delete_container_handle` to delete it. [Updated dashboard results](https://hud.pytorch.org/benchmark/compilers?startTime=Wed%2C%2013%20Sep%202023%2021%3A03%3A55%20GMT&stopTime=Wed%2C%2020%20Sep%202023%2021%3A03%3A55%20GMT&granularity=hour&suite=torchbench&mode=inference&dtype=bfloat16&lBranch=angelayi/aot_inductor_benchmark&lCommit=f9aa49c4c9a1a140b6f0c4520d1d6d99b57e12fa&rBranch=main&rCommit=015be4cedba357eb931e24bf188479235db7c5c8) Test Plan: CI Differential Revision: D49513934 Pull Request resolved: https://github.com/pytorch/pytorch/pull/109820 Approved by: https://github.com/desertfire	2023-09-21 20:49:41 +00:00
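The create / run / delete lifecycle described in this commit can be sketched as follows. `FakeContainer` is a hypothetical stand-in written for illustration only. It borrows the three function names from the commit message but is not the AOTInductor runtime API, and `benchmark` is likewise an assumed helper.

```python
class FakeContainer:
    """Hypothetical stand-in for a compiled model; not the real AOTInductor API."""

    def __init__(self):
        self.constant_loads = 0  # counts the expensive initializations
        self._handle = None

    def create_container_handle(self):
        # In the scenario from the commit, constants are loaded at this point,
        # so doing it once per run would dominate the measured time.
        self.constant_loads += 1
        self._handle = object()

    def run(self, x):
        assert self._handle is not None, "create_container_handle() must be called first"
        return x * 2  # placeholder for running the compiled .so

    def delete_container_handle(self):
        self._handle = None


def benchmark(container, inputs):
    # Initialize once, outside the loop that runs the model on different inputs.
    container.create_container_handle()
    try:
        return [container.run(x) for x in inputs]
    finally:
        container.delete_container_handle()


container = FakeContainer()
outputs = benchmark(container, [1, 2, 3])
```

Keeping the handle creation outside the loop means the constant-loading cost is paid once rather than on every timed invocation.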
Simon Fan	ef8d461b09	Fix torchbench --multiprocess (#109657 ) `python benchmarks/dynamo/torchbench.py --multiprocess` currently fails due to initializing distributed multiple times: ``` torch.distributed.DistNetworkError: The server socket has failed to listen on any local network address. The server socket has failed to bind to [::]:6789 (errno: 98 - Address already in use). The server socket has failed to bind to 0.0.0.0:6789 (errno: 98 - Address already in use). ``` Because torchbench calls itself via mp.spawn, there is the parent run (with `--multiprocess`) and child runs (with `--multiprocess --only <model>`). This PR addresses this by fixing two issues: 1) distributed is initialized once in parent run and once in child runs, it should be initialized only in child runs where we have accurate rank and world size info 2) torchbench overrides CUDA_VISIBLE_DEVICES/world_size sometimes, but it shouldn't for distributed use cases where we want to use all available gpus I am also adding a CI test to cover this type of issue in #109311 ### Test plan parent run test: `python benchmarks/dynamo/torchbench.py --ci --accuracy --timing --explain --inductor --device cuda --inference --bfloat16 --output /home/xmfan/local/pytorch/test/test-reports/inference_torchbench.csv --multiprocess` child run test: `python benchmarks/dynamo/torchbench.py --ci --accuracy --timing --explain --inductor --device cuda --inference --bfloat16 --output /home/xmfan/local/pytorch/test/test-reports/inference_torchbench.csv --multiprocess --only simple_gpt` Pull Request resolved: https://github.com/pytorch/pytorch/pull/109657 Approved by: https://github.com/H-Huang	2023-09-21 16:53:07 +00:00
eellison	d24ba7a634	Add 3d Attn Pattern to match HF Whisper (#109156 ) Adds a 3d pattern that improves perf of HF Whisper from 1.3 -> 4.1. We could be matching more generally on 3d, but i'll leave that for another pr. Thanks to @drisspg for helping me write the pattern. Pull Request resolved: https://github.com/pytorch/pytorch/pull/109156 Approved by: https://github.com/yanboliang ghstack dependencies: #109663, #108894, #108917, #109142	2023-09-20 16:39:31 +00:00
Edward Z. Yang	964b79c813	[EASY] Update dynamo dependency installing Makefile (#107229 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/107229 Approved by: https://github.com/bdhirsh	2023-09-19 18:58:37 +00:00
Mark Saroufim	0ec9f59f70	Loudly Error in dynamo bench if eager fails (#109536 ) Helps debug https://github.com/pytorch/benchmark/issues/1901 I will wait until the ONNX beartype sev is fixed before merging Pull Request resolved: https://github.com/pytorch/pytorch/pull/109536 Approved by: https://github.com/xuzhao9	2023-09-19 00:40:42 +00:00
angelayi	5b13f74e9b	[export] Update how we input kwargs (#109160 ) Previously, the code for passing inputs to exported program was: ``` if kwargs: return (args, kwargs) else: return args ``` However, this causes some inconsistency where if the original input contains args and kwargs, the treespec would be a tuple containing a tuple of arguments, and a dictionary of keyword arguments. But if the original input only contained args, the treespec would just be a tuple of arguments. This inconsistency causes some inconveniences in the runtime. So I updated the code to just always keep the kwargs around. Pull Request resolved: https://github.com/pytorch/pytorch/pull/109160 Approved by: https://github.com/zhxchen17, https://github.com/avikchaudhuri	2023-09-19 00:04:32 +00:00
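The inconsistency described in this commit can be reproduced with two small functions. This is a sketch in plain Python: the names `pack_inputs_old` and `pack_inputs_new` are hypothetical, and only the `if kwargs:` logic is taken from the commit message.

```python
def pack_inputs_old(args, kwargs):
    # Previous behavior: the shape of the result depends on whether kwargs is empty.
    if kwargs:
        return (args, kwargs)
    return args


def pack_inputs_new(args, kwargs):
    # Updated behavior: always keep kwargs, so the structure is uniform.
    return (args, kwargs)


old_without = pack_inputs_old((1, 2), {})            # just the tuple of arguments
old_with = pack_inputs_old((1, 2), {"scale": 3})     # (args, kwargs) pair
new_without = pack_inputs_new((1, 2), {})            # (args, {}) pair
new_with = pack_inputs_new((1, 2), {"scale": 3})     # (args, kwargs) pair
```

With the updated form, a consumer can always unpack `args, kwargs = packed` without first inspecting the structure.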
Justin Chu	050c56d0a5	[dynamo][ci] Pin beartype to 0.15.0 (#109510 ) CIs are failing because of https://github.com/beartype/beartype/issues/282 Pull Request resolved: https://github.com/pytorch/pytorch/pull/109510 Approved by: https://github.com/thiagocrepaldi	2023-09-18 19:08:32 +00:00
Aaron Gokaslan	6d725e7d66	[BE]: enable ruff rules PLR1722 and PLW3301 (#109461 ) Enables two ruff rules derived from pylint: * PLR1722 replaces any exit() calls with sys.exit(). exit() is only designed to be used in REPL contexts, as it may not always be imported by default. The rule always uses the version in the sys module, which is better. * PLW3301 replaces nested min / max calls with simplified versions (i.e. `min(a, min(b, c))` => `min(a, b, c)`). The new version is more idiomatic and more efficient. Pull Request resolved: https://github.com/pytorch/pytorch/pull/109461 Approved by: https://github.com/ezyang	2023-09-18 02:07:21 +00:00
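The two rewrites that these rules enforce look roughly like this. The functions `smallest` and `main` are hypothetical examples written for illustration, not code from the PyTorch tree.

```python
import sys


def smallest(a, b, c):
    # PLW3301: the nested form min(a, min(b, c)) is flattened to one call.
    return min(a, b, c)


def main(argv):
    if not argv:
        # PLR1722: use sys.exit() rather than the REPL helper exit().
        sys.exit(2)
    return smallest(*map(int, argv))
```

`sys.exit()` raises `SystemExit`, so callers and tests can still intercept it if needed.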
Animesh Jain	f786fbdebd	Reland 3rd try [finishing colesbury's PR 100642] Guard on nn.Module dicts and type (#109323 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/109323 Approved by: https://github.com/huydhn, https://github.com/voznesenskym	2023-09-15 08:44:14 +00:00
Simon Fan	54c5f474a7	Forward rank and world size info to Torchbench models when using dynamo runner (#108438 ) Adding support to pass rank and world_size to torchbench model, via its extra_args parameter: https://github.com/pytorch/benchmark/blob/main/torchbenchmark/util/model.py#L83C80-L83C90 This is used for models which distribute over multiple GPUs e.g. simple_gpt https://github.com/pytorch/benchmark/pull/1867 Also add an option to skip multiprocess only gpu models Testing via `python benchmarks/dynamo/torchbench.py -d cuda --output=benchmark_logs/performance.csv --inference --performance --timing --print-memory --multiprocess --only simple_gpt` Pull Request resolved: https://github.com/pytorch/pytorch/pull/108438 Approved by: https://github.com/Chillee	2023-09-14 21:01:20 +00:00
Nakul Camsamudram	109ab6a0df	Support str() on user defined functions (#108973 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/108973 Approved by: https://github.com/anijain2305	2023-09-14 01:32:02 +00:00
drisspg	ad90ab31f2	Flash Attention v2 (#105602 ) # Summary ## PR Dependencies I don't use ghstack :( this is a PR where it would have been helpful. That being said I am going to peel off some PRs to make reviewing this easier: - [x] Separate build flags for Flash and MemEff: #107985 ### Description This pull request updates the version of _scaled_dot_product_flash_attention from version 1 to version 2. The changes are based on the flash attention code originally authored by @tridao ### Changes Made The majority of the changes in this pull request involve: - Copying over the flash_attention sources. - Updating header files. - Removing padding and slicing code from within the flash_attention kernel and relocating it to the composite implicit region of the SDPA. This was needed to make the kernel functional and appease autograd. - Introducing a simple kernel generator to generate different instantiations of the forward and backward flash templates. - Adding conditional compilation (ifdef) to prevent building when nvcc is invoked with gencode < sm80. - Introducing a separate dependent option for mem_eff_attention, as flash_attention v2 lacks support for Windows and cannot be built for sm50 generation codes. - Modifying build.sh to reduce parallelization on sm86 runners and to lower the maximum parallelization on the manywheel builds. This adjustment was made to address out-of-memory issues during the compilation of FlashAttentionV2 sources. - Adding/Updating tests. ### Notes for Reviewers This is not a fun review, and I apologize in advance. Most of the files-changed are in the flash_attn/ folder. The only files of interest here IMO: - aten/src/ATen/native/transformers/cuda/flash_attn/flash_api.cpp - aten/src/ATen/native/transformers/cuda/flash_attn/kernels/generate_kernels.py ( this has been incorporated upstream to flash-attention github) There are a number of files all related to avoiding OOMs in CI/CD. These are typically shell scripts.
### Follow up items - Include the updates from `e07aa036db` and `9e5e8bc91e` \| https://github.com/pytorch/pytorch/issues/108108 ### Work Items - [x] I don't think Windows will be supported for 3.1.0 - Need to update cmakee - [x] Let multi_query/attention pass through and test \| UPDATE: I have the fast path implemented here: https://github.com/pytorch/pytorch/pull/106730 but since this will require changes to semantics of math to call repeat_interleave, I think this should be done as a followup. - [x] Had to drop cutlass back to 3.0.0 to get it to compile. Need to figure out how to upgrade to 3.1.0 and later. Spoke with Tri and he is going to be taking a look. Note: compiling with clang currently errors for the cute headers. - [x] Update test exercise above codepath - [x] Still need to disable on seq_len % 128 != 0 for backward( Tri beat me to it `a4f148b6ab`) - [x] Add determinism warning to BWD, Tri got to this one as well: 1c41d2b - [x] Update dispatcher to universally prefer FlashV2 - [x] Update tests to exercise new head_dims - [x] Move the head_dim padding from kernel to top level composite implicit function in order to make it purely functional - [x] Create template generator script - [x] Initial cmake support for building kernels/ folder - [x] Replay CudaGraph changes ### Results #### Forward only The TFlops are reported here are on a100 that is underclocked. ![flashv2_tflops_vs_seq_len](https://github.com/pytorch/pytorch/assets/32754868/152de46d-8fa6-42f0-9a9c-ef1eb7ae29e7) #### Forward+Backward Ran a sweep and for large compute bound sizes we do see a ~2x performance increase for forw+back. <img width="1684" alt="Screenshot 2023-07-20 at 3 47 47 PM" src="https://github.com/pytorch/pytorch/assets/32754868/fdd26e07-0077-4878-a417-f3a418b6fb3b"> Pull Request resolved: https://github.com/pytorch/pytorch/pull/105602 Approved by: https://github.com/huydhn, https://github.com/cpuhrsch	2023-09-13 13:59:05 +00:00
angelayi	3d8d59e68b	Update inductor ci_expected_accuracy (#109148 ) Changes due to updating the HF pin: [107400](https://github.com/pytorch/pytorch/pull/107400) Somehow during the previous PR it didn't need these changes...probably a CI bug Pull Request resolved: https://github.com/pytorch/pytorch/pull/109148 Approved by: https://github.com/clee2000, https://github.com/desertfire	2023-09-13 05:12:33 +00:00
angelayi	c3945b5f84	Update HF version to commit hash (6c26faa) (#107400 ) Some [errors](https://ossci-raw-job-status.s3.amazonaws.com/log/15968424899) in the [torchinductor hf benchmarks](https://hud.pytorch.org/benchmark/huggingface/inductor_aot_inductor?startTime=Thu,%2010%20Aug%202023%2018:05:47%20GMT&stopTime=Thu,%2017%20Aug%202023%2018:05:47%20GMT&granularity=hour&mode=inference&dtype=bfloat16&lBranch=main&lCommit=384e0d104fd077d31efafc564129660e9b7a0f25&rBranch=main&rCommit=03414081ff7ee011e17ee10f9ddb2584811bf965) should be fixed in the most recent release (for example, this [line](`c036c814f4/src/transformers/models/opt/modeling_opt.py (L688)`) no longer exists). Additionally, I landed a [commit (6c26faa)](`6c26faa159`) to the HF transformers repro to fix one of the graph breaks. This PR results in [76% pass rate for the export + aot inductor HF benchmark!](https://hud.pytorch.org/benchmark/compilers?startTime=Thu%2C%2010%20Aug%202023%2022%3A45%3A09%20GMT&stopTime=Thu%2C%2017%20Aug%202023%2022%3A45%3A09%20GMT&granularity=hour&suite=torchbench&mode=inference&dtype=bfloat16&lBranch=angelayi/hf_version&lCommit=0accaaca2fa70ca2f78c1a587dd4b6750448dd90&rBranch=main&rCommit=03414081ff7ee011e17ee10f9ddb2584811bf965) Pull Request resolved: https://github.com/pytorch/pytorch/pull/107400 Approved by: https://github.com/ezyang, https://github.com/desertfire, https://github.com/malfet	2023-09-12 15:25:28 +00:00
Nakul Camsamudram	3b265e021f	Support Optional typehint without graph breaking (#108970 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/108970 Approved by: https://github.com/anijain2305	2023-09-11 16:42:44 +00:00
PyTorch MergeBot	56c2386157	Revert "reland [finishing colesbury's PR 100642] Guard on nn.Module dicts and type (#108883 )" This reverts commit `d4230e5574`. Reverted https://github.com/pytorch/pytorch/pull/108883 on behalf of https://github.com/huydhn due to Per the discussion thread on D49122208, reverting this change ([comment](https://github.com/pytorch/pytorch/pull/108883#issuecomment-1712707853))	2023-09-10 04:40:02 +00:00
Animesh Jain	d4230e5574	reland [finishing colesbury's PR 100642] Guard on nn.Module dicts and type (#108883 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/108883 Approved by: https://github.com/voznesenskym, https://github.com/huydhn	2023-09-09 03:12:31 +00:00
Bin Bao	e91f66471c	[reland][inductor] Switch to use the runtime interface for AOTInductor testing (#108878 ) Summary: This is a reland of https://github.com/pytorch/pytorch/pull/108663 Pull Request resolved: https://github.com/pytorch/pytorch/pull/108878 Approved by: https://github.com/muchulee8	2023-09-08 17:58:35 +00:00
Yanbo Liang	8990174676	[Dynamo] Should inline __new__ function rather than skipping frame (#108549 ) Fixes #107460 Pull Request resolved: https://github.com/pytorch/pytorch/pull/108549 Approved by: https://github.com/jansel	2023-09-08 16:51:47 +00:00
Huy Do	a9c663c269	Revert "Flash Attention v2 (#105602 )" (#108827 ) This reverts commit `add45aea1c`. There are some conflicts on some benchmark csv file https://github.com/pytorch/pytorch/pull/105602#issuecomment-1710988951 so I need to revert this manually. The diff has been reverted internally. Pull Request resolved: https://github.com/pytorch/pytorch/pull/108827 Approved by: https://github.com/kit1980	2023-09-08 07:43:04 +00:00
PyTorch MergeBot	428f5f9e7e	Revert "[inductor] Switch to use the runtime interface for AOTInductor testing (#108663 )" This reverts commit `366ce589d0`. Reverted https://github.com/pytorch/pytorch/pull/108663 on behalf of https://github.com/Chillee due to Sorry :'( Need to revert to resolve merge conflict for another revert ([comment](https://github.com/pytorch/pytorch/pull/108663#issuecomment-1711076411))	2023-09-08 05:01:27 +00:00
PyTorch MergeBot	72f24d0001	Revert "[dynamo][finishing colesbury's PR 100642] Guard on nn.Module dicts and type (#108528 )" This reverts commit `34bb74c4cf`. Reverted https://github.com/pytorch/pytorch/pull/108528 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it has some nasty merge conflicts after the revert of D48910794. I need to revert this so the conflict could be resolved. Please help rebase this tomorrow and reland the change ([comment](https://github.com/pytorch/pytorch/pull/108528#issuecomment-1711034781))	2023-09-08 03:49:41 +00:00
PyTorch MergeBot	e45b290127	Revert "Revert "Flash Attention v2 (#105602 )" (#108827 )" This reverts commit `24e9bbe22a`. Reverted https://github.com/pytorch/pytorch/pull/108827 on behalf of https://github.com/huydhn due to I need to land this revert properly as there are new failures showing up on trunk ([comment](https://github.com/pytorch/pytorch/pull/108827#issuecomment-1711020924))	2023-09-08 03:25:45 +00:00
Huy Do	24e9bbe22a	Revert "Flash Attention v2 (#105602 )" (#108827 ) This reverts commit `add45aea1c`. There are some conflicts on some benchmark csv file https://github.com/pytorch/pytorch/pull/105602#issuecomment-1710988951 so I need to revert this manually. The diff has been reverted internally. Pull Request resolved: https://github.com/pytorch/pytorch/pull/108827 Approved by: https://github.com/kit1980	2023-09-08 02:54:20 +00:00
Bin Bao	366ce589d0	[inductor] Switch to use the runtime interface for AOTInductor testing (#108663 ) Summary: Switch AOTInductor unit tests and integration tests to invoke the same runtime interface. This is only an effort to unify the usage of the runtime. The interface scrutiny will come in later PRs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/108663 Approved by: https://github.com/ezyang ghstack dependencies: #108653	2023-09-07 23:38:11 +00:00
Bin Bao	e1aba2c8c3	[CI] Update the pinned timm version (#108076 ) Summary: Unify the pinned timm version and install timm at the docker building time Pull Request resolved: https://github.com/pytorch/pytorch/pull/108076 Approved by: https://github.com/ezyang	2023-09-07 11:38:13 +00:00
Animesh Jain	34bb74c4cf	[dynamo][finishing colesbury's PR 100642] Guard on nn.Module dicts and type (#108528 ) This PR is a 99% copy paste of Sam Gross (@colesbury) work at https://github.com/pytorch/pytorch/pull/100642. Copied from there -------- The NN_MODULE guard now subsumes guards on Module attributes. The check_fn will fail if the module attributes are changed (such as Module.training), parameters, submodules, and buffers are added or removed, and if fields are changed on the type itself. This gives up specificity in the guard check -- if any field is changed the check_fn fails -- for faster overall checks. ----- Pull Request resolved: https://github.com/pytorch/pytorch/pull/108528 Approved by: https://github.com/ezyang	2023-09-07 01:45:47 +00:00
JackCaoG	e73ec92ad2	Minor fixs to make torchbench runable on torch/xla (#107919 ) `import torch_xla.core.xla_model as xm` no longer triggers the xla runtime to init, hence explicitly create the device here. This is a workaround for https://github.com/pytorch/xla/issues/4174. The `is_correct` reference has been deleted; I think it is dead code. After this patch, I am able to run ``` python benchmarks/dynamo/torchbench.py --randomize-input --performance --trace-on-xla --training --backend=openxla --only resnet50 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/107919 Approved by: https://github.com/shunting314, https://github.com/wconstab	2023-09-06 22:35:53 +00:00
eellison	738106c1f7	Torchbench model tolerance changes (#108598 ) Move detectron2_fcos_r_50_fpn to amp. The minifier showed the following snippet as causing the divergence, where inductor has better numerics than eager: ``` import torch def foo(x): return x > .2 inp = torch.tensor([.2002], device="cuda", dtype=torch.bfloat16) print(foo(inp)) print(torch.compile(foo)(inp)) ``` doctr_reco_predictor had very minimal divergence (.002 vs .001 required), bumping tolerance here. Pull Request resolved: https://github.com/pytorch/pytorch/pull/108598 Approved by: https://github.com/shunting314	2023-09-06 16:52:29 +00:00
Bin Bao	60bd30ee0b	[inductor] Move AOTInductor runtime headers (#108564 ) Summary: Move AOTInductor runtime header files into its own subdirectory, to separate them from to-be-added libtorch C interface. Reviewed By: frank-wei Differential Revision: D48905038 Pull Request resolved: https://github.com/pytorch/pytorch/pull/108564 Approved by: https://github.com/frank-wei	2023-09-06 11:50:41 +00:00

518 Commits