Rerun the failing test by itself with the cpp stack traces env var set. If it succeeds, start a new process without the env var
We don't want to waste time generating these if we don't have to
C++ stack traces can also show up in assertion error messages, which may cause unexpected failures if a test checks those messages
Adds a new --rs (run single) flag, used the same way as --scs and --sc. It will only run the single test recorded in the stepcurrent file
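A minimal sketch of the flow described above, assuming the env var in question is TORCH_SHOW_CPP_STACKTRACES and using plain subprocess/pytest calls rather than the actual run_test.py machinery:

```python
import os
import subprocess
import sys

def rerun_failing_test_with_traces(test_id: str) -> bool:
    # Rerun just the failing test with C++ stack traces enabled.
    env = dict(os.environ, TORCH_SHOW_CPP_STACKTRACES="1")
    proc = subprocess.run([sys.executable, "-m", "pytest", test_id], env=env)
    return proc.returncode == 0

def continue_without_traces(test_file: str) -> None:
    # Start a fresh subprocess with the env var unset so the remaining tests
    # don't pay the cost of generating C++ stack traces.
    env = {k: v for k, v in os.environ.items() if k != "TORCH_SHOW_CPP_STACKTRACES"}
    subprocess.run([sys.executable, "-m", "pytest", test_file], env=env)
```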
https://hud.pytorch.org/pytorch/pytorch/pull/129004?sha=2c349d3557d399020bf1f6a8b7045e2e4957ba46 has some examples of logs
In the above:
* test_checkpoint_valid failed, then passed when rerun by itself in another subprocess. Testing then continued in a new subprocess starting from the test right after it (test_checkpointing_without_reentrant_early_free)
* test_format_traceback_short failed consistently, but it continued to run because keep-going was set
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129004
Approved by: https://github.com/PaliC
Do not inherit parser from common_utils
* I don't think run_test uses any variables that depend on those args, and all tests except doctests run in a subprocess, so they parse the args in common_utils and set the variables themselves. I don't think doctests want any of those variables?
Parse known args, collect the unrecognized ones as extras, and pass the extras along to the subprocess
Removes the first instance of `--`
I think I will miss run_test telling me if an arg is valid or not
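A minimal sketch of the parsing flow, with a placeholder run_test-only flag and test file; the real parser in run_test.py has many more options:

```python
import argparse
import subprocess
import sys

parser = argparse.ArgumentParser()
parser.add_argument("--keep-going", action="store_true")  # placeholder run_test-only flag

args, extra = parser.parse_known_args()

# Drop the first "--" separator, if present, so everything else can be
# forwarded to the per-test subprocess untouched.
if "--" in extra:
    extra.remove("--")  # removes only the first occurrence

subprocess.run([sys.executable, "-m", "pytest", "test/test_example.py", *extra])
```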
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126709
Approved by: https://github.com/ZainRizvi, https://github.com/huydhn, https://github.com/Flamefire
Looking at the unrelated Windows timeout failure on https://github.com/pytorch/pytorch/pull/125199, it looks like we don't have a timeout value set for C++ tests atm. In this case, a C++ test on Windows timed out after 2+ hours.
```
2024-05-02T23:35:34.0639067Z Running cpp/c10_TypeList_test 1/1 ... [2024-05-02 23:35:34.059021]
2024-05-02T23:35:34.0641108Z Executing ['pytest', 'C:\\actions-runner\\_work\\pytorch\\pytorch\\build\\win_tmp\\build\\torch\\test\\c10_TypeList_test.exe', '-m', 'not serial', '-v', '-vv', '-rfEX', '-n', '2', '--junit-xml-reruns', 'test-reports\\python-pytest\\test\\run_test\\test\\run_test-c898ddeff8f33cbf.xml', '-x', '--reruns=2'] ... [2024-05-02 23:35:34.062137]
2024-05-03T02:45:33.7862004Z Process SpawnPoolWorker-2:
2024-05-03T02:45:33.7927201Z Traceback (most recent call last):
2024-05-03T02:45:33.7928032Z File "C:\Jenkins\Miniconda3\lib\multiprocessing\process.py", line 315, in _bootstrap
2024-05-03T02:45:33.7928722Z self.run()
2024-05-03T02:45:33.7929722Z File "C:\Jenkins\Miniconda3\lib\multiprocessing\process.py", line 108, in run
2024-05-03T02:45:33.7931639Z self._target(*self._args, **self._kwargs)
2024-05-03T02:45:33.7932435Z File "C:\Jenkins\Miniconda3\lib\multiprocessing\pool.py", line 114, in worker
2024-05-03T02:45:33.7933338Z task = get()
2024-05-03T02:45:33.7933946Z File "C:\Jenkins\Miniconda3\lib\multiprocessing\queues.py", line 365, in get
2024-05-03T02:45:33.7935219Z res = self._reader.recv_bytes()
2024-05-03T02:45:33.7935897Z File "C:\Jenkins\Miniconda3\lib\multiprocessing\connection.py", line 221, in recv_bytes
2024-05-03T02:45:33.7936609Z buf = self._recv_bytes(maxlength)
2024-05-03T02:45:33.7937302Z File "C:\Jenkins\Miniconda3\lib\multiprocessing\connection.py", line 310, in _recv_bytes
2024-05-03T02:45:33.7938316Z waitres = _winapi.WaitForMultipleObjects(
2024-05-03T02:45:33.7938766Z KeyboardInterrupt
```
Retrying was working, but it was already too late to finish the job. I'm setting the same default `THRESHOLD * 3` timeout value here for C++ tests.
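An illustrative sketch of applying that timeout when launching a C++ test binary; the THRESHOLD value and command shape are assumptions, not the exact run_test.py code:

```python
import subprocess

THRESHOLD = 60 * 30  # assumed base per-file threshold, in seconds

def run_cpp_test(command: list[str]) -> int:
    try:
        return subprocess.run(command, timeout=THRESHOLD * 3).returncode
    except subprocess.TimeoutExpired:
        print(f"{command} timed out after {THRESHOLD * 3} seconds")
        return 1
```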
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125517
Approved by: https://github.com/clee2000
You can trigger ciflow tags on main branch commits, so we should be more conservative when checking whether a workflow run is for a PR or for the main branch.
get_pr_number looks up the PR number from the PR_NUMBER env var or from a tag of the form `ciflow/workflow/pr number`
If neither is found, assume the workflow is running on the main branch
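A hypothetical sketch of that lookup; GITHUB_REF is a standard GitHub Actions variable, but the real helper may read the tag differently:

```python
import os
import re
from typing import Optional

def get_pr_number() -> Optional[int]:
    pr_number = os.environ.get("PR_NUMBER", "")
    if pr_number.isdigit():
        return int(pr_number)
    # For tag-triggered runs, GITHUB_REF looks like "refs/tags/ciflow/<workflow>/<pr number>".
    match = re.match(r"refs/tags/ciflow/.+/(\d+)$", os.environ.get("GITHUB_REF", ""))
    if match:
        return int(match.group(1))
    # Neither found: treat the run as a main-branch workflow.
    return None
```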
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125485
Approved by: https://github.com/huydhn
yolo
Also
* Ensure that at least one test always gets run (`//` truncates, which results in 0 if too few tests are discovered; see the sketch below)
* Don't run test removal on slow tests - I'm not touching that yet
I am avoiding everything other than pull + trunk workflows, so this is not done on Windows CUDA, which runs on periodic
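A minimal sketch of the guard, assuming tests are already ranked and some fraction of them is kept (the fraction here is illustrative):

```python
def tests_to_run(ranked_tests: list[str], keep_fraction: float = 0.25) -> list[str]:
    # Integer truncation can yield 0 when few tests are discovered,
    # so clamp the count to at least 1.
    count = max(1, int(len(ranked_tests) * keep_fraction))
    return ranked_tests[:count]
```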
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125049
Approved by: https://github.com/huydhn, https://github.com/ZainRizvi
Test the generic torch.Stream/Event with a fake device guard and hooks. Since we added a fake device backend, it is mutually exclusive with other backends; tests will be skipped if TEST_CUDA or TEST_ROCM is true.
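A rough sketch of the skip condition, with the flags derived locally for illustration; the actual test file may import them from the torch.testing._internal helpers instead:

```python
import unittest
import torch

# Assumed flags matching the description above.
TEST_CUDA = torch.cuda.is_available()
TEST_ROCM = TEST_CUDA and torch.version.hip is not None

@unittest.skipIf(TEST_CUDA or TEST_ROCM, "fake device backend is mutually exclusive with CUDA/ROCm")
class TestFakeDeviceStreamEvent(unittest.TestCase):
    def test_stream_event_smoke(self):
        # Placeholder body; the real tests exercise torch.Stream / torch.Event
        # against the fake device guard and hooks.
        pass
```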
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123614
Approved by: https://github.com/albanD
ghstack dependencies: #123611, #123612
Add a serial marker for individual tests so the test file can be removed from the CI serial list
Run serial marked tests first in serial
Run all other tests afterwards in parallel
Slowly reduce list and mark individual tests as serial instead
Hope the number of serial tests is small so sharding evenness doesn't get too messed up
Hopefully can do 3 procs for sm86 and cpu?
serial no longer looks like a real word to me
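A rough sketch of the intended two-phase run, using pytest marker selection and pytest-xdist; the exact invocation in run_test.py may differ:

```python
import subprocess
import sys

test_file = "test/test_example.py"  # illustrative test file

# 1) Run tests marked as serial first, in a single process.
subprocess.run([sys.executable, "-m", "pytest", test_file, "-m", "serial"])

# 2) Run everything else afterwards in parallel (pytest-xdist's -n flag).
subprocess.run([sys.executable, "-m", "pytest", test_file, "-m", "not serial", "-n", "3"])
```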
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124085
Approved by: https://github.com/seemethere, https://github.com/malfet
Our prior approach to epilogue fusion was to select a choice from a set of Triton templates and extern calls based on benchmarking inputs, then unconditionally fuse epilogues. This can be sub-optimal in the following ways:
- We select an extern kernel, but an epilogue like relu() exists such that choosing a Triton template + relu would have been faster
- We select a Triton template and fuse the epilogue, but register spilling occurs, making it slower than not fusing the epilogue.
In this PR we defer selecting either the Triton template or the extern kernel until we have benchmarking results from the kernel itself and its epilogue. As soon as a successful fusion occurs where a fused Triton template + epilogue is faster than the unfused choice, we finalize the MultiTemplateBuffer as that specific template. If no fusion occurs, we finalize the MultiTemplateBuffer after the fusion pass.
Note: if there are multiple epilogue fusions (not super likely), even though we select a template after the first fusion, we will still benchmark to see whether subsequent epilogues are worth fusing. We could potentially defer choosing the template in this case in a follow-up, at the expense of compile time.
Gives a 4% HF training win and a 10% TIMM inference win. Increases compilation time, which I will be trying to address further in follow-up PRs.
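A deliberately simplified, hypothetical illustration of the deferred choice (not inductor's real API): each candidate is benchmarked with the epilogue applied, and the buffer is only finalized to whichever choice wins:

```python
import time
from typing import Callable, Dict

def benchmark(fn: Callable[[], None], iters: int = 20) -> float:
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters

def finalize_choice(candidates: Dict[str, Callable[[], None]]) -> str:
    # candidates maps a choice name (e.g. "extern" or "triton_template+relu")
    # to a callable that runs that choice end to end, epilogue included.
    timings = {name: benchmark(fn) for name, fn in candidates.items()}
    return min(timings, key=timings.get)
```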
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120275
Approved by: https://github.com/jansel
ghstack dependencies: #121996
Make a test that fails on purpose to trigger retries, and check the opposite of success (i.e., that the env vars exist)
It's a bit hacky because I want it to fail in the normal flow in order to trigger reruns, but I don't want to expose the failures to users since that's confusing.
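A hypothetical sketch of such a test; the env var name here is made up, since the real variable names set on rerun aren't spelled out above:

```python
import os
import unittest

class TestRerunEnvVars(unittest.TestCase):
    def test_fails_until_rerun_env_var_is_set(self):
        # Fails on the first attempt (env var absent) to trigger the rerun flow,
        # then passes once the harness has set the rerun-related env var.
        self.assertIn("HYPOTHETICAL_RERUN_ENV_VAR", os.environ)
```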
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120519
Approved by: https://github.com/huydhn
Give TD its own job so that each shard can get the results from this one job's artifact; the shards will always be in sync with each other and we no longer need to worry about consistency issues
* Move test discovery to its own file that is not dependent on torch so it can be run without building torch
* Cannot do cpp test discovery before building pytorch
* Move the TD calculation to its own file that writes a JSON file with the final results (see the sketch after this list)
* TD is now job/build env agnostic
* TD will rank all tests, including those that a test job may not want to run (e.g., it will rank distributed tests along with default tests, even though those tests are never run on the same machine together)
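A hypothetical sketch of the JSON artifact the TD job could emit (file name and structure are illustrative, not the actual format):

```python
import json

def write_td_rankings(ranked_tests: list[str], path: str = "td_results.json") -> None:
    # Every discovered test file gets a rank; each consuming job filters out
    # the files it doesn't run (e.g. distributed vs. default).
    results = {"ranking": {test: rank for rank, test in enumerate(ranked_tests)}}
    with open(path, "w") as f:
        json.dump(results, f, indent=2)
```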
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118250
Approved by: https://github.com/huydhn
Current threshold is to cut the bottom 75% of test files, which results in 13 min of tests getting cut.
test_ops, functorch/test_ops, test_decomp, and other really long-running test files are not getting cut, which makes the top 25% still take really long (90+ min)
The original plan was to test on ROCm, but I'm worried about queuing given that cutting 75% of test files only cuts off 13 min. Crossref is rarely referenced by others, and people keep talking about getting rid of it, so it's a good alternative
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119426
Approved by: https://github.com/huydhn