Follow-up to #96245. alexnet, Background_Matting, vision_maskrcnn, and vgg16 all have the same problem, but on float32 they were also failing the previous day, so I missed this. Once the amp jobs became available, I could see that these models hit the same issue on both float32 and amp.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96324
Approved by: https://github.com/desertfire
Summary: ciflow/inductor-perf-test-nightly now contains a full dashboard
run, which takes a very long time. Ed proposed a simplification of the
perf run there, but it is still worthwhile to have a set of fast perf
tests that only includes one configuration (--training --amp).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96166
Approved by: https://github.com/huydhn, https://github.com/weiwangmeta
OK, so this PR used to be about reducing the number of constants we specialize on, but it turns out that unspecialization was ~essentially never used (because we still constant specialized way too aggressively) and I ended up having to fix a bunch of issues to actually get tests to pass. So this PR is now "make int unspecialization actually work". As part of this, I have to turn off unspecialization by default, as there are still latent bugs in inductor.
The general strategy is that an unspecialized int is represented as a SymInt. Representing it as a 0d tensor (which is what the code used to do) is untenable: (1) we often need unspecialized ints to participate in size computations, but we have no way of propagating sympy expressions through tensor compute, and (2) a lot of APIs work when passed SymInt, but not when passed a Tensor. However, I continue to represent Numpy scalars as Tensors, as they are rarely used for size computation and they have an explicit dtype, so they are more accurately modeled as 0d tensors.
* I folded in the changes from https://github.com/pytorch/pytorch/pull/95099 as I cannot represent unspecialized ints as SymInts without also turning on dynamic shapes. This also eliminates the necessity for test_unspec.py, as toggling specialization without dynamic shapes doesn't do anything. As dynamic shapes defaults to unspecializing, I just deleted this entirely; for the specialization case, I rely on regular static shape tests to catch it. (Hypothetically, we could also rerun all the tests with dynamic shapes, but WITH int/float specialization, but this seems... not that useful? I mean, I guess export wants it, but I'd kind of like our Source heuristic to improve enough that export doesn't have to toggle this either.)
* Only 0/1 integers get specialized by default now
* A hodgepodge of fixes. I'll comment on the PR about them.
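To make the SymInt representation described above concrete, here is a minimal sketch of the intended behavior (this snippet is illustrative, not code from this PR, and assumes dynamic shapes are enabled): an int argument other than 0/1 is traced as a SymInt, so it can feed size computations without being burned into the graph.
```python
import torch

# Illustrative sketch only: with int unspecialization, `k` is traced as a
# SymInt rather than a baked-in constant, so it can participate in size
# computations (here the pad width). 0 and 1 would still be specialized.
@torch.compile(dynamic=True)
def pad_right(x, k):
    return torch.nn.functional.pad(x, (0, k))

x = torch.randn(4)
print(pad_right(x, 3).shape)  # torch.Size([7])
print(pad_right(x, 5).shape)  # ideally reuses the graph instead of recompiling
```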
Fixes https://github.com/pytorch/pytorch/issues/95469
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95621
Approved by: https://github.com/jansel, https://github.com/Chillee
Summary: When running the benchmark tests with --accuracy, two eager runs
should return the same result. If not, we want to detect it early, but
comparing against fp64_output alone may hide the non-determinism in eager.
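A minimal sketch of the idea (the helper below is hypothetical, not the benchmark harness's actual code, and assumes the model returns a single tensor):
```python
import torch

def eager_runs_match(model, example_inputs, atol=1e-4, rtol=1e-4):
    # Run the eager model twice on the same inputs and compare the two
    # outputs directly; comparing each run only against an fp64 reference
    # could mask eager-mode non-determinism.
    out1 = model(*example_inputs)
    out2 = model(*example_inputs)
    return torch.allclose(out1, out2, atol=atol, rtol=rtol)
```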
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95616
Approved by: https://github.com/ZainRizvi
I believe this fixes the AllenaiLongformerBase problem in periodic.
The longer version of the problem: we are currently optimistically converting all item() calls into unbacked SymInt/SymFloat, but sometimes this results in a downstream error due to a data-dependent guard. Fallbacks for this case are non-existent; this will just crash the model. This is bad. So we gate this behind a flag until we get working fallbacks.
What could these fallbacks look like? One idea I have is to optimistically make data-dependent calls unbacked, but then, if that results in a crash, restart Dynamo analysis with the plan of graph breaking at the item() call as soon as it happens.
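A minimal sketch of the failure mode (hypothetical function; assumes `torch._dynamo.config.capture_scalar_outputs = True` so that item() is captured into the graph rather than graph-breaking):
```python
import torch
import torch._dynamo

# Assumption for this sketch: capture item() results into the graph.
torch._dynamo.config.capture_scalar_outputs = True

@torch.compile(fullgraph=True)
def f(x):
    n = x.sum().to(torch.int64).item()  # becomes an unbacked SymInt
    if n == -1:  # data-dependent guard: unanswerable at trace time
        return x
    return x + n

# f(torch.ones(3))  # raises a GuardOnDataDependentSymNode-style error
```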
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94987
Approved by: https://github.com/Skylion007, https://github.com/malfet
```
GuardOnDataDependentSymNode: It appears that you're trying to get a value out of symbolic int/float whose value is data-dependent (and thus we do not know the true value.) The expression we were trying to evaluate is Eq(i3, -1). Scroll up to see where each of these data-dependent accesses originally occurred.
While executing %as_strided : [#users=1] = call_method[target=as_strided](args = (%pad,), kwargs = {size: (12, %add, 768, 64), stride: (%getitem, %mul, %getitem_1, %getitem_2)})
Original traceback:
File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/transformers/models/longformer/modeling_longformer.py", line 928, in <graph break in _sliding_chunks_matmul_attn_probs_value>
chunked_value = padded_value.as_strided(size=chunked_value_size, stride=chunked_value_stride)
```
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94986
Approved by: https://github.com/albanD
Summary: It looks like setting torch.backends.cudnn.deterministic to
True is not enough to eliminate non-determinism when testing
benchmarks with --accuracy, so let's turn off cudnn completely.
With this change, mobilenet_v3_large does not show random failures in my
local environment. Also take this chance to clean up CI skip lists.
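For reference, a minimal sketch of the two settings being contrasted here (the actual change lives in the benchmark harness):
```python
import torch

# Weaker setting that turned out to be insufficient: cuDNN is still used,
# just restricted to deterministic algorithms.
torch.backends.cudnn.deterministic = True

# What this change does instead: disable cuDNN entirely so convolutions etc.
# fall back to PyTorch's native kernels.
torch.backends.cudnn.enabled = False
```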
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94363
Approved by: https://github.com/ezyang
Prefer dashes over underscores in command-line options. Add `--command-arg-name` variants to the argument parsers; the old underscore forms `--command_arg_name` are kept for backward compatibility.
Both dashes and underscores are used in the PyTorch codebase. Some argument parsers have only dashes or only underscores in their arguments. For example, the `torchrun` utility for distributed training only accepts underscore arguments (e.g., `--master_port`). Dashes are more common in other command-line tools, and they appear to be the default choice in the Python standard library:
`argparse.BooleanOptionalAction`: 4a9dff0e5a/Lib/argparse.py (L893-L895)
```python
class BooleanOptionalAction(Action):
def __init__(...):
if option_string.startswith('--'):
option_string = '--no-' + option_string[2:]
_option_strings.append(option_string)
```
It adds `--no-argname`, not `--no_argname`. Also, typing `_` requires pressing the Shift key, unlike `-`.
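A minimal sketch of how both spellings can coexist in argparse (the option name below is a placeholder, not one of the actual flags touched by this PR):
```python
import argparse

parser = argparse.ArgumentParser()
# Register the dashed form as the primary spelling and keep the old
# underscore form as a backward-compatible alias; both map to the same dest.
parser.add_argument(
    "--command-arg-name", "--command_arg_name",
    dest="command_arg_name",
    default=None,
)

args = parser.parse_args(["--command-arg-name", "value"])
print(args.command_arg_name)  # value
```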
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94505
Approved by: https://github.com/ezyang, https://github.com/seemethere
The functorch setting still exists, but it is no longer necessary for
PT2: we infer use of the Python dispatcher by checking whether the
ambient FakeTensorMode has a ShapeEnv or not. The setting now only
controls direct AOTAutograd use; for PT2,
it's sufficient to use torch._dynamo.config.dynamic_shapes.
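A minimal sketch of what "sufficient for PT2" means here (the compiled function is a placeholder):
```python
import torch
import torch._dynamo

# Turning on dynamic shapes at the Dynamo level is enough; use of the Python
# dispatcher is then inferred from whether the ambient FakeTensorMode has a
# ShapeEnv, with no separate functorch toggle needed for PT2.
torch._dynamo.config.dynamic_shapes = True

@torch.compile
def f(x):
    return x.sin() + x.shape[0]

f(torch.randn(8))
```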
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94469
Approved by: https://github.com/Chillee, https://github.com/voznesenskym, https://github.com/jansel