Summary:
Add `node_mapping = create_node_mapping(pre_grad_graph_id, inductor_post_to_pre_grad_nodes, debug_info)` to produce an `inductor_provenance_tracking_node_mappings.json` file. This file will be used by the provenance tracking highlighter tool to create the provenance visualization.
The `inductor_triton_kernel_to_post_grad_nodes.json` and `inductor_provenance_tracking_node_mappings.json` files are not dumped if they are both empty, so they have been removed from some of the `test_structured_trace` tests.
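A minimal sketch of consuming the dumped artifact (the file name comes from this PR; the structure of the mapping entries is an assumption for illustration):
```python
import json

# Load the provenance mapping artifact produced by create_node_mapping.
with open("inductor_provenance_tracking_node_mappings.json") as f:
    node_mappings = json.load(f)

# Entries are expected to relate inductor post-grad nodes back to
# pre-grad graph nodes; the exact schema may differ from this sketch.
for key, value in node_mappings.items():
    print(key, "->", value)
```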
Test Plan:
CI
```
buck run mode/dev-nosan fbcode//caffe2/test:fx -- -r graph_provenance
buck run mode/dev-nosan fbcode//caffe2/test/inductor:provenance_tracing
python test/dynamo/test_structured_trace.py
```
Differential Revision: D68190173
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146103
Approved by: https://github.com/chenyang78
Landing D67612181 here. The original exported PR somehow fails OSS CI, but this one doesn't (though the PR content is the same).
Add the `inductor_triton_kernel_mapping_post_grad.json` debug trace artifact (a debug artifact for provenance tracking) to tlparse.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/145954
Approved by: https://github.com/YUNQIUGUO
`"compile_id"` had slipped into our generated Triton code (in the metadata), which defeats caching because the same kernels generated in a different order would not cache hit with each other.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143951
Approved by: https://github.com/oulgen
Fixes #130559
* Intro
This PR adds support for `@contextmanager` in Dynamo. We chose to limit the
scope of this work to only `@contextmanager` and plan to handle generators fully
in #141055 (still in draft).
* Motivation
Dynamo lacks support for generator functions. When it encounters one, it traces
it as if it were a regular function. This is problematic because it can lead to
incorrect behavior. To illustrate, consider the test case below:
```python
import torch
import contextlib

@contextlib.contextmanager
def set_default_dtype(dtype):
    old_dtype = torch.get_default_dtype()
    try:
        torch.set_default_dtype(dtype)
        yield
    finally:
        torch.set_default_dtype(old_dtype)

@torch.compile(backend="eager", fullgraph=True)
def fn():
    with set_default_dtype(torch.float64):
        x = torch.tensor([3.0, 3.0 + 5.0j])
    return x
```
Before this work, Dynamo would not stop at the `yield`, and the graph produced
would contain both calls to `set_default_dtype` executed one after the other.
This is incorrect because the context manager should execute code before and
after the `yield`.
* List of changes
`YIELD_VALUE` now raises an exception (`YieldValueOp`) to signal that control
flow must be suspended and returned to the caller. Additionally, `RETURN_VALUE`
behaves differently in a generator function. Unlike regular functions, where
`RETURN_VALUE` indicates the final result, in generators it signifies that the
generator is exhausted and implicitly raises `StopIteration`.
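Plain Python already exhibits these semantics; a minimal illustration, independent of Dynamo:
```python
def gen():
    yield 1        # YIELD_VALUE: suspend and return control to the caller
    return "done"  # RETURN_VALUE in a generator: raises StopIteration("done")

g = gen()
print(next(g))  # 1 -- execution is suspended at the yield
try:
    next(g)
except StopIteration as e:
    print(e.value)  # "done" -- the generator is exhausted
```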
A new `VariableTracker` named `FunctionDecoratedByContextlibContextManagerVariable`
was introduced to handle `@contextmanager`. This variable tracker acts not just
as a wrapper for the original function but also maintains an internal `tx`
(InstructionTranslator) object to suspend and return control flow to the parent
tracer when a `yield` is encountered.
* Corner cases
Returning a context manager from a compiled function is not supported. This
would require PyTorch to synchronize the generator state between Dynamo and the
interpreter. Any attempt to return it will result in an `IncorrectUsage`
exception.
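For example, something like the following (reusing `set_default_dtype` from the test case above) would be rejected rather than traced:
```python
@torch.compile(backend="eager", fullgraph=True)
def bad_fn():
    # Escaping the context manager to the caller would require syncing
    # generator state between Dynamo and the interpreter, so Dynamo
    # raises IncorrectUsage instead of compiling this.
    return set_default_dtype(torch.float64)
```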
Graph breaks require special handling as well. In the event of a graph break,
the frame associated with the context manager is skipped, and the context
manager runs in eager mode.
* This PR is breaking my code
There is a configuration flag (`enable_trace_contextlib`) that can be set to
`False` to disable tracing context managers. If this still causes crashes,
please revert this PR.
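For example, assuming the flag lives in `torch._dynamo.config` (where Dynamo keeps its other configuration):
```python
import torch._dynamo.config

# Opt out of tracing @contextmanager functions and fall back to the
# previous behavior.
torch._dynamo.config.enable_trace_contextlib = False
```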
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136033
Approved by: https://github.com/zou3519
Summary:
Add a new structured logging artifact, "inductor_pre_grad_graph".
This lets the inductor provenance tracking front-end load this graph from tlparse.
ghstack-source-id: 257581974
exported-using-ghexport
Test Plan:
```
buck2 run 'fbcode//mode/dev-nosan' //caffe2/test/dynamo:test_dynamo -- -r StructuredTraceTest
```
Differential Revision: D67150288
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143126
Approved by: https://github.com/desertfire
This turns on AOTAutogradCache for all inductor tests. It clears AOTAutogradCache on each test as well, by virtue of the local cache using the same directory to store cache entries.
I've also tested with INDUCTOR_TEST_DISABLE_FRESH_CACHE=1, running all the tests. AOTAutogradCache successfully caches 99% of these. There are a few tests that use view_replay and therefore save functional tensors, which cause AOTAutogradCache to fail to pickle its result. Will look into next steps there, but for now, it seems okay if the cache just misses on those cases where it can't serialize the result. It would be better to check before pickling, though.
I've made the following small bugfixes to get this working:
- Inductor is sometimes used in a standalone mode without dynamo, which leads to attribute errors in check_can_cache. In general, we should *never* crash in cache checking, only bypass. So I changed a try/except to catch `Exception` instead of just a specific exception type (a sketch of this bypass pattern follows this list).
- Add extra structured logging for metadata on cache hits
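A minimal sketch of the bypass-instead-of-crash pattern from the first bullet (the names here are illustrative, not the actual AOTAutogradCache internals):
```python
class BypassCache(Exception):
    """Illustrative stand-in for the real cache-bypass exception."""

def check_can_cache(gm) -> None:
    try:
        # Attribute accesses like this can fail with AttributeError when
        # inductor runs standalone without dynamo having annotated `gm`.
        _ = gm.meta
    except Exception as e:  # broad catch: never crash in cache checking
        raise BypassCache(str(e)) from e
```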
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140890
Approved by: https://github.com/bdhirsh
Companion logger diff: https://www.internalfb.com/diff/D65012523
* Using float seconds for timestamps is bad because our internal system defaults to float32 precision, and you don't even get second precision for timestamps in float32 (see the demonstration after this list)
* We decided to use microseconds instead of milliseconds because at millisecond granularity you can end up with the same timestamp if compilation is happening very quickly; it is much better to force non-overlapping spans
* Because there are so many new fields and I don't feel like reimplementing each on BwdCompilationMetrics, BwdCompilationMetrics is no more; it's just that everything in CompilationMetrics is now optional.
* The actual frame compile times collection is not modified (still float) to reduce blast radius, so I just convert to microseconds before making the record. At float64 precision (Python's default), you get about microsecond precision on timestamps, so it shouldn't be a data problem (https://www.leebutterman.com/2021/02/01/store-your-unix-epoch-times-as-float64.html)
* I rename some entries for clarity. In particular, whenever a timing contains all of its lower phases (e.g., how Inductor also contains Triton compilation), we put "cumulative" in its name. If something doesn't happen at compile time but is delayed until we have actual real inputs, we put "runtime" in its name.
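A quick demonstration of the float32 problem from the first bullet (plain NumPy, nothing PyTorch-specific):
```python
import numpy as np

t = 1729136266.123  # a present-day Unix timestamp, in seconds

# float32 has a 24-bit significand, so at ~1.7e9 the spacing between
# representable values is 2**7 = 128 seconds:
print(np.spacing(np.float32(t)))             # 128.0
print(np.float32(t) == np.float32(t + 1.0))  # True: events 1 s apart collide

# float64 at the same magnitude resolves ~0.24 microseconds, so
# microsecond timestamps (or float64 seconds) are safe:
print(np.spacing(t))                         # ~2.4e-07
```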
Test plan:
```
buck2 run @mode/opt @mode/inplace //scripts/oulgen:runner
```
And then inspect https://fburl.com/scuba/dynamo_compile/sandbox/mslu7f5w and verify the us columns are populated and meaningful.
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138975
Approved by: https://github.com/masnesral
Summary: Capture the timing for the remote fx graph cache get and put operations and add them to the logger logging.
Test Plan:
1) Landed D64483593 and waited for logger actualization.
2) Ran test script on devserver: `buck2 run mode/opt scripts/slarsen/torch_compile_model:run`
3) Queried dynamo_compile/sandbox:
```
(pytorch-3.10_4) devvm2296:~/local/pytorch-3.10_4 $ scuba -e="select time,co_filename,remote_fx_graph_cache_get_time_s,remote_fx_graph_cache_put_time_s from \`dynamo_compile/sandbox\` where remote_fx_graph_cache_put_time_s is not null"
+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+----------------------------------+
| time | co_filename | remote_fx_graph_cache_get_time_s | remote_fx_graph_cache_put_time_s |
+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+----------------------------------+
| 1729136266 | null | 0.05652284622192383 | 0.9691152572631836 |
| 1729136263 | /data/users/slarsen/fbsource/buck-out/v2/gen/fbcode/289bb46b326874c6/scripts/slarsen/torch_compile_model/__run__/run-inplace#link-tree/scripts/slarsen/torch_compile_model/run.py | 0.8298435211181641 | 0.18642282485961914 |
+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+----------------------------------+
```
Reviewed By: oulgen
Differential Revision: D64484025
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138164
Approved by: https://github.com/jamesjwu, https://github.com/ezyang
Compiled Autograd uses this AOT inference path, but it shows up as "aot_forward_graph" in tlparse output, which makes it hard to distinguish from the normal "aot_forward_graph"s that are also in the tlparse output. This PR renames it to "aot_inference_graph", which makes it easier to tell which tlparse graph block is from Compiled Autograd.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137803
Approved by: https://github.com/Microve, https://github.com/bdhirsh, https://github.com/ezyang
----
- We now record on CacheEntry what the compile id that populated it was, so now we can say why a specific frame was rejected
- Add structured log for recompiles under name artifact "recompile_reasons". As it stands, it's not terribly structured, but this was the easiest thing I could do to start
- Slightly reformat multi-reason printing; since we only report one guard failure, it seems better to have it as a single line
Example output:
```
V0703 10:34:13.273000 140345997743104 torch/_dynamo/guards.py:2590] [0/1] [__recompiles] Recompiling function f in /data/users/ezyang/a/pytorch/b.py:3
V0703 10:34:13.273000 140345997743104 torch/_dynamo/guards.py:2590] [0/1] [__recompiles] triggered by the following guard failure(s):
V0703 10:34:13.273000 140345997743104 torch/_dynamo/guards.py:2590] [0/1] [__recompiles] - 0/0: tensor 'L['x']' size mismatch at index 0. expected 4, actual 5
```
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130043
Approved by: https://github.com/anijain2305
This adds dumps of MetaTensorDesc and MetaStorageDesc to structured logs
when they are triggered from Dynamo. The logs look like this:
```
V0522 08:13:25.267000 140224882566144 torch/_subclasses/meta_utils.py:195] {"describe_storage": {"id": 0, "describer_id": 0, "size": 32}, "frame_id": 0, "frame_compile_id": 0, "attempt": 0}
V0522 08:13:25.267000 140224882566144 torch/_subclasses/meta_utils.py:220] {"describe_tensor": {"id": 0, "ndim": 1, "dtype": "torch.float32", "device": "device(type='cpu')", "size": [8], "is_leaf": true, "stride": [1], "storage": 0, "view_func": "<built-in method _view_func_unsafe of Tensor object at 0x7f882959e840>", "describer_id": 0}, "frame_id": 0, "frame_compile_id": 0, "attempt": 0}
V0522 08:13:25.268000 140224882566144 torch/_subclasses/meta_utils.py:1594] {"describe_source": {"describer_id": 0, "id": 0, "source": "L['x']"}, "frame_id": 0, "frame_compile_id": 0, "attempt": 0}
```
The `describer_id` is used to disambiguate ids. We expect it to be
unique per frame id, but if there is a bug it may not be. Note that you will get
redundant dumps when evaluation restarts.
tlparse can use this to give a visualization of input tensors to a
model; you could also use this to generate example inputs to run graphs
on.
Some care is taken to avoid redumping the tensor metadata multiple
times, which would happen ordinarily because AOTAutograd refakifies
everything after Dynamo, to deal with metadata mutation.
Partially fixes https://github.com/pytorch/pytorch/issues/126644
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126879
Approved by: https://github.com/jamesjwu
Summary: Discovered breakages by enabling codecache by default and doing a CI run. I'll commit these fixes first and eventually enabling caching by default will (hopefully) be a one-liner.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125258
Approved by: https://github.com/eellison
Overall design: https://docs.google.com/document/d/1CX_hJ0PNy9f3R1y8TJrfkSeLkvGjjjLU84BSXgS2AZ8/edit
How to read the diff:
* Most files are me augmenting pre-existing logging with structured variants. For the most part it's simple (esp FX graphs, which have a canonical string representation); it gets more complicated when I decided to JSON-ify some data structure instead of keeping the ad hoc printing (notably, guards and dynamo output graph sizes)
* torch/_functorch/_aot_autograd/collect_metadata_analysis.py is some unrelated fixes I noticed while auditing artifact logs
* torch/_logging/_internal.py has the actual trace log implementation. The trace logger is implemented as a logger named torch.__trace, which is disconnected from the logging hierarchy. It gets its own handler and formatter (TorchLogsFormatter with _is_trace True). `trace_structured` is the main way to emit a trace log. Unusually, there's a separate "metadata" and "payload" field. The metadata field should not be too long (as it is serialized as a single line) and is always JSON (we put contextual things like compile id in it); the payload field can be long and is emitted after the metadata log line and can span multiple lines.
* torch/_logging/structured.py contains some helpers for converting Python data structures into JSON form. Notably, we have a string interning implementation here, which helps reduce the cost of serializing filenames into the log (a minimal sketch of the idea follows this list).
* test/dynamo/test_structured_trace.py: the tests are cribbed from test_logging.py, but all rewritten to use expect tests on munged versions of what we'd actually output. Payloads are never tested, since they tend not to be very stable.
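A minimal sketch of the interning idea (hypothetical helper names; the real implementation lives in torch/_logging/structured.py and also emits a trace record for each newly interned string):
```python
INTERN_TABLE: dict[str, int] = {}

def intern_string(s: str) -> int:
    """Return a small integer id for s, assigning one on first sight."""
    idx = INTERN_TABLE.get(s)
    if idx is None:
        idx = len(INTERN_TABLE)
        INTERN_TABLE[s] = idx
        # The real helper would also log the new string once, so that a
        # consumer like tlparse can rebuild the id -> string mapping.
    return idx

# Repeated filenames then serialize as a cheap integer on every log line
# instead of the full path.
filename_id = intern_string("/data/users/ezyang/a/pytorch/b.py")
```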
https://github.com/ezyang/tlparse is a POC Rust program that can interpret these logs.
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120289
Approved by: https://github.com/Skylion007
ghstack dependencies: #120712
Overall design: https://docs.google.com/document/d/1CX_hJ0PNy9f3R1y8TJrfkSeLkvGjjjLU84BSXgS2AZ8/edit
How to read the diff:
* Most files are me augmenting pre-existing logging with structured variants. For the most part it's simple (esp FX graphs, which have a canonical string representation); it gets more complicated when I decided to JSON-ify some data structure instead of keeping the ad hoc printing (notably, guards and dynamo output graph sizes)
* torch/_functorch/_aot_autograd/collect_metadata_analysis.py is some unrelated fixes I noticed while auditing artifact logs
* torch/_logging/_internal.py has the actual trace log implementation. The trace logger is implemented as a logger named torch.__trace, which is disconnected from the logging hierarchy. It gets its own handler and formatter (TorchLogsFormatter with _is_trace True). There's a teensy bit of FB specific code to automatically enable trace logging if a /logs directory exists. `trace_structured` is the main way to emit a trace log. Unusually, there's a separate "metadata" and "payload" field. The metadata field should not be too long (as it is serialized as a single line) and is always JSON (we put contextual things like compile id in it); the payload field can be long and is emitted after the metadata log line and can span multiple lines.
* torch/_logging/structured.py contains some helpers for converting Python data structures into JSON form. Notably, we have a string interning implementation here, which helps reduce the cost of serializing filenames into the log.
* test/dynamo/test_structured_trace.py: the tests are cribbed from test_logging.py, but all rewritten to use expect tests on munged versions of what we'd actually output. Payloads are never tested, since they tend not to be very stable.
https://github.com/ezyang/tlparse is a POC Rust program that can interpret these logs.
Testing that the fbcode detection works at https://www.internalfb.com/mlhub/pipelines/runs/fblearner/534553450 (Meta-only)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120289
Approved by: https://github.com/Skylion007