pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Xuehai Pan	1cb4e2df65	[BE][PYFMT] migrate PYFMT for `torch._inductor` to `ruff format` (#144550 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144550 Approved by: https://github.com/jansel	2025-02-28 13:33:19 +00:00
Jason Ansel	78a94c9114	[inductor] Remove type ignores from scheduler.py (#145712 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145712 Approved by: https://github.com/yanboliang, https://github.com/Skylion007 ghstack dependencies: #145692	2025-01-28 01:44:32 +00:00
Aaron Gokaslan	f3304571fc	[BE][Ez]: FURB148 - remove useless enumerate calls (#145619 ) Remove useless enumerate calls Pull Request resolved: https://github.com/pytorch/pytorch/pull/145619 Approved by: https://github.com/drisspg	2025-01-24 23:37:15 +00:00
Aaron Orenstein	bac62341eb	PEP585 update - torch/_inductor (#145198 ) See #145101 for details. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145198 Approved by: https://github.com/bobrenjc93	2025-01-21 21:04:33 +00:00
bobrenjc93	a3ab27b8e0	Migrate from Tuple -> tuple in torch/_inductor (#144264 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144264 Approved by: https://github.com/eellison	2025-01-07 03:27:27 +00:00
Tom Ritchford	da67a6a7bb	[inductor] Replace set by OrderedSet (#138466 ) Uses the set_linter from https://github.com/pytorch/pytorch/pull/138454 and considerable manual editing Pull Request resolved: https://github.com/pytorch/pytorch/pull/138466 Approved by: https://github.com/eellison	2024-12-13 16:08:45 +00:00
Jason Ansel	6eca0aee76	[inductor] Refactor ir.Layout into ir.OutputSpec (#140910 ) This separate the concepts of a Layout (size/stride/etc) and an OutputSpec (which includes multiple outputs). Which should make typing easier. Pull Request resolved: https://github.com/pytorch/pytorch/pull/140910 Approved by: https://github.com/ezyang ghstack dependencies: #140895	2024-11-21 20:01:57 +00:00
Xuan Zhang	9c2ffce71a	add condition for freeable input buffer (#139480 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139480 Approved by: https://github.com/yf225 ghstack dependencies: #139396	2024-11-01 21:15:40 +00:00
Xuan Zhang	86602a66d7	[orm] fix live_memory computation in lpmf algorithm (#139396 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/139396 Approved by: https://github.com/yf225	2024-10-31 23:45:30 +00:00
Xuan Zhang	2980aed65b	[inductor][memory] restructuring memory.py and turn on the flag (#137205 ) Addressing additional comments given in PR https://github.com/pytorch/pytorch/pull/134874 Pull Request resolved: https://github.com/pytorch/pytorch/pull/137205 Approved by: https://github.com/eellison	2024-10-25 17:19:34 +00:00
Xuan Zhang	c9d12f6360	[inductor][memory] add signpost event for memory pass (#136538 ) Add logging to scuba table for internal models. For verification, I triggered a sample workflow internally and checked the scuba table logging to make sure the `Paramaters` column has the expected loggings, see [here](https://fburl.com/scuba/workflow_signpost/39h7qo9s). Pull Request resolved: https://github.com/pytorch/pytorch/pull/136538 Approved by: https://github.com/yf225	2024-09-25 21:47:46 +00:00
Xuan Zhang	03957efa5d	[inductor][scheduler] reorder scheduler nodes after fusion to reduce peak memory (#134874 ) Motivations: A topological order of the scheduler nodes that optimize the liveness of buffers can reduce the peak memory utilization. This has been observed and studied e.g., [here](https://arxiv.org/pdf/1910.02653) and [here](https://proceedings.mlr.press/v202/steiner23a/steiner23a.pdf). Solutions: 1. implement a peak memory estimator via liveness analysis 2. implement a few memory aware topological sorting algorithms and pick the one with the lowest peak memory Results: On some models we can reduce the peak memory significantly: \| model \| batch size \| peak_memory baseline \| peak_memory new \| ratio \| \|:-----------------------------:\|:----------:\|:--------------------:\|:---------------:\|:-----:\| \| alexnet \| 128 \| 1.17 \| 0.99 \| 1.19 \| \| vgg16 \| 64 \| 4.10 \| 3.57 \| 1.15 \| \| DebertaV2ForQuestionAnswering \| 1 \| 11.60 \| 10.56 \| 1.10 \| In the presence of compiler based AC, peak memory can be further reduced: \| model \| batch size \| peak_memory baseline \| peak_memory new \| ratio \| \|:------------------------------:\|:----------:\|:--------------------:\|:---------------:\|:-----:\| \| AlbertForMaskedLM \| 4 \| 6.87 \| 6.43 \| 1.07 \| \| AlbertForQuestionAnswering \| 4 \| 8.69 \| 7.76 \| 1.12 \| \| MobileBertForQuestionAnswering \| 128 \| 4.67 \| 3.90 \| 1.20 \| [Here](https://fb.workplace.com/groups/1075192433118967/posts/1499920537312819/?comment_id=1499938843977655&reply_comment_id=1499951630643043) is an internal use case. Other infos: * neutral model runtime, because the the reordering happens after fusion. So memory saving is _for free_. * minimal compile time overhead as the algorithm is linear in the number of edges of the inductor graph. For all hugglingface benchmark models, the additional compile time is less than 1 second. * no peak memory regression since we only adopt a new order if the peak memory is reduced based on the estimator. However, the model is unaware of operators' working memories, but for large models, the working memory should be negligible. We haven't observed any significant regressions on all of our tests. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134874 Approved by: https://github.com/yf225	2024-09-21 16:28:38 +00:00

12 Commits