Summary:
In SJD, we register callbacks to get notified of an active compilation. Using this information, we can extend the time allowed for the training loop while compilation is in progress.
The callbacks currently do not account for the entire compilation time, and in several cases the end callback is not called at all.
This causes a number of APS jobs to be terminated incorrectly: https://fburl.com/scuba/mast_hpc_job_run_status/ondwzt2w
In this diff, we install a context manager that calls the start and end callbacks around compilation, similar to how we log counters and other information. A sketch of the idea is shown below.
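
For reference, here is a minimal sketch of the approach, not the actual PyTorch implementation: a context manager fires the registered start callbacks on entry and guarantees the end callbacks fire on exit, even if compilation raises. The names (`CallbackHandler`, `install_callbacks`, `run_start_callbacks`, `run_end_callbacks`, `compile_inner`) are illustrative assumptions, not the real API.
```
from contextlib import contextmanager


class CallbackHandler:
    """Hypothetical handler that holds compilation start/end callbacks."""

    def __init__(self):
        self._start_callbacks = []
        self._end_callbacks = []

    def register_start_callback(self, fn):
        self._start_callbacks.append(fn)

    def register_end_callback(self, fn):
        self._end_callbacks.append(fn)

    def run_start_callbacks(self):
        for fn in self._start_callbacks:
            fn()

    def run_end_callbacks(self):
        for fn in self._end_callbacks:
            fn()

    @contextmanager
    def install_callbacks(self):
        # Fire start callbacks on entry; the finally block guarantees the
        # end callbacks run even when compilation exits via an exception.
        self.run_start_callbacks()
        try:
            yield
        finally:
            self.run_end_callbacks()


callback_handler = CallbackHandler()

# The compilation entry point would then wrap the compile step, e.g.:
#
#     with callback_handler.install_callbacks():
#         compiled_fn = compile_inner(code)  # hypothetical compile call
```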
Test Plan:
```
buck2 run mode/opt //aps_models/examples/dlrm:dlrm_train_app -- --config-name train_mast_fsdp_torchdynamo launcher.data_project=apf_ai_infra launcher.fbl_entitlement=ai_infra_training_rnd_tc launcher.hardware=TC_ANY_80G
```
Led to https://www.internalfb.com/mlhub/pipelines/runs/mast/aps-atuljangra-ef2285ba9a?job_attempt=0&version=0&env=prod

https://fburl.com/ai_infra/sv0a213y confirms that the callback was correctly called and that a lease was properly installed, which takes over the training loop lease.
{F1965137027}
Differential Revision: D66347023
Pull Request resolved: https://github.com/pytorch/pytorch/pull/141323
Approved by: https://github.com/ezyang