Sean McGovern f332017294 C++ API handle optimizer defaults (#161825)
Fixes #141884

This fixes the issue for all optimizers and parameter options.
A member function `overwrite_from` is added to the optimizer base class, and each optimizer implements it to compare its accepted parameters against the defaults. A SFINAE approach that handles the different optimizer parameters generically (in optimizer.h only) was evaluated, but I think this version is easier to review and maintain. A sketch of the pattern follows.
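A minimal sketch of the idea, written as a free function for self-containment (the PR implements this as a member on each options class; names are simplified and illustrative, not the exact code):

```cpp
#include <torch/torch.h>

using torch::optim::AdamOptions;

// Sketch: a field that still equals its constructor default is treated as
// "not specified by the user" and is overwritten with the optimizer default.
void overwrite_from(AdamOptions& group, const AdamOptions& optimizer_defaults) {
  const AdamOptions ctor_defaults{};  // constructor defaults, e.g. lr = 1e-3
  if (group.lr() == ctor_defaults.lr()) {
    group.lr(optimizer_defaults.lr());
  }
  if (group.weight_decay() == ctor_defaults.weight_decay()) {
    group.weight_decay(optimizer_defaults.weight_decay());
  }
  if (group.eps() == ctor_defaults.eps()) {
    group.eps(optimizer_defaults.eps());
  }
  // ... same pattern for betas, amsgrad, ...
}
```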

This mirrors the Python API up to one edge case. An example of the edge case is provided below.

Python can distinguish between (1) a key not present in the dict, meaning "not specified", and (2) a key present in the dict, meaning "explicitly set". The C++ implementation cannot.
The issue hinges on whether to track that a particular parameter was set explicitly by the user; the discrepancy arises when the constructor default is explicitly passed in, as the snippet below shows.
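Concretely, the two intents produce identical options objects in C++, so `overwrite_from` cannot tell them apart:

```cpp
#include <torch/torch.h>

// Both objects compare equal field-by-field; nothing records that the
// second one had weight_decay set deliberately.
auto left_default   = torch::optim::AdamOptions();                    // weight_decay never touched
auto set_to_default = torch::optim::AdamOptions().weight_decay(0.0);  // weight_decay explicitly set to 0.0
```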

Tracking this seems like it would take more intervention than it is worth (modifying TORCH_ARG to record assignments, using std::optional for the parameter types, or keeping a bitset of set fields), so it was not pursued in this PR; a sketch of the std::optional variant follows. I'm happy to alter the design if appropriate.
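For illustration only, the (not pursued) std::optional variant would distinguish the two cases roughly like this; `AdamOptionsTracked` and its members are hypothetical names, not part of the PR:

```cpp
#include <optional>

// Hypothetical alternative: store the option as std::optional so that
// "never set" (nullopt) differs from "explicitly set to the default" (0.0).
struct AdamOptionsTracked {
  std::optional<double> weight_decay_;

  AdamOptionsTracked& weight_decay(double v) {
    weight_decay_ = v;  // a present value marks the field as explicitly set
    return *this;
  }

  // Fall back to the optimizer-level default only if the user never set it.
  double effective_weight_decay(double optimizer_default) const {
    return weight_decay_.value_or(optimizer_default);
  }
};
```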

### Example of edge case hinging on CONSTRUCTOR DEFAULTS vs OPTIMIZER DEFAULTS

1. CONSTRUCTOR DEFAULTS:
   These are the values you get when calling AdamOptions()
   AdamOptions().lr() = 0.001
   AdamOptions().weight_decay() = 0
   AdamOptions().eps() = 1e-08

2. OPTIMIZER DEFAULTS:
   These are the values the user chose when creating the optimizer
   User's optimizer defaults:
   optimizer.lr() = 0.005
   optimizer.weight_decay() = 0.1
   optimizer.eps() = 1e-07

3. THE PROBLEM SCENARIO:
   User wants to add a parameter group with explicit weight_decay=0.0
   User sets: weight_decay(0)

4. THE CONFUSION:
   Constructor default weight_decay: 0
   User's explicit weight_decay:     0
   Are they equal? YES

   Since they're equal, our overwrite_from() logic thinks:
   "User didn't set weight_decay explicitly, use optimizer default"

5. CURRENT BEHAVIOR:
   Final weight_decay: 0.1
   User expected:      0
   Match?  NO

=== KEY INSIGHT ===
Constructor defaults are built into the C++ class definition.
Optimizer defaults are chosen by the user at runtime. We want to respect the user's intention.
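A sketch reproducing the scenario end-to-end (assumes a module `model` and a tensor vector `more_params` are in scope; values match the example above):

```cpp
#include <torch/torch.h>
#include <memory>

// Optimizer-level defaults chosen by the user at runtime.
torch::optim::Adam optimizer(
    model.parameters(),
    torch::optim::AdamOptions().lr(0.005).weight_decay(0.1).eps(1e-7));

// The user explicitly asks for weight_decay = 0 in a new parameter group...
optimizer.add_param_group(torch::optim::OptimizerParamGroup(
    more_params,
    std::make_unique<torch::optim::AdamOptions>(
        torch::optim::AdamOptions().weight_decay(0.0))));

// ...but 0.0 equals the AdamOptions constructor default, so overwrite_from()
// treats it as "unspecified" and the group ends up with weight_decay = 0.1.
```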
Pull Request resolved: https://github.com/pytorch/pytorch/pull/161825
Approved by: https://github.com/janeyx99
Directory listing of `pytorch/test/cpp`:

| Name | Latest commit | Date |
| --- | --- | --- |
| aoti_abi_check | Migrate DeviceType to torch/headeronly (#163999) | 2025-09-30 23:13:27 +00:00 |
| aoti_inference | [AOTInductor] Use CudaCachingAllocator for memory allocation (#162893) | 2025-09-17 17:08:20 +00:00 |
| api | C++ API handle optimizer defaults (#161825) | 2025-10-08 16:40:45 +00:00 |
| c10d | [fr] [xpu] Add FlightRecorder support for ProcessGroupXCCL (#158568) | 2025-08-22 09:03:35 +00:00 |
| common | | |
| dist_autograd | Revert "[RELAND] Always build USE_DISTRIBUTED (#160449) and Make distributed modules importable even when backend not built (#159889) (#162594)" | 2025-09-25 13:47:46 +00:00 |
| jit | [TorchScript] ProfilingExecutor - RemoveProfileNodesAndSpecializeTypes None handling (#161538) | 2025-08-27 23:12:15 +00:00 |
| lazy | [BC-Breaking] Remove long-deprecated casting functions from native_functions.yaml (#164641) | 2025-10-08 08:27:58 +00:00 |
| lite_interpreter_runtime | [BE][3/6] fix typos in test/ (#157637) | 2025-07-17 12:08:33 +00:00 |
| monitor | | |
| nativert | kjt pytree registration (#161114) | 2025-09-13 03:57:43 +00:00 |
| profiler | [Lint] Update clang-format to 19.1.4 (#153889) | 2025-05-20 14:12:46 +00:00 |
| rpc | Fix some CMake issues (#153686) | 2025-05-19 00:31:34 +00:00 |
| __init__.py | | |