Today, our param_group testing amounts to giving weight and bias different optimizer hyperparams and then checking that the overall result moves in the right direction based on `maximize`.
This PR introduces two tests to round out coverage:
1. For every optimizer input (excluding differentiable), always force bias to have 0 weight_decay, and then check that the direction is expected. This is basically a replica of today's tests, but it is more methodical, as the test reflects a real use case.
2. To ensure that the different groups have distinct behavior, I added another test where lr is essentially 0 in the default group, and ensure that the param in the default group doesn't move while the loss does (a sketch follows below).
Together, these tests do a better job of testing param groups than today's tests, **though we do lose some flavors**. For example, RMSprop also pits centered=True vs False across the param_groups, Adadelta has a variation on rho, and ASGD has a variation on t0. I don't think this is really a loss, as the previous tests only checked direction, whereas the new tests check stronger guarantees.
The leftover param group configs are used in conjunction with LRSchedulers.
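A minimal sketch of the second test (hedged: the names and the choice of SGD are illustrative, not the test as written):
```
import torch

# The default group's lr is effectively 0, so its param must not move even
# though the param in the other group still gets real updates.
weight = torch.randn(5, 5, requires_grad=True)
bias = torch.randn(5, requires_grad=True)
optim = torch.optim.SGD(
    [{"params": [weight]}, {"params": [bias], "lr": 1e-2}],
    lr=1e-30,  # default group: effectively frozen
)
old_weight = weight.detach().clone()
loss = weight.mv(bias).pow(2).sum()
loss.backward()
optim.step()
assert torch.equal(weight, old_weight)  # default-group param did not move
```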
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117675
Approved by: https://github.com/albanD
Updates flake8 to v6.1.0 and fixes a few lints using sed and some ruff tooling.
- Replace `assert(0)` with `raise AssertionError()`
- Remove extraneous parentheses, e.g.:
- `assert(a == b)` -> `assert a == b`
- `if(x > y or y < z):`->`if x > y or y < z:`
- And `return('...')` -> `return '...'`
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116591
Approved by: https://github.com/albanD, https://github.com/malfet
Removes part of the SparseAdam test and the following three tests: `test_fused_optimizer_raises`, `test_duplicate_params_across_param_groups`, `test_duplicate_params_in_one_param_group`
```
(pytorch-3.10) [janeyx@devgpu023.odn1 ~/local/pytorch (d2d129de)]$ python test/test_optim.py -k test_fused_optimizer_raises -k test_duplicate_params_across_param_groups -k test_duplicate_params_in_one_param_group
/home/janeyx/.conda/envs/pytorch-3.10/lib/python3.10/site-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.17.3 and <1.25.0 is required for this version of SciPy (detected version 1.26.0
warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
...
----------------------------------------------------------------------
Ran 3 tests in 0.023s
OK
```
Increases coverage by running the duplicate-param tests on ALL the optims instead of just one each. Also fixes a SparseAdam bug where calling `list(...)` on a raw Tensor was accidentally unbinding it (torch.unbind semantics) instead of putting the param in a list. This bug was caught by migrating the scattered warning handling to one warning context manager, which checks that nothing else gets raised.
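For reference, the `list(...)` pitfall looks like this (a standalone illustration, not the test code):
```
import torch

# Calling list() on a Tensor iterates over dim 0 (same result as
# torch.unbind), rather than producing a one-element list of the param.
param = torch.randn(2, 3, requires_grad=True)
rows = list(param)   # two tensors of shape (3,)
wrapped = [param]    # the intended one-element container
assert len(rows) == 2 and rows[0].shape == (3,)
assert len(wrapped) == 1
```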
The new test_errors does not run slower than before; overhead is still king:
```
(pytorch-3.10) [janeyx@devgpu023.odn1 ~/local/pytorch (d2d129de)]$ python test/test_optim.py -k test_errors
/home/janeyx/.conda/envs/pytorch-3.10/lib/python3.10/site-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.17.3 and <1.25.0 is required for this version of SciPy (detected version 1.26.0
warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
..........................
----------------------------------------------------------------------
Ran 26 tests in 10.337s
OK
```
Compared to test_errors BEFORE my commit :p
```
(pytorch-3.10) [janeyx@devgpu023.odn1 ~/local/pytorch (b47aa696)]$ python test/test_optim.py -k test_errors
/home/janeyx/.conda/envs/pytorch-3.10/lib/python3.10/site-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.17.3 and <1.25.0 is required for this version of SciPy (detected version 1.26.0
warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
.............sssssssssssss
----------------------------------------------------------------------
Ran 26 tests in 11.980s
OK (skipped=13)
(pytorch-3.10) [janeyx@devgpu023.odn1 ~/local/pytorch (b47aa696)]$
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116315
Approved by: https://github.com/mikaylagawarecki
As stated. I do notice there is perhaps an opportunity to abstract, but the tests as written are super understandable, and more abstraction might not be desirable.
This PR _increases coverage_. The original tests each tested 12 default configs (they left out Rprop). Now the tests cover ~80 configs, with foreach + fused on top of that! Test time increases over 10-fold, but this test is tiny, so we are not worried:
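For reference, a hedged sketch of the property under test (SGD shown for concreteness; the real test sweeps every optim plus the foreach/fused variants):
```
import torch

# Stepping an optimizer whose params have no .grad should be a no-op.
param = torch.randn(2, 3, requires_grad=True)  # param.grad is None
opt = torch.optim.SGD([param], lr=0.1)
before = param.detach().clone()
opt.step()  # no grads anywhere, so nothing should change
assert torch.equal(param, before)
```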
Old:
```
(pytorch-3.10) [janeyx@devgpu023.odn1 ~/local/pytorch (5ca9672c)]$ python test/test_optim.py -k test_step_is_noop_when_params_have_no_grad
/home/janeyx/.conda/envs/pytorch-3.10/lib/python3.10/site-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.17.3 and <1.25.0 is required for this version of SciPy (detected version 1.26.0
warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
.
----------------------------------------------------------------------
Ran 1 test in 0.028s
OK
```
New (includes the old test):
```
(pytorch-3.10) [janeyx@devgpu023.odn1 ~/local/pytorch (5ca9672c)]$ python test/test_optim.py -k test_step_is_noop_when_params_have_no_grad
/home/janeyx/.conda/envs/pytorch-3.10/lib/python3.10/site-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.17.3 and <1.25.0 is required for this version of SciPy (detected version 1.26.0
warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
...........................
----------------------------------------------------------------------
Ran 27 tests in 0.456s
OK
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115299
Approved by: https://github.com/albanD
ghstack dependencies: #114802, #115023, #115025
Removing 4 tests:
```
(pytorch-3.10) [janeyx@devgpu023.odn1 ~/local/pytorch (7539011b)]$ python test/test_optim.py -v -k test_fused_optimizers_with_large_tensors -k test_fused_optimizers_with_varying_tensors -k test_multi_tensor_optimizers_with_large_tensors -k test_multi_tensor_optimizers_with_varying_tensors
/home/janeyx/.conda/envs/pytorch-3.10/lib/python3.10/site-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.17.3 and <1.25.0 is required for this version of SciPy (detected version 1.26.0
warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
test_fused_optimizers_with_large_tensors (optim.test_optim.TestOptim) ... ok
test_fused_optimizers_with_varying_tensors (optim.test_optim.TestOptim) ... ok
test_multi_tensor_optimizers_with_large_tensors (optim.test_optim.TestOptim) ... ok
test_multi_tensor_optimizers_with_varying_tensors (optim.test_optim.TestOptim) ... ok
----------------------------------------------------------------------
Ran 4 tests in 22.731s
OK
```
And the more granular replacements for the same 4:
```
(pytorch-3.10) [janeyx@devgpu023.odn1 ~/local/pytorch (7539011b)]$ python test/test_optim.py -v -k test_fused_large_tensor -k test_fused_mixed_device_dtype -k test_foreach_large_tensor -k test_foreach_mixed_device_dtype
/home/janeyx/.conda/envs/pytorch-3.10/lib/python3.10/site-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.17.3 and <1.25.0 is required for this version of SciPy (detected version 1.26.0
warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
test_foreach_large_tensor_ASGD_cpu_float16 (__main__.TestOptimRenewedCPU) ... skipped 'Only runs on cuda'
....
test_fused_mixed_device_dtype_Adam_cpu_float32 (__main__.TestOptimRenewedCPU) ... skipped 'Only runs on cuda'
test_foreach_large_tensor_ASGD_cuda_float16 (__main__.TestOptimRenewedCUDA) ... ok
test_foreach_large_tensor_Adadelta_cuda_float16 (__main__.TestOptimRenewedCUDA) ... ok
test_foreach_large_tensor_Adagrad_cuda_float16 (__main__.TestOptimRenewedCUDA) ... ok
test_foreach_large_tensor_AdamW_cuda_float16 (__main__.TestOptimRenewedCUDA) ... ok
test_foreach_large_tensor_Adam_cuda_float16 (__main__.TestOptimRenewedCUDA) ... ok
test_foreach_large_tensor_NAdam_cuda_float16 (__main__.TestOptimRenewedCUDA) ... ok
test_foreach_large_tensor_RAdam_cuda_float16 (__main__.TestOptimRenewedCUDA) ... ok
test_foreach_large_tensor_RMSprop_cuda_float16 (__main__.TestOptimRenewedCUDA) ... ok
test_foreach_large_tensor_Rprop_cuda_float16 (__main__.TestOptimRenewedCUDA) ... ok
test_foreach_large_tensor_SGD_cuda_float16 (__main__.TestOptimRenewedCUDA) ... ok
test_foreach_mixed_device_dtype_ASGD_cuda_float32 (__main__.TestOptimRenewedCUDA) ... ok
test_foreach_mixed_device_dtype_Adadelta_cuda_float32 (__main__.TestOptimRenewedCUDA) ... ok
test_foreach_mixed_device_dtype_Adagrad_cuda_float32 (__main__.TestOptimRenewedCUDA) ... ok
test_foreach_mixed_device_dtype_AdamW_cuda_float32 (__main__.TestOptimRenewedCUDA) ... ok
test_foreach_mixed_device_dtype_Adam_cuda_float32 (__main__.TestOptimRenewedCUDA) ... ok
test_foreach_mixed_device_dtype_Adamax_cuda_float32 (__main__.TestOptimRenewedCUDA) ... ok
test_foreach_mixed_device_dtype_NAdam_cuda_float32 (__main__.TestOptimRenewedCUDA) ... ok
test_foreach_mixed_device_dtype_RAdam_cuda_float32 (__main__.TestOptimRenewedCUDA) ... ok
test_foreach_mixed_device_dtype_RMSprop_cuda_float32 (__main__.TestOptimRenewedCUDA) ... ok
test_foreach_mixed_device_dtype_Rprop_cuda_float32 (__main__.TestOptimRenewedCUDA) ... ok
test_foreach_mixed_device_dtype_SGD_cuda_float32 (__main__.TestOptimRenewedCUDA) ... ok
test_fused_large_tensor_AdamW_cuda_float16 (__main__.TestOptimRenewedCUDA) ... ok
test_fused_large_tensor_Adam_cuda_float16 (__main__.TestOptimRenewedCUDA) ... ok
test_fused_mixed_device_dtype_AdamW_cuda_float32 (__main__.TestOptimRenewedCUDA) ... ok
test_fused_mixed_device_dtype_Adam_cuda_float32 (__main__.TestOptimRenewedCUDA) ... ok
----------------------------------------------------------------------
Ran 50 tests in 50.785s
OK (skipped=25)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115025
Approved by: https://github.com/albanD
ghstack dependencies: #114802, #115023
Replaces the following:
```
(pytorch-3.10) [janeyx@devgpu023.odn1 ~/local/pytorch (1bbf1c6f)]$ python test/test_optim.py -k test_peak_mem_multi_tensor_optimizers
/home/janeyx/.conda/envs/pytorch-3.10/lib/python3.10/site-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.17.3 and <1.25.0 is required for this version of SciPy (detected version 1.26.0
warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
.
----------------------------------------------------------------------
Ran 1 test in 38.599s
OK
```
with 11 tests (one for each foreach optim :))
```
(pytorch-3.10) [janeyx@devgpu023.odn1 ~/local/pytorch (1bbf1c6f)]$ python test/test_optim.py -k TestOptimRenewedCUDA.test_foreach_memory
/home/janeyx/.conda/envs/pytorch-3.10/lib/python3.10/site-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.17.3 and <1.25.0 is required for this version of SciPy (detected version 1.26.0
warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
...........
----------------------------------------------------------------------
Ran 11 tests in 39.293s
OK
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115023
Approved by: https://github.com/albanD
ghstack dependencies: #114802
New tests look like:
```
(pytorch-3.10) [janeyx@devgpu023.odn1 ~/local/pytorch (af8fca04)]$ python test/test_optim.py -v -k TestOptimRenewedCUDA.test_fused
/home/janeyx/.conda/envs/pytorch-3.10/lib/python3.10/site-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.17.3 and <1.25.0 is required for this version of SciPy (detected version 1.26.0
warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
test_fused_AdamW_cuda_float64 (__main__.TestOptimRenewedCUDA) ... ok
test_fused_Adam_cuda_float64 (__main__.TestOptimRenewedCUDA) ... ok
----------------------------------------------------------------------
Ran 2 tests in 34.591s
OK
(pytorch-3.10) [janeyx@devgpu023.odn1 ~/local/pytorch (af8fca04)]$ python test/test_optim.py -v -k test_set_default_dtype_works_with_foreach
/home/janeyx/.conda/envs/pytorch-3.10/lib/python3.10/site-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.17.3 and <1.25.0 is required for this version of SciPy (detected version 1.26.0
warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
test_set_default_dtype_works_with_foreach_ASGD_cpu_float64 (__main__.TestOptimRenewedCPU) ... skipped 'Only runs on cuda'
...
test_set_default_dtype_works_with_foreach_ASGD_cuda_float64 (__main__.TestOptimRenewedCUDA) ... ok
test_set_default_dtype_works_with_foreach_Adadelta_cuda_float64 (__main__.TestOptimRenewedCUDA) ... ok
test_set_default_dtype_works_with_foreach_Adagrad_cuda_float64 (__main__.TestOptimRenewedCUDA) ... ok
test_set_default_dtype_works_with_foreach_AdamW_cuda_float64 (__main__.TestOptimRenewedCUDA) ... ok
test_set_default_dtype_works_with_foreach_Adam_cuda_float64 (__main__.TestOptimRenewedCUDA) ... ok
test_set_default_dtype_works_with_foreach_Adamax_cuda_float64 (__main__.TestOptimRenewedCUDA) ... ok
test_set_default_dtype_works_with_foreach_NAdam_cuda_float64 (__main__.TestOptimRenewedCUDA) ... ok
test_set_default_dtype_works_with_foreach_RAdam_cuda_float64 (__main__.TestOptimRenewedCUDA) ... ok
test_set_default_dtype_works_with_foreach_RMSprop_cuda_float64 (__main__.TestOptimRenewedCUDA) ... ok
test_set_default_dtype_works_with_foreach_Rprop_cuda_float64 (__main__.TestOptimRenewedCUDA) ... ok
test_set_default_dtype_works_with_foreach_SGD_cuda_float64 (__main__.TestOptimRenewedCUDA) ... ok
----------------------------------------------------------------------
Ran 22 tests in 32.915s
OK (skipped=11)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114802
Approved by: https://github.com/albanD
This PR aims for parity+ compared to the old testing for the simplest foreach test case.
Test coverage increase: we now test foreach optimizers on CPU as well as on GPU.
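For context, opting into the foreach implementation is just a constructor flag; a minimal, hedged example:
```
import torch

# foreach=True selects the multi-tensor (horizontally fused) implementation,
# which this PR now exercises on CPU in addition to CUDA.
params = [torch.randn(2, 3, requires_grad=True) for _ in range(4)]
for p in params:
    p.grad = torch.rand_like(p)
opt = torch.optim.Adam(params, lr=1e-3, foreach=True)
opt.step()
```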
Before:
```
(pytorch-3.10) [janeyx@devgpu023.odn1 ~/local/pytorch (19136605)]$ python test/test_optim.py -v -k test_multi_tensor_optimizers
/home/janeyx/.conda/envs/pytorch-3.10/lib/python3.10/site-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.17.3 and <1.25.0 is required for this version of SciPy (detected version 1.26.0
warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
test_multi_tensor_optimizers (optim.test_optim.TestOptim) ... ok
----------------------------------------------------------------------
Ran 1 test in 7.253s
OK
(pytorch-3.10) [janeyx@devgpu023.odn1 ~/local/pytorch (19136605)]$
```
Now, we get granular test cases at the cost of overhead!
```
(pytorch-3.10) [janeyx@devgpu023.odn1 ~/local/pytorch (19136605)]$ python test/test_optim.py -v -k test_foreach
/home/janeyx/.conda/envs/pytorch-3.10/lib/python3.10/site-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.17.3 and <1.25.0 is required for this version of SciPy (detected version 1.26.0
warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
test_foreach_ASGD_cpu_float64 (__main__.TestOptimRenewedCPU) ... ok
test_foreach_Adadelta_cpu_float64 (__main__.TestOptimRenewedCPU) ... ok
test_foreach_Adagrad_cpu_float64 (__main__.TestOptimRenewedCPU) ... ok
test_foreach_AdamW_cpu_float64 (__main__.TestOptimRenewedCPU) ... ok
test_foreach_Adam_cpu_float64 (__main__.TestOptimRenewedCPU) ... ok
test_foreach_Adamax_cpu_float64 (__main__.TestOptimRenewedCPU) ... ok
test_foreach_NAdam_cpu_float64 (__main__.TestOptimRenewedCPU) ... ok
test_foreach_RAdam_cpu_float64 (__main__.TestOptimRenewedCPU) ... ok
test_foreach_RMSprop_cpu_float64 (__main__.TestOptimRenewedCPU) ... ok
test_foreach_Rprop_cpu_float64 (__main__.TestOptimRenewedCPU) ... ok
test_foreach_SGD_cpu_float64 (__main__.TestOptimRenewedCPU) ... ok
test_foreach_ASGD_cuda_float64 (__main__.TestOptimRenewedCUDA) ... ok
test_foreach_Adadelta_cuda_float64 (__main__.TestOptimRenewedCUDA) ... ok
test_foreach_Adagrad_cuda_float64 (__main__.TestOptimRenewedCUDA) ... ok
test_foreach_AdamW_cuda_float64 (__main__.TestOptimRenewedCUDA) ... ok
test_foreach_Adam_cuda_float64 (__main__.TestOptimRenewedCUDA) ... ok
test_foreach_Adamax_cuda_float64 (__main__.TestOptimRenewedCUDA) ... ok
test_foreach_NAdam_cuda_float64 (__main__.TestOptimRenewedCUDA) ... ok
test_foreach_RAdam_cuda_float64 (__main__.TestOptimRenewedCUDA) ... ok
test_foreach_RMSprop_cuda_float64 (__main__.TestOptimRenewedCUDA) ... ok
test_foreach_Rprop_cuda_float64 (__main__.TestOptimRenewedCUDA) ... ok
test_foreach_SGD_cuda_float64 (__main__.TestOptimRenewedCUDA) ... ok
----------------------------------------------------------------------
Ran 22 tests in 30.954s
OK
(pytorch-3.10) [janeyx@devgpu023.odn1 ~/local/pytorch (19136605)]$
```
Why the increase in time?
Two reasons:
1. Overhead. Any _CUDA_ *Info test (OpInfo, ModuleInfo, OptimizerInfo) will wrap itself with the `CudaNonDefaultStream` policy, and `CudaNonDefaultStream.__enter__`, when called for the first time, will go through all visible CUDA devices and synchronize each of them, thus forcing the CUDA context to be initialized on each. Doing this for all 8 devices takes ~10-15s. Test parametrization costs a little overhead too, but not at the level of initializing CUDA contexts.
2. We test more! We now have 72 configs (in the foreach optimizer world), whereas we only had 59 before.
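As a rough illustration of reason 1 (this is not the actual `CudaNonDefaultStream` internals, just the expensive part):
```
import torch

# Synchronizing each visible device forces its CUDA context to be lazily
# initialized on first touch, which is where the ~10-15s goes on 8 devices.
if torch.cuda.is_available():
    for d in range(torch.cuda.device_count()):
        torch.cuda.synchronize(d)
```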
Next steps:
- consider adding more Tensor LR configs (like a Tensor LR without capturable in the single tensor case)
- this is likely the next PR or 2: migrate all uses of _test_derived_optimizers in test_optim to TestOptimRenewed
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114797
Approved by: https://github.com/albanD
Introduce OptimizerInfos + use them to refactor out the error testing.
Why OptimizerInfos?
- cleaner, easier way to test all configs of optimizers
- would plug in well with the device-type test framework to auto-enable tests for devices like MPS and meta
- would allow for more granular testing. currently, lots of functionality is tested in `_test_basic_cases` and some of that should be broken down more.
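To make that concrete, here is a hedged sketch of a parametrized test consuming an OptimizerInfo (names such as `optim_db`, `optims`, and `optim_inputs_func` reflect torch.testing._internal.common_optimizers as it exists today; treat the details as illustrative rather than the exact API from this PR):
```
import torch
from torch.testing._internal.common_device_type import instantiate_device_type_tests
from torch.testing._internal.common_optimizers import optim_db, optims
from torch.testing._internal.common_utils import TestCase, run_tests

class TestOptimRenewed(TestCase):
    # One decorated body runs once per optimizer in optim_db, per device/dtype.
    @optims(optim_db, dtypes=[torch.float32])
    def test_constructor_smoke(self, device, dtype, optim_info):
        # optim_info supplies the class plus representative kwargs per config.
        for optim_input in optim_info.optim_inputs_func(device=device):
            p = torch.randn(2, 3, device=device, dtype=dtype, requires_grad=True)
            optim_info.optim_cls([p], **optim_input.kwargs)

instantiate_device_type_tests(TestOptimRenewed, globals())

if __name__ == "__main__":
    run_tests()
```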
What did I do for error testing?
- I moved out some error cases from `_test_basic_cases` into a new test_errors parametrized test.
- The new test has to live in TestOptimRenewed (bikeshedding welcome) because the parametrized tests need to take in device and dtype and hook into the device-type test machinery correctly, and not all tests in TestOptim do that.
- TestOptimRenewed is also migrating to the top-level test/test_optim.py now, because importing TestOptimRenewed does not work (through test instantiation, TestOptimRenewed gets replaced with a per-device class, such as TestOptimRenewedCPU and TestOptimRenewedCUDA, for each available device).
Is there any change in test coverage?
- INCREASE: The error case where a single Parameter (vs a container of them) is passed in has now been expanded to all optims instead of only LBFGS (see the sketch below).
- DECREASE: Not much. The only thing is that we no longer test two of the error cases for both foreach=True AND foreach=False, which I think was redundant. (Highlighted in comments)
Possible but not urgent next step: test ALL possible error cases by going through all the constructors.
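Concretely, the expanded INCREASE case checks behavior like the following (the TypeError comes from the Optimizer base class):
```
import torch

param = torch.randn(2, 3, requires_grad=True)
try:
    torch.optim.Adam(param)  # a bare Tensor/Parameter, not an iterable
except TypeError as e:
    # "params argument given to the optimizer should be an iterable ..."
    print(e)
```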
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114178
Approved by: https://github.com/albanD
Context:
https://github.com/pytorch/pytorch/pull/47724 fixed the problem that SparseAdam could not handle generators by using the `list(...)` construct. However, this meant that SparseAdam deviated from other optimizers in that it could _accept_ a raw Tensor/Parameter instead of requiring a container of them. This is not really a big deal.
So why this PR?
I do think this PR is cleaner. It uses the fact that the Optimizer parent class already containerizes parameters into param_groups, so we can reuse that here by calling `super().__init__` first and then filtering the param_groups after. This change also makes SparseAdam consistent with the rest of our optimizers in that only containerized params are accepted, which technically is BC-breaking, SO I've added a deprecation warning that we should remove in May 2024.
(But is it really BC breaking when we've said in the docs that params should be an iterable this whole time? Maybe this is just a bug fix....😛)
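A hedged sketch of the containerize-then-filter idea (`SparseAdamLike` is a made-up name; this is not the landed diff):
```
import warnings
import torch
from torch.optim.optimizer import Optimizer

class SparseAdamLike(Optimizer):
    def __init__(self, params, lr=1e-3):
        # Deprecation path: accept a raw Tensor for now, but warn.
        if isinstance(params, torch.Tensor):
            warnings.warn(
                "Passing in a raw Tensor as ``params`` is deprecated; "
                "please wrap it in an iterable instead.",
                UserWarning,
            )
            params = [params]
        # Let the parent class containerize params into param_groups...
        super().__init__(params, defaults=dict(lr=lr))
        # ...then filter/validate self.param_groups here as needed.
```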
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114425
Approved by: https://github.com/drisspg
Fixes #107282
## Overview
- basic design decisions follow those made in #103881 (tensor operations, test cases, order & position of arguments, etc.)
- for the algorithm for decoupled weight decay, I referred to [1, 2]
## Backwards-incompatible changes
- positional argument `decoupled_weight_decay` is added to:
  - `torch.optim.radam`
Existing code that refers to these APIs can be affected.
Note: The positional argument `decoupled_weight_decay` is also added to `torch.optim.RAdam`; however, since it was added in the last position and with a default value, existing code is not affected.
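For intuition, a minimal sketch of what the flag toggles, simplified to a plain gradient step (the real RAdam update is adaptive), following [1]:
```
def coupled_wd_step(param, grad, lr, wd):
    # Classic L2 regularization: decay folded into the gradient.
    grad = grad + wd * param
    return param - lr * grad

def decoupled_wd_step(param, grad, lr, wd):
    # Decoupled weight decay: shrink the parameter directly, independent
    # of the gradient-based update.
    param = param * (1 - lr * wd)
    return param - lr * grad

# For this plain gradient step the two coincide; they diverge once the
# update is adaptive (per-coordinate scaling, as in Adam(W)/RAdam), which
# is exactly the setting [1] analyzes.
```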
## Reference
- [1] [Decoupled Weight Decay Regularization](https://arxiv.org/abs/1711.05101)
- [2] https://github.com/LiyuanLucasLiu/RAdam/blob/master/radam/radam.py#L5-L94
## TODO
- [x] implement tensor operation
- [x] implement test cases
- [x] modify doc-string
- [x] pass unit test code locally `python test/test_optim.py -k test_radam`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107507
Approved by: https://github.com/janeyx99
Moves the logic for casting state to match parameters into a hook, so that users can choose to run their hooks before or after the casting has happened.
With this, there is a bit of redundancy in the id_map building and in the check that the param groups are still aligned in length.
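For example, a user hook can now be ordered relative to the casting (a hedged example; `register_load_state_dict_pre_hook` is the public hook API on torch.optim.Optimizer):
```
import torch

param = torch.randn(2, 3, requires_grad=True)
opt = torch.optim.Adam([param])

def inspect_raw_state(optimizer, state_dict):
    # Runs on the checkpoint as loaded, before state is cast to match params.
    print(sorted(state_dict["param_groups"][0]))
    return state_dict  # returning a dict replaces the input

opt.register_load_state_dict_pre_hook(inspect_raw_state)
opt.load_state_dict(opt.state_dict())
```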
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106725
Approved by: https://github.com/albanD
Starts addressing #106802
This PR also conveniently does some BE:
- Fixes a bug in adamw where we used the top-level amsgrad instead of the per-group amsgrad
- Brings the impls of adamw and adam closer to correctness and to each other
I couldn't fully remove the .pyi's because mypy was going to complain about the entire files, which scared me and shouldn't go in this PR anyway.
Test plan:
- Add tests to ensure that lr can be passed as a Tensor
- Did some profiling of the below code (runs 1k iterations of step for Adam)
```
import torch

param = torch.rand(2, 3, dtype=torch.float, device='cuda:0', requires_grad=True)
param.grad = torch.rand_like(param)
lr = torch.tensor(.001, device='cuda:0')
opt = torch.optim.Adam([param], lr=lr, fused=True)

with torch.profiler.profile(
    activities=[
        torch.profiler.ProfilerActivity.CPU,
        torch.profiler.ProfilerActivity.CUDA,
    ]
) as p:
    for _ in range(1000):
        opt.step()

print(p.key_averages().table(sort_by="cpu_time_total"))
```
Before my change:
<img width="1381" alt="image" src="https://github.com/pytorch/pytorch/assets/31798555/cfc5175a-0f41-4829-941f-342554f3b152">
After my change (notice there are no d2h syncs and the CPU time is lower!):

Long-term next steps:
- have all capturable foreach + forloop impls in Adam(W) handle tensor LR
- have all capturable impls handle tensor LR
- have all impls handle tensor LR
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106916
Approved by: https://github.com/albanD
Forward fixes https://github.com/pytorch/pytorch/pull/106615 by increasing tolerance in the test.
The capturable implementation for foreach simply differs due to a different order of operations when updating params. I had also attempted to compare against fp64, but that introduced more disparity in the other optimizer configs. It is worth trying the fp64 comparison at a later point, but let's get the test passing first.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106887
Approved by: https://github.com/izaitsevfb
This PR:
- adds a capturable API for NAdam similar to Adam(W)
- adds tests accordingly
- discovered and fixed bugs in the differentiable implementation (now tested through the capturable codepath).
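A usage sketch of the new flag (assumes a CUDA device is available):
```
import torch

# capturable=True keeps step/state on-device so the NAdam update can run
# inside a CUDA graph capture instead of syncing back to the CPU.
params = [torch.randn(2, 3, device="cuda", requires_grad=True)]
opt = torch.optim.NAdam(params, lr=1e-3, capturable=True)
```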
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106615
Approved by: https://github.com/albanD
The one thing we do still deepcopy is the param_groups, which is much lighter weight. This should also save memory when loading from a checkpoint.
The deepcopy was introduced in ecfcf39f30, but module.py had only a shallow copy at that point, so it did not actually bring parity.
Incorporates an XLA fix, which is why I'm updating the pin to ca5eab87a7
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106082
Approved by: https://github.com/albanD, https://github.com/Skylion007