pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
Xuehai Pan	5cedc5a0ff	[BE][PYFMT] migrate PYFMT for `torch/[p-z]*/` to `ruff format` (#144552 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144552 Approved by: https://github.com/ezyang	2025-08-07 00:09:56 +00:00
Roy Hvaara	b95dadd717	[MPS] Enable RProp test for non-contiguous (#155439 ) I believe this issue has already been fixed, but I don't know the hero PR. I'm relying on ci signals to verify it's fixed across macOS versions. Fixes #118117 xref #115350 Pull Request resolved: https://github.com/pytorch/pytorch/pull/155439 Approved by: https://github.com/Skylion007	2025-06-09 21:29:09 +00:00
Roy Hvaara	3490a4f906	[MPS] Enable optimizer tests affected by addcdiv (#155437 ) Tracked in #118115. Fixed in #124442. This PR unskips the tests. xref #115350 Pull Request resolved: https://github.com/pytorch/pytorch/pull/155437 Approved by: https://github.com/Skylion007	2025-06-09 21:27:37 +00:00
cyy	d473c212fd	Remove code for Python < 3.9 (#147097 ) Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/147097 Approved by: https://github.com/albanD	2025-02-14 03:22:49 +00:00
Aaron Orenstein	dea7ad3371	PEP585 update - torch/testing (#145200 ) See #145101 for details. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145200 Approved by: https://github.com/bobrenjc93	2025-01-20 22:42:42 +00:00
bobrenjc93	3b6b306b71	Migrate from Tuple -> tuple in torch/testing (#144256 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144256 Approved by: https://github.com/aorenste	2025-01-10 06:37:55 +00:00
Aaron Gokaslan	e4a05dec0f	[BE][Ez]: Fix docs recommending inefficient tensor op order (#144270 ) `detach().clone()` is faster than `.clone().detatch()` since the gradients are not cloned. Let's update all the documentation and tests so that users do not use the inefficient op ordering. Pull Request resolved: https://github.com/pytorch/pytorch/pull/144270 Approved by: https://github.com/awgu, https://github.com/XuehaiPan	2025-01-07 17:31:32 +00:00
Michael Lazos	1fd4757fdc	Support tensor betas in Adam and AdamW (#134171 ) Adds support for beta1 and beta2 to be wrapped in tensor for Adam and AdamW. Fixes https://github.com/pytorch/pytorch/issues/133898 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134171 Approved by: https://github.com/janeyx99	2024-11-15 21:55:55 +00:00
Nikita Shulga	9c88b08ac9	[BE] Replace `skipIfMPS` with `expectedFailureMPS` (#139940 ) Functionally two decorators are very similar, but one should rely on expectedFailure as much as possible to get signal when something is fixed. - Move `product_version` variable from `test_mps` to common_utils, but call it `MACOS_VERSION` - Introduce `skipIfMPSOnMacOS13` to decorate the hard crashes that happens only on MacOS13 (which at this point will not get any fixes and will be deprecated soon) - Add `device_type='mps'` to all `skipIfMPS` per https://github.com/pytorch/pytorch/issues/140560 Pull Request resolved: https://github.com/pytorch/pytorch/pull/139940 Approved by: https://github.com/janeyx99, https://github.com/huydhn	2024-11-15 03:48:37 +00:00
Nikita Shulga	0f739b8f66	[Codemod] `skipIfMps`->`skipIfMPS` (#140562 ) As `MPS` is an acronym that stands for Metal Performance Shaders Also to closer align with `skipCUDAIf` not `skipCudaIf` Pull Request resolved: https://github.com/pytorch/pytorch/pull/140562 Approved by: https://github.com/ZainRizvi, https://github.com/r-barnes	2024-11-13 19:45:08 +00:00
zeshengzong	cb71bcc542	Replace clone.detach with detach.clone (#140264 ) Fixes #64532 As state in issue, replace `clone.detach` by `detach.clone` Pull Request resolved: https://github.com/pytorch/pytorch/pull/140264 Approved by: https://github.com/soulitzer	2024-11-13 07:01:02 +00:00
ErezYosef	197601eeea	Add Support for Tracking Parameter Names (named_parameters) in Optimizer State Dict (#134107 ) A proposal addressing Issue #1489: Optimizer should track parameter names and not id. (also mentioned in here: [[RFC] Introducing FQNs/clarity eyeglasses to optim state_dict](https://dev-discuss.pytorch.org/t/rfc-introducing-fqns-clarity-to-optim-state-dict/1552) ## Summary This PR introduces a backward-compatible enhancement where optimizers track parameter names instead of just their id. Optimizers can be initialized with `named_parameters()` as: ```python optimizer = optim.SGD(model.named_parameters(), lr=0.01, momentum=0.9) ``` This allows for greater clarity and ease when handling optimizers, as the parameters' names are preserved within the optimizer’s `state_dict` as: ``` state_dict = { 'state': { 0: {'momentum_buffer': tensor(...), ...}, 1: {'momentum_buffer': tensor(...), ...}, }, 'param_groups': [ { 'lr': 0.01, 'weight_decay': 0, ... 'params': [0,1] 'param_names' ['layer.weight', 'layer.bias'] (optional) } ] } ``` Loading `state_dict` is not changed (backward-compatible) and the `param_names` key will be ignored. ## Key Features #### Named Parameters in Optimizer Initialization: Optimizers can accept the output of `model.named_parameters()` during initialization, allowing them to store parameter names directly. #### Parameter Names in `state_dict`: The parameter names are saved as a list in the optimizer’s `state_dict` with key `param_names`, alongside the `params` indices, ensuring seamless tracking of both names and parameters. ## Backward Compatibility #### No Breaking Changes: This change is fully backward-compatible. The added `param_names` key in the optimizer's `state_dict` is ignored when loading a state to the optimizer. #### Customization with Hooks: For more control, the loaded state_dict can be modified using a custom `register_load_state_dict_pre_hook`, providing flexibility for different design needs. ## Documentation Updates Please refer to the documentation changes for more details on how this feature is implemented and how it can be used effectively. ## Solution Example: A suggested solution to the problem mentioned in #1489, for the same parameters but in a different order. The following `register_load_state_dict_pre_hook` should be added to the optimizer before loading to enable loading the state dict : ```python def adapt_state_dict_ids(optimizer, state_dict): # assuming a single param group. current_state_group = optimizer.state_dict()['param_groups'][0] loaded_state_group = state_dict['param_groups'][0] # same number of params, same names, only different ordering current_state_name_to_id_mapping = {} # mapping -- param_name: id for i, name in enumerate(current_state_group['param_names']): current_state_name_to_id_mapping[name] = current_state_group['params'][i] # changing the ids of the loaded state dict to match the order of the given state dict. for i, name in enumerate(current_state_group['param_names']): loaded_state_group['params'][i] = current_state_name_to_id_mapping[name] return state_dict ``` In this code, the loaded `state_dict` ids are adapted to match the order of the current optimizer `state_dict`. Both the previous and the current optimizers are required to be initiated with `named_parameters()` to have the 'param_names' key in the dict. ### Note This is my first contribution to PyTorch, and I wish to receive feedback or suggestions for improvement. Pull Request resolved: https://github.com/pytorch/pytorch/pull/134107 Approved by: https://github.com/janeyx99 Co-authored-by: Jane (Yuan) Xu <31798555+janeyx99@users.noreply.github.com>	2024-10-14 19:24:44 +00:00
Chien-Lin Chen	40de63be09	parameterized test_graph_optims and test_graph_scaling_fused_optimizers (#133749 ) Fixes #123451 This is a rework of a reverted pull request, https://github.com/pytorch/pytorch/pull/125127. The test failure is fixed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/133749 Approved by: https://github.com/janeyx99	2024-08-28 16:34:06 +00:00
Li, Xingyuan	dcfa415e6e	[Inductor UT] Reuse inductor UT for intel GPU `test/inductor/test_compiled_optimizers.py` (#133083 ) [Inductor UT] Reuse Inductor test case for Intel GPU. Reuse `test/inductor/test_compiled_optimizers.py` Pull Request resolved: https://github.com/pytorch/pytorch/pull/133083 Approved by: https://github.com/etaf, https://github.com/jansel, https://github.com/mlazos	2024-08-17 01:15:26 +00:00
Jane Xu	c23dceb8f1	Add Adafactor foreach impl (#132336 ) This PR adds the foreach impl for Adafactor knowing that there are many ways to improve its runtime perf today (by adding more foreach support). After this PR: - we have a foreach flag for Adafactor - It is NOT the default. Why not? It is only slightly faster + uses O(n) more memory where n is the number of params in your max param group. People tend to use Adafactor for memory efficiency. Next steps: - make torch.compile possible on it - make it faster (by adding more foreach apis) Pull Request resolved: https://github.com/pytorch/pytorch/pull/132336 Approved by: https://github.com/albanD ghstack dependencies: #133360	2024-08-15 17:00:33 +00:00
Jane Xu	9c4cf866c2	Adafactor forloop basic impl (#129905 ) #109581 At this point, the vanilla implementation (the default) is good. Docs: https://docs-preview.pytorch.org/pytorch/pytorch/129905/generated/torch.optim.Adafactor.html#torch.optim.Adafactor Specifically, the impl in this PR, which attempts to replicate the paper, ``` optim = torch.optim.Adafactor([weight]) ``` is close enough to https://pytorch-optimizers.readthedocs.io/en/latest/optimizer/#pytorch_optimizer.AdaFactor ``` optim_c = AdaFactor([weight], betas=(0, 0.999), scale_parameter=False) ``` is close enough to https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/Adafactor ``` optim = keras.optimizers.Adafactor(learning_rate=0.01) ``` The three results respectively for the same randomly generated weights: ``` # ours tensor([[ 0.3807594, -0.3912092], [ 0.0762539, 0.5377805], [ 0.2459473, 0.4662207]]) # pytorch-optimizer tensor([[ 0.3807592, -0.3912172], [ 0.0762507, 0.5377818], [ 0.2459457, 0.4662213]]) # keras array([[ 0.38076326, -0.39121315], [ 0.0762547 , 0.5377859 ], [ 0.24594972, 0.46622536]], dtype=float32) ``` This gives me confidence to move forward in speeding up the implementation now that a baseline has been established. If you're curious about differences: * keras assigns step_size (rho_t in their code) to `min(lr, 1 / sqrt(step)` whereas the OG impl uses a hardcoded 0.01 instead of lr. We do the same thing as keras, but our lr default is 0.01. * We differ from the pytorch-optimizers default in that our default will not track momentum (thus `beta1=0`) and we do not apply parameter scaling. <details> Keras collab: https://colab.research.google.com/drive/1i3xF8ChL7TWKJGV_5v_5nMhXKnYmQQ06?usp=sharing My script repro: ``` import torch from pytorch_optimizer import AdaFactor torch.set_printoptions(precision=7) weight = torch.tensor([[ 0.37697506, -0.39500135], [ 0.07246649, 0.53399765], [ 0.24216151, 0.46243715]], dtype=torch.float32) # bias = torch.tensor([0, 0], dtype=torch.float32) weight.grad = torch.tensor([[-0.5940447, -0.7743838], [-0.5940447, -0.7743838], [-0.5940447, -0.7743838]], dtype=torch.float32) # bias.grad = torch.tensor([-2.5027974, 1.5422692], dtype=torch.float32) weight_c = weight.clone() weight_c.grad = weight.grad.clone() optim = torch.optim.Adafactor([weight]) optim.step() print(weight) optim_c = AdaFactor([weight_c], betas=(0, 0.999), scale_parameter=False) optim_c.step() print(weight_c) ``` <details> Pull Request resolved: https://github.com/pytorch/pytorch/pull/129905 Approved by: https://github.com/albanD	2024-07-25 13:17:19 +00:00
Li-Huai (Allan) Lin	99d9b369f4	[Optim] Support tensor lr for all optimizers and check it is 1-element (#131065 ) Fixes: #130980 Pull Request resolved: https://github.com/pytorch/pytorch/pull/131065 Approved by: https://github.com/janeyx99	2024-07-23 04:27:05 +00:00
Jovian Anthony Jaison	e57101d927	Add testing regarding SparseAdam state_dicts (#130645 ) Summary: - Updated SparseAdam to run test_state_dict_deterministic unit test. - Made gradients sparse while keeping weights dense in the above test. Test Plan: - Ran test_optim.py locally. Fixes #116507 Co-authored-by: Jane (Yuan) Xu <31798555+janeyx99@users.noreply.github.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/130645 Approved by: https://github.com/janeyx99	2024-07-16 11:29:22 +00:00
Xuehai Pan	973037be6a	[BE][Easy] apply autofix for ruff rules unnecessary-collection-call (C408): `list()` / `tuple()` / `dict()` (#130199 ) This PR changes the empty collection factory call to Python literals: - `list()` -> `[]` - `tuple()` -> `()` - `dict()` -> `{}` The Python literals are more performant and safer. For example, the bytecode for building an empty dictionary: ```bash $ python3 -m dis - <<EOS import collections d1 = {} d2 = dict() dict = collections.OrderedDict d3 = dict() EOS ``` ```text 0 0 RESUME 0 1 2 LOAD_CONST 0 (0) 4 LOAD_CONST 1 (None) 6 IMPORT_NAME 0 (collections) 8 STORE_NAME 0 (collections) 3 10 BUILD_MAP 0 12 STORE_NAME 1 (d1) 4 14 PUSH_NULL 16 LOAD_NAME 2 (dict) 18 CALL 0 26 STORE_NAME 3 (d2) 6 28 LOAD_NAME 0 (collections) 30 LOAD_ATTR 8 (OrderedDict) 50 STORE_NAME 2 (dict) 7 52 PUSH_NULL 54 LOAD_NAME 2 (dict) 56 CALL 0 64 STORE_NAME 5 (d3) 66 RETURN_CONST 1 (None) ``` The dict literal `{}` only has one bytecode `BUILD_MAP`, while the factory call `dict()` has three `PUSH_NULL + LOAD_NAME + CALL`. Also, the factory call is not safe if users override the `dict` name in `locals` or `globals` (see the example of replacing with `OrderedDict` above). Pull Request resolved: https://github.com/pytorch/pytorch/pull/130199 Approved by: https://github.com/malfet	2024-07-11 17:30:28 +00:00
Li-Huai (Allan) Lin	d62d351107	[Optim][BE] Change str(device) to _get_device_type(device) (#129984 ) Prevent using vague expressions like `"cuda" in str(device)` Pull Request resolved: https://github.com/pytorch/pytorch/pull/129984 Approved by: https://github.com/janeyx99 ghstack dependencies: #129451, #129552	2024-07-04 06:44:48 +00:00
Li-Huai (Allan) Lin	8ec5ba960f	[MPS] Add tensor_lr overloads to fused adam & adamw (#129451 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/129451 Approved by: https://github.com/janeyx99	2024-07-02 19:46:30 +00:00
Li-Huai (Allan) Lin	84ad5452f6	[MPS] Fused SGD optimizer (#129350 ) ``` [-------------------------------------- Fused SGD --------------------------------------] \| Fused: True \| Fused: False 1 threads: ------------------------------------------------------------------------------ numel: 1024, num_tensors: 100, momentum: True \| 2 \| 15 numel: 1024, num_tensors: 100, momentum: False \| 2 \| 5 numel: 65536, num_tensors: 100, momentum: True \| 3 \| 16 numel: 65536, num_tensors: 100, momentum: False \| 2 \| 5 numel: 1048576, num_tensors: 100, momentum: True \| 11 \| 16 numel: 1048576, num_tensors: 100, momentum: False \| 8 \| 6 numel: 1024, num_tensors: 500, momentum: True \| 29 \| 70 numel: 1024, num_tensors: 500, momentum: False \| 20 \| 24 numel: 65536, num_tensors: 500, momentum: True \| 33 \| 76 numel: 65536, num_tensors: 500, momentum: False \| 22 \| 26 numel: 1048576, num_tensors: 500, momentum: True \| 70 \| 80 numel: 1048576, num_tensors: 500, momentum: False \| 43 \| 40 numel: 1024, num_tensors: 1000, momentum: True \| 108 \| 139 numel: 1024, num_tensors: 1000, momentum: False \| 72 \| 48 numel: 65536, num_tensors: 1000, momentum: True \| 116 \| 150 numel: 65536, num_tensors: 1000, momentum: False \| 77 \| 52 numel: 1048576, num_tensors: 1000, momentum: True \| 190 \| 170 numel: 1048576, num_tensors: 1000, momentum: False \| 120 \| 50 ``` ```python def profile_fused_sgd(): from torch.optim.sgd import sgd import torch.utils.benchmark as benchmark import itertools def profile(fn, params, grads, momentum_buffer_list, fused): fn( params, grads, momentum_buffer_list, momentum=True if len(momentum_buffer_list) > 0 else False, dampening=0.0, nesterov=False, foreach=False, fused=fused, lr=1e-3, weight_decay=.0, maximize=False, grad_scale=None, found_inf=None, ) torch.mps.synchronize() device = "mps" results = [] for num_tensors, numel, momentum in itertools.product([100, 500, 1000], [1024, 65536, 1048576], [True, False]): sublabel = f"numel: {numel}, num_tensors: {num_tensors}, momentum: {momentum}" print(sublabel) params, grads = [[torch.arange(numel, dtype=torch.float32, device=device) + (numel * i) for i in range(num_tensors)] for _ in range(2)] momentum_buffer_list = [torch.arange(numel, dtype=torch.float32, device=device) + (numel * i) for i in range(num_tensors)] if momentum else [] fn = sgd for fused in [True, False]: t = benchmark.Timer( stmt='profile(fn, params, grads, momentum_buffer_list, fused)', label='Fused SGD', sub_label=sublabel, globals=locals(), description= f"Fused: {fused}", ).blocked_autorange(min_run_time=5) results.append(t) compare = benchmark.Compare(results) compare.trim_significant_figures() compare.colorize(rowwise=True) compare.print() ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/129350 Approved by: https://github.com/janeyx99 ghstack dependencies: #129006, #129008, #129007, #129105	2024-06-27 04:37:14 +00:00
Li-Huai (Allan) Lin	9a7e2519d3	[MPS] Fused Adam & AdamW (#127242 ) Summary: This PR adds fused Adam and AdamW implementations. Benchmark on Macbook Pro with M1 Max chip and 64GB unified memory: Fast math enabled: ``` [---------------------------------------------- Fused Adam ----------------------------------------------] \| Fused: True \| Fused: False 1 threads: ----------------------------------------------------------------------------------------------- amsgrad: True, adamWflag: True, numel: 1024, num_tensors: 100 \| 10 \| 100 amsgrad: False, adamWflag: True, numel: 1024, num_tensors: 100 \| 9 \| 89 amsgrad: True, adamWflag: False, numel: 1024, num_tensors: 100 \| 9 \| 90 amsgrad: False, adamWflag: False, numel: 1024, num_tensors: 100 \| 9 \| 83 amsgrad: True, adamWflag: True, numel: 65536, num_tensors: 100 \| 12 \| 94 amsgrad: False, adamWflag: True, numel: 65536, num_tensors: 100 \| 11 \| 88 amsgrad: True, adamWflag: False, numel: 65536, num_tensors: 100 \| 12 \| 90 amsgrad: False, adamWflag: False, numel: 65536, num_tensors: 100 \| 11 \| 100 amsgrad: True, adamWflag: True, numel: 1048576, num_tensors: 100 \| 27 \| 100 amsgrad: False, adamWflag: True, numel: 1048576, num_tensors: 100 \| 23 \| 100 amsgrad: True, adamWflag: False, numel: 1048576, num_tensors: 100 \| 27 \| 100 amsgrad: False, adamWflag: False, numel: 1048576, num_tensors: 100 \| 23 \| 98 amsgrad: True, adamWflag: True, numel: 1024, num_tensors: 500 \| 82 \| 480 amsgrad: False, adamWflag: True, numel: 1024, num_tensors: 500 \| 72 \| 450 amsgrad: True, adamWflag: False, numel: 1024, num_tensors: 500 \| 82 \| 450 amsgrad: False, adamWflag: False, numel: 1024, num_tensors: 500 \| 73 \| 420 amsgrad: True, adamWflag: True, numel: 65536, num_tensors: 500 \| 91 \| 500 amsgrad: False, adamWflag: True, numel: 65536, num_tensors: 500 \| 83 \| 400 amsgrad: True, adamWflag: False, numel: 65536, num_tensors: 500 \| 94 \| 500 amsgrad: False, adamWflag: False, numel: 65536, num_tensors: 500 \| 78 \| 400 amsgrad: True, adamWflag: True, numel: 1048576, num_tensors: 500 \| 170 \| 500 amsgrad: False, adamWflag: True, numel: 1048576, num_tensors: 500 \| 140 \| 600 amsgrad: True, adamWflag: False, numel: 1048576, num_tensors: 500 \| 170 \| 600 amsgrad: False, adamWflag: False, numel: 1048576, num_tensors: 500 \| 140 \| 500 amsgrad: True, adamWflag: True, numel: 1024, num_tensors: 1000 \| 250 \| 890 amsgrad: False, adamWflag: True, numel: 1024, num_tensors: 1000 \| 220 \| 850 amsgrad: True, adamWflag: False, numel: 1024, num_tensors: 1000 \| 250 \| 830 amsgrad: False, adamWflag: False, numel: 1024, num_tensors: 1000 \| 220 \| 770 amsgrad: True, adamWflag: True, numel: 65536, num_tensors: 1000 \| 270 \| 870 amsgrad: False, adamWflag: True, numel: 65536, num_tensors: 1000 \| 230 \| 840 amsgrad: True, adamWflag: False, numel: 65536, num_tensors: 1000 \| 270 \| 810 amsgrad: False, adamWflag: False, numel: 65536, num_tensors: 1000 \| 240 \| 800 amsgrad: True, adamWflag: True, numel: 1048576, num_tensors: 1000 \| 400 \| 1000 amsgrad: False, adamWflag: True, numel: 1048576, num_tensors: 1000 \| 360 \| 2000 amsgrad: True, adamWflag: False, numel: 1048576, num_tensors: 1000 \| 430 \| 2000 amsgrad: False, adamWflag: False, numel: 1048576, num_tensors: 1000 \| 360 \| 1300 Times are in milliseconds (ms). ``` Fast math disabled: ``` [---------------------------------------------- Fused Adam ----------------------------------------------] \| Fused: True \| Fused: False 1 threads: ----------------------------------------------------------------------------------------------- amsgrad: True, adamWflag: True, numel: 1024, num_tensors: 100 \| 10 \| 100 amsgrad: False, adamWflag: True, numel: 1024, num_tensors: 100 \| 9 \| 84 amsgrad: True, adamWflag: False, numel: 1024, num_tensors: 100 \| 9 \| 84 amsgrad: False, adamWflag: False, numel: 1024, num_tensors: 100 \| 9 \| 79 amsgrad: True, adamWflag: True, numel: 65536, num_tensors: 100 \| 11 \| 93 amsgrad: False, adamWflag: True, numel: 65536, num_tensors: 100 \| 10 \| 90 amsgrad: True, adamWflag: False, numel: 65536, num_tensors: 100 \| 11 \| 91 amsgrad: False, adamWflag: False, numel: 65536, num_tensors: 100 \| 11 \| 81 amsgrad: True, adamWflag: True, numel: 1048576, num_tensors: 100 \| 34 \| 100 amsgrad: False, adamWflag: True, numel: 1048576, num_tensors: 100 \| 31 \| 100 amsgrad: True, adamWflag: False, numel: 1048576, num_tensors: 100 \| 34 \| 95 amsgrad: False, adamWflag: False, numel: 1048576, num_tensors: 100 \| 31 \| 100 amsgrad: True, adamWflag: True, numel: 1024, num_tensors: 500 \| 94 \| 500 amsgrad: False, adamWflag: True, numel: 1024, num_tensors: 500 \| 82 \| 430 amsgrad: True, adamWflag: False, numel: 1024, num_tensors: 500 \| 92 \| 430 amsgrad: False, adamWflag: False, numel: 1024, num_tensors: 500 \| 81 \| 390 amsgrad: True, adamWflag: True, numel: 65536, num_tensors: 500 \| 98 \| 500 amsgrad: False, adamWflag: True, numel: 65536, num_tensors: 500 \| 88 \| 430 amsgrad: True, adamWflag: False, numel: 65536, num_tensors: 500 \| 100 \| 500 amsgrad: False, adamWflag: False, numel: 65536, num_tensors: 500 \| 88 \| 400 amsgrad: True, adamWflag: True, numel: 1048576, num_tensors: 500 \| 210 \| 500 amsgrad: False, adamWflag: True, numel: 1048576, num_tensors: 500 \| 190 \| 610 amsgrad: True, adamWflag: False, numel: 1048576, num_tensors: 500 \| 210 \| 510 amsgrad: False, adamWflag: False, numel: 1048576, num_tensors: 500 \| 190 \| 500 amsgrad: True, adamWflag: True, numel: 1024, num_tensors: 1000 \| 300 \| 900 amsgrad: False, adamWflag: True, numel: 1024, num_tensors: 1000 \| 260 \| 850 amsgrad: True, adamWflag: False, numel: 1024, num_tensors: 1000 \| 295 \| 900 amsgrad: False, adamWflag: False, numel: 1024, num_tensors: 1000 \| 260 \| 800 amsgrad: True, adamWflag: True, numel: 65536, num_tensors: 1000 \| 320 \| 910 amsgrad: False, adamWflag: True, numel: 65536, num_tensors: 1000 \| 280 \| 900 amsgrad: True, adamWflag: False, numel: 65536, num_tensors: 1000 \| 320 \| 900 amsgrad: False, adamWflag: False, numel: 65536, num_tensors: 1000 \| 300 \| 900 amsgrad: True, adamWflag: True, numel: 1048576, num_tensors: 1000 \| 500 \| 2000 amsgrad: False, adamWflag: True, numel: 1048576, num_tensors: 1000 \| 480 \| 2000 amsgrad: True, adamWflag: False, numel: 1048576, num_tensors: 1000 \| 540 \| 1500 amsgrad: False, adamWflag: False, numel: 1048576, num_tensors: 1000 \| 480 \| 1200 Times are in milliseconds (ms). ``` ```python def profile_fused_adam(): from torch.optim import adam, adamw import torch.utils.benchmark as benchmark import itertools def profile(fn, params, grads, exp_avgs, exp_avg_sqs, max_exp_avg_sqs, state_steps, amsgrad, fused): fn( params, grads, exp_avgs, exp_avg_sqs, max_exp_avg_sqs, state_steps, foreach=False, capturable=False, fused=fused, amsgrad=amsgrad, beta1=0.9, beta2=0.99, lr=1e-3, weight_decay=.0, eps=1e-5, maximize=False, grad_scale=None, found_inf=None, ) torch.mps.synchronize() device = "mps" results = [] for num_tensors, numel, adamWflag, amsgrad in itertools.product([100, 500, 1000], [1024, 65536, 1048576], [True, False], [True, False]): print(f"amsgrad: {amsgrad}, adamWflag: {adamWflag}, numel: {numel}, num_tensors: {num_tensors}") params, grads, exp_avgs, exp_avg_sqs = [[torch.arange(numel, dtype=torch.float32, device=device) + (numel * i) for i in range(num_tensors)] for _ in range(4)] max_exp_avg_sqs = [torch.arange(numel, dtype=torch.float32, device=device) for _ in range(num_tensors)] if amsgrad else [] state_steps = [torch.tensor([5], dtype=torch.float32, device=device) for _ in range(num_tensors)] if adamWflag: fn = adamw.adamw else: fn = adam.adam for fused in [True, False]: t = benchmark.Timer( stmt='profile(fn, params, grads, exp_avgs, exp_avg_sqs, max_exp_avg_sqs, state_steps, amsgrad, fused)', label='Fused Adam', sub_label=f"amsgrad: {amsgrad}, adamWflag: {adamWflag}, numel: {numel}, num_tensors: {num_tensors}", globals=locals(), description= f"Fused: {fused}", ).blocked_autorange(min_run_time=5) results.append(t) compare = benchmark.Compare(results) compare.trim_significant_figures() compare.colorize(rowwise=True) compare.print() ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/127242 Approved by: https://github.com/kulinseth, https://github.com/janeyx99	2024-06-18 19:59:50 +00:00
Michael Lazos	a61939467a	Enable passing dynamo-traced complex test (#128771 ) Fixes https://github.com/pytorch/pytorch/issues/118159 Pull Request resolved: https://github.com/pytorch/pytorch/pull/128771 Approved by: https://github.com/anijain2305	2024-06-16 07:28:09 +00:00
Michael Lazos	638f543ac2	Enable single nadam test (#128087 ) https://github.com/pytorch/pytorch/issues/117150 has been fixed Pull Request resolved: https://github.com/pytorch/pytorch/pull/128087 Approved by: https://github.com/xmfan	2024-06-06 06:25:00 +00:00
cyy	d44daebdbc	[Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/127051 Approved by: https://github.com/cpuhrsch, https://github.com/malfet	2024-05-31 01:20:45 +00:00
feifan	da9fb670d2	Nadam support the flag for "maximize" (#127214 ) Fixes https://github.com/pytorch/pytorch/issues/126642 Pull Request resolved: https://github.com/pytorch/pytorch/pull/127214 Approved by: https://github.com/janeyx99	2024-05-31 01:11:16 +00:00
SandishKumarHN	da39461d61	[optim] Move test_grad_scaling_autocast_fused_optimizers to test_cuda.py (#126418 ) this PR address the comments in this PR #124904 - Move test_grad_scaling_autocast_fused_optimizers to test_cuda.py - Combine _grad_scaling_autocast_fused_optimizers into test_grad_scaling_autocast_fused_optimizers - Move to OptimizerInfo framework. - For failing tests test_grad_scaling_autocast_fused_optimizers AdamW_cuda_float32, Adam_cuda_float32 - Added toleranceOverride in this PR - created a issue #127000 ``` > (c2env) [sandish@devgpu166.ash6 ~/pytorch (refactoroptimizers)]$ python test/test_cuda.py -k test_grad_scaling_autocast_fused_optimizers -v /home/sandish/pytorch/torch/backends/cudnn/__init__.py:106: UserWarning: PyTorch was compiled without cuDNN/MIOpen support. To use cuDNN/MIOpen, rebuild PyTorch making sure the library is visible to the build system. warnings.warn( /home/sandish/pytorch/torch/backends/cudnn/__init__.py:106: UserWarning: PyTorch was compiled without cuDNN/MIOpen support. To use cuDNN/MIOpen, rebuild PyTorch making sure the library is visible to the build system. warnings.warn( test_grad_scaling_autocast_fused_optimizers_Adagrad_cpu_float32 (__main__.TestCudaOptimsCPU) ... {'fused': True} {'fused': True} {'weight_decay': 0.1, 'fused': True} {'weight_decay': 0.1, 'fused': True} {'weight_decay': 0.1, 'maximize': True, 'fused': True} {'weight_decay': 0.1, 'maximize': True, 'fused': True} {'lr': 0.1, 'fused': True} {'lr': 0.1, 'fused': True} {'initial_accumulator_value': 0.1, 'weight_decay': 0.1, 'fused': True} {'initial_accumulator_value': 0.1, 'weight_decay': 0.1, 'fused': True} {'lr': 0.1, 'lr_decay': 0.5, 'weight_decay': 0.1, 'fused': True} {'lr': 0.1, 'lr_decay': 0.5, 'weight_decay': 0.1, 'fused': True} {'lr': tensor(0.0010), 'fused': True} {'lr': tensor(0.0010), 'fused': True} ok test_grad_scaling_autocast_fused_optimizers_AdamW_cpu_float32 (__main__.TestCudaOptimsCPU) ... {'fused': True} {'fused': True} {'lr': 0.01, 'fused': True} {'lr': 0.01, 'fused': True} {'weight_decay': 0.1, 'fused': True} {'weight_decay': 0.1, 'fused': True} {'weight_decay': 0.1, 'maximize': True, 'fused': True} {'weight_decay': 0.1, 'maximize': True, 'fused': True} {'weight_decay': 0.1, 'amsgrad': True, 'fused': True} {'weight_decay': 0.1, 'amsgrad': True, 'fused': True} ok test_grad_scaling_autocast_fused_optimizers_Adam_cpu_float32 (__main__.TestCudaOptimsCPU) ... {'fused': True} {'fused': True} {'lr': 0.01, 'fused': True} {'lr': 0.01, 'fused': True} {'weight_decay': 0.1, 'fused': True} {'weight_decay': 0.1, 'fused': True} {'weight_decay': 0.1, 'maximize': True, 'fused': True} {'weight_decay': 0.1, 'maximize': True, 'fused': True} {'weight_decay': 0.1, 'amsgrad': True, 'fused': True} {'weight_decay': 0.1, 'amsgrad': True, 'fused': True} ok test_grad_scaling_autocast_fused_optimizers_SGD_cpu_float32 (__main__.TestCudaOptimsCPU) ... {'fused': True} {'fused': True} {'lr': 0.01, 'fused': True} {'lr': 0.01, 'fused': True} {'lr': tensor(0.0010), 'fused': True} {'lr': tensor(0.0010), 'fused': True} {'momentum': 0.9, 'fused': True} {'momentum': 0.9, 'fused': True} {'momentum': 0.9, 'dampening': 0.5, 'fused': True} {'momentum': 0.9, 'dampening': 0.5, 'fused': True} {'momentum': 0.9, 'weight_decay': 0.1, 'fused': True} {'momentum': 0.9, 'weight_decay': 0.1, 'fused': True} {'momentum': 0.9, 'nesterov': True, 'weight_decay': 0.1, 'fused': True} {'momentum': 0.9, 'nesterov': True, 'weight_decay': 0.1, 'fused': True} {'weight_decay': 0.1, 'maximize': True, 'fused': True} {'weight_decay': 0.1, 'maximize': True, 'fused': True} ok test_grad_scaling_autocast_fused_optimizers_Adagrad_cuda_float32 (__main__.TestCudaOptimsCUDA) ... skipped 'cuda is not supported for fused on Adagrad' test_grad_scaling_autocast_fused_optimizers_AdamW_cuda_float32 (__main__.TestCudaOptimsCUDA) ... {'fused': True} {'fused': True} {'lr': 0.01, 'fused': True} {'lr': 0.01, 'fused': True} {'weight_decay': 0.1, 'fused': True} {'weight_decay': 0.1, 'fused': True} {'weight_decay': 0.1, 'maximize': True, 'fused': True} {'weight_decay': 0.1, 'maximize': True, 'fused': True} {'weight_decay': 0.1, 'amsgrad': True, 'fused': True} {'weight_decay': 0.1, 'amsgrad': True, 'fused': True} {'capturable': True, 'fused': True} {'capturable': True, 'fused': True} {'weight_decay': 0.1, 'amsgrad': True, 'capturable': True, 'fused': True} {'weight_decay': 0.1, 'amsgrad': True, 'capturable': True, 'fused': True} {'lr': tensor(0.0010), 'amsgrad': True, 'capturable': True, 'fused': True} {'lr': tensor(0.0010), 'amsgrad': True, 'capturable': True, 'fused': True} ok test_grad_scaling_autocast_fused_optimizers_Adam_cuda_float32 (__main__.TestCudaOptimsCUDA) ... {'fused': True} {'fused': True} {'lr': 0.01, 'fused': True} {'lr': 0.01, 'fused': True} {'weight_decay': 0.1, 'fused': True} {'weight_decay': 0.1, 'fused': True} {'weight_decay': 0.1, 'maximize': True, 'fused': True} {'weight_decay': 0.1, 'maximize': True, 'fused': True} {'weight_decay': 0.1, 'amsgrad': True, 'fused': True} {'weight_decay': 0.1, 'amsgrad': True, 'fused': True} {'capturable': True, 'fused': True} {'capturable': True, 'fused': True} {'weight_decay': 0.1, 'amsgrad': True, 'capturable': True, 'fused': True} {'weight_decay': 0.1, 'amsgrad': True, 'capturable': True, 'fused': True} {'lr': tensor(0.0010), 'amsgrad': True, 'capturable': True, 'fused': True} {'lr': tensor(0.0010), 'amsgrad': True, 'capturable': True, 'fused': True} ok test_grad_scaling_autocast_fused_optimizers_SGD_cuda_float32 (__main__.TestCudaOptimsCUDA) ... {'fused': True} {'fused': True} {'lr': 0.01, 'fused': True} {'lr': 0.01, 'fused': True} {'lr': tensor(0.0010), 'fused': True} {'lr': tensor(0.0010), 'fused': True} {'momentum': 0.9, 'fused': True} {'momentum': 0.9, 'fused': True} {'momentum': 0.9, 'dampening': 0.5, 'fused': True} {'momentum': 0.9, 'dampening': 0.5, 'fused': True} {'momentum': 0.9, 'weight_decay': 0.1, 'fused': True} {'momentum': 0.9, 'weight_decay': 0.1, 'fused': True} {'momentum': 0.9, 'nesterov': True, 'weight_decay': 0.1, 'fused': True} {'momentum': 0.9, 'nesterov': True, 'weight_decay': 0.1, 'fused': True} {'weight_decay': 0.1, 'maximize': True, 'fused': True} {'weight_decay': 0.1, 'maximize': True, 'fused': True} ok ---------------------------------------------------------------------- Ran 8 tests in 16.117s OK (skipped=1) > lintrunner test/test_cuda.py ---------------------------------------------------------------------- ok No lint issues. > lintrunner torch/testing/_internal/common_optimizers.py ---------------------------------------------------------------------- ok No lint issues. ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/126418 Approved by: https://github.com/janeyx99	2024-05-30 01:47:41 +00:00
PyTorch MergeBot	67739d8c6f	Revert "[Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051 )" This reverts commit `699db7988d`. Reverted https://github.com/pytorch/pytorch/pull/127051 on behalf of https://github.com/PaliC due to This PR needs to be synced using the import button as there is a bug in our diff train ([comment](https://github.com/pytorch/pytorch/pull/127051#issuecomment-2138496995))	2024-05-30 01:16:57 +00:00
cyy	699db7988d	[Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/127051 Approved by: https://github.com/cpuhrsch, https://github.com/malfet	2024-05-29 11:58:03 +00:00
PyTorch MergeBot	cdbb2c9acc	Revert "[Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051 )" This reverts commit `4fdbaa794f`. Reverted https://github.com/pytorch/pytorch/pull/127051 on behalf of https://github.com/PaliC due to This PR needs to be synced using the import button as there is a bug in our diff train ([comment](https://github.com/pytorch/pytorch/pull/127051#issuecomment-2136428735))	2024-05-29 03:02:35 +00:00
feifan	22712ba5c5	Radam support the flag for "maximize" (#126765 ) Fixes #[126642](https://github.com/pytorch/pytorch/issues/126642) I reference the maximize in `Adam` and add `Radam's` maximize flag. If this pr is OK, I will add another pr for `Nadam`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/126765 Approved by: https://github.com/janeyx99	2024-05-27 06:34:50 +00:00
cyy	4fdbaa794f	[Submodule] Remove deprecated USE_TBB option and TBB submodule (#127051 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/127051 Approved by: https://github.com/cpuhrsch, https://github.com/malfet	2024-05-27 03:54:03 +00:00
Jane Xu	665637714f	Remove SparseAdam weird allowance of raw Tensor input (#127081 ) This continues the full deprecation after https://github.com/pytorch/pytorch/pull/114425. It's been 6 months! And I'm fairly certain no one is going to yell at me as this patch is not really used. ------ # BC Breaking note As of this PR, SparseAdam will become consistent with the rest of our optimizers in that it will only accept containers of Tensors/Parameters/param groups and fully complete deprecation of this path. Hitherto, the SparseAdam constructor had allowed raw tensors as the params argument to the constructor. Now, if you write the following code, there will be an error similar to every other optim: "params argument given to the optimizer should be an iterable of Tensors or dicts" ``` import torch param = torch.rand(16, 32) optimizer = torch.optim.SparseAdam(param) ``` Instead you should replace the last line with ``` optimizer = torch.optim.SparseAdam([param]) ``` to no longer error. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127081 Approved by: https://github.com/soulitzer	2024-05-25 02:58:24 +00:00
eqy	ebbd431d9e	[CPU] Bump `test_complex_2d` thresholds for LBFGS on `complex64` (#126358 ) Is this supposed to be bitwise identical? Wasn't sure how to interpret the comment but it seems to be giving mismatches like: ``` Mismatched elements: 1 / 2 (50.0%) Greatest absolute difference: 4.6372413635253906e-05 at index (1,) (up to 1e-05 allowed) Greatest relative difference: 3.4600801882334054e-05 at index (1,) (up to 1.3e-06 allowed) To execute this test, run the following from the base repo dir: python test/test_optim.py -k test_complex_2d_LBFGS_cpu_complex64 ``` on Neoverse-N2 SBSA ARM CPUs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/126358 Approved by: https://github.com/lezcano, https://github.com/janeyx99	2024-05-23 00:16:45 +00:00
PyTorch MergeBot	cb69c51b6f	Revert " Updated test_graph_optims and test_graph_scaling_fused_optimizers to use new OptimizerInfo infrastructure (#125127 )" This reverts commit `cf35a591b9`. Reverted https://github.com/pytorch/pytorch/pull/125127 on behalf of https://github.com/DanilBaibak due to Broken trunk ([comment](https://github.com/pytorch/pytorch/pull/125127#issuecomment-2120337584))	2024-05-20 12:14:22 +00:00
jayanth domalapalli	cf35a591b9	Updated test_graph_optims and test_graph_scaling_fused_optimizers to use new OptimizerInfo infrastructure (#125127 ) This PR is meant to address issue #123451, more specifically, the ```test_graph_optims``` and ```test_graph_scaling_fused_optimizers``` functions in ```test_cuda.py``` have been updated so that they now use the new OptimizerInfo infrastructure. Lintrunner passed: ``` $ lintrunner test/test_cuda.py ok No lint issues. ``` Tests passed: ``` >python test_cuda.py -k test_graph_optims Ran 19 tests in 7.463s OK (skipped=9) >python test_cuda.py -k test_graph_scaling_fused_optimizers Ran 6 tests in 2.800s OK (skipped=3) ``` Both the functions have been moved to the newly created TestCase class ```TestCudaOptims```. The test is mostly the same except the ```@optims``` decorator is used at the top of the function to implicitly call the function using each of the optimizers mentioned in the decorator instead of explicitly using a for loop to iterate through each of the optimizers. I was unable to use the ```_get_optim_inputs_including_global_cliquey_kwargs``` to get all kwargs for each of the optimizers since some of the kwargs that are used in the original ```test_graph_optims``` function are not being returned by the new OptimizerInfo infrastructure, more specifically, for the ```torch.optim.rmsprop.RMSprop``` optimizer, the following kwargs are not returned whenever ```_get_optim_inputs_including_global_cliquey_kwargs``` is called: ``` {'foreach': False, 'maximize': True, 'weight_decay': 0} { 'foreach': True, 'maximize': True, 'weight_decay': 0} ``` I ran into the same issue for ```test_graph_scaling_fused_optimizers```, for the ```torch.optim.adamw.AdamW``` optimizer, whenever ```optim_info.optim_inputs_func(device=device)``` was called, the following kwarg was not returned: ``` {'amsgrad': True} ``` Due to this issue, I resorted to using a dictionary to store the kwargs for each of the optimizers, I am aware that this is less than ideal. I was wondering whether I should use the OptimizerInfo infrastructure to get all the kwargs regardless of the fact that it lacks some kwargs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/125127 Approved by: https://github.com/janeyx99	2024-05-20 06:20:45 +00:00
David Chiu	7e166e8057	[optim] Fix: wrong ASGD implementation (#126375 ) This PR is based on #125440, additionally merging the latest main branch and fixing the lint failures from #126361. Pull Request resolved: https://github.com/pytorch/pytorch/pull/126375 Approved by: https://github.com/janeyx99	2024-05-17 15:46:39 +00:00
PyTorch MergeBot	e3c5d1b7d7	Revert "[optim] Fix: wrong ASGD implementation (#125440 )" This reverts commit `2c5ad9a3d7`. Reverted https://github.com/pytorch/pytorch/pull/125440 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it looks like there is a linter failure coming from this change ([comment](https://github.com/pytorch/pytorch/pull/125440#issuecomment-2113833108))	2024-05-16 02:12:29 +00:00
haozhe.zhu	f9d107af66	[optim] add fused_adagrad support for CPU device (#124905 ) Support fused_sgd_kernel support for CPU. ## Bench result: 32 core/sockets ICX Test Scripts: https://gist.github.com/zhuhaozhe/79e842e0a6e25d6d7fa1e4598807272c https://gist.github.com/zhuhaozhe/b4c6998a509dcea1796dd05b3005c969 ``` Tensor Size: 262144, Num Tensor 4, Num Threads: 1 _single_tensor_adagrad time: 0.2500 seconds _fused_adagrad time: 0.0933 seconds Tensor Size: 4194304, Num Tensor 32, Num Threads: 32 _single_tensor_adagrad time: 2.8819 seconds _fused_adagrad time: 1.7591 seconds ``` ## Test Plan: ``` python test_optim.py -k test_fused_matches_forloop python test_optim.py -k test_fused_large_tensor python test_optim.py -k test_can_load_older_state_dict python test_optim.py -k test_grad_scaling_autocast_fused_optimizers python test_torch.py -k test_grad_scaling_autocast_fused python test_torch.py -k test_params_invalidated_with_grads_invalidated_between_unscale_and_step ``` Co-authored-by: Jane (Yuan) Xu <31798555+janeyx99@users.noreply.github.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/124905 Approved by: https://github.com/jgong5, https://github.com/janeyx99	2024-05-16 01:11:51 +00:00
David Chiu	2c5ad9a3d7	[optim] Fix: wrong ASGD implementation (#125440 ) > previous: Originally, the variables `new_eta` and `new_mu` would be constructed `len(grouped_mus)` times, but each of their values is the same and won't be changed. Therefore, it can be simplified using Python list multiplication, which only constructs one tensor. - [X] Ill assumption that every param will have the same step. - [x] DIfferent implementation between `foreach=Ture` and `foreach=False`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/125440 Approved by: https://github.com/janeyx99	2024-05-15 22:52:15 +00:00
PyTorch MergeBot	bd3cbdba2f	Revert "[optim] add fused_adagrad support for CPU device (#124905 )" This reverts commit `1c3fe84033`. Reverted https://github.com/pytorch/pytorch/pull/124905 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it is failing distributed multigpu test in trunk `1c3fe84033` ([comment](https://github.com/pytorch/pytorch/pull/124905#issuecomment-2108777063))	2024-05-13 20:53:22 +00:00
haozhe.zhu	1c3fe84033	[optim] add fused_adagrad support for CPU device (#124905 ) Support fused_sgd_kernel support for CPU. ## Bench result: 32 core/sockets ICX Test Scripts: https://gist.github.com/zhuhaozhe/79e842e0a6e25d6d7fa1e4598807272c https://gist.github.com/zhuhaozhe/b4c6998a509dcea1796dd05b3005c969 ``` Tensor Size: 262144, Num Tensor 4, Num Threads: 1 _single_tensor_adagrad time: 0.2500 seconds _fused_adagrad time: 0.0933 seconds Tensor Size: 4194304, Num Tensor 32, Num Threads: 32 _single_tensor_adagrad time: 2.8819 seconds _fused_adagrad time: 1.7591 seconds ``` ## Test Plan: ``` python test_optim.py -k test_fused_matches_forloop python test_optim.py -k test_fused_large_tensor python test_optim.py -k test_can_load_older_state_dict python test_optim.py -k test_grad_scaling_autocast_fused_optimizers python test_torch.py -k test_grad_scaling_autocast_fused python test_torch.py -k test_params_invalidated_with_grads_invalidated_between_unscale_and_step ``` Co-authored-by: Jane (Yuan) Xu <31798555+janeyx99@users.noreply.github.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/124905 Approved by: https://github.com/jgong5, https://github.com/janeyx99	2024-05-13 01:16:20 +00:00
Michael Lazos	b24ad7eab5	Enable dynamo traced test_param_group_with_lrscheduler_goes_right_direction (#124544 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/124544 Approved by: https://github.com/janeyx99 ghstack dependencies: #125825, #125826	2024-05-11 06:29:59 +00:00
Michael Lazos	e3d5afc60a	Enable dynamo'd test for 116499 (#123469 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/123469 Approved by: https://github.com/janeyx99 ghstack dependencies: #123619	2024-05-07 22:17:01 +00:00
Michael Lazos	f0c6d6100b	Enable dynamo-traced optimizer peak memory tests (#124543 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/124543 Approved by: https://github.com/yf225, https://github.com/janeyx99	2024-05-07 08:21:50 +00:00
Michael Lazos	787afc5180	Add LR as tensor tests (#123750 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/123750 Approved by: https://github.com/janeyx99	2024-05-01 04:46:49 +00:00
haozhe.zhu	3c964ad1ca	add fused_sgd_kernel support for CPU device (#123629 ) Support fused_sgd_kernel support for CPU. ## Bench result: 32 core/sockets ICX Test Scripts: https://gist.github.com/zhuhaozhe/688763e17e93e4c5e12f25f676ec90d9 https://gist.github.com/zhuhaozhe/ad9938694bc7fae8b66d376f4dffc6c9 ``` Tensor Size: 262144, Num Tensor 4, Num Threads: 1 _single_tensor_sgd time: 0.2301 seconds _fused_sgd time: 0.0925 seconds Tensor Size: 4194304, Num Tensor 32, Num Threads: 32 _single_tensor_sgd time: 2.6195 seconds _fused_sgd time: 1.7543 seconds ``` ## Test Plan: ``` python test_optim.py -k test_fused_matches_forloop python test_optim.py -k test_fused_large_tensor python test_optim.py -k test_can_load_older_state_dict python test_optim.py -k test_grad_scaling_autocast_fused_optimizers python test_torch.py -k test_grad_scaling_autocast_fused python test_torch.py -k test_params_invalidated_with_grads_invalidated_between_unscale_and_step ``` Looks like we already have some PRs under this issue https://github.com/pytorch/pytorch/issues/123451 to unified the UTs, I did not modified UT in this PR. Co-authored-by: Jane Xu <janeyx@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/123629 Approved by: https://github.com/jgong5, https://github.com/janeyx99	2024-04-23 08:28:19 +00:00
Michael Lazos	0d0b5b2655	Enable dynamo rosenbrock sparse tests (#124542 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/124542 Approved by: https://github.com/yf225 ghstack dependencies: #124540, #124541	2024-04-20 05:54:41 +00:00
Michael Lazos	184f16016e	Enable dynamo-traced deepcopy test for RMSprop (#124541 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/124541 Approved by: https://github.com/yf225 ghstack dependencies: #124540	2024-04-20 05:54:41 +00:00

1 2 3

107 Commits