pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Wanchao Liang	3562ca2da2	[dist_optim] add warning to distributed optimizer (#50630 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50630 Add a warning log to distributed optimizer, to warn user the optimizer is created without TorchScript support. Test Plan: Imported from OSS Reviewed By: rohan-varma Differential Revision: D25932777 Pulled By: wanchaol fbshipit-source-id: 8db3b98bdd27fc04c5a3b8d910b028c0c37f138d	2021-01-26 10:30:55 -08:00
Wanchao Liang	2c3c2a4b7a	[dist_optim] add distributed functional AdamW optimizer (#50620 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50620 Add TorchScript compatible AdamW functional optimizer to distributed optimizer Test Plan: Imported from OSS Reviewed By: rohan-varma Differential Revision: D25932774 Pulled By: wanchaol fbshipit-source-id: 64eb4aeaa3cab208d0ebbec7c4d91a9d43951947	2021-01-23 01:04:45 -08:00
Wanchao Liang	3f982e56b1	[dist_optim] add distributed functional RMSprop optimizer (#50619 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50619 Add TorchScript compatible RMSprop functional optimizer to distributed optimizer Test Plan: Imported from OSS Reviewed By: rohan-varma Differential Revision: D25932775 Pulled By: wanchaol fbshipit-source-id: bd4854f9f95a740e02a1bebe24f780488460ba4d	2021-01-23 01:04:41 -08:00
Wanchao Liang	6c81b4d917	[dist_optim] add distributed functional Adadelta optimizer (#50623 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50623 Add TorchScript compatible Adadelta functional optimizer to distributed optimizer Test Plan: Imported from OSS Reviewed By: rohan-varma Differential Revision: D25932772 Pulled By: wanchaol fbshipit-source-id: d59b04e5f0b6bab7e0d1c5f68e66249a65958e0b	2021-01-23 01:04:36 -08:00
Wanchao Liang	cd2067539e	[dist_optim] add distributed functional sgd optimizer (#50618 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50618 Test Plan: Imported from OSS Reviewed By: rohan-varma Differential Revision: D25932778 Pulled By: wanchaol fbshipit-source-id: 8df3567b477bc5ba3556b8c5294cd3da5db963ad	2021-01-23 01:04:32 -08:00
Wanchao Liang	5cbe1e4933	[dist_optim] add distributed functional Adam optimizer (#50624 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50624 Add TorchScript compatible Adam functional optimizer to distributed optimizer Test Plan: Imported from OSS Reviewed By: rohan-varma Differential Revision: D25932770 Pulled By: wanchaol fbshipit-source-id: cab3f1164c76186969c284a2c52481b79bbb7190	2021-01-23 01:01:37 -08:00
Benjamin Lefaudeux	87fb3707d9	ZeroRedundancyOptimizer: an implementation of a standalone sharded optimizer wrapper (#46750 ) Summary: Implement the first stage of ZeRO, sharding of the optimizer state, as described in [this blog post](https://www.microsoft.com/en-us/research/blog/zero-2-deepspeed-shattering-barriers-of-deep-learning-speed-scale/) and [this paper](https://arxiv.org/abs/1910.02054). This implementation is completely independent from the [DeepSpeed](https://github.com/microsoft/DeepSpeed) framework, and aims at providing ZeRO-compliant building blocks within the PyTorch scheme of things. This works by: - acting as a wrapper to a pytorch optimizer. ZeROptimizer does not optimize anything by itself, it only shards optimizers for distributed jobs - each rank distributes parameters according to a given partitioning scheme (could be updated), and owns the update of a given shard only - the .step() is called on each rank as expected, the fact that the optimizer actually works on a shard of the model is not visible from the outside - when the update is completed, each rank broadcasts the updated model shard to all the other ranks This can be used with DDP, although some communications are wasted in that case (gradients are all-reduced to all ranks). This implementation was initially developed in [Fairscale](https://github.com/facebookresearch/fairscale), and can also be used with an optimized DDP which only reduces to the relevant ranks. More context on ZeRO and PyTorch can be found in [this RFC](https://github.com/pytorch/pytorch/issues/42849) The API with respect to loading and saving the state is a known pain point and should probably be discussed an updated. Other possible follow ups include integrating more closely to a [modularized DDP](https://github.com/pytorch/pytorch/issues/37002), [making the checkpoints partition-agnostic](https://github.com/facebookresearch/fairscale/issues/164), [exposing a gradient clipping option](https://github.com/facebookresearch/fairscale/issues/98) and making sure that mixed precision states are properly handled. original authors include msbaines, min-xu-ai and myself Pull Request resolved: https://github.com/pytorch/pytorch/pull/46750 Reviewed By: mruberry Differential Revision: D25958918 Pulled By: blefaudeux fbshipit-source-id: 14280f2fd90cf251eee8ef9ac0f1fa6025ae9c50	2021-01-20 14:36:16 -08:00
Wanchao Liang	505be08c75	[dist_optim] serialize compilation when creating dist_optim (#45871 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45871 Attempt to fix https://github.com/pytorch/pytorch/issues/45845 Test Plan: Imported from OSS Reviewed By: pritamdamania87 Differential Revision: D24125209 Pulled By: wanchaol fbshipit-source-id: e3697dd6ef107d8153d2a82d78a17c66d109b4fa	2020-10-07 15:10:41 -07:00
Wanchao Liang	32c355af5b	[dist_optim] introduce distributed functional optimizer (#45221 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45221 This PR introduces a distributed functional optimizer, so that distributed optimizer can reuse the functional optimizer APIs and maintain their own states. This could enable the torchscript compatible functional optimizer when using distributed optimizer, helps getting rid of GIL and improve overall performance of training, especially distributed model parallel training Test Plan: Imported from OSS Reviewed By: ailzhang Differential Revision: D23935256 Pulled By: wanchaol fbshipit-source-id: 59b6d77ff4693ab24a6e1cbb6740bcf614cc624a	2020-09-25 17:13:10 -07:00
Shen Li	f05abd1259	Fix example block format in Distributed Optimizer API doc (#34919 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34919 Test Plan: Imported from OSS Differential Revision: D20500013 Pulled By: mrshenli fbshipit-source-id: d28cbdd1ec207e1e8501ce389b7040fb764f12ca	2020-03-17 17:44:09 -07:00
Rohan Varma	f933fa3613	[docs][1.5] update RPC docs to reflect correct use of dist_autograd backwards and dist_optim step() (#34670 ) Summary: - Clarify that `torch.distributed.autograd.backwards()` does not use the current thread local autograd context, instead it looks it up based on the context_id passed in - Clarify the same for `torch.distributeed.optimizer.optim.step()` Pull Request resolved: https://github.com/pytorch/pytorch/pull/34670 Differential Revision: D20427645 Pulled By: rohan-varma fbshipit-source-id: a1a88de346cdd4dbe65fb2b7627157f86fd2b6a3	2020-03-13 14:09:23 -07:00
Omkar Salpekar	24dd800e6a	[Dist Autograd] Functional API for Dist Autograd and Dist Optimizer (#33711 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33711 Fixed #33480 This makes `dist_autograd.backward` and `dist_optimizer.step` functional by making the user explicitly pass in the `context_id` as opposed to relying on the confusing thread_local context_id. This diff incorporates these API changes and all places where these functions are called. More concretely, this code: ``` with dist_autograd.context(): # Forward pass. dist_autograd.backward([loss.sum()]) dist_optim.step() ``` should now be written as follows: ``` with dist_autograd.context() as context_id: # Forward pass. dist_autograd.backward(context_id, [loss.sum()]) dist_optim.step(context_id) ``` Test Plan: Ensuring all existing dist_autograd and dist_optimizer tests pass with the new API. Also added a new test case for input checking. Differential Revision: D20011710 fbshipit-source-id: 216e12207934a2a79c7223332b97c558d89d4d65	2020-02-26 19:08:28 -08:00
Pritam Damania	359c39b3c2	Use global lock instead of per instance lock. (#31404 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31404 Multiple "trainers" could each create different instances of DistributedOptimizer, which means we can still have a race condition unless we do a trully global per worker lock. ghstack-source-id: 95874624 Test Plan: run unit tests -- unfortunatelly due to the non-deterministic behavior it's not clear how to unit test this properly. Differential Revision: D19154248 fbshipit-source-id: fab6286c17212f534f1bd1cbdf9f0de002d48c74	2019-12-18 09:22:54 -08:00
Alisson Gusatti Azzolini	07e14c7cd0	DistributedOptimizer: wait for all workers to finish _LocalOptimizer constructor (#30062 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30062 This allows to catch exceptions during optimizer creation. ghstack-source-id: 94232436 Test Plan: new unit test. Differential Revision: D18586108 fbshipit-source-id: 71cfdf337fe803dbea8787b4c68e5a52b70a1f68	2019-11-19 18:30:00 -08:00
Pritam Damania	5d69bc1eda	Add docs for distributed optimizer. (#29971 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29971 ghstack-source-id: 94132160 Test Plan: waitforbuildbot Differential Revision: D18554631 fbshipit-source-id: c4485f7cff5159f423d0f35d1caf71074b62dc28	2019-11-18 18:51:26 -08:00
Alisson Gusatti Azzolini	b0cf43b2dd	Simple distributed optimizer (#29304 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29304 Implements a simple python distributed optimizer that takes rrefs to parameters that will be optimized. It keeps instances of optimizers remotely and calling step on distributed optimizer will call step on each of the remote optimizers in parallel. ghstack-source-id: 93564364 Test Plan: unit tests. Differential Revision: D18354586 fbshipit-source-id: 85d4c8bfec4aa38d2863cda704d024692511cff5	2019-11-11 12:02:24 -08:00

1 2 3 4

166 Commits