pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Vasiliy Kuznetsov	f64d24c941	speed up SyncBatchNorm by batching distributed communication (#38246 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/38246 Speeds up SyncBatchNorm by batching the distributed communication. Initial benchmarks show a ~15+% speed improvement on MobileNetV2 and EfficientNetB3 on a single machine with 8 gpus. Improvement vs baseline increases as # of gpus increases. Test Plan: verified that before+after intermediate values in fwd/bwd pass are equivalent (with `torch.allclose`) benchmark runner: https://gist.github.com/vkuzo/7b1ce1b1b051ee6d46877d0f18ab9b1f results (1 forward pass + 1 backward pass, 1 machine, 8x Tesla-P100, batch_size=20 per node): ``` model gpus before_ms after_ms speedup efficientnet-b3 2 660 654 0.00909 efficientnet-b3 4 777 710 0.08623 efficientnet-b3 8 988 838 0.15182 mobilenet-v2 2 267 266 0.00375 mobilenet-v2 4 328 289 0.1189 mobilenet-v2 8 453 373 0.1766 ``` Imported from OSS Differential Revision: D21505905 fbshipit-source-id: 3e796343fce8329a2e17671d60ae66c0387924e7	2020-05-13 11:21:42 -07:00
elmirador	ae755a73d3	SyncBatchNorm size check update (#37133 ) Summary: Update the requirements on input dimensions for torch.nn.SyncBatchNorm: 1. Checks the aggregated batch size `count_all` instead of batch size in every DDP process https://github.com/pytorch/pytorch/issues/36865 2. Added test function for SyncBatchNorm where every process only has 1 input Pull Request resolved: https://github.com/pytorch/pytorch/pull/37133 Differential Revision: D21331120 Pulled By: zhaojuanmao fbshipit-source-id: ef3d1937990006609cfe4a68a64d90276c5085f2	2020-05-01 18:01:30 -07:00
gzygzy9211	ab2a9ab925	Non-blocking SyncBatchNorm update (#36659 ) Summary: As shown in https://github.com/pytorch/pytorch/issues/36452 , SyncBatchNorm can block host thread due the ``MemcpyDtoH`` and ``MemcpyHtoD`` when dealing with argument ``counts`` for native function ``batch_norm_gather_stats_with_counts``. - This fix change signiture of ``batch_norm_gather_stats_with_counts`` to ```c++ std::tuple<Tensor, Tensor> batch_norm_gather_stats_with_counts_cuda(const Tensor& self, const Tensor& mean, const Tensor& invstd, const Tensor& running_mean, const Tensor& running_var, double momentum, double epsilon, const Tensor& counts) ``` so it can directly receive "counts" in a ``CUDATensor`` rather than ``IntArrayRef`` whose data is in host memory. - This fix also improve implementation of ``SyncBatchNorm`` function so the construction of ``counts`` tensor will not cause additional ``MemcpyHtoD``, which will block host thread, too. Pull Request resolved: https://github.com/pytorch/pytorch/pull/36659 Differential Revision: D21196991 Pulled By: ngimel fbshipit-source-id: 84a529e6cf22e03618fecbb8f070ec452f81229e	2020-04-23 10:22:19 -07:00
Jie	289d52c120	Fixing SyncBN dgrad (#36382 ) Summary: Previous PR https://github.com/pytorch/pytorch/issues/22248 which provides support for variadic batch size across processes doesn't account the mean_dy/mean_dy_xmu on backward path, which produces wrong dgrad. Pull Request resolved: https://github.com/pytorch/pytorch/pull/36382 Differential Revision: D20984446 Pulled By: ngimel fbshipit-source-id: 80066eee83760b275d61e2cdd4e86facca5577fd	2020-04-13 21:08:31 -07:00
Xiao Wang	c1dd70688a	Fix deprecated python "add" calls (#33428 ) Summary: This PR fixed those python "add" calls using deprecated signature `add(Scalar, Tensor)`. The alternative signature `add(Tensor, alpha = Scalar)` is used. cc csarofeen zasdfgbnm ptrblck ngimel Pull Request resolved: https://github.com/pytorch/pytorch/pull/33428 Differential Revision: D20002534 Pulled By: vincentqb fbshipit-source-id: 81f2dd6170a47a9b53a17e5817c26e70d8afa130	2020-02-26 09:02:31 -08:00
Brian Wignall	f326045b37	Fix typos, via a Levenshtein-type corrector (#31523 ) Summary: Should be non-semantic. Uses https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines to find likely typos, with https://github.com/bwignall/typochecker to help automate the checking. Uses an updated version of the tool used in https://github.com/pytorch/pytorch/pull/30606 . Pull Request resolved: https://github.com/pytorch/pytorch/pull/31523 Differential Revision: D19216749 Pulled By: mrshenli fbshipit-source-id: 7fd489cb9a77cd7e4950c1046f925d57524960ea	2020-01-17 16:03:19 -08:00
jiej	9c7e604c60	SyncBatchNorm Update on input dimension checks (#29626 ) Summary: update the requirements on input dimensions for `torch.nn.SyncBatchNorm`: 1. 2D inputs is now permissible, https://github.com/pytorch/pytorch/issues/20204 ; 2. requires at least two element along normalization plane (BatchNorm behavior); Pull Request resolved: https://github.com/pytorch/pytorch/pull/29626 Differential Revision: D18492531 Pulled By: albanD fbshipit-source-id: f008e46a2d520d73c3c2730890a7424eba2ede9e	2019-11-18 16:09:51 -08:00
Gregory Chanan	23fde77d3d	Remove Module._backend as it's not used anymore. Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25342 Test Plan: Imported from OSS Differential Revision: D17101571 Pulled By: gchanan fbshipit-source-id: 2cda46fe197e26a1cacb5e912f535809973d306e	2019-08-29 15:43:49 -07:00
root	8640aef505	Add support for non-affine batch norm with float stats and half inputs (#22750 ) Summary: This PR creates support for non-affine batch norm with float running estimates and half inputs. Changed were made similar to https://github.com/pytorch/pytorch/issues/16735. I couldn't find a specific test for `SyncBatchNorm`, so I used [this code](https://gist.github.com/ptrblck/ab45bfcde6df55ac28a7be18531f4718) to test it. cc ngimel Pull Request resolved: https://github.com/pytorch/pytorch/pull/22750 Differential Revision: D17119965 Pulled By: ezyang fbshipit-source-id: 2e8c5d63fc3c636b8a1338c43c9c101a0f5e9b22	2019-08-29 14:04:37 -07:00
Gregory Chanan	a8ae33ce27	Move autograd function for CrossMapLRN2d from being backend specific to modules/_functions. (#25339 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25339 This is to get rid of backend-specific dispatch in modules; this autograd function is no longer backend specific so doesn't need to be in a backend specific location. Test Plan: Imported from OSS Differential Revision: D17101576 Pulled By: gchanan fbshipit-source-id: f4f0bd3ecc2d4dbd8cdfedbaabcadb8c603d2507	2019-08-29 09:55:11 -07:00
Shuaipeng Li	29ec4769bb	Fix SyncBatchNorm running var update issue (#22248 ) Summary: ## Fix https://github.com/pytorch/pytorch/issues/22192 + change signature: `func: batch_norm_gather_stats(Tensor input, Tensor mean, Tensor invstd, Tensor? running_mean, Tensor? running_var, float momentum, float eps, Tensor counts) -> (Tensor, Tensor)` + change cuda & cuda head ```cuda std::tuple<Tensor, Tensor> batch_norm_gather_stats_cuda(const Tensor& self, const Tensor& mean, const Tensor& invstd, const Tensor& running_mean, const Tensor& running_var, double momentum, double epsilon, int64_t count) { const Tensor& running_var, double momentum, double epsilon, const Tensor& counts) ``` + change python interface ```python class SyncBatchNorm(Function): def forward(self, input, weight, bias, running_mean, running_var, eps, momentum, process_group, world_size): ... ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/22248 Differential Revision: D16002146 Pulled By: mrshenli fbshipit-source-id: 9007e83928267b89df4d3847aabfbdb63e456956	2019-07-03 17:17:59 -07:00
jiej	39669316a6	(#14267 ) Summary: - Summary: Added synchronized batch normalization, allows synchronization of stats across mini-batches between processes within a process group. Current implementation uses a mixture of extended ATen native functions (cpp cuda extension) + torch.nn.modules (c10d python API) - User-facing api: 1. torch.nn.utils.convert_sync_batchnorm(modules, process_group=None) 2. torch.nn.SyncBatchNorm(num_features, eps=1e-5, momentum=0.1, affine=True, track_running_stats=True, *process_group=None) - supported use case: DistributedDataParallel with single-gpu multi-process* a. User creates model containing `torch.nn.SyncBatchNorm` layers through one of the ways listed below: 1. use layers directly: torch.nn.SyncBatchNorm(...) similar API as with torch.nn.BatchNormXd(...) with added argument `process_group` which is used to limit the scope of synchronization within each process group. Default value is None, which implies synchronization across all GPUs 2. use torch.nn.utils.convert_sync_batchnorm(modules, process_group) recursively convert all `torch.nn.BatchNormXd` into `torch.nn.SyncBatchNorm` preserving values of parameters/buffers. the utility function also allows user to specify process_group value to all converted layers. b. user wraps their model with `torch.distributed.parallel.DataParallelDistributed`, from this point, user should follow the general guidelines for DDP use guide - Error checking For use cases not supported, we error out: 1. Application launched without ddp: > import torch > sbn = torch.nn.SyncBatchNorm(10).cuda() > inp = torch.randn(5, 10, 3, 3).cuda() > sbn(inp) --> Error! > AttributeError: SyncBatchNorm is only supported within torch.nn.parallel.DistributedDataParallel 2. Application launched using DDP with multi-GPU per-process: > ddp_module = nn.parallel.DistributedDataParallel(module, device_ids=device_ids, output_device=args.local_rank) > ValueError: SyncBatchNorm is only supported for DDP with single GPU per process Pull Request resolved: https://github.com/pytorch/pytorch/pull/14267 Differential Revision: D14270035 Pulled By: ezyang fbshipit-source-id: 4956d8fa565c32e9df5408d53719ff9f945f4d6d	2019-03-06 13:39:11 -08:00

12 Commits