pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Nikita Shulga	bf4fcab681	Fix SyncBatchNorm usage without stats tracking (#50126 ) Summary: In `batch_norm_gather_stats_with_counts_cuda` use `input.scalar_type()` if `running_mean` is not defined In `SyncBatchNorm` forward function create count tensor with `torch.float32` type if `running_mean` is None Fix a few typos Pull Request resolved: https://github.com/pytorch/pytorch/pull/50126 Test Plan: ``` python -c "import torch;print(torch.batch_norm_gather_stats_with_counts( torch.randn(1, 3, 3, 3, device='cuda'), mean = torch.ones(2, 3, device='cuda'), invstd = torch.ones(2, 3, device='cuda'), running_mean = None, running_var = None , momentum = .1, eps = 1e-5, counts = torch.ones(2, device='cuda')))" ``` Fixes https://github.com/pytorch/pytorch/issues/49730 Reviewed By: ngimel Differential Revision: D25797930 Pulled By: malfet fbshipit-source-id: 22a91e3969b5e9bbb7969d9cc70b45013a42fe83	2021-01-07 18:31:13 -08:00
albanD	ccd646696b	Fix Module backward hooks for all Tensor inputs/outputs (#46163 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/598 This is BC-breaking as we now explicitly don't call the hook when there are not Tensors at the top level of the output. This feature was not working anyways as the returned grad_input/grad_output were wrong (not respecting the output structure and wrong inputs for multi-Node Module). This is also BC-breaking as we now report the correct gradients for `nn.Module`s that contain multiple autograd `Node`s while we use to return bad results before. Pull Request resolved: https://github.com/pytorch/pytorch/pull/46163 Reviewed By: ailzhang, mruberry Differential Revision: D24894180 Pulled By: albanD fbshipit-source-id: e1b5d193d2818eb2f51e2a2722c7405c8bd13c2b	2020-12-18 09:04:36 -08:00
Vasiliy Kuznetsov	b167402e2e	[redo] Fix SyncBatchNorm forward pass for non-default process group (#43861 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43861 This is a redo of https://github.com/pytorch/pytorch/pull/38874, and fixing my original bug from https://github.com/pytorch/pytorch/pull/38246. Test Plan: CI Imported from OSS Reviewed By: supriyar Differential Revision: D23418816 fbshipit-source-id: 2a3a3d67fc2d03bb0bf30a87cce4e805ac8839fb	2020-09-02 10:44:46 -07:00
Vasiliy Kuznetsov	f64d24c941	speed up SyncBatchNorm by batching distributed communication (#38246 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/38246 Speeds up SyncBatchNorm by batching the distributed communication. Initial benchmarks show a ~15+% speed improvement on MobileNetV2 and EfficientNetB3 on a single machine with 8 gpus. Improvement vs baseline increases as # of gpus increases. Test Plan: verified that before+after intermediate values in fwd/bwd pass are equivalent (with `torch.allclose`) benchmark runner: https://gist.github.com/vkuzo/7b1ce1b1b051ee6d46877d0f18ab9b1f results (1 forward pass + 1 backward pass, 1 machine, 8x Tesla-P100, batch_size=20 per node): ``` model gpus before_ms after_ms speedup efficientnet-b3 2 660 654 0.00909 efficientnet-b3 4 777 710 0.08623 efficientnet-b3 8 988 838 0.15182 mobilenet-v2 2 267 266 0.00375 mobilenet-v2 4 328 289 0.1189 mobilenet-v2 8 453 373 0.1766 ``` Imported from OSS Differential Revision: D21505905 fbshipit-source-id: 3e796343fce8329a2e17671d60ae66c0387924e7	2020-05-13 11:21:42 -07:00
elmirador	ae755a73d3	SyncBatchNorm size check update (#37133 ) Summary: Update the requirements on input dimensions for torch.nn.SyncBatchNorm: 1. Checks the aggregated batch size `count_all` instead of batch size in every DDP process https://github.com/pytorch/pytorch/issues/36865 2. Added test function for SyncBatchNorm where every process only has 1 input Pull Request resolved: https://github.com/pytorch/pytorch/pull/37133 Differential Revision: D21331120 Pulled By: zhaojuanmao fbshipit-source-id: ef3d1937990006609cfe4a68a64d90276c5085f2	2020-05-01 18:01:30 -07:00
gzygzy9211	ab2a9ab925	Non-blocking SyncBatchNorm update (#36659 ) Summary: As shown in https://github.com/pytorch/pytorch/issues/36452 , SyncBatchNorm can block host thread due the ``MemcpyDtoH`` and ``MemcpyHtoD`` when dealing with argument ``counts`` for native function ``batch_norm_gather_stats_with_counts``. - This fix change signiture of ``batch_norm_gather_stats_with_counts`` to ```c++ std::tuple<Tensor, Tensor> batch_norm_gather_stats_with_counts_cuda(const Tensor& self, const Tensor& mean, const Tensor& invstd, const Tensor& running_mean, const Tensor& running_var, double momentum, double epsilon, const Tensor& counts) ``` so it can directly receive "counts" in a ``CUDATensor`` rather than ``IntArrayRef`` whose data is in host memory. - This fix also improve implementation of ``SyncBatchNorm`` function so the construction of ``counts`` tensor will not cause additional ``MemcpyHtoD``, which will block host thread, too. Pull Request resolved: https://github.com/pytorch/pytorch/pull/36659 Differential Revision: D21196991 Pulled By: ngimel fbshipit-source-id: 84a529e6cf22e03618fecbb8f070ec452f81229e	2020-04-23 10:22:19 -07:00
Jie	289d52c120	Fixing SyncBN dgrad (#36382 ) Summary: Previous PR https://github.com/pytorch/pytorch/issues/22248 which provides support for variadic batch size across processes doesn't account the mean_dy/mean_dy_xmu on backward path, which produces wrong dgrad. Pull Request resolved: https://github.com/pytorch/pytorch/pull/36382 Differential Revision: D20984446 Pulled By: ngimel fbshipit-source-id: 80066eee83760b275d61e2cdd4e86facca5577fd	2020-04-13 21:08:31 -07:00
Xiao Wang	c1dd70688a	Fix deprecated python "add" calls (#33428 ) Summary: This PR fixed those python "add" calls using deprecated signature `add(Scalar, Tensor)`. The alternative signature `add(Tensor, alpha = Scalar)` is used. cc csarofeen zasdfgbnm ptrblck ngimel Pull Request resolved: https://github.com/pytorch/pytorch/pull/33428 Differential Revision: D20002534 Pulled By: vincentqb fbshipit-source-id: 81f2dd6170a47a9b53a17e5817c26e70d8afa130	2020-02-26 09:02:31 -08:00
Brian Wignall	f326045b37	Fix typos, via a Levenshtein-type corrector (#31523 ) Summary: Should be non-semantic. Uses https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines to find likely typos, with https://github.com/bwignall/typochecker to help automate the checking. Uses an updated version of the tool used in https://github.com/pytorch/pytorch/pull/30606 . Pull Request resolved: https://github.com/pytorch/pytorch/pull/31523 Differential Revision: D19216749 Pulled By: mrshenli fbshipit-source-id: 7fd489cb9a77cd7e4950c1046f925d57524960ea	2020-01-17 16:03:19 -08:00
jiej	9c7e604c60	SyncBatchNorm Update on input dimension checks (#29626 ) Summary: update the requirements on input dimensions for `torch.nn.SyncBatchNorm`: 1. 2D inputs is now permissible, https://github.com/pytorch/pytorch/issues/20204 ; 2. requires at least two element along normalization plane (BatchNorm behavior); Pull Request resolved: https://github.com/pytorch/pytorch/pull/29626 Differential Revision: D18492531 Pulled By: albanD fbshipit-source-id: f008e46a2d520d73c3c2730890a7424eba2ede9e	2019-11-18 16:09:51 -08:00
Gregory Chanan	23fde77d3d	Remove Module._backend as it's not used anymore. Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25342 Test Plan: Imported from OSS Differential Revision: D17101571 Pulled By: gchanan fbshipit-source-id: 2cda46fe197e26a1cacb5e912f535809973d306e	2019-08-29 15:43:49 -07:00
root	8640aef505	Add support for non-affine batch norm with float stats and half inputs (#22750 ) Summary: This PR creates support for non-affine batch norm with float running estimates and half inputs. Changed were made similar to https://github.com/pytorch/pytorch/issues/16735. I couldn't find a specific test for `SyncBatchNorm`, so I used [this code](https://gist.github.com/ptrblck/ab45bfcde6df55ac28a7be18531f4718) to test it. cc ngimel Pull Request resolved: https://github.com/pytorch/pytorch/pull/22750 Differential Revision: D17119965 Pulled By: ezyang fbshipit-source-id: 2e8c5d63fc3c636b8a1338c43c9c101a0f5e9b22	2019-08-29 14:04:37 -07:00
Gregory Chanan	a8ae33ce27	Move autograd function for CrossMapLRN2d from being backend specific to modules/_functions. (#25339 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25339 This is to get rid of backend-specific dispatch in modules; this autograd function is no longer backend specific so doesn't need to be in a backend specific location. Test Plan: Imported from OSS Differential Revision: D17101576 Pulled By: gchanan fbshipit-source-id: f4f0bd3ecc2d4dbd8cdfedbaabcadb8c603d2507	2019-08-29 09:55:11 -07:00
Shuaipeng Li	29ec4769bb	Fix SyncBatchNorm running var update issue (#22248 ) Summary: ## Fix https://github.com/pytorch/pytorch/issues/22192 + change signature: `func: batch_norm_gather_stats(Tensor input, Tensor mean, Tensor invstd, Tensor? running_mean, Tensor? running_var, float momentum, float eps, Tensor counts) -> (Tensor, Tensor)` + change cuda & cuda head ```cuda std::tuple<Tensor, Tensor> batch_norm_gather_stats_cuda(const Tensor& self, const Tensor& mean, const Tensor& invstd, const Tensor& running_mean, const Tensor& running_var, double momentum, double epsilon, int64_t count) { const Tensor& running_var, double momentum, double epsilon, const Tensor& counts) ``` + change python interface ```python class SyncBatchNorm(Function): def forward(self, input, weight, bias, running_mean, running_var, eps, momentum, process_group, world_size): ... ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/22248 Differential Revision: D16002146 Pulled By: mrshenli fbshipit-source-id: 9007e83928267b89df4d3847aabfbdb63e456956	2019-07-03 17:17:59 -07:00
jiej	39669316a6	(#14267 ) Summary: - Summary: Added synchronized batch normalization, allows synchronization of stats across mini-batches between processes within a process group. Current implementation uses a mixture of extended ATen native functions (cpp cuda extension) + torch.nn.modules (c10d python API) - User-facing api: 1. torch.nn.utils.convert_sync_batchnorm(modules, process_group=None) 2. torch.nn.SyncBatchNorm(num_features, eps=1e-5, momentum=0.1, affine=True, track_running_stats=True, *process_group=None) - supported use case: DistributedDataParallel with single-gpu multi-process* a. User creates model containing `torch.nn.SyncBatchNorm` layers through one of the ways listed below: 1. use layers directly: torch.nn.SyncBatchNorm(...) similar API as with torch.nn.BatchNormXd(...) with added argument `process_group` which is used to limit the scope of synchronization within each process group. Default value is None, which implies synchronization across all GPUs 2. use torch.nn.utils.convert_sync_batchnorm(modules, process_group) recursively convert all `torch.nn.BatchNormXd` into `torch.nn.SyncBatchNorm` preserving values of parameters/buffers. the utility function also allows user to specify process_group value to all converted layers. b. user wraps their model with `torch.distributed.parallel.DataParallelDistributed`, from this point, user should follow the general guidelines for DDP use guide - Error checking For use cases not supported, we error out: 1. Application launched without ddp: > import torch > sbn = torch.nn.SyncBatchNorm(10).cuda() > inp = torch.randn(5, 10, 3, 3).cuda() > sbn(inp) --> Error! > AttributeError: SyncBatchNorm is only supported within torch.nn.parallel.DistributedDataParallel 2. Application launched using DDP with multi-GPU per-process: > ddp_module = nn.parallel.DistributedDataParallel(module, device_ids=device_ids, output_device=args.local_rank) > ValueError: SyncBatchNorm is only supported for DDP with single GPU per process Pull Request resolved: https://github.com/pytorch/pytorch/pull/14267 Differential Revision: D14270035 Pulled By: ezyang fbshipit-source-id: 4956d8fa565c32e9df5408d53719ff9f945f4d6d	2019-03-06 13:39:11 -08:00

15 Commits