Commit Graph

53 Commits

Author SHA1 Message Date
Soumith Chintala
6e76813a39 fix SyncBatchNorm doc (#20991)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/19265
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20991

Differential Revision: D15513518

Pulled By: soumith

fbshipit-source-id: 9618c0b2442e013e4d37793cdb04cb4f4b1b141c
2019-05-27 14:46:58 -07:00
Zhang Liliang
f7a7868820 add process_group in convert_sync_batchnorm (#19240)
Summary:
In line 508, convert_sync_batchnorm is called recursively to convert BatchNorm layers to SyncBatchNorm, so the process_group should also be passed to the recursive call.
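
A simplified sketch of the pattern being fixed (parameter/buffer copying omitted; the helper lived under torch.nn.utils at the time of this PR):

    import torch

    def convert_sync_batchnorm(module, process_group=None):
        # replace a BatchNorm*d layer with SyncBatchNorm, keeping its configuration
        converted = module
        if isinstance(module, torch.nn.modules.batchnorm._BatchNorm):
            converted = torch.nn.SyncBatchNorm(
                module.num_features, module.eps, module.momentum,
                module.affine, module.track_running_stats, process_group)
        for name, child in module.named_children():
            # the fix: the recursive call must forward process_group,
            # otherwise nested layers silently fall back to the default group
            converted.add_module(name, convert_sync_batchnorm(child, process_group))
        return converted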
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19240

Differential Revision: D15240318

Pulled By: ezyang

fbshipit-source-id: 0fc9e856392824814991e5e9e8f9513d57f311af
2019-05-07 06:51:18 -07:00
Spandan Tiwari
df05c7fbac Fix momentum setting in BatchNorm forward pass. (#18764)
Summary:
This is a fix for issue https://github.com/pytorch/pytorch/issues/18525. The issue is not limited to ONNX export and can manifest in other scenarios as well.
An existing test point in test/onnx/test_operators.py has been updated to cover this scenario as well.
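
For context, a brief sketch of the running-statistics update that the momentum argument controls (this illustrates the documented BatchNorm semantics, not the code changed here):

    import torch

    # running_stat_new = (1 - momentum) * running_stat + momentum * batch_stat
    bn = torch.nn.BatchNorm2d(3, momentum=0.1)
    bn.train()
    _ = bn(torch.randn(8, 3, 4, 4))   # updates running_mean / running_var with momentum 0.1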
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18764

Reviewed By: zrphercule

Differential Revision: D14735166

Pulled By: houseroad

fbshipit-source-id: 5a737c648f64355929ff31eb12bd4869e744768d
2019-04-08 16:30:00 -07:00
Arunava
79533ef097 convert_sync_batch_norm to SyncBatchNorm (#18787)
Summary:
Closes #18382

Please let me know if any changes are required.
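
After this change the converter is exposed on SyncBatchNorm itself (torch.nn.SyncBatchNorm.convert_sync_batchnorm); a minimal usage sketch:

    import torch

    model = torch.nn.Sequential(
        torch.nn.Conv2d(3, 8, kernel_size=3),
        torch.nn.BatchNorm2d(8),
    )
    sync_model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)
    print(type(sync_model[1]))   # SyncBatchNorm (an initialized process group is needed to run it)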
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18787

Differential Revision: D14821147

Pulled By: soumith

fbshipit-source-id: edd98eab1b3f4151c4ae5148146435ddb2ae678d
2019-04-07 00:13:02 -07:00
jiej
39669316a6 (#14267)
Summary:
- Summary:

Added synchronized batch normalization, which allows synchronization of statistics across mini-batches between processes within a process group.
The current implementation uses a mixture of extended ATen native functions (C++/CUDA extension) and torch.nn modules (c10d Python API).

- User-facing api:

1. torch.nn.utils.convert_sync_batchnorm(modules, process_group=None)

2. torch.nn.SyncBatchNorm(num_features, eps=1e-5, momentum=0.1, affine=True, track_running_stats=True, ***process_group=None***)

- supported use case:
DistributedDataParallel with ***single-gpu multi-process***

a. User creates model containing `torch.nn.SyncBatchNorm` layers through one of the ways listed below:

  1. use layers directly:

     torch.nn.SyncBatchNorm(...)

     same API as torch.nn.BatchNormXd(...),
     with an added argument `process_group` that limits the scope of
     synchronization to each process group. The default value is None, which
     implies synchronization across all GPUs

  2. use torch.nn.utils.convert_sync_batchnorm(modules, process_group)

     recursively converts all `torch.nn.BatchNormXd` layers into `torch.nn.SyncBatchNorm`,
     preserving the values of parameters/buffers.
     The utility function also allows the user to apply a process_group value to all
     converted layers.

b. The user wraps their model with
   `torch.nn.parallel.DistributedDataParallel`; from this point on, the user
   should follow the general DDP usage guidelines (a condensed sketch follows below)
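
A condensed sketch of that workflow (single GPU per process; the rendezvous settings are placeholders and must match the launcher actually used):

    import torch
    import torch.distributed as dist

    dist.init_process_group(backend="nccl")              # assumes env:// rendezvous variables are set
    local_rank = dist.get_rank() % torch.cuda.device_count()
    torch.cuda.set_device(local_rank)

    model = torch.nn.Sequential(
        torch.nn.Conv2d(3, 8, kernel_size=3),
        torch.nn.SyncBatchNorm(8),                        # or convert an existing BatchNorm model
    ).cuda()
    ddp_model = torch.nn.parallel.DistributedDataParallel(
        model, device_ids=[local_rank], output_device=local_rank)

    out = ddp_model(torch.randn(4, 3, 16, 16).cuda())     # stats are synchronized across processes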

- Error checking

For use cases not supported, we error out:

1. Application launched without DDP:
   > import torch
   > sbn = torch.nn.SyncBatchNorm(10).cuda()
   > inp = torch.randn(5, 10, 3, 3).cuda()
   > sbn(inp)  # raises:
   > AttributeError: SyncBatchNorm is only supported within torch.nn.parallel.DistributedDataParallel

2. Application launched using DDP with multiple GPUs per process:
   > ddp_module = nn.parallel.DistributedDataParallel(module, device_ids=device_ids, output_device=args.local_rank)
   > ValueError: SyncBatchNorm is only supported for DDP with single GPU per process
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14267

Differential Revision: D14270035

Pulled By: ezyang

fbshipit-source-id: 4956d8fa565c32e9df5408d53719ff9f945f4d6d
2019-03-06 13:39:11 -08:00
James Webber
162ad94590 Fixed typo in batchnorm docstrings
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15975

Differential Revision: D13642271

Pulled By: soumith

fbshipit-source-id: 60ffa392bf1f916f2b93c943bb44a642a9815c42
2019-01-11 17:28:37 -08:00
Adam Paszke
c79e305add Don't DCE PythonOp
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14773

Reviewed By: eellison

Differential Revision: D13327673

Pulled By: suo

fbshipit-source-id: 236db3407c7eacac470530836e3d4d0dc323110c
2018-12-04 21:37:36 -08:00
Wanchao Liang
d872af9282 Add tests for dropout/batchnorm train/eval, remove training constants (#14780)
Summary:
This PR:

1. adds tests for batchnorm/dropout train/eval parameter mutation (sketched below)
2. removes training constants from our standard library
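
A minimal sketch of the kind of behaviour such tests pin down (running stats only move in train mode; dropout is the identity in eval mode):

    import torch

    bn = torch.nn.BatchNorm1d(4)
    x = torch.randn(16, 4)

    bn.train()
    bn(x)
    assert not torch.equal(bn.running_mean, torch.zeros(4))   # stats updated in train()

    frozen = bn.running_mean.clone()
    bn.eval()
    bn(x)
    assert torch.equal(bn.running_mean, frozen)               # stats untouched in eval()

    drop = torch.nn.Dropout(p=0.5).eval()
    assert torch.equal(drop(x), x)                            # dropout is a no-op in eval()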
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14780

Differential Revision: D13331578

Pulled By: wanchaol

fbshipit-source-id: d92ca3ce38cc2888688d50fe015e3e22539a20a5
2018-12-04 18:17:43 -08:00
David Riazati
c3bfa0e52b BatchNorm support not tracking stats
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14764

Differential Revision: D13325800

Pulled By: driazati

fbshipit-source-id: a3e4773dc31b83565e7a4de33614d6efd4a12de9
2018-12-04 15:11:53 -08:00
Wanchao Liang
119f9ec291 enable NoneValue parameter assignment for WeakScriptModule (#14715)
Summary:
This PR:

1. Handles None-valued attributes in the WeakScriptModuleProxy
2. Adds back module tests that are now passing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14715

Differential Revision: D13313573

Pulled By: wanchaol

fbshipit-source-id: a6b7892707350290a6d69b6f6270ad089bfc954b
2018-12-03 20:40:55 -08:00
Wanchao Liang
d6bfc53b9e Export BatchNorm functional and module, add necessary JIT support (#14016)
Summary:
This PR did three things:

1. It exports the BatchNorm functional and module, rewriting some of the components to stay aligned with the currently supported JIT features (a small sketch follows below)
2. In the process of the export, it adds the necessary compiler support for in-place augmented assignment ops
3. It changes the test_jit behavior in add_module_test to use a single RNG state during module initialization
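
An illustrative sketch of what the export enables, namely scripting a module that contains BatchNorm (the module names here are made up):

    import torch

    class Net(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.conv = torch.nn.Conv2d(3, 8, kernel_size=3)
            self.bn = torch.nn.BatchNorm2d(8)

        def forward(self, x):
            return torch.relu(self.bn(self.conv(x)))

    scripted = torch.jit.script(Net())   # BatchNorm now has a scripted standard-library implementation
    print(scripted.graph)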
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14016

Differential Revision: D13112064

Pulled By: wanchaol

fbshipit-source-id: 31e3aee5fbb509673c781e7dbb6d8884cfa55d91
2018-11-20 14:15:06 -08:00
Junjie Bai
4484f67b47 Revert D10203439: [pytorch][PR] Fix batch norm multiplier init
Differential Revision:
D10203439

Original commit changeset: 999cc134a45e

fbshipit-source-id: 7871e384063db2f3788169338e9c965d5f8ac351
2018-11-09 00:37:05 -08:00
Kaixhin
c9be135bb9 Fix batch norm multiplier init (#12325)
Summary:
Fixes #12259
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12325

Differential Revision: D10203439

Pulled By: SsnL

fbshipit-source-id: 999cc134a45e2554313adb7eb93ee98e1f84335f
2018-11-08 19:00:00 -08:00
Tongzhou Wang
2cd912bcc2 Fix more spectral norm bugs (#13350)
Summary:
Problems with SN and DP after #12671 :
1. in eval mode, `weight_orig` is not getting correct gradient #12737 .

    Fix: keep `v` vector around as a buffer and always calculate `W = W_orig / (u @ W_orig @ v)` even in eval.

2. in training mode, the `weight` buffer of the parallelized module is never updated if someone touches `weight_orig` and/or `weight` and makes them no longer share storage, so in `eval` the weight used is wrong.

    Fix: Make `weight` not a buffer anymore and always calculate it as above.

3. #12671 changed SN to update `u` in-place to make DP work correctly, but that breaks backward through two forwards (e.g., the common GAN loss `D(real) - D(fake)`) because the vectors needed to backprop the 1st forward are changed in the 2nd forward.

    Fix: This PR clones `u` and `v` before using them.

To maintain BC, I added a hook interface for producing and loading state_dicts. This is ugly, and we should really have a better interface for spectral_norm, but this patch is made to fix this issue. Even with a better interface, a BC mechanism for loading legacy state_dicts would still be needed.
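
For reference, a compact sketch of the computation involved (one step of power iteration followed by W = W_orig / sigma, working on a clone of `u` as described above):

    import torch
    import torch.nn.functional as F

    def spectral_normalize(weight_orig, u, n_power_iterations=1, eps=1e-12):
        w = weight_orig.reshape(weight_orig.size(0), -1)   # flatten to 2D: (out_features, rest)
        u = u.clone()                                      # do not mutate the stored vector in-place
        for _ in range(n_power_iterations):
            v = F.normalize(torch.mv(w.t(), u), dim=0, eps=eps)
            u = F.normalize(torch.mv(w, v), dim=0, eps=eps)
        sigma = torch.dot(u, torch.mv(w, v))               # sigma = u @ W_orig @ v
        return weight_orig / sigma, u

    weight = torch.randn(6, 4)
    u0 = F.normalize(torch.randn(6), dim=0)
    w_sn, u1 = spectral_normalize(weight, u0)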

cc crcrpar
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13350

Differential Revision: D12931044

Pulled By: SsnL

fbshipit-source-id: 8be6f934eaa62414d76d2c644dedd7e1b7eb31ef
2018-11-06 19:16:13 -08:00
vishwakftw
f9a99d5504 Specify default initialization schemes for modules in docs (#9038)
Summary: This closes #6906.

Reviewed By: ezyang

Differential Revision: D8698632

Pulled By: weiyangfb

fbshipit-source-id: 259c1dbdc264a8e9f83e196fa72d135babd97d48
2018-07-24 11:58:08 -07:00
Tongzhou Wang
623ae0c07c Fix loading 0.4 BN checkpoints (#9004)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/8481
Closes https://github.com/pytorch/pytorch/pull/9004
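
The key missing from pre-0.4.1 checkpoints is the later-added num_batches_tracked buffer; a hedged, user-side workaround sketch (the actual fix handles this inside BatchNorm's state_dict loading, and the checkpoint path here is hypothetical):

    import torch

    model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.BatchNorm2d(8))
    state_dict = torch.load("old_0.4_checkpoint.pth")   # hypothetical file

    # Old checkpoints lack the num_batches_tracked buffer; fill it in before loading.
    for name, module in model.named_modules():
        if isinstance(module, torch.nn.modules.batchnorm._BatchNorm):
            key = (name + "." if name else "") + "num_batches_tracked"
            state_dict.setdefault(key, torch.tensor(0, dtype=torch.long))

    model.load_state_dict(state_dict)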

Reviewed By: soumith

Differential Revision: D8684017

Pulled By: SsnL

fbshipit-source-id: 57820ad5f6b60795358c9447409a364a93ffa1d9
2018-07-01 22:24:21 -07:00
Richard Zou
3185d8342e Replace incorrect usages of "NotImplemented" (#7381)
* Replace incorrect usages of "NotImplemented"

Fixes #7266. Replaces "NotImplemented" (which is supposed to be used for
binary ops) with the correct "NotImplementedError".
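
A tiny illustration of the distinction (not code from this PR):

    class Base:
        def forward(self, x):
            # Unimplemented methods raise the exception type.
            raise NotImplementedError("subclasses must override forward")

    class Num:
        def __init__(self, v):
            self.v = v

        def __add__(self, other):
            # Binary special methods return the sentinel so Python can try the reflected op.
            if not isinstance(other, Num):
                return NotImplemented
            return Num(self.v + other.v)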

* Address comments
2018-05-08 18:31:45 -04:00
Jerry Ma
76d3c30783 Enable resetting of batchnorm running moments and cumulative ("simple") moving average (#6445) 2018-04-26 19:27:24 -07:00
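
A brief sketch of the two pieces of public API the title above refers to, as they exist on nn.BatchNorm*d:

    import torch

    bn = torch.nn.BatchNorm2d(4, momentum=None)   # momentum=None -> cumulative ("simple") moving average
    bn.train()
    bn(torch.randn(8, 4, 5, 5))

    bn.reset_running_stats()                      # zero the running moments and the batch counter
    assert int(bn.num_batches_tracked) == 0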
Kaiyu Shi
605307f8f3 Add support for printing extra information in Module and refactor redundant codes (#5936)
This PR enables users to print extra information about their subclassed nn.Module.
For now, the user-defined string is simply inserted at the end of the module name; this should be discussed in this PR.

Before this PR, users had to redefine __repr__ and copy & paste the source code from Module.
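
For reference, a short sketch of the hook this PR ends up with (extra_repr), using a hypothetical module:

    import torch

    class Scale(torch.nn.Module):
        def __init__(self, factor):
            super().__init__()
            self.factor = factor

        def forward(self, x):
            return x * self.factor

        def extra_repr(self):
            # Appears inside the module's repr, e.g. "Scale(factor=2.0)"
            return "factor={}".format(self.factor)

    print(Scale(2.0))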

* Add support for extra information on Module

* Rewrite the repr method of Module

* Fix flake8

* Change the __repr__ to get_extra_repr in Linear

* Fix extra new-line for empty line

* Add test for __repr__ method

* Fix bug of block string indent

* Add indent for multi-line repr test.

* Address review comments

* Update tutorial for creating nn.Module

* Fix flake8, add extra_repr of bilinear

* Refactor DropoutNd

* Change to extra_repr in some Modules

* Fix flake8

* Refactor padding modules

* Refactor pooling module

* Fix typo

* Change to extra_repr

* Fix bug for GroupNorm

* Fix bug for LayerNorm
2018-04-02 13:52:33 -04:00
Tongzhou Wang
08891b0a4e Group Normalization (#5968)
* Group Normalization

* move to ATen
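
A minimal usage sketch of the layer added in this commit:

    import torch

    gn = torch.nn.GroupNorm(num_groups=8, num_channels=32)   # 32 channels split into 8 groups
    y = gn(torch.randn(4, 32, 16, 16))                       # also works with batch size 1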
2018-03-24 12:16:18 -04:00
Soumith Chintala
7e13138eb6
Revert "Enable resetting of batchnorm running stats and cumulative ("simple") moving average" (#5892)
* Revert "Port ATen and JIT C++ tests to Catch2 (#5788)"

This reverts commit 6f80023c29.

* Revert "Fix error message for cat-ing zero-dim tensors (#5819)"

This reverts commit cf2e176049.

* Revert "Softmax symbolic should account for negative dim (#5846)"

This reverts commit ba64724aee.

* Revert "[fft][1 of 3] build system and helpers to support cuFFT and MKL (#5855)"

This reverts commit 22ef8e5654.

* Revert "Don't modify requires_grad when running DataParallel in no_grad mode (#5880)"

This reverts commit d11b7fbd1c.

* Revert "fix some methods not showing up in doc (#5882)"

This reverts commit 24fca0efb2.

* Revert "ReduceOps cleanup and set_num_threads (#5723)"

This reverts commit 84400d5531.

* Revert "introduce shape_as_tensor and reshape_from_variable_shape (#5824)"

This reverts commit f446b82e70.

* Revert "Enable resetting of batchnorm running moments and cumulative ("simple") moving average (#5766)"

This reverts commit 99b1f6cfad.
2018-03-19 17:47:54 -04:00
Jerry Ma
99b1f6cfad Enable resetting of batchnorm running moments and cumulative ("simple") moving average (#5766) 2018-03-19 11:47:57 -04:00
Vishwak Srinivasan
76a283db40 [ready] General Documentation Improvements - 2 (#5685)
* Fix some minor errors in existing docs.

* Fix Convolution and Pooling docs in torch.nn.functional

* Cleaned up torch.nn.functional docs

* Address @SsnL 's comments

* Add multiplication sign missing in docs

* Fix more typos, and clear some warnings

* Change infinity symbol in LPPool2d

* Revert some changes in torch.nn.functional

* Few more minor changes
2018-03-13 09:47:43 -04:00
Vishwak Srinivasan
32b3841553 [ready] General documentation improvements (#5450)
* Improve documentation
1. Add formula for erf, erfinv
2. Make exp, expm1 similar to log, log1p
3. Symbol change in ge, le, ne, isnan

* Fix minor nit in the docstring

* More doc improvements
1. Added some formulae
2. Complete scanning till "Other Operations" in Tensor docs

* Add more changes
1. Modify all torch.Tensor wherever required

* Fix Conv docs
1. Fix minor nits in the references for LAPACK routines

* Improve Pooling docs
1. Fix lint error

* Improve docs for RNN, Normalization and Padding
1. Fix flake8 error for pooling

* Final fixes for torch.nn.* docs.
1. Improve Loss Function documentation
2. Improve Vision Layers documentation

* Fix lint error

* Improve docstrings in torch.nn.init

* Fix lint error

* Fix minor error in torch.nn.init.sparse

* Fix Activation and Utils Docs
1. Fix Math Errors
2. Add explicit clean to Makefile in docs to prevent running graph generation script
while cleaning
3. Fix utils docs

* Make PYCMD a Makefile argument, clear up prints in the build_activation_images.py

* Fix batch norm doc error
2018-03-08 13:21:12 -05:00
Tongzhou Wang
27265503ad nn.* doc update after Variable/Tensor merge (#5459)
The nn.* counterpart of #5443. Mostly removed the Variable wrappers. Also added docs for nn.RReLU.

Notice that torch.randn(*, requires_grad=True) isn't documented until #5462 is done.
2018-03-01 18:11:39 -05:00
Tongzhou Wang
1848cad108 [ready] Layer Normalization (#4922)
* at::maybe_data_ptr and Check.h => TensorUtils.h

* THNN support for optional BN running_*

* ATen support for optional BN running_*

* Python nn.* support for optional BN running_*; Improve IN and BN doc

* Add tests for IN and BN new option

* Layer Norm

* Fix LRN doc

* functional interface for LN and IN

* Layer norm tests

* fix BN double backward returning undefined tensors

* fix jit test using wrong dim inputs for BN

* add/improve BN, IN and LN GPU tests with half type

* Update docs to be consistent with Conv notation
Fix ONNX
Clarify ONNX symbolic wrapper

* fix typo

* Address comments
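
A minimal usage sketch of the layer added in this commit (statistics are computed over the trailing normalized_shape dimensions, per sample):

    import torch

    ln = torch.nn.LayerNorm(normalized_shape=64)
    y = ln(torch.randn(20, 5, 64))   # normalizes over the last dimension for every position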
2018-02-22 11:56:41 -05:00
Ozan Çağlayan
dd6d04ddf2 doc: Normalize all true/false in docstrings to `True|False` (#3593)
* doc: Normalize all true/false in docstrings to ``True|False``

This makes them more apparent in the documentation.

* doc: fix flake8
2017-11-09 08:12:29 -05:00
Soumith Chintala
3109e4ad6a add common terminology to BatchNorm docs 2017-10-17 11:03:31 +02:00
yunjey
0cd149f06f Add comments for default value (#2242) 2017-07-29 14:14:14 +05:30
Leonid Vlasenkov
46a868dab7 [Ready] Limit docs line length (#1900)
* some docs are ready

* docs

* docs

* fix some more

* fix some more
2017-07-10 10:24:54 -04:00
Alykhan Tejani
c6d7e1e6bf added input size checks to batchnorm (#2020) 2017-07-09 15:31:24 -04:00
Samuel
9d916e561c batch norm docfix (#1804)
fixes the formula for batch normalization (moves the epsilon inside
the square root)
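
The corrected expression, written out as a small reference sketch (NCHW input and per-channel statistics assumed):

    import torch

    def batch_norm_reference(x, gamma, beta, eps=1e-5):
        # eps sits inside the square root: y = gamma * (x - mean) / sqrt(var + eps) + beta
        mean = x.mean(dim=(0, 2, 3), keepdim=True)
        var = ((x - mean) ** 2).mean(dim=(0, 2, 3), keepdim=True)
        return gamma.view(1, -1, 1, 1) * (x - mean) / torch.sqrt(var + eps) + beta.view(1, -1, 1, 1)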
2017-06-14 11:57:46 -04:00
Dmitry Ulyanov
46cf6ff5fb fix batchnorm docs (#1284) 2017-04-18 15:12:38 -04:00
Jihun Choi
d9678c2e34 Correct typo in batchnorm documentation 2017-03-22 13:55:45 +01:00
Luke Yeager
e7c1e6a8e3 [pep8] Fix most lint automatically with autopep8
Here's the command I used to invoke autopep8 (in parallel!):

    git ls-files | grep '\.py$' | xargs -n1 -P`nproc` autopep8 -i

Several rules are ignored in setup.cfg. The goal is to let autopep8
handle everything which it can handle safely, and to disable any rules
which are tricky or controversial to address. We may want to come back
and re-enable some of these rules later, but I'm trying to make this
patch as safe as possible.

Also configures flake8 to match pep8's behavior.

Also configures TravisCI to check the whole project for lint.
2017-01-28 01:15:51 +01:00
Ronny
6d14ef8083 Update batchnorm docstrings
Add missing full stops, and add a blank line for increased clarity in the rendered documentation.
2017-01-19 14:15:26 +01:00
Sam Gross
f0a6ca4d53 BatchNorm fixes (#423)
- don't use cuDNN for half inputs because weight, bias, running_mean,
   etc. are required to be of different type than for THCUNN
 - accept 3D inputs (N,C,L) in BatchNorm1d (illustrated below)
 - remove accidental 'use_cudnn=False'
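
A short illustration of the input shapes BatchNorm1d accepts after this fix:

    import torch

    bn1d = torch.nn.BatchNorm1d(16)
    y2d = bn1d(torch.randn(8, 16))       # (N, C)
    y3d = bn1d(torch.randn(8, 16, 50))   # (N, C, L), accepted since this fix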
2017-01-09 13:16:51 -05:00
Soumith Chintala
088f14c697 fix batchnorm and linear docs for rst 2017-01-04 13:35:55 -05:00
Sergey Zagoruyko
55e850d825 test if modules can be printed with fixes 2016-12-29 17:30:46 -05:00
Sergey Zagoruyko
62af45d99f Basic functional interface (#354) 2016-12-29 22:53:57 +01:00
Sam Gross
ffcc38cf05 Deterministic ordering of parameters and buffers. (#317)
Uses the assignment syntax to get deterministic ordering of parameters.
The ordering of parameters using the constructor syntax is
non-deterministic because kwargs use dict() in Python 3.5 and earlier.
2016-12-16 14:45:56 -05:00
Soumith Chintala
513d902df1 adding __repr__ for nn 2016-11-07 16:17:40 -05:00
Adam Paszke
b4f4cca875 Rename training and evaluation methods 2016-10-30 00:16:06 +02:00
Adam Paszke
3cbe66ba8c Change requires_grad default to False 2016-10-05 08:46:34 -07:00
soumith
d92b7da733 fix documentation to not use forward 2016-09-30 09:49:30 -07:00
Adam Paszke
7f4ff0e615 Fix type conversions in nn 2016-09-27 15:45:49 -07:00
Adam Paszke
f9d25e8e72 Refactor nn (require specifying parameters explicitly) 2016-09-27 15:22:26 -07:00
Adam Paszke
8fdec15a55 Codemod to remove camel case method naming 2016-09-20 08:40:28 -07:00
Soumith Chintala
b5f7720ab9 docstrings for container and batchnorm 2016-09-16 05:31:36 -04:00
Adam Paszke
fb39971464 Add more modules to nn 2016-09-14 11:05:56 -07:00