Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54208
It seems like it was added to suppress some errors in LazyModules, but I think we should solve those more directly with some type ignores in more surgical places.
Fixes#54087.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D27137363
Pulled By: ezyang
fbshipit-source-id: 017cafcc3350e73cd62436078835b97cd9b3b929
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53366
gchanan albanD
Thanks for the feedback. Did a first pass trying to address the concerns in the original issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53495
Reviewed By: mrshenli
Differential Revision: D26914768
Pulled By: albanD
fbshipit-source-id: fa049f1952ef05598f0da2abead9a5a5d3602f75
Summary:
Context: https://github.com/pytorch/pytorch/pull/53299#discussion_r587882857
These are the only hand-written parts of this diff:
- the addition to `.github/workflows/lint.yml`
- the file endings changed in these four files (to appease FB-internal land-blocking lints):
- `GLOSSARY.md`
- `aten/src/ATen/core/op_registration/README.md`
- `scripts/README.md`
- `torch/csrc/jit/codegen/fuser/README.md`
The rest was generated by running this command (on macOS):
```
git grep -I -l ' $' -- . ':(exclude)**/contrib/**' ':(exclude)third_party' | xargs gsed -i 's/ *$//'
```
I looked over the auto-generated changes and didn't see anything that looked problematic.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53406
Test Plan:
This run (after adding the lint but before removing existing trailing spaces) failed:
- https://github.com/pytorch/pytorch/runs/2043032377
This run (on the tip of this PR) succeeded:
- https://github.com/pytorch/pytorch/runs/2043296348
Reviewed By: walterddr, seemethere
Differential Revision: D26856620
Pulled By: samestep
fbshipit-source-id: 3f0de7f7c2e4b0f1c089eac9b5085a58dd7e0d97
Summary:
per title
This PR did
- Migrate `apex.parallel.SyncBatchNorm` channels_last to pytorch `torch.nn.SyncBatchNorm`
- Fix a TODO here by fusing `sum`, `div` kernels into backward elementwise kernel
b167402e2e/torch/nn/modules/_functions.py (L76-L95)
Todo
- [x] Discuss a regression introduced in https://github.com/pytorch/pytorch/pull/37133#discussion_r512530389, which is the synchronized copy here
b167402e2e/torch/nn/modules/_functions.py (L32-L34)
**Comment**: This PR uses apex version for the size check. Test passed and I haven't seen anything wrong so far.
- [x] The restriction to use channels_last kernel will be like this
```
inline bool batch_norm_use_channels_last_kernels(const at::Tensor& self) {
return self.is_contiguous(at::MemoryFormat::ChannelsLast) || self.ndimension() == 2;
}
```
I think we can relax that for channels_last_3d as well?
**Comment**: we don't have benchmark for this now, will check this and add functionality later when needed.
- [x] Add test
- [x] Add benchmark
Detailed benchmark is at https://github.com/xwang233/code-snippet/tree/master/syncbn-channels-last
Close https://github.com/pytorch/pytorch/issues/50781
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46906
Reviewed By: albanD
Differential Revision: D26771437
Pulled By: malfet
fbshipit-source-id: d00387044e9d43ac7e6c0e32a2db22c63d1504de
Summary:
Some minor improvement for lazy modules introduced in https://github.com/pytorch/pytorch/issues/44538, https://github.com/pytorch/pytorch/issues/47350 and https://github.com/pytorch/pytorch/issues/51548.
This PR mainly turn the bias to `UninitializedParameter` and instead of creating empty tensors like
```python
self.bias = Parameter(torch.Tensor(0))
self.bias = UninitializedParameter()
```
I think it would be better to
```python
self.register_parameter('bias', None)
self.bias = UninitializedParameter()
```
In addition, I change the constructor of the `LazyBatchNorm` from
```python
self.running_mean = UninitializedBuffer()
```
to
```python
self.register_buffer('running_mean', UninitializedBuffer())
```
as the original one would not change the underlying `self._buffers`.
Thank you for your time on reviewing this PR :).
Gently ping albanD, mruberry
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52212
Reviewed By: jbschlosser
Differential Revision: D26504508
Pulled By: albanD
fbshipit-source-id: 7094d0bb4fa9e2a40a07b79d350ea12a6ebfd080
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45317
Eager mode quantization depends on the presence of the `config`
model attribute. Currently converting a model to use `SyncBatchNorm`
removes the qconfig - fixing this. This is important if a BN is not
fused to anything during quantization convert.
Test Plan:
```
python test/test_quantization.py TestDistributed.test_syncbn_preserves_qconfig
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D23922072
fbshipit-source-id: cc1bc25c8e5243abb924c6889f78cf65a81be158
Summary:
This PR aims at tackling https://github.com/pytorch/pytorch/issues/37823 by:
- ensuring that buffers will be used for normalization computation but won't be updated, when buffers are not None, and `track_running_stats=False`
- adding a corresponding unittest to ensure expected behaviour
Any feedback is welcome!
_Note: we might want to update the docstrings of `BatchNorm*d`, feel free to share any suggestion!_
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38084
Differential Revision: D22047871
Pulled By: ezyang
fbshipit-source-id: 5acbcad9773e7901f26d625db71d43d7dc236d3e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38211
Just because the annotations are inline doesn't mean the files type
check; most of the newly annotated files have type errors and I
added exclusions for them in mypy.ini. The payoff of moving
all of these modules inline is I can delete the relevant code
generation logic for the pyi files (which was added ignore
annotations that weren't actually relevant anymore.)
For the most part the translation was completely mechanical, but there
were two hairy issues. First, I needed to work around a Python 3.6 and
earlier bug where Generic has a nontrivial metaclass. This fix is in
torch/jit/__init__.py. Second, module.py, we need to apply the same
fix for avoiding contravariance checks that the pyi file used to have;
this is done by declaring forward as a variable (rather than a
function), which appears to be sufficient enough to get mypy to not
contravariantly check input arguments.
Because we aren't actually typechecking these modules in most
cases, it is inevitable that some of these type annotations are wrong.
I slavishly copied the old annotations from the pyi files unless there
was an obvious correction I could make. These annotations will probably
need fixing up later.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D21497397
Pulled By: ezyang
fbshipit-source-id: 2b08bacc152c48f074e7edc4ee5dce1b77d83702
Summary:
xref gh-32838, gh-34032
This is a major refactor of parts of the documentation to split it up using sphinx's `autosummary` feature which will build out `autofuction` and `autoclass` stub files and link to them. The end result is that the top module pages like torch.nn.rst and torch.rst are now more like table-of-contents to the actual single-class or single-function documentations pages.
Along the way, I modified many of the docstrings to eliminate sphinx warnings when building. I think the only thing I changed from a non-documentation perspective is to add names to `__all__` when adding them to `globals()` in `torch.__init__.py`
I do not know the CI system: are the documentation build artifacts available after the build, so reviewers can preview before merging?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37419
Differential Revision: D21337640
Pulled By: ezyang
fbshipit-source-id: d4ad198780c3ae7a96a9f22651e00ff2d31a0c0f
Summary:
two instances of if -> it in torch.nn.modules.batchnorm.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29797
Differential Revision: D19698613
Pulled By: ezyang
fbshipit-source-id: 7312b2333f227113e904dfa91db90d00e525affb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32745
Some parameters (like `bias` in conv) are optional. To achieve this
previously, you had to add `bias` as a constant, which would invoke some
pretty weird behavior in the frontend, summarized as:
```
if bias is not None:
add it as a parameter normally
else: # bias is None
add it as a constant with the value None
```
There are several things bad about this:
1. Bias is not a constant. Marking it `__constants__` is confusing.
2. It basically relies on an implementation detail (the frontend
processes parameters before constants) to work.
Okay, whatever. I don't even know why we did this originally, but
getting rid of it doesn't break anything, so I assume improved NoneType
refinement has made this a non-issue.
Note on perf: this will make no difference; if bias was `None` it's still
folded out today, if bias is a Tensor it would be added as a parameter
both before and after this change
Test Plan: Imported from OSS
Differential Revision: D19628634
Pulled By: suo
fbshipit-source-id: d9128a09c5d096b938fcf567b8c23b09ac9ab37f
Summary:
Fixes https://github.com/pytorch/pytorch/issues/29187
This introduces a new class `_NormBase` that `_InstanceNorm` and `_BatchNorm` inherit from separately. This means the `isinstance(module, _BatchNorm)` check won't falsely pass for `_InstanceNorm`.
The suggested fix of adding `and not isinstance(module, _InstanceNorm)` works as well, but requires introducing a cyclic dependency between `instancenorm.py` and `batchnorm.py`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29985
Differential Revision: D18588104
Pulled By: yf225
fbshipit-source-id: f599da3b902ad9c56836db4d429bfc462ed51338
Summary:
update the requirements on input dimensions for `torch.nn.SyncBatchNorm`:
1. 2D inputs is now permissible, https://github.com/pytorch/pytorch/issues/20204 ;
2. requires at least two element along normalization plane (BatchNorm behavior);
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29626
Differential Revision: D18492531
Pulled By: albanD
fbshipit-source-id: f008e46a2d520d73c3c2730890a7424eba2ede9e
Summary:
Removing in-place operator for num_batches_tracked increment. The in-place
operator used here turns out to block many optimization opportunities due to
alias assumption for inputs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27299
Differential Revision: D17909341
Pulled By: ngimel
fbshipit-source-id: 7d635be94dfd2002af435acf6ea71995adaa40f6
Summary:
This is a fix for a potential ONNX export issue with SyncBatchNorm where irrespective of the value of momentum, the value for momentum in ONNX BN node is always 0. The details are captured in https://github.com/pytorch/pytorch/issues/18525.
The fix in this PR for `SyncBatchNorm` is very similar to the fix that went in https://github.com/pytorch/pytorch/pull/18764 for `BatchNorm` (I think this site was just missed).
Please note that there are no ONNX test points added for this, because SyncBatchNorm works exclusively with tensors on GPU and the ONNX test passes are CPU only. If there's a way to add a test point, please let me know.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24995
Differential Revision: D17085570
Pulled By: dzhulgakov
fbshipit-source-id: 162d428673c269efca4360fb103854b7319ec204
Summary:
After converting BN layers to SyncBN layers, the function will set all `requires_grad = True` regardless of the original requires_grad states. I think it is a bug and have fixed it in this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22569
Differential Revision: D16151647
Pulled By: zou3519
fbshipit-source-id: e2ad1886c94d8882485e7fb8be51ad76469ecc67
Summary:
* Deletes all weak script decorators / associated data structures / methods
* In order to keep supporting the standard library in script, this enables recursive script on any function defined in `torch.nn`
* Most changes in `torch/nn` are the result of `ag -Q "weak" torch/nn/ -l | xargs sed -i '/weak/d'`, only `rnn.py` needed manual editing to use the `ignore` and `export` to continue supporting the overloaded `forward` methods
* `Sequential`/`ModuleList` no longer need to be added to constants since they are compiled on demand
This should also fix https://github.com/pytorch/pytorch/issues/22212
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22212
Differential Revision: D15988346
Pulled By: driazati
fbshipit-source-id: af223e3ad0580be895377312949997a70e988e4f
Summary:
Fixes#12259, needs to make sure tests (see #13766) don't break due to numerical precision issues. Not sure what would need to be adjusted here...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13774
Differential Revision: D15715021
Pulled By: ezyang
fbshipit-source-id: 20ce2beee1b39ebe9f023c5f2b25be53acccb5f3
Summary:
A bunch of modules were missing entries for `__constants__` which was making their `__repr__`s not work. Others had `__constants__` that were not necessary since it was provided by some parent class instead.
Fixes#20978
](https://our.intern.facebook.com/intern/diff/15539518/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21071
Pulled By: driazati
Differential Revision: D15539518
fbshipit-source-id: 24bdd1ef41ef636eefd5d2bad4ab2d79646ed4f0
Summary:
In line 508.
convert_sync_batchnorm is called recursively to convert the bn to syncbn, thus the process_group also should be passed in the function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19240
Differential Revision: D15240318
Pulled By: ezyang
fbshipit-source-id: 0fc9e856392824814991e5e9e8f9513d57f311af
Summary:
This is a fix for issue https://github.com/pytorch/pytorch/issues/18525. The issue is related not only to ONNX export, but can manifest in other scenarios.
An existing test point in test/onnx/test_operators.py has been updated to cover this scenario as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18764
Reviewed By: zrphercule
Differential Revision: D14735166
Pulled By: houseroad
fbshipit-source-id: 5a737c648f64355929ff31eb12bd4869e744768d
Summary:
Closes#18382
Please let me know if any changes are required.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18787
Differential Revision: D14821147
Pulled By: soumith
fbshipit-source-id: edd98eab1b3f4151c4ae5148146435ddb2ae678d
Summary:
- Summary:
Added synchronized batch normalization, allows synchronization of stats across mini-batches between processes within a process group.
Current implementation uses a mixture of extended ATen native functions (cpp cuda extension) + torch.nn.modules (c10d python API)
- User-facing api:
1. torch.nn.utils.convert_sync_batchnorm(modules, process_group=None)
2. torch.nn.SyncBatchNorm(num_features, eps=1e-5, momentum=0.1, affine=True, track_running_stats=True, ***process_group=None***)
- supported use case:
DistributedDataParallel with ***single-gpu multi-process***
a. User creates model containing `torch.nn.SyncBatchNorm` layers through one of the ways listed below:
1. use layers directly:
torch.nn.SyncBatchNorm(...)
similar API as with torch.nn.BatchNormXd(...)
with added argument `process_group` which is used to limit the scope of
synchronization within each process group. Default value is None, which
implies synchronization across all GPUs
2. use torch.nn.utils.convert_sync_batchnorm(modules, process_group)
recursively convert all `torch.nn.BatchNormXd` into `torch.nn.SyncBatchNorm`
preserving values of parameters/buffers.
the utility function also allows user to specify process_group value to all
converted layers.
b. user wraps their model with
`torch.distributed.parallel.DataParallelDistributed`, from this point, user
should follow the general guidelines for DDP use guide
- Error checking
For use cases not supported, we error out:
1. Application launched without ddp:
> import torch
> sbn = torch.nn.SyncBatchNorm(10).cuda()
> inp = torch.randn(5, 10, 3, 3).cuda()
> sbn(inp) --> Error!
> AttributeError: SyncBatchNorm is only supported within torch.nn.parallel.DistributedDataParallel
2. Application launched using DDP with multi-GPU per-process:
> ddp_module = nn.parallel.DistributedDataParallel(module, device_ids=device_ids, output_device=args.local_rank)
> ValueError: SyncBatchNorm is only supported for DDP with single GPU per process
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14267
Differential Revision: D14270035
Pulled By: ezyang
fbshipit-source-id: 4956d8fa565c32e9df5408d53719ff9f945f4d6d
Summary:
This PR:
1. add tests for batchnorm/dropout for train/eval parameter mutatino
2. remove training constants from all our standard library
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14780
Differential Revision: D13331578
Pulled By: wanchaol
fbshipit-source-id: d92ca3ce38cc2888688d50fe015e3e22539a20a5
Summary:
This PR:
1. Handle None value attr in the WeakScriptModuleProxy
2. add back module tests that now passing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14715
Differential Revision: D13313573
Pulled By: wanchaol
fbshipit-source-id: a6b7892707350290a6d69b6f6270ad089bfc954b