Commit Graph

128 Commits

Edward Z. Yang
4f13f69a45 Enable possibly-undefined error code (#118533)
Fixes https://github.com/pytorch/pytorch/issues/118129

Suppressions automatically added with

```
import re

# Parse mypy output of the form "path:line:col: error: ... [code]".
with open("error_file.txt", "r") as f:
    errors = f.readlines()

# Map each file to {line_number: error_code}.
error_lines = {}
for error in errors:
    match = re.match(r"(.*):(\d+):\d+: error:.*\[(.*)\]", error)
    if match:
        file_path, line_number, error_type = match.groups()
        if file_path not in error_lines:
            error_lines[file_path] = {}
        error_lines[file_path][int(line_number)] = error_type

# Append a targeted "# type: ignore[code]" suppression to each offending line.
for file_path, lines in error_lines.items():
    with open(file_path, "r") as f:
        code = f.readlines()
    for line_number, error_type in sorted(lines.items(), key=lambda x: x[0], reverse=True):
        code[line_number - 1] = code[line_number - 1].rstrip() + f"  # type: ignore[{error_type}]\n"
    with open(file_path, "w") as f:
        f.writelines(code)
```

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118533
Approved by: https://github.com/Skylion007, https://github.com/zou3519
2024-01-30 05:08:10 +00:00
Jarlaze
a17069684c Improve nn.modules.activation and batchnorm docs (#113531)
Fixes #112602

For some reason, I could not get the same output when running the pycodestyle command indicated in the issue. I manually ran ruff checks, fixing the following issues: `D202`, `D204`, `D205`, `D207`, `D400` and `D401`.
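
As a hypothetical illustration (not taken from this PR) of what some of these docstring rules require, a D205/D400/D401 fix looks roughly like this:

```python
def transform_bad(x):
    """Applies the transform
    and returns the result"""  # violates D205, D400 and D401
    return x


def transform_good(x):
    """Apply the transform to the input.

    Returns the result unchanged in this toy example.
    """
    return x
```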

### Requested output

nn.modules.activation:
before: 135
after: 79

nn.modules.batchnorm:
before: 21
after: 3
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113531
Approved by: https://github.com/mikaylagawarecki
2023-12-27 21:06:47 +00:00
Mikayla Gawarecki
f5919335db Fix _load_from_state_dict for num_batches_tracked in batchnorm (#115285)
I approved https://github.com/pytorch/pytorch/pull/110850, which did the following:

Previously:
If `num_batches_tracked` is not in the `state_dict` passed to `m.load_state_dict(state_dict)`, the module's `num_batches_tracked` is always overwritten in `_load_from_state_dict` with a zero CPU tensor.

Now:
If `num_batches_tracked` is not in the loaded `state_dict`, the module's `num_batches_tracked` is only overwritten in `_load_from_state_dict` with a zero CPU tensor if the module does not already have `num_batches_tracked`.

This causes the following issue:

```
with torch.device('meta'):
    m = BatchNorm(...)
m.load_state_dict(state_dict, assign=True)
```

If `num_batches_tracked` is not in `state_dict`, then since the module's `num_batches_tracked` is present on the meta device, it is not overwritten with a zero CPU tensor. When compiling, the following error is raised:

```
AssertionError: Does not support mixing cuda+meta
```

I am not sure whether the explicit check for the meta device makes sense as a fix; I will add testing if this fix is acceptable.
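
A minimal sketch (not the actual patch) of the kind of explicit meta-device check discussed here; the helper name and signature are illustrative:

```python
import torch

def _maybe_reset_num_batches_tracked(module, state_dict, key="num_batches_tracked"):
    # Only materialize a fresh zero counter when the checkpoint omits the key
    # AND the module's existing buffer cannot be kept (e.g. it lives on the
    # meta device, which is what triggers the cuda+meta mix when compiling).
    if key not in state_dict:
        current = getattr(module, key, None)
        if current is None or current.is_meta:
            module.register_buffer(key, torch.tensor(0, dtype=torch.long))
```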

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115285
Approved by: https://github.com/albanD
2023-12-07 22:48:26 +00:00
XiaobingSuper
395614c1a4 keep sync bn training flag same with converted bn's training flag (#111998)
When converting BN to SyncBatchNorm, we need to keep the SyncBatchNorm's training flag the same as the original BN's training flag. The motivation: if the given model has the training flag set on some BN layers but not on others, converting to SyncBatchNorm should not change that behavior.
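
For illustration, a hedged sketch of the behavior this change preserves (the module structure is an assumption, not taken from the PR):

```python
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU())
model[1].eval()  # one BN layer intentionally frozen; the rest stay in training mode

sync_model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
# With this change, the converted layer keeps the original BN's training flag
# instead of silently flipping it.
print(sync_model[1].training)  # expected: False, matching the original BN
```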

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111998
Approved by: https://github.com/mikaylagawarecki
2023-10-26 08:18:08 +00:00
FFFrog
0e0f6a248d Fix num_batches_tracked of BatchNorm when load_state_dict (#110850)
Fixes #110361

As the title shows.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110850
Approved by: https://github.com/mikaylagawarecki
2023-10-24 04:20:38 +00:00
Mikayla Gawarecki
b08b0c915f [easy] Fix docs for sd calculation in BatchNorm1d/3d for consistency with BatchNorm2d (#107308)
Fixes https://github.com/pytorch/pytorch/issues/100048

BatchNorm2d docs were updated in https://github.com/pytorch/pytorch/pull/97974. There have been a number of issues filed due to confusion about this, so I think we should fix it before the branch cut.
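
For context, a short illustrative example (arbitrary values) of the documented point that causes the confusion: normalization uses the biased batch variance, while `running_var` is updated with the unbiased estimate.

```python
import torch
import torch.nn as nn

x = torch.randn(32, 1, 4, 4)
bn = nn.BatchNorm2d(1, affine=False)
out = bn(x)  # training mode: normalizes with batch statistics

# The normalization itself uses the biased variance (unbiased=False) plus eps,
# as the updated docs spell out.
manual = (x - x.mean()) / torch.sqrt(x.var(unbiased=False) + bn.eps)
print(torch.allclose(out, manual, atol=1e-6))  # expected: True
```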

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107308
Approved by: https://github.com/albanD
2023-08-16 21:51:02 +00:00
Justin Chu
79c5e33349 [BE] Enable ruff's UP rules and autoformat nn/ mps/ and torch/ (#105436)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105436
Approved by: https://github.com/malfet, https://github.com/albanD
2023-07-21 07:38:46 +00:00
shibo19
58feefa4ed add custom device support for special nn.modules (#103419)
Fixes #103818
1. Some special nn.Modules contain checks that only support CUDA, so I added a `privateuse1` check.
2. Getting the device type for `privateuse1` via `torch._C._get_privateuse1_backend_name()` raises an error under `torch.jit.script`, so I added a global variable to avoid this (a sketch of the pattern is shown below).
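
A minimal sketch (illustrative names, not the actual patch) of the module-level caching pattern described in point 2:

```python
import torch

# Cache the backend name once at import time so scripted code does not need to
# call into torch._C at runtime.
_PRIVATEUSE1_BACKEND_NAME = torch._C._get_privateuse1_backend_name()

def _supports_fused_path(device_type: str) -> bool:
    # Accept CUDA as before, plus the registered custom (privateuse1) backend.
    return device_type in ("cuda", _PRIVATEUSE1_BACKEND_NAME)
```
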
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103419
Approved by: https://github.com/albanD
2023-06-26 00:58:29 +00:00
Zaccharie Ramzi
65e8c14948 Corrected batch norm docs with the exact computations of the standard deviation (#97974)
Fixes #77427

@jbschlosser sorry for taking so long to submit this; I just realized this had been sitting in my backlog for too long.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97974
Approved by: https://github.com/albanD
2023-03-30 16:29:57 +00:00
Xuehai Pan
5b1cedacde [BE] [2/3] Rewrite super() calls in functorch and torch (#94588)
Rewrite Python built-in class `super()` calls. Only non-semantic changes should be applied.

- #94587
- #94588
- #94592
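
The core rewrite replaces the two-argument `super(Class, self)` form with the zero-argument Python 3 form; a representative (hypothetical) example:

```python
import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self):
        # before: super(MyModule, self).__init__()
        super().__init__()  # after: zero-argument form, same behavior
```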

Also, methods with only a `super()` call are removed:

```diff
class MyModule(nn.Module):
-   def __init__(self):
-       super().__init__()
-
    def forward(self, ...):
        ...
```

Cases where the rewrite would change the semantics are kept unchanged, e.g.:

f152a79be9/caffe2/python/net_printer.py (L184-L190)

f152a79be9/test/test_jit_fuser_te.py (L2628-L2635)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94588
Approved by: https://github.com/ezyang, https://github.com/albanD
2023-02-10 21:16:33 +00:00
joncrall
ad782ff7df Enable xdoctest runner in CI for real this time (#83816)
Builds on #83317 and enables running the doctests. Just need to figure out what is causing the failures.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83816
Approved by: https://github.com/ezyang, https://github.com/malfet
2022-12-29 05:32:42 +00:00
Yuxin Wu
56e40fe054 Let SyncBatchNorm fallback to BN if not using distributed training (#89706)
Fixes #63662
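
A hedged usage sketch of the fallback (assuming this fix and no initialized process group):

```python
import torch
import torch.nn as nn

sbn = nn.SyncBatchNorm(8)
x = torch.randn(4, 8, 16, 16)
out = sbn(x)  # falls back to regular BatchNorm behavior instead of raising
print(out.shape)  # torch.Size([4, 8, 16, 16])
```
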
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89706
Approved by: https://github.com/soumith
2022-11-27 05:55:24 +00:00
joncrall
b136f3f310 More doctest refinements. (#83317)
Follow up to #82797

Now that the doctests themselves are in a better state, we should be able to enable xdoctest on the CI so they stay that way.

@ezyang @vadimkantorov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83317
Approved by: https://github.com/ezyang
2022-08-22 20:07:26 +00:00
joncrall
4618371da5 Integrate xdoctest - Rebased (#82797)
This is a new version of #15648 based on the latest master branch.

Unlike the previous PR where I fixed a lot of the doctests in addition to integrating xdoctest, I'm going to reduce the scope here. I'm simply going to integrate xdoctest, and then I'm going to mark all of the failing tests as "SKIP". This will let xdoctest run on the dashboards, provide some value, and still let the dashboards pass. I'll leave fixing the doctests themselves to another PR.

In my initial commit, I do the bare minimum to get something running with failing dashboards. The few tests that I marked as skip are causing segfaults. Running xdoctest results in 293 failed, 201 passed tests. The next commits will be to disable those tests. (unfortunately I don't have a tool that will insert the `#xdoctest: +SKIP` directive over every failing test, so I'm going to do this mostly manually.)
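
For reference, the skip directive goes inside the doctest itself; a hypothetical example marked this way:

```python
def flaky_example():
    """
    Example:
        >>> # xdoctest: +SKIP
        >>> launch_gpu_only_kernel()  # hypothetical call; skipped by the runner
    """
```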

Fixes https://github.com/pytorch/pytorch/issues/71105

@ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82797
Approved by: https://github.com/ezyang
2022-08-12 02:08:01 +00:00
anjali411
bda04e9f5e Add __all__ for torch.optim and torch.nn.modules modules (#80237)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80237
Approved by: https://github.com/albanD
2022-06-24 21:34:10 +00:00
Adam J. Stewart
dfde877c0b Add type hints for a few random functions/classes
Adds type hints for a few functions/classes that we use in [TorchGeo](https://github.com/microsoft/torchgeo).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74171
Approved by: https://github.com/jbschlosser, https://github.com/anjali411
2022-05-04 13:53:00 +00:00
PyTorch MergeBot
80fe96c860 Revert "Add type hints for a few random functions/classes"
This reverts commit cdb40eb528.

Reverted https://github.com/pytorch/pytorch/pull/74171 on behalf of https://github.com/zengk95
2022-04-21 21:07:15 +00:00
Adam J. Stewart
cdb40eb528 Add type hints for a few random functions/classes
Adds type hints for a few functions/classes that we use in [TorchGeo](https://github.com/microsoft/torchgeo).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74171
Approved by: https://github.com/jbschlosser
2022-04-21 20:09:40 +00:00
Andreas Kouzelis
133c213415 updated the docs for BatchNorm1d and InstanceNorm1d (#71371)
Summary:
Fixes input shape notation inconsistencies mentioned here: https://github.com/pytorch/pytorch/issues/71366

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71371

Reviewed By: anjali411

Differential Revision: D33746814

Pulled By: jbschlosser

fbshipit-source-id: 21a080ea30192cd109e437f758afe54d57778724
(cherry picked from commit c1fecebd03)
2022-01-25 15:34:24 +00:00
Michael Carilli
29ff596dca [CUDA graphs] Changes batchnorm to increment num_batches_tracked in place for improved graph safety (#70444)
Summary:
This PR was not my worst debugging annoyance, nor my smallest in lines changed, but it has the highest `debugging annoyance/lines changed` ratio.

The current pattern
```
self.num_batches_tracked = self.num_batches_tracked + 1
```
, if captured, deletes an eagerly-allocated tensor and overwrites it with a captured tensor. Replays read from the (deallocated) original tensor's address.
This can cause
1. an IMA on graph replay
2. failure to actually increment `num_batches_tracked` during graph replay, because every replay reads from the old location without adding to it
3. numerical corruption if the allocator reassigns the original tensor's memory to some unrelated tensor
4. combinations of 1, 2, and 3, depending on global allocation patterns and if/when the BN module is called eagerly sometimes between replays

(ask me how I know).
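
A self-contained sketch of the graph-safe alternative (the exact form used in the fix is not shown here): in-place updates keep the same storage, so replays read and write the address that was captured.

```python
import torch

num_batches_tracked = torch.zeros((), dtype=torch.long)
num_batches_tracked += 1     # in-place: Tensor.__iadd__ dispatches to add_()
num_batches_tracked.add_(1)  # equivalent explicit in-place increment
```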

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70444

Reviewed By: albanD

Differential Revision: D33342203

Pulled By: ngimel

fbshipit-source-id: 5f201cc25030517e75af010bbaa88c452155df21
2022-01-04 17:06:46 -08:00
Calvin McCarter
bdf439a958 Adds _LazyInstanceNorm and LazyInstanceNormXd (#60982)
Summary:
Signed-off-by: Calvin McCarter <calvin@lightmatter.co>

Fixes https://github.com/pytorch/pytorch/issues/60981

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60982

Reviewed By: albanD

Differential Revision: D29810547

Pulled By: jbschlosser

fbshipit-source-id: d933d4c7fe5cf7be9b09a5ab93f740b94cf08cc1
2021-07-21 06:45:45 -07:00
lezcano
4e347f1242 [docs] Fix backticks in docs (#60474)
Summary:
There is a very common error when writing docs: One forgets to write a matching `` ` ``, and something like ``:attr:`x`` is rendered in the docs. This PR fixes most (all?) of these errors (and a few others).

I found these by running ``grep -r ">[^#<][^<]*\`"`` on the `docs/build/html/generated` folder. The regex finds an HTML tag that does not start with `#` (as Python comments in example code may contain backticks) and that contains a backtick in the rendered HTML.

This regex has not given any false positive in the current codebase, so I am inclined to suggest that we should add this check to the CI. Would this be possible / reasonable / easy to do, malfet?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60474

Reviewed By: mrshenli

Differential Revision: D29309633

Pulled By: albanD

fbshipit-source-id: 9621e0e9f87590cea060dd084fa367442b6bd046
2021-06-24 06:27:41 -07:00
Hameer Abbasi
46e4b2dbda Convert assert -> cast. (#57458)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/55868.
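
A hypothetical before/after of the kind of change the title describes (`typing.cast` performs no runtime check, unlike `assert`):

```python
from typing import Optional, cast

def scale(factor: Optional[float]) -> float:
    # before: assert factor is not None
    factor = cast(float, factor)  # after: narrows the type for the checker only
    return factor * 2.0
```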

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57458

Reviewed By: mruberry

Differential Revision: D28365745

Pulled By: walterddr

fbshipit-source-id: 35cc3fa85f87b0ef98cf970f620ab909d240c7be
2021-05-12 13:54:16 -07:00
Rohan Varma
6ee5e490d4 [BE][SyncBN] Avoid sync stats in eval mode (#56982)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56982

SyncBatchNorm should behave as a regular BN layer in eval mode; this change ensures that this is the case.

In particular, the bug was that when `track_running_stats=False`, `bn_training` would be set to True in eval mode, which would trigger a collective sync in SyncBN.

However, in eval mode syncBN should behave like a regular BN layer and not do this sync.
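
A minimal sketch (not the actual patch) of the decision described above:

```python
def needs_stats_sync(module_training: bool, track_running_stats: bool, in_ddp: bool) -> bool:
    # Batch statistics are used whenever the layer is training or keeps no running stats.
    bn_training = module_training or not track_running_stats
    # But collectives must only be issued while actually training; in eval mode
    # SyncBatchNorm should behave like a plain BN layer.
    return in_ddp and bn_training and module_training
```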

Closes https://github.com/pytorch/pytorch/issues/48988

Ensured with unittest that when used for inference on a single rank, stats sync is not triggered.
ghstack-source-id: 127544421

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D27579297

fbshipit-source-id: 26406e2793f0be14f2daa46ae66f97a8494182ed
2021-04-28 09:53:30 -07:00
Joel Schlosser
febff45900 Support factory kwargs in torch.nn modules (#54508)
Summary:
Continuation of https://github.com/pytorch/pytorch/pull/53144
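
A hedged illustration of the feature: the factory kwargs `device` and `dtype` become accepted by the nn module constructors.

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(16, device="cpu", dtype=torch.float64)
print(bn.weight.dtype)         # torch.float64
print(bn.running_mean.device)  # cpu
```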

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54508

Reviewed By: albanD

Differential Revision: D27939544

Pulled By: jbschlosser

fbshipit-source-id: 4bf517e5f74f093e27ca38a85e732da65e44d805
2021-04-22 16:16:53 -07:00
Joel Schlosser
12b2bc94d7 Revert D27909732: [pytorch][PR] Support factory kwargs in torch.nn modules
Test Plan: revert-hammer

Differential Revision:
D27909732 (5a09def9b0)

Original commit changeset: d8684b2403ab

fbshipit-source-id: d00d69fae4fa4ed58d9e97e70b27a06a0dcb39e4
2021-04-21 13:44:03 -07:00
Joel Schlosser
5a09def9b0 Support factory kwargs in torch.nn modules (#54508)
Summary:
Continuation of https://github.com/pytorch/pytorch/pull/53144

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54508

Reviewed By: malfet

Differential Revision: D27909732

Pulled By: jbschlosser

fbshipit-source-id: d8684b2403ab7eb336371d118799146a2520bd76
2021-04-21 13:20:11 -07:00
Sam Estep
75024e228c Add lint for unqualified type: ignore (#56290)
Summary:
The other half of https://github.com/pytorch/pytorch/issues/56272.
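
A hypothetical snippet showing what the lint distinguishes: a bare ignore silences every error code on the line, while a qualified ignore only suppresses the named one.

```python
x: int = "oops"  # type: ignore              # unqualified: flagged by the new lint
y: int = "oops"  # type: ignore[assignment]  # qualified: allowed
```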

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56290

Test Plan:
CI should pass on the tip of this PR, and we know that the lint works because the following CI runs (before this PR was finished) failed:

- https://github.com/pytorch/pytorch/runs/2384511062
- https://github.com/pytorch/pytorch/actions/runs/765036024

Reviewed By: seemethere

Differential Revision: D27867219

Pulled By: samestep

fbshipit-source-id: e648f07b6822867e70833e23ddafe7fb7eaca235
2021-04-21 08:07:23 -07:00
Yi Wang
5017c5fcad [SPMD] Remove _specify_ddp_gpu_num method (#56425)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56425

As SPMD mode is gone, `_specify_ddp_gpu_num` becomes useless. It only checks if the module is a GPU module, which is actually already checked by the caller of this function (in fairscale and some other codebases).

Additionally also remove `enable_pytorch_sync_bn` wrapper that only calls this function and does nothing else.
ghstack-source-id: 126885376

Test Plan: waitforbuildbot

Reviewed By: zhaojuanmao

Differential Revision: D27866440

fbshipit-source-id: d2fd5cf43eda25c0a2bd35f647848ec0dbd6ad0f
2021-04-20 11:17:47 -07:00
Yi Wang
07653b7fe0 [SPMD] Remove ddp_gpu_size field from SyncBatchNorm (#55946)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55946

As the `ddp_gpu_size` field of `SyncBatchNorm` will always be 1 for GPU modules, remove this field and the relevant code.
ghstack-source-id: 126883498

Test Plan: waitforbuildbot

Reviewed By: zhaojuanmao

Differential Revision: D27746021

fbshipit-source-id: b4518c07e6f0c6943fbd7a7548500a7d4337126c
2021-04-19 21:41:29 -07:00
Natalia Gimelshein
92d24e3060 Revert D27855386: [pytorch][PR] Support factory kwargs in torch.nn modules
Test Plan: revert-hammer

Differential Revision:
D27855386 (40483acc51)

Original commit changeset: dabd505d2a04

fbshipit-source-id: f5bf3120d87861b30a8e1bf11977ad7d27cd8500
2021-04-19 20:07:20 -07:00
Joel Schlosser
40483acc51 Support factory kwargs in torch.nn modules (#54508)
Summary:
Continuation of https://github.com/pytorch/pytorch/pull/53144

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54508

Reviewed By: bdhirsh

Differential Revision: D27855386

Pulled By: jbschlosser

fbshipit-source-id: dabd505d2a04208e74b158570fb2859c736eea2c
2021-04-19 12:24:58 -07:00
Sam Estep
d05e7c163f Revert D27600457: [pytorch][PR] Support factory kwargs in torch.nn modules
Test Plan: revert-hammer

Differential Revision:
D27600457 (1077f87269)

Original commit changeset: b58bfee61c39

fbshipit-source-id: 19d5bfc5133a3880383731d0332503ca1f3bce0c
2021-04-19 07:47:24 -07:00
Joel Schlosser
1077f87269 Support factory kwargs in torch.nn modules (#54508)
Summary:
Continuation of https://github.com/pytorch/pytorch/pull/53144

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54508

Reviewed By: mrshenli

Differential Revision: D27600457

Pulled By: jbschlosser

fbshipit-source-id: b58bfee61c3917524b4622f63ef216c27a588eb1
2021-04-19 06:58:40 -07:00
Yi Wang
d398a705c6 Clang-format batchnorm.py and distributed.py (#55971)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55971

Per title
ghstack-source-id: 126454339

Test Plan: N/A

Reviewed By: zhaojuanmao

Differential Revision: D27752315

fbshipit-source-id: 64ca5dea7b2689037594a6bd9a75641a9bb817c1
2021-04-13 18:40:23 -07:00
Yukio Siraichi
27048c1dfa Remove legacy constructor calls from _torch_ folder. (#53889)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53146
Related to https://github.com/pytorch/pytorch/issues/47112

As mentioned in https://github.com/pytorch/pytorch/issues/47112, the plan is to:

1. Verify that all `torch.Tensor()` scenarios are covered by other functions
2. Scrub internal `torch.Tensor()` uses
3. Update the docs and throw `TORCH_WARN_ONCE` if someone uses `torch.Tensor()`

In this PR, I replaced all occurrences of `torch.Tensor` present in the _torch_ folder.
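
For reference, typical replacements look like this (an illustrative mapping, not an exhaustive list):

```python
import torch

# legacy: torch.Tensor(3, 4)        -> uninitialized float storage
a = torch.empty(3, 4)

# legacy: torch.Tensor([1.0, 2.0])  -> copy construction from data
b = torch.tensor([1.0, 2.0])
```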

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53889

Reviewed By: walterddr, zou3519

Differential Revision: D27190743

Pulled By: jbschlosser

fbshipit-source-id: 7ecc201d57935b8dbb98ae3718b60d95cb55a010
2021-03-19 15:20:19 -07:00
Edward Yang
72c7983f23 Remove __get__ from Tensor stub. (#54208)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54208

It seems like it was added to suppress some errors in LazyModules, but I think we should solve those more directly with some type ignores in more surgical places.

Fixes #54087.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D27137363

Pulled By: ezyang

fbshipit-source-id: 017cafcc3350e73cd62436078835b97cd9b3b929
2021-03-17 21:40:58 -07:00
Emilio Castillo
c0c5f80f36 Lazy Modules Documentation Clarifications (#53495)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53366

gchanan albanD
Thanks for the feedback. Did a first pass trying to address the concerns in the original issue.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53495

Reviewed By: mrshenli

Differential Revision: D26914768

Pulled By: albanD

fbshipit-source-id: fa049f1952ef05598f0da2abead9a5a5d3602f75
2021-03-09 13:09:33 -08:00
Sam Estep
8c798e0622 Forbid trailing whitespace (#53406)
Summary:
Context: https://github.com/pytorch/pytorch/pull/53299#discussion_r587882857

These are the only hand-written parts of this diff:
- the addition to `.github/workflows/lint.yml`
- the file endings changed in these four files (to appease FB-internal land-blocking lints):
  - `GLOSSARY.md`
  - `aten/src/ATen/core/op_registration/README.md`
  - `scripts/README.md`
  - `torch/csrc/jit/codegen/fuser/README.md`

The rest was generated by running this command (on macOS):
```
git grep -I -l ' $' -- . ':(exclude)**/contrib/**' ':(exclude)third_party' | xargs gsed -i 's/ *$//'
```

I looked over the auto-generated changes and didn't see anything that looked problematic.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53406

Test Plan:
This run (after adding the lint but before removing existing trailing spaces) failed:
- https://github.com/pytorch/pytorch/runs/2043032377

This run (on the tip of this PR) succeeded:
- https://github.com/pytorch/pytorch/runs/2043296348

Reviewed By: walterddr, seemethere

Differential Revision: D26856620

Pulled By: samestep

fbshipit-source-id: 3f0de7f7c2e4b0f1c089eac9b5085a58dd7e0d97
2021-03-05 17:22:55 -08:00
Xiao Wang
d30f4d1dfd Migrate apex.parallel.SyncBatchNorm channels_last to pytorch (#46906)
Summary:
per title

This PR did
- Migrate `apex.parallel.SyncBatchNorm` channels_last to pytorch `torch.nn.SyncBatchNorm`
- Fix a TODO here by fusing `sum`, `div` kernels into backward elementwise kernel
b167402e2e/torch/nn/modules/_functions.py (L76-L95)

Todo
- [x] Discuss a regression introduced in https://github.com/pytorch/pytorch/pull/37133#discussion_r512530389, which is the synchronized copy here
b167402e2e/torch/nn/modules/_functions.py (L32-L34)

**Comment**: This PR uses the apex version for the size check. Tests passed and I haven't seen anything wrong so far.

- [x] The restriction to use channels_last kernel will be like this
```
inline bool batch_norm_use_channels_last_kernels(const at::Tensor& self) {
  return self.is_contiguous(at::MemoryFormat::ChannelsLast) || self.ndimension() == 2;
}
```
I think we can relax that for channels_last_3d as well?

**Comment**: we don't have a benchmark for this now; will check this and add the functionality later when needed.
- [x] Add test
- [x] Add benchmark

Detailed benchmark is at https://github.com/xwang233/code-snippet/tree/master/syncbn-channels-last
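
A rough Python analogue of the C++ eligibility check quoted above (illustrative only):

```python
import torch

def use_channels_last_kernels(t: torch.Tensor) -> bool:
    return t.is_contiguous(memory_format=torch.channels_last) or t.ndimension() == 2
```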

Close https://github.com/pytorch/pytorch/issues/50781

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46906

Reviewed By: albanD

Differential Revision: D26771437

Pulled By: malfet

fbshipit-source-id: d00387044e9d43ac7e6c0e32a2db22c63d1504de
2021-03-03 15:29:45 -08:00
zilinzhu
c8b3686a3e Make bias in lazy modules lazy and avoid create empty tensors (#52212)
Summary:
Some minor improvements for the lazy modules introduced in https://github.com/pytorch/pytorch/issues/44538, https://github.com/pytorch/pytorch/issues/47350 and https://github.com/pytorch/pytorch/issues/51548.

This PR mainly turns the bias into an `UninitializedParameter`, and instead of creating empty tensors like
```python
self.bias = Parameter(torch.Tensor(0))
self.bias = UninitializedParameter()
```
I think it would be better to
```python
self.register_parameter('bias', None)
self.bias = UninitializedParameter()
```

In addition, I change the constructor of `LazyBatchNorm` from
```python
self.running_mean = UninitializedBuffer()
```
to
```python
self.register_buffer('running_mean', UninitializedBuffer())
```
as the original one would not change the underlying `self._buffers`.

Thank you for your time on reviewing this PR :).

Gently ping albanD, mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52212

Reviewed By: jbschlosser

Differential Revision: D26504508

Pulled By: albanD

fbshipit-source-id: 7094d0bb4fa9e2a40a07b79d350ea12a6ebfd080
2021-02-18 06:34:53 -08:00
Akifumi Imanishi
b3fda95fe7 Add LazyBatchNormXd (#51862)
Summary:
Same diff as https://github.com/pytorch/pytorch/issues/51548 (cc. albanD)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51862

Reviewed By: izdeby

Differential Revision: D26312289

Pulled By: albanD

fbshipit-source-id: 9cdec0e0c9021c33d10d85010978c7fa5cb4dc60
2021-02-09 10:29:03 -08:00
Alban Desmaison
a930162c69 Revert D26276903: [pytorch][PR] Add LazyBatchNormXd
Test Plan: revert-hammer

Differential Revision:
D26276903 (aa1fd6b45a)

Original commit changeset: 0ac706974178

fbshipit-source-id: bfe01b01cd460f1e2845ea5ef1fc1514e6b6ba54
2021-02-05 12:37:29 -08:00
Akifumi Imanishi
aa1fd6b45a Add LazyBatchNormXd (#51548)
Summary:
This PR implements UninitializedBuffer and LazyBatchNormXd based on https://github.com/pytorch/pytorch/issues/44538. (cc. emcastillo and albanD)
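
A hedged usage sketch of the lazy variants (shapes are arbitrary):

```python
import torch
import torch.nn as nn

m = nn.LazyBatchNorm2d()  # num_features is inferred on the first forward pass
x = torch.randn(4, 16, 8, 8)
m(x)
print(m.num_features)  # 16 after materialization
```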

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51548

Reviewed By: zhangguanheng66

Differential Revision: D26276903

Pulled By: albanD

fbshipit-source-id: 0ac706974178363f8af075e59b41d5989418922f
2021-02-05 10:27:04 -08:00
Nikita Shulga
bf4fcab681 Fix SyncBatchNorm usage without stats tracking (#50126)
Summary:
- In `batch_norm_gather_stats_with_counts_cuda`, use `input.scalar_type()` if `running_mean` is not defined
- In the `SyncBatchNorm` forward function, create the count tensor with `torch.float32` type if `running_mean` is None
- Fix a few typos

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50126

Test Plan:
```
python -c "import torch;print(torch.batch_norm_gather_stats_with_counts( torch.randn(1, 3, 3, 3, device='cuda'), mean = torch.ones(2, 3, device='cuda'), invstd = torch.ones(2, 3, device='cuda'), running_mean = None, running_var = None  , momentum = .1, eps = 1e-5, counts = torch.ones(2, device='cuda')))"
```

Fixes https://github.com/pytorch/pytorch/issues/49730

Reviewed By: ngimel

Differential Revision: D25797930

Pulled By: malfet

fbshipit-source-id: 22a91e3969b5e9bbb7969d9cc70b45013a42fe83
2021-01-07 18:31:13 -08:00
Rohan Varma
c0a0845019 Improve new_group example in the context of SyncBatchNorm (#48897)
Summary:
Closes https://github.com/pytorch/pytorch/issues/48804
Improves the documentation/example in the SyncBN docs to clearly show that each rank must participate in every `new_group()` call used to create process subgroups, even if it is not going to be part of that particular subgroup.
We then pick the right group, i.e. the group that the rank is part of, and pass that into the SyncBN APIs.
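
A sketch of that pattern (assuming an already-initialized default process group with 4 ranks; names and the model are illustrative):

```python
import torch.distributed as dist
import torch.nn as nn

rank = dist.get_rank()
# Every rank must execute BOTH new_group() calls, even for groups it is not in.
group_a = dist.new_group(ranks=[0, 1])
group_b = dist.new_group(ranks=[2, 3])
my_group = group_a if rank in (0, 1) else group_b

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8))
model = nn.SyncBatchNorm.convert_sync_batchnorm(model, process_group=my_group)
```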

Doc rendering:

[syncbn_update doc screenshot](https://user-images.githubusercontent.com/8039770/101271959-b211ab80-373c-11eb-8b6d-d56483fd9f5d.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48897

Reviewed By: zou3519

Differential Revision: D25493181

Pulled By: rohan-varma

fbshipit-source-id: a7e93fc8cc07ec7797e5dbc356f1c3877342cfa3
2020-12-11 10:28:08 -08:00
Guilherme Leobas
9b52654620 annotate a few torch.nn.modules.* modules (#45772)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45771

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45772

Reviewed By: mruberry

Differential Revision: D24682013

Pulled By: albanD

fbshipit-source-id: e32bc4fe9c586c079f7070924a874c70f3d127fa
2020-11-02 13:04:59 -08:00
Vasiliy Kuznetsov
bdf329ef8a SyncBN: preserve qconfig if it exists (#45317)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45317

Eager mode quantization depends on the presence of the `qconfig` model attribute. Currently, converting a model to use `SyncBatchNorm` removes the qconfig; this PR fixes that. This is important if a BN is not fused to anything during quantization convert.
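
A minimal sketch (not the actual patch) of the preservation step:

```python
def copy_qconfig(original_bn, sync_bn):
    # Keep eager-mode quantization working when BN is not fused away:
    # carry the qconfig over to the SyncBatchNorm replacement, if present.
    if hasattr(original_bn, "qconfig"):
        sync_bn.qconfig = original_bn.qconfig
```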

Test Plan:
```
python test/test_quantization.py TestDistributed.test_syncbn_preserves_qconfig
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D23922072

fbshipit-source-id: cc1bc25c8e5243abb924c6889f78cf65a81be158
2020-09-24 22:52:07 -07:00
Lin.Sung
f77ba0e48c Change typo 'momemtum' to 'momentum' (#45045)
Summary:
As the title.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45045

Reviewed By: mruberry

Differential Revision: D23808563

Pulled By: mrshenli

fbshipit-source-id: ca818377f4c23d67b037c146fef667ab8731961e
2020-09-21 19:03:26 -07:00
Xiang Gao
20ac736200 Remove py2 compatible future imports (#44735)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44735

Reviewed By: mruberry

Differential Revision: D23731306

Pulled By: ezyang

fbshipit-source-id: 0ba009a99e475ddbe22981be8ac636f8a1c8b02f
2020-09-16 12:55:57 -07:00