Fixes #103818
1. Some special nn.Modules have checks that only support CUDA, so I added a `privateuse1` check.
2. Getting the device type for `privateuse1` via `torch._C._get_privateuse1_backend_name()` raises an error under `torch.jit.script`, so I added a global variable to avoid this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103419
Approved by: https://github.com/albanD
This is a new version of #15648 based on the latest master branch.
Unlike the previous PR where I fixed a lot of the doctests in addition to integrating xdoctest, I'm going to reduce the scope here. I'm simply going to integrate xdoctest, and then I'm going to mark all of the failing tests as "SKIP". This will let xdoctest run on the dashboards, provide some value, and still let the dashboards pass. I'll leave fixing the doctests themselves to another PR.
In my initial commit, I do the bare minimum to get something running with failing dashboards. The few tests that I marked as skip are causing segfaults. Running xdoctest results in 293 failed, 201 passed tests. The next commits will disable those tests. (Unfortunately, I don't have a tool that will insert the `#xdoctest: +SKIP` directive over every failing test, so I'm going to do this mostly manually.)
Fixes https://github.com/pytorch/pytorch/issues/71105
@ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82797
Approved by: https://github.com/ezyang
Summary:
This PR was not my worst debugging annoyance, nor my smallest in lines changed, but it has the highest `debugging annoyance/lines changed` ratio.
The current pattern
```
self.num_batches_tracked = self.num_batches_tracked + 1
```
, if captured, deletes an eagerly-allocated tensor and overwrites it with a captured tensor. Replays read from the (deallocated) original tensor's address.
This can cause
1. an IMA (illegal memory access) on graph replay
2. failure to actually increment `num_batches_tracked` during graph replay, because every replay reads from the old location without adding to it
3. numerical corruption if the allocator reassigns the original tensor's memory to some unrelated tensor
4. combinations of 1, 2, and 3, depending on global allocation patterns and on if/when the BN module is called eagerly between replays
(ask me how I know).
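The "no increment on replay" failure (item 2) can be illustrated with a toy model. This is a pure-Python analogy, not PyTorch or CUDA graph code: `ToyMemory`, `capture`, `replay`, and `run` are hypothetical names, a "graph" is just a recorded (source address, destination address) pair, and deallocation of the old storage is not modelled. The in-place variant is included only for contrast and is an assumption about what a fix looks like.

```python
class ToyMemory:
    """Toy address space: integer addresses mapped to values."""

    def __init__(self):
        self.cells = {}

    def alloc(self, value=None):
        addr = len(self.cells)
        self.cells[addr] = value
        return addr


def capture(mem, module, in_place):
    """Record 'counter + 1' without running it, the way a capture would."""
    src = module["num_batches_tracked"]
    # Out-of-place: the result lives in fresh storage and the attribute
    # is rebound to it. In-place: source and destination are the same.
    dst = src if in_place else mem.alloc()
    module["num_batches_tracked"] = dst
    return (src, dst)


def replay(mem, op):
    """Re-run the recorded op against the recorded addresses."""
    src, dst = op
    mem.cells[dst] = mem.cells[src] + 1


def run(in_place, replays=3):
    mem = ToyMemory()
    module = {"num_batches_tracked": mem.alloc(0)}
    op = capture(mem, module, in_place)
    for _ in range(replays):
        replay(mem, op)
    # Value the module sees through its (possibly rebound) attribute.
    return mem.cells[module["num_batches_tracked"]]


rebind_result = run(in_place=False)
in_place_result = run(in_place=True)
```

In this toy model, every replay of the rebinding version reads the original address and writes the same result elsewhere, so the counter never advances past its first value, while the in-place version accumulates.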
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70444
Reviewed By: albanD
Differential Revision: D33342203
Pulled By: ngimel
fbshipit-source-id: 5f201cc25030517e75af010bbaa88c452155df21
Summary:
There is a very common error when writing docs: One forgets to write a matching `` ` ``, and something like ``:attr:`x`` is rendered in the docs. This PR fixes most (all?) of these errors (and a few others).
I found these running ``grep -r ">[^#<][^<]*\`"`` on the `docs/build/html/generated` folder. The regex finds an HTML tag that does not start with `#` (as python comments in example code may contain backticks) and that contains a backtick in the rendered HTML.
This regex has not given any false positives in the current codebase, so I am inclined to suggest that we add this check to the CI. Would this be possible / reasonable / easy to do, malfet?
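For reference, the same pattern can be tried out in Python. This is a minimal sketch of the check described above, not a proposed CI script: `lines_with_stray_backticks` and the sample HTML lines are made up for illustration.

```python
import re

# Same pattern as the grep call: a '>' closing a tag, then a character
# that is neither '#' nor '<', then any non-'<' text, then a backtick.
STRAY_BACKTICK = re.compile(r">[^#<][^<]*`")


def lines_with_stray_backticks(html):
    """Return the lines of rendered HTML that still contain a backtick."""
    return [line for line in html.splitlines() if STRAY_BACKTICK.search(line)]


sample = "\n".join(
    [
        '<span class="pre">:attr:`x</span>',            # unmatched backtick
        '<span class="c1"># uses `backticks`</span>',  # code comment, skipped
        "<p>clean text</p>",                            # nothing to report
    ]
)

flagged = lines_with_stray_backticks(sample)
```

Only the first sample line is reported: the second starts with `#` right after the tag, and the third contains no backtick.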
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60474
Reviewed By: mrshenli
Differential Revision: D29309633
Pulled By: albanD
fbshipit-source-id: 9621e0e9f87590cea060dd084fa367442b6bd046
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56982
SyncBatchNorm should behave as a regular BN layer in eval mode; this
change ensures that this is the case.
In particular, the bug was that with `track_running_stats=False`, `bn_training` is set to True in eval mode, which triggered a collective sync in SyncBN.
However, in eval mode SyncBN should behave like a regular BN layer and not do this sync.
Closes https://github.com/pytorch/pytorch/issues/48988
Ensured with unittest that when used for inference on a single rank, stats sync is not triggered.
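The intended decision can be sketched in plain Python. This is a sketch of the behaviour described above, not the code in this PR: `syncbn_flags` and its parameters are hypothetical, and `distributed_ready` stands in for whatever process-group checks the real module performs.

```python
def syncbn_flags(training, has_running_stats, distributed_ready=True):
    """Return (bn_training, need_sync) for one forward call."""
    # Batch statistics are used while training, or when there are no
    # running stats to fall back on (track_running_stats=False).
    bn_training = training or not has_running_stats
    # The collective sync is only wanted in training mode, even if
    # bn_training is True in eval mode.
    need_sync = bn_training and training and distributed_ready
    return bn_training, need_sync


# Eval mode with track_running_stats=False: batch stats, but no sync.
eval_flags = syncbn_flags(training=False, has_running_stats=False)
# Training mode: batch stats and a sync across ranks.
train_flags = syncbn_flags(training=True, has_running_stats=True)
```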
ghstack-source-id: 127544421
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D27579297
fbshipit-source-id: 26406e2793f0be14f2daa46ae66f97a8494182ed
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56425
As SPMD mode is gone, `_specify_ddp_gpu_num` becomes useless. It only checks if the module is a GPU module. This actually is already checked by the caller of this function (in fairscale and some other codebases).
Additionally, remove the `enable_pytorch_sync_bn` wrapper, which only calls this function and does nothing else.
ghstack-source-id: 126885376
Test Plan: waitforbuildbot
Reviewed By: zhaojuanmao
Differential Revision: D27866440
fbshipit-source-id: d2fd5cf43eda25c0a2bd35f647848ec0dbd6ad0f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55946
As the `ddp_gpu_size` field of `SyncBatchNorm` is always 1 for GPU modules, remove this field and the relevant code.
ghstack-source-id: 126883498
Test Plan: waitforbuildbot
Reviewed By: zhaojuanmao
Differential Revision: D27746021
fbshipit-source-id: b4518c07e6f0c6943fbd7a7548500a7d4337126c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54208
It seems like it was added to suppress some errors in LazyModules, but I think we should solve those more directly with some type ignores in more surgical places.
Fixes #54087.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D27137363
Pulled By: ezyang
fbshipit-source-id: 017cafcc3350e73cd62436078835b97cd9b3b929
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53366
gchanan albanD
Thanks for the feedback. Did a first pass trying to address the concerns in the original issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53495
Reviewed By: mrshenli
Differential Revision: D26914768
Pulled By: albanD
fbshipit-source-id: fa049f1952ef05598f0da2abead9a5a5d3602f75
Summary:
Context: https://github.com/pytorch/pytorch/pull/53299#discussion_r587882857
These are the only hand-written parts of this diff:
- the addition to `.github/workflows/lint.yml`
- the file endings changed in these four files (to appease FB-internal land-blocking lints):
  - `GLOSSARY.md`
  - `aten/src/ATen/core/op_registration/README.md`
  - `scripts/README.md`
  - `torch/csrc/jit/codegen/fuser/README.md`
The rest was generated by running this command (on macOS):
```
git grep -I -l ' $' -- . ':(exclude)**/contrib/**' ':(exclude)third_party' | xargs gsed -i 's/ *$//'
```
I looked over the auto-generated changes and didn't see anything that looked problematic.
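For readers without `gsed`, the effect of the `s/ *$//` substitution on a single file's contents can be sketched in Python. This only illustrates the substitution; `strip_trailing_spaces` is a hypothetical helper and was not used to generate this diff.

```python
def strip_trailing_spaces(text):
    """Remove trailing space characters from every line of text.

    Like the sed expression, this strips spaces only: tabs and other
    whitespace at the end of a line are left alone.
    """
    return "\n".join(line.rstrip(" ") for line in text.split("\n"))


before = "first line   \nsecond\t \n\nlast line"
after = strip_trailing_spaces(before)
```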
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53406
Test Plan:
This run (after adding the lint but before removing existing trailing spaces) failed:
- https://github.com/pytorch/pytorch/runs/2043032377
This run (on the tip of this PR) succeeded:
- https://github.com/pytorch/pytorch/runs/2043296348
Reviewed By: walterddr, seemethere
Differential Revision: D26856620
Pulled By: samestep
fbshipit-source-id: 3f0de7f7c2e4b0f1c089eac9b5085a58dd7e0d97
Summary:
per title
This PR:
- Migrate `apex.parallel.SyncBatchNorm` channels_last to pytorch `torch.nn.SyncBatchNorm`
- Fix a TODO here by fusing `sum`, `div` kernels into backward elementwise kernel
b167402e2e/torch/nn/modules/_functions.py (L76-L95)
Todo
- [x] Discuss a regression introduced in https://github.com/pytorch/pytorch/pull/37133#discussion_r512530389, which is the synchronized copy here
b167402e2e/torch/nn/modules/_functions.py (L32-L34)
**Comment**: This PR uses the apex version for the size check. Tests passed and I haven't seen anything wrong so far.
- [x] The restriction to use channels_last kernel will be like this
```
inline bool batch_norm_use_channels_last_kernels(const at::Tensor& self) {
  return self.is_contiguous(at::MemoryFormat::ChannelsLast) || self.ndimension() == 2;
}
```
I think we can relax that for channels_last_3d as well?
**Comment**: We don't have a benchmark for this yet; will check this and add the functionality later when needed.
- [x] Add test
- [x] Add benchmark
Detailed benchmark is at https://github.com/xwang233/code-snippet/tree/master/syncbn-channels-last
Closes https://github.com/pytorch/pytorch/issues/50781
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46906
Reviewed By: albanD
Differential Revision: D26771437
Pulled By: malfet
fbshipit-source-id: d00387044e9d43ac7e6c0e32a2db22c63d1504de
Summary:
Some minor improvements for lazy modules introduced in https://github.com/pytorch/pytorch/issues/44538, https://github.com/pytorch/pytorch/issues/47350 and https://github.com/pytorch/pytorch/issues/51548.
This PR mainly turns the bias into an `UninitializedParameter`. Instead of creating empty tensors like
```python
self.bias = Parameter(torch.Tensor(0))
self.bias = UninitializedParameter()
```
I think it would be better to
```python
self.register_parameter('bias', None)
self.bias = UninitializedParameter()
```
In addition, I changed the constructor of `LazyBatchNorm` from
```python
self.running_mean = UninitializedBuffer()
```
to
```python
self.register_buffer('running_mean', UninitializedBuffer())
```
as the original one would not change the underlying `self._buffers`.
Thank you for your time on reviewing this PR :).
Gently ping albanD, mruberry
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52212
Reviewed By: jbschlosser
Differential Revision: D26504508
Pulled By: albanD
fbshipit-source-id: 7094d0bb4fa9e2a40a07b79d350ea12a6ebfd080
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45317
Eager mode quantization depends on the presence of the `qconfig`
model attribute. Currently, converting a model to use `SyncBatchNorm`
removes the qconfig; this PR fixes that. This is important if a BN is not
fused to anything during quantization convert.
Test Plan:
```
python test/test_quantization.py TestDistributed.test_syncbn_preserves_qconfig
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D23922072
fbshipit-source-id: cc1bc25c8e5243abb924c6889f78cf65a81be158
Summary:
This PR aims at tackling https://github.com/pytorch/pytorch/issues/37823 by:
- ensuring that buffers will be used for normalization computation but won't be updated, when buffers are not None, and `track_running_stats=False`
- adding a corresponding unittest to ensure expected behaviour
Any feedback is welcome!
_Note: we might want to update the docstrings of `BatchNorm*d`, feel free to share any suggestion!_
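One way to read the intended behaviour is the decision table sketched below. It is an illustration under assumptions, not the code in this PR: `choose_stats` and its return values are hypothetical, and plain floats stand in for the buffer tensors.

```python
def choose_stats(training, track_running_stats, running_mean, running_var):
    """Return (use_batch_stats, stats_passed_on, will_update_buffers)."""
    has_buffers = running_mean is not None and running_var is not None
    # Batch statistics are used while training, or when no buffers exist.
    use_batch_stats = training or not has_buffers
    # Buffers are handed to the normalization step in eval mode, or
    # whenever running stats are being tracked.
    pass_buffers = (not training) or track_running_stats
    stats = (running_mean, running_var) if pass_buffers else (None, None)
    # Buffers can only change when batch stats are computed and the
    # buffers were actually handed over.
    will_update = use_batch_stats and pass_buffers and has_buffers
    return use_batch_stats, stats, will_update


# Buffers present, track_running_stats=False, eval mode:
# the buffers are used for normalization but are not updated.
eval_case = choose_stats(False, False, 0.5, 2.0)
# Same module in training mode: batch stats are used, buffers untouched.
train_case = choose_stats(True, False, 0.5, 2.0)
```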
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38084
Differential Revision: D22047871
Pulled By: ezyang
fbshipit-source-id: 5acbcad9773e7901f26d625db71d43d7dc236d3e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38211
Just because the annotations are inline doesn't mean the files type
check; most of the newly annotated files have type errors and I
added exclusions for them in mypy.ini. The payoff of moving
all of these modules inline is I can delete the relevant code
generation logic for the pyi files (which was adding ignore
annotations that weren't actually relevant anymore).
For the most part the translation was completely mechanical, but there
were two hairy issues. First, I needed to work around a Python 3.6 and
earlier bug where Generic has a nontrivial metaclass. This fix is in
torch/jit/__init__.py. Second, in module.py, we need to apply the same
fix for avoiding contravariance checks that the pyi file used to have;
this is done by declaring forward as a variable (rather than a
function), which appears to be sufficient to get mypy to not
contravariantly check input arguments.
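A minimal sketch of the "forward as a variable" trick, using only the standard `typing` module. The classes here are stand-ins rather than the real `torch.nn.Module`, and `Scale` is a hypothetical subclass.

```python
from typing import Any, Callable


def _forward_unimplemented(self, *inputs: Any) -> None:
    raise NotImplementedError


class Module:
    # Declared as a variable of a permissive callable type instead of a
    # method, so a type checker does not compare each subclass's
    # `forward` parameters against a fixed base signature.
    forward: Callable[..., Any] = _forward_unimplemented


class Scale(Module):
    # A narrower signature than "anything goes"; this would be flagged
    # if the base class declared `forward` as a regular method with
    # different parameters.
    def forward(self, x: float, factor: float = 2.0) -> float:
        return x * factor


result = Scale().forward(3.0)
```

At runtime nothing changes: the base attribute is still an ordinary function, so calling `forward` on a bare `Module` raises `NotImplementedError`.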
Because we aren't actually typechecking these modules in most
cases, it is inevitable that some of these type annotations are wrong.
I slavishly copied the old annotations from the pyi files unless there
was an obvious correction I could make. These annotations will probably
need fixing up later.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D21497397
Pulled By: ezyang
fbshipit-source-id: 2b08bacc152c48f074e7edc4ee5dce1b77d83702