Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63242
These functions are part of the native functions namespace as well as the quantized namespace
Test Plan:
CI
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D30316430
fbshipit-source-id: cd9c839e5c1a961e3c6944e514c16fbc256a2f0c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63240
The op is exposed to end users via torch.quantized_batch_norm but currently has no documentation.
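For context, a minimal hedged example of calling the op (not part of this diff; argument order follows the public docs, and a quantized CPU backend such as fbgemm is assumed to be available):
```python
import torch

# Quantize a float NCHW tensor, then run the quantized batch norm op on it.
x = torch.rand(2, 3, 4, 4)
qx = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.quint8)

out = torch.quantized_batch_norm(
    qx,              # quantized input
    torch.ones(3),   # weight (per-channel gamma)
    torch.zeros(3),  # bias (per-channel beta)
    torch.zeros(3),  # running mean
    torch.ones(3),   # running variance
    1e-5,            # eps
    0.1,             # output_scale
    0,               # output_zero_point
)
print(out.dtype)  # torch.quint8
```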
Test Plan:
CI
Imported from OSS
Reviewed By: VitalyFedyunin
Differential Revision: D30316431
fbshipit-source-id: bf2dc8b7b6f497cf73528eaa2bedef9f65029d84
Summary:
Warm-up of the learning rate schedule was initially discussed by Priya Goyal et al. in the paper https://arxiv.org/pdf/1706.02677.pdf .
In Section 2.2 of the paper they discuss and propose the idea of warming up learning rate schedules in order to prevent large variance / noise in the learning rate. The idea has been further discussed in the following papers:
* Akilesh Gotmare et al. https://arxiv.org/abs/1810.13243
* Bernstein et al http://proceedings.mlr.press/v80/bernstein18a/bernstein18a.pdf
* Liyuan Liu et al: https://arxiv.org/pdf/1908.03265.pdf
There are two popularly used types of learning rate warmup:
* Constant warmup (start with a very small constant learning rate)
* Linear warmup (start with a small learning rate and gradually increase it)
In this PR we add warmup as a learning rate scheduler. Note that schedulers are chainable, which means that we can combine a warmup scheduler with any other learning rate scheduler to build a more sophisticated schedule.
## Linear Warmup
Linear warmup multiplies the learning rate by a pre-defined constant, warmup_factor, in the first epoch (epoch 0), and then increases that multiplier linearly until it reaches 1 after warmup_iters epochs. Hence the multiplier at the i-th step is:
warmup_factor + (1 - warmup_factor) * i / warmup_iters
Moreover, the ratio of this quantity at step i to that at step i-1 is
1 + (1 - warmup_factor) / [warmup_iters * warmup_factor + (i-1) * (1 - warmup_factor)]
which is what the get_lr() method uses in our implementation. Below we provide an example of how to use the linear warmup scheduler and show how it works.
```python
import torch
from torch.nn import Parameter
from torch.optim import SGD
from torch.optim.lr_scheduler import WarmUpLR

model = [Parameter(torch.randn(2, 2, requires_grad=True))]
optimizer = SGD(model, 0.1)
scheduler = WarmUpLR(optimizer, warmup_factor=0.1, warmup_iters=10, warmup_method="linear")

for epoch in range(15):
    print(epoch, scheduler.get_last_lr()[0])
    optimizer.step()
    scheduler.step()
```
```
0 0.010000000000000002
1 0.019000000000000003
2 0.028000000000000008
3 0.03700000000000001
4 0.04600000000000001
5 0.055000000000000014
6 0.06400000000000002
7 0.07300000000000002
8 0.08200000000000003
9 0.09100000000000004
10 0.10000000000000005
11 0.10000000000000005
12 0.10000000000000005
13 0.10000000000000005
14 0.10000000000000005
```
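As a sanity check (not part of the PR), the printed values can be reproduced directly from the closed-form multiplier above applied to the base learning rate of 0.1:
```python
base_lr, warmup_factor, warmup_iters = 0.1, 0.1, 10
for i in range(15):
    # The multiplier grows linearly from warmup_factor to 1, then stays at 1.
    factor = warmup_factor + (1 - warmup_factor) * min(i, warmup_iters) / warmup_iters
    print(i, base_lr * factor)
```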
## Constant Warmup
Constant warmup is straightforward: multiply the learning rate by warmup_factor for the first warmup_iters epochs, then use the base learning rate for the following epochs.
```python
import torch
from torch.nn import Parameter
from torch.optim import SGD
from torch.optim.lr_scheduler import WarmUpLR

model = [Parameter(torch.randn(2, 2, requires_grad=True))]
optimizer = SGD(model, 0.1)
scheduler = WarmUpLR(optimizer, warmup_factor=0.1, warmup_iters=5, warmup_method="constant")

for epoch in range(10):
    print(epoch, scheduler.get_last_lr()[0])
    optimizer.step()
    scheduler.step()
```
```
0 0.010000000000000002
1 0.010000000000000002
2 0.010000000000000002
3 0.010000000000000002
4 0.010000000000000002
5 0.10000000000000002
6 0.10000000000000002
7 0.10000000000000002
8 0.10000000000000002
9 0.10000000000000002
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60836
Reviewed By: saketh-are
Differential Revision: D29537615
Pulled By: iramazanli
fbshipit-source-id: d910946027acc52663b301f9c56ade686e62cb69
Summary:
This FAQ has a section on CUDA OOMs that lists a lot of don'ts, which limits the modeling options. Deep nets can blow up memory during training because intermediate activations are cached for the backward pass.
It's a known problem with a known solution: trade compute for memory via checkpointing.
The FAQ should mention it.
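For illustration only (a minimal sketch, not part of this diff), gradient checkpointing via `torch.utils.checkpoint` recomputes activations in the backward pass instead of caching them; the toy model below is purely an example:
```python
import torch
from torch.utils.checkpoint import checkpoint_sequential

# A deep stack of layers whose cached activations would normally dominate memory.
model = torch.nn.Sequential(*[torch.nn.Linear(1024, 1024) for _ in range(20)])
x = torch.randn(8, 1024, requires_grad=True)

# Split the model into 4 segments; only segment boundaries keep activations,
# the rest are recomputed during backward (more compute, less memory).
out = checkpoint_sequential(model, 4, x)
out.sum().backward()
```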
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62709
Reviewed By: nairbv
Differential Revision: D30103326
Pulled By: ezyang
fbshipit-source-id: 3a8b465a7fbe19aae88f83cc50fe82ebafcb56c9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62662
Replaced the methods set_tensor(.) and get_tensor() in the Python API exposed from the C++ logic with buffer() and set_buffer(.), giving a cleaner interface.
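A hedged sketch of how the renamed pair reads inside a DDP communication hook (the hook and the gradient scaling are illustrative only, not part of this diff):
```python
import torch
import torch.distributed as dist

def scale_then_allreduce_hook(state, bucket):
    # bucket.buffer() returns the bucket's flattened gradient tensor
    # (formerly get_tensor()); set_buffer() writes it back (formerly set_tensor()).
    bucket.set_buffer(bucket.buffer() / dist.get_world_size())
    fut = dist.all_reduce(bucket.buffer(), async_op=True).get_future()
    return fut.then(lambda f: f.value()[0])

# Registered on a DistributedDataParallel model, e.g.:
# ddp_model.register_comm_hook(state=None, hook=scale_then_allreduce_hook)
```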
Reviewed By: SciPioneer
Differential Revision: D30012869
fbshipit-source-id: bd8efab583dd89c96f9aeb3dd48a12073f0b1482
Summary:
**Overview:**
This removes the preceding `_` from `_Join`, `_Joinable`, and `_JoinHook` in preparation for adding the generic join context manager tutorial (see [here](https://github.com/pytorch/tutorials/pull/1610)). This also adds a docs page, which can be linked from the tutorial. [Here](https://github.com/pytorch/pytorch/files/6919475/render.pdf) is a render of the docs page.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62605
Test Plan:
`DistributedDataParallel.join()`:
```
touch /tmp/barrier && TEMP_DIR="/tmp" BACKEND="nccl" WORLD_SIZE="2" gpurun python test/distributed/test_distributed_fork.py -- TestDistBackendWithFork.test_ddp_uneven_inputs TestDistBackendWithFork.test_ddp_uneven_inputs_stop_iteration_sync_bn TestDistBackendWithFork.test_ddp_grad_div_uneven_inputs TestDistBackendWithFork.test_ddp_uneven_input_join_disable TestDistBackendWithFork.test_ddp_uneven_input_exception
```
`ZeroRedundancyOptimizer`:
```
gpurun4 python test/distributed/optim/test_zero_redundancy_optimizer.py
```
NOTE: DDP overlap tests are failing due to a landing race. See https://github.com/pytorch/pytorch/pull/62592. Once the fix is landed, I will rebase, and tests should be passing.
`Join`:
```
gpurun4 python test/distributed/algorithms/test_join.py
```
Reviewed By: mrshenli
Differential Revision: D30055544
Pulled By: andwgu
fbshipit-source-id: a5ce1f1d9f1904de3bdd4edd0b31b0a612d87026
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62592
Reland #62510
`GradBucket` is an important class defined in both C++ and Python, used for PyTorch Distributed Training. We need to rename the following methods for simplicity (a brief hook sketch using the new names follows the list):
1) get_index -> index
2) is_the_last_bucket_to_allreduce -> is_last
3) get_per_parameter_tensors -> gradients
4) get_model_params_for_bucket -> parameters
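As referenced above, a minimal hedged sketch of a hook written against the renamed accessors, delegating to the built-in allreduce hook (assumed to live under `torch.distributed.algorithms.ddp_comm_hooks.default_hooks`):
```python
from torch.distributed.algorithms.ddp_comm_hooks.default_hooks import allreduce_hook

def logging_allreduce_hook(state, bucket):
    # Renamed accessors: index() (was get_index()), is_last() (was
    # is_the_last_bucket_to_allreduce()), gradients() (was
    # get_per_parameter_tensors()), parameters() (was get_model_params_for_bucket()).
    print(f"bucket {bucket.index()}: {len(bucket.gradients())} gradient tensors, "
          f"{len(bucket.parameters())} parameters, last={bucket.is_last()}")
    return allreduce_hook(state, bucket)
```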
ghstack-source-id: 134848352
Test Plan: unit test
Reviewed By: andwgu
Differential Revision: D30049431
fbshipit-source-id: 1bcac331aa30e529b7230e3891bc811c531b0ea9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62510
`GradBucket` is an important class defined in both C++ and Python, used for PyTorch Distributed Training. We need to rename the following methods for simplicity:
1) get_index -> index
2) is_the_last_bucket_to_allreduce -> is_last
3) get_per_parameter_tensors -> gradients
4) get_model_params_for_bucket -> parameters
Test Plan:
Local run comprehensive test with following results:
https://pxl.cl/1Ml8b
The two timeout test failures are most likely environment related and only occur on my devserver.
Reviewed By: SciPioneer
Differential Revision: D30024161
fbshipit-source-id: 07e6072a2f7b81f731425d9b71f8c8b60d383b0f
Summary:
This creates a `torch.cuda.set_warn_on_synchronization()` function that warns or errors when a synchronizing operation is performed. We could wrap it in a context manager for ease of use, but that would be a lie, because it sets global, not thread-local, state. Since it's intended for debugging, maybe that's ok though.
As with all `torch.cuda.*` functions, it goes through CPython, not pybind, so the argument is converted to a long before being passed to the c10 function. I'll make the Python argument a Python enum class, but without pybind it will still have to go through the long conversion.
For a test script
```
import torch
torch.cuda.set_warn_on_synchronization(1)
x=torch.randn(10, device="cuda")
x.nonzero()
y=torch.randn((), device="cuda")
if y:
print("something")
torch.multinomial(x.abs(), 10, replacement=False)
torch.randperm(20000, device="cuda")
ind = torch.randint(10, (3,), device="cuda")
mask = torch.randint(2, (10,), device="cuda", dtype=torch.bool)
val = torch.randn((), device="cuda")
x[mask]=1.
x[mask] = val
torch.cuda.synchronize()
```
the output is
```
/../playground/sync_warn_test.py:4: UserWarning: called a synchronizing operation (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:145.)
x.nonzero()
/../playground/sync_warn_test.py:7: UserWarning: called a synchronizing operation (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:145.)
if y:
something
/../playground/sync_warn_test.py:9: UserWarning: called a synchronizing operation (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:145.)
torch.multinomial(x.abs(), 10, replacement=False)
/../playground/sync_warn_test.py:15: UserWarning: called a synchronizing operation (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:145.)
x[mask] = val
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62092
Reviewed By: mruberry
Differential Revision: D29968792
Pulled By: ngimel
fbshipit-source-id: cc6f817212c164727ed99ecf6ab050dc29631b9e
Summary:
Sphinx 4.x is out, but it seems that adopting it requires many more changes. So instead use the latest version of 3.x, which includes several nice features.
* Add some noindex directives to deal with warnings that would otherwise
be triggered by this change due to conflicts between the docstrings
declaring a function and the autodoc extension declaring the
same function.
* Update distributions.utils.lazy_property to make it look like a regular
  property when sphinx autodoc inspects classes (a sketch of the idea appears below).
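Roughly, the lazy_property change follows this pattern (a simplified sketch of the idea in torch/distributions/utils.py):
```python
from functools import update_wrapper

class lazy_property:
    """Compute the wrapped method once per instance and cache the result."""

    def __init__(self, wrapped):
        self.wrapped = wrapped
        update_wrapper(self, wrapped)

    def __get__(self, instance, obj_type=None):
        if instance is None:
            # Class-level access, which is what sphinx autodoc does: present
            # the attribute as a plain property so it is documented as one.
            return _lazy_property_and_property(self.wrapped)
        value = self.wrapped(instance)
        setattr(instance, self.wrapped.__name__, value)  # cache on the instance
        return value

class _lazy_property_and_property(lazy_property, property):
    """Looks like a property to autodoc while behaving as a lazy_property."""

    def __init__(self, wrapped):
        property.__init__(self, wrapped)
```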
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61601
Reviewed By: ejguan
Differential Revision: D29801876
Pulled By: albanD
fbshipit-source-id: 544d2434a15ceb77bff236e934dbd8e4dbd9d160
Summary:
CI built the documentation for the recent 1.9.0rc1 tag, but left the git version in the `version`, so (as of now) going to https://pytorch.org/docs/1.9.0/index.html and looking at the version in the upper-left corner shows "1.9.0a0+git5f0bbb3" not "1.9.0". This PR should change that to cut off everything after and including the "a".
It should be cherry-picked to the release/1.9 branch so that the next rc will override the current documentation with a "cleaner" version.
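Illustratively (a sketch only; the actual change lives in the docs build scripts), the trimming amounts to:
```python
# Keep only the public release portion of a git-derived version string.
version = "1.9.0a0+git5f0bbb3"
release = version.split("a")[0]  # -> "1.9.0"
print(release)
```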
brianjo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58486
Reviewed By: zou3519
Differential Revision: D28640476
Pulled By: malfet
fbshipit-source-id: 9fd1063f4a2bc90fa8c1d12666e8c0de3d324b5c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59077
Fixes #58549
`from_buffer` constructs a tensor object from an already allocated buffer through
CPython's buffer protocol. Besides the standard `dtype`, `count`, and `offset` parameters,
this function also accepts:
- `device`: where the buffer lives
- `requires_grad`: should autograd record operations on the new tensor
A new test file, _test_buffer_protocol.py_, was created. Currently, only CPU tests are
implemented, because neither PyTorch nor Numba implements CPython's buffer protocol,
so there is no way to create a CUDA buffer with the existing dependencies
(PyCUDA could be used for that, though).
At the moment, if `device` differs from the device where the buffer actually lives, two things
may happen:
- `RuntimeError`, if `device='cuda'`
- Segmentation fault (not tested -- see above), if `device='cpu'`
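For illustration, a hedged usage sketch of the CPU path described above; the public spelling is assumed here to be `torch.frombuffer` (it may differ from the `from_buffer` working name), and the `device` argument is omitted:
```python
import array
import torch

# The tensor aliases the memory owned by the array, via the buffer protocol.
buf = array.array("f", [1.0, 2.0, 3.0, 4.0])
t = torch.frombuffer(buf, dtype=torch.float32, count=2, offset=4)  # offset is in bytes
print(t)       # tensor([2., 3.])

buf[1] = -1.0  # writes through the buffer are visible in the tensor
print(t)       # tensor([-1., 3.])
```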
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D29870914
Pulled By: mruberry
fbshipit-source-id: 9fa8611aeffedfe39c9af74558178157a11326bb
Summary:
This PR un-reverts https://github.com/pytorch/pytorch/issues/61475 and fixes compilation with MSVC, which does not recognize alternative operator spellings (i.e. using `or` instead of `||`).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61937
Reviewed By: albanD
Differential Revision: D29805941
Pulled By: malfet
fbshipit-source-id: 01e5963c6717c1b44b260300d87ba0bf57f26ce9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61556
Prior to 1.10.0, `torch.__version__` was stored as a str, and many users compared
against `torch.__version__` as if it were a str. In order not to break them, we have
TorchVersion, which masquerades as a str while also having the ability to compare
against both packaging.version.Version and tuples of values, e.g. (1, 2, 1).
Examples:
Comparing a TorchVersion object to a Version object
```
TorchVersion('1.10.0a') > Version('1.10.0a')
```
Comparing a TorchVersion object to a Tuple object
```
TorchVersion('1.10.0a') > (1, 2) # 1.2
TorchVersion('1.10.0a') > (1, 2, 1) # 1.2.1
```
Comparing a TorchVersion object against a string
```
TorchVersion('1.10.0a') > '1.2'
TorchVersion('1.10.0a') > '1.2.1'
```
Resolves https://github.com/pytorch/pytorch/issues/61540
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision: D29671234
Pulled By: seemethere
fbshipit-source-id: 6044805918723b4aca60bbec4b5aafc1189eaad7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56058
User facing changes:
1. Adds a negative bit and corresponding new API (`is_neg()`, `resolve_neg()`); a brief usage sketch follows this list.
2. `tensor.conj().imag` now returns a floating point tensor with neg bit set to 1 instead of a tensor with no notion of negative bit. Note that imag is still a view and all the view properties still hold for imag.
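A brief hedged sketch of the new user-facing behavior (illustrative values):
```python
import torch

t = torch.tensor([1 + 2j, 3 - 4j])
v = t.conj().imag        # a view; the negation is recorded in the neg bit, not materialized
print(v.is_neg())        # True
print(v.resolve_neg())   # tensor([-2., 4.]); materializes the negation into a new tensor
```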
Non user facing changes:
1. Added a new Negative dispatch key and a backend fallback to handle it
2. Updated copy kernel to handle negative bit
3. Merged conjugate and negative bit fallback kernel
4. fixed https://github.com/pytorch/pytorch/issues/60478 (caused due to https://github.com/pytorch/pytorch/pull/54987)
Testing:
1. Added a new OpInfo based test `test_neg_view` (verifies that out-of-place and in-place operations work correctly for all operations when the input is a neg view tensor by checking the result against an actually negated tensor, verifies that autograd returns the same output for both neg view and actually negated tensors as well as it works fine when grad_out is a neg view).
2. Added a new test class containing `test_conj_view`, `test_neg_view`.
Test Plan: Imported from OSS
Reviewed By: soulitzer
Differential Revision: D29636403
fbshipit-source-id: 12214c9dc4806c51850f4a72a109db9527c0ca63
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61145
Remove 'verbose' mode from PackageExporter as people have complained that it is not useful.
Test Plan: Imported from OSS
Reviewed By: suo
Differential Revision: D29559681
Pulled By: Lilyjjo
fbshipit-source-id: eadb1a3a25fadc64119334a09bf1fa4b355b1edd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60249
* Add introductory paragraph explaining what ONNX is and what the
torch.onnx module does.
* In "Tracing vs Scripting" and doc-string for torch.onnx.export(),
clarify that exporting always happens on ScriptModules and that
tracing and scripting are the two ways to produce a ScriptModule.
* Remove examples of using Caffe2 to run exported models.
Caffe2's website says it's deprecated, so it's probably best not to
encourage people to use it by including it in examples.
* Remove a lot of content that's redundant:
* The example of how to mix tracing and scripting, and instead
link to Introduction to TorchScript, which includes very similar
content.
* "Type annotations" section. Link to TorchScript docs which explain
that in more detail.
* "Using dictionaries to handle Named Arguments as model inputs"
section. It's redundant with the description of the `args` argument
to `export()`, which appears on the same page once the HTML
is generated.
* Remove the list of supported Tensor indexing patterns. If it's not
in the list of unsupported patterns, users can assume it's
supported, so having both is redundant.
* Remove the list of supported operators and models.
I think the list of supported operators is not very useful.
A list of supported model architectures may be useful, but in
reality it's already very out of date. We should add it back if
/ when we have a system for keeping it up to date.
* "Operator Export Type" section. It's redundant with the description
of the `operator_export_type` arg to `export()`, which appears on
the same page once the HTML is generated.
* "Use external data format" section. It's redundant with the
description of the `use_external_data_format` arg to `export()`.
* "Training" section. It's redundant with the
description of the `training` arg to `export()`.
* Move the content about different operator implementations producing
different results from the "Limitations" section into the doc for the
`operator_export_type` arg.
* Document "quantized" -> "caffe2" behavior of
OperatorExportTypes.ONNX_ATEN_FALLBACK.
* Combine the text about using torch.Tensor.item() and the text about
using NumPy types into a section titled
"Avoid NumPy and built-in Python types", since they're both
fundamentally about the same issue.
* Rename "Write PyTorch model in Torch way" to "Avoiding Pitfalls".
* Lots of minor fixes: spelling, grammar, brevity, fixing links, adding
links.
* Clarify limitation on input and output types. Phrasing it in terms of
PyTorch types is much more accessible than in terms of TorchScript
types. Also clarify what actually happens when dict and str are used
as inputs and outputs.
* In Supported operators, use torch function and class names and link
to them. This is more user friendly than using the internal aten
op names.
* Remove references to VariableType.h, which doesn't appear to contain
the information that it once did. Instead refer to the generated
.pyi files.
* Remove the text in the FAQ about appending to lists within loops.
I think this limitation is no longer present
(perhaps since https://github.com/pytorch/pytorch/pull/51577).
* Minor fixes to some code I read along the way.
* Explain the current rationale for the weird ::prim_PythonOp op name.
Test Plan: Imported from OSS
Reviewed By: zou3519, ZolotukhinM
Differential Revision: D29494912
Pulled By: SplitInfinity
fbshipit-source-id: 7756c010b2320de0692369289604403d28877719
Co-authored-by: Gary Miguel <garymiguel@microsoft.com>
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61294
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60925
* Set `torch.distributed.launch` restarts to 0
* Remove unnecessary `--use_env` warnings and move the remaining `--use_env` warnings to `torch.distributed.launch`
* Make default log level WARNING
* Add new doc section around transitioning to `torch.distributed.run`
* Make `torch.distributed.launch` not use error-propagation
* Set default events handler to `null` that does not print events to console
* Add reference from `torch.distributed.launch` to `torch.distributed.run`
* Set correct preexec function that sends SIGTERM to child processes when parent dies
Issues resolved:
https://github.com/pytorch/pytorch/issues/60716
https://github.com/pytorch/pytorch/issues/60754
Test Plan:
sandcastle
```
python -m torch.distributed.launch --nproc_per_node 2 main.py                         # uses 0 restarts
python -m torch.distributed.run --nproc_per_node 2 main.py                            # uses the torchelastic default, 0 restarts
python -m torch.distributed.launch --nproc_per_node=4 --use_env --no_python main.py   # produces an error
python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py               # no warning
python -m torch.distributed.launch --nproc_per_node=4 --no_python main.py             # warning
```
Output of running torch.distributed.launch without --use_env:
```
$path/torch/distributed/launch.py:173: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torch.distributed.run.
Note that --use_env is set by default in torch.distributed.run.
If your script expects `--local_rank` argument to be set, please
change it to read from `os.environ('LOCAL_RANK')` instead.
```
New section:
{F628923078}
{F628974089}
Reviewed By: cbalioglu
Differential Revision: D29559553
fbshipit-source-id: 03ed9ba638bf154354e1530ffc964688431edf6b