Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19821
It is possible that not a single parameter is used during an
iteration. If this is the case, the `prepare_for_backward` function
marks all parameters as unused, kicks off reduction of all buckets,
*and* finalizes the reduction.
This is different from the prior implementation where we assumed that
autograd would produce a gradient for at least a single parameter.
We then used the autograd callback mechanism to queue a finalizer
callback. Now, this finalizer may be executed inline.
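As a user-level illustration, here is a minimal single-process sketch of such an iteration (the gloo setup, the MASTER_ADDR/MASTER_PORT values, and the MaybeSkip module are placeholders for this example, not part of the change):
```
import os
import torch
import torch.distributed as dist
import torch.nn as nn

# placeholder single-process setup so the sketch runs without a launcher
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

class MaybeSkip(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 4)

    def forward(self, x, use_params):
        # when use_params is False, no parameter participates in this iteration
        return self.fc(x) if use_params else x * 2.0

model = nn.parallel.DistributedDataParallel(
    MaybeSkip(), find_unused_parameters=True)

x = torch.randn(2, 4, requires_grad=True)
out = model(x, use_params=False)  # forward pass that uses no parameter
out.sum().backward()              # no parameter hook fires for this iteration

dist.destroy_process_group()
```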
Reviewed By: mrshenli
Differential Revision: D15113272
fbshipit-source-id: dc91458b569cd8c106ddaeea558464b515683550
Summary:
When output blob names are specified while load_all=1, they are ignored. However, this behavior is not documented. In this diff, we simply disallow users from providing output blob names when load_all=1.
See discussion at https://fb.workplace.com/groups/1405155842844877/permalink/2714909788536136/
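A hedged sketch of the two usages that remain valid after this change (the db path and blob names are hypothetical):
```
from caffe2.python import core

# load everything from the db; output blob names are no longer accepted
# together with load_all=1
load_everything = core.CreateOperator(
    "Load", [], [],
    db="/path/to/model.minidb", db_type="minidb", load_all=1)

# or name the blobs explicitly and leave load_all at 0
load_some = core.CreateOperator(
    "Load", [], ["fc_w", "fc_b"],
    db="/path/to/model.minidb", db_type="minidb", load_all=0)
```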
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19133
Reviewed By: dzhulgakov
Differential Revision: D14883698
Pulled By: chandlerzuo
fbshipit-source-id: 6e4171e36c4ccc4f857e79da98b858a06b7d8ad6
Summary:
Previously in type unification, when we encountered an Optional[T] and a None, we would unify them to Optional[Optional[T]]. If you think of Optionals as a union of [T, None], then unifying [Optional[T], None] should give [T, None], i.e. Optional[T]. We should just never create an Optional of an Optional.
The other possible fix is to change unify_types directly, but I think this is the more general fix, and it would play more nicely with our optional type refinement, which also assumes we never encounter an Optional[Optional[T]].
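A minimal sketch of the kind of program this affects (the function is made up for illustration):
```
from typing import Optional

import torch

@torch.jit.script
def pick(flag: bool, x: Optional[int]) -> Optional[int]:
    # the two branches have types Optional[int] and None; they should
    # unify to Optional[int], not Optional[Optional[int]]
    y = x if flag else None
    return y
```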
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19813
Reviewed By: suo
Differential Revision: D15103083
Pulled By: eellison
fbshipit-source-id: db803db10d6934eaa5458e7c1746546b0d0c0a6c
Summary:
It's been hard to understand how workers are launched and what code runs in the worker vs. the main process, especially on Windows, which leads to many of our samples failing. This explains when workers run and how to make code work on Windows as well.
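As a hedged sketch of the pattern the new docs describe (the dataset here is made up for illustration):
```
import torch
from torch.utils.data import DataLoader, TensorDataset

def main():
    dataset = TensorDataset(torch.arange(10, dtype=torch.float32))
    # with num_workers > 0, worker processes on Windows re-import this
    # module, so the DataLoader must be created inside the __main__ guard
    loader = DataLoader(dataset, batch_size=2, num_workers=2)
    for batch in loader:
        pass

if __name__ == "__main__":
    main()
```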
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18091
Differential Revision: D15083766
Pulled By: soumith
fbshipit-source-id: 8a7e60defc8a72ec63874f657d7d5267d951dccf
Summary:
One more fix for https://github.com/pytorch/pytorch/pull/19810
We now know that we are running with python3, so there is no need to check the Python version. The quotes were probably causing problems here.
cc ezyang soumith zou3519
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19814
Differential Revision: D15106459
Pulled By: orionr
fbshipit-source-id: 0443b9b54d17fead9c8c2c9d8d2f373e1f95a28b
Summary:
This is a follow up PR for #19547.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19748
Differential Revision: D15103230
Pulled By: ezyang
fbshipit-source-id: e7ce925faeadea502f77ed42d52e247c8c6571d8
Summary:
This is a follow up PR for #19409.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19738
Differential Revision: D15103231
Pulled By: ezyang
fbshipit-source-id: 11c9fec641b389906b8accd22504a683331fa6ec
Summary:
Eigen was updated with the commit needed to get rid of this warning that plagued the CI. This PR bumps third_party/eigen to that commit head.
```
warning: #warning "host_defines.h is an internal header file and must not be used directly. This file will be removed in a future CUDA release. Please use cuda_runtime_api.h or cuda_runtime.h instead." [-Wcpp]
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19789
Differential Revision: D15103223
Pulled By: ezyang
fbshipit-source-id: 5b56c4dd9cc41ff1794570ba2f6abfbe23f6ab68
Summary:
Tested locally that this fixes #19039. Did not add a test since there's no way to create a script module in the C++ world.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19700
Differential Revision: D15094195
Pulled By: wanchaol
fbshipit-source-id: fcc2c1e5efbc160d976ae485ba2457442f62f065
Summary:
Currently, the Python API doesn't serialize layers that don't have weights (such as `nn.ReLU` and `nn.MaxPool2d`, e.g. in https://github.com/pytorch/vision/blob/master/torchvision/models/densenet.py#L80-L81). If one saves a model that contains weight-less layers in Python and tries to load it into C++, the C++ module loading code (`torch::load(...)`) throws an error complaining that the expected layers are not found in the serialized file (e.g. https://github.com/pytorch/vision/pull/728#issuecomment-480974175). This PR solves the problem by ignoring layers that are not serializable (which currently only includes `nn::Functional`) in the C++ module serialization code (`torch::save(...)` and `torch::load(...)`). The user is expected to wrap the weight-less layers in `nn::Functional` so that they are ignored when serializing / deserializing.
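For context, a small Python-side sketch showing why the weight-less layers never reach the serialized file (the model is made up for illustration):
```
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.MaxPool2d(2))
# ReLU and MaxPool2d contribute no parameters, so only the conv's weight
# and bias show up in what gets serialized
print(list(model.state_dict().keys()))  # ['0.weight', '0.bias']
torch.save(model.state_dict(), "tmp_model.pt")
```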
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19740
Differential Revision: D15100575
Pulled By: yf225
fbshipit-source-id: 956481a2355d1de45341585abedda05e35d2ee8b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18820
ghimport-source-id: 220b2a3dd9d4d6d2e557e1802851f082c2dc6452
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18820 Refactor ProcessGroupNCCL collective primitives**
Planning to add reduce-scatter, but no room in my stomach for more
copypasta.
Also rewrote the tensor list validation logic. The existing validation
was ill-suited for all the cases it was being used for; it took a vector
of input tensors and a vector of output tensors, but only ever received
either two references to the same vector, or a bespoke singleton vector
and a vector of outputs (for which it would ignore all but the first
output). In the first case, it performed unnecessary checks, and in the
second, it skipped necessary ones.
Reviewed By: mrshenli
Differential Revision: D14762369
fbshipit-source-id: dcf882ce1c5854333a9eb4424bfc18d9f4648ddf
Summary:
In order to have `torch.utils.tensorboard.SummaryWriter` rendered in the documentation at the bottom of https://pytorch.org/docs/master/tensorboard.html we need to have TensorBoard installed.
This change makes it so our pinned version of `tb-nightly` is used for doc generation, the same as it is used for running tests at https://github.com/pytorch/pytorch/blob/master/.jenkins/pytorch/test.sh#L45-L52
Eventually we'll use a pinned version of `pip install tensorboard`, but it's not on the release channel yet.
cc kostmo soumith ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19810
Differential Revision: D15101730
Pulled By: orionr
fbshipit-source-id: c41678c4f9ef3d56a168f2b96a1ab05f351bdc56
Summary:
Input argument `f` in `_model_to_graph()` method in `torch/onnx/utils.py` is unused. This PR removes it. If there's a reason to keep it around, please let me know.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19647
Reviewed By: dzhulgakov
Differential Revision: D15071720
Pulled By: houseroad
fbshipit-source-id: 59e0dd7a4d5ebd64d0e30f274b3892a4d218c496
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19799
A module that returns multiple outputs, where the caller may end up
making multiple calls to torch.autograd.backward, did not work with
DistributedDataParallel. It expected the first call to
torch.autograd.backward to provide gradients for ALL parameters that
expect gradients and were used in computing the module output. If you
have outputs with disjoint autograd graphs it is fine to call
torch.autograd.backward on both and fill in the module's parameter
gradients in separate chunks.
With this change we delay queuing the finalizer callback until we have
marked all buckets as ready, instead of queueing it the first time we
receive an autograd hook. This makes the current implementation
functionally equivalent to the DistributedDataParallel implementation
from before #18953 was merged.
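A minimal sketch of the pattern this enables (process group setup omitted; the module is made up for illustration):
```
import torch
import torch.nn as nn

class TwoHeads(nn.Module):
    def __init__(self):
        super().__init__()
        self.a = nn.Linear(4, 1)
        self.b = nn.Linear(4, 1)

    def forward(self, x):
        # the two outputs have disjoint autograd graphs with respect to
        # the module's parameters
        return self.a(x), self.b(x)

model = TwoHeads()  # in practice wrapped in DistributedDataParallel
out_a, out_b = model(torch.randn(2, 4))

# two separate backward calls, each filling a disjoint chunk of the
# parameter gradients; the first call no longer has to produce
# gradients for ALL parameters
out_a.sum().backward()
out_b.sum().backward()
```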
Reviewed By: mrshenli
Differential Revision: D15097045
fbshipit-source-id: 2df023319713bc31e29a8b45108c78e6593fccd4
Summary:
* adds TORCH_API and AT_CUDA_API in places
* refactor code generation Python logic to separate
caffe2/torch outputs
* fix hip and asan
* remove profiler_cuda from hip
* fix gcc warnings for enums
* Fix PythonOp::Kind
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19554
Differential Revision: D15082727
Pulled By: kostmo
fbshipit-source-id: 83a8a99717f025ab44b29608848928d76b3147a4
Summary:
This PR adds TensorBoard logging support natively within PyTorch. It is based on the tensorboardX code developed by lanpa and relies on changes inside the tensorflow/tensorboard repo landing at https://github.com/tensorflow/tensorboard/pull/2065.
With these changes users can simply `pip install tensorboard; pip install torch` and then log PyTorch data directly to the TensorBoard protobuf format using
```
import torch
from torch.utils.tensorboard import SummaryWriter
writer = SummaryWriter()
s1 = torch.rand(1)
writer.add_scalar('data/scalar1', s1[0], 0)
writer.close()
```
Design:
- `EventFileWriter` and `RecordWriter` from tensorboardX now live in tensorflow/tensorboard
- `SummaryWriter` and PyTorch-specific conversion from tensors, nn modules, etc. now live in pytorch/pytorch. We also support Caffe2 blobs and nets.
Action items:
- [x] `from torch.utils.tensorboard import SummaryWriter`
- [x] rename functions
- [x] unittests
- [x] move actual writing function to tensorflow/tensorboard in https://github.com/tensorflow/tensorboard/pull/2065
Review:
- Please review for PyTorch standard formatting, code usage, etc.
- Please verify unittest usage is correct and executing in CI
Any significant changes made here will likely be synced back to github.com/lanpa/tensorboardX/ in the future.
cc orionr, ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16196
Differential Revision: D15062901
Pulled By: orionr
fbshipit-source-id: 3812eb6aa07a2811979c5c7b70810261f9ea169e
Summary:
disable_autodiff_subgraph_inlining should always be on to check for AD regressions.
Thanks eellison for spotting the test regression!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19787
Differential Revision: D15093104
Pulled By: ailzhang
fbshipit-source-id: 82a75a7dd7097d5f93a2e4074023da2105341c1b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19378
Function profile events are typically nested. In this diff I
add a parent-child relationship to the intervals. This way we can
attribute self time easily. As a result, a user printing a table from a
profiler trace gets self CPU time.
This diff doesn't try to address CUDA self time, as CUDA kernels are
already getting special care in the profiler.
There are also some other minor improvements, like reporting total CPU
time spent, reversed sorting, aggregated data after the table, etc.
There is a new unit test added which tests more functionality than the
previous profiler test.
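A small usage sketch (exact column and sort-key names may differ between versions):
```
import torch

x = torch.randn(128, 128)
with torch.autograd.profiler.profile() as prof:
    y = torch.mm(x, x).relu().sum()

# the printed table includes a self CPU time column derived from the
# parent-child relationship between intervals
print(prof.key_averages().table(sort_by="self_cpu_time_total"))
```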
Reviewed By: zheng-xq
Differential Revision: D14988612
fbshipit-source-id: 2ee6f64f0a4d0b659c6b23c0510bf13aa46f07dc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19319
Quantized SUM + ReLU (fused). The implementation is the same as the one in DNNLOWP.
Reviewed By: jianyuh
Differential Revision: D14866442
fbshipit-source-id: c8c737a37e35b6ce3c1c2077c07546aba16e0612
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19312
Replaces the tuple hack with the QTensor. Please note this can be landed ONLY after #18960 (D14810261) is landed.
Reviewed By: raghuramank100
Differential Revision: D14819460
fbshipit-source-id: 75ca649304b1619cb3cfe845962c9f226b8f884a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19714
We had rounding in the quantizer set as `round(x/scale) + zp`. To make it consistent, we convert it to `round(x/scale + zp)`.
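The two forms agree for integer zero points except at rounding ties (and up to floating-point representation), where round-half-to-even can break them differently. A toy illustration using plain Python rounding:
```
x, scale, zp = 0.25, 0.5, 1

old = round(x / scale) + zp  # round(0.5) + 1 = 0 + 1 = 1
new = round(x / scale + zp)  # round(1.5)     = 2
```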
Reviewed By: raghuramank100
Differential Revision: D15077095
fbshipit-source-id: 5d20a90391fe8c2e11b338c05631fcf7770320c3