Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31888
We need a backend-agnostic mechanism to perform a barrier-like operation before locally destroying the RRef context and shutting down the RPC agent:
- Sort the worker names.
- Elect the first name in the sorted order as the leader.
- Followers report their intent to synchronize to the leader.
- The leader also reports to itself when `_wait_all_workers()` is called.
- Once all workers have reported their intent, the leader sends the command to every worker to proceed.
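The steps above can be sketched in pure Python. This is a toy model of the election-and-barrier logic only; the real implementation exchanges RPCs between processes, and the class and method names here are illustrative, not the actual API:

```python
def elect_leader(worker_names):
    # Sort worker names and elect the first one as the leader.
    return sorted(worker_names)[0]

class Barrier:
    """Toy model of the _wait_all_workers() barrier; the real
    implementation sends RPCs instead of calling methods directly."""

    def __init__(self, worker_names):
        self.worker_names = set(worker_names)
        self.leader = elect_leader(worker_names)
        self.reported = set()
        self.proceed_sent = False

    def report_intent(self, worker_name):
        # Followers (and the leader itself) report their intent to synchronize.
        self.reported.add(worker_name)
        if self.reported == self.worker_names:
            # Once every worker has reported, the leader tells all
            # workers to proceed with shutdown.
            self.proceed_sent = True

barrier = Barrier(["worker1", "worker0", "worker2"])
for w in ["worker0", "worker1", "worker2"]:
    barrier.report_intent(w)
assert barrier.leader == "worker0"
assert barrier.proceed_sent
```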
ghstack-source-id: 96386210
Test Plan:
# Unit tests
```
buck test mode/dev-nosan //caffe2/test:rpc_fork
buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_wait_all_workers
buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_rref_leak
```
```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift
buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_wait_all_workers
buck-out/gen/caffe2/test/rpc_fork_thrift\#binary.par -r test_worker_id
```
# Stress runs
```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_stress_light_rpc --stress-runs 10
```
```
buck test mode/dev-nosan //caffe2/test:rpc_spawn_thrift -- test_stress_light_rpc --stress-runs 10
```
```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_stress_heavy_rpc --stress-runs 10
```
```
buck test mode/dev-nosan //caffe2/test:rpc_spawn_thrift -- test_stress_heavy_rpc --stress-runs 10
```
Differential Revision: D19290954
fbshipit-source-id: cdb22203c2f27b5e0d0ad5b2d3b279d438c22dcf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31917
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19301480
Pulled By: ezyang
fbshipit-source-id: fcce8868733965b9fbd326b4ec273135759df377
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31351
Clang 4 needs the c10:: namespace specifier on fully_qualified_type_name_impl() to work correctly.
Also, let's add an error message for people using clang 3 and earlier; we don't support those compilers anymore, but before this PR they got an unhelpful message.
ghstack-source-id: 96380163
Test Plan: testinprod
Differential Revision: D19135587
fbshipit-source-id: c206b56240b36e5c207fb2b69c389bb39f1e62aa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30916
These macros said "make it constexpr if we're in C++14". Since we're now always C++14, we can just say "constexpr" instead.
ghstack-source-id: 96369584
Test Plan: waitforsandcastle
Differential Revision: D18869635
fbshipit-source-id: f41751e4e26fad6214ec3a98db2d961315fd73ff
Summary: I think this was wrong before?
Test Plan: Not sure.
Reviewed By: IvanKobzarev
Differential Revision: D19221358
fbshipit-source-id: 27e675cac15dde29e026305f4b4e6cc774e15767
Summary:
These were returning incorrect data before. Now we make a contiguous copy
before converting to Java. Exposing raw data to the user might be faster in
some cases, but it's not clear that it's worth the complexity and code size.
Test Plan: New unit test.
Reviewed By: IvanKobzarev
Differential Revision: D19221361
fbshipit-source-id: 22ecdad252c8fd968f833a2be5897c5ae483700c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31584
These were returning incorrect data before.
Test Plan: New unit test.
Reviewed By: IvanKobzarev
Differential Revision: D19221360
fbshipit-source-id: b3f01de086857027f8e952a1c739f60814a57acd
Summary: These are valid tensors.
Test Plan: New unit test.
Reviewed By: IvanKobzarev
Differential Revision: D19221362
fbshipit-source-id: fa9af2fc539eb7381627b3d473241a89859ef2ba
Summary:
As in the title, this PR disables the `--quiet` flag used in CI as a workaround for a timeout hitting the macOS CI. Circle CI times out when no text has been printed for 10 minutes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31900
Differential Revision: D19302899
Pulled By: bwasti
fbshipit-source-id: 145647da983ee06f40794bda1abd580ea45a0019
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31222
- When constructing a tensor via torch::from_blob() in the case where the deleter is a nop, switch to using a nullptr context in the DataPtr (with a nop deleter).
- No real extra memory/CPU cost here; it actually saves a minor alloc.
Why? To get a signal that a Tensor from torch::from_blob() might contain non-owned memory, by detecting the nullptr context.
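A rough Python model of the idea (the real change is in the C++ `c10::DataPtr`; the class and function names below are illustrative stand-ins, not the actual API):

```python
class DataPtr:
    """Toy stand-in for c10::DataPtr: a data pointer plus an opaque
    context that is handed to the deleter on destruction."""

    def __init__(self, data, ctx, deleter):
        self.data = data
        self.ctx = ctx
        self.deleter = deleter

def from_blob(data, deleter=None):
    if deleter is None:
        # Nop deleter: use a nullptr (None) context so callers can
        # later detect that the tensor may not own its memory.
        return DataPtr(data, ctx=None, deleter=lambda ctx: None)
    # Owning case: the context carries whatever the deleter needs.
    return DataPtr(data, ctx=data, deleter=deleter)

def maybe_non_owning(ptr):
    # The signal described above: a nullptr context marks memory
    # that the tensor may not own.
    return ptr.ctx is None

assert maybe_non_owning(from_blob(b"abc"))
assert not maybe_non_owning(from_blob(b"abc", deleter=lambda ctx: None))
```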
ghstack-source-id: 96336078
Test Plan:
buck test mode/dev caffe2/test/cpp/api/...
buck test mode/dev-nosan caffe2/test/...
Differential Revision: D18992119
fbshipit-source-id: 4eea642f82d0858b57fdfc6995364a760c10567d
Summary:
For now I'm just removing the decorators from all of the currently overridable functions in `torch.functional`. This means they are no longer overridable, however this should fix the benchmark regressions reported in https://github.com/pytorch/pytorch/issues/30831. Moving forward we'll be looking at reducing the overhead of the python-level override mechanism and failing that, re-implementing all of these operators in C++.
cc hl475
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30839
Differential Revision: D18838848
Pulled By: ezyang
fbshipit-source-id: 22b8015d7b2f7a947f1ebc9632c998e081b48ad8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31343
Fix an issue in TorchScript tracing for modules with `c10::List<at::Tensor>` as an output. TensorList was not supported properly.
Test Plan: unit tests
Reviewed By: wanchaol
Differential Revision: D18850722
fbshipit-source-id: 87a223104d1361fe754d55deceeb1e8bbcad629b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31508
This PR builds on top of https://github.com/pytorch/pytorch/pull/31230
to ensure that distributed autograd doesn't block an RPC thread anymore during
the backward pass.
I've also added a unit test where all ranks hammer rank 0 with about 60
backward calls (which would have caused a deadlock earlier); the test now
passes without any issues.
ghstack-source-id: 96345097
Test Plan: waitforbuildbot
Differential Revision: D19188749
fbshipit-source-id: b21381b38175699afd0f9dce1ddc8ea6a220f589
Summary:
Fixes https://github.com/pytorch/pytorch/issues/28430
The unpythonic signatures for functions such as `torch.addcdiv` are already separated out in [`deprecated.yaml`] and marked as deprecated in `PythonArgParser`. However, nothing was done with this information previously, so this PR now emits a warning when a deprecated signature is used.
One minor complication is that if all arguments are passed as keyword args, there is nothing to differentiate the deprecated overload, which can lead to false warnings being emitted. So, I've also modified `PythonArgParser` to prefer non-deprecated signatures.
[`deprecated.yaml`]: https://github.com/pytorch/pytorch/blob/master/tools/autograd/deprecated.yaml
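The overload-preference fix can be sketched in a few lines of Python. This is not the real `PythonArgParser` (which is C++ and works on parsed YAML entries); the signature records and `resolve` helper below are hypothetical:

```python
import warnings

# Hypothetical signature records; the real PythonArgParser parses
# these from native_functions.yaml and deprecated.yaml.
signatures = [
    {"name": "addcdiv(Tensor, Tensor, Tensor, *, Scalar value)", "deprecated": False},
    {"name": "addcdiv(Tensor, Scalar value, Tensor, Tensor)", "deprecated": True},
]

def resolve(matching_signatures):
    # Prefer non-deprecated signatures, so an all-kwarg call that
    # matches both overloads does not emit a false warning.
    ordered = sorted(matching_signatures, key=lambda s: s["deprecated"])
    chosen = ordered[0]
    if chosen["deprecated"]:
        warnings.warn("This overload is deprecated", DeprecationWarning)
    return chosen

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    chosen = resolve(signatures)

assert not chosen["deprecated"]   # non-deprecated overload wins
assert len(caught) == 0           # no false deprecation warning
```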
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31514
Differential Revision: D19298735
Pulled By: ezyang
fbshipit-source-id: 03cb78af17658eaab9d577cd2497c6f413f07647
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31909
https://github.com/pytorch/pytorch/pull/31230 introduced a bug where
we would end up calling `graph_task_post_processing` twice for reentrant
backward calls (once when we mark the future completed, and again when
`graph_task_post_processing` was called in `execute_with_graph_task`).
This PR fixes the issue by verifying that the future we return in that case is
completed, and removing the duplicate call to `graph_task_post_processing`.
In addition, I added a test that reproduces the problem and verified that it
is fixed by this PR.
ghstack-source-id: 96349102
Test Plan: waitforbuildbot
Differential Revision: D19296363
fbshipit-source-id: dc01a4e95989709ad163bb0357b1d191ef5a4fb2
Summary:
In order to support Ubuntu18.04, some changes to the scripts are required.
* install dependencies with -y flag
* mark install noninteractive
* install some required dependencies (gpg-agent, python3-distutils, libidn11)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31886
Differential Revision: D19300586
Pulled By: bddppq
fbshipit-source-id: d7fb815a3845697ce63af191a5bc449d661ff1de
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31236
It is not compiled on Windows
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19262581
Pulled By: ezyang
fbshipit-source-id: 80bfa553333a946f00291aaca6ad26313caaa9e6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31707
Change the initialization value for FC weight init and sparse embedding lookup init.
The previous default initialization was uniform(-sqrt(1/input_dim), sqrt(1/input_dim)). Now we pass in a flexible hyperparameter, say alpha, changing it to uniform(-sqrt(alpha/input_dim), sqrt(alpha/input_dim)).
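The new initialization rule can be expressed directly. A minimal sketch, assuming `alpha=1.0` recovers the previous default (the function name is illustrative, not the Caffe2 API):

```python
import math
import random

def uniform_init(input_dim, alpha=1.0):
    # New rule: sample from U(-sqrt(alpha/input_dim), +sqrt(alpha/input_dim)).
    # alpha=1.0 reproduces the previous default initialization.
    bound = math.sqrt(alpha / input_dim)
    return random.uniform(-bound, bound)

random.seed(0)
samples = [uniform_init(64, alpha=4.0) for _ in range(1000)]
bound = math.sqrt(4.0 / 64)
# Every sample stays inside the widened bound.
assert all(-bound <= s <= bound for s in samples)
```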
Reviewed By: chonglinsun
Differential Revision: D18825615
fbshipit-source-id: 4c5f2e07f2b3f5d642fd96d64dbf68892ebeb30b
Summary:
The error message produced by AT_ASSERT() in gather() encouraged users to file a bug report ("please report a bug to PyTorch..."). The assertion should be a regular argument check since it can be triggered by passing tensors with different dimensionality, e.g. `torch.cuda.comm.gather([torch.rand(1, device='cuda'), torch.rand(1, 1, device='cuda')])`.
See: https://github.com/pytorch/pytorch/issues/26400
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27456
Differential Revision: D19300270
Pulled By: ezyang
fbshipit-source-id: ec87d225e23445020b377521e0daccceb4748215
Summary:
Done with:
```
❯ sed -i 's/1\.4\.0/1.5.0/g' $(find -type f -not -path "./third_party/*")
```
This was previously done in separate commits, but it would be beneficial to bump all included projects within this repository at the same time.
Old bumps for reference:
* [iOS]Update Cocoapods to 1.4.0: https://github.com/pytorch/pytorch/pull/30326
* [android] Change nightly builds version to 1.4.0-SNAPSHOT: https://github.com/pytorch/pytorch/pull/27381
* Roll master to 1.4.0: https://github.com/pytorch/pytorch/pull/27374
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31785
Differential Revision: D19277925
Pulled By: seemethere
fbshipit-source-id: f72ad082f0566004858c9374879f4b1bee169f9c
Summary:
This PR adds bfloat16 support for convolutions on ROCm.
- Intergrates MIOpen bfloat16 convolution support into PyTorch
- Enables bfloat16 convolution for non-miopen paths, i.e THCUNN, native hip kernels
- Enables the bfloat16 type for probability distribution functions (included in this PR since the conv unit tests use bfloat16 random number generators)
Native cuda kernels for convolution and random functions will be compiled for CUDA as well.
cc iotamudelta bddppq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30948
Differential Revision: D19274164
Pulled By: ezyang
fbshipit-source-id: c0888a6ac72a2c5749b1ebb2195ac6f2209996be
Summary:
Compared to cuDNN bias, PyTorch add has the following advantage:
- faster, especially for backward (see: https://github.com/zasdfgbnm/things/blob/master/2019/conv-backward-profile.md)
- handles 64bit indexing automatically
- has less code, less maintenance effort
ngimel I'm submitting this PR early so the CI can start building it, but I have not tested it locally yet (still waiting for it to compile).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31524
Differential Revision: D19264244
Pulled By: ngimel
fbshipit-source-id: cb483d378a6d8bce0a05c3643a796e544bd8e8f0
Summary:
Closes https://github.com/pytorch/pytorch/issues/31497
This allows `torch.no_grad` and `torch.enable_grad` to be used as decorators for generator functions. In which case it disables/enables grad only inside the body of the generator and restores the context outside of the generator.
https://github.com/pytorch/pytorch/issues/31497 doesn't include a complete reproducer, but the included test with `torch.is_grad_enabled` shows this is working where it failed before.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31792
Differential Revision: D19274971
Pulled By: albanD
fbshipit-source-id: fde6d3fd95d76c8d324ad02db577213a4b68ccbe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31157
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19262583
Pulled By: ezyang
fbshipit-source-id: 8fb87b41ab53770329b38e1e2fe679fb868fee12
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31155
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19262584
Pulled By: ezyang
fbshipit-source-id: 147ac5a9c36e813ea9a2f68b498880942d661be5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31152
Per apaszke: I can't find any reasonable references to libIRC online, so
I decided to remove this.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19262582
Pulled By: ezyang
fbshipit-source-id: a1d47462427a3e0ca469062321d608e0badf8548
Summary:
This change is required for cases like `x[1:] = data` or `x[:3] = data`.
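For illustration only, the slice-assignment patterns in question behave like this (shown on plain Python lists; the PR itself concerns exporting these index-put operations on tensors to ONNX):

```python
# Write everything after index 0.
x = [0, 0, 0, 0]
data = [7, 8, 9]
x[1:] = data
assert x == [0, 7, 8, 9]

# Write the first three elements.
x = [0, 0, 0, 0]
x[:3] = data
assert x == [7, 8, 9, 0]
```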
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31552
Reviewed By: hl475
Differential Revision: D19238815
Pulled By: houseroad
fbshipit-source-id: 56c9837d86b341ea92b0a71d55034ce189d12e6c
Summary:
For backend integration, backend (e.g. Glow) needs to check the content of the tensor to determine whether it is a legit byte tensor or some special packed format. This provides a convenient interface for that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31290
Reviewed By: jackm321, qizzzh
Differential Revision: D19069684
Pulled By: yinghai
fbshipit-source-id: 63360fa2c4d32695fe9767a40027d446d63efdd4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31803
Refactored the following fairly similar functions:
1. `test_context_cleanup_tensor_with_grad`
2. `test_context_cleanup_tensor_no_grad`
3. `test_context_cleanup_no_tensors`
by creating a helper function `context_cleanup_test_helper` that can be invoked with the appropriate arguments.
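The refactoring pattern looks roughly like this: three near-identical tests become thin wrappers around one parameterized helper. The helper body below is a hypothetical placeholder; the real one runs a distributed autograd context and asserts it is cleaned up:

```python
def context_cleanup_test_helper(tensors):
    # Hypothetical stand-in: the real helper creates a dist_autograd
    # context with the given tensors and verifies cleanup afterwards.
    return {"num_tensors": len(tensors), "cleaned_up": True}

def test_context_cleanup_tensor_with_grad():
    return context_cleanup_test_helper(tensors=["t_requires_grad"])

def test_context_cleanup_tensor_no_grad():
    return context_cleanup_test_helper(tensors=["t_no_grad"])

def test_context_cleanup_no_tensors():
    return context_cleanup_test_helper(tensors=[])

assert test_context_cleanup_no_tensors()["num_tensors"] == 0
assert test_context_cleanup_tensor_with_grad()["cleaned_up"]
```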
Test Plan: Verified by running tests.
Differential Revision: D19269246
fbshipit-source-id: bfb42b078ad56b97ceeecf0d68b4169768c2c453
Summary:
When calling the add_images() method on the tensorboard SummaryWriter with a uint8 NCHW tensor, the tensor is incorrectly scaled, resulting in overflow behavior. This leads to incorrect images being displayed in tensorboard.
Issue: https://github.com/pytorch/pytorch/issues/31459
Local testing (ran this code with and without the PR changes and printed scale_factor):
```
import torch
import torchvision
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter()
x = torch.tensor([[[[1, 2, 3], [4, 5, 6]]]], dtype=torch.uint8)
writer.add_images("images", x)
```
Before: scale_factor = 255. After: scale_factor = 1.
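The fixed scaling rule amounts to the following. A minimal sketch (the function name is illustrative; the real code inspects the numpy dtype of the tensor inside tensorboard's image summary path):

```python
def calc_scale_factor(dtype):
    # uint8 tensors are already in [0, 255]; scaling them by 255 again
    # overflows. Float tensors in [0, 1] still need the 255 scale.
    return 1 if dtype == "uint8" else 255

assert calc_scale_factor("uint8") == 1
assert calc_scale_factor("float32") == 255
```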
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31778
Differential Revision: D19289189
Pulled By: anjali411
fbshipit-source-id: 350a1650337244deae4fd8f8b7fb0e354ae6986b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31230
A major issue with distributed autograd currently is that we block an
RPC thread when we call Engine::execute_with_graph_task.
To resolve this issue, I've modified the local autograd engine so that
`execute_with_graph_task` returns a Future instead. `Engine::execute()` and
`DistEngine::execute()` still `wait()` on this Future, which ensures there is
no change in behavior yet.
In follow up PRs we can modify the distributed autograd engine to take
advantage of this Future.
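A minimal sketch of the shape of this change, using Python's `concurrent.futures` in place of the C++ engine (function names mirror the commit; the `sum` body is a placeholder for running the autograd graph):

```python
from concurrent.futures import Future, ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=2)

def execute_with_graph_task(graph_task):
    # Instead of blocking the calling (RPC) thread, kick off the
    # backward pass on another thread and return a Future immediately.
    fut = Future()

    def run():
        result = sum(graph_task)  # placeholder for running the graph
        fut.set_result(result)

    pool.submit(run)
    return fut

def execute(graph_task):
    # The synchronous entry points still wait() on the Future, so
    # their observable behavior is unchanged.
    return execute_with_graph_task(graph_task).result()

assert execute([1, 2, 3]) == 6
```

A distributed caller can instead hold on to the Future and attach a callback, which is what the follow-up PRs enable.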
Closes #26359
ghstack-source-id: 96298057
Test Plan: waitforbuildbot
Differential Revision: D18999709
fbshipit-source-id: 388f54467fd2415a0acb7df17bd063aedc105229
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30710
We need a backend-agnostic mechanism to perform a barrier-like operation before locally destroying the RRef context and shutting down the RPC agent:
- Sort the worker names.
- Elect the first name in the sorted order as the leader.
- Followers report their intent to synchronize to the leader.
- The leader also reports to itself when `_wait_all_workers()` is called.
- Once all workers have reported their intent, the leader sends the command to every worker to proceed.
Test Plan:
# Unit tests
```
buck test mode/dev-nosan //caffe2/test:rpc_fork -- test_wait_all_workers
buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_wait_all_workers$
buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_rref_leak
buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_rref_forward_chain
```
```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_wait_all_workers
buck-out/gen/caffe2/test/rpc_fork_thrift\#binary.par -r test_wait_all_workers$
```
# Stress runs
```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_stress_light_rpc --stress-runs 10
```
```
buck test mode/dev-nosan //caffe2/test:rpc_spawn_thrift -- test_stress_light_rpc --stress-runs 10
```
```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_stress_heavy_rpc --stress-runs 10
```
```
buck test mode/dev-nosan //caffe2/test:rpc_spawn_thrift -- test_stress_heavy_rpc --stress-runs 10
```
# Debug
```
buck test mode/dev-nosan caffe2/test:rpc_fork -- test_shutdown
```
```
buck test mode/dev-nosan //caffe2/test:dist_autograd_fork -- test_clean_context_during_backward
buck build mode/dev-nosan //caffe2/test:dist_autograd_fork
buck-out/gen/caffe2/test/dist_autograd_fork\#binary.par -r test_clean_context_during_backward
```
https://our.intern.facebook.com/intern/testinfra/diagnostics/281475127895800.844424945328750.1575664368/
```
I1206 12:27:47.491420 185619 process_group_agent.cpp:211] Shutting down ProcessGroupAgent.
I1206 12:27:47.493880 185630 process_group_agent.cpp:211] Shutting down ProcessGroupAgent.
I1206 12:27:47.494526 185625 process_group_agent.cpp:211] Shutting down ProcessGroupAgent.
I1206 12:27:47.495390 185636 process_group_agent.cpp:211] Shutting down ProcessGroupAgent.
E1206 12:27:47.544198 185627 pair.cc:642] 1 --->>> 0, read ERROR: AsyncSocketException: Network error, type = Network error, errno = 104 (Connection reset by peer)
E1206 12:27:47.544203 185633 pair.cc:642] 2 --->>> 0, read ERROR: AsyncSocketException: Network error, type = Network error, errno = 104 (Connection reset by peer)
E1206 12:27:47.544210 185639 pair.cc:642] 3 --->>> 0, read ERROR: AsyncSocketException: Network error, type = Network error, errno = 104 (Connection reset by peer)
```
This should mean the UDF in the request had already run, so Python proceeded to `_agent.shutdown()`.
The RpcAgents on the followers then tried to send back their responses, but the leader had already closed RPC.
Need to re-trigger "pytorch_rpc-buck" to reproduce this rarely-seen issue.
Differential Revision: D18643137
fbshipit-source-id: d669d4fc9ad65ed48bed1329a4eb1c32ba51323c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30612
This is the first version of moving prim ops to c10 registration. Once the reviewers are fine with the initial changes, more operators will be moved in the same style.
Test Plan: Imported from OSS
Differential Revision: D19237648
Pulled By: iseeyuan
fbshipit-source-id: c5a519604efffb80564a556536f17d829f71d9f9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29220
Support for accessing constants was added in previous
PRs; this PR re-enables the foldbn tests.
Test Plan:
test_jit.py
Imported from OSS
Differential Revision: D18846848
fbshipit-source-id: 90ceaf42539ffee80b984e0d8b2420da66c263c3