Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62794
This PR updates JIT serialization to support pickling sparse COO tensors, and updates message.cpp to support sparse COO tensors in RPC messages.
This addresses a bug filed a few years ago: https://github.com/pytorch/pytorch/issues/30807.
I tested the fix by adding sparse tensor tests to rpc_test.py and dist_autograd_test.py.
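For reference, a minimal sketch of the kind of tensor now supported (the RPC call is illustrative and assumes an initialized RPC agent with a worker named "worker1"):
```python
import torch

# A sparse COO tensor that can now round-trip through JIT pickling and RPC.
indices = torch.tensor([[0, 1, 1], [2, 0, 2]])
values = torch.tensor([3.0, 4.0, 5.0])
sparse = torch.sparse_coo_tensor(indices, values, size=(2, 3))

# With an initialized RPC agent, something like:
#   torch.distributed.rpc.rpc_sync("worker1", torch.add, args=(sparse, sparse))
```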
cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski SciPioneer H-Huang mrzzd cbalioglu gcramer23 gmagogsfm
Test Plan: Imported from OSS
Reviewed By: soulitzer
Differential Revision: D30608848
Pulled By: gcramer23
fbshipit-source-id: 629ba8e4a3d8365875a709c9b87447c7a71204fb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63799
Add a new module that can be used for module swap with the `nni.LinearReLU` module in the convert function.
Currently supports INT8 only (the FP16 op doesn't have ReLU fusion yet).
Fixes #55393
Test Plan:
python test/test_quantization.py test_dynamic_fusion
Imported from OSS
Reviewed By: heitorschueroff
Differential Revision: D30502812
fbshipit-source-id: 3668e4f001a0626d469e17ac323acf582ee28a51
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62737
ReductionOpInfo is a specialization of OpInfo for reduction operators. For now, it is designed to work with reductions that return a single tensor and that reduce all elements along one or more dimensions to a single value. In particular this excludes operators such as `max` and `min` that return multiple tensors and `quantile` that can return multiple values.
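To illustrate the scope with a plain example (not part of the PR):
```python
import torch

t = torch.arange(6.0).reshape(2, 3)
t.sum()              # reduces all elements -> tensor(15.)
t.sum(dim=0)         # reduces one dimension -> tensor([3., 5., 7.])
t.sum(dim=(0, 1))    # reduces multiple dimensions -> tensor(15.)
torch.max(t, dim=0)  # excluded: returns (values, indices), i.e. multiple tensors
```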
fixes https://github.com/pytorch/pytorch/issues/49746
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D30406568
Pulled By: heitorschueroff
fbshipit-source-id: 218b1da1902f67bcf4c3681e2a0f0029a25d51f1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63831
Closes https://github.com/pytorch/pytorch/issues/63812
`at::mul_out` is not supported when `grad` itself requires grad, a case that arises when computing higher-order derivatives.
In this case, fall back to a mul + copy instead of mul_out.
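The actual change is in C++; the following is a minimal Python sketch of the equivalent fallback pattern (tensor names are illustrative):
```python
import torch

x = torch.randn(3, requires_grad=True)
grad = x * 2            # a non-leaf tensor that itself requires grad
factor = torch.randn(3)

if grad.requires_grad:
    # out= variants don't support autograd, so fall back to an
    # out-of-place mul; the result stays connected to the graph.
    grad = grad.mul(factor)
else:
    torch.mul(grad, factor, out=grad)
```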
ghstack-source-id: 136614644
Test Plan: UT
Reviewed By: SciPioneer
Differential Revision: D30505573
fbshipit-source-id: 83532b6207b3d80116fcc4dff0e5520d73b3454f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63983
Test for the fixes in D30545351. It should resolve the issue of the remote execution flag being populated incorrectly.
Test Plan: CI
Reviewed By: malfet, seemethere
Differential Revision: D30549443
fbshipit-source-id: b3895909f5cd654ba163b77950872b332fbad3fe
Summary:
This PR moves some modules into `common_modules` to see what it looks like.
While migrating some no-batch modules into `common_modules`, I noticed that `desc` is not used in the test name, which means we cannot use `-k` to filter tests. This PR moves the sample generation into `_parametrize_test` and passes the already generated `module_input` into users of `modules(module_db)`.
I can see this is a little different from OpInfo and would be happy to revert to the original implementation of `modules`.
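For context, a hypothetical sketch of the decorator flow described above (the parameter and attribute names here are assumptions, not the exact `common_modules` API):
```python
from torch.testing._internal.common_modules import modules, module_db
from torch.testing._internal.common_utils import TestCase

class TestModule(TestCase):
    @modules(module_db)
    def test_forward(self, device, dtype, module_info, module_input):
        # `module_input` is generated up front in `_parametrize_test`, and
        # its desc is baked into the test name, so `-k` can filter on it.
        m = module_info.module_cls(*module_input.constructor_input.args,
                                   **module_input.constructor_input.kwargs)
```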
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62999
Reviewed By: heitorschueroff
Differential Revision: D30522737
Pulled By: jbschlosser
fbshipit-source-id: 7ed1aeb3753fc97a4ad6f1a3c789727c78e1bc73
Summary:
Realized we were missing ROCm as a platform on which one could disable a flaky test (like how this issue specifies Windows: https://github.com/pytorch/pytorch/issues/61655).
cc jeffdaily sunway513 jithunnair-amd ROCmSupport
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63813
Reviewed By: seemethere
Differential Revision: D30498478
Pulled By: janeyx99
fbshipit-source-id: f1abe8677e1ddd01de3291e1618272ad8e287dc4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63463
Now that `torch.distributed.optim` gates DistributedOptimizer on RPC availability, the local SGD optimizer can be used on Windows.
ghstack-source-id: 136437632
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D30358922
fbshipit-source-id: 9b56aebf1075f026637296d338805ad8851c9d40
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63462
Now that `torch.distributed.optim` gates DistributedOptimizer on RPC availability, these tests can be run on windows.
ghstack-source-id: 136437635
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D30358923
fbshipit-source-id: 36739bdfe7214789f17de652d30c62c2bc124c73
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63711
This removes `_fork_process` from common_distributed.py and fixes all other call sites to use `spawn_process` instead.
ghstack-source-id: 136395719
Test Plan: waitforbuildbot
Reviewed By: xush6528
Differential Revision: D30463834
fbshipit-source-id: 0c09e8a996d0e5b912c8cdd45488a39951bac4db
Summary:
Turns on batch norm (BN) in autodiff:
1. outputs an empty tensor for running stats to bypass the autodiff issue with None;
2. fixes BN inference backward in cuDNN & MIOpen, where backward now falls back to the native batch norm kernel instead.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57321
Reviewed By: albanD, ngimel
Differential Revision: D30250419
Pulled By: jansel
fbshipit-source-id: a62553789c20fb50a820003a056f40d9d642dfaa
Summary:
As proof of concept, this PR uses the new `BinaryUfuncOpInfo` in broadcasting tests for `add`, `sub`, `mul`, `div`, `floor_div`, and `true_div`.
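The broadcasting property these tests exercise, as a plain example (not part of the PR):
```python
import torch

a = torch.randn(3, 1, 4)
b = torch.randn(5, 4)
# Operating on broadcastable shapes should match operating on
# explicitly expanded operands.
assert torch.equal(torch.add(a, b),
                   torch.add(a.expand(3, 5, 4), b.expand(3, 5, 4)))
```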
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61964
Reviewed By: ngimel
Differential Revision: D30407734
Pulled By: mruberry
fbshipit-source-id: ada28994f43b0635f279f45a02ecba18bc8ee033
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63443
After https://github.com/pytorch/pytorch/pull/63442, all distributed
tests can run with opt-asan. As a result, we can now remove all of our fork-based tests.
This is the first PR in the stack; it removes fork-based tests from RPC.
ghstack-source-id: 136177744
Test Plan: waitforbuildbot
Reviewed By: lw
Differential Revision: D30384905
fbshipit-source-id: 86d438aebaa6cb02ae2a966fea244849849a1889
Summary:
We currently build breakpad from [this fork](https://github.com/driazati/breakpad) to include extra logic to restore signal handlers that were previously present. With some [new additions](https://github.com/google/breakpad/compare/main...driazati:main), this fork now includes a CMake-based build, so we can add breakpad as a proper dependency rather than relying on including it in Docker images as a system library, which is error-prone (we have a bunch of images) and hard to extend to MacOS / Windows. This also includes some changes to the crash handling code to support MacOS / Windows in a similar way to Linux.
```python
import torch
# On Windows this writes crashes to C:\Users\<user>\AppData\pytorch_crashes
# On MacOS/Linux this writes crashes to /tmp/pytorch_crashes
torch.utils._crash_handler.enable_minidumps()
# Easy way to cause a segfault and trigger the handler
torch.bincount(input=torch.tensor([9223372036854775807]))
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63186
Reviewed By: malfet, seemethere
Differential Revision: D30318404
Pulled By: driazati
fbshipit-source-id: 0d7daf3701cfaba5451cc529a0730272ab1eb1dc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62915
Up to 45% and 20% perf improvement on CUDA and CPU, respectively, with consistent improvement in perf for all cases -- see perf numbers in the comments below.
Test Plan: Imported from OSS
Reviewed By: heitorschueroff
Differential Revision: D30404006
Pulled By: anjali411
fbshipit-source-id: 565940da28c7761d993cf43346932c24292e8a4d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63481
This PR changes the SkipInfo decorators to use unittest.expectedFailure so that the test reports as XFAIL as opposed to PASSED.
Note that changing the expectedFailure in `torch/testing/_internal/common_device_type.py` (L879 at commit 30e1c74dc1) to an XFAIL is not possible because the decision of whether to decorate is delayed until the wrapper function is called.
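For reference, the standard `unittest.expectedFailure` behavior this relies on:
```python
import unittest

class Example(unittest.TestCase):
    @unittest.expectedFailure
    def test_known_bug(self):
        self.assertEqual(1, 2)  # reported as an expected failure, not PASSED

if __name__ == "__main__":
    unittest.main()
```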
fixes https://github.com/pytorch/pytorch/issues/63363
Test Plan: Imported from OSS
Reviewed By: ZolotukhinM
Differential Revision: D30397154
Pulled By: heitorschueroff
fbshipit-source-id: c5e4911969ad8667763eec4203dbbc6a51178592
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63361
Python multiprocessing doesn't support LSAN and produces false positives instead. As a result, this disables LSAN for these tests so that we can still run with opt-asan.
ghstack-source-id: 135962489
Test Plan: waitforbuildbot
Reviewed By: rohan-varma
Differential Revision: D30352269
fbshipit-source-id: f6ab5abce7bdef00cd5e1f5977424d2b151174af
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63406
The `RemoteException` will be thrown on the caller side when converting
the response message to IValue. Since it is a Python error, the error message needs to be extracted explicitly and the `PyErr` cleared.
Test Plan: Imported from OSS
Reviewed By: rohan-varma, ngimel
Differential Revision: D30372741
Pulled By: mrshenli
fbshipit-source-id: 1f72a7ee0c39cc2ef070f99884c142f7b3e0543d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63383
Per title
ghstack-source-id: 135966157
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D30358921
fbshipit-source-id: 965e054e525194b1ee55980340df275bab355c9b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63382
Per title
ghstack-source-id: 135966156
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D30255446
fbshipit-source-id: e6ffbf339db0bc5b4702d02b74a462309df07c75
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62279
Before buckets are rebuilt, `kDefaultFirstBucketBytes` is actually misleading: because we reverse the parameter indices when initializing the reducer, it is actually the size of the last bucket.
Currently, rebuilding buckets sets this as the first bucket size, but this change tests whether keeping it as the last bucket size can help perf.
This is currently experimental only, and we don't plan to land it unless experiments show a clear win.
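For context, a single-process sketch of the bucketing knobs involved (the first-bucket cap via `kDefaultFirstBucketBytes` is applied internally; the values below are examples, not recommendations):
```python
import os
import torch
import torch.distributed as dist

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

model = torch.nn.Linear(10, 10)
# bucket_cap_mb caps bucket sizes after rebuild; the first bucket is capped
# internally by kDefaultFirstBucketBytes. Because the reducer walks the
# parameters in reverse, that "first" bucket holds the last parameters.
ddp = torch.nn.parallel.DistributedDataParallel(model, bucket_cap_mb=25)
```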
ghstack-source-id: 135966897
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D29927931
fbshipit-source-id: 55b949986fa2c3bade6fcb4bf5b513461bf0f490
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63359
Fixes #63352. The problem was that in e.g. `torch.matmul(A, B)` with A,
B having shapes [3, 2, 0] and [0, 2], the code attempts to call
`A.view(-1, 0)` which fails due to "-1 being ambiguous". The solution is
to manually compute what we want the shape of the view to be.
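A minimal repro based on the description above:
```python
import torch

A = torch.randn(3, 2, 0)
B = torch.randn(0, 2)
# Previously this failed inside matmul on a view(-1, 0) reshape;
# with the fix it returns an all-zero result of the expected shape.
out = torch.matmul(A, B)
print(out.shape)  # torch.Size([3, 2, 2])
```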
Test Plan: - new tests
Reviewed By: ngimel
Differential Revision: D30351583
Pulled By: zou3519
fbshipit-source-id: 7625691fe8b85d96a4073409596a932c303e3e8c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63277
`PostLocalSGDState` requires a subgroup. To initialize this subgroup, a global process group must first be initialized. However, this imposes the restriction that a hook state can only be provided after distributed environment initialization, which is incompatible with the Lightning DDP plugin setup, where the hook state should be provided before distributed environment initialization.
Proposal: https://github.com/pytorch/pytorch/issues/59699
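For context, a sketch of the ordering constraint (the API names are assumptions based on the `ddp_comm_hooks` module, and `ddp_model` stands in for an existing DistributedDataParallel instance):
```python
import torch.distributed as dist
from torch.distributed.algorithms.ddp_comm_hooks import post_localSGD_hook as post_localSGD

dist.init_process_group("nccl")      # the global process group must exist first...
subgroup, _ = dist.new_subgroups()   # ...before the subgroup can be created
state = post_localSGD.PostLocalSGDState(
    process_group=None, subgroup=subgroup, start_localSGD_iter=100)
# ddp_model: an existing DistributedDataParallel instance (assumed)
ddp_model.register_comm_hook(state, post_localSGD.post_localSGD_hook)
```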
ghstack-source-id: 135848575
Test Plan: buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_ddp_hook_parity_post_localSGD
Reviewed By: cbalioglu
Differential Revision: D30325041
fbshipit-source-id: 7b870166d096d306c3f2f7c69816a705cec0bebd