Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70145
Added support for `torch.equal` to ShardedTensor. This is useful for
comparing two ShardedTensors.
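For intuition, per-rank equality boils down to matching sharding metadata plus `torch.equal` on every pair of corresponding local shards. A minimal sketch of that semantics (the helper name and the (metadata, tensor) pair layout are hypothetical, not the actual implementation):
```
import torch

# Hypothetical per-rank check: metadata must match and each pair of
# corresponding local shards must compare equal with torch.equal.
def local_shards_equal(shards_a, shards_b):
    # shards_*: lists of (shard_metadata, torch.Tensor) pairs held on this rank
    if len(shards_a) != len(shards_b):
        return False
    return all(
        meta_a == meta_b and torch.equal(t_a, t_b)
        for (meta_a, t_a), (meta_b, t_b) in zip(shards_a, shards_b)
    )

shards = [("rows 0-4", torch.ones(5, 4))]
print(local_shards_equal(shards, [("rows 0-4", torch.ones(5, 4))]))  # True
```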
ghstack-source-id: 146066939
Test Plan: waitforbuildbot
Reviewed By: wanchaol
Differential Revision: D33201714
fbshipit-source-id: 56adfc36e345d512c9901c56c07759bf658c745b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69734
Added support for `torch.equal` to ShardedTensor. This is useful for
comparing two ShardedTensors.
Will implement `allclose` in a follow-up PR.
ghstack-source-id: 145301451
Test Plan: waitforbuildbot
Reviewed By: fduwjj, wanchaol
Differential Revision: D33004315
fbshipit-source-id: 786fe26baf82e1bb4fecfdbfc9ad4b64e704877f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68607
This PR adds ShardedOptimizer and an API to get module parameters along with ShardedTensor params. It allows users to use this optimizer wrapper to construct an optimizer that involves ShardedTensors.
state_dict support will be added in a follow-up diff.
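A minimal sketch of the wrapper idea, assuming hypothetical names (the real ShardedOptimizer API and parameter-collection helper may differ):
```
import torch

class ShardedOptimizerSketch:
    """Toy wrapper: takes named params (where sharded params would contribute
    their local shard tensors on this rank) and delegates to a regular optimizer."""
    def __init__(self, named_params, optim_cls, **optim_kwargs):
        self.named_params = dict(named_params)
        self._optim = optim_cls(list(self.named_params.values()), **optim_kwargs)

    def zero_grad(self):
        self._optim.zero_grad()

    def step(self, closure=None):
        return self._optim.step(closure)

# Usage sketch on a plain module (no actual sharding here):
fc = torch.nn.Linear(4, 4)
opt = ShardedOptimizerSketch(fc.named_parameters(), torch.optim.SGD, lr=0.1)
loss = fc(torch.randn(2, 4)).sum()
loss.backward()
opt.step()
opt.zero_grad()
```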
ghstack-source-id: 145532834
Test Plan: python test_sharded_optim.py
Reviewed By: pritamdamania87
Differential Revision: D32539994
fbshipit-source-id: a3313c6870d1f1817fc3e08dc2fc27dc43bef743
Summary:
The error message was changed following a PR comment, and since the test doesn't run on CI, I forgot to update the test to catch the new error message.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69565
Reviewed By: mrshenli
Differential Revision: D32932982
Pulled By: albanD
fbshipit-source-id: a1da72b0ca735e72b481bc944039233094f1c422
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68822
Per title, we switched over c10d_gloo and nccl and the results look good
so far, so switch the rest of them as well. After this, the only dist tests that
won't run in subprocess are the pipe and fsdp tests, which historically haven't had
much flakiness.
ghstack-source-id: 144213522
Test Plan: CI
Reviewed By: H-Huang
Differential Revision: D32624330
fbshipit-source-id: 469f613e5b0e4529e6b23ef259d948837d4af26b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68821
Continuing the effort to move most distributed tests to run in subprocess
for better reproducibility and reduced flakiness.
ghstack-source-id: 144213520
Test Plan: CI
Reviewed By: H-Huang
Differential Revision: D32624199
fbshipit-source-id: 04448636320554d7a3ab29ae92bc1ca9fbe37da2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68504
Per title
ghstack-source-id: 143928767
Test Plan: CI
Reviewed By: H-Huang
Differential Revision: D32485100
fbshipit-source-id: a55687aea4af69e3830aee6f0278550c72f142c2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68503
Per title
ghstack-source-id: 143928768
Test Plan: CI
Reviewed By: H-Huang
Differential Revision: D32484990
fbshipit-source-id: 6682f46256af0da5153e5087a91a7044156dd17f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67400
c10d/frontend.cpp was originally proposed to introduce a pure C++ API and use TorchBind to share the Python-level API with TorchScript. This is no longer needed, so delete it to reduce code redundancy.
ghstack-source-id: 143910066
Test Plan: wait for ci
Reviewed By: navahgar
Differential Revision: D31979270
fbshipit-source-id: 6ceb8b53d67ab8f9aef44b34da79346dfbb51225
Summary:
Context: https://github.com/pytorch/pytorch/issues/67061
Use `run_test.py`'s provided flag `"--subprocess"`, passed in like `extra_unittest_args=["--subprocess"]` when running test_distributed_spawn. This ensures that each test runs separately in its own process. The goal is to more closely simulate how a developer would run a single test when reproducing a CI failure, and to make reproducibility easier in general.
Also, when a test fails, print out the exact command that was issued so the developer knows how to reproduce it.
For example, when a test fails, it will print something like the following to the logs:
```
Test exited with non-zero exitcode 1. Command to reproduce: BACKEND=gloo WORLD_SIZE=3 /fsx/users/rvarm1/conda/envs/pytorch/bin/python distributed/test_distributed_spawn.py -v TestDistBackendWithSpawn.test_Backend_enum_class
```
Running test_distributed_spawn still uses the same command as before:
`
python test/run_test.py --verbose -i distributed/test_distributed_spawn
`
as seen in [distributed contributing](https://github.com/pytorch/pytorch/blob/master/torch/distributed/CONTRIBUTING.md) guide.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67901
Reviewed By: cbalioglu, mruberry
Differential Revision: D32225172
Pulled By: rohan-varma
fbshipit-source-id: 7e8d4c7a41858044bd2a4e0d1f0bf8f1ac671d67
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66101
Updated description:
This PR tests the functionalization pass in python in two ways. For each of the test programs that I have in `test_functionalization.py`, it:
- runs the program with and without functionalization, and asserts the outputs and (potentially mutated) inputs are equal in both cases
- runs the program with `LoggingTensor`, and uses expecttests on the resulting graph. I manually confirm that the graphs look reasonable and only contain functional ops.
Mechanically, the changes include:
- factoring out `LoggingTensor` into a testing util so it can be re-used in multiple tests
- adding some private Python APIs in the `torch` namespace as hooks that I can use during testing
In the original version of this PR, I also added some fixes to the `_make_subclass()` function in python: allowing you to pass in strides and storage_offset. I kept them in mainly because the changes were already there.
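A generic sketch of the "run twice and compare" check from the first bullet above; `maybe_functionalize` is a stand-in hook (a no-op here), not the real functionalization API:
```
import contextlib
import torch

def check_same_behavior(prog, make_inputs, maybe_functionalize=contextlib.nullcontext):
    # Reference run without the pass.
    ref_inputs = make_inputs()
    ref_out = prog(*ref_inputs)
    # Test run under the (stand-in) functionalization context.
    test_inputs = make_inputs()
    with maybe_functionalize():
        test_out = prog(*test_inputs)
    # Outputs and (potentially mutated) inputs must match in both cases.
    assert torch.equal(ref_out, test_out)
    for a, b in zip(ref_inputs, test_inputs):
        assert torch.equal(a, b)

check_same_behavior(lambda x: x.add_(1), lambda: (torch.ones(3),))
```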
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision: D31942095
Pulled By: bdhirsh
fbshipit-source-id: 90ff4c88d461089704922e779571eee09c21d707
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67188
This diff/PR implements ShardedEmbeddingBag using ShardedTensor.
We support both row-wise and column-wise sharding of the embedding bag. The detailed logic can be found in the comments.
Several caveats:
1. Only the sharding of one weight is supported for now.
2. We support limited input params for the op. Support for more params is on the way.
3. We only support chunk sharding for now.
4. We only support a single local shard per rank for now.
Some other changes include:
1. Refactor the ShardedEmbedding code so that the common logic can be reused.
2. Fix tiny typos and a corner case in the API `get_chunked_dim_size`, where it would return -1 for dim_size = 5, split_size = 2, idx = 3. (This is a valid case because when chunks = 4 and dim_size = 5, the split_size is 2; see the sketch below.)
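A sketch of the corrected chunk-size math (clamping at zero), not the actual implementation:
```
# Size of chunk `idx` when `dim_size` elements are split into chunks of `split_size`;
# the fix is clamping to 0 instead of going negative for trailing empty chunks.
def get_chunked_dim_size(dim_size, split_size, idx):
    return max(min(dim_size, split_size * (idx + 1)) - split_size * idx, 0)

assert get_chunked_dim_size(5, 2, 0) == 2
assert get_chunked_dim_size(5, 2, 2) == 1
assert get_chunked_dim_size(5, 2, 3) == 0  # previously this case returned -1
```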
ghstack-source-id: 142325915
Test Plan: Unit test and CI
Reviewed By: pritamdamania87
Differential Revision: D31749458
fbshipit-source-id: ed77e05e4ec94ef1a01b1feda8bbf32dc5d5da1b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63997
Use `__torch_function__` to extend torch.nn.init.uniform_.
The init is done in SPMD fashion. Note that ideally we would want to aggregate the sharded tensors into a global tensor, init it, and reshard. It's fine to run it SPMD since uniform init is i.i.d. (independent and identically distributed).
Also enables the unit test in test_linear.py for OSS testing.
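A minimal, self-contained illustration of the `__torch_function__` hook on a plain tensor subclass (not the actual ShardedTensor override, which would init each local shard):
```
import torch

class InterceptingTensor(torch.Tensor):
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        if func in (torch.nn.init.uniform_, torch.Tensor.uniform_):
            # A ShardedTensor override would run the init on each local shard here.
            print("intercepted uniform_")
        return super().__torch_function__(func, types, args, kwargs)

t = torch.zeros(4).as_subclass(InterceptingTensor)
torch.nn.init.uniform_(t)  # dispatches through __torch_function__ above
```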
Test Plan:
a) Unit Test
(pytorch) ... $ python test/distributed/_sharded_tensor/ops/test_init.py TestShardedTensorNNInit --v
(pytorch) ... $ python test/distributed/_sharded_tensor/ops/test_linear.py --v (before runs this command is no-op)
or b) Manual run: Instruction here: https://docs.google.com/document/d/1_m1Hdo5w51-hhPlZ_F8Y6PIWrN7UgJZqiSpARYvhsaE/edit#
Imported from OSS
Reviewed By: pritamdamania87, anjali411
Differential Revision: D30563017
fbshipit-source-id: d1859f7682235bcb44515efc69ca92bc5e34fce1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66604
This diff/PR implements ShardedEmbedding using ShardedTensor.
Several caveats:
1. We support limited input params for the op. Support for more params is on the way.
2. We only support chunk sharding for now.
3. We only support a single local shard per rank for now.
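For intuition only, chunk (row-wise) sharding of an embedding weight across 4 ranks amounts to something like the following toy split (plain `torch.chunk`, not the ShardedTensor machinery):
```
import torch

weight = torch.randn(10, 8)                  # (num_embeddings, embedding_dim)
row_shards = torch.chunk(weight, 4, dim=0)   # each rank would hold one slice of rows
print([tuple(s.shape) for s in row_shards])  # [(3, 8), (3, 8), (3, 8), (1, 8)]
```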
ghstack-source-id: 141056130
Test Plan: Unit test and CI
Reviewed By: pritamdamania87
Differential Revision: D31544556
fbshipit-source-id: cc867dcba8c11e6f4c7c3722488908f5108cc67f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63881
This PR includes the minimal set of features to make FSDP work, like sharding, the core data flow, and hooks. More tests will be added in follow-up PRs. Tests are refactored to utilize common PyTorch utils. The code is also refactored a little. Alternative ways to replace the ".data" usage in this PR are still being discussed offline.
Test Plan: unit tests
Reviewed By: mrshenli
Differential Revision: D30521673
fbshipit-source-id: 9a23390dd7c925749604c6860e08fbe39ddc5500
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64128
This PR implements a sharded nn.Linear layer using ShardedTensors with
the following limitations:
1) Works only for ChunkShardingSpec.
2) The implementation is only aimed at demonstrating functionality and is most
likely not performant at all.
The PR also introduces a `shard_parameter` API to easily shard parameters of
`nn.Modules`. This also has the following limitations:
1) Works only for ChunkShardingSpec.
2) Is not performant since it uses broadcast instead of scatter, because
ProcessGroupNCCL doesn't yet support scatter.
Overall user API for running a sharded linear would be something like this:
```
# SPMD programming paradigm running same code on all nodes.
fc = nn.Linear(10, 10)
# Setup sharding.
sharding_spec = ChunkShardingSpec(...)
shard_parameter(fc, 'weight', sharding_spec, src_rank=0)
# Run as a normal linear layer.
inp = torch.rand(10, 10)
output = fc(inp)
```
ghstack-source-id: 138500985
Test Plan:
1) unit tests.
2) waitforbuildbot
Reviewed By: wanchaol, bowangbj
Differential Revision: D30621215
fbshipit-source-id: 1aa7478568c18a4572f6c3462fdf24a4cbde01d6
Summary:
There were several reports of the target determinator incorrectly skipping
tests, the most recent one being https://github.com/pytorch/pytorch/issues/64902.
Let's disable it until it can be further stabilized.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64921
Reviewed By: seemethere, janeyx99
Differential Revision: D30901186
Pulled By: malfet
fbshipit-source-id: 531afd2d390c6b51f727330d5dd1882d70b6fdde
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64253
Follow-up to D30496178 (f4aff3a346) to move the rest of the distributed tests to their own jobs for Linux GHA.
ghstack-source-id: 137233785
Test Plan: CI
Reviewed By: walterddr
Differential Revision: D30662999
fbshipit-source-id: f7cfbc0d1223aca52120f17f9da987d70fda8de6
Summary:
Introduce a `discover_tests` function that globs for all Python files
starting with `test_` in the test folder, excluding subfolders that are
executed differently.
Fixes https://github.com/pytorch/pytorch/issues/64178
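A hedged sketch of what such a discovery helper could look like (the real `discover_tests` in test/run_test.py and its blocklist may differ):
```
from pathlib import Path

def discover_tests(test_dir="test", blocklisted_subdirs=("cpp",)):
    """Glob for test_*.py under `test_dir`, skipping subfolders run differently."""
    base = Path(test_dir)
    names = []
    for path in sorted(base.glob("**/test_*.py")):
        rel = path.relative_to(base)
        if any(part in blocklisted_subdirs for part in rel.parts[:-1]):
            continue
        names.append(str(rel.with_suffix("")))
    return names

print(discover_tests())
```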
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64246
Reviewed By: walterddr, seemethere
Differential Revision: D30661652
Pulled By: malfet
fbshipit-source-id: a52e78ec717b6846add267579dd8d9ae75326bf9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64197
Removes this line, as the test is gone.
ghstack-source-id: 136986275
Test Plan: CI
Reviewed By: walterddr
Differential Revision: D30642929
fbshipit-source-id: a0c7dfdfb35a4a7f7ec1b881dbea53d85136012c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63534
In this PR:
* We have changed the default dtype of `AutocastCPU` from `float16` to `bfloat16`, as discussed in https://github.com/pytorch/pytorch/pull/61002.
* We also update the list of operations that need casting to `lower_precision_fp` or `float32`.
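Illustrative usage of CPU autocast with the new default (the entry point shown is `torch.cpu.amp.autocast`; the exact op coverage is governed by the updated cast lists):
```
import torch

a = torch.randn(8, 8)
b = torch.randn(8, 8)
with torch.cpu.amp.autocast():  # dtype now defaults to torch.bfloat16
    out = torch.mm(a, b)
print(out.dtype)  # expected torch.bfloat16, assuming mm is in the lower-precision list
```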
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision: D30644914
Pulled By: ezyang
fbshipit-source-id: 8b93485ba452b3759611e3f0ac88e920fe495ac1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63496
This PR adds a (private) enable_python_mode context manager.
(see torch/utils/_python_dispatch.py).
enable_python_mode accepts the type of a __torch_dispatch__ object
as its argument. Whenever an operator gets called inside of the
context manager, it dispatches to the __torch_dispatch__ of
the passed-in type.
Example usage:
```
with enable_python_mode(LoggingTensor):
    z = torch.empty([])
    assert isinstance(z, LoggingTensor)
```
There are quite a few changes that were made to support this.
First, we added TorchDispatchTypeObject, a C++ struct that represents the
type of a `__torch_dispatch__` object (e.g. LoggingTensor).
It holds both the PyObject* representing the class and a PyInterpreter*
so we know which Python interpreter it came from.
Next, we updated the concrete_dispatch_fn in python_variable.cpp to accept
a `const std::shared_ptr<TorchDispatchTypeObject>&` argument. When this
is null, dispatching happens as usual. When it is non-null, we prepend
the TorchDispatchTypeObject's PyObject* to the overloaded args list so that
it is considered first for dispatch.
To get that to work, we changed how `handle_torch_dispatch_no_python_arg_parser`
works. The "overloaded args list" previously only consisted of Tensor PyObjects,
but now it can have types in addition to Tensors!
- We renamed `append_overloaded_arg` to `append_overloaded_tensor`
- We added a new `append_overloaded_type` that appends a type to
overloaded_args
- We added special handling in `handle_torch_dispatch_no_python_arg_parser`
and `append_overloaded_arg` to handle types in addition to Tensors.
Then, there is PythonMode and PythonModeTLS.
- We reuse the DispatchKey::Python dispatch key as a mode key
- We use PythonMode::enter and PythonMode::exit to enable/disable
DispatchKey::Python and set the PythonModeTLS.
- PythonModeTLS stores a TorchDispatchTypeObject as metadata.
- PythonMode is in libtorch_python, and PythonModeTLS is in ATen.
This split is due to the libtorch_python library boundary (because we need
to save TLS in ATen/ThreadLocalState)
- We modify the PythonFallbackKernel to look up
the relevant TorchDispatchTypeObject (if Python Mode is active) and
dispatch using it.
There are two more miscellaneous changes:
- internal_new_from_data (torch/csrc/utils/tensor_new.cpp) gets an
exclude guard. enable_python_mode currently does not handle
torch.tensor and the exclude guard is to prevent a bug.
Future:
- This PR does not allow for the nesting of Python modes. In the future we
should be able to enable this with a more sane no_dispatch API and by changing
the TLS to a stack. For now I did not need this for CompositeImplicitAutograd testing.
Test Plan: - new tests
Reviewed By: malfet, albanD
Differential Revision: D30543236
Pulled By: zou3519
fbshipit-source-id: ef5444d96a5a957d1657b7e37dce80f9a497d452
Summary:
This is in response to a feature request from some folks on the core team to have a local command that would only run relevant "core" tests. The idea is to have a local smoke-test option for developers to run before making a PR in order to verify their changes did not break core functionality. These smoke tests are not intended to be short, but rather relevant.
This PR enables that by allowing developers to run `python test/run_test.py --core` or `python test/run_test.py -core` in order to run the CORE_TEST_LIST, which is currently test_nn.py, test_torch.py, and test_ops.py.
I am not the best person to judge what should be considered "core", so please comment which tests should be included and/or excluded from the CORE_TEST_LIST!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63976
Test Plan:
```
(pytorch) janeyx@janeyx-mbp test % python run_test.py --core -v
Selected tests: test_nn, test_ops, test_torch
Running test_nn ... [2021-08-25 14:48:28.865078]
Executing ['/Users/janeyx/miniconda3/envs/pytorch/bin/python', 'test_nn.py', '-v'] ... [2021-08-25 14:48:28.865123]
test_to (__main__.PackedSequenceTest) ... ok
test_to_memory_format (__main__.PackedSequenceTest) ... ok
```
Reviewed By: walterddr
Differential Revision: D30575560
Pulled By: janeyx99
fbshipit-source-id: 3f151982c1e315e50e60cb0d818adaea34556a04
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63809
This moves the modulefinder determinator out to `tools/testing`, since it is supposed to be CI-only. It also simplifies run_test.py a little.
Test Plan: Imported from OSS
Reviewed By: malfet, seemethere, janeyx99
Differential Revision: D30497438
Pulled By: driazati
fbshipit-source-id: 1d203037af5af6a20c1e7812da935e7cbb5cd82f
Summary:
Currently, distributed tests are mixed within test_python.
We would like to split the distributed tests into their own batch, so we need to split them out.
This adds an option to include/exclude distributed tests with CUSTOM_HANDLERS.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63147
Test Plan:
- locally run with the additional run_test.py options.
- CI
Dependency: found a bug in the mpiexec test and need https://github.com/pytorch/pytorch/issues/63580 to fix it first.
Reviewed By: bdhirsh
Differential Revision: D30496178
Pulled By: walterddr
fbshipit-source-id: 7903a57b619f2425028028f944211938823918a6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63357
Adds the ability to set CONTINUE_THROUGH_ERROR as an environment
variable so that we can easily set it without having to add the flag
directly.
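Assuming the variable is read as a truthy flag by run_test.py, usage would look something like `CONTINUE_THROUGH_ERROR=1 python test/run_test.py`.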
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: astaff
Differential Revision: D30351108
Pulled By: seemethere
fbshipit-source-id: 767fa9bd24e1399f359eb24d16f6cc985a2d7173
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63054
1) Ensure these tests are skipped in environments without any GPUs.
2) Add the test to run_test.py
ghstack-source-id: 135595698
Test Plan: waitforbuildbot
Reviewed By: wanchaol
Differential Revision: D30239159
fbshipit-source-id: 21b543ba72e8d10182bc77e7ae1fd34fd4096509
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62937
Reland due to a Windows + CUDA failure; fixed by running it on gloo on Windows even with CUDA.
ghstack-source-id: 135306176
Test Plan: ci
Reviewed By: mrshenli
Differential Revision: D30177734
fbshipit-source-id: 7625746984c8f858648c1b3632394b98bd4518d2