Summary:
- [x] add to test matrix (see the sketch after this list)
- [x] enable on PRs for testing
- [x] modify the scripts so it actually runs the multigpu tests
- [x] put `num_shards` after `shard` number
- [x] use a separate test-reports artifact
- [x] run on `linux.16xlarge.nvidia.gpu`
- [x] validate that it works
- [x] disable on PRs before merging
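As a minimal sketch of the sharded matrix entries described in the checklist above (the `gen_test_matrix` helper and the `config`/`runner` keys are hypothetical, not the actual schema of PyTorch's workflow-generation scripts):

```python
from typing import Dict, List, Union

def gen_test_matrix(num_shards: int = 2) -> List[Dict[str, Union[int, str]]]:
    """Build one matrix entry per shard, keeping `shard` before `num_shards`
    so a rendered job name reads like "multigpu (1, 2)" rather than "(2, 1)"."""
    return [
        {
            "config": "multigpu",                   # hypothetical job config name
            "shard": shard,                         # 1-based shard index
            "num_shards": num_shards,               # total shards, listed after `shard`
            "runner": "linux.16xlarge.nvidia.gpu",  # runner label from the checklist
        }
        for shard in range(1, num_shards + 1)
    ]

if __name__ == "__main__":
    for entry in gen_test_matrix():
        print(entry)
```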
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60221
Test Plan: CI. Example run: https://github.com/pytorch/pytorch/actions/runs/984347177
Reviewed By: malfet
Differential Revision: D29430567
Pulled By: samestep
fbshipit-source-id: 09f8e208e524579b603611479ca00515c8a1b5aa
Summary:
This is a branch off of https://github.com/pytorch/pytorch/issues/59970 to shard only on Linux so far (we're running into issues with Windows gflags).
This would enable sharding of tests on a few Linux jobs on GHA, allowing time-to-signal (TTS) to be essentially halved.
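As a rough, hypothetical illustration of what sharding can look like (simple index-based assignment; not necessarily the balancing logic `test/run_test.py` actually uses):

```python
from typing import List

def select_shard(tests: List[str], shard: int, num_shards: int) -> List[str]:
    """Return the subset of `tests` assigned to the 1-based shard `shard`:
    each shard takes every `num_shards`-th test file, starting at its index."""
    assert 1 <= shard <= num_shards
    return tests[shard - 1 :: num_shards]

tests = ["test_nn", "test_torch", "test_autograd", "test_jit"]
print(select_shard(tests, 1, 2))  # ['test_nn', 'test_autograd']
print(select_shard(tests, 2, 2))  # ['test_torch', 'test_jit']
```

With two shards running in parallel, each job executes roughly half the test files, which is where the halved-TTS estimate comes from.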
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60124
Reviewed By: zou3519
Differential Revision: D29204211
Pulled By: janeyx99
fbshipit-source-id: 1cc31d1eccd564d96e2aef14c0acae96a3f0fcd0