pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
cyy	b0dfd242fa	Remove NO_MULTIPROCESSING_SPAWN checks (#146705 ) py 3.9 has spawn. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146705 Approved by: https://github.com/colesbury	2025-02-28 05:53:19 +00:00
PyTorch MergeBot	926b7b5027	Revert "Remove NO_MULTIPROCESSING_SPAWN checks (#146705 )" This reverts commit `40ad5e01df`. Reverted https://github.com/pytorch/pytorch/pull/146705 on behalf of https://github.com/cyyever due to Broke lint?, I guess land race with ruff update ([comment](https://github.com/pytorch/pytorch/pull/146705#issuecomment-2689603077))	2025-02-28 03:04:38 +00:00
cyyever	40ad5e01df	Remove NO_MULTIPROCESSING_SPAWN checks (#146705 ) py 3.9 has spawn. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146705 Approved by: https://github.com/colesbury	2025-02-28 00:15:32 +00:00
Xuehai Pan	db3290846e	[BE][Easy][10/19] enforce style for empty lines in import segments in `test/d*/` (#129761 ) See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter. You can review these PRs via: ```bash git diff --ignore-all-space --ignore-blank-lines HEAD~1 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/129761 Approved by: https://github.com/fegin	2024-07-17 16:57:39 +00:00
Chien-Chin Huang	7420bad74c	[BE] Do not assert if the barrier is not created (#129497 ) The folder will be created as long as TEMP_DIR is set and the program has write permission. This ensures that some test environments can run the spawn tests. Differential Revision: [D59020736](https://our.internmc.facebook.com/intern/diff/D59020736/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/129497 Approved by: https://github.com/fduwjj, https://github.com/wz337	2024-06-26 05:51:36 +00:00
Yuanhao Ji	e3effa5855	Enable UFMT on all of `test/distributed` (#123539 ) Partially addresses #123062 Ran lintrunner on: - `test/distributed` Pull Request resolved: https://github.com/pytorch/pytorch/pull/123539 Approved by: https://github.com/ezyang	2024-04-17 06:46:02 +00:00
PyTorch MergeBot	52be63eb2c	Revert "Enable UFMT on all of `test/distributed` (#123539 )" This reverts commit `89ac37fe91`. Reverted https://github.com/pytorch/pytorch/pull/123539 on behalf of https://github.com/DanilBaibak due to Broken trunk ([comment](https://github.com/pytorch/pytorch/pull/123539#issuecomment-2058329471))	2024-04-16 06:33:21 +00:00
Yuanhao Ji	89ac37fe91	Enable UFMT on all of `test/distributed` (#123539 ) Partially addresses #123062 Ran lintrunner on: - `test/distributed` Pull Request resolved: https://github.com/pytorch/pytorch/pull/123539 Approved by: https://github.com/ezyang	2024-04-16 03:23:56 +00:00
Will Constable	418c5206ec	Make `test_distributed_spawn.py` tell you how to run it correctly (#112924 ) Sample output if incorrect/missing args are specified: ``` RuntimeError: Missing expected env vars for `test_distributed_spawn.py`. Please ensure to specify the following: 'BACKEND' = one of ('gloo', 'nccl', 'ucc') 'WORLD_SIZE' = int >= 2 'TEMP_DIR' specifying a directory containing a barrier file named 'barrier'. e.g. touch /tmp/barrier && TEMP_DIR=/tmp BACKEND='nccl' WORLD_SIZE=2 python /data/users/whc/pytorch/test/distributed/test_distributed_spawn.py ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/112924 Approved by: https://github.com/wanchaol	2023-11-04 02:43:43 +00:00
Rohan Varma	f044613f78	Back out "Revert "[DDP] multiple forward support for static graph (#103487 )" (#103873 )" (#103938 ) Differential Revision: [D46883396](https://our.internmc.facebook.com/intern/diff/D46883396/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/103938 Approved by: https://github.com/awgu, https://github.com/fegin	2023-06-22 21:55:58 +00:00
Huy Do	b1ddd5a293	Revert "[DDP] multiple forward support for static graph (#103487 )" (#103873 ) Per the discussion in https://github.com/pytorch/pytorch/pull/103629#issuecomment-1598001313, I preemptively create this revert PR to revert all commits in the stack. This seems like a safer option than using the bot as the commit has already been in trunk since last week. Pull Request resolved: https://github.com/pytorch/pytorch/pull/103873 Approved by: https://github.com/rohan-varma	2023-06-20 16:25:00 +00:00
Rohan Varma	80139fc2db	[DDP] multiple forward support for static graph (#103487 ) Adds support for multiple forward before bwd call for static_graph=True. There are 2 changes: 1) Change the tracking of when to populate static graph related maps from relying on forward iteration to backward calls 2) In DDP python, don't rely on num_forward iterations == 1 to enqueue the delay allreduce. Instead use a flag. Differential Revision: [D46673736](https://our.internmc.facebook.com/intern/diff/D46673736/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/103487 Approved by: https://github.com/awgu	2023-06-14 16:14:52 +00:00
Jeff Daily	72502b94f3	correct use of torch.backends.cudnn.flags() (#93182 ) Fixes #77467. Pull Request resolved: https://github.com/pytorch/pytorch/pull/93182 Approved by: https://github.com/ngimel	2023-01-28 06:50:06 +00:00
Xiang Gao	08c4f8c7a7	ProcessGroupUCC tests (#83285 ) - [x] Direct dependency on UCX is completely removed, UCC active set API always enabled - [x] Remove `TORCH_UCC_PROFILING_ENABLE`, always enable profiling - [x] Fixes profiling of `recv` and `all_gather` - [x] Use the NCCL TL of UCC on CUDA, as the UCP TL is not well supported on CUDA Most tests are passing, but there are a few skipped tests: - `scatter` and `gather` are not supported by the UCP TL of UCC on CPU tensors - A few flaky tests in PyTorch's CI environment - Profiler-related failures, some of them will be fixed by @Fuzzkatt in https://github.com/pytorch/pytorch/pull/84368 After this PR is merged, I will continue to work on these skipped failures. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83285 Approved by: https://github.com/vtlam, https://github.com/malfet, https://github.com/kwen2501	2022-09-10 10:56:05 +00:00
Jane Xu	34051d74da	Add test owner to distributed files starting with test_ (#66797 ) Summary: Action based on https://github.com/pytorch/pytorch/issues/66232 cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang Pull Request resolved: https://github.com/pytorch/pytorch/pull/66797 Reviewed By: gchanan Differential Revision: D31761389 Pulled By: janeyx99 fbshipit-source-id: c27c9ab4acec1eb71d5edd4538cd113b770dfc6c	2021-10-19 10:55:20 -07:00
Pritam Damania	f7611b31aa	[4/N] Enable opt-asan for distributed unit tests. (#62051 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62051 The goal here is to enable opt-asan for "spawn" based unit tests since this works for "spawn" unlike "dev-asan". As a result, we can run ASAN for "spawn" unit tests as well. This means we can completely remove fork unit tests from the code base since the only purpose for these tests was to run ASAN. ghstack-source-id: 135523770 Test Plan: waitforbuildbot Reviewed By: SciPioneer Differential Revision: D29854514 fbshipit-source-id: 02a5bfcfae2afc21badecff77082c7a6ad83636b	2021-08-10 22:38:31 -07:00
Pritam Damania	82d81455ae	[2/N] Remove unittest.skip across all of torch.distributed. (#61887 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61887 1) Introduced a `sandcastle_skip_if` decorator that ensures these tests just get passed on sandcastle. 2) Fixed all test files under `test/distributed` to not use `unittest.skip` Overall goal is to avoid using skips since sandcastle tags these tests as continuously skipping. ghstack-source-id: 134382237 Test Plan: waitforbuildbot Reviewed By: SciPioneer Differential Revision: D29784152 fbshipit-source-id: 17b4df6c5a55ff1d1e8e1de128fa679c3dfbcb7d	2021-07-27 10:53:23 -07:00
Xiang Gao	dfb5f029da	Disable TF32 on DDP tests (#52941 ) Summary: When a system has an ampere and a non-ampere card, lots of tests will fail, because results on different cards are different. Pull Request resolved: https://github.com/pytorch/pytorch/pull/52941 Reviewed By: albanD Differential Revision: D26994287 Pulled By: mrshenli fbshipit-source-id: 287537495fc13361104a4460f5bcd79a208b5d8d	2021-03-11 18:31:28 -08:00
Hong Xu	1b35b1a0c4	Properly skip distributed tests when distributed module is not built (#52945 ) Summary: Currently there is some code that intends to skip distributed tests if the distributed module is not built. However, they are missing in some test files; and in some other test files they are checked after distributed module is imported, which leads to failure. This is generating a lot of headaches when testing minimal builds locally. Pull Request resolved: https://github.com/pytorch/pytorch/pull/52945 Reviewed By: anjali411 Differential Revision: D26848241 Pulled By: ezyang fbshipit-source-id: 983a848844add40869a86f3c9413503a3659b115	2021-03-05 10:28:47 -08:00
Xiang Gao	20ac736200	Remove py2 compatible future imports (#44735 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44735 Reviewed By: mruberry Differential Revision: D23731306 Pulled By: ezyang fbshipit-source-id: 0ba009a99e475ddbe22981be8ac636f8a1c8b02f	2020-09-16 12:55:57 -07:00
Rohan Varma	b22abbe381	Enable test_distributed to work with spawn mode (#41769 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41769 Currently the tests in `test_distributed` only work with the `fork` mode multiprocessing, this PR introduces support for `spawn` mode multiprocessing as well (while keeping the `fork` mode intact). Motivations for the change: 1) Spawn multiprocessing is the default on MacOS, so it better emulates how MacOS users would use distributed 2) With python 3.8+, spawn is the default on linux, so we should have test coverage for this 3) PT multiprocessing suggests using spawn/forkserver over fork, for sharing cuda tensors: https://pytorch.org/docs/stable/multiprocessing.html 4) Spawn is better supported with respect to certain sanitizers such as TSAN, so adding this sanitizer coverage may help us uncover issues. How it is done: 1) Move `test_distributed` tests in `_DistTestBase` class to a shared file `distributed_test` (similar to how the RPC tests are structured) 2) For `Barrier`, refactor the setup of temp directories, as the current version did not work with spawn, each process would get a different randomly generated directory and thus would write to different barriers. 3) Add all the relevant builds to run internally and in OSS. Running test_distributed with spawn mode in OSS can be done with: `python test/run_test.py -i distributed/test_distributed_spawn -v` Reviewed By: izdeby Differential Revision: D22408023 fbshipit-source-id: e206be16961fd80438f995e221f18139d7e6d2a9	2020-09-08 23:11:12 -07:00

21 Commits
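
As a rough illustration of the environment check that commit `418c5206ec` describes, the sketch below validates the three variables named in its sample output (`BACKEND`, `WORLD_SIZE`, `TEMP_DIR`) and raises a `RuntimeError` listing the requirements. The helper name `check_spawn_env` and the validation logic are assumptions made for illustration; this is not the code in `test_distributed_spawn.py`. In line with commit `7420bad74c`, it only requires `TEMP_DIR` to be set and does not assert that the barrier file already exists.

```python
import os

# Backends listed in the sample output of commit 418c5206ec.
VALID_BACKENDS = ("gloo", "nccl", "ucc")


def check_spawn_env(env=None):
    """Hypothetical helper: validate env vars before launching spawn tests.

    Returns (backend, world_size, temp_dir) or raises RuntimeError.
    """
    env = os.environ if env is None else env
    backend = env.get("BACKEND")
    temp_dir = env.get("TEMP_DIR")

    # WORLD_SIZE must parse as an integer; treat anything else as invalid.
    try:
        world_size = int(env.get("WORLD_SIZE", ""))
    except ValueError:
        world_size = 0

    ok = backend in VALID_BACKENDS and world_size >= 2 and bool(temp_dir)
    if not ok:
        raise RuntimeError(
            "Missing expected env vars for `test_distributed_spawn.py`. "
            "Please ensure to specify the following:\n"
            f"'BACKEND' = one of {VALID_BACKENDS}\n"
            "'WORLD_SIZE' = int >= 2\n"
            "'TEMP_DIR' specifying a directory containing a barrier file "
            "named 'barrier'."
        )
    return backend, world_size, temp_dir
```

Passing a plain dictionary instead of reading `os.environ` directly keeps the check easy to exercise without modifying the process environment.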