Commit Graph

22 Commits

Author SHA1 Message Date
Yulv-git
ac2d2e3a3d Fix some typos.
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75561
Approved by: https://github.com/albanD
2022-04-11 21:55:59 +00:00
Rodrigo Berriel
a0dea074b2 Remove .data from benchmarks and tensorboard (#65389)
Summary:
Related to https://github.com/pytorch/pytorch/issues/30987 and https://github.com/pytorch/pytorch/issues/33628. Fixes the following tasks:

- Remove the use of `.data` in all our internal code (see the sketch after this list):
  - [x] `benchmarks/`
  - [x] `torch/utils/tensorboard/`
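
For context, a minimal sketch of the typical replacement (illustrative, not code from this PR): `.data` silently detaches from autograd and can hide errors, while `.detach()` is the documented substitute.

```
import torch

x = torch.ones(3, requires_grad=True)

# Before: .data bypasses autograd tracking; in-place edits of the result
# are invisible to the version counter and can silently corrupt gradients.
y = x.data * 2

# After: .detach() shares storage but is explicitly excluded from the
# graph, and in-place edits of the detached view are detected by autograd.
y = x.detach() * 2

# For untracked in-place parameter updates, prefer torch.no_grad():
with torch.no_grad():
    x.add_(1.0)
```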

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang gcramer23 albanD gchanan

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65389

Reviewed By: soulitzer

Differential Revision: D31093464

Pulled By: albanD

fbshipit-source-id: 3a9c8834fd544a59a1cc2b930ae538fd1d46b232
2021-09-22 11:16:59 -07:00
Sean Lawlor
34c9f5a8da [DDP Communication Hook] Update get_tensor and set_tensor to be cleaner naming conventions (buffer() and set_buffer()) (#62662)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62662

Replaced the methods get_tensor() and set_tensor(.) exposed to Python from the C++ logic with buffer() and set_buffer(.) for a cleaner interface.
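
A minimal sketch of a DDP communication hook written against the new names (illustrative, not this PR's code):

```
import torch.distributed as dist

def allreduce_hook(process_group, bucket):
    group = process_group if process_group is not None else dist.group.WORLD
    # bucket.buffer() replaces the old bucket.get_tensor().
    tensor = bucket.buffer().div_(group.size())
    fut = dist.all_reduce(tensor, group=group, async_op=True).get_future()
    # A hook must return a Future wrapping the reduced bucket tensor.
    return fut.then(lambda f: f.value()[0])
```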

Reviewed By: SciPioneer

Differential Revision: D30012869

fbshipit-source-id: bd8efab583dd89c96f9aeb3dd48a12073f0b1482
2021-08-04 09:27:31 -07:00
Bo Wang
e098e9000b Compare DDP static graph (C++ core) with legacy DDP forward and backward delay. (#61507)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61507

Benchmarks Python-only DDP vs the production C++ based DistributedDataParallel.
- Implemented a pure Python DDP, PythonDDP, with support for SYNC and ASYNC reduction (sketched below)
- Added compare_ddp to measure the difference in the forward and backward steps

Kudos to Shen and Yi for the great idea.
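
A minimal sketch of what a pure-Python DDP of this shape might look like (the class name matches the PR; the body is an assumption, not the benchmark's actual code):

```
import torch
import torch.distributed as dist
import torch.nn as nn

class PythonDDP(nn.Module):
    """Sketch: replicate a module, then all-reduce gradients after backward."""

    def __init__(self, module, async_reduce=False):
        super().__init__()
        self.module = module
        self.async_reduce = async_reduce
        # Start every rank from identical parameters.
        with torch.no_grad():
            for p in self.module.parameters():
                dist.broadcast(p, src=0)

    def forward(self, *args, **kwargs):
        return self.module(*args, **kwargs)

    def all_reduce_grads(self):
        # SYNC blocks on each tensor; ASYNC launches all reductions first.
        params = [p for p in self.module.parameters() if p.grad is not None]
        if self.async_reduce:
            handles = [dist.all_reduce(p.grad, async_op=True) for p in params]
            for handle in handles:
                handle.wait()
        else:
            for p in params:
                dist.all_reduce(p.grad)
        for p in params:
            p.grad.div_(dist.get_world_size())
```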

Test Plan:
Tested on DevGPUs with 2 CUDA devices.

$python compare_ddp.py

Python-only DDP has slightly better (-1%) forward performance and slightly slower (2%-20%) backward performance.
This suggests we need to keep the C++ core, since the maximum latency increase can be 20%. See README.md for details.
Imported from OSS

Differential Revision: D29685364

Reviewed By: mrshenli

Pulled By: bowangbj

fbshipit-source-id: 429e4473fac0ec4c70d6db12d946d2636dd6477a
2021-07-15 12:52:22 -07:00
Garrett Cramer
5a5c7f563d add trainer hook functions (#60785)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60785

This PR adds hook functions for the trainers.

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D29697299

Pulled By: gcramer23

fbshipit-source-id: cc3b991aad0d32503fbfc5acd4fca8b404e74c0f
2021-07-14 13:19:17 -07:00
Garrett Cramer
304c02ee44 refactor ps benchmark (#60784)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60784

This PR refactors the ps benchmark to support modular trainers.

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D29697291

Pulled By: gcramer23

fbshipit-source-id: 64579a1f5326d3cd9f32936dcf53bc243d54b71d
2021-07-14 13:19:13 -07:00
Basil Hosmer
cab926b2c0 faster generate_square_subsequent_mask in nn.Transformer (#60631)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60631

Per #48360, speed up `Transformer.generate_square_subsequent_mask`. The new implementation is roughly 5x faster in informal measurements, though the absolute difference is probably small.

The PR includes Python and C++ versions, and also updates a couple of places where the previous implementation had been copied around.
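
For reference, a sketch of the old implementation next to the faster form (assuming the commonly cited versions of both; see the PR for the exact code):

```
import torch

sz = 5

# Old: build a boolean lower triangle, then two masked_fill passes.
mask = (torch.triu(torch.ones(sz, sz)) == 1).transpose(0, 1)
old = mask.float().masked_fill(mask == 0, float('-inf')).masked_fill(mask == 1, 0.0)

# New: fill with -inf and keep only the strict upper triangle.
new = torch.triu(torch.full((sz, sz), float('-inf')), diagonal=1)

assert torch.equal(old, new)
```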

Test Plan: Imported from OSS

Reviewed By: jbschlosser, albanD

Differential Revision: D29356673

Pulled By: bhosmer

fbshipit-source-id: 4c062ba0ead61a445aeef451c78777bf0b3a631e
2021-06-25 16:07:01 -07:00
Garrett Cramer
4ed2d5d9bb ps sparse rpc (#58003)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58003

adds trainer class DdpTrainer
adds trainer class DdpSparseRpcTrainer
adds server class ParameterServerBase
adds server class AverageParameterServer
adds experiment ddp_cpu_sparse_rpc_nccl_allreduce
adds experiment ddp_cuda_sparse_rpc_nccl_allreduce

quip document https://fb.quip.com/iQUtAeKIxWpF
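
A hypothetical sketch of an averaging parameter server of this shape (class and method names are assumptions, not the PR's interface):

```
import threading
import torch

class AverageParameterServer:
    """Sketch: accumulate per-key gradients from trainers, serve the mean."""

    def __init__(self, num_trainers):
        self.num_trainers = num_trainers
        self.lock = threading.Lock()
        self.grads = {}

    def push(self, key, grad):
        # Trainers call this via RPC; contributions accumulate per key.
        with self.lock:
            if key not in self.grads:
                self.grads[key] = torch.zeros_like(grad)
            self.grads[key] += grad

    def pull(self, key):
        with self.lock:
            return self.grads[key] / self.num_trainers
```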

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29379696

Pulled By: gcramer23

fbshipit-source-id: 9cf5fb7398ba2fa3eb694afbddc4ed00d97f205f
2021-06-24 17:21:49 -07:00
Zachary Kneupper
b8d56572a1 Open json config file in context manager (#58077)
Summary:
* Open the JSON config file safely using a context manager (a `with` block).
* This makes sure that the file is closed even if an exception is raised.
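
The change in a nutshell (illustrative; the config filename is made up):

```
import json

# Before: the file handle leaks if json.load raises.
config = json.load(open("benchmark_config.json"))

# After: the with block closes the file even on exception.
with open("benchmark_config.json") as f:
    config = json.load(f)
```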

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58077

Reviewed By: anjali411

Differential Revision: D28711177

Pulled By: H-Huang

fbshipit-source-id: 597ba578311b1f1d6706e487872db4e784c78c3c
2021-05-26 08:58:40 -07:00
Horace He
79a258f448 s/foward/forward/g (#58497)
Summary:
Annoying typo.

Prompted by these profiling results: https://github.com/pytorch/pytorch/issues/56419#issuecomment-825787828

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58497

Reviewed By: malfet

Differential Revision: D28521081

Pulled By: Chillee

fbshipit-source-id: ab91a2e167dd7d3387fd56106a6cff81f7a32f10
2021-05-19 11:42:42 -07:00
Garrett Cramer
16d617c3e5 test experiment script (#57925)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57925

1. Adds test_scripts.py, which runs the added scripts and verifies that there are no errors
2. Adds a local ddp_nccl_allreduce experiment script

Test with the command `pytest test_scripts.py`.
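
A sketch of the shape such a test can take (the script list is an assumption):

```
import subprocess
import sys

import pytest

SCRIPTS = ["ddp_nccl_allreduce.py"]  # assumed experiment script names

@pytest.mark.parametrize("script", SCRIPTS)
def test_script_runs_without_errors(script):
    # Run the experiment script in a subprocess; fail on a non-zero exit.
    result = subprocess.run(
        [sys.executable, script], capture_output=True, text=True
    )
    assert result.returncode == 0, result.stderr
```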

Test Plan: Imported from OSS

Reviewed By: agolynski

Differential Revision: D28382452

Pulled By: gcramer23

fbshipit-source-id: 21028a990ebfedf1aad6b007a723c02403e8bea8
2021-05-12 10:22:47 -07:00
Garrett Cramer
bc2540f0be benchmark rpc ps (#57454)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57454

Implements the "DDP with NCCL AllReduce for the entire model" experiment from the Quip document https://fb.quip.com/iQUtAeKIxWpF.

I have been testing this on the AI cluster. There seem to be some connection problems with RPC when using multiple trainers or parameter servers.

```
Namespace(bconfig_id='3', dconfig_id='DummyData', mconfig_id='DummyModel', pconfig_id='None', tconfig_id='DdpNcclTrainer')

benchmark warmup done

metrics for trainer=0
+-----------------------------------+----------+---------+----------+------------+-----------+
| name                              |      min |     max |     mean |   variance |     stdev |
+===================================+==========+=========+==========+============+===========+
| backward_metric,backward          | 2.45248  | 4.18304 | 3.972    | 0.097122   | 0.311644  |
+-----------------------------------+----------+---------+----------+------------+-----------+
| batch_level_metric,batch_all      | 4.11955  | 4.58138 | 4.31439  | 0.00229848 | 0.0479424 |
+-----------------------------------+----------+---------+----------+------------+-----------+
| foward_metric,forward_pass        | 0.141312 | 1.4807  | 0.222566 | 0.0555432  | 0.235676  |
+-----------------------------------+----------+---------+----------+------------+-----------+
| hook_future_metric,nccl_allreduce | 0.191488 | 3.54099 | 3.11694  | 0.557106   | 0.746395  |
+-----------------------------------+----------+---------+----------+------------+-----------+
metrics for trainer=1
+-----------------------------------+----------+---------+----------+-------------+------------+
| name                              |      min |     max |     mean |    variance |      stdev |
+===================================+==========+=========+==========+=============+============+
| backward_metric,backward          | 2.4617   | 2.59174 | 2.51196  | 0.000938276 | 0.0306313  |
+-----------------------------------+----------+---------+----------+-------------+------------+
| batch_level_metric,batch_all      | 4.22605  | 4.71757 | 4.27921  | 0.00468424  | 0.0684415  |
+-----------------------------------+----------+---------+----------+-------------+------------+
| foward_metric,forward_pass        | 0.807936 | 1.50118 | 0.846008 | 0.00601693  | 0.0775688  |
+-----------------------------------+----------+---------+----------+-------------+------------+
| hook_future_metric,nccl_allreduce | 0.108544 | 0.1536  | 0.11222  | 2.16726e-05 | 0.00465538 |
+-----------------------------------+----------+---------+----------+-------------+------------+
metrics for all trainer
+-----------------------------------+----------+---------+----------+------------+-----------+
| name                              |      min |     max |     mean |   variance |     stdev |
+===================================+==========+=========+==========+============+===========+
| backward_metric,backward          | 2.45248  | 4.18304 | 3.24198  | 0.584391   | 0.764455  |
+-----------------------------------+----------+---------+----------+------------+-----------+
| batch_level_metric,batch_all      | 4.11955  | 4.71757 | 4.2968   | 0.00378467 | 0.0615197 |
+-----------------------------------+----------+---------+----------+------------+-----------+
| foward_metric,forward_pass        | 0.141312 | 1.50118 | 0.534287 | 0.128284   | 0.358167  |
+-----------------------------------+----------+---------+----------+------------+-----------+
| hook_future_metric,nccl_allreduce | 0.108544 | 3.54099 | 1.61458  | 2.5456     | 1.59549   |
+-----------------------------------+----------+---------+----------+------------+-----------+
```

Test Plan: Imported from OSS

Reviewed By: H-Huang, ngimel

Differential Revision: D28296175

Pulled By: gcramer23

fbshipit-source-id: 5dd208fc86f8b5558d7c8860d685bb25c2e09fe7
2021-05-07 19:58:40 -07:00
Sam Estep
75024e228c Add lint for unqualified type: ignore (#56290)
Summary:
The other half of https://github.com/pytorch/pytorch/issues/56272.
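
An illustration of what the lint enforces: every `type: ignore` must name the mypy error code it suppresses.

```
def untyped():  # stand-in for a call whose types mypy cannot see
    return object()

# Flagged by the lint: a bare ignore silences every mypy error on the line.
x: int = untyped()  # type: ignore

# Accepted: a qualified ignore names the one error code being suppressed.
y: int = untyped()  # type: ignore[assignment]
```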

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56290

Test Plan:
CI should pass on the tip of this PR, and we know that the lint works because the following CI runs (before this PR was finished) failed:

- https://github.com/pytorch/pytorch/runs/2384511062
- https://github.com/pytorch/pytorch/actions/runs/765036024

Reviewed By: seemethere

Differential Revision: D27867219

Pulled By: samestep

fbshipit-source-id: e648f07b6822867e70833e23ddafe7fb7eaca235
2021-04-21 08:07:23 -07:00
Sam Estep
8c798e0622 Forbid trailing whitespace (#53406)
Summary:
Context: https://github.com/pytorch/pytorch/pull/53299#discussion_r587882857

These are the only hand-written parts of this diff:
- the addition to `.github/workflows/lint.yml`
- the file endings changed in these four files (to appease FB-internal land-blocking lints):
  - `GLOSSARY.md`
  - `aten/src/ATen/core/op_registration/README.md`
  - `scripts/README.md`
  - `torch/csrc/jit/codegen/fuser/README.md`

The rest was generated by running this command (on macOS):
```
git grep -I -l ' $' -- . ':(exclude)**/contrib/**' ':(exclude)third_party' | xargs gsed -i 's/ *$//'
```

I looked over the auto-generated changes and didn't see anything that looked problematic.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53406

Test Plan:
This run (after adding the lint but before removing existing trailing spaces) failed:
- https://github.com/pytorch/pytorch/runs/2043032377

This run (on the tip of this PR) succeeded:
- https://github.com/pytorch/pytorch/runs/2043296348

Reviewed By: walterddr, seemethere

Differential Revision: D26856620

Pulled By: samestep

fbshipit-source-id: 3f0de7f7c2e4b0f1c089eac9b5085a58dd7e0d97
2021-03-05 17:22:55 -08:00
Rohan Varma
5021582fe6 Fix benchmarks/distributed/ddp/benchmark.py (#51095)
Summary:
Fixes the issue reported in https://github.com/pytorch/pytorch/issues/50679 by using built-in object-based collectives. The user has verified that this patch works.
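
A minimal sketch of the built-in object-based collectives the fix relies on (illustrative values; assumes the process group is already initialized):

```
import torch.distributed as dist

# Broadcast an arbitrary picklable object from rank 0 instead of
# hand-rolling the serialization and tensor exchange.
objects = [{"latency_ms": [1.2, 3.4]}] if dist.get_rank() == 0 else [None]
dist.broadcast_object_list(objects, src=0)
measurements = objects[0]  # identical on every rank after the broadcast
```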

Test with:
```
RANK=0 python3 pytorch-dist-benchmark.py --world-size 2 --master-addr 127.0.0.1 --master-port 23456
RANK=1 python3 pytorch-dist-benchmark.py --world-size 2 --master-addr 127.0.0.1 --master-port 23456
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51095

Reviewed By: SciPioneer

Differential Revision: D26070275

Pulled By: rohan-varma

fbshipit-source-id: 59abcaac9e395bcdd8a018bf6ba07521d94b2fdf
2021-01-29 11:10:13 -08:00
Pritam Damania
96cedefd8e [Pipe] Refactor convert_to_balance under non-test package. (#50860)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50860

Since fairscale.nn.Pipe still uses the 'balance' and 'devices' parameters,
other frameworks like fairseq still rely on them. As a result, the
`convert_to_balance` method is a nice utility for migrating to PyTorch
Pipe without changing a lot of code in those frameworks.

In addition, I've renamed the method to be more descriptive of what it
does and added an optional devices parameter.
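
A hypothetical sketch of what a balance-based conversion utility can look like (the name and signature here are assumptions, not the renamed method):

```
import torch.nn as nn

def partition_by_balance(module, balance, devices=None):
    # Split a Sequential by layer counts and place each partition on its
    # own device, e.g. balance=[2, 2] over devices [0, 1].
    assert sum(balance) == len(module)
    devices = devices if devices is not None else list(range(len(balance)))
    layers, partitions, idx = list(module), [], 0
    for count, device in zip(balance, devices):
        part = nn.Sequential(*layers[idx:idx + count]).to(f"cuda:{device}")
        partitions.append(part)
        idx += count
    return nn.Sequential(*partitions)
```
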
ghstack-source-id: 120430775

Test Plan:
1) waitforbuildbot
2) Tested with fairseq

Reviewed By: SciPioneer

Differential Revision: D25987273

fbshipit-source-id: dccd42cf1a74b08c876090d3a10a94911cc46dd8
2021-01-28 12:10:21 -08:00
Oscar Sandoval
09f4844c1f Pytorch Distributed RPC Reinforcement Learning Benchmark (Throughput and Latency) (#46901)
Summary:
A PyTorch Distributed RPC benchmark measuring agent and observer throughput and latency for reinforcement learning.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46901

Reviewed By: mrshenli

Differential Revision: D25869514

Pulled By: osandoval-fb

fbshipit-source-id: c3b36b21541d227aafd506eaa8f4e5f10da77c78
2021-01-11 19:02:36 -08:00
skyline75489
46b83212d1 Remove unused six code for Python 2/3 compatibility (#48077)
Summary:
This is basically a reborn version of https://github.com/pytorch/pytorch/issues/45254.

Ref: https://github.com/pytorch/pytorch/issues/42919
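
The typical shape of these cleanups (illustrative):

```
d = {"lr": 0.01, "momentum": 0.9}

# Before: six shims for code that had to run on both Python 2 and 3.
#   import six
#   if isinstance(key, six.string_types):
#       for k, v in six.iteritems(d): ...

# After: Python 3 only, so the built-ins suffice.
for key, value in d.items():
    assert isinstance(key, str)
```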

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48077

Reviewed By: ngimel

Differential Revision: D25687042

Pulled By: bugra

fbshipit-source-id: 05f20a6f3c5212f73d0b1505b493b720e6cf74e5
2020-12-22 18:07:08 -08:00
mrshenli
e4eaa6de5f Fix lint (#49629)
Summary:
Fix lint on master

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49629

Reviewed By: rohan-varma

Differential Revision: D25654199

Pulled By: mrshenli

fbshipit-source-id: 2ab5669ad47996c0ca0f9b6611855767d5af0506
2020-12-18 19:26:06 -08:00
Pritam Damania
159de1f1d6 Add benchmark for torch.distributed.pipeline.sync.Pipe (#49577)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49577

Repurposes the benchmark from
https://github.com/facebookresearch/fairscale/blob/master/benchmarks/pipe.py
and pulls a stripped-down version into PyTorch.
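
For context, a minimal sketch of how the module under benchmark is used (Pipe also requires the RPC framework to be initialized first):

```
import torch.nn as nn
from torch.distributed.pipeline.sync import Pipe

# Assumes torch.distributed.rpc.init_rpc(...) has already been called.
fc1 = nn.Linear(1024, 256).cuda(0)
fc2 = nn.Linear(256, 10).cuda(1)
# chunks controls how each mini-batch is split into pipelined micro-batches.
model = Pipe(nn.Sequential(fc1, fc2), chunks=4)
```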

Sample output:
```
Running benchmark with args: Namespace(batch_size=8, checkpoint='never', chunks=4, host='localhost', max_batch=10, num_decoder_layers=10, num_devices=4)
Number of parameters for model: 292833040
| batch     1 | wps 3593.07 | loss 25.98 | ppl 192556591553.37
| batch     2 | wps 4405.16 | loss 19.36 | ppl 256201548.33
| batch     3 | wps 4404.98 | loss 23.56 | ppl 17111244076.37
| batch     4 | wps 4413.25 | loss 27.11 | ppl 594561327825.83
| batch     5 | wps 4408.53 | loss 25.92 | ppl 181277705101.33
| batch     6 | wps 4385.64 | loss 24.92 | ppl 66592883598.50
| batch     7 | wps 4434.11 | loss 24.75 | ppl 56113635884.68
| batch     8 | wps 4441.25 | loss 24.88 | ppl 63666024212.82
| batch     9 | wps 4425.49 | loss 25.35 | ppl 101959669008.98
| batch    10 | wps 4421.05 | loss 25.34 | ppl 101597621863.94
Peak memory usage for GPUs: cuda:0: 2.38GiB, cuda:1: 3.04GiB, cuda:2: 3.04GiB, cuda:3: 3.67GiB,
```
ghstack-source-id: 118939686

Test Plan: sentinel

Reviewed By: rohan-varma

Differential Revision: D25628721

fbshipit-source-id: 41c788eed4f852aef019aec18a84cb25ad254f3a
2020-12-18 18:33:47 -08:00
elfringham
db1b0b06c4 Flake8 fixes (#48453)
Summary:
Silences errors from flake8. Only a couple of code changes address deprecated Python syntax from before Python 2.4; the rest just adds noqa markers.
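
The noqa additions look roughly like this (illustrative):

```
import torch  # noqa: F401  (re-exported for callers; silence "unused import")

f = lambda x: x * 2  # noqa: E731  (keep the assigned lambda as-is)
```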

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48453

Reviewed By: mruberry

Differential Revision: D25181871

Pulled By: ngimel

fbshipit-source-id: f8d7298aae783b1bce2a46827b088fc390970641
2020-11-25 19:09:50 -08:00
Shen Li
76c7652cc5 Add distributed data parallel benchmark tool (#35198)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35198

The need for this tool was motivated by #28883. In the past, we have
done ad-hoc benchmarking, but it's time for something more structured.

It would be nice to add more model architectures so that we can get a
full picture of the performance impact of a code change simply by
running this suite a few times.

Test Plan: Imported from OSS

Differential Revision: D20591296

Pulled By: mrshenli

fbshipit-source-id: ee66ce0ebca02086453b02df0a94fde27ab4be49
2020-04-08 15:07:03 -07:00