Commit Graph

22 Commits

Author SHA1 Message Date
Yulv-git
ac2d2e3a3d Fix some typos.
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75561
Approved by: https://github.com/albanD
2022-04-11 21:55:59 +00:00
Rodrigo Berriel
a0dea074b2 Remove .data from benchmarks and tensorboard (#65389)
Summary:
Related to https://github.com/pytorch/pytorch/issues/30987 and https://github.com/pytorch/pytorch/issues/33628. Fixes the following tasks:

- Remove the use of `.data` in all our internal code (see the sketch after this list):
  - [x] `benchmarks/`
  - [x] `torch/utils/tensorboard/`
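
For context, a minimal sketch of the typical replacement (illustrative, not code from this PR): `.data` silently detaches from autograd and can hide errors, while `.detach()` is the documented substitute.

```
import torch

x = torch.ones(3, requires_grad=True)

# Before: .data bypasses autograd tracking; in-place edits of the result
# are invisible to the version counter and can silently corrupt gradients.
y = x.data * 2

# After: .detach() shares storage but is explicitly excluded from the
# graph, and in-place edits of the detached view are detected by autograd.
y = x.detach() * 2

# For untracked in-place parameter updates, prefer torch.no_grad():
with torch.no_grad():
    x.add_(1.0)
```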

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang gcramer23 albanD gchanan

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65389

Reviewed By: soulitzer

Differential Revision: D31093464

Pulled By: albanD

fbshipit-source-id: 3a9c8834fd544a59a1cc2b930ae538fd1d46b232
2021-09-22 11:16:59 -07:00
Sean Lawlor
34c9f5a8da [DDP Communication Hook] Update get_tensor and set_tensor to be cleaner naming conventions (buffer() and set_buffer()) (#62662)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62662

Replaced the methods get_tensor() and set_tensor(.) exposed to Python from the C++ logic with buffer() and set_buffer(.) for a cleaner interface.
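
A minimal sketch of a DDP communication hook written against the new names (illustrative, not this PR's code):

```
import torch.distributed as dist

def allreduce_hook(process_group, bucket):
    group = process_group if process_group is not None else dist.group.WORLD
    # bucket.buffer() replaces the old bucket.get_tensor().
    tensor = bucket.buffer().div_(group.size())
    fut = dist.all_reduce(tensor, group=group, async_op=True).get_future()
    # A hook must return a Future wrapping the reduced bucket tensor.
    return fut.then(lambda f: f.value()[0])
```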

Reviewed By: SciPioneer

Differential Revision: D30012869

fbshipit-source-id: bd8efab583dd89c96f9aeb3dd48a12073f0b1482
2021-08-04 09:27:31 -07:00
Bo Wang
e098e9000b Compare DDP static graph (C++ core) with legacy DDP forward and backward delay. (#61507)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61507

Benchmarks Python-only DDP vs the production C++ based DistributedDataParallel.
- Implemented a pure Python DDP, PythonDDP, with support for SYNC and ASYNC reduction (sketched below)
- Added compare_ddp to measure the difference in the forward and backward steps

Kudos to Shen and Yi for the great idea.
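
A minimal sketch of what a pure-Python DDP of this shape might look like (the class name matches the PR; the body is an assumption, not the benchmark's actual code):

```
import torch
import torch.distributed as dist
import torch.nn as nn

class PythonDDP(nn.Module):
    """Sketch: replicate a module, then all-reduce gradients after backward."""

    def __init__(self, module, async_reduce=False):
        super().__init__()
        self.module = module
        self.async_reduce = async_reduce
        # Start every rank from identical parameters.
        with torch.no_grad():
            for p in self.module.parameters():
                dist.broadcast(p, src=0)

    def forward(self, *args, **kwargs):
        return self.module(*args, **kwargs)

    def all_reduce_grads(self):
        # SYNC blocks on each tensor; ASYNC launches all reductions first.
        params = [p for p in self.module.parameters() if p.grad is not None]
        if self.async_reduce:
            handles = [dist.all_reduce(p.grad, async_op=True) for p in params]
            for handle in handles:
                handle.wait()
        else:
            for p in params:
                dist.all_reduce(p.grad)
        for p in params:
            p.grad.div_(dist.get_world_size())
```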

Test Plan:
Tested on DevGPUs with 2 CUDA devices.

$python compare_ddp.py

Python-only DDP has slightly better (-1%) forward performance and slightly slower (2%-20%) backward performance.
This suggests we need to keep the C++ core, since the maximum latency increase can be 20%. See README.md for details.
Imported from OSS

Differential Revision: D29685364

Reviewed By: mrshenli

Pulled By: bowangbj

fbshipit-source-id: 429e4473fac0ec4c70d6db12d946d2636dd6477a
2021-07-15 12:52:22 -07:00
Garrett Cramer
5a5c7f563d add trainer hook functions (#60785)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60785

This PR adds hook functions for the trainers.

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D29697299

Pulled By: gcramer23

fbshipit-source-id: cc3b991aad0d32503fbfc5acd4fca8b404e74c0f
2021-07-14 13:19:17 -07:00
Garrett Cramer
304c02ee44 refactor ps benchmark (#60784)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60784

This PR refactors the ps benchmark to support modular trainers.

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D29697291

Pulled By: gcramer23

fbshipit-source-id: 64579a1f5326d3cd9f32936dcf53bc243d54b71d
2021-07-14 13:19:13 -07:00
Basil Hosmer
cab926b2c0 faster generate_square_subsequent_mask in nn.Transformer (#60631)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60631

Per #48360, speed up `Transformer.generate_square_subsequent_mask`. The new implementation is roughly 5x faster in informal measurements, though the absolute difference is probably small.

The PR includes Python and C++ versions, and also updates a couple of places where the previous implementation had been copied around.
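
For reference, a sketch of the old implementation next to the faster form (assuming the commonly cited versions of both; see the PR for the exact code):

```
import torch

sz = 5

# Old: build a boolean lower triangle, then two masked_fill passes.
mask = (torch.triu(torch.ones(sz, sz)) == 1).transpose(0, 1)
old = mask.float().masked_fill(mask == 0, float('-inf')).masked_fill(mask == 1, 0.0)

# New: fill with -inf and keep only the strict upper triangle.
new = torch.triu(torch.full((sz, sz), float('-inf')), diagonal=1)

assert torch.equal(old, new)
```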

Test Plan: Imported from OSS

Reviewed By: jbschlosser, albanD

Differential Revision: D29356673

Pulled By: bhosmer

fbshipit-source-id: 4c062ba0ead61a445aeef451c78777bf0b3a631e
2021-06-25 16:07:01 -07:00
Garrett Cramer
4ed2d5d9bb ps sparse rpc (#58003)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58003

adds trainer class DdpTrainer
adds trainer class DdpSparseRpcTrainer
adds server class ParameterServerBase
adds server class AverageParameterServer
adds experiment ddp_cpu_sparse_rpc_nccl_allreduce
adds experiment ddp_cuda_sparse_rpc_nccl_allreduce

quip document https://fb.quip.com/iQUtAeKIxWpF
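
A hypothetical sketch of an averaging parameter server of this shape (class and method names are assumptions, not the PR's interface):

```
import threading
import torch

class AverageParameterServer:
    """Sketch: accumulate per-key gradients from trainers, serve the mean."""

    def __init__(self, num_trainers):
        self.num_trainers = num_trainers
        self.lock = threading.Lock()
        self.grads = {}

    def push(self, key, grad):
        # Trainers call this via RPC; contributions accumulate per key.
        with self.lock:
            if key not in self.grads:
                self.grads[key] = torch.zeros_like(grad)
            self.grads[key] += grad

    def pull(self, key):
        with self.lock:
            return self.grads[key] / self.num_trainers
```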

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29379696

Pulled By: gcramer23

fbshipit-source-id: 9cf5fb7398ba2fa3eb694afbddc4ed00d97f205f
2021-06-24 17:21:49 -07:00
Zachary Kneupper
b8d56572a1 Open json config file in context manager (#58077)
Summary:
* Open the JSON config file safely using a context manager (a `with` block).
* This makes sure that the file is closed even if an exception is raised.
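
The change in a nutshell (illustrative; the config filename is made up):

```
import json

# Before: the file handle leaks if json.load raises.
config = json.load(open("benchmark_config.json"))

# After: the with block closes the file even on exception.
with open("benchmark_config.json") as f:
    config = json.load(f)
```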

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58077

Reviewed By: anjali411

Differential Revision: D28711177

Pulled By: H-Huang

fbshipit-source-id: 597ba578311b1f1d6706e487872db4e784c78c3c
2021-05-26 08:58:40 -07:00
Horace He
79a258f448 s/foward/forward/g (#58497)
Summary:
Annoying typo.

Prompted by these profiling results: https://github.com/pytorch/pytorch/issues/56419#issuecomment-825787828

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58497

Reviewed By: malfet

Differential Revision: D28521081

Pulled By: Chillee

fbshipit-source-id: ab91a2e167dd7d3387fd56106a6cff81f7a32f10
2021-05-19 11:42:42 -07:00
Garrett Cramer
16d617c3e5 test experiment script (#57925)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57925

1. Adds test_scripts.py, which runs the added scripts and verifies that there are no errors
2. Adds a local ddp_nccl_allreduce experiment script

Test with the command `pytest test_scripts.py`.
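
A sketch of the shape such a test can take (the script list is an assumption):

```
import subprocess
import sys

import pytest

SCRIPTS = ["ddp_nccl_allreduce.py"]  # assumed experiment script names

@pytest.mark.parametrize("script", SCRIPTS)
def test_script_runs_without_errors(script):
    # Run the experiment script in a subprocess; fail on a non-zero exit.
    result = subprocess.run(
        [sys.executable, script], capture_output=True, text=True
    )
    assert result.returncode == 0, result.stderr
```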

Test Plan: Imported from OSS

Reviewed By: agolynski

Differential Revision: D28382452

Pulled By: gcramer23

fbshipit-source-id: 21028a990ebfedf1aad6b007a723c02403e8bea8
2021-05-12 10:22:47 -07:00
Garrett Cramer
bc2540f0be benchmark rpc ps (#57454)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57454

Implements the "DDP with NCCL AllReduce for the entire model" experiment from the Quip document https://fb.quip.com/iQUtAeKIxWpF.

I have been testing this on the AI cluster. There seem to be some connection problems with RPC when using multiple trainers or parameter servers.

```
Namespace(bconfig_id='3', dconfig_id='DummyData', mconfig_id='DummyModel', pconfig_id='None', tconfig_id='DdpNcclTrainer')

benchmark warmup done

metrics for trainer=0
+-----------------------------------+----------+---------+----------+------------+-----------+
| name                              |      min |     max |     mean |   variance |     stdev |
+===================================+==========+=========+==========+============+===========+
| backward_metric,backward          | 2.45248  | 4.18304 | 3.972    | 0.097122   | 0.311644  |
+-----------------------------------+----------+---------+----------+------------+-----------+
| batch_level_metric,batch_all      | 4.11955  | 4.58138 | 4.31439  | 0.00229848 | 0.0479424 |
+-----------------------------------+----------+---------+----------+------------+-----------+
| foward_metric,forward_pass        | 0.141312 | 1.4807  | 0.222566 | 0.0555432  | 0.235676  |
+-----------------------------------+----------+---------+----------+------------+-----------+
| hook_future_metric,nccl_allreduce | 0.191488 | 3.54099 | 3.11694  | 0.557106   | 0.746395  |
+-----------------------------------+----------+---------+----------+------------+-----------+
metrics for trainer=1
+-----------------------------------+----------+---------+----------+-------------+------------+
| name                              |      min |     max |     mean |    variance |      stdev |
+===================================+==========+=========+==========+=============+============+
| backward_metric,backward          | 2.4617   | 2.59174 | 2.51196  | 0.000938276 | 0.0306313  |
+-----------------------------------+----------+---------+----------+-------------+------------+
| batch_level_metric,batch_all      | 4.22605  | 4.71757 | 4.27921  | 0.00468424  | 0.0684415  |
+-----------------------------------+----------+---------+----------+-------------+------------+
| foward_metric,forward_pass        | 0.807936 | 1.50118 | 0.846008 | 0.00601693  | 0.0775688  |
+-----------------------------------+----------+---------+----------+-------------+------------+
| hook_future_metric,nccl_allreduce | 0.108544 | 0.1536  | 0.11222  | 2.16726e-05 | 0.00465538 |
+-----------------------------------+----------+---------+----------+-------------+------------+
metrics for all trainer
+-----------------------------------+----------+---------+----------+------------+-----------+
| name                              |      min |     max |     mean |   variance |     stdev |
+===================================+==========+=========+==========+============+===========+
| backward_metric,backward          | 2.45248  | 4.18304 | 3.24198  | 0.584391   | 0.764455  |
+-----------------------------------+----------+---------+----------+------------+-----------+
| batch_level_metric,batch_all      | 4.11955  | 4.71757 | 4.2968   | 0.00378467 | 0.0615197 |
+-----------------------------------+----------+---------+----------+------------+-----------+
| foward_metric,forward_pass        | 0.141312 | 1.50118 | 0.534287 | 0.128284   | 0.358167  |
+-----------------------------------+----------+---------+----------+------------+-----------+
| hook_future_metric,nccl_allreduce | 0.108544 | 3.54099 | 1.61458  | 2.5456     | 1.59549   |
+-----------------------------------+----------+---------+----------+------------+-----------+
```

Test Plan: Imported from OSS

Reviewed By: H-Huang, ngimel

Differential Revision: D28296175

Pulled By: gcramer23

fbshipit-source-id: 5dd208fc86f8b5558d7c8860d685bb25c2e09fe7
2021-05-07 19:58:40 -07:00
Sam Estep
75024e228c Add lint for unqualified type: ignore (#56290)
Summary:
The other half of https://github.com/pytorch/pytorch/issues/56272.
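
An illustration of what the lint enforces: every `type: ignore` must name the mypy error code it suppresses.

```
def untyped():  # stand-in for a call whose types mypy cannot see
    return object()

# Flagged by the lint: a bare ignore silences every mypy error on the line.
x: int = untyped()  # type: ignore

# Accepted: a qualified ignore names the one error code being suppressed.
y: int = untyped()  # type: ignore[assignment]
```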

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56290

Test Plan:
CI should pass on the tip of this PR, and we know that the lint works because the following CI runs (before this PR was finished) failed:

- https://github.com/pytorch/pytorch/runs/2384511062
- https://github.com/pytorch/pytorch/actions/runs/765036024

Reviewed By: seemethere

Differential Revision: D27867219

Pulled By: samestep

fbshipit-source-id: e648f07b6822867e70833e23ddafe7fb7eaca235
2021-04-21 08:07:23 -07:00
Sam Estep
8c798e0622 Forbid trailing whitespace (#53406)
Summary:
Context: https://github.com/pytorch/pytorch/pull/53299#discussion_r587882857

These are the only hand-written parts of this diff:
- the addition to `.github/workflows/lint.yml`
- the file endings changed in these four files (to appease FB-internal land-blocking lints):
  - `GLOSSARY.md`
  - `aten/src/ATen/core/op_registration/README.md`
  - `scripts/README.md`
  - `torch/csrc/jit/codegen/fuser/README.md`

The rest was generated by running this command (on macOS):
```
git grep -I -l ' $' -- . ':(exclude)**/contrib/**' ':(exclude)third_party' | xargs gsed -i 's/ *$//'
```

I looked over the auto-generated changes and didn't see anything that looked problematic.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53406

Test Plan:
This run (after adding the lint but before removing existing trailing spaces) failed:
- https://github.com/pytorch/pytorch/runs/2043032377

This run (on the tip of this PR) succeeded:
- https://github.com/pytorch/pytorch/runs/2043296348

Reviewed By: walterddr, seemethere

Differential Revision: D26856620

Pulled By: samestep

fbshipit-source-id: 3f0de7f7c2e4b0f1c089eac9b5085a58dd7e0d97
2021-03-05 17:22:55 -08:00
Rohan Varma
5021582fe6 Fix benchmarks/distributed/ddp/benchmark.py (#51095)
Summary:
Fixes the issue reported in https://github.com/pytorch/pytorch/issues/50679 by using built-in object-based collectives. The user has verified that this patch works.
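
A minimal sketch of the built-in object-based collectives the fix relies on (illustrative values; assumes the process group is already initialized):

```
import torch.distributed as dist

# Broadcast an arbitrary picklable object from rank 0 instead of
# hand-rolling the serialization and tensor exchange.
objects = [{"latency_ms": [1.2, 3.4]}] if dist.get_rank() == 0 else [None]
dist.broadcast_object_list(objects, src=0)
measurements = objects[0]  # identical on every rank after the broadcast
```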

Test with:
```
RANK=0 python3 pytorch-dist-benchmark.py --world-size 2 --master-addr 127.0.0.1 --master-port 23456
RANK=1 python3 pytorch-dist-benchmark.py --world-size 2 --master-addr 127.0.0.1 --master-port 23456
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51095

Reviewed By: SciPioneer

Differential Revision: D26070275

Pulled By: rohan-varma

fbshipit-source-id: 59abcaac9e395bcdd8a018bf6ba07521d94b2fdf
2021-01-29 11:10:13 -08:00
Pritam Damania
96cedefd8e [Pipe] Refactor convert_to_balance under non-test package. (#50860)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50860

Since fairscale.nn.Pipe still uses the 'balance' and 'devices' parameters,
other frameworks like fairseq still rely on them. As a result, the
`convert_to_balance` method is a nice utility for migrating to PyTorch
Pipe without changing a lot of code in those frameworks.

In addition, I've renamed the method to be more descriptive of what it
does and added an optional devices parameter.
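
A hypothetical sketch of what a balance-based conversion utility can look like (the name and signature here are assumptions, not the renamed method):

```
import torch.nn as nn

def partition_by_balance(module, balance, devices=None):
    # Split a Sequential by layer counts and place each partition on its
    # own device, e.g. balance=[2, 2] over devices [0, 1].
    assert sum(balance) == len(module)
    devices = devices if devices is not None else list(range(len(balance)))
    layers, partitions, idx = list(module), [], 0
    for count, device in zip(balance, devices):
        part = nn.Sequential(*layers[idx:idx + count]).to(f"cuda:{device}")
        partitions.append(part)
        idx += count
    return nn.Sequential(*partitions)
```
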
ghstack-source-id: 120430775

Test Plan:
1) waitforbuildbot
2) Tested with fairseq

Reviewed By: SciPioneer

Differential Revision: D25987273

fbshipit-source-id: dccd42cf1a74b08c876090d3a10a94911cc46dd8
2021-01-28 12:10:21 -08:00
Oscar Sandoval
09f4844c1f Pytorch Distributed RPC Reinforcement Learning Benchmark (Throughput and Latency) (#46901)
Summary:
A PyTorch Distributed RPC benchmark measuring agent and observer throughput and latency for reinforcement learning.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46901

Reviewed By: mrshenli

Differential Revision: D25869514

Pulled By: osandoval-fb

fbshipit-source-id: c3b36b21541d227aafd506eaa8f4e5f10da77c78
2021-01-11 19:02:36 -08:00
skyline75489
46b83212d1 Remove unused six code for Python 2/3 compatibility (#48077)
Summary:
This is basically a reborn version of https://github.com/pytorch/pytorch/issues/45254.

Ref: https://github.com/pytorch/pytorch/issues/42919
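
The typical shape of these cleanups (illustrative):

```
d = {"lr": 0.01, "momentum": 0.9}

# Before: six shims for code that had to run on both Python 2 and 3.
#   import six
#   if isinstance(key, six.string_types):
#       for k, v in six.iteritems(d): ...

# After: Python 3 only, so the built-ins suffice.
for key, value in d.items():
    assert isinstance(key, str)
```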

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48077

Reviewed By: ngimel

Differential Revision: D25687042

Pulled By: bugra

fbshipit-source-id: 05f20a6f3c5212f73d0b1505b493b720e6cf74e5
2020-12-22 18:07:08 -08:00
mrshenli
e4eaa6de5f Fix lint (#49629)
Summary:
Fix lint on master

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49629

Reviewed By: rohan-varma

Differential Revision: D25654199

Pulled By: mrshenli

fbshipit-source-id: 2ab5669ad47996c0ca0f9b6611855767d5af0506
2020-12-18 19:26:06 -08:00
Pritam Damania
159de1f1d6 Add benchmark for torch.distributed.pipeline.sync.Pipe (#49577)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49577

Repurposes the benchmark from
https://github.com/facebookresearch/fairscale/blob/master/benchmarks/pipe.py
and pulls a stripped-down version into PyTorch.
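
For context, a minimal sketch of how the module under benchmark is used (Pipe also requires the RPC framework to be initialized first):

```
import torch.nn as nn
from torch.distributed.pipeline.sync import Pipe

# Assumes torch.distributed.rpc.init_rpc(...) has already been called.
fc1 = nn.Linear(1024, 256).cuda(0)
fc2 = nn.Linear(256, 10).cuda(1)
# chunks controls how each mini-batch is split into pipelined micro-batches.
model = Pipe(nn.Sequential(fc1, fc2), chunks=4)
```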

Sample output:
```
Running benchmark with args: Namespace(batch_size=8, checkpoint='never', chunks=4, host='localhost', max_batch=10, num_decoder_layers=10, num_devices=4)
Number of parameters for model: 292833040
| batch     1 | wps 3593.07 | loss 25.98 | ppl 192556591553.37
| batch     2 | wps 4405.16 | loss 19.36 | ppl 256201548.33
| batch     3 | wps 4404.98 | loss 23.56 | ppl 17111244076.37
| batch     4 | wps 4413.25 | loss 27.11 | ppl 594561327825.83
| batch     5 | wps 4408.53 | loss 25.92 | ppl 181277705101.33
| batch     6 | wps 4385.64 | loss 24.92 | ppl 66592883598.50
| batch     7 | wps 4434.11 | loss 24.75 | ppl 56113635884.68
| batch     8 | wps 4441.25 | loss 24.88 | ppl 63666024212.82
| batch     9 | wps 4425.49 | loss 25.35 | ppl 101959669008.98
| batch    10 | wps 4421.05 | loss 25.34 | ppl 101597621863.94
Peak memory usage for GPUs: cuda:0: 2.38GiB, cuda:1: 3.04GiB, cuda:2: 3.04GiB, cuda:3: 3.67GiB,
```
ghstack-source-id: 118939686

Test Plan: sentinel

Reviewed By: rohan-varma

Differential Revision: D25628721

fbshipit-source-id: 41c788eed4f852aef019aec18a84cb25ad254f3a
2020-12-18 18:33:47 -08:00
elfringham
db1b0b06c4 Flake8 fixes (#48453)
Summary:
Silences errors from flake8. Only a couple of code changes address deprecated Python syntax from before Python 2.4; the rest just adds noqa markers.
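
The noqa additions look roughly like this (illustrative):

```
import torch  # noqa: F401  (re-exported for callers; silence "unused import")

f = lambda x: x * 2  # noqa: E731  (keep the assigned lambda as-is)
```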

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48453

Reviewed By: mruberry

Differential Revision: D25181871

Pulled By: ngimel

fbshipit-source-id: f8d7298aae783b1bce2a46827b088fc390970641
2020-11-25 19:09:50 -08:00
Shen Li
76c7652cc5 Add distributed data parallel benchmark tool (#35198)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35198

The need for this tool was motivated by #28883. In the past, we have
done ad-hoc benchmarking, but it's time for something more structured.

It would be nice to add more model architectures so that we can get a
full picture of the performance impact of a code change simply by
running this suite a few times.

Test Plan: Imported from OSS

Differential Revision: D20591296

Pulled By: mrshenli

fbshipit-source-id: ee66ce0ebca02086453b02df0a94fde27ab4be49
2020-04-08 15:07:03 -07:00