Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62662
Replaced the methods set_tensor(.) and get_tensor() in the python exposed API from the C++ logic with buffer() and set_buffer(.) to be a cleaner interface.
Reviewed By: SciPioneer
Differential Revision: D30012869
fbshipit-source-id: bd8efab583dd89c96f9aeb3dd48a12073f0b1482
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61507
Benchmark Python-only DDP vs production C++ based DistributedDataParallel.
- Implemented a pure python DDP: PythonDDP with support of SYNC and ASYNC reduction
- Added compare_ddp to measure the difference in forward and backward step
Kudos on Shen and Yi for the great idea.
Test Plan:
Test on DevGPUS with 2 CUDA devices.
$python compare_ddp.py
Python only DDP has slightly better (-1%) forward performance and slightly slower (2%-20%) backward performance.
This suggested that we need to keep C++ Core since the maximum latency increase can be 20%. See README.md for details.
Imported from OSS
Differential Revision:
D29685364
D29685364
Reviewed By: mrshenli
Pulled By: bowangbj
fbshipit-source-id: 429e4473fac0ec4c70d6db12d946d2636dd6477a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60631
Per #48360, speed up `Transformer.generate_square_subsequent_mask`. New impl is informally ~5x faster, though absolute difference is probably small.
PR includes Python and C++ versions as well as a couple of places where the previous impl had been copied around.
Test Plan: Imported from OSS
Reviewed By: jbschlosser, albanD
Differential Revision: D29356673
Pulled By: bhosmer
fbshipit-source-id: 4c062ba0ead61a445aeef451c78777bf0b3a631e
Summary:
* Open json config file safely using a context manager (using a with block).
* This will make sure that the file closed even if an exception is raised.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58077
Reviewed By: anjali411
Differential Revision: D28711177
Pulled By: H-Huang
fbshipit-source-id: 597ba578311b1f1d6706e487872db4e784c78c3c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57925
1. adds test_scripts.py that will run added scripts and verify that there are no errors
2. adds local ddp_nccl_allreduce experiment script
test with command `pytest test_scripts.py`
Test Plan: Imported from OSS
Reviewed By: agolynski
Differential Revision: D28382452
Pulled By: gcramer23
fbshipit-source-id: 21028a990ebfedf1aad6b007a723c02403e8bea8
Summary:
Context: https://github.com/pytorch/pytorch/pull/53299#discussion_r587882857
These are the only hand-written parts of this diff:
- the addition to `.github/workflows/lint.yml`
- the file endings changed in these four files (to appease FB-internal land-blocking lints):
- `GLOSSARY.md`
- `aten/src/ATen/core/op_registration/README.md`
- `scripts/README.md`
- `torch/csrc/jit/codegen/fuser/README.md`
The rest was generated by running this command (on macOS):
```
git grep -I -l ' $' -- . ':(exclude)**/contrib/**' ':(exclude)third_party' | xargs gsed -i 's/ *$//'
```
I looked over the auto-generated changes and didn't see anything that looked problematic.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53406
Test Plan:
This run (after adding the lint but before removing existing trailing spaces) failed:
- https://github.com/pytorch/pytorch/runs/2043032377
This run (on the tip of this PR) succeeded:
- https://github.com/pytorch/pytorch/runs/2043296348
Reviewed By: walterddr, seemethere
Differential Revision: D26856620
Pulled By: samestep
fbshipit-source-id: 3f0de7f7c2e4b0f1c089eac9b5085a58dd7e0d97
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50860
Since fairscale.nn.Pipe still uses 'balance' and 'devices' parameters,
other frameworks like fairseq still use these parameters. As a result, the
`convert_to_balance` method is a nice utility to use for migrating to PyTorch
Pipe without changing a lot of code in other frameworks.
In addition to this I've renamed the method to be more illustrative of what it
does and also allowed an optional devices parameter.
ghstack-source-id: 120430775
Test Plan:
1) waitforbuildbot
2) Tested with fairseq
Reviewed By: SciPioneer
Differential Revision: D25987273
fbshipit-source-id: dccd42cf1a74b08c876090d3a10a94911cc46dd8
Summary:
Quiet errors from flake8. Only a couple of code changes for deprecated Python syntax from before 2.4. The rest is just adding noqa markers.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48453
Reviewed By: mruberry
Differential Revision: D25181871
Pulled By: ngimel
fbshipit-source-id: f8d7298aae783b1bce2a46827b088fc390970641
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35198
The need for this tool was motivated by #28883. In the past, we have
done ad-hoc benchmarking, but it's time for something more structured.
It would be nice to add more model architectures so that we can get a
full picture of the performance impact of a code change simply by
running this suite a few times.
Test Plan: Imported from OSS
Differential Revision: D20591296
Pulled By: mrshenli
fbshipit-source-id: ee66ce0ebca02086453b02df0a94fde27ab4be49