Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76333
The current PyTorch multi-head attention and transformer
implementations are slow. This should speed them up for inference.
ghstack-source-id: 154737857
(Note: this ignores all push blocking failures!)
Test Plan: CI
Reviewed By: cpuhrsch
Differential Revision: D35239925
fbshipit-source-id: 5a7eb8ff79bc6afb4b7d45075ddb2a24a6e2df28
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75869
ghstack-source-id: 154696012
Test Plan: Verified that nothing uses this and am relying on CI for confirmation.
Reviewed By: dreiss
Differential Revision: D35674694
fbshipit-source-id: c1d602aa4d85642594160a33606093c33817988f
(cherry picked from commit cac15ca941be298a692570491e96f2db6095e3c1)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75868
This is unused in OSS and internally.
ghstack-source-id: 154696014
Test Plan: I manually verified it is unused and am relying on CI to confirm.
Reviewed By: dreiss
Differential Revision: D35674693
fbshipit-source-id: 945ec0590e9d939eab8944ae48bae72cb61e6261
(cherry picked from commit 01a29161b0a3b386078df3cd081358786a6d8f53)
Fixes https://github.com/pytorch/pytorch/issues/75464. Adds a context manager that throws if the ops in the context are not fused.
The API is:
```
with torch.jit.strict_fusion():
...
```
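For reference, a minimal usage sketch (assuming a scripted function run on a fusion-capable backend; the specific ops here are arbitrary):
```
import torch

@torch.jit.script
def fused_pointwise(x):
    # If the pointwise ops below are not fused by the JIT fuser, the
    # context manager raises an error instead of silently falling back.
    with torch.jit.strict_fusion():
        return torch.relu(x + 1.0) * 2.0
```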
A few TODOs:
[+] Figure out how this composes with autodiff - right now it will run on autodiff as well
[+] Support all of the nvfuser operators that are added in the guarding logic
[+] Figure out what to do with control flow that isn't taken (right now it will just error); this is probably a source of the original issue :/
[+] (After those are figured out) add to docs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75777
Approved by: https://github.com/davidberard98
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76180
Provides string variables to let xla customize the generated code sufficiently to facilitate their migration onto LTC.
Some or all of these custom variables are expected to be short-lived for the migration and to eventually revert to using the original content that points to LTC functionality.
Test Plan: Imported from OSS
Reviewed By: huiguoo
Differential Revision: D35861778
Pulled By: wconstab
fbshipit-source-id: ef7aae55334628e2e7ff0c22e5c86ab95439256d
(cherry picked from commit 971f075e0c21804558f46c685508bd23daa42d4f)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75605
Use case: Milan models have multiple backends and need to use static dispatch to save on static initialization time and to hit native functions directly from the unboxed APIs.
This change passes in a List[BackendIndex] and adds the ability to generate code for multiple static dispatch backends, each with 1 or 0 kernels.
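As a hedged sketch of the idea only (not the actual torchgen code; the emitted namespaces and fallback behavior are illustrative assumptions):
```
# Illustrative: emit a static dispatch body that checks each backend in
# order against the dispatch key set and calls its kernel, failing if no
# backend matches.
def gen_static_dispatch(op, backends_with_kernel):
    lines = []
    for backend in backends_with_kernel:
        lines.append(
            f"if (ks.has(c10::DispatchKey::{backend})) "
            f"return at::{backend.lower()}::{op}(args...);"
        )
    lines.append(f'TORCH_CHECK(false, "no static kernel for {op}");')
    return "\n".join(lines)

print(gen_static_dispatch("add", ["CPU", "QuantizedCPU", "CompositeExplicitAutograd"]))
```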
ghstack-source-id: 154525738
(Note: this ignores all push blocking failures!)
Test Plan:
Builds lite_predictor_flatbuffer with multiple backends
```
buck build --config pt.enable_lightweight_dispatch=1 --config pt.static_dispatch_backend=CPU,QuantizedCPU,CompositeExplicitAutograd //xplat/caffe2/fb/lite_predictor:lite_predictor_flatbuffer
```
Reviewed By: larryliu0820
Differential Revision: D35510644
fbshipit-source-id: f985718ad066f8578b006b4759c4a3bd6caac176
(cherry picked from commit a6999729c8cc26c54b8d5684f6585d6c50d8d913)
Summary:
Next stage of breaking up https://github.com/pytorch/pytorch/pull/74710
Move the shape cache implementation to the backend interface. Also, clean up some of the hashing logic in the base node class.
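A hedged sketch of the resulting shape, assuming a hash-keyed cache owned by the backend (names are illustrative, not the actual lazy tensor core API):
```
# Illustrative only: the backend interface owns the shape cache, keyed by
# a node hash, instead of the core owning a single global cache.
class BackendInterface:
    def __init__(self):
        self._shape_cache = {}

    def get_cached_shape(self, node_hash, compute_shape):
        if node_hash not in self._shape_cache:
            self._shape_cache[node_hash] = compute_shape()
        return self._shape_cache[node_hash]

backend = BackendInterface()
shape = backend.get_cached_shape(0x1234, lambda: (2, 3))
```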
CC: wconstab JackCaoG henrytwo
Partially Fixes https://github.com/pytorch/pytorch/issues/74628
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75324
Reviewed By: anjali411
Differential Revision: D35730823
Pulled By: wconstab
fbshipit-source-id: cf6fa326319b9324e5f422a78817b6fb5bf7e9b8
(cherry picked from commit faec5043df56639e2fd23de2d91ae796e4f3df70)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75525
Creates an injection point for ProfilerKineto to attach a global callback. We'll disable the KinetoObserver via `kineto.disable_libkineto_observer=1` and enable this to swap out the implementations.
Test Plan:
1. add temporary logs in the stub + registration method
2. `buck build mode/opt //kineto/libkineto/fb/integration_tests:trace_tester --config 'kineto.disable_libkineto_observer=1' --config 'kineto.enable_libkineto_client=1'`
3. `./buck-out/gen/kineto/libkineto/fb/integration_tests/trace_tester --test_ondemand --libkineto_runner_iterations 1000000`; you should see the log for the registration
4. `dyno gputrace`; you should see the logs for start/stop
Reviewed By: aaronenyeshi, robieta
Differential Revision: D35456304
fbshipit-source-id: c0a23a57181818e5a0ee495410163d90874355a9
(cherry picked from commit 5dfc723937356693fc041f5a011161e83a8d2528)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76049
## Context
We are trying to add an out variant for an existing operator, e.g.:
```
chunk.out(Tensor self, int chunks, int dim=0, *, Tensor(a!)[] out) -> Tensor(a!)[]
```
Notice that the out argument is a mutable list of tensors. The existing guideline defined in [model.py](https://fburl.com/nn299ifx) requires the same argument type to be returned from this operator. However, we don't support a mutable tensor list as a return type, and it doesn't seem useful to add one.
The solution I'm proposing is to relax the constraint that the number of out arguments must match the number of returns, so we can return nothing (`()`):
```
chunk.out(Tensor self, int chunks, int dim=0, *, Tensor(a!)[] out) -> ()
```
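A hypothetical caller-side sketch of the semantics of the relaxed schema; `chunk_out` below is a Python stand-in for the native out variant, not an existing API:
```
import torch

def chunk_out(self, chunks, dim, out):
    # Hypothetical stand-in: the pre-allocated `out` tensors are written
    # in place and nothing is returned, matching `-> ()` above.
    for dst, src in zip(out, torch.chunk(self, chunks, dim)):
        dst.copy_(src)

x = torch.arange(6.)
outs = [torch.empty(2) for _ in range(3)]
chunk_out(x, chunks=3, dim=0, out=outs)
print(outs)
```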
Test Plan: Rely on existing CI
Reviewed By: ezyang, iseeyuan
Differential Revision: D35737310
fbshipit-source-id: 66b5738cc1dcd13d532a6c97fea979bd58f381df
(cherry picked from commit 9aac5493285cd4f49a07053edfa5916c449a930c)
Moves JIT shape function registration to Python. As with JIT decompositions, a script must be run after adding new definitions; it serializes them into a C++ file.
This was requested so that torch-mlir could define functions in Python and upstream their shape functions. cc @silvasean @makslevental
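For context, a hedged sketch of the style of a Python-defined shape function (illustrative; the actual definitions and the serialization script live in the PyTorch repo):
```
from typing import List

def unary_shape(self: List[int]) -> List[int]:
    # An elementwise op produces an output with the same shape as its input.
    out: List[int] = []
    for dim in self:
        out.append(dim)
    return out

print(unary_shape([2, 3, 4]))
```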
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75546
Approved by: https://github.com/davidberard98
Sharding for linux-bionic-py3.7-clang9 previously included slow test times in the calculation for how long a test takes, causing the sharding to be uneven:
| Duration | Count | Name |
| ----------- | ----------- | ----------- |
| 11.2m | 221 | linux-bionic-py3.7-clang9 / test (default, 1, 2, linux.2xlarge) |
| 1.1h | 218 | linux-bionic-py3.7-clang9 / test (default, 2, 2, linux.2xlarge) |
Numbers taken from https://hud.pytorch.org/metrics from 04/10/2022 12:20 PM to 04/17/2022 12:20 PM.
The durations of these jobs on this PR are 39m and 38m.
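A hedged sketch of the sharding idea (not the actual CI script): greedily assign each test to the currently lightest shard, counting only the time the shard will actually spend, i.e. excluding slow-test time when slow tests run in a separate job:
```
def shard_tests(test_times, num_shards):
    # test_times: test name -> expected duration in seconds for this job.
    shards = [{"total": 0.0, "tests": []} for _ in range(num_shards)]
    for name, secs in sorted(test_times.items(), key=lambda kv: kv[1], reverse=True):
        lightest = min(shards, key=lambda s: s["total"])
        lightest["tests"].append(name)
        lightest["total"] += secs
    return shards

print(shard_tests({"test_ops": 3000, "test_nn": 900, "test_jit": 600}, 2))
```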
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75918
Approved by: https://github.com/seemethere, https://github.com/janeyx99
This pull request enables accumulating gradients for the CSR tensor.
Functions that work and are tested:
- tensor.abs()
- tensor.neg()
- tensor.conj_physical()
- torch.addmm
`torch.mm` also works, but tests will be added later.
In addition, this PR makes accessing strides, storage, and contiguity info on a CSR tensor throw an error.
`tensor.to_sparse_csr().to_sparse_csr()` was failing and is now fixed.
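A hedged usage sketch, assuming the autograd support described above (the exact set of supported ops and the layout of the accumulated gradient may differ by version):
```
import torch

csr = torch.tensor([[1., 0., 2.], [0., 3., 0.]]).to_sparse_csr().requires_grad_()
dense = torch.randn(3, 4, requires_grad=True)
bias = torch.randn(2, 4, requires_grad=True)

out = torch.addmm(bias, csr, dense)  # sparse CSR @ dense + dense
out.sum().backward()
print(csr.grad)          # gradient accumulated for the CSR leaf
print(dense.grad.shape)  # gradients flow to the dense operands as well
```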
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75435
Approved by: https://github.com/cpuhrsch
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75808
Just as it is often difficult to write a single kernel that can handle both CPU and CUDA, so can it be difficult to do the same for NestedTensor.
ghstack-source-id: 154171542
(Note: this ignores all push blocking failures!)
Test Plan: CI?
Reviewed By: bdhirsh
Differential Revision: D35603836
fbshipit-source-id: fb0ebb19d34531ed96ce176aca325f8e2b5f90e6
(cherry picked from commit 0bcd753f93c04256c1b745f84a74ecccf0dceef5)
To do https://github.com/pytorch/pytorch/pull/75972 in a lint-free
way I need to reformat all the imports (which are now incorrectly
indented). This is a pain to do manually, so I plan to ask black to
do it for me. But the files are not black-compliant, so first reformat
everything with black.
This commit was generated with:
```
black tools/codegen
```
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76015
Approved by: https://github.com/bdhirsh
We would for some reason report formatting-based lints as showing up at
line 1 column 1. This removes them for now. Maybe eventually we can
recover better line numbers from the formatting diff and post messages
for each diff cluster, but that requires actual changes to the linting
engine.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75928
Approved by: https://github.com/janeyx99