Commit Graph

131 Commits

Author SHA1 Message Date
Joel Schlosser
b928e08f3d Initial vmap + NT support with unbind fallback (#106786)
PoC demonstrating vmap + NT based on the [design doc](https://docs.google.com/document/d/1dVVk6TOqz93PLTIneU2T3xaxCs9qZ0MaJyCvOAp_bC0). This PR:
* Allows `BatchedTensorImpl`s to contain NTs
* Introduces a `BatchedNestedTensor` dispatch key for NT-specific batching rules
* Provides a batching rule fallback that unbinds the NTs -> performs computation on constituent -> rebinds results into NT

Restrictions:
* Only supports one level of vmap
* Only supports vmapping over dim=0 for NTs
    * For operations with mixed NT / dense inputs, support is also limited to dim=0 for the dense inputs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106786
Approved by: https://github.com/zou3519
2023-09-07 13:53:20 +00:00
Elias Ellison
7bb40be143 Fix fake tensor for private use backends (#103090)
Fix for https://github.com/pytorch/pytorch/issues/101244

We need meta to be higher priority than PrivateUse1 (as it is for cpu and cuda) so that when meta is in tls we hit meta kernel.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103090
Approved by: https://github.com/bdhirsh
2023-06-27 21:17:40 +00:00
Meghan
6ff4548b6e [AMP] Support XLA:TPU (#96370)
With https://github.com/pytorch/xla/pull/5148, https://github.com/pytorch/xla/pull/4740

With these changes
XLA:GPU users should use `torch.cuda.amp.autocast()` for AMP with float16
XLA:TPU users should use `torch.amp.autocast('xla')` for AMP with bfloat16

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96370
Approved by: https://github.com/bdhirsh, https://github.com/malfet
2023-06-23 19:46:42 +00:00
Charlie West-Taylor
5eb7325bc7 Add autocast support for IPU (#103890)
As part of this, a new `AutocastIPU` dispatch key has been added.

There's an existing PR, #85043, to make `Autocast` a proper per-backend functionality key, but it ran into issues with layering with other functionality keys and went stale.

This has been tested in the out-of-tree IPU PyTorch backend.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103890
Approved by: https://github.com/albanD
2023-06-22 15:38:45 +00:00
Brian Hirsh
c3c03e7cb8 Reland of https://github.com/pytorch/pytorch/pull/101818 (#103888)
Original PR broke internal

This reverts commit 5ed618132f.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103888
Approved by: https://github.com/albanD
2023-06-21 21:00:56 +00:00
PyTorch MergeBot
5ed618132f Revert "change pre_autograd to pre_dispatch tracing (#101818)"
This reverts commit b0392de2c3.

Reverted https://github.com/pytorch/pytorch/pull/101818 on behalf of https://github.com/izaitsevfb due to Breaks internal builds see D46629736 TypeError: wrap_key() got an unexpected keyword argument pre_autograd ([comment](https://github.com/pytorch/pytorch/pull/101818#issuecomment-1587837667))
2023-06-12 18:16:37 +00:00
Brian Hirsh
b0392de2c3 change pre_autograd to pre_dispatch tracing (#101818)
We discussed in a composability meeting a few weeks ago that `pre_autograd` should probably be renamed to `pre_dispatch`.

One question in this PR was: should I re-use a dispatch key? Or should I create a new dispatch key (that yet again corresponds to "top of the dispatcher")?

~~For now, I ended up sticking our proxy mode on the mode stack corresponding to `PythonTLSSnapshot`, because it was simple and it works. It looks like one of the functorch dispatch keys has higher priority though, so it's possible that functorch will end up running first. Open to options, but we can consider adding a new dispatch key later if that becomes a problem~~

Update: I added a dedicated dispatch key, `PreDispatch`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101818
Approved by: https://github.com/ezyang, https://github.com/Neilblaze, https://github.com/albanD, https://github.com/zou3519
2023-06-09 17:30:15 +00:00
cyy
3ae42cb7db adjust header inclusions in C10 as sugguested by IWYU (#102467)
This PR aims to reduce unused header inclusions in C10.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102467
Approved by: https://github.com/albanD
2023-05-31 19:19:10 +00:00
Benson Ma
66a2600b6a [T153220354] Fix header inclusions in c10 (#1541) (#101846)
Summary:
This is a re-attempt to land the iwyu header changes, by taking the diff from [PR 100304](https://github.com/pytorch/pytorch/pull/100304), and adding the bare minimal changes to make the diff build corectly in the internal builds.

X-link: https://github.com/facebookresearch/pytorch3d/pull/1541

X-link: https://github.com/fairinternal/pytorch3d/pull/44

- Re-work D45769819 to fix header inclusions in c10

Test Plan:
```
buck2 build --no-remote-cache mode/dev-nosan //caffe2/c10/...

buck2 build --no-remote-cache mode/dev-nosan //deeplearning/fbgemm/fbgemm_gpu/...

buck2 build mode/dev-nosan //vision/fair/pytorch3d/pytorch3d:_C
```

Reviewed By: malfet

Differential Revision: D45920611

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101846
Approved by: https://github.com/malfet, https://github.com/Skylion007
2023-05-20 19:35:14 +00:00
PyTorch MergeBot
4eaaa08623 Revert "Fix header inclusions in c10 by iwyu (#100304)"
This reverts commit 6037ee8cc9.

Reverted https://github.com/pytorch/pytorch/pull/100304 on behalf of https://github.com/jeanschmidt due to Breaking meta internal builds and fbgemm builds ([comment](https://github.com/pytorch/pytorch/pull/100304#issuecomment-1543919257))
2023-05-11 12:37:35 +00:00
cyy
6037ee8cc9 Fix header inclusions in c10 by iwyu (#100304)
This work introduces include-what-you-use  support for c10 by a CMake option defaulting to off. We also remove some unused header inclusions and  fix a trivial inclusion error.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100304
Approved by: https://github.com/ezyang
2023-05-11 05:19:42 +00:00
PyTorch MergeBot
3271413e74 Revert "Fix header inclusions in c10 by iwyu (#100304)"
This reverts commit 39ec5fa722.

Reverted https://github.com/pytorch/pytorch/pull/100304 on behalf of https://github.com/huydhn due to Sorry for reverting your PR, it is almost there but fails on Windows 39ec5fa722, which is in unstable mode after https://github.com/pytorch/pytorch/pull/100548 ([comment](https://github.com/pytorch/pytorch/pull/100304#issuecomment-1542975714))
2023-05-11 00:37:32 +00:00
cyy
39ec5fa722 Fix header inclusions in c10 by iwyu (#100304)
This work introduces include-what-you-use  support for c10 by a CMake option defaulting to off. We also remove some unused header inclusions and  fix a trivial inclusion error.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100304
Approved by: https://github.com/ezyang
2023-05-10 15:42:43 +00:00
Kazuaki Ishizaki
64b8d20a5c Fix typos under c10 directory (#98079)
This PR fixes typos in comments and messages of files under `c10` directory

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98079
Approved by: https://github.com/Skylion007
2023-03-31 18:31:11 +00:00
shibo
6b691b99da add amp support for custom backend (#96188)
Fixes #ISSUE_NUMBER
1、add amp support for custom backend
2、optimize the file `backend_registration.py`, and rename it with `custom_backend_registration.py`. And then we would register other funcs for custom backend.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96188
Approved by: https://github.com/bdhirsh
2023-03-20 20:27:35 +00:00
PyTorch MergeBot
a8f36dd646 Revert "add amp support for custom backend (#96188)"
This reverts commit cf12edee02.

Reverted https://github.com/pytorch/pytorch/pull/96188 on behalf of https://github.com/kit1980 due to Broke some linalg tests : https://github.com/pytorch/pytorch/actions/runs/4420037607/jobs/7750708339
2023-03-15 00:03:19 +00:00
shibo
cf12edee02 add amp support for custom backend (#96188)
Fixes #ISSUE_NUMBER
1、add amp support for custom backend
2、optimize the file `backend_registration.py`, and rename it with `custom_backend_registration.py`. And then we would register other funcs for custom backend.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96188
Approved by: https://github.com/bdhirsh
2023-03-14 20:43:21 +00:00
Brian Hirsh
948cd61afc add fallthrough kernel for AutogradMeta key (#94603)
The other `Autograd[Backend]` keys all have fallthrough kernels registered to them, but `AutogradMeta` was missing the fallthrough kernel.

This is a problem for custom ops that don't have autograd support, if you try to run them with meta tensors. If you have a custom op, and register a CPU and a Meta kernel, then:

(1) if you run the op with cpu tensors, it will dispatch straight to the CPU kernel (as expected)

(2) if you run the op with meta tensors, you will error - because we don't have a fallthrough registered to the AutogradMeta key, we will try to dispatch to the AutogradMeta key and error, since the op author hasn't provided an autograd implementation.

Here's a repro that I confirmed now works:

```
import torch
from torch._dispatch.python import enable_python_dispatcher
from torch._subclasses.fake_tensor import FakeTensorMode

lib = torch.library.Library("test", "DEF")
impl_cpu = torch.library.Library("test", "IMPL", "CPU")
impl_meta = torch.library.Library("test", "IMPL", "Meta")

def foo_impl(x):
    return x + 1

lib.define("foo(Tensor a) -> Tensor")
impl_meta.impl("foo", foo_impl)
impl_cpu.impl("foo", foo_impl)

with enable_python_dispatcher():
    a = torch.ones(2, device='meta')
    print("@@@@@")
    b = torch.ops.test.foo.default(a)
    print(b)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94603
Approved by: https://github.com/ezyang, https://github.com/albanD
2023-02-10 22:44:52 +00:00
cyy
fa65ae8f56 cleanup unused include (#93359)
Using `include-what-you-use` tool to find out and remove some unused includes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93359
Approved by: https://github.com/malfet
2023-02-04 02:15:50 +00:00
Hangchen Yu
5a0fa04a49 Add MTIA DeviceType for Meta training and inference devices (#92232)
Summary: This adds a new MTIA DeviceType which is associated with the MTIA DispatchKey and will be used for the Meta in-house training and inference accelerators.

Test Plan: All CI should pass.

Differential Revision: D42526044

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92232
Approved by: https://github.com/ezyang
2023-01-16 12:20:23 +00:00
Sean Ross-Ross
5f881ac2d1 Adding dispatch alias 'FuncTorchBatchedDecomposition' (#88771)
part of https://github.com/pytorch/functorch/issues/1009

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88771
Approved by: https://github.com/zou3519
2022-12-02 04:38:28 +00:00
Amadeusz Skrzypczak
6be9d9a630 Add AutocastHPU support (#84927)
New dispatch key and necessary functions are added to PyTorch. Backend implementation will be added in the external library.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84927
Approved by: https://github.com/bdhirsh
2022-10-12 19:37:16 +00:00
Michael Voznesensky
8ca1839d32 Python Dispatcher integration with C++ dispatcher (#85050)
#84826 but without ghstack
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85050
Approved by: https://github.com/malfet
2022-09-15 00:43:36 +00:00
PyTorch MergeBot
706b990306 Revert "Python Dispatcher integration with C++ dispatcher (#84826)"
This reverts commit 35f6a69191.

Reverted https://github.com/pytorch/pytorch/pull/84826 on behalf of https://github.com/malfet due to Broke dynamo, see 35f6a69191
2022-09-14 14:07:58 +00:00
Michael Voznesensky
35f6a69191 Python Dispatcher integration with C++ dispatcher (#84826)
Signed-off-by: Edward Z. Yang <ezyangfb.com>

From @ezyang's original PR:

There are a number of situations where we have non-backend kernels (e.g., CompositeImplicitAutograd, batching rules) which we would like to port to Python, but we have no way to integrate these ports with the overall system while using preexisting C++ registrations otherwise. This PR changes that by introducing a Python dispatcher (which can have its own kernels directly in Python), which can be interpose over ordinary C++ dispatch. The ingredients:

We introduce a new PythonDispatcher dispatch key, that has the same tenor as FuncTorchDynamicLayerFrontMode: it works by getting triggered before every other dispatch key in the dispatch key, and shunting to a Python implementation
The Python dispatcher is a per-interpreter global object that is enabled/disabled via the guard EnablePythonDispatcher/DisablePythonDispatcher. We don't make it compositional as I have no idea what a compositional version of this feature would look like. Because it is global, we don't need to memory manage it and so I use a simpler SafePyHandle (newly added) to control access to this pointer from non-Python C++. Like __torch_dispatch__, we use PyInterpreter to get to the Python interpreter to handle the dispatch.
I need to reimplement dispatch table computation logic in Python. To do this, I expose a lot more helper functions for doing computations on alias dispatch keys and similar. I also improve the pybind11 handling for DispatchKey so that you can either accept the pybind11 bound enum or a string; this simplifies our binding code. See https://github.com/pybind/pybind11/issues/483#issuecomment-1237418106 for how this works; the technique is generally useful.

I need to be able to call backend fallbacks. I do this by permitting you to call at a dispatch key which doesn't have a kernel for the operator; if the kernel doesn't exist, we check the backend fallback table instead.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84826
Approved by: https://github.com/ezyang
2022-09-14 06:57:19 +00:00
YifanShenSZ
673b35c847 Better reshape with autograd support (#82754) (#84154)
The original author is @YifanShenSZ  and the original PR is: #82754
# Summary:
Previous reshape [https://github.com/pytorch/pytorch/issues/80981](https://github.com/pytorch/pytorch/pull/80981) is ok for forward, but needs improvement for backward: need to handle "sometimes view sometimes copy" behavior.

This pull request fixes it by:
1. add a new alias dispatch key `CompositeImplicitAutogradNestedTensor`, which ideally would work as nested-tensor version of `CompositeImplicitAutograd`
2. register `reshape_nested` to `reshape` by `CompositeImplicitAutogradNestedTensor`

Side changes:
* add contiguous memory format support to `clone_nested`
* add `view_nested`
* add `reshape_as_nested`

Fix issue [https://github.com/pytorch/pytorch/issues/83041](https://github.com/pytorch/pytorch/issues/83041)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82754

Test Plan:
Imported from GitHub, without a `Test Plan:` line.

**Static Docs Preview: executorch**
|[Full Site](https://our.intern.facebook.com/intern/staticdocs/eph/D39023822/V13/executorch/)|

|**Modified Pages**|

Reviewed By: albanD

Differential Revision: D39023822

Pulled By: drisspg

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84154
Approved by: https://github.com/bdhirsh, https://github.com/albanD
2022-09-01 20:01:39 +00:00
Elias Ellison
642aed8b99 Add Autocast Support for FakeTensors / use fake device dispatch keys (#82449)
From PR:
```
Note: [Fake Tensor Dispatch Keys]
In order to model the behavior of device-specific autocast
and autograd logic, we update the dispatch keys of FakeTensors
to reflect their fake device. This includes the BackendComponent
(DispatchKey::Meta -> DispatchKey::CUDA), and also the BackendComponent
related Autocast and Autograd keys. __torch__dispatch__ sits below
Autocast and Autograd, and is only invoked when we are at the
kernel for the BackendComponent. Then, we add Meta to the
thread-local dispatch include set to hit the meta kernel
instead of the kernel of the BackendComponent for the fake device.
```

Also adds the `conv1/2/3d.padding` operators to the Autocast rule set. Without that fix, the FakeTensor dtype would diverge.

See: https://github.com/pytorch/pytorch/issues/81608

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82449
Approved by: https://github.com/ezyang
2022-08-01 21:40:36 +00:00
Edward Z. Yang
de9b3fb3e5 Minor comment updates on DispatchKey.h (#81923)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81923
Approved by: https://github.com/bdhirsh
2022-07-27 22:38:28 +00:00
Edward Z. Yang
1724e9f21f Refactor functionality and backend keys to reduce duplication (#81752)
Define some macros for stamping these out, and then use them everywhere
applicable.  Parsing should get this treatment too but I leave it to a
follow up.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81752
Approved by: https://github.com/cpuhrsch, https://github.com/bdhirsh
2022-07-21 21:23:54 +00:00
Brian Hirsh
c2d395cf8e functionalization <> LTC integration (take 3) (#80251)
new PR for https://github.com/pytorch/pytorch/pull/75527.

It looks like there's a bug in the windows CI scripts that was causing
flaky failures, that disappear when I create a new PR. example failure:
https://github.com/pytorch/pytorch/runs/6999272635?check_suite_focus=true
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80251
Approved by: https://github.com/wconstab
2022-06-26 23:10:21 +00:00
Brian Hirsh
adf8060600 add a new alias key for functional to view op decompositions
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79615

Approved by: https://github.com/zou3519
2022-06-15 23:18:09 +00:00
Edward Z. Yang
7313a7a987 Make Meta into a backend component
Seems like it should be one.  This will make it possible to register
meta implementations even when there is a CompositeImplicitAutograd
registration already.  It also paves the way for sparse meta, etc.

Signed-off-by: Edward Z. Yang <ezyangfb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78469

Approved by: https://github.com/ngimel
2022-05-31 18:59:16 +00:00
Brian Hirsh
7ff091fc4e move Functionalize dispatch key closer to backends
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77132

Approved by: https://github.com/ezyang, https://github.com/zou3519
2022-05-26 16:15:43 +00:00
Kulin Seth
54c75e1e8f Add "mps" device to PyTorch framework.
Remove the "mlc" device for Mac platforms.

This commit will be followed up with:

* adding MPS runtime components
* PyTorch ops for MPS device

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76291
Approved by: https://github.com/albanD
2022-04-27 19:21:57 +00:00
Can Balioglu
a0bf0f5611 Add new dispatch keys for Fake Tensor and Deferred Module Initialization
Thanks to @bdhirsh's work, we now have room for new dispatch keys in `DispatchKey` enum. This PR adds two new keys for out-of-core [Fake Tensor](https://pytorch.org/torchdistx/latest/fake_tensor.html) and [Deferred Module Initialization](https://pytorch.org/torchdistx/latest/deferred_init.html) features.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76139
Approved by: https://github.com/bdhirsh
2022-04-27 18:48:44 +00:00
Edward Yang
36420b5e8c Rename tools/codegen to torchgen (#76275)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76275

In preparation for addressing
https://github.com/pytorch/pytorch/issues/73212

Diff was generated with:

```
git mv tools/codegen torchgen
git grep -l 'tools.codegen' | xargs sed -i 's/tools.codegen/torchgen/g'
sed -i "s/\${TOOLS_PATH}\/codegen/\${TORCH_ROOT}\/torchgen/g" caffe2/CMakeLists.txt
```

and a manual edits to:

* tools/test/test_gen_backend_stubs.py
* torchgen/build.bzl
* torchgen/gen_backend_stubs.py

aka this diff:

```
 diff --git a/tools/test/test_gen_backend_stubs.py b/tools/test/test_gen_backend_stubs.py
index 3dc26c6d2d..104054575e 100644
 --- a/tools/test/test_gen_backend_stubs.py
+++ b/tools/test/test_gen_backend_stubs.py
@@ -9,7 +9,7 @@ from torchgen.gen_backend_stubs import run
 from torchgen.gen import _GLOBAL_PARSE_NATIVE_YAML_CACHE  # noqa: F401

 path = os.path.dirname(os.path.realpath(__file__))
-gen_backend_stubs_path = os.path.join(path, '../torchgen/gen_backend_stubs.py')
+gen_backend_stubs_path = os.path.join(path, '../../torchgen/gen_backend_stubs.py')

 # gen_backend_stubs.py is an integration point that is called directly by external backends.
 # The tests here are to confirm that badly formed inputs result in reasonable error messages.
 diff --git a/torchgen/build.bzl b/torchgen/build.bzl
index ed04e35a43..d00078a3cf 100644
 --- a/torchgen/build.bzl
+++ b/torchgen/build.bzl
@@ -1,6 +1,6 @@
 def define_targets(rules):
     rules.py_library(
-        name = "codegen",
+        name = "torchgen",
         srcs = rules.glob(["**/*.py"]),
         deps = [
             rules.requirement("PyYAML"),
@@ -11,6 +11,6 @@ def define_targets(rules):

     rules.py_binary(
         name = "gen",
-        srcs = [":codegen"],
+        srcs = [":torchgen"],
         visibility = ["//visibility:public"],
     )
 diff --git a/torchgen/gen_backend_stubs.py b/torchgen/gen_backend_stubs.py
index c1a672a655..beee7a15e0 100644
 --- a/torchgen/gen_backend_stubs.py
+++ b/torchgen/gen_backend_stubs.py
@@ -474,7 +474,7 @@ def run(
 ) -> None:

     # Assumes that this file lives at PYTORCH_ROOT/torchgen/gen_backend_stubs.py
-    pytorch_root = pathlib.Path(__file__).parent.parent.parent.absolute()
+    pytorch_root = pathlib.Path(__file__).parent.parent.absolute()
     template_dir = os.path.join(pytorch_root, "aten/src/ATen/templates")

     def make_file_manager(install_dir: str) -> FileManager:
```

run_all_fbandroid_tests

Test Plan: sandcastle

Reviewed By: albanD, ngimel

Differential Revision: D35770317

fbshipit-source-id: 153ac4a7fef15b1e750812a90bfafdbc8f1ebcdf
(cherry picked from commit c6d485d1d4648fa1c8a4c14c5bf3d8e899b9b4dd)
2022-04-25 01:38:06 +00:00
Guo Yejun
6f991fc5fc add XPU support for autocast
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75250
Approved by: https://github.com/bdhirsh
2022-04-19 21:18:23 +00:00
Scott Wolchok
0a5e788ab2 [PyTorch] Add NestedTensorCPU and NestedTensorCUDA dispatch keys (#75808)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75808

Just as it is often difficult to write a single kernel that can handle both CPU and CUDA, so can it be difficult to do the same for NestedTensor.
ghstack-source-id: 154171542

(Note: this ignores all push blocking failures!)

Test Plan: CI?

Reviewed By: bdhirsh

Differential Revision: D35603836

fbshipit-source-id: fb0ebb19d34531ed96ce176aca325f8e2b5f90e6
(cherry picked from commit 0bcd753f93c04256c1b745f84a74ecccf0dceef5)
2022-04-19 18:12:12 +00:00
Brian Hirsh
cb17973a2b split out functionalization codegen to use view_copy operators
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75302

Approved by: https://github.com/ezyang
2022-04-18 20:05:21 +00:00
Anthony Barbier
ce9e27a0fc Add new keys for Graphcore IPU (DispatchKey / Backend / DeviceType)
We need a key to register our out of tree backend: https://github.com/graphcore/poptorch
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74763
Approved by: https://github.com/bdhirsh
2022-04-07 17:18:45 +00:00
Brian Hirsh
1b7d7d9327 Reland: "free up dispatch key space (in C++)" (#74963)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74963

This is a re-land of D35192346 (9872a06d77) and D35192317 (a9216cde6c), which together are a diff that changes the internal representation of `DispatchKeySet` in pytorch core to free up the number of dispatch keys that we have available. See a more detailed description of the design in the original PR: https://github.com/pytorch/pytorch/pull/69633.

The original PR broke Milan workflows, which use a pytorch mobile build, and manifested as a memory corruption bug inside of `liboacrmerged.so`.

**Background: Existing Mobile Optimization**
Pytorch mobile builds have an existing optimization (here cc23725e89/c10/core/DispatchKey.h (L382) and here cc23725e89/aten/src/ATen/core/dispatch/OperatorEntry.h (L214)), which works as follows:

Every operator in pytorch has a "dispatch table" of function pointers, corresponding to all of the (up to 64) different kernels that we might dispatch to when we run an operator in pytorch (autograd, cpu, cuda, complex number support, etc).

In mobile builds, the size of that table is shrunk from 64 to 8 to save a bunch of space, because mobile doesn't end up using the functionality associated with most dispatch keys.

The dispatcher also has a notion of "fallback kernels", which are kernels that you can register to a particular dispatch key, but should be able to work for "any operator". The array of fallback kernels is defined here: cc23725e89/aten/src/ATen/core/dispatch/Dispatcher.h (L294).

The mobile-optimization currently does **not** extend to this array (it wouldn't be that useful anyway because there is only one array of fallback kernels globally - vs. there is a separate dispatch table of function pointers per operator). So the per-operator tables on mobile are size 8, while the fallback table is size 64.

**The Bug**
This PR actually makes it difficult to enable that optimization separately for the per-operator arrays vs. the fallback array, and incidentally shrunk the size of the fallback array from 64 to 8 for mobile (that happened on this line: https://github.com/pytorch/pytorch/pull/69633/files#diff-f735cd7aa68f15b624100cbc4bb3b5ea76ffc7c9d3bec3b0ccabaa09609e5319R294).

That isn't a problem by itself (since mobile doesn't actually use any of the fallbacks that can no longer be stored). However, pytorch core will still register all of those fallback kernels on startup in mobile builds, even if they aren't used. When we tried to register one of those fallbacks on startup, it would try to dump the kernel somewhere in memory past the bounds of the (now smaller) array inside of the `Dispatcher` object, `backendFallbackKernels_`.

**Why didn't this problem show up in OSS CI? Why didn't it break other internal mobile workflows aside from Milan?**

Ideally, this failure would show up as part of the OSS signal on GitHub, since we already have mobile OSS builds. Given that it was another memory corruption issue that only affected Milan (subset of mobile), I'm not sure what's specific about Milan's builds that caused it only to manifest there. dreiss I wonder if there's another flavor of mobile builds we could run in OSS CI that could potentially help catch this?

**The debugging experience was pretty difficult**

Debugging the Milan-specific failure was made difficult by the following:

(1) lack of CI
- the original Milan failure didn't surface on my original diff, because the Milan job(s) that failed weren't triggered to run on pytorch changes. There's probably a balance to strike here, since those jobs will only be useful if they aren't flaky, and if they can produce reliable failure logs for debugging.

(2) It's difficult to get a repro.
- my work laptop doesn't have the right specs to run the Milan development workflow (not enough disk space)
- There is an existing OnDemand workflow for Milan, but it appears to be relatively new, and after a bunch of help from MarcioPorto, we ran into issues forwarding the log output from Milan tests on the emulator back to the terminal (see the original discussion here: https://fb.workplace.com/groups/OnDemandFRL/permalink/1424937774645433/)

(3) Lack of stack-traces.
- Most Milan failures didn't include actionable stack traces. phding generously helped me debug by running my suggested patches locally, and reporting back if there were any failures. The failing test didn't include a stack trace though (just the line where the crash appeared), so I ended up making some educated guesses about what the issue was based on the area of the crash.
ghstack-source-id: 152688542

Test Plan: Confirmed with phding that the broken Milan workflow from the previous version of this diff is now passing.

Reviewed By: phding, albanD

Differential Revision: D35222806

fbshipit-source-id: 0ad115a0f768bc8ea5d4c203b2990254c7092d30
(cherry picked from commit 002b91966f11fd55ab3fa3801b636fa39a6dd12c)
2022-03-31 21:52:38 +00:00
Brian Hirsh
9872a06d77 Back out "free up dispatch key space (in C++)" (#74859)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74859

Original commit changeset: 6d1dd0fd8144

Original Phabricator Diff: D34227616 (2cbddc0e9b)
ghstack-source-id: 152381077

(Note: this ignores all push blocking failures!)

Test Plan:
Test on Milan with "get weather utterance"
buck build fbsourcefbandroid/mode/opt fbsourcefbandroid/mode/milan_build_rdk  //fbandroid/apps/wearable/system/speechservice:speechservice_target30_xhdpi_armv7_release_debug_keystore -c  pt.has_backtaces=1

Reviewed By: phding

Differential Revision: D35192346

fbshipit-source-id: b962de5d5effaf23f9aa8afd3ef36f8c6383de5b
(cherry picked from commit 913e3027a11457aaa2d97a9d89ebc6133b14213c)
2022-03-29 15:39:17 +00:00
Richard Zou
a75c718d7c [reland] Update tls logic to work better with guarded call (#73925)
This PR relands https://github.com/pytorch/pytorch/pull/73925 which we
reverted due to a large breakage in functorch.

As a part of the reland, this PR adds a change we agreed upon in
https://docs.google.com/document/d/1i7Y9VZp9PxtgVcrQh6nGQXkXkPc1uMep0dM-OMOGJ9o/edit
The change is moving the PythonTLSSnapshot key after
DynamicLayerFrontMode.

Test Plan:
- I tested this with an updated version of functorch and all the tests
pass so I think we are out of the woods.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74577

Approved by: https://github.com/albanD
2022-03-25 19:51:10 +00:00
Brian Hirsh
2cbddc0e9b free up dispatch key space (in C++) (#72827)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72827

Reland of D34034848 (6690256021)
ghstack-source-id: 152161452

Test Plan: Confirm that Milan tests are passing

Reviewed By: ezyang

Differential Revision: D34227616

fbshipit-source-id: 6d1dd0fd8144dfbd9e194cd7564cce017e7db968
(cherry picked from commit e5c1b29fedd5c2a0bad810cedc94aa784136b6aa)
2022-03-25 17:04:51 +00:00
Alban Desmaison
a7cac05ca6 Add new tls snapshot feature (#72832)
Summary:
Reland of https://github.com/pytorch/pytorch/pull/72623 that was reverted for the tls cleanup was removed.

From close inspection on the counting of the number of available keys, I think there is one more since the guard is actually one after the last usable key. With this update assert, the last updated key will still be <=63 which will fit just fine.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72832

Reviewed By: H-Huang

Differential Revision: D34228571

Pulled By: albanD

fbshipit-source-id: ce5e10a841ea87386727346cfc8d9327252574c4
(cherry picked from commit 59d3b86353)
2022-02-15 19:02:05 +00:00
Brian Hirsh
22ccf448e8 Revert D34034848: free up dispatch key space (in C++)
Test Plan: revert-hammer

Differential Revision:
D34034848 (6690256021)

Original commit changeset: 9677ee2c0a1a

Original Phabricator Diff: D34034848 (6690256021)

fbshipit-source-id: fd50943d915ef813bb9f9ab278fb582429eea3b1
(cherry picked from commit 3acefee1cd)
2022-02-14 23:29:00 +00:00
Brian Hirsh
f1a9650e4f Revert D34214953: Add new tls snapshot feature
Test Plan: revert-hammer

Differential Revision:
D34214953 (6199b5231f)

Original commit changeset: 7aa5d5e3540a

Original Phabricator Diff: D34214953 (6199b5231f)

fbshipit-source-id: 5d271e9a5ab021b8202402630dbf917b43c55421
(cherry picked from commit a12c630198)
2022-02-14 23:14:19 +00:00
Alban Desmaison
6199b5231f Add new tls snapshot feature (#72623)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72623

Test Plan: Imported from OSS

Reviewed By: samdow

Differential Revision: D34214953

Pulled By: albanD

fbshipit-source-id: 7aa5d5e3540a45a0ae70c5af3a4495c755908aa9
(cherry picked from commit dc0a1ab54a)
2022-02-14 20:46:54 +00:00
Brian Hirsh
6690256021 free up dispatch key space (in C++) (#72402)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72402

The original PR had an array-out-of-bounds access in `DispatchKeyExtractor.cpp`, that wasn't caught by ASAN and appeared to only manifest in a subset of android internal tests. After fixing the OOB access (and adding more asserts), I confirmed that the android internal test passes.

Reland of D33255193 (20b8653dfa)
ghstack-source-id: 148830728

Test Plan:
Steps to test:

(1) connect to a mobile OD

(2) run `one_world android emulator android-29` in a terminal to start the android emulator

(3) In a separate terminal, run the test: `buck test //fbandroid/instrumentation_tests/com/facebook/pytorch/bi_xray:instrumentation_test -c test.external_runner=tpx -- --regex 'testBIXRayModel.*PyTorchBIXRayInstrumentationTest' --force-remote-execution --run-disabled`

I also ran `buck test fbandroid/mode/dbg //fbandroid/instrumentation_tests/com/facebook/pytorch/bi_xray:instrumentation_test`, which failed before and passed after the PR.

Reviewed By: albanD

Differential Revision: D34034848

fbshipit-source-id: 9677ee2c0a1afd1183896f7055009445712523c5
(cherry picked from commit 9ab9b12d35)
2022-02-14 16:02:29 +00:00
Jacob Szwejbka
791e7df7d9 Back out "free up dispatch key space (in C++)"
Summary: I think this diff stack broke all the related tasks below.

Test Plan:
For our failing tests:

buck test //fbandroid/instrumentation_tests/com/facebook/pytorch/bi_xray:instrumentation_test -c test.external_runner=tpx -- --regex 'testBIXRayModel.*PyTorchBIXRayInstrumentationTest' --force-remote-execution --run-disabled

For the ubn:

Not really sure what to do, trying to build the app and see if I can use an effect?

Reviewed By: shoumikhin

Differential Revision: D34018849

fbshipit-source-id: 3571718cb6621931af931b494e0a70d6e0164e65
(cherry picked from commit 3cc63cb2ea)
2022-02-05 01:25:42 +00:00