pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Isalia20	a1282b1823	[MPS] Add boilerplate sparse code support (#157238 ) This PR makes minimal changes to support sparse tensors on MPS. In follow-up PRs I'll gradually add more operations so we can fix https://github.com/pytorch/pytorch/issues/129842, which is highly requested (I assume because Whisper uses sparse tensors). Pull Request resolved: https://github.com/pytorch/pytorch/pull/157238 Approved by: https://github.com/malfet	2025-06-30 01:53:45 +00:00
Nitin Singh	9458b83729	[HPU] Add HPU as a supported device for NestedTensor (#148659 ) This change enables basic NestedTensor operations on HPU, fixing the runtime error when creating a NestedTensor on HPU. - Extended `NestedTensorImpl` to recognize `hpu` as a valid storage device. - Added `NestedTensorHPU` to `DispatchKey` parsing in `DispatchKey.cpp`. - Updated `torchgen/model.py` to include `NestedTensorHPU` in `dispatch_keys`. - Modified `native_functions.yaml` to enable `NestedTensorHPU` support for various ops. Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/148659 Approved by: https://github.com/jeromean, https://github.com/albanD, https://github.com/sujoysaraswati	2025-04-14 03:42:34 +00:00
Wei-Sheng Chin	bca75fe97a	[MAIA] [Autocast] Enable autocast on MAIA device (#148511 ) Fixes #148510. Pull Request resolved: https://github.com/pytorch/pytorch/pull/148511 Approved by: https://github.com/albanD	2025-03-18 03:46:22 +00:00
Simon Mahns	6939a56e13	[autocast][pytorch] Support autocast for MTIA (#145627 ) Summary: Add autocast support to MTIA Reviewed By: egienvalue Differential Revision: D68572548 Pull Request resolved: https://github.com/pytorch/pytorch/pull/145627 Approved by: https://github.com/egienvalue	2025-01-25 03:24:59 +00:00
fan.mo	64829b356a	[PrivateUse1] Support parseDispatchKey with modified PrivateUse1 (#144325 ) PyTorch now supports many PrivateUse1 backend names such as `AutogradPrivateUse1` and `QuantizedPrivateUse1`, in addition to the original `PrivateUse1` backend. However, users who implement `PrivateUse1` functionality may rename the backend by calling `torch.utils.rename_privateuse1_backend("my_backend")`; in that case, none of the `PrivateUse1` backend strings can be found when other functions related to it are called. For example, when we use `torch.library` to register custom functions for the new backend, we pass "my_backend" as the backend name instead of "PrivateUse1", and the following error is thrown: ``` could not parse dispatch key 'my_backend' ``` This PR therefore changes the function `c10::DispatchKey parseDispatchKey(const std::string& k)` to check whether `PrivateUse1` has been renamed and, if so, adjust `k` to account for the new backend name and look it up again. Pull Request resolved: https://github.com/pytorch/pytorch/pull/144325 Approved by: https://github.com/albanD	2025-01-14 21:21:29 +00:00
fan.mo	43edb94f8a	[Quantization][PrivateUse1] Adding more support QuantizedPrivateuse1 backends (#139860 ) Here are some explanations of this PR. 1. Changes in `aten/src/ATen/core/Tensor.cpp` and `c10/core/DispatchKey.cpp`: support the toString method for the `QuantizedPrivateUse1` backend, so that PyTorch prints the correct backend string for it. 2. Add header `DispatchStub.h` in `aten/src/ATen/native/quantized/IndexKernel.h`: if this header is not included, we can't use `masked_fill_kernel_quantized_stub` even if we include the `IndexKernel.h` header; compilation fails with an error. 3. Add multiple `TORCH_API`s in `aten/src/ATen/native/quantized/AffineQuantizer.h`: these functions are useful for other privateuse1 backends that support quantization functions; if these `TORCH_API`s are missing, an error is thrown at runtime (undefined symbol). Pull Request resolved: https://github.com/pytorch/pytorch/pull/139860 Approved by: https://github.com/bdhirsh	2024-11-18 05:09:59 +00:00
sdp	83b6d91d08	[Intel GPU] Add NestedTensorXPU to parseDispatchKey and codegen (#140461 ) Add `NestedTensorXPU` dispatch key. ``` >>> nt = torch.nested.nested_tensor([]).to("xpu") >>> nt nested_tensor([ ], device='xpu:0') >>> nt.is_xpu True ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/140461 Approved by: https://github.com/guangyey, https://github.com/EikanWang, https://github.com/ezyang	2024-11-14 18:54:41 +00:00
Kulin Seth	144fde4fd2	[MPS] Add support for autocast in MPS (#99272 ) Fixes https://github.com/pytorch/pytorch/issues/88415 Need to run inductor/test_cpu_select_algorithm Pull Request resolved: https://github.com/pytorch/pytorch/pull/99272 Approved by: https://github.com/malfet Co-authored-by: Siddharth Kotapati <skotapati@apple.com> Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com> Co-authored-by: Roy Hvaara <roy@lightyear.no>	2024-09-05 23:23:17 +00:00
PyTorch MergeBot	2764bee942	Revert "[MPS] Add support for autocast in MPS (#99272 )" This reverts commit `6919e8baab`. Reverted https://github.com/pytorch/pytorch/pull/99272 on behalf of https://github.com/clee2000 due to Broke test/inductor/test_cpu_select_algorithm.py::TestSelectAlgorithmCPU::test_quantized_linear_amx_batch_size_3_in_features_128_out_features_64_bias_False_cpu on sm86 jobs [GH job link](https://github.com/pytorch/pytorch/actions/runs/10252979157/job/28367091621) [HUD commit link](`6919e8baab`) Not caught on PR due to bad TD ([comment](https://github.com/pytorch/pytorch/pull/99272#issuecomment-2269808857))	2024-08-05 19:59:04 +00:00
Kulin Seth	6919e8baab	[MPS] Add support for autocast in MPS (#99272 ) Fixes https://github.com/pytorch/pytorch/issues/88415 Co-authored-by: Siddharth Kotapati <skotapati@apple.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/99272 Approved by: https://github.com/malfet	2024-08-05 17:02:30 +00:00
PyTorch MergeBot	07450e9713	Revert "[MPS] Add support for autocast in MPS (#99272 )" This reverts commit `6240cfd5c7`. Reverted https://github.com/pytorch/pytorch/pull/99272 on behalf of https://github.com/jeanschmidt due to introduced breakages in trunk ([comment](https://github.com/pytorch/pytorch/pull/99272#issuecomment-2203033719))	2024-07-02 12:29:51 +00:00
Kulin Seth	6240cfd5c7	[MPS] Add support for autocast in MPS (#99272 ) Fixes https://github.com/pytorch/pytorch/issues/88415 Co-authored-by: Siddharth Kotapati <skotapati@apple.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/99272 Approved by: https://github.com/malfet	2024-07-02 01:49:52 +00:00
Ashwin Hari	5f5778476a	rename ort to maia (#123265 ) Fixes #123264 Pull Request resolved: https://github.com/pytorch/pytorch/pull/123265 Approved by: https://github.com/albanD	2024-04-23 00:33:25 +00:00
Pearu Peterson	70d4d109f2	Make SparseCsr a functionality dispatch key (#120703 ) As in the title. To enable meta and fake tensor support for sparse compressed tensors in compliance with the meta/fake tensor support for sparse COO tensor. Pull Request resolved: https://github.com/pytorch/pytorch/pull/120703 Approved by: https://github.com/ezyang	2024-03-01 13:28:46 +00:00
PyTorch MergeBot	8a32a07856	Revert "Add meta device support to sparse compressed tensors (#120498 )" This reverts commit `5d71ba6885`. Reverted https://github.com/pytorch/pytorch/pull/120498 on behalf of https://github.com/zou3519 due to broke CI ([comment](https://github.com/pytorch/pytorch/pull/120498#issuecomment-1964491999))	2024-02-26 15:59:36 +00:00
Pearu Peterson	5d71ba6885	Add meta device support to sparse compressed tensors (#120498 ) As in the title. Unblocks https://github.com/pytorch/pytorch/pull/117907#discussion_r1499251745 Pull Request resolved: https://github.com/pytorch/pytorch/pull/120498 Approved by: https://github.com/ezyang	2024-02-25 16:50:17 +00:00
Joel Schlosser	b928e08f3d	Initial vmap + NT support with unbind fallback (#106786 ) PoC demonstrating vmap + NT based on the [design doc](https://docs.google.com/document/d/1dVVk6TOqz93PLTIneU2T3xaxCs9qZ0MaJyCvOAp_bC0). This PR: * Allows `BatchedTensorImpl`s to contain NTs * Introduces a `BatchedNestedTensor` dispatch key for NT-specific batching rules * Provides a batching rule fallback that unbinds the NTs -> performs computation on constituent -> rebinds results into NT Restrictions: * Only supports one level of vmap * Only supports vmapping over dim=0 for NTs * For operations with mixed NT / dense inputs, support is also limited to dim=0 for the dense inputs Pull Request resolved: https://github.com/pytorch/pytorch/pull/106786 Approved by: https://github.com/zou3519	2023-09-07 13:53:20 +00:00
Meghan	6ff4548b6e	[AMP] Support XLA:TPU (#96370 ) With https://github.com/pytorch/xla/pull/5148 and https://github.com/pytorch/xla/pull/4740. With these changes, XLA:GPU users should use `torch.cuda.amp.autocast()` for AMP with float16, and XLA:TPU users should use `torch.amp.autocast('xla')` for AMP with bfloat16. Pull Request resolved: https://github.com/pytorch/pytorch/pull/96370 Approved by: https://github.com/bdhirsh, https://github.com/malfet	2023-06-23 19:46:42 +00:00
Charlie West-Taylor	5eb7325bc7	Add autocast support for IPU (#103890 ) As part of this, a new `AutocastIPU` dispatch key has been added. There's an existing PR, #85043, to make `Autocast` a proper per-backend functionality key, but it ran into issues with layering with other functionality keys and went stale. This has been tested in the out-of-tree IPU PyTorch backend. Pull Request resolved: https://github.com/pytorch/pytorch/pull/103890 Approved by: https://github.com/albanD	2023-06-22 15:38:45 +00:00
Brian Hirsh	c3c03e7cb8	Reland of https://github.com/pytorch/pytorch/pull/101818 (#103888 ) Original PR broke internal This reverts commit `5ed618132f`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/103888 Approved by: https://github.com/albanD	2023-06-21 21:00:56 +00:00
PyTorch MergeBot	5ed618132f	Revert "change pre_autograd to pre_dispatch tracing (#101818 )" This reverts commit `b0392de2c3`. Reverted https://github.com/pytorch/pytorch/pull/101818 on behalf of https://github.com/izaitsevfb due to Breaks internal builds see D46629736 TypeError: wrap_key() got an unexpected keyword argument pre_autograd ([comment](https://github.com/pytorch/pytorch/pull/101818#issuecomment-1587837667))	2023-06-12 18:16:37 +00:00
Brian Hirsh	b0392de2c3	change pre_autograd to pre_dispatch tracing (#101818 ) We discussed in a composability meeting a few weeks ago that `pre_autograd` should probably be renamed to `pre_dispatch`. One question in this PR was: should I re-use a dispatch key? Or should I create a new dispatch key (that yet again corresponds to "top of the dispatcher")? ~~For now, I ended up sticking our proxy mode on the mode stack corresponding to `PythonTLSSnapshot`, because it was simple and it works. It looks like one of the functorch dispatch keys has higher priority though, so it's possible that functorch will end up running first. Open to options, but we can consider adding a new dispatch key later if that becomes a problem~~ Update: I added a dedicated dispatch key, `PreDispatch`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/101818 Approved by: https://github.com/ezyang, https://github.com/Neilblaze, https://github.com/albanD, https://github.com/zou3519	2023-06-09 17:30:15 +00:00
shibo19	af50efca24	add nested/sparse/quantized tensor key for privateuse1 (#102696 ) Fixes #ISSUE_NUMBER add nested/sparse/quantized tensor key for privateuse1 Pull Request resolved: https://github.com/pytorch/pytorch/pull/102696 Approved by: https://github.com/bdhirsh	2023-06-02 22:35:52 +00:00
shibo	6b691b99da	add amp support for custom backend (#96188 ) Fixes #ISSUE_NUMBER 1. add amp support for custom backend 2. optimize the file `backend_registration.py` and rename it to `custom_backend_registration.py`. We will then register other functions for the custom backend there. Pull Request resolved: https://github.com/pytorch/pytorch/pull/96188 Approved by: https://github.com/bdhirsh	2023-03-20 20:27:35 +00:00
PyTorch MergeBot	a8f36dd646	Revert "add amp support for custom backend (#96188 )" This reverts commit `cf12edee02`. Reverted https://github.com/pytorch/pytorch/pull/96188 on behalf of https://github.com/kit1980 due to Broke some linalg tests : https://github.com/pytorch/pytorch/actions/runs/4420037607/jobs/7750708339	2023-03-15 00:03:19 +00:00
shibo	cf12edee02	add amp support for custom backend (#96188 ) Fixes #ISSUE_NUMBER 1. add amp support for custom backend 2. optimize the file `backend_registration.py` and rename it to `custom_backend_registration.py`. We will then register other functions for the custom backend there. Pull Request resolved: https://github.com/pytorch/pytorch/pull/96188 Approved by: https://github.com/bdhirsh	2023-03-14 20:43:21 +00:00
Mikayla Gawarecki	895d4781b8	[easy] Add NestedTensorMeta to parseDispatchKey (#94279 ) ran into this when trying to use `torch.library.Library("aten", "IMPL", "NestedTensorMeta")` Pull Request resolved: https://github.com/pytorch/pytorch/pull/94279 Approved by: https://github.com/bdhirsh	2023-02-07 19:46:29 +00:00
Hangchen Yu	5a0fa04a49	Add MTIA DeviceType for Meta training and inference devices (#92232 ) Summary: This adds a new MTIA DeviceType which is associated with the MTIA DispatchKey and will be used for the Meta in-house training and inference accelerators. Test Plan: All CI should pass. Differential Revision: D42526044 Pull Request resolved: https://github.com/pytorch/pytorch/pull/92232 Approved by: https://github.com/ezyang	2023-01-16 12:20:23 +00:00
Sean Ross-Ross	5f881ac2d1	Adding dispatch alias 'FuncTorchBatchedDecomposition' (#88771 ) part of https://github.com/pytorch/functorch/issues/1009 Pull Request resolved: https://github.com/pytorch/pytorch/pull/88771 Approved by: https://github.com/zou3519	2022-12-02 04:38:28 +00:00
Edward Z. Yang	825f4e602b	Add support for symbolic shapes to sparse tensor (#88573 ) Along the way, I undid making sparse/dense dim symint (they're dimensions, so they should be static.) Also symintify set_indices_and_values_unsafe There is a little bit of a nontrivial infra change here: previously, we didn't populate the strides field on sparse tensors. It is now populated with "empty" strides, and this meant that sparse tensors were falsely reporting they were non-overlapping dense/contiguous. I added in a hack to work around this case. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/88573 Approved by: https://github.com/anjali411	2022-11-08 03:13:42 +00:00
Horace He	b3b9786fdd	Unified symbolic shape variables between AOTAutograd and Inductor (#86659 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/86659 Approved by: https://github.com/wconstab	2022-10-14 00:24:43 +00:00
Amadeusz Skrzypczak	6be9d9a630	Add AutocastHPU support (#84927 ) New dispatch key and necessary functions are added to PyTorch. Backend implementation will be added in the external library. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84927 Approved by: https://github.com/bdhirsh	2022-10-12 19:37:16 +00:00
Edward Z. Yang	490727a35f	New calling convention for Python dispatcher (#85133 ) Instead of calling into the Python dispatcher for EVERY dispatcher call, we now have a two step process. First, we getattr(op: OpOverload, dispatch_key) to "load" the handler for the function. This can either be a conventional function (in which case we will call it, in the same way the old Python dispatcher worked), or it can be a DispatchKey, in which case we will directly call that DispatchKey in C++, bypassing marshalling between Python and C++ entirely. OpOverload.__getattr__ is carefully written so that it will cache the A further optimization would be to define __slots__ on OpOverload, and ensuring that the DispatchKey strings are interned. The resulting Python dispatcher is less flexible: after the first lookup, the handler is cached and we won't recompute it. Furthermore, by default, dispatches will not go into Python, and so you won't get stack frames for the Python dispatcher by default. But we get a huge performance improvement: on the following microbenchmark we go from 2.5s to 1.9s. ``` import time import torch from functorch import make_fx def f(x): for i in range(1000): x = x * x return x begin = time.time() res = make_fx(f, tracing_mode="symbolic")(torch.randn(10, 20)) print(time.time()-begin) ``` Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/85133 Approved by: https://github.com/wconstab	2022-09-16 20:38:21 +00:00
Michael Voznesensky	8ca1839d32	Python Dispatcher integration with C++ dispatcher (#85050 ) #84826 but without ghstack Pull Request resolved: https://github.com/pytorch/pytorch/pull/85050 Approved by: https://github.com/malfet	2022-09-15 00:43:36 +00:00
PyTorch MergeBot	706b990306	Revert "Python Dispatcher integration with C++ dispatcher (#84826 )" This reverts commit `35f6a69191`. Reverted https://github.com/pytorch/pytorch/pull/84826 on behalf of https://github.com/malfet due to Broke dynamo, see `35f6a69191`	2022-09-14 14:07:58 +00:00
Michael Voznesensky	35f6a69191	Python Dispatcher integration with C++ dispatcher (#84826 ) Signed-off-by: Edward Z. Yang <ezyang@fb.com> From @ezyang's original PR: There are a number of situations where we have non-backend kernels (e.g., CompositeImplicitAutograd, batching rules) which we would like to port to Python, but we have no way to integrate these ports with the overall system while using preexisting C++ registrations otherwise. This PR changes that by introducing a Python dispatcher (which can have its own kernels directly in Python), which can be interposed over ordinary C++ dispatch. The ingredients: We introduce a new PythonDispatcher dispatch key that has the same tenor as FuncTorchDynamicLayerFrontMode: it works by getting triggered before every other dispatch key in the dispatch key set, and shunting to a Python implementation. The Python dispatcher is a per-interpreter global object that is enabled/disabled via the guard EnablePythonDispatcher/DisablePythonDispatcher. We don't make it compositional as I have no idea what a compositional version of this feature would look like. Because it is global, we don't need to memory manage it and so I use a simpler SafePyHandle (newly added) to control access to this pointer from non-Python C++. Like __torch_dispatch__, we use PyInterpreter to get to the Python interpreter to handle the dispatch. I need to reimplement dispatch table computation logic in Python. To do this, I expose a lot more helper functions for doing computations on alias dispatch keys and similar. I also improve the pybind11 handling for DispatchKey so that you can either accept the pybind11 bound enum or a string; this simplifies our binding code. See https://github.com/pybind/pybind11/issues/483#issuecomment-1237418106 for how this works; the technique is generally useful. I need to be able to call backend fallbacks. I do this by permitting you to call at a dispatch key which doesn't have a kernel for the operator; if the kernel doesn't exist, we check the backend fallback table instead. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/84826 Approved by: https://github.com/ezyang	2022-09-14 06:57:19 +00:00
YifanShenSZ	673b35c847	Better reshape with autograd support (#82754 ) (#84154 ) The original author is @YifanShenSZ and the original PR is: #82754 # Summary: Previous reshape [https://github.com/pytorch/pytorch/issues/80981](https://github.com/pytorch/pytorch/pull/80981) is ok for forward, but needs improvement for backward: need to handle "sometimes view sometimes copy" behavior. This pull request fixes it by: 1. add a new alias dispatch key `CompositeImplicitAutogradNestedTensor`, which ideally would work as nested-tensor version of `CompositeImplicitAutograd` 2. register `reshape_nested` to `reshape` by `CompositeImplicitAutogradNestedTensor` Side changes: * add contiguous memory format support to `clone_nested` * add `view_nested` * add `reshape_as_nested` Fix issue [https://github.com/pytorch/pytorch/issues/83041](https://github.com/pytorch/pytorch/issues/83041) Pull Request resolved: https://github.com/pytorch/pytorch/pull/82754 Test Plan: Imported from GitHub, without a `Test Plan:` line. Static Docs Preview: executorch \|[Full Site](https://our.intern.facebook.com/intern/staticdocs/eph/D39023822/V13/executorch/)\| \|Modified Pages\| Reviewed By: albanD Differential Revision: D39023822 Pulled By: drisspg Pull Request resolved: https://github.com/pytorch/pytorch/pull/84154 Approved by: https://github.com/bdhirsh, https://github.com/albanD	2022-09-01 20:01:39 +00:00
Elias Ellison	642aed8b99	Add Autocast Support for FakeTensors / use fake device dispatch keys (#82449 ) From PR: ``` Note: [Fake Tensor Dispatch Keys] In order to model the behavior of device-specific autocast and autograd logic, we update the dispatch keys of FakeTensors to reflect their fake device. This includes the BackendComponent (DispatchKey::Meta -> DispatchKey::CUDA), and also the BackendComponent related Autocast and Autograd keys. __torch__dispatch__ sits below Autocast and Autograd, and is only invoked when we are at the kernel for the BackendComponent. Then, we add Meta to the thread-local dispatch include set to hit the meta kernel instead of the kernel of the BackendComponent for the fake device. ``` Also adds the `conv1/2/3d.padding` operators to the Autocast rule set. Without that fix, the FakeTensor dtype would diverge. See: https://github.com/pytorch/pytorch/issues/81608 Pull Request resolved: https://github.com/pytorch/pytorch/pull/82449 Approved by: https://github.com/ezyang	2022-08-01 21:40:36 +00:00
Edward Z. Yang	1724e9f21f	Refactor functionality and backend keys to reduce duplication (#81752 ) Define some macros for stamping these out, and then use them everywhere applicable. Parsing should get this treatment too but I leave it to a follow up. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/81752 Approved by: https://github.com/cpuhrsch, https://github.com/bdhirsh	2022-07-21 21:23:54 +00:00
Brian Hirsh	adf8060600	add a new alias key for functional to view op decompositions Pull Request resolved: https://github.com/pytorch/pytorch/pull/79615 Approved by: https://github.com/zou3519	2022-06-15 23:18:09 +00:00
Edward Z. Yang	7313a7a987	Make Meta into a backend component Seems like it should be one. This will make it possible to register meta implementations even when there is a CompositeImplicitAutograd registration already. It also paves the way for sparse meta, etc. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/78469 Approved by: https://github.com/ngimel	2022-05-31 18:59:16 +00:00
Kulin Seth	f348b1b2b5	Add the Runtime components for MPS backend. (#76725 ) The PR adds the runtime components and few basic operations like copy, as_strided for MPS backend. Current list of identified TODOs are: - https://github.com/pytorch/pytorch/issues/77176 - Unify the logic with CUDACachingAllocator and remove redundant code. - https://github.com/pytorch/pytorch/issues/77170 - Look into using C++ smart pointers where possible with ObjC code - Use empty_strided_generic() to implement the `empty_strided_mps` code - https://github.com/pytorch/pytorch/issues/77144 Pull Request resolved: https://github.com/pytorch/pytorch/pull/76725 Approved by: https://github.com/albanD	2022-05-11 17:19:45 +00:00
Kulin Seth	54c75e1e8f	Add "mps" device to PyTorch framework. Remove the "mlc" device for Mac platforms. This commit will be followed up with: * adding MPS runtime components * PyTorch ops for MPS device Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/76291 Approved by: https://github.com/albanD	2022-04-27 19:21:57 +00:00
Can Balioglu	a0bf0f5611	Add new dispatch keys for Fake Tensor and Deferred Module Initialization Thanks to @bdhirsh's work, we now have room for new dispatch keys in `DispatchKey` enum. This PR adds two new keys for out-of-core [Fake Tensor](https://pytorch.org/torchdistx/latest/fake_tensor.html) and [Deferred Module Initialization](https://pytorch.org/torchdistx/latest/deferred_init.html) features. Pull Request resolved: https://github.com/pytorch/pytorch/pull/76139 Approved by: https://github.com/bdhirsh	2022-04-27 18:48:44 +00:00
Guo Yejun	6f991fc5fc	add XPU support for autocast Pull Request resolved: https://github.com/pytorch/pytorch/pull/75250 Approved by: https://github.com/bdhirsh	2022-04-19 21:18:23 +00:00
Scott Wolchok	0a5e788ab2	[PyTorch] Add NestedTensorCPU and NestedTensorCUDA dispatch keys (#75808 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/75808 Just as it is often difficult to write a single kernel that can handle both CPU and CUDA, so can it be difficult to do the same for NestedTensor. ghstack-source-id: 154171542 (Note: this ignores all push blocking failures!) Test Plan: CI? Reviewed By: bdhirsh Differential Revision: D35603836 fbshipit-source-id: fb0ebb19d34531ed96ce176aca325f8e2b5f90e6 (cherry picked from commit 0bcd753f93c04256c1b745f84a74ecccf0dceef5)	2022-04-19 18:12:12 +00:00
Anthony Barbier	ce9e27a0fc	Add new keys for Graphcore IPU (DispatchKey / Backend / DeviceType) We need a key to register our out of tree backend: https://github.com/graphcore/poptorch Pull Request resolved: https://github.com/pytorch/pytorch/pull/74763 Approved by: https://github.com/bdhirsh	2022-04-07 17:18:45 +00:00
Brian Hirsh	1b7d7d9327	Reland: "free up dispatch key space (in C++)" (#74963 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74963 This is a re-land of D35192346 (`9872a06d77`) and D35192317 (`a9216cde6c`), which together are a diff that changes the internal representation of `DispatchKeySet` in pytorch core to free up the number of dispatch keys that we have available. See a more detailed description of the design in the original PR: https://github.com/pytorch/pytorch/pull/69633. The original PR broke Milan workflows, which use a pytorch mobile build, and manifested as a memory corruption bug inside of `liboacrmerged.so`. Background: Existing Mobile Optimization Pytorch mobile builds have an existing optimization (here `cc23725e89/c10/core/DispatchKey.h (L382)` and here `cc23725e89/aten/src/ATen/core/dispatch/OperatorEntry.h (L214)`), which works as follows: Every operator in pytorch has a "dispatch table" of function pointers, corresponding to all of the (up to 64) different kernels that we might dispatch to when we run an operator in pytorch (autograd, cpu, cuda, complex number support, etc). In mobile builds, the size of that table is shrunk from 64 to 8 to save a bunch of space, because mobile doesn't end up using the functionality associated with most dispatch keys. The dispatcher also has a notion of "fallback kernels", which are kernels that you can register to a particular dispatch key, but should be able to work for "any operator". The array of fallback kernels is defined here: `cc23725e89/aten/src/ATen/core/dispatch/Dispatcher.h (L294)`. The mobile-optimization currently does not extend to this array (it wouldn't be that useful anyway because there is only one array of fallback kernels globally - vs. there is a separate dispatch table of function pointers per operator). So the per-operator tables on mobile are size 8, while the fallback table is size 64. 
The Bug This PR actually makes it difficult to enable that optimization separately for the per-operator arrays vs. the fallback array, and incidentally shrunk the size of the fallback array from 64 to 8 for mobile (that happened on this line: https://github.com/pytorch/pytorch/pull/69633/files#diff-f735cd7aa68f15b624100cbc4bb3b5ea76ffc7c9d3bec3b0ccabaa09609e5319R294). That isn't a problem by itself (since mobile doesn't actually use any of the fallbacks that can no longer be stored). However, pytorch core will still register all of those fallback kernels on startup in mobile builds, even if they aren't used. When we tried to register one of those fallbacks on startup, it would try to dump the kernel somewhere in memory past the bounds of the (now smaller) array inside of the `Dispatcher` object, `backendFallbackKernels_`. Why didn't this problem show up in OSS CI? Why didn't it break other internal mobile workflows aside from Milan? Ideally, this failure would show up as part of the OSS signal on GitHub, since we already have mobile OSS builds. Given that it was another memory corruption issue that only affected Milan (subset of mobile), I'm not sure what's specific about Milan's builds that caused it only to manifest there. dreiss I wonder if there's another flavor of mobile builds we could run in OSS CI that could potentially help catch this? The debugging experience was pretty difficult Debugging the Milan-specific failure was made difficult by the following: (1) lack of CI - the original Milan failure didn't surface on my original diff, because the Milan job(s) that failed weren't triggered to run on pytorch changes. There's probably a balance to strike here, since those jobs will only be useful if they aren't flaky, and if they can produce reliable failure logs for debugging. (2) It's difficult to get a repro. 
- my work laptop doesn't have the right specs to run the Milan development workflow (not enough disk space) - There is an existing OnDemand workflow for Milan, but it appears to be relatively new, and after a bunch of help from MarcioPorto, we ran into issues forwarding the log output from Milan tests on the emulator back to the terminal (see the original discussion here: https://fb.workplace.com/groups/OnDemandFRL/permalink/1424937774645433/) (3) Lack of stack-traces. - Most Milan failures didn't include actionable stack traces. phding generously helped me debug by running my suggested patches locally, and reporting back if there were any failures. The failing test didn't include a stack trace though (just the line where the crash appeared), so I ended up making some educated guesses about what the issue was based on the area of the crash. ghstack-source-id: 152688542 Test Plan: Confirmed with phding that the broken Milan workflow from the previous version of this diff is now passing. Reviewed By: phding, albanD Differential Revision: D35222806 fbshipit-source-id: 0ad115a0f768bc8ea5d4c203b2990254c7092d30 (cherry picked from commit 002b91966f11fd55ab3fa3801b636fa39a6dd12c)	2022-03-31 21:52:38 +00:00
Brian Hirsh	9872a06d77	Back out "free up dispatch key space (in C++)" (#74859 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74859 Original commit changeset: 6d1dd0fd8144 Original Phabricator Diff: D34227616 (`2cbddc0e9b`) ghstack-source-id: 152381077 (Note: this ignores all push blocking failures!) Test Plan: Test on Milan with "get weather utterance" buck build fbsourcefbandroid/mode/opt fbsourcefbandroid/mode/milan_build_rdk //fbandroid/apps/wearable/system/speechservice:speechservice_target30_xhdpi_armv7_release_debug_keystore -c pt.has_backtaces=1 Reviewed By: phding Differential Revision: D35192346 fbshipit-source-id: b962de5d5effaf23f9aa8afd3ef36f8c6383de5b (cherry picked from commit 913e3027a11457aaa2d97a9d89ebc6133b14213c)	2022-03-29 15:39:17 +00:00
Brian Hirsh	2cbddc0e9b	free up dispatch key space (in C++) (#72827 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72827 Reland of D34034848 (`6690256021`) ghstack-source-id: 152161452 Test Plan: Confirm that Milan tests are passing Reviewed By: ezyang Differential Revision: D34227616 fbshipit-source-id: 6d1dd0fd8144dfbd9e194cd7564cce017e7db968 (cherry picked from commit e5c1b29fedd5c2a0bad810cedc94aa784136b6aa)	2022-03-25 17:04:51 +00:00
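
Note on commit 64829b356a (#144325): the retry-on-rename lookup that message describes can be illustrated with a small pure-Python model. This is only a sketch under stated assumptions, not the C++ implementation: `KNOWN_KEYS`, `parse_dispatch_key`, and the `renamed_backend` parameter are hypothetical names invented for the example, and the real `parseDispatchKey` may match and rewrite names differently.

```python
from typing import Optional

# Hypothetical, heavily reduced stand-in for the table of key names
# that the real parser recognises.
KNOWN_KEYS = {
    "CPU",
    "CUDA",
    "PrivateUse1",
    "AutogradPrivateUse1",
    "QuantizedPrivateUse1",
}


def parse_dispatch_key(name: str, renamed_backend: Optional[str] = None) -> str:
    """Return the canonical key name for `name`, or raise ValueError.

    `renamed_backend` models the custom name a user gave to PrivateUse1
    (assumption: it is passed in explicitly here rather than read from
    global state).
    """
    # First attempt: look the name up unchanged.
    if name in KNOWN_KEYS:
        return name

    # Second attempt: if PrivateUse1 was renamed, map the custom name
    # back to "PrivateUse1" and look the result up again.
    if renamed_backend and renamed_backend in name:
        candidate = name.replace(renamed_backend, "PrivateUse1", 1)
        if candidate in KNOWN_KEYS:
            return candidate

    raise ValueError(f"could not parse dispatch key '{name}'")
```

With this model, `parse_dispatch_key("my_backend", renamed_backend="my_backend")` resolves to `"PrivateUse1"`, whereas calling it without the rename information raises the "could not parse dispatch key" error quoted in the commit message.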


113 Commits