This PR:
- adds a new torch.library.register_fake and deprecates
torch.library.impl_abstract. The motivation is that we have a lot of
confusion around the naming so we are going to align the naming with
the actual subsystem (FakeTensor).
- renames `m.impl_abstract_pystub("fbgemm_gpu.sparse_ops")` to
`m.has_python_registration("fbgemm_gpu.sparse_ops")`. No deprecation
here yet; I need to test how this works with static initialization.
- Renames a bunch of internals to match (e.g. abstractimplpystub ->
pystub)
I'm scared to rename the Python-side internal APIs (e.g.
torch._library.abstract_impl) because of torch.package concerns. I'll do
that in its own isolated PR next just in case it causes problems.
DEPRECATION NOTE: torch.library.impl_abstract was renamed to to
torch.library.register_fake. Please use register_fake. We'll delete
impl_abstract in a future version of PyTorch.
Test Plan:
- existing tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123937
Approved by: https://github.com/albanD
Relying on object lifetimes in Python is a bad idea due to reference
cycles. Previously, when a torch.library.Library object gets destroyed,
it clears all the registrations associated with it, but it's unclear
when it actually gets destroyed due to the existence of refcycles.
This PR:
- adds torch::Library::clear(), which deterministically releases all of
the RAII registration handles of the torch::Library object
- adds a new `torch.library._scoped_library` context manager, which creates
a library and cleans it up at the end of the scope using the previous item.
All tests (unless they already handle library lifetimes) should use
this new API
- Rewrites some flaky tests to use `_scoped_library`.
In the future we'll probably migrate all of our torch.library tests to
use `_scoped_library`, but that's kind of annoying because we have
multiple thousands of LOC
I'm hoping this will deflake those tests; we'll see.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118318
Approved by: https://github.com/albanD
Summary:
We've made the following changes:
- The new way to use the API is `m.impl_abstract_pystub(module, context)`.
Every subsequent m.def of an op inside the TORCH_LIBRARY block gives
the op the `impl_abstract_pystub`.
- Added a mechanism to determine if an operator was defined in Python or C++.
Library.define in Python appends the op to a global set, which is analogous
to what we do for tracking Library.impl.
- If someone does `torch.library.impl_abstract` in Python for an operator, then
we require that it has an `impl_abstract_pystub` specified and we also check
that the module in the `impl_abstract_pystub` is the same as the module where
the call to `torch.library.impl_abstract` exists.
- Unfortunately we can't check the "context" (which is the buck target on
buck-based systems) because buck sits above us.
bypass-github-export-checks
Test Plan: - existing tests
Differential Revision: D51080493
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113182
Approved by: https://github.com/ezyang
Summary:
We've made the following changes:
- The new way to use the API is `m.impl_abstract_pystub(module, context)`.
Every subsequent m.def of an op inside the TORCH_LIBRARY block gives
the op the `impl_abstract_pystub`.
- Added a mechanism to determine if an operator was defined in Python or C++.
Library.define in Python appends the op to a global set, which is analogous
to what we do for tracking Library.impl.
- If someone does `torch.library.impl_abstract` in Python for an operator, then
we require that it has an `impl_abstract_pystub` specified and we also check
that the module in the `impl_abstract_pystub` is the same as the module where
the call to `torch.library.impl_abstract` exists.
- Unfortunately we can't check the "context" (which is the buck target on
buck-based systems) because buck sits above us.
Test Plan: - existing tests
Differential Revision: D50972148
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112851
Approved by: https://github.com/ezyang
We want users to be able to define custom ops in C++ but put the
abstract impl in Python (since it is easier to write them in Python and
the abstract impl better models device semantics and data-dependent
operators).
`m.impl_abstract_pystub(opname, python_module, context)` declares the
abstract_impl of the operator to exist in the given python module.
When the abstract_impl needs to be accessed (either via FakeTensor or
Meta), and it does not exist, the PyTorch Dispatcher will yell
with a descriptive error message.
Some details:
- We construct a new global AbstractImplPyStub mapping in
Dispatcher.cpp. Read/write to this map is protected by the Dispatcher
lock.
- We add a new Meta Tensor fallback kernel. The fallback errors out if there is
no meta kernel, but also offers a nicer error message if we see that there is
a pystub.
- We create a `torch._utils_internal.throw_abstract_impl_not_imported_error`
helper function to throw errors. This way, we can throw different error
messages in OSS PyTorch vs internal PyTorch. To invoke this from C++, we
added a PyInterpreter::throw_abstract_impl_not_imported_error.
Differential Revision: [D49464753](https://our.internmc.facebook.com/intern/diff/D49464753/)
Differential Revision: [D49464753](https://our.internmc.facebook.com/intern/diff/D49464753)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109529
Approved by: https://github.com/ezyang, https://github.com/bdhirsh
# Motivate
Add XPU device type to CppFunction dispatch overload function.
We previously omitted it.
# Solution
Add XPU device type.
# Additional
This list is synchronized with the k-constants in c10/core/DeviceType.h
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96849
Approved by: https://github.com/ezyang
See strategy at PythonOpRegistrationTrampoline.cpp for the
big picture.
Along the way, I made OperatorHandle support == and hashing,
and slightly changed the low level python_dispatch impl API
to disallow empty strings for dispatch key, which had the knock
on effect of requiring us to explicitly make sure we pass in
CompositeImplicitAutograd if we would have passed in "" (I didn't apply
this to the rest of the file because I'm lazy.)
Test strategy is we delete the logic for preventing Python op
registrations in torch from being skipped in a torchdeploy context
and show CI still works.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87162
Approved by: https://github.com/anjali411, https://github.com/bdhirsh
Summary:
This change causes Messenger Dekstop to crash on M1 devices when the user enables background during the call. The change apparently causes the compiler to emit AVX instructions that are not supported by Rosetta.
This is a surgical backout that only backs out the changes in C++ side,
and not Python bindings which I believe are not shipped with Workplace Chat.
Test Plan:
Run the application and make sure that it doesn't crash when the background is enabled
https://pxl.cl/23VSH
Reviewed By: ezyang
Differential Revision: D36358832
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77414
Approved by: https://github.com/bigfootjon
Summary: The new PrivateUse1 DeviceType is associated with the PrivateUse1 DispatchKey, which can be used for non-public devices without introducing a new device type. Note that the stringified name of the PrivateUse1 device is "privateuseone".
Test Plan: All CI should pass.
Differential Revision: D35859437
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77208
Approved by: https://github.com/bdhirsh
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68693
Generation of python bindings for native functions is split over 8
different files. One for each namespace, with the torch namespace
split into 3 shards, and methods in their own file as well. This
change ensures that editing any single (non-method) operator only
causes one of these files to be rebuilt.
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D32596270
Pulled By: albanD
fbshipit-source-id: 0570ec69e7476b8f1bc21138ba18fe8f95ebbe3f
(cherry picked from commit ba0fc71a3a)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63688
CppFunction is used for function registration, so it's not performance-sensitive. Outlining the destructor should reduce code size.
ghstack-source-id: 146648927
Test Plan: Mobile buildsizebot
Reviewed By: dhruvbird
Differential Revision: D30462640
fbshipit-source-id: de410f933bf936c16769a10a52092469007c8487
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67340
Currently Torchbind classes arent selective. This makes is a rough granularity pass that will remove entire classes if they arent selected. If we need finer granularity in the future we can make individual methods within classes Selective though instrumenting that will be significantly more involved I think. On a linux build only __torch__.torch.classes._nnapi.Compilation remains unselective. I cant find where its registered :P (theres a couple Android only ones and presumably some metal only ones as well)
Many of the classes registered in functions returned a reference to the class that was created. I talked with dreiss about it and we decided that this seemingly didnt serve any purpose, and leaving it like that would make the return value difficult (but possible) to create with selectivity. Since it seems useless anyway I just changed them to return an int so that they can still be called from a global scope, but not have any issues with the return type.
ghstack-source-id: 141690776
Test Plan: CI, model unit tests, test models in prod apps
Reviewed By: dhruvbird
Differential Revision: D31092564
fbshipit-source-id: 657f7eb83490292436c15cf134ceca9b72fb9e1a
Summary:
This PR implements the necessary hooks/stubs/enums/etc for complete ONNX Runtime (ORT) Eager Mode integration. The actual extension will live out of tree at https://github.com/pytorch/ort.
We have been [working on this at Microsoft](https://github.com/microsoft/onnxruntime-pytorch/tree/eager-ort/torch_onnxruntime) for the last few months, and are finally ready to contribute the PyTorch core changes upstream (nothing major or exciting, just the usual boilerplate for adding new backends).
The ORT backend will allow us to ferry [almost] all torch ops into granular ONNX kernels that ORT will eagerly execute against any devices it supports (therefore, we only need a single ORT backend from a PyTorch perspective).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58248
Reviewed By: astaff
Differential Revision: D30344992
Pulled By: albanD
fbshipit-source-id: 69082b32121246340d686e16653626114b7714b2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62658
Boxed functors, like their unboxed brethren, support operators which
aren't just a function pointer, but a function pointer with some
associated global state that is allocated at registration time.
The use case I have in mind with this implementation is "dispatcher
API from Python", where the extra state kernel registrations need is
the PyObject callable we will invoke to do the actual invocation.
See next PR in this stack.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: bhosmer
Differential Revision: D30074925
Pulled By: ezyang
fbshipit-source-id: ee040edbbec1e607486d338d1ea78bb5c6b2ece9
Summary:
The function name and return type both are called `class_`, therefore they are ambiguous and this is UB and does not work on NVCC. See the tests for the failure case.
Thanks for the help of Thibaut Lutz from NVIDIA's compiler team.
cc: yueyericardo ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57962
Reviewed By: mruberry
Differential Revision: D28359400
Pulled By: ezyang
fbshipit-source-id: c64ec89203f99f656611aba34f7424eed7bc9e7c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53143
Meta is now an honest to goodness device type, like cpu, so you can use
device='meta' to trigger allocation of meta tensors. This way better
than empty_meta since we now have working API for most factory functions
(they don't necessarily work yet, though, because need to register Meta
versions of those functions.)
Some subtleties:
- I decided to drop the concept of CPU versus CUDA meta tensors; meta
tensors are device agnostic. It's hard to say exactly what the
correct level of abstraction here is, but in this particular case
implementation considerations trump semantic considerations: it
is way easier to have just a meta device, than to have a meta device
AND a cpu device AND a cuda device. This may limit the applicability
of meta tensors for tracing models that do explicit cpu()/cuda()
conversions (unless, perhaps, we make those operations no-ops on meta
tensors).
- I noticed that the DeviceType uppercase strings are kind of weird.
Are they really supposed to be all caps? That's weird.
- I moved the Meta dispatch key to live with the rest of the "device"
dispatch keys.
- I intentionally did NOT add a Backend for Meta. For now, I'm going to
hope meta tensors never exercise any of the Backend conversion code;
even if it does, better to fix the code to just stop converting to and
from Backend.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: samestep
Differential Revision: D26763552
Pulled By: ezyang
fbshipit-source-id: 14633b6ca738e60b921db66a763155d01795480d
Summary:
Apple recently announced ML Compute, a new framework available in macOS Big Sur, which enables users to accelerate the training of neural networks on Mac hardware. This PR is the first on a series of PRs that will enable the integration with ML Compute. Most of the integration code will live on a separate subrepo named `mlc`.
The integration with `mlc` (ML Compute) will be very similar to that of xla. We rely on registering our ops through:
TORCH_LIBRARY_IMPL(aten, PrivateUse1, m) {
m.impl_UNBOXED(<op_schema_name>, &customized_op_kernel)
...
}
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50634
Reviewed By: malfet
Differential Revision: D26614213
Pulled By: smessmer
fbshipit-source-id: 3b492b346c61cc3950ac880ac01a82fbdddbc07b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52150
Renames "whitelist" to "allowlist" to conform to company use standards, prevent critical errors raised by linters which detect the old usage, and to move toward more self-descriptive terminology.
Test Plan: Sandcastle
Reviewed By: suo
Differential Revision: D26405520
fbshipit-source-id: 9c3a41591d4e29c0197de9a8f5858c9c29271e26
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51315
The TODOs said to remove this wrapper, and it seems that it can be removed easily.
ghstack-source-id: 121363465
Test Plan: CI
Reviewed By: ezyang
Differential Revision: D26137147
fbshipit-source-id: f1e5971dca071f37400d77cc823214527e4231bc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50457
The code to infer function schema from a C++ function relies on templates and code expansion. This uses valuable binary size. We can avoid inferring the schema from the C++ function type (arguments, name, return value) in case that the function implementation is being added to the dispatcher via `m.impl`. In this case, it is assumed that we have a schema registered already. Adding an implementation via `m.def` still triggers schema inferrence.
In addition, we don't do schema schema checks on mobile, so the schema is not needed in the first place.
ghstack-source-id: 120915259
Test Plan:
Auto-unit tests succeed.
### Size test: igios
```
D25853094-V1 (https://www.internalfb.com/intern/diff/D25853094/?dest_number=119632217)
igios: Succeeded
Change in Download Size for arm64 + 3x assets variation: -21.8 KiB
Change in Uncompressed Size for arm64 + 3x assets variation: -45.5 KiB
Mbex Comparison: https://our.intern.facebook.com/intern/mbex/bsb:261049318687117@base/bsb:261049318687117@diff/
```
### Size test: fbios
```
D25853094-V1 (https://www.internalfb.com/intern/diff/D25853094/?dest_number=119632217)
fbios: Succeeded
Change in Download Size for arm64 + 3x assets variation: -27.2 KiB
Change in Uncompressed Size for arm64 + 3x assets variation: -80.1 KiB
Mbex Comparison: https://our.intern.facebook.com/intern/mbex/bsb:454289062251865@base/bsb:454289062251865@diff/
```
Reviewed By: smessmer
Differential Revision: D25853094
fbshipit-source-id: e138d9dff7561d424bfb732f3a5898466f018f60
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49220
Since all ops are c10-full, we can remove .impl_UNBOXED now.
This also removes the ability of KernelFunction or CppFunction to store unboxedOnly kernels.
ghstack-source-id: 119450489
Test Plan: waitforsandcastle
Reviewed By: ezyang
Differential Revision: D25490225
fbshipit-source-id: 32de9d591e6a842fe18abc82541580647e9cfdad
Summary:
Since caffe2 and torch have been consolidated, CAFFE2_API should be merged with TORCH_API. Addresses a TODO.
Manually edited some references of the removed `CAFFE2_API`:
* `CONTRIBUTING.md`
* `caffe2/proto/CMakeLists.txt`
* `cmake/ProtoBuf.cmake`
* `c10/macros/Export.h`
* `torch/csrc/WindowsTorchApiMacro.h`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49496
Reviewed By: malfet, samestep
Differential Revision: D25600726
Pulled By: janeyx99
fbshipit-source-id: 7e068d959e397ac183c097d7e9a9afeca5ddd782