pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Brian Hirsh	9ab9b12d35	free up dispatch key space (in C++) (#72402 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72402 The original PR had an array-out-of-bounds access in `DispatchKeyExtractor.cpp`, that wasn't caught by ASAN and appeared to only manifest in a subset of android internal tests. After fixing the OOB access (and adding more asserts), I confirmed that the android internal test passes. Reland of D33255193 (`20b8653dfa`) ghstack-source-id: 148830728 Test Plan: Steps to test: (1) connect to a mobile OD (2) run `one_world android emulator android-29` in a terminal to start the android emulator (3) In a separate terminal, run the test: `buck test //fbandroid/instrumentation_tests/com/facebook/pytorch/bi_xray:instrumentation_test -c test.external_runner=tpx -- --regex 'testBIXRayModel.*PyTorchBIXRayInstrumentationTest' --force-remote-execution --run-disabled` I also ran `buck test fbandroid/mode/dbg //fbandroid/instrumentation_tests/com/facebook/pytorch/bi_xray:instrumentation_test`, which failed before and passed after the PR. Reviewed By: albanD Differential Revision: D34034848 fbshipit-source-id: 9677ee2c0a1afd1183896f7055009445712523c5	2022-02-14 07:59:36 -08:00
Jacob Szwejbka	3cc63cb2ea	Back out "free up dispatch key space (in C++)" Summary: I think this diff stack broke all the related tasks below. Test Plan: For our failing tests: buck test //fbandroid/instrumentation_tests/com/facebook/pytorch/bi_xray:instrumentation_test -c test.external_runner=tpx -- --regex 'testBIXRayModel.*PyTorchBIXRayInstrumentationTest' --force-remote-execution --run-disabled For the ubn: Not really sure what to do, trying to build the app and see if I can use an effect? Reviewed By: shoumikhin Differential Revision: D34018849 fbshipit-source-id: 3571718cb6621931af931b494e0a70d6e0164e65	2022-02-04 17:21:52 -08:00
Brian Hirsh	eac0b13005	free up dispatch key space (in C++) (#69633 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69633 Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D33255193 Pulled By: bdhirsh fbshipit-source-id: 79773e9c15bf4f2f27675121a49ff5ffd1375238	2022-02-04 09:54:56 -08:00
vfdev	4d28cef03a	Added AutocastCPU string (#70013 ) Summary: Description: - Added "AutocastCPU" string repr into `toString` method Before ``` std::cout << c10::DispatchKey::AutocastCPU; > UNKNOWN_TENSOR_TYPE_ID ``` and now: ``` std::cout << c10::DispatchKey::AutocastCPU; > AutocastCPU ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/70013 Reviewed By: ejguan Differential Revision: D33550777 Pulled By: bdhirsh fbshipit-source-id: b31e15e6d52fc1768af085e428328117d588f283	2022-01-12 12:06:46 -08:00
anjali411	3e6164449f	Add efficient zero tensors (#64837 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64837 Test Plan: Imported from OSS Reviewed By: gchanan Differential Revision: D32834987 Pulled By: anjali411 fbshipit-source-id: 20ea08ade0db0044ca633d9c1a117a6a2e65d1fd	2021-12-08 10:37:39 -08:00
Mark Richardson	834bd3134e	Back out "Add efficient zero tensors" (#69327 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69327 Original commit changeset: d44096d88265 Original Phabricator Diff: D32144240 (`668574af4a`) Test Plan: CI original diff failed 175 builds in CI Reviewed By: airboyang, anjali411 Differential Revision: D32809407 fbshipit-source-id: c7c8e69bcee0274992e2d5da901f035332e60071	2021-12-02 19:11:41 -08:00
anjali411	668574af4a	Add efficient zero tensors (#64837 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64837 Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D32144240 Pulled By: anjali411 fbshipit-source-id: d44096d882657c7f9270a16636900e0b73cefa40	2021-12-02 08:47:45 -08:00
Brian Hirsh	0032fa7725	Add a Functionalization pass in core (#64432 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64432 Original PR description + feedback here: https://github.com/pytorch/pytorch/pull/63048 I've addressed all of the feedback in the original PR and made some pretty large changes, listed below. Table of Contents - Starting points - List of the main changes from the original PR - Next Steps - Example codegen output (for a view, mutation, and view+mutation op) Starting Points A good place to start when looking through the PR: * Alban mentioned that this is a useful mental model (thanks Ed for originally making this clear to me). Semantically, the pass currently does THREE things, which are all needed by functorch - all fused together into one big pass. * (a) alias removal, which replaces {view} calls with {view}_copy calls, and manually tracks aliasing information, so that when one tensor is mutated, we re-apply the same mutation to all of the aliases. This is the bulk of the work - once this is done, the next 2 things are trivial to implement. * (b) mutation removal, which is easy to do once we know that there are no aliases. Every mutation `a.add_(b)` becomes `a.replace_(a.add(b))` * (c) reapplying views: all of the `{view}_copy` calls are replaced with `{view}` calls again. This is an optimization that we can make specifically for functorch (and strided backends), that only care about mutation removal and not alias removal * XLA and Vulkan only want (a), or (a) + (b). Later, we'll want to split this out so that you can actually opt into different versions of this logic. * There is currently no {view}_copy replacement, because the pass just <replace views with copies> and <replace copies with views> steps have been combined. Later, we'll want to actually implement {view}_copy variants of each view operator, probably with codegen. * documentation breadcrumb 1, in `FunctionalTensorWrapper.cpp`: https://github.com/pytorch/pytorch/pull/64432/files#diff-a0bac99bf205dba5b94cb64fc2466d3d55d991887572f9cd6a02e27b3a91dd60R59 (you might have to expand the `FunctionalTensorWrapper.cpp` file, which GitHub closes by default because it's large) * documentation breadcrumb 2, in `FunctionalTensorWrapper.h`: https://github.com/pytorch/pytorch/pull/64432/files#diff-c945c71a4ccac65871f24a912e8904f9a5088b24a32e636727ea9c8fe920708aR12 * Reading through the codegen output at the bottom of this description. Main changes from the original PR (1) I use lambdas instead of a giant enum to handle all of the different views. This results in less boilerplate per view op (and more stuff that can be codegen'd). Every `ViewMeta` object now contains a `forward` and `reverse` lambda, that knows how to replay the view and its inverse. This makes the actual code that executes the replaying logic a lot less boilerplate-y (see `Alias::sync_update_operations` and `FunctionalTensorWrapper::sync_`) (2) Every tensor during the functionalization pass is always wrapped in a `FunctionalTensorWrapper`. This is potentially unnecessary for Vulkan/XLA, and will have a mild perf impact, but for now this PR just targets the functorch use case. I previously had a complicated design a (`FunctionalTensorImplBase` class) to avoid needing the wrapper for XLA, but it had some subtleties that are gonna require more thought to fix, so I'm pushing that off for now. (3) `FunctionalTensorWrapper` objects accurately report stride information. It's a little annoying to do this though, because the logic that calculates stride info for each view isn't easily separated from the actual view kernels in core, `at::native::{view}`. I do this by adding logic in each `at::functionalization::{view}` kernel to call the reference implementation `at::native::{view}`. I don't do anything with the output aside from taking it's size/stride/storage_offset to set the actual output tensor's size/stride/storage_offset correctly. There's another annoying part to this: I'm pretty sure that we want to pass in the actual wrapper tensors directly into the native kernels, not their inner unwrapped values. But there are some `at::native::{view}` kernels that call other tensor methods, which re-invokes the dispatcher, calling functionalization/functorch kernels that try do the unwrapping. To do this, right now I have an `AutoDispatchDirectlyToNative` guard that basically ensures that any tensor methods called inside of the at::native::{view} op always redispatch straight to the CPU kernel (which will be another at::native:: kernel). This feels kind of heavy handed, but I'm not sure of a better way to do it. (4) `FunctionalTensorWrapper` objects accurately report aliasing information. There's a new `FunctionalStorageImpl` class (subclass of `StorageImpl`) that allows tensors in the functionalization pass to accurately alias storage. If two tensors `a` and `b` in a functionalized program are views of one another, then `a.storage.is_alias_of(b.storage)` should return true. I added this in a pretty similar way to how meta tensors allocate storage, although I don't pass in an actual allocator (I think this is fine because you should never resize a functional tensor's storage). One thing I'm not sure about - should `FunctionalTensorWrapper` set `storage_access_should_throw_`: (a) always, (b) never, (c) only if its wrapped tensor has it set. Right now I have it not set, mostly because calling the reference view functions (`at::native::{view}`) requires looking at the storage. But that means that if you try to access storage from python in a functionalized program, you'll get silent garbage instead of an error. Related question: are we planning on exposing meta tensor storage to python in the future (even though it contains garbage)? (5) better docs :) View operator coverage (6) The functionalization pass now gets math-composite view ops for free. I didn't add the `Functionalize` dispatch key to the composite set, because I don't want composite ops like `torch.ones` to get decomposed before hitting the functionalization pass. Instead, I added codegen to manually register the `at::native::` kernels of composite view ops. This is a little hairy, because the names of the `at::native::` kernels aren't easily accessible. They're stored in a `Dict[DispatchKey, BackendIndex]`. I made a best-effort attempt to get each view kernel's name, basically by assuming that every view op has either a composite or cpu implementation. There's also a hardcoded list of composite view ops in `gen_inplace_or_view_type.py`, but it looks like it's wrong. This is probably worth rationalizing later, but instead I created a new list of the "complete" set of composite view ops, and preserved the old set by hardcoding the delta between the two sets. (7) I've added codegen for ops that are both views AND mutations, like `transpose_()` (why do we even have these {emoji:1f622}). From some light testing, it looks like they work correctly with one caveat: I had a hard time ensuring that functorch programs that mutate their inputs using ops like `transpose_()` preserve the input mutations after the program finishes running. For (in my corresponding functorch branch) I emit a warning when this happens, and just don't preserve the mutation (8) I added `{view}_inverse` implementations for every view op, in `FunctionalInverses.cpp`. These are needed to take mutations made to views and replay them back onto the base. To reduce boilerplate, the codegen generates function declarations for each `{view}_inverse` function, so you get a nice compiler error when someone eventually adds a new view op. The only view ops currently not supported are (a) as_strided, and (b) the sparse view ops (values()/indices()). I can add support for as_strided, but it needs an `as_strided_inverse()` function. That will look really similar to the `as_strided_backward()` function in FunctionsManual.cpp, but it has some noticeable differences: we basically want an `as_strided_embed` for autograd and `as_strided_scatter` for functionalization. We also will probably need them to be primitives w.r.t to autograd, since the currently implementation for autograd uses view().copy_() calls that XLA won't be able to handle. I'm wondering if anyone has any objections, but otherwise I can make those change (which will require writing backward formulas for `as_strided_embed` and `as_strided_scatter`). I did a bunch of manual testing that all looks pretty good, but it's definitely not fully tested. Ed pointed out that once XLA uses this pass (or at least once there's a POC), we can just run the existing xla view test suite. Hopefully that delay is okay - if it's not, maybe we can think about using OpInfos similar to how functorch uses them for testing. Note: there's some duplication with autograd's view code. Every `{view}_inverse` implementation is really similar to the implementation for that view listed in `derivatives.yaml`. There are some major differences though: * the autograd implementations over those backwards functions (like `permute_backwards()`, in `FunctionsManual.cpp`) internally call other view ops. For functoinalization, we want them to (eventually call `{view}_copy` operators). * For view ops that take a subset of the original storage, like `slice/select/diagonal/as_strided()`, the autograd backward functions fill the "spaces" in the inverse call with zeroes. For functionalizations, we want to fill them with the value of `base` at those positions. It looks like this currently applies to 6 total ops (since we can ignore composites): * select * slice * diagonal * as_stridied * split * split_with_sizes A nice end state would probably be for the autograd + functoinalization codegen to both look at the same yaml (either `derivatives.yaml`, or something else), and automatically generate the right thing. I didn't leave that in scope for this PR though. Current State + Next Steps There are a bunch of followups after this PR eventually lands. Roughly in order: * Use the current pass to register problematic composite ops in functorch. Also, nested `functionalize()` calls aren't supported yet (I mostly just need to remove some debug asserts and test it). * Work on freeing up dispatch key space in the by deduplicating the `{backend}`/`Autograd{backend}`/`Sparse{backend}`/`Quantized{backend}` keys * Once we have more dispatch keys, split up this pass into 3 pieces - it's currently fused, and doesn't do the right thing for vulkan/XLA. Specifically, all of the `{view}` calls in the current pass's view-replay logic should turn into `{view}_copy` calls that vulkan/XLA know how to implement, and there will be separate passes for (a) removing mutations, and (b) turning `{view}_copy` calls back into `{view}` calls. For Vulkan, we eventually want a pass that ONLY removes aliasing and view calls, and doesn't remove mutations. We can also probably make the 2 new passes user dispatch keys to save dispatch key space, if they'll only be used by functorch anyway. * Do more of a dive on perf for the vulkan/xla use cases. There are several areas to improve perf with varying levels of effort required. The simplest one that I'll probably do regardless is to codegen the out-of-place kernels instead of using a boxed fallback. Getting a POC working for xla will also be useful to test the view operator coverage. Example Codegen Output View Op: ``` ::std::vector<at::Tensor> split_Tensor(c10::DispatchKeySet ks, const at::Tensor & self, int64_t split_size, int64_t dim) { auto self_ = at::functionalization::impl::unwrapFunctionalTensor(self); ::std::vector<at::Tensor> out; { at::AutoDispatchBelowFunctionalize guard; auto tmp_output = at::redispatch::split(ks & c10::after_func_keyset, self_, split_size, dim); out = at::functionalization::impl::wrapFunctionalTensor(tmp_output); // I'm fusing the [alias removal], [mutation removal], [add views back] passes together. // Later, we'll want to turn them into separate passes (since e.g. vulkan only cares about alias removal). } at::functionalization::ViewMeta view_meta = at::functionalization::ViewMeta( [split_size, dim](const at::Tensor& base, int64_t mutated_view_idx) -> at::Tensor { return base.split(split_size, dim)[mutated_view_idx]; }, [split_size, dim](const at::Tensor& base, const at::Tensor& mutated_view, int64_t mutated_view_idx) -> at::Tensor { return at::functionalization::impl::split_inverse(base, mutated_view, mutated_view_idx, split_size, dim); } ); at::functionalization::impl::set_view_meta(out, self, view_meta); at::AutoDispatchDirectlyToNative native_guard; ::std::vector<at::Tensor> reference_tensor_output = at::native::split(self, split_size, dim); at::functionalization::impl::set_strides(out, reference_tensor_output); return out; } ``` Mutation Op: ``` at::Tensor & add__Tensor(c10::DispatchKeySet ks, at::Tensor & self, const at::Tensor & other, const at::Scalar & alpha) { at::functionalization::impl::sync(self); at::functionalization::impl::sync(other); auto self_ = at::functionalization::impl::unwrapFunctionalTensor(self); auto other_ = at::functionalization::impl::unwrapFunctionalTensor(other); at::Tensor tmp_output; { at::AutoDispatchBelowFunctionalize guard; // The functionalization pass explicitly doesn't pass out= parameters to the redispatch tmp_output = at::redispatch::add( ks & c10::after_func_keyset, self_, other_, alpha); } self.replace_(tmp_output); at::functionalization::impl::maybe_add_update(self); return self; } ``` View + Mutation Op: ``` at::Tensor & transpose_(c10::DispatchKeySet ks, at::Tensor & self, int64_t dim0, int64_t dim1) { at::functionalization::ViewMeta view_meta = at::functionalization::ViewMeta( [dim0, dim1](const at::Tensor& base, int64_t mutated_view_idx) -> at::Tensor { return base.transpose(dim0, dim1); }, [dim0, dim1](const at::Tensor& base, const at::Tensor& mutated_view, int64_t mutated_view_idx) -> at::Tensor { return at::functionalization::impl::transpose_inverse(base, mutated_view, dim0, dim1); } ); at::functionalization::impl::mutate_view_meta(self, view_meta); // See Note [Propagating strides in the functionalization pass] // Directly update the sizes/strides/storage_offset fields on self using the inplace call. // I need the guard because I don't want the at::native kernel to end up calling more functionalization/functorch kernels. // Its only job is to directly compute the output size/stride/storage_offset metadata. at::AutoDispatchDirectlyToNative native_guard; at::native::transpose_(self, dim0, dim1); return self; } ``` Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D31942093 Pulled By: bdhirsh fbshipit-source-id: b95598dae35dd1842fa8b1d8d1448332f3afaadf	2021-10-28 10:51:17 -07:00
Brian Hirsh	bcc6e3ab5e	add python API to print all operators that have kernels registered to a particular DispatchKey (#63575 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63575 Test Plan: Imported from OSS Reviewed By: ezyang, Chillee Differential Revision: D30426919 Pulled By: bdhirsh fbshipit-source-id: b0e487e48dfe02f7b9d678403f0a2b5bfe146f4e	2021-09-22 09:15:55 -07:00
Aaron Bockover	c78ab28441	Add support for the ONNX Runtime Eager Mode backend (#58248 ) Summary: This PR implements the necessary hooks/stubs/enums/etc for complete ONNX Runtime (ORT) Eager Mode integration. The actual extension will live out of tree at https://github.com/pytorch/ort. We have been [working on this at Microsoft](https://github.com/microsoft/onnxruntime-pytorch/tree/eager-ort/torch_onnxruntime) for the last few months, and are finally ready to contribute the PyTorch core changes upstream (nothing major or exciting, just the usual boilerplate for adding new backends). The ORT backend will allow us to ferry [almost] all torch ops into granular ONNX kernels that ORT will eagerly execute against any devices it supports (therefore, we only need a single ORT backend from a PyTorch perspective). Pull Request resolved: https://github.com/pytorch/pytorch/pull/58248 Reviewed By: astaff Differential Revision: D30344992 Pulled By: albanD fbshipit-source-id: 69082b32121246340d686e16653626114b7714b2	2021-08-20 11:17:13 -07:00
Alex Suhan	b176feec1e	Add device and key for lazy tensors (#61621 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61621 Test Plan: CI Reviewed By: mruberry Differential Revision: D29912934 Pulled By: asuhan fbshipit-source-id: 493c32063a3e756d93cbf1d876563a35eaafb537	2021-07-26 23:00:22 -07:00
Anjali Chourdia	30e48bbeae	Add neg bit (#56058 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56058 User facing changes: 1. Adds a negative bit and corresponding new API (`is_neg()`,`resolve_neg()`) 2. `tensor.conj().imag` now returns a floating point tensor with neg bit set to 1 instead of a tensor with no notion of negative bit. Note that imag is still a view and all the view properties still hold for imag. Non user facing changes: 1. Added a new Negative dispatch key and a backend fallback to handle it 2. Updated copy kernel to handle negative bit 3. Merged conjugate and negative bit fallback kernel 4. fixed https://github.com/pytorch/pytorch/issues/60478 (caused due to https://github.com/pytorch/pytorch/pull/54987) Testing: 1. Added a new OpInfo based test `test_neg_view` (verifies that out-of-place and in-place operations work correctly for all operations when the input is a neg view tensor by checking the result against an actually negated tensor, verifies that autograd returns the same output for both neg view and actually negated tensors as well as it works fine when grad_out is a neg view). 2. Added a new test class containing `test_conj_view`, `test_neg_view`. Test Plan: Imported from OSS Reviewed By: soulitzer Differential Revision: D29636403 fbshipit-source-id: 12214c9dc4806c51850f4a72a109db9527c0ca63	2021-07-13 13:50:42 -07:00
xiaolil1	66158a6e90	Enable AutogradXPU DispatchKey for Intel heterogeneous computation platform. (#61105 ) Summary: Add string wrapper for AutogradXPU to enable this DispatchKey. We are going to use AutogradXPU as custom autograd backend, which needs this DispatchKey. This sting wrapper is used to map AutogradXPU to the corresponding DispatchKey. Pull Request resolved: https://github.com/pytorch/pytorch/pull/61105 Reviewed By: malfet Differential Revision: D29557697 Pulled By: ezyang fbshipit-source-id: f0c8155decc8e2fd90741650a05de5a8b5a70121	2021-07-07 07:47:01 -07:00
Edward Yang	aacc722aec	Dispatch to Python via __torch_dispatch__ (#59760 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59760 See https://github.com/pytorch/pytorch/issues/59049 There are some moving parts to this PR, I'll structure this explanation so the straightforward parts go first, and then the less straightforward parts. The actual dispatch to Python. The core logic of dispatch to Python lives in `concrete_dispatch_fn` in `torch/csrc/autograd/python_variable.cpp`. It takes the input IValue stack, scans all the arguments for Tensor arguments, and defers most of the heavy lifting to `handle_torch_function_no_python_arg_parser` which actually does all of the logic for calling out to torch dispatch (in particular, this function handles multiple dispatch situations for you). Because we have a different function name than regular `__torch_function__` handling, `handle_torch_function_no_python_arg_parser` is generalized to accept a magic method name to look for when testing if Tensors have custom handling or not. Unlike `__torch_function__`, by default there is no `__torch_dispatch__` on Tensor classes. Maintaining the Python dispatch key. In order to get to the dispatch to Python logic, we must tag Tensors with the `__torch_dispatch__` magic method with the newly added Python dispatch key (separated from PythonFuncTorch to allow for a transitional period while they migrate to this mechanism). We expose a new private property `_is_python_dispatch` that assists in debugging if a Tensor is participating in Python dispatch or not. We apply the Python dispatch key the first time a PyObject for a Tensor is constructed (THPVariable_NewWithVar), testing if `__torch_dispatch__` exists with then newly added `check_has_torch_dispatch`. Shallow copy and detach. For the simple examples tested in this PR, most creations of Tensor route through the dispatcher. The exception to this is `shallow_copy_and_detach`, which bypasses the dispatcher and is used when saving tensors for backwards. When a Tensor is Python dispatch, we override the behavior of `shallow_copy_and_detach` to instead directly call into `__torch_dispatch__` to perform a `detach` operation (in the same way it would be invoked if you called `detach` directly). Because this Python call is triggered directly from c10::TensorImpl, it must be indirected through `PyInterpreter::detach`, which is the general mechanism for dynamic dispatching to the Python interpreter associated with a TensorImpl. torchdeploy compatibility. The dispatch to Python logic cannot be directly registered to the dispatcher as it is compiled in the Python library, which will get loaded multiple times per torchdeploy interpreter. Thus, we must employ a two phase process. First, we register a fallback inside a non-Python library (aten/src/ATen/core/PythonFallbackKernel.cpp). Its job is to determine the appropriate PyInterpreter to handle the Python dispatch by going through all of the arguments and finding the first argument that has a PyObject/PyInterpreter. With this PyInterpreter, it makes another dynamic dispatch via "dispatch" which will go to the correct torchdeploy interpreter to handle dispatching to actual Python. Testing. We provide a simple example of a LoggingTensor for testing, which can be used to generate TorchScript-like traces to observe what operations are being called when a Tensor is invoked. Although a LoggingTensor would be better implemented via an is-a relationship rather than a has-a relationship (as is done in the test), we've done it this way to show that arbitrarily complex compositions of tensors inside a tensor work properly. Known limitations. * We haven't adjusted any operator code, so some patterns may not work (as they lose the Python subclass in an unrecoverable way) * `__torch_function__` must be explicitly disabled with `_disabled_torch_function_impl` otherwise things don't work quite correctly (in particular, what is being disabled is default subclass preservation behavior.) * We don't ever populate kwargs, even when an argument is kwarg-only Signed-off-by: Edward Z. Yang <ezyang@fb.com> Differential Revision: D29017912 D29017912 Test Plan: Imported from OSS Reviewed By: bdhirsh Pulled By: ezyang fbshipit-source-id: a67714d9e541d09203a8cfc85345b8967db86238	2021-06-25 11:50:32 -07:00
Nicolas Weber	25e077bce1	[Issue 59296] added VE device (#59620 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/59296 Pull Request resolved: https://github.com/pytorch/pytorch/pull/59620 Reviewed By: zou3519 Differential Revision: D29196830 Pulled By: ezyang fbshipit-source-id: 7bb49f776dc755804a0ba0bc3a7dbdab9c93914e	2021-06-21 16:44:52 -07:00
anjali411	3607478ecd	Conjugate View (#54987 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54987 Based off of ezyang (https://github.com/pytorch/pytorch/pull/44799) and bdhirsh (https://github.com/pytorch/pytorch/pull/43702) 's prototype: Here's a summary of the changes in this PR: This PR adds a new dispatch key called Conjugate. This enables us to make conjugate operation a view and leverage the specialized library functions that fast path with the hermitian operation (conj + transpose). 1. Conjugate operation will now return a view with conj bit (1) for complex tensors and returns self for non-complex tensors as before. This also means `torch.view_as_real` will no longer be a view on conjugated complex tensors and is hence disabled. To fill the gap, we have added `torch.view_as_real_physical` which would return the real tensor agnostic of the conjugate bit on the input complex tensor. The information about conjugation on the old tensor can be obtained by calling `.is_conj()` on the new tensor. 2. NEW API: a) `.conj()` -- now returning a view. b) `.conj_physical()` -- does the physical conjugate operation. If the conj bit for input was set, you'd get `self.clone()`, else you'll get a new tensor with conjugated value in its memory. c) `.conj_physical_()`, and `out=` variant d) `.resolve_conj()` -- materializes the conjugation. returns self if the conj bit is unset, else returns a new tensor with conjugated values and conj bit set to 0. e) `.resolve_conj_()` in-place version of (d) f) `view_as_real_physical` -- as described in (1), it's functionally same as `view_as_real`, just that it doesn't error out on conjugated tensors. g) `view_as_real` -- existing function, but now errors out on conjugated tensors. 3. Conjugate Fallback a) Vast majority of PyTorch functions would currently use this fallback when they are called on a conjugated tensor. b) This fallback is well equipped to handle the following cases: - functional operation e.g., `torch.sin(input)` - Mutable inputs and in-place operations e.g., `tensor.add_(2)` - out-of-place operation e.g., `torch.sin(input, out=out)` - Tensorlist input args - NOTE: Meta tensors don't work with conjugate fallback. 4. Autograd a) `resolve_conj()` is an identity function w.r.t. autograd b) Everything else works as expected. 5. Testing: a) All method_tests run with conjugate view tensors. b) OpInfo tests that run with conjugate views - test_variant_consistency_eager/jit - gradcheck, gradgradcheck - test_conj_views (that only run for `torch.cfloat` dtype) NOTE: functions like `empty_like`, `zero_like`, `randn_like`, `clone` don't propagate the conjugate bit. Follow up work: 1. conjugate view RFC 2. Add neg bit to re-enable view operation on conjugated tensors 3. Update linalg functions to call into specialized functions that fast path with the hermitian operation. Test Plan: Imported from OSS Reviewed By: VitalyFedyunin Differential Revision: D28227315 Pulled By: anjali411 fbshipit-source-id: acab9402b9d6a970c6d512809b627a290c8def5f	2021-06-04 14:12:41 -07:00
Sujoy Saraswati	3c973de543	HABANA Device registration key and Autograd key addition (#57094 ) Summary: Fixes #{issue number} Pull Request resolved: https://github.com/pytorch/pytorch/pull/57094 Reviewed By: mruberry Differential Revision: D28355895 Pulled By: wconstab fbshipit-source-id: 5d8b5762a69f444f4fe7f476891150fa5483d893	2021-05-12 13:07:33 -07:00
Ailing Zhang	0ecdbfebff	s/InplaceOrView/ADInplaceOrView/g (#57372 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57372 Pull Request resolved: https://github.com/pytorch/pytorch/pull/57324 Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D28121821 Pulled By: ailzhang fbshipit-source-id: f568dd2505f6279da9ffb93ce1d22e0f98c606bb	2021-05-01 22:56:18 -07:00
Edward Yang	09feb5f579	Delete grandfathered Caffe2 dispatch keys. (#56939 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56939 These never have kernels registered to them and are effectively useless. What I am not so sure if we allocate tensors to them or not; if we do I cannot use asserts and I need to ensure we just return undefined or something equivalent. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: ailzhang Differential Revision: D28006160 Pulled By: ezyang fbshipit-source-id: f8e2b61b8bd928fb2c0ac0b534bd4af076423f71	2021-04-27 14:58:35 -07:00
Richard Zou	338a600e78	Add dispatch keys for out-of-tree grad+vmap prototype (#56824 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56824 This PR adds 6 dispatch uses to be used with prototyping. I'm not sure what the best way to name these are, please let me know if you think that these should have the same prefix. Test Plan: - wait for tests Reviewed By: driazati Differential Revision: D27999963 Pulled By: zou3519 fbshipit-source-id: 0c3ef4788854f7a93d077cc454b773a6eedbbc22	2021-04-27 09:02:49 -07:00
haozhe.zhu	ab20ba4427	Fix issue with dispatch key: AutogradXPU (#56336 ) Summary: Automatically add dispatch key "AutogradXPU" with "xpu" tensor. And set "fall through" for AutogradXPU Pull Request resolved: https://github.com/pytorch/pytorch/pull/56336 Reviewed By: heitorschueroff Differential Revision: D27872125 Pulled By: ailzhang fbshipit-source-id: c120c62becd577699f9aecb4c356c889bd37ad06	2021-04-20 12:09:59 -07:00
Sameer Deshmukh	5fb1142702	Add CSR (compressed sparse row) layout for sparse tensors (#50937 ) Summary: Implement compressed sparse row format. Derived from the GCS implementation at https://github.com/pytorch/pytorch/pull/44190 Pull Request resolved: https://github.com/pytorch/pytorch/pull/50937 Reviewed By: mrshenli Differential Revision: D27439865 Pulled By: ezyang fbshipit-source-id: 3ba3dcb9679505b980ff6a5f513e913bbae2fb1d	2021-04-12 10:09:12 -07:00
Edward Yang	13b1ca9466	Rename DefaultBackend to CompositeExplicitAutograd (#54470 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54470 ``` git grep -l 'DefaultBackend' \| xargs sed -i 's/DefaultBackend/CompositeExplicitAutograd/g' ``` Plus a quick fixup in native/README.md Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: bdhirsh Differential Revision: D27253240 Pulled By: ezyang fbshipit-source-id: 964df951ea8b52fa72937f3cc66aeaf49a702e6f	2021-03-26 10:53:30 -07:00
Edward Yang	145bc5cd51	Rename Math to CompositeImplicitAutograd (#54466 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54466 I had to very carefully audit all the use sites since there are a lot of other uses of the string Math; I did most of the conversion by grepping for all occurrences of Math and then doing a search replace. I also updated documentation for clarity. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: ngimel Differential Revision: D27253239 Pulled By: ezyang fbshipit-source-id: afb485d07ff39575742a4f0e1e205179b60bc953	2021-03-24 13:49:24 -07:00
Edward Yang	282eefebf3	Delete defunct ComplexCPU/ComplexCUDA dispatch keys (#54013 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54013 Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: mruberry Differential Revision: D27051837 Pulled By: ezyang fbshipit-source-id: c2a20737b6bd4a1317905bafceb2d8cb39f37e76	2021-03-16 15:20:04 -07:00
Ailing Zhang	274b96b878	Move as_view/increment_version to its separate key. (#53342 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/53342 Test Plan: Imported from OSS Reviewed By: nikithamalgifb Differential Revision: D26973913 Pulled By: ailzhang fbshipit-source-id: bc7fc25d1a3a1f20cdfa1d7126fa559a84d194a4	2021-03-15 14:47:12 -07:00
Bel H	30cb6ac53c	Introduce `mlc` device (ML Compute device) to PyTorch's device list (#50634 ) Summary: Apple recently announced ML Compute, a new framework available in macOS Big Sur, which enables users to accelerate the training of neural networks on Mac hardware. This PR is the first on a series of PRs that will enable the integration with ML Compute. Most of the integration code will live on a separate subrepo named `mlc`. The integration with `mlc` (ML Compute) will be very similar to that of xla. We rely on registering our ops through: TORCH_LIBRARY_IMPL(aten, PrivateUse1, m) { m.impl_UNBOXED(<op_schema_name>, &customized_op_kernel) ... } Pull Request resolved: https://github.com/pytorch/pytorch/pull/50634 Reviewed By: malfet Differential Revision: D26614213 Pulled By: smessmer fbshipit-source-id: 3b492b346c61cc3950ac880ac01a82fbdddbc07b	2021-02-24 22:39:11 -08:00
chengjun	4a8ef4525e	Add new backend type for Intel heterogeneous computation platform. (#49786 ) Summary: Add a new device type 'XPU' ('xpu' for lower case) to PyTorch. Changes are needed for code related to device model and kernel dispatch, e.g. DeviceType, Backend and DispatchKey etc. https://github.com/pytorch/pytorch/issues/48246 Pull Request resolved: https://github.com/pytorch/pytorch/pull/49786 Reviewed By: mrshenli Differential Revision: D25893962 Pulled By: ezyang fbshipit-source-id: 7ff0a316ee34cf0ed6fc7ead08ecdeb7df4b0052	2021-01-20 08:15:18 -08:00
Christian Puhrsch	09a52676ad	Add NestedTensor specific dispatch key to PyTorch (#44668 ) Summary: This adds a dedicated dispatch key for the [nestedtensor project](https://github.com/pytorch/nestedtensor). - [ ] Since this isn't a device or a backend, does this need further updates in other places other than DispatchKey.h? Pull Request resolved: https://github.com/pytorch/pytorch/pull/44668 Reviewed By: zhangguanheng66, ailzhang Differential Revision: D23998801 Pulled By: cpuhrsch fbshipit-source-id: 133b5a9a04c4f61c27c0728832da09e4b38a5939	2020-11-02 21:35:54 -08:00
Basil Hosmer	d22455128f	[dispatcher] avoid autograd fixup step on non-backend keys (#46135 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46135 Test Plan: Imported from OSS Reviewed By: ailzhang Differential Revision: D24235974 Pulled By: bhosmer fbshipit-source-id: 21215b31146673caae904bb82395858419641633	2020-10-13 23:33:15 -07:00
Tao Xu	a277c097ac	[iOS][GPU] Add Metal/MPSCNN support on iOS (#46112 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46112 ### Summary This PR adds the support of running torchscript models on iOS GPU via Metal (Inference only). The feature is currently in prototype state, API changes are expected. The tutorial and the documents will be added once it goes to beta. allow-large-files - Users API ``` auto module = torch::jit::load(model); module.eval(); at::Tensor input = at::ones({1,3,224,224}, at::ScalarType::Float).metal(); auto output = module.forward({input}).toTensor().cpu(); ``` - Supported Models - Person Segmentation v106 (FB Internal) - Mobilenetv2 - Supported Operators - aten::conv2d - aten::addmm - aten::add.Tensor - aten::sub.Tensor - aten::mul.Tensor - aten::relu - aten::hardtanh - aten::hardtanh_ - aten::sigmoid - aten::max_pool2d - aten::adaptive_avg_pool2d - aten::reshape - aten::t - aten::view - aten::log_softmax.int - aten::upsample_nearest2d.vec - Supported Devices - Apple A9 and above - iOS 10.2 and above - CMake scripts - `IOS_ARCH=arm64 ./scripts/build_ios.sh -DUSE_METAL=ON` ### Test Plan - Circle CI ghstack-source-id: 114155638 Test Plan: 1. Sandcastle CI 2. Circle CI Reviewed By: dreiss Differential Revision: D23236555 fbshipit-source-id: 98ffc48b837e308bc678c37a9a5fd8ae72d11625	2020-10-13 01:46:56 -07:00
Ailing Zhang	0ddcc0ce35	Add alias dispatch key DefaultBackend. (#45718 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45718 Test Plan: Imported from OSS Reviewed By: bhosmer Differential Revision: D24165892 Pulled By: ailzhang fbshipit-source-id: ed28bf62b7c6320d966fd10b7a44b14efffe2f62	2020-10-09 12:02:44 -07:00
Ailing Zhang	92f8f75c59	Add alias dispatch key Math. (#44354 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44354 Test Plan: Imported from OSS Reviewed By: ezyang Differential Revision: D23591481 Pulled By: ailzhang fbshipit-source-id: 6e93c4ec99a07f3fc920ba2d09dc222e6ced5adf	2020-09-21 11:10:39 -07:00
Ailing Zhang	39bb455e36	Update fallback kernel for Autograd keys. (#44349 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44349 Test Plan: Imported from OSS Reviewed By: ezyang Differential Revision: D23589807 Pulled By: ailzhang fbshipit-source-id: 0e4b0bf3e07bb4e35cbf1bda22f7b03193eb3dc4	2020-09-11 12:04:52 -07:00
Ailing Zhang	224232032c	Move Autograd to an alias dispatch key (#43070 ) Summary: This PR moves `DispatchKey::Autograd` to an alias dispatch key mapping to `AutogradCPU, AutogradCUDA, AutogradXLA, AutogradOther, AutogradPrivate*` keys. A few things are handled in this PR: - Update alias dispatch key mapping and precompute dispatchTable logic - Move `Autograd` key from `always_included` set to TensorImpl constructor. - Update `dummyTensor` constructor to take `requires_grad` as optional argument so that it's closer to the real application in op_registration_test. - Use `BackendSelect` key for both backend select before and after autograd layer. (1 liner in backend_select codegen) A few planned followups ordered by priority: - [cleanup] Update `test_dispatch.py` to include testing `Autograd`. - [cleanup] Add Math alias key and move catchAll to Math. (to remove 2.2 in `computeDispatchTableEntryWithDebug`) - [new feature] Add support for Math in native_functions.yaml - [cleanup] Add iterator like functionality to DispatchKeySet - [cleanup/large] Only add Autograd backend keys when tensor requires grad. (cc: ljk53 ?) Pull Request resolved: https://github.com/pytorch/pytorch/pull/43070 Reviewed By: ezyang Differential Revision: D23281535 Pulled By: ailzhang fbshipit-source-id: 9ad00b17142e9b83304f63cf599f785500f28f71	2020-09-01 09:05:29 -07:00
Ailing Zhang	7cb8d68ae1	Rename XLAPreAutograd to AutogradXLA. (#43047 ) Summary: Fixes #{issue number} Pull Request resolved: https://github.com/pytorch/pytorch/pull/43047 Reviewed By: ezyang Differential Revision: D23134326 Pulled By: ailzhang fbshipit-source-id: 5fcbc23755daa8a28f9b03af6aeb3ea0603b5c9a	2020-08-17 10:47:43 -07:00
Richard Zou	e8f4b04d9a	vmap: temporarily disable support for random functions (#42617 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42617 While we figure out the random plan, I want to initially disable support for random operations. This is because there is an ambiguity in what randomness means. For example, ``` tensor = torch.zeros(B0, 1) vmap(lambda t: t.normal_())(tensor) ``` in the above example, should tensor[0] and tensor[1] be equal (i.e., use the same random seed), or should they be different? The mechanism for disabling random support is as follows: - We add a new dispatch key called VmapMode - Whenever we're inside vmap, we enable VmapMode for all tensors. This is done via at::VmapMode::increment_nesting and at::VmapMode::decrement_nesting. - DispatchKey::VmapMode's fallback kernel is the fallthrough kernel. - We register kernels that raise errors for all random functions on DispatchKey::VmapMode. This way, whenever someone calls a random function on any tensor (not just BatchedTensors) inside of a vmap block, an error gets thrown. Test Plan: - pytest test/test_vmap.py -v -k "Operators" Reviewed By: ezyang Differential Revision: D22954840 Pulled By: zou3519 fbshipit-source-id: cb8d71062d4087e10cbf408f74b1a9dff81a226d	2020-08-11 07:19:51 -07:00
Basil Hosmer	c889de7e25	update DispatchKey::toString() (#42619 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42619 Added missing entries to `DispatchKey::toString()` and reordered to match declaration order in `DispatchKey.h` Test Plan: Imported from OSS Reviewed By: ezyang Differential Revision: D22963407 Pulled By: bhosmer fbshipit-source-id: 34a012135599f497c308ba90ea6e8117e85c74ac	2020-08-07 22:39:23 -07:00
Ilia Cherniavskii	a53fdaa23f	Remove ProfiledType (#42570 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42570 ProfiledType doesn't do anything and is not used atm, removing Test Plan: CI Reviewed By: ezyang Differential Revision: D22938664 Pulled By: ilia-cher fbshipit-source-id: 037c512938028f44258b702bbcde3f8c144f4aa0	2020-08-06 01:52:08 -07:00
Edward Yang	e4766fb4d9	Meta tensors, but without code deduplication (#38490 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/38490 A meta tensor is a tensor that is a lot like a normal tensor, except it doesn't actually have any data associated with it. You can use them to carry out shape/dtype computations without actually having to run the actual code; for example, this could be used to do shape inference in a JIT analysis pass. Check out the description in DispatchKey.h for more information. Meta tensors are part of a larger project to rationalize how we write kernels so that we don't have to duplicate shape logic in CPU kernel, CUDA kernel and meta kernel (this PR makes the duplication problem worse!) However, that infrastructure can be built on top of this proof of concept, which just shows how you can start writing meta kernels today even without this infrastructure. There are a lot of things that don't work: - I special cased printing for dense tensors only; if you try to allocate a meta sparse / quantized tensor things aren't going to work. - The printing formula implies that torch.tensor() can take an ellipsis, but I didn't add this. - I wrote an example formula for binary operators, but it isn't even right! (It doesn't do type promotion of memory layout correctly). The most future proof way to do it right is to factor out the relevant computation out of TensorIterator, as it is quite involved. - Nothing besides torch.add works right now - Meta functions are ALWAYS included in mobile builds (selective build doesn't work on them). This isn't a big deal for now but will become more pressing as more meta functions are added. One reason I'm putting up this PR now is to check with Yinghai Lu if we can unblock shape inference for accelerators, while we are still working on a long term plan for how to unify all shape computation across our kernels. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Differential Revision: D21935609 Pulled By: ezyang fbshipit-source-id: f7d8636eeb8516b6bc296db99a16e56029972eee	2020-06-22 09:18:33 -07:00
Jiakai Liu	72b0447f8d	[pytorch] move tracing logic to a separate dispatch backend (#38467 ) Summary: This PR moves tracing logic out of the generated VariableType kernels, to associate it with a new dedicated dispatch key Tracer. It also toggles the dispatch key set at various places to keep the semantics unchanged - see the inline [Tracing Mode Switches] note. Sample generated code: ``` Tensor & __ilshift___Tensor(Tensor & self, const Tensor & other) { #if !defined(PYTORCH_DISABLE_TRACING) torch::jit::Node* node = nullptr; std::shared_ptr<jit::tracer::TracingState> tracer_state; if (jit::tracer::isTracing()) { tracer_state = jit::tracer::getTracingState(); at::Symbol op_name; op_name = jit::Symbol::fromQualString("aten::__ilshift__"); node = tracer_state->graph->create(op_name, /num_outputs=/0); jit::tracer::recordSourceLocation(node); jit::tracer::addInputs(node, "self", self); jit::tracer::addInputs(node, "other", other); tracer_state->graph->insertNode(node); jit::tracer::setTracingState(nullptr); } #endif static auto op = c10::Dispatcher::singleton().findSchemaOrThrow("aten::__ilshift__", "Tensor"); c10::Dispatcher::singleton().redispatch<Tensor &, Tensor &, const Tensor &>(op, c10::DispatchKey::Tracer, self, other); #if !defined(PYTORCH_DISABLE_TRACING) if (tracer_state) { jit::tracer::setTracingState(std::move(tracer_state)); jit::tracer::addOutput(node, self); } #endif return self; } ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/38467 ghstack-source-id: 105215150 Test Plan: CI Differential Revision: D21570684 fbshipit-source-id: 1a96761830307f9a934f38bfb9fe8b5b1763e0e0	2020-06-04 01:51:30 -07:00
Dylan Bespalko	c767d65caf	Added FPGA DispatchKey, DeviceType, Backend (#38938 ) Summary: ezyang, I have added the changes to DispatchKey, DeviceType, Backend to support the out-of-tree FPGA. cc. tataetae Pull Request resolved: https://github.com/pytorch/pytorch/pull/38938 Differential Revision: D21748955 Pulled By: ezyang fbshipit-source-id: fe76d9730818205961430d2a0e00727b5c547b32	2020-06-03 07:28:14 -07:00
Edward Yang	2b6a48e962	Remove supports_named_tensor from codegen entirely. (#38739 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/38739 Instead of codegenning the named tensor support checks into CPUType/CUDAType, we instead add a new dispatch key that is put into tensor whenever it has names. By default, the fallback implementation says that named tensors are not supported, but if they are supported, we register a fallthrough which lets us through to the true backend implementation. There are a bunch of small pieces which are necessary to make this happen: - NameMode now also excludes DispatchKey::Named from the dispatch set - To avoid bad error messages, we add a teensy special case to the dispatcher for named_not_supported_kernel: if we see that the boxed kernel we need to invoke from unboxed is this kernel, but we don't support boxing, but it's a kernel which is known to not need boxing, we just pass in nullptr for the stack. The special case here is very nice: it doesn't affect the fast path and only gets exercised when things are not supported. - I need to add support for per operator fallthrough registration. This is done similarly to how we support fallthrough fallback, by just keeping track if the registered kernel for an operator is a fallthrough. It is possible we could go even further down this path, and move the named tensor logic itself into this key. I leave this up to future work. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Differential Revision: D21662643 Pulled By: ezyang fbshipit-source-id: 5bc6ae14a1f600189bd8bf865f74dd1700d932f7	2020-06-01 13:09:08 -07:00
Richard Zou	d26f7f09b5	Fixup: rename BatchedTensorKey to Batched (#38798 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/38798 This makes it more in-line with the other keys in the file (DispatchKey.h). Test Plan: Imported from OSS Differential Revision: D21691789 Pulled By: zou3519 fbshipit-source-id: 8d8b902360c0238f67bd0e58f9d969cec4b63320	2020-05-28 13:47:09 -07:00
Ivan Kobzarev	b460465a18	[Mobile GPU][Integration] Vulkan backend integration (#36491 ) Summary: This PR contains the initial version of Vulkan (GPU) Backend integration. The primary target environment is Android, but the desktop build is also supported. ## CMake Introducing three cmake options: USE_VULKAN: The main switch, if it is off, all other options do not affect. USE_VULKAN_WRAPPER: ON - Vulkan will be used loading it at runtime as "libvulkan.so" using libdl, every function call is wrapped in vulkan_wrapper.h. OFF - linking with libvulkan.so directly USE_VULKAN_SHADERC_RUNTIME: ON - Shader compilation library will be linked, and shaders will be compiled runtime. OFF - Shaders will be precompiled and shader compilation library is not included. ## Codegen if `USE_VULKAN_SHADERC_RUNTIME` is ON: Shaders precompilation () starts in cmake/VulkanCodegen.cmake, which calls `aten/src/ATen/native/vulkan/gen_glsl.py` or `aten/src/ATen/native/vulkan/gen_spv.py` to include shaders source or SPIR-V bytecode inside binary as uint32_t array in spv.h,spv.cpp. if `USE_VULKAN_SHADERC_RUNTIME` is OFF: The source of shaders is included as `glsl.h`,`glsl.cpp`. All codegen results happen in the build directory. ## Build dependencies cmake/Dependencies.cmake If the target platform is Android - vulkan library, headers, Vulkan wrapper will be used from ANDROID_NDK. Desktop build requires the VULKAN_SDK environment variable, and all vulkan dependencies will be used from it. (Desktop build was tested only on Linux). ## Pytorch integration: Adding 'Vulkan" as new Backend, DispatchKey, DeviceType. We are using Strided layout without supporting strides at the moment, but we plan to support them in the future. Using OpaqueTensorImpl where OpaqueHandle is copyable VulkanTensor, more details in comments in `aten/src/ATen/native/vulkan/Vulkan.h` Main code location: `aten/src/ATen/native/vulkan` `aten/src/ATen/native/vulkan/VulkanAten.cpp` - connection link between ATen and Vulkan api (Vulkan.h) that converts at::Tensor to VulkanTensor. `aten/src/ATen/native/Vulkan/Vulkan.h` - Vulkan API that contains VulkanTensor representation and functions to work with it. Plan to expose it for clients to be able to write their own Vulkan Ops. `aten/src/ATen/native/vulkan/VulkanOps.cpp` - Vulkan Operations Implementations that uses Vulkan.h API ## GLSL shaders Located in `aten/src/ATen/native/vulkan/glsl` as *.glsl files. All shaders use Vulkan specialized constants for workgroup sizes with ids 1, 2, 3 ## Supported operations Code point: conv2d no-groups conv2d depthwise addmm upsample nearest 2d clamp hardtanh ## Testing `aten/src/ATen/test/vulkan_test.cpp` - contains tests for copy from CPU to Vulkan and back all supported operations Desktop builds supported, and testing can be done on a desktop that has Vulkan supported GPU or with installed software implementation of Vulkan, like https://github.com/google/swiftshader ## Vulkan execution The initial implementation is trivial and waits every operator's execution. Pull Request resolved: https://github.com/pytorch/pytorch/pull/36491 Differential Revision: D21696709 Pulled By: IvanKobzarev fbshipit-source-id: da3e5a770b1a1995e9465d7e81963e7de56217fa	2020-05-26 08:30:13 -07:00
James Reed	1592d6842c	[resubmit] Move profiler to a dispatch wrapper (#36766 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/36766 Original commit changeset: dcb41d243369 ghstack-source-id: 102614215 Test Plan: waitforsadcastle Differential Revision: D21076029 fbshipit-source-id: c2461c57cfd364bd23ff99bc2cb5572d22e23391	2020-04-21 16:37:11 -07:00
Karl Ostmo	4894cba572	Revert D19775659: [WIP] Move profiler to a dispatch wrapper Test Plan: revert-hammer Differential Revision: D19775659 Original commit changeset: 5cbe5f736660 fbshipit-source-id: dcb41d2433697c5d521044a9dbc12c79f31e0929	2020-04-16 14:18:51 -07:00
James Reed	a85c835196	[WIP] Move profiler to a dispatch wrapper (#33057 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33057 Test Plan: Imported from OSS Differential Revision: D19775659 Pulled By: jamesr66a fbshipit-source-id: 5cbe5f736660c8543764ef62b16550638d9ceb72	2020-04-16 13:36:37 -07:00
Edward Yang	dd64e738c5	Expunge TensorId from all DispatchKey names. (#36240 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/36240 It's annoying, historical, and unnecessary (enum class is already namespaced). I did this codemod with: ``` git grep -l 'CPUTensorId' \| xargs sed -i 's/CPUTensorId/CPU/g' git grep -l 'CUDATensorId' \| xargs sed -i 's/CUDATensorId/CUDA/g' git grep -l 'VariableTensorId' \| xargs sed -i 's/VariableTensorId/Autograd/g' git grep -l 'HIPTensorId' \| xargs sed -i 's/HIPTensorId/HIP/g' git grep -l 'MSNPUTensorId' \| xargs sed -i 's/MSNPUTensorId/MSNPU/g' git grep -l 'XLATensorId' \| xargs sed -i 's/XLATensorId/XLA/g' git grep -l 'PrivateUse1_TensorId' \| xargs sed -i 's/PrivateUse1_TensorId/PrivateUse1/g' git grep -l 'PrivateUse2_TensorId' \| xargs sed -i 's/PrivateUse2_TensorId/PrivateUse2/g' git grep -l 'PrivateUse3_TensorId' \| xargs sed -i 's/PrivateUse3_TensorId/PrivateUse3/g' git grep -l 'AutocastTensorId' \| xargs sed -i 's/AutocastTensorId/Autocast/g' git grep -l '_PreAutogradTensorId' \| xargs sed -i 's/_PreAutogradTensorId/_PreAutograd/g' git grep -l 'TESTING_ONLY_GenericWrapperTensorId' \| xargs sed -i 's/TESTING_ONLY_GenericWrapperTensorId/TESTING_ONLY_GenericWrapper/g' git grep -l 'TESTING_ONLY_GenericModeTensorId' \| xargs sed -i 's/TESTING_ONLY_GenericModeTensorId/TESTING_ONLY_GenericMode/g' ``` Then I did a git grep for remaining TensorId occurrences, and manually killed those (mostly in codegen, and some docs that needed updating). Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Differential Revision: D20929255 Pulled By: ezyang fbshipit-source-id: dc371b6aa6e6ea7c0a5660137c14debde806a09d	2020-04-13 23:33:44 -07:00
Xiang Gao	15c7486416	Canonicalize includes in c10, and add tests for it (#36299 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/36299 Test Plan: Imported from OSS Differential Revision: D20943005 Pulled By: ezyang fbshipit-source-id: 9dd0a58824bd0f1b5ad259942f92954ba1f63eae	2020-04-10 12:07:52 -07:00

1 2

59 Commits