Summary: The new PrivateUse1 DeviceType is associated with the PrivateUse1 DispatchKey, which can be used for non-public devices without introducing a new device type. Note that the stringified name of the PrivateUse1 device is "privateuseone".
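As a sketch of how an out-of-tree backend might target this key (the kernel below is hypothetical and not part of this PR):
```cpp
#include <ATen/ATen.h>
#include <torch/library.h>

// Hypothetical kernel: a real backend would call into its own runtime here.
at::Tensor my_backend_add(const at::Tensor& self, const at::Tensor& other,
                          const at::Scalar& alpha) {
  TORCH_CHECK(false, "add.Tensor is not implemented in this sketch");
  return at::Tensor();  // unreachable; keeps the compiler happy
}

// Kernels for the backend are registered against the PrivateUse1 dispatch key.
TORCH_LIBRARY_IMPL(aten, PrivateUse1, m) {
  m.impl("add.Tensor", my_backend_add);
}

// Tensors on the backend then use the "privateuseone" device string, e.g.:
//   at::Tensor t = at::empty({2, 2}, at::Device("privateuseone"));
```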
Test Plan: All CI should pass.
Differential Revision: D35859437
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77208
Approved by: https://github.com/bdhirsh
Summary:
This PR implements the necessary hooks/stubs/enums/etc for complete ONNX Runtime (ORT) Eager Mode integration. The actual extension will live out of tree at https://github.com/pytorch/ort.
We have been [working on this at Microsoft](https://github.com/microsoft/onnxruntime-pytorch/tree/eager-ort/torch_onnxruntime) for the last few months, and are finally ready to contribute the PyTorch core changes upstream (nothing major or exciting, just the usual boilerplate for adding new backends).
The ORT backend will allow us to ferry [almost] all torch ops into granular ONNX kernels that ORT will eagerly execute against any device it supports (therefore, we only need a single ORT backend from a PyTorch perspective).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58248
Reviewed By: astaff
Differential Revision: D30344992
Pulled By: albanD
fbshipit-source-id: 69082b32121246340d686e16653626114b7714b2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56830
Opt into formatting on GitHub and format everything. This is a trial run before turning on formatting for more and eventually all of the codebase.
Test Plan: CI
Reviewed By: zertosh
Differential Revision: D27979080
fbshipit-source-id: a80f0c48691c08ae8ca0af06377b87e6a2351151
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56939
These never have kernels registered to them and are effectively useless.
What I am not so sure about is whether we allocate tensors to them or not; if we do,
I cannot use asserts and instead need to ensure we just return undefined
or something equivalent.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: ailzhang
Differential Revision: D28006160
Pulled By: ezyang
fbshipit-source-id: f8e2b61b8bd928fb2c0ac0b534bd4af076423f71
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53973
Two parts to this PR; I had to put them together because adding support for X causes more test code to be exercised, which in turn may require a fix for Y.
The first part is restoring the concept of storage to meta tensors. Previously, meta tensors had a nullptr storage (e.g., `meta_tensor.storage()` is an error.) As I was increasing the coverage of meta tensors, I started running into test cases (specifically memory overlap tests) that were failing because not having storage meant I couldn't check for memory overlap. After some discussion, we decided that it would make sense for meta tensors to model this as well (we already model strides, so getting accurate view information also seems useful). This PR does that by:
* Rewrite all of the factory functions in MetaTensor.cpp to use the generic versions (which are very carefully written to not actually poke at the data pointer, so everything works out). The key idea here is we give meta tensors a special allocator, MetaAllocator, which always returns a nullptr even if you ask for a nonzero number of bytes. resize_ is also made generic; the normal variant can be used directly rather than having to instruct it to avoid resizing storage
* Turn on memory overlap checking in TensorIterator even for meta tensors
* Although meta tensors now have storage, the concept of meta storage is NOT exposed to Python land (as it would imply I would have to codegen MetaFloatStorage, MetaDoubleStorage, etc. classes). So `x.storage()` still raises an error and I have a kludge in `__deepcopy__` to break storage sharing upon deep copy (this is wrong, but no tests exercise this at the moment).
The second part is adding more support for the most used functions in the test suite.
* Inplace operations have very simple meta functions. I added `fill_`, `zero_`, `random_`, `uniform_` and `normal_`. In the case of random, I take advantage of pbelevich's templates for defining random kernels, so that I can reuse the common scaffolding, and then just register a noop stub that actually does the RNG. (Look, another structured kernels tiny variant!)
* `copy_` is now implemented. Copying into a meta tensor is always OK, but copying out of a meta tensor raises an error (as we don't know what the "correct" data to copy out is in this case)
* `empty_strided` usage from structured kernels is now implemented (TBH, this could have been done as soon as `empty_strided` was added)
* Meta was missing in a few places in TensorOptions/DispatchKey utility functions, so I added them
* Autograd engine now correctly homes meta tensors with CPU tensors (they have -1 device index so CUDA queues wouldn't work anyway)
* `apply_`, `map_` and `map2_` are special cased to no-op on meta tensor self. These count as inplace operations too but they are implemented a little differently.
Getting more meta function support triggers a number of bugs in the test suite, which I then fix:
- Linear algebra functions sometimes don't report NotImplementedError because they get swallowed by catch-all try blocks. This is tracked in https://github.com/pytorch/pytorch/issues/53739
- dlpack obviously doesn't work with meta tensors, so I just disabled the test
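A small sketch of the resulting behavior in ATen C++ (based on this description, not taken from the PR's tests):
```cpp
#include <ATen/ATen.h>

void meta_storage_sketch() {
  at::Tensor m = at::empty({2, 3}, at::device(at::kMeta));
  at::Tensor cpu = at::ones({2, 3});

  // Meta tensors now carry sizes, strides, and a storage with a null data
  // pointer, so memory-overlap checks and view bookkeeping work.
  TORCH_CHECK(m.sizes() == cpu.sizes());

  m.copy_(cpu);  // copying *into* a meta tensor is always OK
  // cpu.copy_(m) would raise: we don't know what data to copy *out of* meta.
}
```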
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: D27036572
Test Plan: Imported from OSS
Reviewed By: agolynski, bdhirsh
Pulled By: ezyang
fbshipit-source-id: 7005ecf4feb92a643c37389fdfbd852dbf00ac78
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54004
According to
`glean-search find-decls --refs 'c10::TensorOptions::key_set'`
there are no uses of this function
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: bdhirsh
Differential Revision: D27047971
Pulled By: ezyang
fbshipit-source-id: 63662dd7ab27753ecb79c45c152c2cad1160dab2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54034
Fixes #53544
I had to touch a bunch of lines but the refactoring was fairly
mechanical. Here's how it works.
The basic concept behind this PR is that tensor_new.cpp was previously
abusing DispatchKey when it actually meant TensorOptions. The provided
DispatchKey argument to most of the constructor functions typically
comes from torch::tensors::get_default_dispatch_key(); it doesn't
really make sense for people to set the default dispatch key, but
this got grandfathered in due to the old API set_default_tensor_type
(where the "Type" concept got refactored into "DispatchKey" concept
over time). See also #53124. But the upshot is that, semantically,
what we refer to as the default dispatch key is really about the choice
between torch.set_default_tensor_type(torch.Tensor) and
torch.set_default_tensor_type(torch.cuda.Tensor): clearly the user
wants to control *construction* of the tensor, and
TensorOptions captures that exactly.
So, how exactly to translate from one to the other?
- Sources (things that used to PRODUCE DispatchKey)
- Most top level functions take a DispatchKey as their argument. I
use the new function dispatchKeyToTensorOptions to convert it into
a TensorOptions
- typeIdWithDefault now produces a TensorOptions (probably could do
with a rename, though I didn't)
- Sinks (things that used to CONSUME DispatchKey)
- Previously, the function options() was typically used to convert the
DispatchKey into a TensorOptions. Now its replacement build_options
just takes a TensorOptions and sets some extra fields on it.
Irritatingly, I can't just replace
`build_options(options, scalar_type, device)` with
`options.dtype(scalar_type).device(device)` because the semantics
are slightly different: if device is nullopt, we should preserve
the device already specified in options, whereas options.device()
overwrites the device unconditionally (e.g., if device is nullopt,
it unsets the device from options).
- The other major sink for DispatchKey was `internal_new_from_data`,
but it turns out it only really extracts the device type from
the dispatch key. Now it just pulls out the device from
TensorOptions.
- To actually do the translation of DispatchKey to TensorOptions, I
introduce new functions dispatchKeyToLayout (replicating
layout_from_backend--there are still a few uses of this function
so I couldn't delete it) and dispatchKeyToDeviceType (replacing
computeDeviceType)
- In all internal functions, whenever DispatchKey is taken as an argument,
I instead take TensorOptions as an argument, and pass it along.
- Anywhere `legacyExtractDispatchKey(other.key_set())` equality was
previously used, I now do `other.options().type_equal()`, which
is the intended BC for doing "backend to backend" comparisons
- There are a few places in the sparse constructors where we allocated
a tensor for values, and then read out the dispatch key from the
result to allocate the keys. As best as I can tell, this is totally
equivalent to just passing in the options to both values and indices
(the only difference is dtype, which is captured via a separate
argument)
This refactor doesn't really go far enough: for example, there are now
functions that take both TensorOptions and ScalarType, when really
the TensorOptions can capture all of this. I kept it to just
s/DispatchKey/TensorOptions/ to reduce the number of possible bugs;
also, a lot of this will be mooted by a proper fix to #53124.
Even with this limited refactor, the payoff is sweet. I can delete:
- backendToCPU
- backendToXPU
- backendToCUDA
- backendToHIP
- backendToBackendOfDeviceType
The reason I can do this is because I can simply overwrite layout in TensorOptions
to do the conversion, rather than having to type out each backend case
explicitly.
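Roughly, the new helpers compose as sketched below (a simplified illustration of the idea, not the exact implementation added by this PR; `sketchDispatchKeyToTensorOptions` is a made-up name):
```cpp
#include <c10/core/DispatchKey.h>
#include <c10/core/TensorOptions.h>

// Build a TensorOptions from a dispatch key by pulling out the layout
// (e.g. Sparse* keys -> kSparse) and the device type
// (e.g. CUDA / SparseCUDA -> kCUDA).
c10::TensorOptions sketchDispatchKeyToTensorOptions(c10::DispatchKey key) {
  return c10::TensorOptions()
      .layout(c10::dispatchKeyToLayout(key))
      .device(c10::dispatchKeyToDeviceType(key));
}
```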
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: bhosmer
Differential Revision: D27109509
Pulled By: ezyang
fbshipit-source-id: 91d16cfbc390127770362ac04fb43f7e070077e9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53823
Argument for correctness: type_equal previously compared whether backends
are equal. Backend is computed by translation from dispatch key.
I verified that computeDispatchKey never computed a weird
dispatch key (e.g., AutogradXLA), so that dispatchKeyToBackend
was effectively injective. Then it is always valid to compare
the arguments of an injective function for equality, rather than
the output of the injective function.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D27036575
Pulled By: ezyang
fbshipit-source-id: 6aeafc89f287da0bc0065bd21c1adb5e272dbb81
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53610
I noticed these because I was running the test suite under
the meta device and triggered these error checks without getting
a NotImplementedError. Well, now they raise.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: glaringlee
Differential Revision: D26918376
Pulled By: ezyang
fbshipit-source-id: 20d57417aa64875d43460fce58af11dd33eb4a23
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53143
Meta is now an honest-to-goodness device type, like cpu, so you can use
device='meta' to trigger allocation of meta tensors. This is way better
than empty_meta, since we now have a working API for most factory functions
(they don't necessarily work yet, though, because we still need to register
Meta versions of those functions.)
Some subtleties:
- I decided to drop the concept of CPU versus CUDA meta tensors; meta
tensors are device agnostic. It's hard to say exactly what the
correct level of abstraction here is, but in this particular case
implementation considerations trump semantic considerations: it
is way easier to have just a meta device, than to have a meta device
AND a cpu device AND a cuda device. This may limit the applicability
of meta tensors for tracing models that do explicit cpu()/cuda()
conversions (unless, perhaps, we make those operations no-ops on meta
tensors).
- I noticed that the DeviceType uppercase strings are kind of weird.
Are they really supposed to be all caps?
- I moved the Meta dispatch key to live with the rest of the "device"
dispatch keys.
- I intentionally did NOT add a Backend for Meta. For now, I'm going to
hope meta tensors never exercise any of the Backend conversion code;
even if they do, it is better to fix the code to just stop converting to and
from Backend.
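A quick sketch of what this enables from the ATen C++ side (illustrative; factory coverage depends on registering Meta kernels as noted above):
```cpp
#include <ATen/ATen.h>

void meta_device_sketch() {
  // The meta device is device-agnostic: there is no "CPU meta" vs "CUDA meta".
  at::Tensor a = at::empty({4, 4}, at::device(at::kMeta));
  at::Tensor b = at::empty({4, 4}, at::Device("meta"));  // string form also works
  TORCH_CHECK(a.device().type() == at::kMeta && b.device().type() == at::kMeta);
}
```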
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: samestep
Differential Revision: D26763552
Pulled By: ezyang
fbshipit-source-id: 14633b6ca738e60b921db66a763155d01795480d
Summary:
Apple recently announced ML Compute, a new framework available in macOS Big Sur, which enables users to accelerate the training of neural networks on Mac hardware. This PR is the first in a series of PRs that will enable the integration with ML Compute. Most of the integration code will live in a separate subrepo named `mlc`.
The integration with `mlc` (ML Compute) will be very similar to that of xla. We rely on registering our ops through:
TORCH_LIBRARY_IMPL(aten, PrivateUse1, m) {
  m.impl_UNBOXED(<op_schema_name>, &customized_op_kernel);
  ...
}
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50634
Reviewed By: malfet
Differential Revision: D26614213
Pulled By: smessmer
fbshipit-source-id: 3b492b346c61cc3950ac880ac01a82fbdddbc07b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50843
AT_ASSERTM is deprecated and should be replaced by either TORCH_CHECK or
TORCH_INTERNAL_ASSERT, depending on the situation.
Test Plan: Imported from OSS
Reviewed By: ailzhang
Differential Revision: D26074365
Pulled By: ezyang
fbshipit-source-id: 46e13588fad4e24828f3cc99635e9cb2223a6c2c
Summary:
Add a new device type 'XPU' ('xpu' in lower case) to PyTorch. Changes are needed in code related to the device model and kernel dispatch, e.g. DeviceType, Backend, and DispatchKey.
https://github.com/pytorch/pytorch/issues/48246
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49786
Reviewed By: mrshenli
Differential Revision: D25893962
Pulled By: ezyang
fbshipit-source-id: 7ff0a316ee34cf0ed6fc7ead08ecdeb7df4b0052
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49346
This is a less ambitious redo of
https://github.com/pytorch/pytorch/pull/49129/.
We make the
```
xq_slice = xq[:, [0], :, :]
```
indexing syntax work if `xq` is a quantized Tensor. For now, we are
making the code not crash, with an in efficient `dq -> index -> q`
implementation. A future PR can optimize performance by removing
the unnecessary memory copies (which will require some non-trivial
changes to TensorIterator).
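Conceptually, the fallback behaves like the sketch below (illustrative only; it approximates the `[:, [0], :, :]` case with `index_select`, assumes per-tensor quantization, and is not the actual kernel):
```cpp
#include <ATen/ATen.h>

at::Tensor index_quantized_sketch(const at::Tensor& qx, const at::Tensor& idx) {
  at::Tensor fx = qx.dequantize();            // dq: materialize a float copy
  at::Tensor out = fx.index_select(1, idx);   // index on dim 1, like xq[:, [0], :, :]
  return at::quantize_per_tensor(             // q: re-quantize the result
      out, qx.q_scale(), qx.q_zero_point(), qx.scalar_type());
}
```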
Test Plan:
```
python test/test_quantization.py TestQuantizedOps.test_advanced_indexing
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D25539365
fbshipit-source-id: 98485875aaaf5743e1a940e170258057691be4fa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47023
DeviceType pretty clearly only needs 1 byte. DeviceIndex only needs 1 byte given that machines don't have anywhere near 255 GPUs in them as far as I know.
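As a sketch, the claim can be checked at compile time (assuming, as of this change, that c10::Device has no padding beyond its two one-byte fields):
```cpp
#include <c10/core/Device.h>
#include <c10/core/DeviceType.h>

static_assert(sizeof(c10::DeviceType) == 1, "DeviceType fits in one byte");
static_assert(sizeof(c10::DeviceIndex) == 1, "DeviceIndex fits in one byte");
static_assert(sizeof(c10::Device) == 2, "Device packs type + index into two bytes");
```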
ghstack-source-id: 116901430
Test Plan: Existing tests, added assertion to catch if my assumption about DeviceIndex is incorrect
Reviewed By: dzhulgakov
Differential Revision: D24605460
fbshipit-source-id: 7c9a89027fcf8eebd623b7cdbf6302162c981cd2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46092
Make empty c10-full without using hacky-wrapper, i.e. port the kernel to the new style signature.
This PR also changes the signature of some helpers called by empty to the new style.
ghstack-source-id: 116544203
(Note: this ignores all push blocking failures!)
Test Plan:
vs prev diff (outdated, before c10::optional fix): https://www.internalfb.com/intern/fblearner/details/224735103/
after c10::optional fix:
https://www.internalfb.com/intern/fblearner/details/231391773/
Also, after the c10::optional fix, the instruction counting benchmark shows a 2% regression for calling empty from Python. We decided this is acceptable and decided against landing D24425836 which would fix the regression.
Reviewed By: ezyang
Differential Revision: D24219944
fbshipit-source-id: e554096e90ce438c75b679131c3151ff8e5c5d50
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46112
### Summary
This PR adds support for running TorchScript models on the iOS GPU via Metal (inference only). The feature is currently in a prototype state; API changes are expected. Tutorials and documentation will be added once it goes to beta.
allow-large-files
- Users API
```
auto module = torch::jit::load(model);
module.eval();
at::Tensor input = at::ones({1,3,224,224}, at::ScalarType::Float).metal();
auto output = module.forward({input}).toTensor().cpu();
```
- Supported Models
- Person Segmentation v106 (FB Internal)
- Mobilenetv2
- Supported Operators
- aten::conv2d
- aten::addmm
- aten::add.Tensor
- aten::sub.Tensor
- aten::mul.Tensor
- aten::relu
- aten::hardtanh
- aten::hardtanh_
- aten::sigmoid
- aten::max_pool2d
- aten::adaptive_avg_pool2d
- aten::reshape
- aten::t
- aten::view
- aten::log_softmax.int
- aten::upsample_nearest2d.vec
- Supported Devices
- Apple A9 and above
- iOS 10.2 and above
- CMake scripts
- `IOS_ARCH=arm64 ./scripts/build_ios.sh -DUSE_METAL=ON`
### Test Plan
- Circle CI
ghstack-source-id: 114155638
Test Plan:
1. Sandcastle CI
2. Circle CI
Reviewed By: dreiss
Differential Revision: D23236555
fbshipit-source-id: 98ffc48b837e308bc678c37a9a5fd8ae72d11625
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44071
Previously, tracing re-gathered ScalarType, Layout, Device, and bool into a TensorOptions object and called `tracer::addInput()` on the gathered TensorOptions argument. `tracer::addInput()` then scattered them again and added the individual scattered arguments to the traced graph. This PR avoids the extraneous gathering and re-scattering step and calls `tracer::addInput()` on the individual arguments directly. This avoids the perf hit of an unnecessary gathering step.
This applies to both c10-full and non-c10-full ops. In the case of c10-full ops, the tracing kernel takes scattered arguments and we can directly pass them to `tracer::addInput()`. In the case of non-c10-full ops, the kernel takes a `TensorOptions` argument but we still call `tracer::addInput()` on the scattered arguments.
ghstack-source-id: 112825793
Test Plan:
waitforsandcastle
vs master: https://www.internalfb.com/intern/fblearner/details/216129483/
vs previous diff: https://www.internalfb.com/intern/fblearner/details/216170069/
Reviewed By: ezyang
Differential Revision: D23486638
fbshipit-source-id: e0b53e6673cef8d7f94158e718301eee261e5d22
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44062
Previously, BackendSelect kernels were still written in the legacy way, i.e. they took one TensorOptions argument instead of scattered dtype, layout, device, pin_memory, and they used hacky_wrapper to be callable. This caused a re-wrapping step. Calling into a BackendSelect kernel required taking the individual scattered arguments, packing them into a TensorOptions, and the kernel itself then scattered them again for redispatch.
Now with this PR, BackendSelect kernels are written in the new way and no hacky_wrapper or rewrapping is needed for them.
ghstack-source-id: 112825789
Test Plan:
vs master: https://www.internalfb.com/intern/fblearner/details/216117032/
vs previous diff: https://www.internalfb.com/intern/fblearner/details/216170194/
Reviewed By: ezyang
Differential Revision: D23484192
fbshipit-source-id: e8fb49c4692404b6b775d18548b990c4cdddbada
Summary:
This PR moves `DispatchKey::Autograd` to an alias dispatch key mapping to `AutogradCPU, AutogradCUDA, AutogradXLA, AutogradOther, AutogradPrivate*` keys.
A few things are handled in this PR:
- Update alias dispatch key mapping and precompute dispatchTable logic
- Move `Autograd` key from `always_included` set to TensorImpl constructor.
- Update `dummyTensor` constructor to take `requires_grad` as an optional argument so that it's closer to the real application in op_registration_test.
- Use `BackendSelect` key for both backend select before and after autograd layer. (1 liner in backend_select codegen)
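As a rough sketch of what the alias key enables (the `myops::identity` op and kernel below are hypothetical, not part of this PR): a single registration against `Autograd` now fans out to all of the per-backend autograd keys.
```cpp
#include <ATen/ATen.h>
#include <torch/library.h>

// Hypothetical custom op used only for illustration. A real autograd kernel
// would set up the backward graph and then redispatch to the backend kernel.
at::Tensor myops_identity(const at::Tensor& self) {
  return self.clone();
}

TORCH_LIBRARY(myops, m) {
  m.def("identity(Tensor self) -> Tensor");
}

// One registration against the Autograd alias key covers
// AutogradCPU, AutogradCUDA, AutogradXLA, AutogradOther, AutogradPrivate*.
TORCH_LIBRARY_IMPL(myops, Autograd, m) {
  m.impl("identity", myops_identity);
}
```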
A few planned followups ordered by priority:
- [cleanup] Update `test_dispatch.py` to include testing `Autograd`.
- [cleanup] Add Math alias key and move catchAll to Math. (to remove 2.2 in `computeDispatchTableEntryWithDebug`)
- [new feature] Add support for Math in native_functions.yaml
- [cleanup] Add iterator like functionality to DispatchKeySet
- [cleanup/large] Only add Autograd backend keys when tensor requires grad. (cc: ljk53 ?)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43070
Reviewed By: ezyang
Differential Revision: D23281535
Pulled By: ailzhang
fbshipit-source-id: 9ad00b17142e9b83304f63cf599f785500f28f71
Summary:
ezyang,
I have added the changes to DispatchKey, DeviceType, Backend to support the out-of-tree FPGA.
cc. tataetae
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38938
Differential Revision: D21748955
Pulled By: ezyang
fbshipit-source-id: fe76d9730818205961430d2a0e00727b5c547b32
Summary:
This PR contains the initial version of Vulkan (GPU) Backend integration.
The primary target environment is Android, but the desktop build is also supported.
## CMake
Introducing three cmake options:
USE_VULKAN:
The main switch; if it is off, the other options have no effect.
USE_VULKAN_WRAPPER:
ON - Vulkan is loaded at runtime as "libvulkan.so" using libdl; every function call is wrapped in vulkan_wrapper.h.
OFF - links against libvulkan.so directly
USE_VULKAN_SHADERC_RUNTIME:
ON - The shader compilation library will be linked, and shaders will be compiled at runtime.
OFF - Shaders will be precompiled and the shader compilation library is not included.
## Codegen
if `USE_VULKAN_SHADERC_RUNTIME` is ON:
Shader precompilation starts in cmake/VulkanCodegen.cmake, which calls `aten/src/ATen/native/vulkan/gen_glsl.py` or `aten/src/ATen/native/vulkan/gen_spv.py` to include shader source or SPIR-V bytecode inside the binary as a uint32_t array in spv.h, spv.cpp.
if `USE_VULKAN_SHADERC_RUNTIME` is OFF:
The shader source is included as `glsl.h`, `glsl.cpp`.
All codegen results are placed in the build directory.
## Build dependencies
cmake/Dependencies.cmake
If the target platform is Android, the Vulkan library, headers, and Vulkan wrapper will be used from the ANDROID_NDK.
The desktop build requires the VULKAN_SDK environment variable, and all Vulkan dependencies will be used from it.
(Desktop build was tested only on Linux).
## Pytorch integration:
Adding 'Vulkan' as a new Backend, DispatchKey, and DeviceType.
We are using the Strided layout without supporting strides at the moment, but we plan to support them in the future.
Using OpaqueTensorImpl where the OpaqueHandle is a copyable VulkanTensor;
more details in comments in `aten/src/ATen/native/vulkan/Vulkan.h`
Main code location: `aten/src/ATen/native/vulkan`
`aten/src/ATen/native/vulkan/VulkanAten.cpp` - connection link between ATen and Vulkan api (Vulkan.h) that converts at::Tensor to VulkanTensor.
`aten/src/ATen/native/vulkan/Vulkan.h` - Vulkan API that contains the VulkanTensor representation and functions to work with it. We plan to expose it so clients can write their own Vulkan ops.
`aten/src/ATen/native/vulkan/VulkanOps.cpp` - Vulkan operation implementations that use the Vulkan.h API
## GLSL shaders
Located in `aten/src/ATen/native/vulkan/glsl` as *.glsl files.
All shaders use Vulkan specialization constants for workgroup sizes, with ids 1, 2, 3
## Supported operations
Code point:
conv2d no-groups
conv2d depthwise
addmm
upsample nearest 2d
clamp
hardtanh
## Testing
`aten/src/ATen/test/vulkan_test.cpp` - contains tests for
copying from CPU to Vulkan and back
all supported operations
Desktop builds are supported, and testing can be done on a desktop that has a Vulkan-capable GPU or with an installed software implementation of Vulkan, such as https://github.com/google/swiftshader
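A rough sketch of the round trip the tests exercise (illustrative; it assumes the standard `Tensor::to` conversion path rather than the exact code in vulkan_test.cpp):
```cpp
#include <ATen/ATen.h>

void vulkan_roundtrip_sketch() {
  at::Tensor cpu_in = at::rand({1, 3, 8, 8});
  at::Tensor vk = cpu_in.to(at::Device(at::DeviceType::Vulkan));  // copy CPU -> Vulkan
  at::Tensor vk_out = at::clamp(vk, 0.0, 6.0);                    // one of the supported ops
  at::Tensor cpu_out = vk_out.to(at::kCPU);                       // copy back to CPU
  TORCH_CHECK(at::allclose(cpu_out, at::clamp(cpu_in, 0.0, 6.0)));
}
```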
## Vulkan execution
The initial implementation is trivial and waits for every operator's execution.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36491
Differential Revision: D21696709
Pulled By: IvanKobzarev
fbshipit-source-id: da3e5a770b1a1995e9465d7e81963e7de56217fa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37081
Closes https://github.com/pytorch/pytorch/issues/30813
Relanding of https://github.com/pytorch/pytorch/pull/35463
1. Tensor quantization logic (quantize_*) is moved to aten/native/quantized. Previously all logic for tensor quantization lived in the aten/quantized/Quantizer.cpp file and had started to become complicated and hard to read. That problem should be addressed in a refactoring PR. Still, I reworked this partially because I had to add tensor quantization logic for CUDA, and it was natural to move everything to aten/native/quantized.
2. The requirements to run CUDA_tensor_apply* were eased to process any tensor that lives on the CUDA device (QuantizedCUDA included).
3. All quantized data types now have a default constructor. NVCC refuses to compile any gpu_kernel or CUDA_tensor_apply* without them.
4. Minor changes in many files to register the QuantizedCUDA backend.
5. test_quantized_tensor is extended to exercise the QuantizedCUDA backend where possible.
Test Plan: Imported from OSS
Differential Revision: D21206694
Pulled By: jerryzh168
fbshipit-source-id: c7433aad9c095a34c57e6dddd128b5c5d9292373
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36936
Closes https://github.com/pytorch/pytorch/issues/30813
Relanding of https://github.com/pytorch/pytorch/pull/35463
1. Tensor quantization logic (quantize_*) is moved to aten/native/quantized. Previously all logic for tensor quantization lived in the aten/quantized/Quantizer.cpp file and had started to become complicated and hard to read. That problem should be addressed in a refactoring PR. Still, I reworked this partially because I had to add tensor quantization logic for CUDA, and it was natural to move everything to aten/native/quantized.
2. The requirements to run CUDA_tensor_apply* were eased to process any tensor that lives on the CUDA device (QuantizedCUDA included).
3. All quantized data types now have a default constructor. NVCC refuses to compile any gpu_kernel or CUDA_tensor_apply* without them.
4. Minor changes in many files to register the QuantizedCUDA backend.
5. test_quantized_tensor is extended to exercise the QuantizedCUDA backend where possible.
Test Plan: Imported from OSS
Differential Revision: D21143025
Pulled By: jerryzh168
fbshipit-source-id: 11405e2e8f87e48fadc0a084c51db15f85ccb500
Summary:
Closes https://github.com/pytorch/pytorch/issues/30813
1. Tensor quantization logic (quantize_*) is moved to aten/native/quantized. Previously all logic for tensor quantization lived in the aten/quantized/Quantizer.cpp file and had started to become complicated and hard to read. That problem should be addressed in a refactoring PR. Still, I reworked this partially because I had to add tensor quantization logic for CUDA, and it was natural to move everything to aten/native/quantized.
2. The requirements to run CUDA_tensor_apply* were eased to process any tensor that lives on the CUDA device (QuantizedCUDA included).
3. All quantized data types now have a default constructor. NVCC refuses to compile any gpu_kernel or CUDA_tensor_apply* without them.
4. Minor changes in many files to register the QuantizedCUDA backend.
5. test_quantized_tensor is extended to exercise the QuantizedCUDA backend where possible.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35463
Differential Revision: D20896697
Pulled By: jerryzh168
fbshipit-source-id: 163554efa23d11a2b10bbc2492439db4798eb26b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35476
A few things:
- Add a new callUnboxedRedispatch function which can be used to do a
redispatch when you don't want to add a type id to the excluded
set. This will recompute the dispatch key but ignore everything
including and before the currentDispatchKey.
- Add FULL_AFTER constructor to DispatchKeySet; used to implement
redispatch.
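A small sketch of the FULL_AFTER idea (illustrative; `keys_after` is a made-up helper):
```cpp
#include <c10/core/DispatchKeySet.h>

// Every key that comes after `k` in dispatching order; a redispatch computed
// against this mask skips `k` and everything dispatched before it.
c10::DispatchKeySet keys_after(c10::DispatchKey k) {
  return c10::DispatchKeySet(c10::DispatchKeySet::FULL_AFTER, k);
}
```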
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: D20680518
Test Plan: Imported from OSS
Pulled By: ezyang
fbshipit-source-id: ecd7fbdfa916d0d2550a5b19dd3ee4a9f2272457
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33705
The fact that there were two overloads appears to be a historical
artifact that dates back to when goldsborough originally added these
bindings in the first place. If TensorOptions is made optional,
then you only need one overload, not two, as they are exactly redundant
with each other. When MemoryFormat was added, it was made a little
harder to do this, as the C++ syntax at::empty_like(t, memory_format) would
not work if you collapsed the overload; but now it works because TensorOptions
supports MemoryFormat.
The upshot is, I can get rid of all the overloads and just have one overload.
Amazingly, this change is backwards compatible, as the test attests. While
I was at it, I also deleted the overload name from the functions entirely.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20073355
Pulled By: bhosmer
fbshipit-source-id: c6a8908213b32ccf6737ea864d135e2cce34f56b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33704
This diff adds MemoryFormat field to TensorOptions, and teaches
all kernels that take TensorOptions to respect it, but doesn't
teach the codegen about it. As such, it is now possible to specify
memory_format using TensorOptions syntax, e.g.,
at::empty_like(tensor, at::memory_format(MemoryFormat::Contiguous))
in the C++ API, but there isn't any other user visible effect.
The intended end state of this diff stack is to eliminate the
explicit MemoryFormat? arguments from native functions, but
as this change has BC implications I'd prefer to do it separately.
So this starts things off with a non-BC breaking addition to the
API. For all internal functions that are not bound by codegen,
I switch them to exclusively using TensorOptions (eliminating
MemoryFormat); there are only a few, mostly quantized and to().
To keep things screwed down in the short term, it is a HARD ERROR
to specify both the explicit MemoryFormat argument as well as
TensorOptions. This caught a few errors in my diff where I needed
to modify memory format settings and then call code later, especially
in empty_like.
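For example, after this change both of the spellings below request a memory format through TensorOptions (a sketch; the HARD ERROR mentioned above fires only if the explicit MemoryFormat argument and the options field are both set):
```cpp
#include <ATen/ATen.h>

void memory_format_options_sketch(const at::Tensor& t) {
  // Memory format carried inside a TensorOptions:
  at::Tensor a = at::empty_like(
      t, at::dtype(t.scalar_type()).memory_format(at::MemoryFormat::Contiguous));

  // Or via the standalone helper that builds a TensorOptions:
  at::Tensor b = at::empty_like(t, at::memory_format(at::MemoryFormat::Contiguous));
}
```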
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20073356
Pulled By: bhosmer
fbshipit-source-id: 18d310d7ee7cf2ee182994104652afcfc9d613e2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33510
Previously, we would fill in TensorOptions with defaults whenever an
item was missing from both the left and right side of the merge. This
is morally incorrect: if we don't have an item on the left or right,
we should keep the entry empty (so the downstream user can apply
the appropriate defaulting rule).
I don't think this caused any bugs, but I noticed this error when
working on a later patch in my diff stack.
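The rule being fixed, sketched with a hypothetical per-field helper (not the actual c10 merge implementation):
```cpp
#include <c10/util/Optional.h>

// Hypothetical per-field merge used only to illustrate the rule: if neither
// side set the field, it stays unset so downstream defaulting can apply;
// previously it was eagerly filled with a default at this point.
template <typename T>
c10::optional<T> merge_field(c10::optional<T> lhs, c10::optional<T> rhs) {
  if (rhs.has_value()) {
    return rhs;
  }
  return lhs;  // may be nullopt; do NOT substitute a default here
}
```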
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20001775
Pulled By: ezyang
fbshipit-source-id: 88139fc268b488cd1834043584a0d73f46c8ecaa