pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Tugsbayasgalan Manlaibaatar	39fd7f945f	Add SymBool support in Python to C++ translation (#98453) Pull Request resolved: https://github.com/pytorch/pytorch/pull/98453 Approved by: https://github.com/ezyang	2023-04-12 03:21:57 +00:00
albanD	dda95236c9	Add fast path in our type checks and argparser (#98764) Add a fast path for common use cases in our Python arg parsing. This uses the observation that an exact type check is a lot faster (a pointer comparison) than a subtype check (an isinstance call), so we make sure to do the exact checks before any isinstance check. This can be pretty significant: `a.view((1, 1, 1, 1))` goes from ~1.13us to 800ns. Full test: Tested perf locally with cpu freq locked and script pinned to a single core to reduce jitter. Benchmark results after doing each change in this PR one by one: ``` [albandes@albandes-fedora-K2202N0104138 test]$ # Original [albandes@albandes-fedora-K2202N0104138 test]$ taskset 0x1 ipython foo.py No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda' Running a.view(1) 827 ns ± 0.945 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) Running a.view((1, 1)) 947 ns ± 1.23 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) Running a.view((1, 1, 1)) 1.04 µs ± 0.882 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) Running a.view((1, 1, 1, 1)) 1.14 µs ± 1.59 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) Running a.squeeze(0) 797 ns ± 0.955 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) Running a.squeeze((0,)) 937 ns ± 1.51 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) Running a.squeeze((0, 1)) 1.02 µs ± 3.52 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) [albandes@albandes-fedora-K2202N0104138 test]$ taskset 0x1 ipython foo.py No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda' Running a.view(1) 823 ns ± 1.76 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) Running a.view((1, 1)) 938 ns ± 1.38 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) Running a.view((1, 1, 1)) 1.03 µs ± 0.801 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) Running a.view((1, 1, 1, 1)) 1.13 µs ± 0.877 ns per loop (mean ± std. dev.
of 7 runs, 1,000,000 loops each) Running a.squeeze(0) 768 ns ± 2.27 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) Running a.squeeze((0,)) 927 ns ± 0.779 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) Running a.squeeze((0, 1)) 1.01 µs ± 1.34 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) [albandes@albandes-fedora-K2202N0104138 test]$ # checkLong fastpath [albandes@albandes-fedora-K2202N0104138 test]$ taskset 0x1 ipython foo.py No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda' Running a.view(1) 801 ns ± 0.982 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) Running a.view((1, 1)) 900 ns ± 0.593 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) Running a.view((1, 1, 1)) 1 µs ± 1.44 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) Running a.view((1, 1, 1, 1)) 1.1 µs ± 1.38 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) Running a.squeeze(0) 782 ns ± 0.968 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) Running a.squeeze((0,)) 1.11 µs ± 424 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) Running a.squeeze((0, 1)) 1.09 µs ± 54.7 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) [albandes@albandes-fedora-K2202N0104138 test]$ taskset 0x1 ipython foo.py No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda' Running a.view(1) 817 ns ± 0.65 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) Running a.view((1, 1)) 912 ns ± 0.853 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) Running a.view((1, 1, 1)) 1.02 µs ± 8.45 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) Running a.view((1, 1, 1, 1)) 1.11 µs ± 2.53 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) Running a.squeeze(0) 781 ns ± 0.942 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) Running a.squeeze((0,)) 939 ns ± 1.57 ns per loop (mean ± std. dev. 
of 7 runs, 1,000,000 loops each) Running a.squeeze((0, 1)) 1.01 µs ± 0.875 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) [albandes@albandes-fedora-K2202N0104138 test]$ # Tensor check fastpath [albandes@albandes-fedora-K2202N0104138 test]$ taskset 0x1 ipython foo.py No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda' Running a.view(1) 806 ns ± 2.8 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) Running a.view((1, 1)) 903 ns ± 1.82 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) Running a.view((1, 1, 1)) 1 µs ± 1.21 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) Running a.view((1, 1, 1, 1)) 1.1 µs ± 1.17 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) Running a.squeeze(0) 770 ns ± 1.66 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) Running a.squeeze((0,)) 931 ns ± 3.36 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) Running a.squeeze((0, 1)) 1.02 µs ± 0.983 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) [albandes@albandes-fedora-K2202N0104138 test]$ taskset 0x1 ipython foo.py No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda' Running a.view(1) 813 ns ± 2.42 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) Running a.view((1, 1)) 915 ns ± 0.868 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) Running a.view((1, 1, 1)) 1.02 µs ± 1.09 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) Running a.view((1, 1, 1, 1)) 1.11 µs ± 1.15 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) Running a.squeeze(0) 785 ns ± 0.807 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) Running a.squeeze((0,)) 941 ns ± 1.02 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) Running a.squeeze((0, 1)) 1.02 µs ± 0.857 ns per loop (mean ± std. dev. 
of 7 runs, 1,000,000 loops each) [albandes@albandes-fedora-K2202N0104138 test]$ # Fast path number in intlist/symintlist [albandes@albandes-fedora-K2202N0104138 test]$ taskset 0x1 ipython foo.py No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda' Running a.view(1) 728 ns ± 0.503 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) Running a.view((1, 1)) 749 ns ± 0.829 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) Running a.view((1, 1, 1)) 771 ns ± 0.727 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) Running a.view((1, 1, 1, 1)) 800 ns ± 0.962 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) Running a.squeeze(0) 772 ns ± 0.622 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) Running a.squeeze((0,)) 883 ns ± 0.567 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) Running a.squeeze((0, 1)) 915 ns ± 0.638 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) [albandes@albandes-fedora-K2202N0104138 test]$ taskset 0x1 ipython foo.py No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda' Running a.view(1) 735 ns ± 1.27 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) Running a.view((1, 1)) 753 ns ± 2.57 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) Running a.view((1, 1, 1)) 774 ns ± 1.38 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) Running a.view((1, 1, 1, 1)) 801 ns ± 0.835 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) Running a.squeeze(0) 773 ns ± 0.677 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) Running a.squeeze((0,)) 873 ns ± 1.1 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each) Running a.squeeze((0, 1)) 907 ns ± 0.836 ns per loop (mean ± std. dev. 
of 7 runs, 1,000,000 loops each) ``` <details> <summary>Test script</summary> ```python import torch from IPython import get_ipython a = torch.empty(1) print("Running ", "a.view(1)") get_ipython().run_line_magic("timeit", "a.view(1)") print("Running ", "a.view((1, 1))") get_ipython().run_line_magic("timeit", "a.view((1, 1))") print("Running ", "a.view((1, 1, 1))") get_ipython().run_line_magic("timeit", "a.view((1, 1, 1))") print("Running ", "a.view((1, 1, 1, 1))") get_ipython().run_line_magic("timeit", "a.view((1, 1, 1, 1))") a = torch.empty(1, 1, 1) print("Running ", "a.squeeze(0)") get_ipython().run_line_magic("timeit", "a.squeeze(0)") print("Running ", "a.squeeze((0,))") get_ipython().run_line_magic("timeit", "a.squeeze((0,))") print("Running ", "a.squeeze((0, 1))") get_ipython().run_line_magic("timeit", "a.squeeze((0, 1))") ``` </details> Pull Request resolved: https://github.com/pytorch/pytorch/pull/98764 Approved by: https://github.com/ngimel	2023-04-11 00:08:26 +00:00
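The ordering idea behind #98764 (exact type check first, subtype check only as a fallback) can be sketched in plain Python. This is only an illustration of the control flow: the actual change lives in PyTorch's C++ argument parser, and the helper name `is_exact_or_sub_int` and the `MyInt` class are hypothetical. No speedup is claimed for this Python version.

```python
def is_exact_or_sub_int(obj):
    """Return True for ints and int subclasses, excluding bool."""
    # Fast path: identity comparison on the exact type.
    if type(obj) is int:
        return True
    # Slow path: subtype check, only reached for less common inputs.
    return isinstance(obj, int) and not isinstance(obj, bool)


class MyInt(int):
    """Hypothetical int subclass that must still be accepted."""


if __name__ == "__main__":
    for value in (1, MyInt(1), True, 1.0):
        print(type(value).__name__, is_exact_or_sub_int(value))
```

The fast path never changes the answer; it only lets the common case return before the more expensive check runs.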
Elias Ellison	5c8fea5647	Reduce overhead in CUDAGraph Trees (#98529) Significantly reduces overhead of constructing Tensors and Storages and checking Storage Liveness. Removes the regression for HF models that I tested and removes 75% of overhead of the extremely overhead bound resnet50 training we have in torchbench. (.91x base commit, 1.02x torchinductor default, 1.16x this PR, 1.25 previous cudagraphs impl). This PR takes care of all of the lower hanging fruit. - Computes storage aliasing at record time instead of at runtime. We no longer need to use a runtime storage cache, and can instead index directly into the existing alias if there is one, or construct a new Storage - Moves the heavyweight C++ calls into a batch - getting storage weakrefs and constructing tensors Pull Request resolved: https://github.com/pytorch/pytorch/pull/98529 Approved by: https://github.com/jansel, https://github.com/ngimel	2023-04-07 05:46:08 +00:00
Escapeqyq	3112d2a2b6	Export function symbols to enable Windows build of Intel Extension for PyTorch (#98054) This PR exports specific function symbols into the .dll shared library on the Windows platform to support the Windows build of [Intel Extension for PyTorch](https://github.com/intel/intel-extension-for-pytorch). TORCH_API/TORCH_PYTHON_API/PYBIND11_EXPORT are macros that decorate a function as dllexport during compilation, so that the function symbol is exported into the .dll shared library file on Windows. This is necessary for other libraries (such as IPEX) to import and call these functions through dynamic linking of PyTorch on Windows. The code changes in this PR add decorators to export the specific functions used by IPEX. Pull Request resolved: https://github.com/pytorch/pytorch/pull/98054 Approved by: https://github.com/ezyang	2023-04-05 23:23:18 +00:00
cyy	f27e09de04	Cleanup Windows warning suppression in CMake and fix some warnings in the source code (#94927) This PR does two things: 1. It moves some Windows warning suppression from various CMake files into the main CMakeLists.txt, following the conventions of gcc and clang. 2. It fixes some Windows warnings in the source code. Most importantly, it fixes lots of dll warnings by adjusting C10_API to TORCH_API or TORCH_PYTHON_API. There are still some dll warnings because some TORCH_API functions are actually built as part of libtorch_python. Pull Request resolved: https://github.com/pytorch/pytorch/pull/94927 Approved by: https://github.com/malfet	2023-02-27 19:22:20 +00:00
Edward Z. Yang	d78274b759	Automatically guard when SymInt is converted to int (#95479) During enablement, we disabled int() conversions because they were an easy way to footgun guards. We have enough of dynamic shapes working now that this is causing spurious errors; e.g., if you feed a symbolic int to x.size(symint). We now allow implicit conversions of SymInt to int here, posting a guard. We expect guard provenance to help people debug overspecialization. Fixes https://github.com/pytorch/pytorch/issues/95328 Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/95479 Approved by: https://github.com/wconstab, https://github.com/voznesenskym, https://github.com/ngimel	2023-02-25 19:41:51 +00:00
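The "convert and post a guard" behaviour described in #95479 can be pictured with a toy model. Everything below is hypothetical (the class names `ShapeEnvSketch` and `SymIntSketch`, the guard string format, the `hint` attribute); it is not PyTorch's implementation, only a sketch of the idea that `int()` succeeds while recording the specialization it caused.

```python
class ShapeEnvSketch:
    """Hypothetical stand-in that only collects guard strings."""

    def __init__(self):
        self.guards = []


class SymIntSketch:
    """Hypothetical symbolic int with a name and a concrete hint value."""

    def __init__(self, name, hint, env):
        self.name = name
        self.hint = hint
        self.env = env

    def __int__(self):
        # Converting to a plain int specializes on the current value,
        # so record a guard that later runs would have to re-check.
        self.env.guards.append(f"{self.name} == {self.hint}")
        return self.hint


if __name__ == "__main__":
    env = ShapeEnvSketch()
    s0 = SymIntSketch("s0", 8, env)
    print(int(s0), env.guards)
```

In this sketch the recorded guard is what makes the overspecialization visible afterwards, which is the role the commit assigns to guard provenance.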
cyy	bfe5e1258b	avoid unnecessary static_cast (#93898 ) avoid unnecessary static_cast Pull Request resolved: https://github.com/pytorch/pytorch/pull/93898 Approved by: https://github.com/Skylion007	2023-02-03 03:44:43 +00:00
Eddie Yan	e096d2db5a	[BC-Breaking] Separate `stream_id`, `device_index`, and `device_type` in `pack` and `unpack` for `Streams` (#81596) #75854 A naive attempt at working around the limitations of using a single 64-bit integer to pack `stream_id`, `device_index`, and `device_type`. Still needs sanity checks, testing, and minimization of BC-breaking changes. Currently a Holder for the `StreamData3` struct is used for `IValue` compatibility. While doing this seems to work for `ivalue.h` and `ivalue_inl.h`, this doesn't seem to be naively working for the JIT CUDA stream wrapper? (Something about ambiguous calls if an `intrusive_ptr` to `c10::ivalue::StreamData3Holder` is used as the return type for `pack()`.) It turns out that the methods required to access the fields for rematerializing a CUDA Stream are basically already present anyway, so `pack` is simply removed in the wrapper for now and the methods to access the required fields are called directly. CC @ptrblck Pull Request resolved: https://github.com/pytorch/pytorch/pull/81596 Approved by: https://github.com/ezyang	2023-01-12 14:16:49 +00:00
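The limitation that #81596 works around can be illustrated with a small sketch: packing three fields into one 64-bit word forces each field into a fixed bit budget, whereas a three-field record does not. The bit widths and the helpers `pack_single` / `unpack_single` below are assumptions chosen for illustration, not PyTorch's actual layout; only the field names and the name `StreamData3` come from the commit message.

```python
from typing import NamedTuple

# Hypothetical split of one 64-bit word; the real layout may differ.
STREAM_ID_BITS = 48
DEVICE_INDEX_BITS = 8
DEVICE_TYPE_BITS = 8


def pack_single(stream_id, device_index, device_type):
    """Pack three fields into one integer, rejecting values that do not fit."""
    fields = (
        ("stream_id", stream_id, STREAM_ID_BITS),
        ("device_index", device_index, DEVICE_INDEX_BITS),
        ("device_type", device_type, DEVICE_TYPE_BITS),
    )
    for name, value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise OverflowError(f"{name}={value} does not fit in {bits} bits")
    return (
        (device_type << (STREAM_ID_BITS + DEVICE_INDEX_BITS))
        | (device_index << STREAM_ID_BITS)
        | stream_id
    )


def unpack_single(packed):
    """Inverse of pack_single."""
    stream_id = packed & ((1 << STREAM_ID_BITS) - 1)
    device_index = (packed >> STREAM_ID_BITS) & ((1 << DEVICE_INDEX_BITS) - 1)
    device_type = packed >> (STREAM_ID_BITS + DEVICE_INDEX_BITS)
    return stream_id, device_index, device_type


class StreamData3(NamedTuple):
    """Three separate fields, so no field competes for bits with another."""

    stream_id: int
    device_index: int
    device_type: int
```

With separate fields, a `stream_id` that needs the full 64 bits is representable; with the single packed word in this sketch it raises `OverflowError`.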
PyTorch MergeBot	b3603f8129	Revert "Deduplicate c10 error and PyTorchError hierarchy (#87855 )" This reverts commit `34f2d3e6ae`. Reverted https://github.com/pytorch/pytorch/pull/87855 on behalf of https://github.com/osalpekar due to perf regression in quantization tests	2023-01-06 19:56:35 +00:00
William Phetsinorath	34f2d3e6ae	Deduplicate c10 error and PyTorchError hierarchy (#87855 ) Fixes #53370 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87855 Approved by: https://github.com/albanD	2023-01-02 15:53:36 +00:00
Aaron Gokaslan	a34a9c3471	Perf: Apply more clang-tidy fixups to torch headers (#91445) Applies some more fixes to headers that may have been missed before for performance optimization. cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @EikanWang @ezyang since this is one more in the series of clang-tidy fixups. This PR fixes 3 main issues: 1. Use emplacement more in headers 2. Avoid unnecessary copies and use const ref when possible 3. Default any special functions when possible to make them potentially trivial and more readable. 4. There is also one change in this PR that tries to prevent unnecessary math promotion; the rest of these changes are in another PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/91445 Approved by: https://github.com/ezyang	2022-12-29 23:43:45 +00:00
Edward Z. Yang	f7365eca90	Add unbacked symints support; item works now (#90624) The big idea is to add `create_unbacked_symfloat` and `create_unbacked_symint` to ShapeEnv, allowing you to allocate symbolic floats/ints corresponding to data you don't know about at compile time. Then, instead of immediately erroring out when you try to call local_scalar_dense on a FakeTensor, we instead create a fresh symint/symfloat and return that. There are a bunch of odds and ends that need to be handled: * A number of `numel` calls converted to `sym_numel` * When we finally return from item(), we need to ensure we actually produce a SymInt/SymFloat when appropriate. The previous binding code assumed that you would have to get a normal Python item. I add a pybind11 binding for Scalar (to PyObject only) and refactor the code to use that. There is some trickiness where you are NOT allowed to go through c10::SymInt if there isn't actually any SymInt involved. See comment. * One of our unit tests tripped an implicit data dependent access which occurs when you pass a Tensor as an argument to a sizes parameter. This is also converted to support symbolic shapes. * We now support tracking bare SymInt/SymFloat returns in proxy tensor mode (this was already in symbolic-shapes branch) * Whenever we allocate an unbacked symint, we record the stack trace it was allocated at. These get printed when you attempt data dependent access on the symint (e.g., you try to guard on it) * Subtlety: unbacked symints are not necessarily > 1. I added a test for this. These unbacked symints are not very useful right now as you will almost always immediately raise an error later when you try to guard on them. The next logical step is adding an assertion refinement system that lets ShapeEnv learn facts about unbacked symints so it can do a better job eliding guards that are unnecessary. Signed-off-by: Edward Z.
Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/90624 Approved by: https://github.com/Skylion007, https://github.com/voznesenskym	2022-12-12 13:33:07 +00:00
Edward Z. Yang	d3c01c722d	Fix pybind11 problems with c10::SymInt unregistered (#88011 ) Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/88011 Approved by: https://github.com/weiwangmeta, https://github.com/albanD	2022-10-29 07:55:45 +00:00
Edward Z. Yang	1ff52225f1	Unify SymIntNode and SymFloatNode into SymNode (#87817 ) This refactor was prompted by challenges handling mixed int/float operations in C++. A previous version of this patch added overloads for each permutation of int/float and was unwieldy https://github.com/pytorch/pytorch/pull/87722/ This PR takes a different approach. The general outline of the patch is to combine the C++ types SymIntNode and SymFloatNode into a single type, SymNode. This is type erased; we no longer know statically at C++ if we have an int/float and have to test it with the is_int()/is_float() virtual methods. This has a number of knock on effects. - We no longer have C++ classes to bind to Python. Instead, we take an entirely new approach to our Python API, where we have a SymInt/SymFloat class defined entirely in Python, which hold a SymNode (which corresponds to the C++ SymNode). However, SymNode is not pybind11-bound; instead, it lives as-is in Python, and is wrapped into C++ SymNode using PythonSymNode when it goes into C++. This implies a userland rename. In principle, it is also possible for the canonical implementation of SymNode to be written in C++, and then bound to Python with pybind11 (we have this code, although it is commented out.) However, I did not implement this as we currently have no C++ implementations of SymNode. Because we do return SymInt/SymFloat from C++ bindings, the C++ binding code needs to know how to find these classes. Currently, this is done just by manually importing torch and getting the attributes. - Because SymInt/SymFloat are easy Python wrappers, __sym_dispatch__ now takes SymInt/SymFloat, rather than SymNode, bringing it in line with how __torch_dispatch__ works. Some miscellaneous improvements: - SymInt now has a constructor that takes SymNode. Note that this constructor is ambiguous if you pass in a subclass of SymNode, so an explicit downcast is necessary. This means toSymFloat/toSymInt are no more. 
This is a mild optimization as it means rvalue reference works automatically. - We uniformly use the caster for c10::SymInt/SymFloat, rather than going the long way via the SymIntNode/SymFloatNode. - Removed some unnecessary toSymInt/toSymFloat calls in normalize_* functions, pretty sure this doesn't do anything. - guard_int is now a free function, since to guard on an int you cannot assume the method exists. A function can handle both int and SymInt inputs. - We clean up the magic method definition code for SymInt/SymFloat/SymNode. ONLY the user classes (SymInt/SymFloat) get magic methods; SymNode gets plain methods; this is to help avoid confusion between the two types. Signed-off-by: Edward Z. Yang <ezyang@fb.com> cc @jansel @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87817 Approved by: https://github.com/albanD, https://github.com/anjali411	2022-10-27 20:56:02 +00:00
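The layering described in #87817 (one type-erased `SymNode` with plain methods, wrapped by user-facing `SymInt`/`SymFloat` classes that alone carry the magic methods) can be sketched roughly as follows. This is a toy model under stated assumptions: the nodes hold concrete Python numbers rather than symbolic expressions, and the helpers `to_node` and `wrap` are hypothetical names, not PyTorch functions.

```python
class SymNode:
    """Type-erased node: int-ness is tested at runtime, not encoded in the type."""

    def __init__(self, value):
        self.value = value

    def is_int(self):
        return isinstance(self.value, int)

    def is_float(self):
        return isinstance(self.value, float)

    def add(self, other):
        # Plain method only; SymNode deliberately has no __add__.
        return SymNode(self.value + other.value)


def to_node(obj):
    """Hypothetical helper: unwrap user classes, wrap plain numbers."""
    if isinstance(obj, (SymInt, SymFloat)):
        return obj.node
    return SymNode(obj)


def wrap(node):
    """Hypothetical helper: pick the user class from the node's runtime kind."""
    return SymInt(node) if node.is_int() else SymFloat(node)


class SymInt:
    def __init__(self, node):
        self.node = node

    def __add__(self, other):
        return wrap(self.node.add(to_node(other)))

    __radd__ = __add__  # addition is commutative in this sketch


class SymFloat:
    def __init__(self, node):
        self.node = node

    def __add__(self, other):
        return wrap(self.node.add(to_node(other)))

    __radd__ = __add__
```

Because the result class is chosen from `is_int()` / `is_float()` at runtime, mixed int/float addition needs no per-permutation overloads, which is the problem the commit says prompted the refactor.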
albanD	3263bd24be	Improve argument printing (#87601 ) No more "expected tuple but got tuple". We appropriately grovel in the list/tuple for the element that mismatched and report what exactly twinged the failure. invalid_arguments.cpp is a shitshow so I did something slapdash to get it not completely horrible. See https://github.com/pytorch/pytorch/issues/87514 for more context. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/87601 Approved by: https://github.com/Chillee	2022-10-24 23:55:10 +00:00
samdow	169ec120ef	[Modes] refactor modes to only use a stack in cpp (#86458 ) Refactors the mode code to only have the C++ mode stack and not the "C++ mode" like we originally had. This also simplifies the mode logic in a number of places Pull Request resolved: https://github.com/pytorch/pytorch/pull/86458 Approved by: https://github.com/zou3519	2022-10-21 19:18:23 +00:00
Edward Z. Yang	954660a308	Correctly error if you pass in tensors where size arguments expected (#86126 ) This also makes symintlist track intlist exception handling, which eellison fixed. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/86126 Approved by: https://github.com/eellison	2022-10-03 20:18:41 +00:00
Edward Z. Yang	07800c9c81	Miscellaneous fixes from symbolic-shapes branch (#86042 ) - Make toIValue accept SymIntNode and SymFloatNode where number (aka Scalar) is expected - Binding for symintlistOptional in python arg parser - Teach translate to convert from IntArrayRef to ArrayRef<int64_t> - Don't query _symint function for meta info in LTC unless LTC is code generating a symint function Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/86042 Approved by: https://github.com/Chillee	2022-10-01 13:57:58 +00:00
Elias Ellison	75db0225ad	Handle fake tensor in intlist (#85759 ) Previously, we were swallowing up the Fake Tensor Exception and throwing `TypeError`, which led to https://github.com/pytorch/torchdynamo/issues/1066. Now, we are propagating back the `DataDependentOutputException`. If this approach is accepted, I can go ahead and do doublelist, symintlist, afterward. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85759 Approved by: https://github.com/ezyang	2022-09-28 21:58:54 +00:00
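The difference between swallowing the exception and propagating it, as described in #85759, can be sketched in plain Python. The parser functions and the `FakeTensorSketch` class are hypothetical stand-ins; only the exception name `DataDependentOutputException` comes from the commit message, and here it is a locally defined class rather than PyTorch's.

```python
import operator


class DataDependentOutputException(RuntimeError):
    """Local stand-in for the exception named in the commit message."""


class FakeTensorSketch:
    """Hypothetical object whose integer value is not known."""

    def __index__(self):
        raise DataDependentOutputException("value depends on tensor data")


def parse_intlist_swallowing(seq):
    # Old behaviour in this sketch: every failure becomes a TypeError.
    try:
        return [operator.index(x) for x in seq]
    except Exception:
        raise TypeError("expected a sequence of ints") from None


def parse_intlist_propagating(seq):
    # New behaviour in this sketch: the informative exception passes through.
    try:
        return [operator.index(x) for x in seq]
    except DataDependentOutputException:
        raise
    except Exception:
        raise TypeError("expected a sequence of ints") from None
```

Callers of the second function can tell "wrong argument type" apart from "value is data dependent", which the first one hides.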
Edward Z. Yang	9c036aa112	Add SymInt to Scalar (#84958 ) This is by no means comprehensive, but adds initial support for SymInt as a Scalar. Things that don't work yet but need to: - for some reason `torch.add(tensor, sym_int)` got matched to the `add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor` schema - `x + sym_int` failed bc we tried to turn `x` into a sym int: ``` "__radd__", [](c10::SymIntNode a, py::object b) -> c10::SymIntNode { auto snb = toSymIntNode(a, b); return a->add(snb); }) ``` - Many more things I'm sure Pull Request resolved: https://github.com/pytorch/pytorch/pull/84958 Approved by: https://github.com/ezyang	2022-09-25 23:51:06 +00:00
Nikolay Korovaiko	f725009a48	as_strided supports SymInt; codegen supports optional SymInt (#84393 ) Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/84393 Approved by: https://github.com/ezyang	2022-09-06 16:39:24 +00:00
Edward Z. Yang	2a332afbf4	Add SymFloat, support SymInt to SymFloat conversion (#84284 ) Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/84284 Approved by: https://github.com/albanD	2022-09-03 01:30:32 +00:00
Nikolay Korovaiko	63cbdc92a7	switching the exact check to isinstance check (#84023 ) Simplifying a type check if an object is a SymIntNode in `is_symint_node` Pull Request resolved: https://github.com/pytorch/pytorch/pull/84023 Approved by: https://github.com/ezyang	2022-08-25 08:28:40 +00:00
Nikolay Korovaiko	5b621205f4	Revert "Revert "adding a custom caster for c10::SymInt (#82692 )"" (#83223 ) This should fix the MacOS build errors and reland #82692 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83223 Approved by: https://github.com/albanD	2022-08-12 00:46:50 +00:00
PyTorch MergeBot	daeea7d2c3	Revert "adding a custom caster for c10::SymInt (#82692 )" This reverts commit `dee63f4f7b`. Reverted https://github.com/pytorch/pytorch/pull/82692 on behalf of https://github.com/seemethere due to Broke internal builds, see [logs](https://www.internalfb.com/intern/sandcastle/job/4503600373141339/insights)	2022-08-09 22:17:41 +00:00
Nikolay Korovaiko	dee63f4f7b	adding a custom caster for c10::SymInt (#82692) ### Description Adding a custom caster for `c10::SymInt`. This simplifies handling of c10::SymInt on the C++/PyTorch boundary, namely by removing the if statements that handle the union nature (e.g. SymIntNode, int) of c10::SymInt. Pull Request resolved: https://github.com/pytorch/pytorch/pull/82692 Approved by: https://github.com/ezyang	2022-08-08 21:40:53 +00:00
Peter Bell	2c2278a960	Make python TensorOption signatures consistent with JIT schemas (#82241 ) Fixes #81774 `TensorOptions` arguments in the JIT schema are optional, but in the Python API these were being translated to non-optional but with a default value. This change makes the arguments accept `None` for consistency with the JIT schema. However, it also means that `dtype=c10::nullopt` was previously completely untested so this also fixes several related bugs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/82241 Approved by: https://github.com/ngimel	2022-08-07 00:10:27 +00:00
Edward Z. Yang	a9320e6d96	Delete SymInt::data() in favor of as_int_unchecked() (#82477 ) I audited all the sites while I was at it, and marked a few suspicious ones. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/82477 Approved by: https://github.com/Chillee	2022-08-01 15:07:22 +00:00
Edward Z. Yang	50e8abbcad	Change SymIntNode into an intrusive pointer (#82548 ) This will make the pointer type a single word, which is important for packing it into an int64_t This time, this diff doesn't segfault when you build with DEBUG mode; more details at https://github.com/pybind/pybind11/issues/4099 Signed-off-by: Edward Z. Yang <ezyangfb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/82548 Approved by: https://github.com/albanD	2022-08-01 15:07:21 +00:00
Edward Z. Yang	fd5ac1e6b5	Rename SymbolicIntNode to SymIntNodeImpl (#82350 ) Done via ``` git grep -l 'SymbolicIntNode' \| xargs sed -i 's/SymbolicIntNode/SymIntNodeImpl/g' ``` Reasoning for the change: * Sym is shorter than Symbolic, and consistent with SymInt * You usually will deal in shared_ptr<...>, so we're going to reserve the shorter name (SymIntNode) for the shared pointer. But I don't want to update the Python name, so afterwards I ran ``` git grep -l _C.SymIntNodeImpl \| xargs sed -i 's/_C.SymIntNodeImpl/_C.SymIntNode/' ``` and manually fixed up the binding code Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/82350 Approved by: https://github.com/Krovatkin	2022-07-28 18:27:45 +00:00
George Qi	393f7f6ad7	add layout to slow path (#80429 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/80429 Approved by: https://github.com/ezyang	2022-07-06 18:01:31 +00:00
Edward Z. Yang	421f04dd02	Only allow numbers as tensors if operator was explicitly allowlisted so (#80587 ) Fixes https://github.com/pytorch/pytorch/issues/80508 Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/80587 Approved by: https://github.com/ngimel	2022-06-30 18:59:38 +00:00
Edward Z. Yang	f7ee061638	Wconstab/reland pysymint (#79795 ) rebased https://github.com/pytorch/pytorch/pull/79617/ to see if issues are reproducible. Pull Request resolved: https://github.com/pytorch/pytorch/pull/79795 Approved by: https://github.com/malfet	2022-06-20 22:55:06 +00:00
PyTorch MergeBot	44436947bc	Revert "Reland PySymInt (#79617 )" This reverts commit `8ef6356f26`. Reverted https://github.com/pytorch/pytorch/pull/79617 on behalf of https://github.com/zengk95 due to this is breaking periodic jobs (and maybe pull) on trunk	2022-06-16 19:40:27 +00:00
Nikolay Korovaiko	8ef6356f26	Reland PySymInt (#79617) Pull Request resolved: https://github.com/pytorch/pytorch/pull/79617 Approved by: https://github.com/Chillee	2022-06-16 04:18:06 +00:00
PyTorch MergeBot	b8db0a0475	Revert "Python Bindings for SymInts (#78135 )" This reverts commit `d332724071`. Reverted https://github.com/pytorch/pytorch/pull/78135 on behalf of https://github.com/ezyang due to broke torchvision tests	2022-06-15 13:52:14 +00:00
Nikolay Korovaiko	d332724071	Python Bindings for SymInts (#78135 ) This PR adds support for `SymInt`s in python. Namely, * `THPVariable_size` now returns `sym_sizes()` * python arg parser is modified to parse PyObjects into ints and `SymbolicIntNode`s * pybind11 bindings for `SymbolicIntNode` are added, so size expressions can be traced * a large number of tests added to demonstrate how to implement python symints. Pull Request resolved: https://github.com/pytorch/pytorch/pull/78135 Approved by: https://github.com/ezyang	2022-06-14 02:17:59 +00:00
Michael Suo	30fb2c4aba	[lint] autoformat test/cpp and torch/csrc Let's have some fun. Pull Request resolved: https://github.com/pytorch/pytorch/pull/78828 Approved by: https://github.com/ezyang	2022-06-11 21:11:16 +00:00
Elias Ellison	2d93e1fada	Add slow path for device Pull Request resolved: https://github.com/pytorch/pytorch/pull/77684 Approved by: https://github.com/ezyang	2022-05-24 21:56:01 +00:00
Michael Suo	7f1e331b34	Make SymInt constructor explicit Since we plan to have a bunch of code that is sensitive to whether or not a SymInt contains a symbolic shape or not, it seems like a bad idea to have an implicit constructor. For example, code like: ``` sizes_and_strides_.stride_at_unchecked(dim) = 0; ``` would sail through, and the `0` would get implicitly promoted to a SymInt. This is a tradeoff though: it makes code that handles `SymInt`s more clunky as `int64_t`s and integer literals need to be explicitly wrapped in `SymInt` before being used. Pull Request resolved: https://github.com/pytorch/pytorch/pull/77666 Approved by: https://github.com/ezyang	2022-05-17 22:28:35 +00:00
Nikolay Korovaiko	69e048b090	List of SymInt rebase on master Pull Request resolved: https://github.com/pytorch/pytorch/pull/75115 Approved by: https://github.com/ezyang	2022-04-20 02:09:55 +00:00
Edward Z. Yang	0a1bc5f501	Miscellaneous __torch_function__ fixes I figured these out by unconditionally turning on a no-op torch function mode on the test suite and then fixing errors as they showed up. Here's what I found: - _parse_to failed internal assert when __torch_function__'ed because it claims its name is "to" to the argument parser; added a name override so we know how to find the correct name - Infix operator magic methods on Tensor did not uniformly handle __torch_function__ and TypeError to NotImplemented. Now, we always do the __torch_function__ handling in _wrap_type_error_to_not_implemented and your implementation of __torch_function__ gets its TypeErrors converted to NotImplemented (for better or for worse; see https://github.com/pytorch/pytorch/issues/75462 ) - A few cases where code was incorrectly testing if a Tensor was Tensor-like in the wrong way, now use is_tensor_like (in grad and in distributions). Also update docs for has_torch_function to push people to use is_tensor_like. - is_grads_batched was dropped from grad in handle_torch_function, now fixed - Report that you have a torch function even if torch function is disabled if a mode is enabled. This makes it possible for a mode to return NotImplemented, pass to a subclass which does some processing and then pass back to the mode even after the subclass disables __torch_function__ (so the tensors are treated "as if" they are regular Tensors). This brings the C++ handling behavior in line with the Python behavior. - Make the Python implementation of overloaded types computation match the C++ version: when torch function is disabled, there are no overloaded types (because they all report they are not overloaded). Signed-off-by: Edward Z. Yang <ezyangfb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/75484 Approved by: https://github.com/zou3519	2022-04-11 16:52:16 +00:00
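The TypeError-to-NotImplemented conversion mentioned in the commit above can be sketched with a small decorator. This is a simplified, hypothetical version: the real `_wrap_type_error_to_not_implemented` also performs the `__torch_function__` handling, which is omitted here, and the `Meters` and `Feet` classes exist only for the demonstration.

```python
import functools


def wrap_type_error_to_not_implemented(f):
    """Simplified sketch: turn a TypeError raised by f into NotImplemented."""

    @functools.wraps(f)
    def wrapped(*args, **kwargs):
        try:
            return f(*args, **kwargs)
        except TypeError:
            # Returning NotImplemented lets Python try the reflected
            # method on the other operand instead of failing outright.
            return NotImplemented

    return wrapped


class Meters:
    def __init__(self, value):
        self.value = value

    @wrap_type_error_to_not_implemented
    def __add__(self, other):
        if not isinstance(other, Meters):
            raise TypeError("can only add Meters to Meters")
        return Meters(self.value + other.value)


class Feet:
    def __radd__(self, other):
        return "handled by Feet.__radd__"
```

Here `Meters(1) + Feet()` reaches `Feet.__radd__` because the wrapped `__add__` returns `NotImplemented` rather than raising, which is the dispatch behaviour infix operators rely on.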
Edward Z. Yang	31c86625cc	__torch_function__ mode Signed-off-by: Edward Z. Yang <ezyangfb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/75154 Approved by: https://github.com/albanD, https://github.com/zou3519	2022-04-07 02:23:29 +00:00
Edward Z. Yang	e3848d75df	Dedupe no parsing __torch_function__ handler Now there is truly only one way to call __torch_function__ and that is via handle_torch_function_no_python_arg_parser Signed-off-by: Edward Z. Yang <ezyangfb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/75159 Approved by: https://github.com/zou3519	2022-04-04 14:35:02 +00:00
Nikolay Korovaiko	5177f95d21	Introducing SymInt to Pytorch (for tracing size arithmetic) (master rebase) (#74861 ) Summary: This PR introduces `SymInt` type to Pytorch which will be used by LTC and AOTAutograd for tracing size arithmetic and tests. `SymInt` is a C++ union structure [int64_t, SymbolicIntNode*] that wraps around an int64_t field where the value of the field could be an index into a list of `shared_ptr<SymbolicIntNode>` or a real int. This PR doesn't add any support for actually tracing symbolic ints. i.e. data_ for now can only contain real ints. ``` Goal 1: just to show we can add a type to PyTorch core. (wraps int) LANDEABLE Finalize the naming - symint Want the name to be short Does invoke “size” - NO SInt/SymInt/SymbolicInt SInt could mean signed int sym_int or symint or SymInt (originally it was “int”; capitalized implies object semantics, whereas lowercase implies value semantics) JIT schema - symint C++ - symint ``` See more details here: https://docs.google.com/document/d/1iiLNwR5ohAsw_ymfnOpDsyF6L9RTUaHMpD8 (`d843f63f2a`)YLw-jxEw Pull Request resolved: https://github.com/pytorch/pytorch/pull/74861 Reviewed By: qihqi, ngimel Differential Revision: D35226230 Pulled By: Krovatkin fbshipit-source-id: 34acf342bd50fcaa4d8d5dd49c2fd6a98823a5b3 (cherry picked from commit 218643f63ef181cabb92d13a6e837eb64f2dda3c)	2022-03-31 21:59:59 +00:00
Peter Bell	40d1f77384	Codegen: python_torch_functions only include relevant operators (#68693 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68693 Generation of python bindings for native functions is split over 8 different files. One for each namespace, with the torch namespace split into 3 shards, and methods in their own file as well. This change ensures that editing any single (non-method) operator only causes one of these files to be rebuilt. Test Plan: Imported from OSS Reviewed By: jbschlosser Differential Revision: D32596270 Pulled By: albanD fbshipit-source-id: 0570ec69e7476b8f1bc21138ba18fe8f95ebbe3f (cherry picked from commit `ba0fc71a3a`)	2022-01-21 15:37:06 +00:00
Nikita Shulga	356af8f857	Do not use `ssize_t` in `python_arg_parser.[cpp|h]` (#71250 ) Summary: Use `Py_ssize_t` when calling Python API. Use `c10::irange` to automatically infer loop type. Use `size_t` or `unsigned` for unsigned type. Partially addresses https://github.com/pytorch/pytorch/issues/69948 Pull Request resolved: https://github.com/pytorch/pytorch/pull/71250 Reviewed By: atalman Differential Revision: D33569724 Pulled By: malfet fbshipit-source-id: c9eb75be9859d586c00db2f824c68840488a2822	2022-01-13 19:10:30 -08:00
Bert Maher	931352c68d	Make handle_torch_function_no_python_arg_parser public (#66054 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66054 I need this function in functorch to support the ability of custom jitted kernels to invoke torch_function when applicable. Test Plan: functorch unit tests Reviewed By: qihqi, ngimel Differential Revision: D31416599 Pulled By: bertmaher fbshipit-source-id: 90b57badd6a6b9d505ebfc436869b962b55c66d7	2021-10-06 00:27:10 -07:00
Kurt Mohler	5883523c1d	Remove dtype from torch.Storage and use only torch.ByteStorage (#62030 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62030 Remove dtype tracking from Python Storage interface, remove all the different `<type>Storage` classes except for `ByteStorage`, and update serialization accordingly, while maintaining as much FC/BC as possible Fixes https://github.com/pytorch/pytorch/issues/47442 * THE SERIALIZATION FORMAT IS FULLY FC/BC. We worked very hard to make sure this is the case. We will probably want to break FC at some point to make the serialization structure of tensors make more sense, but not today. * There is now only a single torch.ByteStorage class. Methods like `Tensor.set_` no longer check that the dtype of storage is appropriate. * As we no longer know what dtype of a storage is, we've removed the size method from Storage, replacing it with nbytes. This is to help catch otherwise silent errors where you confuse number of elements with number of bytes. * `Storage._new_shared` takes a `nbytes` kwarg and will reject previous positional only calls. `Storage._new_with_file` and `_set_from_file` require explicit element size arguments. * It's no longer possible to convert storages to different types using the float/double/etc methods. Instead, do the conversion using a tensor. * It's no longer possible to allocate a typed storage directly using FloatStorage/DoubleStorage/etc constructors. Instead, construct a tensor and extract its storage. The classes still exist but they are used purely for unpickling. * The preexisting serialization format stores dtype with storage, and in fact this dtype is used to determine the dtype of the tensor overall. To accommodate this case, we introduce a new TypedStorage concept that exists only during unpickling time which is used to temporarily store the dtype so we can construct a tensor. 
If you overrode the handling of pickling/unpickling, you MUST add handling for TypedStorage or your serialization code will degrade to standard file-based serialization. Original pull request: https://github.com/pytorch/pytorch/pull/59671 Reviewed By: soulitzer, ngimel Differential Revision: D29466819 Pulled By: ezyang fbshipit-source-id: 4a14e5d3c2b08e06e558683d97f7378a3180b00e	2021-10-05 13:50:34 -07:00
Richard Zou	67bd2a31b5	[Reland] Add python mode (#64360 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64360 This PR adds a (private) enable_python_mode context manager. (see torch/utils/_python_dispatch.py). enable_python_mode accepts the type of a __torch_dispatch__ object as its argument. Whenever an operator gets called inside of the context manager, it dispatches to the __torch_dispatch__ of the passed-in type. Example usage: ``` with enable_python_mode(LoggingTensor): z = torch.empty([]) assert isinstance(z, LoggingTensor) ``` There are quite a few changes that were made to support this. First, we added TorchDispatchTypeObject, a C++ struct that represents the type of a `__torch_dispatch__` object (e.g. LoggingTensor). It holds both the PyObject* representing the class and a PyInterpreter* so we know which Python interpreter it came from. Next, we updated the concrete_dispatch_fn in python_variable.cpp to accept a `const std::shared_ptr<TorchDispatchTypeObject>&` argument. When this is null, dispatching happens as usual. When it is non-null, we prepend the TorchDispatchTypeObject's PyObject* to the overloaded args list so that it is considered first for dispatch. To get that to work, we changed how `handle_torch_dispatch_no_python_arg_parser` works. The "overloaded args list" previously only consisted of Tensor PyObjects, but now it can have types in addition to Tensors! - We renamed `append_overloaded_arg` to `append_overloaded_tensor` - We added a new `append_overloaded_type` that appends a type to overloaded_args - We added special handling in `handle_torch_dispatch_no_python_arg_parser` and `append_overloaded_arg` to handle types in addition to Tensors. Then, there is PythonMode and PythonModeTLS. - We reuse the DispatchKey::Python dispatch key as a mode key - We use PythonMode::enter and PythonMode::exit to enable/disable DispatchKey::Python and set the PythonModeTLS. - PythonModeTLS stores a TorchDispatchTypeObject as metadata. 
- PythonMode is in libtorch_python, and PythonModeTLS is in ATen. This split is due to the libtorch_python library boundary (because we need to save TLS in ATen/ThreadLocalState) - We modify the PythonFallbackKernel to look up the relevant TorchDispatchTypeObject (if Python Mode is active) and dispatch using it. There are two more miscellaneous changes: - internal_new_from_data (torch/csrc/utils/tensor_new.cpp) gets an exclude guard. enable_python_mode currently does not handle torch.tensor and the exclude guard is to prevent a bug. Future: - This PR does not allow for the nesting of Python modes. In the future we should be able to enable this with a more sane no_dispatch API and by changing the TLS to a stack. For now I did not need this for CompositeImplicitAutograd testing. Test Plan: - new tests Reviewed By: ezyang Differential Revision: D30698082 Pulled By: zou3519 fbshipit-source-id: 7094a90eee6aa51f8b71bc4d91cfb6f49e9691f8	2021-09-16 09:02:30 -07:00

186 Commits