Commit Graph

205 Commits

Author SHA1 Message Date
Edward Z. Yang
40c44c2307 Force specialization on INT_LIST (#111216)
Follow up on https://github.com/pytorch/pytorch/pull/95479

Fixes https://github.com/pytorch/pytorch/issues/111198

Fixes https://github.com/pytorch/pytorch/issues/111197

Fixes https://github.com/pytorch/pytorch/issues/111188

Fixes https://github.com/pytorch/pytorch/issues/111201

Fixes https://github.com/pytorch/pytorch/issues/111202

I can also do this for some other types; will do that in a stacked PR on top.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111216
Approved by: https://github.com/voznesenskym
2023-10-19 12:55:18 +00:00
Kurt Mohler
4c5e43574c Reland 2: Add PyObject preservation for UntypedStorage (#109039)
Relands #103907 after it was reverted. This PR makes the new `ignore_hermetic_tls` argument of `check_pyobj` optional to avoid causing a compilation error in torchdistx.

Part of #91395
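
A minimal sketch of the shape of the fix, with hypothetical names (the real `check_pyobj` lives in PyTorch's PyObject-slot machinery and takes more context):

```cpp
#include <optional>

// Hypothetical stand-in: only the defaulting of ignore_hermetic_tls
// mirrors the change described above. Giving the new argument a default
// keeps pre-existing call sites (such as torchdistx's deferred_init.cc)
// compiling unchanged.
struct PyObjectSlotSketch {
  std::optional<void*> check_pyobj(bool ignore_hermetic_tls = false) const {
    (void)ignore_hermetic_tls;  // unused in this sketch
    return std::nullopt;
  }
};

// Old callers still compile:   slot.check_pyobj();
// New callers can opt in:      slot.check_pyobj(/*ignore_hermetic_tls=*/true);
```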

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109039
Approved by: https://github.com/ezyang
2023-09-12 22:26:05 +00:00
PyTorch MergeBot
59f605be57 Revert "Reland 2: Add PyObject preservation for UntypedStorage (#109039)"
This reverts commit 419e4e17a2.

Reverted https://github.com/pytorch/pytorch/pull/109039 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing linter job in trunk, probably due to a landrace ([comment](https://github.com/pytorch/pytorch/pull/109039#issuecomment-1715147020))
2023-09-12 07:26:11 +00:00
Kurt Mohler
419e4e17a2 Reland 2: Add PyObject preservation for UntypedStorage (#109039)
Relands #103907 after it was reverted. This PR makes the new `ignore_hermetic_tls` argument of `check_pyobj` optional to avoid causing a compilation error in torchdistx.

Part of #91395

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109039
Approved by: https://github.com/ezyang
2023-09-12 01:19:40 +00:00
PyTorch MergeBot
68238606f3 Revert "Reland: Add PyObject preservation for UntypedStorage (#103907)"
This reverts commit 56b848157c.

Reverted https://github.com/pytorch/pytorch/pull/103907 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but it is failing torchdistx build which uses check_pyobj here 9c1b9f5cb2/src/python/torchdistx/_C/deferred_init.cc (L87) ([comment](https://github.com/pytorch/pytorch/pull/103907#issuecomment-1712121158))
2023-09-08 19:27:07 +00:00
soulitzer
8d863560bd Allow adding extra dispatch keys to wrapper tensor subclass (#108808)
Updated version of https://github.com/pytorch/pytorch/pull/108313 that addresses more review comments
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108808
Approved by: https://github.com/bdhirsh
2023-09-08 18:46:09 +00:00
Kurt Mohler
56b848157c Reland: Add PyObject preservation for UntypedStorage (#103907)
This relands #97470 after #102553 reverted it. This PR attempts to fix the internal failure by avoiding an unnecessary intermediate storage buffer allocation in `c10::newStorageImplFromRefcountedDataPtr`.

Part of #91395

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103907
Approved by: https://github.com/ezyang
2023-09-07 04:24:11 +00:00
cyy
1fd4e787ce [2/N] fix clang-tidy warnings in torch/csrc (#107966)
Apply fixes to some issues found by clang-tidy in torch/csrc.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107966
Approved by: https://github.com/Skylion007
2023-08-27 18:06:21 +00:00
FFFrog
6f0d0b3850 fix type check of overflow (#107579)
Fixes #95451 and removes a duplicate check

**Code:**
```python
import torch
import sys

i = sys.maxsize + 1

input = torch.full((1, 32, 32,), 0.5)
torch.max_pool1d(input, kernel_size=[i] , stride=[i], padding=0, dilation=[i], ceil_mode=True)
```

**Result:**
```shell
Traceback (most recent call last):
  File "/root/Git.d/pytorch/samples/src/simple.py", line 13, in <module>
    torch.max_pool1d(input, kernel_size=[i] , stride=[i], padding=0, dilation=[i], ceil_mode=True)
TypeError: max_pool1d(): argument 'dilation' failed to unpack the object at pos 1 with error "Overflow when unpacking long"
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107579
Approved by: https://github.com/albanD
2023-08-23 15:34:40 +00:00
Yukio Siraichi
bcede143bd Do not mutate SymNode expression. (#107492)
This PR stops `SymNode` from mutating (i.e. simplifying) its expression. Instead, the
simplification (without mutation) is deferred to the `SymNode.maybe_as_int` method.

```python
- FakeTensor(size=(s0,), ...)
- FakeTensor(size=(s1, s2, s3), ...)

- Eq(s0, s1 + s2 + s3)

- FakeTensor(size=(s0,), ...)
- FakeTensor(size=(s1, s2, s3), ...)
```

In summary, this PR:
- Replaces `SymNode._expr` by `SymNode.expr`, removing the old property function
    - This makes it so `SymNode` instances never update their expression
- Creates a `SymNode.simplified_expr()` method for actually calling `ShapeEnv.replace` on
  its expression. Note that this doesn't update `SymNode.expr`
- Changes how `tensor.size()` gets converted to its Python `torch.Size` type
    - Instead of calling the `SymInt::maybe_as_int()` method, we create a new
      `SymInt::is_symbolic()` method for checking whether it is actually a symbolic value
    - This is needed so that when we call `tensor.size()` on the Python side, the returned
      sequence is faithful to the actual data, instead of possibly simplifying it and
      returning an integer
    - Two files need this modification (a toy sketch of the dispatch follows this list):
        - _torch/csrc/Size.cpp_: for handling `torch.Tensor.size` Python calls
        - _torch/csrc/utils/pybind.cpp_: for handling `symint.cast()` C++ calls
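
A toy sketch of that dispatch (the `wrap_symint` helper and the `ToySymInt` type are hypothetical; the real code lives in torch/csrc/Size.cpp and works on c10::SymInt):

```cpp
#include <Python.h>

// Toy stand-in for c10::SymInt, just enough to show the branch.
struct ToySymInt {
  bool symbolic;
  long long hint;
  bool is_symbolic() const { return symbolic; }
};

PyObject* wrap_symint(const ToySymInt&);  // hypothetical: wraps the SymNode

// Only wrap when the value is actually symbolic; otherwise hand back a
// plain Python int, so tensor.size() stays faithful to the stored data.
PyObject* pack_size(const ToySymInt& s) {
  if (s.is_symbolic()) {
    return wrap_symint(s);
  }
  return PyLong_FromLongLong(s.hint);
}
```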

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107492
Approved by: https://github.com/ezyang
ghstack dependencies: #107523
2023-08-22 12:38:05 +00:00
Sam Gross
d0e50d9094 Move overloaded_args from FunctionSignature to PythonArgs (#106983)
This moves the `overloaded_args` field from FunctionSignature to PythonArgs. FunctionSignature is shared by all calls and should be immutable. PythonArgs contains the parsing results for a single call to the PyTorch API.
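
A hypothetical sketch of the ownership split (field names illustrative, not the actual torch/csrc definitions):

```cpp
#include <Python.h>
#include <string>
#include <vector>

// Shared by every call to an API entry point; nothing here may be
// mutated after construction.
struct FunctionSignatureSketch {
  std::string name;
  // ... parameter metadata, parsed once at startup
};

// One of these exists per call, so it is the right home for per-call
// state such as the __torch_function__ overloads found while parsing.
struct PythonArgsSketch {
  const FunctionSignatureSketch& signature;  // read-only view of shared data
  std::vector<PyObject*> overloaded_args;    // moved here from FunctionSignature
};
```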

I did not measure a difference in performance in the "overrides_benchmark", although I expect there to be a bit more work in the common case. Note that the noise factor for the benchmark is much larger than the differences reported below:

Before:
```
Type tensor had a minimum time of 2.3615360260009766 us and a standard deviation of 0.7833134150132537 us.
Type SubTensor had a minimum time of 10.473251342773438 us and a standard deviation of 0.1973132457351312 us.
Type WithTorchFunction had a minimum time of 5.484819412231445 us and a standard deviation of 0.13305981701705605 us.
Type SubWithTorchFunction had a minimum time of 11.098146438598633 us and a standard deviation of 0.15598918253090233 us.
```
After:
```
Type tensor had a minimum time of 2.2134780883789062 us and a standard deviation of 0.802064489107579 us.
Type SubTensor had a minimum time of 10.625839233398438 us and a standard deviation of 0.15155907021835446 us.
Type WithTorchFunction had a minimum time of 5.520820617675781 us and a standard deviation of 0.23115111980587244 us.
Type SubWithTorchFunction had a minimum time of 11.227846145629883 us and a standard deviation of 0.23032321769278497 us.
```

Fixes #106974

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106983
Approved by: https://github.com/zou3519, https://github.com/ezyang, https://github.com/albanD
2023-08-16 15:59:26 +00:00
cyy
646fa36875 Add const reference in opportunities detected by clang-tidy (#105931)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105931
Approved by: https://github.com/Skylion007
2023-07-26 21:38:10 +00:00
Anthony Alayo
8d65635378 Prefixing DeviceType with c10 namespace to avoid name collisions (#104364)
Fixes #91338

Follow up from https://github.com/pytorch/pytorch/pull/91342

> 🚀 The feature, motivation and pitch
> We have an existing DeviceType class all over the place in our code base, and it conflicts with the one that is used in torch. Thankfully the PyTorch DeviceType enum class is under the c10 namespace.

```
In file included from /xxx/build/_deps/torch-src/../../aten/src/ATen/ops/view.h:5:
/xxx/_deps/torch-src/aten/src/ATen/Context.h:265:14: error: reference to 'DeviceType' is ambiguous
    if (p == DeviceType::HIP) {
             ^
/xxx/include/Common_types.h:178:8: note: candidate found by name lookup is 'DeviceType'
struct DeviceType {
       ^
/xxx/build/_deps/torch-src/c10/../c10/core/DeviceType.h:32:12: note: candidate found by name lookup is 'c10::DeviceType'
enum class DeviceType : int8_t {
```
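
The fix is to qualify the uses with their namespace, e.g.:

```cpp
#include <c10/core/DeviceType.h>

struct DeviceType {};  // user-defined type that used to collide

bool is_hip(c10::DeviceType p) {
  // Qualified lookup resolves unambiguously to the c10 enum, even with a
  // user-defined ::DeviceType in scope.
  return p == c10::DeviceType::HIP;
}
```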

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104364
Approved by: https://github.com/albanD
2023-07-07 13:23:03 +00:00
Will Feng
9541053cca [dynamo] support FakeTensor for SYM_INT/SYM_INT_LIST/INT_LIST param in python-to-cpp argument parsing (#103448)
Before this PR, when compiling a function whose signature takes symint/symintlist/intlist, we hit a runtime error like ```argument 'shifts' must be tuple of ints, not FakeTensor```. See the newly added unit test in test/dynamo/test_misc.py for a repro.

After this PR, for a FakeTensor with empty size and numel() == 1, we try to convert it into a symint/symintlist. We will likely see the expected exception ```torch._subclasses.fake_tensor.DataDependentOutputException / aten._local_scalar_dense.default``` during conversion.

Reference PRs:
* we handle FakeTensor for symintlist as the 1st vararg: https://github.com/pytorch/pytorch/pull/97508
* we handle FakeTensor for intlist in a similar way: https://github.com/pytorch/pytorch/pull/85759/files
* call local_scalar_dense on a FakeTensor: f7365eca90

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103448
Approved by: https://github.com/yanboliang
2023-06-16 21:33:40 +00:00
Shiyan Deng
685505353a Back out "Add PyObject preservation for UntypedStorage (#97470)" (#102553)
Summary:
Original commit changeset: c24708d18ccb

Original Phabricator Diff: D46159983

Test Plan: SL tests and CI

Differential Revision: D46284986

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102553
Approved by: https://github.com/DanilBaibak
2023-06-01 17:23:43 +00:00
lantiankaikai
17166c2511 python_arg_parser to allow fake tensor element in symint_list when in dynamo mode #95424 (#97508)
Failing mechanism in #95424:
In dynamo mode, a numpy.int_ passed to a 'shape'-like param (Sequence[Union[int, SymInt]]) gets wrapped as a list containing a FakeTensor. However, the python_arg_parser expects an int in the symint_list but gets a FakeTensor.

Following #85759, this PR allows tensor elements in symint_list when in dynamo mode.

This PR also fixes the tests below, which fail through a similar mechanism:
pytest ./generated/test_huggingface_diffusers.py -k test_016
pytest ./generated/test_ustcml_RecStudio.py -k test_036

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97508
Approved by: https://github.com/yanboliang
2023-05-31 19:19:17 +00:00
Kurt Mohler
5fe629e314 Add PyObject preservation for UntypedStorage (#97470)
Part of #91395

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97470
Approved by: https://github.com/ezyang
2023-05-23 01:27:30 +00:00
PandaNinjas
f0786ad776 Use %zu instead of %ld when formatting size_t (#101412)
This fixes compiling on systems where `size_t` is an `unsigned int` instead of an `unsigned long int` (32-bit Raspberry Pi OS is one example).
`%ld` expects a `long int`, while `%zu` is the dedicated conversion specifier for `size_t`.
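
A minimal illustration of the portable format:

```cpp
#include <cstddef>
#include <cstdio>

int main() {
  std::size_t n = sizeof(long);
  // %zu matches size_t on every platform; %ld assumes long, which is wrong
  // wherever size_t is a plain unsigned int (e.g. 32-bit Raspberry Pi OS).
  std::printf("sizeof(long) = %zu\n", n);
  return 0;
}
```
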
Pull Request resolved: https://github.com/pytorch/pytorch/pull/101412
Approved by: https://github.com/albanD
2023-05-16 02:45:55 +00:00
Edward Z. Yang
756a86d52c Support large negative SymInt (#99157)
The strategy is that we will heap allocate a LargeNegativeIntSymNodeImpl whenever we have a large negative int, so that we can keep the old `is_symbolic` test (now called `is_heap_allocated`) on SymInt. Whenever we need to do something with these ints, though, we convert them back into a plain `int64_t` (and then, e.g., wrap it in whatever user-specified SymNodeImpl they need). We cannot wrap directly in the user-specified SymNodeImpl as we generally do not know what the "tracing context" is from C++. We expect large negative ints to be rare, so we don't apply optimizations like singleton-ifying INT_MIN. (A toy sketch of the representation follows the review list below.) Here's the order to review:

* c10/core/SymInt.h and cpp
  * `is_symbolic` renamed to `is_heap_allocated` as I needed to audit all use sites: the old `is_symbolic` test would return true for a large negative int, but it would be wrong to then try to dispatch on the LargeNegativeIntSymNodeImpl, which supports very few operations. In this file, I had to update expect_int.
  * If you pass in a large negative integer, we instead heap allocate it in `promote_to_negative`. The function is written in a funny way to keep compact constructor code for SymInt (the heap allocation happens out of line)
  * clone is now moved out-of-line
  * New method maybe_as_int which will give you a constant int if it is possible, either because it's stored inline or in LargeNegativeIntSymNodeImpl. This is the preferred replacement for previous use of is_symbolic() and then as_int_unchecked().
  * Rename toSymNodeImpl to toSymNode, which is more correct (since it returns a SymNode)
  * Complete rewrite of `normalize_symints.cpp` to use the new `maybe_as_int`. Cannot easily use the old code structure, so it's now done using a macro and typing out each case manually (it's actually not that bad.)
  * Reimplementations of all the unary operators by hand to use `maybe_as_int`, relatively simple.
* c10/core/LargeNegativeIntSymNodeImpl.h - Just stores a int64_t value, but it has to be big and negative. Most methods are not implemented, since we will rewrap the large negative int in the real SymNodeImpl subclass before doing operations with it
* The rest of the files are just rewriting code to use `maybe_as_int`. There is a nontrivial comment in c10/core/SymIntArrayRef.h

Very minor test adjustment in c10/test/core/SymInt_test.cpp. Plan to exercise this properly in the next PR.

Companion XLA PR: https://github.com/pytorch/xla/pull/4882
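
A self-contained toy of the representation (the names echo the PR, but nothing here is the real c10 code; the inline/boxed cutoff is illustrative):

```cpp
#include <cstdint>
#include <iostream>
#include <memory>
#include <optional>

// Boxed payload; stand-in for LargeNegativeIntSymNodeImpl.
struct HeapBoxSketch {
  int64_t value;
};

class SymIntSketch {
 public:
  explicit SymIntSketch(int64_t v) {
    if (v < kMinInline) {
      // promote_to_negative-style path: heap allocate out of line
      box_ = std::make_shared<HeapBoxSketch>(HeapBoxSketch{v});
    } else {
      inline_ = v;
    }
  }
  bool is_heap_allocated() const { return box_ != nullptr; }
  // Recovers a plain constant whether it is stored inline or boxed; a truly
  // symbolic value (not modeled here) would return nullopt instead.
  std::optional<int64_t> maybe_as_int() const {
    return box_ ? box_->value : inline_;
  }

 private:
  static constexpr int64_t kMinInline = -(int64_t{1} << 62);  // illustrative
  int64_t inline_ = 0;
  std::shared_ptr<HeapBoxSketch> box_;
};

int main() {
  SymIntSketch small(-42);
  SymIntSketch huge(INT64_MIN);
  std::cout << small.is_heap_allocated() << ' ' << huge.is_heap_allocated() << '\n';  // 0 1
  std::cout << *huge.maybe_as_int() << '\n';  // constant recovered from the box
}
```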

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99157
Approved by: https://github.com/albanD
2023-04-15 22:43:51 +00:00
Tugsbayasgalan Manlaibaatar
39fd7f945f Add Symbool support in python to C++ translation (#98453)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98453
Approved by: https://github.com/ezyang
2023-04-12 03:21:57 +00:00
albanD
dda95236c9 Add fast path in our type checks and argparser (#98764)
Add a fast path for common use cases in our Python arg parsing.
This uses the observation that an exact type check (a pointer comparison) is a lot faster than a subtype check (an isinstance call), so we make sure to do the exact checks before any isinstance check.

This can be pretty significant: `a.view((1, 1, 1, 1))` goes from ~1.13us to ~800ns.
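
A hedged sketch of the trick using the CPython C API (the real parser checks many more types, and also special-cases bool):

```cpp
#include <Python.h>

// Fast path first: PyLong_CheckExact is a single pointer comparison
// against PyLong_Type, so it is nearly free for the overwhelmingly common
// case of a plain int. Only then fall back to the subclass-aware check,
// which is what an isinstance call costs.
static bool is_int_like(PyObject* obj) {
  if (PyLong_CheckExact(obj)) {
    return true;
  }
  return PyLong_Check(obj) != 0;  // slow path: accepts int subclasses too
}
```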

Full test:

Tested perf locally with cpu freq locked and script pinned to a single core to reduce jitter.
Benchmark results after doing each change in this PR one by one:
```
[albandes@albandes-fedora-K2202N0104138 test]$ # Original
[albandes@albandes-fedora-K2202N0104138 test]$ taskset 0x1 ipython foo.py
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Running  a.view(1)
827 ns ± 0.945 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1))
947 ns ± 1.23 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1, 1))
1.04 µs ± 0.882 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1, 1, 1))
1.14 µs ± 1.59 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze(0)
797 ns ± 0.955 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze((0,))
937 ns ± 1.51 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze((0, 1))
1.02 µs ± 3.52 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
[albandes@albandes-fedora-K2202N0104138 test]$ taskset 0x1 ipython foo.py
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Running  a.view(1)
823 ns ± 1.76 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1))
938 ns ± 1.38 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1, 1))
1.03 µs ± 0.801 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1, 1, 1))
1.13 µs ± 0.877 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze(0)
768 ns ± 2.27 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze((0,))
927 ns ± 0.779 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze((0, 1))
1.01 µs ± 1.34 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

[albandes@albandes-fedora-K2202N0104138 test]$ # checkLong fastpath
[albandes@albandes-fedora-K2202N0104138 test]$ taskset 0x1 ipython foo.py
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Running  a.view(1)
801 ns ± 0.982 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1))
900 ns ± 0.593 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1, 1))
1 µs ± 1.44 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1, 1, 1))
1.1 µs ± 1.38 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze(0)
782 ns ± 0.968 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze((0,))
1.11 µs ± 424 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze((0, 1))
1.09 µs ± 54.7 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
[albandes@albandes-fedora-K2202N0104138 test]$ taskset 0x1 ipython foo.py
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Running  a.view(1)
817 ns ± 0.65 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1))
912 ns ± 0.853 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1, 1))
1.02 µs ± 8.45 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1, 1, 1))
1.11 µs ± 2.53 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze(0)
781 ns ± 0.942 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze((0,))
939 ns ± 1.57 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze((0, 1))
1.01 µs ± 0.875 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

[albandes@albandes-fedora-K2202N0104138 test]$ # Tensor check fastpath
[albandes@albandes-fedora-K2202N0104138 test]$ taskset 0x1 ipython foo.py
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Running  a.view(1)
806 ns ± 2.8 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1))
903 ns ± 1.82 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1, 1))
1 µs ± 1.21 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1, 1, 1))
1.1 µs ± 1.17 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze(0)
770 ns ± 1.66 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze((0,))
931 ns ± 3.36 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze((0, 1))
1.02 µs ± 0.983 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
[albandes@albandes-fedora-K2202N0104138 test]$ taskset 0x1 ipython foo.py
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Running  a.view(1)
813 ns ± 2.42 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1))
915 ns ± 0.868 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1, 1))
1.02 µs ± 1.09 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1, 1, 1))
1.11 µs ± 1.15 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze(0)
785 ns ± 0.807 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze((0,))
941 ns ± 1.02 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze((0, 1))
1.02 µs ± 0.857 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

[albandes@albandes-fedora-K2202N0104138 test]$ # Fast path number in intlist/symintlist
[albandes@albandes-fedora-K2202N0104138 test]$ taskset 0x1 ipython foo.py
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Running  a.view(1)
728 ns ± 0.503 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1))
749 ns ± 0.829 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1, 1))
771 ns ± 0.727 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1, 1, 1))
800 ns ± 0.962 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze(0)
772 ns ± 0.622 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze((0,))
883 ns ± 0.567 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze((0, 1))
915 ns ± 0.638 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
[albandes@albandes-fedora-K2202N0104138 test]$ taskset 0x1 ipython foo.py
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Running  a.view(1)
735 ns ± 1.27 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1))
753 ns ± 2.57 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1, 1))
774 ns ± 1.38 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.view((1, 1, 1, 1))
801 ns ± 0.835 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze(0)
773 ns ± 0.677 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze((0,))
873 ns ± 1.1 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Running  a.squeeze((0, 1))
907 ns ± 0.836 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
```

<details>
  <summary>Test script</summary>

```python
import torch
from IPython import get_ipython

a = torch.empty(1)
print("Running ", "a.view(1)")
get_ipython().run_line_magic("timeit", "a.view(1)")
print("Running ", "a.view((1, 1))")
get_ipython().run_line_magic("timeit", "a.view((1, 1))")
print("Running ", "a.view((1, 1, 1))")
get_ipython().run_line_magic("timeit", "a.view((1, 1, 1))")
print("Running ", "a.view((1, 1, 1, 1))")
get_ipython().run_line_magic("timeit", "a.view((1, 1, 1, 1))")

a = torch.empty(1, 1, 1)
print("Running ", "a.squeeze(0)")
get_ipython().run_line_magic("timeit", "a.squeeze(0)")
print("Running ", "a.squeeze((0,))")
get_ipython().run_line_magic("timeit", "a.squeeze((0,))")
print("Running ", "a.squeeze((0, 1))")
get_ipython().run_line_magic("timeit", "a.squeeze((0, 1))")
```

</details>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98764
Approved by: https://github.com/ngimel
2023-04-11 00:08:26 +00:00
Elias Ellison
5c8fea5647 Reduce overhead in CUDAGraph Trees (#98529)
Significantly reduces the overhead of constructing Tensors and Storages and of checking Storage liveness. Removes the regression for the HF models that I tested and removes 75% of the overhead of the extremely overhead-bound resnet50 training we have in torchbench (0.91x base commit, 1.02x torchinductor default, 1.16x this PR, 1.25x previous cudagraphs impl).

This PR takes care of all of the lower hanging fruit.

- Computes storage aliasing at record time instead of at runtime. We no longer need to use a runtime storage cache, and can instead index directly into the existing alias if there is one, or construct a new Storage

- Moves the heavyweight C++ calls into a batch - getting storage weakrefs and constructing tensors

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98529
Approved by: https://github.com/jansel, https://github.com/ngimel
2023-04-07 05:46:08 +00:00
Escapeqyq
3112d2a2b6 Export function symbols to enable Windows build of Intel Extension for PyTorch (#98054)
This PR exports specific function symbols into the .dll shared library on the Windows platform to support the Windows build of [Intel Extension for PyTorch](https://github.com/intel/intel-extension-for-pytorch).
TORCH_API/TORCH_PYTHON_API/PYBIND11_EXPORT are macros that decorate a function as dllexport during compilation, so that the function symbol is exported into the .dll shared library file on Windows. This is necessary for other libraries (such as IPEX) to import and call these functions through dynamic linking against PyTorch on Windows.
This PR adds these decorators to the specific functions used by IPEX.
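
The usual shape of such a macro, sketched with hypothetical names (TORCH_API follows the same dllexport/dllimport pattern):

```cpp
// When building the library itself, the macro expands to dllexport so the
// symbol lands in the .dll's export table; consumers see dllimport instead.
#ifdef _WIN32
#  ifdef BUILDING_MYLIB
#    define MYLIB_API __declspec(dllexport)
#  else
#    define MYLIB_API __declspec(dllimport)
#  endif
#else
#  define MYLIB_API __attribute__((visibility("default")))
#endif

MYLIB_API void function_needed_by_ipex();  // hypothetical IPEX-facing symbol
```
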
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98054
Approved by: https://github.com/ezyang
2023-04-05 23:23:18 +00:00
cyy
f27e09de04 Cleanup Windows warning suppression in CMake and fix some warnings in the source code (#94927)
This PR does two things:
1. It moves some Windows warning suppressions from various CMake files into the main CMakeLists.txt, following the conventions of gcc and clang.
2. It fixes some Windows warnings in the source code. Most importantly, it fixes lots of dll warnings by adjusting C10_API to TORCH_API or TORCH_PYTHON_API. There are still some dll warnings because some TORCH_API functions are actually built as part of libtorch_python.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94927
Approved by: https://github.com/malfet
2023-02-27 19:22:20 +00:00
Edward Z. Yang
d78274b759 Automatically guard when SymInt is converted to int (#95479)
During enablement, we disabled int() conversions because they were
an easy way to footgun guards. We have enough of dynamic shapes
working now that this is now causing spurious errors; e.g., if you feed
a symbolic int to x.size(symint). We now allow implicit conversions
of SymInt to int here, posting a guard. We expect guard provenance
to help people debug overspecialization.
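
A hypothetical toy of "convert, but post a guard" (none of these names are the real dynamo machinery):

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Each conversion records a predicate; compiled code is only reused on
// later calls whose concrete value satisfies every recorded guard.
struct GuardListSketch {
  std::vector<std::function<bool(int64_t)>> guards;
};

struct GuardedSymIntSketch {
  int64_t hint;               // concrete value observed while tracing
  GuardListSketch* guards;

  operator int64_t() const {  // implicit int() conversion specializes
    int64_t seen = hint;
    guards->guards.push_back([seen](int64_t v) { return v == seen; });
    return seen;
  }
};
```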

Fixes https://github.com/pytorch/pytorch/issues/95328

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95479
Approved by: https://github.com/wconstab, https://github.com/voznesenskym, https://github.com/ngimel
2023-02-25 19:41:51 +00:00
cyy
bfe5e1258b avoid unnecessary static_cast (#93898)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93898
Approved by: https://github.com/Skylion007
2023-02-03 03:44:43 +00:00
Eddie Yan
e096d2db5a [BC-Breaking] Separate stream_id, device_index, and device_type in pack and unpack for Streams (#81596)
#75854

A naive attempt at working around the limitations of using a single 64-bit integer to pack `stream_id`, `device_index`, and `device_type`.

Still needs sanity checks, testing, and minimization of BC-breaking changes.

Currently a Holder for the `StreamData3` struct is used for `IValue` compatibility. While doing this seems to work for `ivalue.h` and `ivalue_inl.h`, it doesn't seem to work naively for the JIT CUDA stream wrapper. (Something about ambiguous calls if an `intrusive_ptr` to `c10::ivalue::StreamData3Holder` is used as the return type for `pack()`.) It turns out that the methods required to access the fields for rematerializing a CUDA Stream are basically already present anyway, so `pack` is simply removed in the wrapper for now and the methods to access the required fields are called directly.
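
To see why the single-word packing was limiting, a hypothetical layout (not PyTorch's actual bit assignment):

```cpp
#include <cstdint>

// Every field must fit its slot; a stream_id wider than 48 bits would
// silently corrupt the device fields. The struct trades one word for
// three and drops the bit arithmetic entirely.
constexpr uint64_t pack(uint64_t stream_id, uint64_t device_index,
                        uint64_t device_type) {
  return (stream_id & 0xFFFFFFFFFFFFull) |
         ((device_index & 0xFFull) << 48) |
         ((device_type & 0xFFull) << 56);
}

struct StreamData3Sketch {  // shape of the unpacked replacement
  int64_t stream_id;
  int64_t device_index;
  int64_t device_type;
};
```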

CC @ptrblck

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81596
Approved by: https://github.com/ezyang
2023-01-12 14:16:49 +00:00
PyTorch MergeBot
b3603f8129 Revert "Deduplicate c10 error and PyTorchError hierarchy (#87855)"
This reverts commit 34f2d3e6ae.

Reverted https://github.com/pytorch/pytorch/pull/87855 on behalf of https://github.com/osalpekar due to perf regression in quantization tests
2023-01-06 19:56:35 +00:00
William Phetsinorath
34f2d3e6ae Deduplicate c10 error and PyTorchError hierarchy (#87855)
Fixes #53370

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87855
Approved by: https://github.com/albanD
2023-01-02 15:53:36 +00:00
Aaron Gokaslan
a34a9c3471 Perf: Apply more clang-tidy fixups to torch headers (#91445)
Applies some more fixes to headers that may have been missed before for performance optimization. cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @EikanWang @ezyang, since this is more in the series of clang-tidy fixups.

This PR fixes 3 main issues:
1. Use emplacement more in headers
2. Avoid unnecessary copies and use const refs when possible
3. Default any special functions when possible to make them potentially trivial and more readable.

There is also one change in this PR that tries to prevent unnecessary math promotion; the rest of those changes are in another PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91445
Approved by: https://github.com/ezyang
2022-12-29 23:43:45 +00:00
Edward Z. Yang
f7365eca90 Add unbacked symints support; item works now (#90624)
The big idea is to add `create_unbacked_symfloat` and `create_unbacked_symint` to ShapeEnv, allowing you to allocate symbolic floats/ints corresponding to data you don't know about at compile time. Then, instead of immediately erroring out when you try to call local_scalar_dense on a FakeTensor, we instead create a fresh symint/symfloat and return that.

There are a bunch of odds and ends that need to be handled:

* A number of `numel` calls converted to `sym_numel`
* When we finally return from item(), we need to ensure we actually produce a SymInt/SymFloat when appropriate. The previous binding code assumed that you would have to get a normal Python item. I add a pybind11 binding for Scalar (to PyObject only) and refactor the code to use that. There is some trickiness where you are NOT allowed to go through c10::SymInt if there isn't actually any SymInt involved. See comment.
* One of our unit tests tripped an implicit data dependent access which occurs when you pass a Tensor as an argument to a sizes parameter. This is also converted to support symbolic shapes
* We now support tracking bare SymInt/SymFloat returns in proxy tensor mode (this was already in symbolic-shapes branch)
* Whenever we allocate an unbacked symint, we record the stack trace it was allocated at. These get printed when you attempt data dependent access on the symint (e.g., you try to guard on it)
* Subtlety: unbacked symints are not necessarily > 1. I added a test for this.

These unbacked symints are not very useful right now as you will almost always immediately raise an error later when you try to guard on them. The next logical step is adding an assertion refinement system that lets ShapeEnv learn facts about unbacked symints so it can do a better job eliding guards that are unnecessary.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90624
Approved by: https://github.com/Skylion007, https://github.com/voznesenskym
2022-12-12 13:33:07 +00:00
Edward Z. Yang
d3c01c722d Fix pybind11 problems with c10::SymInt unregistered (#88011)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88011
Approved by: https://github.com/weiwangmeta, https://github.com/albanD
2022-10-29 07:55:45 +00:00
Edward Z. Yang
1ff52225f1 Unify SymIntNode and SymFloatNode into SymNode (#87817)
This refactor was prompted by challenges handling mixed int/float
operations in C++. A previous version of this patch
(https://github.com/pytorch/pytorch/pull/87722/) added overloads for
each permutation of int/float and was unwieldy. This PR takes a
different approach.

The general outline of the patch is to combine the C++ types SymIntNode
and SymFloatNode into a single type, SymNode. This is type erased; we
no longer know statically in C++ if we have an int/float and have to test
it with the is_int()/is_float() virtual methods. This has a number of
knock-on effects (a toy sketch of the type-erased node follows the list
below).

- We no longer have C++ classes to bind to Python.  Instead, we take an
  entirely new approach to our Python API, where we have a SymInt/SymFloat
  class defined entirely in Python, which hold a SymNode (which corresponds
  to the C++ SymNode).  However, SymNode is not pybind11-bound; instead,
  it lives as-is in Python, and is wrapped into C++ SymNode using PythonSymNode
  when it goes into C++.  This implies a userland rename.

  In principle, it is also possible for the canonical implementation of SymNode
  to be written in C++, and then bound to Python with pybind11 (we have
  this code, although it is commented out.)  However, I did not implement
  this as we currently have no C++ implementations of SymNode.

  Because we do return SymInt/SymFloat from C++ bindings, the C++ binding
  code needs to know how to find these classes.  Currently, this is done
  just by manually importing torch and getting the attributes.

- Because SymInt/SymFloat are easy Python wrappers, __sym_dispatch__ now
  takes SymInt/SymFloat, rather than SymNode, bringing it in line with how
  __torch_dispatch__ works.

Some miscellaneous improvements:

- SymInt now has a constructor that takes SymNode.  Note that this
  constructor is ambiguous if you pass in a subclass of SymNode,
  so an explicit downcast is necessary.  This means toSymFloat/toSymInt
  are no more.  This is a mild optimization as it means rvalue reference
  works automatically.

- We uniformly use the caster for c10::SymInt/SymFloat, rather than
  going the long way via the SymIntNode/SymFloatNode.

- Removed some unnecessary toSymInt/toSymFloat calls in normalize_*
  functions, pretty sure this doesn't do anything.

- guard_int is now a free function, since to guard on an int you cannot
  assume the method exists.  A function can handle both int and SymInt
  inputs.

- We clean up the magic method definition code for SymInt/SymFloat/SymNode.
  ONLY the user classes (SymInt/SymFloat) get magic methods; SymNode gets
  plain methods; this is to help avoid confusion between the two types.
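
A toy of the type-erased design (illustrative only, not the real SymNodeImpl):

```cpp
#include <memory>

// One node type for both ints and floats; the payload kind is discovered
// at runtime via virtual queries rather than encoded in the static type.
struct SymNodeImplSketch {
  virtual ~SymNodeImplSketch() = default;
  virtual bool is_int() const { return false; }
  virtual bool is_float() const { return false; }
};

struct IntNodeSketch : SymNodeImplSketch {
  long long value;
  explicit IntNodeSketch(long long v) : value(v) {}
  bool is_int() const override { return true; }
};

using SymNodeSketch = std::shared_ptr<SymNodeImplSketch>;
```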

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

cc @jansel @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87817
Approved by: https://github.com/albanD, https://github.com/anjali411
2022-10-27 20:56:02 +00:00
albanD
3263bd24be Improve argument printing (#87601)
No more "expected tuple but got tuple".  We appropriately
grovel in the list/tuple for the element that mismatched
and report what exactly twinged the failure.

invalid_arguments.cpp is a shitshow so I did something
slapdash to get it not completely horrible.  See
https://github.com/pytorch/pytorch/issues/87514 for more context.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87601
Approved by: https://github.com/Chillee
2022-10-24 23:55:10 +00:00
samdow
169ec120ef [Modes] refactor modes to only use a stack in cpp (#86458)
Refactors the mode code to only have the C++ mode stack and not the "C++ mode" like we originally had. This also simplifies the mode logic in a number of places
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86458
Approved by: https://github.com/zou3519
2022-10-21 19:18:23 +00:00
Edward Z. Yang
954660a308 Correctly error if you pass in tensors where size arguments expected (#86126)
This also makes symintlist track intlist exception handling,
which eellison fixed.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86126
Approved by: https://github.com/eellison
2022-10-03 20:18:41 +00:00
Edward Z. Yang
07800c9c81 Miscellaneous fixes from symbolic-shapes branch (#86042)
- Make toIValue accept SymIntNode and SymFloatNode where number (aka Scalar) is
  expected
- Binding for symintlistOptional in python arg parser
- Teach translate to convert from IntArrayRef to ArrayRef<int64_t>
- Don't query _symint function for meta info in LTC unless LTC is
  code generating a symint function

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86042
Approved by: https://github.com/Chillee
2022-10-01 13:57:58 +00:00
Elias Ellison
75db0225ad Handle fake tensor in intlist (#85759)
Previously, we were swallowing up the Fake Tensor Exception and throwing `TypeError`, which led to https://github.com/pytorch/torchdynamo/issues/1066. Now, we are propagating back the `DataDependentOutputException`.

If this approach is accepted, I can go ahead and do doublelist and symintlist afterward.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85759
Approved by: https://github.com/ezyang
2022-09-28 21:58:54 +00:00
Edward Z. Yang
9c036aa112 Add SymInt to Scalar (#84958)
This is by no means comprehensive, but adds initial support for SymInt as a Scalar.

Things that don't work yet but need to:
- for some reason `torch.add(tensor, sym_int)` got matched to the `add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor` schema
- `x + sym_int` failed because we tried to turn `x` into a sym int:
```
              "__radd__",
              [](c10::SymIntNode a, py::object b) -> c10::SymIntNode {
                auto snb = toSymIntNode(a, b);
                return a->add(snb);
              })
 ```
- Many more things I'm sure

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84958
Approved by: https://github.com/ezyang
2022-09-25 23:51:06 +00:00
Nikolay Korovaiko
f725009a48 as_strided supports SymInt; codegen supports optional SymInt (#84393)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84393
Approved by: https://github.com/ezyang
2022-09-06 16:39:24 +00:00
Edward Z. Yang
2a332afbf4 Add SymFloat, support SymInt to SymFloat conversion (#84284)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84284
Approved by: https://github.com/albanD
2022-09-03 01:30:32 +00:00
Nikolay Korovaiko
63cbdc92a7 switching the exact check to isinstance check (#84023)
Simplifies the check in `is_symint_node` for whether an object is a SymIntNode by using an isinstance check instead of an exact type check.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84023
Approved by: https://github.com/ezyang
2022-08-25 08:28:40 +00:00
Nikolay Korovaiko
5b621205f4 Revert "Revert "adding a custom caster for c10::SymInt (#82692)"" (#83223)
This should fix the MacOS build errors and reland #82692
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83223
Approved by: https://github.com/albanD
2022-08-12 00:46:50 +00:00
PyTorch MergeBot
daeea7d2c3 Revert "adding a custom caster for c10::SymInt (#82692)"
This reverts commit dee63f4f7b.

Reverted https://github.com/pytorch/pytorch/pull/82692 on behalf of https://github.com/seemethere due to Broke internal builds, see [logs](https://www.internalfb.com/intern/sandcastle/job/4503600373141339/insights)
2022-08-09 22:17:41 +00:00
Nikolay Korovaiko
dee63f4f7b adding a custom caster for c10::SymInt (#82692)
### Description
Adding a custom caster for `c10::SymInt`. This simplifies handling of c10::SymInt on the C++/PyTorch boundary, namely by removing if statements that handle the union nature (SymIntNode vs. plain int) of c10::SymInt.
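
The general shape of such a caster, sketched for a toy value type (this is standard pybind11 machinery, not the PR's actual code):

```cpp
#include <pybind11/pybind11.h>

struct ToySymInt { long long v; };  // stand-in; the real type is c10::SymInt

namespace pybind11 {
namespace detail {
template <>
struct type_caster<ToySymInt> {
 public:
  PYBIND11_TYPE_CASTER(ToySymInt, _("ToySymInt"));

  // Python -> C++: accept a plain int here; a real caster would also
  // accept the symbolic node type, hiding the union from call sites.
  bool load(handle src, bool /*convert*/) {
    if (!PyLong_Check(src.ptr())) return false;
    value.v = PyLong_AsLongLong(src.ptr());
    return !PyErr_Occurred();
  }

  // C++ -> Python
  static handle cast(const ToySymInt& s, return_value_policy, handle) {
    return PyLong_FromLongLong(s.v);
  }
};
}  // namespace detail
}  // namespace pybind11
```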

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82692
Approved by: https://github.com/ezyang
2022-08-08 21:40:53 +00:00
Peter Bell
2c2278a960 Make python TensorOption signatures consistent with JIT schemas (#82241)
Fixes #81774

`TensorOptions` arguments in the JIT schema are optional, but in the Python API these were being translated to non-optional but with a default value. This change makes the arguments accept `None` for consistency with the JIT schema. However, it also means that `dtype=c10::nullopt` was previously completely untested so this also fixes several related bugs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82241
Approved by: https://github.com/ngimel
2022-08-07 00:10:27 +00:00
Edward Z. Yang
a9320e6d96 Delete SymInt::data() in favor of as_int_unchecked() (#82477)
I audited all the sites while I was at it, and marked a few suspicious
ones.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82477
Approved by: https://github.com/Chillee
2022-08-01 15:07:22 +00:00
Edward Z. Yang
50e8abbcad Change SymIntNode into an intrusive pointer (#82548)
This will make the pointer type a single word, which is important
for packing it into an int64_t.

This time, this diff doesn't segfault when you build with DEBUG mode; more details at https://github.com/pybind/pybind11/issues/4099

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82548
Approved by: https://github.com/albanD
2022-08-01 15:07:21 +00:00
Edward Z. Yang
fd5ac1e6b5 Rename SymbolicIntNode to SymIntNodeImpl (#82350)
Done via

```
git grep -l 'SymbolicIntNode' | xargs sed -i 's/SymbolicIntNode/SymIntNodeImpl/g'
```

Reasoning for the change:

* Sym is shorter than Symbolic, and consistent with SymInt
* You usually will deal in shared_ptr<...>, so we're going to
  reserve the shorter name (SymIntNode) for the shared pointer.

But I don't want to update the Python name, so afterwards I ran

```
 git grep -l _C.SymIntNodeImpl | xargs sed -i 's/_C.SymIntNodeImpl/_C.SymIntNode/'
```

and manually fixed up the binding code

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82350
Approved by: https://github.com/Krovatkin
2022-07-28 18:27:45 +00:00
George Qi
393f7f6ad7 add layout to slow path (#80429)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80429
Approved by: https://github.com/ezyang
2022-07-06 18:01:31 +00:00