Summary: Introduce a utility class AOTIModelRunner to take care of running an AOTInductor-compiled model. It handles tasks such as dlopen-ing the model .so, initializing the model container, setting up inputs and outputs, and destroying the model container.
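The real runner lives in the PyTorch C++ sources; below is a minimal, hypothetical sketch of the lifecycle such a runner manages (the class name, exported symbol names, and signatures are placeholders, not the actual AOTIModelRunner API):
```
#include <dlfcn.h>
#include <cstddef>
#include <stdexcept>
#include <string>

class ToyModelRunner {
 public:
  explicit ToyModelRunner(const std::string& so_path) {
    // dlopen the compiled model .so.
    handle_ = dlopen(so_path.c_str(), RTLD_NOW | RTLD_LOCAL);
    if (handle_ == nullptr) {
      throw std::runtime_error(dlerror());
    }
    // Symbol names are placeholders for the container entry points the
    // compiled model would export.
    create_ = reinterpret_cast<CreateFn>(dlsym(handle_, "ModelContainerCreate"));
    run_ = reinterpret_cast<RunFn>(dlsym(handle_, "ModelContainerRun"));
    destroy_ = reinterpret_cast<DestroyFn>(dlsym(handle_, "ModelContainerDelete"));
    if (create_ == nullptr || run_ == nullptr || destroy_ == nullptr) {
      throw std::runtime_error("missing model container symbols");
    }
    create_(&container_);  // initialize the model container
  }

  // Run one inference call with caller-prepared input/output buffers.
  void run(void** inputs, std::size_t num_inputs,
           void** outputs, std::size_t num_outputs) {
    run_(container_, inputs, num_inputs, outputs, num_outputs);
  }

  ~ToyModelRunner() {
    if (container_ != nullptr) destroy_(container_);  // destroy the container
    if (handle_ != nullptr) dlclose(handle_);          // unload the .so
  }

 private:
  using CreateFn = int (*)(void**);
  using RunFn = int (*)(void*, void**, std::size_t, void**, std::size_t);
  using DestroyFn = int (*)(void*);

  void* handle_ = nullptr;
  void* container_ = nullptr;
  CreateFn create_ = nullptr;
  RunFn run_ = nullptr;
  DestroyFn destroy_ = nullptr;
};
```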
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110891
Approved by: https://github.com/chenyang78
ghstack dependencies: #110652
Otherwise the following error is thrown when attempting to compile with WERROR enabled:
```
In file included from /home/nshulga/git/pytorch/pytorch/torch/csrc/distributed/c10d/socket.cpp:30:
/home/nshulga/git/pytorch/pytorch/third_party/fmt/include/fmt/chrono.h:340:24: warning: redundant redeclaration of ‘constexpr’ static data member ‘fmt::v10::detail::codecvt_result<CodeUnit>::max_size’ [-Wdeprecated]
340 | constexpr const size_t codecvt_result<CodeUnit>::max_size;
| ^~~~~~~~~~~~~~~~~~~~~~~~
/home/nshulga/git/pytorch/pytorch/third_party/fmt/include/fmt/chrono.h:335:33: note: previous declaration of ‘fmt::v10::detail::codecvt_result<CodeUnit>::max_size’
335 | static constexpr const size_t max_size = 32;
| ^~~~~~~~
```
or the following if using clang as the host compiler:
```
In file included from /Users/nshulga/git/pytorch/pytorch/torch/csrc/distributed/c10d/socket.cpp:30:
/Users/nshulga/git/pytorch/pytorch/third_party/fmt/include/fmt/chrono.h:340:50: warning: out-of-line definition of constexpr static data member is redundant in C++17 and is deprecated [-Wdeprecated]
constexpr const size_t codecvt_result<CodeUnit>::max_size;
```
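For context, here is a standalone illustration (my own example, not fmt's code) of the C++17 rule behind the warning: a static constexpr data member is implicitly inline, so the out-of-line definition that pre-C++17 code needed is now redundant and deprecated.
```
#include <cstddef>

template <typename CodeUnit>
struct codecvt_result_demo {
  static constexpr std::size_t max_size = 32;  // already a definition in C++17
  CodeUnit buf[max_size];
};

// Required before C++17 if max_size was odr-used; under -std=c++17 this line
// triggers the -Wdeprecated / "out-of-line definition is redundant" warning:
// template <typename CodeUnit>
// constexpr const std::size_t codecvt_result_demo<CodeUnit>::max_size;

int main() {
  codecvt_result_demo<char> r;
  return static_cast<int>(sizeof(r.buf)) -
         static_cast<int>(codecvt_result_demo<char>::max_size);
}
```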
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111002
Approved by: https://github.com/drisspg
Avoid changing the default for other backends, as the CPU backend (GLOO) may need
longer timeouts.
Motivated by trying to save cluster time when encountering collective
hangs. Generally, collectives should time out within seconds, so 30
minutes (or 10 minutes) should provide ample headroom for edge cases.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110947
Approved by: https://github.com/xw285cornell, https://github.com/fduwjj
This reverts commit ff0358b038.
(original PR https://github.com/pytorch/pytorch/pull/108815 desc copied below)
Expose a set of observability hooks into C10D so that our users can
detect collective failures both faster and more easily.
The design is similar to NCCL desync debug in that it minimizes the
overhead by doing most of the work off the main thread.
This PR introduces a new module, torch.distributed.hooks, that exposes the following set of methods:
register_collective_start_hook
register_collective_end_hook
register_process_group_hook
The process group hook exposes PG creation on the member ranks and is called inline from
the PG creation code. This is fine since it happens during initialization and only a limited number of times.
The collective start/end hooks are fired from a single background thread, which reads
events from a C++ queue and dispatches them.
Queue notification is done using a pipe; this is needed so Python can abort the thread on shutdown
while keeping it a background thread, which is not possible with more conventional choices like a condvar.
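For illustration, here is a minimal sketch of a pipe-notified event queue (names and details are placeholders, not the actual implementation): the producer enqueues an event and writes one byte to the pipe, and the consumer polls on the read end, so the waiting thread can be woken for shutdown through the file descriptor instead of blocking on a condition variable that Python could not interrupt.
```
#include <poll.h>
#include <unistd.h>
#include <deque>
#include <mutex>
#include <optional>
#include <string>

class EventQueue {
 public:
  EventQueue() {
    if (pipe(fds_) != 0) { /* real code would handle the error */ }
  }
  ~EventQueue() {
    close(fds_[0]);
    close(fds_[1]);
  }

  void push(std::string event) {
    {
      std::lock_guard<std::mutex> guard(mu_);
      events_.push_back(std::move(event));
    }
    char byte = 0;
    (void)write(fds_[1], &byte, 1);  // wake up the consumer
  }

  // The consumer thread blocks in poll() on the read end of the pipe, so it
  // can also be woken by shutdown logic that touches the same fd.
  std::optional<std::string> pop() {
    pollfd pfd{fds_[0], POLLIN, 0};
    if (poll(&pfd, 1, /*timeout_ms=*/1000) <= 0) return std::nullopt;
    char byte;
    (void)read(fds_[0], &byte, 1);  // consume the notification
    std::lock_guard<std::mutex> guard(mu_);
    if (events_.empty()) return std::nullopt;
    std::string ev = std::move(events_.front());
    events_.pop_front();
    return ev;
  }

 private:
  int fds_[2]{-1, -1};  // fds_[0]: read end, fds_[1]: write end
  std::mutex mu_;
  std::deque<std::string> events_;
};
```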
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110907
Approved by: https://github.com/fduwjj
I tested by adding some warning logs in C++, running a distributed program, and checking that the messages now had `[rank0]:` in them. There is no existing test infra for C++ logging, so I couldn't easily add a unit test.
The implementation strategy is to set up a global variable in C++ and then poke it when we initialize a process group. This was the simplest thing I could think of that would work.
This PR only works for non-glog logging. We probably need to come up with some other strategy for glog, e.g., a custom prefix, but we need to make sure this doesn't conflict with fbcode. I can't easily test this from OSS; will leave as follow-up work.
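A minimal sketch of the strategy described (all names are hypothetical, not the actual PyTorch internals): a process-wide rank variable that the logging path consults when building its prefix, set once during process group initialization.
```
#include <atomic>
#include <iostream>
#include <string>

namespace logging_detail {
std::atomic<int> global_rank{-1};  // -1 means "rank unknown / not distributed"
}  // namespace logging_detail

// Called from process-group initialization to "poke" the global.
void set_global_rank(int rank) { logging_detail::global_rank = rank; }

// Used when composing the log message prefix.
std::string rank_prefix() {
  int r = logging_detail::global_rank.load();
  return r >= 0 ? "[rank" + std::to_string(r) + "]: " : "";
}

int main() {
  std::cerr << rank_prefix() << "before init\n";            // no prefix yet
  set_global_rank(0);                                        // done during PG init
  std::cerr << rank_prefix() << "collective timed out\n";    // "[rank0]: ..."
}
```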
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110623
Approved by: https://github.com/voznesenskym, https://github.com/wanchaol, https://github.com/fduwjj
Summary
Currently, when detecting a timeout/exception in the watchdog
workCleanupLoop, we call nccl APIs to abort all the active communicators
before finally re-raising the exception and killing the process. The
nccl APIs may hang, causing additional problems. Instead, just re-raise.
@kumpera proposed that changing this default should save us from a lot of commonly observed errors.
Note: there are other cuda/nccl API calls in our watchdog which could also hang. This change is not a substitute for a deeper refactor.
Detail
The current default (NCCL_ASYNC_ERROR_HANDLING=1:TearDown) meant the following:
SHOULD_TEAR_DOWN() evaluates to true
- This affects 'ProcessGroupNCCL::WorkNCCL::handleException`
- handleException is called from two places:
- work.wait() -> synchronizeInternal() -> handleException()
- workCleanupLoop() -> handleException()
- when true, the exception is logged and rethrown
SHOULD_CLEAN_UP() evaluates to true
- This only impacts the workCleanupLoop()
- When true, it means all communicators will be aborted (ncclCommAbort())
upon work exception or timeout
The proposed new default is NCCL_ASYNC_ERROR_HANDLING=3:SkipCleanUp.
This only changes SHOULD_CLEAN_UP() to false, impacting workCleanupLoop() behavior.
Communicators will no longer be aborted, which should avoid a class of bugs where the watchdog hangs due to calling nccl APIs which may block/hang.
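A simplified sketch of the control-flow difference (function and flag names are illustrative, not the actual ProcessGroupNCCL code): with SkipCleanUp, the cleanup loop skips the abort step, which itself can hang, and goes straight to rethrowing.
```
#include <exception>
#include <vector>

struct Comm {
  void abort() { /* may block inside NCCL */ }
};

// Old default (TearDown): true. New default (SkipCleanUp): false.
bool should_clean_up = false;

void handle_work_exception(std::exception_ptr eptr, std::vector<Comm>& comms) {
  if (should_clean_up) {
    // Old behavior: abort every active communicator first. Any of these
    // calls can hang, wedging the watchdog before it ever rethrows.
    for (auto& c : comms) {
      c.abort();
    }
  }
  // New behavior: just rethrow and let the process tear down.
  std::rethrow_exception(eptr);
}
```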
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110723
Approved by: https://github.com/fduwjj, https://github.com/xw285cornell
Fixes the string_view errors and relands the work. The previous changes in torch/csrc/utils/invalid_arguments.cpp were too aggressive and not tested thoroughly, so they are discarded.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110518
Approved by: https://github.com/ezyang
Summary: `torch.nonzero` doesn't have an inductor lowering (yet). To invoke the operator in AOT Inductor's ABI-compatibility mode, we need a dedicated shim function.
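To make "dedicated shim function" concrete, here is a hypothetical sketch of the shape such a C-ABI shim takes (the function name, handle type, and error-code convention are placeholders, not the actual shim header):
```
// Hypothetical sketch of an ABI-stable shim wrapping an aten fallback op.
// Everything crossing the boundary is an opaque handle or a plain C type.
using AtenTensorHandle = void*;

extern "C" int shim_nonzero_example(AtenTensorHandle self, AtenTensorHandle* out) {
  // Inside the shim the full C++ aten API is available: unwrap the handle,
  // call the operator, wrap the result into a new handle, and report
  // success/failure through the int return code instead of exceptions.
  // (Body omitted in this sketch.)
  (void)self;
  *out = nullptr;
  return 0;
}
```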
Test Plan:
```
$ python test/inductor/test_aot_inductor.py -k test_zero_grid_with_unbacked_symbols
...
----------------------------------------------------------------------
Ran 4 tests in 78.650s
OK
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110766
Approved by: https://github.com/chenyang78
ghstack dependencies: #110713, #110745, #110764
Summary: For unbacked SymInts, the C++ wrapper codegen can generate expressions like `buf123.size()` or `.stride()` or `.storage_offset()`:
7cc0020a80/torch/_inductor/ir.py (L2504-L2520)
Here we add corresponding methods to the `RAIIAtenTensorHandle` class so that the codegen above works in ABI-compatibility mode.
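Roughly, the addition amounts to delegating accessors on the RAII wrapper that call through the C shim to query the underlying tensor. The getter names below are placeholders (stubbed for the sketch), not the real C-ABI functions:
```
#include <cstdint>

using AtenTensorHandle = void*;  // opaque handle used across the ABI boundary

// Placeholder C-ABI getters; the real ones live in the shim layer.
extern "C" std::int64_t shim_get_size(AtenTensorHandle, std::int64_t) { return 0; }
extern "C" std::int64_t shim_get_stride(AtenTensorHandle, std::int64_t) { return 0; }
extern "C" std::int64_t shim_get_storage_offset(AtenTensorHandle) { return 0; }

class RAIIAtenTensorHandleSketch {
 public:
  explicit RAIIAtenTensorHandleSketch(AtenTensorHandle h) : handle_(h) {}

  // Mirrors what the C++ wrapper codegen emits for unbacked SymInts,
  // e.g. `buf123.size(...)`, `.stride(...)`, `.storage_offset()`.
  std::int64_t size(std::int64_t dim) const { return shim_get_size(handle_, dim); }
  std::int64_t stride(std::int64_t dim) const { return shim_get_stride(handle_, dim); }
  std::int64_t storage_offset() const { return shim_get_storage_offset(handle_); }

 private:
  AtenTensorHandle handle_;
};
```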
Test Plan: CI + the following PR
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110764
Approved by: https://github.com/chenyang78
ghstack dependencies: #110713, #110745
Summary: `repeat_interleave.Tensor` doesn't have an inductor lowering. To invoke the operator in AOT Inductor's ABI-compatibility mode, we need a dedicated shim function.
Test Plan:
```
$ python test/inductor/test_aot_inductor.py -k test_repeat_interleave
...
----------------------------------------------------------------------
Ran 4 tests in 70.526s
OK
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110745
Approved by: https://github.com/chenyang78
ghstack dependencies: #110713
Expose a set of observability hooks into C10D so that our users can
detect collective failures both faster and more easily.
The design is similar to NCCL desync debug in that it minimizes the
overhead by doing most of the work off the main thread.
This PR introduces a new module, torch.distributed.hooks, that exposes the following set of methods:
register_collective_start_hook
register_collective_end_hook
register_process_group_hook
The process group hook exposes PG creation on the member ranks and is called inline from
the PG creation code. This is fine since it happens during initialization and only a limited number of times.
The collective start/end hooks are fired from a single background thread, which reads
events from a C++ queue and dispatches them.
Queue notification is done using a pipe; this is needed so Python can abort the thread on shutdown
while keeping it a background thread, which is not possible with more conventional choices like a condvar.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108815
Approved by: https://github.com/wconstab, https://github.com/fduwjj
Defining kernels as static vars is problematic for subsequent model loading on non-default CUDA devices.
If those kernels were first loaded in the context of device #0, they are no longer nullptr, so they are not reloaded and won't work on devices other than device #0.
This change makes the devices remembered at the model level in AOT mode.
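A simplified illustration of the failure mode and the fix (the kernel-loading call is a placeholder, not actual CUDA driver API usage):
```
#include <array>
#include <cstdint>

using KernelHandle = void*;

// Placeholder loader, stubbed for the sketch; the real code would load the
// kernel in the current device's context (e.g. via the CUDA driver API).
KernelHandle loadKernelForDevice(int device) {
  return reinterpret_cast<KernelHandle>(static_cast<std::uintptr_t>(device + 1));
}

// Problematic pattern: the static is populated once, in the context of the
// first device that runs the model (usually device #0), and then reused
// verbatim everywhere else because it is no longer nullptr.
KernelHandle get_kernel_static(int device) {
  static KernelHandle kernel = nullptr;
  if (kernel == nullptr) {
    kernel = loadKernelForDevice(device);
  }
  return kernel;  // stale handle when later called with a different device
}

// Model-level pattern: each model instance remembers its own device and keeps
// per-device handles, so loading on device #1 does not reuse device #0's.
struct ModelState {
  int device = 0;
  std::array<KernelHandle, 64> kernels{};  // indexed by device; sized generously

  KernelHandle get_kernel() {
    if (kernels[device] == nullptr) {
      kernels[device] = loadKernelForDevice(device);
    }
    return kernels[device];
  }
};
```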
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110554
Approved by: https://github.com/chenyang78, https://github.com/desertfire
Fixes #106698
Also added a check to the Python API, because the current error message
```
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/sehoon/pytorch-latest/torch/nn/modules/adaptive.py", line 128, in __init__
or (min(cutoffs) <= 0) \
ValueError: min() arg is an empty sequence
```
is not very comprehensible.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106777
Approved by: https://github.com/albanD
We want to be able to use SingletonSymNode to represent strides for Jagged layout tensors. The following is for 3D, but easily generalizes to higher dimensions.
Constraints:
- [B, x, D] (where x represents the "variably lengthed dim") can be strided in two ways [x, 1, sum(x)] and [dx, d, 1]. We need two different placeholder values depending on how the jagged tensor is strided.
- When doing operations we need the strides of output tensors to be expressible in terms of the strides and sizes of the inner tensors. Given [B, x, D] @ [D, D'], the output strides are [x * D', D', 1] rather than some opaque [x2, D', 1]. This constraint exists because if I'm tracing, I need a symint to represent the output stride. This symint needs to come from somewhere; I can get it in several ways: (1) create a constant, (2) create an unbacked symint, (3) create a new input using a source, or (4) take the output of an operation on an existing symint. It is clear that (4) is what we want here, which brings us to the design below.
Design:
Given the two constraints, the most straightforward way to implement this is actually to update SingletonSymNode to include a scalar factor; morally, SingletonSymNode then represents `factor * [s_0, s_1, …, s_n]`. This enables us to symbolically compute strides from sizes.
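A toy illustration of the factor idea (not the actual SingletonSymNode implementation, which lives in PyTorch's symbolic shape machinery): multiplying the ragged symbol by a constant only scales its factor, so a stride such as x * D' stays a single symbolic value tied to the same underlying lengths.
```
#include <cassert>
#include <cstdint>

struct SingletonSym {
  std::int64_t id;      // identifies the ragged length sequence [s_0, ..., s_n]
  std::int64_t factor;  // the node stands for factor * [s_0, ..., s_n]
};

SingletonSym operator*(SingletonSym s, std::int64_t c) {
  return SingletonSym{s.id, s.factor * c};
}

int main() {
  SingletonSym x{/*id=*/0, /*factor=*/1};   // sizes of the jagged dim
  std::int64_t d_prime = 8;                 // output feature dim D'
  SingletonSym out_stride0 = x * d_prime;   // stride of dim 0 in [B, x, D']
  assert(out_stride0.id == x.id && out_stride0.factor == 8);
  return 0;
}
```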
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110369
Approved by: https://github.com/ezyang
ghstack dependencies: #110044
Summary:
This PR removes several APIs from the AOTInductor interface
that are not used by the client.
It also simplifies AOTInductor's model class by removing
the dim info for input/output tensors. We included dim info
before to return max output shapes, which the client used
to allocate memory for output tensors. Now we allocate output
tensor memory from the .so, so we don't need to maintain
such information anymore. The deletion of dim info from
the model class also simplifies the codegen quite a bit.
Test Plan: ci
Reviewed By: khabinov
Differential Revision: D49835430
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110411
Approved by: https://github.com/khabinov, https://github.com/desertfire, https://github.com/jansel
Summary: Registering a param/buffer writes into a vector inside Object; we need to maintain thread safety if we have threads reading from and writing to the vector at the same time.
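A minimal sketch of the kind of guard this implies (class and member names are illustrative, not the actual Object implementation): registration and reads both take the same mutex.
```
#include <mutex>
#include <string>
#include <vector>

class ObjectSketch {
 public:
  void register_parameter(std::string name) {
    std::lock_guard<std::mutex> guard(mutex_);
    parameters_.push_back(std::move(name));
  }

  // Copy out under the lock so callers can iterate without holding it.
  std::vector<std::string> parameter_names() const {
    std::lock_guard<std::mutex> guard(mutex_);
    return parameters_;
  }

 private:
  mutable std::mutex mutex_;
  std::vector<std::string> parameters_;
};
```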
Test Plan: CI
Differential Revision: D49882601
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110488
Approved by: https://github.com/davidberard98
Summary: This diff refactors the code by moving CUDAAllocatorConfig into the header file. This config refactoring is done so that we can use the same config code for CUDA pinned memory as well.
Test Plan: sandcastle
Differential Revision: D49653265
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110123
Approved by: https://github.com/zdevito