Commit Graph

12789 Commits

Author SHA1 Message Date
Bin Bao
3058700f7f [aotinductor] Add AOTIModelRunner as a utility class (#110891)
Summary: Introduce a utility class AOTIModelRunner to take care of running an AOTInductor-compiled model. It does things like dlopen a model, initialize the model container, set up inputs and outputs, and destroy the model container.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110891
Approved by: https://github.com/chenyang78
ghstack dependencies: #110652
2023-10-11 15:58:28 +00:00
Nikita Shulga
ef19824db8 Suppress warnings in tensorpipe.h (#111012)
To fix distributed compilation with clang-15

Fixes https://github.com/pytorch/pytorch/issues/110974

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111012
Approved by: https://github.com/huydhn, https://github.com/drisspg, https://github.com/Skylion007
2023-10-11 15:41:30 +00:00
Michael Voznesensky
1e7947b3e0 Revert "Reland 3rd try [finishing colesbury's PR 100642] Guard on nn.Module dicts and type (#109323)" + Forward fixes + test (#110964)
This reverts commit f786fbdebd.

Forward fixes

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110964
Approved by: https://github.com/ezyang, https://github.com/anijain2305
2023-10-11 05:16:47 +00:00
Nikita Shulga
e49ea87162 Fix socket.cpp compilation using gcc-9.4 (#111002)
Otherwise the following error is thrown when attempting to compile with WERROR enabled:
```
In file included from /home/nshulga/git/pytorch/pytorch/torch/csrc/distributed/c10d/socket.cpp:30:
/home/nshulga/git/pytorch/pytorch/third_party/fmt/include/fmt/chrono.h:340:24: warning: redundant redeclaration of ‘constexpr’ static data member ‘fmt::v10::detail::codecvt_result<CodeUnit>::max_size’ [-Wdeprecated]
  340 | constexpr const size_t codecvt_result<CodeUnit>::max_size;
      |                        ^~~~~~~~~~~~~~~~~~~~~~~~
/home/nshulga/git/pytorch/pytorch/third_party/fmt/include/fmt/chrono.h:335:33: note: previous declaration of ‘fmt::v10::detail::codecvt_result<CodeUnit>::max_size’
  335 |   static constexpr const size_t max_size = 32;
      |                                 ^~~~~~~~
```
or the following if using clang as the host compiler
```
In file included from /Users/nshulga/git/pytorch/pytorch/torch/csrc/distributed/c10d/socket.cpp:30:
/Users/nshulga/git/pytorch/pytorch/third_party/fmt/include/fmt/chrono.h:340:50: warning: out-of-line definition of constexpr static data member is redundant in C++17 and is deprecated [-Wdeprecated]
constexpr const size_t codecvt_result<CodeUnit>::max_size;
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111002
Approved by: https://github.com/drisspg
2023-10-11 05:16:00 +00:00
PyTorch MergeBot
314a502eb0 Revert "Reland "[C10] PG observability hooks. (#108815)" (#110907)"
This reverts commit 7678cd22af.

Reverted https://github.com/pytorch/pytorch/pull/110907 on behalf of https://github.com/huydhn due to Sorry for reverting this, but macos job in trunk starts failing after this 7678cd22af ([comment](https://github.com/pytorch/pytorch/pull/110907#issuecomment-1756497387))
2023-10-11 00:23:42 +00:00
Will Constable
ca03f36233 Change ProcessGroupNCCL default timeout to 10 min (#110947)
Avoid changing the default for other backends, as the CPU backend (GLOO) may need longer timeouts.

Motivated by trying to save cluster time when encountering collective hangs. Generally collectives should time out within seconds, and 30 minutes (or 10 minutes) should provide ample headroom for edge cases.
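
For reference, a job can still opt into an explicit timeout at init time rather than relying on the backend default; a minimal sketch (assuming a standard NCCL setup, with the 10-minute value mirroring the new default):

```
from datetime import timedelta
import torch.distributed as dist

# Pass a timeout explicitly instead of relying on the backend default;
# collectives that exceed it are treated as failed by the watchdog.
dist.init_process_group(
    backend="nccl",
    timeout=timedelta(minutes=10),
)
```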
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110947
Approved by: https://github.com/xw285cornell, https://github.com/fduwjj
2023-10-10 22:28:39 +00:00
Will Constable
7678cd22af Reland "[C10] PG observability hooks. (#108815)" (#110907)
This reverts commit ff0358b038.

(original PR https://github.com/pytorch/pytorch/pull/108815 desc copied below)

Expose a set of observability hooks into C10D such that our users can
detect collectives failure both faster and more easily.

The design is similar to NCCL desync debug in that it minimizes overhead by doing most of the work outside the main thread.

This PR introduces a new module torch.distributed.hooks that exposes the following set of methods:

    register_collective_start_hook
    register_collective_end_hook
    register_process_group_hook

The process group hook exposes PG creation on the member ranks and is called inline from the PG creation code. This is fine since this happens during initialization and only a limited number of times.

The collective start/end hooks are fired from a single background thread, which reads events from a C++ queue and dispatches them.

Queue notification is, somewhat unusually, done using a pipe; this is needed so Python can abort the thread on shutdown and run it as a background thread. This is not possible with more reasonable choices like a condvar.
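
A usage sketch based only on the method names listed above; the callback signatures and payloads are assumptions, not the actual API (and note this module is reverted again later in this log):

```
from torch.distributed import hooks  # module introduced by this PR

def on_collective_start(event):
    # assumed: `event` carries metadata such as the process group and op name
    print("collective started:", event)

def on_collective_end(event):
    print("collective finished:", event)

def on_pg_created(event):
    print("process group created on this rank:", event)

hooks.register_collective_start_hook(on_collective_start)
hooks.register_collective_end_hook(on_collective_end)
hooks.register_process_group_hook(on_pg_created)
```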

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110907
Approved by: https://github.com/fduwjj
2023-10-10 20:09:40 +00:00
soulitzer
df9a6bcaef [reland] Symintify guards.cpp (#110675)
reland of https://github.com/pytorch/pytorch/pull/110371

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110675
Approved by: https://github.com/ezyang
ghstack dependencies: #110673, #110674
2023-10-10 19:37:17 +00:00
soulitzer
fda0a965c7 [reland] Support SingletonSymNode mul with coefficient (#110673)
reland of https://github.com/pytorch/pytorch/pull/110369
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110673
Approved by: https://github.com/ezyang
2023-10-10 19:37:17 +00:00
jjsjann123
37567fdf31 Nvfuser cpp api deprecation attempt 2 (#110881)
Attempting to retry #110318, deprecating the nvfuser C++ API.

The warning has been updated to TORCH_WARN_ONCE.
The warning thrown inside torch::jit::fuser::cuda::isEnabled() is turned off and will be deprecated when we pull out the TorchScript integration in a follow-up PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110881
Approved by: https://github.com/davidberard98, https://github.com/NicolasHug
2023-10-10 08:07:03 +00:00
Edward Z. Yang
de3ae93e9b Include rank of default PG in C++ log messages (#110623)
I tested by adding some warning logs in C++, running a distributed program, and showing that they now had `[rank0]:` in the messages. There is no existing test infra for C++ logging, so I couldn't easily add a unit test.

The implementation strategy is to setup a global variable in C++, and then poke it when we initialize a process group. This was the simplest thing I could think of that would work.

This PR only works for non-glog logging. We probably need to come up with some other strategy for glog, e.g., a custom prefix, but need to make sure this doesn't conflict with fbcode. I can't easily test this from OSS, so I will leave it as follow-up work.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110623
Approved by: https://github.com/voznesenskym, https://github.com/wanchaol, https://github.com/fduwjj
2023-10-10 00:26:52 +00:00
Will Constable
733368a822 Change default NCCL_ASYNC_ERROR_HANDLING to 3:SkipCleanUp (#110723)
Summary

Currently, when detecting a timeout/exception in the watchdog
workCleanupLoop, we call nccl APIs to abort all the active communicators
before finally re-raising the exception and killing the process.  The
nccl APIs may hang, causing additional problems. Instead, just re-raise.

@kumpera proposed that changing this default should save us from a lot of commonly observed errors.

Note: there are other cuda/nccl API calls in our watchdog which could also hang. This change is not a substitute for a deeper refactor.

Detail

The current default (NCCL_ASYNC_ERROR_HANDLING=1:TearDown) meant the following:

SHOULD_TEAR_DOWN() evaluates to true
  - This affects 'ProcessGroupNCCL::WorkNCCL::handleException`
  - handleException is called from two places:
     - work.wait() -> synchronizeInternal() -> handleException()
     - workCleanupLoop() -> handleException()
  - when true, the exception is logged and rethrown

SHOULD_CLEAN_UP() evaluates to true
  - This only impacts the workCleanupLoop()
  - When true, it means all communicators will be aborted (ncclCommAbort())
    upon work exception or timeout

The proposed new default is NCCL_ASYNC_ERROR_HANDLING=3:SkipCleanUp.

This only changes SHOULD_CLEAN_UP() to false, impacting workCleanupLoop() behavior.
Communicators will no longer be aborted, which should avoid a class of bugs where the watchdog hangs due to calling nccl APIs which may block/hang.
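
For illustration, opting back into the old abort-on-timeout behavior would be a matter of setting the variable described above before the process group is created (a hedged sketch; the value strings follow the names in this description):

```
import os

# 1:TearDown   = old default (abort communicators, then re-raise);
# 3:SkipCleanUp = new default (skip the ncclCommAbort calls, just re-raise).
os.environ["NCCL_ASYNC_ERROR_HANDLING"] = "1"  # must be set before communicators are created
```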
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110723
Approved by: https://github.com/fduwjj, https://github.com/xw285cornell
2023-10-09 21:38:32 +00:00
Kazuaki Ishizaki
b5f9696d81 Fix typo under torch directory (#110824)
This PR fixes the typo `the the` in comments and exception messages in files under the `torch` directory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110824
Approved by: https://github.com/H-Huang
2023-10-09 19:16:43 +00:00
PyTorch MergeBot
d1c157c598 Revert "[reland] Update custom Function preserve torch function when inputs r… (#110679)"
This reverts commit 563728f61c.

Reverted https://github.com/pytorch/pytorch/pull/110679 on behalf of https://github.com/kit1980 due to The diff has Meta-internal changes, please land from Phabricator ([comment](https://github.com/pytorch/pytorch/pull/110679#issuecomment-1753523182))
2023-10-09 19:09:01 +00:00
PyTorch MergeBot
bbdc8c7b05 Revert "deprecating nvfuser c++ API (#110318)"
This reverts commit bf0866fc16.

Reverted https://github.com/pytorch/pytorch/pull/110318 on behalf of https://github.com/davidberard98 due to too many warnings being thrown in torchvision https://github.com/pytorch/pytorch/issues/110857 ([comment](https://github.com/pytorch/pytorch/pull/110318#issuecomment-1753245449))
2023-10-09 15:41:50 +00:00
cyy
3ec33957eb [1/N] Enable Wunused-result and Wunused-variable in torch targets (#110722)
They are useful for checking results of function calls.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110722
Approved by: https://github.com/Skylion007
2023-10-08 23:43:45 +00:00
albanD
8edb561631 Fix use after free in tensor creation (#106707)
Fix https://github.com/pytorch/pytorch/issues/106534
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106707
Approved by: https://github.com/Skylion007, https://github.com/ezyang
2023-10-07 22:41:21 +00:00
cyy
c3e4e4f6d2 [4/N] Add -Wdeprecated and related fixes (#110204)
This PR enables -Wdeprecated on torch_cpu.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110204
Approved by: https://github.com/ezyang
2023-10-07 19:46:08 +00:00
cyy
12f97bb2e9 [Reland][3/N] Add -Wdeprecated and related fixes (#110518)
Fixes the string_view errors and relands the work. The previous changes in torch/csrc/utils/invalid_arguments.cpp were too aggressive and not tested thoroughly; they are discarded.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110518
Approved by: https://github.com/ezyang
2023-10-07 08:38:40 +00:00
Adnan Akhundov
98b79e9488 [inductor] Add AOTI ABI shim function for torch.nonzero (#110766)
Summary: `torch.nonzero` doesn't have an inductor lowering (yet). To invoke the operator in AOT Inductor's ABI compatibility mode, we need a dedicated shim function.

Test Plan:

```
$ python test/inductor/test_aot_inductor.py -k test_zero_grid_with_unbacked_symbols
...
----------------------------------------------------------------------
Ran 4 tests in 78.650s

OK
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110766
Approved by: https://github.com/chenyang78
ghstack dependencies: #110713, #110745, #110764
2023-10-07 08:32:27 +00:00
Adnan Akhundov
13a2f42635 [inductor] Add size, stride, storage_offset to RAIIAtenTensorHandle (#110764)
Summary: For unbacked SymInts, the C++ wrapper codegen can generate expressions like `buf123.size()` or `.stride()` or `.storage_offset()`:

7cc0020a80/torch/_inductor/ir.py (L2504-L2520)

Here we add corresponding methods to the `RAIIAtenTensorHandle` class so that the above codegen works in the ABI compatibility mode.

Test Plan: CI + the following PR

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110764
Approved by: https://github.com/chenyang78
ghstack dependencies: #110713, #110745
2023-10-07 08:26:42 +00:00
Adnan Akhundov
abb00f66d8 [inductor] Add AOTI ABI shim function for repeat_interleave.Tensor (#110745)
Summary: `repeat_interleave.Tensor` doesn't have an inductor lowering. To invoke the operator in AOT Inductor's ABI compatibility mode, we need a dedicated shim function.

Test Plan:

```
$ python test/inductor/test_aot_inductor.py -k test_repeat_interleave
...
----------------------------------------------------------------------
Ran 4 tests in 70.526s

OK
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110745
Approved by: https://github.com/chenyang78
ghstack dependencies: #110713
2023-10-07 08:18:01 +00:00
Mu-Chu Lee
840e68301c [AOTInductor] Change UpdateConstants to UpdateConstantsMap (#110576)
Summary: Change name of UpdateConstants to UpdateConstantsMap

Differential Revision: D49937744

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110576
Approved by: https://github.com/chenyang78, https://github.com/khabinov
2023-10-07 07:36:57 +00:00
jjsjann123
bf0866fc16 deprecating nvfuser c++ API (#110318)
deprecating nvfuser c++ API

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110318
Approved by: https://github.com/davidberard98
2023-10-07 02:25:21 +00:00
soulitzer
563728f61c [reland] Update custom Function preserve torch function when inputs returned as-is (#110679)

reland of https://github.com/pytorch/pytorch/pull/109825#issuecomment-1749803837

Opening this without ghstack to do codev. In our PR, we changed the signature of `_wrap_outputs`. There is some internal code that calls `_wrap_outputs` directly, so we also need to update that callsite.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110679
Approved by: https://github.com/albanD
2023-10-07 00:27:45 +00:00
Adnan Akhundov
2aa3064364 [inductor] Add aoti_torch_dtype_bool to AOTI ABI shim (#110713)
Summary: ATT

Test Plan: CI

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110713
Approved by: https://github.com/chenyang78, https://github.com/desertfire
2023-10-06 22:16:39 +00:00
George White
f4796df914 Add support for generators on the IPU device (#110704)
This change adds hooks similar to those used for other device types, allowing Torch to create and use generators provided by the IPU backend.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110704
Approved by: https://github.com/ezyang
2023-10-06 21:36:14 +00:00
PyTorch MergeBot
ff0358b038 Revert "[C10] PG observability hooks. (#108815)"
This reverts commit 0c7a877745.

Reverted https://github.com/pytorch/pytorch/pull/108815 on behalf of https://github.com/albanD due to Add a new torch.distributed.hooks namespace but does not document it, test was added this morning ([comment](https://github.com/pytorch/pytorch/pull/108815#issuecomment-1751327751))
2023-10-06 19:49:49 +00:00
Rodrigo Kumpera
0c7a877745 [C10] PG observability hooks. (#108815)
Expose a set of observability hooks into C10D such that our users can
detect collectives failure both faster and more easily.

The design is similar to NCCL desync debug in that it minimizes overhead by doing most of the work outside the main thread.

This PR introduces a new module torch.distributed.hooks that exposes the following set of methods:

    register_collective_start_hook
    register_collective_end_hook
    register_process_group_hook

The process group hook exposes PG creation on the member ranks and is called inline from the PG creation code. This is fine since this happens during initialization and only a limited number of times.

The collective start/end hooks are fired from a single background thread, which reads events from a C++ queue and dispatches them.

Queue notification is, somewhat unusually, done using a pipe; this is needed so Python can abort the thread on shutdown and run it as a background thread. This is not possible with more reasonable choices like a condvar.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108815
Approved by: https://github.com/wconstab, https://github.com/fduwjj
2023-10-06 18:52:46 +00:00
Joel Schlosser
17348b0f51 Implement split_with_sizes backward for NT (#110647)
Needed internally. Note that `split_with_sizes()` for NT is currently supported only on `dim=-1`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110647
Approved by: https://github.com/cpuhrsch, https://github.com/soulitzer
ghstack dependencies: #110646
2023-10-06 18:44:22 +00:00
Joel Schlosser
48240ec62e Make unbind() overrideable for NT subclass (#110646)
Reland of #109122. Fixed the memory leak by not saving the outputs of `unbind()` for backward. Rather, the NT sizes are saved so undefined grads can be replaced with zeros of the correct size.
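
A hypothetical sketch of the pattern this enables (the nested-tensor construction details and layout requirements are assumptions, not taken from the PR):

```
import torch

# Build a small nested tensor that requires grad, unbind it, and backprop
# through only one of the components; per the fix above, the grads for the
# unused component are materialized as zeros of the correct size.
nt = torch.nested.nested_tensor(
    [torch.randn(2, 3), torch.randn(4, 3)], requires_grad=True
)
a, b = nt.unbind()
a.sum().backward()
```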
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110646
Approved by: https://github.com/soulitzer, https://github.com/cpuhrsch
2023-10-06 18:44:22 +00:00
cyy
e75f2e2ea1 Fix clang-tidy warnings in CUDAPluggableAllocator (#110678)
This PR fixes clang-tidy warnings in CUDAPluggableAllocator.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110678
Approved by: https://github.com/Skylion007
2023-10-06 17:33:08 +00:00
Michael Voznesensky
7d98549ca9 retain_graph=True in compiled_autograd (#110367)
Adds support for retain_graph=True (known as keep_graph_ internally in the autograd engine).
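
For context, the eager-mode semantics being matched; a minimal sketch (the compiled_autograd enablement itself is elided here):

```
import torch

x = torch.randn(4, requires_grad=True)
y = (x * x).sum()

# retain_graph=True keeps the graph alive so backward can run a second time;
# this PR makes the same flag usable when the backward pass is compiled.
y.backward(retain_graph=True)
y.backward()
print(x.grad)  # gradients accumulate across the two calls: 4 * x
```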

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110367
Approved by: https://github.com/jansel
2023-10-06 08:22:10 +00:00
cyy
11b3210a11 [Reland2] Remove calls of c10::either (#110487)
This PR is a reland of #109707 with fixes for the MSVC failures.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110487
Approved by: https://github.com/soulitzer
2023-10-06 00:25:15 +00:00
PyTorch MergeBot
1c3fae46ee Revert "Support SingletonSymNode mul with coefficient (#110369)"
This reverts commit eb8feb8ff8.

Reverted https://github.com/pytorch/pytorch/pull/110369 on behalf of https://github.com/PaliC due to bottom diff is causing a plethora of internal failures ([comment](https://github.com/pytorch/pytorch/pull/110369#issuecomment-1749802899))
2023-10-05 23:51:28 +00:00
PyTorch MergeBot
236afe73a2 Revert "Update custom Function preserve torch function when inputs returned as-is (#109825)"
This reverts commit 4e73eee93f.

Reverted https://github.com/pytorch/pytorch/pull/109825 on behalf of https://github.com/PaliC due to causing a plethora of internal failures ([comment](https://github.com/pytorch/pytorch/pull/109825#issuecomment-1749802739))
2023-10-05 23:49:41 +00:00
PyTorch MergeBot
585e2bd818 Revert "Symintify guards.cpp (#110371)"
This reverts commit e1cfcdfa06.

Reverted https://github.com/pytorch/pytorch/pull/110371 on behalf of https://github.com/PaliC due to bottom diff is causing a plethora of internal failures ([comment](https://github.com/pytorch/pytorch/pull/110371#issuecomment-1749798063))
2023-10-05 23:42:35 +00:00
Scott Wolchok
9e72c9cccd [torch] easy missing move in aoti_runtime/model.h (#110469)
Just an extra shared_ptr copy, nothing fancy.

Differential Revision: [D49792510](https://our.internmc.facebook.com/intern/diff/D49792510/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110469
Approved by: https://github.com/Skylion007
2023-10-05 20:56:06 +00:00
Jason Park
26f634eefb Enable aarch64 for fixing undefined symbol error. (#110542)
Summary: ARM can be safely supported

Reviewed By: andrewjcg, aaronenyeshi

Differential Revision: D49921679

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110542
Approved by: https://github.com/aaronenyeshi
2023-10-05 16:16:06 +00:00
Oleg Khabinov
cf1b494afd [AOTInductor] Store loaded kernels in the model (#110554)
Defining kernels as static vars is problematic for subsequent model loading on non-default CUDA devices.

If those kernels were loaded in the context of device #0, they are no longer nullptr, and therefore won't work on devices other than device #0.

This change makes the loaded kernels remembered at the model level in AOT mode.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110554
Approved by: https://github.com/chenyang78, https://github.com/desertfire
2023-10-05 10:17:05 +00:00
Sehoon Kim
c36b31d530 torch::nn::AdaptiveLogSoftmaxWithLoss: check length of cutoffs (#106777)
Fixes #106698

Also added a check for the Python API, because the current error message
```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/sehoon/pytorch-latest/torch/nn/modules/adaptive.py", line 128, in __init__
    or (min(cutoffs) <= 0) \
ValueError: min() arg is an empty sequence
```
is not very comprehensible.
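
For illustration, the call that previously surfaced the opaque message above (the wording of the new error is not reproduced here):

```
import torch.nn as nn

# An empty `cutoffs` sequence is invalid; before this change it failed with
# the bare "min() arg is an empty sequence" ValueError shown in the traceback.
nn.AdaptiveLogSoftmaxWithLoss(in_features=16, n_classes=10, cutoffs=[])
```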

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106777
Approved by: https://github.com/albanD
2023-10-05 05:35:47 +00:00
Edward Z. Yang
6a974bec5d Change flash attention outputs to be SymInt instead of int (#110533)
Fixes https://github.com/pytorch/pytorch/issues/110322

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110533
Approved by: https://github.com/albanD
2023-10-05 01:00:07 +00:00
soulitzer
e1cfcdfa06 Symintify guards.cpp (#110371)
Separating this out so we can check perf more easily

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110371
Approved by: https://github.com/ezyang
ghstack dependencies: #110044, #110369, #110370
2023-10-04 22:56:42 +00:00
soulitzer
eb8feb8ff8 Support SingletonSymNode mul with coefficient (#110369)
We want to be able to use SingletonSymNode to represent strides for Jagged layout tensor. The following is for 3D, but easily generalizable to higher dimensions.

Constraints:
- [B, x, D] (where x represents the "variably lengthed dim") can be strided in two ways [x, 1, sum(x)] and [dx, d, 1]. We need two different placeholder values depending on how the jagged tensor is strided.
- When doing operations we need the strides of output tensors to be expressible in terms of the strides and sizes of the inner tensors. Given [B, x, D] @ [D, D'], the output strides are [x * D', D', 1] rather than some opaque [x2, D', 1]. This constraint exists because if I'm tracing, I need a symint to represent the output stride. This symint needs to come from somewhere; I can get it in one of several ways: (1) create a constant, (2) unbacked symint, (3) create a new input using a source, (4) output of an operation on an existing symint. It is clear that (4) is what we want here, which brings us to the design below.

Design:

Given the two constraints, the most straightforward way to implement this is to update SingletonSymNode to include a scalar factor, i.e. morally, SingletonSymNode represents `factor * [s_0, s_1, …, s_n]`. This enables us to symbolically compute strides from sizes.
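
A plain-Python illustration of the idea (names here are illustrative only, not the SymNode API):

```
# Toy model of a singleton symbolic size carrying a scalar factor.
class FactoredSingleton:
    def __init__(self, factor=1, name="x"):
        self.factor = factor
        self.name = name

    def __mul__(self, coeff):
        # Multiplying by a constant folds into the factor, so the result is
        # still expressed in terms of the same symbol x rather than a new
        # opaque symbol x2.
        return FactoredSingleton(self.factor * coeff, self.name)

    def __repr__(self):
        return f"{self.factor}*{self.name}"

x = FactoredSingleton()             # the jagged dim of [B, x, D]
D_out = 8
# Output of [B, x, D] @ [D, D_out] has strides [x * D_out, D_out, 1]:
print([x * D_out, D_out, 1])        # [8*x, 8, 1]
```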
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110369
Approved by: https://github.com/ezyang
ghstack dependencies: #110044
2023-10-04 22:56:15 +00:00
soulitzer
4e73eee93f Update custom Function preserve torch function when inputs returned as-is (#109825)
Fixes https://github.com/pytorch/pytorch/issues/109805
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109825
Approved by: https://github.com/albanD
2023-10-04 22:45:11 +00:00
Yang Chen
46a5558cd5 [AOTInductor] Simplified AOTInductor interface and model class (#110411)
Summary:
This PR removed several APIs from the AOTInductor interface that are not used by the client.

It also simplified AOTInductor's model class by removing
the dim info for input/output tensors. We included dim info
before to return max output shapes, which was used by the client
to allocate memory for output tensors. Now, we allocate output
tensor memory from the .so so that we don't need to maintain
such information any more. The deletion of dim info from
the model class also simplified the codegen quite a bit.

Test Plan: ci

Reviewed By: khabinov

Differential Revision: D49835430

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110411
Approved by: https://github.com/khabinov, https://github.com/desertfire, https://github.com/jansel
2023-10-04 18:35:24 +00:00
Oguz Ulgen
f04b1a0d27 [AOTInductor] Implement autograd eager backend for native triton kernels (#110403)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110403
Approved by: https://github.com/zou3519, https://github.com/bdhirsh
2023-10-04 17:56:56 +00:00
Shiyan Deng
247c574313 [jit] make register parameter/buffer thread safe in torch::jit::Module (#110488)
Summary: Registering a param/buffer writes into a vector inside Object, so we need to maintain thread safety if we have threads reading from and writing to the vector at the same time.

Test Plan: CI

Differential Revision: D49882601

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110488
Approved by: https://github.com/davidberard98
2023-10-04 17:04:23 +00:00
Banit Agrawal
30c4c6ff9b [PyTorch CCA] Refactor caching allocator config code (#110123)
Summary: This diff refactors the code by moving CUDAAllocatorConfig into the header file. This config refactoring is done so that we can use the same config code for CUDA pinned memory as well.

Test Plan: sandcastle

Differential Revision: D49653265

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110123
Approved by: https://github.com/zdevito
2023-10-04 14:58:23 +00:00
PyTorch MergeBot
156aefa89b Revert "[3/N] Add -Wdeprecated and related fixes (#109698)"
This reverts commit c31fcdaa4f.

Reverted https://github.com/pytorch/pytorch/pull/109698 on behalf of https://github.com/PaliC due to breaking quantization tests ( quantization/test_quantize_per_channel_sub_byte and  quantization/test_quantize_per_channel_float_qparams) internally ([comment](https://github.com/pytorch/pytorch/pull/109698#issuecomment-1746999806))
2023-10-04 14:33:47 +00:00