Commit Graph

2184 Commits

Author SHA1 Message Date
Bin Bao
fabf9433e7 [AOTI][refactor] Organize model runner files (#116022)
Summary: Move runner util files into a subdirectory and put AOTIModelContainerRunnerCpu into a separate file

Differential Revision: [D52300693](https://our.internmc.facebook.com/intern/diff/D52300693)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116022
Approved by: https://github.com/khabinov
2023-12-20 15:35:34 +00:00
Nikita Shulga
d7caef7996 [CI] Update clang-format (#116002)
To 17.0.6 build using https://github.com/pytorch/test-infra/blob/main/.github/workflows/clang-tidy-linux.yml

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116002
Approved by: https://github.com/suo
2023-12-18 14:58:46 +00:00
Mu-Chu Lee
c285ca7916 [AOTInductor] Add updaing constant buffer to active buffer. (#116001)
Summary:
Refactor update inactive constant buffer to allow updating with active
buffer.

Test Plan:
Existing test to test inactive buffer updates.
UpdateConstantsCuda in cpp test for active buffer updates.

Reviewers:

Subscribers:

Tasks:

Tags:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/116001
Approved by: https://github.com/chenyang78
2023-12-18 11:49:03 +00:00
soulitzer
4d8ad4fb82 Move SingletonSymNodeImpl from c10 to aten (#114895)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114895
Approved by: https://github.com/jbschlosser
2023-12-13 20:01:18 +00:00
Wongboo
68f74dd162 Add python and C++ support for LPPool3d (#114199)
Add python and C++ support for LPPool3d to Fixes #114114

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114199
Approved by: https://github.com/mikaylagawarecki
2023-12-08 18:18:44 +00:00
Will Constable
7562b45454 Reland "[C10D] Use future for flight recorder dump (#115176)" (#115332)
Replaces the "always sleep 30 sec before abort" with "wait up to 30 sec
for the future to complete then abort". The difference in this case is
the abort happens as soon as the dump finishes up to a maximum, instead
of always waiting the maximum.

Allows multiple calls to dump, which will be serialized.

Renames tryWriteDebugInfo to launchAsyncDebugDump in spirit of the
change to support more than one launch and to always launch rather than
only launching on the first call.

Adds a test for dumping on timeout.

This reverts commit ac7d14baad.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115332
Approved by: https://github.com/fduwjj
2023-12-07 21:20:58 +00:00
Shaltiel Shmidman
ee8b33f7d5 Fixed crash when calling pad_packed_tensor when packed with cuda tensors and ensure_sorted=false due to indexing with tensors on different devices (#115028)
Fixes #115027

Fix in csrc as done in the python code [here](https://github.com/pytorch/pytorch/blob/main/torch/nn/utils/rnn.py#L338).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115028
Approved by: https://github.com/drisspg
2023-12-07 18:09:18 +00:00
suo
686a3e0bf0 [pytorch][PR] introduce WeakHashRef (#115216)
We would like weak dictionaries that have `torch.ScriptObject` keys. Similar to tensors, we need to override the behavior of the ref to dot he right thing under comparison.

This change also makes it so that WeakIdKeyDictionary works with a pluggable ref_type.

Differential Revision: [D51828205](https://our.internmc.facebook.com/intern/diff/D51828205/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115216
Approved by: https://github.com/albanD
2023-12-07 17:48:11 +00:00
PyTorch MergeBot
ac7d14baad Revert "[C10D] Use future for flight recorder dump (#115176)"
This reverts commit 0e07e3dbe4.

Reverted https://github.com/pytorch/pytorch/pull/115176 on behalf of https://github.com/huydhn due to Sorry for reverting your change, but the test_timeout_dumps is failing in trunk 0e07e3dbe4 ([comment](https://github.com/pytorch/pytorch/pull/115176#issuecomment-1844076455))
2023-12-07 02:09:58 +00:00
Will Constable
0e07e3dbe4 [C10D] Use future for flight recorder dump (#115176)
Replaces the "always sleep 30 sec before abort" with "wait up to 30 sec
for the future to complete then abort".  The difference in this case is
the abort happens as soon as the dump finishes up to a maximum, instead
of always waiting the maximum.

Allows multiple calls to dump, which will be serialized.

Renames `tryWriteDebugInfo` to `launchAsyncDebugDump` in spirit of the
change to support more than one launch and to always launch rather than
only launching on the first call.

Adds a test for dumping on timeout.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115176
Approved by: https://github.com/zdevito
2023-12-06 23:42:19 +00:00
Mu-Chu Lee
80527c0cf2 [AOTInductor] Double buffering for Weights (#114446)
Summary:
This adds function to model container doing weight swapping with double buffering.

There are 2 parts for double buffering
a) Write constants into inactive buffer
b) Swap active buffer

For (a), we write the constants into the buffer that's currently not in use, and store the information in both constants map and the corresponding constant array to read.
For (b), we obtain the lock, and activate the constant map/constant array that is inactive, and flag the one that's currently in use to inactive.

Test Plan:
test/cpp/aot_inductor/test.cpp

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D51543732](https://our.internmc.facebook.com/intern/diff/D51543732)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114446
Approved by: https://github.com/chenyang78, https://github.com/eellison
2023-12-05 22:31:56 +00:00
FFFrog
541591dd79 Add the appropriate check on div_value to the cpp frontend (#114671)
Fixes #114334

As the title stated.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114671
Approved by: https://github.com/mikaylagawarecki
2023-12-04 01:28:11 +00:00
Chip Turner
9cc040fef6 Switch env variable use in test harnesses to the non-deprecated names to fix warnings (#114880)
Previously:

```
[W Utils.hpp:133] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function getCvarInt)
[W Utils.hpp:133] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function getCvarInt)
```

With this PR, those warnings disappear.  They were introduced in #114077

This change was generated with this sed script, applied with `sed -i -f /tmp/x **/*.{py,hpp,cpp,cc}` and hand inspected.

```
s/\bNCCL_BLOCKING_WAIT\b/TORCH_NCCL_BLOCKING_WAIT/g
s/\bNCCL_ENABLE_TIMING\b/TORCH_NCCL_ENABLE_TIMING/g
s/\bNCCL_DESYNC_DEBUG\b/TORCH_NCCL_DESYNC_DEBUG/g
s/\bNCCL_ASYNC_ERROR_HANDLING\b/TORCH_NCCL_ASYNC_ERROR_HANDLING/g
s/\bENABLE_NCCL_HEALTH_CHECK\b/TORCH_ENABLE_NCCL_HEALTH_CHECK/g
s/\bNCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK\b/TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK/g
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114880
Approved by: https://github.com/kwen2501
2023-12-01 20:08:23 +00:00
Scott Wolchok
165f4f6ccf [PyTorch] Redirect c10::optional to std::optional (#101995)
We have C++17 now!

I am intentionally dropping the `c10::optional<c10::ArrayRef>` size optimization. It was intended to improve dispatch, but thanks to D34602980 / #70864 we don't use `optional<ArrayRef>` in function arguments anymore anyway.

Differential Revision: [D46079028](https://our.internmc.facebook.com/intern/diff/D46079028/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101995
Approved by: https://github.com/malfet, https://github.com/Skylion007, https://github.com/ezyang
2023-11-30 02:46:41 +00:00
Chip Turner
066e072524 Retry #112889 (Opportunistically use ncclCommSplit when creating new NCCL groups) (#114385)
- [c10d] (retry) Opportunistically use `ncclCommSplit` when creating new NCCL groups (#112889)
- Guard use of `split_from` with a `hasattr` check for cases when NCCL (or RCCL) lacks `ncclCommSplit`

Fixes cause of revert of original PR

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114385
Approved by: https://github.com/huydhn
2023-11-23 07:00:00 +00:00
PyTorch MergeBot
b927a4e2ca Revert "Opportunistically use ncclCommSplit when creating new NCCL groups (#112889)"
This reverts commit 64a5372e6c.

Reverted https://github.com/pytorch/pytorch/pull/112889 on behalf of https://github.com/huydhn due to Sorry for reverting you change, but it is failing ROCm distributed jobs in trunk 4d07428ede ([comment](https://github.com/pytorch/pytorch/pull/112889#issuecomment-1823214376))
2023-11-22 17:43:51 +00:00
Chip Turner
64a5372e6c Opportunistically use ncclCommSplit when creating new NCCL groups (#112889)
Currently `ncclCommInitRankConfig` is always used when creating new
communicator groups.  This is wasteful as it creates non-shared pairs
of endpoint queues as well as costs time to re-establish
communication.

This change is transparent and opportunistic; when `dist.new_group` is
called, it will use the existing, healthy world process group to
select the right ranks to include in the process group.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112889
Approved by: https://github.com/kwen2501
2023-11-21 21:03:52 +00:00
Pavan Balaji
8f8722e3f1 [nccl-pg] Avoid using NCCL_ prefix for non-NCCL env variables (#114077)
NCCL_ prefix should only be used for NCCL library's environment variables.  We currently use a few environment variables in PyTorch with the NCCL_ prefix that are the NCCL library does not understand.

This patch renames such environment variables to use the TORCH_NCCL_ prefix instead.  We still maintain the old NCCL_ variables, but throw a warning when they are used.

The following env changes have been made:

`NCCL_BLOCKING_WAIT` -> `TORCH_NCCL_BLOCKING_WAIT`
`NCCL_ENABLE_TIMING` -> `TORCH_NCCL_ENABLE_TIMING`
`NCCL_DESYNC_DEBUG` -> `TORCH_NCCL_DESYNC_DEBUG`
`NCCL_ASYNC_ERROR_HANDLING` -> `TORCH_NCCL_ASYNC_ERROR_HANDLING`
`ENABLE_NCCL_HEALTH_CHECK` -> `TORCH_ENABLE_NCCL_HEALTH_CHECK`
`NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK` -> `TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK`

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114077
Approved by: https://github.com/fduwjj
2023-11-21 07:23:42 +00:00
cyy
226384b460 [2/N] Cleanup header inclusions in torch_cpu by iwyu (#109964)
Further cleaning up of torch_cpu header inclusions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109964
Approved by: https://github.com/ezyang, https://github.com/Skylion007
2023-11-19 20:56:32 +00:00
Pavan Balaji
958f3b0df6 [nccl-pg] Migrate to getCvar* functions for env variable checking (#113797)
Summary:
The getCvar* functions allow us to provide multiple environment variables for the same value.  This allows us to deprecate some variables in favor of others, while still allowing users to temporarily use the old variables for some time.

Test Plan: OSS CI

Reviewed By: fduwjj, XilunWu

Differential Revision: D51225487

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113797
Approved by: https://github.com/fduwjj
2023-11-19 03:48:58 +00:00
Andrzej Kotlowski
0885c58296 Add Bfloat16 scalar support to gloo backend (#113557)
There was missing support for bfloat scalars. When I use gloo backend
`torch.distributed.init_process_group(backend='gloo')`
and run
`torch.nn.parallel.DistributedDataParallel(model)`
and _model_ has Bfloat16 features I receive following error:
`RuntimeError: Invalid scalar type`

This change fix this issue.
c10::BFloat16 defines conversions from/to float, so calculations are made on float for bfloat.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113557
Approved by: https://github.com/XilunWu, https://github.com/jgong5
2023-11-17 21:16:54 +00:00
fduwjj
015fd2eb41 [NCCL PG] Add dumping flight recorder in the NCCL watchdog timeout (#113678)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113678
Approved by: https://github.com/XilunWu
ghstack dependencies: #113503
2023-11-17 07:00:41 +00:00
Mu-Chu Lee
eddce3c054 [AOTInductor] Rename model_runner to model_container_runner (#111324)
Summary:
We rename the model_runner to model_container_runner to prepare for
adding tests of pure model without container.

Test Plan:
commit itself is a test.

Reviewers:

Subscribers:

Tasks:

Tags:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111324
Approved by: https://github.com/desertfire, https://github.com/chenyang78
2023-11-16 19:14:22 +00:00
fduwjj
5fb1d8f18a [NCCL PG] Enable storing nccl traces into storage and make it configurable (#113503)
This PR is to enable the store of NCCL flight recorder to storage and make it configurable by letting users register their own way of storing the debug info. We will then provide users a script to offline parse and process the dumped blobs.

One thing, this PR is not trying to resolve is to decide where to dump the debug info. I will send a follow-up PR to address that.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113503
Approved by: https://github.com/zdevito
2023-11-16 07:44:15 +00:00
Aleksei Nikiforov
b526aae95a test_lazy: skip HashTest.Scalar (#112747)
Fixes #99883

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112747
Approved by: https://github.com/huydhn
2023-11-16 01:22:58 +00:00
fduwjj
f9114193bd [NCCL PG] ADD a separate monitoring thread to ensure we collect debug info and check watchdog heartbeat (#112518)
This PR has the following goals:
1. Detect unhealthy nccl watchdog thread by implementing a heartbeat. NCCL watchdog sometimes can hang for several reasons such as nccl/cuda API bugs or unexpected blocking behaviors. This is the last resort to ensure that we don't silently keep the training job run for hours.
2. Sometimes, the process gets stuck in the destroy of NCCL PG, and this PR will ensure that we will eventually abort it after some time (by default 2 mins)
3. Once heartbeat cannot be heard, we dump debug information (for now, we just use the flight recorder implemented in https://github.com/pytorch/pytorch/pull/110960/files) to disk. (How and where to dump the debug info will be addressed in the following PR).
4. Finally, we initiate std::abort via `LOG(FATAL)` to kill the process.

To clarify further what this PR is trying to solve, we first list are four cases when a NCCL PG can end up with:
- case 1: ncclwatchdog gets stuck (maybe some blocking API) and heartbeat monitor kills it during regular heartbeat monitor loop.
- case 2: ncclwatchdog timeout and desync report or destroy kicked in(let's call it shutdown) but this shutdown takes so long and heartbeat believes it has to kills the process anyway.
- case 3: ncclwatchdog aborts the process (heartbeat monitor not involved)
- case 4: program exits cleanly (heartbeat monitor not involved)

As we can see here, this PR is trying to address case one and two and we also want to ensure adding one more monitor thread does not interfere what we are currently doing in case three and four. That's why we added two flags `terminateHeartbeatMonitorThread_` and `collectiveDebugInfoMode_`.

For case three and four, either `monitorWakeUpCV_` will be waked up in the destructor or `terminateHeartbeatMonitorThread_` will be set to true. So that monitor thread will just exit ASAP.

For case one, both `terminateHeartbeatMonitorThread_` and `collectiveDebugInfoMode_` will still false when monitor thread see there are no heartbeat, so it will directly kill the process. For case two, either `terminateHeartbeatMonitorThread_` and `collectiveDebugInfoMode_` will be true, the monitor thread will wait extra time before killing the process.

Differential Revision: [D51146305](https://our.internmc.facebook.com/intern/diff/D51146305)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/112518
Approved by: https://github.com/kwen2501, https://github.com/wconstab
2023-11-10 04:41:14 +00:00
Nikita Shulga
88920b26be [Cmake] Check that gcc-9.4 or newer is used (#112858)
As this is the oldest gcc that is fully compatible with C++17 standard.
- Replace number of conditional version with simpler `if(CMAKE_COMPILER_IS_GNUCXX)` or `append_cxx_flag_if_supported`.
- As `-Wsuggest-override` condition was hidden before incorrect guard, add missing `override` keywords to `torch::autograd::PyFunctionTensorPostAccGradHooks::apply_with_saved` , `caffe2::python::TensorFeeder::Feed` and `cafee2::NetObserverReporterPrint::report```

Fixes https://github.com/pytorch/pytorch/issues/101839

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112858
Approved by: https://github.com/Skylion007, https://github.com/albanD
2023-11-06 17:19:53 +00:00
PyTorch MergeBot
679ca510b0 Revert "[Cmake] Check that gcc-9.4 or newer is used (#112858)"
This reverts commit ad894cd072.

Reverted https://github.com/pytorch/pytorch/pull/112858 on behalf of https://github.com/PaliC due to breaking internal tests (check diff for test page) ([comment](https://github.com/pytorch/pytorch/pull/112858#issuecomment-1795485009))
2023-11-06 16:56:09 +00:00
Nikita Shulga
ad894cd072 [Cmake] Check that gcc-9.4 or newer is used (#112858)
As this is the oldest gcc that is fully compatible with C++17 standard.
- Replace number of conditional version with simpler `if(CMAKE_COMPILER_IS_GNUCXX)` or `append_cxx_flag_if_supported`.
- As `-Wsuggest-override` condition was hidden before incorrect guard, add missing `override` keywords to `torch::autograd::PyFunctionTensorPostAccGradHooks::apply_with_saved` , `caffe2::python::TensorFeeder::Feed` and `cafee2::NetObserverReporterPrint::report```

Fixes https://github.com/pytorch/pytorch/issues/101839

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112858
Approved by: https://github.com/Skylion007, https://github.com/albanD
2023-11-04 05:40:08 +00:00
Nikita Shulga
8665a51baf Initialize logging facility when running ProcessGroupNCCLTest (#112809)
If code is compiled without `glog`, there are no way to control log levels other than explicitly calling `c10::initLogging()`

Test plan: Run `TORCH_CPP_LOG_LEVEL=0 ./bin/ProcessGroupNCCLTest` and observe extra log messages

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112809
Approved by: https://github.com/fduwjj
2023-11-03 02:26:13 +00:00
Pritam Damania
e66ec5843f [RESUBMIT] Cleanup error reporting for ProcessGroupNCCL (#112419)
Continuing some of the work from https://github.com/pytorch/pytorch/pull/108191, I realized majority of errors raised from ProcessGroupNCCL were just generic RuntimeError.

In this PR, I've added appropriate error types to all the exceptions raised from ProcessGroupNCCL.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112419
Approved by: https://github.com/fduwjj
2023-10-31 05:58:21 +00:00
Aaron Enye Shi
63c089b09d [c10] Move profiler clock to libc10 for timestamps (#111972)
Summary:
Move the profiler's Approximate Clock from libtorch to libc10. The main reason is to allow c10 features to get time.

The clock is using TSC when available for performance. CUDA Caching Allocator's implementation of memory snapshot will add the timestamps to memory events with this same clock in subsequent diff.

Test Plan: CI

Differential Revision: D50601935

Pulled By: aaronenyeshi

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111972
Approved by: https://github.com/davidberard98
2023-10-27 16:18:40 +00:00
PyTorch MergeBot
abe172e268 Revert "Cleanup error reporting for ProcessGroupNCCL (#111979)"
This reverts commit b29c658265.

Reverted https://github.com/pytorch/pytorch/pull/111979 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is failing multigpu test in trunk b29c658265 ([comment](https://github.com/pytorch/pytorch/pull/111979#issuecomment-1781919184))
2023-10-26 21:29:40 +00:00
angelayi
b126adcdee [aotinductor] Pass TorchIR to AOTInductor (#110020)
Updates `_export.aot_compile` to pass a torch IR graph to inductor, allowing inductor to now run the pre_grad_passes, and reuse more of inductor's code.
Also updates the API to only return the `so_path`, and not returning the exported program. The pytree call spec is now serialized and placed inside of the generated model code. When calling the model, because there is no c++ pytree implementation linked yet, we can access the call specs through `get_call_spec()`, and call pytree flatten/unflattenin python.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110020
Approved by: https://github.com/desertfire
2023-10-26 15:54:31 +00:00
Jeff Daily
28c0b07d19 [ROCm] remove HCC references (#111975)
- rename `__HIP_PLATFORM_HCC__` to `__HIP_PLATFORM_AMD__`
- rename `HIP_HCC_FLAGS` to `HIP_CLANG_FLAGS`
- rename `PYTORCH_HIP_HCC_LIBRARIES` to `PYTORCH_HIP_LIBRARIES`
- workaround in tools/amd_build/build_amd.py until submodules are updated

These symbols have had a long deprecation cycle and will finally be removed in ROCm 6.0.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111975
Approved by: https://github.com/ezyang, https://github.com/hongxiayang
2023-10-26 02:39:10 +00:00
Pritam Damania
b29c658265 Cleanup error reporting for ProcessGroupNCCL (#111979)
Continuing some of the work from https://github.com/pytorch/pytorch/pull/108191, I realized majority of errors raised from ProcessGroupNCCL were just generic RuntimeError.

In this PR, I've added appropriate error types to all the exceptions raised from ProcessGroupNCCL.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111979
Approved by: https://github.com/fduwjj
2023-10-26 01:39:54 +00:00
cyy
f9cc7f6a1c Enable Wno-unused-private-field,Wunused-lambda-capture and fix CUDA warnings (#110856)
This PR enables Wno-unused-private-field,Wunused-lambda-capture  and some CUDA warnings were fixed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110856
Approved by: https://github.com/albanD, https://github.com/malfet
2023-10-25 03:39:05 +00:00
zdevito
babb6c6ac4 nccl flight recorder (#110960)
Keep a buffer of the last 16384 nccl work actions, including the stack
trace that launched the event.

When torch._C._distributed_c10d._dump_nccl_trace(), it an dump these to
a pickled archive.

For each action we get:
process_group_id, seq_id, collective_name, size_of_first_tensor, stack trace

state - issued, started, completed (based on cuda events and queried if
necessary when the dump is requested)

I tested that it is possible to query event state when the streams are
otherwise stuck.

Differential Revision: [D50138956](https://our.internmc.facebook.com/intern/diff/D50138956)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110960
Approved by: https://github.com/wconstab
2023-10-24 07:12:21 +00:00
Kazuaki Ishizaki
deb800ee81 Fix typo under test directory (#111304)
This PR fixes typo in comments under `test` directory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111304
Approved by: https://github.com/Skylion007
2023-10-16 23:06:06 +00:00
Yang Chen
6748a14a71 [aot_inductor] add a test with AOTInductor + TorchScript (#111124)
This test may be of a reference for using AOTInductor with
TorchScript.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111124
Approved by: https://github.com/jansel
2023-10-12 19:29:07 +00:00
Kurt Mohler
0f924cdee3 Fix functional::smooth_l1_loss signatures to not override beta (#109798)
This splits `nn::functional::smooth_l1_loss` into two different signatures in order to keep backward compatibility for calling the function like `smooth_l1_loss(input, target, /*reduction=*/..., /*beta=*/...)`

Fixes #70163

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109798
Approved by: https://github.com/mikaylagawarecki
2023-10-11 21:37:37 +00:00
Bin Bao
86619c9c9d [aotinductor] Add both cpu and cuda tests for the AOTInductor cpp test (#110920)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110920
Approved by: https://github.com/chenyang78
ghstack dependencies: #110652, #110891
2023-10-11 15:58:28 +00:00
Bin Bao
3058700f7f [aotinductor] Add AOTIModelRunner as a utility class (#110891)
Summary: Introduce a utility class AOTIModelRunner to take care of running an AOTInductor compiled model. It does things like dlopen a model, initialize the model container, setup inputs and outputs, and destroy the model container.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110891
Approved by: https://github.com/chenyang78
ghstack dependencies: #110652
2023-10-11 15:58:28 +00:00
Bin Bao
b17c247eb1 [aotindutor] Update the cpp test example (#110652)
Summary: store inputs and outpus in python, and load them back to run the compiled model in c++ and compare the output
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110652
Approved by: https://github.com/chenyang78
2023-10-11 15:58:28 +00:00
Will Constable
ca03f36233 Change ProcessGroupNCCL default timeout to 10 min (#110947)
Avoid changing default for other backends as CPU backend (GLOO) may need
longer timeouts.

Motivated by trying to save cluster time when encountering collective
hangs.  Generally collectives should time out within seconds and 30
minutes (or 10 minutes) should provide ample headroom for edge cases.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110947
Approved by: https://github.com/xw285cornell, https://github.com/fduwjj
2023-10-10 22:28:39 +00:00
sunghyunjun
b5268456f9 Fix optimize_for_inference to support modules that don't have a forward method (#110013)
Fixes #108662

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110013
Approved by: https://github.com/davidberard98
2023-10-02 20:13:44 +00:00
cyy
a81d083b1c [Reland] Add -Wdeprecated and related fixes (#110019)
This is reland of PRs #https://github.com/pytorch/pytorch/pull/108626 and #109564. We fixed the IOS build failure by changing
```
((CHECK) ? (EXPR) : ([] { assert(!#CHECK); }(), (EXPR)))
```
to
```
((CHECK) ? (EXPR) : ([] { assert(false); }(), (EXPR)))
```
in TR2_OPTIONAL_ASSERTED_EXPRESSION, since the former syntax was invalid on Apple Clang. Anyway, we could apply the simple fix hoping that c10::optional would be replaced by std::optional soon.
We also enabled -Wdeprecated on c10.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110019
Approved by: https://github.com/clee2000
2023-09-28 03:34:29 +00:00
hongxyan
0511df0ee9 [ROCM] enable skipped test_api cpp tests (#109817)
[ROCM] enable skipped  test_api cpp tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109817
Approved by: https://github.com/jithunnair-amd, https://github.com/malfet
2023-09-27 16:52:46 +00:00
Bin Bao
4bf1cd6961 [aotinductor] Rename aot_runtime to aoti_runtime (#110007)
Summary: Make the naming more explicit

Differential Revision: D49593528

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110007
Approved by: https://github.com/houseroad
2023-09-26 00:46:54 +00:00
cyy
265acd4bea Clean up CMake target linking (#109959)
This PR cleans up more CMake target linking.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109959
Approved by: https://github.com/ezyang
2023-09-25 01:37:14 +00:00