Summary: Point-to-point ops don't enqueue their work to the `workMetaList_`, which means the NCCL watchdog does not watch over them and they do not respect collective timeouts.
Test Plan:
While trying to add a test I found we don't have tests which validate the NCCL watchdog. It looks like this is because we don't have a good way to detect in our testing framework / `MultiprocessTestCase` when the NCCL watchdog has thrown an error (the exception is thrown in a side thread).
I manually tested this change with the script in https://github.com/pytorch/pytorch/issues/109401, but I need to look more closely at how to automate a test for the NCCL watchdog.
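For reference, a minimal sketch of the symptom (loosely based on the repro in the linked issue, not this PR's test; the rank count and timeout are illustrative):
```python
# Run with two ranks, e.g. torchrun --nproc_per_node=2 repro.py
# Rank 0 posts a recv() that no peer ever matches. Before this fix the NCCL
# watchdog ignored point-to-point work, so this hung forever instead of
# being aborted once the 10s collective timeout elapsed.
from datetime import timedelta

import torch
import torch.distributed as dist

dist.init_process_group("nccl", timeout=timedelta(seconds=10))
rank = dist.get_rank()
torch.cuda.set_device(rank)
t = torch.ones(1, device="cuda")
if rank == 0:
    dist.recv(t, src=1)  # never matched; should now time out
dist.destroy_process_group()
```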
Differential Revision: D49418976
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109611
Approved by: https://github.com/wconstab
Summary: This diff fixes a heap underflow found by fuzzing in torch/csrc/jit/runtime/vararg_functions.cpp
Test Plan:
CI and
```
arc lionhead crash reproduce 1753074381791061
```
doesn't crash anymore.
Differential Revision: D49537535
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110441
Approved by: https://github.com/Skylion007
Summary:
Previously, we linked against CUDA libs even for the pure C++ backend.
This caused issues in cases where the inference platform does not
have GPUs. This diff removes the CUDA dependency for the C++ backend.
Reviewed By: bertmaher, muchulee8, mikekgfb
Differential Revision: D49800712
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110409
Approved by: https://github.com/bertmaher, https://github.com/desertfire
**Background**: recordStream calls can result in memory spikes, so we don't want them to appear in FSDP (https://dev-discuss.pytorch.org/t/fsdp-cudacachingallocator-an-outsider-newb-perspective/1486). @awgu is working on fixing this, but it turns out the profiler was causing recordStream to get called when it is enabled.
Why the profiler was causing recordStream to get called: NCCL calls add profiler events manually; they register a callback to be executed when the future for the collective completes, which marks the end of the CPU-side profiler event for the collective:
c2c7c4035f/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp (L1822-L1824)
In order to guarantee safety, ivalue::Future::invokeCallback calls `recordStream` on the future's storage buffers; this marks the fact that other streams (e.g. the one that the callback runs on) may need to use the storage.
c2c7c4035f/aten/src/ATen/core/ivalue_inl.h (L1171-L1173)
**Change**: The end-profiler-event callback doesn't actually use the future, so we don't need to call recordStream on it. This PR introduces an optional parameter `uses_future` for adding callbacks; a user can set this parameter to `false` to unsafely skip the recordStream, if they know that the future will not be used in the lambda.
**Tests**: (a) unit tests; (b) added an assert in recordStream: c2c7c4035f/c10/cuda/CUDACachingAllocator.cpp (L3260) and verified that it doesn't get triggered when running basic distributed tests w/ profiler enabled
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109933
Approved by: https://github.com/wconstab
Summary:
Adding back D46578700 / PR https://github.com/pytorch/pytorch/pull/108426
Note: The changes were originally reverted due to a memory regression; this diff puts the code behind a gflag so it is only used by binaries that require the expanded stack for BPF profiling.
Original Diff comment:
To get a Node's call stack we currently loop over the InlinedCallStack graph and follow the "callee" chain. Since the node's inlined stack does not change, we can optimize this by expanding the node's inlined stack once and reusing it. This is particularly useful when reading the node's stack from another process (e.g. BPF) as it simplifies the memory traversal process.
The new data structure (NodeSourceInfo) only holds pointers to the function name and file name variables, and assumes these objects will be alive throughout the lifetime of the process.
Each Node has an extended attribute holding an index into a vector of expanded stack frames, `expanded_node_stacks_`.
`node_stack_attr_symbol_` is only needed to make accessing the stack vector index attribute easier from BPF.
Test Plan:
- Verified using a BPF program in subsequent diffs
- Perf testing for loading a large model: P822455246
Differential Revision: D49565461
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110229
Approved by: https://github.com/zdevito
This PR enables the misc-XX checks in clang-tidy. I excluded some of them that require a lot of code changes and have no immediate benefit. Some additional fixes and suppressions were also added.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110283
Approved by: https://github.com/albanD
Summary: This diff fixes a heap UAF found by fuzzing in torch/csrc/jit/mobile/interpreter.cpp
Test Plan:
CI and
```
arc lionhead crash reproduce 1009060456885023
```
doesn't crash anymore.
Reviewed By: malfet
Differential Revision: D49538326
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110289
Approved by: https://github.com/malfet
Removing the functionality from the nvfuser Python APIs.
Since the use of nvfuser was deprecated before the last release cut, we are removing TorchScript support.
I'll have the next PR actually remove the code base.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110124
Approved by: https://github.com/davidberard98
Opaque pointer support is disabled in LLVM 14 and enabled by default in LLVM 15 and above.
The `setOpaquePointers` API is deprecated as of LLVM 16, so this usage is removed.
Also update the `CreateMalloc` and `CreateFree` APIs for the latest LLVM release.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110200
Approved by: https://github.com/Skylion007
It's useful to have a simple, lightweight way to run a model that adds
essentially no overhead to calling the model's generated `run_impl` method.
This C API is a super thin wrapper around AOTInductorModel: Create, Run, and
Delete are provided, and do very little work beyond dispatching to the appropriate
helpers.
Note the Create function also provides additional functionality beyond the
Container API; it allows the user to pass in a weight map defined in userland,
which is a requirement for several serving use cases.
Differential Revision: [D49670711](https://our.internmc.facebook.com/intern/diff/D49670711/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110158
Approved by: https://github.com/desertfire, https://github.com/chenyang78
This is a reland of PRs https://github.com/pytorch/pytorch/pull/108626 and #109564. We fixed the iOS build failure by changing
```
((CHECK) ? (EXPR) : ([] { assert(!#CHECK); }(), (EXPR)))
```
to
```
((CHECK) ? (EXPR) : ([] { assert(false); }(), (EXPR)))
```
in TR2_OPTIONAL_ASSERTED_EXPRESSION, since the former syntax was invalid on Apple Clang. For now we apply the simple fix, hoping that c10::optional will be replaced by std::optional soon.
We also enabled -Wdeprecated on c10.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110019
Approved by: https://github.com/clee2000
Summary:
This PR supports _scaled_dot_product_flash_attention fallback kernel.
Note that in the abi_compatible mode, we retrieve outputs by passing
output argument pointers rather than relying on std::get.
It also fixes an issue related to dynamic shapes, where we wrongly
queried undefined dynamic symbols.
Test Plan: ci
Reviewed By: frank-wei
Differential Revision: D49620191
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110085
Approved by: https://github.com/desertfire
Summary: We are trying to use wire messages to pass Python objects like KJT. In order for JIT to be able to unpickle them, we need to provide a type resolver as well as an obj loader. This diff modifies the interface to let us do that.
Test Plan:
Rely on current CI to make sure existing usage doesn't break.
The next diff will test this end to end.
Differential Revision: D49438569
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109730
Approved by: https://github.com/davidberard98
Sequence numbers must be associated with a Work object
if we want to use them as a way to report collective progress.
The API surface change is introducing Work::getSequenceNumber, which
should eventually be exposed to python.
The bulk of this change is in gloo: making the sequence number
always be in use and weaving it through the dozens of subclasses of Work.
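A hypothetical sketch of the eventual Python surface (this PR only adds the C++ `Work::getSequenceNumber`; the Python binding does not exist yet):
```python
# Run under torchrun so the process-group env vars are set.
import torch
import torch.distributed as dist

dist.init_process_group("gloo")
t = torch.ones(4)
work = dist.all_reduce(t, async_op=True)
# seq = work.getSequenceNumber()  # hypothetical: identifies this collective
work.wait()
dist.destroy_process_group()
```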
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109136
Approved by: https://github.com/fduwjj
Fixes #101777
- [x] Duplicated the tests from `test/jit/test_union.py` into [`test/jit/test_union_pep604.py`](https://github.com/pytorch/pytorch/pull/109293/files#diff-b981f6493093482b43b0e62057b0c01b004b3e932d4e63a1166c3808c0172b83), using PEP604 style Unions
- [x] Exchanged custom `get_args` and `get_origin` with `typing.get_args` and `typing.get_origin` which have the same functionality and became part of the standard library in 3.8
- [x] Added utility function `pep604union_to_union` in `tree_views.h` which converts a `BinOP("|")` node into the corresponding `Union`. This function intercepts `ScriptTypeParser::parseTypeFromExpr` and `ScriptTypeParser::parseTypeFromExprImpl` and patches the expression.
- [ ] There is a single failing test; I commented it out for the moment to see if CI complains about anything else. I spent several hours trying to figure out how to fix it, but I am not experienced with C++ development and debugging.
From what I could gather, the following fails:
```python
def test_union_optional_of_union_return(self):
@torch.jit.script
def fn() -> None | str | int:
y: Optional[int | str] = "foo"
return y
```
In the section:
75b954b715/torch/csrc/jit/frontend/script_type_parser.cpp (L232-L243)
When using a regular `Union`, the `resolver` path is taken, whereas with the patched PEP 604 union, `resolveType` doesn't work.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109293
Approved by: https://github.com/ezyang
Most `torch.cuda` ops (e.g. `torch.cuda.synchronize`) do not release the GIL in C++ land. This has the potential of causing deadlocks and freezing the Python process. For example, `torch.cuda.synchronize` could hold the GIL and get blocked on some operation. However, that operation might never complete in Python land since the GIL is held by `torch.cuda.synchronize`.
In this PR, I've tried to release the GIL as much as possible in `torch.cuda` ops.
See https://github.com/pytorch/pytorch/issues/109074 for an example of how holding the GIL causes a deadlock.
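As a rough illustration of the effect (a sketch of the hazard pattern, not the exact repro from the issue): a background Python thread can now make progress while the main thread blocks in `torch.cuda.synchronize()`:
```python
import threading
import time

import torch

progress = 0

def ticker() -> None:
    # Pure-Python work that needs the GIL to run.
    global progress
    for _ in range(100):
        progress += 1
        time.sleep(0.01)

a = torch.randn(4096, 4096, device="cuda")
for _ in range(50):
    a = a @ a  # queue enough kernels to keep the GPU busy for a while

t = threading.Thread(target=ticker)
t.start()
torch.cuda.synchronize()  # blocks on the GPU but no longer holds the GIL
print(progress)  # > 0: the ticker ran while we waited
t.join()
```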
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109159
Approved by: https://github.com/ezyang
Fix a bug in socket.cpp timeout detection that only shows up with 10k ranks.
Make the minimum wait time in _store_based_barrier adaptive based on
the number of ranks.
Longer timeouts give more room for the store to do productive work when swamped.
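A hedged sketch of the adaptive-wait idea (the helper name and constants here are hypothetical, not the PR's values):
```python
from datetime import timedelta

def _barrier_poll_interval(world_size: int) -> timedelta:
    # Hypothetical scaling: poll less often at large rank counts so a
    # swamped store spends its time on productive work instead of
    # answering barrier polls. Constants are illustrative only.
    base_ms = 10
    scaled_ms = min(base_ms * max(world_size // 1000, 1), 1000)
    return timedelta(milliseconds=scaled_ms)

assert _barrier_poll_interval(8) == timedelta(milliseconds=10)
assert _barrier_poll_interval(10_000) == timedelta(milliseconds=100)
```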
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109218
Approved by: https://github.com/XilunWu
ghstack dependencies: #109217
Remove the redundant (and unsafe) `mobile::serialization::ModuleBufferHasIdentifier(data)`, as `mobile::serialization::VerifyModuleBuffer(verifier)` validates the same thing but in a bounds-checked manner.
Test Plan: Out of bounds read crash no longer reproduces
Differential Revision: D48914114
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108439
Approved by: https://github.com/manuelcandales, https://github.com/malfet
Summary: Change the returned values to come after the parameters, because 1) it is more consistent with the AOTInductor runtime API convention; and 2) since the out-variant ops have the out tensor at the beginning of the parameters, this makes the return values more clearly distinguished from them.
Test Plan:
```
buck test mode/opt caffe2/torch/fb/model_transform/experimental/benchmark/test/aotinductor:test_aot_inductor_benchmark
```
Differential Revision: D49522928
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109834
Approved by: https://github.com/chenyang78
I added some tests for Conj, Neg and ZeroTensor for both Python and C++ functionalization. This also fixes a nasty segfault when running a functorch `jacfwd` test with `torch.compile`, once AOTAutograd is using `FunctionalTensor`.
Changes:
(1) I use Jeffrey's `make_wrapper_subclass(extra_dispatch_keys)` kwarg to plumb extra dispatch keys onto the wrapper, mirroring what C++ functionalization does (C++ functionalization mirrors all dispatch keys from the inner tensor to the wrapper, except for the Python and functorch keys).
(2) FunctionalTensorMode will decompose CompositeImplicitAutograd ops, since (for example) ZeroTensor kernels can send ops like `.to()` directly to the Python key. We'll need a way to toggle this later for pre-dispatch functionalization
(3) Bound `_ForceDispatchKeyGuard` and BatchedTensorImpl's dispatch keyset to Python
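A rough sketch of (1) (the kwarg name follows the description above; the exact signature and key set are assumptions):
```python
import torch

class Wrapper(torch.Tensor):
    @staticmethod
    def __new__(cls, inner, extra_keys):
        # extra_keys is assumed to be a DispatchKeySet mirrored from `inner`
        # (e.g. Conjugate/Negative/ZeroTensor), minus Python/functorch keys.
        return torch.Tensor._make_wrapper_subclass(
            cls,
            inner.shape,
            dtype=inner.dtype,
            device=inner.device,
            extra_dispatch_keys=extra_keys,
        )
```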
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109023
Approved by: https://github.com/zou3519
ghstack dependencies: #108654, #109662, #109632
This PR fixes the ownership/lifetime handling for tensor subclasses that override sizes/strides, when tensors get resized.
This is needed now, because `FunctionalTensor` is a subclass that has a custom size/stride (so it can plumb requests to its inner tensor), and is also a core piece of infra (it's used during tracing in AOTAutograd, which means that metadata mutation and resizing that happens to work with torch.compile today needs to work with FunctionalTensor).
After a bunch of discussion with @ezyang and @soulitzer, I updated `PyInterpreter::sym_sizes()` (and friends) so that:
(1) They allocate a py::capsule buffer and stash it on the tensor on the first call to size/stride
(2) On a size/stride call where we notice that the number of **dimensions** on the tensor has changed (so our buffer is stale), we re-allocate the buffer
(3) On a size/stride call where we notice that the number of dimensions is the same, but the values are different (this happens whenever a tensor experiences a metadata mutation, like `.transpose_()`), we modify the buffer in place and put the new ints/symints into it
I also ended up doing the SmallVector optimization, which was required to fix some tests in AOTAutograd. Ideally we should look into those tests, and nail down the parts of our codebase that rely on SmallVector not re-allocating on a resize... but I'm saving this for a followup.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108654
Approved by: https://github.com/ezyang
We want users to be able to define custom ops in C++ but put the
abstract impl in Python (since it is easier to write them in Python and
the abstract impl better models device semantics and data-dependent
operators).
`m.impl_abstract_pystub(opname, python_module, context)` declares the
abstract_impl of the operator to exist in the given python module.
When the abstract_impl needs to be accessed (either via FakeTensor or
Meta), and it does not exist, the PyTorch Dispatcher will yell
with a descriptive error message.
Some details:
- We construct a new global AbstractImplPyStub mapping in
Dispatcher.cpp. Reads and writes to this map are protected by the Dispatcher
lock.
- We add a new Meta Tensor fallback kernel. The fallback errors out if there is
no meta kernel, but also offers a nicer error message if we see that there is
a pystub.
- We create a `torch._utils_internal.throw_abstract_impl_not_imported_error`
helper function to throw errors. This way, we can throw different error
messages in OSS PyTorch vs internal PyTorch. To invoke this from C++, we
added a PyInterpreter::throw_abstract_impl_not_imported_error.
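For context, a hedged sketch of what the Python half can look like with `torch.library.impl_abstract` (the `mylib::custom_linear` op is hypothetical, and the C++ side would point at the defining module via `m.impl_abstract_pystub(...)`):
```python
import torch

# Hypothetical op: out = x @ weight.t(), implemented in C++.
# This abstract impl only propagates shapes, so FakeTensor/Meta can trace it.
@torch.library.impl_abstract("mylib::custom_linear")
def custom_linear_abstract(x, weight):
    return x.new_empty((x.shape[0], weight.shape[0]))
```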
Differential Revision: [D49464753](https://our.internmc.facebook.com/intern/diff/D49464753/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109529
Approved by: https://github.com/ezyang, https://github.com/bdhirsh
Summary:
Change AOTInductor to directly return output tensors instead of taking pre-allocated output tensors to return the results. This gives several benefits:
* It makes sure AOTInductor has the same behavior when managing the output tensors as the default Inductor, which is widely tested and thus more reliable.
* As we have debugged before, there are cases where we still have to codegen extra copy_ ops to fill the pre-allocated output tensors, which is bad for performance.
* With the coming enhanced memory planning, this will make sure the memory planning logic is the same between AOTInductor and Inductor, which will greatly simplify the problem and improve reliability.
This change also combines D49494954 from Yang and https://github.com/pytorch/pytorch/pull/109560 from Angela.
Differential Revision: D49502318
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109790
Approved by: https://github.com/chenyang78
Previously, if ClosingTHPObjectPtr was destructed because we
were unwinding the stack from an exception, we would attempt to call
close() which just isn't going to work. Two fixes:
1. Detect if we're unwinding due to a Python error, and don't try
to do more Python stuff if so.
2. If close() fails somehow, write an unraisable exception, don't
try to throw because that will terminate if you're in an
exception.
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109758
Approved by: https://github.com/jansel
Bugfix:
- previously, SymBool did not implement `__eq__`, so Python fell back to the default `__eq__` and `__hash__`
- in this PR, we make SymBool implement `__eq__`
- a symbolic SymBool now raises an error when hashed, just like SymInt/SymFloat
New feature:
- previously, SymInt and SymFloat were unhashable (even if singleton or constant)
- in this PR, SymInt and SymBool are hashable if singleton/constant
Stay the same:
- SymNodes are hashable due to default Python behavior
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109170
Approved by: https://github.com/ezyang
ghstack dependencies: #109169
In this PR:
- When Constant SymNodes are detected in unary/binary ops, demote them to plain int/bool before proceeding. Sometimes this means that doing a unary op with a Constant SymNode results in a plain bool.
- Introduce an is_symbolic method, only available from Python. We need this because isinstance(x, SymInt) is no longer sufficient to check whether a given int/SymInt is symbolic or not. See a later PR in the stack for how this is used.
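A hedged sketch of the distinction (the `is_symbolic` spelling on the node follows the description above; treat the exact accessor path as an assumption):
```python
import torch

def needs_dynamic_guard(x) -> bool:
    # A constant SymNode still type-checks as SymInt, so isinstance alone
    # over-reports; ask the node whether it is actually symbolic.
    return isinstance(x, torch.SymInt) and x.node.is_symbolic()

assert needs_dynamic_guard(3) is False  # plain int
```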
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109169
Approved by: https://github.com/ezyang
Collective timing gates the tracking of when a collective starts on device.
Currently it's enabled by setting the NCCL_ENABLE_TIMING env var.
The goal of this PR is to make it possible to dynamically enable timing, so users of the PG hooks don't have to set that env var for their hooks to work.
The design is that, once set, all new collectives will have this behavior, so we track it on each Work object.
We make enableTiming_ atomic in PGNCCL to avoid races on non-TSO hardware.
To ensure consistency, we copy its value during Work construction and replace all previous usage of enableTiming_ from the PG with usages from the Work, which now has an immutable value.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108814
Approved by: https://github.com/wconstab, https://github.com/fduwjj
ghstack dependencies: #108813
Fix: #107315
This PR enables dynamo to trace through the `pytree` API by inlining its functions. In
order to do so, a few details of `pytree` had to be changed.
In summary, this PR:
- Introduces `TreeSpecVariable` for representing `TreeSpec` instances
- Specializes `<type>.__bases__` call, returning a `TupleVariable`
- Enables calling the `id` builtin function for every variable that implements
  the `as_python_constant` method
- Specializes `ConstantVariable.call_method` for its (un)flatten functions
- Implements `UserDefinedObjectVariable.as_python_constant`
- Modifies `pytree` by:
  - Makes `SUPPORTED_NODES` a map of ids (instead of types) to `NodeDef`
  - Removes the `functools.wraps` call, since it can't be inlined
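As a rough illustration of what becomes traceable (a sketch; whether this exact pattern stays fullgraph depends on the pytree types involved):
```python
import torch
import torch.utils._pytree as pytree

@torch.compile(fullgraph=True)
def f(inputs):
    # dynamo can now inline flatten/unflatten instead of graph-breaking
    leaves, spec = pytree.tree_flatten(inputs)
    leaves = [leaf.sin() for leaf in leaves]
    return pytree.tree_unflatten(leaves, spec)

out = f({"a": torch.randn(3), "b": (torch.randn(2), torch.randn(2))})
```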
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108533
Approved by: https://github.com/ezyang, https://github.com/voznesenskym
ghstack dependencies: #109201
Summary: This can help debug issues, especially fc/bc issues with coreml tools, when a model fails to load.
Test Plan:
On a MacBook, in fbsource:
```
arc focus2 -b pp-ios -a ModelRunner -a //xplat/caffe2/c10:c10Apple -a //xplat/caffe2/fb/dynamic_pytorch:dynamic_pytorch_implApple -a //xplat/caffe2:coreml_delegateApple --auto-test-schemes --force-with-wrong-xcode
```
This builds and runs the Playground app with a bunch of CoreML models on my iPhone. Here is one, for example:
https://pxl.cl/3nSPn
I also tested this code by forcefully triggering an MLModel ctor failure (setting `modelURL = nil`), and as expected got this:
```
libc++abi: terminating due to uncaught exception of type c10::Error: Error loading MLModel Error details: Localized_description: nil value for URL Domain: com.apple.CoreML Code: 3 User Info: {
NSLocalizedDescription = "nil value for URL";
} Input Shapes: N/A
Exception raised from compile at xplat/caffe2/torch/csrc/jit/backends/coreml/objc/PTMCoreMLBackend.mm:162 (most recent call first):
(no backtrace available)
```
whereas the previous message would have been:
```
Loading MLModel failed
```
Unrelated issues
* P829736691 - with running MaskRCNN on Coreml with the Playground app. Only happens some times.
* P829741377 - with Metal Operator Tests with the Playground app.
Differential Revision: D49349726
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109444
Approved by: https://github.com/kimishpatel
Reland; the previous PR was reverted internally with this error:
```
File "/data/sandcastle/boxes/eden-trunk-hg-fbcode-fbsource/buck-out/v2/gen/fbcode/363cd7e240f5d021/caffe2/torch/fb/trainer/data_modules/tests/__test_dataloader__/test_dataloader#link-tree/torch/__init__.py", line 29, in <module>
from ._utils_internal import _functionalize_sync as _sync
ImportError: cannot import name '_functionalize_sync' from 'torch._utils_internal'
```
I couldn't figure out why internal was unhappy with the import. One potential reason is that I see a build rule for *another* `_utils_internal.py` in the fb folder here ([link](https://www.internalfb.com/code/fbsource/[30ed85cd88409af98b7490be137aaa5dfd7afd01]/fbcode/caffe2/TARGETS?lines=444))
Rather than burn more time investigating, I confirmed internally that the error goes away if I move the util from `torch/_utils_internal.py` to `torch/_utils.py`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109518
Approved by: https://github.com/albanD
Summary: Use an RAII class to wrap at::cuda::CUDAStreamGuard. The previous implementation didn't exactly follow CUDAStreamGuard behavior.
Test Plan: CI
Differential Revision: D49355542
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109471
Approved by: https://github.com/chenyang78
Summary:
This PR adds a limited C shim layer for libtorch. The ultimate goal is to ban any direct reference to aten/c10 data structures or functions, to avoid ABI breakage by providing stable C interfaces.
To make the review and landing easier, we broke the changes into several steps. In this PR (a combination of https://github.com/pytorch/pytorch/pull/109022 and https://github.com/pytorch/pytorch/pull/109351), we add C interfaces for certain libtorch functions and modify the wrapper codegen to generate calls to those interfaces. There are a few other items to be addressed in future PRs:
* The AOTInductor runtime interface still takes lists of aten tensors as input and output
* The interaction with ProxyExecutor (general fallback support) needs to move away from aten tensor
* Remove all references to aten/c10 headers in the AOTInductor-generated code
Differential Revision: D49302669
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109391
Approved by: https://github.com/chenyang78
Added two new utils to help with turning Python functionalization on in AOTAutograd (next PR):
(1) updated `torch._sync()`. Previously, this API could only handle `torch.Tensor` instances that had a `FunctionalTensorWrapper` TensorImpl. It now needs to handle Python `FunctionalTensor`s. In theory I can probably break BC and change this API (since it's private?), but I decided not to do it in this PR stack to minimize the chance of reverts. Instead of updating that API directly (which is in C++), I just added a Python shim that first tries to unwrap the Python `FunctionalTensor` if there is one, then calls the existing C++ logic.
(2) `mirror_autograd_meta` is now a standalone API that tries to mirror the `requires_grad` and `is_leaf` autograd metadata from one tensor to another. Previously this was hardcoded into `torch._to_functional_tensor()`. But I now need to use it in a more standalone way: later in AOTAutograd, when we unwrap and re-wrap a tensor subclass, we need to manually mirror the autograd metadata from the original to the updated version of the subclass.
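A rough sketch of the shim described in (1) (the exact name, location, and attribute spelling are assumptions, not copied from the PR):
```python
import torch
from torch._subclasses.functional_tensor import FunctionalTensor

def _sync_shim(t):
    # Unwrap the Python-level FunctionalTensor, if any, then fall through
    # to the pre-existing C++ sync for FunctionalTensorWrapper.
    if isinstance(t, FunctionalTensor):
        t = t.elem
    torch._sync(t)
```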
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107917
Approved by: https://github.com/ezyang
ghstack dependencies: #106404
This PR adds a new `FunctionalTensor` subclass, and a `FunctionalTensorMode` torch dispatch mode. Together, the class and mode are a lightweight wrapper around our existing C++ functionalization logic.
This idea came from Ed - later in the stack, I want to be able to run functionalization **underneath** torch_dispatch, when performing tracing in AOTAutograd. I can't do this easily with vanilla C++ functionalization, because it has a dedicated dispatch key that always runs before TorchDispatch. However, by adding a torch_dispatch mode shim around functionalization, we can use functionalization as a torch_dispatch mode, which will make it easier to run underneath other modes later.
This PR provides the basic new classes, and some light testing.
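A rough usage sketch (internal APIs; the exact entry points and attribute names here are assumptions):
```python
import torch
from torch._subclasses.functional_tensor import (
    FunctionalTensor,
    FunctionalTensorMode,
)

x = torch.ones(2)
with FunctionalTensorMode():
    # Wrap a plain tensor; under the mode, mutations on the wrapper are
    # handled by the existing C++ functionalization logic underneath.
    fx = FunctionalTensor.to_functional(x)
    fx.add_(1)
```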
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106404
Approved by: https://github.com/ezyang
Summary:
Port x86 inline assembly to aarch64:
- Use `sp` instead of `%rsp` for stack pointer; move to second caller-
saved register `x1` instead of `%rsi`
- Use `x29` instead of `%rbp` for base pointer; move to third caller-
saved register `x2` instead of `%rdx`
Test Plan:
```
$ buck2 build fbcode//mode/opt fbcode//caffe2/torch/fb/model_transform/fx2trt/packaging:generate_merge_net_file
```
Reviewed By: jasonjk-park
Differential Revision: D47242468
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104707
Approved by: https://github.com/aaronenyeshi