pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 00:21:07 +01:00

Author	SHA1	Message	Date
PyTorch MergeBot	3122a96ee4	Revert "Improve and expose cpp_backtrace to python binding (#84896 )" This reverts commit `73fbca1ea6`. Reverted https://github.com/pytorch/pytorch/pull/84896 on behalf of https://github.com/kit1980 due to Broke libtorch and linux-binary-manywheel - `73fbca1ea6`	2022-09-21 03:13:20 +00:00
Sherlock Huang	73fbca1ea6	Improve and expose cpp_backtrace to python binding (#84896 ) We can now get cpp stack trace by calling torch.utils.get_cpp_backtrace() Sample output when calling from a torch_dispatch stack: ``` <omitting python frames> frame #23: torch::handle_torch_function_no_python_arg_parser(c10::ArrayRef<pybind11::handle>, _object, _object, char const, _object, char const, torch::TorchFunctionName) (0x7f69330bab90 in /fsx/users/bahuang/repos/pytorch_fsx/torch/csrc/utils/python_arg_parser.cpp:323) frame #24: <unknown function> (0x7f6932a09e79 in /fsx/users/bahuang/repos/pytorch_fsx/torch/csrc/autograd/python_variable.cpp:2252) frame #25: <unknown function> (0x7f69261aee33 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/PythonFallbackKernel.cpp:56) frame #26: <unknown function> (0x7f69261afef9 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/boxing/BoxedKernel_impl.h:19) frame #27: c10::BoxedKernel::callBoxed(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >) const (0x7f6932aadced in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/boxing/BoxedKernel_impl.h:41) frame #28: <unknown function> (0x7f6926fae9b9 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/boxing/impl/boxing.h:227) frame #29: at::Tensor c10::Dispatcher::redispatch<at::Tensor, at::Tensor const&>(c10::TypedOperatorHandle<at::Tensor (at::Tensor const&)> const&, c10::DispatchKeySet, at::Tensor const&) const (0x7f6926e821f5 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/boxing/KernelFunction_impl.h:106) frame #30: at::_ops::alias::redispatch(c10::DispatchKeySet, at::Tensor const&) (0x7f6927142c31 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/dispatch/Dispatcher.h:438) frame #31: <unknown function> (0x7f692ae4f8be in /fsx/users/bahuang/repos/pytorch_fsx/torch/csrc/autograd/generated/ADInplaceOrViewType_1.cpp:1361) frame #32: <unknown function> (0x7f692ae4f9b1 in /fsx/users/bahuang/repos/pytorch_fsx/torch/csrc/autograd/generated/ADInplaceOrViewType_1.cpp:1362) frame #33: <unknown function> (0x7f692aef77e9 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/boxing/impl/WrapFunctionIntoFunctor.h:13) frame #34: <unknown function> (0x7f6926fae7d8 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/boxing/KernelFunction_impl.h:50) frame #35: at::Tensor c10::Dispatcher::redispatch<at::Tensor, at::Tensor const&>(c10::TypedOperatorHandle<at::Tensor (at::Tensor const&)> const&, c10::DispatchKeySet, at::Tensor const&) const (0x7f6926e821c9 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/boxing/KernelFunction_impl.h:97) frame #36: at::_ops::alias::redispatch(c10::DispatchKeySet, at::Tensor const&) (0x7f6927142c31 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/dispatch/Dispatcher.h:438) frame #37: <unknown function> (0x7f6929ec654a in /fsx/users/bahuang/repos/pytorch_fsx/build/aten/src/ATen/RedispatchFunctions.h:10697) frame #38: <unknown function> (0x7f6929d9edae in /fsx/users/bahuang/repos/pytorch_fsx/torch/csrc/autograd/generated/VariableType_1.cpp:2837) frame #39: <unknown function> (0x7f6929d9f043 in /fsx/users/bahuang/repos/pytorch_fsx/torch/csrc/autograd/generated/VariableType_1.cpp:2838) frame #40: <unknown function> (0x7f6929e7d2f9 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/boxing/impl/WrapFunctionIntoFunctor.h:13) frame #41: <unknown function> (0x7f6929eb1344 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/boxing/impl/make_boxed_from_unboxed_functor.h:478) frame #42: <unknown function> (0x7f6929ea7b99 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/boxing/impl/make_boxed_from_unboxed_functor.h:490) frame #43: <unknown function> (0x7f6929e7d370 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/boxing/impl/make_boxed_from_unboxed_functor.h:563) frame #44: <unknown function> (0x7f6929e7d43a in /fsx/users/bahuang/repos/pytorch_fsx/c10/util/C++17.h:239) frame #45: <unknown function> (0x7f6929e7d48c in /fsx/users/bahuang/repos/pytorch_fsx/c10/util/C++17.h:364) frame #46: <unknown function> (0x7f6929e7d50a in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/boxing/impl/make_boxed_from_unboxed_functor.h:554) frame #47: c10::BoxedKernel::callBoxed(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >) const (0x7f6932aadced in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/boxing/BoxedKernel_impl.h:41) frame #48: c10::KernelFunction::callBoxed(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >) const (0x7f6932aadd26 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/boxing/KernelFunction_impl.h:43) frame #49: c10::Dispatcher::redispatchBoxed(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >) const (0x7f692603890a in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/dispatch/Dispatcher.h:652) frame #50: <unknown function> (0x7f69260387f9 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/dispatch/Dispatcher.h:388) frame #51: <unknown function> (0x7f69261af0ef in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/PythonFallbackKernel.cpp:96) frame #52: <unknown function> (0x7f69261aff2b in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/boxing/BoxedKernel_impl.h:25) frame #53: c10::BoxedKernel::callBoxed(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >) const (0x7f6932aadced in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/boxing/BoxedKernel_impl.h:41) frame #54: c10::KernelFunction::callBoxed(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >) const (0x7f6932aadd26 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/boxing/KernelFunction_impl.h:43) frame #55: c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector<c10::IValue, std::allocator<c10::IValue> >) const (0x7f6925fd6ab2 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/dispatch/Dispatcher.h:628) frame #56: <unknown function> (0x7f6925fd6690 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/dispatch/Dispatcher.h:376) frame #57: <unknown function> (0x7f692bf5b525 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/dispatch/Dispatcher.h:380) frame #58: <unknown function> (0x7f692bf59fac in /fsx/users/bahuang/repos/pytorch_fsx/torch/csrc/jit/runtime/register_c10_ops.cpp:15) frame #59: <unknown function> (0x7f692bf5af41 in /usr/include/c++/7/bits/std_function.h:316) frame #60: std::function<void (std::vector<c10::IValue, std::allocator<c10::IValue> >&)>::operator()(std::vector<c10::IValue, std::allocator<c10::IValue> >&) const (0x7f6932ab9a0f in /usr/include/c++/7/bits/std_function.h:706) frame #61: <unknown function> (0x7f6932aad541 in /fsx/users/bahuang/repos/pytorch_fsx/aten/src/ATen/core/stack.h:41) frame #62: <unknown function> (0x7f6932ab3102 in /fsx/users/bahuang/repos/pytorch_fsx/torch/csrc/jit/python/pybind_utils.h:1206 (discriminator 1)) frame #63: <unknown function> (0x7f6932ab3943 in /fsx/users/bahuang/repos/pytorch_fsx/torch/csrc/jit/python/pybind_utils.h:1272) frame #64: <unknown function> (0x7f6932a46120 in /fsx/users/bahuang/repos/pytorch_fsx/torch/csrc/jit/python/init.cpp:1767) frame #65: <unknown function> (0x7f6932a997be in /fsx/users/bahuang/repos/pytorch_fsx/third_party/pybind11/include/pybind11/cast.h:1441) frame #66: <unknown function> (0x7f6932a8a985 in /fsx/users/bahuang/repos/pytorch_fsx/third_party/pybind11/include/pybind11/cast.h:1410) frame #67: <unknown function> (0x7f6932a66e1e in /fsx/users/bahuang/repos/pytorch_fsx/third_party/pybind11/include/pybind11/pybind11.h:249) frame #68: <unknown function> (0x7f6932a66ec2 in /fsx/users/bahuang/repos/pytorch_fsx/third_party/pybind11/include/pybind11/pybind11.h:224) frame #69: <unknown function> (0x7f6932473111 in /fsx/users/bahuang/repos/pytorch_fsx/third_party/pybind11/include/pybind11/pybind11.h:929) frame #104: __libc_start_main (0x7f693485dc87 in /build/glibc-uZu3wS/glibc-2.27/csu/../csu/libc-start.c:310) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/84896 Approved by: https://github.com/ezyang	2022-09-21 01:32:33 +00:00
Aidyn-A	1456cca1fc	Fix exception handling, improve overheads and avoid constructing storage for element size (#84612 ) These changes were proposed by @MatthiasKohl in #84271 and #84542 that fix #84267 and #84056 respectively. The reason I am creating the pull request is CLA check (see original PRs). cc @ptrblck @malfet Pull Request resolved: https://github.com/pytorch/pytorch/pull/84612 Approved by: https://github.com/ngimel	2022-09-19 20:21:46 +00:00
soulitzer	7f88934a8f	[reland 2] Call jit decomp in VariableType to improve forward AD coverage (#84976 ) Reland of https://github.com/pytorch/pytorch/pull/84675 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84976 Approved by: https://github.com/zou3519	2022-09-15 22:46:19 +00:00
PyTorch MergeBot	36d79143ce	Revert "[reland] Call jit decomposition in VariableType to increase forward AD coverage (#84151 ) (#84675 )" This reverts commit `bb4e96c964`. Reverted https://github.com/pytorch/pytorch/pull/84675 on behalf of https://github.com/osalpekar due to causing asan xplat link-time errors like ld.lld: error: undefined symbol: torch::jit::has_jit_decomposition(c10::FunctionSchema const&)	2022-09-13 22:54:54 +00:00
soulitzer	bb4e96c964	[reland] Call jit decomposition in VariableType to increase forward AD coverage (#84151 ) (#84675 ) This reverts commit `acb4a09628`. In addition, we also fix a memory leak in layer norm. Pull Request resolved: https://github.com/pytorch/pytorch/pull/84675 Approved by: https://github.com/zou3519	2022-09-12 20:33:14 +00:00
Mikayla Gawarecki	e217b30b0f	Add `torch.nested` namespace (#84102 ) First step towards #83775 - only `to_padded_tensor` is moved to the nested namespace for now - following the schema used for `special`, `fft`, `linalg` and other namespaces, nested functions are registered in native_functions.yaml as `nested_{function_name}` and are bound to the desired Python name in `torch/nested/__init__.py`, and the desired C++ name in `torch/csrc/api/include/torch/nested.h`. ~~Question: should we keep the documentation for `Tensor.to_padded_tensor` or can this deleted since it is shared by `torch.nested.to_padded_tensor`?~~ [generated nested docs](https://docs-preview.pytorch.org/84102/nested.html?highlight=nested#module-torch.nested) Differential Revision: [D39361148](https://our.internmc.facebook.com/intern/diff/D39361148) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84102 Approved by: https://github.com/drisspg	2022-09-12 16:31:05 +00:00
Taylor Robie	1fa9a377d0	[Profiler] Start moving python bindings out of autograd (#82584 ) A lot of profiler code still lives in autograd for historic reasons. However as we formalize and clean up profiler internals it makes sense to pull more and more into the profiler folders/namespace. For now I'm just moving some of the core config data structures and those related to `torch::profiler::impl::Result` to keep the scope manageable. Differential Revision: [D37961462](https://our.internmc.facebook.com/intern/diff/D37961462/) NOTE FOR REVIEWERS: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D37961462/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/82584 Approved by: https://github.com/albanD, https://github.com/Gamrix	2022-08-19 17:15:18 +00:00
Edward Z. Yang	df69660832	Revert "Revert "Add a lint rule for torch/csrc/util/pybind.h include (#82552 )"" (#82599 ) This reverts commit `532b8a9e00`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/82599 Approved by: https://github.com/albanD	2022-08-02 19:37:02 +00:00
Elias Ellison	642aed8b99	Add Autocast Support for FakeTensors / use fake device dispatch keys (#82449 ) From PR: ``` Note: [Fake Tensor Dispatch Keys] In order to model the behavior of device-specific autocast and autograd logic, we update the dispatch keys of FakeTensors to reflect their fake device. This includes the BackendComponent (DispatchKey::Meta -> DispatchKey::CUDA), and also the BackendComponent related Autocast and Autograd keys. __torch__dispatch__ sits below Autocast and Autograd, and is only invoked when we are at the kernel for the BackendComponent. Then, we add Meta to the thread-local dispatch include set to hit the meta kernel instead of the kernel of the BackendComponent for the fake device. ``` Also adds the `conv1/2/3d.padding` operators to the Autocast rule set. Without that fix, the FakeTensor dtype would diverge. See: https://github.com/pytorch/pytorch/issues/81608 Pull Request resolved: https://github.com/pytorch/pytorch/pull/82449 Approved by: https://github.com/ezyang	2022-08-01 21:40:36 +00:00
PyTorch MergeBot	532b8a9e00	Revert "Add a lint rule for torch/csrc/util/pybind.h include (#82552 )" This reverts commit `9465c0e0b5`. Reverted https://github.com/pytorch/pytorch/pull/82552 on behalf of https://github.com/zengk95 due to This seems to be breaking windows binary wheels	2022-08-01 20:25:35 +00:00
Edward Z. Yang	9465c0e0b5	Add a lint rule for torch/csrc/util/pybind.h include (#82552 ) We define specializations for pybind11 defined templates (in particular, PYBIND11_DECLARE_HOLDER_TYPE) and consequently it is important that these specializations always be #include'd when making use of pybind11 templates whose behavior depends on these specializations, otherwise we can cause an ODR violation. The easiest way to ensure that all the specializations are always loaded is to designate a header (in this case, torch/csrc/util/pybind.h) that ensures the specializations are defined, and then add a lint to ensure this header is included whenever pybind11 headers are included. The existing grep linter didn't have enough knobs to do this conveniently, so I added some features. I'm open to suggestions for how to structure the features better. The main changes: - Added an --allowlist-pattern flag, which turns off the grep lint if some other line exists. This is used to stop the grep lint from complaining about pybind11 includes if the util include already exists. - Added --match-first-only flag, which lets grep only match against the first matching line. This is because, even if there are multiple includes that are problematic, I only need to fix one of them. We don't /really/ need this, but when I was running lintrunner -a to fixup the preexisting codebase it was annoying without this, as the lintrunner overall driver fails if there are multiple edits on the same file. I excluded any files that didn't otherwise have a dependency on torch/ATen, this was mostly caffe2 and the valgrind wrapper compat bindings. Note the grep replacement is kind of crappy, but clang-tidy lint cleaned it up in most cases. See also https://github.com/pybind/pybind11/issues/4099 Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/82552 Approved by: https://github.com/albanD	2022-08-01 17:16:58 +00:00
Jing Xu	0e95746580	[RFC] enable oneMKL&oneDNN on-demands verbose functinality (#63212 ) RFC: Problem statement  Intel oneMKL and oneDNN are used to accelerate performance on Intel platforms. Both these 2 libraries provide verbose functionality to dump detailed operator execution information as well as execution time. These verbose messages are very helpful to performance profiling. However, the verbose functionality works for the entire execution. In many scenarios, though, we only would like to profile partial of the execution process. This feature is to expose PyTorch API functions to control oneDNN and oneMKL verbose functionality in runtime. Additional context   The most used performance profiling steps are shown as the following code snippet: ``` def inference(model, inputs): # step0 (optional): jit model = torch.jit.trace(model, inputs) # step1: warmup for _ in range(100): model(inputs) # step2: performance profiling. We only care the profiling result, as well as oneDNN and oneMKL verbose messages, of this step model(inputs) # step3 (optional): benchmarking t0 = time.time() for _ in range(100): model(inputs) t1 = time.time() print(‘dur: {}’.format((t1-t0)/100)) return model(inputs) ``` Since environment variables MKL_VERBOSE and DNNL_VERBOSE will be effect to the entire progress, we will get a great number of verbose messages for all of 101 iterations (if step3 is not involved). However, we only care about the verbose messages dumped in step2. It is very difficult to filter unnecessary verbose messages out if we are running into a complicated usages scenario. Also, jit trace will also bring more undesired verbose messages. Furthermore, there are more complicated topologies or usages like cascaded topologies as below: ``` model1 = Model1() model2 = Model2() model3 = Model3() x1 = inference(model1, x) x2 = inference(model2, x1) y = inference(model3, x2) ``` There are many cases that it is very hard to split these child topologies out. In this scenario, it is not possible to investigate performance of each individual topology with `DNNL_VERBOSE` and `MKL_VERBOSE`. To solve this issue, oneDNN and oneMKL provide API functions to make it possible to control verbose functionality in runtime. ``` int mkl_verbose (int enable) status dnnl::set_verbose(int level) ``` oneDNN and oneMKL print verbose messages to stdout when oneMKL or oneDNN ops are executed. Sample verbose messages: ``` MKL_VERBOSE SGEMM(t,n,768,2048,3072,0x7fff64115800,0x7fa1aca58040,3072,0x1041f5c0,3072,0x7fff64115820,0x981f0c0,768) 8.52ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:44 dnnl_verbose,exec,cpu,inner_product,brgemm:avx512_core,forward_training,src_f32::blocked:ab:f0 wei_f32::blocked:AB16b64a:f0 bia_f32::blocked:a:f0 dst_f32::blocked:ab:f0,,,mb16ic768oc768,0.0839844 ``` Design and implementation  The design is to make python-interfaced wrap functions to invoke mkl_verbose and dnnl::set_verbose functions. Design concern   - Need to add wrapper C++ functions for mkl_verbose and dnnl::set_verbose functions in torch/csrc and aten/csrc. - Python API functions will be added to device-specific backends - with torch.backends.mkl.verbose(1): - with torch.backends.mkldnn.verbose(1): Use cases   ``` def inference(model, inputs): # step0 (optional): jit model = torch.jit.trace(model, inputs) # step1: warmup for _ in range(100): model(inputs) # step2: performance profiling with torch.backends.mkl.verbose(1), torch.backends.mkldnn.verbose(1): model(inputs) # step3 (optional): benchmarking t0 = time.time() for _ in range(100): model(inputs) t1 = time.time() print(‘dur: {}’.format((t1-t0)/100)) return model(inputs) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/63212 Approved by: https://github.com/VitalyFedyunin, https://github.com/malfet	2022-07-27 23:29:35 +00:00
Jing Xu	3c7044728b	Enable Intel® VTune™ Profiler's Instrumentation and Tracing Technology APIs (ITT) to PyTorch (#63289 ) More detailed description of benefits can be found at #41001. This is Intel's counterpart of NVidia’s NVTX (https://pytorch.org/docs/stable/autograd.html#torch.autograd.profiler.emit_nvtx). ITT is a functionality for labeling trace data during application execution across different Intel tools. For integrating Intel(R) VTune Profiler into Kineto, ITT needs to be integrated into PyTorch first. It works with both standalone VTune Profiler [(https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html](https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html)) and Kineto-integrated VTune functionality in the future. It works for both Intel CPU and Intel XPU devices. Pitch Add VTune Profiler's ITT API function calls to annotate PyTorch ops, as well as developer customized code scopes on CPU, like NVTX for NVidia GPU. This PR rebases the code changes at https://github.com/pytorch/pytorch/pull/61335 to the latest master branch. Usage example: ``` with torch.autograd.profiler.emit_itt(): for i in range(10): torch.itt.range_push('step_{}'.format(i)) model(input) torch.itt.range_pop() ``` cc @ilia-cher @robieta @chaekit @gdankel @bitfort @ngimel @orionr @nbcsm @guotuofeng @guyang3532 @gaoteng-git Pull Request resolved: https://github.com/pytorch/pytorch/pull/63289 Approved by: https://github.com/malfet	2022-07-13 13:50:15 +00:00
PyTorch MergeBot	1454515253	Revert "Enable Intel® VTune™ Profiler's Instrumentation and Tracing Technology APIs (ITT) to PyTorch (#63289 )" This reverts commit `f988aa2b3f`. Reverted https://github.com/pytorch/pytorch/pull/63289 on behalf of https://github.com/malfet due to broke trunk, see `f988aa2b3f`	2022-06-30 12:49:41 +00:00
Jing Xu	f988aa2b3f	Enable Intel® VTune™ Profiler's Instrumentation and Tracing Technology APIs (ITT) to PyTorch (#63289 ) More detailed description of benefits can be found at #41001. This is Intel's counterpart of NVidia’s NVTX (https://pytorch.org/docs/stable/autograd.html#torch.autograd.profiler.emit_nvtx). ITT is a functionality for labeling trace data during application execution across different Intel tools. For integrating Intel(R) VTune Profiler into Kineto, ITT needs to be integrated into PyTorch first. It works with both standalone VTune Profiler [(https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html](https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html)) and Kineto-integrated VTune functionality in the future. It works for both Intel CPU and Intel XPU devices. Pitch Add VTune Profiler's ITT API function calls to annotate PyTorch ops, as well as developer customized code scopes on CPU, like NVTX for NVidia GPU. This PR rebases the code changes at https://github.com/pytorch/pytorch/pull/61335 to the latest master branch. Usage example: ``` with torch.autograd.profiler.emit_itt(): for i in range(10): torch.itt.range_push('step_{}'.format(i)) model(input) torch.itt.range_pop() ``` cc @ilia-cher @robieta @chaekit @gdankel @bitfort @ngimel @orionr @nbcsm @guotuofeng @guyang3532 @gaoteng-git Pull Request resolved: https://github.com/pytorch/pytorch/pull/63289 Approved by: https://github.com/malfet	2022-06-30 05:14:03 +00:00
Michael Suo	30fb2c4aba	[lint] autoformat test/cpp and torch/csrc Let's have some fun. Pull Request resolved: https://github.com/pytorch/pytorch/pull/78828 Approved by: https://github.com/ezyang	2022-06-11 21:11:16 +00:00
anjali411	38350acf8f	Autogen Tags enum, and allow specifying tags while defining an op Pull Request resolved: https://github.com/pytorch/pytorch/pull/79322 Approved by: https://github.com/albanD	2022-06-11 00:29:32 +00:00
Elias Ellison	3c5a3ca9e8	Make FakeTensors return meta within kerenl invocation, add FakeTensor op tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/78972 Approved by: https://github.com/ezyang	2022-06-09 01:39:27 +00:00
PyTorch MergeBot	954522a485	Revert "Autogen Tags enum, and allow specifying tags while defining an op" This reverts commit `9476a78f37`. Reverted https://github.com/pytorch/pytorch/pull/77313 on behalf of https://github.com/malfet due to Broke OSS buck builds, see `9476a78f37`	2022-06-03 01:53:53 +00:00
anjali411	9476a78f37	Autogen Tags enum, and allow specifying tags while defining an op Pull Request resolved: https://github.com/pytorch/pytorch/pull/77313 Approved by: https://github.com/ezyang, https://github.com/albanD	2022-06-03 01:13:44 +00:00
PyTorch MergeBot	b994ce359e	Revert "[cuDNN V8 API] (reopen) Allow the number of kernels profiled under torch.backends.cudnn.benchmark = True to be limitedCudnnv8 benchmark limit (#77002 )" This reverts commit `c274f2ad52`. Reverted https://github.com/pytorch/pytorch/pull/77002 on behalf of https://github.com/malfet due to please, as it breaks internal CI, but also no CUDA heads should be included from `torch/csrc/Module.cpp`, but rather should be implemented/registered in `torch/csrc/cuda/Module.cpp`	2022-05-24 21:52:35 +00:00
Nikita Shulga	6244daa6a9	[MPS] Fix torch.mps.is_available() (#78121 ) By introducing `at:mps::is_available()` and changing `torch._C._is_mps_available` from property to memoizable callable Also, if `_mtl_device` is released in MPSDevice destructor, shouldn't it be retained in the constructor Looks like GitHubActions Mac runner does not have any Metal devices available, according to https://github.com/malfet/deleteme/runs/6560871657?check_suite_focus=true#step:3:15 Pull Request resolved: https://github.com/pytorch/pytorch/pull/78121 Approved by: https://github.com/albanD	2022-05-24 05:10:38 +00:00
Eddie Yan	c274f2ad52	[cuDNN V8 API] (reopen) Allow the number of kernels profiled under torch.backends.cudnn.benchmark = True to be limitedCudnnv8 benchmark limit (#77002 ) (reopening due to botched merge) The cuDNN V8 API (main support merged in https://github.com/pytorch/pytorch/pull/60755) potentially exposes many more kernels with benchmark=True. While these additional kernels can improve performance, it is often unnecessary to run every kernel returned by the heuristic and doing so may degrade the user experience by causing the first model iteration to be very slow. To alleviate this issue, this PR introduces torch.backends.cudnn.benchmark_limit. benchmark_limit specifies the maximum number of working cuDNN kernels to try for a given workload, with the default being 10 (similar to what TensorFlow does). benchmark_limit = 0 yields the current behavior of trying every kernel returned by the heuristic. CC @ptrblck @ngimel @xwang233 Pull Request resolved: https://github.com/pytorch/pytorch/pull/77002 Approved by: https://github.com/ngimel	2022-05-24 00:11:47 +00:00
Jeff Daily	9aed30d3ad	[ROCm] support benchmark flag for MIOpen (#77438 ) Fixes #68172. Generally, this corrects multiple flaky convolution unit test behavior seen on ROCm. The MIOpen integration has been forcing benchmark=True when calling `torch._C._set_cudnn_benchmark(False)`, typically called by `torch.backends.cudnn.set_flags(enabled=True, benchmark=False)`. We now add support for MIOpen immediate mode to avoid benchmarking during MIOpen solution selection. Pull Request resolved: https://github.com/pytorch/pytorch/pull/77438 Approved by: https://github.com/ngimel, https://github.com/malfet	2022-05-23 17:10:24 +00:00
Kurt Mohler	aea6e2c396	Merge torch.cuda._UntypedStorage into torch._UntypedStorage (#75459 ) Fixes #74933 Pull Request resolved: https://github.com/pytorch/pytorch/pull/75459 Approved by: https://github.com/ezyang	2022-05-19 13:54:39 +00:00
Kulin Seth	f348b1b2b5	Add the Runtime components for MPS backend. (#76725 ) The PR adds the runtime components and few basic operations like copy, as_strided for MPS backend. Current list of identified TODOs are: - https://github.com/pytorch/pytorch/issues/77176 - Unify the logic with CUDACachingAllocator and remove redundant code. - https://github.com/pytorch/pytorch/issues/77170 - Look into using C++ smart pointers where possible with ObjC code - Use empty_strided_generic() to implement the `empty_strided_mps` code - https://github.com/pytorch/pytorch/issues/77144 Pull Request resolved: https://github.com/pytorch/pytorch/pull/76725 Approved by: https://github.com/albanD	2022-05-11 17:19:45 +00:00
Eddie Yan	e838137b3e	Add high level control of fp32 matmul precision; disable TF32 for matmuls by default #76440 CC @mruberry @ptrblck Pull Request resolved: https://github.com/pytorch/pytorch/pull/76509 Approved by: https://github.com/ngimel	2022-05-04 20:40:13 +00:00
Nikita Shulga	8473173c36	Remove breakpad dependency This functionality does not seem to be used and there are some requests to update dependency. Add `third_party` to torch_cpu include directories if compiling with Caffe2 support, as `caffe2/quantization/server/conv_dnnlowp_op.cc` depends on `third_party/fbgemm/src/RefImplementations.h` Pull Request resolved: https://github.com/pytorch/pytorch/pull/75394 Approved by: https://github.com/janeyx99, https://github.com/seemethere	2022-05-03 20:21:55 +00:00
Kulin Seth	54c75e1e8f	Add "mps" device to PyTorch framework. Remove the "mlc" device for Mac platforms. This commit will be followed up with: * adding MPS runtime components * PyTorch ops for MPS device Fixes #ISSUE_NUMBER Pull Request resolved: https://github.com/pytorch/pytorch/pull/76291 Approved by: https://github.com/albanD	2022-04-27 19:21:57 +00:00
Peter Bell	58fb3f018e	Fix conjugate bit discrepancy in composite compliance When testing composite compliance, the conj bit and neg bit are not propagated to the wrapper tensor. This leads to problems when a composite operator has two paths depending on whether one of these bits are set, since the non-conjugated path will always be taken. For example, `at::real` effectively does ```cpp view_as_real(tensor.is_conj() ? tensor.conj() : tensor) ``` which will never call `conj()` because the `CompositeCompliantTensor` never has has the conj bit set. The result is `view_as_real` fails when `r.elem` does have the conj bit set. Pull Request resolved: https://github.com/pytorch/pytorch/pull/75830 Approved by: https://github.com/zou3519	2022-04-19 13:59:28 +00:00
PyTorch MergeBot	d79d9fa283	Revert "Remove breakpad dependency" This reverts commit `9aa3c7fd83`. Reverted https://github.com/pytorch/pytorch/pull/75394 on behalf of https://github.com/malfet	2022-04-17 17:58:51 +00:00
Nikita Shulga	9aa3c7fd83	Remove breakpad dependency This functionality does not seem to be used and there are some requests to update dependency Pull Request resolved: https://github.com/pytorch/pytorch/pull/75394 Approved by: https://github.com/janeyx99, https://github.com/seemethere	2022-04-17 17:43:45 +00:00
Edward Z. Yang	35cfa74f97	Add a default implementation of __torch_dispatch__ I was working on an explanation of how to call into the "super" implementation of some given ATen operation inside of __torch_dispatch__ (https://github.com/albanD/subclass_zoo/blob/main/trivial_tensors.py) and I kept thinking to myself "Why doesn't just calling super() on __torch_dispatch__ work"? Well, after this patch, it does! The idea is if you don't actually unwrap the input tensors, you can call super().__torch_dispatch__ to get at the original behavior. Internally, this is implemented by disabling PythonKey and then redispatching. This implementation of disabled_torch_dispatch is not /quite/ right, and some reasons why are commented in the code. There is then some extra work I have to do to make sure we recognize disabled_torch_dispatch as the "default" implementation (so we don't start slapping PythonKey on all tensors, including base Tensors), which is modeled the same way as how disabled_torch_function is done. Signed-off-by: Edward Z. Yang <ezyangfb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/73684 Approved by: albanD	2022-03-03 20:19:33 +00:00
Can Balioglu	7366724e07	Introduce an environment variable to change c10 log level (#71746 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71746 This PR contains the following improvements: - It exposes a new environment variable `TORCH_CPP_LOG_LEVEL` that enables users to set the log level of c10 logging facility (supports both GLOG and c10 loggers). Valid values are `INFO`, `WARNING`, `ERROR`, and `FATAL` or their numerical equivalents `0`, `1`, `2`, and `3`. - It implements an `initLogging()` function and calls it as part of `torch._C` module import to ensure that the underlying logging facility is correctly initialized in Python. With these changes a user can dynamically set the log level of c10 as in the following example: ``` $ TORCH_CPP_LOG_LEVEL=INFO python my_torch_script.py ``` ghstack-source-id: 149822703 Test Plan: Run existing tests. Reviewed By: malfet Differential Revision: D33756252 fbshipit-source-id: 7fd078c03a598595d992de0b474a23cec91838af (cherry picked from commit 01d6ec6207faedf259ed1368730e9e197cb3e1c6)	2022-02-24 14:34:01 +00:00
Alban Desmaison	7807a83f6e	Fix error handling TestSetDefaultMobileCPUAllocator Pull Request resolved: https://github.com/pytorch/pytorch/pull/73207	2022-02-22 19:45:49 +00:00
Will Constable	328cfd50e7	Move debug_util and python_util to torch/csrc/lazy (#72607 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72607 since python isn't available from libtorch, most of lazy tensor code can't depend on python. separate python_util into libtorch_python library make debug_util and IR dump work with or without python by providing a default function for 'maybe getting python stacktrace' that returns an empty stacktrace use a registration mechanism on libtorch_python library load to update the 'maybe' function to use the real python stacktrace getter Test Plan: OSS build tests: - test_ptltc by itself works - LTC_SAVE_TENSORS_FILE=log test_ptltc works, and log contains empty stacktrces - python examply.py by itself works - LTC_SAVE_TENSORS_FILE=log test_ptltc works, and log contains real stacktraces fbcode build: rely on CI to run test/lazy Reviewed By: desertfire Differential Revision: D34115046 fbshipit-source-id: 8d6222963c146da36b3c1b5ff8a638bbc3f1442e (cherry picked from commit `3717688ade`)	2022-02-11 18:00:40 +00:00
Tristan Rice	bfe1abd3b5	torch/monitor: add pybind (#69567 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69567 This exposes torch.monitor events and stats via pybind11 to the underlying C++ implementation. * The registration interface is a tad different since it takes a lambda function in Python where as in C++ it's a full class. * This has a small amount of changes to the counter interfaces since there's no way to create an initializer list at runtime so they now also take a vector. * Only double based stats are provided in Python since it's intended more for high level stats where float imprecision shouldn't be an issue. This can be changed down the line if need arises. ``` events = [] def handler(event): events.append(event) handle = register_event_handler(handler) log_event(Event(type="torch.monitor.TestEvent", timestamp=datetime.now(), metadata={"foo": 1.0})) ``` D32969391 is now included in this diff. This cleans up the naming for events. type is now name, message is gone, and metadata is renamed data. Test Plan: buck test //caffe2/test:monitor //caffe2/test/cpp/monitor:monitor Reviewed By: kiukchung Differential Revision: D32924141 fbshipit-source-id: 563304c2e3261a4754e40cca39fc64c5a04b43e8	2022-01-12 13:35:11 -08:00
Peter Bell	b08d64202a	Remove THGeneral (#69041 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69041 `TH_CONCAT_{N}` is still being used by THP so I've moved that into it's own header but all the compiled code is gone. Test Plan: Imported from OSS Reviewed By: anjali411 Differential Revision: D32872477 Pulled By: ngimel fbshipit-source-id: 06c82d8f96dbcee0715be407c61dfc7d7e8be47a	2021-12-13 16:14:28 -08:00
kshitij12345	b737e09f60	expose return_types in Python (#66614 ) Summary: https://github.com/facebookresearch/functorch/issues/87 TODO: * [x] Add comments * [x] Add test * [x] Fix XLA <details> <summary>Generated python_return_types.cpp</summary> ```cpp #include <Python.h> #include <vector> #include <map> #include <string> #include "torch/csrc/autograd/python_return_types.h" #include "torch/csrc/utils/structseq.h" #include "torch/csrc/Exceptions.h" namespace { PyTypeObject* get__det_lu_based_helper_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"det", ""}, {"lu", ""}, {"pivs", ""}, {nullptr} }; static PyTypeObject _det_lu_based_helperNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types._det_lu_based_helper", nullptr, NamedTuple_fields, 3 }; if (!is_initialized) { PyStructSequence_InitType(&_det_lu_based_helperNamedTuple, &desc); _det_lu_based_helperNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &_det_lu_based_helperNamedTuple; } PyTypeObject* get__fake_quantize_per_tensor_affine_cachemask_tensor_qparams_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"output", ""}, {"mask", ""}, {nullptr} }; static PyTypeObject _fake_quantize_per_tensor_affine_cachemask_tensor_qparamsNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types._fake_quantize_per_tensor_affine_cachemask_tensor_qparams", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&_fake_quantize_per_tensor_affine_cachemask_tensor_qparamsNamedTuple, &desc); _fake_quantize_per_tensor_affine_cachemask_tensor_qparamsNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &_fake_quantize_per_tensor_affine_cachemask_tensor_qparamsNamedTuple; } PyTypeObject* get__fused_moving_avg_obs_fq_helper_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"output", ""}, {"mask", ""}, {nullptr} }; static PyTypeObject _fused_moving_avg_obs_fq_helperNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types._fused_moving_avg_obs_fq_helper", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&_fused_moving_avg_obs_fq_helperNamedTuple, &desc); _fused_moving_avg_obs_fq_helperNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &_fused_moving_avg_obs_fq_helperNamedTuple; } PyTypeObject* get__lu_with_info_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"LU", ""}, {"pivots", ""}, {"info", ""}, {nullptr} }; static PyTypeObject _lu_with_infoNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types._lu_with_info", nullptr, NamedTuple_fields, 3 }; if (!is_initialized) { PyStructSequence_InitType(&_lu_with_infoNamedTuple, &desc); _lu_with_infoNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &_lu_with_infoNamedTuple; } PyTypeObject* get__unpack_dual_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"primal", ""}, {"tangent", ""}, {nullptr} }; static PyTypeObject _unpack_dualNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types._unpack_dual", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&_unpack_dualNamedTuple, &desc); _unpack_dualNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &_unpack_dualNamedTuple; } PyTypeObject* get_aminmax_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"min", ""}, {"max", ""}, {nullptr} }; static PyTypeObject aminmaxNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.aminmax", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&aminmaxNamedTuple, &desc); aminmaxNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &aminmaxNamedTuple; } PyTypeObject* get_aminmax_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"min", ""}, {"max", ""}, {nullptr} }; static PyTypeObject aminmax_outNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.aminmax_out", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&aminmax_outNamedTuple1, &desc); aminmax_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &aminmax_outNamedTuple1; } PyTypeObject* get_cummax_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""}, {nullptr} }; static PyTypeObject cummaxNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.cummax", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&cummaxNamedTuple, &desc); cummaxNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &cummaxNamedTuple; } PyTypeObject* get_cummax_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""}, {nullptr} }; static PyTypeObject cummax_outNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.cummax_out", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&cummax_outNamedTuple1, &desc); cummax_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &cummax_outNamedTuple1; } PyTypeObject* get_cummin_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""}, {nullptr} }; static PyTypeObject cumminNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.cummin", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&cumminNamedTuple, &desc); cumminNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &cumminNamedTuple; } PyTypeObject* get_cummin_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""}, {nullptr} }; static PyTypeObject cummin_outNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.cummin_out", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&cummin_outNamedTuple1, &desc); cummin_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &cummin_outNamedTuple1; } PyTypeObject* get_eig_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"eigenvalues", ""}, {"eigenvectors", ""}, {nullptr} }; static PyTypeObject eig_outNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.eig_out", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&eig_outNamedTuple, &desc); eig_outNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &eig_outNamedTuple; } PyTypeObject* get_eig_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"eigenvalues", ""}, {"eigenvectors", ""}, {nullptr} }; static PyTypeObject eigNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.eig", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&eigNamedTuple1, &desc); eigNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &eigNamedTuple1; } PyTypeObject* get_frexp_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"mantissa", ""}, {"exponent", ""}, {nullptr} }; static PyTypeObject frexpNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.frexp", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&frexpNamedTuple, &desc); frexpNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &frexpNamedTuple; } PyTypeObject* get_frexp_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"mantissa", ""}, {"exponent", ""}, {nullptr} }; static PyTypeObject frexp_outNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.frexp_out", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&frexp_outNamedTuple1, &desc); frexp_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &frexp_outNamedTuple1; } PyTypeObject* get_geqrf_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"a", ""}, {"tau", ""}, {nullptr} }; static PyTypeObject geqrf_outNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.geqrf_out", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&geqrf_outNamedTuple, &desc); geqrf_outNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &geqrf_outNamedTuple; } PyTypeObject* get_geqrf_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"a", ""}, {"tau", ""}, {nullptr} }; static PyTypeObject geqrfNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.geqrf", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&geqrfNamedTuple1, &desc); geqrfNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &geqrfNamedTuple1; } PyTypeObject* get_histogram_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"hist", ""}, {"bin_edges", ""}, {nullptr} }; static PyTypeObject histogram_outNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.histogram_out", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&histogram_outNamedTuple, &desc); histogram_outNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &histogram_outNamedTuple; } PyTypeObject* get_histogram_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"hist", ""}, {"bin_edges", ""}, {nullptr} }; static PyTypeObject histogramNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.histogram", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&histogramNamedTuple1, &desc); histogramNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &histogramNamedTuple1; } PyTypeObject* get_kthvalue_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""}, {nullptr} }; static PyTypeObject kthvalueNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.kthvalue", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&kthvalueNamedTuple, &desc); kthvalueNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &kthvalueNamedTuple; } PyTypeObject* get_kthvalue_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""}, {nullptr} }; static PyTypeObject kthvalue_outNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.kthvalue_out", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&kthvalue_outNamedTuple1, &desc); kthvalue_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &kthvalue_outNamedTuple1; } PyTypeObject* get_linalg_cholesky_ex_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"L", ""}, {"info", ""}, {nullptr} }; static PyTypeObject linalg_cholesky_exNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.linalg_cholesky_ex", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&linalg_cholesky_exNamedTuple, &desc); linalg_cholesky_exNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &linalg_cholesky_exNamedTuple; } PyTypeObject* get_linalg_cholesky_ex_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"L", ""}, {"info", ""}, {nullptr} }; static PyTypeObject linalg_cholesky_ex_outNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.linalg_cholesky_ex_out", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&linalg_cholesky_ex_outNamedTuple1, &desc); linalg_cholesky_ex_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &linalg_cholesky_ex_outNamedTuple1; } PyTypeObject* get_linalg_eig_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"eigenvalues", ""}, {"eigenvectors", ""}, {nullptr} }; static PyTypeObject linalg_eigNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.linalg_eig", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&linalg_eigNamedTuple, &desc); linalg_eigNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &linalg_eigNamedTuple; } PyTypeObject* get_linalg_eig_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"eigenvalues", ""}, {"eigenvectors", ""}, {nullptr} }; static PyTypeObject linalg_eig_outNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.linalg_eig_out", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&linalg_eig_outNamedTuple1, &desc); linalg_eig_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &linalg_eig_outNamedTuple1; } PyTypeObject* get_linalg_eigh_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"eigenvalues", ""}, {"eigenvectors", ""}, {nullptr} }; static PyTypeObject linalg_eighNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.linalg_eigh", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&linalg_eighNamedTuple, &desc); linalg_eighNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &linalg_eighNamedTuple; } PyTypeObject* get_linalg_eigh_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"eigenvalues", ""}, {"eigenvectors", ""}, {nullptr} }; static PyTypeObject linalg_eigh_outNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.linalg_eigh_out", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&linalg_eigh_outNamedTuple1, &desc); linalg_eigh_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &linalg_eigh_outNamedTuple1; } PyTypeObject* get_linalg_inv_ex_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"inverse", ""}, {"info", ""}, {nullptr} }; static PyTypeObject linalg_inv_exNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.linalg_inv_ex", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&linalg_inv_exNamedTuple, &desc); linalg_inv_exNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &linalg_inv_exNamedTuple; } PyTypeObject* get_linalg_inv_ex_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"inverse", ""}, {"info", ""}, {nullptr} }; static PyTypeObject linalg_inv_ex_outNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.linalg_inv_ex_out", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&linalg_inv_ex_outNamedTuple1, &desc); linalg_inv_ex_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &linalg_inv_ex_outNamedTuple1; } PyTypeObject* get_linalg_lstsq_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"solution", ""}, {"residuals", ""}, {"rank", ""}, {"singular_values", ""}, {nullptr} }; static PyTypeObject linalg_lstsqNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.linalg_lstsq", nullptr, NamedTuple_fields, 4 }; if (!is_initialized) { PyStructSequence_InitType(&linalg_lstsqNamedTuple, &desc); linalg_lstsqNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &linalg_lstsqNamedTuple; } PyTypeObject* get_linalg_lstsq_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"solution", ""}, {"residuals", ""}, {"rank", ""}, {"singular_values", ""}, {nullptr} }; static PyTypeObject linalg_lstsq_outNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.linalg_lstsq_out", nullptr, NamedTuple_fields, 4 }; if (!is_initialized) { PyStructSequence_InitType(&linalg_lstsq_outNamedTuple1, &desc); linalg_lstsq_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &linalg_lstsq_outNamedTuple1; } PyTypeObject* get_linalg_qr_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"Q", ""}, {"R", ""}, {nullptr} }; static PyTypeObject linalg_qrNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.linalg_qr", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&linalg_qrNamedTuple, &desc); linalg_qrNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &linalg_qrNamedTuple; } PyTypeObject* get_linalg_qr_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"Q", ""}, {"R", ""}, {nullptr} }; static PyTypeObject linalg_qr_outNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.linalg_qr_out", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&linalg_qr_outNamedTuple1, &desc); linalg_qr_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &linalg_qr_outNamedTuple1; } PyTypeObject* get_linalg_slogdet_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"sign", ""}, {"logabsdet", ""}, {nullptr} }; static PyTypeObject linalg_slogdetNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.linalg_slogdet", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&linalg_slogdetNamedTuple, &desc); linalg_slogdetNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &linalg_slogdetNamedTuple; } PyTypeObject* get_linalg_slogdet_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"sign", ""}, {"logabsdet", ""}, {nullptr} }; static PyTypeObject linalg_slogdet_outNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.linalg_slogdet_out", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&linalg_slogdet_outNamedTuple1, &desc); linalg_slogdet_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &linalg_slogdet_outNamedTuple1; } PyTypeObject* get_linalg_svd_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"U", ""}, {"S", ""}, {"Vh", ""}, {nullptr} }; static PyTypeObject linalg_svd_outNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.linalg_svd_out", nullptr, NamedTuple_fields, 3 }; if (!is_initialized) { PyStructSequence_InitType(&linalg_svd_outNamedTuple, &desc); linalg_svd_outNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &linalg_svd_outNamedTuple; } PyTypeObject* get_linalg_svd_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"U", ""}, {"S", ""}, {"Vh", ""}, {nullptr} }; static PyTypeObject linalg_svdNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.linalg_svd", nullptr, NamedTuple_fields, 3 }; if (!is_initialized) { PyStructSequence_InitType(&linalg_svdNamedTuple1, &desc); linalg_svdNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &linalg_svdNamedTuple1; } PyTypeObject* get_lstsq_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"solution", ""}, {"QR", ""}, {nullptr} }; static PyTypeObject lstsq_outNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.lstsq_out", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&lstsq_outNamedTuple, &desc); lstsq_outNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &lstsq_outNamedTuple; } PyTypeObject* get_lstsq_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"solution", ""}, {"QR", ""}, {nullptr} }; static PyTypeObject lstsqNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.lstsq", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&lstsqNamedTuple1, &desc); lstsqNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &lstsqNamedTuple1; } PyTypeObject* get_lu_unpack_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"P", ""}, {"L", ""}, {"U", ""}, {nullptr} }; static PyTypeObject lu_unpackNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.lu_unpack", nullptr, NamedTuple_fields, 3 }; if (!is_initialized) { PyStructSequence_InitType(&lu_unpackNamedTuple, &desc); lu_unpackNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &lu_unpackNamedTuple; } PyTypeObject* get_lu_unpack_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"P", ""}, {"L", ""}, {"U", ""}, {nullptr} }; static PyTypeObject lu_unpack_outNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.lu_unpack_out", nullptr, NamedTuple_fields, 3 }; if (!is_initialized) { PyStructSequence_InitType(&lu_unpack_outNamedTuple1, &desc); lu_unpack_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &lu_unpack_outNamedTuple1; } PyTypeObject* get_max_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""}, {nullptr} }; static PyTypeObject maxNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.max", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&maxNamedTuple, &desc); maxNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &maxNamedTuple; } PyTypeObject* get_max_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""}, {nullptr} }; static PyTypeObject max_outNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.max_out", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&max_outNamedTuple1, &desc); max_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &max_outNamedTuple1; } PyTypeObject* get_median_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""}, {nullptr} }; static PyTypeObject medianNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.median", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&medianNamedTuple, &desc); medianNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &medianNamedTuple; } PyTypeObject* get_median_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""}, {nullptr} }; static PyTypeObject median_outNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.median_out", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&median_outNamedTuple1, &desc); median_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &median_outNamedTuple1; } PyTypeObject* get_min_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""}, {nullptr} }; static PyTypeObject minNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.min", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&minNamedTuple, &desc); minNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &minNamedTuple; } PyTypeObject* get_min_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""}, {nullptr} }; static PyTypeObject min_outNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.min_out", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&min_outNamedTuple1, &desc); min_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &min_outNamedTuple1; } PyTypeObject* get_mode_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""}, {nullptr} }; static PyTypeObject modeNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.mode", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&modeNamedTuple, &desc); modeNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &modeNamedTuple; } PyTypeObject* get_mode_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""}, {nullptr} }; static PyTypeObject mode_outNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.mode_out", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&mode_outNamedTuple1, &desc); mode_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &mode_outNamedTuple1; } PyTypeObject* get_nanmedian_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""}, {nullptr} }; static PyTypeObject nanmedianNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.nanmedian", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&nanmedianNamedTuple, &desc); nanmedianNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &nanmedianNamedTuple; } PyTypeObject* get_nanmedian_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""}, {nullptr} }; static PyTypeObject nanmedian_outNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.nanmedian_out", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&nanmedian_outNamedTuple1, &desc); nanmedian_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &nanmedian_outNamedTuple1; } PyTypeObject* get_qr_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"Q", ""}, {"R", ""}, {nullptr} }; static PyTypeObject qr_outNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.qr_out", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&qr_outNamedTuple, &desc); qr_outNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &qr_outNamedTuple; } PyTypeObject* get_qr_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"Q", ""}, {"R", ""}, {nullptr} }; static PyTypeObject qrNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.qr", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&qrNamedTuple1, &desc); qrNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &qrNamedTuple1; } PyTypeObject* get_slogdet_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"sign", ""}, {"logabsdet", ""}, {nullptr} }; static PyTypeObject slogdetNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.slogdet", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&slogdetNamedTuple, &desc); slogdetNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &slogdetNamedTuple; } PyTypeObject* get_solve_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"solution", ""}, {"LU", ""}, {nullptr} }; static PyTypeObject solveNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.solve", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&solveNamedTuple, &desc); solveNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &solveNamedTuple; } PyTypeObject* get_solve_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"solution", ""}, {"LU", ""}, {nullptr} }; static PyTypeObject solve_outNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.solve_out", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&solve_outNamedTuple1, &desc); solve_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &solve_outNamedTuple1; } PyTypeObject* get_sort_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""}, {nullptr} }; static PyTypeObject sort_outNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.sort_out", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&sort_outNamedTuple, &desc); sort_outNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &sort_outNamedTuple; } PyTypeObject* get_sort_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""}, {nullptr} }; static PyTypeObject sortNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.sort", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&sortNamedTuple1, &desc); sortNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &sortNamedTuple1; } PyTypeObject* get_svd_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"U", ""}, {"S", ""}, {"V", ""}, {nullptr} }; static PyTypeObject svd_outNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.svd_out", nullptr, NamedTuple_fields, 3 }; if (!is_initialized) { PyStructSequence_InitType(&svd_outNamedTuple, &desc); svd_outNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &svd_outNamedTuple; } PyTypeObject* get_svd_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"U", ""}, {"S", ""}, {"V", ""}, {nullptr} }; static PyTypeObject svdNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.svd", nullptr, NamedTuple_fields, 3 }; if (!is_initialized) { PyStructSequence_InitType(&svdNamedTuple1, &desc); svdNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &svdNamedTuple1; } PyTypeObject* get_symeig_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"eigenvalues", ""}, {"eigenvectors", ""}, {nullptr} }; static PyTypeObject symeig_outNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.symeig_out", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&symeig_outNamedTuple, &desc); symeig_outNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &symeig_outNamedTuple; } PyTypeObject* get_symeig_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"eigenvalues", ""}, {"eigenvectors", ""}, {nullptr} }; static PyTypeObject symeigNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.symeig", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&symeigNamedTuple1, &desc); symeigNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &symeigNamedTuple1; } PyTypeObject* get_topk_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""}, {nullptr} }; static PyTypeObject topk_outNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.topk_out", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&topk_outNamedTuple, &desc); topk_outNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &topk_outNamedTuple; } PyTypeObject* get_topk_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""}, {nullptr} }; static PyTypeObject topkNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.topk", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&topkNamedTuple1, &desc); topkNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &topkNamedTuple1; } PyTypeObject* get_triangular_solve_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"solution", ""}, {"cloned_coefficient", ""}, {nullptr} }; static PyTypeObject triangular_solve_outNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.triangular_solve_out", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&triangular_solve_outNamedTuple, &desc); triangular_solve_outNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &triangular_solve_outNamedTuple; } PyTypeObject* get_triangular_solve_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"solution", ""}, {"cloned_coefficient", ""}, {nullptr} }; static PyTypeObject triangular_solveNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.triangular_solve", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&triangular_solveNamedTuple1, &desc); triangular_solveNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &triangular_solveNamedTuple1; } } namespace torch { namespace autograd { std::map<std::string, PyTypeObject>& get_namedtuple_types_map() { // [NOTE] Non-global map // This map calls Python functions during its initialization. // If it is a global static variable and in case it is loaded // before Python interpreter is ready, then the calls it makes during // initialization will SEGFAULT. // To avoid this we make it function static variable so that it is // initialized only after the Python interpreter is ready. static std::map<std::string, PyTypeObject> namedtuple_types_map = { {"_det_lu_based_helper", get__det_lu_based_helper_namedtuple()}, {"_fake_quantize_per_tensor_affine_cachemask_tensor_qparams", get__fake_quantize_per_tensor_affine_cachemask_tensor_qparams_namedtuple()}, {"_fused_moving_avg_obs_fq_helper", get__fused_moving_avg_obs_fq_helper_namedtuple()}, {"_lu_with_info", get__lu_with_info_namedtuple()}, {"_unpack_dual", get__unpack_dual_namedtuple()}, {"aminmax", get_aminmax_namedtuple()}, {"aminmax_out", get_aminmax_out_namedtuple()}, {"cummax", get_cummax_namedtuple()}, {"cummax_out", get_cummax_out_namedtuple()}, {"cummin", get_cummin_namedtuple()}, {"cummin_out", get_cummin_out_namedtuple()}, {"eig_out", get_eig_out_namedtuple()}, {"eig", get_eig_namedtuple()}, {"frexp", get_frexp_namedtuple()}, {"frexp_out", get_frexp_out_namedtuple()}, {"geqrf_out", get_geqrf_out_namedtuple()}, {"geqrf", get_geqrf_namedtuple()}, {"histogram_out", get_histogram_out_namedtuple()}, {"histogram", get_histogram_namedtuple()}, {"kthvalue", get_kthvalue_namedtuple()}, {"kthvalue_out", get_kthvalue_out_namedtuple()}, {"linalg_cholesky_ex", get_linalg_cholesky_ex_namedtuple()}, {"linalg_cholesky_ex_out", get_linalg_cholesky_ex_out_namedtuple()}, {"linalg_eig", get_linalg_eig_namedtuple()}, {"linalg_eig_out", get_linalg_eig_out_namedtuple()}, {"linalg_eigh", get_linalg_eigh_namedtuple()}, {"linalg_eigh_out", get_linalg_eigh_out_namedtuple()}, {"linalg_inv_ex", get_linalg_inv_ex_namedtuple()}, {"linalg_inv_ex_out", get_linalg_inv_ex_out_namedtuple()}, {"linalg_lstsq", get_linalg_lstsq_namedtuple()}, {"linalg_lstsq_out", get_linalg_lstsq_out_namedtuple()}, {"linalg_qr", get_linalg_qr_namedtuple()}, {"linalg_qr_out", get_linalg_qr_out_namedtuple()}, {"linalg_slogdet", get_linalg_slogdet_namedtuple()}, {"linalg_slogdet_out", get_linalg_slogdet_out_namedtuple()}, {"linalg_svd_out", get_linalg_svd_out_namedtuple()}, {"linalg_svd", get_linalg_svd_namedtuple()}, {"lstsq_out", get_lstsq_out_namedtuple()}, {"lstsq", get_lstsq_namedtuple()}, {"lu_unpack", get_lu_unpack_namedtuple()}, {"lu_unpack_out", get_lu_unpack_out_namedtuple()}, {"max", get_max_namedtuple()}, {"max_out", get_max_out_namedtuple()}, {"median", get_median_namedtuple()}, {"median_out", get_median_out_namedtuple()}, {"min", get_min_namedtuple()}, {"min_out", get_min_out_namedtuple()}, {"mode", get_mode_namedtuple()}, {"mode_out", get_mode_out_namedtuple()}, {"nanmedian", get_nanmedian_namedtuple()}, {"nanmedian_out", get_nanmedian_out_namedtuple()}, {"qr_out", get_qr_out_namedtuple()}, {"qr", get_qr_namedtuple()}, {"slogdet", get_slogdet_namedtuple()}, {"solve", get_solve_namedtuple()}, {"solve_out", get_solve_out_namedtuple()}, {"sort_out", get_sort_out_namedtuple()}, {"sort", get_sort_namedtuple()}, {"svd_out", get_svd_out_namedtuple()}, {"svd", get_svd_namedtuple()}, {"symeig_out", get_symeig_out_namedtuple()}, {"symeig", get_symeig_namedtuple()}, {"topk_out", get_topk_out_namedtuple()}, {"topk", get_topk_namedtuple()}, {"triangular_solve_out", get_triangular_solve_out_namedtuple()}, {"triangular_solve", get_triangular_solve_namedtuple()}, }; return namedtuple_types_map; } PyTypeObject* get_namedtuple(std::string name) { static auto& namedtuple_types_map = get_namedtuple_types_map(); return namedtuple_types_map[name]; } void initReturnTypes(PyObject* module) { static struct PyModuleDef def = { PyModuleDef_HEAD_INIT, "torch._C._return_types", nullptr, -1, {}}; PyObject* return_types_module = PyModule_Create(&def); if (!return_types_module) { throw python_error(); } for (const auto& return_type_pair : get_namedtuple_types_map()) { // hold onto the TypeObject for the unlikely case of user // deleting or overriding it. Py_INCREF(return_type_pair.second); if (PyModule_AddObject( return_types_module, return_type_pair.first.c_str(), (PyObject)return_type_pair.second) != 0) { Py_DECREF((PyObject)return_type_pair.second); throw python_error(); } } // steals a reference to return_types on success if (PyModule_AddObject(module, "_return_types", return_types_module) != 0) { Py_DECREF(return_types_module); throw python_error(); } } } // namespace autograd } // namespace torch ``` </details> <details> <summary>Eg. updated call in other python__functions</summary> ```cpp // linalg_cholesky_ex static PyObject THPVariable_linalg_cholesky_ex(PyObject* self_, PyObject* args, PyObject* kwargs) { HANDLE_TH_ERRORS static PyTypeObject* NamedTuple = get_namedtuple("linalg_cholesky_ex"); static PyTypeObject* NamedTuple1 = get_namedtuple("linalg_cholesky_ex_out"); static PythonArgParser parser({ "linalg_cholesky_ex(Tensor input, , bool upper=False, bool check_errors=False, TensorList[2] out=None)", }, /traceable=/true); ParsedArgs<4> parsed_args; auto _r = parser.parse(nullptr, args, kwargs, parsed_args); if(_r.has_torch_function()) { return handle_torch_function(_r, nullptr, args, kwargs, THPLinalgVariableFunctionsModule, "torch.linalg"); } if (_r.isNone(3)) { // aten::linalg_cholesky_ex(Tensor self, , bool upper=False, bool check_errors=False) -> (Tensor L, Tensor info) auto dispatch_linalg_cholesky_ex = [](const at::Tensor & self, bool upper, bool check_errors) -> ::std::tuple<at::Tensor,at::Tensor> { pybind11::gil_scoped_release no_gil; return at::linalg_cholesky_ex(self, upper, check_errors); }; return wrap(NamedTuple, dispatch_linalg_cholesky_ex(_r.tensor(0), _r.toBool(1), _r.toBool(2))); } else { // aten::linalg_cholesky_ex.L(Tensor self, *, bool upper=False, bool check_errors=False, Tensor(a!) L, Tensor(b!) info) -> (Tensor(a!) L, Tensor(b!) info) auto out = _r.tensorlist_n<2>(3); auto dispatch_linalg_cholesky_ex_out = [](at::Tensor & L, at::Tensor & info, const at::Tensor & self, bool upper, bool check_errors) -> ::std::tuple<at::Tensor,at::Tensor> { pybind11::gil_scoped_release no_gil; return at::linalg_cholesky_ex_out(L, info, self, upper, check_errors); }; return wrap(NamedTuple1, dispatch_linalg_cholesky_ex_out(out[0], out[1], _r.tensor(0), _r.toBool(1), _r.toBool(2))); } Py_RETURN_NONE; END_HANDLE_TH_ERRORS } ``` </details> Pull Request resolved: https://github.com/pytorch/pytorch/pull/66614 Reviewed By: H-Huang Differential Revision: D32741134 Pulled By: zou3519 fbshipit-source-id: 27bada30d20e66333ca1be1775608d9f0cbf9f59	2021-12-06 09:05:29 -08:00
Xiao Wang	bfe5ad28e6	[Linalg] Add a runtime switch to let pytorch prefer a backend impl in linalg functions on GPU (#67980 ) Summary: Per title. This PR introduces a global flag that lets pytorch prefer one of the many backend implementations while calling linear algebra functions on GPU. Usage: ```python torch.backends.cuda.preferred_linalg_library('cusolver') ``` Available options (str): `'default'`, `'cusolver'`, `'magma'`. Issue https://github.com/pytorch/pytorch/issues/63992 inspired me to write this PR. No heuristic is perfect on all devices, library versions, matrix shapes, workloads, etc. We can obtain better performance if we can conveniently switch linear algebra backends at runtime. Performance of linear algebra operators after this PR should be no worse than before. The flag is set to `'default'` by default, which makes everything the same as before this PR. The implementation of this PR is basically following that of https://github.com/pytorch/pytorch/pull/67790. Pull Request resolved: https://github.com/pytorch/pytorch/pull/67980 Reviewed By: mruberry Differential Revision: D32849457 Pulled By: ngimel fbshipit-source-id: 679fee7744a03af057995aef06316306073010a6	2021-12-03 19:06:30 -08:00
Michael Suo	0aa9d177fe	[fx] remove CPatcher (#69032 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69032 I am removing it because, for packaging-related reasons, it's easier if torch.fx is a pure Python module. I don't think there is much reason to keep it: this functionality was experimental, has no known users currently, and we didn't have a clear path to turning it on by default due to regressions in tracing performance. Also, it only was ever enabled for `rand` and friends. Technically the removal of the `enable_cpatching` arguments on `symbolic_trace` and `Tracer.__init__` are BC-breaking, but the docstrings clearly state that the argument is experimental and BC is not guaranteed, so I think it's fine. Test Plan: Imported from OSS Reviewed By: soulitzer Differential Revision: D32706344 Pulled By: suo fbshipit-source-id: 501648b5c3610ae71829b5e7db74e3b8c9e1a480	2021-11-30 11:59:57 -08:00
Christian Puhrsch	75955e4ef8	[clone][sparse] Add `torch._C._sparse` namespace (#68672 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68672 This PR adds `python_module: sparse` to `native_function.yaml`. These functions would appear in `torch._C._sparse` namespace instead of just `torch`. Test Plan: Imported from OSS Reviewed By: mruberry Differential Revision: D32517813 fbshipit-source-id: 7c3d6df57a24d7c7354d0fefe1b628dc89be9431	2021-11-19 19:47:38 -08:00
Joel Schlosser	9a2db6f091	Factor backend routing logic out of convolution forward (#67790 ) Summary: This PR introduces a new function `_select_conv_backend` that returns a `ConvBackend` enum representing the selected backend for a given set of convolution inputs and params. The function and enum are exposed to python for testing purposes through `torch/csrc/Module.cpp` (please let me know if there's a better place to do this). A new set of tests validates that the correct backend is selected for several sets of inputs + params. Some backends aren't tested yet: * nnpack (for mobile) * xnnpack (for mobile) * winograd 3x3 (for mobile) Some flowcharts for reference: ![conv_routing_graph md](https://user-images.githubusercontent.com/75754324/140828957-1135b400-38c0-4c9f-87ef-4f33ceebeeae.png) ![conv_nogroup_routing_graph md](https://user-images.githubusercontent.com/75754324/140828977-ed223a4e-aa86-49f1-9925-c0f6b9ab36af.png) Pull Request resolved: https://github.com/pytorch/pytorch/pull/67790 Reviewed By: zou3519 Differential Revision: D32280878 Pulled By: jbschlosser fbshipit-source-id: 0ce55174f470f65c9b5345b9980cf12251f3abbb	2021-11-10 07:53:55 -08:00
eqy	790763b0fe	Add an option to disable reduced precision reductions for FP16 GEMM (#67946 ) Summary: https://github.com/pytorch/pytorch/issues/67578 disabled reduced precision reductions for FP16 GEMMs. After benchmarking, we've found that this has substantial performance impacts for common GEMM shapes (e.g., those found in popular instantiations of multiheaded-attention) on architectures such as Volta. As these performance regressions may come as a surprise to current users, this PR adds a toggle to disable reduced precision reductions `torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = ` rather than making it the default behavior. CC ngimel ptrblck stas00 Note that the behavior after the previous PR can be replicated with `torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = False` Pull Request resolved: https://github.com/pytorch/pytorch/pull/67946 Reviewed By: zou3519 Differential Revision: D32289896 Pulled By: ngimel fbshipit-source-id: a1ea2918b77e27a7d9b391e030417802a0174abe	2021-11-09 17:27:20 -08:00
Yukio Siraichi	8854817f44	Implement Python Array API `asarray` function. (#60627 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60627 In this PR, the core of `frombuffer` and `fromDLPack` onto _tensor_new.cpp_. `asarray` uses such refactored functions for interpreting the object as a tensor. We follow the Python Array API standard found: https://data-apis.org/array-api/latest/API_specification/creation_functions.html?highlight=asarray Test Plan: Imported from OSS Reviewed By: H-Huang Differential Revision: D31640510 Pulled By: mruberry fbshipit-source-id: d0869e0d73cb50023d5866b001dac5d34ca30dfd	2021-10-16 21:11:31 -07:00
Kurt Mohler	a25648953c	Add `warn_only` kwarg to `use_deterministic_algorithms` (#66233 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/64883 Adds a `warn_only` kwarg to `use_deterministic_algorithms`. When enabled, calling an operation that does not have a deterministic implementation will raise a warning, rather than an error. `torch.testing._internal.common_device_type.expectedAlertNondeterministic` is also refactored and documented in this PR to make it easier to use and understand. cc mruberry kurtamohler Pull Request resolved: https://github.com/pytorch/pytorch/pull/66233 Reviewed By: bdhirsh Differential Revision: D31616481 Pulled By: mruberry fbshipit-source-id: 059634a82d54407492b1d8df08f059c758d0a420	2021-10-15 13:54:59 -07:00
Kurt Mohler	5883523c1d	Remove dtype from torch.Storage and use only torch.ByteStorage (#62030 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62030 Remove dtype tracking from Python Storage interface, remove all the different `<type>Storage` classes except for `ByteStorage`, and update serialization accordingly, while maintaining as much FC/BC as possible Fixes https://github.com/pytorch/pytorch/issues/47442 * THE SERIALIZATION FORMAT IS FULLY FC/BC. We worked very hard to make sure this is the case. We will probably want to break FC at some point to make the serialization structure of tensors make more sense, but not today. * There is now only a single torch.ByteStorage class. Methods like `Tensor.set_` no longer check that the dtype of storage is appropriate. * As we no longer know what dtype of a storage is, we've removed the size method from Storage, replacing it with nbytes. This is to help catch otherwise silent errors where you confuse number of elements with number of bytes. * `Storage._new_shared` takes a `nbytes` kwarg and will reject previous positional only calls. `Storage._new_with_file` and `_set_from_file` require explicit element size arguments. * It's no longer possible to convert storages to different types using the float/double/etc methods. Instead, do the conversion using a tensor. * It's no longer possible to allocate a typed storage directly using FloatStorage/DoubleStorage/etc constructors. Instead, construct a tensor and extract its storage. The classes still exist but they are used purely for unpickling. * The preexisting serialization format stores dtype with storage, and in fact this dtype is used to determine the dtype of the tensor overall. To accommodate this case, we introduce a new TypedStorage concept that exists only during unpickling time which is used to temporarily store the dtype so we can construct a tensor. If you overrode the handling of pickling/unpickling, you MUST add handling for TypedStorage or your serialization code will degrade to standard file-based serialization. Original pull request: https://github.com/pytorch/pytorch/pull/59671 Reviewed By: soulitzer, ngimel Differential Revision: D29466819 Pulled By: ezyang fbshipit-source-id: 4a14e5d3c2b08e06e558683d97f7378a3180b00e	2021-10-05 13:50:34 -07:00
Pruthvi Madugundu	085e2f7bdd	[ROCm] Changes not to rely on CUDA_VERSION or HIP_VERSION (#65610 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65610 - Replace HIP_PLATFORM_HCC with USE_ROCM - Dont rely on CUDA_VERSION or HIP_VERSION and use USE_ROCM and ROCM_VERSION. - In the next PR - Will be removing the mapping from CUDA_VERSION to HIP_VERSION and CUDA to HIP in hipify. - HIP_PLATFORM_HCC is deprecated, so will add HIP_PLATFORM_AMD to support HIP host code compilation on gcc. cc jeffdaily sunway513 jithunnair-amd ROCmSupport amathews-amd Reviewed By: jbschlosser Differential Revision: D30909053 Pulled By: ezyang fbshipit-source-id: 224a966ebf1aaec79beccbbd686fdf3d49267e06	2021-09-29 09:55:43 -07:00
Nikita Shulga	a9b0a921d5	Disable `avoid-non-const-global-variables` lint check (#62008 ) Summary: As GoogleTest `TEST` macro is non-compliant with it as well as `DEFINE_DISPATCH` All changes but the ones to `.clang-tidy` are generated using following script: ``` for i in `find . -type f -iname ".c" -or -iname "*.h"\|xargs grep cppcoreguidelines-avoid-non-const-global-variables\|cut -f1 -d:\|sort\|uniq`; do sed -i "/\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-non-const-global-variables)/d" $i; done ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/62008 Reviewed By: driazati, r-barnes Differential Revision: D29838584 Pulled By: malfet fbshipit-source-id: 1b2f8602c945bd4ce50a9bfdd204755556e31d13	2021-07-22 18:04:40 -07:00
cyy	a26a9f8b75	zero initialize some members and other fixes (#59915 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59915 Reviewed By: soulitzer Differential Revision: D29106684 Pulled By: ezyang fbshipit-source-id: 713cbdf10866017ee715ee89ec82acb592c769b6	2021-07-19 07:36:26 -07:00
Nikita Shulga	4036820506	Add PocketFFT support (#60976 ) Summary: Needed on platforms, that do not have MKL, such as aarch64 and M1 - Add `AT_POCKETFFT_ENABLED()` to Config.h.in - Introduce torch._C.has_spectral that is true if PyTorch was compiled with either MKL or PocketFFT - Modify spectral test to use skipCPUIfNoFFT instead of skipCPUIfNoMKL Share implementation of `_out` functions as well as fft_fill_with_conjugate_symmetry_stub between MKL and PocketFFT implementations Fixes https://github.com/pytorch/pytorch/issues/41592 Pull Request resolved: https://github.com/pytorch/pytorch/pull/60976 Reviewed By: walterddr, driazati, janeyx99, samestep Differential Revision: D29466530 Pulled By: malfet fbshipit-source-id: ac5edb3d40e7c413267825f92a5e8bc4bb249caf	2021-06-30 16:28:20 -07:00
Victor Bittorf	8b6487c650	Add CUDA Vital (#58059 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58059 Add CUDA.used vital sign which is true only if CUDA was "used" which technically means the context was created. Also adds the following features: - Force vitals to be written even if vitals are disabled, to enable testing when the env variable is not set from the start of execution - Add a read_vitals call for python to read existing vital signs. Test Plan: buck test mode/dbg caffe2/test:torch -- --regex basic_vitals Reviewed By: xuzhao9 Differential Revision: D28357615 fbshipit-source-id: 681bf9ef63cb1458df9f1c241d301a3ddf1e5252	2021-06-25 16:31:11 -07:00
Nikita Shulga	36a5647e30	Handle exceptions from THPModule_setQEngine (#60073 ) Summary: Prevents Python runtime crashes when `torch._C._set_qengine(2**65)` or `torch.backends.quantized.engine="fbgemm"` if PyTorch was compiled without fbgemm Pull Request resolved: https://github.com/pytorch/pytorch/pull/60073 Reviewed By: supriyar Differential Revision: D29156430 Pulled By: malfet fbshipit-source-id: 95b97352a52a262f1634b72da64a0c950eaf2373	2021-06-16 00:40:59 -07:00
Richard Barnes	e3d75b8475	irange for PyTorch sans jit (#59481 ) Summary: Switches most of the simple for loops outside of `jit` directories to use `c10::irange`. Generated with D28874212. Pull Request resolved: https://github.com/pytorch/pytorch/pull/59481 Test Plan: Sandcastle Reviewed By: ngimel Differential Revision: D28909681 fbshipit-source-id: ec9ab1bd602933238d9d0f73d4d8d027b75d9d85	2021-06-09 14:46:11 -07:00
driazati	059a717c9e	Fix breakpad build and add to more images (#59236 ) Summary: This PR * adds the breakpad build to most of the remaining docker images (except the mobile + slim ones) * pins to a [fork of breakpad](https://github.com/google/breakpad/compare/master...driazati:master?expand=1) to enable dasiy chaining on signal handlers * renames the API to be nicer Pull Request resolved: https://github.com/pytorch/pytorch/pull/59236 Reviewed By: malfet Differential Revision: D28792511 Pulled By: driazati fbshipit-source-id: 83723e74b7f0a00e1695210ac2620a0c91ab4bf2	2021-06-01 22:47:14 -07:00
Elton Leander Pinto	029bec4505	[lint] Fix uninitialized variable lint error in `Module.cpp` (#58499 ) Summary: This PR fixes two uninitialized variable lint warnings in `Module.cpp` by initializing them to `nullptr`s. Pull Request resolved: https://github.com/pytorch/pytorch/pull/58499 Reviewed By: driazati, samestep Differential Revision: D28519192 Pulled By: 1ntEgr8 fbshipit-source-id: 293cd4b296eea70b72adf02cd73f354063b124c6	2021-05-19 07:55:24 -07:00
Kenichi Maehashi	4350d4af77	Immediately mark DLPack capsule as used after stealing the ownership (#56789 ) Summary: After stealing the ownership of the tensor passed via DLPack capsule, PyTorch should immediately mark it as used (by changing its name to `used_dltensor`). This fix is needed because the following line may raise an exception: ```cpp py::module::import("torch.cuda").attr("init")(); ``` When an exception is raised, Tensor created by `at::fromDLPack` calls the `deleter`. However as the causple is not consumed, the producer (a library that created the capsule) also calls the `deleter`, causing a double free. Reprodcuer (I'm running this code on A100 GPU + PyTorch wheel which does not include `sm_80` support; in this configuration `torch.cuda.init` will raise a warning): ```py $ python -Werror >>> import torch.utils.dlpack >>> import cupy >>> tensor = torch.utils.dlpack.from_dlpack(cupy.arange(10).toDlpack()) free(): double free detected in tcache 2 zsh: abort (core dumped) python -Werror ``` Once this fix is merged users can now see the exception correctly: ``` A100-PCIE-40GB with CUDA capability sm_80 is not compatible with the current PyTorch installation. The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70. If you want to use the A100-PCIE-40GB GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/ ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/56789 Reviewed By: astaff Differential Revision: D28118512 Pulled By: mruberry fbshipit-source-id: 56992f7a3fc78d94c69513e864a473ae9587a9c8	2021-05-01 16:20:54 -07:00
Nikita Shulga	eac02f85cf	Fix more clang-tidy errors (#57235 ) Summary: In my last PR I've missed CUDA and distributed folders, fixing this now This change is autogenerated by `python tool/clang_tidy.py -s` Pull Request resolved: https://github.com/pytorch/pytorch/pull/57235 Reviewed By: janeyx99 Differential Revision: D28084444 Pulled By: malfet fbshipit-source-id: bf222f69ee90c7872c3cb0931e8cdb84f0cb3cda	2021-04-28 23:29:10 -07:00
Nikita Shulga	4cb534f92e	Make PyTorch code-base clang-tidy compliant (#56892 ) Summary: This is an automatic change generated by the following script: ``` #!/usr/bin/env python3 from subprocess import check_output, check_call import os def get_compiled_files_list(): import json with open("build/compile_commands.json") as f: data = json.load(f) files = [os.path.relpath(node['file']) for node in data] for idx, fname in enumerate(files): if fname.startswith('build/') and fname.endswith('.DEFAULT.cpp'): files[idx] = fname[len('build/'):-len('.DEFAULT.cpp')] return files def run_clang_tidy(fname): check_call(["python3", "tools/clang_tidy.py", "-c", "build", "-x", fname,"-s"]) changes = check_output(["git", "ls-files", "-m"]) if len(changes) == 0: return check_call(["git", "commit","--all", "-m", f"NOLINT stubs for {fname}"]) def main(): git_files = check_output(["git", "ls-files"]).decode("ascii").split("\n") compiled_files = get_compiled_files_list() for idx, fname in enumerate(git_files): if fname not in compiled_files: continue if fname.startswith("caffe2/contrib/aten/"): continue print(f"[{idx}/{len(git_files)}] Processing {fname}") run_clang_tidy(fname) if __name__ == "__main__": main() ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/56892 Reviewed By: H-Huang Differential Revision: D27991944 Pulled By: malfet fbshipit-source-id: 5415e1eb2c1b34319a4f03024bfaa087007d7179	2021-04-28 14:10:25 -07:00
davidriazati@fb.com	638617f9f8	Write mini dump on pybind exceptions (#55652 ) Summary: We register an [error handler](https://pybind11.readthedocs.io/en/stable/advanced/exceptions.html#registering-custom-translators) with pybind so that C++ exceptions are passed to Python and raised as runtime errors that can be `try...except`ed etc. Since these don't terminate the program (until Python does), they never fire the signal handler to write a minidump out with the crash information. This PR adds some logic in the exception translator to write out a minidump if enabled. ](https://our.intern.facebook.com/intern/diff/27830952/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/55652 Pulled By: driazati Reviewed By: bertmaher Differential Revision: D27830952 fbshipit-source-id: 26e8f913e99dff971a4eb09eb87221c66f759763	2021-04-19 14:53:43 -07:00
David Riazati	1ec12fd491	Add minidump collection via breakpad (#55647 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55647 This adds [breakpad](https://github.com/google/breakpad) which comes with out-of-the-box utilities to register a signal handler that writes out a minidump on an unhandled exception. Right now this is gated behind a flag in `torch.utils`, but in the future it could be on by default. Sizewise this adds aboute 500k to `libtorch_cpu.so` (187275968 B to 187810016 B). ```bash $ cat <<EOF > test.py import torch torch.utils.enable_minidump_collection() # temporary util that just segfaults torch._C._crash() EOF $ python test.py Wrote minidump to /tmp/pytorch_crashes/6a829041-50e9-4247-ea992f99-a74cf47a.dmp fish: “python test.py” terminated by signal SIGSEGV (Address boundary error) $ minidump-2-core /tmp/pytorch_crashes/6a829041-50e9-4247-ea992f99-a74cf47a.dmp -o core.dmp $ gdb python core.dmp ... commence debugging ... ``` Right now all exceptions that get passed up to Python don't trigger the signal handler (which by default only handles [these](https://github.com/google/breakpad/blob/main/src/client/linux/handler/exception_handler.cc#L115)). It would be possible for PyTorch exceptions to explicitly write a minidump when passed up to Python (maybe only when the exception is unhandled or something). Test Plan: Imported from OSS Reviewed By: ailzhang Differential Revision: D27679767 Pulled By: driazati fbshipit-source-id: 1ab3b5160b6dc405f5097eb25acc644d533358d7	2021-04-16 13:05:01 -07:00
Victor Bittorf	52f1a07b63	Python API for Vitals (#53238 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/53238 There is a tension for the Vitals design: (1) we want a macro based logging API for C++ and (2) we want a clean python API. Furthermore, we want to this to work with "print on destruction" semantics. The unfortunate resolution is that there are (2) ways to define vitals: (1) Use the macros for local use only within C++ - this keeps the semantics people enjoy (2) For vitals to be used through either C++ or Python, we use a global VitalsAPI object. Both these go to the same place for the user: printing to stdout as the globals are destructed. The long history on this diff shows many different ways to try to avoid having 2 different paths... we tried weak pointers & shared pointers, verbose switch cases, etc. Ultimately each ran into an ugly trade-off and this cuts the difference better the alternatives. Test Plan: buck test mode/dev caffe2/test:torch -- --regex vital buck test //caffe2/aten:vitals Reviewed By: orionr Differential Revision: D26736443 fbshipit-source-id: ccab464224913edd07c1e8532093f673cdcb789f	2021-04-15 16:06:43 -07:00
Alban Desmaison	b91d48877d	Reland Fix reference cycle in sparse coalesce graph (#55404 ) Summary: Reland of https://github.com/pytorch/pytorch/pull/52874 Pull Request resolved: https://github.com/pytorch/pytorch/pull/55404 Reviewed By: bdhirsh Differential Revision: D27600438 Pulled By: albanD fbshipit-source-id: f5c286638b324ad59be65657a016028af5e2b303	2021-04-07 12:02:42 -07:00
Brian Hirsh	ec80981d28	Revert D27246997: [pytorch][PR] Fix reference cycle in sparse coalesce graph Test Plan: revert-hammer Differential Revision: D27246997 (`815bfad28c`) Original commit changeset: 0fe6c1104350 fbshipit-source-id: 4d345718589a642d3c65474b266342285205ccdf	2021-04-06 11:45:27 -07:00
Peter Bell	815bfad28c	Fix reference cycle in sparse coalesce graph (#52874 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/52253 In the issue reproducer we can replace `torch.sparse.sum(S)` with `S.coalesce()` and get the same memory leak. The reason is that calling `coalesce()` on an already coalesced tensor returns `self`. With autograd, the result gets it's `grad_fn` set to a node that contains a reference to the input tensor, creating a reference cycle. Cloning the tensor fixes this, so `coalesce` always returns a new tensor. As an aside, `torch.sparse.sum(S)` doesn't need to coalesce. The result should be the same either way. Pull Request resolved: https://github.com/pytorch/pytorch/pull/52874 Reviewed By: bdhirsh Differential Revision: D27246997 Pulled By: albanD fbshipit-source-id: 0fe6c11043501a7874a50982afd42964f47470d3	2021-04-06 08:32:19 -07:00
Horace He	1324b0dd44	[FX] Adds C-level monkeypatching of `torch.randn` so that we can capture it during tracing. (#54060 ) Summary: ``` def foo(x): return x + torch.randn(3, 3) fx.enable_ctracing(True) print(fx.symbolic_trace(foo).code) ``` results in ``` def forward(self, x): randn = torch.randn(3, 3) add = x + randn; x = randn = None return add ``` Seems to slow down tracing by 1.5-3x. DenseNet121: 0.05 -> 0.12 seconds ResNet18: 0.10 -> 0.15 Pull Request resolved: https://github.com/pytorch/pytorch/pull/54060 Reviewed By: jamesr66a Differential Revision: D27208978 Pulled By: Chillee fbshipit-source-id: b9e19a9b1084dadfc0dfaee41a03bc25a45910b1	2021-04-01 07:34:31 -07:00
Taylor Robie	a4ca394f8a	Revert "Revert D26907093: Add repeats to Timer.collect_callgrind(...)" (#54484 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54484 Re-land of https://github.com/pytorch/pytorch/pull/53295. (With fixed unit tests.) This reverts commit `0dc5abfaa9`. Test Plan: Imported from OSS Reviewed By: nikithamalgifb Differential Revision: D27255201 Pulled By: robieta fbshipit-source-id: 4e9fed7522631d66c5cd7e27ace9b5ffc3a0bbfc	2021-03-23 21:58:17 -07:00
Edward Yang	e0aebe241d	Refactor tensor_new.cpp to use TensorOptions instead of DispatchKey (#54034 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54034 Fixes #53544 I had to touch a bunch of lines but the refactoring was fairly mechanical. Here's how it works. The basic concept behind this PR is that tensor_new.cpp was previously abusing DispatchKey when it actually meant TensorOptions. The provided DispatchKey argument to most of the constructor functions typically comes from torch::tensors::get_default_dispatch_key(); it doesn't really make sense for people to set the default dispatch key, but this got grandfathered in due to the old API set_default_tensor_type (where the "Type" concept got refactored into "DispatchKey" concept over time). See also #53124. But the upshot is that, semantically, what we refer to as the default dispatch key really is more like torch.set_default_tensor_type(torch.Tensor) versus torch.set_default_tensor_type(torch.cuda.Tensor): clearly the user wants to do something about construction of the tensor, and TensorOptions captures that exactly. So, how exactly to translate from one to the other? - Sources (things that used to PRODUCE DispatchKey) - Most top level functions take a DispatchKey as their argument. I use the new function dispatchKeyToTensorOptions to convert it into a TensorOptions - typeIdWithDefault now produces a TensorOptions (probably could do with a rename, though I didn't) - Sinks (things that used to CONSUME DispatchKey) - Previously, the function options() was typically used to convert the DispatchKey into a TensorOptions. Now its replacement build_options just takes a TensorOptions and sets some extra fields on it. Irritatingly, I can't just replace `build_options(options, scalar_type, device)` with `options.dtype(scalar_type).device(device)` because the semantics are slightly different: if device is nullopt, we should preserve the usage of the device specified in options (what options.device() does is overwrite the device unconditionally; e.g., if device is nullopt, unset device from options) - The other major sink for DispatchKey was `internal_new_from_data`, but it turns out it only really extracts the device type from the dispatch key. Now it just pulls out the device from TensorOptions. - To actually do the translation of DispatchKey to TensorOptions, I introduce new functions dispatchKeyToLayout (replicating layout_from_backend--there are still a few uses of this function so I couldn't delete it) and dispatchKeyToDeviceType (replacing computeDeviceType) - In all internal functions, whenever DispatchKey is taken as an argument, I instead take TensorOptions as an argument, and pass it along. - Anywhere `legacyExtractDispatchKey(other.key_set())` equality was previously used, I now do `other.options().type_equal()`, which is the intended BC for doing "backend to backend" comparisons - There are a few places in the sparse constructors where we allocated a tensor for values, and then read out the dispatch key from the result to allocate the keys. As best as I can tell, this is totally equivalent to just passing in the options to both values and indices (the only difference is dtype, which is captured via a separate argument) This refactor doesn't really go far enough: for example, there are now functions that take both TensorOptions and ScalarType, when really the TensorOptions can capture this all. I kept it solely just s/DispatchKey/TensorOptions/ to reduce the number of possible bugs; also, a lot of this will be mooted by a proper fix to #53124. Even with this limited refactor, the payoff is sweet. I can delete: - backendToCPU - backendToXPU - backendToCUDA - backendToHIP - backendToBackendOfDeviceType The reason I can do this is because I can simply overwrite layout in TensorOptions to do the conversion, rather than having to type out each backend case explicitly. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: bhosmer Differential Revision: D27109509 Pulled By: ezyang fbshipit-source-id: 91d16cfbc390127770362ac04fb43f7e070077e9	2021-03-19 09:08:32 -07:00
Pavel Belevich	0dc5abfaa9	Revert D26907093: Add repeats to Timer.collect_callgrind(...) Test Plan: revert-hammer Differential Revision: D26907093 (`74993dcf7b`) Original commit changeset: 72e5b4889691 fbshipit-source-id: 80779ec895920a4e9b33daa56f32b587f8912ed6	2021-03-17 20:14:21 -07:00
Taylor Robie	74993dcf7b	Add repeats to Timer.collect_callgrind(...) (#53295 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/53295 A lot of the time spent in `collect_callgrind` is spinning up Valgrind and executing the initial `import torch`. In most cases the actual run loop is a much smaller fraction. As a result, we can reuse the same process to do multiple replicates and do a much better job amortizing that startup cost. This also tends to result in more stable measurements: the kth run is more repeatable than the first because everything has been given a chance to settle into a steady state. The instruction microbenchmarks lean heavily on this behavior. I found that in practice doing several `n=100` replicates to be more reliable than one monolithic 10,000+ iteration run. (Since rare cases like memory consolidation will just contaminate that one replicate, as opposed to getting mixed into the entire long run.) Test Plan: Imported from OSS Reviewed By: ngimel Differential Revision: D26907093 Pulled By: robieta fbshipit-source-id: 72e5b48896911f5dbde96c8387845d7f9882fdb2	2021-03-17 18:05:13 -07:00
mattip	54a2498919	Modify tests to use assertWarnsOnceRegex instead of maybeWarnsRegex (#52387 ) Summary: Related to https://github.com/pytorch/pytorch/issues/50006 Follow on for https://github.com/pytorch/pytorch/issues/48560 to ensure TORCH_WARN_ONCE warnings are caught. Most of this is straight-forward find-and-replace, but I did find one place where the TORCH_WARN_ONCE warning was not wrapped into a python warning. Pull Request resolved: https://github.com/pytorch/pytorch/pull/52387 Reviewed By: albanD Differential Revision: D26773387 Pulled By: mruberry fbshipit-source-id: 5be7efbc8ab4a32ec8437c9c45f3b6c3c328f5dd	2021-03-08 03:32:14 -08:00
Sam Estep	8c798e0622	Forbid trailing whitespace (#53406 ) Summary: Context: https://github.com/pytorch/pytorch/pull/53299#discussion_r587882857 These are the only hand-written parts of this diff: - the addition to `.github/workflows/lint.yml` - the file endings changed in these four files (to appease FB-internal land-blocking lints): - `GLOSSARY.md` - `aten/src/ATen/core/op_registration/README.md` - `scripts/README.md` - `torch/csrc/jit/codegen/fuser/README.md` The rest was generated by running this command (on macOS): ``` git grep -I -l ' $' -- . ':(exclude)/contrib/' ':(exclude)third_party' \| xargs gsed -i 's/ *$//' ``` I looked over the auto-generated changes and didn't see anything that looked problematic. Pull Request resolved: https://github.com/pytorch/pytorch/pull/53406 Test Plan: This run (after adding the lint but before removing existing trailing spaces) failed: - https://github.com/pytorch/pytorch/runs/2043032377 This run (on the tip of this PR) succeeded: - https://github.com/pytorch/pytorch/runs/2043296348 Reviewed By: walterddr, seemethere Differential Revision: D26856620 Pulled By: samestep fbshipit-source-id: 3f0de7f7c2e4b0f1c089eac9b5085a58dd7e0d97	2021-03-05 17:22:55 -08:00
kshitij12345	c4c77e2001	[special] add `torch.special` namespace (#52296 ) Summary: Reference: https://github.com/pytorch/pytorch/issues/50345 * Add `torch.special` namespace * Add `torch.special.gammaln` (alias to `torch.lgamma`) TODO: * Add proper entries for docs. * [x] Add .rst file entry * [x] Add documentation * [x] Update `lgamma` OpInfo entry for alias to `special.gammaln`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/52296 Reviewed By: ngimel Differential Revision: D26754890 Pulled By: mruberry fbshipit-source-id: 73479f68989d6443ad07b7b02763fa98973c15f6	2021-03-04 00:04:36 -08:00
Nikita Shulga	a0a1bb074b	Make NumPy dependency dynamic (#52794 ) Summary: Move NumPy initialization from `initModule()` to singleton inside `torch::utils::is_numpy_available()` function. This singleton will print a warning, that NumPy integration is not available, rather than fails to import torch altogether. The warning be printed only once, and will look something like the following: ``` UserWarning: Failed to initialize NumPy: No module named 'numpy.core' (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:66.) ``` This is helpful if PyTorch was compiled with wrong NumPy version, of NumPy is not commonly available on the platform (which is often the case on AARCH64 or Apple M1) Test that PyTorch is usable after numpy is uninstalled at the end of `_test1` CI config. Pull Request resolved: https://github.com/pytorch/pytorch/pull/52794 Reviewed By: seemethere Differential Revision: D26650509 Pulled By: malfet fbshipit-source-id: a2d98769ef873862c3704be4afda075d76d3ad06	2021-02-25 19:45:00 -08:00
Bel H	db33afbf9f	Change cmake to allow building with MLC kick-off build (#51326 ) Summary: - Allows build process to build with MLC enabled if subrepo folder mlc is in path and we can link against ML Compute on macOS BigSur - To build with MLC enabled you will need to clone the mlc repo inside the pytorch repository. - We need both this change and https://github.com/pytorch/pytorch/pull/50634 on pytorch/pytorch to enable the `mlc` device. Pull Request resolved: https://github.com/pytorch/pytorch/pull/51326 Reviewed By: glaringlee Differential Revision: D26533138 Pulled By: malfet fbshipit-source-id: 0baa06b4eb2d62dbfc0f6fc922096cb0db1cc7d1	2021-02-19 13:04:25 -08:00
Kimish Patel	a6e94d274f	[Pytorch] Add python binding to use mobile cpu allocator. (#52323 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/52323 Using default cpu allocator for ops executed on qnnpack backend will result in asan failures with heap overflow since qnnpack (and xnnpack) can access input beyond their and/beginning. Here we are enabling this feature specifically to enable dynamic sparse linear op test using qnnpack engine. In dynamic linear op, the fp32 bias is not packed and hence can result in out-of-bound access. Test Plan: test_set_default_mobile_cpu_allocator.py Reviewed By: z-a-f Differential Revision: D26263481 fbshipit-source-id: a49227cac7e6781b0db4a156ca734d7671972d9f	2021-02-17 08:42:23 -08:00
mattip	b97a040f71	ENH: toggle TORCH_WARN_ONCE to TORCH_WARN for tests (#48560 ) Summary: Toward fixing https://github.com/pytorch/pytorch/issues/47624 ~Step 1: add `TORCH_WARN_MAYBE` which can either warn once or every time in c++, and add a c++ function to toggle the value. Step 2 will be to expose this to python for tests. Should I continue in this PR or should we take a different approach: add the python level exposure without changing any c++ code and then over a series of PRs change each call site to use the new macro and change the tests to make sure it is being checked?~ Step 1: add a python and c++ toggle to convert TORCH_WARN_ONCE into TORCH_WARN so the warnings can be caught in tests Step 2: add a python-level decorator to use this toggle in tests Step 3: (in future PRs): use the decorator to catch the warnings instead of `maybeWarnsRegex` Pull Request resolved: https://github.com/pytorch/pytorch/pull/48560 Reviewed By: ngimel Differential Revision: D26171175 Pulled By: mruberry fbshipit-source-id: d83c18f131d282474a24c50f70a6eee82687158f	2021-02-08 08:21:19 -08:00
Will Constable	f2e41257e4	Back out "Revert D26077905: Back out "Revert D25850783: Add torch::deploy, an embedded torch-python interpreter"" (#51267 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51267 Original commit changeset: b70185916502 Test Plan: test locally, oss ci-all, fbcode incl deferred Reviewed By: suo Differential Revision: D26121251 fbshipit-source-id: 4315b7fd5476914c8e5d6f547e1cfbcf0c227781	2021-01-28 19:30:45 -08:00
Richard Zou	1379842f4a	Add private mechanism to toggle vmap fallback warnings (#51218 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51218 Fixes #51144. Context ======= Users have complained about warning spam from batched gradient computation. This warning spam happens because warnings in C++ don't correctly get turned into Python warnings when those warnings arise from the autograd engine. To work around that, this PR adds a mechanism to toggle vmap warnings. By default, the vmap fallback will not warn when it is invoked. However, by using `torch._C._debug_only_display_vmap_fallback_warnings(enabled)`, one can toggle the existence of vmap fallback warnings. This API is meant to be a private, debug-only API. The goal is to be able to non-intrusively collect feedback from users to improve performance on their workloads. What this PR does ================= This PR adds an option to toggle vmap warnings. The mechanism is toggling a bool in ATen's global context. There are some other minor changes: - This PR adds a more detailed explanation of performance cliffs to the autograd.functional.{jacobian, hessian} documentation - A lot of the vmap tests in `test_vmap.py` rely on the fallback warning to test the presence of the fallback. In test_vmap, I added a context manager to toggle on the fallback warning while testing. Alternatives ============ I listed a number of alternatives in #51144. My favorite one is having a new "performance warnings mode" (this is currently a WIP by some folks on the team). This PR is to mitigate the problem of warning spam before a "performance warnings mode" gets shipped into PyTorch Concerns ======== I am concerned that we are advertising a private API (`torch._C._debug_only_display_vmap_fallback_warnings(enabled)`) in the PyTorch documentation. However, I hope the naming makes it clear to users that they should not rely on this API (and I don't think they have any reason to rely on the API). Test Plan ========= Added tests in `test_vmap.py` to check: - by default, the fallback does not warn - we can toggle whether the fallback warns or not Test Plan: Imported from OSS Reviewed By: pbelevich, anjali411 Differential Revision: D26126419 Pulled By: zou3519 fbshipit-source-id: 95a97f9b40dc7334f6335a112fcdc85dc03dcc73	2021-01-28 13:05:00 -08:00
Mike Ruberry	12a434abbc	Revert D26077905: Back out "Revert D25850783: Add torch::deploy, an embedded torch-python interpreter" Test Plan: revert-hammer Differential Revision: D26077905 (`dc2a44c4fc`) Original commit changeset: fae83bf9822d fbshipit-source-id: b70185916502ba9ebe16d781cf0659b9f7865c9a	2021-01-27 19:53:29 -08:00
Will Constable	dc2a44c4fc	Back out "Revert D25850783: Add torch::deploy, an embedded torch-python interpreter" (#51124 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51124 Original commit changeset: 1c7133627da2 Test Plan: Test locally with interpreter_test and on CI Reviewed By: suo Differential Revision: D26077905 fbshipit-source-id: fae83bf9822d79e9a9b5641bc5191a7f3fdea78d	2021-01-27 16:49:42 -08:00
Mike Ruberry	e843974a6e	Revert D25850783: Add torch::deploy, an embedded torch-python interpreter Test Plan: revert-hammer Differential Revision: D25850783 (`3192f9e4fe`) Original commit changeset: a4656377caff fbshipit-source-id: 1c7133627da28fb12848da7a9a46de6d3b2b67c6	2021-01-26 02:07:44 -08:00
Will Constable	3192f9e4fe	Add torch::deploy, an embedded torch-python interpreter (#50458 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50458 libinterpreter.so contains a frozen python distribution including torch-python bindings. Freezing refers to serializing bytecode of python standard library modules as well as the torch python library and embedding them in the library code. This library can then be dlopened multiple times in one process context, each interpreter having its own python state and GIL. In addition, each python environment is sealed off from the filesystem and can only import the frozen modules included in the distribution. This change relies on newly added frozenpython, a cpython 3.8.6 fork built for this purpose. Frozenpython provides libpython3.8-frozen.a which contains frozen bytecode and object code for the python standard library. Building on top of frozen python, the frozen torch-python bindings are added in this diff, providing each embedded interpreter with a copy of the torch bindings. Each interpreter is intended to share one instance of libtorch and the underlying tensor libraries. Known issues - Autograd is not expected to work with the embedded interpreter currently, as it manages its own python interactions and needs to coordinate with the duplicated python states in each of the interpreters. - Distributed and cuda stuff is disabled in libinterpreter.so build, needs to be revisited - __file__ is not supported in the context of embedded python since there are no files for the underlying library modules. using __file__ - __version__ is not properly supported in the embedded torch-python, just a workaround for now Test Plan: tested locally and on CI with cmake and buck builds running torch::deploy interpreter_test Reviewed By: ailzhang Differential Revision: D25850783 fbshipit-source-id: a4656377caff25b73913daae7ae2f88bcab8fd88	2021-01-25 15:14:28 -08:00
Kurt Mohler	8ab1a1495d	Rename `set_deterministic` to `use_deterministic_algorithms` (#49904 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/49100 Pull Request resolved: https://github.com/pytorch/pytorch/pull/49904 Reviewed By: ezyang, mrshenli Differential Revision: D25956761 Pulled By: mruberry fbshipit-source-id: 86a59289d50825a0ebbd7c358b483c8d8039ffa6	2021-01-22 11:27:07 -08:00
Taylor Robie	d31a760be4	move has_torch_function to C++, and make a special case object_has_torch_function (#48965 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48965 This PR pulls `__torch_function__` checking entirely into C++, and adds a special `object_has_torch_function` method for ops which only have one arg as this lets us skip tuple construction and unpacking. We can now also do away with the Python side fast bailout for `Tensor` (e.g. `if any(type(t) is not Tensor for t in tensors) and has_torch_function(tensors)`) because they're actually slower than checking with the Python C API. Test Plan: Existing unit tests. Benchmarks are in #48966 Reviewed By: ezyang Differential Revision: D25590732 Pulled By: robieta fbshipit-source-id: 6bd74788f06cdd673f3a2db898143d18c577eb42	2021-01-10 19:23:35 -08:00
Qifan Lu	cfc3db0ca9	Remove THPWrapper (#49871 ) Summary: Remove `THPWrapper` from PyTorch C code since it is not used anymore and because we have dropped Python 2 compatibility, its usage can be replaced by capsule objects (`PyCapsule_New`, `PyCapsule_CheckExact`, `PyCapsule_GetPointer` and `PyCapsule_GetDestructor`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/49871 Reviewed By: mruberry Differential Revision: D25715038 Pulled By: albanD fbshipit-source-id: cc3b6f967bbe0dc42c692adf76dff4e4b667fdd5	2020-12-30 03:01:52 -08:00
peterjc123	815d38395a	PyLong_{As/From}{Long/UnsignedLong} lint checks (#49280 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/45581 Pull Request resolved: https://github.com/pytorch/pytorch/pull/49280 Reviewed By: mruberry Differential Revision: D25592330 Pulled By: ezyang fbshipit-source-id: 5c16d6aed88ad1feaa7f129b4cd44c0561be2de2	2020-12-17 09:32:08 -08:00
Michael Carilli	c068180a17	[CUDA graphs] Cuda RNG-safe graph capture and replay bindings (#48875 ) Summary: Part 2 of https://github.com/pytorch/pytorch/pull/46148 refactor. (part 1 was https://github.com/pytorch/pytorch/pull/48694.) Contains - a few more CUDAGeneratorImpl diffs to clean up graph capture interaction - Capture and replay bindings that interact correctly with CUDAGeneratorImpl - Tests. Diffs compile and tests pass on my machine (ubuntu 20.04, cuda 11.0) but it needs finetuning for many CI builds. See [Note [CUDA Graph-safe RNG states]](`02d89f9f1d/aten/src/ATen/CUDAGeneratorImpl.h (L13-L85)`) for the strategy, based on https://github.com/pytorch/pytorch/pull/46148#issuecomment-724414794. Pull Request resolved: https://github.com/pytorch/pytorch/pull/48875 Reviewed By: zou3519 Differential Revision: D25482654 Pulled By: ngimel fbshipit-source-id: 634dbc4c6c9d7d0d9a62dc81a52d430561f905fe	2020-12-14 10:51:58 -08:00
Taylor Robie	27905dfe9c	Expose CXX_FLAGS through __config__ (#47861 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47861 Test Plan: Imported from OSS Reviewed By: ngimel Differential Revision: D25199263 Pulled By: robieta fbshipit-source-id: 3cfdb0485d686a03a68dd0907d1733634857963f	2020-12-01 19:58:29 -08:00
Nikita Shulga	2b6a720eb1	Update pybind to 2.6.0 (#46415 ) Summary: Preserve PYBIND11 (`63ce3fbde8`) configuration options in `torch._C._PYBIND11 (`63ce3fbde8`)_COMPILER_TYPE` and use them when building extensions Also, use f-strings in `torch.utils.cpp_extension` "Fixes" https://github.com/pytorch/pytorch/issues/46367 Pull Request resolved: https://github.com/pytorch/pytorch/pull/46415 Reviewed By: VitalyFedyunin Differential Revision: D24605949 Pulled By: malfet fbshipit-source-id: 87340f2ed5308266a46ef8f0317316227dab9d4d	2020-10-29 10:53:47 -07:00
Nikita Shulga	42a51148c1	Use f-strings in torch.utils.cpp_extension (#47025 ) Summary: Plus two minor fixes to `torch/csrc/Module.cpp`: - Use iterator of type `Py_ssize_t` for array indexing in `THPModule_initNames` - Fix clang-tidy warning of unneeded defaultGenerator copy by capturing it as `const auto&` Pull Request resolved: https://github.com/pytorch/pytorch/pull/47025 Reviewed By: samestep Differential Revision: D24605907 Pulled By: malfet fbshipit-source-id: c276567d320758fa8b6f4bd64ff46d2ea5d40eff	2020-10-28 21:32:33 -07:00
albanD	143d1fd9f5	Namespace cleanup for 1.7 Part 2 (#46673 ) Summary: make valgrind_toggle and valgrind_supported_platform private functions Pull Request resolved: https://github.com/pytorch/pytorch/pull/46673 Reviewed By: gchanan Differential Revision: D24458133 Pulled By: albanD fbshipit-source-id: 6f3fad9931d73223085edbd3cd3b7830c569570c	2020-10-22 07:57:51 -07:00
Pritam Damania	2b221a9599	Remove PyCFunction casts as much as possible. (#46227 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46227 Follow up from https://github.com/pytorch/pytorch/issues/45419, in this PR I've removed as many PyCFunction casts as I could from the codebase. The only ones I didn't remove were the ones with `METH_VARARGS \| METH_KEYWORDS` which have 3 parameters instead of 2 and had to be casted. Example: ` {"copy_", (PyCFunction)(void(*)(void))THPStorage_(copy_), METH_VARARGS \| METH_KEYWORDS, nullptr},` ghstack-source-id: 114632704 Test Plan: waitforbuildbot Reviewed By: albanD Differential Revision: D24269435 fbshipit-source-id: 025cfd43a9a2a3e59f6b2951c1a78749193d77cf	2020-10-20 15:01:51 -07:00
Michael Ranieri	b1d24dded1	make a way to disable callgrind (#46116 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46116 Ideally I would just use one of the existing preprocessor flags such as `FBCODE_CAFFE2`, but this implies a whole bunch of other things elsewhere, so it is not really a solution for ovrsource. Test Plan: CI green, we are able to disable it internally with `-DNVALGRIND` Reviewed By: malfet Differential Revision: D24227360 fbshipit-source-id: 24a3b393cf46d6a16acca0a9ec52610d4bb8704f	2020-10-13 16:18:04 -07:00
chengjun	5741de883a	Define the record_stream method in native_functions.yaml (#44301 ) Summary: The record_stream method was hard coded for CUDA device. Define the record_stream in the native_functions.yaml to enable the dynamic dispatch to different end device. Fixes https://github.com/pytorch/pytorch/issues/36556 Pull Request resolved: https://github.com/pytorch/pytorch/pull/44301 Reviewed By: glaringlee Differential Revision: D23763954 Pulled By: ezyang fbshipit-source-id: e6d24f5e7892b56101fa858a6cad2abc5cdc4293	2020-10-13 09:15:22 -07:00
Pritam Damania	6e43f0db8b	Use correct signatures for METH_NOARGS. (#45528 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45528 As described in https://github.com/pytorch/pytorch/issues/45419, resolving a bunch of cpython signature issues. #Closes: https://github.com/pytorch/pytorch/issues/45419 ghstack-source-id: 113385726 Test Plan: sentinel Reviewed By: albanD Differential Revision: D24000626 fbshipit-source-id: d334596f1f0256063691aa044c8fb2face260817	2020-10-02 10:43:58 -07:00
Supriya Rao	04526a49d3	[quant] creating quint4x2 dtype for quantized tensors (#44678 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44678 This is a prototype PR that introduces 4 bit qtensors. The new dtype added for this is c10::quint4x2 The underlying storage for this is still uint8_t, so we pack 2 4-bit values in a byte while quantizing it. This change uses most of the existing scaffolding for qtensor storage. We allocate storage based on the dtype before creating a new qtensor. It also adds a dispatch mechanism for this dtype so we can use this to get the bitwidth, qmin and qmax info while quantizing and packing the qtensor (when we add 2-bit qtensor) Kernels that use this dtype should be aware of the packing format. Test Plan: Locally tested ``` x = torch.ones((100, 100), dtype=torch.float) qx_8bit = torch.quantize_per_tensor(x, scale=1.0, zero_point=2, dtype=torch.quint8) qx = torch.quantize_per_tensor(x, scale=1.0, zero_point=2, dtype=torch.quint4x2) torch.save(x, "temp.p") print('Size float (B):', os.path.getsize("temp.p")) os.remove('temp.p') torch.save(qx_8bit, "temp.p") print('Size quantized 8bit(B):', os.path.getsize("temp.p")) os.remove('temp.p') torch.save(qx, "temp.p") print('Size quantized 4bit(B):', os.path.getsize("temp.p")) os.remove('temp.p') ``` Size float (B): 40760 Size quantized 8bit(B): 10808 Size quantized 4bit(B): 5816 Imported from OSS Reviewed By: raghuramank100 Differential Revision: D23993134 fbshipit-source-id: 073bf262f9680416150ba78ed2d932032275946d	2020-10-01 23:53:34 -07:00
Taylor Robie	2b13d9413e	Re-land: Add callgrind collection to Timer #44717 (#45586 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45586 Test Plan: The unit test has been softened to be less platform sensitive. Reviewed By: mruberry Differential Revision: D24025415 Pulled By: robieta fbshipit-source-id: ee986933b984e736cf1525e1297de6b21ac1f0cf	2020-09-30 17:43:06 -07:00
Mike Ruberry	51d0ae9207	Revert D24010742: [pytorch][PR] Add callgrind collection to Timer Test Plan: revert-hammer Differential Revision: D24010742 (`9b27e0926b`) Original commit changeset: df6bc765f8ef fbshipit-source-id: 4c1edd57ea932896f7052716427059c924222501	2020-09-30 10:15:46 -07:00
Taylor Robie	9b27e0926b	Add callgrind collection to Timer (#44717 ) Summary: This PR allows Timer to collect deterministic instruction counts for (some) snippets. Because of the intrusive nature of Valgrind (effectively replacing the CPU with an emulated one) we have to perform our measurements in a separate process. This PR writes a `.py` file containing the Timer's `setup` and `stmt`, and executes it within a `valgrind` subprocess along with a plethora of checks and error handling. There is still a bit of jitter around the edges due to the Python glue that I'm using, but the PyTorch signal is quite good and thus this provides a low friction way of getting signal. I considered using JIT as an alternative, but: A) Python specific overheads (e.g. parsing) are important B) JIT might do rewrites which would complicate measurement. Consider the following bit of code, related to https://github.com/pytorch/pytorch/issues/44484: ``` from torch.utils._benchmark import Timer counts = Timer( "x.backward()", setup="x = torch.ones((1,)) + torch.ones((1,), requires_grad=True)" ).collect_callgrind() for c, fn in counts[:20]: print(f"{c:>12} {fn}") ``` ``` 812800 ???:_dl_update_slotinfo 355600 ???:update_get_addr 308300 work/Python/ceval.c:_PyEval_EvalFrameDefault'2 304800 ???:__tls_get_addr 196059 ???:_int_free 152400 ???:__tls_get_addr_slow 138400 build/../c10/core/ScalarType.h:c10::typeMetaToScalarType(caffe2::TypeMeta) 126526 work/Objects/dictobject.c:_PyDict_LoadGlobal 114268 ???:malloc 101400 work/Objects/unicodeobject.c:PyUnicode_FromFormatV 85900 work/Python/ceval.c:_PyEval_EvalFrameDefault 79946 work/Objects/typeobject.c:_PyType_Lookup 72000 build/../c10/core/Device.h:c10::Device::validate() 70000 /usr/include/c++/8/bits/stl_vector.h:std::vector<at::Tensor, std::allocator<at::Tensor> >::~vector() 66400 work/Objects/object.c:_PyObject_GenericGetAttrWithDict 63000 ???:pthread_mutex_lock 61200 work/Objects/dictobject.c:PyDict_GetItem 59800 ???:free 58400 work/Objects/tupleobject.c:tupledealloc 56707 work/Objects/dictobject.c:lookdict_unicode_nodummy ``` Moreover, if we backport this PR to 1.6 (just copy the `_benchmarks` folder) and load those counts as `counts_1_6`, then we can easily diff them: ``` print(f"Head instructions: {sum(c for c, _ in counts)}") print(f"1.6 instructions: {sum(c for c, _ in counts_1_6)}") count_dict = {fn: c for c, fn in counts} for c, fn in counts_1_6: _ = count_dict.setdefault(fn, 0) count_dict[fn] -= c count_diffs = sorted([(c, fn) for fn, c in count_dict.items()], reverse=True) for c, fn in count_diffs[:15] + [["", "..."]] + count_diffs[-15:]: print(f"{c:>8} {fn}") ``` ``` Head instructions: 7609547 1.6 instructions: 6059648 169600 ???:_dl_update_slotinfo 101400 work/Objects/unicodeobject.c:PyUnicode_FromFormatV 74200 ???:update_get_addr 63600 ???:__tls_get_addr 46800 work/Python/ceval.c:_PyEval_EvalFrameDefault 33512 work/Objects/dictobject.c:_PyDict_LoadGlobal 31800 ???:__tls_get_addr_slow 31700 build/../aten/src/ATen/record_function.cpp:at::RecordFunction::RecordFunction(at::RecordScope) 28300 build/../torch/csrc/utils/python_arg_parser.cpp:torch::FunctionSignature::parse(_object, _object, _object, _object, bool) 27800 work/Objects/object.c:_PyObject_GenericGetAttrWithDict 27401 work/Objects/dictobject.c:lookdict_unicode_nodummy 24115 work/Objects/typeobject.c:_PyType_Lookup 24080 ???:_int_free 21700 work/Objects/dictobject.c:PyDict_GetItemWithError 20700 work/Objects/dictobject.c:PyDict_GetItem ... -3200 build/../c10/util/SmallVector.h:at::TensorIterator::binary_op(at::Tensor&, at::Tensor const&, at::Tensor const&, bool) -3400 build/../aten/src/ATen/native/TensorIterator.cpp:at::TensorIterator::resize_outputs(at::TensorIteratorConfig const&) -3500 /usr/include/c++/8/x86_64-redhat-linux/bits/gthr-default.h:std::unique_lock<std::mutex>::unlock() -3700 build/../torch/csrc/utils/python_arg_parser.cpp:torch::PythonArgParser::raw_parse(_object, _object, _object) -4207 work/Objects/obmalloc.c:PyMem_Calloc -4500 /usr/include/c++/8/bits/stl_vector.h:std::vector<at::Tensor, std::allocator<at::Tensor> >::~vector() -4800 build/../torch/csrc/autograd/generated/VariableType_2.cpp:torch::autograd::VariableType::add__Tensor(at::Tensor&, at::Tensor const&, c10::Scalar) -5000 build/../c10/core/impl/LocalDispatchKeySet.cpp:c10::impl::ExcludeDispatchKeyGuard::ExcludeDispatchKeyGuard(c10::DispatchKey) -5300 work/Objects/listobject.c:PyList_New -5400 build/../torch/csrc/utils/python_arg_parser.cpp:torch::FunctionParameter::check(_object, std::vector<pybind11::handle, std::allocator<pybind11::handle> >&) -5600 /usr/include/c++/8/bits/std_mutex.h:std::unique_lock<std::mutex>::unlock() -6231 work/Objects/obmalloc.c:PyMem_Free -6300 work/Objects/listobject.c:list_repeat -11200 work/Objects/listobject.c:list_dealloc -28900 build/../torch/csrc/utils/python_arg_parser.cpp:torch::FunctionSignature::parse(_object, _object, _object*, bool) ``` Remaining TODOs: Include a timer in the generated script for cuda sync. * Add valgrind to CircleCI machines and add a unit test. Pull Request resolved: https://github.com/pytorch/pytorch/pull/44717 Reviewed By: soumith Differential Revision: D24010742 Pulled By: robieta fbshipit-source-id: df6bc765f8efce7193893edba186cd62b4b23623	2020-09-30 05:52:54 -07:00
gunandrose4u	f07ac6a004	Fix Windows build failure after DDP PR merged (#45335 ) Summary: Fixes #{issue number} This is resubmit for PR https://github.com/pytorch/pytorch/issues/42897 . Together with fix for Windows build issue introduced by PR https://github.com/pytorch/pytorch/issues/44344 . Pull Request resolved: https://github.com/pytorch/pytorch/pull/45335 Reviewed By: zou3519 Differential Revision: D23931471 Pulled By: mrshenli fbshipit-source-id: f49b5a114944c1450b32934b3292170be064f494	2020-09-25 12:37:50 -07:00
Mike Ruberry	103fa3894a	Revert D23841786: [pytorch][PR] Enable distributed package on windows, Gloo backend supported only Test Plan: revert-hammer Differential Revision: D23841786 (`0122299f9b`) Original commit changeset: 334ba1ed73ef fbshipit-source-id: ec95432f9957df56a5a04e52661f5db920b7f57f	2020-09-24 22:44:33 -07:00
gunandrose4u	0122299f9b	Enable distributed package on windows, Gloo backend supported only (#42897 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/42095 For test case part will be committed to this PR later mrshenli, please help to review Pull Request resolved: https://github.com/pytorch/pytorch/pull/42897 Reviewed By: osalpekar Differential Revision: D23841786 Pulled By: mrshenli fbshipit-source-id: 334ba1ed73eff2f668857390fc32d1bc7f08e5f3	2020-09-24 21:13:55 -07:00
Gao, Xiang	5e97f251a8	Enable TF32 support for cuDNN (#40737 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40737 Reviewed By: mruberry Differential Revision: D22801525 Pulled By: ngimel fbshipit-source-id: ac7f7e728b4b3e01925337e8c9996f26a6433fd2	2020-09-01 15:34:24 -07:00
Richard Zou	e8f4b04d9a	vmap: temporarily disable support for random functions (#42617 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42617 While we figure out the random plan, I want to initially disable support for random operations. This is because there is an ambiguity in what randomness means. For example, ``` tensor = torch.zeros(B0, 1) vmap(lambda t: t.normal_())(tensor) ``` in the above example, should tensor[0] and tensor[1] be equal (i.e., use the same random seed), or should they be different? The mechanism for disabling random support is as follows: - We add a new dispatch key called VmapMode - Whenever we're inside vmap, we enable VmapMode for all tensors. This is done via at::VmapMode::increment_nesting and at::VmapMode::decrement_nesting. - DispatchKey::VmapMode's fallback kernel is the fallthrough kernel. - We register kernels that raise errors for all random functions on DispatchKey::VmapMode. This way, whenever someone calls a random function on any tensor (not just BatchedTensors) inside of a vmap block, an error gets thrown. Test Plan: - pytest test/test_vmap.py -v -k "Operators" Reviewed By: ezyang Differential Revision: D22954840 Pulled By: zou3519 fbshipit-source-id: cb8d71062d4087e10cbf408f74b1a9dff81a226d	2020-08-11 07:19:51 -07:00
Mike Ruberry	9c8021c0b1	Adds torch.linalg namespace (#42664 ) Summary: This PR adds the `torch.linalg` namespace as part of our continued effort to be more compatible with NumPy. The namespace is tested by adding a single function, `torch.linalg.outer`, and testing it in a new test suite, test_linalg.py. It follows the same pattern that https://github.com/pytorch/pytorch/pull/41911, which added the `torch.fft` namespace, did. Future PRs will likely: - add more functions to torch.linalg - expand the testing done in test_linalg.py, including legacy functions, like torch.ger - deprecate existing linalg functions outside of `torch.linalg` in preference to the new namespace Pull Request resolved: https://github.com/pytorch/pytorch/pull/42664 Reviewed By: ngimel Differential Revision: D22991019 Pulled By: mruberry fbshipit-source-id: 39258d9b116a916817b3588f160b141f956e5d0b	2020-08-07 10:18:30 -07:00
Mike Ruberry	ccfce9d4a9	Adds fft namespace (#41911 ) Summary: This PR creates a new namespace, torch.fft (torch::fft) and puts a single function, fft, in it. This function is analogous to is a simplified version of NumPy's [numpy.fft.fft](https://numpy.org/doc/1.18/reference/generated/numpy.fft.fft.html?highlight=fft#numpy.fft.fft) that accepts no optional arguments. It is intended to demonstrate how to add and document functions in the namespace, and is not intended to deprecate the existing torch.fft function. Adding this namespace was complicated by the existence of the torch.fft function in Python. Creating a torch.fft Python module makes this name ambiguous: does it refer to a function or module? If the JIT didn't exist, a solution to this problem would have been to make torch.fft refer to a callable class that mimicked both the function and module. The JIT, however, cannot understand this pattern. As a workaround it's required to explicitly `import torch.fft` to access the torch.fft.fft function in Python: ``` import torch.fft t = torch.randn(128, dtype=torch.cdouble) torch.fft.fft(t) ``` See https://github.com/pytorch/pytorch/issues/42175 for future work. Another possible future PR is to get the JIT to understand torch.fft as a callable class so it need not be imported explicitly to be used. Pull Request resolved: https://github.com/pytorch/pytorch/pull/41911 Reviewed By: glaringlee Differential Revision: D22941894 Pulled By: mruberry fbshipit-source-id: c8e0b44cbe90d21e998ca3832cf3a533f28dbe8d	2020-08-06 00:20:50 -07:00
Hameer Abbasi	3d46e02ea1	Add __torch_function__ for methods (#37091 ) Summary: According to pytorch/rfcs#3 From the goals in the RFC: 1. Support subclassing `torch.Tensor` in Python (done here) 2. Preserve `torch.Tensor` subclasses when calling `torch` functions on them (done here) 3. Use the PyTorch API with `torch.Tensor`-like objects that are _not_ `torch.Tensor` subclasses (done in https://github.com/pytorch/pytorch/issues/30730) 4. Preserve `torch.Tensor` subclasses when calling `torch.Tensor` methods. (done here) 5. Propagating subclass instances correctly also with operators, using views/slices/indexing/etc. (done here) 6. Preserve subclass attributes when using methods or views/slices/indexing. (done here) 7. A way to insert code that operates on both functions and methods uniformly (so we can write a single function that overrides all operators). (done here) 8. The ability to give external libraries a way to also define functions/methods that follow the `__torch_function__` protocol. (will be addressed in a separate PR) This PR makes the following changes: 1. Adds the `self` argument to the arg parser. 2. Dispatches on `self` as well if `self` is not `nullptr`. 3. Adds a `torch._C.DisableTorchFunction` context manager to disable `__torch_function__`. 4. Adds a `torch::torch_function_enabled()` and `torch._C._torch_function_enabled()` to check the state of `__torch_function__`. 5. Dispatches all `torch._C.TensorBase` and `torch.Tensor` methods via `__torch_function__`. TODO: - [x] Sequence Methods - [x] Docs - [x] Tests Closes https://github.com/pytorch/pytorch/issues/28361 Benchmarks in https://github.com/pytorch/pytorch/pull/37091#issuecomment-633657778 Pull Request resolved: https://github.com/pytorch/pytorch/pull/37091 Reviewed By: ngimel Differential Revision: D22765678 Pulled By: ezyang fbshipit-source-id: 53f8aa17ddb8b1108c0997f6a7aa13cb5be73de0	2020-08-05 20:44:13 -07:00
Xiang Gao	23174ca71b	[reland] Enable TF32 support for cuBLAS (#41498 ) Summary: fix rocm Pull Request resolved: https://github.com/pytorch/pytorch/pull/41498 Reviewed By: mruberry Differential Revision: D22560572 Pulled By: ngimel fbshipit-source-id: 5ee79e96cb29e70d9180830d058efb53d1c6c041	2020-07-15 21:00:55 -07:00
Shen Li	3a63a939d4	Revert D22517785: [pytorch][PR] Enable TF32 support for cuBLAS Test Plan: revert-hammer Differential Revision: D22517785 (`288ece89e1`) Original commit changeset: 87334c893561 fbshipit-source-id: 0a0674f49c1bcfc98f7f88af5a8c7de93b76e458	2020-07-15 08:15:48 -07:00
Xiang Gao	288ece89e1	Enable TF32 support for cuBLAS (#40800 ) Summary: Benchmark on a fully connected network and torchvision models (time in seconds) on GA100: \| model \| batch size \| forward(TF32) \| forward(FP32) \| backward(TF32) \| backward(FP32) \| \|--------------------\|------------\|---------------\|---------------\|----------------\|----------------\| \| FC 512-128-32-8 \| 512 \| 0.000211 \| 0.000321 \| 0.000499 \| 0.000532 \| \| alexnet \| 512 \| 0.0184 \| 0.0255 \| 0.0486 \| 0.0709 \| \| densenet161 \| 128 \| 0.0665 \| 0.204 \| 0.108 \| 0.437 \| \| googlenet \| 256 \| 0.0925 \| 0.110 \| 0.269 \| 0.326 \| \| inception_v3 \| 256 \| 0.155 \| 0.214 \| 0.391 \| 0.510 \| \| mnasnet1_0 \| 512 \| 0.108 \| 0.137 \| 0.298 \| 0.312 \| \| mobilenet_v2 \| 512 \| 0.114 \| 0.294 \| 0.133 \| 0.303 \| \| resnet18 \| 512 \| 0.0722 \| 0.100 \| 0.182 \| 0.228 \| \| resnext50_32x4d \| 256 \| 0.170 \| 0.237 \| 0.373 \| 0.479 \| \| shufflenet_v2_x1_0 \| 512 \| 0.0463 \| 0.0473 \| 0.125 \| 0.123 \| \| squeezenet1_0 \| 512 \| 0.0870 \| 0.0948 \| 0.205 \| 0.214 \| \| vgg16 \| 256 \| 0.167 \| 0.234 \| 0.401 \| 0.502 \| \| wide_resnet50_2 \| 512 \| 0.186 \| 0.310 \| 0.415 \| 0.638 \| Pull Request resolved: https://github.com/pytorch/pytorch/pull/40800 Reviewed By: mruberry Differential Revision: D22517785 Pulled By: ngimel fbshipit-source-id: 87334c8935616f72a6af5abbd3ae69f76923dc3e	2020-07-14 13:21:10 -07:00
Kurt Mohler	124cdf2290	Add experimental deterministic flag (#38683 ) Summary: Adds `torch.experimental.deterministic` flag to enforce deterministic algorithms across all of pytorch. Adds `torch.experimental.deterministic_error_level` to allow users to choose between error/warning/silent if determinism for an operation is not available. Adds `torch.experimental.alert_not_deterministic()` which should be called within operations that are not deterministic. Offers both Python and ATen interfaces Issue https://github.com/pytorch/pytorch/issues/15359 Pull Request resolved: https://github.com/pytorch/pytorch/pull/38683 Differential Revision: D21998093 Pulled By: ezyang fbshipit-source-id: 23aabbddd20f6199d846f97764ff24d728163737	2020-06-12 08:44:06 -07:00
Wanchao Liang	d493918436	[dist_autograd] expose distributed backward C++ API (#38656 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/38656 Test Plan: Imported from OSS Differential Revision: D21940441 Pulled By: wanchaol fbshipit-source-id: e9d35201825912f5e7d7e1d0a71586abe5a6f71c	2020-06-08 19:42:21 -07:00
Edward Yang	4d880c0693	Device and torch._C function cleanup (#38173 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/38173 - Introduce torch.types.Device representing all "device-like" types - Stubbed torch.device.__reduce__ - Stubbed all torch._C functions comprehensively - Deleted _safe_call which is unused throughout the codebase Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Differential Revision: D21497399 Pulled By: ezyang fbshipit-source-id: 1f534442b0ec9a70d556545d072f2c06a08b9d15	2020-06-03 19:17:22 -07:00
Nikita Shulga	a864dbb360	Make `_C` extension a thin C wrapper (#39375 ) Summary: It just depends on a single `torch_python` library. C library does not depend on standard C++ library and as result it closes https://github.com/pytorch/pytorch/issues/36941 Pull Request resolved: https://github.com/pytorch/pytorch/pull/39375 Reviewed By: orionr Differential Revision: D21840645 Pulled By: malfet fbshipit-source-id: 777c189feee9d6fc686816d92cb9f109b8aac7ca	2020-06-02 13:11:59 -07:00
David Reiss	6d642a6f6c	Remove (most) Python 2 support from C++ code (#35614 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35614 Python 2 has reached end-of-life and is no longer supported by PyTorch. Now we can clean up a lot of cruft that we put in place to support it. These changes were all done manually, and I skipped anything that seemed like it would take more than a few seconds, so I think it makes sense to review it manually as well. Test Plan: CI Differential Revision: D20842876 Pulled By: dreiss fbshipit-source-id: 18abf0d324ed2185ec6d27c864e935d856dcc6ad	2020-05-14 15:01:49 -07:00
Peter Bell	5137827ad0	Lazily initialise thread local num_threads value (#37461 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/37259, fixes https://github.com/pytorch/pytorch/issues/20156 This lazily calls `at::init_num_threads` once for each thread by adding a call to `lazy_init_num_threads` in `at::parallel_for` and `at::parallel_reduce`. If this solution is okay, then we should add the same to guard other places that might use MKL or OpenMP. Pull Request resolved: https://github.com/pytorch/pytorch/pull/37461 Reviewed By: ezyang Differential Revision: D21472763 Pulled By: ilia-cher fbshipit-source-id: 889d6664f5bd4080037ade02ee324b1233992915	2020-05-11 13:24:45 -07:00
anjali411	1f09f7ea44	Python API for Complex Storage and storage copy logic (#35771 ) Summary: Following up on this: https://github.com/pytorch/pytorch/pull/35851 cross dtype storage copy is not being used internally, so I have not included cross dtype copy for complex. Pull Request resolved: https://github.com/pytorch/pytorch/pull/35771 Differential Revision: D21319650 Pulled By: anjali411 fbshipit-source-id: 07c72996ee598eba0cf401ad61534494d6f5b5b3	2020-05-01 11:47:22 -07:00
Gregory Chanan	287f3b746e	Remove Backend -> THPLayout mapping. (#37527 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/37527 This is yet another place that needs to be updated for adding a new "Backend" and is unnecessary. Instead, just use layout_from_backend and have a map from Layout -> THPLayout. Other changes: - rename torch::getDtype and torch::getLayout to torch::getTHPDtype and torch::getTHPLayout since e.g. for layout you are both passing in and returning a "layout" type. - add NumOptions to Layout to match the dtype/ScalarType formulation. Test Plan: Imported from OSS Differential Revision: D21309836 Pulled By: gchanan fbshipit-source-id: ede0e4f3bf7ff2cd04a9b17df020f0d4fd654ba3	2020-04-30 11:11:09 -07:00
Edward Yang	9e3605de98	[RELAND] New operator registration API (#35061 ) (#35629 ) Summary: Reland of https://github.com/pytorch/pytorch/pull/35061 ; removed the get qualified type name magic from debug strings to work around MSVC 2017 bug. Main points of the new API: - You can register implementations (impl) without having to specify a schema. - Registrations are commutative, so no matter what order your static initializers run, you end up with the same end result. op_registration_test.cpp contains a reasonably comprehensive accounting for the available API surface How does this implementation proceed? The basic concept is to relax the internal invariants of Dispatcher data structures to allow the possibility that a FunctionSchema is not specified in an Operator. - DispatchKeyExtractor has an uninitialized state where it doesn't look for dispatch keys in any arguments of the stack. It can have a schema (de)registered to itself post facto with registerSchema/unregisterSchema. - DispatchTable has a new constructor taking only an OperatorName for the uninitialized state. It can have a schema (de)registered to itself post facto with registerSchema/unregisterSchema - OperatorDef maintains counts of both defs and well as defs_and_impls. defs_and_impls keeps track of the outstanding impl registrations; you may have impl registrations but no defs. If there are no defs (no schema), the operator is not returned by findSchema. A new findOperatorByName fucntion unconditionally returns the OperatorHandle even if there's no schema. OperatorHandle::hasSchema can be used to check if the operator has schema. - Replaced 'registerKernel' with 'registerImpl', which is the new interface for directly registering kernels without implementations. - Because 'registerImpl' no longer requires an OperatorHandle, change 'registerDef' to only return a RegistrationHandleRAII. This is marginally less efficient (since we're doing two hash table lookups on a registration now), but this won't matter in the long term, and probably doesn't matter now either. - Rename registerBackendFallbackKernel to registerFallback (this exposed a bunch of places where we're improperly directly interfacing with Dispatcher; we need to add this capability to the true public API) - All code generated internal registrations are switched to use the new API. This includes VariableType registrations (which previously weren't converted) and the mobile autograd stuff - Switch the new-style def()/impl() APIs to interact directly with Dispatcher, rather than indirecting through the old API - We deleted alias analysis kind merging entirely. As a nod to BC, it's possible to define a full schema with alias analysis kind, and then later do another full schema def with missing alias analysis kind, but the opposite direction is not allowed. We can remove this entirely following the plan at https://github.com/pytorch/pytorch/issues/35040 - Schema matching is moved inside the dispatcher, because we might not be able to immediately schema match at the point of an impl() (because we don't have the schema yet). To do this, we store the inferred function schema inside a KernelEntry, so we can check it when we get the real schema. - Registered kernel functions now store a debug string which can be used to more easily identify them. Tests use this to distinguish between multiple distinct registrations; regular invocations get only very basic information. Because we need our static initializers to work no matter what order they're run, the testing strategy on this PR is quite involved. The general concept: - Bind a (very gimped) version of the dispatcher API from Python, so that we can easily write a more complex testing harness using expect tests. - For series of registrations we want to test, exhaustively test every possible permutation of registrations (and deregistrations), and show that the intermediate states agree no matter what path is taken. - Intermediate states are rendered using a new dumpState() debugging method that prints the internal state of the dispatcher. This method may be generally useful for people who want to see what's in the dispatcher. - Simultaneously, add a new invariant testing function which checks that the internal invariants of the dispatcher are upheld (so we don't have to print internal implementation details of the dispatcher) The testing framework found a few bugs in development. For example, here is a case where we registered schema too early, before checking if it was valid: ``` Traceback (most recent call last): File "test/test_dispatch.py", line 164, in test_def_impl_schema_mismatch ], raises=True) File "test/test_dispatch.py", line 135, in commute results=results, raises=raises) File "test/test_dispatch.py", line 83, in run_permutation .format(ctor_order[:i], op_ix)) File "test/test_dispatch.py", line 59, in check_invariants .format(expected_provenance, actual_provenance) AssertionError: 'name[16 chars]ema: (none)\ncatchall: boxed unboxed :: (Tenso[18 chars]0)\n' != 'name[16 chars]ema: test::foo(Tensor x, Tensor y) -> (Tensor)[53 chars]0)\n' name: test::foo - schema: (none) + schema: test::foo(Tensor x, Tensor y) -> (Tensor) catchall: boxed unboxed :: (Tensor _0) -> (Tensor _0) : expected from running ctors (1,); actual from running ctors (1,) and then failing to run ctor 0 (did this failure leave the dispatcher in a wedged state? it shouldn't!) ``` There are also C++ smoketests for the API. These tests comprehensively cover the C++ API surface of the new operator registration API, but don't check very hard if the API does the right thing (that's what test_dispatch.py is for) Some miscellaneous changes which could have been split into other PRs, but I was too lazy to do so: - Add torch::jit::parseName (mirroring parseSchema/parseSchemaOrName) - Add cloneWithName functionality to FunctionSchema - Unconditionally generate schema registration, even when type_method_dispatch is a dict. The one exception is for manual registrations.... - Add fallback, CppFunction::makeFallthrough and CppFunction::makeFromBoxedFunction to public API of op_registration, so we can stop calling internal registerImpl directly - Add new syntax sugar dispatch_autograd for registering autograd kernels - Minor OperatorName cleanup, storing OperatorName in DispatchTable and defining operator<< on OperatorName - Refactored the op registration API to take FunctionSchema directly. We now do namespacing by post facto fixing up the OperatorName embedded in FunctionSchema. This also means that you can now do torch::import("ns1").def("ns2::blah") and have the ns2 override ns1 (although maybe this is not the correct behavior.) - New torch::schema public API, for attaching alias analysis kind annotation kinds. This meant we had to template up some function signatures which previously took const char*. There's now a nice comment explaining this strategy. - torch::import now takes std::string which means we can use the namespacing from Python Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/35629 Differential Revision: D20724551 Pulled By: ezyang fbshipit-source-id: befa46a1affb4ec4ae1fb39e3564a63695a6ca41	2020-03-29 19:48:29 -07:00
Edward Yang	227beb9095	Revert D20680520: New operator registration API Test Plan: revert-hammer Differential Revision: D20680520 Original commit changeset: 5d39a28e4ec7 fbshipit-source-id: 5b2497ffc24db9a05b01d526f161bc0164f9f707	2020-03-28 14:49:56 -07:00
Edward Yang	28ab8c6ff8	New operator registration API (#35061 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35061 Main points of the new API: - You can register implementations (impl) without having to specify a schema. - Registrations are commutative, so no matter what order your static initializers run, you end up with the same end result. op_registration_test.cpp contains a reasonably comprehensive accounting for the available API surface How does this implementation proceed? The basic concept is to relax the internal invariants of Dispatcher data structures to allow the possibility that a FunctionSchema is not specified in an Operator. - DispatchKeyExtractor has an uninitialized state where it doesn't look for dispatch keys in any arguments of the stack. It can have a schema (de)registered to itself post facto with registerSchema/unregisterSchema. - DispatchTable has a new constructor taking only an OperatorName for the uninitialized state. It can have a schema (de)registered to itself post facto with registerSchema/unregisterSchema - OperatorDef maintains counts of both defs and well as defs_and_impls. defs_and_impls keeps track of the outstanding impl registrations; you may have impl registrations but no defs. If there are no defs (no schema), the operator is not returned by findSchema. A new findOperatorByName fucntion unconditionally returns the OperatorHandle even if there's no schema. OperatorHandle::hasSchema can be used to check if the operator has schema. - Replaced 'registerKernel' with 'registerImpl', which is the new interface for directly registering kernels without implementations. - Because 'registerImpl' no longer requires an OperatorHandle, change 'registerDef' to only return a RegistrationHandleRAII. This is marginally less efficient (since we're doing two hash table lookups on a registration now), but this won't matter in the long term, and probably doesn't matter now either. - Rename registerBackendFallbackKernel to registerFallback (this exposed a bunch of places where we're improperly directly interfacing with Dispatcher; we need to add this capability to the true public API) - All code generated internal registrations are switched to use the new API. This includes VariableType registrations (which previously weren't converted) and the mobile autograd stuff - Switch the new-style def()/impl() APIs to interact directly with Dispatcher, rather than indirecting through the old API - We deleted alias analysis kind merging entirely. As a nod to BC, it's possible to define a full schema with alias analysis kind, and then later do another full schema def with missing alias analysis kind, but the opposite direction is not allowed. We can remove this entirely following the plan at https://github.com/pytorch/pytorch/issues/35040 - Schema matching is moved inside the dispatcher, because we might not be able to immediately schema match at the point of an impl() (because we don't have the schema yet). To do this, we store the inferred function schema inside a KernelEntry, so we can check it when we get the real schema. - Registered kernel functions now store a debug string which can be used to more easily identify them. There's some best effort stuff based on __FUNCSIG__ but this is only really capable of reporting types and not function symbols. Tests use this to distinguish between multiple distinct registrations. Because we need our static initializers to work no matter what order they're run, the testing strategy on this PR is quite involved. The general concept: - Bind a (very gimped) version of the dispatcher API from Python, so that we can easily write a more complex testing harness using expect tests. - For series of registrations we want to test, exhaustively test every possible permutation of registrations (and deregistrations), and show that the intermediate states agree no matter what path is taken. - Intermediate states are rendered using a new dumpState() debugging method that prints the internal state of the dispatcher. This method may be generally useful for people who want to see what's in the dispatcher. - Simultaneously, add a new invariant testing function which checks that the internal invariants of the dispatcher are upheld (so we don't have to print internal implementation details of the dispatcher) The testing framework found a few bugs in development. For example, here is a case where we registered schema too early, before checking if it was valid: ``` Traceback (most recent call last): File "test/test_dispatch.py", line 164, in test_def_impl_schema_mismatch ], raises=True) File "test/test_dispatch.py", line 135, in commute results=results, raises=raises) File "test/test_dispatch.py", line 83, in run_permutation .format(ctor_order[:i], op_ix)) File "test/test_dispatch.py", line 59, in check_invariants .format(expected_provenance, actual_provenance) AssertionError: 'name[16 chars]ema: (none)\ncatchall: boxed unboxed :: (Tenso[18 chars]0)\n' != 'name[16 chars]ema: test::foo(Tensor x, Tensor y) -> (Tensor)[53 chars]0)\n' name: test::foo - schema: (none) + schema: test::foo(Tensor x, Tensor y) -> (Tensor) catchall: boxed unboxed :: (Tensor _0) -> (Tensor _0) : expected from running ctors (1,); actual from running ctors (1,) and then failing to run ctor 0 (did this failure leave the dispatcher in a wedged state? it shouldn't!) ``` There are also C++ smoketests for the API. These tests comprehensively cover the C++ API surface of the new operator registration API, but don't check very hard if the API does the right thing (that's what test_dispatch.py is for) Some miscellaneous changes which could have been split into other PRs, but I was too lazy to do so: - Add torch::jit::parseName (mirroring parseSchema/parseSchemaOrName) - Add cloneWithName functionality to FunctionSchema - Unconditionally generate schema registration, even when type_method_dispatch is a dict. The one exception is for manual registrations.... - Add fallback, CppFunction::makeFallthrough and CppFunction::makeFromBoxedFunction to public API of op_registration, so we can stop calling internal registerImpl directly - Add new syntax sugar dispatch_autograd for registering autograd kernels - Minor OperatorName cleanup, storing OperatorName in DispatchTable and defining operator<< on OperatorName - Refactored the op registration API to take FunctionSchema directly. We now do namespacing by post facto fixing up the OperatorName embedded in FunctionSchema. This also means that you can now do torch::import("ns1").def("ns2::blah") and have the ns2 override ns1 (although maybe this is not the correct behavior.) - New torch::schema public API, for attaching alias analysis kind annotation kinds. This meant we had to template up some function signatures which previously took const char*. There's now a nice comment explaining this strategy. - torch::import now takes std::string which means we can use the namespacing from Python Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Differential Revision: D20680520 Pulled By: ezyang fbshipit-source-id: 5d39a28e4ec7c73fe4b1fb2222e865ab65e188f5	2020-03-28 10:52:49 -07:00
Omkar Salpekar	4025729e88	[1.5 Release][RPC Reliability] RRef Idempotency and RPC Retry enablement (#33636 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33636 Fixes https://github.com/pytorch/pytorch/issues/32119, https://github.com/pytorch/pytorch/issues/26116, https://github.com/pytorch/pytorch/issues/33072 Makes RRef control messages idempotent and enables sending with retries for distributed autograd cleanup and RRef internal messages. In order to effectively test whether these RRef and distributed autograd cleanup work with network failures/retries, I implemented an RPC Agent with a faulty send function, and enabled running tests using this as a third backend (in addition to Thrift and PGA). The tests using this backend are in a separate class (the test cases are similar but with minor changes to ensure short-running tests wait for retried RPCs to finish). This faulty RPC Agent is pretty configurable. The tests can configure which messages types to fail, and how many messages to fail, but going forward, other RPC functionality can be overriden with faulty methods to test with failures injected. Differential Revision: D20019236 fbshipit-source-id: 540a977e96b2e29aa0393ff12621fa293fe92b48	2020-03-20 20:07:47 -07:00
Kimish Patel	4c30fc7238	Integrate XNNPACK with custom class for packing weights. (#34047 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34047 This PR integrates the added xnnpack conv2d and linear op via custom class registration for packed weights. The packed struct is serializable. Test Plan: python test test/test_xnnpack_integration.py Imported from OSS Differential Revision: D20185657 fbshipit-source-id: fc7e692d8f913e493b293b02d92f4e78536d7698	2020-03-14 12:51:56 -07:00
Peter Bell	5fc5cf6571	Stop using ctypes to interface with CUDA libraries. (#33678 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/33016, Continuation of https://github.com/pytorch/pytorch/issues/31160 Pull Request resolved: https://github.com/pytorch/pytorch/pull/33678 Differential Revision: D20249187 Pulled By: ezyang fbshipit-source-id: 172ce4a0fee7fbe01436a421d1af22ef6173b6ed	2020-03-11 07:22:46 -07:00
Michael Suo	dbe850af5b	[jit] do the code reorg (#33851 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33851 Rationale and context described in #33828. Script to reproduce the move: https://gist.github.com/suo/16cbefaaeb67ca5a7c6caffd49b7f6e9 ghstack-source-id: 99079645 Test Plan: Make sure CI passes Reviewed By: jamesr66a Differential Revision: D20133869 fbshipit-source-id: 390e9241a9c85366d9005c492ac31f10aa96488e	2020-02-27 13:02:51 -08:00
Jithun Nair	718c538ff9	Add ability to enable/disable MIOpen at runtime (#33118 ) Summary: 1. Set `torch._C.has_cudnn` to `True` for ROCm 2. Make MIOpen invocations respect value of `cudnn_enabled` or `at::globalContext().userEnabledCuDNN()` 3. `torch/backends/cudnn/__init__.py`: Add hip-specific changes (use "hide whitespace changes" option to view simpler diff) Pull Request resolved: https://github.com/pytorch/pytorch/pull/33118 Differential Revision: D19977719 Pulled By: bddppq fbshipit-source-id: 64d4dd1d78afcf96201360d85b8be5950f96dfad	2020-02-20 10:47:57 -08:00
Peter Bell	44af8ee6cd	Add pybind11 exception translator (#30588 ) Summary: Closes https://github.com/pytorch/pytorch/issues/30027 The idea here is that you can bind a function with `pybind11` in a single line and without modifying the function: ```cpp m.def("foo", foo, py::call_guard<torch::PyWarningHandler>()); ``` Where warnings are handled by the [`call_guard`](https://pybind11.readthedocs.io/en/stable/advanced/functions.html#call-guard) and exceptions are handled by the `pybind11` exception translator. To do this, I have added support for handling C++ exceptions in `torch::PyWarningHandler`'s destructor without setting the python error state before hand. Pull Request resolved: https://github.com/pytorch/pytorch/pull/30588 Differential Revision: D19905626 Pulled By: albanD fbshipit-source-id: 90c0a5e298b123cc0c8ab9c52c91be4e96ea47c6	2020-02-18 11:33:29 -08:00
Basil Hosmer	fb159b5236	Some work on eager op binding codegen (gen_python_functions.py) (#29986 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29986 Previously in addition to generating a python binding for each op, we would generate an almost-trivial helper for each overload. This PR eliminates the helpers, simplifying codegen logic a bit and reducing the source-level indirection by a step. Perf should be unchanged. codegen diff: `1f2f07fb60` Note: in the interests of keeping the diff contained, there's only some light cleanup here beyond what's necessary for the codegen changes. Plan is to do some more substantial refactoring in followup PRs that leave generated code unchanged. Test Plan: Imported from OSS Differential Revision: D18567980 Pulled By: bhosmer fbshipit-source-id: eb9a81babb4489abd470842757af45580d4c9906	2020-01-30 00:29:53 -08:00
Pavel Belevich	62b06b9fae	Rename TensorTypeId to DispatchKey (#32154 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32154 TensorTypeId -> DispatchKey c10/core/TensorTypeId.h -> c10/core/DispatchKey.h c10/core/TensorTypeId.cpp -> c10/core/DispatchKey.cpp TensorTypeId::* -> DispatchKey::* TensorTypeId type_id -> DispatchKey dispatch_key type_id -> dispatch_key TensorTypeId::NumTensorIds -> DispatchKey::NumDispatchKeys RealTensorTypeId -> RealDispatchKey TensorTypeSet -> DispatchKeySet TensorTypeIds -> DispatchKeys c10/core/TensorTypeSet.h -> c10/core/DispatchKeySet.h c10/core/TensorTypeSet.cpp -> c10/core/DispatchKeySet.cpp type_set() -> key_set() type_set_ -> key_set_ typeSet -> keySet ExcludeTensorTypeIdGuard -> ExcludeDispatchKeyGuard IncludeTensorTypeIdGuard -> IncludeDispatchKeyGuard LocalTensorTypeSet -> LocalDispatchKeySet c10/core/impl/LocalTensorTypeSet.h -> c10/core/impl/LocalDispatchKeySet.h c10/core/impl/LocalTensorTypeSet.cpp -> c10/core/impl/LocalDispatchKeySet.cpp tls_local_tensor_type_set -> tls_local_dispatch_key_set tls_is_tensor_type_id_excluded -> tls_is_dispatch_key_excluded tls_set_tensor_type_id_excluded -> tls_set_dispatch_key_excluded tls_is_tensor_type_id_included -> tls_is_dispatch_key_included tls_set_tensor_type_id_included -> tls_set_dispatch_key_included MultiDispatchTensorTypeSet -> MultiDispatchKeySet multi_dispatch_tensor_type_set -> multi_dispatch_key_set tensorTypeIdToBackend -> dispatchKeyToBackend backendToTensorTypeId -> backendToDispatchKey initForTensorTypeSet -> initForDispatchKeySet inferred_type_set -> inferred_key_set computeTensorTypeId -> computeDispatchKey PODLocalTensorTypeSet raw_local_tensor_type_set -> PODLocalDispatchKeySet raw_local_dispatch_key_set get_default_tensor_type_id -> get_default_dispatch_key inferred_type_id -> inferred_dispatch_key actual_type_id -> actual_dispatch_key typeSetToDispatchKey_ -> dispatchKeySetToDispatchKey_ get_type_id() -> get_dispatch_key() legacyExtractTypeId -> legacyExtractDispatchKey extractTypeId -> extractDispatchKey Test Plan: Imported from OSS Differential Revision: D19398900 Pulled By: pbelevich fbshipit-source-id: 234ad19f93d33e00201b61e153b740a339035776	2020-01-15 11:16:08 -08:00
Richard Zou	bcb0bb7e0e	Remove unnecessary ATen/core/EnableNamedTensor.h (#31117 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31117 After this diff, we will have completely removed the named tensor feature flagging. This means that named tensors are always on and that there is no mechanism to turn them off. There should be no more follow-up diffs. I performed the deletion of the header with ``` find . -type f -print0 \| xargs -0 sed -i '/#include <ATen\/core\/EnableNamedTensor.h>/d' ``` Test Plan: - wait for CI Differential Revision: D18934952 Pulled By: zou3519 fbshipit-source-id: 253d059074b910fef15bdf885ebf71e0edf5bea5	2019-12-12 09:53:07 -08:00
Richard Zou	9047d4df45	Remove all remaining usages of BUILD_NAMEDTENSOR (#31116 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31116 Changelist: - remove BUILD_NAMEDTENSOR macro - remove torch._C._BUILD_NAMEDTENSOR - remove all python behavior that relies on torch._C._BUILD_NAMEDTENSOR Future: - In the next diff, I will remove all usages of ATen/core/EnableNamedTensor.h since that header doesn't do anything anymore - After that, we'll be done with the BUILD_NAMEDTENSOR removal. Test Plan: - run CI Differential Revision: D18934951 Pulled By: zou3519 fbshipit-source-id: 0a0df0f1f0470d0a01c495579333a2835aac9f5d	2019-12-12 09:53:03 -08:00
Richard Zou	e05ee4c421	Remove BUILD_NAMEDTENSOR macros (#30894 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30894 This PR begins the process of removing BUILD_NAMEDTENSOR macros. There will be followups. Reasons for removing the macros: - BUILD_NAMEDTENSOR is always on and has been on since pytorch 1.3.0. - Since we don't test building without it, it is useless to keep around. - Code becomes nicer to read without the macros Reasons for not removing the macros: - potential for feature flagging Now, I argue against needing to feature flag. The main reason why we might want to feature flag is if we need to disable the feature. We'd need a fast switch to disable the feature if someone discovers in the future that named tensors caused some regression in some existing workflows. In https://github.com/pytorch/pytorch/pull/25798, I did a variety of macro- and micro- benchmarks to determine the performance impact of named tensors on regular tensors. [The microbenchmarks](https://github.com/pytorch/pytorch/pull/25798#issuecomment-529014810) were not very stable, and running the microbenchmarks for more iterations doesn't actually help because the noise is not distributed in a nice way. Instead of microbenchmarks I ran a [profiler (perf)](https://github.com/pytorch/pytorch/pull/25798#issuecomment-555707645) to estimate how much overhead named tensors add to unnamed code. I estimated the overhead to be less than 100ns for `add` and even smaller for `mm`; there are ways to optimize even futher if we find this to be a problem. [Initial macrobenchmarks](https://github.com/pytorch/pytorch/pull/25798#issuecomment-530539104) were also not very stable. I ran imagenet for some number of epochs. To make them more stable, I got rid of the data loading (which seemed to vary between runs). [In some benchmarkers without data loading](https://github.com/pytorch/pytorch/pull/25798#issuecomment-562214053), we can see that the results are less noisy now. These results support no noticeable regressions in speed. Test Plan: - wait for CI Differential Revision: D18858543 Pulled By: zou3519 fbshipit-source-id: 08bf3853a9f506c6b084808dc9ddd1e835f48c13	2019-12-10 07:54:05 -08:00
Edward Yang	0c91ebb694	Delete all trivial uses of make_variable. (#29213 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29213 A trivial use of make_variable is one where requires_grad=False. This transformation is not technically semantics preserving, as make_variable will create a shallow copy of the tensor in question; however, I am guessing that we have the invariant that we don't actually make use of this shallow copy in a nontrivial way. There were some cases where the surrounding code expected a Variable proper to be returned; I retained those sites. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Differential Revision: D18353503 Pulled By: ezyang fbshipit-source-id: 57fe34d82e009c0cc852266fb0b79d6d9c62bb03	2019-11-13 07:43:41 -08:00
Alban Desmaison	9b875e1256	Buffer python warning to avoid deadlocks Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26613 Test Plan: Imported from OSS Differential Revision: D18249633 Pulled By: albanD fbshipit-source-id: 863f52400e1b97943a67a9e1abb09ae8d045e7f0	2019-11-07 08:35:06 -08:00
albanD	cb3232fdb9	Fix clang-tidy errors in csrc/Module.cpp Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28283 Test Plan: Imported from OSS Differential Revision: D18249632 Pulled By: albanD fbshipit-source-id: 0c7c71b3b7c74d338a90850e06c841b399f5709f	2019-11-07 08:34:58 -08:00
Supriya Rao	45391ccecb	Update qengine flag in python to string (#26620 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26620 This change updates torch.backend.quantized.engine to accept string ("fbgemm"/"qnnpack"/"none" for now). set_qengine and get_qengine return an int which represents the at::QEngine enum Test Plan: python test/test_torch.py Imported from OSS Differential Revision: D17533582 fbshipit-source-id: 5103263d0d59ff37d43dec27243cb76ba8ba633f	2019-09-23 17:56:50 -07:00
Jerry Zhang	2667493f4c	Expose supportedQEngines to python (#26474 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26474 att Test Plan: python test/test_torch.py Imported from OSS Differential Revision: D17517373 fbshipit-source-id: af931761d6ee31a88808d05f686002a83b6b25af	2019-09-21 10:36:13 -07:00
Jerry Zhang	8f50ea0f5c	Add NoQEngine to QEngine and refactor the name of set/get qengine (#26471 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26471 att Test Plan: . Imported from OSS Differential Revision: D17491215 fbshipit-source-id: 5790aa0113bfdbeeb838f3d1530397606ccaa1e9	2019-09-19 17:42:09 -07:00
Ailing Zhang	b1ecf4bc82	Revert D17464904: Add NoQEngine to QEngine and refactor the name of set/get qengine Test Plan: revert-hammer Differential Revision: D17464904 Original commit changeset: d8f2cebb978f fbshipit-source-id: 8feb86f7347f455eb51538ce7893d4a096ba0ba4	2019-09-18 20:04:58 -07:00
Jerry Zhang	4f7292f7ee	Add NoQEngine to QEngine and refactor the name of set/get qengine (#26330 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26330 att Test Plan: . Imported from OSS Differential Revision: D17464904 fbshipit-source-id: d8f2cebb978fcbc478bc7e111ba24bc71a6f8915	2019-09-18 19:38:59 -07:00
Supriya Rao	bb1efb3bee	Adding quantized::linear function for pytorch mobile in c10 (#26135 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26135 This change adds the support to call QNNPACK using the refactored API for Linear operators (Fully Connected) It also has certain cmake changes to enable builing and using pytorch_qnnpack inside aten I have disabled USE_QNNPACK in CMakeLists.txt. Enabling it results in picking kernels from third_party/QNNPACK during runtime since the function names are the same. Test Plan: python test/test_quantized.py TestQNNPackOps.test_qlinear_qnnpack Imported from OSS Differential Revision: D17434885 fbshipit-source-id: 084698026938f4529f61d12e86dfe82534ec73dd	2019-09-17 16:16:39 -07:00
Richard Zou	caed485873	Turn on BUILD_NAMEDTENSOR permanently (#26060 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26060 This PR enables BUILD_NAMEDTENSOR by default. This is done via including a header, `c10/core/EnableNamedTensor`, that sets `BUILD_NAMEDTENSOR`. In the future, the plan is to get rid of the flag entirely: we can incrementally delete usages after this PR goes in. This PR also maintains the namedtensor ci vs regular ci distinction. `test/test_namedtensor.py` only runs if TEST_NAMEDTENSOR=1 is specified. TEST_NAMEDTENSOR=1 is set on the namedtensor ci. I'll remove this distinction later and send out an announcement about it; devs will be responsible for named tensor failures after that. The initial reason why we had the BUILD_NAMEDTENSOR flag was so that we could quickly prototype named tensor features without worrying about adding overhead to the framework. The overheads can be categorized as memory overhead and performance overhead. Memory overhead: named tensors adds 1 additional word per Tensor. This is because TensorImpl stores a `unique_ptr<NamedTensorMetaInterface>` field. This is not a lot of overhead. Performance overhead: At all entry points to name inference, we check if inputs to an op are named. If inputs are not named, we short-circuit and don't do name inference. These calls should therefore be as efficient as error-checking code and not take up a lot of time. My plan is to benchmark a few functions and then post the results in a comment to this PR. Test Plan: - [namedtensor ci] Differential Revision: D17331635 Pulled By: zou3519 fbshipit-source-id: deed901347448ae2c26066c1fa432e3dc0cadb92	2019-09-17 08:25:00 -07:00
Ralf Gommers	1b4951d3a5	Fix remaining invalid function cast warnings that show up with GCC 8/9 (#26104 ) Summary: Follow-up to gh-25483, more of the same fixes for warnings like: ``` ../torch/csrc/autograd/python_variable.cpp:503:31: warning: cast between incompatible function types from ‘PyObject* ()(THPVariable)’ {aka ‘_object* ()(THPVariable)’} to ‘getter’ {aka ‘_object* ()(_object, void*)’} [-Wcast-function-type] 503 \| {"_backward_hooks", (getter)THPVariable_get_backwards_hooks, (setter)THPVariable_set_backwards_hooks, nullptr, nullptr}, \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ``` This takes the build log output for a full rebuild with GCC 9.1 from ~10,000 to ~7,000 lines. `clang-tidy` is going to complain, no way around that - see discussion at the end of gh-25483. Pull Request resolved: https://github.com/pytorch/pytorch/pull/26104 Differential Revision: D17396831 Pulled By: ezyang fbshipit-source-id: d71696bfe4dbe25519e4bcb7753151c118bd39f7	2019-09-17 07:43:37 -07:00
Supriya Rao	24d5b5f5f9	Add Runtime flag for quantized backend. (#25680 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25680 Add a runtime flag to choose between FBGEMM and QNNPACK when compiled with both. The flag can be set by using torch.backends.quantized.engine = torch.fbgemm/torch.qnnpack or ctx::setPreferredQuantizedEngine(at::QEngine) ghstack-source-id: 89935643 Test Plan: Verified torch.backends.quantized.engine works Differential Revision: D17198233 fbshipit-source-id: e5449d06f4136385e0e6d18bd4237f8654a61672	2019-09-11 21:37:36 -07:00
jiayisun	b9bf91feb8	Add torch.backends.mkldnn.enabled flag (#25459 ) Summary: This PR is about add torch.backends.mkldnn.enabled flag said in https://github.com/pytorch/pytorch/issues/25186 which can be used disable mkldnn at runtime step as torch.backends.cudnn.enabled. Pull Request resolved: https://github.com/pytorch/pytorch/pull/25459 Differential Revision: D17258926 Pulled By: ezyang fbshipit-source-id: e179ad364cc608fdaa7d0f37e2e762ceb5eda598	2019-09-11 12:09:40 -07:00
Gregory Chanan	716815e3de	Stop initializing THNN backend. (#25352 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25352 It doesn't appear to be necessary anymore; assuming this works I'll kill the codegen in a follow-up PR. Test Plan: Imported from OSS Differential Revision: D17101573 Pulled By: gchanan fbshipit-source-id: bd3d1724ee5c659185a161b1e291e30af52f0a8a	2019-08-30 07:42:17 -07:00
Pritam Damania	7818e7e5d4	Basic framework for Distributed Autograd context. (#24875 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24875 As per https://github.com/pytorch/pytorch/issues/23110, each autograd pass would be assigned a unique autograd_context_id. In this change we introduce a DistAutogradContainer per worker which holds information for each autograd pass currently running. DistAutogradContainer has a map from the autograd_context_id to DistAutogradContext (which holds all the relevant information for the autograd pass). DistAutogradContext currently only stores the autograd_context_id and more information would be added to it later as we build out the rest of the framework. The autograd_context_id is a 64 bit globally unique integer where the first 16 bits are the worker_id and next 48 bits are auto-incrementing for uniqueness. Sample python code on how this would be used for distributed autograd: ``` import torch.distributed.autograd as dist_autograd worker_id = 0 dist_autograd.init(worker_id) with dist_autograd.context() as context_id: # forward pass... # backward pass... # optimizer step... ``` ghstack-source-id: 89119248 Test Plan: unit tests. Differential Revision: D16356694 fbshipit-source-id: d1a8678da0c2af611758dbb5d624d554212330ce	2019-08-28 18:51:56 -07:00
Shen Li	02d3c302d8	Fix build failure on OSX (#23998 ) Summary: https://github.com/pytorch/pytorch/pull/23228 caused build failure on OSX, because rpc.h is included as long as USE_DISTRIBUTED=1, but rpc/init.cpp (and others) is only included when NOT APPLE. So, it cannot find python_functions defined in init.cpp on MacOS. This PR attempt to fix it by wrapping rpc.h with USE_C10D, which is only set when NOT APPLE. I tried this fix locally and it works. Pull Request resolved: https://github.com/pytorch/pytorch/pull/23998 Differential Revision: D16706087 Pulled By: mrshenli fbshipit-source-id: d04fe6717a181a3198289cdef51439708c2e291d	2019-08-07 22:05:41 -07:00

1 2 3 4 5 ...

468 Commits