Adding a non-zero offset to a null pointer is undefined behavior and a bad habit; a minimal sketch of the shared guard pattern follows the list below.
- When `lapackEig` is called to estimate a workspace size, do not add the matrix size to the `W` pointer.
- When `unpack_pivots_cpu_kernel` is called with zero `dim_size`, exit early.
- When `topk_impl_loop` is called with `k` equal to zero, exit right away, as the output tensors are empty anyway.
- Skip adding a non-zero storage offset in `TensorImpl::data_ptr_impl_impl`, which can be the case if a tensor is created as `torch.empty(3)[4:]`.
- In `s_addmm_out_sparse_dense_worker`, do not call `axpy` over an empty vector.
- In `_sparse_binary_op_intersection_kernel_impl`, skip computing `ptr_indices_dim` when `sparse_dim` is empty.
- Exit `grid_sample` forward/backward kernels early if either `input` or `grid` is an empty tensor.
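A minimal sketch of the shared guard pattern behind these fixes (illustrative helper, not the actual PyTorch code):
```
#include <cstddef>

// Bail out before any pointer arithmetic when the buffer may be null or the
// extent is zero: forming `nullptr + k` with k != 0 is undefined behavior.
void scale_buffer(float* data, std::size_t n, float alpha) {
  if (data == nullptr || n == 0) {
    return;  // nothing to touch, and no offsets may be formed
  }
  for (std::size_t i = 0; i < n; ++i) {
    data[i] *= alpha;
  }
}
```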
Found by UBSan in clang-12.
Before the change, the UBSan report looked as follows:
```
ASAN_SYMBOLIZER_PATH=/usr/lib/llvm-12/bin/llvm-symbolizer UBSAN_OPTIONS=print_stacktrace=1 LD_PRELOAD=/usr/lib/llvm-12/lib/clang/12.0.1/lib/linux/libclang_rt.asan-x86_64.so python test_fx_experimental.py -v -k test_normalize_operator_exhaustive_linalg_eig_cpu_float32
Test results will be stored in test-reports/python-unittest/test_fx_experimental
Running tests...
----------------------------------------------------------------------
test_normalize_operator_exhaustive_linalg_eig_cpu_float32 (__main__.TestNormalizeOperatorsCPU) ... /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/overrides.py:111: UserWarning: 'has_cuda' is deprecated, please use 'torch.backends.cuda.is_built()'
torch.has_cuda,
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/overrides.py:112: UserWarning: 'has_cudnn' is deprecated, please use 'torch.backends.cudnn.is_available()'
torch.has_cudnn,
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/overrides.py:118: UserWarning: 'has_mps' is deprecated, please use 'torch.backends.mps.is_built()'
torch.has_mps,
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/overrides.py:119: UserWarning: 'has_mkldnn' is deprecated, please use 'torch.backends.mkldnn.is_available()'
torch.has_mkldnn,
/var/lib/jenkins/workspace/aten/src/ATen/native/BatchLinearAlgebra.cpp:937:17: runtime error: applying non-zero offset 20 to null pointer
#0 0x7f2025794888 in void at::native::lapackEig<float, float>(char, char, int, float*, int, float*, float*, int, float*, int, float*, int, float*, int*) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x9945888)
#1 0x7f20257da256 in void at::native::(anonymous namespace)::apply_linalg_eig<float>(at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, bool) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x998b256)
#2 0x7f20257d902d in at::native::(anonymous namespace)::linalg_eig_kernel(at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor const&, bool) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x998a02d)
#3 0x7f20257b5b3d in at::native::linalg_eig_out_info(at::Tensor const&, at::Tensor&, at::Tensor&, at::Tensor&, bool) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x9966b3d)
#4 0x7f20257b4770 in at::native::linalg_eig_out(at::Tensor const&, at::Tensor&, at::Tensor&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x9965770)
#5 0x7f20280710e6 in c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor&, at::Tensor&> (at::Tensor const&, at::Tensor&, at::Tensor&), &(at::(anonymous namespace)::(anonymous namespace)::wrapper_CPU_out_linalg_eig_out(at::Tensor const&, at::Tensor&, at::Tensor&))>, std::tuple<at::Tensor&, at::Tensor&>, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor&, at::Tensor&> >, std::tuple<at::Tensor&, at::Tensor&> (at::Tensor const&, at::Tensor&, at::Tensor&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor&, at::Tensor&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xc2220e6)
#6 0x7f202727a045 in at::_ops::linalg_eig_out::call(at::Tensor const&, at::Tensor&, at::Tensor&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xb42b045)
#7 0x7f20257b7e29 in at::native::linalg_eig(at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x9968e29)
#8 0x7f2028070bf0 in c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor> (at::Tensor const&), &(at::(anonymous namespace)::(anonymous namespace)::wrapper_CPU__linalg_eig(at::Tensor const&))>, std::tuple<at::Tensor, at::Tensor>, c10::guts::typelist::typelist<at::Tensor const&> >, std::tuple<at::Tensor, at::Tensor> (at::Tensor const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xc221bf0)
#9 0x7f2026b1f787 in std::tuple<at::Tensor, at::Tensor> c10::Dispatcher::redispatch<std::tuple<at::Tensor, at::Tensor>, at::Tensor const&>(c10::TypedOperatorHandle<std::tuple<at::Tensor, at::Tensor> (at::Tensor const&)> const&, c10::DispatchKeySet, at::Tensor const&) const (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xacd0787)
#10 0x7f20273230a7 in at::_ops::linalg_eig::redispatch(c10::DispatchKeySet, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xb4d40a7)
#11 0x7f202c3cc32d in torch::autograd::VariableType::(anonymous namespace)::linalg_eig(c10::DispatchKeySet, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x1057d32d)
#12 0x7f202c3cba96 in c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor> (c10::DispatchKeySet, at::Tensor const&), &(torch::autograd::VariableType::(anonymous namespace)::linalg_eig(c10::DispatchKeySet, at::Tensor const&))>, std::tuple<at::Tensor, at::Tensor>, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&> >, std::tuple<at::Tensor, at::Tensor> (c10::DispatchKeySet, at::Tensor const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x1057ca96)
#13 0x7f20272798e0 in at::_ops::linalg_eig::call(at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xb42a8e0)
#14 0x7f2043d97ae3 in torch::autograd::THPVariable_linalg_eig(_object*, _object*, _object*) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_python.so+0x23feae3)
#15 0x5072d6 in cfunction_call /usr/local/src/conda/python-3.9.17/Objects/methodobject.c:543:19
...
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /var/lib/jenkins/workspace/aten/src/ATen/native/BatchLinearAlgebra.cpp:937:17 in
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106354
Approved by: https://github.com/huydhn, https://github.com/lezcano
Summary:
Fix this warning:
```
caffe2\c10\macros\Macros.h(138): warning C4067: unexpected tokens following preprocessor directive - expected a newline
```
`caffe2/c10/util/variant.h` already has a similar check that defines a stub for `__has_attribute(x)`, so this would not be new to caffe2/pytorch.
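A minimal sketch of such a stub, assuming it sits before the first use of `__has_attribute` (the `ALWAYS_INLINE_ATTR` name below is a hypothetical example, not a macro from Macros.h):
```
// MSVC does not define __has_attribute, so give it a stub that evaluates to 0;
// otherwise the token sequence trips warning C4067 in the preprocessor.
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

// Example use that is now safe on every compiler:
#if __has_attribute(always_inline)
#define ALWAYS_INLINE_ATTR __attribute__((always_inline))
#else
#define ALWAYS_INLINE_ATTR
#endif
```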
Test Plan: CI should complete, still with plenty of caffe2 warnings but this one should be gone from the Windows build log
Differential Revision: D47735319
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105922
Approved by: https://github.com/kit1980
Basically the same as #88644, to fix warnings like `ptxas warning : Value of threads per SM for entry _ZN2at6native13reduce_kernelILi512ELi1ENS0_8ReduceOpIfNS0_10NormTwoffEEjfLi4EEEEEvT1_ is out of range. .minnctapersm will be ignored`
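A minimal sketch of the clamping idea behind such a fix (the macro names and the 2048 threads-per-SM figure are assumptions, not the actual values in `c10/macros/Macros.h`):
```
// Cap the min-blocks-per-SM launch-bounds hint so that
// threads_per_block * blocks_per_sm never exceeds what an SM can host,
// which is what makes ptxas ignore .minnctapersm and warn.
#define ASSUMED_MAX_THREADS_PER_SM 2048
#define CLAMPED_MIN_BLOCKS_PER_SM(threads_per_block, desired_blocks)      \
  (((threads_per_block) * (desired_blocks) <= ASSUMED_MAX_THREADS_PER_SM) \
       ? (desired_blocks)                                                 \
       : (ASSUMED_MAX_THREADS_PER_SM / (threads_per_block)))

// Hypothetical usage in a kernel declaration:
// __global__ void __launch_bounds__(512, CLAMPED_MIN_BLOCKS_PER_SM(512, 8))
// reduce_kernel(...);
```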
CC @ptrblck @ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91972
Approved by: https://github.com/ngimel
### Bug description
When `__SYCL_DEVICE_ONLY__` is defined while building PyTorch, the output of the preprocessing step lacks the closing curly brace of the `extern "C"` block, because the brace has been incorrectly placed. Compilers don't seem to report an error or a warning for a missing closing brace of an `extern "C"` block.
### Impact of the bug
If `c10/macros/Macros.h` is included in a C++ file and, after the preprocessing stage, the preprocessed source has some templated code after `extern "C" {`, then the build might fail with the error `templates must have c++ linkage`, e.g. https://stackoverflow.com/questions/61717819/template-with-c-linkage-error-when-using-template-keyword-in-main-cpp/61717908#61717908 (its answer also has a small snippet of code to reproduce such an issue).
### Solution in this PR
A one-line bug fix that rectifies the placement of the closing curly brace (`}`), so that the `extern "C"` block ends properly when `__SYCL_DEVICE_ONLY__` is defined.
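A simplified sketch of the corrected structure (illustrative declarations, not the actual header contents):
```
// The closing brace must sit outside every preprocessor branch so the
// extern "C" block is closed no matter which branch is taken.
extern "C" {
#ifdef __SYCL_DEVICE_ONLY__
void device_only_decl(int);  // hypothetical device-side declaration
#else
void host_decl(int);  // hypothetical host-side declaration
#endif
}  // closes extern "C" unconditionally
```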
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87853
Approved by: https://github.com/jgong5, https://github.com/kit1980, https://github.com/malfet
# Motivation:
We need a device-side assertion that can be used in SYCL kernels. `SYCL_KERNEL_ASSERT` will be used in kernels launched on the XPU device.
# Solution:
We add a macro `SYCL_KERNEL_ASSERT` via an `__assert_fail` declaration on Linux and a `_wassert` declaration on Windows, even when `NDEBUG` is enabled.
# Additional context:
`__assert_fail` in SYCL kernel
`extern SYCL_EXTERNAL void __assert_fail(const char *expr, const char *file, unsigned int line, const char *func);`
`_wassert` in SYCL kernel
`extern SYCL_EXTERNAL void _wassert(const wchar_t *wexpr, const wchar_t *wfile, unsigned line);`
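A minimal sketch of how the macro could be assembled from the declarations above (illustrative; the actual macro in `c10/macros/Macros.h` may differ in details. `_CRT_WIDE` is the MSVC CRT string-widening macro):
```
#ifdef _WIN32
#define SYCL_KERNEL_ASSERT(cond)                                \
  do {                                                          \
    if (!(cond)) {                                              \
      _wassert(_CRT_WIDE(#cond), _CRT_WIDE(__FILE__),           \
               static_cast<unsigned>(__LINE__));                \
    }                                                           \
  } while (0)
#else
#define SYCL_KERNEL_ASSERT(cond)                                \
  do {                                                          \
    if (!(cond)) {                                              \
      __assert_fail(#cond, __FILE__,                            \
                    static_cast<unsigned>(__LINE__), __func__); \
    }                                                           \
  } while (0)
#endif
```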
No additional unit test is added because this change does not affect PyTorch's functionality; it only affects assertions in kernels on the XPU backend, so it is difficult to write a unit test for it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84106
Approved by: https://github.com/malfet
This PR updates PR [#73040](https://github.com/pytorch/pytorch/pull/73040).
With these changes, PyTorch with ROCm compiles successfully when `NDEBUG` is enabled.
Solution:
For HIP we keep `__device__ __assert_fail()`,
and for host-side compilation we want to use the `__assert_fail()` from the glibc library.
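A minimal sketch of that split, assuming a `__HIP_DEVICE_COMPILE__` guard (illustrative; the actual preprocessor conditions in `c10/macros/Macros.h` are more detailed):
```
#if defined(__HIP_DEVICE_COMPILE__)
// Device pass: the ROCm headers already provide __device__ __assert_fail(),
// so nothing needs to be declared here.
#else
// Host pass: forward-declare glibc's __assert_fail, matching its specifiers.
extern "C" void __assert_fail(
    const char* assertion,
    const char* file,
    unsigned int line,
    const char* function) noexcept __attribute__((__noreturn__));
#endif
```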
Tested the code by compiling with below steps
```
python3 tools/amd_build/build_amd.py
python3 setup.py develop --cmake-only
cmake -DHIP_HIPCC_FLAGS_RELEASE="-DNDEBUG" build
cmake --build build
```
The UT test_fixed_cuda_assert_async is still skipped due to performance overhead.
cc @jithunnair-amd
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81790
Approved by: https://github.com/shintaro-iwasaki, https://github.com/jeffdaily, https://github.com/malfet
I noticed that when `SymInt` was introduced, `jit_type_base.h` was
added as an include to the `Operator.h` template, which is supposed to
be kept extremely clean and only use forward declarations. I also
noticed that forward declarations for `OptionalArrayRef` were missing.
So, I've refactored the forward declarations into
`ATen/core/ATen_fwd.h` and cleaned up some of the `c10`
headers that were masking these missing declarations. I've also
re-generated the pre-compiled header so `SymInt` is included.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76576
Approved by: https://github.com/albanD
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76366
caffe2 is not currently being built for XROS.
Test Plan: CI
Reviewed By: kimishpatel
Differential Revision: D35923922
fbshipit-source-id: 260dacadf0bd5b6bab7833a4ce81e896d280b053
(cherry picked from commit 8370b8dd2519d55a79fa8d45e7951ca8dc0b21a8)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73589
Long-form flags do not work and neither does the `\w` character class.
Test Plan: Imported from OSS
Reviewed By: malfet
Differential Revision: D34558345
Pulled By: dagitses
fbshipit-source-id: 73e1b18bdd55d67fd3936428400c3835684549b0
(cherry picked from commit fc85796fadb8dac0043ba0bf43fd54cf817665b7)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73040
This patch fixes a compilation error in PyTorch with ROCm when `NDEBUG` is passed.
## Problem
Forward declaration of `__host__ __device__ __assert_fail()` is used in `c10/macros/Macros.h` for HIP compilation when `NDEBUG` is set. However, HIP declares `__device__ __assert_fail()` in `hip/amd_detail/amd_device_functions.h`, causing a function type error.
This issue does not appear in ROCm CI tests since it happens only when `NDEBUG` is passed.
## Solution
[EDIT] After the discussion on GitHub, we chose to entirely disable `CUDA_KERNEL_ASSERT()` for ROCm.
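A minimal sketch of what disabling it for ROCm could look like, assuming a `USE_ROCM` guard and a simplified non-ROCm expansion (illustrative only; the actual guards in `c10/macros/Macros.h` differ):
```
#if defined(USE_ROCM)
// ROCm builds: kernel asserts compile to nothing.
#define CUDA_KERNEL_ASSERT(cond)
#else
#define CUDA_KERNEL_ASSERT(cond)                              \
  if (!(cond)) {                                              \
    __assert_fail(#cond, __FILE__,                            \
                  static_cast<unsigned>(__LINE__), __func__); \
  }
#endif
```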
---
To solve this compilation error, this patch disables `CUDA_KERNEL_ASSERT()`, which uses `__assert_fail()` when
1. `c10/macros/Macros.h` is included for `*.hip` (precisely speaking, `__HIP__` or `__HIP_ARCH__` is defined), and
2. `NDEBUG` is passed.
Note that there's no impact on default compilation because, without a special compilation flag, those HIP files are compiled without `NDEBUG`, which is why this issue had not been found.
### Justification
[1] We cannot declare a single host-and-device function in place of two separate host and device functions.
```
__device__ int func() { return 0; }
__host__ int func() { return 0; }
// Compile error (hipcc)
// __device__ __host__ int func();
```
[2] Forward declaration of a correct `__device__`-only `__assert_fail()` for `__HIP__` causes the following error:
```
pytorch/c10/util/TypeCast.h:135:7: error: reference to __device__ function '__assert_fail' in __host__ __device__ function
ERROR_UNSUPPORTED_CAST
^
pytorch/c10/util/TypeCast.h:118:32: note: expanded from macro 'ERROR_UNSUPPORTED_CAST'
#define ERROR_UNSUPPORTED_CAST CUDA_KERNEL_ASSERT(false);
^
pytorch/c10/macros/Macros.h:392:5: note: expanded from macro 'CUDA_KERNEL_ASSERT'
__assert_fail(
```
[3] Maybe there's a way to properly define `__assert_fail()` for HIP + NDEBUG, but this might be too much. Please let me just disable it.
### Technical details
Error
```
pytorch/c10/macros/Macros.h:368:5: error: __host__ __device__ function '__assert_fail' cannot overload __device__ function '__assert_fail'
__assert_fail(
^
/opt/rocm/hip/include/hip/amd_detail/amd_device_functions.h:1173:6: note: previous declaration is here
void __assert_fail(const char *assertion,
```
CUDA definition (9.x) of `__assert_fail()`
```
#elif defined(__GNUC__)
extern __host__ __device__ __cudart_builtin__ void __assert_fail(
const char *, const char *, unsigned int, const char *)
__THROW;
```
ROCm definition (the latest version)
```
// 2b59661f3e/include/hip/amd_detail/amd_device_functions.h (L1172-L1177)
extern "C" __device__ __attribute__((noinline)) __attribute__((weak))
void __assert_fail(const char *assertion,
const char *file,
unsigned int line,
const char *function);
```
Test Plan:
CI + reproducer
```
python3 tools/amd_build/build_amd.py
python3 setup.py develop --cmake-only
cmake -DHIP_HIPCC_FLAGS_RELEASE="-DNDEBUG" build
cmake --build build
```
Reviewed By: xw285cornell
Differential Revision: D34310555
fbshipit-source-id: 7542288912590533ced3f20afd2e704b6551991b
(cherry picked from commit 9e52196e36820abe36bf6427cabc7389d3ea6cb5)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70852
This is the first change that uses a common build file, build.bzl, to
hold most of the build logic.
ghstack-source-id: 147170895
Test Plan: Relying on internal and external CI.
Reviewed By: malfet
Differential Revision: D33299331
fbshipit-source-id: a66afffba6deec76b758dfb39bdf61d747b5bd99
(cherry picked from commit d9163c56f5)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70850
We support both, so we want to ensure both continue to work.
ghstack-source-id: 146960552
Test Plan: Tested manually. A subsequent diff adds this test configuration to CI.
Reviewed By: malfet
Differential Revision: D33297464
fbshipit-source-id: 70e1431d0907d480c576239af93ef57036d5e4d7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71196
`caffe2` headers contain code that can elicit warnings when built with strict compiler flags. Rather than force downstream/consuming code to weaken their compiler flags, suppress those warnings in the header using `#pragma clang diagnostic` suppressions.
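A minimal sketch of the suppression pattern (the warning name is an illustrative example):
```
// Silence the warning only around the offending declarations and restore the
// previous diagnostic state afterwards, so consumers' flags are untouched.
#if defined(__clang__)
#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Wshadow"
#endif

// ... header contents that would otherwise trigger the warning ...

#if defined(__clang__)
#pragma clang diagnostic pop
#endif
```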
Test Plan: CI Pass
Reviewed By: malfet
Differential Revision: D33536233
fbshipit-source-id: 74404e7a5edaf244f79f7a0addd991a84442a31f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69110
I pasted the current LLVM code, reapplied the modifications listed in the code comments, and caught a few more in the diff/build process. The trivially copyable detection is different now; if gcc builds fail, I will try reverting to C10_IS_TRIVIALLY_COPYABLE or copying what LLVM is doing.
The motivation for this change is that, as noted in an existing comment, C10_IS_TRIVIALLY_COPYABLE did the wrong thing for std::unique_ptr, which caused problems with D32454856 / #68412.
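A small illustration of why the detection matters (a standalone check, not PyTorch code):
```
#include <memory>
#include <type_traits>

// std::unique_ptr must never be classified as trivially copyable; a
// SmallVector-style container that memcpy'd it on growth would double-free.
static_assert(!std::is_trivially_copyable<std::unique_ptr<int>>::value,
              "unique_ptr owns its pointee and must not be byte-copied");
```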
ghstack-source-id: 145327773
Test Plan: CI
Reviewed By: bhosmer, mruberry
Differential Revision: D32733017
fbshipit-source-id: 9452ab90328e3fdf457aad23a26f2f6835b0bd3d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66540
Currently the macro `HAS_DEMANGLE` is determined by compiler predefined macros. Here I'm adding an option to allow `HAS_DEMANGLE` to be defined in build files.
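A minimal sketch of the override pattern (the compiler-macro condition shown is an illustrative assumption, not the real one):
```
// Only derive HAS_DEMANGLE from compiler predefines when the build files
// have not already defined it, e.g. via -DHAS_DEMANGLE=0.
#ifndef HAS_DEMANGLE
#if defined(__GNUG__) && !defined(__ANDROID__)
#define HAS_DEMANGLE 1
#else
#define HAS_DEMANGLE 0
#endif
#endif
```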
Test Plan: Rely on CI
Reviewed By: poweic
Differential Revision: D31600007
fbshipit-source-id: 76cf088b0f5ee940e977d3b213f1446ea64be036
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66445
`Type.cpp` implements the `demangle()` function based on the macro `HAS_DEMANGLE`. This diff splits it into two `.cpp` files so that we can add either one to the build target. This change follows the pattern of `flags_use_no_gflags.cpp` and `flags_use_gflags.cpp`.
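A minimal sketch of the demangling half of that split (illustrative file contents; the fallback `.cpp` would simply return the mangled name unchanged):
```
#include <cxxabi.h>
#include <cstdlib>
#include <string>

std::string demangle(const char* name) {
  int status = 0;
  char* demangled = abi::__cxa_demangle(
      name, /*output_buffer=*/nullptr, /*length=*/nullptr, &status);
  if (status != 0 || demangled == nullptr) {
    return name;  // fall back to the mangled name on failure
  }
  std::string result(demangled);
  std::free(demangled);  // __cxa_demangle allocates with malloc
  return result;
}
```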
Test Plan: Rely on CI
Reviewed By: iseeyuan
Differential Revision: D31551432
fbshipit-source-id: f8b11783e513fa812228ec873459ad3043ff9147
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65245
Building and running c10 and qnnpack tests on XROS.
Notable changes:
- Adding `#if defined(_XROS_)` in a few places not supported by XROS
- Changing Threadpool to abstract class
ghstack-source-id: 139513579
Test Plan: Run c10 and qnnpack tests on XROS.
Reviewed By: veselinp, iseeyuan
Differential Revision: D30137333
fbshipit-source-id: bb6239b935187fac712834341fe5a8d3377762b1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65610
- Replace HIP_PLATFORM_HCC with USE_ROCM
- Don't rely on CUDA_VERSION or HIP_VERSION; use USE_ROCM and ROCM_VERSION.
- In the next PR:
  - Will remove the mapping from CUDA_VERSION to HIP_VERSION and CUDA to HIP in hipify.
  - HIP_PLATFORM_HCC is deprecated, so will add HIP_PLATFORM_AMD to support HIP host code compilation on gcc.
cc jeffdaily sunway513 jithunnair-amd ROCmSupport amathews-amd
Reviewed By: jbschlosser
Differential Revision: D30909053
Pulled By: ezyang
fbshipit-source-id: 224a966ebf1aaec79beccbbd686fdf3d49267e06
Summary:
warpSize is defined as a constexpr in HIP headers. It is incorrect to assume a warpSize of 64. This change fixes the `C10_WARP_SIZE` definition in torch sources, similar to [how it was done in caffe2](https://github.com/pytorch/pytorch/blob/master/caffe2/utils/GpuDefs.cuh#L10-L14).
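A minimal sketch of the idea, mirroring the linked caffe2 code (illustrative; the actual `C10_WARP_SIZE` definition lives in `c10/macros/Macros.h`):
```
#ifdef __HIP_PLATFORM_HCC__
constexpr int kWarpSizeSketch = warpSize;  // taken from the HIP headers
#else
constexpr int kWarpSizeSketch = 32;        // CUDA warps are always 32 wide
#endif
```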
cc jeffdaily sunway513 jithunnair-amd ROCmSupport
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64302
Reviewed By: mrshenli
Differential Revision: D30785975
Pulled By: malfet
fbshipit-source-id: 68f8333182ad4d02bd0c8d02f1751a50bc5bafa7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62527
If `NDEBUG` is applied inconsistently during compilation, we might get an 'ambiguous declaration' error. Let's make sure that the forward declaration matches glibc's, including all specifiers.
Test Plan: sandcastle
Reviewed By: mdschatz
Differential Revision: D30030051
fbshipit-source-id: 9f4d5f1d4e74f0a4eaeeaaaad76b93ee485d8bcd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60214
Relanding this PR, but with a fix for Windows CUDA builds (example failure in master here: https://github.com/pytorch/pytorch/runs/2852662871)
This is identical to the original PR except for one change in `tools/codegen/gen.py`: `static constexpr` -> `static CONSTEXPR_EXCEPT_WIN_CUDA`
This actually took a while to figure out, until I tracked down a previous pytorch PR that encountered a similar issue: https://github.com/pytorch/pytorch/pull/40675
This reverts commit 6d0fb85a62.
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D29213932
Pulled By: bdhirsh
fbshipit-source-id: b90c7c10e5a51f8d6173ddca673b418e5774c248
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60016
For CUDA 9.2:
- `OptionalBase` did not check `is_arrayref`
- a `constexpr` function is seemingly not expected to raise an exception on CUDA 9.2
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60017
Reviewed By: malfet
Differential Revision: D29139515
Pulled By: ejguan
fbshipit-source-id: 4f4f6d9fe6a5f2eadf913de0a9781cc9f2e6ac6f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56017
Fixes #55686
This patch is seemingly straightforward but some of the changes are very
subtle. For the general algorithmic approach, please first read the
quoted issue. Based on the algorithm, there are some fairly
straightforward changes:
- New boolean on TensorImpl tracking if we own the pyobj or not
- PythonHooks virtual interface for requesting deallocation of pyobj
when TensorImpl is being released and we own its pyobj, and
implementation of the hooks in python_tensor.cpp
- Modification of THPVariable to MaybeOwned its C++ tensor, directly
using swolchok's nice new class
And then, there is python_variable.cpp. Some of the changes follow the
general algorithmic approach:
- THPVariable_NewWithVar is simply adjusted to handle MaybeOwned and
initializes as owned (like before)
- THPVariable_Wrap adds the logic for reverting ownership back to
PyObject when we take out an owning reference to the Python object
- THPVariable_dealloc attempts to resurrect the Python object if
the C++ tensor is live, and otherwise does the same old implementation
as before
- THPVariable_tryResurrect implements the resurrection logic. It is
modeled after CPython code so read the cited logic and see if
it is faithfully replicated (a minimal sketch of the idea follows this list)
- THPVariable_clear is slightly updated for MaybeOwned and also to
preserve the invariant that if owns_pyobj, then pyobj_ is not null.
This change is slightly dodgy: the previous implementation has a
comment mentioning that the pyobj nulling is required to ensure we
don't try to reuse the dead pyobj. I don't think, in this new world,
this is possible, because the invariant says that the pyobj only
dies if the C++ object is dead too. But I still unset the field
for safety.
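A minimal sketch of the resurrection idea from the list above (illustrative; the real THPVariable_tryResurrect does considerably more bookkeeping, and the two helper functions are hypothetical stand-ins for the TensorImpl/owns_pyobj machinery):
```
#include <Python.h>

// Hypothetical helpers, assumed for this sketch only.
bool cpp_tensor_still_alive(PyObject* self);
void hand_pyobj_ownership_to_cpp(PyObject* self);

static bool try_resurrect(PyObject* self) {
  if (!cpp_tensor_still_alive(self)) {
    return false;  // nothing else needs the tensor; let tp_dealloc proceed
  }
  // The C++ side still needs the tensor: keep the PyObject alive by taking a
  // new reference and flip ownership so the TensorImpl now owns its pyobj.
  Py_INCREF(self);
  hand_pyobj_ownership_to_cpp(self);
  return true;  // caller aborts deallocation
}
```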
And then... there is THPVariableMetaType. colesbury explained in the
issue why this is necessary: when destructing an object in Python, you
start off by running the tp_dealloc of the subclass before moving up
to the parent class (much in the same way C++ destructors work). The
deallocation process for a vanilla Python-defined class does irreparable
harm to the PyObject instance (e.g., the finalizers get run) making it
no longer valid attempt to resurrect later in the tp_dealloc chain.
(BTW, the fact that objects can resurrect but in an invalid state is
one of the reasons why it's so frickin' hard to write correct __del__
implementations). So we need to make sure that we actually override
the tp_dealloc of the bottom most *subclass* of Tensor to make sure
we attempt a resurrection before we start finalizing. To do this,
we need to define a metaclass for Tensor that can override tp_dealloc
whenever we create a new subclass of Tensor. By the way, it was totally
not documented how to create metaclasses in the C++ API, and it took
a good bit of trial error to figure it out (and the answer is now
immortalized in https://stackoverflow.com/q/67077317/23845 -- the things
that I got wrong in earlier versions of the PR included setting
tp_basicsize incorrectly, incorrectly setting Py_TPFLAGS_HAVE_GC on
the metaclass--you want to leave it unset so that it inherits, and
determining that tp_init is what actually gets called when you construct
a class, not tp_call as another not-to-be-named StackOverflow question
suggests).
Aside: Ordinarily, adding a metaclass to a class is a user visible
change, as it means that it is no longer valid to mixin another class
with a different metaclass. However, because _C._TensorBase is a C
extension object, it will typically conflict with most other
metaclasses, so this is not BC breaking.
The desired new behavior of a subclass tp_dealloc is to first test if
we should resurrect, and otherwise do the same old behavior. In an
initial implementation of this patch, I implemented this by saving the
original tp_dealloc (which references subtype_dealloc, the "standard"
dealloc for all Python defined classes) and invoking it. However, this
results in an infinite loop, as it attempts to call the dealloc function
of the base type, but incorrectly chooses subclass type (because it is
not a subtype_dealloc, as we have overridden it; see
b38601d496/Objects/typeobject.c (L1261) )
So, with great reluctance, I must duplicate the behavior of
subtype_dealloc in our implementation. Note that this is not entirely
unheard of in Python binding code; for example, Cython
c25c3ccc4b/Cython/Compiler/ModuleNode.py (L1560)
also does similar things. This logic makes up the bulk of
THPVariable_subclass_dealloc
To review this, you should pull up the CPython copy of subtype_dealloc
b38601d496/Objects/typeobject.c (L1230)
and verify that I have specialized the implementation for our case
appropriately. Among the simplifications I made:
- I assume PyType_IS_GC, because I assume that Tensor subclasses are
only ever done in Python and those classes are always subject to GC.
(BTW, yes! This means I have broken anyone who has extend PyTorch
tensor from C API directly. I'm going to guess no one has actually
done this.)
- I don't bother walking up the type bases to find the parent dealloc;
I know it is always THPVariable_dealloc. Similarly, I can get rid
of some parent type tests based on knowledge of how
THPVariable_dealloc is defined
- The CPython version calls some private APIs which I can't call, so
I use the public PyObject_GC_UnTrack APIs.
- I don't allow the finalizer of a Tensor to change its type (but
more on this shortly)
One alternative I discussed with colesbury was instead of copy pasting
the subtype_dealloc, we could transmute the type of the object that was
dying to turn it into a different object whose tp_dealloc is
subtype_dealloc, so the stock subtype_dealloc would then be applicable.
We decided this would be kind of weird and didn't do it that way.
TODO:
- More code comments
- Figure out how not to increase the size of TensorImpl with the new
bool field
- Add some torture tests for the THPVariable_subclass_dealloc, e.g.,
involving subclasses of Tensors that do strange things with finalizers
- Benchmark the impact of taking the GIL to release C++ side tensors
(e.g., from autograd)
- Benchmark the impact of adding a new metaclass to Tensor (probably
will be done by separating out the metaclass change into its own
change)
- Benchmark the impact of changing THPVariable to conditionally own
Tensor (as opposed to unconditionally owning it, as before)
- Add tests that this actually indeed preserves the Python object
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D27765125
Pulled By: ezyang
fbshipit-source-id: 857f14bdcca2900727412aff4c2e2d7f0af1415a
Summary:
## Motivation
The utils in namespace `c10` require `__assert_fail` when `NDEBUG` is defined in kernel code.
The `__assert_fail` declaration in PyTorch is not compatible with the SYCL specification.
This causes a compile error when using these utils in SYCL kernels.
## Solution
Add the `__assert_fail` declaration for SYCL kernels to pytorch when compiling the SYCL kernels with `c10` utils.
## Additional context
`__assert_fail` in SYCL kernel
`extern SYCL_EXTERNAL void __assert_fail(const char *expr, const char *file, unsigned int line, const char *func);`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58906
Reviewed By: anjali411
Differential Revision: D28700863
Pulled By: ezyang
fbshipit-source-id: 81896d022b35ace8cd16474128649eabedfaf138