Commit Graph

142 Commits

Author SHA1 Message Date
Nikita Shulga
97396cdbb2 Fix undefined behavior detected by clang-12 (#106354)
Adding a non-zero offset to a null pointer is undefined behavior and a bad habit; the fixes below all follow the guard pattern sketched after the list.

- When `lapackEig` is called to estimate a workspace size, do not add the matrix size to the `W` pointer.
- When `unpack_pivots_cpu_kernel` is called with zero `dim_size`, exit early.
- When `topk_impl_loop` is called with `k` equal to zero, exit right away, as the output tensors are empty anyway.
- Do not add a non-zero storage offset in `TensorImpl::data_ptr_impl_impl`, which can be the case if a tensor is created as `torch.empty(3)[4:]`.
- In `s_addmm_out_sparse_dense_worker`, do not call `axpy` over an empty vector.
- In `_sparse_binary_op_intersection_kernel_impl`, skip computing `ptr_indices_dim` when `sparse_dim` is empty.
- Exit the `grid_sample` forward/backward kernels early if either `input` or `grid` is an empty tensor.
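
A minimal sketch of the shared guard pattern (the function and names are illustrative, not the actual PyTorch code):
```cpp
#include <cstddef>

// Never apply a non-zero offset to a null pointer: that is UB even if the
// resulting pointer is never dereferenced.
float* advance(float* base, std::ptrdiff_t offset) {
  if (base == nullptr || offset == 0) {
    return base;
  }
  return base + offset;
}
```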

Found by UBSan in clang-12.

Before the change, the UBSan report looked as follows:
```
 ASAN_SYMBOLIZER_PATH=/usr/lib/llvm-12/bin/llvm-symbolizer UBSAN_OPTIONS=print_stacktrace=1 LD_PRELOAD=/usr/lib/llvm-12/lib/clang/12.0.1/lib/linux/libclang_rt.asan-x86_64.so python test_fx_experimental.py -v -k test_normalize_operator_exhaustive_linalg_eig_cpu_float32
Test results will be stored in test-reports/python-unittest/test_fx_experimental

Running tests...
----------------------------------------------------------------------
  test_normalize_operator_exhaustive_linalg_eig_cpu_float32 (__main__.TestNormalizeOperatorsCPU) ... /opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/overrides.py:111: UserWarning: 'has_cuda' is deprecated, please use 'torch.backends.cuda.is_built()'
  torch.has_cuda,
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/overrides.py:112: UserWarning: 'has_cudnn' is deprecated, please use 'torch.backends.cudnn.is_available()'
  torch.has_cudnn,
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/overrides.py:118: UserWarning: 'has_mps' is deprecated, please use 'torch.backends.mps.is_built()'
  torch.has_mps,
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/overrides.py:119: UserWarning: 'has_mkldnn' is deprecated, please use 'torch.backends.mkldnn.is_available()'
  torch.has_mkldnn,
/var/lib/jenkins/workspace/aten/src/ATen/native/BatchLinearAlgebra.cpp:937:17: runtime error: applying non-zero offset 20 to null pointer
    #0 0x7f2025794888 in void at::native::lapackEig<float, float>(char, char, int, float*, int, float*, float*, int, float*, int, float*, int, float*, int*) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x9945888)
    #1 0x7f20257da256 in void at::native::(anonymous namespace)::apply_linalg_eig<float>(at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, bool) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x998b256)
    #2 0x7f20257d902d in at::native::(anonymous namespace)::linalg_eig_kernel(at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor const&, bool) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x998a02d)
    #3 0x7f20257b5b3d in at::native::linalg_eig_out_info(at::Tensor const&, at::Tensor&, at::Tensor&, at::Tensor&, bool) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x9966b3d)
    #4 0x7f20257b4770 in at::native::linalg_eig_out(at::Tensor const&, at::Tensor&, at::Tensor&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x9965770)
    #5 0x7f20280710e6 in c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor&, at::Tensor&> (at::Tensor const&, at::Tensor&, at::Tensor&), &(at::(anonymous namespace)::(anonymous namespace)::wrapper_CPU_out_linalg_eig_out(at::Tensor const&, at::Tensor&, at::Tensor&))>, std::tuple<at::Tensor&, at::Tensor&>, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor&, at::Tensor&> >, std::tuple<at::Tensor&, at::Tensor&> (at::Tensor const&, at::Tensor&, at::Tensor&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor&, at::Tensor&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xc2220e6)
    #6 0x7f202727a045 in at::_ops::linalg_eig_out::call(at::Tensor const&, at::Tensor&, at::Tensor&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xb42b045)
    #7 0x7f20257b7e29 in at::native::linalg_eig(at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x9968e29)
    #8 0x7f2028070bf0 in c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor> (at::Tensor const&), &(at::(anonymous namespace)::(anonymous namespace)::wrapper_CPU__linalg_eig(at::Tensor const&))>, std::tuple<at::Tensor, at::Tensor>, c10::guts::typelist::typelist<at::Tensor const&> >, std::tuple<at::Tensor, at::Tensor> (at::Tensor const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xc221bf0)
    #9 0x7f2026b1f787 in std::tuple<at::Tensor, at::Tensor> c10::Dispatcher::redispatch<std::tuple<at::Tensor, at::Tensor>, at::Tensor const&>(c10::TypedOperatorHandle<std::tuple<at::Tensor, at::Tensor> (at::Tensor const&)> const&, c10::DispatchKeySet, at::Tensor const&) const (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xacd0787)
    #10 0x7f20273230a7 in at::_ops::linalg_eig::redispatch(c10::DispatchKeySet, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xb4d40a7)
    #11 0x7f202c3cc32d in torch::autograd::VariableType::(anonymous namespace)::linalg_eig(c10::DispatchKeySet, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x1057d32d)
    #12 0x7f202c3cba96 in c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor> (c10::DispatchKeySet, at::Tensor const&), &(torch::autograd::VariableType::(anonymous namespace)::linalg_eig(c10::DispatchKeySet, at::Tensor const&))>, std::tuple<at::Tensor, at::Tensor>, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&> >, std::tuple<at::Tensor, at::Tensor> (c10::DispatchKeySet, at::Tensor const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0x1057ca96)
    #13 0x7f20272798e0 in at::_ops::linalg_eig::call(at::Tensor const&) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so+0xb42a8e0)
    #14 0x7f2043d97ae3 in torch::autograd::THPVariable_linalg_eig(_object*, _object*, _object*) (/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/lib/libtorch_python.so+0x23feae3)
    #15 0x5072d6 in cfunction_call /usr/local/src/conda/python-3.9.17/Objects/methodobject.c:543:19
    ...

SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /var/lib/jenkins/workspace/aten/src/ATen/native/BatchLinearAlgebra.cpp:937:17 in
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106354
Approved by: https://github.com/huydhn, https://github.com/lezcano
2023-08-03 05:36:03 +00:00
James Donald
a1d0db1c60 [pytorch] Fix MSVC unexpected tokens following preprocessor directive (#105922)
Summary:
Fix this warning:
```
caffe2\c10\macros\Macros.h(138): warning C4067: unexpected tokens following preprocessor directive - expected a newline
```
`caffe2/c10/util/variant.h` already has a similar check to define a stub for `__has_attribute(x)`, so this would not be new to caffe2/pytorch.
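
The usual shape of such a stub (a sketch of the pattern, not the exact diff):
```cpp
// Define __has_attribute(x) as 0 for compilers (e.g. MSVC) that lack it,
// so later `#if __has_attribute(...)` checks preprocess cleanly.
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif
```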

Test Plan: CI should complete, still with plenty of caffe2 warnings but this one should be gone from the Windows build log

Differential Revision: D47735319

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105922
Approved by: https://github.com/kit1980
2023-07-27 06:03:31 +00:00
PyTorch MergeBot
076781ba9b Revert "fix building errors on FreeBSD (#105897)"
This reverts commit 5c5eece6d8.

Reverted https://github.com/pytorch/pytorch/pull/105897 on behalf of https://github.com/PaliC due to causing regressions on internal models ([comment](https://github.com/pytorch/pytorch/pull/105897#issuecomment-1652840218))
2023-07-27 03:01:44 +00:00
cyy
5c5eece6d8 fix building errors on FreeBSD (#105897)
Although FreeBSD is not officially supported, this PR fixes some build errors on FreeBSD.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105897
Approved by: https://github.com/kit1980
2023-07-26 08:11:42 +00:00
mikey dagitses
2ac9086987 run buildifier on unified build files (#98141)
This is pretty tricky. buildifier by default doesn't do much to these
files. It does a little more if you tell it that they are
`BUILD.bazel` files with -type=build. But it can do even more if you
remove the target definitions from the `def define_rules()` wrapper
and dedent them.

I wrote a little wrapper that does that. I'll submit it at a later
date.

Differential Revision: [D44606558](https://our.internmc.facebook.com/intern/diff/D44606558/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D44606558/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98141
Approved by: https://github.com/ezyang, https://github.com/PaliC
2023-04-04 00:37:19 +00:00
cyy
37f7c00a8a More fixes and improved clang-tidy checkers (#93213)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93213
Approved by: https://github.com/Skylion007
2023-02-01 14:44:17 +00:00
cyy
f172feae0d More tidy fixes (#93069)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93069
Approved by: https://github.com/Skylion007
2023-01-27 06:40:50 +00:00
eqy
f8026413f5 Fix CUDA_MAX_THREADS_PER_SM for sm_89 (#91972)
Basically the same as #88644, to fix warnings like `ptxas warning : Value of threads per SM for entry _ZN2at6native13reduce_kernelILi512ELi1ENS0_8ReduceOpIfNS0_10NormTwoffEEjfLi4EEEEEvT1_ is out of range. .minnctapersm will be ignored`
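
The fix amounts to adding an sm_89 entry to the per-architecture limit; a sketch of the pattern (the values are assumptions for illustration; check NVIDIA's occupancy tables):
```cpp
// Per-arch thread limit used to keep __launch_bounds__ hints in range.
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ == 860 || __CUDA_ARCH__ == 890)
#define CUDA_MAX_THREADS_PER_SM 1536  // sm_86 / sm_89
#else
#define CUDA_MAX_THREADS_PER_SM 2048  // e.g. sm_80
#endif
```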

CC @ptrblck @ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91972
Approved by: https://github.com/ngimel
2023-01-12 00:30:27 +00:00
Kazuki Sakamoto
bfdc0358dc Compile fix for Clang + libc++ (#91212)
Summary:
LLVM 15 has a compile issue with the deprecated __has_trivial_copy. Update the GCC ifdef logic to exclude Clang + libc++.

```
caffe2/c10/util/Optional.h:536:13: error: builtin __has_trivial_copy is deprecated; use __is_trivially_copyable instead [-Werror,-Wdeprecated-builtins]
            C10_IS_TRIVIALLY_COPYABLE(T) &&
            ^
caffe2/c10/macros/Macros.h:438:38: note: expanded from macro 'C10_IS_TRIVIALLY_COPYABLE'
#define C10_IS_TRIVIALLY_COPYABLE(T) __has_trivial_copy(T)
```
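
A sketch of the guard described (the exact conditions in `Macros.h` may differ):
```cpp
#include <type_traits>

// Keep the legacy builtin on GCC proper; Clang + libc++ deprecates
// __has_trivial_copy, so use the standard trait there instead.
#if defined(__GNUC__) && !defined(__clang__)
#define C10_IS_TRIVIALLY_COPYABLE(T) __has_trivial_copy(T)
#else
#define C10_IS_TRIVIALLY_COPYABLE(T) std::is_trivially_copyable<T>::value
#endif
```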

Test Plan: CI

Reviewed By: kit1980

Differential Revision: D42180203

Pull Request resolved: https://github.com/pytorch/pytorch/pull/91212
Approved by: https://github.com/kit1980, https://github.com/soumith
2022-12-21 11:19:58 +00:00
mikey dagitses
322e4b4c8a set -Wsuggest-override for builds (#89852)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/pytorch/pytorch/pull/89852).
* __->__ #89852
* #89851

set -Wsuggest-override for builds

Summary: This was flagged by a Meta internal build.

Test Plan: Rely on CI.
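
For context, the warning fires on virtual overrides that omit the keyword; a minimal example (illustrative only):
```cpp
struct Base {
  virtual void run();
  virtual ~Base() = default;
};

struct Derived : Base {
  // Without `override`, -Wsuggest-override warns here; adding it also turns
  // silent signature mismatches into hard errors.
  void run() override;
};
```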

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89852
Approved by: https://github.com/malfet
2022-12-19 22:08:47 +00:00
Nikita Shulga
5654fed23e Export c10/[macros|util] headers to be used by internal inductor builds (#89249)
Summary: Fixes a package boundary violation that existed in the previous implementation

Test Plan: CI

Differential Revision: D41391862

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89249
Approved by: https://github.com/izaitsevfb
2022-11-18 10:51:07 +00:00
Eddie Yan
3e30a9ea1c Fix CUDA_MAX_THREADS_PER_SM for sm_87 (#88644)
#88326
CC @ngimel @ptrblck

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88644
Approved by: https://github.com/ngimel
2022-11-08 19:44:23 +00:00
Pruthvi Madugundu
fbd08fb358 Introduce TORCH_DISABLE_GPU_ASSERTS (#84190)
- Asserts for CUDA are enabled by default
- Disabled for ROCm by default by setting `TORCH_DISABLE_GPU_ASSERTS` to `ON`
- Can be enabled for ROCm by setting the above variable to `OFF` during the build, or forcefully enabled by setting `ROCM_FORCE_ENABLE_GPU_ASSERTS:BOOL=ON`

This is a follow-up change per the comment in PR #81790; see this [link](https://github.com/pytorch/pytorch/pull/81790#issuecomment-1215929021).
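
A sketch of how such a build switch typically gates the device assert macro (the macro name and shape here are assumptions, not the exact PyTorch definition):
```cpp
#include <cassert>

// When the build defines TORCH_DISABLE_GPU_ASSERTS, device asserts compile away.
#ifdef TORCH_DISABLE_GPU_ASSERTS
#define GPU_KERNEL_ASSERT(cond) ((void)0)
#else
#define GPU_KERNEL_ASSERT(cond) assert(cond)
#endif
```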

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84190
Approved by: https://github.com/jeffdaily, https://github.com/malfet
2022-11-04 04:43:05 +00:00
PyTorch MergeBot
0fa23663cc Revert "Introduce TORCH_DISABLE_GPU_ASSERTS (#84190)"
This reverts commit 1e2c4a6e0e.

Reverted https://github.com/pytorch/pytorch/pull/84190 on behalf of https://github.com/malfet due to Needs internal changes, has to be landed via co-dev
2022-11-02 18:13:37 +00:00
Pruthvi Madugundu
1e2c4a6e0e Introduce TORCH_DISABLE_GPU_ASSERTS (#84190)
- Asserts for CUDA are enabled by default
- Disabled for ROCm by default by setting `TORCH_DISABLE_GPU_ASSERTS` to `ON`
- Can be enabled for ROCm by setting the above variable to `OFF` during the build, or forcefully enabled by setting `ROCM_FORCE_ENABLE_GPU_ASSERTS:BOOL=ON`

This is a follow-up change per the comment in PR #81790; see this [link](https://github.com/pytorch/pytorch/pull/81790#issuecomment-1215929021).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84190
Approved by: https://github.com/jeffdaily, https://github.com/malfet
2022-11-02 17:41:57 +00:00
sanchitintel
9c793b366f Move incorrectly placed closing curly brace of extern "C" block (#87853)
### Bug description
When `__SYCL_DEVICE_ONLY__` is defined while building PyTorch, the output of the preprocessing step lacks the closing curly brace of the `extern "C"` block, because it was incorrectly placed. Compilers don't seem to report an error or a warning for a missing closing brace of an `extern "C"` block.

### Impact of the bug
If `c10/macros/Macros.h` is included in a C++ file, and after the preprocessing stage the preprocessed source file has some templated code after `extern "C" {`, then the build might fail with the error `templates must have c++ linkage`. E.g., https://stackoverflow.com/questions/61717819/template-with-c-linkage-error-when-using-template-keyword-in-main-cpp/61717908#61717908 (its answer also has a small snippet of code to reproduce such an issue).

### Solution in this PR
One-liner bug fix that rectifies the placement of the closing curly brace (`}`) so that the `extern "C"` block ends properly when `__SYCL_DEVICE_ONLY__` is defined.
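
The shape of the fix, sketched (not the actual header):
```cpp
extern "C" {
void api_fn();

#ifdef __SYCL_DEVICE_ONLY__
// device-only declarations ...
#endif

}  // extern "C" -- the closing brace must sit outside the #ifdef, otherwise
   // the block is left unterminated whenever the macro condition removes it.
```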

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87853
Approved by: https://github.com/jgong5, https://github.com/kit1980, https://github.com/malfet
2022-10-28 03:42:20 +00:00
Richard Barnes
85ffbedfb2 Strip GCC5 stuff from PyTorch (#85914)
[This file](https://github.com/pytorch/pytorch/pull/63208/files) indicates that we don't support anything less than GCC 7.5. Given that, let's remove this GCC 5 stuff.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85914
Approved by: https://github.com/ezyang
2022-10-26 00:07:44 +00:00
Yu, Guangye
ff56f1c30d Define the SYCL device version assertion used in other backends, like XPU (#84106)
# Motivation:
We need a device-side assertion that can be used in SYCL kernel code. `SYCL_KERNEL_ASSERT` will be used in kernels launched on the XPU device.

# Solution:
We add a macro `SYCL_KERNEL_ASSERT` via a `__assert_fail` declaration for Linux and a `_wassert` declaration for Windows, even when `NDEBUG` is enabled.
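
A sketch of what the Linux path can look like (the real definition in `Macros.h` may differ; `SYCL_EXTERNAL` comes from the SYCL headers, so this only compiles in a SYCL device build):
```cpp
#ifdef __SYCL_DEVICE_ONLY__
extern SYCL_EXTERNAL void __assert_fail(
    const char* expr, const char* file, unsigned int line, const char* func);

#define SYCL_KERNEL_ASSERT(cond)                          \
  do {                                                    \
    if (!(cond)) {                                        \
      __assert_fail(#cond, __FILE__, __LINE__, __func__); \
    }                                                     \
  } while (0)
#endif
```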

# Additional context:
`__assert_fail` in SYCL kernel
`extern SYCL_EXTERNAL void __assert_fail(const char *expr, const char *file, unsigned int line, const char *func);`
`_wassert` in SYCL kernel
`extern SYCL_EXTERNAL void _wassert(const wchar_t *wexpr, const wchar_t *wfile, unsigned line);`
No additional unit test, because this change does not affect PyTorch's functionality; it only affects assertions in kernels on the XPU backend, so it is difficult to add a unit test for it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84106
Approved by: https://github.com/malfet
2022-09-01 22:22:28 +00:00
Peter Bell
a08911400e Use C10_HAS_CPP_ATTRIBUTE to simplify nodiscard definition (#83976)
`C10_HAS_CPP_ATTRIBUTE` only expands to `__has_cpp_attribute` when it
is defined, so we avoid the extra `#if defined(__has_cpp_attribute)`
checks and double-nested `#if`s.
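
The pattern, sketched (close to what the commit describes; exact macro names may differ):
```cpp
// Expand to 0 when the compiler lacks __has_cpp_attribute, so call sites
// need only a single #if.
#ifdef __has_cpp_attribute
#define C10_HAS_CPP_ATTRIBUTE(x) __has_cpp_attribute(x)
#else
#define C10_HAS_CPP_ATTRIBUTE(x) 0
#endif

#if C10_HAS_CPP_ATTRIBUTE(nodiscard)
#define C10_NODISCARD [[nodiscard]]
#else
#define C10_NODISCARD
#endif
```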
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83976
Approved by: https://github.com/albanD
2022-08-26 15:45:47 +00:00
Scott Wolchok
e4af53c1a1 [PyTorch] Remove unused sstream/string includes from c10/macros/Macros.h (#83353)
Nothing in the rest of the header seems to use these.

Differential Revision: [D38672680](https://our.internmc.facebook.com/intern/diff/D38672680/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83353
Approved by: https://github.com/malfet
2022-08-23 16:56:00 +00:00
Pruthvi Madugundu
8473e69684 [ROCm] Fixes the kernel asserts API declaration mismatch error (#81790)
This PR updates PR [#73040](https://github.com/pytorch/pytorch/pull/73040).

With these changes, PyTorch compiles successfully with ROCm when `NDEBUG` is enabled.

Solution:
For HIP we keep `__device__ __assert_fail()`, and for host-side compilation we use the `__assert_fail()` from glibc.

Tested the code by compiling with below steps
```
python3 tools/amd_build/build_amd.py
python3 setup.py develop --cmake-only
cmake -DHIP_HIPCC_FLAGS_RELEASE="-DNDEBUG" build
cmake --build build
```

The UT `test_fixed_cuda_assert_async` is still skipped due to performance overhead.

cc @jithunnair-amd

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81790
Approved by: https://github.com/shintaro-iwasaki, https://github.com/jeffdaily, https://github.com/malfet
2022-08-16 19:22:31 +00:00
Scott Wolchok
80c4919bec [PyTorch] Stack-allocate boxed args for RecordFunction
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76266

Saving a heap allocation in this path improves performance.

Differential Revision: [D34090699](https://our.internmc.facebook.com/intern/diff/D34090699/)

Approved by: https://github.com/ezyang
2022-05-24 17:06:40 +00:00
Peter Bell
2e480fc2db Cleanup ATen-core forward declarations
I noticed that when `SymInt` was introduced, `jit_type_base.h` was
added as an include to the `Operator.h` template which is supposed to
be kept extremely clean and only use forward declarations. I also
noticed that forward declarations for `OptionalArrayRef` were missing.

So, I've refactored the forward declarations into
`ATen/core/ATen_fwd.h` and cleaned up some of the `c10`
headers that were masking these missing declarations. I've also
re-generated the pre-compiled header so `SymInt` is included.
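
The idea of a forward-declaration header, in miniature (names follow the commit; the contents are illustrative):
```cpp
// ATen_fwd.h-style header: declare names without pulling in heavy includes,
// so templates such as Operator.h can stay include-light.
namespace c10 {

class SymInt;

template <typename T>
class OptionalArrayRef;

} // namespace c10
```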

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76576
Approved by: https://github.com/albanD
2022-05-02 14:50:48 +00:00
Hannes Friederich
5932c37198 [caffe2] drop XROS ports (#76366)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76366

caffe2 is not currently being built for XROS.

Test Plan: CI

Reviewed By: kimishpatel

Differential Revision: D35923922

fbshipit-source-id: 260dacadf0bd5b6bab7833a4ce81e896d280b053
(cherry picked from commit 8370b8dd2519d55a79fa8d45e7951ca8dc0b21a8)
2022-04-26 23:54:22 +00:00
mikey dagitses
ba8dd48d8a use cross-platform sed command in cmake_configure_file (#73589)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73589

Long-form flags do not work and neither does the \w character class.

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D34558345

Pulled By: dagitses

fbshipit-source-id: 73e1b18bdd55d67fd3936428400c3835684549b0
(cherry picked from commit fc85796fadb8dac0043ba0bf43fd54cf817665b7)
2022-03-02 12:28:51 +00:00
Shintaro Iwasaki
7dc2cfa249 [c10][rocm] fix __assert_fail() declaration mismatch error (#73040)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73040

This patch fixes a compilation error in PyTorch with ROCm when `NDEBUG` is passed.

## Problem

Forward declaration of `__host__ __device__ __assert_fail()` is used in `c10/macros/Macros.h` for HIP compilation when `NDEBUG` is set. However, HIP has `__device__ __assert_fail()` in `hip/amd_detail/amd_device_functions.h`, causing a function type error.

This issue does not appear in ROCm CI tests since it happens only when `NDEBUG` is passed.

## Solution

[EDIT] After the discussion on GitHub, we chose to entirely disable `CUDA_KERNEL_ASSERT()` for ROCm.

 ---

To solve this compilation error, this patch disables `CUDA_KERNEL_ASSERT()`, which uses `__assert_fail()` when
1. `c10/macros/Macros.h` is included for `*.hip` (precisely speaking, `__HIP__` or `__HIP_ARCH__` is defined), and
2. `NDEBUG` is passed.

Note that there is no impact on default compilation because, without a special compilation flag, those HIP files are compiled without `-DNDEBUG`, which is why this issue had not been found.

### Justification
[1] We cannot declare one host-and-device function for two separate host and device functions.
```
__device__ int func() { return 0; }
__host__ int func() { return 0; }
// Compile error (hipcc)
// __device__ __host__ int func();
```
[2] Forward declaration of a correct `__device__`-only `__assert_fail()` for `__HIP__` causes the following error:
```
pytorch/c10/util/TypeCast.h:135:7: error: reference to __device__ function '__assert_fail' in __host__ __device__ function
      ERROR_UNSUPPORTED_CAST
      ^
pytorch/c10/util/TypeCast.h:118:32: note: expanded from macro 'ERROR_UNSUPPORTED_CAST'
#define ERROR_UNSUPPORTED_CAST CUDA_KERNEL_ASSERT(false);
                               ^
pytorch/c10/macros/Macros.h:392:5: note: expanded from macro 'CUDA_KERNEL_ASSERT'
    __assert_fail(
```

[3] Maybe there's a way to properly define `__assert_fail()` for HIP + NDEBUG, but this might be too much. Please let me just disable it.

### Technical details

Error
```
pytorch/c10/macros/Macros.h:368:5: error: __host__ __device__ function '__assert_fail' cannot overload __device__ function '__assert_fail'
    __assert_fail(
    ^
/opt/rocm/hip/include/hip/amd_detail/amd_device_functions.h:1173:6: note: previous declaration is here
void __assert_fail(const char *assertion,
```

CUDA definition (9.x) of `__assert_fail()`
```
#elif defined(__GNUC__)
extern __host__ __device__ __cudart_builtin__ void __assert_fail(
  const char *, const char *, unsigned int, const char *)
  __THROW;
```

ROCm definition (the latest version)
```
// 2b59661f3e/include/hip/amd_detail/amd_device_functions.h (L1172-L1177)
extern "C" __device__ __attribute__((noinline)) __attribute__((weak))
void __assert_fail(const char *assertion,
                   const char *file,
                   unsigned int line,
                   const char *function);
```

Test Plan:
CI + reproducer
```
python3 tools/amd_build/build_amd.py
python3 setup.py develop --cmake-only
cmake -DHIP_HIPCC_FLAGS_RELEASE="-DNDEBUG" build
cmake --build build
```

Reviewed By: xw285cornell

Differential Revision: D34310555

fbshipit-source-id: 7542288912590533ced3f20afd2e704b6551991b
(cherry picked from commit 9e52196e36820abe36bf6427cabc7389d3ea6cb5)
2022-03-01 04:35:30 +00:00
Zhengxu Chen
fe277b8717 [jit][edge] Migrate to TypeFactory for jit types on mobile (#71516)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71516

Mobile should be able to construct dynamic types by default.
ghstack-source-id: 147498365

Test Plan:
CI.

**-48KB** binary size reduction for igios BSB.
UMBEX link: https://www.internalfb.com/intern/unigraph/explorer/?jsgq_traversal_spec=%7B%22builds%22%3A[%22bsb%3A422553426218394%5Cu0040base%22%2C%22bsb%3A422553426218394%5Cu0040diff%22]%7D&unigraph_project=UnigraphProjectMbex&is_mbex_redirected

Reviewed By: iseeyuan

Differential Revision: D33673958

fbshipit-source-id: 8600c04ae929283681971aae264d3774188df9cd
(cherry picked from commit 64ebcec09e)
2022-01-26 07:32:04 +00:00
Michael Dagitses
78e1f9db34 port //c10/macros to common build structure (#70852)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70852

This is the first change that uses a common build file, build.bzl, to
hold most of the build logic.
ghstack-source-id: 147170895

Test Plan: Relying on internal and external CI.

Reviewed By: malfet

Differential Revision: D33299331

fbshipit-source-id: a66afffba6deec76b758dfb39bdf61d747b5bd99
(cherry picked from commit d9163c56f5)
2022-01-19 20:56:12 +00:00
Michael Dagitses
d665097cad allow Bazel to build without glog and gflags (#70850)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70850

We support both, so we want to ensure both continue to work.
ghstack-source-id: 146960552

Test Plan: Tested manually. A subsequent diff adds this test configuration to CI.

Reviewed By: malfet

Differential Revision: D33297464

fbshipit-source-id: 70e1431d0907d480c576239af93ef57036d5e4d7
2022-01-18 08:08:46 -08:00
Michael Dagitses
ffdc6b4994 extract //c10/macros to its own package (#70849)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70849

ghstack-source-id: 146960563

Test Plan: Bazel CI tests will protect this.

Reviewed By: malfet

Differential Revision: D33297235

fbshipit-source-id: 6504a977e82ad2f2232a74233b96cdea8bf94a20
2022-01-18 08:08:42 -08:00
CodemodService FBSourceClangFormatLinterBot
60632a00fe [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D33561057

fbshipit-source-id: 79873717c45c8bbe6d0ae760e718770fd960185d
2022-01-13 03:27:06 -08:00
Nolan O'Brien
8f4cec2231 [warnings][Caffe2] Suppress warnings in caffe2 headers (#71196)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71196

`caffe2` headers contain code that can elicit warnings when built with strict compiler flags.  Rather than force downstream/consuming code to weaken their compiler flags, suppress those warnings in the header using `#pragma clang diagnostic` suppressions.
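
The standard shape of such header-local suppressions (a sketch; the actual warnings suppressed differ):
```cpp
#if defined(__clang__)
#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Wshadow"
#endif

// ... header contents that would otherwise trip strict consumer flags ...

#if defined(__clang__)
#pragma clang diagnostic pop
#endif
```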

Test Plan: CI Pass

Reviewed By: malfet

Differential Revision: D33536233

fbshipit-source-id: 74404e7a5edaf244f79f7a0addd991a84442a31f
2022-01-12 10:16:35 -08:00
Michael Suo
23ab6ce723 Revert D33141011: extract //c10/macros into its own package
Test Plan: revert-hammer

Differential Revision:
D33141011 (8f4c724bb6)

Original commit changeset: caa97448f922

Original Phabricator Diff: D33141011 (8f4c724bb6)

fbshipit-source-id: 79423ed51f9a43ecf1f716a739c74949b66fadb4
2021-12-22 17:48:45 -08:00
Michael Suo
f126501d37 Revert D33141010: allow Bazel to build without glog and gflags
Test Plan: revert-hammer

Differential Revision:
D33141010 (8c41f258f4)

Original commit changeset: d951e5616459

Original Phabricator Diff: D33141010 (8c41f258f4)

fbshipit-source-id: d52ca20ddf4c5a91cb09a32fecb30a00227fc4ae
2021-12-22 17:47:23 -08:00
Michael Dagitses
8c41f258f4 allow Bazel to build without glog and gflags (#69995)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69995
ghstack-source-id: 146027060

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D33141010

fbshipit-source-id: d951e5616459e8aa163ae0741e245f53185580e8
2021-12-22 14:30:30 -08:00
Michael Dagitses
8f4c724bb6 extract //c10/macros into its own package (#69994)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69994
ghstack-source-id: 145799968

Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D33141011

fbshipit-source-id: caa97448f922d7c12980bf01669c1b3ef5c1213b
2021-12-22 14:30:27 -08:00
Scott Wolchok
d026057bb3 [PyTorch] Update SmallVector from LLVM (#69110)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69110

I pasted the current LLVM code, reapplied the modifications listed in the code comments, and caught a few more in the diff/build process. The trivially copyable detection is different now; if gcc builds fail, I will try reverting to C10_IS_TRIVIALLY_COPYABLE or copying what LLVM is doing.

The motivation for this change is that, as noted in an existing comment, C10_IS_TRIVIALLY_COPYABLE did the wrong thing for std::unique_ptr, which caused problems with D32454856 / #68412.

ghstack-source-id: 145327773

Test Plan: CI

Reviewed By: bhosmer, mruberry

Differential Revision: D32733017

fbshipit-source-id: 9452ab90328e3fdf457aad23a26f2f6835b0bd3d
2021-12-10 11:57:19 -08:00
Mengwei Liu
53aac4b6f3 [PyTorch] Allow override for macro HAS_DEMANGLE (#66540)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66540

Currently the macro `HAS_DEMANGLE` is determined by compiler predefined macros. Here I'm adding an option to allow `HAS_DEMANGLE` to be defined in build files.
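
A sketch of the override mechanism (the fallback detection condition here is an assumption):
```cpp
// Respect a HAS_DEMANGLE supplied by the build system; otherwise fall back
// to compiler-based detection.
#ifndef HAS_DEMANGLE
#if defined(__GNUG__) && !defined(__ANDROID__)
#define HAS_DEMANGLE 1
#else
#define HAS_DEMANGLE 0
#endif
#endif
```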

Test Plan: Rely on CI

Reviewed By: poweic

Differential Revision: D31600007

fbshipit-source-id: 76cf088b0f5ee940e977d3b213f1446ea64be036
2021-10-17 16:10:45 -07:00
Mengwei Liu
d8532e3524 [PyTorch] Split c10 Type.cpp into two files to allow targets to include one of them (#66445)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66445

`Type.cpp` implements the `demangle()` function based on the macro `HAS_DEMANGLE`. This diff splits it into two `.cpp` files so that we can add either one to the build target. This change follows the pattern of `flags_use_no_gflags.cpp` and `flags_use_gflags.cpp`.

Test Plan: Rely on CI

Reviewed By: iseeyuan

Differential Revision: D31551432

fbshipit-source-id: f8b11783e513fa812228ec873459ad3043ff9147
2021-10-11 21:52:24 -07:00
Karol Kosik
eb3b9fe719 [XROS][ML] System specific adjustments for UTs to work. (#65245)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65245

Building and running c10 and qnnpack tests on XROS.

Notable changes:
- Adding `#if defined(_XROS_)` in a few places not supported by XROS
- Changing Threadpool to an abstract class
ghstack-source-id: 139513579

Test Plan: Run c10 and qnnpack tests on XROS.

Reviewed By: veselinp, iseeyuan

Differential Revision: D30137333

fbshipit-source-id: bb6239b935187fac712834341fe5a8d3377762b1
2021-10-01 18:15:14 -07:00
Pruthvi Madugundu
085e2f7bdd [ROCm] Changes not to rely on CUDA_VERSION or HIP_VERSION (#65610)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65610

- Replace HIP_PLATFORM_HCC with USE_ROCM
- Don't rely on CUDA_VERSION or HIP_VERSION; use USE_ROCM and ROCM_VERSION instead.

- In the next PR:
   - We will remove the mapping from CUDA_VERSION to HIP_VERSION and from CUDA to HIP in hipify.
   - HIP_PLATFORM_HCC is deprecated, so we will add HIP_PLATFORM_AMD to support HIP host code compilation with gcc.

cc jeffdaily sunway513 jithunnair-amd ROCmSupport amathews-amd

Reviewed By: jbschlosser

Differential Revision: D30909053

Pulled By: ezyang

fbshipit-source-id: 224a966ebf1aaec79beccbbd686fdf3d49267e06
2021-09-29 09:55:43 -07:00
Jane Xu
6707dfeefb Remove 9.2 related macros for CONSTEXPR (#65066)
Summary:
Removes C10_HOST_CONSTEXPR_EXCEPT_CUDA92 references in the code

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65066

Reviewed By: driazati

Differential Revision: D31022520

Pulled By: janeyx99

fbshipit-source-id: f02cdc6caba5b48405575242921f5845ff18f729
2021-09-17 17:31:20 -07:00
Jeff Daily
c7b03e2b83 [ROCm] define C10_WARP_SIZE to warpSize HIP constant (#64302)
Summary:
warpSize is defined as a constexpr in HIP headers, so it is incorrect to assume a warp size of 64. This change fixes the C10_WARP_SIZE definition in torch sources, similar to [how it was done in caffe2](https://github.com/pytorch/pytorch/blob/master/caffe2/utils/GpuDefs.cuh#L10-L14).
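
A sketch of the platform-dependent definition (the guard macro is an assumption; HIP provides `warpSize` as a constant):
```cpp
// Use HIP's warpSize instead of hard-coding 64: AMD wavefronts are 32 or 64
// lanes wide depending on the architecture.
#ifdef __HIP_PLATFORM_HCC__
#define C10_WARP_SIZE warpSize
#else
#define C10_WARP_SIZE 32
#endif
```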

cc jeffdaily sunway513 jithunnair-amd ROCmSupport

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64302

Reviewed By: mrshenli

Differential Revision: D30785975

Pulled By: malfet

fbshipit-source-id: 68f8333182ad4d02bd0c8d02f1751a50bc5bafa7
2021-09-10 09:43:47 -07:00
Dmytro Dzhulgakov
f446e835ee Fix CUDA_KERNEL_ASSERT ambiguous symbol in NDEBUG mode (#62527)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62527

If NDEBUG is applied inconsistently in compilation, we might get an 'ambiguous declaration' error. Let's make sure that the forward declaration matches glibc, including all specifiers.
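
For reference, a forward declaration matching glibc's (written with standard attributes in place of glibc's `__THROW`; a sketch, not the exact diff):
```cpp
// Must match glibc's declaration exactly -- including noreturn and nothrow --
// or mixing NDEBUG and non-NDEBUG translation units yields ambiguity errors.
extern "C" [[noreturn]] void __assert_fail(
    const char* assertion,
    const char* file,
    unsigned int line,
    const char* function) noexcept;
```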

Test Plan: sandcastle

Reviewed By: mdschatz

Differential Revision: D30030051

fbshipit-source-id: 9f4d5f1d4e74f0a4eaeeaaaad76b93ee485d8bcd
2021-08-11 01:10:09 -07:00
Brian Hirsh
7bc86458e1 Revert "Revert D28833086: beef up at::_ops API" (#60214)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60214

Relanding this PR, but with a fix for Windows CUDA builds (example failure in master here: https://github.com/pytorch/pytorch/runs/2852662871)

This is identical to the original PR except for one change in `tools/codegen/gen.py`: `static constexpr` -> `static CONSTEXPR_EXCEPT_WIN_CUDA`

This actually took a while to figure out, until I tracked down a previous pytorch PR that encountered a similar issue: https://github.com/pytorch/pytorch/pull/40675

This reverts commit 6d0fb85a62.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D29213932

Pulled By: bdhirsh

fbshipit-source-id: b90c7c10e5a51f8d6173ddca673b418e5774c248
2021-06-24 18:08:54 -07:00
Erjia Guan
65f33ec85c Follow-up fix for compilation error on CUDA92 (#60287)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60287

Follow up of #60017

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D29236208

Pulled By: ejguan

fbshipit-source-id: f1acf9630b45fea8cbdf7d64e47661643d0a52b8
2021-06-21 13:29:11 -07:00
Erjia Guan
691183bb74 Fix compile failure on CUDA92 (#60017)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60016

For CUDA 9.2:
- `OptionalBase` did not check `is_arrayref`
- `constexpr` seemingly cannot raise an exception on CUDA 9.2

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60017

Reviewed By: malfet

Differential Revision: D29139515

Pulled By: ejguan

fbshipit-source-id: 4f4f6d9fe6a5f2eadf913de0a9781cc9f2e6ac6f
2021-06-16 12:23:08 -07:00
Jeff Daily
24e27af683 [ROCm] enable kernel asserts (#49624)
Summary:
Addresses the missing ROCm feature indicated in https://github.com/pytorch/pytorch/issues/38943.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49624

Reviewed By: agolynski

Differential Revision: D28902459

Pulled By: malfet

fbshipit-source-id: 29c9b552770241a0ec52cd057ea45efc4389d838
2021-06-07 13:43:07 -07:00
Edward Yang
f05d5bec48 Preserve PyObject even when it goes dead (#56017)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56017

Fixes #55686

This patch is seemingly straightforward but some of the changes are very
subtle.  For the general algorithmic approach, please first read the
quoted issue.  Based on the algorithm, there are some fairly
straightforward changes:

- New boolean on TensorImpl tracking if we own the pyobj or not
- PythonHooks virtual interface for requesting deallocation of pyobj
  when TensorImpl is being released and we own its pyobj, and
  implementation of the hooks in python_tensor.cpp
- Modification of THPVariable to MaybeOwned its C++ tensor, directly
  using swolchok's nice new class

And then, there is python_variable.cpp.  Some of the changes follow the
general algorithmic approach:

- THPVariable_NewWithVar is simply adjusted to handle MaybeOwned and
  initializes as owned (like before)
- THPVariable_Wrap adds the logic for reverting ownership back to
  PyObject when we take out an owning reference to the Python object
- THPVariable_dealloc attempts to resurrect the Python object if
  the C++ tensor is live, and otherwise does the same old implementation
  as before
- THPVariable_tryResurrect implements the resurrection logic (a simplified
  sketch follows this list).  It is modeled after CPython code so read the
  cited logic and see if it is faithfully replicated
- THPVariable_clear is slightly updated for MaybeOwned and also to
  preserve the invariant that if owns_pyobj, then pyobj_ is not null.
  This change is slightly dodgy: the previous implementation has a
  comment mentioning that the pyobj nulling is required to ensure we
  don't try to reuse the dead pyobj.  I don't think, in this new world,
  this is possible, because the invariant says that the pyobj only
  dies if the C++ object is dead too.  But I still unset the field
  for safety.
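
A heavily simplified sketch of the resurrection check described above (the type, fields, and use_count heuristic are hypothetical stand-ins, not the real implementation):
```cpp
#include <Python.h>
#include <memory>

struct CppTensor { /* ... */ };

// Hypothetical object layout, loosely modeled on THPVariable.
struct PyTensorObject {
  PyObject_HEAD
  std::shared_ptr<CppTensor> tensor;
  bool owns_pyobj;
};

// If C++ still references the tensor, revive the PyObject and hand its
// ownership to the C++ side instead of letting finalization proceed.
static bool try_resurrect(PyTensorObject* self) {
  if (self->owns_pyobj && self->tensor.use_count() > 1) {
    Py_INCREF(reinterpret_cast<PyObject*>(self));
    self->owns_pyobj = false;
    return true;  // abort deallocation; the object lives on
  }
  return false;   // proceed with normal tp_dealloc
}
```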

And then... there is THPVariableMetaType.  colesbury explained in the
issue why this is necessary: when destructing an object in Python, you
start off by running the tp_dealloc of the subclass before moving up
to the parent class (much in the same way C++ destructors work).  The
deallocation process for a vanilla Python-defined class does irreparable
harm to the PyObject instance (e.g., the finalizers get run) making it
no longer valid attempt to resurrect later in the tp_dealloc chain.
(BTW, the fact that objects can resurrect but in an invalid state is
one of the reasons why it's so frickin' hard to write correct __del__
implementations).  So we need to make sure that we actually override
the tp_dealloc of the bottom most *subclass* of Tensor to make sure
we attempt a resurrection before we start finalizing.  To do this,
we need to define a metaclass for Tensor that can override tp_dealloc
whenever we create a new subclass of Tensor.  By the way, it was totally
not documented how to create metaclasses in the C++ API, and it took
a good bit of trial error to figure it out (and the answer is now
immortalized in https://stackoverflow.com/q/67077317/23845 -- the things
that I got wrong in earlier versions of the PR included setting
tp_basicsize incorrectly, incorrectly setting Py_TPFLAGS_HAVE_GC on
the metaclass--you want to leave it unset so that it inherits, and
determining that tp_init is what actually gets called when you construct
a class, not tp_call as another not-to-be-named StackOverflow question
suggests).

Aside: Ordinarily, adding a metaclass to a class is a user visible
change, as it means that it is no longer valid to mixin another class
with a different metaclass.  However, because _C._TensorBase is a C
extension object, it will typically conflict with most other
metaclasses, so this is not BC breaking.

The desired new behavior of a subclass tp_dealloc is to first test if
we should resurrect, and otherwise do the same old behavior.  In an
initial implementation of this patch, I implemented this by saving the
original tp_dealloc (which references subtype_dealloc, the "standard"
dealloc for all Python defined classes) and invoking it.  However, this
results in an infinite loop, as it attempts to call the dealloc function
of the base type, but incorrectly chooses subclass type (because it is
not a subtype_dealloc, as we have overridden it; see
b38601d496/Objects/typeobject.c (L1261))
So, with great reluctance, I must duplicate the behavior of
subtype_dealloc in our implementation.  Note that this is not entirely
unheard of in Python binding code; for example, Cython
c25c3ccc4b/Cython/Compiler/ModuleNode.py (L1560)
also does similar things.  This logic makes up the bulk of
THPVariable_subclass_dealloc

To review this, you should pull up the CPython copy of subtype_dealloc
b38601d496/Objects/typeobject.c (L1230)
and verify that I have specialized the implementation for our case
appropriately.  Among the simplifications I made:

- I assume PyType_IS_GC, because I assume that Tensor subclasses are
  only ever done in Python and those classes are always subject to GC.
  (BTW, yes!  This means I have broken anyone who has extend PyTorch
  tensor from C API directly.  I'm going to guess no one has actually
  done this.)

- I don't bother walking up the type bases to find the parent dealloc;
  I know it is always THPVariable_dealloc.  Similarly, I can get rid
  of some parent type tests based on knowledge of how
  THPVariable_dealloc is defined

- The CPython version calls some private APIs which I can't call, so
  I use the public PyObject_GC_UnTrack APIs.

- I don't allow the finalizer of a Tensor to change its type (but
  more on this shortly)

One alternative I discussed with colesbury was instead of copy pasting
the subtype_dealloc, we could transmute the type of the object that was
dying to turn it into a different object whose tp_dealloc is
subtype_dealloc, so the stock subtype_dealloc would then be applicable.
We decided this would be kind of weird and didn't do it that way.

TODO:

- More code comments

- Figure out how not to increase the size of TensorImpl with the new
  bool field

- Add some torture tests for the THPVariable_subclass_dealloc, e.g.,
  involving subclasses of Tensors that do strange things with finalizers

- Benchmark the impact of taking the GIL to release C++ side tensors
  (e.g., from autograd)

- Benchmark the impact of adding a new metaclass to Tensor (probably
  will be done by separating out the metaclass change into its own
  change)

- Benchmark the impact of changing THPVariable to conditionally own
  Tensor (as opposed to unconditionally owning it, as before)

- Add tests that this actually indeed preserves the Python object

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D27765125

Pulled By: ezyang

fbshipit-source-id: 857f14bdcca2900727412aff4c2e2d7f0af1415a
2021-06-03 10:50:36 -07:00
chengjun
511979df85 Define the SYCL device version __assert_fail when NDEBUG is defined. (#58906)
Summary:
## Motivation
The utils in the namespace `c10` require `__assert_fail` when `NDEBUG` is defined in kernel code.

The `__assert_fail` declaration in PyTorch is not compatible with the SYCL specification.

This causes a compile error when using these utils in SYCL kernels.

## Solution
Add the `__assert_fail` declaration for SYCL kernels to PyTorch when compiling SYCL kernels that use `c10` utils.

## Additional context
`__assert_fail` in SYCL kernel

`extern SYCL_EXTERNAL void __assert_fail(const char *expr, const char *file, unsigned int line, const char *func);`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58906

Reviewed By: anjali411

Differential Revision: D28700863

Pulled By: ezyang

fbshipit-source-id: 81896d022b35ace8cd16474128649eabedfaf138
2021-05-26 12:48:37 -07:00