Commit Graph

55 Commits

Author SHA1 Message Date
Jeff Daily
5379b5f927 [ROCm] use hipblas instead of rocblas (#105881)
- BatchLinearAlgebraLib.cpp is now split into one additional file
  - BatchLinearAlgebraLib.cpp uses only cusolver APIs
  - BatchLinearAlgebraLibBlas.cpp uses only cublas APIs
  - hipify operates at the file level and cannot mix cusolver and cublas APIs within the same file
- cmake changes to link against hipblas instead of rocblas
- hipify mappings changes to map cublas -> hipblas instead of rocblas

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105881
Approved by: https://github.com/albanD
2023-07-31 20:42:55 +00:00
Andres Lugo-Reyes
eaffd98880 Enable hipSOLVER in ROCm builds (#97370)
Enables the hipSolver backend for ROCm builds
--------------------------------------------------------------------------

- Minimum ROCm version requirement - 5.3
- Introduces new macro USE_LINALG_SOLVER the controls enablement of both cuSOLVER and hipSOLVER
- Adds hipSOLVER API to hipification process
- combines hipSOLVER and hipSPARSE mappings into single SPECIAL map that takes priority among normal mappings
- Torch api to be moved to hipsolver backend (as opposed to magma) include: torch.svd(), torch.geqrf(), torch.orgqr(), torch.ormqr()
- Will enable 100+ linalg unit tests for ROCm

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97370
Approved by: https://github.com/malfet
2023-05-31 16:53:23 +00:00
Pruthvi Madugundu
08f125bcac [ROCm] Remove usage of deprecated ROCm component header includes (#97620)
- clang parameter 'amdgpu-target' changed to 'offload-arch'
- HIP and MIOpen includes path updated for extensions

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97620
Approved by: https://github.com/ezyang, https://github.com/jithunnair-amd
2023-03-28 19:28:38 +00:00
Pruthvi Madugundu
fbd08fb358 Introduce TORCH_DISABLE_GPU_ASSERTS (#84190)
- Asserts for CUDA are enabled by default
- Disabled for ROCm by default by setting `TORCH_DISABLE_GPU_ASSERTS` to `ON`
- Can be enabled for ROCm by setting above variable to`OFF` during build or can be forcefully enabled by setting `ROCM_FORCE_ENABLE_GPU_ASSERTS:BOOL=ON`

This is follow up changes as per comment in PR #81790, comment [link](https://github.com/pytorch/pytorch/pull/81790#issuecomment-1215929021)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84190
Approved by: https://github.com/jeffdaily, https://github.com/malfet
2022-11-04 04:43:05 +00:00
PyTorch MergeBot
0fa23663cc Revert "Introduce TORCH_DISABLE_GPU_ASSERTS (#84190)"
This reverts commit 1e2c4a6e0e.

Reverted https://github.com/pytorch/pytorch/pull/84190 on behalf of https://github.com/malfet due to Needs internal changes, has to be landed via co-dev
2022-11-02 18:13:37 +00:00
Pruthvi Madugundu
1e2c4a6e0e Introduce TORCH_DISABLE_GPU_ASSERTS (#84190)
- Asserts for CUDA are enabled by default
- Disabled for ROCm by default by setting `TORCH_DISABLE_GPU_ASSERTS` to `ON`
- Can be enabled for ROCm by setting above variable to`OFF` during build or can be forcefully enabled by setting `ROCM_FORCE_ENABLE_GPU_ASSERTS:BOOL=ON`

This is follow up changes as per comment in PR #81790, comment [link](https://github.com/pytorch/pytorch/pull/81790#issuecomment-1215929021)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84190
Approved by: https://github.com/jeffdaily, https://github.com/malfet
2022-11-02 17:41:57 +00:00
Pruthvi Madugundu
8473e69684 [ROCm] Fixes the kernel asserts API declaration mismatch error (#81790)
This problem updates the the PR [#73040](https://github.com/pytorch/pytorch/pull/73040)

The compilation error in pyTorch with ROCm is successful with these changes when `NDEBUG` is enabled.

Solution:
For HIP we keep `__device__ __assert_fail()`
and for host side compilation we want to use the `__assert_fail()` from the glibc library.

Tested the code by compiling with below steps
```
python3 tools/amd_build/build_amd.py
python3 setup.py develop --cmake-only
cmake -DHIP_HIPCC_FLAGS_RELEASE="-DNDEBUG" build
cmake --build build
```

The UT test_fixed_cuda_assert_async is still skipped due performance overhead.

cc @jithunnair-amd

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81790
Approved by: https://github.com/shintaro-iwasaki, https://github.com/jeffdaily, https://github.com/malfet
2022-08-16 19:22:31 +00:00
Dmitry Mikushin
e08026d4d4 Use miopen_LIBRARIES and rccl_LIBRARIES directly, when they are valid target (#80446)
As of [this RCCL PR](https://github.com/ROCmSoftwarePlatform/rccl/pull/570), `${rccl_LIBRARIES}` refers to the actual RCCL library target, not just a symbolic "rccl" string. So starting from the next release, no special treatment of it would be required in PyTorch anymore. This patch checks whether `${RCCL_LIBRARIES}` and `${MIOpen_LIBRARIES}` are already valid, and if they are - is not trying to find them manually.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/80446
Approved by: https://github.com/pruthvistony, https://github.com/malfet
2022-07-06 23:39:59 +00:00
Jeff Daily
64b543434d [ROCm] update cmake package DIR paths (#77087)
Fixes nightly libtorch builds.  As of ROCm 5.1.x, all *.cmake files are under /opt/rocm/lib/cmake/package instead of /opt/rocm/package/lib/cmake.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77087
Approved by: https://github.com/seemethere
2022-05-10 19:06:51 +00:00
Shintaro Iwasaki
7dc2cfa249 [c10][rocm] fix __assert_fail() declaration mismatch error (#73040)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73040

This patch fixes a compilation error in PyTorch with ROCm when `NDEBUG` is passed.

## Problem

Forward declaration of `__host__ __device__ __assert_fail()` is used in `c10/macros/Macros.h` for HIP compilation when `NDEBUG` is set  However, HIP has  `__device__ __assert_fail()` in `hip/amd_detail/amd_device_functions.h`, causing a function type error.

This issue does not appear in ROCm CI tests since it happens only when `NDEBUG` is passed.

## Solution

[EDIT] After the discussion on GitHub, we chose to entirely disable `CUDA_KERNEL_ASSERT()` for ROCm.

 ---

To solve this compilation error, this patch disables `CUDA_KERNEL_ASSERT()`, which uses `__assert_fail()` when
1. `c10/macros/Macros.h` is included for `*.hip` (precisely speaking, `__HIP__` or `__HIP_ARCH__` is defined), and
2. `NDEBUG` is passed.

Note that there's no impact on default compilation because, without a special compilation flag, those HIP files are compiled without `-NDEBUG`. And that's why this issue has not been found.

### Justification
[1] We cannot declare one host-and-device function for two separate host and device functions.
```
__device__ int func() {return 0};
__host__ int func() {return 0};
// Compile error (hipcc)
// __device__ __host__ int func();
```
[2] Forward declaration of a correct `__device__` only `__assert_fail()` for `__HIP__` causes the following error:
```
pytorch/c10/util/TypeCast.h:135:7: error: reference to __device__ function '__assert_fail' in __host__ __device__ function
      ERROR_UNSUPPORTED_CAST
      ^
pytorch/c10/util/TypeCast.h:118:32: note: expanded from macro 'ERROR_UNSUPPORTED_CAST'
#define ERROR_UNSUPPORTED_CAST CUDA_KERNEL_ASSERT(false);
                               ^
pytorch/c10/macros/Macros.h:392:5: note: expanded from macro 'CUDA_KERNEL_ASSERT'
    __assert_fail(
```

[3] Maybe there's a way to properly define `__assert_fail()` for HIP + NDEBUG, but this might be too much. Please let me just disable it.

### Technical details

Error
```
pytorch/c10/macros/Macros.h:368:5: error: __host__ __device__ function '__assert_fail' cannot overload __device__ function '__assert_fail'
    __assert_fail(
    ^
/opt/rocm/hip/include/hip/amd_detail/amd_device_functions.h:1173:6: note: previous declaration is here
void __assert_fail(const char *assertion,
```

CUDA definition (9.x) of `__assert_fail()`
```
#elif defined(__GNUC__)
extern __host__ __device__ __cudart_builtin__ void __assert_fail(
  const char *, const char *, unsigned int, const char *)
  __THROW;
```

ROCm definition (the latest version)
```
// 2b59661f3e/include/hip/amd_detail/amd_device_functions.h (L1172-L1177)
extern "C" __device__ __attribute__((noinline)) __attribute__((weak))
void __assert_fail(const char *assertion,
                   const char *file,
                   unsigned int line,
                   const char *function);
```

Test Plan:
CI + reproducer
```
python3 tools/amd_build/build_amd.py
python3 setup.py develop --cmake-only
cmake -DHIP_HIPCC_FLAGS_RELEASE="-DNDEBUG" build
cmake --build build
```

Reviewed By: xw285cornell

Differential Revision: D34310555

fbshipit-source-id: 7542288912590533ced3f20afd2e704b6551991b
(cherry picked from commit 9e52196e36820abe36bf6427cabc7389d3ea6cb5)
2022-03-01 04:35:30 +00:00
Douglas Lehr
eb4e6ca30c [ROCM] Add ROCM version api within cmake (#69481)
Summary:
In ROCm 5.0 and later the version of the ROCm platform can be obtained via
an api call vs reading from a flat file.

If the header file /opt/rocm/include/rocm_version.h exists,
LoadHIP.cmake compiles source referencing the api and prints out the
ROCM Versions.

If the file does not exist, LoadHIP.cmake will revert to the previous
approach of looking for the version-dev file.

Fixes #{issue number}

cc jeffdaily sunway513 jithunnair-amd ROCmSupport KyleCZH

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69481

Reviewed By: seemethere, janeyx99

Differential Revision: D34153435

Pulled By: malfet

fbshipit-source-id: f8c0650d27666d2a3cf47d812807798c47210b37
(cherry picked from commit 6cbb4f7a0c)
2022-02-11 00:15:10 +00:00
Jithun Nair
8dfdc3df82 [ROCm] Refactor how to specify AMD gpu targets using PYTORCH_ROCM_ARCH (#61706)
Summary:
Remove all hardcoded AMD gfx targets

PyTorch build and Magma build will use rocm_agent_enumerator as
backup if PYTORCH_ROCM_ARCH env var is not defined

PyTorch extensions will use same gfx targets as the PyTorch build,
unless PYTORCH_ROCM_ARCH env var is defined

torch.cuda.get_arch_list() now works for ROCm builds

PyTorch CI dockers will continue to be built for gfx900 and gfx906 for now.

PYTORCH_ROCM_ARCH env var can be a space or semicolon separated list of gfx archs eg. "gfx900 gfx906" or "gfx900;gfx906"
cc jeffdaily sunway513 jithunnair-amd ROCmSupport KyleCZH

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61706

Reviewed By: seemethere

Differential Revision: D32735862

Pulled By: malfet

fbshipit-source-id: 3170e445e738e3ce373203e1e4ae99c84e645d7d
2021-12-13 15:41:40 -08:00
Pruthvi Madugundu
ab7a472980 [ROCm] Update HIP_VERSION to TORCH_HIP_VERSION (#62786)
Summary:
- HIP_VERSION semantic versioning will change in ROCm4.3. The changes essentially remove the dependency on HIP_VERSION provided in the hip header to keep code compatible with older and newer versions of ROCm.
- TORCH_HIP_VERSION is derived from HIP_VERSION_MAJOR and HIP_VERSION_MINOR

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62786

Reviewed By: bdhirsh

Differential Revision: D30281682

Pulled By: seemethere

fbshipit-source-id: e41e69fb9e13de5ddd1af99ba5bbdcbb7b64b673
2021-08-13 15:00:43 -07:00
Jeff Daily
24e27af683 [ROCm] enable kernel asserts (#49624)
Summary:
Addresses missing ROCm feature indicated in https://github.com/pytorch/pytorch/issues/38943.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49624

Reviewed By: agolynski

Differential Revision: D28902459

Pulled By: malfet

fbshipit-source-id: 29c9b552770241a0ec52cd057ea45efc4389d838
2021-06-07 13:43:07 -07:00
Jeff Daily
e1752ffa04 [reland][ROCm] use hiprtc precompiled header (#55965)
Summary:
Revert "Revert D27449031 (2a7df657fe): [pytorch][PR] [ROCm] use hiprtc precompiled header".  Reland PR https://github.com/pytorch/pytorch/issues/54350.

This reverts commit 204ac21bf1.

The original PR was reverted under suspicion that it was causing CI instability, but it was instead due to a hardware failure.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55965

Reviewed By: jbschlosser

Differential Revision: D27755907

Pulled By: malfet

fbshipit-source-id: 75bf0b9d888df3dee62f00a366b1123757e0474e
2021-04-15 15:47:56 -07:00
Jeff Daily
a128938a75 [ROCm] add MAGMA_HOME env var hint to cmake, centos-rocm Dockerfile (#54511)
Summary:
MAGMA_HOME was previously set for the ubuntu-rocm/Dockerfile.  However, this missed centos builds as well as any builds that do not use the CI image environments.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54511

Reviewed By: jbschlosser

Differential Revision: D27755983

Pulled By: malfet

fbshipit-source-id: 1ffd2cd100f4221c2bb64e6915fa3372ee1f6247
2021-04-15 01:06:44 -07:00
Alexander Golynski
204ac21bf1 Revert D27449031: [pytorch][PR] [ROCm] use hiprtc precompiled header
Test Plan: revert-hammer

Differential Revision:
D27449031 (2a7df657fe)

Original commit changeset: 81a8d7847a47

fbshipit-source-id: b7b970c8ea4110357fba3ad4d52a86fa5641d90c
2021-04-01 06:42:04 -07:00
Jeff Daily
2a7df657fe [ROCm] use hiprtc precompiled header (#54350)
Summary:
HIP's runtime compiler (hiprtc) is adding support for precompiled HIP headers in the ROCm 4.2 release.  Conditionally add support for this feature.  Using this feature will improve the ROCm torch wheel user experience; users will no longer need to install HIP headers separately to use torch JIT features.

The use of this feature is conditionalized on a new ROCM_VERSION macro.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54350

Reviewed By: H-Huang

Differential Revision: D27449031

Pulled By: malfet

fbshipit-source-id: 81a8d7847a47ce2bb253d1ea58740ef66ed154a3
2021-03-31 13:36:50 -07:00
Michael Melesse
2620bce42a [ROCM] load only hipfft separately past rocm4.1 (#54349)
Summary:
This PR is a follow up to https://github.com/pytorch/pytorch/pull/53408.

It only loads hipfft if the version is rocm 4.1 or after and stops loading rocfft. This was done to resolve some issues observed in our internal ci due to conflicts.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54349

Reviewed By: ezyang

Differential Revision: D27374252

Pulled By: ngimel

fbshipit-source-id: 724e80df5011ea8fabd81739e18ae8a13d3a7ea0
2021-03-26 19:54:25 -07:00
Michael Melesse
4c1af249fb [ROCM] load hipfft separately from rocfft (#53408)
Summary:
This PR makes changes to how hipfft is loaded in pytorch. hipfft is packaged in a separate library to rocfft following rocm 4.1.

We check the rocm version and if it is past rocm 4.1 we load hipfft in addition to rocfft.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53408

Reviewed By: albanD

Differential Revision: D26952702

Pulled By: malfet

fbshipit-source-id: f42be304b587c060816e39d36f5c1a2cdc37bfab
2021-03-11 09:18:33 -08:00
Jithun Nair
5f62308739 Hipify revamp [REDUX] (#48715)
Summary:
[Refiled version of earlier PR https://github.com/pytorch/pytorch/issues/45451]

This PR revamps the hipify module in PyTorch to overcome a long list of shortcomings in the original implementation. However, these improvements are applied only when using hipify to build PyTorch extensions, not for PyTorch or Caffe2 itself.

Correspondingly, changes are made to cpp_extension.py to match these improvements.

The list of improvements to hipify is as follows:

1. Hipify files in the same directory as the original file, unless there's a "cuda" subdirectory in the original file path, in which case the hipified file will be in the corresponding file path with "hip" subdirectory instead of "cuda".
2. Never hipify the file in-place if changes are introduced due to hipification i.e. always ensure the hipified file either resides in a different folder or has a different filename compared to the original file.
3. Prevent re-hipification of already hipified files. This avoids creation of unnecessary "hip/hip" etc. subdirectories and additional files which have no actual use.
4. Do not write out hipified versions of files if they are identical to the original file. This results in a cleaner output directory, with minimal number of hipified files created.
5. Update header rewrite logic so that it accounts for the previous improvement.
6. Update header rewrite logic so it respects the rules for finding header files depending on whether "" or <> is used.
7. Return a dictionary of mappings of original file paths to hipified file paths from hipify function.
8. Introduce a version for hipify module to allow extensions to contain back-compatible code that targets a specific point in PyTorch where the hipify functionality changed.
9. Update cuda_to_hip_mappings.py to account for the ROCm component subdirectories inside /opt/rocm/include. This also results in cleanup of the Caffe2_HIP_INCLUDE path to remove unnecessary additions to the include path.

The list of changes to cpp_extension.py is as follows:

1. Call hipify when building a CUDAExtension for ROCm.
2. Prune the list of source files to CUDAExtension to include only the hipified versions of any source files in the list (if both original and hipified versions of the source file are in the list)
3. Add subdirectories of /opt/rocm/include to the include path for extensions, so that ROCm headers for subcomponent libraries are found automatically

cc jeffdaily sunway513 ezyang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48715

Reviewed By: bdhirsh

Differential Revision: D25272824

Pulled By: ezyang

fbshipit-source-id: 8bba68b27e41ca742781e1c4d7b07c6f985f040e
2020-12-02 18:03:23 -08:00
Nikita Shulga
8af9f2cc23 Revert D24924736: [pytorch][PR] Hipify revamp
Test Plan: revert-hammer

Differential Revision:
D24924736 (10b490a3e0)

Original commit changeset: 4af42b8ff4f2

fbshipit-source-id: 7f8f90d55d8a69a2890ec73622fcea559189e381
2020-11-18 11:48:30 -08:00
Jithun Nair
10b490a3e0 Hipify revamp (#45451)
Summary:
This PR revamps the hipify module in PyTorch to overcome a long list of shortcomings in the original implementation. However, these improvements are applied only when using hipify to build PyTorch extensions, **not for PyTorch or Caffe2 itself**.

Correspondingly, changes are made to `cpp_extension.py` to match these improvements.

The list of improvements to hipify is as follows:

1. Hipify files in the same directory as the original file, unless there's a "cuda" subdirectory in the original file path, in which case the hipified file will be in the corresponding file path with "hip" subdirectory instead of "cuda".
2. Never hipify the file in-place if changes are introduced due to hipification i.e. always ensure the hipified file either resides in a different folder or has a different filename compared to the original file.
3. Prevent re-hipification of already hipified files. This avoids creation of unnecessary "hip/hip" etc. subdirectories and additional files which have no actual use.
4. Do not write out hipified versions of files if they are identical to the original file. This results in a cleaner output directory, with minimal number of hipified files created.
5. Update header rewrite logic so that it accounts for the previous improvement.
6. Update header rewrite logic so it respects the rules for finding header files depending on whether `""` or `<>` is used.
7. Return a dictionary of mappings of original file paths to hipified file paths from `hipify` function.
8. Introduce a version for hipify module to allow extensions to contain back-compatible code that targets a specific point in PyTorch where the hipify functionality changed.
9. Update `cuda_to_hip_mappings.py` to account for the ROCm component subdirectories inside `/opt/rocm/include`. This also results in cleanup of the `Caffe2_HIP_INCLUDE` path to remove unnecessary additions to the include path.

The list of changes to `cpp_extension.py` is as follows:
1. Call `hipify` when building a CUDAExtension for ROCm.
2. Prune the list of source files to CUDAExtension to include only the hipified versions of any source files in the list (if both original and hipified versions of the source file are in the list)
3. Add subdirectories of /opt/rocm/include to the include path for extensions, so that ROCm headers for subcomponent libraries are found automatically

cc jeffdaily sunway513 hgaspar lcskrishna ashishfarmer

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45451

Reviewed By: ezyang

Differential Revision: D24924736

Pulled By: malfet

fbshipit-source-id: 4af42b8ff4f21c3782dedb8719b8f9f86b34bd2d
2020-11-18 08:37:49 -08:00
acxz
5e0bfd7455 [Build] [CMake] [ROCm] find hsa-runtime64 properly (#45550)
Summary:
Properly Fixes https://github.com/pytorch/pytorch/issues/44384
similar in vein to https://github.com/pytorch/pytorch/issues/42064

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45550

Reviewed By: ezyang

Differential Revision: D24412674

Pulled By: malfet

fbshipit-source-id: f3d056c7069cb9d8a7d4174b604b9e3fbb14180b
2020-10-20 08:38:32 -07:00
acxz
e59db43313 Find hip properly (#42064)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41886

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42064

Reviewed By: seemethere

Differential Revision: D22757115

Pulled By: malfet

fbshipit-source-id: 9c8805e6eb0b7d7defe0ecb08c1e45dcc775a237
2020-07-27 13:47:01 -07:00
Jeff Daily
5152633258 [ROCm] update hip library name (#41813)
Summary:
With transition to hipclang, the HIP runtime library name was changed.  A symlink was added to ease the transition, but is going to be removed.  Conditionally set library name based on HIP compiler used.  Patch gloo submodule as part of build_amd.py script until its associated fix is available.

CC ezyang xw285cornell sunway513

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41813

Reviewed By: zhangguanheng66

Differential Revision: D22660077

Pulled By: xw285cornell

fbshipit-source-id: c538129268d9947535b34523201f655b13c9e0a3
2020-07-22 09:42:45 -07:00
peter
3bdc4a37ed CMake script cleanup - mixed case for function names (#35589)
Summary:
Running the following code.
```bash
cmake --help-command-list |
grep -v "cmake version" |
while read c; do
    echo 's/\b'"$(echo $c | tr '[:lower:]' '[:upper:]')"'\(\s*\)(/'"$c"'\1(/g'
done >convert.sed &&
git ls-files -z -- bootstrap '*.cmake' '*.cmake.in' '*CMakeLists.txt' |
egrep -z -v '^(cmake/Modules/|cmake/Modules_CUDA_fix/)' |
xargs -0 sed -i -f convert.sed &&
rm convert.sed
```
cmake-lint is too sensitive about mixed case so I didn't switch the check on.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35589

Differential Revision: D20735648

Pulled By: ezyang

fbshipit-source-id: a09a60a7ce921bb198575a35335faa299bd10b66
2020-03-30 11:37:02 -07:00
peter
45c9ed825a Formatting cmake (to lowercase without space for if/elseif/else/endif) (#35521)
Summary:
Running commands:
```bash
shopt -s globstar

sed -e 's/IF (/if(/g' -e 's/IF(/if(/g' -e 's/if (/if(/g' -e 's/ELSE (/else(/g' -e 's/ELSE(/else(/g' -e 's/else (/else(/g' -e 's/ENDif(/endif(/g' -e 's/ELSEif(/elseif(/g' -i CMakeLists.txt
sed -e 's/IF (/if(/g' -e 's/IF(/if(/g' -e 's/if (/if(/g' -e 's/ELSE (/else(/g' -e 's/ELSE(/else(/g' -e 's/else (/else(/g' -e 's/ENDif(/endif(/g' -e 's/ELSEif(/elseif(/g' -i caffe2/**/CMakeLists.txt
sed -e 's/IF (/if(/g' -e 's/IF(/if(/g' -e 's/if (/if(/g' -e 's/ELSE (/else(/g' -e 's/ELSE(/else(/g' -e 's/else (/else(/g' -e 's/ENDif(/endif(/g' -e 's/ELSEif(/elseif(/g' -i torch/**/CMakeLists.txt
sed -e 's/IF (/if(/g' -e 's/IF(/if(/g' -e 's/if (/if(/g' -e 's/ELSE (/else(/g' -e 's/ELSE(/else(/g' -e 's/else (/else(/g' -e 's/ENDif(/endif(/g' -e 's/ELSEif(/elseif(/g' -i c10/**/CMakeLists.txt
sed -e 's/IF (/if(/g' -e 's/IF(/if(/g' -e 's/if (/if(/g' -e 's/ELSE (/else(/g' -e 's/ELSE(/else(/g' -e 's/else (/else(/g' -e 's/ENDif(/endif(/g' -e 's/ELSEif(/elseif(/g' -i cmake/**/*.cmake
sed -e 's/IF (/if(/g' -e 's/IF(/if(/g' -e 's/if (/if(/g' -e 's/ELSE (/else(/g' -e 's/ELSE(/else(/g' -e 's/else (/else(/g' -e 's/ENDif(/endif(/g' -e 's/ELSEif(/elseif(/g' -i cmake/**/*.cmake.in
```
We may further convert all the commands into lowercase according to the following issue: 77543bde41.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35521

Differential Revision: D20704382

Pulled By: malfet

fbshipit-source-id: 42186b9b1660c34428ab7ceb8d3f7a0ced5d2e80
2020-03-27 14:25:17 -07:00
Johannes M Dieterich
17c672e704 enable rocTX API (#27416)
Summary:
ROCm 2.9 brings support for the rocTX API through rocTracer.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27416

Differential Revision: D17777480

Pulled By: bddppq

fbshipit-source-id: 6bce9b54c94e5b4c5787570d2b85736882bd23a7
2019-10-05 01:55:00 -07:00
Junjie Bai
f4d0d0a811 Enable RCCL in ROCm build (#27383)
Summary:
continues https://github.com/pytorch/pytorch/pull/23884
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27383

Differential Revision: D17767248

Pulled By: bddppq

fbshipit-source-id: 3a506844ca6f01d7bbe8be5bde0976999e3a2b90
2019-10-04 17:41:41 -07:00
J M Dieterich
32d009a37f Add gfx908 to the list of per-default compiled architectures. (#27388)
Summary:
ROCm 2.8 added preliminary support for gfx908.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27388

Differential Revision: D17767772

Pulled By: bddppq

fbshipit-source-id: 172daf5bb66d3db86a13e287059af4b9b90a7f57
2019-10-04 14:49:33 -07:00
Johannes M Dieterich
9c5a899773 Enable jit fusion on ROCm (#22872)
Summary:
As of ROCm 2.6, we support hiprtc - the HIP runtime compilation API. Enable the jit fusion feature depending on the existence of such an API. This entails
* new hipification rules for API_RTC
* add hiprtc APIs to the shim loader
* update cmake infrastructure to find the hiprtc library (it is part of the HIP package)
* enabling of unit tests in the jit_fuser test set
* special casing in resource strings for HIP - the typedefs CUDA requires would be redundant
* for now disable the occupancy calculation we do not support yet and hard-code

Thanks to t-vi for working with me on getting this integration done!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22872

Differential Revision: D17207425

Pulled By: bddppq

fbshipit-source-id: 93409f3051ad0ea06afacc2239fd6c402152debe
2019-09-05 18:22:08 -07:00
iotamudelta
4fe857187c switch to rocThrust for thrust/cub APIs (#25620)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25620

Pull Request resolved: https://github.com/pytorch/pytorch/pull/25602

Enable rocThrust with hipCUB and rocPRIM for ROCm. They are the ROCm implementations of the thrust and cub APIs and replace the older hip-thrust and cub-hip packages going forward. ROCm 2.5 is the first release to contain the new packages as an option, as of 2.6 they will be the only available option.

Add hipification rules to correctly hipify thrust::cuda to thrust::hip and cub:: to hipcub:: going forward. Add hipification rules to hipify specific cub headers to the general hipcub header.

Infrastructure work to correctly find, include and link against the new packages. Add the macro definition to choose the HIP backend to Thrust.

Since include chains are now a little different from CUDA's Thrust, add includes for functionality used where applicable.

Skip four tests that fail with the new rocThrust for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21864

Reviewed By: xw285cornell

Differential Revision: D16940768

Pulled By: bddppq

fbshipit-source-id: 3dba8a8f1763dd23d89eb0dd26d1db109973dbe5
2019-09-03 22:16:30 -07:00
Peter Yeh
8df83ce559 Bump Gloo (#23400)
Summary:
Feature includes

- Log message if bind(2) fail
- Make collective work with single process context
- Use hipStreamCreateWithFlags instead of hipStreamCreateWithPriority
- Add RCCl support
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23400

Differential Revision: D16623110

Pulled By: bddppq

fbshipit-source-id: e75cd8d2e2cad551ad0b0a08667320d7036b78bd
2019-08-02 11:26:28 -07:00
Johannes M Dieterich
4b78ce1ba4 Clean cmake infrastructure up (#23527)
Summary:
Only check for cmake dependencies we directly depend on (e.g., hipsparse but not rocsparse)

Use cmake targets for ROCm where possible.

While there, update the docker CI build infrastructure to only pull in packages by name we directly depend on (anticipating the demise of, e.g., miopengemm). I do not anticipate a docker rebuild to be necessary at this stage as the changes are somewhat cosmetic.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23527

Differential Revision: D16561010

Pulled By: ezyang

fbshipit-source-id: 87cd9d8a15a74caf9baca85a3e840e9d19ad5d9f
2019-07-30 07:26:48 -07:00
Johannes M Dieterich
1607bb322d Support all ROCm supported uarchs simultaneously: gfx803, gfx900, gfx906 (#17367)
Summary:
Correct misspelled flag.

Remove dependency on debug flag (HCC_AMDGPU_TARGET)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17367

Differential Revision: D14227334

Pulled By: bddppq

fbshipit-source-id: d838f219a9a1854330b0bc851c40dfbba77a32ef
2019-02-26 11:54:07 -08:00
bddppq
c063a33ef3 Add support to build for multiple amd gpu targets (#17329)
Summary:
iotamudelta petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17329

Differential Revision: D14161277

Pulled By: bddppq

fbshipit-source-id: f3eb9f52e96a8fcd779c57df0f8c9a2c54754e35
2019-02-20 18:45:24 -08:00
Junjie Bai
0d7a986da1 Change hip filename extension to .hip (#14036)
Summary:
xw285cornell

- To make hip files to have unique filename extension we change hip files from _hip.cc to .hip (it's the only blessing option other than .cu in hipcc 3d51a1fb01/bin/hipcc (L552)).
- Change to use host compiler to compile .cc|.cpp files. Previously we use hcc to compile them which is unnecessary
- Change the hipify script to not replace "gpu" with "hip" in the filename of the generated hipified files. Previously we do this because hcc has a bug when linking files that have same filename. We have now changed to use host linker to do linking so this is unnecessary anymore.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14036

Reviewed By: xw285cornell

Differential Revision: D13091813

Pulled By: bddppq

fbshipit-source-id: ea3d887751d8abb39d75f5d5104aa66ce66b9ee0
2018-11-16 11:55:59 -08:00
Johannes M Dieterich
53a3c46950 Switch to packaged Thrust on Ubuntu, enable CentOS 7.5 as a CI target (#12899)
Summary:
1) Use the hip-thrust version of Thrust as opposed to the GH master. (ROCm 267)

2) CentOS 7.5 docker (ROCm 279)

* Always install the libraries at docker creation for ubuntu.
* Add Dockerfile for CentOS ROCm
* Enable the centos build
* Source devtoolset in bashrc
* Set locales correctly depending on whether we are on Ubuntu or CentOS
* Install a newer cmake for CentOS
* Checkout thrust as there is no package for CentOS yet.

PyTorch/Caffe2 on ROCm passed tests: https://github.com/ROCmSoftwarePlatform/pytorch/pull/280

For attention: bddppq ezyang

Docker rebuild for Ubuntu not urgent (getting rid of Thrust checkout and package install is mainly cosmetic). If docker for CentOS 7.5 is wanted, build is necessary. Build of PyTorch tested by me in CentOS docker. PyTorch unit tests work mostly, however, a test in test_jit causes a python recursion error that seems to be due to the python2 on CentOS as we haven't ever seen this on Ubuntu - hence please do not enable unit tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12899

Differential Revision: D13029424

Pulled By: bddppq

fbshipit-source-id: 1ca8f4337ec6a603f2742fc81046d5b8f8717c76
2018-11-12 14:39:54 -08:00
Johannes M Dieterich
957142a4fe switch ROCm CI targets to white rabbit release (#12577)
Summary:
* switches docker files over to white rabbit release - removed custom package installs
* skips five tests that regressed in that release
* fixes some case-sensitivity issues in ROCm supplied cmake files by sed'ing them in the docker
* includes first changes to the infrastructure to support upcoming hip-clang compiler
* prints ROCm library versions as part of the build (as discussed w/ ezyang )
* explicitly searches for miopengemm
* installs the new hip-thrust package to be able to remove the explicit Thrust checkout in a future revision
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12577

Differential Revision: D10350165

Pulled By: bddppq

fbshipit-source-id: 60f9c9caf04a48cfa90f4c37e242d944a175ab31
2018-10-11 18:03:11 -07:00
iotamudelta
b572e27502 Fix types and warp sizes for ROCm (ROCm 256) (#12485)
Summary:
* Correct the warp size for current AMD GPUs
* Fix copy paste error in configure
* Correct the wrong typing explicitly

bddppq ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12485

Differential Revision: D10262490

Pulled By: ezyang

fbshipit-source-id: 93467944247ed764d9ac9f7bb212a94fc250608e
2018-10-09 12:53:48 -07:00
iotamudelta
9de2085806 Use custom hcc/HIP, purge hcSPARSE (#11198)
Summary:
* purge hcSPARSE now that rocSPARSE is available
* integrate a custom hcc and HIP
* hcc brings two important compiler fixes (fixes hundreds of unit tests)
* HIP brings a smart dispatcher that allows us to avoid a lot of static_casts (we haven't yet removed the automatic static_casts but this catches some occurrences the script did not catch)
* mark 5 unit tests skipping that have regressed w/ the new hcc (we don't know yet what is at fault)
* optimize bitonic sort - the comparator is always an empty struct - therefore passing it by value saves at least 3 bytes. It also removes an ambiguity around passing references to `__global__` functions
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11198

Differential Revision: D9652340

Pulled By: ezyang

fbshipit-source-id: f5af1d891189da820e3d13b7bed91a7a43154690
2018-09-06 19:38:07 -07:00
iotamudelta
33c7cc13ca improve docker packages, fix bugs, enable tests, enable FFT (#10893)
Summary:
* improve docker packages (install OpenBLAS to have at-compile-time LAPACK functionality w/ optimizations for both Intel and AMD CPUs)
* integrate rocFFT (i.e., enable Fourier functionality)
* fix bugs in ROCm caused by wrong warp size
* enable more test sets, skip the tests that don't work on ROCm yet
* don't disable asserts any longer in hipification
* small improvements
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10893

Differential Revision: D9615053

Pulled By: ezyang

fbshipit-source-id: 864b4d27bf089421f7dfd8065e5017f9ea2f7b3b
2018-09-02 08:54:42 -07:00
iotamudelta
75651d5b58 improve use of ROCm libraries, enable more tests, small fixes (#10406)
Summary:
* some small leftovers from the last PR review
* enable more unit test sets for CI
* replace use of hcRNG w/ rocRAND (docker image was already updated w/ newer rocRAND)
* use rocBLAS instead of hipBLAS to allow convergence w/ Caffe2
* use strided_batched gemm interface also from the batched internal interface
* re-enable Dropout.cu as we now have philox w/ rocRAND
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10406

Reviewed By: Jorghi12

Differential Revision: D9277093

Pulled By: ezyang

fbshipit-source-id: 7ef2f6fe4ead77e501ed7aea5c3743afe2466ca2
2018-08-13 11:39:43 -07:00
iotamudelta
a38b572de3 enable unit tests and other changes (#10266)
Summary:
This PR for the ROCm target does the following:
* enable some unit tests on ROCm
* fix a missing static_cast that breaks BatchNorm call on ROCm
* fix BatchNorm to work on ROCm w/ ROCm warp sizes etc
* improve the pyhipify script by introducing kernel scope to some transpilations and other improvements
* fix a linking issue on ROCm
* for more unit test sets: mark currently broken tests broken (to be fixed)
* enable THINLTO (phase one) to parallelize linking
* address the first failing of the elementwise kernel by removing non-working ROCm specialization
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10266

Differential Revision: D9184178

Pulled By: ezyang

fbshipit-source-id: 03bcd1fe4ca4dd3241f09634dbd42b6a4c350297
2018-08-06 14:54:01 -07:00
iotamudelta
cfa05706ef ROCm contributions week 29 (#9653)
Summary:
In this changeset:
* improvements to `hipify-python.py`
* marking unit tests broken for ROCm
* reducing the number of jobs for the built to avoid out of memory issues
* switch to Thrust/cub-hip master for the CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9653

Differential Revision: D9117791

Pulled By: ezyang

fbshipit-source-id: a6c3c7b81f2bda9825974bf9bf89a97767244352
2018-08-02 09:09:00 -07:00
Junjie Bai
99dbcd0451 set CMAKE_HIP_ARCHIVE_APPEND (#9394)
Summary:
petrex

To make `-DBUILD_SHARED_LIBS=OFF` working
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9394

Reviewed By: mingzhe09088

Differential Revision: D8822947

Pulled By: bddppq

fbshipit-source-id: 4fb213c723138804fb0fdb3b381e32623cf14468
2018-07-12 12:24:49 -07:00
bddppq
a8bf30d7a5
caffe2 hip python binding (#8491)
* caffe2 hip python binding

* Change back onnx submodule
2018-06-14 19:56:56 -07:00
Jorghi12
8a837f0fe3 Repairing the integrated build path to handle the Caffe2 PR. (#8441)
* Modifying the build path to handle Caffe2's merge

* Update LoadHIP.cmake

Fixing typo.

* Update Dependencies.cmake

Keeping hip_include_directories since other Caffe2 libs depend on it.

* Update CMakeLists.txt

Only including for the second time if we're building with ATen.

* Update CMakeLists.txt

Adding comments to make sure future users understand why necessary commands have been added.
2018-06-13 17:16:59 -04:00
Peter Yeh
c37e5b7137 [Caffe2] Enable AMD/MIOPEN ops for Caffe2 (#8306)
* Add hip support for caffe2 core

* Add MIOPEN header/wrapper to caffe2 core

* Add HIP device into caffe2 PB

* top level makefile change for rocm/hip

* makefile scaffolding for AMD/RocM/HIP

* Makefile scafodding for AMD/RocM/HIP; add makefile/utility for HIP files

* caffe2 PB update for AMD/ROCM HIP device

* Add AMD/RocM/Thrust dependency

* HIP threadpool update

* Fix makefile macro

* makefile fix: duplicate test/binary name

* makefile clean-up

* makefile clean-up

* add HIP operator registry

* add utilities for hip device

* Add USE_HIP to config summary

* makefile fix for BUILD_TEST

* merge latest

* Fix indentation

* code clean-up

* Guard builds without HIP and use the same cmake script as PyTorch to find HIP

* Setup rocm environment variables in build.sh (ideally should be done in the docker images)

* setup locale

* set HIP_PLATFORM

* Revert "set HIP_PLATFORM"

This reverts commit 8ec58db2b390c9259220c49fa34cd403568300ad.

* continue the build script environment variables mess

* HCC_AMDGPU_TARGET

* Cleanup the mess, has been fixed in the lastest docker images

* Assign protobuf field hip_gpu_id a new field number for backward compatibility

* change name to avoid conflict

* Fix duplicated thread pool flag

* Refactor cmake files to not add hip includes and libs globally

* Fix the wrong usage of environment variables detection in cmake

* Add MIOPEN CNN operators

* Revert "Add MIOPEN CNN operators"

This reverts commit 6e89ad4385b5b8967a7854c4adda52c012cee42a.

* Add MIOPEN pooling operator

* Add MIOPEN activation operator

* Add MIOPEN softmax operator

* Add MIOPEN spatial batch norm operator

* Add MIOPEN loacl response normalization operator

* Add MIOPEN conv operator

* Clean-up LRN ops

* enable fp16 in MIOPEN pool ops

* Enable fp16 for MIOPEN relu op

* Enable fp16 for MIOPEN spatial batch norm op

* code clean-up

* revert float16 support

* Create Caffe2 python binding for AMD/ROCM/HIP

* Add op fallback for HIP operator

* add hip src/test files in cmake

* exclude hip src/test files

* fix python binding for hip backend

* fix MIOPEN pooling op workspace

* hack to compile miopen operators

* fix include path for MIOPEN ops

* Fix include path

* Add HIP math utilities

* Fix path for HIP math utils

* cmake fix

* Cmake fix / hipcc for hip files

* suppress hipcc warning

* cmake fix /replcae USE_HIP with USE_ROCM

* revert LoadHIP.cmake change

* fix include for thrust/cub-hip

* include path fix for conversion.h

* Updated with latest upstream changes

* clang format fixes

* Context_hip updates

* Fixed typo in rocblas handle get function

* Updated hipified math utils

* Updated math hip test util

* Updated context hip test

* Updated common_hip

* Updated net async dag for HIP

* Added MIOPEN in operator hip test

* fix

* C2 dependencies clean-up

* fix include path for building custom protobuf

* Decouple miopen pool op and conv_pool_op base

* cmake refactor

* fix operator_hip_test

* move all hip/miopen ops files into caffe2/operators/hip

* sanitize cmake

* permission issue

* remove extra parenthesis

* remove artifact from resolving merge conflict

* cont. sanitize cmake files

* fix syntax error

* sanitize conversion.h

* .

* Revert "."

This reverts commit 56020cb0e996a31ae27bf1f8f491955ed0b121b9.

* clang-format
2018-06-13 04:00:39 -07:00