Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18598
ghimport-source-id: c74597e5e7437e94a43c163cee0639b20d0d0c6a
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18598 Turn on F401: Unused import warning.**
This was requested by someone at Facebook; this lint is turned
on for Facebook by default. "Sure, why not."
I had to noqa a number of imports in __init__. Hypothetically
we're supposed to use __all__ in this case, but I was too lazy
to fix it. Left for future work.
Be careful! flake8-2 and flake8-3 behave differently with
respect to import resolution for # type: comments. flake8-3 will
report an import unused; flake8-2 will not. For now, I just
noqa'd all these sites.
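For illustration, a minimal sketch of the `# type:` comment case. The function `names` is hypothetical and not code from this PR. The import is used only inside the type comment, so a checker that does not resolve names in type comments sees it as unused, and the site needs a noqa:

```python
from typing import List  # noqa: F401


def names():
    # type: () -> List[str]
    # List is referenced only in the type comment above, never at runtime.
    return ["flake8", "noqa"]
```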
All the changes were done by hand.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: D14687478
fbshipit-source-id: 30d532381e914091aadfa0d2a5a89404819663e3
Summary:
hip-clang uses triple chevron kernel dispatch syntax. Add an option to the hipification script to skip translating triple chevron to hipLaunchKernelGGL.
Once we switch to hip-clang, this option will be default and subsequently removed.
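A rough sketch of what the option controls, assuming a much simplified translator. `translate_launch`, `LAUNCH_RE`, and the `hip_clang_launch` parameter name are illustrative and not taken from the hipification script. The sketch ignores templated kernels, launch configurations with commas inside parentheses, and kernels that take no arguments:

```python
import re

# Simplified pattern: kernel name, launch configuration, opening parenthesis.
LAUNCH_RE = re.compile(r"(\w+)\s*<<<\s*(.*?)\s*>>>\s*\(")


def translate_launch(source, hip_clang_launch=False):
    """Rewrite `k<<<grid, block>>>(args)` unless translation is skipped."""
    if hip_clang_launch:
        # hip-clang accepts the triple chevron syntax, so leave it alone.
        return source

    def repl(match):
        kernel = match.group(1)
        config = [c.strip() for c in match.group(2).split(",")]
        # Shared-memory size and stream default to 0 when omitted.
        config += ["0"] * (4 - len(config))
        grid, block, shmem, stream = config
        return "hipLaunchKernelGGL({}, dim3({}), dim3({}), {}, {}, ".format(
            kernel, grid, block, shmem, stream
        )

    return LAUNCH_RE.sub(repl, source)
```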
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17686
Differential Revision: D14327810
Pulled By: bddppq
fbshipit-source-id: 5e1512325077dd3ebb8fb9b5bf35fd1f8d9a4dc3
Summary:
HIPGuard interfaces that interacted with HIPStream were previously
totally busted (because the streams had the wrong device type).
This fixes it, following along the same lines as MasqueradingAsCUDA.
Along the way I beefed up the explanatory comment.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
cc jithunnair-amd iotamudelta bddppq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17469
Differential Revision: D14243396
Pulled By: ezyang
fbshipit-source-id: 972455753a62f8584ba9ab194f9c785db7bb9bde
Summary:
In light of the antistatic feature being a part of the released ROCm 2.1, remove
the feature in pyHIPIFY for extraction of kernel arguments and insertion of
static_casts.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17055
Differential Revision: D14068478
Pulled By: bddppq
fbshipit-source-id: 6895f490c78247a129aa18c520ff8d4d1a3d3642
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17076
OSS: slightly change tools/amd_build/build_amd.py to add the output_directory for internal use. Also modify the renaming convention in the hipify script to reflect the updated rules.
Reviewed By: bddppq
Differential Revision: D13767218
fbshipit-source-id: cbcadc51daab42197d545f204840dcc18176bb3d
Summary:
Rehash of previous attempts. This tries a different approach where we accept the install as specified in cmake (leaving bin/, include/, and lib/ alone), and then try to adjust the rest of the files to this more standard layout.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16414
Differential Revision: D13863635
Pulled By: zdevito
fbshipit-source-id: 23725f5c64d7509bf3ca8f472dcdcad074de9828
Summary:
Some renaming and renamespacing also took place. I was originally planning not to do anything, but it turns out that it was easier to make HIPify work by using a namespace CUDACachingAllocator:: rather than THCCachingAllocator_, since :: is a word boundary but _ is not.
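The word-boundary point can be seen with a small regex experiment. This is only a sketch: `rename_token` is a hypothetical helper, and the replacement names and member names are illustrative rather than the actual HIPify mappings.

```python
import re


def rename_token(source, old, new):
    """Replace `old` only where it stands as a whole word."""
    return re.sub(r"\b{}\b".format(re.escape(old)), new, source)


# `::` is not a word character, so the namespace name is a token of its own.
namespaced = rename_token(
    "CUDACachingAllocator::raw_alloc(n)",
    "CUDACachingAllocator",
    "HIPCachingAllocator",
)

# `_` is a word character, so the prefix is fused with the rest of the
# identifier and the whole-word pattern never matches.
prefixed = rename_token(
    "THCCachingAllocator_recordStream(p)",
    "THCCachingAllocator",
    "THHCachingAllocator",
)
```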
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16119
Reviewed By: smessmer
Differential Revision: D13718768
fbshipit-source-id: 884a481d99027fd3e34471c020f826aa12225656
Summary:
Hello,
This is a little patch to fix `DeprecationWarning: invalid escape sequence`.
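As a small illustration of the kind of fix involved (the patterns here are made up, not the ones touched by the patch): `\d` is not a recognized string escape, so a plain string literal containing it triggers the warning, while a raw string or an escaped backslash does not.

```python
import re

# A raw string passes the backslash through to the regex engine untouched.
DIGITS = re.compile(r"\d+")

# Equivalent spelling without a raw string: escape the backslash explicitly.
DIGITS_ESCAPED = re.compile("\\d+")
```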
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15733
Differential Revision: D13587291
Pulled By: soumith
fbshipit-source-id: ce68db2de92ca7eaa42f78ca5ae6fbc1d4d90e05
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14248
This diff also introduces a horrifying hack to override CUDA's DeviceGuardImpl
with a HIPGuardImplMasqueradingAsCUDA, to accommodate PyTorch's current
behavior of pretending CUDA is HIP when you build with ROCm enabled.
Reviewed By: bddppq
Differential Revision: D13145293
fbshipit-source-id: ee0e207b6fd132f0d435512957424a002d588f02
Summary:
```
This diff changes the HIPification of ATen to be out-of-place.
We now have the following mappings:
- ATen/cuda => ATen/hip
- ATen/native/cuda => ATen/native/hip
- ATen/native/sparse/cuda => ATen/native/sparse/hip
- THC => THH
- THCUNN => THHUNN
The build system is adjusted to know about these new build paths,
and HIPify is taught how to adjust include paths and
THC_GENERIC_FILE appropriately. ATen_hip is now built as
the ATen_hip library, rather than reusing ATen_cuda.
However, despite these new filepaths, none of the identifiers in ATen
have actually changed. So, e.g., THHGeneral.h still defines functions
named THC_blahblah, and HIP still shows up as CUDA in PyTorch itself.
We'll tackle this in a subsequent PR; this diff is just to get the files
out-of-place.
Minor extra improvements:
- Don't edit tmp_install when hipifying
- HIP no longer builds native_cudnn_cpp; it was unnecessary
- Caffe2_HIP_INCLUDES is now Caffe2_HIP_INCLUDE, for consistency
with all the other variables.
- HIP build now properly respects ATEN_CUDA_FILES_GEN_LIB (it
did not previously.)
- You can now override file extension matching in pyHIPIFY
by explicitly specifying its full name in the matching list.
This is used so we can HIPify CMakeLists.txt in some situations.
A little bit of string and sealing wax:
- gen.py grows a --rocm flag so that it knows to generate CUDA
files which actually refer to the HIP headers (e.g., THH.h)
We'll get rid of this eventually and generate real HIP files,
but not for this PR.
- Management of HIP dependencies is now completely deleted
from the ATen CMakeLists.txt. The old code was dead (because
it was shoveled in ATen_CUDA_DEPENDENCY_LIBS and promptly
ignored by the Caffe2 build system) and didn't actually work.
```
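The directory mappings above can be sketched as a small path-rewriting helper. `PATH_MAPPINGS` and `hipified_path` are hypothetical names; the sketch only rewrites directory components and does not attempt the file renaming or include-path adjustment that HIPify performs.

```python
# Directory mappings taken from the list in the summary above.
PATH_MAPPINGS = [
    ("ATen/native/sparse/cuda", "ATen/native/sparse/hip"),
    ("ATen/native/cuda", "ATen/native/hip"),
    ("ATen/cuda", "ATen/hip"),
    ("THCUNN", "THHUNN"),
    ("THC", "THH"),
]


def hipified_path(path):
    """Return the out-of-place destination for a '/'-separated relative path."""
    padded = "/" + path  # lets a mapping match at the very start of the path
    for cuda_dir, hip_dir in PATH_MAPPINGS:
        needle = "/" + cuda_dir + "/"
        if needle in padded:
            # Match whole directory components only, and only the first hit.
            return padded.replace(needle, "/" + hip_dir + "/", 1)[1:]
    return path  # not under a mapped directory: left where it is
```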
Stacked on https://github.com/pytorch/pytorch/pull/14849 review last commit only
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14866
Differential Revision: D13419475
Pulled By: ezyang
fbshipit-source-id: cb4c843df69a1d8369314c9fab1b7719520fa3db
Summary:
cc iotamudelta
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14993
Differential Revision: D13405804
Pulled By: ezyang
fbshipit-source-id: c4aa9ed29ee2a4f3abf76c1e0fa8babfd738db35
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
cc iotamudelta
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14999
Differential Revision: D13405754
Pulled By: ezyang
fbshipit-source-id: 98459496494390ad1115b4f1f6738d53c14f0745
Summary:
Latest hcc can now properly cast to correct type internally, so there is no need to insert static_cast in hipify scripts anymore.
However, the hcc included in the latest ROCm release (1.9.2) doesn't have this fix, so this leaves a flag to continue doing static_cast for those using the official ROCm releases.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14853
Differential Revision: D13363171
Pulled By: bddppq
fbshipit-source-id: a36476a8511222ff3c933d31788e8a0ffb04f5ca
Summary:
```
Improve performance of pyHIPIFY
Changes:
- Pre-compile regexes, don't use regexes when it's not necessary
(this saves us ~15%)
- Compile all substitutions for mappings into a single, non-backtracking
regex using a Trie. This gives big savings.
Before, running pyHIPIFY on all files took 15.8s. Now it takes 3.9s.
```
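A minimal sketch of the Trie idea, not the pyHIPIFY implementation: all names to substitute are stored in a prefix tree, the tree is rendered as one regex whose alternatives share common prefixes, and a single pass over the source performs every substitution. The helper names (`build_trie`, `trie_to_regex`, `compile_mappings`, `substitute`) are hypothetical, and the three mappings in the usage lines are just examples.

```python
import re

END = ""  # marker key: a mapped name ends at this trie node


def build_trie(words):
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def trie_to_regex(node):
    branches = [
        re.escape(ch) + trie_to_regex(node[ch])
        for ch in sorted(k for k in node if k != END)
    ]
    if not branches:
        return ""  # leaf: nothing left to match
    group = "(?:" + "|".join(branches) + ")"
    # A name may also end here, so the longer continuation is optional.
    return group + "?" if END in node else group


def compile_mappings(mappings):
    # Assumes `mappings` is a non-empty dict of old name -> new name.
    return re.compile(r"\b" + trie_to_regex(build_trie(mappings)) + r"\b")


def substitute(source, mappings, pattern):
    # One pass over the source; each match is looked up in the dict.
    return pattern.sub(lambda m: mappings[m.group(0)], source)


mappings = {"cudaMalloc": "hipMalloc", "cudaMemcpy": "hipMemcpy", "cudaFree": "hipFree"}
pattern = compile_mappings(mappings)  # compile once, reuse for every file
result = substitute("cudaMalloc(&p, n); cudaFree(p);", mappings, pattern)
```

Because each character position in the pattern offers at most one branch per next character, the regex engine does not have to retry long lists of alternatives the way it would with a flat `a|b|c|...` pattern.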
Stacked on #14769
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14803
Differential Revision: D13342620
Pulled By: ezyang
fbshipit-source-id: 1cfa36b3236bbe24d07080a31cc788a52d740f40
Summary:
I need to preserve ability to HIPify out-of-place files
only, so build_amd.py grows a --out-of-place-only flag.
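A minimal sketch of such a flag with argparse. Only the flag name `--out-of-place-only` comes from this PR; the parser description and help text are assumptions.

```python
import argparse

parser = argparse.ArgumentParser(description="Sketch of the build_amd.py command line")
parser.add_argument(
    "--out-of-place-only",
    action="store_true",
    help="HIPify only the files that are written out-of-place",
)

# argparse exposes the flag as `out_of_place_only`; it defaults to False.
args = parser.parse_args(["--out-of-place-only"])
```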
Stacked on #14757
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14769
Differential Revision: D13340154
Pulled By: ezyang
fbshipit-source-id: 1b855bc79e824ea94517a893236fd2c8ba4cb79d
Summary:
This is towards unifying build_pytorch_amd.py and build_caffe2_amd.py
scripts. There is only one use of hipify_caffe2 left, which is just
to control which files actually get HIPified.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14757
Differential Revision: D13323486
Pulled By: ezyang
fbshipit-source-id: 958cd91be32dfc3c0a9ba9eda507adb5937aebcd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13756
This implements general Gather operator for arbitrary axis, sharing the code with BatchGather.
- CPU gather & batch gather logic is now shared through caffe2::gather_helper, for any axis.
- Shared CUDA kernel moved to gather_op.cuh, for any axis.
- Gradients of axis > 0 delegate to BatchGatherGradientOp which now has axis argument.
- BatchGatherOp doc strings updated to have correct rank (q + (r -1)) and output.
- Added tests for axis == 2.
GatherOp supports index wrapping for axis == 0 by default, which was added earlier for ONNX.
This diff also extends it to work in the CUDA kernel. Added a "wrap_indices" argument which specifies
whether this wrapping should be done; set it to true if you'd like wrapping for any axis.
TBD: Update gradients to support negative indices (separate diff).
TBD: Once we have operator versioning, we'd like to update GatherOp to NOT support axis 0 wrapping
by default, but rather do it only if wrap_indices is set.
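To make the axis and wrapping semantics concrete, here is a pure-Python sketch on nested lists. It is not the Caffe2 operator: `gather` is a hypothetical helper, the indices are a flat list, and it wraps negative indices only when `wrap_indices` is set (it does not reproduce the default wrapping for axis 0).

```python
def gather(data, indices, axis=0, wrap_indices=False):
    """Pick entries of a nested list along `axis`."""
    if axis > 0:
        # Descend one level and gather along the next axis of each slice.
        return [gather(sub, indices, axis - 1, wrap_indices) for sub in data]
    size = len(data)
    out = []
    for i in indices:
        if wrap_indices and i < 0:
            i += size  # -1 refers to the last entry along this axis
        if not 0 <= i < size:
            raise IndexError("index {} out of range for size {}".format(i, size))
        out.append(data[i])
    return out


data = [[1, 2, 3], [4, 5, 6]]
rows = gather(data, [1, 0])                              # axis 0: whole rows
cols = gather(data, [0, -1], axis=1, wrap_indices=True)  # axis 1 with wrapping
```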
Reviewed By: dzhulgakov
Differential Revision: D12983815
fbshipit-source-id: 8add9d67b47fe8c5ba7a335f581ca0530b205cd7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13920
I want to move CUDAStream and CUDAGuard to c10_cuda without also
bringing along CUDAContext or CUDAEvent for the ride (at least for
now). To do this, I need to eliminate those dependencies.
There are a few functions in CUDAContext.h which don't really need
THCState, so they're separated out and put in general
purpose c10/cuda/CUDAFunctions.h
Reviewed By: smessmer
Differential Revision: D13047468
fbshipit-source-id: 7ed9d5e660f95805ab39d7af25892327edae050e
Summary:
xw285cornell
- To give hip files a unique filename extension, we change hip files from _hip.cc to .hip (it's the only blessed option other than .cu in hipcc 3d51a1fb01/bin/hipcc (L552)).
- Change to use the host compiler to compile .cc|.cpp files. Previously we used hcc to compile them, which is unnecessary.
- Change the hipify script to not replace "gpu" with "hip" in the filename of the generated hipified files. Previously we did this because hcc has a bug when linking files that have the same filename. We now use the host linker to do the linking, so this is no longer necessary.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14036
Reviewed By: xw285cornell
Differential Revision: D13091813
Pulled By: bddppq
fbshipit-source-id: ea3d887751d8abb39d75f5d5104aa66ce66b9ee0
Summary:
- Speed up hipify_python.py by blacklisting useless (and quite large)
directory trees that it would otherwise recurse into
- Pass around relative paths instead of absolute paths. This makes it
easier to do filename matches based on the root of the tree.
- Redo the streaming output to contain more useful information
- Make it handle c10/cuda correctly, rewrite c10::cuda to
c10::hip, and the header name from CUDAMathCompat.h to
CUDAHIPCompat.h
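The first two points can be sketched with `os.walk`. This is an assumption-laden illustration, not the hipify_python.py code: `iter_source_files`, the ignored directory names, and the extension list are all hypothetical.

```python
import os


def iter_source_files(root,
                      ignored_dirs=("third_party", "build", ".git"),
                      extensions=(".cu", ".cuh", ".h", ".cpp")):
    """Yield paths relative to `root`, never descending into ignored trees."""
    for dirpath, dirnames, filenames in os.walk(root, topdown=True):
        rel_dir = os.path.relpath(dirpath, root)
        # Pruning `dirnames` in place stops os.walk from recursing into them.
        dirnames[:] = [
            d for d in dirnames
            if os.path.normpath(os.path.join(rel_dir, d)) not in ignored_dirs
        ]
        for name in filenames:
            if name.endswith(extensions):
                # Relative paths make matches against the tree root simple.
                yield os.path.normpath(os.path.join(rel_dir, name))
```

Working with relative paths means an ignore entry such as `third_party` refers to one directory at the root of the tree, regardless of where the checkout lives on disk.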
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13973
Differential Revision: D13062374
Pulled By: ezyang
fbshipit-source-id: f0858dd18c94d449ff5dbadc22534c695dc0f8fb
Summary:
1) Use the hip-thrust version of Thrust as opposed to the GH master. (ROCm 267)
2) CentOS 7.5 docker (ROCm 279)
* Always install the libraries at docker creation for ubuntu.
* Add Dockerfile for CentOS ROCm
* Enable the centos build
* Source devtoolset in bashrc
* Set locales correctly depending on whether we are on Ubuntu or CentOS
* Install a newer cmake for CentOS
* Checkout thrust as there is no package for CentOS yet.
PyTorch/Caffe2 on ROCm passed tests: https://github.com/ROCmSoftwarePlatform/pytorch/pull/280
For attention: bddppq ezyang
Docker rebuild for Ubuntu is not urgent (getting rid of the Thrust checkout and package install is mainly cosmetic). If a docker image for CentOS 7.5 is wanted, a build is necessary. I tested the PyTorch build in the CentOS docker. PyTorch unit tests mostly work; however, a test in test_jit causes a Python recursion error that seems to be due to the python2 on CentOS, as we have never seen this on Ubuntu. Hence, please do not enable unit tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12899
Differential Revision: D13029424
Pulled By: bddppq
fbshipit-source-id: 1ca8f4337ec6a603f2742fc81046d5b8f8717c76
Summary:
This PR contains changes for:
1. Adding HIP top_k operator in Caffe2
2. Adding HIP-equivalent definitions of GPUDefs and GPUScanUtils
3. Removing the top_k operator test from ROCm test ignore list
4. Bug fixes in related code in THC/THCAsmUtils.cuh
Differential Revision: D12986451
Pulled By: bddppq
fbshipit-source-id: 6d5241fb674eaeb7cde42166426ac88043b83504
Summary:
Since json is a builtin module in Python (>= 2.6), this lets pyhipify
be invoked without installing any extra dependencies.
petrex iotamudelta
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13595
Differential Revision: D12931045
Pulled By: bddppq
fbshipit-source-id: 31d68fb6e730fd9d11593550ca531423cb0596e9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13265
Make changes so that the hipify_python script works with fbcode.
1. Add TARGETS file
2. Make hipify_python a module as well as a standalone script.
Reviewed By: bddppq
Differential Revision: D10851216
fbshipit-source-id: cacd04df6fe2084832256d1916d62dccea86baa9
Summary:
* Enable disabled functions for ROCm (ROCm 252)
* fixes for topk fp16 (ROCm 270)
* HIP needs kernel invocation to be explicitly templated to be able to take non-const arg as const kernel arg (ROCm 281)
For attention: bddppq ezyang
Full set of PyTorch/Caffe2 tests on ROCm here: https://github.com/ROCmSoftwarePlatform/pytorch/pull/283
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12973
Differential Revision: D10516072
Pulled By: bddppq
fbshipit-source-id: 833b3de1544dfa4886a34e2b5ea53d77b6f0ba9e
Summary:
The legal function cublasHandle_t cublas_handle() was hipified to the
clearly illegal rocblas_handle rocblas_handle(). It should not work and
correctly fails with gcc as the host compiler as it induces an
ambiguity.
Function now hipifies to rocblas_handle rocblashandle()
Fixes long standing issue we've observed in PyTorch when base compiler is gcc.
For attention: bddppq ezyang
Tests on ROCm PyTorch/Caffe2: https://github.com/ROCmSoftwarePlatform/pytorch/pull/284
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12957
Differential Revision: D10501227
Pulled By: bddppq
fbshipit-source-id: 568cb80801c0d14c9b1b61e3a7db387a5c21acf4