`torch/csrc/utils.h` should be device-independent. Currently, it contains CUDA-related implementations, which indirectly causes the [failure of ROCm testing](https://github.com/pytorch/pytorch/pull/151914#issuecomment-2839691038) (The reason is that the ROCm test environment shouldn`t expose HIP-related header files, which causes the JIT compilation to fail during testing)
Therefore, move CUDA-related implementations to `torch/csrc/cuda/utils.h`.
**Question:**
This change may introduce BC-breack.
I searched for this function globally on github and I think the impact is very small.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/152521
Approved by: https://github.com/Skylion007, https://github.com/albanD
ghstack dependencies: #152512, #152513
Summary:
These seem to not be needed and cause ninja to rebuild the files at every build.
(There also is THCStorage.cu, but hopefully this will go away with https://github.com/pytorch/pytorch/issues/68556 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69024
Reviewed By: soulitzer
Differential Revision: D32705309
Pulled By: ngimel
fbshipit-source-id: 5255297f213fdcf36e7203de7460a71291f8c9a0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29143
THP_CORE macro is a very old macro that appeared to have served
two purposes:
1. The torch-python equivalent of CAFFE2_BUILD_MAIN_LIB, to toggle
symbol visibility headers
2. Some sort of ad hoc way of hiding certain definitions from headers
so external clients can't get at them.
It did (2) in a very confusing manner, because we set THP_CORE in both
torch and torch-python (it shouldn't do anything in torch). In this
PR I just get rid of use case (2) entirely (so everything shows up in
headers all the time), and then redo (1) using a new THP_BUILD_MAIN_LIB
macro. This cleans up some of the macro definitions and makes my life
easier for working on #27215.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D18309594
Pulled By: ezyang
fbshipit-source-id: adcb6d7cb387cd818480137e2b94e5e761dbfefc
Summary:
1. Added `torch/csrc/cuda/Event.h` and `torch/csrc/cuda/Event.cpp` to bind Python Event class to C++ implementation.
2. Move all CUDA runtime invocations from `torch/cuda/streams.py` to C++
3. Added tests to cover Stream and Event APIs. ~(event IPC handle tests is introduced in #15974)~
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15937
Differential Revision: D13649001
Pulled By: mrshenli
fbshipit-source-id: 84ca58f35f6ba679a4ba33150ceba678d760d240
Summary:
Anywhere we used #include "foo.h", we now say #include <foo.h>
Paths are adjusted to be rooted out of aten/src, torch/lib, or
the root level directory.
I modified CMakeLists.txt by hand to remove TH and THC from
the include paths.
I used the following script to do the canonicalization:
```
import subprocess
import re
import os.path
files = subprocess.check_output(['git', 'ls-files']).decode('utf-8').rstrip().split('\n')
for fn in files:
if not any(fn.endswith(suff) for suff in ['.cu', '.cpp', '.in', '.h', '.hpp', '.cu', '.cuh', '.cc']):
continue
if not any(fn.startswith(pref) for pref in ["aten/", "torch/"]):
continue
with open(fn, 'r') as f:
c = f.read()
def fmt(p):
return "#include <{}>".format(p)
def repl(m):
p = m.group(1)
if p in ["dlfcn.h", "unistd.h", "nvrtc.h", "cuda.h", "cuda_runtime.h", "cstdint", "cudnn.h", "Python.h", "cusparse.h", "cuda_runtime_api.h", "cuda_fp16.h", "cublas_v2.h", "stdint.h", "curand_kernel.h"]:
return fmt(p)
if any(p.startswith(pref) for pref in ["torch/csrc", "c10/", "ATen/", "caffe2/", "TH/", "THC/", "Eigen/", "gtest/", "zdl/", "gloo/", "onnx/", "miopen/"]):
return fmt(p)
for root in ["aten/src", "torch/lib", ""]:
for bad_root in [os.path.dirname(fn), "aten/src/TH", "aten/src/THC", "torch/csrc"]:
new_p = os.path.relpath(os.path.join(bad_root, p), root)
if not new_p.startswith("../") and (os.path.exists(os.path.join(root, new_p)) or os.path.exists(os.path.join(root, new_p + ".in"))):
return fmt(new_p)
print("ERROR: ", fn, p)
return m.group(0)
new_c = re.sub(r'#include "([^"]+)"', repl, c)
if new_c != c:
print(fn)
with open(fn, 'w') as f:
f.write(new_c)
```
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14849
Reviewed By: dzhulgakov
Differential Revision: D13363445
Pulled By: ezyang
fbshipit-source-id: 52361f878a672785f9306c9e9ab2513128092b68
* Port THCS to ATen.
General structure of the sparse implementation:
- SparseCUDATensor.{cpp, cu} and SparseCUDATensorMath.cu contain
the same functions as their CPU analogues
- SparseCUDAApplyUtils.cuh contains what used to be in
THCSTensor.cu
- SparseCUDABlas.cu contains what used to be THCSparse.cu
Unrelated improvements:
- Forward declared CUDA types in Context.h are now moved
exclusively to CUDAHooks
- New getCurrentCUDASparseHandle in Context
- Support for printing CUSPARSE_STATUS_ZERO_PIVOT error message
directly
Some unusual pieces:
- get_device got the LegacyBridge makeover, as it needs special
logic on sparse tensors (defer to the inner tensors).
- I noticed that I need to turn off device_guard codegen
for many functions in sparse, noticed because get_device
became a native function, and resulted in an infinite recursion. This was
done by adding device_guard: False to the native definitions. An alternative
strategy might be to make the heuristic for deciding when to put in a device
guard more clever.
Scaffolding removal:
- LegacyBridge now special-cases only on sparse versus dense;
no more CUDA test (hooray!)
- Native bindings get CUDA/SparseCUDA dispatch entries.
CPU sparse refactoring:
- New SparseUtils.h header, with all of the utility functions that
used to live in SparseTensor.cpp
- new_with_tensor_sparse now correctly handles both CPU and CUDA
- transpose functions in sparse/ turned out to be dead, so I killed them
Bugs I noticed while working on this:
- I used accessor<...>() on a CUDA tensor, because I thought it does
the CUDA-CPU sync. It does not.
Last mile changes:
- I killed all of the THS/THCS directories, build scripts, bindings everything.
It is now no more!
- A bunch of trampolines in LegacyBridge are no more; anything
that was "sparse only" is now done natively.
- `sparse_coo_tensor` is implemented a little funny, but we think
it's a good idea.
- HIP is handled by explicitly ifdef'ing out all kernels; we'll add support
for this at some later point in time.
- TH_INDEX_BASE is now unconditionally set to 0.
- Some uses of x.type() now replaced with x.options(), the new way of doing it.
- More notes about checked_cast_tensor, and eliminate Storage/Tensor fields in
the code gen env when they are dead.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Remove remaining TensorTypeUtils functions.
Mostly what's remaining is copy utilities -- these are now provided in THCTensorCopy.hpp and templatized on the ScalarType rather than the TensorType.
* Have a single THTensor / THCTensor type.
As was previously done with Storages, have only a single (dtype-independent) THTensor / THCTensor.
For documentation and backwards compatibility purposes, the old names, e.g. TH(Cuda)LongTensor alias the new TH(C)Tensor type.
* undef GENERATE_SPARSE.
This deletes most of the dead Tensor code paths, including the TensorMethods cwrap and generic/Tensor.cpp.
This also moves the THNN.cwrap/.cpp generation to generate_code which can use ninja if installed.
Primary things I had to fix:
- Suppress _XOPEN_SOURCE warnings by ensuring that Python.h is included
first, because it always unconditionally defines this macro.
- Turn off strict aliasing, because Python 2 doesn't work with strict
aliasing.
- Workaround setuptools bug, where it's incorrectly passing
-Wstrict-prototypes to C++ compilers (where this doesn't make
any sense)
To compile csrc with -Werror, run `CFLAGS="-Werror" python setup.py build_ext`
Signed-off-by: Edward Z. Yang <ezyang@fb.com>