#75854
A naive attempt at working around the limitations of using a single 64-bit integer to pack `stream_id`, `device_index`, and `device_type`.
Stills needs sanity checks, testing, and minimization of BC-breaking changes.
Currently a Holder for the `StreamData3` struct is used for `IValue` compatibility. While doing this seems to work for `ivalue.h` and `ivalue_inl.h`, this doesn't seem to be naively working for the JIT CUDA stream wrapper? (Something about ambiguous calls if an `intrusive_ptr` to `c10::ivalue::StreamData3Holder` is used as the return type for `pack()`. It turns out that the methods required to access the fields for rematerializing a CUDA Stream are basically already present anyway, so `pack` is simply removed in the wrapper for now and the methods to access the required fields are called directly.
CC @ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81596
Approved by: https://github.com/ezyang
Summary:
In my last PR I've missed CUDA and distributed folders, fixing this now
This change is autogenerated by `python tool/clang_tidy.py -s`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57235
Reviewed By: janeyx99
Differential Revision: D28084444
Pulled By: malfet
fbshipit-source-id: bf222f69ee90c7872c3cb0931e8cdb84f0cb3cda
Summary:
Following up on this: https://github.com/pytorch/pytorch/pull/35851 cross dtype storage copy is not being used internally, so I have not included cross dtype copy for complex.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35771
Differential Revision: D21319650
Pulled By: anjali411
fbshipit-source-id: 07c72996ee598eba0cf401ad61534494d6f5b5b3
Summary:
This is the first commit from a series of planned changes in order to add boolean tensors to PyTorch. The whole plan looks like this:
0. Storage Implementation (this change)
1. Tensor Creation.
2. Tensor Conversions.
3. Tensor Indexing.
4. Tensor Operations.
5. Back compatibility related changes.
This feature was requested by the community:
https://github.com/pytorch/pytorch/issues/4764https://github.com/pytorch/pytorch/issues/4219https://github.com/pytorch/pytorch/issues/4288
**Change**:
Added boolean type to the Storage class for CPU and CUDA backends.
**Tested via**:
1. unit tests
2. running this:
-> import torch
-> torch.BoolStorage
<class 'torch.BoolStorage'>
-> torch.cuda.BoolStorage
<class 'torch.cuda.BoolStorage'>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16810
Reviewed By: gchanan
Differential Revision: D14087246
Pulled By: izdeby
fbshipit-source-id: 042642ced1cb0fd1bb6bff05f9ca871a5c54ee5e
Summary:
Anywhere we used #include "foo.h", we now say #include <foo.h>
Paths are adjusted to be rooted out of aten/src, torch/lib, or
the root level directory.
I modified CMakeLists.txt by hand to remove TH and THC from
the include paths.
I used the following script to do the canonicalization:
```
import subprocess
import re
import os.path
files = subprocess.check_output(['git', 'ls-files']).decode('utf-8').rstrip().split('\n')
for fn in files:
if not any(fn.endswith(suff) for suff in ['.cu', '.cpp', '.in', '.h', '.hpp', '.cu', '.cuh', '.cc']):
continue
if not any(fn.startswith(pref) for pref in ["aten/", "torch/"]):
continue
with open(fn, 'r') as f:
c = f.read()
def fmt(p):
return "#include <{}>".format(p)
def repl(m):
p = m.group(1)
if p in ["dlfcn.h", "unistd.h", "nvrtc.h", "cuda.h", "cuda_runtime.h", "cstdint", "cudnn.h", "Python.h", "cusparse.h", "cuda_runtime_api.h", "cuda_fp16.h", "cublas_v2.h", "stdint.h", "curand_kernel.h"]:
return fmt(p)
if any(p.startswith(pref) for pref in ["torch/csrc", "c10/", "ATen/", "caffe2/", "TH/", "THC/", "Eigen/", "gtest/", "zdl/", "gloo/", "onnx/", "miopen/"]):
return fmt(p)
for root in ["aten/src", "torch/lib", ""]:
for bad_root in [os.path.dirname(fn), "aten/src/TH", "aten/src/THC", "torch/csrc"]:
new_p = os.path.relpath(os.path.join(bad_root, p), root)
if not new_p.startswith("../") and (os.path.exists(os.path.join(root, new_p)) or os.path.exists(os.path.join(root, new_p + ".in"))):
return fmt(new_p)
print("ERROR: ", fn, p)
return m.group(0)
new_c = re.sub(r'#include "([^"]+)"', repl, c)
if new_c != c:
print(fn)
with open(fn, 'w') as f:
f.write(new_c)
```
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14849
Reviewed By: dzhulgakov
Differential Revision: D13363445
Pulled By: ezyang
fbshipit-source-id: 52361f878a672785f9306c9e9ab2513128092b68
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14246
This commit systematically eliminates THCStream entirely from THC, replacing it
with at::cuda::CUDAStream. In places where the previous pointer type showed up
in a public API signature, those functions are now only available to C++
clients. (It would not be too difficult to make a C-compatible version of
CUDAStream, as it's really just a simple struct, but we leave this for
future work.)
All functions in THC that referred to THCStream were expunged in favor of their
modern counterparts.
One annoyance was that I didn't feel like redoing how the torch.cuda.Stream
binding code worked, but I really wanted to get rid of the stored THCStream*
pointer. So I repurposed the bit-packing code I implemented for Stream hashing,
and used that to (reversibly) store streams in a uint64_t cdata field. A perhaps
more future proof solution would be to get rid of cdata entirely, and store the
device and stream ID directly.
Billing of changes:
- All CUDAStream_ pointer API functions are now hidden and anonymously
namespaced (instead of being in the impl namespace). All use sites
rewritten to use the modern C++ API. Since CUDAStreamInternals is no
longer part of the public API, the CUDAStreamInternals constructor and
internals() method have been removed, and replaced with anonymous
functions in the C++ file.
- device_index() returns DeviceIndex rather than int64_t now
- Stream and CUDAStream now have pack/unpack methods. (CUDAStream checks
that the unpacked bit-pattern is for a CUDA device.)
- THCStream.h header is removed entirely
- Most THCStream handling functions in THC API are removed
Reviewed By: gchanan
Differential Revision: D13121531
fbshipit-source-id: 48873262cc0a37c3eec75a7ba1c93c800da40222
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13125
Previously, it returned a vector of THCStream*, which we eventually turned
into CUDAStream. No need to spatter the conversion code everywhere: just
do it correctly to begin with. An important side effect of doing it this
way is that we no longer pass nullptr to CUDAStream; instead, we create
the default stream. I will rely on this in a later patch.
Reviewed By: gchanan
Differential Revision: D10853224
fbshipit-source-id: f6bd6594eba4626eb41a4a5e67fc64c9bbb46a1a
Summary:
How did we get so many uses of `NULL` again?
ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11047
Differential Revision: D9566799
Pulled By: goldsborough
fbshipit-source-id: 83469f352ac69aa65bdaf1a1a21f922d892e0db3
Summary:
This is enabled by the allocator patch; previously we could not
deduplicate THStorage_free/THCStorage_free; now we can.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9495
Reviewed By: SsnL
Differential Revision: D8875497
Pulled By: ezyang
fbshipit-source-id: 387198dff446eb9f84d2d6187066fae1d595dea7
* Remove remaining TensorTypeUtils functions.
Mostly what's remaining is copy utilities -- these are now provided in THCTensorCopy.hpp and templatized on the ScalarType rather than the TensorType.
* Have a single THTensor / THCTensor type.
As was previously done with Storages, have only a single (dtype-independent) THTensor / THCTensor.
For documentation and backwards compatibility purposes, the old names, e.g. TH(Cuda)LongTensor alias the new TH(C)Tensor type.
* undef GENERATE_SPARSE.
These files don't follow the usual pattern: In general the files torch/csrc/X torch/csrc/cuda/X
both include the generic file torch/csrc/generic/X, where torch/csrc/X includes the cpu implementations and torch/csrc/cuda/X includes the cuda implementations.
(Aside: this is probably not the best structure, the torch/csrc/X fiels should probably be moved to torch/csrc/cpu/X).
utils.cpp combines these so that torch/csrc/utils.cpp has cuda specific code. This makes it impossible to declare a single THTensor and THCTensor template type (i.e. THPPointer<_THTensor>, THPointer<_THCTensor>).