Summary:
Following up on this: https://github.com/pytorch/pytorch/pull/35851 cross dtype storage copy is not being used internally, so I have not included cross dtype copy for complex.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35771
Differential Revision: D21319650
Pulled By: anjali411
fbshipit-source-id: 07c72996ee598eba0cf401ad61534494d6f5b5b3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18749
ghimport-source-id: 9026a037f5e11cdb9ccd386f4b6b5768b9c3259b
Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18751 Disallow changing the device of a tensor via set_.
* #18750 Use non-legacy constructors for tensor deserialization.
* **#18749 Add device and dtype to storage.**
The goal here is to fix our serialization, which currently depends on the legacy constructors. Having dtype and device on Storage allows us to use the non-legacy constructors.
This fits somewhat along our goal of removing Storage, my having Storage act like a Tensor.
Differential Revision: D14729516
fbshipit-source-id: bf4a3e8669ad4859931f4a3fa56df605cbc08dcb
Summary:
This is to fix#16141 and similar issues.
The idea is to track a reference to every shared CUDA Storage and deallocate memory only after a consumer process deallocates received Storage.
ezyang Done with cleanup. Same (insignificantly better) performance as in file-per-share solution, but handles millions of shared tensors easily. Note [ ] documentation in progress.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16854
Differential Revision: D13994490
Pulled By: VitalyFedyunin
fbshipit-source-id: 565148ec3ac4fafb32d37fde0486b325bed6fbd1
Summary:
This is the first commit from a series of planned changes in order to add boolean tensors to PyTorch. The whole plan looks like this:
0. Storage Implementation (this change)
1. Tensor Creation.
2. Tensor Conversions.
3. Tensor Indexing.
4. Tensor Operations.
5. Back compatibility related changes.
This feature was requested by the community:
https://github.com/pytorch/pytorch/issues/4764https://github.com/pytorch/pytorch/issues/4219https://github.com/pytorch/pytorch/issues/4288
**Change**:
Added boolean type to the Storage class for CPU and CUDA backends.
**Tested via**:
1. unit tests
2. running this:
-> import torch
-> torch.BoolStorage
<class 'torch.BoolStorage'>
-> torch.cuda.BoolStorage
<class 'torch.cuda.BoolStorage'>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16810
Reviewed By: gchanan
Differential Revision: D14087246
Pulled By: izdeby
fbshipit-source-id: 042642ced1cb0fd1bb6bff05f9ca871a5c54ee5e
Summary:
Anywhere we used #include "foo.h", we now say #include <foo.h>
Paths are adjusted to be rooted out of aten/src, torch/lib, or
the root level directory.
I modified CMakeLists.txt by hand to remove TH and THC from
the include paths.
I used the following script to do the canonicalization:
```
import subprocess
import re
import os.path
files = subprocess.check_output(['git', 'ls-files']).decode('utf-8').rstrip().split('\n')
for fn in files:
if not any(fn.endswith(suff) for suff in ['.cu', '.cpp', '.in', '.h', '.hpp', '.cu', '.cuh', '.cc']):
continue
if not any(fn.startswith(pref) for pref in ["aten/", "torch/"]):
continue
with open(fn, 'r') as f:
c = f.read()
def fmt(p):
return "#include <{}>".format(p)
def repl(m):
p = m.group(1)
if p in ["dlfcn.h", "unistd.h", "nvrtc.h", "cuda.h", "cuda_runtime.h", "cstdint", "cudnn.h", "Python.h", "cusparse.h", "cuda_runtime_api.h", "cuda_fp16.h", "cublas_v2.h", "stdint.h", "curand_kernel.h"]:
return fmt(p)
if any(p.startswith(pref) for pref in ["torch/csrc", "c10/", "ATen/", "caffe2/", "TH/", "THC/", "Eigen/", "gtest/", "zdl/", "gloo/", "onnx/", "miopen/"]):
return fmt(p)
for root in ["aten/src", "torch/lib", ""]:
for bad_root in [os.path.dirname(fn), "aten/src/TH", "aten/src/THC", "torch/csrc"]:
new_p = os.path.relpath(os.path.join(bad_root, p), root)
if not new_p.startswith("../") and (os.path.exists(os.path.join(root, new_p)) or os.path.exists(os.path.join(root, new_p + ".in"))):
return fmt(new_p)
print("ERROR: ", fn, p)
return m.group(0)
new_c = re.sub(r'#include "([^"]+)"', repl, c)
if new_c != c:
print(fn)
with open(fn, 'w') as f:
f.write(new_c)
```
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14849
Reviewed By: dzhulgakov
Differential Revision: D13363445
Pulled By: ezyang
fbshipit-source-id: 52361f878a672785f9306c9e9ab2513128092b68
Summary:
```
This removes PyObjectFinalizer. We were seeing SIGSEGV at exit in some
programs that use multiprocessing. The backtrace pointed to
StorageRef.__del__ being called from subtype_dealloc. My guess is that
the Python interpreter was shutdown before all C++ Storage objects were
deallocated. Deallocating the C++ Storage called the finalizer which
called back into Python after it was no longer safe to do so.
This avoids a callback from C++ into Python during Storage finalization.
Instead, dead Storage objects (expired weak references) are collected
periodically when shared_cache exceeds a limit. The limit is scaled with
2x the number of live references, which places an upper bound on the
amount of extra memory held by dead Storage objects. In practice, this
should be very small.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10407
Differential Revision: D9272400
Pulled By: colesbury
fbshipit-source-id: ecb14d9c6d54ffc91e134c34a4e770a4d09048a2
Summary:
The underlying use-case is the file descriptor to storage cache in
torch.multiprocessing.reductions. Previously, this was implemented by wrapping
an existing allocator with a "weak ref" allocator which also knew to null out
the weak reference when the storage died. This is terribly oblique, and
prevents us from refactoring the allocators to get rid of per-storage allocator
state.
So instead of going through this fiasco, we instead directly implement weak
pointers and finalizers in THStorage. Weak pointers to THStorage retain the
THStorage struct, but not the data_ptr. When all strong references die,
data_ptr dies and the finalizers get invoked.
There is one major hazard in this patch, which is what happens if you
repeatedly call _weak_ref on a storage. For cleanliness, we no longer
shove our grubby fingers into the finalizer struct to see if there is already
a Python object for the weak reference and return it; we just create a new one
(no one is checking these Python objects for identity). This means if you
keep calling it, we'll keep piling on finalizers. That's bad! But I am
not going to fix it until it is actually a problem for someone, because
then we need to add another caching layer.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9148
Differential Revision: D8729106
Pulled By: ezyang
fbshipit-source-id: 69710ca3b7c7e05069090e1b263f8b6b9f1cf72f
Summary:
Closes https://github.com/pytorch/pytorch/pull/9107
Some details about how this was done:
- For now, the allocators for CPU and CUDA are different (unifying
the allocators is a bigger change to make, I'll contribute this in
a later patch). To smooth this over, the allocator field now
stores a void* instead of THAllocator* or THCDeviceAllocator*; to
make this clear the field is renamed to allocatorVoidPtr.
- Some THStorage functions which were generated per-scalar are now
generalized, and thus moved out of the generic/ library. This way
they can be called directly from a non-code-generated at::Storage
- THCState is moved into a C++ header. This is actually not really
related to this particular diff, but I'll need it soon to replace
THAllocator/THCDeviceAllocator with at::Allocator (C++, so I can't
mention it in a C header file.)
- THPPointer needs to be adjusted, since there is no more type refinement
between THStorage/THCStorage for it to template match over. This
is a little tricky, because I can't refer to THCStorage_free unless
we actually compile with CUDA. So there's two copies of the function
now: one for the CPU build, one for the CUDA build. If we ever split
CUDA/non-CUDA Python builds, you will have to indirect this through some
dynamic dispatch.
I want to soon replace the THCDeviceAllocator pointers in
THCState with at::Allocator, but I can't reference a C++ namespaced type
from C code, so THCState needs to move.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Closes https://github.com/pytorch/pytorch/pull/9087
Reviewed By: orionr
Differential Revision: D8712072
Pulled By: ezyang
fbshipit-source-id: c6e1ea236cd1df017b42a7fffb2dbff20d50a284
* Don't override Tensor, Storage macros defined outside torch/csrc in torch/csrc.
This PR does the following:
1) Removes THSTensor macros in torch/csrc, which aren't used.
2) For macros defined outside of torch/csrc (THTensor, THTensor_, THStorage, THStorage_):
a) No longer override them, i.e. previously THTensor could actually be THCTensor if a generic file was included from a file including THCP.h.
b) Instead, introduce new macros THW* (e.g. THWTensor) to represent a (potentially empty) wildcard character.
In addition to making this code easier to read and codemod, this allows us to more freely change TH/THC; for example:
currently in the THC random code, the state is casted to THByteTensor*; this happens to work because the macros don't happen to override THByteTensor.
But if THByteTensor just becomes an alias of THTensor (which is the plan for a single tensor type), then this no longer works.
The whole thing is a bit of a mess previously because you really have to understand which macros and redefined and which aren't.
We could also rename the macros that live in torch/csrc (e.g. the THPTensor macros), but since that is more self contained, I punted for now.
* Don't change the plugin.
No longer generate data-type specific Storage types, since all Storage types are now identical anyway.
For (some) backwards compatibility and documentation purposes, the Real names, e.g. THLongStorage are now #defined as aliases to the single THStorage type
Changelist:
- Move *.c to *.cpp
- Change includes of ".c" to ".cpp"
- A bunch of cmake configuration modifying CMAKE_C_FLAGS changed
to CMAKE_CXX_FLAGS or add_compile_options, because if you do CMAKE_C_FLAGS it only applies when you compile C code
- Explicitly cast void* to T* in a number of places
- Delete extern "C" { ... } blocks; instead, properly apply TH_API to everything that should have it (TH_API handles extern "C")
- Stop using stdatomic.h, instead, use <atomic>. This resulted in a bunch of placement-new/delete to be "totally properly correct"
- Refactor of THLongStorageView to not have static constructor methods (since it no longer has a copy/move constructor)
- Documentation about how the TH C interface (and extern C business) works
- Note that THD master_worker mode is dead
- C++ headers in TH libraries are given .hpp suffix, to make it less likely that you'll confuse them with the C-compatible headers (now suffixed .h)
- New function THCStream_stream and THCStream_device to project out fields of THCStream instead of accessing fields directly
- New function THStorage_(retainIfLive), which is equivalent to a retain but only if the refcount is greater than zero.
- In general, I tried to avoid using hpp headers outside of ATen/TH. However, there were a few places where I gave up and depended on the headers for my own sanity. See Note [TH abstraction violation] for all the sites where this occurred. All other sites were refactored to use functions
- Some extra Werror fixes (char* versus const char*)
- Remove some uses of mega-header THP.h
- Use HANDLE_TH_ERRORS in functions that may throw
- Move NumPy includes to common header
- Delete unused allocator