Commit Graph

399 Commits

Author SHA1 Message Date
Pritam Damania
2b221a9599 Remove PyCFunction casts as much as possible. (#46227)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46227

Follow up from https://github.com/pytorch/pytorch/issues/45419, in
this PR I've removed as many PyCFunction casts as I could from the codebase.

The only ones I didn't remove were the ones with `METH_VARARGS | METH_KEYWORDS`
which have 3 parameters instead of 2 and had to be casted. Example: `
{"copy_", (PyCFunction)(void(*)(void))THPStorage_(copy_), METH_VARARGS |
METH_KEYWORDS, nullptr},`
ghstack-source-id: 114632704

Test Plan: waitforbuildbot

Reviewed By: albanD

Differential Revision: D24269435

fbshipit-source-id: 025cfd43a9a2a3e59f6b2951c1a78749193d77cf
2020-10-20 15:01:51 -07:00
Gregory Chanan
2070834b9e Improve error checking of Storage._writeFile. (#46036)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46036

Previously, this function didn't do error-bounds checking on the GetItem (GET_ITEM) calls, which led to issues like https://github.com/pytorch/pytorch/issues/46020.

A better solution would be to use pybind, but given writing the file is going to dominate bounds checking, this is strictly better.

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D24228370

Pulled By: gchanan

fbshipit-source-id: f5d0a3d21ff12b4380beefe1e9954fa81ea2f567
2020-10-12 11:10:04 -07:00
Nikita Shulga
4066022146 Do not use PRId64 in torch/csrc (#44767)
Summary:
Instead use `fmt::format()` or `%lld` and cast argument to `(long long)`
Fix typos and add helper `PyErr_SetString()` method in torch/csrc/Exceptions.h

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44767

Reviewed By: ezyang

Differential Revision: D23723671

Pulled By: malfet

fbshipit-source-id: c0101aed222184aa436b1e8768480d1531dff232
2020-09-17 14:00:02 -07:00
Kurt Mohler
f9eb8824f1 Remove datatype from Storage and StorageImpl (#38870)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38870

* Removed dtype data member from StorageImpl
* Removed any methods or method arguments in Storage/StorageImpl that deal with dtypes
* Update all callers of the changed API

Part of issue https://github.com/pytorch/pytorch/issues/33950
Original PR: https://github.com/pytorch/pytorch/pull/38038

Reviewed By: albanD

Differential Revision: D21549645

Pulled By: ezyang

fbshipit-source-id: 4289b356c55ff6b9530376a79343b99b540ee3de
2020-05-21 15:26:08 -07:00
Edward Yang
fe88806784 Back out "Revert D21171334: [pytorch][PR] Change StorageImpl to track byte count rather than element count" (#37893)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37893

Original commit changeset: 50746043acf3

Test Plan: sandcastle and ossci

Reviewed By: malfet, seemethere, ngimel

Differential Revision: D21416509

fbshipit-source-id: 735ec4e61f9d36d4537f52dd2dc6267751aeb94b
2020-05-05 22:43:15 -07:00
Edward Yang
a2fc7f787a Revert D21171334: [pytorch][PR] Change StorageImpl to track byte count rather than element count
Test Plan: revert-hammer

Differential Revision:
D21171334

Original commit changeset: 37329a379de9

fbshipit-source-id: 50746043acf3c76754688de0fe6f1cc12437ea2f
2020-05-05 16:36:15 -07:00
Kurt Mohler
3706803b60 Change StorageImpl to track byte count rather than element count (#37776)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37776

* Remove type-specific size tracking in favor of byte size tracking in Storage and StorageImpl
* Changed numel() and set_numel() to nbytes() and set_nbytes()
* Added enum argument to Storage/StorageImpl constructor to indicate new meaning of the size parameter
* Update all callers of the changed API

Part of issue https://github.com/pytorch/pytorch/issues/33950
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37028

Differential Revision: D21171334

Pulled By: ezyang

fbshipit-source-id: 37329a379de9a3a83cc5e9007e455a3e1c2d10b8
2020-05-05 14:20:51 -07:00
anjali411
1f09f7ea44 Python API for Complex Storage and storage copy logic (#35771)
Summary:
Following up on this: https://github.com/pytorch/pytorch/pull/35851 cross dtype storage copy is not being used internally, so I have not included cross dtype copy for complex.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35771

Differential Revision: D21319650

Pulled By: anjali411

fbshipit-source-id: 07c72996ee598eba0cf401ad61534494d6f5b5b3
2020-05-01 11:47:22 -07:00
Gregory Chanan
287f3b746e Remove Backend -> THPLayout mapping. (#37527)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37527

This is yet another place that needs to be updated for adding a new "Backend" and is unnecessary.  Instead, just use layout_from_backend and have a map from Layout -> THPLayout.

Other changes:
- rename torch::getDtype and torch::getLayout to torch::getTHPDtype and torch::getTHPLayout since e.g. for layout you are both passing in and returning a "layout" type.
- add NumOptions to Layout to match the dtype/ScalarType formulation.

Test Plan: Imported from OSS

Differential Revision: D21309836

Pulled By: gchanan

fbshipit-source-id: ede0e4f3bf7ff2cd04a9b17df020f0d4fd654ba3
2020-04-30 11:11:09 -07:00
davidriazati
74ce3a032c Fix some bugs with zipfile serialization (#32244)
Summary:
Stacked PRs
 * #32958 - Make zip serialization the default
 * **#32244 - Fix some bugs with zipfile serialization**

It includes the following changes:
* Split up tests so that we can test both serialization methods
    * Loading something within a buffer doesn't work anymore, so those tests are only on the old serialization method (it's possible but introduces a big slowdown since it requires a linear scan of the entire zipfile to find the magic number at the end)
* Call `readinto` on a buffer if possible instead of `read` + a copy
* Disable CRC-32 checks on read (there was some issue where miniz said the CRC was wrong but `zipinfo` and `unzip` said the zip file was fine)
](https://our.intern.facebook.com/intern/diff/19418935/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32244

Pulled By: driazati

Reviewed By: eellison

Differential Revision: D19418935

fbshipit-source-id: df140854f52ecd04236225417d625374fd99f573
2020-02-05 15:32:14 -08:00
Brian Wignall
e7fe64f6a6 Fix typos (#30606)
Summary:
Should be non-semantic.

Uses https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines to find likely typos.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30606

Differential Revision: D18763028

Pulled By: mrshenli

fbshipit-source-id: 896515a2156d062653408852e6c04b429fc5955c
2019-12-02 20:17:42 -08:00
Davide Libenzi
cc6af45944 Fix writeable strings warnings.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29512

Differential Revision: D18417715

Pulled By: mruberry

fbshipit-source-id: 7029f0a73bcdf0ce8594b90b6f5af8be4e8b5e02
2019-11-09 21:16:35 -08:00
vishwakftw
86c64440c9 Make PyTorch Python 3.8 compatible (#29302)
Summary:
PEP 590 modifies the `tp_print` offset to `tp_vectorcall_offset` - which requires a Py_ssize_t object.
Passing a nullptr caused compatibility issues for Python 3.8.

Changelog:
- Modify all occurrences of `nullptr  /* tp_print */` to 0  /* tp_vectorcall_offset */
- Minor formatting changes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29302

Test Plan:
- Local fresh build with Python 3.8 completed successfully.

Fixes https://github.com/pytorch/pytorch/issues/28060.
Fixes https://github.com/pytorch/pytorch/issues/29162.

Supersedes https://github.com/pytorch/pytorch/pull/28364

Differential Revision: D18372022

Pulled By: ezyang

fbshipit-source-id: 8e9a15b0d0f72101ccc69bd489f5efa216b880bb
2019-11-07 09:20:19 -08:00
Edward Yang
a5d356cb39 Delete THP_CORE macro; partially replace with THP_BUILD_MAIN_LIB (#29143)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29143

THP_CORE macro is a very old macro that appeared to have served
two purposes:

1. The torch-python equivalent of CAFFE2_BUILD_MAIN_LIB, to toggle
   symbol visibility headers

2. Some sort of ad hoc way of hiding certain definitions from headers
   so external clients can't get at them.

It did (2) in a very confusing manner, because we set THP_CORE in both
torch and torch-python (it shouldn't do anything in torch).  In this
PR I just get rid of use case (2) entirely (so everything shows up in
headers all the time), and then redo (1) using a new THP_BUILD_MAIN_LIB
macro.  This cleans up some of the macro definitions and makes my life
easier for working on #27215.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18309594

Pulled By: ezyang

fbshipit-source-id: adcb6d7cb387cd818480137e2b94e5e761dbfefc
2019-11-06 15:02:02 -08:00
Jerry Zhang
23193c155f Quantized Tensor support copy (#28612)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28612

att

Test Plan:
python test/test_quantized_tensor.py

Imported from OSS

Differential Revision: D18255247

fbshipit-source-id: 814b12640fdf9d79b27482ee642ce430dbaeea68
2019-11-01 17:40:17 -07:00
Pritam Damania
fe4170bda8 Add send and recv backward functions for builtin operators RPC. (#25527)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25527

Master GH issue: https://github.com/pytorch/pytorch/issues/23110.

This change builds upon https://github.com/pytorch/pytorch/pull/24876 and
provides all the autograd hooks needed for a forward pass with distributed rpc
for builtin operators. This change does not address distributed rpc for python
UDFs and that will be addressed in follow up PRs.

Summary of changes:
1. Attach send autograd functions when a request is sent from the client and
response is sent from the server.
2. Attach receive autograd functions when a request is received on the server
and a response is received on the client.
3. Generate a globally unique autograd_message_id for each send/recv autograd
function pair to uniquely identify them.
ghstack-source-id: 91240466

Test Plan: unit tests.

Differential Revision: D17148077

fbshipit-source-id: 192d8a3f552ed7cc939f55dcca332965c9bd3233
2019-10-03 01:18:46 -07:00
peter
ec07d144ba Fixed seek offset size to 64bit. (#27125)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/26998.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27125

Differential Revision: D17687154

Pulled By: ezyang

fbshipit-source-id: 6784f4fd799130ac72a25884f120a0ba96bd4f51
2019-10-01 08:50:32 -07:00
Edward Yang
b16358b251 Revert D17666050: [pytorch][PR] Fixed seek offset size to 64bit.
Test Plan: revert-hammer

Differential Revision:
D17666050

Original commit changeset: f02ebd5320ae

fbshipit-source-id: 6bc8fe583e350e2b573f767af85d1287dd048d1f
2019-09-30 11:07:35 -07:00
Yoshiaki Nakamura
1afe3fc01e Fixed seek offset size to 64bit. (#27047)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/26998
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27047

Differential Revision: D17666050

Pulled By: ezyang

fbshipit-source-id: f02ebd5320ae25f8949be20d0744fe3cd3e2fee9
2019-09-30 07:52:15 -07:00
Bulent Abali
afa5d0823b Fixes big endian arch bugs. (#26383)
Summary:
Serialization.cpp fails on big endian machines.
This patch fixes the endian bugs and also makes the pytorch
model files portable across different endian architectures.
x86 generated model file can be read on s390 arch.

First problem, is serialization.cpp forgets to convert "size" value
of the storage elements to the native byte order.
torch.load throws an assertion as a result
(see the first stack trace below).

Second problem is when it reads the model from storage (doRead)
it decodes values to little endian which is the wrong order
on a big endian machine.  The decode should be
to THP_nativeByteOrder() instead
	(see the model dump below)
```loaded_model = torch.load( opt.model_file, map_location=torch.device("cpu"))
File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 422, in load
return _load(f, map_location, pickle_module, **pickle_load_args)
File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 616, in _load
deserialized_objects[key]._set_from_file(f, offset, f_should_read_directly)
RuntimeError: storage has wrong size: expected 2305843009213693952 got 32
	(the very long number is actually 32 in the wrong endianness)
```

Model file load on x86 (correct output)
```>>> import torch
>>> torch.load('400f2k_best.model', map_location=torch.device("cpu"))
{'epoch': 24, 'model_type': 'emb_aec', 'classifier_model': OrderedDict([('model.0.weight', tensor([[ 2.4608e-01, -1.1174e-01, -1.0854e-01,  4.0124e-01, -1.5261e-02,
         -1.2206e-01,  1.3229e-01, -1.2615e-01, -5.2773e-01,  2.6333e-01,
         -3.1462e-03, -1.4902e-01,  9.8545e-02, -1.5789e-01, -2.2625e-01,
         -1.0776e-01, -9.0895e-02, -3.8530e-01,  9.1152e-01, -3.9720e-01,
         -8.5848e-01, -4.7837e-02, -1.5178e-01,  8.5023e-02,  1.5013e-01,
         -9.9294e-02, -2.7422e-01, -4.3986e-01, -4.4297e-01, -3.9570e-01,
```

Model file load on s390x (wrong endianness; notice the exponents)
```>>> import torch
>>> torch.load( "400f2k_best.model", map_location=torch.device("cpu"))
{'epoch': 24, 'model_type': 'emb_aec', 'classifier_model': OrderedDict([('model.0.weight', tensor([[ 9.2780e+21, -9.7722e-11,  4.1350e+33,  7.782e+34,  4.2056e-31,
          9.0784e+18,  1.1846e-32,  3.3320e-32, -4.8288e-28, -7.2679e+12,
          1.5379e-16, -5.2604e+12, -4.7240e+17,  4.6092e-21, -1.8360e-20,
         -2.7712e-31,  1.4548e-16, -2.5089e-27,  7.9094e-10,  7.1977e+34,
          1.1930e+26,  8.4536e+15,  2.7757e+23, -5.8455e-10, -1.5611e+09,
         -1.1311e-23,  6.6451e+19, -2.0970e+20,  3.4878e-19, -1.0857e-12,
          7.8098e+22,  5.3998e-35],
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26383

Differential Revision: D17480891

fbshipit-source-id: f40569c7b9c4a1935dceb41f1a2508ce21ea3491
2019-09-19 19:58:02 -07:00
Ralf Gommers
1b4951d3a5 Fix remaining invalid function cast warnings that show up with GCC 8/9 (#26104)
Summary:
Follow-up to gh-25483, more of the same fixes for warnings like:

```
../torch/csrc/autograd/python_variable.cpp:503:31: warning: cast between incompatible function types from ‘PyObject* (*)(THPVariable*)’ {aka ‘_object* (*)(THPVariable*)’} to ‘getter’ {aka ‘_object* (*)(_object*, void*)’} [-Wcast-function-type]
  503 |   {"_backward_hooks", (getter)THPVariable_get_backwards_hooks, (setter)THPVariable_set_backwards_hooks, nullptr, nullptr},
      |                               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```

This takes the build log output for a full rebuild with GCC 9.1 from ~10,000 to ~7,000 lines.

`clang-tidy` is going to complain, no way around that - see discussion at the end of gh-25483.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26104

Differential Revision: D17396831

Pulled By: ezyang

fbshipit-source-id: d71696bfe4dbe25519e4bcb7753151c118bd39f7
2019-09-17 07:43:37 -07:00
Ralf Gommers
4299faa10b Fix invalid function cast warnings that show up with GCC 8/9 (#25483)
Summary:
Fixes ~5000 lines of warnings like:

```
In file included from ../aten/src/TH/TH.h:4,
                 from ../torch/csrc/Storage.cpp:11:
../torch/csrc/Storage.h:6:39: warning: cast between incompatible function types from ‘PyObject* (*)(THPStorage*)’ {aka ‘_object* (*)(THPStorage*)’} to ‘getter’ {aka ‘_object* (*)(_object*, void*)’} [-Wcast-function-type]
    6 | #define THPStorage_(NAME) TH_CONCAT_4(THP,Real,Storage_,NAME)
      |                                       ^~~
caffe2/aten/src/TH/THGeneral.h:154:37: note: in definition of macro ‘TH_CONCAT_4_EXPAND’
  154 | #define TH_CONCAT_4_EXPAND(x,y,z,w) x ## y ## z ## w
      |                                     ^
../torch/csrc/Storage.h:6:27: note: in expansion of macro ‘TH_CONCAT_4’
    6 | #define THPStorage_(NAME) TH_CONCAT_4(THP,Real,Storage_,NAME)
      |                           ^~~~~~~~~~~
../torch/csrc/generic/Storage.cpp:299:22: note: in expansion of macro ‘THPStorage_’
  299 |   {"device", (getter)THPStorage_(device), nullptr, nullptr, nullptr},
      |                      ^~~~~~~~~~~
../torch/csrc/Storage.h:6:39: warning: cast between incompatible function types from ‘PyObject* (*)(THPStorage*)’ {aka ‘_object* (*)(THPStorage*)’} to ‘getter’ {aka ‘_object* (*)(_object*, void*)’} [-Wcast-function-type]
    6 | #define THPStorage_(NAME) TH_CONCAT_4(THP,Real,Storage_,NAME)
      |                                       ^~~
caffe2/aten/src/TH/THGeneral.h:154:37: note: in definition of macro ‘TH_CONCAT_4_EXPAND’
  154 | #define TH_CONCAT_4_EXPAND(x,y,z,w) x ## y ## z ## w
      |                                     ^
../torch/csrc/Storage.h:6:27: note: in expansion of macro ‘TH_CONCAT_4’
    6 | #define THPStorage_(NAME) TH_CONCAT_4(THP,Real,Storage_,NAME)
      |                           ^~~~~~~~~~~
```

This issue and the fix is very similar to how CPython fixed it, see https://bugs.python.org/issue33012.

There's still more of these warnings left, but this fixes the majority of them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25483

Differential Revision: D17149824

Pulled By: ezyang

fbshipit-source-id: 353560a4f76070fa7482608e9532b60205d16798
2019-09-09 06:35:11 -07:00
Tongzhou Wang
af638ad5d7 pin_memory should not copy on already pinned tensors (#23484)
Summary:
fixes https://github.com/pytorch/pytorch/issues/21076
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23484

Differential Revision: D16546264

Pulled By: ezyang

fbshipit-source-id: 8058e0bbc6336751f36b884d71234feef498a982
2019-07-30 21:16:23 -07:00
Iurii Zdebskyi
3a8d7463bd Enabled BFloat16 storage (#21523)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21523
ghimport-source-id: 698b3cbd6b21c09b9ff8bf8011980df8e35c33b0

Test Plan: Imported from OSS

Differential Revision: D15819368

Pulled By: izdeby

fbshipit-source-id: f6b3bba7b3ca8ee677bd80a231dbb3920c07d61c
2019-07-09 21:51:06 -07:00
Pieter Noordhuis
6ff0c6ca3f Remove THD (#22065)
Summary:
It's been ~9 months since moving THD to the `torch.distributed.deprecated` namespace (see https://github.com/pytorch/pytorch/issues/11405) and we haven't seen issues related to it, so it's time to remove it.

Closes https://github.com/pytorch/pytorch/issues/18967.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22065

Reviewed By: mrshenli

Differential Revision: D15983669

Pulled By: pietern

fbshipit-source-id: 2a2f5866f9a63040bc7cef3956d5fd215aba7165
2019-06-25 12:19:13 -07:00
Vitaly Fedyunin
b46e87cd3d Fix catch block to fix 'error: catching polymorphic type' (#21637)
Summary:
Fix catch block to fix 'error: catching polymorphic type `class c10::Error` by value [-Werror=catch-value=]'
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21637

Differential Revision: D15761860

Pulled By: VitalyFedyunin

fbshipit-source-id: befc18a9c217440381cdb50a1319b0b5db5710e9
2019-06-11 12:30:52 -07:00
Yoshiaki Nakamura
52596164d4 Fix 32-bit env. model load issue (#20900)
Summary:
Fixed an issue where models can not be loaded in a 32-bit environment like Raspbian.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20900

Differential Revision: D15696709

Pulled By: ezyang

fbshipit-source-id: 37a81f05f235d3b9fc6244e12d3320ced3d1465e
2019-06-06 10:30:29 -07:00
Jerry Zhang
277bf69fa0 Add torch.load/torch.save for QTensor (#20830)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20830

att

Reviewed By: dzhulgakov

Differential Revision: D15340701

fbshipit-source-id: 677038c8101f66dec4856c2eccf9f9e394012226
2019-05-30 20:52:19 -07:00
Jerry Zhang
56fb5e03b5 refactor registerStoragePyTypeObject (#20467)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20467

for upcoming changes in Storage for QInt8

Reviewed By: ezyang

Differential Revision: D15330865

fbshipit-source-id: 2840e59c0bf088983f792fd724de41b3bb3dec55
2019-05-14 18:22:33 -07:00
Philipp Lang
f23fb66e6e Fix in file position logic: file descriptor and Python-side handle (#20270)
Summary:
This addresses #18436

The logic replicates the essence of closing file descriptors in numpy:
bf20e30340/numpy/core/include/numpy/npy_3kcompat.h (L278)

This stores the position of the file descriptor before resetting it to the Python handle offset, then resets to the original position before exit. The Python-side handle is then updated to reflect the new position. Also added somewhat more demanding tests to cover this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20270

Differential Revision: D15275902

Pulled By: soumith

fbshipit-source-id: 5ca8a52b61c7718d2e69571f72f80b1350b0acdb
2019-05-09 08:20:01 -07:00
Gregory Chanan
2113ea6fbf Add device and dtype to storage. (#18749)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18749
ghimport-source-id: 9026a037f5e11cdb9ccd386f4b6b5768b9c3259b

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18751 Disallow changing the device of a tensor via set_.
* #18750 Use non-legacy constructors for tensor deserialization.
* **#18749 Add device and dtype to storage.**

The goal here is to fix our serialization, which currently depends on the legacy constructors.  Having dtype and device on Storage allows us to use the non-legacy constructors.

This fits somewhat along our goal of removing Storage, my having Storage act like a Tensor.

Differential Revision: D14729516

fbshipit-source-id: bf4a3e8669ad4859931f4a3fa56df605cbc08dcb
2019-04-03 07:59:02 -07:00
Vitaly Fedyunin
5653a914f7 Implement reference counting for shared IPC CUDA tensors (#16854)
Summary:
This is to fix #16141 and similar issues.

The idea is to track a reference to every shared CUDA Storage and deallocate memory only after a consumer process deallocates received Storage.

ezyang Done with cleanup. Same (insignificantly better) performance as in file-per-share solution, but handles millions of shared tensors easily. Note [ ] documentation in progress.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16854

Differential Revision: D13994490

Pulled By: VitalyFedyunin

fbshipit-source-id: 565148ec3ac4fafb32d37fde0486b325bed6fbd1
2019-03-25 10:24:38 -07:00
Iurii Zdebskyi
444039c47b Bool tensor. Part 0: Boolean storage implementation (#16810)
Summary:
This is the first commit from a series of planned changes in order to add boolean tensors to PyTorch. The whole plan looks like this:

0. Storage Implementation (this change)
1. Tensor Creation.
2. Tensor Conversions.
3. Tensor Indexing.
4. Tensor Operations.
5. Back compatibility related changes.

This feature was requested by the community:
https://github.com/pytorch/pytorch/issues/4764
https://github.com/pytorch/pytorch/issues/4219
https://github.com/pytorch/pytorch/issues/4288

**Change**:
Added boolean type to the Storage class for CPU and CUDA backends.

**Tested via**:
1. unit tests
2. running this:
-> import torch
-> torch.BoolStorage
<class 'torch.BoolStorage'>
-> torch.cuda.BoolStorage
<class 'torch.cuda.BoolStorage'>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16810

Reviewed By: gchanan

Differential Revision: D14087246

Pulled By: izdeby

fbshipit-source-id: 042642ced1cb0fd1bb6bff05f9ca871a5c54ee5e
2019-02-19 08:22:13 -08:00
Edward Yang
e936a69085 Move THCCachingAllocator to c10_cuda. (#16119)
Summary:
Some renaming and renamespacing also took place. I was originally planning not to do anything, but it turns out that it was easier to make HIPify work by using a namespace CUDACachingAllocator:: rather than THCCachingAllocator_, since :: is a word boundary but _ is not.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/16119

Reviewed By: smessmer

Differential Revision: D13718768

fbshipit-source-id: 884a481d99027fd3e34471c020f826aa12225656
2019-01-24 12:06:56 -08:00
Edward Yang
24b50f1411 Remove unnecessary includes and headers from THCCachingAllocator, move to at::cuda:: namespace (#16117)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16117

This means I can move it to c10_cuda with minimal fuss.

Reviewed By: smessmer

Differential Revision: D13717836

fbshipit-source-id: a94c7dc649af64542480fc1c226b289588886c00
2019-01-24 12:06:54 -08:00
Edward Yang
411173757e Rename away uses of THAllocator and THCDeviceAllocator (#16061)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16061

I discovered I needed to delete these names in preparation of moving
THCCachingAllocator to c10_cuda; might as well also fix all the other
sites too.

Reviewed By: dzhulgakov

Differential Revision: D13686869

fbshipit-source-id: e8cc55d39ac4bfd3e3a22c761f89a7a111ce5f5e
2019-01-16 05:36:47 -08:00
Syed Tousif Ahmed
86af14b0c7 Resolves ptxas warnings when compiling for CUDA_ARCH 750 and a memoryType deprecation warning (#15461)
Summary:
When compiling for `TORCH_CUDA_ARCH_LIST=7.5` we were getting ptxas warnings (https://github.com/pytorch/pytorch/issues/14310). This was because we had some hardcoded values when using launch_bounds in kernels. The maximum number of threads per multiprocessor is 1024 for Turing architecture (7.5) but 2048 for previous architectures. The hardcoded launch_bounds in the kernel were requesting for 2048 threads when compiling for Turing and hence were generating the warning.

This PR adds a macro that checks for the bounds on the launch bounds value supplied. The max number of threads per block across all architectures is 1024. If a user supplies more than 1024, I just clamp it down to 512. Depending on this value, I set the minimum number of blocks per sm. This PR should resolve https://github.com/pytorch/pytorch/issues/14310. The gradient computation being wrong reported in that PR is probably due to the faulty card.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15461

Differential Revision: D13633952

Pulled By: soumith

fbshipit-source-id: 795aa151109f343ab5433bf3cb070cb6ec896fff
2019-01-10 21:44:39 -08:00
Richard Zou
8f11147d43 Use CUDAGuard when serializing CUDA Tensors (#15807)
Summary:
Fixes #15308. Before this change, `torch.save` and `torch.load` would
initialize the CUDA context on GPU 0 if it hadn't been initialized
already, even if the serialized tensors are only on GPU 1.

This PR fixes that bug by using CUDAGuard in the storage serialization
path.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15807

Differential Revision: D13593201

Pulled By: zou3519

fbshipit-source-id: 4addc91ea5a5278d56a03f3d422577ee39e99897
2019-01-08 07:31:50 -08:00
Edward Yang
2d485ffb17 Move CUDAGuard, CUDAStream and CUDAGuardImpl to c10/cuda (#14248)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14248

This diff also introduces a horrifying hack to override CUDA's DeviceGuardImpl
with a HIPGuardImplMasqueradingAsCUDA, to accommodate PyTorch's current
behavior of pretending CUDA is HIP when you build with ROCm enabled.

Reviewed By: bddppq

Differential Revision: D13145293

fbshipit-source-id: ee0e207b6fd132f0d435512957424a002d588f02
2018-12-12 11:24:26 -08:00
Edward Yang
517c7c9861 Canonicalize all includes in PyTorch. (#14849)
Summary:
Anywhere we used #include "foo.h", we now say #include <foo.h>
Paths are adjusted to be rooted out of aten/src, torch/lib, or
the root level directory.

I modified CMakeLists.txt by hand to remove TH and THC from
the include paths.

I used the following script to do the canonicalization:

```
  import subprocess
  import re
  import os.path

  files = subprocess.check_output(['git', 'ls-files']).decode('utf-8').rstrip().split('\n')
  for fn in files:
      if not any(fn.endswith(suff) for suff in ['.cu', '.cpp', '.in', '.h', '.hpp', '.cu', '.cuh', '.cc']):
          continue
      if not any(fn.startswith(pref) for pref in ["aten/", "torch/"]):
          continue
      with open(fn, 'r') as f:
          c = f.read()
      def fmt(p):
          return "#include <{}>".format(p)
      def repl(m):
          p = m.group(1)
          if p in ["dlfcn.h", "unistd.h", "nvrtc.h", "cuda.h", "cuda_runtime.h", "cstdint", "cudnn.h", "Python.h", "cusparse.h", "cuda_runtime_api.h", "cuda_fp16.h", "cublas_v2.h", "stdint.h", "curand_kernel.h"]:
              return fmt(p)
          if any(p.startswith(pref) for pref in ["torch/csrc", "c10/", "ATen/", "caffe2/", "TH/", "THC/", "Eigen/", "gtest/", "zdl/", "gloo/", "onnx/", "miopen/"]):
              return fmt(p)
          for root in ["aten/src", "torch/lib", ""]:
              for bad_root in [os.path.dirname(fn), "aten/src/TH", "aten/src/THC", "torch/csrc"]:
                  new_p = os.path.relpath(os.path.join(bad_root, p), root)
                  if not new_p.startswith("../") and (os.path.exists(os.path.join(root, new_p)) or os.path.exists(os.path.join(root, new_p + ".in"))):
                      return fmt(new_p)
          print("ERROR: ", fn, p)
          return m.group(0)
      new_c = re.sub(r'#include "([^"]+)"', repl, c)
      if new_c != c:
          print(fn)
          with open(fn, 'w') as f:
              f.write(new_c)
```

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14849

Reviewed By: dzhulgakov

Differential Revision: D13363445

Pulled By: ezyang

fbshipit-source-id: 52361f878a672785f9306c9e9ab2513128092b68
2018-12-08 19:38:30 -08:00
Peter Goldsborough
d6c53328f9 Large scale fix of python-related files in torch/csrc/
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14515

Differential Revision: D13247966

Pulled By: goldsborough

fbshipit-source-id: 7a127c508fc576a7a92626dd6b729f660162d628
2018-12-07 13:04:46 -08:00
Lin Huang
524574ab73 Define THPStorage struct only once (rather than N times) (#14802)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14802

The definetion of THPStorage does not depend on any Real, its macro
defintion is unnecessary, refactor the code so that THPStorage is not macro
defined.

Reviewed By: ezyang

Differential Revision: D13340445

fbshipit-source-id: 343393d0a36c868b9a06eea2ad9b80f5e395e947
2018-12-05 13:19:29 -08:00
Ailing Zhang
be47470c91 Fix cuda multiprocessing cached memory (#14736)
Summary:
This PR fixes #11422

In the old world of CUDA IPC, when we want to share a tensor T from A to B, we have to share the whole CUDA mem allocation where T's storage sit in. And we casted it to the same type of storage of T's.

This causes problem when two different types of storage got allocated to the same CUDA mem block. When we try to reconstruct the second tensor, it will complain about wrong storage type.

In this PR we reconstruct the storage only (not the entire mem block). However, CUDA only allows one open memHandle once per process, we have to save the device pointer in a global cache so that we can reconstruct tensors as they come.

Thanks a ton to ezyang who helped design the solution and debugged the issue!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14736

Differential Revision: D13335899

Pulled By: ailzhang

fbshipit-source-id: cad69db392ed6f8fdc2b93a9dc2899f6d378c371
2018-12-05 10:55:43 -08:00
albanD
6c8ac50753 Fix exception catching to catch c10::Error properly (#13665)
Summary:
In particular, this was breaking the logic for cudnn algorithm to fall back to a less memory hungry algorithm if the selected one OOM when creating the workspace.
c10::Error are subclass of `std::exception` and not `std::runtime_error`.

I removed `runtime_error` in all places in our code and replaced them with `const exception`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13665

Differential Revision: D12958396

Pulled By: soumith

fbshipit-source-id: af557efd9887b013140113d3067de157ffcf8465
2018-11-07 11:22:48 -08:00
Edward Yang
e5d56659ec Delete DeviceGuard(int64_t) constructor. (#13232)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13232

DeviceGuard should be device agnostic, which means that it shouldn't
assume that int64_t means select the CUDA device.

Reviewed By: gchanan

Differential Revision: D10858024

fbshipit-source-id: b40e8337e4046906fd8f83a95e6206367fb29dbe
2018-10-31 07:55:11 -07:00
Edward Z. Yang
a5818047c4 Rewrite serialization to correctly handle partial reads/writes in all cases (#12143)
Summary:
Previously, doRead/doWrite were functions that could return partial reads/writes,
and we checked for this case inconsistently in the call sites of serialization.cpp.
Now, these functions do NOT return the amount of bytes read/written, and instead
handle the necessary checking loop themselves.

Fixes #12042. Maybe.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12143

Differential Revision: D10097027

Pulled By: ezyang

fbshipit-source-id: fd222ab8a825bed352153648ad396acfe124a3e1
2018-09-27 19:09:53 -07:00
Roy Li
f00f99ebcc use at::Half in THC (#11322)
Summary:
- use Half instead of half in THC
- clean up TH_float2half, TH_half2float, etc. conversions
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11322

Differential Revision: D9799553

Pulled By: li-roy

fbshipit-source-id: 9aa3e003bff73d9df6224a393f3ec0624b1f44ed
2018-09-12 17:39:37 -07:00
Edward Yang
49231ab0a8 Reimplement storage slicing. (#11314)
Summary:
In #9466 I got rid of storage views and eliminated all places where
they were used... OR SO I THOUGHT.  In actuality, under certain
conditions (specifically, if you trained a CUDA multiprocessing model
shared over CUDA IPC and then serialized your parameters), you could
also serialize storage slices to the saved model format.  In #9466,
I "fixed" the case when you loaded the legacy model format (really,
just unshared the storages--not strictly kosher but if you aren't
updating the parameters, shouldn't matter), but NOT the modern model format, so
such models would fail.

So, I could have applied the legacy model format fix too, but
hyperfraise remarked that he had applied a fix that was effectively
the same as unsharing the storages, but it had caused his model to
behave differently.  So I looked into it again, and realized that
using a custom deleter, I could simulate the same behavior as old
storage slices.  So back they come.

In principle, I could also reimplement storage views entirely using
our allocators, but I'm not going to do that unless someone really
really wants it.

Fixes #10120.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11314

Reviewed By: ailzhang

Differential Revision: D9671966

Pulled By: ezyang

fbshipit-source-id: fd863783d03b6a6421d6b9ae21ce2f0e44a0dcce
2018-09-06 16:11:59 -07:00
Edward Yang
0a8c8c1dbe Rename real to scalar_t. (#11163)
Summary:
This is necessary to allow us to use the complex header
which defines real (and is very sad if real is macro'ed).

We should also fix accreal, ureal, Real and REAL, but
only 'real' is the real blocker.

```
codemod -d aten/src/TH --extensions c,cc,cpp,cu,cuh,h,TARGETS,py,hpp '\breal\b' scalar_t
codemod -d aten/src/THC --extensions c,cc,cpp,cu,cuh,h,TARGETS,py,hpp '\breal\b' scalar_t
codemod -d aten/src/THNN --extensions c,cc,cpp,cu,cuh,h,TARGETS,py,hpp '\breal\b' scalar_t
codemod -d aten/src/THCUNN --extensions c,cc,cpp,cu,cuh,h,TARGETS,py,hpp '\breal\b' scalar_t
```

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11163

Reviewed By: SsnL

Differential Revision: D9619906

Pulled By: ezyang

fbshipit-source-id: 922cb3a763c0bffecbd81200c1cefc6b8ea70942
2018-09-02 15:26:01 -07:00
Peter Goldsborough
7ddc6f84c4 NULL -> nullptr (#11047)
Summary:
How did we get so many uses of `NULL` again?

ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11047

Differential Revision: D9566799

Pulled By: goldsborough

fbshipit-source-id: 83469f352ac69aa65bdaf1a1a21f922d892e0db3
2018-08-30 16:25:42 -07:00