Part of #91395
Also modifies how `StorageImpl`s are stored in JIT static runtime's `MemoryPlanner`, which previously `std::move`d `StorageImpl`s into a `std::vector`. Since `StorageImpl` can no longer be moved, `MemoryPlanner` now holds a malloc'd buffer and constructs new `StorageImpl`s into it with placement new.
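As a rough illustration of the placement-new approach (a minimal sketch, not the actual `MemoryPlanner` code; the stand-in `Impl` type, buffer sizing, and constructor arguments are simplified assumptions):

```cpp
#include <cstdlib>
#include <new>

// Stand-in for a non-movable type such as c10::StorageImpl.
struct Impl {
  explicit Impl(size_t n) : size(n) {}
  Impl(Impl&&) = delete;  // cannot be std::move'd into a std::vector
  size_t size;
};

class PlannerBuffer {
 public:
  explicit PlannerBuffer(size_t capacity)
      : capacity_(capacity),
        buf_(static_cast<Impl*>(std::malloc(capacity * sizeof(Impl)))) {}

  ~PlannerBuffer() {
    // Placement-new'd objects must be destroyed explicitly before freeing.
    for (size_t i = 0; i < count_; ++i) {
      buf_[i].~Impl();
    }
    std::free(buf_);
  }

  Impl& emplace(size_t n) {
    // Construct the object directly in the pre-allocated raw storage
    // (no capacity check here; a real implementation would need one).
    return *new (&buf_[count_++]) Impl(n);
  }

 private:
  size_t capacity_;
  size_t count_ = 0;
  Impl* buf_;
};

int main() {
  PlannerBuffer planner(4);
  planner.emplace(128);
  planner.emplace(256);
}
```

Constructing elements in place sidesteps the move/copy requirement that `std::vector` imposes on its element type.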
Pull Request resolved: https://github.com/pytorch/pytorch/pull/93342
Approved by: https://github.com/ezyang
Summary:
This is part 1 of the effort to support `share_memory_()` in the C++ ATen library.
This allows C++ code to replace a tensor's storage in place with shared-memory (shm) backed storage.
For now, fd-based shm is the only supported implementation, which keeps memory management simple.
This first part intentionally avoids public API changes (to `TensorBase`, see comments in `StorageUtils.h`) so that the core features become usable outside pt/csrc first. Adding the API to `Tensor` or `TensorBase` would involve more distracting changes and make the change harder to review.
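For orientation, usage from C++ might look roughly like the following (a hedged sketch: the helper name `at::share_memory_` and header path are assumptions based on this summary, not a confirmed API, so check the landed code):

```cpp
#include <ATen/ATen.h>
#include <ATen/StorageUtils.h>  // assumed location of the new helper

int main() {
  // An ordinary CPU tensor backed by regular heap memory.
  at::Tensor t = at::ones({1024, 1024});

  // Assumed free-function API: swap the tensor's storage in place for an
  // fd-based shared-memory storage, copying the existing contents over.
  at::share_memory_(t);

  // `t` still holds the same values, but its storage can now be shared
  // with another process via the underlying file descriptor.
  return 0;
}
```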
Test Plan:
```
buck test caffe2:StorageUtils_test
```
Differential Revision: D43467616
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95228
Approved by: https://github.com/ezyang
See this discussion for context: https://pytorch.slack.com/archives/GEEQ2K4MD/p1663672716533319?thread_ts=1662155536.133099&cid=GEEQ2K4MD; opening a PR as suggested by @albanD.
Currently PyTorch holds the GIL when copying Tensors into shared memory. For certain workloads it would be nice to be able to copy different tensors into shared memory in parallel, but with the GIL being held the copies can't truly run in parallel.
Here's a short example of this:
```
import torch
import time
from multiprocessing.pool import ThreadPool
tensors = []
for i in range(64):
    for j in range(8):
        t = torch.ones(128, 480, 640).type(torch.uint8) * i
        tensors.append(t)
print("Done generating input tensors")
with ThreadPool(processes=8) as pool:
    futures = []
    before = time.time()
    for t in tensors:
        future = pool.apply_async(t.share_memory_)
        futures.append(future)
    for f in futures:
        f.get()
    elapsed = time.time() - before
    print("ELAPSED TIME", elapsed)
```
With this diff, I get:
```
~$ python repro.py
Done generating input tensors
ELAPSED TIME 3.561321258544922
~$
```
Previously, I would get:
```
~$ python repro.py
Done generating input tensors
ELAPSED TIME 16.305657386779785
~$
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85389
Approved by: https://github.com/albanD