Summary:
Previously this used the ``.toliist`` method, which converted the
storage object into a list of Python objects, and then sent those to
pickle. For storage objects of non-trivial size, this was very slow.
Now we reuse the logic of the ``torch.save`` function to efficiently
turn the Storage object into bytes, and send those instead. This
reduces the semantic information (it's harder to interpret the bytes)
but should be orders of magnitude more efficient when serializing data
with the pickle protocol or with copy
For future work it would be nice to develop a mechanism to get a buffer
of bytes out of a Storage object, and use that alongside the current
``from_buffer`` method.
See #9168 for context
Closes https://github.com/pytorch/pytorch/pull/9184
Differential Revision: D8747794
Pulled By: soumith
fbshipit-source-id: ac598e660c043788ed1ffab3d0303812886edf79
This moves the implementation of repeat to _utils so that the autograd
function can call it directly instead of relying on forward being called
on tensors.
This also removes _range, which was previously necessary because we
shadowed the built-in range() function.
This saves an extra memory copy, which speeds up data loading a bit
(5-10% with accimage).
As part of this change:
* torch.cat accepts keyword argument out
* sepcifiying out=None is treated like not specifying out
This hooks into the (internal) ForkingPickler class in multiprocessing
to reduce tensors, storages, and CUDA events instead of our queue from
joblib. This makes it easier to use the standard multiprocessing classes
in later versions of Python.
This also exposes:
- Tensor/Storage.share_memory_()
- Module.share_memory()
These methods move the CPU tensors and storages to shared memory. If
you're using the "fork" method of multiprocessing, these objects can be
directly inherited instead of serialized through a queue.