To implement the warning when transitioning reshape to copy-on-write
storage, we want to be able to detect a write to one view family
followed by a read or a write to another one that shares the same
copy-on-write storage.
Because we have historically not been strict about the mutability of
our data pointers, any warning we have would likely be far too
aggressive.
Therefore, this is the first PR in a long series to ensure a strict
distinction between mutable and const data accessors in TensorBase,
TensorImpl, Storage, and StorageImpl.
The rough plan is to give the mutable accessor a new name that is
explicit about mutation; this will also force us to rewrite any code
that really needs to mutate.
Differential Revision: [D44409928](https://our.internmc.facebook.com/intern/diff/D44409928/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97647
Approved by: https://github.com/ezyang
Applies some more fixes to headers that may have been missed before, for performance optimization.

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @EikanWang @ezyang since this is more in the series of the clang-tidy fixups
This PR fixes 3 main issues:
1. Use emplacement more in headers
1. Avoid unnecessary copies and use const ref when possible
1. Default any special functions when possible to make them potentially trivial and more readable.
1. There is also one change in this PR that tries to prevent unnecessary math promotion; the rest of these changes are in another PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91445
Approved by: https://github.com/ezyang
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60943
In https://github.com/pytorch/pytorch/pull/60470 we made Future store Storages rather than references to their DataPtrs (because those references could go stale). However, this meant that the Future could keep the Storage alive, and thus keep its memory allocated, even after the user was done with it. We fix that here by instead storing a weak ptr to that Storage (well, in fact to the StorageImpl, but it's the same).
ghstack-source-id: 133295799
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D29454104
fbshipit-source-id: d36dee00a4841c087bb7b3f5bc39e0459f209cdb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56830
Opt into formatting on GitHub and format everything. This is a trial run before turning on formatting for more and eventually all of the codebase.
Test Plan: CI
Reviewed By: zertosh
Differential Revision: D27979080
fbshipit-source-id: a80f0c48691c08ae8ca0af06377b87e6a2351151
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54530
This diff introduces the following changes and improvements:
- Introduces a new fluent API to construct tensors from external data as an alternative to `from_blob` overloads. See below for an example.
- Leverages several small-buffer optimizations which result in a 50% reduction in tensor construction times.
- Exposes a new (lightweight) way to construct tensors by passing a naked `context` and `context_deleter` pair as an alternative to the existing `deleter` parameter.
- Updates the existing `from_blob` overloads to internally use the fluent API.
```
// Example 1
at::Tensor tensor = at::for_blob(data, sizes)
    .strides(strides)
    .context(context, [](void* ctx) { delete static_cast<Ctx*>(ctx); })
    .options(...)
    .target_device(...)
    .make_tensor();

// Example 2
at::Tensor tensor = at::for_blob(data, sizes).make_tensor();

// Example 3
at::Tensor tensor = at::for_blob(data, sizes)
    .deleter(...)
    .make_tensor();
```
Test Plan:
Below are the folly Benchmark results for the following two equivalent operations:
```
// The fluent API
at::Tensor tensor = at::for_blob(data, sizes)
    .deleter([buffer](void*) mutable { buffer.reset(); })
    .options(dtype(c10::ScalarType::Float))
    .make_tensor();

// The original `from_blob` overload
at::Tensor tensor = at::from_blob(
    data,
    sizes,
    [buffer](void*) mutable { buffer.reset(); },
    dtype(c10::ScalarType::Float));
```
```
============================================================================
scripts/balioglu/from_blob_exp/main.cpp         relative  time/iter  iters/s
============================================================================
fluent                                                     298.34ns    3.35M
from_blob                                         55.19%   540.51ns    1.85M
============================================================================
Various similar experiments show an approximate 50% reduction in tensor construction times.
Reviewed By: ezyang
Differential Revision: D27269344
fbshipit-source-id: e6bd0b78384bf89fd24f22254008180329000363
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37776
* Remove type-specific size tracking in favor of byte size tracking in Storage and StorageImpl
* Change numel() and set_numel() to nbytes() and set_nbytes()
* Add an enum argument to the Storage/StorageImpl constructor to indicate the new meaning of the size parameter
* Update all callers of the changed API
Part of issue https://github.com/pytorch/pytorch/issues/33950
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37028
Differential Revision: D21171334
Pulled By: ezyang
fbshipit-source-id: 37329a379de9a3a83cc5e9007e455a3e1c2d10b8
Summary:
It's not intended that Storages have 'default' CUDA devices, but this is allowable via the Storage::create_legacy codepath.
This also messes with device_caching, because the initial cache is obtained from the Storage, which may have a 'default' device.
Instead, we materialize a device by allocating 0 bytes via the allocator.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18605
Differential Revision: D14680620
Pulled By: gchanan
fbshipit-source-id: 6d43383d836e90beaf12bfe37c3f0506843f5432
Summary:
Check whether the codegen'd alias annotations actually track alias creation and writes correctly. This could be made more exhaustive, but it's good enough for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14588
Differential Revision: D13312653
Pulled By: suo
fbshipit-source-id: 98de1610ea86deada71957c75c222fff331a0888