Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14858
This diff doesn't change any logic; it just takes the existing code and moves it to caffe2::Tensor.
Reviewed By: ezyang
Differential Revision: D13365817
fbshipit-source-id: bc73b27a793602cb14200dcdf357aa63233da43c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14656
This diff doesn't move it yet, but prepares it to be moved, i.e. removes all access to class internals.
dzhulgakov: Please comment on whether you think it still makes sense to land this, even though it's no longer blocking since we're going to move at::CopyBytes anyhow.
ezyang: There are some changes in the implementation, especially around handling undefined dest tensors. Please review carefully.
Reviewed By: ezyang
Differential Revision: D13287688
fbshipit-source-id: 17800ca8a79ab1633f23be58d96f99a160d8ed24
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14269
Removes reference to Context proper and instead adds a bool argument for async copy (the same as `copy_`)
For CopyFrom I haven't tweaked all call sites yet. Instead I rely on a terrible hack: a pointer to a context is implicitly converted to bool when passed, haha :) It's not good code, and I propose to fix it in a follow-up diff (maybe using clangr tooling).
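A minimal standalone sketch of that pitfall, using simplified stand-in types rather than the actual caffe2 classes (the real CopyFrom signature may differ):
```
// Stand-in types only; this is not the real caffe2 API.
#include <iostream>

struct Context {};  // stand-in for a caffe2 context

struct Tensor {
  // New-style signature: an async flag instead of a Context*.
  void CopyFrom(const Tensor& /*src*/, bool async = false) {
    std::cout << "async = " << std::boolalpha << async << "\n";
  }
};

int main() {
  Tensor dst, src;
  Context ctx;
  dst.CopyFrom(src, &ctx);   // old call site: the Context* silently converts to bool (true)
  dst.CopyFrom(src, false);  // intended new call site: explicit synchronous copy
}
```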
Reviewed By: ezyang
Differential Revision: D13117981
fbshipit-source-id: 7cb1dc2ba6a4c50ac26614f45ab8318ea96e3138
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13629
Previously we had a Tensor with an initialized storage (and therefore a known device_type), and
we would then call CopyFrom on it to initialize the sizes and data.
We want to eliminate partially initialized Tensors by replacing the pattern of calling CopyFrom on a partially initialized Tensor with either an undefined Tensor plus an initialization API (cases 1 and 3) or a single step that does all the initialization (case 2).
1. member variable initialization + CopyFrom
Previously we had a tensor initialized only with a device_type and then used CopyFrom to populate its content; now we remove the partial initialization by making the original member variable an undefined Tensor and using ReinitializeFrom to copy from another Tensor.
2. Output + CopyFrom
Previously, we first got a tensor with a device_type and then called CopyFrom from another Tensor;
we changed this by combining the two operations into OperatorBase::OutputTensor.
3. Output + custom functions
An example can be found in the TransformGPU function.
In this case we move the part that initializes the tensor out of the function and do it explicitly at the call site, so that we can reuse the Output functions to make a fully initialized Tensor.
Note that to keep the original semantics, both APIs have a caching effect based on device_type: we only create a new Tensor object when the device_type does not match or the Tensor is undefined; otherwise we reuse the original Tensor object (see the sketch below).
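A standalone sketch of that caching rule, using simplified stand-in types; the real ReinitializeFrom lives on caffe2::Tensor and its exact signature is not shown here, so treat the helper below as purely illustrative:
```
// Stand-in types only; this is not the real caffe2 API.
#include <iostream>
#include <memory>

enum class DeviceType { CPU, CUDA };

struct Tensor {
  DeviceType device;
  explicit Tensor(DeviceType d) : device(d) {}
};

// Hypothetical helper in the spirit of ReinitializeFrom: `dst` may start out undefined.
void ReinitializeFrom(std::unique_ptr<Tensor>& dst, const Tensor& src) {
  if (!dst || dst->device != src.device) {
    dst = std::make_unique<Tensor>(src.device);  // (re)create only on mismatch
  }
  // ... copy sizes and data from src into *dst here ...
}

int main() {
  std::unique_ptr<Tensor> cached;  // member variable starts out undefined
  Tensor cpu_src(DeviceType::CPU), cuda_src(DeviceType::CUDA);
  ReinitializeFrom(cached, cpu_src);             // creates a CPU tensor
  Tensor* first = cached.get();
  ReinitializeFrom(cached, cpu_src);             // same device: object is reused
  std::cout << (cached.get() == first) << "\n";  // 1
  ReinitializeFrom(cached, cuda_src);            // device differs: object is replaced
  std::cout << (cached.get() == first) << "\n";  // 0
}
```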
Reviewed By: dzhulgakov
Differential Revision: D12848855
fbshipit-source-id: 37bb4ddc1698ebea533b73006eeb1218faa8ddf8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13134
For tensor, we plan to do the following renaming:
```
* t.ndim() → t.dim()
* t.size() → t.numel()
* t.dims() → t.sizes()
* t.meta() → t.dtype()
* t.dim(d) → t.size(d)
```
This diff adds the new APIs to caffe2::Tensor so we can start the codemod;
we'll remove the old APIs after the codemod is done.
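A small usage sketch of the new accessor names, assuming a caffe2 build from around this change (the old names keep working until the codemod is done):
```
#include <iostream>
#include "caffe2/core/tensor.h"

int main() {
  caffe2::Tensor t(caffe2::CPU);
  t.Resize(2, 3);
  t.mutable_data<float>();                // allocate so dtype() is meaningful
  std::cout << t.dim() << "\n";           // was t.ndim()
  std::cout << t.numel() << "\n";         // was t.size()
  std::cout << t.sizes().size() << "\n";  // was t.dims()
  std::cout << t.dtype().name() << "\n";  // was t.meta()
  std::cout << t.size(0) << "\n";         // was t.dim(0)
}
```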
Reviewed By: ezyang
Differential Revision: D10856028
fbshipit-source-id: 1638997e234d7b3113ef8be65a16246f902273c7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12656
I originally wanted to do this in two steps, but deleting the Storage-only
constructor also changes the default numel state (which breaks tests),
so it was easiest to do it all in one go.
- I still need a way to compute the correct TensorTypeId for all of the
Caffe2 constructors; rather than hard-code it, I wrote a function,
at::detail::computeTensorTypeId(), to do this calculation (a simplified
sketch follows this list). Maybe this function could be used more widely,
but for now it's used by Caffe2 only.
- Added a pile more TensorTypeIds, covering all of Caffe2's supported DeviceTypes.
- Because I still can't put arbitrary TypeMeta in TensorOptions, the
TensorTypeId() calculation doesn't respect dtype. For now, this is
not a problem, but this might block work to split non-POD dtypes
into their own TensorTypeId.
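A standalone, simplified sketch of the kind of device-type-to-TensorTypeId mapping described above; the stand-in enums below are illustrative only, and the real at::detail::computeTensorTypeId() works off TensorOptions and covers more cases:
```
// Stand-in enums only; not the actual ATen declarations.
#include <stdexcept>

enum class DeviceType { CPU, CUDA, MKLDNN, IDEEP, OPENGL };
enum class TensorTypeId { CPUTensorId, CUDATensorId, MKLDNNTensorId, IDEEPTensorId, OpenGLTensorId };

TensorTypeId computeTensorTypeId(DeviceType device) {
  // dtype is intentionally ignored here, mirroring the limitation noted above.
  switch (device) {
    case DeviceType::CPU:    return TensorTypeId::CPUTensorId;
    case DeviceType::CUDA:   return TensorTypeId::CUDATensorId;
    case DeviceType::MKLDNN: return TensorTypeId::MKLDNNTensorId;
    case DeviceType::IDEEP:  return TensorTypeId::IDEEPTensorId;
    case DeviceType::OPENGL: return TensorTypeId::OpenGLTensorId;
    default: throw std::runtime_error("unsupported device type");
  }
}

int main() {
  return computeTensorTypeId(DeviceType::CPU) == TensorTypeId::CPUTensorId ? 0 : 1;
}
```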
Reviewed By: li-roy
Differential Revision: D10380678
fbshipit-source-id: 10c5d12020596fc9f27d5579adffad00513af363
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12729
This may have a dependency on D10380678 if size_from_dim(0)
was required because numel() used to return -1 in some cases.
This is no longer true.
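A tiny sketch of the simplification this enables, assuming a caffe2 build where numel() no longer returns -1, so size_from_dim(0) and numel() agree:
```
#include <iostream>
#include <vector>
#include "caffe2/core/tensor.h"

int main() {
  caffe2::Tensor x(std::vector<int64_t>{2, 3, 4}, caffe2::CPU);
  std::cout << x.size_from_dim(0) << "\n";  // 24: product of all dimensions
  std::cout << x.numel() << "\n";           // 24: same value, simpler call
}
```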
Reviewed By: li-roy, dzhulgakov
Differential Revision: D10415069
fbshipit-source-id: 39f46f56249ecaf3533f62a0205b3a45d519d789
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12407
We want to use tensor factories to refactor caffe2's old way of initializing a Tensor via Resize and mutable_data,
in order to eliminate uninitialized Tensors.
Previously, when we wanted to create a Tensor in caffe2, we would do the following:
```
Tensor x(CPU); // device type provided
x.Resize({1, 2, 3}); // size provided
x.mutable_data<float>(); // data type provided and memory allocated
```
This leaves the Tensor in a not-fully-initialized state during the process; to eliminate this, we
want to provide all the needed information at the beginning. ATen already has its TensorFactories (https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/TensorFactories.cpp) and a TensorOptions type, and we want to adopt the same interface to ease future refactoring.
At the call site, we used to have `Output(i)`, which returns a `Blob` containing an uninitialized `Tensor`, and we would call Resize and mutable_data afterwards to provide the dimensions and data type:
```
// uninitialized tensor
auto* Y = Output(0);
// set dimensions
Y->Resize({1, 2, 3});
// actually allocate the data
auto* data = Y->mutable_data<float>();
// After this step, Tensor is fully initialized.
```
We want to change it to the following:
```
// provide dimensions and TensorOptions which include device type and data type.
// This will set all the information of Tensor properly and also allocate memory.
auto* Y = Output(0, {1, 2, 3}, at::device({context_.device_type()}).template dtype<T>());
// Tensor is fully initialized after this step
// following `mutable_data` call won't allocate memory.
auto* data = Y->mutable_data<float>();
```
Microbenchmarks:
```
============================================================================
caffe2/caffe2/fb/benchmarks/core_overhead_benchmark.cc     relative  time/iter  iters/s
============================================================================
OperatorNewOutputTensorAPI 3.27us 306.05K
OperatorOldOutputTensorAPI 3.55us 281.54K
============================================================================
```
Reviewed By: ezyang
Differential Revision: D10207890
fbshipit-source-id: f54ddacaa057b7c6bc7d5a8290171f35e9e40e29
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12304
- Make ExtractDeviceOption a free function.
- Add a Storage(at::Device) constructor in order to preserve the device_id.
Reviewed By: dzhulgakov
Differential Revision: D10069839
fbshipit-source-id: a5f3994a39bdf1b7503b39bb42c228e438b52bfa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12180
I had to fix a lot of call sites, because a lot of places assume that
you can actually get a const vector&, and if the internal representation
of sizes in a tensor is NOT a vector, it's not possible to fulfill
this API contract.
Framework changes:
- I deleted TensorImpl::dims(); caffe2::Tensor::dims() just forwards to
sizes() now.
- De-templatized SetDims; now it is an explicit list of ArrayRef and
variadic overloads. This makes implicit conversions work again,
so I don't need to explicitly list the std::vector cases too.
- As a knock-on effect, this causes Reset() to accept at::IntList as well as
const std::vector<int64_t>&
- Edited variadic overloads of SetDims to all forward to the underlying
arbitrary-dim implementation, reducing code duplication. (It's probably
marginally less efficient in the new world.)
- Replace Tensor constructor accepting const std::vector<int64_t>& with at::IntList
- Make MKLTensor accept ArrayRef along with vector in constructor and
Reset (unfortunately, no implicit conversions here, since it's templated on
index type.)
- There are a few other places, like cudnn, where I changed functions
that previously took const std::vector<int64_t>& to take at::IntList
instead.
Classification of call-site changes (a small sketch follows this list):
- 'const std::vector<int64_t>& x_dims = x.dims()' ==>
'at::IntList x_dims = x.dims()'
- 'std::vector<int64_t> x_dims = x.dims()' ==>
'std::vector<int64_t> x_dims = x.dims().vec()' (we need a copy!)
Usually this is because we're about to mutably modify the vector
to compute some new dimension. However, it also very commonly occurs in the
form: 'x_dims_ = x.dims()' because we frequently cache sizes in operators.
- Instead of constructing std::vector<int64_t>{blah, blah}, construct an
at::IntList directly
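A short sketch of these call-site migrations, assuming a caffe2 build that already contains this change (so dims() returns at::IntList and the new ArrayRef/std::vector operator== overload exists):
```
#include <iostream>
#include <vector>
#include "caffe2/core/tensor.h"

int main() {
  caffe2::Tensor x(std::vector<int64_t>{2, 3}, caffe2::CPU);

  // Non-owning view: fine as long as x outlives x_dims.
  at::IntList x_dims = x.dims();

  // Owning copy: needed when we're about to mutate or cache the sizes.
  std::vector<int64_t> x_dims_copy = x.dims().vec();
  x_dims_copy.push_back(1);

  // Comparison against a std::vector via the newly added operator== overload.
  std::cout << (x.dims() == std::vector<int64_t>{2, 3}) << "\n";    // 1
  std::cout << x_dims.size() << " " << x_dims_copy.size() << "\n";  // 2 3
}
```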
ArrayRef changes:
- Added cbegin()/cend() iterators; they operate the same as begin()/end() because
everything on ArrayRef is const.
- Moved operator<< into ArrayRef.h, so that it's always available when
working with ArrayRef. I also templated it, so it now works on an
ArrayRef of any type.
- Add operator== overload for ArrayRef, and also add variants to permit
comparison of ArrayRef with std::vector, a very common operation.
(The non-templated version of operator== can get these automatically
via implicit conversion, but with templates C++ refuses to do
any implicit conversions.)
I'm planning to audit all dims() call sites to make sure they don't
expect 'auto x = t.dims()' to give you an x whose lifetime can validly
outlive the tensor.
I opted not to do a dims() to sizes() rename, because dims() also matches
the protobufs accessor. Bad news!
Reviewed By: jerryzh168
Differential Revision: D10111759
fbshipit-source-id: a2a81dc4b92c22ad4b3b8ef4077a7e97b6479452
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12090
This is a slight pessimization because we need to do a
full recompute of is_contiguous(), even though a modification
of dim-0 is guaranteed to preserve contiguity.
Reviewed By: jerryzh168
Differential Revision: D10050905
fbshipit-source-id: b99233e21c9f4275b0db6e76740462e5430ce152
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11971
- Switched TensorImpl::data<T>() to use Storage::unsafe_data<T>() to work
around an outstanding bug in the Storage::data<T>() implementation
where it only works on Ts which are valid ScalarType
- Qualify a bunch of identifiers which still live in caffe2:: namespace
- strides returns an IntList now
- s/update_strides/update_to_contiguous_strides/
- Correctly compute type_id_ for the Storage-only constructor from Caffe2.
This is special-cased to only work for CPU and CUDA dense tensors.
- Fix some signed-unsigned comparisons in Caffe2 code (OSS build for
ATen/core has more restrictive warning tests.)
Reviewed By: jerryzh168
Differential Revision: D9995559
fbshipit-source-id: 9c74032e011189e1c7e9a98d20f2bd1e25ad2e5c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11688
As a first step toward removing the static context (merging it with the allocator), we create a
global registry for context constructors and remove the CreateContext function from Tensor.
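A standalone sketch of the general idea, using simplified stand-in types rather than the actual caffe2 registry macros: context constructors are looked up in a global registry keyed by DeviceType instead of going through Tensor's CreateContext.
```
// Stand-in types only; not the actual caffe2 registry machinery.
#include <functional>
#include <map>
#include <memory>

enum class DeviceType { CPU, CUDA };

struct BaseContext { virtual ~BaseContext() = default; };
struct CPUContext : BaseContext {};

using ContextFactory = std::function<std::unique_ptr<BaseContext>()>;

std::map<DeviceType, ContextFactory>& ContextRegistry() {
  static std::map<DeviceType, ContextFactory> registry;  // global registry
  return registry;
}

int main() {
  ContextRegistry()[DeviceType::CPU] = [] { return std::make_unique<CPUContext>(); };
  auto ctx = ContextRegistry().at(DeviceType::CPU)();  // create a context by device type
  return ctx ? 0 : 1;
}
```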
Reviewed By: ezyang, dzhulgakov
Differential Revision: D9779821
fbshipit-source-id: 8b239ea50af7a0556fde2382f58f79194f0e3dc1
Summary:
I also fix a bug that crept in while we had the incorrect semantics where UndefinedTensorImpl was a CPU tensor, so some moves that shouldn't have been legal didn't crash. Moving out the Tensor* also moved out the Tensor* in the blob, and storing an undefined tensor in a blob is not supported.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11738
Reviewed By: gchanan
Differential Revision: D9847859
fbshipit-source-id: db6be0f76a8e6526a89fd0e87b6a23b9cc820c8d