Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14197
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13642
Previously we passed a partially initialized Tensor to Deserialize, which filled
it with the result of deserializing a tensor proto. Now we want it to return
a Tensor directly, since a Tensor is just a shared pointer to TensorImpl.
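The difference between the two call styles can be sketched as follows. This is only an illustration: `FakeTensorProto`, `FakeTensorImpl`, `FakeTensor`, `DeserializeInto` and `DeserializeToTensor` are hypothetical stand-ins, not the real Caffe2 types or signatures.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for a serialized tensor message.
struct FakeTensorProto {
  std::vector<int64_t> dims;
  std::vector<float> float_data;
};

// Hypothetical stand-in for TensorImpl: owns the shape and the data.
struct FakeTensorImpl {
  std::vector<int64_t> dims;
  std::vector<float> data;
};

// A tensor handle that is just a shared pointer to its implementation,
// so returning it by value is cheap.
struct FakeTensor {
  std::shared_ptr<FakeTensorImpl> impl;
};

// Old style: the caller supplies a (possibly empty) tensor to fill in.
inline void DeserializeInto(const FakeTensorProto& proto, FakeTensor* out) {
  if (!out->impl) {
    out->impl = std::make_shared<FakeTensorImpl>();
  }
  out->impl->dims = proto.dims;
  out->impl->data = proto.float_data;
}

// New style: the function builds a fully initialized tensor and returns it.
inline FakeTensor DeserializeToTensor(const FakeTensorProto& proto) {
  FakeTensor t;
  t.impl = std::make_shared<FakeTensorImpl>();
  t.impl->dims = proto.dims;
  t.impl->data = proto.float_data;
  return t;
}
```

With the second form the caller never holds a half-built tensor.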
Reviewed By: dzhulgakov
Differential Revision: D12874357
fbshipit-source-id: 12b80a763375da23cfa64a74d6bc186d8d03b94f
Summary:
See D10380678 for the discussion.
Caffe2 serialization code was able to handle tensors with an uninitialized dtype as long as their numel was 0 O_O.
For safety to unblock the push I'm preserving this behavior with critical. As we fix all occurrences of old API, we can delete this test.
Reviewed By: kennyhorror
Differential Revision: D10866562
fbshipit-source-id: e172bd045fdfca660ff05b426e001f5f2f03f408
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12656
I originally wanted to do this in two steps, but deleting the Storage-only
constructor also changes the default numel state (which breaks tests),
so it is easiest to do it all in one go.
- I still need a way to compute the correct TensorTypeId for all of the
Caffe2 constructors; rather than hard-code it, I wrote a function
in at::detail::computeTensorTypeId() to do this calculation. Maybe
this function could be used more widely, but for now, it's used
by Caffe2 only.
- Added a pile more TensorTypeId for all of Caffe2's supported DeviceTypes
- Because I still can't put arbitrary TypeMeta in TensorOptions, the
TensorTypeId() calculation doesn't respect dtype. For now, this is
not a problem, but this might block work to split non-POD dtypes
into their own TensorTypeId.
Reviewed By: li-roy
Differential Revision: D10380678
fbshipit-source-id: 10c5d12020596fc9f27d5579adffad00513af363
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12848
Updated all non-test uses of protobuf::MessageLite::SerializeAsString to call
SerializeAsString_EnforceCheck so that the return value is checked and can
throw an exception if failing.
Most of the affected code was called from classes derived from BlobSerializeBase.
Didn't touch most tests and ENFORCE calls because they usually do checks
anyway.
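The idea of a checked serialization call can be sketched like this. `FakeMessage` and `SerializeAsStringChecked` are hypothetical names for illustration; the real helper is SerializeAsString_EnforceCheck, whose exact signature is not shown here. The sketch only assumes a message type with a bool-returning `SerializeToString(std::string*)`, as protobuf messages have.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a protobuf message. It mirrors only the
// bool-returning SerializeToString(std::string*) shape used below.
struct FakeMessage {
  std::string payload;
  bool healthy;

  bool SerializeToString(std::string* out) const {
    if (!healthy) {
      return false;
    }
    *out = payload;
    return true;
  }
};

// Sketch of a checked wrapper: throw instead of silently handing back
// an empty string when serialization fails.
template <typename Message>
std::string SerializeAsStringChecked(const Message& msg, const char* what) {
  std::string out;
  if (!msg.SerializeToString(&out)) {
    throw std::runtime_error(std::string("failed to serialize: ") + what);
  }
  return out;
}
```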
Original commit changeset: c0760e73ecc7
Reviewed By: dzhulgakov
Differential Revision: D10453456
fbshipit-source-id: d2f2b7b4578e721924354149f08f627c7e3bf070
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12799
Updated all non-test uses of protobuf::MessageLite::SerializeAsString to call
SerializeAsString_EnforceCheck so that the return value is checked and can
throw an exception if failing.
Most of the affected code was called from classes derived from BlobSerializeBase.
Didn't touch most tests and ENFORCE calls because they usually do checks
anyway.
Reviewed By: ezyang
Differential Revision: D10416438
fbshipit-source-id: cb842e3e26b0918829d71267a375d4dd40600d58
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12407
We want to use a tensor factory to replace Caffe2's old way of initializing a Tensor via Resize and mutable_data,
in order to eliminate uninitialized Tensors.
Previously, when we wanted to create a Tensor in Caffe2, we did the following:
```
Tensor x(CPU); // device type provided
x.Resize({1, 2, 3}); // size provided
x.mutable_data<float>(); // data type provided and memory allocated
```
This leaves the Tensor in a not fully initialized state during the process. To eliminate this, we
want to provide all the needed information at the beginning. ATen already has its TensorFactories: https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/TensorFactories.cpp, and there is a TensorOptions; we want to adopt the same interface to ease future refactoring.
At the call site, we used to have `Output(i)`, which returns a `Blob` that contains an uninitialized `Tensor`, and we would call Resize and mutable_data afterwards to provide the dimensions and data type:
```
// uninitialized tensor
auto* Y = Output(0);
// set dimensions
Y->Resize({1, 2, 3});
// actually allocate the data
auto* data = Y->mutable_data<float>();
// After this step, Tensor is fully initialized.
```
We want to change it to the following:
```
// provide dimensions and TensorOptions which include device type and data type.
// This will set all the information of Tensor properly and also allocate memory.
auto* Y = Output(0, {1, 2, 3}, at::device({context_.device_type()}).template dtype<T>());
// Tensor is fully initialized after this step
// following `mutable_data` call won't allocate memory.
auto* data = Y->mutable_data<float>();
```
microbenchmarks
```
============================================================================
caffe2/caffe2/fb/benchmarks/core_overhead_benchmark.cc  relative  time/iter  iters/s
============================================================================
OperatorNewOutputTensorAPI 3.27us 306.05K
OperatorOldOutputTensorAPI 3.55us 281.54K
============================================================================
```
Reviewed By: ezyang
Differential Revision: D10207890
fbshipit-source-id: f54ddacaa057b7c6bc7d5a8290171f35e9e40e29
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12714
This is a short change to enable the c10 namespace in caffe2. We did not enable
it before due to gflags global variable confusion, but that should be
mostly cleaned up now. Right now, the plan on record is that namespace caffe2 and
namespace aten will fully be supersets of namespace c10.
Most of the diff is codemod; the only two non-codemod changes are in caffe2/core/common.h, where
```
using namespace c10;
```
is added, and in Flags.h, where instead of creating aliasing variables in the c10 namespace, we put them directly in the global namespace to match gflags (with the same behavior when not building with gflags).
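How a using-directive makes one namespace a superset of another can be sketched as below. The namespaces `sketch_c10` and `sketch_caffe2` and the function names are hypothetical; they only mirror the shape of the change.

```cpp
#include <cassert>
#include <string>

namespace sketch_c10 {
inline std::string Hello() { return "from c10"; }
} // namespace sketch_c10

namespace sketch_caffe2 {
// The using-directive makes every sketch_c10 name visible to lookup
// inside sketch_caffe2, including qualified lookup such as
// sketch_caffe2::Hello().
using namespace sketch_c10;

inline std::string Greet() { return Hello(); }
} // namespace sketch_caffe2
```

Callers can keep writing `sketch_caffe2::Hello()` even though the definition lives in the other namespace.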
Reviewed By: dzhulgakov
Differential Revision: D10390486
fbshipit-source-id: 5e2df730e28e29a052f513bddc558d9f78a23b9b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11925
This is step 1 in the refactoring to remove Blob::ShareExternal(), i.e. Blob would then always own its contents.
ShareExternal() is for example used to pass non-owning blobs to serialization. This diff prepares removing that.
Reviewed By: ezyang
Differential Revision: D9884177
fbshipit-source-id: d01df9a613a4fc62e5679fe45bfc47e2c899b818
Summary:
There is still some work to be done:
- Move logging and unify AT_WARN with LOG(ERROR).
- A few header files are still being plumbed through, need cleaning.
- caffe2::EnforceNotMet aliasing is not done yet.
- need to unify the macros. See c10/util/Exception.h
This is mainly a codemod and does not cause functional changes. If you find your job failing and trace it back to this diff, it can usually be fixed by one of the following approaches:
(1) add //caffe2/c10:c10 to your dependency (or transitive dependency).
(2) change objects such as at::Error, at::Optional to the c10 namespace.
(3) change functions to the c10 namespace. Especially, caffe2::MakeString is now overridden by the unified c10::str function. Nothing else changes.
Please kindly consider not reverting this diff - it involves multiple rounds of rebasing and the fix is usually simple. Contact jiayq@ or AI Platform Dev for details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12354
Reviewed By: orionr
Differential Revision: D10238910
Pulled By: Yangqing
fbshipit-source-id: 7794d5bf2797ab0ca6ebaccaa2f7ebbd50ff8f32
Summary:
This does 7 things:
- add c10/util/Registry.h as the unified registry util
- cleaned up some APIs such as export condition
- fully remove aten/core/registry.h
- fully remove caffe2/core/registry.h
- remove a bogus aten/registry.h
- unifying all macros
- set up registry testing in c10
Also, an important note: we used to mark the templated Registry class as EXPORT. This should not happen, because one should almost never export a template class. This PR fixes that.
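A minimal sketch of a header-only registry template follows. It is not the c10/util/Registry.h implementation: `SketchRegistry`, `SketchOp`, `SketchRelu` and `SketchOpRegistry` are hypothetical names, and the comment about which symbol to export is an assumption about one common arrangement.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical minimal registry mapping a key to a factory function.
// It is a plain class template: nothing here carries an export marker.
template <typename Key, typename Object>
class SketchRegistry {
 public:
  using Creator = std::function<std::unique_ptr<Object>()>;

  void Register(const Key& key, Creator creator) {
    creators_[key] = std::move(creator);
  }

  bool Has(const Key& key) const {
    return creators_.count(key) != 0;
  }

  // Returns nullptr when nothing is registered under key.
  std::unique_ptr<Object> Create(const Key& key) const {
    auto it = creators_.find(key);
    if (it == creators_.end()) {
      return nullptr;
    }
    return it->second();
  }

 private:
  std::map<Key, Creator> creators_;
};

struct SketchOp {
  virtual ~SketchOp() {}
  virtual std::string Name() const = 0;
};

struct SketchRelu : SketchOp {
  std::string Name() const override { return "Relu"; }
};

// One concrete accessor per registry instance. Assumption: in a
// shared-library build, a function like this (not the class template)
// would be the symbol that gets an export marker.
inline SketchRegistry<std::string, SketchOp>& SketchOpRegistry() {
  static SketchRegistry<std::string, SketchOp> registry;
  return registry;
}
```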
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12077
Reviewed By: ezyang
Differential Revision: D10050771
Pulled By: Yangqing
fbshipit-source-id: 417b249b49fed6a67956e7c6b6d22374bcee24cf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12043
Re-trying D9979976, this time with all call sites fixed.
D9979976 got reverted because, it seems, there was a call site that wasn't covered by sandcastle.
I fixed it and used 'grep' to ensure there aren't any more call sites in fbsource.
Reviewed By: ezyang
Differential Revision: D10026392
fbshipit-source-id: cd341514a8e53a40147ea0ee3e52f63bb6444157
Summary: Original commit changeset: 2ea17724e223
Differential Revision: D10026321
Ninja: stable broken
fbshipit-source-id: faf87cb7cc0f78c2c10d4aa6fceea279cd27acd6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11923
This is pre-work to allow moving Blob to ATen/core, which cannot depend on caffe2 anymore.
(1) Removing the Blob -> Tensor dependency allows us to move Blob to ATen/core and use it inside IValue without having to wait for the Tensor merge to be complete.
(2) In the final Blob design, we want it to be a very small class that doesn't have any special treatment for Tensor (or to be more correct, doesn't allow storing Tensor anymore), so this is anyhow the direction we want to go.
This changes call sites that will have to be moved to IValue later, but they cannot be moved to IValue directly, because for that, IValue first needs to be able to store Blob, which in turn first needs this diff and some other changes coming up in future diffs.
Codemods:
$ codemod --extensions h,hpp,c,cpp,cc "([a-zA-Z0-9_]+)\\.IsTensorType\\(" "BlobIsTensorType(\\1, "
$ codemod --extensions h,hpp,c,cpp,cc "([a-zA-Z0-9_]+)->IsTensorType\\(" "BlobIsTensorType(*\\1, "
$ codemod --extensions h,hpp,c,cpp,cc "([a-zA-Z0-9_]+)\\.GetMutableTensor\\(" "BlobGetMutableTensor(\\1, "
$ codemod --extensions h,hpp,c,cpp,cc "([a-zA-Z0-9_]+)->GetMutableTensor\\(" "BlobGetMutableTensor(*\\1, "
It is, however, not only these codemods, because regex-based refactoring was only able to match a small number of the call sites. To catch more, I would have needed an AST-aware tool like clangr, which I didn't figure out how to use.
Reviewed By: ezyang
Differential Revision: D9979976
fbshipit-source-id: 2ea17724e223b5b73b44f99362727759ca689e61
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11817
Blob::Serialize() and Blob::Deserialize() are now free functions SerializeBlob(), DeserializeBlob() instead.
This takes away access to Blob internals from them and makes future refactorings easier.
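The member-to-free-function move can be sketched as follows. `SketchBlob`, `SerializeSketchBlob` and `DeserializeSketchBlob` are hypothetical, and the "name:content" wire format is made up for the example; the point is only that the free functions go through the public interface.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical blob with a small public interface and private state.
class SketchBlob {
 public:
  explicit SketchBlob(std::string content) : content_(std::move(content)) {}
  const std::string& content() const { return content_; }
  void Reset(std::string content) { content_ = std::move(content); }

 private:
  std::string content_;
};

// Free function: it can only use the public accessors above, so later
// changes to SketchBlob's internals do not touch serialization code.
inline std::string SerializeSketchBlob(
    const SketchBlob& blob,
    const std::string& name) {
  return name + ":" + blob.content();
}

// Free function: restores the content stored after the first ':'.
inline void DeserializeSketchBlob(const std::string& data, SketchBlob* blob) {
  const std::string::size_type pos = data.find(':');
  blob->Reset(pos == std::string::npos ? data : data.substr(pos + 1));
}
```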
Reviewed By: ezyang
Differential Revision: D9882726
fbshipit-source-id: 3251ebd4b53fc12f5e6924a6e4a8db3846ab3729
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11167
Narrow the Blob API as preparation for merging Blob/IValue
- get rid of templated IsType and Operator::InputIsType / OutputIsType
- Use 'using' instead of 'typedef' for DestroyCall (just for readability)
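For the readability point, here is a small sketch of the two spellings of a function-pointer alias. The `void (*)(void*)` signature and the names `DestroyCallTypedef`, `DestroyCallUsing` and `DestroyInt` are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <type_traits>

// Function-pointer alias written with typedef: the new name sits in the
// middle of the declarator.
typedef void (*DestroyCallTypedef)(void*);

// The same alias written with using: the new name is on the left.
using DestroyCallUsing = void (*)(void*);

static_assert(
    std::is_same<DestroyCallTypedef, DestroyCallUsing>::value,
    "both spellings name the same type");

// Example destroy function matching the alias.
inline void DestroyInt(void* p) {
  delete static_cast<int*>(p);
}
```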
Reviewed By: ezyang
Differential Revision: D9623916
fbshipit-source-id: 952f0b0cf5a525094b02e8d2798dd57a56a9e1d8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11254
Previously we used DeviceType from caffe2.proto directly, but it's an `enum` with an implicit conversion to int, which is not type safe; e.g. we have to explicitly check that a device type is valid in event.h:
```
template <int d>
struct EventCreateFunctionRegisterer {
explicit EventCreateFunctionRegisterer(EventCreateFunction f) {
static_assert(d < MaxDeviceTypes, "");
Event::event_creator_[d] = f;
}
};
```
at::DeviceType is an `enum class`; it has no implicit conversion to int and provides better type safety guarantees. In this diff we have done the following refactor (taking CPU as an example):
1. caffe2::DeviceType → caffe2::DeviceTypeProto
2. caffe2::CPU → caffe2::PROTO_CPU
3. caffe2::DeviceType = at::DeviceType
4. caffe2::CPU = at::DeviceType::CPU
codemod -d caffe2/caffe2 --extensions h,cc,cpp 'device_type\(\), ' 'device_type(), PROTO_'
+ some manual changes
In short, after this diff, in C++, caffe2::CPU refers to at::DeviceType::CPU, and the old proto caffe2::CPU becomes caffe2::PROTO_CPU.
On the Python side, we have a temporary workaround that aliases `caffe2_pb2.CPU = caffe2_pb2.PROTO_CPU` to make the change easier to review; this will be removed later.
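The renaming and the type-safety difference can be sketched in a few lines. The namespaces `sketch_at` and `sketch_caffe2` and the helper `ToDeviceType` are hypothetical; only the enum vs. `enum class` behavior is standard C++.

```cpp
#include <cassert>
#include <stdexcept>
#include <type_traits>

namespace sketch_at {
// Scoped enum: no implicit conversion to int.
enum class DeviceType : int { CPU = 0, CUDA = 1 };
} // namespace sketch_at

namespace sketch_caffe2 {
// Unscoped enum, as a proto-generated enum would be: converts to int.
enum DeviceTypeProto { PROTO_CPU = 0, PROTO_CUDA = 1 };

// The C++-side names now refer to the scoped enum.
using DeviceType = sketch_at::DeviceType;
constexpr DeviceType CPU = DeviceType::CPU;
constexpr DeviceType CUDA = DeviceType::CUDA;

// Hypothetical explicit conversion from the proto enum.
inline DeviceType ToDeviceType(DeviceTypeProto p) {
  switch (p) {
    case PROTO_CPU:
      return CPU;
    case PROTO_CUDA:
      return CUDA;
  }
  throw std::invalid_argument("unknown DeviceTypeProto value");
}
} // namespace sketch_caffe2

static_assert(
    std::is_convertible<sketch_caffe2::DeviceTypeProto, int>::value,
    "the unscoped enum silently converts to int");
static_assert(
    !std::is_convertible<sketch_caffe2::DeviceType, int>::value,
    "the scoped enum needs an explicit cast");
```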
Reviewed By: ezyang
Differential Revision: D9545704
fbshipit-source-id: 461a28a4ca74e616d3ee183a607078a717fd38a7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10804
Make ShareData and ShareExternalPointer create a new storage when the old one is used by multiple tensors.
When we need to modify a field of a shared storage, we create a new storage instead.
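The behavior can be sketched with a reference-counted storage. `SketchStorage` and `SketchTensor` are hypothetical, and using `std::shared_ptr::use_count()` to detect sharing is an assumption of this sketch, not necessarily how the real code decides.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical storage: a non-owning pointer plus a size.
struct SketchStorage {
  void* data;
  std::size_t nbytes;
};

struct SketchTensor {
  std::shared_ptr<SketchStorage> storage;

  // Point this tensor at an external buffer. If the current storage is
  // shared with another tensor, do not mutate it in place; switch this
  // tensor to a fresh storage so the other tensor is unaffected.
  void ShareExternalPointer(void* ptr, std::size_t nbytes) {
    if (!storage || storage.use_count() > 1) {
      storage = std::make_shared<SketchStorage>();
    }
    storage->data = ptr;
    storage->nbytes = nbytes;
  }
};
```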
Reviewed By: ezyang
Differential Revision: D9350686
fbshipit-source-id: 68d2b6b886b0367b0fc4fabfd55b9a480e7388ca
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9939
Pull Request resolved: https://github.com/facebookresearch/weakly-supervised-action-detection/pull/13
Pull Request resolved: https://github.com/pytorch/translate/pull/166
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9125
Closes https://github.com/pytorch/pytorch/pull/9125
Use inheritance for polymorphism, and remove template parameter
This changes the templating at call sites; the core implementations will change later.
Previously the Caffe2 Tensor class was bound to a particular device/context at compile time. With this change, we're making it a runtime property (stored inside the tensor), but preserving the same semantics. For example, one has to specify the device type in order to create a Tensor - there are no uninitialized tensors. More specifically, the changes are:
1. We added an extra argument *DeviceType* to most of the constructors of the tensor, e.g. Tensor(DeviceType type).
2. The semantics of the constructor Tensor(const Tensor<SrcContext>& src, ContextForCopy* context); have changed. The second argument, context, is passed in so that we can call the templated Copy function. Previously it could be in a different context from the source and target; now we enforce that the context, if provided, has the same device type as src.
3. To preserve the 'get-or-construct' semantics of Blob, we added a specialized getter Blob::GetMutableTensor that verifies both that the Blob contains a Tensor and that it is of the correct type.
4. Specifically, the Tensor type is not default-constructible any more (as we don't have unknown-device tensors), and thus some of the code handling STL containers needs to change.
Note: Some changes are postponed just to keep this diff a bit smaller. Please see `TODO`s.
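What item 4 means for container code can be sketched as below. `SketchDevice`, `SketchTensor` and `MakeTensors` are hypothetical; the container behavior shown is standard C++.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

enum class SketchDevice { CPU, CUDA };

// Hypothetical tensor that must be told its device at construction.
class SketchTensor {
 public:
  explicit SketchTensor(SketchDevice device) : device_(device) {}
  SketchDevice device() const { return device_; }

 private:
  SketchDevice device_;
};

static_assert(
    !std::is_default_constructible<SketchTensor>::value,
    "there is no unknown-device tensor");

inline std::vector<SketchTensor> MakeTensors(
    std::size_t n,
    SketchDevice device) {
  // std::vector<SketchTensor> v(n); would not compile, because it needs
  // a default constructor. Construct each element explicitly instead.
  std::vector<SketchTensor> v;
  v.reserve(n);
  for (std::size_t i = 0; i < n; ++i) {
    v.emplace_back(device);
  }
  return v;
}
```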
Reviewed By: ezyang, houseroad
Differential Revision: D9024330
fbshipit-source-id: e0b8295d2dc6ebe2963383ded5af799ad17164ba
Summary:
Pull Request resolved: https://github.com/facebookresearch/weakly-supervised-action-detection/pull/13
Pull Request resolved: https://github.com/pytorch/translate/pull/166
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9125
Closes https://github.com/pytorch/pytorch/pull/9125
Use inheritance for polymorphism, and remove template parameter
This changes the templating at call sites; the core implementations will change later.
Previously the Caffe2 Tensor class was bound to a particular device/context at compile time. With this change, we're making it a runtime property (stored inside the tensor), but preserving the same semantics. For example, one has to specify the device type in order to create a Tensor - there are no uninitialized tensors. More specifically, the changes are:
1. We added an extra argument *DeviceType* to most of the constructors of the tensor, e.g. Tensor(DeviceType type).
2. The semantics of the constructor Tensor(const Tensor<SrcContext>& src, ContextForCopy* context); have changed. The second argument, context, is passed in so that we can call the templated Copy function. Previously it could be in a different context from the source and target; now we enforce that the context, if provided, has the same device type as src.
3. To preserve the 'get-or-construct' semantics of Blob, we added a specialized getter Blob::GetMutableTensor that verifies both that the Blob contains a Tensor and that it is of the correct type.
4. Specifically, the Tensor type is not default-constructible any more (as we don't have unknown-device tensors), and thus some of the code handling STL containers needs to change.
Note: Some changes are postponed just to keep this diff a bit smaller. Please see `TODO`s.
Reviewed By: xw285cornell
Differential Revision: D8121878
fbshipit-source-id: 4a5e9a677ba4ac82095df959851a054c81eccf81
Necessary for Tensor detemplatization (D8121878) - now the tensor won't have a default constructor (as we don't know the device).
Thus this diff makes TypeMeta constructible with non-default-constructible types, in which case ctor() is non-null but always throws.
It's dangerous, however, as we won't catch potential type errors at compile time. Luckily, the only places where ctor() is used are Blob and Tensor, which have templated wrappers (GetMutable and mutable_data respectively). We can just enforce the necessary type requirements there explicitly as a static_assert.
It also changes the failure behavior to be throw() instead of abort(). Aborting the process is not cool for the library :)
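A sketch of picking a throwing constructor function for non-default-constructible types is below. `SketchPlacementNew`, `DefaultCtor`, `ThrowingCtor`, `PickCtor` and `NeedsArg` are hypothetical names, and the `enable_if` dispatch is one possible way to do it, not necessarily what TypeMeta does.

```cpp
#include <cassert>
#include <new>
#include <stdexcept>
#include <type_traits>

// Type-erased "construct an object at this address" function.
using SketchPlacementNew = void (*)(void*);

// Used for default-constructible types: value-initialize in place.
template <typename T>
void DefaultCtor(void* ptr) {
  new (ptr) T();
}

// Used otherwise: the pointer is non-null, but calling it always throws
// (rather than aborting the process).
template <typename T>
void ThrowingCtor(void* /*ptr*/) {
  throw std::runtime_error("type is not default-constructible");
}

template <typename T>
typename std::enable_if<
    std::is_default_constructible<T>::value,
    SketchPlacementNew>::type
PickCtor() {
  return &DefaultCtor<T>;
}

template <typename T>
typename std::enable_if<
    !std::is_default_constructible<T>::value,
    SketchPlacementNew>::type
PickCtor() {
  return &ThrowingCtor<T>;
}

// Example type without a default constructor.
struct NeedsArg {
  explicit NeedsArg(int) {}
};
```

A templated wrapper that must default-construct can add `static_assert(std::is_default_constructible<T>::value, "...")` to turn the runtime throw back into a compile-time error.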
This is after 2 years and we do not seem to have a use case for this one, so
for the sake of clean API design we should potentially remove this. This would
allow us to potentially pass in arguments to optionally construct an object,
although it is indeed a little bit unclear how we can reuse existing objects if
constructor arguments are passed in. In any case, we may want to remove this
dangling feature.
* WIP: Fix Out of Memory failure in test TensorTest.Tensor64BitDimension
* WIP: update warning message and wrap resize inside TensorTest.Tensor64BitDimension
* WIP: only catch exception which is related to out of memory
* WIP: add return in the out of memory exception
Summary:
Previously, when RTTI was not enabled, we could only print a
(RTTI not enabled ...) type error message. This is annoying when developing
in a mobile environment. This adds a gRegistry that records #T, so that a basic
string for the type is available for easy type inference.
Reviewed By: Yangqing
Differential Revision: D6849614
fbshipit-source-id: d41417d72fdcfb7b8c9ddc4ded604ea598572b73
Summary:
The use case is that sometimes we need a Tensor of custom type instead of POD
or string. This diff allows one to delegate to BlobSerializerBase to further
serialize the contents inside the Tensor.
Design choices:
(1) Each element is serialized as a BlobProto string, and stored in the
repeated string field.
(2) UNDEFINED is used as the enum value for the tensor data type, and the exact
type string is stored in the additional field.
(3) BlobSerializer is called on each item to obtain the serialized string.
(4) This requires the custom type to have a copy constructor - otherwise it
will simply not be possible to copy over the deserialized content without
an explicit type.
See blob_test.cc for an example.
Reviewed By: sunnieshang
Differential Revision: D6300196
fbshipit-source-id: 18bf94a22a07337e0fa83d3f1004b3651e38cf27
Summary: If the blob type switches between, for example, fp32 and fp16, we should share the tensor buffer. This kind of switching can happen with memonger and in-place conversions.
Reviewed By: bddppq
Differential Revision: D5812333
fbshipit-source-id: 44d54bfe52cbda734db8c7f20d6970e4b51ee1e1
Summary:
There are ad-hoc efforts on avoiding excessive device synchronizations, such as
async_dag, singlethread_async, etc. This diff aims to provide an early design
for a general Event class, that can achieve the following:
(1) It is device agnostic, essentially using a vtable to do cross device record,
wait and synchronization.
(2) Created new functions WaitEvent and Record in the Context class for
interacting with Events.
(3) Exposed the corresponding WaitEvent and Record functions in the OperatorBase
class as well.
An example use case is that, after potential future refactoring, one can achieve
a real async execution per operator by running
op.WaitEvent(previous_event);
op.RunAsync();
op.RecordEvent(this_op_event);
and the next op can do
next_op.WaitEvent(this_op_event);
Right now, I changed async_dag net implementation so that it uses the general
event design. The old Event class is assimilated to the general Event class and
the old Stream class is now essentially taken over by the Context class itself.
Reviewed By: harouwu
Differential Revision: D5648463
fbshipit-source-id: 58bd84d06e4a9977b0b835110ddb2f18be3b7cbc
Summary: Added functionality that allows users to store huge blobs of any type, not only Tensors. The blob has to be divided into chunks in the same way as a Tensor blob.
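Chunking of an already-serialized payload can be sketched as follows. `SplitIntoChunks` and `JoinChunks` are hypothetical helpers working on plain strings; the real code chunks through the blob serializer interface, which is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Split data into pieces of at most chunk_size bytes, preserving order.
inline std::vector<std::string> SplitIntoChunks(
    const std::string& data,
    std::size_t chunk_size) {
  if (chunk_size == 0) {
    throw std::invalid_argument("chunk_size must be positive");
  }
  std::vector<std::string> chunks;
  for (std::size_t off = 0; off < data.size(); off += chunk_size) {
    // substr clamps the length at the end of the string.
    chunks.push_back(data.substr(off, chunk_size));
  }
  return chunks;
}

// Reassemble the original payload from its chunks.
inline std::string JoinChunks(const std::vector<std::string>& chunks) {
  std::string out;
  for (const std::string& chunk : chunks) {
    out += chunk;
  }
  return out;
}
```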
Reviewed By: kennyhorror
Differential Revision: D5432762
fbshipit-source-id: c171faacd99d209bfae6f9707ebde7c4e23ba3b9
Summary:
__attribute__((unused)) is not supported on Windows, so we actually need to
substitute it with a macro.
Also changed UNUSED_VARIABLE to CAFFE2_UNUSED because we also use it to mark
functions now.
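A portable "unused" macro can be sketched like this. `SKETCH_UNUSED` is a hypothetical name, and expanding to nothing on MSVC is an assumption of this sketch; it is not a claim about what CAFFE2_UNUSED expands to.

```cpp
#include <cassert>

// Expand to the GCC/Clang attribute where it is supported, and to
// nothing under MSVC (assumption made for this sketch).
#if defined(_MSC_VER)
#define SKETCH_UNUSED
#else
#define SKETCH_UNUSED __attribute__((unused))
#endif

// The macro can mark a function that may end up unreferenced...
SKETCH_UNUSED static int HelperThatMayBeUnused() {
  return 42;
}

// ...as well as a variable that is only used in some build modes.
inline int UsesOnlyInDebug(int value) {
  SKETCH_UNUSED int debug_copy = value;
  return value + 1;
}
```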
Reviewed By: ajtulloch
Differential Revision: D5497063
fbshipit-source-id: bcda026e626c41f71c21c36f029a3f871eaea7d4
Summary: This uses `clang-tidy` to comment out unused parameters (in functions, methods and lambdas) in fbcode. Cases that the tool failed to handle are fixed manually.
Reviewed By: igorsugak
Differential Revision: D5454343
fbshipit-source-id: 5dee339b4334e25e963891b519a5aa81fbf627b2
Summary: When we use the int32_data field to serialize float16 tensors, it's possible to end up with a representation up to 50% larger than can be achieved using byte_data. The reason is varints (https://developers.google.com/protocol-buffers/docs/encoding#varints). In the worst case (when the highest bit is set) a varint uses three 8-bit blocks, i.e. 24 bits, for each number. Saving in byte_data removes this overhead.
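The arithmetic can be checked with a small helper that counts varint bytes. `VarintSize` is a hypothetical function written for this note; it follows the base-128 varint rule of 7 payload bits per byte and ignores field tags and length prefixes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of bytes a base-128 varint needs for value: each byte carries
// 7 payload bits, and one more byte is needed while bits remain.
inline std::size_t VarintSize(std::uint64_t value) {
  std::size_t bytes = 1;
  while (value >= 0x80) {
    value >>= 7;
    ++bytes;
  }
  return bytes;
}
```

A 16-bit pattern up to 0x3FFF fits in 2 varint bytes, but anything from 0x4000 upward, including every pattern with the top bit set, needs 3 bytes (24 bits) versus 2 raw bytes (16 bits), which is where the 50% figure comes from.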
Reviewed By: Yangqing
Differential Revision: D5375267
fbshipit-source-id: 0068daed25cd0157ea80a768b6e3899ea2bd8caf