78 Commits

Author SHA1 Message Date
Richard Barnes
29d759948e use irange for loops 2 (#66746)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66746

Modified loops in files under fbsource/fbcode/caffe2/ from the format

`for(TYPE var=x0;var<x_max;var++)`

to the format

`for(const auto var: irange(x_max))`

This was achieved by running r-barnes's loop upgrader script (D28874212) with some modification to exclude all files under /torch/jit and a number of reversions or unused variable suppression warnings added by hand.
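
The loop rewrite described above can be illustrated with a small stand-in for `c10::irange`. The `IntRange` class and the `irange`, `sum_old`, and `sum_new` functions below are hypothetical, written only for this sketch; the real helper lives in `c10/util/irange.h` and is more general (for example, it also supports a start value).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for c10::irange: iterates over [0, end).
class IntRange {
 public:
  class Iterator {
   public:
    explicit Iterator(std::size_t v) : value_(v) {}
    std::size_t operator*() const { return value_; }
    Iterator& operator++() {
      ++value_;
      return *this;
    }
    bool operator!=(const Iterator& other) const {
      return value_ != other.value_;
    }

   private:
    std::size_t value_;
  };

  explicit IntRange(std::size_t end) : end_(end) {}
  Iterator begin() const { return Iterator(0); }
  Iterator end() const { return Iterator(end_); }

 private:
  std::size_t end_;
};

inline IntRange irange(std::size_t end) { return IntRange(end); }

// Old style: mutable index, with the bound and increment spelled out by hand.
inline long sum_old(const std::vector<int>& v) {
  long total = 0;
  for (std::size_t i = 0; i < v.size(); i++) {
    total += v[i];
  }
  return total;
}

// New style: the index is const and the range is stated once.
inline long sum_new(const std::vector<int>& v) {
  long total = 0;
  for (const auto i : irange(v.size())) {
    total += v[i];
  }
  return total;
}
```

Because the loop variable is `const auto`, a loop body that never reads it can trigger an unused-variable warning, which is presumably why the commit mentions hand-added suppressions.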

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D31705361

fbshipit-source-id: 33fd22eb03086d114e2c98e56703e8ec84460268
2021-12-10 04:26:23 -08:00
Xue Li
2f099c7555 Revert D30652629: use irange for loops
Test Plan: revert-hammer

Differential Revision:
D30652629 (687c2267d4)

Original commit changeset: 0ae6c4bbbb55

fbshipit-source-id: 5c4f067b584a021c8c9656454d1ee60999600fb3
2021-10-15 15:23:10 -07:00
Richard Barnes
687c2267d4 use irange for loops (#66234)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66234

Modified loops in files under fbsource/fbcode/caffe2/ from the format

`for(TYPE var=x0;var<x_max;var++)`

to the format

`for(const auto var: irange(x_max))`

This was achieved by running r-barnes's loop upgrader script (D28874212) with some modification to exclude all files under /torch/jit and a number of reversions or unused variable suppression warnings added by hand.

bypass_size_limit
allow-large-files

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D30652629

fbshipit-source-id: 0ae6c4bbbb554bad42e372792a6430e1acf15e3e
2021-10-15 13:50:33 -07:00
Adam Simpkins
e0364ccc33 [caffe2] break one circular dependency between Caffe2 and ATen-cpu (#62632)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62632

Update the caffe2/core/context.h to directly use `at::mt19937` instead of the
`at::CPUGeneratorImpl` wrapper class from the ATen-cpu library.

Using `at::CPUGeneratorImpl` causes circular dependencies between the ATen and
caffe2 code.  In particular the `at::CPUGeneratorImpl::get_state()` logic
depends on CPU Tensor functionality that currently depends on code from
caffe2.

Test Plan:
The RNG behavior should be identical to the previous code (perhaps even
faster since we now avoid virtual function calls).

  buck test //caffe2/caffe2:caffe2_test_cpu \
    //caffe2/caffe2/python: //caffe2/caffe2/fb/operators:

Differential Revision: D29915701

fbshipit-source-id: f9b2eab8d3b21b2224d30bcf52be9c0e7eb7cd0a
2021-08-02 22:40:56 -07:00
Emilio Castillo
f9ec86a6c6 External stream (#59527)
Summary:
The previous attempt is https://github.com/pytorch/pytorch/issues/57781

We now add two CUDA bindings to avoid using ctypes, which fixes a Windows issue.
However, we still use ctypes to allocate the stream and create its pointer
(we can do this with a 0-dim tensor too if it feels better).

CC. ezyang rgommers ngimel mruberry

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59527

Reviewed By: albanD

Differential Revision: D29053062

Pulled By: ezyang

fbshipit-source-id: 661e7e58de98b1bdb7a0871808cd41d91fe8f13f
2021-06-14 13:46:11 -07:00
Rong Rong (AI Infra)
689a5edd0a Revert D28326365: [pytorch][PR] Add torch.cuda.streams.ExternalStream
Test Plan: revert-hammer

Differential Revision:
D28326365 (d7ef9b73fb)

Original commit changeset: b67858c80339

fbshipit-source-id: 337588d40b96cf04e46e554fa481ae7fd4254478
2021-06-04 11:19:36 -07:00
Emilio Castillo
d7ef9b73fb Add torch.cuda.streams.ExternalStream (#57781)
Summary:
This is required in https://github.com/pytorch/pytorch/pull/57110#issuecomment-828357947

We need to provide a means to synchronize on externally allocated streams for DLPack support in the Python array API.

cc mruberry rgommers leofang asi1024 kmaehashi

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57781

Reviewed By: mrshenli

Differential Revision: D28326365

Pulled By: ezyang

fbshipit-source-id: b67858c8033949951b49a3d319f649884dfd0a91
2021-06-04 08:47:09 -07:00
Jane Xu
71ca600af9 Renaming CAFFE2_API to TORCH_API (#49496)
Summary:
Since caffe2 and torch have been consolidated, CAFFE2_API should be merged with TORCH_API. Addresses a TODO.

Manually edited some references of the removed `CAFFE2_API`:
* `CONTRIBUTING.md`
* `caffe2/proto/CMakeLists.txt`
* `cmake/ProtoBuf.cmake`
* `c10/macros/Export.h`
* `torch/csrc/WindowsTorchApiMacro.h`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49496

Reviewed By: malfet, samestep

Differential Revision: D25600726

Pulled By: janeyx99

fbshipit-source-id: 7e068d959e397ac183c097d7e9a9afeca5ddd782
2020-12-18 10:54:50 -08:00
Basil Hosmer
f05b66b70d pass TypeMeta by value (#45026)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45026

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D23802943

Pulled By: bhosmer

fbshipit-source-id: 81b06ef00bf8eb4375c0e0ff2032e03bd1d1188a
2020-10-30 10:14:17 -07:00
Tristan Rice
0c9787c758 caffe2: use at::mt19937 instead of std::mt19937 (10x speedup) (#43987)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43987

This replaces the caffe2 CPU random number (std::mt19937) with at::mt19937 which is the one currently used in pytorch. The ATen RNG is 10x faster than the std one and appears to be more robust given bugs in the std (https://fburl.com/diffusion/uhro7lqb)

For large embedding tables (10GB+) we see UniformFillOp taking upwards of 10 minutes as we're bottlenecked on the single threaded RNG. Swapping to at::mt19937 cuts that time to 10% of the current.

Test Plan: Ran all relevant tests + CI. This doesn't introduce new features (+ is a core change) so existing tests+CI should be sufficient to catch regressions.

Reviewed By: dzhulgakov

Differential Revision: D23219710

fbshipit-source-id: bd16ed6415b2933e047bcb283a013d47fb395814
2020-10-16 16:08:35 -07:00
Tristan Rice
5e04bb2c1c caffe2: expose CPUContext RandSeed for backwards compatibility with external RNG (#43239)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43239

This is an incremental step as part of the process to migrate caffe2 random number generator off of std::mt19937 and to instead use at::mt19937+at::CPUGeneratorImpl. The ATen variants are much more performant (10x faster).

This adds a way to get the CPUContext RandSeed for tail use cases that require a std::mt19937 and borrow the CPUContext one.

Test Plan: This isn't used anywhere within the caffe2 codebase. Compile should be sufficient.

Reviewed By: dzhulgakov

Differential Revision: D23203280

fbshipit-source-id: 595c1cb447290604ee3ef61d5b5fc079b61a4e14
2020-08-21 19:36:38 -07:00
Yinghai Lu
a2f3c6c26f Call RandomNumberSeed() on-demand (#33539)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33539

We rarely use the `random_seed_` in context, but we always initialize it with `RandomNumberSeed()`, which isn't trivial. This diff makes it so that we only call `RandomNumberSeed()` once, at the point where we first want to use `random_seed_`.
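
A minimal sketch of the on-demand pattern described above. `ExpensiveSeed`, `seed_call_count`, and `LazySeedContext` are hypothetical names for this illustration and are not the actual Caffe2 code; the seed source here is just a clock read standing in for a costly call.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Counts how often the (hypothetical) expensive seed source is invoked.
inline int& seed_call_count() {
  static int count = 0;
  return count;
}

// Hypothetical stand-in for a non-trivial seed computation.
inline std::uint32_t ExpensiveSeed() {
  ++seed_call_count();
  const auto ticks =
      std::chrono::steady_clock::now().time_since_epoch().count();
  return static_cast<std::uint32_t>(ticks);
}

// Sketch of a context that computes its seed only on first use.
class LazySeedContext {
 public:
  std::uint32_t RandSeed() {
    if (!seed_set_) {
      seed_ = ExpensiveSeed();
      seed_set_ = true;
    }
    return seed_;
  }

 private:
  std::uint32_t seed_ = 0;
  bool seed_set_ = false;
};
```

Constructing a `LazySeedContext` costs nothing seed-related; the first `RandSeed()` call pays for the seed and later calls reuse it. This sketch is not thread-safe, which is an assumption that one context is used from one thread at a time.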

Test Plan:
unittests.

Canaries:
AF: https://our.intern.facebook.com/intern/ads/canary/424753437441438410
AI: https://our.intern.facebook.com/intern/ads/canary/424753467414318838
Prospector: https://our.intern.facebook.com/intern/ads/canary/424753976999968569

Reviewed By: ipiszy

Differential Revision: D19993190

fbshipit-source-id: 1d2606bd65476ff3b519c69f9cbfa3b80f75cdff
2020-02-22 01:22:18 -08:00
peterjc123
c4121ed8db Fix is_fundamental template for MSVC (#30959)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/30932
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30959

Differential Revision: D18891797

Pulled By: mingbowan

fbshipit-source-id: e6c36ee80065e66117873e768f86f507c48aaef1
2019-12-19 12:10:22 -08:00
James Reed
dfb081a7e4 Fix a lot of C++ build warnings (#16411)
Summary:
I went through my build log and did what I thought were reasonable fixes to all the C++ compilation warnings that came up.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16411

Differential Revision: D13901006

Pulled By: jamesr66a

fbshipit-source-id: 02df4e3e5a5c8dd9e69ac9f065cd3f2a80645033
2019-01-31 14:35:56 -08:00
Edward Yang
298b775577 Delete temporary ATenCoreTest. (#14622)
Summary:
It was previously used to ensure that ATen/core was working;
but now we have plenty of headers and C++ files in ATen/core
so this is no longer necessary.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14622

Differential Revision: D13276899

Pulled By: ezyang

fbshipit-source-id: 9bef7eb1882ccdfa3ee7681a3d5b048ea94b59d3
2018-12-03 15:07:40 -08:00
Dmytro Dzhulgakov
0cfbbceac3 Change Tensor::CopyFrom to a simple double dispatch (#14268)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14268

Removes the need for Context in Tensor by doing simple dispatch for CopyBytes. It'd eventually be subsumed by Roy Li's changes of proper copy_ op, but before that is done, let's get a clear logic of how copies are implemented and clean up some cruft in the CopyFrom implementation.

Note that with these changes, one can probably get rid of Context::CopyFromCPU/CopyToCPU, but that is a matter for follow-up diffs.

This diff doesn't change the API of Tensor yet, but relies on the fact that passing `Context` to CopyFrom makes copy async if the device is CUDA and doesn't have any effect otherwise (that's how Context methods are implemented).

This doesn't change semantics of copy async implementation - as before it blindly calls cudaMemcpyAsync which probably means that it can be misused if invoked separately outside of operator body. I'll leave it for the follow up copy_ unification.

For Extend() we always do async copy - it makes sense as it's an in-place device-device operation and only any further op would be observable.

Note: there are now three ways of invoking copy in C2 code - templated CopyBytes, virtual CopyFromCPU/etc, and double-dispatch free method here. Hopefully we can get rid of the second one.

Also, please advise whether it's c10-worthy :)
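
The double-dispatch idea in this commit can be sketched as a free function that looks up a copy routine by (source device, destination device). Everything below is a hypothetical illustration: `Device`, `CopyRegistry`, `RegisterCopy`, and this `CopyBytes` signature are assumptions made for the sketch, not the API that landed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical device enum; FAKE_GPU exists only to show a missing entry.
enum class Device : int { CPU = 0, FAKE_GPU = 1, COUNT = 2 };

using CopyFn = void (*)(std::size_t nbytes, const void* src, void* dst);

constexpr int kNumDevices = static_cast<int>(Device::COUNT);

// Table indexed by [source device][destination device].
struct CopyRegistry {
  CopyFn table[kNumDevices][kNumDevices] = {};
};

inline CopyRegistry& Registry() {
  static CopyRegistry registry;
  return registry;
}

inline void RegisterCopy(Device src, Device dst, CopyFn fn) {
  Registry().table[static_cast<int>(src)][static_cast<int>(dst)] = fn;
}

// Free function: dispatches on both device types, no Context object needed.
// Returns false when no copy routine is registered for the pair.
inline bool CopyBytes(std::size_t nbytes, const void* src, Device src_dev,
                      void* dst, Device dst_dev) {
  CopyFn fn =
      Registry().table[static_cast<int>(src_dev)][static_cast<int>(dst_dev)];
  if (fn == nullptr) {
    return false;
  }
  fn(nbytes, src, dst);
  return true;
}

inline void CopyCpuToCpu(std::size_t nbytes, const void* src, void* dst) {
  std::memcpy(dst, src, nbytes);
}
```

A table of plain function pointers keeps the lookup cheap and lets each device backend register its own routines without the tensor code depending on that backend.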

Reviewed By: ezyang

Differential Revision: D13117987

fbshipit-source-id: a6772d6dcf3effaf06717da3a656fc9873b310b5
2018-11-28 15:45:37 -08:00
Sebastian Messmer
4b0fc5200b Fix include paths for typeid.h (#13689)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13689

Now that typeid.h lives in c10/util, the include paths should reflect that.

Reviewed By: ezyang

Differential Revision: D12912237

fbshipit-source-id: e54225f049f690de77cb6d5f417994b211a6e1fb
2018-11-14 18:04:09 -08:00
Edward Yang
0478d32cb8 Move AlignOf, SmallVector and ArrayRef to c10.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13916

Reviewed By: smessmer

Differential Revision: D13046722

fbshipit-source-id: 1583d3170d60e22f0a535cd1fd56bdf928186f5d
2018-11-14 11:13:16 -08:00
Jerry Zhang
b89a3b50fb Remove StaticContext (#12547)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12547

Pull Request resolved: https://github.com/pytorch/pytorch/pull/12305

Remove StaticContext from context_base.h

Reviewed By: dzhulgakov

Differential Revision: D10073519

fbshipit-source-id: 350beec3c54365edef338318ce58229ccb825a98
2018-10-10 19:41:03 -07:00
Jerry Zhang
7724807551 Remove ExtractDeviceOption from StaticContext (#12304)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12304

- Make ExtractDeviceOption a free function.
- Add a Storage(at::Device) constructor in order to preserve the device_id.

Reviewed By: dzhulgakov

Differential Revision: D10069839

fbshipit-source-id: a5f3994a39bdf1b7503b39bb42c228e438b52bfa
2018-10-10 14:12:16 -07:00
Jerry Zhang
1c69d368e1 Remove New with Allocator Registry (#12111)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12111

Set up an allocator registry keyed by at::DeviceType, and remove New from StaticContext.
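
As a rough illustration of a registry keyed by device type, the sketch below stores one allocator pointer per device. `DeviceType`, `Allocator`, `MallocAllocator`, `RegisterAllocator`, and `LookupAllocator` are hypothetical names chosen for this example; the real registry has its own types and signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical device enum for this sketch only.
enum class DeviceType : int { CPU = 0, CUDA = 1, COUNT = 2 };

// Minimal allocator interface (an assumption, not the real one).
struct Allocator {
  virtual ~Allocator() = default;
  virtual void* allocate(std::size_t nbytes) = 0;
  virtual void deallocate(void* ptr) = 0;
};

struct MallocAllocator final : Allocator {
  void* allocate(std::size_t nbytes) override { return std::malloc(nbytes); }
  void deallocate(void* ptr) override { std::free(ptr); }
};

// One slot per device type; empty slots stay nullptr.
inline Allocator** AllocatorSlots() {
  static Allocator* slots[static_cast<int>(DeviceType::COUNT)] = {};
  return slots;
}

inline void RegisterAllocator(DeviceType type, Allocator* allocator) {
  AllocatorSlots()[static_cast<int>(type)] = allocator;
}

inline Allocator* LookupAllocator(DeviceType type) {
  return AllocatorSlots()[static_cast<int>(type)];
}
```

Callers ask the registry for the allocator of a device instead of going through a per-device static context object, so a missing backend shows up as a null lookup rather than a link-time dependency.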

Reviewed By: ezyang

Differential Revision: D10022853

fbshipit-source-id: 3e88a181fe5df24f33f49b88be1f75284a185588
2018-10-09 10:53:52 -07:00
Yangqing Jia
38f3d1fc40 move flags to c10 (#12144)
Summary:
Still in flux.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12144

Reviewed By: smessmer

Differential Revision: D10140176

Pulled By: Yangqing

fbshipit-source-id: 1a313abed022039333e3925d19f8b3ef2d95306c
2018-10-04 02:09:56 -07:00
Jerry Zhang
74dc4460eb New in StaticContext returns at::DataPtr (#12029)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12029

In order to remove the New() function from StaticContext (a step toward removing StaticContext itself) and converge on the Allocator design, we'll first change the return type of New to at::DataPtr.

Reviewed By: ezyang

Differential Revision: D9889990

fbshipit-source-id: 3257c763530b987025f428741bdd2e089d11bad4
2018-10-03 19:10:07 -07:00
Jerry Zhang
006171fffc Back out "[pytorch][PR] Revert "Move CreateContext to global registry (#11688)"" (#12121)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12121

Pull Request resolved: https://github.com/pytorch/pytorch/pull/12055

Original commit changeset: 6ca9de65b707

Reviewed By: ezyang

Differential Revision: D10033396

fbshipit-source-id: ca9f4b2f7ef0561f619b833415d394a8b9972bf4
2018-10-01 11:10:46 -07:00
Edward Yang
d7e11e3aae Revert "Move CreateContext to global registry (#11688)" (#12049)
Summary:
This reverts commit 3ae6ee4ebd.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12049

Differential Revision: D10030954

Pulled By: ezyang

fbshipit-source-id: 6ca9de65b707c5b4c68280fc6f1b8e5ad7251efc
2018-09-25 10:13:43 -07:00
Jerry Zhang
3ae6ee4ebd Move CreateContext to global registry (#11688)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11688

As a first step toward removing the static context (merging it with the allocator), we'll create a
global registry for context constructors, and remove the CreateContext function from tensor.

Reviewed By: ezyang, dzhulgakov

Differential Revision: D9779821

fbshipit-source-id: 8b239ea50af7a0556fde2382f58f79194f0e3dc1
2018-09-24 17:07:50 -07:00
Alexander Sidorov
eb039dc92c Add CHECKs into GetTensorInfo and ExtractDeviceOption (#11597)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11597

We should always CHECK pointers which we plan to dereference
if they are inputs to the function. Nobody knows how the function will
be called in the future.
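
A small sketch of the rule above: validate pointer arguments before dereferencing them. `CheckNotNull`, `TensorInfo`, and `FillInfo` are hypothetical names for this illustration. Unlike a CHECK macro, which typically aborts the process, this helper throws so that the failure path is easy to exercise in a test.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical check helper: throws if the pointer is null.
template <typename T>
T* CheckNotNull(T* ptr, const char* name) {
  if (ptr == nullptr) {
    throw std::invalid_argument(std::string(name) + " must not be null");
  }
  return ptr;
}

// Hypothetical output struct for this sketch.
struct TensorInfo {
  std::size_t nbytes = 0;
};

// Both pointers are inputs to the function, so both are checked up front.
inline void FillInfo(const std::size_t* size_in, TensorInfo* info_out) {
  CheckNotNull(size_in, "size_in");
  CheckNotNull(info_out, "info_out");
  info_out->nbytes = *size_in;
}
```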

Reviewed By: yinghai

Differential Revision: D9800002

fbshipit-source-id: 7fd05f4717f2256d1b09a9e75475b12de6685b03
2018-09-14 09:40:27 -07:00
Edward Yang
7607b49538 s/GetDevicetype/device_type/ (#11656)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11656

The mis-capitalization really sticks in my craw.  I know why (we
already have a static function named GetDeviceType), but let's
name it differently.

```
codemod -d . --extensions cc,cpp,cu,cuh,h,py,hpp,TARGETS GetDevicetype device_type
```

Reviewed By: jerryzh168

Differential Revision: D9813544

fbshipit-source-id: fe462f4bc40b03e74921f8cf5ebd9cfc52e7e636
2018-09-13 16:32:51 -07:00
Jerry Zhang
5e400e9cae move context_base.h to ATen/core (#11336)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11336

Move `context_base.h` header to `ATen/core` and the implementations are in `caffe2/core/context_base.cc`

Reviewed By: ezyang

Differential Revision: D9670493

fbshipit-source-id: ce5bf2b3b4c80e9b62819f4332ce68af82720055
2018-09-07 12:20:25 -07:00
Jerry Zhang
9f4bcdf075 caffe2::DeviceType -> at::DeviceType (#11254)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11254
Previously we used DeviceType from caffe2.proto directly, but it is an `enum` with an implicit conversion to int, which is not type safe. For example, we have to explicitly check that a device type is valid in event.h:
```
template <int d>
struct EventCreateFunctionRegisterer {
  explicit EventCreateFunctionRegisterer(EventCreateFunction f) {
    static_assert(d < MaxDeviceTypes, "");
    Event::event_creator_[d] = f;
  }
};
```
at::DeviceType is an `enum class`: it has no implicit conversion to int and provides better type safety guarantees. In this diff we have done the following refactor (taking CPU as an example):

    1. caffe2::DeviceType → caffe2::DeviceTypeProto
    2. caffe2::CPU → caffe2::PROTO_CPU
    3. caffe2::DeviceType = at::DeviceType
    4. caffe2::CPU = at::DeviceType::CPU

codemod -d caffe2/caffe2 --extensions h,cc,cpp 'device_type\(\), ' 'device_type(), PROTO_'
+ some manual changes

In short, after this diff, in C++, caffe2::CPU refers to at::DeviceType::CPU and the old proto caffe2::CPU becomes caffe2::PROTO_CPU.
On the Python side, we have a temporary workaround that aliases `caffe2_pb2.CPU = caffe2_pb2.PROTO_CPU` to make the change easier to review; this will be removed later.
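
The type-safety difference this commit relies on can be checked at compile time. In the sketch below, `ProtoDevice`, `Device`, `FromProto`, and `ToIndex` are hypothetical names standing in for the proto enum and the scoped enum; they are not the actual Caffe2 or ATen declarations.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins for this sketch only.
enum ProtoDevice { PROTO_CPU = 0, PROTO_CUDA = 1 };  // unscoped enum
enum class Device : int { CPU = 0, CUDA = 1 };       // scoped enum

// An unscoped enum converts to int silently; a scoped enum does not.
static_assert(std::is_convertible<ProtoDevice, int>::value,
              "unscoped enum converts implicitly");
static_assert(!std::is_convertible<Device, int>::value,
              "scoped enum has no implicit conversion");

// Conversions between the two have to be written out explicitly.
inline Device FromProto(ProtoDevice p) {
  switch (p) {
    case PROTO_CPU:
      return Device::CPU;
    case PROTO_CUDA:
      return Device::CUDA;
  }
  return Device::CPU;  // not reached for valid inputs
}

// Using a scoped enum as an array index needs a visible cast.
inline int ToIndex(Device d) { return static_cast<int>(d); }
```

With the scoped enum, passing a device where an int is expected (or the reverse) fails to compile, so range checks like the `static_assert` in event.h are only needed at the explicit conversion points.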

Reviewed By: ezyang

Differential Revision: D9545704

fbshipit-source-id: 461a28a4ca74e616d3ee183a607078a717fd38a7
2018-09-05 16:28:09 -07:00
Edward Yang
91797c0672 Replace direct include of caffe2.pb.h with an intermediary header caffe2_pb.h (#10946)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10946

```
codemod -d . --extensions cc,cpp,cu,cuh,h caffe2/proto/caffe2.pb.h caffe2/proto/caffe2_pb.h
```

Reviewed By: houseroad

Differential Revision: D9539945

fbshipit-source-id: 497d04720e8e7e61c05ffe1b23733d0cb774de7e
2018-08-28 11:57:08 -07:00
Tongliang Liao
7487ee55f1 Resolving error C2487 "member of dll interface class may not be declared with dll interface" by removing nested CAFFE2_API. (#10572)
Summary:
For #10570
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10572

Differential Revision: D9357984

Pulled By: Yangqing

fbshipit-source-id: a8f74e384eb3219fb6ac71ada4a45e6bce9199eb
2018-08-16 00:25:41 -07:00
Yangqing Jia
0a809fc8b1 build changes to make cpu unified build working. (#10504)
Summary:
Properly annotated all apis for cpu front. Checked with cmake using

cmake -DUSE_ATEN=ON -DUSE_CUDA=OFF -DBUILD_ATEN=ON

and resulting libcaffe2.so has about 11k symbols.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10504

Reviewed By: ezyang

Differential Revision: D9316491

Pulled By: Yangqing

fbshipit-source-id: 215659abf350af7032e9a4b0f28a856babab2454
2018-08-15 17:22:36 -07:00
Edward Yang
fa6b28bf40 Move ArrayRef, Backtrace, Error, SmallVector, optional to ATen/core; add CoreAPI (#10092)
Summary:
This also makes Backtrace more portable, by disabling its functionality for
mobile builds as well.

It also handles Caffe2 static Windows builds by introducing a new variable,
AT_CORE_STATIC_WINDOWS, which must be set if you're building
ATen on Windows as part of a static library.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10092

Reviewed By: gchanan, smessmer

Differential Revision: D9094393

Pulled By: ezyang

fbshipit-source-id: 93281f9302bd378605a26589ae308faf1dac7df4
2018-08-01 08:39:22 -07:00
Edward Yang
37a226de63 When BUILD_ATEN=OFF, use ATen/core directly (#10019)
Summary:
ATenCore.h is a dummy header to just test that this is working at all.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10019

Reviewed By: smessmer

Differential Revision: D9067262

Pulled By: ezyang

fbshipit-source-id: 58bab9c0aa83b56335e36b719b9b6505400d8dee
2018-07-30 21:09:55 -07:00
Jerry Zhang
aebf3b47ae Remove template parameter from Tensor (#9939)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9939

Pull Request resolved: https://github.com/facebookresearch/weakly-supervised-action-detection/pull/13

Pull Request resolved: https://github.com/pytorch/translate/pull/166

Pull Request resolved: https://github.com/pytorch/pytorch/pull/9125

Closes https://github.com/pytorch/pytorch/pull/9125

Use inheritance for polymorphism, and remove template parameter
This is to change the templating in call sites, the core implementations will change later

Previously, the Caffe2 Tensor class was fixed at compile time to a particular device/context. With this change, we make the device a runtime property (stored inside the tensor) but preserve the same semantics. For example, one has to specify a device type in order to create a Tensor; there are no uninitialized tensors. More specifically, the changes are:

1. We added an extra argument *DeviceType* to most of the constructors of the tensor, e.g. Tensor(DeviceType type).
2. The semantics of the constructor Tensor(const Tensor<SrcContext>& src, ContextForCopy* context); have changed. The second argument, the context, is passed in so that we can call the templated Copy function. Previously it could be a different context from the source and target; now, if it is provided, we enforce that it has the same device type as src.
3. To preserve 'get-or-construct' semantics of Blob, we added specialized getter Blob::GetMutableTensor that verifies both that Blob contains a Tensor and that it's of a correct type
4. Specifically, Tensor type is not default-constructible any more (as we don't have unknown device tensors) and thus some of the code handling STL containers needs to change
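
A minimal sketch of points 1 and 4 in the list above: the device becomes a constructor argument stored in the object, and the type stops being default-constructible. The `Tensor` class, `DeviceType` enum, and `MakeTensors` helper here are hypothetical simplifications for illustration, not the real Caffe2 classes.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical device enum for this sketch only.
enum class DeviceType { CPU, CUDA };

// The device is a runtime field instead of a template parameter.
class Tensor {
 public:
  explicit Tensor(DeviceType type) : type_(type) {}
  DeviceType GetDeviceType() const { return type_; }

 private:
  DeviceType type_;
};

// There is no "unknown device" tensor, so no default constructor.
static_assert(!std::is_default_constructible<Tensor>::value,
              "a Tensor always has a device");

// STL code that relied on default construction (such as resize(n)) has to
// supply a prototype element instead.
inline std::vector<Tensor> MakeTensors(std::size_t n, DeviceType type) {
  return std::vector<Tensor>(n, Tensor(type));
}
```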

Note: Some changes are postponed just to keep this diff a bit smaller. Please see `TODO`s.

Reviewed By: ezyang, houseroad

Differential Revision: D9024330

fbshipit-source-id: e0b8295d2dc6ebe2963383ded5af799ad17164ba
2018-07-27 10:56:39 -07:00
Jerry Zhang
969b62f276 Revert D8121878: Remove template parameter from Tensor
Differential Revision:
D8121878

Original commit changeset: 4a5e9a677ba4

fbshipit-source-id: d8e2c0bb145b52fbcca323b22d1d3346f0b3249e
2018-07-26 14:02:04 -07:00
Jerry Zhang
cd5adc7b5f Remove template parameter from Tensor (#13)
Summary:
Pull Request resolved: https://github.com/facebookresearch/weakly-supervised-action-detection/pull/13

Pull Request resolved: https://github.com/pytorch/translate/pull/166

Pull Request resolved: https://github.com/pytorch/pytorch/pull/9125

Closes https://github.com/pytorch/pytorch/pull/9125

Use inheritance for polymorphism, and remove template parameter
This is to change the templating in call sites, the core implementations will change later

Previously, the Caffe2 Tensor class was fixed at compile time to a particular device/context. With this change, we make the device a runtime property (stored inside the tensor) but preserve the same semantics. For example, one has to specify a device type in order to create a Tensor; there are no uninitialized tensors. More specifically, the changes are:

1. We added an extra argument *DeviceType* to most of the constructors of the tensor, e.g. Tensor(DeviceType type).
2. The semantics of the constructor Tensor(const Tensor<SrcContext>& src, ContextForCopy* context); have changed. The second argument, the context, is passed in so that we can call the templated Copy function. Previously it could be a different context from the source and target; now, if it is provided, we enforce that it has the same device type as src.
3. To preserve 'get-or-construct' semantics of Blob, we added specialized getter Blob::GetMutableTensor that verifies both that Blob contains a Tensor and that it's of a correct type
4. Specifically, Tensor type is not default-constructible any more (as we don't have unknown device tensors) and thus some of the code handling STL containers needs to change

Note: Some changes are postponed just to keep this diff a bit smaller. Please see `TODO`s.

Reviewed By: xw285cornell

Differential Revision: D8121878

fbshipit-source-id: 4a5e9a677ba4ac82095df959851a054c81eccf81
2018-07-26 10:25:23 -07:00
Orion Reblitz-Richardson
1d5780d42c Remove Apache headers from source.
* LICENSE file contains details, so removing from individual source files.
2018-03-27 13:10:18 -07:00
Yangqing Jia
91d76f5dbd Reapply Windows fix
Summary:
Last fix was uncommitted due to a bug in internal build (CAFFE2_API causing error). This one re-applies it as well as a few more, especially enabling gtest.

Earlier commit message: Basically, this should make windows {static_lib, shared_lib} * {static_runtime, shared_runtime} * {cpu, gpu} work other than gpu shared_lib, which willyd kindly pointed out a symbol limit problem. A few highlights:
(1) Updated newest protobuf.
(2) use protoc dllexport command to ensure proper symbol export for windows.
(3) various code updates to make sure that C2 symbols are properly shown
(4) cmake file changes to make build proper
(5) option to choose static runtime and shared runtime similar to protobuf
(6) revert to visual studio 2015 as current cuda and msvc 2017 do not play well together.
(7) enabled gtest and fixed testing bugs.

Earlier PR is #1793

Closes https://github.com/caffe2/caffe2/pull/1827

Differential Revision: D6832086

Pulled By: Yangqing

fbshipit-source-id: 85f86e9a992ee5c53c70b484b761c9d6aed721df
2018-01-29 10:03:28 -08:00
Vladimir Chalyshev
8c02674964 Revert D6817719: [caffe2][PR] Better support for windows
Summary:
This reverts commit d286264fccc72bf90a2fcd7da533ecca23ce557e

bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
cause_a_sev_many_files

Differential Revision: D6817719

fbshipit-source-id: 8fe0ad7aba75caaa4c3cac5e0a804ab957a1b836
2018-01-26 06:08:49 -08:00
Yangqing Jia
8aa8eaabb1 Better support for windows
Summary:
Basically, this should make windows {static_lib, shared_lib} * {static_runtime, shared_runtime} * {cpu, gpu} work. A few highlights:

(1) Updated newest protobuf.
(2) use protoc dllexport command to ensure proper symbol export.
(3) various code updates to make sure that C2 symbols are properly shown
(4) cmake file changes to make build proper
(5) option to choose static runtime and shared runtime similar to protobuf
(6) revert to visual studio 2015 as current cuda and msvc 2017 do not play well together.
Closes https://github.com/caffe2/caffe2/pull/1793

Reviewed By: dzhulgakov

Differential Revision: D6817719

Pulled By: Yangqing

fbshipit-source-id: d286264fccc72bf90a2fcd7da533ecca23ce557e
2018-01-26 00:48:43 -08:00
Ilia Cherniavskii
dd1558dc8d Improve stream selection
Summary:
Check that the next stream picked is not busy. Details:
https://fb.facebook.com/groups/101100140348621/permalink/377531329372166/

Reviewed By: azzolini

Differential Revision: D6381701

fbshipit-source-id: 58f81b8d7ed8179e524f4ee50578dddbb3e69e45
2017-11-27 17:29:08 -08:00
Yangqing Jia
59b2654544 reapply header change after xplat move
Summary: This is a reapplication of the earlier PR due to xplat move. Original author is Christoph Conrads <christoph.conrads@fluent.ai> christoph-conrads .

Reviewed By: houseroad

Differential Revision: D6379736

fbshipit-source-id: b7482ecf3b9487a528c15e92976e915791210002
2017-11-22 13:04:37 -08:00
Ilia Cherniavskii
1149b9bbb5 Polling async net executor
Summary:
Implementation of polling async net executor.
Notes:
- New net executor async_polling - schedules CPU and GPU ops asynchronously, uses single polling thread
- Events: update to Caffe2 events to support async CPU events, adding new methods:
 Query() - non-blocking checking of event states: INITIALIZED -> RECORDED -> SUCCESS/FAILED
 ErrorMessage() - when an operation runs asynchronously and fails, calling this on the event gives the error message
- Tasks: using existing DAGNet's algorithm to compute CPU and GPU chains, a separate task for each chain
- Polling: using single thread to query state of events - for CPU tasks atomically queries task state, for GPU task - uses cudaEventQuery; using Event
- Scheduling of CPU ops: using global thread pools
- Scheduling of GPU ops: using GPU thread pool per GPU device
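
The event state progression and non-blocking `Query()` from the notes above can be sketched for the CPU case as follows. `CpuEvent`, its `Finish`/`Fail` methods, and `CountDone` are hypothetical names for this illustration; the sketch runs in one thread and only shows what a single polling pass would observe.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <vector>

// States follow the notes: INITIALIZED -> RECORDED -> SUCCESS/FAILED.
enum class EventStatus { INITIALIZED, RECORDED, SUCCESS, FAILED };

// Hypothetical CPU event: a worker advances the state, a poller reads it.
class CpuEvent {
 public:
  void Record() { status_.store(EventStatus::RECORDED); }
  void Finish() { status_.store(EventStatus::SUCCESS); }
  void Fail(const std::string& message) {
    error_ = message;  // written before the state becomes visible as FAILED
    status_.store(EventStatus::FAILED);
  }

  // Non-blocking: returns the current state without waiting.
  EventStatus Query() const { return status_.load(); }
  const std::string& ErrorMessage() const { return error_; }

 private:
  std::atomic<EventStatus> status_{EventStatus::INITIALIZED};
  std::string error_;
};

// One polling pass: counts the events that reached a terminal state.
inline int CountDone(const std::vector<CpuEvent*>& events) {
  int done = 0;
  for (const CpuEvent* event : events) {
    const EventStatus status = event->Query();
    if (status == EventStatus::SUCCESS || status == EventStatus::FAILED) {
      ++done;
    }
  }
  return done;
}
```

A polling thread would call something like `CountDone` in a loop and schedule the dependent tasks of each finished event; for GPU tasks the notes say the query goes through `cudaEventQuery` instead of an atomic read.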

Reviewed By: dzhulgakov

Differential Revision: D5985110

fbshipit-source-id: a9de7fcbb71d046a3aa1b573072b89a65dfeee8c
2017-11-03 07:27:44 -07:00
Yangqing Jia
8286ce1e3a Re-license to Apache
Summary: Closes https://github.com/caffe2/caffe2/pull/1260

Differential Revision: D5906739

Pulled By: Yangqing

fbshipit-source-id: e482ba9ba60b5337d9165f28f7ec68d4518a0902
2017-09-28 16:22:00 -07:00
Dmytro Dzhulgakov
97e733615c Use simple function pointers for memory allocation and deallocation.
Reviewed By: Yangqing

Differential Revision: D5822238

fbshipit-source-id: 9624e6494ea6be10221aa75c7f22aa8721946af2
2017-09-13 14:26:04 -07:00
Aapo Kyrola
47fd6cc255 Revert D5801013: [caffe2] Use simple function pointers for memory allocation and deallocation.
Summary:
This reverts commit 7068207a43400fa3902bbb3689b3c729e839456c

bypass-lint

Differential Revision: D5801013

fbshipit-source-id: ca2bd9aaf61c20ce1935a007ab7b34f5d37f5033
2017-09-12 16:36:36 -07:00
Yangqing Jia
10a032de67 Use simple function pointers for memory allocation and deallocation.
Summary:
During the team meeting today Dima and Alex mentioned that the current lambda
function causes slowdown in performance when a large number of alloc and
dealloc happen. My observation is that most of the Delete are actually direct
Delete() function pointers, so I gave it a shot to see if we can reduce
the overhead.

RawAllocDealloc is already very fast, and we observe another 5ns reduction
(12.5%). For TensorAllocDealloc of 32x32 tensors, we are observing 57ns saving
(26%). This is measured on Xeon(R) CPU E5-2660.
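
To make the trade-off concrete, the sketch below contrasts a plain function-pointer deleter with a type-erased callable. It assumes the "lambda function" mentioned above was held in something like `std::function`; that is an assumption, and the names `DeleterFn`, `FreeDeleter`, `FnPtrBuffer`, and `StdFunctionBuffer` are hypothetical. No timings are claimed here beyond what the commit message reports.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <memory>

// Plain function pointer type for deleters: a single pointer, no type erasure.
using DeleterFn = void (*)(void*);

inline void FreeDeleter(void* ptr) { std::free(ptr); }

// Buffer whose deleter is a raw function pointer.
using FnPtrBuffer = std::unique_ptr<void, DeleterFn>;

// Buffer whose deleter is a type-erased callable holding a lambda.
using StdFunctionBuffer = std::unique_ptr<void, std::function<void(void*)>>;

inline FnPtrBuffer AllocWithFnPtr(std::size_t nbytes) {
  return FnPtrBuffer(std::malloc(nbytes), &FreeDeleter);
}

inline StdFunctionBuffer AllocWithStdFunction(std::size_t nbytes) {
  return StdFunctionBuffer(std::malloc(nbytes),
                           [](void* ptr) { std::free(ptr); });
}
```

Both buffers free their memory the same way. The function-pointer version stores and copies only one pointer per allocation, whereas a type-erased wrapper carries extra state and an indirect call, which is one plausible source of the overhead described.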

Also cleaned up the function interfaces of ShareExternalPointer so we have 2
functions only.

Reviewed By: salexspb, dzhulgakov

Differential Revision: D5801013

fbshipit-source-id: 7068207a43400fa3902bbb3689b3c729e839456c
2017-09-10 22:47:26 -07:00
Yangqing Jia
5d24a4eeef Early design for a general Event abstraction cross-devices.
Summary:
There are ad-hoc efforts on avoiding excessive device synchronizations, such as
async_dag, singlethread_async, etc. This diff aims to provide an early design
for a general Event class, that can achieve the following:

(1) It is device agnostic, essentially using a vtable to do cross device record,
wait and synchronization.
(2) Created new functions WaitEvent and Record in the Context class for
interacting with Events.
(3) Exposed the corresponding WaitEvent and Record functions in the OperatorBase
class as well.

An example use case is that, after potential future refactoring, one can achieve
a real async execution per operator by running

op.WaitEvent(previous_event);
op.RunAsync();
op.RecordEvent(this_op_event);

and the next op can do

next_op.WaitEvent(this_op_event);

Right now, I changed async_dag net implementation so that it uses the general
event design. The old Event class is assimilated to the general Event class and
the old Stream class is now essentially taken over by the Context class itself.
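
The device-agnostic design in point (1) can be sketched as an `Event` that dispatches through per-device tables of function pointers. All names below (`DeviceType`, `EventFn`, the `Event` members, `CpuRecord`, `CpuWait`, `RegisterCpuEventFns`, and the `Op` struct) are hypothetical and only mirror the `WaitEvent` / `RunAsync` / `RecordEvent` sequence from the message; this is not the class that was committed.

```cpp
#include <cassert>

// Hypothetical device enum for this sketch only.
enum class DeviceType : int { CPU = 0, CUDA = 1, COUNT = 2 };
constexpr int kMaxDevices = static_cast<int>(DeviceType::COUNT);

class Event;
using EventFn = void (*)(Event*);

// The event itself knows nothing about devices; it looks up the
// implementation for its device type in a table of function pointers.
class Event {
 public:
  explicit Event(DeviceType type) : type_(type) {}

  void Record() { Dispatch(record_fns()); }
  void Wait() { Dispatch(wait_fns()); }

  static EventFn* record_fns() {
    static EventFn fns[kMaxDevices] = {};
    return fns;
  }
  static EventFn* wait_fns() {
    static EventFn fns[kMaxDevices] = {};
    return fns;
  }

  bool recorded = false;  // toy state used by the CPU implementation
  int wait_count = 0;

 private:
  void Dispatch(EventFn* table) {
    EventFn fn = table[static_cast<int>(type_)];
    assert(fn != nullptr && "no handler registered for this device");
    fn(this);
  }

  DeviceType type_;
};

// Toy CPU backend: recording sets a flag, waiting requires the flag.
inline void CpuRecord(Event* event) { event->recorded = true; }
inline void CpuWait(Event* event) {
  assert(event->recorded);
  ++event->wait_count;
}

inline void RegisterCpuEventFns() {
  Event::record_fns()[static_cast<int>(DeviceType::CPU)] = &CpuRecord;
  Event::wait_fns()[static_cast<int>(DeviceType::CPU)] = &CpuWait;
}

// Operator-side wrappers matching the call sequence in the message.
struct Op {
  void WaitEvent(Event& event) { event.Wait(); }
  void RunAsync() { ran = true; }
  void RecordEvent(Event& event) { event.Record(); }
  bool ran = false;
};
```

A GPU backend would register its own record and wait functions in the same tables, so the net executor can chain operators on different devices without knowing which backend sits behind each event.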

Reviewed By: harouwu

Differential Revision: D5648463

fbshipit-source-id: 58bd84d06e4a9977b0b835110ddb2f18be3b7cbc
2017-08-18 15:46:51 -07:00