Commit Graph

64 Commits

Author SHA1 Message Date
David Reiss
b7e044f0e5 Re-apply PyTorch pthreadpool changes
Summary:
This re-applies D21232894 (b9d3869df3) and D22162524, plus updates jni_deps in a few places
to avoid breaking host JNI tests.

Test Plan: `buck test @//fbandroid/mode/server //fbandroid/instrumentation_tests/com/facebook/caffe2:host-test`

Reviewed By: xcheng16

Differential Revision: D22199952

fbshipit-source-id: df13eef39c01738637ae8cf7f581d6ccc88d37d5
2020-06-23 19:26:21 -07:00
Kate Mormysh
92d3182c11 Revert D21232894: Unify PyTorch mobile's threadpool usage.
Test Plan: revert-hammer

Differential Revision:
D21232894 (b9d3869df3)

Original commit changeset: 8b3de86247fb

fbshipit-source-id: e6517cfec08f7dd0f4f8877dab62acf1d65afacd
2020-06-23 17:09:14 -07:00
Ashkan Aliabadi
b9d3869df3 Unify PyTorch mobile's threadpool usage. (#37243)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37243

*** Why ***

As it stands, we have two thread pool solutions concurrently in use in PyTorch mobile: (1) the open source pthreadpool library under third_party, and (2) Caffe2's implementation of pthreadpool under caffe2/utils/threadpool.  Since the primary use-case of the latter has been to act as a drop-in replacement for the third party version so as to enable integration and usage from within NNPACK and QNNPACK, Caffe2's implementation is intentionally written to the exact same interface as the third party version.

The original argument in favor of C2's implementation has been improved performance as a result of using spin locks, as opposed to relinquishing the thread's time slot and putting it to sleep - a less expensive operation up to a point.  That seems to have given C2's implementation the upper hand in performance, hence justifying the added maintenance complexity, until the third party version improved in parallel surpassing the efficiency of C2's implementation as I have verified in benchmarks.  With that advantage gone, there is no reason to continue using C2's implementation in PyTorch mobile either from the perspective of performance or code hygiene.  As a matter of fact, there is considerable performance benefit to be had as a result of using the third party version as it currently stands.

This is a tricky change though, mainly because in order to avoid potential performance regressions, of which I have witnessed none but just in abundance of caution, we have decided to continue using the internal C2's implementation whenever building for Caffe2.  Again, this is mainly to avoid potential performance regressions in production C2 use cases even if doing so results in reduced performance as far as I can tell.

So to summarize, today, and as it currently stands, we are using C2's implementation for (1) NNPACK, (2) PyTorch QNNPACK, and (3) ATen parallel_for on mobile builds, while using the third party version of pthreadpool for XNNPACK as XNNPACK does not provide any build options to link against an external implementation unlike NNPACK and QNNPACK do.

The goal of this PR then, is to unify all usage on mobile to the third party implementation both for improved performance and better code hygiene.  This applies to PyTorch's use of NNPACK, QNNPACK, XNNPACK, and mobile's implementation of ATen parallel_for, all getting routed to the
exact same third party implementation in this PR.

Considering that NNPACK, QNNPACK, and XNNPACK are not mobile specific, these benefits carry over to non-mobile builds of PyTorch (but not Caffe2) as well.  The implementation of ATen parallel_for on non-mobile builds remains unchanged.

*** How ***

This is where things get tricky.

A good deal of the build system complexity in this PR arises from our desire to maintain C2's implementation intact for C2's use.

pthreadpool is a C library with no concept of namespaces, which means two copies of the library cannot exist in the same binary or symbol collision will occur violating ODR.  This means that somehow, and based on some condition, we must decide on the choice of a pthreadpool implementation.  In practice, this has become more complicated as a result of all the possible combinations that USE_NNPACK, USE_QNNPACK, USE_PYTORCH_QNNPACK, USE_XNNPACK, USE_SYSTEM_XNNPACK, USE_SYSTEM_PTHREADPOOL and other variables can result in.  Having said that, I have done my best in this PR to surgically cut through this complexity in a way that minimizes the side effects, considering the significance of the performance we are leaving on the table, yet, as a result of this combinatorial explosion explained above I cannot guarantee that every single combination will work as expected on the first try.  I am heavily relying on CI to find any issues as local testing can only go that far.

Having said that, this PR provides a simple non mobile-specific C++ thread pool implementation on top of pthreadpool, namely caffe2::PThreadPool that automatically routes to C2's implementation or the third party version depending on the build configuration.  This simplifies the logic at the cost of pushing the complexity to the build scripts.  From there on, this thread pool is used in aten parallel_for, and NNPACK and family, again, routing all usage of threading to C2 or third party pthreadpool depending on the build configuration.

When it is all said or done, the layering will look like this:

a) aten::parallel_for, uses
b) caffe2::PThreadPool, which uses
c) pthreadpool C API, which delegates to
    c-1) third_party implementation of pthreadpool if that's what the build has requested, and the rabbit hole ends here.
    c-2) C2's implementation of pthreadpool if that's what the build has requested, which itself delegates to
    c-2-1) caffe2::ThreadPool, and the rabbit hole ends here.

NNPACK, and (PyTorch) QNNPACK directly hook into (c). They never go through (b).

Differential Revision: D21232894

Test Plan: Imported from OSS

Reviewed By: dreiss

Pulled By: AshkanAliabadi

fbshipit-source-id: 8b3de86247fbc3a327e811983e082f9d40081354
2020-06-23 16:34:51 -07:00
Nikita Shulga
44345ad08c Do not define C10_IOS on Mac (#37283)
Summary:
Because MacOS is not iOS
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37283

Test Plan: CI

Differential Revision: D21244398

Pulled By: malfet

fbshipit-source-id: b822e216e83887e2f2961b5c5384eaf749629f61
2020-04-25 13:52:46 -07:00
Kimish Patel
8269c4f3d3 Added nullptr check for pthradpool_get_threads_count (#34087)
Summary:
We get seg fault without this in using XNNPACK.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34087

Differential Revision: D20199787

Pulled By: kimishpatel

fbshipit-source-id: d3d274e7bb197461632b21688820cd4c10dcd819
2020-03-04 11:10:53 -08:00
Michael Ranieri
51d969e86a preprocessor cleanup (#33957)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33957

lots of small preprocessor warning cleanup for windows

Test Plan: CI green

Reviewed By: malfet, albanD

Differential Revision: D20153582

fbshipit-source-id: 18fd61c466fd1f55ededdae4448b3009a9cedc04
2020-03-02 13:37:19 -08:00
Kimish Patel
0e52627358 Fixing pthreadpool symbol conflict issue. (#33869)
Summary:
Mainly renaming pthread_create of C2, the only one referred internally in NNPACK, that
is conflicting, to pthread_create_c2.
Removed 2 other conflicting symbols that are not used internally at all.
Pointing XNNPACK to original repo instead of the fork.

Copy pasted the new interface and implementation to
caff2/utils/threadpool, so that for internal builds we compile against
this.

When threadpool is unified this will be removed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33869

Differential Revision: D20140580

Pulled By: kimishpatel

fbshipit-source-id: de70df0af9c7d6bc065e85ede0e1c4dd6a9e6be3
2020-02-28 21:23:18 -08:00
Hao Lu
81394581a3 [Caffe2][ThreadPool] Make sure numThreads does not exceed the number of big cores (#33523)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33523

When using `ThreadPool::setNumThreads` to set the number of threads, it should not exceed the number of big cores. Otherwise, the performance could degrade significantly.

Test Plan:
```
cd ~/fbsource/xplat
buck test caffe2:caffe2_testAndroid
```

Reviewed By: dreiss

Differential Revision: D19779267

fbshipit-source-id: 4e980e8a0ccc2f37e1c8ed16e2f4651d72924dbd
2020-02-19 18:24:24 -08:00
Brian Wignall
f326045b37 Fix typos, via a Levenshtein-type corrector (#31523)
Summary:
Should be non-semantic.

Uses https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines to find likely typos, with https://github.com/bwignall/typochecker to help automate the checking.

Uses an updated version of the tool used in https://github.com/pytorch/pytorch/pull/30606 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31523

Differential Revision: D19216749

Pulled By: mrshenli

fbshipit-source-id: 7fd489cb9a77cd7e4950c1046f925d57524960ea
2020-01-17 16:03:19 -08:00
Sebastian Messmer
643ca5def2 Replace c10::guts::stuff with std::stuff (#30915)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30915

Since we now have C++14, we don't need these c10::guts helpers anymore
ghstack-source-id: 95777609

Test Plan: waitforsandcastle

Differential Revision: D18869639

fbshipit-source-id: 97716f932297c64c6e814410ac47b444c33d4e2e
2019-12-16 13:57:19 -08:00
Ivan Kobzarev
ca8cb3241a Expose setNumThreads to android api (#31205)
Summary:
PR https://github.com/pytorch/pytorch/pull/31033 was unlanded due to macos build failure:
https://app.circleci.com/jobs/github/pytorch/pytorch/3916388

This PR has changes that `setNumThreads` is only for android and moved to separate class `org.pytorch.PytorchAndroid` as a static function which is better as it has global effect
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31205

Reviewed By: dreiss

Differential Revision: D18977250

Pulled By: IvanKobzarev

fbshipit-source-id: 4995859808af498c82933c4db52bd7c7dfae90e5
2019-12-12 18:57:27 -08:00
Michael Suo
c0bcfd0445 Revert D18923167: Expose setNumThreads to android api
Test Plan: revert-hammer

Differential Revision:
D18923167

Original commit changeset: 8d98c2edbff4

fbshipit-source-id: 7db37cff298c511d0dd9eb373811c769e4a73be9
2019-12-12 09:23:58 -08:00
Ivan Kobzarev
6225443009 Expose setNumThreads to android api (#31033)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31033

Intention:
There are requests from users to control number of threads from android side:
https://discuss.pytorch.org/t/android-pytorch-forward-method-running-in-a-separate-thread-slow-down-ui-thread/63516/2
https://discuss.pytorch.org/t/threading-of-model-pytorch-android/62490/2

At the moment `setNumThreads` is placed in `org.pytorch.Module`, but this method changes global threadPool size, in future we will move it to some separate class to repeat python binding structure, which has torch.set_num_threads()

Test Plan: Imported from OSS

Differential Revision: D18923167

Pulled By: IvanKobzarev

fbshipit-source-id: 8d98c2edbff42e9b673509672dce3f2dd03a923e
2019-12-11 14:20:14 -08:00
Tao Xu
b730d04ed2 Fix deadlock issues in ThreadPool (#29885)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29885

### Summary

Currently, we have a deadlock issue on iOS when running Resnet50. The problem happens when the task being run in the ThreadPool wants to call `getNumThread()` who will try to acquire the same mutex. And thus cause the deadlock situation. The fix is just remove the guard for `_numThreads`, as it's not likely to change after initialization.

### Test Plan

1. Generate a Resnet50 model using trace_model.py
2. Run `ios/TestApp/bootstrap.sh` to do the benchmark

cc shoumikhin AshkanAliabadi

Test Plan: Imported from OSS

Differential Revision: D18533505

Pulled By: xta0

fbshipit-source-id: 2a069d20b59833ec8b02ff05515c3739a85a15de
2019-11-15 19:27:52 -08:00
Jiakai Liu
6d0b004574 rename caffe2::mobile_threadpool to caffe2::mobile_pthreadpool
Summary:
Rename old mobile_threadpool() API, replace it with a new version that
returns caffe2::ThreadPool instead of pthreadpool_t.

Test Plan: - builds

Differential Revision: D17543413

Pulled By: ljk53

fbshipit-source-id: a3effd24e8ce9d677a2a04ebe6b6e1582e6f0a65
2019-09-24 22:27:35 -07:00
Supriya Rao
40db964455 Add support for using caffe2::ThreadPool in pytorch mobile QNNPACK. (#23658)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23658

**How things work for caffe2:**
Caffe2 Ops -> NNPACK/QNNPACK -> pthreadpool_compute_1/2/3/4d_tiled -> pthreadpool_compute_1d (caffe2 shim) -> caffe2::ThreadPool

**Before this PR:**
Pytorch Ops -> NNPACK/QNNPACK -> pthreadpool_compute_1/2/3/4d_tiled -> pthreadpool_compute_1d (third_party implementation without mobile optimization)

caffe2::ThreadPool is optimized for mobile. This change leverages this logic for pytorch mobile as a temporary solution improve pytorch mobile perf. It is guarded by the C10_MOBILE macro.
For server side we return nullptr.

**Plan for next steps:**
Implement a mobile version of "at::parallel_for" which uses caffe2::ThreadPool internally so all ATen/TH multithreading usage is mobile optimized.
Refactor QNNPACK and/or pthreadpool to explicitly using "at::parallel_for" primitive to replace pthreadpool_compute_1d for Pytorch.
After QNNPACK is refactored, we will delete the mobile_threadpool() API.

ghstack-source-id: 88073396

Reviewed By: dreiss

Differential Revision: D16594020

fbshipit-source-id: 9f94600756d5f86d24a12a2fd7df3eebd0994f1d
2019-08-12 18:14:15 -07:00
Oleg Bogdanov
bf5a5c2a31 caffe2 | Use _aligned_free in WorkerPool destruction (#19751)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19751

This has probably never been tested on Windows but destruction of WorkersPool crashes because it uses _aligned_malloc to allocate and 'free' to deallocate, which is not symmetric. Fix is to use _aligned_free in deallocation

Reviewed By: hlu1

Differential Revision: D15083472

fbshipit-source-id: 42243fce8f2dfea7554b52e6b289d9fea81d7681
2019-04-25 14:54:50 -07:00
Oleg Bogdanov
70b82d28b8 caffe2 | Windows compat fixes
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19531

Reviewed By: hlu1

Differential Revision: D15024541

fbshipit-source-id: cd8249a6d529afb65fa8afd74a05dbfe73eb1fb0
2019-04-23 14:30:19 -07:00
James Reed
1d26a3ae7e Open registration for c10 thread pool (#17788)
Summary:
1. Move ATen threadpool & open registration mechanism to C10
2. Move the `global_work_queue` to use this open registration mechanism, to allow users to substitute in their own
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17788

Reviewed By: zdevito

Differential Revision: D14379707

Pulled By: jamesr66a

fbshipit-source-id: 949662d0024875abf09907d97db927f160c54d45
2019-03-08 15:38:41 -08:00
Michael Liu
92a516b9ff Apply modernize-use-override - 2/2
Summary:
Use C++11’s override and remove virtual where applicable.
Change are automatically generated.

Reviewed By: Orvid

Differential Revision: D14054721

fbshipit-source-id: 15d266fa1779b1e3ea6270f00841d7fb1e4d44ee
2019-02-13 21:01:28 -08:00
Thomas Viehmann
b662a9b66a add back NNPACK in PyTorch (#15924)
Summary:
This tests the water for adding back NNPACK in PyTorch, it's a lot better than the fallback THNN versions.

In #6151, we (ezyang and soumith) removed NNPACK support from PyTorch. Of course Maratyszcza might have advice, too. (Or an opinion on the CMake changes.)

The only functional changes are to use NNPack more aggressively on mobile and a .contiguous() to match NNPack's assumption (I stumbled over that while using NNPack for style transfer.)
The CMake changes try to use the NNPack we already have in git.

In terms of lines of code this is a large part of the diff of https://lernapparat.de/pytorch-jit-android/ . As far as I can tell, we don't have MKLDNN on mobile and the native THNN implementation are prohibitively expensive in terms of both CPU and memory.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15924

Differential Revision: D13709576

Pulled By: ezyang

fbshipit-source-id: f2e287739909451c173abf046588209a7450ca2c
2019-01-18 15:34:35 -08:00
Jerry Zhang
0c32e1b43e use C10_MOBILE/ANDROID/IOS (#15363)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15363

Didn't define C10_MOBILE in the numa file move diff: D13380559
move CAFFE2_MOBILE/ANDROID/IOS to c10

```
codemod -m -d caffe2 --extensions h,hpp,cc,cpp,mm "CAFFE2_MOBILE" "C10_MOBILE"
codemod -m -d caffe2 --extensions h,hpp,cc,cpp,mm "CAFFE2_ANDROID" "C10_ANDROID"
codemod -m -d caffe2 --extensions h,hpp,cc,cpp,mm "CAFFE2_IOS" "C10_IOS"

```

i-am-not-moving-c2-to-c10

Reviewed By: marcinkwiatkowski

Differential Revision: D13490020

fbshipit-source-id: c4f01cacbefc0f16d5de94155c26c92fd5d780e4
2019-01-09 15:08:20 -08:00
Hao Lu
58a7f2aed1 Add pthreadpool_create and pthreadpool_destroy (#15492)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15492

Add pthreadpool_create and pthreadpool_destroy, which are used by NNPACK tests.

Reviewed By: Maratyszcza

Differential Revision: D13540997

fbshipit-source-id: 628c599df87b552ca1a3703854ec170243f04d2e
2018-12-21 20:28:18 -08:00
Hao Lu
01be9b7292 Handling nullptr case
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15467

Reviewed By: Maratyszcza

Differential Revision: D13536504

fbshipit-source-id: ab46ff6bb4b6ce881c3e29d7e6a095ea62289db4
2018-12-21 15:08:00 -08:00
Yangqing Jia
7d5f7ed270 Using c10 namespace across caffe2. (#12714)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12714

This is a short change to enable c10 namespace in caffe2. We did not enable
it before due to gflags global variable confusion, but it should have been
mostly cleaned now. Right now, the plan on record is that namespace caffe2 and
namespace aten will fully be supersets of namespace c10.

Most of the diff is codemod, and only two places of non-codemod is in caffe2/core/common.h, where

```
using namespace c10;
```

is added, and in Flags.h, where instead of creating aliasing variables in c10 namespace, we directly put it in the global namespace to match gflags (and same behavior if gflags is not being built with).

Reviewed By: dzhulgakov

Differential Revision: D10390486

fbshipit-source-id: 5e2df730e28e29a052f513bddc558d9f78a23b9b
2018-10-17 12:57:19 -07:00
Marat Dukhan
a55b9f77a0 Implement 3D and 4D parallelization in Caffe2 thread pool (#12455)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12455

- Mirror changes in pthreadpool

Reviewed By: harouwu

Differential Revision: D10240470

fbshipit-source-id: c1af769b5894f7865736fdaf4e0e5bf17c524614
2018-10-08 13:12:57 -07:00
Yangqing Jia
38f3d1fc40 move flags to c10 (#12144)
Summary:
still influx.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12144

Reviewed By: smessmer

Differential Revision: D10140176

Pulled By: Yangqing

fbshipit-source-id: 1a313abed022039333e3925d19f8b3ef2d95306c
2018-10-04 02:09:56 -07:00
Yangqing Jia
28dba2f928 Unify all *_EXPORT and *_IMPORT macros across c++ backend (#12019)
Summary:
TSIA. Right now we should basically use C10_EXPORT and C10_IMPORT for explicitly marking dllexport and dllimport, as a continued effort of the C10 unification.

This is a codemod by mechanically doing the following change:

CAFFE2_{EXPORT,IMPORT} -> C10_{EXPORT,IMPORT}
AT_CORE_{EXPORT,IMPORT} -> C10_{EXPORT,IMPORT}
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12019

Reviewed By: ezyang, teng-li

Differential Revision: D10016276

Pulled By: Yangqing

fbshipit-source-id: a420d62c43d1110105fc88f9e9076e28a3203164
2018-09-25 17:41:05 -07:00
Marat Dukhan
67c6d93634 Tune minimal work size (#10599)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10599

Not spawning threads with spin-lock synchronization is bad because they will switch to `condvar` wait, which increases wake-up latency next time they are needed.

Reviewed By: ajtulloch

Differential Revision: D9366664

fbshipit-source-id: 3b9e4a502aeefaf0ddc4795303a855d98980b02e
2018-08-16 17:39:57 -07:00
Yangqing Jia
0a809fc8b1 build changes to make cpu unified build working. (#10504)
Summary:
Properly annotated all apis for cpu front. Checked with cmake using

cmake -DUSE_ATEN=ON -DUSE_CUDA=OFF -DBUILD_ATEN=ON

and resulting libcaffe2.so has about 11k symbols.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10504

Reviewed By: ezyang

Differential Revision: D9316491

Pulled By: Yangqing

fbshipit-source-id: 215659abf350af7032e9a4b0f28a856babab2454
2018-08-15 17:22:36 -07:00
Edward Yang
ad76fc8807 s/DISABLE_COPY_AND_ASSIGN/AT_DISABLE_COPY_AND_ASSIGN/ (#10275)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10275

Remove forwarding declaration in caffe2/core/common.h

```
codemod -d caffe2 --extensions cc,cpp,cu,cuh,h \\bDISABLE_COPY_AND_ASSIGN AT_DISABLE_COPY_AND_ASSIGN
```

Reviewed By: mingzhe09088

Differential Revision: D9184809

fbshipit-source-id: 958cf5162b0d92b83ea9c2597abb77320ca57ce8
2018-08-07 08:54:26 -07:00
Xiaomeng Yang
03e7953a98 Use FixedDivisor in Reduce and Broadcast CUDA kernels (#9072)
Summary:
Closes https://github.com/pytorch/pytorch/pull/9072

Use FixedDivisor in Reduce and Broadcast CUDA kernels

Reviewed By: houseroad

Differential Revision: D8710243

fbshipit-source-id: 6f1da12234898594a1be8c979d942aa515832aeb
2018-07-01 00:25:34 -07:00
Marat Dukhan
d3651585b8
Simplify pthreadpool implementation on top of Caffe2 thread pool (#7666)
Remove one layer of pointer dereference when calling the thread pool.
2018-06-18 19:06:50 -07:00
sf-wind
5b86c3af4a
Update from facebook (#8384)
* [fix] fixup the bias multiplier data access issue

Hotfix for failues in conv_transpose

* [D2][Easy]: lint regularizer

lint with black

* [GanH]: Split mu in adaptive weight for diagnose

* [Dper] Add the ability to split FC weights into multiple smaller ones

* fix SumReduceLikeOp for empty blob

as desc.

* add ctc_greedy_decoder for caffe2

ctc_greedy_decoder same as tf's

* Update event callback handling

Allow multiple callbacks per event

* Add WeightedSum layer

The motivation is to do weighted sum in HoNet/crossnet, in the next diff, I'll replace model.Add with model.WeightedSum in
honet: https://fburl.com/f4rmolg2
crossnet: https://fburl.com/v7awn8se, https://fburl.com/63filbnm

* Replicate DAG's behavior

Some callers expect RunAsync to block, replicate that behavior in case of
explicit 'dag' net type

* [dper] layernorm layer

as title

* Override dag, async_dag, async_polling

Overriding dag, async_dag and async_polling with async_scheduling

* Name the thread pools

Caffe thread pools currently inherit the thread names from the thread that starts them, which can be misleading. Give them an explicit name instead.

* [Caffe2] FilleOp should support int64_t dimensions

Change argument type to int64_t for shape argument of FillerOp (used in ConstantFill, XavierFill, etc)

* Remove caffe2/caffe2/contrib/torch/

It's not used anywhere and depends on old lua torch that conflicts with Aten. Given PT1 it's not relevant any more (though it was nice and clever code!)

#accept2ship

* Fix linearWarmup multiplier check

The multiplier needs to be non-negative, not strictly positive.

* Revert D3314316

This is after 2 years and we do not seem to have a use case for this one, so
for the sake of clean API design we should potentially remove this. This would
allow us to potentially pass in arguments to optionally construct an object,
although it is indeed a little bit unclear how we can reuse existing objects if
constructor arguments are passed in. In any case, we may want to remove this
dangling feature.

* Speedup generate proposals by partial_sort.

Speedup generate proposals by partial_sort.

FACEBOOK:
- Saw speed improvement for training with this op.
- Yanghan benchmarked the op on a small dataset and see consistent 100% improvement on speed (6ms -> 3ms) on 420 input resolution. See next diff for details.

* More parallel processing friendly for CPP version of GenerateProposals.

More parallel processing friendly for CPP version of GenerateProposals.

* [DT] [43/n] Lift stop conditions inside reader code back to flow control

1. Split multi_reader function into local_reader and remote_reader
2. Lifted stop conditions inside Limiter back to flow control
3. Split epoch flow building logic into 3 cases:
  - single machine (1 reader, 1 trainer on trainer0 node, no PS)
  - (1 reader + 1 trainer) on trainer0 node, has PS
  - multiple readers, readers do not share nodes with trainers, might have PS or not

* Resolve conflicts for torch/_thnn/utils.py

* [Caffe2] Handle image decoding errors

Image decoding errors can make the whole training fail. This diff is to handle them
1.Catch imdecode exceptions and check if decoded image has zero columns or rows. This is counted as decoding errors.
2.Replace the image with empty in case of error
3.Count the number of errors and throw runtime exception if the rate reaches given number

The empty image data is kept. It might introduce noise in the training data.

* Update MKL exporter to IDEEP ops

TSIA

* [Caffe2] GlobalInit is thread safe, fixing the comment

With the mutex and lock, GlobalInit is thread safe.
Update the comments.

* Back out "Add support for generating ATen files during fbcode build"

Original commit changeset: 28970ddba353

@override-unit-failures
(Note: this ignores all push blocking failures!)

* [DT]: fix predictor save

similar to D6610058, here we add the fix for distributed online training

* Remove net_singlethread_async_gpu.cc

Closes https://github.com/caffe2/caffe2/pull/2528

This removes net_singlethread_async_gpu.cc as part of our effort to clean
CUDAContext and the net executors.

* Inline DFS task execution

Add a DFS inline task execution mode in executor

* Add c10 folder to fbcode

This adds the c10 folder and its test cases to fbcode. Build flags are mostly taken from aten.

* add dependencies for online trainer

Add some dependencies so that the online model can use DataPipeline and PredictionTransform operators

Relevent post: https://fb.intern.facebook.com/groups/1324375037655677/permalink/1740993462660497/

* Resolve conflicts for tools/jit/gen_jit_dispatch.py

* [Fix] sparse regularization in distributed training

* Support advanced pooling options in sum processor

* support advanced pooling options in sum processor
* remove redundant code
* support attention in sum processor

* Improve shard logging in net tracing code

Make it handle arbitrary shard ids instead of just one digit ids.

* [Caffe2] Call GlobalInit in predictor only in mobile

FACEBOOK:
Calling GlobalInit long after the program starts may not be safe. There are issues if the following happens:

User does not call GlobalInit and initFacebook after program starts
User sets a flag manually: https://fburl.com/mcsumw7d
User calls OSS predictor.
OSS predictor calls GlobalInit
GlobalInit calls initFacebook
initFacebook resets all flags: https://fburl.com/tolszha1
Thus, the user manually set flags are overwritten

This would happen anytime GlobalInit is called long after the program starts.
I suppose the intention of the user in this case is not to call GlobalInit throughout the program,
but use Caffe2 regardless (is that desired?)
But adding GlobalInit in the OSS predictor would automatically call GlobalInit when using Caffe2.

This issue doesn't exist in mobile, since initFacebook is not called on mobile.

For now, guard the GlobalInit in predictor for mobile only.
May want to ensure the GlobalInit is always called at the start of the program. @[3501714:kutta] has seen weird issues when not calling GlobalInit at the start of the program on server side. He has made some progress on this.

* resolve conflicts for caffe2/core/logging_is_google_glog.h and test/test_torch.py

* Add empty fix for SumLikeReduceOp

Add empty fix for SumLikeReduceOp

* Revert D7962948: [caffe2][nomnigraph] Concat elim for sparseNN

This reverts commit f7f434dc5c34ca6058b9765d2ef615453d2276a9

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* Remove Declarations.yaml

* Include common.h

* Change std::stoi to caffe2::stoi

* Add thread_name.cc to the CMake file

* No need to subtract 1. Fix test segfaults

* Fix NetTest, ObserverTest

Fix tests

(cherry picked from commit 3767e66c3f365596cba3d46d3e7322c933a0ab41)

* CTCGreedyDecoderOp only has CPU implementation, test should only run on CPU

* Add a variable to avoid conversion resizing issue

* [fix] fixup the bias multiplier data access issue

Hotfix for failues in conv_transpose

* [D2][Easy]: lint regularizer

lint with black

* [GanH]: Split mu in adaptive weight for diagnose

* [Dper] Add the ability to split FC weights into multiple smaller ones

* fix SumReduceLikeOp for empty blob

as desc.

* add ctc_greedy_decoder for caffe2

ctc_greedy_decoder same as tf's

* Update event callback handling

Allow multiple callbacks per event

* Add WeightedSum layer

The motivation is to do weighted sum in HoNet/crossnet, in the next diff, I'll replace model.Add with model.WeightedSum in
honet: https://fburl.com/f4rmolg2
crossnet: https://fburl.com/v7awn8se, https://fburl.com/63filbnm

* Replicate DAG's behavior

Some callers expect RunAsync to block, replicate that behavior in case of
explicit 'dag' net type

* [dper] layernorm layer

as title

* Override dag, async_dag, async_polling

Overriding dag, async_dag and async_polling with async_scheduling

* Name the thread pools

Caffe thread pools currently inherit the thread names from the thread that starts them, which can be misleading. Give them an explicit name instead.

* [Caffe2] FilleOp should support int64_t dimensions

Change argument type to int64_t for shape argument of FillerOp (used in ConstantFill, XavierFill, etc)

* Remove caffe2/caffe2/contrib/torch/

It's not used anywhere and depends on old lua torch that conflicts with Aten. Given PT1 it's not relevant any more (though it was nice and clever code!)

#accept2ship

* Fix linearWarmup multiplier check

The multiplier needs to be non-negative, not strictly positive.

* Revert D3314316

This is after 2 years and we do not seem to have a use case for this one, so
for the sake of clean API design we should potentially remove this. This would
allow us to potentially pass in arguments to optionally construct an object,
although it is indeed a little bit unclear how we can reuse existing objects if
constructor arguments are passed in. In any case, we may want to remove this
dangling feature.

* Speedup generate proposals by partial_sort.

Speedup generate proposals by partial_sort.

FACEBOOK:
- Saw speed improvement for training with this op.
- Yanghan benchmarked the op on a small dataset and see consistent 100% improvement on speed (6ms -> 3ms) on 420 input resolution. See next diff for details.

* More parallel processing friendly for CPP version of GenerateProposals.

More parallel processing friendly for CPP version of GenerateProposals.

* [DT] [43/n] Lift stop conditions inside reader code back to flow control

1. Split multi_reader function into local_reader and remote_reader
2. Lifted stop conditions inside Limiter back to flow control
3. Split epoch flow building logic into 3 cases:
  - single machine (1 reader, 1 trainer on trainer0 node, no PS)
  - (1 reader + 1 trainer) on trainer0 node, has PS
  - multiple readers, readers do not share nodes with trainers, might have PS or not

* Resolve conflicts for torch/_thnn/utils.py

* [Caffe2] Handle image decoding errors

Image decoding errors can make the whole training fail. This diff is to handle them
1.Catch imdecode exceptions and check if decoded image has zero columns or rows. This is counted as decoding errors.
2.Replace the image with empty in case of error
3.Count the number of errors and throw runtime exception if the rate reaches given number

The empty image data is kept. It might introduce noise in the training data.

* Update MKL exporter to IDEEP ops

TSIA

* [Caffe2] GlobalInit is thread safe, fixing the comment

With the mutex and lock, GlobalInit is thread safe.
Update the comments.

* Back out "Add support for generating ATen files during fbcode build"

Original commit changeset: 28970ddba353

@override-unit-failures
(Note: this ignores all push blocking failures!)

* [DT]: fix predictor save

similar to D6610058, here we add the fix for distributed online training

* Remove net_singlethread_async_gpu.cc

Closes https://github.com/caffe2/caffe2/pull/2528

This removes net_singlethread_async_gpu.cc as part of our effort to clean
CUDAContext and the net executors.

* Inline DFS task execution

Add a DFS inline task execution mode in executor

* Add c10 folder to fbcode

This adds the c10 folder and its test cases to fbcode. Build flags are mostly taken from aten.

* add dependencies for online trainer

Add some dependencies so that the online model can use DataPipeline and PredictionTransform operators

Relevent post: https://fb.intern.facebook.com/groups/1324375037655677/permalink/1740993462660497/

* Resolve conflicts for tools/jit/gen_jit_dispatch.py

* [Fix] sparse regularization in distributed training

* Support advanced pooling options in sum processor

* support advanced pooling options in sum processor
* remove redundant code
* support attention in sum processor

* Improve shard logging in net tracing code

Make it handle arbitrary shard ids instead of just one digit ids.

* [Caffe2] Call GlobalInit in predictor only in mobile

FACEBOOK:
Calling GlobalInit long after the program starts may not be safe. There are issues if the following happens:

User does not call GlobalInit and initFacebook after program starts
User sets a flag manually: https://fburl.com/mcsumw7d
User calls OSS predictor.
OSS predictor calls GlobalInit
GlobalInit calls initFacebook
initFacebook resets all flags: https://fburl.com/tolszha1
Thus, the user manually set flags are overwritten

This would happen anytime GlobalInit is called long after the program starts.
I suppose the intention of the user in this case is not to call GlobalInit throughout the program,
but use Caffe2 regardless (is that desired?)
But adding GlobalInit in the OSS predictor would automatically call GlobalInit when using Caffe2.

This issue doesn't exist in mobile, since initFacebook is not called on mobile.

For now, guard the GlobalInit in predictor for mobile only.
May want to ensure the GlobalInit is always called at the start of the program. @[3501714:kutta] has seen weird issues when not calling GlobalInit at the start of the program on server side. He has made some progress on this.

* resolve conflicts for caffe2/core/logging_is_google_glog.h and test/test_torch.py

* Add empty fix for SumLikeReduceOp

Add empty fix for SumLikeReduceOp

* Revert D7962948: [caffe2][nomnigraph] Concat elim for sparseNN

This reverts commit f7f434dc5c34ca6058b9765d2ef615453d2276a9

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* Remove Declarations.yaml

* Include common.h

* Change std::stoi to caffe2::stoi

* Add thread_name.cc to the CMake file

* No need to subtract 1. Fix test segfaults

* Fix NetTest, ObserverTest

Fix tests

(cherry picked from commit 3767e66c3f365596cba3d46d3e7322c933a0ab41)

* CTCGreedyDecoderOp only has CPU implementation, test should only run on CPU

* Add a variable to avoid conversion resizing issue

* Remove the code per soumith's comments

* Remove the code per soumith's comments

* Remove blank lines in the end of file

* Resolve conflicts for torch/_thnn/utils.py

* Update MKL exporter to IDEEP ops

TSIA

* Back out "Add support for generating ATen files during fbcode build"

Original commit changeset: 28970ddba353

@override-unit-failures
(Note: this ignores all push blocking failures!)

* add dependencies for online trainer

Add some dependencies so that the online model can use DataPipeline and PredictionTransform operators

Relevent post: https://fb.intern.facebook.com/groups/1324375037655677/permalink/1740993462660497/

* Resolve conflicts for tools/jit/gen_jit_dispatch.py

* Support advanced pooling options in sum processor

* support advanced pooling options in sum processor
* remove redundant code
* support attention in sum processor

* resolve conflicts for caffe2/core/logging_is_google_glog.h and test/test_torch.py

* Revert D7962948: [caffe2][nomnigraph] Concat elim for sparseNN

This reverts commit f7f434dc5c34ca6058b9765d2ef615453d2276a9

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* Remove Declarations.yaml

* Include common.h

* Change std::stoi to caffe2::stoi

* [caffe2] uprade IDEEP and hotfix for conv op accuracy issue (#8364)

* [IDEEP] Upgrade IDEEP version

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* [IDEEP] Fix accuracy issue in conv op

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Fix build error due to lack of src in CMakeLists

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Remove the code per soumith's comments

* [ONNX] Add an ATen fallback pathway for ONNX export (#8273)

* ATen fallback for ONNX export

* Move to enum

* Fix model test

* Add comment

* Address comments

BC interface

* Remove imaginary file (#8415)

* [Caffe2] Enable AMD/MIOPEN ops for Caffe2  (#8306)

* Add hip support for caffe2 core

* Add MIOPEN header/wrapper to caffe2 core

* Add HIP device into caffe2 PB

* top level makefile change for rocm/hip

* makefile scaffolding for AMD/RocM/HIP

* Makefile scafodding for AMD/RocM/HIP; add makefile/utility for HIP files

* caffe2 PB update for AMD/ROCM HIP device

* Add AMD/RocM/Thrust dependency

* HIP threadpool update

* Fix makefile macro

* makefile fix: duplicate test/binary name

* makefile clean-up

* makefile clean-up

* add HIP operator registry

* add utilities for hip device

* Add USE_HIP to config summary

* makefile fix for BUILD_TEST

* merge latest

* Fix indentation

* code clean-up

* Guard builds without HIP and use the same cmake script as PyTorch to find HIP

* Setup rocm environment variables in build.sh (ideally should be done in the docker images)

* setup locale

* set HIP_PLATFORM

* Revert "set HIP_PLATFORM"

This reverts commit 8ec58db2b390c9259220c49fa34cd403568300ad.

* continue the build script environment variables mess

* HCC_AMDGPU_TARGET

* Cleanup the mess, has been fixed in the lastest docker images

* Assign protobuf field hip_gpu_id a new field number for backward compatibility

* change name to avoid conflict

* Fix duplicated thread pool flag

* Refactor cmake files to not add hip includes and libs globally

* Fix the wrong usage of environment variables detection in cmake

* Add MIOPEN CNN operators

* Revert "Add MIOPEN CNN operators"

This reverts commit 6e89ad4385b5b8967a7854c4adda52c012cee42a.

* Add MIOPEN pooling operator

* Add MIOPEN activation operator

* Add MIOPEN softmax operator

* Add MIOPEN spatial batch norm operator

* Add MIOPEN loacl response normalization operator

* Add MIOPEN conv operator

* Clean-up LRN ops

* enable fp16 in MIOPEN pool ops

* Enable fp16 for MIOPEN relu op

* Enable fp16 for MIOPEN spatial batch norm op

* code clean-up

* revert float16 support

* Create Caffe2 python binding for AMD/ROCM/HIP

* Add op fallback for HIP operator

* add hip src/test files in cmake

* exclude hip src/test files

* fix python binding for hip backend

* fix MIOPEN pooling op workspace

* hack to compile miopen operators

* fix include path for MIOPEN ops

* Fix include path

* Add HIP math utilities

* Fix path for HIP math utils

* cmake fix

* Cmake fix / hipcc for hip files

* suppress hipcc warning

* cmake fix /replcae USE_HIP with USE_ROCM

* revert LoadHIP.cmake change

* fix include for thrust/cub-hip

* include path fix for conversion.h

* Updated with latest upstream changes

* clang format fixes

* Context_hip updates

* Fixed typo in rocblas handle get function

* Updated hipified math utils

* Updated math hip test util

* Updated context hip test

* Updated common_hip

* Updated net async dag for HIP

* Added MIOPEN in operator hip test

* fix

* C2 dependencies clean-up

* fix include path for building custom protobuf

* Decouple miopen pool op and conv_pool_op base

* cmake refactor

* fix operator_hip_test

* move all hip/miopen ops files into caffe2/operators/hip

* sanitize cmake

* permission issue

* remove extra parenthesis

* remove artifact from resolving merge conflict

* cont. sanitize cmake files

* fix syntax error

* sanitize conversion.h

* .

* Revert "."

This reverts commit 56020cb0e996a31ae27bf1f8f491955ed0b121b9.

* clang-format

* Enable some reduce operators' ONNX backend tests (#8418)

* fix old comment to point to the right file (#8416)

* Stop pinning nccl version. (#8421)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Expose logsumexp docs and mark log_sum_exp in distributions for internal use (#8428)

* Enable some of the ONNX backend test on broadcasting (#8423)

* Enable some of the ONNX backend test on broadcasting

* enable gemm broadcast

* Expose proto utils and ONNX (#8073)

* Expose proto utils and ONNX from PyTorch libcaffe2.so

* Try to use protobuf from _C.so

* Fix ONNX proto header include

* Adjust order of imports for ONNX until nanopb goes away

* Set and use ONNX_NAMESPACE for PyTorch builds

* Show protobuf summary for all builds

* Add ONNX_NAMESPACE for cpp_build

* Statically link libprotobuf.a into libtorch.so

* Set ONNX_NAMESPACE on Windows build

* Move core/dispatch up as well

* Add /MD flag for Windows build of _C

* Potential Windows fix for ONNX and protobuf

* Add direct linkage from _C to ONNX on Windows

* Only include protobuf wrapper for PyTorch

* Pass extra_compile_args to _nvrtc ext build

* Remove installation of .a files

* Rebase creates some weird situations, revert them manually

* Remove more weird changes due to rebase

* Need to add thread_name.cc after merge
2018-06-13 13:10:45 -07:00
Edward Z. Yang
cc8fbc9d08
Revert "Name the thread pools (#8137)" (#8379)
This reverts commit 96876d9e7e.
2018-06-12 11:51:32 -04:00
Giuseppe Ottaviano
96876d9e7e Name the thread pools (#8137)
Caffe thread pools currently inherit the thread names from the thread that starts them, which can be misleading. Give them an explicit name instead.
2018-06-11 23:13:46 -07:00
Orion Reblitz-Richardson
d1bdb3b10a Remove core and util warnings (#8239)
* Fix some signed/unsigned mismatches

* Skip unused result warning

* Explict fallthrough for murmur hash

* Enable aligned new support to eliminate warning

* Switch to int instead of unsigned in some cases
2018-06-07 09:10:33 -07:00
Andrew Tulloch
84707be156
WorkersPool uses atomic writes to task_ (#6577) 2018-04-13 13:26:41 -07:00
Orion Reblitz-Richardson
1d5780d42c Remove Apache headers from source.
* LICENSE file contains details, so removing from individual source files.
2018-03-27 13:10:18 -07:00
Marat Dukhan
7462eca363 Initialize cpuinfo in the thread pool
Thread pool called cpuinfo_get_processors_count() without initializing cpuinfo. Only by luck it didn't make Caffe2 single-threaded: threadpool is initialized after NNPACK, and NNPACK initializes cpuinfo itself.

This commit also updates cpuinfo to a version that aborts with a fatal error if its used uninitialized.
2018-03-26 15:44:47 -04:00
sf-wind
602a09dde7 Update caffe2 from facebook 4f527ef46abf (#2234)
* [GanH]: two_task_discriminator

as titled

and adding label smooth

* [Dper2] Simplified UI options needed for blob magnitude visualization

* [GanH]: fix tags

as titled

* Added type and shape inference for GatherRange operator

This helps with type / shape inference when using this operator in layers.
Also just a nice to have in general.

* Demonstrate Caffe2 exception handling with StoreHandlerTimeoutError in Python

We'd like to catch and recover from certain Caffe2 net exceptions. Use this diff to demonstrate a pattern of registering a pybind exception mapping and catching in Pythonusing caffe2::StoreHandlerTimeoutException.

* Bind Gloo IoException to IoError in Python

Allow peer failure handling and recovery using an exception based mechanism. This diff registers gloo::IoException with pybind.

* [GanH]: add label smoothing to softmax with loss

as titled

* [C2] Enable LARS in Adagrad and hook it to DPER

* [DPER] Don't pass LayerModelHelper in create_trainer_nodes

Since we're planning to get rid of it eventually and I want to get access to
NetDef only interface ASAP - I'm looking towards removing all references to
LMH, where we don't really need them.

* fix bugs in LambdaRankNdcgOp

the loss and gradient in LambdaRankNdcgOp are incorrect. The loss should be negative log of probs instead of log.

* Restrict thread pool on iOS to only big cores

Historically, iPhones exposed only one type of cores, and Caffe2 thread pool used all of them.
However, iPhone 8/iPhone X exposes 2 big + 4 LITTLE cores. As our thread pool doesn't support work stealing or other forms of load balancing, fast cores end up waiting for the slow ones, and it may be better to restrict execution to only 2 fast cores, like we do on Android.

* Remove SparseLength Sum/WeightedSum/Mean operators with fp16 engine

Remove SparseLength Sum/WeightedSum/Mean operators with fp16 engine

* make clang happy and get fewer warnings

make clang happy and get fewer warnings

* [Personalization] Support add_output_schema() in layer_model_helper

Problem:
Currently the output_schema of sparse_nn can only be set once. https://fburl.com/efth5zer.

Solution:
For flexibility, we want to add fields to output_schema incrementally.

Plan:
Wrap the change of `model._output_schema` into a new function `add_output_schema()` for adding additional output_schema.

Callsite:
The add_output_schema() should be called instead at https://fburl.com/efth5zer

Reference:
The newly added `add_output_schema()` will be similar to `add_loss()` in https://fburl.com/t2ii8njh
2018-03-12 12:22:59 -07:00
Marat Dukhan
09b6ad5785 Use cpuinfo instead of Android's libcpufeatures in Android build 2018-03-09 22:20:37 -05:00
Andrew Tulloch
66131dec6f Expose Caffe2 WorkerPool from ThreadPool
Reviewed By: harouwu

Differential Revision: D6946610

fbshipit-source-id: a9fef0f1c7732b534433ee9517abddc32d0ec702
2018-02-14 21:09:15 -08:00
Yangqing Jia
91d76f5dbd Reapply Windows fix
Summary:
Last fix was uncommitted due to a bug in internal build (CAFFE2_API causing error). This one re-applies it as well as a few more, especially enabling gtest.

Earlier commit message: Basically, this should make windows {static_lib, shared_lib} * {static_runtime, shared_runtime} * {cpu, gpu} work other than gpu shared_lib, which willyd kindly pointed out a symbol limit problem. A few highlights:
(1) Updated newest protobuf.
(2) use protoc dllexport command to ensure proper symbol export for windows.
(3) various code updates to make sure that C2 symbols are properly shown
(4) cmake file changes to make build proper
(5) option to choose static runtime and shared runtime similar to protobuf
(6) revert to visual studio 2015 as current cuda and msvc 2017 do not play well together.
(7) enabled gtest and fixed testing bugs.

Earlier PR is #1793

Closes https://github.com/caffe2/caffe2/pull/1827

Differential Revision: D6832086

Pulled By: Yangqing

fbshipit-source-id: 85f86e9a992ee5c53c70b484b761c9d6aed721df
2018-01-29 10:03:28 -08:00
Marat Dukhan
224493d9ce NNPACK: Use new bindings and custom thread pool
Summary:
This change should dramatically (~10X) improve performance of convolution with NNPACK engine
Closes https://github.com/caffe2/caffe2/pull/1730

Reviewed By: sf-wind

Differential Revision: D6695895

Pulled By: Maratyszcza

fbshipit-source-id: 26291916811ef4cb819a59aec848c4e23668e568
2018-01-11 10:48:12 -08:00
Anton Vaneev
e105fee57d Fix ThreadPool class/struct forward declaration mixup
Summary: `ThreadPool` is a class, but it is forward-declared as a struct, which produces an error when compiled with clang 5.

Reviewed By: Maratyszcza

Differential Revision: D6399594

fbshipit-source-id: e8e81006f484b38e60389c659e9500ec9cfab731
2017-11-29 07:21:03 -08:00
Yangqing Jia
8286ce1e3a Re-license to Apache
Summary: Closes https://github.com/caffe2/caffe2/pull/1260

Differential Revision: D5906739

Pulled By: Yangqing

fbshipit-source-id: e482ba9ba60b5337d9165f28f7ec68d4518a0902
2017-09-28 16:22:00 -07:00
Yangqing Jia
9b2c5501b8 Fix Windows build
Summary:
After this, windows should be all green.
Closes https://github.com/caffe2/caffe2/pull/1228

Reviewed By: bwasti

Differential Revision: D5888328

Pulled By: Yangqing

fbshipit-source-id: 98fd39a4424237f2910df69c8609455d7af3ca34
2017-09-21 20:13:15 -07:00
Marat Dukhan
bd17684252 Run thread pool only on fast cores
Summary:
choose the number of cores for the thread pool as the number of fast cores

Didn't do any benchmarks, so its mostly FYI diff

Reviewed By: ajtulloch

Differential Revision: D5579797

fbshipit-source-id: 5ada001116c731780f38a62e9c0b500bd64a4bfe
2017-09-13 14:35:28 -07:00
Andrew Tulloch
81ddd5e869 Use std::{thread,mutex,condition_variable} instead of raw pthreads in WorkersPool
Reviewed By: Yangqing

Differential Revision: D5753072

fbshipit-source-id: 436915e6253eb517306c577e31854f8e018a36dc
2017-09-05 12:33:13 -07:00