Commit Graph

28 Commits

Author SHA1 Message Date
Michael Ranieri
51d969e86a preprocessor cleanup (#33957)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33957

Lots of small preprocessor warning cleanups for Windows.

Test Plan: CI green

Reviewed By: malfet, albanD

Differential Revision: D20153582

fbshipit-source-id: 18fd61c466fd1f55ededdae4448b3009a9cedc04
2020-03-02 13:37:19 -08:00
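
As orientation for the kind of change in the commit above, a minimal sketch of typical MSVC preprocessor hygiene; the macros shown are illustrative and not necessarily the ones touched by this PR.

```
// Guard definitions to avoid macro-redefinition warnings (C4005), and prefer
// #ifdef over #if for macros that may be undefined (C4668). Illustrative only.
#ifndef NOMINMAX
#define NOMINMAX // keep <windows.h> from defining min/max macros
#endif

#ifdef _WIN32 // does not warn when the macro is undefined
#include <windows.h>
#endif
```
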
Kimish Patel
0e52627358 Fixing pthreadpool symbol conflict issue. (#33869)
Summary:
Mainly renames Caffe2's pthread_create, the only conflicting symbol referenced
internally (by NNPACK), to pthread_create_c2.
Removed two other conflicting symbols that are not used internally at all.
Points XNNPACK to the original repo instead of the fork.

Copied the new interface and implementation to
caffe2/utils/threadpool, so that internal builds compile against this.

When the threadpool is unified, this will be removed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33869

Differential Revision: D20140580

Pulled By: kimishpatel

fbshipit-source-id: de70df0af9c7d6bc065e85ede0e1c4dd6a9e6be3
2020-02-28 21:23:18 -08:00
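
A generic sketch of the renaming technique described above, assuming the upstream pthreadpool API shape; the declarations and the `_c2` suffix are illustrative, not a transcription of this PR.

```
#include <stddef.h>

// Two libraries that both export a C symbol named pthreadpool_create cannot
// be linked into one binary without a clash. Keeping the internal copy under
// a suffixed name sidesteps the conflict while leaving upstream untouched.
extern "C" {
struct pthreadpool; // opaque handle, as in the upstream library

// upstream symbol, still provided by XNNPACK/pthreadpool:
//   struct pthreadpool* pthreadpool_create(size_t threads_count);

// internal Caffe2 copy, renamed so both definitions can coexist:
struct pthreadpool* pthreadpool_create_c2(size_t threads_count);
}
```
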
Hao Lu
81394581a3 [Caffe2][ThreadPool] Make sure numThreads does not exceed the number of big cores (#33523)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33523

When using `ThreadPool::setNumThreads` to set the number of threads, it should not exceed the number of big cores. Otherwise, the performance could degrade significantly.

Test Plan:
```
cd ~/fbsource/xplat
buck test caffe2:caffe2_testAndroid
```

Reviewed By: dreiss

Differential Revision: D19779267

fbshipit-source-id: 4e980e8a0ccc2f37e1c8ed16e2f4651d72924dbd
2020-02-19 18:24:24 -08:00
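
A minimal sketch of the clamping described above, assuming cpuinfo's cluster API and treating the highest-frequency cluster as the big cores; this is not the actual Caffe2 implementation.

```
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cpuinfo.h>

// Count the cores of the highest-frequency cpuinfo cluster as "big" cores.
size_t bigCoreCount() {
  if (!cpuinfo_initialize()) {
    return 1; // illustrative fallback when cpuinfo is unavailable
  }
  uint64_t max_frequency = 0;
  size_t big_cores = 0;
  for (uint32_t i = 0; i < cpuinfo_get_clusters_count(); ++i) {
    const struct cpuinfo_cluster* cluster = cpuinfo_get_cluster(i);
    if (cluster->frequency >= max_frequency) {
      max_frequency = cluster->frequency;
      big_cores = cluster->core_count;
    }
  }
  return big_cores > 0 ? big_cores : cpuinfo_get_cores_count();
}

// Clamp whatever the caller requested so the pool never schedules work on
// the slower LITTLE cores.
size_t clampNumThreads(size_t requested) {
  return std::min(requested, bigCoreCount());
}
```
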
Sebastian Messmer
643ca5def2 Replace c10::guts::stuff with std::stuff (#30915)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30915

Since we now have C++14, we don't need these c10::guts helpers anymore
ghstack-source-id: 95777609

Test Plan: waitforsandcastle

Differential Revision: D18869639

fbshipit-source-id: 97716f932297c64c6e814410ac47b444c33d4e2e
2019-12-16 13:57:19 -08:00
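
An illustrative before/after of the mechanical substitution this covers; `make_unique` is just one example of the C++14 facilities that made the polyfills unnecessary.

```
#include <memory>

// before (C++11-era polyfill):
//   auto p = c10::guts::make_unique<int>(42);
//
// after (plain C++14 standard library; other guts helpers follow the same pattern):
auto p = std::make_unique<int>(42);
```
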
Ivan Kobzarev
ca8cb3241a Expose setNumThreads to android api (#31205)
Summary:
PR https://github.com/pytorch/pytorch/pull/31033 was unlanded due to a macOS build failure:
https://app.circleci.com/jobs/github/pytorch/pytorch/3916388

This PR changes `setNumThreads` to be Android-only and moves it to the separate class `org.pytorch.PytorchAndroid` as a static function, which is a better fit since it has a global effect.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31205

Reviewed By: dreiss

Differential Revision: D18977250

Pulled By: IvanKobzarev

fbshipit-source-id: 4995859808af498c82933c4db52bd7c7dfae90e5
2019-12-12 18:57:27 -08:00
Michael Suo
c0bcfd0445 Revert D18923167: Expose setNumThreads to android api
Test Plan: revert-hammer

Differential Revision:
D18923167

Original commit changeset: 8d98c2edbff4

fbshipit-source-id: 7db37cff298c511d0dd9eb373811c769e4a73be9
2019-12-12 09:23:58 -08:00
Ivan Kobzarev
6225443009 Expose setNumThreads to android api (#31033)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31033

Intention:
There are requests from users to control the number of threads from the Android side:
https://discuss.pytorch.org/t/android-pytorch-forward-method-running-in-a-separate-thread-slow-down-ui-thread/63516/2
https://discuss.pytorch.org/t/threading-of-model-pytorch-android/62490/2

At the moment `setNumThreads` is placed in `org.pytorch.Module`, but this method changes the global thread-pool size; in the future it will be moved to a separate class to mirror the Python binding structure, which has `torch.set_num_threads()`.

Test Plan: Imported from OSS

Differential Revision: D18923167

Pulled By: IvanKobzarev

fbshipit-source-id: 8d98c2edbff42e9b673509672dce3f2dd03a923e
2019-12-11 14:20:14 -08:00
Tao Xu
b730d04ed2 Fix deadlock issues in ThreadPool (#29885)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29885

### Summary

Currently, we have a deadlock issue on iOS when running Resnet50. The problem happens when a task running in the ThreadPool calls `getNumThread()`, which tries to acquire the same mutex, thus causing the deadlock. The fix is simply to remove the guard for `_numThreads`, as it is not expected to change after initialization.

### Test Plan

1. Generate a Resnet50 model using trace_model.py
2. Run `ios/TestApp/bootstrap.sh` to do the benchmark

cc shoumikhin AshkanAliabadi

Test Plan: Imported from OSS

Differential Revision: D18533505

Pulled By: xta0

fbshipit-source-id: 2a069d20b59833ec8b02ff05515c3739a85a15de
2019-11-15 19:27:52 -08:00
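
A condensed sketch of the deadlock pattern and fix described above, assuming illustrative names rather than the actual Caffe2 ThreadPool code.

```
#include <atomic>
#include <cstddef>
#include <mutex>

// If the pool holds mutex_ while running a task and that task calls
// getNumThreads(), a non-recursive mutex gets locked twice on one thread.
class Pool {
 public:
  // before (deadlock-prone when called from inside a task):
  //   size_t getNumThreads() const {
  //     std::lock_guard<std::mutex> guard(mutex_);
  //     return numThreads_;
  //   }
  //
  // after: numThreads_ does not change after construction, so the guard can
  // be dropped (an atomic load keeps the unguarded read well-defined).
  size_t getNumThreads() const {
    return numThreads_.load(std::memory_order_relaxed);
  }

 private:
  mutable std::mutex mutex_;
  std::atomic<size_t> numThreads_{4};
};
```
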
Michael Liu
92a516b9ff Apply modernize-use-override - 2/2
Summary:
Use C++11’s override and remove virtual where applicable.
Changes are automatically generated.

Reviewed By: Orvid

Differential Revision: D14054721

fbshipit-source-id: 15d266fa1779b1e3ea6270f00841d7fb1e4d44ee
2019-02-13 21:01:28 -08:00
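
An illustrative before/after of the modernize-use-override transformation applied here, using example types rather than code from this diff.

```
struct Base {
  virtual void run() {}
  virtual ~Base() = default;
};

// before:
//   struct Derived : Base { virtual void run() {} };
//
// after: `override` makes the compiler verify the signature really overrides,
// and the redundant `virtual` is dropped.
struct Derived : Base {
  void run() override {}
};
```
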
Jerry Zhang
0c32e1b43e use C10_MOBILE/ANDROID/IOS (#15363)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15363

C10_MOBILE wasn't defined in the numa file move diff (D13380559).
This diff moves CAFFE2_MOBILE/ANDROID/IOS to c10:

```
codemod -m -d caffe2 --extensions h,hpp,cc,cpp,mm "CAFFE2_MOBILE" "C10_MOBILE"
codemod -m -d caffe2 --extensions h,hpp,cc,cpp,mm "CAFFE2_ANDROID" "C10_ANDROID"
codemod -m -d caffe2 --extensions h,hpp,cc,cpp,mm "CAFFE2_IOS" "C10_IOS"

```

i-am-not-moving-c2-to-c10

Reviewed By: marcinkwiatkowski

Differential Revision: D13490020

fbshipit-source-id: c4f01cacbefc0f16d5de94155c26c92fd5d780e4
2019-01-09 15:08:20 -08:00
Yangqing Jia
7d5f7ed270 Using c10 namespace across caffe2. (#12714)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12714

This is a short change to enable the c10 namespace in caffe2. We did not enable
it before due to gflags global-variable confusion, but that should now be mostly
cleaned up. Right now, the plan of record is that namespace caffe2 and
namespace aten will be full supersets of namespace c10.

Most of the diff is codemod; the only two non-codemod changes are in caffe2/core/common.h, where

```
using namespace c10;
```

is added, and in Flags.h, where, instead of creating aliasing variables in the c10 namespace, we put them directly in the global namespace to match gflags (with the same behavior when gflags is not built in).

Reviewed By: dzhulgakov

Differential Revision: D10390486

fbshipit-source-id: 5e2df730e28e29a052f513bddc558d9f78a23b9b
2018-10-17 12:57:19 -07:00
Yangqing Jia
38f3d1fc40 move flags to c10 (#12144)
Summary:
Still in flux.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12144

Reviewed By: smessmer

Differential Revision: D10140176

Pulled By: Yangqing

fbshipit-source-id: 1a313abed022039333e3925d19f8b3ef2d95306c
2018-10-04 02:09:56 -07:00
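
A minimal usage sketch of the c10 flags API after the move, assuming the macro and function names in c10/util/Flags.h; the flag itself is hypothetical.

```
#include <c10/util/Flags.h>

C10_DEFINE_int(example_num_threads, 4, "Hypothetical flag: worker count.");

int main(int argc, char** argv) {
  c10::ParseCommandLineFlags(&argc, &argv); // backed by gflags when available
  return FLAGS_example_num_threads > 0 ? 0 : 1;
}
```
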
Marat Dukhan
67c6d93634 Tune minimal work size (#10599)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10599

Not spawning threads with spin-lock synchronization is bad because they will switch to `condvar` wait, which increases wake-up latency next time they are needed.

Reviewed By: ajtulloch

Differential Revision: D9366664

fbshipit-source-id: 3b9e4a502aeefaf0ddc4795303a855d98980b02e
2018-08-16 17:39:57 -07:00
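
An illustrative spin-then-sleep wait, not the gemmlowp/Caffe2 code, showing why the minimum work size matters: a worker that gets no work falls from the cheap spin phase back to a condition-variable wait, which is slower to wake next time.

```
#include <atomic>
#include <condition_variable>
#include <mutex>

void waitForWork(std::atomic<bool>& has_work,
                 std::mutex& mutex,
                 std::condition_variable& cv) {
  for (int spin = 0; spin < 10000; ++spin) { // cheap wake-up path
    if (has_work.load(std::memory_order_acquire)) {
      return;
    }
  }
  std::unique_lock<std::mutex> lock(mutex); // expensive wake-up path
  cv.wait(lock, [&] { return has_work.load(std::memory_order_acquire); });
}
```
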
Orion Reblitz-Richardson
1d5780d42c Remove Apache headers from source.
* LICENSE file contains details, so removing from individual source files.
2018-03-27 13:10:18 -07:00
Marat Dukhan
7462eca363 Initialize cpuinfo in the thread pool
The thread pool called cpuinfo_get_processors_count() without initializing cpuinfo. Only by luck did this not make Caffe2 single-threaded: the threadpool is initialized after NNPACK, and NNPACK initializes cpuinfo itself.

This commit also updates cpuinfo to a version that aborts with a fatal error if it is used uninitialized.
2018-03-26 15:44:47 -04:00
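
A minimal sketch of the fix, assuming the cpuinfo API named in the message; the fallback value is an illustrative choice.

```
#include <cstddef>
#include <cpuinfo.h>

size_t defaultThreadCount() {
  if (!cpuinfo_initialize()) { // the call that was previously missing
    return 1;                  // illustrative conservative fallback
  }
  return cpuinfo_get_processors_count();
}
```
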
sf-wind
602a09dde7 Update caffe2 from facebook 4f527ef46abf (#2234)
* [GanH]: two_task_discriminator

as titled

and adding label smooth

* [Dper2] Simplified UI options needed for blob magnitude visualization

* [GanH]: fix tags

as titled

* Added type and shape inference for GatherRange operator

This helps with type / shape inference when using this operator in layers.
Also just a nice to have in general.

* Demonstrate Caffe2 exception handling with StoreHandlerTimeoutError in Python

We'd like to catch and recover from certain Caffe2 net exceptions. Use this diff to demonstrate a pattern of registering a pybind exception mapping and catching it in Python using caffe2::StoreHandlerTimeoutException (see the sketch after this entry).

* Bind Gloo IoException to IoError in Python

Allow peer failure handling and recovery using an exception based mechanism. This diff registers gloo::IoException with pybind.

* [GanH]: add label smoothing to softmax with loss

as titled

* [C2] Enable LARS in Adagrad and hook it to DPER

* [DPER] Don't pass LayerModelHelper in create_trainer_nodes

Since we're planning to get rid of it eventually, and I want access to the
NetDef-only interface ASAP, I'm looking to remove all references to LMH where
we don't really need them.

* fix bugs in LambdaRankNdcgOp

The loss and gradient in LambdaRankNdcgOp are incorrect: the loss should be the negative log of the probabilities instead of the log.

* Restrict thread pool on iOS to only big cores

Historically, iPhones exposed only one type of core, and the Caffe2 thread pool used all of them.
However, the iPhone 8/iPhone X expose 2 big + 4 LITTLE cores. As our thread pool doesn't support work stealing or other forms of load balancing, the fast cores end up waiting for the slow ones, so it may be better to restrict execution to only the 2 fast cores, as we do on Android.

* Remove SparseLength Sum/WeightedSum/Mean operators with fp16 engine

Remove SparseLength Sum/WeightedSum/Mean operators with fp16 engine

* make clang happy and get fewer warnings

make clang happy and get fewer warnings

* [Personalization] Support add_output_schema() in layer_model_helper

Problem:
Currently the output_schema of sparse_nn can only be set once. https://fburl.com/efth5zer.

Solution:
For flexibility, we want to add fields to output_schema incrementally.

Plan:
Wrap the change of `model._output_schema` into a new function `add_output_schema()` for adding additional output_schema.

Callsite:
The add_output_schema() should be called instead at https://fburl.com/efth5zer

Reference:
The newly added `add_output_schema()` will be similar to `add_loss()` in https://fburl.com/t2ii8njh
2018-03-12 12:22:59 -07:00
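
A hedged sketch of the pybind pattern mentioned in the exception-handling bullet above: register a C++ exception type with pybind11 so Python callers can catch it. The exception type and module name here are illustrative, not the actual Caffe2 definitions.

```
#include <stdexcept>
#include <pybind11/pybind11.h>

namespace py = pybind11;

struct StoreHandlerTimeoutException : std::runtime_error {
  using std::runtime_error::runtime_error;
};

PYBIND11_MODULE(example_module, m) {
  // Maps the C++ exception to a Python exception class named
  // StoreHandlerTimeoutError, so Python code can catch
  // example_module.StoreHandlerTimeoutError and recover.
  py::register_exception<StoreHandlerTimeoutException>(
      m, "StoreHandlerTimeoutError");
}
```
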
Marat Dukhan
09b6ad5785 Use cpuinfo instead of Android's libcpufeatures in Android build 2018-03-09 22:20:37 -05:00
Andrew Tulloch
66131dec6f Expose Caffe2 WorkerPool from ThreadPool
Reviewed By: harouwu

Differential Revision: D6946610

fbshipit-source-id: a9fef0f1c7732b534433ee9517abddc32d0ec702
2018-02-14 21:09:15 -08:00
Marat Dukhan
224493d9ce NNPACK: Use new bindings and custom thread pool
Summary:
This change should dramatically (~10X) improve performance of convolution with NNPACK engine
Closes https://github.com/caffe2/caffe2/pull/1730

Reviewed By: sf-wind

Differential Revision: D6695895

Pulled By: Maratyszcza

fbshipit-source-id: 26291916811ef4cb819a59aec848c4e23668e568
2018-01-11 10:48:12 -08:00
Yangqing Jia
8286ce1e3a Re-license to Apache
Summary: Closes https://github.com/caffe2/caffe2/pull/1260

Differential Revision: D5906739

Pulled By: Yangqing

fbshipit-source-id: e482ba9ba60b5337d9165f28f7ec68d4518a0902
2017-09-28 16:22:00 -07:00
Marat Dukhan
bd17684252 Run thread pool only on fast cores
Summary:
Choose the number of threads for the thread pool to be the number of fast cores.

Didn't do any benchmarks, so it's mostly an FYI diff.

Reviewed By: ajtulloch

Differential Revision: D5579797

fbshipit-source-id: 5ada001116c731780f38a62e9c0b500bd64a4bfe
2017-09-13 14:35:28 -07:00
Andrew Tulloch
898f3f398c Use gemmlowp-based worker pool (spinning + #threads of blocks of work) instead of custom work-stealing impl
Reviewed By: Yangqing

Differential Revision: D5696841

fbshipit-source-id: 84b629d2c1ebd418c75d5da907799e580cc59d1e
2017-08-28 00:46:01 -07:00
Jon Morton
9b9df3fbeb Sync mobile codebase changes back to fbcode
Summary: Rather chunky sync of changes made exclusively to mobile codebases back to fbcode.

Reviewed By: ajtulloch

Differential Revision: D5314405

fbshipit-source-id: c4d0a7244468f953eb63288306bc9bc78eb9e1be
2017-07-18 17:54:41 -07:00
Andrew Tulloch
6bff82eb6a Revert threadpool minWorkSize change on iOS
Reviewed By: sf-wind

Differential Revision: D5380298

fbshipit-source-id: fdf98bdda30e8cd6689c59fcc0357bca129d409b
2017-07-07 12:41:52 -07:00
Andrew Tulloch
43c46cc883 Reduce default ThreadPool min work size (~25% speedup for segmentation on S7).
Summary:
I noticed this when experimenting with the compute-bound convolutions
for the ULP HWGQ binary conv/gemm.

It's an ugly heuristic, which Maratyszcza and co. are improving this half, but I think
this will be a net win for C2, especially if segmentation/Mask R-CNN workloads are
critical.

Differential Revision: D5375976

fbshipit-source-id: 863f76d434f133bf5a00e7ced1cfadfcf92e3c84
2017-07-06 08:32:32 -07:00
Andrew Tulloch
7d9a0a41fd Allow forcing single-threaded execution at runtime.
Summary: Might be useful for the EXC_RESOURCE / CPU issues.

Reviewed By: salexspb

Differential Revision: D4565494

fbshipit-source-id: 74ac9edeba6334a46ee6799a93ca96eb68216439
2017-02-16 06:11:27 -08:00
Yangqing Jia
589398950f fbsync at f5a877 2016-11-18 15:41:06 -08:00
Yangqing Jia
238ceab825 fbsync. TODO: check if build files need update. 2016-11-15 00:00:46 -08:00