Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33957
Lots of small preprocessor warning cleanups for Windows.
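An illustrative example of the kind of fix involved (the macro name below is made up, not taken from this PR): at high warning levels MSVC emits C4668 when an undefined macro is evaluated in an `#if`, and guarding the test with `defined()` silences it.
```
// Hypothetical example; FOO_ENABLED is not a real macro from this PR.
// Before:  #if FOO_ENABLED          // C4668 if FOO_ENABLED is undefined
// After:
#if defined(FOO_ENABLED) && FOO_ENABLED
constexpr bool foo_enabled = true;
#else
constexpr bool foo_enabled = false;
#endif
```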
Test Plan: CI green
Reviewed By: malfet, albanD
Differential Revision: D20153582
fbshipit-source-id: 18fd61c466fd1f55ededdae4448b3009a9cedc04
Summary:
Mainly renaming Caffe2's pthread_create, the only conflicting symbol that NNPACK refers to internally, to pthread_create_c2.
Removed 2 other conflicting symbols that are not used internally at all.
Pointing XNNPACK to the original repo instead of the fork.
Copy-pasted the new interface and implementation to caffe2/utils/threadpool, so that internal builds compile against this.
When the threadpool is unified, this will be removed.
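A minimal sketch of the rename (the signature below is assumed for illustration, not copied from the diff):
```
#include <cstddef>

extern "C" {
// Before: this symbol collided with the identically named one referenced by
// NNPACK when both copies of pthreadpool were linked into one binary:
//   struct pthreadpool* pthread_create(std::size_t threads_count);
// After: the internal copy exports a non-conflicting name.
struct pthreadpool* pthread_create_c2(std::size_t threads_count);
}
```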
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33869
Differential Revision: D20140580
Pulled By: kimishpatel
fbshipit-source-id: de70df0af9c7d6bc065e85ede0e1c4dd6a9e6be3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33523
When using `ThreadPool::setNumThreads` to set the number of threads, it should not exceed the number of big cores. Otherwise, the performance could degrade significantly.
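A sketch of the intended clamping, assuming a hypothetical `getBigCoreCount()` helper in place of the real topology query:
```
#include <algorithm>
#include <cstddef>

// Hypothetical stand-in for the real big-core query
// (e.g. 2 on an iPhone X-class SoC).
std::size_t getBigCoreCount() { return 2; }

std::size_t clampNumThreads(std::size_t requested) {
  // Never exceed the big-core count: without work stealing, pool work
  // scheduled onto LITTLE cores makes the big cores wait on stragglers.
  return std::min(requested, getBigCoreCount());
}
```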
Test Plan:
```
cd ~/fbsource/xplat
buck test caffe2:caffe2_testAndroid
```
Reviewed By: dreiss
Differential Revision: D19779267
fbshipit-source-id: 4e980e8a0ccc2f37e1c8ed16e2f4651d72924dbd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30915
Since we now have C++14, we don't need these c10::guts helpers anymore
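For example (illustrative only; `make_unique` is one such backported helper, not an exhaustive list):
```
#include <memory>

int main() {
  // C++14 provides this in the standard library, making the c10::guts
  // backport redundant.
  // Before: auto p = c10::guts::make_unique<int>(42);
  auto p = std::make_unique<int>(42);
  return *p == 42 ? 0 : 1;
}
```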
ghstack-source-id: 95777609
Test Plan: waitforsandcastle
Differential Revision: D18869639
fbshipit-source-id: 97716f932297c64c6e814410ac47b444c33d4e2e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29885
### Summary
Currently, we have a deadlock issue on iOS when running Resnet50. The problem happens when a task being run in the ThreadPool calls `getNumThread()`, which tries to acquire the same mutex, and thus causes the deadlock. The fix is to simply remove the guard for `_numThreads`, as it's not likely to change after initialization.
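A minimal repro of the pattern (illustrative, not the real class): a non-recursive mutex acquired twice on the same thread.
```
#include <mutex>

class Pool {
 public:
  int getNumThreads() {
    std::lock_guard<std::mutex> guard(mutex_);  // second acquisition...
    return numThreads_;
  }
  void runTask() {
    std::lock_guard<std::mutex> guard(mutex_);  // ...first acquisition
    getNumThreads();  // std::mutex is not reentrant: this never returns
  }

 private:
  std::mutex mutex_;
  int numThreads_ = 4;
};

int main() {
  Pool pool;
  pool.runTask();  // hangs (technically UB for std::mutex)
}
```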
### Test Plan
1. Generate a Resnet50 model using trace_model.py
2. Run `ios/TestApp/bootstrap.sh` to do the benchmark
cc shoumikhin AshkanAliabadi
Test Plan: Imported from OSS
Differential Revision: D18533505
Pulled By: xta0
fbshipit-source-id: 2a069d20b59833ec8b02ff05515c3739a85a15de
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12714
This is a short change to enable the c10 namespace in caffe2. We did not enable
it before due to gflags global variable confusion, but that should be mostly
cleaned up now. Right now, the plan on record is that namespace caffe2 and
namespace aten will be full supersets of namespace c10.
Most of the diff is codemod; the only two non-codemod changes are in caffe2/core/common.h, where
```
using namespace c10;
```
is added, and in Flags.h, where instead of creating aliasing variables in the c10 namespace, we put them directly in the global namespace to match gflags (with the same behavior if gflags is not built in).
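A simplified sketch of that pattern (not the real macro from Flags.h): the flag variable is defined in the global namespace, exactly where gflags would put it, so call sites read `FLAGS_foo` identically whether or not gflags is built in.
```
#include <string>

// Sketch only; the real macro also has registration and export logic.
#define DEFINE_string_SKETCH(name, default_value) \
  std::string FLAGS_##name = default_value;  // global namespace, like gflags

DEFINE_string_SKETCH(example_flag, "hello")

int main() { return FLAGS_example_flag == "hello" ? 0 : 1; }
```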
Reviewed By: dzhulgakov
Differential Revision: D10390486
fbshipit-source-id: 5e2df730e28e29a052f513bddc558d9f78a23b9b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10599
Not spawning threads with spin-lock synchronization is bad because they will switch to a `condvar` wait, which increases wake-up latency the next time they are needed.
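An illustrative spin-then-sleep wait (a sketch of the general technique, not the actual threadpool code): the worker spins briefly hoping for work, and only after the spin budget is exhausted falls back to the slower-to-wake condition variable.
```
#include <atomic>
#include <condition_variable>
#include <mutex>

void wait_for_work(std::atomic<bool>& has_work, std::mutex& m,
                   std::condition_variable& cv) {
  // Fast path: spin for a bounded budget; wake-up latency is near zero.
  for (int i = 0; i < 10000; ++i) {
    if (has_work.load(std::memory_order_acquire)) return;
  }
  // Slow path: block on the condvar; waking from here costs a syscall
  // and a scheduler round-trip, hence the higher latency noted above.
  std::unique_lock<std::mutex> lock(m);
  cv.wait(lock, [&] { return has_work.load(std::memory_order_acquire); });
}
```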
Reviewed By: ajtulloch
Differential Revision: D9366664
fbshipit-source-id: 3b9e4a502aeefaf0ddc4795303a855d98980b02e
The thread pool called cpuinfo_get_processors_count() without initializing cpuinfo. Only by luck did this not make Caffe2 single-threaded: the threadpool is initialized after NNPACK, and NNPACK initializes cpuinfo itself.
This commit also updates cpuinfo to a version that aborts with a fatal error if it's used uninitialized.
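A sketch of the required call order (cpuinfo_initialize() and cpuinfo_get_processors_count() are the actual cpuinfo entry points; the rest is illustrative):
```
#include <cpuinfo.h>
#include <cstdio>

int main() {
  // Query functions are only valid after initialization succeeds; newer
  // cpuinfo aborts with a fatal error if this step is skipped.
  if (!cpuinfo_initialize()) {
    std::fprintf(stderr, "cpuinfo initialization failed\n");
    return 1;
  }
  std::printf("processors: %u\n", cpuinfo_get_processors_count());
  return 0;
}
```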
* [GanH]: two_task_discriminator
as titled, and adding label smoothing
* [Dper2] Simplified UI options needed for blob magnitude visualization
* [GanH]: fix tags
as titled
* Added type and shape inference for GatherRange operator
This helps with type / shape inference when using this operator in layers.
Also just a nice-to-have in general.
* Demonstrate Caffe2 exception handling with StoreHandlerTimeoutError in Python
We'd like to catch and recover from certain Caffe2 net exceptions. Use this diff to demonstrate a pattern of registering a pybind exception mapping and catching it in Python using caffe2::StoreHandlerTimeoutException.
* Bind Gloo IoException to IoError in Python
Allow peer failure handling and recovery using an exception based mechanism. This diff registers gloo::IoException with pybind.
* [GanH]: add label smoothing to softmax with loss
as titled
* [C2] Enable LARS in Adagrad and hook it to DPER
* [DPER] Don't pass LayerModelHelper in create_trainer_nodes
Since we're planning to get rid of it eventually, and I want access to a
NetDef-only interface ASAP, I'm looking to remove all references to LMH
where we don't really need them.
* fix bugs in LambdaRankNdcgOp
The loss and gradient in LambdaRankNdcgOp are incorrect: the loss should be the negative log of the probabilities instead of the log.
* Restrict thread pool on iOS to only big cores
Historically, iPhones exposed only one type of cores, and Caffe2 thread pool used all of them.
However, iPhone 8/iPhone X expose 2 big + 4 LITTLE cores. As our thread pool doesn't support work stealing or other forms of load balancing, fast cores end up waiting for the slow ones, and it may be better to restrict execution to only the 2 fast cores, like we do on Android.
* Remove SparseLength Sum/WeightedSum/Mean operators with fp16 engine
* Make clang happy and get fewer warnings
* [Personalization] Support add_output_schema() in layer_model_helper
Problem:
Currently the output_schema of sparse_nn can only be set once. https://fburl.com/efth5zer.
Solution:
For flexibility, we want to add fields to output_schema incrementally.
Plan:
Wrap the change of `model._output_schema` into a new function, `add_output_schema()`, for adding additional output_schema fields.
Callsite:
`add_output_schema()` should be called instead at https://fburl.com/efth5zer
Reference:
The newly added `add_output_schema()` will be similar to `add_loss()` in https://fburl.com/t2ii8njh
Summary:
Choose the number of threads for the thread pool as the number of fast cores.
Didn't do any benchmarks, so it's mostly an FYI diff.
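One way this could look with cpuinfo (a sketch, not the actual diff): treat cores running at the maximum reported frequency as the fast ones.
```
#include <cpuinfo.h>
#include <algorithm>
#include <cstdint>

uint32_t count_fast_cores() {
  if (!cpuinfo_initialize()) {
    return 1;  // topology unknown: fall back to a single thread
  }
  // Find the highest core frequency, then count the cores that match it.
  // Note: frequency may be reported as 0 on some platforms, in which case
  // every core counts as "fast".
  uint64_t max_freq = 0;
  for (uint32_t i = 0; i < cpuinfo_get_cores_count(); i++) {
    max_freq = std::max<uint64_t>(max_freq, cpuinfo_get_core(i)->frequency);
  }
  uint32_t fast = 0;
  for (uint32_t i = 0; i < cpuinfo_get_cores_count(); i++) {
    if (cpuinfo_get_core(i)->frequency == max_freq) {
      fast++;
    }
  }
  return fast;
}
```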
Reviewed By: ajtulloch
Differential Revision: D5579797
fbshipit-source-id: 5ada001116c731780f38a62e9c0b500bd64a4bfe
Summary: Rather chunky sync of changes made exclusively to mobile codebases back to fbcode.
Reviewed By: ajtulloch
Differential Revision: D5314405
fbshipit-source-id: c4d0a7244468f953eb63288306bc9bc78eb9e1be
Summary:
I noticed this when experimenting with the compute-bound convolutions
for the ULP HWGQ binary conv/gemm.
It's an ugly heuristic that Maratyszcza and co. are improving this half, but I think
this will be a net win for C2, especially if segmentation/Mask R-CNN are
critical.
Differential Revision: D5375976
fbshipit-source-id: 863f76d434f133bf5a00e7ced1cfadfcf92e3c84
Summary: Might be useful for the EXC_RESOURCE / CPU issues.
Reviewed By: salexspb
Differential Revision: D4565494
fbshipit-source-id: 74ac9edeba6334a46ee6799a93ca96eb68216439