Commit Graph

23 Commits

Author SHA1 Message Date
Jane Xu
71ca600af9 Renaming CAFFE2_API to TORCH_API (#49496)
Summary:
Since caffe2 and torch have been consolidated, CAFFE2_API should be merged with TORCH_API. Addresses a TODO.

Manually edited some references of the removed `CAFFE2_API`:
* `CONTRIBUTING.md`
* `caffe2/proto/CMakeLists.txt`
* `cmake/ProtoBuf.cmake`
* `c10/macros/Export.h`
* `torch/csrc/WindowsTorchApiMacro.h`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49496

Reviewed By: malfet, samestep

Differential Revision: D25600726

Pulled By: janeyx99

fbshipit-source-id: 7e068d959e397ac183c097d7e9a9afeca5ddd782
2020-12-18 10:54:50 -08:00
Tristan Rice
ce54f0d411 Back out "Revert D20449887: [dt][caffe2] enable using smart exceptions in async nets" (#36172)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36172

Original commit changeset: 3d7801613f86

D20449887 broke some OSS tests as the OSS export sync wasn't working correctly.

Test Plan:
Manually export latest version to OSS to trigger the tests

+ test plan in D20449887

verified onnx tests are passing in https://github.com/pytorch/pytorch/pull/36172

Reviewed By: andrewwdye

Differential Revision: D20902279

fbshipit-source-id: bc30fcc9f5cc8076f69a5d92675fd27455948372
2020-04-13 11:31:52 -07:00
Edward Yang
459163b8eb Revert D20449887: [dt][caffe2] enable using smart exceptions in async nets
Test Plan: revert-hammer

Differential Revision:
D20449887

Original commit changeset: 047fdf1bd52f

fbshipit-source-id: 3d7801613f86885c204f3946f3a52a855516faa3
2020-04-06 19:37:05 -07:00
Tristan Rice
8ef82fc2c9 [dt][caffe2] enable using smart exceptions in async nets (#34753)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34753

This improves support for exceptions and capturing stack traces in caffe2 async nets. We generally want to use exceptions everywhere we can in order to preserve stack information. It also makes the exception timestamp more accurate so multiple exceptions at the same time can be correctly ordered.

Test Plan: Updated the tests to use the new error semantics + adds a test to ensure the stack is correctly propagated through deferrable async scheduling.

Reviewed By: andrewwdye

Differential Revision: D20449887

fbshipit-source-id: 047fdf1bd52fd7c7c1f3fde77df9a27ed9e288e7
2020-04-06 14:27:07 -07:00
Benny Chen
f25322fb97 Fix issues under caffe round 1
Summary: Some automation to fix uninitialized members for caffe2 code. Ran canary to make sure I don't have any regression in prod, but not sure how to test comprehensively for caffe2

Reviewed By: ezyang

Differential Revision: D13776185

fbshipit-source-id: fb2a479971cc0276d8784be1c44f01252410bd24
2019-01-23 19:04:59 -08:00
Sebastian Messmer
d408324350 Move files to/from c10/core and c10/util (#15316)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15316

This starts cleaning up the files in c10 according to the module structure we decided on.

Move to c10/util:
- Half.h, Half-inl.h, Half.cpp, bitcasts.h

Move to c10/core:
- Device.h, Device.cpp
- DeviceType.h, DeviceType.cpp

i-am-not-moving-c2-to-c10

Reviewed By: dzhulgakov

Differential Revision: D13498493

fbshipit-source-id: dfcf1c490474a12ab950c72ca686b8ad86428f63
2019-01-10 16:22:22 -08:00
Ilia Cherniavskii
4276fe7867 Support for saving exceptions in async CPU ops (#12904)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12904

Enabling support for saving exceptions in async parts of CPU ops via
event().SaveException(). The error contract for CPU ops becomes:
 - return false in sync part -> net->Run() returns false
 - throw in sync part -> net->Run() rethrows the same exception
 - SetFinished("error msg") in async part -> net->Run() returns false
 - event().SetFinishedWithException() in async part -> net->Run() rethrows the same
   exception

Reviewed By: andrewwdye

Differential Revision: D10479130

fbshipit-source-id: 850ee9cbf83b04dd24b25eba359439b0cf7853c0
2018-10-29 04:57:40 -07:00
Edward Yang
34cca9f05b Move Device and DeviceType to c10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12995

Reviewed By: Yangqing

Differential Revision: D10513246

fbshipit-source-id: 0c6d52e09166d7e8a786c1a0e21685ec9c35b12a
2018-10-24 08:27:44 -07:00
Jerry Zhang
9f4bcdf075 caffe2::DeviceType -> at::DeviceType (#11254)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11254
Previously we use DeviceType in caffe2.proto directly, but it's an `enum` and have implicit conversion to int, which does not have type safety, e.g. we have to explicitly check for a device type is valid in event.h:
```
template <int d>
struct EventCreateFunctionRegisterer {
  explicit EventCreateFunctionRegisterer(EventCreateFunction f) {
    static_assert(d < MaxDeviceTypes, "");
    Event::event_creator_[d] = f;
  }
};
```
at::DeviceType is an `enum class`, and it does not have implicit conversion to int, and provides better type safety guarantees. In this diff we have done the following refactor(taking CPU as an example):

    1. caffe2::DeviceType → caffe2::DeviceTypeProto
    2. caffe2::CPU → caffe2::PROTO_CPU
    3. caffe2::DeviceType = at::DeviceType
    4. caffe2::CPU = at::DeviceType::CPU

codemod -d caffe2/caffe2 --extensions h,cc,cpp 'device_type\(\), ' 'device_type(), PROTO_'
+ some manual changes

In short, after this diff, in c++, caffe2::CPU refers to the at::DeviceType::CPU and the old proto caffe2::CPU will be caffe2::PROTO_CPU.
In python side, we have a temporary workaround that alias `caffe2_pb2.CPU = caffe2_pb2.PROOT_CPU` to make the change easier to review and this will be removed later.

Reviewed By: ezyang

Differential Revision: D9545704

fbshipit-source-id: 461a28a4ca74e616d3ee183a607078a717fd38a7
2018-09-05 16:28:09 -07:00
Orion Reblitz-Richardson
6508db7421 Remove BUILD_CAFFE2 and build everything (#8338)
Summary:
This completely removes BUILD_CAFFE2 from CMake. There is still a little bit of "full build" stuff in setup.py that enables USE_CUDNN and BUILD_PYTHON, but otherwise everything should be enabled for PyTorch as well as Caffe2. This gets us a lot closer to full unification.

cc mingzhe09088, pjh5, ezyang, smessmer, Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8338

Reviewed By: mingzhe09088

Differential Revision: D9600513

Pulled By: orionr

fbshipit-source-id: 9f6ca49df35b920d3439dcec56e7b26ad4768b7d
2018-08-31 13:10:24 -07:00
Edward Yang
91797c0672 Replace direct include of caffe2.pb.h with an intermediary header caffe2_pb.h (#10946)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10946

```
codemod -d . --extensions cc,cpp,cu,cuh,h caffe2/proto/caffe2.pb.h caffe2/proto/caffe2_pb.h
```

Reviewed By: houseroad

Differential Revision: D9539945

fbshipit-source-id: 497d04720e8e7e61c05ffe1b23733d0cb774de7e
2018-08-28 11:57:08 -07:00
sf-wind
5b86c3af4a
Update from facebook (#8384)
* [fix] fixup the bias multiplier data access issue

Hotfix for failues in conv_transpose

* [D2][Easy]: lint regularizer

lint with black

* [GanH]: Split mu in adaptive weight for diagnose

* [Dper] Add the ability to split FC weights into multiple smaller ones

* fix SumReduceLikeOp for empty blob

as desc.

* add ctc_greedy_decoder for caffe2

ctc_greedy_decoder same as tf's

* Update event callback handling

Allow multiple callbacks per event

* Add WeightedSum layer

The motivation is to do weighted sum in HoNet/crossnet, in the next diff, I'll replace model.Add with model.WeightedSum in
honet: https://fburl.com/f4rmolg2
crossnet: https://fburl.com/v7awn8se, https://fburl.com/63filbnm

* Replicate DAG's behavior

Some callers expect RunAsync to block, replicate that behavior in case of
explicit 'dag' net type

* [dper] layernorm layer

as title

* Override dag, async_dag, async_polling

Overriding dag, async_dag and async_polling with async_scheduling

* Name the thread pools

Caffe thread pools currently inherit the thread names from the thread that starts them, which can be misleading. Give them an explicit name instead.

* [Caffe2] FilleOp should support int64_t dimensions

Change argument type to int64_t for shape argument of FillerOp (used in ConstantFill, XavierFill, etc)

* Remove caffe2/caffe2/contrib/torch/

It's not used anywhere and depends on old lua torch that conflicts with Aten. Given PT1 it's not relevant any more (though it was nice and clever code!)

#accept2ship

* Fix linearWarmup multiplier check

The multiplier needs to be non-negative, not strictly positive.

* Revert D3314316

This is after 2 years and we do not seem to have a use case for this one, so
for the sake of clean API design we should potentially remove this. This would
allow us to potentially pass in arguments to optionally construct an object,
although it is indeed a little bit unclear how we can reuse existing objects if
constructor arguments are passed in. In any case, we may want to remove this
dangling feature.

* Speedup generate proposals by partial_sort.

Speedup generate proposals by partial_sort.

FACEBOOK:
- Saw speed improvement for training with this op.
- Yanghan benchmarked the op on a small dataset and see consistent 100% improvement on speed (6ms -> 3ms) on 420 input resolution. See next diff for details.

* More parallel processing friendly for CPP version of GenerateProposals.

More parallel processing friendly for CPP version of GenerateProposals.

* [DT] [43/n] Lift stop conditions inside reader code back to flow control

1. Split multi_reader function into local_reader and remote_reader
2. Lifted stop conditions inside Limiter back to flow control
3. Split epoch flow building logic into 3 cases:
  - single machine (1 reader, 1 trainer on trainer0 node, no PS)
  - (1 reader + 1 trainer) on trainer0 node, has PS
  - multiple readers, readers do not share nodes with trainers, might have PS or not

* Resolve conflicts for torch/_thnn/utils.py

* [Caffe2] Handle image decoding errors

Image decoding errors can make the whole training fail. This diff is to handle them
1.Catch imdecode exceptions and check if decoded image has zero columns or rows. This is counted as decoding errors.
2.Replace the image with empty in case of error
3.Count the number of errors and throw runtime exception if the rate reaches given number

The empty image data is kept. It might introduce noise in the training data.

* Update MKL exporter to IDEEP ops

TSIA

* [Caffe2] GlobalInit is thread safe, fixing the comment

With the mutex and lock, GlobalInit is thread safe.
Update the comments.

* Back out "Add support for generating ATen files during fbcode build"

Original commit changeset: 28970ddba353

@override-unit-failures
(Note: this ignores all push blocking failures!)

* [DT]: fix predictor save

similar to D6610058, here we add the fix for distributed online training

* Remove net_singlethread_async_gpu.cc

Closes https://github.com/caffe2/caffe2/pull/2528

This removes net_singlethread_async_gpu.cc as part of our effort to clean
CUDAContext and the net executors.

* Inline DFS task execution

Add a DFS inline task execution mode in executor

* Add c10 folder to fbcode

This adds the c10 folder and its test cases to fbcode. Build flags are mostly taken from aten.

* add dependencies for online trainer

Add some dependencies so that the online model can use DataPipeline and PredictionTransform operators

Relevent post: https://fb.intern.facebook.com/groups/1324375037655677/permalink/1740993462660497/

* Resolve conflicts for tools/jit/gen_jit_dispatch.py

* [Fix] sparse regularization in distributed training

* Support advanced pooling options in sum processor

* support advanced pooling options in sum processor
* remove redundant code
* support attention in sum processor

* Improve shard logging in net tracing code

Make it handle arbitrary shard ids instead of just one digit ids.

* [Caffe2] Call GlobalInit in predictor only in mobile

FACEBOOK:
Calling GlobalInit long after the program starts may not be safe. There are issues if the following happens:

User does not call GlobalInit and initFacebook after program starts
User sets a flag manually: https://fburl.com/mcsumw7d
User calls OSS predictor.
OSS predictor calls GlobalInit
GlobalInit calls initFacebook
initFacebook resets all flags: https://fburl.com/tolszha1
Thus, the user manually set flags are overwritten

This would happen anytime GlobalInit is called long after the program starts.
I suppose the intention of the user in this case is not to call GlobalInit throughout the program,
but use Caffe2 regardless (is that desired?)
But adding GlobalInit in the OSS predictor would automatically call GlobalInit when using Caffe2.

This issue doesn't exist in mobile, since initFacebook is not called on mobile.

For now, guard the GlobalInit in predictor for mobile only.
May want to ensure the GlobalInit is always called at the start of the program. @[3501714:kutta] has seen weird issues when not calling GlobalInit at the start of the program on server side. He has made some progress on this.

* resolve conflicts for caffe2/core/logging_is_google_glog.h and test/test_torch.py

* Add empty fix for SumLikeReduceOp

Add empty fix for SumLikeReduceOp

* Revert D7962948: [caffe2][nomnigraph] Concat elim for sparseNN

This reverts commit f7f434dc5c34ca6058b9765d2ef615453d2276a9

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* Remove Declarations.yaml

* Include common.h

* Change std::stoi to caffe2::stoi

* Add thread_name.cc to the CMake file

* No need to subtract 1. Fix test segfaults

* Fix NetTest, ObserverTest

Fix tests

(cherry picked from commit 3767e66c3f365596cba3d46d3e7322c933a0ab41)

* CTCGreedyDecoderOp only has CPU implementation, test should only run on CPU

* Add a variable to avoid conversion resizing issue

* [fix] fixup the bias multiplier data access issue

Hotfix for failues in conv_transpose

* [D2][Easy]: lint regularizer

lint with black

* [GanH]: Split mu in adaptive weight for diagnose

* [Dper] Add the ability to split FC weights into multiple smaller ones

* fix SumReduceLikeOp for empty blob

as desc.

* add ctc_greedy_decoder for caffe2

ctc_greedy_decoder same as tf's

* Update event callback handling

Allow multiple callbacks per event

* Add WeightedSum layer

The motivation is to do weighted sum in HoNet/crossnet, in the next diff, I'll replace model.Add with model.WeightedSum in
honet: https://fburl.com/f4rmolg2
crossnet: https://fburl.com/v7awn8se, https://fburl.com/63filbnm

* Replicate DAG's behavior

Some callers expect RunAsync to block, replicate that behavior in case of
explicit 'dag' net type

* [dper] layernorm layer

as title

* Override dag, async_dag, async_polling

Overriding dag, async_dag and async_polling with async_scheduling

* Name the thread pools

Caffe thread pools currently inherit the thread names from the thread that starts them, which can be misleading. Give them an explicit name instead.

* [Caffe2] FilleOp should support int64_t dimensions

Change argument type to int64_t for shape argument of FillerOp (used in ConstantFill, XavierFill, etc)

* Remove caffe2/caffe2/contrib/torch/

It's not used anywhere and depends on old lua torch that conflicts with Aten. Given PT1 it's not relevant any more (though it was nice and clever code!)

#accept2ship

* Fix linearWarmup multiplier check

The multiplier needs to be non-negative, not strictly positive.

* Revert D3314316

This is after 2 years and we do not seem to have a use case for this one, so
for the sake of clean API design we should potentially remove this. This would
allow us to potentially pass in arguments to optionally construct an object,
although it is indeed a little bit unclear how we can reuse existing objects if
constructor arguments are passed in. In any case, we may want to remove this
dangling feature.

* Speedup generate proposals by partial_sort.

Speedup generate proposals by partial_sort.

FACEBOOK:
- Saw speed improvement for training with this op.
- Yanghan benchmarked the op on a small dataset and see consistent 100% improvement on speed (6ms -> 3ms) on 420 input resolution. See next diff for details.

* More parallel processing friendly for CPP version of GenerateProposals.

More parallel processing friendly for CPP version of GenerateProposals.

* [DT] [43/n] Lift stop conditions inside reader code back to flow control

1. Split multi_reader function into local_reader and remote_reader
2. Lifted stop conditions inside Limiter back to flow control
3. Split epoch flow building logic into 3 cases:
  - single machine (1 reader, 1 trainer on trainer0 node, no PS)
  - (1 reader + 1 trainer) on trainer0 node, has PS
  - multiple readers, readers do not share nodes with trainers, might have PS or not

* Resolve conflicts for torch/_thnn/utils.py

* [Caffe2] Handle image decoding errors

Image decoding errors can make the whole training fail. This diff is to handle them
1.Catch imdecode exceptions and check if decoded image has zero columns or rows. This is counted as decoding errors.
2.Replace the image with empty in case of error
3.Count the number of errors and throw runtime exception if the rate reaches given number

The empty image data is kept. It might introduce noise in the training data.

* Update MKL exporter to IDEEP ops

TSIA

* [Caffe2] GlobalInit is thread safe, fixing the comment

With the mutex and lock, GlobalInit is thread safe.
Update the comments.

* Back out "Add support for generating ATen files during fbcode build"

Original commit changeset: 28970ddba353

@override-unit-failures
(Note: this ignores all push blocking failures!)

* [DT]: fix predictor save

similar to D6610058, here we add the fix for distributed online training

* Remove net_singlethread_async_gpu.cc

Closes https://github.com/caffe2/caffe2/pull/2528

This removes net_singlethread_async_gpu.cc as part of our effort to clean
CUDAContext and the net executors.

* Inline DFS task execution

Add a DFS inline task execution mode in executor

* Add c10 folder to fbcode

This adds the c10 folder and its test cases to fbcode. Build flags are mostly taken from aten.

* add dependencies for online trainer

Add some dependencies so that the online model can use DataPipeline and PredictionTransform operators

Relevent post: https://fb.intern.facebook.com/groups/1324375037655677/permalink/1740993462660497/

* Resolve conflicts for tools/jit/gen_jit_dispatch.py

* [Fix] sparse regularization in distributed training

* Support advanced pooling options in sum processor

* support advanced pooling options in sum processor
* remove redundant code
* support attention in sum processor

* Improve shard logging in net tracing code

Make it handle arbitrary shard ids instead of just one digit ids.

* [Caffe2] Call GlobalInit in predictor only in mobile

FACEBOOK:
Calling GlobalInit long after the program starts may not be safe. There are issues if the following happens:

User does not call GlobalInit and initFacebook after program starts
User sets a flag manually: https://fburl.com/mcsumw7d
User calls OSS predictor.
OSS predictor calls GlobalInit
GlobalInit calls initFacebook
initFacebook resets all flags: https://fburl.com/tolszha1
Thus, the user manually set flags are overwritten

This would happen anytime GlobalInit is called long after the program starts.
I suppose the intention of the user in this case is not to call GlobalInit throughout the program,
but use Caffe2 regardless (is that desired?)
But adding GlobalInit in the OSS predictor would automatically call GlobalInit when using Caffe2.

This issue doesn't exist in mobile, since initFacebook is not called on mobile.

For now, guard the GlobalInit in predictor for mobile only.
May want to ensure the GlobalInit is always called at the start of the program. @[3501714:kutta] has seen weird issues when not calling GlobalInit at the start of the program on server side. He has made some progress on this.

* resolve conflicts for caffe2/core/logging_is_google_glog.h and test/test_torch.py

* Add empty fix for SumLikeReduceOp

Add empty fix for SumLikeReduceOp

* Revert D7962948: [caffe2][nomnigraph] Concat elim for sparseNN

This reverts commit f7f434dc5c34ca6058b9765d2ef615453d2276a9

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* Remove Declarations.yaml

* Include common.h

* Change std::stoi to caffe2::stoi

* Add thread_name.cc to the CMake file

* No need to subtract 1. Fix test segfaults

* Fix NetTest, ObserverTest

Fix tests

(cherry picked from commit 3767e66c3f365596cba3d46d3e7322c933a0ab41)

* CTCGreedyDecoderOp only has CPU implementation, test should only run on CPU

* Add a variable to avoid conversion resizing issue

* Remove the code per soumith's comments

* Remove the code per soumith's comments

* Remove blank lines in the end of file

* Resolve conflicts for torch/_thnn/utils.py

* Update MKL exporter to IDEEP ops

TSIA

* Back out "Add support for generating ATen files during fbcode build"

Original commit changeset: 28970ddba353

@override-unit-failures
(Note: this ignores all push blocking failures!)

* add dependencies for online trainer

Add some dependencies so that the online model can use DataPipeline and PredictionTransform operators

Relevent post: https://fb.intern.facebook.com/groups/1324375037655677/permalink/1740993462660497/

* Resolve conflicts for tools/jit/gen_jit_dispatch.py

* Support advanced pooling options in sum processor

* support advanced pooling options in sum processor
* remove redundant code
* support attention in sum processor

* resolve conflicts for caffe2/core/logging_is_google_glog.h and test/test_torch.py

* Revert D7962948: [caffe2][nomnigraph] Concat elim for sparseNN

This reverts commit f7f434dc5c34ca6058b9765d2ef615453d2276a9

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* Remove Declarations.yaml

* Include common.h

* Change std::stoi to caffe2::stoi

* [caffe2] uprade IDEEP and hotfix for conv op accuracy issue (#8364)

* [IDEEP] Upgrade IDEEP version

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* [IDEEP] Fix accuracy issue in conv op

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Fix build error due to lack of src in CMakeLists

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Remove the code per soumith's comments

* [ONNX] Add an ATen fallback pathway for ONNX export (#8273)

* ATen fallback for ONNX export

* Move to enum

* Fix model test

* Add comment

* Address comments

BC interface

* Remove imaginary file (#8415)

* [Caffe2] Enable AMD/MIOPEN ops for Caffe2  (#8306)

* Add hip support for caffe2 core

* Add MIOPEN header/wrapper to caffe2 core

* Add HIP device into caffe2 PB

* top level makefile change for rocm/hip

* makefile scaffolding for AMD/RocM/HIP

* Makefile scafodding for AMD/RocM/HIP; add makefile/utility for HIP files

* caffe2 PB update for AMD/ROCM HIP device

* Add AMD/RocM/Thrust dependency

* HIP threadpool update

* Fix makefile macro

* makefile fix: duplicate test/binary name

* makefile clean-up

* makefile clean-up

* add HIP operator registry

* add utilities for hip device

* Add USE_HIP to config summary

* makefile fix for BUILD_TEST

* merge latest

* Fix indentation

* code clean-up

* Guard builds without HIP and use the same cmake script as PyTorch to find HIP

* Setup rocm environment variables in build.sh (ideally should be done in the docker images)

* setup locale

* set HIP_PLATFORM

* Revert "set HIP_PLATFORM"

This reverts commit 8ec58db2b390c9259220c49fa34cd403568300ad.

* continue the build script environment variables mess

* HCC_AMDGPU_TARGET

* Cleanup the mess, has been fixed in the lastest docker images

* Assign protobuf field hip_gpu_id a new field number for backward compatibility

* change name to avoid conflict

* Fix duplicated thread pool flag

* Refactor cmake files to not add hip includes and libs globally

* Fix the wrong usage of environment variables detection in cmake

* Add MIOPEN CNN operators

* Revert "Add MIOPEN CNN operators"

This reverts commit 6e89ad4385b5b8967a7854c4adda52c012cee42a.

* Add MIOPEN pooling operator

* Add MIOPEN activation operator

* Add MIOPEN softmax operator

* Add MIOPEN spatial batch norm operator

* Add MIOPEN loacl response normalization operator

* Add MIOPEN conv operator

* Clean-up LRN ops

* enable fp16 in MIOPEN pool ops

* Enable fp16 for MIOPEN relu op

* Enable fp16 for MIOPEN spatial batch norm op

* code clean-up

* revert float16 support

* Create Caffe2 python binding for AMD/ROCM/HIP

* Add op fallback for HIP operator

* add hip src/test files in cmake

* exclude hip src/test files

* fix python binding for hip backend

* fix MIOPEN pooling op workspace

* hack to compile miopen operators

* fix include path for MIOPEN ops

* Fix include path

* Add HIP math utilities

* Fix path for HIP math utils

* cmake fix

* Cmake fix / hipcc for hip files

* suppress hipcc warning

* cmake fix /replcae USE_HIP with USE_ROCM

* revert LoadHIP.cmake change

* fix include for thrust/cub-hip

* include path fix for conversion.h

* Updated with latest upstream changes

* clang format fixes

* Context_hip updates

* Fixed typo in rocblas handle get function

* Updated hipified math utils

* Updated math hip test util

* Updated context hip test

* Updated common_hip

* Updated net async dag for HIP

* Added MIOPEN in operator hip test

* fix

* C2 dependencies clean-up

* fix include path for building custom protobuf

* Decouple miopen pool op and conv_pool_op base

* cmake refactor

* fix operator_hip_test

* move all hip/miopen ops files into caffe2/operators/hip

* sanitize cmake

* permission issue

* remove extra parenthesis

* remove artifact from resolving merge conflict

* cont. sanitize cmake files

* fix syntax error

* sanitize conversion.h

* .

* Revert "."

This reverts commit 56020cb0e996a31ae27bf1f8f491955ed0b121b9.

* clang-format

* Enable some reduce operators' ONNX backend tests (#8418)

* fix old comment to point to the right file (#8416)

* Stop pinning nccl version. (#8421)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Expose logsumexp docs and mark log_sum_exp in distributions for internal use (#8428)

* Enable some of the ONNX backend test on broadcasting (#8423)

* Enable some of the ONNX backend test on broadcasting

* enable gemm broadcast

* Expose proto utils and ONNX (#8073)

* Expose proto utils and ONNX from PyTorch libcaffe2.so

* Try to use protobuf from _C.so

* Fix ONNX proto header include

* Adjust order of imports for ONNX until nanopb goes away

* Set and use ONNX_NAMESPACE for PyTorch builds

* Show protobuf summary for all builds

* Add ONNX_NAMESPACE for cpp_build

* Statically link libprotobuf.a into libtorch.so

* Set ONNX_NAMESPACE on Windows build

* Move core/dispatch up as well

* Add /MD flag for Windows build of _C

* Potential Windows fix for ONNX and protobuf

* Add direct linkage from _C to ONNX on Windows

* Only include protobuf wrapper for PyTorch

* Pass extra_compile_args to _nvrtc ext build

* Remove installation of .a files

* Rebase creates some weird situations, revert them manually

* Remove more weird changes due to rebase

* Need to add thread_name.cc after merge
2018-06-13 13:10:45 -07:00
Sebastian Meßmer
49f8581745
Update from facebook (#7855)
* [mpscnn] MPSCNNChannelShuffle

att

* [Easy] Adding tags as an argument to the functional layer

Without it "tags" would be added as an argument to the operator.

The change here is based on the assumption that there is no operator that takes "tags" as an argument.

* Fix locally_connected_op schema check.

Fix locally_connected_op schema check.

* [C2] Add TypeAndShape inference for few more operators

As desc

* [c2] Shape inference should support 0 as dimension

Tensors can have 0 in their dimension.

* Make MockHiveReader loop over and support max_examples

Replace DatasetReader with RandomDatasetReader.

So that Mock Hive Reader can simulate a large data input using a small sample file as source.

* Utility function to wipe cache between benchmark runs

Caffe2 benchmark does not wipe out cache between runs, and this potentially creates an unrealistically optimistic picture of performance. This diff adds utility function to wipe out the cache.

* Allow caffe2 GlobalInit to be invoked multiple times

Allow caffe2 GlobalInit to be invoked multiple times. Will re-parse gflags and update logging levels on successive invocations, but will not re-run init functions or perform other one-time initialization.

* Add Caffe2 GlobalInitIsCalledGuard to base net and operator classes

Warn if caffe2's GlobalInit function has not been invoked before creating an operator or net object. This is based on discussion here: https://fb.quip.com/kqGIAbmK7vNG

* Rethrow current exception on failure

Rethrow current exception instead of copy constructing a new one on op failure.

* Make `clone()` return subclass of List/Struct

`clone()` is not working correctly when we subclass those classes

* Wipe the cache before the net run

the util function is copied from D7409424
will rebase once D7409424 is landed.

* [Caffe2] [Mobile] Support utils/cast.h::GetCastDataType with LITE_PROTO builds

* Correct includes

async_polling include -> async_base include

* Prepare execution flags for executor migration

Making async_scheduling aware of underlying net type to prepare for executor
migration

* Add operator level observers into async executor

Adding operator level observers into RunAsync operators' calls

* Cleanup TEST_Benchmark

Remove duplicate code and provide default implementation in NetBase

* [C2] Fix type and shape inference for binary comparison ops

As desc.

* Add GlobalInit to predictor to ensure initialization is always done before prediction

FACEBOOK:

Redo D7651453 the correct way.

Now use a static variable for the arguments passed to GLog

* Remove spammy log message

This method is currently used in various places inside Caffe itself.

* Disable events for operators inside a chain

We don't need to use events in operators within a chain because the chain is
always scheduled on a single stream, keeping only first and last event for
scheduling purposes

* Ensure correct finish run order

In rare cases we might call finishRun and trigger net's destruction while
another worker is still holding shared_ptr to a thread pool, that can cause
thread pool destruction from within a worker thread in case no other nets are
using the pool. This diff fixes the order of calling finishRun and also changes
pool() to return raw pointer to keep pool's ownership within the net

* Reduce unnecessary polling

Make sure we don't waste CPU by polling operators that we can set an efficient
callbacks on

* Squash commit of syncing 9506eeb from github to fbcode

Patch xplat buck fix

add virtual destructor to OptimizationPass

add virtual destructor to OptimizationPass

build fixes for sync

build fixes for sync

* Fix net tracing

Fix net tracing from async_scheduling

* Fix logging
2018-05-29 11:38:02 -07:00
Orion Reblitz-Richardson
1d5780d42c Remove Apache headers from source.
* LICENSE file contains details, so removing from individual source files.
2018-03-27 13:10:18 -07:00
Yangqing Jia
91d76f5dbd Reapply Windows fix
Summary:
Last fix was uncommitted due to a bug in internal build (CAFFE2_API causing error). This one re-applies it as well as a few more, especially enabling gtest.

Earlier commit message: Basically, this should make windows {static_lib, shared_lib} * {static_runtime, shared_runtime} * {cpu, gpu} work other than gpu shared_lib, which willyd kindly pointed out a symbol limit problem. A few highlights:
(1) Updated newest protobuf.
(2) use protoc dllexport command to ensure proper symbol export for windows.
(3) various code updates to make sure that C2 symbols are properly shown
(4) cmake file changes to make build proper
(5) option to choose static runtime and shared runtime similar to protobuf
(6) revert to visual studio 2015 as current cuda and msvc 2017 do not play well together.
(7) enabled gtest and fixed testing bugs.

Earlier PR is #1793

Closes https://github.com/caffe2/caffe2/pull/1827

Differential Revision: D6832086

Pulled By: Yangqing

fbshipit-source-id: 85f86e9a992ee5c53c70b484b761c9d6aed721df
2018-01-29 10:03:28 -08:00
Vladimir Chalyshev
8c02674964 Revert D6817719: [caffe2][PR] Better support for windows
Summary:
This reverts commit d286264fccc72bf90a2fcd7da533ecca23ce557e

bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
cause_a_sev_many_files

Differential Revision: D6817719

fbshipit-source-id: 8fe0ad7aba75caaa4c3cac5e0a804ab957a1b836
2018-01-26 06:08:49 -08:00
Yangqing Jia
8aa8eaabb1 Better support for windows
Summary:
Basically, this should make windows {static_lib, shared_lib} * {static_runtime, shared_runtime} * {cpu, gpu} work. A few highlights:

(1) Updated newest protobuf.
(2) use protoc dllexport command to ensure proper symbol export.
(3) various code updates to make sure that C2 symbols are properly shown
(4) cmake file changes to make build proper
(5) option to choose static runtime and shared runtime similar to protobuf
(6) revert to visual studio 2015 as current cuda and msvc 2017 do not play well together.
Closes https://github.com/caffe2/caffe2/pull/1793

Reviewed By: dzhulgakov

Differential Revision: D6817719

Pulled By: Yangqing

fbshipit-source-id: d286264fccc72bf90a2fcd7da533ecca23ce557e
2018-01-26 00:48:43 -08:00
Ilia Cherniavskii
38f166c13a Async executor with less polling
Summary:
Async executor based on async_polling (D5985110):
- Tasks scheduling other tasks, using polling only when necessary (e.g.
  CUDA->CPU case)
- Fully async, i.e. RunAsync immediately returns

Reviewed By: azzolini

Differential Revision: D6281681

fbshipit-source-id: 06e3723e1424ffab652c38ca7b279cf76e43fa44
2017-11-28 18:50:32 -08:00
Ilia Cherniavskii
1149b9bbb5 Polling async net executor
Summary:
Implementation of polling async net executor.
Notes:
- New net executor async_polling - schedules CPU and GPU ops asynchronously, uses single polling thread
- Events: update to Caffe2 events to support async CPU events, adding new methods:
 Query() - non-blocking checking of event states: INITIALIZED -> RECORDED -> SUCCESS/FAILED
 ErrorMessage() - when operation runs asynchronously and fails calling this on event will give error message
- Tasks: using existing DAGNet's algorithm to compute CPU and GPU chains, a separate task for each chain
- Polling: using single thread to query state of events - for CPU tasks atomically queries task state, for GPU task - uses cudaEventQuery; using Event
- Scheduling of CPU ops: using global thread pools
- Scheduling of GPU ops: using GPU thread pool per GPU device

Reviewed By: dzhulgakov

Differential Revision: D5985110

fbshipit-source-id: a9de7fcbb71d046a3aa1b573072b89a65dfeee8c
2017-11-03 07:27:44 -07:00
Yangqing Jia
8286ce1e3a Re-license to Apache
Summary: Closes https://github.com/caffe2/caffe2/pull/1260

Differential Revision: D5906739

Pulled By: Yangqing

fbshipit-source-id: e482ba9ba60b5337d9165f28f7ec68d4518a0902
2017-09-28 16:22:00 -07:00
Yangqing Jia
a5879ea9bd Resolve Windows warning C4099 issue (class/struct name mixture)
Summary:
TSIA - no functionality change introduced. See current build (e.g. https://ci.appveyor.com/project/Yangqing/caffe2/build/2148/job/mj9auhnernrgdfpe) for the warning messages produced right now.
Closes https://github.com/caffe2/caffe2/pull/1255

Differential Revision: D5899177

Pulled By: Yangqing

fbshipit-source-id: 3f41c82b0d5a1caba63d8cc7101582af63fbc99f
2017-09-23 23:04:56 -07:00
Yangqing Jia
0b363fd9de Add event as a first-class citizen of the OperatorBase interface.
Summary:
This adds Event as a new member object to OperatorBase, hence allowing us to do
async computation more easily. Will send a fix for proper RunAsync() for
SimpleNet.

In principle this should have no functionality change yet - the only difference
is that async_dag net now delegates to the operators for holding the event
objects.

Reviewed By: harouwu

Differential Revision: D5668627

fbshipit-source-id: 55f994074be6b85d6c66f09795dcbe2b93aba300
2017-08-21 13:30:53 -07:00
Yangqing Jia
5d24a4eeef Early design for a general Event abstraction cross-devices.
Summary:
There are ad-hoc efforts on avoiding excessive device synchronizations, such as
async_dag, singlethread_async, etc. This diff aims to provide an early design
for a general Event class, that can achieve the following:

(1) It is device agnostic, essentially using a vtable to do cross device record,
wait and synchronization.
(2) Created new functions WaitEvent and Record in the Context class for
interacting with Events.
(3) Exposed the corresponding WaitEvent and Record functions in the OperatorBase
class as well.

An example use case is that, after potential future refactoring, one can achieve
a real async execution per operator by running

op.WaitEvent(previous_event);
op.RunAsync();
op.RecordEvent(this_op_event);

and the next op can do

next_op.WaitEvent(this_op_event);

Right now, I changed async_dag net implementation so that it uses the general
event design. The old Event class is assimilated to the general Event class and
the old Stream class is now essentially taken over by the Context class itself.

Reviewed By: harouwu

Differential Revision: D5648463

fbshipit-source-id: 58bd84d06e4a9977b0b835110ddb2f18be3b7cbc
2017-08-18 15:46:51 -07:00