Summary:
Implement some simple fixes to clean up the Windows build by fixing compiler warnings. Three main types of warnings were fixed:
1. GCC-specific pragmas are no longer used on Windows.
2. CMake flags that don't exist on Windows were removed from the Windows build.
3. A macro that was defined multiple times on Windows was fixed.
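The first kind of fix can be sketched roughly as follows. This is an illustration only, not the code from the pull request: the macro names `SKETCH_PUSH_IGNORE_UNUSED_FUNCTION` and `SKETCH_POP_WARNINGS` are hypothetical, and it assumes the pragmas in question are `#pragma GCC diagnostic` directives, which MSVC reports as unknown pragmas.

```cpp
// Hypothetical macros: emit GCC/Clang diagnostic pragmas only where they are understood.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_PUSH_IGNORE_UNUSED_FUNCTION \
  _Pragma("GCC diagnostic push")           \
  _Pragma("GCC diagnostic ignored \"-Wunused-function\"")
#define SKETCH_POP_WARNINGS _Pragma("GCC diagnostic pop")
#else
// Other compilers (for example MSVC) would warn about the unknown pragma,
// so the macros expand to nothing there.
#define SKETCH_PUSH_IGNORE_UNUSED_FUNCTION
#define SKETCH_POP_WARNINGS
#endif

SKETCH_PUSH_IGNORE_UNUSED_FUNCTION
static constexpr int build_specific_helper() { return 42; }
SKETCH_POP_WARNINGS
```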
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14490
Differential Revision: D13241988
Pulled By: ezyang
fbshipit-source-id: 38da8354f0e3a3b9c97e33309cdda9fd23c08247
Summary:
Hi guys,
I'd like to build Caffe2 with more supported options on Windows with Microsoft Visual Studio.
This is the first pull request.
Running scripts/build_windows_shared.bat is able to build Caffe2 with both CMAKE_BUILD_TYPE=Debug and CMAKE_BUILD_TYPE=Release with Visual Studio 14 2015.
CUDA is 9.0, cudnn is 7.0.5, glog, gflags and lmdb are supported on my system.
Python is 3.5, Detectron works from python interface as well.
It was even possible to debug detectron code and step into caffe2_gpu.dll with pdbs built.
Disappointingly, the c10/experimental ops don't build with this Visual Studio generator, so I added a special option INCLUDE_EXPERIMENTAL_C10_OPS (default ON) to deal with it in build_windows_shared.bat.
After this pull request the next step is to add Visual Studio 2017 support in the script.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13550
Reviewed By: ezyang
Differential Revision: D13042597
Pulled By: orionr
fbshipit-source-id: f313f909f599cd582a1d000eff766eef3a9fc4fc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12862
This is a redo of the previous move in a way that doesn't migrate the namespace -- also will check for the windows cudnn build failure
Reviewed By: Yangqing
Differential Revision: D10459665
fbshipit-source-id: 563dec9987aa979702e6d71072ee2f4b2d969d69
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12950
For backwards compatibility, we want the c10 symbols to be reachable from caffe2 and aten.
When we move classes from at/caffe2 to c10, this
1. allows keeping backwards compatibility with third-party code we can't control
2. allows splitting diffs that move such classes into two diffs, where one only fixes the includes and the second one fixes the namespaces.
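A minimal sketch of the idea, assuming the reachability is achieved with a using-directive inside the old namespace. The class `Widget` is hypothetical and only stands in for something that moved to c10.

```cpp
#include <type_traits>

namespace c10 {
// Hypothetical class standing in for something that moved to c10.
struct Widget {
  int value = 0;
};
} // namespace c10

namespace caffe2 {
// Makes every c10 name reachable through qualified lookup in caffe2.
using namespace c10;
} // namespace caffe2

// Code that still spells the old namespace keeps compiling and gets the same type.
static_assert(std::is_same<caffe2::Widget, c10::Widget>::value,
              "caffe2::Widget should name the c10 type");
```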
Reviewed By: ezyang
Differential Revision: D10496244
fbshipit-source-id: 914818688fad8c079889dfdc6242bc228b539f0e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12714
This is a short change to enable the c10 namespace in caffe2. We did not enable
it before due to gflags global variable confusion, but that should have been
mostly cleaned up by now. Right now, the plan on record is that namespace caffe2 and
namespace aten will fully be supersets of namespace c10.
Most of the diff is codemod; the only two non-codemod changes are in caffe2/core/common.h, where
```
using namespace c10;
```
is added, and in Flags.h, where instead of creating aliasing variables in the c10 namespace, we put them directly in the global namespace to match gflags (with the same behavior if gflags is not being built with).
Reviewed By: dzhulgakov
Differential Revision: D10390486
fbshipit-source-id: 5e2df730e28e29a052f513bddc558d9f78a23b9b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12466
Moves type.{h,cpp} and functional.h to ATen/core
The move is necessary for IR merging -- slimmed down from this diff: D9819906
Reviewed By: ezyang
Differential Revision: D10242680
fbshipit-source-id: b71eeec98dfe9496e751a91838d538970ff05b25
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12408
Using static_cast is better than reinterpret_cast because it will cause a compile time error in the following cases, while reinterpret_cast would run into undefined behavior and likely segfault:
- Src and Dst are not related through inheritance (say converting int* to double*)
- Src and Dst are related through virtual inheritance
This `dynamic_cast_if_rtti` is still unsafe because `dynamic_cast` and `static_cast` behave differently if the runtime type is not what you expected (i.e. dynamic_cast returns nullptr or throws whereas static_cast has undefined behavior), but it's much safer than doing reinterpret_cast.
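A rough sketch of what such a helper could look like. The switch `SKETCH_NO_RTTI` and the example types are assumptions for illustration; the actual helper presumably detects RTTI from compiler-provided macros.

```cpp
#include <cassert>

// Hypothetical switch for this sketch: define SKETCH_NO_RTTI to mimic a build without RTTI.
template <typename Dst, typename Src>
inline Dst dynamic_cast_if_rtti(Src ptr) {
#ifndef SKETCH_NO_RTTI
  return dynamic_cast<Dst>(ptr);  // checked: nullptr if the runtime type is wrong
#else
  return static_cast<Dst>(ptr);   // unchecked, but unrelated types fail to compile
#endif
}

struct Base {
  virtual ~Base() = default;
};
struct Derived : Base {
  int payload = 7;
};
struct Other : Base {};

inline int read_payload(Base* base) {
  Derived* derived = dynamic_cast_if_rtti<Derived*>(base);
  assert(derived != nullptr);  // can only fire when RTTI is available
  return derived->payload;
}
```

With either branch, `dynamic_cast_if_rtti<double*>(some_int_ptr)` is rejected at compile time, which is the improvement over `reinterpret_cast` described above.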
Reviewed By: Yangqing
Differential Revision: D10227820
fbshipit-source-id: 530bebe9fe1ff88646f435096d7314b65622f31a
Summary:
This does 7 things:
- add c10/util/Registry.h as the unified registry util
- cleaned up some APIs such as export condition
- fully remove aten/core/registry.h
- fully remove caffe2/core/registry.h
- remove a bogus aten/registry.h
- unifying all macros
- set up registry testing in c10
Also, an important note: we used to mark the templated Registry class as EXPORT. This should not happen, because one should almost never export a template class. This PR fixes that.
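To illustrate the export point, here is a toy registry, assuming a design in which the class template stays unannotated and only an ordinary accessor function carries the export macro. All names (`SKETCH_API`, `Registry`, `ShapeRegistry`, `Shape`) are hypothetical and do not reflect the real c10/util/Registry.h interface.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical export macro; expands to nothing in this sketch.
#define SKETCH_API

// The class template carries no export annotation: each translation unit
// instantiates what it needs.
template <class Key, class Obj>
class Registry {
 public:
  using Creator = std::function<std::unique_ptr<Obj>()>;

  void Register(const Key& key, Creator creator) {
    assert(creator != nullptr);  // an empty std::function cannot create anything
    creators_[key] = std::move(creator);
  }

  bool Has(const Key& key) const { return creators_.count(key) != 0; }

  std::unique_ptr<Obj> Create(const Key& key) const {
    auto it = creators_.find(key);
    if (it == creators_.end()) {
      return nullptr;
    }
    return it->second();
  }

 private:
  std::map<Key, Creator> creators_;
};

struct Shape {
  virtual ~Shape() = default;
  virtual int sides() const = 0;
};
struct Triangle : Shape {
  int sides() const override { return 3; }
};

// Only this ordinary accessor would be annotated for export.
SKETCH_API inline Registry<std::string, Shape>& ShapeRegistry() {
  static Registry<std::string, Shape> registry;
  return registry;
}
```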
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12077
Reviewed By: ezyang
Differential Revision: D10050771
Pulled By: Yangqing
fbshipit-source-id: 417b249b49fed6a67956e7c6b6d22374bcee24cf
Summary:
Some more `ATEN_API` additions for hidden visibility.
Running CI tests to see what fails to link.
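For context, an API macro of this kind is commonly defined along the following lines. This is a generic sketch with hypothetical names (`SKETCH_API`, `SKETCH_BUILD_SHARED_LIB`, `SKETCH_USE_SHARED_LIB`); the real `ATEN_API` definition may differ.

```cpp
#include <cassert>

// Hypothetical macro names; the real ATEN_API definition may differ.
#if defined(_WIN32)
#if defined(SKETCH_BUILD_SHARED_LIB)
#define SKETCH_API __declspec(dllexport)  // building the DLL
#elif defined(SKETCH_USE_SHARED_LIB)
#define SKETCH_API __declspec(dllimport)  // consuming the DLL
#else
#define SKETCH_API  // static build
#endif
#elif defined(__GNUC__)
// When compiling with -fvisibility=hidden, only annotated symbols stay visible.
#define SKETCH_API __attribute__((visibility("default")))
#else
#define SKETCH_API
#endif

// Declaration as it would appear in a public header.
SKETCH_API int sketch_add(int a, int b);

// Definition as it would appear in the library source.
int sketch_add(int a, int b) { return a + b; }
```

A symbol that lacks the annotation links fine inside the library but fails to resolve from another shared object, which is the kind of failure the CI run is meant to surface.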
cc Yangqing mingzhe09088 ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10624
Reviewed By: mingzhe09088
Differential Revision: D9392728
Pulled By: orionr
fbshipit-source-id: e0f0861496b12c9a4e40c10b6e0c9e0df18e8726
Summary:
Properly annotated all apis for cpu front. Checked with cmake using
cmake -DUSE_ATEN=ON -DUSE_CUDA=OFF -DBUILD_ATEN=ON
and resulting libcaffe2.so has about 11k symbols.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10504
Reviewed By: ezyang
Differential Revision: D9316491
Pulled By: Yangqing
fbshipit-source-id: 215659abf350af7032e9a4b0f28a856babab2454
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10274
Good C++ libraries don't take up un-namespaced identifiers
like DISABLE_COPY_AND_ASSIGN. Re-prefix this.
Follow up fix: codemod Caffe2 to use the new macro, delete
the forwarding definition
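As an illustration of the re-prefixing, a library-scoped version of such a macro might look like this. The prefix `SKETCH_` and the class `Session` are hypothetical; the sketch does not claim to show the name chosen in this diff.

```cpp
#include <type_traits>

// Hypothetical prefix: the macro name is scoped to the library instead of
// claiming the generic identifier DISABLE_COPY_AND_ASSIGN.
#define SKETCH_DISABLE_COPY_AND_ASSIGN(classname) \
  classname(const classname&) = delete;           \
  classname& operator=(const classname&) = delete

class Session {
 public:
  Session() = default;
  SKETCH_DISABLE_COPY_AND_ASSIGN(Session);
};

static_assert(!std::is_copy_constructible<Session>::value, "copy constructor is deleted");
static_assert(!std::is_copy_assignable<Session>::value, "copy assignment is deleted");
```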
Reviewed By: mingzhe09088
Differential Revision: D9181939
fbshipit-source-id: 857d099de1c2c0c4d0c1768c1ab772d59e28977c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10264
Since we now have DISABLE_COPY_AND_ASSIGN macro in the file,
CoreAPI is no longer an accurate name.
Reviewed By: dzhulgakov
Differential Revision: D9181687
fbshipit-source-id: a9cc5556be9c43e6aaa22671f755010707caef67
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10263
Auxiliary changes that were needed:
- Add DISABLE_COPY_AND_ASSIGN to CoreAPI.h (maybe we should rename this file
now)
Reviewed By: dzhulgakov
Differential Revision: D9181321
fbshipit-source-id: 975687068285b5a94a57934817c960aeea2bbafa
* fix a bug for SkipIndices
* IDEEP bug, revise the output to CPUTensor in SkipOutputCopy strategy
* [IDEEP] Add IDEEP fallbacks for Style-Transfer ops
* Add hip support for caffe2 core
* Add MIOPEN header/wrapper to caffe2 core
* Add HIP device into caffe2 PB
* top level makefile change for rocm/hip
* makefile scaffolding for AMD/RocM/HIP
* Makefile scaffolding for AMD/RocM/HIP; add makefile/utility for HIP files
* caffe2 PB update for AMD/ROCM HIP device
* Add AMD/RocM/Thrust dependency
* HIP threadpool update
* Fix makefile macro
* makefile fix: duplicate test/binary name
* makefile clean-up
* makefile clean-up
* add HIP operator registry
* add utilities for hip device
* Add USE_HIP to config summary
* makefile fix for BUILD_TEST
* merge latest
* Fix indentation
* code clean-up
* Guard builds without HIP and use the same cmake script as PyTorch to find HIP
* Setup rocm environment variables in build.sh (ideally should be done in the docker images)
* setup locale
* set HIP_PLATFORM
* Revert "set HIP_PLATFORM"
This reverts commit 8ec58db2b390c9259220c49fa34cd403568300ad.
* continue the build script environment variables mess
* HCC_AMDGPU_TARGET
* Cleanup the mess, has been fixed in the latest docker images
* Assign protobuf field hip_gpu_id a new field number for backward compatibility
* change name to avoid conflict
* Fix duplicated thread pool flag
* Refactor cmake files to not add hip includes and libs globally
* Fix the wrong usage of environment variables detection in cmake
* Add MIOPEN CNN operators
* Revert "Add MIOPEN CNN operators"
This reverts commit 6e89ad4385b5b8967a7854c4adda52c012cee42a.
* Resolve merge conflicts
* .
* Update GetAsyncNetHIPThreadPool
* Enable BUILD_CAFFE2 in pytorch build
* Unify USE_HIP and USE_ROCM
* always check USE_ROCM
* .
* remove unrelated change
* move all core hip files to separate subdirectory
* .
* .
* recurse glob core directory
* .
* correct include
* .
* Import/export observer symbols for DLL, which fixes the linking error in Visual Studio.
* Add support of all default cmake build types for release to cuda.
* [GanH][Easy]: Add assertion to adaptive weighting layer
0 weight causes numeric instability and exploding ne
* [Easy] Add cast op before computing norm in diagnose options
As LpNorm only takes floats we add a manual casting here.
* Introduce a new caching device allocator
`cudaMalloc` and `cudaFree` calls are slow, and become slower the
more GPUs there are. Essentially, they grab a host-wide (not device-wide) lock
because GPU memory is transparently shared across all GPUs. Normally, this
isn't much of a concern since workloads allocate memory upfront, and reuse it
during later computation.
However, under some computation models (specifically, memory conserving
approaches like checkpoint-and-recompute, see
https://medium.com/@yaroslavvb/fitting-larger-networks-into-memory-583e3c758ff9)
this assumption is no longer true. In these situations, `cudaMalloc` and
`cudaFree` are common and frequent. Furthermore, in data parallel contexts,
these calls happen at nearly the same time from all GPUs, worsening lock
contention.
A common solution to this problem is to add a custom allocator. In fact,
NVIDIA provides one out of the box: CUB, which Caffe2 already supports.
Unfortunately, the CUB allocator suffers from very high fragmentation. This is
primarily because it is a "buddy" allocator which neither splits nor merges
free cached blocks. Study
https://github.com/NVlabs/cub/blob/1.8.0/cub/util_allocator.cuh#L357 if you
want to convince yourself.
This diff adapts a caching allocator from the Torch codebase
https://github.com/torch/cutorch/blob/master/lib/THC/THCCachingAllocator.cpp
which does splitting and merging and ends up working really well, at least for
workloads like the checkpoint-and-recompute computation models noted above.
I simplified the implementation a little bit, made it a bit more C++-like. I
also removed a bunch of stream synchronization primitives for this diff. I
plan to add them back in subsequent diffs.
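The splitting and merging behaviour can be illustrated with a small host-side simulation. This is a sketch under simplifying assumptions, not the allocator from this diff: the class `SplitMergePool` is hypothetical, it manages a single segment, plain offsets stand in for device pointers, and there is no stream handling or real `cudaMalloc` call.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>

// Host-side simulation of one cached segment; offsets stand in for device pointers.
class SplitMergePool {
 public:
  static constexpr std::size_t npos = static_cast<std::size_t>(-1);

  explicit SplitMergePool(std::size_t capacity) {
    blocks_[0] = Block{capacity, true};
  }

  // Returns the offset of a block of `size` bytes, or npos if nothing fits.
  std::size_t allocate(std::size_t size) {
    if (size == 0) return npos;
    for (auto it = blocks_.begin(); it != blocks_.end(); ++it) {
      Block& block = it->second;
      if (!block.free || block.size < size) continue;
      if (block.size > size) {
        // Split: the tail stays in the pool as a smaller free block.
        blocks_[it->first + size] = Block{block.size - size, true};
        block.size = size;
      }
      block.free = false;
      return it->first;
    }
    return npos;
  }

  void release(std::size_t offset) {
    auto it = blocks_.find(offset);
    assert(it != blocks_.end() && !it->second.free);
    it->second.free = true;
    // Merge with the following block if it is free.
    auto next = std::next(it);
    if (next != blocks_.end() && next->second.free) {
      it->second.size += next->second.size;
      blocks_.erase(next);
    }
    // Merge with the preceding block if it is free.
    if (it != blocks_.begin()) {
      auto prev = std::prev(it);
      if (prev->second.free) {
        prev->second.size += it->second.size;
        blocks_.erase(it);
      }
    }
  }

  std::size_t block_count() const { return blocks_.size(); }

 private:
  struct Block {
    std::size_t size;
    bool free;
  };
  std::map<std::size_t, Block> blocks_;  // keyed by start offset
};
```

Because freed neighbours are coalesced, a later large request can reuse space that was previously handed out as several small blocks, which is what keeps fragmentation down in this scheme.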
* Report reader progress in fblearner workflows
Integrate with fblearner progress reporting API and add support to report training progress from reader nodes.
If the reader is constructed with batch limits, report based on finished batches vs. total batches. The finished batch count may exceed the total because we evaluate whether we should stop processing every time we dequeue a split.
If the reader has no limit, report based on finished splits (Hive files) vs. total splits. This is fairly accurate.
* [GanH][Diagnose]: fix plotting
1. ganh diagnose needs to set plot options
2. the modifier's blob name is used for the metric field and needs to be fixed before
generating the net
* Automatic update of fbcode/onnx to 985af3f5a0f7e7d29bc0ee6b13047e7ead9c90c8
* Make CompositeReader stop as soon as one reader finishes
Previously, CompositeReader called all readers before stopping. This resulted in a flaky test, since the last batch may be read by different threads, resulting in dropped data.
* [dper] make sure loss is not nan
as desc.
* [rosetta2] [mobile-vision] Option to export NHWC order for RoIWarp/RoIAlign
Thanks for finding this @stzpz and @wangyanghan. Looks like NHWC is more
optimized. For OCR though it doesn't yet help since NHWC uses more mem b/w but
will soon become important.
* Intra-op parallel FC operator
Intra-op parallel FC operator
* [C2 Proto] extra info in device option
passing extra information in device option
design doc: https://fb.quip.com/yAiuAXkRXZGx
* Unregister MKL fallbacks for NCHW conversions
* Tracing for more executors
Modified Tracer to work with other executors and add more tracing
* Remove ShiftActivationDevices()
* Check for blob entry iff it is present
When processing the placeholders ops, ignore if the blob is not present in the blob_to_device.
* Internalize use of eigen tensor
Move use of eigen tensor out of the header file so we don't get template partial specialization errors when building other libraries.
* feature importance for transformed features.
* - Fix unused parameter warnings
The changes in this diff comment out unused parameters.
This will allow us to enable -Wunused-parameter as error.
#accept2ship
* add opencv dependencies to caffe2
The video input op requires additional opencv packages. This is to add them to
cmake so that it can build
* Add clip_by_value option in gradient clipping
Add clip_by_value option in gradient clipping
when the value is bigger than max or smaller than min, do the clip
* std::round compat
* Update ReduceMean
* Add reduce mean to math
* Update cuda flag
* Update Eigen::Tensor ctor
* Remove unused variables
* Skip ReduceTensorGPUTest if no gpus
* Add NOMINMAX for windows
* Fix lpnorm_op in windows
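For readers unfamiliar with the NOMINMAX item above: `<windows.h>` defines `min` and `max` as macros unless `NOMINMAX` is defined first, which breaks calls such as `std::min`. A minimal sketch follows; the function `smaller` is hypothetical, and defining the macro in source rather than through a compiler flag is an assumption made for the example.

```cpp
#include <cassert>

// Must come before <windows.h>, which otherwise defines min and max as macros.
#if defined(_WIN32) && !defined(NOMINMAX)
#define NOMINMAX
#endif
#ifdef _WIN32
#include <windows.h>
#endif

#include <algorithm>

inline int smaller(int a, int b) {
  // Without NOMINMAX, the `min` macro would mangle this call on Windows.
  return std::min(a, b);
}
```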
Summary:
Last fix was uncommitted due to a bug in internal build (CAFFE2_API causing error). This one re-applies it as well as a few more, especially enabling gtest.
Earlier commit message: Basically, this should make windows {static_lib, shared_lib} * {static_runtime, shared_runtime} * {cpu, gpu} work other than gpu shared_lib, for which willyd kindly pointed out a symbol limit problem. A few highlights:
(1) Updated newest protobuf.
(2) use protoc dllexport command to ensure proper symbol export for windows.
(3) various code updates to make sure that C2 symbols are properly shown
(4) cmake file changes to make build proper
(5) option to choose static runtime and shared runtime similar to protobuf
(6) revert to visual studio 2015 as current cuda and msvc 2017 do not play well together.
(7) enabled gtest and fixed testing bugs.
Earlier PR is #1793
Closes https://github.com/caffe2/caffe2/pull/1827
Differential Revision: D6832086
Pulled By: Yangqing
fbshipit-source-id: 85f86e9a992ee5c53c70b484b761c9d6aed721df
Summary:
This reverts commit d286264fccc72bf90a2fcd7da533ecca23ce557e
bypass-lint
An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
cause_a_sev_many_files
Differential Revision: D6817719
fbshipit-source-id: 8fe0ad7aba75caaa4c3cac5e0a804ab957a1b836
Summary:
Basically, this should make windows {static_lib, shared_lib} * {static_runtime, shared_runtime} * {cpu, gpu} work. A few highlights:
(1) Updated newest protobuf.
(2) use protoc dllexport command to ensure proper symbol export.
(3) various code updates to make sure that C2 symbols are properly shown
(4) cmake file changes to make build proper
(5) option to choose static runtime and shared runtime similar to protobuf
(6) revert to visual studio 2015 as current cuda and msvc 2017 do not play well together.
Closes https://github.com/caffe2/caffe2/pull/1793
Reviewed By: dzhulgakov
Differential Revision: D6817719
Pulled By: Yangqing
fbshipit-source-id: d286264fccc72bf90a2fcd7da533ecca23ce557e
Summary:
This is in order for Android to pass - Android support for string related functions is quite limited.
Closes https://github.com/caffe2/caffe2/pull/1571
Reviewed By: pietern
Differential Revision: D6486079
Pulled By: Yangqing
fbshipit-source-id: f0961e2dde6202bd6506f4fb8a3aea4af1670cb5
Summary:
Useful for figuring out which version people built with. We can just ask for the --caffe2_version gflag or get core.build_options from python.
Also adds CMAKE_INSTALL_RPATH_USE_LINK_PATH - without it, the build was failing on my Mac. How should it be tested?
Closes https://github.com/caffe2/caffe2/pull/1271
Reviewed By: bddppq
Differential Revision: D5940750
Pulled By: dzhulgakov
fbshipit-source-id: 45b4c94f67e79346a10a65b34f40fd258295dad1
Summary:
This brings proper versioning in Caffe2: instead of manual version macros, this puts the version information in CMake (replacing the TODO bwasti line) and uses macros.h.in to then generate the version in the C++ header.
A few misc updates:
- Removed the mac os rpath, verified on local macbook that it is no longer needed.
- Misc updates for caffe2 ready:
- Mapped cmake/Cuda.cmake with gloo's setting.
- upstreamed third_party/nccl so it builds with cuda 9.
- Separated the Caffe2 cpu dependencies and cuda dependencies
- now libCaffe2_CPU.so does not depend on any cuda libs.
- caffe2 python extensions now depend on cpu and gpu separately too.
- Reduced the number of unused functions in Utils.cmake
Closes https://github.com/caffe2/caffe2/pull/1256
Reviewed By: dzhulgakov
Differential Revision: D5899210
Pulled By: Yangqing
fbshipit-source-id: 36366e47366c3258374d646cf410b5f49f95767b
Summary:
(1) BlobsQueue is causing a gcc error (a Google search suggested it was a
bug, but we'll put the implementation in a separate cc file).
(2) Preparing for cuda 9: update cub.
(3) Prepare for cudnn 7: update cudnn rnn op.
(4) Fix an MSVC issue
Reviewed By: sf-wind, jerryzh168
Differential Revision: D5574352
fbshipit-source-id: 230820ce3ceaa32bee8323bdc509de352c93fcf2
Summary:
__attribute__((unused)) is not supported on Windows, so we actually need to
substitute it with a macro.
Also changed UNUSED_VARIABLE to CAFFE2_UNUSED because we also use it to mark
functions now.
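A sketch of how such a macro could be defined. The definition below is an assumption for illustration and may not match the one in this diff; `debug_counter` and `debug_helper` are hypothetical.

```cpp
#include <cassert>

// Assumed definition for illustration; the actual macro in Caffe2 may differ.
#if defined(_MSC_VER)
#define CAFFE2_UNUSED
#else
#define CAFFE2_UNUSED __attribute__((__unused__))
#endif

// The same macro marks variables and functions alike.
CAFFE2_UNUSED static int debug_counter = 0;
CAFFE2_UNUSED static int debug_helper() { return debug_counter + 1; }
```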
Reviewed By: ajtulloch
Differential Revision: D5497063
fbshipit-source-id: bcda026e626c41f71c21c36f029a3f871eaea7d4
Summary: This uses `clang-tidy` to comment out unused parameters (in functions, methods and lambdas) in fbcode. Cases that the tool failed to handle are fixed manually.
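The shape of the rewrite looks roughly like this. The functions are hypothetical examples, not code touched by this diff.

```cpp
#include <cassert>

// Before: `int scale(int value, int flags)` warns under -Wunused-parameter.
// After: the name is commented out, so the signature and call sites are unchanged.
inline int scale(int value, int /*flags*/) {
  return value * 2;
}

struct Visitor {
  virtual ~Visitor() = default;
  // Overrides often ignore arguments that the base interface requires.
  virtual int visit(int node, const char* /*label*/) { return node; }
};
```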
Reviewed By: igorsugak
Differential Revision: D5454343
fbshipit-source-id: 5dee339b4334e25e963891b519a5aa81fbf627b2
Summary:
(Also, exposed the macros that we use during build time via the macros.h header file)
Closes https://github.com/caffe2/caffe2/pull/233
Differential Revision: D4803311
Pulled By: Yangqing
fbshipit-source-id: 9f8ce57692f81f7a8994344846d3c90aa2c7070a