Summary:
This is somewhat more verbose, but it's more correct and addresses this warning on Visual Studio 2017:
```
xplat\caffe2\caffe2\core\common.h(76): warning C4067: unexpected tokens following preprocessor directive - expected a newline
```
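For context, a minimal illustration of what triggers C4067 (not the actual common.h code): cl.exe warns when extra tokens follow a directive that takes a single identifier, and the more verbose `#if defined(...)` spelling avoids it:
```
// Illustrative only -- not the code from common.h.
// #ifdef takes exactly one identifier, so this draws C4067 from cl.exe:
//   #ifdef FOO && BAR
// The more verbose but correct form uses #if with defined():
#if defined(FOO) && defined(BAR)
// ... platform-and-feature-specific code ...
#endif
```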
Test Plan: Built locally with fix
Reviewed By: simpkins
Differential Revision: D28868632
fbshipit-source-id: f6a583e8275162adedb2a4bc5ed0f64847020871
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53754
Some of the PyTorch CircleCI builds still use gcc 5.4 and compile with
`-Werror=attributes`, causing this old compiler to fail because it does not
understand the `[[nodiscard]]` attribute.
Let's define a `CAFFE2_NODISCARD` macro to work around this.
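A minimal sketch of such a macro (assumed shape, not necessarily the exact upstream definition): expand to `[[nodiscard]]` only when the compiler claims to understand it, so gcc 5.4 with `-Werror=attributes` stays quiet:
```
#if defined(__has_cpp_attribute)
#if __has_cpp_attribute(nodiscard)
#define CAFFE2_NODISCARD [[nodiscard]]
#endif
#endif
#ifndef CAFFE2_NODISCARD
#define CAFFE2_NODISCARD  // old compilers: expand to nothing
#endif

CAFFE2_NODISCARD int compute();  // callers should not drop the result
```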
ghstack-source-id: 123594084
Test Plan: I'm using this macro in subsequent diffs in the stack.
Reviewed By: mraway
Differential Revision: D26959584
fbshipit-source-id: c7ba94f7ea944b6340e9fe20949ba41931e11d41
Summary:
Since caffe2 and torch have been consolidated, CAFFE2_API should be merged with TORCH_API. Addresses a TODO.
Manually edited some references of the removed `CAFFE2_API`:
* `CONTRIBUTING.md`
* `caffe2/proto/CMakeLists.txt`
* `cmake/ProtoBuf.cmake`
* `c10/macros/Export.h`
* `torch/csrc/WindowsTorchApiMacro.h`
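For reference, a simplified sketch of the export-macro pattern involved (see c10/macros/Export.h for the real definitions; the build flag shown here is illustrative):
```
#ifdef _WIN32
#define C10_EXPORT __declspec(dllexport)
#define C10_IMPORT __declspec(dllimport)
#else
#define C10_EXPORT __attribute__((__visibility__("default")))
#define C10_IMPORT
#endif

// Export while building the library itself, import from consumers.
#ifdef BUILDING_TORCH_LIB  // illustrative flag name
#define TORCH_API C10_EXPORT
#else
#define TORCH_API C10_IMPORT
#endif

TORCH_API void some_public_function();
```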
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49496
Reviewed By: malfet, samestep
Differential Revision: D25600726
Pulled By: janeyx99
fbshipit-source-id: 7e068d959e397ac183c097d7e9a9afeca5ddd782
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33959
Make sure clang on Windows uses the correct attributes.
Add support for cl.exe-style pragma attributes.
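A hedged sketch of the kind of dispatch this implies (the macro name is hypothetical): clang on Windows defines `_MSC_VER` but prefers GNU-style attributes, so the checks must distinguish the two frontends:
```
// EXAMPLE_DEPRECATED is a hypothetical macro, not the one from this diff.
#if defined(_MSC_VER) && !defined(__clang__)
#define EXAMPLE_DEPRECATED __declspec(deprecated)
#else
#define EXAMPLE_DEPRECATED __attribute__((deprecated))
#endif

EXAMPLE_DEPRECATED void old_api();
```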
Test Plan: CI green
Differential Revision: D20153548
fbshipit-source-id: bfbfd374e8f5e7d7b8598453c3ca2b6693a425f1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30915
Since we now have C++14, we don't need these c10::guts helpers anymore
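A hedged example of the kind of helper this makes redundant (the exact set of removed helpers isn't listed here): C++14 ships the `_t` alias templates and `std::make_unique` that C++11-era shims had to backfill:
```
#include <memory>
#include <type_traits>

// C++14 provides these directly; no project-local shim needed.
static_assert(std::is_same<std::enable_if_t<true, int>, int>::value, "");

int main() {
  auto p = std::make_unique<int>(42);  // std::make_unique is C++14
  return *p == 42 ? 0 : 1;
}
```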
ghstack-source-id: 95777609
Test Plan: waitforsandcastle
Differential Revision: D18869639
fbshipit-source-id: 97716f932297c64c6e814410ac47b444c33d4e2e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30917
This is a C++14 feature; we can use it now.
ghstack-source-id: 95255753
Test Plan: waitforsandcastle
Differential Revision: D18869637
fbshipit-source-id: dd02036b9faeaffa64b2d2d305725443054da31b
Summary:
This renames the CMake `caffe2` target to `torch`, as well as renaming `caffe2_gpu` to `torch_gpu` (and likewise for other gpu target variants). Many intermediate variables that don't manifest as artifacts of the build remain for now with the "caffe2" name; a complete purge of `caffe2` from CMake variable names is beyond the scope of this PR.
The shell `libtorch` library that had been introduced as a stopgap in https://github.com/pytorch/pytorch/issues/17783 is again flattened in this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20774
Differential Revision: D15769965
Pulled By: kostmo
fbshipit-source-id: b86e8c410099f90be0468e30176207d3ad40c821
Summary:
Implement some simple fixes to clean up the Windows build by fixing compiler warnings. Three main types of warnings were fixed:
1. GCC-specific pragmas were changed to not be used on Windows.
2. CMake flags that don't exist on Windows were removed from the Windows build.
3. A macro that was defined multiple times on Windows was fixed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14490
Differential Revision: D13241988
Pulled By: ezyang
fbshipit-source-id: 38da8354f0e3a3b9c97e33309cdda9fd23c08247
Summary:
Hi guys,
I'd like to build Caffe2 with more supported options on Windows with Microsoft Visual Studio.
This is the first pull request.
Running scripts/build_windows_shared.bat builds Caffe2 with both CMAKE_BUILD_TYPE=Debug and CMAKE_BUILD_TYPE=Release with Visual Studio 14 2015.
CUDA is 9.0, cuDNN is 7.0.5; glog, gflags, and LMDB are supported on my system.
Python is 3.5, and Detectron works from the Python interface as well.
It was even possible to debug Detectron code and step into caffe2_gpu.dll with PDBs built.
Disappointingly, the c10/experimental ops don't build with this Visual Studio generator, so I added a special option, INCLUDE_EXPERIMENTAL_C10_OPS (default ON), to deal with it in build_windows_shared.bat.
After this pull request, the next step is to add Visual Studio 2017 support to the script.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13550
Reviewed By: ezyang
Differential Revision: D13042597
Pulled By: orionr
fbshipit-source-id: f313f909f599cd582a1d000eff766eef3a9fc4fc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12862
This is a redo of the previous move in a way that doesn't migrate the namespace; it will also check for the Windows cuDNN build failure.
Reviewed By: Yangqing
Differential Revision: D10459665
fbshipit-source-id: 563dec9987aa979702e6d71072ee2f4b2d969d69
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12950
For backwards compatibility, we want the c10 symbols to be reachable from caffe2 and aten.
When we move classes from at/caffe2 to c10, this
1. allows keeping backwards compatibility with third-party code we can't control
2. allows splitting diffs that move such classes into two diffs, where one only fixes the includes and the second fixes the namespaces.
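A minimal sketch of the mechanism (the `Widget` type is hypothetical): re-exporting the c10 namespace keeps old qualified names compiling after a class moves:
```
namespace c10 { struct Widget {}; }        // the type's new home
namespace caffe2 { using namespace c10; }  // caffe2::Widget still resolves
namespace at { using namespace c10; }      // at::Widget still resolves

caffe2::Widget w1;  // third-party code keeps compiling unchanged
at::Widget w2;
```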
Reviewed By: ezyang
Differential Revision: D10496244
fbshipit-source-id: 914818688fad8c079889dfdc6242bc228b539f0e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12714
This is a short change to enable the c10 namespace in caffe2. We did not enable
it before due to gflags global-variable confusion, but that should be mostly
cleaned up now. Right now, the plan on record is that namespace caffe2 and
namespace aten will be full supersets of namespace c10.
Most of the diff is codemod; the only two non-codemod changes are in caffe2/core/common.h, where
```
using namespace c10;
```
is added, and in Flags.h, where instead of creating aliasing variables in the c10 namespace, we put them directly in the global namespace to match gflags (with the same behavior when building without gflags).
Reviewed By: dzhulgakov
Differential Revision: D10390486
fbshipit-source-id: 5e2df730e28e29a052f513bddc558d9f78a23b9b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12466
Moves type.{h,cpp} and functional.h to ATen/core
The move is necessary for IR merging; slimmed down from this diff: D9819906.
Reviewed By: ezyang
Differential Revision: D10242680
fbshipit-source-id: b71eeec98dfe9496e751a91838d538970ff05b25
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12408
Using static_cast is better than reinterpret_cast because it will cause a compile-time error in the following cases, while reinterpret_cast would run into undefined behavior and likely segfault:
- Src and Dst are not related through inheritance (say converting int* to double*)
- Src and Dst are related through virtual inheritance
This `dynamic_cast_if_rtti` is still unsafe because `dynamic_cast` and `static_cast` behave differently if the runtime type is not what you expected (i.e. dynamic_cast returns nullptr or throws whereas static_cast has undefined behavior), but it's much safer than doing reinterpret_cast.
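A sketch matching the behavior described (simplified; the real helper lives in caffe2's common headers): with RTTI available use dynamic_cast, otherwise fall back to static_cast, which at least rejects unrelated and virtually-inherited conversions at compile time:
```
template <typename Dst, typename Src>
inline Dst dynamic_cast_if_rtti(Src ptr) {
#ifdef __GXX_RTTI
  // RTTI available: checked cast (returns nullptr / throws on mismatch).
  return dynamic_cast<Dst>(ptr);
#else
  // No RTTI: unchecked, but still a compile-time error for unrelated types.
  return static_cast<Dst>(ptr);
#endif
}
```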
Reviewed By: Yangqing
Differential Revision: D10227820
fbshipit-source-id: 530bebe9fe1ff88646f435096d7314b65622f31a
Summary:
This does seven things:
- add c10/util/Registry.h as the unified registry util
- clean up some APIs such as the export condition
- fully remove aten/core/registry.h
- fully remove caffe2/core/registry.h
- remove a bogus aten/registry.h
- unify all macros
- set up registry testing in c10
Also, an important note: we used to mark the templated Registry class as EXPORT. This should not happen, because one should almost never export a template class. This PR fixes that.
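A hedged usage sketch with hypothetical types (`Transform`, `Identity`); the macro names follow c10/util/Registry.h:
```
#include <c10/util/Registry.h>

struct Transform {
  virtual ~Transform() = default;
};
struct Identity : Transform {};

C10_DECLARE_REGISTRY(TransformRegistry, Transform);
C10_DEFINE_REGISTRY(TransformRegistry, Transform);
C10_REGISTER_CLASS(TransformRegistry, Identity, Identity);

// Later, e.g.: auto t = TransformRegistry()->Create("Identity");
```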
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12077
Reviewed By: ezyang
Differential Revision: D10050771
Pulled By: Yangqing
fbshipit-source-id: 417b249b49fed6a67956e7c6b6d22374bcee24cf
Summary:
Some more `ATEN_API` additions for hidden visibility.
Running CI tests to see what fails to link.
cc Yangqing mingzhe09088 ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10624
Reviewed By: mingzhe09088
Differential Revision: D9392728
Pulled By: orionr
fbshipit-source-id: e0f0861496b12c9a4e40c10b6e0c9e0df18e8726
Summary:
Properly annotated all APIs for the CPU front end. Checked with CMake using
cmake -DUSE_ATEN=ON -DUSE_CUDA=OFF -DBUILD_ATEN=ON
and the resulting libcaffe2.so has about 11k symbols.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10504
Reviewed By: ezyang
Differential Revision: D9316491
Pulled By: Yangqing
fbshipit-source-id: 215659abf350af7032e9a4b0f28a856babab2454
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10274
Good C++ libraries don't claim un-namespaced identifiers
like DISABLE_COPY_AND_ASSIGN. Re-prefix this.
Follow-up fix: codemod Caffe2 to use the new macro and delete the forwarding definition.
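The macro body itself is simple; a sketch (the new prefix isn't named in this summary, so `AT_` here is illustrative):
```
// AT_ prefix is illustrative; the point is the namespacing, not the name.
#define AT_DISABLE_COPY_AND_ASSIGN(classname)   \
  classname(const classname&) = delete;         \
  classname& operator=(const classname&) = delete

class Session {
 public:
  Session() = default;
  AT_DISABLE_COPY_AND_ASSIGN(Session);
};
```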
Reviewed By: mingzhe09088
Differential Revision: D9181939
fbshipit-source-id: 857d099de1c2c0c4d0c1768c1ab772d59e28977c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10264
Since we now have the DISABLE_COPY_AND_ASSIGN macro in the file,
CoreAPI is no longer an accurate name.
Reviewed By: dzhulgakov
Differential Revision: D9181687
fbshipit-source-id: a9cc5556be9c43e6aaa22671f755010707caef67
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10263
Auxiliary changes that were needed:
- Add DISABLE_COPY_AND_ASSIGN to CoreAPI.h (maybe we should rename this file now)
Reviewed By: dzhulgakov
Differential Revision: D9181321
fbshipit-source-id: 975687068285b5a94a57934817c960aeea2bbafa
* fix a bug for SkipIndices
* IDEEP bug, revise the output to CPUTensor in SkipOutputCopy strategy
* [IDEEP] Add IDEEP fallbacks for Style-Transfer ops
* Add hip support for caffe2 core
* Add MIOPEN header/wrapper to caffe2 core
* Add HIP device into caffe2 PB
* top level makefile change for rocm/hip
* makefile scaffolding for AMD/ROCm/HIP
* Makefile scaffolding for AMD/ROCm/HIP; add makefile/utility for HIP files
* caffe2 PB update for AMD/ROCm HIP device
* Add AMD/ROCm/Thrust dependency
* HIP threadpool update
* Fix makefile macro
* makefile fix: duplicate test/binary name
* makefile clean-up
* makefile clean-up
* add HIP operator registry
* add utilities for hip device
* Add USE_HIP to config summary
* makefile fix for BUILD_TEST
* merge latest
* Fix indentation
* code clean-up
* Guard builds without HIP and use the same cmake script as PyTorch to find HIP
* Setup rocm environment variables in build.sh (ideally should be done in the docker images)
* setup locale
* set HIP_PLATFORM
* Revert "set HIP_PLATFORM"
This reverts commit 8ec58db2b390c9259220c49fa34cd403568300ad.
* continue the build script environment variables mess
* HCC_AMDGPU_TARGET
* Cleanup the mess; it has been fixed in the latest docker images
* Assign protobuf field hip_gpu_id a new field number for backward compatibility
* change name to avoid conflict
* Fix duplicated thread pool flag
* Refactor cmake files to not add hip includes and libs globally
* Fix the wrong usage of environment variables detection in cmake
* Add MIOPEN CNN operators
* Revert "Add MIOPEN CNN operators"
This reverts commit 6e89ad4385b5b8967a7854c4adda52c012cee42a.
* Resolve merge conflicts
* .
* Update GetAsyncNetHIPThreadPool
* Enable BUILD_CAFFE2 in pytorch build
* Unify USE_HIP and USE_ROCM
* always check USE_ROCM
* .
* remove unrelated change
* move all core hip files to separate subdirectory
* .
* .
* recurse glob core directory
* .
* correct include
* .
* Import/export observer symbols for DLL, which fixes the linking error in Visual Studio.
* Add support for all default CMake build types for release to CUDA.
* [GanH][Easy]: Add assertion to adaptive weighting layer
A weight of 0 causes numeric instability and exploding NE.
* [Easy] Add cast op before computing norm in diagnose options
As LpNorm only takes floats, we add a manual cast here.
* Introduce a new caching device allocator
`cudaMalloc` and `cudaFree` calls are slow, and become slower the
more GPUs there are. Essentially, they grab a host-wide (not device-wide) lock
because GPU memory is transparently shared across all GPUs. Normally, this
isn't much of a concern since workloads allocate memory upfront, and reuse it
during later computation.
However, under some computation models (specifically, memory conserving
approaches like checkpoint-and-recompute, see
https://medium.com/@yaroslavvb/fitting-larger-networks-into-memory-583e3c758ff9)
this assumption is no longer true. In these situations, `cudaMalloc` and
`cudaFree` are common and frequent. Furthermore, in data parallel contexts,
these calls happen at nearly the same time from all GPUs worsening lock
contention.
A common solution to this problem is to add a custom allocator. In fact,
NVIDIA provides one out of the box: CUB, which Caffe2 already supports.
Unfortunately, the CUB allocator suffers from very high fragmentation. This is
primarily because it is a "buddy" allocator which neither splits nor merges
free cached blocks. Study
https://github.com/NVlabs/cub/blob/1.8.0/cub/util_allocator.cuh#L357 if you
want to convince yourself.
This diff adapts a caching allocator from the Torch codebase
https://github.com/torch/cutorch/blob/master/lib/THC/THCCachingAllocator.cpp
which does splitting and merging and ends up working really well, at least for
workloads like the checkpoint-and-recompute computation models noted above.
I simplified the implementation a little bit and made it a bit more C++-like. I
also removed a bunch of stream synchronization primitives for this diff. I
plan to add them back in subsequent diffs.
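To make the split/merge point concrete, here is a heavily simplified sketch (not the actual allocator code; real versions also handle devices, streams, and the cudaMalloc fallback):
```
#include <cstddef>
#include <map>

// One contiguous region carved into blocks; neighbors linked for merging.
struct Block {
  char* ptr;
  size_t size;
  bool allocated;
  Block* prev;  // physically adjacent neighbor
  Block* next;
};

std::multimap<size_t, Block*> free_blocks;  // free list, keyed by size

Block* allocate(size_t size) {
  auto it = free_blocks.lower_bound(size);      // best fit
  if (it == free_blocks.end()) return nullptr;  // would fall back to cudaMalloc
  Block* b = it->second;
  free_blocks.erase(it);
  if (b->size > size) {  // split: keep the remainder cached, not wasted
    Block* rest = new Block{b->ptr + size, b->size - size, false, b, b->next};
    if (b->next) b->next->prev = rest;
    b->next = rest;
    b->size = size;
    free_blocks.emplace(rest->size, rest);
  }
  b->allocated = true;
  return b;
}

void deallocate(Block* b) {
  b->allocated = false;
  if (b->next && !b->next->allocated) {  // merge with free right neighbor
    Block* n = b->next;
    auto range = free_blocks.equal_range(n->size);
    for (auto it = range.first; it != range.second; ++it)
      if (it->second == n) { free_blocks.erase(it); break; }
    b->size += n->size;
    b->next = n->next;
    if (n->next) n->next->prev = b;
    delete n;
  }
  // (A full version would merge with the left neighbor symmetrically.)
  free_blocks.emplace(b->size, b);
}
```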
* Report reader progress in fblearner workflows
Integrate with fblearner progress reporting API and add support to report training progress from reader nodes.
If the reader is constructed with batch limits, report based on finished batches vs. total batches. The finished batch count may exceed the total because we evaluate whether we should stop processing every time we dequeue a split.
If there is no limit for the reader, report based on finished splits (Hive files) vs. total splits. This is fairly accurate.
* [GanH][Diagnose]: fix plotting
1. GanH diagnose needs to set plot options.
2. The modifier's blob name is used for the metric field and needs to be fixed before generating the net.
* Automatic update of fbcode/onnx to 985af3f5a0f7e7d29bc0ee6b13047e7ead9c90c8
* Make CompositeReader stop as soon as one reader finishes
Previously, CompositeReader called all readers before stopping. This resulted in a flaky test, since the last batch may be read by different threads, resulting in dropped data.
* [dper] make sure loss is not nan
as desc.
* [rosetta2] [mobile-vision] Option to export NHWC order for RoIWarp/RoIAlign
Thanks for finding this, @stzpz and @wangyanghan. Looks like NHWC is more optimized. For OCR it doesn't help yet, since NHWC uses more memory bandwidth, but it will soon become important.
* Intra-op parallel FC operator
* [C2 Proto] extra info in device option
Passing extra information in the device option.
design doc: https://fb.quip.com/yAiuAXkRXZGx
* Unregister MKL fallbacks for NCHW conversions
* Tracing for more executors
Modified Tracer to work with other executors and added more tracing.
* Remove ShiftActivationDevices()
* Check for blob entry iff it is present
When processing the placeholder ops, ignore the entry if the blob is not present in blob_to_device.
* Internalize use of eigen tensor
Move the use of Eigen tensor out of the header file so we don't get template partial-specialization errors when building other libraries.
* feature importance for transformed features.
* Fix unused parameter warnings
The changes in this diff comment out unused parameter names.
This will allow us to enable -Wunused-parameter as an error.
#accept2ship
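The change pattern is a one-liner per site, as in this hedged example:
```
// Comment out the unused parameter's name; the signature is unchanged and
// -Wunused-parameter (as an error) no longer fires.
void on_event(int /*flags*/) {
  // handler body that never reads flags
}
```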
* add opencv dependencies to caffe2
The video input op requires additional OpenCV packages. This adds them to
CMake so that it can build.
* Add clip_by_value option in gradient clipping
When a value is bigger than the max or smaller than the min, it is clipped.
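The clipping rule itself is simple; a hedged illustration (helper name hypothetical):
```
// clip_by_value semantics: values outside [min, max] are pulled to the edge.
float clip_by_value(float g, float min_v, float max_v) {
  return g < min_v ? min_v : (g > max_v ? max_v : g);
}
```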
* std::round compat
* Update ReduceMean
* Add reduce mean to math
* Update cuda flag
* Update Eigen::Tensor ctor
* Remove unused variables
* Skip ReduceTensorGPUTest if no gpus
* Add NOMINMAX for windows
* Fix lpnorm_op in windows
Summary:
The last fix was uncommitted due to a bug in the internal build (CAFFE2_API causing an error). This one re-applies it as well as a few more fixes, especially enabling gtest.
Earlier commit message: Basically, this should make windows {static_lib, shared_lib} * {static_runtime, shared_runtime} * {cpu, gpu} work, other than the GPU shared_lib, where willyd kindly pointed out a symbol-limit problem. A few highlights:
(1) Updated to the newest protobuf.
(2) Used the protoc dllexport command to ensure proper symbol export for Windows.
(3) Various code updates to make sure that C2 symbols are properly exposed.
(4) CMake file changes to make the build proper.
(5) Option to choose static or shared runtime, similar to protobuf.
(6) Reverted to Visual Studio 2015, as the current CUDA and MSVC 2017 do not play well together.
(7) Enabled gtest and fixed testing bugs.
Earlier PR is #1793
Closes https://github.com/caffe2/caffe2/pull/1827
Differential Revision: D6832086
Pulled By: Yangqing
fbshipit-source-id: 85f86e9a992ee5c53c70b484b761c9d6aed721df
Summary:
This reverts commit d286264fccc72bf90a2fcd7da533ecca23ce557e
bypass-lint
An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
cause_a_sev_many_files
Differential Revision: D6817719
fbshipit-source-id: 8fe0ad7aba75caaa4c3cac5e0a804ab957a1b836
Summary:
Basically, this should make Windows {static_lib, shared_lib} * {static_runtime, shared_runtime} * {cpu, gpu} work. A few highlights:
(1) Updated to the newest protobuf.
(2) Used the protoc dllexport command to ensure proper symbol export.
(3) Various code updates to make sure that C2 symbols are properly exposed.
(4) CMake file changes to make the build proper.
(5) Option to choose static or shared runtime, similar to protobuf.
(6) Reverted to Visual Studio 2015, as the current CUDA and MSVC 2017 do not play well together.
Closes https://github.com/caffe2/caffe2/pull/1793
Reviewed By: dzhulgakov
Differential Revision: D6817719
Pulled By: Yangqing
fbshipit-source-id: d286264fccc72bf90a2fcd7da533ecca23ce557e
Summary:
This is in order for Android to pass; Android support for string-related functions is quite limited.
Closes https://github.com/caffe2/caffe2/pull/1571
Reviewed By: pietern
Differential Revision: D6486079
Pulled By: Yangqing
fbshipit-source-id: f0961e2dde6202bd6506f4fb8a3aea4af1670cb5
Summary:
Useful for figuring out which version people built with. We can just ask for the --caffe2_version gflag or get core.build_options from Python.
Also adds CMAKE_INSTALL_RPATH_USE_LINK_PATH; without it, the build wasn't working on my Mac. How should it be tested?
Closes https://github.com/caffe2/caffe2/pull/1271
Reviewed By: bddppq
Differential Revision: D5940750
Pulled By: dzhulgakov
fbshipit-source-id: 45b4c94f67e79346a10a65b34f40fd258295dad1
Summary:
This brings proper versioning in Caffe2: instead of manual version macros, this puts the version information in CMake (replacing the TODO bwasti line) and uses macros.h.in to then generate the version in the C++ header.
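The macros.h.in approach looks roughly like this (a sketch; macro names assumed): CMake's configure_file() substitutes the version it owns into a generated header:
```
// macros.h.in (template; @...@ placeholders are filled in by CMake)
#define CAFFE2_VERSION_MAJOR @CAFFE2_VERSION_MAJOR@
#define CAFFE2_VERSION_MINOR @CAFFE2_VERSION_MINOR@
#define CAFFE2_VERSION_PATCH @CAFFE2_VERSION_PATCH@
#define CAFFE2_VERSION                                         \
  (CAFFE2_VERSION_MAJOR * 10000 + CAFFE2_VERSION_MINOR * 100 + \
   CAFFE2_VERSION_PATCH)
```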
A few misc updates:
- Removed the macOS rpath; verified on a local MacBook that it is no longer needed.
- Misc updates for caffe2 ready:
- Mapped cmake/Cuda.cmake with gloo's setting.
- upstreamed third_party/nccl so it builds with CUDA 9.
- Separated the Caffe2 cpu dependencies and cuda dependencies
- now libCaffe2_CPU.so does not depend on any CUDA libs.
- caffe2 python extensions now depend on cpu and gpu separately too.
- Reduced the number of unused functions in Utils.cmake
Closes https://github.com/caffe2/caffe2/pull/1256
Reviewed By: dzhulgakov
Differential Revision: D5899210
Pulled By: Yangqing
fbshipit-source-id: 36366e47366c3258374d646cf410b5f49f95767b