pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 00:21:07 +01:00

Author	SHA1	Message	Date
Jerry Zhang	07c4991622	Tensor construction codemod - 2/2 (#15600 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15600 Codemod generated with clangr shard mode, 25 files per diff, motivation: https://github.com/pytorch/pytorch/pull/12407 Reviewed By: dzhulgakov Differential Revision: D13542455 fbshipit-source-id: 8a3b15b0a1f81565f34e309114e1c3e1f7f65a3c	2019-01-04 13:31:53 -08:00
Jerry Zhang	ed5b584f65	Tensor construction codemod(ResizeLike) - 7/7 (#15087 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15087 Codemod generated with clangr shard mode, 25 files per diff, motivation: https://github.com/pytorch/pytorch/pull/12407 Reviewed By: ezyang Differential Revision: D13419765 fbshipit-source-id: 34d695309a66723281429610a12544598c507d74	2018-12-20 15:33:07 -08:00
Edward Yang	71ee882157	Reenable OpenMP by reverting the following two commits. (#15315 ) Summary: Revert "Put back linker flag for OpenMP to prevent build break on ppc64le (#14569)" This reverts commit `a84e873bb1`. Revert "Update OpenMP cmake setting for xcode 9 compiler(AppleClang 9.0) (#14473)" This reverts commit `8901935ad4`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/15315 Differential Revision: D13495852 Pulled By: ezyang fbshipit-source-id: bcd3f60088b14831c53d3c171f10cd1ab6b35dee	2018-12-17 19:54:41 -08:00
Jerry Zhang	b5db6ac9f1	Tensor construction codemod - 3/3 (#14835 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14835 Codemod generated with clangr shard mode, 25 files per diff, motivation: https://github.com/pytorch/pytorch/pull/12407 Reviewed By: bddppq Differential Revision: D13335184 fbshipit-source-id: 26d8247e16b30bdff045530034af9b72c76d066f	2018-12-06 11:50:59 -08:00
JerryShih	8901935ad4	Update OpenMP cmake setting for xcode 9 compiler(AppleClang 9.0) (#14473 ) Summary: Original PR: https://github.com/pytorch/pytorch/pull/11563 Pull Request resolved: https://github.com/pytorch/pytorch/pull/14473 Differential Revision: D13234208 Pulled By: ezyang fbshipit-source-id: 7d874c63659e93728af239ecdfb85547613e52ad	2018-11-28 09:28:26 -08:00
ArutyunovG	8e91da4cb3	Windows shared build (#13550 ) Summary: Hi guys, I'd like to build Caffe2 with more supported options in Windows with Microsoft Visual Studio. This is the first pull request. Running scripts/build_windows_shared.bat is able to build Caffe2 with both CMAKE_BUILD_TYPE=Debug and CMAKE_BUILD_TYPE=Release with Visual Studio 14 2015. CUDA is 9.0, cudnn is 7.0.5, glog, gflags and lmdb are supported on my system. Python is 3.5, Detectron works from python interface as well. It was even possible to debug detectron code and step into caffe2_gpu.dll with pdbs built. Disappointingly, the c10/experimental ops don't build with this Visual Studio generator, so I added a special option INCLUDE_EXPERIMENTAL_C10_OPS (default ON) to deal with it in build_windows_shared.bat. After this pull request the next step is to add Visual Studio 2017 support in the script. Pull Request resolved: https://github.com/pytorch/pytorch/pull/13550 Reviewed By: ezyang Differential Revision: D13042597 Pulled By: orionr fbshipit-source-id: f313f909f599cd582a1d000eff766eef3a9fc4fc	2018-11-16 12:16:28 -08:00
Jerry Zhang	57ec8f111f	Rename ndim() -> dim() - 6/6 Summary: Codemod generated with clangr shard mode, 50 files per diff, clangr code(ndim()->dim()): diffusion/FBS/browse/master/fbcode/caffe2/caffe2/fb/codemods/TensorMethodRename.cpp Reviewed By: ezyang Differential Revision: D12935827 fbshipit-source-id: 80ecb034c243dbfd267b9f131cee9d7afd5ef063	2018-11-07 07:27:45 -08:00
Gu, Jinghui	dbab9b73b6	separate mkl, mklml, and mkldnn (#12170 ) Summary: 1. Remove avx2 support in mkldnn 2. Separate mkl, mklml, and mkldnn 3. Fix convfusion test case Pull Request resolved: https://github.com/pytorch/pytorch/pull/12170 Reviewed By: yinghai Differential Revision: D10207126 Pulled By: orionr fbshipit-source-id: 1e62eb47943f426a89d57e2d2606439f2b04fd51	2018-10-29 10:52:55 -07:00
Jerry Zhang	e5752f2cb4	Renaming dims() to sizes() (fbcode) Summary: Codemod generated with clangr shard mode, 25 files per diff, for renaming dims() to sizes() Reviewed By: ezyang Differential Revision: D10848643 fbshipit-source-id: ac75833be8be9162e35b00dcd352f616bc7bbafe	2018-10-25 09:32:18 -07:00
Viswanath Sivakumar	1bea5fc3ad	Fix UpsampleNearest op CPU impl batch handling (#13002 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13002 Batch dim wasn't handled in the CPU impl (will fail for inputs with N > 1). Fixing that here. Differential Revision: D10515159 fbshipit-source-id: ee7e4f489d2d4de793f550b31db7c0e2ba3651e8	2018-10-24 13:10:53 -07:00
103yiran	0a190c8869	Move the location of annotation Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12969 Differential Revision: D10560824 Pulled By: ezyang fbshipit-source-id: 86c21149682db5ebfd9610df9e9845688a3db3b0	2018-10-24 12:35:08 -07:00
Yangqing Jia	7d5f7ed270	Using c10 namespace across caffe2. (#12714 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12714 This is a short change to enable c10 namespace in caffe2. We did not enable it before due to gflags global variable confusion, but it should have been mostly cleaned now. Right now, the plan on record is that namespace caffe2 and namespace aten will fully be supersets of namespace c10. Most of the diff is codemod; the only two non-codemod changes are in caffe2/core/common.h, where `using namespace c10;` is added, and in Flags.h, where instead of creating aliasing variables in c10 namespace, we directly put it in the global namespace to match gflags (and same behavior if gflags is not being built with). Reviewed By: dzhulgakov Differential Revision: D10390486 fbshipit-source-id: 5e2df730e28e29a052f513bddc558d9f78a23b9b	2018-10-17 12:57:19 -07:00
wuhuikx	e497aa1e35	Optimize UpsampleNearest Op (#12151 ) Summary: Optimize the UpsampleNearest Op. 1. Add OMP 2. revise the translated_idx method Pull Request resolved: https://github.com/pytorch/pytorch/pull/12151 Differential Revision: D10362856 Pulled By: ezyang fbshipit-source-id: 535a4b87c7423942217f2d79bedc463a0617c67a	2018-10-16 20:34:20 -07:00
ChongyuIntel	5416260b1e	Add the OpenMP optimization for BatchPermutation. (#12153 ) Summary: This is for Caffe2 optimization. With this optimization, the following two ops can boost a lot. (Test with MaskRCNN, on SKX8180 one socket) BatchPermutation op: reduced from 8.296387 ms to 1.4501984 ms. Pull Request resolved: https://github.com/pytorch/pytorch/pull/12153 Differential Revision: D10362823 Pulled By: ezyang fbshipit-source-id: 04d1486f6c7db49270992cd8cde41092154e62ee	2018-10-16 20:23:09 -07:00
Edward Yang	54d9823d00	Make caffe2::Tensor::dims() return an IntList instead of a const vector& (#12180 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12180 I had to fix a lot of call sites, because a lot of places assume that you can actually get a const vector&, and if the internal representation of sizes in a tensor is NOT a vector, it's not possible to fulfill this API contract. Framework changes: - I deleted TensorImpl::dims(); caffe2::Tensor::dims() just forwards to sizes() now. - De-templatized SetDims; now it is an explicit list of ArrayRef and variadic overloads. This makes implicit conversions work again, so I don't need to explicitly list the std::vector cases too. - As a knock-on effect, this causes Reset() to accept at::IntList as well as const std::vector<int64_t>& - Edited variadic overloads of SetDims to all forward to the underlying arbitrary-dim implementation, reducing code duplication. (It's probably marginally less efficient in the new world.) - Replace Tensor constructor accepting const std::vector<int64_t>& with at::IntList - Make MKLTensor accept ArrayRef along with vector in constructor and Reset (unfortunately, no implicit conversions here, since it's templated on index type.) - There are a few other places, like cudnn, where I changed functions that previously took const std::vector<int64_t>& to take at::IntList instead. Classification of call site changes: - 'const std::vector<int64_t>& x_dims = x.dims()' ==> 'at::IntList x_dims = x.dims()' - 'std::vector<int64_t> x_dims = x.dims()' ==> 'std::vector<int64_t> x_dims = x.dims().vec()' (we need a copy!) Usually this is because we're about to mutably modify the vector to compute some new dimension. However, it also very commonly occurs in the form: 'x_dims_ = x.dims()' because we frequently cache sizes in operators. 
- Instead of constructing std::vector<int64_t>{blah, blah}, construct an at::IntList directly ArrayRef changes: - cbegin()/cend() iterators, they operate the same as begin()/end() because everything on ArrayRef is const. - Moved operator<< into ArrayRef.h, so that it's always available when working with ArrayRef. I also templated it, so it now works on an ArrayRef of any type. - Add operator== overload for ArrayRef, and also add variants to permit comparison of ArrayRef with std::vector, a very common operation. (The non-templated version of operator== can get these automatically via implicit conversion, but with templates C++ refuses to do any implicit conversions.) I'm planning to audit all dims() call sites to make sure they don't expect 'auto x = t.dims()' to give you an x whose lifetime can validly outlive the tensor. I opted not to do a dims() to sizes() rename, because dims() also matches the protobufs accessor. Bad news! Reviewed By: jerryzh168 Differential Revision: D10111759 fbshipit-source-id: a2a81dc4b92c22ad4b3b8ef4077a7e97b6479452	2018-10-05 15:57:41 -07:00
Yangqing Jia	38f3d1fc40	move flags to c10 (#12144 ) Summary: still in flux. Pull Request resolved: https://github.com/pytorch/pytorch/pull/12144 Reviewed By: smessmer Differential Revision: D10140176 Pulled By: Yangqing fbshipit-source-id: 1a313abed022039333e3925d19f8b3ef2d95306c	2018-10-04 02:09:56 -07:00
Yangqing Jia	28dba2f928	Unify all _EXPORT and _IMPORT macros across c++ backend (#12019 ) Summary: TSIA. Right now we should basically use C10_EXPORT and C10_IMPORT for explicitly marking dllexport and dllimport, as a continued effort of the C10 unification. This is a codemod by mechanically doing the following change: CAFFE2_{EXPORT,IMPORT} -> C10_{EXPORT,IMPORT} AT_CORE_{EXPORT,IMPORT} -> C10_{EXPORT,IMPORT} Pull Request resolved: https://github.com/pytorch/pytorch/pull/12019 Reviewed By: ezyang, teng-li Differential Revision: D10016276 Pulled By: Yangqing fbshipit-source-id: a420d62c43d1110105fc88f9e9076e28a3203164	2018-09-25 17:41:05 -07:00
Yangqing Jia	a6f1ae7f20	set up c10 scaffolding. Move macros proper first. Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11939 Reviewed By: orionr, dzhulgakov Differential Revision: D10004629 Pulled By: Yangqing fbshipit-source-id: ba50a96820d35c7922d81c78c4cbe849c85c251c	2018-09-24 11:09:59 -07:00
Christian Puhrsch	a6630e25af	Remove many caffe2::TIndex and replace them with int64_t (#11943 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11943 See title Reviewed By: ezyang Differential Revision: D9992645 fbshipit-source-id: e8f80d6ea762971513e5e8072975ceea53e1f11a	2018-09-22 18:11:04 -07:00
Orion Reblitz-Richardson	8ad846fda5	Don't build Detectron ops with NO_CAFFE2_OPS=1 (#11799 ) Summary: cc apaszke Pull Request resolved: https://github.com/pytorch/pytorch/pull/11799 Differential Revision: D9922745 Pulled By: orionr fbshipit-source-id: b88724b7c2919aabc00d98658e8e563233e01c85	2018-09-18 14:09:33 -07:00
Lingyi Liu	958ba4e913	Aibench for asr decoder Summary: as title Reviewed By: sf-wind Differential Revision: D9738021 fbshipit-source-id: 98f570484bca6486ad99207732efd534ec7e3251	2018-09-12 14:25:19 -07:00
Orion Reblitz-Richardson	6508db7421	Remove BUILD_CAFFE2 and build everything (#8338 ) Summary: This completely removes BUILD_CAFFE2 from CMake. There is still a little bit of "full build" stuff in setup.py that enables USE_CUDNN and BUILD_PYTHON, but otherwise everything should be enabled for PyTorch as well as Caffe2. This gets us a lot closer to full unification. cc mingzhe09088, pjh5, ezyang, smessmer, Yangqing Pull Request resolved: https://github.com/pytorch/pytorch/pull/8338 Reviewed By: mingzhe09088 Differential Revision: D9600513 Pulled By: orionr fbshipit-source-id: 9f6ca49df35b920d3439dcec56e7b26ad4768b7d	2018-08-31 13:10:24 -07:00
jgong5	c755616e00	Enable Detectron model inference for CPU and MKL-DNN paths (#10157 ) Summary: 1. Support ops needed for inference of Faster-RCNN/Mask-RCNN needed in Detectron, mostly direct fallbacks. 2. Use CPU device to hold 0-dim tensors and integer tensors in both fallback op and blob feeder, needed by Detectron models. 3. Ignore 0-dim tensor in MKL-DNN concat operator. 4. Generate dynamic library of Detectron module for CPU device. This PR obsoletes #9164. Pull Request resolved: https://github.com/pytorch/pytorch/pull/10157 Differential Revision: D9276837 Pulled By: yinghai fbshipit-source-id: dc364932ae4a2e7fcefdee70b5fce3c0cee91b6f	2018-08-29 15:11:01 -07:00
Mingzhe Li	964e30de1d	Workaround for Cuda9.2 and GCC7 compilation errors (#10510 ) Summary: Breaking out of #8338 This PR is a workaround for a bug with CUDA9.2 + GCC7. Here is the error this PR fixed: .../pytorch/caffe2/operators/elementwise_ops.h: In constructor ‘caffe2::BinaryElementwiseWithArgsOp<InputTypes, Context, Functor, OutputTypeMap>::BinaryElementwiseWithArgsOp(const caffe2::OperatorDef&, caffe2::Workspace)’: .../pytorch/caffe2/operators/elementwise_ops.h:106:189: error: ‘GetSingleArgument<bool>’ is not a member of ‘caffe2::BinaryElementwiseWithArgsOp<InputTypes, Context, Functor, OutputTypeMap>’ BinaryElementwiseWithArgsOp(const OperatorDef& operator_def, Workspace ws) Pull Request resolved: https://github.com/pytorch/pytorch/pull/10510 Reviewed By: orionr Differential Revision: D9319742 Pulled By: mingzhe09088 fbshipit-source-id: ce59e3db14539f071f3c20301e77ca36a6fc3f81	2018-08-14 20:54:52 -07:00
Tongliang Liao	3cbe8f0c3e	Detect system RocksDB installation with CMake config files. (#7315 ) Summary: On Windows, the FindRocksDB script doesn't detect a rocksdb installation built by cmake. And it doesn't include/link the RocksDB dependencies either, like: * `Snappy` * `Shlwapi.lib` * `Rpcrt4.lib` This PR tries to detect RocksDB in config mode first, before falling back to the private find module. Pull Request resolved: https://github.com/pytorch/pytorch/pull/7315 Differential Revision: D9287587 Pulled By: Yangqing fbshipit-source-id: 314a36a14bfe04aa45013349c5537163fb4c5c00	2018-08-12 18:24:10 -07:00
Tongliang Liao	508de8109f	Added missing "AT_" prefix to macro. (#10436 ) Summary: For issue #10435 Pull Request resolved: https://github.com/pytorch/pytorch/pull/10436 Differential Revision: D9287578 Pulled By: Yangqing fbshipit-source-id: b07de3a2d7fa6f980a189b5e8f7ce05dfa1bef50	2018-08-12 18:09:19 -07:00
Xiaomeng Yang	57d2d4bcff	Optimize reduce ops for 2d and 3d (#9992 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9992 Optimize reduce ops for 2d and 3d Reviewed By: houseroad Differential Revision: D9042505 fbshipit-source-id: 62af2125aa6439106293e59bdf6a2b920792fd2d	2018-08-04 13:53:58 -07:00
Jerry Zhang	aebf3b47ae	Remove template parameter from Tensor (#9939 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9939 Pull Request resolved: https://github.com/facebookresearch/weakly-supervised-action-detection/pull/13 Pull Request resolved: https://github.com/pytorch/translate/pull/166 Pull Request resolved: https://github.com/pytorch/pytorch/pull/9125 Closes https://github.com/pytorch/pytorch/pull/9125 Use inheritance for polymorphism, and remove template parameter This is to change the templating in call sites, the core implementations will change later Before Caffe2 Tensor class was compile-time fixed to bind to a particular device/context. With this change, we're making it a runtime property (stored inside the tensor), but preserve the same semantics. For example, one has to specify device type in order to create a Tensor - there are no uninitialized tensors. More specifically the changes are: 1. We added an extra argument DeviceType to most of the constructors of the tensor, e.g. (Tensor(DeviceType type)), 2. Semantics of constructor Tensor(const Tensor<SrcContext>& src, ContextForCopy* context); is changed, in this constructor, the second context is passed in to enable us to call the templated Copy function, it could be in a different context as source and target previously, now we'll enforce that the context should have same device type as src, if it is provided. 3. To preserve 'get-or-construct' semantics of Blob, we added specialized getter Blob::GetMutableTensor that verifies both that Blob contains a Tensor and that it's of a correct type 4. Specifically, Tensor type is not default-constructible any more (as we don't have unknown device tensors) and thus some of the code handling STL containers needs to change Note: Some changes are postponed just to keep this diff a bit smaller. Please see `TODO`s. 
Reviewed By: ezyang, houseroad Differential Revision: D9024330 fbshipit-source-id: e0b8295d2dc6ebe2963383ded5af799ad17164ba	2018-07-27 10:56:39 -07:00
Jerry Zhang	969b62f276	Revert D8121878: Remove template parameter from Tensor Differential Revision: D8121878 Original commit changeset: 4a5e9a677ba4 fbshipit-source-id: d8e2c0bb145b52fbcca323b22d1d3346f0b3249e	2018-07-26 14:02:04 -07:00
Jerry Zhang	cd5adc7b5f	Remove template parameter from Tensor (#13 ) Summary: Pull Request resolved: https://github.com/facebookresearch/weakly-supervised-action-detection/pull/13 Pull Request resolved: https://github.com/pytorch/translate/pull/166 Pull Request resolved: https://github.com/pytorch/pytorch/pull/9125 Closes https://github.com/pytorch/pytorch/pull/9125 Use inheritance for polymorphism, and remove template parameter This is to change the templating in call sites, the core implementations will change later Before Caffe2 Tensor class was compile-time fixed to bind to a particular device/context. With this change, we're making it a runtime property (stored inside the tensor), but preserve the same semantics. For example, one has to specify device type in order to create a Tensor - there are no uninitialized tensors. More specifically the changes are: 1. We added an extra argument DeviceType to most of the constructors of the tensor, e.g. (Tensor(DeviceType type)), 2. Semantics of constructor Tensor(const Tensor<SrcContext>& src, ContextForCopy* context); is changed, in this constructor, the second context is passed in to enable us to call the templated Copy function, it could be in a different context as source and target previously, now we'll enforce that the context should have same device type as src, if it is provided. 3. To preserve 'get-or-construct' semantics of Blob, we added specialized getter Blob::GetMutableTensor that verifies both that Blob contains a Tensor and that it's of a correct type 4. Specifically, Tensor type is not default-constructible any more (as we don't have unknown device tensors) and thus some of the code handling STL containers needs to change Note: Some changes are postponed just to keep this diff a bit smaller. Please see `TODO`s. Reviewed By: xw285cornell Differential Revision: D8121878 fbshipit-source-id: 4a5e9a677ba4ac82095df959851a054c81eccf81	2018-07-26 10:25:23 -07:00
Fei Sun	14d4bdb406	Reformat output data format to make it more general for other binaries (#9555 ) Summary: This is to simplify the data format during benchmarking. After this change, we can use the same benchmarking harness data conversion method to parse data from multiple binaries. This change should be coordinated with the PR: https://github.com/facebook/FAI-PEP/pull/63 Pull Request resolved: https://github.com/pytorch/pytorch/pull/9555 Reviewed By: pjh5 Differential Revision: D8903024 Pulled By: sf-wind fbshipit-source-id: 61cabcff99f0873729142ec6cb6dc230c685d13a	2018-07-23 11:11:26 -07:00
Brad Stocks	c4bff25282	Additional operator information values (#9153 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9153 Closes https://github.com/pytorch/pytorch/pull/9153 Modified the values reported by the benchmarking platform to include tensor_shape and op_args. These values have a different naming scheme to values like flops and latency. Reviewed By: sf-wind Differential Revision: D8729791 fbshipit-source-id: f050200be01c6d0794bf5faaa6e8cef12a00affe	2018-07-16 17:40:44 -07:00
Hao Lu	af107c4d16	Fix shape inference bug (#9199 ) Summary: Closes https://github.com/pytorch/pytorch/pull/9199 The input shapes are not logged correctly in production because `PerfNetObserver::Stop()` only gets called after the inference is done for the net and in the mobile models, it's common practice to reuse the blobs as much as possible to save memory. And the shapes of the blobs keep changing during inference. By the time you query `InputTensorShapes()` in `PerfNetObserver::Stop()`, you only get the final shape of the blobs. To fix this bug, I moved the `InputTensorShapes()` query from `PerfNetObserver::Stop()` to `PerfOperatorObserver::Stop()`. The latter gets called at the end of operator->run() whereas `PerfNetObserver::Stop()` gets called at the end of net->run(). Also remove `PerfOperatorObserver::getAnalyticalCost()` since it's now done on the server side and no longer needed on mobile Reviewed By: Maratyszcza Differential Revision: D8743346 fbshipit-source-id: 5d2d0132e3f5e084be7d0173863e695e62a6b4a0	2018-07-06 15:15:17 -07:00
llyfacebook	681964cc47	output each operator separately due to logcat truncation (#8456 ) as title	2018-06-13 21:05:05 -04:00
Xiaomeng Yang	44973a06ba	Add affine_channel_op (#8356 ) Add affine_channel_op	2018-06-11 20:51:11 -07:00
llyfacebook	0c9b5f0825	Change the output format of caffe2 observers (#8261 ) as title	2018-06-07 17:30:43 -07:00
llyfacebook	7cace7219a	Change the benchmark log format and also log flops (#8215 ) as title	2018-06-06 17:04:54 -07:00
Bram Wasti	82b981e4db	Update from facebook 1ee4edd286a3 (#8040 ) * Adding instance weight to batch distill loss as title * add bfloat 16-31 added bfloat 16-31 and their respective unit tests * [CUDA9] Upgrade - fbcode CUDA9 upgrade diff D5654023 has been out for a while thanks to Pieter. But with time growing it's becoming quite hard to rebase, because of the symlinks and auto-generated build/config files in tp2. Break D5654023 into two diffs, one touching tp2 config files, and another one touching fbcode TARGETS file (adding nvcc flag). These two should be a bit easier to rebase (for detailed procedure see "Test Plan"). This diff can only be committed if: 1. CUDA 9 rpm is rolled out fleet-wide (TBD) 2. NVidia driver 390.40 is rolled out fleet-wide (done) 3. Upgrade CUDA 9.1, cudnn 7.1, nccl 2.1 (done) 4. Make sure all dependents are built (done) 5. Test all C2 operators, PyTorch (see test plan) * Share intermediate int32 buffer across Conv ops Adding a known type * [C2 fix] infer function for ensure_cpu_output_op this is adding the missing device function for ensure_cpu_output_op * [int8] Add blob serializer/deserializer for Int8TensorCPU To export to logfiledb * [nomnigraph] Add try catch block to optimization passes in predictor This will catch failures that happen in the optimization pass. * Caffe2: avoid static initialization order fiasco for CAFFE_ENFORCE CAFFE_ENFORCE uses a stack trace fetcher, which is currently a global static variable. If at static initialization time CAFFE_ENFORCE is used, this is a SIOF. Recently CAFFE_ENFORCE was added into init functions registration, so we started to see this. Meyers singleton is going to provide safety here. If stacktrace fetcher was not registered yet, it will just use a dummy one. 
* NUMA support in SparseNN CPU benchmark Adding support for NUMA in SparseNN CPU benchmark * [mobile-roofline] Add logging needed for roofline model This should be all that's needed * Let the operators use the same input if the operators are not chained or else, we have to change the input data dims * fix null-pointer-use UBSAN errors in reshape_op.h * revert previous fix on input blob name as title * Adding flag to let MineHardNegative automatically extract single value from dict Model exporter requires the output of the model to be a struct. This makes it convenient to use those models directly in MineHardNegative by allowing automatic extraction of the single element of dict, which is a common use case. * Reverting change that broke internal tests back to OSS compatible state	2018-06-01 17:41:09 -04:00
xkszltl	89ba9dc44f	Import/export observer symbols for DLL, which fixes the linking error in Visual Studio. (#6834 ) * Import/export observer symbols for DLL, which fixes the linking error in Visual Studio. * Add support of all default cmake build types for release to cuda.	2018-05-31 10:22:21 -07:00
Lu Fang	664fe34e0a	[Caffe2][fbcode=>GH sync] Update from facebook 4323b18ce13c (#7116 ) * [fix] Re-enable events in RNN ops We have earlier added event disabling in RNN ops as back then we didn't use events, with current use cases this is no longer true (https://fburl.com/8vd0lp8y) * use ops with cuda impl * Revert D7729695: [caffe2][fix] Re-enable events in RNN ops This reverts commit 4b215c7496fb724656ff4c776933a15bdbbcde5e @bypass-lint An infra SEV is better than not reverting this diff. If you copy this password, see you in SEV Review! @cause_a_sev_many_files * [observer] Clean up observer_config.h #accept2ship * [1/n] Refactor dataio_test.py Replace code duplication with a common function * Add barrier net that runs before training nets Add a synchronize barrier net that is run before training nets. With this net, shards that are faster will wait for other shards before starting training. This reduces the chances of the faster shards timing out during GLOO AllReduce. Removed explicit data_parallel_model.py.synchronize call in holmes workflow. Similar change in speech/asr_training workflow will come in another diff. * Support the dnnlowp backend in caffe2_benchmark This is for SHARE operator latency evaluation * Migrate integral_image_op to main caffe2 migrate integral_image_op(GPU version) given by https://fburl.com/yvqezigi to caffe2/caffe2/operators and implement its CPU version. Write up a test using the hypothesis_test mechanism * [pos_disc, fbcode] Implement unjoined lr loss As explained in https://our.intern.facebook.com/intern/wiki/Model_Based_Calibration/, when the dataset is a joined data set, where labels might change later, we need to use unjoined logloss. 
The implementation is almost the same as in Sigrid (https://fburl.com/1trngsls), where loss = y (log(p) - log(1-p)) + (1-y)(log(1-p)) = xy - (1-y)x - (1-y)log(1+exp(-x)). For x < 0, to ensure stability and avoid overflow, we reformulate the above exp as loss = xy - (1-y)x + (1-y)x - (1-y)log(1+exp(x)) = xy - (1-y)log(1+exp(x)). Then the final expression becomes loss = xy + (y - 1) x (x >= 0) - (1 - y) log(1 + exp(x - 2 x (x >= 0))) where y is the true label, x is the dot product and p = logistic(x). This kind of implementation is aligned with the current implementation of the original cross entropy in https://phabricator.intern.facebook.com/diffusion/FBS/browse/master/fbcode/caffe2/caffe2/operators/cross_entropy_op.cc;0bae3b5d0f825897c5e0dd0ff10f489d7271bf25$7-13 * Keep the array to fix the conflict * [C2] Compute Adagrad effective LR The AdagradWithLR op outputs an extra blob which contains the average effective learning rate across all weights in this blob. * Open-source extractMetaNetDef & runGlobalInitialization, add new Predictor constructor from db file, and add run_map_outputs 1. Open-source extractMetaNetDef and runGlobalInitialization, for use in 2. new Predictor constructor from db file. 3. Add new run function that returns outputs as TensorMap * Disable eigen cpu Disable eigen cpu in transpose and reduce * Introduce request_only/object_only property of ModelLayer by default this is False * A simple TC Caffe2 benchmark We can run the tuner, get MappingOptions and then use them to compare against cuBLAS currently broken due to LLVM issues. 
How to run: hg checkout eec1ab31b59c03b8deded1c755a9abaf8c45be01 add D7401202 add D7434625 add D7506031 add D7540728 buck run @mode/dev-nosan tc/tc/benchmarks_python:caffe2_benchmark * Move Caffe2 feature_maps_ops to open source Need feature maps operators in open source project facebookresearch/BlueWhale * Manually fix the conflicts in channel shuffle op * Fix the inconsistency between different gh and fbcode * Skip Adagrad GPU Test (Because some gpu implementation is missing) * Fix another test to make sure it won't run on gpu when implementation is not available yet	2018-05-01 20:49:00 -07:00
Bram Wasti	aa56a1211d	Update from facebook (#6871 ) * Track checkpoint performance in scuba As title. * [C2/CUDA]: fix cross entropy sigmoid with logits when adding log_d_trick, I forgot to add it to the cuda impl; this diff fixes it. * Back out "[caffe2] Unregister MKL fallbacks for NCHW conversions" Original commit changeset: 8918dd40205a Will land after @jongsoo's diff https://phabricator.intern.facebook.com/D7596315 lands * [Easy][C2] Don't add blob to external outputs from output_record if it's already external output As desc. * On Mobile phones, call GlobalInit with no arguments in predictor in case we need to perform initialization FACEBOOK: The QPL logger needs the initialization code. In the past, the initialization code is put in the pipeline calling Caffe2. However, those places become obsolete quickly, as the product teams change places to call Caffe2 from time to time. We also need to track which teams use Caffe2 so that we can put the initialization code there. With this diff, the initialization code is put in the predictor constructor, only enabled for mobile phones. This way, we can always enable QPL logging. Once we do this, we can check how many times Caffe2 inference is called in production, and which models are more popular in production. This way, we can prioritize our effort supporting those models. Will clean up the old code calling the init in the product in a separate diff. * add padding op for sparse length tensor to pad length-based sparse tensor with padding_value * Add conv_op with cudaconvnet engine Add conv_op with cudaconvnet engine * [numa] Fix simple NUMA copy benchmark Move XavierFill into init_net and also compute BW * call roundf (device function) instead of round (host function) * [caffe2_benchmark][observer] Make caffe2_benchmark use its own observer 1. Add ClearGlobalNetObservers() 2. 
Make caffe2_benchmark use its own observer and observer_reporter * [detectron] Use roundf instead of round in the detectron module ops * allow K larger than number of elements in top k op one use case is to use this op together with PackSegments for sparse tensors, where the number of elements in each slice is not statistically defined. * add ChannelShuffle DNNLOWP op * fixup math_cpu.cc break	2018-04-23 15:01:56 -07:00
xkszltl	7e1c5ca6d5	Add missing #include for CAFFE2_MODULE macro. (#6790 )	2018-04-19 20:46:09 -07:00
daquexian	63d42408d0	[Caffe2] Detectron fpn support (#6645 ) * [Caffe2] Update collect_and_distribe op to fit arbitrary size * [Caffe2] batch_permutation CPU implementation * Make requested changes	2018-04-18 10:00:49 -07:00
Yinghai Lu	434f710f3f	[Caffe2] Add support to TensorRT (#6150 ) * Add support to TensorRT * Removed License header * Bind input/output by position * Comments * More comments * Add benchmark * Add warning for performance degradation on large batch * Address comments * comments	2018-04-11 17:03:54 -07:00
Martin Schatz	8baa563daf	Change observer copy() method to take id parameter This diff is added to support the ProfileObserver in order to differentiate operators in the stepnet properly. Since copy() is only used in the context of RNNs, the name has been changed to reflect that.	2018-03-27 18:10:39 -07:00
Alexander Sidorov	e431c98205	Caffe2: Add support for several auto-created observers and move net summary to (#2304 ) a separate observer This allows to support several auto-attached observers.	2018-03-18 18:23:40 -07:00
Paul Jesse Hellemn	1df99e541c	Fixes for build errors on Windows with GPU (#2222 ) * Fixes for build errors on Windows with GPU * Typo	2018-03-11 15:44:14 -07:00
Yangqing Jia	dd1564b061	Caffe2 module update: move observers as well as binaries. (#2145 ) * Caffe2 module update: move observers as well as binaries. * Add threads linkage * Add Threads dependency to public interface	2018-03-06 14:45:21 -08:00
Dmytro Dzhulgakov	7d141d4243	Changes done internally at Facebook (#2154 ) f679c644e332 dzhulgakov [caffe2] Sync script - add ability to handle rebase conflicts 51729b061a15 dzhulgakov [caffe2] Changes done on GitHub	2018-03-06 01:23:54 -08:00
Yangqing Jia	56096c2311	Building rocksdb as a module (#2094 )	2018-03-01 12:01:44 -08:00
Yanghan Wang	2828c7a391	Moved RoIAlign to OSS. Reviewed By: newstzpz Differential Revision: D6775228 fbshipit-source-id: a9a6689fb5f6004f13ec03db8410fd81e2e6468e	2018-01-24 13:05:27 -08:00
Ross Girshick	8e4f67ed72	Enable the detectron module in cmake Summary: Closes https://github.com/caffe2/caffe2/pull/1761 Reviewed By: pietern Differential Revision: D6749288 Pulled By: rbgirshick fbshipit-source-id: cfdd2a6c9fe30b7e8f24b2e83e4bb0191d1893a0	2018-01-18 10:21:22 -08:00
Ross Girshick	d6423d9895	Import Detectron ops	2018-01-17 10:31:30 -08:00
Ilija Radosavovic	387b4234ea	Provide CMake support for detectron ops Reviewed By: Yangqing Differential Revision: D6637258 fbshipit-source-id: 72b2bf55a5f8ca8e322c8b65f62977416319ed9e	2018-01-03 06:23:14 -08:00
Yangqing Jia	545c0937fb	Making a module option for Caffe2 Summary: This will help releasing models that are using Caffe2 but have their own operator implementations and extensions. More detailed docs to arrive later. Let's see what contbuild says. Closes https://github.com/caffe2/caffe2/pull/1378 Differential Revision: D6155045 Pulled By: Yangqing fbshipit-source-id: 657a4c8de2f8e095bad5ed5db5b3e476b2a877e1	2017-10-26 12:33:58 -07:00
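
Note on commit 664fe34e0a (unjoined lr loss): the message states the loss first in terms of p = logistic(x) and then as a closed form intended to avoid overflow. The sketch below is not the Caffe2 operator. It is a small Python check, using hypothetical function names chosen for illustration, that the first definition and the final expression in the message agree on moderate inputs, and that the final expression stays finite for inputs where the direct form cannot be evaluated in floating point.

```python
import math


def unjoined_lr_loss_direct(x: float, y: float) -> float:
    """First definition from the commit message, with p = logistic(x).

    Hypothetical helper for illustration only. For large |x| this form breaks:
    exp(-x) overflows for very negative x, and 1 - p rounds to 0 for large x.
    """
    p = 1.0 / (1.0 + math.exp(-x))
    return y * (math.log(p) - math.log(1.0 - p)) + (1.0 - y) * math.log(1.0 - p)


def unjoined_lr_loss_stable(x: float, y: float) -> float:
    """Final expression from the commit message.

    (x >= 0) is treated as an indicator equal to 1 or 0, so the exponent
    x - 2 * x * (x >= 0) is always -|x| and exp() never overflows.
    """
    nonneg = 1.0 if x >= 0 else 0.0
    exponent = x - 2.0 * x * nonneg
    return x * y + (y - 1.0) * x * nonneg - (1.0 - y) * math.log1p(math.exp(exponent))


if __name__ == "__main__":
    # Compare the two forms where both can be evaluated safely.
    for x in (-5.0, -0.5, 0.0, 0.5, 5.0):
        for y in (0.0, 0.3, 1.0):
            direct = unjoined_lr_loss_direct(x, y)
            stable = unjoined_lr_loss_stable(x, y)
            print(f"x={x:+.1f} y={y:.1f} direct={direct:.12f} stable={stable:.12f}")
```

Using `math.log1p` for log(1 + exp(.)) is a choice made here for accuracy when the exponential is tiny; the commit message does not say how the operator evaluates that term.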

105 Commits