pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Xiaomeng Yang	278d398748	Add GPU version of math::Transpose Summary: Add GPU version of math::Transpose Reviewed By: Yangqing Differential Revision: D6747958 fbshipit-source-id: 7047107609386c1ab53492381ca9bcf8bccd2924	2018-01-24 14:18:02 -08:00
Xiaomeng Yang	0a8a18ca01	Fix GemmBatched Summary: Fix GemmBatched Reviewed By: Yangqing Differential Revision: D6678168 fbshipit-source-id: 132117633573600d4e31c1959a0ccbe34416e1f1	2018-01-10 18:16:52 -08:00
Xian Li	c1d9694f42	Backed out changeset 6f532bad5824 Summary: D6636282 caused a regression test failure of the nmt model used in prod; see 24949620 for bisect history. Reviewed By: pietern Differential Revision: D6671602 fbshipit-source-id: d863013964666727cf488a6ac5b01f5216f149d9	2018-01-05 19:34:38 -08:00
Xiaomeng Yang	2cda295244	Adds cpu version of transpose util function in math. Summary: Adds transpose CPU version to prepare for LC layer. Reviewed By: Yangqing Differential Revision: D6641358 fbshipit-source-id: 1825b4c270dea2c0049ba334303abcbf50b22ee7	2018-01-04 23:05:40 -08:00
Xiaomeng Yang	68726df0ac	Fix GemmBatchedOp Summary: Fix GemmBatchedOp to prepare for LC Layer. Reviewed By: Yangqing Differential Revision: D6636282 fbshipit-source-id: 6f532bad582442ebf3da843e973eb85405371c02	2018-01-03 21:16:18 -08:00
Yangqing Jia	77484ecc45	Manually applying cudnn5 pull request. Summary: TSIA. Closes #1631 Reviewed By: pietern, Maratyszcza Differential Revision: D6626887 fbshipit-source-id: 1a2dc7c47bc6ce794fdf598fbd547c04029edce4	2018-01-02 15:31:33 -08:00
Yangqing Jia	59b2654544	reapply header change after xplat move Summary: This is a reapplication of the earlier PR due to xplat move. Original author is Christoph Conrads <christoph.conrads@fluent.ai> christoph-conrads . Reviewed By: houseroad Differential Revision: D6379736 fbshipit-source-id: b7482ecf3b9487a528c15e92976e915791210002	2017-11-22 13:04:37 -08:00
Xianjie Chen	d1c73eb407	use size_t for rand fill functions in math Summary: The number of elements in the caffe2 blob can be larger than int32. Use size_t to prevent overflow. Reviewed By: ajtulloch Differential Revision: D6278363 fbshipit-source-id: 356e294c667a53360d8a65b56a63a39d5ce3384e	2017-11-09 18:44:46 -08:00
Ilia Cherniavskii	1dbbef6b48	Fix crash in blob deallocation Summary: We have to use copy constructor in Concat when copying non-primitive types Reviewed By: Yangqing Differential Revision: D6002883 fbshipit-source-id: 0aebc955079975bb6423291589ed09ce0660acf3	2017-10-10 19:03:01 -07:00
Yangqing Jia	8286ce1e3a	Re-license to Apache Summary: Closes https://github.com/caffe2/caffe2/pull/1260 Differential Revision: D5906739 Pulled By: Yangqing fbshipit-source-id: e482ba9ba60b5337d9165f28f7ec68d4518a0902	2017-09-28 16:22:00 -07:00
Aapo Kyrola	fb45383ed6	resubmission of PR1175: fp16 BatchMatMul Summary: PR 1175 caused a build error because gemmBatched was only under a specific #ifdef. Now put it outside the #ifdef, and things work. Reviewed By: asaadaldien Differential Revision: D5834868 fbshipit-source-id: 072a64c8f4b259ff7504104121766115b46b8aa0	2017-09-14 21:46:05 -07:00
Yangqing Jia	f0d0361609	Revert D5794634: [caffe2][PR] fp16: BatchMatMul Summary: This reverts commit 911c462824edec3de529a5a4385a4c437e24bf59 bypass-lint Differential Revision: D5794634 fbshipit-source-id: 1863b02282329cbee6b10e5870f03051b4bb6c58	2017-09-13 18:46:47 -07:00
Luke Yeager	3cfc6f26e7	fp16: BatchMatMul Summary: Was https://github.com/caffe2/caffe2/pull/1151 Closes https://github.com/caffe2/caffe2/pull/1175 Reviewed By: Yangqing Differential Revision: D5794634 Pulled By: akyrola fbshipit-source-id: 911c462824edec3de529a5a4385a4c437e24bf59	2017-09-13 14:35:25 -07:00
Wojciech Glogowski	e27431ddf5	New math.h functions required by YellowFin Summary: New math.h functions required by YellowFin Reviewed By: akyrola Differential Revision: D5695258 fbshipit-source-id: b21a23b7f9647004173f8eb4f8ba9a852370d97a	2017-08-25 18:09:34 -07:00
Yangqing Jia	5954211ed9	Fix #997 Summary: cc phg1024 Closes https://github.com/caffe2/caffe2/pull/998 Differential Revision: D5538341 Pulled By: Yangqing fbshipit-source-id: 2df69e03c8c94c67628ab8051d2a863e93f49692	2017-08-01 11:21:00 -07:00
Wojciech Glogowski	f656e002a7	CosineSimilarity GPU Reviewed By: asaadaldien, akyrola Differential Revision: D5476812 fbshipit-source-id: d931a7d8e4a4dfdf22ee18f8b9c755cc21b0e75b	2017-07-25 13:34:01 -07:00
Matt Uyttendaele	7f28a891f3	added sincos function to caffe2/utils/math Summary: In situations where both sin & cos are necessary to compute, the joint SinCos function is faster than doing these individually. Both MKL and CUDA support this function, so exposing it here. Reviewed By: kmatzen Differential Revision: D5465588 fbshipit-source-id: 7686498e4f2d4b5862d83a1ecf14fcc88ea53640	2017-07-21 09:55:21 -07:00
Junjie Bai	4fddc04054	Use the same schema of switching to device reduce sum for SumSqrElements Summary: Based on benchmark script located at `caffe2/experiments/python/device_reduce_sum_bench.py`, device reduce sum is slower for N <= 10000, so we only switch to use device reduce for large N in SumElements. This diff applies the same schema for SumSqrElements. Reviewed By: jamesr66a Differential Revision: D5369868 fbshipit-source-id: ae13a611aff9d3464d1c4950ee155c740a2da339	2017-07-05 10:52:17 -07:00
Marat Dukhan	2ac9ff5c96	Cos, Sin, and Abs operators Summary: add Cos, Sin, and Abs operators Reviewed By: akyrola Differential Revision: D5307632 fbshipit-source-id: 743c9d289e4d3fd439e4b5385841cdff87d9247a	2017-07-03 22:18:32 -07:00
Junjie Bai	f3a59aedff	Use cub::DeviceReduce for faster math::Sum CUDA version Summary: Port SumElements and softmax_ops.cu to use device reduce sum Reviewed By: akyrola Differential Revision: D5351881 fbshipit-source-id: ca9604186c261ffcb1480da2a17baab8a4809372	2017-06-30 15:04:06 -07:00
Jeff Johnson	3f860af050	Implement TopKOp for GPU Summary: This is a real implementation (not GPUFallbackOp) of the TopKOp for GPU. There are two algorithm implementations: -for k <= 512, it maps to a warp-wide min-heap implementation, which requires only a single scan of the input data. -for k > 512, it maps to a multi-pass radix selection algorithm that I originally wrote in cutorch. I took the recent cutorch code and removed some cutorch-specific things as it made sense. Also added several utility files that one or the other implementations use, some from the Faiss library and some from the cutorch library. Reviewed By: jamesr66a Differential Revision: D5248206 fbshipit-source-id: ae5fa3451473264293516c2838f1f40688781cf3	2017-06-17 08:47:38 -07:00
Ahmed Taei	b294aadc66	fp16 support for FullyConnected op(Fixed) Summary: This diff resolves some issues in the reverted PR246. Differential Revision: D4911821 fbshipit-source-id: 0a6fa47f4c2405475697e40fb926758c534f8ef7	2017-04-19 12:49:12 -07:00
Aapo Kyrola	9ab077dc9d	Revert D4871248: [caffe2][PR] fp16 support for FullyConnected op Summary: This reverts commit 6a991c2c993dcf0b1e18aa3f2ffbe19e693dbadd Differential Revision: D4871248 fbshipit-source-id: b6d812d09a00c83e363432e84742c503abfed65b	2017-04-17 21:31:20 -07:00
Simon Layton	1082db600e	fp16 support for FullyConnected op Summary: Includes math lib support, removal of double-precision. Closes https://github.com/caffe2/caffe2/pull/246 Reviewed By: Yangqing Differential Revision: D4871248 Pulled By: asaadaldien fbshipit-source-id: 6a991c2c993dcf0b1e18aa3f2ffbe19e693dbadd	2017-04-17 12:07:57 -07:00
Aapo Kyrola	092c1440a2	SumSqrElements Summary: Added SumSqrElements, which avoids the large temporary blob needed when doing Sqr + SumElements. Also moved it to reduction_ops, because utility_ops has grown too big. Reviewed By: jamesr66a Differential Revision: D4844172 fbshipit-source-id: 032eec45e24d6724f0d5fb83f4ec1c771d1146e5	2017-04-10 16:16:52 -07:00
Aapo Kyrola	ed44e87f98	use striped batch add for the recurrent network gradient Summary: Instead of calling math::Add batch-size many times, added a new function that does a batch of additions. For CPU there is no difference, but for CUDA we do everything in one kernel. I don't think this has a huge performance impact, but it at least makes the CUDA profiling look better with fewer kernel launches. Reviewed By: jamesr66a Differential Revision: D4798411 fbshipit-source-id: 44ac65b2da5a615971219809b9298b4e122085cd	2017-03-30 08:57:16 -07:00
Ahmed Taei	e41d35909a	Conv-ND NCHW CPU/CUDA implementation Summary: Migrate caffe1 ConvNd implementation to caffe2. Reviewed By: Yangqing Differential Revision: D4659868 fbshipit-source-id: 14b178af3faa2c0b12e5a9f7aa76c1d8945419ea	2017-03-20 14:01:07 -07:00
Yangqing Jia	1741fd839f	Re-apply windows diff D4657831 Summary: (Note: previous revert was due to a race condition between D4657831 and D4659953 that I failed to catch.) After this, we should have contbuild guarding the Windows build both with and without CUDA. This includes a series of changes that are needed to make Windows build, specifically: (1) Various flags that are needed in the cmake system, especially dealing with /MD, /MT, cuda, cudnn, whole static linking, etc. (2) Contbuild scripts based on AppVeyor. (3) For Windows build, note that one will need to use "cmake --build" to build stuff so that the build type is consistent between configuration and actual build. See scripts\build_windows.bat for details. (4) In logging.h, ERROR is already defined by Windows. I don't have a good solution now, and as a result, LOG(ERROR) on windows is going to be LOG(INFO). (5) variable length array is not supported by MSVC (and it is not part of C++ standard). As a result I replaced them with vectors. (6) sched.h is not available on Windows, so akyrola 's awesome simple async net might encounter some slowdown due to no affinity setting on Windows. (7) MSVC has a bug that does not work very well with template calls inside a templated function call, which is a known issue that should be fixed in MSVC 2017. However for now this means changes to conv_op_impl.h and recurrent_net_op.h. No actual functionalities are changed. (8) std host function calls are not supported in CUDA8+MSVC, so I changed lp_pool (and maybe a few others) to use cuda device functions. (9) The current Scale and Axpy have heavy templating that does not work well with MSVC. As a result I reverted azzolini 's changes to the Scale and Axpy interface, moved the fixed-length version to ScaleFixedSize and AxpyFixedSize. (10) CUDA + MSVC does not deal with Eigen well, so I guarded all Eigen parts to only the non-CUDA part. (11) In conclusion, it is fun but painful to deal with visual c++. Differential Revision: D4666745 fbshipit-source-id: 3c9035083067bdb19a16d9c345c1ce66b6a86600	2017-03-07 11:02:12 -08:00
Avani Nandini	039c3cf0ba	Revert D4657831: [caffe2][PR] Changes for Windows build to pass. Summary: This reverts commit 070ded372ed78a7e3e3919fdffa1d337640f146e Differential Revision: D4657831 fbshipit-source-id: 3a0fb403936a9257776d637ce3ba5dbd81e1119f	2017-03-06 21:02:36 -08:00
Yangqing Jia	7b8c7b11d2	Changes for Windows build to pass. Summary: After this, we should have contbuild guarding the Windows build both with and without CUDA. This includes a series of changes that are needed to make Windows build, specifically: (1) Various flags that are needed in the cmake system, especially dealing with /MD, /MT, cuda, cudnn, whole static linking, etc. (2) Contbuild scripts based on AppVeyor. (3) For Windows build, note that one will need to use "cmake --build" to build stuff so that the build type is consistent between configuration and actual build. See scripts\build_windows.bat for details. (4) In logging.h, ERROR is already defined by Windows. I don't have a good solution now, and as a result, LOG(ERROR) on windows is going to be LOG(INFO). (5) variable length array is not supported by MSVC (and it is not part of C++ standard). As a result I replaced them with vectors. (6) sched.h is not available on Windows, so akyrola 's awesome simple async net might encounter some slowdown due to no affinity setting on Windows. (7) MSVC has a Closes https://github.com/caffe2/caffe2/pull/183 Reviewed By: ajtulloch Differential Revision: D4657831 Pulled By: Yangqing fbshipit-source-id: 070ded372ed78a7e3e3919fdffa1d337640f146e	2017-03-06 20:03:37 -08:00
Kittipat Virochsiri	718786add7	UniqueUniformFillOp Summary: This is like `UniformIntFill` but guarantee to return unique elements in the output, excluding the optional avoiding elements. Reviewed By: xianjiec Differential Revision: D4511814 fbshipit-source-id: 5dc98ee580616e60e46ee74ebb3f5ddd29a09965	2017-02-15 16:00:44 -08:00
Yangqing Jia	d87edd39e7	math gemm interface fix Summary: I don't know why I did this embarrassing bug that changes the order of ldb and beta in the gemm interface. This fixes that. Differential Revision: D4014493 fbshipit-source-id: 1aec950b6e9d57e947654d4044e50930f2db1344	2016-12-19 10:45:20 -08:00
Xianjie Chen	dea27ca4ca	use TIndex for set in math.h Summary: as desc Differential Revision: D4271900 fbshipit-source-id: 92f7cbbe33e0ce4fcc21a8af9ded4f436afb43e2	2016-12-05 11:53:27 -08:00
Yangqing Jia	589398950f	fbsync at f5a877	2016-11-18 15:41:06 -08:00
Yangqing Jia	238ceab825	fbsync. TODO: check if build files need update.	2016-11-15 00:00:46 -08:00
Yangqing Jia	b23e51d467	chunky sync	2016-09-06 15:55:19 -07:00
Yangqing Jia	05512d1e10	sync	2016-08-10 11:02:15 -07:00
Yangqing Jia	6463eebc7b	chunky sync - build scripts to be written	2016-07-21 10:16:42 -07:00
Yangqing Jia	559053d3a8	chunky sync	2016-05-13 14:43:48 -07:00
Yangqing Jia	4ae1bbbd7e	bugfix	2016-03-11 10:30:16 -08:00
Yangqing Jia	50874dc746	relu and pool wip	2016-02-01 14:08:10 -08:00
Yangqing Jia	1740974347	average pooling wrapper: without this the NHWC path would throw an error as the order is not passed along.	2016-01-22 09:31:49 -08:00
Yangqing Jia	98c5b86ef7	A few changes: (1) cudnn for conv (2) cublas: after going through the work I feel it's better to use HOST pointer mode, so changed it. (3) storage order: although googlenet and multibox use NHWC, it seems better to keep NCHW as the default to be consistent with caffe and cudnn; moved to NCHW as default.	2015-10-21 22:37:11 -07:00
Yangqing Jia	648d1b101a	A consolidation of a couple random weekend work. (1) various bugfixes. (2) Tensor is now a class independent from its data type. This allows us to write easier type-independent operators. (3) code convention changes a bit: dtype -> T, Tensor<Context> -> Tensor alias. (4) ParallelNet -> DAGNet to be more consistent with what it does. (5) Caffe's own flags library instead of gflags. (6) Caffe's own logging library instead of glog, but glog can be chosen with compile-time definition -DCAFFE2_USE_GOOGLE_GLOG. As a result, glog macros like CHECK, DCHECK now have prefix CAFFE_, and LOG() now becomes CAFFE_LOG_. (7) an optional protobuf inclusion, which can be chosen with USE_SYSTEM_PROTOBUF in build_env.py.	2015-10-11 23:14:06 -07:00
Yangqing Jia	2ed1077a83	A clean init for Caffe2, removing my earlier hacky commits.	2015-06-25 16:26:01 -07:00

95 Commits