pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Xiang Gao	23174ca71b	[reland] Enable TF32 support for cuBLAS (#41498 ) Summary: fix rocm Pull Request resolved: https://github.com/pytorch/pytorch/pull/41498 Reviewed By: mruberry Differential Revision: D22560572 Pulled By: ngimel fbshipit-source-id: 5ee79e96cb29e70d9180830d058efb53d1c6c041	2020-07-15 21:00:55 -07:00
Shen Li	3a63a939d4	Revert D22517785: [pytorch][PR] Enable TF32 support for cuBLAS Test Plan: revert-hammer Differential Revision: D22517785 (`288ece89e1`) Original commit changeset: 87334c893561 fbshipit-source-id: 0a0674f49c1bcfc98f7f88af5a8c7de93b76e458	2020-07-15 08:15:48 -07:00
Xiang Gao	288ece89e1	Enable TF32 support for cuBLAS (#40800 ) Summary: Benchmark on a fully connected network and torchvision models (time in seconds) on GA100: \| model \| batch size \| forward(TF32) \| forward(FP32) \| backward(TF32) \| backward(FP32) \| \|--------------------\|------------\|---------------\|---------------\|----------------\|----------------\| \| FC 512-128-32-8 \| 512 \| 0.000211 \| 0.000321 \| 0.000499 \| 0.000532 \| \| alexnet \| 512 \| 0.0184 \| 0.0255 \| 0.0486 \| 0.0709 \| \| densenet161 \| 128 \| 0.0665 \| 0.204 \| 0.108 \| 0.437 \| \| googlenet \| 256 \| 0.0925 \| 0.110 \| 0.269 \| 0.326 \| \| inception_v3 \| 256 \| 0.155 \| 0.214 \| 0.391 \| 0.510 \| \| mnasnet1_0 \| 512 \| 0.108 \| 0.137 \| 0.298 \| 0.312 \| \| mobilenet_v2 \| 512 \| 0.114 \| 0.294 \| 0.133 \| 0.303 \| \| resnet18 \| 512 \| 0.0722 \| 0.100 \| 0.182 \| 0.228 \| \| resnext50_32x4d \| 256 \| 0.170 \| 0.237 \| 0.373 \| 0.479 \| \| shufflenet_v2_x1_0 \| 512 \| 0.0463 \| 0.0473 \| 0.125 \| 0.123 \| \| squeezenet1_0 \| 512 \| 0.0870 \| 0.0948 \| 0.205 \| 0.214 \| \| vgg16 \| 256 \| 0.167 \| 0.234 \| 0.401 \| 0.502 \| \| wide_resnet50_2 \| 512 \| 0.186 \| 0.310 \| 0.415 \| 0.638 \| Pull Request resolved: https://github.com/pytorch/pytorch/pull/40800 Reviewed By: mruberry Differential Revision: D22517785 Pulled By: ngimel fbshipit-source-id: 87334c8935616f72a6af5abbd3ae69f76923dc3e	2020-07-14 13:21:10 -07:00
Tongzhou Wang	973d51079b	Add device-specific cuFFT plan caches (#19300 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/19224 Pull Request resolved: https://github.com/pytorch/pytorch/pull/19300 Differential Revision: D14986967 Pulled By: soumith fbshipit-source-id: 8c31237db50d6924bba1472434c10326610d9255	2019-04-18 06:39:35 -07:00
Edward Yang	50df3e5e2e	Add ability to query if built with CUDA and MKL-DNN. (#18362 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18362 ghimport-source-id: 374b7ab97e2d6a894368007133201f510539296f Stack from [ghstack](https://github.com/ezyang/ghstack): * #18242 Test running a CUDA build on CPU machine. * #18362 Add ability to query if built with CUDA and MKL-DNN. Fixes #18108. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Differential Revision: D14584430 fbshipit-source-id: 7605a1ac4e8f2a7c70d52e5a43ad7f03f0457473	2019-03-25 10:39:09 -07:00
Tongzhou Wang	e6c7b38f94	Cache cufft plans (#8344 ) * cache cufft plans * use an LRU cache * suffix CuFFTParams members with _ * import print_function for py2 * lint * fix potential race; add dummy impl for CPU only builds * cpp formatting; remove nccl makefile change * Use CUDA hooks instead * comments and doc * update the error message * move LRU cachae to a separate file and native::detail namespace * update comment * specify NOTE location in CuFFTPlanCache.h * update disabled_features.yaml to make amd ci work * another fix for AMD CI in disabled_features.yaml * Wrap cufft_plan_cache_* methods in __HIP_PLATFORM_HCC__ * improve the notes * lint * revert onnx change * put back inlining for CUFFT_CHECK	2018-06-22 13:02:34 -04:00

6 Commits