pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Xiang Gao 288ece89e1 Enable TF32 support for cuBLAS (#40800)	2020-07-14 13:21:10 -07:00

Summary: Benchmark on a fully connected network and torchvision models (time in seconds) on GA100:

| model              | batch size | forward(TF32) | forward(FP32) | backward(TF32) | backward(FP32) |
|--------------------|------------|---------------|---------------|----------------|----------------|
| FC 512-128-32-8    | 512        | 0.000211      | 0.000321      | 0.000499       | 0.000532       |
| alexnet            | 512        | 0.0184        | 0.0255        | 0.0486         | 0.0709         |
| densenet161        | 128        | 0.0665        | 0.204         | 0.108          | 0.437          |
| googlenet          | 256        | 0.0925        | 0.110         | 0.269          | 0.326          |
| inception_v3       | 256        | 0.155         | 0.214         | 0.391          | 0.510          |
| mnasnet1_0         | 512        | 0.108         | 0.137         | 0.298          | 0.312          |
| mobilenet_v2       | 512        | 0.114         | 0.294         | 0.133          | 0.303          |
| resnet18           | 512        | 0.0722        | 0.100         | 0.182          | 0.228          |
| resnext50_32x4d    | 256        | 0.170         | 0.237         | 0.373          | 0.479          |
| shufflenet_v2_x1_0 | 512        | 0.0463        | 0.0473        | 0.125          | 0.123          |
| squeezenet1_0      | 512        | 0.0870        | 0.0948        | 0.205          | 0.214          |
| vgg16              | 256        | 0.167         | 0.234         | 0.401          | 0.502          |
| wide_resnet50_2    | 512        | 0.186         | 0.310         | 0.415          | 0.638          |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40800
Reviewed By: mruberry
Differential Revision: D22517785
Pulled By: ngimel
fbshipit-source-id: 87334c8935616f72a6af5abbd3ae69f76923dc3e
amp_examples.rst	Small clarification of torch.cuda.amp multi-model example (#41203)	2020-07-10 11:13:26 -07:00
autograd.rst	Autograd Doc for Complex Numbers (#41012)	2020-07-10 09:57:43 -07:00
broadcasting.rst	[docs] Update broadcasting and cuda semantics notes (#6904)	2018-04-24 13:41:24 -04:00
cpu_threading_runtimes.svg	Update CPU threading doc (#33083)	2020-02-11 14:13:51 -08:00
cpu_threading_torchscript_inference.rst	Upgrade MKL-DNN to DNNL v1.2 (#32422)	2020-03-26 22:07:59 -07:00
cpu_threading_torchscript_inference.svg	Threading and CPU Inference note	2019-07-29 15:45:49 -07:00
cuda.rst	Enable TF32 support for cuBLAS (#40800)	2020-07-14 13:21:10 -07:00
ddp.rst	Fix wrong link in docs/source/notes/ddp.rst (#40484)	2020-06-28 13:55:56 -07:00
extending.rst	Prevent custom Functions from creating non differentiable type that requires grad (#38326)	2020-05-21 08:30:14 -07:00
faq.rst	Revert "Revert D21337640: [pytorch][PR] Split up documentation into subpages and clean up some warnings" (#37778)	2020-05-04 14:32:35 -07:00
large_scale_deployments.rst	Move ThreadLocalDebugInfo to c10 (#37774)	2020-05-11 19:27:41 -07:00
multiprocessing.rst	Update docs for master to remove Python 2 references (#36336)	2020-04-16 10:15:48 -07:00
randomness.rst	Enhance reproducibility documentation (#33795)	2020-03-06 15:32:04 -08:00
serialization.rst	Add documentation about storage sharing is preserved and serialized f… (#40412)	2020-06-29 17:23:29 -07:00
windows.rst	Update docs for master to remove Python 2 references (#36336)	2020-04-16 10:15:48 -07:00