pytorch/docs/source/notes
Xiang Gao 288ece89e1 Enable TF32 support for cuBLAS (#40800)
Summary:
Benchmark on a fully connected network and torchvision models (time in seconds) on GA100:

| model              | batch size | forward(TF32) | forward(FP32) | backward(TF32) | backward(FP32) |
|--------------------|------------|---------------|---------------|----------------|----------------|
| FC 512-128-32-8    | 512        | 0.000211      | 0.000321      | 0.000499       | 0.000532       |
| alexnet            | 512        | 0.0184        | 0.0255        | 0.0486         | 0.0709         |
| densenet161        | 128        | 0.0665        | 0.204         | 0.108          | 0.437          |
| googlenet          | 256        | 0.0925        | 0.110         | 0.269          | 0.326          |
| inception_v3       | 256        | 0.155         | 0.214         | 0.391          | 0.510          |
| mnasnet1_0         | 512        | 0.108         | 0.137         | 0.298          | 0.312          |
| mobilenet_v2       | 512        | 0.114         | 0.294         | 0.133          | 0.303          |
| resnet18           | 512        | 0.0722        | 0.100         | 0.182          | 0.228          |
| resnext50_32x4d    | 256        | 0.170         | 0.237         | 0.373          | 0.479          |
| shufflenet_v2_x1_0 | 512        | 0.0463        | 0.0473        | 0.125          | 0.123          |
| squeezenet1_0      | 512        | 0.0870        | 0.0948        | 0.205          | 0.214          |
| vgg16              | 256        | 0.167         | 0.234         | 0.401          | 0.502          |
| wide_resnet50_2    | 512        | 0.186         | 0.310         | 0.415          | 0.638          |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/40800

Reviewed By: mruberry

Differential Revision: D22517785

Pulled By: ngimel

fbshipit-source-id: 87334c8935616f72a6af5abbd3ae69f76923dc3e
2020-07-14 13:21:10 -07:00
..
amp_examples.rst Small clarification of torch.cuda.amp multi-model example (#41203) 2020-07-10 11:13:26 -07:00
autograd.rst Autograd Doc for Complex Numbers (#41012) 2020-07-10 09:57:43 -07:00
broadcasting.rst [docs] Update broadcasting and cuda semantics notes (#6904) 2018-04-24 13:41:24 -04:00
cpu_threading_runtimes.svg Update CPU threading doc (#33083) 2020-02-11 14:13:51 -08:00
cpu_threading_torchscript_inference.rst Upgrade MKL-DNN to DNNL v1.2 (#32422) 2020-03-26 22:07:59 -07:00
cpu_threading_torchscript_inference.svg Threading and CPU Inference note 2019-07-29 15:45:49 -07:00
cuda.rst Enable TF32 support for cuBLAS (#40800) 2020-07-14 13:21:10 -07:00
ddp.rst Fix wrong link in docs/source/notes/ddp.rst (#40484) 2020-06-28 13:55:56 -07:00
extending.rst Prevent custom Functions from creating non differentiable type that requires grad (#38326) 2020-05-21 08:30:14 -07:00
faq.rst Revert "Revert D21337640: [pytorch][PR] Split up documentation into subpages and clean up some warnings" (#37778) 2020-05-04 14:32:35 -07:00
large_scale_deployments.rst Move ThreadLocalDebugInfo to c10 (#37774) 2020-05-11 19:27:41 -07:00
multiprocessing.rst Update docs for master to remove Python 2 references (#36336) 2020-04-16 10:15:48 -07:00
randomness.rst Enhance reproducibility documentation (#33795) 2020-03-06 15:32:04 -08:00
serialization.rst Add documentation about storage sharing is preserved and serialized f… (#40412) 2020-06-29 17:23:29 -07:00
windows.rst Update docs for master to remove Python 2 references (#36336) 2020-04-16 10:15:48 -07:00