Mirror of https://github.com/zebrajr/pytorch.git (synced 2025-12-07 12:21:27 +01:00)
Summary: Should close https://github.com/pytorch/pytorch/issues/42218

Numerically, `grid_sampler` is fine in fp16 or fp32. However, it takes several inputs and expects their dtypes to match, so it belongs on the autocast promote list.

`grid_sampler` currently uses `gpuAtomicAdd`. This is notoriously slow in fp16 because it calls CUDA's `atomicAdd` overload for `__half`, which uses a software compare-and-swap loop internally. To allow good performance when both inputs happen to be fp16, this PR also modifies the `grid_sampler_2d_backward_kernel` and `grid_sampler_3d_backward_kernel` kernels to use `fastAtomicAdd` instead.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58618

Reviewed By: mruberry

Differential Revision: D29257199

Pulled By: ngimel

fbshipit-source-id: 3cc7505945b480427f2fc1beb36bee80bf3853b3
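The autocast "promote" policy described in the summary can be sketched in plain Python. This is a minimal illustration under stated assumptions, not PyTorch's implementation. `FakeTensor`, `FLOAT_RANK`, `autocast_promote`, and `fake_grid_sampler` are all hypothetical names. A "tensor" here is just a `(dtype, data)` pair, and "casting" only relabels the dtype. The sketch shows the idea: cast every input to the widest input dtype before calling an op that requires matching dtypes.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in for a tensor: (dtype name, payload).
FakeTensor = Tuple[str, List[float]]

# Hypothetical width ranking of floating-point dtypes, narrowest first.
FLOAT_RANK = {"float16": 0, "float32": 1, "float64": 2}


def widest_dtype(dtypes: Sequence[str]) -> str:
    """Return the widest floating-point dtype among the inputs."""
    return max(dtypes, key=lambda d: FLOAT_RANK[d])


def autocast_promote(op: Callable[..., FakeTensor]) -> Callable[..., FakeTensor]:
    """Wrap `op` so every input is cast to the widest input dtype first."""
    def wrapped(*inputs: FakeTensor) -> FakeTensor:
        target = widest_dtype([dtype for dtype, _ in inputs])
        # "Casting" here only relabels the dtype; a real cast would convert data.
        cast_inputs = [(target, data) for _, data in inputs]
        return op(*cast_inputs)
    return wrapped


def fake_grid_sampler(inp: FakeTensor, grid: FakeTensor) -> FakeTensor:
    """Hypothetical stand-in for an op that requires matching input dtypes."""
    if inp[0] != grid[0]:
        raise TypeError(f"dtype mismatch: {inp[0]} vs {grid[0]}")
    # Elementwise product as a placeholder computation.
    return (inp[0], [x * g for x, g in zip(inp[1], grid[1])])


promoted_grid_sampler = autocast_promote(fake_grid_sampler)

# Mixed fp16/fp32 inputs are promoted to float32 instead of failing the check.
mixed = promoted_grid_sampler(("float16", [1.0, 2.0]), ("float32", [0.5, 0.25]))
# All-fp16 inputs stay in float16.
half = promoted_grid_sampler(("float16", [1.0, 2.0]), ("float16", [0.5, 0.25]))
print(mixed)  # ('float32', [0.5, 0.5])
print(half)   # ('float16', [0.5, 0.5])
```

Called directly, `fake_grid_sampler` rejects an fp16 input paired with an fp32 grid. Promotion removes that failure without forcing everything to fp32. When both inputs are already fp16, the op stays in fp16, which is the case the `fastAtomicAdd` change is meant to speed up.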

| Name |
|---|
| caffe2 |
| cpp |
| source |
| .gitignore |
| libtorch.rst |
| make.bat |
| Makefile |
| README.md |
| requirements.txt |

Please see the Writing documentation section of CONTRIBUTING.md for details on both writing and building the docs.