Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15588
Use NHWC2NCHW or NCHW2NHWC functions which is easier to understand compared to code using transpose and generalizable to non-2D convolutions.
Reviewed By: csummersea
Differential Revision: D13557674
fbshipit-source-id: c4fdb8850503ea58f6b17b188513ae2b29691ec0
Summary:
This is mostly for reusing all the cudnn test cases in our python operator_tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12278
Differential Revision: D10842592
Pulled By: bddppq
fbshipit-source-id: 4b3ed91fca64ff02060837b3270393bc2f9a9898
Summary: Test in Jenkins fail becasue test_global_pooling_3d filtered too many tests. We made use of infered value of global_pooling (pad and stride will be constant) to reduce the test samples generated.
Reviewed By: pietern
Differential Revision: D6686840
fbshipit-source-id: d316c0e9f9070b12770170ab9f36e33de68a9ab9
Summary:
In D5681122 - when routing to global maxpool and average pool, the condition is not correct.
see T24876217 for discussion
Reviewed By: Yangqing
Differential Revision: D6665466
fbshipit-source-id: dcb5b4686249e6ee8e1e976ab66b003ef09b32fd
Summary:
This adds a fast path for global max pooling with NCHW. Compared to equivalent ReduceBackMean, this is about 3.5x faster.
Based on D5533059.
Reviewed By: akyrola
Differential Revision: D5681122
fbshipit-source-id: 7a4df934044c7dd01888f095f7dd46654aaf4eae
Summary:
KaimingHe debugged slow model, and found out that global average pooling was hideously slow, even with CUDNN. Turns out CUDNN pooling op (especially backward pass) is not optimized for global pooling.
This adds a fast path for global average pooling with NCHW. This is about 30x faster than CUDNN with 56 x 56 pooling, Compared to equivalent ReduceBackSum, this is about 3x faster.
I will bootcamp the max pooling.
Reviewed By: asaadaldien
Differential Revision: D5533059
fbshipit-source-id: 2d590693d737fa92184603663031d96f6145f304
Summary:
cuDNN versions of dropout and LRN (for native fp16 support), port of Caffe's max pooling algo that uses an explicit mask to store locations (also supports fp16 storage)
Closes https://github.com/caffe2/caffe2/pull/396
Reviewed By: akyrola
Differential Revision: D4990880
Pulled By: asaadaldien
fbshipit-source-id: a716acffb656843e9b31e3e6808bd2d8aa959d03
Summary:
Needed by oss.
This is done by running the following line:
find . -name "*_test.py" -exec sed -i '$ a \\nif __name__ == "__main__":\n import unittest\n unittest.main()' {} \;
Reviewed By: ajtulloch
Differential Revision: D4223848
fbshipit-source-id: ef4696e9701d45962134841165c53e76a2e19233