Mirror of https://github.com/zebrajr/pytorch.git (synced 2025-12-06 12:20:52 +01:00)
Summary:
This PR moves glu to ATen (CPU).
Test script:
```
import torch
import torch.nn.functional as F
import time

torch.manual_seed(0)

def _time():
    # Wait for pending CUDA work (if any) so it is included in the timing.
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return time.time()

device = "cpu"

# warm up
for n in [10, 100, 1000, 10000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.ones(128, n // 2, device=device)
    for i in range(1000):
        output = F.glu(input)
        output.backward(grad_output)

for n in [10, 100, 1000, 10000]:
    fwd_t = 0
    bwd_t = 0
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.ones(128, n // 2, device=device)
    for i in range(10000):
        t1 = _time()
        output = F.glu(input)
        t2 = _time()
        output.backward(grad_output)
        t3 = _time()
        fwd_t = fwd_t + (t2 - t1)
        bwd_t = bwd_t + (t3 - t2)
    # Average per-iteration time, converted from seconds to milliseconds.
    fwd_avg = fwd_t / 10000 * 1000
    bwd_avg = bwd_t / 10000 * 1000
    print("input size(128, %d) forward time is %.2f (ms); backwad avg time is %.2f (ms)."
          % (n, fwd_avg, bwd_avg))
```
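For readers unfamiliar with the operator being benchmarked, here is a minimal pure-Python sketch of what GLU computes on a single row: the input is split in half along the last dimension into `a` and `b`, and the output is `a * sigmoid(b)`. This is why `grad_output` in the script has width `n // 2`. The function names `glu_forward` and `glu_backward` are illustrative assumptions, not part of PyTorch or of this PR, and the sketch says nothing about how the ATen kernel is actually written.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def glu_forward(row):
    # Hypothetical reference helper: split the row into halves a and b,
    # then gate a with sigmoid(b). Output has half the input width.
    half = len(row) // 2
    a, b = row[:half], row[half:]
    return [ai * sigmoid(bi) for ai, bi in zip(a, b)]


def glu_backward(row, grad_out):
    # Gradient w.r.t. the full input row, given the gradient of the output.
    # d(out)/da = sigmoid(b); d(out)/db = a * sigmoid(b) * (1 - sigmoid(b)).
    half = len(row) // 2
    a, b = row[:half], row[half:]
    grad_a = [g * sigmoid(bi) for g, bi in zip(grad_out, b)]
    grad_b = [g * ai * sigmoid(bi) * (1.0 - sigmoid(bi))
              for g, ai, bi in zip(grad_out, a, b)]
    return grad_a + grad_b


row = [2.0, 4.0, 0.0, 0.0]
out = glu_forward(row)
grad = glu_backward(row, [1.0, 1.0])
print(out, grad)
```

The gradient has the same width as the input, while the output has half of it, matching the shapes of `input` and `grad_output` used in the test script.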
Test device: **skx-8180.**
Before:
```
input size(128, 10) forward time is 0.04 (ms); backwad avg time is 0.08 (ms).
input size(128, 100) forward time is 0.06 (ms); backwad avg time is 0.14 (ms).
input size(128, 1000) forward time is 0.11 (ms); backwad avg time is 0.31 (ms).
input size(128, 10000) forward time is 1.52 (ms); backwad avg time is 2.04 (ms).
```
After:
```
input size(128, 10) forward time is 0.02 (ms); backwad avg time is 0.05 (ms).
input size(128, 100) forward time is 0.04 (ms); backwad avg time is 0.09 (ms).
input size(128, 1000) forward time is 0.07 (ms); backwad avg time is 0.17 (ms).
input size(128, 10000) forward time is 0.13 (ms); backwad avg time is 1.03 (ms).
```
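To make the comparison easier to read, the before/after ratios can be computed directly from the two logs. The snippet below is only a convenience sketch: the timings are copied from the logs above, and the helper name `speedups` is an assumption introduced here.

```python
# Timings in ms copied from the Before/After logs: n -> (forward, backward).
before = {10: (0.04, 0.08), 100: (0.06, 0.14),
          1000: (0.11, 0.31), 10000: (1.52, 2.04)}
after = {10: (0.02, 0.05), 100: (0.04, 0.09),
         1000: (0.07, 0.17), 10000: (0.13, 1.03)}


def speedups(before, after):
    """Return {n: (forward_speedup, backward_speedup)} as before/after ratios."""
    return {n: (before[n][0] / after[n][0], before[n][1] / after[n][1])
            for n in before}


for n, (fwd, bwd) in speedups(before, after).items():
    print("input size(128, %d): forward %.2fx, backward %.2fx" % (n, fwd, bwd))
```

Because the logged timings are rounded to two decimals, the ratios for the smallest inputs are coarse and should be read as rough indications only.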
Fix https://github.com/pytorch/pytorch/issues/24707, https://github.com/pytorch/pytorch/issues/24708.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33179
Differential Revision: D19839835
Pulled By: VitalyFedyunin
fbshipit-source-id: e4d3438556a1068da2c4a7e573d6bbf8d2a6e2b9
60 lines
1.7 KiB
Bash
Executable File
#!/bin/bash

set -ex

ignore_warning() {
  # Invert match to filter out $1.
  set +e
  grep -v "$1" doxygen-log.txt > temp.txt
  set -e
  mv temp.txt doxygen-log.txt
}

pushd "$(dirname "$0")/../../.."

cp aten/src/ATen/common_with_cwrap.py tools/shared/cwrap_common.py
cp torch/_utils_internal.py tools/shared

python aten/src/ATen/gen.py \
  -s aten/src/ATen \
  -d build/aten/src/ATen \
  aten/src/ATen/Declarations.cwrap \
  aten/src/THCUNN/generic/THCUNN.h \
  aten/src/ATen/nn.yaml \
  aten/src/ATen/native/native_functions.yaml

python tools/setup_helpers/generate_code.py \
  --declarations-path build/aten/src/ATen/Declarations.yaml \
  --nn-path aten/src

popd

# Run doxygen and log all output.
doxygen 2> original-doxygen-log.txt
cp original-doxygen-log.txt doxygen-log.txt

# Uncomment this if you need it for debugging; we're not printing this
# by default because it is confusing.
# echo "Original output"
# cat original-doxygen-log.txt

# Filter out some warnings.
ignore_warning "warning: no uniquely matching class member found for"
ignore_warning "warning: explicit link request to 'Item' could not be resolved"
ignore_warning "warning: Included by graph for 'types.h' not generated, too many nodes"

# Count the number of remaining warnings.
warnings="$(grep 'warning:' doxygen-log.txt | wc -l)"

echo "Treating all remaining warnings as errors"

if [[ "$warnings" -ne "0" ]]; then
  echo "Failing Doxygen test because the following warnings were treated fatally:"
  cat doxygen-log.txt
  echo "Please fix these warnings. To run this test locally, use docs/cpp/source/check-doxygen.sh"
  rm -f doxygen-log.txt original-doxygen-log.txt
  exit 1
fi

rm -f doxygen-log.txt original-doxygen-log.txt
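The filter-then-count logic of the script can be sketched in Python as follows. This is an illustrative model only: the helper name `remaining_warnings` and the sample log lines are hypothetical, and it uses plain substring matching, whereas `grep -v` in the script treats each pattern as a regular expression.

```python
def remaining_warnings(log_lines, ignored_substrings):
    """Drop lines containing any ignored substring, then keep the
    lines that still mention 'warning:'."""
    kept = [line for line in log_lines
            if not any(s in line for s in ignored_substrings)]
    return [line for line in kept if "warning:" in line]


# The three messages the script filters out.
ignored = [
    "warning: no uniquely matching class member found for",
    "warning: explicit link request to 'Item' could not be resolved",
    "warning: Included by graph for 'types.h' not generated, too many nodes",
]

# Hypothetical log lines, made up for this example.
log = [
    "a.h:1: warning: explicit link request to 'Item' could not be resolved",
    "b.h:2: warning: some other problem",
    "Generating docs...",
]

left = remaining_warnings(log, ignored)
exit_code = 1 if left else 0  # the script exits with 1 when any warning remains
print(left, exit_code)
```

As in the script, any warning that is not explicitly ignored is treated as a failure.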