Commit Graph

2 Commits

Igor Sugak
5dde8cd483 [caffe2] fix no matching function min/max Clang errors (#33563)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33563

When NVCC or Clang drives CUDA compilation, many math functions are declared by default, with a small difference: Clang marks them as `__device__` only, while NVCC marks them as both `__host__` and `__device__`. As a result, every unqualified `min` or `max` call from a `__host__` function produces a "no matching function" error when Clang is used.

Fix the errors by using `std::min` and `std::max` from `<algorithm>`; since C++14 they are `constexpr` and can also be used in `__device__` code [1].

1. https://llvm.org/docs/CompileCudaWithLLVM.html#algorithm

Test Plan:
```lang=bash
buck build mode/opt -c fbcode.cuda_use_clang=true //fblearner/flow/projects/dper:workflow
buck build mode/opt //fblearner/flow/projects/dper:workflow
```
Execute tests on devgpu:
```
buck test mode/dev-nosan -j 8 //caffe2/caffe2/python/operator_test/... //caffe2/test:cuda
```

Reviewed By: ngimel

Differential Revision: D20005795

fbshipit-source-id: 98a3f35e8a96c15d3ad3d2066396591f5cca1696
2020-02-28 11:33:24 -08:00
Bilge Acun
3ee97183b0 ScaleBlobs Operator (#19660)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19660

Implementation of aggregated Scale operator.
The operator takes a list of tensors as input and scales all of them by a single float argument.
The tensor sizes can differ, so bookkeeping of the sizes and pointers to the tensors is
necessary for the GPU version of the kernel.

Reviewed By: BIT-silence

Differential Revision: D14984233

fbshipit-source-id: 37cc97159a4f2c38cd6fff4f5710ab7d3a773611
2019-05-08 17:57:33 -07:00