Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33563
When NVCC or Clang drives CUDA compilation, many math functions are declared by default, with a small difference: Clang marks them as `__device__` only, while NVCC marks them as both `__host__` and `__device__`. As a result, every unqualified `min` or `max` call made from a `__host__` function becomes a compilation error when Clang is used.
Fix the errors by using `std::min` and `std::max` from `<algorithm>`: since C++14 they are `constexpr` and can also be used in `__device__` code [1].
1. https://llvm.org/docs/CompileCudaWithLLVM.html#algorithm
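For illustration, a minimal sketch of the failure mode and the fix; the functions below are hypothetical and are not code from this diff.
```lang=cpp
#include <algorithm>

__host__ float clamp_on_host(float v, float lo, float hi) {
  // With Clang driving CUDA compilation, an unqualified `min(...)`/`max(...)`
  // here would resolve only to the __device__-only declarations and fail to
  // compile. The constexpr std::min/std::max from <algorithm> are callable
  // from __host__ code as usual.
  return std::min(std::max(v, lo), hi);
}

__device__ float clamp_on_device(float v, float lo, float hi) {
  // Since C++14 the same constexpr std:: functions are also usable in
  // __device__ code when compiling with Clang (see [1]).
  return std::min(std::max(v, lo), hi);
}
```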
Test Plan:
```lang=bash
buck build mode/opt -c fbcode.cuda_use_clang=true //fblearner/flow/projects/dper:workflow
buck build mode/opt //fblearner/flow/projects/dper:workflow
```
Execute tests on devgpu:
```lang=bash
buck test mode/dev-nosan -j 8 //caffe2/caffe2/python/operator_test/... //caffe2/test:cuda
```
Reviewed By: ngimel
Differential Revision: D20005795
fbshipit-source-id: 98a3f35e8a96c15d3ad3d2066396591f5cca1696
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19660
Implementation of aggregated Scale operator.
The operator takes a list of tensors as input and scales all of them by a float argument.
The tensor sizes can differ, so the GPU version of the kernel must keep track of the sizes of,
and pointers to, the individual tensors.
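For illustration, a hypothetical CUDA sketch of that bookkeeping, not the actual Caffe2 kernel: the kernel receives parallel device arrays with one data pointer and one element count per input tensor, and scales every element by `scale`.
```lang=cpp
__global__ void ScaleManyKernel(
    float* const* data,   // device array of pointers to each tensor's data
    const int* sizes,     // device array of per-tensor element counts
    int num_tensors,
    float scale) {
  // Blocks in the y-dimension walk over tensors; threads in the
  // x-dimension walk over the elements of each tensor.
  for (int t = blockIdx.y; t < num_tensors; t += gridDim.y) {
    float* ptr = data[t];
    const int n = sizes[t];
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x) {
      ptr[i] *= scale;
    }
  }
}
```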
Reviewed By: BIT-silence
Differential Revision: D14984233
fbshipit-source-id: 37cc97159a4f2c38cd6fff4f5710ab7d3a773611