Commit Graph

21 Commits

Author SHA1 Message Date
Lingyi Liu
2d884f2263 Optimize Scale function (#44913)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44913

Pull Request resolved: https://github.com/pytorch/pytorch/pull/18322

Optimize Scale function

i-am-not-moving-c2-to-c10

Test Plan: buck test mode/dbg caffe2/caffe2/python/operator_test:weighted_sum_test

Reviewed By: BIT-silence

Differential Revision: D14575780

fbshipit-source-id: db333a7964581dcaff6e432ff1d6b517ba1a075f
2020-09-18 14:31:33 -07:00
Kevin Matzen
6d8649dc53 [caffe2] fix Transpose2D calls in NHWC<->NCHW (#34625)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34625

These templated function calls do not specify the template arguments correctly. The first argument is the index type, not the array data type, so right now `T` is being used as the index type as well, which will break if we add a template specialization for uint8_t. If we omit both, the compiler correctly infers that the index type is `int` and the data type is `T`.

Reviewed By: BIT-silence

Differential Revision: D20358728

fbshipit-source-id: 8cbd8eeb14bce602c02eb6fce2cc141f0121fa24
2020-03-16 15:18:44 -07:00
Igor Sugak
23846d5a38 [caffe2] use Clang identification macro in various places (#33574)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33574

Sprinkle the Clang identification macro over the places that would otherwise cause build errors when Clang is used to drive the CUDA compilation.

Note: `__clang__` is defined both when Clang is used as the host compiler by NVCC and when Clang drives the compilation itself. `__CUDA__` is defined only in the latter case.

Test Plan:
```lang=bash
buck build mode/opt -c fbcode.cuda_use_clang=true //fblearner/flow/projects/dper:workflow
buck build mode/opt //fblearner/flow/projects/dper:workflow
```

Reviewed By: BIT-silence

Differential Revision: D20007440

fbshipit-source-id: 53caa70695b99461a3910d41dc71a9f6d0728a75
2020-02-20 15:16:11 -08:00
Gregory Chanan
2f03205c65 Support torch::tensor and at::tensor with bool and BFloat16 dtypes.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23337

Test Plan: Imported from OSS

Differential Revision: D16467979

Pulled By: gchanan

fbshipit-source-id: 2e6ad431c47a61c917d501390d14c55b788958ab
2019-08-09 12:36:35 -07:00
Xiaomeng Yang
29b53b0259 Fix bug in caffe2 transpose on GPU (#22233)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22233

Fix bug in caffe2 transpose on GPU

Reviewed By: hl475

Differential Revision: D15994973

fbshipit-source-id: 542dc8757b51a6322fffa55826c1d4e32927398d
2019-06-26 11:33:25 -07:00
Xiaomeng Yang
2ce39de3fc Add elementwise_affine for layer_norm_op (#19713)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19713

Add elementwise_affine for layer_norm_op

Reviewed By: houseroad

Differential Revision: D15075454

fbshipit-source-id: e8a7d3da1c81e49fa55323f5e74a68bc4ef8d83f
2019-04-26 17:20:01 -07:00
Xiaomeng Yang
fb9fc42a0c optimize BatchMatmulOp (#18612)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18612

optimize BatchMatmulOp

Reviewed By: houseroad

Differential Revision: D14681665

fbshipit-source-id: cf5ea4909ace58fd44fe6fa634531102ac84e851
2019-04-23 15:34:59 -07:00
Xiaomeng Yang
fd40c0eba0 Add gelu op (#18992)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18992

Add gelu op

Reviewed By: houseroad

Differential Revision: D14814811

fbshipit-source-id: 00f126b8b83763c57ebbf28fbd2de5a8fab6d491
2019-04-08 21:58:29 -07:00
Xiaomeng Yang
265fa0ce4d Move math::Axpy function to elementwise lib (#18316)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18316

Move math::Axpy function to elementwise lib

i-am-not-moving-c2-to-c10

Reviewed By: houseroad

Differential Revision: D14574697

fbshipit-source-id: 7cfbb2da295c8966c5328bd6b577cce2638eea62
2019-03-26 12:19:19 -07:00
nihui
ed8c462dc7 Fix caffe2 build with BLAS=OpenBLAS (#18422)
Summary:
g++ complains about missing declarations for the cblas_sscal and cblas_dscal BLAS functions.
let's fix it :)

Fedora 29, gcc 8.3.1, OpenBLAS 0.3.5
built with cmake -DBLAS=OpenBLAS ..
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18422

Differential Revision: D14598977

Pulled By: soumith

fbshipit-source-id: bde77bfb359d2ff38226401caeed78c114ef7468
2019-03-25 11:59:10 -07:00
Xiaomeng Yang
e04c9195b7 Update math::Transpose to support tensor with size > 2G (#17670)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17670

Update math::Transpose to support tensor with size > 2G

i-am-not-moving-c2-to-c10

Differential Revision: D14313624

fbshipit-source-id: 0b4a85b913972e5a8981f0d40d0c539407b98f30
2019-03-20 18:22:21 -07:00
Xiaomeng Yang
0fd1dc45c0 Optimize LayerNormOp (#17604)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17604

Optimize LayerNormOp

i-am-not-moving-c2-to-c10

Reviewed By: houseroad

Differential Revision: D14274175

fbshipit-source-id: a7aa263a1b0eb109682d2be99306e7b2cdcc0faf
2019-03-08 17:38:14 -08:00
Xiaomeng Yang
9709d5e787 Fix math::Set for large tensor (#17539)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17539

Fix math::Set for large tensor

i-am-not-moving-c2-to-c10

Reviewed By: dzhulgakov, houseroad

Differential Revision: D14240756

fbshipit-source-id: 0ade26790be41fb26d2cc193bfa3082c7bd4e69d
2019-02-27 12:34:58 -08:00
Xiaomeng Yang
2e67b34ea7 Separate gpu reduce functions (#17146)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17146

Separate gpu reduce functions

i-am-not-moving-c2-to-c10

Reviewed By: houseroad

Differential Revision: D14097564

fbshipit-source-id: a27de340997111a794b1d083c1673d4263afb9fb
2019-02-20 14:49:01 -08:00
Xiaomeng Yang
3a34f443c5 Separate reduce functions from math (#16929)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16929

Separate CPU reduce functions from math

i-am-not-moving-c2-to-c10

Reviewed By: houseroad

Differential Revision: D13999469

fbshipit-source-id: bd628b15a6e3c1f04cc62aefffb0110690e1c0d1
2019-02-13 17:50:47 -08:00
Xiaomeng Yang
2db847b3a7 Separate elementwise level2 math functions (#16753)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16753

Separate elementwise level2 math functions

i-am-not-moving-c2-to-c10

Reviewed By: houseroad

Differential Revision: D13954928

fbshipit-source-id: 1ca7a5d3da96e32510f502e5e4e79168854bee67
2019-02-07 18:38:26 -08:00
Xiaomeng Yang
7d4a81cbb2 Use macro for reduce on 2d blocks (#16344)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16344

Use macro for reduce on 2d blocks

i-am-not-moving-c2-to-c10

Reviewed By: houseroad

Differential Revision: D13808988

fbshipit-source-id: b68c0fb6079c1b6e203a072083aba7a95c202bc2
2019-02-01 23:49:07 -08:00
Xiaomeng Yang
598b713660 Separate level1 elementwise functions from math (#16397)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16397

Separate level1 elementwise functions from math

i-am-not-moving-c2-to-c10

Reviewed By: houseroad

Differential Revision: D13830626

fbshipit-source-id: e6e672647076dab8b3b24be181f580a1486250c9
2019-01-30 00:04:12 -08:00
Xiaomeng Yang
0a2d14dd7c Optimize SpatialBNOp on GPU (#16395)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16395

Optimize SpatialBNOp on GPU

i-am-not-moving-c2-to-c10

Reviewed By: houseroad

Differential Revision: D13829833

fbshipit-source-id: 04d2a63e8e9830c4c39a91cf87fcd7aa765dc55f
2019-01-28 09:36:45 -08:00
Xiaomeng Yang
866c4e3467 Separate Moments from math and optimize it (#16175)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16175

Separate Moments from math and optimize it

i-am-not-moving-c2-to-c10

Reviewed By: houseroad

Differential Revision: D13742472

fbshipit-source-id: 90757d908d38c98ca69818855aaf68315e525992
2019-01-20 08:53:25 -08:00
Xiaomeng Yang
b436f94b53 Separate affine_channel from math and optimize it (#16135)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16135

Separate affine_channel from math and optimize it

i-am-not-moving-c2-to-c10

Reviewed By: houseroad

Differential Revision: D13727606

fbshipit-source-id: 8980af4afadaf964a18a9da581106fe30896a7e9
2019-01-18 22:40:16 -08:00