Lingyi Liu
2d884f2263
Optimize Scale function ( #44913 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44913
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18322
Optimize Scale function
i-am-not-moving-c2-to-c10
Test Plan: buck test mode/dbg caffe2/caffe2/python/operator_test:weighted_sum_test
Reviewed By: BIT-silence
Differential Revision: D14575780
fbshipit-source-id: db333a7964581dcaff6e432ff1d6b517ba1a075f
2020-09-18 14:31:33 -07:00
Kevin Matzen
6d8649dc53
[caffe2] fix Transpose2D calls in NHWC<->NCHW ( #34625 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34625
These templated function calls are not specifying the template args correctly. The first arg is the index type, not the array data type. That means, right now it's using `T` as the index type as well, which will break if we do a template specialization for uint8_t. If we omit both, it will correctly infer that the index type is `int` and the data type is `T`.
Reviewed By: BIT-silence
Differential Revision: D20358728
fbshipit-source-id: 8cbd8eeb14bce602c02eb6fce2cc141f0121fa24
2020-03-16 15:18:44 -07:00
Igor Sugak
23846d5a38
[caffe2] use Clang identification macro in various places ( #33574 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33574
Add the Clang identification macro in places that would otherwise cause build errors when Clang is used to drive the CUDA compilation.
Note: `__clang__` is defined when either Clang is used as host compiler by NVCC or when Clang drives the compilation. `__CUDA__` is defined only for the latter case.
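The distinction in the note can be sketched as a preprocessor check. This is only an illustration of how the two macros combine; the function name `ClangCudaMode` and the returned labels are hypothetical and do not appear in the change itself.

```cpp
#include <string>

// Hypothetical helper: reports which of the cases from the note applies
// to the translation unit being compiled.
inline std::string ClangCudaMode() {
#if defined(__clang__) && defined(__CUDA__)
  // Clang itself drives the CUDA compilation.
  return "clang-drives-cuda";
#elif defined(__clang__)
  // Clang is present, but not driving CUDA (for example, as NVCC's host
  // compiler or in a plain C++ build).
  return "clang-without-cuda-driver";
#else
  // Some other compiler.
  return "not-clang";
#endif
}
```

Checking both macros together is what separates "Clang drives CUDA" from "Clang is merely the host compiler", since `__clang__` alone is defined in both situations.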
Test Plan:
```lang=bash
buck build mode/opt -c fbcode.cuda_use_clang=true //fblearner/flow/projects/dper:workflow
buck build mode/opt //fblearner/flow/projects/dper:workflow
```
Reviewed By: BIT-silence
Differential Revision: D20007440
fbshipit-source-id: 53caa70695b99461a3910d41dc71a9f6d0728a75
2020-02-20 15:16:11 -08:00
Gregory Chanan
2f03205c65
Support torch::tensor and at::tensor with bool and BFloat16 dtypes.
...
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23337
Test Plan: Imported from OSS
Differential Revision: D16467979
Pulled By: gchanan
fbshipit-source-id: 2e6ad431c47a61c917d501390d14c55b788958ab
2019-08-09 12:36:35 -07:00
Xiaomeng Yang
29b53b0259
Fix bug in caffe2 transpose on GPU ( #22233 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22233
Fix bug in caffe2 transpose on GPU
Reviewed By: hl475
Differential Revision: D15994973
fbshipit-source-id: 542dc8757b51a6322fffa55826c1d4e32927398d
2019-06-26 11:33:25 -07:00
Xiaomeng Yang
2ce39de3fc
Add elementwise_affine for layer_norm_op ( #19713 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19713
Add elementwise_affine for layer_norm_op
Reviewed By: houseroad
Differential Revision: D15075454
fbshipit-source-id: e8a7d3da1c81e49fa55323f5e74a68bc4ef8d83f
2019-04-26 17:20:01 -07:00
Xiaomeng Yang
fb9fc42a0c
optimize BatchMatmulOp ( #18612 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18612
optimize BatchMatmulOp
Reviewed By: houseroad
Differential Revision: D14681665
fbshipit-source-id: cf5ea4909ace58fd44fe6fa634531102ac84e851
2019-04-23 15:34:59 -07:00
Xiaomeng Yang
fd40c0eba0
Add gelu op ( #18992 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18992
Add gelu op
Reviewed By: houseroad
Differential Revision: D14814811
fbshipit-source-id: 00f126b8b83763c57ebbf28fbd2de5a8fab6d491
2019-04-08 21:58:29 -07:00
Xiaomeng Yang
265fa0ce4d
Move math::Axpy function to elementwise lib ( #18316 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18316
Move math::Axpy function to elementwise lib
i-am-not-moving-c2-to-c10
Reviewed By: houseroad
Differential Revision: D14574697
fbshipit-source-id: 7cfbb2da295c8966c5328bd6b577cce2638eea62
2019-03-26 12:19:19 -07:00
nihui
ed8c462dc7
Fix caffe2 build with BLAS=OpenBLAS ( #18422 )
...
Summary:
g++ complains that it cannot find the declarations of the cblas_sscal and cblas_dscal BLAS functions
let's fix it :)
fedora 29, gcc 8.3.1, openblas 0.3.5
build with cmake -DBLAS=OpenBLAS ..
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18422
Differential Revision: D14598977
Pulled By: soumith
fbshipit-source-id: bde77bfb359d2ff38226401caeed78c114ef7468
2019-03-25 11:59:10 -07:00
Xiaomeng Yang
e04c9195b7
Update math::Transpose to support tensor with size > 2G ( #17670 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17670
Update math::Transpose to support tensor with size > 2G
i-am-not-moving-c2-to-c10
Differential Revision: D14313624
fbshipit-source-id: 0b4a85b913972e5a8981f0d40d0c539407b98f30
2019-03-20 18:22:21 -07:00
Xiaomeng Yang
0fd1dc45c0
Optimize LayerNormOp ( #17604 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17604
Optimize LayerNormOp
i-am-not-moving-c2-to-c10
Reviewed By: houseroad
Differential Revision: D14274175
fbshipit-source-id: a7aa263a1b0eb109682d2be99306e7b2cdcc0faf
2019-03-08 17:38:14 -08:00
Xiaomeng Yang
9709d5e787
Fix math::Set for large tensor ( #17539 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17539
Fix math::Set for large tensor
i-am-not-moving-c2-to-c10
Reviewed By: dzhulgakov, houseroad
Differential Revision: D14240756
fbshipit-source-id: 0ade26790be41fb26d2cc193bfa3082c7bd4e69d
2019-02-27 12:34:58 -08:00
Xiaomeng Yang
2e67b34ea7
Separate gpu reduce functions ( #17146 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17146
Separate gpu reduce functions
i-am-not-moving-c2-to-c10
Reviewed By: houseroad
Differential Revision: D14097564
fbshipit-source-id: a27de340997111a794b1d083c1673d4263afb9fb
2019-02-20 14:49:01 -08:00
Xiaomeng Yang
3a34f443c5
Separate reduce functions from math ( #16929 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16929
Separate CPU reduce functions from math
i-am-not-moving-c2-to-c10
Reviewed By: houseroad
Differential Revision: D13999469
fbshipit-source-id: bd628b15a6e3c1f04cc62aefffb0110690e1c0d1
2019-02-13 17:50:47 -08:00
Xiaomeng Yang
2db847b3a7
Separate elementwise level2 math functions ( #16753 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16753
Separate elementwise level2 math functions
i-am-not-moving-c2-to-c10
Reviewed By: houseroad
Differential Revision: D13954928
fbshipit-source-id: 1ca7a5d3da96e32510f502e5e4e79168854bee67
2019-02-07 18:38:26 -08:00
Xiaomeng Yang
7d4a81cbb2
Use macro for reduce on 2d blocks ( #16344 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16344
Use macro for reduce on 2d blocks
i-am-not-moving-c2-to-c10
Reviewed By: houseroad
Differential Revision: D13808988
fbshipit-source-id: b68c0fb6079c1b6e203a072083aba7a95c202bc2
2019-02-01 23:49:07 -08:00
Xiaomeng Yang
598b713660
Separate level1 elementwise functions from math ( #16397 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16397
Separate level1 elementwise functions from math
i-am-not-moving-c2-to-c10
Reviewed By: houseroad
Differential Revision: D13830626
fbshipit-source-id: e6e672647076dab8b3b24be181f580a1486250c9
2019-01-30 00:04:12 -08:00
Xiaomeng Yang
0a2d14dd7c
Optimize SpatialBNOp on GPU ( #16395 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16395
Optimize SpatialBNOp on GPU
i-am-not-moving-c2-to-c10
Reviewed By: houseroad
Differential Revision: D13829833
fbshipit-source-id: 04d2a63e8e9830c4c39a91cf87fcd7aa765dc55f
2019-01-28 09:36:45 -08:00
Xiaomeng Yang
866c4e3467
Separate Moments from math and optimize it ( #16175 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16175
Separate Moments from math and optimize it
i-am-not-moving-c2-to-c10
Reviewed By: houseroad
Differential Revision: D13742472
fbshipit-source-id: 90757d908d38c98ca69818855aaf68315e525992
2019-01-20 08:53:25 -08:00
Xiaomeng Yang
b436f94b53
Separate affine_channel from math and optimize it ( #16135 )
...
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16135
Separate affine_channel from math and optimize it
i-am-not-moving-c2-to-c10
Reviewed By: houseroad
Differential Revision: D13727606
fbshipit-source-id: 8980af4afadaf964a18a9da581106fe30896a7e9
2019-01-18 22:40:16 -08:00