Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66742
Modified loops in files under fbsource/fbcode/caffe2/ from the format
`for (TYPE var = x0; var < x_max; var++)`
to the format
`for (const auto var : irange(x0, x_max))`
This was achieved by running r-barnes's loop upgrader script (D28874212), modified to exclude all files under /torch/jit, with a number of reversions and unused-variable warning suppressions added by hand.
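As a rough illustration of the rewrite (a hypothetical function; `c10::irange` from `c10/util/irange.h` is the range helper the script targets):

```cpp
#include <c10/util/irange.h> // ships with PyTorch/c10
#include <cstdint>

// Hypothetical example, not taken from the diff itself.
float sum_first_n(const float* data, int64_t n) {
  float acc = 0.0f;
  // Before: for (int64_t i = 0; i < n; i++) { acc += data[i]; }
  // After: the induction variable is a const auto yielded by irange.
  for (const auto i : c10::irange(n)) {
    acc += data[i];
  }
  return acc;
}
```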
Test Plan: Sandcastle
Reviewed By: malfet
Differential Revision: D31705366
fbshipit-source-id: be58222426c192406a7f93c21582c3f6f2082401
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66234
Modified loops in files under fbsource/fbcode/caffe2/ from the format
`for (TYPE var = x0; var < x_max; var++)`
to the format
`for (const auto var : irange(x0, x_max))`
This was achieved by running r-barnes's loop upgrader script (D28874212), modified to exclude all files under /torch/jit, with a number of reversions and unused-variable warning suppressions added by hand.
bypass_size_limit
allow-large-files
Test Plan: Sandcastle
Reviewed By: ngimel
Differential Revision: D30652629
fbshipit-source-id: 0ae6c4bbbb554bad42e372792a6430e1acf15e3e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62369
This diff is a big no-op that just sets up scaffolding for passing the "allow_broadcast_fastpath" argument from caffe2 operator protos created in Python down to C++. To facilitate this, we create helper template wrappers that pass an "allow_broadcast_fastpath" flag down to the elementwise functors. This flag will determine whether to try to take the broadcast fastpath, which we will add in subsequent diffs.
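A minimal sketch of what such scaffolding might look like; the names `AddFunctor` and `WithBroadcastFastpathFlag` are hypothetical, not taken from the diff:

```cpp
#include <cstddef>

// Hypothetical elementwise functor.
struct AddFunctor {
  template <typename T>
  void operator()(std::size_t n, const T* a, const T* b, T* out) const {
    for (std::size_t i = 0; i < n; ++i) {
      out[i] = a[i] + b[i];
    }
  }
};

// Hypothetical wrapper that threads the flag down to the functor.
// For now it is a no-op, matching the diff's description: the flag is
// carried along, and the fastpath itself lands in subsequent diffs.
template <typename Functor>
struct WithBroadcastFastpathFlag {
  explicit WithBroadcastFastpathFlag(bool allow_broadcast_fastpath)
      : allow_broadcast_fastpath_(allow_broadcast_fastpath) {}

  template <typename T>
  void operator()(std::size_t n, const T* a, const T* b, T* out) const {
    // if (allow_broadcast_fastpath_) { /* broadcast fastpath, later diff */ }
    Functor{}(n, a, b, out);
  }

  bool allow_broadcast_fastpath_;
};
```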
Test Plan: Sandcastle + let GitHub CI run
Differential Revision: D28154475
fbshipit-source-id: 15750a0bcd2994fbc6a61fb5653d8cae6b0177dd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17088
clangr codemod
Also manually moved a class constructor from the .cpp file to the .h file.
Reviewed By: ezyang
Differential Revision: D14078531
fbshipit-source-id: 2adb4ac0ce523742da6cce3bc3b6c177b816c299
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16929
Separate CPU reduce functions from math
i-am-not-moving-c2-to-c10
Reviewed By: houseroad
Differential Revision: D13999469
fbshipit-source-id: bd628b15a6e3c1f04cc62aefffb0110690e1c0d1
Summary:
The PR did two things:
1. Fix the bug in erase_number_type on node inputs.
2. Handle negative indices for dim-reduce in caffe2 (sketched below).
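On item 2, a possibly-negative reduce dim is typically normalized against the tensor rank; a hypothetical helper, not the actual diff:

```cpp
#include <cstdint>
#include <stdexcept>

// Map a possibly-negative dim (e.g. -1 for the last axis) into [0, ndim).
int64_t canonical_dim(int64_t dim, int64_t ndim) {
  if (dim < -ndim || dim >= ndim) {
    throw std::out_of_range("reduce dim out of range");
  }
  return dim < 0 ? dim + ndim : dim;
}
```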
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12888
Reviewed By: houseroad
Differential Revision: D12833486
Pulled By: wanchaol
fbshipit-source-id: c3ceb400d91f0173b73ad95e392b010c3c14db7d
Summary:
Broken out of #8338.
This PR works around a bug with CUDA 9.2 + GCC 7.
Here is the error this PR fixed:
.../pytorch/caffe2/operators/elementwise_ops.h: In constructor ‘caffe2::BinaryElementwiseWithArgsOp<InputTypes, Context, Functor, OutputTypeMap>::BinaryElementwiseWithArgsOp(const caffe2::OperatorDef&, caffe2::Workspace*)’:
.../pytorch/caffe2/operators/elementwise_ops.h:106:189: error: ‘GetSingleArgument<bool>’ is not a member of ‘caffe2::BinaryElementwiseWithArgsOp<InputTypes, Context, Functor, OutputTypeMap>’
BinaryElementwiseWithArgsOp(const OperatorDef& operator_def, Workspace* ws)
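The actual change in this PR may differ, but this class of error typically arises when a class template calls a template member inherited from a dependent base without qualification; a standalone sketch with hypothetical names:

```cpp
// Simplified stand-ins for OperatorBase / BinaryElementwiseWithArgsOp.
template <typename Context>
struct OperatorBaseLike {
  template <typename T>
  T GetSingleArgument(T default_value) const {
    return default_value;
  }
};

template <typename Context>
struct BinaryOpLike : OperatorBaseLike<Context> {
  BinaryOpLike() {
    // Rejected by some compilers: GetSingleArgument is a dependent name.
    // bool b = GetSingleArgument<bool>(false);

    // Portable spelling of the same call:
    bool b = this->template GetSingleArgument<bool>(false);
    (void)b;
  }
};

int main() {
  BinaryOpLike<int> op;
  (void)op;
  return 0;
}
```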
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10510
Reviewed By: orionr
Differential Revision: D9319742
Pulled By: mingzhe09088
fbshipit-source-id: ce59e3db14539f071f3c20301e77ca36a6fc3f81
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9299
ONNX has ReduceL1 and ReduceL2 operators that would facilitate this, so allow PyTorch to export them and allow Caffe2 to run them.
So far this is implemented on CPU only.
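For reference, the reductions these two operators compute, as a plain CPU sketch (not the actual Caffe2 kernels):

```cpp
#include <cmath>
#include <cstddef>

// ReduceL1: sum of absolute values.
float reduce_l1(const float* x, std::size_t n) {
  float acc = 0.0f;
  for (std::size_t i = 0; i < n; ++i) {
    acc += std::fabs(x[i]);
  }
  return acc;
}

// ReduceL2: square root of the sum of squares.
float reduce_l2(const float* x, std::size_t n) {
  float acc = 0.0f;
  for (std::size_t i = 0; i < n; ++i) {
    acc += x[i] * x[i];
  }
  return std::sqrt(acc);
}
```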
Reviewed By: pjh5
Differential Revision: D8757381
fbshipit-source-id: 68afc9e2f90042a70929b73ace05a499b5c670c7
* Update elementwise ops to support numpy-style broadcast
* Fix sqrt_op
* Fix compare ops
* Fix gradient test
* Fix optimizer legacy broadcast
* Fix legacy broadcast for elementwise ops
* Skip flaky test
* Fix Eigen simple binary op
* Fix attention test
* Fix RNN test
* Fix LSTM test
* Fix tan grad
* Fix schema check
* Refactor reduce ops to take flexible input types
* Add DISPATCH_FUNCTION macros in common_gpu.h
* Use macros to reduce switch cases when dispatching CUDA functions (a sketch follows this list)
* Update ReduceMean
* Add reduce mean to math
* Update CUDA flag
* Update Eigen::Tensor ctor
* Remove unused variables
* Skip ReduceTensorGPUTest if no GPUs
* Add NOMINMAX for Windows
* Fix lpnorm_op in Windows
* Reduce Sum and Reduce Mean
* Handle reductions with empty 'axes'
* Merge codebase and simplify tensor reduction logic
* Restructure code and add comments.
* Fix parameter to scale
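On the DISPATCH_FUNCTION items above: the real macros live in common_gpu.h and dispatch CUDA functions, but the idea can be sketched host-side with hypothetical names:

```cpp
#include <stdexcept>

enum class ReduceKind { Sum, Mean };

float ReduceSum(const float* x, int n) {
  float acc = 0.0f;
  for (int i = 0; i < n; ++i) {
    acc += x[i];
  }
  return acc;
}

float ReduceMean(const float* x, int n) {
  return ReduceSum(x, n) / static_cast<float>(n);
}

// Each invocation replaces a hand-written "case kind: return func(...);".
#define DISPATCH_REDUCE_FUNCTION(kind, func, ...) \
  case ReduceKind::kind:                          \
    return func(__VA_ARGS__);

float Reduce(ReduceKind kind, const float* x, int n) {
  switch (kind) {
    DISPATCH_REDUCE_FUNCTION(Sum, ReduceSum, x, n)
    DISPATCH_REDUCE_FUNCTION(Mean, ReduceMean, x, n)
  }
  throw std::invalid_argument("unknown reduce kind");
}
```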