14 Commits

Author SHA1 Message Date
Lu Fang
a5d7abedae Enable fusing aten::expand on GT, LT, EQ (#10845)
Summary:
GT, LT, EQ all support numpy broadcasting, just enable the fusion.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10845

Reviewed By: bddppq

Differential Revision: D9494089

Pulled By: houseroad

fbshipit-source-id: 7c65ca06c54dbd476ac7d07b47a413faaed3dd5e
2018-08-28 23:56:50 -07:00
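
Note on a5d7abedae: the fusion is safe because the comparison operators follow NumPy-style broadcasting, so an explicit expand adds nothing. A minimal sketch of the shape rule, in plain Python; `broadcast_shape` is a hypothetical helper written for illustration and is not part of PyTorch or ONNX.

```python
from itertools import zip_longest


def broadcast_shape(a, b):
    """Return the NumPy-style broadcast shape of two shapes, or raise ValueError."""
    out = []
    # Align the shapes from the trailing dimension; a missing dimension counts as 1.
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x == y or y == 1:
            out.append(x)
        elif x == 1:
            out.append(y)
        else:
            raise ValueError(f"shapes {a} and {b} are not broadcastable")
    return tuple(reversed(out))
```

For example, comparing a tensor of shape (3, 1) with one of shape (1, 4) yields a result of shape (3, 4) without either operand being expanded first.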
Lu Fang
5ed62ea6fa Add Upsample example for torch onnx exporting
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10550

Reviewed By: orionr

Differential Revision: D9541932

Pulled By: houseroad

fbshipit-source-id: 4d179d189c176482ae919e5cc74607b9d315ed26
2018-08-28 11:39:55 -07:00
Xiang Gao
83066e9b30 Add trigonometry functions for ONNX export (#7540)
Summary:
Trigonometry functions are newly added to ONNX in a recent PR https://github.com/onnx/onnx/pull/869

This PR makes pytorch support exporting graphs with trigonometry functions.

This PR might need to wait until it is ready to change
```python
_onnx_opset_version = 6
```
to
```python
_onnx_opset_version = 7
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/7540

Differential Revision: D9395041

Pulled By: bddppq

fbshipit-source-id: bdf3e9d212b911c8c4eacf5a0753bb092e4748d2
2018-08-19 23:01:28 -07:00
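
Note on 83066e9b30: the dependency on the opset bump can be pictured as a version gate in front of a name mapping. This is only a sketch: `TRIG_OPS`, `MIN_TRIG_OPSET` and `onnx_trig_op` are hypothetical names, and the minimum opset of 7 is an assumption taken from the version change described in the commit message.

```python
# Assumed mapping from PyTorch function names to ONNX operator names.
TRIG_OPS = {
    "sin": "Sin", "cos": "Cos", "tan": "Tan",
    "asin": "Asin", "acos": "Acos", "atan": "Atan",
}

# Assumption: the operators become available with the opset the commit waits for.
MIN_TRIG_OPSET = 7


def onnx_trig_op(name, opset_version):
    """Return the ONNX operator name for a trig function, or raise if unavailable."""
    if name not in TRIG_OPS:
        raise KeyError(f"no ONNX mapping for {name!r}")
    if opset_version < MIN_TRIG_OPSET:
        raise ValueError(f"{name!r} needs opset >= {MIN_TRIG_OPSET}, got {opset_version}")
    return TRIG_OPS[name]
```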
Lu Fang
bdb11e716a Split the dependence of ONNX from test_operators.py (#10151)
Summary:
Running `python test/onnx/test_operators.py --no-onnx` no longer introduces any ONNX Python dependency. (No onnx or protobuf Python packages need to be installed.)

The major changes:
- Output pbtxt directly from the C++ exporter, so the floating-point formatting may differ slightly. (This should be fine, since it only guards ONNX exporting.)
- ONNX Python packages are imported only when the ONNX-related checks run. Those checks are disabled by the `--no-onnx` flag.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10151

Reviewed By: jamesr66a

Differential Revision: D9130706

Pulled By: houseroad

fbshipit-source-id: ea28cf5db8399929179698ee535137f209e9ce6f
2018-08-14 12:54:44 -07:00
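
Note on bdb11e716a: one way to keep an optional dependency out of the import path is to parse the flag first and import the package only when its checks will run. The sketch below illustrates that pattern with the standard library; `parse_args` and `load_onnx` are hypothetical helpers, and the actual test script may handle the flag differently.

```python
import argparse
import importlib


def parse_args(argv):
    """Parse only the flag of interest and ignore any other arguments."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--no-onnx", action="store_true",
                        help="skip checks that need the onnx package")
    args, _unknown = parser.parse_known_args(argv)
    return args


def load_onnx(no_onnx):
    """Import onnx lazily; return None when the ONNX checks are disabled."""
    if no_onnx:
        return None
    return importlib.import_module("onnx")
```

With `--no-onnx`, `load_onnx` returns `None` and the `onnx` package is never imported, so it does not need to be installed.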
Xiang Gao
6fc75eadf0 Add CELU activation to pytorch (#8551)
Summary:
Also fuse input scale multiplication into ELU

Paper:
https://arxiv.org/pdf/1704.07483.pdf
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8551

Differential Revision: D9088477

Pulled By: SsnL

fbshipit-source-id: 877771bee251b27154058f2b67d747c9812c696b
2018-08-01 07:54:44 -07:00
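
Note on 6fc75eadf0: CELU is defined in the linked paper as max(0, x) + min(0, alpha * (exp(x / alpha) - 1)). Once ELU accepts an input scale, CELU can be written as ELU with the input scaled by 1 / alpha. The scalar sketch below shows that relationship; the parameter names are chosen for illustration and are not claimed to match the ATen signature.

```python
import math


def elu(x, alpha=1.0, scale=1.0, input_scale=1.0):
    """Scalar ELU with an output scale and an input scale on the negative branch."""
    if x > 0:
        return scale * x
    # expm1 computes exp(v) - 1 accurately for small v.
    return scale * alpha * math.expm1(x * input_scale)


def celu(x, alpha=1.0):
    """Scalar CELU expressed through ELU with input_scale = 1 / alpha."""
    return elu(x, alpha=alpha, input_scale=1.0 / alpha)
```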
Lu Fang
ee827f6ba3 Fix a testcase in logsoftmax onnx export (#9660)
Summary:
We only support a special case. The original dim is not supported by ONNX.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9660

Reviewed By: bddppq

Differential Revision: D8965507

Pulled By: houseroad

fbshipit-source-id: 021dffdf0489c2d3a50bfd1e0c4cfd00d4a3d776
2018-07-27 17:54:32 -07:00
Sam Gross
829d763c69 Implement add, sub, mul, div using TensorIterator (#8919)
Summary:
```
This adds TensorIterator, a helper class for computing element-wise
operations that's intended to replace the CPU and CUDA apply utils
functions.

CPU kernels are implemented as functions that operate on strided 1-d
tensors, compared to CPUApplyUtils, which operated on individual elements. This
allows the kernels to handle vectorization, while TensorIterator handles
parallelization and non-coalesced dimensions.

GPU kernels continue to operate on elements, but the number of
specializations is reduced. The contiguous case remains the same. The
non-contiguous case uses a single (reduced) shape for all operands and
the fast integer division from THCIntegerDivider. To avoid extra
specializations for indexing with 64-bits, large operations are split
into smaller operations that can be indexed with 32-bits.

Major semantic changes:

 - No more s_add, s_mul, s_div, or s_sub. Broadcasting is handled by
   TensorIterator. The autograd engine performs the reduction assuming
   standard broadcasting if the gradient shape does not match the
   expected shape. Functions that do not use standard broadcasting rules
   should either continue to trace the expand calls or handle the
   reduction in their derivative formula.

 - Use ONNX v7, which supports broadcasting ops.

Performance impact:

 - Small increased fixed overhead (~0.5 us)
 - Larger overhead for wrapped numbers (~2.5 us)
 - No significant change for ops on contiguous tensors
 - Much faster worst-case performance for non-contiguous GPU tensors
 - Faster CPU bias addition (~2x)
 - Faster GPU bias addition (~30% faster)

Future work:

 - Decrease overhead, especially for wrapping numbers in Tensors
 - Handle general inter-type operations
 - Extend to unary ops and reductions
 - Use buffering for compute-bound operations on non-contiguous tensors
   (pull in from CPUApplyUtils)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8919

Differential Revision: D8677600

Pulled By: colesbury

fbshipit-source-id: 61bc9cc2a36931dfd00eb7153501003fe0584afd
2018-07-27 14:43:24 -07:00
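
Note on 829d763c69: the autograd reduction described under "Major semantic changes" amounts to summing the gradient over every axis that broadcasting introduced or stretched. A minimal sketch of how those axes could be found from the two shapes; `reduction_axes` is a hypothetical helper, not the autograd engine's code.

```python
def reduction_axes(grad_shape, expected_shape):
    """Return the axes of grad_shape to sum over to recover expected_shape.

    Assumes standard broadcasting: leading axes may be added, and axes of
    size 1 may be stretched to a larger size.
    """
    lead = len(grad_shape) - len(expected_shape)
    if lead < 0:
        raise ValueError("gradient has fewer dimensions than expected")
    # Axes prepended by broadcasting are always reduced.
    axes = list(range(lead))
    for i, (g, e) in enumerate(zip(grad_shape[lead:], expected_shape)):
        if e == 1 and g != 1:
            axes.append(lead + i)  # a size-1 axis was stretched
        elif e != g:
            raise ValueError("shapes are not related by broadcasting")
    return axes
```

Summing over the returned axes, dropping the prepended ones and keeping the stretched ones with size 1, gives a gradient of the expected shape.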
Adam Paszke
aa7af94656 Make JIT tracing a thread-local property (#9414)
Summary:
As in the title. Lets us simplify a lot of code.

Depends on #9363, so please review only the last commit.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9414

Reviewed By: zdevito

Differential Revision: D8836496

Pulled By: apaszke

fbshipit-source-id: 9b3c3d1f001a9dc522f8478abc005b6b86cfa3e3
2018-07-19 19:09:39 -07:00
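
Note on aa7af94656: a thread-local property means each thread sees its own tracing state, so the state does not have to be passed through call chains. The idea can be sketched in Python with `threading.local`; `is_tracing` and `set_tracing` are hypothetical names, and the JIT itself implements this in C++.

```python
import threading

# Each thread gets its own independent attribute namespace on this object.
_state = threading.local()


def is_tracing():
    """Return the tracing flag of the calling thread (False if never set)."""
    return getattr(_state, "tracing", False)


def set_tracing(flag):
    """Set the tracing flag for the calling thread only."""
    _state.tracing = bool(flag)
```

Enabling tracing in one thread leaves `is_tracing()` false in every other thread.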
Mark Richardson
88146484b4 Add support for .norm() pytorch onnx export and ReduceL1/ReduceL2 caffe2 operators (#9299)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9299

ONNX has ReduceL1 and ReduceL2 operators that would facilitate this, so allow PyTorch to export those and allow Caffe2 to run them.

I only implemented this on CPU so far.

Reviewed By: pjh5

Differential Revision: D8757381

fbshipit-source-id: 68afc9e2f90042a70929b73ace05a499b5c670c7
2018-07-14 10:54:13 -07:00
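
Note on 88146484b4: ReduceL1 and ReduceL2 correspond to the 1-norm and 2-norm of the reduced elements. A minimal reference sketch over a flat list of numbers, ignoring axes and the keepdims attribute; the function names are chosen for illustration.

```python
import math


def reduce_l1(values):
    """Sum of absolute values (the 1-norm)."""
    return sum(abs(v) for v in values)


def reduce_l2(values):
    """Square root of the sum of squares (the 2-norm)."""
    return math.sqrt(sum(v * v for v in values))
```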
Akshay Chalana
e30ff68410 Add Hardtanh Export (#8804)
Summary:
Added hardtanh CPU/GPU implementations and backend tests to Caffe2
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8804

Reviewed By: bddppq

Differential Revision: D8813987

Pulled By: houseroad

fbshipit-source-id: 2480296eab3373425b9e1734a10c009b4f5d3e26
2018-07-11 18:09:51 -07:00
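
Note on e30ff68410: hardtanh clips its input to a closed interval, which is why it maps naturally onto a clip-style operator. A scalar sketch, assuming the usual default bounds of -1 and 1:

```python
def hardtanh(x, min_val=-1.0, max_val=1.0):
    """Clip x to the interval [min_val, max_val]."""
    return max(min_val, min(max_val, x))
```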
Lu Fang
c67ade26a7 Add onnx support for clamp_min clamp_max (#9224)
Summary:
Add support for clamp as required by https://github.com/onnx/onnx/issues/1168
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9224

Reviewed By: yinghai

Differential Revision: D8758945

Pulled By: houseroad

fbshipit-source-id: fad724d273c59f4527e96481ee6b2d14bfba205d
2018-07-09 16:25:44 -07:00
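
Note on c67ade26a7: clamp_min and clamp_max are the one-sided forms of clamp, each bounding the value from a single side. A scalar sketch with illustrative parameter names:

```python
def clamp_min(x, lower):
    """Bound x from below: values under `lower` become `lower`."""
    return max(x, lower)


def clamp_max(x, upper):
    """Bound x from above: values over `upper` become `upper`."""
    return min(x, upper)
```

Applying both in sequence gives the two-sided clamp.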
Peter Goldsborough
372d1d6735 Create ATen tensors via TensorOptions (#7869)
* Created TensorOptions

Storing the type in TensorOptions to solve the Variable problem

Created convenience creation functions for TensorOptions and added tests

Converted zeros to TensorOptions

Converted rand to TensorOptions

Fix codegen for TensorOptions and multiple arguments

Put TensorOptions convenience functions into torch namespace too

All factory functions except *_like support TensorOptions

Integrated with recent JIT changes

Support *_like functions

Fix in place modification

Some cleanups and fixes

Support sparse_coo_tensor

Fix bug in Type.cpp

Fix .empty calls in C++ API

Fix bug in Type.cpp

Trying to fix device placement

Make AutoGPU CPU compatible

Remove some auto_gpu.h uses

Fixing some headers

Fix some remaining CUDA/AutoGPU issues

Fix some AutoGPU uses

Fixes to dispatch_tensor_conversion

Reset version of new variables to zero

Implemented parsing device strings

Random fixes to tests

Self review cleanups

flake8

Undo changes to variable.{h,cpp} because they fail on gcc7.2

Add [cuda] tag to tensor_options_cuda.cpp

Move AutoGPU::set_index_from into .cpp file because Windows is stupid and sucks

Fix linker error in AutoGPU.cpp

Fix bad merge conflict in native_functions.yaml

Fixed caffe2/contrib/aten

Fix new window functions added to TensorFactories.cpp

* Removed torch::TensorOptions

Added code to generate wrapper functions for factory methods

Add implicit constructor from Backend to TensorOptions

Remove Var() from C++ API and use torch:: functions

Use torch:: functions more subtly in C++ API

Make AutoGPU::set_device more exception safe

Check status directly in DynamicCUDAHooksInterface

Rename AutoGPU to DeviceGuard

Removed set_requires_grad from python_variables.h and warn appropriately in Variable::set_requires_grad

remove python_default_init: self.type()

Add back original factory functions, but with deprecation warnings

Disable DeviceGuard for a couple functions in ATen

Remove print statement

Fix DeviceGuard construction from undefined tensor

Fixing CUDA device compiler issues

Moved as many methods as possible into header files

Don't generate python functions for deprecated factories

Remove merge conflict artefact

Fix tensor_options_cuda.cpp

Fix set_requires_grad not being checked

Fix tensor_new.h

TEMPORARILY put some methods in .cpp files to see if it solves issues on windows and mac

Fix bug in DeviceGuard.h

Missing includes

TEMPORARILY moving a few more methods into .cpp to see if it fixes windows

Fixing linker errors

* Fix up SummaryOps to use new factories

Undo device agnostic behavior of DeviceGuard

Use -1 instead of optional for default device index

Also move DeviceGuard methods into header

Fixes around device index after optional -> int32_t switch

Fix use of DeviceGuard in new_with_tensor_copy

Fix tensor_options.cpp

* Fix Type::copy(

* Remove test_non_float_params from ONNX tests

* Set requires_grad=False in ONNX tests that use ints

* Put layout/dtype/device on Tensor

* Post merge fixes

* Change behavior of DeviceGuard to match AutoGPU

* Fix C++ API integration tests

* Fix flip functions
2018-06-16 00:40:35 -07:00
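
Note on 372d1d6735: two items in this message, parsing device strings and using -1 rather than an optional for the default device index, can be illustrated together. The sketch below is an assumption about the general shape of such a parser, not the ATen implementation; `parse_device` and the accepted device kinds are hypothetical.

```python
def parse_device(text):
    """Parse 'cpu', 'cuda' or 'cuda:N' into (kind, index); index is -1 if absent."""
    kind, sep, index = text.partition(":")
    if kind not in ("cpu", "cuda"):
        raise ValueError(f"unknown device type in {text!r}")
    if not sep:
        return kind, -1  # -1 stands for "no explicit index"
    if not index.isdigit():
        raise ValueError(f"invalid device index in {text!r}")
    return kind, int(index)
```

Using -1 as a sentinel keeps the index a plain integer, at the cost of reserving one value that can never be a real device index.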
Gao, Xiang
84730aa659 support <= and >= (#7633)
2018-05-17 10:01:29 -07:00
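
Note on 84730aa659: when only strict comparisons and logical negation are available, <= and >= can be expressed as the negation of > and <. Whether this commit uses exactly that lowering is an assumption; the scalar sketch below only illustrates the identity.

```python
def le(a, b):
    """a <= b written as not (a > b)."""
    return not (a > b)


def ge(a, b):
    """a >= b written as not (a < b)."""
    return not (a < b)
```

The identity does not hold for NaN operands: `not (a > b)` is true when either value is NaN, while `a <= b` is false.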
bddppq
141d81d095 Move ONNX integration tests from onnx-fb-universe to PyTorch repo (#7397)
* Move ONNX integration tests from onnx-fb-universe to PyTorch repo

* Switch to use torchvision

* Delete single rnn operator tests, they have been covered in e2e tests in test_caffe2.py

* Mirror the fix in onnx-fb-universe to bypass cuda check

667326d84b
2018-05-11 15:05:18 -07:00