Commit Graph

6402 Commits

Author SHA1 Message Date
Omkar Salpekar
6b65b3cbd8 [Distributed] DeleteKey API for c10d TCP Store (#45401)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45401

Added a DeleteKey API for the TCP Store
ghstack-source-id: 112997162

Test Plan:
Modified the existing get/set test to use delete. Verified that the
correct keys were deleted and that the numKeys API returned the right values
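
A minimal usage sketch from Python (an illustration only, assuming the `TCPStore` bindings expose `delete_key`/`num_keys` as in this stack; exact signatures may differ by version):

```
# Hedged sketch: exercising the TCPStore key APIs from this stack.
import torch.distributed as dist

# TCPStore(host, port, world_size, is_master) starts/joins the store.
store = dist.TCPStore("127.0.0.1", 29500, 1, True)
store.set("first_key", "first_value")
store.set("second_key", "second_value")
assert store.num_keys() >= 2              # getNumKeys from the earlier diff
assert store.delete_key("first_key")      # True: the key existed
assert not store.delete_key("first_key")  # False: already deleted
```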

Reviewed By: mrshenli

Differential Revision: D23955730

fbshipit-source-id: 5c9f82be34ff4521c59f56f8d9c1abf775c67f9f
2020-09-28 15:30:39 -07:00
Yangxin Zhong
190f91e3db Adding Histogram Binning Calibration to DSNN and Adding Type Double to Caffe2 ParallelSumOp/SumReluOp
Summary: As title.

Test Plan:
FBL job without this diff failed:
f221545832

Error message:
```
NonRetryableException: AssertionError: Label is missing in training stage for HistogramBinningCalibration
```

FBL job with canary package built in this diff is running without failure:
f221650379

Reviewed By: chenshouyuan

Differential Revision: D23959508

fbshipit-source-id: c077230de29f7abfd092c84747eaabda0b532bcc
2020-09-28 15:21:31 -07:00
Yangxin Zhong
9163e8171e Adding Type Double to Caffe2 Mean Op
Summary: Adding support for type double to caffe2 MeanOp and MeanGradientOp.

Test Plan:
All tests passed.

Example FBL job failed without this diff:
f221169563

Error message:
```
c10::Error: [enforce fail at mean_op.h:72] . Mean operator only supports 32-bit float, but input was of type double (Error from operator:
input: "dpsgd_8/Copy_3" input: "dpsgd_8/Copy_4" output: "dpsgd_8/Mean_2" name: "" type: "Mean" device_option { device_type: 0 device_id: 0 })
```

Example FBL job is running without failure with the canary package built from this diff:
f221468723

Reviewed By: chenshouyuan

Differential Revision: D23956222

fbshipit-source-id: 6c81bbc390d812ae0ac235e7d025141c8402def1
2020-09-28 13:35:29 -07:00
Natalia Gimelshein
78caa028b6 Revert D23009117: [Distributed] DeleteKey API for c10d TCP Store
Test Plan: revert-hammer

Differential Revision:
D23009117 (addf94f2d6)

Original commit changeset: 1a0d95b43d79

fbshipit-source-id: ad3fe5501267e1a0a7bf23410766f1e92b34b24d
2020-09-27 12:04:42 -07:00
Thomas Bredillet
0fa551f0ab [c2] Fix int types for learning rate
Summary: Currently, GetSingleArgument overflows since it expects an int instead of an int64 when using a 1cycle (hill policy) annealing schedule.

Test Plan:
unittest

buck test  caffe2/caffe2/python/operator_test:learning_rate_op_test

Differential Revision: D23938169

fbshipit-source-id: 20d65df800d7a0f1dd9520705af31f63ae716463
2020-09-26 10:59:29 -07:00
Omkar Salpekar
addf94f2d6 [Distributed] DeleteKey API for c10d TCP Store (#43963)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43963

Added a DeleteKey API for the TCP Store
ghstack-source-id: 112939762

Test Plan:
Modified the existing get/set test to use delete. Verified that the
correct keys were deleted and that the numKeys API returned the right values

Reviewed By: jiayisuse

Differential Revision: D23009117

fbshipit-source-id: 1a0d95b43d79e665a69b2befbaa059b2b50a1f66
2020-09-26 00:54:21 -07:00
Omkar Salpekar
304e1d1e19 [Distributed] getNumKeys API to c10d TCPStore (#43962)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43962

TCPStore needs a getNumKeys API for our logging needs.
ghstack-source-id: 112939761

Test Plan: Adding tests to C++ Store Tests

Reviewed By: pritamdamania87

Differential Revision: D22985085

fbshipit-source-id: 8a0d286fbd6fd314dcc997bae3aad0e62b51af83
2020-09-26 00:49:00 -07:00
gunandrose4u
f07ac6a004 Fix Windows build failure after DDP PR merged (#45335)
Summary:
Fixes #{issue number}
This is a resubmit of PR https://github.com/pytorch/pytorch/issues/42897, together with a fix for the Windows build issue introduced by PR https://github.com/pytorch/pytorch/issues/44344.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45335

Reviewed By: zou3519

Differential Revision: D23931471

Pulled By: mrshenli

fbshipit-source-id: f49b5a114944c1450b32934b3292170be064f494
2020-09-25 12:37:50 -07:00
Bram Wasti
d1a11618f5 [static runtime] Add _out variants and reuse memory (#44128)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44128

Test Plan: Imported from OSS

Reviewed By: hlu1

Differential Revision: D23604304

Pulled By: bwasti

fbshipit-source-id: 06a23cb75700a0fc733069071843b7b498e7b9e9
2020-09-25 11:03:06 -07:00
Sebastian Messmer
2ac7de7d53 Remove hacky_wrapper from BackendSelect kernels (#44062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44062

Previously, BackendSelect kernels were still written in the legacy way, i.e. they took one TensorOptions argument instead of scattered dtype, layout, device, pin_memory, and they used hacky_wrapper to be callable. This caused a re-wrapping step. Calling into a BackendSelect kernel required taking the individual scattered arguments, packing them into a TensorOptions, and the kernel itself then gathered them again for redispatch.

Now with this PR, BackendSelect kernels are written in the new way and no hacky_wrapper or rewrapping is needed for them.
ghstack-source-id: 112825789
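
Schematically (a Python model for illustration only, not the actual C++ codegen):

```
# Model of the old vs. new BackendSelect calling conventions (illustrative).
from dataclasses import dataclass
from typing import Optional

@dataclass
class TensorOptions:               # stand-in for c10::TensorOptions
    dtype: Optional[str] = None
    layout: Optional[str] = None
    device: Optional[str] = None
    pin_memory: Optional[bool] = None

def redispatch(dtype, layout, device, pin_memory):
    return f"dispatched to {device}"

# Old: scattered args were packed into TensorOptions just so the legacy
# kernel could unpack them again for the redispatch -- a re-wrapping step.
def backend_select_legacy(options):
    return redispatch(options.dtype, options.layout,
                      options.device, options.pin_memory)

def call_legacy(dtype, layout, device, pin_memory):
    return backend_select_legacy(
        TensorOptions(dtype, layout, device, pin_memory))

# New: the kernel takes the scattered args directly; no pack/unpack.
def backend_select_new(dtype, layout, device, pin_memory):
    return redispatch(dtype, layout, device, pin_memory)
```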

Test Plan:
vs master: https://www.internalfb.com/intern/fblearner/details/216117032/

vs previous diff: https://www.internalfb.com/intern/fblearner/details/216170194/

Reviewed By: ezyang

Differential Revision: D23484192

fbshipit-source-id: e8fb49c4692404b6b775d18548b990c4cdddbada
2020-09-25 09:04:03 -07:00
jjsjann123
99e0a87bbb [nvFuser] Latency improvements for pointwise + reduction fusion (#45218)
Summary:
A lot of changes are in this update, some highlights:

- Added Doxygen config file
- Split the fusion IR (higher level TE like IR) from kernel IR (lower level CUDA like IR)
- Improved latency with dynamic shape handling for the fusion logic
- Prevent recompilation for pointwise + reduction fusions when not needed
- Improvements to inner dimension reduction performance
- Added input -> kernel + kernel launch parameters cache, added eviction policy
- Added reduction fusions with multiple outputs (still single reduction stage)
- Fixed code generation bugs for symbolic tiled GEMM example
- Added thread predicates to prevent shared memory from being loaded multiple times
- Improved syncthreads placement with shared memory and removed a read-before-write race
- Fixes to FP16 reduction fusions where output would come back as FP32

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45218

Reviewed By: ezyang

Differential Revision: D23905183

Pulled By: soumith

fbshipit-source-id: 12f5ad4cbe03e9a25043bccb89e372f8579e2a79
2020-09-24 23:17:20 -07:00
Mike Ruberry
103fa3894a Revert D23841786: [pytorch][PR] Enable distributed package on windows, Gloo backend supported only
Test Plan: revert-hammer

Differential Revision:
D23841786 (0122299f9b)

Original commit changeset: 334ba1ed73ef

fbshipit-source-id: ec95432f9957df56a5a04e52661f5db920b7f57f
2020-09-24 22:44:33 -07:00
gunandrose4u
0122299f9b Enable distributed package on windows, Gloo backend supported only (#42897)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42095

Test cases will be committed to this PR later

mrshenli, please help to review

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42897

Reviewed By: osalpekar

Differential Revision: D23841786

Pulled By: mrshenli

fbshipit-source-id: 334ba1ed73eff2f668857390fc32d1bc7f08e5f3
2020-09-24 21:13:55 -07:00
Dianshi Li
03dde4c62a Resend diff D23858329 (#45315)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45315

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45314

In D23858329 (721cfbf842), we put the PriorCorrectionCalibrationPrediction unit test in an OSS file, which causes test failures in the public trunk.

This diff moves it to the FB-only test file.

Test Plan:
```
 buck test //caffe2/caffe2/python/operator_test:torch_integration_test -- test_gather_ranges_to_dense_op

buck test //caffe2/caffe2/fb/python/operator_test:torch_integration_test -- test_prior_correct_calibration_prediction_op
```
all pass.

Reviewed By: houseroad

Differential Revision: D23899012

fbshipit-source-id: 1ed97d8702e2765991e6caf5695d4c49353dae82
2020-09-24 18:41:49 -07:00
Haixin Liu
1539d4a664 Add operator to compute the equalization scale (#45096)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45096

Add operator to compute the equalization scale. This will be used in the integration of equalization into dper int8 fixed quant scheme quantization flow.

Design docs:
https://fb.quip.com/bb7SAGBxPGNC

https://fb.quip.com/PDAOAsgoLfRr

Test Plan: buck test caffe2/caffe2/quantization/server:compute_equalization_scale_test

Reviewed By: jspark1105

Differential Revision: D23779870

fbshipit-source-id: 5e6a8c220935a142ecf8e61100a8c71932afa8d7
2020-09-24 15:19:49 -07:00
Danny Huang
cd7a682282 [caffe2] adds hypothesis test for queue ops cancel (#45178)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45178

## Motivation
* To be able to make C2 ops cancellable so we can safely exit.
* Some C2 operators are blocking and thus non-cancellable. If an error
occurs, we need to be able to safely stop all net execution so we can throw
the exception to the caller.

## Summary
* Adds a hypothesis test for queue ops cancellation.

Test Plan:
## Unit test added to verify that queue ops propagate errors

```
buck test caffe2/caffe2/python:hypothesis_test
buck test caffe2/caffe2/python:hypothesis_test -- test_safe_dequeue_blob__raises_exception_when_hang --stress-runs 1000
```

```
Summary
  Pass: 1000
  ListingSuccess: 1
```

Reviewed By: d4l3k

Differential Revision: D23847576

fbshipit-source-id: 2fc351e1ee13ea8b32d976216d2d01dfb6fcc1ad
2020-09-24 14:43:52 -07:00
Danny Huang
cbe1eac1f4 [caffe2] adds Cancel to SafeDequeueBlobsOp and SafeEnqueueBlobsOp (#45177)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45177

## Motivation
* To be able to make C2 ops cancellable so we can safely exit.
* Some C2 operators are blocking and thus non-cancellable. If an error
occurs, we need to be able to safely stop all net execution so we can throw
the exception to the caller.

## Summary
* When an error occurs in a net or the net gets cancelled, running ops have their
`Cancel` method called.
This diff adds a `Cancel` method to `SafeEnqueueBlobsOp`
and `SafeDequeueBlobsOp` that calls queue->close() to force all the
blocking ops to return (see the sketch below).
* Adds a unit test that verifies the error propagation.
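
A self-contained Python model of the pattern (illustrative names; the real ops are C++):

```
# Model: Cancel() closes the queue, which unblocks a blocked dequeue.
import threading
import queue

class ClosableQueue:
    def __init__(self):
        self._q = queue.Queue()
        self._closed = threading.Event()

    def close(self):
        self._closed.set()
        self._q.put(None)            # wake any blocked consumer

    def dequeue(self):
        item = self._q.get()         # blocks, like SafeDequeueBlobsOp
        return None if self._closed.is_set() else item

class SafeDequeueOp:
    def __init__(self, q):
        self.q = q
    def run(self):
        return self.q.dequeue()
    def cancel(self):                # the new Cancel() override
        self.q.close()

q = ClosableQueue()
op = SafeDequeueOp(q)
t = threading.Thread(target=op.run)
t.start()
op.cancel()   # the error/cancellation path calls Cancel on running ops
t.join()      # the op returns promptly instead of hanging forever
```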

Test Plan:
## Unit test added to verify that queue ops propagate errors

```
buck test caffe2/caffe2/python:hypothesis_test -- test_safe_dequeue_blob__raises_exception_when_hang --stress-runs 1000
```

```
Summary
  Pass: 1000
  ListingSuccess: 1
```

Reviewed By: d4l3k

Differential Revision: D23846967

fbshipit-source-id: c7ddd63259e033ed0bed9df8e1b315f87bf59394
2020-09-24 14:22:46 -07:00
Xiaomeng Yang
e2bcdc7b69 [Caffe2] Fix LayerNormOp when batch_size == 0. (#45250)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45250

[Caffe2] Fix LayerNormOp when batch_size == 0.

Test Plan: buck test mode/dev-nosan //caffe2/caffe2/python/operator_test:layer_norm_op_test

Reviewed By: houseroad

Differential Revision: D23892091

fbshipit-source-id: 9a34654dd6880c9d14b7111fcf850e4f48ffdf91
2020-09-24 12:30:03 -07:00
Mike Ruberry
956a25d061 Revert D23858329: [PT Model Split] Support 2 operators in PT by C2 conversion
Test Plan: revert-hammer

Differential Revision:
D23858329 (721cfbf842)

Original commit changeset: ed37118ca7f0

fbshipit-source-id: 30c700f80665be11afc608b00a77766064e60b35
2020-09-23 21:20:21 -07:00
Jordan Fix
c760bc8fb1 Add GlowLoadAOTModel flag (#45189)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45189

Pull Request resolved: https://github.com/pytorch/glow/pull/4902

Test Plan: Test locally

Reviewed By: yinghai

Differential Revision: D23810445

fbshipit-source-id: 56e717d80abbfe76b15d0f4249e1e399a9722753
2020-09-23 20:50:04 -07:00
Dianshi Li
721cfbf842 [PT Model Split] Support 2 operators in PT by C2 conversion (#45231)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45231

Two operators,
`PriorCorrectionCalibrationPrediction` and `GatherRangesToDense`, are not supported in PT, which prevents GLOW from working.

To unblock, we first try to use C2->PT conversion. In the long term, we need to implement PT custom ops.

This diff does this conversion to unblock the current project.

Test Plan:
Run unit tests. The test input is from the current DPER example.
All pass.
```buck test //caffe2/caffe2/python/operator_test:torch_integration_test -- test_prior_correct_calibration_prediction_op  --print-passing-details

> c2 reference output
> [0.14285715 0.27272728 0.39130434 0.5 ]

> PT converted output
> tensor([0.1429, 0.2727, 0.3913, 0.5000])

buck test //caffe2/caffe2/python/operator_test:torch_integration_test -- test_gather_ranges_to_dense_op  --print-passing-details

c2 reference output
> [array([[6, 5, 4, 3], [0, 0, 0, 0]], dtype=int64)]

> PT converted output
> [tensor([[6, 5, 4, 3], [0, 0, 0, 0]])]
```

Reviewed By: allwu, qizzzh

Differential Revision: D23858329

fbshipit-source-id: ed37118ca7f09e1cd0ad1fdec3d37f66dce60dd9
2020-09-23 18:31:57 -07:00
Bugra Akyildiz
27c7158166 Remove __future__ imports for legacy Python2 supports (#45033)
Summary:
There is a tool called `2to3` with a `future` fixer that removes these specifically; the `caffe2` directory has the most redundant imports:

```2to3 -f future -w caffe2```
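
The boilerplate being removed is the standard Python 2 compatibility header, e.g.:

```
# Before: Python 2 compatibility imports, redundant on Python 3.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals

# After running `2to3 -f future -w <dir>`, these lines are deleted.
```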

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45033

Reviewed By: seemethere

Differential Revision: D23808648

Pulled By: bugra

fbshipit-source-id: 38971900f0fe43ab44a9168e57f2307580d36a38
2020-09-23 17:57:02 -07:00
Hao Lu
c0267c6845 [caffe2] Support data types in shape hints (#45110)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45110

A recent change in DSNN quantizes the ad embedding to 8 bits. Ad embeddings are part of the inputs to the DSNN merge net. To correctly pass shape hints of input tensors including quantized ad embeddings, we need to be able to annotate the data types in shape hints.

A note on the corner cases: if the type is omitted or not a valid type (e.g., whitespace), instead of throwing an exception, I decided to return the default type, float.
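
A sketch of that fallback (hypothetical hint format and helper name; the real parser lives in the shape-info utils):

```
# Hypothetical sketch of the "default to float" corner-case handling.
VALID_TYPES = {"float": "FLOAT", "uint8": "UINT8", "int8": "INT8"}

def parse_type(type_str):
    # Omitted, blank, or unrecognized types fall back to float rather
    # than raising, so existing hint files without types keep working.
    t = (type_str or "").strip().lower()
    return VALID_TYPES.get(t, "FLOAT")

assert parse_type("uint8") == "UINT8"   # quantized ad embeddings
assert parse_type("   ") == "FLOAT"     # whitespace -> default float
assert parse_type(None) == "FLOAT"      # omitted -> default float
```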

Test Plan:
```
buck test caffe2/caffe2/fb/opt:shape_info_utils_test
```

Reviewed By: yinghai

Differential Revision: D23834091

fbshipit-source-id: 5e072144a7a7ff4b5126b618062dfc4041851dd3
2020-09-22 17:49:33 -07:00
Daya Khudia
09aee06e82 [caffe2] Replace embedding conversion ops with fbgemm functions (#44843)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44843

Replace perfkernels calls with fbgemm kernels to avoid code duplication
ghstack-source-id: 112496292

Test Plan: CI

Reviewed By: radkris-git

Differential Revision: D23675519

fbshipit-source-id: 05c285a9eeb9ea109a04a78cb442a24ee40a4aec
2020-09-22 11:57:01 -07:00
Brandon Lin
36ec8f8fb8 [dper3] Create dper LearningRate low-level module (#44639)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44639

As title; this will unblock migration of several modules that need learning rate functionality.

Test Plan:
```
buck test //dper3/dper3/modules/low_level_modules/tests:learning_rate_test
```

Reviewed By: yf225

Differential Revision: D23681733

fbshipit-source-id: 1d98cb35bf6a4ff0718c9cb6abf22401980b523c
2020-09-22 08:26:07 -07:00
Hector Yuen
32c1a8c79f adjust shape inference in sls tests (#44936)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44936

Need to provide the max sequence size and max element size instead of the
total.
Added a check that onnxifi was successful.

Test Plan: sls tests

Reviewed By: yinghai

Differential Revision: D23779437

fbshipit-source-id: 5048d6536ca00f0a3b0b057c4e2cf6584b1329d6
2020-09-21 22:09:55 -07:00
Lin.Sung
f77ba0e48c Change typo 'momemtum' to 'momentum' (#45045)
Summary:
As the title.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45045

Reviewed By: mruberry

Differential Revision: D23808563

Pulled By: mrshenli

fbshipit-source-id: ca818377f4c23d67b037c146fef667ab8731961e
2020-09-21 19:03:26 -07:00
Lingyi Liu
2d884f2263 Optimize Scale function (#44913)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44913

Pull Request resolved: https://github.com/pytorch/pytorch/pull/18322

Optimize Scale function

i-am-not-moving-c2-to-c10

Test Plan: buck test mode/dbg caffe2/caffe2/python/operator_test:weighted_sum_test

Reviewed By: BIT-silence

Differential Revision: D14575780

fbshipit-source-id: db333a7964581dcaff6e432ff1d6b517ba1a075f
2020-09-18 14:31:33 -07:00
Lucas Hosseini
af3fc9725d Extract rpc/tensorpipe_utils.{cpp,h} from rpc/utils.{cpp,h} (#44803)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44803

Test Plan: CI

Reviewed By: lw

Differential Revision: D23732022

fbshipit-source-id: 5b839c7997bbee162a14d03414ee32baabbc8ece
2020-09-18 13:51:43 -07:00
Saif Ul Islam
e400150c3b Fixed for caffe2/opt/tvm_transformer.cc (#44249)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41706

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44249

Reviewed By: gmagogsfm

Differential Revision: D23752331

Pulled By: SplitInfinity

fbshipit-source-id: 1d7297e080bc1e065129259e406af7216f3f0665
2020-09-18 00:03:59 -07:00
Jongsoo Park
086a2e7a4e [caffe2] add cost inference for FusedFakeQuantFC and FusedFakeQuantFCGradient (#44840)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44840

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44762

Move CostInferenceForFCGradient to fc_inference.cc/h to be used in multiple .cc files.

Test Plan: CI

Reviewed By: qizzzh

Differential Revision: D23714877

fbshipit-source-id: d27f33e270a93b0e053f2af592dc4a24e35526cd
2020-09-17 14:07:17 -07:00
Mike Ruberry
b6f4bb0a70 Revert D23236088: [pytorch][PR] [caffe2] adds Cancel to SafeDequeueBlobsOp and SafeEnqueueBlobsOp
Test Plan: revert-hammer

Differential Revision:
D23236088 (0ccc38b773)

Original commit changeset: daa90d9ee324

fbshipit-source-id: 933c7deab177250075683a9bea143ac37f16a598
2020-09-16 23:32:50 -07:00
Danny Huang
0ccc38b773 [caffe2] adds Cancel to SafeDequeueBlobsOp and SafeEnqueueBlobsOp (#44495)
Summary:
## Motivation

* To be able to make C2 ops cancellable so we can safely exit.
* Some C2 operators are blocking and thus non-cancellable. If an error
  occurs, we need to be able to safely stop all net execution so we can throw
  the exception to the caller.

* When an error occurs in a net or the net gets cancelled, running ops will have their
 `Cancel` method called.

* This diff adds a `Cancel` method to `SafeEnqueueBlobsOp`
and `SafeDequeueBlobsOp` that calls queue->close() to force all the
 blocking ops to return.
* Adds a unit test that verifies the error propagation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44495

Test Plan:
## Unit Test added to verify that queue ops propagate errors
```
buck test caffe2/caffe2/python:hypothesis_test
```

Reviewed By: dzhulgakov

Differential Revision: D23236088

Pulled By: dahsh

fbshipit-source-id: daa90d9ee32483fb51195e269a52cf5987bb0a5a
2020-09-16 18:17:34 -07:00
Venkata Chintapalli
a3835179a1 [FakeLowP] Addressing FakeLowP OSS issues. (#44819)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44819

[12:39 AM] Cherckez, Tal:
please review the following patch.
It should address these issues that our validation team found:
A) test_op_nnpi_fp16: hypothesis triggers max_examples*max_examples runs.
B) batchnorm: the batchNorm test derives from a unit test which doesn't have the settings required for hypothesis, hence the default value of 100 gets used (a settings sketch follows).
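
The usual shape of the fix (a sketch assuming standard hypothesis usage, not the exact patch) is to pin the settings explicitly instead of inheriting the default of 100 examples:

```
# Sketch: cap hypothesis example counts explicitly rather than relying
# on the default (100), which multiplies when strategies compose.
from hypothesis import given, settings, strategies as st

@settings(max_examples=10, deadline=None)
@given(n=st.integers(0, 1024))
def test_op_nnpi_fp16_shape(n):
    assert n >= 0   # placeholder body; the real test runs the fp16 op
```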

Test Plan:
buck test //caffe2/caffe2/contrib/fakelowp/test/...
https://our.intern.facebook.com/intern/testinfra/testrun/5910974543950859

Reviewed By: hyuen

Differential Revision: D23740970

fbshipit-source-id: 16fcc49f7bf84a5d7342786f671cd0b4e0fc87d3
2020-09-16 13:56:11 -07:00
Xiang Gao
20ac736200 Remove py2 compatible future imports (#44735)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44735

Reviewed By: mruberry

Differential Revision: D23731306

Pulled By: ezyang

fbshipit-source-id: 0ba009a99e475ddbe22981be8ac636f8a1c8b02f
2020-09-16 12:55:57 -07:00
Nikita Shulga
1718b16d15 [Caffe2] gcs_cuda_only is trivial if CUDA not available (#44578)
Summary:
Make `gcs_cuda_only` and `gcs_gpu_only` return empty device lists if CUDA/GPU (CUDA or ROCm) is not available

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44578

Reviewed By: walterddr

Differential Revision: D23664227

Pulled By: malfet

fbshipit-source-id: 176b5d964c0b02b8379777cd9a38698c11818690
2020-09-16 12:24:08 -07:00
Abdelrauf
6954ae1278 Vec256 Test cases (#42685)
Summary:
Tests for Vec256 classes (https://github.com/pytorch/pytorch/issues/15676)

Testing
Current list:

- [x] Blends
- [x] Memory: UnAlignedLoadStore
- [x] Arithmetics: Plus, Minus, Multiplication, Division
- [x] Bitwise: BitAnd, BitOr, BitXor
- [x] Comparison: Equal, NotEqual, Greater, Less, GreaterEqual, LessEqual
- [x] MinMax: Minimum, Maximum, ClampMin, ClampMax, Clamp
- [x] SignManipulation: Absolute, Negate
- [x] Interleave: Interleave, DeInterleave
- [x] Rounding: Round, Ceil, Floor, Trunc
- [x] Mask: ZeroMask
- [x] SqrtAndReciprocal: Sqrt, RSqrt, Reciprocal
- [x] Trigonometric: Sin, Cos, Tan
- [x] Hyperbolic: Tanh, Sinh, Cosh
- [x] InverseTrigonometric: Asin, ACos, ATan, ATan2
- [x] Logarithm: Log, Log2, Log10, Log1p
- [x] Exponents: Exp, Expm1
- [x] ErrorFunctions: Erf, Erfc, Erfinv
- [x] Pow: Pow
- [x] LGamma: LGamma
- [x] Quantization: quantize, dequantize, requantize_from_int
- [x] Quantization: widening_subtract, relu, relu6
Missing:
- [ ] Constructors, initializations
- [ ] Conversion , Cast
- [ ] Additional: imag, conj, angle (note: imag and conj only checked for float complex)

#### Notes on tests and testing framework
- some math functions are tested within a domain range
- mostly, the testing framework randomly tests against the std implementation within the domain, or within the implementation domain for some math functions (a sketch of this approach follows the list below)
- some functions are tested against the local version. ~~For example, std::round and the vector version of round differ, so it was tested against the local version~~
- round was tested against pytorch's at::native::round_impl. ~~For double type on **VSX, vec_round failed for (even)+0.5 values**~~. It was solved by using vec_rint
- ~~**complex types are not tested**~~ **After enabling complex testing, due to precision and domain issues some of the complex functions failed for VSX and x86 AVX as well. I will either test them against a local implementation or check within the accepted domain**
- ~~quantizations are not tested~~ Added tests for the quantize, dequantize, requantize_from_int, relu, relu6, and widening_subtract functions
- the testing framework should be improved further
- ~~For now `-DBUILD_MOBILE_TEST=ON` will be used for Vec256Test too~~
Vec256 test cases will be built for each CPU_CAPABILITY
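
The core check, modeled in Python (the real framework is C++; this only illustrates domain-restricted randomized comparison against std):

```
# Model: sample inputs within a stated domain, compare a "vectorized"
# implementation element-wise against the scalar std reference.
import math
import random

def reference_sin(xs):            # stands in for std::sin per element
    return [math.sin(x) for x in xs]

def vectorized_sin(xs):           # stands in for Vec256<float>::sin()
    return [math.sin(x) for x in xs]   # model only; the real one is SIMD

random.seed(0)
domain_lo, domain_hi = -100.0, 100.0   # test within the stated domain
for _ in range(1000):
    xs = [random.uniform(domain_lo, domain_hi) for _ in range(8)]
    for ref, got in zip(reference_sin(xs), vectorized_sin(xs)):
        assert math.isclose(ref, got, rel_tol=1e-6, abs_tol=1e-7)
```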

Fixes: https://github.com/pytorch/pytorch/issues/15676

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42685

Reviewed By: malfet

Differential Revision: D23034406

Pulled By: glaringlee

fbshipit-source-id: d1bf03acdfa271c88744c5d0235eeb8b77288ef8
2020-09-16 11:48:02 -07:00
Yan Xie
285ba0d068 Enable fp16 for UniformFill (#44540)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44540

Support output type to be fp16 for UniformFill

Reviewed By: jianyuh

Differential Revision: D23558030

fbshipit-source-id: 53a5b2c92cfe78cd11f55e6ee498e1bd682fe4a1
2020-09-15 15:09:18 -07:00
Yan Xie
4ce6af35c4 Enable fp16 for CUDA SparseLengthsSum/Mean (#44089)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44089

Add support of fp16 as input type in SparseLengthSum/Mean caffe2 operator

Reviewed By: xianjiec

Differential Revision: D23436877

fbshipit-source-id: 02fbef2fde17d4b0abea9ca5d17a36aa989f98a0
2020-09-15 11:10:54 -07:00
Natalia Gimelshein
e703c17967 Revert D23584071: [dper3] Create dper LearningRate low-level module
Test Plan: revert-hammer

Differential Revision:
D23584071 (a309355be3)

Original commit changeset: f6656531b1ca

fbshipit-source-id: b0a93f4286053fb8576a70278edca3a7d89c722b
2020-09-12 20:45:30 -07:00
Brandon Lin
a309355be3 [dper3] Create dper LearningRate low-level module
Summary: As title; this will unblock migration of several modules that need learning rate functionality.

Test Plan:
```
buck test //dper3/dper3/modules/low_level_modules/tests:learning_rate_test
```

WIP: need to add more learning rate tests for the different policies

Reviewed By: yf225

Differential Revision: D23584071

fbshipit-source-id: f6656531b1caba38c3e3a7d6e16d9591563391e2
2020-09-12 15:33:29 -07:00
Hector Yuen
0743d013a6 fuse layernorm + quantize (#44232)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44232

enhance layernorm to optionally quantize its output
add fusion code to replace instances of layernorm +quantization

Test Plan:
tested layernorm
net_runner

P141557987

Reviewed By: venkatacrc

Differential Revision: D23510893

fbshipit-source-id: 32f57ba2090d35d86dcc951e0f3f6a8901ab3153
2020-09-12 13:32:33 -07:00
Nikita Shulga
2ae74c0632 Compile less legacy code when BUILD_CAFFE2 is set to False (take 2) (#44453)
Summary:
2nd attempt to land https://github.com/pytorch/pytorch/pull/44079

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44453

Reviewed By: walterddr, seemethere

Differential Revision: D23619528

Pulled By: malfet

fbshipit-source-id: c7c206ebd327dcf3994789bd47008b05ff862fe7
2020-09-11 16:27:47 -07:00
Danny Huang
2b8f0b2023 [caffe2] adds Cancel to OperatorBase and NetBase (#44145)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44145

## Motivation

* To be able to make C2 ops cancellable so we can safely exit.
* Some C2 operators are blocking and thus non-cancellable. If an error
  occurs, we need to be able to safely stop all net execution so we can throw
  the exception to the caller.

## Summary
*  Adds `NetBase::Cancel()` to NetBase, which iterates over the entire list of
   operators and calls Cancel.
* Cancel on all ops was added to Net since there's nothing Async-specific about it.
* `AsyncSchedulingNet` calls the parent Cancel.
* To preserve backwards compatibility, `AsyncSchedulingNet`'s Cancel still calls
   `CancelAndFinishAsyncTasks`.
* Adds `Cancel()` to `OperatorBase` (modeled in the sketch below).
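
Schematically (a Python model of the structure, not the actual caffe2 code):

```
# Model of the Cancel plumbing: names mirror the C++ classes.
class OperatorBase:
    def cancel(self):                # default no-op; ops may override
        pass

class NetBase:
    def __init__(self, ops):
        self.ops = ops
    def cancel(self):                # iterate all ops, cancel each
        for op in self.ops:
            op.cancel()

class AsyncSchedulingNet(NetBase):
    def cancel(self):
        super().cancel()                       # parent cancels all ops
        self.cancel_and_finish_async_tasks()   # kept for back-compat
    def cancel_and_finish_async_tasks(self):
        pass
```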

Reviewed By: dzhulgakov

Differential Revision: D23279202

fbshipit-source-id: e1bb0ff04a4e1393f935dbcac7c78c0baf728550
2020-09-11 12:50:26 -07:00
Brandon Lin
ea55820606 [dper3] Export PackSegments and UnpackSegments to Pytorch
Summary: As title.

Test Plan:
```
buck test //caffe2/caffe2/python/operator_test/:torch_integration_test -- test_pack_segments
```

Reviewed By: yf225

Differential Revision: D23610495

fbshipit-source-id: bd8cb61f2284a08a54091a4f982f01fcf681f215
2020-09-11 09:29:24 -07:00
Hector Yuen
6c98d904c0 handle the case of -0.0 on tanh quantization (#44406)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44406

This fix makes fakelowp identical to hw:

- mask out the floating point number with 0x7fff so we are always dealing
with positive numbers (see the sketch below)
- the DSP implementation is correct; ice-ref suffers from this same problem
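
The masking trick in miniature (a sketch using Python's half-precision struct codes; the real kernel works on fp16 bit patterns in C++):

```
# Sketch: clearing the fp16 sign bit with 0x7fff maps -0.0 to +0.0,
# so the quantized tanh path only ever sees non-negative inputs.
import math
import struct

def fp16_bits(x):
    return struct.unpack("<H", struct.pack("<e", x))[0]

def fp16_from_bits(b):
    return struct.unpack("<e", struct.pack("<H", b))[0]

neg_zero = fp16_bits(-0.0)        # 0x8000: only the sign bit set
magnitude = neg_zero & 0x7FFF     # 0x0000: +0.0
assert fp16_from_bits(magnitude) == 0.0
assert math.copysign(1.0, fp16_from_bits(magnitude)) == 1.0  # positive
```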

Test Plan: - tested with test_fusions.py; can't enable the test until the fix in ice-ref appears

Reviewed By: venkatacrc

Differential Revision: D23603878

fbshipit-source-id: a72d93a4bc811f98d1b5e82ddb204be028addfeb
2020-09-10 01:18:45 -07:00
Giuseppe Ottaviano
6324ef4ced [caffe2] Speed up compilation of aten-op.cc (#44440)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44440

`aten-op.cc` takes a long time to compile due to the large generated constructor. For each case, the `std::function` constructor and the initialization functions are inlined, producing a huge amount of intermediate code that takes a long time to optimize, given that many compiler optimization passes are superlinear in the function size.

This diff moves each case to a separate function, so that each one is cheap to optimize, and the constructor is just a large jump table, which is easy to optimize.
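
The shape of the refactor, modeled in Python (the payoff is C++ compile time, but the structure is the same):

```
# Model: per-case bodies become standalone functions, and the
# constructor shrinks to a cheap jump table of references.

# Before: one giant constructor inlining every case body.
def build_registry_inlined():
    return {
        0: lambda ins: ins,                    # imagine hundreds of
        1: lambda ins: list(reversed(ins)),    # large inlined bodies
    }

# After: each case is its own small, separately optimized function...
def case_0(ins):
    return ins

def case_1(ins):
    return list(reversed(ins))

# ...and the table just stores references.
def build_registry_table():
    return {0: case_0, 1: case_1}
```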

Reviewed By: dzhulgakov

Differential Revision: D23593741

fbshipit-source-id: 1ce7a31cda10d9b0c9d799716ea312a291dc0d36
2020-09-09 21:21:48 -07:00
Gang Shen
058d7228ec Expose the interface of nesterov of SGD Optimizer from caffe2 to dper
Summary:
Expose the interface of `nesterov` of SGD Optimizer from caffe2 to dper.

The dper sgd optimizer (https://fburl.com/diffusion/chpobg0h) refers to the NAG SGD optimizer in caffe2: https://fburl.com/diffusion/uat2lnan. So we just need to add the parameter 'nesterov' in the dper sgd optimizer (the update rule is sketched after the analysis below).

Analysis of run results: N345540.

- train_ne increases as momentum (m) decreases.
- for m=0.95, 0.9: eval_ne is lower with NAG than production (no NAG, m = 0.95).
- for m=0.99: eval_ne with or without NAG is higher than production. It indicates larger variance in validation and overfit in training (lower train_ne).
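
For reference, the textbook updates being toggled (a sketch of the standard formulation, not necessarily caffe2's exact kernel):

```
# Sketch: heavy-ball momentum vs. Nesterov (NAG), scalar parameter.
def sgd_step(p, g, v, lr=0.01, m=0.95, nesterov=False):
    v = m * v + g                    # momentum buffer update
    if nesterov:
        p = p - lr * (g + m * v)     # look-ahead (NAG) step
    else:
        p = p - lr * v               # classic momentum step
    return p, v

p, v = 1.0, 0.0
for _ in range(3):
    g = 2.0 * p                      # gradient of f(p) = p**2
    p, v = sgd_step(p, g, v, nesterov=True)
```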

Test Plan:
1. unit tests:
`buck test caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test -- test_sgd_without_nesterov`
`buck test caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test -- test_sgd_with_nesterov`
.
2. Build the dper front-end package: `flow-cli canary   ads.dper3.workflows.sparse_nn.train --mode opt --entitlement      ads_global --run-as-secure-group      team_ads_ml_ranking`. The build result (refreshed) is here https://www.internalfb.com/intern/buck/build/2a368b55-d94b-45c1-8617-2753fbce994b. Flow package version is ads_dper3.canary:856b545cc6b249c0bd328f845adeb0d2.
.
3. Build the dper back-end package: `flow-cli canary  dper.workflows.dper3.train --mode opt --entitlement      ads_global --run-as-secure-group      team_ads_ml_ranking`. The build result (refreshed) is here: https://www.internalfb.com/intern/buck/build/70fa91cd-bf6e-4a08-8a4d-41e41a77fb52. Flow package version is aml.dper2.canary:84123a34be914dfe86b1ffd9925869de.
.
4. Compare prod with NAG-enabled runs:
a) refreshed prod run (m=0.95): f213877098
NAG enabled run (m=0.95): f213887113
.
b) prod run (m=0.9): f214065288
NAG enabled run (m=0.9): f214066319
.
c) prod run (m=0.99): f214065804
NAG enabled run (m=0.99): f214066725
.
d) changed the data type of nesterov to `bool` and launched a validation run
NAG enabled (m=0.95): f214500597

Reviewed By: ustctf

Differential Revision: D23152229

fbshipit-source-id: 61703ef6b4e72277f4c73171640fb8afc6d31f3c
2020-09-09 19:37:00 -07:00
Danny Huang
5ee31308e6 [caffe2] exposes Net cancellation through pybind state (#44043)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44043

To invoke `cancel` from the net instance in Python, we expose it through pybind state.

Reviewed By: dzhulgakov

Differential Revision: D23249660

fbshipit-source-id: 45a1e9062dca811746fcf2e5e42199da8f76bb54
2020-09-09 18:13:13 -07:00
Xiaodong Wang
ba6ddaf04c [pyper] export caffe2 bucketize GPU operator to pytorch
Summary: Exporting the Bucketize operator on CUDA. Also adding a unit test.

Test Plan: buck test mode/dev-nosan caffe2/torch/fb/sparsenn:gpu_test -- test_bucketize

Differential Revision: D23581321

fbshipit-source-id: 7f21862984c04d840410b8718db93006f526938a
2020-09-09 16:08:53 -07:00