Commit Graph

6402 Commits

Sean Lynch
f9a766bb39 Increase deadline time for load_save tests (#43205)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43205

A number of tests that forward to `TestLoadSaveBase.load_save` are marked as flaky because they regularly take much longer to start up than hypothesis' default timeout of 200ms. This diff fixes the problem by removing the timeout for `load_save`. This is alright, as these tests aren't meant to be testing the performance of these operators.

I would set the deadline to 60s if I could; however, it appears that the caffe2 GitHub CI uses a different version of hypothesis that doesn't allow using `dateutil.timedelta`, so instead of trying to figure out an approach that works on both I've just removed the deadline time.
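
For reference, a minimal sketch of how a hypothesis deadline is disabled (illustrative only; the actual diff edits `TestLoadSaveBase.load_save`, and the test body here is a placeholder):

```
from hypothesis import given, settings, strategies as st

# deadline=None disables hypothesis' per-example deadline entirely, so a
# slow first invocation can no longer mark the test as flaky.
@settings(deadline=None)
@given(st.integers(min_value=1, max_value=64))
def test_load_save(batch_size):
    pass  # placeholder: exercise the save/load round trip here
```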

I've also tagged all existing tasks WRT these failures.

Differential Revision: D23175752

fbshipit-source-id: 324f9ff034df1ac4874797f04f50067149a6ba48
2020-08-20 08:41:24 -07:00
Hector Yuen
06d43dc69a default ice-ref to c-step (#4812)
Summary:
Pull Request resolved: https://github.com/pytorch/glow/pull/4812

if no compilation options are passed, default to C-step

fixed the FC and batchmatmul implementations to match C-step
fixed the fakelowp map calling to make sure we use the fp32 substitution of operators
updated the accumulator test to make it pass with fp32

Test Plan:
fakelowp tests
glow/test/numerics
net_runner

Reviewed By: jfix71

Differential Revision: D23086534

fbshipit-source-id: 3fbb8c4055bb190becb39ce8cdff6671f8558734
2020-08-19 09:50:34 -07:00
Ann Shan
2e6e295ecc refactor _save_parameters to _save_data (#43162)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43162

Test Plan: Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D23175286

Pulled By: ann-ss

fbshipit-source-id: 6f930b98c367242fd4efbf51cb1d09995f7c4b40
2020-08-18 14:57:03 -07:00
Yinghai Lu
b92b556a12 Add shape inference to SparseLengthsSumSparse ops (#43181)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43181

att

Test Plan:
```
buck test caffe2/caffe2/opt:bound_shape_inference_test
```

Reviewed By: ChunliF

Differential Revision: D23097145

fbshipit-source-id: 3e4506308446f28fbeb01dcac97dce70c0443975
2020-08-18 09:36:53 -07:00
Christian Sarofeen
b3bda94393 [NVFuser] Enable E2E BCast-PWise-Reduction fusions (#43129)
Summary:
Had a bunch of merged commits that shouldn't have been there, reverted them to prevent conflicts. Lots of new features, highlights listed below.

**Overall:**

- Enables pointwise fusion; single (but N-D) broadcast -> pointwise fusion; and single (but N-D) broadcast -> pointwise -> single (but N-D) reduction fusion.

**Integration:**

- Separate "magic scheduler" logic that takes a fusion and generates code generator schedule
- Reduction fusion scheduling with heuristics closely matching eagermode (unrolling supported, but no vectorize support)
- 2-Stage caching mechanism, one on contiguity, device, type, and operations, the other one is input size->reduction heuristic

**Code Generation:**

- More generic support in code generation for computeAt
- Full rework of loop nest generation and Indexing to more generically handle broadcast operations
- Code generator has automatic kernel launch configuration (including automatic allocation of grid reduction buffers)
- Symbolic (runtime) tiling on grid/block dimensions is supported
- Simplified index generation based on user-defined input contiguity
- Automatic broadcast support (similar to numpy/pytorch semantics)
- Support for compile time constant shared memory buffers
- Parallelized broadcast support (i.e. block reduction -> block broadcast support)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43129

Reviewed By: mrshenli

Differential Revision: D23162207

Pulled By: soumith

fbshipit-source-id: 16deee4074c64de877eed7c271d6a359927111b2
2020-08-18 09:10:08 -07:00
Nikita Shulga
034e6727e7 Set default ATen threading backend to native if USE_OPENMP is false (#43067)
Summary:
Since OpenMP is not available on some platforms, or might be disabled by the user, set the default `ATEN_THREADING` based on the USE_OPENMP and USE_TBB options

Fixes https://github.com/pytorch/pytorch/issues/43036

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43067

Reviewed By: houseroad

Differential Revision: D23138856

Pulled By: malfet

fbshipit-source-id: cc8f9ee59a5559baeb3f19bf461abbc08043b71c
2020-08-17 10:33:31 -07:00
Venkata Chintapalli
33c5fe3c1d Enable test_logit FakeLowP test. (#43073)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43073

Enable test_logit FakeLowP test.

Test Plan: test_op_nnpi_fp16.py

Reviewed By: hyuen

Differential Revision: D23141375

fbshipit-source-id: cb7e7879487e33908b14ef401e1ab05fda193d28
2020-08-14 14:49:29 -07:00
Edson Romero
5014cf4a4d Export MergeIdLists Caffe2 Operator to PyTorch
Summary: As titled.

Test Plan: buck test //caffe2/caffe2/python/operator_test:torch_integration_test -- test_merge_id_lists

Reviewed By: yf225

Differential Revision: D23076951

fbshipit-source-id: c37dfd93003590eed70b0d46e0151397a402dde6
2020-08-14 14:46:17 -07:00
Hector Yuen
c8e789e06e add fake fp16 fusions to net transforms (#42927)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42927

added fp16 fusion to net transforms
refactored the transforms as well as glow_transform to get out of opt/custom so that the OSS builds pass

Test Plan: added net runner tests for this

Reviewed By: yinghai

Differential Revision: D23080881

fbshipit-source-id: ee6451811fedfd07c6560c178229854bca29301f
2020-08-14 13:30:27 -07:00
Dongxin Liu
a2b86d95d1 Make Mish support large inputs. (#43037)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43037

In the previous version of mish_op.cc, the output would be 'nan' for large inputs. We rewrote mish_op.cc to solve this problem.
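
The actual fix is in mish_op.cc (C++); below is a numpy sketch of the standard numerically stable formulation, assuming Mish(x) = x * tanh(softplus(x)), where the naive softplus is the overflow-prone part:

```
import numpy as np

def softplus_stable(x):
    # log(1 + e^x) without computing e^x for large x:
    # log(1 + e^x) = max(x, 0) + log(1 + e^-|x|)
    return np.maximum(x, 0.0) + np.log1p(np.exp(-np.abs(x)))

def mish(x):
    return x * np.tanh(softplus_stable(x))

x = np.array([-50.0, 0.0, 50.0, 1e4], dtype=np.float32)
print(mish(x))  # finite everywhere; a naive np.log(1 + np.exp(x)) overflows exp() at 1e4
```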

Test Plan:
Unit test
buck test //dper3/dper3/modules/tests:core_modules_test -- test_linear_compress_embedding_with_attention_with_activation_mish
{F284052906}

buck test mode/opt //dper3/dper3_models/ads_ranking/tests:model_paradigm_e2e_tests -- test_sparse_nn_with_mish
{F284224158}

## Workflow
f212113434

{F285281318}

Differential Revision: D23102644

fbshipit-source-id: 98f1ea82f8c8e05b655047b4520c600fc1a826f4
2020-08-14 08:53:16 -07:00
Luca Wehrstedt
ed242cbec5 Guard TensorPipe agent by USE_TENSORPIPE (#42682)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42682

ghstack-source-id: 109834351

Test Plan: CI

Reviewed By: malfet

Differential Revision: D22978717

fbshipit-source-id: 18b7cbdb532e78ff9259e82f0f92ad279124419d
2020-08-14 02:57:36 -07:00
Ren Chen
e182ec97b3 Fix illegal memory access issue for CUDA version of SplitByLengths operator.
Summary:
1. Fix illegal memory access issue for SplitByLengths operator in the CUDA context.
2. Add support for scaling lengths vectors in the SplitByLengths operator.
3. Add tests for the SplitByLengths operator in the CUDA context.

Example of the SplitByLengths operator processing a scaling lengths vector:
value vector A = [1, 2, 3, 4, 5, 6]
length vector B = [1, 2]
after execution of the SplitByLengths operator,
the output should be [1, 2] and [3, 4, 5, 6]
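
A small Python sketch of the scaling semantics (illustrative only; `split_by_lengths` here is a hypothetical helper, not the Caffe2 operator itself):

```
import numpy as np

def split_by_lengths(values, lengths):
    # When sum(lengths) evenly divides len(values), each length is scaled
    # by len(values) // sum(lengths) before splitting (assumed semantics,
    # matching the example above).
    scale = len(values) // sum(lengths)
    out, start = [], 0
    for n in lengths:
        out.append(values[start:start + n * scale])
        start += n * scale
    return out

print(split_by_lengths(np.arange(1, 7), [1, 2]))
# -> [array([1, 2]), array([3, 4, 5, 6])]
```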

Test Plan: buck test mode/dev-nosan caffe2/caffe2/python/operator_test:concat_split_op_test

Reviewed By: kennyhorror

Differential Revision: D23079841

fbshipit-source-id: 3700e7f2ee0a5a2791850071fdc16e5b054f8400
2020-08-14 01:04:08 -07:00
Hector Yuen
3544f60f76 make deadline=None for all numerics tests (#43014)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43014

changing this behavior mimics the behavior of the old hypothesis
testing library

Test Plan: ran all tests on devserver

Reviewed By: hl475

Differential Revision: D23085949

fbshipit-source-id: 433fdfbb04b6a609b738eb7c319365049a49579b
2020-08-13 16:48:31 -07:00
Luca Wehrstedt
8493b0d5d6 Enroll TensorPipe agent in C++-only E2E test (#42680)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42680

ghstack-source-id: 109544678

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D22978714

fbshipit-source-id: 04d6d190c240c6ead9bd9f3b7f3a5f964d7451e8
2020-08-13 07:07:30 -07:00
Hector Yuen
5157afcf59 fix int8 FC (#42691)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42691

fix quantization of FC bias to match nnpi
quantize biases to fp16

Test Plan: improved the unit test to have input tensors in fp32

Reviewed By: tracelogfb

Differential Revision: D22941521

fbshipit-source-id: 00afb70610f8a149110344d52595c39e3fc988ab
2020-08-12 09:30:34 -07:00
Ehsan K. Ardestani
ecb9e790ed Remove excessive logging in plan_executor (#42888)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42888

as title

Test Plan: flow-cli test-locally dper.workflows.evaluation.eval_workflow --parameters-file /mnt/public/ehsanardestani/temp/quant_eval_inputs_all.json

Reviewed By: amylittleyang

Differential Revision: D23066529

fbshipit-source-id: f925afd1734e617e412b0f171e16c781d13272d9
2020-08-11 23:57:17 -07:00
Ophir Romano
a346e90c49 Update to NNP-I v1.0.0.5 (#4770)
Summary:
Align code to NNP-I v1.0.0.5 (glow tracing changes).

Pull Request resolved: https://github.com/pytorch/glow/pull/4770

Reviewed By: arunm-git

Differential Revision: D22927904

Pulled By: hl475

fbshipit-source-id: 3746a6b07f3fcffc662d80a95513427cfccac7a5
2020-08-11 23:53:23 -07:00
Christopher Whelan
7a9ae52550 [hypothesis] Deadline followup (#42842)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42842

Test Plan: `buck test`

Reviewed By: thatch

Differential Revision: D23045269

fbshipit-source-id: 8a3f4981869287a0f5fb3f0009e13548b7478086
2020-08-11 15:33:23 -07:00
Hector Yuen
3bf2978497 remove deadline enforcement for hypothesis (#42871)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42871

The old version of hypothesis was not enforcing deadlines.
After the library got updated, the default deadline is 200ms, but even with 1s or
more, tests are flaky. Changing the deadline to non-enforced restores the
behavior of the old version.

Test Plan: tested fakelowp/tests

Reviewed By: hl475

Differential Revision: D23059033

fbshipit-source-id: 79b6aec39a2714ca5d62420c15ca9c2c1e7a8883
2020-08-11 14:28:53 -07:00
Edson Romero
71dbfc79b3 Export BatchBucketOneHot Caffe2 Operator to PyTorch
Summary: As titled.

Test Plan:
```
buck test caffe2/caffe2/python/operator_test:torch_integration_test -- test_batch_bucket_one_hot_op
```

Reviewed By: yf225

Differential Revision: D23005981

fbshipit-source-id: 1daa8d3e7d6ad75e97e94964db95ccfb58541672
2020-08-11 14:00:19 -07:00
Yury Gitman
9c8f5cb61d Ensure IDEEP transpose operator works correctly
Summary: I found that, without exporting to public format, an IDEEP transpose operator in the middle of a convolution net produces incorrect results (probably reading some out-of-bounds memory). Exporting to public format might not be the most efficient solution, but at least it ensures correct behavior.

Test Plan: Running ConvFusion followed by transpose should give identical results on CPU and IDEEP

Reviewed By: bwasti

Differential Revision: D22970872

fbshipit-source-id: 1ddca16233e3d7d35a367c93e72d70632d28e1ef
2020-08-11 12:58:31 -07:00
Mike Ruberry
ddcf3ded3e Revert D23002043: add net transforms for fusion
Test Plan: revert-hammer

Differential Revision:
D23002043 (a4b763bc2c)

Original commit changeset: f0b13d51d68c

fbshipit-source-id: d43602743af35db825e951358992e979283a26f6
2020-08-10 21:22:57 -07:00
Mike Ruberry
dedcc30c84 Fix ROCm CI by increasing test timeout (#42827)
Summary:
ROCm is failing to run this test in the allotted time. See, for example, https://app.circleci.com/pipelines/github/pytorch/pytorch/198759/workflows/f6066acf-b289-46c5-aad0-6f4f663ce820/jobs/6618625.

cc jeffdaily

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42827

Reviewed By: pbelevich

Differential Revision: D23042220

Pulled By: mruberry

fbshipit-source-id: 52b426b0733b7b52ac3b311466d5000334864a82
2020-08-10 20:26:20 -07:00
Hector Yuen
a4b763bc2c add net transforms for fusion (#42763)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42763

add the fp16 fusions as net transforms:
- layernorm fused with mul+add
- swish int8

Test Plan: added unit test, ran flows

Reviewed By: yinghai

Differential Revision: D23002043

fbshipit-source-id: f0b13d51d68c240b05d2a237a7fb8273e996328b
2020-08-10 20:16:14 -07:00
Venkata Chintapalli
e7b5a23607 include missing settings import
Summary: from hypothesis import given, settings

Test Plan: test_op_nnpi_fp16.py

Differential Revision: D23031038

fbshipit-source-id: 751547e6a6e992d8816d4cc2c5a699ba19a97796
2020-08-10 10:45:34 -07:00
Christopher Whelan
5cd0f5e8ec [PyFI] Update hypothesis and switch from tp2 (#41645)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41645

Pull Request resolved: https://github.com/facebookresearch/pytext/pull/1405

Test Plan: buck test

Reviewed By: thatch

Differential Revision: D20323893

fbshipit-source-id: 54665d589568c4198e96a27f0ed8e5b41df7b86b
2020-08-08 12:13:04 -07:00
Hector Yuen
18ca999e1a integrate int8 swish with net transformer
Summary:
add a fuse path for deq->swish->quant
update swish fake op interface to take arguments accordingly

Test Plan:
net_runner passes
unit tests need to be updated

Reviewed By: venkatacrc

Differential Revision: D22962064

fbshipit-source-id: cef79768db3c8af926fca58193d459d671321f80
2020-08-07 23:01:06 -07:00
Venkata Chintapalli
e95fbaaba3 Adding Peter's Swish Op ULP analysis. (#42573)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42573

* Generate the ULP png files for different ranges.

Test Plan: test_op_ulp_error.py

Reviewed By: hyuen

Differential Revision: D22938572

fbshipit-source-id: 6374bef6d44c38e1141030d44029dee99112cd18
2020-08-07 19:13:01 -07:00
Jianyu Huang
d4a4c62df3 [caffe2] Fix the timeout (stuck) issues of dedup SparseAdagrad C2 kernel
Summary:
Back out D22800959 (f30ac66e79). This one is causing the timeout (machine stuck) issues for dedup kernels. Reverting it makes the unit test pass. Still need to investigate why this is the culprit...

Original commit changeset: 641d52a51070

Test Plan:
```
buck test mode/dev-nosan //caffe2/caffe2/fb/net_transforms/tests:fuse_sparse_ops_test -- 'test_fuse_sparse_adagrad_with_sparse_lengths_sum_gradient \(caffe2\.caffe2\.fb\.net_transforms\.tests\.fuse_sparse_ops_test\.TestFuseSparseOps\)' --print-passing-details
```

Reviewed By: jspark1105

Differential Revision: D23008389

fbshipit-source-id: 4f1b9a41c78eaa5541d57b9d8aa12401e1d495f2
2020-08-07 18:42:36 -07:00
Jongsoo Park
3fa0581cf2 [fbgemm] use new more general depthwise 3d conv interface (#42697)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42697

Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/401

As title

Test Plan: CI

Reviewed By: dskhudia

Differential Revision: D22972233

fbshipit-source-id: a2c8e989dee84b2c0587faccb4f8e3bcb05c797c
2020-08-07 18:30:56 -07:00
Edson Romero
2b04712205 Exposing Percentile Caffe2 Operator in PyTorch
Summary: As titled.

Test Plan:
```
buck test caffe2/caffe2/python/operator_test:torch_integration_test -- test_percentile
```

Reviewed By: yf225

Differential Revision: D22999896

fbshipit-source-id: 2e3686cb893dff1518d533cb3d78c92eb2a6efa5
2020-08-07 16:22:37 -07:00
Adam Simpkins
02f58bdbd7 [caffe2] add type annotations for caffe2.distributed.python
Summary: Add Python type annotations for the `caffe2.distributed.python` module.

Test Plan: Will check sandcastle results.

Reviewed By: jeffdunn

Differential Revision: D22994012

fbshipit-source-id: 30565cc41dd05b5fbc639ae994dfe2ddd9e56cb1
2020-08-07 13:12:53 -07:00
lixinyu
98de150381 C++ API TransformerEncoderLayer (#42633)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42633

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D22994332

Pulled By: glaringlee

fbshipit-source-id: 873abdf887d135fb05bde560d695e2e8c992c946
2020-08-07 11:49:42 -07:00
Stephen Chen
2971bc23a6 Handle fused scale and bias in fake fp16 layernorm
Summary: Allow passing scale and bias to fake fp16 layernorm.

Test Plan: net_runner. Now matches glow's fused layernorm.

Reviewed By: hyuen

Differential Revision: D22952646

fbshipit-source-id: cf9ad055b14f9d0167016a18a6b6e26449cb4de8
2020-08-07 10:48:33 -07:00
Mike Ruberry
9c8021c0b1 Adds torch.linalg namespace (#42664)
Summary:
This PR adds the `torch.linalg` namespace as part of our continued effort to be more compatible with NumPy. The namespace is tested by adding a single function, `torch.linalg.outer`, and testing it in a new test suite, test_linalg.py. It follows the same pattern as https://github.com/pytorch/pytorch/pull/41911, which added the `torch.fft` namespace.

Future PRs will likely:

- add more functions to torch.linalg
- expand the testing done in test_linalg.py, including legacy functions, like torch.ger
- deprecate existing linalg functions outside of `torch.linalg` in preference to the new namespace

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42664

Reviewed By: ngimel

Differential Revision: D22991019

Pulled By: mruberry

fbshipit-source-id: 39258d9b116a916817b3588f160b141f956e5d0b
2020-08-07 10:18:30 -07:00
DeepakVelmurugan
4eb02add51 Blacklist to Blocklist in onnxifi_transformer (#42590)
Summary:
Fixes issues in https://github.com/pytorch/pytorch/issues/41704 and https://github.com/pytorch/pytorch/issues/41705

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42590

Reviewed By: ailzhang

Differential Revision: D22977357

Pulled By: malfet

fbshipit-source-id: ab61b964cfdf8bd2b469f4ff8f6486a76bc697de
2020-08-07 08:05:32 -07:00
Jordan Fix
fb8aa0046c Add use_glow_aot, and include ONNX again as a backend for onnxifiGlow (#4787)
Summary:
Pull Request resolved: https://github.com/pytorch/glow/pull/4787

Resurrect ONNX as a backend through onnxifiGlow (was killed as part of D16215878). Then look for the `use_glow_aot` argument in the Onnxifi op. If it's there and true, then we override whatever `backend_id` is set and use the ONNX backend.

Reviewed By: yinghai, rdzhabarov

Differential Revision: D22762123

fbshipit-source-id: abb4c3458261f8b7eeae3016dda5359fa85672f0
2020-08-07 04:31:24 -07:00
Chunli Fu
cb1ac94069 [blob reorder] Separate user embeddings and ad embeddings in large model loading script
Summary: Put user embeddings before ads embeddings in blobReorder, for flash verification reasons.

Test Plan:
```
buck run mode/opt-clang -c python.package_style=inplace sigrid/predictor/scripts:enable_large_model_loading -- --model_path_src="/home/$USER/models/" --model_path_dst="/home/$USER/models_modified/" --model_file_name="182560549_0.predictor"
```
https://www.internalfb.com/intern/anp/view/?id=320921 to check blobsOrder

Reviewed By: yinghai

Differential Revision: D22964332

fbshipit-source-id: 78b4861476a3c889a5ff62492939f717c307a8d2
2020-08-06 23:54:03 -07:00
Stephen Chen
cdd7db1ffc Bound shape inferencer: fix int8fc scale and bias
Summary:
Previously, when inferring Int8FC, we failed to carry over the scale and zero point properly.

Also fixed int8 FC weight data type to be int8 instead of uint8 as that's what C2 actually uses.

Test Plan: Use net_runner to lower a single Int8Dequantize op. Previously, scale and bias would always be 1 and 0. Now the proper value is set.

Reviewed By: yinghai

Differential Revision: D22912186

fbshipit-source-id: a6620c3493e492bdda91da73775bfc9117db12d1
2020-08-06 14:40:25 -07:00
Ehsan K. Ardestani
a5af2434fe NVMified NE Eval
Summary:
This diff NVMifies the NE Eval Flow. It:
- defines a `LoadNVM` operator which either
  - receives a list of NVM blobs, or
  - extracts the blobs that could be NVMified from the model
- dumps NVMified blobs into NVM and deallocates them from DRAM
- NVMifies the Eval net on the dper and C2 backends

Specific NVMOp for SLS is pushed through different diffs.

Test Plan: flow-cli test-locally dper.workflows.evaluation.eval_workflow --parameters-file=/mnt/public/ehsaardestani/temp/small_model.json 2>&1 | tee log

Reviewed By: yinghai, amylittleyang

Differential Revision: D22469973

fbshipit-source-id: ed8379ad404e96d04ac05e580176d3aca984575b
2020-08-06 10:25:31 -07:00
Luca Wehrstedt
c30bc6d4d7 Update TensorPipe submodule (#42522)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42522

Main changes:
- Consolidated CMake files to have a single entry point, rather than having a specialized one for PyTorch.
- Changed the way the preprocessor flags are provided, and changed their name.

There were a few instances in PyTorch's CMake files where we were directly adding TensorPipe's source directory as an include path, which however doesn't contain the auto-generated header we now added. We fix that by linking those targets against the `tensorpipe` CMake target, so that the include paths defined by TensorPipe, which contain that auto-generated header, are used.

I'm turning off SHM and CMA for now because they have never been covered by the CI. I'll enable them in a separate PR so that if they turn out to be flaky we can revert that change without reverting this one.

Test Plan: CI

Reviewed By: malfet

Differential Revision: D22959472

fbshipit-source-id: 1959a41c4a66ef78bf0f3bd5e3964969a2a1bf67
2020-08-06 02:14:58 -07:00
Ilia Cherniavskii
a53fdaa23f Remove ProfiledType (#42570)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42570

ProfiledType doesn't do anything and is not currently used, so it is removed

Test Plan: CI

Reviewed By: ezyang

Differential Revision: D22938664

Pulled By: ilia-cher

fbshipit-source-id: 037c512938028f44258b702bbcde3f8c144f4aa0
2020-08-06 01:52:08 -07:00
Mike Ruberry
ccfce9d4a9 Adds fft namespace (#41911)
Summary:
This PR creates a new namespace, torch.fft (torch::fft) and puts a single function, fft, in it. This function is a simplified version of NumPy's [numpy.fft.fft](https://numpy.org/doc/1.18/reference/generated/numpy.fft.fft.html?highlight=fft#numpy.fft.fft) that accepts no optional arguments. It is intended to demonstrate how to add and document functions in the namespace, and is not intended to deprecate the existing torch.fft function.

Adding this namespace was complicated by the existence of the torch.fft function in Python. Creating a torch.fft Python module makes this name ambiguous: does it refer to a function or module? If the JIT didn't exist, a solution to this problem would have been to make torch.fft refer to a callable class that mimicked both the function and module. The JIT, however, cannot understand this pattern. As a workaround it's required to explicitly `import torch.fft` to access the torch.fft.fft function in Python:

```
import torch.fft

t = torch.randn(128, dtype=torch.cdouble)
torch.fft.fft(t)
```

See https://github.com/pytorch/pytorch/issues/42175 for future work. Another possible future PR is to get the JIT to understand torch.fft as a callable class so it need not be imported explicitly to be used.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41911

Reviewed By: glaringlee

Differential Revision: D22941894

Pulled By: mruberry

fbshipit-source-id: c8e0b44cbe90d21e998ca3832cf3a533f28dbe8d
2020-08-06 00:20:50 -07:00
Rui Liu
92b7347fd7 Enforce counter value to double type in rowwise_counter
Summary:
Enforce counter value to double type in rowwise_counter.

**Context:**
The existing implementation is using float type for the counter value. But due to the precision limit of a single-precision floating-point number [1], we observed that the counter value can't increment beyond 16777216.0 (i.e., the max value is 16777216.0) in our earlier experiments. We decided to enforce double type to avoid this issue.

[1] https://stackoverflow.com/questions/12596695/why-does-a-float-variable-stop-incrementing-at-16777216-in-c
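
A quick numpy demonstration of the precision cliff described in [1]:

```
import numpy as np

c = np.float32(2**24)           # 16777216.0; float32 has a 24-bit significand
print(c + np.float32(1.0))      # 16777216.0: the +1 is rounded away
print(np.float64(2**24) + 1.0)  # 16777217.0: double keeps counting
```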

Test Plan:
op test
```
ruixliu@devvm1997:~/fbsource/fbcode/caffe2/caffe2/python/operator_test(f0b0b48c)$ buck test :rowwise_counter_test
Trace available for this run at /tmp/testpilot.20200728-083200.729292.log
TestPilot test runner for Facebook. See https://fburl.com/testpilot for details.
Testpilot build revision cd2638f1f47250eac058b8c36561760027d16add fbpkg f88726c8ebde4ba288e1172a348c7f46 at Mon Jul 27 18:11:43 2020 by twsvcscm from /usr/local/fbprojects/packages/testinfra.testpilot/887/t.par
Discovering tests
Running 1 test
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/7881299364977047
      ✓ caffe2/caffe2/python/operator_test:rowwise_counter_test - test_rowwise_counter (caffe2.caffe2.python.operator_test.rowwise_counter_test.TestRowWiseCounter) 0.265 1/1 (passed)
      ✓ caffe2/caffe2/python/operator_test:rowwise_counter_test - main 14.414 (passed)
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/7881299364977047
Summary (total time 18.51s):
  PASS: 2
  FAIL: 0
  SKIP: 0
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0
```

optimizer test
```
ruixliu@devvm1997:~/fbsource/fbcode/caffe2/caffe2/python(7d66fbb9)$ buck test :optimizer_test
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/7036874434841896
Summary (total time 64.87s):
  PASS: 48
  FAIL: 0
  SKIP: 24
    caffe2/caffe2/python:optimizer_test - testGPUDense (caffe2.caffe2.python.optimizer_test.TestMomentumSgd)
    caffe2/caffe2/python:optimizer_test - testGPUDense (caffe2.caffe2.python.optimizer_test.TestGFtrl)
    caffe2/caffe2/python:optimizer_test - test_caffe2_cpu_vs_numpy (caffe2.caffe2.python.optimizer_test.TestYellowFin)
    caffe2/caffe2/python:optimizer_test - testGPUDense (caffe2.caffe2.python.optimizer_test.TestSparseRAdam)
    caffe2/caffe2/python:optimizer_test - testGPUDense (caffe2.caffe2.python.optimizer_test.TestRowWiseAdagradWithCounter)
    caffe2/caffe2/python:optimizer_test - testGPUDense (caffe2.caffe2.python.optimizer_test.TestAdagrad)
    caffe2/caffe2/python:optimizer_test - test_caffe2_gpu_vs_numpy (caffe2.caffe2.python.optimizer_test.TestYellowFin)
    caffe2/caffe2/python:optimizer_test - testDense (caffe2.caffe2.python.optimizer_test.TestRowWiseAdagrad)
    caffe2/caffe2/python:optimizer_test - testGPUDense (caffe2.caffe2.python.optimizer_test.TestFtrl)
    caffe2/caffe2/python:optimizer_test - testSparse (caffe2.caffe2.python.optimizer_test.TestRmsProp)
    ...and 14 more not shown...
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0
```

param download test
```
ruixliu@devvm1997:~/fbsource/fbcode/caffe2/caffe2/fb/net_transforms/tests(7ef20a38)$ sudo buck test :param_download_test
Finished test run: Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/6473924481526935
```

e2e flow:
f208394929
f207991149
f207967273

ANP notebook to check the counter value loaded from the flows
https://fburl.com/anp/5fdcbnoi

screenshot of the loaded counter (note that counter max is larger than 16777216.0)

{F250926501}

Reviewed By: ellie-wen

Differential Revision: D22711514

fbshipit-source-id: 426fed7415270aa3f276dda8141907534734337f
2020-08-05 20:40:51 -07:00
Summer Deng
509fb77b70 Adjust bound_shape_inferencer to take 4 inputs for FCs (#41934)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41934

The model exported from online training workflow with int8 quantization contains FCs with 4 inputs. The extra input is the quant_param blob. This diff is to adjust the bound_shape_inferencer and int8 op schema to get shape info for the quant_param input.

Test Plan:
```
buck test caffe2/caffe2/opt:bound_shape_inference_test
```

Reviewed By: yinghai

Differential Revision: D22683554

fbshipit-source-id: 684d1433212a528120aba1c37d27e26b6a31b403
2020-08-05 18:44:48 -07:00
Andres Suarez
9ea9d1b52e [fbs][2/n] Remove .python3 markers
Test Plan:
`xbgr '\.python3'` shows only one (dead) usage of this file:
https://www.internalfb.com/intern/diffusion/FBS/browse/master/fbcode/python/repo_stats/buck.py?commit=9a8dd3243207819325d520c208218f6ab69e4e49&lines=854

Reviewed By: lisroach

Differential Revision: D22955631

fbshipit-source-id: e686d9157c08c347d0ce4acdd05bd7ab29ff7df5
2020-08-05 18:25:50 -07:00
Stephen Chen
54ffb05eff better error message between C2 and glow (#41603)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41603

Pull Request resolved: https://github.com/pytorch/glow/pull/4704

Previously in the glow onnxifi path, when an error is encountered, we log it to stderr then just return ONNXIFI_STATUS_INTERNAL_ERROR to C2. C2 then does CAFFE2_ENFORCE_EQUAL(return_code, ONNXIFI_STATUS_SUCCESS). The error message that eventually went to the user is something like

   [enforce fail at onnxifi_op.cc:545] eventStatus == ONNXIFI_STATUS_SUCCESS. 1030 vs 0

This diff adds plumbing to get human readable error message out of glow into C2.

Test Plan:
Run ads replayer. Overload it with traffic. The error message sent back to the client used to be

  E0707 00:57:45.697196 3709559 Caffe2DisaggAcceleratorTask.cpp:493] During running REMOTE_OTHER net: [enforce fail at onnxifi_op.cc:545] eventStatus == ONNXIFI_STATUS_SUCCESS. 1030 vs 0 (Error from operator:....

Now it's

```
E0707 16:46:48.366263 1532943 Client.cpp:966] Exception when calling caffe2_run_disagg_accelerator on remote predictor for model 190081310_0 : apache::thrift::TApplicationException: c10::Error: [enforce fail at onnxifi_op.cc:556] .
Error code: RUNTIME_REQUEST_REFUSED
Error message: The number of allowed queued requests has been exceeded. queued requests: 100 allowed requests: 100
Error return stack:
glow/glow/lib/Runtime/HostManager/HostManager.cpp:673
glow/glow/lib/Onnxifi/HostMana (Error from operator:...
```

Reviewed By: gcatron, yinghai

Differential Revision: D22416857

fbshipit-source-id: 564bc7644d9666eb660725c2dca5637affae9b73
2020-08-05 16:25:13 -07:00
Stephen Chen
5023995292 fix output size adjustment for onnxifi_op
Summary: this breaks if we cut the net at certain int8 op boundaries.

Test Plan: Used net_runner to lower a single Int8Quantize op. It used to break. Now it works.

Reviewed By: yinghai

Differential Revision: D22912178

fbshipit-source-id: ca306068c9768df84c1cfa8b34226a1330e19912
2020-08-05 15:55:46 -07:00
Mike Ruberry
24e2a8a171 Revert D22780307: Fix illegal memory access issue for CUDA version of SplitByLengths operator.
Test Plan: revert-hammer

Differential Revision:
D22780307 (76905527fe)

Original commit changeset: c5ca60ae16b2

fbshipit-source-id: f3c99eec5f05121e2bed606fe2ba84a0be0cdf16
2020-08-05 12:47:56 -07:00
Yongbin Gu
18a32b807b Add API to collect output_col_minmax_histogram
Summary:
Add an API to collect output_col_minmax_histogram. This is used to implement input_equalization.

Rolled back the revised collect_single_histogram in the new version to make sure it does not affect the product.
The newly added API can collect the activation histogram and the output column max histogram at the same time.

Test Plan:
Add a unit test, and pass it.
https://our.intern.facebook.com/intern/testinfra/testrun/2251799847601374
After updating the dump API, it passed the updated unit test
https://our.intern.facebook.com/intern/testinfra/testrun/844425097716401

Integrated the output_col_minmax_histogram into the collect single histogram, and made it backward compatible
https://our.intern.facebook.com/intern/testinfra/testrun/8162774342207893

I added different cases to test the newly added function. It passed the unit test https://our.intern.facebook.com/intern/testinfra/testrun/4503599658969000

Tested after new revision: https://our.intern.facebook.com/intern/testinfra/testrun/5348024589078557

Reviewed By: hx89

Differential Revision: D22919913

fbshipit-source-id: c9cb05e0cf14af0dfde3d22921abb42f97a61df2
2020-08-05 12:33:10 -07:00
Yinghai Lu
5c5d7a9dca Freeze dynamic (re)quantization ops into standard ones (#42591)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42591

We don't support lowering with 2-input Int8Quantize and 4-input Int8FC. Just do a conversion to absorb the quantization params into the op itself.
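
A hypothetical Python sketch of the conversion (the diff's transform is in C++; the arg names `Y_scale`/`Y_zero_point` follow common Caffe2 int8 conventions and are assumptions here):

```
from caffe2.proto import caffe2_pb2

def freeze_quant_params(op, scale, zero_point):
    # op is a caffe2_pb2.OperatorDef. Turn a 2-input Int8Quantize
    # (data, quant_param blob) into the standard 1-input form by
    # absorbing the quantization params as op arguments.
    assert op.type == "Int8Quantize" and len(op.input) == 2
    del op.input[1]  # drop the quant_param blob input
    scale_arg = op.arg.add()
    scale_arg.name = "Y_scale"
    scale_arg.f = scale
    zp_arg = op.arg.add()
    zp_arg.name = "Y_zero_point"
    zp_arg.i = zero_point
    return op
```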

Test Plan:
```
buck test caffe2/caffe2/quantization/server:quantize_dnnlowp_op_test
```

Reviewed By: benjibc

Differential Revision: D22942673

fbshipit-source-id: a392ba2afdfa39c05c5adcb6c4dc5f814c95e449
2020-08-05 11:53:09 -07:00
Ren Chen
76905527fe Fix illegal memory access issue for CUDA version of SplitByLengths operator.
Summary:
1. Fix illegal memory access issue for SplitByLengths operator in the CUDA context.
2. Add support for scaling lengths vectors in the SplitByLengths operator.
3. Add tests for the SplitByLengths operator in the CUDA context.

Example of the SplitByLengths operator processing a scaling lengths vector:
value vector A = [1, 2, 3, 4, 5, 6]
length vector B = [1, 2]
after execution of the SplitByLengths operator,
the output should be [1, 2] and [3, 4, 5, 6]

Test Plan: buck test mode/dev-nosan caffe2/caffe2/python/operator_test:concat_split_op_test

Reviewed By: kennyhorror

Differential Revision: D22780307

fbshipit-source-id: c5ca60ae16b24032cedfa045a421503b713daa6c
2020-08-05 11:46:00 -07:00
Dmytro Dzhulgakov
06d978a9ad [c10/cuda] Reorganize device_count() and robustly surface ASAN warnings (#42249)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42249

Main change is to bring Caffe2's superior error messages for cuda initialization into c10 and use them in all code paths.

Basic logic:

| Case | Call to device_count() | init_cuda, e.g. allocating tensor |
| -- | -- | -- |
| all good | non-zero | just works |
| no gpus | 0, no warning | throw exception with good message |
| driver issues | 0, produce warning | throw exception with good message |
| out of memory with ASAN | 0, produce warning| throw exception with ASAN message |

Previously, the error thrown from init_cuda was very generic and the ASAN warning (if any) was buried in the logs.

Other clean up changes:
* always cache device_count() in a static variable
* move all ASAN macros into c10

Test Plan:
Hard to unittest because of build modes. Verified manually that the behavior from the table above holds by running the following script in different modes (ASAN/no-ASAN, CUDA_VISIBLE_DEVICES=):

```
print('before import')
import torch
print('after import')
print('devices: ', torch.cuda.device_count())
x = torch.tensor([1,2,3])
print('tensor creation')
x = x.cuda()
print('moved to cuda')
```

Reviewed By: ngimel

Differential Revision: D22824329

fbshipit-source-id: 5314007313a3897fc955b02f8b21b661ae35fdf5
2020-08-05 11:39:31 -07:00
Yongbin Gu
d45e2d3ef9 Reduce the output overhead of OutputColumnMaxHistogramObserver by making bin_nums configurable; update observer_test.py
Summary: The current OutputColumnMaxHistogramObserver outputs 2048 bins for each column, so the file is extremely large and dumping takes quite long, even though we ultimately only use the min and max. This diff makes bin_nums configurable via an argument, with the default set to 16 to reduce dumping overhead. When we need more bins to analyze the results, we only need to change this argument.
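
Since only the min and max are consumed downstream, the bin count does not affect the recovered range; a quick numpy illustration (not the observer's actual output format):

```
import numpy as np

x = np.random.randn(100000).astype(np.float32)
for bins in (2048, 16):
    _, edges = np.histogram(x, bins=bins)
    # The observed min/max are the outermost edges, whatever the bin count.
    print(bins, edges[0] == x.min(), edges[-1] == x.max())
```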

Test Plan:
buck run caffe2/caffe2/quantization/server:observer_test

{F263843430}

Reviewed By: hx89

Differential Revision: D22918202

fbshipit-source-id: bda34449355b269b24c55802012450ebaa4d280c
2020-08-04 17:07:25 -07:00
Yinghai Lu
8850fd1952 Add python inferface to create OfflineTensor (#42516)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42516

att. We need it for some scripts.

Reviewed By: houseroad

Differential Revision: D22918112

fbshipit-source-id: 8a1696ceeeda67a34114bc57cb52c925711cfb4c
2020-08-04 01:31:34 -07:00
Ann Shan
d707d4bf6d Implement a light SGD optimizer (#42137)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42137

This PR implements an SGD optimizer class similar to torch::optim::SGD, but it doesn't inherit from torch::optim::Optimizer, for use on mobile devices (or other lightweight use cases).

Adding Martin's comment for visibility: "SGD may be the only optimizer used in near future. If more client optimizers are needed, refactoring the full optim codes and reusing the existing code would be an option."

Test Plan: Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D22846514

Pulled By: ann-ss

fbshipit-source-id: f5f46804aa021e7ada7c0cd3f16e24404d10c7eb
2020-08-03 17:27:53 -07:00
Yinghai Lu
dbdd28207c Expose a generic shape info struct for ONNXIFI Python interface (#42421)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42421

Previously, we could only feed shape info from Python with float dtype and batch-based dim type when doing onnxifi from Python. This diff removes this limitation and uses the TensorBoundShapes protobuf as a generic shape info struct. This will make the onnxifi interface in Python more flexible.

Reviewed By: ChunliF

Differential Revision: D22889781

fbshipit-source-id: 1a89f3a68c215a0409738c425b4e0d0617d58245
2020-08-03 16:10:05 -07:00
Xing Wang
ebfff31e19 [distributedhogwild] Introducing new tags for distributed hogwild. (#42381)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42381

Introduce new tag to support distributed hogwild.

Reviewed By: boryiingsu

Differential Revision: D20484099

fbshipit-source-id: 5973495589e0a7ab185d3867b37437aa747f408a
2020-08-03 07:10:44 -07:00
Martin Yuan
bfa94487b9 Remove register_mobile_autograd.cpp. (#42397)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42397

Since the autograd registration is unified to code-gen, we don't need to keep a manual registration file for mobile.
Remove it to avoid extra maintenance.

Test Plan: Imported from OSS

Reviewed By: ljk53

Differential Revision: D22883153

Pulled By: iseeyuan

fbshipit-source-id: 6db0bd89369beab9eed6e9a9692dd46f5bd1ff48
2020-08-02 14:14:33 -07:00
Xiaomeng Yang
5769b06ab5 [Caffe2] Remove explicitly divide by zero in SpatialBN training mode (#42380)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42380

[Caffe2] Remove explicitly divide by zero in SpatialBN training mode

Test Plan: buck test mode/dev-nosan //caffe2/caffe2/python/operator_test:spatial_bn_op_test

Reviewed By: houseroad

Differential Revision: D22873214

fbshipit-source-id: 70b505391b5db02b45fc46ecd7feb303e50c6280
2020-08-01 11:54:58 -07:00
Venkata Chintapalli
ff91b169c7 Changes to match Fused Op: Dequantize->Swish->Quantize (#42255)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42255

Changes to match Fused Op: Dequantize->Swish->Quantize
* Changes to scale handling

Results showing matching intermediate and final Swish_Int8 Op.
P137389801

Test Plan: test case test_deq_swish_quant_nnpi.py

Reviewed By: hyuen

Differential Revision: D22827499

fbshipit-source-id: b469470ca66f6405ccc89696694af372ce6ce89e
2020-07-31 16:54:39 -07:00
Jing Ma
4fc525e729 [Dper3] Implementation of squeezed input to DC++
Summary:
This diff provides an option for the DC++ module to use the squeezed sparse feature embeddings to generate attention weights, with the purpose of reducing the network size to achieve QPS gains. There are 3 squeeze options along the embedding dimension (sum, max, and mean), provided for both attention weight and resnet generation.
Example workflow: f208474456

{F257199459}

Test Plan:
1. Test single ops
buck test dper3/dper3/modules/low_level_modules/tests:single_operators_test -- test_reduce_back_mean
buck test dper3/dper3/modules/low_level_modules/tests:single_operators_test -- test_reduce_back_max
2. Test DC++ module
buck test dper3/dper3/modules/tests:core_modules_test -- test_dc_pp_arch_one_layer_compressed_embeddings_only_squeeze_input
buck test dper3/dper3/modules/tests:core_modules_test -- test_dc_pp_arch_shared_input_squeeze_input
buck test dper3/dper3/modules/tests:core_modules_test -- test_dc_pp_input_compress_embeddings_squeeze_input
3. Test Arch
buck test dper3/dper3_models/ads_ranking/model_impl/sparse_nn/tests:sparse_nn_lib_test -- test_dense_sparse_interaction_compress_dot_arch_dot_compress_pp_squeezed_input
4. e2e test
buck test dper3/dper3_models/ads_ranking/tests:model_paradigm_e2e_tests -- test_sparse_nn_compress_dot_attention_fm_max_fc_size_squeeze_input

Reviewed By: taiqing

Differential Revision: D22825069

fbshipit-source-id: 29269ea22cb47d487a1c92a1f6daae1055f54cfc
2020-07-31 14:31:43 -07:00
Yan Xie
bdd9ef1981 Support RowWiseSparseAdam on GPU (#35404)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35404

Implement RowWiseSparseAdam on CUDA

Reviewed By: xw285cornell

Differential Revision: D20650225

fbshipit-source-id: 5f871e2f259e362b713c9281b4d94534453995cf
2020-07-31 10:47:29 -07:00
Edward Yang
352e15f1a2 Revert D22812445: Update TensorPipe submodule
Test Plan: revert-hammer

Differential Revision:
D22812445 (2335430086)

Original commit changeset: e6d824bb28f5

fbshipit-source-id: 606632a9aaf2513b5ac949e4d6687aa7563eae5d
2020-07-31 10:16:48 -07:00
DeepakVelmurugan
fbb052c2cc BlackList to BlockList (#42279)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41701 by changing the blackList convention to the blockList convention

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42279

Reviewed By: VitalyFedyunin

Differential Revision: D22843178

Pulled By: malfet

fbshipit-source-id: c9be5a5f084dfd0e46545d4a3d1124ef59277604
2020-07-30 18:06:49 -07:00
Hong Xu
7d6c4f62ef Remove 4 unused variables in lp_pool_op.cc (#42329)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42329

Reviewed By: VitalyFedyunin

Differential Revision: D22850894

Pulled By: mrshenli

fbshipit-source-id: 1e91380a432525b83c0bb0bfef0d5067c767cb67
2020-07-30 15:50:17 -07:00
Luca Wehrstedt
2335430086 Update TensorPipe submodule (#42225)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42225

Main changes:
- Consolidated CMake files to have a single entry point, rather than having a specialized one for PyTorch.
- Changed the way the preprocessor flags are provided, and changed their name.

There were a few instances in PyTorch's CMake files where we were directly adding TensorPipe's source directory as an include path, which however doesn't contain the auto-generated header we now added. We fix that by adding the `tensorpipe` CMake target as a dependency, so that the include paths defined by TensorPipe are used, which contain that auto-generated header.

I'm turning off SHM and CMA for now because they have never been covered by the CI. I'll enable them in a separate PR so that if they turn out to be flaky we can revert that change without reverting this one.

Test Plan: CircleCI is all green.

Reviewed By: beauby

Differential Revision: D22812445

fbshipit-source-id: e6d824bb28f5afe75fd765de0430968174f3531f
2020-07-30 02:32:52 -07:00
Hao Lu
4f163df41a [caffe2] Special handling of If/AsyncIf op in RemoveOpsByType (#42286)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42286

One more bug to fix. Operators such as If and AsyncIf need special treatment not just in `onnx::SsaRewrite`, but also in `RemoveOpsByType`. The solution needs two steps:
1) add external inputs/outputs of the subnets of If/AsyncIf op to the inputs/outputs of the op
2) if the inputs/outputs of the If/AsyncIf op need to be renamed as a result, the same inputs/outputs of the subnets need to be renamed as well.

I also added unit tests to cover this corner case.
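
A hypothetical Python sketch of step 1 (the real change is in C++; the `then_net`/`else_net` arg names follow the Caffe2 If-op convention, and the renaming in step 2 is omitted):

```
def lift_subnet_externals(op):
    # Surface the subnets' external blobs on the If/AsyncIf op itself so
    # that RemoveOpsByType sees every blob the op actually touches.
    for arg in op.arg:
        if arg.name in ("then_net", "else_net") and arg.HasField("n"):
            for b in arg.n.external_input:
                if b not in op.input:
                    op.input.append(b)
            for b in arg.n.external_output:
                if b not in op.output:
                    op.output.append(b)
```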

Test Plan:
```
buck test //caffe2/caffe2/fb/predictor:black_box_predictor_test

mkdir /tmp/models
rm -rf /tmp/$USER/snntest
rm -rf /tmp/snntest
buck run mode/opt admarket/lib/ranking/prediction_replayer/snntest_replayer_test/tools:snntest_replay_test -- --serving_paradigm=USER_AD_PRECOMPUTATION_DSNN
```

Differential Revision: D22834028

fbshipit-source-id: c070707316cac694f452a96e5c80255abf4014bc
2020-07-30 02:02:20 -07:00
Jianyu Huang
f30ac66e79 [caffe2] Fix a performance bug in Dedup SparseAdagrad op (#42287)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42287

We shouldn't use block_size for thread dimensions in linear_index_weight_offsets_dedup_kernel, since the kernel doesn't iterate the embedding dimensions.
ghstack-source-id: 108834058

Test Plan:
```
buck test mode/dev-nosan //caffe2/caffe2/fb/net_transforms/tests:fuse_sparse_ops_test -- 'test_fuse_sparse_adagrad_with_sparse_lengths_sum_gradient \(caffe2\.caffe2\.fb\.net_transforms\.tests\.fuse_sparse_ops_test\.TestFuseSparseOps\)' --print-passing-details
```

Reviewed By: jspark1105

Differential Revision: D22800959

fbshipit-source-id: 641d52a51070715c04f9fd286e7e22ac62001f61
2020-07-30 01:00:59 -07:00
Priyanshu
6c251f74b2 replace black_list/blacklist with blocklist/block_list (#42089)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41734

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42089

Reviewed By: pbelevich

Differential Revision: D22794556

Pulled By: SplitInfinity

fbshipit-source-id: 4404845b6293b076b3c8cc02b135b20c91397a79
2020-07-29 16:26:02 -07:00
Xing Wang
27b03d62de [HT] Clear the device placement tag for the auto gen sum so that we could break the component for FC sharing the same input (#42219)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42219

Introduce a new extra info that is tagged on the forward net for the operators sharing the same input. The effect is that the auto-generated sum of gradients for the input will not follow the operator tags in the forward net. This allows more flexible device allocation.

Test Plan:
# unit test
`./buck-out/gen/caffe2/caffe2/python/core_gradients_test#binary.par -r  testMultiUseInputAutoGenSumDevice`

Reviewed By: xianjiec, boryiingsu

Differential Revision: D22609080

fbshipit-source-id: d558145e5eb36295580a70e1ee3a822504dd439a
2020-07-29 15:21:27 -07:00
Xiaomeng Yang
60f51542dc [Caffe2] Fix spatial_bn bug for computing running_var on CPU or on CUDA without CuDNN (#42151)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42151

Previously our Caffe2 SpatialBN op implementation computed running_var without the unbias coefficient, which is incorrect. It should have failed the tests, because the output differs from CuDNN's output, but our tests were too weak to find this bug. This diff fixes all of them.
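
For reference, a numpy sketch of a running_var update with the unbias coefficient N/(N-1) applied (the momentum convention here is illustrative, not necessarily Caffe2's exact one):

```
import numpy as np

def update_running_var(running_var, batch, momentum=0.9):
    n = batch.size                      # elements reduced over per channel
    batch_var = batch.var()             # biased estimate (divides by n)
    unbiased = batch_var * n / (n - 1)  # the unbias coefficient at issue
    return momentum * running_var + (1.0 - momentum) * unbiased
```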

Test Plan: buck test mode/dev-nosan //caffe2/caffe2/python/operator_test:spatial_bn_op_test

Reviewed By: houseroad

Differential Revision: D22786127

fbshipit-source-id: db80becb67d60c44faae180c7e4257cb136a266d
2020-07-29 11:20:03 -07:00
Ying Zhang
b2ef7fa359 Add a flag to enforce fp32 to fp16 conversion for all inputs of the onnxifi net. (#39931)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39931

ATT.

Reviewed By: yinghai, ChunliF

Differential Revision: D21993492

fbshipit-source-id: ff386e6e9b95a783906fc1ae6a62462e6559a20b
2020-07-28 16:48:43 -07:00
Chunli Fu
8a644f0c13 [Shape Inference] Fix InferFC
Summary: Sometimes the first dim of X in FC is BATCH_OF_FEATURE_MAX instead of BATCH. This caused an issue in f207899183 (where the first dim of X is 64 but was set to 1 in inferFC). Change the check from `!= BATCH` to `== UNKNOWN`.

Test Plan: unit test

Reviewed By: yinghai

Differential Revision: D22784691

fbshipit-source-id: eb66ba361d6fe75672b13edbac2fbd269a7e7a00
2020-07-28 16:43:19 -07:00
Venkata Chintapalli
3c084fd358 Dequant => Swish => Quant Test case. (#41976)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41976

Dequant => Swish => Quant Test case.

(Note: this ignores all push blocking failures!)

Test Plan: test_deq_swish_quant_nnpi.py.

Reviewed By: hyuen

Differential Revision: D22718593

fbshipit-source-id: 1cee503a27e339af6d89c819007511b90bb6610c
2020-07-28 16:05:12 -07:00
Nikita Shulga
fd9205e14b Enable caffe2 tests for RocM jobs (#41604)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41604

Reviewed By: ezyang

Differential Revision: D22603703

Pulled By: malfet

fbshipit-source-id: 789ccf2bb79668a5a68006bb877b2d88fb569809
2020-07-28 14:21:42 -07:00
Nitish Awasthi
4d17ecb071 Changed Blacklisted to Blocklisted (#42100)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41703

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42100

Reviewed By: ngimel

Differential Revision: D22780380

Pulled By: SplitInfinity

fbshipit-source-id: d465c41f1d4951ab6de55cb827c7ef53975209af
2020-07-28 13:21:26 -07:00
Nikita Shulga
48ae5945de Skip TestExtractPredictorNet if compiled without OpenCV (#42168)
Summary:
Found while trying to get RocM Caffe2 CI green

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42168

Reviewed By: seemethere

Differential Revision: D22791879

Pulled By: malfet

fbshipit-source-id: 8f7ef9711bdc5941b2836e4c8943bb95c72ef8af
2020-07-28 11:26:55 -07:00
Nikita Shulga
2f61aca17b Skip DataIO tests relying on LevelDB if compiled without it (#42169)
Summary:
Found while trying to get RocM Caffe2 job green

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42169

Reviewed By: seemethere

Differential Revision: D22791896

Pulled By: malfet

fbshipit-source-id: 9df6233876aec5ead056365499bab970aa7e8bdc
2020-07-28 10:18:26 -07:00
Khalid Almufti
b282297559 Replace whitelist with allowlist (#42067)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41757

I've replaced all the whitelist with allowlist for this issue.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42067

Reviewed By: pbelevich

Differential Revision: D22791690

Pulled By: malfet

fbshipit-source-id: 638c13cf49915f5c83bd79c7f4a39b8390cc15b4
2020-07-28 08:01:16 -07:00
Hao Lu
5336ccc1b2 [BugFix] Fix bug in onnx::SsaRewrite (#42148)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42148

Differential Revision: D22687388

fbshipit-source-id: facf7a186dd48d6f919d0ff5d42f756977c3f9f4
2020-07-28 01:44:47 -07:00
Yinghai Lu
0a0960126c If we don't collect tracing, always free the trace data (#42118)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42118

We toggle tracing on with a certain probability. In the case of 3 inferences with trace on/off/on, we leak the trace from the first inference. Always cleaning up the trace fixes this.
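
The shape of the fix, as a hypothetical Python sketch (the real change is in the C++ OnnxifiOp; `run_traced` and `sink` are placeholders, not actual APIs):

```
def run_inference(run_traced, collect_trace, sink):
    trace = run_traced()  # every run produces trace data
    try:
        if collect_trace:
            sink(trace)   # only sometimes is it collected
    finally:
        # Always free the trace, even when not collecting, so an
        # on/off/on sequence cannot leak the first run's trace data.
        trace.clear()
```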

Test Plan:
predictor

I created a tiny repro here: D22786551

With this fix, this issue is gone.

Reviewed By: gcatron

Differential Revision: D22768382

fbshipit-source-id: 9ee0bbcb2bc5f76107dae385759fe578909a683d
2020-07-27 21:49:30 -07:00
Jiyan Yang
c062cdbd90 Log the net if blob doesn't exist when setting output record (#41971)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41971

Reviewed By: wx1988

Differential Revision: D22490309

fbshipit-source-id: d967ee211b610f5523a307b5266b9fcb0277a21c
2020-07-27 19:13:50 -07:00
Stephen Chen
f805184165 onnxifi: make it work with AsyncIf
Summary:
the onnxifi path didn't handle the input/output name rewrite for SSA correctly for the AsyncIf op. Add support for it.

Also fixed a place where we lose the net type while doing onnxifi transform.

Test Plan: Load 163357582_593, which is a multi-feed model that uses AsyncIf. This used to fail with C2 not finding some blobs in the workspace. Now it works.

Reviewed By: dhe95

Differential Revision: D21268230

fbshipit-source-id: ce7ec0e952513d0f251df1bfcfb2b0250f51fd94
2020-07-27 18:27:35 -07:00
Lingyi Liu
d6f1346c37 Add a new op for converting the dense feature to sparse representation
Summary: We need this op to avoid splicing a dense tensor and then using the Mergesinglescaler op.

Test Plan: integrated test with dper2

Differential Revision: D22677523

fbshipit-source-id: f4f9a1f06841b0906ec8cbb435482ae0a89e1721
2020-07-27 12:45:37 -07:00
Venkata Chintapalli
4290d0be60 Remove settings for the logit test case. (#42114)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42114

Remove settings for the logit test case.

(Note: this ignores all push blocking failures!)

Test Plan: test_op_nnpi_fp16.py test case.

Reviewed By: hyuen

Differential Revision: D22766728

fbshipit-source-id: 2fe8404b103c613524cf1beddf1a0eb9068caf8a
2020-07-27 10:59:23 -07:00
Tristan Rice
976e614915 caffe2: add PIPELINE tag (#41482)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41482

This adds a new tag for use with pipeline parallelism.

Test Plan: CI

Reviewed By: heslami

Differential Revision: D22551487

fbshipit-source-id: 90910f458a9bce68f7ef684773322a49aa24494a
2020-07-24 15:25:14 -07:00
Jeff Daily
2e95b29988 restore at::Half support for caffe2 SumOp (#41952)
Summary:
PR https://github.com/pytorch/pytorch/issues/40379 added long support but removed at::Half support.  Restore at::Half support.

CC ezyang xw285cornell neha26shah

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41952

Reviewed By: colesbury

Differential Revision: D22720656

Pulled By: xw285cornell

fbshipit-source-id: be83ca7fe51fc43d81bc0685a3b658353d42f8ea
2020-07-24 10:49:06 -07:00
Priyanshu
401ac2dd39 Replaced whitelisted with allowed (#41867)
Summary:
Closes https://github.com/pytorch/pytorch/issues/41746
Closes https://github.com/pytorch/pytorch/issues/41745

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41867

Reviewed By: izdeby

Differential Revision: D22703533

Pulled By: mrshenli

fbshipit-source-id: 915895463a92e18f36db93b8884d9fd432c0997d
2020-07-23 16:53:51 -07:00
Ann Shan
dfe7d27d0e implement lite parameter serializer (#41403)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41403

Test Plan: Imported from OSS

Reviewed By: kwanmacher

Differential Revision: D22611633

Pulled By: ann-ss

fbshipit-source-id: b391e8c96234b2e69f350119a11f688e920c7817
2020-07-23 14:25:44 -07:00
Mengchi Zhang
30ce7b3740 Fix bug when compiling with caffe2 (#41868)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41868

Fix bug when compiling with caffe2

Reviewed By: jianyuh

Differential Revision: D22670707

fbshipit-source-id: aa654d7b9004257e0288c8ae8819ca5752eea443
2020-07-23 09:11:05 -07:00
Rohith Menon
4e16be9073 [MemLeak] Fix memory leak from releasing unique ptr (#41883)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41883

Fix memory leak from releasing unique ptr

Test Plan:
Tested serialization with and without the change.

Heap profile without change:
```
Welcome to jeprof!  For help, type 'help'.
(jeprof) top
Total: 7298.4 MB
  4025.2  55.2%  55.2%   4025.2  55.2% c10::alloc_cpu (inline)
  3195.3  43.8%  98.9%   3195.3  43.8% caffe2::SerializeUsingBytesOrInt32
    63.6   0.9%  99.8%     63.6   0.9% __gnu_cxx::new_allocator::allocate (inline)
     5.0   0.1%  99.9%      5.0   0.1% google::protobuf::RepeatedField::Reserve
     2.5   0.0%  99.9%      2.5   0.0% folly::aligned_malloc (inline)
     1.2   0.0%  99.9%      1.2   0.0% caffe2::detail::CopyFromProtoWithCast (inline)
     1.0   0.0%  99.9%      1.0   0.0% __new_exitfn
     1.0   0.0% 100.0%      1.0   0.0% std::_Function_base::_Base_manager::_M_init_functor (inline)
     0.5   0.0% 100.0%      0.5   0.0% folly::HHWheelTimerBase::newTimer (inline)
     0.5   0.0% 100.0%      0.5   0.0% std::__detail::_Hashtable_alloc::_M_allocate_node
```

Heap profile with change:
```
Welcome to jeprof!  For help, type 'help'.
(jeprof) top
Total: 6689.2 MB
  4025.2  60.2%  60.2%   4025.2  60.2% c10::alloc_cpu (inline)
  2560.0  38.3%  98.4%   2560.0  38.3% caffe2::::HugePagesArena::alloc_huge (inline)
    90.9   1.4%  99.8%     90.9   1.4% __gnu_cxx::new_allocator::allocate (inline)
     5.0   0.1%  99.9%      5.0   0.1% google::protobuf::RepeatedField::Reserve
     2.0   0.0%  99.9%      2.0   0.0% prof_backtrace_impl (inline)
     1.0   0.0%  99.9%     20.3   0.3% std::__cxx11::basic_string::_M_construct (inline)
     1.0   0.0%  99.9%      1.0   0.0% std::_Function_base::_Base_manager::_M_init_functor (inline)
     0.5   0.0%  99.9%      0.5   0.0% folly::UnboundedQueue::allocNextSegment (inline)
     0.5   0.0% 100.0%      0.5   0.0% folly::aligned_malloc (inline)
     0.5   0.0% 100.0%      0.5   0.0% __new_exitfn
```

Reviewed By: yinghai

Differential Revision: D22662093

fbshipit-source-id: d0b8ff1ed26c72b14bb02fb1146c51ef11a7e519
2020-07-22 16:54:19 -07:00
Colin L Reliability Rice
dfa914a90c Modify lazy_dyndep loading to trigger inside workspace. (#41687)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41687

Specifically, this makes a new library (lazy), which can be used from both core
and workspace.

This allows workspace.CreateNet to trigger lazy loading of dyndep dependencies.

Test Plan: Added a unit test specifically for workspace.CreateNet

Reviewed By: dzhulgakov

Differential Revision: D22441877

fbshipit-source-id: 3a9d1af9962585d08ea2566c9c85bec7377d39f2
2020-07-22 15:36:43 -07:00
Yinghai Lu
2d15b39745 [Onnxifi] Support running with quantized int8 inputs (#41820)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41820

Pull Request resolved: https://github.com/pytorch/glow/pull/4721

In order to support an int8 quantized tensor as an input to OnnxifiOp, we need to
- Add support to recognize and extract shape meta from an int8 tensor at the input of OnnxifiOp.
- Make a copy of the input data and shift it by 128 in Glow if the input data is a uint8 quantized tensor, to get correct results, because Glow uses int8 to represent the quantized data regardless.
- Propagate correct quantization parameters through shape info in C2.

This diff implements the above.
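For intuition, the uint8 -> int8 shift looks like this in numpy (a minimal sketch of the arithmetic with illustrative values, not the actual Glow copy):

```python
import numpy as np

# Glow stores quantized data as int8, so a uint8-quantized input is copied,
# shifted by 128, and its zero point is shifted with it, leaving the
# dequantized value scale * (q - zero_point) unchanged.
def uint8_to_int8(q_u8, scale, zero_point):
    q_i8 = (q_u8.astype(np.int16) - 128).astype(np.int8)
    return q_i8, scale, zero_point - 128

q_u8 = np.array([0, 128, 255], dtype=np.uint8)
q_i8, scale, zp = uint8_to_int8(q_u8, scale=0.1, zero_point=128)
# dequantized values are identical before and after the shift
assert np.allclose(0.1 * (q_u8.astype(np.float32) - 128),
                   scale * (q_i8.astype(np.float32) - zp))
```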

Test Plan:
```
buck test caffe2/caffe2/contrib/fakelowp/test:test_int8_quantnnpi
```

Reviewed By: jackm321

Differential Revision: D22650584

fbshipit-source-id: 5e867f7ec7ce98bb066ec4128ceb7cad321b3392
2020-07-22 13:42:34 -07:00
Venkata Chintapalli
ca68dc7fa2 replace std::clamp with shim (#41855)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41855

replace std::clamp with shim

Test Plan: test_op_nnpi_fp16.py covers the testing.

Reviewed By: hyuen

Differential Revision: D22667645

fbshipit-source-id: 5e7c94b499f381bde73f1984a6f0d01fb962a671
2020-07-22 11:06:36 -07:00
Jeong Ukjae
f03156f9df replace blacklist in caffe2/python/onnx/frontend.py (#41777)
Summary:
Close https://github.com/pytorch/pytorch/issues/41712

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41777

Reviewed By: izdeby

Differential Revision: D22648532

Pulled By: yinghai

fbshipit-source-id: 7f4c9f313e2887e70bb4eb1ab037aea6b549cec7
2020-07-22 10:02:16 -07:00
Mengchi Zhang
5c9918e757 Fix row-wise sparse SparseLengthSum and sparse adagrad fused operator (#41818)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41818

Fix row-wise sparse SparseLengthSum and sparse adagrad fused operator

Reviewed By: jianyuh

Differential Revision: D22345013

fbshipit-source-id: 7c2d6c506b404f15a7aa8f1d0ccadb82e515a4c3
2020-07-21 19:32:16 -07:00
Venkata Chintapalli
3a9a64a4da Add non zero offset test cases for Quantize and Dequantize Ops. (#41693)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41693

Add non zero offset test cases for Quantize and Dequantize Ops.

Test Plan: Added new test case test_int8_non_zero_offset_quantize part of the test_int8_ops_nnpi.py test file.

Reviewed By: hyuen

Differential Revision: D22633796

fbshipit-source-id: be17ee7a0caa6e9bc7b175af539be2e6625ad47a
2020-07-20 16:03:32 -07:00
Eileen Pan
f07816003a [2/n][Compute Meta] support analysis for null flag features
Summary:
## TLDR
Support using a NaN default value for missing dense features in RawInputProcessor for DPER2, in preparation for subsequent support of null flag features in compute meta. For train_eval this is already supported in DPER3, and we do not plan to support it in DPER2 train_eval.
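A small sketch of the idea (hypothetical helper, not the DPER2 code): filling missing dense features with NaN rather than a sentinel like 0 lets downstream analysis distinguish "absent" from a real zero.

```python
import numpy as np

def fill_missing(values, present_mask):
    # Positions where present_mask is False stay NaN and act as null flags.
    dense = np.full(len(present_mask), np.nan, dtype=np.float32)
    dense[present_mask] = values
    return dense

row = fill_missing(np.array([1.5, -2.0], dtype=np.float32),
                   present_mask=np.array([True, False, True]))
# row -> [1.5, nan, -2.0]; np.isnan(row) recovers the null flags
```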

Differential Revision: D22439142

fbshipit-source-id: 99ae9755bd41a5d5f43bf5a9a2819d64f3883005
2020-07-20 13:13:45 -07:00
Venkata Chintapalli
cc3c18edbc More LayerNorm Vectorization in calcMeanStd function. (#41618)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41618

More LayerNorm Vectorization in calcMeanStd function.

Test Plan: test covered in test_layernorm_nnpi_fp16.py

Reviewed By: hyuen

Differential Revision: D22606585

fbshipit-source-id: be773e62f0fc479dbc2d6735f60c2e98441916e9
2020-07-20 11:55:54 -07:00
Alphons Jaimon
ce443def01 Grammar patch 1 (.md) (#41599)
Summary:
A minor spell check!
I have gone through a dozen .md files to fix typos.
zou3519, take a look!

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41599

Reviewed By: ezyang

Differential Revision: D22601629

Pulled By: zou3519

fbshipit-source-id: 68d8f77ad18edc1e77874f778b7dadee04b393ef
2020-07-20 10:19:08 -07:00
Hongzheng Shi
581e9526bb [GradualGating] support better k value change (#41557)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41557

 - add new learning rate functor "slope"
 - use "slope" learning rate in gated_sparse_feature module

Test Plan:
buck test dper3/dper3/modules/tests:core_modules_test -- test_gated_sparse_features_shape_num_warmup_tensor_k
buck test caffe2/caffe2/python/operator_test:learning_rate_op_test -- test_slope_learning_rate_op

Reviewed By: huayuli00

Differential Revision: D22544628

fbshipit-source-id: f2fcae564e79e1d8bcd3a2305d0c11ca7c0d3b3c
2020-07-17 20:44:28 -07:00
Stanislau Hlebik
b774ce54f8 remediation of S205607
fbshipit-source-id: 798decc90db4f13770e97cdce3c0df7d5421b2a3
2020-07-17 17:19:47 -07:00
Stanislau Hlebik
8fdea489af remediation of S205607
fbshipit-source-id: 5113fe0c527595e4227ff827253b7414abbdf7ac
2020-07-17 17:17:03 -07:00
Hao Lu
39b4701d31 [caffe2][redo] Reimplement RemoveOpsByType with SSA (#41606)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41606

The previous diff (D22220798 (59294fbbb9) and D22220797) was recently reverted (D22492356 (28291d3cf8), D22492355) because of a bug associated with the op AsyncIf. The AsyncIf op has net_defs as args and the SSA rewriting didn't take that into account. It has a special path for the op If, but not for AsyncIf. Several changes I made to fix the bug:
1) Add op AsyncIf to the special path for If op in SSA rewriting
2) clear inputs/outputs of the netdefs that are args in If/AsyncIf ops because they're no longer valid
3) revert renamed inputs/outputs in the arg netdefs that are in the external_outputs in the parent netdef

2) and 3) are existing bugs in the `SsaRewrite` function that were just never exposed before.

The algorithm for `RemoveOpsByType` is the same as in my previous diff D22220798 (59294fbbb9). The only new changes in this diff are in `onnx::SsaRewrite` and a few newly added unit tests.

(Note: this ignores all push blocking failures!)

Reviewed By: yinghai

Differential Revision: D22588652

fbshipit-source-id: ebb68ecd1662ea2bae14d4be8f61a75cd8b7e3e6
2020-07-17 16:06:43 -07:00
Venkata Chintapalli
43b1923d98 Enable SLS FP32 accumulation SparseLengthsWeightedSumFused8BitRowwiseFakeFP32NNPI Op. (#41577)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41577

* Remove skipping test
* Use fma_avx_emulation
* Increase test examples to 100

(Note: this ignores all push blocking failures!)

Test Plan: Tests are covered in test_sls_8bit_nnpi.py

Reviewed By: hyuen

Differential Revision: D22585742

fbshipit-source-id: e1f62f47eb10b402b11893ffca7a6786e31daa79
2020-07-17 11:19:47 -07:00
Nathan Goldbaum
1e230a5c52 rewrite C++ __torch_function__ handling to work with TensorList operands (#41575)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41575

Fixes https://github.com/pytorch/pytorch/issues/34294

This updates the C++ argument parser to correctly handle `TensorList` operands. I've also included a number of updates to the testing infrastructure; this is because we're now doing a much more careful job of testing the signatures of aten kernels, using the type information about the arguments as read in from `Declarations.yaml`. The changes to the tests are required because we're now only checking for `__torch_function__` attributes on `Tensor`, `Optional[Tensor]`, and elements of `TensorList` operands, whereas before we were checking for `__torch_function__` on all operands, so the relatively simplistic approach the tests were using before -- assuming all positional arguments might be tensors -- doesn't work anymore. I now think that checking for `__torch_function__` on all operands was a mistake in the original design.

The updates to the signatures of the `lambda` functions are to handle this new, more stringent checking of signatures.

I also added override support for `torch.nn.functional.threshold` and `torch.nn.functional.layer_norm`, which did not yet have Python-level support.

Benchmarks are still WIP.
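As a self-contained illustration of the protocol (a plain wrapper class, not this PR's test infrastructure): `torch.stack` takes a `TensorList`, so an element defining `__torch_function__` inside the list now gets to intercept the call.

```python
import torch

class Marked:
    def __init__(self, t):
        self.t = t

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        def unwrap(x):  # strip wrappers, including inside TensorList args
            if isinstance(x, Marked):
                return x.t
            if isinstance(x, (list, tuple)):
                return type(x)(unwrap(e) for e in x)
            return x
        print(f"intercepted {func.__name__}")
        kwargs = {k: unwrap(v) for k, v in (kwargs or {}).items()}
        return func(*unwrap(args), **kwargs)

out = torch.stack([Marked(torch.randn(3)), torch.randn(3)])  # intercepted
```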

Pull Request resolved: https://github.com/pytorch/pytorch/pull/34725

Reviewed By: mruberry

Differential Revision: D22357738

Pulled By: ezyang

fbshipit-source-id: 0e7f4a58517867b2e3f193a0a8390e2ed294e1f3
2020-07-17 08:54:29 -07:00
Yinghai Lu
eb3bf96f95 During inbatch broadcast, move Tile op after Fused8BitRowwiseQuantizedToFloat if applicable (#41464)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41464

If the input is int8 rowwise quantized, we currently cannot lower it to Glow, and previously we hit errors when running with in-batch broadcast. The main issue is that the Tile op doesn't support the uint8_t type, which is easily added here. However, that alone leaves a non-ideal situation where Tile -> Fused8BitRowwiseQuantizedToFloat stays on the host side, which probably hurts memory bandwidth a lot. Even if we later add Fused8BitRowwiseQuantizedToFloat support in Glow, it's still not ideal because we'd be doing redundant compute on identical columns. So the solution here is to swap the order of Fused8BitRowwiseQuantizedToFloat and Tile to make it Tile -> Fused8BitRowwiseQuantizedToFloat. This immediately resolves the error we saw; in the short term we can still run Tile on the card, and in the longer term things run faster on the card.

The optimization is a heuristic: if the net doesn't contain such a pattern, in-batch broadcast works as it did before.
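A conceptual sketch of the rewrite on a toy net representation (plain dicts, not the actual caffe2 pass):

```python
# Fused8BitRowwiseQuantizedToFloat -> Tile becomes
# Tile -> Fused8BitRowwiseQuantizedToFloat, so Tile runs on the small
# uint8 tensor and the dequantize stays adjacent to the final output.
def swap_dequant_and_tile(ops):
    i = 0
    while i + 1 < len(ops):
        a, b = ops[i], ops[i + 1]
        if (a["type"] == "Fused8BitRowwiseQuantizedToFloat"
                and b["type"] == "Tile" and b["input"] == a["output"]):
            quant_in, final_out = a["input"], b["output"]
            tiled = quant_in + "_tiled"    # new intermediate blob name
            b["input"], b["output"] = quant_in, tiled
            a["input"], a["output"] = tiled, final_out
            ops[i], ops[i + 1] = b, a      # Tile now runs first
        i += 1
    return ops

net = [
    {"type": "Fused8BitRowwiseQuantizedToFloat", "input": "w_q", "output": "w"},
    {"type": "Tile", "input": "w", "output": "w_bcast"},
]
swap_dequant_and_tile(net)
# -> Tile(w_q -> w_q_tiled), Fused8BitRowwiseQuantizedToFloat(w_q_tiled -> w_bcast)
```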

(Note: this ignores all push blocking failures!)

Test Plan:
```
buck test caffe2/caffe2/opt/custom:in_batch_broadcast_test
```

Reviewed By: benjibc

Differential Revision: D22544162

fbshipit-source-id: b6dd36a5925a9c8103b80f034e7730a7a085a6ff
2020-07-16 21:25:18 -07:00
Colin L Reliability Rice
415ff0bceb Create lazy_dyndeps to avoid caffe2 import costs. (#41343)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41343

Currently caffe2.InitOpLibrary does the dll import unilaterally. If we instead make a lazy version and use it, then many pieces of code which do not need the caffe2 operators get a lot faster.

On a real test, the import time went from 140s to 68s.

This also cleans up the algorithm slightly (although it makes a very minimal
difference), by parsing the list of operators once, rather than every time a
new operator is added, since we defer the RefreshCall until after we've
imported all the operators.

The key way we maintain safety is that as soon as someone does an operation
which requires an operator (or could), we force importing of all available
operators.

Future work could include trying to identify which code is needed for which
operator and only import the needed ones. There may also be wins available by
playing with dlmopen (which opens within a namespace), or seeing if the dl
flags have an impact (I tried this and didn't see an impact, but dlmopen may
make it better).

Note that this was previously landed and reverted. The issue was that if an import failed and raised an exception, the specific library would not be removed from the lazy imports. This caused tests with failing libraries to poison all other tests that ran after them. This has been fixed, and a unit test has been added for this case (to help make it obvious what failed).
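A minimal sketch of the idea (hypothetical helper names, not the actual caffe2.python code): imports are queued instead of performed eagerly, the first operation that requires (or could require) an operator forces all pending imports, and a path is removed from the queue before it is loaded so a failing import cannot poison later calls.

```python
_pending_libs = []

def register_op_library(path):
    _pending_libs.append(path)      # no import yet

def ensure_ops_loaded():
    while _pending_libs:
        path = _pending_libs.pop()  # dropped first: a failure is not retried
        _load_library(path)         # may raise; the queue is already consistent
    _refresh_op_registry()          # parse the operator list once, at the end

def _load_library(path):
    print(f"loading {path}")        # stands in for the dlopen-style import

def _refresh_op_registry():
    pass                            # stands in for rebuilding the op table
```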

Test Plan:
I added a new test, lazy_dyndep_test.py (copied from all_compare_test.py).
I'm a little concerned that I don't see any explicit tests for dyndep, but this
should provide decent coverage.

I've added a specific test to handle the poisoning issues mentioned above, which caused the previous version to get reverted.

Differential Revision: D22506369

fbshipit-source-id: 7395df4778e8eb0220630c570360b99a7d60eb83
2020-07-16 15:17:41 -07:00
Hector Yuen
d80e0c62be fix dequantization to match nnpi (#41505)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41505

fix the dequantization to match the fixes from quantization

Test Plan:
test is not conclusive, since only comparing emulation with reference collected from Amy's run

running an evaluation workflow at the moment

Reviewed By: venkatacrc

Differential Revision: D22558092

fbshipit-source-id: 3ff00ea15eac76007e194659c3b4949f07ff02a4
2020-07-16 00:40:57 -07:00
Hector Yuen
26790fb26d fix quantization mechanism to match nnpi (#41494)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41494

Revert to the changes from amylittleyang to make quantization work.

Test Plan:
Ran against a dump from ctr_instagram, and verified that:
- nnpi and fakelowp match bitwise
- nnpi differs by at most 1 vs fbgemm, most likely due to the type of rounding

Reviewed By: venkatacrc

Differential Revision: D22555276

fbshipit-source-id: 7074521d181f15ef6270985bb71c4b44d25d1c30
2020-07-16 00:40:55 -07:00
Hector Yuen
e6859ec78f resurrect single quantization op test (#41476)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41476

deleted this test by default, re-adding it in its own file to make it
more explicit

Test Plan: ran the test

Reviewed By: yinghai

Differential Revision: D22550217

fbshipit-source-id: 758e279b2bab3b23452a3d0ce75fb366f7afb7be
2020-07-16 00:37:46 -07:00
Lu Fang
b2e52186b9 Rename capacity to nbytes in ShareExternalPointer to avoid confusion in future (#41461)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41461

`capacity` is misleading, and we have many wrong uses internally. Let's rename it to `nbytes` to avoid confusion in the future. Ultimately, we could remove this parameter entirely; so far I haven't seen any case where this capacity is necessary.

Test Plan: oss ci

Differential Revision: D22544189

fbshipit-source-id: f310627f2ab8f4ebb294e0dd5eabc380926991eb
2020-07-15 22:04:18 -07:00
peter
404799d43f Disable failed caffe2 tests for BoundShapeInference on Windows (#41472)
Summary:
Related:
https://github.com/pytorch/pytorch/issues/40861
https://github.com/pytorch/pytorch/issues/41471

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41472

Reviewed By: yns88

Differential Revision: D22562385

Pulled By: malfet

fbshipit-source-id: aebc600915342b984f4fc47cef0a1e79d8965c10
2020-07-15 19:39:45 -07:00
Pritam Damania
ff6e560301 Add C++ end to end test for RPC and distributed autograd. (#36893)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36893

Adding an end to end test for running a simple training loop in C++
for the distributed RPC framework.

The goal of this change is to enable LeakSanitizer and potentially catch memory
leaks in the Future. Enabling LSAN with python multiprocessing is tricky and we
haven't found a solution for this. As a result, adding a C++ test that triggers
most of the critical codepaths would be good for now.

As an example, this unit test would've caught the memory leak fixed by:
https://github.com/pytorch/pytorch/pull/31030
ghstack-source-id: 107781167

Test Plan:
1) Verify the test catches memory leaks.
2) waitforbuildbot

Reviewed By: mrshenli

Differential Revision: D21112208

fbshipit-source-id: 4eb2a6b409253108f6b6e14352e593d250c7a64d
2020-07-15 12:59:19 -07:00
Venkata Chintapalli
225289abc6 Adding epsilon input argument to the Logit Op
Summary: Adding epsilon input argument to the Logit Op
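A minimal numpy sketch of logit with an epsilon argument (matching the common definition; the op's exact clamping behavior is an assumption): inputs are clamped to [eps, 1 - eps] so 0 and 1 map to finite values.

```python
import numpy as np

def logit(x, eps=1e-6):
    x = np.clip(x, eps, 1.0 - eps)     # avoid log(0) and division by zero
    return np.log(x / (1.0 - x))       # log-odds transform

logit(np.array([0.0, 0.5, 1.0]))       # -> [-13.8155, 0.0, 13.8155]
```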

Test Plan: Added test_logit test case.

Reviewed By: hyuen

Differential Revision: D22537133

fbshipit-source-id: d6f89afd1589fda99f09550a9d1b850cfc0b9ee1
2020-07-15 12:16:19 -07:00
Anush Elangovan
c86699d425 [cmake] Use PROJECT_SOURCE_DIR instead of CMAKE_* (#41387)
Summary:
Add support for including pytorch via add_subdirectory(). This requires using PROJECT_* instead of CMAKE_*, since the CMAKE_* variables refer to the top-most project that includes pytorch.

TEST=add_subdirectory() into a pytorch checkout and build.
There are still some hardcoded references to TORCH_SRC_DIR; I will
fix them in a follow-on commit. For now you can create a symlink to
 <pytorch>/torch/ in your project.

Change-Id: Ic2a8aec3b08f64e2c23d9e79db83f14a0a896abc

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41387

Reviewed By: zhangguanheng66

Differential Revision: D22539944

Pulled By: ezyang

fbshipit-source-id: b7e9631021938255f0a6ea897a7abb061759093d
2020-07-15 11:09:05 -07:00
Shen Li
8548a21c00 Revert D22543215: Adjust bound_shape_inferencer to take 4 inputs for FCs
Test Plan: revert-hammer

Differential Revision:
D22543215 (86a2bdc35e)

Original commit changeset: 0977fca06630

fbshipit-source-id: b440f9b1eaeb35ec8b08e899890691e7a77a9f6d
2020-07-15 08:10:39 -07:00
Dinesh Govindaraj
f153b35b9b Shape inference for SparseToDense in ExpertCombiner
Summary: Adding shape inference for SparseToDense. The proposed implementation only works when data_to_infer_dim is given; otherwise, the SparseToDense output dimension depends on the max value of the input tensor.
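The rule above, illustrated (hypothetical helper, not the caffe2 op):

```python
import numpy as np

# Rows of `values` are scattered to positions given by `indices`, so the
# first output dimension is data_to_infer_dim when provided and otherwise
# only known at runtime as max(indices) + 1.
def sparse_to_dense(indices, values, data_to_infer_dim=None):
    first_dim = (data_to_infer_dim if data_to_infer_dim is not None
                 else int(indices.max()) + 1)  # runtime-dependent
    out = np.zeros((first_dim,) + values.shape[1:], dtype=values.dtype)
    np.add.at(out, indices, values)            # duplicate indices accumulate
    return out

sparse_to_dense(np.array([0, 3]), np.ones((2, 4)), data_to_infer_dim=5).shape
# -> (5, 4); without data_to_infer_dim the first dim is not statically known
```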

Test Plan:
buck test //caffe2/caffe2/python:sparse_to_dense_test
buck test //caffe2/caffe2/python:hypothesis_test -- test_sparse_to_dense

Dper3 Changes:
f204594813
buck test dper3/dper3_models/ads_ranking/model_impl/sparse_nn/tests:sparse_nn_lib_test

Reviewed By: zhongyx12, ChunliF

Differential Revision: D22479511

fbshipit-source-id: 8983a9baea8853deec53ad6f795c874c3fb93de0
2020-07-15 08:04:48 -07:00
Summer Deng
86a2bdc35e Adjust bound_shape_inferencer to take 4 inputs for FCs (#41452)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41452

The model exported from the online training workflow with int8 quantization contains FCs with 4 inputs; the extra input is the quant_param blob. This diff adjusts the bound_shape_inferencer to produce shape info for the quant_param input.

Test Plan:
```
buck test caffe2/caffe2/opt:bound_shape_inference_test
```

Reviewed By: anurag16

Differential Revision: D22543215

fbshipit-source-id: 0977fca06630e279d47292e6b44f3d8180a767a5
2020-07-15 01:43:39 -07:00
Yan Xie
921d2a164f SparseAdagrad/RowWiseSparseAdagrad mean fusion on CPU & GPU and dedup version for RowWiseSparse mean fusion on GPU
Summary:
1. Support SparseAdagradFusedWithSparseLengthsMeanGradient and RowWiseSparseAdagradFusedWithSparseLengthsMeanGradient on CPU and GPU
2. Add the dedup implementation of fused RowWiseAdagrad op on GPUs for mean pooling

Reviewed By: xianjiec

Differential Revision: D22165603

fbshipit-source-id: 743fa55ed5893c34bc6406ddfbbbb347b88091d1
2020-07-14 22:36:16 -07:00
Hector Yuen
f074994a31 vectorize rounding ops (#41439)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41439

use RoundToFloat16 on arrays
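What rounding an array to fp16 means numerically (a numpy analogue of the assumed RoundToFloat16 behavior, not the actual routine):

```python
import numpy as np

x = np.array([0.1, 3.14159, 70000.0], dtype=np.float32)
x16 = x.astype(np.float16).astype(np.float32)
# x16 -> [0.0999756, 3.140625, inf]: each value snaps to the nearest
# fp16-representable number, and magnitudes beyond 65504 overflow to inf
```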

Test Plan: layernorm unittest

Reviewed By: venkatacrc

Differential Revision: D22540118

fbshipit-source-id: dc84fd22b5dc6a3bd15ad4ec1eecb9db13d64e97
2020-07-14 20:59:39 -07:00
Hector Yuen
96f124e623 remove template arguments of layernorm
Summary:
Remove layernorm templates and make them float, since that's the only variant. Minor fixes in logging and testing.

Test Plan: ran the test

Reviewed By: venkatacrc

Differential Revision: D22527359

fbshipit-source-id: d6eec362a6e88e1c12fddf820ae629ede13fb2b8
2020-07-14 20:56:23 -07:00
rohithkrn
c528faac7d [ROCm] Skip problematic mgpu tests on ROCm3.5 (#41409)
Summary:
nccl tests and parallelize_bmuf_distributed test are failing on rocm3.5.1. Skipping these tests to upgrade the CI to rocm3.5.1

jeffdaily sunway513

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41409

Reviewed By: orionr

Differential Revision: D22528928

Pulled By: seemethere

fbshipit-source-id: 928196b7a62a441d391e69f54b278313ecc75d77
2020-07-14 11:55:43 -07:00
Hector Yuen
5f146a4125 fix include file path in unary ops
Summary: fix include file path in unary ops

Test Plan: compile

Reviewed By: amylittleyang

Differential Revision: D22527312

fbshipit-source-id: 589efd2231ff8bd3133cb7844738429927ecee68
2020-07-14 11:08:51 -07:00
Edward Yang
befb22790f Fix a number of deprecation warnings (#40179)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40179

- Pass -Wno-psabi to suppress GCC's "The ABI for passing parameters with
  64-byte alignment has changed in GCC 4.6" warning
- Fix use of deprecated data() accessor (and minor optimization: hoist
  accessor out of loop)
- Undeprecate NetDef.num_workers; no one is serious about fixing these
- Suppress warnings about deprecated pthreadpool types

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D22234138

Pulled By: ezyang

fbshipit-source-id: 6a1601b6d7551a7e6487a44ae65b19acdcb7b849
2020-07-14 09:11:34 -07:00
Hector Yuen
d601325de4 update operators in the mapping to fp16 emulation
Summary: add logit and swish to this list

Test Plan: f203925461

Reviewed By: amylittleyang

Differential Revision: D22506814

fbshipit-source-id: b449e4ea16354cb76915adb01cf317cffb494733
2020-07-13 14:08:24 -07:00
Summer Deng
c451ddaeda Add shape inference functions for int8 quantization related ops (#41215)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41215

To unblock int8 model productization on accelerators, we need the shape and type info for all the blobs after int8 quantization. This diff adds shape inference functions for int8-quantization-related ops.

Test Plan:
```
buck test caffe2/caffe2/quantization/server:int8_gen_quant_params_test
buck test caffe2/caffe2/quantization/server:fully_connected_dnnlowp_op_test
```

Reviewed By: hx89

Differential Revision: D22467487

fbshipit-source-id: 8298abb0df3457fcb15df81f423f557c1a11f530
2020-07-13 12:02:11 -07:00
Yavuz Yetim
d04a2e4dae Back out "Revert D22329069: Self binning histogram" (#41313)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41313

This diff backs out the backout diff. The failure was due to the C++ alternative operator `or` not being supported by MSVC; it has been replaced with `||`.

Original commit changeset: fc7f3f8c968d

Test Plan: Existing unit tests, check github CI.

Reviewed By: malfet

Differential Revision: D22494777

fbshipit-source-id: 3271288919dc3a6bfb82508ab9d021edc910ae45
2020-07-13 11:46:34 -07:00
Hector Yuen
dea39b596e reduce logging for layernorm (#41305)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41305

Added a warning message when layernorm under/overflows (which is what nnpi does), reducing the frequency of the logging to once every 1000 occurrences.
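The throttling idea in Python terms (the real change is in the C++ op; names here are illustrative):

```python
_overflow_count = 0

def warn_under_overflow(value):
    global _overflow_count
    _overflow_count += 1
    if _overflow_count % 1000 == 1:  # log once per 1000 occurrences
        print(f"layernorm under/overflow: {value}")
```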

Test Plan: compilation

Reviewed By: yinghai

Differential Revision: D22492726

fbshipit-source-id: 9343beeae6e65bf3846c6b3d2edd2a08dac85ed6
2020-07-13 10:23:46 -07:00
Mengchi Zhang
67a4f375cd Pass the number of indices but not embedding size in PyTorch operator (#41315)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41315

We should pass the number of indices rather than the embedding size in the SparseAdagrad fused PyTorch operator.

Reviewed By: jianyuh

Differential Revision: D22495422

fbshipit-source-id: ec5d3a5c9547fcd8f95106d912b71888217a5af0
2020-07-12 20:55:40 -07:00
Anurag Gupta
106b0b6a62 Op to create quant scheme blob (#40760)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40760

Add op to create a quant scheme.

Test Plan:
buck test mode/opt caffe2/caffe2/quantization/server:int8_quant_scheme_blob_fill_test

{F241838981}

Reviewed By: csummersea

Differential Revision: D22228154

fbshipit-source-id: 1b7a02c06937c68e2fcccf77eb10a965300ed732
2020-07-11 10:53:10 -07:00
Mengchi Zhang
c864158475 Add fp16 support to SparseLengthSum PyTorch operator (#41058)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41058

The SparseLengthSum PyTorch operator previously accepted only float and double types; this diff adds fp16 support to the SparseLengthSum PT operator.

Reviewed By: jianyuh

Differential Revision: D22387253

fbshipit-source-id: 2a7d03ceaadbb7b04077cff72ab77da6457ba989
2020-07-11 07:54:32 -07:00
Hao Lu
28291d3cf8 [caffe2] Revert D22220798 (#41302)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41302

Test Plan:
```
buck test //caffe2/caffe2/fb/predictor:black_box_predictor_test
```

Differential Revision: D22492356

fbshipit-source-id: efcbc3c67abda5cb9da47e633804a4800d92f89b
2020-07-11 03:28:29 -07:00
Hector Yuen
e544bf2924 fix the range of the random weights used in the int8fc test (#41303)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41303

The error came from:
```
I0710 18:02:48.025024 1780875 NNPIOptions.cpp:49] [NNPI_LOG][D] [KS] convert_base_kernel_ivp.cpp(524): Output Scale 108240.101562 is out of valid range +-(Min 0.000061 Max 65504.000000)!!!
```

It seems the weights we are using are too small, thus generating scaling
factors out of the range of fp16 (>65k). I am tentatively increasing this
factor to a higher value (10x bigger) to avoid this.

Also increased max_examples to 100.
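Why small weights trip this, in numpy terms (the 1e-4 magnitude is an assumption for the sketch):

```python
import numpy as np

# fp16 tops out at 65504, so the reported scale -- and, for tiny weights,
# the reciprocal of the quantization scale -- overflows to inf.
np.float16(108240.101562)  # -> inf, the scale from the log above
w_max = 1e-4               # very small random weights
scale = w_max / 127.0      # symmetric int8 quantization scale
np.float16(1.0 / scale)    # -> inf, since 1/scale = 1.27e6 > 65504
```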

Test Plan: ran this test

Reviewed By: yinghai

Differential Revision: D22492481

fbshipit-source-id: c0f9e59b0e70895ab787868ef1d87e6e80106554
2020-07-11 00:19:29 -07:00
Jianyu Huang
095886fa42 [caffe2] Fix the issues when using CUB RadixSort (#41299)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41299

When using `cub::DeviceRadixSort::SortPairs` (https://nvlabs.github.io/cub/structcub_1_1_device_radix_sort.html), the `end_bit` argument, or the most-significant bit index (exclusive) needed for key comparison, should be passed with  `int(log2(float(num_rows)) + 1)` instead of `int(log2(float(num_indice)) + 1)`. This is because all the values in indices array are guaranteed to be less than num_rows (hash_size), not num_indices. Thanks ngimel for pointing this point and thanks malfet for quickly fixing the log2() compilation issues.

Note:
An optional bit subrange [begin_bit, end_bit) of differentiating key bits can be specified. This can reduce overall sorting overhead and yield a corresponding performance improvement.
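The bit-width computation at issue, with illustrative sizes: the sort keys are row ids, bounded by num_rows (the hash size), not by how many indices there are.

```python
import math

num_rows, num_indices = 1_000_000, 50_000_000   # illustrative sizes
int(math.log2(float(num_indices))) + 1  # 26 bits: compares more bits than needed
int(math.log2(float(num_rows))) + 1     # 20 bits: enough for any row id
```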

Test Plan:
```
buck test mode/dev-nosan //caffe2/caffe2/fb/net_transforms/tests:fuse_sparse_ops_test -- 'test_fuse_sparse_adagrad_with_sparse_lengths_sum_gradient \(caffe2\.caffe2\.fb\.net_transforms\.tests\.fuse_sparse_ops_test\.TestFuseSparseOps\)' --print-passing-details
```

Reviewed By: malfet

Differential Revision: D22491662

fbshipit-source-id: 4fdabe86244c948af6244f9bd91712844bf1dec1
2020-07-10 22:39:43 -07:00
Nikita Shulga
d1f06da9b7 Solve log2(x:int) ambiguity by using log2(float(x)) (#41295)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41295

Differential Revision: D22490995

Pulled By: malfet

fbshipit-source-id: 17037e551ce5986f3162389a61932099563c02a7
2020-07-10 20:12:36 -07:00
Nikita Shulga
7bae5780a2 Revert D22329069: Self binning histogram
Test Plan: revert-hammer

Differential Revision:
D22329069 (16c8146da9)

Original commit changeset: 28406b94e284

fbshipit-source-id: fc7f3f8c968d1ec7d2a1cf7a4d05900f51055d82
2020-07-10 16:22:29 -07:00
Yavuz Yetim
16c8146da9 Self binning histogram (#40875)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40875

This op uses the given num_bins and a spacing strategy to automatically bin and compute the histogram of given matrices.
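A numpy sketch of the self-binning idea (even spacing shown as one possible strategy; hypothetical helper, not the op's implementation): bin edges are derived from the data's own range rather than supplied by the caller.

```python
import numpy as np

def self_binning_histogram(x, num_bins):
    edges = np.linspace(x.min(), x.max(), num_bins + 1)  # even spacing
    counts, _ = np.histogram(x, bins=edges)
    return counts, edges

counts, edges = self_binning_histogram(np.random.randn(1000), num_bins=10)
```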

Test Plan: Unit tests.

Reviewed By: neha26shah

Differential Revision: D22329069

fbshipit-source-id: 28406b94e284d52d875f73662fc82f93dbc00064
2020-07-10 13:55:42 -07:00
Jade Nie
75b6dd3d49 Wrap Caffe2's SparseLengthsSum into a PyTorch op (#39596)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39596

This diff wraps Caffe2's SparseLengthsSum on GPU as a PT op.

Reviewed By: jianyuh

Differential Revision: D21895309

fbshipit-source-id: 38bb156f9be8d28225d2b44f5b4c93d27779aff9
2020-07-10 11:19:13 -07:00
Venkata Chintapalli
4a09501fbe LogitOp LUT based fake FP16 Op. (#41258)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41258

LogitOp LUT based fake FP16 Op.

(Note: this ignores all push blocking failures!)

Test Plan: test_op_nnpi_fp16.py covers the test_logit testing.

Reviewed By: hyuen

Differential Revision: D22351963

fbshipit-source-id: e2ed2bd9bfdc58c6f823d7d41557109c08628bd7
2020-07-10 10:53:42 -07:00
Jianyu Huang
62e16934cb [caffe2] Add the dedup implementation of fused RowWiseAdagrad op on GPUs (#40282)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40282

Test Plan:
```
buck test mode/dev-nosan //caffe2/caffe2/fb/net_transforms/tests:fuse_sparse_ops_test -- 'test_fuse_sparse_adagrad_with_sparse_lengths_sum_gradient \(caffe2\.caffe2\.fb\.net_transforms\.tests\.fuse_sparse_ops_test\.TestFuseSparseOps\)' --print-passing-details
```

https://our.intern.facebook.com/intern/testinfra/testrun/4785074632584150

Reviewed By: jspark1105

Differential Revision: D22102737

fbshipit-source-id: fa3fef7cecb1e2cf5c9b6019579dc0f86fd3a3b2
2020-07-10 09:05:24 -07:00
rohithkrn
df252c059c [ROCm] Skip caffe2 unique op test for rocm3.5 (#41219)
Summary:
The unique op test failure in caffe2 blocks upgrading CI to rocm3.5.1. Skipping the test to unblock; will re-enable after root-causing and fixing the issue.
jeffdaily sunway513

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41219

Differential Revision: D22471452

Pulled By: xw285cornell

fbshipit-source-id: 9e503c8b37c0a4b92632f77b2f8a90281a9889c3
2020-07-09 20:00:29 -07:00
Hector Yuen
a79b416847 make Int8 FC bias quantization use round flush to infinity
Summary:
The current quantization rounding function uses fbgemm, which defaults
to round-to-nearest. The current hardware implementation uses round
flush to infinity. This adds an option to switch the rounding mode.
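The two modes on half-way values (reading "round flush to infinity" as rounding halves toward +infinity, which is an assumption of this sketch):

```python
import numpy as np

x = np.array([0.5, 1.5, 2.5])
np.round(x)        # round half to even (fbgemm default) -> [0., 2., 2.]
np.floor(x + 0.5)  # round halves toward +infinity       -> [1., 2., 3.]
```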

Test Plan: ran against test_fc_int8

Reviewed By: venkatacrc

Differential Revision: D22452306

fbshipit-source-id: d2a1fbfc695612fe07caaf84f52669643507cc9c
2020-07-09 17:25:41 -07:00
Kimish Patel
d6feb6141f [Vec256][neon] Add neon backend for vec256 (#39341)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39341

This PR introduces a NEON backend for the vec256 class for the float datatype. For now only aarch64 is enabled, due to a few issues with enabling it on 32-bit ARM (aarch32).

Test Plan:
vec256_test

Imported from OSS

Differential Revision: D21822399

fbshipit-source-id: 3851c4336d93d1c359c85b38cf19904f82bc7b8d
2020-07-09 16:25:09 -07:00
Kimish Patel
bddba1e336 Add benchmark for add op. (#40059)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40059

This benchmark is added specifically for mobile, to see whether the compiler is auto-vectorizing, in which case the NEON backend for vec256 would offer no advantage for the add op.

Test Plan:
CI

Imported from OSS

Differential Revision: D22055146

fbshipit-source-id: 43ba6c4ae57c6f05d84887c2750ce21ae1b0f0b5
2020-07-09 16:22:55 -07:00
Nikita Shulga
1f1351488e Revert D21870844: Create lazy_dyndeps to avoid caffe2 import costs.
Test Plan: revert-hammer

Differential Revision:
D21870844 (07fd5f8ff9)

Original commit changeset: 3f65fedb65bb

fbshipit-source-id: 4f661072d72486a9c14711e368247b3d30e28af9
2020-07-09 14:18:38 -07:00
HC Zhu
2252188e85 [caffe2] Fix spatial_batch_norm_op dividision-by-zero crash (#40806)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40806

When the input is empty, the operator will crash on "runtime error: division by zero". This has been causing Inference platform server crashes.

Example crash logs:

{P134526683}

Test Plan:
Unit test

See reproducing steps in the Test Plan of D22300135

Reviewed By: houseroad

Differential Revision: D22302089

fbshipit-source-id: aaa5391fddc86483b0f3aba3efa7518e54913635
2020-07-09 12:04:11 -07:00
Linbin Yu
df1f8a48d8 add null check for c2 tensor conversion (#41096)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41096

The spark spot model had some issues in tensor conversion, see P134598596. This happens when we convert an undefined c10 tensor to a caffe2 tensor. This diff adds a null check.

Test Plan: spark spot model runs without problem

Reviewed By: smessmer

Differential Revision: D22330705

fbshipit-source-id: dfe0f29a48019b6611cad3fd8f2ae49e8db5427e
2020-07-09 11:44:23 -07:00
Colin L Reliability Rice
07fd5f8ff9 Create lazy_dyndeps to avoid caffe2 import costs. (#39488)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39488

Currently caffe2.InitOpLibrary does the dll import unilaterally. If we instead make a lazy version and use it, then many pieces of code which do not need the caffe2 operators get a lot faster.

On a real test, the import time went from 140s to 68s.

This also cleans up the algorithm slightly (although it makes a very minimal
difference), by parsing the list of operators once, rather than every time a
new operator is added, since we defer the RefreshCall until after we've
imported all the operators.

The key way we maintain safety is that as soon as someone does an operation
which requires an operator (or could), we force importing of all available
operators.

Future work could include trying to identify which code is needed for which
operator and only import the needed ones. There may also be wins available by
playing with dlmopen (which opens within a namespace), or seeing if the dl
flags have an impact (I tried this and didn't see an impact, but dlmopen may
make it better).

Test Plan:
I added a new test, lazy_dyndep_test.py (copied from all_compare_test.py).
I'm a little concerned that I don't see any explicit tests for dyndep, but this
should provide decent coverage.

Differential Revision: D21870844

fbshipit-source-id: 3f65fedb65bb48663670349cee5e1d3e22d560ed
2020-07-09 11:34:57 -07:00