Commit Graph

6457 Commits

Author SHA1 Message Date
Hao Lu
53dff784e2 [caffe2] Fix inplace ops in onnx::SsaRewrite (#46134)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46134

Make sure in-place ops stay in-place after SsaRewrite. This seems to break the premise of SSA, but it's necessary to ensure correctness. Note that we only preserve in-place ops that enforce in-place execution; ops like `Relu` don't enforce in-place, they merely allow it.
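A minimal Python sketch of the idea, assuming a toy op representation (the op types and the `enforce_inplace` flag here are illustrative, not the real caffe2 schema):

```python
# Sketch: SSA renaming that skips ops which enforce in-place execution.
# Normally every write creates a new versioned name; for an enforced
# in-place op, the output keeps the input's versioned name so the op
# still reads and writes the same blob after the rewrite.
from collections import defaultdict

def ssa_rewrite(ops):
    version = defaultdict(int)

    def versioned(name):
        return f"{name}_v{version[name]}"

    rewritten = []
    for op in ops:
        ins = [versioned(i) for i in op["inputs"]]
        outs = []
        for out in op["outputs"]:
            if op.get("enforce_inplace") and out in op["inputs"]:
                # Preserve in-placeness: reuse the current versioned name.
                outs.append(versioned(out))
            else:
                version[out] += 1      # ordinary SSA: new version per write
                outs.append(versioned(out))
        rewritten.append({**op, "inputs": ins, "outputs": outs})
    return rewritten

ops = [
    {"type": "FC", "inputs": ["x"], "outputs": ["y"]},
    {"type": "Relu", "inputs": ["y"], "outputs": ["y"]},  # allows in-place
    {"type": "Accumulate", "inputs": ["y"], "outputs": ["y"],
     "enforce_inplace": True},                            # enforces in-place
]
rewritten = ssa_rewrite(ops)
```

After the rewrite, `Relu` becomes out-of-place (it only allowed in-place), while `Accumulate` keeps identical input and output names.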

(Note: this ignores all push blocking failures!)

Reviewed By: yinghai

Differential Revision: D24234957

fbshipit-source-id: 274bd3ad6227fce6a98e615aad7e57cd2696aec3
2020-10-22 13:26:31 -07:00
Hao Lu
51bf7bed84 [caffe2] Allow memonger to optimize nets with inplace(enforced) ops (#46560)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46560

Follow-up for D24236604 (16c52d918b).

For nets that pass the schema check, memonger preserves the in-placeness of operators that are already in-place, so we can safely enable it for correct input nets.

(Note: this ignores all push blocking failures!)

Differential Revision: D24402482

fbshipit-source-id: a7e95cb0e3eb87adeac79b9b69eef207957b0bd5
2020-10-22 13:23:33 -07:00
Richard Barnes
c44300884e Clarify timing of GetDeviceProperty() (#46715)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46715

Test Plan: N/A

Reviewed By: ezyang

Differential Revision: D24455538

fbshipit-source-id: 1770807d178f618ef6338e28f669f09e4cbd2009
2020-10-22 11:29:31 -07:00
Alexander Grund
93719440b8 Replace map(lambda constructs (#46462)
Summary:
Follow-up of https://github.com/pytorch/pytorch/issues/46461 with a similar goal

Makes them more readable and possibly faster. Care has to be taken when replacing: in Python 3 both `map` and a generator expression like `(f(x) for x in xs)` are evaluated lazily, while a list comprehension materializes all values immediately. Laziness is a benefit in cases where the full list of values never needs to exist in memory (e.g. when the result is passed straight to `tuple`, `extend`, or `join`).
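A quick sketch of the three forms and their evaluation behavior in Python 3:

```python
# map, generator expressions, and list comprehensions in Python 3.
xs = [1, 2, 3]

lazy_map = map(lambda x: x * 2, xs)   # lazy iterator, nothing computed yet
lazy_gen = (x * 2 for x in xs)        # also lazy
eager    = [x * 2 for x in xs]        # computed immediately

# All three yield the same values once consumed:
assert list(lazy_map) == list(lazy_gen) == eager == [2, 4, 6]

# A lazy form avoids building an intermediate list, e.g. when joining:
joined = "-".join(str(x) for x in xs)  # "1-2-3", no temporary list
```

Note that a lazy iterator is exhausted after one pass, so callers that iterate twice need the eager list form.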

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46462

Reviewed By: zou3519

Differential Revision: D24422343

Pulled By: ezyang

fbshipit-source-id: 252e33499c92ac0b15238f2df32681dbbda2b237
2020-10-22 09:50:22 -07:00
Jeff Hwang
9b5197b763 [mlf][efficiency] add tensor inference function to last-n collector op (#46693)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46693

title

Test Plan: unit tests

Reviewed By: hx89

Differential Revision: D23946770

fbshipit-source-id: f7c3d4a1b4ef3b0e5f56e5a9a30f5003ce9f40b0
2020-10-22 01:15:00 -07:00
Daya Khudia
f47231bf0e [caffe2][dnnlowp] Remove openmp usage in quantize dnnlowp op
Summary: It creates CPU-overload issues when OpenMP is enabled and OMP_NUM_THREADS=1 is not set.

Test Plan: buck test //caffe2/caffe2/quantization/server:quantize_dnnlowp_op_test

Reviewed By: jspark1105

Differential Revision: D24437305

fbshipit-source-id: 426209fc33ce0d4680c478f584716837ee62cb5e
2020-10-20 19:33:56 -07:00
Alexander Grund
5b0f400488 Replace list(map(...)) constructs by list comprehensions (#46461)
Summary:
As discussed in https://github.com/pytorch/pytorch/issues/46392 this makes the code more readable and possibly more performant.

It also fixes a bug uncovered by this change, where the arguments of `map` were passed in the wrong order: 030a24906e (diff-5bb26bd3a23ee3bb540aeadcc0385df2a4e48de39f87ed9ea76b21990738fe98L1537-R1537)
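A hypothetical reproduction (not the actual call site) of the kind of bug a list comprehension rules out — `map`'s signature is `map(function, iterable)`, and swapping the two arguments is easy to miss:

```python
# Swapped map arguments fail with a TypeError; the comprehension form
# makes the roles of function and iterable unambiguous.
words = ["a", "bb", "ccc"]

correct = list(map(len, words))       # [1, 2, 3]

swapped_failed = False
try:
    list(map(words, len))             # arguments reversed
except TypeError:                     # len is not iterable
    swapped_failed = True

lengths = [len(w) for w in words]     # comprehension: no ambiguity
```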

Fixes https://github.com/pytorch/pytorch/issues/46392

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46461

Reviewed By: ailzhang

Differential Revision: D24367015

Pulled By: ezyang

fbshipit-source-id: d55a67933cc22346b00544c9671f09982ad920e7
2020-10-19 18:42:49 -07:00
Jiakai Liu
3d421b3137 [pytorch] rewrite of the python binding codegen with the v2 API (#46244)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46244

- What does the generated binding code do?

The Python binding codegen produces code that takes the input list of
PyObjects, finds the matching ATen C++ function using PythonArgParser,
converts the PyObjects into C++ types and calls the ATen C++ function:

```
+--------+  parsing   +------------------------+  binding   +-----------------------+
| PyObjs | ---------> | PythonArgParser Output | ---------> | Cpp Function Dispatch |
+--------+            +------------------------+            +-----------------------+
```

- Are Python arguments 1-1 mapped to C++ arguments?

Python arguments might be reordered, packed, unpacked when binding to
C++ arguments, as illustrated below:

```
// Binding - Reorder & Packing
// aten::empty.names(int[] size, *, Dimname[]? names, ScalarType? dtype=None, Layout? layout=None,
                     Device? device=None, bool? pin_memory=None, MemoryFormat? memory_format=None) -> Tensor

            Python Args               Cpp Args
-----------------------------------------------------------
         0: size                      size
         1: names                     names
         2: memory_format -------+
         3: dtype         -----+-|--> options
         4: layout            /  |
         5: device           /   +--> memory_format
         6: pin_memory      /
         7: requires_grad -+

// Binding - Unpacking
// aten::max.names_dim(Tensor self, Dimname dim, bool keepdim=False) -> (Tensor values, Tensor indices)

            Python Args               Cpp Args
-----------------------------------------------------------
                               +----> max
                              /-----> max_values
         0: input            /        self
         1: dim             /         dim
         2: keepdim        /          keepdim
         3: out      -----+
```
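The reorder-and-packing case above can be sketched in Python (a sketch only: the argument names follow the diagram, but `bind_empty_names` and the `options` dict are hypothetical stand-ins for the generated C++ binding and TensorOptions):

```python
# Sketch of binding scattered Python args for aten::empty.names into the
# C++ argument order: dtype/layout/device/pin_memory/requires_grad are
# packed into one `options` bag; memory_format stays separate but moves.
def bind_empty_names(py_args):
    # Python-side order: size, names, memory_format, dtype, layout,
    # device, pin_memory, requires_grad
    options = {
        "dtype": py_args["dtype"],
        "layout": py_args["layout"],
        "device": py_args["device"],
        "pin_memory": py_args["pin_memory"],
        "requires_grad": py_args["requires_grad"],
    }
    # C++-side order: size, names, options, memory_format
    return py_args["size"], py_args["names"], options, py_args["memory_format"]

size, names, options, memory_format = bind_empty_names({
    "size": [2, 3], "names": ["N", "C"], "memory_format": "contiguous",
    "dtype": "float32", "layout": "strided", "device": "cpu",
    "pin_memory": False, "requires_grad": False,
})
```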

- Why do we want to rewrite the python binding codegen?

The old codegen takes Declarations.yaml as input. It doesn't distinguish
between Python arguments and C++ arguments - they are all mixed together
as a bag of untyped dict objects. Different methods process these arg
objects and add new attributes for various purposes, and it's not
obvious what the semantics of these attributes are. The complicated
binding logic happens implicitly and is scattered across the code.

```
+--------------------+
|  Native Functions  |
+--------------------+
  |
  |
  v
+--------------------+
|   Cpp Signatures   |
+--------------------+
  |
  |
  v
+--------------------+
| Declarations.yaml  |
+--------------------+
  |                        +-------------------------------------+
  |              +-------> |       PythonArgParser Schema        |
  |              |         +-------------------------------------+
  |              |                            .
  |              |                            .
  v              |                            .
+--------------------+     +-------------------------------------+
| NonTyped Args Objs | --> | PythonArgParser -> Cpp Args Binding |
+--------------------+     +-------------------------------------+
                 |                            .
                 |                            .
                 |                            .
                 |         +-------------------------------------+
                 +-------> |        Cpp Function Dispatch        |
                           +-------------------------------------+
```

This PR leverages the new immutable data models introduced in the new
aten codegen. It introduces dedicated data models for the python schema.
This way, we not only avoid subtle Declarations.yaml conversions but
also decouple the generation of the python schema, the python-to-C++
binding, and the C++ function call.

The ultimate state will be like the following diagram:

```
            +-------------------+     +-------------------------------------+
  +-------> | Python Signatures | --> |       PythonArgParser Schema        |
  |         +-------------------+     +-------------------------------------+
  |                         |                            .
  |                         |                            .
  |                         |                            .
+------------------+        |         +-------------------------------------+
| Native Functions |        +-------> | PythonArgParser -> Cpp Args Binding |
+------------------+        |         +-------------------------------------+
  |                         |                            .
  |                         |                            .
  |                         |                            .
  |         +-------------------+     +-------------------------------------+
  +-------> |  Cpp Signatures   | --> |        Cpp Function Dispatch        |
            +-------------------+     +-------------------------------------+
```

This PR has migrated the core binding logic from
tools/autograd/gen_python_functions.py to tools/codegen/api/python.py.

It produces the byte-for-byte same results (tested with #46243).

Will migrate the rest of gen_python_functions.py in subsequent PRs.

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D24388874

Pulled By: ljk53

fbshipit-source-id: f88b6df4e917cf90d868a2bbae2d5ffb680d1841
2020-10-19 17:36:45 -07:00
jiej
ac146c4820 [nvFuser] Switching to CudaFusionGuard from BailOut for nvfuser - update 2 (#46452)
Summary:
1. Added CudaFusionGuard as the custom TypeCheck for nvfuser; enabled dynamic shape support with profiling executor;
2. dropped support for legacy fuser;
3. re-enabled nvfuser tests;
4. added registration for profiling record to allow profiling on user specified nodes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46452

Reviewed By: zou3519, anjali411

Differential Revision: D24364642

Pulled By: ngimel

fbshipit-source-id: daf53a9a6b6636e1ede420a3a6d0397d4a8b450b
2020-10-19 15:44:31 -07:00
Tristan Rice
0c9787c758 caffe2: use at::mt19937 instead of std::mt19937 (10x speedup) (#43987)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43987

This replaces the caffe2 CPU random number generator (std::mt19937) with at::mt19937, the one currently used in pytorch. The ATen RNG is 10x faster than the std one and appears to be more robust, given bugs in the std implementation (https://fburl.com/diffusion/uhro7lqb)

For large embedding tables (10GB+) we see UniformFillOp taking upwards of 10 minutes because we're bottlenecked on the single-threaded RNG. Swapping to at::mt19937 cuts that time to 10% of the current figure.

Test Plan: Ran all relevant tests + CI. This doesn't introduce new features (+ is a core change) so existing tests+CI should be sufficient to catch regressions.

Reviewed By: dzhulgakov

Differential Revision: D23219710

fbshipit-source-id: bd16ed6415b2933e047bcb283a013d47fb395814
2020-10-16 16:08:35 -07:00
Tristan Rice
dd169ca17c caffe2/plan_executor: propagate exceptions from reporter substeps (#46424)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46424

Currently, if an exception occurs in a reporter thread, the process is killed via std::terminate. This adds support for handling the reporter exception if FLAGS_caffe2_handle_executor_threads_exceptions is set to true.
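An analogous sketch in Python (the real change is in the C++ plan executor and gated by the flag; this only illustrates the capture-and-rethrow pattern, and `run_reporter` is a hypothetical helper):

```python
# Capture an exception raised on a worker thread and re-raise it on the
# joining thread, instead of letting it kill the process.
import threading

def run_reporter(target):
    captured = []

    def wrapper():
        try:
            target()
        except Exception as exc:        # capture anything the reporter throws
            captured.append(exc)

    t = threading.Thread(target=wrapper)
    t.start()
    t.join()
    if captured:
        raise captured[0]               # propagate instead of terminating
```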

Test Plan: buck test mode/opt -c python.package_style=inplace //caffe2/caffe2/python:hypothesis_test //caffe2/caffe2:caffe2_test_cpu -- --stress-runs 100

Reviewed By: dahsh

Differential Revision: D24345027

fbshipit-source-id: 0659495c9e27680ebae41fe5a3cf26ce2f455cb3
2020-10-16 12:28:57 -07:00
Jongsoo Park
c37baa9177 [caffe2] add concat benchmark (#46457)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46457

Wanted to see whether a CopyMatrix specialized for float that uses mkl_somatcopy would be faster, but it wasn't. Still worth checking in the benchmark so it can be used later.

Test Plan: .

Reviewed By: dskhudia

Differential Revision: D24345901

fbshipit-source-id: d3e68dbb560e3138fda11c55789cd41bc0715c6d
2020-10-16 08:48:42 -07:00
Jeff Hwang
ecf63351bc [mlf][efficiency] modify equalization scale operator to return single output (#46449)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46449

modifies `ComputeEqualizationScale` to have a single output `S`

Test Plan:
```
buck test caffe2/caffe2/quantization/server:compute_equalization_scale_test
```

plus e2e tests

Reviewed By: hx89

Differential Revision: D23946768

fbshipit-source-id: 137c2d7a58bb858db411248606a5784b8066ab23
2020-10-16 01:22:37 -07:00
Ben Koopman
757173a4da Add Sigmoid operator from Caffe2 (#46286)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46286

commonize fp16 unary operators

Reviewed By: hyuen

Differential Revision: D24199660

fbshipit-source-id: 99bffa24dc3fa459561a7a2743b1a4dce4be5d58
2020-10-15 16:13:37 -07:00
Hao Lu
16c52d918b [caffe2] Bypass memonger for in-place ops (#46378)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46378

Reviewed By: dzhulgakov

Differential Revision: D24236604

fbshipit-source-id: 9f599687467ea969e89243482f8e2a41f7db0a23
2020-10-15 16:03:52 -07:00
Zeliang Chen
38c97fb6f0 [shape inference] add shape inference support
Summary:
* To make the pruning op compatible with shape inference, we introduced a new quantile argument (as in D23463390) to differentiate dynamic/fixed pruning.

* The fixed pruning op has well-defined output shapes. However, its input shapes are not determined, so we want to bypass the input-shape checking for the two pruning ops, as implemented in this diff.

Test Plan:
buck test caffe2/caffe2/opt:bound_shape_inference_test

```
Started reporting to test run: https://our.intern.facebook.com/intern/testinfra/testrun/844425102187909
    ✓ ListingSuccess: caffe2/caffe2/opt:bound_shape_inference_test - main (1.973)
    ✓ Pass: caffe2/caffe2/opt:bound_shape_inference_test - BoundShapeInference.FC3D (2.604)
    ✓ Pass: caffe2/caffe2/opt:bound_shape_inference_test - BoundShapeInference.SparseLengthsSumFused4BitRowwise (2.635)
    ✓ Pass: caffe2/caffe2/opt:bound_shape_inference_test - BoundShapeInference.FC (2.690)
    ✓ Pass: caffe2/caffe2/opt:bound_shape_inference_test - BoundShapeInference.Int8QuantizeInferInputBackwards (2.705)
    ✓ Pass: caffe2/caffe2/opt:bound_shape_inference_test - BoundShapeInference.SparseLengthsSum (2.729)
    ✓ Pass: caffe2/caffe2/opt:bound_shape_inference_test - BoundShapeInference.Reshape (2.754)
    ✓ Pass: caffe2/caffe2/opt:bound_shape_inference_test - BoundShapeInference.ConcatMissingInput (2.770)
    ✓ Pass: caffe2/caffe2/opt:bound_shape_inference_test - BoundShapeInference.ElementwiseOp (2.770)
    ✓ Pass: caffe2/caffe2/opt:bound_shape_inference_test - BoundShapeInference.Tile (2.785)
    ✓ Pass: caffe2/caffe2/opt:bound_shape_inference_test - BoundShapeInference.Bucketize (2.789)
    ✓ Pass: caffe2/caffe2/opt:bound_shape_inference_test - BoundShapeInference.SparseLengthsSumFused8BitRowwise (2.807)
    ✓ Pass: caffe2/caffe2/opt:bound_shape_inference_test - BoundShapeInference.SparseLengthsSum8BitRowwiseSparse (2.841)
    ✓ Pass: caffe2/caffe2/opt:bound_shape_inference_test - BoundShapeInference.Split (2.863)
    ✓ Pass: caffe2/caffe2/opt:bound_shape_inference_test - BoundShapeInference.ConcatInferInputBackwards (2.894)
    ✓ Pass: caffe2/caffe2/opt:bound_shape_inference_test - BoundShapeInference.ElementwiseInferInputBackwards (2.898)
    ✓ Pass: caffe2/caffe2/opt:bound_shape_inference_test - BoundShapeInference.Combo0 (2.902)
    ✓ Pass: caffe2/caffe2/opt:bound_shape_inference_test - BoundShapeInference.LengthsRangeFill (2.964)
    ✓ Pass: caffe2/caffe2/opt:bound_shape_inference_test - BoundShapeInference.Quantization (2.964)
Summary
  Pass: 18
  ListingSuccess: 1
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/844425102187909
```

buck test caffe2/caffe2/fb/opt:bound_shape_inference_net_test

```
 Started reporting to test run: https://our.intern.facebook.com/intern/testinfra/testrun/3096224780078093
    ✓ ListingSuccess: caffe2/caffe2/fb/opt:bound_shape_inference_net_test - main (14.092)
    ✓ Pass: caffe2/caffe2/fb/opt:bound_shape_inference_net_test - BoundShapeInference.ClipLengths (15.508)
    ✓ Pass: caffe2/caffe2/fb/opt:bound_shape_inference_net_test - BoundShapeInference.DPER3IdListFeaturePreProcessing (15.521)
    ✓ Pass: caffe2/caffe2/fb/opt:bound_shape_inference_net_test - BoundShapeInference.ClipRanges (16.198)
    ✓ Pass: caffe2/caffe2/fb/opt:bound_shape_inference_net_test - BoundShapeInference.RowwisePrune (16.302)
    ✓ Pass: caffe2/caffe2/fb/opt:bound_shape_inference_net_test - FbBoundShapeInferencerTest.GatherRanges1 (16.585)
    ✓ Pass: caffe2/caffe2/fb/opt:bound_shape_inference_net_test - BoundShapeInference.Combo3 (16.865)
    ✓ Pass: caffe2/caffe2/fb/opt:bound_shape_inference_net_test - BoundShapeInference.DPER3IdListFeaturePreProcessingWithCast (16.907)
    ✓ Pass: caffe2/caffe2/fb/opt:bound_shape_inference_net_test - BoundShapeInference.GatherRanges2 (16.921)
    ✓ Pass: caffe2/caffe2/fb/opt:bound_shape_inference_net_test - FbBoundShapeInferencerTest.LengthsRangeFill (17.157)
    ✓ Pass: caffe2/caffe2/fb/opt:bound_shape_inference_net_test - BoundShapeInference.ClipRangesAndGatherRanges (17.277)
    ✓ Pass: caffe2/caffe2/fb/opt:bound_shape_inference_net_test - BoundShapeInference.DPER3IdScoreListFeaturePreProcessing (17.274)
    ✓ Pass: caffe2/caffe2/fb/opt:bound_shape_inference_net_test - BoundShapeInference.ClipRangesGatherSigridHash (17.554)
    ✓ Pass: caffe2/caffe2/fb/opt:bound_shape_inference_net_test - BoundShapeInference.Combo1 (17.645)
    ✓ Pass: caffe2/caffe2/fb/opt:bound_shape_inference_net_test - BoundShapeInference.DPER3IdScoreListFeaturePreProcessingDEFAULT (17.887)
    ✓ Pass: caffe2/caffe2/fb/opt:bound_shape_inference_net_test - BoundShapeInference.DPER3IdListFeaturePreProcessingDEFAULT (17.929)
    ✓ Pass: caffe2/caffe2/fb/opt:bound_shape_inference_net_test - BoundShapeInference.f97293388_0 (19.343)
    ✓ Pass: caffe2/caffe2/fb/opt:bound_shape_inference_net_test - FbBoundShapeInferencerTest.GatherRangesToDense1 (19.489)
    ✓ Pass: caffe2/caffe2/fb/opt:bound_shape_inference_net_test - BoundShapeInference.DPER3IdScoreListFeaturePreProcessingWithCast (19.887)
    ✓ Pass: caffe2/caffe2/fb/opt:bound_shape_inference_net_test - BoundShapeInference.xray_v11 (19.905)
    ✓ Pass: caffe2/caffe2/fb/opt:bound_shape_inference_net_test - FbBoundShapeInferencerTest.SigridTransforms (20.080)
    ✓ Pass: caffe2/caffe2/fb/opt:bound_shape_inference_net_test - BoundShapeInference.Combo2 (20.086)
    ✓ Pass: caffe2/caffe2/fb/opt:bound_shape_inference_net_test - BoundShapeInference.vanillaSparseNN (59.847)
    ✓ Pass: caffe2/caffe2/fb/opt:bound_shape_inference_net_test - BoundShapeInference.gather (97.822)
Summary
  Pass: 23
  ListingSuccess: 1
```

## Workflow testing

===
* non-DI/fixed quantile/user side/non-self-binning
f224250571

*  non-DI/fixed quantile/user+ad side/non-self-binning
f224250610

* DI/fixed quantile/user side/self-binning
f224250637

* DI/fixed quantile/user+ad side/self-binning
f224250662

*  non-DI/dynamic quantile/user+ad side/non-self-binning
f224250705

* DI/dynamic quantile/user+ad side/self-binning
f224250760

Reviewed By: ChunliF

Differential Revision: D23647390

fbshipit-source-id: 3ec1c0eaea53bd4d5eda4a0436577216f7fa8ead
2020-10-15 00:46:06 -07:00
Nikita Shulga
84771fc64f [caffe2] Add 10s deadline for all Caffe2 hypothesis fuzz tests
Test Plan: CI

Reviewed By: walterddr

Differential Revision: D24298118

fbshipit-source-id: 2286c1e37ed9c43f404b888386c0bd4b0b6a55c6
2020-10-14 06:30:09 -07:00
Nikita Shulga
1fcec6e72b [caffe2] Add operator schema for FP16SparseNorm (#46300)
Summary:
Fixes regression introduced by https://github.com/pytorch/pytorch/pull/45551
Also Fix signed-unsigned comparison warnings in test/cpp/tensorexpr/test_train_impl.cpp

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46300

Reviewed By: walterddr

Differential Revision: D24294821

Pulled By: malfet

fbshipit-source-id: 16bffa71ec0d2d38208855223a3c5efb18414ab5
2020-10-13 18:58:23 -07:00
Jianyu Huang
5c67cc7a9e [caffe2] Enable fp16 for SparseNormalize op (#45551)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45551

The FP16 version of SparseNormalize op in Caffe2 is missing. This Diff adds FP16 support to unblock MC process of adding FP16 to Dper3.

Check https://fb.quip.com/L0T2AXGwUY3n#EReACAeifk3 .

One open question is whether a pure-FP16 SparseNormalize op will affect accuracy; maybe we should do the computation in the FP32 domain.
ghstack-source-id: 114184398

Test Plan:
```
 buck run mode/opt //caffe2/caffe2/python/operator_test:sparse_normalize_test
```

```
buck run mode/opt -c python.package_style=inplace mode/no-gpu //caffe2/caffe2/python/benchmarks:sparse_normalize_benchmark -- --fp16
```

Reviewed By: jspark1105

Differential Revision: D24005618

fbshipit-source-id: 8b918ec4063fdaafa444779b95206ba2b7b38537
2020-10-13 15:35:22 -07:00
Danny Huang
85c3ba5588 [caffe2] add PlanExecutorTest ErrorPlanWithCancellableStuckNet (#46110)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46110

## Motivation
* `Cancel` is now added to `OperatorBase` and `NetBase` (https://github.com/pytorch/pytorch/pull/44145).
* We need a test to cover and exhibit that we can cancel stuck net and propagate error with plan executor.

## Summary
* Added PlanExecutorTest `ErrorPlanWithCancellableStuckNet` for plan executor.
* Set cancelCount to zero at the beginning of tests to avoid global state being carried over in some test environments.

Test Plan:
## Unit Test Added

```
buck test caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest
buck test caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest --stress-runs 1000
```

Reviewed By: d4l3k

Differential Revision: D24226577

fbshipit-source-id: c834383bfe6ab50747975c229eb42a363eed3458
2020-10-12 12:00:15 -07:00
John Lundell
3883cdb87e TensorInferenceFunction checks
Summary: Added an OpSchema::NeedsAllInputShapes wrapper around the TensorInferenceFunction to fix an exception thrown when referencing the dim array while the input shape was unknown. There may be other operators that could use a similar change; these are just the ones that were causing InferShapesAndTypes to throw an exception in my examples.
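A Python sketch of the guard pattern (the real wrapper is C++; the `needs_all_input_shapes` decorator and the tuple shape encoding here are hypothetical):

```python
# Wrap a shape-inference function so it only runs when every input shape
# is known, instead of indexing into a missing dims array and throwing.
def needs_all_input_shapes(infer_fn):
    def guarded(input_shapes):
        if any(s is None for s in input_shapes):
            return None                 # some shape unknown: skip inference
        return infer_fn(input_shapes)
    return guarded

@needs_all_input_shapes
def infer_matmul(shapes):
    (m, _k), (_k2, n) = shapes          # would blow up on a None shape
    return (m, n)
```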

Test Plan: Tested with notebook n352716

Differential Revision: D23745442

fbshipit-source-id: d63eddea47d7ba595e73c4693d34c790f3a329cc
2020-10-11 16:08:58 -07:00
Mark Santaniello
1a99689d71 [caffe2] Fix preprocessor checks for FMA
Summary: I think this preprocessor check is incorrect.  The fused multiply-add (FMA) instructions are not part of AVX2.

Test Plan: CI

Reviewed By: jspark1105

Differential Revision: D24237836

fbshipit-source-id: 44f9b9179918332eb85ac087827726300f56224e
2020-10-11 11:48:32 -07:00
Jongsoo Park
4c87d337af [Caffe2] use the real new fbgemm sparse adagrad interface (#46132)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46132

As title

Test Plan: .

Reviewed By: dskhudia

Differential Revision: D24197694

fbshipit-source-id: 2bfe8f52409fa500d2ea359dec7f521cffb20efb
2020-10-10 08:57:54 -07:00
Zeliang Chen
34951e9adc [shape inference] adding a new flag to the struct
Summary: Adding a new flag shape_is_set to the structs used for shape inference on in-place ops, to prevent duplicated inference.

Test Plan:
buck test mode/opt-clang caffe2/caffe2/opt:bound_shape_inference_test

buck test mode/opt-clang caffe2/caffe2/fb/opt:shape_info_utils_test

Reviewed By: ChunliF

Differential Revision: D24134767

fbshipit-source-id: 5142e749fd6d1b1092a45425ff7b417a8086f215
2020-10-09 19:29:08 -07:00
Jongsoo Park
da033e0b2d [Caffe2] use new fbgemm sparse adagrad interface with temp name (#46089)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46089

Follow-up of D24195799

Test Plan: .

Reviewed By: dskhudia

Differential Revision: D24196753

fbshipit-source-id: 216512822cfb752984bb97bd229af9746e866eaa
2020-10-09 12:51:43 -07:00
Danny Huang
87226f72d2 [caffe2] temp remove ErrorPlanWithCancellableStuckNet (#46080)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46080

temp removal of ErrorPlanWithCancellableStuckNet, will fill out more

Test Plan:
```
buck test caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest
```
remove a test

Reviewed By: fegin

Differential Revision: D24213971

fbshipit-source-id: e6e600bad00b45c726311193b4b3238f1700526e
2020-10-08 23:35:45 -07:00
Danny Huang
487624e369 [caffe2] plan executor error propagation test with blocking cancellable op (#45319)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45319

## Motivation
* `Cancel` is now added to `OperatorBase` and `NetBase` (https://github.com/pytorch/pytorch/pull/44145)
* We need a test to cover and exhibit that we can cancel stuck net and propagate error with plan executor.

## Summary
* Added `ErrorPlanWithCancellableStuckNet` for plan executor.
* We set up a plan with two nets: a stuck net with a blocking operator that never returns, and an error
  net with an op that throws. We tested that the plan throws the error and cancels the stuck net.

Test Plan:
## Unit Test added
```
buck test caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest
buck test caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest --stress-runs 100
```
```
Summary
  Pass: 400
  ListingSuccess: 2
```

Reviewed By: d4l3k

Differential Revision: D23920548

fbshipit-source-id: feff41f73698bd6ea9b744f920e0fece4ee44438
2020-10-08 19:54:49 -07:00
Michael Ranieri
c6672a608b caffe2 missing cctype header (#46052)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46052

`<cctype>` is what provides `isupper`, etc.
https://en.cppreference.com/w/cpp/header/cctype

Clang on Windows complains about the missing header.

Test Plan: CI green

Reviewed By: yinghai

Differential Revision: D24201925

fbshipit-source-id: 7b242200f09c30bf78dde226e14ee4be71758b87
2020-10-08 16:48:49 -07:00
n-v-k
64b0686986 Expose ChannelShuffle (#46000)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45999
Also small fix for caffe2 counterpart

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46000

Reviewed By: mruberry

Differential Revision: D24185855

Pulled By: ngimel

fbshipit-source-id: c5d599bb8100b86b81c6901f1b8b8baefc12cb16
2020-10-08 16:00:01 -07:00
Tristan Rice
59e4803b94 Recommit: caffe2/plan_executor: wait for 1 minute after exception and then abort (#45981)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45981

This is a recommit of previously reverted D20850851 (3fbddb92b1).

TL;DR - combining condition_variables and atomics is a bad idea

https://stackoverflow.com/questions/49622713/c17-atomics-and-condition-variable-deadlock

This also adds some ifdefs to disable the death test for mobile, xplat and tsan builds since forking doesn't play nicely with them.

Test Plan:
buck test mode/opt //caffe2/caffe2/python:hypothesis_test -- --stress-runs 1000 test_atomic_iter_with_concurrent_steps --timeout 120
  buck test mode/opt //caffe2/caffe2/python:hypothesis_test -- --stress-runs 100
  buck test mode/opt caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest --stress-runs 100

no timeouts https://www.internalfb.com/intern/testinfra/testconsole/testrun/7036874440059883/

will ensure no timeouts in OSS

Reviewed By: walterddr, dahsh

Differential Revision: D24165505

fbshipit-source-id: 17cd23bfbcd9c2826a4067a387023d5186353196
2020-10-08 14:17:30 -07:00
Bugra Akyildiz
298e0e0d57 Refactor gather_ranges_to_dense from Python to C++ (#46021)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46021

Refactor gather_ranges_to_dense from Python to C++

https://www.internalfb.com/intern/tasks/?t=71935517

Test Plan:
General build/test:
```
buck build -c python.helpers=true fbcode/caffe2
buck test -c python.helpers=true fbcode/caffe2
```

Specific Test:
```buck test mode/dev-nosan //caffe2/torch/fb/sparsenn:test -- 'test_gather_ranges_to_dense \(caffe2\.torch\.fb\.sparsenn\.tests\.sparsenn_operators_test\.SparseNNOperatorsTest\)'
```

Reviewed By: houseroad

Differential Revision: D23858186

fbshipit-source-id: 8bce7c279275c8ff7316901b455e1d1dd7e36b13
2020-10-08 11:03:06 -07:00
Thomas Viehmann
d3d8da7a8e Enable CUDA Fuser for ROCm (#45965)
Summary:
This enables the cuda fuser on ROCm and enables tests for them.

Part of this patch is based on work of Rohith Nallamaddi, thank you.
Errors are my own, of course.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45965

Reviewed By: seemethere

Differential Revision: D24170457

Pulled By: walterddr

fbshipit-source-id: 3dd25b3501a41d2f00acba3ce8642ce51c49c9a6
2020-10-08 10:41:56 -07:00
Yinghai Lu
c9caa828f5 Throw special exception when backend compilation is met with fatal error (#45952)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45952

Pull Request resolved: https://github.com/pytorch/glow/pull/4967

When glow compilation meets a nonrecoverable fatal error (the hardware is busted), we would like to throw a special exception, distinct from the normal caffe2::EnforceNotMet, so that we can signal the upper-layer application to handle it differently.

Test Plan: Manually code some error and add LOG(FATAL) in the special exception path and wait for application to fatal.

Reviewed By: ipiszy

Differential Revision: D24156792

fbshipit-source-id: 4ae21bb0d36c89eac331fc52dd4682826b3ea180
2020-10-08 00:46:01 -07:00
Yinghai Lu
a92b49f7c8 [Onnxifi] Don't throw exception when we cannot write out debug files (#45979)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45979

For some reason, sometimes we cannot write out the debug files. This shouldn't block the whole service, so we opt to log an error instead of throwing.

Test Plan: Run net_runner test at `/` and observe error being printed out but the test passes.

Reviewed By: ipiszy

Differential Revision: D24165081

fbshipit-source-id: a4e1d0479d54d741e615e3a00b3003f512394fd4
2020-10-08 00:18:24 -07:00
Nikita Shulga
81d40aaf96 Add [zc]heevd to the list of MKL symbols exported from torch_cpu (#46002)
Summary:
The CPU implementation of `torch.symeig` uses `[zc]heev`, but MAGMA only has `d`-suffixed flavors of those functions

Fixes https://github.com/pytorch/pytorch/issues/45922

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46002

Reviewed By: walterddr

Differential Revision: D24177730

Pulled By: malfet

fbshipit-source-id: 0e9aeb60a83f8a4b8ac2a86288721bd362b6040b
2020-10-07 20:50:10 -07:00
Venkata Chintapalli
a36f11a3a5 [FakeLowP] T76913842 Make AddFakeFp16 take int inputs (#45992)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45992

Created a template version of AddFakeFp16 to take both float and int inputs.

Test Plan: notebook with local bento kernel: N369049

Reviewed By: amylittleyang

Differential Revision: D24169720

fbshipit-source-id: 679de391224f65f6c5b3ca890eb0d157f09712f6
2020-10-07 17:43:00 -07:00
Hao Lu
0927e02a6a [caffe2] Do not run RemoveOpsByType on recurrent networks (#45986)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45986

Recurrent networks have subnets that are not well supported by `RemoveOpsByType`. Here we exclude recurrent networks by adding the same check as in memonger.

Test Plan:
```
buck test //caffe2/caffe2/fb/predictor:black_box_predictor_test
```

AdIndexer canary for sanity check:
https://www.internalfb.com/intern/ads/canary/430059485214766620

Differential Revision: D24167284

fbshipit-source-id: fa90d1c1f34af334a599d879af09d4c0bf7c27bd
2020-10-07 14:07:52 -07:00
Rong Rong
1bb2d41b68 Revert D20850851: caffe2/plan_executor: wait for 1 minute after exception and then abort
Test Plan: revert-hammer

Differential Revision:
D20850851 (3fbddb92b1)

Original commit changeset: 330503775d80

fbshipit-source-id: 612c6c3c4d5586bc8ad00a112cd00fc74fb44243
2020-10-07 09:04:24 -07:00
Bert Maher
50f89578dd [te] Add a benchmark harness (#45875)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45875

Adds a googlebenchmark harness for perf testing programs generated by
tensorexpr, sans any pytorch wrappings (for python-level benchmarks of
tensorexpr, see benchmarks/tensorexpr).

Currently there's a harness for gemm that sets up the problem using torch (and
also measures the perf of a torch::mm to give a baseline).

Right now there's just an unoptimized implementation that is not expected to be
very fast.  More optimized versions are coming.

Sample output from my dev box:
```
Run on (48 X 2501 MHz CPU s)
CPU Caches:
  L1 Data 32K (x24)
  L1 Instruction 32K (x24)
  L2 Unified 256K (x24)
  L3 Unified 30720K (x2)
--------------------------------------------------------------------------------------------
Benchmark                                     Time           CPU Iterations UserCounters...
--------------------------------------------------------------------------------------------
Gemm/Torch/128/128/128                    73405 ns      73403 ns       8614 GFLOPS=57.1411G/s
Gemm/TensorExprNoopt/128/128/128        3073003 ns    3072808 ns        229 GFLOPS=1.36497G/s
```
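
The GFLOPS counter in the table follows from the standard 2*M*N*K flop count for a GEMM; a sketch of the arithmetic (assumed convention, not taken from the harness source):

```python
def gemm_gflops(m, n, k, seconds):
    """GFLOPS for an (m x k) @ (k x n) GEMM that took `seconds` to run,
    counting 2 flops (one multiply, one add) per inner-product element."""
    return 2.0 * m * n * k / seconds / 1e9
```

Plugging in the 128^3 Torch row above (73405 ns) reproduces the reported ~57.1 GFLOPS.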

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D24142403

Pulled By: bertmaher

fbshipit-source-id: 3354aaa56868a43a553acd1ad9a192f28d8e3597
2020-10-06 16:57:27 -07:00
n-v-k
c1af91a13a [caffe2] SliceOp axes indexing fixes. (#45432)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45431

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45432

Reviewed By: albanD

Differential Revision: D24132547

Pulled By: dzhulgakov

fbshipit-source-id: d67f7a92d806fb8ac8fc8f522b251d3a8fb83037
2020-10-06 13:21:08 -07:00
Tristan Rice
3fbddb92b1 caffe2/plan_executor: wait for 1 minute after exception and then abort (#45297)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45297

If we have two concurrent substeps, and one of them throws an exception while the other is blocking, we currently hang. This waits up to 1 minute for the blocking substep to complete before terminating the process.
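
The wait-then-abort behavior can be sketched with a daemon thread and a bounded join (a Python stand-in for the C++ executor; names are hypothetical):

```python
import threading

def finished_within(blocking_fn, timeout_s=60.0):
    """Run blocking_fn on a daemon thread and wait up to timeout_s for it.
    Returns True if it finished; False means the executor would give up
    waiting and terminate the process."""
    t = threading.Thread(target=blocking_fn, daemon=True)
    t.start()
    t.join(timeout=timeout_s)
    return not t.is_alive()
```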

Test Plan: buck test caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest --stress-runs 100

Reviewed By: dahsh

Differential Revision: D20850851

fbshipit-source-id: 330503775d8062a34645ba55fe38e6770de5e3c7
2020-10-06 12:59:09 -07:00
Pawel Garbacki
fb50fcaa82 [C2] Add string equality operator (#45886)
Summary:
This diff adds a string equality checking operator.

Another attempt at reverted D24042344 (cf48872d28)
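
Semantically the op is just elementwise string comparison over two same-length inputs; a minimal sketch (a hypothetical Python stand-in, not the C2 implementation):

```python
def string_eq(xs, ys):
    """Elementwise equality over two equal-length lists of strings."""
    if len(xs) != len(ys):
        raise ValueError("inputs must have the same length")
    return [x == y for x, y in zip(xs, ys)]
```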

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45886

Test Plan: unit tests, github builds

Reviewed By: dzhulgakov

Differential Revision: D24129953

fbshipit-source-id: caa53c7eac5c67c414c37e9d93416104f72556b9
2020-10-06 12:08:26 -07:00
Dmytro Dzhulgakov
519c086418 Revert D24042344: [C2] Add string equality operator
Test Plan: revert-hammer

Differential Revision:
D24042344 (cf48872d28)

Original commit changeset: c8997c6130e3

fbshipit-source-id: 3d8aec1104a2a59c67ab4b7e77caeaf9fc94ae1d
2020-10-05 15:09:03 -07:00
Pawel Garbacki
cf48872d28 [C2] Add string equality operator
Summary: This diff adds a string equality checking operator.

Test Plan: Unit tests

Differential Revision: D24042344

fbshipit-source-id: c8997c6130e3438f2ae95dae69f76978e2e95527
2020-10-05 10:47:53 -07:00
Thomas Viehmann
3ab88c3903 Enable TorchBind tests on ROCm (#45426)
Summary:
The torchbind tests didn't work because we somehow missed the rename of caffe2_gpu to torch_... (hip for us) in https://github.com/pytorch/pytorch/issues/20774 (merged 2019-06-13, oops) and still tried to link against it.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45426

Reviewed By: VitalyFedyunin

Differential Revision: D24112439

Pulled By: walterddr

fbshipit-source-id: a66a574e63714728183399c543d2dafbd6c028f7
2020-10-05 09:38:12 -07:00
Xianjie Chen
73e9daa35f [caffe2] Optimize Dedup version of RowWiseSparseAdagrad fused op by WarpReduce (#45649)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45649

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44275

* This diff applies the WarpReduce optimization to the dedup version of the RowWiseSparseAdagrad fused op. Basically we can achieve a ~1.33x performance improvement with this diff.

* Port the approach from D23948802 for finding num_dup
* Fix a likely fp16 bug in the dedup kernel
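
Finding `num_dup` amounts to counting how often each row index repeats in the sparse update; a sketch of the idea (illustration only, the actual kernel does this on the GPU):

```python
from collections import Counter

def num_dups(indices):
    """For each position, how many times its row index occurs overall.
    Duplicate rows must be merged (or their updates scaled) before the
    Adagrad moment/weight update is applied."""
    counts = Counter(indices)
    return [counts[i] for i in indices]
```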

Reviewed By: jianyuh

Differential Revision: D23561994

fbshipit-source-id: 1a633fcdc924593063a67f9ce0d36eadb19a7efb
2020-10-02 14:28:24 -07:00
Marcio Porto
c31066ac9d Torch Integration Test Formatting Changes (#45740)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45740

Reviewed By: esqu1

Differential Revision: D23869021

fbshipit-source-id: 5910d44f9475bd7a53dc0478b69b39572dc8666f
2020-10-02 14:02:31 -07:00
Marcio Porto
b234acd414 Exposes SparseToDenseMask Caffe2 Operator (#45670)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45670
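
SparseToDenseMask maps sparse (id, value) pairs onto a fixed list of feature ids, filling ids absent from the input with a default value; a rough Python sketch of the semantics (argument names are assumptions):

```python
def sparse_to_dense_mask(ids, values, mask, default=0.0):
    """Place each value at the position its id occupies in `mask`;
    ids in `mask` that are missing from the input get `default`."""
    lookup = dict(zip(ids, values))
    return [lookup.get(m, default) for m in mask]
```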

Reviewed By: esqu1

Differential Revision: D23868280

fbshipit-source-id: d6afa129c073fe611cb43a170025bc3c880a4bec
2020-10-02 10:05:13 -07:00
Michael Suo
18253f4a48 Fix BUILD_CAFFE2 if FBGEMM and NNPACK are not built (#45610)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45610

Also add to the usual documentation places that this option exists.

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D24058199

Pulled By: suo

fbshipit-source-id: 81574fbd042f47587e2c7820c726fac0f68af2a7
2020-10-01 14:58:55 -07:00
Kunal Bhalla
4564444c91 [RFC][caffe2] TaskGroup.__repr__ shouldn't have side effects
Summary: `__repr__` calling self.tasks() ends up marking the instance as "used", which doesn't seem appropriate. I was debugging a value being passed around and then ran into `Cannot add Task to an already used TaskGroup.` because the value had been logged once.
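
The fix boils down to having `__repr__` read internal state directly instead of going through `tasks()`; a simplified sketch (not the actual caffe2 class):

```python
class TaskGroup:
    def __init__(self):
        self._tasks = []
        self._used = False

    def tasks(self):
        self._used = True  # consuming the tasks marks the group as used
        return list(self._tasks)

    def __repr__(self):
        # Inspect fields directly so logging/debugging has no side effects.
        return "TaskGroup(num_tasks=%d, used=%r)" % (len(self._tasks), self._used)
```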

Test Plan:
Added a unit test -- didn't see a clean public method to test it, but I'm happy to add one if that makes sense.

Will wait for sandcastle to trigger everything else; I'm not at all familiar with this code so any other recommendations would be great!

Reviewed By: cryptopic

Differential Revision: D23541198

fbshipit-source-id: 5d1ec674a1ddaedf113140133b90e0da6afa7270
2020-10-01 14:21:03 -07:00