Commit Graph

43637 Commits

Author SHA1 Message Date
Philip Meier
20c2bb4c9f fix kl_div for negative targets
ghstack-source-id: d69d60f4fe
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69212
2022-02-08 14:36:26 +01:00
Philip Meier
334339a3d2 add OpInfos for torch.nn.functional.triplet_margin(_with_distance)?_loss
ghstack-source-id: bbc38b4b85
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67079
2022-02-08 14:36:26 +01:00
Philip Meier
5ada829c4b add OpInfos for nn.functional.binary_cross_entropy(_with_logits)?
ghstack-source-id: 740eeff117
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67023
2022-02-08 14:36:26 +01:00
Philip Meier
45cdfbeeab add OpInfo for torch.nn.functional.pdist
ghstack-source-id: 520a646689
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67022
2022-02-08 14:36:25 +01:00
Philip Meier
3cc2faa064 add OpInfo for torch.nn.functional.l1_loss
ghstack-source-id: c1f6e39524
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69211
2022-02-08 14:36:25 +01:00
francescocastelli
5e6f296612 Structured Kernel Precompute codegen handle fields without replacement (#71368)
Summary:
I've added parsing of an optional first line in native_functions.yaml, after the precomputed keyword, for arguments that will be precomputed without replacement. This line is optional, must come first, and does not contain any arrow.

These new fields are precomputed as before in the meta function and added to the precompute struct returned by the meta function. For now I've put them as the last arguments of the impl function, where they can be reused.

example:

native_functions.yaml:
```
  ...
  precomputed:
  - int numBatch, int numPlanes, int inputT, int inputH, int inputW   <- new
  - kernel_size -> int poolSizeT, int poolSizeH, int poolSizeW
  - output_size -> int outputT, int outputH, int outputW
```

meta:
```
TORCH_PRECOMPUTE_META_FUNC(fractional_max_pool3d)(
  const at::Tensor& input_,
  IntArrayRef pool_size,
  IntArrayRef output_size,
  const at::Tensor& randomSamples
) {
    ...

return TORCH_PRECOMPUTE_STRUCT(fractional_max_pool3d)().set_numBatch(numBatch).set_numPlanes(numPlanes).set_inputT(inputT).set_inputH(inputH).set_inputW(inputW)
  .set_poolSizeT(poolSizeT) ...
}
```

impl:
```
TORCH_IMPL_FUNC(fractional_max_pool3d_out_cpu)(
  const at::Tensor& input_,
  int64_t poolSizeT,
  int64_t poolSizeH,
  int64_t poolSizeW,
  int64_t outputT,
  int64_t outputH,
  int64_t outputW,
  const at::Tensor& randomSamples,
  const at::Tensor& output,
  const at::Tensor& indices,
  int64_t numBatch,    <- for now I've put them here
  int64_t numPlanes,
  int64_t inputT,
  int64_t inputH,
  int64_t inputW) {
```
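
The parsing rule described above (an optional first entry without an arrow, followed by `->` replacement entries) can be sketched in Python. This is only an illustration under stated assumptions: `Precompute` and `parse_precomputed` are hypothetical names chosen for this sketch, not the actual codegen API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Precompute:
    # Hypothetical container: replaced kernel argument -> parameters replacing it.
    replace: Dict[str, List[str]] = field(default_factory=dict)
    # Parameters precomputed without replacing any kernel argument.
    add: List[str] = field(default_factory=list)


def parse_precomputed(entries: List[str]) -> Precompute:
    """Parse the entries listed under the `precomputed` keyword (sketch)."""
    result = Precompute()
    for index, entry in enumerate(entries):
        if "->" in entry:
            # "kernel_size -> int a, int b": the argument is replaced by the parameters.
            arg, params = entry.split("->", 1)
            result.replace[arg.strip()] = [p.strip() for p in params.split(",")]
        elif index == 0:
            # Optional first entry without an arrow: fields added without replacement.
            result.add = [p.strip() for p in entry.split(",")]
        else:
            raise ValueError(f"entry without '->' must come first: {entry!r}")
    return result


entries = [
    "int numBatch, int numPlanes, int inputT, int inputH, int inputW",
    "kernel_size -> int poolSizeT, int poolSizeH, int poolSizeW",
    "output_size -> int outputT, int outputH, int outputW",
]
parsed = parse_precomputed(entries)
```

Here `parsed.add` holds the five new fields, while `parsed.replace` maps `kernel_size` and `output_size` to their replacement parameters.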

Fixes https://github.com/pytorch/pytorch/issues/71314

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71368

Reviewed By: zou3519

Differential Revision: D33683984

Pulled By: bdhirsh

fbshipit-source-id: 33066dd92b8743aadf0dc8102f6bf0689f843242
(cherry picked from commit 64e46af6a4)
2022-02-08 03:56:56 +00:00
Brian Muse
8bf3179f6e #71946 Remove Python 3.6 references (#72211)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/71946

This commit removes some bits of code that were hard coded for Python 3.6 support from the `.circleci` and `torch` folders. It should only be merged if https://github.com/pytorch/pytorch/issues/66462 is complete.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72211

Reviewed By: dagitses, seemethere

Differential Revision: D33982604

Pulled By: musebc

fbshipit-source-id: 8f453bf9909df615addd59538adb369c65484044
(cherry picked from commit 944a9970fe)
2022-02-08 03:46:20 +00:00
Shiyan Deng
2afed243b5 [fx2trt] remove split.py (#71933)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71933

Add the functionalities provided by split.py to splitter_base.
- Propagate submodule inputs
- Create SplitResult to hold the split results.
Then removed split.py; to me, this makes navigating the lowering code a bit easier.

Added default split and trace function for use.

Next step is to add better error handling for each stage during lowering and create unit tests for each stage. I'll probably make some bootcamp tasks for unit tests.

Test Plan: CI

Reviewed By: frank-wei, wushirong

Differential Revision: D33794322

fbshipit-source-id: f991893047a3701177f54cf22d9a6e48e0529472
(cherry picked from commit 1f3e13efba)
2022-02-08 03:31:25 +00:00
Mike Iovine
d51d2bd608 [SR] Add a flag to disable copy variants (#71102)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71102

This graph pass is causing a major perf regression on some models. Ideally we would introduce maybe_copy variants for all these ops. But since those are tricky to write, I've introduced a flag to just turn the pass off for now.
ghstack-source-id: 148541673

Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`

Reviewed By: navahgar

Differential Revision: D33510080

fbshipit-source-id: bb4847f26561197ea5e6bbad0a4d25db4ef468eb
(cherry picked from commit 8f333d3e81)
2022-02-08 02:43:07 +00:00
Raghavan Raman
765908708b [nnc] Adding a test with dynamic shapes from a model (#72198)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72198

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D33951741

Pulled By: navahgar

fbshipit-source-id: 596b193eba14c8e1affa9fa13070079f05d64cac
(cherry picked from commit ddbb78ff80)
2022-02-08 02:00:46 +00:00
Nikita Shulga
cedf37d933 Back out "Revert D34043182: [pytorch][PR] Added missing antialias argument to functional.pyi.in"
Summary:
Original commit changeset: 4ce347cb0f30

Original Phabricator Diff: D34043182 (8315c9b885)

Test Plan: It's a backout of a backout

Reviewed By: pbelevich, jaceyca

Differential Revision: D34060843

fbshipit-source-id: 6aaf62ce74330cbf142ab483b2a31eccba775ca9
(cherry picked from commit 046b1dbb72)
2022-02-08 01:37:22 +00:00
Nikita Shulga
bb101ec78d Revert D33595240: [JIT] Opinfo tests for nnc fusion
Test Plan: revert-hammer

Differential Revision:
D33595240 (0b57bd4c66)

Original commit changeset: e2e17a921bc3

Original Phabricator Diff: D33595240 (0b57bd4c66)

fbshipit-source-id: 172a3ffd19d180b1b3617956b1f881be62f37bc9
(cherry picked from commit 324cfaea86)
2022-02-08 01:28:42 +00:00
Nikita Shulga
58f25678bd Revert D33780905: Opinfo test for mvlgamma: add epsilon
Test Plan: revert-hammer

Differential Revision:
D33780905 (72cedba655)

Original commit changeset: c9afd443bc90

Original Phabricator Diff: D33780905 (72cedba655)

fbshipit-source-id: 180b862ed03e18f96cc1c7f956476eb16dd56225
(cherry picked from commit 623643b362)
2022-02-08 01:28:42 +00:00
Guo Yejun
4d4b94b3cb gen_backend_stubs.py: fix typo for supported_autograd (#68562)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68562

Reviewed By: jbschlosser

Differential Revision: D32758608

Pulled By: bdhirsh

fbshipit-source-id: 496e1ec831edaa6fcc586f3c8f0361c31cad4e78
(cherry picked from commit 68ea9e9df5)
2022-02-08 01:28:42 +00:00
Nikita Shulga
127bf42ee7 Revert D34043182: [pytorch][PR] Added missing antialias argument to functional.pyi.in
Test Plan: revert-hammer

Differential Revision:
D34043182 (8315c9b885)

Original commit changeset: ca64a8f0d251

Original Phabricator Diff: D34043182 (8315c9b885)

fbshipit-source-id: 4ce347cb0f30c4e1eaeef86995f698dd72494d66
(cherry picked from commit 93eef03aa6)
2022-02-08 01:11:39 +00:00
Ivan Yashchuk
8cdcc1181c Add missing entry for sampled_addmm in sparse.rst (#72312)
Summary:
Let's make the documentation for `torch.sparse.sampled_addmm` searchable in the PyTorch documentation.
This PR shall be cherry-picked for the next 1.11 release.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72312

Reviewed By: davidberard98

Differential Revision: D34045230

Pulled By: cpuhrsch

fbshipit-source-id: c1b1dc907443284857f48c8ce1efab22c6701bbe
(cherry picked from commit 225929ecf2)
2022-02-08 00:07:20 +00:00
Nikita Shulga
896703d3d7 Do not push binaries generated by ciflow
Fixes https://github.com/pytorch/pytorch/issues/72422

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72427
2022-02-07 22:56:30 +00:00
Evgeny Fiksman
9ab71f5ac8 [pytorch/aten] Avoid temporary array reconstruction (#72391)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72391

The temporary array can be reused within the loop; this saves memory reallocations and uninitialized_copy calls for the vector.

Test Plan: CI

Reviewed By: jspark1105

Differential Revision: D34030993

fbshipit-source-id: 40708e3144c6c8f8ac3a6a45d668b34b5e52e095
(cherry picked from commit 859e126aef)
2022-02-07 22:49:12 +00:00
David Berard
72cedba655 Opinfo test for mvlgamma: add epsilon (#71794)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71794

mvlgamma(inp, p) requires that all the elements of inp are > (p-1)/2.

The opinfo test was occasionally producing inputs with elements == (p-1)/2, which would generate errors like:

```
ERROR: test_nnc_correctness_mvlgamma_mvlgamma_p_5_cpu_bfloat16 (__main__.TestNNCOpInfoCPU)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/path/pytorch/torch/testing/_internal/common_device_type.py", line 381, in instantiated_test
    raise rte
  File "/path/pytorch/torch/testing/_internal/common_device_type.py", line 376, in instantiated_test
    result = test(self, **param_kwargs)
  File "/path/pytorch/torch/testing/_internal/common_device_type.py", line 753, in test_wrapper
    return test(*args, **kwargs)
  File "/path/pytorch/torch/testing/_internal/common_device_type.py", line 907, in only_fn
    return fn(slf, *args, **kwargs)
  File "/path/pytorch/test/test_jit_fuser_te.py", line 2293, in test_nnc_correctness
    ref = variant(*clone_inputs((sample.input, *sample.args)), **sample.kwargs)
RuntimeError: All elements must be greater than (p-1)/2
```

repro example: https://gist.github.com/davidberard98/9da688e31cdfbaed7e990746b28a4ba2
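
The domain constraint and the epsilon shift can be illustrated with a small pure-Python sketch. It uses the standard definition of the multivariate log-gamma function; `shift_into_domain` and its default `eps` are hypothetical and are not the helper used in the actual OpInfo sample inputs.

```python
import math


def mvlgamma(x: float, p: int) -> float:
    """Multivariate log-gamma of dimension p, defined only for x > (p - 1) / 2."""
    if x <= (p - 1) / 2:
        raise ValueError("All elements must be greater than (p-1)/2")
    constant = p * (p - 1) / 4 * math.log(math.pi)
    return constant + sum(math.lgamma(x - i / 2) for i in range(p))


def shift_into_domain(values, p, eps=1e-4):
    """Hypothetical helper: map non-negative values strictly above (p - 1) / 2."""
    lower = (p - 1) / 2
    return [lower + eps + v for v in values]


# A value of exactly 0.0 would land on the boundary (p - 1) / 2 without eps.
samples = shift_into_domain([0.0, 0.5, 3.0], p=5)
results = [mvlgamma(x, 5) for x in samples]
```

Without the `eps` term, the first sample would equal (p-1)/2 exactly and `mvlgamma` would raise, which mirrors the intermittent failure described above.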

Test Plan: Imported from OSS

Reviewed By: qihqi

Differential Revision: D33780905

Pulled By: davidberard98

fbshipit-source-id: c9afd443bc90ce68f33b97498921b447e4f7d1d8
(cherry picked from commit a974b03f07)
2022-02-07 22:21:03 +00:00
Nikita Shulga
5654b68731 Revert D34011981: [pytorch][PR] remove some spurious warnings fixing
Test Plan: revert-hammer

Differential Revision:
D34011981 (1bad3c4a84)

Original commit changeset: 55bedc8a4092

Original Phabricator Diff: D34011981 (1bad3c4a84)

fbshipit-source-id: 216643e251597cd7086e7854426f4f189a77adc9
(cherry picked from commit bb39550500)
2022-02-07 22:01:25 +00:00
Daniël de Kok
d50211860a Use SLEEF functions for NEON vectors on macOS ARM64 (#70354)
Summary:
We noticed that on M1 Macs Transformer network profiles are dominated by scalar `exp` and `erff` functions (for softmax and GELU).

The NEON `Vectorized<float>` implementation does not use SLEEF functions in order to compile on mobile platforms. However, SLEEF is already compiled on macOS ARM64 and is safe to use there. This change adds another implementation of `Vectorized<float>` that uses SLEEF functions. This implementation is only used on macOS ARM64.

This change speeds up e.g. prediction of spaCy transformer models by 20% on M1 Macs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/70354

Reviewed By: albanD

Differential Revision: D33659540

Pulled By: kimishpatel

fbshipit-source-id: b8f02a61321873fc60778190a005c466c7d0cc0c
(cherry picked from commit 71286a207c)
2022-02-07 21:55:28 +00:00
Bradley Davis
f0f49a1153 [torch.package] add test case for repackaging parent module (#72367)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72367

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72299

Test Plan:
Before https://github.com/pytorch/pytorch/pull/71520:
```
Summary
  Pass: 106
  Fail: 1
    ✗ caffe2/test:package - test_repackage_import_indirectly_via_parent_module (package.package_d.test_repackage.TestRepackage)
  Skip: 22
  ...
  ListingSuccess: 1
```

After https://github.com/pytorch/pytorch/pull/71520:

```
BUILD SUCCEEDED
    ✓ ListingSuccess: caffe2/test:package : 129 tests discovered (28.595)
    ✓ Pass: caffe2/test:package - test_repackage_import_indirectly_via_parent_module (package.package_d.test_repackage.TestRepackage) (18.635)
Summary
  Pass: 1
  ListingSuccess: 1
```

Reviewed By: PaliC

Differential Revision: D34015540

fbshipit-source-id: b45af5872ae4a5f52afbc0008494569d1080fa38
(cherry picked from commit 432d728e66)
2022-02-07 21:49:36 +00:00
Ivan Yashchuk
29c81bbff5 Fix SVD error code handling for OpenBLAS 0.3.15+ and MKL 2022+ (again) (#72357)
Summary:
This PR was opened as a copy of https://github.com/pytorch/pytorch/pull/68812, as requested in https://github.com/pytorch/pytorch/pull/68812#issuecomment-1030215862.

-----

Fixes https://github.com/pytorch/pytorch/issues/67693.

Reference LAPACK (used in OpenBLAS) changed the `info` error code for SVD when inputs contain non-finite numbers. In PyTorch, we raise an internal assert error for negative `info` error codes because usually that would indicate an incorrect implementation. However, this is no longer the case for SVD in newer versions of LAPACK. MKL (tried 2021.4.0) still gives a positive error code for this kind of input. This change aligns our code with the OpenBLAS and MKL behavior.

MKL 2022 uses the latest reference LAPACK behavior and returns the same `info` as OpenBLAS 0.3.15+.
This PR also fixes https://github.com/pytorch/pytorch/issues/71645, which is due to the updated MKL version in CI.
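
The error-code policy described above can be sketched as follows. This is a simplified Python illustration, not the actual C++ check in PyTorch: `check_lapack_info`, its messages, and the substring test on the API name are assumptions made for the example.

```python
def check_lapack_info(info: int, api_name: str) -> None:
    """Hypothetical sketch of mapping a LAPACK `info` code to an error."""
    if info == 0:
        return  # success
    if info < 0:
        if "svd" in api_name:
            # Newer reference LAPACK reports non-finite SVD inputs with a negative
            # code, so this is a user-facing error rather than an internal bug.
            raise RuntimeError(
                f"{api_name}: the algorithm failed because the input "
                "contains non-finite values"
            )
        # For other routines a negative code means an illegal argument was
        # passed, which usually indicates an incorrect implementation.
        raise AssertionError(f"{api_name}: argument {-info} has an illegal value")
    # Positive codes are reported as a failure of the algorithm itself.
    raise RuntimeError(f"{api_name}: the algorithm failed to converge (info={info})")
```

With this policy, a negative `info` from an SVD call surfaces as a regular runtime error, while the same code from any other routine still trips the internal assertion.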

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72357

Reviewed By: albanD

Differential Revision: D34012245

Pulled By: ngimel

fbshipit-source-id: 2b66c173cc3458d8c766b542d0d569191cdce310
(cherry picked from commit fa29e65611)
2022-02-07 21:36:30 +00:00
Peter Bell
bc1fb7a618 CMake: Limit python include directories to only python libraries (#69085)
Summary:
`include_directories` is old-style CMake which adds the include path to every file being compiled. This instead makes `python`, `numpy` and `pybind11` into targets that only `torch_python` and `caffe2_pybind_state` are linked to. So, python libraries can't be accidentally included elsewhere.

Resubmit of https://github.com/pytorch/pytorch/issues/65654, Closes https://github.com/pytorch/pytorch/issues/65828

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69085

Reviewed By: anjali411

Differential Revision: D33776456

Pulled By: malfet

fbshipit-source-id: 018b0f6cd5a4f8c9e36df961deff832bc4afd479
(cherry picked from commit 57063107d6)
2022-02-07 21:18:32 +00:00
Nikita Shulga
bec2ed05e8 [BE] Move upload logic to shared template (#72426)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72426

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D34050483

Pulled By: malfet

fbshipit-source-id: c8ab8505433a8eab3da10a1d2f990496e9e6300c
(cherry picked from commit 81346d7c8b)
2022-02-07 21:18:32 +00:00
Nikita Shulga
b74c2de46a Set DRY_RUN to disabled for Win binary builds (#72425)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72425

Not sure how it worked before

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D34050484

Pulled By: malfet

fbshipit-source-id: 91e2660d4f4e3b8c04bddd07bac434fcba630c0f
(cherry picked from commit b652b25d39)
2022-02-07 21:18:32 +00:00
Omar
25f9fe22a9 [PowerSGD] Add orthogonalization with QR factorization (#72043)
Summary:
### 🚀 The feature, motivation and pitch
Following the discussion in https://github.com/pytorch/pytorch/issues/65813, I added the QR factorization to powerSGD_hook.py.
Gram-Schmidt orthogonalization can't be fully replaced because _torch.linalg.qr_ doesn't work with half-precision. Moreover, in my tests, it works faster with a rank lower than 3.

This is one sample experiment timing powerSGD_hook on ResNext101 with the two different methods:
![Screenshot from 2022-01-31 18-14-00](https://user-images.githubusercontent.com/42100908/151840929-270c67dd-9fe7-4f11-8e70-8bf2d0ba678d.png)

### Alternatives
Use _torch.orgqr(*torch.geqrf(matrix))_. In my tests, its performance is similar to _torch.linalg.qr_.
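
For reference, the Gram-Schmidt path that is kept as a fallback can be sketched in pure Python. This is an illustrative sketch only, not the code in powerSGD_hook.py: the column-list representation and the `eps` guard against division by zero are assumptions made for this example.

```python
import math


def gram_schmidt(columns, eps=1e-8):
    """Orthonormalize a list of column vectors with modified Gram-Schmidt."""
    basis = []
    for col in columns:
        v = list(col)
        for q in basis:
            # Remove the component of v that lies along the basis vector q.
            proj = sum(a * b for a, b in zip(q, v))
            v = [a - proj * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        # eps keeps the division finite for (near-)zero columns.
        basis.append([a / (norm + eps) for a in v])
    return basis


# Two columns of a 2 x 2 matrix (rank 2).
q1, q2 = gram_schmidt([[3.0, 4.0], [1.0, 0.0]])
dot = sum(a * b for a, b in zip(q1, q2))
```

The loop is sequential over columns, so its cost grows with the rank, whereas a QR factorization handles all columns in one call.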

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72043

Reviewed By: albanD

Differential Revision: D34042781

Pulled By: cbalioglu

fbshipit-source-id: e331179d3b7ac40d445b651fc473b16ae4ead462
(cherry picked from commit f64bf3839a)
2022-02-07 21:15:40 +00:00
David Berard
0b57bd4c66 [JIT] Opinfo tests for nnc fusion (#70465)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70465

These tests check that
(a) the result after nnc fusion (of a single op) is the same as the
unfused op, and
(b) for certain ops where fusion is expected to occur, fusion does
actually occur

Test Plan: Imported from OSS

Reviewed By: wenleix

Differential Revision: D33595240

Pulled By: davidberard98

fbshipit-source-id: e2e17a921bc30c313e92e8e5bbc6c1b5fcd14bc1
(cherry picked from commit b1ba221acc)
2022-02-07 20:56:21 +00:00
vfdev-5
8315c9b885 Added missing antialias argument to functional.pyi.in (#72420)
Summary:
Description:
- Added missing antialias argument to functional.pyi.in
- mypy is happy when checking the `interpolate` method with the antialias argument

Related torchvision issue: https://github.com/pytorch/vision/pull/5329

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72420

Reviewed By: mruberry

Differential Revision: D34043182

Pulled By: albanD

fbshipit-source-id: ca64a8f0d2516c1be5b060c1c24e0b1ed2127b96
(cherry picked from commit 7c8a90cbfa)
2022-02-07 20:44:59 +00:00
Alban Desmaison
1bad3c4a84 remove some spurious warnings fixing (#72352)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/70389

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72352

Reviewed By: jbschlosser

Differential Revision: D34011981

Pulled By: albanD

fbshipit-source-id: 55bedc8a40929bc5b49cb6d7d7d51a3750f2ff27
(cherry picked from commit a6657a9071)
2022-02-07 19:53:14 +00:00
Raghavan Raman
ff71429906 [nnc] Add stride args while running with allocated outputs (#72223)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72223

ghstack-source-id: 148494871

Test Plan:
```
buck test mode/opt //caffe2/test/cpp/tensorexpr:tensorexpr -- --exact 'caffe2/test/cpp/tensorexpr:tensorexpr - DynamicShapes.GraphWithSymbolicStrides'
```

Reviewed By: eellison

Differential Revision: D33960592

fbshipit-source-id: 6334978d5e3713889b4ad12bcd8ed8c69df39d58
(cherry picked from commit 95cc102bc2)
2022-02-07 19:24:56 +00:00
Chien-Chin Huang
224093db11 [FSDP] Add FlatParameter to track the information of a flat parameter (#69241)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69241

Implement FlatParameter to track the information of a flat parameter, including the sharding information.

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D32432503

fbshipit-source-id: b4aabba6cef29e825b45869895709c79e69c211d
(cherry picked from commit 0e5505f70b)
2022-02-07 18:51:17 +00:00
Shijun Kong
09e2fb8f6e Make LinearPackedParams works with both torchscript and torch.package (#71656)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71656

The customized `__getstate__`/`__setstate__` didn't call super (torch.nn.Module), so attributes (e.g. `_modules`) were not restored after being serialized and deserialized via torch.package.

After a few iterations, it turns out that packing/unpacking linear params is already supported in the torchbind class, so there is no need to hack the torch module anymore.
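
The failure mode can be reproduced with plain Python classes. This is a minimal sketch under stated assumptions: `Base` stands in for torch.nn.Module, `Broken` and `Fixed` are hypothetical classes, and `copy.deepcopy` is used because it goes through the same `__getstate__`/`__setstate__` protocol as pickle.

```python
import copy


class Base:
    """Stand-in for torch.nn.Module: owns state that subclasses must preserve."""

    def __init__(self):
        self._modules = {"child": "placeholder"}

    def __getstate__(self):
        return dict(self.__dict__)

    def __setstate__(self, state):
        self.__dict__.update(state)


class Broken(Base):
    def __init__(self, weight):
        super().__init__()
        self.weight = weight

    def __getstate__(self):
        # Only the subclass attribute is saved; Base state is dropped.
        return {"weight": self.weight}

    def __setstate__(self, state):
        self.weight = state["weight"]


class Fixed(Base):
    def __init__(self, weight):
        super().__init__()
        self.weight = weight

    def __getstate__(self):
        # Start from the parent state so that _modules is included.
        return super().__getstate__()

    def __setstate__(self, state):
        super().__setstate__(state)


broken_copy = copy.deepcopy(Broken(1.5))
fixed_copy = copy.deepcopy(Fixed(1.5))
```

After the round trip, `broken_copy` has `weight` but no `_modules` attribute, while `fixed_copy` keeps both.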

Test Plan: `buck test caffe2/test/:quantization -- test_linear_api`

Reviewed By: jerryzh168

Differential Revision: D33711086

fbshipit-source-id: 3a36d10c64b7da414d3657d2ef766bb9a9290ea9
(cherry picked from commit 6337b6c207)
2022-02-07 18:39:28 +00:00
Nikita Shulga
717d8c6224 [BE] Fix pybind deprecation warnings (#72376)
Summary:
Fixes:
```
../torch/csrc/autograd/python_variable.cpp:1798:33: warning: ‘bool pybind11::handle::operator==(const pybind11::handle&) const’ is deprecated: Use obj1.is(obj2) instead [-Wdeprecated-declarations]
     TORCH_CHECK(out == py::none(), "Expected __torch_dispatch__ for ", op.operator_name(),
```
and
```
../torch/csrc/jit/python/python_list.cpp:254:57: warning: ‘pybind11::object::object(pybind11::handle, bool)’ is deprecated: Use reinterpret_borrow<object>() or reinterpret_steal<object>() [-Wdeprecated-declarations]
                     py::object(obj, /*is_borrowed*/ true),
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72376

Reviewed By: albanD

Differential Revision: D34021328

Pulled By: malfet

fbshipit-source-id: 72906077db9031311c6b0ae4c65eb79df9c514d4
(cherry picked from commit e1877ca268)
2022-02-07 18:33:32 +00:00
Richard Barnes
5da6de5dc2 Fix unused variable warnings (#72410)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72410

Fixes
```
caffe2/caffe2/operators/cross_entropy_op.cu(330): warning: parameter "outer_size" was declared but never referenced

caffe2/caffe2/operators/cross_entropy_op.cu(191): warning: parameter "outer_size" was declared but never referenced
caffe2/caffe2/operators/generate_proposals_op_util_nms.h(347): warning: variable "order" was declared but never referenced
caffe2/caffe2/operators/segment_reduction_op_gpu.cu(319): warning: parameter "N" was declared but never referenced
          detected during:
            instantiation of "__nv_bool caffe2::CUDASparseLengthsWeightedSumOp<T, Context, SparseFused>::DoRunWithType<IndexType>() [with T=float, Context=caffe2::CUDAContext, SparseFused=true, IndexType=int32_t]"
caffe2/caffe2/core/operator.h(1304): here
            instantiation of "__nv_bool caffe2::DispatchHelper<caffe2::TensorTypes<FirstType, Types...>, ExtraArgs...>::call(Op *, caffe2::TypeMeta) [with FirstType=int32_t, Types=<int64_t>, ExtraArgs=<>, Op=caffe2::CUDASparseLengthsWeightedSumOp<float, caffe2::CUDAContext, true>]"
caffe2/caffe2/core/operator.h(1304): here
            instantiation of "__nv_bool caffe2::DispatchHelper<caffe2::TensorTypes<FirstType, Types...>, ExtraArgs...>::call(Op *, const caffe2::Tensor &) [with FirstType=int32_t, Types=<int64_t>, ExtraArgs=<>, Op=caffe2::CUDASparseLengthsWeightedSumOp<float, caffe2::CUDAContext, true>]"
(786): here
caffe2/caffe2/operators/segment_reduction_op_gpu.cu(96): warning: parameter "len_length" was declared but never referenced
          detected during:
            instantiation of "__nv_bool caffe2::CUDASparseLengthsSumGradientWithIndicesOp<T, Context>::RunOnDevice() [with T=float, Context=caffe2::CUDAContext]"
(1296): here
caffe2/caffe2/sgd/adagrad_fused_op_gpu.cu(1226): warning: variable "N" was declared but never referenced
          detected during:
            instantiation of "__nv_bool caffe2::DispatchHelper<caffe2::TensorTypes2<FirstType, Types...>, ExtraArgs...>::call(Op *, caffe2::TypeMeta) [with FirstType=float, Types=<c10::Half>, ExtraArgs=<int32_t>, Op=caffe2::CUDARowWiseSparseAdagradFusedWithSparseLengthsSumGradientExactOp<float, int, false, caffe2::CUDAContext>]"
caffe2/caffe2/sgd/adagrad_fused_op_gpu.cu(259): warning: parameter "indices" was declared but never referenced
          detected during:
            instantiation of "__nv_bool caffe2::CUDARowWiseSparseAdagradFusedWithSparseLengthsSumGradientExactOp<T, TLengths, is_mean, Context>::DoRunWithType2<IndexType,TParam>() [with T=float, TLengths=int, is_mean=false, Context=caffe2::CUDAContext, IndexType=int32_t, TParam=float]"
caffe2/caffe2/core/operator.h(1308): here
caffe2/caffe2/operators/piecewise_linear_transform_op.cu(15): warning: parameter "num_grp" was declared but never referenced

caffe2/caffe2/operators/piecewise_linear_transform_op.cu(50): warning: parameter "M" was declared but never referenced

caffe2/caffe2/operators/piecewise_linear_transform_op.cu(51): warning: parameter "num_grp" was declared but never referenced

caffe2/caffe2/operators/piecewise_linear_transform_op.cu(78): warning: parameter "num_grp" was declared but never referenced
```

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D34034404

fbshipit-source-id: b834088d6a3e204e94bbffe3ac6fdccf9d0176b8
(cherry picked from commit 0148d0de04)
2022-02-07 18:25:29 +00:00
Richard Barnes
805dff354e Avoid type qualifier specified more than once (#72411)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72411

Fixes
```

caffe2/caffe2/operators/max_pool_with_index.cu(16): warning: type qualifier specified more than once

caffe2/caffe2/operators/max_pool_with_index.cu(28): warning: type qualifier specified more than once

caffe2/caffe2/operators/max_pool_with_index.cu(61): warning: type qualifier specified more than once

caffe2/caffe2/operators/max_pool_with_index.cu(62): warning: type qualifier specified more than once

caffe2/caffe2/operators/max_pool_with_index.cu(74): warning: type qualifier specified more than once
```

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D34034382

fbshipit-source-id: 2b73c55358632090baf673b32b800656ae874040
(cherry picked from commit ab3f3f9a79)
2022-02-07 18:25:29 +00:00
Richard Barnes
2b702b43c5 Fix unused variable warning (#72412)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72412

Fixes:
```
caffe2/aten/src/ATen/cuda/CUDAApplyUtils.cuh(328): warning: parameter "n" was declared but never referenced
          detected during:
            instantiation of "void at::cuda::<unnamed>::ApplyOp2<Op, scalar1, scalar2, IndexType, ADims, BDims, remaining_steps, Offsets...>::apply(at::cuda::detail::TensorInfo<scalar1, IndexType> &, at::cuda::detail::TensorInfo<scalar2, IndexType> &, const Op &, int64_t, IndexType, Offsets..., Offsets...) [with Op=lambda [](double &, const double &)->void, scalar1=double, scalar2=double, IndexType=unsigned int, ADims=1, BDims=1, remaining_steps=1, Offsets=<>]"
(370): here
            instantiation of "void at::cuda::<unnamed>::kernelPointwiseApply2<Op,scalar1,scalar2,IndexType,ADims,BDims,step,max_threads_per_block,min_blocks_per_sm>(at::cuda::detail::TensorInfo<scalar1, IndexType>, at::cuda::detail::TensorInfo<scalar2, IndexType>, IndexType, Op) [with Op=lambda [](double &, const double &)->void, scalar1=double, scalar2=double, IndexType=unsigned int, ADims=1, BDims=1, step=1, max_threads_per_block=512, min_blocks_per_sm=2]"
(487): here
            instantiation of "__nv_bool at::cuda::CUDA_tensor_apply2<scalar1,scalar2,step,Op,max_threads_per_block,min_blocks_per_sm>(at::Tensor, at::Tensor, Op, at::cuda::TensorArgType, at::cuda::TensorArgType) [with scalar1=double, scalar2=double, step=1, Op=lambda [](double &, const double &)->void, max_threads_per_block=512, min_blocks_per_sm=2]"
(533): here
            instantiation of "__nv_bool at::cuda::CUDA_tensor_apply2<scalar1,scalar2,Op,max_threads_per_block,min_blocks_per_sm>(at::Tensor, at::Tensor, Op, at::cuda::TensorArgType, at::cuda::TensorArgType) [with scalar1=double, scalar2=double, Op=lambda [](double &, const double &)->void, max_threads_per_block=512, min_blocks_per_sm=2]"
caffe2/aten/src/ATen/native/cuda/Distributions.cu(60): here
            instantiation of "void <unnamed>::poisson_cuda_kernel<scalar_t>(at::Tensor &, const at::Tensor &, at::PhiloxCudaState) [with scalar_t=double]"
caffe2/aten/src/ATen/native/cuda/Distributions.cu(169): here

caffe2/aten/src/ATen/cuda/CUDAApplyUtils.cuh(328): warning: parameter "linearIndex" was declared but never referenced
          detected during:
            instantiation of "void at::cuda::<unnamed>::ApplyOp2<Op, scalar1, scalar2, IndexType, ADims, BDims, remaining_steps, Offsets...>::apply(at::cuda::detail::TensorInfo<scalar1, IndexType> &, at::cuda::detail::TensorInfo<scalar2, IndexType> &, const Op &, int64_t, IndexType, Offsets..., Offsets...) [with Op=lambda [](double &, const double &)->void, scalar1=double, scalar2=double, IndexType=unsigned int, ADims=1, BDims=1, remaining_steps=1, Offsets=<>]"
(370): here
            instantiation of "void at::cuda::<unnamed>::kernelPointwiseApply2<Op,scalar1,scalar2,IndexType,ADims,BDims,step,max_threads_per_block,min_blocks_per_sm>(at::cuda::detail::TensorInfo<scalar1, IndexType>, at::cuda::detail::TensorInfo<scalar2, IndexType>, IndexType, Op) [with Op=lambda [](double &, const double &)->void, scalar1=double, scalar2=double, IndexType=unsigned int, ADims=1, BDims=1, step=1, max_threads_per_block=512, min_blocks_per_sm=2]"
(487): here
            instantiation of "__nv_bool at::cuda::CUDA_tensor_apply2<scalar1,scalar2,step,Op,max_threads_per_block,min_blocks_per_sm>(at::Tensor, at::Tensor, Op, at::cuda::TensorArgType, at::cuda::TensorArgType) [with scalar1=double, scalar2=double, step=1, Op=lambda [](double &, const double &)->void, max_threads_per_block=512, min_blocks_per_sm=2]"
(533): here
            instantiation of "__nv_bool at::cuda::CUDA_tensor_apply2<scalar1,scalar2,Op,max_threads_per_block,min_blocks_per_sm>(at::Tensor, at::Tensor, Op, at::cuda::TensorArgType, at::cuda::TensorArgType) [with scalar1=double, scalar2=double, Op=lambda [](double &, const double &)->void, max_threads_per_block=512, min_blocks_per_sm=2]"
caffe2/aten/src/ATen/native/cuda/Distributions.cu(60): here
            instantiation of "void <unnamed>::poisson_cuda_kernel<scalar_t>(at::Tensor &, const at::Tensor &, at::PhiloxCudaState) [with scalar_t=double]"
caffe2/aten/src/ATen/native/cuda/Distributions.cu(169): here
caffe2/aten/src/ATen/cuda/CUDAApplyUtils.cuh(328): warning: parameter "linearIndex" was declared but never referenced
          detected during:
            instantiation of "void at::cuda::<unnamed>::ApplyOp2<Op, scalar1, scalar2, IndexType, ADims, BDims, remaining_steps, Offsets...>::apply(at::cuda::detail::TensorInfo<scalar1, IndexType> &, at::cuda::detail::TensorInfo<scalar2, IndexType> &, const Op &, int64_t, IndexType, Offsets..., Offsets...) [with Op=lambda [](double &, const double &)->void, scalar1=double, scalar2=double, IndexType=unsigned int, ADims=1, BDims=1, remaining_steps=1, Offsets=<>]"
(370): here
            instantiation of "void at::cuda::<unnamed>::kernelPointwiseApply2<Op,scalar1,scalar2,IndexType,ADims,BDims,step,max_threads_per_block,min_blocks_per_sm>(at::cuda::detail::TensorInfo<scalar1, IndexType>, at::cuda::detail::TensorInfo<scalar2, IndexType>, IndexType, Op) [with Op=lambda [](double &, const double &)->void, scalar1=double, scalar2=double, IndexType=unsigned int, ADims=1, BDims=1, step=1, max_threads_per_block=512, min_blocks_per_sm=2]"
(487): here
            instantiation of "__nv_bool at::cuda::CUDA_tensor_apply2<scalar1,scalar2,step,Op,max_threads_per_block,min_blocks_per_sm>(at::Tensor, at::Tensor, Op, at::cuda::TensorArgType, at::cuda::TensorArgType) [with scalar1=double, scalar2=double, step=1, Op=lambda [](double &, const double &)->void, max_threads_per_block=512, min_blocks_per_sm=2]"
(533): here
            instantiation of "__nv_bool at::cuda::CUDA_tensor_apply2<scalar1,scalar2,Op,max_threads_per_block,min_blocks_per_sm>(at::Tensor, at::Tensor, Op, at::cuda::TensorArgType, at::cuda::TensorArgType) [with scalar1=double, scalar2=double, Op=lambda [](double &, const double &)->void, max_threads_per_block=512, min_blocks_per_sm=2]"
caffe2/aten/src/ATen/native/cuda/Distributions.cu(60): here
            instantiation of "void <unnamed>::poisson_cuda_kernel<scalar_t>(at::Tensor &, const at::Tensor &, at::PhiloxCudaState) [with scalar_t=double]"
caffe2/aten/src/ATen/native/cuda/Distributions.cu(169): here

caffe2/aten/src/ATen/cuda/CUDAApplyUtils.cuh(328): warning: parameter "n" was declared but never referenced
          detected during:
            instantiation of "void at::cuda::<unnamed>::ApplyOp2<Op, scalar1, scalar2, IndexType, ADims, BDims, remaining_steps, Offsets...>::apply(at::cuda::detail::TensorInfo<scalar1, IndexType> &, at::cuda::detail::TensorInfo<scalar2, IndexType> &, const Op &, int64_t, IndexType, Offsets..., Offsets...) [with Op=lambda [](double &, const double &)->void, scalar1=double, scalar2=double, IndexType=unsigned int, ADims=1, BDims=1, remaining_steps=1, Offsets=<>]"
(370): here
            instantiation of "void at::cuda::<unnamed>::kernelPointwiseApply2<Op,scalar1,scalar2,IndexType,ADims,BDims,step,max_threads_per_block,min_blocks_per_sm>(at::cuda::detail::TensorInfo<scalar1, IndexType>, at::cuda::detail::TensorInfo<scalar2, IndexType>, IndexType, Op) [with Op=lambda [](double &, const double &)->void, scalar1=double, scalar2=double, IndexType=unsigned int, ADims=1, BDims=1, step=1, max_threads_per_block=512, min_blocks_per_sm=2]"
(487): here
            instantiation of "__nv_bool at::cuda::CUDA_tensor_apply2<scalar1,scalar2,step,Op,max_threads_per_block,min_blocks_per_sm>(at::Tensor, at::Tensor, Op, at::cuda::TensorArgType, at::cuda::TensorArgType) [with scalar1=double, scalar2=double, step=1, Op=lambda [](double &, const double &)->void, max_threads_per_block=512, min_blocks_per_sm=2]"
(533): here
            instantiation of "__nv_bool at::cuda::CUDA_tensor_apply2<scalar1,scalar2,Op,max_threads_per_block,min_blocks_per_sm>(at::Tensor, at::Tensor, Op, at::cuda::TensorArgType, at::cuda::TensorArgType) [with scalar1=double, scalar2=double, Op=lambda [](double &, const double &)->void, max_threads_per_block=512, min_blocks_per_sm=2]"
caffe2/aten/src/ATen/native/cuda/Distributions.cu(60): here
            instantiation of "void <unnamed>::poisson_cuda_kernel<scalar_t>(at::Tensor &, const at::Tensor &, at::PhiloxCudaState) [with scalar_t=double]"
caffe2/aten/src/ATen/native/cuda/Distributions.cu(169): here

caffe2/aten/src/ATen/cuda/CUDAApplyUtils.cuh(328): warning: parameter "linearIndex" was declared but never referenced
          detected during:
            instantiation of "void at::cuda::<unnamed>::ApplyOp2<Op, scalar1, scalar2, IndexType, ADims, BDims, remaining_steps, Offsets...>::apply(at::cuda::detail::TensorInfo<scalar1, IndexType> &, at::cuda::detail::TensorInfo<scalar2, IndexType> &, const Op &, int64_t, IndexType, Offsets..., Offsets...) [with Op=lambda [](double &, const double &)->void, scalar1=double, scalar2=double, IndexType=unsigned int, ADims=1, BDims=1, remaining_steps=1, Offsets=<>]"
(370): here
            instantiation of "void at::cuda::<unnamed>::kernelPointwiseApply2<Op,scalar1,scalar2,IndexType,ADims,BDims,step,max_threads_per_block,min_blocks_per_sm>(at::cuda::detail::TensorInfo<scalar1, IndexType>, at::cuda::detail::TensorInfo<scalar2, IndexType>, IndexType, Op) [with Op=lambda [](double &, const double &)->void, scalar1=double, scalar2=double, IndexType=unsigned int, ADims=1, BDims=1, step=1, max_threads_per_block=512, min_blocks_per_sm=2]"
(487): here
            instantiation of "__nv_bool at::cuda::CUDA_tensor_apply2<scalar1,scalar2,step,Op,max_threads_per_block,min_blocks_per_sm>(at::Tensor, at::Tensor, Op, at::cuda::TensorArgType, at::cuda::TensorArgType) [with scalar1=double, scalar2=double, step=1, Op=lambda [](double &, const double &)->void, max_threads_per_block=512, min_blocks_per_sm=2]"
(533): here
            instantiation of "__nv_bool at::cuda::CUDA_tensor_apply2<scalar1,scalar2,Op,max_threads_per_block,min_blocks_per_sm>(at::Tensor, at::Tensor, Op, at::cuda::TensorArgType, at::cuda::TensorArgType) [with scalar1=double, scalar2=double, Op=lambda [](double &, const double &)->void, max_threads_per_block=512, min_blocks_per_sm=2]"
caffe2/aten/src/ATen/native/cuda/Distributions.cu(60): here
            instantiation of "void <unnamed>::poisson_cuda_kernel<scalar_t>(at::Tensor &, const at::Tensor &, at::PhiloxCudaState) [with scalar_t=double]"
caffe2/aten/src/ATen/native/cuda/Distributions.cu(169): here
```

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D34034374

fbshipit-source-id: c92f0374eb5c821e1a67c2b8122c0791ed0809d4
(cherry picked from commit 66f5f96371)
2022-02-07 18:25:29 +00:00
Andrew Gu
b047963983 [PT-D][BE] Fix DDP no_sync() test logic (#72348)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72348

**Overview**
#43307 changed `_test_accumulate_gradients_no_sync()` to add a `num_iters` argument. However, I think the change misconstrued the test logic slightly.

61ab04e1db/torch/testing/_internal/distributed/distributed_test.py (L4369-L4397)

- `iteration % num_iters == 0` evaluates to `True` only for `iteration == 0` since `iteration` comes from `for iteration in range(num_iters)`.
- IIUC, the intention is to alternate between accumulating gradients (using `no_sync()`) and synchronizing gradients normally. In the existing implementation, any iterations following the second one are non-productive since gradients are in sync, meaning it reduces to testing normal DDP.
- This PR changes the check back to `iteration % 2 == 0` to restore the alternating behavior.
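The difference between the two checks can be seen with a minimal sketch in plain Python. The helper name `check_schedule` and the value `num_iters = 6` are assumptions made for illustration; the real test lives in `distributed_test.py` and is not reproduced here.

```python
def check_schedule(num_iters, period):
    """Evaluate `iteration % period == 0` for every iteration of the loop."""
    return [iteration % period == 0 for iteration in range(num_iters)]


num_iters = 6  # illustrative value only

# Check with `num_iters` as the modulus: true on the first iteration only.
old_check = check_schedule(num_iters, num_iters)

# Check with 2 as the modulus: true on every other iteration.
new_check = check_schedule(num_iters, 2)

print(old_check)
print(new_check)
```

With the modulus set to `num_iters`, the condition holds exactly once, so the two code paths never alternate after the start of the loop.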

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D34011559

Pulled By: awgu

fbshipit-source-id: 4ba771e45b28a343167a324462571e4b8e25ae72
(cherry picked from commit 8492a8b803)
2022-02-07 18:05:19 +00:00
Nikita Shulga
133461e5d6 Move CUDA linalg code to its own subfolder (#72304)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72304

This is a no-op change that simply moves files around in preparation for moving linear algebra into its own dynamically boundable module.
This also simplifies the torch_cuda_cu build rules, as all the linalg files it needs are now in their own folder.
The Bazel CUDA rules are in some disarray (a wildcard had to be added there because it ignores files mentioned in build_variables.so), and a similar wildcard needs to be added to the internal build system.

Test Plan: Imported from OSS

Reviewed By: dagitses, ngimel

Differential Revision: D33992796

Pulled By: malfet

fbshipit-source-id: 3f4fa1c224016d03e1a982a7ae5ac7807bc772e2
(cherry picked from commit 6a5a1b0c3f)
2022-02-07 17:55:50 +00:00
Jane Xu
d8c3ab11ae Fix BC by adding aten::_native_multi_head_self_attention (#72429)
Summary:
Forward fixes https://hud2.pytorch.org/minihud?name_filter=linux-xenial-py3.7-gcc5.4%20/%20test%20(backwards_compat,%201,%201,%20linux.2xlarge)
```
The PR is introducing backward incompatible changes to the operator library. Please contact PyTorch team to confirm whether this change is wanted or not.

Broken ops: [
	aten::_native_multi_head_self_attention(Tensor query, Tensor qkv_weight, Tensor qkv_bias, Tensor proj_weight, Tensor proj_bias, Tensor? mask=None) -> (Tensor)
]
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72429

Reviewed By: albanD

Differential Revision: D34043480

Pulled By: janeyx99

fbshipit-source-id: 7db8c682c7d5c3bd911a87d21670b5bd2f3ad5a1
(cherry picked from commit 0985ebb7f1)
2022-02-07 17:31:57 +00:00
Louis Feng
83b3b5fb00 [PyTorch] Support NVTX range_start and range_end (#70030)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70030

range_push and range_pop do not work across threads: a range must be pushed and popped on the same thread.

For process-level ranges, we should use range_start and range_end. This is important because the PyTorch forward pass runs on one thread, while autograd runs on a different thread.

See the NVIDIA implementation documentation:
cab2dec760/NSight/nvToolsExt.h (L397-L407)
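The distinction can be illustrated with a toy model in plain Python. This is not the NVTX or PyTorch API: `ToyRangeTracker` and everything inside it are hypothetical, and it only mimics the idea that push/pop ranges live on a per-thread stack while start/end ranges are identified by a handle that any thread can close.

```python
import itertools
import threading


class ToyRangeTracker:
    """Hypothetical model of per-thread vs. process-wide ranges."""

    def __init__(self):
        self._local = threading.local()  # per-thread stack for push/pop
        self._open = {}                  # process-wide table for start/end
        self._ids = itertools.count(1)
        self._lock = threading.Lock()

    def range_push(self, msg):
        stack = getattr(self._local, "stack", None)
        if stack is None:
            stack = self._local.stack = []
        stack.append(msg)

    def range_pop(self):
        # Returns None when nothing was pushed on the calling thread.
        stack = getattr(self._local, "stack", [])
        return stack.pop() if stack else None

    def range_start(self, msg):
        with self._lock:
            handle = next(self._ids)
            self._open[handle] = msg
        return handle

    def range_end(self, handle):
        with self._lock:
            return self._open.pop(handle, None)


tracker = ToyRangeTracker()
results = {}

# "Forward" thread opens one range of each kind.
tracker.range_push("forward")
handle = tracker.range_start("step")


def backward_thread():
    # A different thread cannot see the pushed range, but can end the started one.
    results["pop"] = tracker.range_pop()
    results["end"] = tracker.range_end(handle)


worker = threading.Thread(target=backward_thread)
worker.start()
worker.join()
```

In this model the pop on the second thread finds nothing, whereas the handle returned by `range_start` closes the range from any thread.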

Test Plan:
```
buck test caffe2/test:cuda

Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/8162774391483460
    ✓ ListingSuccess: caffe2/test:cuda - main (19.640)
Summary
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/8162774391483460
```

Reviewed By: malfet

Differential Revision: D33155244

fbshipit-source-id: c7d5143f6da9b6ef0e0811e2fcae03a3e76f24de
(cherry picked from commit 22134e91b7)
2022-02-07 17:31:57 +00:00
Alban Desmaison
9f9b9c48e5 Tensorimpl cleanup try 2 (#72336)
Summary:
This reverts the previous PR and adds some comments to make the intent clear.
It also removes some extra static_asserts that are not needed (at least for the compilers I tried).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72336

Reviewed By: r-barnes

Differential Revision: D34006722

Pulled By: albanD

fbshipit-source-id: 290fb89a2d2c66a0d1c3651198b31d21216ec230
(cherry picked from commit 76f0aaa765)
2022-02-07 17:31:57 +00:00
anjali411
9d8f0c7842 Add ZT fastpath for torch.{dot, vdot} (#71129)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71129

cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D34012577

Pulled By: anjali411

fbshipit-source-id: 02d2f2d761f7c9332e2f3cc529e8f1c6b60d7da2
(cherry picked from commit 87318a2e0d)
2022-02-07 17:31:57 +00:00
albanD
4e98a4b6e3 Update release note bot to actually ping people
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72372
2022-02-07 17:08:42 +00:00
Jane Xu
a004f13567 Pin librosa
Should mitigate https://github.com/pytorch/pytorch/issues/72432
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72433
2022-02-07 17:01:01 +00:00
Kelly Stanton
9ac28cbfd2 Added prod op to FX2TRT (#72284)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72284

This update adds the prod op to the fx2trt tool which is used to create a TensorRT engine for a PyTorch model.

Test Plan:
A new unit test was added to check that the op was added to the acc tracer. This test can be run using the following command: buck test --debug //caffe2/test:test_fx_acc_tracer -- --exact 'caffe2/test:test_fx_acc_tracer - test_prod (fx_acc.test_acc_tracer.AccTracerTest)'

A new suite of unit tests was also added for the conversion to TensorRT and can be run using the following command: buck test mode/dev-nosan //caffe2/test/fx2trt/converters:test_prod

Please note that, unlike other PyTorch reduce ops such as sum, the PyTorch prod function does not support reducing more than one dimension at a time (the dim arg cannot be a tuple; only a single int is accepted). Therefore prod cannot use all of the reduce_op code.
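As an illustration of the single-dimension restriction, here is a sketch of how a caller could still reduce several dimensions by chaining single-dimension reductions. It uses nested Python lists as a stand-in for tensors; the helpers `prod_single_dim` and `prod_multi_dim` are hypothetical and are not part of this change or of the fx2trt converter.

```python
from functools import reduce


def _mul(a, b):
    """Elementwise product of two equally shaped nested lists (or scalars)."""
    if isinstance(a, list):
        return [_mul(ai, bi) for ai, bi in zip(a, b)]
    return a * b


def prod_single_dim(x, dim):
    """Product-reduce a nested list along exactly one dimension."""
    if dim == 0:
        return reduce(_mul, x)
    return [prod_single_dim(sub, dim - 1) for sub in x]


def prod_multi_dim(x, dims):
    """Reduce several dimensions by repeated single-dimension reductions."""
    # Reduce the highest dimension first so the remaining indices stay valid.
    for d in sorted(dims, reverse=True):
        x = prod_single_dim(x, d)
    return x


x = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]  # shape (2, 2, 2)
reduced = prod_multi_dim(x, (0, 2))
print(reduced)
```

Reducing from the highest dimension down avoids having to renumber the dimensions after each step.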

https://pxl.cl/1Xpn8

https://pxl.cl/1Xpn9

Reviewed By: 842974287

Differential Revision: D33875336

fbshipit-source-id: f9340db3685d681b1cf4ffc3b9fd25d16914e231
(cherry picked from commit cfe48d3737)
2022-02-07 16:56:35 +00:00
Peter Bell
bebf8dd543 Define TORCH_ASSERT_ONLY_METHOD_OPERATORS in ATen/core (#72344)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72344

ATen core is mostly compliant already so we can just add the flag to
the build system. The only exception is interned string which includes
symbols like `aten::add` generated for each operator.

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D34010820

Pulled By: albanD

fbshipit-source-id: ef1a625d96f30457b5e6beffc5e630516e54f9b4
(cherry picked from commit b90c262a92)
2022-02-07 15:48:56 +00:00
Vasiliy Kuznetsov
998a5adf8a dbr quant function fusion [2/x]: use fusion for observation and inference (#71781)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71781

The previous PR added information about fusions found in the subgraphs.

This PR uses that information for:
1. inserting observers at the end of fusions and not in the middle
2. during inference, replacing the original op with the fused op. The
way this is implemented is that the base op is replaced with the fused op,
and all other ops are replaced with identity functions.
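The inference-time replacement in item 2 can be sketched with a toy example in plain Python. All names here (`add_relu`, `FUSED_FOR`, `rewrite_for_inference`, `run`) are hypothetical and operate on scalars; this is not the DBR quantization code, only a model of "base op becomes the fused op, the rest become identity".

```python
import operator


def relu(x):
    return max(x, 0)


def add_relu(x, y):
    """Hypothetical fused op: add followed by relu."""
    return relu(x + y)


def identity(x):
    return x


# Hypothetical lookup from a base op to its fused replacement.
FUSED_FOR = {operator.add: add_relu}


def rewrite_for_inference(ops, fusion_indices):
    """Swap the base op for the fused op and the other fused ops for identity."""
    base, *rest = fusion_indices
    rewritten = list(ops)
    rewritten[base] = FUSED_FOR[ops[base]]
    for i in rest:
        rewritten[i] = identity
    return rewritten


def run(ops, x, y):
    """Run a linear chain: the first op is binary, the rest are unary."""
    result = ops[0](x, y)
    for op in ops[1:]:
        result = op(result)
    return result


original = [operator.add, relu]
rewritten = rewrite_for_inference(original, [0, 1])
```

Because the trailing ops become identity functions, the rewritten chain keeps the same length and call structure as the original while computing the same result.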

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR.test_fusion_functions
```

Reviewed By: jerryzh168

Differential Revision: D33775097

Pulled By: vkuzo

fbshipit-source-id: 12249b85b2f7ba7545a54872aeb5f1ff2fc928cf
(cherry picked from commit 0db4324ea9)
2022-02-07 14:00:26 +00:00
Vasiliy Kuznetsov
d672bbd0a9 fx quant: add fusion matching for operator.add and torch.relu (#71780)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71780

Adds support for matching operator.add -> torch.relu in FX graph
mode quantization.

It would be nice to support torch.relu better in general, but
saving that for a future PR to keep PRs small.

This is useful for DBR quant because we have some test cases in DBR
quant which use add-relu, and we'd like to match them to FX.

Test Plan:
```
python test/test_quantization.py TestQuantizeFxOps.test_add_relu
python test/test_quantization.py TestQuantizeFxOps.test_mul_relu
```

Reviewed By: jerryzh168

Differential Revision: D33775096

Pulled By: vkuzo

fbshipit-source-id: 889d9b41d3758ecbbb6d7eab67f64ce3d4892d24
(cherry picked from commit c1f9f38ca1)
2022-02-07 14:00:26 +00:00
Vasiliy Kuznetsov
5937c48f4e dbr quant function fusion [1/x]: record matches for functions (#71764)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71764

For DBR quant, adds the code for matching seen ops to function fusion
patterns. Once we have the full DAG, a separate pass over the DAG adds
matched fusion patterns to the seen op data structure.

This is the first PR in the stack which implements matching and
recording the match results. Future PRs in this stack will use
the match results to modify observer insertion and inference.
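A minimal sketch of such a matching pass, in plain Python, is shown here. It simplifies the DAG to a linear list of ops in execution order, and the names `FUSION_PATTERNS` and `record_fusions` and the dict layout of a seen op are assumptions for illustration rather than the data structures used in this PR.

```python
# Hypothetical fusion patterns, written as sequences of op names.
FUSION_PATTERNS = [("add", "relu")]


def record_fusions(seen_ops, patterns):
    """Annotate each seen op with the indices of the fusion it belongs to."""
    i = 0
    while i < len(seen_ops):
        for pattern in patterns:
            window = tuple(op["name"] for op in seen_ops[i:i + len(pattern)])
            if window == pattern:
                indices = list(range(i, i + len(pattern)))
                for j in indices:
                    seen_ops[j]["fusion"] = indices
                # Skip past the ops that were just matched.
                i += len(pattern) - 1
                break
        i += 1
    return seen_ops


seen_ops = [
    {"name": "conv", "fusion": None},
    {"name": "add", "fusion": None},
    {"name": "relu", "fusion": None},
    {"name": "add", "fusion": None},
]
record_fusions(seen_ops, FUSION_PATTERNS)
```

The pass only records matches; it does not change any op, which mirrors the split between this PR and the later ones in the stack.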

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR.test_fusion_functions
```

Reviewed By: jerryzh168

Differential Revision: D33775098

Pulled By: vkuzo

fbshipit-source-id: 488aac902bf568d41c863ee49248990411ed9c53
(cherry picked from commit 4ad1ca1abc)
2022-02-07 14:00:26 +00:00