Summary:
I've added parsing of an optional first line under the `precomputed` keyword in native_functions.yaml for arguments that are precomputed without replacing an existing argument. This line is optional, must come first, and does not contain an arrow.
These new fields are computed in the meta function as before and added to the precompute struct it returns. For now I've put them as the last args of the impl function, where they can be reused.
Example:
native_functions.yaml:
```
...
precomputed:
- int numBatch, int numPlanes, int inputT, int inputH, int inputW  # <- new
- kernel_size -> int poolSizeT, int poolSizeH, int poolSizeW
- output_size -> int outputT, int outputH, int outputW
```
meta:
```
TORCH_PRECOMPUTE_META_FUNC(fractional_max_pool3d)(
const at::Tensor& input_,
IntArrayRef pool_size,
IntArrayRef output_size,
const at::Tensor& randomSamples
) {
...
return TORCH_PRECOMPUTE_STRUCT(fractional_max_pool3d)().set_numBatch(numBatch).set_numPlanes(numPlanes).set_inputT(inputT).set_inputH(inputH).set_inputW(inputW)
.set_poolSizeT(poolSizeT) ...
}
```
impl:
```
TORCH_IMPL_FUNC(fractional_max_pool3d_out_cpu)(
const at::Tensor& input_,
int64_t poolSizeT,
int64_t poolSizeH,
int64_t poolSizeW,
int64_t outputT,
int64_t outputH,
int64_t outputW,
const at::Tensor& randomSamples,
const at::Tensor& output,
const at::Tensor& indices,
int64_t numBatch, // <- for now I've put them here
int64_t numPlanes,
int64_t inputT,
int64_t inputH,
int64_t inputW) {
```
Fixes https://github.com/pytorch/pytorch/issues/71314
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71368
Reviewed By: zou3519
Differential Revision: D33683984
Pulled By: bdhirsh
fbshipit-source-id: 33066dd92b8743aadf0dc8102f6bf0689f843242
(cherry picked from commit 64e46af6a4)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71933
Add the functionality provided by split.py to splitter_base.
- Propagate submodule inputs
- Create SplitResult to hold the split results.
Then removed split.py; to me this makes navigating the lowering code a bit easier.
Added default split and trace functions for use.
Next step is to add better error handling for each stage during lowering and create unit tests for each stage. I'll probably make some bootcamp tasks for unit tests.
Test Plan: CI
Reviewed By: frank-wei, wushirong
Differential Revision: D33794322
fbshipit-source-id: f991893047a3701177f54cf22d9a6e48e0529472
(cherry picked from commit 1f3e13efba)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71102
This graph pass is causing a major perf regression on some models. Ideally we would introduce maybe_copy variants for all these ops. But since those are tricky to write, I've introduced a flag to just turn the pass off for now.
ghstack-source-id: 148541673
Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`
Reviewed By: navahgar
Differential Revision: D33510080
fbshipit-source-id: bb4847f26561197ea5e6bbad0a4d25db4ef468eb
(cherry picked from commit 8f333d3e81)
Summary:
Original commit changeset: 4ce347cb0f30
Original Phabricator Diff: D34043182 (8315c9b885)
Test Plan: It's a backout of a backout
Reviewed By: pbelevich, jaceyca
Differential Revision: D34060843
fbshipit-source-id: 6aaf62ce74330cbf142ab483b2a31eccba775ca9
(cherry picked from commit 046b1dbb72)
Summary:
Let's make `torch.sparse.sampled_addmm` searchable in the PyTorch documentation.
This PR shall be cherry-picked for the next 1.11 release.
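For reference, a minimal usage sketch of the op being documented (illustrative only; early versions may restrict the supported devices and layouts):
```
import torch

# sampled_addmm computes beta*input + alpha*(mat1 @ mat2), evaluated only at
# the sparsity pattern of the sparse CSR tensor `inp`
inp = torch.eye(3).to_sparse_csr()
mat1 = torch.randn(3, 5)
mat2 = torch.randn(5, 3)
out = torch.sparse.sampled_addmm(inp, mat1, mat2)
print(out.to_dense())
```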
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72312
Reviewed By: davidberard98
Differential Revision: D34045230
Pulled By: cpuhrsch
fbshipit-source-id: c1b1dc907443284857f48c8ce1efab22c6701bbe
(cherry picked from commit 225929ecf2)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72391
The temporary array can be reused within the loop; this saves memory reallocations and uninitialized_copy calls for the vector.
Test Plan: CI
Reviewed By: jspark1105
Differential Revision: D34030993
fbshipit-source-id: 40708e3144c6c8f8ac3a6a45d668b34b5e52e095
(cherry picked from commit 859e126aef)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71794
mvlgamma(inp, p) requires that all the elements of inp are > (p-1)/2.
The opinfo test was occasionally producing inputs with elements == (p-1)/2, which would generate errors like:
```
ERROR: test_nnc_correctness_mvlgamma_mvlgamma_p_5_cpu_bfloat16 (__main__.TestNNCOpInfoCPU)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/path/pytorch/torch/testing/_internal/common_device_type.py", line 381, in instantiated_test
raise rte
File "/path/pytorch/torch/testing/_internal/common_device_type.py", line 376, in instantiated_test
result = test(self, **param_kwargs)
File "/path/pytorch/torch/testing/_internal/common_device_type.py", line 753, in test_wrapper
return test(*args, **kwargs)
File "/path/pytorch/torch/testing/_internal/common_device_type.py", line 907, in only_fn
return fn(slf, *args, **kwargs)
File "/path/pytorch/test/test_jit_fuser_te.py", line 2293, in test_nnc_correctness
ref = variant(*clone_inputs((sample.input, *sample.args)), **sample.kwargs)
RuntimeError: All elements must be greater than (p-1)/2
```
repro example: https://gist.github.com/davidberard98/9da688e31cdfbaed7e990746b28a4ba2
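For context, a minimal sketch of the domain constraint (values here are made up):
```
import torch

p = 5
# mvlgamma requires every element to be strictly greater than (p - 1) / 2
x = torch.full((3,), (p - 1) / 2)  # boundary values: invalid input
# torch.mvlgamma(x, p)             # raises: All elements must be greater than (p-1)/2
torch.mvlgamma(x + 0.5, p)         # strictly inside the domain: fine
```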
Test Plan: Imported from OSS
Reviewed By: qihqi
Differential Revision: D33780905
Pulled By: davidberard98
fbshipit-source-id: c9afd443bc90ce68f33b97498921b447e4f7d1d8
(cherry picked from commit a974b03f07)
Summary:
We noticed that on M1 Macs, Transformer network profiles are dominated by scalar `exp` and `erff` functions (for softmax and GELU).
The NEON `Vectorized<float>` implementation does not use SLEEF functions in order to compile on mobile platforms. However, SLEEF is already compiled on macOS ARM64 and is safe to use there. This change adds another implementation of `Vectorized<float>` that uses SLEEF functions. This implementation is only used on macOS ARM64.
This change speeds up e.g. prediction of spaCy transformer models by 20% on M1 Macs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70354
Reviewed By: albanD
Differential Revision: D33659540
Pulled By: kimishpatel
fbshipit-source-id: b8f02a61321873fc60778190a005c466c7d0cc0c
(cherry picked from commit 71286a207c)
Summary:
This PR was opened as copy of https://github.com/pytorch/pytorch/pull/68812 by request https://github.com/pytorch/pytorch/pull/68812#issuecomment-1030215862.
-----
Fixes https://github.com/pytorch/pytorch/issues/67693.
Reference LAPACK (used in OpenBLAS) changed the `info` error code for svd when inputs contain non-finite numbers. In PyTorch, we raise an internal assert error for negative `info` error codes because usually that would indicate a bug in the implementation. However, this is no longer the case with SVD in newer versions of LAPACK. MKL (tried 2021.4.0) still gives a positive error code for this kind of input. This change aligns our code with both the OpenBLAS and MKL behavior.
MKL 2022 uses the latest reference LAPACK behavior and returns the same `info` as OpenBLAS 0.3.15+.
This PR also fixes https://github.com/pytorch/pytorch/issues/71645 that is due to the updated MKL version in CI.
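A rough repro sketch of the failure mode this fixes (exact error text varies by LAPACK backend):
```
import torch

a = torch.randn(3, 3)
a[0, 0] = float("nan")
try:
    torch.linalg.svd(a)
except Exception as e:
    # with this change, a non-finite input surfaces as a regular error
    # instead of tripping an internal assert on a negative `info` code
    print(e)
```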
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72357
Reviewed By: albanD
Differential Revision: D34012245
Pulled By: ngimel
fbshipit-source-id: 2b66c173cc3458d8c766b542d0d569191cdce310
(cherry picked from commit fa29e65611)
Summary:
`include_directories` is old-style CMake, which adds the include path to every file being compiled. This instead makes `python`, `numpy`, and `pybind11` into targets that only `torch_python` and `caffe2_pybind_state` are linked to. So, Python libraries can't be accidentally included elsewhere.
Resubmit of https://github.com/pytorch/pytorch/issues/65654; closes https://github.com/pytorch/pytorch/issues/65828
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69085
Reviewed By: anjali411
Differential Revision: D33776456
Pulled By: malfet
fbshipit-source-id: 018b0f6cd5a4f8c9e36df961deff832bc4afd479
(cherry picked from commit 57063107d6)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72425
Not sure how it worked before
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D34050484
Pulled By: malfet
fbshipit-source-id: 91e2660d4f4e3b8c04bddd07bac434fcba630c0f
(cherry picked from commit b652b25d39)
Summary:
### 🚀 The feature, motivation and pitch
Following the discussion in https://github.com/pytorch/pytorch/issues/65813, I added the QR factorization to powerSGD_hook.py
Gram-Schmidt orthogonalization can't be fully replaced because _torch.linalg.qr_ doesn't work with half precision. Moreover, in my tests, it is faster when the rank is less than 3.
This is one sample experiment timing powerSGD_hook on ResNext101 with the two different methods (timing figure not included here).
### Alternatives
Use _torch.orgqr(*torch.geqrf(matrix))_. From my tests, its performance is similar to _torch.linalg.qr_.
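A minimal sketch of the two orthogonalization routes (shapes are made up; half-precision inputs would still need the Gram-Schmidt path):
```
import torch

m = torch.randn(1024, 4)           # tall low-rank matrix, as in PowerSGD
q1, _ = torch.linalg.qr(m)         # torch.linalg.qr path (no fp16 support)
q2 = torch.orgqr(*torch.geqrf(m))  # geqrf/orgqr alternative mentioned above
```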
### Additional context
_No response_
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72043
Reviewed By: albanD
Differential Revision: D34042781
Pulled By: cbalioglu
fbshipit-source-id: e331179d3b7ac40d445b651fc473b16ae4ead462
(cherry picked from commit f64bf3839a)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70465
These tests check to ensure that
(a) the result after NNC fusion (of a single op) is the same as the unfused op, and
(b) for certain ops where fusion is expected to occur, that fusion does actually occur.
Test Plan: Imported from OSS
Reviewed By: wenleix
Differential Revision: D33595240
Pulled By: davidberard98
fbshipit-source-id: e2e17a921bc30c313e92e8e5bbc6c1b5fcd14bc1
(cherry picked from commit b1ba221acc)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69241
Implement FlatParameter to track the information of a flat parameter, including the sharding information.
Test Plan: CI
Reviewed By: zhaojuanmao
Differential Revision: D32432503
fbshipit-source-id: b4aabba6cef29e825b45869895709c79e69c211d
(cherry picked from commit 0e5505f70b)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71656
Customized `__getstate__`/`__setstate__` didn't call super (torch.nn.Module) and won't restore attributes (e.g. `_modules`) after being serialized and deserialized via torch.package.
After a few iterations, as it turns out, pack/unpack of linear params is already supported in the torchbind class, so there is no need to hack the torch module anymore.
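A minimal, hypothetical sketch of the pitfall (plain deepcopy stands in for the torch.package round trip):
```
import copy
import torch

class Bad(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(2, 2)

    # Overriding __getstate__/__setstate__ without preserving the full
    # __dict__ drops nn.Module bookkeeping such as `_modules`.
    def __getstate__(self):
        return {"extra": 1}

    def __setstate__(self, state):
        self.__dict__.update(state)

m = copy.deepcopy(Bad())     # serialization round trip via get/setstate
print(hasattr(m, "linear"))  # False: the submodule was lost
```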
Test Plan: `buck test caffe2/test/:quantization -- test_linear_api`
Reviewed By: jerryzh168
Differential Revision: D33711086
fbshipit-source-id: 3a36d10c64b7da414d3657d2ef766bb9a9290ea9
(cherry picked from commit 6337b6c207)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72410
Fixes
```
caffe2/caffe2/operators/cross_entropy_op.cu(330): warning: parameter "outer_size" was declared but never referenced
caffe2/caffe2/operators/cross_entropy_op.cu(191): warning: parameter "outer_size" was declared but never referenced
caffe2/caffe2/operators/generate_proposals_op_util_nms.h(347): warning: variable "order" was declared but never referenced
caffe2/caffe2/operators/segment_reduction_op_gpu.cu(319): warning: parameter "N" was declared but never referenced
detected during:
instantiation of "__nv_bool caffe2::CUDASparseLengthsWeightedSumOp<T, Context, SparseFused>::DoRunWithType<IndexType>() [with T=float, Context=caffe2::CUDAContext, SparseFused=true, IndexType=int32_t]"
caffe2/caffe2/core/operator.h(1304): here
instantiation of "__nv_bool caffe2::DispatchHelper<caffe2::TensorTypes<FirstType, Types...>, ExtraArgs...>::call(Op *, caffe2::TypeMeta) [with FirstType=int32_t, Types=<int64_t>, ExtraArgs=<>, Op=caffe2::CUDASparseLengthsWeightedSumOp<float, caffe2::CUDAContext, true>]"
caffe2/caffe2/core/operator.h(1304): here
instantiation of "__nv_bool caffe2::DispatchHelper<caffe2::TensorTypes<FirstType, Types...>, ExtraArgs...>::call(Op *, const caffe2::Tensor &) [with FirstType=int32_t, Types=<int64_t>, ExtraArgs=<>, Op=caffe2::CUDASparseLengthsWeightedSumOp<float, caffe2::CUDAContext, true>]"
(786): here
caffe2/caffe2/operators/segment_reduction_op_gpu.cu(96): warning: parameter "len_length" was declared but never referenced
detected during:
instantiation of "__nv_bool caffe2::CUDASparseLengthsSumGradientWithIndicesOp<T, Context>::RunOnDevice() [with T=float, Context=caffe2::CUDAContext]"
(1296): here
caffe2/caffe2/sgd/adagrad_fused_op_gpu.cu(1226): warning: variable "N" was declared but never referenced
detected during:
instantiation of "__nv_bool caffe2::DispatchHelper<caffe2::TensorTypes2<FirstType, Types...>, ExtraArgs...>::call(Op *, caffe2::TypeMeta) [with FirstType=float, Types=<c10::Half>, ExtraArgs=<int32_t>, Op=caffe2::CUDARowWiseSparseAdagradFusedWithSparseLengthsSumGradientExactOp<float, int, false, caffe2::CUDAContext>]"
caffe2/caffe2/sgd/adagrad_fused_op_gpu.cu(259): warning: parameter "indices" was declared but never referenced
detected during:
instantiation of "__nv_bool caffe2::CUDARowWiseSparseAdagradFusedWithSparseLengthsSumGradientExactOp<T, TLengths, is_mean, Context>::DoRunWithType2<IndexType,TParam>() [with T=float, TLengths=int, is_mean=false, Context=caffe2::CUDAContext, IndexType=int32_t, TParam=float]"
caffe2/caffe2/core/operator.h(1308): here
caffe2/caffe2/operators/piecewise_linear_transform_op.cu(15): warning: parameter "num_grp" was declared but never referenced
caffe2/caffe2/operators/piecewise_linear_transform_op.cu(50): warning: parameter "M" was declared but never referenced
caffe2/caffe2/operators/piecewise_linear_transform_op.cu(51): warning: parameter "num_grp" was declared but never referenced
caffe2/caffe2/operators/piecewise_linear_transform_op.cu(78): warning: parameter "num_grp" was declared but never referenced
```
Test Plan: Sandcastle
Reviewed By: malfet
Differential Revision: D34034404
fbshipit-source-id: b834088d6a3e204e94bbffe3ac6fdccf9d0176b8
(cherry picked from commit 0148d0de04)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72411
Fixes
```
caffe2/caffe2/operators/max_pool_with_index.cu(16): warning: type qualifier specified more than once
caffe2/caffe2/operators/max_pool_with_index.cu(28): warning: type qualifier specified more than once
caffe2/caffe2/operators/max_pool_with_index.cu(61): warning: type qualifier specified more than once
caffe2/caffe2/operators/max_pool_with_index.cu(62): warning: type qualifier specified more than once
caffe2/caffe2/operators/max_pool_with_index.cu(74): warning: type qualifier specified more than once
```
Test Plan: Sandcastle
Reviewed By: malfet
Differential Revision: D34034382
fbshipit-source-id: 2b73c55358632090baf673b32b800656ae874040
(cherry picked from commit ab3f3f9a79)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72348
**Overview**
#43307 changed `_test_accumulate_gradients_no_sync()` to add a `num_iters` argument. However, I think the change misconstrued the test logic slightly.
61ab04e1db/torch/testing/_internal/distributed/distributed_test.py (L4369-L4397)
- `iteration % num_iters == 0` evaluates to `True` only for `iteration == 0`, since `iteration` comes from `for iteration in range(num_iters)`.
- IIUC, the intention is to alternate between accumulating gradients (using `no_sync()`) and synchronizing gradients normally. In the existing implementation, any iterations following the second one are non-productive since gradients are in sync, meaning it reduces to testing normal DDP.
- This PR changes the check back to `iteration % 2 == 0` to restore the alternating behavior (sketched below).
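A sketch of the restored alternating pattern (`model` is assumed to be a DistributedDataParallel instance; `batches` and `optimizer` are hypothetical):
```
for iteration, batch in enumerate(batches):
    if iteration % 2 == 0:
        with model.no_sync():              # accumulate gradients locally
            model(batch).sum().backward()
    else:
        model(batch).sum().backward()      # gradients all-reduced here
        optimizer.step()
        optimizer.zero_grad()
```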
Test Plan: Imported from OSS
Reviewed By: rohan-varma
Differential Revision: D34011559
Pulled By: awgu
fbshipit-source-id: 4ba771e45b28a343167a324462571e4b8e25ae72
(cherry picked from commit 8492a8b803)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72304
This is a no-op change that simply moves files around in preparation for moving linear algebra into its own dynamically loadable module.
This also simplifies the torch_cuda_cu build rules, as all the linalg files it needs are now in their own folder.
Bazel CUDA rules are in some disarray (a wildcard needed to be added there, as they ignore files mentioned in build_variables.bzl), and a similar wildcard needs to be added to the internal build system.
Test Plan: Imported from OSS
Reviewed By: dagitses, ngimel
Differential Revision: D33992796
Pulled By: malfet
fbshipit-source-id: 3f4fa1c224016d03e1a982a7ae5ac7807bc772e2
(cherry picked from commit 6a5a1b0c3f)
Summary:
This reverts the previous PR and adds some comments to make it clear what the intent is.
It also removes some extra static_asserts that are not needed (at least for the compilers I tried).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72336
Reviewed By: r-barnes
Differential Revision: D34006722
Pulled By: albanD
fbshipit-source-id: 290fb89a2d2c66a0d1c3651198b31d21216ec230
(cherry picked from commit 76f0aaa765)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72284
This update adds the prod op to the fx2trt tool, which is used to create a TensorRT engine for a PyTorch model.
Test Plan:
A new unit test was added to check that the op was added to the acc tracer. This test can be run using the following command: buck test --debug //caffe2/test:test_fx_acc_tracer -- --exact 'caffe2/test:test_fx_acc_tracer - test_prod (fx_acc.test_acc_tracer.AccTracerTest)'
A new suite of unit tests were also added for the conversion to tensorRT and can be tested using the following command: buck test mode/dev-nosan //caffe2/test/fx2trt/converters:test_prod
Please note that, unlike other PyTorch reduce ops such as sum, the PyTorch prod function does not support reducing more than one dimension at a time (the dim arg cannot be a tuple; only a single int is accepted for prod). Therefore prod cannot utilize all of the reduce_op code.
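A quick illustration of that limitation:
```
import torch

x = torch.randn(2, 3, 4)
torch.sum(x, dim=(0, 1))     # sum accepts a tuple of dims
torch.prod(x, dim=0)         # prod accepts only a single int dim
# torch.prod(x, dim=(0, 1))  # error: prod cannot reduce multiple dims at once
```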
https://pxl.cl/1Xpn8
https://pxl.cl/1Xpn9
Reviewed By: 842974287
Differential Revision: D33875336
fbshipit-source-id: f9340db3685d681b1cf4ffc3b9fd25d16914e231
(cherry picked from commit cfe48d3737)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72344
ATen core is mostly compliant already, so we can just add the flag to the build system. The only exception is interned strings, which include symbols like `aten::add` generated for each operator.
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D34010820
Pulled By: albanD
fbshipit-source-id: ef1a625d96f30457b5e6beffc5e630516e54f9b4
(cherry picked from commit b90c262a92)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71781
The previous PR added information about fusions found in the subgraphs.
This PR uses that information for:
1. inserting observers at the end of fusions and not in the middle
2. during inference, replacing the original op with the fused op. The
way this is implemented is that the base op is replaced with the fused op,
and all other ops are replaced with identity functions.
Test Plan:
```
python test/test_quantization.py TestQuantizeDBR.test_fusion_functions
```
Reviewed By: jerryzh168
Differential Revision: D33775097
Pulled By: vkuzo
fbshipit-source-id: 12249b85b2f7ba7545a54872aeb5f1ff2fc928cf
(cherry picked from commit 0db4324ea9)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71780
Adds support for matching operator.add -> torch.relu in FX graph
mode quantization.
It would be nice to support torch.relu better in general, but I'm saving that for a future PR to keep PRs small.
This is useful for DBR quant because we have some test cases in DBR
quant which use add-relu, and we'd like to match them to FX.
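For reference, a minimal sketch of the pattern being matched (function name is made up):
```
import operator
import torch

def add_relu(x, y):
    # the operator.add -> torch.relu pattern, i.e. torch.relu(x + y)
    return torch.relu(operator.add(x, y))
```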
Test Plan:
```
python test/test_quantization.py TestQuantizeFxOps.test_add_relu
python test/test_quantization.py TestQuantizeFxOps.test_mul_relu
```
Reviewed By: jerryzh168
Differential Revision: D33775096
Pulled By: vkuzo
fbshipit-source-id: 889d9b41d3758ecbbb6d7eab67f64ce3d4892d24
(cherry picked from commit c1f9f38ca1)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71764
For DBR quant, adds the code for matching seen ops to function fusion
patterns. After we have the full DAG, we have a separate pass over the
dag and add matched fusion patterns to the seen op data structure.
This is the first PR in the stack which implements matching and
recording the match results. Future PRs in this stack will use
the match results to modify observer insertion and inference.
Test Plan:
```
python test/test_quantization.py TestQuantizeDBR.test_fusion_functions
```
Reviewed By: jerryzh168
Differential Revision: D33775098
Pulled By: vkuzo
fbshipit-source-id: 488aac902bf568d41c863ee49248990411ed9c53
(cherry picked from commit 4ad1ca1abc)