pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
PyTorch MergeBot	4b82ef7928	Revert "Port `index.Tensor` to structured kernels." This reverts commit `cfd84125bd`. Reverted https://github.com/pytorch/pytorch/pull/69607 on behalf of https://github.com/zengk95 due to This is breaking mac trunk tests `cfd84125bd`	2022-06-08 20:16:10 +00:00
Brian Hirsh	cfd84125bd	Port `index.Tensor` to structured kernels. Tracking issue: #55070 Pull Request resolved: https://github.com/pytorch/pytorch/pull/69607 Approved by: https://github.com/bdhirsh	2022-06-08 18:17:52 +00:00
PyTorch MergeBot	954522a485	Revert "Autogen Tags enum, and allow specifying tags while defining an op" This reverts commit `9476a78f37`. Reverted https://github.com/pytorch/pytorch/pull/77313 on behalf of https://github.com/malfet due to Broke OSS buck builds, see `9476a78f37`	2022-06-03 01:53:53 +00:00
anjali411	9476a78f37	Autogen Tags enum, and allow specifying tags while defining an op Pull Request resolved: https://github.com/pytorch/pytorch/pull/77313 Approved by: https://github.com/ezyang, https://github.com/albanD	2022-06-03 01:13:44 +00:00
PyTorch MergeBot	fca1f495c2	Revert "Port `index.Tensor` to structured kernels." This reverts commit `9fe6f1baf5`. Reverted https://github.com/pytorch/pytorch/pull/69607 on behalf of https://github.com/suo due to this broke master, see: `9fe6f1baf5`	2022-06-01 00:12:15 +00:00
Brian Hirsh	9fe6f1baf5	Port `index.Tensor` to structured kernels. Tracking issue: #55070 Pull Request resolved: https://github.com/pytorch/pytorch/pull/69607 Approved by: https://github.com/bdhirsh	2022-05-31 22:15:20 +00:00
Brian Hirsh	0161e9eb00	[test] attempt to functionalize ops with mutable positional-only args Pull Request resolved: https://github.com/pytorch/pytorch/pull/76320 Approved by: https://github.com/ezyang	2022-05-19 18:50:34 +00:00
Nikita Vedeneev	3ccf35e003	`nn.functional.glu`: forward over reserse AD support (#77309 ) As per title. To knock out functions from https://github.com/pytorch/pytorch/issues/75432. Pull Request resolved: https://github.com/pytorch/pytorch/pull/77309 Approved by: https://github.com/soulitzer	2022-05-17 13:29:14 +00:00
Brian Hirsh	47dd092bae	add a new at::lift operator, fix torch.tensor for functionalization This reverts commit `85bd65a880`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/77285 Approved by: https://github.com/albanD, https://github.com/ezyang	2022-05-12 13:31:19 +00:00
Natalia Gimelshein	257c55f422	make clamp_min/max use minimum/maximum kernels, make clamp* correctly propagate nans (#77306 ) Since clamp_min and maximum is the same op, reuse the same kernel (it also correctly propagate nans from both input and boundary, clamp* propagated from input only). Also fixed codegen to make Tensor? overloads come before Scalar? overloads, cc @alband. Fixes #67428 and #76795 (scalar overloads for clamp* are still not fixed, will do in the next PR). Pull Request resolved: https://github.com/pytorch/pytorch/pull/77306 Approved by: https://github.com/albanD	2022-05-12 03:35:23 +00:00
PyTorch MergeBot	85bd65a880	Revert "[test] try to fix torch.tensor for functionalization" This reverts commit `9edee09ed6`. Reverted https://github.com/pytorch/pytorch/pull/76319 on behalf of https://github.com/janeyx99	2022-05-11 18:48:42 +00:00
Brian Hirsh	9edee09ed6	[test] try to fix torch.tensor for functionalization Pull Request resolved: https://github.com/pytorch/pytorch/pull/76319 Approved by: https://github.com/ezyang	2022-05-11 17:27:34 +00:00
Pearu Peterson	ff10e45993	Unsafe Sparse Compressed tensor factory function Pull Request resolved: https://github.com/pytorch/pytorch/pull/75961 Approved by: https://github.com/cpuhrsch	2022-04-28 23:32:36 +00:00
anjali411	b204ad863f	Revert "Revert "Allow specifying tags for aten operators in native_functions.yaml"" This reverts commit `ea44645c9a`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/76456 Approved by: https://github.com/osalpekar	2022-04-28 02:04:57 +00:00
Brian Hirsh	40d96f0afd	Revert "functionalization: add support for zero_()" This reverts commit `7d44b3675b`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/76375 Approved by: https://github.com/datumbox, https://github.com/albanD	2022-04-26 19:27:27 +00:00
Brian Hirsh	ea5209c9fd	functionalization: add native fill() op Pull Request resolved: https://github.com/pytorch/pytorch/pull/76084 Approved by: https://github.com/ezyang	2022-04-25 21:34:16 +00:00
Brian Hirsh	5da76acd1d	functionalization: add a copy() native function Pull Request resolved: https://github.com/pytorch/pytorch/pull/76083 Approved by: https://github.com/albanD	2022-04-25 21:31:48 +00:00
Brian Hirsh	7d44b3675b	functionalization: add support for zero_() Pull Request resolved: https://github.com/pytorch/pytorch/pull/75913 Approved by: https://github.com/albanD	2022-04-25 21:31:48 +00:00
Edward Yang	36420b5e8c	Rename tools/codegen to torchgen (#76275 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/76275 In preparation for addressing https://github.com/pytorch/pytorch/issues/73212 Diff was generated with: ``` git mv tools/codegen torchgen git grep -l 'tools.codegen' \| xargs sed -i 's/tools.codegen/torchgen/g' sed -i "s/\${TOOLS_PATH}\/codegen/\${TORCH_ROOT}\/torchgen/g" caffe2/CMakeLists.txt ``` and a manual edits to: * tools/test/test_gen_backend_stubs.py * torchgen/build.bzl * torchgen/gen_backend_stubs.py aka this diff: ``` diff --git a/tools/test/test_gen_backend_stubs.py b/tools/test/test_gen_backend_stubs.py index 3dc26c6d2d..104054575e 100644 --- a/tools/test/test_gen_backend_stubs.py +++ b/tools/test/test_gen_backend_stubs.py @@ -9,7 +9,7 @@ from torchgen.gen_backend_stubs import run from torchgen.gen import _GLOBAL_PARSE_NATIVE_YAML_CACHE # noqa: F401 path = os.path.dirname(os.path.realpath(__file__)) -gen_backend_stubs_path = os.path.join(path, '../torchgen/gen_backend_stubs.py') +gen_backend_stubs_path = os.path.join(path, '../../torchgen/gen_backend_stubs.py') # gen_backend_stubs.py is an integration point that is called directly by external backends. # The tests here are to confirm that badly formed inputs result in reasonable error messages. diff --git a/torchgen/build.bzl b/torchgen/build.bzl index ed04e35a43..d00078a3cf 100644 --- a/torchgen/build.bzl +++ b/torchgen/build.bzl @@ -1,6 +1,6 @@ def define_targets(rules): rules.py_library( - name = "codegen", + name = "torchgen", srcs = rules.glob(["*/.py"]), deps = [ rules.requirement("PyYAML"), @@ -11,6 +11,6 @@ def define_targets(rules): rules.py_binary( name = "gen", - srcs = [":codegen"], + srcs = [":torchgen"], visibility = ["//visibility:public"], ) diff --git a/torchgen/gen_backend_stubs.py b/torchgen/gen_backend_stubs.py index c1a672a655..beee7a15e0 100644 --- a/torchgen/gen_backend_stubs.py +++ b/torchgen/gen_backend_stubs.py @@ -474,7 +474,7 @@ def run( ) -> None: # Assumes that this file lives at PYTORCH_ROOT/torchgen/gen_backend_stubs.py - pytorch_root = pathlib.Path(__file__).parent.parent.parent.absolute() + pytorch_root = pathlib.Path(__file__).parent.parent.absolute() template_dir = os.path.join(pytorch_root, "aten/src/ATen/templates") def make_file_manager(install_dir: str) -> FileManager: ``` run_all_fbandroid_tests Test Plan: sandcastle Reviewed By: albanD, ngimel Differential Revision: D35770317 fbshipit-source-id: 153ac4a7fef15b1e750812a90bfafdbc8f1ebcdf (cherry picked from commit c6d485d1d4648fa1c8a4c14c5bf3d8e899b9b4dd)	2022-04-25 01:38:06 +00:00
Edward Z. Yang	a11c1bbdd0	Run Black on all of tools/ Signed-off-by: Edward Z. Yang <ezyangfb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/76089 Approved by: https://github.com/albanD	2022-04-20 17:29:41 +00:00
PyTorch MergeBot	ea44645c9a	Revert "Allow specifying tags for aten operators in native_functions.yaml" This reverts commit `1dab71ab25`. Reverted https://github.com/pytorch/pytorch/pull/72549 on behalf of https://github.com/malfet	2022-03-28 18:04:38 +00:00
anjali411	1dab71ab25	Allow specifying tags for aten operators in native_functions.yaml Pull Request resolved: https://github.com/pytorch/pytorch/pull/72549 Approved by: https://github.com/ezyang	2022-03-25 21:17:52 +00:00
Peter Bell	40d1f77384	Codegen: python_torch_functions only include relevant operators (#68693 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68693 Generation of python bindings for native functions is split over 8 different files. One for each namespace, with the torch namespace split into 3 shards, and methods in their own file as well. This change ensures that editing any single (non-method) operator only causes one of these files to be rebuilt. Test Plan: Imported from OSS Reviewed By: jbschlosser Differential Revision: D32596270 Pulled By: albanD fbshipit-source-id: 0570ec69e7476b8f1bc21138ba18fe8f95ebbe3f (cherry picked from commit `ba0fc71a3a`)	2022-01-21 15:37:06 +00:00
Alban Desmaison	badf7b0210	fix typo changing the generated code (#69899 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69899 Reviewed By: soulitzer Differential Revision: D33093461 Pulled By: albanD fbshipit-source-id: 2c672a2b767f0caed1ef3a1d2afa1cacdfcdc320	2021-12-14 06:36:14 -08:00
soulitzer	51033ec840	Add forward AD layout check for storage numel (#68631 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68631 This PR: - Adds the check that the storage numel of the base and tangent tensors are the same. This is to support the case when as_strided reveals elements that aren't indexable by the input tensor. - Skips the check when batched tensors are involved, because using as_strided to reveal elements that not indexable by the input tensor is already not allowed vmap. - Adds tests for the above two cases, as well as an edge case regarding conj bit (what about neg bit?) For functorch: - we need to copy the batching rule implemented here Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D32899678 Pulled By: soulitzer fbshipit-source-id: 54db9550dd2c93bc66b8fb2d36ce40799ebba794	2021-12-14 04:34:25 -08:00
kshitij12345	b737e09f60	expose return_types in Python (#66614 ) Summary: https://github.com/facebookresearch/functorch/issues/87 TODO: * [x] Add comments * [x] Add test * [x] Fix XLA <details> <summary>Generated python_return_types.cpp</summary> ```cpp #include <Python.h> #include <vector> #include <map> #include <string> #include "torch/csrc/autograd/python_return_types.h" #include "torch/csrc/utils/structseq.h" #include "torch/csrc/Exceptions.h" namespace { PyTypeObject* get__det_lu_based_helper_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"det", ""}, {"lu", ""}, {"pivs", ""}, {nullptr} }; static PyTypeObject _det_lu_based_helperNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types._det_lu_based_helper", nullptr, NamedTuple_fields, 3 }; if (!is_initialized) { PyStructSequence_InitType(&_det_lu_based_helperNamedTuple, &desc); _det_lu_based_helperNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &_det_lu_based_helperNamedTuple; } PyTypeObject* get__fake_quantize_per_tensor_affine_cachemask_tensor_qparams_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"output", ""}, {"mask", ""}, {nullptr} }; static PyTypeObject _fake_quantize_per_tensor_affine_cachemask_tensor_qparamsNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types._fake_quantize_per_tensor_affine_cachemask_tensor_qparams", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&_fake_quantize_per_tensor_affine_cachemask_tensor_qparamsNamedTuple, &desc); _fake_quantize_per_tensor_affine_cachemask_tensor_qparamsNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &_fake_quantize_per_tensor_affine_cachemask_tensor_qparamsNamedTuple; } PyTypeObject* get__fused_moving_avg_obs_fq_helper_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"output", ""}, {"mask", ""}, {nullptr} }; static PyTypeObject _fused_moving_avg_obs_fq_helperNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types._fused_moving_avg_obs_fq_helper", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&_fused_moving_avg_obs_fq_helperNamedTuple, &desc); _fused_moving_avg_obs_fq_helperNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &_fused_moving_avg_obs_fq_helperNamedTuple; } PyTypeObject* get__lu_with_info_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"LU", ""}, {"pivots", ""}, {"info", ""}, {nullptr} }; static PyTypeObject _lu_with_infoNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types._lu_with_info", nullptr, NamedTuple_fields, 3 }; if (!is_initialized) { PyStructSequence_InitType(&_lu_with_infoNamedTuple, &desc); _lu_with_infoNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &_lu_with_infoNamedTuple; } PyTypeObject* get__unpack_dual_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"primal", ""}, {"tangent", ""}, {nullptr} }; static PyTypeObject _unpack_dualNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types._unpack_dual", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&_unpack_dualNamedTuple, &desc); _unpack_dualNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &_unpack_dualNamedTuple; } PyTypeObject* get_aminmax_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"min", ""}, {"max", ""}, {nullptr} }; static PyTypeObject aminmaxNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.aminmax", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&aminmaxNamedTuple, &desc); aminmaxNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &aminmaxNamedTuple; } PyTypeObject* get_aminmax_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"min", ""}, {"max", ""}, {nullptr} }; static PyTypeObject aminmax_outNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.aminmax_out", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&aminmax_outNamedTuple1, &desc); aminmax_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &aminmax_outNamedTuple1; } PyTypeObject* get_cummax_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""}, {nullptr} }; static PyTypeObject cummaxNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.cummax", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&cummaxNamedTuple, &desc); cummaxNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &cummaxNamedTuple; } PyTypeObject* get_cummax_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""}, {nullptr} }; static PyTypeObject cummax_outNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.cummax_out", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&cummax_outNamedTuple1, &desc); cummax_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &cummax_outNamedTuple1; } PyTypeObject* get_cummin_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""}, {nullptr} }; static PyTypeObject cumminNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.cummin", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&cumminNamedTuple, &desc); cumminNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &cumminNamedTuple; } PyTypeObject* get_cummin_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""}, {nullptr} }; static PyTypeObject cummin_outNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.cummin_out", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&cummin_outNamedTuple1, &desc); cummin_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &cummin_outNamedTuple1; } PyTypeObject* get_eig_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"eigenvalues", ""}, {"eigenvectors", ""}, {nullptr} }; static PyTypeObject eig_outNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.eig_out", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&eig_outNamedTuple, &desc); eig_outNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &eig_outNamedTuple; } PyTypeObject* get_eig_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"eigenvalues", ""}, {"eigenvectors", ""}, {nullptr} }; static PyTypeObject eigNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.eig", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&eigNamedTuple1, &desc); eigNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &eigNamedTuple1; } PyTypeObject* get_frexp_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"mantissa", ""}, {"exponent", ""}, {nullptr} }; static PyTypeObject frexpNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.frexp", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&frexpNamedTuple, &desc); frexpNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &frexpNamedTuple; } PyTypeObject* get_frexp_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"mantissa", ""}, {"exponent", ""}, {nullptr} }; static PyTypeObject frexp_outNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.frexp_out", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&frexp_outNamedTuple1, &desc); frexp_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &frexp_outNamedTuple1; } PyTypeObject* get_geqrf_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"a", ""}, {"tau", ""}, {nullptr} }; static PyTypeObject geqrf_outNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.geqrf_out", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&geqrf_outNamedTuple, &desc); geqrf_outNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &geqrf_outNamedTuple; } PyTypeObject* get_geqrf_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"a", ""}, {"tau", ""}, {nullptr} }; static PyTypeObject geqrfNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.geqrf", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&geqrfNamedTuple1, &desc); geqrfNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &geqrfNamedTuple1; } PyTypeObject* get_histogram_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"hist", ""}, {"bin_edges", ""}, {nullptr} }; static PyTypeObject histogram_outNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.histogram_out", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&histogram_outNamedTuple, &desc); histogram_outNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &histogram_outNamedTuple; } PyTypeObject* get_histogram_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"hist", ""}, {"bin_edges", ""}, {nullptr} }; static PyTypeObject histogramNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.histogram", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&histogramNamedTuple1, &desc); histogramNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &histogramNamedTuple1; } PyTypeObject* get_kthvalue_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""}, {nullptr} }; static PyTypeObject kthvalueNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.kthvalue", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&kthvalueNamedTuple, &desc); kthvalueNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &kthvalueNamedTuple; } PyTypeObject* get_kthvalue_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""}, {nullptr} }; static PyTypeObject kthvalue_outNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.kthvalue_out", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&kthvalue_outNamedTuple1, &desc); kthvalue_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &kthvalue_outNamedTuple1; } PyTypeObject* get_linalg_cholesky_ex_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"L", ""}, {"info", ""}, {nullptr} }; static PyTypeObject linalg_cholesky_exNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.linalg_cholesky_ex", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&linalg_cholesky_exNamedTuple, &desc); linalg_cholesky_exNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &linalg_cholesky_exNamedTuple; } PyTypeObject* get_linalg_cholesky_ex_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"L", ""}, {"info", ""}, {nullptr} }; static PyTypeObject linalg_cholesky_ex_outNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.linalg_cholesky_ex_out", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&linalg_cholesky_ex_outNamedTuple1, &desc); linalg_cholesky_ex_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &linalg_cholesky_ex_outNamedTuple1; } PyTypeObject* get_linalg_eig_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"eigenvalues", ""}, {"eigenvectors", ""}, {nullptr} }; static PyTypeObject linalg_eigNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.linalg_eig", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&linalg_eigNamedTuple, &desc); linalg_eigNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &linalg_eigNamedTuple; } PyTypeObject* get_linalg_eig_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"eigenvalues", ""}, {"eigenvectors", ""}, {nullptr} }; static PyTypeObject linalg_eig_outNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.linalg_eig_out", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&linalg_eig_outNamedTuple1, &desc); linalg_eig_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &linalg_eig_outNamedTuple1; } PyTypeObject* get_linalg_eigh_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"eigenvalues", ""}, {"eigenvectors", ""}, {nullptr} }; static PyTypeObject linalg_eighNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.linalg_eigh", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&linalg_eighNamedTuple, &desc); linalg_eighNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &linalg_eighNamedTuple; } PyTypeObject* get_linalg_eigh_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"eigenvalues", ""}, {"eigenvectors", ""}, {nullptr} }; static PyTypeObject linalg_eigh_outNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.linalg_eigh_out", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&linalg_eigh_outNamedTuple1, &desc); linalg_eigh_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &linalg_eigh_outNamedTuple1; } PyTypeObject* get_linalg_inv_ex_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"inverse", ""}, {"info", ""}, {nullptr} }; static PyTypeObject linalg_inv_exNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.linalg_inv_ex", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&linalg_inv_exNamedTuple, &desc); linalg_inv_exNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &linalg_inv_exNamedTuple; } PyTypeObject* get_linalg_inv_ex_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"inverse", ""}, {"info", ""}, {nullptr} }; static PyTypeObject linalg_inv_ex_outNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.linalg_inv_ex_out", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&linalg_inv_ex_outNamedTuple1, &desc); linalg_inv_ex_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &linalg_inv_ex_outNamedTuple1; } PyTypeObject* get_linalg_lstsq_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"solution", ""}, {"residuals", ""}, {"rank", ""}, {"singular_values", ""}, {nullptr} }; static PyTypeObject linalg_lstsqNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.linalg_lstsq", nullptr, NamedTuple_fields, 4 }; if (!is_initialized) { PyStructSequence_InitType(&linalg_lstsqNamedTuple, &desc); linalg_lstsqNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &linalg_lstsqNamedTuple; } PyTypeObject* get_linalg_lstsq_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"solution", ""}, {"residuals", ""}, {"rank", ""}, {"singular_values", ""}, {nullptr} }; static PyTypeObject linalg_lstsq_outNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.linalg_lstsq_out", nullptr, NamedTuple_fields, 4 }; if (!is_initialized) { PyStructSequence_InitType(&linalg_lstsq_outNamedTuple1, &desc); linalg_lstsq_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &linalg_lstsq_outNamedTuple1; } PyTypeObject* get_linalg_qr_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"Q", ""}, {"R", ""}, {nullptr} }; static PyTypeObject linalg_qrNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.linalg_qr", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&linalg_qrNamedTuple, &desc); linalg_qrNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &linalg_qrNamedTuple; } PyTypeObject* get_linalg_qr_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"Q", ""}, {"R", ""}, {nullptr} }; static PyTypeObject linalg_qr_outNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.linalg_qr_out", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&linalg_qr_outNamedTuple1, &desc); linalg_qr_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &linalg_qr_outNamedTuple1; } PyTypeObject* get_linalg_slogdet_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"sign", ""}, {"logabsdet", ""}, {nullptr} }; static PyTypeObject linalg_slogdetNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.linalg_slogdet", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&linalg_slogdetNamedTuple, &desc); linalg_slogdetNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &linalg_slogdetNamedTuple; } PyTypeObject* get_linalg_slogdet_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"sign", ""}, {"logabsdet", ""}, {nullptr} }; static PyTypeObject linalg_slogdet_outNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.linalg_slogdet_out", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&linalg_slogdet_outNamedTuple1, &desc); linalg_slogdet_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &linalg_slogdet_outNamedTuple1; } PyTypeObject* get_linalg_svd_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"U", ""}, {"S", ""}, {"Vh", ""}, {nullptr} }; static PyTypeObject linalg_svd_outNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.linalg_svd_out", nullptr, NamedTuple_fields, 3 }; if (!is_initialized) { PyStructSequence_InitType(&linalg_svd_outNamedTuple, &desc); linalg_svd_outNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &linalg_svd_outNamedTuple; } PyTypeObject* get_linalg_svd_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"U", ""}, {"S", ""}, {"Vh", ""}, {nullptr} }; static PyTypeObject linalg_svdNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.linalg_svd", nullptr, NamedTuple_fields, 3 }; if (!is_initialized) { PyStructSequence_InitType(&linalg_svdNamedTuple1, &desc); linalg_svdNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &linalg_svdNamedTuple1; } PyTypeObject* get_lstsq_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"solution", ""}, {"QR", ""}, {nullptr} }; static PyTypeObject lstsq_outNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.lstsq_out", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&lstsq_outNamedTuple, &desc); lstsq_outNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &lstsq_outNamedTuple; } PyTypeObject* get_lstsq_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"solution", ""}, {"QR", ""}, {nullptr} }; static PyTypeObject lstsqNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.lstsq", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&lstsqNamedTuple1, &desc); lstsqNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &lstsqNamedTuple1; } PyTypeObject* get_lu_unpack_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"P", ""}, {"L", ""}, {"U", ""}, {nullptr} }; static PyTypeObject lu_unpackNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.lu_unpack", nullptr, NamedTuple_fields, 3 }; if (!is_initialized) { PyStructSequence_InitType(&lu_unpackNamedTuple, &desc); lu_unpackNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &lu_unpackNamedTuple; } PyTypeObject* get_lu_unpack_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"P", ""}, {"L", ""}, {"U", ""}, {nullptr} }; static PyTypeObject lu_unpack_outNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.lu_unpack_out", nullptr, NamedTuple_fields, 3 }; if (!is_initialized) { PyStructSequence_InitType(&lu_unpack_outNamedTuple1, &desc); lu_unpack_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &lu_unpack_outNamedTuple1; } PyTypeObject* get_max_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""}, {nullptr} }; static PyTypeObject maxNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.max", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&maxNamedTuple, &desc); maxNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &maxNamedTuple; } PyTypeObject* get_max_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""}, {nullptr} }; static PyTypeObject max_outNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.max_out", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&max_outNamedTuple1, &desc); max_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &max_outNamedTuple1; } PyTypeObject* get_median_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""}, {nullptr} }; static PyTypeObject medianNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.median", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&medianNamedTuple, &desc); medianNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &medianNamedTuple; } PyTypeObject* get_median_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""}, {nullptr} }; static PyTypeObject median_outNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.median_out", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&median_outNamedTuple1, &desc); median_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &median_outNamedTuple1; } PyTypeObject* get_min_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""}, {nullptr} }; static PyTypeObject minNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.min", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&minNamedTuple, &desc); minNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &minNamedTuple; } PyTypeObject* get_min_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""}, {nullptr} }; static PyTypeObject min_outNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.min_out", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&min_outNamedTuple1, &desc); min_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &min_outNamedTuple1; } PyTypeObject* get_mode_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""}, {nullptr} }; static PyTypeObject modeNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.mode", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&modeNamedTuple, &desc); modeNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &modeNamedTuple; } PyTypeObject* get_mode_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""}, {nullptr} }; static PyTypeObject mode_outNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.mode_out", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&mode_outNamedTuple1, &desc); mode_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &mode_outNamedTuple1; } PyTypeObject* get_nanmedian_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""}, {nullptr} }; static PyTypeObject nanmedianNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.nanmedian", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&nanmedianNamedTuple, &desc); nanmedianNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &nanmedianNamedTuple; } PyTypeObject* get_nanmedian_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""}, {nullptr} }; static PyTypeObject nanmedian_outNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.nanmedian_out", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&nanmedian_outNamedTuple1, &desc); nanmedian_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &nanmedian_outNamedTuple1; } PyTypeObject* get_qr_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"Q", ""}, {"R", ""}, {nullptr} }; static PyTypeObject qr_outNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.qr_out", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&qr_outNamedTuple, &desc); qr_outNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &qr_outNamedTuple; } PyTypeObject* get_qr_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"Q", ""}, {"R", ""}, {nullptr} }; static PyTypeObject qrNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.qr", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&qrNamedTuple1, &desc); qrNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &qrNamedTuple1; } PyTypeObject* get_slogdet_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"sign", ""}, {"logabsdet", ""}, {nullptr} }; static PyTypeObject slogdetNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.slogdet", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&slogdetNamedTuple, &desc); slogdetNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &slogdetNamedTuple; } PyTypeObject* get_solve_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"solution", ""}, {"LU", ""}, {nullptr} }; static PyTypeObject solveNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.solve", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&solveNamedTuple, &desc); solveNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &solveNamedTuple; } PyTypeObject* get_solve_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"solution", ""}, {"LU", ""}, {nullptr} }; static PyTypeObject solve_outNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.solve_out", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&solve_outNamedTuple1, &desc); solve_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &solve_outNamedTuple1; } PyTypeObject* get_sort_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""}, {nullptr} }; static PyTypeObject sort_outNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.sort_out", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&sort_outNamedTuple, &desc); sort_outNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &sort_outNamedTuple; } PyTypeObject* get_sort_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""}, {nullptr} }; static PyTypeObject sortNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.sort", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&sortNamedTuple1, &desc); sortNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &sortNamedTuple1; } PyTypeObject* get_svd_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"U", ""}, {"S", ""}, {"V", ""}, {nullptr} }; static PyTypeObject svd_outNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.svd_out", nullptr, NamedTuple_fields, 3 }; if (!is_initialized) { PyStructSequence_InitType(&svd_outNamedTuple, &desc); svd_outNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &svd_outNamedTuple; } PyTypeObject* get_svd_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"U", ""}, {"S", ""}, {"V", ""}, {nullptr} }; static PyTypeObject svdNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.svd", nullptr, NamedTuple_fields, 3 }; if (!is_initialized) { PyStructSequence_InitType(&svdNamedTuple1, &desc); svdNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &svdNamedTuple1; } PyTypeObject* get_symeig_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"eigenvalues", ""}, {"eigenvectors", ""}, {nullptr} }; static PyTypeObject symeig_outNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.symeig_out", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&symeig_outNamedTuple, &desc); symeig_outNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &symeig_outNamedTuple; } PyTypeObject* get_symeig_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"eigenvalues", ""}, {"eigenvectors", ""}, {nullptr} }; static PyTypeObject symeigNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.symeig", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&symeigNamedTuple1, &desc); symeigNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &symeigNamedTuple1; } PyTypeObject* get_topk_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""}, {nullptr} }; static PyTypeObject topk_outNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.topk_out", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&topk_outNamedTuple, &desc); topk_outNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &topk_outNamedTuple; } PyTypeObject* get_topk_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""}, {nullptr} }; static PyTypeObject topkNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.topk", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&topkNamedTuple1, &desc); topkNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &topkNamedTuple1; } PyTypeObject* get_triangular_solve_out_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"solution", ""}, {"cloned_coefficient", ""}, {nullptr} }; static PyTypeObject triangular_solve_outNamedTuple; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.triangular_solve_out", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&triangular_solve_outNamedTuple, &desc); triangular_solve_outNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &triangular_solve_outNamedTuple; } PyTypeObject* get_triangular_solve_namedtuple() { static PyStructSequence_Field NamedTuple_fields[] = { {"solution", ""}, {"cloned_coefficient", ""}, {nullptr} }; static PyTypeObject triangular_solveNamedTuple1; static bool is_initialized = false; static PyStructSequence_Desc desc = { "torch.return_types.triangular_solve", nullptr, NamedTuple_fields, 2 }; if (!is_initialized) { PyStructSequence_InitType(&triangular_solveNamedTuple1, &desc); triangular_solveNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr; is_initialized = true; } return &triangular_solveNamedTuple1; } } namespace torch { namespace autograd { std::map<std::string, PyTypeObject>& get_namedtuple_types_map() { // [NOTE] Non-global map // This map calls Python functions during its initialization. // If it is a global static variable and in case it is loaded // before Python interpreter is ready, then the calls it makes during // initialization will SEGFAULT. // To avoid this we make it function static variable so that it is // initialized only after the Python interpreter is ready. static std::map<std::string, PyTypeObject> namedtuple_types_map = { {"_det_lu_based_helper", get__det_lu_based_helper_namedtuple()}, {"_fake_quantize_per_tensor_affine_cachemask_tensor_qparams", get__fake_quantize_per_tensor_affine_cachemask_tensor_qparams_namedtuple()}, {"_fused_moving_avg_obs_fq_helper", get__fused_moving_avg_obs_fq_helper_namedtuple()}, {"_lu_with_info", get__lu_with_info_namedtuple()}, {"_unpack_dual", get__unpack_dual_namedtuple()}, {"aminmax", get_aminmax_namedtuple()}, {"aminmax_out", get_aminmax_out_namedtuple()}, {"cummax", get_cummax_namedtuple()}, {"cummax_out", get_cummax_out_namedtuple()}, {"cummin", get_cummin_namedtuple()}, {"cummin_out", get_cummin_out_namedtuple()}, {"eig_out", get_eig_out_namedtuple()}, {"eig", get_eig_namedtuple()}, {"frexp", get_frexp_namedtuple()}, {"frexp_out", get_frexp_out_namedtuple()}, {"geqrf_out", get_geqrf_out_namedtuple()}, {"geqrf", get_geqrf_namedtuple()}, {"histogram_out", get_histogram_out_namedtuple()}, {"histogram", get_histogram_namedtuple()}, {"kthvalue", get_kthvalue_namedtuple()}, {"kthvalue_out", get_kthvalue_out_namedtuple()}, {"linalg_cholesky_ex", get_linalg_cholesky_ex_namedtuple()}, {"linalg_cholesky_ex_out", get_linalg_cholesky_ex_out_namedtuple()}, {"linalg_eig", get_linalg_eig_namedtuple()}, {"linalg_eig_out", get_linalg_eig_out_namedtuple()}, {"linalg_eigh", get_linalg_eigh_namedtuple()}, {"linalg_eigh_out", get_linalg_eigh_out_namedtuple()}, {"linalg_inv_ex", get_linalg_inv_ex_namedtuple()}, {"linalg_inv_ex_out", get_linalg_inv_ex_out_namedtuple()}, {"linalg_lstsq", get_linalg_lstsq_namedtuple()}, {"linalg_lstsq_out", get_linalg_lstsq_out_namedtuple()}, {"linalg_qr", get_linalg_qr_namedtuple()}, {"linalg_qr_out", get_linalg_qr_out_namedtuple()}, {"linalg_slogdet", get_linalg_slogdet_namedtuple()}, {"linalg_slogdet_out", get_linalg_slogdet_out_namedtuple()}, {"linalg_svd_out", get_linalg_svd_out_namedtuple()}, {"linalg_svd", get_linalg_svd_namedtuple()}, {"lstsq_out", get_lstsq_out_namedtuple()}, {"lstsq", get_lstsq_namedtuple()}, {"lu_unpack", get_lu_unpack_namedtuple()}, {"lu_unpack_out", get_lu_unpack_out_namedtuple()}, {"max", get_max_namedtuple()}, {"max_out", get_max_out_namedtuple()}, {"median", get_median_namedtuple()}, {"median_out", get_median_out_namedtuple()}, {"min", get_min_namedtuple()}, {"min_out", get_min_out_namedtuple()}, {"mode", get_mode_namedtuple()}, {"mode_out", get_mode_out_namedtuple()}, {"nanmedian", get_nanmedian_namedtuple()}, {"nanmedian_out", get_nanmedian_out_namedtuple()}, {"qr_out", get_qr_out_namedtuple()}, {"qr", get_qr_namedtuple()}, {"slogdet", get_slogdet_namedtuple()}, {"solve", get_solve_namedtuple()}, {"solve_out", get_solve_out_namedtuple()}, {"sort_out", get_sort_out_namedtuple()}, {"sort", get_sort_namedtuple()}, {"svd_out", get_svd_out_namedtuple()}, {"svd", get_svd_namedtuple()}, {"symeig_out", get_symeig_out_namedtuple()}, {"symeig", get_symeig_namedtuple()}, {"topk_out", get_topk_out_namedtuple()}, {"topk", get_topk_namedtuple()}, {"triangular_solve_out", get_triangular_solve_out_namedtuple()}, {"triangular_solve", get_triangular_solve_namedtuple()}, }; return namedtuple_types_map; } PyTypeObject* get_namedtuple(std::string name) { static auto& namedtuple_types_map = get_namedtuple_types_map(); return namedtuple_types_map[name]; } void initReturnTypes(PyObject* module) { static struct PyModuleDef def = { PyModuleDef_HEAD_INIT, "torch._C._return_types", nullptr, -1, {}}; PyObject* return_types_module = PyModule_Create(&def); if (!return_types_module) { throw python_error(); } for (const auto& return_type_pair : get_namedtuple_types_map()) { // hold onto the TypeObject for the unlikely case of user // deleting or overriding it. Py_INCREF(return_type_pair.second); if (PyModule_AddObject( return_types_module, return_type_pair.first.c_str(), (PyObject)return_type_pair.second) != 0) { Py_DECREF((PyObject)return_type_pair.second); throw python_error(); } } // steals a reference to return_types on success if (PyModule_AddObject(module, "_return_types", return_types_module) != 0) { Py_DECREF(return_types_module); throw python_error(); } } } // namespace autograd } // namespace torch ``` </details> <details> <summary>Eg. updated call in other python__functions</summary> ```cpp // linalg_cholesky_ex static PyObject THPVariable_linalg_cholesky_ex(PyObject* self_, PyObject* args, PyObject* kwargs) { HANDLE_TH_ERRORS static PyTypeObject* NamedTuple = get_namedtuple("linalg_cholesky_ex"); static PyTypeObject* NamedTuple1 = get_namedtuple("linalg_cholesky_ex_out"); static PythonArgParser parser({ "linalg_cholesky_ex(Tensor input, , bool upper=False, bool check_errors=False, TensorList[2] out=None)", }, /traceable=/true); ParsedArgs<4> parsed_args; auto _r = parser.parse(nullptr, args, kwargs, parsed_args); if(_r.has_torch_function()) { return handle_torch_function(_r, nullptr, args, kwargs, THPLinalgVariableFunctionsModule, "torch.linalg"); } if (_r.isNone(3)) { // aten::linalg_cholesky_ex(Tensor self, , bool upper=False, bool check_errors=False) -> (Tensor L, Tensor info) auto dispatch_linalg_cholesky_ex = [](const at::Tensor & self, bool upper, bool check_errors) -> ::std::tuple<at::Tensor,at::Tensor> { pybind11::gil_scoped_release no_gil; return at::linalg_cholesky_ex(self, upper, check_errors); }; return wrap(NamedTuple, dispatch_linalg_cholesky_ex(_r.tensor(0), _r.toBool(1), _r.toBool(2))); } else { // aten::linalg_cholesky_ex.L(Tensor self, *, bool upper=False, bool check_errors=False, Tensor(a!) L, Tensor(b!) info) -> (Tensor(a!) L, Tensor(b!) info) auto out = _r.tensorlist_n<2>(3); auto dispatch_linalg_cholesky_ex_out = [](at::Tensor & L, at::Tensor & info, const at::Tensor & self, bool upper, bool check_errors) -> ::std::tuple<at::Tensor,at::Tensor> { pybind11::gil_scoped_release no_gil; return at::linalg_cholesky_ex_out(L, info, self, upper, check_errors); }; return wrap(NamedTuple1, dispatch_linalg_cholesky_ex_out(out[0], out[1], _r.tensor(0), _r.toBool(1), _r.toBool(2))); } Py_RETURN_NONE; END_HANDLE_TH_ERRORS } ``` </details> Pull Request resolved: https://github.com/pytorch/pytorch/pull/66614 Reviewed By: H-Huang Differential Revision: D32741134 Pulled By: zou3519 fbshipit-source-id: 27bada30d20e66333ca1be1775608d9f0cbf9f59	2021-12-06 09:05:29 -08:00
Christian Puhrsch	75955e4ef8	[clone][sparse] Add `torch._C._sparse` namespace (#68672 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68672 This PR adds `python_module: sparse` to `native_function.yaml`. These functions would appear in `torch._C._sparse` namespace instead of just `torch`. Test Plan: Imported from OSS Reviewed By: mruberry Differential Revision: D32517813 fbshipit-source-id: 7c3d6df57a24d7c7354d0fefe1b628dc89be9431	2021-11-19 19:47:38 -08:00
soulitzer	2455cc2adf	Address case when layout of tangent is not same as base (#66292 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66292 In this PR: 1. Fix the case when tangent has a different layout from the base when `set_fw_grad` by adding a native function and its batching rule. For (1) we replace the following: ``` Tensor new_with_same_meta(const Variable& base) { int64_t nelement_in_storage = base.storage().nbytes() / base.itemsize(); auto new_tensor = at::zeros({nelement_in_storage}, base.options()); auto res = new_tensor.as_strided(base.sizes(), base.strides(), base.storage_offset()); return res; } ``` with a native function as to enable a batching rule to alter its behavior. This new function will be similar to `new_zeros_strided` except we also require the `storage_offset` and `storage_numel` arguments. Possible concerns: - Why have redundant logic? Why not add new args `new_zeros_strided`? This is probably a niche use case, so it's better not to complicate the current API. - Previously the created tensor inherits the TensorOptions of the primal. Now we inherit from the TensorOptions of the tangent. - Probably fine. Likely, no one relies on this because the behavior is only triggered when tangent/base have different layouts. - Why pass in exploded size, stride, and offset? It is possible in the non-batched case to pass in a tensor directly, but not possible when we'd like to have a batching rule. The size, stride, and offset we'd be passing won't belong to any live tensor. Test Plan: Imported from OSS Reviewed By: zou3519, albanD Differential Revision: D31842019 Pulled By: soulitzer fbshipit-source-id: a58433d814fd173bc43a2c550b395377dba40de2	2021-11-19 14:29:46 -08:00
Brian Hirsh	0032fa7725	Add a Functionalization pass in core (#64432 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64432 Original PR description + feedback here: https://github.com/pytorch/pytorch/pull/63048 I've addressed all of the feedback in the original PR and made some pretty large changes, listed below. Table of Contents - Starting points - List of the main changes from the original PR - Next Steps - Example codegen output (for a view, mutation, and view+mutation op) Starting Points A good place to start when looking through the PR: * Alban mentioned that this is a useful mental model (thanks Ed for originally making this clear to me). Semantically, the pass currently does THREE things, which are all needed by functorch - all fused together into one big pass. * (a) alias removal, which replaces {view} calls with {view}_copy calls, and manually tracks aliasing information, so that when one tensor is mutated, we re-apply the same mutation to all of the aliases. This is the bulk of the work - once this is done, the next 2 things are trivial to implement. * (b) mutation removal, which is easy to do once we know that there are no aliases. Every mutation `a.add_(b)` becomes `a.replace_(a.add(b))` * (c) reapplying views: all of the `{view}_copy` calls are replaced with `{view}` calls again. This is an optimization that we can make specifically for functorch (and strided backends), that only care about mutation removal and not alias removal * XLA and Vulkan only want (a), or (a) + (b). Later, we'll want to split this out so that you can actually opt into different versions of this logic. * There is currently no {view}_copy replacement, because the pass just <replace views with copies> and <replace copies with views> steps have been combined. Later, we'll want to actually implement {view}_copy variants of each view operator, probably with codegen. * documentation breadcrumb 1, in `FunctionalTensorWrapper.cpp`: https://github.com/pytorch/pytorch/pull/64432/files#diff-a0bac99bf205dba5b94cb64fc2466d3d55d991887572f9cd6a02e27b3a91dd60R59 (you might have to expand the `FunctionalTensorWrapper.cpp` file, which GitHub closes by default because it's large) * documentation breadcrumb 2, in `FunctionalTensorWrapper.h`: https://github.com/pytorch/pytorch/pull/64432/files#diff-c945c71a4ccac65871f24a912e8904f9a5088b24a32e636727ea9c8fe920708aR12 * Reading through the codegen output at the bottom of this description. Main changes from the original PR (1) I use lambdas instead of a giant enum to handle all of the different views. This results in less boilerplate per view op (and more stuff that can be codegen'd). Every `ViewMeta` object now contains a `forward` and `reverse` lambda, that knows how to replay the view and its inverse. This makes the actual code that executes the replaying logic a lot less boilerplate-y (see `Alias::sync_update_operations` and `FunctionalTensorWrapper::sync_`) (2) Every tensor during the functionalization pass is always wrapped in a `FunctionalTensorWrapper`. This is potentially unnecessary for Vulkan/XLA, and will have a mild perf impact, but for now this PR just targets the functorch use case. I previously had a complicated design a (`FunctionalTensorImplBase` class) to avoid needing the wrapper for XLA, but it had some subtleties that are gonna require more thought to fix, so I'm pushing that off for now. (3) `FunctionalTensorWrapper` objects accurately report stride information. It's a little annoying to do this though, because the logic that calculates stride info for each view isn't easily separated from the actual view kernels in core, `at::native::{view}`. I do this by adding logic in each `at::functionalization::{view}` kernel to call the reference implementation `at::native::{view}`. I don't do anything with the output aside from taking it's size/stride/storage_offset to set the actual output tensor's size/stride/storage_offset correctly. There's another annoying part to this: I'm pretty sure that we want to pass in the actual wrapper tensors directly into the native kernels, not their inner unwrapped values. But there are some `at::native::{view}` kernels that call other tensor methods, which re-invokes the dispatcher, calling functionalization/functorch kernels that try do the unwrapping. To do this, right now I have an `AutoDispatchDirectlyToNative` guard that basically ensures that any tensor methods called inside of the at::native::{view} op always redispatch straight to the CPU kernel (which will be another at::native:: kernel). This feels kind of heavy handed, but I'm not sure of a better way to do it. (4) `FunctionalTensorWrapper` objects accurately report aliasing information. There's a new `FunctionalStorageImpl` class (subclass of `StorageImpl`) that allows tensors in the functionalization pass to accurately alias storage. If two tensors `a` and `b` in a functionalized program are views of one another, then `a.storage.is_alias_of(b.storage)` should return true. I added this in a pretty similar way to how meta tensors allocate storage, although I don't pass in an actual allocator (I think this is fine because you should never resize a functional tensor's storage). One thing I'm not sure about - should `FunctionalTensorWrapper` set `storage_access_should_throw_`: (a) always, (b) never, (c) only if its wrapped tensor has it set. Right now I have it not set, mostly because calling the reference view functions (`at::native::{view}`) requires looking at the storage. But that means that if you try to access storage from python in a functionalized program, you'll get silent garbage instead of an error. Related question: are we planning on exposing meta tensor storage to python in the future (even though it contains garbage)? (5) better docs :) View operator coverage (6) The functionalization pass now gets math-composite view ops for free. I didn't add the `Functionalize` dispatch key to the composite set, because I don't want composite ops like `torch.ones` to get decomposed before hitting the functionalization pass. Instead, I added codegen to manually register the `at::native::` kernels of composite view ops. This is a little hairy, because the names of the `at::native::` kernels aren't easily accessible. They're stored in a `Dict[DispatchKey, BackendIndex]`. I made a best-effort attempt to get each view kernel's name, basically by assuming that every view op has either a composite or cpu implementation. There's also a hardcoded list of composite view ops in `gen_inplace_or_view_type.py`, but it looks like it's wrong. This is probably worth rationalizing later, but instead I created a new list of the "complete" set of composite view ops, and preserved the old set by hardcoding the delta between the two sets. (7) I've added codegen for ops that are both views AND mutations, like `transpose_()` (why do we even have these {emoji:1f622}). From some light testing, it looks like they work correctly with one caveat: I had a hard time ensuring that functorch programs that mutate their inputs using ops like `transpose_()` preserve the input mutations after the program finishes running. For (in my corresponding functorch branch) I emit a warning when this happens, and just don't preserve the mutation (8) I added `{view}_inverse` implementations for every view op, in `FunctionalInverses.cpp`. These are needed to take mutations made to views and replay them back onto the base. To reduce boilerplate, the codegen generates function declarations for each `{view}_inverse` function, so you get a nice compiler error when someone eventually adds a new view op. The only view ops currently not supported are (a) as_strided, and (b) the sparse view ops (values()/indices()). I can add support for as_strided, but it needs an `as_strided_inverse()` function. That will look really similar to the `as_strided_backward()` function in FunctionsManual.cpp, but it has some noticeable differences: we basically want an `as_strided_embed` for autograd and `as_strided_scatter` for functionalization. We also will probably need them to be primitives w.r.t to autograd, since the currently implementation for autograd uses view().copy_() calls that XLA won't be able to handle. I'm wondering if anyone has any objections, but otherwise I can make those change (which will require writing backward formulas for `as_strided_embed` and `as_strided_scatter`). I did a bunch of manual testing that all looks pretty good, but it's definitely not fully tested. Ed pointed out that once XLA uses this pass (or at least once there's a POC), we can just run the existing xla view test suite. Hopefully that delay is okay - if it's not, maybe we can think about using OpInfos similar to how functorch uses them for testing. Note: there's some duplication with autograd's view code. Every `{view}_inverse` implementation is really similar to the implementation for that view listed in `derivatives.yaml`. There are some major differences though: * the autograd implementations over those backwards functions (like `permute_backwards()`, in `FunctionsManual.cpp`) internally call other view ops. For functoinalization, we want them to (eventually call `{view}_copy` operators). * For view ops that take a subset of the original storage, like `slice/select/diagonal/as_strided()`, the autograd backward functions fill the "spaces" in the inverse call with zeroes. For functionalizations, we want to fill them with the value of `base` at those positions. It looks like this currently applies to 6 total ops (since we can ignore composites): * select * slice * diagonal * as_stridied * split * split_with_sizes A nice end state would probably be for the autograd + functoinalization codegen to both look at the same yaml (either `derivatives.yaml`, or something else), and automatically generate the right thing. I didn't leave that in scope for this PR though. Current State + Next Steps There are a bunch of followups after this PR eventually lands. Roughly in order: * Use the current pass to register problematic composite ops in functorch. Also, nested `functionalize()` calls aren't supported yet (I mostly just need to remove some debug asserts and test it). * Work on freeing up dispatch key space in the by deduplicating the `{backend}`/`Autograd{backend}`/`Sparse{backend}`/`Quantized{backend}` keys * Once we have more dispatch keys, split up this pass into 3 pieces - it's currently fused, and doesn't do the right thing for vulkan/XLA. Specifically, all of the `{view}` calls in the current pass's view-replay logic should turn into `{view}_copy` calls that vulkan/XLA know how to implement, and there will be separate passes for (a) removing mutations, and (b) turning `{view}_copy` calls back into `{view}` calls. For Vulkan, we eventually want a pass that ONLY removes aliasing and view calls, and doesn't remove mutations. We can also probably make the 2 new passes user dispatch keys to save dispatch key space, if they'll only be used by functorch anyway. * Do more of a dive on perf for the vulkan/xla use cases. There are several areas to improve perf with varying levels of effort required. The simplest one that I'll probably do regardless is to codegen the out-of-place kernels instead of using a boxed fallback. Getting a POC working for xla will also be useful to test the view operator coverage. Example Codegen Output View Op: ``` ::std::vector<at::Tensor> split_Tensor(c10::DispatchKeySet ks, const at::Tensor & self, int64_t split_size, int64_t dim) { auto self_ = at::functionalization::impl::unwrapFunctionalTensor(self); ::std::vector<at::Tensor> out; { at::AutoDispatchBelowFunctionalize guard; auto tmp_output = at::redispatch::split(ks & c10::after_func_keyset, self_, split_size, dim); out = at::functionalization::impl::wrapFunctionalTensor(tmp_output); // I'm fusing the [alias removal], [mutation removal], [add views back] passes together. // Later, we'll want to turn them into separate passes (since e.g. vulkan only cares about alias removal). } at::functionalization::ViewMeta view_meta = at::functionalization::ViewMeta( [split_size, dim](const at::Tensor& base, int64_t mutated_view_idx) -> at::Tensor { return base.split(split_size, dim)[mutated_view_idx]; }, [split_size, dim](const at::Tensor& base, const at::Tensor& mutated_view, int64_t mutated_view_idx) -> at::Tensor { return at::functionalization::impl::split_inverse(base, mutated_view, mutated_view_idx, split_size, dim); } ); at::functionalization::impl::set_view_meta(out, self, view_meta); at::AutoDispatchDirectlyToNative native_guard; ::std::vector<at::Tensor> reference_tensor_output = at::native::split(self, split_size, dim); at::functionalization::impl::set_strides(out, reference_tensor_output); return out; } ``` Mutation Op: ``` at::Tensor & add__Tensor(c10::DispatchKeySet ks, at::Tensor & self, const at::Tensor & other, const at::Scalar & alpha) { at::functionalization::impl::sync(self); at::functionalization::impl::sync(other); auto self_ = at::functionalization::impl::unwrapFunctionalTensor(self); auto other_ = at::functionalization::impl::unwrapFunctionalTensor(other); at::Tensor tmp_output; { at::AutoDispatchBelowFunctionalize guard; // The functionalization pass explicitly doesn't pass out= parameters to the redispatch tmp_output = at::redispatch::add( ks & c10::after_func_keyset, self_, other_, alpha); } self.replace_(tmp_output); at::functionalization::impl::maybe_add_update(self); return self; } ``` View + Mutation Op: ``` at::Tensor & transpose_(c10::DispatchKeySet ks, at::Tensor & self, int64_t dim0, int64_t dim1) { at::functionalization::ViewMeta view_meta = at::functionalization::ViewMeta( [dim0, dim1](const at::Tensor& base, int64_t mutated_view_idx) -> at::Tensor { return base.transpose(dim0, dim1); }, [dim0, dim1](const at::Tensor& base, const at::Tensor& mutated_view, int64_t mutated_view_idx) -> at::Tensor { return at::functionalization::impl::transpose_inverse(base, mutated_view, dim0, dim1); } ); at::functionalization::impl::mutate_view_meta(self, view_meta); // See Note [Propagating strides in the functionalization pass] // Directly update the sizes/strides/storage_offset fields on self using the inplace call. // I need the guard because I don't want the at::native kernel to end up calling more functionalization/functorch kernels. // Its only job is to directly compute the output size/stride/storage_offset metadata. at::AutoDispatchDirectlyToNative native_guard; at::native::transpose_(self, dim0, dim1); return self; } ``` Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D31942093 Pulled By: bdhirsh fbshipit-source-id: b95598dae35dd1842fa8b1d8d1448332f3afaadf	2021-10-28 10:51:17 -07:00
Brian Hirsh	665c148e42	move some codegen utilities into utils.py (#63094 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63094 This PR: - Moves `FileManager` and its dependencies (`assert_never` and other imports) to `utils.py`, and updates all of the call-sites with the fresh imports - Passes the list of NativeFunction objects into `gen_trace_type` directly, instead of requiring the function to regenerate it (we already have it) The purpose of the reshuffling is to avoid circular dependencies in the next PR, where I add codegen for the functionalization pass, which gets called from `gen.py` (but depends on some stuff from the autograd codegen - in partulcar, the list of view ops). Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D31942096 Pulled By: bdhirsh fbshipit-source-id: 36118facae61f25f8922bb43ad2818c80b53504e	2021-10-28 10:49:17 -07:00
lezcano	82a216c45b	Add tensor.{adjoint(),H,mT,mH} methods and properties (#64179 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64179 This PR follows the discussion in https://github.com/pytorch/pytorch/issues/45063#issuecomment-904431478 Fixes https://github.com/pytorch/pytorch/issues/45063 cc ezyang anjali411 dylanbespalko mruberry Lezcano nikitaved rgommers pmeier asmeurer leofang AnirudhDagar asi1024 emcastillo kmaehashi heitorschueroff Test Plan: Imported from OSS Reviewed By: bertmaher Differential Revision: D30730483 Pulled By: anjali411 fbshipit-source-id: 821d25083f5f682450f6812bf852dc96a1cdf9f2	2021-10-13 07:44:43 -07:00
Peter Bell	44ede71751	Shard python_torch_functions.cpp (#62187 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62187 This file can take 3 minutes on its own to compile, and after python_functions.cpp is the second limiting factor for compile time of `libtorch_python` on a 32-core threadripper. This splits it into 3 files that take around 1 minute each to compile. Test Plan: Imported from OSS Reviewed By: H-Huang Differential Revision: D29962048 Pulled By: albanD fbshipit-source-id: 99016d75912bff483fe21b130cef43a6882f8c0e	2021-08-25 15:10:43 -07:00
Richard Zou	389380ffcc	[reland] Refactor Tensor::to to call a primitive that is not copy_. (#62262 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62262 Context ------- functorch is unable to vmap(grad(f)) when f contains a .to call. This is because .to (when it is not a no-op) decomposes to .copy_ under grad and the .copy_ is not compatible with vmap. Fix --- The fix for this is to have all Tensor::to variants call a new operator, `_to_copy`, that always copies and is a primitive w.r.t. autograd so that autograd decomposes Tensor::to into a call to `_to_copy`. (This is related to https://github.com/pytorch/pytorch/issues/60956, please let me know if you want to bikeshed the naming). In order to get this done I had to do a bit of refactoring. All of the `::to` implementations now call `to_impl` which may call `_to_copy`. Autograd codegen changes ------------------------ The second thing I had to do was modify the autograd codegen. Right now, autograd assumes that every output is either statically known to be differentiable or not differentiable at codegen time. `_to_copy` is a little special because its differentiability depends on the output dtype. e.g. `torch.randn(3, requires_grad=True).to(torch.long)` is non differentiable. To get this to work: - I changed how `output_differentiability` in derivatives.yaml work. - output_differentiability can now accept "conditions" for each of the output arguments. A "condition" is some C++ code. - We currently only support `output_differentiability` with conditions if there is a single output. This is for convenience and can be changed in the future. - I added a new `output_differentiability_conditions` field to DifferentiabilityInfo. This gets populated in load_derivatives.yaml - forward-mode and reverse-mode AD take `output_differentiability_conditions` into account. Here's how the generated code for `VariableType::_to_copy` [looks like](https://gist.github.com/zou3519/93462df4bda1837acee345205b7cc849) No other autogenerated code gets modified by this PR. Performance benchmarking ------------------------ - I benchmarked [three cases that demonstrate overhead](https://gist.github.com/zou3519/5b6985e6906b80eec5a0dd94ed5b6a1a). - Case A: No-op .to(). Instruction count went from 50223 to 25623. I have no clue why but this is a good thing. - Case B: not-no-op .to(). Instruction count went from 665291 to 671961. This is expected; `_to_copy` adds an additional dispatch. - Case C: not-no-op .to() forward pass and backward pass. Instruction count went from 4022841 to 4030057. This PR adds an additional dispatch to .to() (so there should be one additional dispatch in the forward pass) so this number looks reasonable. Test Plan --------- - test_torch.py has a test_to - test_cuda.py has test_to* - test_autograd has tests (test_type_conversions) that exercise the reverse-mode path - test_ops.py has some tests (like log_softmax) that exercise the reverse-mode and forward-mode AD path. - test_quantization, test_namedtensor all exercise tensor.to as well. Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D29934998 Pulled By: zou3519 fbshipit-source-id: 820069acd66fd5af97b98f42edfca68572c9eb1c	2021-07-29 10:49:32 -07:00
Nikita Shulga	478098aaac	Revert D29801652: Refactor Tensor::to to call a primitive that is not copy_. Test Plan: revert-hammer Differential Revision: D29801652 (`29bb3f4647`) Original commit changeset: bb01eb1acf3d fbshipit-source-id: 93693bad8068d47a3a4c16f34f300e03ea573897	2021-07-26 19:37:17 -07:00
Richard Zou	29bb3f4647	Refactor Tensor::to to call a primitive that is not copy_. (#61458 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61458 Context ------- functorch is unable to vmap(grad(f)) when f contains a .to call. This is because .to (when it is not a no-op) decomposes to .copy_ under grad and the .copy_ is not compatible with vmap. Fix --- The fix for this is to have all Tensor::to variants call a new operator, `_to_copy`, that always copies and is a primitive w.r.t. autograd so that autograd decomposes Tensor::to into a call to `_to_copy`. (This is related to https://github.com/pytorch/pytorch/issues/60956, please let me know if you want to bikeshed the naming). In order to get this done I had to do a bit of refactoring. All of the `::to` implementations now call `to_impl` which may call `_to_copy`. Autograd codegen changes ------------------------ The second thing I had to do was modify the autograd codegen. Right now, autograd assumes that every output is either statically known to be differentiable or not differentiable at codegen time. `_to_copy` is a little special because its differentiability depends on the output dtype. e.g. `torch.randn(3, requires_grad=True).to(torch.long)` is non differentiable. To get this to work: - I changed how `output_differentiability` in derivatives.yaml work. - output_differentiability can now accept "conditions" for each of the output arguments. A "condition" is some C++ code. - We currently only support `output_differentiability` with conditions if there is a single output. This is for convenience and can be changed in the future. - I added a new `output_differentiability_conditions` field to DifferentiabilityInfo. This gets populated in load_derivatives.yaml - forward-mode and reverse-mode AD take `output_differentiability_conditions` into account. Here's how the generated code for `VariableType::_to_copy` [looks like](https://gist.github.com/zou3519/93462df4bda1837acee345205b7cc849) No other autogenerated code gets modified by this PR. Performance benchmarking ------------------------ - I benchmarked [three cases that demonstrate overhead](https://gist.github.com/zou3519/5b6985e6906b80eec5a0dd94ed5b6a1a). - Case A: No-op .to(). Instruction count went from 50223 to 25623. I have no clue why but this is a good thing. - Case B: not-no-op .to(). Instruction count went from 665291 to 671961. This is expected; `_to_copy` adds an additional dispatch. - Case C: not-no-op .to() forward pass and backward pass. Instruction count went from 4022841 to 4030057. This PR adds an additional dispatch to .to() (so there should be one additional dispatch in the forward pass) so this number looks reasonable. Test Plan --------- - test_torch.py has a test_to - test_cuda.py has test_to* - test_autograd has tests (test_type_conversions) that exercise the reverse-mode path - test_ops.py has some tests (like log_softmax) that exercise the reverse-mode and forward-mode AD path. - test_quantization, test_namedtensor all exercise tensor.to as well. Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D29801652 Pulled By: zou3519 fbshipit-source-id: bb01eb1acf3d79d84f284150d1be4be3b4ace351	2021-07-26 13:02:39 -07:00
Laurence Rouesnel	adb73d3dcf	Removed overhead from reshape() call if tensor doesn't need to be changed (#61466 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61466 ## Goal Per #55126 the performance of `reshape` is worse than `alias` in cases where they are performing the same operation (i.e. where reshape is returning a view) because `reshape` delegates to `view` and duplicates some of the operations (specifically `infer_size_dv` and `computeStride`). The goal of this pull-request is to reduce or remove the additional overhead that `reshape` has. ### Proposed Implementation Instead of using `view` we implement a private/internal operator (`_reshape_alias`) that `reshape` dispatches to which skips the relevant checks. This is functionally equivalent to `as_strided` however it is a lot simpler because it's specialized to this use-case, and importantly the `backward` implementation is a lot faster. Note that we have to dispatch (`reshape` is a composite operator) because `reshape` can return either a view or a copy of the Tensor depending on the parameters, and this complicates implementing a derivative/backward for `reshape`. ### Why not `as_strided`? Using `as_strided` directly slows down autograd. If we use a custom function equivalent to `_reshape_alias` but with a simpler backward function then `view` has the same performance as `reshape`. If we delegate to `as_strided` it is about 56% slower (and this holds against our custom function). This is also the reason we make an internal operator named `_reshape_alias` instead of exposing a new operator since this should only be used in the `reshape` case and it is effectively a more limited version of `view`, `alias`, and `as_strided`. ## Benchmarks In a micro-benchmark for `backward` running: ```cpp // Setup at::Tensor x=torch::empty({2,2}, torch::requires_grad(true)); // Benchmark loop // `reshape(-1)` replaced with a call to view(-1) for view baseline x.pow(4).reshape(-1).mean().backward(); ``` I also benchmarked simple operations without gradients using: ```cpp // Setup at::Tensor x=torch::empty({2,2}, torch::requires_grad(true)); // Benchmark loop x.reshape(-1) // replaced with a call to view(-1) for view baseline ``` Baselined to `view`: * Original `reshape`: `+3.3%` (without gradients `+20.8%`) * Using `as_strided`: `+55.1%` (without gradients `+1.0%`) * Using custom `_reshape_view`: `-1.0%` (without gradients `+6.2%`) In absolute terms (note the percentages above were generated comparing between runs/tests rather than to a single baseline): * Original `view`: `53.66 us` (without gradients `582.78 ns`) * Original `reshape`: `55.46 us` (without gradients `704.24 ns`) * Using `as_strided`: `83.24 us` (without gradients `576.49 ns`) * Using custom `_reshape_view`: `53.13 us` (without gradients `536.01 ns`) Note that these benchmarks perform a backwards operation as well. When compared without using gradient computation at all the performance differneces are more pronounced as this takes up more of the time. ### Original performance <details> <summary>Benchmark results</summary> ``` [<torch.utils.benchmark.utils.common.Measurement object at 0x7f0e4d393160> x.pow(4).view(-1).mean().backward(); setup: at::Tensor x=torch::empty({2,2}, torch::requires_grad(true)); Median: 53.66 us IQR: 2.70 us (52.54 to 55.24) 884 measurements, 100 runs per measurement, 1 thread] [<torch.utils.benchmark.utils.common.Measurement object at 0x7f0e2ebd4fa0> x.pow(4).reshape(-1).mean().backward(); setup: at::Tensor x=torch::empty({2,2}, torch::requires_grad(true)); Median: 55.46 us IQR: 2.61 us (54.39 to 57.01) 889 measurements, 100 runs per measurement, 1 thread] 2276116 2286256 <torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts object at 0x7f0e5b2e3e20> 2640 ???:at::detail::computeStride(c10::ArrayRef<long>, c10::ArrayRef<long>, c10::SmallVector<long, 5u> const&) 1920 ???:at::native::reshape(at::Tensor const&, c10::ArrayRef<long>) 1520 ???:at::_ops::reshape::call(at::Tensor const&, c10::ArrayRef<long>) 1040 ???:c10::SmallVectorImpl<long>::operator=(c10::SmallVectorImpl<long>&&) 980 ???:void at::infer_size_impl<c10::SmallVector<long, 5u> >(c10::ArrayRef<long>, long, c10::SmallVector<long, 5u>&) 720 ???:__tls_get_addr 520 ???:at::shouldRunRecordFunction(bool) 520 ???:__memcpy_avx_unaligned_erms 200 ???:c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10:: ... g>)>::call(c10::OperatorKernel, c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) 100 ???:c10::TensorImpl::strides() const 100 ???:c10::TensorImpl::sizes() const 100 ???:at::(anonymous namespace)::manager() 77 /tmp/benchmark_utils_jit_build__1626465284__8a34e7ff-cd37-4a82-be28-7f19e081e771/timer_cpp_7815557938202456331/timer_src.cpp:main 40 ???:c10::TensorImpl::numel() const -77 /tmp/benchmark_utils_jit_build__1626465284__8a34e7ff-cd37-4a82-be28-7f19e081e771/timer_cpp_8055217880649990171/timer_src.cpp:main -260 ???:at::native::view(at::Tensor const&, c10::ArrayRef<long>) Total: 10140 ``` ``` [<torch.utils.benchmark.utils.common.Measurement object at 0x7f850dd66c10> x.view(-1); setup: at::Tensor x=torch::empty({2,2}); Median: 582.78 ns IQR: 33.80 ns (573.80 to 607.61) 833 measurements, 10000 runs per measurement, 1 thread] [<torch.utils.benchmark.utils.common.Measurement object at 0x7f850de31e20> x.reshape(-1); setup: at::Tensor x=torch::empty({2,2}); Median: 704.24 ns IQR: 24.42 ns (697.20 to 721.62) 679 measurements, 10000 runs per measurement, 1 thread] 56896 67036 <torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts object at 0x7f84e1930bb0> 2640 ???:at::detail::computeStride(c10::ArrayRef<long>, c10::ArrayRef<long>, c10::SmallVector<long, 5u> const&) 1920 ???:at::native::reshape(at::Tensor const&, c10::ArrayRef<long>) 1520 ???:at::_ops::reshape::call(at::Tensor const&, c10::ArrayRef<long>) 1040 ???:c10::SmallVectorImpl<long>::operator=(c10::SmallVectorImpl<long>&&) 980 ???:void at::infer_size_impl<c10::SmallVector<long, 5u> >(c10::ArrayRef<long>, long, c10::SmallVector<long, 5u>&) 720 ???:__tls_get_addr 520 ???:at::shouldRunRecordFunction(bool) 520 ???:__memcpy_avx_unaligned_erms 200 ???:c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10:: ... g>)>::call(c10::OperatorKernel, c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) 100 ???:c10::TensorImpl::strides() const 100 ???:c10::TensorImpl::sizes() const 100 ???:at::(anonymous namespace)::manager() 76 /tmp/benchmark_utils_jit_build__1626466038__15fbbac0-2072-4459-8f8e-08121a905b99/timer_cpp_547407365342278353/timer_src.cpp:main 40 ???:c10::TensorImpl::numel() const -76 /tmp/benchmark_utils_jit_build__1626466038__15fbbac0-2072-4459-8f8e-08121a905b99/timer_cpp_3457873755756181226/timer_src.cpp:main -260 ???:at::native::view(at::Tensor const&, c10::ArrayRef<long>) Total: 10140 ``` </details> ### Using `as_strided` <details> <summary>Benchmark results</summary> ``` [<torch.utils.benchmark.utils.common.Measurement object at 0x7f8b13bb5b50> x.pow(4).view(-1).mean().backward(); setup: at::Tensor x=torch::empty({2,2}, torch::requires_grad(true)); Median: 53.37 us IQR: 3.15 us (51.73 to 54.88) 936 measurements, 100 runs per measurement, 1 thread] [<torch.utils.benchmark.utils.common.Measurement object at 0x7f8af55f8490> x.pow(4).reshape(-1).mean().backward(); setup: at::Tensor x=torch::empty({2,2}, torch::requires_grad(true)); Median: 83.24 us IQR: 4.05 us (81.20 to 85.25) 609 measurements, 100 runs per measurement, 1 thread] 2267916 2525061 <torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts object at 0x7f8af55f8e50> 31930 ???:_int_free 15940 ???:malloc 11595 ???:_int_malloc 10100 ???:torch::autograd::generated::details::as_strided_backward(at::Tensor, at::TensorGeometry, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>) 9360 ???:__tls_get_addr 8280 ???:free 8100 ???:torch::autograd::VariableType::(anonymous namespace)::as_strided(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>) 4520 ???:c10::intrusive_ptr<c10::TensorImpl, c10::UndefinedTensorImpl>::reset_() 4080 ???:operator new(unsigned long) ... -780 ???:at::_ops::view::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) -920 ???:c10::SmallVectorImpl<long>::operator=(c10::SmallVectorImpl<long> const&) -1220 ???:torch::autograd::generated::ViewBackward::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&) -1520 ???:at::_ops::view::call(at::Tensor const&, c10::ArrayRef<long>) -1580 ???:torch::ADInplaceOrView::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) -1680 ???:at::Tensor at::native::alias_with_sizes_and_strides<c10::SmallVector<long, 5u> >(at::Tensor const&, c10::SmallVector<long, 5u> const&, c10::SmallVector<long, 5u> const&) -2560 ???:at::detail::computeStride(c10::ArrayRef<long>, c10::ArrayRef<long>, c10::SmallVector<long, 5u> const&) -2640 ???:at::native::view(at::Tensor const&, c10::ArrayRef<long>) -4860 ???:torch::autograd::VariableType::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) Total: 257145 ``` ``` [<torch.utils.benchmark.utils.common.Measurement object at 0x7f93176a0160> x.view(-1); setup: at::Tensor x=torch::empty({2,2}); Median: 570.55 ns IQR: 32.69 ns (552.87 to 585.56) 874 measurements, 10000 runs per measurement, 1 thread] [<torch.utils.benchmark.utils.common.Measurement object at 0x7f92f8f29490> x.reshape(-1); setup: at::Tensor x=torch::empty({2,2}); Median: 576.49 ns IQR: 37.95 ns (559.51 to 597.46) 861 measurements, 10000 runs per measurement, 1 thread] 56896 58556 <torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts object at 0x7f932556ca60> 2140 ???:at::native::reshape(at::Tensor const&, c10::ArrayRef<long>) 1940 ???:torch::autograd::VariableType::(anonymous namespace)::as_strided(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>) 1880 ???:torch::ADInplaceOrView::(anonymous namespace)::as_strided(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>) 1720 ???:at::_ops::as_strided::call(at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>) 1520 ???:at::_ops::reshape::call(at::Tensor const&, c10::ArrayRef<long>) 1400 ???:at::native::as_strided_tensorimpl(at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>) 1260 ???:at::_ops::as_strided::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>)'2 1260 ???:at::_ops::as_strided::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>) 980 ???:void at::infer_size_impl<c10::SmallVector<long, 5u> >(c10::ArrayRef<long>, long, c10::SmallVector<long, 5u>&) ... -620 ???:at::Tensor c10::Dispatcher::redispatch<at::Tensor, at::Tensor const&, c10::ArrayRef<long ... ::ArrayRef<long>)> const&, c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) const -780 ???:at::_ops::view::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)'2 -780 ???:at::_ops::view::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) -920 ???:c10::SmallVectorImpl<long>::operator=(c10::SmallVectorImpl<long> const&) -1520 ???:at::_ops::view::call(at::Tensor const&, c10::ArrayRef<long>) -1580 ???:torch::ADInplaceOrView::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) -1680 ???:at::Tensor at::native::alias_with_sizes_and_strides<c10::SmallVector<long, 5u> >(at::Tensor const&, c10::SmallVector<long, 5u> const&, c10::SmallVector<long, 5u> const&) -1740 ???:torch::autograd::VariableType::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) -2640 ???:at::native::view(at::Tensor const&, c10::ArrayRef<long>) Total: 1660 ``` </details> ### Using custom function (`_reshape_alias`) <details> <summary>Benchmark results</summary> ``` [<torch.utils.benchmark.utils.common.Measurement object at 0x7f16861d6b50> x.pow(4).view(-1).mean().backward(); setup: at::Tensor x=torch::empty({2,2}, torch::requires_grad(true)); Median: 53.50 us IQR: 2.64 us (52.32 to 54.96) 906 measurements, 100 runs per measurement, 1 thread] [<torch.utils.benchmark.utils.common.Measurement object at 0x7f1667b2ed60> x.pow(4).reshape(-1).mean().backward(); setup: at::Tensor x=torch::empty({2,2}, torch::requires_grad(true)); Median: 53.13 us IQR: 3.40 us (51.72 to 55.13) 914 measurements, 100 runs per measurement, 1 thread] 2269736 2273236 <torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts object at 0x7f1693f8dc10> 5060 ???:torch::autograd::VariableType::(anonymous namespace)::_reshape_alias(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>) 2000 ???:at::native::reshape(at::Tensor const&, c10::ArrayRef<long>) 1780 ???:torch::ADInplaceOrView::(anonymous namespace)::_reshape_alias(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>) 1660 ???:at::_ops::_reshape_alias::call(at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>) 1600 ???:at::Tensor at::native::alias_with_sizes_and_strides<c10::ArrayRef<long> >(at::Tensor const&, c10::ArrayRef<long> const&, c10::ArrayRef<long> const&) 1520 ???:at::_ops::reshape::call(at::Tensor const&, c10::ArrayRef<long>) 1240 ???:at::_ops::_reshape_alias::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>)'2 1240 ???:at::_ops::_reshape_alias::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>) 1220 ???:torch::autograd::generated::AliasToShapeBackward::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&) ... -780 ???:at::_ops::view::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)'2 -780 ???:at::_ops::view::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) -920 ???:c10::SmallVectorImpl<long>::operator=(c10::SmallVectorImpl<long> const&) -1220 ???:torch::autograd::generated::ViewBackward::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&) -1520 ???:at::_ops::view::call(at::Tensor const&, c10::ArrayRef<long>) -1580 ???:torch::ADInplaceOrView::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) -1680 ???:at::Tensor at::native::alias_with_sizes_and_strides<c10::SmallVector<long, 5u> >(at::Tensor const&, c10::SmallVector<long, 5u> const&, c10::SmallVector<long, 5u> const&) -2640 ???:at::native::view(at::Tensor const&, c10::ArrayRef<long>) -4860 ???:torch::autograd::VariableType::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) Total: 3500 ``` ``` [<torch.utils.benchmark.utils.common.Measurement object at 0x7f5287adfb20> x.view(-1); setup: at::Tensor x=torch::empty({2,2}); Median: 505.10 ns IQR: 20.04 ns (500.41 to 520.45) 944 measurements, 10000 runs per measurement, 1 thread] [<torch.utils.benchmark.utils.common.Measurement object at 0x7f526951b430> x.reshape(-1); setup: at::Tensor x=torch::empty({2,2}); Median: 536.01 ns IQR: 17.81 ns (531.34 to 549.16) 916 measurements, 10000 runs per measurement, 1 thread] 56896 60376 <torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts object at 0x7f5295896c10> 2000 ???:at::native::reshape(at::Tensor const&, c10::ArrayRef<long>) 1860 ???:torch::autograd::VariableType::(anonymous namespace)::_reshape_alias(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>) 1780 ???:torch::ADInplaceOrView::(anonymous namespace)::_reshape_alias(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>) 1660 ???:at::_ops::_reshape_alias::call(at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>) 1600 ???:at::Tensor at::native::alias_with_sizes_and_strides<c10::ArrayRef<long> >(at::Tensor const&, c10::ArrayRef<long> const&, c10::ArrayRef<long> const&) 1520 ???:at::_ops::reshape::call(at::Tensor const&, c10::ArrayRef<long>) 1240 ???:at::_ops::_reshape_alias::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>)'2 1240 ???:at::_ops::_reshape_alias::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>) 980 ???:void at::infer_size_impl<c10::SmallVector<long, 5u> >(c10::ArrayRef<long>, long, c10::SmallVector<long, 5u>&) ... -620 ???:at::Tensor c10::Dispatcher::redispatch<at::Tensor, at::Tensor const&, c10::ArrayRef<long ... ::ArrayRef<long>)> const&, c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) const -780 ???:at::_ops::view::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)'2 -780 ???:at::_ops::view::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) -920 ???:c10::SmallVectorImpl<long>::operator=(c10::SmallVectorImpl<long> const&) -1520 ???:at::_ops::view::call(at::Tensor const&, c10::ArrayRef<long>) -1580 ???:torch::ADInplaceOrView::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) -1680 ???:at::Tensor at::native::alias_with_sizes_and_strides<c10::SmallVector<long, 5u> >(at::Tensor const&, c10::SmallVector<long, 5u> const&, c10::SmallVector<long, 5u> const&) -1740 ???:torch::autograd::VariableType::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) -2640 ???:at::native::view(at::Tensor const&, c10::ArrayRef<long>) Total: 3480 ``` </details> Test Plan: Imported from OSS Reviewed By: ejguan Differential Revision: D29792126 Pulled By: laurencer fbshipit-source-id: f0519b45b65f868aa3e8651679354558bd761dfd	2021-07-21 14:05:35 -07:00
albanD	d03ff1a17d	pre compute regex and match simple signature autograd codegen 15s -> 12s (#59852 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59852 This whole stack does not change anything to the codegened code Test Plan: Imported from OSS Reviewed By: ezyang Differential Revision: D29063814 Pulled By: albanD fbshipit-source-id: a751047526f8d58f4760ee6f9ae906675bed5d75	2021-06-12 06:58:36 -07:00
albanD	30a18fe318	refactor yaml loader import, no runtime change (#59850 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59850 This whole stack does not change anything to the codegened code Test Plan: Imported from OSS Reviewed By: ailzhang Differential Revision: D29063816 Pulled By: albanD fbshipit-source-id: ca3067443d8e6282c1077d3dafa3b4f330d43b28	2021-06-12 06:58:34 -07:00
albanD	7143a6a189	Avoid unnecessary re-computation autograd codegen 21s -> 15s (#59847 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59847 This whole stack does not change anything to the codegened code Test Plan: Imported from OSS Reviewed By: ailzhang Differential Revision: D29063817 Pulled By: albanD fbshipit-source-id: 284c3e057029b7a67f43a1b034bb30863bd68c71	2021-06-12 06:57:19 -07:00
Kshiteej K	c90260905f	[fix] torch.{lin, log}space(): properly examine passed dtype (#53685 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/53171 Pull Request resolved: https://github.com/pytorch/pytorch/pull/53685 Reviewed By: jbschlosser Differential Revision: D28331863 Pulled By: anjali411 fbshipit-source-id: e89359b607d058158cfa1c9a82389d9a4a71185b	2021-06-10 11:59:54 -07:00
Jeffrey Wan	f52e202840	Add warning when accessing Tensor::grad() in the C++ API (#59362 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/35379 - Adds `retains_grad` attribute backed by cpp as a native function. The python bindings for the function are skipped to be consistent with `is_leaf`. - Tried writing it without native function, but the jit test `test_tensor_properties` seems to require that it be a native function (or alternatively maybe it could also work if we manually add a prim implementation?). - Python API now uses `retain_grad` implementation from cpp Pull Request resolved: https://github.com/pytorch/pytorch/pull/59362 Reviewed By: jbschlosser Differential Revision: D28969298 Pulled By: soulitzer fbshipit-source-id: 335f2be50b9fb870cd35dc72f7dadd6c8666cc02	2021-06-08 19:43:21 -07:00
Brian Hirsh	9354a68e7d	[codegen] split out backend-specific information from NativeFunction in the model (#57361 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57361 Data model change in the codegen, which splits backend-specific information out of `NativeFunction` ### Overview Currently in the codegen, native_functions.yaml has backend-specific information about each operator that is encoded directly into the data model, in the `NativeFunction` object. That's reasonable, since the native_functions.yaml is the source of truth for information about an operator, and the data model encodes that information into types. Now that external backends can use the codegen though, that information is technically incomplete/inaccurate. In another PR, I tried patching the information on the `NativeFunction` object with the additional external information, by updating the `dispatch` entry to contain the external backend kernel name and dispatch key. Instead, this PR tries to split out that information. The `NativeFunction` class contains all information about an operator from native_functions.yaml that's backend-independent and is known never to change regardless of what extra information backends provide. We also build up a backend "index", which is basically a mapping from [backend] -> [backend-specific-metadata]. Reading in an external backend yaml just involves updating that index with the new backend. There were a few places where `NativeFunction` used the dispatch table directly, that I encoded as properties directly on the NativeFunction object (e.g. `is_abstract`). They were mostly around whether or not the operator has a composite kernel, which isn't something that's going to change for any external backends. This has a few advantages: - We can more easily re-use the existing logic in `native_function.py` and `register_dispatch_key.py` for both native and external backends, since they both involve a NativeFunction + a particular backend index - The data in the data model will be the same regardless of how the codegen is run. Running the codegen with a new external backend doesn't change the data inside of NativeFunction or an existing backend index. It just adds a new index for that backend. - There are several of codegen areas that don't care about backend-specific information: mostly the tracing and autograd codegen. We can reason about the codegen there more easily, knowing that backend-specific info is entirely uninvolved. An alternative to this split would be to augment the NativeFunction objects with external backend information at the time that we create them. So the external codegen could read both native_functions.yaml and the external backend's yaml at the same time, and construct a NativeObject with a full dispatch table (including the XLA entry), and the correct setting of structured (taking into account both yamls). One disadvantage to this approach is that NativeFunction objects now contain different stuff depending on how you ran the codegen, and you have to make sure that any changes to the codegen can properly handle all the different variants. ### Data Model Changes Removed 3 classes, which are used by the external codegen: - ExternalBackendFunction - ExternalBackendFunctionsGroup - ExternalBackendMetadata And added two new ones: - BackendIndex - BackendMetadata `BackendIndex` contains any info that's specific to that backend, plus a mapping from operator names to backend specific metadata about the operator. One example of backend-specific info that's not operator-dependent is the fact that XLA prefers to implement functional kernels instead of out kernels (and so when they eventually mark an op as structured, they're going to mark the functional op and not the out op). `BackendMetadata` contains info specific to an (operator, backend) pair. Right now, that's just (a) the name of the kernel, and (b) whether or not that operator is structured. ### Questions I wanted to get this PR up earlier so I could get feedback, but there are a few things I want to call out: Dealing with `structured`. This PR separates out the notion of `structured` into two bits of information: - Does [operator] have a meta() function. This is backend-agnostic, and is represented by the `structured` property on `NativeFunction`, same as before. This is used, e.g., to decide what signatures to add to `MetaFunctions.h`. - Does [operator, backend] have an impl() function. This is backend dependent; even though technically all in-tree backends are forced to write impl() functions for an operator when we port the op to structured in native_functions.yaml, out-of-tree backends can decide to opt in independently. This is represented as a property on `BackendMetadata`. This is used in most other cases, e.g. in `RegisterDispatchKey` when we're deciding whether or not to gen a structured or unstructured wrapper. I also baked `is_structured_dispatch_key` directly into each BackendIndex. So for operators marked "structured" in native_functions.yaml, their corresponding CPU/CUDA BackendIndex entries will be marked structured, and all others (except for potentially external backends) will not. I ended up trying to deal with `structured` in this change since it's technically backend dependent (XLA can opt kernels into structured separately from in-tree ops), but that may have been too ambitious: it's technically not relevant until we actually add support for structured external kernels. If it's not clear that this is the right path for dealing with structured and we want to push that off, I'm fine with backing out the bits of this PR that make `structured` backend-dependent. I don't see anything too controversial related to structured in the change, but I tried to call out any areas in the comments Localizing the fact that external backends follow Dispatcher convention. Another thing that's sort of backend specific that I didn't totally address in this PR is the fact the fact that in-tree backends follow the Native API while external backends follow the Dispatcher API. I painted over that in `native_functions.py` by adding a helper, `kernel_signature`, that takes in a native function and gives you the "correct" signature for the specified backend- NativeSignature for in-tree backends, and DispatcherSignature for out-of-tree backends. In order to make that fully useable though, we'll need `NativeSignature` and `DispatcherSignature` to have matching interfaces. I didn't bother with that in this PR, which is why `gen_external_aten_fallbacks.py` still has a bunch of direct references to the dispatcher API. Thinking of adding it in a later PR but wanted to see if anyone has other opinions. Maybe `is_external()` shouldn't even be a property on the BackendMetadata, and anything the codegen does that requires asking for that information should just be better abstracted away. Thoughts on the `BackendIndex` / `BackendMetadata` breakdown. One thing that's annoying right now is that to query for various pieces of metadata, you call helper functions like `backend_index.structured(f)`, which queries that particular backend and tells you if that specific NativeFunctionGroup is structured for that backend. It has to return an `Optional[bool]` though, since you have to handle the case where that operator doesn't have a kernel for that backend at all. So users of those helpers end up with a bunch of optionals that they need to unpack, even if they know at some point that the result isn't None. I think it would be easier instead to just store the NativeFunction object as a field directly on the BackendMetadata. Curious if there are any other opinions on a better way to model it though. Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D28474362 Pulled By: bdhirsh fbshipit-source-id: 41a00821acf172467d764cb41e771e096542f661	2021-05-17 12:25:35 -07:00
Ilqar Ramazanli	8b816e9010	To implement gradient for Pytorch (#54617 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/56129 Pull Request resolved: https://github.com/pytorch/pytorch/pull/54617 Reviewed By: anjali411 Differential Revision: D28057452 Pulled By: iramazanli fbshipit-source-id: 9bd86679282d34f5e5393e6447121586517eb4f0	2021-05-11 18:52:20 -07:00
Ilqar Ramazanli	15975cf6a6	To add priority of int/int? over int[] on signature matching and adding {h,v,d}split methods (#57346 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/54555 It has been discussed in the issue https://github.com/pytorch/pytorch/issues/54555 that {h,v,d}split methods unexpectedly matches argument of single int[] when it is expected to match single argument of int. The same unexpected behavior can happen in other functions/methods which can take both int[] and int? as single argument signatures. In this PR we solve this problem by giving higher priority to int/int? arguments over int[] while sorting signatures. We also add methods of {h,v,d}split methods here, which helped us to discover this unexpected behavior. Pull Request resolved: https://github.com/pytorch/pytorch/pull/57346 Reviewed By: ezyang Differential Revision: D28121234 Pulled By: iramazanli fbshipit-source-id: 851cf40b370707be89298177b51ceb4527f4b2d6	2021-05-03 18:52:41 -07:00
Sam Estep	75024e228c	Add lint for unqualified `type: ignore` (#56290 ) Summary: The other half of https://github.com/pytorch/pytorch/issues/56272. Pull Request resolved: https://github.com/pytorch/pytorch/pull/56290 Test Plan: CI should pass on the tip of this PR, and we know that the lint works because the following CI runs (before this PR was finished) failed: - https://github.com/pytorch/pytorch/runs/2384511062 - https://github.com/pytorch/pytorch/actions/runs/765036024 Reviewed By: seemethere Differential Revision: D27867219 Pulled By: samestep fbshipit-source-id: e648f07b6822867e70833e23ddafe7fb7eaca235	2021-04-21 08:07:23 -07:00
Edward Yang	6ec71ed4f9	Replace all direct cdata access with THPVariable_Unpack (#55799 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55799 I'm going to change the implementation of cdata soon so I need to abstract over cdata access with a function. Additionally, many users are casting manually casting to THPVariable to access the member so I can remove these unsafe casts in the client code (the implementation, of course, is still doing an unsafe cast.) Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D27712130 Pulled By: ezyang fbshipit-source-id: 95fcc013bf3913d67f2c634068eb5b3aab144cb3	2021-04-15 08:57:04 -07:00
Sam Estep	4753100a3b	Un-ignore F403 in .flake8 (#55838 ) Summary: Generally wildcard imports are bad for the reasons described here: https://www.flake8rules.com/rules/F403.html This PR replaces wildcard imports with an explicit list of imported items where possible, and adds a `# noqa: F403` comment in the other cases (mostly re-exports in `__init__.py` files). This is a prerequisite for https://github.com/pytorch/pytorch/issues/55816, because currently [`tools/codegen/dest/register_dispatch_key.py` simply fails if you sort its imports](https://github.com/pytorch/pytorch/actions/runs/742505908). Pull Request resolved: https://github.com/pytorch/pytorch/pull/55838 Test Plan: CI. You can also run `flake8` locally. Reviewed By: jbschlosser Differential Revision: D27724232 Pulled By: samestep fbshipit-source-id: 269fb09cb4168f8a51fd65bfaacc6cda7fb87c34	2021-04-13 09:24:07 -07:00
Sameer Deshmukh	5fb1142702	Add CSR (compressed sparse row) layout for sparse tensors (#50937 ) Summary: Implement compressed sparse row format. Derived from the GCS implementation at https://github.com/pytorch/pytorch/pull/44190 Pull Request resolved: https://github.com/pytorch/pytorch/pull/50937 Reviewed By: mrshenli Differential Revision: D27439865 Pulled By: ezyang fbshipit-source-id: 3ba3dcb9679505b980ff6a5f513e913bbae2fb1d	2021-04-12 10:09:12 -07:00
lezcano	5870346173	Port index_copy from TH to ATen (#52203 ) Summary: The design of the `TensorIterator` was similar to that in https://github.com/pytorch/pytorch/pull/50578 Resolves https://github.com/pytorch/pytorch/issues/24670 Resolves https://github.com/pytorch/pytorch/issues/24523 Timings: <details> <summary>Script</summary> ```python from IPython import get_ipython import torch torch.manual_seed(13) torch.set_num_threads(1) ipython = get_ipython() cpu = torch.device('cpu') cuda = torch.device('cuda') def run_test(ndims, size, index_len, device): print(f"ndims: {ndims}, tensor_size: {size}, index_len: {index_len}, device: {device}") x = torch.rand(([size] ndims), device=device) index = torch.randint(size, (index_len,), dtype=torch.long, device=device) for d in range(ndims): shape_t = [size] * d + [index_len] + [size] * (ndims - d - 1) t = torch.rand(*shape_t, device=device) command = "x.index_copy(d, index, t)" if device == cuda: command = command + "; torch.cuda.synchronize()" ipython.magic(f"timeit {command}") print() run_test(3, 700, 10, cpu) run_test(3, 700, 100, cpu) run_test(3, 700, 700, cpu) run_test(2, 10000, 10000, cpu) run_test(3, 700, 10, cuda) run_test(3, 700, 100, cuda) run_test(3, 700, 700, cuda) run_test(2, 10000, 10000, cuda) ``` </details> <details> <summary>CPU ATen</summary> ``` ndims: 3, tensor_size: 700, index_len: 10, device: cpu 327 ms ± 309 µs per loop (mean ± std. dev. of 7 runs, 1 loop each) 329 ms ± 456 µs per loop (mean ± std. dev. of 7 runs, 1 loop each) 378 ms ± 1.44 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) ndims: 3, tensor_size: 700, index_len: 100, device: cpu 348 ms ± 1.52 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 359 ms ± 330 µs per loop (mean ± std. dev. of 7 runs, 1 loop each) 526 ms ± 686 µs per loop (mean ± std. dev. of 7 runs, 1 loop each) ndims: 3, tensor_size: 700, index_len: 700, device: cpu 560 ms ± 19 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 552 ms ± 2.61 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 932 ms ± 2.52 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) ndims: 2, tensor_size: 10000, index_len: 10000, device: cpu 163 ms ± 5.05 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) 302 ms ± 5.75 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) ``` </details> <details> <summary>CUDA ATen</summary> ``` ndims: 3, tensor_size: 700, index_len: 10, device: cuda 9.63 ms ± 441 ns per loop (mean ± std. dev. of 7 runs, 100 loops each) 9.65 ms ± 230 ns per loop (mean ± std. dev. of 7 runs, 100 loops each) 12.4 ms ± 881 ns per loop (mean ± std. dev. of 7 runs, 100 loops each) ndims: 3, tensor_size: 700, index_len: 100, device: cuda 10.8 ms ± 1.51 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 11 ms ± 417 ns per loop (mean ± std. dev. of 7 runs, 100 loops each) 21.2 ms ± 18.2 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) ndims: 3, tensor_size: 700, index_len: 700, device: cuda 19 ms ± 4.42 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 17.8 ms ± 493 ns per loop (mean ± std. dev. of 7 runs, 100 loops each) 25.8 ms ± 1.22 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) ndims: 2, tensor_size: 10000, index_len: 10000, device: cuda 5.59 ms ± 109 ns per loop (mean ± std. dev. of 7 runs, 100 loops each) 10 ms ± 25.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) ``` </details> <details> <summary>CPU TH</summary> ``` ndims: 3, tensor_size: 700, index_len: 10, device: cpu 333 ms ± 2.42 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 327 ms ± 1.04 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 366 ms ± 753 µs per loop (mean ± std. dev. of 7 runs, 1 loop each) ndims: 3, tensor_size: 700, index_len: 100, device: cpu 336 ms ± 1.24 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 345 ms ± 914 µs per loop (mean ± std. dev. of 7 runs, 1 loop each) 884 ms ± 4.32 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) ndims: 3, tensor_size: 700, index_len: 700, device: cpu 441 ms ± 3.58 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 514 ms ± 1.17 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 7.46 s ± 6.46 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) ndims: 2, tensor_size: 10000, index_len: 10000, device: cpu 141 ms ± 233 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) 1.13 s ± 855 µs per loop (mean ± std. dev. of 7 runs, 1 loop each) ``` </details> <details> <summary>CUDA TH</summary> ``` ndims: 3, tensor_size: 700, index_len: 10, device: cuda 9.64 ms ± 390 ns per loop (mean ± std. dev. of 7 runs, 100 loops each) 9.68 ms ± 3.26 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 13.9 ms ± 928 ns per loop (mean ± std. dev. of 7 runs, 100 loops each) ndims: 3, tensor_size: 700, index_len: 100, device: cuda 11.6 ms ± 1.38 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 12.1 ms ± 3.72 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 30.3 ms ± 27.2 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) ndims: 3, tensor_size: 700, index_len: 700, device: cuda 27.2 ms ± 19.8 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) 30.6 ms ± 43.6 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) 146 ms ± 204 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) ndims: 2, tensor_size: 10000, index_len: 10000, device: cuda 6.5 ms ± 3.99 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 64.7 ms ± 55.5 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) ``` </details> According to these we see a slight performance improvement across both CPU and GPU. cc: nikitaved Pull Request resolved: https://github.com/pytorch/pytorch/pull/52203 Reviewed By: jbschlosser Differential Revision: D27066572 Pulled By: mruberry fbshipit-source-id: 6101e461cf731afa3db042a383b723d3d6bfdc26	2021-03-22 22:36:35 -07:00
kedejesu	53d8778b4d	Update clang-format linux hash and yaml import calls (#53932 ) Summary: Fixing Bandit security issues. - yaml_load: Use of unsafe yaml load. Allows instantiation of arbitrary objects. Consider yaml.safe_load(). Test ID: B506 Severity: MEDIUM Confidence: HIGH File: ./caffe2/contrib/aten/gen_op.py More info: https://bandit.readthedocs.io/en/latest/plugins/b506_yaml_load.html 235 if __name__ == '__main__': 236 decls = yaml.load(read(os.path.join(args.yaml_dir, 'Declarations.yaml')), Loader=Loader) 237 factory_methods = find_factory_methods(decls) - Blacklist: Use of insecure MD2 (`6149a26adb`), MD4 (`fc7f026980`), MD5 (`7ea9d9af4e`), or SHA1 hash function. Test ID: B303 Severity: MEDIUM Confidence: HIGH File: ./tools/clang_format_utils.py More info: https://bandit.readthedocs.io/en/latest/blacklists/blacklist_calls.html#b303-md5 36 37 hash = hashlib.sha1() 38 Pull Request resolved: https://github.com/pytorch/pytorch/pull/53932 Reviewed By: jbschlosser Differential Revision: D27072017 Pulled By: malfet fbshipit-source-id: 2fef0119388797aee3cacdc880fc345bd2ba68ce	2021-03-18 17:11:58 -07:00

1 2 3 4 5 ...

281 Commits