Commit Graph

362 Commits

Author SHA1 Message Date
Edward Z. Yang
a11c1bbdd0 Run Black on all of tools/
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76089

Approved by: https://github.com/albanD
2022-04-20 17:29:41 +00:00
PyTorch MergeBot
ea44645c9a Revert "Allow specifying tags for aten operators in native_functions.yaml"
This reverts commit 1dab71ab25.

Reverted https://github.com/pytorch/pytorch/pull/72549 on behalf of https://github.com/malfet
2022-03-28 18:04:38 +00:00
anjali411
1dab71ab25 Allow specifying tags for aten operators in native_functions.yaml
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72549

Approved by: https://github.com/ezyang
2022-03-25 21:17:52 +00:00
Peter Bell
40d1f77384 Codegen: python_torch_functions only include relevant operators (#68693)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68693

Generation of python bindings for native functions is split over 8
different files: one for each namespace, with the torch namespace
split into 3 shards, and methods in their own file as well. This
change ensures that editing any single (non-method) operator only
causes one of these files to be rebuilt.

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D32596270

Pulled By: albanD

fbshipit-source-id: 0570ec69e7476b8f1bc21138ba18fe8f95ebbe3f
(cherry picked from commit ba0fc71a3a)
2022-01-21 15:37:06 +00:00
Alban Desmaison
badf7b0210 fix typo changing the generated code (#69899)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69899

Reviewed By: soulitzer

Differential Revision: D33093461

Pulled By: albanD

fbshipit-source-id: 2c672a2b767f0caed1ef3a1d2afa1cacdfcdc320
2021-12-14 06:36:14 -08:00
soulitzer
51033ec840 Add forward AD layout check for storage numel (#68631)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68631

This PR:
- Adds a check that the storage numel of the base and tangent tensors is the same (a rough illustration follows below). This is to support the case when as_strided reveals elements that aren't indexable by the input tensor.
- Skips the check when batched tensors are involved, because using as_strided to reveal elements that are not indexable by the input tensor is already not allowed in vmap.
- Adds tests for the above two cases, as well as an edge case regarding the conj bit (what about the neg bit?)

For functorch:
- we need to copy the batching rule implemented here
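
A rough Python illustration of the kind of layout mismatch the new check guards against (not the actual C++ check; `storage_numel` below is a hypothetical helper):

```python
import torch

def storage_numel(t: torch.Tensor) -> int:
    # Hypothetical helper: number of elements backing `t`'s storage.
    return t.storage().nbytes() // t.element_size()

# The primal is a strided view whose storage holds 10 elements, but the tangent
# only has 5; as_strided on the dual could then reveal elements the tangent
# cannot supply, which is the situation the new check rejects.
primal = torch.zeros(10)[::2]   # size (5,), stride (2,), storage of 10 elements
tangent = torch.zeros(5)        # storage of only 5 elements
print(storage_numel(primal), storage_numel(tangent))  # 10 5 -> layouts differ
```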

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D32899678

Pulled By: soulitzer

fbshipit-source-id: 54db9550dd2c93bc66b8fb2d36ce40799ebba794
2021-12-14 04:34:25 -08:00
kshitij12345
b737e09f60 expose return_types in Python (#66614)
Summary:
https://github.com/facebookresearch/functorch/issues/87

TODO:
* [x] Add comments
* [x] Add test
* [x] Fix XLA

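For context, a hedged Python sketch of how these return types look from the user's side (the module layout shown reflects current PyTorch and may differ slightly from what this PR originally wired up):

```python
import torch

x = torch.tensor([[1., 5.], [3., 2.]])
result = torch.max(x, dim=1)

# The result is a structseq ("named tuple") rather than a plain tuple.
print(type(result))        # <class 'torch.return_types.max'>
print(result.values)       # tensor([5., 3.])
print(result.indices)      # tensor([1, 0])

# Fields are also reachable by position, like an ordinary tuple.
values, indices = result
```
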
<details>

<summary>Generated python_return_types.cpp</summary>

```cpp
#include <Python.h>

#include <vector>
#include <map>
#include <string>

#include "torch/csrc/autograd/python_return_types.h"
#include "torch/csrc/utils/structseq.h"
#include "torch/csrc/Exceptions.h"

namespace {
PyTypeObject* get__det_lu_based_helper_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"det", ""}, {"lu", ""}, {"pivs", ""},  {nullptr} };
    static PyTypeObject _det_lu_based_helperNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types._det_lu_based_helper", nullptr, NamedTuple_fields, 3 };
    if (!is_initialized) {
        PyStructSequence_InitType(&_det_lu_based_helperNamedTuple, &desc);
        _det_lu_based_helperNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &_det_lu_based_helperNamedTuple;
}
PyTypeObject* get__fake_quantize_per_tensor_affine_cachemask_tensor_qparams_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"output", ""}, {"mask", ""},  {nullptr} };
    static PyTypeObject _fake_quantize_per_tensor_affine_cachemask_tensor_qparamsNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types._fake_quantize_per_tensor_affine_cachemask_tensor_qparams", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&_fake_quantize_per_tensor_affine_cachemask_tensor_qparamsNamedTuple, &desc);
        _fake_quantize_per_tensor_affine_cachemask_tensor_qparamsNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &_fake_quantize_per_tensor_affine_cachemask_tensor_qparamsNamedTuple;
}
PyTypeObject* get__fused_moving_avg_obs_fq_helper_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"output", ""}, {"mask", ""},  {nullptr} };
    static PyTypeObject _fused_moving_avg_obs_fq_helperNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types._fused_moving_avg_obs_fq_helper", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&_fused_moving_avg_obs_fq_helperNamedTuple, &desc);
        _fused_moving_avg_obs_fq_helperNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &_fused_moving_avg_obs_fq_helperNamedTuple;
}
PyTypeObject* get__lu_with_info_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"LU", ""}, {"pivots", ""}, {"info", ""},  {nullptr} };
    static PyTypeObject _lu_with_infoNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types._lu_with_info", nullptr, NamedTuple_fields, 3 };
    if (!is_initialized) {
        PyStructSequence_InitType(&_lu_with_infoNamedTuple, &desc);
        _lu_with_infoNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &_lu_with_infoNamedTuple;
}
PyTypeObject* get__unpack_dual_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"primal", ""}, {"tangent", ""},  {nullptr} };
    static PyTypeObject _unpack_dualNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types._unpack_dual", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&_unpack_dualNamedTuple, &desc);
        _unpack_dualNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &_unpack_dualNamedTuple;
}
PyTypeObject* get_aminmax_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"min", ""}, {"max", ""},  {nullptr} };
    static PyTypeObject aminmaxNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.aminmax", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&aminmaxNamedTuple, &desc);
        aminmaxNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &aminmaxNamedTuple;
}

PyTypeObject* get_aminmax_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"min", ""}, {"max", ""},  {nullptr} };
    static PyTypeObject aminmax_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.aminmax_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&aminmax_outNamedTuple1, &desc);
        aminmax_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &aminmax_outNamedTuple1;
}
PyTypeObject* get_cummax_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject cummaxNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.cummax", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&cummaxNamedTuple, &desc);
        cummaxNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &cummaxNamedTuple;
}

PyTypeObject* get_cummax_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject cummax_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.cummax_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&cummax_outNamedTuple1, &desc);
        cummax_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &cummax_outNamedTuple1;
}
PyTypeObject* get_cummin_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject cumminNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.cummin", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&cumminNamedTuple, &desc);
        cumminNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &cumminNamedTuple;
}

PyTypeObject* get_cummin_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject cummin_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.cummin_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&cummin_outNamedTuple1, &desc);
        cummin_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &cummin_outNamedTuple1;
}
PyTypeObject* get_eig_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"eigenvalues", ""}, {"eigenvectors", ""},  {nullptr} };
    static PyTypeObject eig_outNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.eig_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&eig_outNamedTuple, &desc);
        eig_outNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &eig_outNamedTuple;
}

PyTypeObject* get_eig_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"eigenvalues", ""}, {"eigenvectors", ""},  {nullptr} };
    static PyTypeObject eigNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.eig", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&eigNamedTuple1, &desc);
        eigNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &eigNamedTuple1;
}
PyTypeObject* get_frexp_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"mantissa", ""}, {"exponent", ""},  {nullptr} };
    static PyTypeObject frexpNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.frexp", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&frexpNamedTuple, &desc);
        frexpNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &frexpNamedTuple;
}

PyTypeObject* get_frexp_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"mantissa", ""}, {"exponent", ""},  {nullptr} };
    static PyTypeObject frexp_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.frexp_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&frexp_outNamedTuple1, &desc);
        frexp_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &frexp_outNamedTuple1;
}
PyTypeObject* get_geqrf_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"a", ""}, {"tau", ""},  {nullptr} };
    static PyTypeObject geqrf_outNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.geqrf_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&geqrf_outNamedTuple, &desc);
        geqrf_outNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &geqrf_outNamedTuple;
}

PyTypeObject* get_geqrf_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"a", ""}, {"tau", ""},  {nullptr} };
    static PyTypeObject geqrfNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.geqrf", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&geqrfNamedTuple1, &desc);
        geqrfNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &geqrfNamedTuple1;
}
PyTypeObject* get_histogram_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"hist", ""}, {"bin_edges", ""},  {nullptr} };
    static PyTypeObject histogram_outNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.histogram_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&histogram_outNamedTuple, &desc);
        histogram_outNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &histogram_outNamedTuple;
}

PyTypeObject* get_histogram_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"hist", ""}, {"bin_edges", ""},  {nullptr} };
    static PyTypeObject histogramNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.histogram", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&histogramNamedTuple1, &desc);
        histogramNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &histogramNamedTuple1;
}
PyTypeObject* get_kthvalue_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject kthvalueNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.kthvalue", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&kthvalueNamedTuple, &desc);
        kthvalueNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &kthvalueNamedTuple;
}

PyTypeObject* get_kthvalue_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject kthvalue_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.kthvalue_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&kthvalue_outNamedTuple1, &desc);
        kthvalue_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &kthvalue_outNamedTuple1;
}
PyTypeObject* get_linalg_cholesky_ex_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"L", ""}, {"info", ""},  {nullptr} };
    static PyTypeObject linalg_cholesky_exNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.linalg_cholesky_ex", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&linalg_cholesky_exNamedTuple, &desc);
        linalg_cholesky_exNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &linalg_cholesky_exNamedTuple;
}

PyTypeObject* get_linalg_cholesky_ex_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"L", ""}, {"info", ""},  {nullptr} };
    static PyTypeObject linalg_cholesky_ex_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.linalg_cholesky_ex_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&linalg_cholesky_ex_outNamedTuple1, &desc);
        linalg_cholesky_ex_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &linalg_cholesky_ex_outNamedTuple1;
}
PyTypeObject* get_linalg_eig_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"eigenvalues", ""}, {"eigenvectors", ""},  {nullptr} };
    static PyTypeObject linalg_eigNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.linalg_eig", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&linalg_eigNamedTuple, &desc);
        linalg_eigNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &linalg_eigNamedTuple;
}

PyTypeObject* get_linalg_eig_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"eigenvalues", ""}, {"eigenvectors", ""},  {nullptr} };
    static PyTypeObject linalg_eig_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.linalg_eig_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&linalg_eig_outNamedTuple1, &desc);
        linalg_eig_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &linalg_eig_outNamedTuple1;
}
PyTypeObject* get_linalg_eigh_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"eigenvalues", ""}, {"eigenvectors", ""},  {nullptr} };
    static PyTypeObject linalg_eighNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.linalg_eigh", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&linalg_eighNamedTuple, &desc);
        linalg_eighNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &linalg_eighNamedTuple;
}

PyTypeObject* get_linalg_eigh_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"eigenvalues", ""}, {"eigenvectors", ""},  {nullptr} };
    static PyTypeObject linalg_eigh_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.linalg_eigh_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&linalg_eigh_outNamedTuple1, &desc);
        linalg_eigh_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &linalg_eigh_outNamedTuple1;
}
PyTypeObject* get_linalg_inv_ex_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"inverse", ""}, {"info", ""},  {nullptr} };
    static PyTypeObject linalg_inv_exNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.linalg_inv_ex", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&linalg_inv_exNamedTuple, &desc);
        linalg_inv_exNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &linalg_inv_exNamedTuple;
}

PyTypeObject* get_linalg_inv_ex_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"inverse", ""}, {"info", ""},  {nullptr} };
    static PyTypeObject linalg_inv_ex_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.linalg_inv_ex_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&linalg_inv_ex_outNamedTuple1, &desc);
        linalg_inv_ex_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &linalg_inv_ex_outNamedTuple1;
}
PyTypeObject* get_linalg_lstsq_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"solution", ""}, {"residuals", ""}, {"rank", ""}, {"singular_values", ""},  {nullptr} };
    static PyTypeObject linalg_lstsqNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.linalg_lstsq", nullptr, NamedTuple_fields, 4 };
    if (!is_initialized) {
        PyStructSequence_InitType(&linalg_lstsqNamedTuple, &desc);
        linalg_lstsqNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &linalg_lstsqNamedTuple;
}

PyTypeObject* get_linalg_lstsq_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"solution", ""}, {"residuals", ""}, {"rank", ""}, {"singular_values", ""},  {nullptr} };
    static PyTypeObject linalg_lstsq_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.linalg_lstsq_out", nullptr, NamedTuple_fields, 4 };
    if (!is_initialized) {
        PyStructSequence_InitType(&linalg_lstsq_outNamedTuple1, &desc);
        linalg_lstsq_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &linalg_lstsq_outNamedTuple1;
}
PyTypeObject* get_linalg_qr_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"Q", ""}, {"R", ""},  {nullptr} };
    static PyTypeObject linalg_qrNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.linalg_qr", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&linalg_qrNamedTuple, &desc);
        linalg_qrNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &linalg_qrNamedTuple;
}

PyTypeObject* get_linalg_qr_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"Q", ""}, {"R", ""},  {nullptr} };
    static PyTypeObject linalg_qr_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.linalg_qr_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&linalg_qr_outNamedTuple1, &desc);
        linalg_qr_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &linalg_qr_outNamedTuple1;
}
PyTypeObject* get_linalg_slogdet_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"sign", ""}, {"logabsdet", ""},  {nullptr} };
    static PyTypeObject linalg_slogdetNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.linalg_slogdet", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&linalg_slogdetNamedTuple, &desc);
        linalg_slogdetNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &linalg_slogdetNamedTuple;
}

PyTypeObject* get_linalg_slogdet_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"sign", ""}, {"logabsdet", ""},  {nullptr} };
    static PyTypeObject linalg_slogdet_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.linalg_slogdet_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&linalg_slogdet_outNamedTuple1, &desc);
        linalg_slogdet_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &linalg_slogdet_outNamedTuple1;
}
PyTypeObject* get_linalg_svd_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"U", ""}, {"S", ""}, {"Vh", ""},  {nullptr} };
    static PyTypeObject linalg_svd_outNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.linalg_svd_out", nullptr, NamedTuple_fields, 3 };
    if (!is_initialized) {
        PyStructSequence_InitType(&linalg_svd_outNamedTuple, &desc);
        linalg_svd_outNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &linalg_svd_outNamedTuple;
}

PyTypeObject* get_linalg_svd_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"U", ""}, {"S", ""}, {"Vh", ""},  {nullptr} };
    static PyTypeObject linalg_svdNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.linalg_svd", nullptr, NamedTuple_fields, 3 };
    if (!is_initialized) {
        PyStructSequence_InitType(&linalg_svdNamedTuple1, &desc);
        linalg_svdNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &linalg_svdNamedTuple1;
}
PyTypeObject* get_lstsq_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"solution", ""}, {"QR", ""},  {nullptr} };
    static PyTypeObject lstsq_outNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.lstsq_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&lstsq_outNamedTuple, &desc);
        lstsq_outNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &lstsq_outNamedTuple;
}

PyTypeObject* get_lstsq_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"solution", ""}, {"QR", ""},  {nullptr} };
    static PyTypeObject lstsqNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.lstsq", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&lstsqNamedTuple1, &desc);
        lstsqNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &lstsqNamedTuple1;
}
PyTypeObject* get_lu_unpack_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"P", ""}, {"L", ""}, {"U", ""},  {nullptr} };
    static PyTypeObject lu_unpackNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.lu_unpack", nullptr, NamedTuple_fields, 3 };
    if (!is_initialized) {
        PyStructSequence_InitType(&lu_unpackNamedTuple, &desc);
        lu_unpackNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &lu_unpackNamedTuple;
}

PyTypeObject* get_lu_unpack_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"P", ""}, {"L", ""}, {"U", ""},  {nullptr} };
    static PyTypeObject lu_unpack_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.lu_unpack_out", nullptr, NamedTuple_fields, 3 };
    if (!is_initialized) {
        PyStructSequence_InitType(&lu_unpack_outNamedTuple1, &desc);
        lu_unpack_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &lu_unpack_outNamedTuple1;
}
PyTypeObject* get_max_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject maxNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.max", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&maxNamedTuple, &desc);
        maxNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &maxNamedTuple;
}

PyTypeObject* get_max_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject max_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.max_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&max_outNamedTuple1, &desc);
        max_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &max_outNamedTuple1;
}
PyTypeObject* get_median_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject medianNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.median", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&medianNamedTuple, &desc);
        medianNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &medianNamedTuple;
}

PyTypeObject* get_median_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject median_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.median_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&median_outNamedTuple1, &desc);
        median_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &median_outNamedTuple1;
}
PyTypeObject* get_min_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject minNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.min", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&minNamedTuple, &desc);
        minNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &minNamedTuple;
}

PyTypeObject* get_min_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject min_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.min_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&min_outNamedTuple1, &desc);
        min_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &min_outNamedTuple1;
}
PyTypeObject* get_mode_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject modeNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.mode", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&modeNamedTuple, &desc);
        modeNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &modeNamedTuple;
}

PyTypeObject* get_mode_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject mode_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.mode_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&mode_outNamedTuple1, &desc);
        mode_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &mode_outNamedTuple1;
}
PyTypeObject* get_nanmedian_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject nanmedianNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.nanmedian", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&nanmedianNamedTuple, &desc);
        nanmedianNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &nanmedianNamedTuple;
}

PyTypeObject* get_nanmedian_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject nanmedian_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.nanmedian_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&nanmedian_outNamedTuple1, &desc);
        nanmedian_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &nanmedian_outNamedTuple1;
}
PyTypeObject* get_qr_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"Q", ""}, {"R", ""},  {nullptr} };
    static PyTypeObject qr_outNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.qr_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&qr_outNamedTuple, &desc);
        qr_outNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &qr_outNamedTuple;
}

PyTypeObject* get_qr_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"Q", ""}, {"R", ""},  {nullptr} };
    static PyTypeObject qrNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.qr", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&qrNamedTuple1, &desc);
        qrNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &qrNamedTuple1;
}
PyTypeObject* get_slogdet_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"sign", ""}, {"logabsdet", ""},  {nullptr} };
    static PyTypeObject slogdetNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.slogdet", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&slogdetNamedTuple, &desc);
        slogdetNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &slogdetNamedTuple;
}
PyTypeObject* get_solve_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"solution", ""}, {"LU", ""},  {nullptr} };
    static PyTypeObject solveNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.solve", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&solveNamedTuple, &desc);
        solveNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &solveNamedTuple;
}

PyTypeObject* get_solve_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"solution", ""}, {"LU", ""},  {nullptr} };
    static PyTypeObject solve_outNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.solve_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&solve_outNamedTuple1, &desc);
        solve_outNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &solve_outNamedTuple1;
}
PyTypeObject* get_sort_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject sort_outNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.sort_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&sort_outNamedTuple, &desc);
        sort_outNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &sort_outNamedTuple;
}

PyTypeObject* get_sort_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject sortNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.sort", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&sortNamedTuple1, &desc);
        sortNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &sortNamedTuple1;
}
PyTypeObject* get_svd_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"U", ""}, {"S", ""}, {"V", ""},  {nullptr} };
    static PyTypeObject svd_outNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.svd_out", nullptr, NamedTuple_fields, 3 };
    if (!is_initialized) {
        PyStructSequence_InitType(&svd_outNamedTuple, &desc);
        svd_outNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &svd_outNamedTuple;
}

PyTypeObject* get_svd_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"U", ""}, {"S", ""}, {"V", ""},  {nullptr} };
    static PyTypeObject svdNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.svd", nullptr, NamedTuple_fields, 3 };
    if (!is_initialized) {
        PyStructSequence_InitType(&svdNamedTuple1, &desc);
        svdNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &svdNamedTuple1;
}
PyTypeObject* get_symeig_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"eigenvalues", ""}, {"eigenvectors", ""},  {nullptr} };
    static PyTypeObject symeig_outNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.symeig_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&symeig_outNamedTuple, &desc);
        symeig_outNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &symeig_outNamedTuple;
}

PyTypeObject* get_symeig_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"eigenvalues", ""}, {"eigenvectors", ""},  {nullptr} };
    static PyTypeObject symeigNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.symeig", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&symeigNamedTuple1, &desc);
        symeigNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &symeigNamedTuple1;
}
PyTypeObject* get_topk_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject topk_outNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.topk_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&topk_outNamedTuple, &desc);
        topk_outNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &topk_outNamedTuple;
}

PyTypeObject* get_topk_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"values", ""}, {"indices", ""},  {nullptr} };
    static PyTypeObject topkNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.topk", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&topkNamedTuple1, &desc);
        topkNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &topkNamedTuple1;
}
PyTypeObject* get_triangular_solve_out_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"solution", ""}, {"cloned_coefficient", ""},  {nullptr} };
    static PyTypeObject triangular_solve_outNamedTuple;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.triangular_solve_out", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&triangular_solve_outNamedTuple, &desc);
        triangular_solve_outNamedTuple.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &triangular_solve_outNamedTuple;
}

PyTypeObject* get_triangular_solve_namedtuple() {
    static PyStructSequence_Field NamedTuple_fields[] = { {"solution", ""}, {"cloned_coefficient", ""},  {nullptr} };
    static PyTypeObject triangular_solveNamedTuple1;
    static bool is_initialized = false;
    static PyStructSequence_Desc desc = { "torch.return_types.triangular_solve", nullptr, NamedTuple_fields, 2 };
    if (!is_initialized) {
        PyStructSequence_InitType(&triangular_solveNamedTuple1, &desc);
        triangular_solveNamedTuple1.tp_repr = (reprfunc)torch::utils::returned_structseq_repr;
        is_initialized = true;
    }
    return &triangular_solveNamedTuple1;
}
}

namespace torch {
namespace autograd {

std::map<std::string, PyTypeObject*>& get_namedtuple_types_map() {
  // [NOTE] Non-global map
  // This map calls Python functions during its initialization.
  // If it is a global static variable and in case it is loaded
  // before Python interpreter is ready, then the calls it makes during
  // initialization will SEGFAULT.
  // To avoid this we make it function static variable so that it is
  // initialized only after the Python interpreter is ready.
  static std::map<std::string, PyTypeObject*> namedtuple_types_map = {
    {"_det_lu_based_helper", get__det_lu_based_helper_namedtuple()},
    {"_fake_quantize_per_tensor_affine_cachemask_tensor_qparams", get__fake_quantize_per_tensor_affine_cachemask_tensor_qparams_namedtuple()},
    {"_fused_moving_avg_obs_fq_helper", get__fused_moving_avg_obs_fq_helper_namedtuple()},
    {"_lu_with_info", get__lu_with_info_namedtuple()},
    {"_unpack_dual", get__unpack_dual_namedtuple()},
    {"aminmax", get_aminmax_namedtuple()},
    {"aminmax_out", get_aminmax_out_namedtuple()},
    {"cummax", get_cummax_namedtuple()},
    {"cummax_out", get_cummax_out_namedtuple()},
    {"cummin", get_cummin_namedtuple()},
    {"cummin_out", get_cummin_out_namedtuple()},
    {"eig_out", get_eig_out_namedtuple()},
    {"eig", get_eig_namedtuple()},
    {"frexp", get_frexp_namedtuple()},
    {"frexp_out", get_frexp_out_namedtuple()},
    {"geqrf_out", get_geqrf_out_namedtuple()},
    {"geqrf", get_geqrf_namedtuple()},
    {"histogram_out", get_histogram_out_namedtuple()},
    {"histogram", get_histogram_namedtuple()},
    {"kthvalue", get_kthvalue_namedtuple()},
    {"kthvalue_out", get_kthvalue_out_namedtuple()},
    {"linalg_cholesky_ex", get_linalg_cholesky_ex_namedtuple()},
    {"linalg_cholesky_ex_out", get_linalg_cholesky_ex_out_namedtuple()},
    {"linalg_eig", get_linalg_eig_namedtuple()},
    {"linalg_eig_out", get_linalg_eig_out_namedtuple()},
    {"linalg_eigh", get_linalg_eigh_namedtuple()},
    {"linalg_eigh_out", get_linalg_eigh_out_namedtuple()},
    {"linalg_inv_ex", get_linalg_inv_ex_namedtuple()},
    {"linalg_inv_ex_out", get_linalg_inv_ex_out_namedtuple()},
    {"linalg_lstsq", get_linalg_lstsq_namedtuple()},
    {"linalg_lstsq_out", get_linalg_lstsq_out_namedtuple()},
    {"linalg_qr", get_linalg_qr_namedtuple()},
    {"linalg_qr_out", get_linalg_qr_out_namedtuple()},
    {"linalg_slogdet", get_linalg_slogdet_namedtuple()},
    {"linalg_slogdet_out", get_linalg_slogdet_out_namedtuple()},
    {"linalg_svd_out", get_linalg_svd_out_namedtuple()},
    {"linalg_svd", get_linalg_svd_namedtuple()},
    {"lstsq_out", get_lstsq_out_namedtuple()},
    {"lstsq", get_lstsq_namedtuple()},
    {"lu_unpack", get_lu_unpack_namedtuple()},
    {"lu_unpack_out", get_lu_unpack_out_namedtuple()},
    {"max", get_max_namedtuple()},
    {"max_out", get_max_out_namedtuple()},
    {"median", get_median_namedtuple()},
    {"median_out", get_median_out_namedtuple()},
    {"min", get_min_namedtuple()},
    {"min_out", get_min_out_namedtuple()},
    {"mode", get_mode_namedtuple()},
    {"mode_out", get_mode_out_namedtuple()},
    {"nanmedian", get_nanmedian_namedtuple()},
    {"nanmedian_out", get_nanmedian_out_namedtuple()},
    {"qr_out", get_qr_out_namedtuple()},
    {"qr", get_qr_namedtuple()},
    {"slogdet", get_slogdet_namedtuple()},
    {"solve", get_solve_namedtuple()},
    {"solve_out", get_solve_out_namedtuple()},
    {"sort_out", get_sort_out_namedtuple()},
    {"sort", get_sort_namedtuple()},
    {"svd_out", get_svd_out_namedtuple()},
    {"svd", get_svd_namedtuple()},
    {"symeig_out", get_symeig_out_namedtuple()},
    {"symeig", get_symeig_namedtuple()},
    {"topk_out", get_topk_out_namedtuple()},
    {"topk", get_topk_namedtuple()},
    {"triangular_solve_out", get_triangular_solve_out_namedtuple()},
    {"triangular_solve", get_triangular_solve_namedtuple()},
  };
  return namedtuple_types_map;
}

PyTypeObject* get_namedtuple(std::string name) {
  static auto& namedtuple_types_map = get_namedtuple_types_map();
  return namedtuple_types_map[name];
}

void initReturnTypes(PyObject* module) {
  static struct PyModuleDef def = {
      PyModuleDef_HEAD_INIT, "torch._C._return_types", nullptr, -1, {}};
  PyObject* return_types_module = PyModule_Create(&def);
  if (!return_types_module) {
    throw python_error();
  }

  for (const auto& return_type_pair : get_namedtuple_types_map()) {
    // hold onto the TypeObject for the unlikely case of user
    // deleting or overriding it.
    Py_INCREF(return_type_pair.second);
    if (PyModule_AddObject(
            return_types_module,
            return_type_pair.first.c_str(),
            (PyObject*)return_type_pair.second) != 0) {
      Py_DECREF((PyObject*)return_type_pair.second);
      throw python_error();
    }
  }

  // steals a reference to return_types on success
  if (PyModule_AddObject(module, "_return_types", return_types_module) != 0) {
    Py_DECREF(return_types_module);
    throw python_error();
  }
}

} // namespace autograd
} // namespace torch

```

</details>

<details>

<summary>Eg. updated call in other python_*_functions</summary>

```cpp
// linalg_cholesky_ex
static PyObject * THPVariable_linalg_cholesky_ex(PyObject* self_, PyObject* args, PyObject* kwargs)
{
  HANDLE_TH_ERRORS
  static PyTypeObject* NamedTuple = get_namedtuple("linalg_cholesky_ex");
  static PyTypeObject* NamedTuple1 = get_namedtuple("linalg_cholesky_ex_out");
  static PythonArgParser parser({
    "linalg_cholesky_ex(Tensor input, *, bool upper=False, bool check_errors=False, TensorList[2] out=None)",
  }, /*traceable=*/true);

  ParsedArgs<4> parsed_args;
  auto _r = parser.parse(nullptr, args, kwargs, parsed_args);
  if(_r.has_torch_function()) {
    return handle_torch_function(_r, nullptr, args, kwargs, THPLinalgVariableFunctionsModule, "torch.linalg");
  }
  if (_r.isNone(3)) {
    // aten::linalg_cholesky_ex(Tensor self, *, bool upper=False, bool check_errors=False) -> (Tensor L, Tensor info)

    auto dispatch_linalg_cholesky_ex = [](const at::Tensor & self, bool upper, bool check_errors) -> ::std::tuple<at::Tensor,at::Tensor> {
      pybind11::gil_scoped_release no_gil;
      return at::linalg_cholesky_ex(self, upper, check_errors);
    };
    return wrap(NamedTuple, dispatch_linalg_cholesky_ex(_r.tensor(0), _r.toBool(1), _r.toBool(2)));
  } else {
    // aten::linalg_cholesky_ex.L(Tensor self, *, bool upper=False, bool check_errors=False, Tensor(a!) L, Tensor(b!) info) -> (Tensor(a!) L, Tensor(b!) info)
    auto out = _r.tensorlist_n<2>(3);
    auto dispatch_linalg_cholesky_ex_out = [](at::Tensor & L, at::Tensor & info, const at::Tensor & self, bool upper, bool check_errors) -> ::std::tuple<at::Tensor,at::Tensor> {
      pybind11::gil_scoped_release no_gil;
      return at::linalg_cholesky_ex_out(L, info, self, upper, check_errors);
    };
    return wrap(NamedTuple1, dispatch_linalg_cholesky_ex_out(out[0], out[1], _r.tensor(0), _r.toBool(1), _r.toBool(2)));
  }
  Py_RETURN_NONE;
  END_HANDLE_TH_ERRORS
}

```

</details>
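
From the Python side, bindings like the one above are what make such calls return a named structseq; a hedged example against `torch.linalg`:

```python
import torch

A = torch.eye(3)
result = torch.linalg.cholesky_ex(A)

print(type(result))   # <class 'torch.return_types.linalg_cholesky_ex'>
print(result.L)       # the Cholesky factor
print(result.info)    # 0 on success
```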

Pull Request resolved: https://github.com/pytorch/pytorch/pull/66614

Reviewed By: H-Huang

Differential Revision: D32741134

Pulled By: zou3519

fbshipit-source-id: 27bada30d20e66333ca1be1775608d9f0cbf9f59
2021-12-06 09:05:29 -08:00
Christian Puhrsch
75955e4ef8 [clone][sparse] Add torch._C._sparse namespace (#68672)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68672

This PR adds `python_module: sparse` to `native_functions.yaml`.
These functions then appear in the `torch._C._sparse` namespace instead of
just `torch`.
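
A quick, hedged way to inspect the effect from Python on a build that includes this change (the check itself is illustrative, not part of the PR):

```python
import torch

# Operators tagged with `python_module: sparse` get their Python bindings
# registered on this submodule instead of the top-level torch module.
print(hasattr(torch._C, "_sparse"))  # True on builds that include this change
print([name for name in dir(torch._C._sparse) if not name.startswith("__")])
```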

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D32517813

fbshipit-source-id: 7c3d6df57a24d7c7354d0fefe1b628dc89be9431
2021-11-19 19:47:38 -08:00
soulitzer
2455cc2adf Address case when layout of tangent is not same as base (#66292)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66292

In this PR:
1. Fix the case where the tangent has a different layout from the base in `set_fw_grad`, by adding a native function and its batching rule.

For (1) we replace the following:
```
Tensor new_with_same_meta(const Variable& base) {
  int64_t nelement_in_storage = base.storage().nbytes() / base.itemsize();
  auto new_tensor = at::zeros({nelement_in_storage}, base.options());
  auto res = new_tensor.as_strided(base.sizes(), base.strides(), base.storage_offset());
  return res;
}
```
with a native function, so as to enable a batching rule to alter its behavior.

This new function will be similar to `new_zeros_strided` except we also require the `storage_offset` and `storage_numel` arguments.
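
A rough Python sketch of the intended semantics (illustrative only; the function name below is hypothetical and this is not the actual ATen implementation):

```python
import torch

def new_zeros_with_same_meta(base: torch.Tensor, tangent: torch.Tensor) -> torch.Tensor:
    # Allocate enough zeros to back the base's full storage, using the *tangent's*
    # TensorOptions, then view it with the base's sizes/strides/storage_offset.
    storage_numel = base.storage().nbytes() // base.element_size()
    buf = torch.zeros(storage_numel, dtype=tangent.dtype, device=tangent.device)
    return buf.as_strided(base.size(), base.stride(), base.storage_offset())

base = torch.arange(10.)[2::2]          # offset 2, stride 2, storage of 10 elements
tangent = torch.zeros(4, dtype=torch.float64)
out = new_zeros_with_same_meta(base, tangent)
print(out.size(), out.stride(), out.storage_offset(), out.dtype)
```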

Possible concerns:
 - Why have redundant logic? Why not add new args to `new_zeros_strided`? This is probably a niche use case, so it's better not to complicate the current API.
 - Previously the created tensor inherits the TensorOptions of the primal. Now we inherit from the TensorOptions of the tangent.
   - Probably fine. Likely, no one relies on this because the behavior is only triggered when tangent/base have different layouts.
 - Why pass in exploded size, stride, and offset? It is possible in the non-batched case to pass in a tensor directly, but not possible when we'd like to have a batching rule. The size, stride, and offset we'd be passing won't belong to any live tensor.

Test Plan: Imported from OSS

Reviewed By: zou3519, albanD

Differential Revision: D31842019

Pulled By: soulitzer

fbshipit-source-id: a58433d814fd173bc43a2c550b395377dba40de2
2021-11-19 14:29:46 -08:00
Brian Hirsh
0032fa7725 Add a Functionalization pass in core (#64432)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64432

Original PR description + feedback here: https://github.com/pytorch/pytorch/pull/63048

I've addressed all of the feedback in the original PR and made some pretty large changes, listed below.

**Table of Contents**
- Starting points
- List of the main changes from the original PR
- Next Steps
- Example codegen output (for a view, mutation, and view+mutation op)

**Starting Points**

A good place to start when looking through the PR:
* Alban mentioned that this is a useful mental model (thanks Ed for originally making this clear to me). Semantically, the pass currently does THREE things, which are all needed by functorch - all fused together into one big pass.
  * (a) alias removal, which replaces {view} calls with {view}_copy calls, and manually tracks aliasing information, so that when one tensor is mutated, we re-apply the same mutation to all of the aliases. This is the bulk of the work - once this is done, the next 2 things are trivial to implement.
  * (b) mutation removal, which is easy to do once we know that there are no aliases. Every mutation `a.add_(b)` becomes `a.replace_(a.add(b))` (see the sketch after this list)
  * (c) reapplying views: all of the `{view}_copy` calls are replaced with `{view}` calls again. This is an optimization that we can make specifically for functorch (and strided backends), that only care about mutation removal and not alias removal
  * XLA and Vulkan only want (a), or (a) + (b). Later, we'll want to split this out so that you can actually opt into different versions of this logic.
  * There are currently no {view}_copy replacements, because the pass's <replace views with copies> and <replace copies with views> steps have been combined. Later, we'll want to actually implement {view}_copy variants of each view operator, probably with codegen.
* documentation breadcrumb 1, in `FunctionalTensorWrapper.cpp`: https://github.com/pytorch/pytorch/pull/64432/files#diff-a0bac99bf205dba5b94cb64fc2466d3d55d991887572f9cd6a02e27b3a91dd60R59 (you might have to expand the `FunctionalTensorWrapper.cpp` file, which GitHub closes by default because it's large)
* documentation breadcrumb 2, in `FunctionalTensorWrapper.h`: https://github.com/pytorch/pytorch/pull/64432/files#diff-c945c71a4ccac65871f24a912e8904f9a5088b24a32e636727ea9c8fe920708aR12
* Reading through the codegen output at the bottom of this description.
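
Roughly what step (b) means at the program level, as a minimal Python sketch (illustrative only; the real pass rewrites dispatcher calls, not Python source):

```python
import torch

def original(a, b):
    a.add_(b)           # in-place mutation
    return a.mul(2)

def functionalized(a, b):
    a = a.add(b)        # mutation removed: out-of-place update, then rebind
                        # (conceptually `a.replace_(a.add(b))`)
    return a.mul(2)

x, y = torch.ones(3), torch.full((3,), 2.0)
assert torch.equal(original(x.clone(), y), functionalized(x.clone(), y))
```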

**Main changes from the original PR**

(1)  I use lambdas instead of a giant enum to handle all of the different views.

This results in less boilerplate per view op (and more of it can be codegen'd). Every `ViewMeta` object now contains a `forward` and a `reverse` lambda that know how to replay the view and its inverse. This makes the code that executes the replaying logic much less boilerplate-heavy (see `Alias::sync_update_operations` and `FunctionalTensorWrapper::sync_`).
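
A loose Python analogue of that idea (names and the particular view are illustrative, not PyTorch internals):

```python
import torch
from dataclasses import dataclass
from typing import Callable

@dataclass
class ViewMeta:
    # `forward` replays the view on a base; `reverse` maps an updated view
    # back onto the base (the inverse used when syncing mutations).
    forward: Callable[[torch.Tensor], torch.Tensor]
    reverse: Callable[[torch.Tensor, torch.Tensor], torch.Tensor]

# ViewMeta for the view `base[1:3]`:
meta = ViewMeta(
    forward=lambda base: base[1:3],
    reverse=lambda base, updated: torch.cat([base[:1], updated, base[3:]]),
)

base = torch.arange(5.)
view = meta.forward(base)                  # tensor([1., 2.])
new_base = meta.reverse(base, view * 10)   # tensor([0., 10., 20., 3., 4.])
```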

(2) Every tensor during the functionalization pass is always wrapped in a `FunctionalTensorWrapper`.

This is potentially unnecessary for Vulkan/XLA, and will have a mild perf impact, but for now this PR just targets the functorch use case. I previously had a more complicated design (a `FunctionalTensorImplBase` class) to avoid needing the wrapper for XLA, but it had some subtleties that are going to require more thought to fix, so I'm pushing that off for now.

(3) `FunctionalTensorWrapper` objects accurately report stride information.

It's a little annoying to do this though, because the logic that calculates stride info for each view isn't easily separated from the actual view kernels in core, `at::native::{view}`. I do this by adding logic in each `at::functionalization::{view}` kernel to call the reference implementation `at::native::{view}`. I don't do anything with the output aside from taking its size/stride/storage_offset to set the actual output tensor's size/stride/storage_offset correctly. There's another annoying part to this: I'm pretty sure that we want to pass in the actual *wrapper* tensors directly into the native kernels, not their inner unwrapped values. But there are some `at::native::{view}` kernels that call other tensor methods, which re-invoke the dispatcher, calling functionalization/functorch kernels that try to do the unwrapping.

To do this, right now I have an `AutoDispatchDirectlyToNative` guard that basically ensures that any tensor methods called inside of the at::native::{view} op always redispatch straight to the CPU kernel (which will be another at::native:: kernel). This feels kind of heavy handed, but I'm not sure of a better way to do it.

(4) `FunctionalTensorWrapper` objects accurately report aliasing information.

There's a new `FunctionalStorageImpl` class (subclass of `StorageImpl`) that allows tensors in the functionalization pass to accurately alias storage. If two tensors `a` and `b` in a functionalized program are views of one another, then `a.storage.is_alias_of(b.storage)` should return true. I added this in a pretty similar way to how meta tensors allocate storage, although I don't pass in an actual allocator (I think this is fine because you should never resize a functional tensor's storage).
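For reference, this mirrors the aliasing behavior that regular eager tensors already report (a plain-PyTorch illustration, not the functionalized path):

```python
import torch

a = torch.randn(4)
b = a.view(2, 2)  # b is a view of a
# The two tensors share one storage, which is what FunctionalStorageImpl
# lets wrapped tensors report as well.
assert a.storage().data_ptr() == b.storage().data_ptr()
```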

One thing I'm not sure about - should `FunctionalTensorWrapper` set `storage_access_should_throw_`: (a) always, (b) never, or (c) only if its wrapped tensor has it set?

Right now I have it not set, mostly because calling the reference view functions (`at::native::{view}`) requires looking at the storage. But that means that if you try to access storage from python in a functionalized program, you'll get silent garbage instead of an error. Related question: are we planning on exposing meta tensor storage to python in the future (even though it contains garbage)?

(5) better docs :)

**View operator coverage**

(6) The functionalization pass now gets math-composite view ops for free.

I didn't add the `Functionalize` dispatch key to the composite set, because I don't want composite ops like `torch.ones` to get decomposed before hitting the functionalization pass. Instead, I added codegen to manually register the `at::native::` kernels of composite view ops. This is a little hairy, because the names of the `at::native::` kernels aren't easily accessible. They're stored in a `Dict[DispatchKey, BackendIndex]`. I made a best-effort attempt to get each view kernel's name, basically by assuming that every view op has either a composite or cpu implementation.
There's also a hardcoded list of composite view ops in `gen_inplace_or_view_type.py`, but it looks like it's wrong. This is probably worth rationalizing later, but instead I created a new list of the "complete" set of composite view ops, and preserved the old set by hardcoding the delta between the two sets.

(7) I've added codegen for ops that are both views AND mutations, like `transpose_()` (why do we even have these 😢).

From some light testing, it looks like they work correctly with one caveat: I had a hard time ensuring that functorch programs that mutate their inputs using ops like `transpose_()` preserve the input mutations after the program finishes running. For now (in my corresponding functorch branch), I emit a warning when this happens and just don't preserve the mutation.

(8) I added `{view}_inverse` implementations for every view op, in `FunctionalInverses.cpp`.

These are needed to take mutations made to views and replay them back onto the base. To reduce boilerplate, the codegen generates function declarations for each `{view}_inverse` function, so you get a nice compiler error when someone eventually adds a new view op.

The only view ops currently not supported are (a) as_strided, and (b) the sparse view ops (values()/indices()).

I can add support for as_strided, but it needs an `as_strided_inverse()` function. That will look really similar to the `as_strided_backward()` function in FunctionsManual.cpp, but it has some noticeable differences: we basically want an `as_strided_embed` for autograd and `as_strided_scatter` for functionalization. We will also probably need them to be primitives w.r.t. autograd, since the current implementation for autograd uses view().copy_() calls that XLA won't be able to handle. I'm wondering if anyone has any objections, but otherwise I can make those changes (which will require writing backward formulas for `as_strided_embed` and `as_strided_scatter`).

I did a bunch of manual testing that all looks pretty good, but it's definitely not fully tested. Ed pointed out that once XLA uses this pass (or at least once there's a POC), we can just run the existing xla view test suite. Hopefully that delay is okay - if it's not, maybe we can think about using OpInfos similar to how functorch uses them for testing.

Note: there's some duplication with autograd's view code. Every `{view}_inverse` implementation is really similar to the implementation for that view listed in `derivatives.yaml`. There are some major differences though:
* the autograd implementations of those backward functions (like `permute_backwards()`, in `FunctionsManual.cpp`) internally call other view ops. For functionalization, we want them to (eventually) call `{view}_copy` operators.
* For view ops that take a subset of the original storage, like `slice/select/diagonal/as_strided()`, the autograd backward functions fill the "spaces" in the inverse call with zeroes. For functionalization, we want to fill them with the value of `base` at those positions (see the small sketch after this list). It looks like this currently applies to 6 total ops (since we can ignore composites):
  * select
  * slice
  * diagonal
  * as_strided
  * split
  * split_with_sizes
A nice end state would probably be for the autograd + functionalization codegen to both look at the same yaml (either `derivatives.yaml`, or something else), and automatically generate the right thing. I left that out of scope for this PR though.
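As a small sketch of that difference (my own Python illustration, not the actual `FunctionalInverses.cpp` code), using `select` along dim 0:

```python
import torch

def select_backward_like(grad_view, base_shape, dim, index):
    # autograd-style inverse: scatter the view's gradient into zeros
    out = torch.zeros(base_shape, dtype=grad_view.dtype)
    out.select(dim, index).copy_(grad_view)
    return out

def select_inverse_like(base, mutated_view, dim, index):
    # functionalization-style inverse: scatter the mutated view into a copy of base
    out = base.clone()
    out.select(dim, index).copy_(mutated_view)
    return out

base = torch.arange(6.0).reshape(2, 3)
mutated = base.select(0, 1) + 10
print(select_backward_like(mutated, base.shape, 0, 1))  # row 0 is zeros
print(select_inverse_like(base, mutated, 0, 1))         # row 0 keeps base's values
```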

**Current State + Next Steps**

There are a bunch of followups after this PR eventually lands. Roughly in order:
* Use the current pass to register problematic composite ops in functorch. Also, nested `functionalize()` calls aren't supported yet (I mostly just need to remove some debug asserts and test it).
* Work on freeing up dispatch key space by deduplicating the `{backend}`/`Autograd{backend}`/`Sparse{backend}`/`Quantized{backend}` keys
* Once we have more dispatch keys, split up this pass into 3 pieces - it's currently fused, and doesn't do the right thing for vulkan/XLA. Specifically, all of the `{view}` calls in the current pass's view-replay logic should turn into `{view}_copy` calls that vulkan/XLA know how to implement, and there will be separate passes for (a) removing mutations, and (b) turning `{view}_copy` calls back into `{view}` calls. For Vulkan, we eventually want a pass that ONLY removes aliasing and view calls, and doesn't remove mutations. We can also probably implement the 2 new passes as user dispatch keys to save dispatch key space, since they'll only be used by functorch anyway.
* Do more of a dive on perf for the vulkan/xla use cases. There are several areas to improve perf with varying levels of effort required. The simplest one that I'll probably do regardless is to codegen the out-of-place kernels instead of using a boxed fallback. Getting a POC working for xla will also be useful to test the view operator coverage.

**Example Codegen Output**

View Op:
```
::std::vector<at::Tensor> split_Tensor(c10::DispatchKeySet ks, const at::Tensor & self, int64_t split_size, int64_t dim) {

      auto self_ = at::functionalization::impl::unwrapFunctionalTensor(self);
      ::std::vector<at::Tensor> out;
      {
        at::AutoDispatchBelowFunctionalize guard;
        auto tmp_output = at::redispatch::split(ks & c10::after_func_keyset, self_, split_size, dim);
        out = at::functionalization::impl::wrapFunctionalTensor(tmp_output);
        // I'm fusing the [alias removal], [mutation removal], [add views back] passes together.
        // Later, we'll want to turn them into separate passes (since e.g. vulkan only cares about alias removal).
      }

      at::functionalization::ViewMeta view_meta = at::functionalization::ViewMeta(
        [split_size, dim](const at::Tensor& base, int64_t mutated_view_idx) -> at::Tensor {
          return base.split(split_size, dim)[mutated_view_idx];
        },
        [split_size, dim](const at::Tensor& base, const at::Tensor& mutated_view, int64_t mutated_view_idx) -> at::Tensor {
          return at::functionalization::impl::split_inverse(base, mutated_view, mutated_view_idx, split_size, dim);
        }
      );
      at::functionalization::impl::set_view_meta(out, self, view_meta);

      at::AutoDispatchDirectlyToNative native_guard;
      ::std::vector<at::Tensor> reference_tensor_output = at::native::split(self, split_size, dim);
      at::functionalization::impl::set_strides(out, reference_tensor_output);
      return out;

}
```

Mutation Op:
```
at::Tensor & add__Tensor(c10::DispatchKeySet ks, at::Tensor & self, const at::Tensor & other, const at::Scalar & alpha) {

      at::functionalization::impl::sync(self);
      at::functionalization::impl::sync(other);
      auto self_ = at::functionalization::impl::unwrapFunctionalTensor(self);
      auto other_ = at::functionalization::impl::unwrapFunctionalTensor(other);
      at::Tensor tmp_output;
      {
          at::AutoDispatchBelowFunctionalize guard;
          // The functionalization pass explicitly doesn't pass out= parameters to the redispatch
          tmp_output = at::redispatch::add(
            ks & c10::after_func_keyset, self_, other_, alpha);
      }

      self.replace_(tmp_output);
      at::functionalization::impl::maybe_add_update(self);
      return self;
}
```

View + Mutation Op:
```
at::Tensor & transpose_(c10::DispatchKeySet ks, at::Tensor & self, int64_t dim0, int64_t dim1) {

      at::functionalization::ViewMeta view_meta = at::functionalization::ViewMeta(
        [dim0, dim1](const at::Tensor& base, int64_t mutated_view_idx) -> at::Tensor {
          return base.transpose(dim0, dim1);
        },
        [dim0, dim1](const at::Tensor& base, const at::Tensor& mutated_view, int64_t mutated_view_idx) -> at::Tensor {
          return at::functionalization::impl::transpose_inverse(base, mutated_view, dim0, dim1);
        }
      );
      at::functionalization::impl::mutate_view_meta(self, view_meta);
      // See  Note [Propagating strides in the functionalization pass]
      // Directly update the sizes/strides/storage_offset fields on self using the inplace call.
      // I need the guard because I don't want the at::native kernel to end up calling more functionalization/functorch kernels.
      // Its only job is to directly compute the output size/stride/storage_offset metadata.
      at::AutoDispatchDirectlyToNative native_guard;
      at::native::transpose_(self, dim0, dim1);
      return self;

}
```

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31942093

Pulled By: bdhirsh

fbshipit-source-id: b95598dae35dd1842fa8b1d8d1448332f3afaadf
2021-10-28 10:51:17 -07:00
Brian Hirsh
665c148e42 move some codegen utilities into utils.py (#63094)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63094

This PR:
- Moves `FileManager` and its dependencies (`assert_never` and other imports) to `utils.py`, and updates all of the call-sites with the fresh imports
- Passes the list of NativeFunction objects into `gen_trace_type` directly, instead of requiring the function to regenerate it (we already have it)

The purpose of the reshuffling is to avoid circular dependencies in the next PR, where I add codegen for the functionalization pass, which gets called from `gen.py` (but depends on some stuff from the autograd codegen - in particular, the list of view ops).

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31942096

Pulled By: bdhirsh

fbshipit-source-id: 36118facae61f25f8922bb43ad2818c80b53504e
2021-10-28 10:49:17 -07:00
lezcano
82a216c45b Add tensor.{adjoint(),H,mT,mH} methods and properties (#64179)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64179

This PR follows the discussion in https://github.com/pytorch/pytorch/issues/45063#issuecomment-904431478

Fixes https://github.com/pytorch/pytorch/issues/45063

cc ezyang anjali411 dylanbespalko mruberry Lezcano nikitaved rgommers pmeier asmeurer leofang AnirudhDagar asi1024 emcastillo kmaehashi heitorschueroff
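For context, a quick illustration of the semantics these new properties are expected to have (my own example, not taken from the PR):

```python
import torch

x = torch.randn(2, 3, dtype=torch.complex64)
assert torch.allclose(x.mT, x.transpose(-2, -1))         # batched transpose of the last two dims
assert torch.allclose(x.mH, x.transpose(-2, -1).conj())  # conjugate transpose
assert torch.allclose(x.adjoint(), x.mH)                 # adjoint() matches mH
assert torch.allclose(x.H, x.t().conj())                 # H: conjugate transpose of a 2-D matrix
```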

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D30730483

Pulled By: anjali411

fbshipit-source-id: 821d25083f5f682450f6812bf852dc96a1cdf9f2
2021-10-13 07:44:43 -07:00
Peter Bell
44ede71751 Shard python_torch_functions.cpp (#62187)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62187

This file can take 3 minutes on its own to compile and, after
python_functions.cpp, is the second-biggest limiting factor for the compile time of
`libtorch_python` on a 32-core threadripper. This splits it into 3 files that
take around 1 minute each to compile.

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D29962048

Pulled By: albanD

fbshipit-source-id: 99016d75912bff483fe21b130cef43a6882f8c0e
2021-08-25 15:10:43 -07:00
Richard Zou
389380ffcc [reland] Refactor Tensor::to to call a primitive that is not copy_. (#62262)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62262

Context
-------
functorch is unable to vmap(grad(f)) when f contains a .to
call. This is because .to (when it is not a no-op) decomposes
to .copy_ under grad and the .copy_ is not compatible with vmap.

Fix
 ---
The fix for this is to have all Tensor::to variants call a new operator,
`_to_copy`, that always copies and is a primitive w.r.t. autograd so
that autograd decomposes Tensor::to into a call to `_to_copy`.
(This is related to https://github.com/pytorch/pytorch/issues/60956,
please let me know if you want to bikeshed the naming).

In order to get this done I had to do a bit of refactoring. All of the
`::to` implementations now call `to_impl` which may call `_to_copy`.

Autograd codegen changes
------------------------

The second thing I had to do was modify the autograd codegen. Right now,
autograd assumes that every output is either statically known to be
differentiable or not differentiable at codegen time. `_to_copy` is a
little special because its differentiability depends on the output
dtype, e.g. `torch.randn(3, requires_grad=True).to(torch.long)` is
non-differentiable (see the short example after this list). To get this to work:
- I changed how `output_differentiability` in derivatives.yaml works.
- output_differentiability can now accept "conditions" for each of the
output arguments. A "condition" is some C++ code.
- We currently only support `output_differentiability` with conditions
if there is a single output. This is for convenience and can be changed
in the future.
- I added a new `output_differentiability_conditions` field to
DifferentiabilityInfo. This gets populated in load_derivatives.yaml
- forward-mode and reverse-mode AD take
`output_differentiability_conditions` into account.
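To make the dtype-dependence concrete, here is a small eager-mode example of the behavior those conditions have to reproduce (my own illustration):

```python
import torch

x = torch.randn(3, requires_grad=True)
y = x.to(torch.double)  # floating-point result: stays differentiable
z = x.to(torch.long)    # integral result: not differentiable
assert y.requires_grad
assert not z.requires_grad
```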

Here's how the generated code for `VariableType::_to_copy`
[looks
like](https://gist.github.com/zou3519/93462df4bda1837acee345205b7cc849)
No other autogenerated code gets modified by this PR.

Performance benchmarking
------------------------
- I benchmarked [three
cases that demonstrate overhead](https://gist.github.com/zou3519/5b6985e6906b80eec5a0dd94ed5b6a1a).
- Case A: No-op .to(). Instruction count went from 50223 to 25623. I
have no clue why but this is a good thing.
- Case B: not-no-op .to(). Instruction count went from 665291 to 671961.
This is expected; `_to_copy` adds an additional dispatch.
- Case C: not-no-op .to() forward pass and backward pass. Instruction count
went from 4022841 to 4030057. This PR adds
an additional dispatch to .to() (so there should be one additional
dispatch in the forward pass) so this number looks reasonable.

Test Plan
---------
- test_torch.py has a test_to
- test_cuda.py has test_to*
- test_autograd has tests (test_type_conversions) that exercise the
reverse-mode path
- test_ops.py has some tests (like log_softmax) that exercise the
reverse-mode and forward-mode AD path.
- test_quantization, test_namedtensor all exercise tensor.to as well.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29934998

Pulled By: zou3519

fbshipit-source-id: 820069acd66fd5af97b98f42edfca68572c9eb1c
2021-07-29 10:49:32 -07:00
Nikita Shulga
478098aaac Revert D29801652: Refactor Tensor::to to call a primitive that is not copy_.
Test Plan: revert-hammer

Differential Revision:
D29801652 (29bb3f4647)

Original commit changeset: bb01eb1acf3d

fbshipit-source-id: 93693bad8068d47a3a4c16f34f300e03ea573897
2021-07-26 19:37:17 -07:00
Richard Zou
29bb3f4647 Refactor Tensor::to to call a primitive that is not copy_. (#61458)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61458

Context
-------
functorch is unable to vmap(grad(f)) when f contains a .to
call. This is because .to (when it is not a no-op) decomposes
to .copy_ under grad and the .copy_ is not compatible with vmap.

Fix
 ---
The fix for this is to have all Tensor::to variants call a new operator,
`_to_copy`, that always copies and is a primitive w.r.t. autograd so
that autograd decomposes Tensor::to into a call to `_to_copy`.
(This is related to https://github.com/pytorch/pytorch/issues/60956,
please let me know if you want to bikeshed the naming).

In order to get this done I had to do a bit of refactoring. All of the
`::to` implementations now call `to_impl` which may call `_to_copy`.

Autograd codegen changes
------------------------

The second thing I had to do was modify the autograd codegen. Right now,
autograd assumes that every output is either statically known to be
differentiable or not differentiable at codegen time. `_to_copy` is a
little special because its differentiability depends on the output
dtype, e.g. `torch.randn(3, requires_grad=True).to(torch.long)` is
non-differentiable. To get this to work:
- I changed how `output_differentiability` in derivatives.yaml works.
- output_differentiability can now accept "conditions" for each of the
output arguments. A "condition" is some C++ code.
- We currently only support `output_differentiability` with conditions
if there is a single output. This is for convenience and can be changed
in the future.
- I added a new `output_differentiability_conditions` field to
DifferentiabilityInfo. This gets populated in load_derivatives.yaml
- forward-mode and reverse-mode AD take
`output_differentiability_conditions` into account.

Here's how the generated code for `VariableType::_to_copy`
[looks
like](https://gist.github.com/zou3519/93462df4bda1837acee345205b7cc849)
No other autogenerated code gets modified by this PR.

Performance benchmarking
------------------------
- I benchmarked [three
cases that demonstrate overhead](https://gist.github.com/zou3519/5b6985e6906b80eec5a0dd94ed5b6a1a).
- Case A: No-op .to(). Instruction count went from 50223 to 25623. I
have no clue why but this is a good thing.
- Case B: not-no-op .to(). Instruction count went from 665291 to 671961.
This is expected; `_to_copy` adds an additional dispatch.
- Case C: not-no-op .to() forward pass and backward pass. Instruction count
went from 4022841 to 4030057. This PR adds
an additional dispatch to .to() (so there should be one additional
dispatch in the forward pass) so this number looks reasonable.

Test Plan
---------
- test_torch.py has a test_to
- test_cuda.py has test_to*
- test_autograd has tests (test_type_conversions) that exercise the
reverse-mode path
- test_ops.py has some tests (like log_softmax) that exercise the
reverse-mode and forward-mode AD path.
- test_quantization, test_namedtensor all exercise tensor.to as well.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D29801652

Pulled By: zou3519

fbshipit-source-id: bb01eb1acf3d79d84f284150d1be4be3b4ace351
2021-07-26 13:02:39 -07:00
Laurence Rouesnel
adb73d3dcf Removed overhead from reshape() call if tensor doesn't need to be changed (#61466)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61466

## Goal

Per #55126 the performance of `reshape` is worse than `alias` in cases where they are performing the same operation (i.e. where reshape is returning a view) because `reshape` delegates to `view` and duplicates some of the operations (specifically `infer_size_dv` and `computeStride`).

The goal of this pull-request is to reduce or remove the additional overhead that `reshape` has.

### Proposed Implementation

Instead of using `view`, we implement a private/internal operator (`_reshape_alias`) that `reshape` dispatches to, which skips the relevant checks. This is functionally equivalent to `as_strided`; however, it is a lot simpler because it's specialized to this use case, and, importantly, the `backward` implementation is a lot faster.

Note that we have to dispatch (`reshape` is a composite operator) because `reshape` can return either a view or a copy of the Tensor depending on the parameters, and this complicates implementing a derivative/backward for `reshape`.
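A small example of the two behaviors (my own illustration): `reshape` aliases its input when the requested shape is compatible with the existing strides, and silently copies otherwise.

```python
import torch

x = torch.arange(6)
v = x.reshape(2, 3)              # contiguous input: reshape returns a view
assert v.data_ptr() == x.data_ptr()

y = x.reshape(2, 3).t()          # non-contiguous tensor
c = y.reshape(6)                 # no valid strides exist, so reshape must copy
assert c.data_ptr() != y.data_ptr()
```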

### Why not `as_strided`?

Using `as_strided` directly slows down autograd. If we use a custom function equivalent to `_reshape_alias` but with a simpler backward function then `view` has the same performance as `reshape`. If we delegate to `as_strided` it is about 56% slower (and this holds against our custom function).

This is also the reason we make an internal operator named `_reshape_alias` instead of exposing a new operator since this should only be used in the `reshape` case and it is effectively a more limited version of `view`, `alias`, and `as_strided`.

## Benchmarks
In a micro-benchmark for `backward` running:

```cpp
// Setup
at::Tensor x=torch::empty({2,2}, torch::requires_grad(true));

// Benchmark loop
// `reshape(-1)` replaced with a call to view(-1) for view baseline
x.pow(4).reshape(-1).mean().backward();
```

I also benchmarked simple operations without gradients using:

```cpp
// Setup
at::Tensor x=torch::empty({2,2}, torch::requires_grad(true));

// Benchmark loop
x.reshape(-1) // replaced with a call to view(-1) for view baseline
```

Baselined to `view`:

* Original `reshape`: `+3.3%` (without gradients `+20.8%`)
* Using `as_strided`: `+55.1%` (without gradients `+1.0%`)
* Using custom `_reshape_alias`: `-1.0%` (without gradients `+6.2%`)

In absolute terms (note the percentages above were generated comparing between runs/tests rather than to a single baseline):

* Original `view`: `53.66 us` (without gradients `582.78 ns`)
* Original `reshape`: `55.46 us` (without gradients `704.24 ns`)
* Using `as_strided`: `83.24 us` (without gradients `576.49 ns`)
* Using custom `_reshape_alias`: `53.13 us` (without gradients `536.01 ns`)

Note that these benchmarks perform a backward operation as well. When compared without any gradient computation, the performance differences are more pronounced, since the reshape call itself takes up more of the total time.

### Original performance

<details>
  <summary>Benchmark results</summary>

```
[<torch.utils.benchmark.utils.common.Measurement object at 0x7f0e4d393160>
x.pow(4).view(-1).mean().backward();
setup: at::Tensor x=torch::empty({2,2}, torch::requires_grad(true));
  Median: 53.66 us
  IQR:    2.70 us (52.54 to 55.24)
  884 measurements, 100 runs per measurement, 1 thread]

[<torch.utils.benchmark.utils.common.Measurement object at 0x7f0e2ebd4fa0>
x.pow(4).reshape(-1).mean().backward();
setup: at::Tensor x=torch::empty({2,2}, torch::requires_grad(true));
  Median: 55.46 us
  IQR:    2.61 us (54.39 to 57.01)
  889 measurements, 100 runs per measurement, 1 thread]

2276116
2286256

<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts object at 0x7f0e5b2e3e20>
   2640  ???:at::detail::computeStride(c10::ArrayRef<long>, c10::ArrayRef<long>, c10::SmallVector<long, 5u> const&)
   1920  ???:at::native::reshape(at::Tensor const&, c10::ArrayRef<long>)
   1520  ???:at::_ops::reshape::call(at::Tensor const&, c10::ArrayRef<long>)
   1040  ???:c10::SmallVectorImpl<long>::operator=(c10::SmallVectorImpl<long>&&)
    980  ???:void at::infer_size_impl<c10::SmallVector<long, 5u> >(c10::ArrayRef<long>, long, c10::SmallVector<long, 5u>&)
    720  ???:__tls_get_addr
    520  ???:at::shouldRunRecordFunction(bool*)
    520  ???:__memcpy_avx_unaligned_erms
    200  ???:c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10:: ... g>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
    100  ???:c10::TensorImpl::strides() const
    100  ???:c10::TensorImpl::sizes() const
    100  ???:at::(anonymous namespace)::manager()
     77  /tmp/benchmark_utils_jit_build__1626465284__8a34e7ff-cd37-4a82-be28-7f19e081e771/timer_cpp_7815557938202456331/timer_src.cpp:main
     40  ???:c10::TensorImpl::numel() const
    -77  /tmp/benchmark_utils_jit_build__1626465284__8a34e7ff-cd37-4a82-be28-7f19e081e771/timer_cpp_8055217880649990171/timer_src.cpp:main
   -260  ???:at::native::view(at::Tensor const&, c10::ArrayRef<long>)

Total: 10140
```

```
[<torch.utils.benchmark.utils.common.Measurement object at 0x7f850dd66c10>
x.view(-1);
setup: at::Tensor x=torch::empty({2,2});
  Median: 582.78 ns
  IQR:    33.80 ns (573.80 to 607.61)
  833 measurements, 10000 runs per measurement, 1 thread]

[<torch.utils.benchmark.utils.common.Measurement object at 0x7f850de31e20>
x.reshape(-1);
setup: at::Tensor x=torch::empty({2,2});
  Median: 704.24 ns
  IQR:    24.42 ns (697.20 to 721.62)
  679 measurements, 10000 runs per measurement, 1 thread]

56896
67036

<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts object at 0x7f84e1930bb0>
   2640  ???:at::detail::computeStride(c10::ArrayRef<long>, c10::ArrayRef<long>, c10::SmallVector<long, 5u> const&)
   1920  ???:at::native::reshape(at::Tensor const&, c10::ArrayRef<long>)
   1520  ???:at::_ops::reshape::call(at::Tensor const&, c10::ArrayRef<long>)
   1040  ???:c10::SmallVectorImpl<long>::operator=(c10::SmallVectorImpl<long>&&)
    980  ???:void at::infer_size_impl<c10::SmallVector<long, 5u> >(c10::ArrayRef<long>, long, c10::SmallVector<long, 5u>&)
    720  ???:__tls_get_addr
    520  ???:at::shouldRunRecordFunction(bool*)
    520  ???:__memcpy_avx_unaligned_erms
    200  ???:c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10:: ... g>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
    100  ???:c10::TensorImpl::strides() const
    100  ???:c10::TensorImpl::sizes() const
    100  ???:at::(anonymous namespace)::manager()
     76  /tmp/benchmark_utils_jit_build__1626466038__15fbbac0-2072-4459-8f8e-08121a905b99/timer_cpp_547407365342278353/timer_src.cpp:main
     40  ???:c10::TensorImpl::numel() const
    -76  /tmp/benchmark_utils_jit_build__1626466038__15fbbac0-2072-4459-8f8e-08121a905b99/timer_cpp_3457873755756181226/timer_src.cpp:main
   -260  ???:at::native::view(at::Tensor const&, c10::ArrayRef<long>)

Total: 10140
```

</details>

### Using `as_strided`

<details>
  <summary>Benchmark results</summary>

```
[<torch.utils.benchmark.utils.common.Measurement object at 0x7f8b13bb5b50>
x.pow(4).view(-1).mean().backward();
setup: at::Tensor x=torch::empty({2,2}, torch::requires_grad(true));
  Median: 53.37 us
  IQR:    3.15 us (51.73 to 54.88)
  936 measurements, 100 runs per measurement, 1 thread]

[<torch.utils.benchmark.utils.common.Measurement object at 0x7f8af55f8490>
x.pow(4).reshape(-1).mean().backward();
setup: at::Tensor x=torch::empty({2,2}, torch::requires_grad(true));
  Median: 83.24 us
  IQR:    4.05 us (81.20 to 85.25)
  609 measurements, 100 runs per measurement, 1 thread]

2267916
2525061

<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts object at 0x7f8af55f8e50>
   31930  ???:_int_free
   15940  ???:malloc
   11595  ???:_int_malloc
   10100  ???:torch::autograd::generated::details::as_strided_backward(at::Tensor, at::TensorGeometry, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>)
    9360  ???:__tls_get_addr
    8280  ???:free
    8100  ???:torch::autograd::VariableType::(anonymous namespace)::as_strided(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>)
    4520  ???:c10::intrusive_ptr<c10::TensorImpl, c10::UndefinedTensorImpl>::reset_()
    4080  ???:operator new(unsigned long)
     ...
    -780  ???:at::_ops::view::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
    -920  ???:c10::SmallVectorImpl<long>::operator=(c10::SmallVectorImpl<long> const&)
   -1220  ???:torch::autograd::generated::ViewBackward::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&)
   -1520  ???:at::_ops::view::call(at::Tensor const&, c10::ArrayRef<long>)
   -1580  ???:torch::ADInplaceOrView::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
   -1680  ???:at::Tensor at::native::alias_with_sizes_and_strides<c10::SmallVector<long, 5u> >(at::Tensor const&, c10::SmallVector<long, 5u> const&, c10::SmallVector<long, 5u> const&)
   -2560  ???:at::detail::computeStride(c10::ArrayRef<long>, c10::ArrayRef<long>, c10::SmallVector<long, 5u> const&)
   -2640  ???:at::native::view(at::Tensor const&, c10::ArrayRef<long>)
   -4860  ???:torch::autograd::VariableType::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)

Total: 257145
```

```

[<torch.utils.benchmark.utils.common.Measurement object at 0x7f93176a0160>
x.view(-1);
setup: at::Tensor x=torch::empty({2,2});
  Median: 570.55 ns
  IQR:    32.69 ns (552.87 to 585.56)
  874 measurements, 10000 runs per measurement, 1 thread]

[<torch.utils.benchmark.utils.common.Measurement object at 0x7f92f8f29490>
x.reshape(-1);
setup: at::Tensor x=torch::empty({2,2});
  Median: 576.49 ns
  IQR:    37.95 ns (559.51 to 597.46)
  861 measurements, 10000 runs per measurement, 1 thread]

56896
58556

<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts object at 0x7f932556ca60>
    2140  ???:at::native::reshape(at::Tensor const&, c10::ArrayRef<long>)
    1940  ???:torch::autograd::VariableType::(anonymous namespace)::as_strided(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>)
    1880  ???:torch::ADInplaceOrView::(anonymous namespace)::as_strided(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>)
    1720  ???:at::_ops::as_strided::call(at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>)
    1520  ???:at::_ops::reshape::call(at::Tensor const&, c10::ArrayRef<long>)
    1400  ???:at::native::as_strided_tensorimpl(at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>)
    1260  ???:at::_ops::as_strided::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>)'2
    1260  ???:at::_ops::as_strided::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>)
     980  ???:void at::infer_size_impl<c10::SmallVector<long, 5u> >(c10::ArrayRef<long>, long, c10::SmallVector<long, 5u>&)
     ...
    -620  ???:at::Tensor c10::Dispatcher::redispatch<at::Tensor, at::Tensor const&, c10::ArrayRef<long ... ::ArrayRef<long>)> const&, c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) const
    -780  ???:at::_ops::view::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)'2
    -780  ???:at::_ops::view::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
    -920  ???:c10::SmallVectorImpl<long>::operator=(c10::SmallVectorImpl<long> const&)
   -1520  ???:at::_ops::view::call(at::Tensor const&, c10::ArrayRef<long>)
   -1580  ???:torch::ADInplaceOrView::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
   -1680  ???:at::Tensor at::native::alias_with_sizes_and_strides<c10::SmallVector<long, 5u> >(at::Tensor const&, c10::SmallVector<long, 5u> const&, c10::SmallVector<long, 5u> const&)
   -1740  ???:torch::autograd::VariableType::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
   -2640  ???:at::native::view(at::Tensor const&, c10::ArrayRef<long>)

Total: 1660

```

</details>

### Using custom function (`_reshape_alias`)

<details>
  <summary>Benchmark results</summary>

```
[<torch.utils.benchmark.utils.common.Measurement object at 0x7f16861d6b50>
x.pow(4).view(-1).mean().backward();
setup: at::Tensor x=torch::empty({2,2}, torch::requires_grad(true));
  Median: 53.50 us
  IQR:    2.64 us (52.32 to 54.96)
  906 measurements, 100 runs per measurement, 1 thread]

[<torch.utils.benchmark.utils.common.Measurement object at 0x7f1667b2ed60>
x.pow(4).reshape(-1).mean().backward();
setup: at::Tensor x=torch::empty({2,2}, torch::requires_grad(true));
  Median: 53.13 us
  IQR:    3.40 us (51.72 to 55.13)
  914 measurements, 100 runs per measurement, 1 thread]

2269736
2273236

<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts object at 0x7f1693f8dc10>
    5060  ???:torch::autograd::VariableType::(anonymous namespace)::_reshape_alias(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>)
    2000  ???:at::native::reshape(at::Tensor const&, c10::ArrayRef<long>)
    1780  ???:torch::ADInplaceOrView::(anonymous namespace)::_reshape_alias(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>)
    1660  ???:at::_ops::_reshape_alias::call(at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>)
    1600  ???:at::Tensor at::native::alias_with_sizes_and_strides<c10::ArrayRef<long> >(at::Tensor const&, c10::ArrayRef<long> const&, c10::ArrayRef<long> const&)
    1520  ???:at::_ops::reshape::call(at::Tensor const&, c10::ArrayRef<long>)
    1240  ???:at::_ops::_reshape_alias::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>)'2
    1240  ???:at::_ops::_reshape_alias::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>)
    1220  ???:torch::autograd::generated::AliasToShapeBackward::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&)
     ...
    -780  ???:at::_ops::view::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)'2
    -780  ???:at::_ops::view::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
    -920  ???:c10::SmallVectorImpl<long>::operator=(c10::SmallVectorImpl<long> const&)
   -1220  ???:torch::autograd::generated::ViewBackward::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&)
   -1520  ???:at::_ops::view::call(at::Tensor const&, c10::ArrayRef<long>)
   -1580  ???:torch::ADInplaceOrView::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
   -1680  ???:at::Tensor at::native::alias_with_sizes_and_strides<c10::SmallVector<long, 5u> >(at::Tensor const&, c10::SmallVector<long, 5u> const&, c10::SmallVector<long, 5u> const&)
   -2640  ???:at::native::view(at::Tensor const&, c10::ArrayRef<long>)
   -4860  ???:torch::autograd::VariableType::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)

Total: 3500
```

```

[<torch.utils.benchmark.utils.common.Measurement object at 0x7f5287adfb20>
x.view(-1);
setup: at::Tensor x=torch::empty({2,2});
  Median: 505.10 ns
  IQR:    20.04 ns (500.41 to 520.45)
  944 measurements, 10000 runs per measurement, 1 thread]

[<torch.utils.benchmark.utils.common.Measurement object at 0x7f526951b430>
x.reshape(-1);
setup: at::Tensor x=torch::empty({2,2});
  Median: 536.01 ns
  IQR:    17.81 ns (531.34 to 549.16)
  916 measurements, 10000 runs per measurement, 1 thread]

56896
60376

<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts object at 0x7f5295896c10>
    2000  ???:at::native::reshape(at::Tensor const&, c10::ArrayRef<long>)
    1860  ???:torch::autograd::VariableType::(anonymous namespace)::_reshape_alias(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>)
    1780  ???:torch::ADInplaceOrView::(anonymous namespace)::_reshape_alias(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>)
    1660  ???:at::_ops::_reshape_alias::call(at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>)
    1600  ???:at::Tensor at::native::alias_with_sizes_and_strides<c10::ArrayRef<long> >(at::Tensor const&, c10::ArrayRef<long> const&, c10::ArrayRef<long> const&)
    1520  ???:at::_ops::reshape::call(at::Tensor const&, c10::ArrayRef<long>)
    1240  ???:at::_ops::_reshape_alias::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>)'2
    1240  ???:at::_ops::_reshape_alias::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>)
     980  ???:void at::infer_size_impl<c10::SmallVector<long, 5u> >(c10::ArrayRef<long>, long, c10::SmallVector<long, 5u>&)
     ...
    -620  ???:at::Tensor c10::Dispatcher::redispatch<at::Tensor, at::Tensor const&, c10::ArrayRef<long ... ::ArrayRef<long>)> const&, c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) const
    -780  ???:at::_ops::view::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)'2
    -780  ???:at::_ops::view::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
    -920  ???:c10::SmallVectorImpl<long>::operator=(c10::SmallVectorImpl<long> const&)
   -1520  ???:at::_ops::view::call(at::Tensor const&, c10::ArrayRef<long>)
   -1580  ???:torch::ADInplaceOrView::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
   -1680  ???:at::Tensor at::native::alias_with_sizes_and_strides<c10::SmallVector<long, 5u> >(at::Tensor const&, c10::SmallVector<long, 5u> const&, c10::SmallVector<long, 5u> const&)
   -1740  ???:torch::autograd::VariableType::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)
   -2640  ???:at::native::view(at::Tensor const&, c10::ArrayRef<long>)

Total: 3480

```

</details>

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D29792126

Pulled By: laurencer

fbshipit-source-id: f0519b45b65f868aa3e8651679354558bd761dfd
2021-07-21 14:05:35 -07:00
albanD
d03ff1a17d pre compute regex and match simple signature autograd codegen 15s -> 12s (#59852)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59852

This whole stack does not change anything to the codegened code

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D29063814

Pulled By: albanD

fbshipit-source-id: a751047526f8d58f4760ee6f9ae906675bed5d75
2021-06-12 06:58:36 -07:00
albanD
30a18fe318 refactor yaml loader import, no runtime change (#59850)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59850

This whole stack does not change anything to the codegened code

Test Plan: Imported from OSS

Reviewed By: ailzhang

Differential Revision: D29063816

Pulled By: albanD

fbshipit-source-id: ca3067443d8e6282c1077d3dafa3b4f330d43b28
2021-06-12 06:58:34 -07:00
albanD
7143a6a189 Avoid unnecessary re-computation autograd codegen 21s -> 15s (#59847)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59847

This whole stack does not change anything to the codegened code

Test Plan: Imported from OSS

Reviewed By: ailzhang

Differential Revision: D29063817

Pulled By: albanD

fbshipit-source-id: 284c3e057029b7a67f43a1b034bb30863bd68c71
2021-06-12 06:57:19 -07:00
Kshiteej K
c90260905f [fix] torch.{lin, log}space(): properly examine passed dtype (#53685)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53171

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53685

Reviewed By: jbschlosser

Differential Revision: D28331863

Pulled By: anjali411

fbshipit-source-id: e89359b607d058158cfa1c9a82389d9a4a71185b
2021-06-10 11:59:54 -07:00
Jeffrey Wan
f52e202840 Add warning when accessing Tensor::grad() in the C++ API (#59362)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/35379

 - Adds a `retains_grad` attribute backed by cpp as a native function (see the short usage example after this list). The python bindings for the function are skipped to be consistent with `is_leaf`.
   - Tried writing it without native function, but the jit test `test_tensor_properties` seems to require that it be a native function (or alternatively maybe it could also work if we manually add a prim implementation?).
 - Python API now uses `retain_grad` implementation from cpp
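A short usage example of the new attribute (my own illustration, not from the PR):

```python
import torch

x = torch.randn(3, requires_grad=True)
y = x * 2                   # non-leaf tensor
assert not y.retains_grad   # non-leaf grads are not kept by default
y.retain_grad()
assert y.retains_grad
y.sum().backward()
assert y.grad is not None   # grad was retained on the non-leaf
```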

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59362

Reviewed By: jbschlosser

Differential Revision: D28969298

Pulled By: soulitzer

fbshipit-source-id: 335f2be50b9fb870cd35dc72f7dadd6c8666cc02
2021-06-08 19:43:21 -07:00
Brian Hirsh
9354a68e7d [codegen] split out backend-specific information from NativeFunction in the model (#57361)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57361

Data model change in the codegen, which splits backend-specific information out of `NativeFunction`

### Overview
Currently in the codegen, native_functions.yaml has backend-specific information about each operator that is encoded directly into the data model, in the `NativeFunction` object. That's reasonable, since the native_functions.yaml is the source of truth for information about an operator, and the data model encodes that information into types.

Now that external backends can use the codegen though, that information is technically incomplete/inaccurate. In another PR, I tried patching the information on the `NativeFunction` object with the additional external information, by updating the `dispatch` entry to contain the external backend kernel name and dispatch key.

Instead, this PR tries to split out that information. The `NativeFunction` class contains all information about an operator from native_functions.yaml that's backend-independent and is known never to change regardless of what extra information backends provide. We also build up a backend "index", which is basically a mapping from [backend] -> [backend-specific-metadata]. Reading in an external backend yaml just involves updating that index with the new backend.

There were a few places where `NativeFunction` used the dispatch table directly, that I encoded as properties directly on the NativeFunction object (e.g. `is_abstract`). They were mostly around whether or not the operator has a composite kernel, which isn't something that's going to change for any external backends.

This has a few advantages:
- We can more easily re-use the existing logic in `native_function.py` and `register_dispatch_key.py` for both native and external backends, since they both involve a NativeFunction + a particular backend index
- The data in the data model will be the same regardless of how the codegen is run. Running the codegen with a new external backend doesn't change the data inside of NativeFunction or an existing backend index. It just adds a new index for that backend.
- There are several of codegen areas that don't care about backend-specific information: mostly the tracing and autograd codegen. We can reason about the codegen there more easily, knowing that backend-specific info is entirely uninvolved.

An alternative to this split would be to augment the NativeFunction objects with external backend information at the time that we create them. So the external codegen could read both native_functions.yaml and the external backend's yaml at the same time, and construct a NativeObject with a full dispatch table (including the XLA entry), and the correct setting of structured (taking into account both yamls). One disadvantage to this approach is that NativeFunction objects now contain different stuff depending on how you ran the codegen, and you have to make sure that any changes to the codegen can properly handle all the different variants.

### Data Model Changes
Removed 3 classes, which are used by the external codegen:
- ExternalBackendFunction
- ExternalBackendFunctionsGroup
- ExternalBackendMetadata

And added two new ones:
- BackendIndex
- BackendMetadata

`BackendIndex` contains any info that's specific to that backend, plus a mapping from operator names to backend specific metadata about the operator. One example of backend-specific info that's not operator-dependent is the fact that XLA prefers to implement functional kernels instead of out kernels (and so when they eventually mark an op as structured, they're going to mark the functional op and not the out op).

`BackendMetadata` contains info specific to an (operator, backend) pair. Right now, that's just (a) the name of the kernel, and (b) whether or not that operator is structured.
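Schematically, the new pieces look roughly like this (a Python sketch with approximate field names, not the exact codegen classes):

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class BackendMetadata:
    kernel: str        # name of the kernel this backend registers for the operator
    structured: bool   # whether this (operator, backend) pair is structured

@dataclass
class BackendIndex:
    dispatch_key: str
    use_out_as_primary: bool  # operator-independent info, e.g. XLA preferring functional kernels
    # operator name -> backend-specific metadata
    index: Dict[str, BackendMetadata] = field(default_factory=dict)
```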

### Questions
I wanted to get this PR up earlier so I could get feedback, but there are a few things I want to call out:

**Dealing with `structured`.**
This PR separates out the notion of `structured` into two bits of information:
- Does [operator] have a meta() function. This is backend-agnostic, and is represented by the `structured` property on `NativeFunction`, same as before. This is used, e.g., to decide what signatures to add to `MetaFunctions.h`.
- Does [operator, backend] have an impl() function. This is backend dependent; even though technically all in-tree backends are forced to write impl() functions for an operator when we port the op to structured in native_functions.yaml, out-of-tree backends can decide to opt in independently. This is represented as a property on `BackendMetadata`. This is used in most other cases, e.g. in `RegisterDispatchKey` when we're deciding whether or not to gen a structured or unstructured wrapper.

I also baked `is_structured_dispatch_key` directly into each BackendIndex. So for operators marked "structured" in native_functions.yaml, their corresponding CPU/CUDA BackendIndex entries will be marked structured, and all others (except for potentially external backends) will not.

I ended up trying to deal with `structured` in this change since it's technically backend dependent (XLA can opt kernels into structured separately from in-tree ops), but that may have been too ambitious: it's technically not relevant until we actually add support for structured external kernels. If it's not clear that this is the right path for dealing with structured and we want to push that off, I'm fine with backing out the bits of this PR that make `structured` backend-dependent. I don't see anything *too* controversial related to structured in the change, but I tried to call out any areas in the comments

**Localizing the fact that external backends follow Dispatcher convention.**
Another thing that's sort of backend specific that I didn't totally address in this PR is the fact the fact that in-tree backends follow the Native API while external backends follow the Dispatcher API. I painted over that in `native_functions.py` by adding a helper, `kernel_signature`, that takes in a native function and gives you the "correct" signature for the specified backend- NativeSignature for in-tree backends, and DispatcherSignature for out-of-tree backends. In order to make that fully useable though, we'll need `NativeSignature` and `DispatcherSignature` to have matching interfaces. I didn't bother with that in this PR, which is why `gen_external_aten_fallbacks.py` still has a bunch of direct references to the dispatcher API. Thinking of adding it in a later PR but wanted to see if anyone has other opinions.

Maybe `is_external()` shouldn't even be a property on the BackendMetadata, and anything the codegen does that requires asking for that information should just be better abstracted away.

**Thoughts on the `BackendIndex` / `BackendMetadata` breakdown.**
One thing that's annoying right now is that to query for various pieces of metadata, you call helper functions like `backend_index.structured(f)`, which queries that particular backend and tells you if that specific NativeFunctionGroup is structured for that backend. It has to return an `Optional[bool]` though, since you have to handle the case where that operator doesn't have a kernel for that backend at all. So users of those helpers end up with a bunch of optionals that they need to unpack, even if they know at some point that the result isn't None. I think it would be easier instead to just store the NativeFunction object as a field directly on the BackendMetadata. Curious if there are any other opinions on a better way to model it though.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D28474362

Pulled By: bdhirsh

fbshipit-source-id: 41a00821acf172467d764cb41e771e096542f661
2021-05-17 12:25:35 -07:00
Ilqar Ramazanli
8b816e9010 To implement gradient for Pytorch (#54617)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/56129
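For reference, a minimal usage example of the resulting `torch.gradient` API (my own illustration; exact boundary handling depends on `edge_order`):

```python
import torch

t = torch.tensor([1.0, 2.0, 4.0, 7.0])
(g,) = torch.gradient(t)  # central differences in the interior, one-sided at the edges
print(g)                  # roughly tensor([1.0, 1.5, 2.5, 3.0]) with unit spacing
```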

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54617

Reviewed By: anjali411

Differential Revision: D28057452

Pulled By: iramazanli

fbshipit-source-id: 9bd86679282d34f5e5393e6447121586517eb4f0
2021-05-11 18:52:20 -07:00
Ilqar Ramazanli
15975cf6a6 To add priority of int/int? over int[] on signature matching and adding {h,v,d}split methods (#57346)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/54555

It has been discussed in issue https://github.com/pytorch/pytorch/issues/54555 that the {h,v,d}split methods unexpectedly match an argument of a single int[] when they are expected to match a single int argument. The same unexpected behavior can happen in other functions/methods that can take both int[] and int? as single-argument signatures.

In this PR we solve this problem by giving higher priority to int/int? arguments over int[] while sorting signatures.

We also add the {h,v,d}split methods here, which helped us discover this unexpected behavior.
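A quick illustration (my own example) of the two overloads that needed disambiguating:

```python
import torch

x = torch.arange(12).reshape(3, 4)
a = x.hsplit(2)              # int: split into 2 equal chunks along the column dim
b = x.hsplit([1, 3])         # int[]: split at the given column indices
print([t.shape for t in a])  # [torch.Size([3, 2]), torch.Size([3, 2])]
print([t.shape for t in b])  # [torch.Size([3, 1]), torch.Size([3, 2]), torch.Size([3, 1])]
```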

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57346

Reviewed By: ezyang

Differential Revision: D28121234

Pulled By: iramazanli

fbshipit-source-id: 851cf40b370707be89298177b51ceb4527f4b2d6
2021-05-03 18:52:41 -07:00
Sam Estep
75024e228c Add lint for unqualified type: ignore (#56290)
Summary:
The other half of https://github.com/pytorch/pytorch/issues/56272.
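In other words, the lint requires every `type: ignore` to name the error code it suppresses (a minimal illustration, not a diff from this PR):

```python
x: int = "not an int"  # type: ignore[assignment]  # OK: the suppressed error code is named
y: int = "not an int"  # type: ignore              # flagged: a blanket ignore hides other errors
```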

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56290

Test Plan:
CI should pass on the tip of this PR, and we know that the lint works because the following CI runs (before this PR was finished) failed:

- https://github.com/pytorch/pytorch/runs/2384511062
- https://github.com/pytorch/pytorch/actions/runs/765036024

Reviewed By: seemethere

Differential Revision: D27867219

Pulled By: samestep

fbshipit-source-id: e648f07b6822867e70833e23ddafe7fb7eaca235
2021-04-21 08:07:23 -07:00
Edward Yang
6ec71ed4f9 Replace all direct cdata access with THPVariable_Unpack (#55799)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55799

I'm going to change the implementation of cdata soon so I need to
abstract over cdata access with a function. Additionally, many
users are manually casting to THPVariable to access
the member, so I can remove these unsafe casts in the client code
(the implementation, of course, is still doing an unsafe cast.)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D27712130

Pulled By: ezyang

fbshipit-source-id: 95fcc013bf3913d67f2c634068eb5b3aab144cb3
2021-04-15 08:57:04 -07:00
Sam Estep
4753100a3b Un-ignore F403 in .flake8 (#55838)
Summary:
Generally wildcard imports are bad for the reasons described here: https://www.flake8rules.com/rules/F403.html

This PR replaces wildcard imports with an explicit list of imported items where possible, and adds a `# noqa: F403` comment in the other cases (mostly re-exports in `__init__.py` files).
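A generic before/after for the F403 rule (illustrative only, not taken from this PR's diff):

```python
# from os.path import *             # F403: flake8 can't tell which names this defines
from os.path import dirname, join   # explicit imports (or keep the star with `# noqa: F403`)

print(join(dirname(__file__), "native_functions.yaml"))
```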

This is a prerequisite for https://github.com/pytorch/pytorch/issues/55816, because currently [`tools/codegen/dest/register_dispatch_key.py` simply fails if you sort its imports](https://github.com/pytorch/pytorch/actions/runs/742505908).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55838

Test Plan: CI. You can also run `flake8` locally.

Reviewed By: jbschlosser

Differential Revision: D27724232

Pulled By: samestep

fbshipit-source-id: 269fb09cb4168f8a51fd65bfaacc6cda7fb87c34
2021-04-13 09:24:07 -07:00
Sameer Deshmukh
5fb1142702 Add CSR (compressed sparse row) layout for sparse tensors (#50937)
Summary:
Implement compressed sparse row format. Derived from the GCS implementation at https://github.com/pytorch/pytorch/pull/44190
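A small usage illustration of the new layout (my own example, not from the PR):

```python
import torch

crow_indices = torch.tensor([0, 2, 3])  # row pointers: row 0 has 2 values, row 1 has 1
col_indices = torch.tensor([0, 2, 1])
values = torch.tensor([1.0, 2.0, 3.0])
csr = torch.sparse_csr_tensor(crow_indices, col_indices, values, size=(2, 3))
print(csr.to_dense())
# tensor([[1., 0., 2.],
#         [0., 3., 0.]])
```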

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50937

Reviewed By: mrshenli

Differential Revision: D27439865

Pulled By: ezyang

fbshipit-source-id: 3ba3dcb9679505b980ff6a5f513e913bbae2fb1d
2021-04-12 10:09:12 -07:00
lezcano
5870346173 Port index_copy from TH to ATen (#52203)
Summary:
The design of the `TensorIterator` was similar to that in https://github.com/pytorch/pytorch/pull/50578

Resolves https://github.com/pytorch/pytorch/issues/24670
Resolves https://github.com/pytorch/pytorch/issues/24523

Timings:
<details>
<summary>Script</summary>

```python
from IPython import get_ipython
import torch

torch.manual_seed(13)
torch.set_num_threads(1)

ipython = get_ipython()

cpu = torch.device('cpu')
cuda = torch.device('cuda')

def run_test(ndims, size, index_len, device):
    print(f"ndims: {ndims}, tensor_size: {size}, index_len: {index_len}, device: {device}")

    x = torch.rand(*([size] * ndims), device=device)
    index = torch.randint(size, (index_len,), dtype=torch.long, device=device)
    for d in range(ndims):
        shape_t = [size] * d + [index_len] + [size] * (ndims - d - 1)
        t = torch.rand(*shape_t, device=device)
        command = "x.index_copy(d, index, t)"
        if device == cuda:
            command = command + "; torch.cuda.synchronize()"
        ipython.magic(f"timeit {command}")
    print()

run_test(3, 700, 10, cpu)
run_test(3, 700, 100, cpu)
run_test(3, 700, 700, cpu)
run_test(2, 10000, 10000, cpu)

run_test(3, 700, 10, cuda)
run_test(3, 700, 100, cuda)
run_test(3, 700, 700, cuda)
run_test(2, 10000, 10000, cuda)
```

</details>

<details>
<summary>CPU ATen</summary>

```
ndims: 3, tensor_size: 700, index_len: 10, device: cpu
327 ms ± 309 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
329 ms ± 456 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
378 ms ± 1.44 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

ndims: 3, tensor_size: 700, index_len: 100, device: cpu
348 ms ± 1.52 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
359 ms ± 330 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
526 ms ± 686 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

ndims: 3, tensor_size: 700, index_len: 700, device: cpu
560 ms ± 19 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
552 ms ± 2.61 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
932 ms ± 2.52 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

ndims: 2, tensor_size: 10000, index_len: 10000, device: cpu
163 ms ± 5.05 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
302 ms ± 5.75 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
</details>

<details>
<summary>CUDA ATen</summary>

```
ndims: 3, tensor_size: 700, index_len: 10, device: cuda
9.63 ms ± 441 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
9.65 ms ± 230 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
12.4 ms ± 881 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)

ndims: 3, tensor_size: 700, index_len: 100, device: cuda
10.8 ms ± 1.51 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
11 ms ± 417 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
21.2 ms ± 18.2 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

ndims: 3, tensor_size: 700, index_len: 700, device: cuda
19 ms ± 4.42 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
17.8 ms ± 493 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
25.8 ms ± 1.22 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

ndims: 2, tensor_size: 10000, index_len: 10000, device: cuda
5.59 ms ± 109 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
10 ms ± 25.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

</details>

<details>
<summary>CPU TH</summary>

```
ndims: 3, tensor_size: 700, index_len: 10, device: cpu
333 ms ± 2.42 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
327 ms ± 1.04 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
366 ms ± 753 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

ndims: 3, tensor_size: 700, index_len: 100, device: cpu
336 ms ± 1.24 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
345 ms ± 914 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
884 ms ± 4.32 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

ndims: 3, tensor_size: 700, index_len: 700, device: cpu
441 ms ± 3.58 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
514 ms ± 1.17 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
7.46 s ± 6.46 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

ndims: 2, tensor_size: 10000, index_len: 10000, device: cpu
141 ms ± 233 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
1.13 s ± 855 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

</details>

<details>
<summary>CUDA TH</summary>

```
ndims: 3, tensor_size: 700, index_len: 10, device: cuda
9.64 ms ± 390 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
9.68 ms ± 3.26 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
13.9 ms ± 928 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)

ndims: 3, tensor_size: 700, index_len: 100, device: cuda
11.6 ms ± 1.38 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
12.1 ms ± 3.72 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
30.3 ms ± 27.2 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

ndims: 3, tensor_size: 700, index_len: 700, device: cuda
27.2 ms ± 19.8 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
30.6 ms ± 43.6 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
146 ms ± 204 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

ndims: 2, tensor_size: 10000, index_len: 10000, device: cuda
6.5 ms ± 3.99 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
64.7 ms ± 55.5 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

</details>

According to these results, we see a performance improvement on both CPU and GPU, with the largest gains when index_len equals the tensor size.

cc: nikitaved

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52203

Reviewed By: jbschlosser

Differential Revision: D27066572

Pulled By: mruberry

fbshipit-source-id: 6101e461cf731afa3db042a383b723d3d6bfdc26
2021-03-22 22:36:35 -07:00
kedejesu
53d8778b4d Update clang-format linux hash and yaml import calls (#53932)
Summary:
Fixing Bandit security issues.
- yaml_load: Use of unsafe yaml load. Allows instantiation of arbitrary objects. Consider yaml.safe_load().
Test ID: B506
Severity: MEDIUM
Confidence: HIGH
File: ./caffe2/contrib/aten/gen_op.py
More info: https://bandit.readthedocs.io/en/latest/plugins/b506_yaml_load.html
235 if __name__ == '__main__':
236     decls = yaml.load(read(os.path.join(args.yaml_dir, 'Declarations.yaml')), Loader=Loader)
237     factory_methods = find_factory_methods(decls)

- Blacklist: Use of insecure MD2 (6149a26adb), MD4 (fc7f026980), MD5 (7ea9d9af4e), or SHA1 hash function.
Test ID: B303
Severity: MEDIUM
Confidence: HIGH
File: ./tools/clang_format_utils.py
More info: https://bandit.readthedocs.io/en/latest/blacklists/blacklist_calls.html#b303-md5
36
37     hash = hashlib.sha1()
38

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53932

Reviewed By: jbschlosser

Differential Revision: D27072017

Pulled By: malfet

fbshipit-source-id: 2fef0119388797aee3cacdc880fc345bd2ba68ce
2021-03-18 17:11:58 -07:00
Ailing Zhang
9f75de278f Move common autograd utils functions from gen_variable_type.py to api/autograd.py. (#53340)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/53340

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D26973914

Pulled By: ailzhang

fbshipit-source-id: 8367a08b27b25808782c77aadc3c67d07c354957
2021-03-11 19:58:45 -08:00
kshitij12345
c4c77e2001 [special] add torch.special namespace (#52296)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/50345

* Add `torch.special` namespace
* Add `torch.special.gammaln` (alias to `torch.lgamma`)

TODO:
* Add proper entries for docs.
   * [x] Add .rst file entry
   * [x] Add documentation
   * [x] Update `lgamma` OpInfo entry for alias to `special.gammaln`.
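
A short usage sketch of the new alias (not code from the PR):

```python
import torch

x = torch.tensor([0.5, 1.5, 5.0])
# torch.special.gammaln is an alias, so it matches torch.lgamma exactly
assert torch.equal(torch.special.gammaln(x), torch.lgamma(x))
```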

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52296

Reviewed By: ngimel

Differential Revision: D26754890

Pulled By: mruberry

fbshipit-source-id: 73479f68989d6443ad07b7b02763fa98973c15f6
2021-03-04 00:04:36 -08:00
Peter Bell
70d0aab7bd De-prioritise Dimname and DimnameList in python overload resolution (#51350)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51350

`None` being a valid `Dimname` is awkward for optional `dim` arguments, as found
in NumPy's reduction functions like `std` and `var`. In these cases `dim=None`
should mean an all-reduction, but instead you get the error
"Please look up dimensions by name".

I've also had to fix `FunctionParameter::check` to actually check the first
element of `INT_LIST` arguments and reject non-int types. Otherwise, the dim
names end up calling the `int[]` overload and fail.
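
A sketch of the intended behaviour (assuming the optional-dim reductions this
unblocks): with the Dimname overload de-prioritised, `dim=None` falls through
to the `int[]` overload and means an all-reduction instead of raising.

```python
import torch

x = torch.randn(3, 4)
full = x.std(dim=None)        # NumPy-style all-reduction
assert torch.allclose(full, x.std())
```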

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D26756208

Pulled By: mruberry

fbshipit-source-id: 44221ca0f4822ec2c1f62b092466fd4f779eb45a
2021-03-02 23:07:08 -08:00
Vasiliy Kuznetsov
33afb5f19f fake_quant cachemask: remove Python bindings (#51878)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51878

`fake_quantize_per_tensor_affine_cachemask` and
`fake_quantize_per_channel_affine_cachemask` are implementation
details of `fake_quantize_per_tensor_affine` and
`fake_quantize_per_channel_affine`, so this PR removes the
Python bindings for them since there is no need to
expose them.
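
The public ops stay exposed; only the `*_cachemask` bindings go away. A small
sketch of the remaining Python surface (argument values are arbitrary):

```python
import torch

x = torch.randn(4)
# (input, scale, zero_point, quant_min, quant_max)
y = torch.fake_quantize_per_tensor_affine(x, 0.1, 0, 0, 255)
```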

Test Plan:
```
python test/test_quantization.py TestFakeQuantize
```

Imported from OSS

Reviewed By: albanD, bugra

Differential Revision: D26314173

fbshipit-source-id: 733c93a3951453e739b6ed46b72fbad2244f6e97
2021-02-09 23:27:53 -08:00
Edward Yang
93c4f9f972 Split out RegisterDispatchKey to its own file (#51508)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51508

No substantive changes.  The codegen for this file was getting a
bit long so I moved it off into tools.codegen.dest submodule (I
wanted to do tools.codegen.gen but that conflicts with the existing
module; oy vey!)  To do this I had to move some other functions around
so that they were more generally accessible.  Otherwise
self-explanatory.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: ljk53

Differential Revision: D26187856

Pulled By: ezyang

fbshipit-source-id: fd3784571d03d01c4acb7ca589fcde4492526408
2021-02-04 09:19:32 -08:00
Edward Yang
8e20594b38 Construct CppSignatureGroup from NativeFunction (#49245)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49245

This will make it easier to implement the POC in
d534f7d4c5
see also https://github.com/pytorch/pytorch/pull/45666

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: smessmer

Differential Revision: D25594005

Pulled By: ezyang

fbshipit-source-id: e458d3dc3a765ec77425761b9b17f23769cecf9e
2021-01-04 11:55:28 -08:00
albanD
c23808d8e8 Reland: Add base forward grad logic (#49734)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49734

RFC: https://github.com/pytorch/rfcs/pull/11

This PR adds the basic logic to handle forward grad as dual Tensors.
It contains the following:
- Mechanism to save dual state on a Tensor and clear it up when the dual level ends
- C++ and python user facing API
- Updated view system that is able to track both forward and backward views

The current PR has the following limitations:
- Extensive tests are in the next PR in the stack as formulas are needed to write full tests.
- Only the manual formulas have been audited and no other formula is actually implemented here (they are in the next PR in the stack)
- Only level 0 is allowed for now. This was discussed and agreed that it is not needed for the first version of this PR.
- We can save one ViewInfo creation when both the forward and backward views have the same base. This can be done by adding a boolean flag to the DifferentiableViewMeta and extra logic in the `as_view` method. This is left out to keep this PR concise.
- We can skip tracking forward views if the base has a forward grad. This can be done by adding extra logic in the `as_view` method. This is left out to keep this PR concise.

Reading guide:
- Updated view handling in [gen_variable_type.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-f6553cec68caeaea36f6c8b14ff76a6d39dfd774e0ea9ef2f76e8d81fd9af5df), [VariableTypeUtils.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-ec71cfa45954dece1236c661d170e6341879c5be637f4abf52e826d61b40695a), [variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-60e3bfe444e89efc7149f25b38e472710525984789934ab83f1bd5671b8ff285) (skip code below "[Forward Grad View]" for now), [variable.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-1604bcd0e4350ed99ec45e437cee7ac9ebe337392c9ea16a236247aeeb35b02bR266-R542) and [custom_function.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-dd85f452082b5bb6612bbc12adb496f8827defa228509f7b493de1d517522d5d). This introduces the new ViewInfo to hold view informations shared for forward and backward. It also updates the differentiable view meta to use this. And it updates the as_view function to handle both forward and backward view.
- New forward grad class that handle storing gradients and tracking at each level [forward_grad.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-c6c5b9ab2d7e5dde4102495faa1b6bbbfc23aa3e47deb7359c0bfe1eb004c0cb), [forward_grad.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-de2ab54ade7312701850d71a119a4f4ee4b9fc5a9c42a467cdd4e73c033531dd) and [build_variables.bzl](https://github.com/pytorch/pytorch/pull/49097/files#diff-dfdfa2efb17beddfd9094524f95351fd197db6c8857e96b436fb599870359325). EDIT: These files also contain the new flag to globally disable forward AD that allows us to reduce performance issues while this is in development.
- Lowest level API and binding between Tensor and AutogradMeta in [TensorBody.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-7554853205392fa743357bf845ecc350a974ec049383248c12daaf2f4de04911), [TensorImpl.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-052bd9150ef8e09289ddf644b5a6830ede49207201cd41728f6d7cc6d9cead94), [TensorImpl.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-a15aae4cf23da44970db7cece62ff981265575c798c62f7b52d87c8809dfe2e1) and the rest of [variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-60e3bfe444e89efc7149f25b38e472710525984789934ab83f1bd5671b8ff285R557-R677)
- API to access the forward primal that needs to be a differentiable function (and so in native_functions.yaml) [native_functions.yaml](https://github.com/pytorch/pytorch/pull/49097/files#diff-2f3dbd85efb9b5172f2264eedd3be47dd765e6ab7cc8bf3ade5e62c28ae35991) [NamedRegistrations.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-69bd3bea510c9b64e1633fa18c3ea63d4b8348dbad3a78ad9de844ab3e43dc1d), [VariableMethodsStub.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-23f5fcb737a2b289811fe0f4b65aef775e7c824b2e629ecd343df51405cd434f), [derivatives.yaml](https://github.com/pytorch/pytorch/pull/49097/files#diff-e4c2f99a2404e98c3586e07425da73008f36b1bada790648a7297af141d37f8c), [gen_python_functions.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-e4c2f99a2404e98c3586e07425da73008f36b1bada790648a7297af141d37f8c), [gen_trace_type.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-54e0b976027bf8debefb959ff360b89ae93466970c843365b1b3a03806d868ce), [TraceTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-f34636741ad4a23d018e0c289bc750c3bad887b45660e1d6eaf440d234a78fbf) and [part of VariableTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-6e19a1bce8cbdba8714b6e2c794a76bc0864b64a49cfa757cb0b5afdc937d1a4R198-R243)
- c++ API [autograd.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-349028fbe8291a965a7a263c323b208fe071c35c66179ee997ef84fa81aa4b1e), [autograd.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-a3fe908d67dfec16a1fcde300de68b0701bf68b88db7451f29f2bee255cf30c9)
- python binding [init.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-c58a67c85191c22c9b3bb439117d8053edfd9dea839fa010cf967d404c3c630d)
- python API [forward_ad.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-a4efad4ba18fffdfb264c21e5475997a24a743089a899f8ec1a5ff962c6738d9), [autograd/__init__.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-743abcafd32ad0e69f39ac5a91df4197b7e1921c135cacee7ef6dc829a8a7af8)
- c++ and python printing [Formatting.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-881dba501e71662e2e4818b4b016f739b344c8aed2f5edc6b871eda47a2aced0), [_tensor_str.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-a7911f8d5e73adbff914d99fd7818ace2a7030b6a3748abe06ec6fc6e3df9cc3)
- Utility for formulas and updated manual functions to respect new view system as well as forward grad [FunctionsManual.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-6378bb6dc81a64dab676d61731341fa5d1088418f32a1473a33a0ccfc2357dc1), [FunctionsManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-4adbd88239afcd60e8198aab65d4f5e43b62314e34b80551e997a1ea503adea5) [rest of VariableTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-6e19a1bce8cbdba8714b6e2c794a76bc0864b64a49cfa757cb0b5afdc937d1a4R264-R433)
- Ensure SavedVariable save forward grad properly [saved_variable.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-c1b8039d776241abe177d5aa99b79dd9489a9b3e529da8ab24c2e386c1238ae2), [saved_variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-cc9fba479b5beae06b2eea2e390d17796e0341c5b037a20b5bcaccbb0c341030)
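
A minimal sketch of the dual-tensor workflow this enables, assuming the
forward_ad.py API referenced above exposes `dual_level`, `make_dual` and
`unpack_dual`:

```python
import torch
import torch.autograd.forward_ad as fwAD

primal = torch.randn(3)
tangent = torch.ones(3)              # direction of the directional derivative

with fwAD.dual_level():              # dual state is cleared when the level ends
    dual = fwAD.make_dual(primal, tangent)
    out = dual.sin()
    _, jvp = fwAD.unpack_dual(out)   # jvp == cos(primal) * tangent
```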

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D25678797

Pulled By: albanD

fbshipit-source-id: 3d58550c11b5f58b9b73fd30596d042b857fb9dd
2020-12-22 12:11:27 -08:00
Walter Shen
f5178bf151 Revert D25607503: Add base forward grad logic
Test Plan: revert-hammer

Differential Revision:
D25607503 (fdf02eff3d)

Original commit changeset: f1396290de1d

fbshipit-source-id: 057206e28ff48ee288856adfe3ca577d4880789f
2020-12-21 19:56:28 -08:00
albanD
fdf02eff3d Add base forward grad logic (#49097)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49097

RFC: https://github.com/pytorch/rfcs/pull/11

This PR adds the basic logic to handle forward grad as dual Tensors.
It contains the following:
- Mechanism to save dual state on a Tensor and clear it up when the dual level ends
- C++ and python user facing API
- Updated view system that is able to track both forward and backward views

The current PR has the following limitations:
- Extensive tests are in the next PR in the stack as formulas are needed to write full tests.
- Only the manual formulas have been audited and no other formula is actually implemented here (they are in the next PR in the stack)
- Only level 0 is allowed for now. This was discussed and agreed that it is not needed for the first version of this PR.
- We can save one ViewInfo creation when both the forward and backward views have the same base. This can be done by adding a boolean flag to the DifferentiableViewMeta and extra logic in the `as_view` method. This is left out to keep this PR concise.
- We can skip tracking forward views if the base has a forward grad. This can be done by adding extra logic in the `as_view` method. This is left out to keep this PR concise.

Reading guide:
- Updated view handling in [gen_variable_type.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-f6553cec68caeaea36f6c8b14ff76a6d39dfd774e0ea9ef2f76e8d81fd9af5df), [VariableTypeUtils.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-ec71cfa45954dece1236c661d170e6341879c5be637f4abf52e826d61b40695a), [variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-60e3bfe444e89efc7149f25b38e472710525984789934ab83f1bd5671b8ff285) (skip code below "[Forward Grad View]" for now), [variable.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-1604bcd0e4350ed99ec45e437cee7ac9ebe337392c9ea16a236247aeeb35b02bR266-R542) and [custom_function.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-dd85f452082b5bb6612bbc12adb496f8827defa228509f7b493de1d517522d5d). This introduces the new ViewInfo to hold view informations shared for forward and backward. It also updates the differentiable view meta to use this. And it updates the as_view function to handle both forward and backward view.
- New forward grad class that handle storing gradients and tracking at each level [forward_grad.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-c6c5b9ab2d7e5dde4102495faa1b6bbbfc23aa3e47deb7359c0bfe1eb004c0cb), [forward_grad.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-de2ab54ade7312701850d71a119a4f4ee4b9fc5a9c42a467cdd4e73c033531dd) and [build_variables.bzl](https://github.com/pytorch/pytorch/pull/49097/files#diff-dfdfa2efb17beddfd9094524f95351fd197db6c8857e96b436fb599870359325). EDIT: These files also contain the new flag to globally disable forward AD that allows us to reduce performance issues while this is in development.
- Lowest level API and binding between Tensor and AutogradMeta in [TensorBody.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-7554853205392fa743357bf845ecc350a974ec049383248c12daaf2f4de04911), [TensorImpl.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-052bd9150ef8e09289ddf644b5a6830ede49207201cd41728f6d7cc6d9cead94), [TensorImpl.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-a15aae4cf23da44970db7cece62ff981265575c798c62f7b52d87c8809dfe2e1) and the rest of [variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-60e3bfe444e89efc7149f25b38e472710525984789934ab83f1bd5671b8ff285R557-R677)
- API to access the forward primal that needs to be a differentiable function (and so in native_functions.yaml) [native_functions.yaml](https://github.com/pytorch/pytorch/pull/49097/files#diff-2f3dbd85efb9b5172f2264eedd3be47dd765e6ab7cc8bf3ade5e62c28ae35991) [NamedRegistrations.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-69bd3bea510c9b64e1633fa18c3ea63d4b8348dbad3a78ad9de844ab3e43dc1d), [VariableMethodsStub.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-23f5fcb737a2b289811fe0f4b65aef775e7c824b2e629ecd343df51405cd434f), [derivatives.yaml](https://github.com/pytorch/pytorch/pull/49097/files#diff-e4c2f99a2404e98c3586e07425da73008f36b1bada790648a7297af141d37f8c), [gen_python_functions.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-e4c2f99a2404e98c3586e07425da73008f36b1bada790648a7297af141d37f8c), [gen_trace_type.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-54e0b976027bf8debefb959ff360b89ae93466970c843365b1b3a03806d868ce), [TraceTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-f34636741ad4a23d018e0c289bc750c3bad887b45660e1d6eaf440d234a78fbf) and [part of VariableTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-6e19a1bce8cbdba8714b6e2c794a76bc0864b64a49cfa757cb0b5afdc937d1a4R198-R243)
- c++ API [autograd.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-349028fbe8291a965a7a263c323b208fe071c35c66179ee997ef84fa81aa4b1e), [autograd.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-a3fe908d67dfec16a1fcde300de68b0701bf68b88db7451f29f2bee255cf30c9)
- python binding [init.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-c58a67c85191c22c9b3bb439117d8053edfd9dea839fa010cf967d404c3c630d)
- python API [forward_ad.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-a4efad4ba18fffdfb264c21e5475997a24a743089a899f8ec1a5ff962c6738d9), [autograd/__init__.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-743abcafd32ad0e69f39ac5a91df4197b7e1921c135cacee7ef6dc829a8a7af8)
- c++ and python printing [Formatting.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-881dba501e71662e2e4818b4b016f739b344c8aed2f5edc6b871eda47a2aced0), [_tensor_str.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-a7911f8d5e73adbff914d99fd7818ace2a7030b6a3748abe06ec6fc6e3df9cc3)
- Utility for formulas and updated manual functions to respect new view system as well as forward grad [FunctionsManual.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-6378bb6dc81a64dab676d61731341fa5d1088418f32a1473a33a0ccfc2357dc1), [FunctionsManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-4adbd88239afcd60e8198aab65d4f5e43b62314e34b80551e997a1ea503adea5) [rest of VariableTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-6e19a1bce8cbdba8714b6e2c794a76bc0864b64a49cfa757cb0b5afdc937d1a4R264-R433)
- Ensure SavedVariable save forward grad properly [saved_variable.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-c1b8039d776241abe177d5aa99b79dd9489a9b3e529da8ab24c2e386c1238ae2), [saved_variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-cc9fba479b5beae06b2eea2e390d17796e0341c5b037a20b5bcaccbb0c341030)

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D25607503

Pulled By: albanD

fbshipit-source-id: f1396290de1d75760f3d380c43cdd56e86fa6099
2020-12-21 14:39:43 -08:00
Sebastian Messmer
26974e6b28 Remove set_quantizer_ from native_functions.yaml (#49463)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49463

set_quantizer_ takes a ConstQuantizerPtr argument, which is neither supported by JIT nor by c10.
Also, it doesn't get dispatched (CPU and CUDA have the same implementation) and it is excluded from python bindings generation.
So there is no real reason why this needs to be in native_functions.yaml

Removing it unblocks the migration to c10-fullness since this is an op that would have been hard to migrate. See https://fb.quip.com/QRtJAin66lPN
ghstack-source-id: 118710663

Test Plan: waitforsandcastle

Reviewed By: ezyang

Differential Revision: D25587763

fbshipit-source-id: 8fab921f4c256c128d48d82dac731f04ec9bad92
2020-12-17 03:28:00 -08:00
Brian Hirsh
33a9b14da0 pyi codegen - removing byte-for-byte-compatibility hacks (sorting overloads) (#49056)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49056

This is another byte-for-byte compatibility hack. I'm now sorting pyi signature overloads (previously the codegen did not).

Mostly put this in a separate PR just to more easily reason about the diff in the codegen output.

Test Plan: Imported from OSS

Reviewed By: ljk53

Differential Revision: D25410846

Pulled By: bdhirsh

fbshipit-source-id: 06e5c32edbce610dd12ec7499014b41b23c646bd
2020-12-11 13:29:22 -08:00
Brian Hirsh
9920adebfd pyi cleanup (#49054)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49054

These are some followups from the first pyi codegen PR. Still maintaining byte-for-byte compatibility in this one.

- Separated `argument_str()` with a pyi flag into two functions, `argument_str()` and `argument_str_pyi()`
- Added a notes section for pyi at the top of `python.py`
- Added a `Python Interface` section that I moved the free-standing pyi functions to

Test Plan: Imported from OSS

Reviewed By: ljk53

Differential Revision: D25410848

Pulled By: bdhirsh

fbshipit-source-id: db83a80af900c32b5e32d67ce27767f6e7c2adfb
2020-12-11 13:27:41 -08:00
Brian Hirsh
ba6511b304 pyi codegen update - remove Declarations.yaml (#48754)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48754

The goal of this PR is to kill Declarations.yaml in the pyi codegen, in favor of native_functions + the existing python object model.

**High-level design**

Since the python signatures used by the `python_arg_parser` are “supposed” to resemble the corresponding pyi type hint signatures, I re-used the existing python object model that Jiakai defined in `tools/codegen/api/python.py`. This means that the pyi codegen now reads `native_functions.yaml`, parses it into a bunch of `PythonSignatureGroup` objects, and emits corresponding method + function variants of type-hint signatures for each one, respectively into `__init__.pyi` and `_VariableFunctions.pyi`.

What makes this uglier is that pyi and the python arg parser have a number of differences in how they’re emitted. I expressed that through a `pyi` flag on the `PythonSignature` dataclass, that tells it whether or not to print itself as a pyi vs. arg_parser signature.

One thing worth noting is how pyi generates signatures differently for native / deprecated op signatures.

For native ops:
- The pyi codegen fuses functional and out variants of each op into a single signature with an optional `out` argument. Ops without an `out` variant just get an ordinary functional signature.
- Some ops that fit certain criteria also get a second “varargs” signature - basically ops with a single positional argument of type List[int].

For deprecated signatures:
- Functional and out variants are not fused - they each get their own signature entry
- There are no varargs signatures

This is currently implemented through the `signature_str()` and `signature_str_vararg()` methods on the `PythonSignature`/`PythonSignatureDeprecated` classes.  `signature_str()` knows how to print itself with/without out arguments, differently for native/deprecated ops. `signature_str_vararg()` optionally returns a vararg variant of the signature if one exists.

**Calling out the gap between python_arg_parser vs. pyi**

The two formats are notably different, so I don’t think we can expect to unify them completely. That said, I encountered a number of differences in the pyi codegen that looked wrong; I tried to call them out in the PR, to be removed later. Just as an example, looking at the `svd` signature in the python_arg_parser vs. the pyi type hint:

python_arg_parser
```
static PythonArgParser parser({
  "svd(Tensor input, bool some=True, bool compute_uv=True, *, TensorList[3] out=None)",
}, /*traceable=*/true);
```

Pyi
```
def svd(input: Tensor, some: _bool=True, compute_uv: _bool=True, *, out: Optional[Tensor]=None) -> namedtuple_U_S_V: ...
```

The two have obvious syntactic differences that we probably don’t plan on changing: the python_arg_parser doesn’t include `def` or return types, and it includes the type hint before the variable name. But the type of `out` in pyi is probably wrong, since `svd` has multiple output params. I tried to clearly call out any instances of the pyi codegen diverging in a way that looks buggy, so we can clean it up in a later PR (see the comments for details).

Another particularly ugly “bug” that I kept in to maintain byte-for-byte compatibility is the fact that the pyi codegen groups operator overloads together. It turns out that the only reason it does this (as far as I can tell) is because it tacks on an out argument to signatures that don’t have one, if ANY overloads of that op have an out variant.

E.g. consider the pyi type hints generated for `nanmedian` in `_VF.pyi`:
```
@overload
def nanmedian(input: Tensor, *, out: Optional[Tensor]=None) -> Tensor: ...
@overload
def nanmedian(input: Tensor, dim: _int, keepdim: _bool=False, *, out: Optional[Tensor]=None) -> namedtuple_values_indices: ...
@overload
def nanmedian(input: Tensor, dim: Union[str, ellipsis, None], keepdim: _bool=False, *, out: Optional[Tensor]=None) -> namedtuple_values_indices: ...
```

And the corresponding native_functions.yaml entries:
```
- func: nanmedian(Tensor self) -> Tensor
- func: nanmedian.dim(Tensor self, int dim, bool keepdim=False) -> (Tensor values, Tensor indices)
- func: nanmedian.dim_values(Tensor self, int dim, bool keepdim=False, *, Tensor(a!) values, Tensor(b!) indices) -> (Tensor(a!) values, Tensor(b!) indices)
- func: nanmedian.names_dim(Tensor self, Dimname dim, bool keepdim=False) -> (Tensor values, Tensor indices)
- func: nanmedian.names_dim_values(Tensor self, Dimname dim, bool keepdim=False, *, Tensor(a!) values, Tensor(b!) indices) -> (Tensor(a!) values, Tensor(b!)
```

Signature 2 corresponds to entries 2 and 3 in native_functions, and Signature 3 corresponds to entries 4 and 5. But signature 1 has an optional out argument, even though entry 1 in native_functions.yaml has no out variant.

I’d like to delete that logic in a later PR- that will also have the added benefit no longer requiring to group overloads together in the pyi codegen. We can just operate independently on each PythonSignatureGroup.

**More detailed accounting of the changes**

Per file:

gen_python_functions.py
- `load_signatures()` can now skip deprecated signatures. Needed because pyi only includes deprecated functions, and skips their method variants (maybe we should add them in…?)
- Moved `namedtuple_fieldnames` into python.cpp
- `group_overloads()` can now opt to not sort the overloads (needed for byte-for-byte compat; pyi doesn’t sort for some reason)

Python.py:
- Gave `PythonSignature`and `PythonSignatureDeprecated` a `pyi` flag that tells it whether or not to print itself in pyi vs. python_arg_parser format
- Added a `PythonReturns` dataclass, which is now a member of PythonSignature. It is only used by pyi. I found this useful because python returns need to know how to deal with named tuple returns properly. I also moved `namedtuple_fieldnames` into this file from gen_python_functions.

gen_pyi.py
- Merged `get_py_torch_functions` and `get_py_variable_methods` into a single function, since they’re very similar
- Lifted out all of the pyi type hint type-mapping mess and dropped it into python.py. This required updating the mapping to deal with NativeFunction objects instead of the outputs of Declarations.yaml (this was most of the logic in `type_to_python`, `arg_to_type_hint`, and `generate_type_hints`).  `generate_type_hints` is now a small orchestration function that gathers the different signatures for each PythonSignatureGroup.
- NamedTuples are now generated by calling `PythonReturn.named_tuple()` (in `generate_named_tuples()`), rather than appending to a global list

A lot of hardcoded pyi signatures still live in `gen_pyi.py`. I didn’t look too closely into whether or not any of that can be removed as part of this PR.

Test Plan: Imported from OSS

Reviewed By: ljk53

Differential Revision: D25343802

Pulled By: bdhirsh

fbshipit-source-id: f73e99e1afef934ff41e4aca3dabf34273459a52
2020-12-07 10:39:38 -08:00
Jiakai Liu
de284b6d35 [pytorch][codegen] add autograd data model (#48249)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48249

Introduced autograd-related data models in tools.codegen.api.autograd.

Migrated load_derivatives.py to produce the new data models from derivatives.yaml.
It has clean mypy-strict result.

Changed both gen_autograd_functions.py and gen_variable_type.py to consume
the new data model.

Added type annotations to gen_autograd_functions.py - it has clean mypy-strict
result except for the .gen_autograd import (so haven't added it to the strict
config in this PR).

To limit the scope of the PR, gen_variable_type.py is not refactored, and the
main structure of load_derivatives.py / gen_autograd_functions.py is kept. We
only make necessary changes to make it work.

Confirmed byte-for-byte compatible with the old codegen:

```
Run it before and after this PR:
  .jenkins/pytorch/codegen-test.sh <baseline_output_dir>
  .jenkins/pytorch/codegen-test.sh <test_output_dir>

Then run diff to compare the generated files:
  diff -Naur <baseline_output_dir> <test_output_dir>
```

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D25086561

Pulled By: ljk53

fbshipit-source-id: 1f43ab0931d9814c24683b9a48ca497c5fc3d729
2020-11-19 21:47:05 -08:00
Jiakai Liu
5eaf8562cd [pytorch][codegen] simplify dunder method check in gen_python_functions.py (#47976)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47976

Confirmed byte-for-byte compatible with the old codegen:

```
Run it before and after this PR:
  .jenkins/pytorch/codegen-test.sh <baseline_output_dir>
  .jenkins/pytorch/codegen-test.sh <test_output_dir>

Then run diff to compare the generated files:
  diff -Naur <baseline_output_dir> <test_output_dir>
```

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D24976273

Pulled By: ljk53

fbshipit-source-id: 6f8f20d18db20c3115808bfac0a8b8ad83dcf64c
2020-11-18 12:26:47 -08:00
Jiakai Liu
4ff8cd8f3a [pytorch][codegen] gen_python_functions.py loading native_functions.yaml / deprecated.yaml directly (#47746)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47746

- Removed the integration hack in gen_python_functions.py. It now directly
  loads native_functions.yaml. All dependencies on Declarations.yaml
  have been removed / moved to elsewhere.
- Rewrote the deprecated.yaml parsing logic to work with new data model directly.

Confirmed byte-for-byte compatible with the old codegen:
```
Run it before and after this PR:
  .jenkins/pytorch/codegen-test.sh <baseline_output_dir>
  .jenkins/pytorch/codegen-test.sh <test_output_dir>

Then run diff to compare the generated files:
  diff -Naur <baseline_output_dir> <test_output_dir>
```

Differential Revision: D24885067

Test Plan: Imported from OSS

Reviewed By: bhosmer

Pulled By: ljk53

fbshipit-source-id: 8e906b7dd36a64395087bd290f6f54596485ceb4
2020-11-14 02:27:57 -08:00
Jiakai Liu
d91cefb0d8 [pytorch][codegen] migrate gen_annotated_fn_args.py to new codegen model (#47745)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47745

This is a relatively small codegen. Reintroduced 'simple_type' to preserve
old codegen output.

It depends on some methods defined in gen_python_functions.py - next PR will
clean up the remaining Declarations.yaml methods in gen_python_functions.py.

Confirmed byte-for-byte compatible with the old codegen:
```
Run it before and after this PR:
  .jenkins/pytorch/codegen-test.sh <baseline_output_dir>
  .jenkins/pytorch/codegen-test.sh <test_output_dir>

Then run diff to compare the generated files:
  diff -Naur <baseline_output_dir> <test_output_dir>
```

Differential Revision: D24885068

Test Plan: Imported from OSS

Reviewed By: ezyang

Pulled By: ljk53

fbshipit-source-id: c0fbd726bcc450c3c7fe232c23e5b31779d0b65f
2020-11-14 02:24:39 -08:00
Jiakai Liu
4159191f0e [pytorch] split out trace type generator and migrate to new codegen model (#47438)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47438

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D24808211

Pulled By: ljk53

fbshipit-source-id: 44dfadf550a255c05aa201e54b48101aaf722885
2020-11-09 12:39:39 -08:00
Jiakai Liu
16c72a5a6b [pytorch] continue to rewrite gen_python_functions.py with typed models (#46978)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46978

Refactored and added type annotations to the most part of the file.

Some top-level codegen functions are called by other codegen scripts.
Will migrate them in subsequent PRs.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D24589210

Pulled By: ljk53

fbshipit-source-id: e0c7e5b3672b41983f321400c2e2330d1462e76e
2020-11-08 01:34:12 -08:00
Brian Hirsh
7a0f0d24d0 Codegen - error when an argument that looks like an out argument isn't a kwarg (fix #43273) (#47284)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47284

Test Plan: Imported from OSS

Reviewed By: nikithamalgifb

Differential Revision: D24706763

Pulled By: bdhirsh

fbshipit-source-id: 60fbe81a0dff7e07aa8c169235d15b84151d3ed7
2020-11-03 16:30:01 -08:00
Erjia Guan
a341a4329a Format error message for unmatched signature between _out and base functions (#47087)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47087

Fixes #33547

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D24633077

Pulled By: ejguan

fbshipit-source-id: d1baca84cb3bc415cced9b696103f17131e1e4c7
2020-11-03 07:36:37 -08:00
Jiakai Liu
9d23fd5c00 [pytorch] get rid of cpp_type_str from pybind codegen (#46977)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46977

Clean up a few TODOs in the new python binding codegen.
Get rid of the _simple_type() hack and the uses of cpp_type_str.
Now python argument type strings and PythonArgParser unpacking methods
are directly generated from the original Type model.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D24589209

Pulled By: ljk53

fbshipit-source-id: b2a6c3911d58eae49c031d319c8ea6f804e2cfde
2020-10-28 21:25:55 -07:00
Kurt Mohler
b75b961934 Fix requires_grad arg for new_full, new_empty, new_zeros (#46486)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/36455

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46486

Reviewed By: gchanan

Differential Revision: D24497034

Pulled By: ezyang

fbshipit-source-id: 769a7f00f9a8f7cb77273a1193173a837ae7e32f
2020-10-28 09:34:53 -07:00
Pritam Damania
2b221a9599 Remove PyCFunction casts as much as possible. (#46227)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46227

Follow up from https://github.com/pytorch/pytorch/issues/45419, in
this PR I've removed as many PyCFunction casts as I could from the codebase.

The only ones I didn't remove were the ones with `METH_VARARGS | METH_KEYWORDS`,
which have 3 parameters instead of 2 and had to be cast. Example:
`{"copy_", (PyCFunction)(void(*)(void))THPStorage_(copy_), METH_VARARGS | METH_KEYWORDS, nullptr},`
ghstack-source-id: 114632704

Test Plan: waitforbuildbot

Reviewed By: albanD

Differential Revision: D24269435

fbshipit-source-id: 025cfd43a9a2a3e59f6b2951c1a78749193d77cf
2020-10-20 15:01:51 -07:00
Jiakai Liu
3d421b3137 [pytorch] rewrite of the python binding codegen with the v2 API (#46244)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46244

- What does the generated binding code do?

The Python binding codegen produces code that takes the input list of
PyObjects, finds the matching ATen C++ function using PythonArgParser,
converts the PyObjects into C++ types and calls the ATen C++ function:

```
+--------+  parsing   +------------------------+  binding   +-----------------------+
| PyObjs | ---------> | PythonArgParser Output | ---------> | Cpp Function Dispatch |
+--------+            +------------------------+            +-----------------------+
```

- Are Python arguments 1-1 mapped to C++ arguments?

Python arguments might be reordered, packed, unpacked when binding to
C++ arguments, as illustrated below:

```
// Binding - Reorder & Packing
// aten::empty.names(int[] size, *, Dimname[]? names, ScalarType? dtype=None, Layout? layout=None,
                     Device? device=None, bool? pin_memory=None, MemoryFormat? memory_format=None) -> Tensor

            Python Args               Cpp Args
-----------------------------------------------------------
         0: size                      size
         1: names                     names
         2: memory_format -------+
         3: dtype         -----+-|--> options
         4: layout            /  |
         5: device           /   +--> memory_format
         6: pin_memory      /
         7: requires_grad -+

// Binding - Unpacking
// aten::max.names_dim(Tensor self, Dimname dim, bool keepdim=False) -> (Tensor values, Tensor indices)

            Python Args               Cpp Args
-----------------------------------------------------------
                               +----> max
                              /-----> max_values
         0: input            /        self
         1: dim             /         dim
         2: keepdim        /          keepdim
         3: out      -----+
```

- Why do we want to rewrite the python binding codegen?

The old codegen takes Declarations.yaml as input. It doesn't distinguish
between Python arguments and C++ arguments - they are all mixed together
as a bag of non-typed dict objects. Different methods process these arg
objects and add new attributes for various purposes, so it is hard to
figure out the semantics of these attributes. The complicated binding
logic happens implicitly and is scattered across the codegen.

```
+--------------------+
|  Native Functions  |
+--------------------+
  |
  |
  v
+--------------------+
|   Cpp Signatures   |
+--------------------+
  |
  |
  v
+--------------------+
| Declarations.yaml  |
+--------------------+
  |                        +-------------------------------------+
  |              +-------> |       PythonArgParser Schema        |
  |              |         +-------------------------------------+
  |              |                            .
  |              |                            .
  v              |                            .
+--------------------+     +-------------------------------------+
| NonTyped Args Objs | --> | PythonArgParser -> Cpp Args Binding |
+--------------------+     +-------------------------------------+
                 |                            .
                 |                            .
                 |                            .
                 |         +-------------------------------------+
                 +-------> |        Cpp Function Dispatch        |
                           +-------------------------------------+
```

This PR leverages the new immutable data models introduced in the new
aten codegen. It introduces dedicated data models for python schema.
This way, we can not only avoid subtle Declaration.yaml conversions but
also decouple the generation of python schema, python to c++ binding and
c++ function call.

The ultimate state will be like the following diagram:

```
            +-------------------+     +-------------------------------------+
  +-------> | Python Signatures | --> |       PythonArgParser Schema        |
  |         +-------------------+     +-------------------------------------+
  |                         |                            .
  |                         |                            .
  |                         |                            .
+------------------+        |         +-------------------------------------+
| Native Functions |        +-------> | PythonArgParser -> Cpp Args Binding |
+------------------+        |         +-------------------------------------+
  |                         |                            .
  |                         |                            .
  |                         |                            .
  |         +-------------------+     +-------------------------------------+
  +-------> |  Cpp Signatures   | --> |        Cpp Function Dispatch        |
            +-------------------+     +-------------------------------------+
```

This PR has migrated the core binding logic from
tools/autograd/gen_python_functions.py to tools/codegen/api/python.py.

It produces the byte-for-byte same results (tested with #46243).

Will migrate the rest of gen_python_functions.py in subsequent PRs.

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D24388874

Pulled By: ljk53

fbshipit-source-id: f88b6df4e917cf90d868a2bbae2d5ffb680d1841
2020-10-19 17:36:45 -07:00
chengjun
5741de883a Define the record_stream method in native_functions.yaml (#44301)
Summary:
The record_stream method was hard-coded for the CUDA device. Defining record_stream in native_functions.yaml enables dynamic dispatch to other backend devices.
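
A short usage sketch of the call whose dispatch becomes device-generic (CUDA
shown here only because it is the backend that previously hard-coded it):

```python
import torch

if torch.cuda.is_available():
    side_stream = torch.cuda.Stream()
    x = torch.empty(1024, device="cuda")
    with torch.cuda.stream(side_stream):
        y = x * 2
    # Tell the caching allocator y is in use on side_stream so its memory is
    # not recycled before that work finishes.
    y.record_stream(side_stream)
```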

Fixes https://github.com/pytorch/pytorch/issues/36556

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44301

Reviewed By: glaringlee

Differential Revision: D23763954

Pulled By: ezyang

fbshipit-source-id: e6d24f5e7892b56101fa858a6cad2abc5cdc4293
2020-10-13 09:15:22 -07:00
Peter Bell
8b39498a23 codegen: Allow string arguments to have defaults (#45665)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45665

Fixes #43944

Note that the codegen doesn't use a proper parser so, in the same way as with lists, the string `, ` cannot appear in defaults or it will be interpreted as a splitting point between arguments.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D24141835

Pulled By: ezyang

fbshipit-source-id: 578127861fd2504917f4486c44100491a2c40343
2020-10-06 21:53:56 -07:00
Supriya Rao
c112e89cc6 [quant] Make choose_qparams_optimized return Tensors to preserve dtype (#45530)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45530

Returning double values requires special handling as a return type for aten functions.
Instead, return tensors, where the type is preserved in the tensor dtype.

Test Plan:
python test/test_quantization.py TestQuantizedTensor.test_choose_qparams_optimized

Imported from OSS

Reviewed By: dskhudia

Differential Revision: D24001134

fbshipit-source-id: bec6b17242f4740ab5674be06e0fc30c35eb0379
2020-09-30 11:35:23 -07:00
Iurii Zdebskyi
d5748d9a1a Enable binary ops with Scalar Lists for foreach APIs (#45298)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45298

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D23931986

Pulled By: izdeby

fbshipit-source-id: 281267cd6f90d57a169af89f9f10b0f4fcab47e3
2020-09-25 12:58:34 -07:00
Xinyu Li
26001a2334 Revert D23753711: [pytorch][PR] Add foreach APIs for binary ops with ScalarList
Test Plan: revert-hammer

Differential Revision:
D23753711 (71d1b5b0e2)

Original commit changeset: bf3e8c54bc07

fbshipit-source-id: 192692e0d3fff4cade9983db0a1760fedfc9674c
2020-09-24 11:55:49 -07:00
iurii zdebskyi
71d1b5b0e2 Add foreach APIs for binary ops with ScalarList (#44743)
Summary:
In this PR:
1) Added binary operations with ScalarLists.
2) Fixed a _foreach_div(...) bug in native_functions.
3) Covered all possible cases with scalars and scalar lists in tests.
4) [minor] Fixed a bug in native_functions by adding "use_c10_dispatcher: full" to all _foreach functions.

Tested via unit tests.
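
A small usage sketch of the scalar-list overload added here: each tensor in the
list is combined with the scalar at the same index.

```python
import torch

tensors = [torch.ones(2), torch.ones(3)]
scalars = [2.0, 3.0]
out = torch._foreach_add(tensors, scalars)
# out == [tensor([3., 3.]), tensor([4., 4., 4.])]
```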

Pull Request resolved: https://github.com/pytorch/pytorch/pull/44743

Reviewed By: bwasti, malfet

Differential Revision: D23753711

Pulled By: izdeby

fbshipit-source-id: bf3e8c54bc07867e8f6e82b5d3d35ff8e99b5a0a
2020-09-24 08:30:42 -07:00
Supriya Rao
60665ace17 [quant] Add optimized approach to calculate qparams for qembedding_bag (#45149)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45149

The choose_qparams_optimized op calculates the optimized qparams.
It uses a greedy approach to nudge the min and max, computes the L2 norm,
and tries to minimize the quantization error `torch.norm(x - fake_quant(x, s, z))`.
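
A sketch of the objective being minimized (not the kernel itself; the helper
name and the qmin/qmax defaults are illustrative):

```python
import torch

def quant_error(x, scale, zero_point, qmin=0, qmax=255):
    # L2 distance between x and its fake-quantized reconstruction
    xq = torch.fake_quantize_per_tensor_affine(x, scale, zero_point, qmin, qmax)
    return torch.norm(x - xq)
```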

Test Plan: Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D23848060

fbshipit-source-id: c6c57c9bb07664c3f1c87dd7664543e09f634aee
2020-09-23 19:00:22 -07:00
Jiakai Liu
9e5045e978 [pytorch] clean up normalized_dynamic_type() hack (#44889)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44889

This HACK doesn't seem to be necessary any more: there is no 'real'
type in the generated Declarations.yaml file.
Verified by comparing the generated code before/after.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D23761624

Pulled By: ljk53

fbshipit-source-id: de996f04d77eebea3fb9297dd90a8ebeb07647bb
2020-09-18 23:49:46 -07:00
Peter Bell
fd4e21c91e Add optional string support to native_functions schema (#43010)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43010

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D23751851

Pulled By: mruberry

fbshipit-source-id: 648f7430e1b7311eff28421f38e01f52d998fcbd
2020-09-18 14:57:24 -07:00
Edward Yang
6ea89166bd Rewrite of ATen code generator (#42629)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42629

How to approach reviewing this diff:

- The new codegen itself lives in `tools/codegen`. Start with `gen.py`, then read `model.py` and them the `api/` folder. The comments at the top of the files describe what is going on. The CLI interface of the new codegen is similar to the old one, but (1) it is no longer necessary to explicitly specify cwrap inputs (and now we will error if you do so) and (2) the default settings for source and install dir are much better; to the extent that if you run the codegen from the root source directory as just `python -m tools.codegen.gen`, something reasonable will happen.
- The old codegen is (nearly) entirely deleted; every Python file in `aten/src/ATen` was deleted except for `common_with_cwrap.py`, which now permanently finds its home in `tools/shared/cwrap_common.py` (previously cmake copied the file there), and `code_template.py`, which now lives in `tools/codegen/code_template.py`. We remove the copying logic for `common_with_cwrap.py`.
- All of the inputs to the old codegen are deleted.
- Build rules now have to be adjusted to not refer to files that no longer exist, and to abide by the (slightly modified) CLI.
- LegacyTHFunctions files have been generated and checked in. We expect these to be deleted as these final functions get ported to ATen. The deletion process is straightforward; just delete the functions of the ones you are porting. There are 39 more functions left to port.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D23183978

Pulled By: ezyang

fbshipit-source-id: 6073ba432ad182c7284a97147b05f0574a02f763
2020-08-31 09:00:22 -07:00
Xiang Gao
a860be898e [resubmit] Add amax/amin (#43819)
Summary:
Resubmit for landing next week.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43819

Reviewed By: ngimel

Differential Revision: D23421906

Pulled By: mruberry

fbshipit-source-id: 23dd60d1e365bb1197d660c3bfad7ee07ba3e97f
2020-08-31 04:54:48 -07:00
Nikita Shulga
3f0120edb4 Revert D23360705: [pytorch][PR] Add amax/amin
Test Plan: revert-hammer

Differential Revision:
D23360705 (bcec8cc3f9)

Original commit changeset: 5bdeb08a2465

fbshipit-source-id: 76a9e199823c7585e55328bad0778bcd8cd49381
2020-08-28 18:01:25 -07:00
Gao, Xiang
bcec8cc3f9 Add amax/amin (#43092)
Summary:
Add max/min operators that only return values.

## Some important decision to discuss
| **Question**                          | **Current State** |
|---------------------------------------|-------------------|
| Expose torch.max_values to python?    | No                |
| Remove max_values and only keep amax? | Yes               |
| Should amax support named tensors?    | Not in this PR    |

## Numpy compatibility

Reference: https://numpy.org/doc/stable/reference/generated/numpy.amax.html

| Parameter                                                                                                                                                                                                                                              | PyTorch Behavior                                                                  |
|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------|
| `axis`:  None or int or tuple of ints, optional. Axis or axes along which to operate. By default, flattened input is used. If this is a tuple of ints, the maximum is selected over multiple axes, instead of a single axis or all the axes as before. | Named `dim`, behavior same as `torch.sum` (https://github.com/pytorch/pytorch/issues/29137)                                |
| `out`: ndarray, optional. Alternative output array in which to place the result. Must be of the same shape and buffer length as the expected output.                                                                                                   | Same                                                                              |
| `keepdims`: bool, optional. If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the input array.                                      | implemented as `keepdim`                                                          |
| `initial`: scalar, optional. The minimum value of an output element. Must be present to allow computation on empty slice.                                                                                                                              | Not implemented in this PR. Better to implement for all reductions in the future. |
| `where`: array_like of bool, optional. Elements to compare for the maximum.                                                                                                                                                                            | Not implemented in this PR. Better to implement for all reductions in the future. |

**Note from numpy:**
> NaN values are propagated, that is if at least one item is NaN, the corresponding max value will be NaN as well. To ignore NaN values (MATLAB behavior), please use nanmax.

PyTorch has the same behavior
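
A brief usage sketch: `amax`/`amin` return only values (no indices) and accept
multiple dims with the same semantics as `torch.sum`.

```python
import torch

x = torch.randn(2, 3, 4)
vals = torch.amax(x, dim=(0, 2), keepdim=True)   # shape (1, 3, 1)
```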

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43092

Reviewed By: ngimel

Differential Revision: D23360705

Pulled By: mruberry

fbshipit-source-id: 5bdeb08a2465836764a5a6fc1a6cc370ae1ec09d
2020-08-28 12:51:03 -07:00
Basil Hosmer
a5a6a3e633 add support for optional int list with scalar fill (#43262)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43262

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D23212049

Pulled By: bhosmer

fbshipit-source-id: c7ceb2318645c07d36c3f932c981c9ee3c414f82
2020-08-21 18:24:36 -07:00
James Gilbert
da5df7e2d2 Remove use of term "blacklist" from tools/autograd/gen_python_functions.py (#42047)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41720

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42047

Reviewed By: colesbury

Differential Revision: D23197785

Pulled By: SplitInfinity

fbshipit-source-id: 8ef38518f479e5e96b6a51bc420b0df5b35b447c
2020-08-18 15:11:22 -07:00
Mike Ruberry
9c8021c0b1 Adds torch.linalg namespace (#42664)
Summary:
This PR adds the `torch.linalg` namespace as part of our continued effort to be more compatible with NumPy. The namespace is tested by adding a single function, `torch.linalg.outer`, and testing it in a new test suite, test_linalg.py. It follows the same pattern that https://github.com/pytorch/pytorch/pull/41911, which added the `torch.fft` namespace, did.
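
A short usage sketch of the single function added here (assuming the
`torch.linalg.outer` described above; it computes the same result as
`torch.ger`):

```python
import torch

a = torch.arange(1., 4.)          # shape (3,)
b = torch.arange(1., 3.)          # shape (2,)
m = torch.linalg.outer(a, b)      # shape (3, 2), same as torch.ger(a, b)
```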

Future PRs will likely:

- add more functions to torch.linalg
- expand the testing done in test_linalg.py, including legacy functions, like torch.ger
- deprecate existing linalg functions outside of `torch.linalg` in preference to the new namespace

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42664

Reviewed By: ngimel

Differential Revision: D22991019

Pulled By: mruberry

fbshipit-source-id: 39258d9b116a916817b3588f160b141f956e5d0b
2020-08-07 10:18:30 -07:00
Mike Ruberry
ccfce9d4a9 Adds fft namespace (#41911)
Summary:
This PR creates a new namespace, torch.fft (torch::fft), and puts a single function, fft, in it. This function is a simplified version of NumPy's [numpy.fft.fft](https://numpy.org/doc/1.18/reference/generated/numpy.fft.fft.html?highlight=fft#numpy.fft.fft) that accepts no optional arguments. It is intended to demonstrate how to add and document functions in the namespace, and is not intended to deprecate the existing torch.fft function.

Adding this namespace was complicated by the existence of the torch.fft function in Python. Creating a torch.fft Python module makes this name ambiguous: does it refer to a function or module? If the JIT didn't exist, a solution to this problem would have been to make torch.fft refer to a callable class that mimicked both the function and module. The JIT, however, cannot understand this pattern. As a workaround it's required to explicitly `import torch.fft` to access the torch.fft.fft function in Python:

```
import torch.fft

t = torch.randn(128, dtype=torch.cdouble)
torch.fft.fft(t)
```

See https://github.com/pytorch/pytorch/issues/42175 for future work. Another possible future PR is to get the JIT to understand torch.fft as a callable class so it need not be imported explicitly to be used.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41911

Reviewed By: glaringlee

Differential Revision: D22941894

Pulled By: mruberry

fbshipit-source-id: c8e0b44cbe90d21e998ca3832cf3a533f28dbe8d
2020-08-06 00:20:50 -07:00
Hameer Abbasi
3d46e02ea1 Add __torch_function__ for methods (#37091)
Summary:
According to pytorch/rfcs#3

From the goals in the RFC:

1. Support subclassing `torch.Tensor` in Python (done here)
2. Preserve `torch.Tensor` subclasses when calling `torch` functions on them (done here)
3. Use the PyTorch API with `torch.Tensor`-like objects that are _not_ `torch.Tensor`
   subclasses (done in https://github.com/pytorch/pytorch/issues/30730)
4. Preserve `torch.Tensor` subclasses when calling `torch.Tensor` methods. (done here)
5. Propagating subclass instances correctly also with operators, using
   views/slices/indexing/etc. (done here)
6. Preserve subclass attributes when using methods or views/slices/indexing. (done here)
7. A way to insert code that operates on both functions and methods uniformly
   (so we can write a single function that overrides all operators). (done here)
8. The ability to give external libraries a way to also define
   functions/methods that follow the `__torch_function__` protocol. (will be addressed in a separate PR)

This PR makes the following changes:

1. Adds the `self` argument to the arg parser.
2. Dispatches on `self` as well if `self` is not `nullptr`.
3. Adds a `torch._C.DisableTorchFunction` context manager to disable `__torch_function__`.
4. Adds a `torch::torch_function_enabled()` and `torch._C._torch_function_enabled()` to check the state of `__torch_function__`.
5. Dispatches all `torch._C.TensorBase` and `torch.Tensor` methods via `__torch_function__`.
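
A minimal sketch of the subclass-preservation behavior described above, written against the currently documented classmethod form of the protocol (the exact signature at the time of this PR may have differed); `MetadataTensor` is a hypothetical class:

```python
import torch

# Hypothetical subclass: methods and operators dispatch through
# __torch_function__, so the subclass type is preserved (goals 1, 2 and 4).
class MetadataTensor(torch.Tensor):
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        with torch._C.DisableTorchFunction():  # context manager added by this PR
            ret = func(*args, **kwargs)
        return ret.as_subclass(cls) if isinstance(ret, torch.Tensor) else ret

t = torch.randn(3).as_subclass(MetadataTensor)
print(type(t + 1))     # MetadataTensor: operators preserve the subclass
print(type(t.sum()))   # MetadataTensor: methods preserve the subclass
```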

TODO:

- [x] Sequence Methods
- [x] Docs
- [x] Tests

Closes https://github.com/pytorch/pytorch/issues/28361

Benchmarks in https://github.com/pytorch/pytorch/pull/37091#issuecomment-633657778

Pull Request resolved: https://github.com/pytorch/pytorch/pull/37091

Reviewed By: ngimel

Differential Revision: D22765678

Pulled By: ezyang

fbshipit-source-id: 53f8aa17ddb8b1108c0997f6a7aa13cb5be73de0
2020-08-05 20:44:13 -07:00
Sebastian Messmer
1542c41a67 Change C++ frontend to take optional<Tensor> arguments (#41947)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41947

Previously, if an op took an optional `Tensor?` argument, the C++ frontend (i.e. `at::op()` and `Tensor::op()`)
were generated to take `Tensor`. A previous PR (https://github.com/pytorch/pytorch/pull/41610) changed the kernels
to be written with `c10::optional<Tensor>` instead of `Tensor`, but that did not touch the C++ frontend yet.

This PR changes the C++ frontend API to take `c10::optional<Tensor>` instead of `Tensor` as well.
This should be mostly bc conserving. Since `Tensor` implicitly converts to `c10::optional<Tensor>`, any old code
calling an op with a `Tensor` would still work. There are likely corner cases that get broken though.
For example, C++ only ever does *one* implicit conversion. So if you call an op with a non-tensor object
that gets implicitly converted to a `Tensor`, then that previously worked since the API took a `Tensor` and
C++ allows one implicit conversion. Now it wouldn't work anymore because it would require two implicit conversions
(to `Tensor` and then to `c10::optional<Tensor>`) and C++ doesn't do that.

The main reasons for doing this are
- Make the C++ API more sane. Those arguments are optional and that should be visible from the signature.
- Allow easier integration for XLA and Autocast. Those backends generate code to wrap operators and forward
  operator arguments to calls to at::op(). After https://github.com/pytorch/pytorch/pull/41610, there was
  a mismatch because they had to implement operators with `optional<Tensor>` but call `at::op()` with `Tensor`,
  so they had to manually convert between those. After this PR, they can just forward the `optional<Tensor>`
  in their call to `at::op()`.
ghstack-source-id: 108873705

Test Plan: unit tests

Reviewed By: bhosmer

Differential Revision: D22704832

fbshipit-source-id: f4c00d457b178fbc124be9e884a538a3653aae1f
2020-07-31 16:11:55 -07:00
David Reiss
fb9e44f8dd Add support for float[]? arguments in native_functions.yaml (#37175)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37175

ghstack-source-id: 106938114

Test Plan: Upcoming diffs use this for upsampling.

Differential Revision: D21209994

fbshipit-source-id: 1a71c07e45e28772a2bbe450b68280dcc0fe2def
2020-07-13 11:51:10 -07:00
David Reiss
5e03a1e926 Add support for int[]? arguments in native_functions.yaml (#37174)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37174

ghstack-source-id: 106938112

Test Plan: Upcoming diffs use this for upsampling.

Differential Revision: D21210002

fbshipit-source-id: d6a55ab6420c05a92873a569221b613149aa0daa
2020-07-07 13:52:20 -07:00
Kurt Mohler
f9eb8824f1 Remove datatype from Storage and StorageImpl (#38870)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38870

* Removed dtype data member from StorageImpl
* Removed any methods or method arguments in Storage/StorageImpl that deal with dtypes
* Update all callers of the changed API

Part of issue https://github.com/pytorch/pytorch/issues/33950
Original PR: https://github.com/pytorch/pytorch/pull/38038

Reviewed By: albanD

Differential Revision: D21549645

Pulled By: ezyang

fbshipit-source-id: 4289b356c55ff6b9530376a79343b99b540ee3de
2020-05-21 15:26:08 -07:00
Lu Fang
b579433bf7 Revert D21487840: Bind VariableFunctions as a module, not a class with static methods.
Test Plan: revert-hammer

Differential Revision:
D21487840

Original commit changeset: 368da9b9c50e

fbshipit-source-id: 900f5d36490ac8d419c6704f8727d4c8e492bfb7
2020-05-09 11:58:02 -07:00
Edward Yang
30f4064cfb Bind VariableFunctions as a module, not a class with static methods. (#38136)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38136

This was a bit trickier than I expected, because modules have
to be importable to be pickleable, but adding a module to another
module in the C API isn't really the right way to make it importable.
We hack around it by manually adding the module to sys.modules.

Thanks Richard Zou for an extremely useful prior attempt which helped
me make this work.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D21487840

Pulled By: ezyang

fbshipit-source-id: 368da9b9c50e5de4d7dd265e6f9f189a882d75c1
2020-05-08 22:34:34 -07:00
James Reed
1592d6842c [resubmit] Move profiler to a dispatch wrapper (#36766)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36766

Original commit changeset: dcb41d243369
ghstack-source-id: 102614215

Test Plan: waitforsadcastle

Differential Revision: D21076029

fbshipit-source-id: c2461c57cfd364bd23ff99bc2cb5572d22e23391
2020-04-21 16:37:11 -07:00
Karl Ostmo
4894cba572 Revert D19775659: [WIP] Move profiler to a dispatch wrapper
Test Plan: revert-hammer

Differential Revision:
D19775659

Original commit changeset: 5cbe5f736660

fbshipit-source-id: dcb41d2433697c5d521044a9dbc12c79f31e0929
2020-04-16 14:18:51 -07:00
James Reed
a85c835196 [WIP] Move profiler to a dispatch wrapper (#33057)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33057

Test Plan: Imported from OSS

Differential Revision: D19775659

Pulled By: jamesr66a

fbshipit-source-id: 5cbe5f736660c8543764ef62b16550638d9ceb72
2020-04-16 13:36:37 -07:00
Pavel Belevich
c9a1fc2b31 replace Generator arguments with c10::optional<Generator> (#36232)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36232

The purpose of this PR is to replace `at::Generator generator = nullptr` with `c10::optional<at::Generator> = c10::nullopt` all over the code

* #36230 Replace std::shared_ptr with c10::intrusive_ptr in at::Generator

Test Plan: Imported from OSS

Differential Revision: D20943603

Pulled By: pbelevich

fbshipit-source-id: 65d335990f01fcc706867d5344e73793fad68ae6
2020-04-13 16:26:57 -07:00
Supriya Rao
032c27cff7 [quant][graph] Add _choose_qparams function for graph mode (#35235)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35235

For dynamic quantization in graph mode, we need an operator that returns the qparams of the tensor
similar to the linear_dynamic quantized op

Test Plan:
python test/test_quantized_tensor.py TestQuantizedTensor.test_choose_qparams

Imported from OSS

Differential Revision: D20608793

fbshipit-source-id: b923b2620421b32d05f4097db0d6153d53198221
2020-03-25 10:33:21 -07:00
Pavel Belevich
5306713a36 Replace Generator* with Generator that holds std::shared_ptr<GeneratorImpl> (#34468)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34468

This PR prepares `at::Generator` for pybind11's `type_caster<at::Generator>` which is required to implement custom RNG in python. The following changes are done:
1. `at::Generator` was moved to `c10::GeneratorImpl` (similar to `c10::TensorImpl`)
2. `at::Generator` was recreated as a holder of `std::shared_ptr<c10::GeneratorImpl>` (similar to `at::Tensor` that holds `c10::intrusive_ptr<c10::TensorImpl>`)
3. Most of `at::Generator*` usages were replaced with `at::Generator`

TBD: replacing `Generator generator = nullptr` with `{}` requires JIT changes(adding Generator to IValue?)

Differential Revision: D20549420

Pulled By: pbelevich

fbshipit-source-id: 4c92a40eab8f033b359bb6c93f4cd84b07ee8d4e
2020-03-21 17:36:10 -07:00
Mike Ruberry
1afc584188 Deprecates current torch.full integral type inference, adds torch.full complex type inference (#34709)
Summary:
Per title.

Currently torch.full will always (attempt to) produce a float tensor. This is inconsistent with NumPy in (at least) two cases:

- When integral fill values (including bool) are given
- When complex fill values are given

For example:

```
np.full((1, 2), 1).dtype
: dtype('int64')

np.full((1, 2), (1 + 1j)).dtype
: dtype('complex128')
```

Whereas in PyTorch

```
torch.full((1, 2), 1).dtype
: torch.float32

torch.full((1, 2), (1 + 1j)).dtype
: RuntimeError: value cannot be converted to type float without overflow: (1,1)
```

This PR begins the process of deprecating our current behavior of returning float tensors (by default) when given integer fill values by warning the user that integer fill values will require explicitly specifying the dtype or out kwargs in 1.6, and in 1.7 the behavior will change to return a LongTensor by default (BoolTensor for bool values). The intermediate 1.6 release is to prevent changing the behavior silently and unexpectedly.

The PR also implements inference for complex types. So that with it:

```
torch.full((1, 2), (1 + 1j)).dtype
: torch.complex64
```

The complex type inference returns a ComplexFloat tensor when given a complex fill value (and no dtype or out kwarg is specified), unless the default dtype is Double, in which case a ComplexDouble tensor is returned.

A test for these behaviors is added to test_torch.py.

Implementation note:

This PR required customizing full's dispatch because currently in eager codegen the TensorOptions object passed to functions improperly sets has_dtype() to true, even if the user did not explicitly provide a dtype. torch.arange already worked around this issue with its own custom implementation. The JIT, however, does pass a properly constructed TensorOptions object.

Future Work:

This PR does not extend torch.full's complex type inference to ONNX. This seems unlikely to come up and will be a clear error if it does. When integer type inference is added to torch.full, however, then porting the behavior to ONNX may be warranted. torch.arange ported its complex type promotion logic to ONNX, for example.

Additionally, this PR mostly leaves existing call sites in PyTorch that would trigger this warning intact. This is to be more minimal (since the PR is BC breaking). I will submit a separate PR fixing PyTorch's call sites.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34709

Differential Revision: D20509387

Pulled By: mruberry

fbshipit-source-id: 129593ba06a1662032bbbf8056975eaa59baf933
2020-03-18 12:19:31 -07:00
Terence Feng
3c76b2aeea Replace THPLayout with at::Layout in Python Argument Parser (#34543) (#34584)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34584

Test Plan:
```
python setup.py develop
python test/test_torch.py
```
Output:
```
...
Ran 3834 tests in 198.825s

OK (skipped=180)
```

Imported from OSS

Differential Revision: D20403330

fbshipit-source-id: 41474d5e7001db070f98ac8379f909f0ac74deb6
2020-03-12 07:19:00 -07:00
Edward Yang
0e74cbcc54 Revert "Revert "Revert D19975411: Remove special case codegen for tril_indices/triu_indices." (#33572)" (#33742)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33742

This reverts commit 90f4c5695e.

Test Plan: Imported from OSS

Differential Revision: D20095103

Pulled By: ezyang

fbshipit-source-id: ff47dae21c278570b4ca497d76deedb75823d6d7
2020-02-25 12:09:49 -08:00
Nathan Goldbaum
fa80299bdf __torch_function__ overrides for torch.functional and torch.nn.functional (#32799)
Summary:
This adds `__torch_function__` support for all functions in `torch.functional` and `torch.nn.functional`.

The changes to C++ code and codegen scripts are to facilitate adding `__torch_function__` support for the native functions in `torch._C._nn`. Note that I moved the `handle_torch_function` C++ function to a header that both `python_torch_functions.cpp` and `python_nn_functions.cpp` include. The changes to `python_nn_functions.cpp` mirror the changes I made to `python_torch_functions.cpp` when `__torch_function__` support was first added in https://github.com/pytorch/pytorch/issues/27064. Due to the somewhat different way the `torch._C` and `torch._C._nn` namespaces are initialized I needed to create a new static reference to the `torch._C._nn` namespace (`THPNNVariableFunctions`). I'm not sure if that is the best way to do this. In principle I could import these namespaces in each kernel and avoid the global variable but that would have a runtime cost.

I added `__torch_function__` support to the Python functions in `torch.nn.functional` following the approach in https://github.com/pytorch/pytorch/issues/32194.

I re-enabled the test that checks if all functions in the `torch` namespace are explicitly tested for `__torch_function__` support. I also generalized the check to work for `torch.functional` and `torch.nn.functional` as well. This test was explicitly disabled in https://github.com/pytorch/pytorch/issues/30730 and I'm happy to disable it again if you think that's appropriate. I figured now was as good a time as any to try to re-enable it.

Finally I adjusted the existing torch API tests to suppress deprecation warnings and add keyword arguments used by some of the code in `torch.nn.functional` that were missed when I originally added the tests in https://github.com/pytorch/pytorch/issues/27064.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32799

Differential Revision: D19956809

Pulled By: ezyang

fbshipit-source-id: 40d34e0109cc4b9f3ef62f409d2d35a1d84e3d22
2020-02-21 08:38:37 -08:00
Edward Yang
90f4c5695e Revert "Revert D19975411: Remove special case codegen for tril_indices/triu_indices." (#33572)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33572

This reverts commit 687a7e4a25.

Original PR #33305

Reland with BC tests whitelisted. See https://github.com/pytorch/pytorch/issues/33580 for reasoning why this change is not actually BC breaking.

Test Plan: Imported from OSS

Differential Revision: D20011011

Pulled By: ezyang

fbshipit-source-id: 116374efc93af12b8ad738a0989d6f0daa9569e2
2020-02-21 08:36:32 -08:00
Vitaly Fedyunin
687a7e4a25 Revert D19975411: Remove special case codegen for tril_indices/triu_indices.
Test Plan: revert-hammer

Differential Revision:
D19975411

Original commit changeset: 996598759bed

fbshipit-source-id: 6bdb4b8f903e13815fc146e6f3260e5bb04c1045
2020-02-20 11:29:53 -08:00
Edward Yang
196fda5a79 Remove special case codegen for tril_indices/triu_indices. (#33305)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33305

The current TensorOptions code is written to exactly extract out
TensorOptions based on exact struct match, including default arguments.
That meant that tril_indices/triu_indices which had a different
default argument didn't match, and thus needed a special case.

I resolve this special case by instead replacing the explicit long
default argument with a None default argument, and then adjusting
the actual implementations to select the correct dtype when none
was specified.  I think the general rule I'm following here is that
it is always acceptable to replace an explicit default argument
with a None argument (assuming the backend will compute it appropriately);
the documentation gets modestly worse, but everything that was
previously expressible continues to be expressible.  Maybe later
we should switch the default argument back to long, but for now
the simplification in code is worth it.
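
A minimal sketch of the effect described above (assuming current behavior): with no explicit default in the schema, the backend still produces int64 indices unless a dtype is passed.

```python
import torch

# No dtype given: the backend selects int64, matching the old explicit default.
print(torch.triu_indices(3, 3).dtype)                     # torch.int64
# An explicit dtype still overrides the backend's choice.
print(torch.triu_indices(3, 3, dtype=torch.int32).dtype)  # torch.int32
```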

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D19975411

Pulled By: ezyang

fbshipit-source-id: 996598759bed9e8d54fe61e19354ad038ed0e852
2020-02-20 09:34:28 -08:00
Will Feng
5d7f42847c Add at::Tensor::retain_grad API (#33349)
Summary:
This PR adds `at::Tensor::retain_grad`, and its implementation mirrors the Python `torch.Tensor.retain_grad` API:
c6271c63f2/torch/tensor.py (L292-L315)
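
A minimal sketch of the Python behavior that the new C++ `at::Tensor::retain_grad` mirrors:

```python
import torch

x = torch.randn(3, requires_grad=True)
y = x * 2            # non-leaf tensor: .grad is normally not populated
y.retain_grad()      # ask autograd to keep the gradient on this non-leaf
y.sum().backward()
print(y.grad)        # tensor([1., 1., 1.])
```
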
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33349

Differential Revision: D19944524

Pulled By: yf225

fbshipit-source-id: e61d5d761996b6d1b860c04c4b4650c1a49a6a8c
2020-02-17 20:03:48 -08:00
Basil Hosmer
544eab37d0 Move deprecation warning out of generated code into python_arg_parser. (#32907)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32907

All op-specific information used in this logic was available to the
parser itself, so the check can be done in that context, no codegen
needed.

No change in the warning behavior itself, mod minor formatting tweak -
passes existing tests. Saves like ~275K binary size on mac:
```
-rwxr-xr-x  1 bhosmer  1876110778   16502064 Feb  1 00:43 torch/lib/libtorch_python.dylib
-rwxr-xr-x  1 bhosmer  1876110778   16247888 Feb  1 00:44 torch/lib/libtorch_python.dylib
```

[codegen diff](https://github.com/bhosmer/scratch/compare/deprecation_warning_before...deprecation_warning_after)

More important than the size savings is the minimization of codegen. Ideally the generated artifact should express distinctive per-op properties in as minimal a form as practically possible - e.g. here instead of generating check-and-warn behavior into every binding, we generate only the data that triggers the behavior in the parser. (And actually we were generating it already.)

Test Plan: Imported from OSS

Differential Revision: D19679928

Pulled By: bhosmer

fbshipit-source-id: cf0140573118430720c6b797c762fe5be98acd86
2020-02-03 17:47:04 -08:00
Basil Hosmer
fb159b5236 Some work on eager op binding codegen (gen_python_functions.py) (#29986)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29986

Previously in addition to generating a python binding for each op,
we would generate an almost-trivial helper for each overload.
This PR eliminates the helpers, simplifying codegen logic a bit and
reducing the source-level indirection by a step.
Perf should be unchanged.

codegen diff: 1f2f07fb60

Note: in the interests of keeping the diff contained, there's only
some light cleanup here beyond what's necessary for the codegen changes.
Plan is to do some more substantial refactoring in followup PRs that
leave generated code unchanged.

Test Plan: Imported from OSS

Differential Revision: D18567980

Pulled By: bhosmer

fbshipit-source-id: eb9a81babb4489abd470842757af45580d4c9906
2020-01-30 00:29:53 -08:00
Brian Wignall
f326045b37 Fix typos, via a Levenshtein-type corrector (#31523)
Summary:
Should be non-semantic.

Uses https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines to find likely typos, with https://github.com/bwignall/typochecker to help automate the checking.

Uses an updated version of the tool used in https://github.com/pytorch/pytorch/pull/30606 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31523

Differential Revision: D19216749

Pulled By: mrshenli

fbshipit-source-id: 7fd489cb9a77cd7e4950c1046f925d57524960ea
2020-01-17 16:03:19 -08:00
Peter Bell
b0ac425dc4 Emit warning from deprecated torch function signatures (#32009)
Summary:
Continuation of https://github.com/pytorch/pytorch/issues/31514, fixes https://github.com/pytorch/pytorch/issues/28430
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32009

Test Plan:
I verified that the deprecation warnings only occur once on a relevant workflow. Built with:

```
buck build mode/opt //vision/fair/detectron2/tools:train_net
```

Ran with:

```
DETECTRON2_ENV_MODULE=detectron2.fb.env ~/local/train_net.par --config-file configs/quick_schedules/retinanet_R_50_FPN_instant_test.yaml --num-gpus 1 SOLVER.IMS_PER_BATCH 2
```

Inspected log:

```
[01/14 07:28:13 d2.engine.train_loop]: Starting training from iteration 0
buck-out/opt/gen/caffe2/generate-code=python_variable_methods.cpp/python_variable_methods.cpp:1299: UserWarning: This overload of add is deprecated:
add(Number alpha, Tensor other)
Consider using one of the following signatures instead:
add(Tensor other, Number alpha)
buck-out/opt/gen/caffe2/generate-code=python_variable_methods.cpp/python_variable_methods.cpp:1334: UserWarning: This overload of add_ is deprecated:
add_(Number alpha, Tensor other)
Consider using one of the following signatures instead:
add_(Tensor other, Number alpha)
[01/14 07:28:25 d2.utils.events]: eta: 0:00:10  iter: 19  total_loss: 1.699  loss_cls: 1.185  loss_box_reg: 0.501  time: 0.5020  data_time: 0.0224  lr: 0.000100  max_mem: 3722M
[01/14 07:28:35 fvcore.common.checkpoint]: Saving checkpoint to ./output/model_final.pth
```

Differential Revision: D19373523

Pulled By: ezyang

fbshipit-source-id: 75756de129645501f43ecc4e3bf8cc0f78c40b90
2020-01-14 11:44:29 -08:00
Edward Yang
5dfcfeebb8 Revert D19298735: Emit warning from deprecated torch function signatures
Test Plan: revert-hammer

Differential Revision:
D19298735

Original commit changeset: 03cb78af1765

fbshipit-source-id: 304a6d4412f53a8fc822d36897c96815432e0f70
2020-01-08 13:04:41 -08:00
Peter Bell
0e5a6700cc Emit warning from deprecated torch function signatures (#31514)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/28430

The unpythonic signatures for functions such as `torch.addcdiv` are already separated in [`deprecated.yaml`] and the signatures marked as deprecated in `PythonArgParser`. However, nothing was done with this information previously. So, this now emits a warning when the deprecated signatures are used.

One minor complication is that if all arguments are passed as keyword args then there is nothing to differentiate the deprecated overload. This can lead to false warnings being emitted. So, I've also modified `PythonArgParser` to prefer non-deprecated signatures.

[`deprecated.yaml`]: https://github.com/pytorch/pytorch/blob/master/tools/autograd/deprecated.yaml
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31514

Differential Revision: D19298735

Pulled By: ezyang

fbshipit-source-id: 03cb78af17658eaab9d577cd2497c6f413f07647
2020-01-07 10:57:53 -08:00
Gregory Chanan
68e5172382 Support optional float parameters (float?, optional<double>). (#31517)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31517

This is going to be used by upsample (which currently uses magic values to represent optionals).

For now, we just introduce a fake function for testing (torch._test_optional_float(x)).

Test Plan: Imported from OSS

Differential Revision: D19198721

Pulled By: gchanan

fbshipit-source-id: 0a1382fde0927c5d277d02d62bfb31fb574b8c74
2019-12-23 08:33:39 -08:00
Nathan Goldbaum
f531815526 Deprecate tensor.type() (#30281)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/29161.

I looked a bit at the code changes related to this and think I have all of the use cases of `DeprecatedTypeProperties` covered in the message, but suggestions from someone with more context on this would be very much appreciated :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30281

Differential Revision: D18830818

Pulled By: ezyang

fbshipit-source-id: 1a7fcee15354ae09e6644577e7fa33bd26acfe20
2019-12-05 10:55:34 -08:00
Nathan Goldbaum
9d3402e4cb Add the __torch_function__ API override mechanism (#30730)
Summary:
This is a re-do of https://github.com/pytorch/pytorch/issues/27064, which was reverted (b8792c0438). This was landed at the same time as other work that added new operators to the `torch` namespace so the check for whether the `torch` namespace is exhaustively checked for overridability was triggering test failures.

I've temporarily disabled that check and added an explanatory comment that the check will be re-enabled in a future PR that will be merged during a time when the commit velocity on PyTorch is lower.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30730

Differential Revision: D18813270

Pulled By: ezyang

fbshipit-source-id: 70477c4656dca8fea6e7bc59259555041fcfbf68
2019-12-04 13:19:07 -08:00
Edward Yang
b8792c0438 Revert D18645954: add __torch_function__ API override mechanism
Test Plan: revert-hammer

Differential Revision:
D18645954

Original commit changeset: 54b5e4344d7a

fbshipit-source-id: 4a7aebb483e6b001130d6f384ccc53c5a808ab13
2019-12-04 07:41:47 -08:00
Prasun Anand
d12786b24f add __torch_function__ API override mechanism (#27064)
Summary:
Closes https://github.com/pytorch/pytorch/issues/24015 (see description of that issue for more details).

For a toy example, see the `DiagonalTensor` and `SubDiagonalTensor` class in test/test_overrides.py.

This PR currently contains:

* tests for `__torch_function__` behavior
* modifications to `gen_python_functions` to parse function signatures and dispatch to the correct overloaded argument.

This feature is inspired by and analogous to NumPy's `__array_function__` protocol ([see NumPy Enhancement Proposal 18](https://numpy.org/neps/nep-0018-array-function-protocol.html#trying-array-function-methods-until-the-right-one-works)).

### Benchmarks:
See Nathan's comment below: https://github.com/pytorch/pytorch/pull/27064#issuecomment-554601189
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27064

Differential Revision: D18645954

Pulled By: ezyang

fbshipit-source-id: 54b5e4344d7afdbcf996bb57191b0bdadc7b1767
2019-12-04 05:56:46 -08:00
Edward Yang
1111a6b810 Use pybind11::gil_scoped_* functions instead of AutoGIL/AutoNoGIL (#30274)
Summary:
Reland of https://github.com/pytorch/pytorch/pull/29095
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30274

Differential Revision: D18762293

Pulled By: ezyang

fbshipit-source-id: d3d50c2dd12bcb678ab25fa708eb6587cc4b66f9
2019-12-02 12:19:58 -08:00
Mike Ruberry
eff4c4d7c1 Revert D18301806: Use pybind11::gil_scoped_* functions instead of AutoGIL/AutoNoGIL
Test Plan: revert-hammer

Differential Revision:
D18301806

Original commit changeset: 03da6a26c41e

fbshipit-source-id: c1324ee8d154e7e16f5dd4f1cf3625aaa566cd39
2019-11-21 14:50:07 -08:00
Alan Du
f4b9690f2d Use pybind11::gil_scoped_* functions instead of AutoGIL/AutoNoGIL (#29095)
Summary:
Given that pybind11 implements these gil functions, I don't think it makes sense for Pytorch to have its own bespoke versions.

Fixes https://github.com/pytorch/pytorch/issues/29065
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29095

Differential Revision: D18301806

Pulled By: ezyang

fbshipit-source-id: 03da6a26c41ee65aaadf7b67b9f0b14d2def2a5a
2019-11-21 13:44:40 -08:00
Jie
fdab1cf0d4 NHWC support in cuDNN BatchNorm & Conv2d (#29361)
Summary:
This reverts 9a9bb448ee.

Fixing the broken case that caused the previous commit to be reverted.
Details about the fix:
	modified:   aten/src/ATen/native/Convolution.cpp

Called contiguous() on the 3D input tensor. This avoids a code path that
accidentally recognizes the input as having channels_last strides due to the
unsqueezing of a permuted 3D tensor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29361

Differential Revision: D18371964

Pulled By: VitalyFedyunin

fbshipit-source-id: a5985f4687b37e183649fa35b8ccdb50368ebfdf
2019-11-07 10:39:58 -08:00
Vitaly Fedyunin
9a9bb448ee Revert cudnn changes #23861 (#29329)
Summary:
Broken case:

```python
x = torch.randn(192,16,50).cuda()
x = x.permute(0,2,1).contiguous().permute(0,2,1)
m = torch.nn.Conv1d(
       in_channels=16,
       out_channels=32,
       kernel_size=2,
       bias=True,
  ).cuda()

m(x)
```

This reverts commit 8160f390cf.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29329

Differential Revision: D18357674

Pulled By: VitalyFedyunin

fbshipit-source-id: cdd7e77e8dcbfc5f2ab3df54eb53ccfbf703b245
2019-11-06 17:38:46 -08:00
Jie
8160f390cf (#23861)
Summary:
Added nhwc support for:
1. cudnn_batch_norm & cudnn_batch_norm_backward
2. cudnn_convolution_forward & cudnn_convolution_backward
3. cudnn_convolution_transpose & cudnn_convolution_transpose_backward

patching suggest_memory_format for convolution

suggest_memory_format has ambiguous meaning for two cases:
1. tensor with NCHW where C = 1.
   we could use stride of C as a hint to tell the intended memory format.
2. tensor with NCHW where H == W == 1.
   there's no way to identify the intended memory format from strides.

Currently we fall back to NCHW whenever we see a contiguous tensor, hence avoiding ambiguity for some of the special cases.
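
A minimal sketch of case 1 above (assuming current stride-checking behavior): with a single channel, both interpretations of the strides describe the same memory order, so the format cannot be recovered from strides alone.

```python
import torch

x = torch.randn(2, 1, 4, 4)  # NCHW with C == 1
# Size-1 dimensions do not constrain the layout, so the same tensor reads as
# both NCHW-contiguous and channels_last-contiguous.
print(x.is_contiguous())                                   # True
print(x.is_contiguous(memory_format=torch.channels_last))  # True
```
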
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23861

Differential Revision: D18263434

Pulled By: VitalyFedyunin

fbshipit-source-id: dd9f69576ec12fec879cd87a3d446931371360d9
2019-11-04 09:11:50 -08:00
Peter Bell
f33813d589 Return NotImplemented from all binary math ops (#27423)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/26333

Fixes the operators missed in https://github.com/pytorch/pytorch/issues/26507 and includes a test for all operators.
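
A minimal sketch of what returning NotImplemented enables (`Wrapper` is a hypothetical class, not part of this PR):

```python
import torch

class Wrapper:
    def __init__(self, value):
        self.value = value

    def __radd__(self, tensor):
        # Reached because torch.Tensor.__add__ returns NotImplemented for
        # operands it cannot handle, letting Python try the reflected method.
        return Wrapper(tensor + self.value)

t = torch.ones(2)
out = t + Wrapper(3.0)       # no TypeError: dispatches to Wrapper.__radd__
print(type(out), out.value)  # Wrapper, tensor([4., 4.])
```
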
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27423

Differential Revision: D17835390

Pulled By: ezyang

fbshipit-source-id: 7a1351c7ccc8ad11454dbaa00d3701dcee4f06a8
2019-10-28 14:28:33 -07:00
Pavel Belevich
46f96d1538 C++ API parity: at::Tensor::requires_grad_
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26332

Test Plan: Imported from OSS

Differential Revision: D17427575

Pulled By: pbelevich

fbshipit-source-id: 5500169a4fa0ef9cc2a7272e13b6e2d89df09260
2019-10-24 13:24:18 -07:00
Brian Vaughan
002c250139 Expose a torch.result_type and simplify tensor iterator
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26012

Test Plan: Imported from OSS

Differential Revision: D17556197

Pulled By: nairbv

fbshipit-source-id: c0be3ac9e99fecc26a181e301defc1942bc6708c
2019-09-25 06:52:23 -07:00
Dmytro Dzhulgakov
9aad4d7b5f Fix _empty_per_channel_affine_quantized to be less hacky (#26243)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26243

This is an attempt to fix _empty_per_channel_affine_quantized to be more sane. It's a factory function that nevertheless receives a Tensor argument and it throws the codegen off course.

Previously, people did a hacky workaround of appending _like to the function name to trick the codegen; it also required a non-natural argument order.

This PR explicitly allows to override the 'category' of the function to make codegen do the right thing. Now name and the argument order (in C++) make more sense.

Test Plan: Imported from OSS

Differential Revision: D17443221

Pulled By: dzhulgakov

fbshipit-source-id: c98c1c74473d8cbf637f511d26ceb949d8ae2a1a
2019-09-23 22:28:58 -07:00
Pavel Belevich
d117842e56 C++ API parity: at::Tensor::version
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26561

Test Plan: Imported from OSS

Differential Revision: D17507167

Pulled By: pbelevich

fbshipit-source-id: 167890c7b745acc9cb9ce4185f1d8c1745aaecc2
2019-09-21 08:37:46 -07:00
Edward Yang
a5bcde97af Revert D17427577: C++ API parity: at::Tensor::version
Test Plan: revert-hammer

Differential Revision:
D17427577

Original commit changeset: e9b3e76ca44d

fbshipit-source-id: a5bbae208ba33a31f90ab5c9b199f232de0c6d1b
2019-09-20 11:19:43 -07:00
Pavel Belevich
198521978b C++ API parity: at::Tensor::version
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26217

Test Plan: Imported from OSS

Differential Revision: D17427577

Pulled By: pbelevich

fbshipit-source-id: e9b3e76ca44df883e3038b688dd7b930752d93a2
2019-09-20 11:02:41 -07:00
Dmytro Dzhulgakov
8c1354c31b Implement more support for per-channel quantization (#26240)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26240

In particular adds support for empty/empty_like which is needed for memory layouts to work.

Test Plan: Imported from OSS

Differential Revision: D17443220

Pulled By: dzhulgakov

fbshipit-source-id: 9c9e25981999c0edaf40be104a5741e9c62a1333
2019-09-19 13:39:17 -07:00
Pavel Belevich
fc3e1a22da C++ API parity: at::Tensor::output_nr
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26216

Test Plan: Imported from OSS

Differential Revision: D17427576

Pulled By: pbelevich

fbshipit-source-id: 351c834c6c44a2a2f915e48a1e8aa8ad7f4274b3
2019-09-19 09:11:40 -07:00
Pavel Belevich
44ffbc43de C++ API parity: at::Tensor::is_leaf
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26186

Test Plan: Imported from OSS

Differential Revision: D17427580

Pulled By: pbelevich

fbshipit-source-id: c01362a3b1fdb0bd1dfc158dbf6fe1cf1d928761
2019-09-18 17:56:13 -07:00
Ralf Gommers
1b4951d3a5 Fix remaining invalid function cast warnings that show up with GCC 8/9 (#26104)
Summary:
Follow-up to gh-25483, more of the same fixes for warnings like:

```
../torch/csrc/autograd/python_variable.cpp:503:31: warning: cast between incompatible function types from ‘PyObject* (*)(THPVariable*)’ {aka ‘_object* (*)(THPVariable*)’} to ‘getter’ {aka ‘_object* (*)(_object*, void*)’} [-Wcast-function-type]
  503 |   {"_backward_hooks", (getter)THPVariable_get_backwards_hooks, (setter)THPVariable_set_backwards_hooks, nullptr, nullptr},
      |                               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```

This takes the build log output for a full rebuild with GCC 9.1 from ~10,000 to ~7,000 lines.

`clang-tidy` is going to complain, no way around that - see discussion at the end of gh-25483.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26104

Differential Revision: D17396831

Pulled By: ezyang

fbshipit-source-id: d71696bfe4dbe25519e4bcb7753151c118bd39f7
2019-09-17 07:43:37 -07:00
Gregory Chanan
5aff3dbaf6 Kill 'default_init', which isn't needed anymore.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26281

Test Plan: Imported from OSS

Differential Revision: D17397097

Pulled By: gchanan

fbshipit-source-id: fb53e90637a3dfb2300fca78f414abe2d82832f3
2019-09-16 16:20:49 -07:00
Pavel Belevich
33221b19ac C++ API parity: at::Tensor::data
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26008

Test Plan: Imported from OSS

Differential Revision: D17343488

Pulled By: pbelevich

fbshipit-source-id: b9ba5e26cad621a428a14292446d7fb5a6e5535d
2019-09-12 23:33:34 -07:00
Edward Yang
3d9c419648 Port new_empty to ATen. (#25475)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25475

I got sucked into this rabbit hole when I was trying to understand
what I should do with TensorTypeId occurrences in
torch/csrc/utils/tensor_new.cpp.  I eventually concluded that all of my problems
were because Tensor.new_empty was hand implemented and not actually a native
function.  So I made it a native function.

There are a bunch of other new_* functions which should get this
treatment, but I'm sending out this PR just to show how it can
be done.

The general recipe:
1. Implement a concept of TensorOptions merging (TensorOptions::merge_in).
   This represents the notion of taking a tensor, but "overriding" some
   of its values with specific overrides.  One subtlety here is how
   devices get merged; see the comments for what our existing behavior is,
   and how I preserve it.
2. Implement new_empty as a native function, using options merging.
3. Add another special case to Python binding generation to treat new_*
   similar to *_like (i.e., handle TensorOptions correctly).  The logic
   here is probably wrong, actually; we should codegen TensorOptions
   correctly no matter what happens, but new_empty follows the same
   pattern as empty_like so I opted not to touch this code too much.
4. Delete the now defunct manual binding code.
5. Delete manual type annotations that are no longer necessary since
   we're going through native.

I didn't handle memory format correctly here.  I don't know if this function
should accept memory format; prior memory format patches didn't add support
for memory format to new_like.  If we had put memory format in TensorOptions
this wouldn't have been a question.
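
A minimal sketch of the options-merging behavior described above, in terms of the Python-level semantics of `new_empty`:

```python
import torch

base = torch.zeros(2, dtype=torch.float64)
a = base.new_empty(3)                      # inherits dtype/device from base
b = base.new_empty(3, dtype=torch.int32)   # an explicit override wins
print(a.dtype, b.dtype)                    # torch.float64 torch.int32
```
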
ghstack-source-id: 89294185

Test Plan: sandcastle & ossci

Differential Revision: D17133000

fbshipit-source-id: 00f4e98bd5174f6fd54e8aba2910ea91824771d9
2019-09-04 14:34:39 -07:00
Ailing Zhang
858493d168 generic overrideable convolution for backends (#23562)
Summary:
One possible solution based on our discussion yesterday: ezyang gchanan zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23562

Differential Revision: D16998161

Pulled By: ailzhang

fbshipit-source-id: 07fe3a335f43b4205a421b3521aeb5fa4dc80279
2019-08-27 18:33:21 -07:00
Edward Yang
d125b5ffa2 Fix C412 lint from flake8-comprehensions update. (#24184)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24184

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D16764168

Pulled By: ezyang

fbshipit-source-id: cc252a860fd7e4b7fb2b95c5d9fcdbf6935ffeb6
2019-08-12 14:34:45 -07:00
Richard Zou
0dcb8755c8 Implement tensor.set_names_, tensor.names setter
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23172

Test Plan:
- [namedtensor ci]

gh-metadata: pytorch pytorch 23172 gh/zou3519/74/head

Imported from OSS

Differential Revision: D16494364

Pulled By: zou3519

fbshipit-source-id: 8d0e26b33346d4eadba30b2e76610f6d7be7c373
2019-07-26 08:50:49 -07:00
vishwakftw
7d055c21b3 Port SVD to ATen, enable batching for matrix inputs (#21588)
Summary:
Changelog:
- Port SVD TH implementation to ATen/native/BatchLinearAlgebra.cpp
- Port SVD THC implementation to ATen/native/cuda/BatchLinearAlgebra.cu
- Allow batches of matrices as arguments to `torch.svd`
- Remove existing implementations in TH and THC
- Update doc string
- Update derivatives to support batching
- Modify nuclear norm implementation to use at::svd instead of _batch_svd
- Remove _batch_svd as it is redundant
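
A minimal sketch of the batching enabled above (note that `torch.svd` was later deprecated in favor of `torch.linalg.svd`):

```python
import torch

A = torch.randn(4, 5, 3)          # a batch of four 5x3 matrices
U, S, V = torch.svd(A)            # each factor carries the batch dimension
print(U.shape, S.shape, V.shape)  # (4, 5, 3), (4, 3), (4, 3, 3)
```
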
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21588

Test Plan:
- Add new test suite for SVD in test_torch.py with port to test_cuda.py
- Add tests in common_methods_invocations.py for derivative testing

Differential Revision: D16266115

Pulled By: nairbv

fbshipit-source-id: e89bb0dbd8f2d58bd758b7830d2389c477aa61fb
2019-07-15 13:34:01 -07:00
Brian Vaughan
97a604ef57 Rereapply optional ScalarType interface changes that were reverted in D16079809 (#22456)
Summary:
re-apply changes reverted in:
https://github.com/pytorch/pytorch/pull/22412

Also change log_softmax to take positional arguments. Long-term we do want the kwarg-only interface, but it currently seems to be incompatible with JIT serialization.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22456

Differential Revision: D16097159

Pulled By: nairbv

fbshipit-source-id: 8cb73e9ca18fc66b35b873cf4a574b167a578b3d
2019-07-03 20:03:25 -07:00
Wanchao Liang
dff2c07183 Manual revert of D16012838
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22412

Reviewed By: nairbv, houseroad

Differential Revision: D16079809

fbshipit-source-id: ee0d805ff7a2bc5f98bcc65f90b8199751c840f6
2019-07-01 19:58:21 -07:00
Roy Li
6c454ff14c Stop using Type in Python bindings (#21963)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21963
ghimport-source-id: 4d9d66ba2c8587503d892b67f535cc2a62e2d19e

Test Plan: Imported from OSS

Differential Revision: D15897423

Pulled By: li-roy

fbshipit-source-id: 2dd55ceb80971df7c86545b7bfff733387f13572
2019-06-30 04:11:32 -07:00
Brian Vaughan
7707dee761 Re apply optional ScalarType changes (#22237)
Summary:
This is (mostly) the re-application of:
https://github.com/pytorch/pytorch/pull/21088

which was reverted due to an issue conflicting with changes in:
https://github.com/pytorch/pytorch/pull/22104
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22237

Differential Revision: D16012838

Pulled By: nairbv

fbshipit-source-id: 35f4a73c97ab68b4e2648aca96b2176f07b5a883
2019-06-26 13:36:25 -07:00
Vitaly Fedyunin
516c7e4456 Adding memory_format to empty and empty_like operators (#20558)
Summary:
Original RFC https://github.com/pytorch/pytorch/issues/19092

To ensure that we are not introducing a BC-breaking change, empty_like returns a contiguous tensor by default.

```python
nCwh = torch.randn(N, C, H, W)
nhwC = nCwh.contiguous(memory_format=torch.channels_last)

new_nCwh = torch.empty_like(nhwC)
new_nCwh.is_contiguous(memory_format=torch.channels_last) == False
```

Now we need a way to preserve memory format in `empty_like`

```python
nCwh = torch.randn(N, C, H, W)
nhwC = nCwh.contiguous(memory_format=torch.channels_last)

new_nhwC = torch.empty_like(nhwC, memory_format=torch.preserve_format)
new_nhwC.is_contiguous(memory_format=torch.channels_last) == True

like_nCwh = torch.empty_like(nCwh, memory_format=torch.preserve_format)
like_nCwh.is_contiguous(memory_format=torch.channels_last) == False
```

Usage of `torch.preserve_format` allows us to avoid `if` constructs.

We can also generate different memory format outputs

```python
nCwh = torch.randn(N, C, H, W)
nhwC = nCwh.contiguous(memory_format=torch.channels_last)

new_nhwC = torch.empty_like(nCwh, memory_format=torch.channels_last)
new_nhwC.is_contiguous(memory_format=torch.channels_last) == True

new_nCwh = torch.empty_like(nhwC, memory_format=torch.contiguous_format)
new_nCwh.is_contiguous(memory_format=torch.channels_last) == False
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20558

Differential Revision: D15502474

Pulled By: VitalyFedyunin

fbshipit-source-id: 2e120d57eefad6fb8e04b8322c79871392f64331
2019-06-26 11:48:27 -07:00
vishwakftw
bcb5fd8f06 Port symeig to ATen and enable batching of inputs (#21858)
Summary:
Changelog:
- Port `symeig` from TH/THC to ATen
- Enable batching of matrix inputs for `symeig`
- Modify derivative computation based on batching
- Update docs to reflect the change
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21858

Test Plan: - Added additional tests in `test_torch.py` (with a port to `test_cuda.py`) and `common_methods_invocations.py` to test if both the port and batching work.

Differential Revision: D15981789

Pulled By: soumith

fbshipit-source-id: ab9af8361f8608db42318aabc8421bd99a1ca7ae
2019-06-25 12:13:27 -07:00
Michael Suo
e016a424ef Revert D15944971: [pytorch][PR] merge interfaces that have an optional scalartype parameter
Differential Revision:
D15944971

Original commit changeset: 53473c370813

fbshipit-source-id: a18158b448cb8993b12e1a3bf2c2a3e0d6df6b10
2019-06-24 09:41:33 -07:00
Brian Vaughan
142361a7e4 merge interfaces that have an optional scalartype parameter (#21088)
Summary:
This change is backwards incompatible in *C++ only* on mean(), sum(), and prod() interfaces that accepted either of:
```
Tensor sum(IntArrayRef dim, bool keepdim=false) const;
Tensor sum(IntArrayRef dim, ScalarType dtype) const;
```
but now to specify both the dim and dtype will require the keepdim parameter:
```
Tensor sum(IntArrayRef dim, bool keepdim=false, c10::optional<ScalarType> dtype=c10::nullopt) const;
```

[xla ci]
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21088

Reviewed By: ailzhang

Differential Revision: D15944971

Pulled By: nairbv

fbshipit-source-id: 53473c370813d9470b190aa82764d0aea767ed74
2019-06-24 07:17:58 -07:00
Jerry Zhang
88921feafd change return type for q_scale and q_zero_point (#21709)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21709

Change the return type from Scalar to double/int64_t so we don't need to do a conversion when we call other quantization-related ATen functions.
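
A minimal sketch of the Python-visible effect, using the current `torch.quantize_per_tensor` API (which may postdate this commit): the accessors return plain numbers rather than Scalars.

```python
import torch

q = torch.quantize_per_tensor(torch.randn(4), scale=0.1, zero_point=2,
                              dtype=torch.qint8)
print(q.q_scale(), q.q_zero_point())  # 0.1 2  (a Python float and int)
```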

Differential Revision: D15793003

fbshipit-source-id: 510936c69fa17a4d67340a31ebb03415647feb04
2019-06-20 20:30:39 -07:00
Jerry Zhang
fa5263af2c Add set_quantizer_ for QTensor (#21852)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21852

To enable change of q_scale and q_zero_point in `copy_`

Differential Revision: D15793427

fbshipit-source-id: a7040b5b956d161fd6af6176287f4a4aa877c9be
2019-06-18 19:50:12 -07:00
Jerry Zhang
94f903654c Add qscheme() method (#20608)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20608

Exposing QScheme in python as Python objects like `torch.qscheme.per_tensor_affine` etc.

Reviewed By: zafartahirov

Differential Revision: D15364354

fbshipit-source-id: 4d6a96d67e9ead051cf4a8f934553a8c7232fdb7
2019-06-14 16:29:29 -07:00
Richard Zou
0d6eb209e6 Expose torch.empty(sizes, *, names, ...) to Python (#21648)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21648
ghimport-source-id: 583f155c8ee95967d2f8b9d8df27d94b9e725694

Differential Revision: D15804482

Pulled By: zou3519

fbshipit-source-id: f86520dda479100be2a752e4db8a902167413a83
2019-06-14 11:52:47 -07:00
Richard Zou
5c0e058950 Implement at::empty(IntArrayRef, DimnameList?, TensorOptions) in aten (#21647)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21647
ghimport-source-id: 1db4ec31f047f7854a39c28e2b38918dc6b44f42

Differential Revision: D15804425

Pulled By: zou3519

fbshipit-source-id: 575cc3de09287efe75e7052df129626748208d0d
2019-06-13 20:38:19 -07:00
Brennan Vincent
f4f32cecfd numpy like nonzero (called nonzero_tuple) (#20293)
Summary:
No performance degradation compared to Numpy when indexing:

```
In [15]: x=torch.randn((1000,1000))

In [16]: %timeit x[x.nonzero_tuple()]
4.63 ms ± 102 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [17]: y=x.numpy()

In [18]: %timeit y[y.nonzero()]
14.6 ms ± 281 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [20]: x=x.t()

In [22]: %timeit x[x.nonzero_tuple()]
9.01 ms ± 626 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [24]: y=x.numpy()

In [25]: %timeit y[y.nonzero()]
16.8 ms ± 770 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20293

Differential Revision: D15358754

Pulled By: umanwizard

fbshipit-source-id: 1344aabd95c969eeda9780c475a39551231879e1
2019-06-06 12:50:59 -07:00
Brennan Vincent
e268fc97c3 Re-add Tensor.T (#21175)
Summary:
Something flaky is going on with `test_inplace_view_saved_output` on Windows.

With my PR #20598 applied, the test fails, even though there is no obvious reason it should be related, so the PR was reverted.

Based on commenting out various parts of my change and re-building, I think the problem is with the name -- renaming everything from `T` to `asdf` seems to make the test stop failing. I can't be sure that this is actually the case though, since I could just be seeing patterns in non-deterministic build output...

I spoke with colesbury offline and we agreed that it is okay to just disable this test on Windows for now and not block landing the main change. He will look into why it is failing.

**Test Plan:** I will wait to make sure the Windows CI suite passes before landing this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21175

Differential Revision: D15566970

Pulled By: umanwizard

fbshipit-source-id: edf223375d41faaab0a3a14dca50841f08030da3
2019-06-04 17:38:25 -07:00
Edward Yang
0544a491d5 Revert D15499749: [pytorch][PR] Add Tensor.T attribute to reverse dimensions
Differential Revision:
D15499749

Original commit changeset: f3306b496667

fbshipit-source-id: 7f50431d2ea37bc41bfed62f386ddedea1412878
2019-05-29 04:29:48 -07:00
Roy Li
3038cf8eee Remove THSTensor and SparseTensorRef (#20877)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20877
ghimport-source-id: a07f53ca158f9a3dce7a25ef5a169871e98ea3ea

Differential Revision: D15480353

Pulled By: li-roy

fbshipit-source-id: 1152dbc4df827ded3be1a57f007a6b7de12f567f
2019-05-29 01:37:03 -07:00
vishwakftw
f6ec464890 Enable batched QR decomposition and add a some option (#20689)
Summary:
This PR covers two important points with respect to the QR decomposition:
- batching of input matrices (#7500)
- adding `some` as an option in `torch.qr` akin to NumPy's `mode` option (#10538)

Changelog:
- Enable batching for inputs to `torch.qr`
- Move QR decomposition implementation to ATen (CPU and CUDA)
- Remove existing implementations in TH/THC
- Add a `some` option to `torch.qr` that will enable users to switch between complete and reduced decomposition
- Modify doc strings
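
A minimal sketch of the batched call with the new `some` flag (note that `torch.qr` was later superseded by `torch.linalg.qr`, where `some` roughly maps to `mode='reduced'`):

```python
import torch

A = torch.randn(4, 5, 3)        # a batch of four 5x3 matrices
Q, R = torch.qr(A, some=True)   # reduced decomposition per batch element
print(Q.shape, R.shape)         # (4, 5, 3), (4, 3, 3)
```
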
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20689

Differential Revision: D15529230

Pulled By: soumith

fbshipit-source-id: 16af82b1d2db8a3a758fa8a5f798d83f5f950efb
2019-05-28 17:52:37 -07:00
Brennan Vincent
9294de8c9f Add Tensor.T attribute to reverse dimensions (#20598)
Summary:
For compatibility with numpy
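
A minimal sketch of the attribute (shown for a 2-D tensor; per the title above, higher-dimensional tensors have all dimensions reversed):

```python
import torch

x = torch.randn(2, 3)
print(x.T.shape)  # torch.Size([3, 2]), matching numpy's x.T
```
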
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20598

Differential Revision: D15499749

Pulled By: umanwizard

fbshipit-source-id: f3306b496667f20169e9b28db3150d12183703bc
2019-05-28 16:59:06 -07:00
Junjie Bai
c9f380df02 Add aten mkldnn linear operator
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19210

Reviewed By: dzhulgakov

Differential Revision: D14901641

fbshipit-source-id: 8fa68b9941fd93cea0f313a828cba34c5c81ae11
2019-04-26 13:41:57 -07:00
Roy Li
a6811e17c0 Restore copy_ overload with async arg (#19641)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19641
ghimport-source-id: 7099221334505bacdc209cff8bf29e3004c30379

Differential Revision: D15056755

Pulled By: li-roy

fbshipit-source-id: e9063b606e72a70fc1270fbcdcf1c0b23d876dd3
2019-04-24 17:51:50 -07:00