pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 00:21:07 +01:00

Author	SHA1	Message	Date
Nikita Shulga	478098aaac	Revert D29801652: Refactor Tensor::to to call a primitive that is not copy_. Test Plan: revert-hammer Differential Revision: D29801652 (`29bb3f4647`) Original commit changeset: bb01eb1acf3d fbshipit-source-id: 93693bad8068d47a3a4c16f34f300e03ea573897	2021-07-26 19:37:17 -07:00
Richard Zou	29bb3f4647	Refactor Tensor::to to call a primitive that is not copy_. (#61458 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61458 Context ------- functorch is unable to vmap(grad(f)) when f contains a .to call. This is because .to (when it is not a no-op) decomposes to .copy_ under grad and the .copy_ is not compatible with vmap. Fix --- The fix for this is to have all Tensor::to variants call a new operator, `_to_copy`, that always copies and is a primitive w.r.t. autograd so that autograd decomposes Tensor::to into a call to `_to_copy`. (This is related to https://github.com/pytorch/pytorch/issues/60956, please let me know if you want to bikeshed the naming). In order to get this done I had to do a bit of refactoring. All of the `::to` implementations now call `to_impl` which may call `_to_copy`. Autograd codegen changes ------------------------ The second thing I had to do was modify the autograd codegen. Right now, autograd assumes that every output is either statically known to be differentiable or not differentiable at codegen time. `_to_copy` is a little special because its differentiability depends on the output dtype. e.g. `torch.randn(3, requires_grad=True).to(torch.long)` is non differentiable. To get this to work: - I changed how `output_differentiability` in derivatives.yaml work. - output_differentiability can now accept "conditions" for each of the output arguments. A "condition" is some C++ code. - We currently only support `output_differentiability` with conditions if there is a single output. This is for convenience and can be changed in the future. - I added a new `output_differentiability_conditions` field to DifferentiabilityInfo. This gets populated in load_derivatives.yaml - forward-mode and reverse-mode AD take `output_differentiability_conditions` into account. Here's how the generated code for `VariableType::_to_copy` [looks like](https://gist.github.com/zou3519/93462df4bda1837acee345205b7cc849) No other autogenerated code gets modified by this PR. Performance benchmarking ------------------------ - I benchmarked [three cases that demonstrate overhead](https://gist.github.com/zou3519/5b6985e6906b80eec5a0dd94ed5b6a1a). - Case A: No-op .to(). Instruction count went from 50223 to 25623. I have no clue why but this is a good thing. - Case B: not-no-op .to(). Instruction count went from 665291 to 671961. This is expected; `_to_copy` adds an additional dispatch. - Case C: not-no-op .to() forward pass and backward pass. Instruction count went from 4022841 to 4030057. This PR adds an additional dispatch to .to() (so there should be one additional dispatch in the forward pass) so this number looks reasonable. Test Plan --------- - test_torch.py has a test_to - test_cuda.py has test_to* - test_autograd has tests (test_type_conversions) that exercise the reverse-mode path - test_ops.py has some tests (like log_softmax) that exercise the reverse-mode and forward-mode AD path. - test_quantization, test_namedtensor all exercise tensor.to as well. Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D29801652 Pulled By: zou3519 fbshipit-source-id: bb01eb1acf3d79d84f284150d1be4be3b4ace351	2021-07-26 13:02:39 -07:00
Laurence Rouesnel	adb73d3dcf	Removed overhead from reshape() call if tensor doesn't need to be changed (#61466 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61466 ## Goal Per #55126 the performance of `reshape` is worse than `alias` in cases where they are performing the same operation (i.e. where reshape is returning a view) because `reshape` delegates to `view` and duplicates some of the operations (specifically `infer_size_dv` and `computeStride`). The goal of this pull-request is to reduce or remove the additional overhead that `reshape` has. ### Proposed Implementation Instead of using `view` we implement a private/internal operator (`_reshape_alias`) that `reshape` dispatches to which skips the relevant checks. This is functionally equivalent to `as_strided` however it is a lot simpler because it's specialized to this use-case, and importantly the `backward` implementation is a lot faster. Note that we have to dispatch (`reshape` is a composite operator) because `reshape` can return either a view or a copy of the Tensor depending on the parameters, and this complicates implementing a derivative/backward for `reshape`. ### Why not `as_strided`? Using `as_strided` directly slows down autograd. If we use a custom function equivalent to `_reshape_alias` but with a simpler backward function then `view` has the same performance as `reshape`. If we delegate to `as_strided` it is about 56% slower (and this holds against our custom function). This is also the reason we make an internal operator named `_reshape_alias` instead of exposing a new operator since this should only be used in the `reshape` case and it is effectively a more limited version of `view`, `alias`, and `as_strided`. ## Benchmarks In a micro-benchmark for `backward` running: ```cpp // Setup at::Tensor x=torch::empty({2,2}, torch::requires_grad(true)); // Benchmark loop // `reshape(-1)` replaced with a call to view(-1) for view baseline x.pow(4).reshape(-1).mean().backward(); ``` I also benchmarked simple operations without gradients using: ```cpp // Setup at::Tensor x=torch::empty({2,2}, torch::requires_grad(true)); // Benchmark loop x.reshape(-1) // replaced with a call to view(-1) for view baseline ``` Baselined to `view`: * Original `reshape`: `+3.3%` (without gradients `+20.8%`) * Using `as_strided`: `+55.1%` (without gradients `+1.0%`) * Using custom `_reshape_view`: `-1.0%` (without gradients `+6.2%`) In absolute terms (note the percentages above were generated comparing between runs/tests rather than to a single baseline): * Original `view`: `53.66 us` (without gradients `582.78 ns`) * Original `reshape`: `55.46 us` (without gradients `704.24 ns`) * Using `as_strided`: `83.24 us` (without gradients `576.49 ns`) * Using custom `_reshape_view`: `53.13 us` (without gradients `536.01 ns`) Note that these benchmarks perform a backwards operation as well. When compared without using gradient computation at all the performance differneces are more pronounced as this takes up more of the time. ### Original performance <details> <summary>Benchmark results</summary> ``` [<torch.utils.benchmark.utils.common.Measurement object at 0x7f0e4d393160> x.pow(4).view(-1).mean().backward(); setup: at::Tensor x=torch::empty({2,2}, torch::requires_grad(true)); Median: 53.66 us IQR: 2.70 us (52.54 to 55.24) 884 measurements, 100 runs per measurement, 1 thread] [<torch.utils.benchmark.utils.common.Measurement object at 0x7f0e2ebd4fa0> x.pow(4).reshape(-1).mean().backward(); setup: at::Tensor x=torch::empty({2,2}, torch::requires_grad(true)); Median: 55.46 us IQR: 2.61 us (54.39 to 57.01) 889 measurements, 100 runs per measurement, 1 thread] 2276116 2286256 <torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts object at 0x7f0e5b2e3e20> 2640 ???:at::detail::computeStride(c10::ArrayRef<long>, c10::ArrayRef<long>, c10::SmallVector<long, 5u> const&) 1920 ???:at::native::reshape(at::Tensor const&, c10::ArrayRef<long>) 1520 ???:at::_ops::reshape::call(at::Tensor const&, c10::ArrayRef<long>) 1040 ???:c10::SmallVectorImpl<long>::operator=(c10::SmallVectorImpl<long>&&) 980 ???:void at::infer_size_impl<c10::SmallVector<long, 5u> >(c10::ArrayRef<long>, long, c10::SmallVector<long, 5u>&) 720 ???:__tls_get_addr 520 ???:at::shouldRunRecordFunction(bool) 520 ???:__memcpy_avx_unaligned_erms 200 ???:c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10:: ... g>)>::call(c10::OperatorKernel, c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) 100 ???:c10::TensorImpl::strides() const 100 ???:c10::TensorImpl::sizes() const 100 ???:at::(anonymous namespace)::manager() 77 /tmp/benchmark_utils_jit_build__1626465284__8a34e7ff-cd37-4a82-be28-7f19e081e771/timer_cpp_7815557938202456331/timer_src.cpp:main 40 ???:c10::TensorImpl::numel() const -77 /tmp/benchmark_utils_jit_build__1626465284__8a34e7ff-cd37-4a82-be28-7f19e081e771/timer_cpp_8055217880649990171/timer_src.cpp:main -260 ???:at::native::view(at::Tensor const&, c10::ArrayRef<long>) Total: 10140 ``` ``` [<torch.utils.benchmark.utils.common.Measurement object at 0x7f850dd66c10> x.view(-1); setup: at::Tensor x=torch::empty({2,2}); Median: 582.78 ns IQR: 33.80 ns (573.80 to 607.61) 833 measurements, 10000 runs per measurement, 1 thread] [<torch.utils.benchmark.utils.common.Measurement object at 0x7f850de31e20> x.reshape(-1); setup: at::Tensor x=torch::empty({2,2}); Median: 704.24 ns IQR: 24.42 ns (697.20 to 721.62) 679 measurements, 10000 runs per measurement, 1 thread] 56896 67036 <torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts object at 0x7f84e1930bb0> 2640 ???:at::detail::computeStride(c10::ArrayRef<long>, c10::ArrayRef<long>, c10::SmallVector<long, 5u> const&) 1920 ???:at::native::reshape(at::Tensor const&, c10::ArrayRef<long>) 1520 ???:at::_ops::reshape::call(at::Tensor const&, c10::ArrayRef<long>) 1040 ???:c10::SmallVectorImpl<long>::operator=(c10::SmallVectorImpl<long>&&) 980 ???:void at::infer_size_impl<c10::SmallVector<long, 5u> >(c10::ArrayRef<long>, long, c10::SmallVector<long, 5u>&) 720 ???:__tls_get_addr 520 ???:at::shouldRunRecordFunction(bool) 520 ???:__memcpy_avx_unaligned_erms 200 ???:c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10:: ... g>)>::call(c10::OperatorKernel, c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) 100 ???:c10::TensorImpl::strides() const 100 ???:c10::TensorImpl::sizes() const 100 ???:at::(anonymous namespace)::manager() 76 /tmp/benchmark_utils_jit_build__1626466038__15fbbac0-2072-4459-8f8e-08121a905b99/timer_cpp_547407365342278353/timer_src.cpp:main 40 ???:c10::TensorImpl::numel() const -76 /tmp/benchmark_utils_jit_build__1626466038__15fbbac0-2072-4459-8f8e-08121a905b99/timer_cpp_3457873755756181226/timer_src.cpp:main -260 ???:at::native::view(at::Tensor const&, c10::ArrayRef<long>) Total: 10140 ``` </details> ### Using `as_strided` <details> <summary>Benchmark results</summary> ``` [<torch.utils.benchmark.utils.common.Measurement object at 0x7f8b13bb5b50> x.pow(4).view(-1).mean().backward(); setup: at::Tensor x=torch::empty({2,2}, torch::requires_grad(true)); Median: 53.37 us IQR: 3.15 us (51.73 to 54.88) 936 measurements, 100 runs per measurement, 1 thread] [<torch.utils.benchmark.utils.common.Measurement object at 0x7f8af55f8490> x.pow(4).reshape(-1).mean().backward(); setup: at::Tensor x=torch::empty({2,2}, torch::requires_grad(true)); Median: 83.24 us IQR: 4.05 us (81.20 to 85.25) 609 measurements, 100 runs per measurement, 1 thread] 2267916 2525061 <torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts object at 0x7f8af55f8e50> 31930 ???:_int_free 15940 ???:malloc 11595 ???:_int_malloc 10100 ???:torch::autograd::generated::details::as_strided_backward(at::Tensor, at::TensorGeometry, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>) 9360 ???:__tls_get_addr 8280 ???:free 8100 ???:torch::autograd::VariableType::(anonymous namespace)::as_strided(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>) 4520 ???:c10::intrusive_ptr<c10::TensorImpl, c10::UndefinedTensorImpl>::reset_() 4080 ???:operator new(unsigned long) ... -780 ???:at::_ops::view::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) -920 ???:c10::SmallVectorImpl<long>::operator=(c10::SmallVectorImpl<long> const&) -1220 ???:torch::autograd::generated::ViewBackward::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&) -1520 ???:at::_ops::view::call(at::Tensor const&, c10::ArrayRef<long>) -1580 ???:torch::ADInplaceOrView::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) -1680 ???:at::Tensor at::native::alias_with_sizes_and_strides<c10::SmallVector<long, 5u> >(at::Tensor const&, c10::SmallVector<long, 5u> const&, c10::SmallVector<long, 5u> const&) -2560 ???:at::detail::computeStride(c10::ArrayRef<long>, c10::ArrayRef<long>, c10::SmallVector<long, 5u> const&) -2640 ???:at::native::view(at::Tensor const&, c10::ArrayRef<long>) -4860 ???:torch::autograd::VariableType::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) Total: 257145 ``` ``` [<torch.utils.benchmark.utils.common.Measurement object at 0x7f93176a0160> x.view(-1); setup: at::Tensor x=torch::empty({2,2}); Median: 570.55 ns IQR: 32.69 ns (552.87 to 585.56) 874 measurements, 10000 runs per measurement, 1 thread] [<torch.utils.benchmark.utils.common.Measurement object at 0x7f92f8f29490> x.reshape(-1); setup: at::Tensor x=torch::empty({2,2}); Median: 576.49 ns IQR: 37.95 ns (559.51 to 597.46) 861 measurements, 10000 runs per measurement, 1 thread] 56896 58556 <torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts object at 0x7f932556ca60> 2140 ???:at::native::reshape(at::Tensor const&, c10::ArrayRef<long>) 1940 ???:torch::autograd::VariableType::(anonymous namespace)::as_strided(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>) 1880 ???:torch::ADInplaceOrView::(anonymous namespace)::as_strided(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>) 1720 ???:at::_ops::as_strided::call(at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>) 1520 ???:at::_ops::reshape::call(at::Tensor const&, c10::ArrayRef<long>) 1400 ???:at::native::as_strided_tensorimpl(at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>) 1260 ???:at::_ops::as_strided::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>)'2 1260 ???:at::_ops::as_strided::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<long>) 980 ???:void at::infer_size_impl<c10::SmallVector<long, 5u> >(c10::ArrayRef<long>, long, c10::SmallVector<long, 5u>&) ... -620 ???:at::Tensor c10::Dispatcher::redispatch<at::Tensor, at::Tensor const&, c10::ArrayRef<long ... ::ArrayRef<long>)> const&, c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) const -780 ???:at::_ops::view::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)'2 -780 ???:at::_ops::view::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) -920 ???:c10::SmallVectorImpl<long>::operator=(c10::SmallVectorImpl<long> const&) -1520 ???:at::_ops::view::call(at::Tensor const&, c10::ArrayRef<long>) -1580 ???:torch::ADInplaceOrView::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) -1680 ???:at::Tensor at::native::alias_with_sizes_and_strides<c10::SmallVector<long, 5u> >(at::Tensor const&, c10::SmallVector<long, 5u> const&, c10::SmallVector<long, 5u> const&) -1740 ???:torch::autograd::VariableType::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) -2640 ???:at::native::view(at::Tensor const&, c10::ArrayRef<long>) Total: 1660 ``` </details> ### Using custom function (`_reshape_alias`) <details> <summary>Benchmark results</summary> ``` [<torch.utils.benchmark.utils.common.Measurement object at 0x7f16861d6b50> x.pow(4).view(-1).mean().backward(); setup: at::Tensor x=torch::empty({2,2}, torch::requires_grad(true)); Median: 53.50 us IQR: 2.64 us (52.32 to 54.96) 906 measurements, 100 runs per measurement, 1 thread] [<torch.utils.benchmark.utils.common.Measurement object at 0x7f1667b2ed60> x.pow(4).reshape(-1).mean().backward(); setup: at::Tensor x=torch::empty({2,2}, torch::requires_grad(true)); Median: 53.13 us IQR: 3.40 us (51.72 to 55.13) 914 measurements, 100 runs per measurement, 1 thread] 2269736 2273236 <torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts object at 0x7f1693f8dc10> 5060 ???:torch::autograd::VariableType::(anonymous namespace)::_reshape_alias(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>) 2000 ???:at::native::reshape(at::Tensor const&, c10::ArrayRef<long>) 1780 ???:torch::ADInplaceOrView::(anonymous namespace)::_reshape_alias(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>) 1660 ???:at::_ops::_reshape_alias::call(at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>) 1600 ???:at::Tensor at::native::alias_with_sizes_and_strides<c10::ArrayRef<long> >(at::Tensor const&, c10::ArrayRef<long> const&, c10::ArrayRef<long> const&) 1520 ???:at::_ops::reshape::call(at::Tensor const&, c10::ArrayRef<long>) 1240 ???:at::_ops::_reshape_alias::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>)'2 1240 ???:at::_ops::_reshape_alias::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>) 1220 ???:torch::autograd::generated::AliasToShapeBackward::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&) ... -780 ???:at::_ops::view::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)'2 -780 ???:at::_ops::view::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) -920 ???:c10::SmallVectorImpl<long>::operator=(c10::SmallVectorImpl<long> const&) -1220 ???:torch::autograd::generated::ViewBackward::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&) -1520 ???:at::_ops::view::call(at::Tensor const&, c10::ArrayRef<long>) -1580 ???:torch::ADInplaceOrView::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) -1680 ???:at::Tensor at::native::alias_with_sizes_and_strides<c10::SmallVector<long, 5u> >(at::Tensor const&, c10::SmallVector<long, 5u> const&, c10::SmallVector<long, 5u> const&) -2640 ???:at::native::view(at::Tensor const&, c10::ArrayRef<long>) -4860 ???:torch::autograd::VariableType::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) Total: 3500 ``` ``` [<torch.utils.benchmark.utils.common.Measurement object at 0x7f5287adfb20> x.view(-1); setup: at::Tensor x=torch::empty({2,2}); Median: 505.10 ns IQR: 20.04 ns (500.41 to 520.45) 944 measurements, 10000 runs per measurement, 1 thread] [<torch.utils.benchmark.utils.common.Measurement object at 0x7f526951b430> x.reshape(-1); setup: at::Tensor x=torch::empty({2,2}); Median: 536.01 ns IQR: 17.81 ns (531.34 to 549.16) 916 measurements, 10000 runs per measurement, 1 thread] 56896 60376 <torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts object at 0x7f5295896c10> 2000 ???:at::native::reshape(at::Tensor const&, c10::ArrayRef<long>) 1860 ???:torch::autograd::VariableType::(anonymous namespace)::_reshape_alias(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>) 1780 ???:torch::ADInplaceOrView::(anonymous namespace)::_reshape_alias(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>) 1660 ???:at::_ops::_reshape_alias::call(at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>) 1600 ???:at::Tensor at::native::alias_with_sizes_and_strides<c10::ArrayRef<long> >(at::Tensor const&, c10::ArrayRef<long> const&, c10::ArrayRef<long> const&) 1520 ???:at::_ops::reshape::call(at::Tensor const&, c10::ArrayRef<long>) 1240 ???:at::_ops::_reshape_alias::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>)'2 1240 ???:at::_ops::_reshape_alias::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>) 980 ???:void at::infer_size_impl<c10::SmallVector<long, 5u> >(c10::ArrayRef<long>, long, c10::SmallVector<long, 5u>&) ... -620 ???:at::Tensor c10::Dispatcher::redispatch<at::Tensor, at::Tensor const&, c10::ArrayRef<long ... ::ArrayRef<long>)> const&, c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) const -780 ???:at::_ops::view::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>)'2 -780 ???:at::_ops::view::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) -920 ???:c10::SmallVectorImpl<long>::operator=(c10::SmallVectorImpl<long> const&) -1520 ???:at::_ops::view::call(at::Tensor const&, c10::ArrayRef<long>) -1580 ???:torch::ADInplaceOrView::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) -1680 ???:at::Tensor at::native::alias_with_sizes_and_strides<c10::SmallVector<long, 5u> >(at::Tensor const&, c10::SmallVector<long, 5u> const&, c10::SmallVector<long, 5u> const&) -1740 ???:torch::autograd::VariableType::(anonymous namespace)::view(c10::DispatchKeySet, at::Tensor const&, c10::ArrayRef<long>) -2640 ???:at::native::view(at::Tensor const&, c10::ArrayRef<long>) Total: 3480 ``` </details> Test Plan: Imported from OSS Reviewed By: ejguan Differential Revision: D29792126 Pulled By: laurencer fbshipit-source-id: f0519b45b65f868aa3e8651679354558bd761dfd	2021-07-21 14:05:35 -07:00
albanD	d03ff1a17d	pre compute regex and match simple signature autograd codegen 15s -> 12s (#59852 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59852 This whole stack does not change anything to the codegened code Test Plan: Imported from OSS Reviewed By: ezyang Differential Revision: D29063814 Pulled By: albanD fbshipit-source-id: a751047526f8d58f4760ee6f9ae906675bed5d75	2021-06-12 06:58:36 -07:00
albanD	30a18fe318	refactor yaml loader import, no runtime change (#59850 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59850 This whole stack does not change anything to the codegened code Test Plan: Imported from OSS Reviewed By: ailzhang Differential Revision: D29063816 Pulled By: albanD fbshipit-source-id: ca3067443d8e6282c1077d3dafa3b4f330d43b28	2021-06-12 06:58:34 -07:00
albanD	7143a6a189	Avoid unnecessary re-computation autograd codegen 21s -> 15s (#59847 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59847 This whole stack does not change anything to the codegened code Test Plan: Imported from OSS Reviewed By: ailzhang Differential Revision: D29063817 Pulled By: albanD fbshipit-source-id: 284c3e057029b7a67f43a1b034bb30863bd68c71	2021-06-12 06:57:19 -07:00
Kshiteej K	c90260905f	[fix] torch.{lin, log}space(): properly examine passed dtype (#53685 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/53171 Pull Request resolved: https://github.com/pytorch/pytorch/pull/53685 Reviewed By: jbschlosser Differential Revision: D28331863 Pulled By: anjali411 fbshipit-source-id: e89359b607d058158cfa1c9a82389d9a4a71185b	2021-06-10 11:59:54 -07:00
Jeffrey Wan	f52e202840	Add warning when accessing Tensor::grad() in the C++ API (#59362 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/35379 - Adds `retains_grad` attribute backed by cpp as a native function. The python bindings for the function are skipped to be consistent with `is_leaf`. - Tried writing it without native function, but the jit test `test_tensor_properties` seems to require that it be a native function (or alternatively maybe it could also work if we manually add a prim implementation?). - Python API now uses `retain_grad` implementation from cpp Pull Request resolved: https://github.com/pytorch/pytorch/pull/59362 Reviewed By: jbschlosser Differential Revision: D28969298 Pulled By: soulitzer fbshipit-source-id: 335f2be50b9fb870cd35dc72f7dadd6c8666cc02	2021-06-08 19:43:21 -07:00
Brian Hirsh	9354a68e7d	[codegen] split out backend-specific information from NativeFunction in the model (#57361 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57361 Data model change in the codegen, which splits backend-specific information out of `NativeFunction` ### Overview Currently in the codegen, native_functions.yaml has backend-specific information about each operator that is encoded directly into the data model, in the `NativeFunction` object. That's reasonable, since the native_functions.yaml is the source of truth for information about an operator, and the data model encodes that information into types. Now that external backends can use the codegen though, that information is technically incomplete/inaccurate. In another PR, I tried patching the information on the `NativeFunction` object with the additional external information, by updating the `dispatch` entry to contain the external backend kernel name and dispatch key. Instead, this PR tries to split out that information. The `NativeFunction` class contains all information about an operator from native_functions.yaml that's backend-independent and is known never to change regardless of what extra information backends provide. We also build up a backend "index", which is basically a mapping from [backend] -> [backend-specific-metadata]. Reading in an external backend yaml just involves updating that index with the new backend. There were a few places where `NativeFunction` used the dispatch table directly, that I encoded as properties directly on the NativeFunction object (e.g. `is_abstract`). They were mostly around whether or not the operator has a composite kernel, which isn't something that's going to change for any external backends. This has a few advantages: - We can more easily re-use the existing logic in `native_function.py` and `register_dispatch_key.py` for both native and external backends, since they both involve a NativeFunction + a particular backend index - The data in the data model will be the same regardless of how the codegen is run. Running the codegen with a new external backend doesn't change the data inside of NativeFunction or an existing backend index. It just adds a new index for that backend. - There are several of codegen areas that don't care about backend-specific information: mostly the tracing and autograd codegen. We can reason about the codegen there more easily, knowing that backend-specific info is entirely uninvolved. An alternative to this split would be to augment the NativeFunction objects with external backend information at the time that we create them. So the external codegen could read both native_functions.yaml and the external backend's yaml at the same time, and construct a NativeObject with a full dispatch table (including the XLA entry), and the correct setting of structured (taking into account both yamls). One disadvantage to this approach is that NativeFunction objects now contain different stuff depending on how you ran the codegen, and you have to make sure that any changes to the codegen can properly handle all the different variants. ### Data Model Changes Removed 3 classes, which are used by the external codegen: - ExternalBackendFunction - ExternalBackendFunctionsGroup - ExternalBackendMetadata And added two new ones: - BackendIndex - BackendMetadata `BackendIndex` contains any info that's specific to that backend, plus a mapping from operator names to backend specific metadata about the operator. One example of backend-specific info that's not operator-dependent is the fact that XLA prefers to implement functional kernels instead of out kernels (and so when they eventually mark an op as structured, they're going to mark the functional op and not the out op). `BackendMetadata` contains info specific to an (operator, backend) pair. Right now, that's just (a) the name of the kernel, and (b) whether or not that operator is structured. ### Questions I wanted to get this PR up earlier so I could get feedback, but there are a few things I want to call out: Dealing with `structured`. This PR separates out the notion of `structured` into two bits of information: - Does [operator] have a meta() function. This is backend-agnostic, and is represented by the `structured` property on `NativeFunction`, same as before. This is used, e.g., to decide what signatures to add to `MetaFunctions.h`. - Does [operator, backend] have an impl() function. This is backend dependent; even though technically all in-tree backends are forced to write impl() functions for an operator when we port the op to structured in native_functions.yaml, out-of-tree backends can decide to opt in independently. This is represented as a property on `BackendMetadata`. This is used in most other cases, e.g. in `RegisterDispatchKey` when we're deciding whether or not to gen a structured or unstructured wrapper. I also baked `is_structured_dispatch_key` directly into each BackendIndex. So for operators marked "structured" in native_functions.yaml, their corresponding CPU/CUDA BackendIndex entries will be marked structured, and all others (except for potentially external backends) will not. I ended up trying to deal with `structured` in this change since it's technically backend dependent (XLA can opt kernels into structured separately from in-tree ops), but that may have been too ambitious: it's technically not relevant until we actually add support for structured external kernels. If it's not clear that this is the right path for dealing with structured and we want to push that off, I'm fine with backing out the bits of this PR that make `structured` backend-dependent. I don't see anything too controversial related to structured in the change, but I tried to call out any areas in the comments Localizing the fact that external backends follow Dispatcher convention. Another thing that's sort of backend specific that I didn't totally address in this PR is the fact the fact that in-tree backends follow the Native API while external backends follow the Dispatcher API. I painted over that in `native_functions.py` by adding a helper, `kernel_signature`, that takes in a native function and gives you the "correct" signature for the specified backend- NativeSignature for in-tree backends, and DispatcherSignature for out-of-tree backends. In order to make that fully useable though, we'll need `NativeSignature` and `DispatcherSignature` to have matching interfaces. I didn't bother with that in this PR, which is why `gen_external_aten_fallbacks.py` still has a bunch of direct references to the dispatcher API. Thinking of adding it in a later PR but wanted to see if anyone has other opinions. Maybe `is_external()` shouldn't even be a property on the BackendMetadata, and anything the codegen does that requires asking for that information should just be better abstracted away. Thoughts on the `BackendIndex` / `BackendMetadata` breakdown. One thing that's annoying right now is that to query for various pieces of metadata, you call helper functions like `backend_index.structured(f)`, which queries that particular backend and tells you if that specific NativeFunctionGroup is structured for that backend. It has to return an `Optional[bool]` though, since you have to handle the case where that operator doesn't have a kernel for that backend at all. So users of those helpers end up with a bunch of optionals that they need to unpack, even if they know at some point that the result isn't None. I think it would be easier instead to just store the NativeFunction object as a field directly on the BackendMetadata. Curious if there are any other opinions on a better way to model it though. Test Plan: Imported from OSS Reviewed By: navahgar Differential Revision: D28474362 Pulled By: bdhirsh fbshipit-source-id: 41a00821acf172467d764cb41e771e096542f661	2021-05-17 12:25:35 -07:00
Ilqar Ramazanli	8b816e9010	To implement gradient for Pytorch (#54617 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/56129 Pull Request resolved: https://github.com/pytorch/pytorch/pull/54617 Reviewed By: anjali411 Differential Revision: D28057452 Pulled By: iramazanli fbshipit-source-id: 9bd86679282d34f5e5393e6447121586517eb4f0	2021-05-11 18:52:20 -07:00
Ilqar Ramazanli	15975cf6a6	To add priority of int/int? over int[] on signature matching and adding {h,v,d}split methods (#57346 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/54555 It has been discussed in the issue https://github.com/pytorch/pytorch/issues/54555 that {h,v,d}split methods unexpectedly matches argument of single int[] when it is expected to match single argument of int. The same unexpected behavior can happen in other functions/methods which can take both int[] and int? as single argument signatures. In this PR we solve this problem by giving higher priority to int/int? arguments over int[] while sorting signatures. We also add methods of {h,v,d}split methods here, which helped us to discover this unexpected behavior. Pull Request resolved: https://github.com/pytorch/pytorch/pull/57346 Reviewed By: ezyang Differential Revision: D28121234 Pulled By: iramazanli fbshipit-source-id: 851cf40b370707be89298177b51ceb4527f4b2d6	2021-05-03 18:52:41 -07:00
Sam Estep	75024e228c	Add lint for unqualified `type: ignore` (#56290 ) Summary: The other half of https://github.com/pytorch/pytorch/issues/56272. Pull Request resolved: https://github.com/pytorch/pytorch/pull/56290 Test Plan: CI should pass on the tip of this PR, and we know that the lint works because the following CI runs (before this PR was finished) failed: - https://github.com/pytorch/pytorch/runs/2384511062 - https://github.com/pytorch/pytorch/actions/runs/765036024 Reviewed By: seemethere Differential Revision: D27867219 Pulled By: samestep fbshipit-source-id: e648f07b6822867e70833e23ddafe7fb7eaca235	2021-04-21 08:07:23 -07:00
Edward Yang	6ec71ed4f9	Replace all direct cdata access with THPVariable_Unpack (#55799 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55799 I'm going to change the implementation of cdata soon so I need to abstract over cdata access with a function. Additionally, many users are casting manually casting to THPVariable to access the member so I can remove these unsafe casts in the client code (the implementation, of course, is still doing an unsafe cast.) Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D27712130 Pulled By: ezyang fbshipit-source-id: 95fcc013bf3913d67f2c634068eb5b3aab144cb3	2021-04-15 08:57:04 -07:00
Sam Estep	4753100a3b	Un-ignore F403 in .flake8 (#55838 ) Summary: Generally wildcard imports are bad for the reasons described here: https://www.flake8rules.com/rules/F403.html This PR replaces wildcard imports with an explicit list of imported items where possible, and adds a `# noqa: F403` comment in the other cases (mostly re-exports in `__init__.py` files). This is a prerequisite for https://github.com/pytorch/pytorch/issues/55816, because currently [`tools/codegen/dest/register_dispatch_key.py` simply fails if you sort its imports](https://github.com/pytorch/pytorch/actions/runs/742505908). Pull Request resolved: https://github.com/pytorch/pytorch/pull/55838 Test Plan: CI. You can also run `flake8` locally. Reviewed By: jbschlosser Differential Revision: D27724232 Pulled By: samestep fbshipit-source-id: 269fb09cb4168f8a51fd65bfaacc6cda7fb87c34	2021-04-13 09:24:07 -07:00
Sameer Deshmukh	5fb1142702	Add CSR (compressed sparse row) layout for sparse tensors (#50937 ) Summary: Implement compressed sparse row format. Derived from the GCS implementation at https://github.com/pytorch/pytorch/pull/44190 Pull Request resolved: https://github.com/pytorch/pytorch/pull/50937 Reviewed By: mrshenli Differential Revision: D27439865 Pulled By: ezyang fbshipit-source-id: 3ba3dcb9679505b980ff6a5f513e913bbae2fb1d	2021-04-12 10:09:12 -07:00
lezcano	5870346173	Port index_copy from TH to ATen (#52203 ) Summary: The design of the `TensorIterator` was similar to that in https://github.com/pytorch/pytorch/pull/50578 Resolves https://github.com/pytorch/pytorch/issues/24670 Resolves https://github.com/pytorch/pytorch/issues/24523 Timings: <details> <summary>Script</summary> ```python from IPython import get_ipython import torch torch.manual_seed(13) torch.set_num_threads(1) ipython = get_ipython() cpu = torch.device('cpu') cuda = torch.device('cuda') def run_test(ndims, size, index_len, device): print(f"ndims: {ndims}, tensor_size: {size}, index_len: {index_len}, device: {device}") x = torch.rand(([size] ndims), device=device) index = torch.randint(size, (index_len,), dtype=torch.long, device=device) for d in range(ndims): shape_t = [size] * d + [index_len] + [size] * (ndims - d - 1) t = torch.rand(*shape_t, device=device) command = "x.index_copy(d, index, t)" if device == cuda: command = command + "; torch.cuda.synchronize()" ipython.magic(f"timeit {command}") print() run_test(3, 700, 10, cpu) run_test(3, 700, 100, cpu) run_test(3, 700, 700, cpu) run_test(2, 10000, 10000, cpu) run_test(3, 700, 10, cuda) run_test(3, 700, 100, cuda) run_test(3, 700, 700, cuda) run_test(2, 10000, 10000, cuda) ``` </details> <details> <summary>CPU ATen</summary> ``` ndims: 3, tensor_size: 700, index_len: 10, device: cpu 327 ms ± 309 µs per loop (mean ± std. dev. of 7 runs, 1 loop each) 329 ms ± 456 µs per loop (mean ± std. dev. of 7 runs, 1 loop each) 378 ms ± 1.44 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) ndims: 3, tensor_size: 700, index_len: 100, device: cpu 348 ms ± 1.52 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 359 ms ± 330 µs per loop (mean ± std. dev. of 7 runs, 1 loop each) 526 ms ± 686 µs per loop (mean ± std. dev. of 7 runs, 1 loop each) ndims: 3, tensor_size: 700, index_len: 700, device: cpu 560 ms ± 19 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 552 ms ± 2.61 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 932 ms ± 2.52 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) ndims: 2, tensor_size: 10000, index_len: 10000, device: cpu 163 ms ± 5.05 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) 302 ms ± 5.75 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) ``` </details> <details> <summary>CUDA ATen</summary> ``` ndims: 3, tensor_size: 700, index_len: 10, device: cuda 9.63 ms ± 441 ns per loop (mean ± std. dev. of 7 runs, 100 loops each) 9.65 ms ± 230 ns per loop (mean ± std. dev. of 7 runs, 100 loops each) 12.4 ms ± 881 ns per loop (mean ± std. dev. of 7 runs, 100 loops each) ndims: 3, tensor_size: 700, index_len: 100, device: cuda 10.8 ms ± 1.51 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 11 ms ± 417 ns per loop (mean ± std. dev. of 7 runs, 100 loops each) 21.2 ms ± 18.2 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) ndims: 3, tensor_size: 700, index_len: 700, device: cuda 19 ms ± 4.42 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 17.8 ms ± 493 ns per loop (mean ± std. dev. of 7 runs, 100 loops each) 25.8 ms ± 1.22 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) ndims: 2, tensor_size: 10000, index_len: 10000, device: cuda 5.59 ms ± 109 ns per loop (mean ± std. dev. of 7 runs, 100 loops each) 10 ms ± 25.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) ``` </details> <details> <summary>CPU TH</summary> ``` ndims: 3, tensor_size: 700, index_len: 10, device: cpu 333 ms ± 2.42 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 327 ms ± 1.04 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 366 ms ± 753 µs per loop (mean ± std. dev. of 7 runs, 1 loop each) ndims: 3, tensor_size: 700, index_len: 100, device: cpu 336 ms ± 1.24 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 345 ms ± 914 µs per loop (mean ± std. dev. of 7 runs, 1 loop each) 884 ms ± 4.32 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) ndims: 3, tensor_size: 700, index_len: 700, device: cpu 441 ms ± 3.58 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 514 ms ± 1.17 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 7.46 s ± 6.46 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) ndims: 2, tensor_size: 10000, index_len: 10000, device: cpu 141 ms ± 233 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) 1.13 s ± 855 µs per loop (mean ± std. dev. of 7 runs, 1 loop each) ``` </details> <details> <summary>CUDA TH</summary> ``` ndims: 3, tensor_size: 700, index_len: 10, device: cuda 9.64 ms ± 390 ns per loop (mean ± std. dev. of 7 runs, 100 loops each) 9.68 ms ± 3.26 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 13.9 ms ± 928 ns per loop (mean ± std. dev. of 7 runs, 100 loops each) ndims: 3, tensor_size: 700, index_len: 100, device: cuda 11.6 ms ± 1.38 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 12.1 ms ± 3.72 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 30.3 ms ± 27.2 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) ndims: 3, tensor_size: 700, index_len: 700, device: cuda 27.2 ms ± 19.8 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) 30.6 ms ± 43.6 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) 146 ms ± 204 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) ndims: 2, tensor_size: 10000, index_len: 10000, device: cuda 6.5 ms ± 3.99 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 64.7 ms ± 55.5 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) ``` </details> According to these we see a slight performance improvement across both CPU and GPU. cc: nikitaved Pull Request resolved: https://github.com/pytorch/pytorch/pull/52203 Reviewed By: jbschlosser Differential Revision: D27066572 Pulled By: mruberry fbshipit-source-id: 6101e461cf731afa3db042a383b723d3d6bfdc26	2021-03-22 22:36:35 -07:00
kedejesu	53d8778b4d	Update clang-format linux hash and yaml import calls (#53932 ) Summary: Fixing Bandit security issues. - yaml_load: Use of unsafe yaml load. Allows instantiation of arbitrary objects. Consider yaml.safe_load(). Test ID: B506 Severity: MEDIUM Confidence: HIGH File: ./caffe2/contrib/aten/gen_op.py More info: https://bandit.readthedocs.io/en/latest/plugins/b506_yaml_load.html 235 if __name__ == '__main__': 236 decls = yaml.load(read(os.path.join(args.yaml_dir, 'Declarations.yaml')), Loader=Loader) 237 factory_methods = find_factory_methods(decls) - Blacklist: Use of insecure MD2 (`6149a26adb`), MD4 (`fc7f026980`), MD5 (`7ea9d9af4e`), or SHA1 hash function. Test ID: B303 Severity: MEDIUM Confidence: HIGH File: ./tools/clang_format_utils.py More info: https://bandit.readthedocs.io/en/latest/blacklists/blacklist_calls.html#b303-md5 36 37 hash = hashlib.sha1() 38 Pull Request resolved: https://github.com/pytorch/pytorch/pull/53932 Reviewed By: jbschlosser Differential Revision: D27072017 Pulled By: malfet fbshipit-source-id: 2fef0119388797aee3cacdc880fc345bd2ba68ce	2021-03-18 17:11:58 -07:00
Ailing Zhang	9f75de278f	Move common autograd utils functions from gen_variable_type.py to api/autograd.py. (#53340 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/53340 Test Plan: Imported from OSS Reviewed By: nikithamalgifb Differential Revision: D26973914 Pulled By: ailzhang fbshipit-source-id: 8367a08b27b25808782c77aadc3c67d07c354957	2021-03-11 19:58:45 -08:00
kshitij12345	c4c77e2001	[special] add `torch.special` namespace (#52296 ) Summary: Reference: https://github.com/pytorch/pytorch/issues/50345 * Add `torch.special` namespace * Add `torch.special.gammaln` (alias to `torch.lgamma`) TODO: * Add proper entries for docs. * [x] Add .rst file entry * [x] Add documentation * [x] Update `lgamma` OpInfo entry for alias to `special.gammaln`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/52296 Reviewed By: ngimel Differential Revision: D26754890 Pulled By: mruberry fbshipit-source-id: 73479f68989d6443ad07b7b02763fa98973c15f6	2021-03-04 00:04:36 -08:00
Peter Bell	70d0aab7bd	De-prioritise Dimname and DimnameList in python overload resolution (#51350 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51350 `None` being a valid `Dimname` is awkward for optional `dim` arguments, as found on NumPy's reduction functions like `std` and `var`. In these cases `dim=None` should mean an all-reduction, but instead you get an error "Please look up dimensions by name". I've also had to fix `FunctionParameter::check` to actually check the first element of `INT_LIST` arguments and reject non-int types. Otherwise, the dim names end up calling the `int[]` overload and fail. Test Plan: Imported from OSS Reviewed By: ngimel Differential Revision: D26756208 Pulled By: mruberry fbshipit-source-id: 44221ca0f4822ec2c1f62b092466fd4f779eb45a	2021-03-02 23:07:08 -08:00
Vasiliy Kuznetsov	33afb5f19f	fake_quant cachemask: remove Python bindings (#51878 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51878 `fake_quantize_per_tensor_affine_cachemask` and `fake_quantize_per_channel_affine_cachemask` are implementation details of `fake_quantize_per_tensor_affine` and `fake_quantize_per_channel_affine`, removing the Python bindings for them since there is no need to expose them. Test Plan: ``` python test/test_quantization.py TestFakeQuantize ``` Imported from OSS Reviewed By: albanD, bugra Differential Revision: D26314173 fbshipit-source-id: 733c93a3951453e739b6ed46b72fbad2244f6e97	2021-02-09 23:27:53 -08:00
Edward Yang	93c4f9f972	Split out RegisterDispatchKey to its own file (#51508 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51508 No substantive changes. The codegen for this file was getting a bit long so I moved it off into tools.codegen.dest submodule (I wanted to do tools.codegen.gen but that conflicts with the existing module; oy vey!) To do this I had to move some other functions around so that they were more generally accessible. Otherwise self-explanatory. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: ljk53 Differential Revision: D26187856 Pulled By: ezyang fbshipit-source-id: fd3784571d03d01c4acb7ca589fcde4492526408	2021-02-04 09:19:32 -08:00
Edward Yang	8e20594b38	Construct CppSignatureGroup from NativeFunction (#49245 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49245 This will make it easier to implement the POC in `d534f7d4c5` see also https://github.com/pytorch/pytorch/pull/45666 Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: smessmer Differential Revision: D25594005 Pulled By: ezyang fbshipit-source-id: e458d3dc3a765ec77425761b9b17f23769cecf9e	2021-01-04 11:55:28 -08:00
albanD	c23808d8e8	Reland: Add base forward grad logic (#49734 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49734 RFC: https://github.com/pytorch/rfcs/pull/11 This PR add the basic logic to handle forward grad as dual Tensors. It contains the following: - Mechanism to save dual state on a Tensor and clear it up when the dual level ends - C++ and python user facing API - Updated view system that is able to track both forward and backward views The current PR has the following limitations: - Extensive tests are in the next PR in the stack as formulas are needed to write full tests. - Only the manual formulas have been audited and no other formula is actually implemented here (they are in the next PR in the stack) - Only level 0 is allowed for now. This was discussed and agreed that it is not needed for the first version of this PR. - We can save one ViewInfo creation when both the forward and backward views have the same base. This can be done by adding a boolean flag to the DifferentiableViewMeta and extra logic in the `as_view` method. This is left out to keep this PR concise. - We can skip tracking forward views if the base has a forward grad. This can be done by adding extra logic in the `as_view` method. This is left out to keep this PR concise. Reading guide: - Updated view handling in [gen_variable_type.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-f6553cec68caeaea36f6c8b14ff76a6d39dfd774e0ea9ef2f76e8d81fd9af5df), [VariableTypeUtils.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-ec71cfa45954dece1236c661d170e6341879c5be637f4abf52e826d61b40695a), [variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-60e3bfe444e89efc7149f25b38e472710525984789934ab83f1bd5671b8ff285) (skip code below "[Forward Grad View]" for now), [variable.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-1604bcd0e4350ed99ec45e437cee7ac9ebe337392c9ea16a236247aeeb35b02bR266-R542) and [custom_function.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-dd85f452082b5bb6612bbc12adb496f8827defa228509f7b493de1d517522d5d). This introduces the new ViewInfo to hold view informations shared for forward and backward. It also updates the differentiable view meta to use this. And it updates the as_view function to handle both forward and backward view. - New forward grad class that handle storing gradients and tracking at each level [forward_grad.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-c6c5b9ab2d7e5dde4102495faa1b6bbbfc23aa3e47deb7359c0bfe1eb004c0cb), [forward_grad.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-de2ab54ade7312701850d71a119a4f4ee4b9fc5a9c42a467cdd4e73c033531dd) and [build_variables.bzl](https://github.com/pytorch/pytorch/pull/49097/files#diff-dfdfa2efb17beddfd9094524f95351fd197db6c8857e96b436fb599870359325). EDIT: These files also contain the new flag to globally disable forward AD that allows us to reduce performance issues while this is in development. - Lowest level API and binding between Tensor and AutogradMeta in [TensorBody.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-7554853205392fa743357bf845ecc350a974ec049383248c12daaf2f4de04911), [TensorImpl.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-052bd9150ef8e09289ddf644b5a6830ede49207201cd41728f6d7cc6d9cead94), [TensorImpl.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-a15aae4cf23da44970db7cece62ff981265575c798c62f7b52d87c8809dfe2e1) and the rest of [variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-60e3bfe444e89efc7149f25b38e472710525984789934ab83f1bd5671b8ff285R557-R677) - API to access the forward primal that needs to be a differentiable function (and so in native_functions.yaml) [native_functions.yaml](https://github.com/pytorch/pytorch/pull/49097/files#diff-2f3dbd85efb9b5172f2264eedd3be47dd765e6ab7cc8bf3ade5e62c28ae35991) [NamedRegistrations.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-69bd3bea510c9b64e1633fa18c3ea63d4b8348dbad3a78ad9de844ab3e43dc1d), [VariableMethodsStub.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-23f5fcb737a2b289811fe0f4b65aef775e7c824b2e629ecd343df51405cd434f), [derivatives.yaml](https://github.com/pytorch/pytorch/pull/49097/files#diff-e4c2f99a2404e98c3586e07425da73008f36b1bada790648a7297af141d37f8c), [gen_python_functions.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-e4c2f99a2404e98c3586e07425da73008f36b1bada790648a7297af141d37f8c), [gen_trace_type.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-54e0b976027bf8debefb959ff360b89ae93466970c843365b1b3a03806d868ce), [TraceTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-f34636741ad4a23d018e0c289bc750c3bad887b45660e1d6eaf440d234a78fbf) and [part of VariableTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-6e19a1bce8cbdba8714b6e2c794a76bc0864b64a49cfa757cb0b5afdc937d1a4R198-R243) - c++ API [autograd.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-349028fbe8291a965a7a263c323b208fe071c35c66179ee997ef84fa81aa4b1e), [autograd.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-a3fe908d67dfec16a1fcde300de68b0701bf68b88db7451f29f2bee255cf30c9) - python binding [init.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-c58a67c85191c22c9b3bb439117d8053edfd9dea839fa010cf967d404c3c630d) - python API [forward_ad.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-a4efad4ba18fffdfb264c21e5475997a24a743089a899f8ec1a5ff962c6738d9), [autograd/__init__.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-743abcafd32ad0e69f39ac5a91df4197b7e1921c135cacee7ef6dc829a8a7af8) - c++ and python printing [Formatting.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-881dba501e71662e2e4818b4b016f739b344c8aed2f5edc6b871eda47a2aced0), [_tensor_str.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-a7911f8d5e73adbff914d99fd7818ace2a7030b6a3748abe06ec6fc6e3df9cc3) - Utility for formulas and updated manual functions to respect new view system as well as forward grad [FunctionsManual.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-6378bb6dc81a64dab676d61731341fa5d1088418f32a1473a33a0ccfc2357dc1), [FunctionsManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-4adbd88239afcd60e8198aab65d4f5e43b62314e34b80551e997a1ea503adea5) [rest of VariableTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-6e19a1bce8cbdba8714b6e2c794a76bc0864b64a49cfa757cb0b5afdc937d1a4R264-R433) - Ensure SavedVariable save forward grad properly [saved_variable.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-c1b8039d776241abe177d5aa99b79dd9489a9b3e529da8ab24c2e386c1238ae2), [saved_variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-cc9fba479b5beae06b2eea2e390d17796e0341c5b037a20b5bcaccbb0c341030) Test Plan: Imported from OSS Reviewed By: gchanan Differential Revision: D25678797 Pulled By: albanD fbshipit-source-id: 3d58550c11b5f58b9b73fd30596d042b857fb9dd	2020-12-22 12:11:27 -08:00
Walter Shen	f5178bf151	Revert D25607503: Add base forward grad logic Test Plan: revert-hammer Differential Revision: D25607503 (`fdf02eff3d`) Original commit changeset: f1396290de1d fbshipit-source-id: 057206e28ff48ee288856adfe3ca577d4880789f	2020-12-21 19:56:28 -08:00
albanD	fdf02eff3d	Add base forward grad logic (#49097 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49097 RFC: https://github.com/pytorch/rfcs/pull/11 This PR add the basic logic to handle forward grad as dual Tensors. It contains the following: - Mechanism to save dual state on a Tensor and clear it up when the dual level ends - C++ and python user facing API - Updated view system that is able to track both forward and backward views The current PR has the following limitations: - Extensive tests are in the next PR in the stack as formulas are needed to write full tests. - Only the manual formulas have been audited and no other formula is actually implemented here (they are in the next PR in the stack) - Only level 0 is allowed for now. This was discussed and agreed that it is not needed for the first version of this PR. - We can save one ViewInfo creation when both the forward and backward views have the same base. This can be done by adding a boolean flag to the DifferentiableViewMeta and extra logic in the `as_view` method. This is left out to keep this PR concise. - We can skip tracking forward views if the base has a forward grad. This can be done by adding extra logic in the `as_view` method. This is left out to keep this PR concise. Reading guide: - Updated view handling in [gen_variable_type.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-f6553cec68caeaea36f6c8b14ff76a6d39dfd774e0ea9ef2f76e8d81fd9af5df), [VariableTypeUtils.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-ec71cfa45954dece1236c661d170e6341879c5be637f4abf52e826d61b40695a), [variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-60e3bfe444e89efc7149f25b38e472710525984789934ab83f1bd5671b8ff285) (skip code below "[Forward Grad View]" for now), [variable.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-1604bcd0e4350ed99ec45e437cee7ac9ebe337392c9ea16a236247aeeb35b02bR266-R542) and [custom_function.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-dd85f452082b5bb6612bbc12adb496f8827defa228509f7b493de1d517522d5d). This introduces the new ViewInfo to hold view informations shared for forward and backward. It also updates the differentiable view meta to use this. And it updates the as_view function to handle both forward and backward view. - New forward grad class that handle storing gradients and tracking at each level [forward_grad.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-c6c5b9ab2d7e5dde4102495faa1b6bbbfc23aa3e47deb7359c0bfe1eb004c0cb), [forward_grad.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-de2ab54ade7312701850d71a119a4f4ee4b9fc5a9c42a467cdd4e73c033531dd) and [build_variables.bzl](https://github.com/pytorch/pytorch/pull/49097/files#diff-dfdfa2efb17beddfd9094524f95351fd197db6c8857e96b436fb599870359325). EDIT: These files also contain the new flag to globally disable forward AD that allows us to reduce performance issues while this is in development. - Lowest level API and binding between Tensor and AutogradMeta in [TensorBody.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-7554853205392fa743357bf845ecc350a974ec049383248c12daaf2f4de04911), [TensorImpl.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-052bd9150ef8e09289ddf644b5a6830ede49207201cd41728f6d7cc6d9cead94), [TensorImpl.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-a15aae4cf23da44970db7cece62ff981265575c798c62f7b52d87c8809dfe2e1) and the rest of [variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-60e3bfe444e89efc7149f25b38e472710525984789934ab83f1bd5671b8ff285R557-R677) - API to access the forward primal that needs to be a differentiable function (and so in native_functions.yaml) [native_functions.yaml](https://github.com/pytorch/pytorch/pull/49097/files#diff-2f3dbd85efb9b5172f2264eedd3be47dd765e6ab7cc8bf3ade5e62c28ae35991) [NamedRegistrations.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-69bd3bea510c9b64e1633fa18c3ea63d4b8348dbad3a78ad9de844ab3e43dc1d), [VariableMethodsStub.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-23f5fcb737a2b289811fe0f4b65aef775e7c824b2e629ecd343df51405cd434f), [derivatives.yaml](https://github.com/pytorch/pytorch/pull/49097/files#diff-e4c2f99a2404e98c3586e07425da73008f36b1bada790648a7297af141d37f8c), [gen_python_functions.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-e4c2f99a2404e98c3586e07425da73008f36b1bada790648a7297af141d37f8c), [gen_trace_type.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-54e0b976027bf8debefb959ff360b89ae93466970c843365b1b3a03806d868ce), [TraceTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-f34636741ad4a23d018e0c289bc750c3bad887b45660e1d6eaf440d234a78fbf) and [part of VariableTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-6e19a1bce8cbdba8714b6e2c794a76bc0864b64a49cfa757cb0b5afdc937d1a4R198-R243) - c++ API [autograd.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-349028fbe8291a965a7a263c323b208fe071c35c66179ee997ef84fa81aa4b1e), [autograd.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-a3fe908d67dfec16a1fcde300de68b0701bf68b88db7451f29f2bee255cf30c9) - python binding [init.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-c58a67c85191c22c9b3bb439117d8053edfd9dea839fa010cf967d404c3c630d) - python API [forward_ad.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-a4efad4ba18fffdfb264c21e5475997a24a743089a899f8ec1a5ff962c6738d9), [autograd/__init__.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-743abcafd32ad0e69f39ac5a91df4197b7e1921c135cacee7ef6dc829a8a7af8) - c++ and python printing [Formatting.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-881dba501e71662e2e4818b4b016f739b344c8aed2f5edc6b871eda47a2aced0), [_tensor_str.py](https://github.com/pytorch/pytorch/pull/49097/files#diff-a7911f8d5e73adbff914d99fd7818ace2a7030b6a3748abe06ec6fc6e3df9cc3) - Utility for formulas and updated manual functions to respect new view system as well as forward grad [FunctionsManual.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-6378bb6dc81a64dab676d61731341fa5d1088418f32a1473a33a0ccfc2357dc1), [FunctionsManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-4adbd88239afcd60e8198aab65d4f5e43b62314e34b80551e997a1ea503adea5) [rest of VariableTypeManual.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-6e19a1bce8cbdba8714b6e2c794a76bc0864b64a49cfa757cb0b5afdc937d1a4R264-R433) - Ensure SavedVariable save forward grad properly [saved_variable.h](https://github.com/pytorch/pytorch/pull/49097/files#diff-c1b8039d776241abe177d5aa99b79dd9489a9b3e529da8ab24c2e386c1238ae2), [saved_variable.cpp](https://github.com/pytorch/pytorch/pull/49097/files#diff-cc9fba479b5beae06b2eea2e390d17796e0341c5b037a20b5bcaccbb0c341030) Test Plan: Imported from OSS Reviewed By: mrshenli Differential Revision: D25607503 Pulled By: albanD fbshipit-source-id: f1396290de1d75760f3d380c43cdd56e86fa6099	2020-12-21 14:39:43 -08:00
Sebastian Messmer	26974e6b28	Remove set_quantizer_ from native_functions.yaml (#49463 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49463 set_quantizer_ takes a ConstQuantizerPtr argument, which is neither supported by JIT nor by c10. Also, it doesn't get dispatched (CPU and CUDA have the same implementation) and it is excluded from python bindings generation. So there is no real reason why this needs to be in native_functions.yaml Removing it unblocks the migration to c10-fullness since this is an op that would have been hard to migrate. See https://fb.quip.com/QRtJAin66lPN ghstack-source-id: 118710663 Test Plan: waitforsandcastle Reviewed By: ezyang Differential Revision: D25587763 fbshipit-source-id: 8fab921f4c256c128d48d82dac731f04ec9bad92	2020-12-17 03:28:00 -08:00
Brian Hirsh	33a9b14da0	pyi codegen - removing byte-for-byte-compatibility hacks (sorting overloads) (#49056 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49056 This is another byte-for-byte compatibility hack. I'm now sorting pyi signature overloads (previously the codegen did not). Mostly put this in a separate PR just to more easily reason about the diff in the codegen output. Test Plan: Imported from OSS Reviewed By: ljk53 Differential Revision: D25410846 Pulled By: bdhirsh fbshipit-source-id: 06e5c32edbce610dd12ec7499014b41b23c646bd	2020-12-11 13:29:22 -08:00
Brian Hirsh	9920adebfd	pyi cleanup (#49054 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49054 These are some followups from the first pyi codegen PR. Still maintaining byte-for-byte compatibility in this one. - Separated `argument_str() with a pyi flag into two functions, `argument_str()` and `argument_str_pyi()` - Added a notes section for pyi at the top of `python.py` - Added a `Python Interface` section that I moved the free-standing pyi functions to Test Plan: Imported from OSS Reviewed By: ljk53 Differential Revision: D25410848 Pulled By: bdhirsh fbshipit-source-id: db83a80af900c32b5e32d67ce27767f6e7c2adfb	2020-12-11 13:27:41 -08:00
Brian Hirsh	ba6511b304	pyi codegen update - remove Declarations.yaml (#48754 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48754 The goal of this PR is to kill Declarations.yaml in the pyi codegen, in favor of native_functions + the existing python object model. High-level design Since the python signatures used by the `python_arg_parser` are “supposed” to resemble the corresponding pyi type hint signatures, I re-used the existing python object model that Jiakai defined in `tools/codegen/api/python.py`. This means that the pyi codegen now reads `native_functions.yaml`, parses it into a bunch of `PythonSignatureGroup` objects, and emits corresponding method + function variants of type-hint signatures for each one, respectively into `__init__.pyi` and `_VariableFunctions.pyi`. What makes this uglier is that pyi and the python arg parser have a number of differences in how they’re emitted. I expressed that through a `pyi` flag on the `PythonSignature` dataclass, that tells it whether or not to print itself as a pyi vs. arg_parser signature. One thing worth noting is how pyi generates signatures differently for native / deprecated op signatures. For native ops: - The pyi codegen fuses functional and out variants of each op into a single signature with an optional `out` argument. Ops without an `out` variant just get an ordinary functional signature. - Some ops that fit certain criteria also get a second “varargs” signature - basically ops with a single positional argument of type List[int]. For deprecated signatures: - Functional and out variants are not fused - they each get their own signature entry - There are no varargs signatures This is currently implemented through the `signature_str()` and `signature_str_vararg()` methods on the `PythonSignature`/`PythonSignatureDeprecated` classes. `signature_str()` knows how to print itself with/without out arguments, differently for native/deprecated ops. `signature_str_vararg()` optionally returns a vararg variant of the signature if one exists. Calling out the gap between python_arg_parser vs. pyi The two formats are notably different, so I don’t think we can expect to unify them completely. That said, I encountered a number of differences in the pyi codegen that looked wrong- I tried to call them out in the PR, to be removed later. Just as an example, looking at the `svd` signature in the python_arg_parser vs. the pyi type hint: python_arg_parser ``` Static PythonArgParser parser({ “svd(Tensor input, bool some=True, bool compute_uv=True, , TensorList[3] out=None”, }, /traceable=/true); ``` Pyi ``` def svd(input: Tensor, some: _bool=True, compute_uv: _bool=True, , out: Optional[Tensor]=None) -> namedtuple_U_S_V: … ``` The two have obvious syntactic differences that we probably don’t plan on changing: the python_arg_parser doesn’t include `def` or return types, and it includes the type hint before the variable name. But the type of `out` in pyi is probably wrong, since `svd` has multiple output params. I tried to clearly call out any instances of the pyi codegen diverging in a way that looks buggy, so we can clean it up in a later PR (see the comments for details). Another particularly ugly “bug” that I kept in to maintain byte-for-byte compatibility is the fact that the pyi codegen groups operator overloads together. It turns out that the only reason it does this (as far as I can tell) is because is tacks on an out argument to signatures that don’t have one, if ANY overloads of that op have an out variant. E.g. consider the pyi type hints generated for `nanmedian` in `_VF.pyi`: ``` overload def nanmedian(input: Tensor, , out: Optional[Tensor]=None) -> Tensor: ... overload def nanmedian(input: Tensor, dim: _int, keepdim: _bool=False, , out: Optional[Tensor]=None) -> namedtuple_values_indices: ... overload def nanmedian(input: Tensor, dim: Union[str, ellipsis, None], keepdim: _bool=False, , out: Optional[Tensor]=None) -> namedtuple_values_indices: ... ``` And the corresponding native_functions.yaml entries: ``` - func: nanmedian(Tensor self) -> Tensor - func: nanmedian.dim(Tensor self, int dim, bool keepdim=False) -> (Tensor values, Tensor indices) - func: nanmedian.dim_values(Tensor self, int dim, bool keepdim=False, , Tensor(a!) values, Tensor(b!) indices) -> (Tensor(a!) values, Tensor(b!) indices) - func: nanmedian.names_dim(Tensor self, Dimname dim, bool keepdim=False) -> (Tensor values, Tensor indices) - func: nanmedian.names_dim_values(Tensor self, Dimname dim, bool keepdim=False, , Tensor(a!) values, Tensor(b!) indices) -> (Tensor(a!) values, Tensor(b!) ``` Signature 2 corresponds to entries 2 and 3 in native_functions, and Signature 3 corresponds to entries 4 and 5. But signature 1 has an optional out argument, even though entry 1 in native_functions.yaml has no out variant. I’d like to delete that logic in a later PR- that will also have the added benefit no longer requiring to group overloads together in the pyi codegen. We can just operate independently on each PythonSignatureGroup. More detailed accounting of the changes* Per file: gen_python_functions.py - `load_signatures()` can now skip deprecated signatures. Needed because pyi only includes deprecated functions, and skips their method variants (maybe we should add them in…?) - Moved `namedtuple_fieldnames` into python.cpp - `group_overloads()` can now opt to not sort the overloads (needed for byte-for-byte compact, pyi doesn’t sort for some reason) Python.py: - Gave `PythonSignature`and `PythonSignatureDeprecated` a `pyi` flag that tells it whether or not to print itself in pyi vs. python_arg_parser format - Added a `PythonReturns` dataclass , which is now a member of PythonSignature. It is only used by pyi. I found this useful because python returns need to know how to deal with named tuple returns properly. I also moved `namedtuple_fieldnames` into this file from gen_python_functions gen_pyi.py - Merged `get_py_torch_functions` and `get_py_variable_methods` into a single function, since they’re very similar - Lifted out all of the pyi type hint type-mapping mess and dropped it into python.py. This required updating the mapping to deal with NativeFunction objects instead of the outputs of Declarations.yaml (this was most of the logic in `type_to_python`, `arg_to_type_hint`, and `generate_type_hints`). `generate_type_hints` is now a small orchestration function that gathers the different signatures for each PythonSignatureGroup. - NamedTuples are now generated by calling `PythonReturn.named_tuple()` (in `generate_named_tuples()`), rather than appending to a global list A lot of hardcoded pyi signatures still live in `gen_pyi.py`. I didn’t look to closely into whether or not any of that can be removed as part of this PR. Test Plan: Imported from OSS Reviewed By: ljk53 Differential Revision: D25343802 Pulled By: bdhirsh fbshipit-source-id: f73e99e1afef934ff41e4aca3dabf34273459a52	2020-12-07 10:39:38 -08:00
Jiakai Liu	de284b6d35	[pytorch][codegen] add autograd data model (#48249 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48249 Introduced autograd related data models at tools.codegen.api.autograd. Migrated load_derivatives.py to produce the new data models from derivatives.yaml. It has clean mypy-strict result. Changed both gen_autograd_functions.py and gen_variable_type.py to consume the new data model. Added type annotations to gen_autograd_functions.py - it has clean mypy-strict result except for the .gen_autograd import (so haven't added it to the strict config in this PR). To limit the scope of the PR, gen_variable_type.py is not refactored, and the main structure of load_derivatives.py / gen_autograd_functions.py is kept. We only make necessary changes to make it work. Confirmed byte-for-byte compatible with the old codegen: ``` Run it before and after this PR: .jenkins/pytorch/codegen-test.sh <baseline_output_dir> .jenkins/pytorch/codegen-test.sh <test_output_dir> Then run diff to compare the generated files: diff -Naur <baseline_output_dir> <test_output_dir> ``` Test Plan: Imported from OSS Reviewed By: ezyang Differential Revision: D25086561 Pulled By: ljk53 fbshipit-source-id: 1f43ab0931d9814c24683b9a48ca497c5fc3d729	2020-11-19 21:47:05 -08:00
Jiakai Liu	5eaf8562cd	[pytorch][codegen] simplify dunder method check in gen_python_functions.py (#47976 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47976 Confirmed byte-for-byte compatible with the old codegen: ``` Run it before and after this PR: .jenkins/pytorch/codegen-test.sh <baseline_output_dir> .jenkins/pytorch/codegen-test.sh <test_output_dir> Then run diff to compare the generated files: diff -Naur <baseline_output_dir> <test_output_dir> ``` Test Plan: Imported from OSS Reviewed By: ezyang Differential Revision: D24976273 Pulled By: ljk53 fbshipit-source-id: 6f8f20d18db20c3115808bfac0a8b8ad83dcf64c	2020-11-18 12:26:47 -08:00
Jiakai Liu	4ff8cd8f3a	[pytorch][codegen] gen_python_functions.py loading native_functions.yaml / deprecated.yaml directly (#47746 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47746 - Removed the integration hack in gen_python_functions.py. It now directly loads native_functions.yaml. All dependencies on Declarations.yaml have been removed / moved to elsewhere. - Rewrote the deprecated.yaml parsing logic to work with new data model directly. Confirmed byte-for-byte compatible with the old codegen: ``` Run it before and after this PR: .jenkins/pytorch/codegen-test.sh <baseline_output_dir> .jenkins/pytorch/codegen-test.sh <test_output_dir> Then run diff to compare the generated files: diff -Naur <baseline_output_dir> <test_output_dir> ``` Differential Revision: D24885067 Test Plan: Imported from OSS Reviewed By: bhosmer Pulled By: ljk53 fbshipit-source-id: 8e906b7dd36a64395087bd290f6f54596485ceb4	2020-11-14 02:27:57 -08:00
Jiakai Liu	d91cefb0d8	[pytorch][codegen] migrate gen_annotated_fn_args.py to new codegen model (#47745 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47745 This is a relatively small codegen. Reintroduced 'simple_type' to preserve old codegen output. It depends on some methods defined in gen_python_functions.py - next PR will clean up the remaining Declarations.yaml methods in gen_python_functions.py. Confirmed byte-for-byte compatible with the old codegen: ``` Run it before and after this PR: .jenkins/pytorch/codegen-test.sh <baseline_output_dir> .jenkins/pytorch/codegen-test.sh <test_output_dir> Then run diff to compare the generated files: diff -Naur <baseline_output_dir> <test_output_dir> ``` Differential Revision: D24885068 Test Plan: Imported from OSS Reviewed By: ezyang Pulled By: ljk53 fbshipit-source-id: c0fbd726bcc450c3c7fe232c23e5b31779d0b65f	2020-11-14 02:24:39 -08:00
Jiakai Liu	4159191f0e	[pytorch] split out trace type generator and migrate to new codegen model (#47438 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47438 Test Plan: Imported from OSS Reviewed By: bhosmer Differential Revision: D24808211 Pulled By: ljk53 fbshipit-source-id: 44dfadf550a255c05aa201e54b48101aaf722885	2020-11-09 12:39:39 -08:00
Jiakai Liu	16c72a5a6b	[pytorch] continue to rewrite gen_python_functions.py with typed models (#46978 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46978 Refactored and added type annotations to the most part of the file. Some top-level codegen functions are called by other codegen scripts. Will migrate them in subsequent PRs. Test Plan: Imported from OSS Reviewed By: ezyang Differential Revision: D24589210 Pulled By: ljk53 fbshipit-source-id: e0c7e5b3672b41983f321400c2e2330d1462e76e	2020-11-08 01:34:12 -08:00
Brian Hirsh	7a0f0d24d0	Codegen - error when an argument that looks like an out argument isn't a kwarg (fix #43273 ) (#47284 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47284 Test Plan: Imported from OSS Reviewed By: nikithamalgifb Differential Revision: D24706763 Pulled By: bdhirsh fbshipit-source-id: 60fbe81a0dff7e07aa8c169235d15b84151d3ed7	2020-11-03 16:30:01 -08:00
Erjia Guan	a341a4329a	Format error message for unmatched signature between _out and base functions (#47087 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47087 Fixes #33547 Test Plan: Imported from OSS Reviewed By: zou3519 Differential Revision: D24633077 Pulled By: ejguan fbshipit-source-id: d1baca84cb3bc415cced9b696103f17131e1e4c7	2020-11-03 07:36:37 -08:00
Jiakai Liu	9d23fd5c00	[pytorch] get rid of cpp_type_str from pybind codegen (#46977 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46977 Clean up a few TODOs in the new python binding codegen. Get rid of the _simple_type() hack and the uses of cpp_type_str. Now python argument type strings and PythonArgParser unpacking methods are directly generated from the original Type model. Test Plan: Imported from OSS Reviewed By: ezyang Differential Revision: D24589209 Pulled By: ljk53 fbshipit-source-id: b2a6c3911d58eae49c031d319c8ea6f804e2cfde	2020-10-28 21:25:55 -07:00
Kurt Mohler	b75b961934	Fix `requires_grad` arg for `new_full`, `new_empty`, `new_zeros` (#46486 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/36455 Pull Request resolved: https://github.com/pytorch/pytorch/pull/46486 Reviewed By: gchanan Differential Revision: D24497034 Pulled By: ezyang fbshipit-source-id: 769a7f00f9a8f7cb77273a1193173a837ae7e32f	2020-10-28 09:34:53 -07:00
Pritam Damania	2b221a9599	Remove PyCFunction casts as much as possible. (#46227 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46227 Follow up from https://github.com/pytorch/pytorch/issues/45419, in this PR I've removed as many PyCFunction casts as I could from the codebase. The only ones I didn't remove were the ones with `METH_VARARGS \| METH_KEYWORDS` which have 3 parameters instead of 2 and had to be casted. Example: ` {"copy_", (PyCFunction)(void(*)(void))THPStorage_(copy_), METH_VARARGS \| METH_KEYWORDS, nullptr},` ghstack-source-id: 114632704 Test Plan: waitforbuildbot Reviewed By: albanD Differential Revision: D24269435 fbshipit-source-id: 025cfd43a9a2a3e59f6b2951c1a78749193d77cf	2020-10-20 15:01:51 -07:00
Jiakai Liu	3d421b3137	[pytorch] rewrite of the python binding codegen with the v2 API (#46244 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46244 - What does the generated binding code do? The Python binding codegen produces code that takes the input list of PyObjects, finds the matching ATen C++ function using PythonArgParser, converts the PyObjects into C++ types and calls the ATen C++ function: ``` +--------+ parsing +------------------------+ binding +-----------------------+ \| PyObjs \| ---------> \| PythonArgParser Output \| ---------> \| Cpp Function Dispatch \| +--------+ +------------------------+ +-----------------------+ ``` - Are Python arguments 1-1 mapped to C++ arguments? Python arguments might be reordered, packed, unpacked when binding to C++ arguments, as illustrated below: ``` // Binding - Reorder & Packing // aten::empty.names(int[] size, *, Dimname[]? names, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None, MemoryFormat? memory_format=None) -> Tensor Python Args Cpp Args ----------------------------------------------------------- 0: size size 1: names names 2: memory_format -------+ 3: dtype -----+-\|--> options 4: layout / \| 5: device / +--> memory_format 6: pin_memory / 7: requires_grad -+ // Binding - Unpacking // aten::max.names_dim(Tensor self, Dimname dim, bool keepdim=False) -> (Tensor values, Tensor indices) Python Args Cpp Args ----------------------------------------------------------- +----> max /-----> max_values 0: input / self 1: dim / dim 2: keepdim / keepdim 3: out -----+ ``` - Why do we want to rewrite the python binding codegen? The old codegen takes Declarations.yaml as input. It doesn't distinguish between Python arguments and C++ arguments - they are all mixed together as a bag of non-typed dict objects. Different methods process these arg objects and add new attributes for various different purposes. It's not so obvious to figure out the semantics of these attributes. The complicated binding logic happens implicitly and scatteredly. ``` +--------------------+ \| Native Functions \| +--------------------+ \| \| v +--------------------+ \| Cpp Signatures \| +--------------------+ \| \| v +--------------------+ \| Declarations.yaml \| +--------------------+ \| +-------------------------------------+ \| +-------> \| PythonArgParser Schema \| \| \| +-------------------------------------+ \| \| . \| \| . v \| . +--------------------+ +-------------------------------------+ \| NonTyped Args Objs \| --> \| PythonArgParser -> Cpp Args Binding \| +--------------------+ +-------------------------------------+ \| . \| . \| . \| +-------------------------------------+ +-------> \| Cpp Function Dispatch \| +-------------------------------------+ ``` This PR leverages the new immutable data models introduced in the new aten codegen. It introduces dedicated data models for python schema. This way, we can not only avoid subtle Declaration.yaml conversions but also decouple the generation of python schema, python to c++ binding and c++ function call. The ultimate state will be like the following diagram: ``` +-------------------+ +-------------------------------------+ +-------> \| Python Signatures \| --> \| PythonArgParser Schema \| \| +-------------------+ +-------------------------------------+ \| \| . \| \| . \| \| . +------------------+ \| +-------------------------------------+ \| Native Functions \| +-------> \| PythonArgParser -> Cpp Args Binding \| +------------------+ \| +-------------------------------------+ \| \| . \| \| . \| \| . \| +-------------------+ +-------------------------------------+ +-------> \| Cpp Signatures \| --> \| Cpp Function Dispatch \| +-------------------+ +-------------------------------------+ ``` This PR has migrated the core binding logic from tools/autograd/gen_python_functions.py to tools/codegen/api/python.py. It produces the byte-for-byte same results (tested with #46243). Will migrate the rest of gen_python_functions.py in subsequent PRs. Test Plan: Imported from OSS Reviewed By: bhosmer Differential Revision: D24388874 Pulled By: ljk53 fbshipit-source-id: f88b6df4e917cf90d868a2bbae2d5ffb680d1841	2020-10-19 17:36:45 -07:00
chengjun	5741de883a	Define the record_stream method in native_functions.yaml (#44301 ) Summary: The record_stream method was hard coded for CUDA device. Define the record_stream in the native_functions.yaml to enable the dynamic dispatch to different end device. Fixes https://github.com/pytorch/pytorch/issues/36556 Pull Request resolved: https://github.com/pytorch/pytorch/pull/44301 Reviewed By: glaringlee Differential Revision: D23763954 Pulled By: ezyang fbshipit-source-id: e6d24f5e7892b56101fa858a6cad2abc5cdc4293	2020-10-13 09:15:22 -07:00
Peter Bell	8b39498a23	codegen: Allow string arguments to have defaults (#45665 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45665 Fixes #43944 Note that the codegen doesn't use a proper parser so, in the same way as with lists, the string `, ` cannot appear in defaults or it will be interpreted as a splitting point between arguments. Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D24141835 Pulled By: ezyang fbshipit-source-id: 578127861fd2504917f4486c44100491a2c40343	2020-10-06 21:53:56 -07:00
Supriya Rao	c112e89cc6	[quant] Make choose_qparams_optimized return Tensors to preserve dtype (#45530 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45530 Returning double values requires special handling as a return type for aten functions. Instead return tensors where the type is preserved in the tensor dtype Test Plan: python test/test_quantization.py TestQuantizedTensor.test_choose_qparams_optimized Imported from OSS Reviewed By: dskhudia Differential Revision: D24001134 fbshipit-source-id: bec6b17242f4740ab5674be06e0fc30c35eb0379	2020-09-30 11:35:23 -07:00
Iurii Zdebskyi	d5748d9a1a	Enable binary ops with Scalar Lists with for foreach APIs (#45298 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45298 Test Plan: Imported from OSS Reviewed By: ngimel Differential Revision: D23931986 Pulled By: izdeby fbshipit-source-id: 281267cd6f90d57a169af89f9f10b0f4fcab47e3	2020-09-25 12:58:34 -07:00
Xinyu Li	26001a2334	Revert D23753711: [pytorch][PR] Add foreach APIs for binary ops with ScalarList Test Plan: revert-hammer Differential Revision: D23753711 (`71d1b5b0e2`) Original commit changeset: bf3e8c54bc07 fbshipit-source-id: 192692e0d3fff4cade9983db0a1760fedfc9674c	2020-09-24 11:55:49 -07:00
iurii zdebskyi	71d1b5b0e2	Add foreach APIs for binary ops with ScalarList (#44743 ) Summary: In this PR: 1) Added binary operations with ScalarLists. 2) Fixed _foreach_div(...) bug in native_functions 3) Covered all possible cases with scalars and scalar lists in tests 4) [minor] fixed bug in native_functions by adding "use_c10_dispatcher: full" to all _foreach functions tested via unit tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/44743 Reviewed By: bwasti, malfet Differential Revision: D23753711 Pulled By: izdeby fbshipit-source-id: bf3e8c54bc07867e8f6e82b5d3d35ff8e99b5a0a	2020-09-24 08:30:42 -07:00
Supriya Rao	60665ace17	[quant] Add optimized approach to calculate qparams for qembedding_bag (#45149 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45149 The choose_qparams_optimized calculates the the optimized qparams. It uses a greedy approach to nudge the min and max and calculate the l2 norm and tries to minimize the quant error by doing `torch.norm(x-fake_quant(x,s,z))` Test Plan: Imported from OSS Reviewed By: raghuramank100 Differential Revision: D23848060 fbshipit-source-id: c6c57c9bb07664c3f1c87dd7664543e09f634aee	2020-09-23 19:00:22 -07:00
Jiakai Liu	9e5045e978	[pytorch] clean up normalized_dynamic_type() hack (#44889 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44889 This HACK doesn't seem to be necessary any more - there is no 'real' type in generated Declarations.yaml file. Verified by comparing generated code before/after. Test Plan: Imported from OSS Reviewed By: ezyang Differential Revision: D23761624 Pulled By: ljk53 fbshipit-source-id: de996f04d77eebea3fb9297dd90a8ebeb07647bb	2020-09-18 23:49:46 -07:00

1 2 3 4 5

248 Commits