Options to address the "undocumented python objects":
1. Reference the functions in the .rst via the torch.nn.modules namespace. Note that this changes the generated doc filenames / locations for most of these functions!
2. [Not an option] Monkeypatch `__module__` for these objects (broke several tests in CI due to `inspect.findsource` failing after this change)
3. Update the .rst files to also document the torch.nn.modules forms of these functions, duplicating docs.
#### [this is the docs page added](https://docs-preview.pytorch.org/pytorch/pytorch/158491/nn.aliases.html)
This PR takes option 3 by adding an rst page, nn.aliases, that documents the aliases in nested namespaces and removing all the `torch.nn.modules.*` entries from the coverage skiplist except:
- NLLLoss2d (deprecated)
- Container (deprecated)
- CrossMapLRN2d (what is this?)
- NonDynamicallyQuantizableLinear
This mostly required adding docstrings to `forward`, `extra_repr` and `reset_parameters`. Since the forward arguments are already documented in the module docstrings, I just added very basic docstrings.
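To illustrate the kind of change involved, here is a hypothetical module (not taken from the PR) showing the style of minimal docstring added to `forward` and `extra_repr`, with the argument details staying in the class docstring:
```python
from torch import Tensor, nn


class Scale(nn.Module):  # hypothetical module, for illustration only
    """Multiply the input by a fixed factor.

    Args:
        factor (float): the multiplier applied in :meth:`forward`.
    """

    def __init__(self, factor: float) -> None:
        super().__init__()
        self.factor = factor

    def forward(self, input: Tensor) -> Tensor:
        """Run the forward pass."""
        return input * self.factor

    def extra_repr(self) -> str:
        """Return the extra representation of the module."""
        return f"factor={self.factor}"
```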
Pull Request resolved: https://github.com/pytorch/pytorch/pull/158491
Approved by: https://github.com/janeyx99
Add semantics for creating a buffer object that mirror those for creating a parameter. This is done by introducing a new `Buffer` class that can be used for type disambiguation. The underlying functionality of registering a buffer remains the same, as the `register_buffer` method has not been changed. The `persistent` parameter in the `Buffer` type indicates whether a buffer object should be persistent or not. Other non-test changes have to do with getting the new `Buffer` type recognized by inductor and dynamo. The remaining changes are test changes to make sure that the `Buffer` type can be used as a drop-in replacement for `register_buffer`, as it just leads to `register_buffer` being called. This new functionality still allows normal tensors to be used as buffers, so these changes are intended to be backwards compatible.
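A rough sketch of the intended usage from the user side (assuming the class is exposed as `torch.nn.Buffer`; the module and attribute names here are illustrative):
```python
import torch
from torch import nn


class Normalizer(nn.Module):
    def __init__(self, dim: int) -> None:
        super().__init__()
        # Equivalent to: self.register_buffer("running_mean", torch.zeros(dim))
        self.running_mean = nn.Buffer(torch.zeros(dim))
        # A non-persistent buffer stays on the module but is excluded from the state_dict.
        self.num_batches = nn.Buffer(torch.zeros(()), persistent=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            self.num_batches += 1
            self.running_mean += (x.mean(dim=0) - self.running_mean) / self.num_batches
        return x - self.running_mean


m = Normalizer(4)
print(list(m.state_dict()))               # ['running_mean'] -- non-persistent buffer omitted
print([n for n, _ in m.named_buffers()])  # ['running_mean', 'num_batches']
```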
Fixes #35735
Co-authored-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125971
Approved by: https://github.com/albanD, https://github.com/anijain2305, https://github.com/mlazos
Fixes https://github.com/pytorch/pytorch/issues/123068
Fixes https://github.com/pytorch/pytorch/issues/111256
While investigating the flaky doc build failure w.r.t. the duplicated `torch.ao.quantization.quantize` docstring warning, i.e. https://github.com/pytorch/pytorch/actions/runs/8532187126/job/23376591356#step:10:1260, I discovered an old but still open bug in Sphinx: https://github.com/sphinx-doc/sphinx/issues/4459. These warnings have always been there, but they are hidden because we are using `-j auto` to build docs with multiple threads. It's just by chance that they have started to surface now.
The issue can be reproduced by removing `-j auto` from https://github.com/pytorch/pytorch/blob/main/docs/Makefile#L5 and running `make html` locally. Then these warnings show up consistently. As `make html` treats warnings as errors, they fail the build.
```
...
/data/users/huydo/conda/py3.8/lib/python3.8/site-packages/torch/ao/quantization/quantize.py:docstring of torch.ao.quantization.quantize.quantize:1: WARNING: duplicate object description of torch.ao.quantization.quantize, other instance in quantization, use :noindex: for one of them
/data/users/huydo/conda/py3.8/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py:docstring of torch.nn.parallel.data_parallel.data_parallel:1: WARNING: duplicate object description of torch.nn.parallel.data_parallel, other instance in nn, use :noindex: for one of them
/data/users/huydo/conda/py3.8/lib/python3.8/site-packages/torch/nn/utils/spectral_norm.py:docstring of torch.nn.utils.spectral_norm.spectral_norm:1: WARNING: duplicate object description of torch.nn.utils.spectral_norm, other instance in nn, use :noindex: for one of them
/data/users/huydo/conda/py3.8/lib/python3.8/site-packages/torch/nn/utils/weight_norm.py:docstring of torch.nn.utils.weight_norm.weight_norm:1: WARNING: duplicate object description of torch.nn.utils.weight_norm, other instance in nn, use :noindex: for one of them
/data/users/huydo/github/pytorch/docs/source/nn.rst:579: WARNING: duplicate object description of torch.nn.parallel.data_parallel, other instance in generated/torch.nn.functional.torch.nn.parallel.data_parallel, use :noindex: for one of them
/data/users/huydo/github/pytorch/docs/source/nn.rst:594: WARNING: duplicate object description of torch.nn.utils.spectral_norm, other instance in generated/torch.nn.utils.spectral_norm, use :noindex: for one of them
/data/users/huydo/github/pytorch/docs/source/nn.rst:595: WARNING: duplicate object description of torch.nn.utils.weight_norm, other instance in generated/torch.nn.utils.weight_norm, use :noindex: for one of them
/data/users/huydo/github/pytorch/docs/source/quantization.rst:1348: WARNING: duplicate object description of torch.ao.quantization.quantize, other instance in generated/torch.ao.quantization.quantize, use :noindex: for one of them
...
```
The fix is just to clean up those duplicated placeholder py:module docs, which were there because these modules didn't have any docs originally.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123244
Approved by: https://github.com/andrewor14, https://github.com/malfet
# Summary
Simplification of Backend Selection
This PR deprecates the `torch.backends.cuda.sdp_kernel` context manager and replaces it with a new context manager, `torch.nn.attention.sdpa_kernel`, which also changes the API.
With `sdp_kernel`, one would specify the backend choice by negating the kernels one did not want to run. The purpose of this backend manager was only to be a debugging tool: "turn off the math backend" and see if you can run one of the fused implementations.
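Roughly, the change looks like this from the user side (a sketch assuming a CUDA device for the fused kernels; enum and argument names as exposed by the new `torch.nn.attention` namespace):
```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

q, k, v = (torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16) for _ in range(3))

# Old style (deprecated): name the backends you want to turn *off*.
with torch.backends.cuda.sdp_kernel(enable_math=False, enable_mem_efficient=False):
    out = F.scaled_dot_product_attention(q, k, v)

# New style: name the backend(s) you explicitly want to run.
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v)
```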
Problems:
- This pattern makes sense if the majority of users don't care to know anything about the backends that can be run. However, if users are seeking out this context manager, then they are explicitly trying to run a specific backend.
- This is not scalable. We are working on adding the cudnn backend, and this API means that more implementations will need to be turned off if a user wants to explicitly run a given backend.
- Discoverability of the current context manager. It is somewhat unintuitive that this backend manager lives in `torch.backends.cuda` when it now also controls the CPU fused kernel behavior. I think centralizing it in the attention namespace will be helpful.
Other concerns:
- Typically, backends (kernels) for operators are entirely hidden from users as implementation details of the framework. We have exposed this to users already, albeit not by default and with beta warnings. Does making backend choices even more explicit lead to problems when we potentially want to remove existing backends (perhaps input shapes will get covered by newer backends)?
A nice side effect: now that we aren't using the `BACKEND_MAP` in test_transformers, many, many dynamo failures are passing for CPU tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114689
Approved by: https://github.com/cpuhrsch
# Summary
This PR introduces a new Tensor subclass that is designed to be used with `torch.nn.functional.scaled_dot_product_attention`. Currently we have a boolean `is_causal` flag that allows users to do causal masking without the need to actually create the "realized" attention bias and pass it into sdpa. We originally added this flag since there is native support in both fused kernels we support. This provides a big performance gain (the kernels only need to iterate over ~0.5x the sequence), and for very large sequence lengths this can provide very large memory improvements.
The flag was introduced early on in the kernel development, and at the time it implicitly meant "upper_left" causal attention. This distinction only matters when the attention_bias is not square. For a more detailed breakdown see: https://github.com/pytorch/pytorch/issues/108108. The kernels' default behavior has since changed, largely due to the rise of autoregressive text generation, and unfortunately this would lead to a BC break. In the long term it may actually be beneficial to change the default meaning of `is_causal` to represent lower_right causal masking.
The larger theme, though, is laid out here: https://github.com/pytorch/pytorch/issues/110681. The thesis is that there is a lot of innovation in SDPA revolving around the attention_bias being used. This is the first of hopefully a few more attention_biases that we would like to add. The next interesting one would be `sliding_window`, which is used by the popular Mistral model family.
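As a sketch of the intended usage (module paths as exposed under `torch.nn.attention.bias`; treat the details as illustrative rather than a spec):
```python
import torch
import torch.nn.functional as F
from torch.nn.attention.bias import causal_lower_right, causal_upper_left

q = torch.randn(2, 8, 4, 64)   # query with seq_len 4
k = torch.randn(2, 8, 12, 64)  # key/value with seq_len 12
v = torch.randn(2, 8, 12, 64)

# For non-square attention the two variants differ in where the causal diagonal sits.
upper_left = causal_upper_left(4, 12)    # matches the current is_causal=True semantics
lower_right = causal_lower_right(4, 12)  # diagonal aligned to the bottom-right corner

out = F.scaled_dot_product_attention(q, k, v, attn_mask=lower_right)
```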
Results from benchmarking: I improved the meff_attention perf, hence the slightly decreased max perf.
```Shell
+---------+--------------------+------------+-----------+-----------+-----------+-----------+----------------+----------+
| Type | Speedup | batch_size | num_heads | q_seq_len | k_seq_len | embed_dim | dtype | head_dim |
+---------+--------------------+------------+-----------+-----------+-----------+-----------+----------------+----------+
| Average | 1.2388050062214226 | | | | | | | |
| Max | 1.831672915579016 | 128 | 32 | 1024 | 2048 | 2048 | torch.bfloat16 | 64 |
| Min | 0.9430534166730135 | 1 | 16 | 256 | 416 | 2048 | torch.bfloat16 | 128 |
+---------+--------------------+------------+-----------+-----------+-----------+-----------+----------------+----------+
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/114823
Approved by: https://github.com/cpuhrsch
Add non-package python modules to the public API checks.
The original change is to remove the `ispkg` check in this line
https://github.com/pytorch/pytorch/blob/main/docs/source/conf.py#L518
Everything else is to add the appropriate modules to the rst files, make sure every module we provide can be imported (fixed by either making optional dependencies optional or just deleting files that have been un-importable for 3 years), make APIs that are both modules and functions (like torch.autograd.gradcheck) render properly on the docs website without confusion, and add every non-documented API to the allow list (~3k of them).
Next steps will be to try and fix these missing docs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110568
Approved by: https://github.com/zou3519
Add semantics for creating a buffer object that mirror those for creating a parameter. This is done by introducing a new `Buffer` class that can be used for type disambiguation. The underlying functionality of registering a buffer remains the same, as the `register_buffer` method has not been changed. The `persistent` parameter in the `Buffer` type indicates whether a buffer object should be persistent or not. Other non-test changes have to do with getting the new `Buffer` type recognized by inductor and dynamo. The remaining changes are test changes to make sure that the `Buffer` type can be used as a drop-in replacement for `register_buffer`, as it just leads to `register_buffer` being called. This new functionality still allows normal tensors to be used as buffers, so these changes are intended to be backwards compatible.
Fixes #35735
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104069
Approved by: https://github.com/mikaylagawarecki
Fixes#95796
### Implementation
Adds python implementation for `nn.ZeroPad1d` and `nn.ZeroPad3d` in `torch/nn/modules/padding.py`.
Adds cpp implementation for `nn::ZeroPad1d` and `nn::ZeroPad3d` in the following 3 files, refactored with templates similarly to `nn::ConstantPad`'s implementation:
- `torch/csrc/api/include/torch/nn/modules/padding.h`
- `torch/csrc/api/include/torch/nn/options/padding.h`
- `torch/csrc/api/src/nn/modules/padding.cpp`
Also added relevant definitions in `torch/nn/modules/__init__.py`.
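For reference, the Python modules behave like the existing `nn.ZeroPad2d`, just for 1D and 3D inputs (small usage sketch):
```python
import torch
from torch import nn

pad1d = nn.ZeroPad1d(2)                   # pad 2 zeros on both sides of the last dim
print(pad1d(torch.randn(1, 3, 5)).shape)  # torch.Size([1, 3, 9])

pad3d = nn.ZeroPad3d((1, 1, 2, 2, 0, 0))  # (left, right, top, bottom, front, back)
print(pad3d(torch.randn(1, 3, 4, 4, 4)).shape)  # torch.Size([1, 3, 4, 8, 6])
```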
### Testing
Adds the following tests:
- cpp tests of similar length and structure as `ConstantPad` and the existing `ZeroPad2d` impl in `test/cpp/api/modules.cpp`
- cpp API parity tests in `torch/testing/_internal/common_nn.py`
- module init tests in `test/test_module_init.py`
Also added relevant definitions in `test/cpp_api_parity/parity-tracker.md`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96295
Approved by: https://github.com/soulitzer
Summary:
Working towards https://docs.google.com/document/d/10yx2-4gs0gTMOimVS403MnoAWkqitS8TUHX73PN8EjE/edit?pli=1#
This PR:
- Ensure that all the submodules are listed in an rst file (this ensures they are considered by the coverage tool)
- Remove some long-deprecated code that just errors out on import
- Remove the allow list altogether to ensure nothing gets added back there
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73983
Reviewed By: anjali411
Differential Revision: D34787908
Pulled By: albanD
fbshipit-source-id: 163ce61e133b12b2f2e1cbe374f979e3d6858db7
(cherry picked from commit c9edfead7a01dc45bfc24eaf7220d2a84ab1f62e)
Summary:
Implements an orthogonal / unitary parametrisation.
It passes the tests and I have trained a couple of models with this implementation, so I believe it should be somewhat correct. That said, the implementation is very subtle. I'm tagging nikitaved and IvanYashchuk as reviewers in case they have comments / see some room for optimisation of the code, in particular of the `forward` function.
Fixes https://github.com/pytorch/pytorch/issues/42243
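For context, the parametrisation is exposed through `torch.nn.utils.parametrizations.orthogonal` (usage sketch; keyword defaults may differ slightly across versions):
```python
import torch
from torch import nn
from torch.nn.utils import parametrizations

linear = parametrizations.orthogonal(nn.Linear(5, 5), name="weight")

w = linear.weight  # recomputed through the parametrisation on each access
print(torch.allclose(w.T @ w, torch.eye(5), atol=1e-5))  # True, up to numerical tolerance
```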
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62089
Reviewed By: ezyang
Differential Revision: D30639063
Pulled By: albanD
fbshipit-source-id: 988664f333ac7a75ce71ba44c8d77b986dff2fe6
Summary:
Fixes https://github.com/pytorch/pytorch/issues/27655
This PR adds a C++ and Python version of ReflectionPad3d with structured kernels. The implementation uses lambdas extensively to better share code between the forward and backward passes.
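Usage mirrors the existing 1D/2D reflection padding modules (a small sketch):
```python
import torch
from torch import nn

# Padding order is (left, right, top, bottom, front, back);
# each pad must be smaller than the corresponding input dimension.
pad = nn.ReflectionPad3d(1)
out = pad(torch.arange(8.0).reshape(1, 1, 2, 2, 2))
print(out.shape)  # torch.Size([1, 1, 4, 4, 4])
```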
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59791
Reviewed By: gchanan
Differential Revision: D29242015
Pulled By: jbschlosser
fbshipit-source-id: 18e692d3b49b74082be09f373fc95fb7891e1b56
Summary:
The `UninitializedBuffer` class was previously left out of `nn.rst`, so it was not included in the generated documentation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59021
Reviewed By: anjali411
Differential Revision: D28723044
Pulled By: jbschlosser
fbshipit-source-id: 71e15b0c7fabaf57e8fbdf7fbd09ef2adbdb36ad
Summary:
This PR introduces a helper function named `torch.nn.utils.skip_init()` that accepts a module class object + `args` / `kwargs` and instantiates the module while skipping initialization of parameter / buffer values. See discussion at https://github.com/pytorch/pytorch/issues/29523 for more context. Example usage:
```python
import torch
m = torch.nn.utils.skip_init(torch.nn.Linear, 5, 1)
print(m.weight)
m2 = torch.nn.utils.skip_init(torch.nn.Linear, 5, 1, device='cuda')
print(m2.weight)
m3 = torch.nn.utils.skip_init(torch.nn.Linear, in_features=5, out_features=1)
print(m3.weight)
```
```
Parameter containing:
tensor([[-3.3011e+28, 4.5915e-41, -3.3009e+28, 4.5915e-41, 0.0000e+00]],
requires_grad=True)
Parameter containing:
tensor([[-2.5339e+27, 4.5915e-41, -2.5367e+27, 4.5915e-41, 0.0000e+00]],
device='cuda:0', requires_grad=True)
Parameter containing:
tensor([[1.4013e-45, 0.0000e+00, 0.0000e+00, 0.0000e+00, 0.0000e+00]],
requires_grad=True)
```
Bikeshedding on the name / namespace is welcome, as well as comments on the design itself - just wanted to get something out there for discussion.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57555
Reviewed By: zou3519
Differential Revision: D28640613
Pulled By: jbschlosser
fbshipit-source-id: 5654f2e5af5530425ab7a9e357b6ba0d807e967f
Summary:
Adds a new file under `torch/nn/utils/parametrizations.py` which should contain all the parametrization implementations
For spectral_norm, we add the `SpectralNorm` module, which can be registered using `torch.nn.utils.parametrize.register_parametrization` or via a wrapper, `spectral_norm`, the same API the old implementation provided.
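Either registration path ends up looking roughly like this (a sketch under the new parametrization API; exact class names on the parametrization list may differ):
```python
import torch
from torch import nn
from torch.nn.utils import parametrize
from torch.nn.utils.parametrizations import spectral_norm

layer = spectral_norm(nn.Linear(20, 40))  # wrapper, same call shape as the old API
print(parametrize.is_parametrized(layer, "weight"))  # True
print(layer.parametrizations.weight)  # holds the spectral norm parametrization

with torch.no_grad():
    # The largest singular value of the effective weight should be close to 1.
    print(torch.linalg.matrix_norm(layer.weight, ord=2))
```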
Most of the logic is borrowed from the old implementation:
- Just like the old implementation, there are cases where retrieving the weight should perform another power iteration (thus updating the weight) and cases where it shouldn't. For example, in eval mode (`self.training == False`), we do not perform power iteration.
There are also some differences/difficulties with the new implementation:
- Using the new parametrization functionality as-is, there doesn't seem to be a good way to tell whether a forward call was the result of the parametrization being unregistered (with `leave_parametrized=True`) or of the injected property's getter being invoked. The issue is that we want to perform power iteration in the latter case but not the former, and we don't have this control as-is. So, in this PR I modified the parametrization functionality to change the module to eval mode before triggering its forward call.
- Updates the vectors based on the weight at initialization to fix https://github.com/pytorch/pytorch/issues/51800 (this avoids silently updating weights in eval mode). This also means that we perform twice as many power iterations by the first forward.
- right_inverse is just the identity for now, but maybe it should assert that the passed value already satisfies the constraints
- So far, all the old spectral_norm tests have been cloned, but maybe we don't need so much testing now that the core functionality is already well tested
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57784
Reviewed By: ejguan
Differential Revision: D28413201
Pulled By: soulitzer
fbshipit-source-id: e8f1140f7924ca43ae4244c98b152c3c554668f2
Summary:
Provides the implementation for feature request issue https://github.com/pytorch/pytorch/issues/28937.
Adds the `Parametrization` functionality and implements `Pruning` on top of it.
It adds the `auto` mode, in which the parametrization is computed just once per forward pass. The previous implementation computed the pruning on every forward, which is not optimal when pruning RNNs, for example.
It implements a caching mechanism for parameters. This is implemented through the mechanism proposed at the end of the discussion in https://github.com/pytorch/pytorch/issues/7313. In particular, it assumes that the user will not manually change the updated parameters between the call to `backward()` and `optimizer.step()`. If they do, they would need to manually call the `.invalidate()` function provided in the implementation. This could be made into a function that takes a model and invalidates all the parameters in it. It might be the case that this function has to be called in `.cuda()`, `.to()` and related functions.
As described in https://github.com/pytorch/pytorch/issues/7313, this could be used to implement the `weight_norm` and `spectral_norm` functions in a cleaner way. It also allows, as described in https://github.com/pytorch/pytorch/issues/28937, for the implementation of constrained optimization on manifolds (i.e. orthogonal constraints, positive definite matrices, invertible matrices, weights on the sphere or in hyperbolic space...).
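For a sense of what this enables, here is a sketch using the parametrization API as it eventually landed in `torch.nn.utils.parametrize` (names differ from the `Parametrization` / `.invalidate()` design described above; caching there is opt-in via a context manager):
```python
import torch
from torch import nn
from torch.nn.utils import parametrize


class Symmetric(nn.Module):
    """Constrains a square weight to be symmetric."""

    def forward(self, X: torch.Tensor) -> torch.Tensor:
        return X.triu() + X.triu(1).transpose(-1, -2)


layer = nn.Linear(4, 4)
parametrize.register_parametrization(layer, "weight", Symmetric())
print(torch.allclose(layer.weight, layer.weight.T))  # True: the accessed weight is symmetric

with parametrize.cached():  # compute the parametrization once and reuse it
    y = layer(torch.randn(2, 4)) + layer(torch.randn(2, 4))
```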
TODO (when implementation is validated):
- More thorough test
- Documentation
Resolves https://github.com/pytorch/pytorch/issues/28937
albanD
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33344
Reviewed By: zhangguanheng66
Differential Revision: D26816708
Pulled By: albanD
fbshipit-source-id: 07c8f0da661f74e919767eae31335a9c60d9e8fe
Summary:
Retake on https://github.com/pytorch/pytorch/issues/40493 after all the feedback from albanD
This PR implements the generic Lazy mechanism and a sample `LazyLinear` layer with the `UninitializedParameter`.
There are two main differences with the previous PR:
- `torch.nn.Module` remains untouched.
- We don't require an explicit initialization or a dummy forward pass before starting the training or inference of the actual module, making this much simpler to use from the user side.
As we discussed offline, there was the suggestion of not using a mixin, but instead changing the `__class__` attribute of `LazyLinear` to become `Linear` once it's completely initialized. While this can be useful, for the time being we need `LazyLinear` to be a `torch.nn.Module` subclass, since there are many checks that rely on modules being instances of `torch.nn.Module`.
This can cause problems when we create complex modules such as
```python
import torch

class MyNetwork(torch.nn.Module):
    def __init__(self):
        super(MyNetwork, self).__init__()
        self.conv = torch.nn.Conv2d(20, 4, 2)
        self.linear = torch.nn.LazyLinear(10)

    def forward(self, x):
        y = self.conv(x).clamp(min=0)
        return self.linear(y)
```
Here, when the `__setattr__` function is called at the time `LazyLinear` is registered, it won't be added to the child modules of `MyNetwork`, so we would have to do that manually later; but currently there is no way to do such a thing, as we can't access the parent module from `LazyLinear` once it becomes the `Linear` module. (We can add a workaround for this if needed.)
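For reference, the resulting user-facing flow looks like this (a sketch of the behavior once merged; no explicit initialization or dummy forward pass is needed):
```python
import torch

lazy = torch.nn.LazyLinear(10)
print(lazy.weight)             # <UninitializedParameter>: no shape allocated yet

out = lazy(torch.randn(3, 5))  # the first real forward infers in_features=5
print(lazy.weight.shape)       # torch.Size([10, 5]) -- a regular Parameter from here on
```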
TODO:
- Add convolutions once the design is OK
- Fix docstrings
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44538
Reviewed By: ngimel
Differential Revision: D24162854
Pulled By: albanD
fbshipit-source-id: 6d58dfe5d43bfb05b6ee506e266db3cf4b885f0c