pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
BowenBao	8726f08e15	[ONNX] Update documentation (#58712 ) (#60249 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60249 * Add introductory paragraph explaining what ONNX is and what the torch.onnx module does. * In "Tracing vs Scripting" and doc-string for torch.onnx.export(), clarify that exporting always happens on ScriptModules and that tracing and scripting are the two ways to produce a ScriptModule. * Remove examples of using Caffe2 to run exported models. Caffe2's website says it's deprecated, so it's probably best not to encourage people to use it by including it in examples. * Remove a lot of content that's redundant: * The example of how to mix tracing and scripting, and instead link to Introduction to TorchScript, which includes very similar content. * "Type annotations" section. Link to TorchScript docs which explain that in more detail. * "Using dictionaries to handle Named Arguments as model inputs" section. It's redundant with the description of the `args` argument to `export()`, which appears on the same page once the HTML is generated. * Remove the list of supported Tensor indexing patterns. If it's not in the list of unsupported patterns, users can assume it's supported, so having both is redundant. * Remove the list of supported operators and models. I think the list of supported operators is not very useful. A list of supported model architectures may be useful, but in reality it's already very out of date. We should add it back if / when we have a system for keeping it up to date. * "Operator Export Type" section. It's redundant with the description of the `operator_export_type` arg to `export()`, which appears on the same page once the HTML is generated. * "Use external data format" section. It's redundant with the description of the `use_external_data_format` arg to `export()`. * "Training" section. It's redundant with the description of the `training` arg to `export()`. 
* Move the content about different operator implementations producing different results from the "Limitations" section into the doc for the `operator_export_type` arg. * Document "quantized" -> "caffe2" behavior of OperatorExportTypes.ONNX_ATEN_FALLBACK. * Combine the text about using torch.Tensor.item() and the text about using NumPy types into a section titled "Avoid NumPy and built-in Python types", since they're both fundamentally about the same issue. * Rename "Write PyTorch model in Torch way" to "Avoiding Pitfalls". * Lots of minor fixes: spelling, grammar, brevity, fixing links, adding links. * Clarify limitation on input and output types. Phrasing it in terms of PyTorch types is much more accessible than in terms of TorchScript types. Also clarify what actually happens when dict and str are used as inputs and outputs. * In Supported operators, use torch function and class names and link to them. This is more user friendly than using the internal aten op names. * Remove references to VariableType.h, which doesn't appear to contain the information that it once did. Instead refer to the generated .pyi files. * Remove the text in the FAQ about appending to lists within loops. I think this limitation is no longer present (perhaps since https://github.com/pytorch/pytorch/pull/51577). * Minor fixes to some code I read along the way. * Explain the current rationale for the weird ::prim_PythonOp op name. Test Plan: Imported from OSS Reviewed By: zou3519, ZolotukhinM Differential Revision: D29494912 Pulled By: SplitInfinity fbshipit-source-id: 7756c010b2320de0692369289604403d28877719 Co-authored-by: Gary Miguel <garymiguel@microsoft.com>	2021-07-08 16:29:32 -07:00
Aliaksandr Ivanou	13658b10bb	[torch] Various improvements to `torch.distributed.launch` and `torch.distributed.run` (#61294 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61294 Pull Request resolved: https://github.com/pytorch/pytorch/pull/60925 * Make `torch.distributed.launch` restarts to 0 * Remove unnecessary `-use_env` warning, move `-use_env` warnings * Move `-use_env` warnings to `torch.distributed.launch` * Make default log level WARNING * Add new doc section around transitioning to `torch.distributed.run` * Make `torch.distributed.launch` not use error-propagation * Set default events handler to `null` that does not print events to console * Add reference from `torch.distributed.launch` to `torch.distributed.run` * Set correct preexec function that sends SIGTERM to child processes when parent dies Issues resolved: https://github.com/pytorch/pytorch/issues/60716 https://github.com/pytorch/pytorch/issues/60754 Test Plan: sandcastle python -m torch.distributed.launch --nproc_per_node 2 main.py -> uses 0 restarts python -m torch.distributed.run --nproc_per_node 2 main.py -> uses default for torchelastic, 0 restarts python -m torch.distributed.launch --nproc_per_node=4 --use_env --no_python main.py -> produces error python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py -> no warning python -m torch.distributed.launch --nproc_per_node=4 --no_python main.py ->warning Output of running torch.distributed.launch without --use_env: $path/torch/distributed/launch.py:173: FutureWarning: The module torch.distributed.launch is deprecated and will be removed in future. Use torch.distributed.run. Note that --use_env is set by default in torch.distributed.run. If your script expects `--local_rank` argument to be set, please change it to read from `os.environ('LOCAL_RANK')` instead. 
New section: {F628923078} {F628974089} Reviewed By: cbalioglu Differential Revision: D29559553 fbshipit-source-id: 03ed9ba638bf154354e1530ffc964688431edf6b	2021-07-08 16:28:06 -07:00
Howard Huang	cdc027679b	Add compare_set in distributed docs (#61351 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61351 Test Plan: Imported from OSS Reviewed By: mrshenli Differential Revision: D29588206 Pulled By: H-Huang fbshipit-source-id: 9db48e7b6de29503275f10616470ad2d66b075f9	2021-07-08 12:30:32 -07:00
Kushashwa Ravi Shrimali	423523d8bb	Alias for logsumexp to special namespace (#58838 ) Summary: See https://github.com/pytorch/pytorch/issues/50345 cc: kshitij12345 Lezcano mruberry Pull Request resolved: https://github.com/pytorch/pytorch/pull/58838 Reviewed By: malfet Differential Revision: D29565033 Pulled By: mruberry fbshipit-source-id: 9b715ea00c78f47b6f183357ee3c7d4c3abe4d01	2021-07-07 13:32:15 -07:00
Philip Meier	1262b2c4c6	fix `torch.futures` docstring examples (#61029 ) Summary: Trying to run the doctests for the complete documentation hangs if it reaches the examples of `torch.futures`. It turns out to be only syntax errors, which are normally just reported. My guess is that `doctest` probably doesn't work well for failures within async stuff. Anyway, while debugging this, I fixed the syntax. Pull Request resolved: https://github.com/pytorch/pytorch/pull/61029 Reviewed By: mruberry Differential Revision: D29571923 Pulled By: mrshenli fbshipit-source-id: bb8112be5302c6ec43151590b438b195a8f30a06	2021-07-07 11:47:55 -07:00
Vitaly Fedyunin	ccfdb30644	Revert D29413019: [torch] Various improvements to `torch.distributed.launch` and `torch.distributed.run` Test Plan: revert-hammer Differential Revision: D29413019 (`4e181dfc35`) Original commit changeset: 323bfbad9d0e fbshipit-source-id: 1f8ae4b3d0a23f3eaff28c37e9148efff25fafe2	2021-07-01 08:44:51 -07:00
Aliaksandr Ivanou	4e181dfc35	[torch] Various improvements to `torch.distributed.launch` and `torch.distributed.run` (#60925 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60925 * Make `torch.distributed.launch` restarts to 0 * Remove unnecessary `-use_env` warning, move `-use_env` warnings * Move `-use_env` warnings to `torch.distributed.launch` * Make default log level WARNING * Add new doc section around transitioning to `torch.distributed.run` * Make `torch.distributed.launch` not use error-propagation * Set default events handler to `null` that does not print events to console * Add reference from `torch.distributed.launch` to `torch.distributed.run` * Set correct preexec function that sends SIGTERM to child processes when parent dies Issues resolved: https://github.com/pytorch/pytorch/issues/60716 https://github.com/pytorch/pytorch/issues/60754 Test Plan: sandcastle python -m torch.distributed.launch --nproc_per_node 2 main.py -> uses 0 restarts python -m torch.distributed.run --nproc_per_node 2 main.py -> uses default for torchelastic, 0 restarts python -m torch.distributed.launch --nproc_per_node=4 --use_env --no_python main.py -> produces error python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py -> no warning python -m torch.distributed.launch --nproc_per_node=4 --no_python main.py ->warning Output of running torch.distributed.launch without --use_env: $path/torch/distributed/launch.py:173: FutureWarning: The module torch.distributed.launch is deprecated and will be removed in future. Use torch.distributed.run. Note that --use_env is set by default in torch.distributed.run. If your script expects `--local_rank` argument to be set, please change it to read from `os.environ('LOCAL_RANK')` instead. New section: {F628923078} {F628974089} Reviewed By: kiukchung, cbalioglu Differential Revision: D29413019 fbshipit-source-id: 323bfbad9d0e4aba3b10ddd7a243ca6e48169630	2021-06-30 23:31:02 -07:00
Heitor Schueroff	f32f85e6da	Implemented torch.corrcoef (#60420 ) Summary: Implements `torch.corrcoef` similar to [`np.corrcoef`](https://numpy.org/doc/stable/reference/generated/numpy.corrcoef.html) using `torch.cov` implemented in https://github.com/pytorch/pytorch/pull/58311. closes https://github.com/pytorch/pytorch/issues/1254 Pull Request resolved: https://github.com/pytorch/pytorch/pull/60420 Reviewed By: mruberry Differential Revision: D29474687 Pulled By: heitorschueroff fbshipit-source-id: f3c7c5610363aebd88274a51fc77e3cf879cb611	2021-06-30 12:36:02 -07:00
Heitor Schueroff	ec9c03c234	Implemented torch.cov (#58311 ) Summary: Based on https://github.com/pytorch/pytorch/pull/50466 Adds the initial implementation of `torch.cov`, similar to `numpy.cov`. For simplicity, we removed support for many parameters of `numpy.cov` that are either redundant, such as `bias`, or have simple workarounds, such as `y` and `rowvar`. cc PandaBoi closes https://github.com/pytorch/pytorch/issues/19037 Pull Request resolved: https://github.com/pytorch/pytorch/pull/58311 Reviewed By: jbschlosser Differential Revision: D29431651 Pulled By: heitorschueroff fbshipit-source-id: 167dea880f534934b145ba94291a9d634c25b01b	2021-06-29 14:02:39 -07:00
Jeff Yang	a8057e7ef1	docs: add `permute` in torch docs (#60821 ) Summary: fix https://github.com/pytorch/pytorch/issues/60181 Pull Request resolved: https://github.com/pytorch/pytorch/pull/60821 Reviewed By: VitalyFedyunin Differential Revision: D29431949 Pulled By: jbschlosser fbshipit-source-id: 2353afceaa188315cde1f0c955897c4750809c8e	2021-06-28 11:20:35 -07:00
Michael Carilli	2fa6c7627e	[CUDA graphs][BC-breaking] Removes post-backward syncs on default stream (#60421 ) Summary: Before https://github.com/pytorch/pytorch/pull/57833, calls to backward() or grad() synced only the calling thread's default stream with autograd leaf streams at the end of backward. This made the following weird pattern safe: ```python with torch.cuda.stream(s): # imagine forward used many streams, so backward leaf nodes may run on many streams loss.backward() # no sync use grads ``` but a more benign-looking pattern was unsafe: ```python with torch.cuda.stream(s): # imagine forward used a lot of streams, so backward leaf nodes may run on many streams loss.backward() # backward() syncs the default stream with all the leaf streams, but does not sync s with anything, # so counterintuitively (even though we're in the same stream context as backward()!) # it is NOT SAFE to use grads here, and there's no easy way to make it safe, # unless you manually sync on all the streams you used in forward, # or move "use grads" back to default stream outside the context. use grads ``` mruberry ngimel and I decided backward() should have the [same user-facing stream semantics as any cuda op](https://pytorch.org/docs/master/notes/cuda.html#stream-semantics-of-backward-passes). In other words, the weird pattern should be unsafe, and the benign-looking pattern should be safe. Implementationwise, this meant backward() should sync its calling thread's current stream, not default stream, with the leaf streams. After https://github.com/pytorch/pytorch/pull/57833, backward syncs the calling thread's current stream AND default stream with all leaf streams at the end of backward. The default stream syncs were retained for temporary backward compatibility. This PR finishes https://github.com/pytorch/pytorch/pull/57833's work by deleting syncs on the default stream. 
With this PR, graph-capturing an entire backward() call should be possible (see the [test_graph_grad_scaling diffs](https://github.com/pytorch/pytorch/compare/master...mcarilli:streaming_backwards_remove_default_syncs?expand=1#diff-893b1eea27352f336f4cd832919e48d721e4e90186e63400b8596db6b82e7450R3641-R3642)). first paragraph has a formatting error which this PR should also fix. Pull Request resolved: https://github.com/pytorch/pytorch/pull/60421 Reviewed By: albanD Differential Revision: D29370344 Pulled By: ngimel fbshipit-source-id: 3248bc5fb92fc517db0c15c897e5d7250f67d7fe	2021-06-24 17:34:02 -07:00
sawradip	eddc5f40f9	Added GLU and FeatureAlphaDropout to nn docs (#60590 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/60563 and https://github.com/pytorch/pytorch/issues/60570 Pull Request resolved: https://github.com/pytorch/pytorch/pull/60590 Reviewed By: albanD Differential Revision: D29352372 Pulled By: jbschlosser fbshipit-source-id: f81dd65deab1848a68dc202df252c416ce5214d0	2021-06-24 08:00:18 -07:00
Luca Wehrstedt	bb9e1150ea	Revert D29342234: [pytorch][PR] [CUDA graphs][BC-breaking] Removes post-backward syncs on default stream Test Plan: revert-hammer Differential Revision: D29342234 (`675cea1adb`) Original commit changeset: 98e6be7fdd85 fbshipit-source-id: 84022973248b2254210eee57402df2c4f4bc43c6	2021-06-24 04:49:28 -07:00
kshitij12345	dfd2edc025	[special] add zeta (#59623 ) Summary: Reference https://github.com/pytorch/pytorch/issues/50345 `zeta` was already present in the codebase to support computation of `polygamma`. However, before this PR `zeta` only had a `double(double, double)` signature on CPU, which meant that `polygamma` computations were always upcast to `double` for the zeta part. With this PR, float computations take place in float and double computations in double. The PR also refactors the code and moves the duplicate code from `Math.cuh` to `Math.h`. Note: In SciPy, `q` is optional; if it is `None`, it defaults to `1`, which corresponds to the Riemann zeta function. For `torch.special.zeta`, however, I made `q` mandatory, because it feels odd for the same function to be the Riemann zeta without `q` and the general Hurwitz zeta with `q`. Sticking to the general form made more sense, as passing `1` for `q` is trivial. Verify: * [x] Docs https://14234587-65600975-gh.circle-artifacts.com/0/docs/special.html#torch.special.zeta Pull Request resolved: https://github.com/pytorch/pytorch/pull/59623 Reviewed By: ngimel Differential Revision: D29348269 Pulled By: mruberry fbshipit-source-id: a3f9ebe1f7724dbe66de2b391afb9da1cfc3e4bb	2021-06-24 00:00:12 -07:00
Akifumi Imanishi	26cdec6ce4	Support `torch.bitwise_{left/right}_shift` and `__rlshift__`, `__rrshift__` (#59544 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/58121 This PR implements `torch.bitwise_left_shift` and `torch.bitwise_right_shift` and `torch.Tensor.{__rlshift__/__rrshift__}`for compatibility with Python array API standard. (cc: mruberry, rgommers, emcastillo, kmaehashi) Pull Request resolved: https://github.com/pytorch/pytorch/pull/59544 Reviewed By: ngimel Differential Revision: D29348869 Pulled By: mruberry fbshipit-source-id: 329aee296cf890735e8a9f858bccfe87c03d06ca	2021-06-23 23:57:16 -07:00
Michael Carilli	675cea1adb	[CUDA graphs][BC-breaking] Removes post-backward syncs on default stream (#60421 ) Summary: Before https://github.com/pytorch/pytorch/pull/57833, calls to backward() or grad() synced only the calling thread's default stream with autograd leaf streams at the end of backward. This made the following weird pattern safe: ```python with torch.cuda.stream(s): # imagine forward used many streams, so backward leaf nodes may run on many streams loss.backward() # no sync use grads ``` but a more benign-looking pattern was unsafe: ```python with torch.cuda.stream(s): # imagine forward used a lot of streams, so backward leaf nodes may run on many streams loss.backward() # backward() syncs the default stream with all the leaf streams, but does not sync s with anything, # so counterintuitively (even though we're in the same stream context as backward()!) # it is NOT SAFE to use grads here, and there's no easy way to make it safe, # unless you manually sync on all the streams you used in forward, # or move "use grads" back to default stream outside the context. use grads ``` mruberry ngimel and I decided backward() should have the [same user-facing stream semantics as any cuda op](https://pytorch.org/docs/master/notes/cuda.html#stream-semantics-of-backward-passes). In other words, the weird pattern should be unsafe, and the benign-looking pattern should be safe. Implementationwise, this meant backward() should sync its calling thread's current stream, not default stream, with the leaf streams. After https://github.com/pytorch/pytorch/pull/57833, backward syncs the calling thread's current stream AND default stream with all leaf streams at the end of backward. The default stream syncs were retained for temporary backward compatibility. This PR finishes https://github.com/pytorch/pytorch/pull/57833's work by deleting syncs on the default stream. 
With this PR, graph-capturing an entire backward() call should be possible (see the [test_graph_grad_scaling diffs](https://github.com/pytorch/pytorch/compare/master...mcarilli:streaming_backwards_remove_default_syncs?expand=1#diff-893b1eea27352f336f4cd832919e48d721e4e90186e63400b8596db6b82e7450R3641-R3642)). first paragraph has a formatting error which this PR should also fix. Pull Request resolved: https://github.com/pytorch/pytorch/pull/60421 Reviewed By: VitalyFedyunin, albanD Differential Revision: D29342234 Pulled By: ngimel fbshipit-source-id: 98e6be7fdd8550872f0a78f9a66cb8dfe75abf63	2021-06-23 23:35:24 -07:00
Ilqar Ramazanli	63219f1f9f	To add Rectified Adam Algorithm to Optimizers (#58968 ) Summary: Fixes: https://github.com/pytorch/pytorch/issues/24892 In the paper https://arxiv.org/pdf/1908.03265.pdf, Liyuan Liu et al. propose a new optimization algorithm that is similar in essence to Adam. The paper argues that, without a warmup heuristic, adaptive optimization algorithms can exhibit undesirably large variance in the early stage of training, which can slow overall convergence. The authors propose rectifying the variance of the adaptive learning rate when it is expected to be high. Differing from the paper, we selected a variance tractability cut-off of 5 instead of 4. This adjustment is common practice and can be found in the code repository as well as in the TensorFlow Swift optim library: `2f03dd1970/radam/radam.py (L156)` `f51ee4618d/Sources/TensorFlow/Optimizers/MomentumBased.swift (L638)` Pull Request resolved: https://github.com/pytorch/pytorch/pull/58968 Reviewed By: vincentqb Differential Revision: D29310601 Pulled By: iramazanli fbshipit-source-id: b7bd487f72f1074f266687fd9c0c6be264a748a9	2021-06-23 18:27:57 -07:00
Ilqar Ramazanli	e8690dacb2	To add Nesterov Adam Algorithm to Optimizers (#59009 ) Summary: Fixes: https://github.com/pytorch/pytorch/issues/5804 In the paper https://openreview.net/forum?id=OM0jvwB8jIp57ZJjtNEZ, Timothy Dozat proposes a new optimization algorithm that in essence combines the NAG and Adam algorithms. Momentum is known to benefit from Nesterov acceleration in optimization algorithms, and Dozat investigates applying this idea to the momentum component of Adam. The paper provides experimental evidence in support of the idea. This PR implements NAdam, the algorithm proposed in that paper. In preliminary work, http://cs229.stanford.edu/proj2015/054_report.pdf, the author shows that the decay base constant should be taken as 0.96; this implementation follows that choice, as Keras does. The implementation also follows Keras's coding practice in some other places: `f9d3868495/tensorflow/python/keras/optimizer_v2/nadam.py` Pull Request resolved: https://github.com/pytorch/pytorch/pull/59009 Reviewed By: gchanan, vincentqb Differential Revision: D29220375 Pulled By: iramazanli fbshipit-source-id: 4b4bb4b15f7e16f7527f368bbf4207ed345751aa	2021-06-23 08:21:43 -07:00
Weiqiang Wu	6a87e8d087	Implement erfcx() (#58194 ) Summary: Implement erfcx() https://github.com/pytorch/pytorch/issues/31945 Reference: https://github.com/pytorch/pytorch/issues/50345 Pull Request resolved: https://github.com/pytorch/pytorch/pull/58194 Reviewed By: ngimel Differential Revision: D29285979 Pulled By: mruberry fbshipit-source-id: 5bcfe77fddfabbeb8c8068658ba6d9fec6430399	2021-06-22 12:38:38 -07:00
Sam Estep	1abf45e37f	Revert D29241736: [pytorch][PR] To add Rectified Adam Algorithm to Optimizers Test Plan: revert-hammer Differential Revision: D29241736 (`0d2a936176`) Original commit changeset: 288b9b1f3125 fbshipit-source-id: 56c4ec98647c6f1822b130726741a1c9ca193670	2021-06-22 12:08:31 -07:00
Ilqar Ramazanli	0d2a936176	To add Rectified Adam Algorithm to Optimizers (#58968 ) Summary: Fixes: https://github.com/pytorch/pytorch/issues/24892 In the paper https://arxiv.org/pdf/1908.03265.pdf, Liyuan Liu et al. propose a new optimization algorithm that is similar in essence to Adam. The paper argues that, without a warmup heuristic, adaptive optimization algorithms can exhibit undesirably large variance in the early stage of training, which can slow overall convergence. The authors propose rectifying the variance of the adaptive learning rate when it is expected to be high. Differing from the paper, we selected a variance tractability cut-off of 5 instead of 4. This adjustment is common practice and can be found in the code repository as well as in the TensorFlow Swift optim library: `2f03dd1970/radam/radam.py (L156)` `f51ee4618d/Sources/TensorFlow/Optimizers/MomentumBased.swift (L638)` Pull Request resolved: https://github.com/pytorch/pytorch/pull/58968 Reviewed By: gchanan Differential Revision: D29241736 Pulled By: iramazanli fbshipit-source-id: 288b9b1f3125fdc6c7a7bb23fde1ea5c201c0448	2021-06-22 10:38:41 -07:00
Saketh Are	729f7cd52f	Implement histogram operator on CPU (#58780 ) Summary: The existing [torch.histc](https://pytorch.org/docs/stable/generated/torch.histc.html) operator is limited in comparison to [numpy.histogram](https://numpy.org/doc/stable/reference/generated/numpy.histogram.html). This PR adds torch.histogram on CPU. The new operator replicates numpy.histogram's behavior, including support for caller-specified bin edges and weights. It was motivated by previous community requests for histogram. The implementation was [benchmarked](https://docs.google.com/spreadsheets/d/1xCR0jODchVvwdVSAjiLsNCkmyictA6j1LNfDpWOafjw/edit?usp=sharing) against numpy.histogram as well as torch.histc. This implementation is weakly faster than numpy.histogram across all types of inputs tested, and performs in line with torch.histc for the limited inputs histc supports. mruberry Pull Request resolved: https://github.com/pytorch/pytorch/pull/58780 Test Plan: Added unit tests, OpInfo for the new torch.histogram operator. Tested execution time on a variety of input sizes and compared to numpy.histogram performance: https://docs.google.com/spreadsheets/d/1xCR0jODchVvwdVSAjiLsNCkmyictA6j1LNfDpWOafjw/edit?usp=sharing Reviewed By: ezyang Differential Revision: D29134626 Pulled By: saketh-are fbshipit-source-id: f2773085de1697f6bc6ffdeffe9a81267f51bdfc	2021-06-22 10:06:04 -07:00
kshitij12345	01e0296eb7	[special] migrate log1p, sinc, round to special namespace (#55878 ) Summary: Reference : https://github.com/pytorch/pytorch/issues/50345 Pull Request resolved: https://github.com/pytorch/pytorch/pull/55878 Reviewed By: zou3519, janeyx99 Differential Revision: D29160593 Pulled By: mruberry fbshipit-source-id: f3ca9c541382bab33fb85d7817ce8ddc117c6826	2021-06-21 12:34:29 -07:00
Michael Wootton	2f3be2735f	Don't split oversize cached blocks (#44742 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/35901 This change is designed to prevent fragmentation in the Caching Allocator. Permissive block splitting in the allocator allows very large blocks to be split into many pieces. Once a block is split too finely, it is unlikely that all pieces will be 'free' at the same time, so the original allocation can never be returned. Anecdotally, we've seen a model run out of memory failing to allocate a 50 MB block on a 32 GB card while the caching allocator is holding 13 GB of 'split free blocks'. Approach: - Large blocks above a certain size are designated "oversize". This limit is currently set one decade above large, at 200 MB - Oversize blocks cannot be split - Oversize blocks must closely match the requested size (e.g. a 200 MB request will match an existing 205 MB block, but not a 300 MB block) - In lieu of splitting oversize blocks there is a mechanism to quickly free a single oversize block (to the system allocator) to allow an appropriate size block to be allocated. This will be activated under memory pressure and will prevent _release_cached_blocks()_ from triggering. Initial performance tests show this is similar to or quicker than the original strategy. Additional tests are ongoing. Pull Request resolved: https://github.com/pytorch/pytorch/pull/44742 Reviewed By: zou3519 Differential Revision: D29186394 Pulled By: ezyang fbshipit-source-id: c88918836db3f51df59de6d1b3e03602ebe306a9	2021-06-21 11:46:08 -07:00
Thomas J. Fan	c16f87949f	ENH Adds nn.ReflectionPad3d (#59791 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/27655 This PR adds a C++ and Python version of ReflectionPad3d with structured kernels. The implementation uses lambdas extensively to better share code from the backward and forward pass. Pull Request resolved: https://github.com/pytorch/pytorch/pull/59791 Reviewed By: gchanan Differential Revision: D29242015 Pulled By: jbschlosser fbshipit-source-id: 18e692d3b49b74082be09f373fc95fb7891e1b56	2021-06-21 10:53:14 -07:00
Michael Carilli	f89ae9cb8d	Moves grid_sampler to autocast promote list (#58618 ) Summary: Should close https://github.com/pytorch/pytorch/issues/42218 Numerically, `grid_sampler` is fine in fp16 or fp32, but takes several inputs and expects their dtypes to match, so it belongs on the autocast promote list. `grid_sampler` currently uses `gpuAtomicAdd`, notoriously slow in fp16 because it calls cuda's atomicAdd __half overload which uses a software compare-and-swap loop internally. To allow good performance if both inputs happen to be FP16, the PR also modifies `grid_sampler_[2,3]d_backward_kernel`s to use `fastAtomicAdd` instead. Pull Request resolved: https://github.com/pytorch/pytorch/pull/58618 Reviewed By: mruberry Differential Revision: D29257199 Pulled By: ngimel fbshipit-source-id: 3cc7505945b480427f2fc1beb36bee80bf3853b3	2021-06-21 10:22:36 -07:00
kshitij12345	5ec4ad7f54	[special] Add special.ndtri (#58650 ) Summary: Reference: https://github.com/pytorch/pytorch/issues/50345 TODO * [x] Add docs https://13865352-65600975-gh.circle-artifacts.com/0/docs/special.html#torch.special.ndtri * [x] Add comments on implementation * [x] Clean-up Pull Request resolved: https://github.com/pytorch/pytorch/pull/58650 Reviewed By: H-Huang Differential Revision: D29160170 Pulled By: mruberry fbshipit-source-id: 50e4ea663920e97b8437d03d5b52bcd9dedc1a8d	2021-06-19 18:36:54 -07:00
Patrick Wang	8b55e9feaf	removed cat, equal, and stack from autocast promote list (#59497 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59497 Reviewed By: zou3519 Differential Revision: D29185909 Pulled By: ngimel fbshipit-source-id: db96239106d9e46a2704b8f457fd0463dacc1f5c	2021-06-17 21:13:22 -07:00
Patrick	5948e6f653	removed gelu from autocast fp32 list (#59639 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59639 Reviewed By: H-Huang Differential Revision: D29155914 Pulled By: ezyang fbshipit-source-id: feb117181894c2355768d5b1189b3d5f1649fc0b	2021-06-16 16:29:57 -07:00
Michael Suo	15f236f3e3	[package] fix tutorial link (#60113 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60113 The tutorial link in the docs was to an fb-only colab. Test Plan: Imported from OSS Reviewed By: SplitInfinity Differential Revision: D29169818 Pulled By: suo fbshipit-source-id: 374807c234a185bd515b8ffe1300e6cf8d821636	2021-06-16 11:27:25 -07:00
BowenBao	55530e2276	Update Autograd Export Docs (#56594 ) (#59534 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59534 Update autograd export docs Test Plan: Imported from OSS Reviewed By: nikithamalgifb, ansley Differential Revision: D29046606 Pulled By: SplitInfinity fbshipit-source-id: 36057f6bdfd3e5c071dbca05d327de7952904120 Co-authored-by: neginraoof <neginmr@utexas.edu>	2021-06-15 12:23:00 -07:00
Joel Schlosser	c645d39a77	Implementation of torch.isin() (#53125 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/3025 ## Background This PR implements a function similar to numpy's [`isin()`](https://numpy.org/doc/stable/reference/generated/numpy.isin.html#numpy.isin). The op supports integral and floating point types on CPU and CUDA (+ half & bfloat16 for CUDA). Inputs can be one of: * (Tensor, Tensor) * (Tensor, Scalar) * (Scalar, Tensor) Internally, one of two algorithms is selected based on the number of elements vs. test elements. The heuristic for deciding which algorithm to use is taken from [numpy's implementation](`fb215c7696/numpy/lib/arraysetops.py (L575)`): if `len(test_elements) < 10 * len(elements) ** 0.145`, then a naive brute-force checking algorithm is used. Otherwise, a stablesort-based algorithm is used. I've done some preliminary benchmarking to verify this heuristic on a devgpu, and determined for a limited set of tests that a power value of `0.407` instead of `0.145` is a better inflection point. For now, the heuristic has been left to match numpy's, but input is welcome for the best way to select it or whether it should be left the same as numpy's. Tests are adapted from numpy's [isin and in1d tests](`7dcd29aaaf/numpy/lib/tests/test_arraysetops.py`). Note: my locally generated docs look terrible for some reason, so I'm not including the screenshot for them until I figure out why. Pull Request resolved: https://github.com/pytorch/pytorch/pull/53125 Test Plan: ``` python test/test_ops.py # Ex: python test/test_ops.py TestOpInfoCPU.test_supported_dtypes_isin_cpu_int32 python test/test_sort_and_select.py # Ex: python test/test_sort_and_select.py TestSortAndSelectCPU.test_isin_cpu_int32 ``` Reviewed By: soulitzer Differential Revision: D29101165 Pulled By: jbschlosser fbshipit-source-id: 2dcc38d497b1e843f73f332d837081e819454b4e	2021-06-14 13:50:53 -07:00
Meghan Lele	8e92a3a8b0	[docs] Add pickle security warning to package docs (#59959 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59959 Summary This commit replaces the warning on the `torch.package` documentation page about the module not being publicly released (which will no longer be true as of 1.9) with one that warns about security issues caused by the use of the `pickle` module. Test Plan 1) Built the docs locally. 2) Continuous integration. Test Plan: Imported from OSS Reviewed By: suo Differential Revision: D29108429 Pulled By: SplitInfinity fbshipit-source-id: 3a0aeac0dc804a31203bc5071efb1c5bd6ef9725	2021-06-14 13:03:05 -07:00
Kushashwa Ravi Shrimali	cf38b20c61	Alias for `digamma` as `psi` to `special` namespace (#59143 ) Summary: See https://github.com/pytorch/pytorch/issues/50345 cc: mruberry kshitij12345 Pull Request resolved: https://github.com/pytorch/pytorch/pull/59143 Reviewed By: jbschlosser Differential Revision: D28986909 Pulled By: mruberry fbshipit-source-id: bc8ff0375de968f3662b224689fa0a6b117f9c4e	2021-06-14 03:05:14 -07:00
Michael Carilli	be038d8989	[CUDA graphs] Make stream semantics of backward calls consistent with other cuda ops (ci-all edition) (#57833 ) Summary: ci-all resubmit of https://github.com/pytorch/pytorch/pull/54227. Tests look good except for a few distributed autograd failures (pytorch_linux_xenial_cuda10_2_cudnn7_py3_multigpu_test) and rocm failures (pr/pytorch-linux-bionic-rocm4.1-py3.6). The common denominator in rocm failures appears to be multi-gpu activity: some [multiprocess DDP failures](https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-linux-bionic-rocm4.1-py3.6-test1/8115/console), some [single-process failures](https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-linux-bionic-rocm4.1-py3.6-test2/8115/console) where the single process has autograd ops that span devices. jeffdaily jithunnair-amd sunway513, could one of you take a look? The streaming backward change is also beneficial to rocm, I expect. For debugging rocm failures, I think we should ignore the multiprocess/DDP tests and focus on the single process cases. The root cause is probably the same and the single process cases are simpler. ---------------------------------- Update: Rocm failures are due to https://github.com/pytorch/pytorch/issues/59750. `2718a54032` is a workaround, to be updated once https://github.com/pytorch/pytorch/issues/59750 is fixed. Pull Request resolved: https://github.com/pytorch/pytorch/pull/57833 Reviewed By: mruberry Differential Revision: D28942391 Pulled By: ngimel fbshipit-source-id: d6047e971c5f1c6386334bf3641402a92f12e2f8	2021-06-13 12:09:56 -07:00
Mike Ruberry	92513038e8	Revert D28994140: [pytorch][PR] Implemented torch.cov Test Plan: revert-hammer Differential Revision: D28994140 (`23c232554b`) Original commit changeset: 1890166c0a9c fbshipit-source-id: 73dfe1b00464e38f004f99960cdeeb604ed4b20a	2021-06-13 02:33:37 -07:00
Heitor Schueroff	23c232554b	Implemented torch.cov (#58311 ) Summary: Based from https://github.com/pytorch/pytorch/pull/50466 Adds the initial implementation of `torch.cov` similar to `numpy.cov`. For simplicity, we removed support for many parameters in `numpy.cov` that are either redundant such as `bias`, or have simple workarounds such as `y` and `rowvar`. cc PandaBoi TODO - [x] Improve documentation Pull Request resolved: https://github.com/pytorch/pytorch/pull/58311 Reviewed By: mruberry Differential Revision: D28994140 Pulled By: heitorschueroff fbshipit-source-id: 1890166c0a9c01e0a536acd91571cd704d632f44	2021-06-11 09:40:50 -07:00
Meghan Lele	4025f95a20	[docs] Add table of contents to torch.package docs (#59842 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59842 Test Plan: Continuous integration. <img width="544" alt="Captura de Pantalla 2021-06-10 a la(s) 5 13 07 p m" src="https://user-images.githubusercontent.com/4392003/121612390-2ccec280-ca0f-11eb-87ad-fef632ba05ca.png"> Reviewed By: Lilyjjo Differential Revision: D29050627 Pulled By: SplitInfinity fbshipit-source-id: 76c25ed4002cbaf072036e2e14e7857c15077df7	2021-06-10 19:52:50 -07:00
Meghan Lele	0e222db087	[docs] Add explanation section to torch.package docs (#59833 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59833 Summary This commit adds an explanation section to the `torch.package` documentation. This section clarifies and illuminates various aspects of the internals of `torch.package` that might be of interest to users. Test Plan Continuous integration. Test Plan: Imported from OSS Reviewed By: Lilyjjo Differential Revision: D29050626 Pulled By: SplitInfinity fbshipit-source-id: 78e0cda00f69506ef2dfc52d6df63694b502269e	2021-06-10 19:52:48 -07:00
Meghan Lele	062dde7285	[docs] Add "how do I" section to torch.package docs (#59503 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59503 Summary This commit adds a "how do I..." section to the `torch.package` documentation. This section contains short guides about how to solve real-world problems that frequently recur while using `torch.package`. Test Plan Continuous integration. <img width="877" alt="Captura de Pantalla 2021-06-04 a la(s) 9 19 54 p m" src="https://user-images.githubusercontent.com/4392003/120879911-98321380-c57b-11eb-8664-c582c92b7837.png"> Test Plan: Imported from OSS Reviewed By: Lilyjjo Differential Revision: D29050629 Pulled By: SplitInfinity fbshipit-source-id: 2b7800732e0a3c1c947f110c05562aed5174a87f	2021-06-10 19:52:47 -07:00
Meghan Lele	6a18ca7a07	[docs] Add tutorials section to torch.package docs (#59499 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59499 Summary This commit adds a tutorials section to the torch.package docs. Test Plan Continuous integration. <img width="870" alt="Captura de Pantalla 2021-06-04 a la(s) 5 10 31 p m" src="https://user-images.githubusercontent.com/4392003/120874257-b9ced300-c55a-11eb-84dd-721cb7ac73ab.png"> Test Plan: Imported from OSS Reviewed By: Lilyjjo Differential Revision: D29050628 Pulled By: SplitInfinity fbshipit-source-id: c17ab0100a9d63e7af8da7a618143cedbd0a5872	2021-06-10 19:52:45 -07:00
Meghan Lele	a3db8e0a26	[docs] Add torch.package documentation preamble (#59491 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59491 Summary This commit adds a preamble to the `torch.package` documentation page that explains briefly what `torch.package` is. Test Plan Continuous integration. <img width="881" alt="Captura de Pantalla 2021-06-04 a la(s) 3 57 01 p m" src="https://user-images.githubusercontent.com/4392003/120872203-d535e000-c552-11eb-841d-b38df19bc992.png"> Test Plan: Imported from OSS Reviewed By: Lilyjjo Differential Revision: D29050630 Pulled By: SplitInfinity fbshipit-source-id: 70a3fd43f076751c6ea83be3ead291686c641158	2021-06-10 19:51:37 -07:00
Rohan Varma	2f395f3b54	[reland] Document debugability features in torch.distributed (#59726 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59726 Reland of https://github.com/pytorch/pytorch/pull/59604 with indentation fix ghstack-source-id: 130979356 Test Plan: ci Reviewed By: SciPioneer Differential Revision: D29001923 fbshipit-source-id: 225d9dc5054c223b453f3b39749e2b62f61b9a2c	2021-06-09 16:40:11 -07:00
Luca Wehrstedt	f1786b293d	Revert D28972444: [pytorch][PR] Document debugability features in torch.distributed Test Plan: revert-hammer Differential Revision: D28972444 (`a9d2810817`) Original commit changeset: da5e8ee84f0d fbshipit-source-id: 94d3b3b75ddec74ea5b2b76f6a7519dc921ee2a7	2021-06-09 03:04:36 -07:00
Rohan Varma	a9d2810817	Document debugability features in torch.distributed (#59604 ) Summary: Adds comprehensive documentation around debugability features added to `torch.distributed` recently, including the `monitored_barrier` and TORCH_DISTRIBUTED_DEBUG env variable. ![dist_one](https://user-images.githubusercontent.com/8039770/121102672-0f052180-c7b3-11eb-974c-81dbbe102cb6.png) ![dist_two](https://user-images.githubusercontent.com/8039770/121102734-39ef7580-c7b3-11eb-94f7-c75469351440.png) Pull Request resolved: https://github.com/pytorch/pytorch/pull/59604 Reviewed By: jbschlosser, SciPioneer Differential Revision: D28972444 Pulled By: rohan-varma fbshipit-source-id: da5e8ee84f0d6f252c703c4d70ff2a0d5817cc4e	2021-06-08 23:52:19 -07:00
Jeffrey Wan	f52e202840	Add warning when accessing Tensor::grad() in the C++ API (#59362 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/35379 - Adds `retains_grad` attribute backed by cpp as a native function. The python bindings for the function are skipped to be consistent with `is_leaf`. - Tried writing it without native function, but the jit test `test_tensor_properties` seems to require that it be a native function (or alternatively maybe it could also work if we manually add a prim implementation?). - Python API now uses `retain_grad` implementation from cpp Pull Request resolved: https://github.com/pytorch/pytorch/pull/59362 Reviewed By: jbschlosser Differential Revision: D28969298 Pulled By: soulitzer fbshipit-source-id: 335f2be50b9fb870cd35dc72f7dadd6c8666cc02	2021-06-08 19:43:21 -07:00
James Reed	02d380450d	[FX][docs][EZ] Fix link to fuser example (#59670 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59670 Test Plan: Imported from OSS Reviewed By: jansel Differential Revision: D28975704 Pulled By: jamesr66a fbshipit-source-id: 2fb759224b5b1ecc62c0ab26563d2a35ed422794	2021-06-08 17:32:55 -07:00
Vasiliy Kuznetsov	dafa4b3517	quantization: improve documentation on natively supported backends (#58925 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58925 Cleans up documentation on natively supported backends. In particular: * adds a section title * deduplicates information about fbgemm/qnnpack * clarifies what `torch.backends.quantized.engine` does * adds code samples with default settings for `fbgemm` and `qnnpack` Test Plan: Imported from OSS Reviewed By: jerryzh168 Differential Revision: D28681840 Pulled By: vkuzo fbshipit-source-id: 51a6ab66934f657553351f6c84a638fd5f7b4e12	2021-06-07 17:29:03 -07:00
Thomas J. Fan	6ff001c125	DOC Improve documentation for LayerNorm (#59178 ) Summary: Closes https://github.com/pytorch/pytorch/issues/51455 I think the current implementation is aggregating over the correct dimensions. The shape of `normalized_shape` is only used to determine the dimensions to aggregate over. The actual values of `normalized_shape` are used when `elementwise_affine=True` to initialize the weights and biases. This PR updates the docstring to clarify how `normalized_shape` is used. Here is a short script comparing the implementations for tensorflow and pytorch:
```python
import numpy as np
import torch
import torch.nn as nn
import tensorflow as tf
from tensorflow.keras.layers import LayerNormalization

rng = np.random.RandomState()
x = rng.randn(10, 20, 64, 64).astype(np.float32)
# slightly non-trivial
x[:, :10, ...] = x[:, :10, ...] * 10 + 20
x[:, 10:, ...] = x[:, 10:, ...] * 30 - 100

# Tensorflow Layer norm
x_tf = tf.convert_to_tensor(x)
layer_norm_tf = LayerNormalization(axis=[-3, -2, -1], epsilon=1e-5)
output_tf = layer_norm_tf(x_tf)
output_tf_np = output_tf.numpy()

# PyTorch Layer norm
x_torch = torch.as_tensor(x)
layer_norm_torch = nn.LayerNorm([20, 64, 64], elementwise_affine=False)
output_torch = layer_norm_torch(x_torch)
output_torch_np = output_torch.detach().numpy()

# check tensorflow and pytorch
torch.testing.assert_allclose(output_tf_np, output_torch_np)

# manual computation
manual_output = ((x_torch - x_torch.mean(dim=(-3, -2, -1), keepdims=True)) /
                 (x_torch.var(dim=(-3, -2, -1), keepdims=True, unbiased=False) + 1e-5).sqrt())
torch.testing.assert_allclose(output_torch, manual_output)
```
To get to the layer normalization as shown here: <img width="157" alt="Screen Shot 2021-05-29 at 2 13 52 PM" src="https://user-images.githubusercontent.com/5402633/120080691-1e37f100-c088-11eb-9060-4f263e4cd093.png"> one needs to pass in a `normalized_shape` of length `x.dim() - 1` containing the sizes of the channel dimension and all spatial dimensions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59178 Reviewed By: ejguan Differential Revision: D28931877 Pulled By: jbschlosser fbshipit-source-id: 193e05205b9085bb190c221428c96d2ca29f2a70	2021-06-07 14:34:10 -07:00
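The LayerNorm commit above says that the length of `normalized_shape` selects the trailing dimensions to aggregate over. That relationship can be sketched without any framework. Both helper names below are hypothetical, and the flat-list normalization covers only the `elementwise_affine=False` case with the biased variance, as in the manual computation in the commit message.

```python
import math


def reduction_dims(input_shape, normalized_shape):
    """Return the trailing dimensions that normalization aggregates over."""
    n = len(normalized_shape)
    if n == 0 or tuple(input_shape[-n:]) != tuple(normalized_shape):
        raise ValueError("normalized_shape must match the trailing input dims")
    return tuple(range(len(input_shape) - n, len(input_shape)))


def layer_norm_flat(values, eps=1e-5):
    """Normalize one group of values using the biased (population) variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]
```

For an input of shape `(10, 20, 64, 64)`, passing `normalized_shape=(20, 64, 64)` reduces over dimensions 1, 2 and 3, which corresponds to `axis=[-3, -2, -1]` in the script from the commit message.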
anjali411	3607478ecd	Conjugate View (#54987 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54987 Based off of the prototypes by ezyang (https://github.com/pytorch/pytorch/pull/44799) and bdhirsh (https://github.com/pytorch/pytorch/pull/43702). Here's a summary of the changes in this PR: This PR adds a new dispatch key called Conjugate. This enables us to make the conjugate operation a view and leverage the specialized library functions that fast path with the hermitian operation (conj + transpose).
1. Conjugate operation will now return a view with conj bit (1) for complex tensors and returns self for non-complex tensors as before. This also means `torch.view_as_real` will no longer be a view on conjugated complex tensors and is hence disabled. To fill the gap, we have added `torch.view_as_real_physical` which would return the real tensor agnostic of the conjugate bit on the input complex tensor. The information about conjugation on the old tensor can be obtained by calling `.is_conj()` on the new tensor.
2. NEW API:
a) `.conj()` -- now returning a view.
b) `.conj_physical()` -- does the physical conjugate operation. If the conj bit for input was set, you'd get `self.clone()`, else you'll get a new tensor with conjugated value in its memory.
c) `.conj_physical_()`, and `out=` variant
d) `.resolve_conj()` -- materializes the conjugation. Returns self if the conj bit is unset, else returns a new tensor with conjugated values and conj bit set to 0.
e) `.resolve_conj_()` -- in-place version of (d)
f) `view_as_real_physical` -- as described in (1), it's functionally the same as `view_as_real`, just that it doesn't error out on conjugated tensors.
g) `view_as_real` -- existing function, but now errors out on conjugated tensors.
3. Conjugate Fallback
a) Vast majority of PyTorch functions would currently use this fallback when they are called on a conjugated tensor.
b) This fallback is well equipped to handle the following cases:
- functional operation e.g., `torch.sin(input)`
- Mutable inputs and in-place operations e.g., `tensor.add_(2)`
- out-of-place operation e.g., `torch.sin(input, out=out)`
- Tensorlist input args
- NOTE: Meta tensors don't work with conjugate fallback.
4. Autograd
a) `resolve_conj()` is an identity function w.r.t. autograd
b) Everything else works as expected.
5. Testing:
a) All method_tests run with conjugate view tensors.
b) OpInfo tests that run with conjugate views
- test_variant_consistency_eager/jit
- gradcheck, gradgradcheck
- test_conj_views (that only run for `torch.cfloat` dtype)
NOTE: functions like `empty_like`, `zeros_like`, `randn_like`, `clone` don't propagate the conjugate bit.
Follow up work: 1. conjugate view RFC 2. Add neg bit to re-enable view operation on conjugated tensors 3. Update linalg functions to call into specialized functions that fast path with the hermitian operation.
Test Plan: Imported from OSS Reviewed By: VitalyFedyunin Differential Revision: D28227315 Pulled By: anjali411 fbshipit-source-id: acab9402b9d6a970c6d512809b627a290c8def5f	2021-06-04 14:12:41 -07:00
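The lazy conjugation semantics described in the Conjugate View commit above can be modelled with a toy class. This is a sketch of the idea only: `LazyConjTensor` is a hypothetical pure-Python stand-in and does not reflect how PyTorch implements the dispatch key or its storage. The method names mirror those listed in the commit message.

```python
class LazyConjTensor:
    """Toy 1-D complex tensor carrying a lazy conjugate bit."""

    def __init__(self, storage, conj_bit=False):
        self.storage = storage    # list of complex numbers, possibly shared
        self.conj_bit = conj_bit  # True means "read values as conjugated"

    def is_conj(self):
        return self.conj_bit

    def conj(self):
        # View: share the storage and flip the bit, no data is copied.
        return LazyConjTensor(self.storage, not self.conj_bit)

    def tolist(self):
        # Logical values, applying the pending conjugation on read.
        return [v.conjugate() if self.conj_bit else v for v in self.storage]

    def resolve_conj(self):
        # Materialize: return self if the bit is unset, else a fresh tensor.
        if not self.conj_bit:
            return self
        return LazyConjTensor(self.tolist(), False)

    def conj_physical(self):
        # Always produce new memory holding the conjugated logical values.
        return LazyConjTensor([v.conjugate() for v in self.tolist()], False)
```

In this model `x.conj()` is constant-time and shares storage with `x`, while `resolve_conj()` and `conj_physical()` are the points where conjugated values are actually written to new memory.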


1399 Commits