pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Anthony Barbier	ce9e27a0fc	Add new keys for Graphcore IPU (DispatchKey / Backend / DeviceType) We need a key to register our out of tree backend: https://github.com/graphcore/poptorch Pull Request resolved: https://github.com/pytorch/pytorch/pull/74763 Approved by: https://github.com/bdhirsh	2022-04-07 17:18:45 +00:00
Edward Z. Yang	31c86625cc	__torch_function__ mode Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/75154 Approved by: https://github.com/albanD, https://github.com/zou3519	2022-04-07 02:23:29 +00:00
Peter Bell	1ab03a0f6f	Deprecate `__torch_function__` as instance method in C++ Ref #63767 This has already been deprecated in the Python code for a long time, but was never deprecated in the C++ API, so it's possible users might not have had sufficient warning yet. Pull Request resolved: https://github.com/pytorch/pytorch/pull/74829 Approved by: https://github.com/ezyang	2022-04-06 02:28:00 +00:00
Mikayla Gawarecki	e9a8e6f74a	Add include_self flag to scatter_reduce Pull Request resolved: https://github.com/pytorch/pytorch/pull/74607 Approved by: https://github.com/cpuhrsch	2022-04-05 16:31:39 +00:00
Peter Bell	bf16552617	Restore TestTorchFunctionOverride Fixes #74122 This re-enables TestTorchFunctionOverride and fixes a bunch of test failures that had crept in while it was disabled. Pull Request resolved: https://github.com/pytorch/pytorch/pull/74202 Approved by: https://github.com/ezyang	2022-04-04 01:26:20 +00:00
Mikayla Gawarecki	2bfa018462	[BC-breaking] Use ScatterGatherKernel for scatter_reduce (CPU-only) (#74226 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74226 Update signature of `scatter_reduce_` to match `scatter_/scatter_add_` `Tensor.scatter_reduce_(int64 dim, Tensor index, Tensor src, str reduce)` - Add new reduction options in ScatterGatherKernel.cpp and update `scatter_reduce` to call into the cpu kernel for `scatter.reduce` - `scatter_reduce` now has the same shape constraints as `scatter_` and `scatter_add_` - Migrate `test/test_torch.py:test_scatter_reduce` to `test/test_scatter_gather_ops.py` Test Plan: Imported from OSS Reviewed By: ngimel Differential Revision: D35222842 Pulled By: mikaylagawarecki fbshipit-source-id: 84930add2ad30baf872c495251373313cb7428bd (cherry picked from commit 1b45139482e22eb0dc8b6aec2a7b25a4b58e31df)	2022-04-01 05:57:45 +00:00
Sherlockk Huang	bbf7e159e0	Implement torch.special.log_ndtr Implements torch.special.log_ndtr Issue: https://github.com/pytorch/pytorch/issues/50345 TODO: - [x] adding proper reference to scipy implementation - [x] double check if the changes in test/test_unary_ufuncs.py is really necessary - [x] check setting for UnaryUfuncInfo cc: @kshitij12345 @mruberry Pull Request resolved: https://github.com/pytorch/pytorch/pull/74795 Approved by: https://github.com/anjali411	2022-03-29 23:13:37 +00:00
Scott Wolchok	f9d0bc5338	[PyTorch] Delete NestedTensor Python wrapper (#74691 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74691 The wrapper just called through to methods on the underlying Tensor. ghstack-source-id: 152433754 Test Plan: existing tests Reviewed By: ezyang Differential Revision: D34689789 fbshipit-source-id: cf53476780cf3ed00a3aa4add441300bfe8e27ce (cherry picked from commit 5a9e5eb6bc13eb30be6e3c3bc4ac954c92704198)	2022-03-29 19:13:40 +00:00
Christian Puhrsch	e55b73d65a	Add strided layout support for to_dense Fixes #59958 Pull Request resolved: https://github.com/pytorch/pytorch/pull/74486 Approved by: https://github.com/pearu, https://github.com/suo	2022-03-29 00:12:48 +00:00
Christian Puhrsch	7fe0b6a5cd	mul(sparse_csr, sparse_csr) using mul(sparse, sparse) Basic fallback implementation. Let's make this faster once used. NOTE: This is stacked on top of https://github.com/pytorch/pytorch/pull/74294 Pull Request resolved: https://github.com/pytorch/pytorch/pull/74266 Approved by: https://github.com/pearu, https://github.com/malfet	2022-03-25 17:10:33 +00:00
Edward Z. Yang	a5b848aec1	Use has_torch_function_unary instead of manual type test. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/74278 Approved by: https://github.com/albanD	2022-03-17 02:14:40 +00:00
Scott Wolchok	d4a4430059	[PyTorch] Add Tensor.is_nested (#73999 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73999 Seems to be the typical way to detect a flavor of TensorImpl. ghstack-source-id: 151440167 Test Plan: Existing tests? Reviewed By: ezyang Differential Revision: D34665269 fbshipit-source-id: 5081a00928933e0c5252eeddca43bae0b026013d (cherry picked from commit 7cf62a3f69f158a33c5108f7e96ea4c5520f0f15)	2022-03-16 17:04:30 +00:00
Edward Z. Yang	35cfa74f97	Add a default implementation of __torch_dispatch__ I was working on an explanation of how to call into the "super" implementation of some given ATen operation inside of __torch_dispatch__ (https://github.com/albanD/subclass_zoo/blob/main/trivial_tensors.py) and I kept thinking to myself "Why doesn't just calling super() on __torch_dispatch__ work"? Well, after this patch, it does! The idea is if you don't actually unwrap the input tensors, you can call super().__torch_dispatch__ to get at the original behavior. Internally, this is implemented by disabling PythonKey and then redispatching. This implementation of disabled_torch_dispatch is not /quite/ right, and some reasons why are commented in the code. There is then some extra work I have to do to make sure we recognize disabled_torch_dispatch as the "default" implementation (so we don't start slapping PythonKey on all tensors, including base Tensors), which is modeled the same way as how disabled_torch_function is done. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/73684 Approved by: albanD	2022-03-03 20:19:33 +00:00
Nikita Shulga	cfb6c942fe	`scatter_reduce` documentation (#73125 ) Summary: Reland of https://github.com/pytorch/pytorch/issues/68580 (which were milestoned for 1.11) plus partial revert of https://github.com/pytorch/pytorch/pull/72543 Pull Request resolved: https://github.com/pytorch/pytorch/pull/73125 Reviewed By: bdhirsh Differential Revision: D34355217 Pulled By: malfet fbshipit-source-id: 325ecdeaf53183d653b44ee5e6e8839ceefd9200 (cherry picked from commit `71db31748a`)	2022-02-22 19:33:46 +00:00
Scott Wolchok	79a216ce57	Move native MHA code out of PyTorch core (#72944 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72944 Doesn't make sense to develop it in core right now. ghstack-source-id: 149456040 Test Plan: CI run MHA benchmark in benchmark_transformers.py to make sure it doesn't crash Reviewed By: zrphercule Differential Revision: D34283104 fbshipit-source-id: 4f0c7a6bc066f938ceac891320d4cf4c3f8a9cd6 (cherry picked from commit `b9df65e97c`)	2022-02-18 21:34:06 +00:00
Brian Hirsh	f87f753bb9	avoiding adding some functions to the public python API before 1.11 release (#72543 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72543 Test Plan: Imported from OSS Reviewed By: ejguan Differential Revision: D34085724 Pulled By: bdhirsh fbshipit-source-id: 941d5a90a6fa5328268d623e0e2b01577e4132ca (cherry picked from commit `6676a0c79a`)	2022-02-14 19:49:01 +00:00
Ryan Spring	4f8b986e28	Implement Tanh Gelu Approximation (#61439 ) Summary: 1. Implements https://github.com/pytorch/pytorch/issues/39853 2. Adds `approximate` string flag to Gelu 3. Enables Tanh Gelu approximation 4. Adds double backward support for Gelu 5. Enable Tanh Gelu in NvFuser ``` def gelu(x, approximate : str = 'none'): if approximate == 'tanh': # sqrt(2/pi) = 0.7978845608028654 return 0.5 * x * (1.0 + torch.tanh(0.7978845608028654 * (x + 0.044715 * torch.pow(x, 3.0)))) else: return x * normcdf(x) ``` Linking XLA PR - https://github.com/pytorch/xla/pull/3039 Pull Request resolved: https://github.com/pytorch/pytorch/pull/61439 Reviewed By: VitalyFedyunin Differential Revision: D33894937 Pulled By: jbschlosser fbshipit-source-id: b65e8fb6ea66168af8f34f45ed50e92737a33851 (cherry picked from commit `6e986f91a9`)	2022-02-14 03:40:32 +00:00
Brian Muse	8bf3179f6e	#71946 Remove Python 3.6 references (#72211 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/71946 This commit removes some bits of code that were hard coded for Python 3.6 support from the `.circleci` and `torch` folders. It should only be merged if https://github.com/pytorch/pytorch/issues/66462 is complete. Pull Request resolved: https://github.com/pytorch/pytorch/pull/72211 Reviewed By: dagitses, seemethere Differential Revision: D33982604 Pulled By: musebc fbshipit-source-id: 8f453bf9909df615addd59538adb369c65484044 (cherry picked from commit `944a9970fe`)	2022-02-08 03:46:20 +00:00
Rui Zhu	541773d268	Make native MHA private for release 1.11 (#72200 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72200 This op should still remain private in release 1.11; add an underscore before the op name to make that happen. Test Plan: buck run mode/opt -c fbcode.enable_gpu_sections=true pytext/fb/tools:benchmark_transformers -- mha --batch-size=10 --max-sequence-length=16 Reviewed By: bdhirsh Differential Revision: D33952191 fbshipit-source-id: 3f8525ac9c23bb286f51476342113ebc31b8ed59 (cherry picked from commit `6e41bfa4fc`)	2022-02-03 04:15:18 +00:00
Nikita Shulga	74c44ba9d6	Revert D33850228: [pytorch][PR] Implement Tanh Gelu Approximation Test Plan: revert-hammer Differential Revision: D33850228 (`23d03025dc`) Original commit changeset: 3cc33fb298e4 Original Phabricator Diff: D33850228 (`23d03025dc`) fbshipit-source-id: 9436e7df73c2b2e2011f321674f24973316d3692 (cherry picked from commit `c9efb58223`)	2022-01-31 17:44:19 +00:00
Ryan Spring	23d03025dc	Implement Tanh Gelu Approximation (#61439 ) Summary: 1. Implements https://github.com/pytorch/pytorch/issues/39853 2. Adds `approximate` string flag to Gelu 3. Enables Tanh Gelu approximation 4. Adds double backward support for Gelu 5. Enable Tanh Gelu in NvFuser ``` def gelu(x, approximate : str = 'none'): if approximate == 'tanh': # sqrt(2/pi) = 0.7978845608028654 return 0.5 * x * (1.0 + torch.tanh(0.7978845608028654 * (x + 0.044715 * torch.pow(x, 3.0)))) else: return x * normcdf(x) ``` Linking XLA PR - https://github.com/pytorch/xla/pull/3039 Pull Request resolved: https://github.com/pytorch/pytorch/pull/61439 Reviewed By: cpuhrsch Differential Revision: D33850228 Pulled By: jbschlosser fbshipit-source-id: 3cc33fb298e480d7ecc5c67716da019d60c6ab33 (cherry picked from commit `3a53b3e94f`)	2022-01-31 17:07:45 +00:00
Joel Schlosser	cb823d9f07	Revert D33744717: [pytorch][PR] Implement Tanh Gelu Approximation Test Plan: revert-hammer Differential Revision: D33744717 (`f499ab9cef`) Original commit changeset: d64532a562ed Original Phabricator Diff: D33744717 (`f499ab9cef`) fbshipit-source-id: 396c3f63de5865f894dbc353d0790a01a624be93 (cherry picked from commit `e9fb2d1db1`)	2022-01-28 18:35:01 +00:00
Ryan Spring	f499ab9cef	Implement Tanh Gelu Approximation (#61439 ) Summary: 1. Implements https://github.com/pytorch/pytorch/issues/39853 2. Adds `approximate` string flag to Gelu 3. Enables Tanh Gelu approximation 4. Adds double backward support for Gelu 5. Enable Tanh Gelu in NvFuser ``` def gelu(x, approximate : str = 'none'): if approximate == 'tanh': # sqrt(2/pi) = 0.7978845608028654 return 0.5 * x * (1.0 + torch.tanh(0.7978845608028654 * (x + 0.044715 * torch.pow(x, 3.0)))) else: return x * normcdf(x) ``` Linking XLA PR - https://github.com/pytorch/xla/pull/3039 Pull Request resolved: https://github.com/pytorch/pytorch/pull/61439 Reviewed By: mikaylagawarecki Differential Revision: D33744717 Pulled By: jbschlosser fbshipit-source-id: d64532a562ed53247bb4fa52bb16722634d5c187 (cherry picked from commit `4713dd9cca`)	2022-01-28 16:59:09 +00:00
Mikayla Gawarecki	fdec94504f	Rename _scatter_reduce to scatter_reduce and make it unstructured (#71787 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71787 Test Plan: Imported from OSS Reviewed By: mikaylagawarecki Differential Revision: D33778524 Pulled By: cpuhrsch fbshipit-source-id: 55a330e1c2227c0eaaa1c0d2f9205a4dee24a11b (cherry picked from commit `6e4a8a91da`)	2022-01-27 16:29:13 +00:00
lezcano	108b37db84	[Array API] Add linalg.diagonal (#70599 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70599 This PR adds `linalg.diagonal` following the Array API: https://data-apis.org/array-api/latest/extensions/linear_algebra_functions.html#linalg-diagonal-x-axis1-0-axis2-1-offset-0 Fixes https://github.com/pytorch/pytorch/issues/62813 cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano rgommers pmeier asmeurer leofang AnirudhDagar asi1024 emcastillo kmaehashi Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D33760506 Pulled By: mruberry fbshipit-source-id: e32c3490321d8c3f31b3bb538bc1f72b39bd2854 (cherry picked from commit `44f41f8e39`)	2022-01-26 08:08:32 +00:00
mingfeima	054b90f0d6	add channels last support for ChannelShuffle (#50247 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50247 Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D26007052 Pulled By: VitalyFedyunin fbshipit-source-id: 08f737d64a65791c8002ffd56b79b02cf14d6159	2022-01-14 11:55:21 -08:00
Rui Zhu	9267fd8d73	[WIP] [ATen] Add native_multi_attention_self_attention CPU + GPU implementation (#70649 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70649 As described in https://fb.quip.com/oxpiA1uDBjgP This implements the first parts of the RFC, and is a rough draft showing the approach. The idea is that for the first cut we can maintain very close (identical I believe in this diff) numerical equivalence to the existing nn.MHA implementation, which is what this diff attempts to do. In subsequent implementations, once we have a working and adopted native self-attention implementation, we could then explore alternative implementations, etc. The current implementation is similar to existing dedicated implementations such as LightSeq/FasterTransformer/DeepSpeed, and for MHA on both CPUs and GPUs is between 1.2x and 2x faster depending on the setting. It makes some approximations/restrictions (doesn't handle masking in masked softmax, etc), but these shouldn't materially impact performance. This does the first few items: * add native_multi_head_attention(...) , native_multi_head_attention_backward(..) to native_functions.yaml * Implement native_multi_head_attention(..) on GPU, extracting bits and pieces out of LS/DS/FT as appropriate * Implement native_multi_head_attention(..) on CPU The backward implementation is still WIP, but the idea would be to: * Hook these up in derivatives.yaml Implement native_multi_head_attention_backward(..) on GPU, extracting out bits and pieces out of LS/DS (not FT since it’s inference only) * Implement native_multi_head_attention_backward(..) on CPU * In torch.nn.functional.multi_head_attention_forward `23321ba7a3/torch/nn/functional.py (L4953)`, add some conditionals to check if we are being called in a BERT/ViT-style encoder fashion, and invoke the native function directly. 
Test Plan: TODO Reviewed By: mikekgfb Differential Revision: D31829981 fbshipit-source-id: c430344d91ba7a5fbee3138e50b3e62efbb33d96	2022-01-08 21:50:41 -08:00
lezcano	a35b4b49d2	Add linalg.lu_factor (#66933 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66933 This PR exposes `torch.lu` as `torch.linalg.lu_factor` and `torch.linalg.lu_factor_ex`. This PR also adds support for matrices with zero elements both in the size of the matrix and the batch. Note that this function simply returns empty tensors of the correct size in this case. We add a test and an OpInfo for the new function. This PR also adds documentation for this new function in line with the documentation in the rest of `torch.linalg`. Fixes https://github.com/pytorch/pytorch/issues/56590 Fixes https://github.com/pytorch/pytorch/issues/64014 cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano Test Plan: Imported from OSS Reviewed By: gchanan Differential Revision: D32834069 Pulled By: mruberry fbshipit-source-id: 51ef12535fa91d292f419acf83b800b86ee9c7eb	2022-01-05 20:32:12 -08:00
Heitor Schueroff	34c49d3d3b	Document torch.quantile interpolation kwarg (#70637 ) Summary: clone of https://github.com/pytorch/pytorch/pull/59397 This PR documents the interpolation kwarg parameter added in https://github.com/pytorch/pytorch/issues/49267. Now that the forward compatibility period is over, we can expose this parameter. Pull Request resolved: https://github.com/pytorch/pytorch/pull/70637 Reviewed By: jbschlosser Differential Revision: D33411707 Pulled By: anjali411 fbshipit-source-id: f5f2d0a6739b3a855bbdf58fc671ac2f0342ce69	2022-01-05 11:02:13 -08:00
Joel Schlosser	e6c3aa3880	Remove backward ops for mkldnn convolution (#70467 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70467 Test Plan: Imported from OSS Reviewed By: mikaylagawarecki Differential Revision: D33342476 Pulled By: jbschlosser fbshipit-source-id: 9811d02b16adea0dd1dd2500261f4b3b294d2dee	2021-12-30 14:29:22 -08:00
anjali411	3e6164449f	Add efficient zero tensors (#64837 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64837 Test Plan: Imported from OSS Reviewed By: gchanan Differential Revision: D32834987 Pulled By: anjali411 fbshipit-source-id: 20ea08ade0db0044ca633d9c1a117a6a2e65d1fd	2021-12-08 10:37:39 -08:00
Mark Richardson	834bd3134e	Back out "Add efficient zero tensors" (#69327 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69327 Original commit changeset: d44096d88265 Original Phabricator Diff: D32144240 (`668574af4a`) Test Plan: CI original diff failed 175 builds in CI Reviewed By: airboyang, anjali411 Differential Revision: D32809407 fbshipit-source-id: c7c8e69bcee0274992e2d5da901f035332e60071	2021-12-02 19:11:41 -08:00
anjali411	668574af4a	Add efficient zero tensors (#64837 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64837 Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D32144240 Pulled By: anjali411 fbshipit-source-id: d44096d882657c7f9270a16636900e0b73cefa40	2021-12-02 08:47:45 -08:00
Mike Ruberry	6ae34ea6f8	Revert D32521980: Add linalg.lu_factor Test Plan: revert-hammer Differential Revision: D32521980 (`b10929a14a`) Original commit changeset: 26a49ebd87f8 fbshipit-source-id: e1a6bb9c2ece9bd78190fe17e16a46e3358c5c82	2021-11-28 17:22:15 -08:00
lezcano	b10929a14a	Add linalg.lu_factor (#66933 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66933 This PR exposes `torch.lu` as `torch.linalg.lu_factor` and `torch.linalg.lu_factor_ex`. This PR also adds support for matrices with zero elements both in the size of the matrix and the batch. Note that this function simply returns empty tensors of the correct size in this case. We add a test and an OpInfo for the new function. This PR also adds documentation for this new function in line with the documentation in the rest of `torch.linalg`. Fixes https://github.com/pytorch/pytorch/issues/56590 Fixes https://github.com/pytorch/pytorch/issues/64014 cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D32521980 Pulled By: mruberry fbshipit-source-id: 26a49ebd87f8a41472f8cd4e9de4ddfb7f5581fb	2021-11-27 17:52:48 -08:00
lezcano	b46c89d950	Add linalg.solve_triangular (#63568 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63568 This PR adds the first solver with structure to `linalg`. This solver has an API compatible with that of `linalg.solve` preparing these for a possible future merge of the APIs. The new API: - Just returns the solution, rather than the solution and a copy of `A` - Removes the confusing `transpose` argument and replaces it by a correct handling of conj and strides within the call - Adds a `left=True` kwarg. This can be achieved via transposes of the inputs and the result, but it's exposed for convenience. This PR also implements a dataflow that minimises the number of copies needed before calling LAPACK / MAGMA / cuBLAS and takes advantage of the conjugate and neg bits. This algorithm is implemented for `solve_triangular` (which, for this, is the most complex of all the solvers due to the `upper` parameters). Once more solvers are added, we will factor out this calling algorithm, so that all of them can take advantage of it. Given the complexity of this algorithm, we implement some thorough testing. We also added tests for all the backends, which was not done before. We also add forward AD support for `linalg.solve_triangular` and improve the docs of `linalg.solve_triangular`. We also fix a few issues with those of `torch.triangular_solve`. Resolves https://github.com/pytorch/pytorch/issues/54258 Resolves https://github.com/pytorch/pytorch/issues/56327 Resolves https://github.com/pytorch/pytorch/issues/45734 cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano Test Plan: Imported from OSS Reviewed By: jbschlosser Differential Revision: D32588230 Pulled By: mruberry fbshipit-source-id: 69e484849deb9ad7bb992cc97905df29c8915910	2021-11-22 12:41:06 -08:00
jiej	ca92111758	Add native_dropout (#63937 ) Summary: Adds native_dropout to have a reasonable target for torchscript in auto diff. native_dropout has scale and train as arguments in its signature, this makes native_dropout more consistent with other operators and removes conditionals in the autodiff definition. cc gmagogsfm Pull Request resolved: https://github.com/pytorch/pytorch/pull/63937 Reviewed By: mruberry Differential Revision: D32477657 Pulled By: ngimel fbshipit-source-id: d37b137a37acafa50990f60c77f5cea2818454e4	2021-11-18 19:41:10 -08:00
Jane Xu	9f4e004abd	Revert D32283178: Add linalg.solve_triangular Test Plan: revert-hammer Differential Revision: D32283178 (`0706607abc`) Original commit changeset: deb672e6e52f fbshipit-source-id: d2a3421292147426cc61c2f063b721acf9004755	2021-11-18 14:46:10 -08:00
lezcano	0706607abc	Add linalg.solve_triangular (#63568 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63568 This PR adds the first solver with structure to `linalg`. This solver has an API compatible with that of `linalg.solve` preparing these for a possible future merge of the APIs. The new API: - Just returns the solution, rather than the solution and a copy of `A` - Removes the confusing `transpose` argument and replaces it by a correct handling of conj and strides within the call - Adds a `left=True` kwarg. This can be achieved via transposes of the inputs and the result, but it's exposed for convenience. This PR also implements a dataflow that minimises the number of copies needed before calling LAPACK / MAGMA / cuBLAS and takes advantage of the conjugate and neg bits. This algorithm is implemented for `solve_triangular` (which, for this, is the most complex of all the solvers due to the `upper` parameters). Once more solvers are added, we will factor out this calling algorithm, so that all of them can take advantage of it. Given the complexity of this algorithm, we implement some thorough testing. We also added tests for all the backends, which was not done before. We also add forward AD support for `linalg.solve_triangular` and improve the docs of `linalg.solve_triangular`. We also fix a few issues with those of `torch.triangular_solve`. Resolves https://github.com/pytorch/pytorch/issues/54258 Resolves https://github.com/pytorch/pytorch/issues/56327 Resolves https://github.com/pytorch/pytorch/issues/45734 cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano Test Plan: Imported from OSS Reviewed By: zou3519, JacobSzwejbka Differential Revision: D32283178 Pulled By: mruberry fbshipit-source-id: deb672e6e52f58b76536ab4158073927a35e43a8	2021-11-18 09:45:51 -08:00
Rok	952ca25daa	Sparse CSR: add `convert_indices_from_csr_to_coo` (#66774 ) Summary: This PR adds conversion from CSR to COO. Fixes https://github.com/pytorch/pytorch/issues/56959 cc nikitaved pearu cpuhrsch IvanYashchuk gchanan mruberry Pull Request resolved: https://github.com/pytorch/pytorch/pull/66774 Reviewed By: zou3519 Differential Revision: D32288415 Pulled By: cpuhrsch fbshipit-source-id: 683ba658dc46835fdf3c0e24645c0c2bb243b968	2021-11-17 22:28:30 -08:00
rusty1s	9807787135	`scatter_reduce` (#68115 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/63780 Basic functionality of a `scatter_reduce` algorithm with `reduce="sum"`: * `scatter_reduce` is named as `scatter_reduce2` due to compiling issues * It currently re-uses functionality from `scatter_add` * Tests are missing: WIP The error when the `scatter_reduce` naming is used: ``` In file included from aten/src/ATen/core/TensorBody.h:3, from ../aten/src/ATen/core/Tensor.h:3, from ../aten/src/ATen/DeviceGuard.h:4, from ../aten/src/ATen/ATen.h:11, from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1: aten/src/ATen/Operators.h:13949:18: error: redefinition of ‘struct at::_ops::scatter_reduce’ 13949 \| struct TORCH_API scatter_reduce { \| ^~~~~~~~~~~~~~ aten/src/ATen/Operators.h:13817:18: note: previous definition of ‘struct at::_ops::scatter_reduce’ 13817 \| struct TORCH_API scatter_reduce { \| ^~~~~~~~~~~~~~ aten/src/ATen/Operators.h:13960:18: error: redefinition of ‘struct at::_ops::scatter_reduce_out’ 13960 \| struct TORCH_API scatter_reduce_out { \| ^~~~~~~~~~~~~~~~~~ aten/src/ATen/Operators.h:13839:18: note: previous definition of ‘struct at::_ops::scatter_reduce_out’ 13839 \| struct TORCH_API scatter_reduce_out { \| ^~~~~~~~~~~~~~~~~~ In file included from ../aten/src/ATen/core/Tensor.h:3, from ../aten/src/ATen/DeviceGuard.h:4, from ../aten/src/ATen/ATen.h:11, from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1: aten/src/ATen/core/TensorBody.h: In member function ‘at::Tensor at::Tensor::scatter_reduce(int64_t, const at::Tensor&, c10::string_view, c10::optional<long int>) const’: aten/src/ATen/core/TensorBody.h:3976:83: error: cannot convert ‘c10::string_view’ {aka ‘c10::basic_string_view<char>’} to ‘const at::Tensor&’ 3976 \| return at::_ops::scatter_reduce::call(const_cast<Tensor&>(*this), dim, index, reduce, output_size); \| ^~~~~~ \| \| \| c10::string_view {aka c10::basic_string_view<char>} In file included from 
aten/src/ATen/core/TensorBody.h:3, from ../aten/src/ATen/core/Tensor.h:3, from ../aten/src/ATen/DeviceGuard.h:4, from ../aten/src/ATen/ATen.h:11, from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1: aten/src/ATen/Operators.h:13824:109: note: initializing argument 4 of ‘static at::Tensor at::_ops::scatter_reduce::call(const at::Tensor&, int64_t, const at::Tensor&, const at::Tensor&, c10::string_view)’ 13824 \| static at::Tensor call(const at::Tensor & self, int64_t dim, const at::Tensor & index, const at::Tensor & src, c10::string_view reduce); \| ~~~~~~~~~~~~~~~~~~~^~~ In file included from ../aten/src/ATen/ATen.h:15, from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1: aten/src/ATen/Functions.h: In function ‘at::Tensor at::scatter_reduce(const at::Tensor&, int64_t, const at::Tensor&, c10::string_view, c10::optional<long int>)’: aten/src/ATen/Functions.h:7119:61: error: cannot convert ‘c10::string_view’ {aka ‘c10::basic_string_view<char>’} to ‘const at::Tensor&’ 7119 \| return at::_ops::scatter_reduce::call(self, dim, index, reduce, output_size); \| ^~~~~~ \| \| \| c10::string_view {aka c10::basic_string_view<char>} In file included from aten/src/ATen/core/TensorBody.h:3, from ../aten/src/ATen/core/Tensor.h:3, from ../aten/src/ATen/DeviceGuard.h:4, from ../aten/src/ATen/ATen.h:11, from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1: aten/src/ATen/Operators.h:13824:109: note: initializing argument 4 of ‘static at::Tensor at::_ops::scatter_reduce::call(const at::Tensor&, int64_t, const at::Tensor&, const at::Tensor&, c10::string_view)’ 13824 \| static at::Tensor call(const at::Tensor & self, int64_t dim, const at::Tensor & index, const at::Tensor & src, c10::string_view reduce); \| ~~~~~~~~~~~~~~~~~~~^~~ In file included from ../aten/src/ATen/ATen.h:15, from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1: aten/src/ATen/Functions.h: In function ‘at::Tensor& at::scatter_reduce_out(at::Tensor&, const at::Tensor&, int64_t, const at::Tensor&, 
c10::string_view, c10::optional<long int>)’: aten/src/ATen/Functions.h:7124:65: error: cannot convert ‘c10::string_view’ {aka ‘c10::basic_string_view<char>’} to ‘const at::Tensor&’ 7124 \| return at::_ops::scatter_reduce_out::call(self, dim, index, reduce, output_size, out); \| ^~~~~~ \| \| \| c10::string_view {aka c10::basic_string_view<char>} In file included from aten/src/ATen/core/TensorBody.h:3, from ../aten/src/ATen/core/Tensor.h:3, from ../aten/src/ATen/DeviceGuard.h:4, from ../aten/src/ATen/ATen.h:11, from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1: aten/src/ATen/Operators.h:13846:111: note: initializing argument 4 of ‘static at::Tensor& at::_ops::scatter_reduce_out::call(const at::Tensor&, int64_t, const at::Tensor&, const at::Tensor&, c10::string_view, at::Tensor&)’ 13846 \| static at::Tensor & call(const at::Tensor & self, int64_t dim, const at::Tensor & index, const at::Tensor & src, c10::string_view reduce, at::Tensor & out); \| ~~~~~~~~~~~~~~~~~~~^~~ In file included from ../aten/src/ATen/ATen.h:15, from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1: aten/src/ATen/Functions.h: In function ‘at::Tensor& at::scatter_reduce_outf(const at::Tensor&, int64_t, const at::Tensor&, c10::string_view, c10::optional<long int>, at::Tensor&)’: aten/src/ATen/Functions.h:7129:65: error: cannot convert ‘c10::string_view’ {aka ‘c10::basic_string_view<char>’} to ‘const at::Tensor&’ 7129 \| return at::_ops::scatter_reduce_out::call(self, dim, index, reduce, output_size, out); \| ^~~~~~ \| \| \| c10::string_view {aka c10::basic_string_view<char>} In file included from aten/src/ATen/core/TensorBody.h:3, from ../aten/src/ATen/core/Tensor.h:3, from ../aten/src/ATen/DeviceGuard.h:4, from ../aten/src/ATen/ATen.h:11, from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1: aten/src/ATen/Operators.h:13846:111: note: initializing argument 4 of ‘static at::Tensor& at::_ops::scatter_reduce_out::call(const at::Tensor&, int64_t, const at::Tensor&, const 
at::Tensor&, c10::string_view, at::Tensor&)’ 13846 \| static at::Tensor & call(const at::Tensor & self, int64_t dim, const at::Tensor & index, const at::Tensor & src, c10::string_view reduce, at::Tensor & out); \| ~~~~~~~~~~~~~~~~~~~^~~ In file included from aten/src/ATen/NativeFunctions.h:6, from ../aten/src/ATen/TensorIndexing.h:12, from ../aten/src/ATen/ATen.h:20, from aten/src/ATen/native/cpu/CopyKernel.cpp.DEFAULT.cpp:1: aten/src/ATen/NativeMetaFunctions.h: At global scope: aten/src/ATen/NativeMetaFunctions.h:496:18: error: redefinition of ‘struct at::meta::structured_scatter_reduce’ 496 \| struct TORCH_API structured_scatter_reduce : public at::impl::MetaBase { \| ^~~~~~~~~~~~~~~~~~~~~~~~~ aten/src/ATen/NativeMetaFunctions.h:481:18: note: previous definition of ‘struct at::meta::structured_scatter_reduce’ 481 \| struct TORCH_API structured_scatter_reduce : public at::impl::MetaBase { \| ^~~~~~~~~~~~~~~~~~~~~~~~~ ninja: build stopped: subcommand failed. ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/68115 Reviewed By: albanD Differential Revision: D32488450 Pulled By: cpuhrsch fbshipit-source-id: 65e79c6d0555c0d5715535bb52aade8d5fcd9722	2021-11-17 19:53:12 -08:00
vfdev-5	3da2e09c9b	Added antialias flag to interpolate (CPU only, bilinear) (#65142 ) Summary: Description: - Added antialias flag to interpolate (CPU only) - forward and backward for bilinear mode - added tests ### Benchmarks <details> <summary> Forward pass, CPU. PTH interpolation vs PIL </summary> Cases: - PTH RGB 3 Channels, float32 vs PIL RGB uint8 (apply vs pears) - PTH 1 Channel, float32 vs PIL 1 Channel Float Code: https://gist.github.com/vfdev-5/b173761a567f2283b3c649c3c0574112 ``` # OMP_NUM_THREADS=1 python bench_interp_aa_vs_pillow.py Torch config: PyTorch built with: - GCC 9.3 - C++ Version: 201402 - OpenMP 201511 (a.k.a. OpenMP 4.5) - CPU capability usage: AVX2 - CUDA Runtime 11.1 - NVCC architecture flags: -gencode;arch=compute_75,code=sm_75 - CuDNN 8.0.5 - Build settings: BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_PYTORCH_QNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.0, USE_CUDA=1, USE_CUDNN=1, USE_EIGEN_FOR_BLAS=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=OFF, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=0, USE_OPENMP=ON, Num threads: 
1

[------------------------ Downsampling: torch.Size([1, 3, 906, 438]) -> (320, 196) ------------------------]
                                                  |  Reference, PIL 8.3.2, mode: RGB  |  1.10.0a0+git1e87d91
1 threads: -------------------------------------------------------------------------------------------------
      channels_first contiguous torch.float32     |                2.9                |          3.1
      channels_last non-contiguous torch.float32  |                2.6                |          3.6

Times are in milliseconds (ms).

[------------------------ Downsampling: torch.Size([1, 3, 906, 438]) -> (460, 220) ------------------------]
                                                  |  Reference, PIL 8.3.2, mode: RGB  |  1.10.0a0+git1e87d91
1 threads: -------------------------------------------------------------------------------------------------
      channels_first contiguous torch.float32     |                3.4                |          4.0
      channels_last non-contiguous torch.float32  |                3.4                |          4.8

Times are in milliseconds (ms).

[------------------------ Downsampling: torch.Size([1, 3, 906, 438]) -> (120, 96) -------------------------]
                                                  |  Reference, PIL 8.3.2, mode: RGB  |  1.10.0a0+git1e87d91
1 threads: -------------------------------------------------------------------------------------------------
      channels_first contiguous torch.float32     |                1.6                |          1.8
      channels_last non-contiguous torch.float32  |                1.6                |          1.9

Times are in milliseconds (ms).

[----------------------- Downsampling: torch.Size([1, 3, 906, 438]) -> (1200, 196) ------------------------]
                                                  |  Reference, PIL 8.3.2, mode: RGB  |  1.10.0a0+git1e87d91
1 threads: -------------------------------------------------------------------------------------------------
      channels_first contiguous torch.float32     |                9.0                |         11.3
      channels_last non-contiguous torch.float32  |                8.9                |         12.5

Times are in milliseconds (ms).

[----------------------- Downsampling: torch.Size([1, 3, 906, 438]) -> (120, 1200) ------------------------]
                                                  |  Reference, PIL 8.3.2, mode: RGB  |  1.10.0a0+git1e87d91
1 threads: -------------------------------------------------------------------------------------------------
      channels_first contiguous torch.float32     |                2.1                |          1.8
      channels_last non-contiguous torch.float32  |                2.1                |          3.4

Times are in milliseconds (ms).

[--------------- Downsampling: torch.Size([1, 1, 906, 438]) -> (320, 196) --------------]
                                 |  Reference, PIL 8.3.2, mode: F  |  1.10.0a0+git1e87d91
1 threads: ------------------------------------------------------------------------------
      contiguous torch.float32   |               1.2               |          1.0

Times are in milliseconds (ms).

[--------------- Downsampling: torch.Size([1, 1, 906, 438]) -> (460, 220) --------------]
                                 |  Reference, PIL 8.3.2, mode: F  |  1.10.0a0+git1e87d91
1 threads: ------------------------------------------------------------------------------
      contiguous torch.float32   |               1.4               |          1.3

Times are in milliseconds (ms).

[--------------- Downsampling: torch.Size([1, 1, 906, 438]) -> (120, 96) ---------------]
                                 |  Reference, PIL 8.3.2, mode: F  |  1.10.0a0+git1e87d91
1 threads: ------------------------------------------------------------------------------
      contiguous torch.float32   |              719.9              |        599.9

Times are in microseconds (us).

[-------------- Downsampling: torch.Size([1, 1, 906, 438]) -> (1200, 196) --------------]
                                 |  Reference, PIL 8.3.2, mode: F  |  1.10.0a0+git1e87d91
1 threads: ------------------------------------------------------------------------------
      contiguous torch.float32   |               3.7               |          3.5

Times are in milliseconds (ms).

[-------------- Downsampling: torch.Size([1, 1, 906, 438]) -> (120, 1200) --------------]
                                 |  Reference, PIL 8.3.2, mode: F  |  1.10.0a0+git1e87d91
1 threads: ------------------------------------------------------------------------------
      contiguous torch.float32   |              834.4              |        605.7

Times are in microseconds (us).
``` </details> Code is moved from torchvision: https://github.com/pytorch/vision/pull/4208 Pull Request resolved: https://github.com/pytorch/pytorch/pull/65142 Reviewed By: mrshenli Differential Revision: D32432405 Pulled By: jbschlosser fbshipit-source-id: b66c548347f257c522c36105868532e8bc1d4c6d	2021-11-17 09:10:15 -08:00
Thomas Metcalfe	ba16b1eca7	[numpy] Alias `arctan2` to `atan2` (#67010 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/65906 Adds an alias `arctan2` to improve numpy compatibility cc mruberry rgommers Pull Request resolved: https://github.com/pytorch/pytorch/pull/67010 Reviewed By: anjali411 Differential Revision: D32378998 Pulled By: mruberry fbshipit-source-id: 424c5c10c12b49c20ee83ccd109325c480b5b6cf	2021-11-16 09:41:09 -08:00
David Dang	f7366ca51b	implemented quantize_per_tensor_dynamic and added a corresponding test script (#68004 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68004 Test Plan: Imported from OSS Reviewed By: jerryzh168 Differential Revision: D32301792 Pulled By: dzdang fbshipit-source-id: f680557ba4736d095efc33e8c92111265f25aee0	2021-11-13 06:34:36 -08:00
Anirudh Dagar	b07a11929d	Array API: Add torch.linalg.cross (#63285 ) Summary: ### Create `linalg.cross` Fixes https://github.com/pytorch/pytorch/issues/62810 As discussed in the corresponding issue, this PR adds `cross` to the `linalg` namespace (Note: There is no method variant) which is slightly different in behaviour compared to `torch.cross`. Note: this is NOT an alias as suggested in mruberry's [https://github.com/pytorch/pytorch/issues/62810 comment](https://github.com/pytorch/pytorch/issues/62810#issuecomment-897504372) below > linalg.cross being consistent with the Python Array API (over NumPy) makes sense because NumPy has no linalg.cross. I also think we can implement linalg.cross without immediately deprecating torch.cross, although we should definitely refer users to linalg.cross. Deprecating torch.cross will require additional review. While it's not used often it is used, and it's unclear if users are relying on its unique behavior or not. The current default implementation of `torch.cross` is extremely weird and confusing. This has also been reported multiple times previously. (See https://github.com/pytorch/pytorch/issues/17229, https://github.com/pytorch/pytorch/issues/39310, https://github.com/pytorch/pytorch/issues/41850, https://github.com/pytorch/pytorch/issues/50273) - [x] Add `torch.linalg.cross` with default `dim=-1` - [x] Add OpInfo and other tests for `torch.linalg.cross` - [x] Add broadcasting support to `torch.cross` and `torch.linalg.cross` - [x] Remove out skip from `torch.cross` OpInfo - [x] Add docs for `torch.linalg.cross`. Improve docs for `torch.cross` mentioning `linalg.cross` and the difference between the two. Also adds a warning to `torch.cross`, that it may change in the future (we might want to deprecate it later) --- ### Additional Fixes to `torch.cross` - [x] Fix Doc for Tensor.cross - [x] Fix torch.cross in `torch/overrides.py` While working on `linalg.cross` I noticed these small issues with `torch.cross` itself. 
[Tensor.cross docs](https://pytorch.org/docs/stable/generated/torch.Tensor.cross.html) still mention the `dim=-1` default, which is actually wrong. It should be `dim=None` after the behaviour was updated in PR https://github.com/pytorch/pytorch/issues/17582 but the documentation for the `method` or `function` variant wasn’t updated. Later PR https://github.com/pytorch/pytorch/issues/41850 updated the documentation for the `function` variant, i.e. `torch.cross`, and also added the following warning about the weird behaviour. > If `dim` is not given, it defaults to the first dimension found with the size 3. Note that this might be unexpected. But still, the `Tensor.cross` docs were missed and remained outdated. I’m finally fixing that here. Also fixing `torch/overrides.py` for `torch.cross` as well now, with `dim=None`. To verify that the documented default of `dim=-1` should raise, you can try the following.
```python
import torch

a = torch.randn(3, 4)
b = torch.randn(3, 4)

# This works because the implementation finds size 3 in the first dimension,
# so the default behaviour shown in the documentation is actually not true.
b.cross(a)
# >>> tensor([[ 0.7171, -1.1059,  0.4162,  1.3026],
#             [ 0.4320, -2.1591, -1.1423,  1.2314],
#             [-0.6034, -1.6592, -0.8016,  1.6467]])

# This raises, as expected, since the last dimension doesn't have size 3.
b.cross(a, dim=-1)
# >>> RuntimeError: dimension -1 does not have size 3
```
Please take a closer look (particularly the autograd part, this is the first time I'm dealing with `derivatives.yaml`). If there is something missing, wrong or needs more explanation, please let me know. Looking forward to the feedback. cc mruberry Lezcano IvanYashchuk rgommers Pull Request resolved: https://github.com/pytorch/pytorch/pull/63285 Reviewed By: gchanan Differential Revision: D32313346 Pulled By: mruberry fbshipit-source-id: e68c2687c57367274e8ddb7ef28ee92dcd4c9f2c	2021-11-11 12:49:41 -08:00
Kurt Mohler	db014b8529	Add `set_deterministic_debug_mode` and `get_deterministic_debug_mode` (#67778 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/67386 Pull Request resolved: https://github.com/pytorch/pytorch/pull/67778 Reviewed By: ngimel Differential Revision: D32310661 Pulled By: mruberry fbshipit-source-id: 300129e96ca51c22fa711182ce6a9f4d4d2ce57f	2021-11-11 12:48:29 -08:00
kshitij12345	510e3026a9	[numpy] add torch.argwhere (#64257 ) Summary: Adds `torch.argwhere` as an alias to `torch.nonzero`. Currently, `torch.nonzero` actually provides functionality equivalent to `np.argwhere`. From NumPy docs, > np.argwhere(a) is almost the same as np.transpose(np.nonzero(a)), but produces a result of the correct shape for a 0D array. Pull Request resolved: https://github.com/pytorch/pytorch/pull/64257 Reviewed By: qihqi Differential Revision: D32049884 Pulled By: saketh-are fbshipit-source-id: 016e49884698daa53b83e384435c3f8f6b5bf6bb	2021-10-30 15:26:11 -07:00
Brian Hirsh	03f3a0331b	add slice/select/diagonal_scatter variants as primitive ops (#64430 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64430 The functionalization pass needs `{view}_scatter` versions of the slice/select/diagonal ops in order to correctly propagate mutations from a view to its base. On top of that, the implementations need to be primitive w.r.t. autograd, because they look something like `...slice().copy_()`, and the functionalization pass can't use views + mutations inside of its own alias-removal machinery! I added some basic tests that I tried to base off of existing tests for views (particularly around testing the derivative formulas), but I'm wondering if I should add something more comprehensive. Also, as_strided fits into this category - the functionalization pass will need an `as_strided_scatter` op that's primitive w.r.t. autograd. I didn't add it for now, because it'll involve duplicating a bunch of logic from the current `as_strided_backward()` function, and also writing a derivative formula that I wasn't sure how to write :) Test Plan: Imported from OSS Reviewed By: albanD Differential Revision: D31942092 Pulled By: bdhirsh fbshipit-source-id: c702a57c2748a7c771c14e4bcc3e996b48fcc4c8	2021-10-28 10:51:12 -07:00
jjsjann123	1ec732bc46	Add fp16/fp32 autocasting to JIT/TorchScript (#63939 ) Summary: Adds mixed precision autocasting support between fp32/fp16 to torchscript/JIT. A more in-depth description can be found at [torch/csrc/jit/JIT-AUTOCAST.md](https://github.com/pytorch/pytorch/pull/63939/files#diff-1f1772aaa508841c5bb58b74ab98f49a1e577612cd9ea5c386c8714a75db830b) This PR implements an autocast optimization pass that inserts casting ops per AMP rule (torch/csrc/jit/passes/autocast.cpp), mimicking the behavior of eager autocast. The pass also takes into consideration the context of `torch.cuda.amp.autocast` and only inserts casting ops within the enabled context manager, giving feature parity with eager amp autocast. We currently provide JIT AMP autocast as a prototyping feature, so it is off by default and can be turned on via `torch._C._jit_set_autocast_mode(True)`. The JIT support for autocast is subject to different constraints compared to the eager mode implementation (mostly related to the fact that TorchScript is statically typed); restrictions on the user-facing Python code are described in torch/csrc/jit/JIT-AUTOCAST.md. This is a prototype; there are also implementation limitations that are necessary to keep this PR small and get something functioning quickly upstream, so we can iterate on designs. A few limitations/challenges that are not properly resolved in this PR: 1. Autocast inserts cast operations, which affect the scalar type of output tensors feeding downstream operations. We are not currently propagating the updated scalar types, which would cause issues/wrong results for operations that follow promotion rules. 2. Backward for autodiff in JIT misses the casting of dgrad to the input scalar type, as autograd does in eager. This forces us to explicitly mark the casting operation for certain operations (e.g. binary ops); otherwise, we might be feeding dgrad with a mismatched scalar type to the input. This could potentially break gradient functions consuming dgrad (e.g. gemm backwards, which assumes grad_output to be of the same scalar type as input). 3. The `torch.autocast` API has an optional argument `dtype`, which is not currently supported in the JIT autocast; we require a static value. Credit goes mostly to: tlemo kevinstephano Pull Request resolved: https://github.com/pytorch/pytorch/pull/63939 Reviewed By: navahgar Differential Revision: D31093381 Pulled By: eellison fbshipit-source-id: da6e26c668c38b01e296f304507048d6c1794314	2021-10-27 12:11:36 -07:00
Saketh Are	33790c4e06	Implement histogramdd on CPU (#65318 ) Summary: Implements `torch.histogramdd` analogous to `numpy.histogramdd`. Builds on https://github.com/pytorch/pytorch/pull/58780, generalizing the existing `torch.histogram` kernel to handle D-dimensional inputs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/65318 Reviewed By: soulitzer Differential Revision: D31654555 Pulled By: saketh-are fbshipit-source-id: 14b781fac0fd3698b052dbd6f0fda46e50d4c5f1	2021-10-21 16:09:31 -07:00
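
The `torch.linalg.cross` entry (b07a11929d) contrasts two default rules for `dim`: legacy `torch.cross` uses the first dimension of size 3 when `dim` is not given, while `torch.linalg.cross` defaults to `dim=-1`. The sketch below models only that dimension choice in plain Python, not the cross product itself. `resolve_cross_dim` is a hypothetical helper written for illustration; it is not a PyTorch function, and the error wording for the no-size-3 case is an assumption.

```python
def resolve_cross_dim(shape, dim=None):
    """Return the dimension a cross product would run along (hypothetical helper).

    dim=None models the legacy torch.cross rule quoted in the commit message:
    use the first dimension whose size is 3. An explicit dim (for example -1,
    the torch.linalg.cross default) must itself have size 3.
    """
    if dim is None:
        for i, size in enumerate(shape):
            if size == 3:
                return i
        # Assumed wording; the real library message may differ.
        raise RuntimeError("no dimension of size 3 in input")
    if not -len(shape) <= dim < len(shape):
        raise IndexError(f"dimension {dim} is out of range")
    if shape[dim] != 3:
        raise RuntimeError(f"dimension {dim} does not have size 3")
    return dim % len(shape)  # normalise negative indices


# Shape (3, 4), as in the commit message's example.
legacy_dim = resolve_cross_dim((3, 4))      # legacy rule silently picks dim 0
try:
    resolve_cross_dim((3, 4), dim=-1)       # linalg-style default
    linalg_error = None
except RuntimeError as exc:
    linalg_error = str(exc)
```

With shape `(3, 4)` the legacy rule quietly selects dimension 0, whereas an explicit `dim=-1` fails loudly, which is the surprise the commit message calls out.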

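
The `torch.argwhere` entry (510e3026a9) quotes the NumPy relationship between `argwhere` and the transpose of `nonzero`. As a rough illustration, the plain-Python sketch below shows the two index layouts for 2-D nested lists. `nonzero_2d` and `argwhere_2d` are hypothetical helpers, not NumPy or PyTorch APIs, and the 0-D special case mentioned in the quote is out of scope.

```python
def nonzero_2d(rows):
    """Per-dimension index lists of nonzero entries (np.nonzero-style layout)."""
    row_idx, col_idx = [], []
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value != 0:
                row_idx.append(i)
                col_idx.append(j)
    return row_idx, col_idx


def argwhere_2d(rows):
    """One [row, col] pair per nonzero entry (np.argwhere-style layout)."""
    # Transposing the per-dimension lists yields one index pair per element.
    return [[i, j] for i, j in zip(*nonzero_2d(rows))]


data = [[0, 7, 0],
        [5, 0, 9]]
per_dim = nonzero_2d(data)   # ([0, 1, 1], [1, 0, 2])
pairs = argwhere_2d(data)    # [[0, 1], [1, 0], [1, 2]]
```

According to the commit message, `torch.nonzero` already returns the second layout, which is why `torch.argwhere` can be described there as an alias.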

296 Commits