### Motivation
When `dim` is -1 and the slices of `source` or `result` are noncontiguous, the original `index_add` is slow: it loops serially over the index and, for each index, performs a parallel `add` on the corresponding slice to avoid write conflicts. Parallelizing over the slice is not optimal, since the slice may be too small to parallelize profitably, and the per-index launches incur repeated parallelization overhead.
`scatter_add` is used to speed up this case: it parallelizes over the outer dimensions of the input and runs serially along the inner dimension to avoid write conflicts. `scatter_add` needs only one parallel region, and the combined outer dimensions are large enough to parallelize well. A Python-level sketch of the equivalence is shown below.
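The rewrite relies on `index_add_` along a dimension being expressible as `scatter_add_` with the 1-D index broadcast to the source's shape. The PR applies this at the C++ kernel level; the snippet below is only a Python-level sketch of the equivalence:
```python
import torch

x = torch.zeros(4, 6)
src = torch.randn(4, 3)
index = torch.tensor([0, 2, 5])

# index_add_ along dim=-1: x[:, index[j]] += src[:, j]
a = x.clone().index_add_(-1, index, src)
# Equivalent scatter_add_: broadcast the 1-D index to src's shape
b = x.clone().scatter_add_(-1, index.expand_as(src), src)
torch.testing.assert_close(a, b)
```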
### Testing
- Single core:
Before:
shape | fp32 time (s) | bf16 time (s)
-- | -- | --
[10, 128, 20, 20] | 2.82E-03 | 2.11E-03
[10, 128, 50, 50] | 0.023604 | 0.023794
After:
shape | fp32 time (s) | bf16 time (s)
-- | -- | --
[10, 128, 20, 20] | 9.30E-04 | 1.66E-03
[10, 128, 50, 50] | 0.005995 | 0.010003
- Single socket (28 cores):
Before:
shape | fp32 time (s) | bf16 time (s)
-- | -- | --
[10, 128, 20, 20] | 2.96E-03 | 2.52E-03
[10, 128, 50, 50] | 0.012208 | 0.012568
After:
shape | fp32 time (s) | bf16 time (s)
-- | -- | --
[10, 128, 20, 20] | 7.44E-05 | 1.33E-04
[10, 128, 50, 50] | 0.000333 | 0.000469
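For reference, a minimal timing sketch in the spirit of the tables above (the actual benchmark script is not part of this description; shapes and dtypes follow the tables, everything else is an assumption):
```python
import time
import torch

def bench_index_add(shape, dtype, iters=100):
    # dim=-1 with noncontiguous slices: the case this PR targets
    dim = -1
    x = torch.zeros(shape, dtype=dtype)
    index = torch.randint(0, shape[dim], (shape[dim],))
    src = torch.ones(shape, dtype=dtype)
    x.index_add_(dim, index, src)  # warm-up
    start = time.time()
    for _ in range(iters):
        x.index_add_(dim, index, src)
    return (time.time() - start) / iters

for shape in ([10, 128, 20, 20], [10, 128, 50, 50]):
    for dtype in (torch.float32, torch.bfloat16):
        print(shape, dtype, f"{bench_index_add(shape, dtype):.2e} s")
```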
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88729
Approved by: https://github.com/mingfeima, https://github.com/jgong5, https://github.com/malfet
# Motivation
We need to add an XPU backend to support `torch.save` and `torch.load` when the parameter `_use_new_zipfile_serialization=False` is passed.
# Solution
We wrap the data as a tensor and:
>1. use an in-place copy for H2D;
>2. directly call `tensor.to()` for D2H.
This helps us:
>1. unify the generic code for all backends;
>2. support all non-CPU device backends.
A minimal sketch of this pattern is shown below.
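As a hedged sketch of the wrap-as-tensor pattern described above (function names are illustrative, not the actual serialization internals):
```python
import torch

def host_to_device(cpu_tensor: torch.Tensor, device: str) -> torch.Tensor:
    # H2D: allocate on the target device, then in-place copy
    dst = torch.empty_like(cpu_tensor, device=device)
    dst.copy_(cpu_tensor)
    return dst

def device_to_host(device_tensor: torch.Tensor) -> torch.Tensor:
    # D2H: a direct tensor.to() back onto the CPU
    return device_tensor.to("cpu")
```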
# Additional Context
No additional unit tests are needed; `test/test_serialization.py` already covers this code change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89679
Approved by: https://github.com/ezyang
Avoids:
```
$ python foo.py
Traceback (most recent call last):
File "foo.py", line 3, in <module>
a = torch.cuda.Stream()
File "/home/albandes/local/pytorch/3.8_debug_source/torch/cuda/streams.py", line 34, in __new__
return super(Stream, cls).__new__(cls, priority=priority, **kwargs)
TypeError: object.__new__() takes exactly one argument (the type to instantiate)
```
And now gets:
```
$ python foo.py
Traceback (most recent call last):
File "foo.py", line 3, in <module>
a = torch.cuda.Stream()
File "/home/albandes/local/pytorch/3.8_debug_source/torch/cuda/streams.py", line 34, in __new__
return super(Stream, cls).__new__(cls, priority=priority, **kwargs)
File "/home/albandes/local/pytorch/3.8_debug_source/torch/cuda/_utils.py", line 44, in err_fn
raise RuntimeError(
RuntimeError: Tried to instantiate dummy base class Stream
```
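For context, here is a hedged sketch of the dummy-class pattern implied by the new traceback (illustrative names, not the actual `torch/cuda/_utils.py` code):
```python
def _dummy_type(name: str) -> type:
    # Placeholder whose __new__ raises a clear error when the real class
    # is unavailable, instead of the cryptic object.__new__ failure above
    def err_fn(cls, *args, **kwargs):
        raise RuntimeError(f"Tried to instantiate dummy base class {name}")
    return type(name, (object,), {"__new__": err_fn})

Stream = _dummy_type("Stream")
try:
    Stream()
except RuntimeError as e:
    print(e)  # Tried to instantiate dummy base class Stream
```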
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89592
Approved by: https://github.com/soumith
See the strategy at PythonOpRegistrationTrampoline.cpp for the big picture.
Along the way, I made OperatorHandle support == and hashing, and slightly changed the low-level python_dispatch impl API to disallow empty strings for the dispatch key. This had the knock-on effect of requiring us to explicitly pass in CompositeImplicitAutograd wherever we would have passed "" (I didn't apply this to the rest of the file because I'm lazy). An illustrative registration is sketched below.
The test strategy is to delete the logic for preventing Python op registrations in torch from being skipped in a torchdeploy context and show that CI still works.
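As an illustration of the dispatch-key change, a sketch using the `torch.library.Library` API in a custom test namespace (not the PR's actual registrations):
```python
import torch
from torch.library import Library

# With empty dispatch keys disallowed, registrations spell out
# CompositeImplicitAutograd where they previously passed "".
lib = Library("my_ns", "DEF")
lib.define("my_sin(Tensor x) -> Tensor")
lib.impl("my_sin", torch.sin, "CompositeImplicitAutograd")

print(torch.ops.my_ns.my_sin(torch.randn(3)))
```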
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87162
Approved by: https://github.com/anjali411, https://github.com/bdhirsh
Fixes #83069. Also moves all the DLPack tests to a new file, `test_dlpack.py`.
The fix involves always allocating a "strides" int array when converting to DLPack and deleting the strides when the capsule destructor is called. The strides are copied from the tensor, and `strides[i]` is set to `1` wherever `shape[i] < 2`. A sketch of this normalization is shown below.
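A Python rendering of the stride normalization described above (the actual fix lives in the C++ DLPack conversion; names here are illustrative):
```python
def dlpack_strides(shape, strides):
    # Dimensions of size 0 or 1 get stride 1, since their stride is
    # arbitrary and some consumers reject the values PyTorch reports.
    return [1 if size < 2 else stride for size, stride in zip(shape, strides)]

assert dlpack_strides([1, 3], [99, 1]) == [1, 1]
assert dlpack_strides([2, 3], [3, 1]) == [3, 1]
```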
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83158
Approved by: https://github.com/ezyang
Make it valid to set metadata after `detach()` calls, like `x.detach().resize_(...)`.
This technically lifts some restrictions around `.data`: with this PR you can now call `x.data.resize_(...)`, which directly resizes `x` instead of erroring.
My understanding: before the tensor-variable merge, when `x` and `x.data` were really different tensors, you could resize `x.data` independently of `x`; during the merge, this error was added to avoid silent, confusing behavior changes.
It was agreed that this error has been around long enough (several years) that it's acceptable to drop. cc @albanD @ezyang.
(Ed already had a prototype PR [here](https://github.com/pytorch/pytorch/pull/83545); I ended up making this one to slog through the test failures.)
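A minimal example of what the lifted restriction permits (shapes are illustrative; both calls previously raised a metadata-change error):
```python
import torch

x = torch.empty(2, 3)
x.detach().resize_(4, 5)  # no longer errors
x.data.resize_(6)         # no longer errors
```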
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83590
Approved by: https://github.com/ezyang
### Description
Since the major changes for `_TypedStorage` and `_UntypedStorage` are now complete, they can be renamed to the public `TypedStorage` and `UntypedStorage`.
`TypedStorage._untyped()` is renamed to `TypedStorage.untyped()`.
Documentation for storages is improved as well.
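A minimal example of the renamed public API in use:
```python
import torch

t = torch.arange(4, dtype=torch.float32)
typed = t.storage()        # torch.TypedStorage, formerly _TypedStorage
untyped = typed.untyped()  # formerly TypedStorage._untyped()
print(type(typed), type(untyped))
```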
### Issue
Fixes #82436
### Testing
N/A
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82438
Approved by: https://github.com/ezyang
`unflatten` now has a free-function version, `torch.unflatten`, in addition to the method `torch.Tensor.unflatten`.
Updated the docs to reflect this and polished them a little; a minimal example of the two forms is shown below.
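```python
import torch

x = torch.randn(2, 12)
y = torch.unflatten(x, 1, (3, 4))  # free function
z = x.unflatten(1, (3, 4))         # Tensor method
assert y.shape == z.shape == (2, 3, 4)
```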
For consistency, changed the signature of the int version of `unflatten` in `native_functions.yaml`.
Some override tests were failing because `unflatten` is unusual in that its `.int` and `.Dimname` overloads take different numbers of arguments, so this required some changes to `test/test_override.py`.
Removed support for mixing integer and string arguments when specifying dimensions in `unflatten`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81399
Approved by: https://github.com/Lezcano, https://github.com/ngimel
### The problem
The original regex abuses `.*` in combination with `re.DOTALL`, which leads to a catastrophic-backtracking performance issue when there is no match. When that happens, `test_doc_template` runs "forever" and times out. Here is an example timed-out test: https://github.com/pytorch/pytorch/runs/7413337595. A self-contained demonstration of this failure mode is shown below.
Another minor issue with the regex is that it won't match concatenated doc strings like `"""FOO""" + """BAR"""`, which are used for some APIs in `_torch_docs.py`.
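To illustrate the failure mode (this is a generic demonstration of catastrophic backtracking, not the actual doc-template regex):
```python
import re
import time

# Nested unbounded quantifiers force exponential backtracking when the
# overall match must fail; runtime roughly doubles per extra 'a'.
pattern = re.compile(r"(a+)+$")
subject = "a" * 26 + "b"  # no match possible

start = time.time()
pattern.search(subject)
print(f"search took {time.time() - start:.1f}s")
```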
### The fix
* Remove most of the match-all `.*` usage. I have tested to make sure that the test finishes even when there is no match, i.e. it fails quickly instead of hanging.
* Update the regex to match all the following cases both before and after linting (you can also try it out on https://pythex.org):
BEFORE
```
add_docstr(torch.abs, r"""
abs(input, *, out=None) -> Tensor
Computes the absolute value of each element in :attr:`input`.
.. math::
\text{out}_{i} = |\text{input}_{i}|
""" + r"""
Args:
{input}
Keyword args:
{out}
Example::
>>> torch.abs(torch.tensor([-1, -2, 3]))
tensor([ 1, 2, 3])
""".format(**common_args))
add_docstr(torch.absolute,
r"""
absolute(input, *, out=None) -> Tensor
Alias for :func:`torch.abs`
""")
```
AFTER
```
add_docstr(
torch.abs,
r"""
abs(input, *, out=None) -> Tensor
Computes the absolute value of each element in :attr:`input`.
.. math::
\text{out}_{i} = |\text{input}_{i}|
"""
+ r"""
Args:
{input}
Keyword args:
{out}
Example::
>>> torch.abs(torch.tensor([-1, -2, 3]))
tensor([ 1, 2, 3])
""".format(
**common_args
),
)
add_docstr(
torch.absolute,
r"""
absolute(input, *, out=None) -> Tensor
Alias for :func:`torch.abs`
""",
)
```
This will unblock https://github.com/pytorch/pytorch/pull/81643
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81755
Approved by: https://github.com/atalman