Commit Graph

179 Commits

Author SHA1 Message Date
PyTorch MergeBot
e61d04e467 Revert "[sparse] Add fast semi-structured sparsification kernels (#122350)"
This reverts commit c63a7b5691.

Reverted https://github.com/pytorch/pytorch/pull/122350 on behalf of https://github.com/malfet due to This broke rocm builds, which is visible on PR as well ([comment](https://github.com/pytorch/pytorch/pull/122350#issuecomment-2038424125))
2024-04-04 23:15:36 +00:00
Jesse Cai
c63a7b5691 [sparse] Add fast semi-structured sparsification kernels (#122350)
This PR adds fast semi-structured sparsification kernels to PyTorch.

These kernels accelerate the pruning of dense tensors into the
semi-structured (2:4) sparse format.

The kernels have been added as aten native functions.

In particular, three new functions have been added:

* `torch._sparse_semi_structured_tile`

This function will return the packed representation and metadata for
both X and X', as well as the thread masks. Note that this applies 2:4
sparsity in a 4x4 tile instead of a 1x4 strip as usual.

* `torch._sparse_semi_structured_apply`

This function takes in an input tensor and thread masks from the above
function and returns a packed representation and metadata from applying
thread masks to the input tensor.

* `torch._sparse_semi_structured_apply_dense`

This function does the same as the above, but instead of returning the
tensor in the sparse representation, it returns it in the dense
representation.

The subclasses have also been updated to add a new
`prune_dense_static_sort`
classmethod to create sparse tensors with this format. I've added some
additional documentation on how to calculate the compressed tensors
needed to create a SparseSemiStructuredTensor oneself.

To this end, two new helper functions have been added:

* `sparse_semi_structured_tile`
* `compute_compressed_swizzled_bitmask`
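
A hedged usage sketch of the three new aten functions described above (argument names and return-tuple layouts are assumptions inferred from the descriptions, not exact signatures):

```python
import torch

A = torch.randn(128, 128, device="cuda", dtype=torch.float16)

# Pack A (and A', its transpose) and record which elements were kept;
# the 5-tuple layout shown here is an assumption.
packed, meta, packed_t, meta_t, threads_masks = torch._sparse_semi_structured_tile(A)

# Re-apply the same thread masks to a second tensor of matching shape.
B = torch.randn_like(A)
packed_B, meta_B = torch._sparse_semi_structured_apply(B, threads_masks)

# Same pruning, but the result comes back in the dense representation.
B_pruned = torch._sparse_semi_structured_apply_dense(B, threads_masks)
```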

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122350
Approved by: https://github.com/cpuhrsch
2024-04-04 19:07:35 +00:00
Zola
e49a38973f Update DimOrDims typing in torch.sparse (#122471)
I noticed that the typing of `torch.sparse.sum`'s `dim` parameter didn't allow an int tuple as input and tracked the issue down to this type.
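
For example, after this change the following type-checks cleanly:

```python
import torch

x = torch.randn(3, 4).to_sparse()
# dim may now be typed as a tuple of ints, not just a single int:
s = torch.sparse.sum(x, dim=(0, 1))
```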

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122471
Approved by: https://github.com/soulitzer
2024-03-25 16:25:56 +00:00
Pearu Peterson
a39e638707 Update bsr_dense_addmm kernel parameters for sizes 3 x 2 ^ N (#122506)
As in the title. The speed-ups for a particular set of input sizes range from about 7 to 85 %, depending on the BSR tensor block sizes used.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/122506
Approved by: https://github.com/cpuhrsch
2024-03-23 11:54:33 +00:00
Jesse Cai
16369816a2 [sparse] semi-structured sparse refactor (#117302)
Summary:

This PR is a refactor of semi-structured sparsity support.

**deprecation**:

Previously, `torch.sparse.to_sparse_semi_structured` had a kwarg
`transposed=False`, which has been removed. This kwarg was unused;
passing it now throws a deprecation warning.

Namely, I've taken the subclassing implementation that xFormers has
created and brought it over to PyTorch, as part of our plan to upstream
runtime 2:4 sparsity.

I've also copied over all the op support that Daniel implemented that
did not depend on the fast sparsification routines into
`_sparse_semi_structured_ops.py`.

With this subclass, all of our internal tests pass, as well as those in
xFormers.

The main change is that we now define a base subclass,
`SparseSemiStructuredTensor` that is inherited from for each of the
specific backends.

We can also now arbitrarily override the sparse dispatch table with
`_load_dispatch_table()`, the idea being that this is still general enough
that users don't need to modify PyTorch source code to get their model
working.
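
A minimal sketch of that override mechanism, assuming a table-returning classmethod (the exact `_load_dispatch_table` signature and table keys are assumptions; only the class and method names come from this PR):

```python
import torch
from torch.sparse import SparseSemiStructuredTensor

def my_mm(func, types, args, kwargs):
    # Hypothetical handler: fall back to dense mm for this one op.
    a, b = args[0], args[1]
    return torch.mm(a.to_dense(), b)

class MyBackendTensor(SparseSemiStructuredTensor):
    @classmethod
    def _load_dispatch_table(cls, custom_dispatch_table=None):
        table = super()._load_dispatch_table(custom_dispatch_table)
        # Override a single aten op without modifying PyTorch source.
        table[torch.ops.aten.mm.default] = my_mm
        return table
```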

This also adds in padding support and stores alg_id and fuse_transpose
as flags on the tensor, instead of hardcoding them.

Two components still remain in xFormers that will need to be
ported over eventually:
- the autograd functions (`Sparsify24`, `Sparsify24_like`)
- fast sparsification routines that they rely on

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117302
Approved by: https://github.com/alexsamardzic, https://github.com/HDCharles
2024-02-14 01:10:40 +00:00
Jesse Cai
1c1dc0e4e0 [sparse] Add in out_dtype support (i8i8->bf16, i32) for cusparselt (#119296)
Summary:

Adds out_dtype support for (i8i8 -> bf16) and (i8i8 -> i32) matmul with
cuSPARSELt.
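
A hedged sketch of the mixed-dtype path (`torch._cslt_compress`, `torch._cslt_sparse_mm`, the `out_dtype` kwarg, and the operand layout cuSPARSELt expects are assumptions; the PR itself only states that i8i8 -> bf16/i32 is supported):

```python
import torch

# Build an int8 operand with a 2:4 sparsity pattern.
mask = torch.tensor([0, 0, 1, 1], dtype=torch.int8, device="cuda").tile(64, 16)
A = torch.randint(-8, 8, (64, 64), device="cuda", dtype=torch.int8) * mask
B = torch.randint(-8, 8, (64, 64), device="cuda", dtype=torch.int8)

A_compressed = torch._cslt_compress(A)
C_bf16 = torch._cslt_sparse_mm(A_compressed, B, out_dtype=torch.bfloat16)
C_i32 = torch._cslt_sparse_mm(A_compressed, B, out_dtype=torch.int32)
```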

Test Plan:

```
python test/test_sparse_semi_structured.py -k mixed
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/119296
Approved by: https://github.com/cpuhrsch, https://github.com/alexsamardzic
2024-02-12 16:02:36 +00:00
Peter Bell
3a8bf25fdd [SparseCsr] Remove triton sdpa skip after triton pin update (#109601)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109601
Approved by: https://github.com/desertfire, https://github.com/amjames
2024-02-08 16:40:25 +00:00
Catherine Lee
4f5785b6b3 Enable possibly-undefined error code (#118533)
Fixes https://github.com/pytorch/pytorch/issues/118129

Suppressions automatically added with

```
import re

with open("error_file.txt", "r") as f:
    errors = f.readlines()

error_lines = {}
for error in errors:
    match = re.match(r"(.*):(\d+):\d+: error:.*\[(.*)\]", error)
    if match:
        file_path, line_number, error_type = match.groups()
        if file_path not in error_lines:
            error_lines[file_path] = {}
        error_lines[file_path][int(line_number)] = error_type

for file_path, lines in error_lines.items():
    with open(file_path, "r") as f:
        code = f.readlines()
    for line_number, error_type in sorted(lines.items(), key=lambda x: x[0], reverse=True):
        code[line_number - 1] = code[line_number - 1].rstrip() + f"  # type: ignore[{error_type}]\n"
    with open(file_path, "w") as f:
        f.writelines(code)
```

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Co-authored-by: Catherine Lee <csl@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118533
Approved by: https://github.com/Skylion007, https://github.com/zou3519
2024-01-30 21:07:01 +00:00
PyTorch MergeBot
40ece2e579 Revert "Enable possibly-undefined error code (#118533)"
This reverts commit 4f13f69a45.

Reverted https://github.com/pytorch/pytorch/pull/118533 on behalf of https://github.com/clee2000 due to sorry i'm trying to figure out a codev merge conflict, if this works i'll be back to rebase and merge ([comment](https://github.com/pytorch/pytorch/pull/118533#issuecomment-1917695185))
2024-01-30 19:00:34 +00:00
Edward Z. Yang
4f13f69a45 Enable possibly-undefined error code (#118533)
Fixes https://github.com/pytorch/pytorch/issues/118129

Suppressions automatically added with

```
import re

with open("error_file.txt", "r") as f:
    errors = f.readlines()

error_lines = {}
for error in errors:
    match = re.match(r"(.*):(\d+):\d+: error:.*\[(.*)\]", error)
    if match:
        file_path, line_number, error_type = match.groups()
        if file_path not in error_lines:
            error_lines[file_path] = {}
        error_lines[file_path][int(line_number)] = error_type

for file_path, lines in error_lines.items():
    with open(file_path, "r") as f:
        code = f.readlines()
    for line_number, error_type in sorted(lines.items(), key=lambda x: x[0], reverse=True):
        code[line_number - 1] = code[line_number - 1].rstrip() + f"  # type: ignore[{error_type}]\n"
    with open(file_path, "w") as f:
        f.writelines(code)
```

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118533
Approved by: https://github.com/Skylion007, https://github.com/zou3519
2024-01-30 05:08:10 +00:00
Aleksandar Samardžić
341c4227a8 Update F32 sparse semi-structured support for CUTLASS back-end (#116017)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/116017
Approved by: https://github.com/jcaip
2023-12-22 16:53:04 +00:00
Jesse Cai
a8e354a9a0 [sparse][semi-structured] enable fp32 support, separate sparse and dense constraints (#115550)
Summary:

Both cuSPARSELt and CUTLASS support 1:2 semi-structured sparsity for
fp32, which this PR enables (thanks @alexsamardzic).

Furthermore, this PR also updates the sparse_config to take into account
the different shape constraints for sparse and dense matrices.

Technically, cuSPARSELt supports smaller sparse matrix constraints as it
seems to pad to the CUTLASS constraints under the hood. However, in
practice small sparse matrices are not commonly used and we care more
about the dense constraints for LLM inference.

For now, we keep the CUTLASS constraints in place for both cuSPARSELt
and CUTLASS tensors.

This PR also reconnects the _FUSE_TRANSPOSE flag for cuSPARSELt tensors.
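
A minimal sketch of the fp32 path this enables (assumed usage; fp32 uses a 1:2 pattern, i.e. one nonzero per group of two elements):

```python
import torch
from torch.sparse import to_sparse_semi_structured

# Build a weight with a 1:2 pattern: every other element zeroed.
mask = torch.tensor([0, 1], dtype=torch.bool, device="cuda").tile(128, 64)
W = torch.randn(128, 128, device="cuda", dtype=torch.float32) * mask
W_sparse = to_sparse_semi_structured(W)
```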

Test Plan:
```
python test/test_sparse_semi_structured.py
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115550
Approved by: https://github.com/cpuhrsch
2023-12-15 02:28:17 +00:00
Pearu Peterson
e918461377 Add instructions for generating optimal Triton kernel parameters of bsr_dense_addmm (#115504)
As in the title.

In addition, enable verbose output when executing the torch/sparse/_triton_ops_meta.py script.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115504
Approved by: https://github.com/cpuhrsch
ghstack dependencies: #115499
2023-12-12 16:44:51 +00:00
Pearu Peterson
32286512cc Add tune_bsr_dense_addmm as an API to find optimal triton kernel parameters for bsr_dense_addmm (#115499)
As in the title.

In addition:
- improve the algorithm for finding a minimum of operation timings: break the inner loop early when the next minimum candidate is found
- add tests and fix bugs
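
A hedged sketch of the API (the exact signature of `tune_bsr_dense_addmm` is an assumption; only the function name and purpose come from this PR):

```python
import torch
from torch.sparse._triton_ops_meta import tune_bsr_dense_addmm

inp = torch.zeros(256, 256, device="cuda", dtype=torch.float16)
bsr = torch.eye(256, device="cuda", dtype=torch.float16).to_sparse_bsr((32, 32))
dense = torch.randn(256, 256, device="cuda", dtype=torch.float16)

# Search for the optimal meta parameters for this exact input configuration;
# the store=True kwarg (persist into the meta dictionary) is an assumption.
meta = tune_bsr_dense_addmm(inp, bsr, dense, store=True)
```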

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115499
Approved by: https://github.com/cpuhrsch
2023-12-12 16:44:51 +00:00
Pearu Peterson
12085914b8 Replace bsr_dense_mm triton kernel with bsr_dense_addmm triton kernel (#115030)
The `bsr_dense_addmm` triton kernel introduced in https://github.com/pytorch/pytorch/pull/114595 is a generalization of the `bsr_dense_mm` triton kernel and a more efficient version of it, because it uses an extra kernel parameter `SPLIT_N` that has a notable effect on performance for an r.h.s. operand with a larger number of columns.

This PR eliminates the `bsr_dense_mm` triton kernel in favor of using `bsr_dense_addmm` triton kernel.

The performance increase of `bsr_dense_mm` is as follows (float16, `NVIDIA A100-SXM4-80GB`):
- with 16x16 blocks, the average/maximal speed up is 50/71 %
- with 32x32 blocks, the average/maximal speed up is 30/63 %
- with 64x64 blocks, the average/maximal speed up is 12/26 %
- with 128x128 blocks, the average/maximal speed up is 7/17 %

Pull Request resolved: https://github.com/pytorch/pytorch/pull/115030
Approved by: https://github.com/cpuhrsch
2023-12-05 22:29:24 +00:00
Joel Schlosser
22704426c3 Expand dynamic dims support for traceable subclasses (#114311)
Continuation of #112185, following the design in this [doc](https://docs.google.com/document/d/1ipSxcTzEMMOAPvxP-YJlD5JBZZmIGgh8Q34ixtOUCRo).

Summary:
* Introduce `SubclassSymbolicPolicy` containing separate dynamic dim / constraint policies for the outer and inner tensors
    * Expand the automatic dynamic algorithm to recurse into inner tensors and produce one of these for a subclass instance
    * Maintain legacy behavior for subclasses by recursively calling `mark_dynamic()` on inner tensors *of the same dim as outer* when `mark_dynamic(outer, ...)` is called
    * Addresses this: 6a86cf00ad/torch/_dynamo/variables/builder.py (L1750)
* Add `outer_size` and `outer_stride` arguments to `__tensor_unflatten__()` so that you can find out what symbols were allocated for the outer size / stride (you are expected to return a tensor that compares equal to the outer symbols)
    * Signatures now:
    ```python
    # attrs is a list of inner tensor attributes on x; inner_tensor = getattr(x, attr)
    # ctx is anything useful for rebuilding the class we want to guard on
    attrs, ctx = x.__tensor_flatten__()
    ...
    # inner_tensors is a dict of {attr -> tensor}
    # ctx is taken unmodified from flattening and (eventually) guarded on
    # outer_size is the expected size of the output; possibly symbolic
    # outer_stride is the expected strides of the output; possibly symbolic
    y = MySubclass.__tensor_unflatten__(inner_tensors, ctx, outer_size, outer_stride)

    # at the __tensor_unflatten__() call-site in PT2, we assert y.shape == outer_size and y.stride() == outer_stride
    # the assert simplifies symbols when there are relationships between outer and inner symbols
    ```
    * Size info needed for `NestedTensor` at least, stride info needed for `DTensor` at least
    * Punting on `outer_storage_offset` because storage_offset handling is horribly broken in PT2 right now
* ~~Add new `__tensor_mark_dynamic__()` to allow overriding the behavior of mark_dynamic on a per-subclass basis~~ (booted to future work)
* ~~Add guards for tensor subclasses by calling `__tensor_flatten__()` in the guard to test equality on `ctx`~~
    * Now handled in #114469
* Next PR: add TENSOR_MATCH guards on inner tensors

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114311
Approved by: https://github.com/ezyang, https://github.com/drisspg, https://github.com/voznesenskym, https://github.com/bdhirsh
2023-12-05 21:09:25 +00:00
Pearu Peterson
4ba37e1804 Add tests for bsr_dense_addmm and bsr_dense_mm triton kernels (#114800)
As in the title.

In addition,
- resolve https://github.com/pytorch/pytorch/pull/114757#discussion_r1409547917 re triton-contiguous inputs
- support non-contiguous inputs and outputs in triton kernels
- fix a couple of minor bugs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114800
Approved by: https://github.com/cpuhrsch
2023-12-04 22:07:47 +00:00
Jesse Cai
4cb7dd0fc9 [sparse][quant] Add support for vector alpha in cusparselt mm (#112056)
Summary:

This PR adds support for passing in an alpha Tensor, which represents
a tensor of alpha values to fuse into the matmul.

```
cusparselt_sparse_mm = alpha * (A @ B) + bias
```

This operation is necessary for quantization, where we would like to
fuse one of the dequant matmuls into the sparse op.
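
A hedged sketch of the fused call (`torch._cslt_compress`, `torch._cslt_sparse_mm`, and their kwargs are assumptions; only the formula above comes from the PR):

```python
import torch

mask = torch.tensor([0, 0, 1, 1], dtype=torch.float16, device="cuda").tile(64, 16)
A = torch.randn(64, 64, device="cuda", dtype=torch.float16) * mask  # 2:4 pattern
B = torch.randn(64, 64, device="cuda", dtype=torch.float16)
alpha = torch.rand(64, device="cuda")          # per-row dequant scales
bias = torch.randn(64, device="cuda", dtype=torch.float16)

A_compressed = torch._cslt_compress(A)
C = torch._cslt_sparse_mm(A_compressed, B, bias=bias, alpha=alpha)
```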

Test Plan:

```
python test/test_sparse_semi_structured.py -k alpha
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112056
Approved by: https://github.com/cpuhrsch
2023-12-04 16:56:06 +00:00
Pearu Peterson
69f112d586 Call triton bsr_dense_mm/bsr_dense_addmm kernels on mm/addmm float32 inputs when appropriate (#114757)
As in the title.

In addition, this PR fixes a bug in `bsr_dense_mm` and `bsr_dense_addmm` return value handling where computations are performed on the `make_triton_contiguous` return value while `bsr_dense_mm`/`bsr_dense_addmm` return a tensor that is an input to `make_triton_contiguous`. If `make_triton_contiguous` makes a copy of the input, the return values of `bsr_dense_mm`/`bsr_dense_addmm` will contain garbage.
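
A sketch of the user-visible effect (assumed setup): a float32 nn.Linear whose weight has been converted to BSR now routes through the triton kernels on CUDA when appropriate:

```python
import torch

lin = torch.nn.Linear(256, 256, device="cuda", dtype=torch.float32)
# Convert the weight to BSR with 32x32 blocks; mm/addmm on this weight can
# now dispatch to the triton kernels.
lin.weight = torch.nn.Parameter(lin.weight.detach().to_sparse_bsr((32, 32)))
y = lin(torch.randn(8, 256, device="cuda"))
```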

The PR increases the performance of nn.linear as follows (float32, `NVIDIA A100-SXM4-80GB`):
- with 16x16 blocks, the average/maximal speed up is 67/78 %
- with 32x32 blocks, the average/maximal speed up is 72/79 %
- with 64x64 blocks, the average/maximal speed up is 71/79 %
- with 128x128 blocks, the average/maximal speed up is 62/76 %

The performance increase is illustrated also by the following sparsity-speedup graphs (before and after this PR):
<img src="https://github.com/pytorch/pytorch/assets/402156/55ce0bf7-8ef2-47ab-99e8-8878f159037d" width="48%"> <img src="https://github.com/pytorch/pytorch/assets/402156/df256175-a594-4bd7-b244-90867fb9a45e" width="48%">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114757
Approved by: https://github.com/cpuhrsch
2023-11-30 13:38:07 +00:00
Pearu Peterson
69c4819f53 Add bsr_dense_addmm triton kernel (#114595)
As in the title.

The `bsr_dense_addmm` kernel implemented in this PR is a generalization of `bsr_dense_mm` in the following respects (in addition to having input, beta, and alpha parameters; a usage sketch follows the timings below):
- it implements the `SPLIT_N` kernel parameter, which enables efficient kernel launches for wide inputs. For instance, the timing of nn.linear with 256x256 BSR weights having 16x16 blocks and a 256x131072 strided input was reduced about 16x (this corresponds to the 94 % speed up value listed below).
- it supports rectangular blocks in sparse BSR tensor weights

The performance increase of nn.linear is as follows (float16, `NVIDIA A100-SXM4-80GB`):
- with 16x16 blocks, the average/maximal speed up is 55/94 %
- with 32x32 blocks, the average/maximal speed up is 33/63 %
- with 64x64 blocks, the average/maximal speed up is 23/42 %
- with 128x128 blocks, the average/maximal speed up is 15/39 %
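
A hedged usage sketch of the kernel (the module path `torch.sparse._triton_ops` and the exact signature are assumptions):

```python
import torch
from torch.sparse._triton_ops import bsr_dense_addmm  # assumed module path

inp = torch.zeros(256, 256, device="cuda", dtype=torch.float16)
bsr = torch.eye(256, device="cuda", dtype=torch.float16).to_sparse_bsr((16, 16))
dense = torch.randn(256, 256, device="cuda", dtype=torch.float16)

# out = beta * inp + alpha * (bsr @ dense)
out = bsr_dense_addmm(inp, bsr, dense, beta=1.0, alpha=1.0)
```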

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114595
Approved by: https://github.com/cpuhrsch
2023-11-29 05:29:25 +00:00
Pearu Peterson
12f95df0e9 Eliminate unnecessary multiplications by 1 in addmm with sparse compressed tensor operand (#114026)
This PR:
- updates `torch/sparse/_triton_ops_meta.py` for the API change in `triton.testing.do_bench`
- force `num_stages` to be 1 when blocksize is 128x128 to avoid an out-of-resources exception when `bsr_dense_mm` is called from `nn.linear`.
- as in the title. The performance of `nn.linear` on BSR tensor weights (dtypes `float16` and `bfloat16`) is increased as follows (`NVIDIA A100-SXM4-80GB`):
  - for blocksize 16x16, the average/maximum speed up is about 11/20 %
  - for blocksize 32x32, the average/maximum speed up is about 15/24 %
  - for blocksize 64x64, the average/maximum speed up is about 18/26 %
  - for blocksize 128x128, the average/maximum speed up is about 15/28 %
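
An illustrative reference of the eliminated work (a sketch of the semantics only, not the actual kernel code; `addmm_ref` is a hypothetical name):

```python
import torch

def addmm_ref(inp, bsr, dense, beta=1, alpha=1):
    # Reference semantics: out = beta * inp + alpha * (bsr @ dense)
    prod = bsr @ dense
    # The optimization: skip the scaling entirely when the scalar is 1.
    prod = prod if alpha == 1 else alpha * prod
    return inp + prod if beta == 1 else beta * inp + prod
```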

Pull Request resolved: https://github.com/pytorch/pytorch/pull/114026
Approved by: https://github.com/cpuhrsch
2023-11-19 12:13:54 +00:00
Pearu Peterson
cffea773e3 Fix bsr_dense_mm with a non-contiguous out argument. (#113801)
Fixes https://github.com/pytorch/pytorch/issues/113754

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113801
Approved by: https://github.com/cpuhrsch
2023-11-16 05:56:17 +00:00
Pearu Peterson
e1c872e009 Add optimal triton kernel parameters to bsr_dense_mm and scatter_mm for bfloat16 and float32 dtypes (#113553)
As in the title.

This PR is a follow-up to PR https://github.com/pytorch/pytorch/pull/112737 to address bfloat16 and float32 dtype cases. The performance increase is as follows (`NVIDIA A100-SXM4-80GB`):

- bsr_scatter_mm and bfloat16
  - for blocksize 16x16, the average/maximum speed up is about 29/75 %.
  - for blocksize 32x32, the average/maximum speed up is about 23/58 %.
  - for blocksize 64x64, the average/maximum speed up is about 27/66 %.
  - for blocksize 128x128, the average/maximum speed up is about 33/72 %.
- bsr_dense_mm and bfloat16
  - for blocksize 16x16, the average/maximum speed up is about 47/61 %.
  - for blocksize 32x32, the average/maximum speed up is about 29/43 %.
  - for blocksize 64x64, the average/maximum speed up is about 21/41 %.
  - for blocksize 128x128, the average/maximum speed up is about 12/29 %.
- bsr_dense_mm and  float32
  - for blocksize 16x16, the average/maximum speed up is about 35/49 %.
  - for blocksize 32x32, the average/maximum speed up is about 2/5 %.
  - for blocksize 64x64, the average/maximum speed up is about 2/21 %.
  - for blocksize 128x128, the average/maximum speed up is about 79/84 %.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113553
Approved by: https://github.com/cpuhrsch
2023-11-14 00:47:59 +00:00
Siddharth Mishra
fe5d8850e2 Fixed docstring errors in _fuser.py, _state.py, __init__.py, _freeze.py, _async.py, _recursive.py, _tensorboard_vis.py, _trace.py, _await.py, _check.py, _serialization.py, _script.py, annotations.py, _monkeytype_config.py (#113371)
Fixes #113194

docstrings updated.

Here are the outputs, with the error counts before and after:

1) torch/sparse/__init__.py

Before:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:1 at module level:
        D104: Missing docstring in public package
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:183 in public function `sum`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:183 in public function `sum`:
        D400: First line should end with a period (not 'n')
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:183 in public function `sum`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:391 in public class `check_sparse_tensor_invariants`:
        D207: Docstring is under-indented
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:436 in public method `is_enabled`:
        D207: Docstring is under-indented
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:436 in public method `is_enabled`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:448 in public method `enable`:
        D207: Docstring is under-indented
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:468 in public method `disable`:
        D207: Docstring is under-indented
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:475 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:479 in public method `__enter__`:
        D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:486 in public method `__exit__`:
        D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:492 in public method `__call__`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:502 in public function `as_sparse_gradcheck`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:502 in public function `as_sparse_gradcheck`:
        D400: First line should end with a period (not 'l')
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:502 in public function `as_sparse_gradcheck`:
        D401: First line should be in imperative mood (perhaps 'Decorate', not 'Decorator')
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:518 in private nested function `gradcheck_with_sparse_support`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:518 in private nested function `gradcheck_with_sparse_support`:
        D400: First line should end with a period (not 's')
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:518 in private nested function `gradcheck_with_sparse_support`:
        D401: First line should be in imperative mood; try rephrasing (found 'Same')
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:528 in private nested function `convert_to_strided_representation`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:528 in private nested function `convert_to_strided_representation`:
        D400: First line should end with a period (not 'n')
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:559 in private nested function `restore_from_strided_representation`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:559 in private nested function `restore_from_strided_representation`:
        D400: First line should end with a period (not 'd')
23
```
After:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:1 at module level:
        D104: Missing docstring in public package
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:476 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:480 in public method `__enter__`:
        D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:487 in public method `__exit__`:
        D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/sparse/__init__.py:493 in public method `__call__`:
        D102: Missing docstring in public method
5
```
2) torch/contrib/_tensorboard_vis.py

Before:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/contrib/_tensorboard_vis.py:21 in public function `dump_tensorboard_summary`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/contrib/_tensorboard_vis.py:54 in public function `visualize_graph_executor`:
        D401: First line should be in imperative mood (perhaps 'Append', not 'Appends')
2
```
After:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/contrib/_tensorboard_vis.py:21 in public function `dump_tensorboard_summary`:
        D103: Missing docstring in public function
1
```
3) torch/jit/_state.py

Before:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_state.py:1 at module level:
        D400: First line should end with a period (not 'e')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_state.py:20 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_state.py:25 in public method `parse_env`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_state.py:41 in public method `__bool__`:
        D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_state.py:48 in public function `disable`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_state.py:52 in public function `enable`:
        D103: Missing docstring in public function
6
```
After:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_state.py:20 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_state.py:25 in public method `parse_env`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_state.py:41 in public method `__bool__`:
        D105: Missing docstring in magic method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_state.py:48 in public function `disable`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_state.py:52 in public function `enable`:
        D103: Missing docstring in public function
5
```
4) torch/jit/_monkeytype_config.py

Before:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:27 in public function `is_torch_native_class`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:40 in public function `get_type`:
        D200: One-line docstring should fit on one line with quotes (found 3)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:40 in public function `get_type`:
        D401: First line should be in imperative mood; try rephrasing (found 'Helper')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:62 in public function `get_optional_of_element_type`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:62 in public function `get_optional_of_element_type`:
        D400: First line should end with a period (not 'l')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:62 in public function `get_optional_of_element_type`:
        D401: First line should be in imperative mood; try rephrasing (found 'Helper')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:75 in public function `get_qualified_name`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:84 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:87 in public method `log`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:90 in public class `JitTypeTraceStore`:
        D101: Missing docstring in public class
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:91 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:98 in public method `add`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:103 in public method `filter`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:111 in public method `analyze`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:122 in public method `consolidate_types`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:139 in public method `get_args_types`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:142 in public class `JitTypeTraceConfig`:
        D101: Missing docstring in public class
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:143 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:148 in public method `trace_logger`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:148 in public method `trace_logger`:
        D400: First line should end with a period (not 'd')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:148 in public method `trace_logger`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:154 in public method `trace_store`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:157 in public method `code_filter`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:163 in public class `JitTypeTraceStoreLogger`:
        D101: Missing docstring in public class
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:164 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:167 in public class `JitTypeTraceStore`:
        D101: Missing docstring in public class
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:168 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:171 in public class `JitTypeTraceConfig`:
        D101: Missing docstring in public class
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:172 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:179 in public function `jit_code_filter`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:179 in public function `jit_code_filter`:
        D401: First line should be in imperative mood; try rephrasing (found 'Custom')
31
```
After:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:27 in public function `is_torch_native_class`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:74 in public function `get_qualified_name`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:83 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:86 in public method `log`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:89 in public class `JitTypeTraceStore`:
        D101: Missing docstring in public class
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:90 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:97 in public method `add`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:102 in public method `filter`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:110 in public method `analyze`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:121 in public method `consolidate_types`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:138 in public method `get_args_types`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:141 in public class `JitTypeTraceConfig`:
        D101: Missing docstring in public class
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:142 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:150 in public method `trace_store`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:153 in public method `code_filter`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:159 in public class `JitTypeTraceStoreLogger`:
        D101: Missing docstring in public class
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:160 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:163 in public class `JitTypeTraceStore`:
        D101: Missing docstring in public class
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:164 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:167 in public class `JitTypeTraceConfig`:
        D101: Missing docstring in public class
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_monkeytype_config.py:168 in public method `__init__`:
        D107: Missing docstring in __init__
21
```
5) torch/jit/_fuser.py

Before:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_fuser.py:9 in public function `optimized_execution`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_fuser.py:9 in public function `optimized_execution`:
        D400: First line should end with a period (not 'n')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_fuser.py:9 in public function `optimized_execution`:
        D401: First line should be in imperative mood; try rephrasing (found 'A')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_fuser.py:23 in public function `fuser`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_fuser.py:23 in public function `fuser`:
        D400: First line should end with a period (not 'n')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_fuser.py:23 in public function `fuser`:
        D401: First line should be in imperative mood; try rephrasing (found 'A')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_fuser.py:136 in public function `set_fusion_strategy`:
        D401: First line should be in imperative mood (perhaps 'Set', not 'Sets')
7
```
After:
```
0
```
6) torch/jit/_async.py

Before:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_async.py:1 at module level:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_async.py:1 at module level:
        D400: First line should end with a period (not 'I')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_async.py:20 in public function `fork`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_async.py:20 in public function `fork`:
        D400: First line should end with a period (not 'e')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_async.py:20 in public function `fork`:
        D401: First line should be in imperative mood (perhaps 'Create', not 'Creates')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_async.py:88 in public function `wait`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_async.py:88 in public function `wait`:
        D400: First line should end with a period (not 'e')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_async.py:88 in public function `wait`:
        D401: First line should be in imperative mood (perhaps 'Force', not 'Forces')
8
```
After:
```
0
```
7) torch/jit/_await.py

Before:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_await.py:11 in private function `_awaitable`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_await.py:11 in private function `_awaitable`:
        D400: First line should end with a period (not ',')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_await.py:11 in private function `_awaitable`:
        D401: First line should be in imperative mood (perhaps 'Create', not 'Creates')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_await.py:19 in private function `_awaitable_wait`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_await.py:19 in private function `_awaitable_wait`:
        D400: First line should end with a period (not ',')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_await.py:19 in private function `_awaitable_wait`:
        D401: First line should be in imperative mood (perhaps 'Request', not 'Requests')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_await.py:27 in private function `_awaitable_nowait`:
        D200: One-line docstring should fit on one line with quotes (found 3)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_await.py:27 in private function `_awaitable_nowait`:
        D401: First line should be in imperative mood (perhaps 'Create', not 'Creates')
8
```
After:
```
0
```
8) torch/jit/_check.py

Before:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_check.py:10 in public class `AttributeTypeIsSupportedChecker`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_check.py:10 in public class `AttributeTypeIsSupportedChecker`:
        D400: First line should end with a period (not 'e')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_check.py:10 in public class `AttributeTypeIsSupportedChecker`:
        D412: No blank lines allowed between a section header and its content ('Example')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_check.py:61 in public method `check`:
        D102: Missing docstring in public method
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_check.py:110 in public method `visit_Assign`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_check.py:110 in public method `visit_Assign`:
        D400: First line should end with a period (not 'n')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_check.py:132 in public method `visit_AnnAssign`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_check.py:132 in public method `visit_AnnAssign`:
        D400: First line should end with a period (not '`')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_check.py:187 in public method `visit_Call`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_check.py:187 in public method `visit_Call`:
        D400: First line should end with a period (not '`')
10
```
After:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_check.py:58 in public method `check`:
        D102: Missing docstring in public method
1
```
9) torch/jit/_freeze.py

Before:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_freeze.py:1 at module level:
        D400: First line should end with a period (not 'g')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_freeze.py:16 in public function `freeze`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_freeze.py:16 in public function `freeze`:
        D400: First line should end with a period (not 'd')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_freeze.py:127 in public function `run_frozen_optimizations`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_freeze.py:127 in public function `run_frozen_optimizations`:
        D401: First line should be in imperative mood (perhaps 'Run', not 'Runs')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_freeze.py:182 in public function `optimize_for_inference`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_freeze.py:182 in public function `optimize_for_inference`:
        D400: First line should end with a period (not 'e')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_freeze.py:182 in public function `optimize_for_inference`:
        D401: First line should be in imperative mood (perhaps 'Perform', not 'Performs')
8
```
After:
```
0
```
10) torch/jit/_recursive.py

Before:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:69 in public function `make_stub`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:75 in public function `make_stub_from_method`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:90 in public function `make_stubs_from_exported_methods`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:103 in public function `jit_ignored_properties`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:155 in public class `SourceContext`:
        D101: Missing docstring in public class
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:156 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:160 in public function `get_annotations`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:186 in public function `infer_concrete_type_builder`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:186 in public function `infer_concrete_type_builder`:
        D400: First line should end with a period (not 's')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:423 in public class `ConcreteTypeStore`:
        D101: Missing docstring in public class
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:427 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:434 in public method `get_or_create_concrete_type`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:434 in public method `get_or_create_concrete_type`:
        D400: First line should end with a period (not 'T')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:459 in public function `create_methods_and_properties_from_stubs`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:474 in public function `create_hooks_from_stubs`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:485 in public function `get_module_concrete_type`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:485 in public function `get_module_concrete_type`:
        D400: First line should end with a period (not 'e')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:485 in public function `get_module_concrete_type`:
        D401: First line should be in imperative mood (perhaps 'Get', not 'Gets')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:539 in public function `create_script_module`:
        D400: First line should end with a period (not 'e')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:539 in public function `create_script_module`:
        D401: First line should be in imperative mood (perhaps 'Create', not 'Creates')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:725 in public function `script_model_defines_attr`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:735 in public function `add_python_attr_to_scripted_model`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:740 in public function `get_overload_annotations`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:772 in public function `get_overload_name_mapping`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:797 in public function `make_stubs_for_overloads`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:816 in public function `check_module_initialized`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:842 in public function `infer_methods_to_compile`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:842 in public function `infer_methods_to_compile`:
        D400: First line should end with a period (not 'g')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:842 in public function `infer_methods_to_compile`:
        D401: First line should be in imperative mood (perhaps 'Implement', not 'Implements')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:904 in public function `get_hook_stubs`:
        D200: One-line docstring should fit on one line with quotes (found 3)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:904 in public function `get_hook_stubs`:
        D400: First line should end with a period (not 's')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:904 in public function `get_hook_stubs`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:940 in public function `get_property_stubs`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:940 in public function `get_property_stubs`:
        D400: First line should end with a period (not 'd')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:963 in public function `interface_script`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:963 in public function `interface_script`:
        D400: First line should end with a period (not 'r')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:963 in public function `interface_script`:
        D401: First line should be in imperative mood (perhaps 'Make', not 'Makes')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:977 in private nested function `infer_interface_methods_to_compile`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:977 in private nested function `infer_interface_methods_to_compile`:
        D400: First line should end with a period (not 'h')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:989 in public function `try_compile_fn`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:1014 in public function `wrap_cpp_class`:
        D200: One-line docstring should fit on one line with quotes (found 3)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:1021 in public function `wrap_cpp_module`:
        D200: One-line docstring should fit on one line with quotes (found 3)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:1021 in public function `wrap_cpp_module`:
        D400: First line should end with a period (not 's')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:1040 in public function `compile_unbound_method`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:1052 in public function `lazy_bind`:
        D205: 1 blank line required between summary line and description (found 0)
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:1052 in public function `lazy_bind`:
        D400: First line should end with a period (not 'd')
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:1052 in public function `lazy_bind`:
        D401: First line should be in imperative mood (perhaps 'Return', not 'Returns')
47
```
After:
```
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:69 in public function `make_stub`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:75 in public function `make_stub_from_method`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:90 in public function `make_stubs_from_exported_methods`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:103 in public function `jit_ignored_properties`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:155 in public class `SourceContext`:
        D101: Missing docstring in public class
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:156 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:160 in public function `get_annotations`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:424 in public class `ConcreteTypeStore`:
        D101: Missing docstring in public class
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:428 in public method `__init__`:
        D107: Missing docstring in __init__
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:457 in public function `create_methods_and_properties_from_stubs`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:472 in public function `create_hooks_from_stubs`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:724 in public function `script_model_defines_attr`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:734 in public function `add_python_attr_to_scripted_model`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:739 in public function `get_overload_annotations`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:771 in public function `get_overload_name_mapping`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:796 in public function `make_stubs_for_overloads`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:815 in public function `check_module_initialized`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:979 in public function `try_compile_fn`:
        D103: Missing docstring in public function
/home/ubuntu/Desktop/Docathon/pytorch/torch/jit/_recursive.py:1026 in public function `compile_unbound_method`:
        D103: Missing docstring in public function
19
```

@svekars

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113371
Approved by: https://github.com/davidberard98
2023-11-12 03:19:02 +00:00
Aaron Gokaslan
8219bf051b [BE]: Apply RUF015 to torch folder (#113025)
Removes unnecessary allocations of iterators. There is a small chance this may have side effects, as the entire iterator is no longer consumed, but this is a much more efficient method for retrieving the first element.
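
For illustration, the kind of rewrite RUF015 performs:

```python
items = {"a": 1, "b": 2}

first = list(items)[0]     # before: materializes the whole list
first = next(iter(items))  # after: consumes only the first element
```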

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113025
Approved by: https://github.com/ezyang, https://github.com/malfet
2023-11-07 00:48:15 +00:00
Pearu Peterson
e64d250210 Add a tool for a semi-automatic optimization of bsr_dense_mm meta parameters. (#112737)
Finding optimal meta parameters for bsr_dense_mm and bsr_scatter_mm triton kernels is a tedious job. This PR introduces a tool (a Python script `torch/sparse/_triton_ops_meta.py`) that finds the optimal set of meta parameters for a given set of matrix multiplication inputs and their block sizes. Currently, such a set is found for square bsr tensor inputs with sizes 256...16384 and square blocksizes 16...128, and dense tensor inputs with sizes 256...131072.
As a result, bsr_dense_mm performance has increased as follows (`NVIDIA A100-SXM4-80GB`):
- for blocksize 16x16, the average/maximum speed up is about 40/60 %.
- for blocksize 32x32, the average/maximum speed up is about 28/45 %.
- for blocksize 64x64, the average/maximum speed up is about 26/43 %.
- for blocksize 128x128, the average/maximum speed up is about 12/28 %.

To enable these performance improvements through meta parameter optimization on other CUDA devices, one must execute the `_triton_ops_meta.py` script, which will calculate the optimal meta parameters and store the results in a dictionary object defined in `_triton_ops_meta.py`.
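
For example (assuming a CUDA device is available):

```
python torch/sparse/_triton_ops_meta.py
```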

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112737
Approved by: https://github.com/cpuhrsch
2023-11-05 12:52:09 +00:00
Pearu Peterson
33c41daf60 Fix scatter_mm kernel failure on non-contiguous tensor arguments (#112337)
This PR fixes
```
RuntimeError: Triton Error [CUDA]: an illegal memory access was encountered
```
that appears when using large non-contiguous tensor arguments in `scatter_mm` kernel launch.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112337
Approved by: https://github.com/cpuhrsch
ghstack dependencies: #112154, #112076
2023-10-30 19:16:05 +00:00
Pearu Peterson
cf6041e942 Use weakref in storing tensors as keys (follow-up to #111470) (#112076)
This PR addresses the discussion items in https://github.com/pytorch/pytorch/pull/111470#discussion_r1369008167, that is,
- use weakref when storing tensors as keys,
- add `storage_offset` to the key data,
- and revise the description of the `TensorAsKey` utility.
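
For illustration, a minimal sketch of the pattern described above; this hypothetical `TensorAsKey` is not the PR's exact code:

```
import weakref

import torch

class TensorAsKey:
    """A hashable key for a tensor that does not keep the tensor alive."""

    def __init__(self, t: torch.Tensor) -> None:
        # weakref so that cache entries do not pin tensors in memory
        self._ref = weakref.ref(t)
        # storage_offset is part of the key so that different views over
        # the same storage get distinct keys
        self._key = (t.data_ptr(), t.storage_offset(), t.shape, t.stride(),
                     t.dtype, t.device)

    def __hash__(self):
        return hash(self._key)

    def __eq__(self, other):
        if not isinstance(other, TensorAsKey):
            return NotImplemented
        # a collected referent invalidates the cached entry
        if self._ref() is None or other._ref() is None:
            return False
        return self._key == other._key
```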

Pull Request resolved: https://github.com/pytorch/pytorch/pull/112076
Approved by: https://github.com/cpuhrsch
ghstack dependencies: #112154
2023-10-30 19:16:05 +00:00
Jesse Cai
702aaf8aea [sparse] semi-structured sparse + torch.compile support (#111049)
Summary:

This PR adds in torch.compile support for semi-structured sparsity,
using the subclass tracing @bdhirsh added.

Based on whether we are using cuSPARSELt or CUTLASS, we return a
different representation of the inner tensors.
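
A minimal sketch of the workflow this unlocks (assuming a CUDA device supported by the CUTLASS/cuSPARSELt kernels; the 2:4 weight below is illustrative):

```
import torch
from torch.sparse import to_sparse_semi_structured

# an illustrative 2:4-sparse weight (pattern 0,0,1,1 in each group of four)
w = torch.tensor([0, 0, 1, 1], dtype=torch.float16, device="cuda").tile((128, 32))
lin = torch.nn.Linear(128, 128).half().cuda().eval()
lin.weight = torch.nn.Parameter(to_sparse_semi_structured(w))

compiled = torch.compile(lin)  # the subclass now traces through dynamo
with torch.inference_mode():
    out = compiled(torch.randn(64, 128, dtype=torch.half, device="cuda"))
```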

Test Plan:
```
python test/test_sparse_semi_structured.py -k compile
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111049
Approved by: https://github.com/cpuhrsch
2023-10-24 02:23:20 +00:00
Pearu Peterson
b969c675f5 Add batched dimensions support to the second operand of bsr_scatter_mm (#111796)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111796
Approved by: https://github.com/cpuhrsch
ghstack dependencies: #110396, #111470, #111489, #111760
2023-10-23 23:52:49 +00:00
Pearu Peterson
6382011843 Add NVIDIA A100 optimized meta parameters to bsr_dense_mm (#111760)
As in the title.

The figures below illustrate the performance differences of bsr_dense_mm with optimized parameters and bsr_dense_mm with default parameters (GPU: NVIDIA A100-SXM4-80GB). The first figure shows the BSR tensor sparsity value at which bsr_dense_mm has the same performance characteristics as torch.matmul (the performance equilibrium point). The second figure shows the speedups from using optimized meta parameters in bsr_dense_mm at its performance equilibrium points, relative to bsr_dense_mm with default meta parameters.

In sum, this PR speeds up `bsr_dense_mm` by about 50%, depending on the BSR tensor shape and blocksize, and lowers the performance equilibrium points of BSR tensor sparsity and strided tensor for matmul operations.

<img src="https://github.com/pytorch/pytorch/assets/402156/6fe9d35f-dd21-4aa0-bb01-6ee257254453" width="48%"> <img src="https://github.com/pytorch/pytorch/assets/402156/506921c6-3770-4209-ad3d-498d2ae4989d" width="48%">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111760
Approved by: https://github.com/cpuhrsch
ghstack dependencies: #110396, #111470, #111489
2023-10-23 23:52:49 +00:00
Pearu Peterson
f3d08ab271 Use more performant bsr_scatter_mm within bsr_dense_mm when blocksize is 16. (#111489)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111489
Approved by: https://github.com/cpuhrsch
ghstack dependencies: #110396, #111470
2023-10-23 23:52:49 +00:00
Pearu Peterson
6078ed95cc Use lru_cache to cache indices data for bsr_scatter_mm. (#111470)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111470
Approved by: https://github.com/cpuhrsch
ghstack dependencies: #110396
2023-10-23 23:52:49 +00:00
Pearu Peterson
d4708a6da7 Add scatter_mm and bsr_scatter_mm operations. (#110396)
This PR introduces the `scatter_mm` operation (computing `mm` over arbitrary pairs of tensors given in batches of tensors), which is used to implement `bsr_scatter_mm`, an equivalent of `bsr_dense_mm` (the `mm` operation on BSR and strided tensors). The implementation is provided both in Triton (when tensor dimensions are multiples of 16) and in PyTorch (otherwise).

The figures below illustrate the performance differences of `bsr_scatter_mm` and `bsr_dense_mm` (GPU: `NVIDIA GeForce RTX 2060 SUPER`). The first figure shows the BSR tensor sparsity value at which `bsr_scatter_mm` or `bsr_dense_mm` has the same performance characteristics as `torch.matmul` (the performance equilibrium point). The second figure shows the speedups from using `bsr_scatter_mm` at its performance equilibrium points with respect to `bsr_dense_mm`.

<img src="https://github.com/pytorch/pytorch/assets/402156/526d182e-937f-4812-a6c4-904f52d6d5ab" width="48%"> <img src="https://github.com/pytorch/pytorch/assets/402156/ccb606ab-1f3f-4133-887c-b56285f4f168" width="48%">

The same figures for GPU card `NVIDIA A100-SXM4-80GB`:

<img src="https://github.com/pytorch/pytorch/assets/402156/25466f1d-df34-4d1c-a975-afb478e4d9f0" width="48%"> <img src="https://github.com/pytorch/pytorch/assets/402156/6ada91f0-a20f-4f0d-8a48-1f4ccc60d08e" width="48%">

In sum:
- `bsr_scatter_mm` is about 2x faster than `bsr_dense_mm` for small block sizes of 16 and 32 and large tensors [GPU: `NVIDIA GeForce RTX 2060 SUPER`].
- `bsr_scatter_mm` is up to 2x faster than `bsr_dense_mm` for small block sizes of 16 and large tensors [GPU: `NVIDIA A100-SXM4-80GB`].
- `bsr_dense_mm` is up to 20 % faster than `bsr_scatter_mm` for block sizes of 64 or larger [GPU: `NVIDIA GeForce RTX 2060 SUPER`].
- However, `bsr_dense_mm` fails with `OutOfResources` exception for block sizes of 256 or larger whereas `bsr_scatter_mm` succeeds.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110396
Approved by: https://github.com/cpuhrsch
2023-10-23 19:45:30 +00:00
PyTorch MergeBot
41490119f2 Revert "[sparse] semi-structured sparse + torch.compile support (#111049)"
This reverts commit 408f210938.

Reverted https://github.com/pytorch/pytorch/pull/111049 on behalf of https://github.com/clee2000 due to Sorry I'm pretty sure this caused a memory leak 408f210938 https://github.com/pytorch/pytorch/actions/runs/6550388354/job/17790615103 `test_sparse_semi_structured.py::TestSparseSemiStructuredCUDA::test_mlp_contiguous_relu_compile_backend_cutlass_dense_input_shape_(1, 128)_cuda - RuntimeError: CUDA driver API confirmed a leak in __main__.TestSparseSemiStructuredCUDA.test_mlp_contiguous_relu_compile_backend_cutlass_dense_input_shape_(1, 128)_cuda! Caching allocator allocated memory was 235008 and is now reported as 352256 on device 0. CUDA driver allocated memory was 359333888 and is now 361431040.` ([comment](https://github.com/pytorch/pytorch/pull/111049#issuecomment-1767186569))
2023-10-17 21:11:09 +00:00
Jesse Cai
408f210938 [sparse] semi-structured sparse + torch.compile support (#111049)
Summary:

This PR adds in torch.compile support for semi-structured sparsity,
using the subclass tracing @bdhirsh added.

Based on whether we are using cuSPARSELt or CUTLASS, we return a
different representation of the inner tensors.

Test Plan:
```
python test/test_sparse_semi_structured.py -k compile
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111049
Approved by: https://github.com/cpuhrsch
ghstack dependencies: #110583
2023-10-16 23:07:26 +00:00
PyTorch MergeBot
b4745d476c Revert "[sparse] semi-structured sparse + torch.compile support (#111049)"
This reverts commit ac02531bab.

Reverted https://github.com/pytorch/pytorch/pull/111049 on behalf of https://github.com/DanilBaibak due to Broken trunk ([comment](https://github.com/pytorch/pytorch/pull/111049#issuecomment-1763795957))
2023-10-16 06:16:59 +00:00
Jesse Cai
ac02531bab [sparse] semi-structured sparse + torch.compile support (#111049)
Summary:

This PR adds in torch.compile support for semi-structured sparsity,
using the subclass tracing @bdhirsh added.

Based on whether we are using cuSPARSELt or CUTLASS, we return a
different representation of the inner tensors.

Test Plan:
```
python test/test_sparse_semi_structured.py -k compile
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111049
Approved by: https://github.com/cpuhrsch
ghstack dependencies: #110583
2023-10-14 01:13:01 +00:00
Jesse Cai
8db72a430d [sparse] Add padding for dense matrices in semi-structured sparse (#110583)
Summary:

Currently we have shape constraints in semi-structured sparsity for both
CUTLASS and cuSPARSELt

These shape constraints unfortunately apply to both the dense and sparse
matrices in sparse-dense matmul.

This PR adds in support for calling `F.pad` in order to pad dense
matrices to the right size with zeros and then pull out the
corresponding rows from the resulting matrix.

We also throw a warning in this case.
The tests have also been updated to take in a dense_input_shape
parameter.
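
A rough sketch of the padding trick described above (the helper name and the multiple of 8 are illustrative, not the PR's actual code):

```
import torch
import torch.nn.functional as F

def pad_rows(dense: torch.Tensor, multiple: int = 8) -> torch.Tensor:
    # zero-pad the row dimension up to the next multiple
    pad = -dense.shape[-2] % multiple
    return F.pad(dense, (0, 0, 0, pad)) if pad else dense

dense = torch.randn(10, 128)
padded = pad_rows(dense)  # shape (16, 128)
# ... run the sparse-dense matmul on `padded`, then keep only the rows
# that correspond to the original input:
# out = result[..., :dense.shape[-2], :]
```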

Test Plan:
```
python test/test_sparse_semi_structured.py
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110583
Approved by: https://github.com/alexsamardzic, https://github.com/cpuhrsch
2023-10-13 20:04:23 +00:00
Jesse Cai
f10aab03c4 [sparse] Fix semi-structured sparse shape mismatch bug (#110420)
Summary:

Currently, PyTorch incorrectly calculates the size of the returned
matrix when we pass a non-contiguous batched (>2d) input to the
semi-structured sparse subclass.

This is most common in MLP layers, where we have 2 linear layers back to back.

This will lead to an error like the following:
```
RuntimeError: shape '[20, 64, 64, 3072]' is invalid for input of size
62914560

```
Where the size of the sparse matmul result is off because we infer the
output shape with the wrong tensor shape.

This happens because of a bug where we did not update the subclass
tensor shape when doing transpose.
For semi-structured sparsity, transposing is a no-op where we just set
the boolean flag, but we forgot to also update the tensor shape.

Note that this error goes away in inference mode, since we avoid
decomposing the aten.linear op and handle shape folding ourselves,
which changes the execution path.

An alternative workaround is to set
TORCH_FLATTEN_LINEAR_3D=True, which also avoids this error.

Test Plan:
```
python test/test_sparse_semi_structured.py -k test_mlp

```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110420
Approved by: https://github.com/alexsamardzic, https://github.com/cpuhrsch
2023-10-10 03:07:31 +00:00
Aleksandar Samardžić
6a202c36af Minor fixes in semi-structured sparse code (#105595)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105595
Approved by: https://github.com/jcaip
2023-09-25 14:06:08 +00:00
Oguz Ulgen
1df14f1bf8 Move has_triton to top level triton utils so that dynamo can also access (#109832)
it without creating cyclic dependencies

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109832
Approved by: https://github.com/zou3519
2023-09-22 19:33:41 +00:00
Pearu Peterson
4e042cfed5 Improve triton bsr_dense_mm performance on column-major ordered inputs with float32 dtype (#108512)
As in the title.

The bsr_dense_mm performance on inputs using column-major storage order is relevant for the `linear(x, W)` operation that for BSR weights is defined as `bsr_dense_mm(W, x.transpose(-2, -1)).transpose(-2, -1)` so that the second argument to `bsr_dense_mm` is a strided tensor using column-major storage order when `x` is C-contiguous.

For large inputs (size > 1000) and moderate sparsity in the BSR input, the speed up can be more than 3 times, as illustrated in the following figure (raw data: [bench_bsr_dense_mm_1_results.txt](https://github.com/pytorch/pytorch/files/12512245/bench_bsr_dense_mm_1_results.txt)):

![bench_bsr_dense_mm_1](https://github.com/pytorch/pytorch/assets/402156/c6372008-dfae-4d26-b119-2c3c944a74ae)

For small inputs (size=512), there exists a slight degradation of performance.

For row-major ordered inputs, there is no change in performance (see raw data above).

For inputs with float16 dtype, there is no considerable change in performance (see blue marks in the figure).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/108512
Approved by: https://github.com/cpuhrsch
2023-09-06 17:30:06 +00:00
Pearu Peterson
c5ad44be1d Add torch.sparse.as_sparse_gradcheck decorator of gradcheck that allows gradcheck input function to receive and return sparse tensors (#107150)
Compared to #104848, this PR goes a step further: when the enable_sparse_support decorator is applied to `torch.autograd.gradcheck`, the resulting callable is equivalent to `torch.autograd.gradcheck` with the extra feature of supporting functions that can take sparse tensors as inputs and/or return sparse tensors.

At the same time, the underlying call to `torch.autograd.gradcheck` will operate on strided tensors only. This basically means that torch/autograd/gradcheck.py can be cleaned up by removing the code that deals with sparse tensors.
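
A minimal usage sketch, assuming the decorator is exposed as `torch.sparse.as_sparse_gradcheck` per the title:

```
import torch

gradcheck = torch.sparse.as_sparse_gradcheck(torch.autograd.gradcheck)

x = torch.randn(3, 3, dtype=torch.float64).to_sparse_coo().requires_grad_()
# sparse inputs are densified internally, so the underlying gradcheck
# only ever sees strided tensors
assert gradcheck(lambda t: t.to_dense(), (x,))
```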

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107150
Approved by: https://github.com/albanD, https://github.com/amjames, https://github.com/cpuhrsch
ghstack dependencies: #107638, #107777
2023-08-26 07:24:31 +00:00
Christian Puhrsch
925d71e72e [core][sparse][pruning] cuSPARSELt Kernels and ops. (#107398)
Summary:
This is a duplicate PR of 102133, which was reverted because it was
failing internal tests.

It seems that internal builds did not like my guard that checks
whether cuSPARSELt is available.

Test Plan: python test/test_sparse_semi_structured.py

Differential Revision: D48440330

Pull Request resolved: https://github.com/pytorch/pytorch/pull/107398
Approved by: https://github.com/cpuhrsch
2023-08-25 07:04:15 +00:00
PyTorch MergeBot
fe594ab323 Revert "[core][pruning][feature] cuSPARSELt kernels and ops (#102133)"
This reverts commit ad22f0ffb4.

Reverted https://github.com/pytorch/pytorch/pull/102133 on behalf of https://github.com/jcaip due to breaking lots of internal builds, see D48144534 ([comment](https://github.com/pytorch/pytorch/pull/102133#issuecomment-1671707821))
2023-08-09 16:03:14 +00:00
Jesse Cai
ad22f0ffb4 [core][pruning][feature] cuSPARSELt kernels and ops (#102133)
This PR contains two new private ops, added for cuSPARSELt support.

These ops call into the cuSPARSELt kernels using the bindings they
provide. For more information, see the documentation
[here](https://docs.nvidia.com/cuda/cusparselt/index.html).

The two new private ops added are:
```
_cslt_compress()
_cslt_sparse_mm()
```

_cslt_compress is an op that returns the compressed matrix for the
sparse matrix that is passed in.

_cslt_sparse_mm is an op that expects a compressed matrix (the result of
_cslt_compress) and a dense matrix and performs sparse-dense matmul

These ops will throw runtime errors if cuSPARSELt is not present.
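
A hedged sketch of calling the two ops (requires a PyTorch build with cuSPARSELt and a supported GPU; the exact signatures may carry extra optional arguments):

```
import torch

# a float16 weight pruned to a 2:4 pattern (illustrative mask)
w = torch.randn(128, 128, dtype=torch.float16, device="cuda")
w = w * torch.tensor([0, 0, 1, 1], device="cuda").tile((128, 32))
x = torch.randn(128, 128, dtype=torch.float16, device="cuda")

compressed = torch._cslt_compress(w)        # compressed representation
out = torch._cslt_sparse_mm(compressed, x)  # sparse @ dense matmul
```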

This PR also modifies the test and tensor subclass to reflect the new
cuSPARSELt support.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102133
Approved by: https://github.com/cpuhrsch
2023-08-08 06:59:22 +00:00
Justin Chu
3721fa5612 [BE] Enable ruff's UP rules and autoformat optim/ (#105426)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105426
Approved by: https://github.com/malfet, https://github.com/albanD, https://github.com/aaronenyeshi, https://github.com/janeyx99
2023-07-18 21:07:43 +00:00
Aleksandar Samardžić
5d473a950f Make conversions from/to sparse semi-structured always @torch.compile-d (#105272)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105272
Approved by: https://github.com/ezyang
2023-07-18 04:51:28 +00:00
Nikita Shulga
5837e95d30 [Reland] Update mypy to 1.4.1 (#105227)
This PR re-lands
- [Typing] Fix PEP 484 Violation (#105022)
- Update mypy to 1.4.1 (#91983)

That were reverted due to the conflict with internal source repo.

Mostly fixes for PEP-484 violation (i.e. when default arg is set to None, but type is not annotated as optional)
Plus few real fixes:
  - Add missing `_get_upgraders_entry_map` to `torch/_C/__init__.pyi`
  - Add missing return statement to `torch._export.deserialize_graph`
  - Fix error message in `torch.ao.ns.fx.weight_utils.get_lstm_mod_weights`
  - Add assert in `torch/optim/optimizer.py` that Optional list is not None
TODO (in followup PR):
  - Fix erroneous `isinstance` check in `torch/ao/quantization/_pt2e/qat_utils.py`

Unrelated, to bypass CI failures due to the gcc9 dependency update in Ubuntu-18.04:
- Add hack to squash older libstdc++ from conda environment in favor one from OS to `.ci/docker/install_conda.sh`
- Update bazel cuda builds to focal, as with libstdc++-6.0.32 bazel builds lose the ability to catch exceptions (probably because they link with cupti statically, but I could not find where it is done)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105227
Approved by: https://github.com/atalman, https://github.com/albanD, https://github.com/Skylion007
2023-07-15 20:30:20 +00:00
PyTorch MergeBot
15fd1ea118 Revert "[Reland] Update mypy to 1.4.1 (#105227)"
This reverts commit c9c4f8efc3.

Reverted https://github.com/pytorch/pytorch/pull/105227 on behalf of https://github.com/atalman due to trying to mitigate ci sev #105248 ([comment](https://github.com/pytorch/pytorch/pull/105227#issuecomment-1636510935))
2023-07-14 22:28:35 +00:00
Nikita Shulga
c9c4f8efc3 [Reland] Update mypy to 1.4.1 (#105227)
This PR re-lands
- [Typing] Fix PEP 484 Violation (#105022)
- Update mypy to 1.4.1 (#91983)

That were reverted due to the conflict with internal source repo.

Mostly fixes for PEP-484 violation (i.e. when default arg is set to None, but type is not annotated as optional)
Plus few real fixes:
  - Add missing `_get_upgraders_entry_map` to `torch/_C/__init__.pyi`
  - Add missing return statement to `torch._export.deserialize_graph`
  - Fix error message in `torch.ao.ns.fx.weight_utils.get_lstm_mod_weights`
  - Add assert in `torch/optim/optimizer.py` that Optional list is not None
TODO (in followup PR):
  - Fix erroneous `isinstance` check in `torch/ao/quantization/_pt2e/qat_utils.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105227
Approved by: https://github.com/atalman, https://github.com/albanD, https://github.com/Skylion007
2023-07-14 20:45:12 +00:00
PyTorch MergeBot
3c5a494d7a Revert "Update mypy to 1.4.1 (#91983)"
This reverts commit 634659e262.

Reverted https://github.com/pytorch/pytorch/pull/91983 on behalf of https://github.com/malfet due to It's dependent change was reverted, so reverting this one as well, to keep CI clean ([comment](https://github.com/pytorch/pytorch/pull/91983#issuecomment-1636059709))
2023-07-14 15:59:16 +00:00
Aleksandar Samardžić
d7e6040efa Update sparse semi-structured linear operator (#104608)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104608
Approved by: https://github.com/cpuhrsch
2023-07-13 23:52:39 +00:00
Aleksandar Samardžić
fc2f87b281 Add semi-structured sparse conversions (#103830)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103830
Approved by: https://github.com/amjames, https://github.com/jcaip, https://github.com/cpuhrsch
2023-07-13 21:09:09 +00:00
nikitaved
44c8515d0d SDPA: frontend for BSR masks (#104042)
This PR implements a (yet private) frontend for scaled_dot_product_attention that works with BSR `attn_mask`.

This function is directly comparable (with suitable masks) with `torch.nn.functional.scaled_dot_product_attention` once `attn_mask.dtype == torch.bool`, but its behavior is different when `attn_mask.dtype != torch.bool`. This is because `torch.nn.functional.scaled_dot_product_attention` assumes that irrelevant values are supposed to be filled with `-inf`, while the selected ones should be `0`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104042
Approved by: https://github.com/amjames, https://github.com/cpuhrsch
2023-07-13 18:01:21 +00:00
Nikita Shulga
634659e262 Update mypy to 1.4.1 (#91983)
Mostly fixes for PEP-484 violation (i.e. when default arg is set to None, but type is not annotated as optional)
Plus few real fixes:
  - Add missing `_get_upgraders_entry_map` to `torch/_C/__init__.pyi`
  - Add missing return statement to `torch._export.deserialize_graph`
  - Fix error message in `torch.ao.ns.fx.weight_utils.get_lstm_mod_weights`
  -
TODO (in followup PR):
  - Fix erroneous `isinstance` check in `torch/ao/quantization/_pt2e/qat_utils.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/91983
Approved by: https://github.com/kit1980, https://github.com/ZainRizvi, https://github.com/huydhn, https://github.com/thiagocrepaldi, https://github.com/aaronenyeshi
2023-07-13 16:30:36 +00:00
Jesse Cai
2da6cae43c [core][pruning][sparse][feature] SparseSemiStructured tensor subclass (#102135)
This PR adds in support for semi-structured sparsity via a tensor
subclass. It currently uses the CUTLASS kernels merged in PR #100881.

In the future we plan to add in cuSPARSELt support (see the other PRs in
the stack), which will give us larger performance gains.

This PR adds in 2 things:
- a Tensor subclass, `SparseSemiStructuredTensor` to store the
  sparse tensor in compressed form and override `__torch_dispatch__`.
- a conversion function that takes in a dense tensor and a
  semi-structured sparse bool mask and creates an instance of the
  subclass.

**SparseSemiStructuredTensor**

The subclass stores the dense tensor in a contiguous flattened tensor
for future compatibility with cuSPARSELt, which expects this format.
Note that the CUTLASS kernels do not have this limitation, as the
specified values and the metadata are passed separately in
`_structured_sparse_linear`. In the future we can use the cuSPARSELt bindings
[here](https://github.com/pytorch/pytorch/pull/103700) for faster matmul, better dtype coverage, and relaxed shape
constraints.

Since we currently don't have a way to go back from the sparse
representation to the dense representation, and we store the weights in
compressed form, we don't have a great way to handle .t().

Instead, we keep track of how often we've called transpose on our
tensor, and if it's an unexpected number we throw an error. When the first
argument is sparse, we expect an even number of calls to transpose,
while when the second argument is sparse, we expect an odd number of
calls. This is because we support second argument sparse matrix
multiplications by using transpose properties.

**to_sparse_semi_structured**

This is a conversion function to convert a dense tensor and a
semi-structured sparse bool mask into a subclass. Currently, we must
pass in a bool mask, since we can't infer it because there may be
additional zero elements in the dense tensor, so `tensor != 0` is not 2:4
sparse.

Once we add either a method to derive the mask from the dense tensor or
cuSPARSELt, we no longer need to pass in the mask. cuSPARSELt has its
own helper functions to create the metadata mask.

**User Details**

We have implemented support for the following ops for `torch.float16`
and `torch.int8`:
```
torch.addmm(bias, dense, sparse.t())
torch.mm(dense, sparse)
torch.mm(sparse, dense)
aten.linear.default
aten.t.default
aten.t.detach
```

The end user interface to accelerate an nn.Linear module with the
subclass would look like this:

```
import torch
import torch.nn as nn
from torch.sparse import to_sparse_semi_structured

mask = torch.Tensor([0, 0, 1, 1]).tile(128, 32).cuda().bool()
linear = nn.Linear(128, 128).half().cuda()

linear.weight = nn.Parameter(to_sparse_semi_structured(linear.weight,
                                                       mask=mask))
```

This also updates tests and the `torch.sparse` module docstring to
reflect these changes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102135
Approved by: https://github.com/albanD
2023-06-27 19:21:06 +00:00
PyTorch MergeBot
b76a040b18 Revert "[core][pruning][sparse][feature] SparseSemiStructured tensor subclass (#102135)"
This reverts commit aea771de30.

Reverted https://github.com/pytorch/pytorch/pull/102135 on behalf of https://github.com/huydhn due to test_sparse_semi_structured.py::TestSparseSemiStructuredCUDA::test_mm_sparse_first_NT_cuda_int8 is still failing CUDA trunk jobs aea771de30 ([comment](https://github.com/pytorch/pytorch/pull/102135#issuecomment-1608744110))
2023-06-27 03:49:31 +00:00
Jesse Cai
aea771de30 [core][pruning][sparse][feature] SparseSemiStructured tensor subclass (#102135)
This PR adds in support for semi-structured sparsity via a tensor
subclass. It currently uses the CUTLASS kernels merged in PR #100881.

In the future we plan to add in cuSPARSELt support (see the other PRs in
the stack), which will give us larger performance gains.

This PR adds in 2 things:
- a Tensor subclass, `SparseSemiStructuredTensor` to store the
  sparse tensor in compressed form and override `__torch_dispatch__`.
- a conversion function that takes in a dense tensor and a
  semi-structured sparse bool mask and creates an instance of the
  subclass.

**SparseSemiStructuredTensor**

The subclass stores the dense tensor in a contiguous flattened tensor
for future compatibility with cuSPARSELt, which expects this format.
Note that the CUTLASS kernels do not have this limitation, as the
specified values and the metadata are passed separately in
`_structured_sparse_linear`. In the future we can use the cuSPARSELt bindings
[here](https://github.com/pytorch/pytorch/pull/103700) for faster matmul, better dtype coverage, and relaxed shape
constraints.

Since we currently don't have a way to go back from the sparse
representation to the dense representation, and we store the weights in
compressed form, we don't have a great way to handle .t().

Instead, we keep track of how often we've called transpose on our
tensor, and if it's an unexpected number we throw an error. When the first
argument is sparse, we expect an even number of calls to transpose,
while when the second argument is sparse, we expect an odd number of
calls. This is because we support second argument sparse matrix
multiplications by using transpose properties.

**to_sparse_semi_structured**

This is a conversion function to convert a dense tensor and a
semi-structured sparse bool mask into a subclass. Currently, we must
pass in a bool mask, since we can't infer it because there may be
additional zero elements in the dense tensor, so `tensor != 0` is not 2:4
sparse.

Once we add either a method to derive the mask from the dense tensor or
cuSPARSELt, we no longer need to pass in the mask. cuSPARSELt has its
own helper functions to create the metadata mask.

**User Details**

We have implemented support for the following ops for `torch.float16`
and `torch.int8`:
```
torch.addmm(bias, dense, sparse.t())
torch.mm(dense, sparse)
torch.mm(sparse, dense)
aten.linear.default
aten.t.default
aten.t.detach
```

The end user interface to accelerate an nn.Linear module with the
subclass would look like this:

```
import torch
import torch.nn as nn
from torch.sparse import to_sparse_semi_structured

mask = torch.Tensor([0, 0, 1, 1]).tile(128, 32).cuda().bool()
linear = nn.Linear(128, 128).half().cuda()

linear.weight = nn.Parameter(to_sparse_semi_structured(linear.weight,
                                                       mask=mask))
```

This also updates tests and the `torch.sparse` module docstring to
reflect these changes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102135
Approved by: https://github.com/albanD
2023-06-27 02:37:00 +00:00
PyTorch MergeBot
bfa08a1c67 Revert "[core][pruning][sparse][feature] SparseSemiStructured tensor subclass (#102135)"
This reverts commit cf5262a84f.

Reverted https://github.com/pytorch/pytorch/pull/102135 on behalf of https://github.com/huydhn due to Sorry for reverting your PR but test_sparse_semi_structured.py::TestSparseSemiStructuredCUDA::test_mm_sparse_first_NT_cuda_int8 is failing CUDA trunk jobs cf5262a84f. This looks like a landrace ([comment](https://github.com/pytorch/pytorch/pull/102135#issuecomment-1608423849))
2023-06-26 22:54:16 +00:00
Jesse Cai
cf5262a84f [core][pruning][sparse][feature] SparseSemiStructured tensor subclass (#102135)
This PR adds in support for semi-structured sparsity via a tensor
subclass. It currently uses the CUTLASS kernels merged in PR #100881.

In the future we plan to add in cuSPARSELt support (see the other PRs in
the stack), which will give us larger performance gains.

This PR adds in 2 things:
- a Tensor subclass, `SparseSemiStructuredTensor` to store the
  sparse tensor in compressed form and override `__torch_dispatch__`.
- a conversion function that takes in a dense tensor and a
  semi-structured sparse bool mask and creates an instance of the
  subclass.

**SparseSemiStructuredTensor**

The subclass stores the dense tensor in a contiguous flattened tensor
for future compatibility with cuSPARSELt, which expects this format.
Note that the CUTLASS kernels do not have this limitation, as the
specified values and the metadata are passed separately in
`_structured_sparse_linear`. In the future we can use the cuSPARSELt bindings
[here](https://github.com/pytorch/pytorch/pull/103700) for faster matmul, better dtype coverage, and relaxed shape
constraints.

Since we currently don't have a way to go back from the sparse
representation to the dense representation, and we store the weights in
compressed form, we don't have a great way to handle .t().

Instead, we keep track of how often we've called transpose on our
tensor, and if it's an unexpected number we throw an error. When the first
argument is sparse, we expect an even number of calls to transpose,
while when the second argument is sparse, we expect an odd number of
calls. This is because we support second argument sparse matrix
multiplications by using transpose properties.

**to_sparse_semi_structured**

This is a conversion function to convert a dense tensor and a
semi-structured sparse bool mask into a subclass. Currently, we must
pass in a bool mask, since we can't infer it because there may be
additional zero elements in the dense tensor, so `tensor != 0` is not 2:4
sparse.

Once we add either a method to derive the mask from the dense tensor or
cuSPARSELt, we no longer need to pass in the mask. cuSPARSELt has its
own helper functions to create the metadata mask.

**User Details**

We have implemented support for the following ops for `torch.float16`
and `torch.int8`:
```
torch.addmm(bias, dense, sparse.t())
torch.mm(dense, sparse)
torch.mm(sparse, dense)
aten.linear.default
aten.t.default
aten.t.detach
```

The end user interface to accelerate an nn.Linear module with the
subclass would look like this:

```
import torch
import torch.nn as nn
from torch.sparse import to_sparse_semi_structured

mask = torch.Tensor([0, 0, 1, 1]).tile(128, 32).cuda().bool()
linear = nn.Linear(128, 128).half().cuda()

linear.weight = nn.Parameter(to_sparse_semi_structured(linear.weight,
                                                       mask=mask))
```

This also updates tests and the `torch.sparse` module docstring to
reflect these changes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102135
Approved by: https://github.com/albanD
2023-06-26 21:30:43 +00:00
Nikita Vedeneev
39a22e2791 softmax: Triton kernel for BSR inputs (#102095)
Implements `softmax` Triton kernel for BSR inputs. So far, only over `dim=-1`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102095
Approved by: https://github.com/cpuhrsch
2023-06-21 01:23:27 +00:00
Nikita Vedeneev
6c7410ddc3 sampled_addmm: BSR support (#101163)
This PR implements a `sampled_addmm` kernel that works with a BSR mask.
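
For reference, a sketch using the analogous public CSR op; this PR adds a private Triton kernel with the same semantics for BSR masks:

```
import torch

mask = torch.eye(4).to_sparse_csr()  # sparsity pattern to sample
a, b = torch.randn(4, 8), torch.randn(8, 4)
# mask + (a @ b), evaluated only at mask's specified elements
out = torch.sparse.sampled_addmm(mask, a, b)
```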

Pull Request resolved: https://github.com/pytorch/pytorch/pull/101163
Approved by: https://github.com/cpuhrsch
2023-05-25 12:33:50 +00:00
Nikita Vedeneev
dd2c22f4bb bsr_dense_bmm(): enable more precise float32 support with float64 accumulators (#100882)
Float64 is there in Triton! This PR increases precision for float32 inputs with float64 accumulation dtype.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100882
Approved by: https://github.com/cpuhrsch
2023-05-11 11:22:55 +00:00
Nikita Vedeneev
0141a242fd bsr_dense_bmm(): remove sparse_rowspace kernel and some dead code (#100876)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100876
Approved by: https://github.com/cpuhrsch, https://github.com/Skylion007
2023-05-09 16:12:11 +00:00
Nikita Vedeneev
c4bc259f00 bsr_dense_mm(): better test coverage (#100543)
This PR improves test coverage for `bsr_dense_mm` by:
- ~~enabling correctness tests for `float32`~~.
- extending and testing input correctness checks.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100543
Approved by: https://github.com/cpuhrsch, https://github.com/malfet
2023-05-09 09:26:02 +00:00
Nikita Vedeneev
cd8b82e5c6 bsr_dense_mm(): code refactoring (#100634)
Code unification/refactoring for better re-use. Intended for easier `sampled_addmm` implementation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100634
Approved by: https://github.com/cpuhrsch
2023-05-08 13:27:39 +00:00
Nikita Vedeneev
05dda7ff65 bsr_dense_mm Triton kernel: fix out kwarg (#96648)
As per title. The kernel did not handle `out=` correctly and returned a different tensor which only shared storage with `out`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96648
Approved by: https://github.com/cpuhrsch
2023-03-14 18:01:22 +00:00
Natalia Gimelshein
76cac70939 new triton main pin (#95896)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95896
Approved by: https://github.com/jansel, https://github.com/malfet
2023-03-10 06:30:41 +00:00
PyTorch MergeBot
d0731271cd Revert "new triton main pin (#95896)"
This reverts commit 6e0359dd42.

Reverted https://github.com/pytorch/pytorch/pull/95896 on behalf of https://github.com/huydhn due to I am not quite sure what this is about yet, but testing 3.8 wheel starts to fail 6e0359dd42
2023-03-10 05:41:45 +00:00
Natalia Gimelshein
6e0359dd42 new triton main pin (#95896)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95896
Approved by: https://github.com/jansel
2023-03-10 03:40:37 +00:00
Nikita Vedeneev
d809020fc8 Triton kernel for bsr @ dense (#94823)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94823
Approved by: https://github.com/cpuhrsch, https://github.com/malfet
2023-03-03 15:11:28 +00:00
Pearu Peterson
0c0694495b Fix a bug in nesting check_sparse_tensor_invariants context managers (#95372)
As in the title. The bug was reported in https://github.com/pytorch/pytorch/pull/94728#discussion_r1108892366 and has the following reproducer:
```python
>>> import torch
>>> check_ctx = torch.sparse.check_sparse_tensor_invariants(True)
>>> no_check_ctx = torch.sparse.check_sparse_tensor_invariants(False)
>>> with check_ctx:
...   assert torch.sparse.check_sparse_tensor_invariants.is_enabled()
...   with no_check_ctx:
...     assert not torch.sparse.check_sparse_tensor_invariants.is_enabled()
...   assert torch.sparse.check_sparse_tensor_invariants.is_enabled()
...
Traceback (most recent call last):
  File "<stdin>", line 5, in <module>
AssertionError
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95372
Approved by: https://github.com/cpuhrsch
2023-02-23 18:22:13 +00:00
mingfeima
c620ece726 port sparse_mm.reduce to pytorch and optimize it on CPU (#83727)
### Motivation of this PR

This patch is to migrate `spmm_reduce` from `torch-sparse` (a 3rd party dependency for PyG) to `torch`, which is a response to the initial proposal for fusion of **Gather, Apply Scatter** in Message Passing of GNN inference/training. https://github.com/pytorch/pytorch/issues/71300

**GAS** is the major step for Message Passing, the behavior of **GAS** can be classified into 2 kinds depending on the storage type of `EdgeIndex` which records the connections of nodes:

* COO: the hotspot is `scatter_reduce`
* CSR: the hotspot is `spmm_reduce`

The reduce type can be chosen from: "sum", "mean", "max", "min".

Extend `torch.sparse.mm` with a `reduce` argument; it maps to `torch.sparse_mm.reduce` internally.
`sparse_mm_reduce` is registered under the TensorTypeId of `SparseCsrCPU`, and this operator requires an internal interface `_sparse_mm_reduce_impl` which has dual outputs:
* `out` - the actual output
* `arg_out` - records the output indices of the non-zero elements if the reduce type is "max" or "min"; this is only useful for training, so for inference it will not be calculated.
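
A small sketch of the extended `torch.sparse.mm` API described above (CPU, CSR inputs; the accepted reduce strings follow the PR description):

```
import torch

a = torch.randn(4, 5).relu().to_sparse_csr()
b = torch.randn(5, 3)

out_sum = torch.sparse.mm(a, b, reduce="sum")    # plain sparse @ dense
out_mean = torch.sparse.mm(a, b, reduce="mean")  # row-wise mean reduction
```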

### Performance

Benchmark on GCN for obgn-products on Xeon single socket, the workload is improved by `4.3x` with this patch.

Performance benefit for training will be bigger, the original backward impl for `sum|mean` is sequential; the original backward impl for `max|min` is not fused.

#### before:
```
-----------------------------  ------------  ------------  ------------  ------------  ------------  ------------
                         Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls
-----------------------------  ------------  ------------  ------------  ------------  ------------  ------------
       torch_sparse::spmm_sum        97.09%       56.086s        97.09%       56.088s        6.232s             9
                 aten::linear         0.00%      85.000us         1.38%     795.485ms      88.387ms             9
                 aten::matmul         0.00%      57.000us         1.38%     795.260ms      88.362ms             9
                     aten::mm         1.38%     795.201ms         1.38%     795.203ms      88.356ms             9
                   aten::relu         0.00%      50.000us         0.76%     440.434ms      73.406ms             6
              aten::clamp_min         0.76%     440.384ms         0.76%     440.384ms      73.397ms             6
                   aten::add_         0.57%     327.801ms         0.57%     327.801ms      36.422ms             9
            aten::log_softmax         0.00%      23.000us         0.10%      55.503ms      18.501ms             3
```

#### after
```
-----------------------------  ------------  ------------  ------------  ------------  ------------  ------------
                         Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls
-----------------------------  ------------  ------------  ------------  ------------  ------------  ------------
               aten::spmm_sum        87.35%       11.826s        87.36%       11.827s        1.314s             9
                 aten::linear         0.00%      92.000us         5.87%     794.451ms      88.272ms             9
                 aten::matmul         0.00%      62.000us         5.87%     794.208ms      88.245ms             9
                     aten::mm         5.87%     794.143ms         5.87%     794.146ms      88.238ms             9
                   aten::relu         0.00%      53.000us         3.35%     452.977ms      75.496ms             6
              aten::clamp_min         3.35%     452.924ms         3.35%     452.924ms      75.487ms             6
                   aten::add_         2.58%     348.663ms         2.58%     348.663ms      38.740ms             9
                 aten::argmax         0.42%      57.473ms         0.42%      57.475ms      14.369ms             4
            aten::log_softmax         0.00%      22.000us         0.39%      52.605ms      17.535ms             3
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83727
Approved by: https://github.com/jgong5, https://github.com/cpuhrsch, https://github.com/rusty1s, https://github.com/pearu
2023-02-10 15:56:40 +00:00
Aaron Gokaslan
8fce9a09cd [BE]: pyupgrade Python to 3.8 - imports and object inheritance only (#94308)
Apply parts of pyupgrade to torch (starting with the safest changes).
This PR only does two things: removes the need to inherit from object and removes unused future imports.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94308
Approved by: https://github.com/ezyang, https://github.com/albanD
2023-02-07 21:10:56 +00:00
PyTorch MergeBot
7012d985fa Revert "Improve bsr @ strided performance in baddmm for bfloat16/half with Triton kernels. (#88078)"
This reverts commit 46f16b9363.

Reverted https://github.com/pytorch/pytorch/pull/88078 on behalf of https://github.com/ZainRizvi due to Causing a test to fail consistently: test_decomp.py::HasDecompTest::test_has_decomposition
2023-01-26 16:22:29 +00:00
Nikita Vedeneev
46f16b9363 Improve bsr @ strided performance in baddmm for bfloat16/half with Triton kernels. (#88078)
As per title.

Additionally we also introduce support for:
- Rectangular block sizes which are powers of 2 and at least 16 (triton's `dot` limitation).
- Batch support with broadcasting for either of the arguments.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88078
Approved by: https://github.com/cpuhrsch
2023-01-26 07:58:27 +00:00
PyTorch MergeBot
60bf851931 Revert "Improve bsr @ strided performance in baddmm for bfloat16/half with Triton kernels. (#88078)"
This reverts commit 8383b5c488.

Reverted https://github.com/pytorch/pytorch/pull/88078 on behalf of https://github.com/malfet due to This seems to have broke sm_86 testing, see https://hud.pytorch.org/hud/pytorch/pytorch/master/1?per_page=50&name_filter=sm86%20%2F%20test%20(default%2C%203
2023-01-19 23:37:59 +00:00
Nikita Vedeneev
8383b5c488 Improve bsr @ strided performance in baddmm for bfloat16/half with Triton kernels. (#88078)
As per title.

Additionally we also introduce support for:
- Rectangular block sizes which are powers of 2 and at least 16 (triton's `dot` limitation).
- Batch support with broadcasting for either of the arguments.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88078
Approved by: https://github.com/cpuhrsch
2023-01-19 03:14:54 +00:00
PyTorch MergeBot
89f1ad08b4 Revert "Improve bsr @ strided performance in baddmm for bfloat16/half with Triton kernels. (#88078)"
This reverts commit 7f256fff77.

Reverted https://github.com/pytorch/pytorch/pull/88078 on behalf of https://github.com/huydhn due to This breaks lint 7f256fff77
2023-01-17 22:14:37 +00:00
Nikita Vedeneev
7f256fff77 Improve bsr @ strided performance in baddmm for bfloat16/half with Triton kernels. (#88078)
As per title.

Additionally we also introduce support for:
- Rectangular block sizes which are powers of 2 and at least 16 (triton's `dot` limitation).
- Batch support with broadcasting for either of the arguments.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88078
Approved by: https://github.com/cpuhrsch
2023-01-17 21:43:20 +00:00
Pearu Peterson
b3e4f5029b Add check-sparse-tensor-invariants flag to Context - 2nd try. (#92094)
This PR is a copy of https://github.com/pytorch/pytorch/pull/90849 whose merge was reverted.

The PR adds a "check sparse tensor invariants" flag to Context that, when enabled, triggers sparse tensor data invariant checks in unsafe methods of constructing sparse COO/CSR/CSC/BSR/BSC tensors. The feature includes the following changes to the UI:

`torch.sparse.check_sparse_tensor_invariants` class provides different ways to enable/disable the invariant checking.

`torch.sparse_coo/csr/csc/bsr/bsc/compressed_tensor` functions have a new optional argument `check_invariants` to enable/disable the invariant checks explicitly. When the `check_invariants` argument is specified, the global state of the feature is temporarily overridden.
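
For illustration, the two UI surfaces in a minimal sketch:

```
import torch

# per-call override via the new keyword argument
t = torch.sparse_coo_tensor([[0, 1]], [1.0, 2.0], (2,), check_invariants=True)

# scoped enablement via the context manager
with torch.sparse.check_sparse_tensor_invariants():
    t = torch.sparse_coo_tensor([[0, 1]], [1.0, 2.0], (2,))
```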

The PR fixes https://github.com/pytorch/pytorch/issues/90833

Pull Request resolved: https://github.com/pytorch/pytorch/pull/92094
Approved by: https://github.com/cpuhrsch
2023-01-13 14:50:33 +00:00
mingfeima
3ab58fd5ed optimize sampled_addmm performance on CPU (SparseCSR) (#90978)
### Target and Background
This PR is improving the performance of `sampled_addmm` on CPU device. This is part of effort for improving PyG performance on CPU for GNN training/inference.

The current implementation is a reference design which converts the `SparseCSR` tensor to a dense tensor, does the addmm, and converts back to `SparseCSR` again: this is going to be very slow and won't be able to run most of the datasets under https://github.com/snap-stanford/ogb (converting to dense would trigger `OOM`).

### Benchmarks

Right now we don't have any hands-on benchmark or workload to test this since this operator is not used in PyG yet. I fetched the dataset from `ogb-products` where:

* number of nodes: 2.4 * 10^6
* number of edges: 1.26 * 10^8
* number of features: 128

So if we store the **adjacency matrix** as dense, it is going to be 2.4 * 2.4 * 4 * 10^12 bytes; this will OOM with the current code. I extract the first 1k rows to compare, observing a **1100x** speedup:

CPU: Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz, dual socket, 20 cores per socket.
```
### before: run 1000 rows from the whole dataset
sampled_addmm: running dataset ogb-products first 1000 rows: each iter takes 1212.000 ms!

### after: run 1000 rows from the whole dataset
sampled_addmm: running dataset ogb-products first 1000 rows: each iter takes 1.102 ms!

### after: run the whole dataset
sampled_addmm: running dataset ogb-products (the whole dataset) 2449029 rows: each iter takes 873.306 ms!
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90978
Approved by: https://github.com/pearu, https://github.com/cpuhrsch
2023-01-12 12:04:07 +00:00
PyTorch MergeBot
c7a22bb7c7 Revert "Add check-sparse-tensor-invariants flag to Context. (#90849)"
This reverts commit b9a035c1c5.

Reverted https://github.com/pytorch/pytorch/pull/90849 on behalf of https://github.com/DanilBaibak due to Break internal build
2023-01-12 09:58:16 +00:00
PyTorch MergeBot
c5836153f5 Revert "optimize sampled_addmm performance on CPU (SparseCSR) (#90978)"
This reverts commit 645fb217c0.

Reverted https://github.com/pytorch/pytorch/pull/90978 on behalf of https://github.com/seemethere due to This broke internal builds for android due to the new file added being missing in build_variables.bzl
2023-01-11 20:12:12 +00:00
Pearu Peterson
b9a035c1c5 Add check-sparse-tensor-invariants flag to Context. (#90849)
This PR adds a "check sparse tensor invariants" flag to Context that, when enabled, triggers sparse tensor data invariant checks in unsafe methods of constructing sparse COO/CSR/CSC/BSR/BSC tensors. The feature includes the following changes to the UI:

- `torch.enable_check_sparse_tensor_invariants` and `torch.is_check_sparse_tensor_invariants_enabled` functions to globally enable/disable the invariant checks and to retrieve the state of the feature, respectively
- `torch.sparse_coo/csr/csc/bsr/bsc/compressed_tensor` functions have a new optional argument `check_invariants` to enable/disable the invariant checks explicitly. When the `check_invariants` argument is specified, the global state of the feature is temporarily overridden.

The PR also fixes https://github.com/pytorch/pytorch/issues/90833

# Main issue

*The following content is outdated after merging the PRs in this ghstack but kept for the record.*

The importance of this feature is that when enabling the invariants checks by default, say, via

<details>

```
$ git diff
diff --git a/torch/__init__.py b/torch/__init__.py
index c8543057c7..19a91d0482 100644
--- a/torch/__init__.py
+++ b/torch/__init__.py
@@ -1239,3 +1239,8 @@ if 'TORCH_CUDA_SANITIZER' in os.environ:

 # Populate magic methods on SymInt and SymFloat
 import torch.fx.experimental.symbolic_shapes
+
+# temporarily enable sparse tensor arguments validation in unsafe
+# constructors:
+
+torch._C._set_check_sparse_tensor_invariants(True)
```

</details>

a massive number of test failures/errors occur in test_sparse_csr.py tests:
```
$ pytest -sv test/test_sparse_csr.py
<snip>
==== 4293 failed, 1557 passed, 237 skipped, 2744 errors in 69.71s (0:01:09) ====
```
that means that we are silently constructing sparse compressed tensors that do not satisfy the sparse tensor invariants. In particular, the following errors are raised:

```
AssertionError: "resize_as_sparse_compressed_tensor_: self and src must have the same layout" does not match "expected values to be a strided and contiguous tensor"

RuntimeError: CUDA error: device-side assert triggered

RuntimeError: `col_indices[..., crow_indices[..., i - 1]:crow_indices[..., i]] for all i = 1, ..., nrows are sorted and distinct along the last dimension values` is not satisfied.

RuntimeError: expected col_indices to be a strided and contiguous tensor

RuntimeError: expected row_indices to be a strided and contiguous tensor

RuntimeError: expected values to be a strided and contiguous tensor

RuntimeError: for_each: failed to synchronize: cudaErrorAssert: device-side assert triggered

RuntimeError: tensor dimensionality must be sum of batch, base, and dense dimensionalities (=0 + 2 + 0) but got 3
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90849
Approved by: https://github.com/amjames, https://github.com/cpuhrsch
2023-01-11 01:05:14 +00:00
mingfeima
645fb217c0 optimize sampled_addmm performance on CPU (SparseCSR) (#90978)
### Target and Background
This PR is improving the performance of `sampled_addmm` on CPU device. This is part of effort for improving PyG performance on CPU for GNN training/inference.

The current implementation is a reference design which converts the `SparseCSR` tensor to a dense tensor, does the addmm, and converts back to `SparseCSR` again: this is going to be very slow and won't be able to run most of the datasets under https://github.com/snap-stanford/ogb (converting to dense would trigger `OOM`).

### Benchmarks

Right now we don't have any hands-on benchmark or workload to test this since this operator is not used in PyG yet. I fetched the dataset from `ogb-products` where:

* number of nodes: 2.4 * 10^6
* number of edges: 1.26 * 10^8
* number of features: 128

So if we store the **adjacency matrix** as dense, it is going to be 2.4 * 2.4 * 4 * 10^12 bytes; this will OOM with the current code. I extract the first 1k rows to compare, observing a **1100x** speedup:

CPU: Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz, dual socket, 20 cores per socket.
```
### before: run 1000 rows from the whole dataset
sampled_addmm: running dataset ogb-products first 1000 rows: each iter takes 1212.000 ms!

### after: run 1000 rows from the whole dataset
sampled_addmm: running dataset ogb-products first 1000 rows: each iter takes 1.102 ms!

### after: run the whole dataset
sampled_addmm: running dataset ogb-products (the whole dataset) 2449029 rows: each iter takes 873.306 ms!
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90978
Approved by: https://github.com/pearu, https://github.com/cpuhrsch
2023-01-10 22:13:35 +00:00
joncrall
4618371da5 Integrate xdoctest - Rebased (#82797)
This is a new version of #15648 based on the latest master branch.

Unlike the previous PR where I fixed a lot of the doctests in addition to integrating xdoctest, I'm going to reduce the scope here. I'm simply going to integrate xdoctest, and then I'm going to mark all of the failing tests as "SKIP". This will let xdoctest run on the dashboards, provide some value, and still let the dashboards pass. I'll leave fixing the doctests themselves to another PR.

In my initial commit, I do the bare minimum to get something running with failing dashboards. The few tests that I marked as skip are causing segfaults. Running xdoctest results in 293 failed, 201 passed tests. The next commits will be to disable those tests. (unfortunately I don't have a tool that will insert the `#xdoctest: +SKIP` directive over every failing test, so I'm going to do this mostly manually.)
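
For illustration, what the skip directive looks like inside a docstring (`scale` is a hypothetical function):

```
def scale(x):
    """Double the input.

    Example:
        >>> # xdoctest: +SKIP
        >>> scale(torch.randn(3))
    """
    return 2 * x
```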

Fixes https://github.com/pytorch/pytorch/issues/71105

@ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82797
Approved by: https://github.com/ezyang
2022-08-12 02:08:01 +00:00
Andrew M. James
5a4c9e8394 Add spdiags sparse matrix initialization (#78439)
Similar to [scipy.sparse.spdiags](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.spdiags.html#scipy-sparse-spdiags)

Part of #70926

In other functions (e.g. [torch.diagonal](https://pytorch.org/docs/stable/generated/torch.diagonal.html#torch.diagonal)) diagonals of a tensor are referenced using the offset and the two dimensions that the diagonal is taken with respect to.

Here the reference implementation from scipy only considers matrix output, so we only support 2-D output at first. It may still be useful to consider how the dimensions corresponding to each diagonal would be specified for higher dimensional output.

The proposed torch signature implies that all offsets refer to the diagonals with respect to the only two dimensions of the output:

```
torch.sparse.spdiags(Tensor diagonals, IntTensor offsets, int[] shape, Layout? layout=None) -> SparseTensor
```
Above it is required that: `diagonals.ndimension() == 2`, `offsets.ndimension() == 1`, `offsets.shape[0] == diagonals.shape[0]` and `len(shape) == 2`.

This would need to be altered for the case where `len(shape)` > 2. One option is:
```
torch.sparse.spdiags(Tensor[] diagonals, IntTensor[] offsets, IntTensor dims, int[] shape, Layout? layout=None) -> SparseTensor
```

Here `offsets` and `diagonals` become lists of tensors, and the `IntTensor dims` argument is introduced. This would require that `len(diagonals) == len(offsets) == dims.shape[0]`, `dims.ndimension() == 2` and `dims.shape[1] == 2`; also the same restrictions as the 2d case above apply to the elements of `diagonals` and `offsets` pairwise (that is, `diagonals[i].ndimension() == 2`, `offsets[i].ndimension() == 1` and `offsets[i].shape[0] == diagonals[i].shape[0]` for all i). This form of the signature would construct the sparse result by placing the values from `diagonals[i][j]` into the diagonal with offset `offset[i][j]` taken with respect to dimensions `dims[i]`. The specialization back to the original signature for the 2d case could be seen as allowing the single row of dims to default to `[0, 1]` when there is only one `diagonals`, `offsets` provided, and shape is 2-d. This option allows the rows of an input element `diagonals[i]` to have different lengths, which may be appropriate as the max length of a diagonal along different dimension pairs will be different.

Another option is to specify the dimensions the diagonal is taken with respect to for each offset. This signature would look like:

```
torch.sparse.spdiags(Tensor diagonals, IntTensor offsets, IntTensor dims, int[] shape, Layout? layout=None) -> SparseTensor
```
Here, `diagonals` is still 2-D with dimension 0 matching the length of 1-D `offsets` and the tensor input `dims` is also 2-D with dimension 0 matching the length of 1-D `offsets` and the second dimension being fixed at `2`. In this case the sparse result is constructed by placing the elements from `diagonals[i]` into the output diagonal `output.diagonal(offset[i], dim0=dims[i][0], dim1=dims[i][1])` (with some additional consideration that makes it more complicated than simply assigning to that view). The specialization from this back to the 2-D form could be seen as assuming `dims = [[0, 1], [0, 1]... len(offsets) times ]` when `len(shape) == 2`.

In both proposed signatures for the N-D case the specialization back to the 2-D signature is a bit of a stretch for your typical default arguments logic; however, I think the first is the better choice as it offers more flexibility.

I think some discussion is required about:
- [x] Should the N-D output case be implemented from the outset
- [x] If not, should the future addition of the N-D output case be considered when designing the interface.
- [x] Other thoughts on the signature which includes the `dims` information for the N-D output case.

**Resolution**: Since no one has offered a request for N-D output support, I think it is fine to restrict this to sparse matrix generation. Should a request for N-D support come later, an overload accepting the additional `dims` could be added.
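
A quick usage sketch of the 2-D signature that landed (placement follows the scipy convention: `diagonals[i, j]` lands at `out[j - offsets[i], j]`):

```
import torch

diagonals = torch.tensor([[1., 2., 3.],
                          [4., 5., 6.]])
offsets = torch.tensor([0, -1])

out = torch.sparse.spdiags(diagonals, offsets, (3, 3))
print(out.to_dense())
# tensor([[1., 0., 0.],
#         [4., 2., 0.],
#         [0., 5., 3.]])
```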

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78439
Approved by: https://github.com/nikitaved, https://github.com/cpuhrsch, https://github.com/pearu
2022-07-01 01:11:54 +00:00
PyTorch MergeBot
56e3bc5215 Revert "Add spdiags sparse matrix initialization (#78439)"
This reverts commit cfb2034b65.

Reverted https://github.com/pytorch/pytorch/pull/78439 on behalf of https://github.com/suo due to broke windows builds, see: cfb2034b65
2022-06-30 21:04:36 +00:00
Andrew M. James
cfb2034b65 Add spdiags sparse matrix initialization (#78439)
Similar to [scipy.sparse.spdiags](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.spdiags.html#scipy-sparse-spdiags)

Part of #70926

In other functions (e.g. [torch.diagonal](https://pytorch.org/docs/stable/generated/torch.diagonal.html#torch.diagonal)) diagonals of a tensor are referenced using the offset and the two dimensions that the diagonal is taken with respect to.

Here the reference implementation from scipy only considers matrix output, so we only support 2-D output at first. It may still be useful to consider how the dimensions corresponding to each diagonal would be specified for higher dimensional output.

The proposed torch signature implies that all offsets refer to the diagonals with respect to the only two dimensions of the output:

```
torch.sparse.spdiags(Tensor diagonals, IntTensor offsets, int[] shape, Layout? layout=None) -> SparseTensor
```
Above it is required that: `diagonals.ndimension() == 2`, `offsets.ndimension() == 1`, `offsets.shape[0] == diagonals.shape[0]` and `len(shape) == 2`.
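
For concreteness, here is a minimal usage sketch of this 2-D signature as it eventually landed. Following the scipy convention, row `d` of `diagonals` holds the diagonal at `offsets[d]`, and the value placed in output column `c` comes from `diagonals[d, c]`, so for positive offsets the first `offset` entries of that row are skipped:

```
import torch

diagonals = torch.tensor([[1, 2, 3],
                          [4, 5, 6],
                          [7, 8, 9]])
offsets = torch.tensor([0, 1, -1])

# Build a 3x3 sparse matrix from the main diagonal, the first
# superdiagonal (entry 4 is skipped) and the first subdiagonal
# (entry 9 falls outside the matrix).
s = torch.sparse.spdiags(diagonals, offsets, (3, 3))
print(s.to_dense())
# tensor([[1, 5, 0],
#         [7, 2, 6],
#         [0, 8, 3]])
```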

This would need to be altered for the case where `len(shape) > 2`. One option is:
```
torch.sparse.spdiags(Tensor[] diagonals, IntTensor[] offsets, IntTensor dims, int[] shape, Layout? layout=None) -> SparseTensor
```

Here `offsets` and `diagonals` become lists of tensors, and the `IntTensor dims` argument is introduced. This would require that `len(diagonals) == len(offsets) == dims.shape[0]`, `dims.ndimension() == 2` and `dims.shape[1] == 2`; the same restrictions as in the 2-D case above apply pairwise to the elements of `diagonals` and `offsets` (that is, `diagonals[i].ndimension() == 2`, `offsets[i].ndimension() == 1` and `offsets[i].shape[0] == diagonals[i].shape[0]` for all `i`). This form of the signature would construct the sparse result by placing the values from `diagonals[i][j]` into the diagonal with offset `offsets[i][j]`, taken with respect to dimensions `dims[i]`. The specialization back to the original signature for the 2-D case could be seen as allowing the single row of `dims` to default to `[0, 1]` when only one `diagonals`/`offsets` pair is provided and `shape` is 2-D. This option allows the rows of different input elements `diagonals[i]` to have different lengths, which may be appropriate since the maximum length of a diagonal differs across dimension pairs.
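
A small validation sketch of the constraints above (hypothetical; `check_spdiags_nd_args` is an illustrative name only, not part of any proposed API):

```
import torch

def check_spdiags_nd_args(diagonals, offsets, dims, shape):
    # Constraints of the proposed list-of-tensors signature.
    assert len(diagonals) == len(offsets) == dims.shape[0]
    assert dims.ndimension() == 2 and dims.shape[1] == 2
    for diag_i, off_i in zip(diagonals, offsets):
        # Same pairwise restrictions as the 2-D signature.
        assert diag_i.ndimension() == 2
        assert off_i.ndimension() == 1
        assert off_i.shape[0] == diag_i.shape[0]

# Two groups of diagonals over dimension pairs (0, 1) and (1, 2)
# of a 3-D output shape; note the groups may have different widths.
check_spdiags_nd_args(
    diagonals=[torch.ones(2, 4), torch.ones(1, 4)],
    offsets=[torch.tensor([0, 1]), torch.tensor([-1])],
    dims=torch.tensor([[0, 1], [1, 2]]),
    shape=(4, 4, 4),
)
```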

Another option is to specify the dimensions the diagonal is taken with respect to for each offset. This signature would look like:

```
torch.sparse.spdiags(Tensor diagonals, IntTensor offsets, IntTensor dims, int[] shape, Layout? layout=None) -> SparseTensor
```
Here, `diagonals` is still 2-D, with dimension 0 matching the length of the 1-D `offsets`. The tensor input `dims` is also 2-D, with dimension 0 matching the length of `offsets` and dimension 1 fixed at `2`. In this case the sparse result is constructed by placing the elements from `diagonals[i]` into the output diagonal `output.diagonal(offsets[i], dim1=dims[i][0], dim2=dims[i][1])` (with some additional consideration that makes it more complicated than simply assigning to that view). The specialization from this back to the 2-D form could be seen as assuming `dims = [[0, 1], [0, 1], ...]`, repeated `len(offsets)` times, when `len(shape) == 2`.
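
As a rough dense reference for these semantics, here is a sketch under the simplifying assumption that each diagonal's values broadcast across any remaining output dimensions (the "additional consideration" mentioned above is elided; `spdiags_dims_ref` is a hypothetical name):

```
import torch

def spdiags_dims_ref(diagonals, offsets, dims, shape):
    # Dense sketch of the per-offset `dims` signature: row i of `diagonals`
    # is written onto the diagonal of `out` selected by offsets[i] and the
    # dimension pair dims[i].
    out = torch.zeros(shape, dtype=diagonals.dtype)
    for i in range(offsets.shape[0]):
        view = out.diagonal(int(offsets[i]),
                            dim1=int(dims[i][0]), dim2=int(dims[i][1]))
        n = min(view.shape[-1], diagonals.shape[1])
        view[..., :n] = diagonals[i, :n]  # broadcast over remaining dims
    return out

out = spdiags_dims_ref(
    torch.arange(1, 7, dtype=torch.float).reshape(2, 3),
    offsets=torch.tensor([0, 1]),
    dims=torch.tensor([[0, 1], [1, 2]]),
    shape=(3, 3, 3),
)
```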

In both proposed signatures for the N-D case, the specialization back to the 2-D signature is a bit of a stretch for typical default-argument logic; however, I think the first is the better choice as it offers more flexibility.

I think some discussion is required about:
- [x] Should the N-D output case be implemented from the outset
- [x] If not, should the future addition of the N-D output case be considered when designing the interface.
- [x] Other thoughts on the signature which includes the `dims` information for the N-D output case.

**Resolution**: Since no one has requested N-D output support, I think it is fine to restrict this to sparse matrix generation. Should a request for N-D support come later, an overload accepting the additional `dims` could be added.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78439
Approved by: https://github.com/nikitaved, https://github.com/cpuhrsch, https://github.com/pearu
2022-06-30 19:54:47 +00:00
Christian Puhrsch
8c608a79b4 Compressed sparse layout conversion stubs (#77489)
This PR unifies sparse layout conversions into a single location and adds stubs that raise a RuntimeError for unsupported conversions.
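
A brief sketch of the user-facing behavior (which conversion pairs are supported depends on the PyTorch version; the round trip below is among the supported ones):

```
import torch

x = torch.randn(3, 3)
csr = x.to_sparse_csr()   # dense -> sparse CSR: a supported conversion
dense = csr.to_dense()    # sparse CSR -> dense: a supported conversion
# A layout pair without a conversion kernel now hits one of the new stubs
# and raises a RuntimeError instead of failing in an ad hoc way.
```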
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77489
Approved by: https://github.com/pearu, https://github.com/mruberry
2022-05-16 18:37:42 +00:00
Christian Puhrsch
edf2deb81e Add private conversion function from CSR to block CSR
This PR adds a private function that converts a CSR Tensor into a [scipy-style block CSR Tensor](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.bsr_matrix.html#scipy.sparse.bsr_matrix).

It uses the scipy CSR to BSR conversion routines (and credits them accordingly).

The main purpose of this function is to easily create a block CSR Tensor for matrix multiplication.
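
To illustrate the target layout, here is a sketch using scipy itself, whose conversion routines the PR credits (the PyTorch function is private and not named here). Values are stored as dense blocks, with `indptr`/`indices` addressing block rows and block columns rather than individual elements:

```
import numpy as np
from scipy.sparse import csr_matrix

# A 4x4 block-diagonal matrix made of two 2x2 blocks of ones.
a = csr_matrix(np.kron(np.eye(2), np.ones((2, 2))))
b = a.tobsr(blocksize=(2, 2))  # CSR -> block CSR (BSR)

print(b.data.shape)  # (2, 2, 2): two dense 2x2 blocks
print(b.indices)     # block-column indices: [0 1]
print(b.indptr)      # block-row pointers:   [0 1 2]
```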

Follow up work includes
- Blocksize support for sparse_csr_tensor
- Parallel CPU kernel
- CUDA kernels
- Faster arg sanitization
- Benchmarking of cuSPARSE backend
- Dense to/from block CSR
- Autograd support
- Column-major blocks
- Block CSR to CSR conversion
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71582
Approved by: https://github.com/IvanYashchuk, https://github.com/albanD
2022-03-25 21:22:15 +00:00
Ivan Yashchuk
ebd93f69db Enable CSR inputs for torch.sparse.mm (#73075)
Summary:
Previously `torch.sparse.mm` supported only COO and dense inputs.

Computing derivatives works with respect to the dense input for `sparse_csr x dense -> dense`.

Modified the implementation of `torch.sparse.mm` to bind directly to the ATen function.
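
A minimal sketch of the newly enabled `sparse_csr x dense -> dense` path:

```
import torch

A = torch.tensor([[1., 0.], [0., 2.]]).to_sparse_csr()
B = torch.randn(2, 3, requires_grad=True)

C = torch.sparse.mm(A, B)  # sparse CSR x dense -> dense
C.sum().backward()         # the gradient flows to the dense operand
print(B.grad)
```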

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73075

Reviewed By: mikaylagawarecki

Differential Revision: D34342954

Pulled By: cpuhrsch

fbshipit-source-id: a6ed914a0ce28b35276109479109095f7149d32b
(cherry picked from commit 948de1816c46cd087bacbee36dc583cf409813f9)
2022-02-24 04:30:48 +00:00
Ivan Yashchuk
8cdcc1181c Add missing entry for sampled_addmm in sparse.rst (#72312)
Summary:
Let's make the documentation for `torch.sparse.sampled_addmm` searchable in the PyTorch documentation.
This PR shall be cherry-picked for the next 1.11 release.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72312

Reviewed By: davidberard98

Differential Revision: D34045230

Pulled By: cpuhrsch

fbshipit-source-id: c1b1dc907443284857f48c8ce1efab22c6701bbe
(cherry picked from commit 225929ecf2)
2022-02-08 00:07:20 +00:00
Ivan Yashchuk
89a145fd91 Sparse CSR CUDA: Add torch.sparse.sampled_addmm (#68007)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68007

This PR adds a new function to the sparse module.
`sampled_addmm` computes α*(A @ B) * spy(C) + β*C, where C is a sparse CSR matrix and A, B are dense (strided) matrices.
This function is currently restricted to single 2-D matrices; it doesn't support batched input.
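
A minimal usage sketch (CUDA-only at the time of this PR):

```
import torch

A = torch.randn(3, 4, device="cuda")
B = torch.randn(4, 3, device="cuda")
C = torch.eye(3, device="cuda").to_sparse_csr()

# Computes alpha * (A @ B) * spy(C) + beta * C, touching only the
# nonzero positions of the sparse CSR matrix C.
out = torch.sparse.sampled_addmm(C, A, B, beta=0.5, alpha=2.0)
print(out.to_dense())  # nonzero only on C's sparsity pattern (the diagonal)
```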

cc nikitaved pearu cpuhrsch IvanYashchuk

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D32435799

Pulled By: cpuhrsch

fbshipit-source-id: b1ffac795080aef3fa05eaeeded03402bc097392
2021-11-29 15:43:29 -08:00
Christian Puhrsch
75955e4ef8 [clone][sparse] Add torch._C._sparse namespace (#68672)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68672

This PR adds `python_module: sparse` to `native_function.yaml`.
These functions would appear in the `torch._C._sparse` namespace instead of
just `torch`.
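
A quick way to see the effect (a sketch; the exact contents of the namespace depend on which functions carry the annotation in a given build):

```
import torch

# Functions declared with `python_module: sparse` in native_functions.yaml
# are bound under torch._C._sparse rather than the top-level torch module.
print(torch._C._sparse)
print([n for n in dir(torch._C._sparse) if not n.startswith("__")][:5])
```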

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D32517813

fbshipit-source-id: 7c3d6df57a24d7c7354d0fefe1b628dc89be9431
2021-11-19 19:47:38 -08:00
Shen Li
1022443168 Revert D30279364: [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: revert-hammer

Differential Revision:
D30279364 (b004307252)

Original commit changeset: c1ed77dfe43a

fbshipit-source-id: eab50857675c51e0088391af06ec0ecb14e2347e
2021-08-12 11:45:01 -07:00
Zsolt Dollenstein
b004307252 [codemod][lint][fbcode/c*] Enable BLACK by default
Test Plan: manual inspection & sandcastle

Reviewed By: zertosh

Differential Revision: D30279364

fbshipit-source-id: c1ed77dfe43a3bde358f92737cd5535ae5d13c9a
2021-08-12 10:58:35 -07:00