This PR enables all PIE rules in ruff. Some rules from this family were already enabled; the newly added rules are:
```
PIE796 Enum contains duplicate value: {value}
PIE808 Unnecessary start argument in range
```
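For illustration, a minimal, hypothetical example of the patterns each new rule flags (the enum and loop below are made up for this sketch):
```python
import enum

# PIE796: an Enum member reuses an existing value (CRIMSON aliases RED).
class Color(enum.Enum):
    RED = 1
    CRIMSON = 1

# PIE808: the explicit start argument is unnecessary; range(10) is equivalent.
for i in range(0, 10):
    pass
```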
Pull Request resolved: https://github.com/pytorch/pytorch/pull/165814
Approved by: https://github.com/ezyang
This is the first PR of a series in an attempt to get the content of #134592 merged as smaller PRs (Given that the original one was closed due to a lack of reviewers).
This specific PR contains:
- Add and use a common `raise_on_run_directly` helper for the case where a user directly runs a test file that should not be run that way; it prints the file the user should run instead (see the sketch below).
- Update ao tests.
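A rough sketch of what such a helper could look like (the name is taken from the PR, but the signature and location here are assumptions):
```python
def raise_on_run_directly(file_to_run):
    # Sketch only: fail loudly when a test file is executed directly and
    # point the user at the entry point they should run instead.
    raise RuntimeError(
        f"This test file is not meant to be run directly. "
        f"Run it via: python {file_to_run}"
    )

if __name__ == "__main__":
    raise_on_run_directly("test/test_ao_sparsity.py")
```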
There will be follow-up PRs to update the other test suites, but I don't have permissions to create branches directly on pytorch/pytorch, so I can't create a stack and will therefore have to create them one at a time.
Cc @jerryzh168
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154612
Approved by: https://github.com/jcaip
Summary:
This PR implements `BaseSparsifier.convert()`, which performs module swapping.
The modules and mappings will be merged in a future PR.
Test Plan:
`python test/test_ao_sparsity.py -- TestBaseSparsifier.test_convert`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97545
Approved by: https://github.com/jerryzh168
Applies the remaining flake8-comprehensions fixes and checks. This change replaces all remaining unnecessary generator expressions with list/dict/set comprehensions, which are more succinct, performant, and better supported by our torch.jit compiler. It also removes useless generators such as `set(a for a in b)`, resolving them into just the set call.
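For example, the kind of rewrite being applied (illustrative only):
```python
b = [1, 2, 2, 3]

# Before: a useless generator expression inside set()
s = set(a for a in b)

# After: resolved into just the set call
s = set(b)

# Before: an unnecessary generator passed to dict()
d = dict((k, k * k) for k in b)

# After: the equivalent dict comprehension
d = {k: k * k for k in b}
```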
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94676
Approved by: https://github.com/ezyang
The current unittests were only checking the tensors whose shapes were already multiples of the block size. That caused some hidden bugs to creep in. Specifically, for the shapes that would require padding for the mask/data, the sparsifier would try to apply shape-mismatching tensors onto each other. This caused segfaults as well as silent failures.
This PR makes minor adjustments to the code to make sure the mask and data shapes are aligned, and fixes the tests to catch this.
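A simplified sketch of the kind of alignment involved (not the sparsifier's actual code; the helper name is made up):
```python
import torch
import torch.nn.functional as F

def pad_to_block_multiple(t, block_shape):
    # Pad the trailing two dims up to the nearest multiple of the block shape
    # so that the mask and data reshape into whole blocks (illustrative only).
    bh, bw = block_shape
    h, w = t.shape[-2:]
    pad_h = (bh - h % bh) % bh
    pad_w = (bw - w % bw) % bw
    return F.pad(t, (0, pad_w, 0, pad_h))

data = torch.randn(5, 7)                        # not a multiple of (2, 4)
assert pad_to_block_multiple(data, (2, 4)).shape == (6, 8)
```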
Test Plan:
```python
python test/test_ao_sparsity.py
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87326
Approved by: https://github.com/jcaip
`Sparsity` as a term doesn't reflect the tools that are developed by AO. The `torch/ao/sparsity` folder also has utilities for structured pruning, which internally we have always referred to as just "pruning". To avoid any confusion, we renamed `Sparsity` to `Prune`. We will not be introducing backwards compatibility, as so far this toolset has been kept under silent development.
This change will be reflected in the documentation as well.
**TODO:**
- [ ] Change the tutorials
- [ ] Confirm no bc-breakages
- [ ] Reflect the changes in the trackers and RFC docs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84867
Approved by: https://github.com/supriyar
The previous implementation was using loops to compute the sparsity within a block in a mask, as well as across the mask blocks. This implements the vectorized version.
## Vectorization:
A high level overview of the vectorization procedure falls into a two step process:
### Tensor-level masking
Tensor-level masking is a mask generation routine that has a granularity of `sparse_block_shape`. That means only patches of that shape can be considered sparse/dense. To vectorize:
1. Reshape the data such that one of the dimensions represents the patches of sparse_block_shape.
2. Create a mask of the same shape as the reshaped data
3. Find the smallest `k` elements in the data, given the dimension of the sparse "patches". `k` is a derived parameter specifying the sparsity level.
4. Apply the 0/1 values to the corresponding patches in the mask
5. Reshape the mask back to the original dimensions
Note: because the shape of the mask might not be a multiple of the sparse_block_shape, we nudge the shape of the mask and truncate it afterwards.
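For illustration only, a hand-rolled sketch of the tensor-level routine above (not the actual sparsifier code; names and exact steps are assumptions):
```python
import torch

def tensor_level_mask(data, sparse_block_shape=(1, 4), sparsity_level=0.5):
    # Whole patches of `sparse_block_shape` are either kept or zeroed.
    bh, bw = sparse_block_shape
    h, w = data.shape
    # 1. Nudge the shape up to a multiple of the block shape (see note above).
    ph, pw = -(-h // bh) * bh, -(-w // bw) * bw
    padded = torch.zeros(ph, pw)
    padded[:h, :w] = data
    # 2. One row per patch: (num_patches, bh * bw).
    patches = padded.reshape(ph // bh, bh, pw // bw, bw)
    patches = patches.permute(0, 2, 1, 3).reshape(-1, bh * bw)
    # 3. k, the number of patches to zero, is derived from the sparsity level.
    k = int(sparsity_level * patches.shape[0])
    mask = torch.ones(patches.shape[0])
    if k > 0:
        # 4. Zero the k patches with the smallest L1 norm.
        _, idx = patches.abs().sum(dim=1).topk(k, largest=False)
        mask[idx] = 0
    # 5. Broadcast per-patch decisions back to elements and truncate padding.
    mask = mask.reshape(ph // bh, pw // bw, 1, 1).expand(-1, -1, bh, bw)
    return mask.permute(0, 2, 1, 3).reshape(ph, pw)[:h, :w]

mask = tensor_level_mask(torch.randn(6, 7))  # shape not a multiple of (1, 4)
```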
### Block-level masking
Block-level masking is a mask generation routine that concerns itself only with sparsity within a patch of shape `sparse_block_shape`. This is useful when block sparsity allows partial block sparsification.
To vectorize:
Overall, block-level masking follows the same routine as the tensor-level algorithm described above. One distinction is that when reshaping the data/mask tensors we aim to create a dimension that captures the internals of each patch. For example, if the `sparse_block_shape` is `(2, 2)`, we want to reshape the data/mask into `(2, 2, -1)`. That allows us to sort the internal elements on the last axis and zero out the ones that obey the sparse logic.
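A corresponding sketch for the block-level case (again illustrative only; `zeros_per_block` is an assumed name and the dims are assumed to already be block-aligned):
```python
import torch

def block_level_mask(data, sparse_block_shape=(2, 2), zeros_per_block=2):
    # Within every patch of `sparse_block_shape`, zero out the
    # `zeros_per_block` smallest-magnitude elements.
    bh, bw = sparse_block_shape
    h, w = data.shape
    # One row per position inside a patch, one column per patch.
    patches = data.reshape(h // bh, bh, w // bw, bw).permute(1, 3, 0, 2)
    patches = patches.reshape(bh * bw, -1)
    mask = torch.ones_like(patches)
    # The smallest `zeros_per_block` entries inside each patch get zeroed.
    _, idx = patches.abs().topk(zeros_per_block, dim=0, largest=False)
    mask.scatter_(0, idx, 0)
    # Undo the reshuffle to recover the original layout.
    mask = mask.reshape(bh, bw, h // bh, w // bw).permute(2, 0, 3, 1)
    return mask.reshape(h, w)

mask = block_level_mask(torch.randn(4, 8), (2, 2), zeros_per_block=2)
```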
Differential Revision: [D37352494](https://our.internmc.facebook.com/intern/diff/D37352494/)
**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D37352494/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80059
Approved by: https://github.com/jerryzh168
Summary: Per our design discussion about the sparsity API, we're
discontinuing the old API in favor of the new tensor_fqn-based one.
The pruning class has not been updated, mostly because this change
doesn't cause any further knock-on effects.
Test Plan: python test/test_ao_sparsity.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79597
Approved by: https://github.com/z-a-f
Summary: Updated the sparsity API to accept tensor_fqn as the primary
specification method, i.e. `[{'tensor_fqn': 'linear.weight'}]`.
The pruning API was also updated due to knock-on changes.
The old API accepting module FQNs is kept, but 'fqn' was changed to 'module_fqn'
for clarity (this will break BC).
Variables in the code were updated to use "module" rather than "layer".
The state dict was updated to use tensor_fqn rather than 'fqn' or 'module_fqn'.
Test Plan: python test/test_ao_sparsity.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79113
Approved by: https://github.com/z-a-f
This sparsifier creates a nearly diagonal mask to be applied to the weight matrix.
A nearly diagonal matrix is a matrix that contains non-zero elements near the diagonal, with the rest being zero.
Examples of nearly diagonal matrices with degree (or nearliness) 3 and 5, respectively:
```
degree 3:      degree 5:
1 1 0 0        1 1 1 0
1 1 1 0        1 1 1 1
0 1 1 1        1 1 1 1
0 0 1 1        0 1 1 1
```
Note that a nearly diagonal matrix with degree 1 is just a matrix with only the main diagonal populated.
This sparsifier is controlled by one variable:
1. `nearliness` defines the number of non-zero diagonals closest to the main diagonal. Currently only odd numbers are supported.
Note:
This can be accelerated (vectorized) once the Spdiagonal feature (PR: #78439) or the banded matrix
feature lands: https://stackoverflow.com/questions/52463972/generating-banded-matrices-using-numpy
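A small illustrative sketch of how such a mask could be generated (not the sparsifier's actual implementation):
```python
import torch

def nearly_diagonal_mask(height, width, nearliness=3):
    # Keep elements whose distance from the main diagonal is within
    # (nearliness - 1) // 2 and zero everything else.
    dist = nearliness // 2
    rows = torch.arange(height).unsqueeze(1)
    cols = torch.arange(width).unsqueeze(0)
    return ((rows - cols).abs() <= dist).to(torch.float)

print(nearly_diagonal_mask(4, 4, nearliness=3))  # matches the degree-3 example
```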
Test Plan:
```
python test/test_ao_sparsity.py TestNearlyDiagonalSparsifier
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78448
Approved by: https://github.com/z-a-f, https://github.com/HDCharles
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66777
Sometimes one might need to keep the sparsity parameters after the sparsifier is detached.
This saves the parameters in `sparse_params`.
There are two ways of keeping the sparsifier params:
1. Tuple[str, ...]: A tuple of all the parameters that need to be stored.
2. Dict[str, Tuple[str, ...]]: A dict of layer keys and parameters. In this case only the specified layers will have the parameters attached.
For example:
```
>>> # This will keep params in every module
>>> sparsifier.squash_mask(keep_sparse_params=('sparse_block_shape',))
>>> print(model.submodule.linear1.sparse_params)
{'sparse_block_shape': (1, 4)}
>>> print(model.submodule.linear2.sparse_params)
{'sparse_block_shape': (1, 4)}
```
```
>>> # This will keep params only in specific modules
>>> sparsifier.squash_mask(keep_sparse_params={'submodule.linear1': ('sparse_block_shape',)})
>>> print(model.submodule.linear1.sparse_params)
{'sparse_block_shape': (1, 4)}
>>> print(model.submodule.linear2.sparse_params)
AttributeError: 'Linear' object has no attribute 'sparse_params'
```
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D31835722
Pulled By: z-a-f
fbshipit-source-id: 20c2d80207eb7ce7291e7f5f655d3fb2a627190f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65295
The m-out-of-n is implemented as follows:
1. Compute the blocks that need to be sparsified using the weight-norm criterion
2. Within each block below the threshold find the smallest absolute value elements
3. Zero out only the smallest values within each block
m-out-of-n describes a sparsification scheme where, in a block with "n" elements, only "m" of them are zeroed out.
Block sparsity, with the whole block being all zeros, is a special case of m-out-of-n: if m == n, the whole block is reset.
This echoes the implementation described in https://github.com/pytorch/pytorch/issues/59835,
and meets the requirements of NVIDIA cuSPARSELt.
To support CUDA (2:4) sparsity, one would need to set the sparsity_level to 1.0.
That means every block of shape 1x4 within a tensor will be sparsified with a 2-out-of-4 scheme.
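A hedged sketch of that configuration, assuming the sparsifier exposes `sparsity_level`, `sparse_block_shape`, and `zeros_per_block` arguments (names and import path may differ across versions):
```python
# Assumed import path and argument names; treat as illustrative.
from torch.ao.pruning import WeightNormSparsifier

sparsifier = WeightNormSparsifier(
    sparsity_level=1.0,          # consider every block in the tensor
    sparse_block_shape=(1, 4),   # n = 4 elements per block
    zeros_per_block=2,           # m = 2 zeros within each block
)
```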
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D31186828
Pulled By: z-a-f
fbshipit-source-id: 7bd3e2707915b90f4831859781fc6e25f716c618
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65296
The original API described in the https://github.com/pytorch/pytorch/issues/59835
assumed that the per-layer configuration would take a module/layer
reference. However, a more useful approach is to refer to the layers
by their fully qualified names (FQN). That allows us to store the
configuration in a file without serializing the models.
We define a layer's FQN as its "path" within a model. For example,
if one can refer to a submodule as `model.layer0.sublayerX`, the FQN
of sublayerX is `'layer0.sublayerX'`.
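For illustration, a per-layer config addressed by FQN rather than by a module reference (the exact key names have changed across versions, so the dict below is only a sketch):
```python
# Illustrative only: per-layer settings keyed by the layer's FQN.
config = [
    {"fqn": "layer0.sublayerX", "sparsity_level": 0.7},
]
```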
Test Plan:
```
python test/test_ao_sparsity.py -- TestBaseSparsifier
buck test mode/opt //caffe2:test -- TestBaseSparsifier
```
Reviewed By: gchanan
Differential Revision: D31186830
Pulled By: z-a-f
fbshipit-source-id: d8d87f1c054e5c10d470e67837476a11e0a9b1d4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65293
This fixes a bug in the WeightNormSparsifier where the existing mask was being multiplied by the newly computed mask.
Because the mask elements are binary 0/1, this accumulates the mask over every iteration, eventually collapsing the mask to zero.
This bug accidentally bled through from old versions.
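In simplified form, the failure mode looks like this (a toy sketch, not the sparsifier's code):
```python
import torch

mask = torch.ones(4, 4)
for _ in range(3):
    new_mask = (torch.rand(4, 4) > 0.5).float()
    # Buggy behavior: folding the old 0/1 mask into the new one accumulates
    # zeros on every iteration until the mask collapses to all zeros.
    mask = mask * new_mask
# The fix, conceptually, is to assign the freshly computed mask instead:
# mask = new_mask
```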
Test Plan: Imported from OSS
Reviewed By: gchanan
Differential Revision: D31186829
Pulled By: z-a-f
fbshipit-source-id: 3f5b2c833148ab0bd8084e7410ce398f1252e65e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65292
That was the original design, which we decided to simplify by removing the packing in the sparsifier.
The state of the sparsifier is saved directly, and the old behavior accidentally bled through to the current version.
This change removes the `_pack_params` method, and changes the state_dict to include the state directly.
We don't have to change the load_state_dict, as it will work with either the old or the new format.
The main reason for this PR is the simplification. The original design didn't achieve anything useful by packing the sparsification parameters.
Test Plan: Imported from OSS
Reviewed By: gchanan
Differential Revision: D31186826
Pulled By: z-a-f
fbshipit-source-id: 4ad72a7e669f048d2f2d269269ee11b63fa169db
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58955
Implements the weight norm sparsifier.
This type of sparsifier computes the norm of the weights, sorts them, and zeroes out the target fraction of them.
The main implemented method is `update_mask`, which holds the main logic of changing the masks.
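A simplified sketch of that idea at element granularity (no block handling; not the actual `update_mask` code):
```python
import torch

def weight_norm_mask(weight, sparsity_level=0.5):
    # Zero out the fraction of entries with the smallest absolute value;
    # a toy stand-in for the norm-sort-zero procedure described above.
    mask = torch.ones_like(weight)
    k = int(sparsity_level * weight.numel())
    if k > 0:
        _, idx = weight.abs().flatten().topk(k, largest=False)
        mask.view(-1)[idx] = 0
    return mask

mask = weight_norm_mask(torch.randn(8, 8), sparsity_level=0.75)
```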
Test Plan:
```
python test/test_ao_sparsity.py
```
Imported from OSS
Differential Revision: D28970960
Reviewed By: raghuramank100
Pulled By: z-a-f
fbshipit-source-id: 8f2a4360ad877f430cdc1065c6777106938b58d5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58704
Implements the base sparsifier class based on the #59835 RFC documents.
This PR implements the base class for sparsification; specifically, the `prepare` method is implemented.
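A hedged usage sketch for the prepare() flow (import paths and exact signatures have changed over time between torch.ao.sparsity and torch.ao.pruning, so the names below are illustrative):
```python
import torch.nn as nn
# Assumed modern import path; older versions lived under torch.ao.sparsity.
from torch.ao.pruning import WeightNormSparsifier

model = nn.Sequential(nn.Linear(16, 16), nn.Linear(16, 4))
sparsifier = WeightNormSparsifier(sparsity_level=0.5)

# prepare() attaches masks to the configured modules; config=None is assumed
# to mean "sparsify all supported layers".
sparsifier.prepare(model, config=None)
sparsifier.step()          # recompute the masks
sparsifier.squash_mask()   # fold the masks into the weights and detach
```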
Test Plan:
```
python test/test_ao_sparsity.py
```
Imported from OSS
Differential Revision: D28970958
Reviewed By: raghuramank100
Pulled By: z-a-f
fbshipit-source-id: 0ef98a445c0a0aca22ce5708e34a9f94606d0e2b