pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 00:21:07 +01:00

Author	SHA1	Message	Date
Yuanyuan Chen	7441a1b9b1	Update ruff to 0.13.1 (#163744 ) Update ruff to 0.13.1 so that we can remove `UP038` from `pyproject.toml` because it has been removed from supported rules of ruff. There are some fixes, the most notable one is [(PYI059)](https://docs.astral.sh/ruff/rules/generic-not-last-base-class/#generic-not-last-base-class-pyi059) ``` Checks for classes inheriting from typing.Generic[] where Generic[] is not the last base class in the bases tuple. ``` A BC-breaking change is introduced to change the typing of `OrderedSet .storage` Pull Request resolved: https://github.com/pytorch/pytorch/pull/163744 Approved by: https://github.com/Skylion007, https://github.com/jingsh	2025-09-26 10:12:21 +00:00
Annop Wongwathanarat	6fcffd8cd1	Optimize SVE embedding performance (#150176 ) Change loop unrolling strategy. Previously, the script only unrolls the inner loop over block_size when block size is multiple of vector length. This version instead unrolls the outer loop which reduces the number of load/store for accumulation into the output array and improves performance for cases when block size is not multiple of vector length. Benchmarking script: ```python # SPDX-FileCopyrightText: Copyright 2025 Arm Limited and/or its affiliate <open-source-office@arm.com> # SPDX-License-Identifier: BSD-3-Clause import torch import torch.nn as nn import numpy as np import time import sys np.random.seed(0) torch.manual_seed(0) num_embeddings = 400000 embedding_dim = int(sys.argv[1]) multi_hot = 100 batch_size = 400 nrun = 1000 class SimpleEmbeddingBagModel(nn.Module): def __init__(self, num_embeddings, embedding_dim): super(SimpleEmbeddingBagModel, self).__init__() weights = torch.from_numpy((np.random.random_sample((num_embeddings, embedding_dim)) + 1).astype(np.float32)).to(torch.float16) # Defining the EmbeddingBag layer self.embedding_bag = torch.nn.EmbeddingBag(num_embeddings, embedding_dim, _weight=weights, mode='sum', include_last_offset=True, dtype=torch.float32) def forward(self, input, offsets): # Forward pass through the EmbeddingBag layer result32 = self.embedding_bag(input, offsets, per_sample_weights=None) return result32 # Instantiate the model model = SimpleEmbeddingBagModel(num_embeddings=num_embeddings, embedding_dim=embedding_dim) model.eval() # Example input input_tensor = torch.randint(0, num_embeddings, (batch_size * multi_hot,), dtype=torch.long) offsets = torch.tensor(range(0, batch_size * multi_hot + 1, multi_hot)) with torch.no_grad(): # warm up output32 = model(input_tensor, offsets) ti = time.time_ns() for i in range(nrun): _ = model(input_tensor, offsets) tf = time.time_ns() print("{:3d} {:.3E}".format(embedding_dim, (tf-ti)/nrun/1.e6)) ``` Speedup on NEOVERSEV1 with 1 thread ![embedding](https://github.com/user-attachments/assets/16e567ed-b9a5-4db3-90b8-dec66d5414a7) Pull Request resolved: https://github.com/pytorch/pytorch/pull/150176 Approved by: https://github.com/digantdesai, https://github.com/malfet	2025-04-07 18:01:54 +00:00
cyy	f9bf9057ef	Fix ruff warnings in caffe2 and functorch (#144182 ) In preparation for upgrading ruff config to py3.9. Pull Request resolved: https://github.com/pytorch/pytorch/pull/144182 Approved by: https://github.com/malfet	2025-01-04 04:15:01 +00:00
Siddhartha Menon	e1e6417d4c	Add SVE implementation of embedding_lookup_idx (#133995 ) Adds an accelerated version of the embedding_lookup_idx perfkernels. This is done via a python codegen file similarly to `caffe2/perfkernels/hp_emblookup_codegen.py` Pull Request resolved: https://github.com/pytorch/pytorch/pull/133995 Approved by: https://github.com/malfet, https://github.com/huydhn	2024-10-15 18:52:44 +00:00
PyTorch MergeBot	dac0b4e62b	Revert "Add SVE implementation of embedding_lookup_idx (#133995 )" This reverts commit `770c134998`. Reverted https://github.com/pytorch/pytorch/pull/133995 on behalf of https://github.com/clee2000 due to breaking internal tests, I wondering if this just needs a targets change for buck? ([comment](https://github.com/pytorch/pytorch/pull/133995#issuecomment-2414596554))	2024-10-15 17:23:50 +00:00
Siddhartha Menon	770c134998	Add SVE implementation of embedding_lookup_idx (#133995 ) Adds an accelerated version of the embedding_lookup_idx perfkernels. This is done via a python codegen file similarly to `caffe2/perfkernels/hp_emblookup_codegen.py` Pull Request resolved: https://github.com/pytorch/pytorch/pull/133995 Approved by: https://github.com/malfet, https://github.com/huydhn	2024-10-14 10:17:27 +00:00

6 Commits