4487 Commits

Author SHA1 Message Date
Masaki Kozuki
6cc0158311 Use maybe_unused attr in VariableType (#100498)
simple cosmetic change, a fallout of #100250
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100498
Approved by: https://github.com/albanD
2023-05-03 14:14:29 +00:00
PyTorch MergeBot
c3aa59c8f5 Revert "[WIP] enable cuda graphs support for flash attention with dropout (#100196)"
This reverts commit 32615618e4.

Reverted https://github.com/pytorch/pytorch/pull/100196 on behalf of https://github.com/clee2000 due to broke no ops build 32615618e4 https://github.com/pytorch/pytorch/actions/runs/4866578063/jobs/8678258318 ([comment](https://github.com/pytorch/pytorch/pull/100196#issuecomment-1532352810))
2023-05-03 01:41:56 +00:00
Natalia Gimelshein
32615618e4 [WIP] enable cuda graphs support for flash attention with dropout (#100196)
Fixes #99905

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100196
Approved by: https://github.com/drisspg
2023-05-02 23:05:31 +00:00
Masaki Kozuki
311c2bb7ec Move pattern match for foreach before bulky if-else in save_variables (#100445)
One caveat could be that the first if branch doesn't seem to use `arg.expr` at all.

fixes https://github.com/pytorch/pytorch/pull/96405#discussion_r1175669480.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100445
Approved by: https://github.com/soulitzer
2023-05-02 20:38:51 +00:00
Donald Dong
a1d041728b
Back out "[aarch64][tools/build_defs/third_party/fbcode_defs.bzl] Fix dep handling in cross-builds"
Differential Revision: D45415678

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100294
2023-05-01 16:27:51 -07:00
PaliC
0cf6e74fa9 add users to external contrib stat upload (#100403)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100403
Approved by: https://github.com/kit1980
2023-05-01 20:35:51 +00:00
Masaki Kozuki
6c934a89a7 Skip invalid grads in outplace foreachs' backward (#100256)
Fixes #100248
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100256
Approved by: https://github.com/soulitzer, https://github.com/albanD
2023-04-29 22:45:26 +00:00
Sahan Paliskara
2b79d6c425 Update testing aggregate data (#100070)
Updates the testing aggregate data to also show workflows, which is useful for seeing how long workflows actually take.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100070
Approved by: https://github.com/seemethere
2023-04-29 00:09:52 +00:00
Masaki Kozuki
9e1f46d55b Use [[maybe_unused]] in VariableType_[0-4].cpp (#100250)
This is kind of trivial, as per title.

Removing `(void)_any_requires_grad` and giving `[[maybe_unused]]` attribute to that variable.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100250
Approved by: https://github.com/Skylion007, https://github.com/soulitzer
2023-04-28 19:00:19 +00:00
Richard Zou
4135295a76 Excise yaml dependency in torchgen.model (#100203)
The problem:
- The new CustomOp API depends on torchgen.model
- torchgen.model imports `yaml`
- `yaml` is not a PyTorch runtime dependency

To unblock myself, because I'm not sure how long it'll take to
convince people yaml should be a PyTorch runtime dependency
(unless one of you wants to approve #100166), this PR removes the
yaml dependency from torchgen.model.

It does so by splitting torchgen.utils (the offender) into
torchgen.utils (no yaml) and torchgen.yaml (which uses yaml).

Test Plan:
- CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100203
Approved by: https://github.com/ezyang, https://github.com/Skylion007
2023-04-28 13:45:39 +00:00
Masaki Kozuki
674018903d per-Tensor grad_fn for in-place foreach functions (#96405)
Generate a `grad_fn` for each (tuple of) `Tensor`(s) of the same index for `_foreach_foo_` and each `grad_fn` is `FooBackward`.

The current status of foreach functions' backward support for the record:
- out-place: Implemented, but no optimized implementations like their forward path
- in-place: not implemented. I think this check 7eaaefafb3/torchgen/api/autograd.py (L309-L311) is partly responsible but the difference of signature between out-place and in-place (see https://github.com/pytorch/pytorch/pull/96405#discussion_r1154690940) would prevent in-place from using out-place versions (the logic is around 7eaaefafb3/torchgen/api/autograd.py (L495-L500))

```c++
void _foreach_abs_(c10::DispatchKeySet ks, at::TensorList self) {
  auto self_ = unpack(self, "self", 0);
  #ifndef NDEBUG
  std::vector<c10::optional<Storage>> self__storage_saved(self_.size());
  for (const Tensor& tensor : self_)
    self__storage_saved.push_back(
      tensor.has_storage() ? c10::optional<Storage>(tensor.storage()) : c10::nullopt);
  std::vector<c10::intrusive_ptr<TensorImpl>> self__impl_saved(self_.size());
  for (size_t i=0; i<self_.size(); i++)
    if (self_[i].defined()) self__impl_saved[i] = self_[i].getIntrusivePtr();
  #endif
  {
    at::AutoDispatchBelowAutograd guard;
    at::redispatch::_foreach_abs_(ks & c10::after_autograd_keyset, self_);
  }
  #ifndef NDEBUG
  for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
    if (self__storage_saved[i].has_value() && !at::impl::tensorlist_has_dispatch(self_))
      AT_ASSERT(self__storage_saved[i].value().is_alias_of(self_[i].storage()));
  }
  for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
    if (self__impl_saved[i] && !at::impl::tensorlist_has_dispatch(self_))
      AT_ASSERT(self__impl_saved[i] == self_[i].getIntrusivePtr());
  }
  #endif
}
```

Related:
- #95431
- #95765 for multiple `grad_fn`s logic

---

Examples: outputs of `_foreach_add_.List`, `_foreach_addcmul_.ScalarList`, and `_foreach_exp_`

```c++
void _foreach_addcmul__ScalarList(c10::DispatchKeySet ks, at::TensorList self, at::TensorList tensor1, at::TensorList tensor2, at::ArrayRef<at::Scalar> scalars) {
  auto self_ = unpack(self, "self", 0);
  auto tensor1_ = unpack(tensor1, "tensor1", 1);
  auto tensor2_ = unpack(tensor2, "tensor2", 2);
  auto _any_requires_grad = compute_requires_grad( self, tensor1, tensor2 );

  (void)_any_requires_grad;
  std::vector<c10::optional<at::Tensor>> original_selfs(self.size());
  std::vector<std::shared_ptr<AddcmulBackward0>> grad_fns;
  if (_any_requires_grad) {
    for (const auto& i : c10::irange( self.size() )) {
      const auto ith_requires_grad = compute_requires_grad(self[i], tensor1[i], tensor2[i]);
      check_inplace(self[i], ith_requires_grad);
      grad_fns.push_back([&]() -> std::shared_ptr<AddcmulBackward0> {
          if (!ith_requires_grad) {
              return nullptr;
          } else {
              auto grad_fn = std::shared_ptr<AddcmulBackward0>(new AddcmulBackward0(), deleteNode);
              grad_fn->set_next_edges(collect_next_edges( self[i], tensor1[i], tensor2[i] ));
              return grad_fn;
          }
      }());
    }
    if (!grad_fns.empty()) {

        for (const auto& i : c10::irange(grad_fns.size())) {
            auto grad_fn = grad_fns[i];
            if (grad_fn != nullptr) {
                grad_fn->self_scalar_type = self[i].scalar_type();
                grad_fn->tensor1_scalar_type = tensor1[i].scalar_type();
                if (grad_fn->should_compute_output(1)) {
                  grad_fn->tensor2_ = SavedVariable(tensor2[i], false);
                }
                grad_fn->value = scalars[i];
                if (grad_fn->should_compute_output(2)) {
                  grad_fn->tensor1_ = SavedVariable(tensor1[i], false);
                }
                grad_fn->tensor2_scalar_type = tensor2[i].scalar_type();
            }
        }
    }
  }
  #ifndef NDEBUG
  std::vector<c10::optional<Storage>> self__storage_saved(self_.size());
  for (const Tensor& tensor : self_)
    self__storage_saved.push_back(
      tensor.has_storage() ? c10::optional<Storage>(tensor.storage()) : c10::nullopt);
  std::vector<c10::intrusive_ptr<TensorImpl>> self__impl_saved(self_.size());
  for (size_t i=0; i<self_.size(); i++)
    if (self_[i].defined()) self__impl_saved[i] = self_[i].getIntrusivePtr();
  std::vector<c10::optional<Storage>> tensor1__storage_saved(tensor1_.size());
  for (const Tensor& tensor : tensor1_)
    tensor1__storage_saved.push_back(
      tensor.has_storage() ? c10::optional<Storage>(tensor.storage()) : c10::nullopt);
  std::vector<c10::intrusive_ptr<TensorImpl>> tensor1__impl_saved(tensor1_.size());
  for (size_t i=0; i<tensor1_.size(); i++)
    if (tensor1_[i].defined()) tensor1__impl_saved[i] = tensor1_[i].getIntrusivePtr();
  std::vector<c10::optional<Storage>> tensor2__storage_saved(tensor2_.size());
  for (const Tensor& tensor : tensor2_)
    tensor2__storage_saved.push_back(
      tensor.has_storage() ? c10::optional<Storage>(tensor.storage()) : c10::nullopt);
  std::vector<c10::intrusive_ptr<TensorImpl>> tensor2__impl_saved(tensor2_.size());
  for (size_t i=0; i<tensor2_.size(); i++)
    if (tensor2_[i].defined()) tensor2__impl_saved[i] = tensor2_[i].getIntrusivePtr();
  #endif
  {
    at::AutoDispatchBelowAutograd guard;
    at::redispatch::_foreach_addcmul_(ks & c10::after_autograd_keyset, self_, tensor1_, tensor2_, scalars);
  }
  #ifndef NDEBUG
  for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
    if (self__storage_saved[i].has_value() && !at::impl::tensorlist_has_dispatch(self_))
      TORCH_INTERNAL_ASSERT(self__storage_saved[i].value().is_alias_of(self_[i].storage()));
  }
  for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
    if (self__impl_saved[i] && !at::impl::tensorlist_has_dispatch(self_))
      TORCH_INTERNAL_ASSERT(self__impl_saved[i] == self_[i].getIntrusivePtr());
  }
  for (size_t i=0; i<tensor1_.size() && !at::impl::dispatch_mode_enabled(); i++) {
    if (tensor1__storage_saved[i].has_value() && !at::impl::tensorlist_has_dispatch(tensor1_))
      TORCH_INTERNAL_ASSERT(tensor1__storage_saved[i].value().is_alias_of(tensor1_[i].storage()));
  }
  for (size_t i=0; i<tensor1_.size() && !at::impl::dispatch_mode_enabled(); i++) {
    if (tensor1__impl_saved[i] && !at::impl::tensorlist_has_dispatch(tensor1_))
      TORCH_INTERNAL_ASSERT(tensor1__impl_saved[i] == tensor1_[i].getIntrusivePtr());
  }
  for (size_t i=0; i<tensor2_.size() && !at::impl::dispatch_mode_enabled(); i++) {
    if (tensor2__storage_saved[i].has_value() && !at::impl::tensorlist_has_dispatch(tensor2_))
      TORCH_INTERNAL_ASSERT(tensor2__storage_saved[i].value().is_alias_of(tensor2_[i].storage()));
  }
  for (size_t i=0; i<tensor2_.size() && !at::impl::dispatch_mode_enabled(); i++) {
    if (tensor2__impl_saved[i] && !at::impl::tensorlist_has_dispatch(tensor2_))
      TORCH_INTERNAL_ASSERT(tensor2__impl_saved[i] == tensor2_[i].getIntrusivePtr());
  }
  #endif
  if (!grad_fns.empty()) {
      auto differentiable_outputs = flatten_tensor_args( self );
      TORCH_INTERNAL_ASSERT(differentiable_outputs.size() == grad_fns.size());
      for (const auto& i : c10::irange(grad_fns.size())) {
          auto grad_fn = grad_fns[i];
          if (grad_fn != nullptr) {
              rebase_history(differentiable_outputs[i], grad_fns[i]);
          }
      }
  }
}

```

```c++
void _foreach_add__List(c10::DispatchKeySet ks, at::TensorList self, at::TensorList other, const at::Scalar & alpha) {
  auto self_ = unpack(self, "self", 0);
  auto other_ = unpack(other, "other", 1);
  auto _any_requires_grad = compute_requires_grad( self, other );

  (void)_any_requires_grad;
  std::vector<c10::optional<at::Tensor>> original_selfs(self.size());
  std::vector<std::shared_ptr<AddBackward0>> grad_fns;
  if (_any_requires_grad) {
    for (const auto& i : c10::irange( self.size() )) {
      const auto ith_requires_grad = compute_requires_grad(self[i], other[i]);
      check_inplace(self[i], ith_requires_grad);
      grad_fns.push_back([&]() -> std::shared_ptr<AddBackward0> {
          if (!ith_requires_grad) {
              return nullptr;
          } else {
              auto grad_fn = std::shared_ptr<AddBackward0>(new AddBackward0(), deleteNode);
              grad_fn->set_next_edges(collect_next_edges( self[i], other[i] ));
              return grad_fn;
          }
      }());
    }
    if (!grad_fns.empty()) {

        for (const auto& i : c10::irange(grad_fns.size())) {
            auto grad_fn = grad_fns[i];
            if (grad_fn != nullptr) {
                grad_fn->other_scalar_type = other[i].scalar_type();
                grad_fn->alpha = alpha;
                grad_fn->self_scalar_type = self[i].scalar_type();
            }
        }
    }
  }
  #ifndef NDEBUG
  std::vector<c10::optional<Storage>> self__storage_saved(self_.size());
  for (const Tensor& tensor : self_)
    self__storage_saved.push_back(
      tensor.has_storage() ? c10::optional<Storage>(tensor.storage()) : c10::nullopt);
  std::vector<c10::intrusive_ptr<TensorImpl>> self__impl_saved(self_.size());
  for (size_t i=0; i<self_.size(); i++)
    if (self_[i].defined()) self__impl_saved[i] = self_[i].getIntrusivePtr();
  std::vector<c10::optional<Storage>> other__storage_saved(other_.size());
  for (const Tensor& tensor : other_)
    other__storage_saved.push_back(
      tensor.has_storage() ? c10::optional<Storage>(tensor.storage()) : c10::nullopt);
  std::vector<c10::intrusive_ptr<TensorImpl>> other__impl_saved(other_.size());
  for (size_t i=0; i<other_.size(); i++)
    if (other_[i].defined()) other__impl_saved[i] = other_[i].getIntrusivePtr();
  #endif
  {
    at::AutoDispatchBelowAutograd guard;
    at::redispatch::_foreach_add_(ks & c10::after_autograd_keyset, self_, other_, alpha);
  }
  #ifndef NDEBUG
  for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
    if (self__storage_saved[i].has_value() && !at::impl::tensorlist_has_dispatch(self_))
      TORCH_INTERNAL_ASSERT(self__storage_saved[i].value().is_alias_of(self_[i].storage()));
  }
  for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
    if (self__impl_saved[i] && !at::impl::tensorlist_has_dispatch(self_))
      TORCH_INTERNAL_ASSERT(self__impl_saved[i] == self_[i].getIntrusivePtr());
  }
  for (size_t i=0; i<other_.size() && !at::impl::dispatch_mode_enabled(); i++) {
    if (other__storage_saved[i].has_value() && !at::impl::tensorlist_has_dispatch(other_))
      TORCH_INTERNAL_ASSERT(other__storage_saved[i].value().is_alias_of(other_[i].storage()));
  }
  for (size_t i=0; i<other_.size() && !at::impl::dispatch_mode_enabled(); i++) {
    if (other__impl_saved[i] && !at::impl::tensorlist_has_dispatch(other_))
      TORCH_INTERNAL_ASSERT(other__impl_saved[i] == other_[i].getIntrusivePtr());
  }
  #endif
  if (!grad_fns.empty()) {
      auto differentiable_outputs = flatten_tensor_args( self );
      TORCH_INTERNAL_ASSERT(differentiable_outputs.size() == grad_fns.size());
      for (const auto& i : c10::irange(grad_fns.size())) {
          auto grad_fn = grad_fns[i];
          if (grad_fn != nullptr) {
              rebase_history(differentiable_outputs[i], grad_fns[i]);
          }
      }
  }
}

...

void _foreach_exp_(c10::DispatchKeySet ks, at::TensorList self) {
  auto self_ = unpack(self, "self", 0);
  auto _any_requires_grad = compute_requires_grad( self );

  (void)_any_requires_grad;
  std::vector<c10::optional<at::Tensor>> original_selfs(self.size());
  std::vector<std::shared_ptr<ExpBackward0>> grad_fns;
  if (_any_requires_grad) {
    for (const auto& i : c10::irange( self.size() )) {
      const auto ith_requires_grad = compute_requires_grad(self[i]);
      check_inplace(self[i], ith_requires_grad);
      grad_fns.push_back([&]() -> std::shared_ptr<ExpBackward0> {
          if (!ith_requires_grad) {
              return nullptr;
          } else {
              auto grad_fn = std::shared_ptr<ExpBackward0>(new ExpBackward0(), deleteNode);
              grad_fn->set_next_edges(collect_next_edges( self[i] ));
              return grad_fn;
          }
      }());
    }
  }
  #ifndef NDEBUG
  std::vector<c10::optional<Storage>> self__storage_saved(self_.size());
  for (const Tensor& tensor : self_)
    self__storage_saved.push_back(
      tensor.has_storage() ? c10::optional<Storage>(tensor.storage()) : c10::nullopt);
  std::vector<c10::intrusive_ptr<TensorImpl>> self__impl_saved(self_.size());
  for (size_t i=0; i<self_.size(); i++)
    if (self_[i].defined()) self__impl_saved[i] = self_[i].getIntrusivePtr();
  #endif
  {
    at::AutoDispatchBelowAutograd guard;
    at::redispatch::_foreach_exp_(ks & c10::after_autograd_keyset, self_);
  }
  #ifndef NDEBUG
  for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
    if (self__storage_saved[i].has_value() && !at::impl::tensorlist_has_dispatch(self_))
      TORCH_INTERNAL_ASSERT(self__storage_saved[i].value().is_alias_of(self_[i].storage()));
  }
  for (size_t i=0; i<self_.size() && !at::impl::dispatch_mode_enabled(); i++) {
    if (self__impl_saved[i] && !at::impl::tensorlist_has_dispatch(self_))
      TORCH_INTERNAL_ASSERT(self__impl_saved[i] == self_[i].getIntrusivePtr());
  }
  #endif
  if (!grad_fns.empty()) {
      auto differentiable_outputs = flatten_tensor_args( self );
      TORCH_INTERNAL_ASSERT(differentiable_outputs.size() == grad_fns.size());
      for (const auto& i : c10::irange(grad_fns.size())) {
          auto grad_fn = grad_fns[i];
          if (grad_fn != nullptr) {
              rebase_history(differentiable_outputs[i], grad_fns[i]);
          }
      }
  }
  if (!grad_fns.empty()) {

      for (const auto& i : c10::irange(grad_fns.size())) {
          auto grad_fn = grad_fns[i];
          if (grad_fn != nullptr) {
              grad_fn->result_ = SavedVariable(self[i], true, self[i].is_view());
          }
      }
  }
}

```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96405
Approved by: https://github.com/soulitzer
2023-04-28 00:55:04 +00:00
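The per-index `grad_fn` idea from #96405 can be sketched in plain Python. This is a toy model under stated assumptions: `ToyTensor`, `ToyBackward`, and `toy_foreach_add_` are hypothetical names, and the real logic lives in the generated C++ shown in that commit message. It only mirrors the shape of that code: one backward node per list index, `None` where nothing at that index requires grad, and the node attached after the in-place update.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyBackward:
    """Hypothetical stand-in for a backward node such as AddBackward0."""
    name: str
    index: int


@dataclass
class ToyTensor:
    """Hypothetical stand-in for a tensor; holds a single float."""
    value: float
    requires_grad: bool = False
    grad_fn: Optional[ToyBackward] = None


def toy_foreach_add_(selfs: List[ToyTensor], others: List[ToyTensor]):
    """In-place add over two lists, creating one grad_fn per index."""
    grad_fns: List[Optional[ToyBackward]] = []
    for i, (s, o) in enumerate(zip(selfs, others)):
        ith_requires_grad = s.requires_grad or o.requires_grad
        grad_fns.append(ToyBackward("AddBackward0", i) if ith_requires_grad else None)

    for s, o in zip(selfs, others):
        s.value += o.value  # the in-place computation itself

    for s, fn in zip(selfs, grad_fns):
        if fn is not None:
            s.grad_fn = fn  # loosely analogous to rebase_history for index i
            s.requires_grad = True
    return grad_fns


xs = [ToyTensor(1.0, requires_grad=True), ToyTensor(2.0)]
ys = [ToyTensor(10.0), ToyTensor(20.0)]
fns = toy_foreach_add_(xs, ys)
```

Only index 0 gets a node here, because neither tensor at index 1 requires grad.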
feifan
c0ecd98958 Rename DispatchKey.PrivateUse1 to custom device in torchgen. (#99406)
I want to use torchgen to generate code, and my yaml file format is the same as `native_functions.yaml`.
I will use the PrivateUse1, but in my yaml file, I don't want to show PrivateUse1 to the user.
So I want to achieve the following result (e.g., my device is `YPU`):
```
>>> from torchgen.model import DispatchKey
>>> str(DispatchKey.PrivateUse1)
"YPU"
>>> DispatchKey.parse("YPU")
DispatchKey.PrivateUse1
```
I also thought that not everyone would need this feature, so I added a new function to handle this scenario.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99406
Approved by: https://github.com/ezyang
2023-04-27 03:30:48 +00:00
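The behaviour requested in #99406 (a user-facing alias for `PrivateUse1` that round-trips through `str` and `parse`) can be illustrated with a small stand-alone enum. This is a sketch only: `Key`, `_display_names`, and `rename_private_use1` are hypothetical names and do not reproduce torchgen's actual `DispatchKey` implementation.

```python
from enum import Enum, auto

# Hypothetical registry mapping enum member names to user-facing names.
_display_names = {}


class Key(Enum):
    CPU = auto()
    CUDA = auto()
    PrivateUse1 = auto()

    def __str__(self):
        # Fall back to the member name when no alias is registered.
        return _display_names.get(self.name, self.name)

    @staticmethod
    def parse(value):
        for key in Key:
            if str(key) == value:
                return key
        raise ValueError(f"unknown dispatch key: {value}")


def rename_private_use1(custom_name):
    """Hypothetical opt-in hook; only callers that want the alias use it."""
    _display_names["PrivateUse1"] = custom_name


rename_private_use1("YPU")
```

Keeping the alias behind an explicit call matches the stated intent that users who do not need the feature see no change.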
mikey dagitses
cc628293bf simplify method_def generation (#100059)
simplify method_def generation

Summary:
This removes some duplication. This was originally done to streamline
a subsequent change, but that change turned out to be
misguided. Nevertheless, this is a nice simplification.

Test Plan:
This should change the code gen by removing some redundant
parentheses. Rely on CI.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100059
Approved by: https://github.com/ezyang
2023-04-26 18:46:57 +00:00
Edward Z. Yang
3a5427baf4 Add torch.utils._content_store (#99809)
Implements a simple content-addressable store for storages (with tensors implemented as cheap references on top), enabling incremental serialization of tensors to disk, which I intend to use in the accuracy repro extractor.  Check the comment at the top of torch/utils/_content_store.py for more details on the intended use case.

One major piece of this PR is implementing the content hash for tensors. For our prospective use case, we may need to repeatedly hash up to 80 GB of tensor data every time we snapshot (and we may snapshot multiple times). Using a conventional cryptographic hash and hashing each snapshot would likely take on the order of minutes, which seemed too slow to me. So instead, I implemented a crappy hash function that can be run on GPU. It is at least somewhat theoretically grounded: using random parameters generated by Philox, we use the standard shift-multiply and xor sum universal hash family. The hash function is a bit dorky though; instead of properly doing 160-bit math, it just runs a 32-bit hash five times and cats them together. By the way, this sets the first precedent for a kernel in the PyTorch library which MUST be torch.compile'd to be run (in fact, this kernel does not run in eager mode because of the use of xor_sum, which doesn't actually exist in ATen).

I had to add a few more primitives to inductor, namely randint (over the entire int range) and xor_sum.  Fortunately, these primitives are natively supported by Triton/C++, and so they were very easy to plumb through.  xor_sum is exposed as a prim, while randint special cases on when low/high span the entire 32-bit signed integer range.

Thanks to Jeff Johnson for letting me bounce ideas off him on a Saturday morning lol.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99809
Approved by: https://github.com/voznesenskym
2023-04-26 18:02:59 +00:00
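To make the hashing scheme described in #99809 more concrete, here is a toy pure-Python sketch in the same spirit: a per-position multiply-add with pseudo-random parameters, reduced with xor, run five times and concatenated into 160 bits. It is not the kernel from `torch/utils/_content_store.py`; `random.Random` stands in for Philox, and the exact parameter ranges are assumptions made for illustration.

```python
import random

MASK32 = 0xFFFFFFFF


def xor_hash32(words, seed):
    """One 32-bit hash: xor of (a_i * w_i + b_i) mod 2**32 over all positions."""
    rng = random.Random(seed)  # stand-in for Philox-generated parameters
    acc = 0
    for w in words:
        a = rng.randrange(1, 2**31)  # per-position multiplier, never zero
        b = rng.randrange(0, 2**31)  # per-position offset
        acc ^= (a * w + b) & MASK32
    return acc


def content_hash(words, rounds=5):
    """Concatenate `rounds` independent 32-bit hashes into one hex digest."""
    return "".join(f"{xor_hash32(words, seed):08x}" for seed in range(rounds))


digest = content_hash([1, 2, 3, 4])  # 5 * 32 bits = 160 bits = 40 hex chars
```

Because each position gets its own parameters, reordering the input generally changes the digest, and the xor reduction is what makes the computation easy to parallelize.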
andrewjcg
0b1b063158
[buckbuild.bzl] Fix dep handling in cross-builds
Differential Revision: D44960349

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99826
2023-04-25 20:53:28 -07:00
Aaron Gokaslan
e2a3817dfd [BE] Enable C419 rule for any all shortcircuiting (#99890)
Apparently https://github.com/pytorch/pytorch/pull/78142 made torch.JIT allow for simple generator expressions which allows us to enable rules that replace unnecessary list comprehensions with generators in any/all. This was originally part of #99280 but I split it off into this PR so that it can be easily reverted should anything break.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99890
Approved by: https://github.com/justinchuby, https://github.com/kit1980, https://github.com/malfet
2023-04-25 15:02:13 +00:00
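As a small illustration of what the C419 rule in #99890 buys: passing a list comprehension to `any`/`all` evaluates every element before the check starts, while a generator expression lets the call short-circuit. The helper `is_negative` and the sample data are made up for this sketch.

```python
calls = []


def is_negative(x):
    calls.append(x)  # record every element that gets inspected
    return x < 0


values = [-1, 2, 3, 4]

any([is_negative(v) for v in values])  # the list is fully built first
list_calls = len(calls)

calls.clear()
any(is_negative(v) for v in values)    # stops at the first True
gen_calls = len(calls)
```

With this input the list form inspects all four elements and the generator form inspects only the first.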
Catherine Lee
2cea2edc27 [easy] Fix upload test stats after master -> main switch (#99924)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99924
Approved by: https://github.com/huydhn
2023-04-24 21:19:09 +00:00
Justin Chu
7d2a18da0b Enable ruff in lintrunner (#99785)
### This change

- Implements the ruff linter in pytorch lintrunner. It is adapted from https://github.com/justinchuby/lintrunner-adapters/blob/main/lintrunner_adapters/adapters/ruff_linter.py. It does **both linting and fixing**. 🔧
- Migrates all flake8 configs to the ruff config and enables it for the repo.
- **`ruff` lints the whole repo in under 2s** 🤯

Fixes https://github.com/pytorch/pytorch/issues/94737 Replaces #99280

@huydhn @Skylion007

### <samp>🤖 Generated by Copilot at 6b982dd</samp>

### Summary
🧹🛠️🎨

Add `[tool.ruff]` section to `pyproject.toml` to configure `ruff` code formatter and linter. This change aims to improve code quality and consistency with a single tool.

> _`ruff` cleans the code_
> _like a spring breeze in the fields_
> _`pyproject.toml`_

### Walkthrough
*  Configure `ruff` code formatter and linter for the whole project ([link](https://github.com/pytorch/pytorch/pull/99785/files?diff=unified&w=0#diff-50c86b7ed8ac2cf95bd48334961bf0530cdc77b5a56f852c5c61b89d735fd711R22-R79))

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99785
Approved by: https://github.com/malfet, https://github.com/Skylion007
2023-04-24 16:18:44 +00:00
Justin Chu
79c9e82e27 Fix flake8 lint errors reported by ruff - take 2 (#99798)
Replaces #99784. This PR is pure autofix.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99798
Approved by: https://github.com/Skylion007, https://github.com/kit1980
2023-04-23 23:09:51 +00:00
Zain Rizvi
7546972565 [BE] Refactoring test execution and improving comments (#99467)
Sharing code between the code that handles test results in parallel vs serial mode.

Note that the original version of this code had an inconsistency between the two versions where it would execute `print_to_stderr(err_message)` on every test that ran in parallel, but for serial tests it would only invoke `print_to_stderr(err_message)` if `continue_on_error` was also specified.  By sharing code, this PR changes that behavior to be consistent between the two modes.

Also adding some comments.

### <samp>🤖 Generated by Copilot at 029342c</samp>

> _Sing, O Muse, of the skillful coder who refined_
> _The PyTorch testing script, `run_test.py`, and shined_
> _A light on its obscure logic, with docstrings and comments_
> _And made it run more smoothly, with better error contents_
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99467
Approved by: https://github.com/huydhn, https://github.com/malfet
2023-04-19 19:29:07 +00:00
Ian Graves
24f882369a [EdgeML] Remove dependency on all_mobile_model_configs.yaml from pt_operator_library BUCK rule (#99122)
Summary: Removes the dependency on the unified YAML file

Test Plan:
Smoke test via some caffe2 tests.

```
buck2 run xplat/caffe2:supported_mobile_models_test
```

Build a major FoA app that uses model tracing and confirm it still works.

```
buck2 build fb4a
```

CI/CD for the rest. If operator tracing / bundling was broken, I'd hope the 1000+ tests spawned by this change would catch it.

Differential Revision: D44946368

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99122
Approved by: https://github.com/dhruvbird
2023-04-18 17:19:55 +00:00
Rodrigo Kumpera
38e964056b Reland python ops (#99170)
Waiting for the revert to land.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99170
Approved by: https://github.com/albanD
2023-04-18 15:15:46 +00:00
PyTorch MergeBot
1c042a2137 Revert "Reland python ops (#99170)"
This reverts commit d4de64ae8d.

Reverted https://github.com/pytorch/pytorch/pull/99170 on behalf of https://github.com/DanilBaibak due to Break internal build
2023-04-18 11:37:43 +00:00
Rodrigo Kumpera
d4de64ae8d Reland python ops (#99170)
Waiting for the revert to land.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99170
Approved by: https://github.com/albanD
2023-04-17 21:53:41 +00:00
Li-Huai (Allan) Lin
e549ad0046 Add log_sigmoid_backward forward-AD (#99288)
Fixes #95057
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99288
Approved by: https://github.com/kshitij12345, https://github.com/albanD
2023-04-17 15:45:20 +00:00
Edward Z. Yang
756a86d52c Support large negative SymInt (#99157)
The strategy is that we will heap allocate a LargeNegativeIntSymNodeImpl whenever we have a large negative int, so that we can keep the old `is_symbolic` test (now called `is_heap_allocated`) on SymInt. Whenever we need to do something with these ints, though, we convert them back into a plain `int64_t` (and then, e.g., wrap it in whatever user-specified SymNodeImpl they need). We cannot wrap directly in the user-specified SymNodeImpl as we generally do not know what the "tracing context" is from C++. We expect large negative ints to be rare, so we don't apply optimizations like singleton-ifying INT_MIN. Here's the order to review:

* c10/core/SymInt.h and cpp
  * `is_symbolic` renamed to `is_heap_allocated` as I needed to audit all use sites: the old `is_symbolic` test would return true for large negative int, but it would be wrong to then try to dispatch on the LargeNegativeIntSymNodeImpl which supports very few operations. In this file, I had to update expect_int,
  * If you pass in a large negative integer, we instead heap allocate it in `promote_to_negative`. The function is written in a funny way to keep compact constructor code for SymInt (the heap allocation happens out of line)
  * clone is now moved out-of-line
  * New method maybe_as_int which will give you a constant int if it is possible, either because it's stored inline or in LargeNegativeIntSymNodeImpl. This is the preferred replacement for previous use of is_symbolic() and then as_int_unchecked().
  * Rename toSymNodeImpl to toSymNode, which is more correct (since it returns a SymNode)
  * Complete rewrite of `normalize_symints.cpp` to use new `maybe_as_int`. Cannot easily use the old code structure, so it's now done using a macro and typing out each case manually (it's actually not that bad).
  * Reimplementations of all the unary operators by hand to use `maybe_as_int`, relatively simple.
* c10/core/LargeNegativeIntSymNodeImpl.h - Just stores an int64_t value, but it has to be big and negative. Most methods are not implemented, since we will rewrap the large negative int in the real SymNodeImpl subclass before doing operations with it.
* The rest of the files are just rewriting code to use `maybe_as_int`. There is a nontrivial comment in c10/core/SymIntArrayRef.h

Very minor test adjustment in c10/test/core/SymInt_test.cpp . Plan to exercise this properly in next PR.

Companion XLA PR: https://github.com/pytorch/xla/pull/4882

Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99157
Approved by: https://github.com/albanD
2023-04-15 22:43:51 +00:00
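The three states described in #99157 (inline int, heap-allocated large negative int, genuinely symbolic node) and the role of `maybe_as_int` can be modelled with a toy Python class. Everything here is an assumption made for illustration: `ToySymInt`, `SymbolicNode`, `LargeNegativeIntNode`, and especially the cutoff value are hypothetical and do not reflect the real bit layout in `c10/core/SymInt.h`.

```python
from typing import Optional, Union

# Illustrative cutoff only; the real boundary depends on SymInt's bit layout.
LARGE_NEGATIVE_CUTOFF = -(2**62)


class LargeNegativeIntNode:
    """Heap-allocated holder for an int too negative to store inline."""

    def __init__(self, value: int):
        assert value < LARGE_NEGATIVE_CUTOFF
        self.value = value


class SymbolicNode:
    """Placeholder for a genuinely symbolic value (no constant available)."""

    def __init__(self, name: str):
        self.name = name


class ToySymInt:
    def __init__(self, data: Union[int, SymbolicNode]):
        if isinstance(data, int) and data < LARGE_NEGATIVE_CUTOFF:
            data = LargeNegativeIntNode(data)  # promote out of line
        self._data = data

    def is_heap_allocated(self) -> bool:
        # True for both large negative ints and symbolic nodes.
        return not isinstance(self._data, int)

    def maybe_as_int(self) -> Optional[int]:
        if isinstance(self._data, int):
            return self._data
        if isinstance(self._data, LargeNegativeIntNode):
            return self._data.value
        return None  # truly symbolic: no constant to hand back


small = ToySymInt(7)
huge_negative = ToySymInt(-(2**63))
symbolic = ToySymInt(SymbolicNode("s0"))
```

The sketch shows why `is_heap_allocated` alone no longer implies "symbolic": `huge_negative` is heap allocated yet still yields a constant from `maybe_as_int`.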
Nikita Karetnikov
21681f36f4 [pt2] add SymInt support for fft ops (#99115)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99115
Approved by: https://github.com/ezyang
2023-04-15 18:01:39 +00:00
Nikita Karetnikov
f89b7c2bec [pt2] add SymInt support for roll (#99114)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99114
Approved by: https://github.com/ezyang
2023-04-15 18:01:39 +00:00
Rodrigo Kumpera
a910045add [PATCH] Back out "Move functional collectives implementation to python. (#98595)" (#99168)
Summary:
Original commit changeset: ba36f8751adc

Original Phabricator Diff: D44788697

Test Plan: model loading is fine after reverting the diff

Reviewed By: zyan0, sayitmemory

Differential Revision: D44921259
---

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99168
Approved by: https://github.com/izaitsevfb
2023-04-14 23:48:19 +00:00
mantaionut
5e1ac1bb83 Fix visual studio generator (#98605)
If `CMAKE_GENERATOR=Visual Studio 16 2019`, the build fails unless `USE_NINJA=False` is also set.
This PR changes that: if `CMAKE_GENERATOR` is set and is not equal to ninja, the build won't use Ninja.
This just makes it easier to select another generator.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98605
Approved by: https://github.com/kit1980
2023-04-14 01:46:46 +00:00
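The generator selection rule from #98605 can be written down as a small decision function. This is a hedged sketch, not the code in PyTorch's build helpers: `should_use_ninja`, its `ninja_available` parameter, and the set of values treated as "false" for `USE_NINJA` are all assumptions.

```python
def should_use_ninja(env, ninja_available=True):
    """Hypothetical helper; not the actual function in PyTorch's build scripts."""
    generator = env.get("CMAKE_GENERATOR")
    if generator is not None and generator.lower() != "ninja":
        return False  # an explicit non-Ninja generator wins
    if env.get("USE_NINJA", "").upper() in {"0", "OFF", "FALSE", "NO"}:
        return False  # an explicit opt-out still works
    return ninja_available


# Setting only CMAKE_GENERATOR is now enough to opt out of Ninja.
vs_build = should_use_ninja({"CMAKE_GENERATOR": "Visual Studio 16 2019"})
default_build = should_use_ninja({})
```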
Li-Huai (Allan) Lin
ca791b6909 [MPS] Add higher order derivatives warning to max_pool2d (#98582)
The higher-order derivative calculations of `max_pool2d` require indices to be provided, but the `mps_max_pool2d` kernel doesn't calculate them. Calculating the indices afterwards during backpropagation would be expensive and unnecessary, since users can directly call `max_pool2d` with `return_indices=True`, which also calculates `indices`.

This PR adds a warning for it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98582
Approved by: https://github.com/soulitzer
2023-04-11 18:03:46 +00:00
Edward Z. Yang
b8b840be3d Convert logging f-strings to use % format, part five (#98765)
This does some annoying but simple cases by hand.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98765
Approved by: https://github.com/wanchaol
2023-04-11 13:17:59 +00:00
Edward Z. Yang
5a7aad9681 Convert logging f-strings to use % format, part four (#98705)
This does multi-line concatenated string literals.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98705
Approved by: https://github.com/voznesenskym
2023-04-11 13:17:59 +00:00
William Wen
117da58b65 [dynamo 3.11] enable dynamo unittests in 3.11 (#98104)
Enable most dynamo unittests for 3.11. There are a few tests that are skipped due to failures that will be addressed in upcoming PRs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98104
Approved by: https://github.com/yanboliang, https://github.com/voznesenskym, https://github.com/albanD, https://github.com/jansel, https://github.com/jerryzh168, https://github.com/malfet
2023-04-10 20:04:10 +00:00
Edward Z. Yang
b09722f540 Convert logging f-strings to use % format, part two (#98700)
This hits multi-line logging strings

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98700
Approved by: https://github.com/voznesenskym
2023-04-10 12:19:31 +00:00
Edward Z. Yang
9a8f71f23e Convert logging f-strings to use % format (#98697)
Codemod done with
https://gist.github.com/ezyang/2e8b0463cdc6be278478495b23ff0530 with
assistance from ChatGPT.
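
For context, a small sketch of the difference between the two styles, using only the standard `logging` module. The `Expensive` class and the logger name `fstring_demo` are made up for this illustration; the codemod itself is the gist linked above.

```python
import io
import logging


class Expensive:
    """Counts how many times it is converted to a string."""

    def __init__(self):
        self.calls = 0

    def __str__(self):
        self.calls += 1
        return "expensive"


stream = io.StringIO()
log = logging.getLogger("fstring_demo")  # hypothetical logger name
log.propagate = False
log.handlers.clear()
log.addHandler(logging.StreamHandler(stream))
log.setLevel(logging.INFO)  # DEBUG records are discarded

eager, lazy = Expensive(), Expensive()

log.debug(f"value: {eager}")  # f-string: formatted before debug() is called
log.debug("value: %s", lazy)  # %-style: formatted only if the record is emitted
calls_after_debug = (eager.calls, lazy.calls)

log.info("value: %s", lazy)   # emitted, so the argument is formatted here

print(calls_after_debug)      # (1, 0)
print(stream.getvalue(), end="")
```

The f-string pays the formatting cost even though the DEBUG record is dropped, while the `%`-style call defers formatting until a handler actually emits the record.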

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98697
Approved by: https://github.com/voznesenskym
2023-04-10 12:19:31 +00:00
Rodrigo Kumpera
24d9001527 Move functional collectives implementation to python. (#98595)
This greatly simplifies the work needed to add new ops.

This relands the previous PR; it is not clear why it was reverted.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98595
Approved by: https://github.com/wconstab
2023-04-07 21:48:05 +00:00
PyTorch MergeBot
55724a5ec9 Revert "[experiment] More procs in CI (#98098)"
This reverts commit 9fd3eba6ce.

Reverted https://github.com/pytorch/pytorch/pull/98098 on behalf of https://github.com/clee2000 due to I think there's a bug
2023-04-07 19:50:54 +00:00
Catherine Lee
9fd3eba6ce [experiment] More procs in CI (#98098)
Experiment with more procs, but only on master so PRs aren't affected.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98098
Approved by: https://github.com/huydhn
2023-04-07 17:21:32 +00:00
PyTorch MergeBot
22411b6f02 Revert "[dynamo 3.11] enable dynamo unittests in 3.11 (#98104)"
This reverts commit 0066f3405f.

Reverted https://github.com/pytorch/pytorch/pull/98104 on behalf of https://github.com/huydhn due to Sorry for reverting your PR, but it is failing on CPU 3.11 test in trunk 0066f3405f. This is probably a land race
2023-04-07 00:05:30 +00:00
William Wen
0066f3405f [dynamo 3.11] enable dynamo unittests in 3.11 (#98104)
Enable most dynamo unittests for 3.11. There are a few tests that are skipped due to failures that will be addressed in upcoming PRs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98104
Approved by: https://github.com/yanboliang, https://github.com/voznesenskym, https://github.com/albanD, https://github.com/jansel, https://github.com/jerryzh168, https://github.com/malfet
2023-04-06 23:15:48 +00:00
Guang Yang
68cb06c752 Make gen_annotated_args support kwargs (#98396)
This PR addresses the issue seen in PR #97417, where the newly added op requires `kwargs`. Currently, tools/autograd/gen_annotated_fn_args.py does not support `kwargs`; only `func_args` are generated for test_overrides.py.

The PR adds a new field, `is_kwarg_only`, to each argument, indicating whether it is a keyword-only argument. See example:
```
annotated_args = {
    torch._C._VariableFunctions._cast_Byte: [{'is_kwarg_only': 'False', 'name': 'self', 'simple_type': 'Tensor'}],
    ...
```

The full comparison of the generated file `annotated_fn_args.py` can be found here:
  - **Before**: [P681991116](https://www.internalfb.com/phabricator/paste/view/P681991116)
  - **After**: [P681994218](https://www.internalfb.com/intern/paste/P681994218/)

Differential Revision: D44698310

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98396
Approved by: https://github.com/ezyang
2023-04-06 19:42:26 +00:00
PyTorch MergeBot
67d1a77086 Revert "Move functional collectives implementation to python. (#98315)"
This reverts commit 8b0374f83c.

Reverted https://github.com/pytorch/pytorch/pull/98315 on behalf of https://github.com/huydhn due to Sorry for reverting your PR. This is failing in trunk, probably due to a land race
2023-04-06 16:49:40 +00:00
Rodrigo Kumpera
8b0374f83c Move functional collectives implementation to python. (#98315)
This simplifies a lot the work we need to add new ops.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98315
Approved by: https://github.com/albanD, https://github.com/wconstab, https://github.com/Neilblaze
2023-04-06 14:06:16 +00:00
PaliC
d1de5f5f0d Change daily aggregates upload job to use sum and occurrence counter instead of averages (#98359)
We used to keep track of the average of each stat. However, averages make it hard to munge the data for interesting insights (e.g., finding the total test time for an oncall). The pin is updated so that we track the sum instead, along with an "occurrences" field, so the average can be rederived as sum/occurrences.
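
A minimal sketch of why sums plus an occurrence count aggregate more cleanly than averages. The field names follow the description above; the record layout, the `merge` helper, and the sample numbers are hypothetical.

```python
from collections import defaultdict


def merge(records):
    """Combine per-day records into one {"sum", "occurrences"} entry per test.

    Hypothetical helper: the record layout is an assumption for illustration.
    """
    totals = defaultdict(lambda: {"sum": 0.0, "occurrences": 0})
    for rec in records:
        entry = totals[rec["test"]]
        entry["sum"] += rec["sum"]
        entry["occurrences"] += rec["occurrences"]
    return dict(totals)


records = [
    {"test": "test_add", "sum": 30.0, "occurrences": 3},  # day 1: average 10.0
    {"test": "test_add", "sum": 2.0, "occurrences": 1},   # day 2: average 2.0
]
merged = merge(records)["test_add"]
average = merged["sum"] / merged["occurrences"]
print(merged["sum"], merged["occurrences"], average)  # 32.0 4 8.0
```

Averaging the two daily averages would give 6.0, which weights the single run on day 2 as heavily as the three runs on day 1; the sum and the count recover both the total time (32.0) and the true mean (8.0).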

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98359
Approved by: https://github.com/huydhn
2023-04-05 16:31:58 +00:00
BJ Hargrave
555ab310dc Add itemsize and nbytes properties to Tensor (#98322)
Adds properties for itemsize and nbytes to Tensor matching the properties in NumPy.
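
As a rough illustration of what these two properties mean, the sketch below uses Python's built-in `memoryview`, which exposes attributes with the same names. It is an analogy only, not the Tensor implementation; on a dense tensor `t`, the corresponding expressions would be `t.itemsize` and `t.nbytes`, and the same size relation is assumed to hold.

```python
from array import array

values = array("d", [0.0] * 12)  # 12 C doubles
view = memoryview(values)

print(view.itemsize)                               # bytes per element
print(view.nbytes)                                 # bytes for all elements
print(view.nbytes == view.itemsize * len(values))  # True
```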

Fixes https://github.com/pytorch/pytorch/issues/12728

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98322
Approved by: https://github.com/ezyang
2023-04-05 12:11:55 +00:00
PyTorch MergeBot
fa08e546f3 Revert "Add all_reduce_coalesced functional collective (#97157)"
This reverts commit a3fc3531f5.

Reverted https://github.com/pytorch/pytorch/pull/97157 on behalf of https://github.com/huydhn due to Sorry for reverting your PR, but it seems to have a land race with https://github.com/pytorch/pytorch/pull/96226 and fails lint on trunk
2023-04-04 01:50:49 +00:00
PaliC
0e2bde3000 Create script to upload test aggregation data (#97954)
### <samp>🤖 Generated by Copilot at 79f1b37</samp>

This pull request improves the workflow and data processing for uploading contribution and testing statistics to Rockset and S3. It renames and updates a workflow file, removes unused code from a script, and adds a new script to aggregate and upload test results.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97954
Approved by: https://github.com/huydhn
2023-04-04 01:28:08 +00:00
Rodrigo Kumpera
a3fc3531f5 Add all_reduce_coalesced functional collective (#97157)
Inductor codegen is suboptimal when calling all_reduce_coalesced with input args. We need to fix inductor's calling convention for that, or something else.

Might not work if any output is unused.

Test code:

```python
import torch
import torch.distributed as dist
import torch.nn.functional as F
from functorch import make_fx
import os

import torch.distributed._functional_collectives as ft_c
from torch.testing._internal.common_distributed import (
    spawn_threads_and_init_comms,
)
from torch._inductor.compile_fx import compile_fx_inner

def my_fun(a, b):
    c = a * 3
    tensors = ft_c.all_reduce_coalesced([a, c, b], "sum", [0])
    return ((tensors[1] + tensors[0] + tensors[2]).sum(), )

@spawn_threads_and_init_comms(world_size=1)
def inductor_main(self):

    x = torch.arange(4).cuda() * (dist.get_rank() + 1)
    y = torch.arange(4).cuda() * (dist.get_rank() + 1)
    x = x.to(torch.float)
    y = y.to(torch.float) * 0.5
    res = make_fx(my_fun)(x, y)
    print(f"fx graph:\n{res.graph}")
    ind = compile_fx_inner(res, [x, y])
    print(f"inductor done:\n{ind}")

os.environ["PROXY_TENSOR_TRACING"] = "1"
os.environ["TORCH_COMPILE_DEBUG"] = "1"
torch._dynamo.config.output_code = True

if __name__ == "__main__":
    inductor_main(None)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97157
Approved by: https://github.com/fegin
2023-04-04 01:13:18 +00:00
Rodrigo Kumpera
9ad66dd588 Switch reduce_scatter and all_gather in DeviceMesh to use functional collectives (#96226)
Among the changes is the introduction of gather_dim and scatter_dim in DeviceMesh collectives to simplify user code.

The current plan is to keep padding and gather/scatter dim support in DeviceMesh while we explore optimization opportunities in Inductor.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96226
Approved by: https://github.com/wanchaol
2023-04-04 00:58:33 +00:00