when a tensor has unbacked symbols it can be general enough to represent both contiguous and non contiguous tensors.
in that case we cant really evaluate is_contiguous. In many places in the code base, we check for is_contiguous to take a fast path. but the general path usually works for both contiguous and not contiguous in that case we probably want
to use definitely _contiguous API.
This is appleid for reshape in this PR and also to tensor meta data computation, the meta data now will have an attribute that says that its contiguous when its always contiguous. We would store that only if definitely _contiguous is true now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153432
Approved by: https://github.com/bobrenjc93
See context in D75266206.
This diff/PR migrates all the remaining view ops, which all need changes in `native_functions.yaml` and thus need to be exported to PR.
Ops covered by this diff:
- _reshape_alias
- unfold
internal: Also delete the entire aten_mtia_view_ops.cpp file, and update corresponding build config.
Differential Revision: [D75385411](https://our.internmc.facebook.com/intern/diff/D75385411/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154337
Approved by: https://github.com/nautsimon
ghstack dependencies: #154336
Since rocblas.dll and hipblaslt.dll are copied to torch/lib, rocblas and hipblaslt directories are needed to be stored there too (otherwise we have an error after wheel installation while searching for files in rocblas/library and hipblaslt/library which doesn't exist). This PR fixes this issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153144
Approved by: https://github.com/jeffdaily
Co-authored-by: Jeff Daily <jeff.daily@amd.com>
Prepares for the next PR in the stack by tightening up typing on a `cpp_wrapper` interface that's only used in one (well-typed) place, as well as downstream effects of that change. In particular, this enabled:
1. removing a number of now clearly unnecessary asserts
2. adding a few more targeted asserts to validate the code's current assumptions
3. removing some unneeded control flow in several functions
As far as I can tell, this PR should be functionally neutral. One argument was removed from a `cpp_wrapper` public API, but that argument was unused, and only had a single callsite.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154371
Approved by: https://github.com/desertfire
Summary:
Dot product for a single output element consists of 3 steps (both input vectors have elements of type scalar_t):
1. elementwise vector multiply (scalar_t x scalar_t -> opmath_t)
2. vector reduction to a scalar value (opmath_t -> opmath_t)
3. optional downcast if opmath_t != out_t
The current blas kernel performs steps 1 and 2 correctly, but for step 3, it will always downcast to scalar_t even when opmath_t == output_t (and then do an upcast back to output_t), which results in precision loss. This diff fixes the precision loss in the BlasKernel
Test Plan: Attention CI passes
Differential Revision: D75023858
topic: not user facing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154012
Approved by: https://github.com/Valentine233, https://github.com/aditew01, https://github.com/CaoE, https://github.com/drisspg
# Motivation
This PR is intended to add post-op fusion support fo Linear. The liner-pointwise fusion is expected to be used in graph mode like torch.compile. The FusionUtils.cpp file defines a utilization APIs for generating primitive attribute. This APIs would also be used for conv-pointwise fusion, which is in #140372.
# Validation
```bash
python test/xpu/test_fusion.py
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140365
Approved by: https://github.com/etaf, https://github.com/guangyey, https://github.com/EikanWang
Hopefully last step before all Mac build/tests could be switched away from conda
- Update cmake version from 3.22 to 3.25 as 3.22 from pipy seems to be unusable with python-3.12
- Add `--plat-name macosx_11_0_arm64` to setup.py command
- Remove `codesign` for cmake workaround (that was probably never really necessary
- Install `libpng` and `jpeg-turbo` when building torchbench and build torchaudio without OpenMP (to be fixed)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154309
Approved by: https://github.com/Skylion007, https://github.com/cyyever
when a tensor has unbacked symbols it can be general enough to represent both contiguous and non contiguous tensors.
in that case we cant really evaluate is_contiguous. In many places in the code base, we check for is_contiguous to take a fast path. but the general path usually works for both contiguous and not contiguous in that case we probably want
to use definitely _contiguous API.
This is appleid for reshape in this PR and also to tensor meta data computation, the meta data now will have an attribute that says that its contiguous when its always contiguous. We would store that only if definitely _contiguous is true now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153432
Approved by: https://github.com/bobrenjc93
When `XPU_ARCH_FLAGS` is an empty string, compilation will fail on `C10_STRINGIZE(XPU_ARCH_FLAGS)` in file `torch/csrc/xpu/Module.cpp` on Windows.
This PR fixes this issue by setting `TORCH_XPU_ARCH_LIST` to `""` to avoid an empty string conversion in `C10_STRINGIZE()` when compiling without an AOT.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153604
Approved by: https://github.com/guangyey, https://github.com/EikanWang
Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>
Co-authored-by: Yu, Guangye <106960996+guangyey@users.noreply.github.com>