Zhenbin Lin
41b58a1bec
OpenReg: Fix issue when copying on the same device (#135956)
...
The current copy implementation produces wrong values when src and dst are both openreg tensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135956
Approved by: https://github.com/albanD
2024-09-14 09:57:45 +00:00
Zhenbin Lin
8d68a02905
OpenReg: Split the daemon into driver/executor (#135646)
...
Split the daemon into a proper user-process driver and a device-process executor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135646
Approved by: https://github.com/albanD
2024-09-12 05:03:46 +00:00
FFFrog
27d86f93fe
Remove redundant code (#134955)
...
Remove GetPrivateUse1HooksInterface
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134955
Approved by: https://github.com/Skylion007
2024-09-05 01:11:32 +00:00
albanD
3b33f26513
Add device daemon (#131814)
...
Base implementation aiming towards https://github.com/pytorch/rfcs/pull/64
Details of the implementation and next steps are in https://github.com/pytorch/pytorch/blob/gh/albanD/3/head/test/cpp_extensions/open_registration_extension/README.md
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131814
Approved by: https://github.com/ezyang
2024-08-27 23:32:07 +00:00
cyy
8f7cf796ea
[14/N] Use std::optional (#133417)
...
Follows #132527
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133417
Approved by: https://github.com/ezyang
2024-08-16 00:48:34 +00:00
cyyever
636a7c4859
[13/N] Use std::optional (#132527)
...
Follows #132361
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132527
Approved by: https://github.com/ezyang
2024-08-08 03:16:28 +00:00
Apurva Jain
8bc5ef563e
Grouped Query Attention (#132689)
...
### Approach: Using the current function declaration
**Constraint:** Q_Heads % KV_Heads == 0
**Major change:**
- Added a new boolean argument enable_gqa to the sdpa function call
- It gives meaning to the third-to-last (head) dimension.
Sample use case this would enable:
Llama 3
```
# Llama 3 8B-style call to SDPA (example sizes; D is the head dim)
import torch
from torch.nn.functional import scaled_dot_product_attention

batch, seq_len_q, seq_len_kv, D = 2, 2048, 2048, 128
query = torch.rand(batch, 32, seq_len_q, D)
key = torch.rand(batch, 8, seq_len_kv, D)
value = torch.rand(batch, 8, seq_len_kv, D)
output = scaled_dot_product_attention(query, key, value, is_causal=True, enable_gqa=True)
# Output shape: (batch, 32, seq_len_q, D)
```
### Design Choice:
- Check that Query.size(-3) == Key.size(-3) == Value.size(-3), or that Query.size(-3) % Key.size(-3) == 0
- If the head counts differ, the function expands the key and value tensors to match the query tensor's head count using repeat_interleave, which enables correct and efficient attention computation (see the sketch after this list).
- By default the enable_gqa flag is set to False, which ensures that regular sdpa functionality remains unchanged.
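As a rough illustration of the fallback semantics described above (a sketch only; the helper name is hypothetical, and fused kernels avoid materializing the expanded tensors):
```python
# Sketch: expand KV heads so grouped-query attention reduces to standard
# attention; each KV head is shared by `group` consecutive query heads.
import torch

def expand_kv_for_gqa(key: torch.Tensor, value: torch.Tensor, q_heads: int):
    kv_heads = key.size(-3)
    assert q_heads % kv_heads == 0, "requires Q_Heads % KV_Heads == 0"
    group = q_heads // kv_heads
    key = key.repeat_interleave(group, dim=-3)
    value = value.repeat_interleave(group, dim=-3)
    return key, value
```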
### Benchmarks:
- **sdpa.py: #130634**
Across different batch sizes, enable_gqa=True shows a substantial improvement in sdpa runtime:
| batch_size | q_num_heads | kv_num_heads | q_seq_len | kv_seq_len | embed_dim | forward_time when enable_gqa=True | forward_time when enable_gqa=False |
| ------------ | ------------- | -------------- | ----------- | ------------ | ----------- | ----------- | ---------------- |
| 1 | 32 | 8 | 2048 | 2048 | 2048 | 100.71 | 119.70 |
| 8 | 32 | 8 | 2048 | 2048 | 2048 | 539.78 | 628.83 |
| 16 | 32 | 8 | 2048 | 2048 | 2048 | 1056.81 | 1225.48 |
| 32 | 32 | 8 | 2048 | 2048 | 2048 | 2099.54 | 2440.45 |

- **TorchTitan:** https://github.com/pytorch/torchtitan/pull/458
Differential Revision: D60772086
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132689
Approved by: https://github.com/drisspg
2024-08-07 05:35:36 +00:00
albanD
3d87dfc088
Add basic OpenReg module scaffolding with autograd (#131708)
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131708
Approved by: https://github.com/ezyang
2024-08-05 17:07:11 +00:00
PyTorch MergeBot
bcb4f7c172
Revert "Grouped Query Attention ( #128898 )"
...
This reverts commit 6b28af1b79.
Reverted https://github.com/pytorch/pytorch/pull/128898 on behalf of https://github.com/ZainRizvi due to Sorry, this broke a bunch of tests internally. See D60638265 ([comment](https://github.com/pytorch/pytorch/pull/128898#issuecomment-2265961038))
2024-08-02 18:58:46 +00:00
jainapurva
6b28af1b79
Grouped Query Attention (#128898)
...
### Approach: Using the current function declaration
**Constraint:** Q_Heads % KV_Heads == 0
**Major change:**
- Added a new boolean argument enable_gqa to the sdpa function call
- It gives meaning to the third-to-last (head) dimension.
Sample use case this would enable:
Llama 3
```
# Llama 3 8B-style call to SDPA (example sizes; D is the head dim)
import torch
from torch.nn.functional import scaled_dot_product_attention

batch, seq_len_q, seq_len_kv, D = 2, 2048, 2048, 128
query = torch.rand(batch, 32, seq_len_q, D)
key = torch.rand(batch, 8, seq_len_kv, D)
value = torch.rand(batch, 8, seq_len_kv, D)
output = scaled_dot_product_attention(query, key, value, is_causal=True, enable_gqa=True)
# Output shape: (batch, 32, seq_len_q, D)
```
### Design Choice:
- Check that Query.size(-3) == Key.size(-3) == Value.size(-3), or that Query.size(-3) % Key.size(-3) == 0
- If the head counts differ, the function expands the key and value tensors to match the query tensor's head count using repeat_interleave, which enables correct and efficient attention computation.
- By default the enable_gqa flag is set to False, which ensures that regular sdpa functionality remains unchanged.
### Benchmarks:
- **sdpa.py: #130634**
Across different batch sizes, enable_gqa=True shows a substantial improvement in sdpa runtime:
| batch_size | q_num_heads | kv_num_heads | q_seq_len | kv_seq_len | embed_dim | forward_time when enable_gqa=True | forward_time when enable_gqa=False |
| ------------ | ------------- | -------------- | ----------- | ------------ | ----------- | ----------- | ---------------- |
| 1 | 32 | 8 | 2048 | 2048 | 2048 | 100.71 | 119.70 |
| 8 | 32 | 8 | 2048 | 2048 | 2048 | 539.78 | 628.83 |
| 16 | 32 | 8 | 2048 | 2048 | 2048 | 1056.81 | 1225.48 |
| 32 | 32 | 8 | 2048 | 2048 | 2048 | 2099.54 | 2440.45 |

- **TorchTitan:** https://github.com/pytorch/torchtitan/pull/458
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128898
Approved by: https://github.com/drisspg
2024-07-31 22:58:51 +00:00
Xuehai Pan
548c460bf1
[BE][Easy][7/19] enforce style for empty lines in import segments in test/[a-c]*/ and test/[q-z]*/ (#129758)
...
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by the linter.
You can review these PRs via:
```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129758
Approved by: https://github.com/ezyang
2024-07-31 10:54:03 +00:00
PyTorch MergeBot
499ead96ff
Revert "Grouped Query Attention ( #128898 )"
...
This reverts commit d039b14207.
Reverted https://github.com/pytorch/pytorch/pull/128898 on behalf of https://github.com/albanD due to Broken test on main ([comment](https://github.com/pytorch/pytorch/pull/128898#issuecomment-2258314481))
2024-07-30 13:11:24 +00:00
jainapurva
d039b14207
Grouped Query Attention (#128898)
...
### Approach: Using the current function declaration
**Constraint:** Q_Heads % KV_Heads == 0
**Major change:**
- Added a new boolean argument enable_gqa to the sdpa function call
- It gives meaning to the third-to-last (head) dimension.
Sample use case this would enable:
Llama 3
```
# Llama 3 8B-style call to SDPA (example sizes; D is the head dim)
import torch
from torch.nn.functional import scaled_dot_product_attention

batch, seq_len_q, seq_len_kv, D = 2, 2048, 2048, 128
query = torch.rand(batch, 32, seq_len_q, D)
key = torch.rand(batch, 8, seq_len_kv, D)
value = torch.rand(batch, 8, seq_len_kv, D)
output = scaled_dot_product_attention(query, key, value, is_causal=True, enable_gqa=True)
# Output shape: (batch, 32, seq_len_q, D)
```
### Design Choice:
- Check that Query.size(-3) == Key.size(-3) == Value.size(-3), or that Query.size(-3) % Key.size(-3) == 0
- If the head counts differ, the function expands the key and value tensors to match the query tensor's head count using repeat_interleave, which enables correct and efficient attention computation.
- By default the enable_gqa flag is set to False, which ensures that regular sdpa functionality remains unchanged.
### Benchmarks:
- **sdpa.py: #130634**
Across different batch sizes, enable_gqa=True shows a substantial improvement in sdpa runtime:
| batch_size | q_num_heads | kv_num_heads | q_seq_len | kv_seq_len | embed_dim | forward_time when enable_gqa=True | forward_time when enable_gqa=False |
| ------------ | ------------- | -------------- | ----------- | ------------ | ----------- | ----------- | ---------------- |
| 1 | 32 | 8 | 2048 | 2048 | 2048 | 100.71 | 119.70 |
| 8 | 32 | 8 | 2048 | 2048 | 2048 | 539.78 | 628.83 |
| 16 | 32 | 8 | 2048 | 2048 | 2048 | 1056.81 | 1225.48 |
| 32 | 32 | 8 | 2048 | 2048 | 2048 | 2099.54 | 2440.45 |

- **TorchTitan:** https://github.com/pytorch/torchtitan/pull/458
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128898
Approved by: https://github.com/drisspg
2024-07-29 21:49:06 +00:00
wizzniu
8963623494
Re-implement pin_memory to be device-agnostic by leveraging the Accelerator concept (#126376)
...
This PR re-implements pin memory to get rid of the optional `device` argument and make all related APIs device-agnostic. We add two new abstract APIs in [AcceleratorHooksInterface](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/detail/AcceleratorHooksInterface.h#L12) and redefine pin memory as: "pin memory is always pinned for the current accelerator device". In detail, pin_memory/is_pinned use [getAcceleratorHooksInterface](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/Context.h#L61) to get the appropriate device and invoke the corresponding overridden interfaces, instead of going through BackendSelect and then dispatching to CUDA or another specific backend's implementation.
Note: new backends that want to implement and use pin memory just need to inherit from AcceleratorHooksInterface and override the `isPinnedPtr` and `getPinnedMemoryAllocator` methods.
Additional context: to avoid BC-breaking changes, this PR preserves the `device` arg of the related APIs and emits a deprecation warning when it is passed. A follow-up PR will update all PyTorch callers (`Tensor.is_pinned()`, `Tensor.pin_memory()`, ...) to stop passing this arg; eventually the `device` arg will be removed entirely.
Relates #124908
Relates #14560
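A minimal usage sketch after this change (assuming a build with an accelerator such as CUDA available):
```python
# Pinning no longer needs a device argument: it targets the current accelerator.
import torch

x = torch.randn(1024)
x_pinned = x.pin_memory()    # pinned for the current accelerator device
assert x_pinned.is_pinned()  # likewise queried device-agnostically
```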
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126376
Approved by: https://github.com/albanD
2024-07-23 01:44:15 +00:00
Shan19900305
d57af32e63
Fix undefined tensor error in _copy_from_and_resize when falling back to CPU (#130237)
...
1) Skip undefined tensors in the CPU fallback when calling _copy_from_and_resize;
2) Modify the to_cpu function to support optional tensors;
3) Copy back to the original optional tensor when alias_info isWrite is true.
@ezyang @bdhirsh
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130237
Approved by: https://github.com/ezyang
2024-07-20 23:12:17 +00:00
PyTorch MergeBot
726b9268d2
Revert "Re-implement pin_memory to be device-agnostic by leveraging the Accelerator concept ( #126376 )"
...
This reverts commit c986aeea2d.
Reverted https://github.com/pytorch/pytorch/pull/126376 on behalf of https://github.com/atalman due to Failing internal builds ([comment](https://github.com/pytorch/pytorch/pull/126376#issuecomment-2237496633))
2024-07-18 20:25:20 +00:00
wizzniu
c986aeea2d
Re-implement pin_memory to be device-agnostic by leveraging the Accelerator concept (#126376)
...
This PR re-implements pin memory to get rid of the optional `device` argument and make all related APIs device-agnostic. We add two new abstract APIs in [AcceleratorHooksInterface](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/detail/AcceleratorHooksInterface.h#L12) and redefine pin memory as: "pin memory is always pinned for the current accelerator device". In detail, pin_memory/is_pinned use [getAcceleratorHooksInterface](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/Context.h#L61) to get the appropriate device and invoke the corresponding overridden interfaces, instead of going through BackendSelect and then dispatching to CUDA or another specific backend's implementation.
Note: new backends that want to implement and use pin memory just need to inherit from AcceleratorHooksInterface and override the `isPinnedPtr` and `getPinnedMemoryAllocator` methods.
Additional context: to avoid BC-breaking changes, this PR preserves the `device` arg of the related APIs and emits a deprecation warning when it is passed. A follow-up PR will update all PyTorch callers (`Tensor.is_pinned()`, `Tensor.pin_memory()`, ...) to stop passing this arg; eventually the `device` arg will be removed entirely.
Relates #124908
Relates #14560
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126376
Approved by: https://github.com/albanD
2024-07-18 11:54:14 +00:00
cyy
28f6ae2718
[9/N] Replace c10::optional with std::optional (#130674)
...
Follows #130509
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130674
Approved by: https://github.com/Skylion007
2024-07-15 00:48:43 +00:00
Yuanhao Ji
312652c325
[RFC] Add support for device extension autoloading (#127074)
...
Fixes #122468
- Load device extensions at the end of `torch/__init__.py` (sketched below)
- Enabled by default; disable it with `TORCH_DEVICE_BACKEND_AUTOLOAD=0`
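A rough sketch of the autoload mechanism (the entry-point group name and call sequence are assumptions here, not the verbatim implementation):
```python
# Hypothetical sketch: at the end of torch/__init__.py, discover and initialize
# out-of-tree device backends registered via Python entry points.
import os
from importlib.metadata import entry_points

def _autoload_device_extensions():
    if os.getenv("TORCH_DEVICE_BACKEND_AUTOLOAD", "1") == "0":
        return  # opted out via environment variable
    for ep in entry_points(group="torch.backends"):  # assumed group name
        backend_hook = ep.load()  # import the extension's registered hook
        backend_hook()            # let the extension register its device
```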
run test:
```python
python test/run_test.py -i test_autoload_enable
python test/run_test.py -i test_autoload_disable
```
doc:
https://docs-preview.pytorch.org/pytorch/pytorch/127074/miscellaneous_environment_variables.html
co-author: @jgong5 @bsochack @bkowalskiINTEL @jczaja @FFFrog @hipudding
Co-authored-by: albanD <desmaison.alban@gmail.com>
Co-authored-by: Jiong Gong <jiong.gong@intel.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127074
Approved by: https://github.com/albanD, https://github.com/jgong5
2024-07-09 06:14:13 +00:00
FEI
59e4e92556
Support PrivateUse1 in sdp::SDPBackend::flash_attention (#126392)
...
Fixes https://github.com/pytorch/pytorch/issues/124271
cc @cpuhrsch @drisspg @albanD @soulitzer
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126392
Approved by: https://github.com/drisspg
2024-06-28 17:48:40 +00:00
Shan19900305
7931eee5c5
Support torch.dtype as a parameter in pybind11 cpp extensions (#126865)
...
Support torch.dtype as parameter in pybind11 cpp extension.
Example:
```
cpp_extension.my_ops(self, other, torch.dtype)
```
@ezyang @bdhirsh
Co-authored-by: Edward Z. Yang <ezyang@mit.edu>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126865
Approved by: https://github.com/ezyang
2024-05-29 23:19:32 +00:00
Xuehai Pan
26f4f10ac8
[5/N][Easy] fix typo for usort config in pyproject.toml (kown -> known): sort torch (#127126)
...
The `usort` config in `pyproject.toml` has no effect due to a typo. Fixing the typo makes `usort` do more and generates the changes in this PR. Except for `pyproject.toml`, all changes are generated by `lintrunner -a --take UFMT --all-files`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127126
Approved by: https://github.com/kit1980
2024-05-27 14:49:57 +00:00
PyTorch MergeBot
55c0ab2887
Revert "[5/N][Easy] fix typo for usort config in pyproject.toml (kown -> known): sort torch ( #127126 )"
...
This reverts commit 7763c83af6.
Reverted https://github.com/pytorch/pytorch/pull/127126 on behalf of https://github.com/XuehaiPan due to Broken CI ([comment](https://github.com/pytorch/pytorch/pull/127126#issuecomment-2133044286))
2024-05-27 09:22:08 +00:00
Xuehai Pan
7763c83af6
[5/N][Easy] fix typo for usort config in pyproject.toml (kown -> known): sort torch (#127126)
...
The `usort` config in `pyproject.toml` has no effect due to a typo. Fixing the typo makes `usort` do more and generates the changes in this PR. Except for `pyproject.toml`, all changes are generated by `lintrunner -a --take UFMT --all-files`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127126
Approved by: https://github.com/kit1980
ghstack dependencies: #127122, #127123, #127124, #127125
2024-05-27 04:22:18 +00:00
Richard Barnes
3f5b59eef4
[codemod] c10::optional -> std::optional in caffe2/aten/src/ATen/DeviceGuard.h +117 (#126901)
...
Summary:
Generated with
```
fbgs -f '.*\.(cpp|cxx|cc|h|hpp|cu|cuh)$' c10::optional -l | perl -pe 's/^fbsource.fbcode.//' | grep -v executorch | xargs -n 50 perl -pi -e 's/c10::optional/std::optional/g'
```
(117 files modified.)
Test Plan: Sandcastle
Reviewed By: palmje
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126901
Approved by: https://github.com/Skylion007, https://github.com/eqy
2024-05-24 00:26:15 +00:00
Richard Barnes
ed327876f5
[codemod] c10::optional -> std::optional (#126135)
...
Generated by running the following from PyTorch root:
```
find . -regex ".*\.\(cpp\|h\|cu\|hpp\|cc\|cxx\)$" | grep -v "build/" | xargs -n 50 -P 4 perl -pi -e 's/c10::optional/std::optional/'
```
`c10::optional` is just an alias for `std::optional`. This removes usages of that alias in preparation for eliminating it entirely.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126135
Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/albanD, https://github.com/aaronenyeshi
2024-05-14 19:35:51 +00:00
Yu, Guangye
31372fa842
Support generic stream/event on CUDA/HIP backend (#125757)
...
# Motivation
According to [#123611](https://github.com/pytorch/pytorch/pull/123611), we support generic stream/event on the CUDA backend.
# Additional Context
New methods/attributes on `torch.Event` for CUDA (usage sketched after these lists):
- torch.Event.event_id
- torch.Event.elapsed_time
- torch.Event.synchronize
New methods on `c10::Event` for the CUDA backend:
- c10::Event::event_id
- c10::Event::elapsed_time
- c10::Event::synchronize
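A minimal usage sketch on a CUDA build (treat the exact constructor arguments as assumptions):
```python
# Time a region of GPU work with the generic torch.Event API.
import torch

start = torch.Event(device="cuda", enable_timing=True)
end = torch.Event(device="cuda", enable_timing=True)
stream = torch.cuda.current_stream()
start.record(stream)
torch.randn(4096, 4096, device="cuda") @ torch.randn(4096, 4096, device="cuda")
end.record(stream)
end.synchronize()
print(start.elapsed_time(end))  # elapsed milliseconds; see also start.event_id
```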
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125757
Approved by: https://github.com/albanD, https://github.com/jgong5, https://github.com/EikanWang
2024-05-10 13:34:09 +00:00
egienvalue
8461e7ed9e
Add test_cpp_extensions tests for stream_and_event and mita_backend (#123614)
...
Test the generic torch.Stream/Event with a fake device guard and hooks. Since the fake device backend is mutually exclusive with other backends, tests are skipped if TEST_CUDA or TEST_ROCM is true.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123614
Approved by: https://github.com/albanD
ghstack dependencies: #123611, #123612
2024-04-26 16:17:54 +00:00
Shan19900305
8d12ba9acf
Add methods for open device in the PackedSequence module (#124923)
...
1) Add is_{custom_device_name}() and {custom_device_name}() methods for open device registration;
2) Fix failing open-device test cases.
@ezyang @bdhirsh
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124923
Approved by: https://github.com/ezyang
2024-04-26 15:26:20 +00:00
PyTorch MergeBot
4a1299cc0e
Revert "Add test_cpp_extensions tests for stream_and_event and mita_backend ( #123614 )"
...
This reverts commit 355dc34f86.
Reverted https://github.com/pytorch/pytorch/pull/123614 on behalf of https://github.com/jeffdaily due to this PR broke ROCm with message RuntimeError: Cannot have MTIA with other devices ([comment](https://github.com/pytorch/pytorch/pull/123612#issuecomment-2077649762))
2024-04-25 16:06:46 +00:00
egienvalue
355dc34f86
Add test_cpp_extensions tests for stream_and_event and mita_backend (#123614)
...
Test the generic torch.Stream/Event with a fake device guard and hooks.
Differential Revision: [D56443358](https://our.internmc.facebook.com/intern/diff/D56443358 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123614
Approved by: https://github.com/albanD
ghstack dependencies: #123611, #123612
2024-04-24 20:51:20 +00:00
Ashwin Hari
5f5778476a
rename ort to maia (#123265)
...
Fixes #123264
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123265
Approved by: https://github.com/albanD
2024-04-23 00:33:25 +00:00
PyTorch MergeBot
52da03edeb
Revert "Add test_cpp_extensions tests for stream_and_event and mita_backend ( #123614 )"
...
This reverts commit b6f0159db0.
Reverted https://github.com/pytorch/pytorch/pull/123614 on behalf of https://github.com/jeffdaily due to This broke ROCm. See test_overrides.py ([comment](https://github.com/pytorch/pytorch/pull/123611#issuecomment-2067363780))
2024-04-19 22:44:26 +00:00
egienvalue
b6f0159db0
Add test_cpp_extensions tests for stream_and_event and mita_backend (#123614)
...
Test the generic torch.Stream/Event with a fake device guard and hooks.
@exported-using-ghexport
Differential Revision: [D55902506](https://our.internmc.facebook.com/intern/diff/D55902506/ )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123614
Approved by: https://github.com/albanD
ghstack dependencies: #123611, #123612
2024-04-18 17:40:13 +00:00
Yuanhao Ji
c797fbc4e1
Enable UFMT on test/cpp_api_parity, test/cpp_extensions, test/create_dummy_torchscript_model.py, test/custom_backend, test/custom_operator (#123518)
...
Partially addresses #123062
Ran lintrunner on:
- `test/cpp_api_parity`
- `test/cpp_extensions`
- `test/create_dummy_torchscript_model.py`
- `test/custom_backend`
- `test/custom_operator`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123518
Approved by: https://github.com/huydhn
2024-04-08 20:18:42 +00:00
chentianyi16
83ad8e01b1
Fix crash in cpu_fallback for aten::triu_indices on custom devices (#121306)
...
Fixes #121289
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121306
Approved by: https://github.com/ezyang
2024-03-26 01:29:45 +00:00
PyTorch MergeBot
db506762d1
Revert "Change ATEN generator argument type to const std::optional<Generator>& ( #120076 )"
...
This reverts commit a52b4e2257.
Reverted https://github.com/pytorch/pytorch/pull/120076 on behalf of https://github.com/atalman due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/120076#issuecomment-2018680656))
2024-03-25 18:52:05 +00:00
cyy
a52b4e2257
Change ATEN generator argument type to const std::optional<Generator>& (#120076)
...
This PR proposes to use const std::optional<Generator>& for the underlying functions to avoid unnecessary copy and move operations. The torchgen code was changed to generate the new type.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120076
Approved by: https://github.com/malfet
2024-03-24 02:12:08 +00:00
PyTorch MergeBot
02fee6caec
Revert "Change ATEN generator argument type to const std::optional<Generator>& ( #120076 )"
...
This reverts commit ecbe82b9ce.
Reverted https://github.com/pytorch/pytorch/pull/120076 on behalf of https://github.com/jeanschmidt due to Reverting in order to check if this will fix XLA trunk jobs ([comment](https://github.com/pytorch/pytorch/pull/120076#issuecomment-2015272644))
2024-03-22 14:53:45 +00:00
cyy
ecbe82b9ce
Change ATEN generator argument type to const std::optional<Generator>& (#120076)
...
This PR proposes to use const std::optional<Generator>& for the underlying functions to avoid unnecessary copy and move operations. The torchgen code was changed to generate the new type.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120076
Approved by: https://github.com/malfet
2024-03-22 03:49:31 +00:00
Shan19900305
6662627c89
Add APIs for custom device using TensorIteratorBase (#120792)
...
1) Add operand and get_dim_names APIs;
2) Set will_resize to true when the output tensor is undefined;
3) Add abs_stub for the dummy device, computing on the CPU;
4) Support dummy-device copy with strides.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120792
Approved by: https://github.com/ezyang
2024-03-20 03:51:09 +00:00
PyTorch MergeBot
c0996866f4
Revert "Change ATEN generator argument type to const std::optional<Generator>& ( #120076 )"
...
This reverts commit 4305c64fea.
Reverted https://github.com/pytorch/pytorch/pull/120076 on behalf of https://github.com/izaitsevfb due to breaking internal builds (take 3) ([comment](https://github.com/pytorch/pytorch/pull/120076#issuecomment-1986338164))
2024-03-08 20:01:03 +00:00
cyy
4305c64fea
Change ATEN generator argument type to const std::optional<Generator>& (#120076)
...
This PR proposes to use const std::optional<Generator>& for the underlying functions to avoid unnecessary copy and move operations. The torchgen code was changed to generate the new type.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120076
Approved by: https://github.com/malfet
2024-03-07 09:52:21 +00:00
Chen_Liqing
291ce86a6c
Modify StorageImplCreateHelper (#118459)
...
I want to use tensor.untyped_storage()[a:b] with a ``PrivateUse1`` backend, but it fails. The code goes into ``THPStorage_get``:
bb6eba189f/torch/csrc/Storage.cpp (L525-L540)
Here ``torch`` creates a new ``c10::StorageImpl`` without considering the ``PrivateUse1`` backend.
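A hypothetical repro (the device name "openreg" is a placeholder for whichever PrivateUse1 backend is registered):
```python
# Byte-range slicing of an untyped storage on a PrivateUse1 device; before this
# fix, the resulting StorageImpl was created without consulting the backend.
import torch

t = torch.arange(10, device="openreg")  # assumes a PrivateUse1 backend is loaded
s = t.untyped_storage()[8:24]           # slice of the underlying byte storage
```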
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118459
Approved by: https://github.com/albanD
2024-03-07 06:26:55 +00:00
cyy
507611f9ae
[CUDACachingAllocator] Turn Allocator::allocate into non-const (#120969)
...
Ideally, the method should be non-const since it changes the allocator state. Some const_casts are also removed along the way.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120969
Approved by: https://github.com/albanD
2024-03-05 09:53:05 +00:00
Shan19900305
6c3600d008
Enable optional tensorList fallback to CPU (#119273)
...
Add optional tensorList fallback to CPU.
Add test cases; the old PR is https://github.com/pytorch/pytorch/pull/106449
@bdhirsh
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119273
Approved by: https://github.com/bdhirsh
2024-02-07 03:54:13 +00:00
Edward Yang
b4a35632f9
Add function to materialize COW storages (#117053)
...
Summary: From Kurt Mohler, see https://github.com/pytorch/pytorch/pull/113396 (manually imported due to ghimport problems)
Test Plan: sandcastle, OSS CI
Differential Revision: D52610522
Pull Request resolved: https://github.com/pytorch/pytorch/pull/117053
Approved by: https://github.com/malfet, https://github.com/kurtamohler
2024-01-10 15:34:16 +00:00
PyTorch MergeBot
f36d09fcb7
Revert "Add function to materialize COW storages ( #113396 )"
...
This reverts commit e2f090086b.
Reverted https://github.com/pytorch/pytorch/pull/113396 on behalf of https://github.com/DanilBaibak due to Break internal build ([comment](https://github.com/pytorch/pytorch/pull/113396#issuecomment-1818769090))
2023-11-20 10:26:01 +00:00
Kurt Mohler
e2f090086b
Add function to materialize COW storages (#113396)
...
Part of #109833
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113396
Approved by: https://github.com/ezyang
2023-11-17 01:58:51 +00:00
feifan
c73da67d46
new_qtensor support privateuseone allocator (#111464)
...
I want to create a quantized tensor through `PerTensorAffineQuantizer`, but it throws an error because of a missing check for PrivateUse1.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111464
Approved by: https://github.com/ezyang
2023-11-01 05:16:58 +00:00