Commit Graph

248 Commits

Author SHA1 Message Date
Zhenbin Lin
41b58a1bec OpenReg: Fix issue when copying on the same device (#135956)
Current copy gets wrong value when src and dst are both openreg.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135956
Approved by: https://github.com/albanD
2024-09-14 09:57:45 +00:00
Zhenbin Lin
8d68a02905 OpenReg: Split the daemon into drvier/executor (#135646)
Split the daemon into a proper user-process driver vs device-process executor.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135646
Approved by: https://github.com/albanD
2024-09-12 05:03:46 +00:00
FFFrog
27d86f93fe Remove redundant code (#134955)
Remove GetPrivateUse1HooksInterface
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134955
Approved by: https://github.com/Skylion007
2024-09-05 01:11:32 +00:00
albanD
3b33f26513 Add device daemon (#131814)
Base implementation aiming towards https://github.com/pytorch/rfcs/pull/64

Details of the implementation and next steps in https://github.com/pytorch/pytorch/blob/gh/albanD/3/head/test/cpp_extensions/open_registration_extension/README.md

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131814
Approved by: https://github.com/ezyang
2024-08-27 23:32:07 +00:00
cyy
8f7cf796ea [14/N] Use std::optional (#133417)
Follows #132527
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133417
Approved by: https://github.com/ezyang
2024-08-16 00:48:34 +00:00
cyyever
636a7c4859 [13/N] Use std::optional (#132527)
Follows #132361

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132527
Approved by: https://github.com/ezyang
2024-08-08 03:16:28 +00:00
Apurva Jain
8bc5ef563e Grouped Query Attention (#132689)
### Approach: Using the current function declaration

**Constraint:** Q_Heads % KV_Heads == 0

**Major change:**
- Added a new argument enable_gqa: bool to sdpa function call
- It adds a meaning to the last third dimension.

Sample use cases this would enable:
LLama3

```
# LLama3 8b call to SDPA
query = torch.rand(batch, 32, seq_len_q, D)
key = torch.rand(batch, 8, seq_len_kv, D)
value = torch.rand(batch, 8, seq_len_kv, D)

output = scaled_dot_product_attention(query, key, value, is_causal=True, enable_gqa=True)

# Output Shape
(batch, 32, seq_len_q, D)
```

### Design Choice:

- Check if Query.size(-3) == Key.size(-3) == Value.size(-3) or, Query.size(-3) % Key.size(-3) == 0
- The function adjusts the key and value tensors to match the query tensor's head dimension by using repeat_interleave if their number of heads are not equal, facilitating correct and efficient computation in attention mechanisms.
- By default the enable_gqa flag is set to False, which ensures that regular sdpa functionality remains unchanged.

### Benchmarks:

- **sdpa.py: #130634**
For different batch sizes enable_gqa=True shows a substansial improvement in the run_time of sdpa

 | batch_size | q_num_heads | kv_num_heads | q_seq_len | kv_seq_len | embed_dim | forward_time when enable_gqa=True   |   forward_time when enable_gqa=False    |
| ------------ | ------------- | -------------- | ----------- | ------------ | ----------- | ----------- | ---------------- |
|     1      |     32      |      8       |   2048    |    2048    |   2048    |   100.71  |  119.70  |
|     8      |     32      |      8       |   2048    |    2048    |   2048    |   539.78  |  628.83  |
|     16     |     32      |      8       |   2048    |    2048    |   2048    |   1056.81  |  1225.48  |
|     32      |     32      |      8       |   2048    |    2048    |   2048    |   2099.54  |  2440.45  |

![Screenshot 2024-07-25 at 9 07 40 PM](https://github.com/user-attachments/assets/a3e5f716-c39f-4096-9e6c-82a735e57b7b)

- **TorchTitan: https://github.com/pytorch/torchtitan/pull/458**

Differential Revision: D60772086

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132689
Approved by: https://github.com/drisspg
2024-08-07 05:35:36 +00:00
albanD
3d87dfc088 Add basic OpenReg module scaffolding with autograd (#131708)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131708
Approved by: https://github.com/ezyang
2024-08-05 17:07:11 +00:00
PyTorch MergeBot
bcb4f7c172 Revert "Grouped Query Attention (#128898)"
This reverts commit 6b28af1b79.

Reverted https://github.com/pytorch/pytorch/pull/128898 on behalf of https://github.com/ZainRizvi due to Sorry, this broke a bunch of tests internally. See D60638265 ([comment](https://github.com/pytorch/pytorch/pull/128898#issuecomment-2265961038))
2024-08-02 18:58:46 +00:00
jainapurva
6b28af1b79 Grouped Query Attention (#128898)
### Approach: Using the current function declaration

**Constraint:** Q_Heads % KV_Heads == 0

**Major change:**
- Added a new argument enable_gqa: bool to sdpa function call
- It adds a meaning to the last third dimension.

Sample use cases this would enable:
LLama3

```
# LLama3 8b call to SDPA
query = torch.rand(batch, 32, seq_len_q, D)
key = torch.rand(batch, 8, seq_len_kv, D)
value = torch.rand(batch, 8, seq_len_kv, D)

output = scaled_dot_product_attention(query, key, value, is_causal=True, enable_gqa=True)

# Output Shape
(batch, 32, seq_len_q, D)
```

### Design Choice:

- Check if Query.size(-3) == Key.size(-3) == Value.size(-3) or, Query.size(-3) % Key.size(-3) == 0
- The function adjusts the key and value tensors to match the query tensor's head dimension by using repeat_interleave if their number of heads are not equal, facilitating correct and efficient computation in attention mechanisms.
- By default the enable_gqa flag is set to False, which ensures that regular sdpa functionality remains unchanged.

### Benchmarks:

- **sdpa.py: #130634**
For different batch sizes enable_gqa=True shows a substansial improvement in the run_time of sdpa

 | batch_size | q_num_heads | kv_num_heads | q_seq_len | kv_seq_len | embed_dim | forward_time when enable_gqa=True   |   forward_time when enable_gqa=False    |
| ------------ | ------------- | -------------- | ----------- | ------------ | ----------- | ----------- | ---------------- |
|     1      |     32      |      8       |   2048    |    2048    |   2048    |   100.71  |  119.70  |
|     8      |     32      |      8       |   2048    |    2048    |   2048    |   539.78  |  628.83  |
|     16     |     32      |      8       |   2048    |    2048    |   2048    |   1056.81  |  1225.48  |
|     32      |     32      |      8       |   2048    |    2048    |   2048    |   2099.54  |  2440.45  |

![Screenshot 2024-07-25 at 9 07 40 PM](https://github.com/user-attachments/assets/a3e5f716-c39f-4096-9e6c-82a735e57b7b)

- **TorchTitan: https://github.com/pytorch/torchtitan/pull/458**

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128898
Approved by: https://github.com/drisspg
2024-07-31 22:58:51 +00:00
Xuehai Pan
548c460bf1 [BE][Easy][7/19] enforce style for empty lines in import segments in test/[a-c]*/ and test/[q-z]*/ (#129758)
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter.

You can review these PRs via:

```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129758
Approved by: https://github.com/ezyang
2024-07-31 10:54:03 +00:00
PyTorch MergeBot
499ead96ff Revert "Grouped Query Attention (#128898)"
This reverts commit d039b14207.

Reverted https://github.com/pytorch/pytorch/pull/128898 on behalf of https://github.com/albanD due to Broken test on main ([comment](https://github.com/pytorch/pytorch/pull/128898#issuecomment-2258314481))
2024-07-30 13:11:24 +00:00
jainapurva
d039b14207 Grouped Query Attention (#128898)
### Approach: Using the current function declaration

**Constraint:** Q_Heads % KV_Heads == 0

**Major change:**
- Added a new argument enable_gqa: bool to sdpa function call
- It adds a meaning to the last third dimension.

Sample use cases this would enable:
LLama3

```
# LLama3 8b call to SDPA
query = torch.rand(batch, 32, seq_len_q, D)
key = torch.rand(batch, 8, seq_len_kv, D)
value = torch.rand(batch, 8, seq_len_kv, D)

output = scaled_dot_product_attention(query, key, value, is_causal=True, enable_gqa=True)

# Output Shape
(batch, 32, seq_len_q, D)
```

### Design Choice:

- Check if Query.size(-3) == Key.size(-3) == Value.size(-3) or, Query.size(-3) % Key.size(-3) == 0
- The function adjusts the key and value tensors to match the query tensor's head dimension by using repeat_interleave if their number of heads are not equal, facilitating correct and efficient computation in attention mechanisms.
- By default the enable_gqa flag is set to False, which ensures that regular sdpa functionality remains unchanged.

### Benchmarks:

- **sdpa.py: #130634**
For different batch sizes enable_gqa=True shows a substansial improvement in the run_time of sdpa

 | batch_size | q_num_heads | kv_num_heads | q_seq_len | kv_seq_len | embed_dim | forward_time when enable_gqa=True   |   forward_time when enable_gqa=False    |
| ------------ | ------------- | -------------- | ----------- | ------------ | ----------- | ----------- | ---------------- |
|     1      |     32      |      8       |   2048    |    2048    |   2048    |   100.71  |  119.70  |
|     8      |     32      |      8       |   2048    |    2048    |   2048    |   539.78  |  628.83  |
|     16     |     32      |      8       |   2048    |    2048    |   2048    |   1056.81  |  1225.48  |
|     32      |     32      |      8       |   2048    |    2048    |   2048    |   2099.54  |  2440.45  |

![Screenshot 2024-07-25 at 9 07 40 PM](https://github.com/user-attachments/assets/a3e5f716-c39f-4096-9e6c-82a735e57b7b)

- **TorchTitan: https://github.com/pytorch/torchtitan/pull/458**

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128898
Approved by: https://github.com/drisspg
2024-07-29 21:49:06 +00:00
wizzniu
8963623494 Re-implement pin_memory to be device-agnostic by leveraging the Accelerator concept (#126376)
This PR re-implements pin memory aiming to get rid of the optional `device` argument and makes all related APIs to be device-agnostic. We add two new abstract APIs in [AcceleratorHooksInterface](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/detail/AcceleratorHooksInterface.h#L12) and redefine pin memory as: "Pin memory is always pinned for the current accelerator device". In detail, it uses [getAcceleratorHooksInterface](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/Context.h#L61) in pin_memory/is_pinned to get an appropriate device and invoke the corresponding overridden interfaces, instead of using BackendSelect and then dispatching to CUDA or other specific backends' implement methods.

Note: For new backends who want to implement and use pin memory, just inherit AcceleratorHooksInterface and overwrite the `isPinnedPtr` and `getPinnedMemoryAllocator` methods.

Additional context: To avoid BC-breaking, this PR just preserves the `device` arg of related APIs and would throw a deprecation warning if `device` arg is passed. Another PR will be submitted to update all PT callers (`Tensor.is_pinned()`, `Tensor.pin_memory()`...) not to pass this arg based on this PR. In future, `device` arg will be actually removed.

Relates #124908
Relates #14560
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126376
Approved by: https://github.com/albanD
2024-07-23 01:44:15 +00:00
Shan19900305
d57af32e63 Fix undefined tensor error in _copy_from_and_resize when fallback to cpu. (#130237)
1) Add skip undefined tensor in cpu fallback when call _copy_from_and_resize;
2) Modify to_cpu function support optional tensor;
3) Add copy back to origin optional tensor when alias_info isWrite is true.

@ezyang @bdhirsh

Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130237
Approved by: https://github.com/ezyang
2024-07-20 23:12:17 +00:00
PyTorch MergeBot
726b9268d2 Revert "Re-implement pin_memory to be device-agnostic by leveraging the Accelerator concept (#126376)"
This reverts commit c986aeea2d.

Reverted https://github.com/pytorch/pytorch/pull/126376 on behalf of https://github.com/atalman due to Failing internal builds ([comment](https://github.com/pytorch/pytorch/pull/126376#issuecomment-2237496633))
2024-07-18 20:25:20 +00:00
wizzniu
c986aeea2d Re-implement pin_memory to be device-agnostic by leveraging the Accelerator concept (#126376)
This PR re-implements pin memory aiming to get rid of the optional `device` argument and makes all related APIs to be device-agnostic. We add two new abstract APIs in [AcceleratorHooksInterface](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/detail/AcceleratorHooksInterface.h#L12) and redefine pin memory as: "Pin memory is always pinned for the current accelerator device". In detail, it uses [getAcceleratorHooksInterface](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/Context.h#L61) in pin_memory/is_pinned to get an appropriate device and invoke the corresponding overridden interfaces, instead of using BackendSelect and then dispatching to CUDA or other specific backends' implement methods.

Note: For new backends who want to implement and use pin memory, just inherit AcceleratorHooksInterface and overwrite the `isPinnedPtr` and `getPinnedMemoryAllocator` methods.

Additional context: To avoid BC-breaking, this PR just preserves the `device` arg of related APIs and would throw a deprecation warning if `device` arg is passed. Another PR will be submitted to update all PT callers (`Tensor.is_pinned()`, `Tensor.pin_memory()`...) not to pass this arg based on this PR. In future, `device` arg will be actually removed.

Relates #124908
Relates #14560
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126376
Approved by: https://github.com/albanD
2024-07-18 11:54:14 +00:00
cyy
28f6ae2718 [9/N] Replace c10::optional with std::optional (#130674)
Follows  #130509

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130674
Approved by: https://github.com/Skylion007
2024-07-15 00:48:43 +00:00
Yuanhao Ji
312652c325 [RFC] Add support for device extension autoloading (#127074)
Fixes #122468

- Load device extensions at the end of `torch/__init__.py`
- Enabled by default, or you can disable it with `TORCH_DEVICE_BACKEND_AUTOLOAD=0`

run test:

```python
python test/run_test.py -i test_autoload_enable
python test/run_test.py -i test_autoload_disable
```

doc:

https://docs-preview.pytorch.org/pytorch/pytorch/127074/miscellaneous_environment_variables.html

co-author:  @jgong5 @bsochack @bkowalskiINTEL @jczaja @FFFrog @hipudding

Co-authored-by: albanD <desmaison.alban@gmail.com>
Co-authored-by: Jiong Gong <jiong.gong@intel.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127074
Approved by: https://github.com/albanD, https://github.com/jgong5
2024-07-09 06:14:13 +00:00
FEI
59e4e92556 sdp::SDPBackend::flash_attention support PrivateUse1 (#126392)
Fixes https://github.com/pytorch/pytorch/issues/124271

cc  @cpuhrsch @drisspg @albanD @soulitzer

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126392
Approved by: https://github.com/drisspg
2024-06-28 17:48:40 +00:00
Shan19900305
7931eee5c5 Support torch.dtype as parameter in pybind11 cpp extension. (#126865)
Support torch.dtype as parameter in pybind11 cpp extension.
Example:
`
cpp_extension.my_ops(self, other, torch.dtype)
`

@ezyang @bdhirsh
Co-authored-by: Edward Z. Yang <ezyang@mit.edu>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/126865
Approved by: https://github.com/ezyang
2024-05-29 23:19:32 +00:00
Xuehai Pan
26f4f10ac8 [5/N][Easy] fix typo for usort config in pyproject.toml (kown -> known): sort torch (#127126)
The `usort` config in `pyproject.toml` has no effect due to a typo. Fixing the typo make `usort` do more and generate the changes in the PR. Except `pyproject.toml`, all changes are generated by `lintrunner -a --take UFMT --all-files`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127126
Approved by: https://github.com/kit1980
2024-05-27 14:49:57 +00:00
PyTorch MergeBot
55c0ab2887 Revert "[5/N][Easy] fix typo for usort config in pyproject.toml (kown -> known): sort torch (#127126)"
This reverts commit 7763c83af6.

Reverted https://github.com/pytorch/pytorch/pull/127126 on behalf of https://github.com/XuehaiPan due to Broken CI ([comment](https://github.com/pytorch/pytorch/pull/127126#issuecomment-2133044286))
2024-05-27 09:22:08 +00:00
Xuehai Pan
7763c83af6 [5/N][Easy] fix typo for usort config in pyproject.toml (kown -> known): sort torch (#127126)
The `usort` config in `pyproject.toml` has no effect due to a typo. Fixing the typo make `usort` do more and generate the changes in the PR. Except `pyproject.toml`, all changes are generated by `lintrunner -a --take UFMT --all-files`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127126
Approved by: https://github.com/kit1980
ghstack dependencies: #127122, #127123, #127124, #127125
2024-05-27 04:22:18 +00:00
Richard Barnes
3f5b59eef4 [codemod] c10::optional -> std::optional in caffe2/aten/src/ATen/DeviceGuard.h +117 (#126901)
Summary:
Generated with
```
fbgs -f '.*\.(cpp|cxx|cc|h|hpp|cu|cuh)$' c10::optional -l | perl -pe 's/^fbsource.fbcode.//' | grep -v executorch | xargs -n 50 perl -pi -e 's/c10::optional/std::optional/g'
```

 - If you approve of this diff, please use the "Accept & Ship" button :-)

(117 files modified.)

Test Plan: Sandcastle

Reviewed By: palmje

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126901
Approved by: https://github.com/Skylion007, https://github.com/eqy
2024-05-24 00:26:15 +00:00
Richard Barnes
ed327876f5 [codemod] c10:optional -> std::optional (#126135)
Generated by running the following from PyTorch root:
```
find . -regex ".*\.\(cpp\|h\|cu\|hpp\|cc\|cxx\)$" | grep -v "build/" | xargs -n 50 -P 4 perl -pi -e 's/c10::optional/std::optional/'
```

`c10::optional` is just an alias for `std::optional`. This removes usages of that alias in preparation for eliminating it entirely.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126135
Approved by: https://github.com/Skylion007, https://github.com/malfet, https://github.com/albanD, https://github.com/aaronenyeshi
2024-05-14 19:35:51 +00:00
Yu, Guangye
31372fa842 Support generic stream/event on CUDA/HIP backend (#125757)
# Motivation
According to [#123611](https://github.com/pytorch/pytorch/pull/123611), we support generic stream/event on CUDA backend.

# Additional Context
new method/attribute on `torch.Event` for cuda
- torch.Event.event_id
- torch.Event.elapsed_time
- torch.Event.synchronize

new method on `c10::Event` on cuda backend
- c10.Event.event_id
- c10.Event.elapsed_time
- c10.Event.synchronize

Pull Request resolved: https://github.com/pytorch/pytorch/pull/125757
Approved by: https://github.com/albanD, https://github.com/jgong5, https://github.com/EikanWang
2024-05-10 13:34:09 +00:00
egienvalue
8461e7ed9e Add test_cpp_extensions tests for stream_and_event and mita_backend (#123614)
Test the generic torch.Stream/Event with fake device gurad and hooks. Since we added a fake device backend, it is mutual exclusive to other backends. Tests will be skipped if TEST_CUDA or TEST_ROCM is true.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123614
Approved by: https://github.com/albanD
ghstack dependencies: #123611, #123612
2024-04-26 16:17:54 +00:00
Shan19900305
8d12ba9acf add methods for open device in PackedSequence module. (#124923)
1) add is_{custom_device_name}() and {custom_device_name}() for open device register;
2) fix open device failed testcases.

@ezyang  @bdhirsh
Pull Request resolved: https://github.com/pytorch/pytorch/pull/124923
Approved by: https://github.com/ezyang
2024-04-26 15:26:20 +00:00
PyTorch MergeBot
4a1299cc0e Revert "Add test_cpp_extensions tests for stream_and_event and mita_backend (#123614)"
This reverts commit 355dc34f86.

Reverted https://github.com/pytorch/pytorch/pull/123614 on behalf of https://github.com/jeffdaily due to this PR broke ROCm with message RuntimeError: Cannot have MTIA with other devices ([comment](https://github.com/pytorch/pytorch/pull/123612#issuecomment-2077649762))
2024-04-25 16:06:46 +00:00
egienvalue
355dc34f86 Add test_cpp_extensions tests for stream_and_event and mita_backend (#123614)
Test the generic torch.Stream/Event with fake device gurad and hooks.

Differential Revision: [D56443358](https://our.internmc.facebook.com/intern/diff/D56443358)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123614
Approved by: https://github.com/albanD
ghstack dependencies: #123611, #123612
2024-04-24 20:51:20 +00:00
Ashwin Hari
5f5778476a rename ort to maia (#123265)
Fixes #123264

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123265
Approved by: https://github.com/albanD
2024-04-23 00:33:25 +00:00
PyTorch MergeBot
52da03edeb Revert "Add test_cpp_extensions tests for stream_and_event and mita_backend (#123614)"
This reverts commit b6f0159db0.

Reverted https://github.com/pytorch/pytorch/pull/123614 on behalf of https://github.com/jeffdaily due to This broke ROCm. see test_overrides.py ([comment](https://github.com/pytorch/pytorch/pull/123611#issuecomment-2067363780))
2024-04-19 22:44:26 +00:00
egienvalue
b6f0159db0 Add test_cpp_extensions tests for stream_and_event and mita_backend (#123614)
Test the generic torch.Stream/Event with fake device gurad and hooks.
@exported-using-ghexport

Differential Revision: [D55902506](https://our.internmc.facebook.com/intern/diff/D55902506/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123614
Approved by: https://github.com/albanD
ghstack dependencies: #123611, #123612
2024-04-18 17:40:13 +00:00
Yuanhao Ji
c797fbc4e1 Enable UFMT on test/cpp_api_parity, test/cpp_extensions, test/create_dummy_torchscript_model.py, test/custom_backend, test/custom_operator (#123518)
Partially addresses #123062

Ran lintrunner on:

- `test/cpp_api_parity`
- `test/cpp_extensions`
- `test/create_dummy_torchscript_model.py`
- `test/custom_backend`
- `test/custom_operator`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/123518
Approved by: https://github.com/huydhn
2024-04-08 20:18:42 +00:00
chentianyi16
83ad8e01b1 fix the problem that cpu_fallback for aten::triu_indices on custom device crashed (#121306)
Fixes #121289

Pull Request resolved: https://github.com/pytorch/pytorch/pull/121306
Approved by: https://github.com/ezyang
2024-03-26 01:29:45 +00:00
PyTorch MergeBot
db506762d1 Revert "Change ATEN generator argument type to const std::optional<Generator>& (#120076)"
This reverts commit a52b4e2257.

Reverted https://github.com/pytorch/pytorch/pull/120076 on behalf of https://github.com/atalman due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/120076#issuecomment-2018680656))
2024-03-25 18:52:05 +00:00
cyy
a52b4e2257 Change ATEN generator argument type to const std::optional<Generator>& (#120076)
This PR proposes to use std::optional<Generator>& for underlying functions to avoid unnecessary copy and move operations. The torchgen code was changed to generate the new type.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120076
Approved by: https://github.com/malfet
2024-03-24 02:12:08 +00:00
PyTorch MergeBot
02fee6caec Revert "Change ATEN generator argument type to const std::optional<Generator>& (#120076)"
This reverts commit ecbe82b9ce.

Reverted https://github.com/pytorch/pytorch/pull/120076 on behalf of https://github.com/jeanschmidt due to Reverting in order to check if this will fix XLA trunk jobs ([comment](https://github.com/pytorch/pytorch/pull/120076#issuecomment-2015272644))
2024-03-22 14:53:45 +00:00
cyy
ecbe82b9ce Change ATEN generator argument type to const std::optional<Generator>& (#120076)
This PR proposes to use std::optional<Generator>& for underlying functions to avoid unnecessary copy and move operations. The torchgen code was changed to generate the new type.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120076
Approved by: https://github.com/malfet
2024-03-22 03:49:31 +00:00
Shan19900305
6662627c89 Add APIs for custom device using TensorIteratorBase. (#120792)
1) add operand and get_dim_names API;
2) set will_resize to true when output tensor is undefined;
3) add abs_stub for dummy device and calculate on cpu device;
4) support dummy device copy with stride;
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120792
Approved by: https://github.com/ezyang
2024-03-20 03:51:09 +00:00
PyTorch MergeBot
c0996866f4 Revert "Change ATEN generator argument type to const std::optional<Generator>& (#120076)"
This reverts commit 4305c64fea.

Reverted https://github.com/pytorch/pytorch/pull/120076 on behalf of https://github.com/izaitsevfb due to breaking internal builds(take 3) ([comment](https://github.com/pytorch/pytorch/pull/120076#issuecomment-1986338164))
2024-03-08 20:01:03 +00:00
cyy
4305c64fea Change ATEN generator argument type to const std::optional<Generator>& (#120076)
This PR proposes to use std::optional<Generator>& for underlying functions to avoid unnecessary copy and move operations. The torchgen code was changed to generate the new type.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120076
Approved by: https://github.com/malfet
2024-03-07 09:52:21 +00:00
Chen_Liqing
291ce86a6c Modify StorageImplCreateHelper (#118459)
I want to use tensor.untyped_storage()[a:b] for ``PrivateUse1`` backend but fail. The code will go into ``THPStorage_get``:
bb6eba189f/torch/csrc/Storage.cpp (L525-L540)

Here ``torch`` will create a new ``c10::StorageImpl`` but not consider about ``PrivateUse1`` backend.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/118459
Approved by: https://github.com/albanD
2024-03-07 06:26:55 +00:00
cyy
507611f9ae [CUDACachingAllocator] Turn Allocator::allocate into non-const (#120969)
Ideally, the method should be non-const since it changes the allocator state. Some const_casts are also removed in the way.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/120969
Approved by: https://github.com/albanD
2024-03-05 09:53:05 +00:00
Shan19900305
6c3600d008 Enable optional tensorList fallback to cpu. (#119273)
add optional tensorList fallback to cpu.
Add testcases and old pr is: https://github.com/pytorch/pytorch/pull/106449

@bdhirsh
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119273
Approved by: https://github.com/bdhirsh
2024-02-07 03:54:13 +00:00
Edward Yang
b4a35632f9 Add function to materialize COW storages (#117053)
Summary: From Kurt Mohler, see https://github.com/pytorch/pytorch/pull/113396 (manually imported due to ghimport problems)

Test Plan: sandcastle, OSS CI

Differential Revision: D52610522

Pull Request resolved: https://github.com/pytorch/pytorch/pull/117053
Approved by: https://github.com/malfet, https://github.com/kurtamohler
2024-01-10 15:34:16 +00:00
PyTorch MergeBot
f36d09fcb7 Revert "Add function to materialize COW storages (#113396)"
This reverts commit e2f090086b.

Reverted https://github.com/pytorch/pytorch/pull/113396 on behalf of https://github.com/DanilBaibak due to Break internal build ([comment](https://github.com/pytorch/pytorch/pull/113396#issuecomment-1818769090))
2023-11-20 10:26:01 +00:00
Kurt Mohler
e2f090086b Add function to materialize COW storages (#113396)
Part of #109833

Pull Request resolved: https://github.com/pytorch/pytorch/pull/113396
Approved by: https://github.com/ezyang
2023-11-17 01:58:51 +00:00
feifan
c73da67d46 new_qtensor support privateuseone allocator. (#111464)
I want to create a quant tensor through `PerTensorAffineQuantizer`. But I found that it will throw error because of the lake of judgment for PrivateUse1.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/111464
Approved by: https://github.com/ezyang
2023-11-01 05:16:58 +00:00