Commit Graph

2731 Commits

Author SHA1 Message Date
Yu, Guangye
40c098f731 Introduce a device-agnostic runtime API design (#132204)
# Motivation
According to [[RFC]A device-agnostic Python runtime API design for stream-based accelerators](https://github.com/pytorch/pytorch/issues/128403), this PR intends to introduce a device-agnostic runtime API design.
I personally prefer the **Simple Version** APIs that no longer accept the device type as an input argument. It means we will leverage `getAccelerator` to fetch the current accelerator. And it is flexible to expand these APIs to handle multiple types of accelerator scenarios. The design does **NOT** break the previous design philosophies.
I also believe that namespace torch.accelerator is better. It lets users know that the APIs they are calling are running on an accelerator rather than CPU. This is important. Meanwhile, we can follow a simple API design principle:
1. Device-agnostic APIs should be placed under the torch.accelerator namespace and not accept a device_type optional parameter.
2. Device-specific APIs should be placed under device-specific submodules.
3. APIS required by both CPU and accelerators should be placed under the torch namespace and accept a device_type optional parameter.

Also, I list the pros and cons of **Simple Version** here:
Pros:
- `torch.accelerator.foo` will have the same input argument as `torch.xxx.foo`, bringing a better user experience;
- more concise, facilitate the developer to write a device-agnostic code.

Cons:
- no obvious drawbacks.

# Additional Context
I list the new APIs here:
```python
torch.accelerator.is_available() -> bool:
torch.accelerator.current_accelerator() -> torch.device:
torch.accelerator.device_count() -> int:
torch.accelerator.current_device_idx() -> int:
torch.accelerator.set_device_idx(device: Union[torch.device, str, int, None]) -> None:
torch.accelerator.current_stream(device: Union[torch.device, str, int, None]) -> torch.Stream:
torch.accelerator.set_stream(stream: torch.Stream) -> None:
torch.accelerator.synchronize(device: Union[torch.device, str, int, None]) -> None:
```
According to the discussion with Alban, we decide to change the API name `set_device` to `set_device_idx` and `current_device` to `current_device_idx` for more explicit. And will submit other PR to support device and stream context manager.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132204
Approved by: https://github.com/EikanWang, https://github.com/abhilash1910, https://github.com/gujinghui, https://github.com/albanD
2024-10-27 10:37:09 +00:00
Laith Sakka
ed313a5ca2 Introduce torch.sym_add, variadic add (#138660)
Tested internally here: https://www.internalfb.com/diff/D64057744
This is a reland after previous internal failures.
main change is
```
 if min is None and max is None:
        torch._check_is_size(size)
        return
```

Partially addresses https://github.com/pytorch/pytorch/issues/128150

When you have big sums of values, we end up computing long chains of
binary addition in our FX graph representation.  Not only is this ugly,
it also is quadratic, as the sympy.Add constructor is O(N) in number
of arguments.  Instead, ensure that we maintain the summation as a
single FX node so we can do the entire addition all in one go.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138660
Approved by: https://github.com/ezyang, https://github.com/bobrenjc93
2024-10-23 17:42:41 +00:00
Laith Sakka
662d07e93e Remove parallel_and and parallel_or (#138135)
Not used, suggested by @ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/138135
Approved by: https://github.com/ezyang
2024-10-23 00:22:22 +00:00
Nikita Shulga
d1be61ce4e Update copyrights to 2024 (#138638)
Spiritual successor of https://github.com/pytorch/pytorch/pull/119413 + CPP docs copyright update as well
Fixes https://github.com/pytorch/pytorch/issues/138630

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138638
Approved by: https://github.com/atalman
2024-10-22 21:00:58 +00:00
Syed Tousif Ahmed
03c72976a5 Properly uses ref-counting for torch.cuda.use_mem_pool (#133600)
This PR refactors some ref-counting functionality out of `beginAllocateToPool` and `releasePool`. The ref-counting logic is then used in construction and destruction of `torch.cuda.MemPool`.

The `use_count` variable in the CUDACachingAllocator is essentially a refcount of how many context managers are using the pool. Since we are now lifting up the MemPool abstraction to the user, the MemPool object itself now needs to hold a an extra reference as well.

Part of https://github.com/pytorch/pytorch/issues/124807.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133600
Approved by: https://github.com/eqy, https://github.com/ezyang
2024-10-22 03:21:53 +00:00
Mikayla Gawarecki
e24871eb3c Add environment variable to force no weights_only load (#138225)
In preparation for `weights_only` flip, if users don't have access to the `torch.load` call

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138225
Approved by: https://github.com/albanD
2024-10-21 23:26:15 +00:00
Justin Chu
c6609ece84 [ONNX] Remove deprecated export_to_pretty_string (#137790)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137790
Approved by: https://github.com/titaiwangms, https://github.com/xadupre
ghstack dependencies: #137789
2024-10-21 18:17:48 +00:00
Tugsbayasgalan Manlaibaatar
1f32a1fb80 Replace torch.export default decomp table to be lazily populated (#137650)
In this PR, we implement lazy dictionary for export decomp behaviour for following reasons:
1. Custom op loading can happen after import time, as a result, the decomp table might not be able to pick up the decomp. Therefore we try to delay materialization as late as possible.

I intentionally seperated out the core_aten_decomp to not have any custom CIA ops in this PR to mitigate the risk of getting reverted but in the future, core_aten_decomp under torch/_decomp will exist as an alias to official export table (torch.export.default_decompositions)

Differential Revision: [D64140807](https://our.internmc.facebook.com/intern/diff/D64140807)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137650
Approved by: https://github.com/justinchuby, https://github.com/bdhirsh
2024-10-18 19:28:52 +00:00
Svetlana Karslioglu
9c2a80322a Add Programmable Google Search (#137716)
- Adding the code for the programmable Google search
- Adding the CSS overrides.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137716
Approved by: https://github.com/seemethere, https://github.com/albanD

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-10-18 18:18:16 +00:00
ErezYosef
5a81475884 Documentation Update: Fix Missing Whitespace in Optimizer Docs (#138321)
### Description:

This PR addresses a minor [formatting issue identified in a previous contribution to the Optimizer documentation](https://github.com/pytorch/pytorch/pull/134107#discussion_r1800833948).

Specifically, it fixes the missing whitespace after `param_names` in the section on utilizing named parameters to load the optimizer state dict.

You can find the related docs here:
[Optimizer Documentation](https://pytorch.org/docs/main/optim.html#how-to-utilize-named-parameters-to-load-optimizer-state-dict).

@janeyx99

Pull Request resolved: https://github.com/pytorch/pytorch/pull/138321
Approved by: https://github.com/janeyx99
2024-10-18 15:41:43 +00:00
Yu, Guangye
8cda774a03 Add torch.xpu.get_arch_list and torch.xpu.get_gencode_flags for XPU (#137773)
# Motivation
Add `torch.xpu.get_arch_list()` and `torch.xpu.get_gencode_flags()` methods that return architecture list and AOT flags to preserve what flags PyTorch XPU was built with.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137773
Approved by: https://github.com/EikanWang, https://github.com/albanD
2024-10-18 02:28:08 +00:00
Zheng, Zhaoqiong
7ba706c74e update get start xpu (#137479)
1. respect the comment from the community, downgrade the "Beta" to "Prototype" for the first xpu release with wheel
2. add wheels installation of torchaudio & torchvision for nightly on Windows
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137479
Approved by: https://github.com/atalman, https://github.com/malfet
2024-10-16 17:36:29 +00:00
PyTorch MergeBot
dd32a32cb6 Revert "Expose option to disable CRC-32 computation during torch.save (#137735)"
This reverts commit 534fa96f2d.

Reverted https://github.com/pytorch/pytorch/pull/137735 on behalf of https://github.com/clee2000 due to failing internally D64438525, probably needs gating ([comment](https://github.com/pytorch/pytorch/pull/137735#issuecomment-2417412264))
2024-10-16 17:03:06 +00:00
William Wen
4c8718d8e7 [dynamo] add torch.compiler.set_stance (#137504)
Attempt # 2 at https://github.com/pytorch/pytorch/pull/132926 to implement https://github.com/pytorch/pytorch/issues/123771.

Implement a new `torch.compiler.set_stance` function that can force `torch.compile` regions to run eagerly.

See added tests for usage examples.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137504
Approved by: https://github.com/yf225, https://github.com/jansel
2024-10-16 16:18:25 +00:00
Howard Huang
75109682b6 [Pipelining] Refactor Interleaved1F1B and ZeroBubble (#137783)
NOTE: this PR removes `ScheduleFlexibleInterleaved1F1B`, let me know if theres any concerns.

`ScheduleFlexibleInterleaved1F1B` is a superset of `Interleaved1F1B` and uses most of the same implementation, but relaxes the condition that `n_microbatches % pp_size == 0`. This is refactors the implementation into `Interleaved1F1B` and then removes it since it is confusing to have both schedules with similar names. This also refactors the zero bubble logic to belong in the `ZeroBubble` schedule class.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137783
Approved by: https://github.com/wconstab
2024-10-16 03:05:14 +00:00
Jane Xu
eaec72d1e6 Link directly to new Custom Ops Landing Page (#137933)
e.g., click on first link in https://docs-preview.pytorch.org/pytorch/pytorch/137933/library.html#testing-custom-ops

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137933
Approved by: https://github.com/zou3519
2024-10-15 21:18:21 +00:00
Mikayla Gawarecki
534fa96f2d Expose option to disable CRC-32 computation during torch.save (#137735)
Option only works in open source, not internal

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137735
Approved by: https://github.com/albanD
2024-10-15 19:30:02 +00:00
PyTorch MergeBot
2831af39c4 Revert "[ONNX] Remove deprecated export_to_pretty_string (#137790)"
This reverts commit d0628a7e39.

Reverted https://github.com/pytorch/pytorch/pull/137790 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/137789#issuecomment-2414632100))
2024-10-15 17:40:06 +00:00
Alex Baden
39d21ed803 [Inductor] Update AttrsDescriptor instantiation for Triton changes (#137458)
The `AttrsDescriptor` class has been present in Triton for almost a year now (introduced [here](72c9833927)), so we should be able to rely on it existing. I am in the process of supporting the new `AttrsDescriptor` class and @jansel suggested I split changes to the existing class out separately to make sure nothing breaks removing the legacy attribute descriptor attributes.

Initially I attempted to remove the branching around detecting whether `AttrsDescriptor` exists but that breaks because PyTorch must build without Triton. So, I went back and updated for the naming introduced in the commit linked above, and also removed two unused attributes `divisible_by_8` and `ids_to_fold` which were removed in Feb 2024 (https://github.com/triton-lang/triton/pull/3122 and https://github.com/triton-lang/triton/pull/3080 respectively).

With these changes only the internal workings of the `AttrsDescriptor` class will differ between supported Triton versions, but the data stored will remain consistent.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137458
Approved by: https://github.com/jansel
2024-10-14 20:20:29 +00:00
ErezYosef
197601eeea Add Support for Tracking Parameter Names (named_parameters) in Optimizer State Dict (#134107)
A proposal addressing Issue #1489: **Optimizer should track parameter names and not id.**

(also mentioned in here: [[RFC] Introducing FQNs/clarity eyeglasses to optim state_dict](https://dev-discuss.pytorch.org/t/rfc-introducing-fqns-clarity-to-optim-state-dict/1552)

## Summary
This PR introduces a backward-compatible enhancement where optimizers track parameter names instead of just their id.
Optimizers can be initialized with `named_parameters()` as:
```python
optimizer = optim.SGD(model.named_parameters(), lr=0.01, momentum=0.9)
```
This allows for greater clarity and ease when handling optimizers, as the parameters' names are preserved within the optimizer’s `state_dict` as:
```
state_dict =
{
    'state': {
    0: {'momentum_buffer': tensor(...), ...},
    1: {'momentum_buffer': tensor(...), ...},
    },
    'param_groups': [
        {
        'lr': 0.01,
        'weight_decay': 0,
        ...
        'params': [0,1]
        'param_names' ['layer.weight', 'layer.bias']  (optional)
        }
    ]
}
```
Loading `state_dict` is not changed (backward-compatible) and the `param_names` key will be ignored.

## Key Features
#### Named Parameters in Optimizer Initialization:
Optimizers can accept the output of `model.named_parameters()` during initialization, allowing them to store parameter names directly.
#### Parameter Names in `state_dict`:
The parameter names are saved as a list in the optimizer’s `state_dict` with key `param_names`, alongside the `params` indices, ensuring seamless tracking of both names and parameters.

## Backward Compatibility
#### No Breaking Changes:
This change is fully backward-compatible. The added `param_names` key in the optimizer's `state_dict` is ignored when loading a state to the optimizer.

#### Customization with Hooks:
For more control, the loaded state_dict can be modified using a custom `register_load_state_dict_pre_hook`, providing flexibility for different design needs.

## Documentation Updates
Please refer to the documentation changes for more details on how this feature is implemented and how it can be used effectively.

## Solution Example:

A suggested solution to the problem mentioned in #1489, for the same parameters but in a different order.
The following `register_load_state_dict_pre_hook` should be added to the optimizer before loading to enable loading the state dict :
```python
def adapt_state_dict_ids(optimizer, state_dict):
    # assuming a single param group.
    current_state_group = optimizer.state_dict()['param_groups'][0]
    loaded_state_group = state_dict['param_groups'][0]

    # same number of params, same names, only different ordering
    current_state_name_to_id_mapping = {}  # mapping --  param_name: id
    for i, name in enumerate(current_state_group['param_names']):
        current_state_name_to_id_mapping[name] = current_state_group['params'][i]

    # changing the ids of the loaded state dict to match the order of the given state dict.
    for i, name in enumerate(current_state_group['param_names']):
        loaded_state_group['params'][i] = current_state_name_to_id_mapping[name]

    return state_dict
```
In this code, the loaded `state_dict` ids are adapted to match the order of the current optimizer `state_dict`.
Both the previous and the current optimizers are required to be initiated with `named_parameters()` to have the 'param_names' key in the dict.

### Note
This is my first contribution to PyTorch, and I wish to receive feedback or suggestions for improvement.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134107
Approved by: https://github.com/janeyx99

Co-authored-by: Jane (Yuan) Xu <31798555+janeyx99@users.noreply.github.com>
2024-10-14 19:24:44 +00:00
Justin Chu
d0628a7e39 [ONNX] Remove deprecated export_to_pretty_string (#137790)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137790
Approved by: https://github.com/titaiwangms
ghstack dependencies: #137789
2024-10-11 20:10:04 +00:00
Jiong Gong
e30c55ee52 Update maintainers for inductor and x86 CPU (#136839)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136839
Approved by: https://github.com/Skylion007, https://github.com/albanD, https://github.com/malfet
2024-10-11 07:24:07 +00:00
Jin Zhou
5516ac5c21 [ROCm] Tunableop record untuned (#128813)
When enable tunableop, It is easy to have OOM since APP usually needs large video memory size, such as running a LLM for inference.  So we need a offline mode to tune the GEMMs. This PR provide an offline mode for tunableOp:

- record untuned GEMMs to file.

- a python API named tune_gemm_in_file is added to read the untuned file and tune the GEMMs in file

Pull Request resolved: https://github.com/pytorch/pytorch/pull/128813
Approved by: https://github.com/jeffdaily, https://github.com/hongxiayang, https://github.com/naromero77amd

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
2024-10-09 21:59:03 +00:00
Jane Xu
cfe970260a Clarify opt-einsum usage, fix #127109 (#137596)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/137596
Approved by: https://github.com/albanD
2024-10-09 20:31:24 +00:00
PyTorch MergeBot
16a2c2cfd4 Revert "Introduce torch.sym_sum (#136429)"
This reverts commit 90bed32b98.

Reverted https://github.com/pytorch/pytorch/pull/136429 on behalf of https://github.com/ezyang due to fails internal stuff ([comment](https://github.com/pytorch/pytorch/pull/136429#issuecomment-2403335147))
2024-10-09 20:08:01 +00:00
Edward Z. Yang
90bed32b98 Introduce torch.sym_sum (#136429)
Partially addresses https://github.com/pytorch/pytorch/issues/128150

When you have big sums of values, we end up computing long chains of
binary addition in our FX graph representation.  Not only is this ugly,
it also is quadratic, as the sympy.Add constructor is O(N) in number
of arguments.  Instead, ensure that we maintain the summation as a
single FX node so we can do the entire addition all in one go.

update_hint_regression benchmark, before and after:

```
update_hint_regression,compile_time_instruction_count,2648328980
update_hint_regression,compile_time_instruction_count,2563748678
```

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136429
Approved by: https://github.com/isuruf
2024-10-08 18:12:57 +00:00
Michael Lazos
22e19bd2d7 Add link to torch.compile the missing manual in troubleshooting (#137301)
Fixes #ISSUE_NUMBER

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137301
Approved by: https://github.com/svekars

Co-authored-by: Svetlana Karslioglu <svekars@meta.com>
2024-10-04 18:19:30 +00:00
Jeff Daily
c7b0d4b148 raw_alloc ignores PYTORCH_NO_CUDA_MEMORY_CACHING (#131114)
raw_alloc is used by cudnn, miopen, thrust, and tunableop.  Without this PR, the env var for disabling the caching allocator will only partially work.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131114
Approved by: https://github.com/eqy, https://github.com/houseroad, https://github.com/albanD

Co-authored-by: Nichols A. Romero <nick.romero@amd.com>
2024-10-04 15:36:29 +00:00
PyTorch MergeBot
0d1701f310 Revert "raw_alloc ignores PYTORCH_NO_CUDA_MEMORY_CACHING (#131114)"
This reverts commit 7001907480.

Reverted https://github.com/pytorch/pytorch/pull/131114 on behalf of https://github.com/PaliC due to failing internal builds ([comment](https://github.com/pytorch/pytorch/pull/131114#issuecomment-2390615007))
2024-10-03 06:22:55 +00:00
Xilun Wu
54f50f19eb [dtensor][experimental] expose DTensor Context Parallel API (#137038)
**Summary**
expose experimental Context Parallel API `torch.distributed.tensor.experimental._attention.context_parallel` to module `torch.distributed.tensor.experimental`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/137038
Approved by: https://github.com/wz337, https://github.com/fegin
2024-10-02 18:00:23 +00:00
Jeff Daily
7001907480 raw_alloc ignores PYTORCH_NO_CUDA_MEMORY_CACHING (#131114)
raw_alloc is used by cudnn, miopen, thrust, and tunableop.  Without this PR, the env var for disabling the caching allocator will only partially work.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131114
Approved by: https://github.com/eqy, https://github.com/houseroad, https://github.com/albanD

Co-authored-by: Nichols A. Romero <nick.romero@amd.com>
2024-10-02 16:27:15 +00:00
Nikita Shulga
76a57568de Update windows maintainers (#136901)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136901
Approved by: https://github.com/albanD
2024-09-30 16:12:49 +00:00
albanD
2421344d8f Update current maintainers (#136672)
This file didn't had an overall in a few years so long overdue. Most of the credit goes to @orionr for gathering all of this info.

The main rules we followed:
- No code contributor is removed, they're all placed as emeritus
- Breakdown too big categories to make this document useful to know who to ping
- No category where the code is still in the codebase is removed
- We did not rework the categories (for example to be closer to module: labels) and leave that for later
- All non-emeritus names are ordered by their number of comments on issues related to their topic
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136672
Approved by: https://github.com/eqy, https://github.com/ezyang, https://github.com/seemethere, https://github.com/malfet
2024-09-26 17:13:16 +00:00
Zheng, Zhaoqiong
f3dd1721f4 [Update] Update note for Getting Started with PyTorch on Intel GPUs (#129946)
remove the hardware and software prerequisites and set up env part.
keep the prerequisites section and link to pytorch prerequistes for intel gpus for driver install, intel support package install and env set up
https://www.intel.com/content/www/us/en/developer/articles/tool/pytorch-prerequisites-for-intel-gpus.html
Update the support for Intel Client GPU MTL-H
Update inference & training examples

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129946
Approved by: https://github.com/seemethere
2024-09-26 00:22:05 +00:00
Jokeren
cabfbef6cf [pytorch][PR] [inductor] More fixes on the keys of constants and signature dictionaries (#136514)
Summary: Previous PR forgets to change two other places that also create `constants` and `signature`.

Test Plan:
Imported from GitHub, without a `Test Plan:` line.
 {F1884584338}

Differential Revision: D63027728

Pulled By: Myrthan

Co-authored-by: Jokeren <robinho364@gmail.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136514
Approved by: https://github.com/jansel

Co-authored-by: Jokeren <robinho364@gmail.com>
2024-09-25 09:34:14 +00:00
Jianyu Huang
0a35986cdb Add option to configure reduced precision math backend for SDPA (#135964)
Summary: Address https://github.com/pytorch/pytorch/issues/135778 by adding a global flag to configure whether using high precision or low precision for math backend of SDPA.

Test Plan: buck2 run mode/opt //scripts/feikou/llm:run_attn_kernels

Differential Revision: D62625515

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135964
Approved by: https://github.com/jbschlosser
2024-09-24 07:11:38 +00:00
Sergii Dymchenko
d9aca9914b Remove duplicated words in library.rst (#136340)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136340
Approved by: https://github.com/svekars
2024-09-20 03:30:54 +00:00
Banit Agrawal
a575ce0dc6 [PyTorch Pinned Allocator] Add support of background thread to process events (#135524)
Summary: Currently we process events in the regular allocation path and we call cudaEventQuery to check on the events and this path can take some locks in libcuda driver. Its not entirely needed to do process events in the allocation path, we could move this to a background thread and keep processing events regularly and put the freed block to the free list.

Differential Revision: D62396585

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135524
Approved by: https://github.com/zyan0
2024-09-17 21:08:10 +00:00
Banit Agrawal
48d18fbd4c [PyTorch CUDA Allocator] Allow reuse of non-split blocks with better rounding (#136174)
Summary:
This diff adds an option to round the non-split blocks in caching allocator so that they can be reused without causing lots of fragmentation for large memory segments.

For example, if we specify max_split memory size as 400MB, then all allocations more than 400MB will not be split. Lets say, we allocated some 1024MB blocks and these are cached in the allocator blocks. If we request a new 500MB block, we round it to nearest power-2-division, thats 512MB, we add default kLargeBuffer of 20MB, that will be 532MB and since 532MB is less than existing 1024MB block, the 1024MB will not be used for this allocation, instead a new 512MB block will be created. In this diff, we provide an option to cofigure the kLargeBuffer for rounding and expose as a configurable option, so 512MB + max_non_split_rounding_size and if thats greater than 1024MB, we will use te 1024MB and we wont create a new 512MB block using cudaMalloc. This option is added so that we can pre-allocate some large blocks so that we can reuse them as much as possible and we dont stall on calling cudaMalloc.

Differential Revision: D62758758

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136174
Approved by: https://github.com/zyan0
2024-09-17 19:08:44 +00:00
Trung Truong
cc365fdd7b [MTIA] Support torch.cuda.get_device_capability equivalent API on MTIA (#135889)
Summary:
Mirror `get_device_capability` on MTIA per https://fburl.com/gdoc/p4lo5avn

At the moment, both the major and minor version are just 0

Test Plan:
Unit test: `buck2 test //mtia/host_runtime/torch_mtia/tests:test_torch_mtia_api`

https://www.internalfb.com/intern/testinfra/testconsole/testrun/1688850109958190/

Differential Revision: D62595296

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135889
Approved by: https://github.com/egienvalue
2024-09-17 17:42:56 +00:00
Nikita Shulga
38caf10411 [EZ] Fix spelling typo (#136157)
s/toosl/tools/ (spotted by @louie-tsai)
Also, capitalize CUDA

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136157
Approved by: https://github.com/kit1980
2024-09-16 19:30:30 +00:00
PyTorch MergeBot
0199fd4d7e Revert "[inductor] More fixes on the keys of constants and signature dictionaries (#135406)"
This reverts commit e54b559e88.

Reverted https://github.com/pytorch/pytorch/pull/135406 on behalf of https://github.com/jeanschmidt due to Reverting as it is breaking triton_mtia internal signals @jansel could you have a look and help get those changes merged? ([comment](https://github.com/pytorch/pytorch/pull/135406#issuecomment-2353557481))
2024-09-16 17:58:02 +00:00
Howard Huang
e501ed71d4 Update link in distributed.tensor.parallel.rst (#136103)
dtensor folder was moved

Pull Request resolved: https://github.com/pytorch/pytorch/pull/136103
Approved by: https://github.com/kwen2501, https://github.com/fegin
2024-09-15 19:36:29 +00:00
Tugsbayasgalan Manlaibaatar
dec3403b24 Add some doc for export_for_training (#135918)
Differential Revision: [D62610491](https://our.internmc.facebook.com/intern/diff/D62610491)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135918
Approved by: https://github.com/avikchaudhuri
ghstack dependencies: #135080, #135912
2024-09-15 17:08:12 +00:00
Tugsbayasgalan Manlaibaatar
1904b09e61 Create export_for_inference API and expose core_aten as public facing API (#135912)
Differential Revision: [D62606908](https://our.internmc.facebook.com/intern/diff/D62606908)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/135912
Approved by: https://github.com/avikchaudhuri
ghstack dependencies: #135080
2024-09-15 17:05:07 +00:00
Justin Chu
e2d3af405f [ONNX] Remove logging apis from public (#133825)
Remove

- torch.onnx.enable_log
- torch.onnx.disable_log
- torch.onnx.set_log_stream
- torch.onnx.log

Because they are not meant for public consumption and has been marked for deprecation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133825
Approved by: https://github.com/titaiwangms
2024-09-13 22:19:52 +00:00
CaoE
2f53d570fe Update document for autocast on CPU (#135299)
Update document for autocast on CPU due to the support of float16 and changes in the operator list.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135299
Approved by: https://github.com/jgong5, https://github.com/leslie-fang-intel, https://github.com/svekars
2024-09-13 09:11:47 +00:00
Jokeren
e54b559e88 [inductor] More fixes on the keys of constants and signature dictionaries (#135406)
Previous PR forgets to change two other places that also create `constants` and `signature`. https://github.com/pytorch/pytorch/pull/135170

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135406
Approved by: https://github.com/jansel
2024-09-13 04:10:41 +00:00
Xavier Dupré
5e145861f2 [ONNX] Improves documentation of ONNX exporter (#135372)
The PR updates the documentation to reflect the changes introduced in pytorch 2.5 and related to onnx exporter.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/135372
Approved by: https://github.com/justinchuby

Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>
2024-09-09 15:09:01 +00:00
Wanchao Liang
cfc227ad43 [reland][dtensor] move DTensor to public namespace (#134203)
reland of https://github.com/pytorch/pytorch/pull/133113

I have to create a new PR because the previous reverted PR could not either be rebased, or imported successfully :(

----

Moving DTensor to be in the public namespace, to formally add the documentation page that includes all the public APIs. This includes:

* many path renames and path import fixes
* a dedicated doc page without too much content yet (adding in the next PRs)
* To preserve the BC for users still using the torch.distributed._tensor, I added a shim script to redirect old path calls to the new module

The BC preserving is evidented by the fact that all DTensor tests are still working without changing the public imports. So it's safe to land the changes

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134203
Approved by: https://github.com/tianyu-l
2024-09-08 17:08:40 +00:00