Commit Graph

2671 Commits

Author SHA1 Message Date
Mikayla Gawarecki
94db935749 Add torch.serialization.skip_data context manager (#134504)
## Semantic

The semantic is
(1) By default `torch.serialization.skip_data(materialize_fake_tensors=False)` will make `torch.save` skip writing storages (but reserve space for them in the checkpoint).

```python
import torch
import torch.nn as nn

sd = nn.Linear(3, 5).state_dict()
with torch.serialization.skip_data():
    torch.save(sd, 'foo.pt')
print(torch.load('foo.pt', weights_only=True))
```

(2)  With `torch.serialization.skip_data(materialize_fake_tensors=True)`If FakeTensor is passed to `torch.save` the pickler will treat these FakeTensors as being "materialized" space will be reserved in the checkpoint for the associated storage bytes, and when loading the type will be Tensor instead of FakeTensor)

```python
import torch
import torch.nn as nn
from torch._subclasses.fake_tensor import FakeTensorMode

with FakeTensorMode():
    m = nn.Linear(3, 5, dtype=torch.float16, device='cuda')

sd = m.state_dict()
with torch.serialization.skip_data(materialize_fake_tensors=True):
    torch.save(sd, 'bla.pt')
print(torch.load('bla.pt', weights_only=True))
# OrderedDict([('weight', tensor([[0., 0., 0.],
#        [0., 0., 0.],
#        [0., 0., 0.],
#        [0., 0., 0.],
#        [0., 0., 0.]], device='cuda:0', dtype=torch.float16)), ('bias', tensor([0., 0., 0., 0., 0.], device='cuda:0', dtype=torch.float16))])

```

## Follow Ups

- [ ] `torch.load` semantic for skip_data context manager
- [ ] Mechanism for getting offsets of storages saved via this method (for writing in a separate pass)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134504
Approved by: https://github.com/albanD
2024-08-29 04:52:52 +00:00
Syed Tousif Ahmed
4655eb3ee2 Uses MemPoolContext to route allocations from CUDACachingAllocator (#134685)
Re-open of https://github.com/pytorch/pytorch/pull/133599 that was mistakenly closed by issuing `ghstack land`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134685
Approved by: https://github.com/ezyang
2024-08-29 03:56:31 +00:00
PyTorch MergeBot
503c0dd923 Revert "Add MaskedTensor support to *_like API (#128637)"
This reverts commit b6e51711a0.

Reverted https://github.com/pytorch/pytorch/pull/128637 on behalf of https://github.com/ZainRizvi due to Actually, seems like it was this commit that introduced the failure: test_maskedtensor.py::TestOperatorsCUDA::test_like_empty_like_layout1_cuda_bool [GH job link](https://github.com/pytorch/pytorch/actions/runs/10604690725/job/29392898277) [HUD commit link](b6e51711a0) ([comment](https://github.com/pytorch/pytorch/pull/128637#issuecomment-2316554188))
2024-08-29 01:42:52 +00:00
PyTorch MergeBot
1285443994 Revert "Add torch.serialization.skip_data context manager (#134504)"
This reverts commit 202600bc23.

Reverted https://github.com/pytorch/pytorch/pull/134504 on behalf of https://github.com/mikaylagawarecki due to This is breaking Windows docs tests due to NamedTemporaryFile on Windows not working well ([comment](https://github.com/pytorch/pytorch/pull/134504#issuecomment-2316543901))
2024-08-29 01:30:49 +00:00
Avik Chaudhuri
ca03a14cf7 hang dim hint constants off Dim (#134702)
Summary: Retry landing https://github.com/pytorch/pytorch/pull/134484

Test Plan: (see original)

Differential Revision: D61925860

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134702
Approved by: https://github.com/pianpwk
2024-08-29 01:02:01 +00:00
Mikayla Gawarecki
202600bc23 Add torch.serialization.skip_data context manager (#134504)
## Semantic

The semantic is
(1) By default `torch.serialization.skip_data(materialize_fake_tensors=False)` will make `torch.save` skip writing storages (but reserve space for them in the checkpoint).

```python
import torch
import torch.nn as nn

sd = nn.Linear(3, 5).state_dict()
with torch.serialization.skip_data():
    torch.save(sd, 'foo.pt')
print(torch.load('foo.pt', weights_only=True))
```

(2)  With `torch.serialization.skip_data(materialize_fake_tensors=True)`If FakeTensor is passed to `torch.save` the pickler will treat these FakeTensors as being "materialized" space will be reserved in the checkpoint for the associated storage bytes, and when loading the type will be Tensor instead of FakeTensor)

```python
import torch
import torch.nn as nn
from torch._subclasses.fake_tensor import FakeTensorMode

with FakeTensorMode():
    m = nn.Linear(3, 5, dtype=torch.float16, device='cuda')

sd = m.state_dict()
with torch.serialization.skip_data(materialize_fake_tensors=True):
    torch.save(sd, 'bla.pt')
print(torch.load('bla.pt', weights_only=True))
# OrderedDict([('weight', tensor([[0., 0., 0.],
#        [0., 0., 0.],
#        [0., 0., 0.],
#        [0., 0., 0.],
#        [0., 0., 0.]], device='cuda:0', dtype=torch.float16)), ('bias', tensor([0., 0., 0., 0., 0.], device='cuda:0', dtype=torch.float16))])

```

## Follow Ups

- [ ] `torch.load` semantic for skip_data context manager
- [ ] Mechanism for getting offsets of storages saved via this method (for writing in a separate pass)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134504
Approved by: https://github.com/albanD
2024-08-28 23:53:17 +00:00
PyTorch MergeBot
f997b2b8e6 Revert "Add MaskedTensor passthrough: unfold, F.Unfold, F.Fold, stack (#125262)"
This reverts commit f685018ea9.

Reverted https://github.com/pytorch/pytorch/pull/125262 on behalf of https://github.com/ZainRizvi due to Hi, this PR appears to be calling maskedtensor tests to fail on main. Please rebase your changes onto the latest trunk build to repro the failure. test_maskedtensor.py::TestOperatorsCUDA::test_like_empty_like_layout1_cuda_bool [GH job link](https://github.com/pytorch/pytorch/actions/runs/10604716811/job/29393256312) [HUD commit link](f685018ea9) ([comment](https://github.com/pytorch/pytorch/pull/125262#issuecomment-2316387447))
2024-08-28 23:10:07 +00:00
Nowtryz
f685018ea9 Add MaskedTensor passthrough: unfold, F.Unfold, F.Fold, stack (#125262)
Hi,
I noticed the `unfold` operator was missing on MaskedTensor.

I tested that my change works when calling unfold and backward on a `MaskedTensor` but I didn't find the tests for the dispatch of such operation. Where is it?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125262
Approved by: https://github.com/cpuhrsch
2024-08-28 21:30:39 +00:00
Nowtryz
b6e51711a0 Add MaskedTensor support to *_like API (#128637)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128637
Approved by: https://github.com/cpuhrsch
2024-08-28 21:28:23 +00:00
PyTorch MergeBot
13d40f6fc5 Revert "hang dim hint constants off Dim (#134484)"
This reverts commit c142af7209.

Reverted https://github.com/pytorch/pytorch/pull/134484 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/134484#issuecomment-2315749549))
2024-08-28 16:05:42 +00:00
Avik Chaudhuri
c142af7209 hang dim hint constants off Dim (#134484)
Summary: Recently https://github.com/pytorch/pytorch/pull/133620 added support for automatic dynamic shapes, where a new enum, `DIM`, was introduced to provide hints like `AUTO` and `STATIC`. This PR is a nominal change where we expose the hints via the existing public `Dim` API, and remove `DIM` from the public API. The main motivation is to avoid having users need to import too many things.

Test Plan: existing

Differential Revision: D61807361

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134484
Approved by: https://github.com/angelayi
2024-08-28 14:35:40 +00:00
Jerry Zhang
3ef4c27ab3 Update pt2e numeric debugger to use node.meta["custom"] field (#134040)
Summary:
With https://github.com/pytorch/pytorch/pull/131912 we now have a "custom" field in node.meta that can be preserved
in

* copy/deepcopy
* run_decompositions()
* serialization
* re-exporting

So we refactored numeric debugger to use this.

Test Plan:
python test/test_quantization.py TestNumericDebugger

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/134040
Approved by: https://github.com/tarun292
2024-08-27 19:51:03 +00:00
Tianyi Tao
7af38eb98b Fix unexpected inference_mode interaction with torch.autograd.functional.jacobian (#130307)
Fixes #128264

Pull Request resolved: https://github.com/pytorch/pytorch/pull/130307
Approved by: https://github.com/soulitzer
2024-08-25 22:14:02 +00:00
Yiming Zhou
2cfc2da527 [export] Make move_to_device_pass function public (#134263)
Summary:
This is a follow-up of https://github.com/pytorch/pytorch/pull/133660

Here we make the `move_to_device_pass()` function publich so users can call it by `from torch.export.passes import move_to_device_pass`

Test Plan: CI

Differential Revision: D61671310

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134263
Approved by: https://github.com/angelayi
2024-08-23 23:18:30 +00:00
Pian Pawakapan
8ff3a5be1b [export] basic auto dynamic shapes (#133620)
Starter version of automatic dynamic shapes for export.

Creates enums `DIM.AUTO`, `DIM.STATIC`, allowing user to specify `AUTO` for dims in dynamic_shapes specs, meaning that corresponding dims are treated as dynamic, and relevant guards will do what's necessary (e.g. refine ValueRanges, set replacements based on equality, or even set static) without raising ConstraintViolationErrors. Basically allows the user to say, "a bunch of these dims can be dynamic, let export do model analysis and return the program with maximum possible dynamism, without complaining".

The usage for specifying `dynamic_shapes` is now:
```
AUTO -> dynamic by default, return whatever produce_guards() says, even if it's static
None/int/STATIC -> static
Dim/DerivedDim -> same as before - will complain if the min/max range is invalid, or if dims related to this are unspecified.
```

Caveat 1: specifying `AUTO` for a dim won't guarantee it'll be dynamic:

- specifying `AUTO` for a dim will return the maximum possible dynamism given your program and other specified constraints, but this can still mean you'll get a static program. For example, with the program below, x is specified dynamic, but it's equal to y, which is specified static, and with how we currently do things we won't promote y to dynamic, but will demote(?) x to static. So this can be surprising if you don't fully know your model, and/or missed one of your other inputs when specifying auto-dynamic shapes.
```
class Foo(torch.nn.Module):
    def forward(self, x, y):
        return x + y
inputs = (torch.randn(6), torch.randn(6))
export(Foo(), inputs, dynamic_shapes={"x": (DIM.AUTO,), "y": None})
```

Caveat 2: specifying `AUTO` and Dims in the same spec is still problematic:

- The way Dims/DerivedDims are currently handled is very strict. A Dim represents a symbol, and we require a user to specify the symbol for all dims governed by the symbol - that's why we've seen errors in the past like `The values of x must always be related to y by ...`, asking the user to specify the exact relation as in the program. We also require the specified min/max range to be a subset of the valid range from model analysis. All this doesn't compose well with specifying `AUTO` just yet - for example in the program below, ideal behavior could be to return a dynamic program, where `dx = x.size(0) = y.size(0)` has range (3,6). Unfortunately this crashes, and correct behavior is to specify `dx` for both inputs. So currently we raise a UserError and crash if both Dims + `AUTO` are present in the spec.
```
class Foo(torch.nn.Module):
    def forward(self, x, y):
        return x + y
inputs = (torch.randn(6), torch.randn(6))
export(Foo(), inputs, dynamic_shapes={"x": (DIM.AUTO,), "y": {0: Dim("dx", min=3, max=6)}})  # this doesn't work, because x & y and related
```

Implementation details:

This is done by setting `assume_static_by_default=False`, and doing a transform on the `dynamic_shapes` spec to preserve semantics. `assume_static_by_default=False` will treat unspecified dims or Nones as dynamic. This is the opposite of what `export.export()` currently does - unspecified Dims/Nones are treated as static. Historically this static-by-default behavior, where the user deals with fewer guards, has been desirable, and we would like to respect that in this implementation. So this internal spec transformation is added, `_transform_shapes_for_default_dynamic()`, does the spec conversion necessary to be compatbile with dynamic by default. Specifically, AUTOs are converted into Nones, and Nones/unspecified dims are filled in with explicitly static constraints.

For example, this would look like, for a 3-d tensor: `{0: DIM.AUTO, 1: None, 2: Dim("dx")} -> {0: None, 1: 32, 2: Dim("dx")}`

This does seem overly complicated, but it's done to preserve dynamic shapes semantics for `torch._dynamo.export()`, which already uses `assume_static_by_default=False`, and follows the same process for generating shape constraints , via `_process_dynamic_shapes`. There the semantics are:
```
None/unspecified: dynamic by default
Dim/DerivedDim: also a strict assertion
```

If we don't care about BC for `_dynamo.export(dynamic_shapes)`, then we can just modify semantics for `_process_dynamic_shapes()` and change all the relevant tests in `test/dynamo/test_export.py`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133620
Approved by: https://github.com/avikchaudhuri
2024-08-23 22:56:39 +00:00
Avik Chaudhuri
b454c51060 remove dynamic_dim (#134211)
Summary: As promised in https://github.com/pytorch/pytorch/pull/134045.

Test Plan: existing

Differential Revision: D61646937

Pull Request resolved: https://github.com/pytorch/pytorch/pull/134211
Approved by: https://github.com/angelayi
2024-08-23 04:13:03 +00:00
Howard Huang
108a75b454 [PP] Add ZeroBubble schedule (#133467)
Zero bubble can be expressed through `ScheduleFlexibleInterleaved1F1B` by setting `enable_zero_bubble=True`. But instead of having to include this flag in schedule initialization we should create a separate ZeroBubbleSchedule and also transition `Interleaved1F1B` to derive from `ScheduleFlexibleInterleaved1F1B`. Then we dont need to expose `ScheduleFlexibleInterleaved1F1B` since the naming is not obvious

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133467
Approved by: https://github.com/wconstab
ghstack dependencies: #132691
2024-08-22 13:32:15 +00:00
Zitong Zhan
90c821814e SparseCsrCUDA: cuDSS backend for linalg.solve (#129856)
This PR switches to cuDSS library and has the same purpose of #127692, which is to add Sparse CSR tensor support to linalg.solve.
Fixes #69538

Minimum example of usage:
```
import torch

if __name__ == '__main__':
    spd = torch.rand(4, 3)
    A = spd.T @ spd
    b = torch.rand(3).to(torch.float64).cuda()
    A = A.to_sparse_csr().to(torch.float64).cuda()

    x = torch.linalg.solve(A, b)
    print((A @ x - b).norm())

```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129856
Approved by: https://github.com/amjames, https://github.com/lezcano, https://github.com/huydhn

Co-authored-by: Zihang Fang <zhfang1108@gmail.com>
Co-authored-by: Huy Do <huydhn@gmail.com>
2024-08-22 07:57:30 +00:00
Jesse Cai
255cd75a97 [sparse] Add cuSPARSELt as a backend (#128534)
Summary:

This PR adds in cuSPARSELt as a backend to PyTorch.

It is now possible to see if cuSPARSELt is available and the version if
it is with
```
torch.backends.cusparselt.is_available()
torch.backends.cusparselt.version()
```

Test Plan:
```
python test/test_sparse_semi_structured.py -k test_cusparselt_backend
```

Reviewers:

Subscribers:

Tasks:

Tags:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128534
Approved by: https://github.com/cpuhrsch, https://github.com/eqy, https://github.com/syed-ahmed
2024-08-21 22:06:07 +00:00
Xuehai Pan
022cd7c9aa [RFC][dynamo] add decorator to register polyfill for unsupported C++ function to avoid graph break (#133712)
Add decorator `torch.compiler.substitute_in_graph` to register polyfill for unsupported C++ function to avoid graph break. This API provides an official way to add support for dynamo for third-party C extensions. Also, it can be used to simplify our implementation for `torch._dynamo.polyfill`.

5ee070266f/torch/_dynamo/variables/builtin.py (L97-L107)

Example:

```python
>>> import operator
>>> operator.indexOf([1, 2, 3, 4, 5], 3)
2

>>> torch.compile(operator.indexOf, fullgraph=True)([1, 2, 3, 4, 5], 3)
Unsupported: ...

>>> @torch.compiler.substitute_in_graph(operator.indexOf)
... def indexOf(sequence, x):
...     for i, item in enumerate(sequence):
...         if item is x or item == x:
...             return i
...     raise ValueError("sequence.index(x): x not in sequence")

>>> torch.compile(operator.indexOf, fullgraph=True)([1, 2, 3, 4, 5], 3)
2
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133712
Approved by: https://github.com/jansel
2024-08-21 06:36:41 +00:00
Justin Chu
e8fc1e0118 [ONNX] New export logic leveraging ExportedProgram and ONNX IR (#132530)
1/n PR to

- Move code from torch-onnx from commit 395495e566 into torch.onnx and fixes imports.
- Integrate the new export logic with the torch.onnx.export API and include basic set of tests.
- Refactor the API for the change.
- Improve documentation.

Next PRs will be more tests and docs.

Fix https://github.com/pytorch/pytorch/issues/129277
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132530
Approved by: https://github.com/titaiwangms, https://github.com/malfet
2024-08-21 01:08:42 +00:00
Sahdev Zala
06cc2e83f0 Make optim.swa.util content accessible from the torch.optim doc (#133393)
Link various classes and functions of the `optim.swa.util` to make doc content accessible from the `torch.optim` doc.

Currently, if you click the link,
https://pytorch.org/docs/stable/optim.html#module-torch.optim.swa_utils it goes to a blank, bottom of the page section of `torch.optim`.
Also,
`torch.optim.swa_utils.AveragedModel` and `torch.optim.swa_utils.SWALR` classes as well as `torch.optim.swa_utils.update_bn()` and `optim.swa_utils.get_ema_multi_avg_fn` are not linked to doc.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133393
Approved by: https://github.com/janeyx99
2024-08-21 00:43:46 +00:00
PyTorch MergeBot
15b5a0b67f Revert "[RFC][dynamo] add decorator to register polyfill for unsupported C++ function to avoid graph break (#133712)"
This reverts commit 71dd52f51a.

Reverted https://github.com/pytorch/pytorch/pull/133712 on behalf of https://github.com/ZainRizvi due to breaking main windows cpu tests - this stack still causes that windows test to fail ([comment](https://github.com/pytorch/pytorch/pull/133712#issuecomment-2299776241))
2024-08-20 21:14:45 +00:00
Xuehai Pan
71dd52f51a [RFC][dynamo] add decorator to register polyfill for unsupported C++ function to avoid graph break (#133712)
Add decorator `torch.compiler.substitute_in_graph` to register polyfill for unsupported C++ function to avoid graph break. This API provides an official way to add support for dynamo for third-party C extensions. Also, it can be used to simplify our implementation for `torch._dynamo.polyfill`.

5ee070266f/torch/_dynamo/variables/builtin.py (L97-L107)

Example:

```python
>>> import operator
>>> operator.indexOf([1, 2, 3, 4, 5], 3)
2

>>> torch.compile(operator.indexOf, fullgraph=True)([1, 2, 3, 4, 5], 3)
Unsupported: ...

>>> @torch.compiler.substitute_in_graph(operator.indexOf)
... def indexOf(sequence, x):
...     for i, item in enumerate(sequence):
...         if item is x or item == x:
...             return i
...     raise ValueError("sequence.index(x): x not in sequence")

>>> torch.compile(operator.indexOf, fullgraph=True)([1, 2, 3, 4, 5], 3)
2
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133712
Approved by: https://github.com/jansel
2024-08-20 19:48:57 +00:00
PyTorch MergeBot
2bd02e0c82 Revert "[RFC][dynamo] add decorator to register polyfill for unsupported C++ function to avoid graph break (#133712)"
This reverts commit 641724ed1d.

Reverted https://github.com/pytorch/pytorch/pull/133712 on behalf of https://github.com/jeanschmidt due to breaking main windows cpu tests - reverting them all, so we can identify the culprit with more calmness ([comment](https://github.com/pytorch/pytorch/pull/133712#issuecomment-2298528797))
2024-08-20 10:34:41 +00:00
PyTorch MergeBot
68570fca69 Revert "Add MaskedTensor support to *_like API (#128637)"
This reverts commit 8de56e2958.

Reverted https://github.com/pytorch/pytorch/pull/128637 on behalf of https://github.com/jeanschmidt due to Introduced API linting errors ([comment](https://github.com/pytorch/pytorch/pull/128637#issuecomment-2298270307))
2024-08-20 08:26:28 +00:00
Xuehai Pan
641724ed1d [RFC][dynamo] add decorator to register polyfill for unsupported C++ function to avoid graph break (#133712)
Add decorator `torch.compiler.substitute_in_graph` to register polyfill for unsupported C++ function to avoid graph break. This API provides an official way to add support for dynamo for third-party C extensions. Also, it can be used to simplify our implementation for `torch._dynamo.polyfill`.

5ee070266f/torch/_dynamo/variables/builtin.py (L97-L107)

Example:

```python
>>> import operator
>>> operator.indexOf([1, 2, 3, 4, 5], 3)
2

>>> torch.compile(operator.indexOf, fullgraph=True)([1, 2, 3, 4, 5], 3)
Unsupported: ...

>>> @torch.compiler.substitute_in_graph(operator.indexOf)
... def indexOf(sequence, x):
...     for i, item in enumerate(sequence):
...         if item is x or item == x:
...             return i
...     raise ValueError("sequence.index(x): x not in sequence")

>>> torch.compile(operator.indexOf, fullgraph=True)([1, 2, 3, 4, 5], 3)
2
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133712
Approved by: https://github.com/jansel
2024-08-19 22:14:33 +00:00
nowtryz
8de56e2958 Add MaskedTensor support to *_like API (#128637)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128637
Approved by: https://github.com/cpuhrsch
2024-08-19 22:13:59 +00:00
PyTorch MergeBot
35f36363ec Revert "[dtensor] move DTensor to public namespace (#133113)"
This reverts commit 2ee6b97464.

Reverted https://github.com/pytorch/pytorch/pull/133113 on behalf of https://github.com/wanchaol due to looks like it break some internal type imports ([comment](https://github.com/pytorch/pytorch/pull/133113#issuecomment-2295670911))
2024-08-19 05:00:19 +00:00
Wanchao Liang
2ee6b97464 [dtensor] move DTensor to public namespace (#133113)
Moving DTensor to be in the public namespace, to formally add the
documentation page that includes all the public APIs. This includes:

* many path renames and path import fixes
* a dedicated doc page without too much content yet (adding in the next
  PRs)
* To preserve the BC for users still using the `torch.distributed._tensor`,
  I added a shim script to redirect old path calls to the new module

The BC preserving is evidented by the fact that all DTensor tests are still
working without changing the public imports. So it's safe to land the
changes

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133113
Approved by: https://github.com/XilunWu
ghstack dependencies: #133305, #133306
2024-08-17 05:09:52 +00:00
Mikayla Gawarecki
018e48c337 [Reland] Add wrappers for synchronous GPUDirect Storage APIs (#133489)
Reland #130633

USE_CUFILE turned off by default in this version
Pull Request resolved: https://github.com/pytorch/pytorch/pull/133489
Approved by: https://github.com/albanD
2024-08-15 17:11:52 +00:00
Sahdev Zala
19270cff61 Add a reference for the LRScheduler class (#133243)
The `LRScheduler` class provides methods to adjusts the learning rate during optimization (as updated in this PR). Also, as a note, all the classes of lr_scheduluer are already provided in the `How to adjust learning rate` section.

Fixes #127884

Pull Request resolved: https://github.com/pytorch/pytorch/pull/133243
Approved by: https://github.com/janeyx99
2024-08-13 16:20:22 +00:00
fduwjj
dc8bb2636c [c10d][doc] Add docs for ENV variables TORCH_NCCL_ASYNC_ERROR_HANDLING TORCH_NCCL_TRACE_CPP_STACK and TORCH_NCCL_COORD_CHECK_MILSEC (#132920)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132920
Approved by: https://github.com/fegin, https://github.com/wconstab
2024-08-09 21:08:20 +00:00
Edward Z. Yang
1f66487c69 [BE] Reroute all uses of proxy_tensor.maybe_disable_fake_tensor_mode to fake_tensor.unset_fake_temporarily (#132770)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132770
Approved by: https://github.com/bdhirsh
2024-08-08 23:07:23 +00:00
PyTorch MergeBot
d1f73fd844 Revert "[BE] Reroute all uses of proxy_tensor.maybe_disable_fake_tensor_mode to fake_tensor.unset_fake_temporarily (#132770)"
This reverts commit 902c6f3a19.

Reverted https://github.com/pytorch/pytorch/pull/132770 on behalf of https://github.com/ezyang due to Removed API was recommitted ([comment](https://github.com/pytorch/pytorch/pull/132770#issuecomment-2275749689))
2024-08-08 12:54:34 +00:00
Edward Z. Yang
902c6f3a19 [BE] Reroute all uses of proxy_tensor.maybe_disable_fake_tensor_mode to fake_tensor.unset_fake_temporarily (#132770)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132770
Approved by: https://github.com/bdhirsh
ghstack dependencies: #132674, #132675, #132421, #132062, #132767, #132769
2024-08-08 12:03:25 +00:00
Edward Z. Yang
aec6332356 Only thunkify proxies in some situations (#132421)
The goal of this PR is to avoid stack overflow when we create extremely long chains of thunks, and then evaluate them (e.g., as occurs if you sum(long list of symint)). The basic idea behind this PR is to only thunkify proxies if they're being created in places where they may or may not be used--crucially, symint operations that occur in user code we are tracing are eagerly placed into the graph, even if they may eventually be dead.

I annotated the PR with explanation of changes.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132421
Approved by: https://github.com/Skylion007, https://github.com/zou3519
ghstack dependencies: #132674, #132675
2024-08-08 12:03:06 +00:00
Edward Z. Yang
361db32d47 Consolidate SymDispatchMode into ProxyTensorMode (#132674)
Instead of having a separate context variable for SymDispatchMode, we
now simply delegate to the current active proxy tensor mode when we
need to trace a SymInt.  We maintain a separate `__sym_dispatch__` magic
method as the calling convention is different than `__torch_dispatch__`.

Consolidating the modes in this ways means that we can consistently
disable both of these modes in tandem simply by removing the mode
from the proxy mode infra slot.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132674
Approved by: https://github.com/zou3519, https://github.com/bdhirsh
2024-08-08 12:02:54 +00:00
daitian1995
aff48f7378 Autoselect default device in FSDP construction. (#127609)
There are still some differences between CUDA and non-CUDA custom devices when
construct FSDP because CUDA is selected as the default device. For example,
when construct FSDP from CPU model and device_id is not passed, device_handle
will choose CUDA as default device. This PR will autoselect the real device
as the default device.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/127609
Approved by: https://github.com/awgu
2024-08-08 05:25:17 +00:00
PyTorch MergeBot
a9ff190867 Revert "Consolidate SymDispatchMode into ProxyTensorMode (#132674)"
This reverts commit ffdf48e63b.

Reverted https://github.com/pytorch/pytorch/pull/132674 on behalf of https://github.com/PaliC due to We need to now revert https://github.com/pytorch/pytorch/pull/132216 in OSS and there is a dependency on this pr ([comment](https://github.com/pytorch/pytorch/pull/132674#issuecomment-2274062785))
2024-08-07 18:25:33 +00:00
PyTorch MergeBot
780310fed7 Revert "Only thunkify proxies in some situations (#132421)"
This reverts commit bb99008c9e.

Reverted https://github.com/pytorch/pytorch/pull/132421 on behalf of https://github.com/clee2000 due to I think this broke dynamo/test_subclasses.py::TestNestedTensor::test_in_graph_construction_from_input [GH job link](https://github.com/pytorch/pytorch/actions/runs/10283744685/job/28459340678) [HUD commit link](bb99008c9e).  Test got added in f50621989b which is before your merge base ([comment](https://github.com/pytorch/pytorch/pull/132421#issuecomment-2273742960))
2024-08-07 15:29:54 +00:00
Edward Z. Yang
bb99008c9e Only thunkify proxies in some situations (#132421)
The goal of this PR is to avoid stack overflow when we create extremely long chains of thunks, and then evaluate them (e.g., as occurs if you sum(long list of symint)). The basic idea behind this PR is to only thunkify proxies if they're being created in places where they may or may not be used--crucially, symint operations that occur in user code we are tracing are eagerly placed into the graph, even if they may eventually be dead.

I annotated the PR with explanation of changes.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132421
Approved by: https://github.com/Skylion007, https://github.com/zou3519
ghstack dependencies: #132674, #132675
2024-08-07 11:51:17 +00:00
Edward Z. Yang
ffdf48e63b Consolidate SymDispatchMode into ProxyTensorMode (#132674)
Instead of having a separate context variable for SymDispatchMode, we
now simply delegate to the current active proxy tensor mode when we
need to trace a SymInt.  We maintain a separate `__sym_dispatch__` magic
method as the calling convention is different than `__torch_dispatch__`.

Consolidating the modes in this ways means that we can consistently
disable both of these modes in tandem simply by removing the mode
from the proxy mode infra slot.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132674
Approved by: https://github.com/zou3519, https://github.com/bdhirsh
2024-08-06 17:03:17 +00:00
Wouter Devriendt
e8645fa2b9 [Doc] fix some typos (found by codespell and typos) (#132544)
Applying doc fixes from PR https://github.com/pytorch/pytorch/pull/127267 - with CLA
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132544
Approved by: https://github.com/kit1980
2024-08-05 17:21:56 +00:00
Xuehai Pan
4226ed1585 [BE] Format uncategorized Python files with ruff format (#132576)
Remove patterns `**`, `test/**`, and `torch/**` in `tools/linter/adapters/pyfmt_linter.py` and run `lintrunner`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/132576
Approved by: https://github.com/ezyang, https://github.com/Skylion007
ghstack dependencies: #132574
2024-08-04 17:13:31 +00:00
Syed Tousif Ahmed
7c89ec0f7c Implements torch.cuda.MemPool() API (#131152)
In this PR:
- Pool id creation logic is refactored and moved to a MemPool class. `graph_pool_handle()` API now uses `torch.cuda.MemPool()` to get a unique id for a pool. Existing tests should cover this change.
- MemPool holds a pointer to a CUDAAllocator as proposed in https://github.com/pytorch/pytorch/issues/124807#issuecomment-2077506997. Tests are added to show usage with CUDAPluggableAllocator.
- MemPoolContext API makes a mempool active. Tests are added to show usage of this API. This API will be used in CUDACachingAllocator to route allocations to a user provided allocator. See draft here: https://github.com/pytorch/pytorch/pull/125722/

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131152
Approved by: https://github.com/eqy, https://github.com/ezyang
2024-08-01 01:29:30 +00:00
Luca Wehrstedt
f4f7aba75d Expose function to probe whether PyTorch was built with FlashAttention (#131894)
This is needed by downstream projects (e.g., xFormers) to determine whether they can count on FlashAttention in PyTorch or whether they need to build it themselves.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/131894
Approved by: https://github.com/drisspg, https://github.com/eqy
2024-07-31 11:33:09 +00:00
ekamiti
9e473fd868 Make adding Buffers more like adding Parameters (#125971)
Add similar semantics for creating a buffer object similar to creating a parameter. This is done by introducing a new Buffer class that can be used for type disambiguation. The underlying functionality of registering a buffer remains the same as the register_buffer method has not been changed. The persistent parameter in the Buffer type is to indicate whether a buffer object should be persistent or not. Other non-test changes have to do with getting the new Buffer type recognized by inductor and dynamo. Remaining changes are test changes to make sure that the Buffer type can be used as a drop in replacement for register_buffer as it just leads to register_buffer being called. The addition of this new functionality still allows for normal tensors to be used as buffers so these changes are intended to be backwards compatible.

Fixes #35735

Co-authored-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125971
Approved by: https://github.com/albanD, https://github.com/anijain2305, https://github.com/mlazos
2024-07-31 10:32:40 +00:00
Simon Mahns
dcb03106b7 [Land Internally] MTIA equivalent of torch.cuda.memory_stats (#132007)
Summary: as title

Test Plan: pytorch ci failing: https://github.com/pytorch/pytorch/issues/131962

Differential Revision: D60335413

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132007
Approved by: https://github.com/hanzlfs, https://github.com/egienvalue
2024-07-29 20:47:18 +00:00
PyTorch MergeBot
eb9409511e Revert "support zb1p and zb2p algorithms (#130752)"
This reverts commit 8fe5b93667.

Reverted https://github.com/pytorch/pytorch/pull/130752 on behalf of https://github.com/atalman due to Broke Periodic CI: distributed/pipelining/test_composability.py::ComposabilityTest::test_manual_with_data_parallel_dp_type_DDP_ScheduleClass4 [GH job link](https://github.com/pytorch/pytorch/actions/runs/10131472868/job/28014900187) [HUD commit link](8fe5b93667) ([comment](https://github.com/pytorch/pytorch/pull/130752#issuecomment-2255819078))
2024-07-29 12:40:00 +00:00