[muon] Introduce Muon optimizer to PyTorch (#160213)
A single-device version of Muon. The algorithm follows Keller Jordan's [Muon blogpost](https://kellerjordan.github.io/posts/muon/) and optionally incorporates [Moonshot's](https://github.com/MoonshotAI/Moonlight/blob/master/Moonlight.pdf) learning rate adjustment strategy.

This implementation maintains a minimalist API consistent with other optimizer conventions. The PyTorch team prefers to handle parameter filtering at a higher level: the Muon optimizer performs only the msign computation for orthogonalization on all parameters it receives, and users are responsible for grouping parameters for different optimizers as needed. An example usage is shown below, and a more detailed example will be added to the [PyTorch examples](https://github.com/pytorch/examples) directory.

**Usage**

```python
    model = MyModelForCausalLM(...)
    # filter out your params manually
    muon_params = [...]
    adamw_params = [...]
    muon = Muon(
        params=muon_params,
        lr=lr,
        weight_decay=wd,
    )
    adamw = AdamW(
        params=adamw_params,
        lr=lr,
        weight_decay=wd,
    )

    # in the training loop
    loss = model(input)
    loss.backward()
    muon.step()
    adamw.step()
    muon.zero_grad()
    adamw.zero_grad()
```
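Since Muon only accepts 2D parameters (see the note at the end), one reasonable way to do the manual filtering above is to route 2D weight matrices to Muon and everything else (biases, norms, embeddings, the LM head) to AdamW. A minimal sketch, assuming conventional parameter names; excluding embeddings and the head follows Keller's blogpost:

```python
muon_params = [
    p for name, p in model.named_parameters()
    if p.ndim == 2 and "embed" not in name and "lm_head" not in name
]
adamw_params = [
    p for name, p in model.named_parameters()
    if not (p.ndim == 2 and "embed" not in name and "lm_head" not in name)
]
```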

~~**Additional usage**~~
~~Users were also able to pass in a self-defined `msign` function for orthogonalization and a learning rate adjustment function, with the interface defined below:~~

```python
# Removed after review (see below):
AdjustLrFn: TypeAlias = Callable[[float, torch.Size], float]
MsignFn: TypeAlias = Callable[[Tensor, BaseMsignFnConfig], Tensor]
```

As discussed with the team and in the review comments, we prefer a simpler, cleaner interface, so we removed the callback interface and canonicalized the original Newton-Schulz (NS) algorithm for Muon. The only configs available to users are `ns_steps`, `coefficients`, and `eps`, configurable through kwargs.
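For illustration, configuring those knobs would look something like the sketch below, reusing the names from the usage example above (treat the kwarg names as assumptions drawn from the text, not the final signature):

```python
muon = Muon(
    params=muon_params,
    lr=lr,
    weight_decay=wd,
    ns_steps=5,                              # number of Newton-Schulz iterations
    coefficients=(3.4445, -4.7750, 2.0315),  # quintic NS coefficients from Keller's post
    eps=1e-7,                                # numerical-stability term (assumed default)
)
```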

By default, we use a 5-step Newton-Schulz iteration with the coefficients proposed by [Keller](https://kellerjordan.github.io/posts/muon/), and the LR adjustment proposed by [Moonshot](https://github.com/MoonshotAI/Moonlight/blob/master/Moonlight.pdf), which grafts the learning rate from AdamW.
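For reference, the Newton-Schulz orthogonalization at the heart of Muon looks roughly like the sketch below, following the iteration from Keller's blogpost (the function name and dtype choices here are illustrative, not necessarily what the PyTorch implementation does internally):

```python
import torch

def newton_schulz_msign(G: torch.Tensor, steps: int = 5, eps: float = 1e-7) -> torch.Tensor:
    # Quintic Newton-Schulz iteration approximating msign(G) = U @ V^T,
    # where G = U @ S @ V^T is the SVD. Coefficients from Keller's blogpost.
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (G.norm() + eps)  # normalize so the spectral norm is <= 1
    transposed = X.size(0) > X.size(1)
    if transposed:
        X = X.mT  # iterate on the smaller Gram matrix
    for _ in range(steps):
        A = X @ X.mT
        X = a * X + (b * A + c * A @ A) @ X
    if transposed:
        X = X.mT
    return X
```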

**Testing**

~~1. Unit tests: the newly introduced Muon is covered in `test/test_optim.py`. We updated the test cases to pass named parameters to the optimizer under test. Additionally, we introduced a new test case to verify that when the user provides an empty FQN list, Muon correctly falls back to AdamW behavior.~~

1. Unit tests: as discussed, in order not to complicate the codebase, we prefer not to include the reference implementation in PyTorch. We also updated the interface, so the FQN-based filtering no longer needs testing. Muon is covered by the existing unit tests in `test/test_optim.py`.

2. End-to-end test: we added a training script that pre-trains a QWEN-like model on the `openwebtext-100k` dataset. We trained for one epoch, and the resulting loss curve was compared against the Moonshot implementation to confirm behavioral consistency.
<img width="1102" height="472" alt="Screenshot 2025-07-29 at 1 04 12 AM" src="https://github.com/user-attachments/assets/ceab0733-497d-4070-8032-02ae7995c64c" />

**Numerics**
We compare our implementation against existing implementations to confirm numerical consistency.

As discussed, our implementation closely follows the algorithm described in [Keller's post](https://kellerjordan.github.io/posts/muon/), while incorporating the learning rate adjustment from [Moonlight](https://github.com/MoonshotAI/Moonlight/blob/master/Moonlight.pdf). This captures a key insight that lets users reuse hyperparameters tuned for `AdamW`, making Muon a drop-in replacement.
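Concretely, the Moonlight-style adjustment rescales the orthogonalized update based on the parameter's shape so its RMS matches AdamW's. A minimal sketch, assuming the `0.2 * sqrt(max(fan_out, fan_in))` factor from the Moonlight paper (the function name is illustrative):

```python
import math
import torch

def adjust_lr(lr: float, param_shape: torch.Size) -> float:
    # Scale the Muon update so its RMS matches a typical AdamW update,
    # per the Moonlight paper: lr * 0.2 * sqrt(max(fan_out, fan_in)).
    fan_out, fan_in = param_shape
    return lr * 0.2 * math.sqrt(max(fan_out, fan_in))
```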

As expected, the numerics difference mainly comes from `adjust_lr`: a max of ~5% relative diff in the example unit-test setup below.

```python
    # imports assumed for this snippet; KellySingleDeviceMuon is the reference
    # single-device implementation from Keller's Muon repo
    import copy

    import torch
    from torch.nn import Linear, MSELoss
    from torch.optim import Muon

    # dummy model and data
    model0 = Linear(10, 10, bias=False)
    model1 = copy.deepcopy(model0)
    inputs = torch.randn(8, 10)
    targets = torch.randn(8, 10)
    loss = MSELoss()

    lr = 1e-3
    wd = 0.1
    momentum = 0.95

    opt_ref_muon = KellySingleDeviceMuon(
        params=model0.parameters(),
        lr=lr,
        weight_decay=wd,
        momentum=momentum,
    )

    opt_exp_muon = Muon(
        params=model1.parameters(),
        lr=lr,
        weight_decay=wd,
        momentum=momentum,
    )

    out_ref = model0(inputs)
    loss_ref = loss(out_ref, targets)
    opt_ref_muon.zero_grad()
    loss_ref.backward()
    opt_ref_muon.step()

    out_exp = model1(inputs)
    loss_exp = loss(out_exp, targets)
    opt_exp_muon.zero_grad()
    loss_exp.backward()
    opt_exp_muon.step()

    for p_ref, p_exp in zip(model0.parameters(), model1.parameters()):
        torch.testing.assert_close(p_ref, p_exp)
```

As explained above, including this `adjust_lr` is preferable. This is validated by e2e training runs on a Qwen-2-like 0.5B model, where the curves show that training with `adjust_lr` converges more effectively than without it.
<img width="1179" height="464" alt="Screenshot 2025-08-18 at 10 12 33 AM" src="https://github.com/user-attachments/assets/e797d3da-c2f0-4187-b99e-5d48b7437c3c" />

**Performance**
Training for one epoch of openwebtext-100k on eight H100 GPUs with DDP:

- adamw_ddp finishes in 13.12 min
- pytorch_muon_ddp finishes in 13.45 min

Muon runs ~20 s slower than AdamW (13.45 − 13.12 = 0.33 min ≈ 20 s); assuming no other changes, Muon is *2.5%* slower than AdamW.

AdamW: `Optimizer.step()` takes ~13.5 ms, step time ~930 ms
<img width="726" height="590" alt="Screenshot 2025-07-29 at 1 56 14 AM" src="https://github.com/user-attachments/assets/ebcd7e1c-d129-4b20-9396-39f568edf03d" />

Muon: `Optimizer.step()` takes ~54 ms, step time ~960 ms
<img width="751" height="597" alt="Screenshot 2025-07-29 at 2 02 20 AM" src="https://github.com/user-attachments/assets/72f5b904-ebd5-4502-a6ff-d3e9e5a6da81" />

**Note**
We restrict the implementation to accept only 2D parameters.

An alternative approach is to allow parameters with more than two dimensions and apply orthogonalization over the last two. We opted not to take this approach because it can be error-prone: for example, with a kernel shaped `[in_channel, height, width, out_channel]`, applying orthogonalization to the last two dimensions is not meaningful.

Since Muon's orthogonalization is defined on 2D matrices, preserving this assumption keeps the implementation clean and sound.
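In practice this just means the optimizer can validate shapes up front; a hypothetical sketch of such a check (the actual error type and message in PyTorch are assumptions here):

```python
for p in muon_params:
    if p.ndim != 2:
        raise ValueError(
            f"Muon only supports 2D parameters, got shape {tuple(p.shape)}; "
            "route non-2D parameters to another optimizer such as AdamW."
        )
```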

**Next Steps**

1. Add `MuP`
2. Open-source an optimized Triton kernel for symmetric matmul. A preliminary benchmark found a 1.23x–1.48x speedup on small to large matrices (n = 256 → 16384).
3. Open-source unsharded Muon co-designed with FSDP2.


Pull Request resolved: https://github.com/pytorch/pytorch/pull/160213
Approved by: https://github.com/janeyx99