Commit Graph

32467 Commits

Author SHA1 Message Date
Bin Bao
3058700f7f [aotinductor] Add AOTIModelRunner as a utility class (#110891)
Summary: Introduce a utility class AOTIModelRunner to take care of running an AOTInductor compiled model. It does things like dlopen a model, initialize the model container, setup inputs and outputs, and destroy the model container.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110891
Approved by: https://github.com/chenyang78
ghstack dependencies: #110652
2023-10-11 15:58:28 +00:00
ydwu4
3062e267b1 [cond] Add more tests for valid inputs of cond (#110727)
This PR adds a parametrized test for cond. It tests that cond can be traced with valid inputs. Specifically, valid inputs are combinations of:
- pred (python boolean, boolean tensor, int tensor, scalar tensor)
- true_fn/false_fn (func, obj, nn_module)
- operands (0 or more tensor inputs), tested with 0 and 2
- closures (0 or more tensor closures), tested with 0 and 2
- nested_level (no nesting or level-2 nested cond)
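
The covered combinations listed above can be enumerated as a plain Cartesian product. This is only a sketch of how such a parametrization could be laid out; the string labels and the `valid_input_cases` helper are illustrative names, not the identifiers used in the actual test.

```python
import itertools

# Labels are hypothetical; they mirror the bullet list in the commit message.
PRED_KINDS = ["python_bool", "bool_tensor", "int_tensor", "scalar_tensor"]
FN_KINDS = ["func", "obj", "nn_module"]
NUM_OPERANDS = [0, 2]
NUM_CLOSURES = [0, 2]
NESTING = ["no_nesting", "level_2"]


def valid_input_cases():
    """Return every (pred, fn, operands, closures, nesting) combination."""
    return list(
        itertools.product(PRED_KINDS, FN_KINDS, NUM_OPERANDS, NUM_CLOSURES, NESTING)
    )


cases = valid_input_cases()
```

Under these assumptions the product has 4 * 3 * 2 * 2 * 2 = 96 cases, each of which would become one parametrized test instance.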

What this test doesn't cover:
- pred: symbolic boolean expression as predicate
- true_fn/false_fn: functions that mutate intermediate tensors
- operands: non-tensor operands such as float, int
- closures: nn_module attribute closures, python constant closures
- nested_level: 3+

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110727
Approved by: https://github.com/zou3519
2023-10-11 15:56:13 +00:00
Nikita Shulga
ef19824db8 Suppress warnings in tensorpipe.h (#111012)
To fix distributed compilation with clang-15

Fixes https://github.com/pytorch/pytorch/issues/110974

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111012
Approved by: https://github.com/huydhn, https://github.com/drisspg, https://github.com/Skylion007
2023-10-11 15:41:30 +00:00
Sam Larsen
fc1105b282 [inductor] Implement Fx graph caching to improve warm compilation time. (#103453)
Summary: Implement an on-disk cache to save and reuse compiled FX Graphs. This implementation does not handle tensors with symbolic shapes. This needs to be done in a follow-up PR.
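
As a rough illustration of the idea (not Inductor's actual implementation), an on-disk compile cache can be sketched as a directory of pickled results keyed by a hash of the graph text and the relevant configuration. `DiskCache`, `compile_with_cache`, and the string-based "graph" are hypothetical stand-ins.

```python
import hashlib
import json
import os
import pickle
import tempfile


class DiskCache:
    """Hypothetical on-disk cache: one pickled file per cache key."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    @staticmethod
    def key(graph_text, config):
        # Hash everything that can change the compiled output.
        payload = json.dumps({"graph": graph_text, "config": config}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def load(self, key):
        path = os.path.join(self.root, key)
        if not os.path.exists(path):
            return None
        with open(path, "rb") as f:
            return pickle.load(f)

    def save(self, key, value):
        # Write to a temp file first, then rename, so readers never see a partial file.
        fd, tmp_path = tempfile.mkstemp(dir=self.root)
        with os.fdopen(fd, "wb") as f:
            pickle.dump(value, f)
        os.replace(tmp_path, os.path.join(self.root, key))


def compile_with_cache(cache, graph_text, config, compile_fn):
    """Return (result, was_cache_hit); compile only on a miss."""
    key = cache.key(graph_text, config)
    cached = cache.load(key)
    if cached is not None:
        return cached, True
    result = compile_fn(graph_text)
    cache.save(key, result)
    return result, False
```

The second call with the same graph text and config would be served from disk, which is the "warm compilation" case the commit title refers to.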

Test Plan:
* New unit tests exercising saving to and loading from the cache.
* New unit tests to exercise the cache key calculations.
* Ran several benchmarks to see cache hit and resulting compilation times.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103453
Approved by: https://github.com/eellison, https://github.com/Chillee
2023-10-11 14:39:14 +00:00
rzou
2cf9782912 [generate_opcheck_tests] Add some reasonable defaults (#110977)
Summary:
Make it easier to add `generate_opcheck_tests` by adding defaults for
the failures_dict location, the additional decorators, and the test
utils.

Test Plan:
Existing tests

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110977
Approved by: https://github.com/williamwen42
ghstack dependencies: #110951
2023-10-11 14:28:05 +00:00
PyTorch MergeBot
98c329b19e Revert "[core ATen IR] Add decompositions for max, min, var_mean (#110906)"
This reverts commit 9606cda64e.

Reverted https://github.com/pytorch/pytorch/pull/110906 on behalf of https://github.com/SS-JIA due to Breaks internal CI ([comment](https://github.com/pytorch/pytorch/pull/110906#issuecomment-1757490740))
2023-10-11 11:41:21 +00:00
Rohan Varma
de370eb313 [Distributed] Small nits to apply_optimizer_in_backward (#110903)
Clarify a few things around the documentation

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110903
Approved by: https://github.com/janeyx99
2023-10-11 07:45:45 +00:00
PyTorch MergeBot
0821868110 Revert "[export] Get export APIs ready for PTC (#110410)"
This reverts commit b96ea9f361.

Reverted https://github.com/pytorch/pytorch/pull/110410 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/110410#issuecomment-1757017249))
2023-10-11 07:31:51 +00:00
Angela Yi
b96ea9f361 [export] Get export APIs ready for PTC (#110410)
Summary:
https://docs.google.com/document/d/1QJJEGnj2nHGPODlw38BEG3KLLCOTfdOVjPrNQbz_LM8/edit#bookmark=id.lp80wfshq130
Changes:
* `torch.export` will return a functional ATen graph w/o decompositions
* `exported_program.run_decompositions(decomposition_table)` will optionally take a decomposition table, and run decompositions on the exported program, returning a new exported program. By default we will run the Core ATen decomposition table.

Calling convention for Executorch stays the same:
```
pre_autograd_graph = capture_pre_autograd_graph(f, args, ...)
aten_graph_no_decomps = torch.export.export(pre_autograd_graph, args, ...)
# Within to_edge we decompose to core aten and then convert to edge
edge_graph = exir.to_edge(aten_graph_no_decomps)
```

Test Plan: CI

Differential Revision: D49742989

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110410
Approved by: https://github.com/ydwu4
2023-10-11 06:10:07 +00:00
Michael Voznesensky
1e7947b3e0 Revert "Reland 3rd try [finishing colesbury's PR 100642] Guard on nn.Module dicts and type (#109323)" + Forward fixes + test (#110964)
This reverts commit f786fbdebd.

Forward fixes

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110964
Approved by: https://github.com/ezyang, https://github.com/anijain2305
2023-10-11 05:16:47 +00:00
Nikita Shulga
e49ea87162 Fix socket.cpp compilation using gcc-9.4 (#111002)
Otherwise, the following error is thrown when attempting to compile with WERROR enabled:
```
In file included from /home/nshulga/git/pytorch/pytorch/torch/csrc/distributed/c10d/socket.cpp:30:
/home/nshulga/git/pytorch/pytorch/third_party/fmt/include/fmt/chrono.h:340:24: warning: redundant redeclaration of ‘constexpr’ static data member ‘fmt::v10::detail::codecvt_result<CodeUnit>::max_size’ [-Wdeprecated]
  340 | constexpr const size_t codecvt_result<CodeUnit>::max_size;
      |                        ^~~~~~~~~~~~~~~~~~~~~~~~
/home/nshulga/git/pytorch/pytorch/third_party/fmt/include/fmt/chrono.h:335:33: note: previous declaration of ‘fmt::v10::detail::codecvt_result<CodeUnit>::max_size’
  335 |   static constexpr const size_t max_size = 32;
      |                                 ^~~~~~~~
```
or the following, if using clang as the host compiler:
```
In file included from /Users/nshulga/git/pytorch/pytorch/torch/csrc/distributed/c10d/socket.cpp:30:
/Users/nshulga/git/pytorch/pytorch/third_party/fmt/include/fmt/chrono.h:340:50: warning: out-of-line definition of constexpr static data member is redundant in C++17 and is deprecated [-Wdeprecated]
constexpr const size_t codecvt_result<CodeUnit>::max_size;
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111002
Approved by: https://github.com/drisspg
2023-10-11 05:16:00 +00:00
wz337
a614281ea9 Add current_device() to torch.cpu (#110987)
To better support device-agnostic code, add a `current_device()` to torch.cpu that returns "cpu", so that we won't run into `AttributeError: module 'torch.cpu' has no attribute 'current_device'`.
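
The benefit for device-agnostic code can be sketched without PyTorch. In the snippet below, `fake_cpu` and `fake_accelerator` are hypothetical stand-ins for per-backend modules, and `current_device_for` is an illustrative helper; the point is only that the caller no longer needs a special case once every backend exposes `current_device()`.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for per-backend modules (not real PyTorch objects).
fake_cpu = SimpleNamespace(current_device=lambda: "cpu")
fake_accelerator = SimpleNamespace(current_device=lambda: 0)

BACKENDS = {"cpu": fake_cpu, "accelerator": fake_accelerator}


def current_device_for(device_type):
    """Look up the backend and ask it for its current device, with no cpu special case."""
    backend = BACKENDS[device_type]
    return backend.current_device()
```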

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110987
Approved by: https://github.com/wanchaol
2023-10-11 05:13:10 +00:00
soulitzer
110382bacf Make NestedTensor compilable with eager backend (#109171)
In this PR:
- Adds support for strides for jagged tensor (design doc for this coming soon)
- NestedTensor skips automatic dynamic
- Make use of @bdhirsh's subclass fakification logic by adding the __tensor_{un,}flatten__ functions.
- Additional logic for fakification: the existing subclass fakification logic does not handle the case where the outer tensor has an additional dimension, so we insert one-off logic to (1) insert an extra SingletonSymInt onto the fakified NestedTensor and (2) make sure we call track_symint on the sizes of both the inner and outer tensors during guard creation.

Remaining things that are weird:
- Still need to skip some logic in meta utils for some reason. (I was going to write this up more, but decided not to since we're not able to do this anyway for an immediate reason: we cannot arbitrarily compare singleton ints. For now I'm just following Brian's advice from [here](https://github.com/pytorch/pytorch/pull/109171#discussion_r1328137070).)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109171
Approved by: https://github.com/ezyang, https://github.com/bdhirsh
2023-10-11 04:47:10 +00:00
drisspg
e0dbaa04d2 Fix the meta func for mem_eff_backward (#110893)
Fixes #110832

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110893
Approved by: https://github.com/eellison
2023-10-11 02:58:54 +00:00
andrewor14
0e551bbcd7 [quant][pt2] Preserve source_fn_stack after QAT fusion (#110899)
Test Plan:
python test/test_quantization.py TestQuantizePT2EQAT.test_qat_preserve_source_fn_stack

Reviewers: jerryzh168, kimishpatel

Subscribers: jerryzh168, kimishpatel, supriyar

Differential Revision: [D50101253](https://our.internmc.facebook.com/intern/diff/D50101253)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110899
Approved by: https://github.com/jerryzh168
2023-10-11 02:55:52 +00:00
Tugsbayasgalan Manlaibaatar
5aee22e0e0 Move export.constrain_as_* to torch._constrain_as_* (#110757)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110757
Approved by: https://github.com/avikchaudhuri
ghstack dependencies: #109859
2023-10-11 02:37:55 +00:00
soulitzer
c9eb8d8d90 Add set_checkpoint_debug_enabled that overrides local setting (#110728)
People access activation checkpoint through many layers of config and it is not always guaranteed that all the layers of wrapping around checkpoint properly propagate all the kwargs, e.g. debug mode. This context manager offers an alternative way to enable debug mode that bypasses the need for all layers to propagate kwargs.
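
The pattern described here, a global override that wins over a locally passed flag, can be sketched as follows. This is not the PyTorch implementation; `set_debug_enabled`, `checkpoint`, and `library_wrapper` are simplified, hypothetical stand-ins that only show why a context manager sidesteps the kwarg-propagation problem.

```python
import contextlib

# None means "no override": fall back to whatever the caller passed locally.
_debug_override = None


@contextlib.contextmanager
def set_debug_enabled(enabled):
    """Temporarily force debug mode on or off, regardless of local kwargs."""
    global _debug_override
    previous = _debug_override
    _debug_override = enabled
    try:
        yield
    finally:
        _debug_override = previous


def checkpoint(fn, *args, debug=False):
    """Stand-in for a checkpoint call; returns the result and the effective debug flag."""
    effective_debug = debug if _debug_override is None else _debug_override
    return fn(*args), effective_debug


def library_wrapper(fn, *args):
    """A wrapping layer that forgets to forward the debug kwarg."""
    return checkpoint(fn, *args)
```

Inside `with set_debug_enabled(True):`, a call through `library_wrapper` would see debug mode enabled even though the wrapper never forwards the flag.
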
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110728
Approved by: https://github.com/albanD
ghstack dependencies: #110673, #110674, #110675, #110676
2023-10-11 02:12:31 +00:00
Michael Voznesensky
02f6a8126e Support a simple subset of functions as backward hooks on intermediate tensors (#109537)
The main thrust of the initial effort here was to capture `register_hook` calls on tensors in compile regions. The first part of this was done in https://github.com/pytorch/pytorch/pull/108903, wherein we added support for register_hook on input tensors.

The distinction between input and intermediary is due to implementation differences.

There are 2 kinds of hooks:

1) Hooks on objects with sources (inputs, params)
2) Hooks on objects w/o sources (intermediaries, and outputs).

Note: As outputs can be made simple by how dynamo handles residuals, they could actually be handled as if they were inputs, but, for the sake of this PR, we will refer to hooks as either hooks on inputs (sourced), or hooks on intermediaries (not sourced).

**The plan:**

For tensors w/ a source: (The PR above)
We record registered hooks, store them as a global, and associate them with the tensor in residuals. This means that when dynamo goes to create the frame, where we produce bytecode to stitch together our PT2 modified bytecode with the original eager code, we call register_hook. This registration of hooks in residuals is sound because (a) it happens right after a PT2 frame region ends and (b) we know that the tensor is alive in f_locals, f_globals, or a module in the user's invoking frame. This means we can soundly know it will be around to invoke register_hook on. As long as we guard on the identity of the lifted function, this is sound to do.

For tensors w/o a source: (This PR)

Ostensibly, the most correct and complete solution would be to smuggle hooks into a runtime wrapper in aot_autograd, where all the items the hooks close over are lifted to inputs as necessary and passed alongside the user provided function. This is necessary so that we can properly trace out and capture all the mutations within the user defined hook at backwards time.

This is too complicated - so, we limited the scope of this initial PR to a simple subset of hooks:

- Hooks must have a source (be known to us already, not a lambda or intermediary defined function)
- We must be tracing under compiled autograd

**The flow**:

We use the HOP added in https://github.com/pytorch/pytorch/pull/109690/files, referred to as the HOP below.

1) We intercept register_hook calls and wrap the user defined fn in the HOP
2) We write a `_register_hook_trampoline` to the graph that is a local no-arg function that is invoked as a call_function in the dynamo graph
3) aot_autograd inlines through it during its trace, and sees the HOP
4) the HOP preserves itself in the graph - it does not get traced into
5) During backwards, compiled_autograd installs the HOP under a hook call
6) When compiled_autograd enters compilation over its generated graph, dynamo traces the contents of the hook

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109537
Approved by: https://github.com/ezyang
2023-10-11 01:35:37 +00:00
Jon Chuang
79212430df feat(inductor): fx graph debug should display device (#110346)
Device mismatch issues are the root cause of https://github.com/pytorch/pytorch/issues/107006, so make device-related scheduling issues easier to diagnose.
Also format single-kwarg graphs to be more concise.

Example rendering:
![image](https://github.com/pytorch/pytorch/assets/9093549/1b59a994-f2df-45c9-8cb7-37eb3ba12654)

CC code owners: @ngimel @jansel @shunting314 @mlazos @peterbell10

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110346
Approved by: https://github.com/eellison
2023-10-11 00:34:55 +00:00
Edward Z. Yang
24bf9aeb6b Fix arange with dynamic end argument. (#110979)
Fixes https://github.com/pytorch/pytorch/issues/93468

There's a few extra tests that are sort of unrelated, but I ended up writing them while working on the fix and decided to keep them. The big idea here is to split the `_check` so that `expect_true` works; I could have probably also improved the symbolic reasoning but I'm lazy. One small logging fix too.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110979
Approved by: https://github.com/Skylion007
2023-10-11 00:32:34 +00:00
leslie-fang-intel
a11d4a8378 [Reland] [Inductor] Break the loop fusion when node2 depends on node1 mutations (#110677)
Reland PR https://github.com/pytorch/pytorch/pull/109172, which was reverted in https://github.com/pytorch/pytorch/pull/110622

Differential Revision: [D50097373](https://our.internmc.facebook.com/intern/diff/D50097373)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110677
Approved by: https://github.com/jgong5, https://github.com/ezyang
2023-10-11 00:26:45 +00:00
PyTorch MergeBot
314a502eb0 Revert "Reland "[C10] PG observability hooks. (#108815)" (#110907)"
This reverts commit 7678cd22af.

Reverted https://github.com/pytorch/pytorch/pull/110907 on behalf of https://github.com/huydhn due to Sorry for reverting this, but macos job in trunk starts failing after this 7678cd22af ([comment](https://github.com/pytorch/pytorch/pull/110907#issuecomment-1756497387))
2023-10-11 00:23:42 +00:00
Jon Chuang
5aa96fd336 [dynamo] list index: add more list types to testing, support namedtuple, improve error handling (#110919)
Follow up: #110817

Minor improvements as discussed in prev PR

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110919
Approved by: https://github.com/ezyang
2023-10-11 00:16:39 +00:00
SS-JIA
9606cda64e [core ATen IR] Add decompositions for max, min, var_mean (#110906)
## Context

Add decompositions for `aten.max`, `aten.min`, and `aten.var_mean`. These operators follow a pattern of returning a tuple of outputs from two component operators:

```
aten.max(x) -> return aten.amax(x), aten.argmax(x)
aten.min(x) -> return aten.amin(x), aten.argmin(x)
aten.var_mean(x) -> return aten.var(x), aten.mean(x)
```
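
The tuple-of-two-ops pattern above can be illustrated on plain Python lists. This is a sketch only: the helper names are hypothetical, the inputs are 1-D lists rather than tensors, and the sample (Bessel-corrected) variance is an assumption made for the example.

```python
import statistics


def amax(xs):
    return max(xs)


def argmax(xs):
    # Index of the first maximal element.
    return max(range(len(xs)), key=xs.__getitem__)


def amin(xs):
    return min(xs)


def argmin(xs):
    # Index of the first minimal element.
    return min(range(len(xs)), key=xs.__getitem__)


def max_decomp(xs):
    """Decompose 'max' into its two component ops: (values, indices)."""
    return amax(xs), argmax(xs)


def min_decomp(xs):
    """Decompose 'min' into its two component ops: (values, indices)."""
    return amin(xs), argmin(xs)


def var_mean_decomp(xs):
    """Decompose 'var_mean' into (var, mean); sample variance is assumed here."""
    return statistics.variance(xs), statistics.fmean(xs)
```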

For `var_mean`, the `refs` implementation was doing something similar, so I changed it to call `torch.` ops instead, as was done for other `refs` implementations previously. cc: @peterbell10 @lezcano

Note that Inductor lowers all these directly, so they are excluded from the Inductor decomp table.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110906
Approved by: https://github.com/manuelcandales
2023-10-11 00:06:24 +00:00
PyTorch MergeBot
3100d3e661 Revert "[inductor] Implement Fx graph caching to improve warm compilation time. (#103453)"
This reverts commit 8a8668e1ae.

Reverted https://github.com/pytorch/pytorch/pull/103453 on behalf of https://github.com/kit1980 due to The newly added test fails on internal builds ([comment](https://github.com/pytorch/pytorch/pull/103453#issuecomment-1756449919))
2023-10-10 23:21:59 +00:00
Will Constable
ca03f36233 Change ProcessGroupNCCL default timeout to 10 min (#110947)
Avoid changing default for other backends as CPU backend (GLOO) may need
longer timeouts.

Motivated by trying to save cluster time when encountering collective
hangs.  Generally collectives should time out within seconds and 30
minutes (or 10 minutes) should provide ample headroom for edge cases.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110947
Approved by: https://github.com/xw285cornell, https://github.com/fduwjj
2023-10-10 22:28:39 +00:00
Tugsbayasgalan Manlaibaatar
cd275dc24f Remove RangeConstraints in favor of ValueRanges (#109859)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109859
Approved by: https://github.com/avikchaudhuri
2023-10-10 22:22:05 +00:00
Jerry Zhang
7a69e3d30b [fx][subgraph_matcher] Add a matcher that supports name to node map (#110743)
Summary:
We want the matcher to return a name -> node map for the target graph
so that we can refer to nodes by name; this is useful for downstream applications like
quantization.

We can also use the torch API as the source of truth instead of matching the aten API directly.

Test Plan:
python test/fx/test_matcher_utils.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110743
Approved by: https://github.com/SherlockNoMad
2023-10-10 22:21:24 +00:00
Ramil Nugmanov
91eeb77260 StackDataset batched sampling (#110694)
Optimization of loading minibatches

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110694
Approved by: https://github.com/ejguan
2023-10-10 22:05:51 +00:00
Joel Schlosser
43ea782af3 Multiprocessing support for NT (#110292)
Fixes #110161

Allows NTs to be used in DataLoaders with `num_workers > 1`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110292
Approved by: https://github.com/cpuhrsch, https://github.com/albanD
2023-10-10 21:58:19 +00:00
rzou
3a29cdc5e6 [optests] Add dontGenerateOpCheckTests and is_inside_opcheck_mode (#110951)
This PR adds the following helper functions for generated opcheck tests:
- dontGenerateOpCheckTests is a decorator that skips generation of the
  opcheck tests for the generated function
- is_inside_opcheck_mode lets us query if we are in a generated test.
  Useful for fast debugging out-of-tree without needing to update
  PyTorch.

Test Plan:
- new tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110951
Approved by: https://github.com/williamwen42
2023-10-10 21:43:43 +00:00
wz337
d9eb5a57aa [FSDP] Change _create_chunk_dtensor in fsdp/_shard_utils.py to use public API from DTensor (#110831)
This PR:
1) updates _create_chunk_dtensor() in _shard_utils.py to use public APIs from DTensor. This will avoid the global_size calculation error from using DTensor.from_local() for uneven-sharded parameters, as described in https://github.com/pytorch/pytorch/issues/110762
2) updates test/distributed/fsdp/test_fsdp_dtensor_state_dict.py to include a unit test for a model with uneven sharding.

cc. @wanchaol, @fegin

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110831
Approved by: https://github.com/wanchaol, https://github.com/fegin
2023-10-10 21:04:27 +00:00
Jon Chuang
6e770c0dda [dynamo] Add itertools.repeat via polyfill (#110953)
Fixes https://github.com/pytorch/pytorch/issues/110286
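
For context, a polyfill here means a pure-Python function with the same behavior as the builtin, which a tracer can step through. The sketch below follows the "roughly equivalent" generator from the Python `itertools` documentation; it is not necessarily the exact code added in this PR.

```python
import itertools


def repeat(obj, times=None):
    """Pure-Python equivalent of itertools.repeat."""
    if times is None:
        # No count given: yield the object forever.
        while True:
            yield obj
    else:
        for _ in range(times):
            yield obj


finite = list(repeat("a", 3))
# The infinite form has to be bounded by the consumer, e.g. with islice.
bounded = list(itertools.islice(repeat(7), 4))
```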

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110953
Approved by: https://github.com/ezyang
2023-10-10 20:40:33 +00:00
PyTorch MergeBot
02a02a23ee Revert "Move at::{Refcounted,}MapAllocator to c10 (#109881)"
This reverts commit 0341deb1c7.

Reverted https://github.com/pytorch/pytorch/pull/109881 on behalf of https://github.com/albanD due to It does break buck build ([comment](https://github.com/pytorch/pytorch/pull/109881#issuecomment-1756195823))
2023-10-10 20:39:12 +00:00
Will Constable
7678cd22af Reland "[C10] PG observability hooks. (#108815)" (#110907)
This reverts commit ff0358b038.

(original PR https://github.com/pytorch/pytorch/pull/108815 desc copied below)

Expose a set of observability hooks into C10D such that our users can
detect collectives failure both faster and more easily.

The design is similar to NCCL desync debug in that it minimizes the
overhead by doing most of the work off the main thread.

This PR introduces a new module torch.distributed.hooks that exposes the following set of methods:

    register_collective_start_hook
    register_collective_end_hook
    register_process_group_hook

The process group hook exposes PG creation on the member ranks and is called inline from
the PG creation code. This is fine since this happens during initialization and a limited number of times.

The collective start/end hooks are fired from a single background thread. It reads
events from a C++ queue and dispatches them.

Queue notification is, oddly, done using a pipe. This is needed so Python can abort the thread on shutdown
and run it as a background thread. This is not possible with more reasonable choices like a condvar.
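
The pipe-based notification can be sketched in pure Python. This is an illustration of the general technique only: `HookDispatcher` is a hypothetical class, and a Python `queue.SimpleQueue` stands in for the C++ event queue mentioned above.

```python
import os
import queue
import threading


class HookDispatcher:
    """Hypothetical dispatcher: one background thread woken up through a pipe."""

    def __init__(self):
        self._events = queue.SimpleQueue()  # stand-in for the C++ queue
        self._hooks = []
        self._read_fd, self._write_fd = os.pipe()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def register_hook(self, hook):
        self._hooks.append(hook)

    def publish(self, event):
        # Enqueue first, then write one byte so the reader knows an event is ready.
        self._events.put(event)
        os.write(self._write_fd, b"e")

    def shutdown(self):
        # A distinct byte tells the background thread to exit its loop.
        os.write(self._write_fd, b"q")
        self._thread.join()
        os.close(self._read_fd)
        os.close(self._write_fd)

    def _run(self):
        while True:
            token = os.read(self._read_fd, 1)  # blocks until notified
            if token == b"q":
                return
            event = self._events.get()
            for hook in list(self._hooks):
                hook(event)
```

Because the shutdown byte travels through the same pipe as the event notifications, events published before `shutdown()` are dispatched first, and the blocked reader can always be woken up to exit.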

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110907
Approved by: https://github.com/fduwjj
2023-10-10 20:09:40 +00:00
Jon Chuang
84ad3ed7b2 [dynamo] add config for displaying all guard failures (#110927)
Fixes https://github.com/pytorch/pytorch/issues/110879

Example output:
```
('Recompiling function fn in /home/jonch/Desktop/Programming/mlsys/pytorch/test/dynamo/test_misc.py:4578', 'triggered by the following guard failures: ["___check_type_id(L[\'obj\'], 94834370481168)", "L[\'obj\'].x == -0.5"]')
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110927
Approved by: https://github.com/lezcano
2023-10-10 19:57:44 +00:00
DanilBaibak
8cf1a02e80 Revert [Profiler] Improve the docstring for export_memory_timeline (#110978)
Revert [Profiler] Improve the docstring for export_memory_timeline
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110978
Approved by: https://github.com/huydhn, https://github.com/aaronenyeshi
2023-10-10 19:57:25 +00:00
soulitzer
bc49b1e50b [reland] Use is_symbolic instead of testing isinstance in some place (#110676)
reland of https://github.com/pytorch/pytorch/pull/110372

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110676
Approved by: https://github.com/ezyang
ghstack dependencies: #110673, #110674, #110675
2023-10-10 19:37:17 +00:00
soulitzer
df9a6bcaef [reland] Symintify guards.cpp (#110675)
reland of https://github.com/pytorch/pytorch/pull/110371

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110675
Approved by: https://github.com/ezyang
ghstack dependencies: #110673, #110674
2023-10-10 19:37:17 +00:00
soulitzer
3842b175d2 [reland] Add symbolic singleton int (#110674)
reland of https://github.com/pytorch/pytorch/pull/110370
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110674
Approved by: https://github.com/ezyang
ghstack dependencies: #110673
2023-10-10 19:37:17 +00:00
soulitzer
fda0a965c7 [reland] Support SingletonSymNode mul with coefficient (#110673)
reland of https://github.com/pytorch/pytorch/pull/110369
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110673
Approved by: https://github.com/ezyang
2023-10-10 19:37:17 +00:00
Aaron Shi
52b1470935 [Profiler] Improve the docstring for export_memory_timeline (#110949)
Summary: Add more details about the export_memory_timeline API, as we've landed new representations of the memory timeline data.

Test Plan: CI, should be no functional change, as we only changed comments.

Differential Revision: D50123450

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110949
Approved by: https://github.com/davidberard98
2023-10-10 17:53:56 +00:00
Chien-Chin Huang
57f6368b8e [collective] Add a torch.compile + functional_collectives test (#110688)
Add a test to ensure functional_collectives + torch.compile always works.

Differential Revision: [D50001491](https://our.internmc.facebook.com/intern/diff/D50001491/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110688
Approved by: https://github.com/wanchaol, https://github.com/fduwjj
2023-10-10 17:14:50 +00:00
eellison
c5f06b9753 Re-enable test_copy_transpose_math_view, neg_view/dce fix (#110651)
- neg view can just be lowered to neg() post functionalization
- we were treating all fallback kernels as not having side effects. We shouldn't DCE mutating fallback kernels: either mutations induced by the reinplacing pass or clone_ with unsupported arguments (complex)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110651
Approved by: https://github.com/Chillee, https://github.com/jansel, https://github.com/malfet, https://github.com/Skylion007
2023-10-10 16:34:01 +00:00
Brian Hirsh
ba86dfcd83 AOTDispatch subclass (#104483)
This is a PoC of AOTDispatch support. This PR actually works on basic examples, and I'm working on testing it out on `DTensor` (with @wanchaol), `SemiStructuredSparsityTensor` (with @jcaip), and `FP8Tensor`.

There are some design decisions baked into the PR that I think we need consensus on though - so I'm planning on writing a larger design doc to go over the changes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104483
Approved by: https://github.com/ezyang
2023-10-10 16:13:16 +00:00
Jiong Gong
8bc04f46fe [inductor cpp] use c10::bit_cast to avoid violating strict-aliasing (#110809)
Fix https://github.com/pytorch/pytorch/issues/110807

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110809
Approved by: https://github.com/jansel
2023-10-10 11:16:31 +00:00
Chien-Chin Huang
7b25c2b90e [FSDP][optim_state_dict] Move local optimizer state to FSDP compute_device (#110929)
This will ensure all the tensors are on FSDP compute_device.

Differential Revision: [D50059492](https://our.internmc.facebook.com/intern/diff/D50059492/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110929
Approved by: https://github.com/wz337
2023-10-10 10:34:31 +00:00
Michael Voznesensky
fb68aa0a92 [Easy] Remove unused return type from utils (#110887)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110887
Approved by: https://github.com/ezyang
2023-10-10 09:02:11 +00:00
jjsjann123
37567fdf31 Nvfuser cpp api deprecation attempt 2 (#110881)
attempting to re-try #110318 deprecating nvfuser c++ API

warning has been updated to TORCH_WARN_ONCE;
The warning thrown inside torch::jit::fuser::cuda::isEnabled() is turned off and will be deprecated when we pull out TorchScript integration in the follow-up PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110881
Approved by: https://github.com/davidberard98, https://github.com/NicolasHug
2023-10-10 08:07:03 +00:00
Tugsbayasgalan Manlaibaatar
35e48e262c [custom op] Use canonical API to constrain unbacked values (#108372)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108372
Approved by: https://github.com/angelayi, https://github.com/ezyang
2023-10-10 05:14:28 +00:00