Commit Graph

87 Commits

Author SHA1 Message Date
Edward Z. Yang
ddf4cd69ec Delete ifdyn and ifunspec combinators (#103596)
Replaced with expect tests for ease of updating.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103596
Approved by: https://github.com/voznesenskym
2023-06-15 00:14:17 +00:00
Michael Lazos
6c6c897d6b Add graph break logging option instead of config flag (#103202)
Make graph break logging a logging option vs a config setting
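
For illustration, a hedged sketch of the new option in use (assuming the artifact name is `graph_breaks` and the `torch._logging.set_logs` API):
```python
import torch._logging

# Assumed equivalent of running with TORCH_LOGS="graph_breaks":
# emit a log entry at every graph break instead of toggling a config flag.
torch._logging.set_logs(graph_breaks=True)
```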

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103202
Approved by: https://github.com/yanboliang, https://github.com/anijain2305
2023-06-12 19:52:31 +00:00
Edward Z. Yang
414ec6ce97 Turn off automatic_dynamic_shapes in prep for dynamic-by-default (#103320)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
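
For reference, a sketch of the knob this commit flips (assuming it lives on torch._dynamo.config, as other dynamo flags do):
```python
import torch._dynamo

# Disabled here in preparation for dynamic-by-default.
torch._dynamo.config.automatic_dynamic_shapes = False
```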

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103320
Approved by: https://github.com/Skylion007
2023-06-10 02:49:59 +00:00
PyTorch MergeBot
f79d2b45fb Revert "Replace _dynamo.config with an object instead of module (#96455)"
This reverts commit 3864207c2a.

Reverted https://github.com/pytorch/pytorch/pull/96455 on behalf of https://github.com/DanilBaibak due to Break internal build ([comment](https://github.com/pytorch/pytorch/pull/96455#issuecomment-1576162237))
2023-06-05 07:06:14 +00:00
Han Qi
3864207c2a Replace _dynamo.config with an object instead of module (#96455)
Summary:
    Replace _dynamo.config with an object instead of module

    Current usage patterns of setting and reading fields on config will work
    unchanged.

    Only changes needed going forward:
    1. import torch._dynamo.config will not work. However, just doing
       import torch._dynamo is sufficient to access dynamo config
       as torch._dynamo.config.

    2. Files inside the _dynamo folder need to access config via
       from torch._dynamo.config_util import config instead of
       from torch._dynamo import config, because _dynamo/__init__.py
       imports some of those files and would otherwise create a circular import.

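A short sketch of the import patterns described above:
```python
# Before this change (no longer works):
# import torch._dynamo.config

# After: access config through the package itself.
import torch._dynamo
torch._dynamo.config.verbose = True

# Inside the _dynamo folder, per item 2 (to avoid the circular import):
# from torch._dynamo.config_util import config
```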

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96455
Approved by: https://github.com/jansel
2023-06-03 23:18:41 +00:00
Michael Voznesensky
4c1bc91f42 Support autograd.Function w/ grad (#99483)
This PR adds support for tracing autograd.Function with grad.

A few important bullet points outlining our approach:

1) Our goal is to verify soundness so that we can add a call_function to the autograd.Function's `apply` in the graph.
2) We achieve (1) by verifying or rejecting soundness, ensuring that both the forward and backward of the autograd.Function are sound.
3) For the forward, if we verify soundness, we install its guards into the graph.
4) For the backward, if we verify soundness, we throw it out. However, backward soundness verification is more onerous, with a config-driven set of banned attrs and methods for tensors.

1-4 above are achieved by turning the forward and backward into UserDefinedFunctionVariables and inlining through them, relying on dynamo's soundness detection. If we graph break while inlining, we raise and treat the function as unsound. As noted above, backward is stricter still.

For the tracing, the safety comes from dynamo's HigherOrderOperator system. That system ensures that not only do we trace soundly, but that no new variables are lifted into inputs during the tracing, and that the forward and backwards are entirely self contained.

Whenever we reject a function as unsound, we restore back, as usual.

Due to some limitations in the lifting logic, we implemented an escape hatch for tensors that are known in forward but cross into backward through save_for_backward (save) / saved_tensors (load). We use it to avoid the saved tensors coming from forward being accidentally treated as lifted variables (and rejected). This is sound, but feels a little hacky.

Additionally, due to some limitations in fx node removal, combined with how we produce subgraphs for the traces installed from HigherOrderOperators, we had to improve our node removal logic. In the event of a restore, we remove the old nodes from the graph, as usual in dynamo. However, because references to these nodes may exist in subgraphs, we first traverse each node's users and remove them if and only if they are in another graph. This is always sound, because removal should only be downstream of restoration at this point.

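To make this concrete, a minimal autograd.Function of the kind this PR lets dynamo trace (an illustrative sketch, not taken from the PR):
```python
import torch

class ScaledReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)  # known-in-forward tensor crossing into backward
        return x.clamp(min=0) * 2

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return grad_out * 2 * (x > 0)

@torch.compile
def f(x):
    return ScaledReLU.apply(x).sum()

x = torch.randn(8, requires_grad=True)
f(x).backward()
```
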
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99483
Approved by: https://github.com/zou3519
2023-05-19 01:26:21 +00:00
Animesh Jain
8994d9e610 [dynamo] Hide guard_fail_hook behind a flag to improve cache lookup time (+10% DebertaV2) (#100590)
For TorchDynamo eager backend, DebertaV2 speedup improves from 0.77x to 0.87x.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100590
Approved by: https://github.com/voznesenskym, https://github.com/wconstab
2023-05-04 18:52:21 +00:00
Will Constable
2dca418112 Reland basic dynamo support for traceable collectives (#100476)
Relative to the original land, this also contains:
- Fix torchdeploy import of functional collectives
- Fix for torchdynamo utils being unimportable due to torch._refs missing

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100476
Approved by: https://github.com/kumpera
2023-05-04 04:25:35 +00:00
Edward Z. Yang
c7e9f40653 Misc accuracy improvements on minifier (#100447)
The changes:

* Add config knob `same_two_models_use_fp64` for toggling whether or not to use fp64
* Add a test showing that RMSE is superior to atol/rtol (a toy illustration follows below)
* Add a `--strict-accuracy` option, which allows testing against integral/boolean accuracy; by default only regular (floating-point) accuracy is tested now. There's a test which exercises this; it's a little delicate, but I had trouble thinking of a better test.

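As an aside, a toy illustration of the RMSE-vs-atol/rtol point (not the minifier's actual code):
```python
import torch

def rmse(ref, res):
    # Root-mean-square error is tolerant of a few outlier elements
    # that would fail an elementwise allclose check outright.
    return torch.sqrt(torch.mean((ref.double() - res.double()) ** 2))

ref = torch.randn(10_000)
res = ref.clone()
res[0] += 1e-2  # one bad element

print(torch.allclose(ref, res, atol=1e-4, rtol=1e-4))  # False
print(rmse(ref, res).item() < 1e-3)                    # True: error is tiny overall
```
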
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100447
Approved by: https://github.com/voznesenskym
2023-05-04 02:51:26 +00:00
Shabab Ayub
287f74c4fc Revert D45387167: Multisect successfully blamed D45387167 for test or build failures (#100424)
Summary:
This diff is reverting D45387167
D45387167: Basic dynamo support for traceable collectives (#94440) by wconstab has been identified as causing the following test or build failures (internal)

If you believe this diff has been generated in error you may Commandeer and Abandon it.

Test Plan: NA

Reviewed By: s4ayub

Differential Revision: D45448312

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100424
Approved by: https://github.com/rohan-varma, https://github.com/kumpera
2023-05-03 16:10:54 +00:00
kshitij12345
8b64dee5d2 [fix] torch_compile_debug don't log with 0 (#100462)
Fixes https://github.com/pytorch/pytorch/issues/99906

Tested locally.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/100462
Approved by: https://github.com/mlazos
2023-05-03 08:23:09 +00:00
Larry Liu
687afeb686 [dynamo][numpy] Add NumpyTensorVariable to translate ndarray attribute calls to tensor attributes (#95849)
Issue: #93684

# Problem

Reduce graph breaks when dynamo compiles python functions containing numpy functions and ndarray operations.

# Design (as I know it)

* Use torch_np.ndarray (a wrapper around tensor) to back a `VariableTracker`: `NumpyTensorVariable`.
* Translate all attribute and method calls on ndarray to their torch_np.ndarray equivalents.

This PR adds `NumpyTensorVariable` and supports:
1.  tensor to ndarray, ndarray to tensor
2. numpy functions such as numpy.meshgrid()
3. ndarray attributes such as `itemsize`, `stride`

Next PR will handle returning `np.ndarray` and add support for ndarray methods.
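
A sketch of the kind of mixed numpy/ndarray code this is meant to handle without graph breaks (illustrative; actual coverage is as listed above):
```python
import numpy as np
import torch

@torch.compile
def f(x: torch.Tensor):
    a = x.numpy()             # tensor to ndarray
    g = np.meshgrid(a, a)[0]  # a supported numpy function
    return torch.from_numpy(g) * a.itemsize  # ndarray attribute, back to tensor
```
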
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95849
Approved by: https://github.com/ezyang
2023-04-27 16:18:35 +00:00
Will Constable
100a25d021 Basic dynamo support for traceable collectives (#94440)
Make traceable collectives work with torchdynamo,
bypassing problems with tracing the AsyncTensor subclass.

Accept a suboptimal solution for now, and optimize it later.
For now, wait happens immediately, which generally forces an early sync.

Later, find a way either in dynamo or AOT stack to handle
AsyncCollectiveTensor to get the wait in the optimal place.

Note on implementation:
- Dynamo traces 'user-level' functional collectives APIs that are designed to behave
  differently in eager vs compiled mode.  In eager, there will be work-obj registration
  and a wrapper subclass will insert a 'wait' call at the appropriate time.
  In compile/trace mode, wait will be immediately called, and work-obj
  registration is required to be handled by the compile backend at runtime.
- Dynamo needs to trace into some of the helper functions in the 'user-level'
  api, such as '_expand_group' which is essentially a constant transformation.

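For context, a hedged sketch of the user-level API under compile (function names assumed from torch.distributed._functional_collectives):
```python
import torch
import torch.distributed._functional_collectives as fc

@torch.compile
def allreduce_mean(x: torch.Tensor, world_size: int):
    # Under tracing, the wait is issued immediately; in eager, a wrapper
    # subclass defers the wait until the result is first used.
    y = fc.all_reduce(x, "sum", group=list(range(world_size)))
    return y / world_size
```
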
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94440
Approved by: https://github.com/kumpera
2023-04-27 05:38:36 +00:00
Avik Chaudhuri
f6f35135a4 suggest constraints to specify for export based on generated shape guards (#98463)
The design of the export API expects constraints to be specified on dynamic dimensions, while assuming all other dimensions are static by default. However, a user who wishes to export a model may not be familiar enough with the code to plan what to specify.

This diff provides support for discovering constraints to specify. The basic idea is to take the set of generated shape guards and convert them into appropriate constraints. However, we usually generate a LOT of shape guards, and there is often a LOT of redundancy in them. Thus, we also need to simplify the guards so that our suggested constraints are concise yet capture the information content in the guards.

The algorithm for simplification uses `sympy` under the hood, but very surgically to avoid any risk of blowing up. See comments inline for a full description. Briefly,
1. We consider only univariate inequalities, and among them, solve for equalities first.
2. We substitute these exact solutions to convert multivariate inequalities progressively into univariate.
3. Remaining univariate inequalities are solved using `sympy.solvers.inequalities.reduce_inequalities`.
4. As pre-processing, we also eliminate all `//` and `%` operations to generate a set of linear congruence guards, and solve these using `sympy.ntheory.modular.solve_congruence` (a short sketch of these helpers follows below).

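As a small illustration of those sympy helpers (a sketch, not the actual implementation):
```python
from sympy import Symbol
from sympy.solvers.inequalities import reduce_inequalities
from sympy.ntheory.modular import solve_congruence

s0 = Symbol("s0")

# Step 3: remaining univariate inequalities reduce to an interval.
print(reduce_inequalities([2 <= s0, s0 < 128], [s0]))  # (2 <= s0) & (s0 < 128)

# Step 4: linear congruence guards, e.g. s0 % 16 == 0 and s0 % 3 == 1,
# collapse into a single congruence class.
print(solve_congruence((0, 16), (1, 3)))  # (16, 48), i.e. s0 = 16 (mod 48)
```
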
The results are quite dramatic. For example, an internal model produced several hundred guards with `dynamic_shapes=True`, which were pretty much inscrutable for humans. The summary contains around 30 dimensions that were specialized and 3 constraints on dynamic dimensions. The output format looks like this:
```
The following dimensions have been specialized and CANNOT be dynamic.
NOTE: Specializations will happen by default with `assume_static_by_default=True`.
	L['foo']['bar'].size()[0] == 4
        ...
	L['baz']['qux'].size()[3] == 96

The following dimensions CAN be dynamic.
You can use the following code to specify the constraints they must satisfy:
constraints=[
	dynamic_dim(L['blah']['bleh'], 1) == dynamic_dim(L['blah']['bloh'], 1),
        ...,
	2 <= dynamic_dim(L['blah']['bloh'], 1),
]
```

Differential Revision: [D44731747](https://our.internmc.facebook.com/intern/diff/D44731747/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98463
Approved by: https://github.com/voznesenskym, https://github.com/ezyang
2023-04-19 21:56:36 +00:00
Edward Z. Yang
c67c16bcd2 Switch calling convention back to real tensors (#99320)
Months ago, in order to get dynamic shapes working through to Dynamo backends, we changed the calling convention to pass fake tensors rather than real tensors as example inputs to backends. The motivation at the time was, well, backends shouldn't really be peeking at the real tensors when they are doing compilation, and so it would make more sense to hide the real tensors from backends. But there were a bunch of problems:

* This interacted poorly with our accuracy minifier design: accuracy minifier needs access to the real inputs in order to run the model and figure out what happens!
* The TensorRT backend required real inputs and we never figured out how to fix it.
* In practice, all the backends needed to detect if they were passed real tensors, and fakeify them anyway (certainly AOTAutograd does this)
* Parameters and inputs are treated non-uniformly: parameters had to be passed as real tensors, because CUDA graphs requires knowing what the actual tensors are

Furthermore, there were some more problems discovered after the fact:

* Backends may want to optimize on aspects of tensors which you cannot tell without having real tensors; e.g., alignment of the data pointer

So, this PR decides that changing the calling convention was a bad idea, and switches back to passing real tensors. There is a problem though: AOTAutograd will perform fakeification, which means that in practice backends are still going to end up with fake tensors in the end anyway. I want to change this, but this will require some work with bdhirsh's upcoming AOTAutograd export refactor.

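To make the calling convention concrete, a hedged sketch of a custom backend receiving real example inputs:
```python
import torch

def inspect_backend(gm: torch.fx.GraphModule, example_inputs):
    # example_inputs are real tensors again, so a backend can look at
    # properties like data-pointer alignment before deciding how to compile.
    for t in example_inputs:
        print(t.shape, t.dtype, t.data_ptr() % 16 == 0)
    return gm.forward  # no-op "compilation" for illustration

@torch.compile(backend=inspect_backend)
def f(x):
    return x.sin() + 1

f(torch.randn(4))
```
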
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99320
Approved by: https://github.com/voznesenskym
2023-04-19 12:15:52 +00:00
PyTorch MergeBot
ea50d4f146 Revert "Switch calling convention back to real tensors (#99320)"
This reverts commit 780922c24e.

Reverted https://github.com/pytorch/pytorch/pull/99320 on behalf of https://github.com/DanilBaibak due to Break internal build
2023-04-19 09:44:06 +00:00
Edward Z. Yang
780922c24e Switch calling convention back to real tensors (#99320)
Months ago, in order to get dynamic shapes working through to Dynamo backends, we changed the calling convention to pass fake tensors rather than real tensors as example inputs to backends. The motivation at the time was, well, backends shouldn't really be peeking at the real tensors when they are doing compilation, and so it would make more sense to hide the real tensors from backends. But there were a bunch of problems:

* This interacted poorly with our accuracy minifier design: accuracy minifier needs access to the real inputs in order to run the model and figure out what happens!
* The TensorRT backend required real inputs and we never figured out how to fix it.
* In practice, all the backends needed to detect if they were passed real tensors, and fakeify them anyway (certainly AOTAutograd does this)
* Parameters and inputs are treated non-uniformly: parameters had to be passed as real tensors, because CUDA graphs requires knowing what the actual tensors are

Furthermore, there were some more problems discovered after the fact:

* Backends may want to optimize on aspects of tensors which you cannot tell without having real tensors; e.g., alignment of the data pointer

So, this PR decides that changing the calling convention was a bad idea, and switches back to passing real tensors. There is a problem though: AOTAutograd will perform fakeification, which means that in practice backends are still going to end up with fake tensors in the end anyway. I want to change this, but this will require some work with bdhirsh's upcoming AOTAutograd export refactor.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99320
Approved by: https://github.com/voznesenskym
2023-04-18 02:09:57 +00:00
Michael Voznesensky
ccc9a3d726 Automatic Dynamic Shapes (#98923)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98923
Approved by: https://github.com/ezyang
2023-04-13 02:39:23 +00:00
Edward Z. Yang
5c38c4cfa4 Improve symbolic shapes guard logging (#98941)
Billing of changes:
* Get rid of `print_guards`; instead, you control this with `TORCH_LOGS=torch.fx.experimental.symbolic_shapes`, debug logging toggles stack traces
* Don't incorrectly report the tracing context frame when we're compiling; we just don't have this info anymore! (TODO: use the saved frames instead). This is done via a new TracingContext.clear_frame context manager
* Add TracingContext.extract_stack() which gives you the tracing context stack.
* Add ShapeEnvLoggingAdapter to report which ShapeEnv any given operation is from (this is helpful for debugging situations when there are too many ShapeEnvs floating around)
* Tweak create_symbol log message to also report Source
* Add a debug log whenever duck sizing occurs
* Report an excerpt of both the user and system backtrace whenever a guard is added in INFO mode. I found this to be a good balance of "where did the guard come from" without full backtrace verbosity.

Example log output with the new output:

```
[2023-04-12 08:25:49,003] torch.fx.experimental.symbolic_shapes: [INFO] 0: create_env
[2023-04-12 08:25:49,021] torch.fx.experimental.symbolic_shapes: [INFO] 0: create_symbol s0 = 32 for L['x'].size()[0]
[2023-04-12 08:25:50,154] torch.fx.experimental.symbolic_shapes: [INFO] 0: evaluate_expr s0 < 128 [guard added] at w.py:11 in forward2 (_dynamo/variables/tensor.py:476 in evaluate_expr)
[2023-04-12 08:25:52,057] torch.fx.experimental.symbolic_shapes: [INFO] 0: evaluate_expr Eq(Mod(s0, 16), 0) [guard added] (_inductor/codegen/triton.py:77 in is_aligned)
```

from running

```
import torch
import torch._dynamo

def f(x, y):
    return x + y

def forward(x, y):
    return forward2(x, y)

def forward2(x, y):
    if x.size(0) < 128:
        x = x * 2
    else:
        x = x * 3
    r = f(x, y)
    r = r * y
    return r

def woof():
    fn_compiled = torch.compile(forward, dynamic=True)
    x = torch.randn(32, device='cuda')
    y = torch.randn(32, device='cuda')
    print(fn_compiled(x, y))

woof()
```

(To induce the Triton guard, I synthetically reverted https://github.com/pytorch/pytorch/pull/98471)

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98941
Approved by: https://github.com/wconstab
2023-04-12 21:58:59 +00:00
PyTorch MergeBot
629377ea8b Revert "Replace _dynamo.config with an object instead of module (#96455)"
This reverts commit 420104a886.

Reverted https://github.com/pytorch/pytorch/pull/96455 on behalf of https://github.com/jansel due to BC breaking, was landed prematurely
2023-04-12 15:06:14 +00:00
Han Qi
420104a886 Replace _dynamo.config with an object instead of module (#96455)
Summary:
    Replace _dynamo.config with an object instead of module

    Current usage patterns of setting and reading fields on config will work
    unchanged.

    Only changes needed going forward:
    1. import torch._dynamo.config will not work. However, just doing
       import torch._dynamo is sufficient to access dynamo config
       as torch._dynamo.config.

    2. Files inside the _dynamo folder need to access config via
       from torch._dynamo.config_util import config instead of
       from torch._dynamo import config, because _dynamo/__init__.py
       imports some of those files and would otherwise create a circular import.


Pull Request resolved: https://github.com/pytorch/pytorch/pull/96455
Approved by: https://github.com/williamwen42
2023-04-11 21:23:32 +00:00
Michael Lazos
34961d416c Remove unused log config settings (#98795)
Summary: Removing deprecated log settings

Test Plan: Removing code, no tests needed

Differential Revision: D44853619

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98795
Approved by: https://github.com/anijain2305
2023-04-11 10:07:29 +00:00
Will Constable
390c51bf87 Skip nnmodule hook guards by default (#98371)
This PR makes basic nnmodule forward hooks work by default, without any overhead.  But it leaves silent correctness issues if users modify/remove their hooks later, so it also emits a warning.

- the usual case is to not use hooks, so avoid guard overhead here
- registering any hook before compile will trigger a warning about hook support
- registering a hook later (or removing one) requires user knowledge and opting in,
  currently this isn't warnable (but maybe we can observe compiled nnmodules to make it
  warnable).

Why skip hook guards by default instead of not tracing __call__/hooks by default?
- avoid having a mode flag that alters dynamo tracing behavior (harder to test both codepaths
  in CI with full coverage)
- the most basic hook usecase (registering a hook before compile, and never removing it)
  will work by default with this PR, while it would require enablement and incur overhead
  in the 'not tracing __call__' proposal.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98371
Approved by: https://github.com/jansel
2023-04-07 15:10:51 +00:00
Jason Ansel
bc9dd969e1 Support inlining no_grad() decorator (#98121)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/98121
Approved by: https://github.com/anijain2305, https://github.com/voznesenskym
2023-04-03 00:24:56 +00:00
Michael Lazos
ee9a9b7add Remove old logging callsites (#98095)
Get around GH first issue, OSS only changes for https://github.com/pytorch/pytorch/pull/97182

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98095
Approved by: https://github.com/anijain2305
2023-04-01 00:57:37 +00:00
Edward Z. Yang
97fc8ea5f4 Run the benchmark suite with dynamic batch only (#97912)
Symbolic shapes compile time on full CI with inductor is horribly long (even though our aot_eager local runs seemed to suggest that the added latency was only 10s per model). To patch over the problem for now, run the benchmark suite with dynamic batch only.  This should absolve a lot of sins.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97912
Approved by: https://github.com/janeyx99, https://github.com/desertfire
2023-03-30 18:04:48 +00:00
Michael Lazos
e626be79a4 Add config setting to error on recompile (#97829)
Adds a config setting `error_on_recompile`: when set, dynamo will raise an exception upon compiling a function a second time.

This was requested to help with debugging in pyper.

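A minimal sketch of the new knob in use:
```python
import torch
import torch._dynamo

torch._dynamo.config.error_on_recompile = True

@torch.compile
def f(x):
    return x + 1

f(torch.randn(4))  # first compile: fine
# Any recompile of f (e.g. after a guard failure) now raises
# instead of silently compiling again.
```
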
Pull Request resolved: https://github.com/pytorch/pytorch/pull/97829
Approved by: https://github.com/bertmaher
2023-03-29 19:00:43 +00:00
Will Constable
f4ac8e0052 Add dynamo config skip_nnmodule_hook_guards (#97830)
This lets users who are sure they won't use hooks avoid the overhead
of dynamo guards on (presumably) empty hook dicts on all nn modules.

Only enable this flag if you are sure you won't change hook behavior
after compiling.  It is ok to register a hook and then compile, if
you promise never to remove/alter the hook.  It is also ok to
not register a hook and compile, if you never register a hook later.

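A sketch of opting in (only safe under the promises above):
```python
import torch
import torch._dynamo

# We promise: any hooks are registered before compile and never
# removed or altered afterwards.
torch._dynamo.config.skip_nnmodule_hook_guards = True

model = torch.compile(torch.nn.Linear(4, 4))
```
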
Note: this is not the best we can do, and hopefully in the future
we can avoid the need for this option following some of these paths
- make guards fast enough to not be an issue when guarding on hook
  dicts
- make a mode where dynamo actually skips tracing __call__ so
  hooks are consistently ignored by compiled programs
- use nnmodule versioning so hook changes can be guarded without
  explicit hook dict guards

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97830
Approved by: https://github.com/jansel
2023-03-29 04:25:27 +00:00
Will Constable
c1a6dde79e Make dynamo-FSDP skip guards (#97463)
Create a new GuardSource for FSDP modules, and use it
to opt out of guard installation.

Based on @awgu's work in https://github.com/pytorch/pytorch/pull/97091

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97463
Approved by: https://github.com/voznesenskym, https://github.com/jansel, https://github.com/awgu
2023-03-28 04:04:34 +00:00
Michael Lazos
a1c46e5f8f component-level configurable logging for dynamo, inductor, aot (#94858)
Summary:

Adds NNC-like logging that is configured through the env var `TORCH_LOGS`
Examples:
`TORCH_LOGS="dynamo,guards" python script.py` - prints dynamo logs at level INFO with guards of all functions that are compiled

`TORCH_LOGS="+dynamo,guards,graph" python script.py` - prints dynamo logs at level DEBUG with guards and graphs (in tabular) format of all graphs that are compiled

[More examples with full output](https://gist.github.com/mlazos/b17f474457308ce15e88c91721ac1cce)

Implementation:
The implementation parses the log settings from the environment, finds any components (aot, dynamo, inductor) or other loggable objects (guards, graph, etc.) and generates a log_state object. This object contains all of the enabled artifacts, and a qualified log name -> level mapping. _init_logs then adds handlers to the highest level logs (the registered logs), and sets any artifact loggers to level DEBUG if the artifact is enabled.

Note: set_logs is an alternative for manipulating the log_state, but if the environment contains TORCH_LOGS, the environment settings will be prioritized.

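For illustration, the programmatic equivalent of the env var examples above (assuming set_logs mirrors TORCH_LOGS="+dynamo,guards"):
```python
import logging
import torch._logging

# Note: if TORCH_LOGS is set in the environment, it takes priority.
torch._logging.set_logs(dynamo=logging.DEBUG, guards=True)
```
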
Adding a new log:
To add a new log, a dev should add their log name to torch._logging._registrations (there are examples there already).

Adding a new artifact:
To add a new artifact, a dev should add their artifact name to torch._logging._registrations as well.
Additionally, wherever the artifact is logged, `torch._logging.getArtifactLogger(__name__, <artifact_name>)` should be used instead of the standard logging implementation.

[design doc](https://docs.google.com/document/d/1ZRfTWKa8eaPq1AxaiHrq4ASTPouzzlPiuquSBEJYwS8/edit#)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94858
Approved by: https://github.com/ezyang
2023-03-18 04:17:31 +00:00
Edward Z. Yang
3606f59366 Default specialize_int to False (#96624)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96624
Approved by: https://github.com/janeyx99
2023-03-16 02:54:18 +00:00
Will Constable
784dd583a6 Automatically register/clear dynamo profiler hooks while profiling (#96199)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96199
Approved by: https://github.com/jansel
2023-03-14 21:19:33 +00:00
PyTorch MergeBot
ba4fb9b6ad Revert "Default specialize_int to False (#96624)"
This reverts commit 1ac8782db2.

Reverted https://github.com/pytorch/pytorch/pull/96624 on behalf of https://github.com/kit1980 due to Broke inductor/test_torchinductor_dynamic_shapes.py
2023-03-14 19:43:47 +00:00
Edward Z. Yang
1ac8782db2 Default specialize_int to False (#96624)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96624
Approved by: https://github.com/janeyx99
2023-03-14 18:37:47 +00:00
Avik Chaudhuri
178d2a38e0 debug shape guards (#95848)
Adds logging when shape guards are added and when symbols are specialized to constants.

Differential Revision: [D43719743](https://our.internmc.facebook.com/intern/diff/D43719743/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95848
Approved by: https://github.com/ezyang
2023-03-14 16:05:28 +00:00
Michael Lazos
203890e1e0 Properly show buck target to run (#96089)
Summary: Makes the debug dir location configurable with TORCH_COMPILE_DEBUG_DIR env var

Test Plan: TORCH_COMPILE_DEBUG_DIR=”.” buck2 run mode/dev-nosan //caffe2/test/inductor:minifier_smoke

Reviewed By: bertmaher

Differential Revision: D43639955

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96089
Approved by: https://github.com/bertmaher
2023-03-07 22:52:27 +00:00
Will Constable
d4f5f9fdb4 Profile dynamo guards (#96119)
Adds profiler start and end callbacks to dynamo's C eval_frame impl, which can be used to profile a region and provide a name for visualization.  Currently only one usage is hooked up, profiling cache lookup (primarily covering guards and the linear search through the linked list).

Example profile taken from toy model:
`python benchmarks/dynamo/distributed.py --toy_model --profile --dynamo aot_eager`
(screenshot: https://user-images.githubusercontent.com/4984825/223225931-b2f6c5a7-505a-4c90-9a03-34982f6dc033.png)

Planning to measure overhead in CI; we probably can't afford to check this in enabled by default.  Will have to evaluate UX options such as `config.profile_dynamo_cache = True` or some other mechanism.

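For reference, a hedged sketch of viewing the new region with the PyTorch profiler (region naming assumed):
```python
import torch
from torch.profiler import profile

@torch.compile
def f(x):
    return x.sin()

f(torch.randn(4))        # compile once up front
with profile() as prof:
    f(torch.randn(4))    # warm run: cache lookup shows up as a named region
print(prof.key_averages().table(sort_by="cpu_time_total"))
```
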
Pull Request resolved: https://github.com/pytorch/pytorch/pull/96119
Approved by: https://github.com/jansel
2023-03-07 16:12:22 +00:00
Jason Ansel
95d17dc93d [inductor] Reland #95567 part 1 (#96023)
This is the non-problematic part of #95567.  The errors were coming from
IR printing changes which will be next in the stack.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96023
Approved by: https://github.com/ngimel, https://github.com/mlazos
2023-03-06 22:57:22 +00:00
Jason Ansel
43dd043ea7 Revert "[inductor] Improve error messages (#95567)" (#96014)
This reverts commit 62b775583f.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/96014
Approved by: https://github.com/Chillee
2023-03-04 04:03:31 +00:00
Edward Z. Yang
d303665d33 Make int unspecialization actually work (#95621)
OK, so this PR used to be about reducing the number of constants we specialize on, but it turns out that unspecialization was ~essentially never used (because we still constant specialized way too aggressively) and I ended up having to fix a bunch of issues to actually get tests to pass. So this PR is now "make int unspecialization actually work". As part of this, I have to turn off unspecialization by default, as there are still latent bugs in inductor.

The general strategy is that an unspecialized int is represented as a SymInt. Representing it as a 0d tensor (which is what the code used to do) is untenable: (1) we often need unspecialized ints to participate in size computations, but we have no way of propagating sympy expressions through tensor compute, and (2) a lot of APIs work when passed SymInt, but not when passed a Tensor. However, I continue to represent Numpy scalars as Tensors, as they are rarely used for size computation and they have an explicit dtype, so they are more accurately modeled as 0d tensors.

* I folded in the changes from https://github.com/pytorch/pytorch/pull/95099 as I cannot represent unspecialized ints as SymInts without also turning on dynamic shapes. This also eliminates the necessity for test_unspec.py, as toggling specialization without dynamic shapes doesn't do anything. As dynamic shapes defaults to unspecializing, I just deleted this entirely; for the specialization case, I rely on regular static shape tests to catch it. (Hypothetically, we could also rerun all the tests with dynamic shapes, but WITH int/float specialization, but this seems... not that useful? I mean, I guess export wants it, but I'd kind of like our Source heuristic to improve enough that export doesn't have to toggle this either.)
* Only 0/1 integers get specialized by default now
* A hodgepodge of fixes. I'll comment on the PR about them.

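To illustrate the intended behavior (a sketch; remember unspecialization is off by default in this PR):
```python
import torch
import torch._dynamo.config

torch._dynamo.config.specialize_int = False

@torch.compile(dynamic=True)
def f(x, n):
    return x * n  # n is traced as a SymInt, not burned in as a constant

x = torch.randn(4)
f(x, 2)
f(x, 3)  # no recompile: 2 and 3 share one graph (0/1 would still specialize)
```
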
Fixes https://github.com/pytorch/pytorch/issues/95469

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95621
Approved by: https://github.com/jansel, https://github.com/Chillee
2023-03-04 01:22:08 +00:00
Jason Ansel
62b775583f [inductor] Improve error messages (#95567)
Example error message before/after (710 to 131 lines):
https://gist.github.com/jansel/6fecad057738089fa95bf08c3de9fc8a

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95567
Approved by: https://github.com/mlazos
2023-03-02 02:20:55 +00:00
Kazuaki Ishizaki
46385b3e48 Fix typos under torch/_dynamo directory (#95599)
This PR fixes typos in comments and messages of `.py` files under the `torch/_dynamo` directory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95599
Approved by: https://github.com/ezyang
2023-02-28 03:44:24 +00:00
Edward Z. Yang
4833e47feb Add support for nonzero, some improvements to reduce guards (#95387)
This takes the strategy described in https://docs.google.com/document/d/1lFRYAJo5nrfxRhwIzGnfi2pbLpU6T4ytSRSuLJ5qebI/edit#

It is essentially https://github.com/pytorch/pytorch/pull/95222 but squashed and with changes that are unnecessary given that we assume nonzero returns > 1.

What's in the PR:

* nonzero now supports meta propagation. When `capture_dynamic_output_shape_ops`, it will return a tensor with an unbacked SymInt representing the size in question.
* The unbacked SymInt is UNSOUNDLY assumed to be not equal to 0/1. We will still error if you guard otherwise.
* PrimTorch pointwise operators are updated to use empty_permuted, to avoid guarding on unbacked SymInt from empty_strided (tested in `test_dynamic_pointwise_scalar`)
* Convolution is updated to skip backend selection if batch is unbacked, to avoid guarding on unbacked SymInt (tested in `test_unbacked_batch_resnet`)
* I kept the helper utilities like `definitely_true` for working with possibly unbacked SymInts. They're not used right now but maybe someone will find them useful.
* Added `constrain_unify` to let you specify two unbacked SymInts must have the same value

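A sketch of the data-dependent-output case this enables (using the config flag named above):
```python
import torch
import torch._dynamo.config

torch._dynamo.config.capture_dynamic_output_shape_ops = True

@torch.compile(fullgraph=True)
def f(x):
    idx = torch.nonzero(x)  # output size is an unbacked SymInt
    return idx.sum()        # fine as long as we never guard on that size

f(torch.tensor([0.0, 1.0, 0.0, 2.0]))
```
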
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/95387
Approved by: https://github.com/voznesenskym
2023-02-24 00:27:45 +00:00
Michael Voznesensky
500ebb2cd6 Fine grained dynamic shape controls (#94787)
https://docs.google.com/document/d/1aoIyYE8_6cYpWqS25thzVoIiKsT5aaUEOiiPwbIXt8k/edit

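A hedged sketch of the fine-grained control this adds (assuming the `mark_dynamic` API described in the linked doc):
```python
import torch
import torch._dynamo

x = torch.randn(8, 32)
# Ask dynamo to treat dim 0 as dynamic while leaving dim 1 static.
torch._dynamo.mark_dynamic(x, 0)

@torch.compile
def f(t):
    return t * 2

f(x)
```
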
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94787
Approved by: https://github.com/ezyang
2023-02-17 22:28:37 +00:00
PyTorch MergeBot
e0ede1cc30 Revert "Fine grained dynamic shape controls (#94787)"
This reverts commit 2aa806608b.

Reverted https://github.com/pytorch/pytorch/pull/94787 on behalf of https://github.com/kit1980 due to After this PR, test_autocast_sdpa_dynamic_shapes_static_default started to fail with RuntimeError: Cannot call sizes() on tensor with symbolic sizes/strides: https://github.com/pytorch/pytorch/actions/runs/4206176846/jobs/7299657478
2023-02-17 19:52:16 +00:00
Michael Voznesensky
2aa806608b Fine grained dynamic shape controls (#94787)
https://docs.google.com/document/d/1aoIyYE8_6cYpWqS25thzVoIiKsT5aaUEOiiPwbIXt8k/edit

Pull Request resolved: https://github.com/pytorch/pytorch/pull/94787
Approved by: https://github.com/ezyang
2023-02-17 17:39:22 +00:00
Jason Ansel
4d6a4401f8 Raise warning if torch.compile options change without reset (#94680)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/94680
Approved by: https://github.com/wconstab, https://github.com/malfet
2023-02-13 20:21:04 +00:00
Xiaodong Wang
88e16849db [pt2] Fix multiple races in log folder (#93407)
Summary:
There are a few races/permission errors in file creation; fixing:
OSS:
1. caffe2/torch/_dynamo/utils.py, get_debug_dir: multiple processes may conflict on the same directory even though its name has microsecond resolution; adding the pid to it.
2. caffe2/torch/_dynamo/config.py: it may not be right to assume that we have write permission to the cwd.

Test Plan: sandcastle

Differential Revision: D42905908

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93407
Approved by: https://github.com/soumith, https://github.com/mlazos
2023-02-09 21:10:14 +00:00
Jason Ansel
57d74aae55 Remove torch/_dynamo/optimizations/normalize.py (#93278)
This file was largely made obsolete by dispatcher level functionalization,
and has been disabled by config.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93278
Approved by: https://github.com/voznesenskym
2023-02-02 02:02:54 +00:00
Edward Z. Yang
ca9ebf9e2b Delete dynamo_import and inductor_import (#93851)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/93851
Approved by: https://github.com/albanD, https://github.com/jansel
2023-02-02 01:51:29 +00:00