Commit Graph

25787 Commits

Fabio Rocha
c6c2de586d [inductor] New approach for computing triton load/store masks (#89566)
This PR changes the way masks for loads/stores are computed in triton backend of inductor.

The new approach is to iterate over all variables used in the indexing expression and add the corresponding mask variables to the set that will be used. For indexing variables like `x0`, `y1` and `r3` it adds `xmask`, `ymask` and `rmask` respectively.
For indexing variables like `tmp5` (i.e., indirect indexing), it uses the new `mask_vars` attribute of the corresponding `TritonCSEVariable` object, which is populated when the variable is created.

I started working on this with the aim of fixing https://github.com/pytorch/torchdynamo/issues/1654, which meanwhile was fixed by #89524 with a different approach, making this change less necessary. However note that #89524 fixes the issue by broadcasting the indices that are being loaded to a larger size, while this approach fixes it by making the mask have only the necessary terms.
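
As a rough illustration of the idea (a hypothetical sketch; the helper and the `cse_var_mask_vars` mapping stand in for the real bookkeeping on `TritonCSEVariable`):

```python
import sympy


def masks_for(index_expr, cse_var_mask_vars):
    mask_vars = set()
    for sym in index_expr.free_symbols:
        name = str(sym)
        if name[0] in ("x", "y", "r"):       # x0 -> xmask, y1 -> ymask, r3 -> rmask
            mask_vars.add(f"{name[0]}mask")
        elif name.startswith("tmp"):         # indirect indexing via a loaded value
            mask_vars |= cse_var_mask_vars.get(name, set())
    return mask_vars
```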

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89566
Approved by: https://github.com/jansel, https://github.com/ngimel
2022-12-09 12:43:19 +00:00
Alex Settle
6b7efac3c9 Reland "Add hierarchical module names to torchFX graph.node" (#90205)
Fixes #87659

Reland of PR #87742

Resolves errors that caused the changes to be backed out.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90205
Approved by: https://github.com/jerryzh168
2022-12-09 06:20:31 +00:00
HDCharles
c71b12851d [ao] public vs private for ao.quantization._X (#88392)
Summary: added `__all__` for these modules without altering names, since they
tend to be experimental

Test Plan: python test/test_public_bindings.py

Differential Revision: [D41015543](https://our.internmc.facebook.com/intern/diff/D41015543)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88392
Approved by: https://github.com/jcaip
2022-12-09 05:39:29 +00:00
HDCharles
6050a7a3d9 [ao] backend_config moving all to top (#88391)
Summary: moved `__all__` to the top, removed private functions
from `__all__`

Test Plan: python test/test_public_bindings.py

Differential Revision: [D41015538](https://our.internmc.facebook.com/intern/diff/D41015538)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88391
Approved by: https://github.com/jcaip
2022-12-09 05:39:29 +00:00
Xilun Wu
3759777edc [threaded PG] fix long hang issue in testing (#90515)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90515
Approved by: https://github.com/wanchaol
2022-12-09 05:24:08 +00:00
Mark Saroufim
db0ce4acf3 Dynamo, FX, Inductor Progress Bars (#88384)
There are 3 progress bars each gated behind their own config, all off by default for now
1. Dynamo: Macro level config for dynamo, AOT, inductor
2. FX: Progress bar for each pass, with their names
3. Inductor

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88384
Approved by: https://github.com/wconstab, https://github.com/mlazos
2022-12-09 04:32:31 +00:00
Mauricio Villegas
aacafd2cba Fixed a couple of mistakes in type annotations in optim package (#90216)
Doing some tests with all Optimizer and LRScheduler classes in optim package, I noticed a couple of mistakes in type annotations, so created a pull request to fix them.

- In Optimizer class, incorrectly named parameter `default` instead of `defaults` in pyi file
- In SGD class, type for `maximize` and `differentiable` not available in either py or pyi files

I don't know if there is a plan to move all types from pyi to py files, so wasn't too sure where to fix what.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90216
Approved by: https://github.com/janeyx99
2022-12-09 03:20:21 +00:00
Jerry Zhang
797544f1c4 [dynamo][ez] Change module type to str for easier downstream parsing (#90429)
Summary:
As titled.

Test Plan:
NA

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90429
Approved by: https://github.com/SherlockNoMad
2022-12-09 02:00:18 +00:00
Jerry Zhang
f978a8b026 [quant][be] Remove special casing for getitem in prepare (#90393)
Summary:
This PR cleans up previous special casing for getitem, it should be configured through BackendConfig

Test Plan:
python test/test_quantization.py TestQuantizeFx

Differential Revision: [D41846185](https://our.internmc.facebook.com/intern/diff/D41846185)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90393
Approved by: https://github.com/andrewor14
2022-12-09 01:59:02 +00:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
c8f5c194ca Fix bug in dynamic shapes multiply (#90336)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90336
Approved by: https://github.com/ezyang
2022-12-09 00:59:50 +00:00
Andrew Gu
2cf703214b [Composable API][Easy] Fix some follow-ups (#90471)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90471
Approved by: https://github.com/mrshenli
2022-12-09 00:26:38 +00:00
William Wen
eb5b4c21e1 Deepcopy GraphModule in minifier (#90401)
Fixes https://github.com/pytorch/pytorch/issues/90397. Remove deepcopy calls in minifier tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90401
Approved by: https://github.com/anijain2305, https://github.com/mlazos
2022-12-08 23:59:05 +00:00
Howard Huang
80150788bc [21/N] Add alltoall_base custom op with CPU/CUDA implementations (#89813)
Differential Revision: [D41812670](https://our.internmc.facebook.com/intern/diff/D41812670)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89813
Approved by: https://github.com/kwen2501
2022-12-08 23:39:26 +00:00
Howard Huang
e65ee3975f [20/N] Add recv_any_source custom op with CPU/CUDA implementations (#89505)
Differential Revision: [D41812671](https://our.internmc.facebook.com/intern/diff/D41812671)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89505
Approved by: https://github.com/kwen2501
2022-12-08 23:39:26 +00:00
Rohan Varma
43660051d8 [Ez] Omit HSDP Z2 from doc (#90503)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90503
Approved by: https://github.com/awgu
2022-12-08 23:05:49 +00:00
William Wen
9bb16cd3ca Track torch.compile calls (#90310)
Title.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90310
Approved by: https://github.com/colin2328, https://github.com/anijain2305
2022-12-08 21:41:15 +00:00
Michael Lazos
76f440f20a [dynamo] Rewrite inplace addcdiv and inplace add (#90330)
Rewrite in-place addcdiv into a div, a mul and an in-place add to avoid a graph break
Rewrite in-place add into a mul and an in-place add to avoid a graph break

Needed to close optimizer graph breaks
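
Roughly, the equivalence being used looks like this (an illustrative sketch with made-up tensor names, not the actual Dynamo rewrite):

```python
import torch

param, exp_avg, denom = (torch.randn(4) for _ in range(3))
value = -0.01

# param.addcdiv_(exp_avg, denom, value=value) is equivalent to:
tmp = exp_avg / denom   # div
tmp = tmp * value       # mul
param.add_(tmp)         # in-place add
```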

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90330
Approved by: https://github.com/jansel
2022-12-08 21:19:23 +00:00
Stephen Macke
0c972fb5c7 [rfc][pkg] check spec for module source before falling back to file in package exporter (#90258)
Summary: To get source for a particular module, the "correct" thing to do is to check the module's spec and use `get_source` if it's a SourceFileLoader, since subclasses may look elsewhere than the `__file__`, and the spec will give the source of truth. For torch packager, however, we prefer to use linecache, but the loader could still change the file, so we figure out the file for the module using the spec's loader rather than using `module.__file__`, if possible.
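
A minimal sketch of the lookup described above, using only standard importlib APIs (not the exact packager code):

```python
import importlib
import importlib.util
import linecache


def get_module_source(module_name):
    # Prefer the filename reported by the spec's loader over module.__file__,
    # since a SourceFileLoader subclass may point somewhere other than __file__.
    spec = importlib.util.find_spec(module_name)
    if spec is not None and hasattr(spec.loader, "get_filename"):
        filename = spec.loader.get_filename(module_name)
    else:
        filename = getattr(importlib.import_module(module_name), "__file__", None)
    if filename is None:
        return None
    # Still read the source through linecache, which the packager prefers.
    return "".join(linecache.getlines(filename)) or None
```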

Test Plan: This code path will get exercised by CI. Also added a test for remapped files.

Differential Revision: D41412983

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90258
Approved by: https://github.com/PaliC
2022-12-08 20:24:45 +00:00
Zheng Yan
e1674d7dc0 avoid fork in torch/__init__.py for deploy/multipy (#90492)
Summary:
We should not fork in deploy when initializing torch.

    Traceback (most recent call last):
    File "<string>", line 38, in <module>
    File "<string>", line 36, in __run
    File "/usr/local/fbcode/platform010/lib/python3.8/runpy.py", line 194, in _run_module_as_main
        return _run_code(code, main_globals, None,
    File "/usr/local/fbcode/platform010/lib/python3.8/runpy.py", line 87, in _run_code
        exec(code, run_globals)
    File "/data/users/zyan/fbsource/buck-out/v2/gen/fbcode/104a4d5c3a690252/multipy/runtime/__test_py__/test_py#link-tree/multipy/runtime/test_py.py", line 61, in <module>
        import torch # has to be done serially otherwise things will segfault
    File "/data/users/zyan/fbsource/buck-out/v2/gen/fbcode/104a4d5c3a690252/multipy/runtime/__test_py__/test_py#link-tree/torch/__init__.py", line 158, in <module>
        platform.system() != 'Windows':
    File "/usr/local/fbcode/platform010/lib/python3.8/platform.py", line 891, in system
        return uname().system
    File "/usr/local/fbcode/platform010/lib/python3.8/platform.py", line 857, in uname
        processor = _syscmd_uname('-p', '')
    File "/usr/local/fbcode/platform010/lib/python3.8/platform.py", line 613, in _syscmd_uname
        output = subprocess.check_output(('uname', option),

Test Plan: override a local script run trigger init and set `subprocess.check_output` to None

Reviewed By: yinghai, houseroad

Differential Revision: D41848592

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90492
Approved by: https://github.com/PaliC
2022-12-08 20:22:01 +00:00
Elias Ellison
b651e06049 Add Pointwise Tag from pointwise set in DTensor, use in aot_autograd partitioner (#90029)
Takes the pointwise op list from [DTensor](https://github.com/pytorch/pytorch/blob/master/torch/distributed/_tensor/ops/pointwise_ops.py#L36) as an initially starting point for pointwise ops, and feeds them to the aot autograd partitioner.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90029
Approved by: https://github.com/ezyang
2022-12-08 20:21:17 +00:00
Richard Zou
7342251281 functorch.grad support for autograd.Function (#89860)
Happy to split this PR more if it helps.

This PR adds functorch.grad support for autograd.Function. There's a lot
going on; here is the high level picture and there are more details as
comments in the code.

Mechanism (PyOperator)
- Somehow, autograd.Function needs to dispatch with functorch. This is
necessary because every layer of functorch needs to see the
autograd.Function; grad layers need to preserve the backward pass.
- The mechanism for this is via PyOperator. If functorch transforms are
active, then we wrap the autograd.Function in a `custom_function_call`
PyOperator where we are able to define various rules for functorch
transforms.
- `custom_function_call` has a rule for the functorch grad transform.

autograd.Function changes
- I needed to make some changes to autograd.Function to make this work.
- First, this PR splits autograd.Function into a _SingleLevelFunction
(that works with a single level of functorch transform) and
autograd.Function (which works with multiple levels). This is necessary
because functorch's grad rule needs some way of specifying a backward
pass for that level only.
- This PR changes autograd.Function's apply to either call
`custom_function_call` (if functorch is active) or super().apply (if
functorch isn't active).

Testing
- Most of this PR is just testing. It creates an autograd.Function
OpInfo database that then gets passed to the functorch grad-based tests
(grad, vjp, vjpvjp).
- Since functorch transform tests are autogenerated from OpInfo tests,
this is the easiest way to test various autograd.Function with
functorch.

Future
- jvp and vmap support coming next
- better error message (functorch only supports autograd.Function that
have the optional setup_context staticmethod)
- documentation to come when we remove the feature flag

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89860
Approved by: https://github.com/soulitzer
2022-12-08 19:31:04 +00:00
Richard Zou
eb314f9b1a Add setup_context staticmethod to autograd.Function (#89859)
Adds a setup_context staticmethod to autograd.Function.
If it exists, then the user splits the ctx-specific logic from the
forward() and puts it in the setup_context staticmethod.
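
A minimal sketch of the resulting split (illustrative only; the feature is still behind the flag added in #89858, and the exact signature may evolve):

```python
import torch


class MyCube(torch.autograd.Function):
    @staticmethod
    def forward(x):
        # forward() no longer takes ctx; it only computes the output.
        return x ** 3

    @staticmethod
    def setup_context(ctx, inputs, output):
        # All ctx-specific logic lives here instead of inside forward().
        (x,) = inputs
        ctx.save_for_backward(x)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return 3 * x ** 2 * grad_out
```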

Docs will come later when we remove the feature flag.

Test Plan:
- some light tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89859
Approved by: https://github.com/soulitzer
2022-12-08 19:31:04 +00:00
Richard Zou
103be1f164 Add feature flag for the autograd.Function extension (#89858)
This PR adds a private runtime feature flag for the feature work we're going
to do with extending autograd.Function. The motivation of the feature flag
is:
- to guard the feature against unsuspecting users
- control the release of the feature to when we are ready to release it

We might not even need the feature flag (because we hope to have the
work done in the next month), but it is good practice and it does touch
currently public API (autograd.Function).

Concretely, "autograd.Function extension" refers to:
- adding an optional `setup_context` staticmethod to autograd.Function
- adding an optional `vmap` staticmethod to autograd.Function
- autograd.Function support for functorch

Test Plan:
- new test that the feature flag works
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89858
Approved by: https://github.com/soulitzer
2022-12-08 19:31:01 +00:00
Yuxin Wu
1ba5c55992 skip flaky tests (rather than expectedFailure) (#90233)
They are flaky but don't always fail. So `expectedFailure` is incorrect.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90233
Approved by: https://github.com/mruberry, https://github.com/soumith
2022-12-08 18:29:11 +00:00
PyTorch MergeBot
e89685b0b5 Revert "[inductor] Use decomposition for _to_copy (#90314)"
This reverts commit 3fdb5f2dda.

Reverted https://github.com/pytorch/pytorch/pull/90314 on behalf of https://github.com/desertfire due to regresses performance on hf_Bert
2022-12-08 18:29:06 +00:00
Jiewen Tan
b738da8c8e [LTC] Tweak LazyTensor Class for XLATensor (#90363)
Summary:
This pull request makes some tweaks to the LazyTensor class so that it is easier for XLATensor to inherit from it.

1. It replaces data_ptr() with data() which now returns a const shared_ptr& type.
2. It adds a temporary ctor to LazyTensor::Data such that XLATensor::Data can easily inherit it.
3. It moves LazyTensor(std::shared_ptr<Data>) and SetTensorData(at::Tensor) to protected for XLATensor to access.

Test Plan:
CI.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90363
Approved by: https://github.com/JackCaoG
2022-12-08 18:23:17 +00:00
Bin Bao
d2ee94231e [inductor] Fallback for index with None in the middle of indices (#90022)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90022
Approved by: https://github.com/ngimel
2022-12-08 16:18:57 +00:00
Rohan Varma
793a999ce0 Hybrid Sharded Data Parallel (#89915)
Adds 2 new hybrid sharding strategies to FSDP:
1. HYBRID_SHARD: applies zero-3 style sharding within a node, and data parallelism across nodes
2. HYBRID_SHARD_ZERO2: applies zero-2 style sharding within a node, and data parallelism across nodes

These are useful for medium sized models and aim to decrease communication volume, tests and benchmarks will be run to understand which workloads are optimal under which sharding strategy.

Hybrid sharding in general works by sharding the model using a process group within a single node, and creating inter-node process groups for replication / data parallelism. The user either needs to pass in a tuple of these process groups, or None, and we generate the process groups appropriately.
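
A rough usage sketch, assuming the strategy enum and the tuple-of-process-groups form described above (names may differ slightly from the final API):

```python
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, ShardingStrategy

model = nn.Transformer()
fsdp_model = FSDP(
    model,
    # zero-3 style sharding within a node, replication across nodes
    sharding_strategy=ShardingStrategy.HYBRID_SHARD,
    # None lets FSDP construct the intra-node (sharding) and inter-node
    # (replication) groups itself; a (shard_pg, replicate_pg) tuple can also
    # be passed explicitly.
    process_group=None,
)
```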

**Acknowledgements**
- @awgu 's excellent prototype: 5ad3a16d48
- @liangluofb For ideation, feedback, and initial implementation and experimentation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89915
Approved by: https://github.com/awgu
2022-12-08 16:18:03 +00:00
Peter Bell
454361435c Implement correction argument in torch.masked.{std,var} (#87118)
This makes the signature of `torch.masked.std` and `var` more consistent with the global namespace variant and also updates the sample inputs to repurpose the existing `sample_inputs_std_var` inputs which fully exercise the `correction` argument.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87118
Approved by: https://github.com/cpuhrsch
2022-12-08 15:59:09 +00:00
Andrew Gu
21a0e809c2 [Composable API] Match fully_shard() comm. schedule with wrapper FSDP (#90387)
- This PR introduces a new concept, the _communication module_ (denoted `comm_module`), that represents the module responsible for the unshard/reshard pair for a `FlatParamHandle`. This is well-defined because the current design assumes that each `FlatParamHandle` only has _one_ unshard/reshard pair for either the forward or backward pass.
    - For the wrapper code path, the `comm_module` is exactly the module already being passed to the `FlatParamHandle` constructor.
    - For the composable code path, the `comm_module` is not necessarily the module already being passed to the `FlatParamHandle`. This is because the module already being passed is always the local FSDP root module to give complete FQNs, instead of local FQNs. Distinguishing the communication module from the local FSDP root module can provide more flexibility for non-recursive wrapping designs in the future.
- This PR adds a unit test `test_unshard_reshard_order` that explicitly checks that `_unshard` and `_reshard` are called in exactly the same order across the two code paths.
- This PR does not fix `test_checkpoint_fsdp_submodules_use_reentrant`. However, the error message changes, so this PR accommodates that.
    - The error is now the same as if we used the equivalent wrapper FSDP:
    ```
    test_model.u1 = FSDP(test_model.u1, use_orig_params=True)
    test_model.u2 = FSDP(test_model.u2, use_orig_params=True)
    ```
    - The error is also the same as if we used wrapper FSDP with `use_orig_params=False`, so it is not unique to `use_orig_params=True`.

---

**`comm_module` Example**

```
model = Model(
    seq1: nn.Sequential(
        nn.Linear
        nn.ReLU
        nn.Linear
        nn.ReLU
    )
    seq2: nn.Sequential(
        nn.Linear
        nn.ReLU
        nn.Linear
        nn.ReLU
    )
)
policy = ModuleWrapPolicy({nn.Sequential})
fully_shard(model, policy=policy)
FullyShardedDataParallel(model, auto_wrap_policy=policy)
```
- This policy constructs two `FlatParamHandle`s, one for `seq1` and one for `seq2`.
- `FullyShardedDataParallel` will pass `seq1` and `seq2` as the `module` argument to the two `FlatParamHandle`s, respectively.
- `fully_shard()` will pass `model` as the `module` argument to every `FlatParamHandle`.
- `FullyShardedDataParallel` will pass `seq1` and `seq2` as the `comm_module` argument to the two `FlatParamHandle`s, respectively.
- `fully_shard()` will pass `seq1` and `seq2` as the `comm_module` argument to the two `FlatParamHandle`s, respectively.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90387
Approved by: https://github.com/mrshenli
2022-12-08 15:55:20 +00:00
Andrew Gu
4011597dd4 [Composable API] Refactor test_fully_shard.py to use common models (#90386)
Unlike for FSDP, where we already diverged to using per-test-file models, let us try to use the same set of models for the composable API effort. This can improve debugging efficiency because we know which module structures we support and which we do not _across all of our composable APIs_.

This PR had to perform some surgery for `test_materialize_meta_module`. Writing a correct parameter initialization function for meta device initialization is not easy, and we should revisit this. The old implementation, which followed the style of the previous unit tests--namely, using `module.to_empty()`--is actually incorrect for nested FSDP applications because `module.to_empty()` will re-initialize already materialized parameters and the module materialization proceeds bottom up. The existing unit test in `test_fsdp_meta.py` passes because it sets every parameter to ones (`self.weight.fill_(1)`), which is idempotent to re-initialization.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90386
Approved by: https://github.com/mrshenli
2022-12-08 15:32:36 +00:00
Andrew Gu
5ca4e95f6c [Composable API] Move test models to common file (#90385)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90385
Approved by: https://github.com/mrshenli
2022-12-08 15:32:36 +00:00
Bin Bao
3fdb5f2dda [inductor] Use decomposition for _to_copy (#90314)
Summary: also contains a fix for https://github.com/pytorch/pytorch/issues/89633

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90314
Approved by: https://github.com/ngimel
2022-12-08 15:25:44 +00:00
Till Hoffmann
b485781440 Add a transform for positive-definite matrices. (#76777)
The `PositiveDefiniteTransform` is required to transform from an unconstrained space to positive definite matrices, e.g. to support testing the Wishart mode in #76690. It is a simple extension of the `LowerCholeskyTransform`.

I've also added a small test that ensures the generated data belong to the domain of the associated transform. Previously, the data generated for the inverse transform of the `LowerCholeskyTransform` wasn't part of the domain, and the test only passed because the comparison uses `equal_nan=True`.
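
A small usage sketch, assuming the transform ends up exposed under `torch.distributions.transforms`:

```python
import torch
from torch.distributions.transforms import PositiveDefiniteTransform  # assumed location

transform = PositiveDefiniteTransform()
x = torch.randn(3, 3)            # unconstrained square matrix
y = transform(x)                 # positive-definite matrix
torch.linalg.cholesky(y)         # succeeds only because y is positive definite
```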

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76777
Approved by: https://github.com/lezcano, https://github.com/fritzo, https://github.com/soumith
2022-12-08 09:18:44 +00:00
Yuxin Wu
c00b135adf Remove deprecated call to tf.io.gfile.get_filesystem (#89832)
Fixes #30966 . Fixes #47139
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89832
Approved by: https://github.com/soumith
2022-12-08 08:53:27 +00:00
Yuxin Wu
ecd784667c Avoid overflow in tensorboard image summary (#90423)
Fix #90419

Added some code such that the test will update the expect files when `expecttest.ACCEPT` is True.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90423
Approved by: https://github.com/soumith
2022-12-08 08:31:52 +00:00
Jiewen Tan
1978773399 [LTC] Overlap data creation and ir_value setting (#90438)
Summary:
Upstreaming changes from torch_xla to lazy tensor core: https://github.com/pytorch/xla/pull/4011.
It overlaps data creation and ir_value setting with previous executions.

To be noted, this is a clone of https://github.com/pytorch/pytorch/pull/87119, and the author is @aws-rhsoln.

Test Plan:
CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90438
Approved by: https://github.com/JackCaoG
2022-12-08 08:11:01 +00:00
Rohan Varma
9c80f13692 [Resubmit] state_dict_pre_hook (#90435)
Resubmit of https://github.com/pytorch/pytorch/pull/88541 which got stale.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90435
Approved by: https://github.com/fegin
2022-12-08 07:54:14 +00:00
Jesse Cai
de016b3799 [pruning][core][feature] Implement prune for structured pruning (#89777)
Summary:

This PR implements `prune` in BaseStructuredSparsifier:

`prune` is a function that takes in a model with structured sparsity parametrizations (the result of `prepare`) and will return a resized model with the masked-out weights removed.

`prune` is defined by a mapping from **patterns** to different **pruning functions**.
- **patterns** are just sequences of operations, for example `(nn.Linear, activation, nn.Linear)`
- **pruning functions** are functions that take in a matched pattern as args and will resize the appropriate layer sizes and weights.
  ```
  def prune_linear_activation_linear(linear1, activation, linear2):
      pass
  ```
- This is one line in the pattern config: `(nn.Linear, activation, nn.Linear): prune_linear_activation_linear`

At a high level `prune` works by finding instances of the graph that match different patterns and then calling the mapped pruning functions on those matched patterns.
This is unlike the previous code which attempted to do both at the same time.

There may be some gaps in the patterns compared to the previous implementation, but the conversion functionality should be the same.

Currently we have pruning functions for the following patterns:
    - linear -> linear
    - linear -> activation -> linear
    - conv2d -> conv2d
    - conv2d -> activation -> conv2d
    - conv2d -> activation -> pool -> conv2d
    - conv2d -> pool -> activation -> conv2d
    - conv2d -> adaptive pool -> flatten -> linear

Added in MyPy type hints as well for the prune_functions.
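
Put together, a pattern config is just a mapping like the following (a hypothetical sketch; the real config and pruning functions live in the pruner code):

```python
import torch.nn as nn


def prune_linear_linear(linear1, linear2):
    # Resize linear1's output features and linear2's input features so that
    # the masked-out channels are physically removed.
    ...


def prune_linear_activation_linear(linear1, activation, linear2):
    ...


PATTERN_CONFIG = {
    (nn.Linear, nn.Linear): prune_linear_linear,
    (nn.Linear, nn.ReLU, nn.Linear): prune_linear_activation_linear,
}
```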

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89777
Approved by: https://github.com/vkuzo
2022-12-08 07:13:24 +00:00
Jiewen Tan
c20d41253f [LTC] Tweak LazyGraphExecutor for XLA (#90420)
Summary:
This patch moves some of the data structures from private to protected such that XLAGraphExecutor can reuse them.

Test Plan:
CI.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90420
Approved by: https://github.com/JackCaoG
2022-12-08 06:56:23 +00:00
fduwjj
1a48ae96ba [PT-D][Easy] Reformat the optim code within PTD code base (#90399)
Just run two commands:
```
ufmt format torch/distributed/optim/
ufmt format test/distributed/optim/
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90399
Approved by: https://github.com/awgu
2022-12-08 06:38:59 +00:00
titaiwang
06c98e673f [ONNX] Fix ignored small eps in layer normalization in fp16 (#89869)
Prior to this change, the symbolic_fn `layer_norm` (before ONNX opset version 17) always loses precision when eps is smaller than what the Float type can represent, while PyTorch always takes eps as Double. This PR adds `onnx::Cast` to the eps-related operations to prevent losing precision during the calculation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89869
Approved by: https://github.com/BowenBao
2022-12-08 06:13:09 +00:00
PyTorch MergeBot
5f3ca208c5 Revert "add save and load stats in memory_tracker (#90144)"
This reverts commit 1f137c1e2f.

Reverted https://github.com/pytorch/pytorch/pull/90144 on behalf of https://github.com/ezyang due to dirty git working copy broke master
2022-12-08 05:16:56 +00:00
PyTorch MergeBot
22a249e44e Revert "[Inductor] More robust stride and offset extraction from index expressions (#90184)"
This reverts commit 71f27f7688.

Reverted https://github.com/pytorch/pytorch/pull/90184 on behalf of https://github.com/ngimel due to catastrophically regresses performance
2022-12-08 05:04:15 +00:00
Han Qi (qihqi)
25eb7c3ae3 Clean up dependency for flatbuffer_loader (#86041)
Test Plan: waitforsandcastle

Differential Revision: D38445936

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86041
Approved by: https://github.com/cccclai
2022-12-08 03:48:04 +00:00
Edward Z. Yang
37892041a1 Always compile tiny graphs with AOTAutograd (#89775)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89775
Approved by: https://github.com/anjali411, https://github.com/bdhirsh
2022-12-08 03:41:29 +00:00
Iris
b8b7480065 [Checkpoint][2D][6/N] Add optimizer and update default_planner to core distributed (#90212)
This is the last PR for integrating 2D into core distributed.

This PR does the following:
1. Add optimizer.py: this adds the ability to load a state_dict in conjunction with FSDP sharded optimizer state.
2. Update default_planner.py to support 2D checkpoint.
3. Add test_fsdp_optim_state.py as a unit test for No. 1.
4. Fix bug in torch/testing/_internal/distributed/checkpoint_utils.py
5. Rename the filename for the APIs that should be private. Will organize and cleanup further in following PRs. #90328

Docstring and integration test will be added in the following PRs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90212
Approved by: https://github.com/wanchaol
2022-12-08 02:53:29 +00:00
Nikita Shulga
36ac095ff8 Migrate PyTorch to C++17 (#85969)
With CUDA-10.2 gone we can finally do it!

This PR mostly contains build system related changes, invasive functional ones are to be followed.
Among many expected tweaks to the build system, here are few unexpected ones:
 - Force onnx_proto project to be updated to C++17 to avoid `duplicate symbols` error when compiled by gcc-7.5.0, as storage rule for `constexpr` changed in C++17, but gcc does not seem to follow it
 - Do not use `std::apply` on CUDA but rely on the built-in variant, as it results in test failures when CUDA runtime picks host rather than device function when `std::apply` is invoked from CUDA code.
 - `std::decay_t` -> `::std::decay_t` and `std::move` -> `::std::move` as VC++ for some reason claims that the `std` symbol is ambiguous
 - Disable use of `std::aligned_alloc` on Android, as its `libc++` does not implement it.

Some prerequisites:
 - https://github.com/pytorch/pytorch/pull/89297
 - https://github.com/pytorch/pytorch/pull/89605
 - https://github.com/pytorch/pytorch/pull/90228
 - https://github.com/pytorch/pytorch/pull/90389
 - https://github.com/pytorch/pytorch/pull/90379
 - https://github.com/pytorch/pytorch/pull/89570
 - https://github.com/facebookincubator/gloo/pull/336
 - https://github.com/facebookincubator/gloo/pull/343
 - 919676fb32

Fixes https://github.com/pytorch/pytorch/issues/56055

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85969
Approved by: https://github.com/ezyang, https://github.com/kulinseth
2022-12-08 02:27:48 +00:00
Will Constable
772b726068 Revert "Disable dynamo tracing torchrec.distributed (#90087)" (#90416)
This reverts commit 7e9a8a1361.

This revert fixes a torchbench dlrm amp crash.  Auto revert fails due to conflict.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90416
Approved by: https://github.com/yanboliang, https://github.com/malfet
2022-12-08 01:50:54 +00:00
Richard Barnes
ad188a227e Introduce CUDA Device Assertions Infrastructure (#84609)
Summary:
This diff introduces a set of changes that makes it possible for the host to get assertions from CUDA devices. This includes the introduction of

**`CUDA_KERNEL_ASSERT2`**

A preprocessor macro to be used within a CUDA kernel that, upon an assertion failure, writes the assertion message, file, line number, and possibly other information to UVM (Managed memory). Once this is done, the original assertion is triggered, which places the GPU in a Bad State requiring recovery. In my tests, data written to UVM appears there before the GPU reaches the Bad State and is still accessible from the host after the GPU is in this state.

Messages are written to a multi-message buffer which can, in theory, hold many assertion failures. I've done this as a precaution in case there are several, but I don't actually know whether that is possible and a simpler design which holds only a single message may well be all that is necessary.

**`TORCH_DSA_KERNEL_ARGS`**

This preprocessor macro is added as an _argument_ to a kernel function's signature. It expands to supply the standardized names of all the arguments needed by `C10_CUDA_COMMUNICATING_KERNEL_ASSERTION` to handle device-side assertions. This includes, e.g., the name of the pointer to the UVM memory the assertion would be written to. This macro abstracts the arguments so there is a single point of change if the system needs to be modified.

**`c10::cuda::get_global_cuda_kernel_launch_registry()`**

This host-side function returns a singleton object that manages the host's part of the device-side assertions. Upon allocation, the singleton allocates sufficient UVM (Managed) memory to hold information about several device-side assertion failures. The singleton also provides methods for getting the current traceback (used to identify when a kernel was launched). To avoid consuming all the host's memory the singleton stores launches in a circular buffer; a unique "generation number" is used to ensure that kernel launch failures map to their actual launch points (in the case that the circular buffer wraps before the failure is detected).

**`TORCH_DSA_KERNEL_LAUNCH`**

This host-side preprocessor macro replaces the standard
```
kernel_name<<<blocks, threads, shmem, stream>>>(args)
```
invocation with
```
TORCH_DSA_KERNEL_LAUNCH(blocks, threads, shmem, stream, args);
```
Internally, it fetches the UVM (Managed) pointer and generation number from the singleton and appends these to the standard argument list. It also checks to ensure the kernel launches correctly. This abstraction on kernel launches can be modified to provide additional safety/logging.

**`c10::cuda::c10_retrieve_device_side_assertion_info`**
This host-side function checks, when called, that no kernel assertions have occurred. If one has, it then raises an exception with:
1. Information (file, line number) of what kernel was launched.
2. Information (file, line number, message) about the device-side assertion
3. Information (file, line number) about where the failure was detected.

**Checking for device-side assertions**

Device-side assertions are most likely to be noticed by the host when a CUDA API call such as `cudaDeviceSynchronize` is made and fails with a `cudaError_t` indicating
> CUDA error: device-side assert triggered CUDA kernel errors

Therefore, we rewrite `C10_CUDA_CHECK()` to include a call to `c10_retrieve_device_side_assertion_info()`. To make the code cleaner, most of the logic of `C10_CUDA_CHECK()` is now contained within a new function `c10_cuda_check_implementation()` to which `C10_CUDA_CHECK` passes the preprocessor information about filenames, function names, and line numbers. (In C++20 we can use `std::source_location` to eliminate macros entirely!)

# Notes on special cases

* Multiple assertions from the same block are recorded
* Multiple assertions from different blocks are recorded
* Launching kernels from many threads on many streams seems to be handled correctly
* If two process are using the same GPU and one of the processes fails with a device-side assertion the other process continues without issue
* X Multiple assertions from separate kernels on different streams seem to be recorded, but we can't reproduce the test condition
* X Multiple assertions from separate devices should be all be shown upon exit, but we've been unable to generate a test that produces this condition

Differential Revision: D37621532

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84609
Approved by: https://github.com/ezyang, https://github.com/malfet
2022-12-08 01:26:07 +00:00
Atul Jangra
1ba94b3882 Support pickle version 4 by adding missing ops (#90223)
Summary:
In this logic, we are traversing the entries to find the module for STACK_GLOBAL entries.

According to 2837241f22/Lib/pickletools.py (L1799) we need to look for GET, BINGET and LONG_BINGET.

So this diff updates that. While testing, I also found some cases of empty modules (e.g., for tanh); for these, I added the option to skip processing.
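
The opcode handling amounts to roughly the following (a simplified sketch built on the standard pickletools module, not the actual diff):

```python
import pickle
import pickletools

payload = pickle.dumps(len)  # pickling a global emits STACK_GLOBAL at protocol >= 4

memo, stack = {}, []
for opcode, arg, pos in pickletools.genops(payload):
    if opcode.name in ("SHORT_BINUNICODE", "BINUNICODE", "UNICODE"):
        stack.append(arg)
    elif opcode.name == "MEMOIZE":
        memo[len(memo)] = stack[-1] if stack else None
    elif opcode.name in ("GET", "BINGET", "LONG_BINGET"):
        stack.append(memo.get(arg))      # resolve the memo reference back to the string
    elif opcode.name == "STACK_GLOBAL":
        module, qualname = stack[-2], stack[-1]
        print(module, qualname)          # -> builtins len
```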

Test Plan: Tested with f392778829

Differential Revision: D41748595

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90223
Approved by: https://github.com/PaliC
2022-12-08 01:06:40 +00:00
Edward Z. Yang
d5c6a74699 Rewrite dynamo cond() handling to not recursively call export (#90286)
The original implementation of cond() operator support in dynamo operated by recursively calling export() on the inner subgraph.  This is problematic for a number of reasons:

* My original motivating reason: the original implementation had to play tricks to feed real tensors to the recursive export call, which means that it doesn't work well with tracing with dynamic shapes (where we MUST stay in fake tensors to accurately track dynamic shapes across the cond invocation)
* If there are pending side effects, the recursive export() call won't see those side effects (as they are only tracked by Dynamo, not actually applied to the Python environment.) You can see an example where dynamo cond tracing does the wrong thing at https://github.com/pytorch/pytorch/pull/90208
* If there were side effects inside the true/false branch, these side effects were silently lost (as the export only returns the graph of tensor operations, and not any of the residual Python bytecodes necessary to reapply any side effects.) This could have substantive effects on the export of subsequent parts of the model, as those parts of the models could rely on the side effects.
* It was not possible to track NN module accesses inside the true/false branches, necessitating a hack where the NN module was explicitly passed in as an input to cond https://github.com/pytorch/pytorch/pull/87020#issuecomment-1338842844 which doesn't really make any sense from a backend compilation perspective
* Guards induced from the inside of the true/false branch were not properly propagated to the top level guards; they were just silently dropped (in fact, the original implementation checked that the true/false branch produce the same guards which... is not useful? Like, I don't think that actually is even necessary for correctness)

This PR replaces the old implementation with a new implementation based on graphstate checkpointing. The basic idea is to process a cond(), we checkpoint the state of our interpreter, run the true branch, rollback to our checkpoint, run the false branch, rollback to our checkpoint and then merge the changes from both of the checkpoints. I require the true/false branches to have exactly the same side effects, but union their guards.
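
For reference, the operator being traced looks roughly like this from the user side (the import path shown is the experimental one at the time and may have moved):

```python
from functorch.experimental.control_flow import cond  # assumed location


def true_fn(x):
    return x.sin()


def false_fn(x):
    return x.cos()


def f(x):
    # Dynamo checkpoints the graph state, traces true_fn, rolls back, traces
    # false_fn, then merges the two checkpoints (requiring identical side
    # effects and unioning the guards).
    return cond(x.sum() > 0, true_fn, false_fn, [x])
```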

Some of the details:

* Dynamo is too aggressive with tracking side effects when processing closures, c.f. https://github.com/pytorch/torchdynamo/pull/233/files#r1040480078 The basic problem is whenever I define a closure, this immediately counts as a side effect, even if I didn't actually mutate anything. This triggered on the nested cond export example. To prevent this from happening, I optimistically avoid tracking side effects, but if a STORE_DEREF happens, I restart analysis with the relevant Source.name() added to `mutated_closure_cell_contents` so we start tracking on closure allocation. This is enough to fix the relevant test.
* For the most part, I assert that the graph states must be equivalent after applying the true/false branches. During debugging, I found it useful to be able to compare two graph states and give a better description about what the divergence was. You can test this using the `diff()` method I've added to a few structures.
* The implementation now supports NestedUserFunctionVariable, which is nice as it allows the true/false branches to be defined closer to the cond implementation.
* I fixed the naming of the true/false subgraphs; previously they were named `name_0`, `name_1`, now they are named `cond_true_0` and `cond_false_0`
* I added `name_to_input` to the saved graph state. I don't actually know if this is necessary, but it seemed like a good idea.
* I have to play some tricks to get the speculating execution of the true/false branch to record into a subgraph. After a careful read of OutputGraph, I found that what would work is overriding graph with a fresh Graph that we want to write things into, and manually setting up the inputs/outputs. It's a little delicate as you have to make sure you reset the Graph to its original before you restore a checkpoint, as checkpoints don't actually save graph for efficiency, and just undo changes on the graph. This capability may usefully get refactored to OutputGraph but I didn't do it in this PR for simplicity.

There are some further problems with the cond() implementation that I leave for future work. Most of these were preexisting with the original implementation.

* Not a problem per se, but if an NN module is used by both the true and false branches, it will show up in the final graph twice (since it has to be a submodule of the GraphModule that makes use of it). I hope the export pipeline can deal with this.
* List of tensor output for cond is not supported.
* The true/false return values may not have consistent sizes/dims/etc, and we don't check them for consistency.
* If we modify fake tensors in the true/false branches, we aren't rolling them back, c.f. https://github.com/pytorch/torchdynamo/issues/1840

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90286
Approved by: https://github.com/voznesenskym
2022-12-08 01:05:12 +00:00
Edward Z. Yang
54d344b0b7 Type torch._dynamo.side_effects (#90202)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90202
Approved by: https://github.com/voznesenskym
2022-12-08 01:05:12 +00:00
Edward Z. Yang
ca5f69ef19 Convert InstructionTranslatorGraphState and OutputGraphState to NamedTuple (#90186)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90186
Approved by: https://github.com/voznesenskym
2022-12-08 01:05:12 +00:00
Edward Z. Yang
1119aac485 Type torch._dynamo.symbolic_convert (#90185)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90185
Approved by: https://github.com/voznesenskym
2022-12-08 01:05:12 +00:00
Edward Z. Yang
7abd035b2f Add missing mypy-nofollow.ini (#90179)
I'm not sure how lintrunner worked without this lol.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90179
Approved by: https://github.com/albanD, https://github.com/voznesenskym
2022-12-08 01:05:12 +00:00
Jerry Zhang
47071c3d47 [quant] Add support for symmetric quant in executorch (#90304)
Summary:
This PR adds symmetric quant in the backend config for executorch

Test Plan:
NA, will be tested in meta internal flow

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90304
Approved by: https://github.com/cccclai, https://github.com/jcaip, https://github.com/andrewor14
2022-12-08 01:03:00 +00:00
PyTorch MergeBot
9f7bc7bc24 Revert "[Quant][fx][bc-breaking] Make convert.py smaller (#90189)"
This reverts commit 824641b083.

Reverted https://github.com/pytorch/pytorch/pull/90189 on behalf of https://github.com/seemethere due to Fails internal tests due to potential circular import, see https://www.internalfb.com/diff/D41817429?dst_version_fbid=1453307181865235&transaction_fbid=899728221278938
2022-12-08 00:51:13 +00:00
Bin Bao
d7c30e11c6 [inductor] Remove .to from lowering (#90280)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90280
Approved by: https://github.com/ngimel
2022-12-08 00:40:41 +00:00
Michael Wootton
5351176caa Kineto activity fix (#89785)
Continuation of https://github.com/pytorch/pytorch/pull/88207

A compile time guard was preventing ActivityType::CUDA from being available on ROCm. This caused both the GPU_FALLBACK and CUDA modes to be active at the same time, so operators were being charged GPU time for both the hipEventRecord ranges and the actual kernel execution times. This caused incorrect (and often negative) CUDA times in, e.g., `table()`.

Previously, a CMake variable was not being propagated to a '-D' define, causing an issue on Windows, which uses CUDA but not CUPTI.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89785
Approved by: https://github.com/jeffdaily, https://github.com/malfet
2022-12-08 00:24:55 +00:00
Peter Bell
79406378ae [primTorch] Add prim and ref for as_strided_scatter (#88426)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88426
Approved by: https://github.com/mruberry
2022-12-08 00:17:39 +00:00
Yanli Zhao
1f137c1e2f add save and load stats in memory_tracker (#90144)
Add save and load stats to memory_tracker so that users can plot the traces somewhere else, rather than only inside the trainer.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90144
Approved by: https://github.com/rohan-varma
2022-12-08 00:17:21 +00:00
Natalia Gimelshein
bc93454e4a correctly set strides for expanded/unsqueezed dimensions (#90341)
Fixes https://github.com/pytorch/torchdynamo/issues/1959, #90260
However, I wasn't able to make existing stride tests fail before the fix, even though I'm comparing all, not just significant strides.
Separately running refs on meta tensors produces wrong strides as shown in #90260, however, it looks like in meta tests some other way of computing meta info is used (I've been running
```
pytest -s -v test/test_meta.py -k test_meta_outplace_expand_cuda_float64
```
and verified that it has sample input that should fail, and that it indeed compares all the strides, but the produced `meta_rs` results somehow still had correct strides).

Edit: @SherlockNoMad helped me figure out how to fail the tests, and now I've set the correct ops for checking. `expand` fails for some test inputs because it special-cases 0-dim input case, correctly modeling it in prims would require a lot of changes, so skipping that for now.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90341
Approved by: https://github.com/SherlockNoMad
2022-12-07 23:38:33 +00:00
Nikita Shulga
e0f681aa85 Add manual cuda deps search logic (#90411)
If PyTorch is packaged into a wheel together with [nvidia-cublas-cu11](https://pypi.org/project/nvidia-cublas-cu11/), which is designated as PureLib while the `torch` wheel is not, a torch_globals loading problem can occur.

Fix that by searching for `nvidia/cublas/lib/libcublas.so.11` and `nvidia/cudnn/lib/libcudnn.so.8` across all `sys.path` folders.
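
Roughly, the search amounts to something like this (a simplified sketch; the helper name is hypothetical):

```python
import ctypes
import os
import sys


def _preload_cuda_dep(lib_folder, lib_name):
    for path in sys.path:
        candidate = os.path.join(path, "nvidia", lib_folder, "lib", lib_name)
        if os.path.exists(candidate):
            ctypes.CDLL(candidate)   # load it so torch's own dlopen can resolve it
            return True
    return False


_preload_cuda_dep("cublas", "libcublas.so.11")
_preload_cuda_dep("cudnn", "libcudnn.so.8")
```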

Test plan:
```
docker pull amazonlinux:2
docker run --rm -t amazonlinux:2 bash -c 'yum install -y python3 python3-devel python3-distutils patch;python3 -m pip install torch==1.13.0;curl -OL https://patch-diff.githubusercontent.com/raw/pytorch/pytorch/pull/90411.diff; pushd /usr/local/lib64/python3.7/site-packages; patch -p1 </90411.diff; popd; python3 -c "import torch;print(torch.__version__, torch.cuda.is_available())"'
```

Fixes https://github.com/pytorch/pytorch/issues/88869

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90411
Approved by: https://github.com/atalman
2022-12-07 23:06:51 +00:00
Angela Yi
a076bdb357 [fx] Copy codegen in legalize_graph (#90023)
Test Plan: CI

Differential Revision: D41666330

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90023
Approved by: https://github.com/SherlockNoMad
2022-12-07 21:09:38 +00:00
Edward Z. Yang
6dcc214ac2 Fix AssertionError fake_mode is not None in distributed (#90392)
Fixes https://github.com/pytorch/pytorch/issues/90375

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90392
Approved by: https://github.com/voznesenskym
2022-12-07 20:12:39 +00:00
Edward Z. Yang
2ad6ed8ac9 Fix some typed storage is deprecated warnings. (#89867)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89867
Approved by: https://github.com/albanD
2022-12-07 20:09:57 +00:00
PyTorch MergeBot
1b1301f16a Revert "[pruning][core][feature] Implement prune for structured pruning (#89777)"
This reverts commit 3531e44307.

Reverted https://github.com/pytorch/pytorch/pull/89777 on behalf of https://github.com/clee2000 due to breaking test_ao_sparsity due to import 3531e44307 https://github.com/pytorch/pytorch/actions/runs/3641476330/jobs/6147830487, probably a landrace with 824641b083860df4d7ffef06a798ea2702bc4bde?
2022-12-07 19:41:15 +00:00
Chien-Chin Huang
44779d9bc6 [FSDP][optim_state_dict][2/N] Add _get_fqn_to_fsdp_param_info to map from original FQN to flat_param (#89899)
**Motivation:**
Add a helper to map from the FQN to the corresponding flat_param. The helper will directly get flat_param from fsdp_state and flat_handler as flat_param is not registered to the module if `use_orig_params` is True.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89899
Approved by: https://github.com/awgu
2022-12-07 19:40:47 +00:00
Bin Bao
f7cdd3a7a0 [inductor] Use a large tolerance for botnet26t_256 (#90383)
Summary: botnet26t_256 shows random tolerance failure on CI. The root
cause of this randomness is still to be investigated, but let's use a
larger tolerance for now.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90383
Approved by: https://github.com/ezyang
2022-12-07 19:35:06 +00:00
YJ Shi
2b0b4bb6fd [Dynamo] Fix llvm target for meta schedule & add torch to tvm ndarray helper func (#90214)
Fixes #90213. Also, a torch.tensor-to-tvm.nd.array helper function is added to avoid data copies by using dlpack.

@jansel @Chillee

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90214
Approved by: https://github.com/wconstab
2022-12-07 19:23:56 +00:00
Richard Zou
4b1053497c [vmap] Prepend "legacy" to files for old vmap implementation (#90324)
We have an older torch.vmap implementation. It is no longer supported.
It still needs to exist somewhere for the sake of BC with
torch.autograd.functional.

This PR makes it clear what files are meant for implementing the old
vmap implementation. I've seen a couple of PRs recently adding support
for the old vmap implementation, so this will lessen the confusion.

Test Plan:
- CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90324
Approved by: https://github.com/samdow
2022-12-07 18:46:15 +00:00
Jesse Cai
3531e44307 [pruning][core][feature] Implement prune for structured pruning (#89777)
Summary:

This PR implements `prune` in BaseStructuredSparsifier:

`prune` is a function that takes in a model with structured sparsity parametrizations (the result of `prepare`) and will return a resized model with the masked-out weights removed.

`prune` is defined by a mapping from **patterns** to different **pruning functions**.
- **patterns** are just sequences of operations, for example `(nn.Linear, activation, nn.Linear)`
- **pruning functions** are functions that take in a matched pattern as args and will resize the appropriate layer sizes and weights.
  ```
  def prune_linear_activation_linear(linear1, activation, linear2):
      pass
  ```
- This is one line in the pattern config: `(nn.Linear, activation, nn.Linear): prune_linear_activation_linear`

At a high level `prune` works by finding instances of the graph that match different patterns and then calling the mapped pruning functions on those matched patterns.
This is unlike the previous code which attempted to do both at the same time.

There may be some gaps in the patterns compared to the previous implementation, but the conversion functionality should be the same.

Currently we have pruning functions for the following patterns:
    - linear -> linear
    - linear -> activation -> linear
    - conv2d -> conv2d
    - conv2d -> activation -> conv2d
    - conv2d -> activation -> pool -> conv2d
    - conv2d -> pool -> activation -> conv2d
    - conv2d -> adaptive pool -> flatten -> linear

Added in MyPy type hints as well for the prune_functions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89777
Approved by: https://github.com/vkuzo
2022-12-07 17:52:01 +00:00
Jesse Cai
d680ea7e36 [quant]Fix public bindings for DTypeWithConstraint (#90315)
Summary:
Need this to fix `test_public_bindings`.

Test Plan:
`python test/test_public_bindings.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90315
Approved by: https://github.com/HDCharles
2022-12-07 17:52:01 +00:00
Michael Voznesensky
4cdc96fb4f Add hooks structure for passing around user provided hooks, add a new guard_failure_fn (#90371)
This PR introduces a new function we can pass to torch._dynamo.optimize - guard_failure_fn. Usage is in the PR, and the one stacked on top of it, but the gist of it is that it emits failed guard reason strings alongside code. This is useful for tests and debugging, as it gives far finer grained assertions and control than the compile counter alone.

This is a resubmit of https://github.com/pytorch/pytorch/pull/90129

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90371
Approved by: https://github.com/ezyang
2022-12-07 17:51:53 +00:00
andrewor14
824641b083 [Quant][fx][bc-breaking] Make convert.py smaller (#90189)
Summary: This commit moves helper functions that are not core
to the convert logic out of convert.py, which was more than
1000 lines. This helps with readability since a new developer
won't have to scroll through hundreds of lines of util functions
to understand the core logic. There should be no change in
functionality in this commit.

BC-breaking notes: The following helper functions that were
previously exposed under the `torch.ao.quantization.fx.convert`
namespace are now made private. Many of these are moved to the
new convert_utils.py
```
convert_custom_module
convert_standalone_module
convert_weighted_module
get_module_path_and_prefix
has_none_qconfig
insert_dequantize_node
is_conversion_supported
maybe_recursive_remove_dequantize
replace_observer_or_dequant_stub_with_dequantize_node
restore_state
run_weight_observers
```

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Reviewers: jerryzh168, vkuzo

Subscribers: jerryzh168, vkuzo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90189
Approved by: https://github.com/jerryzh168
2022-12-07 16:16:25 +00:00
Charlie Yan
99fb39f508 reland #89243: [Composable API] replicate: add support for DDP args (#90255)
reland https://github.com/pytorch/pytorch/pull/89243
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90255
Approved by: https://github.com/zhaojuanmao
2022-12-07 15:22:33 +00:00
Peter Bell
e6a7278753 Give std/var correction overloads proper defaults (#56398)
The correction overloads' defaults were left off for forward-compatibility
reasons, but this FC window expired well over a year ago at this point.
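
A quick usage sketch of the overloads in question:

```python
import torch

x = torch.randn(4, 5)
torch.var(x, dim=1)                  # correction now defaults to 1 (Bessel's correction)
torch.std(x, dim=1, correction=0)    # population standard deviation
torch.var(x, dim=1, correction=2)    # arbitrary correction values are allowed
```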

Differential Revision: [D29625593](https://our.internmc.facebook.com/intern/diff/D29625593)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56398
Approved by: https://github.com/mruberry
2022-12-07 15:15:00 +00:00
fduwjj
85ae28b454 Reformat optim import (#90294)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90294
Approved by: https://github.com/awgu
2022-12-07 07:11:12 +00:00
Bert Maher
26d1dbc4f8 [inductor] More correct check for fbcode environment (#90312)
Summary:
importing torch.fb seemed like a good idea, but we don't always have
torch.fb inside fbcode.  Testing for torch.version.git_version is more
reliable, since we'll never have a git_version inside fbcode, which is an hg
repo.

Test Plan: `buck2 run mode/dev-nosan //caffe2/test/inductor:smoke`

Reviewed By: soumith, jansel

Differential Revision: D41777058

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90312
Approved by: https://github.com/soumith
2022-12-07 04:50:11 +00:00
Ram Rachum
351d73b97f Fix exception causes all over the codebase (#90271)
This is the continuation of #90134 and hopefully the final PR in this series.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90271
Approved by: https://github.com/kit1980
2022-12-07 04:29:00 +00:00
Yanbo Liang
898b46d6cc [Dynamo][Easy] capture more exceptions when import skip modules (#90338)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90338
Approved by: https://github.com/williamwen42
2022-12-07 02:05:39 +00:00
Peter Bell
71f27f7688 [Inductor] More robust stride and offset extraction from index expressions (#90184)
Currently the stride and offset are determined by substituting 1 and 0 for
different indices, which will fail for any expression that doesn't match the
expected stride calculation. Instead, this uses `sympy.match` and returns `None`
for any variables used in non-standard index calculation, e.g. `torch.roll`
which uses `ModularIndexing`.
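
The idea is roughly the following (a standalone sympy sketch, not the inductor code itself):

```python
import sympy

x = sympy.Symbol("x")
stride = sympy.Wild("stride", exclude=[x])
offset = sympy.Wild("offset", exclude=[x])

print((3 * x + 7).match(stride * x + offset))   # stride -> 3, offset -> 7
# Non-affine index expressions (e.g. anything built from ModularIndexing)
# simply fail to match, so the caller can return None instead of a bogus stride.
print((x % 5).match(stride * x + offset))       # None
```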

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90184
Approved by: https://github.com/jansel
2022-12-07 01:43:21 +00:00
Peter Bell
4f44877983 [Inductor] Add test for Scheduler fusions (#90014)
Currently there is `test_vertical_fusion1` which fuses entirely during
the lowering stage and no buffers are realized. This adds
`test_scheduler_vertical_fusion1` which is the same test but with
several intermediate calculations realized so the scheduler is left
to do the fusion.

To support the test, this PR also adds:
- `metrics.ir_nodes_pre_fusion` which when compared with
`generated_kernel_count` tells us how many nodes were fused.
- `torch._test_inductor_realize` which is an identity operator in
eager, but under inductor also forces the input to be realized.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90014
Approved by: https://github.com/jansel
2022-12-07 01:33:25 +00:00
andrewor14
13fcc412be [Quant][fx][bc-breaking] Remove unused functions in fx/utils.py (#90025)
Summary and BC-breaking notes: This commit removes the following
unused functions from both the `torch.quantization` and the
`torch.ao.quantization` namespaces:

```
graph_pretty_str
get_per_tensor_qparams
quantize_node
get_qconv_op
create_qparam_nodes
node_return_type_is_int
is_get_tensor_info_node
```

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestAOMigrationQuantizationFx

Reviewers: jerryzh168, vkuzo

Subscribers: jerryzh168, vkuzo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90025
Approved by: https://github.com/HDCharles
2022-12-07 01:31:28 +00:00
Tran Le
b769005924 [fx][passes] Implement annotate getitem node FX passes (#90237)
Summary: One common cause of jit unscriptability issues is the loss of node type annotations on local names after one or several FX transform(s). One way to improve the type coverage is to eagerly annotate the type for `getitem` nodes from their parent sequence node. This diff introduces an fx pass to do that.

Test Plan:
```
buck2 test //caffe2/test:fx_experimental
```

Reviewed By: xush6528

Differential Revision: D41749744

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90237
Approved by: https://github.com/xush6528
2022-12-06 23:18:55 +00:00
Jerry Zhang
0e182c9441 [quant][fx] Add support for matching constant in the custom matcher code in quantization (#90092)
Summary:
As titled.

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_pattern_match_constant

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90092
Approved by: https://github.com/jcaip
2022-12-06 22:47:41 +00:00
Peter Bell
5caa27a3fd as_strided: Fix default storage_offset for reference implementation (#89513)
This fixes the default storage_offset to take it from the input. This was
previously untested, so I've also added a new OpInfo which includes samples with
non-zero storage_offsets on the input tensor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89513
Approved by: https://github.com/ezyang, https://github.com/ngimel
2022-12-06 22:39:21 +00:00
Edward Z. Yang
3d4b92b171 Ensure that we fakeify tensor subclasses when they are initially tracked (#90009)
The old code didn't actually fakeify traceable tensor subclasses at the
time they are added as a GraphArg to the module; now we do, by ignoring
the subclass during fakeification and relying on Dynamo to simulate
the subclass on top.  See comments for more details.

BTW, this codepath is super broken, see filed issues linked on the
inside.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90009
Approved by: https://github.com/wconstab, https://github.com/voznesenskym
2022-12-06 22:36:32 +00:00
Michael Voznesensky
3b9a386d48 Add TORCH_FAKE_TENSOR_DEBUG use it to enable storage of traces on fake tensors at init time (#90215)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90215
Approved by: https://github.com/ezyang
2022-12-06 22:28:52 +00:00
William Wen
d224ac7f77 Remove logging.CODE (#90234)
Fixes https://github.com/pytorch/torchdynamo/issues/1932

Discussed with @mlazos: if we still want to separate streams for code logging and the rest of info, we can use a separate logger object with a unique name.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90234
Approved by: https://github.com/ezyang
2022-12-06 22:24:43 +00:00
Sergii Dymchenko
14894a7311 Remove non-existing parameter from docstring (#90163)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90163
Approved by: https://github.com/clee2000
2022-12-06 22:22:17 +00:00
Yanbo Liang
7e9a8a1361 Disable dynamo tracing torchrec.distributed (#90087)
Summary: Context at T138318923

Test Plan: manual test

Reviewed By: yf225

Differential Revision: D41631076

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90087
Approved by: https://github.com/yf225
2022-12-06 22:17:16 +00:00
Eli Uriegas
27ad2605c8 Hotfix to unblock TRT unit tests internally (#90313)
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Export of [D41778303](https://www.internalfb.com/diff/D41778303)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90313
Approved by: https://github.com/ezyang, https://github.com/malfet
2022-12-06 22:14:37 +00:00
eqy
62e450d55f [CUDA Graphs] Add option to dump a captured graph for debugging (#85519)
CC @xwang233 @ptrblck @ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85519
Approved by: https://github.com/ngimel
2022-12-06 22:03:05 +00:00
fduwjj
1abe264ef0 [Upstream _NamedOptimizer] Reland PR (89480) (#90293)
Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom):

Reland https://github.com/pytorch/pytorch/pull/89480/
* #90294
* __->__ #90293

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90293
Approved by: https://github.com/awgu
2022-12-06 21:47:12 +00:00
Andrew Gu
7436b19eb2 [FSDP] Clarify loss dtype check in _test_fsdp_parity (#90251)
A recent PR deprecated `torch.testing.assert_allclose` in favor of `torch.testing.assert_close` and left a `TODO`. This PR follows up to confirm that we do intend to have `check_dtype=False`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90251
Approved by: https://github.com/rohan-varma
2022-12-06 21:28:40 +00:00
Andrew Gu
919e09f26a [FSDP][BE] Clean up dead code from clip_grad_norm_() testing (#90250)
`FSDP.clip_grad_norm_()` is tested separately in `test_fsdp_clip_grad_norm.py`. This PR removes the dead non-run code from `common_fsdp.py` and `test_fsdp_core.py` related to `clip_grad_norm_()`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/90250
Approved by: https://github.com/rohan-varma
2022-12-06 21:28:40 +00:00
Yanbo Liang
25f39c1bce Fix uniform ref implementation (#90094)
Fixes https://github.com/pytorch/torchdynamo/issues/1954

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90094
Approved by: https://github.com/ngimel
2022-12-06 21:28:17 +00:00
Edward Z. Yang
a1ab06ab65 ShapeEnv.create_symbolic_sizes_strides_storage_offset (#89962)
Instead of having the storage offset hang out on its own, allocate
all of these symbols in one go.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89962
Approved by: https://github.com/albanD, https://github.com/voznesenskym
2022-12-06 21:27:02 +00:00