Commit Graph

53731 Commits

PyTorch MergeBot
ae2c668cc0 Revert "[dynamo][api] Better support of torch.nn.Module (#88629)"
This reverts commit c83348597b.

Reverted https://github.com/pytorch/pytorch/pull/88629 on behalf of https://github.com/anijain2305 due to job failing on master https://github.com/pytorch/pytorch/actions/runs/3449914495/jobs/5758267231
2022-11-12 07:52:56 +00:00
Jerry Zhang
6b775c42dd [quant][executorch] Support quant fusion for reshape in quant in executorch stack (#88858)
Summary: This diff adds support for fusing the "dq - reshape - q" pattern into a single reshape op; the op is needed in the wakeword model.
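
A minimal illustration of why the fusion is sound (not the executorch pass itself): reshape does not change values, so `dq - reshape - q` with matching qparams is equivalent to reshaping the quantized tensor directly.

```python
import torch

xq = torch.quantize_per_tensor(torch.randn(2, 8), scale=0.1, zero_point=0, dtype=torch.quint8)

# before fusion: dq - reshape - q
y1 = torch.quantize_per_tensor(xq.dequantize().reshape(4, 4), 0.1, 0, torch.quint8)
# after fusion: a single reshape on the quantized tensor
y2 = xq.reshape(4, 4)

assert torch.equal(y1.int_repr(), y2.int_repr())
```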

Test Plan: buck test executorch/exir/tests:quant_fusion_pass

Reviewed By: qihqi, JacobSzwejbka

Differential Revision: D41111069

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88858
Approved by: https://github.com/JacobSzwejbka
2022-11-12 07:52:44 +00:00
PyTorch MergeBot
34641c4384 Revert "Add comprehensive minifier tests (#88022)"
This reverts commit 5ff600aa6e.

Reverted https://github.com/pytorch/pytorch/pull/88022 on behalf of https://github.com/wconstab due to Seems to be causing CI failures relating to minifier test and some /tmp/ path not existing
2022-11-12 05:16:41 +00:00
Animesh Jain
c83348597b [dynamo][api] Better support of torch.nn.Module (#88629)
This is an API change, so please review carefully.

With this PR, torchdynamo returns an `OptimizedModule` class object, a subclass of `torch.nn.Module`, when asked to optimize a `nn.Module` object. Most of the methods are redirected to the original `nn.Module`, which is installed as `_mod` in the `OptimizedModule`.

This is helpful in many cases:

```
mod = MockModule()

opt_mod = torch._dynamo.optimize()(mod)

print(opt_mod) # Works

opt_mod = opt_mod.to(device="cuda")
print(opt_mod) # Works
opt_mod(input) # Triggers recompile if necessary; earlier we were dropping the TorchDynamo wrapper

opt_mod.parameters() # Refers to the original module

```

Topics unclear to me
* I have overridden many methods to raise NotImplementedError. A careful review of those will be good.
* hooks
* For the optimized forward, should we call torchdynamo optimization on `__call__` or `forward`
* What else to test

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88629
Approved by: https://github.com/Chillee, https://github.com/jansel, https://github.com/msaroufim
2022-11-12 04:45:17 +00:00
Andrew Gu
d01bf1d1f1 [FSDP] Introduce ModuleWrapPolicy for simplicity (#88450)
**BC Breaking Change**
This renames `unwrapped_params` to `nonwrapped_numel`. I prefer `nonwrapped` over `unwrapped` because "unwrap" suggests that some wrapping has been undone. I prefer `numel` over `params` because that is the unit of measurement; I think we should keep "params" to refer to `nn.Parameter`s themselves.

This only breaks code that passes `unwrapped_params` as a keyword argument, and I did not see anything that did that (except one internal benchmark file, which does not actually depend on our `pytorch` code).

In a follow-up, I want to rename `min_num_params` to `min_nonwrapped_numel` in `size_based_auto_wrap_policy`, which is also BC breaking. Again, this is to differentiate between "params" being `nn.Parameter`s and "numel" being the unit for `param.numel()`.

**Overview**
This PR introduces `ModuleWrapPolicy` as a lightweight layer over the existing `transformer_auto_wrap_policy`. The most common auto wrapping paradigm is:
```
module_classes: Set[Type[nn.Module]] = ...
auto_wrap_policy = functools.partial(
    transformer_auto_wrap_policy,
    transformer_layer_cls=module_classes,
)
fsdp_model = FSDP(model, auto_wrap_policy=auto_wrap_policy, ...)
```
Now, users can instead write:
```
auto_wrap_policy = ModuleWrapPolicy(module_classes)
fsdp_model = FSDP(model, auto_wrap_policy=auto_wrap_policy, ...)
```
This hides the unused arguments expected from the callable (`recurse` and `unwrapped_params`/`nonwrapped_numel`).

`ModuleWrapPolicy` inherits from an abstract base class `FSDPPolicy` that expects a `policy` property. This decouples the construction of such `FSDPPolicy` classes from their actual `policy`, which must abide by the `_recursive_wrap` interface. Any existing auto wrap policy can be rewritten as a class that inherits from `FSDPPolicy`, so this approach is fully backward compatible from a functionality perspective.

I call this base class `FSDPPolicy` to generalize over the cases where we may not want to actually perform any nested wrapping. In reality, the policy is meant for constructing `FlatParameter`s, which just happened to be induced by a nested wrapping before. Given this, I am changing the constructor argument in `fully_shard()` to simply `policy` instead of `auto_wrap_policy`.

This PR migrates usages of `transformer_auto_wrap_policy` within our unit test suite to `ModuleWrapPolicy` as much as possible.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88450
Approved by: https://github.com/zhaojuanmao
2022-11-12 04:14:32 +00:00
PyTorch MergeBot
b2b0a0d3ba [vision hash update] update the pinned vision hash (#88920)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/master/.github/workflows/_update-commit-hash.yml).
Update the pinned vision hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88920
Approved by: https://github.com/pytorchbot
2022-11-12 03:21:08 +00:00
Chien-Chin Huang
ae4074669e [FSDP][state_dict][6/N] Remove most FSDP module dependency from _optim_utils (#88638)
**What**
This PR removes most `FullyShardedDataParallel` dependencies from `_optim_utils`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88638
Approved by: https://github.com/awgu
2022-11-12 03:16:37 +00:00
Bin Bao
4108367123 Exclude poolformer_m36 from the inductor model test (#88908)
Summary: The root cause is still to be investigated. Issue tracked at
https://github.com/pytorch/torchdynamo/issues/1856

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88908
Approved by: https://github.com/malfet
2022-11-12 03:10:25 +00:00
mikey dagitses
1e2327baf7 fix fx tests (#88886)
Summary:
Some source files are missing and TPX couldn't handle the default test
names.

Test Plan: Rely on CI.

Differential Revision: D41218564

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88886
Approved by: https://github.com/zou3519
2022-11-12 02:23:48 +00:00
Edward Z. Yang
66736ff425 Fix bug in OptionalTensorList (#88887)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88887
Approved by: https://github.com/anjali411
2022-11-12 02:19:46 +00:00
Edward Z. Yang
2b166532f7 Remove incorrect assert about hermetic state. (#88885)
I'm not sure why I thought this assert was valid in the first
place, and there's no comment about it.

The assert is tantamount to saying, "no tensor objects should
become dead via SafePyObject when hermetic mode is on."  But
suppose we run a Python GC while we're inside hermetic mode.
This could result in us disposing non-hermetic tensors, which
would hit decref.  So the assert seems invalid.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88885
Approved by: https://github.com/anjali411, https://github.com/malfet
2022-11-12 02:19:45 +00:00
Jiaxu Zhu
2cd05a2818 Support torch.qint32 in Convert (#88871)
Enable `torch.qint32` when creating the `quantize_per_tensor` function call in `convert_fx`.
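
For context, a minimal sketch of the kind of call `convert_fx` can now emit (illustrative only, not the actual pass output):

```python
import torch

# convert_fx may now generate quantize_per_tensor calls with torch.qint32
xq = torch.quantize_per_tensor(
    torch.randn(4), scale=1e-4, zero_point=0, dtype=torch.qint32
)
print(xq.dtype)  # torch.qint32
```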

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88871
Approved by: https://github.com/jerryzh168
2022-11-12 01:20:52 +00:00
Will Constable
a3f3ec8fac [FSDP+dynamo]: forward treats parameter-views as params (#88781)
Dynamo+AotAutograd needs a way to wrap all tensors (whether
inputs or params/buffers) in FakeTensor wrappers, and
FSDP's mangling of parameters hides them from this wrapping.

This PR unblocks running hf_bert and hf_T5 with FSDP under dynamo, whether using recursive wrapping around transformer layers or only applying FSDP around the whole model.  Perf/memory validation and possibly optimization is the next step.
`python benchmarks/dynamo/distributed.py --torchbench_model hf_Bert --fsdp --dynamo aot_eager`
`python benchmarks/dynamo/distributed.py --torchbench_model hf_Bert --fsdp --dynamo aot_eager --fsdp_wrap`
`python benchmarks/dynamo/distributed.py --torchbench_model hf_T5 --fsdp --dynamo aot_eager`
`python benchmarks/dynamo/distributed.py --torchbench_model hf_T5 --fsdp --dynamo aot_eager --fsdp_wrap`

The problem:
Dynamo (actually aot_autograd) trips up with FSDP because it must
wrap all input tensors in FakeTensor wrappers, and it only knows
to wrap graph inputs or named_(parameters, buffers). FSDP's
pre_forward hook sets views (which are not nn.Parameters) into the flatparam
as attrs on the module with the same name as the original param, but
they will not show up in named_parameters.

- in use_orig_params mode, FSDP still de-registers
  params during pre-forward hook, then re-registers them
  post-forward
- during forward (between the hooks), the params are setattr'd
  on the module as regular view tensors, not nn.Parameters
- note: use_orig_params is the recommended way to use FSDP,
  and use_orig_params=False is being deprecated, so I only consider
  use_orig_params=True for this enablement

The solution:
- adding them to named_buffers is not possible because it interferes
  with how FSDP's `_apply` works
- since they are not actual nn.Parameters, register_parameter will
  complain about registering them
- simply setting `module._parameters[name] = view` seems to be a viable
  workaround (see the sketch below); it is hacky, but FSDP code already
  modifies _parameters directly
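
A standalone sketch of the workaround, with stand-in names (`flat_param`/`view` are illustrative, not the actual PR code):

```python
import torch
import torch.nn as nn

mod = nn.Linear(4, 4)
flat_param = mod.weight.detach().flatten()  # stand-in for FSDP's FlatParameter
view = flat_param.view(4, 4)                # plain view tensor, not an nn.Parameter

# register_parameter would reject a non-Parameter, so write _parameters directly;
# the view then shows up in named_parameters() and gets faked by aot_autograd
mod._parameters["weight"] = view
print(type(dict(mod.named_parameters())["weight"]))  # <class 'torch.Tensor'>
```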

Note: Manual checkpointing still isn't working with FSDP+dynamo,
so that will have to be addressed in a follow-up.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88781
Approved by: https://github.com/ezyang, https://github.com/awgu
2022-11-12 01:17:23 +00:00
William Wen
5ff600aa6e Add comprehensive minifier tests (#88022)
Adds tests for https://github.com/pytorch/torchdynamo/issues/1241.

To run: `pytest test/dynamo/test_minifier.py`.

Actually runs the minifier launcher script and repro scripts, rather than just checking for the existence of the minifier launcher script.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88022
Approved by: https://github.com/mlazos, https://github.com/anijain2305
2022-11-12 00:22:25 +00:00
Horace He
37c5b42fa6 Fix matmul decomp to use reshape instead of contiguous().view() (#88832)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88832
Approved by: https://github.com/bertmaher, https://github.com/ngimel
2022-11-12 00:15:42 +00:00
Richard Zou
7c3adddd6c [functorch] delete some unused files (#88763)
Some post-merge cleanup.
- packaging/ was for building standalone windows binaries
- our flake8 config got superseded by PyTorch's.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88763
Approved by: https://github.com/samdow
2022-11-11 23:58:51 +00:00
Peter Bell
a7fa423f48 copy_: Short-circuit when self and src view the same data (#88884)
This comes up if you use inplace operators on a slice, e.g.
```python
import torch
a = torch.rand(1000000, device="cuda")
a[::2] *= 2
```

The last line looks as if it should be fully inplace, but is actually
equivalent to:

```python
tmp = a[::2]
tmp *= 2
a[::2] = tmp
```

Which results in `mul_` and `copy_` being called. With this PR, the
redundant copy becomes a no-op and the above example is 2x faster.
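
The aliasing condition can be illustrated from Python (a sketch of the idea; the actual short-circuit lives in the C++ `copy_` implementation):

```python
import torch

a = torch.rand(10)
src = a[::2]
dst = a[::2]

# self and src view the same data: same storage location, sizes, strides, dtype
same_view = (
    dst.data_ptr() == src.data_ptr()
    and dst.size() == src.size()
    and dst.stride() == src.stride()
    and dst.dtype == src.dtype
)
print(same_view)  # True -> the copy can be skipped as a no-op
```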
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88884
Approved by: https://github.com/ngimel
2022-11-11 23:31:15 +00:00
Yanbo Liang
6fe47b682f [Dynamo] Fix str(Guard.obj_weakref) bug to re-enable support for overriding __getattr__ (#88564)
See my inline comments!

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88564
Approved by: https://github.com/ezyang, https://github.com/anijain2305
2022-11-11 22:31:32 +00:00
Kevin Tse
be8d88f8d0 [DataLoader] Removing DataLoader2 related code (#88848)
Removing these lines of code, as `DataLoader2` has been added to [TorchData](https://github.com/pytorch/data). I'm importing this to confirm it will not impact internal code.

Differential Revision: [D41201578](https://our.internmc.facebook.com/intern/diff/D41201578)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88848
Approved by: https://github.com/ejguan
2022-11-11 22:27:01 +00:00
Nikita Shulga
f39cad50b7 Make InductorCPU usable internally (#88870)
Test Plan: `buck2 test mode/opt //caffe2/test:test_inductor -- --exact 'caffe2/test:test_inductor - test_dtype_mismatch_issue_cuda (caffe2.test.inductor.test_torchinductor.CudaTests)'`

Differential Revision: D41206109

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88870
Approved by: https://github.com/izaitsevfb
2022-11-11 22:07:34 +00:00
BowenBao
fbc1878265 [ONNX] Pretty print diagnostic logging (#88261)
Adds pretty-print diagnostic logging. For example:
```python
import io
import torch
from torch.onnx._internal import diagnostics

class CustomAdd(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, y):
        return x + y

    @staticmethod
    def symbolic(g, x, y):
        return g.op("custom::CustomAdd", x, y)

class M(torch.nn.Module):
    def forward(self, x):
        return CustomAdd.apply(x, x)

# trigger warning for missing shape inference.
# rule = diagnostics.rules.node_missing_onnx_shape_inference
torch.onnx.export(M(), torch.randn(3, 4), io.BytesIO())
```

By default, a minimal summary of the diagnostics is shown:
```
========= Diagnostic Run torch.onnx.export version 1.14.0a0+git90a69c5 =========
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 3 WARNING 0 ERROR ========================
3 WARNING were not printed due to the log level.
```

Adjust the `verbose` and `level` arguments:
```python
diagnostics.engine.pretty_print(verbose=True, level=diagnostics.levels.WARNING)
```

This prints the full log:
```
=============================== 1 Diagnostic Run ===============================
========= Diagnostic Run torch.onnx.export version 1.14.0a0+git90a69c5 =========
verbose: True, log level: Level.WARNING
======================= 0 NONE 0 NOTE 3 WARNING 0 ERROR ========================
WARNING: node-missing-onnx-shape-inference
==========================================
The shape inference of custom::CustomAdd type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
--------------------------- Stack: Python call stack ---------------------------
frame: diagnostic = ExportDiagnostic(rule, level, message, **kwargs) /home/bowbao/pytorch_dev/torch/onnx/_internal/diagnostics/_diagnostic.py:151
frame: n, utils._params_dict, GLOBALS.export_onnx_opset_version /home/bowbao/pytorch_dev/torch/onnx/_patch_torch.py:82
frame: <@beartype(torch.onnx._patch_torch._graph_op) at 0x7f62184b6710>:78
frame: return beartyped(*args, **kwargs) /home/bowbao/pytorch_dev/torch/onnx/_internal/_beartype.py:81
frame: return function(*args, **kwargs) /home/bowbao/pytorch_dev/torch/onnx/_deprecation.py:30
frame: return g.op("custom::CustomAdd", x, y) test_pretty_print.py:14
frame: return symbolic_fn(g, *args) /home/bowbao/pytorch_dev/torch/onnx/utils.py:1716
frame: return beartyped(*args, **kwargs) /home/bowbao/pytorch_dev/torch/onnx/_internal/_beartype.py:81
frame: graph = _C._jit_pass_onnx(graph, operator_export_type) /home/bowbao/pytorch_dev/torch/onnx/utils.py:663
frame: <@beartype(torch.onnx.utils._optimize_graph) at 0x7f62180e05f0>:85
frame: return beartyped(*args, **kwargs) /home/bowbao/pytorch_dev/torch/onnx/_internal/_beartype.py:81
frame: module=module, /home/bowbao/pytorch_dev/torch/onnx/utils.py:1123
frame: return beartyped(*args, **kwargs) /home/bowbao/pytorch_dev/torch/onnx/_internal/_beartype.py:81
frame: dynamic_axes=dynamic_axes, /home/bowbao/pytorch_dev/torch/onnx/utils.py:1539
frame: return beartyped(*args, **kwargs) /home/bowbao/pytorch_dev/torch/onnx/_internal/_beartype.py:81
frame: export_modules_as_functions=export_modules_as_functions, /home/bowbao/pytorch_dev/torch/onnx/utils.py:519
frame: <@beartype(torch.onnx.utils.export) at 0x7f62180e0170>:347
frame: return beartyped(*args, **kwargs) /home/bowbao/pytorch_dev/torch/onnx/_internal/_beartype.py:81
frame: torch.onnx.export(M(), torch.randn(3, 4), io.BytesIO()) test_pretty_print.py:22
---------------------------- Stack: C++ call stack -----------------------------
frame: (<unknown frame>)
frame: (<unknown function> + 0x88411b (0x7f625b36011b in /home/bowbao/pytorch_dev/torch/lib/libtorch_python.so))
frame: (torch::jit::UpdateReliable(torch::jit::Value*, std::pair<bool, bool> const&) + 0x7d3 (0x7f625b351743 in /home/bowbao/pytorch_dev/torch/lib/libtorch_python.so))
frame: (torch::jit::UpdateReliable(torch::jit::Node*) + 0x4f (0x7f625b35198f in /home/bowbao/pytorch_dev/torch/lib/libtorch_python.so))
frame: (torch::jit::ONNXShapeTypeInference(torch::jit::Node*, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, c10::IValue, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, c10::IValue> > > const&, int) + 0xac9 (0x7f625b357179 in /home/bowbao/pytorch_dev/torch/lib/libtorch_python.so))
frame: (<unknown function> + 0xabd026 (0x7f625b599026 in /home/bowbao/pytorch_dev/torch/lib/libtorch_python.so))
frame: (<unknown function> + 0x3c0fda (0x7f625ae9cfda in /home/bowbao/pytorch_dev/torch/lib/libtorch_python.so))
frame: (<unknown frame>)

WARNING: node-missing-onnx-shape-inference
==========================================
The shape inference of custom::CustomAdd type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
--------------------------- Stack: Python call stack ---------------------------
frame: diagnostic = ExportDiagnostic(rule, level, message, **kwargs) /home/bowbao/pytorch_dev/torch/onnx/_internal/diagnostics/_diagnostic.py:151
frame: graph, params_dict, GLOBALS.export_onnx_opset_version /home/bowbao/pytorch_dev/torch/onnx/utils.py:688
frame: <@beartype(torch.onnx.utils._optimize_graph) at 0x7f62180e05f0>:85
frame: return beartyped(*args, **kwargs) /home/bowbao/pytorch_dev/torch/onnx/_internal/_beartype.py:81
frame: module=module, /home/bowbao/pytorch_dev/torch/onnx/utils.py:1123
frame: return beartyped(*args, **kwargs) /home/bowbao/pytorch_dev/torch/onnx/_internal/_beartype.py:81
frame: dynamic_axes=dynamic_axes, /home/bowbao/pytorch_dev/torch/onnx/utils.py:1539
frame: return beartyped(*args, **kwargs) /home/bowbao/pytorch_dev/torch/onnx/_internal/_beartype.py:81
frame: export_modules_as_functions=export_modules_as_functions, /home/bowbao/pytorch_dev/torch/onnx/utils.py:519
frame: <@beartype(torch.onnx.utils.export) at 0x7f62180e0170>:347
frame: return beartyped(*args, **kwargs) /home/bowbao/pytorch_dev/torch/onnx/_internal/_beartype.py:81
frame: torch.onnx.export(M(), torch.randn(3, 4), io.BytesIO()) test_pretty_print.py:22
---------------------------- Stack: C++ call stack -----------------------------
frame: (<unknown frame>)
frame: (<unknown function> + 0x88411b (0x7f625b36011b in /home/bowbao/pytorch_dev/torch/lib/libtorch_python.so))
frame: (torch::jit::UpdateReliable(torch::jit::Value*, std::pair<bool, bool> const&) + 0x7d3 (0x7f625b351743 in /home/bowbao/pytorch_dev/torch/lib/libtorch_python.so))
frame: (torch::jit::UpdateReliable(torch::jit::Node*) + 0x4f (0x7f625b35198f in /home/bowbao/pytorch_dev/torch/lib/libtorch_python.so))
frame: (torch::jit::ONNXShapeTypeInference(torch::jit::Node*, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, c10::IValue, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, c10::IValue> > > const&, int) + 0xac9 (0x7f625b357179 in /home/bowbao/pytorch_dev/torch/lib/libtorch_python.so))
frame: (<unknown function> + 0x87d6d1 (0x7f625b3596d1 in /home/bowbao/pytorch_dev/torch/lib/libtorch_python.so))
frame: (torch::jit::ONNXShapeTypeInference(std::shared_ptr<torch::jit::Graph>&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, c10::IValue, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, c10::IValue> > > const&, int) + 0x33 (0x7f625b359cf3 in /home/bowbao/pytorch_dev/torch/lib/libtorch_python.so))
frame: (<unknown function> + 0xabdbae (0x7f625b599bae in /home/bowbao/pytorch_dev/torch/lib/libtorch_python.so))
frame: (<unknown function> + 0x3c0fda (0x7f625ae9cfda in /home/bowbao/pytorch_dev/torch/lib/libtorch_python.so))
frame: (<unknown frame>)

WARNING: node-missing-onnx-shape-inference
==========================================
The shape inference of custom::CustomAdd type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
--------------------------- Stack: Python call stack ---------------------------
frame: diagnostic = ExportDiagnostic(rule, level, message, **kwargs) /home/bowbao/pytorch_dev/torch/onnx/_internal/diagnostics/_diagnostic.py:151
frame: graph, params_dict, GLOBALS.export_onnx_opset_version /home/bowbao/pytorch_dev/torch/onnx/utils.py:1179
frame: return beartyped(*args, **kwargs) /home/bowbao/pytorch_dev/torch/onnx/_internal/_beartype.py:81
frame: dynamic_axes=dynamic_axes, /home/bowbao/pytorch_dev/torch/onnx/utils.py:1539
frame: return beartyped(*args, **kwargs) /home/bowbao/pytorch_dev/torch/onnx/_internal/_beartype.py:81
frame: export_modules_as_functions=export_modules_as_functions, /home/bowbao/pytorch_dev/torch/onnx/utils.py:519
frame: <@beartype(torch.onnx.utils.export) at 0x7f62180e0170>:347
frame: return beartyped(*args, **kwargs) /home/bowbao/pytorch_dev/torch/onnx/_internal/_beartype.py:81
frame: torch.onnx.export(M(), torch.randn(3, 4), io.BytesIO()) test_pretty_print.py:22
---------------------------- Stack: C++ call stack -----------------------------
frame: (<unknown frame>)
frame: (<unknown function> + 0x88411b (0x7f625b36011b in /home/bowbao/pytorch_dev/torch/lib/libtorch_python.so))
frame: (torch::jit::UpdateReliable(torch::jit::Value*, std::pair<bool, bool> const&) + 0x7d3 (0x7f625b351743 in /home/bowbao/pytorch_dev/torch/lib/libtorch_python.so))
frame: (torch::jit::UpdateReliable(torch::jit::Node*) + 0x4f (0x7f625b35198f in /home/bowbao/pytorch_dev/torch/lib/libtorch_python.so))
frame: (torch::jit::ONNXShapeTypeInference(torch::jit::Node*, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, c10::IValue, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, c10::IValue> > > const&, int) + 0xac9 (0x7f625b357179 in /home/bowbao/pytorch_dev/torch/lib/libtorch_python.so))
frame: (<unknown function> + 0x87d6d1 (0x7f625b3596d1 in /home/bowbao/pytorch_dev/torch/lib/libtorch_python.so))
frame: (torch::jit::ONNXShapeTypeInference(std::shared_ptr<torch::jit::Graph>&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, c10::IValue, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, c10::IValue> > > const&, int) + 0x33 (0x7f625b359cf3 in /home/bowbao/pytorch_dev/torch/lib/libtorch_python.so))
frame: (<unknown function> + 0xabdbae (0x7f625b599bae in /home/bowbao/pytorch_dev/torch/lib/libtorch_python.so))
frame: (<unknown function> + 0x3c0fda (0x7f625ae9cfda in /home/bowbao/pytorch_dev/torch/lib/libtorch_python.so))
frame: (<unknown frame>)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88261
Approved by: https://github.com/abock, https://github.com/justinchuby
2022-11-11 21:59:16 +00:00
efiks
ea0ec9d71c [torch] BatchBoxCox - fix numerical issue in vectorized code (#88875)
Summary:
Use of fast math in the BatchBoxCox kernel produced different results between the dev and optimized versions, which caused a few internal tests to fail.
For now, disable the compiler-optimized version and rely on the ATen vectors.

Differential Revision: D41211784

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88875
Approved by: https://github.com/hyuen
2022-11-11 21:58:23 +00:00
Richard Barnes
dfb4b73e45 Fix unused variable 'options' warning in RNN.cpp (#88753)
Fixes
```
/home/rbarnes/pytorch/aten/src/ATen/native/cudnn/RNN.cpp:73:17: warning: unused variable 'options' [-Wunused-variable]
  TensorOptions options = TensorOptions().dtype(dtype).layout(layout).device(device).pinned_memory(pin_memory);
                ^
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88753
Approved by: https://github.com/soumith
2022-11-11 21:51:13 +00:00
Chien-Chin Huang
7aa144ac54 [FSDP][state_dict][5/N] Remove the FSDP module dependency from _state_dict_utils (#88637)
**What**
This PR completely removes the `FullyShardedDataParallel` dependency from `_state_dict_utils` -- `_state_dict_utils` now depends only on `_FSDPState` and all the utils modules.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88637
Approved by: https://github.com/awgu
2022-11-11 21:22:13 +00:00
Nikita Shulga
575e02df53 Fix CUDNN_PATH handling on Windows (#88898)
Fixes https://github.com/pytorch/pytorch/issues/88873
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88898
Approved by: https://github.com/kit1980
2022-11-11 21:19:26 +00:00
kshitij12345
f74946324e [fix] allow saving python attr on Tensor and Parameter via torch.save (#81616)
Fixes: https://github.com/pytorch/pytorch/issues/72129

TODO:
* [x] Fix for Parameter

Benchmark
(Measurable diff for small tensors)
```
[-------------- Save and Load --------------]
                    |  After PR  |  Before PR
1 threads: ----------------------------------
      ()            |    111.7   |     106.9
      (4, 4)        |    114.4   |     109.2
      (128, 128)    |    135.2   |     128.3
      (1024, 1024)  |   1431.9   |    1431.3

Times are in microseconds (us).
```

<details>

<summary> Benchmark Script </summary>

```python
import torch
from torch.testing._internal.common_utils import BytesIOContext
from torch.utils import benchmark
import pickle

shapes = ((), (4, 4), (128, 128), (1024, 1024))

sizes = [1, 64, 1024, 10000]
results = []

def save_load_fn(t):
    with BytesIOContext() as f:
        torch.save(t, f)
        f.seek(0)
        torch.load(f)

for shape in shapes:
    t = torch.randn(shape)
    label = 'Save and Load'
    sub_label = f'{shape}'
    results.append(benchmark.Timer(
        stmt='save_load_fn(t)',
        globals={'t': t, 'save_load_fn':save_load_fn},
        label=label,
        sub_label=sub_label,
        description='Before PR',
    ).blocked_autorange(min_run_time=2))

compare = benchmark.Compare(results)
compare.print()

with open('before_pr.pkl', 'wb') as f:
    pickle.dump(results, f)

# with open('after_pr.pkl', 'rb') as f:
#     after_pr = pickle.load(f)

# with open('before_pr.pkl', 'rb') as f:
#     before_pr = pickle.load(f)

# compare = benchmark.Compare(after_pr + before_pr)
# compare.print()
```

</details>

NOTE: **BC-Breaking**: After this PR, all tensors (including regular tensors) will be serialized using `_rebuild_from_type_v2`.
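
A minimal round-trip sketch of the behavior this PR enables (the attribute name is illustrative):

```python
import io
import torch

t = torch.ones(3)
t.foo = "metadata"        # arbitrary python attribute on a Tensor
buf = io.BytesIO()
torch.save(t, buf)
buf.seek(0)
loaded = torch.load(buf)
print(loaded.foo)         # "metadata" -- survives the round trip after this PR
```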

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81616
Approved by: https://github.com/albanD, https://github.com/kurtamohler
2022-11-11 21:11:12 +00:00
PyTorch MergeBot
ba4d5aae06 Revert "rename DisableTorchFunction to DisableTorchFunctionSubclass (#88218)"
This reverts commit 7f28be10e5.

Reverted https://github.com/pytorch/pytorch/pull/88218 on behalf of https://github.com/izaitsevfb due to BC-breaking change, D41211901
2022-11-11 19:13:05 +00:00
PyTorch MergeBot
4e5d7afe84 Revert "add DisableTorchFunction that matches DisableTorchDispatch (#88219)"
This reverts commit c0ecce15b5.

Reverted https://github.com/pytorch/pytorch/pull/88219 on behalf of https://github.com/izaitsevfb due to BC-breaking change, D41211901
2022-11-11 19:08:30 +00:00
BowenBao
9d7d21f569 [ONNX] Add stack info to diagnostics (#87258)
~~Investigating strange bug releasing 'graph' right when returning from `_C._jit_pass_onnx`.~~
~~Can be repro-ed locally via `test_cpp_diagnose`, with changes in this PR.~~
Resolved by https://github.com/pytorch/pytorch/pull/87829.
This PR adds methods to record stack backtrace information to diagnostics.

* #87830
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87258
Approved by: https://github.com/abock
2022-11-11 18:58:15 +00:00
Chien-Chin Huang
3d1c5c89ed [FSDP][state_dict][4/N] Move the core logic of summon full parameters to _unshard_params_utils.py (#88636)
**What**
`_summon_full_params` is required for state_dict. To enable composable FSDP state_dict, `_summon_full_params` must be accessible without `FullyShardedDataParallel`. This PR moves the core logic of `_summon_full_params` to `_unshard_params_utils.py`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88636
Approved by: https://github.com/awgu
2022-11-11 18:30:57 +00:00
Thiago Crepaldi
5f0783bd6d Fix ATen Fallback for BUILD_CAFFE2=0 for ONNX-only ops (#88504)
Follow-up for #87735.

Once again, because BUILD_CAFFE2=0 is not tested for the ONNX exporter, one scenario slipped through: a use case where the model can be exported without ATen fallback when `operator_export_type=ONNX_ATEN_FALLBACK` and `BUILD_CAFFE2=0`.

A new unit test has been added, but it won't prevent regressions if BUILD_CAFFE2=0 is not exercised on CI again.

Fixes #87313

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88504
Approved by: https://github.com/justinchuby, https://github.com/BowenBao
2022-11-11 17:43:46 +00:00
Elias Ellison
8ff2e34ca6 Take input striding for conv forward based on eager output (#88706)
From discussion with @Chillee and @ngimel, we'll likely need further fixes to ensure that we hit channels-last kernels, but this is still worth landing in its own right.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88706
Approved by: https://github.com/ngimel
2022-11-11 17:29:15 +00:00
PyTorch MergeBot
adfbd831cf Revert "[Autograd] Use in-place input accumulation fast path for dense Tensors. (#88339)"
This reverts commit 8f66ae413f.

Reverted https://github.com/pytorch/pytorch/pull/88339 on behalf of https://github.com/mehtanirav due to Internal test failures
2022-11-11 17:03:25 +00:00
Kurt Mohler
89a326ff7e Explicitly check filelike arg of torch.save (#88867)
Fixes #88793
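
A small sketch of the case that is now rejected up front (the exact exception type and message come from the new check and are not quoted here):

```python
import torch

try:
    torch.save(torch.ones(2), 42)   # neither a path nor a file-like object
except Exception as e:
    print(type(e).__name__, e)      # explicit error instead of a confusing late failure
```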

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88867
Approved by: https://github.com/ezyang
2022-11-11 16:57:08 +00:00
Elias Ellison
a6832b08a3 Regularize bernouilli_ with bernouilli decomp (#88349)
Fix for https://github.com/pytorch/torchdynamo/issues/1796. Just like the other [bernoulli decomp](https://github.com/pytorch/pytorch/blob/master/torch/_inductor/decomposition.py#L302), we need to pass `dtype=float32` to avoid `"check_uniform_bounds" not implemented` errors.
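
A minimal sketch of the decomposition's shape, mirroring the linked bernoulli decomp (registration details omitted):

```python
import torch
from torch import Tensor

def bernoulli__decomp(self: Tensor, p: float = 0.5) -> Tensor:
    # generate uniforms in float32 to avoid the unimplemented
    # "check_uniform_bounds" path for other dtypes, then cast via copy_
    return self.copy_(torch.rand_like(self, dtype=torch.float32) < p)
```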

Are we planning on enabling `TEST_WITH_TORCHINDUCTOR`? Do I need to change anything with the tests?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88349
Approved by: https://github.com/desertfire
2022-11-11 16:53:02 +00:00
Nikita Karetnikov
1e8f95ace1 Symintify broadcast_to (#88776)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88776
Approved by: https://github.com/ezyang
2022-11-11 15:49:43 +00:00
anjali411
d615d12289 Add meta impl for topk (#88694)
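A sketch of what such a meta implementation must compute -- output metadata only, with an illustrative helper name (not the registered kernel itself):

```python
import torch

def topk_meta_sketch(self: torch.Tensor, k: int, dim: int = -1):
    # a meta impl only produces output metadata: same shape as the input,
    # with the selected dim replaced by k; indices are int64
    if self.dim() > 0:
        dim = dim % self.dim()
    out_shape = list(self.shape)
    if out_shape:
        out_shape[dim] = k
    values = self.new_empty(out_shape)
    indices = self.new_empty(out_shape, dtype=torch.long)
    return values, indices
```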
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88694
Approved by: https://github.com/ezyang
2022-11-11 15:28:41 +00:00
Chien-Chin Huang
3c7f96665e [FSDP][state_dict][3/N] Change how state_dict utils access attributes in _FSDPState (#88635)
**What This PR Does**
`_state_dict_utils` currently accesses the FSDP states through `module`. To enable composable FSDP state_dict, these accesses need to go through `_FSDPState`. `module` is still required for most APIs, as state_dict has to access per-module information.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88635
Approved by: https://github.com/awgu
2022-11-11 15:20:36 +00:00
soulitzer
b92acee8f8 Add context manager to allow mutation on saved tensors (#79056)
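A usage sketch, assuming this is the `torch.autograd.graph.allow_mutation_on_saved_tensors` context manager:

```python
import torch

with torch.autograd.graph.allow_mutation_on_saved_tensors():
    a = torch.ones(2, 3, requires_grad=True)
    b = a.clone()
    out = (b ** 2).sum()  # saves b for backward
    b.sin_()              # mutates a tensor saved for backward
    out.backward()        # still uses the values b had at save time
```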
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79056
Approved by: https://github.com/albanD
2022-11-11 15:18:28 +00:00
Bin Bao
91b71cdbe4 [dynamo] Add torch.device to is_safe_constant (#88766)
Test Plan:
```
PYTORCH_TEST_WITH_DYNAMO=1 python test/test_torch.py -k  test_advancedindex_mixed_cpu_devices_cuda
```
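
For illustration, a sketch of the kind of code this unblocks (the function and shapes here are made up, not taken from the test):

```python
import torch
import torch._dynamo as dynamo

def fn(x):
    # torch.device is now treated as a safe constant by dynamo
    return x.to(torch.device("cpu"))

opt_fn = dynamo.optimize("eager")(fn)
opt_fn(torch.randn(2))
```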

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88766
Approved by: https://github.com/jansel
2022-11-11 15:06:17 +00:00
Chien-Chin Huang
324ac93a43 [FSDP][state_dict][2/N] Move state_dict related enums/dataclasses/states to state_dict_utils.py, api.py and init_state_dict() (#88481)
**Motivation**:
Several Enums, Dataclasses, and states defined in fully_sharded_data_parallel.py should be moved to a place where composable FSDP can access them. This PR does the move.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88481
Approved by: https://github.com/rohan-varma, https://github.com/awgu
2022-11-11 12:28:37 +00:00
Michael Gschwind
ee91c328da Fix cuda/cpu check on NoneType (#88854)
Summary: Fix cuda/cpu check on NoneType

Test Plan: Sandcastle / GitHub CI/CD

Differential Revision: D41203955

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88854
Approved by: https://github.com/drisspg, https://github.com/ngimel
2022-11-11 12:19:31 +00:00
kshitij12345
d15a6b0c97 Error on ZeroTensor serialization (#88803)
Follow-up: https://github.com/pytorch/pytorch/pull/88182#issuecomment-1308628415

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88803
Approved by: https://github.com/anjali411
2022-11-11 08:51:29 +00:00
AllenTiTaiWang
b843f4db0a [ONNX] Add test case for onnx::Max scalar type (#88751)
Modeled on the corresponding minimum test cases.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88751
Approved by: https://github.com/wschin, https://github.com/BowenBao
2022-11-11 07:08:56 +00:00
Eddie Yan
396c3b1d88 Use atomicAdd for bfloat16 in Ampere and above (#84981)
WIP to fix extremely slow `scatter_add` issue vs. fp16. The current changes seem to improve performance, but it still appears to lag behind the fp16 equivalent.

CC @ngimel @ptrblck
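
For reference, a small repro-style sketch of the affected op (requires an Ampere-or-newer GPU; sizes are arbitrary):

```python
import torch

src = torch.randn(1_000_000, device="cuda", dtype=torch.bfloat16)
index = torch.randint(0, 1024, (1_000_000,), device="cuda")
out = torch.zeros(1024, device="cuda", dtype=torch.bfloat16)
out.scatter_add_(0, index, src)   # hits the bf16 atomicAdd path on Ampere+
```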
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84981
Approved by: https://github.com/ngimel
2022-11-11 05:23:48 +00:00
AllenTiTaiWang
a6d72f44a4 [ONNX] Add onnx::Max into standard Op for scalar type alignment (#88750)
Easy fix for onnx::Max ScalarType
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88750
Approved by: https://github.com/justinchuby, https://github.com/BowenBao
2022-11-11 04:22:04 +00:00
PyTorch MergeBot
0de8f047c1 Revert "[dynamo] fixes dict changed during runtime error (#87526)"
This reverts commit cf04b36ce8.

Reverted https://github.com/pytorch/pytorch/pull/87526 on behalf of https://github.com/anijain2305 due to error reported
2022-11-11 04:19:08 +00:00
Jane Xu
310335de48 Update lr_scheduler.pyi to match lr_scheduler.py (#88818)
Following #88503, we should also update the pyi file

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88818
Approved by: https://github.com/soulitzer
2022-11-11 04:02:44 +00:00
Wei-Sheng Chin
86b7aa26f0 Fix FakeTensorProp on Module with Parameters or Buffers (#88700)
In `FakeTensorMode.__torch_dispatch__`, the output is assumed to always be computed by meta kernels in
```python
        try:
            with in_kernel_invocation_manager(self):
                r = func(*args, **kwargs)  # <----- "r" can be a real tensor.
        except NotImplementedError as not_implemented_error:
            # no meta kernel registered, fallback to kernel for the device
            if not self.allow_fallback_kernels:
                raise not_implemented_error
            return run_fallback_kernel(self, func, args, kwargs, not_implemented_error)

        return self.wrap_meta_outputs_with_default_device_logic(r, func, args, kwargs)
```
For example, I observed that a CPU tensor is generated when executing `aten.addmm` while running `FakeTensorProp`. Therefore, I'd like to allow `FakeTensorMode` to wrap a real tensor as a `FakeTensor` during the computation. Does this PR look like a good direction for fixing this problem? If yes, I can go ahead and add some tests.
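
For reference, a minimal `FakeTensorProp` run over a module with parameters -- the scenario exercised by this fix (module and shapes are illustrative):

```python
import torch
from torch.fx import symbolic_trace
from torch.fx.passes.fake_tensor_prop import FakeTensorProp

mod = symbolic_trace(torch.nn.Linear(4, 4))
FakeTensorProp(mod).propagate(torch.randn(2, 4))
for node in mod.graph.nodes:
    print(node.name, node.meta.get("val"))   # FakeTensor metadata per node
```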
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88700
Approved by: https://github.com/eellison, https://github.com/ezyang
2022-11-11 03:49:29 +00:00
Chien-Chin Huang
c4fc5d372f [FSDP][state_dict][1/N] Moving state_dict logic to pre_state_dict_hook (#87900)
This is one step toward the ultimate goal: remove the overwritten state_dict in FSDP. All the logic should be either in `pre_state_dict_hook` or `post_state_dict_hook`.

Since the current `nn.Module` does not support `pre_state_dict_hook`, this PR mimics `pre_state_dict_hook` by calling the pre hook inside the post hook, effectively ditching all the work done by `nn.Module.state_dict`. Once `pre_state_dict_hook` is supported by `nn.Module`, these pre hook calls can be moved out of the post hooks and registered to `nn.Module.pre_state_dict_hook`.

The major issue with this temporary solution is that `post_state_dict_hook` is called from the leaf node to the root node. This makes `module._lazy_init()` invalid, as FSDP assumes `_lazy_init()` is called from the root. As a result, `FSDP.state_dict` currently contains only one piece of logic -- calling `module._lazy_init()`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87900
Approved by: https://github.com/rohan-varma
2022-11-11 03:41:40 +00:00