Commit Graph

6786 Commits

Author SHA1 Message Date
Han Qi (qihqi)
25eb7c3ae3 Clean up dependency for flatbuffer_loader (#86041)
Test Plan: waitforsandcastle

Differential Revision: D38445936

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86041
Approved by: https://github.com/cccclai
2022-12-08 03:48:04 +00:00
Nikita Shulga
36ac095ff8 Migrate PyTorch to C++17 (#85969)
With CUDA-10.2 gone we can finally do it!

This PR mostly contains build system related changes, invasive functional ones are to be followed.
Among many expected tweaks to the build system, here are a few unexpected ones:
 - Force the onnx_proto project to be updated to C++17 to avoid a `duplicate symbols` error when compiled by gcc-7.5.0, as the storage rule for `constexpr` changed in C++17 (static constexpr data members became implicitly inline), but gcc does not seem to follow it
 - Do not use `std::apply` on CUDA but rely on the built-in variant, as it results in test failures when CUDA runtime picks host rather than device function when `std::apply` is invoked from CUDA code.
 - `std::decay_t` -> `::std::decay_t` and `std::move` -> `::std::move`, as VC++ for some reason claims that the `std` symbol is ambiguous
 - Disable use of `std::aligned_alloc` on Android, as its `libc++` does not implement it.

Some prerequisites:
 - https://github.com/pytorch/pytorch/pull/89297
 - https://github.com/pytorch/pytorch/pull/89605
 - https://github.com/pytorch/pytorch/pull/90228
 - https://github.com/pytorch/pytorch/pull/90389
 - https://github.com/pytorch/pytorch/pull/90379
 - https://github.com/pytorch/pytorch/pull/89570
 - https://github.com/facebookincubator/gloo/pull/336
 - https://github.com/facebookincubator/gloo/pull/343
 - 919676fb32

Fixes https://github.com/pytorch/pytorch/issues/56055

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85969
Approved by: https://github.com/ezyang, https://github.com/kulinseth
2022-12-08 02:27:48 +00:00
Ryan Spring
3c9431f505 Add factory functions to python frontend (#89230)
- Add a `full` nvprim to support factory functions, because the `full` reference is implemented with `empty` plus `fill` while nvFuser has a native `full` factory function.
- Change the `full_like` reference to call `full`, avoiding the need to define another nvprim (see the sketch below).
- Enable support for `new_zeros` to enable the `cudnn_batch_norm` decomposition.
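
A hedged sketch of the second bullet (names and signatures are illustrative, not the exact `torch._refs`/nvprims code): routing `full_like` through `full` means only one factory primitive has to be defined.

```python
import torch

def full(shape, fill_value, *, dtype, device):
    # stands in for the single `full` factory primitive
    return torch.full(shape, fill_value, dtype=dtype, device=device)

def full_like(a, fill_value, *, dtype=None, device=None):
    # derive missing metadata from the reference tensor, then delegate
    dtype = a.dtype if dtype is None else dtype
    device = a.device if device is None else device
    return full(a.shape, fill_value, dtype=dtype, device=device)

print(full_like(torch.empty(2, 3), 7.0))
```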

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89230
Approved by: https://github.com/kevinstephano, https://github.com/mruberry
2022-12-06 07:16:21 +00:00
David Berard
7571134f69 [NNC] Use New PassManager for LLVM >= 15 (#89978)
This is needed because TargetMachine::adjustPassManager was removed in https://reviews.llvm.org/D137796. However, we need to keep around the old pass manager implementation for LLVM < 15.

Based on this: https://llvm.org/docs/NewPassManager.html

Tests: `./build/bin/test_tensorexpr` passes.

RUN_TORCHBENCH: nvfuser

Differential Revision: [D41636445](https://our.internmc.facebook.com/intern/diff/D41636445)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89978
Approved by: https://github.com/bertmaher
2022-12-05 19:19:36 +00:00
Lukas N Wirz
301d9c0556 Remove deprecated usage of is_pod/is_pod_v (#88918)
… as equivalent replacements for std::is_pod and std::is_pod_v because they are deprecated in C++20.

When consuming libtorch header files in a project that uses C++20, there are warnings about `std::is_pod` being deprecated. This patch fixes that issue.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88918
Approved by: https://github.com/ezyang
2022-12-05 16:50:00 +00:00
David Berard
9d54d3bec2 [NVFuser] undo v100 OOM skips (#90070)
Summary: I think these were just caused by parallel tests. After adjusting test settings to 1 thread, these stopped OOMing.

Test Plan:
```
$ buck2 test -j 1 mode/dev-nosan //caffe2/torch/csrc/jit/codegen/cuda:nvfuser
```
https://www.internalfb.com/intern/testinfra/testrun/6473924590389963

Differential Revision: D41643827

Pull Request resolved: https://github.com/pytorch/pytorch/pull/90070
Approved by: https://github.com/jjsjann123
2022-12-02 21:58:24 +00:00
Scott Ramsby
24b3b73c98 [Caffe2] Fix merge logic bug (#89551)
Summary: `ExprGroup::getMergeCandidates()` had a logic bug. The vector being initialized had its arguments mis-ordered. This didn't trigger a build warning because the warning about implicit cast from an integral type to `bool` wasn't enabled.

Test Plan: `buck test fbsource//arvr/mode/win/vs2019/cuda11/opt fbsource//arvr/mode/hybrid_execution //arvr/libraries/neural_net_inference/TorchScript/...`

Differential Revision: D41488939

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89551
Approved by: https://github.com/davidberard98, https://github.com/jjsjann123
2022-11-30 01:01:49 +00:00
Chen Lai
93772305d9 [PyTorch Edge] Set training for module only (#89488)
Update the previous recursive logic: continue setting the training attribute only if the slot is an object and a module.

For the corresponding JIT module, we get the module list first and set the modules one by one; there is a method to get all modules iteratively, instead of recursively.

This change patches one fix to set the training attribute for `model_f269583363.ptl`. Another patch is needed, because the current lite interpreter doesn't have the correct type when loading an object with setstate.

Differential Revision: [D41466417](https://our.internmc.facebook.com/intern/diff/D41466417/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89488
Approved by: https://github.com/iseeyuan
2022-11-29 13:49:44 +00:00
David Berard
908daa8ae5 [nvfuser] avoid out of bounds error (#89584)
Summary: update OOB check (https://github.com/csarofeen/pytorch/pull/2218) and skip tests that OOM on internal machines.

Test Plan:
```
buck2 test mode/dev-nosan //caffe2/torch/csrc/jit/codegen/cuda/test:nvfuser
```

Differential Revision: D41502369

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89584
Approved by: https://github.com/jjsjann123
2022-11-29 02:03:59 +00:00
Vasiliy Kuznetsov
22a1b5e243 quantization: deprecate observer compute_dtype and replace with is_dynamic (#85431)
Summary:

This PR deprecates the `compute_dtype` field on observers, and replaces
it with the `is_dynamic` field on observers.  This is better aligned
with the reference model spec.
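
A hedged sketch of the new convention (observer signatures vary across releases, so treat this as illustrative): dynamic quantization is expressed with `is_dynamic=True` instead of the deprecated `compute_dtype`.

```python
import torch
from torch.ao.quantization.observer import PlaceholderObserver

# before: PlaceholderObserver.with_args(dtype=torch.float32, compute_dtype=torch.quint8)
dynamic_act_observer = PlaceholderObserver.with_args(dtype=torch.quint8, is_dynamic=True)
```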

Test plan:

```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85431
Approved by: https://github.com/jerryzh168
2022-11-24 07:07:34 +00:00
Wu, Chunyuan
9c867eae1a nnc: fix Store if value is fp32 while buf is bf16 (#86788)
Fixes https://github.com/pytorch/pytorch/issues/86533.
For the below graph:
```bash
[DUMP kernel.cpp:1690] TensorExprKernel graph:
[DUMP kernel.cpp:1690] graph(%x.1 : BFloat16(10, strides=[1], requires_grad=0, device=cpu)):
[DUMP kernel.cpp:1690]   %1 : int = prim::Constant[value=0]()
[DUMP kernel.cpp:1690]   %2 : BFloat16(10, strides=[1], requires_grad=0, device=cpu) = aten::pow(%x.1, %1) # test/test_tensorexpr.py:1330:29
[DUMP kernel.cpp:1690]   %3 : BFloat16(10, strides=[1], requires_grad=0, device=cpu) = aten::sin(%2) # test/test_tensorexpr.py:1330:19
[DUMP kernel.cpp:1690]   return (%3)
```

**Loop stmt before the fix:**
The store value `0.8414709568023682f` is float while the scalar_type of the store buf `aten_sin` is bf16.
```bash
[DEBUG llvm_codegen.cpp:489] After HalfRewriter {
[DEBUG llvm_codegen.cpp:489]   aten_sin[Ramp(0ll, 1ll, 8)] = Broadcast(0.8414709568023682f, 8);
[DEBUG llvm_codegen.cpp:489]   for (int64_t i_1_tail_tail = 0ll; i_1_tail_tail < 2ll; i_1_tail_tail++) {
[DEBUG llvm_codegen.cpp:489]     aten_sin[i_1_tail_tail + 8ll] = 0.8414709568023682f;
[DEBUG llvm_codegen.cpp:489]   }
[DEBUG llvm_codegen.cpp:489] }
```

**Loop stmt after the fix:**
```bash
[DEBUG llvm_codegen.cpp:489] After HalfRewriter {
[DEBUG llvm_codegen.cpp:489]   aten_sin[Ramp(0ll, 1ll, 8)] = bfloat16(Broadcast(0.8414709568023682f, 8));
[DEBUG llvm_codegen.cpp:489]   for (int64_t i_1_tail_tail = 0ll; i_1_tail_tail < 2ll; i_1_tail_tail++) {
[DEBUG llvm_codegen.cpp:489]     aten_sin[i_1_tail_tail + 8ll] = bfloat16(0.8414709568023682f);
[DEBUG llvm_codegen.cpp:489]   }
[DEBUG llvm_codegen.cpp:489] }
```
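
A minimal repro sketch in the spirit of the linked test (assumptions: scripting plus a few warm-up runs triggers the NNC/TensorExpr fuser; exact fusion behavior depends on JIT settings):

```python
import torch

def fn(x):
    # pow(x, 0) constant-folds to 1.0; sin(1.0) becomes the fp32
    # constant 0.8414709568023682 stored into a bf16 buffer
    return torch.sin(torch.pow(x, 0))

scripted = torch.jit.script(fn)
x = torch.rand(10, dtype=torch.bfloat16)
for _ in range(3):  # warm up so the fuser kicks in
    out = scripted(x)
print(out)
```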
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86788
Approved by: https://github.com/EikanWang, https://github.com/kit1980
2022-11-24 02:52:34 +00:00
Wei-Sheng Chin
e922bd4e52 [ONNX] Move two headers from .h to .cc (#86852)
As title. Header dependencies should be kept as small as possible.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86852
Approved by: https://github.com/titaiwangms, https://github.com/BowenBao
2022-11-24 01:30:09 +00:00
Mike Iovine
7b0650d5cf Back out "[static-runtime] change the backend for permute_copy" (#89463)
Summary: This permute copy change seems to be causing huge regressions on machines without AVX512. Revert to mitigate. This shouldn't be problematic, since the improvement from the change was very small anyway.

Differential Revision: D41450088

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89463
Approved by: https://github.com/hlu1
2022-11-22 06:26:10 +00:00
maxren
496c8ae760 [xnnpack][lite-int] Handle Constant Data (#89445)
Handling constant data for XNNPACK delegation. This allows us to handle new modules such as:

```
class Module(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self._constant = torch.ones(4, 4, 4)

    def forward(self, x):
        return x + self._constant
```

This is the precursor work to handling convolution, as we need to serialize constant data (weights).

Differential Revision: [D41050349](https://our.internmc.facebook.com/intern/diff/D41050349/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89445
Approved by: https://github.com/digantdesai
2022-11-22 02:20:54 +00:00
AllenTiTaiWang
6daf60be5a [ONNX] Add setType from user into InferredType and Reliable in ConstantValueMap (#88622)
The `setType` API is not respected by the current exporter, because graph-level shape type inference simply overrides every non-ONNX-op shape we had from node-level shape type inference. To address this issue, this PR (1) makes a custom op with `setType` **reliable** in ConstantValueMap, securing its shape/type information in the pass _C._jit_pass_onnx, and (2) recognizes an invalid (non-ONNX) op as reliable when it carries shape/type information in the graph-level pass _C._jit_pass_onnx_graph_shape_type_inference.

1. In #62856, the refactor in onnx.cpp caused a regression for custom ops, as that was the step where we should update custom-op shape/type information into ConstantValueMap for the remaining ops.

2. Add another condition besides IsValidONNXNode for custom-op setType in shape_type_inference.cpp. If every node output has a shape (not all dynamic), we treat the type as custom-set.

3. ~However, this PR won't solve the [issue](https://github.com/pytorch/pytorch/issues/87738#issuecomment-1292831219) that during node-level shape type inference the exporter warns about an unknown custom op, since we process its symbolic_fn after this warning, even though the op would have shape/type if setType were used correctly. That is left for another issue to solve. #84661~ Add `no_type_warning` in UpdateReliable(); it only warns if a non-ONNX node with no given type appears.

Fixes #81693
Fixes #87738

NOTE: not confident that this won't break anything. Please share your thoughts if you have a robust test in mind.
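
A hedged sketch of how `setType` is used from a custom symbolic function (the op name, namespace, and opset below are illustrative, not from this PR):

```python
import torch
from torch.onnx import register_custom_op_symbolic

def my_op_symbolic(g, x):
    out = g.op("custom_domain::MyOp", x)
    # Tell the exporter the output's shape/type; with this PR the
    # information is kept as "reliable" instead of being overridden.
    out.setType(x.type())
    return out

register_custom_op_symbolic("my_namespace::my_op", my_op_symbolic, opset_version=13)
```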
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88622
Approved by: https://github.com/BowenBao
2022-11-19 17:16:59 +00:00
maxren
7beb151889 [xnnpack][executorch] remove unordered_set from xnn_compiler (#89231)
Removing `unordered_set` from XNNCompiler for Executorch.

While some STL libraries are unavoidable, and I think it should be OK for the delegate to pull in these libraries, `unordered_set` wasn't really needed, and we should be serializing the number of external ids anyway.

After this, the backend classes should be good to `hg copy` into Executorch.

Differential Revision: [D41227391](https://our.internmc.facebook.com/intern/diff/D41227391/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89231
Approved by: https://github.com/salilsdesai, https://github.com/cccclai
2022-11-18 07:07:19 +00:00
Nikita Shulga
767f6aa49f [JIT][Security] Do not blindly eval input string (#89189)
Introduce a `_eval_no_call` method that evaluates a statement only if it
does not contain any calls (determined by examining the bytecode), thus preventing a command-injection exploit.

Added a simple unit test to check that
`torch.jit.annotations.get_signature` does not result in calling random
code.

Although this code path exists for Python 2 compatibility, perhaps it
should simply be removed.
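
A minimal sketch of the approach (the real implementation may differ in details):

```python
import dis

def _eval_no_call(stmt, glob, loc):
    # Compile first, then reject any expression whose bytecode
    # contains a call instruction before evaluating it.
    bytecode = compile(stmt, "", mode="eval")
    for insn in dis.get_instructions(bytecode):
        if "CALL" in insn.opname:
            raise RuntimeError(f"Type annotation should not contain calls, but '{stmt}' does")
    return eval(bytecode, glob, loc)  # noqa: P204
```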

Fixes https://github.com/pytorch/pytorch/issues/88868

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89189
Approved by: https://github.com/suo
2022-11-17 22:05:30 +00:00
maxren
637e764ec5 [xnnpack][executorch] Pass xnnexecutor pointer to compileModel() (#89090)
Here we pass XNNExecutor* to compileModel so that the XNNExecutor can be allocated by the runtime. This signature change is for Executorch:

```
XNNExecutor compileModel(void* buffer) --> void compileModel(void* buffer, XNNExecutor* executor)
```

The intended use case for allocating the executor and compiling the serialized flatbuffer:

```
XNNExecutor* executor = runtime_allocator->allocateList<jit::xnnpack::delegate::XNNExecutor>(1);
XNNCompiler::compileModel(processed.buffer, executor);

```

Differential Revision: [D41208387](https://our.internmc.facebook.com/intern/diff/D41208387/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89090
Approved by: https://github.com/digantdesai
2022-11-17 04:29:25 +00:00
maxren
d1f48f05ce [xnnpack][Bug Fix] Pass serialized model by reference (#89089)
Two changes
- Remove XNNCompiler's dependence on std::string by passing void*
- Grab ser_model by reference: this bug was causing data pointers given to the xnn runtime to be freed, because ser_model was on the stack.

Differential Revision: [D41208380](https://our.internmc.facebook.com/intern/diff/D41208380/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89089
Approved by: https://github.com/digantdesai
2022-11-17 04:17:23 +00:00
maxren
366f1b2c2f [xnnpack][lite-int] Freeze/Inline module to remove reference to self (#88863)
We need to inline the graph before converting from TorchScript to the XNNPACK flatbuffer, removing the graph's dependence on `self`.

This will later help us work with constant data; a sketch of the freezing step is below.
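
A hedged illustration using the public TorchScript freezing API (`MyModule` is a placeholder): freezing inlines the graph and folds attribute accesses, so the graph no longer references `self`.

```python
import torch

class MyModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(4, 4))

    def forward(self, x):
        return x @ self.weight

scripted = torch.jit.script(MyModule().eval())
frozen = torch.jit.freeze(scripted)  # inlines and folds `self.weight`
print(frozen.graph)
```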

Differential Revision: [D41049858](https://our.internmc.facebook.com/intern/diff/D41049858/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88863
Approved by: https://github.com/digantdesai
2022-11-17 04:14:57 +00:00
Kazuaki Ishizaki
a5f04e9a91 Fix typos in .md and .rst files (#88962)
This PR fixes occurrences of the typo `Github` in `.md` and `.rst` files:
`Github` -> `GitHub`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88962
Approved by: https://github.com/kit1980
2022-11-17 03:37:02 +00:00
R Max Espinoza
3af5cf4de1 doc(typo): memroy -> memory (#89126)
Minor typo in comments.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89126
Approved by: https://github.com/kit1980
2022-11-17 01:03:34 +00:00
Charlie West-Taylor
cfd552547f Use the Python frame safely in _pythonCallstack (#88993)
Currently, the result of `PyEval_GetFrame()` is piped straight to `Py_INCREF`. However, `PyEval_GetFrame` [may return null](https://docs.python.org/3/c-api/reflection.html#c.PyEval_GetFrame), which seems to be the case sometimes, when calling `_pythonCallstack` from another thread. This is handled in the subsequent `while (nullptr != frame)` block, but `Py_INCREF`, called before it, [doesn't handle this case](https://docs.python.org/3/c-api/refcounting.html#c.Py_INCREF), so the program segfaults. The safe form of `Py_INCREF` is `Py_XINCREF`, so use that instead ([docs](https://docs.python.org/3/c-api/refcounting.html#c.Py_XINCREF)).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88993
Approved by: https://github.com/albanD
2022-11-17 00:59:15 +00:00
Chen Lai
2452e3f99a Update xnnpack graph schema to use xnode and xvalue (#89036)
There are different node definitions, such as [Node in autograd](https://www.internalfb.com/code/fbsource/fbcode/caffe2/torch/csrc/autograd/function.h?lines=108-609&reveal=108-609), ONNX nodes, etc. A namespace can be used where nodes from different definitions are used together; however, it's still better to differentiate the names slightly.

Differential Revision: [D41002324](https://our.internmc.facebook.com/intern/diff/D41002324/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89036
Approved by: https://github.com/mcr229
2022-11-15 10:34:45 +00:00
Chen Lai
8c46a5de3a Add debug handle to xnnpack schema (#89033)
As title, add three things to the schema:
1. a debug handle for each node
2. a file identifier, so we can sanity-check that we are getting an XNNPACK-schema flatbuffers file rather than some other random binary
3. an extension, so the dumped binary ends up with its own extension like `myschema.xnnpack` (maybe there is a better name) instead of the default extension `.bin`

Differential Revision: [D40906970](https://our.internmc.facebook.com/intern/diff/D40906970/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89033
Approved by: https://github.com/mcr229
2022-11-15 09:49:54 +00:00
Wenzhe Xue
5314af5383 Set correct size of attr::output_layouts when the graph has multiple outputs in JIT oneDNN fuser (#88496)
Bug:
Previously, `initOutputLayouts()` was called after creating a graph and before merging other nodes; it is a vector with one element. So when a graph contains multiple outputs (e.g., when using AOTAutograd compile, in my case), the layout_propagation pass tries to access out-of-range elements of the vector. Then comes the second bug, in `useOpaqueLayout()`: the out-of-range check compares the index against the updated output size instead of the size of the vector, and then uses `[]` to access the element, which is out of range.

Fixes for the above two issues:

1. Check that the offset is within range using the size of the `attr::output_layouts` vector instead of another variable. This check now catches the error.
2. Initialize `attr::output_layouts` after node merging. The graph may change with node merging, so we moved the initialization into layout_propagation, where the complete graph is available.

Added test time:
`Ran 1 test in 0.383s`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88496
Approved by: https://github.com/jgong5, https://github.com/sanchitintel
2022-11-15 07:29:55 +00:00
Kazuaki Ishizaki
e0c194f10b Fix typos in messages under torch (#88961)
This PR fixes typos in messages and parameters in C++ source and header files under the `torch` directory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88961
Approved by: https://github.com/albanD
2022-11-14 19:06:41 +00:00
Edward Z. Yang
46796fe5e9 Fix XLA symbolic shapes binding (#88928)
Obsoletes https://github.com/pytorch/pytorch/pull/88772

Mostly revolves around NOT assuming that the inside is a SymNode,
but instead treating it as duck-typed to a SymNode.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88928
Approved by: https://github.com/SherlockNoMad
2022-11-13 00:31:27 +00:00
kshitij12345
f74946324e [fix] allow saving python attr on Tensor and Parameter via torch.save (#81616)
Fixes: https://github.com/pytorch/pytorch/issues/72129

TODO:
* [x] Fix for Parameter

Benchmark
(Measurable diff for small tensors)
```
[-------------- Save and Load --------------]
                    |  After PR  |  Before PR
1 threads: ----------------------------------
      ()            |    111.7   |     106.9
      (4, 4)        |    114.4   |     109.2
      (128, 128)    |    135.2   |     128.3
      (1024, 1024)  |   1431.9   |    1431.3

Times are in microseconds (us).
```

<details>

<summary> Benchmark Script </summary>

```python
import torch
from torch.testing._internal.common_utils import BytesIOContext
from torch.utils import benchmark
import pickle

shapes = ((), (4, 4), (128, 128), (1024, 1024))

sizes = [1, 64, 1024, 10000]
results = []

def save_load_fn(t):
    with BytesIOContext() as f:
        torch.save(t, f)
        f.seek(0)
        torch.load(f)

for shape in shapes:
    t = torch.randn(shape)
    label = 'Save and Load'
    sub_label = f'{shape}'
    results.append(benchmark.Timer(
        stmt='save_load_fn(t)',
        globals={'t': t, 'save_load_fn':save_load_fn},
        label=label,
        sub_label=sub_label,
        description='Before PR',
    ).blocked_autorange(min_run_time=2))

compare = benchmark.Compare(results)
compare.print()

with open('before_pr.pkl', 'wb') as f:
    pickle.dump(results, f)

# with open('after_pr.pkl', 'rb') as f:
#     after_pr = pickle.load(f)

# with open('before_pr.pkl', 'rb') as f:
#     before_pr = pickle.load(f)

# compare = benchmark.Compare(after_pr + before_pr)
# compare.print()
```

</details>

NOTE : **BC-Breaking** : After this PR, all tensors (also regular tensors) will be serialised using `_rebuild_from_type_v2`.
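
A quick round-trip illustrating the new behavior (a hedged sketch; the attribute name is arbitrary):

```python
import io
import torch

t = torch.ones(2, 2)
t.note = "custom python attribute"  # plain Python attr on a Tensor

buf = io.BytesIO()
torch.save(t, buf)
buf.seek(0)
loaded = torch.load(buf)
print(loaded.note)  # preserved across save/load after this PR
```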

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81616
Approved by: https://github.com/albanD, https://github.com/kurtamohler
2022-11-11 21:11:12 +00:00
kshitij12345
d15a6b0c97 Error on ZeroTensor serialization (#88803)
Follow-up : https://github.com/pytorch/pytorch/pull/88182#issuecomment-1308628415

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88803
Approved by: https://github.com/anjali411
2022-11-11 08:51:29 +00:00
AllenTiTaiWang
a6d72f44a4 [ONNX] Add onnx::Max into standard Op for scalar type alignment (#88750)
Easy fix for onnx::Max ScalarType
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88750
Approved by: https://github.com/justinchuby, https://github.com/BowenBao
2022-11-11 04:22:04 +00:00
BowenBao
20ae19aa1d [ONNX] Improve diagnostic message formatting (#87830)
* Reflect required arguments in the method signature for each diagnostic rule. The previous design accepted an arbitrarily sized tuple, which was hard to use and error-prone.
     ![image](https://user-images.githubusercontent.com/9376104/200381982-d1e905f0-a159-4ef5-8d2e-070524e8f5bf.png)
* Removed `DiagnosticTool` to keep things compact.
* Removed specifying supported rule set for tool(context) and checking if rule of reported diagnostic falls inside the set, to keep things compact.
* Initial overview markdown file.
* Change the `full_description` definition. Now the `text` field should not be empty, and its markdown should be stored in the `markdown` field.
* Change `message_default_template` to allow only named fields (excluding numeric fields). `field_name` provides clarity on what argument is expected.
* Added `diagnose` api to `torch.onnx._internal.diagnostics`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87830
Approved by: https://github.com/abock
2022-11-10 21:42:17 +00:00
maxren
37b468ac77 [xnnpack][lite-int][on-device] rebuild serialized modules at runtime (#88780)
This is the on-device runtime work. We modify the compile and execute functions from our earlier hacky solution into what will actually run at runtime.

First we rebuild our graph from the serialized flatbuffer string. We also introduce a runtime wrapper that inherits from CustomClassHolder, which allows us to forward the built XNNGraph runtime along to our execute function.

Once the subgraph object has been rebuilt, we pass it to the runtime wrapper, which forwards it along to execute.

At execute time we prep the inputs/outputs and invoke the runtime using our runtime wrapper, and finally forward the results back from execution.

Differential Revision: [D39413031](https://our.internmc.facebook.com/intern/diff/D39413031/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D39413031/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88780
Approved by: https://github.com/digantdesai
2022-11-10 21:35:28 +00:00
maxren
3a4e8736ad [xnnpack][on-device] compiler --> executor object (#88779)
#### XNN Compiler Object
This is purely to abstract the subgraph rebuild away from the flatbuffer object. compileModel returns an executor object which we can use to set up inputs and run forward.

#### Executorch Considerations
We include ATen/Utils for TORCH_CHECK; this will be changed when moving to Executorch.

Differential Revision: [D40733163](https://our.internmc.facebook.com/intern/diff/D40733163/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88779
Approved by: https://github.com/digantdesai
2022-11-10 21:09:22 +00:00
maxren
d5e1e2f0fc [xnnpack][on-device] executor class (#88778)
# Executor Class

Executor object used to wrap our xnn_runtime object. The ideal flow for this object looks like:

```
executor.set_inputs(vector<tensor> inputs, vector<tensor> outputs)
executor.forward()
```

This will likely be returned by our delegate's compile and handed over to execute in order to run inference using the XNN runtime.

##### Executorch Considerations
```
#include <ATen/Functions.h>
#include <ATen/Utils.h>
```
These ATen functions are included in order to use at::Tensor when setting the inputs; this will change for Executorch, because we will be switching from at::Tensor to whatever tensor abstraction ET uses. They seem to have the same call for `.data_ptr<float>()`, so realistically all the logic here will stay the same.

ATen/Utils is used for TORCH_CHECK. We will switch to ET_CHECK_MESSAGE for executorch.

Differential Revision: [D40733121](https://our.internmc.facebook.com/intern/diff/D40733121/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88778
Approved by: https://github.com/digantdesai
2022-11-10 21:01:46 +00:00
Bert Maher
1e4079a476 [nnc] Disable opaque pointers mode in LLVM backend to allow getPointerElementType (#88798)
As of LLVM 15 typed pointers are going away:
https://llvm.org/docs/OpaquePointers.html.  Thus
`getPointerElementType` is no longer legal, since pointers are all
opaque.  I don't totally remember why we use it so prolifically, or
whether there's an easy change to get rid of it, or whether we'd need
a significant refactor to carry around `Type`s alongside `Value`s.

But in any case, NNC is deprecated (see: TorchInductor) and will
hopefully be gone before LLVM 16 is a thing.  For now, we can apply
the hack of turning off opaque pointer mode on the LLVMContext.

Differential Revision: [D41176215](https://our.internmc.facebook.com/intern/diff/D41176215)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88798
Approved by: https://github.com/desertfire
2022-11-10 18:14:02 +00:00
PyTorch MergeBot
93d3bd626e Revert "[primTorch] Improve narrow and narrow_copy: refs, tests, docs (#87045)"
This reverts commit aa8279bcb8.

Reverted https://github.com/pytorch/pytorch/pull/87045 on behalf of https://github.com/izaitsevfb due to BC-breaking change, D41161182
2022-11-09 20:48:32 +00:00
kshitij12345
eb9b156019 [fix] MathBits: serialization (#88182)
Fixes #81690

TODO:

* [x] C++ Unpickler Fix (locally tested pickled in Python and unpickled in C++)
* [x] C++ Pickler Fix (locally tested pickled in C++ and unpickled in Python)
* [x] Do quant_tensor, sparse_tensor, etc require similar changes? (Sparse and Quant don't need this)
* [x] Add Comments
* [x] How to make sure C++ and Python are in sync? (Functions in `pickler.h` help in getting and setting Tensor Metadata (math-bits for now) on a tensor. They are the only place which should handle this.)

Notes:
Quantized tensors don't support complex dtypes, and for float they segfault with `_neg_view`: https://github.com/pytorch/pytorch/issues/88484

Sparse Tensor:
```python
>>> a = torch.tensor([[0, 2.], [3j, 0]]).to_sparse()
>>> a.conj().is_conj()
False
>>> a._neg_view()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NotImplementedError: Cannot access storage of SparseTensorImpl
```
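
For the main fix itself, a round-trip that exercises the conjugate math-bit (a hedged illustration):

```python
import io
import torch

a = torch.randn(3, dtype=torch.complex64).conj()
assert a.is_conj()

buf = io.BytesIO()
torch.save(a, buf)
buf.seek(0)
b = torch.load(buf)
assert b.is_conj()  # the conj bit survives serialization after this fix
```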

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88182
Approved by: https://github.com/ezyang, https://github.com/anjali411
2022-11-09 17:15:12 +00:00
Nikita Karetnikov
aa8279bcb8 [primTorch] Improve narrow and narrow_copy: refs, tests, docs (#87045)
Fixes #87019.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/87045
Approved by: https://github.com/mruberry
2022-11-09 09:19:28 +00:00
Wei-Sheng Chin
19d7941e37 Fix Python-bound function signature (torch._C.Graph.addInput) (#88528)
In pytorch/torch/_C/__init__.pyi, Graph.addInput has the signature
```python
  def addInput(self, name: str) -> Value: ...
```
which doesn't match the corresponding function
```cpp
  Value* addInput(const std::string& name = "") {
    return block_->addInput(name);
  }

```

in python_ir.cpp. This PR aligns the bound function on both the C++ and Python sides. Without this PR, mypy will complain whenever a change contains calls to `addInput`; for example,
![image](https://user-images.githubusercontent.com/3524474/200092086-429b8d63-9321-4d03-b0d6-f4c9bd361756.png)
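
The aligned stub would presumably look like this (a sketch; `Value` is stubbed out so the snippet stands alone, and the default mirrors the C++ `const std::string& name = ""`):

```python
class Value: ...

class Graph:
    def addInput(self, name: str = "") -> Value: ...
```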

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88528
Approved by: https://github.com/davidberard98
2022-11-09 01:31:45 +00:00
Huy Do
8cb5c5543e Revive static_runtime_benchmark build and test (#87660)
This build uses the wrong BUILD_ENVIRONMENT `pytorch-linux-focal-py3`, so it hasn't been run for a long time (forgotten). The name was probably an old build-environment name we used in the past; today's convention doesn't have the `pytorch-` prefix. There is a TODO for this:

> TODO: this condition is never true (BUILD_ENVIRONMENT doesn't start with pytorch-), need to fix this.

This is done as part of [T131829540](https://www.internalfb.com/intern/tasks/?t=131829540), where we want the
`static_runtime_benchmark` build and test jobs to run in OSS CI to avoid breaking internal builds.

* I also fixed some compiler warnings (`-Werror=sign-compare`, `-Werror,-Wunused-const-variable`) and a gcc7 compatibility issue along the way, since this hadn't been built for a long time.
* Reviving this test also revealed a small bug in the `PrepackWeights` test in `test_static_runtime.cc`, added recently in https://github.com/pytorch/pytorch/pull/85289. The test refers to an internal op and should only be run internally. This has been fixed by https://github.com/pytorch/pytorch/pull/87799 (to be merged).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87660
Approved by: https://github.com/malfet
2022-11-08 08:32:45 +00:00
PyTorch MergeBot
78a0ca29d9 Revert "[fix] allow saving python attr on Tensor and Parameter via torch.save (#81616)"
This reverts commit 54b6188cc6.

Reverted https://github.com/pytorch/pytorch/pull/81616 on behalf of https://github.com/mehtanirav due to Internal publishing is broken
2022-11-07 18:51:16 +00:00
Mike Iovine
dd43903fa9 [Static Runtime] Fix tensor_split sections overload (#88113)
Summary:
D40798763 broke this op. Unfortunately, it wasn't caught at land time due to the recent OSS Static Runtime test problems.

The problem is C++ overload resolution. After D40798763, the int that we were passing to `at::native::tensor_split` was getting implicitly converted to `IntArrayRef`. Fix this by converting the int to a `SymInt` and calling the correct overload.
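
For reference, the two overloads involved, shown at the Python level (illustrative; the bug itself was in the C++ call):

```python
import torch

x = torch.arange(8)
print(torch.tensor_split(x, 3))       # `sections` overload: split into 3 chunks
print(torch.tensor_split(x, [2, 5]))  # `indices` overload: split at positions 2 and 5
# In C++, the plain int was implicitly converted to IntArrayRef and hit the
# indices overload; converting it to SymInt selects the sections overload.
```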

Test Plan:
```
buck2 test caffe2/benchmarks/static_runtime:static_runtime_cpptest -- Tensor_Split --run-disabled
```

Differential Revision: D40862394

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88113
Approved by: https://github.com/hlu1
2022-11-07 14:36:39 +00:00
jjsjann123
7b419e8513 [NVFuser] Upstream push 1026 (#87779)
Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/

Codegen changes include:

* codegen improvement:
    i. allow non-root trivial reductions, allow empty/no-op fusion
    ii. fixes vectorization checks and size calculation
    iii. bank conflict handle improvement
    iv. enables transpose scheduler

* misc:
    i. CI tests failure fixes
    ii. cpp tests file clean up
    iii. trivial forwarding supports added in codegen runtime
    iv. added factory methods support in codegen

Commits in this PR from the devel branch:

```
7117a7e37ebec372d9e802fdfb8abb7786960f4a patching nvfuser conv cudnn test numerics mismatch (#2048)
65af1a4e7013f070df1ba33701f2d524de79d096 Inserting sync for redundant parallel types is already done at the (#2023)
6ac74d181689c8f135f60bfc1ec139d88941c98c Fix sync map (#2047)
f5bca333355e2c0033523f3402de5b8aac602c00 Bank conflict checker improvements (#2032)
d2ca7e3fd203537946be3f7b435303c60fa7f51e Minor update on cp.async code generation. (#1901)
d36cf61f5570c9c992a748126287c4e7432228e0 Test file cleanup (#2040)
0b8e83f49c2ea9f04a4aad5061c1e7f4268474c6 Allow non-root trivial reductions (#2037)
a2dfe40b27cd3f5c04207596f0a1818fbd5e5439 Fix vectorize size calculation (#2035)
e040676a317fe34ea5875276270c7be88f6eaa56 Use withPredicate to replace setPredicate to maintain Exprs immutable (#2025)
197221b847ad5eb347d7ec1cf2706733aacbf97c removing ci workflow (#2034)
40e2703d00795526e7855860aa00b9ab7160755f Reduction rand like patch (#2031)
bc772661cbdb3b711d8e9854ae9b8b7052e3e4a3 Add utility for checking bank conflict of shared memory (#2029)
ddd1cf7695f3fb172a0e4bcb8e4004573617a037 Add back FusionReductionWithTrivialReduction_CUDA (#2030)
fbd97e5ef15fa0f7573800e6fbb5743463fd9e57 Revert "Cleanup trivial reduction workarounds (#2006)" (#2024)
bca20c1dfb8aa8d881fc7973e7579ce82bc6a894 Cleanup trivial reduction workarounds (#2006)
e4b65850eee1d70084105bb6e1f290651adde23e Trivial forwarding (#1995)
1a0e355b5027ed0df501989194ee8f2be3fdd37a Fix contiguity analysis of predicates to match updated contiguity. (#1991)
a4effa6a5f7066647519dc56e854f4c8a2efd2a7 Enable output allocation cache (#2010)
35440b7953ed8da164a5fb28f87d7fd760ac5e00 Patching bn inference (#2016)
0f9f0b4060dc8ca18dc65779cfd7e0776b6b38e8 Add matmul benchmark (#2007)
45045cd05ea268f510587321dbcc8d7c2977cdab Enable tests previously disabled due to an aliasing bug (#2005)
967aa77d2c8e360c7c01587522eec1c1d377c87e Contiguous indexing for View operations (#1990)
a43cb20f48943595894e345865bc1eabf58a5b48 Make inlining even more modular (#2004)
dc458358c0ac91dfaf4e6655a9b3fc206fc0c897 Test util cleanup (#2003)
3ca21ebe4d213f0070ffdfa4ae5d7f6cb0b8e870 More strict validation (#2000)
a7a7d573310c4707a9f381831d3114210461af01 Fix build problem (#1999)
fc235b064e27921fa9d6dbb9dc7055e5bae1c222 Just fixes comments (#1998)
482386c0509fee6edb2964c5ae72074791f3e43a cleanup (#1997)
4cbe0db6558a82c3097d281eec9c85ad2ea0893a Improve divisible split detection (#1970)
42ccc52bdc18bab0330f4b93ed1399164e2980c9 Minor build fix. (#1996)
fcf8c091f72d46f3055975a35afd06263324ede6 Cleanup of lower_utils.cpp: Isolate out GpuLower usage (#1989)
15f2f6dba8cbf408ec93c344767c1862c30f7ecc Move ConcretizedBroadcastDomains to shared_ptr in GpuLower. (#1988)
8f1c7f52679a3ad6acfd419d28a2f4be4a7d89e2 Minor cleanup lower_unroll.cpp (#1994)
1d9858c80319ca7f0037db7de5f04e47f540d76c Minor cleanup (#1992)
f262d9cab59f41c669f53799c6d4a6b9fc4267eb Add support for uniform RNG (#1986)
eb1dad10c73f855eb1ecb20a8b1f7b6edb0c9ea3 Remove non-const functions, remove GpuLower instance on build, pass in ca_map. (#1987)
634820c5e3586c0fe44132c51179b3155be18072 Add support for some empty fusion (#1981)
eabe8d844ad765ee4973faa4821d451ef71b83c3 Segment self mapping fusions (#1954)
e96aacfd9cf9b3c6d08f120282762489bdf540c8 Enable Transpose operation (#1882)
425dce2777420248e9f08893765b5402644f4161 Add a null scheduler that helps segmenting away no-op schedules (#1835)
306d4a68f127dd1b854b749855e48ba23444ba60 Fix canScheduleCompileTime check of transpose scheduler (#1969)
b1bd32cc1b2ae7bbd44701477bddbcfa6642a9be Minor fix (#1967)
bd93578143c1763c1e00ba613a017f8130a6b989 Enable transpose scheduler (#1927)
b7a206e93b4ac823c791c87f12859cf7af264a4c Move scheduler vectorize utilities into their own file (#1959)
d9420e4ca090489bf210e68e9912bb059b895baf View scheduling (#1928)
c668e13aea0cf21d40f95b48e0163b812712cdf2 Upstream push ci fixes (#1965)
c40202bb40ce955955bb97b12762ef3b6b612997 Fix dump effective bandwidth (#1962)
93505bcbb90a7849bd67090fe5708d867e8909e4 WAR on index mapping when exact and permissive maps differ (#1960)
45e95fd1d3c773ee9b2a21d79624c279d269da9f Allow splitting inner-most ID to create virtual innermost ID in transpose scheduler (#1930)
a3ecb339442131f87842eb56955e4f17c544e99f Improve the comments at the beginning of index_compute.h (#1946)
f7bc3417cc2923a635042cc6cc361b2f344248d6 Remove unused variables (#1955)
df3393adbb5cb0309d091f358cfa98706bd4d313 Some cleanup (#1957)
7d1d7c8724ab5a226fad0f5a80feeac04975a496 TVDomainGuard factory (#1953)
357ba224c0fb41ed3e4e8594d95599c973f4a0ca Fill allocation with nan on tests (#1956)
8eafc54685d406f5ac527bcbacc475fda4492d7a Fix detection of unmappable root domains (#1952)
90a51f282601ba8ebd4c84b9334efd7762a234bc Some indexing cleanups, Add eye support (#1940)
ddc01e4e16428aec92f9c84d698f959b6436a971 Exclude unsupported data types (#1951)
992e17c0688fe690c51b50e81a75803621b7e6aa test the groups the same order as they are merged (#1949)
208262b75d1fed0597a0329d61d57bc8bcd7ff14 Move detection of self mapping IDs to IterDomainGraph from (#1941)
ac4de38c6ee53b366e85fdfe408c3642d32b57df Merge pull request #1945 from csarofeen/master_merge_0828
631094891a96f715d8c9925fb73d41013ca7f2e3 Add full, full_like, zeros, zeros_like, ones, ones_like (#1943)
aab10bce4541204c46b91ff0f0ed9878aec1bfc4 Merge remote-tracking branch 'upstream/viable/strict' into HEAD
4c254c063bb55887b45677e3812357556a7aa80d Fix arange when step is negative (#1942)
89330aa23aa804340b2406ab58899d816e3dc3d2 Tensor factories must set the output shape as its input (#1939)
```

RUN_TORCHBENCH: nvfuser

Differential Revision: [D40869846](https://our.internmc.facebook.com/intern/diff/D40869846)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87779
Approved by: https://github.com/davidberard98
2022-11-04 20:04:34 +00:00
ssjia
b78b8727ff [vulkan] enable prepacking for Batchnorm op (#88433)
Adds a `BatchNormPackedContext` so that the `batchnorm` op can use prepacking.

Differential Revision: [D40721546](https://our.internmc.facebook.com/intern/diff/D40721546/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/88433
Approved by: https://github.com/manuelcandales
2022-11-04 19:24:13 +00:00
Max Ren
826b4a9c2d [coreml] delegate multiple outputs (#88345)
Summary:
https://www.internalfb.com/code/fbsource/[c0e4da0b5c7fff3b4e31e4611033c30cabdc6aef]/fbcode/caffe2/torch/csrc/jit/backends/backend_detail.cpp?lines=268-276

it seems that the TorchScript addition of
`$unpack, = self.__backend.execute( ... `

the comma after `unpack` forces the result of execute to have only one item. So with this fix, when the number of outputs is > 1, execute returns a list containing a single list of outputs (basically, we put the outputs in another list before putting that into the list we return):
```
[[output1, output2, output3, ...]]
```
instead of
```
[output1, output2, output3, ...]
```
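
The single-item constraint comes from Python/TorchScript unpacking semantics, as this small illustration shows:

```python
# `x, = seq` unpacks a sequence of exactly one element
x, = [[1, 2, 3]]    # ok: x == [1, 2, 3]
try:
    x, = [1, 2, 3]  # raises: too many values to unpack
except ValueError as e:
    print(e)
```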

Do we want to fix this in backend_detail, or should we make the change in our delegate to accommodate the TorchScript? Raising the question here. Requesting cccclai, kimishpatel for approval.

Test Plan: unblocked models for chengxiangyin; models in pytorch playground all pass unit tests

Reviewed By: kimishpatel, cccclai

Differential Revision: D40328684

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88345
Approved by: https://github.com/jmdetloff, https://github.com/Skylion007
2022-11-03 20:05:53 +00:00
Kshiteej K
54b6188cc6 [fix] allow saving python attr on Tensor and Parameter via torch.save (#81616)
Fixes: https://github.com/pytorch/pytorch/issues/72129

TODO:
* [x] Fix for Parameter

Benchmark
(Measurable diff for small tensors)
```
[-------------- Save and Load --------------]
                    |  After PR  |  Before PR
1 threads: ----------------------------------
      ()            |    111.7   |     106.9
      (4, 4)        |    114.4   |     109.2
      (128, 128)    |    135.2   |     128.3
      (1024, 1024)  |   1431.9   |    1431.3

Times are in microseconds (us).
```

<details>

<summary> Benchmark Script </summary>

```python
import torch
from torch.testing._internal.common_utils import BytesIOContext
from torch.utils import benchmark
import pickle

shapes = ((), (4, 4), (128, 128), (1024, 1024))

sizes = [1, 64, 1024, 10000]
results = []

def save_load_fn(t):
    with BytesIOContext() as f:
        torch.save(t, f)
        f.seek(0)
        torch.load(f)

for shape in shapes:
    t = torch.randn(shape)
    label = 'Save and Load'
    sub_label = f'{shape}'
    results.append(benchmark.Timer(
        stmt='save_load_fn(t)',
        globals={'t': t, 'save_load_fn':save_load_fn},
        label=label,
        sub_label=sub_label,
        description='Before PR',
    ).blocked_autorange(min_run_time=2))

compare = benchmark.Compare(results)
compare.print()

with open('before_pr.pkl', 'wb') as f:
    pickle.dump(results, f)

# with open('after_pr.pkl', 'rb') as f:
#     after_pr = pickle.load(f)

# with open('before_pr.pkl', 'rb') as f:
#     before_pr = pickle.load(f)

# compare = benchmark.Compare(after_pr + before_pr)
# compare.print()
```

</details>

NOTE : **BC-Breaking** : After this PR, all tensors (also regular tensors) will be serialised using `_rebuild_from_type_v2`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81616
Approved by: https://github.com/albanD, https://github.com/kurtamohler
2022-11-03 09:57:47 +00:00
jjsjann123
b325c3fc25 [nvFuser] patches profiling on scalar arguments for std/var (#88165)
Fixes #86531

Added profiling on scalar values for aten::std & aten::var.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88165
Approved by: https://github.com/kevinstephano
2022-11-02 22:47:34 +00:00
Digant Desai
03346296db [edge profiler] Add support for performance events counting (#87876)
* Add support in lite_predictor benchmark binary to select event lists
* Uses Linux perf through Kineto profiler

Differential Revision: [D39837216](https://our.internmc.facebook.com/intern/diff/D39837216/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D39837216/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87876
Approved by: https://github.com/SS-JIA
2022-11-02 14:47:44 +00:00
Ivan Yashchuk
9ebb8d5232 Add ops.broadcast for nvFuser (#88080)
Having nvFuser's `broadcast` available alongside `broadcast_in_dim` would allow easier experimentation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/88080
Approved by: https://github.com/jjsjann123, https://github.com/kevinstephano, https://github.com/mruberry
2022-11-02 10:05:12 +00:00