435 Commits

Author SHA1 Message Date
cyy
dee100945e [2/N] Move c10::variant to std::variant (#109723)
This PR moves most c10::variant calls to std::variant.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109723
Approved by: https://github.com/ezyang
2023-09-24 02:47:43 +00:00
cyy
4c208c1475 Remove unneeded linking in CMake targets (#109192)
This PR removes unused library dependencies, which helps future refactoring.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109192
Approved by: https://github.com/ezyang
2023-09-15 19:43:25 +00:00
PyTorch MergeBot
380ccfd442 Revert "Added round_with_scale_factor arg to ATen (#97868)"
This reverts commit aa99c5b4ed.

Reverted https://github.com/pytorch/pytorch/pull/97868 on behalf of https://github.com/osalpekar due to Caused breakages in the glow compiler - see [D45374622](https://www.internalfb.com/diff/D45374622) for more details
2023-04-28 20:47:00 +00:00
vfdev-5
aa99c5b4ed Added round_with_scale_factor arg to ATen (#97868)
Addresses #62396 following the strategy described in https://github.com/pytorch/pytorch/pull/64983#issuecomment-1026177629.

Fixes the output size to match OpenCV, scikit-image, and SciPy when a scale factor is specified. The change is on the ATen side only, due to JIT FC.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/97868
Approved by: https://github.com/lezcano, https://github.com/mikaylagawarecki
2023-04-26 18:48:37 +00:00
cyy
1ab112cfab code is clean enough that some warnings can be enabled (#95139)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95139
Approved by: https://github.com/Skylion007
2023-02-21 07:24:20 +00:00
mikey dagitses
322e4b4c8a set -Wsuggest-override for builds (#89852)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/pytorch/pytorch/pull/89852).
* __->__ #89852
* #89851

set -Wsuggest-override for builds

Summary: This was flagged by a Meta internal build.

Test Plan: Rely on CI.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/89852
Approved by: https://github.com/malfet
2022-12-19 22:08:47 +00:00
Wang, Eikan
70c6a988d6 Fix the performance issue that the for-loop before ExternalCall could not be parallelized. (#85056)
Currently, NNC only parallelizes the loop statements of the graph outputs. This logic can bypass loop statements that could be parallelized. Take the following example and suppose the output of `ExternalCall` is also the output of the NNC fusion group. The current [parallel logic](https://github.com/pytorch/pytorch/pull/85056/files#diff-9a11174c26e4b57ab73e819520122bc314467c72962f3a5b79e7400ea3c4bbe5L781-L785) only tries to parallelize the `ExternalCall` and bypasses `stmt1` and `stmt2`.

```c++
stmt1: For:
stmt2:   For:
stmt3: ExternalCall
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85056
Approved by: https://github.com/frank-wei, https://github.com/bertmaher
2022-10-07 07:36:28 +00:00
Wang, Eikan
45be74cc63 Optimize to if the datatype of the source tensor is the same as the dest datatype (#85140)
AMP inserts `_autocast_to_reduced_precision` and `_autocast_to_full_precision` automatically. The aten implementation provides a fast path that bypasses the conversion if the tensor data type is already the reduced/full precision. NNC, however, always does the conversion, which can cause a >5% E2E performance regression.

This PR addresses the performance issue the same way aten does. We do not pull `_autocast_to_reduced_precision` and `_autocast_to_full_precision` into the NNC fusion group; instead we fall back to aten to trigger its fast path when the tensor data type is already the reduced/full precision.
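
The decision described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `DType`, `Placement`, and `placeAutocastCast` are hypothetical names that do not appear in the PR, and the real code operates on JIT graph nodes rather than plain enums.

```cpp
// Hypothetical dtype tags; the real code works on PyTorch scalar types.
enum class DType { Float, BFloat16 };

// Where an autocast conversion node should run (illustrative names only).
enum class Placement { FuseIntoNNC, FallBackToAten };

// If the input already has the target dtype, the conversion is a no-op, so
// keep the node out of the fusion group and let aten take its fast path.
// Otherwise a real conversion is needed and the node may be fused.
constexpr Placement placeAutocastCast(DType input, DType target) {
  return input == target ? Placement::FallBackToAten : Placement::FuseIntoNNC;
}
```

For example, `placeAutocastCast(DType::BFloat16, DType::BFloat16)` yields `Placement::FallBackToAten`, while a `Float` input with a `BFloat16` target yields `Placement::FuseIntoNNC`.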

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85140
Approved by: https://github.com/frank-wei
2022-09-27 04:40:42 +00:00
Edward Z. Yang
9e5563dbb1 Delete SymIntArrayRef wrapper struct (#84837)
Since we separated at::foo and at::foo_symint there is no benefit
to trying to make initializer lists work in both cases.  So we can
get rid of the special different struct.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84837
Approved by: https://github.com/kit1980
2022-09-12 20:04:01 +00:00
PyTorch MergeBot
034f2db1fd Revert "Delete SymIntArrayRef wrapper struct (#84837)"
This reverts commit 9c78f599e4.

Reverted https://github.com/pytorch/pytorch/pull/84837 on behalf of https://github.com/ZainRizvi due to The test test_post_localSGD_optimizer_step_reload in the X linux-bionic-cuda11.6-py3.10-gcc7 workflow has started consistently failing since this PR was submitted
2022-09-12 19:04:07 +00:00
Edward Z. Yang
9c78f599e4 Delete SymIntArrayRef wrapper struct (#84837)
Since we separated at::foo and at::foo_symint there is no benefit
to trying to make initializer lists work in both cases.  So we can
get rid of the special different struct.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84837
Approved by: https://github.com/kit1980
2022-09-12 16:28:20 +00:00
Richard Barnes
67f0940cdd Check all CUDA API calls for errors in test/ (#74921) (#83954)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74921

Test Plan: Sandcastle

Reviewed By: ezyang, malfet, ngimel

Differential Revision: D35194966

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83954
Approved by: https://github.com/ezyang
2022-08-24 20:12:25 +00:00
Will Constable
4f34cd6d1e Replace all CHECK_ and DCHECK_ with TORCH_* macros (#82032)
Avoid exposing defines that conflict with google logging, since this blocks external usage of libtorch in certain cases.

All the 'interesting' changes should be in these two files, and the rest should just be mechanical changes via sed.
c10/util/logging_is_not_google_glog.h
c10/util/logging_is_google_glog.h
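
To illustrate why prefixed macros avoid the clash, here is a minimal sketch. It assumes a hypothetical `MYLIB_CHECK_EQ` macro and a stand-in `CHECK_EQ` definition playing the role of another logging library; neither is the actual c10 or glog implementation.

```cpp
#include <stdexcept>
#include <string>

// Stand-in for a macro that another logging library might define.
// If our library also defined CHECK_EQ, the two definitions would collide.
#define CHECK_EQ(a, b) ((void)0)

// Hypothetical library-prefixed check macro: the prefix keeps it out of the
// way of the unprefixed CHECK_EQ above.
#define MYLIB_CHECK_EQ(a, b)                                        \
  do {                                                              \
    if (!((a) == (b))) {                                            \
      throw std::runtime_error(std::string("Check failed: ") + #a + \
                               " == " + #b);                        \
    }                                                               \
  } while (0)

// Small helper to exercise the macro: returns true when the check passes
// and false when it throws.
inline bool checkPasses(int lhs, int rhs) {
  try {
    MYLIB_CHECK_EQ(lhs, rhs);
    return true;
  } catch (const std::runtime_error&) {
    return false;
  }
}
```

Both macros can live in the same translation unit because their names no longer overlap.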

Fixes https://github.com/pytorch/pytorch/issues/81415

cc @miladm @malfet
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82032
Approved by: https://github.com/soumith, https://github.com/miladm
2022-07-26 01:20:44 +00:00
Nikita Shulga
4a4890cfb2 [BE] Use CamelCase for enum class members (#79772)
Per many C++ code-style guides (for [example](https://google.github.io/styleguide/cppguide.html#Enumerator_Names)), members of `enum` should be CamelCased,
and only defines should be ALL_CAPS

Changes `MemOverlap`, `MemOverlapStatus` and `CmpEvalResult` enum values

Also, `YES`, `NO`, `TRUE` and `FALSE` are often system defines

Fixes, among other things, the current iOS build regression, which manifests as follows (see [this](6e90572bb9)):
```
/Users/runner/work/pytorch/pytorch/aten/src/ATen/MemoryOverlap.h:19:29: error: expected identifier
enum class MemOverlap { NO, YES, TOO_HARD };
                            ^
/Applications/Xcode_12.4.app/Contents/Developer/Platforms/iPhoneSimulator.platform/Developer/SDKs/iPhoneSimulator14.4.sdk/usr/include/objc/objc.h:89:13: note: expanded from macro 'YES'
#define YES __objc_yes
```
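
The failure mode above can be reproduced in miniature. The sketch below simulates a platform header with plain `#define YES 1` / `#define NO 0` (the real `objc.h` definition differs), and the CamelCase enumerator spellings and the `mayOverlap` helper are assumptions for illustration.

```cpp
// Simulate a platform header that defines YES/NO as macros.
#define YES 1
#define NO 0

// With ALL_CAPS enumerators the preprocessor would rewrite
//   enum class MemOverlap { NO, YES, TOO_HARD };
// into
//   enum class MemOverlap { 0, 1, TOO_HARD };
// which does not compile. CamelCase names are not touched by the macros.
enum class MemOverlap { No, Yes, TooHard };

// Hypothetical helper, for illustration only.
constexpr bool mayOverlap(MemOverlap m) {
  return m != MemOverlap::No;
}
```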

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79772
Approved by: https://github.com/drisspg, https://github.com/kulinseth
2022-06-17 05:53:57 +00:00
Michael Andreas Dagitses
ab2ca95dd1 turn on -Werror=unused-variable in our Bazel CPU build
Summary:
We also fix any existing issues. Note that we only do this for the CPU
build because nvcc is considered a C++ toolchain but it does not have
the same flag support. Adding flags to the GPU build will cause nvcc
errors.

Test Plan: Built locally, rely on CI to confirm.

Reviewers: malfet

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79156

Approved by: https://github.com/seemethere, https://github.com/osalpekar, https://github.com/albanD
2022-06-11 02:46:34 +00:00
Michael Andreas Dagitses
606b234336 turn on -Werror=unused-function in our Bazel CPU build
Summary:
We also fix any existing issues. Note that we only do this for the CPU
build because nvcc is considered a C++ toolchain but it does not have
the same flag support. Adding flags to the GPU build will cause nvcc
errors.

Test Plan: Built locally, rely on CI to confirm.

Reviewers: malfet

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79154

Approved by: https://github.com/seemethere, https://github.com/osalpekar, https://github.com/albanD
2022-06-10 22:11:54 +00:00
PyTorch MergeBot
bcd7a20953 Revert "turn on -Werror=unused-function in our Bazel CPU build"
This reverts commit 67d313a032.

Reverted https://github.com/pytorch/pytorch/pull/79154 on behalf of https://github.com/malfet due to Breaks bazel build: 67d313a032
2022-06-10 20:43:03 +00:00
Michael Andreas Dagitses
67d313a032 turn on -Werror=unused-function in our Bazel CPU build
Summary:
We also fix any existing issues. Note that we only do this for the CPU
build because nvcc is considered a C++ toolchain but it does not have
the same flag support. Adding flags to the GPU build will cause nvcc
errors.

Test Plan: Built locally, rely on CI to confirm.

Reviewers: malfet

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79154

Approved by: https://github.com/seemethere, https://github.com/osalpekar, https://github.com/albanD
2022-06-10 18:30:08 +00:00
dzdang
a56f4e23b9 [quant][core][better-engineering] Rename files in quantized directory to conform with non-quantized counterpart filenames
Summary:
Names of analogous files in quantized directory (previously snake case) were inconsistent with
their non-quantized filename counterparts (pascal case). This is the first of a series of PRs that changes
all files in quantized (and sub-directories) dir to have pascal case.

`aten/src/ATen/native/quantized/qconv_unpack.cpp` has not been renamed yet
because (for reasons currently unknown) after making the name change, `import torch` produces the below error (`qlinear_unpack.cpp` renaming also seems to fail some phabricator CI tests for similar reasons). We suspect that these may be undefined errors and will revisit naming these files in a future PR.

```
terminate called after throwing an instance of 'c10::Error'
  what():  Type c10::intrusive_ptr<ConvPackedParamsBase<2> > could not be converted to any of the known types.
Exception raised from operator() at ../aten/src/ATen/core/jit_type.h:1735 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x55 (0x7f26745c0c65 in /data/users/dzdang/pytorch/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xb1 (0x7f26745bdcd1 in /data/users/dzdang/pytorch/torch/lib/libc10.so)
frame #2: <unknown function> + 0x1494e24 (0x7f2663b14e24 in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so)
frame #3: <unknown function> + 0xfed0bc (0x7f266366d0bc in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so)
frame #4: c10::detail::infer_schema::make_function_schema(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&, c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>, c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>) + 0x5a (0x7f266366d71a in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so)
frame #5: c10::detail::infer_schema::make_function_schema(c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>, c10::ArrayRef<c10::detail::infer_schema::ArgumentDef>) + 0x7b (0x7f266366e06b in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so)
frame #6: <unknown function> + 0x1493f32 (0x7f2663b13f32 in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so)
frame #7: <unknown function> + 0xe227dd (0x7f26634a27dd in /data/users/dzdang/pytorch/torch/lib/libtorch_cpu.so)
frame #8: <unknown function> + 0x14e0a (0x7f268c934e0a in /lib64/ld-linux-x86-64.so.2)
..........................truncated.............
```

Test Plan:
```
python test/test_quantization.py
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77037

Approved by: https://github.com/jerryzh168
2022-06-07 13:47:08 +00:00
Wang, Eikan
e5a5cd149f Simplify IfThenElse and CompareSelect within for-loop (#76793)
Analyze the range to determine if a condition cannot be satisfied. Suppose the for-loop body contains `IfThenElse` or `CompareSelect` while the condition of the two statements depends on the for-loop index `Var`. In that case, we will analyze the range to check whether the condition could always be satisfied or not. If the condition is deterministic, simplify the logic.
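
As a rough sketch of the idea, consider a condition of the form `i < bound` inside a loop over `[start, stop)`. The names `CondResult`, `LoopRange`, and `analyzeLessThan` are hypothetical and chosen for this example; the actual pass works on NNC IR expressions, not plain integers.

```cpp
#include <cstdint>

// Outcome of evaluating `i < bound` over every iteration of a loop.
enum class CondResult { AlwaysTrue, AlwaysFalse, Unknown };

// Half-open loop range [start, stop), as in `for (i = start; i < stop; i++)`.
struct LoopRange {
  int64_t start;
  int64_t stop;
};

constexpr CondResult analyzeLessThan(LoopRange r, int64_t bound) {
  return r.start >= r.stop        ? CondResult::Unknown      // empty loop
         : (r.stop - 1 < bound)   ? CondResult::AlwaysTrue   // largest i passes
         : (r.start >= bound)     ? CondResult::AlwaysFalse  // smallest i fails
                                  : CondResult::Unknown;     // mixed outcomes
}
```

When the result is `AlwaysTrue` or `AlwaysFalse`, the branch can be replaced by the corresponding arm; `Unknown` leaves the statement as is.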
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76793
Approved by: https://github.com/huiguoo
2022-05-15 20:21:28 +00:00
Wang, Eikan
429a80dded [NNC] Lowering function generates the output buffer with the specified stride (#76529)
Summary:
Pass stride information to the lowering function to generate the output buffer with the proper memory layout.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76529

Reviewed By: ZolotukhinM

Differential Revision: D36116712

Pulled By: IvanKobzarev

fbshipit-source-id: d3901f756b3710ecce172d6db3ecb0b7c12fb929
(cherry picked from commit b6cd53c91c01db36ea0e99167dc0ce0ae1d3aa23)
2022-05-04 20:04:22 +00:00
zengk95
1d55518198 Revert "[nnc] Strides to Tensor (#72962)"
This reverts commit 939060925f.

Fixes https://github.com/pytorch/vision/issues/5873

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76332
Approved by: https://github.com/seemethere
2022-04-25 19:50:00 +00:00
Ivan Kobzarev
939060925f [nnc] Strides to Tensor (#72962)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72962

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM, cpuhrsch

Differential Revision: D34589306

Pulled By: IvanKobzarev

fbshipit-source-id: ecee5249760ecc0c8b2edb1842b90218899bc944
(cherry picked from commit 9e310c4c67389da30da89126d838ffe3864aba6f)
2022-04-23 19:35:15 +00:00
Wang, Eikan
ef0873327e [NNC] Add utility functions to check channels-last contiguous (#75938)
Summary:
The `Buf` uses `std::vector<ExprHandle>` to represent its strides. The `ExprHandle` could be an immediate value or a mathematical expression with variables involved both for the static shape and dynamic shape. So it is hard to directly deduce the channels-last contiguous layout based on the numerical calculation. Hence, the utility functions of this PR are based on the pattern match to check whether the `Buf` is channels-last contiguous.
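
For intuition, here is what the check looks like when sizes and strides are concrete integers. This is a deliberate simplification and not the PR's method: the PR matches symbolic `ExprHandle` patterns, whereas this sketch compares numbers. `Dims` and `isChannelsLastContiguous` are names assumed for the example.

```cpp
#include <array>
#include <cstdint>

// Sizes and strides of a 4-D tensor in logical NCHW order.
using Dims = std::array<int64_t, 4>;

// A dense channels-last (NHWC in memory) tensor has strides
//   N: H*W*C, C: 1, H: W*C, W: C
constexpr bool isChannelsLastContiguous(const Dims& sizes, const Dims& strides) {
  return strides[1] == 1 &&
         strides[3] == sizes[1] &&
         strides[2] == sizes[3] * sizes[1] &&
         strides[0] == sizes[2] * sizes[3] * sizes[1];
}
```

For sizes `{2, 3, 4, 5}` the channels-last strides are `{60, 1, 15, 3}`, while the default contiguous strides `{60, 20, 5, 1}` do not satisfy the check.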

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75938

Reviewed By: cpuhrsch

Differential Revision: D35724091

Pulled By: ZolotukhinM

fbshipit-source-id: f79ae21749d0aad8601f0434b52df88602ff09bf
(cherry picked from commit 3712bbbe4bea57c5c1abe1eafde4b8778e13e0c4)
2022-04-22 06:42:39 -07:00
Raghavan Raman
c2d5f6a5a4 [nnc] Update bounds overlap analysis to identify non-overlaps even with symbolic bounds
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74658

Approved by: https://github.com/ZolotukhinM
2022-04-14 20:24:03 +00:00
Raghavan Raman
d8ad1a579f [nnc] Fuse loops that have variable bounds
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74346

Approved by: https://github.com/ZolotukhinM
2022-04-14 20:24:03 +00:00
John Clow
f281d83d77 Moving Remove Tensor Type Specializations to after custom passes
This is to allow for Intel folks to use type information in their custom passes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71748

Approved by: https://github.com/eellison
2022-04-11 22:12:01 +00:00
Wang, Eikan
252e1ccce6 Enable TE fuser to support user defined operator (#73073)
Summary:
PyTorch supports registering a custom operator via `TORCH_LIBRARY_FRAGMENT` / `TORCH_LIBRARY_IMPL`, and a custom operator can be inserted through `torch::jit::tensorexpr::getNNCLoweringRegistry`. However, the TE fuser pass's conditional check does not support custom operators. The `isSupported` check of `tensorexpr_fuser` tests whether the `Node` is in `get_tensorexpr_elementwise_set()`, `supported_non_eltwise_set()`, `supported_misc_set`, or `supported_reduction_set`. If a custom operator needs to be added to the TE fusion group, this check blocks it.

Taking the RN50 as an example, we can speed up the model by fusing the convolution and consecutive element-wise operator into a custom operator. The framework overhead becomes non-negligible when the computation becomes more efficient, especially for the latency mode and the tiny models. If the TE fuser allows adding the custom operator to the fusion group, then the entire RN50 model could be fused by TE as a single operator/function consisting of "ExternalCalls" and TE-IR.  This could significantly reduce framework overhead, which in turn improves RN50 E2E performance. The same goes for other models.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73073

Reviewed By: pbelevich

Differential Revision: D35453165

Pulled By: ZolotukhinM

fbshipit-source-id: a764cf340b0b1e05fe230649cbe44f5786bdd37d
(cherry picked from commit ee95aa4d36714540fbb216a338799e6a6bb966d5)
2022-04-07 04:36:39 +00:00
Nikita Shulga
81d765ef1f Fix sign-compare violations in cpp tests
Prerequisite change for enabling `-Werror=sign-compare` across PyTorch repo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75080

Approved by: https://github.com/atalman
2022-04-04 23:05:31 +00:00
Nikita Shulga
43313cbde3 Revert D34647822: [tensorexpr] Add support for aten::stack
Test Plan: revert-hammer

Differential Revision:
D34647822 (954c7e2a77)

Original commit changeset: 3b863c71886c

Original Phabricator Diff: D34647822 (954c7e2a77)

fbshipit-source-id: e9ce06c9c8d7caf0fbb2565f0d99035bad685793
(cherry picked from commit b2ff355e9dbaa4e940fb221254223984c3c8a215)
2022-03-31 04:25:43 +00:00
Nikita Shulga
320e5a8268 Revert D34808051: [tensorexpr] Enabled aten::stack in the fuser pass with static shapes
Test Plan: revert-hammer

Differential Revision:
D34808051

Original commit changeset: 213e2ffdf87f

Original Phabricator Diff: D34808051

fbshipit-source-id: b618daeb346f784e8ab9525040edcb4a30a39613
(cherry picked from commit e47b973cba5c95e9410f8aecdfd5619de6d4be7c)
2022-03-31 04:25:43 +00:00
Hui Guo
90c3699cc8 [tensorexpr] Enabled aten::stack in the fuser pass with static shapes (#74077)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74077

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D34808051

Pulled By: huiguoo

fbshipit-source-id: 213e2ffdf87fb1a74104037cea7ef25e4bfd4307
(cherry picked from commit ad9e84842e5b47eda845827d325b08ba361a8286)
2022-03-31 04:25:43 +00:00
Hui Guo
954c7e2a77 [tensorexpr] Add support for aten::stack (#73801)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73801

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D34647822

Pulled By: huiguoo

fbshipit-source-id: 3b863c71886c7c6616b16f5d3313079714c8b82a
(cherry picked from commit c71778cf6a5724d26b671bf3ee0478add24990e8)
2022-03-30 21:25:15 +00:00
David Dang
abfaef0aec [Quant][core] Merged conv packed params and linear packed params (#73486)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73486

conv and linear packed params were previously defined in <ATen/native/quantized/cpu/conv_packed_params.h> and <ATen/native/quantized/cpu/packed_params.h>. These two files have been merged into one, which has been relocated to <ATen/native/quantized/cpu/packed_params.h>.

Differential Revision:
D34513286

Test Plan: Imported from OSS

Reviewed By: dagitses

Pulled By: dzdang

fbshipit-source-id: 813845af7ea9449e316ab7822efe7460f0bd0d88
(cherry picked from commit 2f627561f27f81977ff73b8863c5e9e719dc4c60)
2022-03-11 15:18:45 +00:00
Ivan Kobzarev
519e226b66 [tensorexp] ExternalCall2 without memcpy (#72225)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72225

Test Plan: Imported from OSS

Reviewed By: dagitses

Differential Revision: D33960933

Pulled By: IvanKobzarev

fbshipit-source-id: fc73a3de9e5150919e3806516065b4a6c8316000
(cherry picked from commit f637842c341e0ba94906a0c8a1efc81691dc512c)
2022-03-09 21:19:26 +00:00
Elias Ellison
52ccbf4494 Lock thread/block computation (#73800)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73800

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D34647281

Pulled By: eellison

fbshipit-source-id: adbdaf24191c4c1b85e0b62564388f2481002ed2
(cherry picked from commit 6cf38015cc14691518b1b5cb7d636e80eb3684fc)
2022-03-04 22:32:08 +00:00
Hui Guo
5eb5b61221 [tensorexpre] Add typecast when src and dest buf types are different in PlacementAllocate (#71934)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71934

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D33826700

Pulled By: huiguoo

fbshipit-source-id: 9fb29a43ab5983586a6bfde3a34d7e2f2120ab0a
(cherry picked from commit 2bee018691ec888cb1ec761528951f5745d7ef79)
2022-02-23 19:36:50 +00:00
Raghavan Raman
0d66748948 [jit] Add tests for JIT with dynamic shape fusion (#72201)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72201

Reviewed By: mikaylagawarecki

Differential Revision: D34067211

Pulled By: navahgar

fbshipit-source-id: 2c13bb43c76c7fed720ad37892d2177c3dc0b924
(cherry picked from commit eed2d8cea4)
2022-02-18 23:29:08 +00:00
Raghavan Raman
6d33852685 [NNC] TensorExprKernel state should not be modified on calls to run methods (#73028)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73028

A typical use case for `TensorExprKernel` is to create the kernel once and call it multiple times, possibly in parallel. For the parallel calls to work, we need to ensure that the run() method calls do not change any state in `TensorExprKernel`.

Before this change, the `run()` method was modifying the sizes and strides vectors when dynamic shapes were present. This manifested as a data race when running a model with Static Runtime.
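
The general pattern can be sketched as below: keep per-call data in locals and mark the run method `const`. `Kernel` is a hypothetical stand-in, not the real `TensorExprKernel`, and the stride computation is only a placeholder for the per-call work.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a compiled kernel.
class Kernel {
 public:
  explicit Kernel(int64_t innermostStride)
      : innermostStride_(innermostStride) {}

  // Computes contiguous strides for the given runtime sizes.
  // The method is `const` and keeps its scratch data in a local vector, so
  // concurrent calls never write to shared member state.
  std::vector<int64_t> run(const std::vector<int64_t>& sizes) const {
    std::vector<int64_t> strides(sizes.size(), innermostStride_);
    for (std::size_t i = sizes.size(); i > 1; --i) {
      strides[i - 2] = strides[i - 1] * sizes[i - 1];
    }
    return strides;
  }

 private:
  const int64_t innermostStride_;  // read-only after construction
};
```

Had `strides` been a member vector overwritten on every call, two threads calling `run()` at once would race on it, which is the kind of bug described above.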
ghstack-source-id: 149398820

Test Plan:
```
buck build mode/dev-asan //caffe2/test/cpp/tensorexpr:tensorexpr
./buck-out/dev/gen/caffe2/test/cpp/tensorexpr/tensorexpr --gtest_filter="DynamicShapes.MultiThreadedExecution"
```

Reviewed By: eellison

Differential Revision: D34287960

fbshipit-source-id: d311f3c5a66c5d5de4e1deaeaa01816b53e9906e
(cherry picked from commit 161568bfae)
2022-02-17 23:14:27 +00:00
Ivan Kobzarev
67cd98fad4 [tensorexpr] Fix isNLC segfault (#72786)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72786

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D34204523

Pulled By: IvanKobzarev

fbshipit-source-id: 9a0f2ce0a1921e261932029c3ebd842330fdf528
(cherry picked from commit b8326064f6)
2022-02-15 20:31:56 +00:00
Mikhail Zolotukhin
1855b14922 [TensorExpr] Delete DimArg class. (#72390)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72390

This class didn't add much value and only caused more boilerplate code.
This change removes the class and updates all the use cases with
uses of `ExprHandle`.

A side effect of this change is different names in loop variables, which
caused massive mechanical changes in our tests.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D34030296

Pulled By: ZolotukhinM

fbshipit-source-id: 2ba4e313506a43ab129a10d99e72b638b7d40108
(cherry picked from commit c2ec46a058)
2022-02-11 01:21:59 +00:00
Mikhail Zolotukhin
9123e9b3b5 [TensorExpr] Switch from ExprPtr to ExprHandle in Compute impl. (#72389)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72389

This is an NFC change that just prepares the code for the upcoming
deletion of the `DimArg` class. This change makes the `Compute` and
`Reduce` APIs use `ExprHandle` everywhere.

There should be no observable behavior change from this PR.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D34030295

Pulled By: ZolotukhinM

fbshipit-source-id: 3fd035b6a6bd0a07ccfa92e118819478ae85412a
(cherry picked from commit 1b0a4b6fac)
2022-02-11 01:21:59 +00:00
Raghavan Raman
765908708b [nnc] Adding a test with dynamic shapes from a model (#72198)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72198

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D33951741

Pulled By: navahgar

fbshipit-source-id: 596b193eba14c8e1affa9fa13070079f05d64cac
(cherry picked from commit ddbb78ff80)
2022-02-08 02:00:46 +00:00
Raghavan Raman
ff71429906 [nnc] Add stride args while running with allocated outputs (#72223)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/72223

ghstack-source-id: 148494871

Test Plan:
```
buck test mode/opt //caffe2/test/cpp/tensorexpr:tensorexpr -- --exact 'caffe2/test/cpp/tensorexpr:tensorexpr - DynamicShapes.GraphWithSymbolicStrides'
```

Reviewed By: eellison

Differential Revision: D33960592

fbshipit-source-id: 6334978d5e3713889b4ad12bcd8ed8c69df39d58
(cherry picked from commit 95cc102bc2)
2022-02-07 19:24:56 +00:00
Raghavan Raman
38f696c0cd [nnc] Add an API to unroll loops by a given factor (#72071)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/72071

Reviewed By: ngimel

Differential Revision: D33946250

Pulled By: navahgar

fbshipit-source-id: 3f3f92054174620025a9d71154d006f1738953e2
(cherry picked from commit d8b53598e9)
2022-02-03 18:41:21 +00:00
Ivan Kobzarev
34e4418dfa [nnc] tensorexpr for quantized/aten::upsample_nearest2d (#71236)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/71236

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D33553305

Pulled By: IvanKobzarev

fbshipit-source-id: 2442afee6d23123bb3a4bc52d3555393b0254106
(cherry picked from commit 90a263fc08)
2022-02-01 19:48:53 +00:00
Mikhail Zolotukhin
1dbcde2ade [TensorExpr] Support scalar intermediate and output values. (#71186)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71186

So far we've only supported scalar inputs, but couldn't handle scalar outputs
or intermediates. This PR adds it.

Scalar outputs are returned as 0-dim tensors. If the kernel is invoked on a
stack of IValues, we correctly convert the results to scalar IValues when
needed. If the kernel is invoked with a vector of void* pointers, everything
works out of the box without any conversions.

Lowerings for scalar operators are a bit tricky. Usual lowerings return a pair
<Buf, Stmt> (aka Tensor), but for scalar operators we also want to have the
corresponding Var that the lowering function supposedly creates (in theory we
could just use Loads and Stores, but I'm worried it can affect performance as
there is no guarantee this will be optimized by LLVM). So, what we do here to
work around this is we return a fake buf + stmt that sets the corresponding
var. Then outside of the lowering we create a real buffer and generate a Store
to it with the value from the variable we passed as the base handle of the fake
buf. This real buffer is then treated as usual by the rest of the system and we
can use it if we need to return this scalar value as a kernel output. If we do
not need to return it, then the Store will be deleted by the DCE pass.

Differential Revision:
D33539324

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: ab4524b9820ce204f106effcf6232ed33d4ee223
(cherry picked from commit 7faa0939f0)
2022-01-26 06:32:51 +00:00
Raghavan Raman
70c9146c40 [nnc] Update block and thread extents in cuda_codegen to use int64_t (#71428)
Summary:
The block and thread extent calculations in `cuda_codegen` should be using `int64_t` instead of `int`. The updated test, `test_dynamic_shapes`, fails without this change.
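
The commit does not state why `int` was insufficient. One general motivation for 64-bit extents, shown in the sketch below, is that products of dimension sizes can exceed the 32-bit range. `totalExtent` and `fitsInInt32` are hypothetical helpers, not functions from `cuda_codegen`.

```cpp
#include <cstdint>
#include <limits>

// Total number of threads for a launch, computed in 64 bits.
constexpr int64_t totalExtent(int64_t blocks, int64_t threadsPerBlock) {
  return blocks * threadsPerBlock;
}

// True when the value could be stored in a 32-bit int without loss.
constexpr bool fitsInInt32(int64_t v) {
  return v >= std::numeric_limits<int32_t>::min() &&
         v <= std::numeric_limits<int32_t>::max();
}
```

With `1 << 22` blocks of 1024 threads each, the total is 2^32, which `fitsInInt32` rejects even though each factor fits comfortably in an `int`.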

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71428

Reviewed By: samdow

Differential Revision: D33640374

Pulled By: navahgar

fbshipit-source-id: 64c340ad2a9a1fa1fe066cf1c5dfc3b546b7be6d
(cherry picked from commit 6ea546ce11)
2022-01-19 23:21:24 +00:00
CodemodService FBSourceClangFormatLinterBot
88012c7daf [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D33577744

fbshipit-source-id: 7ecc8367998ee1dffde54c2f4dd3cfafe19a53c9
2022-01-14 06:10:57 -08:00
Mike Ruberry
3a0c680a14 Jiterates exp2, erfc, erfinv and entr and refactors code_template.h to ATen (#71295)
Summary:
Per title.

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71295

Reviewed By: ngimel

Differential Revision: D33575885

Pulled By: mruberry

fbshipit-source-id: bc841b46fc0b5458a26a4d4465b18a7a54cd5a5b
2022-01-13 23:58:51 -08:00