8 Commits

Author SHA1 Message Date
Xue Li
2f099c7555 Revert D30652629: use irange for loops
Test Plan: revert-hammer

Differential Revision:
D30652629 (687c2267d4)

Original commit changeset: 0ae6c4bbbb55

fbshipit-source-id: 5c4f067b584a021c8c9656454d1ee60999600fb3
2021-10-15 15:23:10 -07:00
Richard Barnes
687c2267d4 use irange for loops (#66234)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66234

Modified loops in files under fbsource/fbcode/caffe2/ from the format

`for(TYPE var=x0;var<x_max;var++)`

to the format

`for(const auto var: irange(x_max))`

This was achieved by running r-barnes's loop upgrader script (D28874212), modified to exclude all files under /torch/jit, followed by a number of reversions and unused-variable warning suppressions added by hand.
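
As a rough illustration of the rewrite described above, the sketch below shows the same loop in both styles. In PyTorch the helper is `c10::irange` from `c10/util/irange.h`; the `IntRange` class and the free function `irange` here are hypothetical stand-ins so that the snippet compiles without PyTorch, and `sum_indexed` / `sum_irange` are illustrative names only.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for c10::irange so the snippet builds without PyTorch.
// It models the half-open integer range [0, stop).
template <typename T>
class IntRange {
 public:
  class Iterator {
   public:
    explicit Iterator(T value) : value_(value) {}
    T operator*() const { return value_; }
    Iterator& operator++() {
      ++value_;
      return *this;
    }
    bool operator!=(const Iterator& other) const {
      return value_ != other.value_;
    }

   private:
    T value_;
  };

  explicit IntRange(T stop) : stop_(stop) {}
  Iterator begin() const { return Iterator(T{0}); }
  Iterator end() const { return Iterator(stop_); }

 private:
  T stop_;
};

template <typename T>
IntRange<T> irange(T stop) {
  return IntRange<T>(stop);
}

// Old style: explicit index variable, bound check, and increment.
long sum_indexed(const std::vector<long>& v) {
  long total = 0;
  for (std::size_t i = 0; i < v.size(); i++) {
    total += v[i];
  }
  return total;
}

// New style: range-based for over an integer range; the index is const
// inside the loop body and its type is deduced from the bound.
long sum_irange(const std::vector<long>& v) {
  long total = 0;
  for (const auto i : irange(v.size())) {
    total += v[i];
  }
  return total;
}
```

Because the index is declared `const auto`, a loop whose body never reads it can trigger an unused-variable warning, which is presumably why some suppressions had to be added by hand.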

bypass_size_limit
allow-large-files

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D30652629

fbshipit-source-id: 0ae6c4bbbb554bad42e372792a6430e1acf15e3e
2021-10-15 13:50:33 -07:00
Mikhail Zolotukhin
f23f21dafe [TensorExpr] Remove 'Placeholder' class. (#64887)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64887

BufHandle has exactly the same functionality and should be used instead.

Differential Revision:
D30889483

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: 365fe8e396731b88920535a3de96bd3301aaa3f3
2021-09-14 00:22:44 -07:00
Mikhail Zolotukhin
f0d274294d [TensorExpr] Nuke KernelArena and KernelScope. (#63587)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63587

Now that there are no classes using KernelArena for memory management, we
can remove it.

Differential Revision:
D30429115

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: 375f6f9294d27790645eeb7cb5a8e87047a57544
2021-08-24 00:32:16 -07:00
Mikhail Zolotukhin
62d02f2b57 [TensorExpr] Make 'Tensor' a value type. (#63586)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63586

This is another commit in the transition away from KernelArena memory management.
Tensor is essentially just a pair of <BufPtr, StmtPtr> and we don't need
to dynamically allocate it at all - it's cheap to pass it by value, and
that's what we're switching to in this commit.

After this change nothing uses KernelScope/KernelArena and they can be
safely removed.
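
To make the "pair of <BufPtr, StmtPtr>" idea concrete, here is a minimal sketch of a value-type tensor. It is not the NNC implementation: `Buf` and `Stmt` are hypothetical, heavily simplified placeholders, and the sketch assumes the handle aliases are `std::shared_ptr`, in line with the planned move to shared pointers mentioned elsewhere in this history.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical, heavily simplified IR nodes (the real classes carry far more).
struct Buf {
  std::string name;
};
struct Stmt {
  std::string text;
};

// Assumption: handles are shared pointers, so the IR nodes stay alive for as
// long as any handle refers to them.
using BufPtr = std::shared_ptr<Buf>;
using StmtPtr = std::shared_ptr<Stmt>;

// A value type: just two handles, cheap to copy and to pass by value.
class Tensor {
 public:
  Tensor(BufPtr buf, StmtPtr stmt)
      : buf_(std::move(buf)), stmt_(std::move(stmt)) {}
  const BufPtr& buf() const { return buf_; }
  const StmtPtr& stmt() const { return stmt_; }

 private:
  BufPtr buf_;
  StmtPtr stmt_;
};

// Illustrative helper: builds a tensor with freshly allocated IR nodes.
Tensor make_example() {
  return Tensor(
      std::make_shared<Buf>(Buf{"out"}),
      std::make_shared<Stmt>(Stmt{"out[i] = a[i] + b[i]"}));
}

// Takes both tensors by value: only the handles are copied, not the IR.
bool shares_ir(Tensor a, Tensor b) {
  return a.buf() == b.buf() && a.stmt() == b.stmt();
}
```

Copying such a `Tensor` duplicates two handles and leaves the underlying nodes shared, so no arena or explicit scope object is needed to manage the tensor itself.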

Differential Revision:
D30429114

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: f90b859cfe863692b7beffbe9bd0e4143df1e819
2021-08-24 00:32:13 -07:00
Mikhail Zolotukhin
dd96c26066 [TensorExpr] More NFC changes like Expr* -> ExprPtr. (#63778)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63778

This is preparation for a switch from raw pointers to shared pointers
as the memory model for TE expressions and statements.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30487425

Pulled By: ZolotukhinM

fbshipit-source-id: 9cbe817b7d4e5fc2f150b29bb9b3bf578868f20c
2021-08-24 00:30:49 -07:00
Mikhail Zolotukhin
1263448cb2 [TensorExpr] Remove mask field from Load and Store classes. (#55825)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55825

The mask has never been used (in vectorization we generate an explicit
`IfThenElse` construct when we need to mask out some elements). The PR
removes it and cleans up all its traces from tests.

Differential Revision: D27717776

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: 41d1feeea4322da75b3999d661801c2a7f82b9db
2021-04-13 12:08:51 -07:00
Raghavan Raman
8af648354f [nnc] Benchmarks for concat (#52592)
Summary:
This PR adds a C++ benchmark for "concat" in three versions: 1) aten::cat, 2) an NNC implementation with if-then-else, and 3) an NNC implementation using multiple loops. It also adds a Python benchmark for "concat" that can be invoked with or without CPU fusion.

Here are the results of these benchmarks on an `Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz` machine with `OMP_NUM_THREADS=1`:

```
-------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                   Time           CPU Iterations UserCounters...
-------------------------------------------------------------------------------------------------------------------------
Concat2D2Input/ATen/1/160/1/14/1                                         1211 ns       1211 ns     567896 GB/s=1.14953G/s
Concat2D2Input/ATen/1/580/1/174/1                                        1296 ns       1296 ns     537060 GB/s=4.65362G/s
Concat2D2Input/ATen/20/160/20/14/1                                       1823 ns       1823 ns     382052 GB/s=15.2677G/s
Concat2D2Input/ATen/20/580/20/174/1                                      3347 ns       3347 ns     210036 GB/s=36.0432G/s
Concat2D2Input/ATen/8/512/8/512/1                                        2093 ns       2093 ns     324760 GB/s=31.3061G/s
Concat2D2Input/NNC/1/160/1/14/1                                           694 ns        694 ns    1002902 GB/s=2.00692G/s
Concat2D2Input/NNC/1/580/1/174/1                                          852 ns        852 ns     803002 GB/s=7.08127G/s
Concat2D2Input/NNC/20/160/20/14/1                                        1639 ns       1639 ns     419683 GB/s=16.9828G/s
Concat2D2Input/NNC/20/580/20/174/1                                       5956 ns       5956 ns     117833 GB/s=20.2548G/s
Concat2D2Input/NNC/8/512/8/512/1                                         3136 ns       3136 ns     224122 GB/s=20.8958G/s
Concat2D2Input/NNCLoop/1/160/1/14/1                                       581 ns        581 ns    1209873 GB/s=2.39737G/s
Concat2D2Input/NNCLoop/1/580/1/174/1                                      614 ns        614 ns    1132332 GB/s=9.82955G/s
Concat2D2Input/NNCLoop/20/160/20/14/1                                    1091 ns       1091 ns     622952 GB/s=25.5247G/s
Concat2D2Input/NNCLoop/20/580/20/174/1                                   2399 ns       2399 ns     288376 GB/s=50.289G/s
Concat2D2Input/NNCLoop/8/512/8/512/1                                     1500 ns       1500 ns     478360 GB/s=43.6968G/s
Concat2D3Input/ATen/8/512/8/512/8/512/1                                  2584 ns       2584 ns     266394 GB/s=38.0397G/s
Concat2D3Input/NNC/8/512/8/512/8/512/1                                   5056 ns       5056 ns     139768 GB/s=19.4416G/s
Concat2D3Input/NNCLoop/8/512/8/512/8/512/1                               1917 ns       1917 ns     369626 GB/s=51.2758G/s
Concat2D7Input/ATen/8/128/8/256/8/384/8/512/8/512/8/512/8/512/1          3888 ns       3888 ns     178124 GB/s=46.3571G/s
Concat2D7Input/NNC/8/128/8/256/8/384/8/512/8/512/8/512/8/512/1          24639 ns      24638 ns      28336 GB/s=7.31481G/s
Concat2D7Input/NNCLoop/8/128/8/256/8/384/8/512/8/512/8/512/8/512/1       3093 ns       3093 ns     226326 GB/s=58.265G/s
```
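
The two NNC variants measured above differ mainly in loop structure. The plain C++ sketch below illustrates that difference for two row-major 2D inputs concatenated along the column dimension. It is not the benchmark source or NNC-generated code, and the function names and signatures are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Variant 1 ("if-then-else"): a single loop nest over the output, with a
// per-element branch that picks which input to read from.
std::vector<float> concat_if_then_else(
    const std::vector<float>& a, std::size_t a_cols,
    const std::vector<float>& b, std::size_t b_cols,
    std::size_t rows) {
  const std::size_t out_cols = a_cols + b_cols;
  std::vector<float> out(rows * out_cols);
  for (std::size_t i = 0; i < rows; ++i) {
    for (std::size_t j = 0; j < out_cols; ++j) {
      out[i * out_cols + j] =
          (j < a_cols) ? a[i * a_cols + j] : b[i * b_cols + (j - a_cols)];
    }
  }
  return out;
}

// Variant 2 ("multiple loops"): one branch-free loop nest per input, each
// writing its own column slice of the output.
std::vector<float> concat_multi_loop(
    const std::vector<float>& a, std::size_t a_cols,
    const std::vector<float>& b, std::size_t b_cols,
    std::size_t rows) {
  const std::size_t out_cols = a_cols + b_cols;
  std::vector<float> out(rows * out_cols);
  for (std::size_t i = 0; i < rows; ++i) {
    for (std::size_t j = 0; j < a_cols; ++j) {
      out[i * out_cols + j] = a[i * a_cols + j];
    }
  }
  for (std::size_t i = 0; i < rows; ++i) {
    for (std::size_t j = 0; j < b_cols; ++j) {
      out[i * out_cols + a_cols + j] = b[i * b_cols + j];
    }
  }
  return out;
}
```

Both functions produce the same output. The second keeps the inner loops free of branches, which is one plausible reason the NNCLoop rows are faster than the NNC rows in the table.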

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52592

Reviewed By: bertmaher

Differential Revision: D26596701

Pulled By: navahgar

fbshipit-source-id: 650fa88febf4423ea49f5a1d3d734edc2294d257
2021-02-24 06:09:32 -08:00