Commit Graph

1320 Commits

Author SHA1 Message Date
Mikhail Zolotukhin
556dfcb0db [TensorExpr] Re-enable "LoopNest.VectorizeUse" test. (#56094)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56094

Now FunctionCalls are merged with Loads and vectorization for
intermediate values automatically started to work.

Fixes #53553.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D27781519

Pulled By: ZolotukhinM

fbshipit-source-id: 1ed68ca2399e9bd4598639bd6dd8f369365f0ef0
2021-04-14 21:39:03 -07:00
Natalia Gimelshein
ad17fadbfc Revert D27652485: [nnc] Enable CPU fusion only when num_threads == 1
Test Plan: revert-hammer

Differential Revision:
D27652485 (e7e164f9e6)

Original commit changeset: 182580cf758d

fbshipit-source-id: e3c95b06d1eef668095f3cf461485395179d94af
2021-04-14 20:23:15 -07:00
Kurt Mohler
3fe4718d16 Add padding_idx argument to EmbeddingBag (#49237)
Summary:
This PR adds a `padding_idx` parameter to `nn.EmbeddingBag` and `nn.functional.embedding_bag`. As with `nn.Embedding`'s `padding_idx` argument, if an embedding's index is equal to `padding_idx` it is ignored, so it is not included in the reduction.

This PR does not add support for `padding_idx` for quantized or ONNX `EmbeddingBag` for opset10/11 (opset9 is supported). In these cases, an error is thrown if `padding_idx` is provided.

Fixes https://github.com/pytorch/pytorch/issues/3194

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49237

Reviewed By: walterddr, VitalyFedyunin

Differential Revision: D26948258

Pulled By: jbschlosser

fbshipit-source-id: 3ca672f7e768941f3261ab405fc7597c97ce3dfc
2021-04-14 09:38:01 -07:00
Bert Maher
e7e164f9e6 [nnc] Enable CPU fusion only when num_threads == 1 (#55621)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55621

Fuser support for thread-level parallelism is a work in progress, so
only fuse when the program is running single-threaded.
ghstack-source-id: 126069259

Test Plan: observe fusion groups formed when torch.get_num_threads==1 vs not

Reviewed By: ZolotukhinM

Differential Revision: D27652485

fbshipit-source-id: 182580cf758d99dd499cc4591eb9d080884aa7ef
2021-04-14 09:16:54 -07:00
Mikhail Zolotukhin
7ab654afd7 [TensorExpr] Rename Tensor::call to Tensor::load to be consistent with Buf and Placeholder. (#55826)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55826

It's a mechanical change.

Differential Revision: D27717777

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: fbc1bb99602250c706cf2c8c2684119c323e4d51
2021-04-13 12:08:53 -07:00
Mikhail Zolotukhin
1263448cb2 [TensorExpr] Remove mask field from Load and Store classes. (#55825)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55825

The mask has never been used (in vectorization we generate an explicit
`IfThenElse` construct when we need to mask out some elements). The PR
removes it and cleans up all its traces from tests.

Differential Revision: D27717776

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: 41d1feeea4322da75b3999d661801c2a7f82b9db
2021-04-13 12:08:51 -07:00
Mikhail Zolotukhin
b01a15d3d3 [TensorExpr] Redesign Rfactor loopnest transformation. (#55324)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55324

With this change `rfactor` only affects the passed loop and its body
never touching anything outside (that was a rootcause of a bug with the
previous implementation). Also, we don't have an `insertion_point`
parameter anymore - its meaning was vague, and the effect of it
should've been achievable with other transformations anyway.

The new `rfactor` semantics is as follows:

```
Requirements:
 * S is the reduction store
 * S is the only statement in the innermost loop
 * There is at least two reduction arguments in S
 * OUTER_REDUCTION_FOR loop corresponds to the outermost reduction variable
 used in the store and all other reduction variables are index variables of
 children loops of OUTER_REDUCTION_FOR
 * OUTER_REDUCTION_FOR is a perfect loop nest, i.e. it has only loops
 corresponding to the other reduction variables and the store, nested into
 each other

What it does:
  * Introduce a new buffer with an extra dimension of a size equal to the
  span of the loop OUTER_REDUCTION_FOR (the new buffer is returned via
  RFAC_BUF_PTR)
  * Insert an initialization store for the new buffer in
  OUTER_REDUCTION_FOR before its nested loop
  * Replace the reduction store to the original buffer with the reduction
  store to the temp buffer, removing the index var of OUTER_REDUCTION_FOR
  from reduction arguments
  * Insert a final reduction store over the extra dimension of the new
  buffer to the original buffer
  * Returns TRUE if the transformation succeeded and FALSE otherwise

Example:
Original IR:
S1: for i        # normal axis
S2:   X[i] = 0
S3:   for j      # reduction axis
S4:     for k    # reduction axis
S5:       X[i] = ReduceOp(X[i] + Y[i,j,k], reduce_axis={j,k})

After RFACTOR(S5, S3)
S1: for i               # normal axis
S2:   X[i] = 0
S3:   for j             # reduction axis for X, normal axis for X_rfac
        X_rfac[i,j] = 0
S4:     for k           # reduction axis
          X_rfac[i,j] = ReduceOp(X_rfac[i,j] + Y[i,j,k], reduce_axis={k})
        X[i] = ReduceOp(X[i] + X_rfac[i,j], reduce_axis={j})
```

Differential Revision: D27694960

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: 076fa6a1df2c23f5948302aa6b43e82cb222901c
2021-04-13 12:08:48 -07:00
Raghavan Raman
d805908c34 [NNC] API to reorder multiple loops (#55568)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/52690

This PR adds the following APIs:

```
static bool areLoopsPerfectlyNested(const std::vector<For*>& loops);

static std::vector<For*> reorder(
      const std::vector<For*>& loops,
      const std::vector<size_t>& permutation);
```

The first API checks if the given list of loops are perfectly nested. The second API reorders the given list of loops according to the permutation specified.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55568

Reviewed By: albanD

Differential Revision: D27689734

Pulled By: navahgar

fbshipit-source-id: dc1bffdbee068c3f401188035772b41847cbc7c6
2021-04-12 18:12:24 -07:00
Yukio Siraichi
93bf0ae6fc Remove legacy constructor calls from pytorch codebase. (#54142)
Summary:
Follow up from https://github.com/pytorch/pytorch/issues/53889
Related to https://github.com/pytorch/pytorch/issues/47112

Removing every occurrence of the legacy constructor call present in PyTorch at:
- _docs_
- _benchmarks_
- _test_
- _caffe2_
- _CONTRIBUTING.md_

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54142

Reviewed By: ngimel

Differential Revision: D27699450

Pulled By: mruberry

fbshipit-source-id: 530aa3f5746cc8bc1407d5d51b2bbd8075e30546
2021-04-11 15:45:17 -07:00
Ailing Zhang
6842da6251 [WIP]Relax some limitations of InferenceMode. (#54403)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54403

A few important points about InferenceMode behavior:
1. All tensors created in InferenceMode are inference tensors except for view ops.
   - view ops produce output has the same is_inference_tensor property as their input.
     Namely view of normal tensor inside InferenceMode produce a normal tensor, which is
     exactly the same as creating a view inside NoGradMode. And view of
     inference tensor outside InferenceMode produce inference tensor as output.
2. All ops are allowed inside InferenceMode, faster than normal mode.
3. Inference tensor cannot be saved for backward.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D27316483

Pulled By: ailzhang

fbshipit-source-id: e03248a66d42e2d43cfe7ccb61e49cc4afb2923b
2021-04-09 14:40:37 -07:00
Joel Schlosser
defc649eca Update to short forms of splitWithTail / splitWithMask (#55542)
Summary:
Switched to short forms of `splitWithTail` / `splitWithMask` for all tests in `test/cpp/tensorexpr/test_*.cpp` (except test_loopnest.cpp)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55542

Reviewed By: mrshenli

Differential Revision: D27632033

Pulled By: jbschlosser

fbshipit-source-id: dc2ba134f99bff8951ae61e564cd1daea92c41df
2021-04-09 10:15:20 -07:00
Bert Maher
90f848572c NNC depthwise conv2d implementation (#54920)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54920

Add a depthwise convolution implementation and reasonably good
schedules for 3x3 stride=1,2.
ghstack-source-id: 126076113

Test Plan: new tensorexpr test: Conv.DepthwiseConv2D

Reviewed By: ZolotukhinM

Differential Revision: D27413745

fbshipit-source-id: 833da6072b655fbe2b679704e9d56a08e1bf7e7e
2021-04-08 21:56:53 -07:00
Jeffrey Wan
3f9492c8b3 [Hackathon] Modernize API used in NNC C++ tests (1/3) (#55512)
Summary:
Partially fixes https://github.com/pytorch/pytorch/issues/55203

Fixes issues (1) and (2) in the following tests:
tests in test/cpp/tensorexpr/test_loopnest.cpp from the beginning to LoopNestReorderLongStringFull (including)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55512

Reviewed By: mrshenli

Differential Revision: D27630679

Pulled By: soulitzer

fbshipit-source-id: b581aaea4f5f54b3285f0348aa76e99779418f80
2021-04-08 08:34:25 -07:00
Brian Hirsh
dd2bccafc5 nnc hackathon - use new APIs in tests (#55497)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55497

Migrating some of the NNC API's used in testing, from this issue: https://github.com/pytorch/pytorch/issues/55203

I covered the second half of `test_loopnest.cpp`, and migrated (1) and (2) in the above issue: `LoopNest::getLoopStmtsFor`, `splitWithTail`, and `splitWithMask`

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D27628625

Pulled By: bdhirsh

fbshipit-source-id: ec15efba45fae0bbb442ac3577fb9ca2f8023c2d
2021-04-07 13:03:25 -07:00
Martin Yuan
3551bd31be [PyTorch] Lite interpreter with a backend delegate (#54462)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54462

Unclean files during sync - Sat Mar 20 04:00:02 PDT 2021

Unclean files during sync - Sun Mar 21 04:00:01 PDT 2021
ghstack-source-id: 124585992

Test Plan:
```
buck run xplat/caffe2/fb/test/delegate:interpreter_test -- --model_file_path=/path/to/mobile_model.ptl
```

Reviewed By: raziel

Differential Revision: D27232309

fbshipit-source-id: 8504a3185339d73bfa6e924485c4745acf269cec
2021-04-06 00:55:26 -07:00
Nikitha Malgi
197f9f0826 Merge CUDA Streams and Events (#53902)
Summary:
-----------
- Updates current_stream and default stream API's to take `optional[device]` argument
- Adds parsing logic to replace `torch.cuda.Stream` and `torch.cuda.Event` -> `torch.classes.cuda.Stream` and `torch.classes.cuda.Event` for JIT
- Merges StreamContext manager for both Eager and JIT.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53902

Test Plan:
------
Run JIT tests:
python test/test_jit.py -v TestCUDA

Run eager tests:
python test/test_cuda.py -v TestCuda

Reviewed By: glaringlee

Differential Revision: D27494627

Pulled By: nikithamalgifb

fbshipit-source-id: b30b0570e38a33fb335c83762eb06ffd46a44b5c
2021-04-05 08:19:55 -07:00
Louis Feng
159fdde9ae Support needsOutputs for RecordFunction and ObserverUtil improvements (#55012)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55012

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54442

Added needsOutputs support to RecordFunction, improved ObserverUtil functions to handle list data. Minor refactor names to be consistent.

To get output data from kernel calls, we need to temporarily capture them before passing them to the record function. Then the results are released to function return. We handle two cases, for unboxed and boxed kernels. The boxed version is fairly simple since all outputs are stored in the stack object. For unboxed kernel calls, we added a `ReturnValue` utility class to properly handle the different return values of unboxed kernels.

For optimization, this intermediate capture is only enabled for observers that request `needsOutputs(true)` and should not affect other observers or when the observer is not enabled.

Test Plan:
```
=> buck build //caffe2/test/cpp/jit: --show-output
=> buck-out/gen/caffe2/test/cpp/jit/jit --gtest_filter=RecordFunctionTest*
CUDA not available. Disabling CUDA and MultiCUDA tests
Note: Google Test filter = RecordFunctionTest*-*_CUDA:*_MultiCUDA
[==========] Running 7 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 7 tests from RecordFunctionTest
[ RUN      ] RecordFunctionTest.TracedTestInputsOutputs
[       OK ] RecordFunctionTest.TracedTestInputsOutputs (226 ms)
[ RUN      ] RecordFunctionTest.SampledCallbacks
[       OK ] RecordFunctionTest.SampledCallbacks (771 ms)
[ RUN      ] RecordFunctionTest.RecordFunctionGuard
[       OK ] RecordFunctionTest.RecordFunctionGuard (0 ms)
[ RUN      ] RecordFunctionTest.Callbacks
[       OK ] RecordFunctionTest.Callbacks (2 ms)
[ RUN      ] RecordFunctionTest.ShouldRun
[       OK ] RecordFunctionTest.ShouldRun (0 ms)
[ RUN      ] RecordFunctionTest.Basic
[       OK ] RecordFunctionTest.Basic (1 ms)
[ RUN      ] RecordFunctionTest.OperatorNameOverload
[       OK ] RecordFunctionTest.OperatorNameOverload (1 ms)
[----------] 7 tests from RecordFunctionTest (1001 ms total)

[----------] Global test environment tear-down
[==========] 7 tests from 1 test case ran. (1002 ms total)
[  PASSED  ] 7 tests.

```

Reviewed By: ilia-cher

Differential Revision: D27449877

fbshipit-source-id: 69918b729565f5899471d9db42a587f9af52238d
2021-04-02 15:16:17 -07:00
Maxim Grechkin
38a08a49ea Flip clip_grad_norm default for error_if_nonfinite to false (#55169)
Summary:
Non-backwards-compatible change introduced in https://github.com/pytorch/pytorch/pull/53843 is tripping up a lot of code. Better to set it to False initially and then potentially flip to True in the later version to give people time to adapt.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55169

Reviewed By: mruberry

Differential Revision: D27511150

Pulled By: jbschlosser

fbshipit-source-id: 1ac018557c0900b31995c29f04aea060a27bc525
2021-04-02 12:25:32 -07:00
Lucas Hosseini
09f1f14569 Transition to new tensorpipe::Pipe API. (#55193)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55193

Test Plan: CI

Reviewed By: lw

Differential Revision: D27466387

fbshipit-source-id: 07b831d699f56874dd45f37e448b8c4244ead5e3
2021-04-02 02:28:07 -07:00
Mikhail Zolotukhin
0b75f862c7 [TensorExpr] Nuke FunctionCall. (#54998)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54998

The only reason why we couldn't use Load instead of FunctionCall was
DepTracker. Now this is gone and we finally could replace FunctionCall
with Load.

Test Plan: Imported from OSS

Reviewed By: bertmaher, pbelevich

Differential Revision: D27446412

Pulled By: ZolotukhinM

fbshipit-source-id: 9183ae5541c2618abc9026b1dc4c4c9fab085d47
2021-04-01 19:47:59 -07:00
Mikhail Zolotukhin
688e350725 [TensorExpr] Nuke DepTracker and findAllNeededTensors. (#54997)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54997

DepTracker was used to automatically pull in dependent computations from
output ones. While it seems quite convenient, it's led to several
architectural issues, which are fixed in this stack.

DepTracker worked on Tensors, which is a pair of Buf and Stmt. However,
Stmt could become stale and there was no way to reliably update the
corresponding tensor. We're now using Bufs and Stmts directly and moving
away from using Tensors to avoid these problems.

Removing DepTracker allowed to unify Loads and FunctionCalls, which
essentially were duplicates of each other.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D27446414

Pulled By: ZolotukhinM

fbshipit-source-id: a2a32749d5b28beed92a601da33d126c0a2cf399
2021-04-01 19:46:26 -07:00
Lucas Hosseini
9d6a81d1a6 Avoid aggregate initialization for tensorpipe::{Cpu,Cuda}Buffer and tensorpipe::Message::Tensor. (#55136)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55136

This will ease the transition to the new API where `Buffer` does not
store a length anymore.

Test Plan: CI

Reviewed By: lw

Differential Revision: D27466385

fbshipit-source-id: 9a167f8c501455a3ab49ce75257c69d8b4869925
2021-04-01 06:55:02 -07:00
Ailing Zhang
43d4f3b8d0 Implement public API InferenceMode and its error handling (#55008)
Summary:
https://www.internalfb.com/phabricator/paste/view/P360377337Pull Request resolved: https://github.com/pytorch/pytorch/pull/53343

For easier review, here's a diff between the version before revert. https://www.internalfb.com/phabricator/paste/view/P360750919

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55008

Test Plan: Imported from OSS

Pulled By: ailzhang

Reviewed By: bhosmer

Differential Revision: D27443229

fbshipit-source-id: 01b03446a1f6373f43dd5c7170d26226b50f363c
2021-03-31 10:48:00 -07:00
Jianyu Huang
7fc03dd7c9 Back out "[pytorch][PR] Merge CUDA Streams and Events" (#54996)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54996

Original commit changeset: 45d9fee9a582

Test Plan: CI

Reviewed By: jspark1105

Differential Revision: D27444718

fbshipit-source-id: deb627230817923eaf84ade50ecb14bfbce4e779
2021-03-31 10:21:35 -07:00
Jacob Szwejbka
a0ae3e520f [Pytorch Mobile] 'fix' filter of named parameters for FL (#54633)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54633

Theres currently no information that could be used to determine what is a parameter during the loading of a mobile module. This prevents named parameters from functioning correctly. This change is a temporary hack to help out federated learning the sole user of this api currently.
ghstack-source-id: 124885201

Test Plan: todo

Reviewed By: dhruvbird

Differential Revision: D27308738

fbshipit-source-id: 0af5d1e8381ab7b7a43b20560941aa070a02e7b8
2021-03-31 09:21:35 -07:00
Qi Zhao
5b448cf21a Revert D25966661: Support needsOutputs for RecordFunction and ObserverUtil improvements
Test Plan: revert-hammer

Differential Revision:
D25966661 (0e43a73f76)

Original commit changeset: 707886e1f212

fbshipit-source-id: a4e4af29abf622c1e0aaaf7dfb019c045988b4bc
2021-03-30 15:41:12 -07:00
Louis Feng
0e43a73f76 Support needsOutputs for RecordFunction and ObserverUtil improvements (#54442)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54442

Added needsOutputs support to RecordFunction, improved ObserverUtil functions to handle list data. Minor refactor names to be consistent.

To get output data from kernel calls, we need to temporarily capture them before passing them to the record function. Then the results are released to function return. We handle two cases, for unboxed and boxed kernels. The boxed version is fairly simple since all outputs are stored in the stack object. For unboxed kernel calls, we added a `ReturnValue` utility class to properly handle the different return values of unboxed kernels.

For optimization, this intermediate capture is only enabled for observers that request `needsOutputs(true)` and should not affect other observers or when the observer is not enabled.

Test Plan:
```
=> buck build //caffe2/test/cpp/jit: --show-output
=> buck-out/gen/caffe2/test/cpp/jit/jit --gtest_filter=RecordFunctionTest*
CUDA not available. Disabling CUDA and MultiCUDA tests
Note: Google Test filter = RecordFunctionTest*-*_CUDA:*_MultiCUDA
[==========] Running 7 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 7 tests from RecordFunctionTest
[ RUN      ] RecordFunctionTest.TracedTestInputsOutputs
[       OK ] RecordFunctionTest.TracedTestInputsOutputs (226 ms)
[ RUN      ] RecordFunctionTest.SampledCallbacks
[       OK ] RecordFunctionTest.SampledCallbacks (771 ms)
[ RUN      ] RecordFunctionTest.RecordFunctionGuard
[       OK ] RecordFunctionTest.RecordFunctionGuard (0 ms)
[ RUN      ] RecordFunctionTest.Callbacks
[       OK ] RecordFunctionTest.Callbacks (2 ms)
[ RUN      ] RecordFunctionTest.ShouldRun
[       OK ] RecordFunctionTest.ShouldRun (0 ms)
[ RUN      ] RecordFunctionTest.Basic
[       OK ] RecordFunctionTest.Basic (1 ms)
[ RUN      ] RecordFunctionTest.OperatorNameOverload
[       OK ] RecordFunctionTest.OperatorNameOverload (1 ms)
[----------] 7 tests from RecordFunctionTest (1001 ms total)

[----------] Global test environment tear-down
[==========] 7 tests from 1 test case ran. (1002 ms total)
[  PASSED  ] 7 tests.

```

Reviewed By: ilia-cher

Differential Revision: D25966661

fbshipit-source-id: 707886e1f212f40ba16a1fe292ea7dd33f2646e3
2021-03-30 14:26:22 -07:00
Sam Estep
5bcbbf5373 Lint trailing newlines (#54737)
Summary:
*Context:* https://github.com/pytorch/pytorch/issues/53406 added a lint for trailing whitespace at the ends of lines. However, in order to pass FB-internal lints, that PR also had to normalize the trailing newlines in four of the files it touched. This PR adds an OSS lint to normalize trailing newlines.

The changes to the following files (made in 54847d0adb9be71be4979cead3d9d4c02160e4cd) are the only manually-written parts of this PR:

- `.github/workflows/lint.yml`
- `mypy-strict.ini`
- `tools/README.md`
- `tools/test/test_trailing_newlines.py`
- `tools/trailing_newlines.py`

I would have liked to make this just a shell one-liner like the other three similar lints, but nothing I could find quite fit the bill. Specifically, all the answers I tried from the following Stack Overflow questions were far too slow (at least a minute and a half to run on this entire repository):

- [How to detect file ends in newline?](https://stackoverflow.com/q/38746)
- [How do I find files that do not end with a newline/linefeed?](https://stackoverflow.com/q/4631068)
- [How to list all files in the Git index without newline at end of file](https://stackoverflow.com/q/27624800)
- [Linux - check if there is an empty line at the end of a file [duplicate]](https://stackoverflow.com/q/34943632)
- [git ensure newline at end of each file](https://stackoverflow.com/q/57770972)

To avoid giving false positives during the few days after this PR is merged, we should probably only merge it after https://github.com/pytorch/pytorch/issues/54967.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54737

Test Plan:
Running the shell script from the "Ensure correct trailing newlines" step in the `quick-checks` job of `.github/workflows/lint.yml` should print no output and exit in a fraction of a second with a status of 0. That was not the case prior to this PR, as shown by this failing GHA workflow run on an earlier draft of this PR:

- https://github.com/pytorch/pytorch/runs/2197446987?check_suite_focus=true

In contrast, this run (after correcting the trailing newlines in this PR) succeeded:

- https://github.com/pytorch/pytorch/pull/54737/checks?check_run_id=2197553241

To unit-test `tools/trailing_newlines.py` itself (this is run as part of our "Test tools" GitHub Actions workflow):
```
python tools/test/test_trailing_newlines.py
```

Reviewed By: malfet

Differential Revision: D27409736

Pulled By: samestep

fbshipit-source-id: 46f565227046b39f68349bbd5633105b2d2e9b19
2021-03-30 13:09:52 -07:00
Ailing Zhang
263180d7fc Revert D26973911: Implement public API InferenceMode and its error handling
Test Plan: revert-hammer

Differential Revision:
D26973911 (7caa464631)

Original commit changeset: 0ebdac7a3cd5

fbshipit-source-id: afd37a3785bc694e8ffbd679eba1cfed89ef2273
2021-03-29 11:17:49 -07:00
Kurt Mohler
3ddc6174da Raise error in clip_grad_norm_ if norm is non-finite (#53843)
Summary:
**BC-breaking note**: This change throws errors for cases that used to silently pass. The old behavior can be obtained by setting `error_if_nonfinite=False`

Fixes https://github.com/pytorch/pytorch/issues/46849

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53843

Reviewed By: malfet

Differential Revision: D27291838

Pulled By: jbschlosser

fbshipit-source-id: 216d191b26e1b5919a44a3af5cde6f35baf825c4
2021-03-29 08:41:21 -07:00
Ailing Zhang
7caa464631 Implement public API InferenceMode and its error handling (#53343)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/53343

Test Plan: Imported from OSS

Reviewed By: ezyang, nikithamalgifb

Differential Revision: D26973911

Pulled By: ailzhang

fbshipit-source-id: 0ebdac7a3cd554822d26d5a40f539b6e2aaec61d
2021-03-27 13:44:23 -07:00
Bert Maher
e4d19798f3 [nnc][tests] Convert a bunch of FileCheck to checkIR
Summary:
I added a helper to convert a Stmt to string and FileCheck it, so
started using it in a bunch of places.  I replaced about half the current uses,
got tired, started to write a Perl script to automate it, realized that was
hard, and decided to give up for a bit.  But this cleans up some of the tests a
bit, so seems easy to review and worth landing.

Test Plan: test_tensorexpr --gtest_filter=LoopNest.*

Reviewed By: navahgar

Differential Revision: D27375866

fbshipit-source-id: 15894b9089dec5cf25f340fe17e6e54546a64257
2021-03-26 20:27:50 -07:00
Bert Maher
24f589df44 [nnc] Disabled test case for failure in implementing conv1d (#54756)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54756

We have multiple bugs here, one relating to index flattening and the
other to computeAt.
ghstack-source-id: 125054729

Test Plan: yikes

Reviewed By: ZolotukhinM

Differential Revision: D27354082

fbshipit-source-id: 8b15bac28e3eba4629881ae0f3bd143636f65ad7
2021-03-26 20:27:48 -07:00
Bert Maher
e542e67253 [nnc] Test case for computeAt with reduction (#54755)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54755

As title.  A step on the way to using computeAt to optimize
convolution.
ghstack-source-id: 125054730

Test Plan: new test

Reviewed By: ZolotukhinM

Differential Revision: D27353663

fbshipit-source-id: 930e09d96d1f74169bf148cd30fc195c6759a3e9
2021-03-26 20:25:18 -07:00
Nikitha Malgi
416ba5c48f Merge CUDA Streams and Events (#53902)
Summary:
-----------
- Updates current_stream and default stream API's to take `optional[device]` argument
- Adds parsing logic to replace `torch.cuda.Stream` and `torch.cuda.Event` -> `torch.classes.cuda.Stream` and `torch.classes.cuda.Event` for JIT
- Merges StreamContext manager for both Eager and JIT.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53902

Test Plan:
------
Run JIT tests:
python test/test_jit.py -v TestCUDA

Run eager tests:
python test/test_cuda.py -v TestCuda

Reviewed By: SplitInfinity

Differential Revision: D27285996

Pulled By: nikithamalgifb

fbshipit-source-id: 45d9fee9a582b5f4c82330f5f99eb88584804270
2021-03-26 14:19:39 -07:00
Pritam Damania
267fc27d39 Ensure torch.futures.wait_all exits early on error. (#53953)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53953

torch.futures.wait_all, would wait for all specified futures to
complete before it returned. As a result, if there was an error it would still
wait for a long time (ex: long running RPCs) before it returned an error to the
user.

This PR ensures `wait_all` returns and error as soon as any future runs into an
error and doesn't wait for all futures to complete.

I removed the logic _invoke_rpc_python_udf which raised an error in the unwrap
function, because ideally the error should be set on the Future and not be
raised to the user only when `wait()` is called. As an example, in the case of
`wait_all`, the user never calls `wait()` on the future that errored out but a
future down the chain and we should propagate these errors via `setError`
instead.
ghstack-source-id: 124721216

Test Plan:
1) Unit test added.
2) waitforbuildbot

Reviewed By: mrshenli

Differential Revision: D27032362

fbshipit-source-id: c719e2277c27ff3d45f1511d5dc6f1f71a03e3a8
2021-03-25 07:39:14 -07:00
Mikhail Zolotukhin
1ceb90405b [TensorExpr] Add plumbing for conv2d fusion. (#54439)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54439

For now the only way to represent conv2d in TE is via an external call,
and since aten library doesn't have an out variant for conv2d, the
external call has to perform an extra copy. Because of that fusing
conv2d now regressed performance and hence is disabled. However, in near
future we should have two alternative ways to enable it:
1) represent conv2d natively in TE (without an external call)
2) add an out variant for conv2d

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D27237045

Pulled By: ZolotukhinM

fbshipit-source-id: f5545ff711b75f9f37bc056316d1999a70043b4c
2021-03-24 18:49:07 -07:00
Chen Lai
7605ce4ed8 [PyTorch] Enable test_lite_interpreter_runtime running in android (#54579)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54579

## Summary

1. Eliminate a few more tests when BUILD_LITE_INTERPRETER is on, such that test_lite_interpreter_runtime can build and run on device.
2. Remove `#include <torch/torch.h>`, because it's not needed.

## Test plan

Set the BUILD_TEST=ON `in build_android.sh`, then run
` BUILD_LITE_INTERPRETER=1 ./scripts/build_pytorch_android.sh x86`

push binary to android device:
```
 adb push ./build_android_x86/bin/test_lite_interpreter_runtime /data/local/tmp
```

Reorganize the folder in `/data/local/tmp` so the test binary and model file is like following:
```
/data/local/tmp/test_bin/test_lite_interpreter_runtime
/data/local/tmp/test/cpp/lite_interpreter_runtime/sequence.ptl
```
such that the model file is in the correct path and can be found by the test_lite_interpreter_runtime.

![image](https://user-images.githubusercontent.com/16430979/112276332-d89d1900-8c3d-11eb-91de-7bf10d1e418d.png)

Test Plan: Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D27300720

Pulled By: cccclai

fbshipit-source-id: d9526c7d3db8c0d3e76c5a4d604c6877c78afdf9
2021-03-24 14:45:27 -07:00
anjali411
f9ca0d87a7 Teach Python TS frontend to parse complex literals (#52881)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52881

**This PR adds:**
1. logic to parse complex constants (complex literals of the form `bj`)
2. logic to parse complex lists
3. support for complex constructors: `complex(tensor/int/float/bool, tensor/int/float/bool)`
4. Limited operator support
     - `add`, `sub`, `mul`, `torch.tensor`, `torch.as_tensor`

**Follow-up work:**
1. Add complex support for unary and other registered ops.
2. support complex constructor with string as input (this is supported in Python eager mode).
3. Test all emitXYZ for all XYZ in `ir_emitter.cpp` (currently only emitConst, emitValueToTensor are tested). e.g., test loops etc.
4. onnx doesn't support complex tensors, so we should error out with a clear and descriptive error message.

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D27245059

Pulled By: anjali411

fbshipit-source-id: af043b5159ae99a9cc8691b5a8401503fa8d6f05
2021-03-24 08:12:17 -07:00
Raghavan Raman
601e79200d [NNC] Implementing LoopFusion (#54461)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/54337

This PR adds a new API to NNC to perform loop fusion.

```
static For* fuseLoops(const std::vector<For*>& loops);
```

Loop fusion is done only when all the conditions below are satisfied.
  * All the loops have the same parent.
  * There are no statements between these loops in their parent body.
  * The start bounds are the same for all loops.
  * The stop bounds are the same for all loops.
  * Fusing the loops does not violate or add any dependencies.

This PR also adds an API to check for partial overlaps in `buffer_inference.h` and fixes a bug in `mem_dependency_checker.cpp`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54461

Reviewed By: bertmaher

Differential Revision: D27254888

Pulled By: navahgar

fbshipit-source-id: c21b027d738e5022e9cb88f6f72cd9e255bdb15e
2021-03-23 21:20:00 -07:00
Lucas Hosseini
a84afb3a7c Use type-erased union for Buffer. (#54251)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54251

Pull Request resolved: https://github.com/pytorch/tensorpipe/pull/324

In order to merge the channel hierarchies, we need a generic `Buffer` type, that can wrap either a `CpuBuffer` or a `CudaBuffer`.
The constraints are that, since this type is used by the channels, it cannot explicitly refer to `CudaBuffer`. We propose here a type-erasure based solution, with small-buffer optimization to avoid heap-allocating the wrapped concrete buffer.

This is a new version of D27001339 (c618dc13d2) which broke PyTorch OSS build.

Test Plan: CI

Reviewed By: lw, mrshenli

Differential Revision: D27156053

fbshipit-source-id: 4244302af33a3be91dcd06093c0d6045d081d3cc
2021-03-19 04:58:09 -07:00
Peter Bell
04e0cbf5a9 Add padding='same' mode to conv{1,2,3}d (#45667)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45667

First part of #3867 (Pooling operators still to do)

This adds a `padding='same'` mode to the interface of `conv{n}d`and `nn.Conv{n}d`. This should match the behaviour of `tensorflow`. I couldn't find it explicitly documented but through experimentation I found `tensorflow` returns the shape `ceil(len/stride)` and always adds any extra asymmetric padding onto the right side of the input.

Since the `native_functions.yaml` schema doesn't seem to support strings or enums, I've moved the function interface into python and it now dispatches between the numerically padded `conv{n}d` and the `_conv{n}d_same` variant. Underscores because I couldn't see any way to avoid exporting a function into the `torch` namespace.

A note on asymmetric padding. The total padding required can be odd if both the kernel-length is even  and the dilation is odd. mkldnn has native support for asymmetric padding, so there is no overhead there, but for other backends I resort to padding the input tensor by 1 on the right hand side to make the remaining padding symmetrical. In these cases, I use `TORCH_WARN_ONCE` to notify the user of the performance implications.

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D27170744

Pulled By: jbschlosser

fbshipit-source-id: b3d8a0380e0787ae781f2e5d8ee365a7bfd49f22
2021-03-18 16:22:03 -07:00
Raghavan Raman
4b2abc4b8e [NNC] Adding API to distribute loops (#53865)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53864

This PR adds the following APIs that perform loop distribution to `LoopNest`:
```
static std::vector<For*> distributeLoop(For* loop, const std::unordered_set<Stmt*>& pivots);
static std::vector<For*> distributeLoop(For* loop);
static std::vector<For*> distributeLoopOverInnerLoops(For* loop);
```

* The first method distributes the given loop over its body by splitting after every given pivot stmt.
* The second method distributes the given loop over every stmt in its body.
* The last method distributes the given loop over its body by splitting after every `For` stmt in its body.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53865

Reviewed By: mruberry

Differential Revision: D27075006

Pulled By: navahgar

fbshipit-source-id: 031746aad619fe84c109e78b53387535e7f77cef
2021-03-18 07:27:39 -07:00
Xiaoqiang Zheng
9f86b656ba Resubmit: Adding parallel support for the LLVM backend. (#54122)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54122

Test Plan:
* USE_TBB=1 ATEN_THREADING=TBB python setup.py develop --cmake
  * USE_TBB=1 ATEN_THREADING=NATIVE python setup.py develop --cmake
  * USE_TBB=1 ATEN_THREADING=OMP python setup.py develop --cmake
  * cd build; ninja bin/tensorexpr_bench
  * bin/test_tensorexpr --gtest_filter="*Parallel*"

Reviewed By: bertmaher

Differential Revision: D27109802

Pulled By: zheng-xq

fbshipit-source-id: db159466d0b46357bcf0fbefb36094bee312368c
2021-03-18 07:19:37 -07:00
Mike Ruberry
8caa7889fc Revert D27001339: Use type-erased union for Buffer.
Test Plan: revert-hammer

Differential Revision:
D27001339 (c618dc13d2)

Original commit changeset: 26d7dc19d69d

fbshipit-source-id: 6e036ed7e1f71c9cf20e3361607c4fe4fa2d3d02
2021-03-18 05:27:17 -07:00
Lucas Hosseini
c618dc13d2 Use type-erased union for Buffer. (#322)
Summary:
Pull Request resolved: https://github.com/pytorch/tensorpipe/pull/322

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54145

In order to merge the channel hierarchies, we need a generic `Buffer` type, that can wrap either a `CpuBuffer` or a `CudaBuffer`.
The constraints are that, since this type is used by the channels, it cannot explicitly refer to `CudaBuffer`. We propose here a type-erasure based solution, with small-buffer optimization to avoid heap-allocating the wrapped concrete buffer.
ghstack-source-id: 124131499

Test Plan: CI

Reviewed By: lw

Differential Revision: D27001339

fbshipit-source-id: 26d7dc19d69d7e3336df6fd4ff6ec118dc17c5b6
2021-03-18 02:23:17 -07:00
Wanchao Liang
a4f0f8b1e9 [distributed] add base processgroup::options (#53662)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53662

Add a base processgroup::options so that we can do inheritance and
provide
a universal option API in python

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D26968856

Pulled By: wanchaol

fbshipit-source-id: 858f4b61b27aecb1943959bba68f8c14114f67d8
2021-03-17 18:40:04 -07:00
Bert Maher
7367bca066 [nnc] Tests for proposed feature: loop bounds conditional simplification (#54121)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54121

It would be nice to do range analysis to determine if a condition
cannot be satisfied.  These are some tests that we should be able to turn on
once we have this feature.
ghstack-source-id: 124116847

Test Plan: Simplify.*LoopBounds

Reviewed By: ZolotukhinM

Differential Revision: D27107956

fbshipit-source-id: bb27e3d3bc803f0101c416e4a351ba2278684980
2021-03-17 11:01:10 -07:00
Bert Maher
a852fdb6b5 [nnc] Test for using int64 dimensions (#54094)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54094

We should be able to use 64-bit integers for loop boundaries and
buffer/tensor indexing.
ghstack-source-id: 124116846

Test Plan: New tests, disabled

Reviewed By: ZolotukhinM

Differential Revision: D27094934

fbshipit-source-id: a53de21a0ef523ea3560d5dd4707df50624896ef
2021-03-17 10:59:26 -07:00
Martin Yuan
524cb0a514 [PyTorch Mobile] Dedup method names in bytecode serialization (#53677)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53677

When serializing bytecode, we serialize it based on methods. It may happen that there are multiple instances of a class. In such a case, the methods inside the class may be serialized multiple times.

To reduce the duplication, we cache the qualified name of the methods, so that one method is serialized only once.

Test Plan: existing unittests and CI

Reviewed By: dhruvbird, raziel

Differential Revision: D26933945

Pulled By: iseeyuan

fbshipit-source-id: 8a9833949fa18f7103a5a0be19e2028040dc7717
2021-03-16 15:24:47 -07:00