Commit Graph

14 Commits

Author SHA1 Message Date
Shihao Xu
ae452a81a9 [DistAutograd x JIT] Capture global state, dist autograd current context id, before thread switching triggered by JIT future.wait() (#36395)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36395

titled

Test Plan:
# Unit tests

```
buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:dist_autograd_fork -- test_restore_context_after_swtich_to_jit_thread

buck build mode/dev-nosan //caffe2/test/distributed/rpc/jit:dist_autograd_fork

buck-out/gen/caffe2/test/distributed/rpc/jit/dist_autograd_fork\#binary.par -r test_restore_context_after_swtich_to_jit_thread
```

```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:dist_autograd_fork -- test_backward_simple_script_call

buck build mode/dev-nosan //caffe2/test/distributed/rpc:dist_autograd_fork

buck-out/gen/caffe2/test/distributed/rpc/dist_autograd_fork\#binary.par -r test_backward_simple_script_call
```

Differential Revision: D7857991

fbshipit-source-id: 168e0e3846a50ea92d4f9450a30ccc6c13e2fcec
2020-04-11 02:51:39 -07:00
Nikolay Korovaiko
4b916b6b75 Mark every frame with a unique id (#33788)
Summary:
This PR introduces frame ids that will allow us to associate profiling information with its corresponding run.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33788

Differential Revision: D20164897

Pulled By: Krovatkin

fbshipit-source-id: 8172ff9f4d188b339e2ff98a80bbe4a2b306a8aa
2020-04-07 17:52:06 -07:00
Jeremy Lilley
72b55fea6b [jit] Make torch::utils::Future and ivalue::future apis closer (#35849)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35849

This change harmonizes some aspects of the api.

- torch::utils::Future callback should have no args, like ivalue::future.
   Many of the lines of this change are related to fixing that up downstream.

   No args makes the api simpler to use, particularly since many/most of the
   downstream use cases ignore the passed-in args. It's simple enough to
   appropriately capture the future in the lambda if necessary.

 - Add error/hasError methods to ivalue::Future.
 - Use c10::optional underneath for error to ivalue::Future.
 - Change markCompleted(error) to setError(error) to ivalue::Future.
 - Add setValue(FutureError) version to torch::utils::Future

ghstack-source-id: 101684435

Test Plan: buck test mode/dev-nosan caffe2/test/...

Differential Revision: D20803251

fbshipit-source-id: e3d925287bd9a80d649843eef5f270163f448269
2020-04-07 17:05:35 -07:00
Christian Sarofeen
6d24f8fe21 Infrastructure for a new CUDA Fuser (#34785)
Summary:
**Summary:** This PR contains the infrastructure of a new CUDA fuser. This CUDA fuser is based on many of the same principles of TensorExpressions and Halide, however the implementation is ground up. The fusion pass itself is similar to the default CUDA fuser, however, it has undergone some refactoring and is using the new code generation infrastructure. For those who are interested in how the code generation in this PR works, I would recommend reviewing _test/cpp/jit/test_gpu_fusion.cpp_ as well as the long comment section at the beginning of _torch/csrc/jit/codegen/cuda/transform_replay.h_  One of the largest differences between our approach and that of TVM/Halide, is the concept of "TensorView". TensorView from a high level should be thought of similarly to how we think of working with Tensors in PyTorch. It's an N-D object which can undergo transformations that change its dimensionality. Dimensionality changes are done through the operations split/merge/reorder/computeAt. These transformations are similar to split/fuse/reorder/compute_at of TVM, they modify how a tensor is iterated over to generate GPU code. Interestingly, in our scheme these transformations are applied to tensors and only impact how that tensor is generated.

**Warning:** This PR is purposefully not feature complete with the current fuser. We wanted to separate out the infrastructure from the fusion capabilities. Once in, smaller incremental PRs will be submitted to expand capabilities of the fuser.

**Short term goals:**

Parity with current CUDA fuser (including performance):
- Dynamic shapes (no recompilation)
- Implicit handling of braodcast (broadcasted tensors are treated as tensors of the braodcasted size in the generated code)
- Dropout

**Mid-term goals:**

- Transposes fused with pointwise operations where transpose involves only 2 axes (across the fused operation).
- 1-D reductions fused with pointwise operations
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34785

Reviewed By: ZolotukhinM

Differential Revision: D20650977

Pulled By: soumith

fbshipit-source-id: ee39c95a880e1b9822e874ed4cc180971572bf63
2020-04-02 09:22:42 -07:00
Ilia Cherniavskii
a5bfcc5323 Unify management of thread local settings (#35523)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35523

In this PR we extend ThreadLocalState to cover dispatch keys and
ThreadLocalDebugInfo and move it from JIT interpreter down to
thread management (at::launch) and autograd (backward threads) code

Test Plan: unit tests (CI)

Reviewed By: dzhulgakov

Differential Revision: D20615714

fbshipit-source-id: 16a9fc96a25cb6c2629230b1187fbf78786ac565
2020-04-01 01:56:39 -07:00
Ilia Cherniavskii
800d5617c0 Recording of TorchScript functions (#34710)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34710

Extending RecordFunction API to support new recording scopes (such as TorchScript functions), as well as giving more flexibility to set sampling rate.

Test Plan: unit test (test_misc.cpp/testRecordFunction)

Reviewed By: gdankel, dzhulgakov

Differential Revision: D20158523

fbshipit-source-id: a9e0819d21cc06f4952d92d43246587c36137582
2020-03-31 00:33:23 -07:00
Meghan Lele
6384c2d81b [JIT] clang-format JIT code (#35115)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35115

This commit runs the newly added tools/clang_format.py on the JIT
codebase and includes all of the formatting changes thus produced.

Testing:
Ran the script, CI.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D20568523

Pulled By: SplitInfinity

fbshipit-source-id: e09bdb982ccf090eecfb7c7b461b8d0681eef82b
2020-03-26 11:24:51 -07:00
Pritam Damania
7065c46ea2 Respect dist autograd context in torch.jit._fork. (#34360)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34360

The distributed autograd context sets up a thread local context id
which is used to perform appropriate book keeping and autograd recording of RPC
functions in the forward pass.

However, if we use torch.jit._fork within the distributed autograd context, the
code executed within torch.jit._fork will lose this context since it is run in
a separate JIT thread and the thread local is not set in that thread.

To fix this problem, we pass in the distributed autograd context to
torch.jit._fork similar to what we did in
https://github.com/pytorch/pytorch/pull/16101.
ghstack-source-id: 100445465

Test Plan: waitforbuildbot

Differential Revision: D20301352

fbshipit-source-id: aa3fffe69c2b40722c66213351a4e0d77484a621
2020-03-19 14:12:28 -07:00
Hong Xu
027d7f7ba5 Delete AT_WARN and replace all AT_WARN with TORCH_WARN (#34623)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34623

The bandaid of "AT_WARN" keeps introducing new warnings. Let's get rid
of it entirely.

Close #34502

Test Plan: Imported from OSS

Differential Revision: D20420112

Pulled By: albanD

fbshipit-source-id: 7160c113cb4deb2d2f50a375356f423fe5e86f50
2020-03-13 12:27:22 -07:00
Nikolay Korovaiko
e16908cb1f profile block outputs; helps guard elimination (#33889)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33889

Reviewed By: zdevito

Differential Revision: D20294979

Pulled By: Krovatkin

fbshipit-source-id: 2a68710ec8f8f854c99dfe173f49da442a39e498
2020-03-09 17:12:58 -07:00
James Reed
45a504dd2d [JIT] Introduce BuiltinOpFunction and integrate into torchbind (#34098)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34098

* #33900 [JIT] Move stuff out of class_type.cpp

Test Plan: Imported from OSS

Differential Revision: D20229166

Pulled By: jamesr66a

fbshipit-source-id: d658a63a5d6e372e675f35b8456adc8de82b49f3
2020-03-07 10:03:56 -08:00
James Reed
60e8615a6d [JIT] Virtualize Function (#33921)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33921

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.intern.facebook.com/intern/diff/D20153092/)!

Test Plan: Imported from OSS

Differential Revision: D20177227

Pulled By: jamesr66a

fbshipit-source-id: 87f3e484c4f873d60f76f50f6789c1b4a73bdfde
2020-03-07 10:03:50 -08:00
Zachary DeVito
358450e02b improved TorchScript traceback (#33834)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33834

This changes how we report Tracebacks to make them more clear when
there are both serialized and non-serialized ranges. It now looks like:

```
Traceback (most recent call last):
  File "foo.py", line 25, in <module>
    s2(a, b)
  File "/scratch/zdevito/pytorch/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/__torch__.py", line 7, in forward
    x: Tensor,
    y: Tensor) -> Tensor:
    return (self).bar(x, y, )
            ~~~~~~~~~ <--- HERE
  def bar(self: __torch__.Moo,
    x: Tensor,
  File "code/__torch__.py", line 11, in bar
    x: Tensor,
    y: Tensor) -> Tensor:
    _0 = (self).baz(x, y, )
          ~~~~~~~~~ <--- HERE
    _1 = torch.ones([3], dtype=None, layout=None, device=None, pin_memory=None)
    return torch.add(_0, _1, alpha=1)
  File "code/__torch__.py", line 17, in baz
    x: Tensor,
    y: Tensor) -> Tensor:
    return torch.add(x, y, alpha=1)
           ~~~~~~~~~ <--- HERE

Traceback of TorchScript, original code (most recent call last):
  File "foo.py", line 11, in forward
    def forward(self, x, y):
        return self.bar(x, y)
               ~~~~~~~~ <--- HERE
  File "foo.py", line 9, in bar
    def bar(self, x, y):
        return self.baz(x, y) + torch.ones(3)
               ~~~~~~~~ <--- HERE
  File "foo.py", line 7, in baz
    def baz(self, x, y):
        return x + y
               ~~~~~ <--- HERE
RuntimeError: The size of tensor a (4) must match the size of tensor b (5) at non-singleton dimension 1
```

It follows Python convension of having the most important information last
and reading from the bottom up.

Changes:
* Moved the error message to the end, to copy Python
* Report original traceback separate from serialized traceback
* Make sure root functions have names in the interpreter trace.

Test Plan: Imported from OSS

Differential Revision: D20126136

Pulled By: zdevito

fbshipit-source-id: fd01f9985e5d74e04c4d064c02e8bc320f4fac13
2020-03-03 12:27:38 -08:00
Michael Suo
dbe850af5b [jit] do the code reorg (#33851)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33851

Rationale and context described in #33828.

Script to reproduce the move:
https://gist.github.com/suo/16cbefaaeb67ca5a7c6caffd49b7f6e9
ghstack-source-id: 99079645

Test Plan: Make sure CI passes

Reviewed By: jamesr66a

Differential Revision: D20133869

fbshipit-source-id: 390e9241a9c85366d9005c492ac31f10aa96488e
2020-02-27 13:02:51 -08:00