Commit Graph

655 Commits

Author SHA1 Message Date
PyTorch MergeBot
8eb579e362 Revert "[Profiler] Move legacy profiler out of torch/csrc/autograd (#85512)"
This reverts commit 157a3d2a7c.

Reverted https://github.com/pytorch/pytorch/pull/85512 on behalf of https://github.com/DanilBaibak due to Due to files were deleted, the internal build failed. Please re-submit via codev.
2022-10-14 14:56:59 +00:00
Taylor Robie
157a3d2a7c [Profiler] Move legacy profiler out of torch/csrc/autograd (#85512)
The legacy profiler is an eyesore in the autograd folder. At this point the implementation is almost completely decoupled from the rest of profiler, and it is in maintaince mode pending deprecation.

As a result, I'm moving it to `torch/csrc/profiler/standalone`. Unfortuantely BC requires that the symbols remain in `torch::autograd::profiler`, so I've put some basic forwarding logic in `torch/csrc/autograd/profiler.h`.

One strange bit is that `profiler_legacy.h` forward declares `torch::autograd::Node`, but doesn't seem to do anything with it. I think we can delete it, but I want to test to make sure.

(Note: this should not land until https://github.com/pytorch/torchrec/pull/595 is landed.)

Differential Revision: [D39108648](https://our.internmc.facebook.com/intern/diff/D39108648/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85512
Approved by: https://github.com/aaronenyeshi
2022-10-14 05:38:48 +00:00
Taylor Robie
b8f14b7877 [Profiler][Minor] Group and consolidate stub APIs (#85510)
There is a concept in profiler of a stub that wraps a profiling API. It was introduced for CUDA profiling before Kineto, and ITT has adopted it to call into VTune APIs. However for the most part we don't really interact with them when developing the PyTorch profiler.

Thus it makes sense to unify the fallback registration mechanism and create a subfolder to free up real estate in the top level `torch/csrc/profiler` directory.

Differential Revision: [D39108647](https://our.internmc.facebook.com/intern/diff/D39108647/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85510
Approved by: https://github.com/aaronenyeshi
2022-10-14 05:38:46 +00:00
Jason Ansel
c7c09722ad Move TorchDynamo into PyTorch core (#86461)
Context:
https://github.com/pytorch/torchdynamo/issues/1588

This PR moves [TorchDynamo](https://github.com/pytorch/torchdynamo) and TorchInductor into PyTorch core.
- `torchdynamo` becomes `torch._dynamo`
- `torchinductor` becomes `torch._inductor`

This PR was generated by running `copy_to_core.sh` in https://github.com/pytorch/torchdynamo/pull/1538

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86461
Approved by: https://github.com/voznesenskym
2022-10-13 23:18:06 +00:00
Jason Ansel
f1fdb6efbd Manual changes for moving dynamo to core (#86621)
This is the subset of the changes in #86461 not auto-generated by `copy_to_core.sh`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86621
Approved by: https://github.com/albanD
2022-10-11 23:01:21 +00:00
Sahan Paliskara
936e93058b Delete torch::deploy from pytorch core (#85953)
As we have migrated torch::deploy over to https://github.com/pytorch/multipy, we can now delete it from pytorch core as ongoing development will happen there.

This PR was created due to syncing issues with https://github.com/pytorch/pytorch/pull/85443 which is where the review history can be found.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85953
Approved by: https://github.com/seemethere, https://github.com/malfet
2022-10-06 07:20:16 +00:00
Min Si
089a64e99e Install c10d headers with absolute path (#86257)
https://github.com/pytorch/pytorch/pull/85780 updated all c10d headers in pytorch to use absolute path following the other distributed components. However, the headers were still copied to `${TORCH_INSTALL_INCLUDE_DIR}/torch`, thus external extentions still have to reference the c10d headers as `<c10d/*.h>`, making the usage inconsistent (the only exception was c10d/exception.h, which was copied to `${TORCH_INSTALL_INCLUDE_DIR}/torch/csrc/distributed/c10d`).

This patch fixes the installation step to copy all c10d headers to `${TORCH_INSTALL_INCLUDE_DIR}/torch/csrc/distributed/c10d`, thus external extensions can consistently reference c10d headers with the absolute path.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/86257
Approved by: https://github.com/kumpera
2022-10-05 20:02:05 +00:00
Jane Xu
3cdf621fe5 Add opt-einsum to CI (#85574)
Depends on https://github.com/pytorch/pytorch/pull/84890.

This PR adds opt_einsum to CI, enabling path optimization for the multi-input case. It also updates the installation sites to install torch with einsum, but those are mostly to make sure it would work on the user's end (as opt-einsum would have already been installed in the docker or in prior set up steps).

This PR also updates the windows build_pytorch.bat script to use the same bdist_wheel and install commands as on Linux, replacing the `setup.py install` that'll become deprecated.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85574
Approved by: https://github.com/huydhn, https://github.com/soulitzer
2022-09-29 14:28:55 +00:00
Jane Xu
e7e1cd945f Add path optimize kwarg to einsum (#84890)
## This PR seeks to:
- [x] add c++ support for an optimize path
- [x] add python opt_einsum path passthrough
- [x] add opt_einsum to OSS requirements, but a soft one
- [x] show benchmark results here

Additional things I've explored + their conclusions:
- **Delaying the summing over dimensions** => added!
    - The idea here is to not incur kernel calls to `sum` as we try to early sum out in einsum. Thus, we collect all the dimensions that need to be summed together in one contraction + sum at the end instead of summing as we go. While this optimization didn't feel like it made things faster for the random cases we've selected (they all summed 1 dim per contraction), it is a good principle and would help more common use cases that would reduce multiple dimensions at a time (like `bxy,xyi,xyj->bij`).
- **Caching contract_path based on equation and tensor sizes** => dropped :(
    - The benchmarks were strictly worse for all the cases, and, from scanning the use cases, I observed people do not often call einsum on the same equation/tensor order enough for caching to be justified. I do think caching can be effective in the future, but it would require further investigation.

## Not a part of this PR (but are next steps):
- adding opt_einsum package to OSS CI
- adding it to internal CI
- potentially adding a kwarg path argument to the python API -- if the path is given, we wouldn't have to spend time calculating it, but there would be some time lost validating user input.

## Testing:
- Added more tests to CI

## Benchmarking:
**TL;DRs**
- **torch.einsum with opt_einsum is a definite win for the production case**.
- **torch.einsum with opt_einsum installed is consistently fast, but has an overhead** of needing to find the path. If the path is already found/optimal, it will be slightly slower.
- The einsum overhead decreases for bigger dimensions.
- **torch.einsum without opt_einsum installed is comparable to before this commit**, with occasional slowness potentially due to not reshaping/squeezing as we contract until the end.
- For many of the random generated cases, the dimensions were too similar and small where an optimal order wasn't that much more optimal than just going left to right. However, in production, dimensions are commonly quite distinct (batch size will be small, but the data will be huge).
- **torch.einsum opt is comparable (slightly faster overall) compared to numpy.einsum opt for the cpu case**. This is interesting given that torch.einsum currently spends time computing the path, but numpy.einsum takes it as input.
- **torch.einsum opt is significantly faster than numpy.einsum opt for the gpu case**. This is because numpy doesn't take advantage of GPUs.

The following benchmarks were done on an A100 GPU and Linux CPUs. The line in the first chart separates GPU (on top) from CPU, and the line in the second graph separates CPU (on top) and then GPU. Sorry it's flipped 😛 .

Production example (see [colab benchmark](https://colab.research.google.com/drive/1V2s4v1dOOKwRvp5T_DC-PNUosOV9FFJx?authuser=1#scrollTo=WZoQkC8Mdt6I) for more context):
<img width="1176" alt="image" src="https://user-images.githubusercontent.com/31798555/192012636-9a68bfa7-2601-43b1-afeb-b4e0877db6a4.png">

Randomly generated examples (the same ones as in https://github.com/pytorch/pytorch/pull/60191)
<img width="1176" alt="image" src="https://user-images.githubusercontent.com/31798555/192012804-1c639595-b3e6-48c9-a385-ad851c13e1c2.png">

Open below to see old + not super relevant benchmarking results:
<details>
Benchmark results BEFORE this PR (on Linux -- I will update devices so they are consistent later):
<img width="776" alt="image" src="https://user-images.githubusercontent.com/31798555/190807274-18f71fce-556e-47f4-b18c-e0f7d0c0d5aa.png">

Benchmark results with the code on this PR (on my x86 mac):
For the CPU internal use case --
![image](https://user-images.githubusercontent.com/31798555/190801376-6f591b00-cebd-4ca7-bb23-ae8f17f1634e.png)

For the general use case --
It looks like numpy opt still does better in several of these random cases, but torch einsum opt is consistently faster than torch.einsum.
![image](https://user-images.githubusercontent.com/31798555/190811730-fbb6797d-af59-4f5a-92da-ba4103372014.png)
<details>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84890
Approved by: https://github.com/albanD, https://github.com/soulitzer
2022-09-24 03:47:36 +00:00
atalman
eb94df28c7 Use pip install cu117 (#85097)
Creates new wheel workflow specific to CUDA 11.7 that does not bundle the cudnn and cublas.

Workflow:
https://github.com/pytorch/pytorch/actions/runs/3094622781

New Package:
manywheel-py3_10-cuda11_7-with-pypi-cudnn | 843 MB

Old Package:
manywheel-py3_10-cuda11_7 | 1.65 GB

Testing workflow:

[manywheel-py3_7-cuda11_7-with-pypi-cudnn-build / build](https://github.com/pytorch/pytorch/actions/runs/3091145546/jobs/5000867662#logs):
```
Bundling without cudnn and cublas.
+ DEPS_LIST=("/usr/local/cuda/lib64/libcudart.so.11.0" "/usr/local/cuda/lib64/libnvToolsExt.so.1" "/usr/local/cuda/lib64/libnvrtc.so.11.2" "/usr/local/cuda/lib64/libnvrtc-builtins.so.11.7" "$LIBGOMP_PATH")
+ DEPS_SONAME=("libcudart.so.11.0" "libnvToolsExt.so.1" "libnvrtc.so.11.2" "libnvrtc-builtins.so.11.7" "libgomp.so.1")
.....
pytorch_extra_install_requirements: nvidia-cuda-runtime-cu11, nvidia-cudnn-cu11, nvidia-cublas-cu11
```

[manywheel-py3_7-cuda11_7-build / build](https://github.com/pytorch/pytorch/actions/runs/3091145546/jobs/5000863250#logs)

```
Bundling with cudnn and cublas.
+ DEPS_LIST=("/usr/local/cuda/lib64/libcudart.so.11.0" "/usr/local/cuda/lib64/libnvToolsExt.so.1" "/usr/local/cuda/lib64/libnvrtc.so.11.2" "/usr/local/cuda/lib64/libnvrtc-builtins.so.11.7" "/usr/local/cuda/lib64/libcudnn_adv_infer.so.8" "/usr/local/cuda/lib64/libcudnn_adv_train.so.8" "/usr/local/cuda/lib64/libcudnn_cnn_infer.so.8" "/usr/local/cuda/lib64/libcudnn_cnn_train.so.8" "/usr/local/cuda/lib64/libcudnn_ops_infer.so.8" "/usr/local/cuda/lib64/libcudnn_ops_train.so.8" "/usr/local/cuda/lib64/libcudnn.so.8" "/usr/local/cuda/lib64/libcublas.so.11" "/usr/local/cuda/lib64/libcublasLt.so.11" "$LIBGOMP_PATH")
+ DEPS_SONAME=("libcudart.so.11.0" "libnvToolsExt.so.1" "libnvrtc.so.11.2" "libnvrtc-builtins.so.11.7" "libcudnn_adv_infer.so.8" "libcudnn_adv_train.so.8" "libcudnn_cnn_infer.so.8" "libcudnn_cnn_train.so.8" "libcudnn_ops_infer.so.8" "libcudnn_ops_train.so.8" "libcudnn.so.8" "libcublas.so.11" "libcublasLt.so.11" "libgomp.so.1")
```

cc: @malfet @ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85097
Approved by: https://github.com/malfet
2022-09-21 16:30:25 +00:00
Nikita Shulga
d05a11337c [CMake] Add functorch target (#83464)
Move functorch/functorch into `functorch` folder
- Add functorch/CMakeLists.txt that adds `functorch` native python exension
- Modify `setup.py` to package pytorch and functorch together into a single wheel
- Modify `functorch.__version__` is not equal to that of `torch.__version__`
- Add dummy `functorch/setup.py` file for the projects that still want to build it

Differential Revision: [D39058811](https://our.internmc.facebook.com/intern/diff/D39058811)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83464
Approved by: https://github.com/zou3519
2022-09-14 00:05:33 +00:00
Kento Nozawa
5238404f4d Increment version_range_max (#84815)
Python 3.10 should be added as a listing in `Programming Language` on https://pypi.org/project/torch/:

<img width="238" alt="Screenshot 2022-09-11 at 2 48 01" src="https://user-images.githubusercontent.com/7121753/189495599-72bd6a28-4248-4e4e-8194-b5b1f9e984e2.png">
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84815
Approved by: https://github.com/malfet
2022-09-12 21:38:16 +00:00
Driss Guessous
0fc02dbba4 flash_attention integration (#81434)
# Summary:
- I added a new submodule Cutlass pointing to 2.10 release. The inclusion of flash_attention code should be gated by the flag: USE_FLASH_ATTENTION. This is defaulted to off resulting in flash to not be build anywhere. This is done on purpose since we don't have A100 machines to compile and test on.

- Only looked at CMake did not attempt bazel or buck yet.

-  I included the mha_fwd from flash_attention that has ben refactored to use cutlass 2.10. There is currently no backwards kernel on this branch. That would be a good follow up.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81434
Approved by: https://github.com/cpuhrsch
2022-09-09 20:11:26 +00:00
Richard Zou
0a89bdf989 Set up aten/src/ATen/functorch directory; move some files there (#84648)
This PR:
- sets up aten/src/ATen/functorch in PyTorch's build system
- Moves {BatchedTensorImpl.h, and BatchedTensorImpl.cpp}
there as a test.

Test Plan:
- functorch build and test should pass

Differential Revision: [D39315051](https://our.internmc.facebook.com/intern/diff/D39315051)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84648
Approved by: https://github.com/ezyang
2022-09-09 15:22:57 +00:00
Taylor Robie
bea0184033 Reland: [Profiler][Trivial] Create orchestration folder and move observer management there. (#83893)" (#84667)
Reland of https://github.com/pytorch/pytorch/pull/83893

Differential Revision: [D39282536](https://our.internmc.facebook.com/intern/diff/D39282536/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D39282536/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84667
Approved by: https://github.com/slgong-fb
2022-09-08 17:09:19 +00:00
PyTorch MergeBot
8b578849b4 Revert "[Profiler][Trivial] Create orchestration folder and move observer management there. (#83893)"
This reverts commit 48a596ad3f.

Reverted https://github.com/pytorch/pytorch/pull/83893 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally
2022-09-01 18:34:58 +00:00
Nikita Shulga
4b8ae04788 [BE] Delete torch._dl extension (#84361)
And lots of complexity around the availability of RTLD_GLOBAL flags in `os` module
As this flag is always present since Python-3.3, see https://docs.python.org/3/library/os.html#os.RTLD_GLOBAL

Fixes https://github.com/pytorch/pytorch/issues/84351

Pull Request resolved: https://github.com/pytorch/pytorch/pull/84361
Approved by: https://github.com/kit1980
2022-08-31 19:59:31 +00:00
Taylor Robie
48a596ad3f [Profiler][Trivial] Create orchestration folder and move observer management there. (#83893)
Just a basic move. Later I'll add other subsystems. (Python, Kineto)

Differential Revision: [D38925895](https://our.internmc.facebook.com/intern/diff/D38925895/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D38925895/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83893
Approved by: https://github.com/slgong-fb
2022-08-30 21:40:59 +00:00
Nikita Shulga
91e754b268 [BE] setup.py refactors (#83635)
No function changes, just move stuff around:
- Move main code to `main` routine
- Define torch and torchgen package data list in local vars
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83635
Approved by: https://github.com/kit1980
2022-08-21 14:50:39 +00:00
Yeounoh Chung
8707aabe9a Bundle lazy ts backend (#82384)
### Description
<!-- What did you change and why was it needed? -->
`libtorch.so` is missing `lazy/ts_backend`, which is breaking the XLA build/test pipeline.

### Issue
<!-- Link to Issue ticket or RFP -->
This currently blocks #82342 and #78182

### Testing
<!-- How did you test your change? -->
https://github.com/pytorch/pytorch/runs/7551019518?check_suite_focus=true

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82384
Approved by: https://github.com/albanD
2022-07-28 16:55:29 +00:00
Kurt Mohler
863176a1c7 Remove torch/csrc/generic (#82373)
### Description
Remove `torch/csrc/generic` since it is no longer needed.

### Issue
#82372

### Testing
No tests added

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82373
Approved by: https://github.com/ezyang
2022-07-28 07:45:31 +00:00
Sergii Dymchenko
3cf9c3d876 Remove obsolete Python < 3.3 TODO (#82278)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82278
Approved by: https://github.com/huydhn
2022-07-27 02:36:14 +00:00
mattip
37474a54de create a concated LICENSE file for wheels (#81500)
Fixes #81181 by creating a temporary LICENCE file that has all the third-party licenses concatenated together when creating a wheel. Also update the `third_party/LICENSES_BUNDLED.txt` file.

The `third_party/LICENSES_BUNDLED.txt` file is supposed to be tested via `tests/test_license.py`, but the test is not running?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81500
Approved by: https://github.com/rgommers, https://github.com/seemethere
2022-07-18 14:02:37 +00:00
Antonio Kim
65d03b1024 Add missing LTC headers to setup.py (#81424)
A number of headers that are not packaged but required for building vendor lazy tensor backends

Fixes #81423

CC: @wconstab @desertfire @ke1337 @henrytwo

Pull Request resolved: https://github.com/pytorch/pytorch/pull/81424
Approved by: https://github.com/malfet
2022-07-14 00:30:27 +00:00
Jing Xu
3c7044728b Enable Intel® VTune™ Profiler's Instrumentation and Tracing Technology APIs (ITT) to PyTorch (#63289)
More detailed description of benefits can be found at #41001. This is Intel's counterpart of NVidia’s NVTX (https://pytorch.org/docs/stable/autograd.html#torch.autograd.profiler.emit_nvtx).

ITT is a functionality for labeling trace data during application execution across different Intel tools.
For integrating Intel(R) VTune Profiler into Kineto, ITT needs to be integrated into PyTorch first. It works with both standalone VTune Profiler [(https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html](https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html)) and Kineto-integrated VTune functionality in the future.
It works for both Intel CPU and Intel XPU devices.

Pitch
Add VTune Profiler's ITT API function calls to annotate PyTorch ops, as well as developer customized code scopes on CPU, like NVTX for NVidia GPU.

This PR rebases the code changes at https://github.com/pytorch/pytorch/pull/61335 to the latest master branch.

Usage example:
```
with torch.autograd.profiler.emit_itt():
    for i in range(10):
        torch.itt.range_push('step_{}'.format(i))
        model(input)
        torch.itt.range_pop()
```

cc @ilia-cher @robieta @chaekit @gdankel @bitfort @ngimel @orionr @nbcsm @guotuofeng @guyang3532 @gaoteng-git
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63289
Approved by: https://github.com/malfet
2022-07-13 13:50:15 +00:00
PyTorch MergeBot
1454515253 Revert "Enable Intel® VTune™ Profiler's Instrumentation and Tracing Technology APIs (ITT) to PyTorch (#63289)"
This reverts commit f988aa2b3f.

Reverted https://github.com/pytorch/pytorch/pull/63289 on behalf of https://github.com/malfet due to broke trunk, see f988aa2b3f
2022-06-30 12:49:41 +00:00
Jing Xu
f988aa2b3f Enable Intel® VTune™ Profiler's Instrumentation and Tracing Technology APIs (ITT) to PyTorch (#63289)
More detailed description of benefits can be found at #41001. This is Intel's counterpart of NVidia’s NVTX (https://pytorch.org/docs/stable/autograd.html#torch.autograd.profiler.emit_nvtx).

ITT is a functionality for labeling trace data during application execution across different Intel tools.
For integrating Intel(R) VTune Profiler into Kineto, ITT needs to be integrated into PyTorch first. It works with both standalone VTune Profiler [(https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html](https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html)) and Kineto-integrated VTune functionality in the future.
It works for both Intel CPU and Intel XPU devices.

Pitch
Add VTune Profiler's ITT API function calls to annotate PyTorch ops, as well as developer customized code scopes on CPU, like NVTX for NVidia GPU.

This PR rebases the code changes at https://github.com/pytorch/pytorch/pull/61335 to the latest master branch.

Usage example:
```
with torch.autograd.profiler.emit_itt():
    for i in range(10):
        torch.itt.range_push('step_{}'.format(i))
        model(input)
        torch.itt.range_pop()
```

cc @ilia-cher @robieta @chaekit @gdankel @bitfort @ngimel @orionr @nbcsm @guotuofeng @guyang3532 @gaoteng-git
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63289
Approved by: https://github.com/malfet
2022-06-30 05:14:03 +00:00
PyTorch MergeBot
ec4be38ba9 Revert "To add hipify_torch as a submodule in pytorch/third_party (#74704)"
This reverts commit 93b0fec39d.

Reverted https://github.com/pytorch/pytorch/pull/74704 on behalf of https://github.com/malfet due to broke torchvision
2022-06-21 23:54:00 +00:00
Bhavya Medishetty
93b0fec39d To add hipify_torch as a submodule in pytorch/third_party (#74704)
`hipify_torch` as a submodule in `pytorch/third_party`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74704
Approved by: https://github.com/jeffdaily, https://github.com/malfet
2022-06-21 18:56:49 +00:00
Amit Kumar Chawla
0c78821408 Compilation fix to access pretty_print_onnx function (#79864)
Description:

While using Pytorch header
"torch/csrc/jit/serialization/export.h" got compilation error.

File export_bytecode.h accesses
"#include <torch/csrc/jit/mobile/function.h>"

This mobile folder isn't present in torch installation dir.

This PR adds mobile folder to torch installation setup.

Fixes #79190

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79864
Approved by: https://github.com/ngimel
2022-06-21 18:17:09 +00:00
jjsjann123
c9c402eae9 [nvfuser_upstream_push] Reland: nvfuser code base bump 060822 (#79406)
Landing reverted PR #79147.

Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/

Bug fixes and minor refactor

Squashed commits to WAR github API
Commits that's actually in this PR from the devel branch:

```
4c60e7dff22a494632370e5df55c011007340d06 Add examples infrastructure for using nvFuser in a standalone program (#1725)
02a05d98334ffa580d73ccb28fdb8c577ad296fe Fix issue #1751 (#1753)
8a69aa320bd7629e1709fe5ceb7104d2c88ec84c Refactor NvFuser transpose API to match eager mode behavior (#1746)
ffdf6b7709048170d768217fcd7083fc8387f932 Remove BroadcastWithoutStride. (#1738)
02bab16035e70734450c02124f5cdaa95cf5749d Fix flipping of a boolean flag (#1745)
465d66890c8242e811224359cbdb1c2915490741 cleanup (#1744)
26d354e68720bc7dd2d3b1338ac01b707a230b6a fixing noncontig broadcast (#1742)
856b6b2f9073662dd98ca22ba6c3540e20eb1cdd Add IterDomainBuilder (#1736)
1fd974f912cd4c1e21cbd16e2abb23598d66a02f fixing warning for gcc7 (#1732)
de2740a43a869f8272c2648e091d7b8235097db9 disabling complex in python tests for #1730 (#1733)
fbbbe0a2e7c7a63e0e2719b8bfccb759b714221a fixing MSVC build (#1728)
b5feee5e2b28be688dbddc766f3c0220389c8175 Fix the fused reduction runtime kernel (#1729)
5247682dff5980bb66edf8d3aac25dea2ef2ced5 Re-entrant GroupedGridReduction (#1727)
```

RUN_TORCHBENCH: nvfuser
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79406
Approved by: https://github.com/davidberard98
2022-06-16 17:52:21 +00:00
PyTorch MergeBot
d28e9e145b Revert "[nvfuser_upstream_push] nvfuser code base bump 060822 (#79147)"
This reverts commit 49c41b87a2.

Reverted https://github.com/pytorch/pytorch/pull/79147 on behalf of https://github.com/janeyx99 due to Broke 11.3 builds on trunk 49c41b87a2
2022-06-10 20:55:10 +00:00
jjsjann123
49c41b87a2 [nvfuser_upstream_push] nvfuser code base bump 060822 (#79147)
Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/

Bug fixes and minor refactor

Squashed commits to WAR github API
Commits that's actually in this PR from the devel branch:

```
4c60e7dff22a494632370e5df55c011007340d06 Add examples infrastructure for using nvFuser in a standalone program (#1725)
02a05d98334ffa580d73ccb28fdb8c577ad296fe Fix issue #1751 (#1753)
8a69aa320bd7629e1709fe5ceb7104d2c88ec84c Refactor NvFuser transpose API to match eager mode behavior (#1746)
ffdf6b7709048170d768217fcd7083fc8387f932 Remove BroadcastWithoutStride. (#1738)
02bab16035e70734450c02124f5cdaa95cf5749d Fix flipping of a boolean flag (#1745)
465d66890c8242e811224359cbdb1c2915490741 cleanup (#1744)
26d354e68720bc7dd2d3b1338ac01b707a230b6a fixing noncontig broadcast (#1742)
856b6b2f9073662dd98ca22ba6c3540e20eb1cdd Add IterDomainBuilder (#1736)
1fd974f912cd4c1e21cbd16e2abb23598d66a02f fixing warning for gcc7 (#1732)
de2740a43a869f8272c2648e091d7b8235097db9 disabling complex in python tests for #1730 (#1733)
fbbbe0a2e7c7a63e0e2719b8bfccb759b714221a fixing MSVC build (#1728)
b5feee5e2b28be688dbddc766f3c0220389c8175 Fix the fused reduction runtime kernel (#1729)
5247682dff5980bb66edf8d3aac25dea2ef2ced5 Re-entrant GroupedGridReduction (#1727)
```

RUN_TORCHBENCH: nvfuser
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79147
Approved by: https://github.com/davidberard98
2022-06-10 19:37:42 +00:00
Richard Zou
9da5defff6 Package config/template files with torchgen (#78942)
Package config/template files with torchgen

This PR packages native_functions.yaml, tags.yaml and ATen/templates
with torchgen.

This PR:
- adds a step to setup.py to copy the relevant files over into torchgen
- adds a docstring for torchgen (so `import torchgen; help(torchgen)`
says something)
- adds a helper function in torchgen so you can get the torchgen root
directory (and figure out where the packaged files are)
- changes some scripts to explicitly pass the location of torchgen,
which will be helpful for the first item in the Future section.

Future
======

- torchgen, when invoked from the command line, should use sources
in torchgen/packaged instead of aten/src. I'm unable to do this because
people (aka PyTorch CI) invokes `python -m torchgen.gen` without
installing torchgen.
- the source of truth for all of these files should be in torchgen.
This is a bit annoying to execute on due to potential merge conflicts
and dealing with merge systems
- CI and testing. The way things are set up right now is really fragile,
we should have a CI job for torchgen.

Test Plan
=========
I ran the following locally:

```
python -m torchgen.gen -s torchgen/packaged
```
and verified that it outputted files.

Furthermore, I did a setup.py install and checked that the files are
actually being packaged with torchgen.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78942
Approved by: https://github.com/ezyang
2022-06-07 13:33:55 +00:00
Andrey Talman
ca7f948806 Don't include libiomp with conda install on MacOS (#78632)
Fixes #78490

Following command:
```
conda install pytorch torchvision torchaudio -c pytorch-nightly
```

Installs libiomp . Hence we don't want to package libiomp with conda installs. However, we still keep it for libtorch and wheels.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78632
Approved by: https://github.com/malfet
2022-06-01 22:06:16 +00:00
Antonio Kim
f3f327e103 Decouple LTC from TS Backend using Lazy IR Builder
Next stage of breaking up https://github.com/pytorch/pytorch/pull/74710

IR builder class introduced to decouple the explicit usage of `TsNode` in core lazy tensors.

Requires https://github.com/pytorch/pytorch/pull/75324 to be merged in first.

**Background**
- there are ~ 5 special ops used in lazy core but defined as :public {Backend}Node.  (DeviceData, Expand, Scalar...)
- we currently require all nodes derive from {Backend}Node, so that backends can make this assumption safely
- it is hard to have shared 'IR classes' in core/ because they depend on 'Node'

**Motivation**

1. avoid copy-paste of "special" node classes for each backend
2. in general decouple and remove all dependencies that LTC has on the TS backend

**Summary of changes**
- new 'IRBuilder' interface that knows how to make 5 special ops
- move 'special' node classes to `ts_backend/`
- implement TSIRBuilder that makes the special TS Nodes
- new backend interface API to get the IRBuilder
- update core code to call the builder

CC: @wconstab @JackCaoG @henrytwo

Partially Fixes #74628

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75433
Approved by: https://github.com/wconstab
2022-04-28 02:07:02 +00:00
Edward Z. Yang
5109d81fc5 Distribute torchgen as part of PyTorch package
Fixes https://github.com/pytorch/pytorch/issues/73212

Signed-off-by: Edward Z. Yang <ezyangfb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/76306

Approved by: https://github.com/zou3519
2022-04-25 20:15:22 +00:00
Yeounoh Chung
0428364cbf Add missing LTC headers, re-enble xla configuration
Addresses XLA test failures due to missing PyTorch lazy tensor backend headers:
```
“fatal error: ‘torch/csrc/lazy/backend/backend_device.h’ file not found” from pytorch-xla-linux-bionic-py3.7-clang8
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74756
Approved by: https://github.com/seemethere
2022-03-28 20:07:20 +00:00
Han Qi
75d6cbe605 [4/5]Testing jit module in flatbuffer in Python. (#74387)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74387

Make temporary python bindings for flatbuffer to test ScriptModule save / load.

(Note: this ignores all push blocking failures!)

Test Plan: unittest

Reviewed By: iseeyuan

Differential Revision: D34968080

fbshipit-source-id: d23b16abda6e4b7ecf6b1198ed6e00908a3db903
(cherry picked from commit 5cbbc390c5f54146a1c469106ab4a6286c754325)
2022-03-24 23:29:47 +00:00
Sahan Paliskara
238d01ec90 Allow torch/csrc/deploy/interpreter/Optional.hpp to be allowed into the wheel distribution (#74643)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74643

Previously `torch/csrc/deploy/interpreter/Optional.hpp` wasn't getting included in the wheel distribution created by `USE_DEPLOY=1 python setup.py bdist_wheel`, this pr fixes that

Test Plan: Imported from OSS

Reviewed By: d4l3k

Differential Revision: D35094459

Pulled By: PaliC

fbshipit-source-id: 50aea946cc5bb72720b993075bd57ccf8377db30
(cherry picked from commit 6ad5d96594f40af3d49d2137c2b3799a2d493b36)
2022-03-24 00:47:57 +00:00
Kevin Tse
ff3688f07a [BE Hackathon][DataPipe] Automatically generate datapipe.pyi via CMake (#73991)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73991

Automatically generate `datapipe.pyi` via CMake and removing the generated .pyi file from Git. Users should have the .pyi file locally after building for the first time.

I will also be adding an internal equivalent diff for buck.

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D34868001

Pulled By: NivekT

fbshipit-source-id: 448c92da659d6b4c5f686407d3723933c266c74f
(cherry picked from commit 306dbc5f469e63bc141dac57ef310e6f0e16d9cd)
2022-03-15 14:46:34 +00:00
Ashwin Hari
7ed73b2803 CMake option for using static MKL libraries
Fixes #70587

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73069
Approved by: https://github.com/malfet
2022-03-07 19:32:33 +00:00
Mengwei Liu
9ce9803abe [PyTorch] Add codegen unboxing ability (#69881)
Summary:
RFC: https://github.com/pytorch/rfcs/pull/40

This PR (re)introduces python codegen for unboxing wrappers. Given an entry of `native_functions.yaml` the codegen should be able to generate the corresponding C++ code to convert ivalues from the stack to their proper types. To trigger the codegen, run
```
tools/jit/gen_unboxing.py -d cg/torch/share/ATen
```

Merged changes on CI test. In https://github.com/pytorch/pytorch/issues/71782 I added an e2e test for static dispatch + codegen unboxing. The test exports a mobile model of mobilenetv2, load and run it on a new binary for lite interpreter: `test/mobile/custom_build/lite_predictor.cpp`.

## Lite predictor build specifics

1. Codegen: `gen.py` generates `RegisterCPU.cpp` and `RegisterSchema.cpp`. Now with this PR, once `static_dispatch` mode is enabled, `gen.py` will not generate `TORCH_LIBRARY` API calls in those cpp files, hence avoids interaction with the dispatcher. Once `USE_LIGHTWEIGHT_DISPATCH` is turned on, `cmake/Codegen.cmake` calls `gen_unboxing.py` which generates `UnboxingFunctions.h`, `UnboxingFunctions_[0-4].cpp` and `RegisterCodegenUnboxedKernels_[0-4].cpp`.
2. Build: `USE_LIGHTWEIGHT_DISPATCH` adds generated sources into `all_cpu_cpp` in `aten/src/ATen/CMakeLists.txt`. All other files remain unchanged. In reality all the `Operators_[0-4].cpp` are not necessary but we can rely on linker to strip them off.

## Current CI job test coverage update

Created a new CI job `linux-xenial-py3-clang5-mobile-lightweight-dispatch-build` that enables the following build options:
* `USE_LIGHTWEIGHT_DISPATCH=1`
* `BUILD_LITE_INTERPRETER=1`
* `STATIC_DISPATCH_BACKEND=CPU`

This job triggers `test/mobile/lightweight_dispatch/build.sh` and builds `libtorch`. Then the script runs C++ tests written in `test_lightweight_dispatch.cpp` and `test_codegen_unboxing.cpp`. Recent commits added tests to cover as many C++ argument type as possible: in `build.sh` we installed PyTorch Python API so that we can export test models in `tests_setup.py`. Then we run C++ test binary to run these models on lightweight dispatch enabled runtime.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/69881

Reviewed By: iseeyuan

Differential Revision: D33692299

Pulled By: larryliu0820

fbshipit-source-id: 211e59f2364100703359b4a3d2ab48ca5155a023
(cherry picked from commit 58e1c9a25e3d1b5b656282cf3ac2f548d98d530b)
2022-03-01 23:28:13 +00:00
Luca Wehrstedt
b213041df3 Also install c10d headers with .h extension (#73422)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73422

Fixes https://github.com/pytorch/pytorch/issues/73421
ghstack-source-id: 149978120

Test Plan: None

Reviewed By: cbalioglu

Differential Revision: D34475711

fbshipit-source-id: 9e4d1d57021cbff51f53762b32bbfffbf3f81c4c
(cherry picked from commit 72ff35e28242132cf20e538d43ad3b63b3e497b1)
2022-02-28 08:39:10 +00:00
Nikita Shulga
dc5cda0cca Update min python version to 3.7 in setup.py and mypy configs (#71494)
Summary:
As Python-3.6 have reached EOL

Pull Request resolved: https://github.com/pytorch/pytorch/pull/71494

Reviewed By: atalman

Differential Revision: D33667509

Pulled By: malfet

fbshipit-source-id: ab1f03085cfb9161df77ba5ce373b81f5e7ef3ae
(cherry picked from commit 60343166d9)
2022-01-20 00:03:57 +00:00
Taylor Robie
ebc66bfeea [Profiler] Pull helper methods into dedicated file. (And start torch/csrc/profiler folder. (#69255)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69255

One thing that I've found as I optimize profier is that there's a lot of intermingled code, where the kineto profiler relies on the legacy (autograd) profiler for generic operations. This made optimization hard because I had to manage too many complex dependencies. (Exaserbated by the USE_KINETO #ifdef's sprinkled around.) This PR is the first of several to restructure the profiler(s) so the later optimizations go in easier.

Test Plan: Unit tests

Reviewed By: aaronenyeshi

Differential Revision: D32671972

fbshipit-source-id: efa83b40dde4216f368f2a5fa707360031a85707
2021-12-16 10:33:47 -08:00
Peter Bell
4829dcea09 Codegen: Generate seperate headers per operator (#68247)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68247

This splits `Functions.h`, `Operators.h`, `NativeFunctions.h` and
`NativeMetaFunctions.h` into seperate headers per operator base name.
With `at::sum` as an example, we can include:
```cpp
<ATen/core/sum.h>         // Like Functions.h
<ATen/core/sum_ops.h>     // Like Operators.h
<ATen/core/sum_native.h>  // Like NativeFunctions.h
<ATen/core/sum_meta.h>    // Like NativeMetaFunctions.h
```

The umbrella headers are still being generated, but all they do is
include from the `ATen/ops' folder.

Further, `TensorBody.h` now only includes the operators that have
method variants. Which means files that only include `Tensor.h` don't
need to be rebuilt when you modify function-only operators. Currently
there are about 680 operators that don't have method variants, so this
is potentially a significant win for incremental builds.

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D32596272

Pulled By: albanD

fbshipit-source-id: 447671b2b6adc1364f66ed9717c896dae25fa272
2021-12-14 06:40:08 -08:00
Jithun Nair
8dfdc3df82 [ROCm] Refactor how to specify AMD gpu targets using PYTORCH_ROCM_ARCH (#61706)
Summary:
Remove all hardcoded AMD gfx targets

PyTorch build and Magma build will use rocm_agent_enumerator as
backup if PYTORCH_ROCM_ARCH env var is not defined

PyTorch extensions will use same gfx targets as the PyTorch build,
unless PYTORCH_ROCM_ARCH env var is defined

torch.cuda.get_arch_list() now works for ROCm builds

PyTorch CI dockers will continue to be built for gfx900 and gfx906 for now.

PYTORCH_ROCM_ARCH env var can be a space or semicolon separated list of gfx archs eg. "gfx900 gfx906" or "gfx900;gfx906"
cc jeffdaily sunway513 jithunnair-amd ROCmSupport KyleCZH

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61706

Reviewed By: seemethere

Differential Revision: D32735862

Pulled By: malfet

fbshipit-source-id: 3170e445e738e3ce373203e1e4ae99c84e645d7d
2021-12-13 15:41:40 -08:00
Michael Suo
ad182479b0 [deploy] docs (#69251)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69251

This adds some actual documentation for deploy, which is probably useful
since we told everyone it was experimentally available so they will
probably be looking at what the heck it is.

It also wires up various compoenents of the OSS build to actually work
when used from an external project.

Differential Revision:
D32783312
D32783312

Test Plan: Imported from OSS

Reviewed By: wconstab

Pulled By: suo

fbshipit-source-id: c5c0a1e3f80fa273b5a70c13ba81733cb8d2c8f8
2021-12-01 21:55:18 -08:00
Eli Uriegas
f398320e0d packaging: Include lazy headers in package_data (#68817)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68817

Looks like these files are getting used by downstream xla so we need to
include them in our package_data

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D32622241

Pulled By: seemethere

fbshipit-source-id: 7b64e5d4261999ee58bc61185bada6c60c2bb5cc
2021-11-29 08:29:48 -08:00