pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-06 12:20:52 +01:00

Author	SHA1	Message	Date
PyTorch MergeBot	8eb579e362	Revert "[Profiler] Move legacy profiler out of `torch/csrc/autograd` (#85512 )" This reverts commit `157a3d2a7c`. Reverted https://github.com/pytorch/pytorch/pull/85512 on behalf of https://github.com/DanilBaibak due to Due to files were deleted, the internal build failed. Please re-submit via codev.	2022-10-14 14:56:59 +00:00
Taylor Robie	157a3d2a7c	[Profiler] Move legacy profiler out of `torch/csrc/autograd` (#85512 ) The legacy profiler is an eyesore in the autograd folder. At this point the implementation is almost completely decoupled from the rest of profiler, and it is in maintenance mode pending deprecation. As a result, I'm moving it to `torch/csrc/profiler/standalone`. Unfortunately BC requires that the symbols remain in `torch::autograd::profiler`, so I've put some basic forwarding logic in `torch/csrc/autograd/profiler.h`. One strange bit is that `profiler_legacy.h` forward declares `torch::autograd::Node`, but doesn't seem to do anything with it. I think we can delete it, but I want to test to make sure. (Note: this should not land until https://github.com/pytorch/torchrec/pull/595 is landed.) Differential Revision: [D39108648](https://our.internmc.facebook.com/intern/diff/D39108648/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85512 Approved by: https://github.com/aaronenyeshi	2022-10-14 05:38:48 +00:00
Taylor Robie	b8f14b7877	[Profiler][Minor] Group and consolidate stub APIs (#85510 ) There is a concept in profiler of a stub that wraps a profiling API. It was introduced for CUDA profiling before Kineto, and ITT has adopted it to call into VTune APIs. However for the most part we don't really interact with them when developing the PyTorch profiler. Thus it makes sense to unify the fallback registration mechanism and create a subfolder to free up real estate in the top level `torch/csrc/profiler` directory. Differential Revision: [D39108647](https://our.internmc.facebook.com/intern/diff/D39108647/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/85510 Approved by: https://github.com/aaronenyeshi	2022-10-14 05:38:46 +00:00
Jason Ansel	c7c09722ad	Move TorchDynamo into PyTorch core (#86461 ) Context: https://github.com/pytorch/torchdynamo/issues/1588 This PR moves [TorchDynamo](https://github.com/pytorch/torchdynamo) and TorchInductor into PyTorch core. - `torchdynamo` becomes `torch._dynamo` - `torchinductor` becomes `torch._inductor` This PR was generated by running `copy_to_core.sh` in https://github.com/pytorch/torchdynamo/pull/1538 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86461 Approved by: https://github.com/voznesenskym	2022-10-13 23:18:06 +00:00
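Because #86461 renames `torchdynamo` to `torch._dynamo`, code written against the standalone package may want to check which name is importable. The following is a minimal sketch, assuming only the two module names given in the commit message; the helper `find_dynamo_module` is hypothetical and not part of PyTorch.

```python
import importlib.util
from typing import Optional

# Module names taken from the commit message: the in-core name first,
# then the older standalone package name.
CANDIDATES = ("torch._dynamo", "torchdynamo")


def find_dynamo_module() -> Optional[str]:
    """Return the first importable TorchDynamo module name, or None.

    Hypothetical helper for illustration; not part of PyTorch.
    """
    for name in CANDIDATES:
        try:
            # For dotted names, find_spec imports the parent package, so a
            # missing `torch` surfaces as an ImportError subclass here.
            if importlib.util.find_spec(name) is not None:
                return name
        except ImportError:
            continue
    return None


if __name__ == "__main__":
    print(find_dynamo_module())
```

The helper only reports a name; the caller can then pass it to `importlib.import_module` if it actually needs the module.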
Jason Ansel	f1fdb6efbd	Manual changes for moving dynamo to core (#86621 ) This is the subset of the changes in #86461 not auto-generated by `copy_to_core.sh`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86621 Approved by: https://github.com/albanD	2022-10-11 23:01:21 +00:00
Sahan Paliskara	936e93058b	Delete torch::deploy from pytorch core (#85953 ) As we have migrated torch::deploy over to https://github.com/pytorch/multipy, we can now delete it from pytorch core as ongoing development will happen there. This PR was created due to syncing issues with https://github.com/pytorch/pytorch/pull/85443 which is where the review history can be found. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85953 Approved by: https://github.com/seemethere, https://github.com/malfet	2022-10-06 07:20:16 +00:00
Min Si	089a64e99e	Install c10d headers with absolute path (#86257 ) https://github.com/pytorch/pytorch/pull/85780 updated all c10d headers in pytorch to use absolute path following the other distributed components. However, the headers were still copied to `${TORCH_INSTALL_INCLUDE_DIR}/torch`, thus external extensions still have to reference the c10d headers as `<c10d/*.h>`, making the usage inconsistent (the only exception was c10d/exception.h, which was copied to `${TORCH_INSTALL_INCLUDE_DIR}/torch/csrc/distributed/c10d`). This patch fixes the installation step to copy all c10d headers to `${TORCH_INSTALL_INCLUDE_DIR}/torch/csrc/distributed/c10d`, thus external extensions can consistently reference c10d headers with the absolute path. Pull Request resolved: https://github.com/pytorch/pytorch/pull/86257 Approved by: https://github.com/kumpera	2022-10-05 20:02:05 +00:00
Jane Xu	3cdf621fe5	Add opt-einsum to CI (#85574 ) Depends on https://github.com/pytorch/pytorch/pull/84890. This PR adds opt_einsum to CI, enabling path optimization for the multi-input case. It also updates the installation sites to install torch with einsum, but those are mostly to make sure it would work on the user's end (as opt-einsum would have already been installed in the docker or in prior set up steps). This PR also updates the windows build_pytorch.bat script to use the same bdist_wheel and install commands as on Linux, replacing the `setup.py install` that'll become deprecated. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85574 Approved by: https://github.com/huydhn, https://github.com/soulitzer	2022-09-29 14:28:55 +00:00
Jane Xu	e7e1cd945f	Add path optimize kwarg to einsum (#84890 ) ## This PR seeks to: - [x] add c++ support for an optimize path - [x] add python opt_einsum path passthrough - [x] add opt_einsum to OSS requirements, but a soft one - [x] show benchmark results here Additional things I've explored + their conclusions: - Delaying the summing over dimensions => added! - The idea here is to not incur kernel calls to `sum` as we try to early sum out in einsum. Thus, we collect all the dimensions that need to be summed together in one contraction + sum at the end instead of summing as we go. While this optimization didn't feel like it made things faster for the random cases we've selected (they all summed 1 dim per contraction), it is a good principle and would help more common use cases that would reduce multiple dimensions at a time (like `bxy,xyi,xyj->bij`). - Caching contract_path based on equation and tensor sizes => dropped :( - The benchmarks were strictly worse for all the cases, and, from scanning the use cases, I observed people do not often call einsum on the same equation/tensor order enough for caching to be justified. I do think caching can be effective in the future, but it would require further investigation. ## Not a part of this PR (but are next steps): - adding opt_einsum package to OSS CI - adding it to internal CI - potentially adding a kwarg path argument to the python API -- if the path is given, we wouldn't have to spend time calculating it, but there would be some time lost validating user input. ## Testing: - Added more tests to CI ## Benchmarking: TL;DRs - torch.einsum with opt_einsum is a definite win for the production case. - torch.einsum with opt_einsum installed is consistently fast, but has an overhead of needing to find the path. If the path is already found/optimal, it will be slightly slower. - The einsum overhead decreases for bigger dimensions. 
- torch.einsum without opt_einsum installed is comparable to before this commit, with occasional slowness potentially due to not reshaping/squeezing as we contract until the end. - For many of the random generated cases, the dimensions were too similar and small where an optimal order wasn't that much more optimal than just going left to right. However, in production, dimensions are commonly quite distinct (batch size will be small, but the data will be huge). - torch.einsum opt is comparable (slightly faster overall) compared to numpy.einsum opt for the cpu case. This is interesting given that torch.einsum currently spends time computing the path, but numpy.einsum takes it as input. - torch.einsum opt is significantly faster than numpy.einsum opt for the gpu case. This is because numpy doesn't take advantage of GPUs. The following benchmarks were done on an A100 GPU and Linux CPUs. The line in the first chart separates GPU (on top) from CPU, and the line in the second graph separates CPU (on top) and then GPU. Sorry it's flipped 😛 . 
Production example (see [colab benchmark](https://colab.research.google.com/drive/1V2s4v1dOOKwRvp5T_DC-PNUosOV9FFJx?authuser=1#scrollTo=WZoQkC8Mdt6I) for more context): <img width="1176" alt="image" src="https://user-images.githubusercontent.com/31798555/192012636-9a68bfa7-2601-43b1-afeb-b4e0877db6a4.png"> Randomly generated examples (the same ones as in https://github.com/pytorch/pytorch/pull/60191) <img width="1176" alt="image" src="https://user-images.githubusercontent.com/31798555/192012804-1c639595-b3e6-48c9-a385-ad851c13e1c2.png"> Open below to see old + not super relevant benchmarking results: <details> Benchmark results BEFORE this PR (on Linux -- I will update devices so they are consistent later): <img width="776" alt="image" src="https://user-images.githubusercontent.com/31798555/190807274-18f71fce-556e-47f4-b18c-e0f7d0c0d5aa.png"> Benchmark results with the code on this PR (on my x86 mac): For the CPU internal use case -- ![image](https://user-images.githubusercontent.com/31798555/190801376-6f591b00-cebd-4ca7-bb23-ae8f17f1634e.png) For the general use case -- It looks like numpy opt still does better in several of these random cases, but torch einsum opt is consistently faster than torch.einsum. ![image](https://user-images.githubusercontent.com/31798555/190811730-fbb6797d-af59-4f5a-92da-ba4103372014.png) <details> Pull Request resolved: https://github.com/pytorch/pytorch/pull/84890 Approved by: https://github.com/albanD, https://github.com/soulitzer	2022-09-24 03:47:36 +00:00
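The benchmarking notes in #84890 observe that contraction order matters little when dimensions are similar and a lot when they are distinct. The following pure-Python sketch illustrates that point by counting scalar multiplications for a chain of three matrices. It is not how `torch.einsum` or opt_einsum compute paths; the function names and shapes are made up for illustration.

```python
def matmul_cost(m, k, n):
    """Scalar multiplications for an (m x k) @ (k x n) product."""
    return m * k * n


def chain_costs(a_shape, b_shape, c_shape):
    """Return (cost of (A @ B) @ C, cost of A @ (B @ C)).

    Illustrative helper only; shapes are (rows, cols) tuples.
    """
    (m, k), (k2, n), (n2, p) = a_shape, b_shape, c_shape
    if k != k2 or n != n2:
        raise ValueError("inner dimensions do not match")
    left_to_right = matmul_cost(m, k, n) + matmul_cost(m, n, p)
    right_first = matmul_cost(k, n, p) + matmul_cost(m, k, p)
    return left_to_right, right_first


if __name__ == "__main__":
    # Similar dimensions: both orders cost the same.
    print(chain_costs((32, 32), (32, 32), (32, 32)))
    # One small dimension at the end: contracting B @ C first is far cheaper.
    print(chain_costs((1000, 1000), (1000, 1000), (1000, 4)))
```

In the second case going left to right costs 1,004,000,000 multiplications while contracting the last two operands first costs 8,000,000, which is the kind of gap a path optimizer is meant to find.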
atalman	eb94df28c7	Use pip install cu117 (#85097 ) Creates new wheel workflow specific to CUDA 11.7 that does not bundle the cudnn and cublas. Workflow: https://github.com/pytorch/pytorch/actions/runs/3094622781 New Package: manywheel-py3_10-cuda11_7-with-pypi-cudnn \| 843 MB Old Package: manywheel-py3_10-cuda11_7 \| 1.65 GB Testing workflow: [manywheel-py3_7-cuda11_7-with-pypi-cudnn-build / build](https://github.com/pytorch/pytorch/actions/runs/3091145546/jobs/5000867662#logs): ``` Bundling without cudnn and cublas. + DEPS_LIST=("/usr/local/cuda/lib64/libcudart.so.11.0" "/usr/local/cuda/lib64/libnvToolsExt.so.1" "/usr/local/cuda/lib64/libnvrtc.so.11.2" "/usr/local/cuda/lib64/libnvrtc-builtins.so.11.7" "$LIBGOMP_PATH") + DEPS_SONAME=("libcudart.so.11.0" "libnvToolsExt.so.1" "libnvrtc.so.11.2" "libnvrtc-builtins.so.11.7" "libgomp.so.1") ..... pytorch_extra_install_requirements: nvidia-cuda-runtime-cu11, nvidia-cudnn-cu11, nvidia-cublas-cu11 ``` [manywheel-py3_7-cuda11_7-build / build](https://github.com/pytorch/pytorch/actions/runs/3091145546/jobs/5000863250#logs) ``` Bundling with cudnn and cublas. 
+ DEPS_LIST=("/usr/local/cuda/lib64/libcudart.so.11.0" "/usr/local/cuda/lib64/libnvToolsExt.so.1" "/usr/local/cuda/lib64/libnvrtc.so.11.2" "/usr/local/cuda/lib64/libnvrtc-builtins.so.11.7" "/usr/local/cuda/lib64/libcudnn_adv_infer.so.8" "/usr/local/cuda/lib64/libcudnn_adv_train.so.8" "/usr/local/cuda/lib64/libcudnn_cnn_infer.so.8" "/usr/local/cuda/lib64/libcudnn_cnn_train.so.8" "/usr/local/cuda/lib64/libcudnn_ops_infer.so.8" "/usr/local/cuda/lib64/libcudnn_ops_train.so.8" "/usr/local/cuda/lib64/libcudnn.so.8" "/usr/local/cuda/lib64/libcublas.so.11" "/usr/local/cuda/lib64/libcublasLt.so.11" "$LIBGOMP_PATH") + DEPS_SONAME=("libcudart.so.11.0" "libnvToolsExt.so.1" "libnvrtc.so.11.2" "libnvrtc-builtins.so.11.7" "libcudnn_adv_infer.so.8" "libcudnn_adv_train.so.8" "libcudnn_cnn_infer.so.8" "libcudnn_cnn_train.so.8" "libcudnn_ops_infer.so.8" "libcudnn_ops_train.so.8" "libcudnn.so.8" "libcublas.so.11" "libcublasLt.so.11" "libgomp.so.1") ``` cc: @malfet @ptrblck Pull Request resolved: https://github.com/pytorch/pytorch/pull/85097 Approved by: https://github.com/malfet	2022-09-21 16:30:25 +00:00
Nikita Shulga	d05a11337c	[CMake] Add functorch target (#83464 ) Move functorch/functorch into `functorch` folder - Add functorch/CMakeLists.txt that adds `functorch` native python extension - Modify `setup.py` to package pytorch and functorch together into a single wheel - Modify `functorch.__version__` is not equal to that of `torch.__version__` - Add dummy `functorch/setup.py` file for the projects that still want to build it Differential Revision: [D39058811](https://our.internmc.facebook.com/intern/diff/D39058811) Pull Request resolved: https://github.com/pytorch/pytorch/pull/83464 Approved by: https://github.com/zou3519	2022-09-14 00:05:33 +00:00
Kento Nozawa	5238404f4d	Increment `version_range_max` (#84815 ) Python 3.10 should be added as a listing in `Programming Language` on https://pypi.org/project/torch/: <img width="238" alt="Screenshot 2022-09-11 at 2 48 01" src="https://user-images.githubusercontent.com/7121753/189495599-72bd6a28-4248-4e4e-8194-b5b1f9e984e2.png"> Pull Request resolved: https://github.com/pytorch/pytorch/pull/84815 Approved by: https://github.com/malfet	2022-09-12 21:38:16 +00:00
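To see why incrementing an upper bound adds a `Programming Language` entry, here is a minimal sketch of building trove classifiers from a version range. It assumes the bound is exclusive, as with Python's `range`. Only the name `version_range_max` comes from the commit title; the function, its other parameter, and the concrete values are hypothetical.

```python
def python_classifiers(min_minor, version_range_max):
    """Build 'Programming Language' classifiers for Python 3.<minor>.

    `version_range_max` is treated as an exclusive upper bound (an
    assumption for this sketch), so 3.10 is listed only when it is 11.
    """
    return [
        f"Programming Language :: Python :: 3.{minor}"
        for minor in range(min_minor, version_range_max)
    ]


if __name__ == "__main__":
    for line in python_classifiers(7, 11):
        print(line)
```

With an exclusive bound of 10 the list stops at 3.9, which would match the missing PyPI listing the commit describes.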
Driss Guessous	0fc02dbba4	flash_attention integration (#81434 ) # Summary: - I added a new submodule Cutlass pointing to 2.10 release. The inclusion of flash_attention code should be gated by the flag: USE_FLASH_ATTENTION. This defaults to off, so flash is not built anywhere. This is done on purpose since we don't have A100 machines to compile and test on. - Only looked at CMake; did not attempt bazel or buck yet. - I included the mha_fwd from flash_attention that has been refactored to use cutlass 2.10. There is currently no backwards kernel on this branch. That would be a good follow up. Pull Request resolved: https://github.com/pytorch/pytorch/pull/81434 Approved by: https://github.com/cpuhrsch	2022-09-09 20:11:26 +00:00
Richard Zou	0a89bdf989	Set up aten/src/ATen/functorch directory; move some files there (#84648 ) This PR: - sets up aten/src/ATen/functorch in PyTorch's build system - Moves {BatchedTensorImpl.h, and BatchedTensorImpl.cpp} there as a test. Test Plan: - functorch build and test should pass Differential Revision: [D39315051](https://our.internmc.facebook.com/intern/diff/D39315051) Pull Request resolved: https://github.com/pytorch/pytorch/pull/84648 Approved by: https://github.com/ezyang	2022-09-09 15:22:57 +00:00
Taylor Robie	bea0184033	Reland: [Profiler][Trivial] Create orchestration folder and move observer management there. (#83893 )" (#84667 ) Reland of https://github.com/pytorch/pytorch/pull/83893 Differential Revision: [D39282536](https://our.internmc.facebook.com/intern/diff/D39282536/) NOTE FOR REVIEWERS: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D39282536/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/84667 Approved by: https://github.com/slgong-fb	2022-09-08 17:09:19 +00:00
PyTorch MergeBot	8b578849b4	Revert "[Profiler][Trivial] Create orchestration folder and move observer management there. (#83893 )" This reverts commit `48a596ad3f`. Reverted https://github.com/pytorch/pytorch/pull/83893 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally	2022-09-01 18:34:58 +00:00
Nikita Shulga	4b8ae04788	[BE] Delete torch._dl extension (#84361 ) And lots of complexity around the availability of RTLD_GLOBAL flags in `os` module As this flag is always present since Python-3.3, see https://docs.python.org/3/library/os.html#os.RTLD_GLOBAL Fixes https://github.com/pytorch/pytorch/issues/84351 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84361 Approved by: https://github.com/kit1980	2022-08-31 19:59:31 +00:00
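Since `os.RTLD_GLOBAL` is available directly from the standard library, as #84361 notes, a caller can read it from `os` without a helper extension. The following is a minimal sketch, assuming a platform with `dlopen` support; the function name is hypothetical and this is not the code PyTorch uses.

```python
import os
import sys


def flags_with_rtld_global():
    """Return the current dlopen flags with RTLD_GLOBAL added.

    Returns None on platforms without dlopen support (for example
    Windows), where sys.getdlopenflags does not exist.
    """
    if not hasattr(sys, "getdlopenflags"):
        return None
    return sys.getdlopenflags() | os.RTLD_GLOBAL


if __name__ == "__main__":
    print(flags_with_rtld_global())
```

A caller that really needs global symbol visibility could pass the result to `sys.setdlopenflags` before importing an extension module and restore the previous flags afterwards.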
Taylor Robie	48a596ad3f	[Profiler][Trivial] Create orchestration folder and move observer management there. (#83893 ) Just a basic move. Later I'll add other subsystems. (Python, Kineto) Differential Revision: [D38925895](https://our.internmc.facebook.com/intern/diff/D38925895/) NOTE FOR REVIEWERS: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D38925895/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/83893 Approved by: https://github.com/slgong-fb	2022-08-30 21:40:59 +00:00
Nikita Shulga	91e754b268	[BE] setup.py refactors (#83635 ) No function changes, just move stuff around: - Move main code to `main` routine - Define torch and torchgen package data list in local vars Pull Request resolved: https://github.com/pytorch/pytorch/pull/83635 Approved by: https://github.com/kit1980	2022-08-21 14:50:39 +00:00
Yeounoh Chung	8707aabe9a	Bundle lazy ts backend (#82384 ) ### Description <!-- What did you change and why was it needed? --> `libtorch.so` is missing `lazy/ts_backend`, which is breaking the XLA build/test pipeline. ### Issue <!-- Link to Issue ticket or RFP --> This currently blocks #82342 and #78182 ### Testing <!-- How did you test your change? --> https://github.com/pytorch/pytorch/runs/7551019518?check_suite_focus=true Pull Request resolved: https://github.com/pytorch/pytorch/pull/82384 Approved by: https://github.com/albanD	2022-07-28 16:55:29 +00:00
Kurt Mohler	863176a1c7	Remove `torch/csrc/generic` (#82373 ) ### Description Remove `torch/csrc/generic` since it is no longer needed. ### Issue #82372 ### Testing No tests added Pull Request resolved: https://github.com/pytorch/pytorch/pull/82373 Approved by: https://github.com/ezyang	2022-07-28 07:45:31 +00:00
Sergii Dymchenko	3cf9c3d876	Remove obsolete Python < 3.3 TODO (#82278 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/82278 Approved by: https://github.com/huydhn	2022-07-27 02:36:14 +00:00
mattip	37474a54de	create a concated LICENSE file for wheels (#81500 ) Fixes #81181 by creating a temporary LICENCE file that has all the third-party licenses concatenated together when creating a wheel. Also update the `third_party/LICENSES_BUNDLED.txt` file. The `third_party/LICENSES_BUNDLED.txt` file is supposed to be tested via `tests/test_license.py`, but the test is not running? Pull Request resolved: https://github.com/pytorch/pytorch/pull/81500 Approved by: https://github.com/rgommers, https://github.com/seemethere	2022-07-18 14:02:37 +00:00
Antonio Kim	65d03b1024	Add missing LTC headers to setup.py (#81424 ) A number of headers that are not packaged but required for building vendor lazy tensor backends Fixes #81423 CC: @wconstab @desertfire @ke1337 @henrytwo Pull Request resolved: https://github.com/pytorch/pytorch/pull/81424 Approved by: https://github.com/malfet	2022-07-14 00:30:27 +00:00
Jing Xu	3c7044728b	Enable Intel® VTune™ Profiler's Instrumentation and Tracing Technology APIs (ITT) to PyTorch (#63289 ) More detailed description of benefits can be found at #41001. This is Intel's counterpart of NVidia’s NVTX (https://pytorch.org/docs/stable/autograd.html#torch.autograd.profiler.emit_nvtx). ITT is a functionality for labeling trace data during application execution across different Intel tools. For integrating Intel(R) VTune Profiler into Kineto, ITT needs to be integrated into PyTorch first. It works with both standalone VTune Profiler [(https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html](https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html)) and Kineto-integrated VTune functionality in the future. It works for both Intel CPU and Intel XPU devices. Pitch Add VTune Profiler's ITT API function calls to annotate PyTorch ops, as well as developer customized code scopes on CPU, like NVTX for NVidia GPU. This PR rebases the code changes at https://github.com/pytorch/pytorch/pull/61335 to the latest master branch. Usage example: ``` with torch.autograd.profiler.emit_itt(): for i in range(10): torch.itt.range_push('step_{}'.format(i)) model(input) torch.itt.range_pop() ``` cc @ilia-cher @robieta @chaekit @gdankel @bitfort @ngimel @orionr @nbcsm @guotuofeng @guyang3532 @gaoteng-git Pull Request resolved: https://github.com/pytorch/pytorch/pull/63289 Approved by: https://github.com/malfet	2022-07-13 13:50:15 +00:00
PyTorch MergeBot	1454515253	Revert "Enable Intel® VTune™ Profiler's Instrumentation and Tracing Technology APIs (ITT) to PyTorch (#63289 )" This reverts commit `f988aa2b3f`. Reverted https://github.com/pytorch/pytorch/pull/63289 on behalf of https://github.com/malfet due to broke trunk, see `f988aa2b3f`	2022-06-30 12:49:41 +00:00
Jing Xu	f988aa2b3f	Enable Intel® VTune™ Profiler's Instrumentation and Tracing Technology APIs (ITT) to PyTorch (#63289 ) More detailed description of benefits can be found at #41001. This is Intel's counterpart of NVidia’s NVTX (https://pytorch.org/docs/stable/autograd.html#torch.autograd.profiler.emit_nvtx). ITT is a functionality for labeling trace data during application execution across different Intel tools. For integrating Intel(R) VTune Profiler into Kineto, ITT needs to be integrated into PyTorch first. It works with both standalone VTune Profiler [(https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html](https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html)) and Kineto-integrated VTune functionality in the future. It works for both Intel CPU and Intel XPU devices. Pitch Add VTune Profiler's ITT API function calls to annotate PyTorch ops, as well as developer customized code scopes on CPU, like NVTX for NVidia GPU. This PR rebases the code changes at https://github.com/pytorch/pytorch/pull/61335 to the latest master branch. Usage example: ``` with torch.autograd.profiler.emit_itt(): for i in range(10): torch.itt.range_push('step_{}'.format(i)) model(input) torch.itt.range_pop() ``` cc @ilia-cher @robieta @chaekit @gdankel @bitfort @ngimel @orionr @nbcsm @guotuofeng @guyang3532 @gaoteng-git Pull Request resolved: https://github.com/pytorch/pytorch/pull/63289 Approved by: https://github.com/malfet	2022-06-30 05:14:03 +00:00
PyTorch MergeBot	ec4be38ba9	Revert "To add hipify_torch as a submodule in pytorch/third_party (#74704 )" This reverts commit `93b0fec39d`. Reverted https://github.com/pytorch/pytorch/pull/74704 on behalf of https://github.com/malfet due to broke torchvision	2022-06-21 23:54:00 +00:00
Bhavya Medishetty	93b0fec39d	To add hipify_torch as a submodule in pytorch/third_party (#74704 ) `hipify_torch` as a submodule in `pytorch/third_party` Pull Request resolved: https://github.com/pytorch/pytorch/pull/74704 Approved by: https://github.com/jeffdaily, https://github.com/malfet	2022-06-21 18:56:49 +00:00
Amit Kumar Chawla	0c78821408	Compilation fix to access pretty_print_onnx function (#79864 ) Description: While using Pytorch header "torch/csrc/jit/serialization/export.h" got compilation error. File export_bytecode.h accesses "#include <torch/csrc/jit/mobile/function.h>" This mobile folder isn't present in torch installation dir. This PR adds mobile folder to torch installation setup. Fixes #79190 Pull Request resolved: https://github.com/pytorch/pytorch/pull/79864 Approved by: https://github.com/ngimel	2022-06-21 18:17:09 +00:00
jjsjann123	c9c402eae9	[nvfuser_upstream_push] Reland: nvfuser code base bump 060822 (#79406 ) Landing reverted PR #79147. Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/ Bug fixes and minor refactor Squashed commits to WAR github API Commits that's actually in this PR from the devel branch: ``` 4c60e7dff22a494632370e5df55c011007340d06 Add examples infrastructure for using nvFuser in a standalone program (#1725) 02a05d98334ffa580d73ccb28fdb8c577ad296fe Fix issue #1751 (#1753) 8a69aa320bd7629e1709fe5ceb7104d2c88ec84c Refactor NvFuser transpose API to match eager mode behavior (#1746) ffdf6b7709048170d768217fcd7083fc8387f932 Remove BroadcastWithoutStride. (#1738) 02bab16035e70734450c02124f5cdaa95cf5749d Fix flipping of a boolean flag (#1745) 465d66890c8242e811224359cbdb1c2915490741 cleanup (#1744) 26d354e68720bc7dd2d3b1338ac01b707a230b6a fixing noncontig broadcast (#1742) 856b6b2f9073662dd98ca22ba6c3540e20eb1cdd Add IterDomainBuilder (#1736) 1fd974f912cd4c1e21cbd16e2abb23598d66a02f fixing warning for gcc7 (#1732) de2740a43a869f8272c2648e091d7b8235097db9 disabling complex in python tests for #1730 (#1733) fbbbe0a2e7c7a63e0e2719b8bfccb759b714221a fixing MSVC build (#1728) b5feee5e2b28be688dbddc766f3c0220389c8175 Fix the fused reduction runtime kernel (#1729) 5247682dff5980bb66edf8d3aac25dea2ef2ced5 Re-entrant GroupedGridReduction (#1727) ``` RUN_TORCHBENCH: nvfuser Pull Request resolved: https://github.com/pytorch/pytorch/pull/79406 Approved by: https://github.com/davidberard98	2022-06-16 17:52:21 +00:00
PyTorch MergeBot	d28e9e145b	Revert "[nvfuser_upstream_push] nvfuser code base bump 060822 (#79147 )" This reverts commit `49c41b87a2`. Reverted https://github.com/pytorch/pytorch/pull/79147 on behalf of https://github.com/janeyx99 due to Broke 11.3 builds on trunk `49c41b87a2`	2022-06-10 20:55:10 +00:00
jjsjann123	49c41b87a2	[nvfuser_upstream_push] nvfuser code base bump 060822 (#79147 ) Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/ Bug fixes and minor refactor Squashed commits to WAR github API Commits that's actually in this PR from the devel branch: ``` 4c60e7dff22a494632370e5df55c011007340d06 Add examples infrastructure for using nvFuser in a standalone program (#1725) 02a05d98334ffa580d73ccb28fdb8c577ad296fe Fix issue #1751 (#1753) 8a69aa320bd7629e1709fe5ceb7104d2c88ec84c Refactor NvFuser transpose API to match eager mode behavior (#1746) ffdf6b7709048170d768217fcd7083fc8387f932 Remove BroadcastWithoutStride. (#1738) 02bab16035e70734450c02124f5cdaa95cf5749d Fix flipping of a boolean flag (#1745) 465d66890c8242e811224359cbdb1c2915490741 cleanup (#1744) 26d354e68720bc7dd2d3b1338ac01b707a230b6a fixing noncontig broadcast (#1742) 856b6b2f9073662dd98ca22ba6c3540e20eb1cdd Add IterDomainBuilder (#1736) 1fd974f912cd4c1e21cbd16e2abb23598d66a02f fixing warning for gcc7 (#1732) de2740a43a869f8272c2648e091d7b8235097db9 disabling complex in python tests for #1730 (#1733) fbbbe0a2e7c7a63e0e2719b8bfccb759b714221a fixing MSVC build (#1728) b5feee5e2b28be688dbddc766f3c0220389c8175 Fix the fused reduction runtime kernel (#1729) 5247682dff5980bb66edf8d3aac25dea2ef2ced5 Re-entrant GroupedGridReduction (#1727) ``` RUN_TORCHBENCH: nvfuser Pull Request resolved: https://github.com/pytorch/pytorch/pull/79147 Approved by: https://github.com/davidberard98	2022-06-10 19:37:42 +00:00
Richard Zou	9da5defff6	Package config/template files with torchgen (#78942 ) Package config/template files with torchgen This PR packages native_functions.yaml, tags.yaml and ATen/templates with torchgen. This PR: - adds a step to setup.py to copy the relevant files over into torchgen - adds a docstring for torchgen (so `import torchgen; help(torchgen)` says something) - adds a helper function in torchgen so you can get the torchgen root directory (and figure out where the packaged files are) - changes some scripts to explicitly pass the location of torchgen, which will be helpful for the first item in the Future section. Future ====== - torchgen, when invoked from the command line, should use sources in torchgen/packaged instead of aten/src. I'm unable to do this because people (aka PyTorch CI) invokes `python -m torchgen.gen` without installing torchgen. - the source of truth for all of these files should be in torchgen. This is a bit annoying to execute on due to potential merge conflicts and dealing with merge systems - CI and testing. The way things are set up right now is really fragile, we should have a CI job for torchgen. Test Plan ========= I ran the following locally: ``` python -m torchgen.gen -s torchgen/packaged ``` and verified that it outputted files. Furthermore, I did a setup.py install and checked that the files are actually being packaged with torchgen. Pull Request resolved: https://github.com/pytorch/pytorch/pull/78942 Approved by: https://github.com/ezyang	2022-06-07 13:33:55 +00:00
Andrey Talman	ca7f948806	Don't include libiomp with conda install on MacOS (#78632 ) Fixes #78490 The following command: ``` conda install pytorch torchvision torchaudio -c pytorch-nightly ``` installs libiomp. Hence we don't want to package libiomp with conda installs. However, we still keep it for libtorch and wheels. Pull Request resolved: https://github.com/pytorch/pytorch/pull/78632 Approved by: https://github.com/malfet	2022-06-01 22:06:16 +00:00
Antonio Kim	f3f327e103	Decouple LTC from TS Backend using Lazy IR Builder Next stage of breaking up https://github.com/pytorch/pytorch/pull/74710 IR builder class introduced to decouple the explicit usage of `TsNode` in core lazy tensors. Requires https://github.com/pytorch/pytorch/pull/75324 to be merged in first. Background - there are ~ 5 special ops used in lazy core but defined as :public {Backend}Node. (DeviceData, Expand, Scalar...) - we currently require all nodes derive from {Backend}Node, so that backends can make this assumption safely - it is hard to have shared 'IR classes' in core/ because they depend on 'Node' Motivation 1. avoid copy-paste of "special" node classes for each backend 2. in general decouple and remove all dependencies that LTC has on the TS backend Summary of changes - new 'IRBuilder' interface that knows how to make 5 special ops - move 'special' node classes to `ts_backend/` - implement TSIRBuilder that makes the special TS Nodes - new backend interface API to get the IRBuilder - update core code to call the builder CC: @wconstab @JackCaoG @henrytwo Partially Fixes #74628 Pull Request resolved: https://github.com/pytorch/pytorch/pull/75433 Approved by: https://github.com/wconstab	2022-04-28 02:07:02 +00:00
Edward Z. Yang	5109d81fc5	Distribute torchgen as part of PyTorch package Fixes https://github.com/pytorch/pytorch/issues/73212 Signed-off-by: Edward Z. Yang <ezyangfb.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/76306 Approved by: https://github.com/zou3519	2022-04-25 20:15:22 +00:00
Yeounoh Chung	0428364cbf	Add missing LTC headers, re-enable xla configuration Addresses XLA test failures due to missing PyTorch lazy tensor backend headers: ``` “fatal error: ‘torch/csrc/lazy/backend/backend_device.h’ file not found” from pytorch-xla-linux-bionic-py3.7-clang8 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/74756 Approved by: https://github.com/seemethere	2022-03-28 20:07:20 +00:00
Han Qi	75d6cbe605	[4/5]Testing jit module in flatbuffer in Python. (#74387 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74387 Make temporary python bindings for flatbuffer to test ScriptModule save / load. (Note: this ignores all push blocking failures!) Test Plan: unittest Reviewed By: iseeyuan Differential Revision: D34968080 fbshipit-source-id: d23b16abda6e4b7ecf6b1198ed6e00908a3db903 (cherry picked from commit 5cbbc390c5f54146a1c469106ab4a6286c754325)	2022-03-24 23:29:47 +00:00
Sahan Paliskara	238d01ec90	Allow torch/csrc/deploy/interpreter/Optional.hpp to be allowed into the wheel distribution (#74643 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/74643 Previously `torch/csrc/deploy/interpreter/Optional.hpp` wasn't getting included in the wheel distribution created by `USE_DEPLOY=1 python setup.py bdist_wheel`, this pr fixes that Test Plan: Imported from OSS Reviewed By: d4l3k Differential Revision: D35094459 Pulled By: PaliC fbshipit-source-id: 50aea946cc5bb72720b993075bd57ccf8377db30 (cherry picked from commit 6ad5d96594f40af3d49d2137c2b3799a2d493b36)	2022-03-24 00:47:57 +00:00
Kevin Tse	ff3688f07a	[BE Hackathon][DataPipe] Automatically generate datapipe.pyi via CMake (#73991 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73991 Automatically generate `datapipe.pyi` via CMake and removing the generated .pyi file from Git. Users should have the .pyi file locally after building for the first time. I will also be adding an internal equivalent diff for buck. Test Plan: Imported from OSS Reviewed By: ejguan Differential Revision: D34868001 Pulled By: NivekT fbshipit-source-id: 448c92da659d6b4c5f686407d3723933c266c74f (cherry picked from commit 306dbc5f469e63bc141dac57ef310e6f0e16d9cd)	2022-03-15 14:46:34 +00:00
Ashwin Hari	7ed73b2803	CMake option for using static MKL libraries Fixes #70587 Pull Request resolved: https://github.com/pytorch/pytorch/pull/73069 Approved by: https://github.com/malfet	2022-03-07 19:32:33 +00:00
Mengwei Liu	9ce9803abe	[PyTorch] Add codegen unboxing ability (#69881 ) Summary: RFC: https://github.com/pytorch/rfcs/pull/40 This PR (re)introduces python codegen for unboxing wrappers. Given an entry of `native_functions.yaml` the codegen should be able to generate the corresponding C++ code to convert ivalues from the stack to their proper types. To trigger the codegen, run ``` tools/jit/gen_unboxing.py -d cg/torch/share/ATen ``` Merged changes on CI test. In https://github.com/pytorch/pytorch/issues/71782 I added an e2e test for static dispatch + codegen unboxing. The test exports a mobile model of mobilenetv2, then loads and runs it on a new binary for lite interpreter: `test/mobile/custom_build/lite_predictor.cpp`. ## Lite predictor build specifics 1. Codegen: `gen.py` generates `RegisterCPU.cpp` and `RegisterSchema.cpp`. Now with this PR, once `static_dispatch` mode is enabled, `gen.py` will not generate `TORCH_LIBRARY` API calls in those cpp files, hence avoiding interaction with the dispatcher. Once `USE_LIGHTWEIGHT_DISPATCH` is turned on, `cmake/Codegen.cmake` calls `gen_unboxing.py` which generates `UnboxingFunctions.h`, `UnboxingFunctions_[0-4].cpp` and `RegisterCodegenUnboxedKernels_[0-4].cpp`. 2. Build: `USE_LIGHTWEIGHT_DISPATCH` adds generated sources into `all_cpu_cpp` in `aten/src/ATen/CMakeLists.txt`. All other files remain unchanged. In reality all the `Operators_[0-4].cpp` are not necessary but we can rely on linker to strip them off. ## Current CI job test coverage update Created a new CI job `linux-xenial-py3-clang5-mobile-lightweight-dispatch-build` that enables the following build options: * `USE_LIGHTWEIGHT_DISPATCH=1` * `BUILD_LITE_INTERPRETER=1` * `STATIC_DISPATCH_BACKEND=CPU` This job triggers `test/mobile/lightweight_dispatch/build.sh` and builds `libtorch`. Then the script runs C++ tests written in `test_lightweight_dispatch.cpp` and `test_codegen_unboxing.cpp`. 
Recent commits added tests to cover as many C++ argument types as possible: in `build.sh` we installed PyTorch Python API so that we can export test models in `tests_setup.py`. Then we run C++ test binary to run these models on lightweight dispatch enabled runtime. Pull Request resolved: https://github.com/pytorch/pytorch/pull/69881 Reviewed By: iseeyuan Differential Revision: D33692299 Pulled By: larryliu0820 fbshipit-source-id: 211e59f2364100703359b4a3d2ab48ca5155a023 (cherry picked from commit 58e1c9a25e3d1b5b656282cf3ac2f548d98d530b)	2022-03-01 23:28:13 +00:00
Luca Wehrstedt	b213041df3	Also install c10d headers with .h extension (#73422 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/73422 Fixes https://github.com/pytorch/pytorch/issues/73421 ghstack-source-id: 149978120 Test Plan: None Reviewed By: cbalioglu Differential Revision: D34475711 fbshipit-source-id: 9e4d1d57021cbff51f53762b32bbfffbf3f81c4c (cherry picked from commit 72ff35e28242132cf20e538d43ad3b63b3e497b1)	2022-02-28 08:39:10 +00:00
Nikita Shulga	dc5cda0cca	Update min python version to 3.7 in setup.py and mypy configs (#71494 ) Summary: As Python 3.6 has reached EOL Pull Request resolved: https://github.com/pytorch/pytorch/pull/71494 Reviewed By: atalman Differential Revision: D33667509 Pulled By: malfet fbshipit-source-id: ab1f03085cfb9161df77ba5ce373b81f5e7ef3ae (cherry picked from commit `60343166d9`)	2022-01-20 00:03:57 +00:00
Taylor Robie	ebc66bfeea	[Profiler] Pull helper methods into dedicated file. (And start `torch/csrc/profiler` folder.) (#69255 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69255 One thing that I've found as I optimize profiler is that there's a lot of intermingled code, where the kineto profiler relies on the legacy (autograd) profiler for generic operations. This made optimization hard because I had to manage too many complex dependencies. (Exacerbated by the USE_KINETO #ifdef's sprinkled around.) This PR is the first of several to restructure the profiler(s) so the later optimizations go in easier. Test Plan: Unit tests Reviewed By: aaronenyeshi Differential Revision: D32671972 fbshipit-source-id: efa83b40dde4216f368f2a5fa707360031a85707	2021-12-16 10:33:47 -08:00
Peter Bell	4829dcea09	Codegen: Generate separate headers per operator (#68247 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68247 This splits `Functions.h`, `Operators.h`, `NativeFunctions.h` and `NativeMetaFunctions.h` into separate headers per operator base name. With `at::sum` as an example, we can include: ```cpp <ATen/core/sum.h> // Like Functions.h <ATen/core/sum_ops.h> // Like Operators.h <ATen/core/sum_native.h> // Like NativeFunctions.h <ATen/core/sum_meta.h> // Like NativeMetaFunctions.h ``` The umbrella headers are still being generated, but all they do is include from the `ATen/ops` folder. Further, `TensorBody.h` now only includes the operators that have method variants, which means files that only include `Tensor.h` don't need to be rebuilt when you modify function-only operators. Currently there are about 680 operators that don't have method variants, so this is potentially a significant win for incremental builds. Test Plan: Imported from OSS Reviewed By: mrshenli Differential Revision: D32596272 Pulled By: albanD fbshipit-source-id: 447671b2b6adc1364f66ed9717c896dae25fa272	2021-12-14 06:40:08 -08:00
Jithun Nair	8dfdc3df82	[ROCm] Refactor how to specify AMD gpu targets using PYTORCH_ROCM_ARCH (#61706 ) Summary: Remove all hardcoded AMD gfx targets. PyTorch build and Magma build will use rocm_agent_enumerator as backup if PYTORCH_ROCM_ARCH env var is not defined. PyTorch extensions will use same gfx targets as the PyTorch build, unless PYTORCH_ROCM_ARCH env var is defined. torch.cuda.get_arch_list() now works for ROCm builds. PyTorch CI dockers will continue to be built for gfx900 and gfx906 for now. PYTORCH_ROCM_ARCH env var can be a space or semicolon separated list of gfx archs, e.g. "gfx900 gfx906" or "gfx900;gfx906". cc jeffdaily sunway513 jithunnair-amd ROCmSupport KyleCZH Pull Request resolved: https://github.com/pytorch/pytorch/pull/61706 Reviewed By: seemethere Differential Revision: D32735862 Pulled By: malfet fbshipit-source-id: 3170e445e738e3ce373203e1e4ae99c84e645d7d	2021-12-13 15:41:40 -08:00
Michael Suo	ad182479b0	[deploy] docs (#69251 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69251 This adds some actual documentation for deploy, which is probably useful since we told everyone it was experimentally available so they will probably be looking at what the heck it is. It also wires up various components of the OSS build to actually work when used from an external project. Differential Revision: D32783312 Test Plan: Imported from OSS Reviewed By: wconstab Pulled By: suo fbshipit-source-id: c5c0a1e3f80fa273b5a70c13ba81733cb8d2c8f8	2021-12-01 21:55:18 -08:00
Eli Uriegas	f398320e0d	packaging: Include lazy headers in package_data (#68817 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68817 Looks like these files are getting used by downstream xla so we need to include them in our package_data Signed-off-by: Eli Uriegas <eliuriegas@fb.com> Test Plan: Imported from OSS Reviewed By: mruberry Differential Revision: D32622241 Pulled By: seemethere fbshipit-source-id: 7b64e5d4261999ee58bc61185bada6c60c2bb5cc	2021-11-29 08:29:48 -08:00

655 Commits