pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
yanbing-j	dc40b6d043	Upgrade oneDNN to v2.7.2 (#90051 ) This PR is to upgrade oneDNN to v2.7.2. ### oneDNN v2.7.1 & 2.7.2 changes: Fixes #89104 Updated ITT API version to 3.23.0 ### Performance Benchmark Use TorchBench test in ICX with 40 cores Intel OpenMP & tcmalloc were preloaded ![image](https://user-images.githubusercontent.com/61222868/205240855-04e2d50f-8b3a-4097-9038-fdd0c0fc93b9.png) Pull Request resolved: https://github.com/pytorch/pytorch/pull/90051 Approved by: https://github.com/XiaobingSuper, https://github.com/jgong5	2022-12-08 09:41:02 +00:00
Facebook Community Bot	3ef4fc2012	Automated submodule update: FBGEMM (#74729 ) This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM). New submodule commit: `f99e161663` Test Plan: Ensure that CI jobs succeed on GitHub before landing. Pull Request resolved: https://github.com/pytorch/pytorch/pull/74729 Approved by: https://github.com/malfet	2022-12-07 22:36:35 +00:00
PyTorch MergeBot	0d8e53dfe7	Revert "[Composable API] `replicate`: change to per module call, remove `mark_root_module()` (#89222 )" This reverts commit `65a0dcffd8`. Reverted https://github.com/pytorch/pytorch/pull/89222 on behalf of https://github.com/malfet due to Included unintended submodule updates	2022-12-06 03:26:28 +00:00
Charlie Yan	65a0dcffd8	[Composable API] `replicate`: change to per module call, remove `mark_root_module()` (#89222 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89222 Approved by: https://github.com/zhaojuanmao	2022-12-05 17:54:55 +00:00
Nikita Shulga	f2cf1b0f5e	Revert submodule updates introduced by #89157 (#89449 ) Reverts updates that were introduced by https://github.com/pytorch/pytorch/pull/89157 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89449 Approved by: https://github.com/kit1980, https://github.com/huydhn, https://github.com/clee2000	2022-11-22 05:48:43 +00:00
Taylor Robie	cf9476554f	update kineto pinned commit (#89435 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/89435 Approved by: https://github.com/malfet	2022-11-21 17:32:29 +00:00
yanbing-j	a80e5e7813	Update ideep for future performance improvement (#87966 ) Summary The update includes API changes and optimizations to reduce framework overhead, which will benefit all mkldnn (onednn) ops in JIT mode and inductor CPU backend, etc. These benefits will be seen after switching to the new ideep API in future PRs. Test plan For correctness, all UTs that call mkldnn ops, including test_ops.py, test_mkldnn*.py, test_quantization.py, etc. For performance, TorchBench has been run and no regression is found. Results are shown below. - Intel (R) Xeon (R) IceLake with 40 cores - Use multi-instance - Using tcmalloc & Intel OMP ![image](https://user-images.githubusercontent.com/12522207/201631004-bb77468d-953b-4757-a001-94d44615b5f6.png) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87966 Approved by: https://github.com/jgong5, https://github.com/XiaobingSuper	2022-11-21 09:52:36 +00:00
Nikita Shulga	ea58955dda	Move bazel to c++17 (#89297 ) Splitting out various smaller pieces from https://github.com/pytorch/pytorch/pull/85969 Pull Request resolved: https://github.com/pytorch/pytorch/pull/89297 Approved by: https://github.com/huydhn	2022-11-19 01:13:08 +00:00
Zain Rizvi	ab75982d3a	Always retry curl downloads (#89157 ) Modify our curl commands so that they always retry downloads. By default, curl only retries what it considers to be "transient" errors, based on the server's response. However, curl's estimate of what's transient is very conservative. By adding the --retry-all-errors parameter we'll always retry curl commands. In particular, I'm hoping this mitigates errors where curl fails with the below error ([logs](https://github.com/pytorch/pytorch/actions/runs/3468758110/jobs/5794939941)) `curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to ossci-linux.s3.amazonaws.com:443` Some of the modified downloads didn't even have retries, so I added them in More details: https://everything.curl.dev/usingcurl/downloads/retry Pull Request resolved: https://github.com/pytorch/pytorch/pull/89157 Approved by: https://github.com/kit1980, https://github.com/malfet	2022-11-18 07:03:24 +00:00
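The retry behaviour described in #89157 can be sketched as a small Python helper that assembles a `curl` invocation. This is only an illustration, not the command from the PR: `--retry` and `--retry-all-errors` are real curl options (the latter needs curl 7.71.0 or newer), while the helper name `curl_with_retries`, the retry count of 3, and the example URL are assumptions made here.

```python
from typing import List


def curl_with_retries(url: str, dest: str, retries: int = 3) -> List[str]:
    """Build a curl command that retries on every error, not only transient ones.

    Hypothetical helper: the name and the default retry count are illustrative.
    """
    return [
        "curl",
        "--fail",              # treat HTTP errors (4xx/5xx) as failures
        "--location",          # follow redirects
        "--retry", str(retries),
        "--retry-all-errors",  # retry even errors curl would not consider transient
        "--output", dest,
        url,
    ]


# Placeholder URL, not one taken from the PyTorch CI scripts.
cmd = curl_with_retries("https://example.com/artifact.tar.gz", "artifact.tar.gz")
print(" ".join(cmd))
```

The returned list can be handed to `subprocess.run(cmd, check=True)`; building it as a list rather than a shell string avoids quoting problems with unusual URLs or paths.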
Kenichi Maehashi	e2f0648750	Add an option to include actual license terms to the output (#85624 ) When building products using PyTorch, it is often required to display license terms for all dependencies. The feature itself has been implemented in #81500 but it seems there are no options to enable it. This PR implements the option. cc/ @mattip @rgommers Pull Request resolved: https://github.com/pytorch/pytorch/pull/85624 Approved by: https://github.com/rgommers, https://github.com/seemethere	2022-11-16 05:07:53 +00:00
Nikita Shulga	6be426ca1a	Update gloo submodule (#88530 ) Also, add an explicit cudart dependency to `torch_cuda` if Kineto is used with GPU support (it used to be somehow inherited from a wrong `gloo` setup) Pull Request resolved: https://github.com/pytorch/pytorch/pull/88530 Approved by: https://github.com/osalpekar	2022-11-09 01:04:32 +00:00
Aaron Gokaslan	5fb9c113ae	Update pybind11 to v2.10.1 (#88332 ) I am one of the maintainers of pybind11, and a frequent PyTorch user. We added quite a lot of bugfixes and performance improvements in 2.10.1 (see the changelog for full details) and I wanted to upstream them to PyTorch. Our releases are tested throughout Google's codebase including on their global builds of PyTorch so there should be no surprises. The main new feature is opt-in Eigen Tensor to NumPy casters. Pull Request resolved: https://github.com/pytorch/pytorch/pull/88332 Approved by: https://github.com/soumith	2022-11-03 02:53:26 +00:00
Minh Nguyen	bd4c4537dc	aten cpu and xnnpack to be compatible with arvr mode build (#87125 ) Summary: When building 3d photo sdk generator package in arvr/mode/mac and arvr/mode/mac-arm modes, we got several issues with aten cpu and xnnpack libraries. The reason is that those packages are using platform-* properties (platform-deps, platform-srcs...) which are not compatible with arvr modes. This diff fixes those issues by using `select` for non-platform properties when is_arvr_mode() is true, while keeping those platform ones for non-arvr modes. Test Plan:
```
buck build //arvr/projects/compphoto/photo3d_sdk/unity/plugin:generator_plugin_shared arvr/mode/mac-arm/dev
buck build //arvr/projects/compphoto/photo3d_sdk/unity/plugin:generator_plugin_shared arvr/mode/mac-arm/opt
buck build //arvr/projects/compphoto/photo3d_sdk/unity/plugin:generator_plugin_shared arvr/mode/mac/dev
buck build //arvr/projects/compphoto/photo3d_sdk/unity/plugin:generator_plugin_shared arvr/mode/mac/opt
```
and sandcastle builds Differential Revision: D40028669 Pull Request resolved: https://github.com/pytorch/pytorch/pull/87125 Approved by: https://github.com/kimishpatel	2022-10-25 22:52:52 +00:00
Christian Puhrsch	f6c6048b10	Use CUTLASS GEMM for NT bmm (#85894 ) Copy of https://github.com/pytorch/pytorch/pull/85710 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85894 Approved by: https://github.com/drisspg	2022-10-18 23:11:47 +00:00
Jiang, Yanbing	c56be31d2e	Upgrade oneDNN to v2.7 (#87061 ) This PR is to upgrade oneDNN to v2.7. ### oneDNN v2.7 changes: Performance Optimizations - Improved performance for future Intel Xeon Scalable processors (code name Sapphire Rapids). - Introduced performance optimizations for [bf16 floating point math mode](http://oneapi-src.github.io/oneDNN/group_dnnl_api_mathmode.html) on Intel Xeon Scalable processors (code name Sapphire Rapids). The bf16 math mode allows oneDNN to use bf16 arithmetic and Intel AMX instructions in computations on fp32 data. Please go to https://github.com/oneapi-src/oneDNN/releases/tag/v2.7 for more detailed changes. ### oneDNN v2.6.1 & 2.6.2 changes: Functionality - Updated ITT API to 3.22.5 - Fixed correctness issue in fp32 convolution implementation for cases with large spatial size (https://github.com/pytorch/pytorch/issues/84488) ### Performance Benchmark Use TorchBench test in ICX with 40 cores Intel OpenMP & tcmalloc were preloaded ![image](https://user-images.githubusercontent.com/61222868/196121957-656faebc-9f4a-49f0-9ef0-0784416c3a47.png) Pull Request resolved: https://github.com/pytorch/pytorch/pull/87061 Approved by: https://github.com/jgong5, https://github.com/XiaobingSuper, https://github.com/weiwangmeta	2022-10-18 19:07:58 +00:00
sanchitintel	974ad8fa6c	Add BFloat16 dtype support for oneDNN Graph JIT fuser (#85591 ) ## BFloat16 dtype support for faster inference with TorchScript using oneDNN Graph Intel Xeon Cooper Lake platform & beyond support the `AVX512_BF16` ISA, which is essentially native BFloat16 support. oneDNN Graph delivers high inference performance with BFloat16 on such machines. While oneDNN Graph can still be used with BFloat16 on older machines that lack `avx512_bf16` ISA but support `avx512bw`, `avx512vl` & `avx512dq` ISAs, the BF16 performance on these older machines will be significantly poorer (probably even poorer than Float32), as they lack native BF16 support. Currently, [AMP support for eager mode & JIT mode is divergent in PyTorch](https://github.com/pytorch/pytorch/issues/75956). So, for using oneDNN Graph with BFloat16, eager-mode AMP should be leveraged by turning off AMP for JIT mode, using `torch._C._jit_set_autocast_mode(False)` in python code, so as to avoid conflicts. Please use the following environment variable to view JIT logs - `PYTORCH_JIT_LOG_LEVEL=">>graph_helper:>>graph_fuser:>>kernel:>>interface"` ## Changes being made in this PR 1. This PR does NOT change the `oneDNN` commit or the `ideep` files. While the `ideep` commit is being updated, only files pertaining to oneDNN Graph are being updated. oneDNN Graph is being upgraded to version 0.5.2 (alpha patch release 2). To put things into perspective, `ideep` is a git submodule of PyTorch. `oneDNN Graph` is a git submodule of `ideep` (`ideep/mkl-dnn`), and oneDNN is a git submodule of oneDNN Graph (`ideep/mkl-dnn/third_party/oneDNN`). 2. Unit-tests are being updated. We now use the [existing dtypes decorator](https://github.com/pytorch/pytorch/blob/master/torch/testing/_internal/common_device_type.py#L123-L131). 3. 
Suggestions made by @eellison in the [FP32 PR](https://github.com/pytorch/pytorch/pull/68111#pullrequestreview-896719477) are being incorporated/addressed -

| Action-item | Status |
| :--- | ---: |
| checkInputCompatibility follow up | Fixed |
| the mayConvertScalarInputToTensor logic we can consider | Added type promotion code |
| fix up fixConvOptionalBias | The current approach seems correct |
| Use opinfo tests | using dtypes decorator. Will use `OpInfo` in a subsequent PR, if that'd be possible. Should we create a list of ops from opDB that are supported by oneDNN Graph, and add it to `common_methods_invocations.py`? |
| inferDevice torch_check call | not necessary now, perhaps, as only CPU is supported, for now? We'd add it by the beta release of oneDNN Graph, though, so that by then, users might be able to use other fusers with oneDNN Graph (NNC/TensorExpr are already compatible with the oneDNN Graph fuser). We can still add it, if you'd insist. |
| not checking shapes of input mkldnn tensor to llga guard | Those checks should not be present because oneDNN Graph may use blocked or channels-last layout, so those strides would be different. They're only skipped if an LLGA subgraph's output is input to another LLGA subgraph, which enables LLGA to choose an optimal layout between them. |
| fix test failures with respect to unsupported inputs | We'll address them with the upcoming release of oneDNN Graph beta version |

4. 
More PyTorch ops are being mapped to oneDNN Graph

## Example of using oneDNN Graph with BFloat16

```python
import torch

# Assuming we have a model of the name 'model'
example_input = torch.rand(1, 3, 224, 224)

# enable oneDNN Graph
torch.jit.enable_onednn_fusion(True)
# Disable AMP for JIT
torch._C._jit_set_autocast_mode(False)
with torch.no_grad(), torch.cpu.amp.autocast():
    model = torch.jit.trace(model, (example_input))
    model = torch.jit.freeze(model)
    # 2 warm-ups (2 for tracing/scripting with an example, 3 without an example)
    model(example_input)
    model(example_input)

    # speedup would be observed in subsequent runs.
    model(example_input)
```

## TorchBench based Benchmarks

- URL: https://github.com/sanchitintel/benchmark/tree/onednn_graph_benchmark (instructions present at URL).
- Batch-size(s): TorchBench-default for each model
- Baseline: PyTorch JIT OFI FP32
- Machine: Intel(R) Xeon(R) Platinum 8371HC (Cooper Lake)
- Sockets used: 1
- Number of cores on one socket: 26
- Intel OpenMP & tcmalloc were preloaded

#### Benchmark results with single thread

| name | Latency of PyTorch JIT OFI FP32 (s) | Latency of oneDNN Graph BF16 (s) | % change |
| :--- | ---: | ---: | ---: |
| test_eval[alexnet-cpu-jit] | 1.063851 | 0.509820 | -52.1% |
| test_eval[mnasnet1_0-cpu-jit] | 0.218435 | 0.107100 | -51.0% |
| test_eval[mobilenet_v2-cpu-jit] | 0.114467 | 0.058359 | -49.0% |
| test_eval[mobilenet_v3_large-cpu-jit] | 0.233873 | 0.117614 | -49.7% |
| test_eval[resnet18-cpu-jit] | 0.160584 | 0.075854 | -52.8% |
| test_eval[resnet50-cpu-jit] | 1.652846 | 0.713373 | -56.8% |
| test_eval[resnext50_32x4d-cpu-jit] | 0.471174 | 0.209431 | -55.6% |
| test_eval[shufflenet_v2_x1_0-cpu-jit] | 0.310306 | 0.167090 | -46.2% |
| test_eval[squeezenet1_1-cpu-jit] | 0.161247 | 0.045684 | -71.7% |
| test_eval[timm_efficientnet-cpu-jit] | 1.643772 | 0.800099 | -51.3% |
| test_eval[timm_regnet-cpu-jit] | 5.732272 | 2.333417 | -59.3% |
| test_eval[timm_resnest-cpu-jit] | 1.366464 | 0.715252 | -47.7% |
| test_eval[timm_vision_transformer-cpu-jit] | 0.508521 | 0.271598 | -46.6% |
| test_eval[timm_vovnet-cpu-jit] | 2.756692 | 1.125033 | -59.2% |
| test_eval[vgg16-cpu-jit] | 0.711533 | 0.312344 | -56.1% |

#### Benchmark results with 26 threads

| name | Latency of PyTorch JIT OFI FP32 (s) | Latency of oneDNN Graph BF16 (s) | % change |
| :--- | ---: | ---: | ---: |
| test_eval[alexnet-cpu-jit] | 0.062871 | 0.034198 | -45.6% |
| test_eval[mnasnet1_0-cpu-jit] | 0.022490 | 0.008172 | -63.7% |
| test_eval[mobilenet_v2-cpu-jit] | 0.012730 | 0.005866 | -53.9% |
| test_eval[mobilenet_v3_large-cpu-jit] | 0.025948 | 0.010346 | -60.1% |
| test_eval[resnet18-cpu-jit] | 0.011194 | 0.005726 | -48.9% |
| test_eval[resnet50-cpu-jit] | 0.124662 | 0.045599 | -63.4% |
| test_eval[resnext50_32x4d-cpu-jit] | 0.034737 | 0.015214 | -56.2% |
| test_eval[shufflenet_v2_x1_0-cpu-jit] | 0.028820 | 0.012517 | -56.6% |
| test_eval[squeezenet1_1-cpu-jit] | 0.012557 | 0.003876 | -69.1% |
| test_eval[timm_efficientnet-cpu-jit] | 0.203177 | 0.051879 | -74.5% |
| test_eval[timm_regnet-cpu-jit] | 0.452050 | 0.151113 | -66.6% |
| test_eval[timm_resnest-cpu-jit] | 0.117072 | 0.052848 | -54.9% |
| test_eval[timm_vision_transformer-cpu-jit] | 0.046048 | 0.023275 | -49.5% |
| test_eval[timm_vovnet-cpu-jit] | 0.213187 | 0.077482 | -63.7% |
| test_eval[vgg16-cpu-jit] | 0.044726 | 0.021998 | -50.8% |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85591 Approved by: https://github.com/jgong5, https://github.com/frank-wei, https://github.com/chunyuan-w	2022-10-13 20:36:59 +00:00
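The "% change" column in the #85591 benchmark tables appears to be the relative latency difference against the FP32 baseline. The commit message does not state the formula, so the sketch below assumes it is (new - baseline) / baseline; under that assumption it reproduces the published values for the three single-thread rows used here. The function name `percent_change` is illustrative.

```python
def percent_change(baseline_s: float, new_s: float) -> float:
    """Relative latency change in percent; negative means the new run is faster."""
    return (new_s - baseline_s) / baseline_s * 100.0


# (baseline FP32 latency, oneDNN Graph BF16 latency) in seconds, single thread,
# copied from the benchmark table in the commit message.
rows = {
    "alexnet": (1.063851, 0.509820),
    "resnet50": (1.652846, 0.713373),
    "squeezenet1_1": (0.161247, 0.045684),
}

changes = {name: round(percent_change(b, n), 1) for name, (b, n) in rows.items()}
for name, change in changes.items():
    print(f"{name}: {change:+.1f}%")
```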
PyTorch MergeBot	d169f950da	Revert "Use CUTLASS GEMM for NT bmm [OSS-only] (#85894 )" This reverts commit `ef58a132f2`. Reverted https://github.com/pytorch/pytorch/pull/85894 on behalf of https://github.com/DanilBaibak due to Break internal build	2022-10-13 15:28:09 +00:00
Christian Puhrsch	ef58a132f2	Use CUTLASS GEMM for NT bmm [OSS-only] (#85894 ) OSS-only copy of https://github.com/pytorch/pytorch/pull/85710 Pull Request resolved: https://github.com/pytorch/pytorch/pull/85894 Approved by: https://github.com/drisspg	2022-10-12 20:03:28 +00:00
Nikita Shulga	09364f4298	Compile C10 with `Wshadow` (#86666 ) This should prevent further regressions like https://github.com/pytorch/pytorch/pull/86646 Update `fmt` to `7.1.0` to fix variable shadowing in that library Pull Request resolved: https://github.com/pytorch/pytorch/pull/86666 Approved by: https://github.com/seemethere	2022-10-11 22:39:58 +00:00
Jianyu Huang	577070ff96	update fbgemm commit ID in PyTorch (#86577 ) Summary: Update after https://github.com/pytorch/FBGEMM/pull/1388 . Previous issue: D40216348 Test Plan: CI Differential Revision: D40219252 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86577 Approved by: https://github.com/malfet	2022-10-11 02:15:53 +00:00
Nikita Shulga	6a1e3f2f37	Update fbgemm submodule (#86054 ) Reland of `481def752c` Fixes https://github.com/pytorch/pytorch/issues/85956 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86054 Approved by: https://github.com/xuzhao9	2022-10-03 05:51:22 +00:00
Nikita Shulga	b9b24c31fd	[MPS] Fix non-contig to contig tensor copy (#86056 ) This handles a rare case when MPS tensor is constructed from non-contiguous CPU tensor. Fixes https://github.com/pytorch/pytorch/issues/85967 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86056 Approved by: https://github.com/janeyx99	2022-10-02 20:13:05 +00:00
Nikita Shulga	481def752c	Update fbgemm submodule (#86054 ) Fixes https://github.com/pytorch/pytorch/issues/85956 Pull Request resolved: https://github.com/pytorch/pytorch/pull/86054 Approved by: https://github.com/xuzhao9	2022-10-02 15:05:34 +00:00
Peter Bell	9a81da7ad1	Update NCCL to current master and remove patch step (#85367 ) The patch from #84245 has been upstreamed into NCCL, so the patch step is no longer required. Pull Request resolved: https://github.com/pytorch/pytorch/pull/85367 Approved by: https://github.com/ezyang	2022-09-21 19:23:49 +00:00
PyTorch MergeBot	35088f283e	Revert "Python stack tracing OD flow (part 1) (#84362 )" This reverts commit `1f4f05e59c`. Reverted https://github.com/pytorch/pytorch/pull/84362 on behalf of https://github.com/malfet due to Broke CUDA-10.2 tests, see `1f4f05e59c`	2022-09-20 03:42:43 +00:00
Seonglyong Gong	1f4f05e59c	Python stack tracing OD flow (part 1) (#84362 ) Summary: submodule update Test Plan: CI Differential Revision: D39176686 Pull Request resolved: https://github.com/pytorch/pytorch/pull/84362 Approved by: https://github.com/robieta	2022-09-19 21:33:55 +00:00
atalman	25d91e0a9d	Updating cudnn_frontend to 0.7.1 (#84943 ) Updating cudnn_frontend to 0.7.1 To enable CUDNN 8.5 integration cc @malfet @ptrblck Pull Request resolved: https://github.com/pytorch/pytorch/pull/84943 Approved by: https://github.com/huydhn, https://github.com/malfet	2022-09-13 23:00:09 +00:00
Driss Guessous	0fc02dbba4	flash_attention integration (#81434 ) # Summary: - I added a new submodule Cutlass pointing to the 2.10 release. The inclusion of flash_attention code should be gated by the flag: USE_FLASH_ATTENTION. This defaults to off, so flash is not built anywhere. This is done on purpose since we don't have A100 machines to compile and test on. - Only looked at CMake; did not attempt bazel or buck yet. - I included the mha_fwd from flash_attention that has been refactored to use cutlass 2.10. There is currently no backwards kernel on this branch. That would be a good follow up. Pull Request resolved: https://github.com/pytorch/pytorch/pull/81434 Approved by: https://github.com/cpuhrsch	2022-09-09 20:11:26 +00:00
Stephen Jia	732255f031	[vulkan] Add VMA as a third_party subrepo (#83906 ) the [VulkanMemoryAllocator](https://github.com/GPUOpen-LibrariesAndSDKs/VulkanMemoryAllocator) is a popular library for GPU memory allocation using Vulkan. The Vulkan backend has a dependency on it, but since it is only a single header file we currently include it by checking it into the repo under [aten/src/ATen/native/vulkan/api/vk_mem_alloc.h](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/vulkan/api/vk_mem_alloc.h). However, it is better to check it in as a third party submodule, since it allows better version tracking. Pull Request resolved: https://github.com/pytorch/pytorch/pull/83906 Approved by: https://github.com/kimishpatel	2022-08-23 18:42:46 +00:00
Huy Do	f0ee21fe0a	Update cpuinfo to the latest commit (#83620 ) This hasn't been updated for a while, so pulling the latest commit from https://github.com/pytorch/cpuinfo. I wonder if it breaks anything Fixes #83594 Pull Request resolved: https://github.com/pytorch/pytorch/pull/83620 Approved by: https://github.com/malfet	2022-08-20 06:16:54 +00:00
yanbing-j	6dc8673b1b	Update ideep for NNC post-op (#82705 )

### Description

This PR is to add NNC post-op fusion support in ideep for further NNC development. It includes:
- element wise post op fusion
- conv/matmul/linear + binary post op fusion

### Performance

Common configuration:
- Jemalloc and iomp enabled
- BS=1
- num_warmup = 300
- num_run = 500
- Average time of 1 iteration in ms is used
- time_before: no fusion
- time_after: with fusion
- Eltwise OPs selected: hardswish and abs
- Using oneDNN v2.6

On ICX (32 cores per socket): Conv2d FP32 (in channels-last format)

| | shape | time_(ms)_before | time_(ms)_after | Gain |
| -- | -- | -- | -- | -- |
| 1socket | Conv+abs_kernel=3_N=1_iC=64_H=56_W=56_oC=64_stride=1_pad=1_dilates=1_groups=1 | 0.112174 | 0.071106 | 36.61% |
| 1socket | Conv+hardswish_kernel=3_N=1_iC=64_H=56_W=56_oC=64_stride=1_pad=1_dilates=1_groups=1 | 0.11269 | 0.070586 | 37.36% |
| 1socket | Conv+abs_kernel=3_N=1_iC=512_H=56_W=56_oC=512_stride=2_pad=1_dilates=1_groups=32 | 0.164219 | 0.129498 | 21.14% |
| 1socket | Conv+hardswish_kernel=3_N=1_iC=512_H=56_W=56_oC=512_stride=2_pad=1_dilates=1_groups=32 | 0.169371 | 0.1277 | 24.60% |
| 1thread | Conv+abs_kernel=3_N=1_iC=64_H=56_W=56_oC=64_stride=1_pad=1_dilates=1_groups=1 | 1.994555 | 1.429813 | 28.31% |
| 1thread | Conv+hardswish_kernel=3_N=1_iC=64_H=56_W=56_oC=64_stride=1_pad=1_dilates=1_groups=1 | 1.715168 | 1.459937 | 14.88% |
| 1thread | Conv+abs_kernel=3_N=1_iC=512_H=56_W=56_oC=512_stride=2_pad=1_dilates=1_groups=32 | 2.997382 | 2.47915 | 17.29% |
| 1thread | Conv+hardswish_kernel=3_N=1_iC=512_H=56_W=56_oC=512_stride=2_pad=1_dilates=1_groups=32 | 3.044476 | 2.499366 | 17.90% |
| 4thread | Conv+abs_kernel=3_N=1_iC=64_H=56_W=56_oC=64_stride=1_pad=1_dilates=1_groups=1 | 0.405204 | 0.38117 | 5.93% |
| 4thread | Conv+hardswish_kernel=3_N=1_iC=64_H=56_W=56_oC=64_stride=1_pad=1_dilates=1_groups=1 | 0.410145 | 0.389279 | 5.09% |
| 4thread | Conv+abs_kernel=3_N=1_iC=512_H=56_W=56_oC=512_stride=2_pad=1_dilates=1_groups=32 | 0.67917 | 0.662792 | 2.41% |
| 4thread | Conv+hardswish_kernel=3_N=1_iC=512_H=56_W=56_oC=512_stride=2_pad=1_dilates=1_groups=32 | 0.682302 | 0.671226 | 1.62% |

On CPX (28 cores per socket): Conv2d BF16 (in channels-last format)

| | shape | time_(ms)_before | time_(ms)_after | Gain |
| -- | -- | -- | -- | -- |
| 1socket | Conv+abs_kernel=3_N=1_iC=64_H=56_W=56_oC=64_stride=1_pad=1_dilates=1_groups=1 | 0.119289 | 0.091015 | 23.70% |
| 1socket | Conv+hardswish_kernel=3_N=1_iC=64_H=56_W=56_oC=64_stride=1_pad=1_dilates=1_groups=1 | 0.144116 | 0.09339 | 35.20% |
| 1socket | Conv+abs_kernel=3_N=1_iC=512_H=56_W=56_oC=512_stride=2_pad=1_dilates=1_groups=32 | 0.209975 | 0.177111 | 15.65% |
| 1socket | Conv+hardswish_kernel=3_N=1_iC=512_H=56_W=56_oC=512_stride=2_pad=1_dilates=1_groups=32 | 0.234777 | 0.179945 | 23.36% |
| 1thread | Conv+abs_kernel=3_N=1_iC=64_H=56_W=56_oC=64_stride=1_pad=1_dilates=1_groups=1 | 1.296252 | 1.086423 | 16.19% |
| 1thread | Conv+hardswish_kernel=3_N=1_iC=64_H=56_W=56_oC=64_stride=1_pad=1_dilates=1_groups=1 | 1.364738 | 1.131289 | 17.11% |
| 1thread | Conv+abs_kernel=3_N=1_iC=512_H=56_W=56_oC=512_stride=2_pad=1_dilates=1_groups=32 | 3.99519 | 3.736147 | 6.48% |
| 1thread | Conv+hardswish_kernel=3_N=1_iC=512_H=56_W=56_oC=512_stride=2_pad=1_dilates=1_groups=32 | 4.03415 | 3.77981 | 6.30% |
| 4thread | Conv+abs_kernel=3_N=1_iC=64_H=56_W=56_oC=64_stride=1_pad=1_dilates=1_groups=1 | 0.27474 | 0.245281 | 10.72% |
| 4thread | Conv+hardswish_kernel=3_N=1_iC=64_H=56_W=56_oC=64_stride=1_pad=1_dilates=1_groups=1 | 0.28595 | 0.254748 | 10.91% |
| 4thread | Conv+abs_kernel=3_N=1_iC=512_H=56_W=56_oC=512_stride=2_pad=1_dilates=1_groups=32 | 0.847318 | 0.791453 | 6.59% |
| 4thread | Conv+hardswish_kernel=3_N=1_iC=512_H=56_W=56_oC=512_stride=2_pad=1_dilates=1_groups=32 | 0.870212 | 0.801594 | 7.89% |

On CPX (28 cores per socket): Linear BF16

| | shape | time_(ms)_before | time_(ms)_after | Gain |
| -- | -- | -- | -- | -- |
| 1socket | Linear+abs_N=1_iC=1024_oC=4096 | 0.043199 | 0.037603 | 12.95% |
| 1socket | Linear+hardswish_N=1_iC=1024_oC=4096 | 0.041845 | 0.038332 | 8.40% |
| 1socket | Linear+abs_N=1_iC=4096_oC=1024 | 0.048282 | 0.044281 | 8.29% |
| 1socket | Linear+hardswish_N=1_iC=4096_oC=1024 | 0.048362 | 0.044106 | 8.80% |
| 1socket | Linear+abs_N=1_iC=2048_oC=1000 | 0.036302 | 0.0344 | 5.24% |
| 1socket | Linear+hardswish_N=1_iC=2048_oC=1000 | 0.035734 | 0.035593 | 0.39% |
| 1thread | Linear+abs_N=1_iC=1024_oC=4096 | 0.365143 | 0.36279 | 0.64% |
| 1thread | Linear+hardswish_N=1_iC=1024_oC=4096 | 0.364464 | 0.363392 | 0.29% |
| 1thread | Linear+abs_N=1_iC=4096_oC=1024 | 0.384498 | 0.379902 | 1.20% |
| 1thread | Linear+hardswish_N=1_iC=4096_oC=1024 | 0.382545 | 0.381252 | 0.34% |
| 1thread | Linear+abs_N=1_iC=2048_oC=1000 | 0.213244 | 0.209999 | 1.52% |
| 1thread | Linear+hardswish_N=1_iC=2048_oC=1000 | 0.212003 | 0.208567 | 1.62% |
| 4thread | Linear+abs_N=1_iC=1024_oC=4096 | 0.126096 | 0.12157 | 3.59% |
| 4thread | Linear+hardswish_N=1_iC=1024_oC=4096 | 0.126627 | 0.121662 | 3.92% |
| 4thread | Linear+abs_N=1_iC=4096_oC=1024 | 0.132845 | 0.128921 | 2.95% |
| 4thread | Linear+hardswish_N=1_iC=4096_oC=1024 | 0.132642 | 0.12783 | 3.63% |
| 4thread | Linear+abs_N=1_iC=2048_oC=1000 | 0.079582 | 0.072584 | 8.79% |
| 4thread | Linear+hardswish_N=1_iC=2048_oC=1000 | 0.077761 | 0.071981 | 7.43% |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82705 Approved by: https://github.com/frank-wei, https://github.com/eellison	2022-08-18 05:08:12 +00:00
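The "Gain" column in the #82705 tables uses the opposite sign convention from a latency change: a positive value means fusion saved time. The commit message does not spell out the formula; the sketch below assumes it is (time_before - time_after) / time_before, which matches the published figures for the rows sampled here. The name `fusion_gain` and the short row labels are illustrative.

```python
def fusion_gain(time_before_ms: float, time_after_ms: float) -> float:
    """Share of the unfused time saved by post-op fusion, in percent."""
    return (time_before_ms - time_after_ms) / time_before_ms * 100.0


# (label, time without fusion, time with fusion) in ms, copied from the tables
# in the commit message.
samples = [
    ("ICX Conv2d FP32, Conv+abs iC=64, 1socket", 0.112174, 0.071106),
    ("CPX Linear BF16, Linear+abs iC=1024, 1socket", 0.043199, 0.037603),
    ("CPX Linear BF16, Linear+abs iC=1024, 1thread", 0.365143, 0.36279),
]

gains = [round(fusion_gain(before, after), 2) for _, before, after in samples]
for (label, _, _), gain in zip(samples, gains):
    print(f"{label}: {gain:.2f}%")
```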
Nikita Shulga	c08092fdf2	Update NCCL to v2.13.4-1 (#82775 ) Also, update slimming script to include two instances of net.o that new library generates Pull Request resolved: https://github.com/pytorch/pytorch/pull/82775 Approved by: https://github.com/ngimel	2022-08-04 19:36:45 +00:00
zengk95	d0e6e5a5bb	Revert "sym_numel (#82374 )" (#82726 ) TSIA It looks like this PR #82374 is breaking mac builds on trunk but I can't revert it normally since there's a merge conflict in the XLA hash. <img width="1753" alt="image" src="https://user-images.githubusercontent.com/34172846/182644661-b7fdda4b-e5ce-45c3-96a2-ad6737d169ae.png"> I reverted it and resolved the conflict using the old XLA hash that this commit was based upon Pull Request resolved: https://github.com/pytorch/pytorch/pull/82726 Approved by: https://github.com/albanD, https://github.com/janeyx99	2022-08-03 15:23:47 +00:00
Nikolay Korovaiko	fd68b0931f	sym_numel (#82374 ) ### Description This PR makes `numel` symint-aware similar to `sym_sizes()` and `sym_strides()`. Similar to https://github.com/pytorch/pytorch/pull/81300 . This PR is part of a bigger project to support dynamic_shapes. Pull Request resolved: https://github.com/pytorch/pytorch/pull/82374 Approved by: https://github.com/ezyang	2022-08-03 06:33:45 +00:00
Jianyu Huang	916a565151	Upgrade fbgemm in OSS PyTorch (#82676 ) Differential Revision: D38368525 Pull Request resolved: https://github.com/pytorch/pytorch/pull/82676 Approved by: https://github.com/ngimel	2022-08-03 00:28:43 +00:00
albanD	4b7de26556	Fix C API to be compatible with latest 3.11 beta (#81242 ) Based off https://github.com/pytorch/pytorch/pull/80511 with extra changes: - Update pybind to the latest release as it contains some needed fixes - Extend the compat header to reduce changes in code Pull Request resolved: https://github.com/pytorch/pytorch/pull/81242 Approved by: https://github.com/malfet, https://github.com/mattip	2022-07-27 08:37:10 +00:00
Max Ren	0b3a239e85	[pocket fft] turning on pocketfft flag (#81670 ) Summary: enabling AT_POCKETFFT_ENABLED@ flag and adding the appropriate dependencies to aten-cpu moved mkl files from `aten_cpu_source_non_codegen_list` to `aten_native_source_non_codegen_list` Test Plan: After building testing binaries for both android and ios targets ### iOS `fbcode/aibench/specifications/frameworks/pytorch/ios/build.sh` Submitted benchmarks with the new binaries supporting pocketfft here: https://www.internalfb.com/intern/aibench/details/245253003946591 ### Android `fbcode/aibench/specifications/frameworks/pytorch/android/arm64/build.sh` Submitted Benchmarks with the new binaries supporting pocket fft here: https://www.internalfb.com/intern/aibench/details/406253690682941 ### Build Size Impact Success: igios-pika on D37790257-V7 ☷[pocket fft] turning on pocketfft flag☷ Diff: https://fburl.com/diff/exkploof Unigraph Explorer: https://fburl.com/mbex/aipdzaqo Changes for variation [arm64 + 3x assets]: ```Compressed : -473 B (-0.00%) => 86.69 MiB Uncompressed: +2.4 KiB (+0.00%) => 187.71 MiB ``` Reviewed By: kimishpatel Differential Revision: D37790257 Pull Request resolved: https://github.com/pytorch/pytorch/pull/81670 Approved by: https://github.com/kit1980	2022-07-21 02:45:20 +00:00
PyTorch MergeBot	7408004454	Revert "[Codemod][Format buck files with arc lint] caffe2/third_party (#81441 )" This reverts commit `1233c3c256`. Reverted https://github.com/pytorch/pytorch/pull/81441 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally	2022-07-19 09:57:32 +00:00
James Donald	1233c3c256	[Codemod][Format buck files with arc lint] caffe2/third_party (#81441 ) Reviewed By: jdonald Differential Revision: D37710887 Pull Request resolved: https://github.com/pytorch/pytorch/pull/81441 Approved by: https://github.com/malfet	2022-07-18 17:10:23 +00:00
mattip	37474a54de	create a concated LICENSE file for wheels (#81500 ) Fixes #81181 by creating a temporary LICENSE file that has all the third-party licenses concatenated together when creating a wheel. Also update the `third_party/LICENSES_BUNDLED.txt` file. The `third_party/LICENSES_BUNDLED.txt` file is supposed to be tested via `tests/test_license.py`, but the test is not running? Pull Request resolved: https://github.com/pytorch/pytorch/pull/81500 Approved by: https://github.com/rgommers, https://github.com/seemethere	2022-07-18 14:02:37 +00:00
Taylor Robie	9d3c35d1e1	Back out "Revert D37720837: Back out "Revert D37228314: [Profiler] Include ActivityType from Kineto"" (#81450 ) Differential Revision: [D37842341](https://our.internmc.facebook.com/intern/diff/D37842341/) NOTE FOR REVIEWERS: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D37842341/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/81450 Approved by: https://github.com/pbelevich	2022-07-15 18:25:40 +00:00
Jing Xu	3c7044728b	Enable Intel® VTune™ Profiler's Instrumentation and Tracing Technology APIs (ITT) to PyTorch (#63289 ) More detailed description of benefits can be found at #41001. This is Intel's counterpart of NVIDIA's NVTX (https://pytorch.org/docs/stable/autograd.html#torch.autograd.profiler.emit_nvtx). ITT is a functionality for labeling trace data during application execution across different Intel tools. For integrating Intel(R) VTune Profiler into Kineto, ITT needs to be integrated into PyTorch first. It works with both standalone VTune Profiler (https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html) and Kineto-integrated VTune functionality in the future. It works for both Intel CPU and Intel XPU devices. Pitch Add VTune Profiler's ITT API function calls to annotate PyTorch ops, as well as developer customized code scopes on CPU, like NVTX for NVIDIA GPU. This PR rebases the code changes at https://github.com/pytorch/pytorch/pull/61335 to the latest master branch. Usage example:
```
with torch.autograd.profiler.emit_itt():
    for i in range(10):
        torch.itt.range_push('step_{}'.format(i))
        model(input)
        torch.itt.range_pop()
```
cc @ilia-cher @robieta @chaekit @gdankel @bitfort @ngimel @orionr @nbcsm @guotuofeng @guyang3532 @gaoteng-git Pull Request resolved: https://github.com/pytorch/pytorch/pull/63289 Approved by: https://github.com/malfet	2022-07-13 13:50:15 +00:00
PyTorch MergeBot	36d2c44cce	Revert "Back out "Revert D37228314: [Profiler] Include ActivityType from Kineto" (#81122 )" This reverts commit `52a538868b`. Reverted https://github.com/pytorch/pytorch/pull/81122 on behalf of https://github.com/clee2000 due to broke periodic buck build https://github.com/pytorch/pytorch/runs/7306516655?check_suite_focus=true	2022-07-12 18:20:00 +00:00
Taylor Robie	52a538868b	Back out "Revert D37228314: [Profiler] Include ActivityType from Kineto" (#81122 ) Reland Differential Revision: [D37720837](https://our.internmc.facebook.com/intern/diff/D37720837/) NOTE FOR REVIEWERS: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D37720837/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/81122 Approved by: https://github.com/chaekit	2022-07-12 14:54:01 +00:00
PyTorch MergeBot	a965a67492	Revert "[Profiler] Include ActivityType from Kineto (#80750 )" This reverts commit `2f6f7391ef`. Reverted https://github.com/pytorch/pytorch/pull/80750 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally	2022-07-08 05:16:56 +00:00
Taylor Robie	2f6f7391ef	[Profiler] Include ActivityType from Kineto (#80750 ) We don't want to compile with Kineto on all platforms, but if we're going to have significant integration between profiler and Kineto profiler will need to be able to rely on simple API constructs like the Kineto enums. Differential Revision: [D37228314](https://our.internmc.facebook.com/intern/diff/D37228314/) NOTE FOR REVIEWERS: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D37228314/)! Pull Request resolved: https://github.com/pytorch/pytorch/pull/80750 Approved by: https://github.com/aaronenyeshi	2022-07-08 04:59:06 +00:00
PyTorch MergeBot	814cccc968	Revert "Automated submodule update: kineto (#79925 )" This reverts commit `cc0f1cc3d3`. Reverted https://github.com/pytorch/pytorch/pull/79925 on behalf of https://github.com/malfet due to Seems to have caused CUDA-10.2 regression, see https://hud.pytorch.org/hud/pytorch/pytorch/master/1?name_filter=linux-bionic-cuda10.2	2022-07-06 22:14:13 +00:00
Facebook Community Bot	cc0f1cc3d3	Automated submodule update: kineto (#79925 ) This is an automated pull request to update the first-party submodule for [pytorch/kineto](https://github.com/pytorch/kineto). New submodule commit: `a7c85d503c` Test Plan: Ensure that CI jobs succeed on GitHub before landing. Pull Request resolved: https://github.com/pytorch/pytorch/pull/79925 Approved by: https://github.com/malfet, https://github.com/robieta	2022-07-06 16:59:31 +00:00
PyTorch MergeBot	b1943e01e2	Revert "[MPS] Add test consistency from OpInfo based tests from PR 78504 (#79532 )" This reverts commit `c71886e048`. Reverted https://github.com/pytorch/pytorch/pull/79532 on behalf of https://github.com/malfet due to Unintended submodules updates	2022-06-30 16:37:11 +00:00
PyTorch MergeBot	1454515253	Revert "Enable Intel® VTune™ Profiler's Instrumentation and Tracing Technology APIs (ITT) to PyTorch (#63289 )" This reverts commit `f988aa2b3f`. Reverted https://github.com/pytorch/pytorch/pull/63289 on behalf of https://github.com/malfet due to broke trunk, see `f988aa2b3f`	2022-06-30 12:49:41 +00:00

1383 Commits