pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-08 07:39:33 +01:00

Author	SHA1	Message	Date
lezcano	9a5b4d2403	Do not forward parent's value range to CSE variable for variables created within codegen. (#123099 ) Consider we are generating code for `ops.gt`, and within it we call `ops.to_dtype`. Before, we would forward the bounds from `gt` to the result of `to_dtype`, which is wrong. Pull Request resolved: https://github.com/pytorch/pytorch/pull/123099 Approved by: https://github.com/jgong5, https://github.com/peterbell10	2024-04-23 06:26:39 +00:00
Isuru Fernando	edcd968b51	Add out wrappers to some decompositions (#115437 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/115437 Approved by: https://github.com/lezcano	2024-04-23 06:26:11 +00:00
chilli	e0c5113dec	Add support for capturing tensors with score_mod (#124444 )

```
import torch
from torch import nn
import torch.nn.functional as F
import torch._inductor.config as config
# torch.set_default_device('cuda')
import torch
from torch.nn.attention._templated_attention import _templated_attention as templated_attention
from triton.testing import do_bench
from torch.nn.attention import SDPBackend, sdpa_kernel

index = torch.ops.aten
torch.manual_seed(0)

B = 16
H = 16
S = 2048
D = 64

head_scale = torch.randn(H, device='cuda')

def alibi(score, batch, head, token_q, token_kv):
    return score + torch.ops.aten.index(head_scale, [head]) * (token_q - token_kv)

bias = torch.randn(H, S, S, dtype=torch.float16, device='cuda')
query = torch.randn(B, H, S, D, device="cuda", dtype=torch.float16)
key = torch.randn(B, H, S, D, device="cuda", dtype=torch.float16)
value = torch.randn(B, H, S, D, device="cuda", dtype=torch.float16)

compiled = torch.compile(templated_attention)
out = compiled(query, key, value, score_mod=alibi)
out2 = templated_attention(query, key, value,score_mod=alibi)
print((out - out2).abs().mean())
assert (out - out2).abs().mean() < 1e-3

print("Flash (no mask): ", do_bench(lambda: F.scaled_dot_product_attention(query, key, value)))
print("Flash (mask): ", do_bench(lambda: F.scaled_dot_product_attention(query, key, value, attn_mask=bias)))
print("flexattention: ", do_bench(lambda: compiled(query, key, value, score_mod=alibi)))
```

<img width="324" alt="image" src="https://github.com/pytorch/pytorch/assets/6355099/18c175d0-2720-4dfd-8747-85b8a8f609f5"> Pull Request resolved: https://github.com/pytorch/pytorch/pull/124444 Approved by: https://github.com/jansel, https://github.com/drisspg	2024-04-23 06:20:13 +00:00
Mikayla Gawarecki	c82fcb7b30	Add testing and fix `weights_only` load for quantized types and nn.Parameters with python attrs (#124330 ) Adds the following to allowed globals for the `weights_only` unpickler - [x] `torch._utils._rebuild_qtensor` and qtensor related types - [x] `torch._utils._rebuild_parameter_with_state` (used deserializing a parameter that has user-defined attributes like `Param.foo`) The remaining rebuild functions that have not been allowlisted are - [x] `torch._utils._rebuild_wrapper_subclass` (allowlisted in above PR) - [ ] `torch._utils._rebuild_device_tensor_from_numpy` - [ ] `torch._utils._rebuild_xla_tensor` (legacy) Pull Request resolved: https://github.com/pytorch/pytorch/pull/124330 Approved by: https://github.com/albanD	2024-04-23 04:13:26 +00:00
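The allowlisting idea behind the `weights_only` unpickler can be illustrated with a restricted unpickler in plain Python. This is a minimal sketch and not PyTorch's implementation: `ALLOWED_GLOBALS`, `RestrictedUnpickler`, and `restricted_loads` are hypothetical names, and the allowlist holds only `collections.OrderedDict` for illustration.

```python
import collections
import io
import pickle

# Hypothetical allowlist: (module, name) pairs the unpickler may resolve.
ALLOWED_GLOBALS = {("collections", "OrderedDict")}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses any global not present in ALLOWED_GLOBALS."""

    def find_class(self, module, name):
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global {module}.{name} is not allowlisted")


def restricted_loads(data: bytes):
    """Deserialize `data`, resolving only allowlisted globals."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


# An allowlisted type round-trips normally.
ok = restricted_loads(pickle.dumps(collections.OrderedDict(a=1)))

# A type that is not allowlisted is rejected before it can be constructed.
try:
    restricted_loads(pickle.dumps(collections.Counter("aab")))
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```

Adding a rebuild function to the allowed globals, as the commit describes, corresponds in this sketch to adding one more `(module, name)` pair to the set.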
Nikita Shulga	de5d689cf9	[EZ] Update pillow to 10.3.0 (#124614 ) Older versions are subject to [CVE-2024-28219](https://nvd.nist.gov/vuln/detail/CVE-2024-28219), although it's not super important from a CI PoV. Modernize `torch/utils/tensorboard/summary.py` to use Pillow-9+ APIs (is this file even used for anything anymore?) Pull Request resolved: https://github.com/pytorch/pytorch/pull/124614 Approved by: https://github.com/Skylion007, https://github.com/ZainRizvi	2024-04-23 03:22:23 +00:00
atalman	7706cd7d12	Extend CPU inductor merge rule (#124671 ) To help unblock: https://github.com/pytorch/pytorch/pull/123710 Pull Request resolved: https://github.com/pytorch/pytorch/pull/124671 Approved by: https://github.com/leslie-fang-intel, https://github.com/huydhn	2024-04-23 02:18:00 +00:00
Edward Z. Yang	660db767ef	Don't clean up fresh inductor cache on error (#124620 ) Useful for local debugging. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/124620 Approved by: https://github.com/oulgen, https://github.com/desertfire, https://github.com/jansel	2024-04-23 02:13:05 +00:00
Oguz Ulgen	7e095be4b6	Fix test_max_autotune_remote_caching (#124655 ) D55206000 broke this test. It is not clear why it did not run in the CI but here's the fix. Differential Revision: [D56439213](https://our.internmc.facebook.com/intern/diff/D56439213/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/124655 Approved by: https://github.com/aorenste	2024-04-23 01:41:04 +00:00
Fadi Botros	375ec25f55	Add missing aten::sort.any op for assistant lm models (#123982 ) Differential Revision: D56084098 Pull Request resolved: https://github.com/pytorch/pytorch/pull/123982 Approved by: https://github.com/JacobSzwejbka	2024-04-23 01:35:07 +00:00
cyy	ea61c9cb29	[Distributed] [5/N] Fix clang-tidy warnings in torch/csrc/distributed/c10d (#124043 ) This PR continues to fix some clang-tidy warnings in distributed/c10d code, following https://github.com/pytorch/pytorch/pull/124032. Pull Request resolved: https://github.com/pytorch/pytorch/pull/124043 Approved by: https://github.com/ezyang	2024-04-23 00:43:50 +00:00
Ashwin Hari	5f5778476a	rename ort to maia (#123265 ) Fixes #123264 Pull Request resolved: https://github.com/pytorch/pytorch/pull/123265 Approved by: https://github.com/albanD	2024-04-23 00:33:25 +00:00
leslie-fang-intel	bffecb5aff	[Inductor] Enable VecMask store (#123710 ) Summary Enable the vectorization of store with `bool` dtype. Test Plan ``` python -u -m pytest -s -v inductor/test_cpu_repro.py -k test_decomposed_fake_quant_per_channel ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/123710 Approved by: https://github.com/jgong5, https://github.com/lezcano ghstack dependencies: #123512	2024-04-23 00:29:47 +00:00
leslie-fang-intel	dd440ac734	Add Matmul recipe into x86_inductor_quantizer (#122776 ) Summary Add `matmul` in the quantization recipes, noting that it's not a general recipe but tailored to meet accuracy criteria for specific models. `matmul` recipe is disabled by default. Test Plan ``` python -m pytest quantization/pt2e/test_x86inductor_quantizer.py -k test_attention_block ``` Differential Revision: [D56288468](https://our.internmc.facebook.com/intern/diff/D56288468) Pull Request resolved: https://github.com/pytorch/pytorch/pull/122776 Approved by: https://github.com/jgong5, https://github.com/jerryzh168	2024-04-23 00:25:41 +00:00
soulitzer	1fcdea8cd6	Do not import transformers when import torch._dynamo (#124634 ) Fixes https://github.com/pytorch/pytorch/issues/123954 Pull Request resolved: https://github.com/pytorch/pytorch/pull/124634 Approved by: https://github.com/thiagocrepaldi, https://github.com/Chillee ghstack dependencies: #124343	2024-04-23 00:25:20 +00:00
nopperl	0c21161488	Add meta function for `torch.histc` (#124548 ) Registers a meta function for the `aten.histc.default` and `aten.histc.out` ops to support `torch.compile(dynamic=True)`. Fixes #124512. Pull Request resolved: https://github.com/pytorch/pytorch/pull/124548 Approved by: https://github.com/lezcano, https://github.com/peterbell10	2024-04-23 00:24:59 +00:00
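A meta function computes only the metadata of an op's output (such as its shape) without touching any data, which is what tracing with dynamic shapes needs. The sketch below shows the shape rule for `histc` in plain Python; `histc_meta_shape` is a hypothetical helper, not the function registered by the PR, and it assumes only the documented behaviour that `torch.histc` returns a 1-D result with `bins` elements (default 100).

```python
from typing import Tuple


def histc_meta_shape(input_shape: Tuple[int, ...], bins: int = 100) -> Tuple[int, ...]:
    """Return the output shape of a histc-like op from its arguments alone."""
    if bins <= 0:
        raise ValueError("bins must be > 0")
    # The histogram is 1-D regardless of the input's shape.
    return (bins,)


shape_small = histc_meta_shape((4, 8), bins=16)
shape_default = histc_meta_shape((1024,))
```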
Edward Z. Yang	6054789874	Make numel equal test size oblivious in reshape_symint (#124611 ) Fixes https://github.com/pytorch/pytorch/issues/124581 Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/124611 Approved by: https://github.com/bdhirsh ghstack dependencies: #124139	2024-04-22 23:59:40 +00:00
Nikita Shulga	abf3f90781	[MPS] Fix large copy (#124635 ) By slicing `copyFromBuffer:sourceOffset:toBuffer:destinationOffset:size:` into 2Gb chunks Add regression test, but limit it to machines with 12Gb of RAM or more, and MacOS 14+, as on MacOS 13 attempt to alloc 4Gb tensor fails with: ``` /AppleInternal/Library/BuildRoots/c651a45f-806e-11ed-a221-7ef33c48bc85/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSCore/Types/MPSNDArray.mm:724: failed assertion `[MPSNDArray initWithDevice:descriptor:] Error: total bytes of NDArray > 2**32' ``` Fixes https://github.com/pytorch/pytorch/issues/124335 Pull Request resolved: https://github.com/pytorch/pytorch/pull/124635 Approved by: https://github.com/kulinseth	2024-04-22 23:43:11 +00:00
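The chunking approach described above can be sketched in plain Python. This is an illustration under stated assumptions, not the Objective-C++ code from the PR: `chunk_ranges` and `chunked_copy` are hypothetical names, the 2 GiB limit is taken from the commit message, and `memoryview` slices stand in for the Metal buffer copy.

```python
CHUNK_BYTES = 2 * 1024**3  # assumed per-copy limit of 2 GiB


def chunk_ranges(total_bytes, chunk_bytes=CHUNK_BYTES):
    """Yield (offset, size) pairs that cover [0, total_bytes) in order."""
    offset = 0
    while offset < total_bytes:
        size = min(chunk_bytes, total_bytes - offset)
        yield offset, size
        offset += size


def chunked_copy(src, dst, chunk_bytes=CHUNK_BYTES):
    """Copy src into dst one chunk at a time; dst must be writable and as long as src."""
    view_src, view_dst = memoryview(src), memoryview(dst)
    for offset, size in chunk_ranges(len(view_src), chunk_bytes):
        view_dst[offset:offset + size] = view_src[offset:offset + size]


# A 5 GiB copy is split into two full chunks and one 1 GiB remainder.
GIB = 1024**3
ranges = list(chunk_ranges(5 * GIB))

# Small demonstration with a tiny chunk size instead of real multi-GiB buffers.
source = bytes(range(10))
target = bytearray(10)
chunked_copy(source, target, chunk_bytes=4)
```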
Yanbo Liang	72a34eeb99	Dynamo x autograd.Function supports non-{Tensor, symnode, constant} inputs (#124360 ) Fixes #118395 Pull Request resolved: https://github.com/pytorch/pytorch/pull/124360 Approved by: https://github.com/zou3519	2024-04-22 23:32:54 +00:00
atalman	302d7e9a6e	[Binary Build] Increase timeout for Linux nightly binary builds (#124668 ) Related issue: https://github.com/pytorch/pytorch/issues/124667. Please note, this is a mitigation PR. Will follow up with an investigation and a proper fix for this. Similar to: `94d6463255` Pull Request resolved: https://github.com/pytorch/pytorch/pull/124668 Approved by: https://github.com/huydhn	2024-04-22 22:38:39 +00:00
Shuqiang Zhang	87a35d5a29	Use new function to log one cluster per line (#124628 ) Summary: For motivation behind the overall stack of diffs see D56218385 summary. This particular diff makes cpp_dumper take a custom printer function to log callstacks one-group-at-a-time and as such no longer running into 30K characters limit of `LOG(INFO)`. Test Plan: ``` [romanmal@46150.od /data/sandcastle/boxes/fbsource/fbcode (520a7b7b5)]$ buck2 test //caffe2/torch/csrc/distributed/c10d/... File changed: fbcode//common/base/ThreadStackTrace.cpp File changed: fbsource//xplat/caffe2/torch/csrc/distributed/c10d/fb/TraceUtils.cpp File changed: fbcode//caffe2/torch/csrc/distributed/c10d/ProcessGroupNCCL.hpp 4 additional file change events Buck UI: https://www.internalfb.com/buck2/d8ceae86-7d6f-4779-ad0c-8e37eddcff98 Network: Up: 0B Down: 0B Jobs completed: 2. Time elapsed: 1.5s. Tests finished: Pass 0. Fail 0. Fatal 0. Skip 0. Build failure 0 NO TESTS RAN [romanmal@46150.od /data/sandcastle/boxes/fbsource/fbcode (520a7b7b5)]$ ``` Tested to print the stack trace: P1220109730 Differential Revision: D56218360 Pull Request resolved: https://github.com/pytorch/pytorch/pull/124628 Approved by: https://github.com/wconstab	2024-04-22 21:57:39 +00:00
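The callback pattern described above, where the dumper receives a printer and emits one group per call instead of one large message, can be sketched as follows. The actual change is in C++; this Python version is only an illustration, and `dump_grouped` and its output format are hypothetical.

```python
from typing import Callable, Iterable, List


def dump_grouped(groups: Iterable[List[str]], printer: Callable[[str], None]) -> None:
    """Emit each group of stack frames through `printer` as its own message."""
    for index, frames in enumerate(groups):
        # One call per group keeps every message short, so a per-message
        # size limit in the logger is never hit by the combined output.
        printer(f"group {index}: " + " <- ".join(frames))


lines = []
dump_grouped([["a", "b"], ["c"]], lines.append)
```

Passing `print` or a logger method as `printer` would send each group to the log separately.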
William Wen	501edc7e59	[inductor, test] remove cast for test_tmp_not_defined_issue2_cpu (#114910 ) Does this verify that https://github.com/pytorch/pytorch/issues/94017 is fixed? Pull Request resolved: https://github.com/pytorch/pytorch/pull/114910 Approved by: https://github.com/angelayi	2024-04-22 21:51:53 +00:00
Sheng Fu	ba3c00c266	[test_profiler.py] Disable tqdm monitor thread and torch.compile with compile_threads=1 (#124409 ) Summary: If tqdm is not shut down properly, it leaves the monitor thread alive. This causes an issue in the multithreading test because we check all events in that test with their tids. The events that correspond to these lingering threads all have a TID of (uint64_t)(-1), which is invalid. The workaround is to turn off the monitor thread when tqdm is loaded. Since these are unit tests, it is safe to turn off the monitor thread. Test Plan: buck test mode/dev-sand caffe2/test:profiler Differential Revision: D56310301 Pull Request resolved: https://github.com/pytorch/pytorch/pull/124409 Approved by: https://github.com/aaronenyeshi	2024-04-22 21:51:14 +00:00
IvanKobzarev	c01499ecc6	[sym_shapes][perf] Cache ShapeEnv constrain_symbol_range calls (#124610 ) Differential Revision: [D56422688](https://our.internmc.facebook.com/intern/diff/D56422688) Pull Request resolved: https://github.com/pytorch/pytorch/pull/124610 Approved by: https://github.com/ezyang, https://github.com/lezcano	2024-04-22 21:49:08 +00:00
Wanchao Liang	05addd5658	[tp] add kwargs support to prepare_module_input (#124114 ) As titled, this PR adds kwargs support to the PrepareModuleInput style. Some modules take only kwargs inputs and no positional args, so this case should be supported. Pull Request resolved: https://github.com/pytorch/pytorch/pull/124114 Approved by: https://github.com/XilunWu	2024-04-22 21:46:31 +00:00
Jithun Nair	5785b02ba6	Skip workspace permission change for ROCm CI (#123816 ) PR https://github.com/pytorch/pytorch/pull/122922 added chown steps to test.sh and used the trap mechanism to ensure that, even if the test script fails and exits with a non-zero code, it will call the cleanup_workspace function on EXIT. However, this doesn't work as intended when the CI job gets cancelled, e.g. if a PR pushes new commits and the older commit's CI job gets cancelled. The trap function doesn't get called as the test script immediately aborts. Any subsequent jobs scheduled on the same runner then fail in the 'Checkout PyTorch' step when they try to delete the workspace. This has been resulting in a slew of CI failures on the HUD. Example of this situation playing out on one of the ROCm runners: Cancelled job: https://github.com/pytorch/pytorch/actions/runs/8563212279/job/23469711035 ![image](https://github.com/pytorch/pytorch/assets/37884920/7192e4fe-8cff-4256-abc8-9f874a3918ff) Subsequent failed job: https://github.com/pytorch/pytorch/actions/runs/8564517036/job/23472675041 ![image](https://github.com/pytorch/pytorch/assets/37884920/24b0af66-cfe9-431f-851a-24a1ccc18e84) This PR skips the logic introduced by PR 122922 for ROCm CI. Alternative to https://github.com/pytorch/pytorch/pull/123468 and https://github.com/pytorch/pytorch/pull/123588 Pull Request resolved: https://github.com/pytorch/pytorch/pull/123816 Approved by: https://github.com/pruthvistony, https://github.com/zxiiro, https://github.com/kit1980, https://github.com/malfet	2024-04-22 21:27:32 +00:00
Bin Bao	bb37910e30	[AOTI] Fixes ScatterFallback codegen (#124580 ) Summary: For https://github.com/pytorch/pytorch/issues/123184. ScatterFallback currently relies on op name matching for codegen, which makes its cpp codegen fragile. Refactor to use op_overload and fix the relevant unit test failures. Differential Revision: [D56417815](https://our.internmc.facebook.com/intern/diff/D56417815) Pull Request resolved: https://github.com/pytorch/pytorch/pull/124580 Approved by: https://github.com/chenyang78	2024-04-22 20:47:26 +00:00
Catherine Lee	fd59554be6	Scripts to compile reruns + td exclusions and upload to s3 (#124312 ) Edits upload_test_stats to also upload a condensed version that contains reruns, and one that contains the list of td_exclusions. Grouped by build name + test config Pull Request resolved: https://github.com/pytorch/pytorch/pull/124312 Approved by: https://github.com/malfet	2024-04-22 20:19:35 +00:00
Edward Z. Yang	0bbbc754dd	Add AOTInductor generated cpp code to TORCH_TRACE (#124617 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/124617 Approved by: https://github.com/albanD	2024-04-22 19:25:20 +00:00
Jason Ansel	0093735ccd	[inductor] Use compile time config values in runtime (#124561 ) This removes usage of torch._inductor.config from `torch._inductor.runtime`. Fixing two issues: 1) If configs change we should really use the compile time ones 2) In compile workers, we want to use the parent process config Pull Request resolved: https://github.com/pytorch/pytorch/pull/124561 Approved by: https://github.com/yanboliang ghstack dependencies: #124552, #124553, #124557, #124559, #124560, #124569	2024-04-22 18:46:40 +00:00
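The first issue named above (runtime code should see the config values that were in effect at compile time, even if the global config changes later) can be sketched with a snapshot captured at compile time. This is an illustration only: the `config` dict, `compile_kernel`, and the `max_autotune` key used here are hypothetical stand-ins, not the inductor code.

```python
import copy

# Hypothetical mutable global config, standing in for a config module.
config = {"max_autotune": False}


def compile_kernel(name):
    """Return a runner that uses the config values captured at compile time."""
    snapshot = copy.deepcopy(config)  # frozen copy taken when compiling

    def run():
        # Reads the snapshot, never the live global config.
        return f"{name}: max_autotune={snapshot['max_autotune']}"

    return run


kernel = compile_kernel("k0")
config["max_autotune"] = True  # later change to the global config
result = kernel()              # still reports the compile-time value
```

Because the runner carries its own copy, it could also be handed to another process without that process needing the parent's global config.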
Jason Ansel	cb9fe91f5c	[inductor] Remove config check for 3D tiling (#124569 ) This makes the check per-kernel (if 3D tiling is used), rather than global config. Pull Request resolved: https://github.com/pytorch/pytorch/pull/124569 Approved by: https://github.com/yanboliang ghstack dependencies: #124552, #124553, #124557, #124559, #124560	2024-04-22 18:46:40 +00:00
Jason Ansel	4620a45542	[inductor] Refactor runtime files into torch._inductor.runtime (part 5) (#124560 ) I am planning to make the compile_worker process not import torch so it can start up much faster. This stack is prep for that. Pull Request resolved: https://github.com/pytorch/pytorch/pull/124560 Approved by: https://github.com/yanboliang ghstack dependencies: #124552, #124553, #124557, #124559	2024-04-22 18:46:35 +00:00
Jason Ansel	0cc0e60e30	[inductor] Refactor runtime files into torch._inductor.runtime (part 4) (#124559 ) I am planning to make the compile_worker process not import torch so it can start up much faster. This stack is prep for that. Pull Request resolved: https://github.com/pytorch/pytorch/pull/124559 Approved by: https://github.com/yanboliang ghstack dependencies: #124552, #124553, #124557	2024-04-22 18:46:29 +00:00
Jason Ansel	7fd8870e6b	[inductor] Refactor runtime files into torch._inductor.runtime (part 3) (#124557 ) I am planning to make the compile_worker process not import torch so it can start up much faster. This stack is prep for that. Pull Request resolved: https://github.com/pytorch/pytorch/pull/124557 Approved by: https://github.com/yanboliang ghstack dependencies: #124552, #124553	2024-04-22 18:46:24 +00:00
Jason Ansel	bb8815bc31	[inductor] Refactor runtime files into torch._inductor.runtime (part 2) (#124553 ) I am planning to make the compile_worker process not import torch so it can start up much faster. This stack is prep for that. Pull Request resolved: https://github.com/pytorch/pytorch/pull/124553 Approved by: https://github.com/yanboliang ghstack dependencies: #124552	2024-04-22 18:46:20 +00:00
Jason Ansel	480585fd2b	[inductor] Refactor runtime files into torch._inductor.runtime (part 1) (#124552 ) I am planning to make the compile_worker process not import torch so it can start up much faster. This stack is prep for that. Pull Request resolved: https://github.com/pytorch/pytorch/pull/124552 Approved by: https://github.com/yanboliang	2024-04-22 18:41:12 +00:00
PyTorch MergeBot	16eea7c6a5	Revert "[inductor] Refactor runtime files into torch._inductor.runtime (part 1) (#124552 )" This reverts commit `a7035cc11a`. Reverted https://github.com/pytorch/pytorch/pull/124552 on behalf of https://github.com/jeanschmidt due to There are internal breakages, already discussed with author and he'll FF ([comment](https://github.com/pytorch/pytorch/pull/124552#issuecomment-2070548223))	2024-04-22 18:28:05 +00:00
PyTorch MergeBot	56714cb497	Revert "[inductor] Refactor runtime files into torch._inductor.runtime (part 2) (#124553 )" This reverts commit `f4d47f5bbb`. Reverted https://github.com/pytorch/pytorch/pull/124553 on behalf of https://github.com/jeanschmidt due to There are internal breakages, already discussed with author and he'll FF ([comment](https://github.com/pytorch/pytorch/pull/124552#issuecomment-2070548223))	2024-04-22 18:28:05 +00:00
PyTorch MergeBot	0b90af0bf5	Revert "[inductor] Refactor runtime files into torch._inductor.runtime (part 3) (#124557 )" This reverts commit `fcf28b0ad5`. Reverted https://github.com/pytorch/pytorch/pull/124557 on behalf of https://github.com/jeanschmidt due to There are internal breakages, already discussed with author and he'll FF ([comment](https://github.com/pytorch/pytorch/pull/124552#issuecomment-2070548223))	2024-04-22 18:28:05 +00:00
PyTorch MergeBot	b3d6c2fe9b	Revert "[inductor] Refactor runtime files into torch._inductor.runtime (part 4) (#124559 )" This reverts commit `9ea2a09510`. Reverted https://github.com/pytorch/pytorch/pull/124559 on behalf of https://github.com/jeanschmidt due to There are internal breakages, already discussed with author and he'll FF ([comment](https://github.com/pytorch/pytorch/pull/124552#issuecomment-2070548223))	2024-04-22 18:28:05 +00:00
PyTorch MergeBot	0f44ef93ab	Revert "[inductor] Refactor runtime files into torch._inductor.runtime (part 5) (#124560 )" This reverts commit `3ac30bc32a`. Reverted https://github.com/pytorch/pytorch/pull/124560 on behalf of https://github.com/jeanschmidt due to There are internal breakages, already discussed with author and he'll FF ([comment](https://github.com/pytorch/pytorch/pull/124552#issuecomment-2070548223))	2024-04-22 18:28:05 +00:00
PyTorch MergeBot	8973c5b846	Revert "[inductor] Remove config check for 3D tiling (#124569 )" This reverts commit `317c0af149`. Reverted https://github.com/pytorch/pytorch/pull/124569 on behalf of https://github.com/jeanschmidt due to There are internal breakages, already discussed with author and he'll FF ([comment](https://github.com/pytorch/pytorch/pull/124552#issuecomment-2070548223))	2024-04-22 18:28:05 +00:00
PyTorch MergeBot	30dec1da84	Revert "[inductor] Use compile time config values in runtime (#124561 )" This reverts commit `3af12447f8`. Reverted https://github.com/pytorch/pytorch/pull/124561 on behalf of https://github.com/jeanschmidt due to There are internal breakages, already discussed with author and he'll FF ([comment](https://github.com/pytorch/pytorch/pull/124561#issuecomment-2070537634))	2024-04-22 18:24:38 +00:00
rzou	d77e7b7c54	Make some kernel static asserts clearer (#124519 ) Users get int/int64_t and double/float confused a lot. Test Plan: - tested locally Pull Request resolved: https://github.com/pytorch/pytorch/pull/124519 Approved by: https://github.com/Skylion007	2024-04-22 18:17:40 +00:00
Isuru Fernando	c2f8bfae9c	Make torch._inductor.dependencies.Dep a proper class (#124407 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/124407 Approved by: https://github.com/peterbell10	2024-04-22 17:09:34 +00:00
Aleksei Nikiforov	77c35334c1	Fix build on s390x (#123250 ) Rename s390x-specific zvector functions with same name. Pull Request resolved: https://github.com/pytorch/pytorch/pull/123250 Approved by: https://github.com/malfet	2024-04-22 16:57:08 +00:00
Aleksei Nikiforov	be2e56b5ab	s390x: update using vectorization builtins (#124396 ) With gcc >= 12 on s390x store builtins are accidentally optimized out due to bad type aliasing. Ensure that proper corresponding types are used, and if types do mismatch, first store data into array of correct type and then memcpy it to destination pointer. See also: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676 Pull Request resolved: https://github.com/pytorch/pytorch/pull/124396 Approved by: https://github.com/malfet	2024-04-22 16:55:18 +00:00
mengfeil	0ee514e628	[CI] Upgrade xpu driver to LTS_803.29 (#123920 ) Upgrade xpu driver from 647.21 to LTS 803.29 Works for #114850 Pull Request resolved: https://github.com/pytorch/pytorch/pull/123920 Approved by: https://github.com/chuanqi129, https://github.com/EikanWang, https://github.com/huydhn	2024-04-22 16:45:01 +00:00
Thiago Crepaldi	9c2ac4476c	Allow ONNX models without parameters (#121904 ) Currently, if initializers are available, they are included in the ONNX model. If they are not available, the model is serialized without them. However, there are times when the initializers are available, but the user prefers not to include them in the model, say for visualizing it on Netron or because the initializers will be specified along with the inputs in the ONNX runtime of choice. This PR allows users to pass `include_initializers` to the `ONNXProgram.save()` API. Fixes #100996 Pull Request resolved: https://github.com/pytorch/pytorch/pull/121904 Approved by: https://github.com/titaiwangms	2024-04-22 15:53:38 +00:00
Jeff Daily	6ede882c0b	preferred blas library; cublaslt gemm implementation (#122106 ) Following the example of PyTorch supporting a preferred Linalg library (cusolver or magma), this PR introduces a preferred blas library selector of either cublas or cublaslt for CUDA and hipblas or hipblaslt for ROCm via normal hipification of sources. The default blas implementation remains cublas or hipblas. cublaslt or hipblaslt can be enabled using environment variable TORCH_BLAS_PREFER_CUBLASLT=1 (or TORCH_BLAS_PREFER_HIPBLASLT=1 as an alias) or by calling `torch.backends.cuda.preferred_blas_library(backend="cublaslt")` or as an alias `backend="hipblaslt"`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/122106 Approved by: https://github.com/lezcano	2024-04-22 15:38:22 +00:00
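The selection rule described above (cublas by default, cublaslt when either environment variable is set to 1) can be sketched in plain Python. This is an assumption-laden illustration: `preferred_blas_from_env` is a hypothetical helper, and how PyTorch actually parses these variables is not stated in the commit message.

```python
import os
from typing import Mapping


def preferred_blas_from_env(env: Mapping[str, str] = os.environ) -> str:
    """Pick a BLAS backend name from the environment, defaulting to cublas."""
    # Both variable names come from the commit message; the second is an alias.
    for name in ("TORCH_BLAS_PREFER_CUBLASLT", "TORCH_BLAS_PREFER_HIPBLASLT"):
        if env.get(name) == "1":
            return "cublaslt"
    return "cublas"


default_backend = preferred_blas_from_env({})
opted_in = preferred_blas_from_env({"TORCH_BLAS_PREFER_CUBLASLT": "1"})
via_alias = preferred_blas_from_env({"TORCH_BLAS_PREFER_HIPBLASLT": "1"})
```

In a real program the same preference would be set with the call quoted in the commit message, `torch.backends.cuda.preferred_blas_library(backend="cublaslt")`.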
Adnan Akhundov	9a322ba1b0	[user triton] Return unbacked SymInts used in the grid (#124594 ) Summary: When unbacked SymInts are used only in a grid of a user-written Triton kernel call, there is no dependency between the Triton kernel's buffer and those unbacked SymInts. As a result, the definitions of the unbacked SymInts are not codegen-ed and the code using them in the grid definition breaks. Here we add the unbacked SymInts used in the grid to the `get_unbacked_symbol_uses` returned by the `UserDefinedTritonKernel` alongside those used in the `kwargs` (returned by `ExternKernel`). Test Plan: ``` $ python test/inductor/test_aot_inductor.py -k test_triton_kernel_unbacked_symint ... ---------------------------------------------------------------------- Ran 24 tests in 155.764s OK (skipped=16) ``` Differential Revision: [D56406991](https://our.internmc.facebook.com/intern/diff/D56406991) Pull Request resolved: https://github.com/pytorch/pytorch/pull/124594 Approved by: https://github.com/oulgen	2024-04-22 15:33:30 +00:00


72112 Commits