pytorch

mirror of https://github.com/zebrajr/pytorch.git synced 2025-12-07 12:21:27 +01:00

Author	SHA1	Message	Date
Bin Bao	3058700f7f	[aotinductor] Add AOTIModelRunner as a utility class (#110891 ) Summary: Introduce a utility class AOTIModelRunner to take care of running an AOTInductor compiled model. It does things like dlopen a model, initialize the model container, setup inputs and outputs, and destroy the model container. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110891 Approved by: https://github.com/chenyang78 ghstack dependencies: #110652	2023-10-11 15:58:28 +00:00
Bin Bao	b17c247eb1	[aotindutor] Update the cpp test example (#110652 ) Summary: store inputs and outpus in python, and load them back to run the compiled model in c++ and compare the output Pull Request resolved: https://github.com/pytorch/pytorch/pull/110652 Approved by: https://github.com/chenyang78	2023-10-11 15:58:28 +00:00
ydwu4	3062e267b1	[cond] Add more tests for valid inputs of cond (#110727 ) This PR adds a parametrized test for cond. It tests cond can be traced with valid inputs. Specifically valid inputs is combination of: - pred (python boolean, boolean tensor, int tensor, scalar tensor) - true_fn/false_fn (func, obj, nn_module) - Operands (0 or more tensor inputs), tested with 0 and 2 - closures (0 or more tensor closures), tested with 0 and 2 - nested_level (no nesting or level-2 nested cond) What this test doesn't cover: - pred: symbolic boolean expression as predicate - true_fn/false_fn: that mutates indermediate tensors - operands: non-tensor operands such as float, int - closures: nn_module attribute closures, python constant closures - nested_level: 3+ Pull Request resolved: https://github.com/pytorch/pytorch/pull/110727 Approved by: https://github.com/zou3519	2023-10-11 15:56:13 +00:00
Sam Larsen	fc1105b282	[inductor] Implement Fx graph caching to improve warm compilation time. (#103453 ) Summary: Implement an on-disk cache to save and reuse compiled FX Graphs. This implementation does not handle tensors with symbolic shapes. This needs to be done in a follow-up PR. Test Plan: * New unit tests exercising saving and load from the cache. * New unit tests to exercise the cache key calculations. * Ran several benchmarks to see cache hit and resulting compilation times. Pull Request resolved: https://github.com/pytorch/pytorch/pull/103453 Approved by: https://github.com/eellison, https://github.com/Chillee	2023-10-11 14:39:14 +00:00
rzou	2cf9782912	[generate_opcheck_tests] Add some reasonable defaults (#110977 ) Summary: Make it easier to add `generate_opcheck_tests` by adding defaults for the failures_dict location, the additional decorators, and the test utils. Test Plan: Existing tests Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/110977 Approved by: https://github.com/williamwen42 ghstack dependencies: #110951	2023-10-11 14:28:05 +00:00
PyTorch MergeBot	98c329b19e	Revert "[core ATen IR] Add decompositions for max, min, var_mean (#110906 )" This reverts commit `9606cda64e`. Reverted https://github.com/pytorch/pytorch/pull/110906 on behalf of https://github.com/SS-JIA due to Breaks internal CI ([comment](https://github.com/pytorch/pytorch/pull/110906#issuecomment-1757490740))	2023-10-11 11:41:21 +00:00
igm503	95ff51d8ed	[MPS] Add support for Softshrink to MPS Backend (#110814 ) Adds the softshrink activation function to the mps backend. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110814 Approved by: https://github.com/kulinseth	2023-10-11 07:55:39 +00:00
PyTorch MergeBot	0821868110	Revert "[export] Get export APIs ready for PTC (#110410 )" This reverts commit `b96ea9f361`. Reverted https://github.com/pytorch/pytorch/pull/110410 on behalf of https://github.com/facebook-github-bot due to Diff reverted internally ([comment](https://github.com/pytorch/pytorch/pull/110410#issuecomment-1757017249))	2023-10-11 07:31:51 +00:00
Angela Yi	b96ea9f361	[export] Get export APIs ready for PTC (#110410 ) Summary: https://docs.google.com/document/d/1QJJEGnj2nHGPODlw38BEG3KLLCOTfdOVjPrNQbz_LM8/edit#bookmark=id.lp80wfshq130 Changes: * `torch.export` will return a functional ATen graph w/o decompositions * `exported_program.run_decompositions(decomposition_table)` will optionally take a decomposition table, and run decompositions on the exported program, returning a new exported program. By default we will run the Core ATen decomposition table. Calling convention for Executorch stays the same: ``` pre_autograd_graph = capture_pre_autograd_graph(f, args, ...) aten_graph_no_decomps = torch.export.export(pre_autograd_graph, args, ...) # Within to_edge we decompose to core aten and then convert to edge edge_graph = exir.to_edge(aten_graph_no_decomps) ``` Test Plan: CI Differential Revision: D49742989 Pull Request resolved: https://github.com/pytorch/pytorch/pull/110410 Approved by: https://github.com/ydwu4	2023-10-11 06:10:07 +00:00
Michael Voznesensky	1e7947b3e0	Revert "Reland 3rd try [finishing colesbury's PR 100642] Guard on nn.Module dicts and type (#109323 )" + Forward fixes + test (#110964 ) This reverts commit `f786fbdebd`. Forward fixes Pull Request resolved: https://github.com/pytorch/pytorch/pull/110964 Approved by: https://github.com/ezyang, https://github.com/anijain2305	2023-10-11 05:16:47 +00:00
soulitzer	110382bacf	Make NestedTensor compilable with eager backend (#109171 ) In this PR: - Adds support for strides for jagged tensor (design doc for this coming soon) - NestedTensor skips automatic dynamic - Make use of @bdhirsh's subclass fakification logic by adding the __tensor_{un,}flatten__ functions. - Additional logic for fakification: since existing subclass fakification logic does not handle the case where the outer tensor has an additional dimension. We insert one-off logic to (1) insert an extra SingletonSymInt onto the fakified NestedTensor. (2) make sure we call track_symint on both the sizes on the inner and outer tensor during guard creation. Remaining things that are weird: - Still need to skip some logic in meta utils for some reason (I was going to write this up more, but decided not to since we're not able to do this anyway for a immediate reason: we cannot arbitrarily compare singleton ints. For now I'm just following Brian's advise from [here](https://github.com/pytorch/pytorch/pull/109171#discussion_r1328137070) ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/109171 Approved by: https://github.com/ezyang, https://github.com/bdhirsh	2023-10-11 04:47:10 +00:00
andrewor14	0e551bbcd7	[quant][pt2] Preserve source_fn_stack after QAT fusion (#110899 ) Test Plan: python test/test_quantization.py TestQuantizePT2EQAT.test_qat_preserve_source_fn_stack Reviewers: jerryzh168, kimishpatel Subscribers: jerryzh168, kimishpatel, supriyar Differential Revision: [D50101253](https://our.internmc.facebook.com/intern/diff/D50101253) Pull Request resolved: https://github.com/pytorch/pytorch/pull/110899 Approved by: https://github.com/jerryzh168	2023-10-11 02:55:52 +00:00
Tugsbayasgalan Manlaibaatar	5aee22e0e0	Move export.constrain_as_* to torch._constrain_as_* (#110757 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/110757 Approved by: https://github.com/avikchaudhuri ghstack dependencies: #109859	2023-10-11 02:37:55 +00:00
soulitzer	c9eb8d8d90	Add set_checkpoint_debug_enabled that overrides local setting (#110728 ) People access activation checkpoint through many layers of config and it is not always guaranteed that all the layers of wrapping around checkpoint properly propagate all the kwargs, e.g. debug mode. This context manager offers an alternative way to enable debug mode that bypasses the need for all layers to propagate kwargs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110728 Approved by: https://github.com/albanD ghstack dependencies: #110673, #110674, #110675, #110676	2023-10-11 02:12:31 +00:00
Michael Voznesensky	02f6a8126e	Support a simple subset of functions as backward hooks on intermediate tensors (#109537 ) The main thrust of the initial effort here was to capture `register_hook` calls on tensors in compile regions. The first part of this was done in https://github.com/pytorch/pytorch/pull/108903 wherein we added support for register_hook input tensors. The distinction between input and intermediary is due to implementation differences. There are 2 kinds of hooks: 1) Hooks on objects with sources (inputs, params) 2) Hooks on objects w/o sources (intermediaries, and outputs). Note: As outputs can be made simple by how dynamo handles residuals, they could actually be handled as if they were inputs, but, for the sake of this PR, we will refer to hooks as either hooks on inputs (sourced), or hooks on intermediaries (not sourced). The plan: For tensors w/ a source: (The PR above) We record registered hooks, store them as a global, and associate them with the tensor in residuals. This means that when dynamo goes to create the frame, where we produce bytecode to stitch together our PT2 modified bytecode with the original eager code, we call register_hook. This registration of hooks in residuals is sound because (a) it happens right after a Pt2 frame region ends and (b) we know that the tensor is alive in f_locals, f_globals, or a module in the users invoking frame. This means we can soundly know it will be around to invoke register_hook on. As long as we guard on the identity of the lifted function, this is sound to do. For tensors w/o a source: (This PR) Ostensibly, the most correct and complete solution would be to smuggle hooks into a runtime wrapper in aot_autograd, where all the items the hooks close over are lifted to inputs as necessary and passed alongside the user provided function. This is necessary so that we can properly trace out and capture all the mutations within the user defined hook at backwards time. This is too complicated - so, we limited the scope of this initial PR to a simple subset of hooks: - Hooks must have a source (be known to us already, not a lambda or intermediary defined function) - We must be tracing under compiled autograd The flow: We use the HOP added in https://github.com/pytorch/pytorch/pull/109690/files, referred to as the HOP below. 1) We intercept register_hook calls and wrap the user defined fn in the HOP 2) We write a `_register_hook_trampoline` to the graph that is a local no-arg function that is invoked as a call_function in the dynamo graph 3) aot_autograd inlines through it during its trace, and sees the HOP 4) the HOP preserves itself in the graph - it does not get traced into 5) During backwards, compiled_autograd installs the HOP under a hook call 6) When compiled_autograd enters compilation over its generated graph, dynamo traces the contents of the hook Pull Request resolved: https://github.com/pytorch/pytorch/pull/109537 Approved by: https://github.com/ezyang	2023-10-11 01:35:37 +00:00
Edward Z. Yang	24bf9aeb6b	Fix arange with dynamic end argument. (#110979 ) Fixes https://github.com/pytorch/pytorch/issues/93468 There's a few extra tests that are sort of unrelated, but I ended up writing them while working on the fix and decided to keep them. The big idea here is to split the `_check` so that `expect_true` works; I could have probably also improved the symbolic reasoning but I'm lazy. One small logging fix too. Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/110979 Approved by: https://github.com/Skylion007	2023-10-11 00:32:34 +00:00
leslie-fang-intel	a11d4a8378	[Reland] [Inductor] Break the loop fusion when node2 depends on node1 mutations (#110677 ) Reland PR https://github.com/pytorch/pytorch/pull/109172 which has been reverted in https://github.com/pytorch/pytorch/pull/110622 Differential Revision: [D50097373](https://our.internmc.facebook.com/intern/diff/D50097373) Pull Request resolved: https://github.com/pytorch/pytorch/pull/110677 Approved by: https://github.com/jgong5, https://github.com/ezyang	2023-10-11 00:26:45 +00:00
PyTorch MergeBot	314a502eb0	Revert "Reland "[C10] PG observability hooks. (#108815 )" (#110907 )" This reverts commit `7678cd22af`. Reverted https://github.com/pytorch/pytorch/pull/110907 on behalf of https://github.com/huydhn due to Sorry for reverting this, but macos job in trunk starts failing after this `7678cd22af` ([comment](https://github.com/pytorch/pytorch/pull/110907#issuecomment-1756497387))	2023-10-11 00:23:42 +00:00
Huy Do	2edc75a669	Add a workflow to release Android binaries (#110976 ) This adds 2 jobs to build PyTorch Android with and without lite interpreter: * Keep the list of currently supported ABI armeabi-v7a, arm64-v8a, x86, x86_64 * Pass all the test on emulator * Run an the test app on emulator and my Android phone `arm64-v8a` without any issue ![Screenshot_20231010-114453](https://github.com/pytorch/pytorch/assets/475357/57e12188-1675-44d2-a259-9f9577578590) * Run on AWS https://us-west-2.console.aws.amazon.com/devicefarm/home#/mobile/projects/b531574a-fb82-40ae-b687-8f0b81341ae0/runs/5fce6818-628a-4099-9aab-23e91a212076 Pull Request resolved: https://github.com/pytorch/pytorch/pull/110976 Approved by: https://github.com/atalman	2023-10-11 00:19:33 +00:00
Jon Chuang	5aa96fd336	[dynamo] list index: add more list types to testing, support namedtuple, improve error handling (#110919 ) Follow up: #110817 Minor improvements as discussed in prev PR Pull Request resolved: https://github.com/pytorch/pytorch/pull/110919 Approved by: https://github.com/ezyang	2023-10-11 00:16:39 +00:00
SS-JIA	9606cda64e	[core ATen IR] Add decompositions for max, min, var_mean (#110906 ) ## Context Add decompositions for `aten.max`, `aten.min`, and `aten.var_mean`. These operators follow a pattern of returning a tuple of outputs from two component operators: ``` aten.max(x) -> return aten.amax(x), aten.argmax(x) aten.min(x) -> return aten.amin(x), aten.argmin(x) aten.var_mean(x) -> return aten.var(x), aten.mean(x) ``` For `var_mean`, the `refs` implementation was doing something similar, so I changed it to call `torch.` ops instead like was done for other `refs` implementations previously. cc: @peterbell10 @lezcano Note that Inductor lowers all these directly, so they are excluded from the Inductor decomp table. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110906 Approved by: https://github.com/manuelcandales	2023-10-11 00:06:24 +00:00
PyTorch MergeBot	3100d3e661	Revert "[inductor] Implement Fx graph caching to improve warm compilation time. (#103453 )" This reverts commit `8a8668e1ae`. Reverted https://github.com/pytorch/pytorch/pull/103453 on behalf of https://github.com/kit1980 due to The newly added test fails on internal builds ([comment](https://github.com/pytorch/pytorch/pull/103453#issuecomment-1756449919))	2023-10-10 23:21:59 +00:00
Will Constable	ca03f36233	Change ProcessGroupNCCL default timeout to 10 min (#110947 ) Avoid changing default for other backends as CPU backend (GLOO) may need longer timeouts. Motivated by trying to save cluster time when encountering collective hangs. Generally collectives should time out within seconds and 30 minutes (or 10 minutes) should provide ample headroom for edge cases. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110947 Approved by: https://github.com/xw285cornell, https://github.com/fduwjj	2023-10-10 22:28:39 +00:00
Jerry Zhang	7a69e3d30b	[fx][subgraph_matcher] Add a matcher that supports name to node map (#110743 ) Summary: We want the matcher to return a name -> node in target graph so that we can refer to the node by name, this is useful for downstream applications like quantization. and also we can use the torch API as source of truth instead of matching aten API directly. Test Plan: python test/fx/test_matcher_utils.py Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/110743 Approved by: https://github.com/SherlockNoMad	2023-10-10 22:21:24 +00:00
Ramil Nugmanov	91eeb77260	StackDataset batched sampling (#110694 ) Optimization of loading minibatches Pull Request resolved: https://github.com/pytorch/pytorch/pull/110694 Approved by: https://github.com/ejguan	2023-10-10 22:05:51 +00:00
Joel Schlosser	ac01304e22	pin_memory support for NT (#110404 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/110404 Approved by: https://github.com/cpuhrsch, https://github.com/albanD ghstack dependencies: #110292	2023-10-10 21:58:19 +00:00
Joel Schlosser	43ea782af3	Multiprocessing support for NT (#110292 ) Fixes #110161 Allows NTs to be used in DataLoaders with `num_workers > 1`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110292 Approved by: https://github.com/cpuhrsch, https://github.com/albanD	2023-10-10 21:58:19 +00:00
rzou	3a29cdc5e6	[optests] Add dontGenerateOpCheckTests and is_inside_opcheck_mode (#110951 ) This PR adds the following helper functions for generated opcheck tests: - dontGenerateOpCheckTests is a decorator that skips generation of the opcheck tests for the generated function - is_inside_opcheck_mode lets us query if we are in a generated test. Useful for fast debugging out-of-tree without needing to update PyTorch. Test Plan: - new tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/110951 Approved by: https://github.com/williamwen42	2023-10-10 21:43:43 +00:00
wz337	d9eb5a57aa	[FSDP] Change _create_chunk_dtensor in fsdp/_shard_utils.py to use public API from DTensor (#110831 ) This PR: 1) updates _create_chunk_dtensor() in _shard_utils.py to use public APIs from DTensor. This will avoid the global_size calculation error from using DTensor.from_local() for uneven-sharded parameters, as described in https://github.com/pytorch/pytorch/issues/110762 2) updates test/distributed/fsdp/test_fsdp_dtensor_state_dict.py to include unit test for a model with uneven sharding. cc. @wanchaol, @fegin Pull Request resolved: https://github.com/pytorch/pytorch/pull/110831 Approved by: https://github.com/wanchaol, https://github.com/fegin	2023-10-10 21:04:27 +00:00
Jon Chuang	6e770c0dda	[dynamo] Add `itertools.repeat` via polyfill (#110953 ) Fixes https://github.com/pytorch/pytorch/issues/110286 Pull Request resolved: https://github.com/pytorch/pytorch/pull/110953 Approved by: https://github.com/ezyang	2023-10-10 20:40:33 +00:00
Will Constable	7678cd22af	Reland "[C10] PG observability hooks. (#108815 )" (#110907 ) This reverts commit `ff0358b038`. (original PR https://github.com/pytorch/pytorch/pull/108815 desc copied below) Expose a set of observability hooks into C10D such that our users can detect collectives failure both faster and more easily. The design is similar to NCCL desync debug that it minimized the overhead by doing most of the work out of the main thread. This PR introduces a new module torch.distributed.hooks that exposes the following set of methods: register_collective_start_hook register_collective_end_hook register_process_group_hook The process group hook exposes PG creation on the member ranks and call them inline from the the PG creation code. This is fine since this happens during initialization and a limited number of times. The collective start/end hooks are fired from a single background thread. It reads events from a C++ queue and dispatches over. Queue notification is oddly done using a pipe, this is needed so python can abort the thread on shutdown and have it as background thread. This is not possible with more reasonable choices like a condvar. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110907 Approved by: https://github.com/fduwjj	2023-10-10 20:09:40 +00:00
soulitzer	3842b175d2	[reland] Add symbolic singleton int (#110674 ) reland of https://github.com/pytorch/pytorch/pull/110370 Pull Request resolved: https://github.com/pytorch/pytorch/pull/110674 Approved by: https://github.com/ezyang ghstack dependencies: #110673	2023-10-10 19:37:17 +00:00
soulitzer	fda0a965c7	[reland] Support SingletonSymNode mul with coefficient (#110673 ) reland of https://github.com/pytorch/pytorch/pull/110369 Pull Request resolved: https://github.com/pytorch/pytorch/pull/110673 Approved by: https://github.com/ezyang	2023-10-10 19:37:17 +00:00
eellison	fb4b9e9c8e	Re-enable a couple of fixed tests (#110770 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/110770 Approved by: https://github.com/yanboliang, https://github.com/int3, https://github.com/Skylion007 ghstack dependencies: #110651	2023-10-10 19:13:14 +00:00
drisspg	5183760ca5	Adding Backward Support for NestedTensors and FlashAttention (#97485 ) # Summary <!-- copilot:summary --> ### <samp>🤖 Generated by Copilot at 318764f</samp> This pull request implements the CUDA backend of the SDPA kernel for nested tensors, which enables efficient transformer models with variable-length sequences. It adds a new dispatch key, a backward function, a unit test, and some helper functions for the kernel. It modifies `test/test_transformers.py`, `aten/src/ATen/native/native_functions.yaml`, `aten/src/ATen/native/nested/cuda/NestedTensorTransformerFunctionsBackward.cpp`, and `aten/src/ATen/native/nested/cuda/NestedTensorTransformerUtils.h`. <!-- copilot:poem --> ### <samp>🤖 Generated by Copilot at ed4a773</samp> > _Fused kernels of doom, unleash the flash attention_ > _Nested tensors on fire, reshape and pad with caution_ > _Backward pass of power, dispatch the CUDA key_ > _Test the gradients of hell, warn the user if they disagree_ Pull Request resolved: https://github.com/pytorch/pytorch/pull/97485 Approved by: https://github.com/jbschlosser	2023-10-10 18:08:17 +00:00
Chien-Chin Huang	57f6368b8e	[collective] Add a torch.compile + functional_collectives test (#110688 ) Add a test to ensure functional_collectives + torch.compile always works. Differential Revision: [D50001491](https://our.internmc.facebook.com/intern/diff/D50001491/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/110688 Approved by: https://github.com/wanchaol, https://github.com/fduwjj	2023-10-10 17:14:50 +00:00
eellison	c5f06b9753	Re-enable test_copy_transpose_math_view, neg_view/dce fix (#110651 ) - neg view can just be lowered to neg() post functionalization - we were treating all fallback kernels as not having side effects. we shouldn't dce mutating fallback kernels - either mutations induced by the reinplacing pass or clone_ with unsupported arguments (complex) Pull Request resolved: https://github.com/pytorch/pytorch/pull/110651 Approved by: https://github.com/Chillee, https://github.com/jansel, https://github.com/malfet, https://github.com/Skylion007	2023-10-10 16:34:01 +00:00
Brian Hirsh	ba86dfcd83	AOTDispatch subclass (#104483 ) This is a PoC of AOTDispatch support. This PR actually works on basic examples, and I'm working on testing it out on `DTensor` (with @wanchaol), `SemiStructuredSparsityTensor` (with @jcaip), and `FP8Tensor`. There are some design decisions baked into the PR that I think we need consensus on though - so I'm planning on writing a larger design doc to go over the changes. Pull Request resolved: https://github.com/pytorch/pytorch/pull/104483 Approved by: https://github.com/ezyang	2023-10-10 16:13:16 +00:00
jjsjann123	37567fdf31	Nvfuser cpp api deprecation attempt 2 (#110881 ) attempting to re-try #110318 deprecating nvfuser c++ API warning has been updated to TORCH_WARN_ONCE; Warning thrown inside torch::jit::fuser::cuda::isEnabled() is turned off and will be deprecated when we pulled out TorchScript integration in the follow up PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110881 Approved by: https://github.com/davidberard98, https://github.com/NicolasHug	2023-10-10 08:07:03 +00:00
Liao, Xuan	8820dda943	Revise def of contiguity in bmm (#110811 ) Fixes #108754. `hf_T5_generate` would encounter a regression when calling `extern_kernels.bmm`, if one input is `reinterpret_tensor(buf2, (8, 1, 64), (64, 0, 1))` rather than `reinterpret_tensor(buf2, (8, 1, 64), (64, 512, 1), 0)`. As @jgong5 mentioned in comment, in fact the two tensors are equivalent: The stride doesn't matter when the corresponding size is 1. We revise the definition of contiguity in `bmm` to add the above situation as a contiguous case. Thus, when stride equals to 0, `extern_kernels.bmm` could still use `gemm` of MKL to gain the performance. Speedup of `hf_T5_generate` is 1.343x now and 1.138x before, with script `bash inductor_single_test.sh multiple inference performance torchbench hf_T5_generate float32 first dynamic default 0`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/110811 Approved by: https://github.com/jgong5, https://github.com/lezcano, https://github.com/Chillee	2023-10-10 06:48:08 +00:00
Jesse Cai	f10aab03c4	[sparse] Fix semi-structured sparse shape mismatch bug (#110420 ) Summary: Currently, PyTorch incorrectly calculates the size of the returned matrix when we pass a non-contiguous batched (>2d) input to the semi-structured sparse subclass. This is most common in MLP layers, where we have 2 linear layers back to back. This will lead to an error like the following: ``` RuntimeError: shape '[20, 64, 64, 3072]' is invalid for input of size 62914560 ``` Where the size of the sparse matmul result is off because we infer the output shape with the wrong tensor shape. This happens because of a bug where we did not update the subclass tensor shape when doing transpose. For semi-structured sparsity, transposing is a no-op where we just set the boolean flag, but we forgot to also update the tensor shape. Note that this error goes away in inference mode, since we avoid decomposing the aten.linear op and handle shape folding ourselves, which changes the execution path. An alternative way to fix this issue is to set TORCH_FLATTEN_LINEAR_3D=True, which will also fix this error. Test Plan: ``` python test/test_sparse_semi_structured.py -k test_mlp ``` Reviewers: Subscribers: Tasks: Tags: Pull Request resolved: https://github.com/pytorch/pytorch/pull/110420 Approved by: https://github.com/alexsamardzic, https://github.com/cpuhrsch	2023-10-10 03:07:31 +00:00
Guilherme Leobas	0a580da582	Add batch decomposition for torch.linalg.eigh (#110640 ) Closes https://github.com/pytorch/pytorch/issues/108481 Pull Request resolved: https://github.com/pytorch/pytorch/pull/110640 Approved by: https://github.com/kshitij12345, https://github.com/zou3519	2023-10-09 21:36:49 +00:00
chilli	c596db762f	refactor aotautograd to set requires_grad on info rather than a separate array (#110720 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/110720 Approved by: https://github.com/bdhirsh	2023-10-09 20:18:19 +00:00
Jon Chuang	db760527e0	fix(dynamo): list index via polyfill (#110817 ) Fixes https://github.com/pytorch/pytorch/issues/109031 Pull Request resolved: https://github.com/pytorch/pytorch/pull/110817 Approved by: https://github.com/ezyang	2023-10-09 19:48:39 +00:00
PyTorch MergeBot	d1c157c598	Revert "[reland] Update custom Function preserve torch function when inputs r… (#110679 )" This reverts commit `563728f61c`. Reverted https://github.com/pytorch/pytorch/pull/110679 on behalf of https://github.com/kit1980 due to The diff has Meta-internal changes, please land from Phabricator ([comment](https://github.com/pytorch/pytorch/pull/110679#issuecomment-1753523182))	2023-10-09 19:09:01 +00:00
Edward Z. Yang	8ae623db9d	Don't pass tuple to with statement (#110864 ) Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/110864 Approved by: https://github.com/Skylion007, https://github.com/awgu	2023-10-09 19:00:34 +00:00
igm503	4b881b0da3	[MPS] add support for sgn to MPS backend (#110829 ) Fixes #86805 Adds support for sgn to MPS backend. Notes: 1. @malfet self-assigned this when he was working on implementing polar, but from what I can tell, he didn't end up needing to implement it. 2. @Berzeg implemented this last year, before view_as_complex was supported. Because of @malfet recent contributions, however, @Berzeg 's implementation works. I've removed the part of his implementation that dealt with non-complex dtypes (since these can just be passed to at::sign), matched the more recent pattern we've been using in UnaryOps.mm, and thrown in a simple implementation of _efficientzerotensor for mps, so that the backward function works. 3. @Berzeg deserves a good bit of credit for this, so let me know if there's a way to assign him some without jamming up the pr (he seems to be AWOL since last working on this) Pull Request resolved: https://github.com/pytorch/pytorch/pull/110829 Approved by: https://github.com/malfet	2023-10-09 16:53:25 +00:00
Mwiza Kunda	306b2284f2	Add meta kernel for ctc_loss.intList (#107949 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/107949 Approved by: https://github.com/zou3519	2023-10-09 16:35:14 +00:00
PyTorch MergeBot	bbdc8c7b05	Revert "deprecating nvfuser c++ API (#110318 )" This reverts commit `bf0866fc16`. Reverted https://github.com/pytorch/pytorch/pull/110318 on behalf of https://github.com/davidberard98 due to too many warnings being thrown in torchvision https://github.com/pytorch/pytorch/issues/110857 ([comment](https://github.com/pytorch/pytorch/pull/110318#issuecomment-1753245449))	2023-10-09 15:41:50 +00:00
vfdev-5	d2a2a67fa4	Added new test sample to interpolate op in OpInfo (#104181 ) Description: - Added new test sample to interpolate op in OpInfo - Fixed silent issue with zero tensor test sample for uint8 dtype Pull Request resolved: https://github.com/pytorch/pytorch/pull/104181 Approved by: https://github.com/pmeier, https://github.com/lezcano	2023-10-09 10:55:56 +00:00

1 2 3 4 5 ...

22654 Commits