Noticed that on many MRS kernels the grid wrapper for autotuning is huge, with a bunch of duplicate entries, because num_warps and num_stages are not needed for the grid calculation. Let's deduplicate these entries.
Previously, we would see a wrapper like:
```
def grid_wrapper_for_add_kernel_2d_autotuned_0(meta):
if meta['BLOCK_SIZE_X'] == 128 and meta['BLOCK_SIZE_Y'] == 128: return (4, 2, 1)
if meta['BLOCK_SIZE_X'] == 128 and meta['BLOCK_SIZE_Y'] == 128: return (4, 2, 1)
if meta['BLOCK_SIZE_X'] == 64 and meta['BLOCK_SIZE_Y'] == 64: return (8, 4, 1)
if meta['BLOCK_SIZE_X'] == 64 and meta['BLOCK_SIZE_Y'] == 64: return (8, 4, 1)
```
Now it looks like:
```
def grid_wrapper_for_add_kernel_2d_autotuned_0(meta):
if meta['BLOCK_SIZE_X'] == 128 and meta['BLOCK_SIZE_Y'] == 128: return (4, 2, 1)
if meta['BLOCK_SIZE_X'] == 64 and meta['BLOCK_SIZE_Y'] == 64: return (8, 4, 1)
```
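The deduplication idea, as a minimal hedged sketch (hypothetical helper, not the actual inductor codegen): build the wrapper entries keyed only on the meta values that affect the grid, so configs that differ only in `num_warps`/`num_stages` collapse into one entry.
```python
# Hypothetical sketch: collapse autotuner configs that differ only in
# num_warps / num_stages, since those don't affect the grid.
def dedupe_grid_entries(configs, grid_fn):
    seen = set()
    entries = []
    for cfg in configs:
        # Key on the meta values the grid actually depends on.
        key = tuple(sorted(
            (k, v) for k, v in cfg.items()
            if k not in ("num_warps", "num_stages")
        ))
        if key in seen:
            continue
        seen.add(key)
        entries.append((dict(key), grid_fn(cfg)))
    return entries

# Two configs differing only in num_warps/num_stages yield a single entry.
configs = [
    {"BLOCK_SIZE_X": 128, "BLOCK_SIZE_Y": 128, "num_warps": 4, "num_stages": 2},
    {"BLOCK_SIZE_X": 128, "BLOCK_SIZE_Y": 128, "num_warps": 8, "num_stages": 3},
]
print(dedupe_grid_entries(configs, lambda cfg: (4, 2, 1)))
```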
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115849
Approved by: https://github.com/jansel
In this PR, we are implementing functionalization on the pre-dispatch graph. Today, every dispatch key except for DispatchKey.Python has a dedicated mode stack in Python. PreDispatch tracing relies on this behaviour by pushing ProxyTorchDispatchMode onto the DispatchKey.PreDispatch mode stack and handling the dispatching logic in Python. To make pre-dispatch functionalization work, we now need to push FunctionalTensorMode onto the DispatchKey.PreDispatch mode stack and make sure it runs before ProxyTorchDispatchMode (this is very similar to how post-dispatch tracing works). Here are some design decisions we made for this flow to work:
1. FunctionalTensorMode internally calls the C++ Functionalize key. Since C++ functionalization runs after PreDispatch, if we are not careful, we will keep re-entering the PreDispatch key. We solve this by dispatching directly to the C++ Functionalize key.
2. We delete the mode_stack_per_key logic because the only realistic time it is exercised is for PreDispatch, and it is in general not safe to have a plain list: the ordering of FunctionalTensorMode and ProxyTorchDispatchMode matters, and it is hard to enforce on a plain list. Instead, we now have a private class that tracks the PreDispatch mode stack (see the sketch after this list).
3. We will still run CompositeImplicitAutograd decomps in this PR, and will disable this logic later as a follow-up.
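As a rough illustration of design decision 2, here is a hypothetical sketch (not the actual private class in PyTorch) of a container that enforces the required mode ordering instead of relying on push order into a plain list:
```python
# Hypothetical sketch: a dedicated pre-dispatch mode "stack" that guarantees
# FunctionalTensorMode is always dispatched before ProxyTorchDispatchMode.
class PreDispatchModeStack:
    def __init__(self):
        self._functional_mode = None  # holds a FunctionalTensorMode, if set
        self._proxy_mode = None       # holds a ProxyTorchDispatchMode, if set

    def set_functional_mode(self, mode):
        self._functional_mode = mode

    def set_proxy_mode(self, mode):
        self._proxy_mode = mode

    def modes_in_dispatch_order(self):
        # Functionalization must run before proxy tracing, so the order here
        # is fixed regardless of the order the modes were set in.
        return [m for m in (self._functional_mode, self._proxy_mode) if m is not None]
```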
Some missing bits after this PR:
1. Preserving autograd ops in a functional form. Right now they still show up in the graph but in a "non-functional" way.
2. Turn off CompositeImplicitAutograd decomps
3. Functionalizing HOOs (higher-order operators)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/113728
Approved by: https://github.com/bdhirsh
Fixes #114310 and supersedes #114748.
There are two reasons why we have quite a few special cases for `round`:
1. `round` is actually two ops. With `ndigits=None` (the default), `round` always returns an integer. When `ndigits` is an integer, the result is a float (see the example below).
2. Although `round` takes two arguments, it is a unary function with a parameter rather than a binary one.
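For context, Python's built-in `round` shows the same split on a float input:
```python
# round is really two ops, selected by ndigits:
x = 3.14159

print(round(x), type(round(x)))        # 3 <class 'int'>      (ndigits=None -> int)
print(round(x, 2), type(round(x, 2)))  # 3.14 <class 'float'> (ndigits=int -> float)
```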
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115259
Approved by: https://github.com/peterbell10, https://github.com/lezcano
Currently the inductor code for `x.any(-1)` does this strange dance:
```python
tmp0 = tl.load(in_ptr0 + (r1 + (128*x0)), rmask & xmask)
tmp1 = tmp0.to(tl.int64)
tmp2 = (tmp1 != 0)
```
This happens because `register_lowering` is doing type promotion with the
dimension argument, and so promotes to `int64`, which we then cast back to bool.
A better fix would be to fix `register_lowering`, but for now I just remove
the unnecessary type promotion from `aten.any`.
In the current code we also see:
```python
tmp5 = tl.where(rmask & xmask, tmp3, 0)
```
which promotes the boolean value to int, since `0` is an int32 in Triton.
This PR fixes it to generate a boolean constant instead.
Finally, there is also a Triton bug where the `tl.load` itself upcasts to
`tl.int8`. I fix this by adding an explicit cast to `tl.int1`. The final
kernel code looks like:
```python
tmp0 = tl.load(in_ptr0 + (r1 + (128*x0)), rmask & xmask).to(tl.int1)
tmp1 = tl.broadcast_to(tmp0, [XBLOCK, RBLOCK])
tmp3 = tl.full([1, 1], 0, tl.int1)
tmp4 = tl.where(rmask & xmask, tmp1, tmp3)
tmp5 = triton_helpers.any(tmp4, 1)[:, None]
```
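A minimal repro sketch for inspecting the generated kernel (assumes a CUDA build; running with `TORCH_LOGS="output_code"` is one way to print the Triton source):
```python
import torch

@torch.compile
def fn(x):
    return x.any(-1)

x = torch.rand(64, 128, device="cuda") > 0.5  # boolean input
print(fn(x).shape)  # torch.Size([64]) -- reduced over the last dim
```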
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109913
Approved by: https://github.com/lezcano
This PR fixes two bugs:
1) Constant folding a triton kernel results in the kernel's inputs being returned back without any modification. Disable constant folding for triton kernels; this needs more investigation.
2) NoneLayout buffers should not be deleted, as they do not exist.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115908
Approved by: https://github.com/aakhundov, https://github.com/jansel
Make ```SkipFilesVariable``` only handle function types, and route skipped classes to ```UserDefinedClassVariable``` (a toy sketch of this dispatch-by-type idea follows the list below). The reasons behind this are:
* We'd like to remove ```is_allowed```, so the allowed/disallowed torch classes need a proper place to be handled. Under the current architecture we could put them in either ```SkipFilesVariable``` or ```UserDefinedClassVariable```, but it's confusing to have two places do one thing.
- Going forward, let's make ```SkipFilesVariable``` only handle functions; I'll probably rename it to ```SkippedFunctionVariable``` in a follow-up PR.
- Let's dispatch by the value's type; all torch class handling will go to ```UserDefinedClassVariable``` in the next PR.
* We'll merge the in_graph/skip/inline trace decisions into the same API, ```trace_rules.lookup```, so we probably have to limit its input to functions only in order to better organize the ```VariableBuilder._wrap``` logic.
- As a next step, I'll merge ```skipfiles.check``` into ```trace_rules.lookup``` and do the skipfile check before wrapping objects into the correct variable tracker.
- Though ```TorchCtxManagerClassVariable``` is currently decided by ```trace_rules.lookup```, I'll refactor it out in the following PRs.
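A toy sketch of the dispatch-by-type idea mentioned above (illustrative only, not the actual ```VariableBuilder._wrap``` code): functions keep routing to ```SkipFilesVariable```, while classes, including skipped torch classes, route to ```UserDefinedClassVariable```.
```python
# Illustrative sketch: dispatch skipped objects by their Python type.
import types

def wrap_skipped_value(value):
    if isinstance(value, (types.FunctionType, types.BuiltinFunctionType)):
        return "SkipFilesVariable"           # functions only
    if isinstance(value, type):
        return "UserDefinedClassVariable"    # classes, incl. skipped torch classes
    raise NotImplementedError(f"unexpected skipped value: {value!r}")

class Foo:
    pass

print(wrap_skipped_value(len))  # SkipFilesVariable
print(wrap_skipped_value(Foo))  # UserDefinedClassVariable
```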
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115963
Approved by: https://github.com/jansel
Summary:
This change makes the backward pass of `DTensor.from_local()` treat a `Partial()` placement as pass-through, keeping the gradient as `Replicate()`, for the following reasons:
1. When we run the backward pass of `DTensor.from_local()`, if the target placement is `Partial()` (i.e. manually overwritten by user code rather than coming from torch_dispatch), we keep the grad as `Replicate()`, because converting the gradients back to `Partial()` is meaningless.
2. The current div logic leads to wrong numerical values in the above case.
Test Plan:
**CI**:
CI Tests
**Unit test**:
`buck2 test mode/dev-nosan //caffe2/test/distributed/_tensor:redistribute`
- Passed
**With model training**:
```
# We tested the case where the input tensor is manually overwritten as Partial() and
# the output tensor is manually overwritten to Shard() and then converted to local.
# Before the change: numerical value not correct
Forward pass:
collective: ReduceScatter
Backward pass:
collective: AllGather + div by process group size
# After the change: div is removed as expected.
Forward pass:
collective: ReduceScatter
Backward pass:
collective: AllGather
```
Differential Revision: D52175709
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115967
Approved by: https://github.com/wanchaol
Motivation: it would be nice to be able to test using the metrics in log_compilation_event; currently it dumps logs (or logs to a database in fbcode), which are hard to use in unit tests.
This change:
* always record the information in torch._dynamo.utils.record_compilation_metrics; here, log into a limited-size deque to prevent the list of metrics from getting too long (a minimal sketch of this pattern follows the list below)
* if config.log_compilation_metrics, then call back into the original log_compilation_event function
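A minimal sketch of the recording pattern (names other than `record_compilation_metrics` are hypothetical; the real helper lives in torch._dynamo.utils and records a richer metrics object):
```python
# Hedged sketch: keep recent compilation metrics in a bounded deque so unit
# tests can inspect them, and optionally forward to the original log sink.
import collections
import dataclasses

_MAX_RECORDED_METRICS = 64  # hypothetical cap
_recorded_metrics = collections.deque(maxlen=_MAX_RECORDED_METRICS)

@dataclasses.dataclass
class CompilationMetrics:  # trimmed-down stand-in for illustration
    frame_key: str
    compile_time_s: float

def record_compilation_metrics(metrics, log_to_sink=False):
    # Always keep the metrics in memory; maxlen bounds how much we retain.
    _recorded_metrics.append(metrics)
    if log_to_sink:
        print(metrics)  # stand-in for the original log_compilation_event path

record_compilation_metrics(CompilationMetrics("fn_0", 0.12))
print(list(_recorded_metrics))
```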
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115788
Approved by: https://github.com/yanboliang
## summary
`zip(inputs, self.input_layouts, self.desired_input_layouts)` is used in `_prepare_input_fn`, and similarly in `_prepare_output_fn`. Without an assertion, unmatched dimensions in the inputs/outputs will be silently dropped, potentially causing unexpected behaviors (see the illustration below).
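For illustration, `zip` silently truncates to the shortest iterable, which is exactly how a mismatched layout would get lost; an explicit length check surfaces the mismatch instead (toy values, not the real placement objects):
```python
inputs = ("x", "y", "z")
input_layouts = ("Shard(0)", "Replicate()")  # one layout missing

# Without a check, the extra input "z" is silently dropped:
print(list(zip(inputs, input_layouts)))  # [('x', 'Shard(0)'), ('y', 'Replicate()')]

# An assertion surfaces the mismatch instead of losing data:
try:
    assert len(inputs) == len(input_layouts), (
        f"expected one input layout per input, got {len(inputs)} inputs "
        f"and {len(input_layouts)} layouts"
    )
except AssertionError as e:
    print(e)
```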
## test plan
`python test/distributed/tensor/parallel/test_tp_style.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115957
Approved by: https://github.com/wanchaol
Some typos resulted in the note section not being rendered properly; this couldn't be seen from the last PR directly, as the last PR only showed the first commit's documentation :(
Also makes the parallelize_module doc example more concrete.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115974
Approved by: https://github.com/wz337
Summary: This is useful for comparing the Triton kernels generated by two different invocations of torch.compile on the same model (e.g., checking whether serial compile and parallel compile generate identical Triton kernels).
Test Plan:
Unit test:
buck2 test mode/opt //caffe2/torch/fb/module_factory/sync_sgd/tests:test_torchdynamo_wrapper -- --print-passing-details >& ~/tmp/log.test
PyPer Mast job:
https://www.internalfb.com/mast/job/sw-951074659-OfflineTraining_87587a4e
See the *.py files generated in:
pyper_traces/tree/torchinductor_traces/sw-951074659-OfflineTraining_87587a4e/4623
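As a hedged sketch (paths and helper name are made up), one way to compare the generated kernel sources from two such runs is to diff the dumped *.py files:
```python
# Compare Triton kernel sources dumped by two torch.compile runs
# (e.g. serial compile vs. parallel compile of the same model).
import difflib
import pathlib

def diff_kernel_dirs(dir_a, dir_b):
    files_a = {p.name: p for p in pathlib.Path(dir_a).rglob("*.py")}
    files_b = {p.name: p for p in pathlib.Path(dir_b).rglob("*.py")}
    for name in sorted(files_a.keys() & files_b.keys()):
        a = files_a[name].read_text().splitlines()
        b = files_b[name].read_text().splitlines()
        diff = list(difflib.unified_diff(a, b, lineterm=""))
        print(name, "identical" if not diff else f"{len(diff)} differing lines")

# diff_kernel_dirs("run_serial/torchinductor_traces", "run_parallel/torchinductor_traces")
```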
Differential Revision: D52221500
Pull Request resolved: https://github.com/pytorch/pytorch/pull/115979
Approved by: https://github.com/yanboliang