When a tensor has unbacked symbols, it can be general enough to represent both contiguous and non-contiguous tensors, and in that case we can't really evaluate `is_contiguous`. In many places in the code base we check `is_contiguous` to take a fast path, but the general path usually works for both contiguous and non-contiguous tensors; in such cases we probably want
to use the `definitely_contiguous` API instead.
This is applied to reshape in this PR, and also to tensor metadata computation: the metadata now has an attribute that says the tensor is contiguous when it is always contiguous, and we store that only if `definitely_contiguous` is true.
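A minimal sketch of the intended semantics, in plain Python rather than the real SymInt-based implementation: ints stand in for backed symbols, anything else for unbacked ones, and the query returns True only when contiguity is provable.
```
def definitely_contiguous(sizes, strides) -> bool:
    expected_stride = 1
    for size, stride in zip(reversed(sizes), reversed(strides)):
        if not isinstance(size, int) or not isinstance(stride, int):
            return False  # unbacked symbol: cannot prove contiguity
        if size != 1 and stride != expected_stride:
            return False
        expected_stride *= size
    return True

assert definitely_contiguous([2, 3], [3, 1])   # provably contiguous
assert not definitely_contiguous([2, "u0"], ["u1", 1])  # unbacked: maybe
```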
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153432
Approved by: https://github.com/bobrenjc93
Summary:
Previously, when there was no discrepancy in results for block mode, net_min_base would throw an out-of-bounds (OOB) error.
This occurred because `_block_traverse_impl` returned an OOB index after exhausting subgraphs all the way down to a single node.
There was also an issue where we could get an unsound subgraph (i.e. mark an earlier node as the "end" even when the correct end is later). This was due to an incorrect check (`start_idx == mid`): there can be two values left when the search prematurely returns.
Test Plan:
Buck UI: https://www.internalfb.com/buck2/52524c26-ace5-4593-8a4b-843a54eb206a
Test UI: https://www.internalfb.com/intern/testinfra/testrun/3096224973363310
Network: Up: 0B Down: 15MiB (reSessionID-cd404e97-395f-49fc-8381-373e90a1378f)
Executing actions. Remaining 0/1
Command: test.
Time elapsed: 53.7s
Tests finished: Pass 7. Fail 0. Fatal 0. Skip 0. Build failure 0
Differential Revision: D75143242
Pull Request resolved: https://github.com/pytorch/pytorch/pull/154076
Approved by: https://github.com/jfix71
Fixes #153646
This PR refactors the logging behavior in the FX pass insert_deferred_runtime_asserts and runtime_assert.py to separate verbose/intermediate graph logs from the final output graph log. All verbose logs generated during the FX pass are now routed to a new artifact logger, graph_code_verbose, while only the final output graph remains logged to the original graph_code artifact.
Changes
- Added a new artifact logger: `graph_code_log = torch._logging.getArtifactLogger(__name__, "graph_code_verbose")` (see the sketch below)
- Updated all verbose/intermediate FX pass logs in `insert_deferred_runtime_asserts` to use the new `graph_code_verbose` artifact.
- Ensured that only the final output graph is logged to the original graph_code artifact.
- No changes to the FX pass logic or output—only logging behavior is affected.
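A minimal sketch of the resulting split, assuming the `graph_code_verbose` artifact is registered alongside the existing `graph_code` one (variable names here are illustrative):
```
import torch._logging

# Verbose/intermediate FX-pass graphs go to the new artifact, which is
# enabled separately (e.g. TORCH_LOGS="graph_code_verbose").
graph_code_verbose_log = torch._logging.getArtifactLogger(
    __name__, "graph_code_verbose"
)
# Only the final output graph stays on the original artifact.
graph_code_log = torch._logging.getArtifactLogger(__name__, "graph_code")

graph_code_verbose_log.debug("intermediate graph after pass: %s", "...")
graph_code_log.debug("final output graph: %s", "...")
```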
Notes
This change is backward-compatible and does not affect the functional behavior of FX passes.
No changes to user-facing APIs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153775
Approved by: https://github.com/williamwen42
Fixes a longstanding issue where direct references to aten operations are seen as untyped by type checkers. This is accomplished by setting attributes on several classes more consistently, so that `__getattr__` can return a single type in all other cases.
Decisions made along the way:
1. `torch.ops.higher_order` is now implemented by a single-purpose class. This was effectively true before, but the class implementing it attempted to be generalized unnecessarily. Fixing this simplified typing for the `_Ops` class.
2. `__getattr__` is only called when all other lookup methods have failed, so several constant special-cases in the function could be implemented as class variables.
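A minimal, self-contained sketch of the pattern these decisions describe (illustrative; not the real `torch._ops` code): constant special cases live as class variables, so `__getattr__` only handles the dynamic case and can return a single type.
```
class _OpNamespace:
    def __init__(self, name: str) -> None:
        self.name = name

class _HigherOrderNamespace:
    """Single-purpose namespace, mirroring decision 1."""

class _Ops:
    # Decision 2: constant special cases are class variables, so plain
    # attribute lookup finds them and __getattr__ never sees them.
    higher_order = _HigherOrderNamespace()

    def __getattr__(self, name: str) -> _OpNamespace:
        # Only reached when normal lookup fails, i.e. for dynamically
        # created op namespaces; it therefore returns a single type.
        ns = _OpNamespace(name)
        setattr(self, name, ns)  # cache for subsequent lookups
        return ns

ops = _Ops()
assert isinstance(ops.aten, _OpNamespace)              # dynamic namespace
assert isinstance(ops.higher_order, _HigherOrderNamespace)  # class variable
```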
The remainder of this PR is fixing up all the bugs exposed by the updated typing, as well as all the nitpicky typing issues.
Test plan: CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153558
Approved by: https://github.com/rec, https://github.com/Skylion007, https://github.com/cyyever
Summary:
Added some logging and captured the indexing (screenshot omitted). This is why the saved module path was called `/tmp/jimwan/minimizer_a_acc.pt`.
Now the updated module paths are `/tmp/jimwan/minimizer_addmm_default_103_acc.pt`.
Test Plan:
```
MTIAC_USE_DIST_REF_KERNELS=all buck2 run @//mode/opt mtia/accuracy/minimizer:mtia_minimizer_runner -- --mode sequential --compare_fn allclose --pt_save_dir /tmp/debug3 --atol 1e-4 --rtol 1e-4 --all_outputs --start_idx native_layer_norm_default_80 --end_idx getitem_272 2>&1 | tee ~/test.log
```
Reviewed By: qcyuan
Differential Revision: D74369107
Pull Request resolved: https://github.com/pytorch/pytorch/pull/153130
Approved by: https://github.com/Skylion007
Chatting with Bob, the goal of this is to const-fold the floats that were tensorified, by calling
guard_scalar(val) on them and then replacing their usages with their values.
Hence we do not need to do this for nodes with no float symbols.
We do not want to do proper const folding because we need to preserve statements that deferred
runtime asserts depend on (see the added test).
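A minimal sketch of the replace-usages step (an assumed helper, not the actual tensorify pass; `known` stands in for the values guard_scalar() produces):
```
import torch.fx as fx
from torch.fx.node import map_arg

def fold_known_floats(gm: fx.GraphModule, known: dict[str, float]) -> fx.GraphModule:
    for node in list(gm.graph.nodes):
        val = known.get(node.name)
        if val is None:
            continue  # no known float value for this node
        for user in list(node.users):
            # Replace every use of `node` with the literal value.
            user.args = map_arg(user.args, lambda n: val if n is node else n)
            user.kwargs = map_arg(user.kwargs, lambda n: val if n is node else n)
        gm.graph.erase_node(node)  # node is now unused
    gm.recompile()
    return gm
```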
Pull Request resolved: https://github.com/pytorch/pytorch/pull/151494
Approved by: https://github.com/bobrenjc93
Summary:
The changes contained in this diff
- allow Minimizer subclasses to override the default shape propagation logic with custom logic
- copy over the `meta` attribute on `get_attr` graph nodes during the graph splitting step
- for both changes, behavior of existing classes does not change
Test Plan: CI
Differential Revision: D70799942
Pull Request resolved: https://github.com/pytorch/pytorch/pull/148784
Approved by: https://github.com/blaine-rister
Summary:
During oncall, I got a debug task where the error message was ambiguous, due to multiple colons and the full line being cut off:
```
AssertionError: Expected order: 1 for the component: remote_request_only to be >= 2, the max order for all its
```
Update the error message to something like
```
AssertionError: Component remote_request_only order must be >= max order of its upstream components, got component order=1 and max=2
```
Test Plan: CI
Differential Revision: D69482789
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146934
Approved by: https://github.com/ColinPeppler
This enables a check that a class which only inherits from immutable classes like `str`, `tuple`, and `NamedTuple` also defines `__slots__`, so its instances don't allocate memory unnecessarily. This also ensures contributors think about how they define classes that subclass `NamedTuple` and `str`, of which we have many in our codebase.
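Illustrative of the pattern the lint enforces (names are made up): subclasses of immutable types declare empty `__slots__` so instances don't also carry a `__dict__`.
```
from typing import NamedTuple

class Color(str):
    __slots__ = ()  # without this, every Color instance also gets a __dict__

class Point(NamedTuple):
    x: float
    y: float

class NamedPoint(Point):
    __slots__ = ()  # subclassing a NamedTuple re-introduces __dict__ unless slotted

    def norm(self) -> float:
        return (self.x ** 2 + self.y ** 2) ** 0.5
```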
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146276
Approved by: https://github.com/aorenste
Summary: ShapeProp doesn't know how to propagate unbacked. Patch it up to propagate unbacked symints like PropagateUnbackedSymInts.
Test Plan:
```
buck run mode/dev-nosan fbcode//caffe2/test:fx -- -r test_shape_prop_unbacked_sym
```
Differential Revision: D68050073
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144605
Approved by: https://github.com/guowentian, https://github.com/pianpwk
Summary:
- use GraphTransformObserver + replace_node hooks to track node sources when they are replaced
- add pre_grad_graph tracking to tlparse
- add the node provenance information to post_grad_graph tlparse. This is for the frontend to create a mapping between pre_grad and post_grad graph. See an example frontend (this is just a prototype) here: https://drive.google.com/file/d/1cMHH_0y4FJUSS9tATwGQvA72O0Lth8eh/view?usp=sharing
- change "action" of NodeSource from a single action to a list of actions.
- It's BC-breaking because we removed `GraphTransformObserver`'s class methods such as `on_node_erase`.
https://docs.google.com/document/d/1dGh9myqNhywmbfP0Quzx_f04bghDFlj8cawj8MopiO8/edit?tab=t.0
The front-end code that takes in the tlparse result is in https://github.com/yushangdi/compiler_explorer.
ghstack-source-id: 260390519
Test Plan:
```
buck2 run mode/dev-nosan fbcode//caffe2/test:fx -- -r test_graph_transform_observer
buck run mode/dev-nosan fbcode//caffe2/test:fx -- -r node_source
buck run mode/dev-nosan fbcode//caffe2/test:fx -- -r graph_provenance
```
Front-end example screenshots on a real model show a 93% coverage rate between pre_grad_graph and post_grad_graph (screenshots omitted).
```
buck2 build --show-output mode/opt -c=python.package_style=inplace -c fbcode.enable_gpu_sections=true -c fbcode.platform=platform010 -c fbcode.split-dwarf=true -c fbcode.nvcc_arch=a100,h100 caffe2/torch/fb/model_transform/experimental/benchmark:mts_gpu_benchmark
MODEL_ENTITY_ID=644688112
SNAPSHOT_ID=32
MODULE=merge
TORCH_COMPILE_DEBUG=1 CUDA_VISIBLE_DEVICES=7 TORCH_LOGS="+inductor,+schedule,output_code,graph_code" TORCHINDUCTOR_MAX_AUTOTUNE=1 TORCHINDUCTOR_UNIQUE_KERNEL_NAMES=1 ../buck-out/v2/gen/fbcode/ec86b05dd59e84db/caffe2/torch/fb/model_transform/experimental/benchmark/__mts_gpu_benchmark__/mts_gpu_benchmark.par --local-model /home/bahuang/models/${MODEL_ENTITY_ID}/${SNAPSHOT_ID}/gpu_lowering/input.predictor.disagg.gpu.merge --lower-backend AOT_INDUCTOR_EP --gpu-trace --aot-inductor-config="{'max_autotune': True}"
buck2 run mode/dev-nosan fbcode//caffe2/test/inductor:auto_functionalize
```
Differential Revision: D65006709
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144277
Approved by: https://github.com/desertfire
Summary:
1. Added more details for some of the assert statements.
2. Moved assert statements to use tgif_assert
Test Plan: all unit tests should pass
Reviewed By: jingsh
Differential Revision: D67608251
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143771
Approved by: https://github.com/jingsh
Summary:
Fix https://github.com/pytorch/pytorch/issues/142035 and https://github.com/pytorch/pytorch/issues/143621
When Linear module params are tied to another parameter, like this:
```
import torch.nn as nn

class SimpleLinearModel(nn.Module):
    def __init__(self, input_size, output_size):
        super(SimpleLinearModel, self).__init__()
        # Define a linear layer
        self.linear = nn.Linear(input_size, output_size)
        self.tied_weight = self.linear.weight

    def forward(self, x):
        # Forward pass through the linear layer
        b = self.tied_weight + 1
        return self.linear(x), b
```
We get a graph like below:
```
graph():
%p_tied_weight : [num_users=0] = placeholder[target=p_tied_weight]
%p_linear_weight : [num_users=2] = placeholder[target=p_linear_weight]
%p_linear_bias : [num_users=1] = placeholder[target=p_linear_bias]
%x : [num_users=1] = placeholder[target=x]
%add : [num_users=1] = call_function[target=torch.ops.aten.add.Tensor](args = (%p_linear_weight, 1), kwargs = {})
%linear : [num_users=1] = call_function[target=torch.ops.aten.linear.default](args = (%x, %p_linear_weight, %p_linear_bias), kwargs = {})
return (linear, add)
```
Notice that `%p_linear_weight : [num_users=2]`.
When we get source partitions, we should exclude attribute nodes like `p_linear_weight` from the outputs.
A real world example where people do something like this is in https://github.com/pytorch/pytorch/issues/142035.
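A hedged sketch of the idea (the real fix lives in the source-partition utilities; the helper name is illustrative):
```
from torch.fx import Node

def partition_output_nodes(candidates: list[Node]) -> list[Node]:
    # Parameter/attribute nodes can pick up extra users when weights are
    # tied, making them look like partition outputs; filter them out.
    return [n for n in candidates if n.op not in ("placeholder", "get_attr")]
```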
Test Plan:
```
buck2 run 'fbcode//mode/dev-nosan' fbcode//caffe2/test:fx -- -r test_module_partitioner_weight_tied
```
Differential Revision: D66998592
Pull Request resolved: https://github.com/pytorch/pytorch/pull/142446
Approved by: https://github.com/angelayi
Remove an erroneous assert that assumed a dependent (user) node is in the partition. This partially reverts #136616 by removing the assert.
Tested locally with a failing ExecuTorch Arm test using
```
$ python -m examples.arm.aot_arm_compiler --model_name mv2 --target ethos-u55-128 --delegate --quantize
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/143376
Approved by: https://github.com/tarun292
Fixes a bunch of benchmarks that failed with cudagraph errors, including `tlp python benchmarks/dynamo/timm_models.py --device cuda --inductor --accuracy --amp --training --only resmlp_12_224` when `specialize_float=False`.
Also brings down the number of overall failures (with keep-going) from 108 to 62. I'd estimate >80% of those 62 are wobbly expect tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140346
Approved by: https://github.com/ezyang
ghstack dependencies: #140983, #141003
* Automatically applies ruff rule PERF401, turning loops into equivalent list comprehensions, which are faster and do not leak the loop variables into the enclosing scope (see the sketch below).
* List comprehensions not only often type better, they also carry 50+% less interpreter overhead than for loops; they preserve length information and are easier for the interpreter to optimize.
* Manually went back and made mypy happy after the change.
* Also fixed style lints in files covered by flake8 but not by pyfmt.
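A minimal before/after illustration of the transformation (toy example, not code from the PR):
```
# Before: the loop leaks `item` into the enclosing scope and pays
# per-iteration append overhead.
squares = []
for item in range(10):
    squares.append(item * item)

# After (PERF401): equivalent list comprehension.
squares = [item * item for item in range(10)]
```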
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140980
Approved by: https://github.com/justinchuby, https://github.com/malfet
Summary:
Customize splitter behavior to mark `get_attr` nodes as acc supported.
Currently these nodes are excluded by `FxNetAccNodesFinder`, which marks all nodes whose op is not in `CALLABLE_NODE_OPS` ("call_module", "call_function", "call_method") as unsupported.
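A hedged sketch of the adjusted support check (illustrative; the real logic lives in the splitter's node finder):
```
from torch.fx import Node

CALLABLE_NODE_OPS = {"call_module", "call_function", "call_method"}

def is_node_supported(node: Node) -> bool:
    # get_attr nodes just fetch constants/weights, so treat them as
    # acc-supported instead of forcing a CPU submodule around them.
    return node.op in CALLABLE_NODE_OPS or node.op == "get_attr"
```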
Before this change, the merge-net was split into an almost empty CPU submodule with a single empty output node:
```
INFO:caffe2.torch.fb.model_transform.experimental.prepare_fx_model:###### debug_print nodes for _run_on_cpu_0
INFO:caffe2.torch.fb.model_transform.experimental.prepare_fx_model:Found output node: n.name='output', n.target='output', n.args=((),), n.kwargs={}, n.meta={}
INFO:caffe2.torch.fb.model_transform.experimental.prepare_fx_model:return ()
INFO:caffe2.torch.fb.model_transform.experimental.prepare_fx_model:
_run_on_cpu_0 stats for merge:
[output] output: 1
```
full log: P1678727348 (generated using same command as below)
Test Plan:
Tested by lowering `ig_organic_feed_cn_v2_mtml` using cmd:
```
buck run mode/opt-split-dwarf //tgif/cli:cli -- --model-name=ig_organic_feed_cn_v2_mtml --model-type ig_organic_feed_cn_v2_mtml --world-size=1 --storage-mode 1 --inference-dtype=FP16 --meta-transform=False --use-random-weights=True --accelerator-arch=3 --enable-input-dist=True --embedding-tables-dtype=FP16 --mtia-use-torch-export=True embedding-quantization-pass torchrec-sharding-pass tgif-split-pass gen-app-graph-pass tgif-mtia-lowering-pass dense-quantization-pass save-torch-package-pass generate-model-package-pass pack-weights-and-save-pass 2>&1 | tee /tmp/publish_ig_organic_feed_cn_v2_mtml_mtia_export_20241114_splitter_2.log
```
Output shows only 1 acc submodule is generated for merge:
```
INFO 18:33:15.951 1735650 utils.py:235: [TGIF] num of acc submodules: 1
INFO 18:33:15.952 1735650 utils.py:236: [TGIF] num of cpu submodules: 0
INFO 18:33:16.534 1735650 logging_utils.py:53: [TGIF] _run_on_acc_0 graph module debug info: https://www.internalfb.com/intern/everpaste/?color=0&handle=GK4VKhWsDKF9VdsDAKxhR6KAlhJ0br0LAAAz
INFO 18:33:16.534 1735650 utils.py:257: [TGIF] Start MTIA lowering _run_on_acc_0 in merge, device ordinal: -1
```
full log: P1679596796
Differential Revision: D65983916
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140794
Approved by: https://github.com/ezyang
Currently, we collect all partition ids by iterating over `assignment`, whose size equals the number of nodes in the graph. We can reach the same result by iterating over `partitions_by_id`, whose size is much smaller. Assuming the number of nodes is N and the number of partitions is P, the time complexity decreases from O(N * N) to O(N * P) with this patch.
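A hedged sketch of the inner scan this change replaces (toy data standing in for the real structures; names taken from the description):
```
# `assignment` maps each of N nodes to its partition id;
# `partitions_by_id` has one entry per partition (P entries, P << N).
assignment = {f"node{i}": i % 3 for i in range(9)}
partitions_by_id = {0: [...], 1: [...], 2: [...]}

# Before: O(N) scan of the per-node map to recover the partition ids.
ids_before = set(assignment.values())
# After: O(P) iteration directly over the partition table.
ids_after = set(partitions_by_id.keys())
assert ids_before == ids_after
```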
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136598
Approved by: https://github.com/mcr229
Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>
Summary: output nodes may be eliminated down to the input nodes if only some of the output nodes are specified. Add an option to check results for all output nodes in the partitioned graph.
Test Plan: see D65367305
Reviewed By: qcyuan
Differential Revision: D65367305
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139774
Approved by: https://github.com/jfix71
This fix was a bit more involved:
1) It fixes a place where `item_memo` was lost.
2) It updates a test to be eager instead of aot_eager, since aot_eager reveals a very obscure bug related to replacements that's not worth solving; in practice Inductor regenerates the runtime asserts anyway.
3) It updates tensorify to specialize in more places now that the aforementioned bug is fixed.
Fixes `PYTORCH_OPINFO_SAMPLE_INPUT_INDEX=6 python test/inductor/test_torchinductor_opinfo.py TestInductorOpInfoCPU.test_comprehensive_linalg_norm_cpu_float16` when `specialize_float=False`
while ensuring `python test/dynamo/test_dynamic_shapes.py DynamicShapesMiscTests.test_runtime_assert_replacement_dynamic_shapes` doesn't regress
Pull Request resolved: https://github.com/pytorch/pytorch/pull/139587
Approved by: https://github.com/ezyang
ghstack dependencies: #139569, #139457, #139568, #139572, #139846, #139454, #139896, #139935
Currently, we collect all partition ids by iterating over `assignment`, whose size equals the number of nodes in the graph. We can reach the same result by iterating over `partitions_by_id`, whose size is much smaller. Assuming the number of nodes is N and the number of partitions is P, the time complexity decreases from O(N * N) to O(N * P) with this patch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/136598
Approved by: https://github.com/ezyang
Co-authored-by: Aaron Gokaslan <aaronGokaslan@gmail.com>