mirror of
https://github.com/zebrajr/postgres.git
synced 2025-12-06 12:20:15 +01:00
Eager aggregation is a query optimization technique that partially pushes aggregation past a join, and finalizes it once all the relations are joined. Eager aggregation may reduce the number of input rows to the join and thus could result in a better overall plan. In the current planner architecture, the separation between the scan/join planning phase and the post-scan/join phase means that aggregation steps are not visible when constructing the join tree, limiting the planner's ability to exploit aggregation-aware optimizations. To implement eager aggregation, we collect information about aggregate functions in the targetlist and HAVING clause, along with grouping expressions from the GROUP BY clause, and store it in the PlannerInfo node. During the scan/join planning phase, this information is used to evaluate each base or join relation to determine whether eager aggregation can be applied. If applicable, we create a separate RelOptInfo, referred to as a grouped relation, to represent the partially-aggregated version of the relation and generate grouped paths for it. Grouped relation paths can be generated in two ways. The first method involves adding sorted and hashed partial aggregation paths on top of the non-grouped paths. To limit planning time, we only consider the cheapest or suitably-sorted non-grouped paths in this step. Alternatively, grouped paths can be generated by joining a grouped relation with a non-grouped relation. Joining two grouped relations is currently not supported. To further limit planning time, we currently adopt a strategy where partial aggregation is pushed only to the lowest feasible level in the join tree where it provides a significant reduction in row count. This strategy also helps ensure that all grouped paths for the same grouped relation produce the same set of rows, which is important to support a fundamental assumption of the planner. For the partial aggregation that is pushed down to a non-aggregated relation, we need to consider all expressions from this relation that are involved in upper join clauses and include them in the grouping keys, using compatible operators. This is essential to ensure that an aggregated row from the partial aggregation matches the other side of the join if and only if each row in the partial group does. This ensures that all rows within the same partial group share the same "destiny", which is crucial for maintaining correctness. One restriction is that we cannot push partial aggregation down to a relation that is in the nullable side of an outer join, because the NULL-extended rows produced by the outer join would not be available when we perform the partial aggregation, while with a non-eager-aggregation plan these rows are available for the top-level aggregation. Pushing partial aggregation in this case may result in the rows being grouped differently than expected, or produce incorrect values from the aggregate functions. If we have generated a grouped relation for the topmost join relation, we finalize its paths at the end. The final paths will compete in the usual way with paths built from regular planning. The patch was originally proposed by Antonin Houska in 2017. This commit reworks various important aspects and rewrites most of the current code. However, the original patch and reviews were very useful. Author: Richard Guo <guofenglinux@gmail.com> Author: Antonin Houska <ah@cybertec.at> (in an older version) Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Jian He <jian.universality@gmail.com> Reviewed-by: Tender Wang <tndrwang@gmail.com> Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Tomas Vondra <tomas@vondra.me> (in an older version) Reviewed-by: Andy Fan <zhihuifan1213@163.com> (in an older version) Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> (in an older version) Discussion: https://postgr.es/m/CAMbWs48jzLrPt1J_00ZcPZXWUQKawQOFE8ROc-ADiYqsqrpBNw@mail.gmail.com
306 lines
10 KiB
C
306 lines
10 KiB
C
/*-------------------------------------------------------------------------
|
|
*
|
|
* planmain.c
|
|
* Routines to plan a single query
|
|
*
|
|
* What's in a name, anyway? The top-level entry point of the planner/
|
|
* optimizer is over in planner.c, not here as you might think from the
|
|
* file name. But this is the main code for planning a basic join operation,
|
|
* shorn of features like subselects, inheritance, aggregates, grouping,
|
|
* and so on. (Those are the things planner.c deals with.)
|
|
*
|
|
* Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
|
|
* Portions Copyright (c) 1994, Regents of the University of California
|
|
*
|
|
*
|
|
* IDENTIFICATION
|
|
* src/backend/optimizer/plan/planmain.c
|
|
*
|
|
*-------------------------------------------------------------------------
|
|
*/
|
|
#include "postgres.h"
|
|
|
|
#include "optimizer/appendinfo.h"
|
|
#include "optimizer/clauses.h"
|
|
#include "optimizer/optimizer.h"
|
|
#include "optimizer/orclauses.h"
|
|
#include "optimizer/pathnode.h"
|
|
#include "optimizer/paths.h"
|
|
#include "optimizer/placeholder.h"
|
|
#include "optimizer/planmain.h"
|
|
|
|
|
|
/*
|
|
* query_planner
|
|
* Generate a path (that is, a simplified plan) for a basic query,
|
|
* which may involve joins but not any fancier features.
|
|
*
|
|
* Since query_planner does not handle the toplevel processing (grouping,
|
|
* sorting, etc) it cannot select the best path by itself. Instead, it
|
|
* returns the RelOptInfo for the top level of joining, and the caller
|
|
* (grouping_planner) can choose among the surviving paths for the rel.
|
|
*
|
|
* root describes the query to plan
|
|
* qp_callback is a function to compute query_pathkeys once it's safe to do so
|
|
* qp_extra is optional extra data to pass to qp_callback
|
|
*
|
|
* Note: the PlannerInfo node also includes a query_pathkeys field, which
|
|
* tells query_planner the sort order that is desired in the final output
|
|
* plan. This value is *not* available at call time, but is computed by
|
|
* qp_callback once we have completed merging the query's equivalence classes.
|
|
* (We cannot construct canonical pathkeys until that's done.)
|
|
*/
|
|
RelOptInfo *
|
|
query_planner(PlannerInfo *root,
|
|
query_pathkeys_callback qp_callback, void *qp_extra)
|
|
{
|
|
Query *parse = root->parse;
|
|
List *joinlist;
|
|
RelOptInfo *final_rel;
|
|
|
|
/*
|
|
* Init planner lists to empty.
|
|
*
|
|
* NOTE: append_rel_list was set up by subquery_planner, so do not touch
|
|
* here.
|
|
*/
|
|
root->join_rel_list = NIL;
|
|
root->join_rel_hash = NULL;
|
|
root->join_rel_level = NULL;
|
|
root->join_cur_level = 0;
|
|
root->canon_pathkeys = NIL;
|
|
root->left_join_clauses = NIL;
|
|
root->right_join_clauses = NIL;
|
|
root->full_join_clauses = NIL;
|
|
root->join_info_list = NIL;
|
|
root->placeholder_list = NIL;
|
|
root->placeholder_array = NULL;
|
|
root->placeholder_array_size = 0;
|
|
root->agg_clause_list = NIL;
|
|
root->group_expr_list = NIL;
|
|
root->tlist_vars = NIL;
|
|
root->fkey_list = NIL;
|
|
root->initial_rels = NIL;
|
|
|
|
/*
|
|
* Set up arrays for accessing base relations and AppendRelInfos.
|
|
*/
|
|
setup_simple_rel_arrays(root);
|
|
|
|
/*
|
|
* In the trivial case where the jointree is a single RTE_RESULT relation,
|
|
* bypass all the rest of this function and just make a RelOptInfo and its
|
|
* one access path. This is worth optimizing because it applies for
|
|
* common cases like "SELECT expression" and "INSERT ... VALUES()".
|
|
*/
|
|
Assert(parse->jointree->fromlist != NIL);
|
|
if (list_length(parse->jointree->fromlist) == 1)
|
|
{
|
|
Node *jtnode = (Node *) linitial(parse->jointree->fromlist);
|
|
|
|
if (IsA(jtnode, RangeTblRef))
|
|
{
|
|
int varno = ((RangeTblRef *) jtnode)->rtindex;
|
|
RangeTblEntry *rte = root->simple_rte_array[varno];
|
|
|
|
Assert(rte != NULL);
|
|
if (rte->rtekind == RTE_RESULT)
|
|
{
|
|
/* Make the RelOptInfo for it directly */
|
|
final_rel = build_simple_rel(root, varno, NULL);
|
|
|
|
/*
|
|
* If query allows parallelism in general, check whether the
|
|
* quals are parallel-restricted. (We need not check
|
|
* final_rel->reltarget because it's empty at this point.
|
|
* Anything parallel-restricted in the query tlist will be
|
|
* dealt with later.) We should always do this in a subquery,
|
|
* since it might be useful to use the subquery in parallel
|
|
* paths in the parent level. At top level this is normally
|
|
* not worth the cycles, because a Result-only plan would
|
|
* never be interesting to parallelize. However, if
|
|
* debug_parallel_query is on, then we want to execute the
|
|
* Result in a parallel worker if possible, so we must check.
|
|
*/
|
|
if (root->glob->parallelModeOK &&
|
|
(root->query_level > 1 ||
|
|
debug_parallel_query != DEBUG_PARALLEL_OFF))
|
|
final_rel->consider_parallel =
|
|
is_parallel_safe(root, parse->jointree->quals);
|
|
|
|
/*
|
|
* The only path for it is a trivial Result path. We cheat a
|
|
* bit here by using a GroupResultPath, because that way we
|
|
* can just jam the quals into it without preprocessing them.
|
|
* (But, if you hold your head at the right angle, a FROM-less
|
|
* SELECT is a kind of degenerate-grouping case, so it's not
|
|
* that much of a cheat.)
|
|
*/
|
|
add_path(final_rel, (Path *)
|
|
create_group_result_path(root, final_rel,
|
|
final_rel->reltarget,
|
|
(List *) parse->jointree->quals));
|
|
|
|
/* Select cheapest path (pretty easy in this case...) */
|
|
set_cheapest(final_rel);
|
|
|
|
/*
|
|
* We don't need to run generate_base_implied_equalities, but
|
|
* we do need to pretend that EC merging is complete.
|
|
*/
|
|
root->ec_merging_done = true;
|
|
|
|
/*
|
|
* We still are required to call qp_callback, in case it's
|
|
* something like "SELECT 2+2 ORDER BY 1".
|
|
*/
|
|
(*qp_callback) (root, qp_extra);
|
|
|
|
return final_rel;
|
|
}
|
|
}
|
|
}
|
|
|
|
/*
|
|
* Construct RelOptInfo nodes for all base relations used in the query.
|
|
* Appendrel member relations ("other rels") will be added later.
|
|
*
|
|
* Note: the reason we find the baserels by searching the jointree, rather
|
|
* than scanning the rangetable, is that the rangetable may contain RTEs
|
|
* for rels not actively part of the query, for example views. We don't
|
|
* want to make RelOptInfos for them.
|
|
*/
|
|
add_base_rels_to_query(root, (Node *) parse->jointree);
|
|
|
|
/* Remove any redundant GROUP BY columns */
|
|
remove_useless_groupby_columns(root);
|
|
|
|
/*
|
|
* Examine the targetlist and join tree, adding entries to baserel
|
|
* targetlists for all referenced Vars, and generating PlaceHolderInfo
|
|
* entries for all referenced PlaceHolderVars. Restrict and join clauses
|
|
* are added to appropriate lists belonging to the mentioned relations. We
|
|
* also build EquivalenceClasses for provably equivalent expressions. The
|
|
* SpecialJoinInfo list is also built to hold information about join order
|
|
* restrictions. Finally, we form a target joinlist for make_one_rel() to
|
|
* work from.
|
|
*/
|
|
build_base_rel_tlists(root, root->processed_tlist);
|
|
|
|
find_placeholders_in_jointree(root);
|
|
|
|
find_lateral_references(root);
|
|
|
|
joinlist = deconstruct_jointree(root);
|
|
|
|
/*
|
|
* Reconsider any postponed outer-join quals now that we have built up
|
|
* equivalence classes. (This could result in further additions or
|
|
* mergings of classes.)
|
|
*/
|
|
reconsider_outer_join_clauses(root);
|
|
|
|
/*
|
|
* If we formed any equivalence classes, generate additional restriction
|
|
* clauses as appropriate. (Implied join clauses are formed on-the-fly
|
|
* later.)
|
|
*/
|
|
generate_base_implied_equalities(root);
|
|
|
|
/*
|
|
* We have completed merging equivalence sets, so it's now possible to
|
|
* generate pathkeys in canonical form; so compute query_pathkeys and
|
|
* other pathkeys fields in PlannerInfo.
|
|
*/
|
|
(*qp_callback) (root, qp_extra);
|
|
|
|
/*
|
|
* Examine any "placeholder" expressions generated during subquery pullup.
|
|
* Make sure that the Vars they need are marked as needed at the relevant
|
|
* join level. This must be done before join removal because it might
|
|
* cause Vars or placeholders to be needed above a join when they weren't
|
|
* so marked before.
|
|
*/
|
|
fix_placeholder_input_needed_levels(root);
|
|
|
|
/*
|
|
* Remove any useless outer joins. Ideally this would be done during
|
|
* jointree preprocessing, but the necessary information isn't available
|
|
* until we've built baserel data structures and classified qual clauses.
|
|
*/
|
|
joinlist = remove_useless_joins(root, joinlist);
|
|
|
|
/*
|
|
* Also, reduce any semijoins with unique inner rels to plain inner joins.
|
|
* Likewise, this can't be done until now for lack of needed info.
|
|
*/
|
|
reduce_unique_semijoins(root);
|
|
|
|
/*
|
|
* Remove self joins on a unique column.
|
|
*/
|
|
joinlist = remove_useless_self_joins(root, joinlist);
|
|
|
|
/*
|
|
* Now distribute "placeholders" to base rels as needed. This has to be
|
|
* done after join removal because removal could change whether a
|
|
* placeholder is evaluable at a base rel.
|
|
*/
|
|
add_placeholders_to_base_rels(root);
|
|
|
|
/*
|
|
* Construct the lateral reference sets now that we have finalized
|
|
* PlaceHolderVar eval levels.
|
|
*/
|
|
create_lateral_join_info(root);
|
|
|
|
/*
|
|
* Match foreign keys to equivalence classes and join quals. This must be
|
|
* done after finalizing equivalence classes, and it's useful to wait till
|
|
* after join removal so that we can skip processing foreign keys
|
|
* involving removed relations.
|
|
*/
|
|
match_foreign_keys_to_quals(root);
|
|
|
|
/*
|
|
* Look for join OR clauses that we can extract single-relation
|
|
* restriction OR clauses from.
|
|
*/
|
|
extract_restriction_or_clauses(root);
|
|
|
|
/*
|
|
* Check if eager aggregation is applicable, and if so, set up
|
|
* root->agg_clause_list and root->group_expr_list.
|
|
*/
|
|
setup_eager_aggregation(root);
|
|
|
|
/*
|
|
* Now expand appendrels by adding "otherrels" for their children. We
|
|
* delay this to the end so that we have as much information as possible
|
|
* available for each baserel, including all restriction clauses. That
|
|
* let us prune away partitions that don't satisfy a restriction clause.
|
|
* Also note that some information such as lateral_relids is propagated
|
|
* from baserels to otherrels here, so we must have computed it already.
|
|
*/
|
|
add_other_rels_to_query(root);
|
|
|
|
/*
|
|
* Distribute any UPDATE/DELETE/MERGE row identity variables to the target
|
|
* relations. This can't be done till we've finished expansion of
|
|
* appendrels.
|
|
*/
|
|
distribute_row_identity_vars(root);
|
|
|
|
/*
|
|
* Ready to do the primary planning.
|
|
*/
|
|
final_rel = make_one_rel(root, joinlist);
|
|
|
|
/* Check that we got at least one usable path */
|
|
if (!final_rel || !final_rel->cheapest_total_path ||
|
|
final_rel->cheapest_total_path->param_info != NULL)
|
|
elog(ERROR, "failed to construct the join relation");
|
|
|
|
return final_rel;
|
|
}
|