This article walks through how the PostgreSQL query optimizer simplifies HAVING and GROUP BY clauses during planning.

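Simplifying the HAVING clause
Conditions in HAVING that qualify for promotion are moved into the WHERE clause; the rest stay in HAVING. The reason is that HAVING filters run after GROUP BY, so promoting a filter to WHERE lets the "selection" run earlier and reduces the cost of grouping.
In the following statement, the condition dwbh='1002' is promoted to WHERE: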
testdb=# explain verbose select a.dwbh,a.xb,count(*)
testdb-# from t_grxx a
testdb-# group by a.dwbh,a.xb
testdb-# having count(*) >= 1 and dwbh = '1002';
                                  QUERY PLAN
-----------------------------------------------------------------------------
 GroupAggregate  (cost=15.01..15.06 rows=1 width=84)
   Output: dwbh, xb, count(*)
   Group Key: a.dwbh, a.xb
   Filter: (count(*) >= 1) -- count(*) >= 1 stays in HAVING
   ->  Sort  (cost=15.01..15.02 rows=2 width=76)
         Output: dwbh, xb
         Sort Key: a.xb
         ->  Seq Scan on public.t_grxx a  (cost=0.00..15.00 rows=2 width=76)
               Output: dwbh, xb
               Filter: ((a.dwbh)::text = '1002'::text) -- promoted to WHERE; tuples are filtered during the scan
(10 rows)
If the query combines GROUP BY with GROUPING SETS, no clause is moved:
testdb=# explain verbose
testdb-# select a.dwbh,a.xb,count(*)
testdb-# from t_grxx a
testdb-# group by
testdb-# grouping sets ((a.dwbh),(a.xb),())
testdb-# having count(*) >= 1 and dwbh = '1002'
testdb-# order by a.dwbh,a.xb;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Sort  (cost=28.04..28.05 rows=3 width=84)
   Output: dwbh, xb, (count(*))
   Sort Key: a.dwbh, a.xb
   ->  MixedAggregate  (cost=0.00..28.02 rows=3 width=84)
         Output: dwbh, xb, count(*)
         Hash Key: a.dwbh
         Hash Key: a.xb
         Group Key: ()
         Filter: ((count(*) >= 1) AND ((a.dwbh)::text = '1002'::text)) -- filtered only after the table has been scanned
         ->  Seq Scan on public.t_grxx a  (cost=0.00..14.00 rows=400 width=76)
               Output: dwbh, grbh, xm, xb, nl
(11 rows)
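The rule behind the two plans above can be modelled in a few lines of standalone C. This is only a sketch: HavingQualInfo, HavingDisposition, classify_having_qual and check_article_examples are hypothetical names invented for illustration, not PostgreSQL types or functions, and the boolean flags stand in for the checks the planner performs on the real parse tree.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of one HAVING qual and the query it belongs to.
 * These fields are illustrative; they are not PostgreSQL structs. */
typedef struct HavingQualInfo
{
    bool has_group_clause;   /* query has GROUP BY items */
    bool has_grouping_sets;  /* query has GROUPING SETS */
    bool contains_agg;       /* qual references an aggregate, e.g. count(*) */
    bool contains_volatile;  /* qual calls a volatile function */
    bool contains_subplan;   /* qual contains a subplan */
} HavingQualInfo;

typedef enum HavingDisposition
{
    KEEP_IN_HAVING,          /* evaluate after grouping */
    MOVE_TO_WHERE,           /* evaluate before grouping */
    COPY_TO_WHERE_AND_KEEP   /* degenerate case: no GROUP BY items */
} HavingDisposition;

/* Decide where a single HAVING qual should be evaluated. */
HavingDisposition
classify_having_qual(const HavingQualInfo *q)
{
    if ((q->has_group_clause && q->has_grouping_sets) ||
        q->contains_agg || q->contains_volatile || q->contains_subplan)
        return KEEP_IN_HAVING;
    if (q->has_group_clause && !q->has_grouping_sets)
        return MOVE_TO_WHERE;
    return COPY_TO_WHERE_AND_KEEP;
}

/* The quals from the two example queries above. */
void
check_article_examples(void)
{
    /* dwbh = '1002' with plain GROUP BY */
    HavingQualInfo dwbh_eq    = {true, false, false, false, false};
    /* count(*) >= 1 with plain GROUP BY */
    HavingQualInfo count_ge_1 = {true, false, true,  false, false};
    /* dwbh = '1002' with GROUP BY GROUPING SETS */
    HavingQualInfo dwbh_gsets = {true, true,  false, false, false};

    assert(classify_having_qual(&dwbh_eq) == MOVE_TO_WHERE);
    assert(classify_having_qual(&count_ge_1) == KEEP_IN_HAVING);
    assert(classify_having_qual(&dwbh_gsets) == KEEP_IN_HAVING);
}
```

Calling check_article_examples() from a main function runs the three cases. The order of the tests matters: the "keep" conditions are checked first, so a qual containing an aggregate is never moved even when the query has a plain GROUP BY.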
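Simplifying the GROUP BY clause
If the GROUP BY column list already contains every column of a table's primary key, that table's other columns can be dropped from GROUP BY. This reduces the sorting or hashing work done during grouping and avoids unnecessary overhead.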
testdb=# explain verbose select a.dwbh,a.dwmc,count(*)
testdb-# from t_dwxx a
testdb-# group by a.dwbh,a.dwmc
testdb-# having count(*) >= 1;
                                QUERY PLAN
--------------------------------------------------------------------------
 HashAggregate  (cost=13.20..15.20 rows=53 width=264)
   Output: dwbh, dwmc, count(*)
   Group Key: a.dwbh, a.dwmc -- grouping keys are dwbh and dwmc
   Filter: (count(*) >= 1)
   ->  Seq Scan on public.t_dwxx a  (cost=0.00..11.60 rows=160 width=256)
         Output: dwmc, dwbh, dwdz
(6 rows)

testdb=# alter table t_dwxx add primary key(dwbh); -- add a primary key
ALTER TABLE
testdb=# explain verbose select a.dwbh,a.dwmc,count(*) from t_dwxx a group by a.dwbh,a.dwmc having count(*) >= 1;
                              QUERY PLAN
-----------------------------------------------------------------------
 HashAggregate  (cost=1.05..1.09 rows=1 width=264)
   Output: dwbh, dwmc, count(*)
   Group Key: a.dwbh -- only dwbh remains as the grouping key
   Filter: (count(*) >= 1)
   ->  Seq Scan on public.t_dwxx a  (cost=0.00..1.03 rows=3 width=256)
         Output: dwmc, dwbh, dwdz
(6 rows)
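The set test behind this simplification can be sketched with plain bitmasks. The code below is a toy model, not PostgreSQL code: ColSet, colset_add and surplus_groupby_columns are hypothetical names, a 32-bit mask stands in for PostgreSQL's Bitmapset, and the column numbers used in the usage note are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Toy column set: bit n set means "column number n is in the set".
 * A 32-bit mask is an assumption made for brevity in this sketch. */
typedef uint32_t ColSet;

/* Add one column number (1..31) to a set. */
ColSet
colset_add(ColSet set, int attno)
{
    assert(attno >= 1 && attno < 32);   /* user columns only in this sketch */
    return set | ((ColSet) 1 << attno);
}

/*
 * Return the GROUP BY columns of one table that are redundant because the
 * table's primary key is already fully contained in the GROUP BY list.
 * An empty result (0) means nothing can be removed.
 */
ColSet
surplus_groupby_columns(ColSet groupby_cols, ColSet pk_cols)
{
    if (pk_cols == 0)
        return 0;                       /* table has no primary key */
    if ((pk_cols & ~groupby_cols) != 0)
        return 0;                       /* some key column is not grouped on */
    /* Key is a subset: everything outside the key is surplus. When the two
     * sets are equal this difference is empty, so nothing is removed. */
    return groupby_cols & ~pk_cols;
}
```

With dwmc as column 1 and dwbh as column 2, grouping on both with a primary key on dwbh gives surplus_groupby_columns(colset_add(colset_add(0, 1), 2), colset_add(0, 2)), which yields the set containing only dwmc. That matches the second plan above, where only dwbh remains as the group key.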
The relevant source code is in planner.c (src/backend/optimizer/plan/planner.c), in the function subquery_planner. The excerpt is as follows:
/*
 * In some cases we may want to transfer a HAVING clause into WHERE. We
 * cannot do so if the HAVING clause contains aggregates (obviously) or
 * volatile functions (since a HAVING clause is supposed to be executed
 * only once per group). We also can't do this if there are any nonempty
 * grouping sets; moving such a clause into WHERE would potentially change
 * the results, if any referenced column isn't present in all the grouping
 * sets. (If there are only empty grouping sets, then the HAVING clause
 * must be degenerate as discussed below.)
 *
 * Also, it may be that the clause is so expensive to execute that we're
 * better off doing it only once per group, despite the loss of
 * selectivity. This is hard to estimate short of doing the entire
 * planning process twice, so we use a heuristic: clauses containing
 * subplans are left in HAVING. Otherwise, we move or copy the HAVING
 * clause into WHERE, in hopes of eliminating tuples before aggregation
 * instead of after.
 *
 * If the query has explicit grouping then we can simply move such a
 * clause into WHERE; any group that fails the clause will not be in the
 * output because none of its tuples will reach the grouping or
 * aggregation stage. Otherwise we must have a degenerate (variable-free)
 * HAVING clause, which we put in WHERE so that query_planner() can use it
 * in a gating Result node, but also keep in HAVING to ensure that we
 * don't emit a bogus aggregated row. (This could be done better, but it
 * seems not worth optimizing.)
 *
 * Note that both havingQual and parse->jointree->quals are in
 * implicitly-ANDed-list form at this point, even though they are declared
 * as Node *.
 */
newHaving = NIL;
foreach(l, (List *) parse->havingQual)    // walk each HAVING qual, if any
{
    Node *havingclause = (Node *) lfirst(l);    // the current predicate

    if ((parse->groupClause && parse->groupingSets) ||
        contain_agg_clause(havingclause) ||
        contain_volatile_functions(havingclause) ||
        contain_subplans(havingclause))
    {
        /* keep it in HAVING */
        // GROUP BY with grouping sets, or the qual contains aggregates,
        // volatile functions or subplans: leave it where it is
        newHaving = lappend(newHaving, havingclause);
    }
    else if (parse->groupClause && !parse->groupingSets)
    {
        /* move it to WHERE */
        // plain GROUP BY only: append the qual to the jointree quals
        parse->jointree->quals = (Node *)
            lappend((List *) parse->jointree->quals, havingclause);
    }
    else    // no GROUP BY items: copy the qual into the jointree quals
    {
        /* put a copy in WHERE, keep it in HAVING */
        parse->jointree->quals = (Node *)
            lappend((List *) parse->jointree->quals,
                    copyObject(havingclause));
        newHaving = lappend(newHaving, havingclause);
    }
}
parse->havingQual = (Node *) newHaving;    // install the adjusted HAVING list

/* Remove any redundant GROUP BY columns */
remove_useless_groupby_columns(root);    // drop useless columns from GROUP BY

The function remove_useless_groupby_columns is defined as follows:
/*
 * remove_useless_groupby_columns
 *      Remove any columns in the GROUP BY clause that are redundant due to
 *      being functionally dependent on other GROUP BY columns.
 *
 * Since some other DBMSes do not allow references to ungrouped columns, it's
 * not unusual to find all columns listed in GROUP BY even though listing the
 * primary-key columns would be sufficient. Deleting such excess columns
 * avoids redundant sorting work, so it's worth doing. When we do this, we
 * must mark the plan as dependent on the pkey constraint (compare the
 * parser's check_ungrouped_columns() and check_functional_grouping()).
 *
 * In principle, we could treat any NOT-NULL columns appearing in a UNIQUE
 * index as the determining columns. But as with check_functional_grouping(),
 * there's currently no way to represent dependency on a NOT NULL constraint,
 * so we consider only the pkey for now.
 */
static void
remove_useless_groupby_columns(PlannerInfo *root)
{
    Query      *parse = root->parse;    // the query tree
    Bitmapset **groupbyattnos;          // per-RTE sets of GROUP BY column attnos
    Bitmapset **surplusvars;            // per-RTE sets of removable column attnos
    ListCell   *lc;
    int         relid;

    /* No chance to do anything if there are less than two GROUP BY items */
    if (list_length(parse->groupClause) < 2)    // fewer than two items: nothing to do
        return;

    /* Don't fiddle with the GROUP BY clause if the query has grouping sets */
    if (parse->groupingSets)    // grouping sets present: leave GROUP BY alone
        return;

    /*
     * Scan the GROUP BY clause to find GROUP BY items that are simple Vars.
     * Fill groupbyattnos[k] with a bitmapset of the column attnos of RTE k
     * that are GROUP BY items.
     */
    // collect the attributes used for grouping
    groupbyattnos = (Bitmapset **) palloc0(sizeof(Bitmapset *) *
                                           (list_length(parse->rtable) + 1));
    foreach(lc, parse->groupClause)
    {
        SortGroupClause *sgc = lfirst_node(SortGroupClause, lc);
        TargetEntry *tle = get_sortgroupclause_tle(sgc, parse->targetList);
        Var        *var = (Var *) tle->expr;

        /*
         * Ignore non-Vars and Vars from other query levels.
         *
         * XXX in principle, stable expressions containing Vars could also be
         * removed, if all the Vars are functionally dependent on other GROUP
         * BY items. But it's not clear that such cases occur often enough to
         * be worth troubling over.
         */
        if (!IsA(var, Var) ||
            var->varlevelsup > 0)
            continue;

        /* OK, remember we have this Var */
        relid = var->varno;
        Assert(relid <= list_length(parse->rtable));
        groupbyattnos[relid] = bms_add_member(groupbyattnos[relid],
                                              var->varattno - FirstLowInvalidHeapAttributeNumber);
    }

    /*
     * Consider each relation and see if it is possible to remove some of its
     * Vars from GROUP BY. For simplicity and speed, we do the actual removal
     * in a separate pass. Here, we just fill surplusvars[k] with a bitmapset
     * of the column attnos of RTE k that are removable GROUP BY items.
     */
    surplusvars = NULL;         /* don't allocate array unless required */
    relid = 0;
    // if a relation's grouping keys already include its primary-key columns,
    // mark that relation's other grouping columns as removable
    foreach(lc, parse->rtable)
    {
        RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
        Bitmapset  *relattnos;
        Bitmapset  *pkattnos;
        Oid         constraintOid;

        relid++;

        /* Only plain relations could have primary-key constraints */
        if (rte->rtekind != RTE_RELATION)
            continue;

        /* Nothing to do unless this rel has multiple Vars in GROUP BY */
        relattnos = groupbyattnos[relid];
        if (bms_membership(relattnos) != BMS_MULTIPLE)
            continue;

        /*
         * Can't remove any columns for this rel if there is no suitable
         * (i.e., nondeferrable) primary key constraint.
         */
        pkattnos = get_primary_key_attnos(rte->relid, false, &constraintOid);
        if (pkattnos == NULL)
            continue;

        /*
         * If the primary key is a proper subset of relattnos then we have
         * some items in the GROUP BY that can be removed.
         */
        if (bms_subset_compare(pkattnos, relattnos) == BMS_SUBSET1)
        {
            /*
             * To easily remember whether we've found anything to do, we don't
             * allocate the surplusvars[] array until we find something.
             */
            if (surplusvars == NULL)
                surplusvars = (Bitmapset **) palloc0(sizeof(Bitmapset *) *
                                                     (list_length(parse->rtable) + 1));

            /* Remember the attnos of the removable columns */
            surplusvars[relid] = bms_difference(relattnos, pkattnos);

            /* Also, mark the resulting plan as dependent on this constraint */
            parse->constraintDeps = lappend_oid(parse->constraintDeps,
                                                constraintOid);
        }
    }

    /*
     * If we found any surplus Vars, build a new GROUP BY clause without them.
     * (Note: this may leave some TLEs with unreferenced ressortgroupref
     * markings, but that's harmless.)
     */
    if (surplusvars != NULL)
    {
        List       *new_groupby = NIL;

        foreach(lc, parse->groupClause)
        {
            SortGroupClause *sgc = lfirst_node(SortGroupClause, lc);
            TargetEntry *tle = get_sortgroupclause_tle(sgc, parse->targetList);
            Var        *var = (Var *) tle->expr;

            /*
             * New list must include non-Vars, outer Vars, and anything not
             * marked as surplus.
             */
            if (!IsA(var, Var) ||
                var->varlevelsup > 0 ||
                !bms_is_member(var->varattno - FirstLowInvalidHeapAttributeNumber,
                               surplusvars[var->varno]))
                new_groupby = lappend(new_groupby, sgc);
        }

        parse->groupClause = new_groupby;
    }
}