TBase is an advanced enterprise-level database management system based on prior work of Postgres-XL project. It supports an extended subset of the SQL standard, including transactions, foreign keys, user-defined types and functions. Additional, it adds parallel computing, security, management, audit and other functions.
Postgres-XL 10R1.1 版本发布后(2019-02-18),后续 停止新版本发布。
Postgres-XL官方网站 Postgres-XL | Open Source Scalable SQL Database Cluster
Postgres-XL 10R1.1 now available! Click here to download the source tarball. (2019-02-18)
TBase是一个提供写可靠性,多主节点数据同步的关系数据库集群平台。你可以将TBase配置一台或者多台主机上,TBase数据存储在多台物理主机上面。数据表的存储有两种方式, 分别是distributed或者replicated ,当向TBase发送查询 SQL时,TBase会自动向数据节点发出查询语句并获取最终结果。
TBase采用分布式集群架构 , 该架构分布式为无共享(share nothing)模式,节点之间相应独立,各自处理自己的数据,处理后的结果可能向上层汇总或在节点间流转,各处理单元之间通过网络协议进行通信,并行处理和扩展能力更好,这也意味着只需要简单的x86服务器就可以部署TBase数据库集群。
解读一下TBase的三大模块
Coordinator:协调节点(简称CN)
业务访问入口,负责数据的分发和查询规划,多个节点位置对等,每个节点都提供相同的数据库视图;在功能上CN上只存储系统的全局元数据,并不存储实际的业务数据。
Datanode:数据节点(简称DN)
每个节点还存储业务数据的分片在功能上,DN节点负责完成执行协调节点分发的执行请求。
GTM:全局事务管理器(Global Transaction Manager)
负责管理集群事务信息,同时管理集群的全局对象,比如序列等。
Q:支持行列混合存储吗?
A:开发的V3版本是支持的,目前开源的V2版本是只支持行存。
Q:扩容后老数据如何清理?
A:我们现在扩容的老数据更多的是通过delete+vacuum的方式做数据的清理,因此老数据清理会对业务造成一些影响。后面会有一些更好的方式来做,比如说做一些数据聚簇的方案,来优化扩容的搬迁和扩容的数据清理对系统的一些影响。
Q :索引膨胀如何解决?
A:若你更新数据或者数据搬迁确实会有索引膨胀,我们建议是重建索引。因为索引重建是可以并行来做的,对业务其实是没有太大的影响,索引建好后把老的索引删掉就OK了。
++++++++++++++++++++++++++++++++++++++++++++
Utility commands 不需要执行计划
/* Utility commands require no planning. */
CMD_UTILITY, /* cmds like create, destroy, copy, vacuum,
* etc. */
非 Utility commands,
//调用优化器 planner函数
plan = planner(querytree, cursorOptions, boundParams);
/*
* Generate a plan for a single already-rewritten query.
* This is a thin wrapper around planner() and takes the same parameters.
*/
PlannedStmt *
pg_plan_query(Query *querytree, int cursorOptions, ParamListInfo boundParams)
/* call the optimizer */
plan = planner(querytree, cursorOptions, boundParams);
\src\backend\optimizer\plan\Planner.c
/*****************************************************************************
*
* Query optimizer entry point
*
* To support loadable plugins that monitor or modify planner behavior,
* we provide a hook variable that lets a plugin get control before and
* after the standard planning process. The plugin would normally call
* standard_planner().
*
* Note to plugin authors: standard_planner() scribbles on its Query input,
* so you'd better copy that data structure if you want to plan more than once.
*
*****************************************************************************/
PlannedStmt *
planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
CN 节点调用 pgxc_planner(parse, cursorOptions, boundParams);
DN节点调用standard_planner(parse, cursorOptions, boundParams);
\src\backend\optimizer\plan\Planner.c
#ifdef PGXC
/*
* A Coordinator receiving a query from another Coordinator
* is not allowed to go into PGXC planner.
*/
if (IS_PGXC_LOCAL_COORDINATOR)
result = pgxc_planner(parse, cursorOptions, boundParams);
else
#endif
result = standard_planner(parse, cursorOptions, boundParams);
CN 节点调用 pgxc_planner(parse, cursorOptions, boundParams);
\src\backend\optimizer\plan\Planner.c
/*
* Build up a QueryPlan to execute on.
*
* This functions tries to find out whether
* 1. The statement can be shipped to the Datanode and Coordinator is needed
* only as a proxy - in which case, it creates a single node plan.
* 2. The statement can be evaluated on the Coordinator completely - thus no
* query shipping is involved and standard_planner() is invoked to plan the
* statement
* 3. The statement needs Coordinator as well as Datanode for evaluation -
* again we use standard_planner() to plan the statement.
*
* The plan generated in either of the above cases is returned.
*/
PlannedStmt *
pgxc_planner(Query *query, int cursorOptions, ParamListInfo boundParams)
{
PlannedStmt *result;
/* see if can ship the query completely */
result = pgxc_FQS_planner(query, cursorOptions, boundParams);
if (result)
return result;
/* we need Coordinator for evaluation, invoke standard planner */
result = standard_planner(query, cursorOptions, boundParams);
return result;
}
\src\backend\optimizer\plan\Planner.c
standard_planner(parse, cursorOptions, boundParams);
PlannedStmt *
standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
{// #lizard forgives
PlannedStmt *result;
PlannerGlobal *glob;
double tuple_fraction;
PlannerInfo *root;
RelOptInfo *final_rel;
Path *best_path;
Plan *top_plan;
ListCell *lp,
*lr;
#ifdef XCP
if (IS_PGXC_LOCAL_COORDINATOR && parse->utilityStmt &&
IsA(parse->utilityStmt, RemoteQuery))
return pgxc_direct_planner(parse, cursorOptions, boundParams);
#endif
/*
* Set up global state for this planner invocation. This data is needed
* across all levels of sub-Query that might exist in the given command,
* so we keep it in a separate struct that's linked to by each per-Query
* PlannerInfo.
*/
glob = makeNode(PlannerGlobal);
...
#ifdef __TBASE__
groupOids = NULL;
min_workers_of_hashjon_gather = PG_INT32_MAX;
#endif
...
/* Determine what fraction of the plan is likely to be scanned */
if (cursorOptions & CURSOR_OPT_FAST_PLAN)
...
/* primary planning entry point (may recurse for subqueries) */
root = subquery_planner(glob, parse, NULL,
false, tuple_fraction);
/* Select best Path and turn it into a Plan */
final_rel = fetch_upper_rel(root, UPPERREL_FINAL, NULL);
best_path = get_cheapest_fractional_path(final_rel, tuple_fraction);
if (!root->distribution)
root->distribution = best_path->distribution;
#ifdef __TBASE__
if (root->distribution && !parse->hasUnshippableTriggers)
{
remote_subplan_depth++;
}
#endif
top_plan = create_plan(root, best_path);
#ifdef XCP
#ifdef __TBASE__
if (root->distribution && !parse->hasUnshippableTriggers)
#else
if (root->distribution)
#endif
{
/*
* FIXME, this keeps adding RemoteSubplan at a top of queries that
* don't really need it (e.g above a MergeAppend with subplans pushed
* to remote nodes). Not sure why it's happening, though ...
*/
top_plan = (Plan *) make_remotesubplan(root, top_plan, NULL,
root->distribution,
root->sort_pathkeys);
SS_remote_attach_initplans(root, top_plan);
remote_subplan_depth--;
}
#endif
...
/*
* If creating a plan for a scrollable cursor, make sure it can run
* backwards on demand. Add a Material node at the top at need.
*/
if (cursorOptions & CURSOR_OPT_SCROLL)
{
if (!ExecSupportsBackwardScan(top_plan))
top_plan = materialize_finished_plan(top_plan);
}
/*
* Optionally add a Gather node for testing purposes, provided this is
* actually a safe thing to do.
*/
if (force_parallel_mode != FORCE_PARALLEL_OFF && top_plan->parallel_safe)
{
Gather *gather = makeNode(Gather);
gather->plan.targetlist = top_plan->targetlist;
gather->plan.qual = NIL;
gather->plan.lefttree = top_plan;
gather->plan.righttree = NULL;
gather->num_workers = 1;
gather->single_copy = true;
gather->invisible = (force_parallel_mode == FORCE_PARALLEL_REGRESS);
/*
* Ideally we'd use cost_gather here, but setting up dummy path data
* to satisfy it doesn't seem much cleaner than knowing what it does.
*/
gather->plan.startup_cost = top_plan->startup_cost +
parallel_setup_cost;
gather->plan.total_cost = top_plan->total_cost +
parallel_setup_cost + parallel_tuple_cost * top_plan->plan_rows;
gather->plan.plan_rows = top_plan->plan_rows;
gather->plan.plan_width = top_plan->plan_width;
gather->plan.parallel_aware = false;
gather->plan.parallel_safe = false;
/* use parallel mode for parallel plans. */
root->glob->parallelModeNeeded = true;
top_plan = &gather->plan;
}
/*
* If any Params were generated, run through the plan tree and compute
* each plan node's extParam/allParam sets. Ideally we'd merge this into
* set_plan_references' tree traversal, but for now it has to be separate
* because we need to visit subplans before not after main plan.
*/
if (glob->nParamExec > 0)
{
Assert(list_length(glob->subplans) == list_length(glob->subroots));
forboth(lp, glob->subplans, lr, glob->subroots)
{
Plan *subplan = (Plan *) lfirst(lp);
PlannerInfo *subroot = (PlannerInfo *) lfirst(lr);
SS_finalize_plan(subroot, subplan);
}
SS_finalize_plan(root, top_plan);
}
...
standard_planner函数调用的——subquery_planner函数
subquery_planner 函数接受Query查询树,最后通过PlannedStmt结构体的形式返回Plan计划树。(Plan计划树被封装在PlannedStmt结构体中)subquery_planner函数负责创建计划,可以递归处理子查询。subquery_planner函数的工作分为两个部分:
1:依据消除冗余条件,减少查询层次,简化路径生成的基本思想,调用预处理函数对查询树进行处理
2:调用inheritance_planner 或者grouping_planner进入生成计划流程,此过程不对查询树做出实质性改变
\src\backend\optimizer\plan\Planner.c
/*--------------------
* subquery_planner
* Invokes the planner on a subquery. We recurse to here for each
* sub-SELECT found in the query tree.
*
* glob is the global state for the current planner run.
* parse is the querytree produced by the parser & rewriter.
* parent_root is the immediate parent Query's info (NULL at the top level).
* hasRecursion is true if this is a recursive WITH query.
* tuple_fraction is the fraction of tuples we expect will be retrieved.
* tuple_fraction is interpreted as explained for grouping_planner, below.
*
* Basically, this routine does the stuff that should only be done once
* per Query object. It then calls grouping_planner. At one time,
* grouping_planner could be invoked recursively on the same Query object;
* that's not currently true, but we keep the separation between the two
* routines anyway, in case we need it again someday.
*
* subquery_planner will be called recursively to handle sub-Query nodes
* found within the query's expressions and rangetable.
*
* Returns the PlannerInfo struct ("root") that contains all data generated
* while planning the subquery. In particular, the Path(s) attached to
* the (UPPERREL_FINAL, NULL) upperrel represent our conclusions about the
* cheapest way(s) to implement the query. The top level will select the
* best Path and pass it through createplan.c to produce a finished Plan.
*--------------------
*/
PlannerInfo *
subquery_planner(PlannerGlobal *glob, Query *parse,
PlannerInfo *parent_root,
bool hasRecursion, double tuple_fraction)
子链接和子查询的区别:子查询是一条完整的查询语句,而子链接是一条表达式,但是表达式内部也可以包含查询语句。直白点说呢就是:子查询是放在FROM子句里的而子链接则出现在WHERE子句或者HAVING子句中。
在subquery_planner函数里,调用pull_up_sublinks函数处理WHERE子句和JOIN/ON子句中的ANY和EXISTS类型的子链接。
subquery_planner函数调用pull_up_subqueries函数来提升子查询。当子查询仅仅是一个简单的扫描或者连接时,就会把子查询或者子查询的一部分合并到父查询中以进行优化。
1.在范围表中存在子查询。对于简单的子查询,直接调用pull_up_simple_subquery函数进行提升;而对于简单的UNION ALL子查询,调用pull_up_simple_union_all函数进行提升,其他的情况则不处理;
2.在FROM表达式中存在子查询。对于FROM列表中的每个节点都调用pull_up_subqueries递归处理;
3.连接表达式中的子查询。调用pull_up_subqueries函数递归地处理.
/*
* pull_up_subqueries
* Look for subqueries in the rangetable that can be pulled up into
* the parent query. If the subquery has no special features like
* grouping/aggregation then we can merge it into the parent's jointree.
* Also, subqueries that are simple UNION ALL structures can be
* converted into "append relations".
*/
void
pull_up_subqueries(PlannerInfo *root)
表达式的预处理工作主要由函数preprocess_expression完成。该函数采用递归扫描的方式处理PlannerInfo结构体里面保存的目标属性、HAVING子句、OFFSET子句、LIMIT子句和连接树jointree。总体来说做了以下这些事
1.调用flatten_join_alias_vars函数,用基本关系变量取代连接别名变量;
2.调用函数eval_const_expression进行常量表达式的简化,也就是直接计算出常量表达式的值。例如:“3+1 <> 4” 这种会直接被替换成“FALSE”;
3.调用canonicalize_qual函数对表达式进行规范化,主要是将表达式转换为最佳析取范式或者合取范式
4.调用函数make_subplan将子链接转换为子计划.
\src\backend\optimizer\plan\Planner.c
/*
* preprocess_expression
* Do subquery_planner's preprocessing work for an expression,
* which can be a targetlist, a WHERE clause (including JOIN/ON
* conditions), a HAVING clause, or a few other things.
*/
static Node *
preprocess_expression(PlannerInfo *root, Node *expr, int kind)
{// #lizard forgives
/*
* Fall out quickly if expression is empty. This occurs often enough to
* be worth checking. Note that null->null is the correct conversion for
* implicit-AND result format, too.
*/
if (expr == NULL)
return NULL;
/*
* If the query has any join RTEs, replace join alias variables with
* base-relation variables. We must do this before sublink processing,
* else sublinks expanded out from join aliases would not get processed.
* We can skip it in non-lateral RTE functions, VALUES lists, and
* TABLESAMPLE clauses, however, since they can't contain any Vars of the
* current query level.
*/
if (root->hasJoinRTEs &&
!(kind == EXPRKIND_RTFUNC ||
kind == EXPRKIND_VALUES ||
kind == EXPRKIND_TABLESAMPLE ||
kind == EXPRKIND_TABLEFUNC))
expr = flatten_join_alias_vars(root, expr);
/*
* Simplify constant expressions.
*
* Note: an essential effect of this is to convert named-argument function
* calls to positional notation and insert the current actual values of
* any default arguments for functions. To ensure that happens, we *must*
* process all expressions here. Previous PG versions sometimes skipped
* const-simplification if it didn't seem worth the trouble, but we can't
* do that anymore.
*
* Note: this also flattens nested AND and OR expressions into N-argument
* form. All processing of a qual expression after this point must be
* careful to maintain AND/OR flatness --- that is, do not generate a tree
* with AND directly under AND, nor OR directly under OR.
*/
expr = eval_const_expressions(root, expr);
/*
* If it's a qual or havingQual, canonicalize it.
*/
if (kind == EXPRKIND_QUAL)
{
expr = (Node *) canonicalize_qual((Expr *) expr);
#ifdef OPTIMIZER_DEBUG
printf("After canonicalize_qual()\n");
pprint(expr);
#endif
}
/* Expand SubLinks to SubPlans */
if (root->parse->hasSubLinks)
expr = SS_process_sublinks(root, expr, (kind == EXPRKIND_QUAL));
/*
* XXX do not insert anything here unless you have grokked the comments in
* SS_replace_correlation_vars ...
*/
/* Replace uplevel vars with Param nodes (this IS possible in VALUES) */
if (root->query_level > 1)
expr = SS_replace_correlation_vars(root, expr);
/*
* If it's a qual or havingQual, convert it to implicit-AND format. (We
* don't want to do this before eval_const_expressions, since the latter
* would be unable to simplify a top-level AND correctly. Also,
* SS_process_sublinks expects explicit-AND format.)
*/
if (kind == EXPRKIND_QUAL)
expr = (Node *) make_ands_implicit((Expr *) expr);
return expr;
}
/*
* Expand SubLinks to SubPlans in the given expression.
*
* The isQual argument tells whether or not this expression is a WHERE/HAVING
* qualifier expression. If it is, any sublinks appearing at top level need
* not distinguish FALSE from UNKNOWN return values.
*/
Node *
SS_process_sublinks(PlannerInfo *root, Node *expr, bool isQual)
{
process_sublinks_context context;
context.root = root;
context.isTopQual = isQual;
return process_sublinks_mutator(expr, &context);
}
process_sublinks_mutator 引用了make_subplan 函数
/*
* Convert a SubLink (as created by the parser) into a SubPlan.
*
* We are given the SubLink's contained query, type, ID, and testexpr. We are
* also told if this expression appears at top level of a WHERE/HAVING qual.
*
* Note: we assume that the testexpr has been AND/OR flattened (actually,
* it's been through eval_const_expressions), but not converted to
* implicit-AND form; and any SubLinks in it should already have been
* converted to SubPlans. The subquery is as yet untouched, however.
*
* The result is whatever we need to substitute in place of the SubLink node
* in the executable expression. If we're going to do the subplan as a
* regular subplan, this will be the constructed SubPlan node. If we're going
* to do the subplan as an InitPlan, the SubPlan node instead goes into
* root->init_plans, and what we return here is an expression tree
* representing the InitPlan's result: usually just a Param node representing
* a single scalar result, but possibly a row comparison tree containing
* multiple Param nodes, or for a MULTIEXPR subquery a simple NULL constant
* (since the real output Params are elsewhere in the tree, and the MULTIEXPR
* subquery itself is in a resjunk tlist entry whose value is uninteresting).
*/
static Node *
make_subplan(PlannerInfo *root, Query *orig_subquery,
SubLinkType subLinkType, int subLinkId,
Node *testexpr, bool isTopQual)
函数make_subplan, 其执行步骤如下:
1)首先复制子链接SubLink中的查询树Query,如果查询树是一个EXISTS型的子计划,那么则调用simplify_EXISTS_query函数对QUery副本进行处理;
2)调用subquery_planner函数为子链接生成计划,同时设置好参数tuple_fraction来告诉底层规划器需要去多少个元组。对于EXISTS型的子查询,设置该值为1.0,取回一个元组即可;对于ALL和ANY型的子查询,从概率上说大约取到50%的元组就可以得到结果,因此设置参数值为0.5;对于其他的查询,统一设置为0;
3)调用build_subplan函数将上一步生成的计划转换为SubPlan或者InitPlan的形式,并将当前查询层次的参数列表传递给它的子计划;
4)对于生成的是SubPlan而且是一个简单的EXISTS类型的查询,调用convert_EXISTS_to_ANY函数尝试将其转换为一个ANY类型的查询并创建查询计划.
处理HAVING子句
对于HAVING子句来说,除了进行前面所提到的预处理外,还需要处理其中的每个条件。如果HAVING子句中没有聚集函数的话,那么它完全可以退化到WHERE子句中去,否则的话他将被写到查询树的HavingQual字段里面。
HavingQual / Qual 转换成隐式 的 and 样式。
List *
make_ands_implicit(Expr *clause)
{
/*
* NB: because the parser sets the qual field to NULL in a query that has
* no WHERE clause, we must consider a NULL input clause as TRUE, even
* though one might more reasonably think it FALSE. Grumble. If this
* causes trouble, consider changing the parser's behavior.
*/
if (clause == NULL)
return NIL; /* NULL -> NIL list == TRUE */
else if (and_clause((Node *) clause))
return ((BoolExpr *) clause)->args;
else if (IsA(clause, Const) &&
!((Const *) clause)->constisnull &&
DatumGetBool(((Const *) clause)->constvalue))
return NIL; /* constant TRUE input -> NIL list */
else
return list_make1(clause);
}