当前位置: 首页 > 工具软件 > TBase > 使用案例 >

Tbase 源码 (一)

慕和惬
2023-12-01

     TBase 是基于Postgres-XL 开源项目,演进优化发展而来的 企业级分布式并行计算开源数据库。

TBase is an advanced enterprise-level database management system based on prior work of Postgres-XL project. It supports an extended subset of the SQL standard, including transactions, foreign keys, user-defined types and functions. Additional, it adds parallel computing, security, management, audit and other functions.

  Postgres-XL 10R1.1 版本发布后(2019-02-18),后续 停止新版本发布。

Postgres-XL官方网站  Postgres-XL | Open Source Scalable SQL Database Cluster

 Postgres-XL 10R1.1 now available! Click here to download the source tarball. (2019-02-18)

TBase是一个提供写可靠性,多主节点数据同步的关系数据库集群平台。你可以将TBase配置一台或者多台主机上,TBase数据存储在多台物理主机上面。数据表的存储有两种方式, 分别是distributed或者replicated ,当向TBase发送查询 SQL时,TBase会自动向数据节点发出查询语句并获取最终结果。

TBase采用分布式集群架构 , 该架构分布式为无共享(share nothing)模式,节点之间相应独立,各自处理自己的数据,处理后的结果可能向上层汇总或在节点间流转,各处理单元之间通过网络协议进行通信,并行处理和扩展能力更好,这也意味着只需要简单的x86服务器就可以部署TBase数据库集群。

解读一下TBase的三大模块

  • Coordinator:协调节点(简称CN)

    业务访问入口,负责数据的分发和查询规划,多个节点位置对等,每个节点都提供相同的数据库视图;在功能上CN上只存储系统的全局元数据,并不存储实际的业务数据。

  • Datanode:数据节点(简称DN)

    每个节点还存储业务数据的分片在功能上,DN节点负责完成执行协调节点分发的执行请求。

  • GTM:全局事务管理器(Global Transaction Manager)

    负责管理集群事务信息,同时管理集群的全局对象,比如序列等。

Q:支持行列混合存储吗?

A:开发的V3版本是支持的,目前开源的V2版本是只支持行存。

Q:扩容后老数据如何清理?

A:我们现在扩容的老数据更多的是通过delete+vacuum的方式做数据的清理,因此老数据清理会对业务造成一些影响。后面会有一些更好的方式来做,比如说做一些数据聚簇的方案,来优化扩容的搬迁和扩容的数据清理对系统的一些影响。

Q :索引膨胀如何解决?

A:若你更新数据或者数据搬迁确实会有索引膨胀,我们建议是重建索引。因为索引重建是可以并行来做的,对业务其实是没有太大的影响,索引建好后把老的索引删掉就OK了。

 ++++++++++++++++++++++++++++++++++++++++++++

===Tbase ===   源码【语义分析】部分 沿用了PostgreSQL的源码。

    【查询重写】部分依然沿用PostgreSQL的源码\src\backend\tcop\Postgres.c,入口函数是:pg_rewrite_query 。

    【查询优化——预处理】查询优化模块的入口函数是pg_plan_queries函数 \src\backend\tcop\Postgres.c,它负责将查询树链表变成执行计划链表。

Utility commands  不需要执行计划

/* Utility commands require no planning. */

  CMD_UTILITY,                /* cmds like create, destroy, copy, vacuum,
                                 * etc. */

非 Utility commands,

//调用优化器 planner函数

plan = planner(querytree, cursorOptions, boundParams);
/*
 * Generate a plan for a single already-rewritten query.
 * This is a thin wrapper around planner() and takes the same parameters.
 */
PlannedStmt *
pg_plan_query(Query *querytree, int cursorOptions, ParamListInfo boundParams)
/* call the optimizer */
    plan = planner(querytree, cursorOptions, boundParams);

 \src\backend\optimizer\plan\Planner.c

/*****************************************************************************
 *
 *       Query optimizer entry point
 *
 * To support loadable plugins that monitor or modify planner behavior,
 * we provide a hook variable that lets a plugin get control before and
 * after the standard planning process.  The plugin would normally call
 * standard_planner().
 *
 * Note to plugin authors: standard_planner() scribbles on its Query input,
 * so you'd better copy that data structure if you want to plan more than once.
 *
 *****************************************************************************/
PlannedStmt *
planner(Query *parse, int cursorOptions, ParamListInfo boundParams)

CN 节点调用 pgxc_planner(parse, cursorOptions, boundParams);

DN节点调用standard_planner(parse, cursorOptions, boundParams);

\src\backend\optimizer\plan\Planner.c

#ifdef PGXC
        /*
         * A Coordinator receiving a query from another Coordinator
         * is not allowed to go into PGXC planner.
         */
        if (IS_PGXC_LOCAL_COORDINATOR)
            result = pgxc_planner(parse, cursorOptions, boundParams);
        else
#endif
            result = standard_planner(parse, cursorOptions, boundParams);

 CN 节点调用 pgxc_planner(parse, cursorOptions, boundParams);

\src\backend\optimizer\plan\Planner.c

/*
 * Build up a QueryPlan to execute on.
 *
 * This functions tries to find out whether
 * 1. The statement can be shipped to the Datanode and Coordinator is needed
 *    only as a proxy - in which case, it creates a single node plan.
 * 2. The statement can be evaluated on the Coordinator completely - thus no
 *    query shipping is involved and standard_planner() is invoked to plan the
 *    statement
 * 3. The statement needs Coordinator as well as Datanode for evaluation -
 *    again we use standard_planner() to plan the statement.
 *
 * The plan generated in either of the above cases is returned.
 */
PlannedStmt *
pgxc_planner(Query *query, int cursorOptions, ParamListInfo boundParams)
{
    PlannedStmt *result;

    /* see if can ship the query completely */
    result = pgxc_FQS_planner(query, cursorOptions, boundParams);
    if (result)
        return result;

    /* we need Coordinator for evaluation, invoke standard planner */
    result = standard_planner(query, cursorOptions, boundParams);
    return result;
}

 \src\backend\optimizer\plan\Planner.c

standard_planner(parse, cursorOptions, boundParams);

PlannedStmt *
standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
{// #lizard forgives
    PlannedStmt *result;
    PlannerGlobal *glob;
    double        tuple_fraction;
    PlannerInfo *root;
    RelOptInfo *final_rel;
    Path       *best_path;
    Plan       *top_plan;
    ListCell   *lp,
               *lr;

#ifdef XCP
    if (IS_PGXC_LOCAL_COORDINATOR && parse->utilityStmt &&
            IsA(parse->utilityStmt, RemoteQuery))
        return pgxc_direct_planner(parse, cursorOptions, boundParams);
#endif
    /*
     * Set up global state for this planner invocation.  This data is needed
     * across all levels of sub-Query that might exist in the given command,
     * so we keep it in a separate struct that's linked to by each per-Query
     * PlannerInfo.
     */
    glob = makeNode(PlannerGlobal);
   ...
#ifdef __TBASE__
    groupOids = NULL;
	min_workers_of_hashjon_gather = PG_INT32_MAX;
#endif
   ...
 /* Determine what fraction of the plan is likely to be scanned */
    if (cursorOptions & CURSOR_OPT_FAST_PLAN)
...
 /* primary planning entry point (may recurse for subqueries) */
    root = subquery_planner(glob, parse, NULL,
                            false, tuple_fraction);

    /* Select best Path and turn it into a Plan */
    final_rel = fetch_upper_rel(root, UPPERREL_FINAL, NULL);
    best_path = get_cheapest_fractional_path(final_rel, tuple_fraction);

    if (!root->distribution)
        root->distribution = best_path->distribution;

#ifdef __TBASE__
    if (root->distribution && !parse->hasUnshippableTriggers)
    {
        remote_subplan_depth++;
    }
#endif
    top_plan = create_plan(root, best_path);
#ifdef XCP
#ifdef __TBASE__
    if (root->distribution && !parse->hasUnshippableTriggers)
#else
    if (root->distribution)
#endif
    {
        /*
         * FIXME, this keeps adding RemoteSubplan at a top of queries that
         * don't really need it (e.g above a MergeAppend with subplans pushed
         * to remote nodes). Not sure why it's happening, though ...
         */
        top_plan = (Plan *) make_remotesubplan(root, top_plan, NULL,
                                               root->distribution,
                                               root->sort_pathkeys);
        SS_remote_attach_initplans(root, top_plan);
        remote_subplan_depth--;
    }
#endif
  ...
  /*
     * If creating a plan for a scrollable cursor, make sure it can run
     * backwards on demand.  Add a Material node at the top at need.
     */
    if (cursorOptions & CURSOR_OPT_SCROLL)
    {
        if (!ExecSupportsBackwardScan(top_plan))
            top_plan = materialize_finished_plan(top_plan);
    }

    /*
     * Optionally add a Gather node for testing purposes, provided this is
     * actually a safe thing to do.
     */
    if (force_parallel_mode != FORCE_PARALLEL_OFF && top_plan->parallel_safe)
    {
        Gather       *gather = makeNode(Gather);

        gather->plan.targetlist = top_plan->targetlist;
        gather->plan.qual = NIL;
        gather->plan.lefttree = top_plan;
        gather->plan.righttree = NULL;
        gather->num_workers = 1;
        gather->single_copy = true;
        gather->invisible = (force_parallel_mode == FORCE_PARALLEL_REGRESS);

        /*
         * Ideally we'd use cost_gather here, but setting up dummy path data
         * to satisfy it doesn't seem much cleaner than knowing what it does.
         */
        gather->plan.startup_cost = top_plan->startup_cost +
            parallel_setup_cost;
        gather->plan.total_cost = top_plan->total_cost +
            parallel_setup_cost + parallel_tuple_cost * top_plan->plan_rows;
        gather->plan.plan_rows = top_plan->plan_rows;
        gather->plan.plan_width = top_plan->plan_width;
        gather->plan.parallel_aware = false;
        gather->plan.parallel_safe = false;

        /* use parallel mode for parallel plans. */
        root->glob->parallelModeNeeded = true;

        top_plan = &gather->plan;
    }
 /*
     * If any Params were generated, run through the plan tree and compute
     * each plan node's extParam/allParam sets.  Ideally we'd merge this into
     * set_plan_references' tree traversal, but for now it has to be separate
     * because we need to visit subplans before not after main plan.
     */
    if (glob->nParamExec > 0)
    {
        Assert(list_length(glob->subplans) == list_length(glob->subroots));
        forboth(lp, glob->subplans, lr, glob->subroots)
        {
            Plan       *subplan = (Plan *) lfirst(lp);
            PlannerInfo *subroot = (PlannerInfo *) lfirst(lr);

            SS_finalize_plan(subroot, subplan);
        }
        SS_finalize_plan(root, top_plan);
    }
...

 standard_planner函数调用的——subquery_planner函数
subquery_planner 
函数接受Query查询树,最后通过PlannedStmt结构体的形式返回Plan计划树。(Plan计划树被封装在PlannedStmt结构体中)subquery_planner函数负责创建计划,可以递归处理子查询。subquery_planner函数的工作分为两个部分:
1:依据消除冗余条件,减少查询层次,简化路径生成的基本思想,调用预处理函数对查询树进行处理 
2:调用inheritance_planner 或者grouping_planner进入生成计划流程,此过程不对查询树做出实质性改变
 

 \src\backend\optimizer\plan\Planner.c

/*--------------------
 * subquery_planner
 *      Invokes the planner on a subquery.  We recurse to here for each
 *      sub-SELECT found in the query tree.
 *
 * glob is the global state for the current planner run.
 * parse is the querytree produced by the parser & rewriter.
 * parent_root is the immediate parent Query's info (NULL at the top level).
 * hasRecursion is true if this is a recursive WITH query.
 * tuple_fraction is the fraction of tuples we expect will be retrieved.
 * tuple_fraction is interpreted as explained for grouping_planner, below.
 *
 * Basically, this routine does the stuff that should only be done once
 * per Query object.  It then calls grouping_planner.  At one time,
 * grouping_planner could be invoked recursively on the same Query object;
 * that's not currently true, but we keep the separation between the two
 * routines anyway, in case we need it again someday.
 *
 * subquery_planner will be called recursively to handle sub-Query nodes
 * found within the query's expressions and rangetable.
 *
 * Returns the PlannerInfo struct ("root") that contains all data generated
 * while planning the subquery.  In particular, the Path(s) attached to
 * the (UPPERREL_FINAL, NULL) upperrel represent our conclusions about the
 * cheapest way(s) to implement the query.  The top level will select the
 * best Path and pass it through createplan.c to produce a finished Plan.
 *--------------------
 */
PlannerInfo *
subquery_planner(PlannerGlobal *glob, Query *parse,
                 PlannerInfo *parent_root,
                 bool hasRecursion, double tuple_fraction)
 

 子链接和子查询的区别:子查询是一条完整的查询语句,而子链接是一条表达式,但是表达式内部也可以包含查询语句。直白点说呢就是:子查询是放在FROM子句里的而子链接则出现在WHERE子句或者HAVING子句中。
在subquery_planner函数里,调用pull_up_sublinks函数处理WHERE子句和JOIN/ON子句中的ANY和EXISTS类型的子链接。 

 subquery_planner函数调用pull_up_subqueries函数来提升子查询。当子查询仅仅是一个简单的扫描或者连接时,就会把子查询或者子查询的一部分合并到父查询中以进行优化。
1.在范围表中存在子查询。对于简单的子查询,直接调用pull_up_simple_subquery函数进行提升;而对于简单的UNION ALL子查询,调用pull_up_simple_union_all函数进行提升,其他的情况则不处理;
2.在FROM表达式中存在子查询。对于FROM列表中的每个节点都调用pull_up_subqueries递归处理;
3.连接表达式中的子查询。调用pull_up_subqueries函数递归地处理. 

/*
 * pull_up_subqueries
 *        Look for subqueries in the rangetable that can be pulled up into
 *        the parent query.  If the subquery has no special features like
 *        grouping/aggregation then we can merge it into the parent's jointree.
 *        Also, subqueries that are simple UNION ALL structures can be
 *        converted into "append relations".
 */
void
pull_up_subqueries(PlannerInfo *root)

表达式的预处理工作主要由函数preprocess_expression完成。该函数采用递归扫描的方式处理PlannerInfo结构体里面保存的目标属性、HAVING子句、OFFSET子句、LIMIT子句和连接树jointree。总体来说做了以下这些事
1.调用flatten_join_alias_vars函数,用基本关系变量取代连接别名变量;
2.调用函数eval_const_expression进行常量表达式的简化,也就是直接计算出常量表达式的值。例如:“3+1 <> 4” 这种会直接被替换成“FALSE”;
3.调用canonicalize_qual函数对表达式进行规范化,主要是将表达式转换为最佳析取范式或者合取范式
4.调用函数make_subplan将子链接转换为子计划.
 

 \src\backend\optimizer\plan\Planner.c

/*
 * preprocess_expression
 *        Do subquery_planner's preprocessing work for an expression,
 *        which can be a targetlist, a WHERE clause (including JOIN/ON
 *        conditions), a HAVING clause, or a few other things.
 */
static Node *
preprocess_expression(PlannerInfo *root, Node *expr, int kind)
{// #lizard forgives
    /*
     * Fall out quickly if expression is empty.  This occurs often enough to
     * be worth checking.  Note that null->null is the correct conversion for
     * implicit-AND result format, too.
     */
    if (expr == NULL)
        return NULL;

    /*
     * If the query has any join RTEs, replace join alias variables with
     * base-relation variables.  We must do this before sublink processing,
     * else sublinks expanded out from join aliases would not get processed.
     * We can skip it in non-lateral RTE functions, VALUES lists, and
     * TABLESAMPLE clauses, however, since they can't contain any Vars of the
     * current query level.
     */
    if (root->hasJoinRTEs &&
        !(kind == EXPRKIND_RTFUNC ||
          kind == EXPRKIND_VALUES ||
          kind == EXPRKIND_TABLESAMPLE ||
          kind == EXPRKIND_TABLEFUNC))
        expr = flatten_join_alias_vars(root, expr);

    /*
     * Simplify constant expressions.
     *
     * Note: an essential effect of this is to convert named-argument function
     * calls to positional notation and insert the current actual values of
     * any default arguments for functions.  To ensure that happens, we *must*
     * process all expressions here.  Previous PG versions sometimes skipped
     * const-simplification if it didn't seem worth the trouble, but we can't
     * do that anymore.
     *
     * Note: this also flattens nested AND and OR expressions into N-argument
     * form.  All processing of a qual expression after this point must be
     * careful to maintain AND/OR flatness --- that is, do not generate a tree
     * with AND directly under AND, nor OR directly under OR.
     */
    expr = eval_const_expressions(root, expr);

    /*
     * If it's a qual or havingQual, canonicalize it.
     */
    if (kind == EXPRKIND_QUAL)
    {
        expr = (Node *) canonicalize_qual((Expr *) expr);

#ifdef OPTIMIZER_DEBUG
        printf("After canonicalize_qual()\n");
        pprint(expr);
#endif
    }

    /* Expand SubLinks to SubPlans */
    if (root->parse->hasSubLinks)
        expr = SS_process_sublinks(root, expr, (kind == EXPRKIND_QUAL));

    /*
     * XXX do not insert anything here unless you have grokked the comments in
     * SS_replace_correlation_vars ...
     */

    /* Replace uplevel vars with Param nodes (this IS possible in VALUES) */
    if (root->query_level > 1)
        expr = SS_replace_correlation_vars(root, expr);

    /*
     * If it's a qual or havingQual, convert it to implicit-AND format. (We
     * don't want to do this before eval_const_expressions, since the latter
     * would be unable to simplify a top-level AND correctly. Also,
     * SS_process_sublinks expects explicit-AND format.)
     */
    if (kind == EXPRKIND_QUAL)
        expr = (Node *) make_ands_implicit((Expr *) expr);

    return expr;
}

/*
 * Expand SubLinks to SubPlans in the given expression.
 *
 * The isQual argument tells whether or not this expression is a WHERE/HAVING
 * qualifier expression.  If it is, any sublinks appearing at top level need
 * not distinguish FALSE from UNKNOWN return values.
 */
Node *
SS_process_sublinks(PlannerInfo *root, Node *expr, bool isQual)
{
    process_sublinks_context context;

    context.root = root;
    context.isTopQual = isQual;
    return process_sublinks_mutator(expr, &context);
}

 process_sublinks_mutator  引用了make_subplan 函数

/*
 * Convert a SubLink (as created by the parser) into a SubPlan.
 *
 * We are given the SubLink's contained query, type, ID, and testexpr.  We are
 * also told if this expression appears at top level of a WHERE/HAVING qual.
 *
 * Note: we assume that the testexpr has been AND/OR flattened (actually,
 * it's been through eval_const_expressions), but not converted to
 * implicit-AND form; and any SubLinks in it should already have been
 * converted to SubPlans.  The subquery is as yet untouched, however.
 *
 * The result is whatever we need to substitute in place of the SubLink node
 * in the executable expression.  If we're going to do the subplan as a
 * regular subplan, this will be the constructed SubPlan node.  If we're going
 * to do the subplan as an InitPlan, the SubPlan node instead goes into
 * root->init_plans, and what we return here is an expression tree
 * representing the InitPlan's result: usually just a Param node representing
 * a single scalar result, but possibly a row comparison tree containing
 * multiple Param nodes, or for a MULTIEXPR subquery a simple NULL constant
 * (since the real output Params are elsewhere in the tree, and the MULTIEXPR
 * subquery itself is in a resjunk tlist entry whose value is uninteresting).
 */
static Node *
make_subplan(PlannerInfo *root, Query *orig_subquery,
             SubLinkType subLinkType, int subLinkId,
             Node *testexpr, bool isTopQual)

函数make_subplan,  其执行步骤如下:

1)首先复制子链接SubLink中的查询树Query,如果查询树是一个EXISTS型的子计划,那么则调用simplify_EXISTS_query函数对QUery副本进行处理;
2)调用subquery_planner函数为子链接生成计划,同时设置好参数tuple_fraction来告诉底层规划器需要去多少个元组。对于EXISTS型的子查询,设置该值为1.0,取回一个元组即可;对于ALL和ANY型的子查询,从概率上说大约取到50%的元组就可以得到结果,因此设置参数值为0.5;对于其他的查询,统一设置为0;
3)调用build_subplan函数将上一步生成的计划转换为SubPlan或者InitPlan的形式,并将当前查询层次的参数列表传递给它的子计划;
4)对于生成的是SubPlan而且是一个简单的EXISTS类型的查询,调用convert_EXISTS_to_ANY函数尝试将其转换为一个ANY类型的查询并创建查询计划. 

处理HAVING子句

对于HAVING子句来说,除了进行前面所提到的预处理外,还需要处理其中的每个条件。如果HAVING子句中没有聚集函数的话,那么它完全可以退化到WHERE子句中去,否则的话他将被写到查询树的HavingQual字段里面。 

HavingQual / Qual  转换成隐式 的 and  样式。

List *
make_ands_implicit(Expr *clause)
{
    /*
     * NB: because the parser sets the qual field to NULL in a query that has
     * no WHERE clause, we must consider a NULL input clause as TRUE, even
     * though one might more reasonably think it FALSE.  Grumble. If this
     * causes trouble, consider changing the parser's behavior.
     */
    if (clause == NULL)
        return NIL;                /* NULL -> NIL list == TRUE */
    else if (and_clause((Node *) clause))
        return ((BoolExpr *) clause)->args;
    else if (IsA(clause, Const) &&
             !((Const *) clause)->constisnull &&
             DatumGetBool(((Const *) clause)->constvalue))
        return NIL;                /* constant TRUE input -> NIL list */
    else
        return list_make1(clause);
}

 类似资料: