[23] Analyzer

朱啸
2023-12-01
NodeRef is a reference to a Node
Scope
   Scope itself forms a linked list
    it resolves the types of expressions
    Optional<Scope> parent
    RelationId wraps the Node
    RelationType is the type (the data types) of the Node
FieldId is the index information of a ResolvedField produced during resolution
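
The parent chain is easiest to see in a stripped-down model. The sketch below is not Presto's real Scope class, only a minimal illustration (all names hypothetical) of how resolving a name walks the linked list of scopes:

import java.util.List;
import java.util.Optional;

// minimal model of the Scope chain (hypothetical names, not the real class):
// each Scope knows its parent and the fields of its own relation
record Field(String name, String type) {}

class SimpleScope {
    private final Optional<SimpleScope> parent; // enclosing scope, empty at the root
    private final List<Field> fields;           // fields of this scope's relation

    SimpleScope(Optional<SimpleScope> parent, List<Field> fields) {
        this.parent = parent;
        this.fields = fields;
    }

    // resolve a name in this scope first, then walk up the parent chain,
    // which is how correlated references reach the outer query's scope
    Optional<Field> resolve(String name) {
        for (Field field : fields) {
            if (field.name().equals(name)) {
                return Optional.of(field);
            }
        }
        return parent.flatMap(p -> p.resolve(name));
    }
}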

Analyzer

Analyzer{
    Metadata metadata;
    SqlParser sqlParser;
    Session session;
    List<Expression> parameters;
    public Analysis analyze(Statement statement, boolean isDescribe){
        Analysis analysis = new Analysis(rewrittenStatement, parameters, isDescribe);
        // the StatementAnalyzer fills in the Analysis
        new StatementAnalyzer(analysis, metadata, sqlParser, accessControl, session).analyze()
    }
}
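
In other words, analyze() returns an Analysis that is populated by side effect. A minimal sketch of that accumulator pattern, with hypothetical stand-in types rather than Presto's real signatures:

// the Analysis starts empty and is populated as a side effect of the visit
class Analysis { /* accumulates scopes, tables, output expressions, ... */ }

class StatementAnalyzerSketch {
    private final Analysis analysis;

    StatementAnalyzerSketch(Analysis analysis) {
        this.analysis = analysis;
    }

    void analyze(Object statement) {
        // the visitor walks the statement and writes its findings into this.analysis
    }
}

class AnalyzerSketch {
    Analysis analyze(Object statement) {
        Analysis analysis = new Analysis();
        new StatementAnalyzerSketch(analysis).analyze(statement);
        return analysis; // fully populated once the traversal finishes
    }
}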

StatementAnalyzer

StatementAnalyzer{
    Analysis analysis;
    Metadata metadata;
    Session session;
    SqlParser sqlParser;
    AccessControl accessControl;
    // the AST is traversed with a visitor
    public Scope analyze(Node node, Optional<Scope> outerQueryScope)
    {
        return new Visitor(outerQueryScope).process(node, Optional.empty());
    }
}
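
The traversal itself is the classic visitor pattern: process() dispatches each AST node to a visitXxx method, and each visit returns the Scope it produced. A toy standalone version (hypothetical node types, not Presto's AstVisitor):

import java.util.Optional;

// toy visitor dispatch: process() routes each node to its visitXxx method
interface Node {}
record Query(Node queryBody) implements Node {}
record Table(String name) implements Node {}
record Scope() {} // placeholder for the real Scope

class Visitor {
    Scope process(Node node, Optional<Scope> scope) {
        if (node instanceof Query query) {
            return visitQuery(query, scope);
        }
        if (node instanceof Table table) {
            return visitTable(table, scope);
        }
        throw new IllegalArgumentException("unsupported node: " + node);
    }

    Scope visitQuery(Query node, Optional<Scope> scope) {
        // analyzing the query body yields the scope the rest of the query sees
        return process(node.queryBody(), scope);
    }

    Scope visitTable(Table node, Optional<Scope> scope) {
        // would look up table metadata and build a scope from its columns
        return new Scope();
    }
}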

StatementAnalyzer::Visitor

Each step of the whole process produces a Scope node; the Scopes form a linked list that records the result of the entire visit.
The key methods are:
visitQuery()
visitQuerySpecification()

visitQuery

visitQuery(){
    // first process the queryBody; this is where visitQuerySpecification() is entered
    Scope queryBodyScope = process(node.getQueryBody(), withScope)
    // setOutputExpressions happens here
    analysis.setOutputExpressions(node, descriptorToFields(queryBodyScope));
    // update the Scope
    analysis.setScope(node, queryScope)
}

visitQuerySpecification

visitQuerySpecification(){
     analyzeFrom(node, scope);
     analyzeWhere(node, sourceScope, where)
     analyzeSelect(node, sourceScope)
     analyzeGroupBy(node, sourceScope, outputExpressions)
     analyzeHaving(node, sourceScope);
     analyzeOrderBy(node, orderByScope.get(), outputExpressions)
     analyzeGroupingOperations(node, sourceExpressions, orderByExpressions)
     analyzeAggregations(node, sourceScope, orderByScope, groupByExpressions, sourceExpressions, orderByExpressions)
     analyzeWindowFunctions(node, outputExpressions, orderByExpressions)
     // when there is no GROUP BY but there is an ORDER BY
     computeAndAssignOrderByScopeWithAggregation(node.getOrderBy().get(), sourceScope, outputScope, aggregations, groupByExpressions, analysis.getGroupingOperations(node))
}

analyzeFrom

analyzeFrom(node, scope){
  for a single-table query, the next step is visitTable
}

visitTable
(1) Analysis is updated with the table and its columns
(2) access checks are performed
(3) the scope at this point is updated with the fields built from the columns
(4) Analysis's scopes map gains an entry: node(table) --> the scope above

visitTable(Table table, Optional<Scope> scope){ // scope is usually empty at this point
   build a complete catalog.schema.tableName QualifiedObjectName name from the table's name information (note: the session carries the catalog info, which is why a query can omit it)
   if it is a view
        it is handled separately: worth a closer look when time allows (note: in essence the view's SQL is taken out and re-parsed and re-analyzed; this is also the root cause of why Hive views cannot be fully supported, since HQL cannot be fully represented by the Presto AST)
   if it is an ordinary table
   // read the table handle from the metadata; this is read live from the connector, e.g. for MySQL BaseJdbcClient::getTableHandle opens a connection to read the MySQL metadata
   tableHandle = metadata.getTableHandle(session, name)
   // check whether the table is registered; worth borrowing
   // table access check; worth borrowing, look into later
   accessControl.checkCanSelectFromTable(session.getRequiredTransactionId(), session.getIdentity(), name)
   // fetch the tableMetadata
   // e.g. for MySQL: MetadataManager::getTableMetadata --> JdbcMetadata::getTableMetadata --> BaseJdbcClient::getColumns --> this connects to the database
   TableMetadata tableMetadata = metadata.getTableMetadata(session, tableHandle.get());
   // extract every column
   Map<String, ColumnHandle> columnHandles = metadata.getColumnHandles(session, tableHandle.get())
   update the columns from the metadata into Analysis
   iterate tableMetadata.getColumns()
          build a Field: each column is wrapped as a Field
            analysis.setColumn(field, columnHandle)
    // update the Analysis table
    analysis.registerTable(table, tableHandle.get())
    createAndAssignScope(table, scope, fields.build())
        this builds and returns the Scope; for a table the Scope is
        Scope{
            Optional<Scope> parent = empty
            RelationId = a wrapper of the Table node
            RelationType = all of the table's columns with their data types
        }
   // update Analysis's scopes: add an entry node(table) --> the scope above
}
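
What createAndAssignScope does for a table can be sketched in a stripped-down, hypothetical form (not Presto's real types): wrap each column as a Field, hang the fields on a parentless Scope, and record the node --> scope mapping:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// simplified model: every column is wrapped as a Field; a table's Scope has no parent
record Column(String name, String type) {}
record Field(String name, String type) {}
record Scope(Optional<Scope> parent, List<Field> fields) {}

class TableScopeBuilder {
    private final Map<Object, Scope> scopes = new HashMap<>(); // analogue of Analysis.scopes

    Scope createAndAssignScope(Object tableNode, List<Column> columns) {
        List<Field> fields = new ArrayList<>();
        for (Column column : columns) {
            fields.add(new Field(column.name(), column.type())); // column wrapped as a Field
        }
        Scope scope = new Scope(Optional.empty(), fields); // parent is empty for a table
        scopes.put(tableNode, scope); // node(table) --> this scope
        return scope;
    }
}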

analyzeWhere

(1) analysis.recordSubqueries(node, expressionAnalysis)
(2) analysis.setWhere(node, predicate)
(3) checks that the WHERE predicate is boolean
(4) scope

public void analyzeWhere(Node node, Scope scope, Expression predicate)
{
    // aggregate and window functions must not appear in WHERE
    Analyzer.verifyNoAggregateWindowOrGroupingFunctions(metadata.getFunctionRegistry(), predicate, "WHERE clause");
    // subquery detection
    ExpressionAnalysis expressionAnalysis = analyzeExpression(predicate, scope);
    analysis.recordSubqueries(node, expressionAnalysis);

    Type predicateType = expressionAnalysis.getType(predicate);
    if (!predicateType.equals(BOOLEAN)) {
        // check that the predicate is boolean
        // coerce null to boolean
        analysis.addCoercion(predicate, BOOLEAN, false);
    }
    // set where
    analysis.setWhere(node, predicate);
}

analyzeExpression(predicate, scope) {
    ExpressionAnalyzer.analyzeExpression
}

Expression analyzer and expression analysis result

ExpressionAnalyzer, ExpressionAnalysis, ExpressionTreeUtils
ExpressionAnalyzer
analyzes an expression and produces an ExpressionAnalysis; note that ExpressionAnalyzer must start from a Scope that already has types, and performs all further inference from it
for example, you can visit a Table node to produce a Scope, or build a Scope by imitating visitTable

ExpressionAnalysis
mainly stores the result of analyzing one expression, e.g. id = 1
expressionTypes holds the type of every member of the expression: id -> int, 1 -> int, id = 1 -> boolean
columnReferences holds the columns involved: id -> Field (the field for id)

ExpressionTreeUtils
provides utility methods for extracting aggregate functions and window functions from an expression

List<FunctionCall> extractAggregateFunctions
List<FunctionCall> extractWindowFunctions

extractExpressions{
   uses DefaultExpressionTraversalVisitor to pull out every member of the expression
   e.g. for id = 1
   [id, 1, id = 1]
   which are then returned for comparison and filtering
}
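
Underneath, the extraction is just a tree walk that collects every sub-expression and filters the result. A minimal self-contained version (hypothetical expression types, pre-order rather than Presto's exact order):

import java.util.ArrayList;
import java.util.List;

// minimal expression tree walk: collect every sub-expression in pre-order,
// so id = 1 yields [id = 1, id, 1]
sealed interface Expr permits Identifier, Literal, Comparison {}
record Identifier(String name) implements Expr {}
record Literal(long value) implements Expr {}
record Comparison(Expr left, Expr right) implements Expr {}

class ExpressionExtractor {
    static List<Expr> extractExpressions(Expr root) {
        List<Expr> result = new ArrayList<>();
        collect(root, result);
        return result;
    }

    private static void collect(Expr expr, List<Expr> result) {
        result.add(expr);
        if (expr instanceof Comparison comparison) {
            collect(comparison.left(), result);
            collect(comparison.right(), result);
        }
    }
}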

analyzeSelect

(1) updates analysis.setOutputExpressions
(2) checks that count_distinct columns must be comparable columns

List outputExpressions = analyzeSelect(node, sourceScope)

 analyzeSelect(node, sourceScope) {
    outputExpressionBuilder = ImmutableList.builder();
    for(node.getSelect().getSelectItems()) iterate{
        if it is *{
            get all the fields from the scope and build FieldReferences
        } else if it is a SingleColumn {
        ExpressionAnalysis expressionAnalysis = analyzeExpression(column.getExpression(), scope);
        analysis.recordSubqueries(node, expressionAnalysis); // this updates recordSubqueries
        outputExpressionBuilder.add(column.getExpression());
        }
    }
    analysis.setOutputExpressions(node, outputExpressionBuilder.build())
 }
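
The two branches are easy to model standalone. A hypothetical simplified sketch (not Presto's types) of expanding * versus a single column:

import java.util.ArrayList;
import java.util.List;

// simplified select-item expansion: '*' expands to one reference per field
// in the scope; a single column contributes its own expression
sealed interface SelectItem permits AllColumns, SingleColumn {}
record AllColumns() implements SelectItem {}
record SingleColumn(String expression) implements SelectItem {}

class SelectAnalyzerSketch {
    static List<String> analyzeSelect(List<SelectItem> items, List<String> scopeFields) {
        List<String> output = new ArrayList<>();
        for (SelectItem item : items) {
            if (item instanceof AllColumns) {
                output.addAll(scopeFields); // one field reference per field in the scope
            }
            else if (item instanceof SingleColumn column) {
                output.add(column.expression()); // the expression becomes an output expression
            }
        }
        return output;
    }
}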

analyzeGroupBy

(1) checks that there are no window or aggregate functions.
(2) analysis.setGroupingSets.

List<List<Expression>> groupByExpressions = analyzeGroupBy(node, sourceScope, outputExpressions)    

analyzeHaving

(1) analysis.setHaving.
(2) window functions are not allowed

analyzeHaving(node, sourceScope)

computeAndAssignOutputScope

(1) creates a new QuerySpecification scope [a scope built from the fields of the output columns]; this scope's parent is the table's scope
(2) updates analysis's scopes: one entry table --> the scope of the table's columns, and one entry QuerySpecification --> the scope of its output columns

sourceScope is the scope after FROM/WHERE/SELECT/HAVING, mainly the table's columns
scope is the input parameter, empty here
Scope outputScope = computeAndAssignOutputScope(node, scope, sourceScope)
iterate all select items for(node.getSelect().getSelectItems()) to build the OutputScope{
  build the output fields
  create a new QuerySpecification scope [a scope built from the output columns' fields] whose parent is the table's scope
  update analysis's scopes
}
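
The parent relationship is the whole point here; a minimal hypothetical sketch of the chaining:

import java.util.List;
import java.util.Optional;

// hypothetical sketch of the chaining in computeAndAssignOutputScope:
// the output scope's parent is the source (table) scope, so later phases
// can see both the output columns and the underlying table columns
record Field(String name) {}
record Scope(Optional<Scope> parent, List<Field> fields) {}

class OutputScopeDemo {
    public static void main(String[] args) {
        Scope tableScope = new Scope(Optional.empty(), List.of(new Field("id"), new Field("name")));
        Scope outputScope = new Scope(Optional.of(tableScope), List.of(new Field("name")));
        System.out.println(outputScope.parent().isPresent()); // true: the chain leads back to the table
    }
}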

analyzeOrderBy

from sourceScope and outputScope an orderByScope is built, i.e. a scope whose parent is sourceScope and which is itself the orderByScope
then the analysis proceeds as above and analysis.setOrderByExpressions is set

analyzeGroupingOperations

sourceExpressions is all SELECT output columns plus the columns in HAVING
analyzeGroupingOperations(node, sourceExpressions, orderByExpressions)

analyzeAggregations

sourceExpressions is all SELECT output columns plus the columns in HAVING
sourceScope is the table's scope
orderByScope is the scope of the output columns

analyzeAggregations(node, sourceScope, orderByScope, groupByExpressions, sourceExpressions, orderByExpressions){
    // first extract the aggregate functions
    analysis.setAggregates(node, aggregates)
    // use AggregationAnalyzer to verify the SELECT and GROUP BY columns: every non-aggregated column must appear in GROUP BY
    verifySourceAggregations(distinctGroupingColumns, sourceScope, expression, metadata, analysis)
    // the same check for the ORDER BY expressions
    verifyOrderByAggregations(distinctGroupingColumns, sourceScope, orderByScope.get(), expression, metadata, analysis)
}
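
The rule AggregationAnalyzer enforces boils down to one check. A hypothetical simplified sketch: any bare column reference outside an aggregate call must be one of the GROUP BY columns:

import java.util.List;
import java.util.Set;

// hypothetical reduction of the AggregationAnalyzer rule: a bare column
// outside any aggregate call must be one of the GROUP BY columns
class GroupByVerifier {
    static void verifySourceAggregations(Set<String> groupByColumns, List<String> bareColumns) {
        for (String column : bareColumns) {
            if (!groupByColumns.contains(column)) {
                throw new IllegalStateException(
                        "column '" + column + "' must appear in GROUP BY or be used in an aggregate function");
            }
        }
    }
}

For example, with SELECT a, sum(b) FROM t GROUP BY a, the bare column a passes, while a bare b outside sum() would fail the check.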