Flink also provides relational programming interfaces: the Table API and, built on top of it, the SQL API. They let users build Flink applications efficiently through a structured, declarative programming style. The Table API and SQL also unify batch and real-time processing: the same API and the same application code can drive both streaming and batch jobs without any modification, which is what true batch/stream unification means.
Up to Flink 1.8, users who needed both streaming and batch processing had to keep two sets of business code, and developers had to maintain two technology stacks, which was very cumbersome. The Flink community had long envisioned treating batch processing as stream processing over bounded data, i.e. as a special case of streaming, and thereby unifying the two. Alibaba's Blink team did a great deal of work in this direction and implemented batch/stream unification at the Table API & SQL layer. In Flink 1.9 the Table module finally received a major architectural upgrade, incorporating many features contributed by the Blink team.
The sample file orderlog.txt is shown below; the columns are, in order: order ID, quantity, SKU ID, price type, and order timestamp.
20201011231245423,2,1226354,new,1599931359071
20201011231254678,1,1226322,normal,1599931359024
20201011231212768,1,1226324,back,1599931359011
20201011231234567,3,1226351,normal,1599931359073
20201011231234569,4,1226352,new,1599931359077
The Maven dependencies used in the following examples are:
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-planner_2.11</artifactId>
    <version>1.9.1</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-api-java-bridge_2.11</artifactId>
    <version>1.9.1</version>
</dependency>
(1) The streaming environment is created as follows:
// Pre-Flink 1.9 style
StreamExecutionEnvironment executionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment tableEnvironment = StreamTableEnvironment.create(executionEnvironment);
// Flink 1.9+ style: the planner is chosen explicitly via EnvironmentSettings
EnvironmentSettings fsSettings = EnvironmentSettings.newInstance().useOldPlanner().inStreamingMode().build();
StreamExecutionEnvironment fsEnv = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment fsTableEnv = StreamTableEnvironment.create(fsEnv, fsSettings);
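Flink 1.9 also ships the Blink planner contributed by the Blink team. As a minimal sketch (it assumes the flink-table-planner-blink_2.11 dependency, same version, is on the classpath, and uses the same imports as the full examples below), the streaming environment could alternatively be created with the Blink planner:
// Blink planner, streaming mode (drop-in alternative to the old-planner settings above)
EnvironmentSettings bsSettings = EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build();
StreamExecutionEnvironment bsEnv = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment bsTableEnv = StreamTableEnvironment.create(bsEnv, bsSettings);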
(2) The batch environment is created as follows:
ExecutionEnvironment fbEnv = ExecutionEnvironment.getExecutionEnvironment();
BatchTableEnvironment fbTableEnv = BatchTableEnvironment.create(fbEnv);
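With the Blink planner, batch programs can also run on a unified TableEnvironment instead of the DataSet-based BatchTableEnvironment. A minimal sketch, again assuming the Blink planner dependency is present (TableEnvironment here is org.apache.flink.table.api.TableEnvironment):
// Blink planner, batch mode: a unified TableEnvironment, no ExecutionEnvironment needed
EnvironmentSettings bbSettings = EnvironmentSettings.newInstance().useBlinkPlanner().inBatchMode().build();
TableEnvironment bbTableEnv = TableEnvironment.create(bbSettings);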
There are two ways to create a table in Flink:
(1) Import the table structure from an external source (static).
This approach is commonly used in batch jobs and for data coming from external systems such as databases, file systems, or Kafka sources.
For example, read the order log from a local CSV file (a plain .txt file is used here for convenience) and create a table from it:
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.table.sources.CsvTableSource;
import org.apache.flink.table.sources.TableSource;
import org.apache.flink.types.Row;
public class TestTableApi {
    public static void main(String[] args) throws Exception {
        // Pre-Flink 1.9 style
        //StreamExecutionEnvironment executionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment();
        //StreamTableEnvironment tableEnvironment = StreamTableEnvironment.create(executionEnvironment);
        // Flink 1.9+ style
        EnvironmentSettings fsSettings = EnvironmentSettings.newInstance().useOldPlanner().inStreamingMode().build();
        StreamExecutionEnvironment fsEnv = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment fsTableEnv = StreamTableEnvironment.create(fsEnv, fsSettings);
        // Define the table column names
        String[] fieldNames = {"orderId", "orderNum", "skuId", "priceType", "requestTime"};
        // Define the table column types
        @SuppressWarnings("rawtypes")
        TypeInformation[] fieldTypes = {Types.STRING, Types.INT, Types.STRING, Types.STRING, Types.STRING};
        // Create a CSV-format TableSource
        TableSource<Row> tableSource = new CsvTableSource("C:\\Users\\LiryZlian\\Desktop\\order.txt", fieldNames, fieldTypes);
        // Register the table
        fsTableEnv.registerTableSource("tb_order_log", tableSource);
        // Convert it into a Table object and print the schema
        Table table = fsTableEnv.scan("tb_order_log");
        table.printSchema();
    }
}
The output is:
root
|-- orderId: STRING
|-- orderNum: INT
|-- skuId: STRING
|-- priceType: STRING
|-- requestTime: STRING
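The example above only prints the schema. As a minimal sketch of how the registered CSV table could then actually be queried (it continues the code above and additionally needs the org.apache.flink.streaming.api.datastream.DataStream import):
// Query the registered CSV table and print the result rows
Table result = fsTableEnv.scan("tb_order_log")
        .select("orderId, orderNum, priceType");
// A plain projection produces an append-only result, so toAppendStream is sufficient
DataStream<Row> resultStream = fsTableEnv.toAppendStream(result, Row.class);
resultStream.print();
fsEnv.execute();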
(2) Convert a DataStream or DataSet into a Table (dynamic).
Here the data is read from a network socket:
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import com.alibaba.fastjson.JSON;
public class TestTableApi {
    public static void main(String[] args) throws Exception {
        EnvironmentSettings fsSettings = EnvironmentSettings.newInstance().useOldPlanner().inStreamingMode().build();
        StreamExecutionEnvironment fsEnv = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment fsTableEnv = StreamTableEnvironment.create(fsEnv, fsSettings);
        // Use netcat to simulate receiving data from a message queue
        SingleOutputStreamOperator<OrderLog> textDataSteam = fsEnv.socketTextStream("127.0.0.1", 8888)
                .map(new OutMapFunction());
        // Register the table. POJO fields become the table columns by default;
        // "field as newField" can be used to rename a column.
        fsTableEnv.registerDataStream("tb_order_log", textDataSteam);
        //fsTableEnv.registerDataStream("tb_order_log", textDataSteam, "orderId as order_id,orderNum,skuId,priceType,requestTime");
        // Convert the registered table into a Table object
        Table table = fsTableEnv.scan("tb_order_log");
        // Or convert the DataStream into a Table directly:
        //Table table = fsTableEnv.fromDataStream(textDataSteam);
        //Table table = fsTableEnv.fromDataStream(textDataSteam, "orderId as order_id,orderNum,skuId,priceType,requestTime");
        // Print the table schema
        table.printSchema();
        fsEnv.execute();
    }
}
/**
 * Map function: parse each JSON line into an OrderLog POJO
 */
class OutMapFunction extends RichMapFunction<String, OrderLog> {
    private static final long serialVersionUID = -6478853684295335571L;
    @Override
    public OrderLog map(String value) throws Exception {
        OrderLog orderLog = JSON.parseObject(value, OrderLog.class);
        return orderLog;
    }
}
The OrderLog POJO is defined as follows:
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;
@AllArgsConstructor
@NoArgsConstructor
@Data
public class OrderLog {
    private String orderId;
    private Integer orderNum;
    private String skuId;
    private String priceType;
    private Long requestTime;
}
The printed schema is:
root
|-- orderId: STRING
|-- orderNum: INT
|-- skuId: STRING
|-- priceType: STRING
|-- requestTime: BIGINT
The table's column names can be changed when the table is created.
Use the overload of registerDataStream or fromDataStream that takes a field list, and rename a column with as, e.g. orderId as order_id.
Change the registration to
fsTableEnv.registerDataStream("tb_order_log", textDataSteam,"orderId as order_id,orderNum,skuId,priceType,requestTime");
or
fsTableEnv.fromDataStream(textDataSteam,"orderId as order_id,orderNum,skuId,priceType,requestTime");
and run again; the printed schema becomes:
root
|-- order_id: STRING
|-- orderNum: INT
|-- skuId: STRING
|-- priceType: STRING
|-- requestTime: BIGINT
Rows can be filtered with where or filter and projected with select. Here where is equivalent to filter: internally, where simply delegates to filter.
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.types.Row;
import com.alibaba.fastjson.JSON;
public class TestTableApi {
    public static void main(String[] args) throws Exception {
        EnvironmentSettings fsSettings = EnvironmentSettings.newInstance().useOldPlanner().inStreamingMode().build();
        StreamExecutionEnvironment fsEnv = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment fsTableEnv = StreamTableEnvironment.create(fsEnv, fsSettings);
        SingleOutputStreamOperator<OrderLog> textDataSteam = fsEnv.socketTextStream("127.0.0.1", 8888)
                .map(new OutMapFunction());
        // Convert the DataStream into a Table (columns are taken from the POJO fields)
        Table table = fsTableEnv.fromDataStream(textDataSteam);
        // Keep only rows with priceType = 'new' and project orderId, orderNum and priceType
        Table filterTable = table.filter("priceType = 'new'").select("'orderId=' + orderId,'orderNum=' + orderNum,'priceType=' + priceType");
        DataStream<Row> appendStream = fsTableEnv.toAppendStream(filterTable, Row.class);
        // Print the table data
        appendStream.print();
        fsEnv.execute();
    }
}
/**
 * Map function: parse each JSON line into an OrderLog POJO
 */
class OutMapFunction extends RichMapFunction<String, OrderLog> {
    private static final long serialVersionUID = -6478853684295335571L;
    @Override
    public OrderLog map(String value) throws Exception {
        OrderLog orderLog = JSON.parseObject(value, OrderLog.class);
        return orderLog;
    }
}
Start netcat locally and feed in the following test input:
LiryZlian@DESKTOP-HLMB1EG MINGW64 ~/Desktop
$ nc -L 127.0.0.1 -p 8888
{"orderId":"20201011231234567","orderNum":3,"priceType":"normal","requestTime":3,"skuId":"1226351"}
{"orderId":"20201011231245423","orderNum":2,"priceType":"new","requestTime":2,"skuId":"1226354"}
{"orderId":"20201011231254678","orderNum":1,"priceType":"normal","requestTime":1,"skuId":"1226322"}
{"orderId":"20201011231212768","orderNum":1,"priceType":"back","requestTime":1,"skuId":"1226324"}
{"orderId":"20201011231234569","orderNum":4,"priceType":"new","requestTime":4,"skuId":"1226352"}
The output is:
3> orderId=20201011231245423,orderNum=2,priceType=new
2> orderId=20201011231234569,orderNum=4,priceType=new
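Since where() delegates to filter(), the filtering above could equally be written with where() and would produce the same two output rows:
// Equivalent to the filter() call in the example above
Table whereTable = table.where("priceType = 'new'")
        .select("'orderId=' + orderId,'orderNum=' + orderNum,'priceType=' + priceType");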
Count the number of orders per price type:
import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.types.Row;
import com.alibaba.fastjson.JSON;
public class TestTableApi {
    public static void main(String[] args) throws Exception {
        EnvironmentSettings fsSettings = EnvironmentSettings.newInstance().useOldPlanner().inStreamingMode().build();
        StreamExecutionEnvironment fsEnv = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment fsTableEnv = StreamTableEnvironment.create(fsEnv, fsSettings);
        SingleOutputStreamOperator<OrderLog> textDataSteam = fsEnv.socketTextStream("127.0.0.1", 8888)
                .map(new OutMapFunction());
        // Convert the DataStream into a Table
        Table table = fsTableEnv.fromDataStream(textDataSteam);
        // Count the orders per price type: group by priceType
        Table filterTable = table.groupBy("priceType").select("priceType, priceType.count as ccount");
        // The aggregate is continuously updated, so a retract stream is needed;
        // keep only the insert/update records (flag == true)
        DataStream<Tuple2<Boolean, Row>> appendStream = fsTableEnv.toRetractStream(filterTable, Row.class).filter(new MyFilterFunction());
        // Print the table data
        appendStream.print();
        fsEnv.execute();
    }
}
class MyFilterFunction implements FilterFunction<Tuple2<Boolean, Row>> {
    private static final long serialVersionUID = -4837029132152942499L;
    @Override
    public boolean filter(Tuple2<Boolean, Row> value) throws Exception {
        return value.f0.booleanValue();
    }
}
/**
 * Map function: parse each JSON line into an OrderLog POJO
 */
class OutMapFunction extends RichMapFunction<String, OrderLog> {
    private static final long serialVersionUID = -6478853684295335571L;
    @Override
    public OrderLog map(String value) throws Exception {
        OrderLog orderLog = JSON.parseObject(value, OrderLog.class);
        return orderLog;
    }
}
With the same input as above, the output is:
2> (true,normal,1)
4> (true,new,1)
2> (true,normal,2)
2> (true,back,1)
4> (true,new,2)
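MyFilterFunction hides the retraction messages. Without it, each update of a running count arrives as a pair of records: a retraction of the old result (flag false) followed by the new result (flag true). A minimal sketch of the unfiltered variant, with illustrative output in the comments (the exact interleaving depends on arrival order):
// e.g. when the second "normal" order arrives:
//   (false, normal,1)   -- retract the previous count
//   (true,  normal,2)   -- emit the updated count
DataStream<Tuple2<Boolean, Row>> retractStream = fsTableEnv.toRetractStream(filterTable, Row.class);
retractStream.print();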
SQL is one of the interfaces Flink provides and it plays a very important role, mainly because SQL's flexible and rich syntax covers most computation scenarios. Under the hood, Flink SQL uses the Apache Calcite framework to parse standard SQL statements and translate them into the underlying operator logic, applying rule-based optimizations such as predicate pushdown during the translation. SQL also shields users from low-level technical details, so Flink applications can be built more conveniently and efficiently. Flink SQL is built on top of the Table API and covers most of its features; the two can also be mixed, and Flink ultimately merges them into a single code path.
There are roughly two usage styles: the first is a mixed style that first converts to a Table via the Table API and then runs SQL on it; the second runs SQL directly against a registered table.
import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.types.Row;
import com.alibaba.fastjson.JSON;
public class TestTableApi {
    public static void main(String[] args) throws Exception {
        EnvironmentSettings fsSettings = EnvironmentSettings.newInstance().useOldPlanner().inStreamingMode().build();
        StreamExecutionEnvironment fsEnv = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment fsTableEnv = StreamTableEnvironment.create(fsEnv, fsSettings);
        SingleOutputStreamOperator<OrderLog> textDataSteam = fsEnv
                .socketTextStream("127.0.0.1", 8888).map(new OutMapFunction());
        // First style: mixed usage -- convert to a Table via the Table API first, then run SQL on it
        //---------------
        // Convert the DataStream into a Table
        Table table = fsTableEnv.fromDataStream(textDataSteam);
        // Calling toString() on a Table registers it under a generated unique name and returns that name,
        // which can then be used as the table name in SQL
        String tableName = table.toString();
        String sql = "select priceType,count(orderId) as ccount from %s where priceType='new' group by priceType";
        // Count orders with the new-customer price type, grouped by priceType
        Table filterTable = fsTableEnv.sqlQuery(String.format(sql, tableName));
        //------------------------------
        // Second style: run SQL directly against a registered table
        // fsTableEnv.registerDataStream("tb_order_log", textDataSteam);
        // String sql = "select priceType,count(orderId) as ccount from tb_order_log where priceType='new' group by priceType";
        // // Count orders with the new-customer price type, grouped by priceType
        // Table filterTable = fsTableEnv.sqlQuery(sql);
        // Keep only the insert/update records (flag == true) of the retract stream
        DataStream<Tuple2<Boolean, Row>> appendStream = fsTableEnv.toRetractStream(filterTable, Row.class)
                .filter(new MyFilterFunction());
        // Print the table data
        appendStream.print();
        fsEnv.execute();
    }
}
class MyFilterFunction implements FilterFunction<Tuple2<Boolean, Row>> {
    private static final long serialVersionUID = -4837029132152942499L;
    @Override
    public boolean filter(Tuple2<Boolean, Row> value) throws Exception {
        return value.f0.booleanValue();
    }
}
/**
 * Map function: parse each JSON line into an OrderLog POJO
 */
class OutMapFunction extends RichMapFunction<String, OrderLog> {
    private static final long serialVersionUID = -6478853684295335571L;
    @Override
    public OrderLog map(String value) throws Exception {
        OrderLog orderLog = JSON.parseObject(value, OrderLog.class);
        return orderLog;
    }
}
With the same input as above, the output is:
4> (true,new,1)
4> (true,new,2)
Flink SQL also supports three window types: TUMBLE (tumbling) windows, HOP (sliding) windows, and SESSION (session) windows. HOP windows correspond to sliding windows in the Table API. Each window type has its own use cases and syntax.
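As a hedged sketch of the SQL window syntax (not part of the original example): if the stream is registered with an extra processing-time attribute, orders per price type could be counted in 10-second tumbling windows as shown below. The table name tb_order_log_win and the proctime field are illustrative assumptions.
// Register the stream with an appended processing-time attribute named "proctime"
fsTableEnv.registerDataStream("tb_order_log_win", textDataSteam,
        "orderId, orderNum, skuId, priceType, requestTime, proctime.proctime");
// Count orders per price type in 10-second tumbling windows
Table windowed = fsTableEnv.sqlQuery(
        "SELECT priceType, COUNT(orderId) AS ccount, " +
        "TUMBLE_END(proctime, INTERVAL '10' SECOND) AS window_end " +
        "FROM tb_order_log_win " +
        "GROUP BY TUMBLE(proctime, INTERVAL '10' SECOND), priceType");
// Window aggregates are append-only, so toAppendStream can be used directly
fsTableEnv.toAppendStream(windowed, Row.class).print();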
Reference: https://ci.apache.org/projects/flink/flink-docs-release-1.9/zh/dev/table/common.html