CarbonData 时间序列 DataMap
时间序列 DataMap 介绍 (Alpha feature in 1.3.0)
Timeseries DataMap 是基于 'preaggregate' DataMap 实现的预聚合表。区别在于时间序列 DataMap 内置了对时间层次(time hierarchy)和级别的理解:年、月、日、时、分,以便它支持将查询中时间维度进行自动 roll-up。
数据加载、查询、压缩命令以及行为和预聚合 DataMap 一致,请参考 预聚合 DataMap 了解更多的信息。
要使用这个 datamap,用户可以在主表上创建多个时间序列 datamap,其中主表上有一个 event_time 的列,一个 datamap 对应一个时间粒度。然后 Carbondata 可以为主表上的查询进行自动汇总(automatic roll-up)。
例如,以下语句在名为 timeseries 的主表上有效地创建了多个预聚合表
CREATE DATAMAP agg_year
ON TABLE sales
USING "timeseries"
DMPROPERTIES (
'event_time'='order_time',
'year_granularity'='1',
) AS
SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),
avg(price) FROM sales GROUP BY order_time, country, sex
CREATE DATAMAP agg_month
ON TABLE sales
USING "timeseries"
DMPROPERTIES (
'event_time'='order_time',
'month_granularity'='1',
) AS
SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),
avg(price) FROM sales GROUP BY order_time, country, sex
CREATE DATAMAP agg_day
ON TABLE sales
USING "timeseries"
DMPROPERTIES (
'event_time'='order_time',
'day_granularity'='1',
) AS
SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),
avg(price) FROM sales GROUP BY order_time, country, sex
CREATE DATAMAP agg_sales_hour
ON TABLE sales
USING "timeseries"
DMPROPERTIES (
'event_time'='order_time',
'hour_granularity'='1',
) AS
SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),
avg(price) FROM sales GROUP BY order_time, country, sex
CREATE DATAMAP agg_minute
ON TABLE sales
USING "timeseries"
DMPROPERTIES (
'event_time'='order_time',
'minute_granularity'='1',
) AS
SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),
avg(price) FROM sales GROUP BY order_time, country, sex
CREATE DATAMAP agg_minute
ON TABLE sales
USING "timeseries"
DMPROPERTIES (
'event_time'='order_time',
'minute_granularity'='1',
) AS
SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),
avg(price) FROM sales GROUP BY order_time, country, sex
为了查询时间序列数据,Carbondata 内置支持时间相关的 UDF,从而能够自动滚动到所需的聚合级别
timeseries(timeseries column name, 'aggregation level')
SELECT timeseries(order_time, 'hour'), sum(quantity) FROM sales GROUP BY timeseries(order_time,
'hour')
除非查询需要,否则不必为每个粒度创建预聚合表。 Carbondata 可以汇总数据并获取它。
比如: 对于主表 sales , 如果以下时间序列 datamaps 是日级和小时级的预先聚合表
CREATE DATAMAP agg_day
ON TABLE sales
USING "timeseries"
DMPROPERTIES (
'event_time'='order_time',
'day_granularity'='1',
) AS
SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),
avg(price) FROM sales GROUP BY order_time, country, sex
CREATE DATAMAP agg_sales_hour
ON TABLE sales
USING "timeseries"
DMPROPERTIES (
'event_time'='order_time',
'hour_granularity'='1',
) AS
SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),
avg(price) FROM sales GROUP BY order_time, country, sex
下面这样的查询将被汇总并命中时间序列 datamaps
Select timeseries(order_time, 'month'), sum(quantity) from sales group by timeseries(order_time,
'month')
Select timeseries(order_time, 'year'), sum(quantity) from sales group by timeseries(order_time,
'year')
注意 (限制):
仅支持层次级别为 1。 未来的 CarbonData 将支持其他层级。
需要一个接着一个地创建所需级别的时间序列 datamap
每个级别时间序列 datamap 需要分别进行删除
压缩时间序列 datamp
请参见预聚合 datamap 的压缩章节。同样适用于时间序列 datamap。
时间序列 datamap 上的数据管理
请参见预聚合 datamap 的数据管理章节。同样适用于时间序列 datamap。