carbondate mysql_Apache CarbonData 1.3.1 中文文档

董洲
2023-12-01

CarbonData 时间序列 DataMap

时间序列 DataMap 介绍 (Alpha feature in 1.3.0)

Timeseries DataMap 是基于 'preaggregate' DataMap 实现的预聚合表。区别在于时间序列 DataMap 内置了对时间层次(time hierarchy)和级别的理解:年、月、日、时、分,以便它支持将查询中时间维度进行自动 roll-up。

数据加载、查询、压缩命令以及行为和预聚合 DataMap 一致,请参考 预聚合 DataMap 了解更多的信息。

要使用这个 datamap,用户可以在主表上创建多个时间序列 datamap,其中主表上有一个 event_time 的列,一个 datamap 对应一个时间粒度。然后 Carbondata 可以为主表上的查询进行自动汇总(automatic roll-up)。

例如,以下语句在名为 timeseries 的主表上有效地创建了多个预聚合表

CREATE DATAMAP agg_year

ON TABLE sales

USING "timeseries"

DMPROPERTIES (

'event_time'='order_time',

'year_granularity'='1',

) AS

SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),

avg(price) FROM sales GROUP BY order_time, country, sex

CREATE DATAMAP agg_month

ON TABLE sales

USING "timeseries"

DMPROPERTIES (

'event_time'='order_time',

'month_granularity'='1',

) AS

SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),

avg(price) FROM sales GROUP BY order_time, country, sex

CREATE DATAMAP agg_day

ON TABLE sales

USING "timeseries"

DMPROPERTIES (

'event_time'='order_time',

'day_granularity'='1',

) AS

SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),

avg(price) FROM sales GROUP BY order_time, country, sex

CREATE DATAMAP agg_sales_hour

ON TABLE sales

USING "timeseries"

DMPROPERTIES (

'event_time'='order_time',

'hour_granularity'='1',

) AS

SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),

avg(price) FROM sales GROUP BY order_time, country, sex

CREATE DATAMAP agg_minute

ON TABLE sales

USING "timeseries"

DMPROPERTIES (

'event_time'='order_time',

'minute_granularity'='1',

) AS

SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),

avg(price) FROM sales GROUP BY order_time, country, sex

CREATE DATAMAP agg_minute

ON TABLE sales

USING "timeseries"

DMPROPERTIES (

'event_time'='order_time',

'minute_granularity'='1',

) AS

SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),

avg(price) FROM sales GROUP BY order_time, country, sex

为了查询时间序列数据,Carbondata 内置支持时间相关的 UDF,从而能够自动滚动到所需的聚合级别

timeseries(timeseries column name, 'aggregation level')

SELECT timeseries(order_time, 'hour'), sum(quantity) FROM sales GROUP BY timeseries(order_time,

'hour')

除非查询需要,否则不必为每个粒度创建预聚合表。 Carbondata 可以汇总数据并获取它。

比如: 对于主表 sales , 如果以下时间序列 datamaps 是日级和小时级的预先聚合表

CREATE DATAMAP agg_day

ON TABLE sales

USING "timeseries"

DMPROPERTIES (

'event_time'='order_time',

'day_granularity'='1',

) AS

SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),

avg(price) FROM sales GROUP BY order_time, country, sex

CREATE DATAMAP agg_sales_hour

ON TABLE sales

USING "timeseries"

DMPROPERTIES (

'event_time'='order_time',

'hour_granularity'='1',

) AS

SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),

avg(price) FROM sales GROUP BY order_time, country, sex

下面这样的查询将被汇总并命中时间序列 datamaps

Select timeseries(order_time, 'month'), sum(quantity) from sales group by timeseries(order_time,

'month')

Select timeseries(order_time, 'year'), sum(quantity) from sales group by timeseries(order_time,

'year')

注意 (限制):

仅支持层次级别为 1。 未来的 CarbonData 将支持其他层级。

需要一个接着一个地创建所需级别的时间序列 datamap

每个级别时间序列 datamap 需要分别进行删除

压缩时间序列 datamp

请参见预聚合 datamap 的压缩章节。同样适用于时间序列 datamap。

时间序列 datamap 上的数据管理

请参见预聚合 datamap 的数据管理章节。同样适用于时间序列 datamap。

 类似资料: