Flink AthenaX research notes

艾泉

2023-12-01

Uber has an enormous volume of real-time data to analyze, for example traffic conditions and estimated vehicle arrival times.

More than one trillion real-time messages pass through Uber's Kafka, so Uber needs an infrastructure (platform) with the following properties:

(1) easily navigable by all users, regardless of technical expertise;

(2) scalable and efficient enough to analyze real-time events; and

(3) robust enough to continuously support hundreds, if not thousands, of critical jobs.

AthenaX workflow

AthenaX’s workflow follows the steps below:

(1) Users specify a job in SQL and submit it to the AthenaX master.
(2) The AthenaX master validates the query and compiles it down to a Flink job.
(3) The AthenaX master packages, deploys, and executes the job in the YARN cluster. The master also recovers the jobs in the case of a failure.
(4) The job starts processing the data and produces results to external systems (e.g., Kafka).
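As a rough illustration of steps (1) to (3), here is a toy Python model of the master's submit path: validate the SQL, "compile" it, and deploy it to a cluster. Everything here is hypothetical (CompiledJob, ToyYarnCluster, validate, compile_to_plan, and submit are not AthenaX APIs), the validation is a crude prefix check rather than real SQL parsing, and step (4) is not modeled.

```python
# Hypothetical sketch of the master's submit path; none of these names come from AthenaX.
from dataclasses import dataclass, field


@dataclass
class CompiledJob:
    sql: str
    plan: list  # stand-in for a compiled Flink job graph


@dataclass
class ToyYarnCluster:
    """Stand-in for a YARN cluster that just records deployed jobs."""
    running: dict = field(default_factory=dict)

    def deploy(self, job_id: str, job: CompiledJob) -> None:
        self.running[job_id] = job


def validate(sql: str) -> None:
    # Step (2a): reject unsupported statements (a real master parses the SQL).
    if not sql.strip().upper().startswith(("SELECT", "INSERT")):
        raise ValueError(f"unsupported statement: {sql!r}")


def compile_to_plan(sql: str) -> CompiledJob:
    # Step (2b): pretend compilation; a real master would emit a Flink job.
    return CompiledJob(sql=sql, plan=["source", "operator", "sink"])


def submit(sql: str, job_id: str, cluster: ToyYarnCluster) -> CompiledJob:
    # Step (1): the user submits SQL to the master.
    validate(sql)
    job = compile_to_plan(sql)
    # Step (3): deploy onto the cluster (packaging is omitted).
    cluster.deploy(job_id, job)
    return job
```

For example, `submit("SELECT a.a, b.b FROM a JOIN b ON a.id = b.id", "job-1", ToyYarnCluster())` records the job under "job-1", while a statement such as `DROP TABLE a` is rejected before anything is deployed.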

AthenaX supports UDFs (user-defined functions).
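In Flink itself, a scalar UDF is written as a Java or Scala class that extends ScalarFunction and defines an eval method. How AthenaX loads UDFs is not covered in these notes. The sketch below is only a toy Python model of the core idea: a function is registered under a name and then evaluated once per row. The registry, register_udf, apply_udf, and the to_km example are all hypothetical.

```python
# Toy UDF registry; names are hypothetical and not AthenaX or Flink APIs.
from typing import Callable, Dict, List

UDFS: Dict[str, Callable] = {}


def register_udf(name: str):
    """Decorator that registers a Python function under a SQL-visible name."""
    def wrap(fn: Callable) -> Callable:
        UDFS[name.upper()] = fn  # SQL identifiers treated case-insensitively
        return fn
    return wrap


@register_udf("to_km")
def to_km(meters: float) -> float:
    # Example scalar UDF: convert meters to kilometers.
    return meters / 1000.0


def apply_udf(name: str, rows: List[dict], column: str) -> List:
    # Evaluate the UDF once per row, as a scalar function is in a query.
    fn = UDFS[name.upper()]
    return [fn(row[column]) for row in rows]
```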

According to Uber, their experience shows that about 70% of streaming applications can be expressed in SQL.

SELECT a.a, b.b FROM a JOIN b ON a.id = b.id
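To make the semantics of the example query concrete, here is an in-memory inner hash join in Python. It builds an index on b.id and probes it with each row of a. This is a batch sketch over hypothetical rows. A streaming engine running a regular join on unbounded input must also keep state for both sides, and this sketch ignores that.

```python
# Batch sketch of what SELECT a.a, b.b FROM a JOIN b ON a.id = b.id computes.
from collections import defaultdict
from typing import Dict, List


def hash_join(a_rows: List[dict], b_rows: List[dict]) -> List[tuple]:
    """Inner join of a and b on id, projecting (a.a, b.b)."""
    # Build phase: index table b by its join key.
    index: Dict[object, List[dict]] = defaultdict(list)
    for b in b_rows:
        index[b["id"]].append(b)
    # Probe phase: for each row of a, emit one output row per matching b.
    out = []
    for a in a_rows:
        for b in index.get(a["id"], []):
            out.append((a["a"], b["b"]))
    return out
```

Rows of a without a match (and rows of b without a match) produce no output, which is the inner-join behavior the query asks for.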

During the compilation phase, AthenaX minimizes the amount of data involved in joins to improve performance.
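These notes do not say which optimizations AthenaX's compiler applies. Filter (predicate) pushdown is one generic way a query compiler can shrink join input. If a WHERE condition references only one table, it can be applied before the join rather than after it. The sketch below counts the row pairs a nested-loop join compares under each order. The predicate and data are hypothetical, and nothing here claims to describe AthenaX internals.

```python
# Generic predicate-pushdown illustration (not a claim about AthenaX internals).
from typing import Callable, List, Tuple


def join_on_id(a_rows: List[dict], b_rows: List[dict]) -> Tuple[List[tuple], int]:
    """Nested-loop inner join on id; also returns how many row pairs were compared."""
    out, comparisons = [], 0
    for a in a_rows:
        for b in b_rows:
            comparisons += 1
            if a["id"] == b["id"]:
                out.append((a, b))
    return out, comparisons


def filter_after_join(a_rows, b_rows, pred: Callable[[dict], bool]):
    # Naive order: join everything, then filter.
    joined, cost = join_on_id(a_rows, b_rows)
    return [(a, b) for a, b in joined if pred(a)], cost


def filter_before_join(a_rows, b_rows, pred: Callable[[dict], bool]):
    # Pushdown: the predicate only references table a, so apply it first.
    return join_on_id([a for a in a_rows if pred(a)], b_rows)
```

Both orders return the same rows. With 10 rows per table and a predicate that keeps half of a, the pushed-down version compares 50 pairs instead of 100.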

Collecting and processing data faster gives users a better experience.

It also helps derive insights from the data, or link the data to earnings.

Main components: the AthenaX master, catalogs, and connectors.
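As these notes read it, the master compiles and runs jobs, a catalog resolves the table names used in a SQL query to concrete data sources and sinks, and connectors do the actual I/O against systems such as Kafka. The toy catalog below maps table names to connector descriptions. ConnectorSpec, ToyCatalog, and the topic names are all hypothetical and are not AthenaX's catalog interface.

```python
# Hypothetical catalog mapping SQL table names to connector configs.
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConnectorSpec:
    kind: str   # e.g. "kafka"
    topic: str  # hypothetical topic name
    role: str   # "source" or "sink"


class ToyCatalog:
    def __init__(self, tables: Dict[str, ConnectorSpec]):
        # Store keys lower-cased so lookups are case-insensitive.
        self._tables = {k.lower(): v for k, v in tables.items()}

    def resolve(self, table: str) -> ConnectorSpec:
        # An unknown table would surface as a validation error at submit time.
        try:
            return self._tables[table.lower()]
        except KeyError:
            raise LookupError(f"table not found in catalog: {table}") from None
```

For example, a catalog built with `{"trips": ConnectorSpec("kafka", "trips-events", "source")}` resolves `TRIPS` in a query to the trips-events topic.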

Design concepts: job, catalog, cluster, and instance (a running Flink job).
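The relationship between job, cluster, and instance can be sketched as a small data model. A job is the user's SQL definition, and an instance is that job actually running as a Flink application on a cluster. The catalog concept is left out here. The dataclasses, their fields, and the assumption of one instance per (job, cluster) pair are hypothetical and meant only to illustrate the concepts.

```python
# Hypothetical data model for job, cluster, and instance (catalog omitted).
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Job:
    job_id: str
    sql: str
    clusters: List[str]  # clusters the job should run on (assumed field)


@dataclass
class Instance:
    job_id: str
    cluster: str
    state: str = "RUNNING"  # a running Flink application realizing the job


@dataclass
class Deployment:
    instances: Dict[Tuple[str, str], Instance] = field(default_factory=dict)

    def realize(self, job: Job) -> List[Instance]:
        # Assumption: one instance per (job, cluster) pair.
        created = []
        for c in job.clusters:
            inst = Instance(job.job_id, c)
            self.instances[(job.job_id, c)] = inst
            created.append(inst)
        return created
```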

A watchdog monitors running jobs and recovers them when they fail.
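A watchdog of this kind is commonly written as a reconciliation loop. On each pass it compares the jobs that should be running with the instance states it observes, and it restarts anything that is missing or failed. The function below is a single hypothetical pass. The state strings, the restart callback, and the function name are assumptions, not AthenaX's actual watchdog code.

```python
# Hypothetical single watchdog pass: restart jobs that are missing or failed.
from typing import Callable, Dict, List


def watchdog_pass(
    desired: List[str],
    observed: Dict[str, str],
    restart: Callable[[str], None],
) -> List[str]:
    """Return the job ids that were restarted during this pass."""
    restarted = []
    for job_id in desired:
        state = observed.get(job_id)  # None means no instance is running
        if state is None or state == "FAILED":
            restart(job_id)
            restarted.append(job_id)
    return restarted
```

In practice such a pass would run periodically, for example on a timer, with `restart` redeploying the job onto its cluster.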

Official documentation: https://athenax.readthedocs.io/en/latest/
