Uber has a huge volume of real-time data to analyze (e.g., traffic analysis and estimating vehicle arrival times).
More than one trillion real-time messages flow through Uber's Kafka, so Uber needs an infrastructure (platform) with the following properties:
(1) easily navigable by all users, regardless of technical expertise;
(2) scalable and efficient enough to analyze real-time events; and
(3) robust enough to continuously support hundreds, if not thousands, of critical jobs.
AthenaX’s workflow follows the steps below:
Supports UDFs (user-defined functions).
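These notes don't describe how AthenaX registers UDFs. As a generic illustration of the idea, the sketch below uses Python's built-in `sqlite3` module (not AthenaX or Flink) to register a Python function and then call it from SQL. The function `to_km` and the `trips` table are hypothetical.

```python
import sqlite3

def to_km(miles: float) -> float:
    """Hypothetical UDF: convert a distance in miles to kilometres."""
    return round(miles * 1.609344, 3)

conn = sqlite3.connect(":memory:")
# Register the Python function so SQL can call it as to_km(x) with one argument.
conn.create_function("to_km", 1, to_km)

conn.execute("CREATE TABLE trips (id INTEGER, miles REAL)")
conn.executemany("INSERT INTO trips VALUES (?, ?)", [(1, 2.0), (2, 5.5)])

# The UDF is used inside the query like any built-in function.
rows = conn.execute("SELECT id, to_km(miles) FROM trips ORDER BY id").fetchall()
print(rows)
```

The point is that the SQL stays declarative while the custom logic lives in ordinary code. A streaming engine would apply the same function to each incoming event rather than to a stored table.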
Uber reports that, in its experience, 70% of stream processing can be expressed in SQL.
SELECT a.a, b.b FROM a JOIN b ON a.id = b.id
During the compilation stage, AthenaX minimizes the amount of data that goes into joins to improve performance.
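The notes don't say exactly how AthenaX shrinks join inputs. One common technique in query compilers is projection pushdown, which drops columns the query never reads *before* the join so that less data is shuffled and hashed. The Python sketch below applies that idea by hand to the example query above. It is a generic illustration, not AthenaX's compiler, and the sample rows and the unused `note` column are made up.

```python
from collections import defaultdict

# Hypothetical rows for tables a and b; "note" is a column the query never reads.
A = [{"id": 1, "a": "x", "note": "unused"}, {"id": 2, "a": "y", "note": "unused"}]
B = [{"id": 1, "b": 10, "note": "unused"}, {"id": 3, "b": 30, "note": "unused"}]

def project(rows, cols):
    """Keep only the columns the query needs (projection pushdown)."""
    return [{c: r[c] for c in cols} for r in rows]

def hash_join(left, right, key):
    """Inner equi-join: build a hash table on the right input, probe with the left."""
    index = defaultdict(list)
    for r in right:
        index[r[key]].append(r)
    out = []
    for l in left:
        for r in index.get(l[key], []):  # .get avoids inserting missing keys
            out.append({**l, **r})
    return out

# SELECT a.a, b.b FROM a JOIN b ON a.id = b.id
left = project(A, ["id", "a"])   # narrow a before the join
right = project(B, ["id", "b"])  # narrow b before the join
result = [(row["a"], row["b"]) for row in hash_join(left, right, "id")]
print(result)  # [('x', 10)]
```

Projecting first does not change the join result; it only reduces the size of each row that the join has to buffer and compare.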
Collect and process data faster, giving users a better experience.
Get insights from the data, or link the data to earnings.
Main components: AthenaX master | catalogs | connectors
Design roles: job | catalog | cluster | instance (a Flink job)
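To make the four roles concrete, here is a hypothetical Python data model. All class and field names are assumptions made for illustration, not AthenaX's real classes. The only relationship taken from the notes is that an instance is a Flink job. Modelling that instance as "a job deployed on a cluster" is also an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict

class State(Enum):
    RUNNING = "running"
    FAILED = "failed"

@dataclass
class Catalog:
    """Hypothetical: maps table names to the connector that reads/writes them."""
    tables: Dict[str, str] = field(default_factory=dict)

@dataclass
class Job:
    """Hypothetical: a named streaming SQL query."""
    name: str
    sql: str

@dataclass
class Cluster:
    """Hypothetical: a compute cluster that can host running jobs."""
    name: str

@dataclass
class Instance:
    """Hypothetical: one running Flink job, i.e. a job deployed on a cluster."""
    job: Job
    cluster: Cluster
    state: State = State.RUNNING

catalog = Catalog(tables={"a": "kafka", "b": "kafka"})
job = Job(name="join-ab", sql="SELECT a.a, b.b FROM a JOIN b ON a.id = b.id")
inst = Instance(job=job, cluster=Cluster(name="cluster-1"))
print(inst.state)  # State.RUNNING
```

Keeping the job (what to run) separate from the instance (where and whether it is running) means the same SQL job can be redeployed to another cluster without redefining it.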
A watchdog handles monitoring and failure recovery.
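A minimal sketch of the watchdog's monitor-and-recover loop follows. It performs one monitoring pass, probing every tracked instance and restarting any that are unhealthy. The `Instance` class, the health probe, and `fake_restart` are hypothetical stand-ins. A real watchdog would query the cluster for job status and redeploy the Flink job, and it would run this pass repeatedly on a timer.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Instance:
    """Hypothetical running job instance tracked by the watchdog."""
    job_name: str
    healthy: bool = True
    restarts: int = 0

def watchdog_pass(instances: List[Instance],
                  is_healthy: Callable[[Instance], bool],
                  restart: Callable[[Instance], None]) -> List[str]:
    """One monitoring pass: probe every instance and restart the unhealthy ones.

    Returns the names of the jobs restarted in this pass.
    """
    restarted = []
    for inst in instances:
        if not is_healthy(inst):
            restart(inst)
            restarted.append(inst.job_name)
    return restarted

def fake_restart(inst: Instance) -> None:
    # Stand-in for redeploying the Flink job: mark it healthy and count the restart.
    inst.healthy = True
    inst.restarts += 1

fleet = [Instance("eta"), Instance("traffic", healthy=False)]
print(watchdog_pass(fleet, lambda i: i.healthy, fake_restart))  # ['traffic']
print(watchdog_pass(fleet, lambda i: i.healthy, fake_restart))  # []
```

Passing the probe and restart actions in as functions keeps the loop independent of any particular cluster API, which also makes it easy to test.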