Paper Reading Notes [ Benchmarking Streaming Computation Engines: Storm, Flink and Spark Streaming ]

拓拔弘厚

2023-12-01

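Introduction: A benchmark of several stream processing engines published by Yahoo, covering Storm, Flink, and Spark Streaming.

Motivation: To stay close to a production environment, the authors designed and implemented a real-world streaming benchmark, with Kafka and Redis handling data ingestion and storage.

Some test results and conclusions from the paper: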

Original: The results demonstrate that at fairly high throughput, Storm and Flink have much lower latency than Spark Streaming (whose latency is proportional to throughput rate). On the other hand, Spark Streaming is able to handle higher maximum throughput rate while its performance is quite sensitive to the batch duration setting.

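Storm and Flink have lower latency and are closer to true "real-time" stream processing systems.
Spark Streaming can sustain a higher maximum throughput, but it also has the highest latency. Its performance is sensitive to the batch duration setting.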
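To make the batch duration point concrete, here is a rough toy model, not taken from the paper. It assumes a fixed per-event processing cost (`process_ms`, a made-up parameter) and ignores scheduling delay, queuing, and everything else a real engine does. The only thing it shows is that in a micro-batch design an event has to wait for its batch interval to close, so the average latency grows with the batch duration.

```python
def mean_latency_per_event(arrivals_ms, process_ms):
    """Toy model of per-event processing: each event is handled on arrival."""
    return sum(process_ms for _ in arrivals_ms) / len(arrivals_ms)


def mean_latency_micro_batch(arrivals_ms, batch_ms, process_ms):
    """Toy model of micro-batching: an event waits until its batch interval closes."""
    total = 0
    for t in arrivals_ms:
        # The batch containing t closes at the next multiple of batch_ms.
        batch_end = (t // batch_ms + 1) * batch_ms
        total += (batch_end - t) + process_ms
    return total / len(arrivals_ms)


# Hypothetical workload: one event every 10 ms for 10 seconds.
arrivals = list(range(0, 10_000, 10))

print("per-event:       ", mean_latency_per_event(arrivals, process_ms=50))
print("batch of 2 s:    ", mean_latency_micro_batch(arrivals, batch_ms=2_000, process_ms=50))
print("batch of 10 s:   ", mean_latency_micro_batch(arrivals, batch_ms=10_000, process_ms=50))
```

In this model the waiting time averages about half the batch duration, which is one simple way to see why the batch duration setting matters so much for latency.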

Original: at high-throughput both versions of Storm struggled.

Storm is not well suited to high throughput, at least in the versions tested.

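Original: Storm’s acking functionality as of version 0.11.0 incurs enough overhead to be a limitation at very high throughputs, and while processing guarantees require acking, flow control could be achieved via backpressure instead.

Storm's acking functionality incurs a lot of overhead, which limits it at high throughput. Processing guarantees require acking, but flow control could be achieved via backpressure instead. (I don't really understand what this second sentence means...)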
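One possible reading of that second sentence, as far as I understand Storm: acking does two jobs. It provides the processing guarantee (failed tuples can be detected and replayed), and it also acts as flow control, because `topology.max.spout.pending` caps the number of un-acked tuples in flight. The authors seem to be saying that the flow control job does not need acking: backpressure can slow the source down on its own. The sketch below is only an illustration of that idea with a bounded in-memory queue, not Storm code; `run_pipeline` and its parameters are hypothetical names.

```python
import queue
import threading


def run_pipeline(num_tuples, capacity):
    """Producer and consumer linked by a bounded queue, with no per-tuple acks."""
    buf = queue.Queue(maxsize=capacity)
    processed = []
    max_depth = 0

    def consumer():
        while True:
            item = buf.get()
            if item is None:  # sentinel: the producer is done
                break
            processed.append(item)

    worker = threading.Thread(target=consumer)
    worker.start()

    for i in range(num_tuples):
        # put() blocks while the queue is full. That blocking is the
        # backpressure: the producer is throttled to the consumer's pace
        # without the consumer ever sending an acknowledgement back.
        buf.put(i)
        max_depth = max(max_depth, buf.qsize())

    buf.put(None)
    worker.join()
    return processed, max_depth


processed, max_depth = run_pipeline(num_tuples=1_000, capacity=8)
print(len(processed), "tuples processed; queue depth never exceeded", max_depth)
```

Note what this does not give you: if the consumer crashed mid-tuple, nothing would replay that tuple. That matches the quote, where the guarantee still requires acking and only the flow control can move to backpressure.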
