
azure-event-hubs-spark

License: Apache-2.0
Language: Java
Category: Cloud Computing / Cloud Native
Type: Open source software
Operating system: Cross-platform
Overview

Azure Event Hubs Connector for Apache Spark


This is the source code of the Azure Event Hubs Connector for Apache Spark.

Azure Event Hubs is a highly scalable publish-subscribe service that can ingest millions of events per second and stream them into multiple applications. Spark Streaming and Structured Streaming are scalable and fault-tolerant stream processing engines that allow users to process huge amounts of data using complex algorithms expressed with high-level functions like map, reduce, join, and window. This data can then be pushed to filesystems, databases, or even back to Event Hubs.

By making Event Hubs and Spark easier to use together, we hope this connector makes building scalable, fault-tolerant applications easier for our users.
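As an illustration of that pairing, here is a minimal sketch of reading an Event Hubs stream with Structured Streaming and echoing event bodies to the console. The connection-string values are placeholders, and the exact `EventHubsConf` details should be checked against the integration guides; treat this as an assumption-laden sketch rather than official sample code.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.eventhubs.{ConnectionStringBuilder, EventHubsConf}

object EventHubsReadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("eventhubs-sketch").getOrCreate()

    // Placeholder connection string -- substitute your own namespace and key.
    val connectionString = ConnectionStringBuilder(
      "Endpoint=sb://<namespace>.servicebus.windows.net/;" +
      "SharedAccessKeyName=<keyName>;SharedAccessKey=<key>;" +
      "EntityPath=<eventHubName>"
    ).build

    val ehConf = EventHubsConf(connectionString)

    // Read the stream from Event Hubs.
    val events = spark.readStream
      .format("eventhubs")
      .options(ehConf.toMap)
      .load()

    // The "body" column arrives as binary; cast it to a string.
    val bodies = events.selectExpr("CAST(body AS STRING)")

    // Echo to the console; the same DataFrame could instead be written
    // to files, a database, or back to Event Hubs.
    bodies.writeStream
      .outputMode("append")
      .format("console")
      .start()
      .awaitTermination()
  }
}
```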

Latest Releases

Spark

Spark Version Package Name Package Version
Spark 3.0 azure-eventhubs-spark_2.12
Spark 2.4 azure-eventhubs-spark_2.11
Spark 2.4 azure-eventhubs-spark_2.12
Spark 2.3 azure-eventhubs-spark_2.11
Spark 2.2 azure-eventhubs-spark_2.11
Spark 2.1 azure-eventhubs-spark_2.11

Databricks

Databricks Runtime Version Artifact Id Package Version
Databricks Runtime 8.X azure-eventhubs-spark_2.12
Databricks Runtime 7.X azure-eventhubs-spark_2.12
Databricks Runtime 6.X azure-eventhubs-spark_2.11
Databricks Runtime 5.X azure-eventhubs-spark_2.11

Roadmap

There is an open issue for each planned feature/enhancement.

FAQ

We maintain an FAQ - reach out to us via gitter if you think anything needs to be added or clarified!

Usage

Linking

For Scala/Java applications using SBT/Maven project definitions, link your application with the artifact below. Note: See Latest Releases to find the correct artifact for your version of Apache Spark (or Databricks)!

groupId = com.microsoft.azure
artifactId = azure-eventhubs-spark_2.11
version = 2.3.21

or

groupId = com.microsoft.azure
artifactId = azure-eventhubs-spark_2.12
version = 2.3.21
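For example, the second set of coordinates above would appear in a Maven POM as follows (a config sketch; pick the artifact matching your Scala version per Latest Releases):

```xml
<dependency>
  <groupId>com.microsoft.azure</groupId>
  <artifactId>azure-eventhubs-spark_2.12</artifactId>
  <version>2.3.21</version>
</dependency>
```

SBT users can let the build tool append the Scala suffix automatically with `"com.microsoft.azure" %% "azure-eventhubs-spark" % "2.3.21"`.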

Documentation

Documentation for our connector can be found here. The integration guides there contain all the information you need to use this library.

If you're new to Apache Spark and/or Event Hubs, then we highly recommend reading their documentation first. You can read the Event Hubs documentation here, the documentation for Spark Streaming here, and, last but not least, Structured Streaming here.

Further Assistance

If you need additional assistance, please don't hesitate to ask! General questions and discussion should happen on our gitter chat. Please open an issue for bug reports and feature requests! Feedback, feature requests, bug reports, etc. are all welcome!

Contributing

If you'd like to help contribute (we'd love to have your help!), then go to our Contributor's Guide for more information.

Build Prerequisites

In order to use the connector, you need to have:

More details on building from source and running tests can be found in our Contributor's Guide.

Build Command

# Builds jar and runs all tests
mvn clean package

# Builds jar, runs all tests, and installs jar to your local maven repository
mvn clean install