
Introducing Spring XD

薄腾
2023-12-01

Engineering
Mark Fisher
April 23, 2013

Today we are officially kicking off a new initiative called Spring XD whose theme is “tackling Big Data complexity”[1].

The Spring Data team has been incredibly busy over the past few years, not only providing support for NoSQL datastores but also simplifying the development experience with Hadoop. With the creation of the Spring for Apache Hadoop project, we made it easier to get started developing Hadoop applications by providing a rich configuration model and a consistent programming model across Hadoop ecosystem projects such as Hive and Pig. As Spring users would expect, one can:

Configure and run MapReduce jobs as container-managed objects.

Use template helper classes for HDFS, HBase, Pig and Hive to remove boilerplate code from your applications.
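The "template helper" idea follows the same callback style as Spring's JdbcTemplate: the template owns resource acquisition, cleanup and exception translation, and the application supplies only the part of the work that varies. The following is a minimal plain-Java sketch of that pattern under stated assumptions. It is not the Spring for Apache Hadoop API, and `SessionTemplate`, `SessionCallback` and `FakeSession` are hypothetical names chosen for illustration, with `FakeSession` standing in for an HDFS, HBase, Pig or Hive session.

```java
import java.util.ArrayList;
import java.util.List;

class TemplateDemo {
    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        SessionTemplate template = new SessionTemplate(log);
        // The application code is reduced to the callback body.
        String result = template.execute(session -> session.run("SHOW TABLES"));
        System.out.println(result); // ok:SHOW TABLES
        System.out.println(log);    // [open, run:SHOW TABLES, close]
    }
}

// Hypothetical callback type: the only part of the work the caller writes.
interface SessionCallback<T> {
    T doInSession(FakeSession session) throws Exception;
}

// Hypothetical stand-in for an HDFS, HBase, Pig or Hive session; records its lifecycle.
class FakeSession implements AutoCloseable {
    private final List<String> log;

    FakeSession(List<String> log) {
        this.log = log;
        log.add("open");
    }

    String run(String command) {
        log.add("run:" + command);
        return "ok:" + command;
    }

    @Override
    public void close() {
        log.add("close");
    }
}

// Hypothetical template: owns opening, closing and exception handling (the boilerplate).
class SessionTemplate {
    private final List<String> log;

    SessionTemplate(List<String> log) {
        this.log = log;
    }

    <T> T execute(SessionCallback<T> callback) {
        // try-with-resources guarantees close() runs even if the callback fails.
        try (FakeSession session = new FakeSession(log)) {
            return callback.doInSession(session);
        } catch (Exception e) {
            // Wrap any failure in an unchecked exception so callers need no try/catch.
            throw new IllegalStateException("session operation failed", e);
        }
    }
}
```

The design choice is that every caller gets correct cleanup for free: whether the callback returns normally or throws, the log always ends with `close`.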

Spring for Apache Hadoop provides a strong foundation for building Hadoop applications. Spring XD builds upon these foundational assets and further simplifies the process of creating real-world big data solutions. Specifically, Spring XD addresses common big data use-cases such as:

High throughput distributed data ingestion into HDFS from a variety of input sources.
Real-time analytics at ingestion time, e.g. gathering metrics and counting values.
Hadoop workflow management via batch jobs that combine interactions with standard enterprise systems (e.g. RDBMS) as well as Hadoop operations (e.g. MapReduce, HDFS, Pig, Hive or Cascading).
High throughput data export, e.g. from HDFS to a RDBMS or NoSQL database.
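The second use-case, analytics computed at ingestion time, can be pictured as a "tap" on the ingestion path: each record is written to its destination unchanged, and a side computation updates counters as the record passes through. The sketch below is plain Java and does not reflect any Spring XD API; `IngestionPipeline` and `CountingTap` are hypothetical, and an in-memory `List` stands in for an HDFS sink.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;

class IngestDemo {
    public static void main(String[] args) {
        List<String> sinkStandIn = new ArrayList<>(); // hypothetical stand-in for an HDFS sink
        // Count records by their first comma-separated field (e.g. the HTTP method of a log line).
        CountingTap<String> byMethod = new CountingTap<String>(line -> line.split(",")[0]);
        IngestionPipeline<String> pipeline = new IngestionPipeline<String>(sinkStandIn::add).addTap(byMethod);

        for (String record : List.of("GET,/a", "POST,/b", "GET,/c")) {
            pipeline.ingest(record);
        }
        System.out.println(sinkStandIn.size());     // 3
        System.out.println(byMethod.count("GET"));  // 2
        System.out.println(byMethod.count("POST")); // 1
    }
}

// Hypothetical pipeline: every record reaches the sink unchanged, and each tap also observes it.
class IngestionPipeline<T> {
    private final Consumer<T> sink;
    private final List<Consumer<T>> taps = new ArrayList<>();

    IngestionPipeline(Consumer<T> sink) {
        this.sink = sink;
    }

    IngestionPipeline<T> addTap(Consumer<T> tap) {
        taps.add(tap);
        return this;
    }

    void ingest(T record) {
        sink.accept(record);
        for (Consumer<T> tap : taps) {
            tap.accept(record);
        }
    }
}

// Hypothetical tap: derives a key from each record and keeps a running count per key.
class CountingTap<T> implements Consumer<T> {
    private final Function<T, String> keyOf;
    private final Map<String, Long> counts = new HashMap<>();

    CountingTap(Function<T, String> keyOf) {
        this.keyOf = keyOf;
    }

    @Override
    public void accept(T record) {
        counts.merge(keyOf.apply(record), 1L, Long::sum);
    }

    long count(String key) {
        return counts.getOrDefault(key, 0L);
    }
}
```

Keeping the counting in a tap rather than in the sink means analytics can be added or removed without touching the write path.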

The Spring Data book covers several of these use-cases, and the sample code for that book is available in our GitHub repository. Those examples are built upon Spring Batch and Spring Integration in addition to the Spring for Apache Hadoop project.

When it comes to managing event-driven data ingestion streams, Spring Integration provides a proven model, inspired by the well-established Enterprise Integration Patterns. Likewise, Spring Batch is a powerful solution for managing workflows, with robust support for the most important requirements such as job state management and retry/restart capabilities, and is the basis for JSR-352.

Extending the frameworks to support Big Data use-cases started with the book examples, but with Spring XD we aim to take that support to another level. First, we will provide a consistent model that spans the four use-case categories listed above. That model will be immediately familiar to those with Spring experience. Second, as Spring XD evolves we will be moving well beyond the API layer to provide an out-of-the-box executable server, a pluggable module system, a simple model for distributing data collection instances on or off the Hadoop cluster, and more.

If this sounds interesting to you, get involved! You can fork the repository and/or monitor JIRA. It’s practically a clean slate now, but we wanted to make sure that our community members had a chance to get in on the ground floor. As always, we consider the feedback from our broad and passionate community to be our greatest asset. We have been doing a lot of prototyping over the past year, so you will see some code drops soon. Also, we plan to post blogs after each sprint so that you can follow along with the progress. And, if you haven’t yet registered for SpringOne, please do; Spring XD will be featured prominently.

Finally, be sure to sign up for our live streaming event tomorrow (April 24th): Pivotal: A New Platform for a New Era.

[1] XD = eXtreme Data, or ‘x’ as in y = mx + b.

