Spring Batch & Spring Cloud Task & Spring Cloud Data Flow & Spring Cloud Deployer

琴刚豪

2023-12-01

While going through my old notes, I found some material I had never published, so I am posting it bit by bit...

Partitioner
StepExecutionSplitter
PartitionHandler
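
These three types work together in a partitioned step: a Partitioner splits the work into named execution contexts, a StepExecutionSplitter turns those into StepExecutions, and a PartitionHandler dispatches them to workers. As a rough illustration of the first part only, here is a plain-Java sketch of range partitioning. It does not use Spring Batch: the class RangePartitionSketch, the minId/maxId keys, and the use of plain maps in place of ExecutionContext are all assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class RangePartitionSketch {

    // Mirrors the shape of Partitioner#partition(int gridSize), but returns plain maps
    // instead of Spring Batch ExecutionContext objects (an assumption of this sketch).
    static Map<String, Map<String, Long>> partition(long minId, long maxId, int gridSize) {
        if (gridSize <= 0 || maxId < minId) {
            throw new IllegalArgumentException("gridSize must be positive and maxId >= minId");
        }
        Map<String, Map<String, Long>> result = new LinkedHashMap<>();
        long total = maxId - minId + 1;
        long size = (total + gridSize - 1) / gridSize; // ceiling division
        long start = minId;
        int index = 0;
        while (start <= maxId) {
            long end = Math.min(start + size - 1, maxId);
            Map<String, Long> context = new LinkedHashMap<>();
            context.put("minId", start); // hypothetical key names
            context.put("maxId", end);
            result.put("partition" + index, context);
            start = end + 1;
            index++;
        }
        return result;
    }

    public static void main(String[] args) {
        // Splits ids 1..10 into at most 3 contiguous ranges.
        System.out.println(partition(1, 10, 3));
    }
}
```

Each worker step would then read only the id range found in its own context.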

Configuration

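JobFactory: creates Job objects; the Job's name can also be obtained from it
JobLocator: looks up a Job by name; see also ListableJobLocator and JobRegistry
StepRegistry: registry of Steps
JobRepository: the job repository, which persists job metadata
JobLauncher: launches jobs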

Configurer

BatchConfigurer

concept

org.springframework.batch.core

Job
JobInstance
JobExecution: a JobInstance may run more than once because of runtime errors; each run is one JobExecution object
JobParameter
Step
StepExecution
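JobLauncher/JobExplorer/JobOperator: interfaces related to job execution. JobOperator = JobLauncher + JobExplorer. JobOperator is suited to command-line or JMX use.
Writing a custom Tasklet lets you reuse enterprise services that the developer already has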

Chunk listener (interceptor) interfaces

ChunkListener
ItemProcessListener
ItemReadListener
ItemWriteListener
SkipListener
RetryListener

ItemReader

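A reader that implements only the ItemReader interface does not support restart. To support restart, a custom ItemReader must also implement the ItemStream interface.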
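
To make the restart idea concrete, here is a plain-Java sketch of what ItemStream adds: the reader restores its position when opened and records its position on update, so a later run can continue where the previous one stopped. This is not the Spring Batch API. The class RestartableReaderSketch, the key name, and the use of a plain Map in place of ExecutionContext are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class RestartableReaderSketch {

    private static final String KEY = "reader.position"; // hypothetical key name

    private final List<String> items;
    private int position;

    RestartableReaderSketch(List<String> items) {
        this.items = items;
    }

    // Analogous to ItemStream#open: restore the position saved by an earlier run, if any.
    void open(Map<String, Object> context) {
        Object saved = context.get(KEY);
        position = (saved instanceof Integer) ? (Integer) saved : 0;
    }

    // Analogous to ItemStream#update: record progress so the framework can persist it.
    void update(Map<String, Object> context) {
        context.put(KEY, position);
    }

    // Analogous to ItemReader#read: return the next item, or null when the input is exhausted.
    String read() {
        return position < items.size() ? items.get(position++) : null;
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("a", "b", "c");
        Map<String, Object> context = new HashMap<>(); // stands in for the persisted context

        RestartableReaderSketch firstRun = new RestartableReaderSketch(data);
        firstRun.open(context);
        firstRun.read();
        firstRun.read();
        firstRun.update(context); // progress saved, then the run is assumed to fail

        RestartableReaderSketch secondRun = new RestartableReaderSketch(data);
        secondRun.open(context);
        System.out.println(secondRun.read()); // continues after the saved position
    }
}
```

Without open and update, the second run would have no saved position and would start again from the first item.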

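Spring Batch provides skip, retry, and restart strategies. Choose among them according to the kind of error that occurs during processing. Transient errors such as network failures or database lock errors are candidates for retry. Errors such as type conversion failures or malformed data will fail no matter how many times you retry, so those should be skipped. I have not yet thought of a good example for restart.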
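
The retry-versus-skip decision can be sketched in plain Java as follows. In Spring Batch itself this is configured on a fault-tolerant step rather than written by hand, so treat this only as an illustration of the policy. The class RetrySkipSketch, TransientException, and failOnceParser are hypothetical names invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

final class RetrySkipSketch {

    // Hypothetical marker for errors that may succeed on a later attempt (network, lock timeout).
    static final class TransientException extends RuntimeException {
        TransientException(String message) {
            super(message);
        }
    }

    // Retries transient failures up to retryLimit attempts; skips items with malformed data.
    static <I, O> List<O> process(List<I> items, Function<I, O> processor,
                                  int retryLimit, List<I> skipped) {
        List<O> out = new ArrayList<>();
        for (I item : items) {
            for (int attempt = 1; ; attempt++) {
                try {
                    out.add(processor.apply(item));
                    break;
                } catch (TransientException e) {
                    if (attempt >= retryLimit) {
                        throw e; // retries exhausted, fail the whole run
                    }
                } catch (NumberFormatException e) {
                    skipped.add(item); // retrying bad data cannot help, so skip it
                    break;
                }
            }
        }
        return out;
    }

    // Test helper: parses integers, but fails once with a transient error on flakyItem.
    static Function<String, Integer> failOnceParser(String flakyItem) {
        boolean[] alreadyFailed = {false};
        return s -> {
            if (s.equals(flakyItem) && !alreadyFailed[0]) {
                alreadyFailed[0] = true;
                throw new TransientException("simulated lock timeout");
            }
            return Integer.parseInt(s);
        };
    }

    public static void main(String[] args) {
        List<String> skipped = new ArrayList<>();
        List<Integer> result = process(Arrays.asList("1", "2", "x", "4"),
                failOnceParser("2"), 3, skipped);
        System.out.println(result + " skipped=" + skipped);
    }
}
```

Here "2" succeeds on its second attempt, while "x" is skipped immediately because no number of retries would make it parse.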

*Context

JobContext
StepContext
ChunkContext

Spring Cloud Task

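Once a SpringBootApplication is annotated with @EnableTask, it becomes a short-lived Task application. The whole jar can be handed directly to Spring Cloud Data Flow to run.
A concrete example makes this clear: for a SpringBootApplication annotated with @EnableTask, the value of spring.application.name is the task name in the task records that Spring Cloud Task writes to the database. If spring.cloud.task.name is configured, the task name is the value of that property instead.
A SpringBootApplication annotated with @EnableTask runs on its own in a JVM, as a single process.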
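
The naming rule can be written down as a small plain-Java sketch. It only models the precedence between the two properties mentioned here and is not how Spring Cloud Task is implemented. The class TaskNameSketch and the method resolveTaskName are hypothetical, and what happens when neither property is set is deliberately left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

final class TaskNameSketch {

    // spring.cloud.task.name wins; otherwise fall back to spring.application.name.
    static Optional<String> resolveTaskName(Map<String, String> properties) {
        String explicit = properties.get("spring.cloud.task.name");
        if (explicit != null && !explicit.isEmpty()) {
            return Optional.of(explicit);
        }
        // Empty result when neither property is set (the real default is out of scope here).
        return Optional.ofNullable(properties.get("spring.application.name"));
    }

    public static void main(String[] args) {
        Map<String, String> properties = new HashMap<>();
        properties.put("spring.application.name", "demo-app");
        System.out.println(resolveTaskName(properties).orElse("<none>"));

        properties.put("spring.cloud.task.name", "nightly-import");
        System.out.println(resolveTaskName(properties).orElse("<none>"));
    }
}
```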

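TaskConfigurer needs three things in total: a TaskRepository (creating, updating, and deleting task records), a TaskExplorer (querying tasks), and a PlatformTransactionManager (the transaction manager)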

spring-cloud-task-batch module

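TaskBatchExecutionListener: stores the relationship between a Spring Batch job and a Spring Cloud task
The actual storage is done by an implementation of the TaskBatchDao interface. There are two built-in implementations, JdbcTaskBatchDao and MapTaskBatchDao.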

TaskBatchExecutionListenerBeanPostProcessor registers a TaskBatchExecutionListener on every Job bean in the IoC container. The Job object (AbstractJob) has a registerJobExecutionListener method.

The main purpose of this module is integration with Spring Batch

spring-cloud-task-stream

This module listens on a sink and launches tasks
It also sends the configured events raised during task execution to a message queue

Spring Cloud Deployer

The Spring Cloud Deployer project defines an SPI for deploying long lived applications and short lived tasks

Spring Cloud Data Flow

Spring Cloud Data Flow is a toolkit for building data integration and real-time data processing pipelines.

Pipelines consist of Spring Boot apps, built using the Spring Cloud Stream or Spring Cloud Task microservice frameworks. This makes Spring Cloud Data Flow suitable for a range of data processing use cases, from import/export to event streaming and predictive analytics.

Architecture

Application, Data Flow Server, target runtime
There are two kinds of Application:

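Long-lived Stream Applications that produce or consume unbounded data through messaging middleware
Short-lived Task Applications that process a finite amount of data and then terminate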

Depending on the runtime environment, an Application can be packaged in two forms:

Spring Boot uber-jar, which can be loaded through Spring Resource implementations, for example from a Maven repository, file, http, and so on.
Docker

The runtime is where applications execute.
An application's target runtime is the platform it is deployed to. The currently supported target runtimes are:

Cloud Foundry
Apache YARN
Kubernetes
Apache Mesos
Local Server for development

Spring Cloud Data Flow defines an SPI for deploying applications to target runtimes. You can extend it to support other platforms, such as Docker Swarm.

The component responsible for deploying applications to the runtime is the Data Flow Server. The Data Flow Server simplifies application deployment.

To work with Hadoop, Spring Hadoop is the natural choice. Check whether Spring Hadoop covers YARN and MapReduce.