Question:

Multithreading with the Spring Batch FlatFileItemReader

洪昱

2023-03-14

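In Spring Batch, I am trying to read a CSV file and want to hand each line to a separate thread for processing. I tried to do this with a TaskExecutor, but all threads pick up the same line at the same time. I also tried implementing the idea with a Partitioner, and the same thing happened. Please see my XML configuration below.

Step definition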

    <step id="Step2">
        <tasklet task-executor="taskExecutor">
            <chunk reader="reader" processor="processor" writer="writer" commit-interval="1" skip-limit="1">
            </chunk>
        </tasklet> 
    </step>

    <bean id="reader" class="org.springframework.batch.item.file.FlatFileItemReader">
        <property name="resource" value="file:cvs/user.csv" />
        <property name="lineMapper">
            <bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
                <!-- split it -->
                <property name="lineTokenizer">
                    <bean class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
                        <property name="names" value="userid,customerId,ssoId,flag1,flag2" />
                    </bean>
                </property>
                <property name="fieldSetMapper">
                    <!-- map to an object -->
                    <bean class="org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper">
                        <property name="prototypeBeanName" value="user" />
                    </bean>
                </property>
            </bean>
        </property>
    </bean>

    <bean id="taskExecutor" class="org.springframework.core.task.SimpleAsyncTaskExecutor">
        <property name="concurrencyLimit" value="4"/>
    </bean>

I have tried different kinds of task executors, but they all behave the same way. How can I assign each line to a separate thread?

3 answers

宋臻

2023-03-14

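You can split the input file into multiple files, use a Partitioner, and load the small files with threads. If an error occurs, however, you must clean the database and then restart the whole job.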

<batch:job id="transformJob">
    <batch:step id="deleteDir" next="cleanDB">
        <batch:tasklet ref="fileDeletingTasklet" />
    </batch:step>
    <batch:step id="cleanDB" next="split">
        <batch:tasklet ref="countThreadTasklet" />
    </batch:step>
    <batch:step id="split" next="partitionerMasterImporter">
        <batch:tasklet>
            <batch:chunk reader="largeCSVReader" writer="smallCSVWriter" commit-interval="#{jobExecutionContext['chunk.count']}" />
        </batch:tasklet>
    </batch:step>
    <batch:step id="partitionerMasterImporter" next="partitionerMasterExporter">
        <batch:partition step="importChunked" partitioner="filePartitioner">
            <batch:handler grid-size="10" task-executor="taskExecutor" />
        </batch:partition>
    </batch:step>
    <!-- the partitionerMasterExporter step is not shown in this excerpt -->
</batch:job>

Complete working sample code (on GitHub)
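
The splitting step can be pictured in plain Java, outside Spring Batch. This is only a sketch under assumptions: the class name CsvSplitter, the part-N.csv file naming, and the fixed lines-per-file rule are hypothetical and are not taken from the GitHub sample. It also assumes the CSV has no header line and no quoted fields that span several lines.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class CsvSplitter {

    /**
     * Copies the lines of source into files of at most linesPerFile lines each,
     * written to targetDir as part-0.csv, part-1.csv, and so on.
     * Returns the created files in order. (Hypothetical helper, not a Spring Batch API.)
     */
    public static List<Path> split(Path source, Path targetDir, int linesPerFile) throws IOException {
        if (linesPerFile < 1) {
            throw new IllegalArgumentException("linesPerFile must be at least 1");
        }
        Files.createDirectories(targetDir);
        List<Path> parts = new ArrayList<>();
        BufferedWriter out = null;
        int written = 0;
        try (BufferedReader in = Files.newBufferedReader(source, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                // Start a new part file when none is open or the current one is full.
                if (out == null || written == linesPerFile) {
                    if (out != null) {
                        out.close();
                    }
                    Path part = targetDir.resolve("part-" + parts.size() + ".csv");
                    parts.add(part);
                    out = Files.newBufferedWriter(part, StandardCharsets.UTF_8);
                    written = 0;
                }
                out.write(line);
                out.newLine();
                written++;
            }
        } finally {
            if (out != null) {
                out.close();
            }
        }
        return parts;
    }
}
```

Each resulting file can then be given to its own partition, so that no two threads ever share a reader.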

Hope this helps.

沈枫涟

2023-03-14

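Your problem is that your reader is not step-scoped.

This means that all threads share the same input stream (the resource file).

To process one line per thread, you need to:

Make sure every thread reads the file through its own stream, from beginning to end (each thread should open the stream for its execution context and close it afterwards).
Have the partitioner inject a start and an end position into each execution context.
Have your reader read the file using those positions.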
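
The three requirements above can be illustrated without Spring Batch. The following is a minimal sketch, assuming the positions are 1-based, inclusive line numbers; the class LineRangeReader and its readRange method are hypothetical names introduced only for this illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class LineRangeReader {

    /**
     * Reads lines fromLine..toLine (1-based, inclusive) of the file.
     * Every call opens and closes its own stream, so concurrent callers
     * never share reader state. (Hypothetical helper, not a Spring Batch API.)
     */
    public static List<String> readRange(Path file, int fromLine, int toLine) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            int lineNo = 0;
            // Stop as soon as the end of the range or the end of the file is reached.
            while (lineNo < toLine && (line = in.readLine()) != null) {
                lineNo++;
                if (lineNo >= fromLine) {
                    lines.add(line);
                }
            }
        }
        return lines;
    }
}
```

Within Spring Batch itself, a similar effect can probably be obtained by giving a step-scoped FlatFileItemReader a linesToSkip of fromId - 1 and a maxItemCount of toId - fromId + 1, both taken from the step execution context; treat that as a suggestion to verify rather than as the method used in this answer.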

I wrote some code; the partitioner and its output are shown below.

Code of the com.test.partitioner.RangePartitioner class:

// Builds one ExecutionContext per partition, each carrying the
// line range (fromId..toId) that the partition's reader should handle.
@Override
public Map<String, ExecutionContext> partition(int gridSize) {

    Map<String, ExecutionContext> result = new HashMap<String, ExecutionContext>();

    int range = 1; // number of lines per partition
    int fromId = 1;
    int toId = range;

    for (int i = 1; i <= gridSize; i++) {
        ExecutionContext value = new ExecutionContext();

        log.debug("\nStarting : Thread" + i);
        log.debug("fromId : " + fromId);
        log.debug("toId : " + toId);

        value.putInt("fromId", fromId);
        value.putInt("toId", toId);

        // give each thread a name, thread 1,2,3
        value.putString("name", "Thread" + i);

        result.put("partition" + i, value);

        // move on to the next range
        fromId = toId + 1;
        toId += range;

    }

    return result;
}

Starting : Thread1   fromId : 1   toId : 1
Starting : Thread2   fromId : 2   toId : 2
Starting : Thread3   fromId : 3   toId : 3
Starting : Thread4   fromId : 4   toId : 4
Starting : Thread5   fromId : 5   toId : 5
Starting : Thread6   fromId : 6   toId : 6
Starting : Thread7   fromId : 7   toId : 7
Starting : Thread8   fromId : 8   toId : 8
Starting : Thread9   fromId : 9   toId : 9
Starting : Thread10   fromId : 10   toId : 10
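
With range fixed at 1 and a grid size of 10, the partitioner covers only lines 1 to 10. For a larger file, the ranges could instead be derived from the total number of lines. The sketch below shows one way to do that arithmetic; the class LineRanges and its split method are hypothetical, and the total line count is assumed to be known in advance (for example, counted in an earlier step).

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LineRanges {

    /**
     * Divides lines 1..totalLines into at most gridSize contiguous ranges
     * whose sizes differ by no more than one line.
     * Returns partition name -> {fromId, toId}, both 1-based and inclusive.
     * (Hypothetical helper, not part of the original answer.)
     */
    public static Map<String, int[]> split(int totalLines, int gridSize) {
        if (gridSize < 1) {
            throw new IllegalArgumentException("gridSize must be at least 1");
        }
        Map<String, int[]> result = new LinkedHashMap<>();
        int base = totalLines / gridSize;  // minimum lines per partition
        int extra = totalLines % gridSize; // the first 'extra' partitions get one more line
        int fromId = 1;
        for (int i = 1; i <= gridSize && fromId <= totalLines; i++) {
            int size = base + (i <= extra ? 1 : 0);
            int toId = fromId + size - 1;
            result.put("partition" + i, new int[] {fromId, toId});
            fromId = toId + 1;
        }
        return result;
    }
}
```

Each pair would then be stored in an ExecutionContext with putInt("fromId", ...) and putInt("toId", ...), exactly as the partitioner above does.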

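Take a look at the configuration below. The opening beans element was truncated; its schemaLocation listed:

http://www.springframework.org/schema/batch/spring-batch-2.2.xsd
http://www.springframework.org/schema/beans
http://www.springframework.org/schema/beans/spring-beans-3.2.xsd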

<import resource="../config/context.xml" />
<import resource="../config/database.xml" />

<bean id="mouvement" class="com.test.model.Mouvement" scope="prototype" />

<bean id="itemProcessor" class="com.test.processor.CustomItemProcessor" scope="step">
    <property name="threadName" value="#{stepExecutionContext[name]}" />
</bean>
<bean id="xmlItemWriter" class="com.test.writer.ItemWriter" />

<batch:job id="mouvementImport" xmlns:batch="http://www.springframework.org/schema/batch">
    <batch:listeners>
        <batch:listener ref="myAppJobExecutionListener" />
    </batch:listeners>

    <batch:step id="masterStep">
        <batch:partition step="slave" partitioner="rangePartitioner">
            <batch:handler grid-size="10" task-executor="taskExecutor" />
        </batch:partition>
    </batch:step>
</batch:job>

<bean id="rangePartitioner" class="com.test.partitioner.RangePartitioner" />

<bean id="taskExecutor" class="org.springframework.core.task.SimpleAsyncTaskExecutor" />

<batch:step id="slave">
    <batch:tasklet>
        <batch:listeners>
            <batch:listener ref="stepExecutionListener" />
        </batch:listeners>

        <batch:chunk reader="mouvementReader" writer="xmlItemWriter" processor="itemProcessor" commit-interval="1">
        </batch:chunk>

    </batch:tasklet>
</batch:step>



<bean id="stepExecutionListener" class="com.test.listener.step.StepExecutionListenerCtxInjecter" scope="step" />

<bean id="myAppJobExecutionListener" class="com.test.listener.job.MyAppJobExecutionListener" />

<bean id="mouvementReaderParent" class="org.springframework.batch.item.file.FlatFileItemReader" scope="step">

    <property name="resource" value="classpath:XXXXX/XXXXXXXX.csv" />

    <property name="lineMapper">
        <bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
            <property name="lineTokenizer">
                <bean class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
                    <property name="delimiter" value="|" />
                    <property name="names"
                        value="id,numen,prenom,grade,anneeScolaire,academieOrigin,academieArrivee,codeUsi,specialiteEmploiType,natureSupport,dateEffet,modaliteAffectation" />
                </bean>
            </property>
            <property name="fieldSetMapper">
                <bean class="com.test.mapper.MouvementFieldSetMapper" />
            </property>
        </bean>
    </property>

</bean>

<!--    <bean id="itemReader" scope="step" autowire-candidate="false" parent="mouvementReaderParent">-->
<!--        <property name="resource" value="#{stepExecutionContext[fileName]}" />-->
<!--    </bean>-->

<bean id="mouvementReader" class="com.test.reader.MouvementItemReader" scope="step">
    <property name="delegate" ref="mouvementReaderParent" />
    <property name="parameterValues">
        <map>
            <entry key="fromId" value="#{stepExecutionContext[fromId]}" />
            <entry key="toId" value="#{stepExecutionContext[toId]}" />
        </map>
    </property>
</bean>

<!--    <bean id="xmlItemWriter" class="org.springframework.batch.item.xml.StaxEventItemWriter">-->
<!--        <property name="resource" value="file:xml/outputs/Mouvements.xml" />-->
<!--        <property name="marshaller" ref="reportMarshaller" />-->
<!--        <property name="rootTagName" value="Mouvement" />-->
<!--    </bean>-->

<bean id="reportMarshaller" class="org.springframework.oxm.jaxb.Jaxb2Marshaller">
    <property name="classesToBeBound">
        <list>
            <value>com.test.model.Mouvement</value>
        </list>
    </property>
</bean>

TODO: change the reader so that it reads by position (start and end positions), for example with the Scanner class in Java.

Hope this helps.

梅欣然

2023-03-14

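FlatFileItemReader is not thread-safe. In your case, you could split the CSV file into smaller CSV files and then use a MultiResourcePartitioner to process each of them. This can be done in two steps: one that splits the original file (into, say, 10 smaller files) and one that processes the split files. That way you avoid the problem entirely, because each file is processed by a single thread.

Example: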

<batch:job id="csvsplitandprocess">
    <batch:step id="step1" next="step2master">
        <batch:tasklet>
            <batch:chunk reader="largecsvreader" writer="csvwriter" commit-interval="500">
            </batch:chunk>
        </batch:tasklet>
    </batch:step>
    <batch:step id="step2master">
        <batch:partition step="step2" partitioner="partitioner">
            <batch:handler grid-size="10" task-executor="taskExecutor"/>
        </batch:partition>
    </batch:step>
</batch:job>

<batch:step id="step2">
    <batch:tasklet>
        <batch:chunk reader="smallcsvreader" writer="writer" commit-interval="100">
        </batch:chunk>
    </batch:tasklet>
</batch:step>

<bean id="taskExecutor" class="org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor">
    <property name="corePoolSize" value="10" />
    <property name="maxPoolSize" value="10" />
</bean>

<bean id="partitioner"
      class="org.springframework.batch.core.partition.support.MultiResourcePartitioner">
    <property name="resources" value="file:cvs/extracted/*.csv" />
</bean>
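
The idea behind this configuration (one file per worker, a fixed pool of threads) can be shown with a plain Java analogue. This sketch does not use MultiResourcePartitioner or any Spring class; PerFileWorkers is a hypothetical name, and counting lines stands in for whatever processing the real step would do.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Stream;

public class PerFileWorkers {

    /**
     * Submits one task per file to a fixed-size pool. Each task opens its own
     * stream, so no reader is shared between threads. Returns the line count
     * of each file, in the same order as the input list.
     */
    public static List<Integer> countLines(List<Path> files, int poolSize)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (Path file : files) {
                Callable<Integer> task = () -> {
                    try (Stream<String> lines = Files.lines(file)) {
                        return (int) lines.count(); // placeholder for real per-line processing
                    }
                };
                futures.add(pool.submit(task));
            }
            List<Integer> counts = new ArrayList<>();
            for (Future<Integer> future : futures) {
                counts.add(future.get()); // waits for that file's task to finish
            }
            return counts;
        } finally {
            pool.shutdown();
        }
    }
}
```

Because every file is owned by exactly one task, the reader itself does not need to be thread-safe, which is the same reason the partitioned step works.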

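An alternative to partitioning might be a custom thread-safe reader that creates a thread for each line, but partitioning is probably your best option.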
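
For comparison, here is a minimal sketch of what a thread-safe reader could look like in plain Java. It takes a slightly different shape from the one described above: instead of creating a thread per line, several worker threads pull lines from one shared, synchronized reader. SynchronizedLineReader is a hypothetical class written for this illustration. Spring Batch also provides SynchronizedItemStreamReader, which wraps a delegate reader in a comparable way and may be worth checking before writing a custom one.

```java
import java.io.BufferedReader;
import java.io.IOException;

public class SynchronizedLineReader {

    private final BufferedReader delegate;
    private int linesRead = 0;

    public SynchronizedLineReader(BufferedReader delegate) {
        this.delegate = delegate;
    }

    /**
     * Returns the next line, or null at end of input. The method is
     * synchronized, so each line is handed to exactly one calling thread
     * and the counter stays consistent.
     */
    public synchronized String read() throws IOException {
        String line = delegate.readLine();
        if (line != null) {
            linesRead++;
        }
        return line;
    }

    /** Number of lines handed out so far. */
    public synchronized int getLinesRead() {
        return linesRead;
    }
}
```

The trade-off is that reading becomes a serial bottleneck and the order in which lines are processed is no longer guaranteed, which is part of why partitioning tends to be the safer choice.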

