Question:

Spring Batch and multi-threaded steps

洪鸿博

2023-03-14

I am currently working on a batch job whose data comes from a large SQL database with millions of rows.

In the processor it does some processing that consists of grouping the rows retrieved from the reader, using a large SQL query with joins.

The writer writes the results to another table.

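The problem is that this batch has performance issues, because the SQL select queries take a lot of time and the steps are not executed in multiple threads.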

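So I would like to run them in multiple threads, but the problem is that the steps group rows by calculating a total over all rows that have the same type.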

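So if I make it multi-threaded, how can I do the grouping when each partition is processed in a different thread? There are millions of rows, so I cannot store them in the context to retrieve them after the step and group them there. I also cannot save them in the database, since there are millions of rows. Do you have any idea how I can do this? I hope I have explained my problem well. Thanks in advance for your help.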
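To make the difficulty concrete, here is a minimal plain-Java sketch (not part of the original question, and with no Spring involved). The class `PartialCountSketch`, its methods, and the sample types "A", "B", "C" are hypothetical. It shows that when rows of the same type land in different partitions, each thread only sees a partial count, and the per-type total is only correct once the partial counts are summed.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PartialCountSketch {

    /** Counts rows per type within one partition (what a single thread would see). */
    public static Map<String, Long> countByType(List<String> rowTypes) {
        Map<String, Long> counts = new HashMap<>();
        for (String type : rowTypes) {
            counts.merge(type, 1L, Long::sum);
        }
        return counts;
    }

    /** Adds one partition's partial counts into the running totals. */
    public static void mergeInto(Map<String, Long> totals, Map<String, Long> partial) {
        partial.forEach((type, n) -> totals.merge(type, n, Long::sum));
    }

    public static void main(String[] args) {
        // Hypothetical data: type "A" is split across both partitions.
        Map<String, Long> partialA = countByType(List.of("A", "B", "A"));
        Map<String, Long> partialB = countByType(List.of("A", "C"));

        Map<String, Long> totals = new HashMap<>();
        mergeInto(totals, partialA);
        mergeInto(totals, partialB);

        System.out.println("partition 1 sees: " + partialA);
        System.out.println("partition 2 sees: " + partialB);
        System.out.println("merged totals:    " + totals);
    }
}
```

Only one small map per partition is kept here, not the rows themselves. Whether that fits the real job depends on how many distinct types there are.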

顾乐心

2023-03-14

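I had a task similar to yours, although we were using Java 1.7 and Spring 3.x. I can provide the configuration in XML; you may be able to do the same with annotation-based configuration, which I have not tried.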

<batch:job id="dualAgeRestrictionJob">
    <!-- use a listener if you need one -->
    <batch:listeners>
        <batch:listener ref="dualAgeRestrictionJobListener" />
    </batch:listeners>
    <!-- master step, 10 threads (grid-size) -->
    <batch:step id="dualMasterStep">
        <batch:partition step="dualSlaveStep"
            partitioner="arInputRangePartitioner">
            <batch:handler grid-size="${AR_GRID_SIZE}" task-executor="taskExecutor" />
        </batch:partition>
    </batch:step>   
</batch:job>
<!-- here you define your reader, processor and writer, and the commit interval -->
<batch:step id="dualSlaveStep">
    <batch:tasklet transaction-manager="transactionManager">
        <batch:chunk reader="arInputPagingItemReader"
            writer="arOutputWriter" processor="arInputItemProcessor"
            commit-interval="${AR_COMMIT_INTERVAL}" />
    </batch:tasklet>
</batch:step>
<!-- The partitioner -->
<bean id="arInputRangePartitioner" class="com.example.ArInputRangePartitioner">
    <property name="arInputDao" ref="arInputJDBCTemplate" />
    <property name="statsForMail" ref="statsForMail" />
</bean>
<bean id="taskExecutor"
        class="org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor">
    <property name="corePoolSize" value="${AR_CORE_POOL_SIZE}" />
    <property name="maxPoolSize" value="${AR_MAX_POOL_SIZE}" />
    <property name="allowCoreThreadTimeOut" value="${AR_ALLOW_CORE_THREAD_TIME_OUT}" />
</bean>
<bean id="transactionManager"
        class="org.springframework.jdbc.datasource.DataSourceTransactionManager">
    <property name="dataSource" ref="kvrDatasource" />
</bean>

The partitioner runs a query to count the rows and builds a range for each thread:

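import java.util.HashMap;
import java.util.Map;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.item.ExecutionContext;

public class ArInputRangePartitioner implements Partitioner {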
    
    private static final Logger logger = LoggerFactory.getLogger(ArInputRangePartitioner.class);

    private ArInputDao arInputDao;
    
    private StatsForMail statsForMail;

    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {
        Map<String, ExecutionContext> result = new HashMap<String, ExecutionContext>();
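        // Run a query, then derive a from/to id range for each thread.
        // idMap is read below with keys 1..size, so it is assumed to map a
        // 1-based row position to the actual order id.
        Map<Integer,Integer> idMap = arInputDao.getOrderIdList();
        Integer countRow = idMap.size();
        statsForMail.setNumberOfRecords( countRow );  
        // Rows per partition; the last partition also takes the remainder.
        // Caution: with fewer rows than gridSize, range is 0 and the lookups
        // below return null, so putInt would fail on unboxing.
        Integer range = countRow / gridSize;
        Integer remains = countRow % gridSize;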
        int fromId = 1;
        int toId = range;
        for (int i = 1; i <= gridSize; i++) {
            ExecutionContext value = new ExecutionContext();
            if(i == gridSize) {
                toId += remains;
            }
            logger.info("\nStarting : Thread {}", i);
            logger.info("fromId : {}", idMap.get(fromId) );
            logger.info("toId : {}", idMap.get(toId) );
            value.putInt("fromId", idMap.get(fromId) );
            value.putInt("toId", idMap.get(toId) );
            value.putString("name", "Thread" + i);
            result.put("partition" + i, value);
            fromId = toId + 1;
            toId += range;
        }
        return result;
    }
    
    public ArInputDao getArInputDao() {
        return arInputDao;
    }

    public void setArInputDao(ArInputDao arInputDao) {
        this.arInputDao = arInputDao;
    }

    public StatsForMail getStatsForMail() {
        return statsForMail;
    }

    public void setStatsForMail(StatsForMail statsForMail) {
        this.statsForMail = statsForMail;
    }

}
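The range arithmetic in the partitioner can be tried in isolation. The following is a minimal plain-Java sketch of the same idea, with no Spring classes: positions 1..count are split into contiguous from/to ranges and the last range absorbs the remainder. The class `RangeSplitSketch` and its `split` method are hypothetical names, and capping the number of partitions at the row count is an added assumption so that no empty range is produced.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RangeSplitSketch {

    /**
     * Splits positions 1..count into contiguous ranges, one per partition.
     * Each value is a two-element array {from, to}, both inclusive.
     */
    public static Map<String, int[]> split(int count, int gridSize) {
        if (gridSize < 1) {
            throw new IllegalArgumentException("gridSize must be >= 1");
        }
        Map<String, int[]> result = new LinkedHashMap<>();
        // Assumption: never create more partitions than there are rows.
        int partitions = Math.min(gridSize, count);
        if (partitions == 0) {
            return result;
        }
        int range = count / partitions;
        int from = 1;
        for (int i = 1; i <= partitions; i++) {
            // The last partition runs to the end and so takes the remainder.
            int to = (i == partitions) ? count : from + range - 1;
            result.put("partition" + i, new int[] {from, to});
            from = to + 1;
        }
        return result;
    }

    public static void main(String[] args) {
        split(10, 3).forEach((name, r) ->
                System.out.println(name + ": " + r[0] + ".." + r[1]));
    }
}
```

In the partitioner above, each from/to position would then be looked up in `idMap` to obtain the actual ids stored in the `ExecutionContext`.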

This is the configuration of the reader and the writer:

<bean id="arInputPagingItemReader" class="org.springframework.batch.item.database.JdbcPagingItemReader" scope="step" >
    <property name="dataSource" ref="kvrDatasource" />
    <property name="queryProvider">
        <bean class="org.springframework.batch.item.database.support.SqlPagingQueryProviderFactoryBean" >
            <property name="dataSource" ref="kvrDatasource" />
            <property name="selectClause" value="${AR_INPUT_PAGING_ITEM_READER_SELECT}" />
            <property name="fromClause" value="${AR_INPUT_PAGING_ITEM_READER_FROM}" />
            <property name="whereClause" value="${AR_INPUT_PAGING_ITEM_READER_WHERE}" />
            <property name="sortKey" value="${AR_INPUT_PAGING_ITEM_READER_SORT}" />
        </bean>
    </property>
    <!-- Inject via the ExecutionContext in rangePartitioner -->
    <property name="parameterValues">
        <map>
            <entry key="fromId" value="#{stepExecutionContext[fromId]}" />
            <entry key="toId" value="#{stepExecutionContext[toId]}" />
        </map>
    </property>
    <property name="pageSize" value="${AR_PAGE_SIZE}" />
    <property name="rowMapper" ref="arOutInRowMapper" />
</bean>
<bean id="arOutputWriter"
        class="org.springframework.batch.item.database.JdbcBatchItemWriter"
        scope="step">
    <property name="dataSource" ref="kvrDatasource" />
    <property name="sql" value="${SQL_AR_OUTPUT_INSERT}"/>
    <property name="itemSqlParameterSourceProvider">
        <bean class="org.springframework.batch.item.database.BeanPropertyItemSqlParameterSourceProvider" />
    </property>
</bean>

Maybe someone knows how to convert this to modern Spring Batch / Spring Boot.

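PS: Don't use too many threads, otherwise Spring Batch will spend a lot of time filling its own tables. You have to run some benchmarks to find the right configuration.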
