Understanding JDBC batch operations

爱茂勋

2023-03-14

Question:

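I use Hibernate ORM and PostgreSQL in my application, and sometimes I use batch operations. At first I did not understand why, with a batch size of 25, the log showed 25 queries being generated, and I assumed batching was not working. Then I looked at the source code of the pg driver and found the following lines in the PgStatement class: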

    public int[] executeBatch() throws SQLException {
        this.checkClosed();
        this.closeForNextExecution();
        if (this.batchStatements != null && !this.batchStatements.isEmpty()) {
            this.transformQueriesAndParameters();
            // The next line confuses me, because we end up with an array of identical queries
            Query[] queries = (Query[])this.batchStatements.toArray(new Query[0]);
            ParameterList[] parameterLists =
                (ParameterList[])this.batchParameters.toArray(new ParameterList[0]);
            this.batchStatements.clear();
            this.batchParameters.clear();
            // ... (the excerpt ends here)

and in the PgPreparedStatement class:

    public void addBatch() throws SQLException {
        checkClosed();
        if (batchStatements == null) {
            batchStatements = new ArrayList<Query>();
            batchParameters = new ArrayList<ParameterList>();
        }

        batchParameters.add(preparedParameters.copy());
        Query query = preparedQuery.query;
        // The next line confuses me
        if (!(query instanceof BatchedQuery) || batchStatements.isEmpty()) {
            batchStatements.add(query);
        }
    }

So it turns out that when the batch size reaches 25, 25 queries are sent, each with its own parameters attached.

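The database log confirms this. The server locale is Russian: СООБЩЕНИЕ is LOG, выполнение is execute, ПОДРОБНОСТИ is DETAIL, and параметры is parameters. For example: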

    2017-12-06 01:22:08.023 MSK [18402] postgres@buzzfactory СООБЩЕНИЕ:  выполнение S_3: BEGIN
    2017-12-06 01:22:08.024 MSK [18402] postgres@buzzfactory СООБЩЕНИЕ:  выполнение S_4: select nextval ('tests_id_seq')
    2017-12-06 01:22:08.041 MSK [18402] postgres@buzzfactory СООБЩЕНИЕ:  выполнение S_2: insert into tests (name, id) values ($1, $2)
    2017-12-06 01:22:08.041 MSK [18402] postgres@buzzfactory ПОДРОБНОСТИ:  параметры: $1 = 'test', $2 = '1'
    2017-12-06 01:22:08.041 MSK [18402] postgres@buzzfactory СООБЩЕНИЕ:  выполнение S_2: insert into tests (name, id) values ($1, $2)
    2017-12-06 01:22:08.041 MSK [18402] postgres@buzzfactory ПОДРОБНОСТИ:  параметры: $1 = 'test', $2 = '2'
    ...
    x23 queries with parameters
    ...
    2017-12-06 01:22:08.063 MSK [18402] postgres@buzzfactory СООБЩЕНИЕ:  выполнение S_5: COMMIT

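But I expected a single query to be executed with an array of 25 parameter sets. Or do I misunderstand how batch inserts work with prepared statements? Why is one query repeated n times?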

Finally, I tried debugging my query at this point:

    if (!(query instanceof BatchedQuery) || batchStatements.isEmpty()) {

and noticed that my query is always an instance of SimpleQuery, never of BatchedQuery. Maybe that is the key to the problem? I could not find any information about BatchedQuery.

Answer:

Several kinds of batching may be involved here. I will cover the part that belongs to the PostgreSQL JDBC driver (pgjdbc).

TL;DR: pgjdbc does use fewer network round trips when the batch API is used. BatchedQuery is used only if reWriteBatchedInserts=true is passed in the pgjdbc connection settings.

You might find https://www.slideshare.net/VladimirSitnikv/postgresql-and-jdbc-striving-for-high-performance relevant (slide 44, …).

When it comes to query execution, network latency is often a significant part of the elapsed time.

Suppose the task is to insert 10 rows.

No batching (e.g. just PreparedStatement#execute in a loop). The driver would do the following:

    execute query
    sync <-- wait for the response from the DB
    execute query
    sync <-- wait for the response from the DB
    execute query
    sync <-- wait for the response from the DB
    ...

A notable amount of time would be spent just waiting for the DB.

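JDBC batch API. PreparedStatement#addBatch() enables the driver to send multiple "query executions" in a single network round trip. The current implementation, however, still splits large batches into smaller ones in order to avoid a TCP deadlock.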

The sequence of actions is much better:

    execute query
    ...
    execute query
    execute query
    execute query
    sync <-- wait for the response from the DB
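To make the difference between the two traces concrete, here is a toy model that only counts how often the client has to wait for the server. It is not pgjdbc code. The `maxPerSync` parameter is a hypothetical stand-in for whatever chunk size the driver uses when it splits a large batch, and the latency figure is an assumed number.

```java
/** Toy model of client/server round trips; not pgjdbc code. */
final class RoundTripModel {

    /** Without batching, every "execute query" is followed by its own sync. */
    static int roundTripsWithoutBatch(int rows) {
        return rows;
    }

    /**
     * With addBatch/executeBatch, executions are pipelined and one sync ends each chunk.
     * maxPerSync is a hypothetical chunk size, not a real driver setting.
     */
    static int roundTripsWithBatch(int rows, int maxPerSync) {
        if (maxPerSync <= 0) {
            throw new IllegalArgumentException("maxPerSync must be positive");
        }
        return (rows + maxPerSync - 1) / maxPerSync; // ceiling division
    }

    /** Time spent purely waiting on the network, in milliseconds. */
    static double waitMillis(int roundTrips, double latencyMillis) {
        return roundTrips * latencyMillis;
    }

    public static void main(String[] args) {
        int rows = 10;
        double latencyMillis = 1.0; // assumed round-trip latency
        int plain = roundTripsWithoutBatch(rows);
        int batched = roundTripsWithBatch(rows, 100);
        System.out.println("no batching: " + plain + " waits, " + waitMillis(plain, latencyMillis) + " ms");
        System.out.println("batch API:   " + batched + " waits, " + waitMillis(batched, latencyMillis) + " ms");
    }
}
```

The model ignores server processing time on purpose, so it only illustrates the latency part of the cost.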

Note that even with #addBatch, each "execute query" command has its own overhead. The server does take notable time to process each message individually.
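For reference, application code that uses the batch API looks roughly like the sketch below. It relies only on standard `java.sql` calls. The SQL text is taken from the log in the question, while the class name, the method name and the way ids are assigned are assumptions made for illustration (the question obtains ids from a sequence through Hibernate).

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

final class BatchInsertSketch {

    // Table and column names come from the log shown in the question.
    static final String INSERT_SQL = "insert into tests (name, id) values (?, ?)";

    /**
     * Queues one parameter set per name, then sends them all with one executeBatch().
     * Ids are assigned as firstId, firstId + 1, ... purely for illustration.
     */
    static int[] insertBatched(Connection conn, List<String> names, long firstId) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            long id = firstId;
            for (String name : names) {
                ps.setString(1, name);
                ps.setLong(2, id++);
                ps.addBatch();        // only queues the parameters
            }
            return ps.executeBatch(); // the driver sends the queued executions
        }
    }
}
```

Each `addBatch()` call still becomes one "execute query" message on the wire, which is why the server log shows the same statement once per row.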

One way to reduce the number of queries is to use a multi-values insert. For instance:

    insert into tab(a,b,c) values (?,?,?), (?,?,?), ..., (?,?,?)

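PostgreSQL can insert multiple rows at once this way. The drawback is that you do not get detailed (per-row) error messages. Currently Hibernate does not implement multi-values inserts.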
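As a small illustration of the statement shape, the following sketch builds a multi-values insert for a given number of rows. The helper name and signature are made up for this example; the table and column names are concatenated as-is, so this is only suitable for trusted, hard-coded identifiers.

```java
import java.util.Collections;
import java.util.List;

final class MultiValuesSql {

    /** Builds "insert into t(c1,c2) values (?,?), (?,?), ..." with one tuple per row. */
    static String buildInsert(String table, List<String> columns, int rows) {
        if (rows < 1 || columns.isEmpty()) {
            throw new IllegalArgumentException("need at least one row and one column");
        }
        // One "(?,?,...)" group holds the placeholders of a single row.
        String tuple = "(" + String.join(",", Collections.nCopies(columns.size(), "?")) + ")";
        return "insert into " + table + "(" + String.join(",", columns) + ") values "
                + String.join(", ", Collections.nCopies(rows, tuple));
    }

    public static void main(String[] args) {
        System.out.println(buildInsert("tab", List.of("a", "b", "c"), 3));
    }
}
```

With such a statement the parameters of row k are bound at positions `(k - 1) * columns + 1` through `k * columns`.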

However, since 9.4.1209 (2016-07-15) pgjdbc can rewrite regular batch inserts into multi-values inserts on the fly.

To activate the multi-values rewrite, add the reWriteBatchedInserts=true connection property. The feature was initially developed in https://github.com/pgjdbc/pgjdbc/pull/491
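The property can be supplied either in the JDBC URL or in a `Properties` object. The sketch below shows both forms. The host, port, credentials and helper names are assumptions for illustration; `buzzfactory` and `postgres` are simply the database and user that appear in the log from the question.

```java
import java.util.Properties;

final class RewriteBatchConfig {

    /** Connection properties with the rewrite switched on. */
    static Properties connectionProperties(String user, String password) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        props.setProperty("reWriteBatchedInserts", "true");
        return props;
    }

    /** The same setting expressed as a URL parameter. */
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:postgresql://" + host + ":" + port + "/" + database
                + "?reWriteBatchedInserts=true";
    }

    public static void main(String[] args) {
        // Assumed host and port; a real application would then call
        // DriverManager.getConnection(url) or DriverManager.getConnection(url, props).
        System.out.println(jdbcUrl("localhost", 5432, "buzzfactory"));
        System.out.println(connectionProperties("postgres", "secret").getProperty("reWriteBatchedInserts"));
    }
}
```

With Hibernate, the same URL parameter would go into whatever setting carries the JDBC URL of the data source.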

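It is smart enough to use 2 statements to insert 10 rows: the first is an 8-values statement and the second is a 2-values statement. Using powers of two lets pgjdbc keep the number of distinct statements reasonable, which improves performance because frequently used statements are server-prepared (see "What is the life span of a PostgreSQL server-side prepared statement").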
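The power-of-two idea can be sketched as a greedy split of the row count. This is an illustration of the principle described above, not the driver's actual code, and it assumes there is no upper limit on the size of a single statement.

```java
import java.util.ArrayList;
import java.util.List;

final class PowerOfTwoSplit {

    /** Splits a row count into descending powers of two, e.g. 10 becomes 8 + 2. */
    static List<Integer> split(int rows) {
        if (rows < 0) {
            throw new IllegalArgumentException("rows must not be negative");
        }
        List<Integer> sizes = new ArrayList<>();
        int remaining = rows;
        while (remaining > 0) {
            int chunk = Integer.highestOneBit(remaining); // largest power of two <= remaining
            sizes.add(chunk);
            remaining -= chunk;
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(split(10)); // [8, 2]
    }
}
```

Because every chunk size is a power of two, batches of any length up to N rows reuse about log2(N) distinct statement texts, so the server-side prepared statements stay few and get reused.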

BatchedQuery is the class that represents this kind of multi-values statement, so you will see it used only when reWriteBatchedInserts=true.
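This also explains the condition in the addBatch() excerpt from the question. The sketch below replays that condition with simplified stand-in classes. They are hypothetical and are not the pgjdbc types; in particular, making `BatchedQuery` a subclass of `SimpleQuery` is a modeling choice here, and the parameter lists are left out.

```java
import java.util.ArrayList;
import java.util.List;

final class AddBatchModel {

    // Hypothetical stand-ins for the driver's query types.
    static class SimpleQuery { }
    static class BatchedQuery extends SimpleQuery { }

    private final List<SimpleQuery> batchStatements = new ArrayList<>();
    private final SimpleQuery preparedQuery;

    AddBatchModel(SimpleQuery preparedQuery) {
        this.preparedQuery = preparedQuery;
    }

    /** Same condition as in the addBatch() excerpt from the question. */
    void addBatch() {
        if (!(preparedQuery instanceof BatchedQuery) || batchStatements.isEmpty()) {
            batchStatements.add(preparedQuery);
        }
    }

    /** Number of statements queued after the given number of addBatch() calls. */
    static int queuedAfter(SimpleQuery query, int calls) {
        AddBatchModel model = new AddBatchModel(query);
        for (int i = 0; i < calls; i++) {
            model.addBatch();
        }
        return model.batchStatements.size();
    }

    public static void main(String[] args) {
        System.out.println("SimpleQuery:  " + queuedAfter(new SimpleQuery(), 25));
        System.out.println("BatchedQuery: " + queuedAfter(new BatchedQuery(), 25));
    }
}
```

A plain query is queued once per `addBatch()` call, which matches the 25 identical entries seen in the debugger, while a batched query is queued only once and later expanded into multi-values statements.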

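The drawbacks of the feature might include less detail in the "batch result". For instance, a regular batch gives you a "per statement row count", whereas in the multi-values case you only get a "statement completed" status. On top of that, the on-the-fly rewriter might fail to parse certain SQL statements (e.g. https://github.com/pgjdbc/pgjdbc/issues/1045).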
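In JDBC terms, the array returned by `executeBatch()` holds either a row count or one of the constants `Statement.SUCCESS_NO_INFO` and `Statement.EXECUTE_FAILED`. The sketch below interprets such an array. That a rewritten batch reports its "statement completed" status as `SUCCESS_NO_INFO` is an assumption based on the description above, and the class and method names are made up for this example.

```java
import java.sql.Statement;
import java.util.Arrays;

final class BatchResultSummary {

    /** Human-readable meaning of one element of the executeBatch() result. */
    static String describe(int code) {
        if (code >= 0) {
            return "rows affected: " + code;
        }
        if (code == Statement.SUCCESS_NO_INFO) {
            return "statement completed, row count unknown";
        }
        if (code == Statement.EXECUTE_FAILED) {
            return "statement failed";
        }
        return "unexpected code: " + code;
    }

    /** How many entries succeeded without reporting a row count. */
    static long countUnknown(int[] results) {
        return Arrays.stream(results).filter(c -> c == Statement.SUCCESS_NO_INFO).count();
    }

    public static void main(String[] args) {
        int[] regular = {1, 1, 1};
        int[] rewritten = {Statement.SUCCESS_NO_INFO, Statement.SUCCESS_NO_INFO};
        System.out.println(describe(regular[0]) + " / unknown entries: " + countUnknown(regular));
        System.out.println(describe(rewritten[0]) + " / unknown entries: " + countUnknown(rewritten));
    }
}
```

Code that sums the returned counts to verify how many rows were written should therefore treat negative entries separately.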
