DBCP之poolPreparedStatements

栾钟展

2023-12-01

1.概述
DBCP的poolPreparedStatements是在每个connection中缓存一定数量的preparestatement，用于减少driver的statement对象创建，和数据库的网络交互以及数据库的cursor创建。

根据我们的测试结果，基本是节省1ms左右。

但在使用过程中遇到两个诡异问题，均为开源软件的bug，这里记录下排查过程。

2.问题1
Xml代码
并发情况下的PreparedStatement is closed.（spring的bug）
2.1 现象
在一个高并发的性能测试过程中，发现如下问题：

Sql代码
Cause: java.sql.SQLException: org.apache.commons.dbcp.PoolablePreparedStatement is closed.; nested exception is com.ibatis.commo
n.jdbc.exception.NestedSQLException:
--- The error occurred while applying a parameter map.
--- Check the CALL_INFO.SELECT_CALLINFO_BY_LOGINID-InlineParameterMap.
--- Check the parameter mapping for the 'loginIds[0]' property.
--- Cause: java.sql.SQLException: org.apache.commons.dbcp.PoolablePreparedStatement is closed.

配置如下：

Xml代码
<bean id="dataSource" class="org.apache.commons.dbcp.BasicDataSource" destroy-method="close">
<property name="maxActive" value="20"/>
<property name="poolPreparedStatements" value="true"></property>
<property name="maxOpenPreparedStatements" value="50"></property>
</bean>

使用30个线程并发压测时发现。

2.2 原因
在spring的SqlMapClientTemplate的execute方法的finally中，先执行springCon.close()然后执行session.close().

资源这样释放是有问题的，原因如下：

connection关闭过程如下：

Java代码
springCon.close()->DataSourceUtils.doReleaseConnection->PoolGuardConnectionWrapper.close()
->PoolableConnection.close()->_pool.returnObject(this)->_factory.passivateObject(obj)->DelegatingConnection.passivate()

statment关闭过程如下：

Java代码
DelegatingStatement.close()->PoolablePreparedStatement.close()->statPool.returnObj()->
factory.passivateObject(key,obj)-> ((DelegatingPreparedStatement)obj).passivate()->closed=true

在线程A的connection的关闭过程中，会触发对statment的关闭 (DelegatingConnection.passivate()中），这样我们的connection和statment都被归还到连接池了，然后由于高并发（而且连接池默认是FILO)，这个连接很快又被借出去使用（线程B)，使用到一半，线程A的session.close()触发了，把 statment的状态又置为了close，这样就出问题了。简单的说，是因为一个归还的连接池中的statment，还有一个外部引用持有并发生关闭操作导致的。

DelegatingPreparedStatement代码：

修复后的版本是先执行session.close()然后执行springCon.close()就可以了。

2.3 排查过程

这个问题的排查过程相对简单：

1.和性能测试同学发现问题后，先定位是并发略大于池大小的时候出现，由此基本断定是并发导致的问题。

2.google搜索相关报错，发现https://jira.springsource.org/browse/SPR-7760描述基本符合我们遇到的问题

3.在开发环境重现该问题（尽量用最小依赖的代码，单元测试或者main函数来重现）

4.分析原因，这是这次排查中最复杂的一步，因为原因还是比较隐蔽。基本上分析的思路如下：先看报错的触发条件，在看该触发条件的改变入口点，然后再考虑程序执行过程中调用该入口点的几个地方，结合上面多线程的初步判断来分析。

2.问题2
Html代码
偶发性报“关闭的语句"问题，commons-pool 1.3 的bug
2.1 现象
线上环境偶尔出现，测试环境不能重现

Ruby代码
Caused by: java.sql.SQLException: 关闭的语句
        at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:112)
        at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:146)


2.2 原因

commons-pool的GenericKeyedObjectPool结构大概是 HashMap<Object,LinkedList<ObjectTimestampPair>>, 也就是说，每个key可以有多个元素，用list连起来（maxactive代表list大小控制，maxtotal代表所有的原始，比如3个key，一个key有2个元素，就是6）在returnObject的时候，会将元素封装为ObjectTimestampPair放入：

Java代码
LinkedList pool = (LinkedList) (_poolMap.get(key));
...
pool.addLast(new ObjectTimestampPair(obj));

而ObjectTimestampPair结构如下：

Java代码
Object value;
  long tstamp;

  ObjectTimestampPair(Object val) {
      this(val, System.currentTimeMillis());
  }

其目的是保存每个元素的最后归还时间，这样在需要的时候可以逐出最久没有使用的（LRU),事实上这个过程发生在borrowObject的时候：

Java代码
if ((_maxTotal > 0) && (_totalActive + _totalIdle >= _maxTotal)) {
                    clearOldest();
}
然后就是最重要的clearOldest方法：

Java代码
public synchronized void clearOldest() {
       // build sorted map of idle objects
       TreeMap map = new TreeMap();
       for (Iterator keyiter = _poolMap.keySet().iterator(); keyiter.hasNext();) {
           Object key = keyiter.next();
           LinkedList list = (LinkedList) _poolMap.get(key);
           for (Iterator it = list.iterator(); it.hasNext();) {
               // each item into the map uses the objectimestamppair object
               // as the key.  It then gets sorted based on the timstamp field
               // each value in the map is the parent list it belongs in.
               ObjectTimestampPair pair = (ObjectTimestampPair) it.next();
               map.put(pair, key);
           }
       }

       // Now iterate created map and kill the first 15% plus one to account for zero
       Set setPairKeys = map.entrySet();
       int itemsToRemove = ((int) (map.size() * 0.15)) + 1;

       Iterator iter = setPairKeys.iterator();
       while (iter.hasNext() && itemsToRemove > 0) {
           Map.Entry entry = (Map.Entry) iter.next();
           // kind of backwards on naming.  In the map, each key is the objecttimestamppair
           // because it has the ordering with the timestamp value.  Each value that the
           // key references is the key of the list it belongs to.
           Object key = entry.getValue();
           ObjectTimestampPair pairTimeStamp = (ObjectTimestampPair) entry.getKey();
           LinkedList list = (LinkedList) _poolMap.get(key);
           list.remove(pairTimeStamp);

           try {
               _factory.destroyObject(key, pairTimeStamp.value);
           } catch (Exception e) {
               // ignore error, keep destroying the rest
           }
           // if that was the last object for that key, drop that pool
           if (list.isEmpty()) {
               _poolMap.remove(key);
           }
           _totalIdle--;
           itemsToRemove--;
       }
       notifyAll();
   }

可以看到，这个方法主要过程是将所有元素取出来，放到一个treemap中（按归还时间排序），然后根据逐出的比例，取出部分的entry，并取得在池中的key（sql生成的那个key），然后根据key取出对应的list，将该ObjectTimestampPair从list中移除，同时调用 _factory.destroyObject关闭statement。

问题就是关闭statement的操作成功了，但是从pool中移除的动作失败了，为什么呢，是因为在map.put(pair, key)的过程中出错了。注意这里的map是个treeMap，它的key比较是使用元素的compareTo的（参考Treemap），而如果我们有同一个connection的两个statment是在同一ms内放回来的，那么他们的tstamp就会相同，而且ObjectTimestampPair 的comparTo方法如下：

Java代码
public int compareTo(ObjectTimestampPair other) {
           return (int) (this.tstamp - other.tstamp);
       }
因此，在map.put(pair, key)动作的时候，过程是这样的，

Java代码
put（pair1，key1）
put（pair2，key2）
因为Treemap的put是：

Java代码
int diff = compare(key, t.getKey(), comparator);
if (diff == 0) return t.setValue(value);
而正好我们的pair1和pair2的tstamp又相等，当pair2放进去的时候，就只更新了value，没有更新key。因此map中就是

（pair1，key2），这样在后续的过程中，取出key2的list，然后执行remove（pair1）的操作，肯定是不会移除的（pair1不在key2的list中),但是_factory.destroyObject(key, pairTimeStamp.value);方法确是实实在在的执行了：

Java代码
public void destroyObject(Object key, Object obj) throws Exception {
     if(obj instanceof DelegatingPreparedStatement) {
         ((DelegatingPreparedStatement)obj).getInnermostDelegate().close();
     } else {
         ((PreparedStatement)obj).close();
     }
}

因此，池中就多了一个关闭后的连接，再次使用时报错。

2.3 排查过程

1.这个排查过程相当纠结，花了很久的时间。

2.首先是问题重现，普通压测不能重现，初步定位可能是环境不复杂或者环境配置不同造成的，环境唯一可能不同的就是数据库端配置，但通过看了下 jdbc的driver的源码，抛错的地方是一个状态位，而改变这个状态位的入口肯定是要app端去执行一下的，一个statment和 connection放在池中，没有线程处理的时候，是没有办法改变这个状态位的，而如果是使用过程中出问题，应该在不缓存的时候也会出错，所以基本排除数据库配置的猜想。

3.那还有一个可能就是出错环境的访问和sql执行比较复杂，这个复杂性最主要的是影响statement的逐出（超过50个就出触发 clearoldest），那么我们如果要想模拟这样的情况，就应该减小pool的大小，于是减小到1，用两个不同sql来交替执行（并启多个线程），没有重现。

4.上面这条路不通，换了一个思路，用btrace在出错环境监控jdbc的statement的关闭，观察发现所有的关闭都是在 clearOldest中触发的，由于有问题1的经验，又怀疑是多线程或者资源关闭顺序造成，于是用btrace监控连接的brrow和return，没有发现问题。

5.和老大讨论，结合前面的分析，初步还是觉得在逐出的地方（clearOldest）出了问题，再考虑到pool大小为1可能比较极端，将pool大小设置为2，sql换为3个，重现了问题！

6.重现后再次重btrace观察池的借出，归还和逐出，发现所有操作都是正常的，而且由于抛异常后不会从池中清除（这里是有点问题的，应该清除掉），所以每次重现刚刚出问题的瞬间都要重启，浪费了较多时间。

7.既然初步定位了问题，考虑用java代码来重现，而不是起服务器，结果比较顺利，也成功重现了。然后开始排除无关因素，发现不需要多线程，单线程也可以重现，于是确定和并发无关。

8.在3个sql之间插入一些sleep，发现在sql1和2之间插入sleep后问题消失，而sql1,2执行时pool还没有满，不会触发clearoldest，又纠结了。

9.尝试debug下能否重现，开始以debug就重现不了了，结合上面分析，应该和时间有关，于是为了避免在sql1和2间停顿，只在clearoldest打了断点，然后先只是在断点停一下，看能否重现。果然重现了，于是开始看debug的内容，发现了具体原因。

commons-pool 在1.5.4修复了该问题。

    3.其他一些相关点
一般来说：driver.createConnection()花费几百到1000ms左右；一个connection第一次ps.exeute()的时候，数据库有一些初始化工作，需要100ms左右后续的使用就是正常的数据库查询时间了。
oracle的sql解析和preparestatement基本无关的，只要sql（或者statement中的str）相同就可以，是一个全局缓存，LRU逐出。
oracle的max-open-cursor是instance级别的，多个schema共用，要考虑缓存后cursor太多的情况，20个conn，一个缓存50个statement，就有1000个了。
第一次执行一个sql是硬解析，然后再缓存逐出前是软解析，一个session中（对应app的connection）是软软解析，缓存了就是open cursor，层层变快。查询结果不会缓存。但从磁盘角度会有块的缓存，如果近期又查询的话。
driver和db间一次网络交互的大小有fetchsize决定，oracle driver默认是10. 那么preparestatement的缓存buf大小就是一行大小*fetchsize个byte。太大会占用较多内存。
    4.参考文献
      1.DBCP浅析： http://zldeng1984.iteye.com/admin/blogs/1368384

2.http://zldeng1984.iteye.com/blog/1500140

DBCP之poolPreparedStatements

相关阅读

相关文章

相关问答

相关文档