在JBoss Remoting 2.2.2中存在这么一个bug,如果刚好客户端的timeout比服务器端处理时间短的话,就会出现客户端连接池中的连接被无故用掉一个的状况,而且是没法回收的,最终就会导致很快客户端的连接池被占满的现象,在分析JBoss Remoting 2.2.2的代码后发现了问题的所在,同时查看了下JBoss Remoting 2.4的代码,发现在2.4中此bug已被修复。
来看下JBoss Remoting 2.2.2中有问题的这段代码的片断:
synchronized
(usedPoolLock)
{
if
(pooled
!=
null
)
{
usedPooled
++
;
if
(trace) log.trace(
this
+
"
got a socket, usedPooled:
"
+
usedPooled);
break
;
}
if
(usedPooled
<
maxPoolSize)
{
//
Try to get a socket.
if
(trace) log.trace(
this
+
"
getting a socket, usedPooled:
"
+
usedPooled);
usedPooled
++
;
}
else
{
retry
=
true
;
if
(trace) log.trace(
this
+
"
will try again to get a socket
"
);
}
}
Socket socket
=
null
;
long
timestamp
=
System.currentTimeMillis();
try
{
if
(trace) { log.trace(
this
+
"
creating socket
"
+
(counter
++
)
+
"
, attempt
"
+
(i
+
1
)); }
socket
=
createSocket(address.address, address.port, timeRemaining);
if
(trace) log.trace(
this
+
"
created socket:
"
+
socket);
}
catch
(Exception ex)
{
log.debug(
this
+
"
got Exception
"
+
ex
+
"
, creation attempt took
"
+
(System.currentTimeMillis()
-
timestamp)
+
"
ms
"
);
synchronized
(usedPoolLock)
{
usedPooled
--
;
}
if
(i
+
1
<
numberOfRetries)
{
Thread.sleep(
1
);
continue
;
}
throw
ex;
}
socket.setTcpNoDelay(address.enableTcpNoDelay);
Map metadata
=
getLocator().getParameters();
if
(metadata
==
null
)
{
metadata
=
new
HashMap(
2
);
}
else
{
metadata
=
new
HashMap(metadata);
}
metadata.put(SocketWrapper.MARSHALLER, marshaller);
metadata.put(SocketWrapper.UNMARSHALLER, unmarshaller);
if
(timeAllowed
>
0
)
{
timeRemaining
=
(
int
) (timeAllowed
-
(System.currentTimeMillis()
-
start));
if
(timeRemaining
<=
0
)
break
;
metadata.put(SocketWrapper.TEMP_TIMEOUT,
new
Integer(timeRemaining));
}
pooled
=
createClientSocket(socket, address.timeout, metadata);
这段代码的问题出在哪呢,就出在最后一行,或者说出在前面实现给usedPooled++上也可以。
在这里JBoss Remoting过于相信createClientSocket这行代码了,jboss remoting认为这行代码是不可能抛出异常的,但事实上其实这行是有可能会抛出异常的,可以想下,如果这行代码执行抛出异常的话,会造成的现象就是之前说的,客户端连接池中的连接被占用了一个,而且没有回收的地方。
所以最简单的方法自然是在pooled=createClientSocket(socket, address.timeout, metadata);这行代码上增加捕捉try...catch,如果有异常抛出的话,则将usedPooled--;就像之前createSocket那个地方一样。
在JBoss Remoting 2.4中,jboss不再采用usedPooled这个long型加上usedPoolLock这个对象锁的方式来控制连接池,而是改为了采用更简单好用的Semphore,不过用的还是EDG包的,而不是java 5的,来看看jboss remoting 2.4中的这段代码改成什么样了:
boolean
timedout
=
!
semaphore.attempt(timeToWait);
if
(trace) log.trace(
this
+
"
obtained semaphore:
"
+
semaphore.permits());
if
(timedout)
{
throw
new
IllegalStateException(
"
Timeout waiting for a free socket
"
);
}
SocketWrapper pooled
=
null
;
if
(tryPool)
{
synchronized
(pool)
{
//
if connection within pool, use it
if
(pool.size()
>
0
)
{
pooled
=
getPooledConnection();
if
(trace) log.trace(
this
+
"
reusing pooled connection:
"
+
pooled);
}
}
}
else
{
if
(trace) log.trace(
this
+
"
avoiding connection pool, creating new socket
"
);
}
if
(pooled
==
null
)
{
//
Need to create a new one
Socket socket
=
null
;
if
(trace) { log.trace(
this
+
"
creating socket
"
); }
//
timeAllowed < 0 indicates no per invocation timeout has been set.
int
timeRemaining
=
-
1
;
if
(
0
<=
timeAllowed)
{
timeRemaining
=
(
int
) (timeAllowed
-
(System.currentTimeMillis()
-
start));
}
socket
=
createSocket(address.address, address.port, timeRemaining);
if
(trace) log.trace(
this
+
"
created socket:
"
+
socket);
socket.setTcpNoDelay(address.enableTcpNoDelay);
Map metadata
=
getLocator().getParameters();
if
(metadata
==
null
)
{
metadata
=
new
HashMap(
2
);
}
else
{
metadata
=
new
HashMap(metadata);
}
metadata.put(SocketWrapper.MARSHALLER, marshaller);
metadata.put(SocketWrapper.UNMARSHALLER, unmarshaller);
if
(timeAllowed
>
0
)
{
timeRemaining
=
(
int
) (timeAllowed
-
(System.currentTimeMillis()
-
start));
if
(timeRemaining
<=
0
)
throw
new
IllegalStateException(
"
Timeout creating a new socket
"
);
metadata.put(SocketWrapper.TEMP_TIMEOUT,
new
Integer(timeRemaining));
}
pooled
=
createClientSocket(socket, address.timeout, metadata);
}
return
pooled;
从以上代码可以看到,JBoss首先是通过semphore.attempt的方式来获取信号量锁,然后就在下面的所有代码中都不做异常的捕捉,jboss在这里改为了在外面统一捕捉这个方法的所有异常,并在有异常的情况下再调用semphore.release():
try
{
boolean
tryPool
=
retryCount
<
(numberOfCallRetries
-
1
)
||
maxPoolSize
==
1
||
numberOfCallRetries
==
1
;
long
l
=
System.currentTimeMillis();
socketWrapper
=
getConnection(marshaller, unmarshaller, tryPool, timeLeft);
long
d
=
System.currentTimeMillis()
-
l;
if
(trace) log.trace(
"
took
"
+
d
+
"
ms to get socket
"
+
socketWrapper);
}
catch
(Exception e)
{
//
if (bailOut)
//
return null;
semaphore.release();
if
(trace) log.trace(
this
+
"
released semaphore:
"
+
semaphore.permits(), e);
sockEx
=
new
CannotConnectException(
"
Can not get connection to server. Problem establishing
"
+
"
socket connection for
"
+
locator, e);
continue
;
}
这样自然是不会再出现2.2.2版本里的那个bug了。
:),由于2.4是重构为了采用semphore,不知道这个bug是刚好凑巧被这样修复了呢,还是知道了这个bug进行fixed,呵呵,不管如何,总之bug是被修订了。
这个bug对于使用jboss remoting的同学们而言还是要引起注意的,因为jboss remoting 2.2.2是jboss as 4.2.2中默认带的版本。
从上面的代码可以看到使用semphore这样的方式来控制并发的需要限制大小的数据结构是非常好的,简单易用,以前的那些long+Object Lock的方式实在是繁琐,另外这个bug也给大家提了醒,在并发的这些资源的控制上千万要注意锁以及释放的点,千万不要主观的认为某些代码是绝对不会出问题的。