For details, see Chapter 6, "Performance Tuning", of the Oracle® Coherence Administrator's Guide. For the AIX environment in this project, we recommend adjusting the following parameters:
The default socket buffer sizes are usually small, and Coherence then logs the following warning:
UnicastUdpSocket failed to set receive buffer size to 1428 packets (2096304
bytes); actual size is 89 packets (131071 bytes). Consult your OS documentation
regarding increasing the maximum socket buffer size. Proceeding with the actual
value may cause sub-optimal performance.
Run the following commands as root to raise the limits:
no -o rfc1323=1
no -o sb_max=4194304
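The numbers in this warning follow from the packet size: 1428 packets × 1468 bytes = 2,096,304 bytes requested, and the 131,071 bytes actually granted hold only 89 whole packets. The 1468-byte packet size is inferred from that arithmetic and is an assumption here. Setting sb_max=4194304 (4 MB) leaves headroom above the roughly 2 MB requested. The following minimal Java sketch assumes you want to confirm, after running the commands above, that the OS now grants a receive buffer of the requested size. `RecvBufferCheck` is a hypothetical helper, not a Coherence API:

```java
import java.net.DatagramSocket;
import java.net.SocketException;

// Hypothetical diagnostic: requests a UDP receive buffer of the size Coherence asked for
// and reports what the OS actually granted.
class RecvBufferCheck {
    // Assumed Coherence packet size in bytes (inferred from the warning message).
    static final int PACKET_SIZE = 1468;

    // Number of whole packets that fit into a buffer of the given size.
    static int packets(int bufferBytes, int packetSize) {
        return bufferBytes / packetSize;
    }

    // Asks for a receive buffer of requestedBytes and returns the size the OS granted.
    static int grantedReceiveBuffer(int requestedBytes) throws SocketException {
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.setReceiveBufferSize(requestedBytes);
            return socket.getReceiveBufferSize();
        }
    }

    public static void main(String[] args) throws SocketException {
        int requested = 1428 * PACKET_SIZE; // 2,096,304 bytes, as in the warning
        int granted = grantedReceiveBuffer(requested);
        System.out.printf("requested %d bytes (%d packets), granted %d bytes (%d packets)%n",
                requested, packets(requested, PACKET_SIZE),
                granted, packets(granted, PACKET_SIZE));
        if (granted < requested) {
            System.out.println("OS limit is still too low; check sb_max and rfc1323.");
        }
    }
}
```

Run it on the same host as the Coherence node. Some operating systems report a different value from the one requested (Linux, for example, reports double), so the check only treats a smaller granted size as a problem.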
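On AIX 5.2 and later, multicast uses IPv6 by default. When starting Coherence servers and applications, pass the following JVM system property to force IPv4 (no spaces around "="):
-Djava.net.preferIPv4Stack=true
Also set hosts=local,bind4 in /etc/netsvc.conf.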
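A missing or mistyped property is easy to overlook, so a node can check for it at startup and fail fast. The property is read when the JVM's networking classes are first initialized, so it has to be given on the command line. The check below only verifies that it was passed. The following is a minimal sketch, and `IPv4StackGuard` is a hypothetical helper you would call from your own startup code:

```java
// Hypothetical startup guard: refuses to continue if the JVM was launched without
// -Djava.net.preferIPv4Stack=true. It only checks the property; it cannot enable IPv4.
class IPv4StackGuard {
    static final String PROPERTY = "java.net.preferIPv4Stack";

    // Boolean.parseBoolean(null) is false, so an absent property counts as "not preferred".
    static boolean isIPv4Preferred() {
        return Boolean.parseBoolean(System.getProperty(PROPERTY));
    }

    static void requireIPv4Stack() {
        if (!isIPv4Preferred()) {
            throw new IllegalStateException(
                    "Start the JVM with -D" + PROPERTY + "=true (no spaces around '=')");
        }
    }

    public static void main(String[] args) {
        requireIPv4Stack();
        System.out.println(PROPERTY + " is set; the JVM will prefer IPv4.");
    }
}
```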
A node in an OutOfMemoryError state harms the whole cluster, so such a node should exit rather than try to recover. To achieve this, add the following to the IBM JVM startup parameters:
UNIX:
-Xdump:tool:events=throw,filter=java/lang/OutOfMemoryError,exec="kill -9 %pid"
IBM does not recommend a fixed-size heap for its JVM, so we suggest setting only -Xms and not -Xmx. For details, see: http://www.ibm.com/developerworks/java/jdk/diagnosis/
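If -Xmx is not set, the maximum heap falls back to the JVM's built-in default. It is therefore worth checking which limits each node actually ends up with. The following minimal sketch uses only java.lang.Runtime, and `HeapLimits` is a hypothetical class name:

```java
// Prints the heap limits the running JVM is actually using, to verify the effect of
// -Xms / -Xmx on a node. All values come from java.lang.Runtime.
class HeapLimits {
    static long toMb(long bytes) {
        return bytes / (1024 * 1024);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): the most the heap may grow to (Long.MAX_VALUE if there is no limit)
        System.out.println("max heap   (MB): " + toMb(rt.maxMemory()));
        // totalMemory(): heap currently reserved by the JVM
        System.out.println("total heap (MB): " + toMb(rt.totalMemory()));
        // freeMemory(): unused part of the reserved heap
        System.out.println("free heap  (MB): " + toMb(rt.freeMemory()));
    }
}
```

Run it with the same JVM options as the Coherence node to see the maximum that the default produces on that machine.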
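Coherence has its own logging framework. It also supports log4j, SLF4J, and Java logging, which gives applications a common logging environment. Coherence logs on a dedicated, low-priority thread to reduce the impact of logging on critical parts of the system. Logging is preconfigured, and you can change the default settings as needed.
The Coherence log level determines which log messages are emitted. The default level emits errors, warnings, informational messages, and some debug messages. During development, raise the log level to its maximum so that all debug messages are recorded. In production, level 3 is reasonable. The higher the level, the more detailed the output. The default is 5. The log levels are: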
· 0 – This level includes messages that are not associated with a logging level.
· 1 – This level includes the previous level's messages plus error messages.
· 2 – This level includes the previous levels' messages plus warning messages.
· 3 – This level includes the previous levels' messages plus informational messages.
· 4-9 – These levels include the previous levels' messages plus internal debugging messages. More log messages are emitted as the log level is increased. The default log level is 5.
· -1 – No log messages are emitted.
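The levels are cumulative. A configured level lets through every message whose level is at or below it, and -1 suppresses everything. The following Java sketch models that rule for illustration only. `LogLevelFilter` is hypothetical and does not show how Coherence implements its logger:

```java
// Illustrative model of the level rules above (not a Coherence API): a message of
// level m is written when the configured level c is >= m, and nothing is written when c is -1.
class LogLevelFilter {
    static boolean isEmitted(int configuredLevel, int messageLevel) {
        if (configuredLevel < 0) {
            return false; // -1: no log messages at all
        }
        return messageLevel <= configuredLevel;
    }

    public static void main(String[] args) {
        int configured = 3; // the production setting suggested above
        String[] names = {"unleveled", "error", "warning", "info", "debug"};
        for (int level = 0; level < names.length; level++) {
            System.out.println(names[level] + " (level " + level + "): "
                    + (isEmitted(configured, level) ? "logged" : "suppressed"));
        }
    }
}
```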
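You can configure the Coherence log level in the tangosol-coherence-override.xml file, as shown below:
<logging-config>
  <destination system-property="tangosol.coherence.log">log4j</destination>
  <severity-level system-property="tangosol.coherence.log.level">3</severity-level>
</logging-config>
Because both elements declare a system-property, you can also override them at startup, for example with -Dtangosol.coherence.log.level=3.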
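If the Coherence or application log files become numerous or large, clean them up promptly so that they do not fill the disk. Check the Coherence logs regularly. Pay attention to messages at warning level and above, and watch especially for the following problems: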
1. Un-indexed data access: log entries to watch for
1) at com.tangosol...readSerializable(ExternalizableHelper.java:2180
2) YYYY-MM-DD HH:MM:SS.mmm/55.838 Oracle Coherence GE 12.1.2.0.0<…> . . . Timeout while delivering a packet; requesting the departure confirmation for Member(. . . ) by MemberSet(. . . )
2. Heap exhaustion: log entries to watch for
java.lang.OutOfMemoryError: GC overhead limit exceeded Dumping heap to java_pid6199.hprof. . .
Heap dump file created [16864871 bytes in 1.921 secs]
3. Unresponsive service
(thread=Cluster, member=2): Detected soft timeout) of {WrapperGuardable Guard{Daemon=DistributedCache}
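4. Messages related to swapping (a GC pause of almost 20 seconds on a heap of about 1 GB usually indicates that the heap is being paged to swap)
2013/09/17 10:20:26 | [GC 938176K->865107K(1021376K), 19.7179554 secs]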
5. Potential bandwidth messages
a) Experienced a XXX ms communication delay (probable remote GC) with Member YYY
b) A potential communication problem has been detected.
c) This node appears to have become disconnected
6. Potential disconnect messages
a) (thread=Cluster, member=5): Failed to reach address /192.168.1.103 within the IpMonitor timeout. Members [Member(Id=3. . . )] are suspect.
b) (thread=Cluster, member=5): Timed-out members MemberSet(Size=4, BitSetCount=2 Member(Id=1, Timestamp=2011-02-05
7. Detecting split brain: cluster split-brain messages
a) 2013-01-25 08:16:59.555/638.831 Oracle Coherence GE 12.1.2.0.0/465p4 <D5> An existence of a cluster island
b) 2010-01-25 09:38:43.213/460.877 Oracle Coherence GE 12.1.2.0.0/465p4 Received panic from senior Member, . . .
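Checking for these messages by hand is tedious, so a small scanner can flag candidate lines. The following minimal Java sketch assumes plain-text log files. The class name, the category labels, and the 5-second GC threshold are assumptions. The match strings come from the sample messages above, and their exact wording may differ between Coherence versions:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical scanner that tags log lines matching the sample messages listed above.
class CoherenceLogScanner {
    // Message fragment -> problem category (categories follow the numbered list above).
    private static final Map<String, String> SIGNATURES = new LinkedHashMap<>();
    static {
        SIGNATURES.put("readSerializable(ExternalizableHelper", "un-indexed data access");
        SIGNATURES.put("Timeout while delivering a packet", "un-indexed data access");
        SIGNATURES.put("OutOfMemoryError", "heap exhaustion");
        SIGNATURES.put("Detected soft timeout", "unresponsive service");
        SIGNATURES.put("communication delay", "potential bandwidth problem");
        SIGNATURES.put("potential communication problem", "potential bandwidth problem");
        SIGNATURES.put("appears to have become disconnected", "potential bandwidth problem");
        SIGNATURES.put("within the IpMonitor timeout", "potential disconnect");
        SIGNATURES.put("Timed-out members", "potential disconnect");
        SIGNATURES.put("existence of a cluster island", "split brain");
        SIGNATURES.put("panic from senior", "split brain");
    }

    // Matches GC log lines such as "[GC 938176K->865107K(1021376K), 19.7179554 secs]".
    private static final Pattern GC_PAUSE =
            Pattern.compile("\\[GC .*?([0-9]+\\.[0-9]+)\\s*secs\\]");

    // Assumed threshold: a GC pause longer than this many seconds hints at swapping.
    static final double SLOW_GC_SECONDS = 5.0;

    // Returns the problem category for a line, or null if nothing suspicious is found.
    static String classify(String line) {
        for (Map.Entry<String, String> entry : SIGNATURES.entrySet()) {
            if (line.contains(entry.getKey())) {
                return entry.getValue();
            }
        }
        Matcher m = GC_PAUSE.matcher(line);
        if (m.find() && Double.parseDouble(m.group(1)) > SLOW_GC_SECONDS) {
            return "long GC pause (possible swapping)";
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java CoherenceLogScanner <coherence-log-file>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        for (int i = 0; i < lines.size(); i++) {
            String category = classify(lines.get(i));
            if (category != null) {
                System.out.println((i + 1) + ": [" + category + "] " + lines.get(i));
            }
        }
    }
}
```

The output only points at lines worth reading. It does not replace reviewing the surrounding log context.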
Several tools can monitor a Coherence cluster. The main ones are:
1. Using JMX to Manage Oracle Coherence
JMX tools, mainly JConsole or Java VisualVM.
2. Using Oracle Coherence Reporting
A built-in Coherence feature that generates statistical reports in text format.
3. Using Oracle WebLogic Server
The WebLogic Console can monitor the health of Coherence nodes and start and stop them.
4. Using Oracle Enterprise Manager
This means the Management Pack for Oracle Coherence in OEM. For details, see: https://docs.oracle.com/cd/E24628_01/install.121/e24215/coherence_getstarted.htm
To monitor with a JMX tool, modify the Coherence startup script and add the following parameters:
-Dcom.sun.management.jmxremote -Dtangosol.coherence.management=all -Dtangosol.coherence.management.remote=true
For remote monitoring, also add:
-Dcom.sun.management.jmxremote.host=10.46.158.140 -Dcom.sun.management.jmxremote.port=7091 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false
If the connection still fails, also add:
-Dcom.sun.management.jmxremote.local.only=false
To reduce the impact on cluster performance, configure the JMX parameters above on only one node in the cluster. You do not need to configure them on every node.
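With the parameters above in place, any JMX client can read the Coherence MBeans remotely. The following minimal Java sketch uses the standard javax.management.remote API. It assumes the host and port from the example (10.46.158.140:7091), the default jmxrmi RMI connector path, and no authentication or SSL. `CoherenceJmxClient` is a hypothetical name:

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Minimal remote JMX client for the management node configured above.
// Assumes jmxremote.authenticate=false and jmxremote.ssl=false, as in that configuration.
class CoherenceJmxClient {
    // RMI connector URL exposed by the com.sun.management.jmxremote agent.
    static String serviceUrl(String host, int port) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
    }

    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "10.46.158.140";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 7091;
        JMXServiceURL url = new JMXServiceURL(serviceUrl(host, port));
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            ObjectName cluster = new ObjectName("Coherence:type=Cluster");
            // ClusterName and ClusterSize are attributes of Coherence's ClusterMBean.
            System.out.println("cluster: " + conn.getAttribute(cluster, "ClusterName"));
            System.out.println("members: " + conn.getAttribute(cluster, "ClusterSize"));
        }
    }
}
```

JConsole and VisualVM connect to the same URL, so if this client fails, a GUI tool will fail in the same way.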
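A JMX tool only observes the cluster while the tool itself is running. OEM, by contrast, stores the collected monitoring data in a database, so you can review historical data.
Monitoring Coherence is mainly about memory. If memory is not reclaimed in time and is about to run out, trigger a GC manually. Both JConsole and Java VisualVM can do this, as described below.
Watch metrics such as publisher success rate, receiver success rate, and send queue size, and check that every node has enough memory (for example, the free memory metric).
Check that all services are in a normal state and that task average duration and request average duration look normal. Task backlog should be 0.
For example, the service below is in an abnormal state: it is ENDANGERED, and its request average duration is very high.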
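You can also script these checks against the Coherence MBeans. The sketch below assumes an MBeanServerConnection to a node with management enabled. The attribute names (StatusHA, TaskBacklog, RequestAverageDuration, TaskAverageDuration, MemoryAvailableMB, PublisherSuccessRate, ReceiverSuccessRate, SendQueueSize) are those of Coherence's ServiceMBean and ClusterNodeMBean. The 100 ms threshold and the class name are assumptions. Some services may not report every attribute in the same way, so treat this as a starting point:

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

// Sketch of the service and node checks described above, run over any MBeanServerConnection.
class CoherenceHealthCheck {
    // Assumed limit for an "abnormally high" request average duration, in milliseconds.
    static final double MAX_REQUEST_AVG_MS = 100.0;

    // A service is flagged if it is ENDANGERED, has queued tasks, or responds slowly.
    static boolean isServiceHealthy(String statusHA, long taskBacklog, double requestAvgMs) {
        return !"ENDANGERED".equals(statusHA)
                && taskBacklog == 0
                && requestAvgMs <= MAX_REQUEST_AVG_MS;
    }

    static void report(MBeanServerConnection conn) throws JMException, IOException {
        for (ObjectName s : conn.queryNames(new ObjectName("Coherence:type=Service,*"), null)) {
            String status = String.valueOf(conn.getAttribute(s, "StatusHA"));
            long backlog = ((Number) conn.getAttribute(s, "TaskBacklog")).longValue();
            double requestAvg = ((Number) conn.getAttribute(s, "RequestAverageDuration")).doubleValue();
            double taskAvg = ((Number) conn.getAttribute(s, "TaskAverageDuration")).doubleValue();
            System.out.printf("%s status=%s backlog=%d requestAvg=%.1fms taskAvg=%.1fms%s%n",
                    s, status, backlog, requestAvg, taskAvg,
                    isServiceHealthy(status, backlog, requestAvg) ? "" : "  <-- check");
        }
        for (ObjectName n : conn.queryNames(new ObjectName("Coherence:type=Node,*"), null)) {
            System.out.printf("node %s freeMB=%s publisherRate=%s receiverRate=%s sendQ=%s%n",
                    n.getKeyProperty("nodeId"),
                    conn.getAttribute(n, "MemoryAvailableMB"),
                    conn.getAttribute(n, "PublisherSuccessRate"),
                    conn.getAttribute(n, "ReceiverSuccessRate"),
                    conn.getAttribute(n, "SendQueueSize"));
        }
    }

    public static void main(String[] args) throws Exception {
        // Local platform MBean server; pass a remote connection to monitor another node.
        // Prints nothing if no Coherence MBeans are registered in this JVM.
        report(ManagementFactory.getPlatformMBeanServer());
    }
}
```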
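As shown in the figure below, VisualVM can monitor the CPU and memory usage of a specific node and can trigger a GC manually.
JConsole can monitor the CPU, memory, and process information of a specific Coherence node, and can also trigger a GC manually.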
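Both tools trigger a manual GC through the standard JVM memory MBean, so you can also trigger one from code. The minimal sketch below invokes the gc operation on java.lang:type=Memory (the MemoryMXBean) over any MBeanServerConnection. `TriggerGc` is a hypothetical name, and a JVM started with explicit GC disabled may ignore the request:

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;

// Requests a GC through the MemoryMXBean, locally or over a remote JMX connection.
class TriggerGc {
    // Currently used heap in bytes, read from the HeapMemoryUsage attribute.
    static long heapUsed(MBeanServerConnection conn) {
        try {
            ObjectName memory = new ObjectName(ManagementFactory.MEMORY_MXBEAN_NAME);
            CompositeData usage = (CompositeData) conn.getAttribute(memory, "HeapMemoryUsage");
            return (Long) usage.get("used");
        } catch (JMException | IOException e) {
            throw new IllegalStateException("cannot read heap usage", e);
        }
    }

    // Asks the target JVM to run a GC (MemoryMXBean.gc()).
    static void gc(MBeanServerConnection conn) {
        try {
            conn.invoke(new ObjectName(ManagementFactory.MEMORY_MXBEAN_NAME), "gc", null, null);
        } catch (JMException | IOException e) {
            throw new IllegalStateException("GC request failed", e);
        }
    }

    public static void main(String[] args) {
        // Local JVM here; use the connection from a remote JMX client to target a Coherence node.
        MBeanServerConnection conn = ManagementFactory.getPlatformMBeanServer();
        long before = heapUsed(conn);
        gc(conn);
        long after = heapUsed(conn);
        System.out.printf("heap used: %d MB before GC, %d MB after%n",
                before / (1024 * 1024), after / (1024 * 1024));
    }
}
```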
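In addition, JConsole's MBeans tab exposes more detailed information. This is where JConsole is stronger than VisualVM.
When you manage Coherence through JMX, the MBean data gives a concise view of cluster operations for real-time monitoring and analysis. The Coherence-JVisualVM plug-in shows a wide range of Coherence information, such as the cluster's Machines, Members, Services, and Caches.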
The Coherence MBeans are listed below:

| MBean | Description |
| --- | --- |
| CacheMBean | Represents a cache. A cluster member includes zero or more instances of this managed bean. |
| ClusterMBean | Represents a cluster. Each cluster member includes a single instance of this managed bean. |
| ClusterNodeMBean | Represents a cluster member. Each cluster member includes a single instance of this managed bean. |
| ConnectionManagerMBean | Represents an Oracle Coherence*Extend proxy. A cluster member includes zero or more instances of this managed bean. |
| ConnectionMBean | Represents a remote client connection through Oracle Coherence*Extend. A cluster member includes zero or more instances of this managed bean. |
| FlashJournalRM | Represents a flash journal resource manager. The managed bean is an instance of the JournalMBean interface. Each cluster member includes a single instance of this managed bean. |
| ManagementMBean | Represents the grid JMX infrastructure. Each cluster member includes a single instance of this managed bean. |
| PointToPointMBean | Represents the network status between two cluster members. Each cluster member includes a single instance of this managed bean. |
| RamJournalRM | Represents a RAM journal resource manager. The managed bean is an instance of the JournalMBean interface. Each cluster member includes a single instance of this managed bean. |
| ReporterMBean | Represents the Oracle Coherence reporter. Each cluster member includes a single instance of this managed bean. |
| ServiceMBean | Represents a clustered service. A cluster member includes zero or more instances of this managed bean. |
| StorageManagerMBean | Represents a storage instance for a storage-enabled distributed cache service. A cluster member includes zero or more instances of this managed bean. |
| TransactionManagerMBean | Represents a transaction manager. A cluster member includes zero or more instances of this managed bean. |
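To see which of these MBeans a member actually registers, you can query the Coherence JMX domain and group the names by their type key. Note that the type key is not always the interface name. ClusterNodeMBean instances, for example, appear as type=Node. The following minimal sketch assumes the default domain name Coherence, and `CoherenceMBeanInventory` is hypothetical:

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.util.Map;
import java.util.TreeMap;
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Counts registered Coherence MBeans per "type" key property (Cache, Node, Service, ...).
class CoherenceMBeanInventory {
    static Map<String, Integer> countByType(MBeanServerConnection conn) {
        Map<String, Integer> counts = new TreeMap<>();
        try {
            for (ObjectName name : conn.queryNames(new ObjectName("Coherence:*"), null)) {
                String type = name.getKeyProperty("type");
                counts.merge(type == null ? "(none)" : type, 1, Integer::sum);
            }
        } catch (MalformedObjectNameException | IOException e) {
            throw new IllegalStateException("MBean query failed", e);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Local platform MBean server: run inside a Coherence member, or substitute a
        // remote MBeanServerConnection. An empty result means no Coherence MBeans are registered.
        countByType(ManagementFactory.getPlatformMBeanServer())
                .forEach((type, n) -> System.out.println(type + ": " + n));
    }
}
```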