Question:

Persistent GC problems in Cassandra: long application pauses

欧奇希
2023-03-14

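We are running a Cassandra 1.1.12 cluster across 3 sites with 3 nodes per site, and each node has 8 GB of heap assigned. We regularly see long GC pauses on the nodes, which break our application's real-time requirements. The machines are 8-core systems with 24 GB of RAM.

We have seen stop-the-world GC pauses of 120 seconds.

We are running on JDK 1.7.0_04 with these flags: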

-XX:+UseThreadPriorities 
-XX:ThreadPriorityPolicy=42 
-Xms8G 
-Xmx8G 
-Xmn1600M 
-XX:+HeapDumpOnOutOfMemoryError 
-Xss180k 
-XX:+UseParNewGC 
-XX:+UseConcMarkSweepGC 
-XX:+CMSParallelRemarkEnabled 
-XX:SurvivorRatio=8 
-XX:MaxTenuringThreshold=1 
-XX:CMSInitiatingOccupancyFraction=75 
-XX:+UseCMSInitiatingOccupancyOnly 
-XX:+PrintGCDetails 
-XX:+PrintGCDateStamps 
-XX:+PrintHeapAtGC 
-XX:+PrintTenuringDistribution 
-XX:+PrintGCApplicationStoppedTime 
-XX:+PrintPromotionFailure 
-XX:PrintFLSStatistics=1 

Here is the detailed GC log leading up to a long pause:


2014-02-23T11:50:19.231-0500: 2119627.980: [GC Before GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 145905916
Max   Chunk Size: 3472057
Number of Blocks: 160577
Av.  Block  Size: 908
Tree      Height: 146
Before GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 0
Max   Chunk Size: 0
Number of Blocks: 0
Tree      Height: 0
2119627.981: [ParNew
Desired survivor size 83886080 bytes, new threshold 1 (max 1)
- age   1:   32269664 bytes,   32269664 total
: 1356995K->44040K(1474560K), 4.5270760 secs] 5829345K->4546031K(8224768K)After GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 144857324
Max   Chunk Size: 2423465
Number of Blocks: 160577
Av.  Block  Size: 902
Tree      Height: 146
After GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 0
Max   Chunk Size: 0
Number of Blocks: 0
Tree      Height: 0
, 4.5295190 secs] [Times: user=24.78 sys=0.07, real=4.52 secs] 
Heap after GC invocations=82068 (full 2561):
 par new generation   total 1474560K, used 44040K [0x00000005fae00000, 0x000000065ee00000, 0x000000065ee00000)
  eden space 1310720K,   0% used [0x00000005fae00000, 0x00000005fae00000, 0x000000064ae00000)
  from space 163840K,  26% used [0x000000064ae00000, 0x000000064d902028, 0x0000000654e00000)
  to   space 163840K,   0% used [0x0000000654e00000, 0x0000000654e00000, 0x000000065ee00000)
 concurrent mark-sweep generation total 6750208K, used 4501991K [0x000000065ee00000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 40012K, used 23966K [0x00000007fae00000, 0x00000007fd513000, 0x0000000800000000)
}
Total time for which application threads were stopped: 4.5524690 seconds
{Heap before GC invocations=82068 (full 2561):
 par new generation   total 1474560K, used 1354760K [0x00000005fae00000, 0x000000065ee00000, 0x000000065ee00000)
  eden space 1310720K, 100% used [0x00000005fae00000, 0x000000064ae00000, 0x000000064ae00000)
  from space 163840K,  26% used [0x000000064ae00000, 0x000000064d902028, 0x0000000654e00000)
  to   space 163840K,   0% used [0x0000000654e00000, 0x0000000654e00000, 0x000000065ee00000)
 concurrent mark-sweep generation total 6750208K, used 4501991K [0x000000065ee00000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 40012K, used 23966K [0x00000007fae00000, 0x00000007fd513000, 0x0000000800000000)
2014-02-23T11:51:14.221-0500: 2119682.970: [GC Before GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 144857324
Max   Chunk Size: 2423465
Number of Blocks: 160577
Av.  Block  Size: 902
Tree      Height: 146
Before GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 0
Max   Chunk Size: 0
Number of Blocks: 0
Tree      Height: 0
2119682.971: [ParNew
Desired survivor size 83886080 bytes, new threshold 1 (max 1)
- age   1:   41744088 bytes,   41744088 total
: 1354760K->52443K(1474560K), 2.1589280 secs] 5856751K->4582809K(8224768K)After GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 143937754
Max   Chunk Size: 1505947
Number of Blocks: 160575
Av.  Block  Size: 896
Tree      Height: 146
After GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 0
Max   Chunk Size: 0
Number of Blocks: 0
Tree      Height: 0
, 2.1613420 secs] [Times: user=12.82 sys=0.04, real=2.16 secs] 
Heap after GC invocations=82069 (full 2561):
 par new generation   total 1474560K, used 52443K [0x00000005fae00000, 0x000000065ee00000, 0x000000065ee00000)
  eden space 1310720K,   0% used [0x00000005fae00000, 0x00000005fae00000, 0x000000064ae00000)
  from space 163840K,  32% used [0x0000000654e00000, 0x0000000658136e38, 0x000000065ee00000)
  to   space 163840K,   0% used [0x000000064ae00000, 0x000000064ae00000, 0x0000000654e00000)
 concurrent mark-sweep generation total 6750208K, used 4530365K [0x000000065ee00000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 40012K, used 23966K [0x00000007fae00000, 0x00000007fd513000, 0x0000000800000000)
}
Total time for which application threads were stopped: 2.1719930 seconds
{Heap before GC invocations=82069 (full 2561):
 par new generation   total 1474560K, used 1363163K [0x00000005fae00000, 0x000000065ee00000, 0x000000065ee00000)
  eden space 1310720K, 100% used [0x00000005fae00000, 0x000000064ae00000, 0x000000064ae00000)
  from space 163840K,  32% used [0x0000000654e00000, 0x0000000658136e38, 0x000000065ee00000)
  to   space 163840K,   0% used [0x000000064ae00000, 0x000000064ae00000, 0x0000000654e00000)
 concurrent mark-sweep generation total 6750208K, used 4530365K [0x000000065ee00000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 40012K, used 23966K [0x00000007fae00000, 0x00000007fd513000, 0x0000000800000000)
2014-02-23T11:52:33.089-0500: 2119761.837: [GC Before GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 143937754
Max   Chunk Size: 1505947
Number of Blocks: 160575
Av.  Block  Size: 896
Tree      Height: 146
Before GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 0
Max   Chunk Size: 0
Number of Blocks: 0
Tree      Height: 0
2119761.839: [ParNew
Desired survivor size 83886080 bytes, new threshold 1 (max 1)
- age   1:   37906760 bytes,   37906760 total
: 1363163K->48710K(1474560K), 3.5105890 secs] 5893529K->4611208K(8224768K)After GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 142756036
Max   Chunk Size: 326281
Number of Blocks: 160573
Av.  Block  Size: 889
Tree      Height: 146
After GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 0
Max   Chunk Size: 0
Number of Blocks: 0
Tree      Height: 0
, 3.5130550 secs] [Times: user=18.81 sys=0.08, real=3.52 secs] 
Heap after GC invocations=82070 (full 2561):
 par new generation   total 1474560K, used 48710K [0x00000005fae00000, 0x000000065ee00000, 0x000000065ee00000)
  eden space 1310720K,   0% used [0x00000005fae00000, 0x00000005fae00000, 0x000000064ae00000)
  from space 163840K,  29% used [0x000000064ae00000, 0x000000064dd91af8, 0x0000000654e00000)
  to   space 163840K,   0% used [0x0000000654e00000, 0x0000000654e00000, 0x000000065ee00000)
 concurrent mark-sweep generation total 6750208K, used 4562497K [0x000000065ee00000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 40012K, used 23966K [0x00000007fae00000, 0x00000007fd513000, 0x0000000800000000)
}
Total time for which application threads were stopped: 3.5236060 seconds
{Heap before GC invocations=82070 (full 2561):
 par new generation   total 1474560K, used 1359430K [0x00000005fae00000, 0x000000065ee00000, 0x000000065ee00000)
  eden space 1310720K, 100% used [0x00000005fae00000, 0x000000064ae00000, 0x000000064ae00000)
  from space 163840K,  29% used [0x000000064ae00000, 0x000000064dd91af8, 0x0000000654e00000)
  to   space 163840K,   0% used [0x0000000654e00000, 0x0000000654e00000, 0x000000065ee00000)
 concurrent mark-sweep generation total 6750208K, used 4562497K [0x000000065ee00000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 40012K, used 23966K [0x00000007fae00000, 0x00000007fd513000, 0x0000000800000000)
2014-02-23T11:55:59.448-0500: 2119968.196: [GC Before GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 142756036
Max   Chunk Size: 326281
Number of Blocks: 160573
Av.  Block  Size: 889
Tree      Height: 146
Before GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 0
Max   Chunk Size: 0
Number of Blocks: 0
Tree      Height: 0
2119968.198: [ParNew (0: promotion failure size = 131074)  (1: promotion failure size = 131074)  (2: promotion failure size = 131074)  (4: promotion failure size = 131074)  (7: promotion failure size = 131074)  (promotion failed)
Desired survivor size 83886080 bytes, new threshold 1 (max 1)
- age   1:   34318480 bytes,   34318480 total
: 1359430K->1353373K(1474560K), 1.5971880 secs]2119969.795: [CMSCMS: Large block 0x000000073aa59270
: 4586726K->3600612K(6750208K), 149.1735470 secs] 5921928K->3600612K(8224768K), [CMS Perm : 23966K->23925K(40012K)]After GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 403131826
Max   Chunk Size: 403131826
Number of Blocks: 1
Av.  Block  Size: 403131826
Tree      Height: 1
After GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 0
Max   Chunk Size: 0
Number of Blocks: 0
Tree      Height: 0
, 150.7724630 secs] [Times: user=28.89 sys=11.57, real=150.75 secs] 
Heap after GC invocations=82071 (full 2562):
 par new generation   total 1474560K, used 0K [0x00000005fae00000, 0x000000065ee00000, 0x000000065ee00000)
  eden space 1310720K,   0% used [0x00000005fae00000, 0x00000005fae00000, 0x000000064ae00000)
  from space 163840K,   0% used [0x0000000654e00000, 0x0000000654e00000, 0x000000065ee00000)
  to   space 163840K,   0% used [0x000000064ae00000, 0x000000064ae00000, 0x0000000654e00000)
 concurrent mark-sweep generation total 6750208K, used 3600612K [0x000000065ee00000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 40012K, used 23925K [0x00000007fae00000, 0x00000007fd513000, 0x0000000800000000)
}
Total time for which application threads were stopped: 150.7833360 seconds

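I have also set up a nightly job that forces a GC on each instance at 2 a.m., hoping it would relieve these problems. It has done some good, but nodes still run into trouble every few days.

-- Added detail: after changing NewGen to 2 GB (from 1600M), we saw a 15-second pause within the first 12 hours of running.

... Class histogram dump: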

15772.786: [Class Histogram (before full gc): 
 num     #instances         #bytes  class name
----------------------------------------------
   1:       9743656      526609104  [B
   2:       9176097      440452656  java.nio.HeapByteBuffer
   3:       8152787      326111480  java.math.BigInteger
   4:       8126173      321393760  [I
   5:        207997      307212904  [J
   6:       8940730      214577520  java.lang.Long
   7:       8121743      194921832  org.apache.cassandra.db.DecoratedKey
   8:       8121399      129942384  org.apache.cassandra.dht.BigIntegerToken
   9:        174374       78049856  [Ljava.lang.Object;
  10:        914261       43884528  edu.stanford.ppl.concurrent.SnapTreeMap$Node
  11:       1112269       35592608  java.util.concurrent.ConcurrentHashMap$HashEntry
  12:       1101827       35258464  com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node
  13:        306955       29467680  edu.stanford.ppl.concurrent.CopyOnWriteManager$COWEpoch
  14:       1216257       29190168  org.apache.cassandra.cache.KeyCacheKey
  15:       1111387       26673288  com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$WeightedValue
  16:        427695       20529360  edu.stanford.ppl.concurrent.SnapTreeMap$RootHolder
  17:        417278       16691120  org.apache.cassandra.db.ExpiringColumn
  18:        306955        9822560  edu.stanford.ppl.concurrent.CopyOnWriteManager$Latch
  19:        401296        9631104  org.apache.cassandra.db.ISortedColumns$DeletionInfo
  20:           305        8691760  [Ljava.util.concurrent.ConcurrentHashMap$HashEntry;
  21:        528460        8455360  java.util.concurrent.atomic.AtomicReference
  22:        263965        8446880  edu.stanford.ppl.concurrent.SnapTreeMap
  23:        342775        8226600  org.apache.cassandra.db.ColumnFamily
  24:        263965        6335160  org.apache.cassandra.db.AtomicSortedColumns$Holder
  25:        193133        6180256  org.apache.cassandra.db.DeletedColumn
  26:        179519        5744608  java.util.ArrayList$Itr
  27:        221515        5316360  java.util.concurrent.ConcurrentSkipListMap$Node
  28:         35607        5025400  
  29:         35607        4852600  
  30:         61748        4445856  org.apache.cassandra.io.sstable.SSTableIdentityIterator
  31:        264409        4230544  edu.stanford.ppl.concurrent.SnapTreeMap$COWMgr
  32:         52605        4216504  [Ljava.util.HashMap$Entry;
  33:          3389        3868680  
  34:        221414        3542624  org.apache.cassandra.db.AtomicSortedColumns
  35:        107140        3428480  org.apache.cassandra.db.Column
  36:         69734        3347232  java.util.TreeMap
  37:        131777        3162648  java.util.ArrayList
  38:        111752        2682048  java.util.concurrent.ConcurrentSkipListMap$Index
  39:         51879        2490192  java.util.HashMap
  40:          3389        2438184  
  41:          2978        2284352  
  42:         55660        1781120  org.apache.cassandra.io.util.DataOutputBuffer
  43:         69664        1671936  org.apache.cassandra.db.TreeMapBackedSortedColumns
  44:         51697        1654304  org.apache.cassandra.db.ArrayBackedSortedColumns
  45:         61842        1485456  [Ljava.nio.ByteBuffer;
  46:         61748        1481952  org.apache.cassandra.utils.BytesReadTracker
  47:          3533        1462144  
  48:         55672        1336128  org.apache.cassandra.io.util.FastByteArrayOutputStream
  49:         21743        1318848  [C
  50:         33109        1268616  [Lorg.apache.cassandra.db.IColumn;
  51:         28607        1144280  java.util.HashMap$KeyIterator
  52:         17623        1127616  [Ljava.util.Hashtable$Entry;
  53:         35003        1120096  java.util.Vector
  54:         17335        1109440  com.sun.jmx.remote.util.OrderClassLoaders
  55:         17601         844848  java.util.Hashtable
  56:         18528         741120  org.apache.cassandra.io.sstable.IndexHelper$IndexInfo
  57:         22634         724288  java.lang.String
  58:         17514         700560  java.security.ProtectionDomain
  59:         19829         634528  java.util.HashMap$Entry
  60:         25327         607848  org.apache.cassandra.utils.IntervalTree.Interval
  61:         17513         560416  java.security.CodeSource
  62:         22026         528624  java.lang.Double
  63:         21740         521760  java.util.concurrent.LinkedBlockingDeque$Node
  64:         18782         486728  [[J
  65:         18660         447840  org.apache.cassandra.utils.BloomFilter
  66:         18660         447840  org.apache.cassandra.utils.obs.OpenBitSet
  67:         18529         444696  org.apache.cassandra.db.compaction.PrecompactedRow
  68:          3717         444688  java.lang.Class
  69:         18528         444672  org.apache.cassandra.db.ColumnIndexer$RowHeader
  70:         18528         444672  org.apache.cassandra.utils.BloomCalculations$BloomSpecification
  71:         24572         393152  java.lang.Object
  72:          5753         335568  [S
  73:          7326         293040  java.util.TreeMap$Entry
  74:         17817         285072  java.util.HashSet
  75:          5583         282488  [[I
  76:         17514         280224  java.security.ProtectionDomain$Key
  77:         17514         280224  [Ljava.security.Principal;
  78:          6612         264480  org.apache.cassandra.service.WriteResponseHandler
  79:          7862         251584  org.apache.cassandra.db.RowMutation
  80:         10083         241992  org.apache.cassandra.db.EchoedRow
  81:          6110         195520  org.apache.cassandra.utils.ExpiringMap$CacheableObject
  82:           311         179136  
  83:          2927         163912  org.apache.cassandra.thrift.TBinaryProtocol
  84:          6563         157512  org.apache.cassandra.net.Message
  85:          6289         150936  org.apache.cassandra.net.Header
  86:          6110         146640  org.apache.cassandra.net.CallbackInfo
  87:          5996         143904  java.util.concurrent.LinkedBlockingQueue$Node
  88:          8004         128064  java.util.HashMap$EntrySet
  89:          7885         126160  java.util.TreeMap$Values
  90:          7790         124640  java.util.concurrent.atomic.AtomicInteger
  91:          5132         123168  org.apache.cassandra.db.commitlog.CommitLog$LogRecordAdder
  92:          6640         106240  org.apache.cassandra.utils.SimpleCondition
  93:          6626         106016  java.util.HashMap$Values
  94:           830          92960  java.net.SocksSocketImpl
  95:          1076          86080  java.lang.reflect.Method
  96:          4176          66816  java.lang.Integer
  97:          2005          64160  java.lang.ThreadLocal$ThreadLocalMap$Entry
  98:          1592          63680  java.lang.ref.SoftReference
  99:          1586          63440  com.google.common.collect.SingletonImmutableMap
 100:          1743          55776  java.util.TreeMap$ValueIterator
 101:           470          48880  java.lang.Thread
 102:          1469          47008  java.util.concurrent.locks.AbstractQueuedSynchronizer$Node
 103:          1041          41640  java.lang.ref.Finalizer
 104:          1002          40080  java.util.LinkedHashMap$Entry
 105:          1603          38472  com.google.common.collect.SingletonImmutableSet
 106:          1586          38064  com.google.common.collect.ImmutableEntry
 107:           435          34800  [Ljava.lang.ThreadLocal$ThreadLocalMap$Entry;
 108:           970          31040  java.net.Inet4Address
 109:           425          30600  java.lang.reflect.Constructor
 110:           396          28512  java.lang.reflect.Field
 111:           825          26400  java.net.Socket
 112:          1050          25200  java.util.concurrent.atomic.AtomicLong
 113:           687          21984  java.util.concurrent.SynchronousQueue$TransferStack$SNode
 114:           387          21672  sun.security.provider.MD5
 115:           451          21648  java.util.concurrent.ThreadPoolExecutor$Worker
 116:           442          21216  java.net.SocketInputStream
 117:           525          21000  org.apache.cassandra.thrift.TCustomSocket
 118:           514          20560  org.apache.cassandra.db.CounterColumn
 119:           639          20448  
 120:           353          19768  org.cliffc.high_scale_lib.ConcurrentAutoTable$CAT
 121:           762          18288  java.io.FileDescriptor
 122:           570          18240  org.apache.thrift.transport.TFramedTransport
 123:           939          18200  [Ljava.lang.Class;
 124:           546          17472  java.util.concurrent.locks.ReentrantLock$NonfairSync
 125:           532          17464  [Ljava.lang.String;
 126:           527          16864  java.util.Hashtable$Entry
 127:           298          16688  org.apache.cassandra.service.ClientState$1
 128:           298          16688  org.apache.cassandra.service.ClientState$2
 129:           405          16200  org.apache.cassandra.thrift.Column
 130:           323          15504  java.net.SocketOutputStream
 131:           298          14304  org.apache.cassandra.service.ClientState
 132:           585          14040  org.apache.thrift.transport.TMemoryInputTransport
 133:           570          13680  org.apache.thrift.TByteArrayOutputStream
 134:           284          13632  sun.nio.cs.UTF_8$Encoder
 135:           405          12960  org.apache.cassandra.thrift.ColumnOrSuperColumn
 136:           324          12960  java.io.BufferedInputStream
 137:           530          12720  org.apache.cassandra.utils.EstimatedHistogram
 138:           525          12600  org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess
 139:           388          12416  java.security.MessageDigest$Delegate
 140:           129          12384  org.apache.cassandra.io.sstable.SSTableReader
 141:           305          12200  java.util.concurrent.ConcurrentHashMap$Segment
 142:           376          12032  org.apache.cassandra.db.columniterator.SSTableSliceIterator
 143:            47          10904  [Z
 144:           334          10688  org.apache.cassandra.net.OutboundTcpConnection$Entry
 145:           330          10560  java.lang.ref.WeakReference
 146:           435          10440  java.lang.ThreadLocal$ThreadLocalMap
 147:           426          10224  java.util.BitSet
 148:           314          10048  org.apache.cassandra.net.MessageDeliveryTask
... [removed anything taking less than 10K]
Total      72302201     2937460496
, 3.1156640 secs]
15775.902: [CMSCMS: Large block 0x0000000714aace48
: 2751863K->2568755K(6340608K), 12.0184460 secs]15787.921: [Class Histogram (after full gc): 
 num     #instances         #bytes  class name
----------------------------------------------
   1:       8644126      451299632  [B
   2:       8434549      404858352  java.nio.HeapByteBuffer
   3:       7759859      310394360  java.math.BigInteger
   4:         10686      293655512  [J
   5:       7763265      248560200  [I
   6:       8630461      207131064  java.lang.Long
   7:       7759615      186230760  org.apache.cassandra.db.DecoratedKey
   8:       7759483      124151728  org.apache.cassandra.dht.BigIntegerToken
   9:          1825       64173960  [Ljava.lang.Object;
  10:       1096613       35091616  java.util.concurrent.ConcurrentHashMap$HashEntry
  11:       1092266       34952512  com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$Node
  12:       1092266       26214384  com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap$WeightedValue
  13:       1092266       26214384  org.apache.cassandra.cache.KeyCacheKey
  14:        532463       25558224  edu.stanford.ppl.concurrent.SnapTreeMap$Node
  15:        221414       21255744  edu.stanford.ppl.concurrent.CopyOnWriteManager$COWEpoch
  16:        411588       16463520  org.apache.cassandra.db.ExpiringColumn
  17:        341705       16401840  edu.stanford.ppl.concurrent.SnapTreeMap$RootHolder
  18:           305        8691760  [Ljava.util.concurrent.ConcurrentHashMap$HashEntry;
  19:        442919        7086704  java.util.concurrent.atomic.AtomicReference
  20:        221414        7085248  edu.stanford.ppl.concurrent.CopyOnWriteManager$Latch
  21:        221414        7085248  edu.stanford.ppl.concurrent.SnapTreeMap
  22:        221515        5316360  java.util.concurrent.ConcurrentSkipListMap$Node
  23:        221418        5314032  org.apache.cassandra.db.ColumnFamily
  24:        221418        5314032  org.apache.cassandra.db.ISortedColumns$DeletionInfo
  25:        221414        5313936  org.apache.cassandra.db.AtomicSortedColumns$Holder
  26:         35593        5023912  
  27:         35593        4850696  
  28:          3382        3862240  
  29:        116355        3723360  org.apache.cassandra.db.DeletedColumn
  30:        221414        3542624  org.apache.cassandra.db.AtomicSortedColumns
  31:        221414        3542624  edu.stanford.ppl.concurrent.SnapTreeMap$COWMgr
  32:        111752        2682048  java.util.concurrent.ConcurrentSkipListMap$Index
  33:          3382        2433816  
  34:          2971        2279488  
  35:          3533        1462144  
  36:         11281         891344  [C
  37:          3710         443904  java.lang.Class
  38:         12509         400288  java.lang.String
  39:         23066         369056  java.lang.Object
  40:          5742         334480  [S
  41:          5417         275792  [[I
  42:          8620         206880  java.lang.Double
  43:          8596         206304  java.util.concurrent.LinkedBlockingDeque$Node
  44:           311         179136  
  45:          1631         139152  [Ljava.util.HashMap$Entry;
  46:          4006         128192  org.apache.cassandra.db.Column
  47:          3630         116160  java.util.HashMap$Entry
  48:          1066          85280  java.lang.reflect.Method
  49:          1310          62880  java.util.HashMap
  50:           509          57008  java.net.SocksSocketImpl
  51:          1178          47120  java.lang.ref.SoftReference
  52:           228          40752  [[J
  53:           411          29592  java.lang.reflect.Constructor
  54:           703          28120  java.lang.ref.Finalizer
  55:           367          26424  java.lang.reflect.Field
  56:           248          25792  java.lang.Thread
  57:           804          25728  java.lang.ThreadLocal$ThreadLocalMap$Entry
  58:           554          22160  java.util.LinkedHashMap$Entry
  59:           431          20688  java.net.SocketInputStream
  60:           514          20560  org.apache.cassandra.db.CounterColumn
  61:           835          20040  java.util.concurrent.atomic.AtomicLong
  62:           912          17768  [Ljava.lang.Class;
  63:           546          17472  java.util.concurrent.locks.ReentrantLock$NonfairSync
  64:           524          17272  [Ljava.lang.String;
  65:           535          17120  java.net.Inet4Address
  66:          1068          17088  java.lang.Integer
  67:           213          17040  [Ljava.lang.ThreadLocal$ThreadLocalMap$Entry;
  68:           708          16992  java.util.ArrayList
  69:          1043          16688  java.util.concurrent.atomic.AtomicInteger
  70:           684          16416  java.io.FileDescriptor
  71:           506          16192  java.util.Hashtable$Entry
  72:           504          16128  java.net.Socket
  73:           184          15296  [Ljava.util.Hashtable$Entry;
  74:           264          14784  org.apache.cassandra.thrift.TBinaryProtocol
  75:           305          12200  java.util.concurrent.ConcurrentHashMap$Segment
  76:           506          12144  org.apache.cassandra.utils.EstimatedHistogram
  77:           201          11256  org.cliffc.high_scale_lib.ConcurrentAutoTable$CAT
  78:           117          11232  org.apache.cassandra.io.sstable.SSTableReader
  79:           229          10992  java.util.concurrent.ThreadPoolExecutor$Worker
  80:            47          10904  [Z
  81:           320          10240  java.lang.ref.WeakReference
  82:           319          10208  java.util.Vector
  83:           181          10136  sun.security.provider.MD5
... [removed anything taking up less than 10K]
Total      65302653     2582483384
, 2.5709140 secs]
 2951753K->2568755K(8183808K), [CMS Perm : 21044K->21000K(35192K)]After GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 482780727
Max   Chunk Size: 482780727
Number of Blocks: 1
Av.  Block  Size: 482780727
Tree      Height: 1
After GC:
Statistics for BinaryTreeDictionary:
------------------------------------
Total Free Space: 0
Max   Chunk Size: 0
Number of Blocks: 0
Tree      Height: 0
, 17.7063220 secs] [Times: user=16.57 sys=0.05, real=17.70 secs] 
Heap after GC invocations=1233 (full 21):
 par new generation   total 1843200K, used 0K [0x00000005fae00000, 0x0000000677e00000, 0x0000000677e00000)
  eden space 1638400K,   0% used [0x00000005fae00000, 0x00000005fae00000, 0x000000065ee00000)
  from space 204800K,   0% used [0x000000065ee00000, 0x000000065ee00000, 0x000000066b600000)
  to   space 204800K,   0% used [0x000000066b600000, 0x000000066b600000, 0x0000000677e00000)
 concurrent mark-sweep generation total 6340608K, used 2568755K [0x0000000677e00000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 35192K, used 21000K [0x00000007fae00000, 0x00000007fd05e000, 0x0000000800000000)
}
Total time for which application threads were stopped: 17.7081220 seconds

1 answer

严峰
2023-03-14

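First, an extra tip: consider adding the following to your GC settings. It dumps a class histogram on full GCs, which you can use to find out what is hogging the heap: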

JVM_OPTS="$JVM_OPTS -XX:+PrintSafepointStatistics"
JVM_OPTS="$JVM_OPTS -XX:+PrintClassHistogramBeforeFullGC"
JVM_OPTS="$JVM_OPTS -XX:+PrintClassHistogramAfterFullGC"

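Also look in your system.log for messages from MemoryMeter.java. Those usually get spat out in these situations and can be handy for more clues.

Judging by the messages in your GC log, you appear to be hitting promotion failures: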

2119968.198: [ParNew (0: promotion failure size = 131074)  (1: promotion failure size = 131074)  (2: promotion failure size = 131074)  (4: promotion failure size = 131074)  (7: promotion failure size = 131074)  (promotion failed)
Desired survivor size 83886080 bytes, new threshold 1 (max 1)
- age   1:   34318480 bytes,   34318480 total

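This means you have objects living too long in your heap, and the heap has become fragmented, so there is not enough space to promote objects from the new generation into the survivor spaces or the old generation. This is not normal for the beefy hardware you have, but it has happened to me even on a 16-core machine with 70 GB, so here are some things to check one after another so that you can pin down the problem. If you had the histogram I mentioned we could reach a conclusion faster, but here is the list of things to try: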

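1. If you have multithreaded compaction enabled, disable it;

2. How is your CPU doing? If utilization is high, reduce the concurrent compactors, which default to 8 threads on your system, to 4;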

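3. Cassandra 1.1 has a flaw in its key cache size calculation. It uses a hardcoded size of 48 bytes to calculate the heap space used for the key cache. If your application generates keys longer than that, the on-heap key cache usage will be significantly higher than the value you actually set in cassandra.yaml, and it will create a lot of heap pressure, especially from long-lived objects, which leads to promotion failures: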

References: https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=blob;f=src/java/org/apache/cassandra/service/CacheService.java;hb=02672936#l102

https://issues.apache.org/jira/browse/CASSANDRA-4315

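You can check your key cache usage with the nodetool info command or the appropriate JMX MBean. If it is usually full, that may be a clue; unfortunately, the fix is not available in 1.1. If you have this problem, you should either upgrade to 1.2 or, to work around it without upgrading, significantly lower the key cache size value in cassandra.yaml;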

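4. Are you using the row cache? Try turning it off.

5. Reading wide rows, or as I usually call them, rows of death. If your application is generating wide rows (1 key with lots of columns) and is trying to read a large slice of that row (

You can run the nodetool cfstats command to see the maximum row size of your column families. If you have a CF with a maximum row size of a few hundred MB or more, that may be a clue;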

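7. For read/write-heavy workloads, the default tenuring threshold in the GC settings (-XX:MaxTenuringThreshold) is low and can cause premature promotion. This is most likely a symptom of your problem. Try increasing the tenuring threshold to a large value, such as 32, to prevent premature promotion, so that ParNew can collect most of the garbage in the young generation.

You can observe this tenuring problem and the promotion failures by running the jstat command and watching how the survivor spaces in the heap are used. I bet they are underutilized and objects are going straight from eden to the old generation. ;)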

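8. If all else fails, your heap is most likely not big enough to handle your workload, so you will have to resize it. This is less likely; treat it as a last resort and proceed with caution.

8A: Increase your NEWGEN. Try raising it to 2 GB with your 8 GB heap. Also raise the tenuring threshold as described above;

8B: If the above improves the situation but does not eliminate the problem 100%, consider increasing your MAX_HEAP_SIZE by 2 GB at a time. Increase NEWGEN along with it. Make sure it does not exceed 1/4 of the max heap.

Keep going up to 16 GB, but no further than that, since you only have 24 GB of total memory.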
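
To make the sizing in points 7 and 8 concrete, here is a small Python sketch of how the young generation gets divided up. It assumes the usual HotSpot rule that -XX:SurvivorRatio=N makes eden N times the size of one survivor space, and the default -XX:TargetSurvivorRatio of 50; it ignores the JVM's alignment rounding. The function names are my own and not part of any tool.

```python
def young_gen_layout(xmn_mb, survivor_ratio=8, target_survivor_pct=50):
    """Return (eden_kb, survivor_kb, desired_survivor_bytes) for a given -Xmn."""
    young_kb = xmn_mb * 1024
    # SurvivorRatio=N: eden is N times one survivor space, and there are
    # two survivor spaces, so the young generation splits into N + 2 parts.
    survivor_kb = young_kb // (survivor_ratio + 2)
    eden_kb = young_kb - 2 * survivor_kb
    # "Desired survivor size" in the log is this percentage of one survivor space.
    desired_survivor_bytes = survivor_kb * 1024 * target_survivor_pct // 100
    return eden_kb, survivor_kb, desired_survivor_bytes


def max_newgen_mb(heap_mb):
    """Largest new generation that stays within 1/4 of the heap (point 8B)."""
    return heap_mb // 4


# Layout for the flags in the question (-Xmn1600M, SurvivorRatio=8).
print(young_gen_layout(1600))

# Heap steps from point 8B: 8 GB up to 16 GB, 2 GB at a time.
for heap_gb in range(8, 17, 2):
    print(heap_gb, "GB heap -> new generation at most",
          max_newgen_mb(heap_gb * 1024), "MB")
```

With -Xmn1600M this gives eden 1310720K, survivor spaces of 163840K each, and a desired survivor size of 83886080 bytes, which are the figures in the posted log. In that same log the "from" space is only 26% to 32% used after each ParNew, which fits the guess that the survivor spaces are underutilized.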

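About my own setup: I had all of the problems above and eliminated them one at a time. Currently I run my nodes with a 24 GB heap and a 6 GB new generation, with a tenuring threshold of 32. The longest GC pause I have is

Hope this helps.

Here are a few articles about GC:

http://fedora.fiz-karlsruhe.de/docs/Wiki.jsp?page=Java堆&GC调整

http://blog.ragozin.info/2011/10/java-cg-hotspots-cms-and-heap.html

http://java.dzone.com/articles/how-tame-java-gc-pauses

http://grokbase.com/t/cassandra/user/113qf50x4r/parnew-promotion-failed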
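
One more thing that may help while you work through the list: the -XX:PrintFLSStatistics=1 output in your log already shows the fragmentation building up. Total free space in the old generation barely moves (145905916 down to 142756036), while the largest free chunk shrinks from 3472057 to 326281 right before the promotion failure. A rough Python sketch for pulling those numbers out of a log follows. The function name and the sample text are only for illustration, and I am assuming the sizes are reported in heap words rather than bytes (the figure after the full GC, 403131826, times 8 bytes matches the free space in the old generation at that point).

```python
import re

MAX_CHUNK = re.compile(r"Max\s+Chunk\s+Size:\s+(\d+)")
PROMO_FAIL = re.compile(r"promotion failure size = (\d+)")


def fragmentation_trend(log_text):
    """Return (max_chunk_sizes, promotion_failure_sizes) found in a GC log."""
    # The log prints a second, all-zero dictionary next to each real one;
    # those zero entries carry no information and are skipped.
    chunks = [int(m) for m in MAX_CHUNK.findall(log_text) if int(m) > 0]
    failures = [int(m) for m in PROMO_FAIL.findall(log_text)]
    return chunks, failures


# Lines copied from the log in the question, trimmed for brevity.
sample = """\
Max   Chunk Size: 3472057
Max   Chunk Size: 0
Max   Chunk Size: 2423465
Max   Chunk Size: 1505947
Max   Chunk Size: 326281
2119968.198: [ParNew (0: promotion failure size = 131074)  (promotion failed)
"""

chunks, failures = fragmentation_trend(sample)
print("largest free chunk over time:", chunks)
print("promotion failure sizes:", failures)
```

If the largest chunk keeps falling between CMS cycles while total free space stays roughly flat, that points to fragmentation rather than a heap that is simply too small.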
