Question:

Apache Ignite on Kubernetes not joining the cluster

万德海

2023-03-14

I'm trying to set up a simple two-node Ignite cluster on Kubernetes. The same configuration works fine when run directly on VMs.

pod1 --> service1(9090, 10900)   |
                                 | --> ignite-service (47100/TCP,47500/TCP)
pod2 --> service2(9092, 10900)   |               ^
                                                 |
    KubernetesIPFinder----------------------------
            ns = ignite-ns
            svc = ignite-service
        ServiceAccount (ignite-account)
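For context, the Ignite documentation describes the Kubernetes IP finder as doing two things: it connects to a service through the Kubernetes API, and it collects the addresses of the pods behind that service. The ServiceAccount in the diagram presumably supplies the API permissions for that lookup.

The stdlib-only Java sketch below is a deliberately simplified, hypothetical model of the lookup. It is not Ignite's `TcpDiscoveryKubernetesIpFinder`: the endpoint IPs are passed in rather than fetched, and the names `KubernetesDiscoveryModel` and `discoveryAddresses` are made up for illustration. It only shows one consequence of the layout above: every pod selected by `ignite-service`, whichever microservice it belongs to, becomes a discovery peer on port 47500.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of Kubernetes-based discovery (NOT Ignite's
 * TcpDiscoveryKubernetesIpFinder). The real finder asks the Kubernetes API for
 * the pods behind a service; here the pod IPs are simply passed in.
 */
public class KubernetesDiscoveryModel {
    // Discovery port the nodes bind to in the logs (TcpDiscoverySpi, port=47500).
    static final int DISCOVERY_PORT = 47500;

    /** Turns the pod IPs listed behind one Service into discovery addresses. */
    static List<InetSocketAddress> discoveryAddresses(List<String> endpointPodIps) {
        List<InetSocketAddress> peers = new ArrayList<>();
        for (String ip : endpointPodIps) {
            // createUnresolved performs no DNS lookup; the IP is kept as a literal host string.
            peers.add(InetSocketAddress.createUnresolved(ip, DISCOVERY_PORT));
        }
        return peers;
    }

    public static void main(String[] args) {
        // Assumed endpoints of ignite-service: one pod from each microservice
        // (IPs taken from the logs in this question).
        List<String> endpoints = List.of("172.17.239.163", "172.17.239.64");
        for (InetSocketAddress peer : discoveryAddresses(endpoints)) {
            System.out.println(peer.getHostString() + ":" + peer.getPort());
        }
    }
}
```

Running `main` prints `172.17.239.163:47500` and `172.17.239.64:47500`, the two pod IPs that appear in the logs. In a real deployment, which pods end up in that list depends on the selector of `ignite-service`.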

[INFO ] 2020-08-26 16:09:08.969 [main] IgniteKernal%aztecCommunityUserIgnite - VM arguments: [-Xms1g, -Xmx1g, -XX:MaxGCPauseMillis=500, -XX:GCPauseIntervalMillis=30000, -XX:InitiatingHeapOccupancyPercent=60, -XX:G1ReservePercent=30, -XX:+HeapDumpOnOutOfMemoryError, -XX:+DisableExplicitGC, -Djava.net.preferIPv4Stack=true, -XX:+UseG1GC, -Xlog:gc*,safepoint,age*,ergo*:file=/app/aztec/logs/gc-%p-%t.log:tags,uptime,time,level:filecount=10,filesize=50m, -DIGNITE_PERFORMANCE_SUGGESTIONS_DISABLED=true, -DIGNITE_LONG_OPERATIONS_DUMP_TIMEOUT=300000, -Dlog4j.configurationFile=file:///app/aztec/communityuser_service/conf/log4j2.xml, -DIGNITE_WAIT_FOR_BACKUPS_ON_SHUTDOWN=true, -DIGNITE_NO_SHUTDOWN_HOOK=true, -DIGNITE_WAL_MMAP=false]
[INFO ] 2020-08-26 16:09:08.970 [main] IgniteKernal%aztecCommunityUserIgnite - System cache's DataRegion size is configured to 40 MB. Use DataStorageConfiguration.systemRegionInitialSize property to change the setting.
[INFO ] 2020-08-26 16:09:08.970 [main] IgniteKernal%aztecCommunityUserIgnite - Configured caches [in 'sysMemPlc' dataRegion: ['ignite-sys-cache']]
[INFO ] 2020-08-26 16:09:09.054 [main] IgnitePluginProcessor - Configured plugins:
[INFO ] 2020-08-26 16:09:09.054 [main] IgnitePluginProcessor -   ^-- None
[INFO ] 2020-08-26 16:09:09.054 [main] IgnitePluginProcessor -
[INFO ] 2020-08-26 16:09:09.059 [main] FailureProcessor - Configured failure handler: [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]]]
[WARN ] 2020-08-26 16:09:09.278 [main] TcpCommunicationSpi - Failure detection timeout will be ignored (one of SPI parameters has been set explicitly)
[INFO ] 2020-08-26 16:09:09.299 [main] TcpCommunicationSpi - Successfully bound communication NIO server to TCP port [port=47100, locHost=0.0.0.0/0.0.0.0, selectorsCnt=4, selectorSpins=0, pairedConn=false]
[WARN ] 2020-08-26 16:09:09.302 [main] TcpCommunicationSpi - Message queue limit is set to 0 which may lead to potential OOMEs when running cache operations in FULL_ASYNC or PRIMARY_SYNC modes due to message queues growth on sender and receiver sides.
[WARN ] 2020-08-26 16:09:09.312 [main] NoopCheckpointSpi - Checkpoints are disabled (to enable configure any GridCheckpointSpi implementation)
[WARN ] 2020-08-26 16:09:09.337 [main] GridCollisionManager - Collision resolution is disabled (all jobs will be activated upon arrival).
[INFO ] 2020-08-26 16:09:09.341 [main] IgniteKernal%aztecCommunityUserIgnite - Security status [authentication=off, tls/ssl=off]
[INFO ] 2020-08-26 16:09:09.392 [main] TcpDiscoverySpi - Successfully bound to TCP port [port=47500, localHost=0.0.0.0/0.0.0.0, locNodeId=11e43ce8-b846-41ac-b688-9c6c34aebcf9]
[INFO ] 2020-08-26 16:09:09.421 [main] PdsFoldersResolver - Successfully created new persistent storage folder [/app/aztec/data/ignite/db/node00-6cd407c6-0c86-4e57-9803-ab56bec5b16c]
[INFO ] 2020-08-26 16:09:09.422 [main] PdsFoldersResolver - Consistent ID used for local node is [6cd407c6-0c86-4e57-9803-ab56bec5b16c] according to persistence data storage folders
[INFO ] 2020-08-26 16:09:09.423 [main] CacheObjectBinaryProcessorImpl - Resolved directory for serialized binary metadata: /app/aztec/data/ignite/binary_meta/node00-6cd407c6-0c86-4e57-9803-ab56bec5b16c
[INFO ] 2020-08-26 16:09:09.637 [main] FilePageStoreManager - Resolved page store work directory: /app/aztec/data/ignite/db/node00-6cd407c6-0c86-4e57-9803-ab56bec5b16c
[INFO ] 2020-08-26 16:09:09.637 [main] FileWriteAheadLogManager - Resolved write ahead log work directory: /app/aztec/data/ignite/db/wal/node00-6cd407c6-0c86-4e57-9803-ab56bec5b16c
[INFO ] 2020-08-26 16:09:09.638 [main] FileWriteAheadLogManager - Resolved write ahead log archive directory: /app/aztec/data/ignite/db/wal/node00-6cd407c6-0c86-4e57-9803-ab56bec5b16c
[INFO ] 2020-08-26 16:09:09.951 [main] FileHandleManagerImpl - Initialized write-ahead log manager [mode=BACKGROUND]
[WARN ] 2020-08-26 16:09:09.954 [main] GridCacheDatabaseSharedManager - DataRegionConfiguration.maxWalArchiveSize instead DataRegionConfiguration.walHistorySize would be used for removing old archive wal files
[INFO ] 2020-08-26 16:09:09.975 [main] GridCacheDatabaseSharedManager - Configured data regions initialized successfully [total=4]
[INFO ] 2020-08-26 16:09:09.993 [main] PartitionsEvictManager - Evict partition permits=2
[WARN ] 2020-08-26 16:09:10.029 [main] IgniteH2Indexing - Serialization of Java objects in H2 was enabled.
[INFO ] 2020-08-26 16:09:10.251 [main] ClientListenerProcessor - Client connector processor has started on TCP port 10900
[INFO ] 2020-08-26 16:09:10.324 [main] GridTcpRestProtocol - Command protocol successfully started [name=TCP binary, host=0.0.0.0/0.0.0.0, port=11211]
[INFO ] 2020-08-26 16:09:10.374 [main] IgniteKernal%aztecCommunityUserIgnite - Non-loopback local IPs: 172.17.239.163
[INFO ] 2020-08-26 16:09:10.375 [main] IgniteKernal%aztecCommunityUserIgnite - Enabled local MACs: 2255F14C9361
[INFO ] 2020-08-26 16:09:10.381 [main] GridCacheDatabaseSharedManager - Read checkpoint status [startMarker=null, endMarker=null]
[INFO ] 2020-08-26 16:09:10.388 [main] PageMemoryImpl - Started page memory [memoryAllocated=100.0 MiB, pages=24814, tableSize=1.9 MiB, checkpointBuffer=100.0 MiB]
[INFO ] 2020-08-26 16:09:10.391 [main] GridCacheDatabaseSharedManager - Checking memory state [lastValidPos=FileWALPointer [idx=0, fileOff=0, len=0], lastMarked=FileWALPointer [idx=0, fileOff=0, len=0], lastCheckpointId=00000000-0000-0000-0000-000000000000]
[INFO ] 2020-08-26 16:09:10.428 [main] GridCacheDatabaseSharedManager - Applying lost cache updates since last checkpoint record [lastMarked=FileWALPointer [idx=0, fileOff=0, len=0], lastCheckpointId=00000000-0000-0000-0000-000000000000]
[INFO ] 2020-08-26 16:09:10.430 [main] GridCacheDatabaseSharedManager - Finished applying WAL changes [updatesApplied=0, time=0 ms]
[INFO ] 2020-08-26 16:09:10.430 [main] GridCacheProcessor - Restoring partition state for local groups.
[INFO ] 2020-08-26 16:09:10.430 [main] GridCacheProcessor - Finished restoring partition state for local groups [groupsProcessed=0, partitionsProcessed=0, time=0ms]
[INFO ] 2020-08-26 16:09:10.483 [main] FilePageStoreManager - Cleanup cache stores [total=1, left=0, cleanFiles=false]
[INFO ] 2020-08-26 16:09:10.491 [main] PageMemoryImpl - Started page memory [memoryAllocated=100.0 MiB, pages=24814, tableSize=1.9 MiB, checkpointBuffer=100.0 MiB]
[INFO ] 2020-08-26 16:09:10.492 [main] PageMemoryImpl - Started page memory [memoryAllocated=100.0 MiB, pages=24814, tableSize=1.9 MiB, checkpointBuffer=100.0 MiB]
[INFO ] 2020-08-26 16:09:10.493 [main] PageMemoryImpl - Started page memory [memoryAllocated=100.0 MiB, pages=24814, tableSize=1.9 MiB, checkpointBuffer=100.0 MiB]
[INFO ] 2020-08-26 16:09:10.502 [main] GridCacheDatabaseSharedManager - Configured data regions started successfully [total=4]
[INFO ] 2020-08-26 16:09:10.503 [main] GridCacheDatabaseSharedManager - Starting binary memory restore for: [-2100569601]
[INFO ] 2020-08-26 16:09:10.518 [main] GridCacheDatabaseSharedManager - Read checkpoint status [startMarker=null, endMarker=null]
[INFO ] 2020-08-26 16:09:10.518 [main] GridCacheDatabaseSharedManager - Checking memory state [lastValidPos=FileWALPointer [idx=0, fileOff=0, len=0], lastMarked=FileWALPointer [idx=0, fileOff=0, len=0], lastCheckpointId=00000000-0000-0000-0000-000000000000]
[INFO ] 2020-08-26 16:09:10.522 [main] FileWriteAheadLogManager - Resuming logging to WAL segment [file=/app/aztec/data/ignite/db/wal/node00-6cd407c6-0c86-4e57-9803-ab56bec5b16c/0000000000000000.wal, offset=0, ver=2]
[INFO ] 2020-08-26 16:09:10.684 [main] GridCacheProcessor - Started cache in recovery mode [name=ignite-sys-cache, id=-2100569601, dataRegionName=sysMemPlc, mode=REPLICATED, atomicity=TRANSACTIONAL, backups=2147483647, mvcc=false]
[INFO ] 2020-08-26 16:09:10.689 [main] GridCacheDatabaseSharedManager - Binary recovery performed in 186 ms.
[INFO ] 2020-08-26 16:09:10.690 [main] GridCacheDatabaseSharedManager - Read checkpoint status [startMarker=null, endMarker=null]
[INFO ] 2020-08-26 16:09:10.690 [main] GridCacheDatabaseSharedManager - Applying lost cache updates since last checkpoint record [lastMarked=FileWALPointer [idx=0, fileOff=0, len=0], lastCheckpointId=00000000-0000-0000-0000-000000000000]
[INFO ] 2020-08-26 16:09:10.692 [main] GridCacheDatabaseSharedManager - Finished applying WAL changes [updatesApplied=0, time=0 ms]
[INFO ] 2020-08-26 16:09:10.692 [main] GridCacheProcessor - Restoring partition state for local groups.
[INFO ] 2020-08-26 16:09:10.703 [main] GridCacheProcessor - Finished restoring partition state for local groups [groupsProcessed=1, partitionsProcessed=0, time=10ms]
[INFO ] 2020-08-26 16:09:10.738 [main] TcpDiscoverySpi - Connection check threshold is calculated: 300000

[INFO ] 2020-08-26 16:11:18.387 [tcp-disco-srvr-[:47500]-#3%aztecCommunityUserIgnite%] TcpDiscoverySpi - TCP discovery accepted incoming connection [rmtAddr=/172.17.239.64, rmtPort=34837]
[INFO ] 2020-08-26 16:11:18.395 [tcp-disco-srvr-[:47500]-#3%aztecCommunityUserIgnite%] TcpDiscoverySpi - TCP discovery spawning a new thread for connection [rmtAddr=/172.17.239.64, rmtPort=34837]
[INFO ] 2020-08-26 16:11:18.396 [tcp-disco-sock-reader-[]-#4%aztecCommunityUserIgnite%] TcpDiscoverySpi - Started serving remote node connection [rmtAddr=/172.17.239.64:34837, rmtPort=34837]
[INFO ] 2020-08-26 16:11:18.399 [tcp-disco-sock-reader-[]-#4%aztecCommunityUserIgnite%] TcpDiscoverySpi - Received ping request from the remote node [rmtNodeId=f4df02cf-0700-4f31-93b0-9073c9394d2d, rmtAddr=/172.17.239.64:34837, rmtPort=34837]
[INFO ] 2020-08-26 16:11:18.400 [tcp-disco-sock-reader-[]-#4%aztecCommunityUserIgnite%] TcpDiscoverySpi - Finished writing ping response [rmtNodeId=f4df02cf-0700-4f31-93b0-9073c9394d2d, rmtAddr=/172.17.239.64:34837, rmtPort=34837]
[INFO ] 2020-08-26 16:11:18.400 [tcp-disco-sock-reader-[]-#4%aztecCommunityUserIgnite%] TcpDiscoverySpi - Finished serving remote node connection [rmtAddr=/172.17.239.64:34837, rmtPort=34837
[INFO ] 2020-08-26 16:13:25.749 [tcp-disco-srvr-[:47500]-#3%aztecCommunityUserIgnite%] TcpDiscoverySpi - TCP discovery accepted incoming connection [rmtAddr=/172.17.239.64, rmtPort=36858]
[INFO ] 2020-08-26 16:13:25.749 [tcp-disco-srvr-[:47500]-#3%aztecCommunityUserIgnite%] TcpDiscoverySpi - TCP discovery spawning a new thread for connection [rmtAddr=/172.17.239.64, rmtPort=36858]
[INFO ] 2020-08-26 16:13:25.750 [tcp-disco-sock-reader-[]-#5%aztecCommunityUserIgnite%] TcpDiscoverySpi - Started serving remote node connection [rmtAddr=/172.17.239.64:36858, rmtPort=36858]
[INFO ] 2020-08-26 16:13:25.752 [tcp-disco-sock-reader-[f4df02cf 172.17.239.64:36858]-#5%aztecCommunityUserIgnite%] TcpDiscoverySpi - Initialized connection with remote server node [nodeId=f4df02cf-0700-4f31-93b0-9073c9394d2d, rmtAddr=/172.17.239.64:36858]
[INFO ] 2020-08-26 16:13:25.772 [tcp-disco-msg-worker-[]-#2%aztecCommunityUserIgnite%] TcpDiscoverySpi - New next node [newNext=TcpDiscoveryNode [id=f4df02cf-0700-4f31-93b0-9073c9394d2d, consistentId=b003163e-ef90-450a-885c-6d7e9b0cbef4, addrs=ArrayList [127.0.0.1, 172.17.193.243], sockAddrs=HashSet [sit-aztec-authentication-service/192.168.164.225:47500, /127.0.0.1:47500, /172.17.193.243:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1598458405757, loc=false, ver=2.8.1#20200521-sha1:86422096, isClient=false]]

Update:

Ignite configuration:

[INFO ] 2020-08-26 16:25:03.364 [main] IgniteKernal%aztecAuthIgnite - IgniteConfiguration [igniteInstanceName=aztecAuthIgnite, pubPoolSize=8, svcPoolSize=8, callbackPoolSize=8, stripedPoolSize=8, sysPoolSize=8, mgmtPoolSize=4, igfsPoolSize=1, dataStreamerPoolSize=8, utilityCachePoolSize=8, utilityCacheKeepAliveTime=60000, p2pPoolSize=2, qryPoolSize=8, sqlQryHistSize=1000, dfltQryTimeout=0, igniteHome=null, igniteWorkDir=/app/aztec/data/ignite, mbeanSrv=com.sun.jmx.mbeanserver.JmxMBeanServer@d554c5f, nodeId=3b17a57c-6ee6-4225-bc50-a762f6ec50af, marsh=BinaryMarshaller [], marshLocJobs=false, daemon=false, p2pEnabled=false, netTimeout=150000, netCompressionLevel=1, sndRetryDelay=1000, sndRetryCnt=3, metricsHistSize=10000, metricsUpdateFreq=2000, metricsExpTime=9223372036854775807, discoSpi=TcpDiscoverySpi [addrRslvr=null, sockTimeout=0, ackTimeout=0, marsh=null, reconCnt=10, reconDelay=2000, maxAckTimeout=600000, soLinger=5, forceSrvMode=false, clientReconnectDisabled=false, internalLsnr=null, skipAddrsRandomization=false], segPlc=STOP, segResolveAttempts=2, waitForSegOnStart=true, allResolversPassReq=true, segChkFreq=10000, commSpi=TcpCommunicationSpi [connectGate=null, connPlc=org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$FirstConnectionPolicy@60c38c44, chConnPlc=null, enableForcibleNodeKill=false, enableTroubleshootingLog=false, locAddr=null, locHost=null, locPort=47100, locPortRange=100, shmemPort=-1, directBuf=true, directSndBuf=false, idleConnTimeout=600000, connTimeout=5000, maxConnTimeout=600000, reconCnt=10, sockSndBuf=32768, sockRcvBuf=32768, msgQueueLimit=0, slowClientQueueLimit=0, nioSrvr=null, shmemSrv=null, usePairedConnections=false, connectionsPerNode=1, tcpNoDelay=true, filterReachableAddresses=false, ackSndThreshold=32, unackedMsgsBufSize=0, sockWriteTimeout=2000, boundTcpPort=-1, boundTcpShmemPort=-1, selectorsCnt=4, selectorSpins=0, addrRslvr=null, ctxInitLatch=java.util.concurrent.CountDownLatch@1ee2a1e2[Count = 1], stopping=false, metricsLsnr=null], evtSpi=org.apache.ignite.spi.eventstorage.NoopEventStorageSpi@59ae2de7, colSpi=NoopCollisionSpi [], deploySpi=LocalDeploymentSpi [], indexingSpi=org.apache.ignite.spi.indexing.noop.NoopIndexingSpi@38bb9fad, addrRslvr=null, encryptionSpi=org.apache.ignite.spi.encryption.noop.NoopEncryptionSpi@11620476, clientMode=false, rebalanceThreadPoolSize=4, rebalanceTimeout=10000, rebalanceBatchesPrefetchCnt=3, rebalanceThrottle=0, rebalanceBatchSize=524288, txCfg=TransactionConfiguration [txSerEnabled=false, dfltIsolation=REPEATABLE_READ, dfltConcurrency=PESSIMISTIC, dfltTxTimeout=0, txTimeoutOnPartitionMapExchange=0, deadlockTimeout=10000, pessimisticTxLogSize=0, pessimisticTxLogLinger=10000, tmLookupClsName=null, txManagerFactory=null, useJtaSync=false], cacheSanityCheckEnabled=true, discoStartupDelay=60000, deployMode=SHARED, p2pMissedCacheSize=100, locHost=null, timeSrvPortBase=31100, timeSrvPortRange=100, failureDetectionTimeout=300000, sysWorkerBlockedTimeout=null, clientFailureDetectionTimeout=30000, metricsLogFreq=60000, hadoopCfg=null, connectorCfg=ConnectorConfiguration [jettyPath=null, host=null, port=11211, noDelay=true, directBuf=false, sndBufSize=32768, rcvBufSize=32768, idleQryCurTimeout=600000, idleQryCurCheckFreq=60000, sndQueueLimit=0, selectorCnt=1, idleTimeout=7000, sslEnabled=false, sslClientAuth=false, sslCtxFactory=null, sslFactory=null, portRange=100, threadPoolSize=8, msgInterceptor=null], odbcCfg=null, warmupClos=null, atomicCfg=AtomicConfiguration [seqReserveSize=1000, cacheMode=PARTITIONED, backups=1, aff=null, grpName=null], classLdr=null, sslCtxFactory=null, platformCfg=null, binaryCfg=null, memCfg=null, pstCfg=null, dsCfg=DataStorageConfiguration [sysRegionInitSize=41943040, sysRegionMaxSize=104857600, pageSize=4096, concLvl=0, dfltDataRegConf=DataRegionConfiguration [name=Default_Region, maxSize=131072000, initSize=26214400, swapPath=null, pageEvictionMode=DISABLED, evictionThreshold=0.9, emptyPagesPoolSize=100, metricsEnabled=false, metricsSubIntervalCount=5, metricsRateTimeInterval=60000, persistenceEnabled=true, checkpointPageBufSize=0, lazyMemoryAllocation=true], dataRegions=null, storagePath=db, checkpointFreq=60000, lockWaitTime=10000, checkpointThreads=4, checkpointWriteOrder=SEQUENTIAL, walHistSize=20, maxWalArchiveSize=250000000, walSegments=4, walSegmentSize=67108864, walPath=db/wal, walArchivePath=db/wal, metricsEnabled=false, walMode=BACKGROUND, walTlbSize=131072, walBuffSize=33554432, walFlushFreq=5000, walFsyncDelay=1000, walRecordIterBuffSize=67108864, alwaysWriteFullPages=false, fileIOFactory=org.apache.ignite.internal.processors.cache.persistence.file.AsyncFileIOFactory@25a02442, metricsSubIntervalCnt=5, metricsRateTimeInterval=60000, walAutoArchiveAfterInactivity=-1, writeThrottlingEnabled=true, walCompactionEnabled=false, walCompactionLevel=1, checkpointReadLockTimeout=null, walPageCompression=DISABLED, walPageCompressionLevel=null], activeOnStart=true, autoActivation=false, longQryWarnTimeout=3000, sqlConnCfg=null, cliConnCfg=ClientConnectorConfiguration [host=sit-aztec-authentication-service, port=10900, portRange=10, sockSndBufSize=0, sockRcvBufSize=0, tcpNoDelay=true, maxOpenCursorsPerConn=64, threadPoolSize=8, idleTimeout=0, handshakeTimeout=10000, jdbcEnabled=true, odbcEnabled=true, thinCliEnabled=true, sslEnabled=false, useIgniteSslCtxFactory=true, sslClientAuth=false, sslCtxFactory=null, thinCliCfg=ThinClientConfiguration [maxActiveTxPerConn=100]], mvccVacuumThreadCnt=2, mvccVacuumFreq=5000, authEnabled=false, failureHnd=null, commFailureRslvr=null]

贝杜吟

2023-03-14

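I figured out what the problem is. Apparently it has to do with how I configured the Service object in Kubernetes. I'm not sure whether this is a bug or a feature, but it looks like Ignite nodes can only scale within a single node (i.e., one microservice), not across nodes. What I mean is that each Service object should be unique to one node. If you share a Service object across nodes (microservices), expecting the cluster to span all of them, it hangs. (I'm not sure whether this is an anti-pattern.) What works is to keep each Service object unique to one node and then scale that node when needed.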

If that's the case, I think we should probably run the Ignite nodes as a separate cluster instead of embedding them in the microservices.
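The rule described in this answer can be checked before deploying: a Service used for Ignite discovery should select the pods of only one microservice.

The Java sketch below is a hypothetical helper, not part of Ignite or any Kubernetes client. It assumes you have already collected, for each Service, the app labels of the pods it selects. The labels `community-user` and `authentication` are invented stand-ins for the two microservices in the question. The helper reports every Service whose pods span more than one microservice.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Hypothetical pre-flight check: flags Services whose selected pods belong to
 * more than one microservice. Input is an assumed, simplified view of the
 * cluster (Service name -> app labels of the pods it selects); collecting that
 * data from Kubernetes is out of scope here.
 */
public class SharedServiceCheck {

    /** Returns each Service that selects pods from more than one microservice. */
    static Map<String, Set<String>> sharedServices(Map<String, List<String>> podAppsByService) {
        Map<String, Set<String>> shared = new TreeMap<>(); // sorted for stable output
        for (Map.Entry<String, List<String>> e : podAppsByService.entrySet()) {
            Set<String> apps = new TreeSet<>(e.getValue()); // distinct microservices behind this Service
            if (apps.size() > 1) {
                shared.put(e.getKey(), apps);
            }
        }
        return shared;
    }

    public static void main(String[] args) {
        // Assumed sample data: ignite-service is shared; the second Service is not.
        Map<String, List<String>> cluster = Map.of(
                "ignite-service", List.of("community-user", "authentication"),
                "community-user-ignite", List.of("community-user", "community-user"));
        System.out.println(sharedServices(cluster));
    }
}
```

With the sample data, `main` prints `{ignite-service=[authentication, community-user]}`. The shared discovery service is flagged, while the Service that only selects `community-user` pods passes.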
