Question:

Ignite starts very slowly with a large Xmx of 64G and times out before forming a cluster

壤驷骁
2023-03-14

I have 3 Ignite servers, each with 128G of RAM. When I start the cluster with the JVM options -Xms64G and -Xmx64G, startup is very slow and takes several minutes.

My JVM options:

JVM_OPTS="-server       \
-Xms64G     \
-Xmx64G     \
-XX:+AlwaysPreTouch     \
-XX:+UseG1GC        \
-XX:+ScavengeBeforeFullGC       \
-XX:+DisableExplicitGC      \
-XX:MaxGCPauseMillis=200        \
-XX:InitiatingHeapOccupancyPercent=45       \
-XX:+PrintGCDateStamps      \
-XX:+PrintGCDetails     \
-Xloggc:/var/log/apache-ignite/apache-ignite-gc.log     \
-XX:+UseGCLogFileRotation       \
-XX:GCLogFileSize=10M       \
-XX:NumberOfGCLogFiles=20       \
-Djava.awt.headless=true"

Once the first server is up, I start the second one. Forming the cluster takes a long time (it seems to be loading/reading some data, which is very slow), but in the end it looks OK. When I start the third server, however, it always breaks the cluster: the log shows some nodes leaving (a timeout?), and I cannot get the third node to join.

[11:48:33,743][INFO][tcp-disco-sock-reader-#5][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/192.168.28.162:38607, rmtPort=38607
[11:48:39,468][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery accepted incoming connection [rmtAddr=/192.168.28.162, rmtPort=53016]
[11:48:39,469][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery spawning a new thread for connection [rmtAddr=/192.168.28.162, rmtPort=53016]
[11:48:39,469][INFO][tcp-disco-sock-reader-#7][TcpDiscoverySpi] Started serving remote node connection [rmtAddr=/192.168.28.162:53016, rmtPort=53016]
[11:48:43,770][WARNING][tcp-disco-msg-worker-#3][TcpDiscoverySpi] Timed out waiting for message delivery receipt (most probably, the reason is in long GC pauses on remote node; consider tuning GC and increasing 'ackTimeout' configuration property). Will retry to send message with increased timeout [currentTimeout=10000, rmtAddr=/192.168.28.161:47500, rmtPort=47500]
[11:48:43,775][WARNING][tcp-disco-msg-worker-#3][TcpDiscoverySpi] Failed to send message to next node [msg=TcpDiscoveryNodeAddedMessage [node=TcpDiscoveryNode [id=3f7dad77-dc5d-45a2-ad02-89e590304c03, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 192.168.28.161], sockAddrs=[/0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500, /192.168.28.161:47500], discPort=47500, order=0, intOrder=3, lastExchangeTime=1525448898690, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], dataPacket=o.a.i.spi.discovery.tcp.internal.DiscoveryDataPacket@24ac7e3f, discardMsgId=null, discardCustomMsgId=null, top=null, clientTop=null, gridStartTime=1525448001350, super=TcpDiscoveryAbstractMessage [sndNodeId=84b416f0-f146-4907-9ff6-9391568eacea, id=43b49cb2361-84b416f0-f146-4907-9ff6-9391568eacea, verifierNodeId=84b416f0-f146-4907-9ff6-9391568eacea, topVer=0, pendingIdx=0, failedNodes=null, isClient=false]], next=TcpDiscoveryNode [id=3f7dad77-dc5d-45a2-ad02-89e590304c03, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 192.168.28.161], sockAddrs=[/0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500, /192.168.28.161:47500], discPort=47500, order=0, intOrder=3, lastExchangeTime=1525448898690, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], errMsg=Failed to send message to next node [msg=TcpDiscoveryNodeAddedMessage [node=TcpDiscoveryNode [id=3f7dad77-dc5d-45a2-ad02-89e590304c03, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 192.168.28.161], sockAddrs=[/0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500, /192.168.28.161:47500], discPort=47500, order=0, intOrder=3, lastExchangeTime=1525448898690, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], dataPacket=o.a.i.spi.discovery.tcp.internal.DiscoveryDataPacket@24ac7e3f, discardMsgId=null, discardCustomMsgId=null, top=null, clientTop=null, gridStartTime=1525448001350, super=TcpDiscoveryAbstractMessage [sndNodeId=84b416f0-f146-4907-9ff6-9391568eacea, id=43b49cb2361-84b416f0-f146-4907-9ff6-9391568eacea, verifierNodeId=84b416f0-f146-4907-9ff6-9391568eacea, topVer=0, 
pendingIdx=0, failedNodes=null, isClient=false]], next=ClusterNode [id=3f7dad77-dc5d-45a2-ad02-89e590304c03, order=0, addr=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 192.168.28.161], daemon=false]]]
[11:48:47,355][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery accepted incoming connection [rmtAddr=/192.168.28.161, rmtPort=52626]
[11:48:47,355][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery spawning a new thread for connection [rmtAddr=/192.168.28.161, rmtPort=52626]
[11:48:47,355][INFO][tcp-disco-sock-reader-#8][TcpDiscoverySpi] Started serving remote node connection [rmtAddr=/192.168.28.161:52626, rmtPort=52626]
[11:48:47,376][INFO][tcp-disco-sock-reader-#8][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/192.168.28.161:52626, rmtPort=52626
[11:48:49,485][INFO][tcp-disco-sock-reader-#7][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/192.168.28.162:53016, rmtPort=53016
[11:48:49,551][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery accepted incoming connection [rmtAddr=/192.168.28.162, rmtPort=51850]
[11:48:49,551][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery spawning a new thread for connection [rmtAddr=/192.168.28.162, rmtPort=51850]
[11:48:49,551][INFO][tcp-disco-sock-reader-#9][TcpDiscoverySpi] Started serving remote node connection [rmtAddr=/192.168.28.162:51850, rmtPort=51850]
[11:48:49,553][INFO][tcp-disco-sock-reader-#9][TcpDiscoverySpi] Received ping request from the remote node [rmtNodeId=84b416f0-f146-4907-9ff6-9391568eacea, rmtAddr=/192.168.28.162:51850, rmtPort=51850]
[11:48:49,554][INFO][tcp-disco-sock-reader-#9][TcpDiscoverySpi] Finished writing ping response [rmtNodeId=84b416f0-f146-4907-9ff6-9391568eacea, rmtAddr=/192.168.28.162:51850, rmtPort=51850]
[11:48:49,554][INFO][tcp-disco-sock-reader-#9][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/192.168.28.162:51850, rmtPort=51850
[11:48:51,166][WARNING][tcp-disco-msg-worker-#3][TcpDiscoverySpi] Local node has detected failed nodes and started cluster-wide procedure. To speed up failure detection please see 'Failure Detection' section under javadoc for 'TcpDiscoverySpi'
[11:48:51,169][WARNING][tcp-disco-msg-worker-#3][TcpDiscoverySpi] Node is out of topology (probably, due to short-time network problems).
[11:48:51,169][WARNING][disco-event-worker-#101][GridDiscoveryManager] Local node SEGMENTED: TcpDiscoveryNode [id=ba873112-9978-4845-9d92-b25816edbb34, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 192.168.28.163], sockAddrs=[/0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500, Redis3/192.168.28.163:47500], discPort=47500, order=2, intOrder=2, lastExchangeTime=1525448931164, loc=true, ver=2.4.0#20180305-sha1:aa342270, isClient=false]
[11:48:51,176][WARNING][disco-event-worker-#101][GridDiscoveryManager] Stopping local node according to configured segmentation policy.
[11:48:51,177][WARNING][disco-event-worker-#101][GridDiscoveryManager] Node FAILED: TcpDiscoveryNode [id=84b416f0-f146-4907-9ff6-9391568eacea, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 192.168.28.162], sockAddrs=[/0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500, /192.168.28.162:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1525448128889, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false]
[11:48:51,178][INFO][disco-event-worker-#101][GridDiscoveryManager] Topology snapshot [ver=3, servers=1, clients=0, CPUs=32, offheap=25.0GB, heap=64.0GB]
[11:48:51,178][INFO][disco-event-worker-#101][GridDiscoveryManager] Data Regions Configured:
[11:48:51,179][INFO][disco-event-worker-#101][GridDiscoveryManager]   ^-- default [initSize=256.0 MiB, maxSize=25.1 GiB, persistenceEnabled=true]
[11:48:51,184][INFO][Thread-34][GridTcpRestProtocol] Command protocol successfully stopped: TCP binary
[11:48:51,192][INFO][Thread-34][GridJettyRestProtocol] Command protocol successfully stopped: Jetty REST
[11:48:51,196][INFO][exchange-worker-#102][time] Started exchange init [topVer=AffinityTopologyVersion [topVer=3, minorTopVer=0], crd=true, evt=NODE_FAILED, evtNode=84b416f0-f146-4907-9ff6-9391568eacea, customEvt=null, allowMerge=true]
[11:48:51,197][INFO][exchange-worker-#102][GridDhtPartitionsExchangeFuture] Finish exchange future [startVer=AffinityTopologyVersion [topVer=3, minorTopVer=0], resVer=null, err=class org.apache.ignite.internal.IgniteInterruptedCheckedException: null]
[11:48:51,201][INFO][db-checkpoint-thread-#114][GridCacheDatabaseSharedManager] Skipping checkpoint (no pages were modified) [checkpointLockWait=0ms, checkpointLockHoldTime=0ms, reason='timeout']
[11:48:51,212][INFO][Thread-34][GridCacheProcessor] Stopped cache [cacheName=ignite-sys-cache]
[11:48:51,244][INFO][Thread-34][IgniteKernal] 

I searched online and saw someone suggest using a very large value:

<property name="failureDetectionTimeout" value="200000"/>

With this setting, the third node can start, but then a fatal error occurs:

[15:00:17,086][SEVERE][exchange-worker-#102][GridCachePartitionExchangeManager] Failed to wait for completion of partition map exchange (preloading will not start): GridDhtPartitionsExchangeFuture [firstDiscoEvt=DiscoveryCustomEvent [customMsg=null, affTopVer=AffinityTopologyVersion [topVer=3, minorTopVer=1], super=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=d520c781-a2ec-44a3-8f39-b2d72928a811, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 192.168.28.163], sockAddrs=[/0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500, /192.168.28.163:47500], discPort=47500, order=3, intOrder=3, lastExchangeTime=1525460353176, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], topVer=3, nodeId8=7a048246, msg=null, type=DISCOVERY_CUSTOM_EVT, tstamp=1525460411312]], crd=TcpDiscoveryNode [id=7a048246-9b6e-4f03-b6bc-f44e69feb0db, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 192.168.28.161], sockAddrs=[Redis1/192.168.28.161:47500, /0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1525460417080, loc=true, ver=2.4.0#20180305-sha1:aa342270, isClient=false], exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1], discoEvt=DiscoveryCustomEvent [customMsg=null, affTopVer=AffinityTopologyVersion [topVer=3, minorTopVer=1], super=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=d520c781-a2ec-44a3-8f39-b2d72928a811, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 192.168.28.163], sockAddrs=[/0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500, /192.168.28.163:47500], discPort=47500, order=3, intOrder=3, lastExchangeTime=1525460353176, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], topVer=3, nodeId8=7a048246, msg=null, type=DISCOVERY_CUSTOM_EVT, tstamp=1525460411312]], nodeId=d520c781, evt=DISCOVERY_CUSTOM_EVT], added=true, initFut=GridFutureAdapter [ignoreInterrupts=false, state=DONE, res=true, hash=2145189203], init=true, lastVer=GridCacheVersion [topVer=0, order=1525460352448, nodeOrder=0], 
partReleaseFut=PartitionReleaseFuture [topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1], futures=[ExplicitLockReleaseFuture [topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1], futures=[]], TxReleaseFuture [topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1], futures=[]], AtomicUpdateReleaseFuture [topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1], futures=[]], DataStreamerReleaseFuture [topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1], futures=[]]]], exchActions=null, affChangeMsg=null, initTs=1525460411332, centralizedAff=false, changeGlobalStateE=null, done=true, state=DONE, evtLatch=0, remaining=[], super=GridFutureAdapter [ignoreInterrupts=false, state=DONE, res=class o.a.i.IgniteCheckedException: Cluster state change failed., hash=204829629]]
class org.apache.ignite.IgniteCheckedException: Cluster state change failed.
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.finishExchangeOnCoordinator(GridDhtPartitionsExchangeFuture.java:2539)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onAllReceived(GridDhtPartitionsExchangeFuture.java:2334)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.processSingleMessage(GridDhtPartitionsExchangeFuture.java:2071)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.access$100(GridDhtPartitionsExchangeFuture.java:124)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$2.apply(GridDhtPartitionsExchangeFuture.java:1928)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$2.apply(GridDhtPartitionsExchangeFuture.java:1916)
    at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:383)
    at org.apache.ignite.internal.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:353)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onReceiveSingleMessage(GridDhtPartitionsExchangeFuture.java:1916)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.processSinglePartitionUpdate(GridCachePartitionExchangeManager.java:1531)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.access$1000(GridCachePartitionExchangeManager.java:133)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$2.onMessage(GridCachePartitionExchangeManager.java:332)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$2.onMessage(GridCachePartitionExchangeManager.java:312)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:2689)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$MessageHandler.apply(GridCachePartitionExchangeManager.java:2668)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1060)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:579)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:378)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:304)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:99)
    at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:293)
    at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1555)
    at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1183)
    at org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:126)
    at org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1090)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

When I change Xmx to 4G, startup is a bit faster (still slow), and the cluster does form with all 3 nodes.

How can I make use of the 128G of RAM, since my workload needs a very large amount of memory? And how can I make the 3-node cluster form/start faster? Raising failureDetectionTimeout to as much as 200s does not seem like a good approach to me.

1 answer

任文乐
2023-03-14

How much RAM do the target machines have?

Ignite 2.x stores data in off-heap storage (i.e., outside of -Xmx). By default, 20% of RAM goes to the off-heap data region (memory policy), and you can change that in the configuration. If (Xmx + data region size) ≥ total RAM, the machine will eventually start swapping and grind to a halt, and your cluster will fall apart.
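As a rough sanity check of that rule of thumb (my own arithmetic, not part of the original answer), here is the per-server memory budget implied by the numbers in the question and the log; the figures come from 128G RAM, -Xmx64G, and the default data region of ~20% of RAM (the log shows offheap=25.0GB):

```python
# Rough per-server memory budget. Figures come from the question:
# 128G RAM, -Xmx64G, and Ignite's default data region of ~20% of RAM.

def headroom_gb(total_ram, heap, offheap):
    """RAM left over for the OS page cache, checkpoint buffers, threads, etc."""
    return total_ram - heap - offheap

# Original settings: 64G heap + ~25.6G default region.
original = headroom_gb(128, 64, 0.2 * 128)

# Small heap + large explicit data region (e.g. 4G heap, 60G region).
suggested = headroom_gb(128, 4, 60)

print(original)   # ~38.4 GB left -- tight once persistence needs OS page cache
print(suggested)  # 64 GB left -- half the box for the OS and disk cache
```

With persistence enabled, leaving a large share of RAM to the OS page cache matters, which is why the small-heap layout is the safer budget even though both leave positive headroom on paper.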

Given that the data lives off-heap, does Ignite really need such a large Xmx? I would recommend -Xmx4G together with a 60G data region.
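A minimal sketch of what that could look like in the Spring XML configuration (the 60G size is illustrative; the region name and persistenceEnabled flag match the "default" region shown in the log; pair this with -Xmx4G in JVM_OPTS):

```xml
<!-- Sketch: large off-heap data region instead of a large heap.
     maxSize is in bytes; 60G here is illustrative. -->
<property name="dataStorageConfiguration">
    <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
        <property name="defaultDataRegionConfiguration">
            <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
                <property name="name" value="default"/>
                <property name="persistenceEnabled" value="true"/>
                <!-- 60 GB -->
                <property name="maxSize" value="#{60L * 1024 * 1024 * 1024}"/>
            </bean>
        </property>
    </bean>
</property>
```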

The cluster should form almost immediately. If that is not happening, try replacing multicast discovery with static IP (VM) discovery.
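A static IP discovery sketch (the addresses below are the ones that appear in the question's logs and the port range is Ignite's default discovery range; adjust both to your environment):

```xml
<property name="discoverySpi">
    <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
        <property name="ipFinder">
            <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
                <property name="addresses">
                    <list>
                        <value>192.168.28.161:47500..47502</value>
                        <value>192.168.28.162:47500..47502</value>
                        <value>192.168.28.163:47500..47502</value>
                    </list>
                </property>
            </bean>
        </property>
    </bean>
</property>
```

With a fixed address list, nodes skip multicast lookup and go straight to the listed hosts, which removes one source of slow or failed joins.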
