Question:

Write times in Cassandra when using the spark-cassandra connector

江鸿羲
2023-03-14

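My requirement is to be as close to real time as possible, and this seems far from it. The production environment sees roughly 400 events every 3 seconds.

Do I need to tune the Cassandra YAML file, or make any changes to the cassandra-connector itself?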

INFO  05:25:14 system_traces.events                      0,0
WARN  05:25:14 Read 2124 live and 4248 tombstoned cells in system.schema_columnfamilies (see tombstone_warn_threshold). 2147483639 columns was requested, slices=[-]
WARN  05:25:14 Read 33972 live and 70068 tombstoned cells in system.schema_columns (see tombstone_warn_threshold). 2147483575 columns was requested, slices=[-]
WARN  05:25:15 Read 2124 live and 4248 tombstoned cells in system.schema_columnfamilies (see tombstone_warn_threshold). 2147483639 columns was requested, slices=[-]
WARN  05:25:15 Read 2124 live and 4248 tombstoned cells in system.schema_columnfamilies (see tombstone_warn_threshold). 2147483639 columns was requested, slices=[-]
WARN  05:25:15 Read 33972 live and 70068 tombstoned cells in system.schema_columns (see tombstone_warn_threshold). 2147483575 columns was requested, slices=[-]
WARN  05:25:15 Read 33972 live and 70068 tombstoned cells in system.schema_columns (see tombstone_warn_threshold). 2147483575 columns was requested, slices=[-]
INFO  05:25:16 ParNew GC in 340ms.  CMS Old Gen: 1308020680 -> 1454559048; Par Eden Space: 251658240 -> 0; 
WARN  05:25:16 Read 2124 live and 4248 tombstoned cells in system.schema_columnfamilies (see tombstone_warn_threshold). 2147483639 columns was requested, slices=[-]
WARN  05:25:16 Read 33972 live and 70068 tombstoned cells in system.schema_columns (see tombstone_warn_threshold). 2147483575 columns was requested, slices=[-]
WARN  05:25:17 Read 2124 live and 4248 tombstoned cells in system.schema_columnfamilies (see tombstone_warn_threshold). 2147483639 columns was requested, slices=[-]
WARN  05:25:17 Read 2124 live and 4248 tombstoned cells in system.schema_columnfamilies (see tombstone_warn_threshold). 2147483639 columns was requested, slices=[-]
WARN  05:25:17 Read 33972 live and 70068 tombstoned cells in system.schema_columns (see tombstone_warn_threshold). 2147483575 columns was requested, slices=[-]
WARN  05:25:17 Read 33972 live and 70068 tombstoned cells in system.schema_columns (see tombstone_warn_threshold). 2147483575 columns was requested, slices=[-]
INFO  05:25:17 ParNew GC in 370ms.  CMS Old Gen: 1498825040 -> 1669094840; Par Eden Space: 251658240 -> 0; 
WARN  05:25:18 Read 2124 live and 4248 tombstoned cells in system.schema_columnfamilies (see tombstone_warn_threshold). 2147483639 columns was requested, slices=[-]
WARN  05:25:18 Read 33972 live and 70068 tombstoned cells in system.schema_columns (see tombstone_warn_threshold). 2147483575 columns was requested, slices=[-]
WARN  05:25:18 Read 2124 live and 4248 tombstoned cells in system.schema_columnfamilies (see tombstone_warn_threshold). 2147483639 columns was requested, slices=[-]
WARN  05:25:18 Read 2124 live and 4248 tombstoned cells in system.schema_columnfamilies (see tombstone_warn_threshold). 2147483639 columns was requested, slices=[-]
WARN  05:25:19 Read 33972 live and 70068 tombstoned cells in system.schema_columns (see tombstone_warn_threshold). 2147483575 columns was requested, slices=[-]
WARN  05:25:19 Read 33972 live and 70068 tombstoned cells in system.schema_columns (see tombstone_warn_threshold). 2147483575 columns was requested, slices=[-]
INFO  05:25:19 ParNew GC in 382ms.  CMS Old Gen: 1714792864 -> 1875460032; Par Eden Space: 251658240 -> 0; 
W

1 answer

诸修伟
2023-03-14

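I suspect you have hit an edge case in Cassandra related to the large number of CFs (column families) and columns defined in the schema. Typically, when you see tombstone warnings, it is because the data model is flawed. These warnings, however, are on system tables, so you have clearly done something to the tables that the authors did not anticipate: creating a very large number of tables, and probably dropping and recreating them frequently.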

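These warnings were added because scanning past tombstones in search of live columns causes memory pressure, which causes GC, which causes pauses, which causes slowness.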
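
To see how much of this chain shows up in a given log, the warnings and GC lines can be tallied with a small script. The following is only a minimal sketch in Python: the regular expressions are written against the log format pasted in the question, and the `summarize` function and `SAMPLE` string are illustrative names, not part of Cassandra or the connector.

```python
import re

# Patterns written against the log lines shown in the question (an assumption:
# other Cassandra versions may word these messages differently).
TOMBSTONE_RE = re.compile(r"Read (\d+) live and (\d+) tombstoned cells in (\S+)")
GC_RE = re.compile(r"ParNew GC in (\d+)ms")


def summarize(log_text):
    """Return (per-table tombstone stats, total ParNew pause time in ms)."""
    tables = {}
    gc_ms = 0
    for line in log_text.splitlines():
        m = TOMBSTONE_RE.search(line)
        if m:
            live, dead, table = int(m.group(1)), int(m.group(2)), m.group(3)
            stats = tables.setdefault(
                table, {"reads": 0, "live": 0, "tombstoned": 0}
            )
            stats["reads"] += 1
            stats["live"] += live
            stats["tombstoned"] += dead
            continue
        m = GC_RE.search(line)
        if m:
            gc_ms += int(m.group(1))
    return tables, gc_ms


# Three lines copied from the question's log, used here as sample input.
SAMPLE = """\
WARN  05:25:15 Read 2124 live and 4248 tombstoned cells in system.schema_columnfamilies (see tombstone_warn_threshold). 2147483639 columns was requested, slices=[-]
WARN  05:25:15 Read 33972 live and 70068 tombstoned cells in system.schema_columns (see tombstone_warn_threshold). 2147483575 columns was requested, slices=[-]
INFO  05:25:16 ParNew GC in 340ms.  CMS Old Gen: 1308020680 -> 1454559048; Par Eden Space: 251658240 -> 0;
"""

if __name__ == "__main__":
    tables, gc_ms = summarize(SAMPLE)
    for table, s in tables.items():
        # Guard against division by zero when a read returned no live cells.
        ratio = s["tombstoned"] / s["live"] if s["live"] else float("inf")
        print(f"{table}: {s['reads']} reads, {ratio:.2f} tombstones per live cell")
    print(f"ParNew pause total: {gc_ms} ms")
```

Run against the full excerpt in the question, the three ParNew lines (340 ms, 370 ms and 382 ms) add up to 1092 ms of pause within a few seconds of wall-clock time, and every schema read scans about two tombstones per live cell.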

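Can you squeeze the data into fewer column families? You may also want to try clearing out the tombstones: lower gc_grace_seconds (gcgs) for that table to zero, run a major compaction on the system keyspace if that is allowed, then raise gcgs back to its default.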
