问题：

使用spark-cassandra连接器在cassandra中写入时间

江鸿羲

2023-03-14

我的要求是尽可能的实时，这似乎离得很远。生产环境大约每3秒有400个事件。

是否需要对Cassandra中的YAML文件进行调优，或者对cassandra-connector本身进行任何更改

INFO  05:25:14 system_traces.events                      0,0
WARN  05:25:14 Read 2124 live and 4248 tombstoned cells in system.schema_columnfamilies (see tombstone_warn_threshold). 2147483639 columns was requested, slices=[-]
WARN  05:25:14 Read 33972 live and 70068 tombstoned cells in system.schema_columns (see tombstone_warn_threshold). 2147483575 columns was requested, slices=[-]
WARN  05:25:15 Read 2124 live and 4248 tombstoned cells in system.schema_columnfamilies (see tombstone_warn_threshold). 2147483639 columns was requested, slices=[-]
WARN  05:25:15 Read 2124 live and 4248 tombstoned cells in system.schema_columnfamilies (see tombstone_warn_threshold). 2147483639 columns was requested, slices=[-]
WARN  05:25:15 Read 33972 live and 70068 tombstoned cells in system.schema_columns (see tombstone_warn_threshold). 2147483575 columns was requested, slices=[-]
WARN  05:25:15 Read 33972 live and 70068 tombstoned cells in system.schema_columns (see tombstone_warn_threshold). 2147483575 columns was requested, slices=[-]
INFO  05:25:16 ParNew GC in 340ms.  CMS Old Gen: 1308020680 -> 1454559048; Par Eden Space: 251658240 -> 0; 
WARN  05:25:16 Read 2124 live and 4248 tombstoned cells in system.schema_columnfamilies (see tombstone_warn_threshold). 2147483639 columns was requested, slices=[-]
WARN  05:25:16 Read 33972 live and 70068 tombstoned cells in system.schema_columns (see tombstone_warn_threshold). 2147483575 columns was requested, slices=[-]
WARN  05:25:17 Read 2124 live and 4248 tombstoned cells in system.schema_columnfamilies (see tombstone_warn_threshold). 2147483639 columns was requested, slices=[-]
WARN  05:25:17 Read 2124 live and 4248 tombstoned cells in system.schema_columnfamilies (see tombstone_warn_threshold). 2147483639 columns was requested, slices=[-]
WARN  05:25:17 Read 33972 live and 70068 tombstoned cells in system.schema_columns (see tombstone_warn_threshold). 2147483575 columns was requested, slices=[-]
WARN  05:25:17 Read 33972 live and 70068 tombstoned cells in system.schema_columns (see tombstone_warn_threshold). 2147483575 columns was requested, slices=[-]
INFO  05:25:17 ParNew GC in 370ms.  CMS Old Gen: 1498825040 -> 1669094840; Par Eden Space: 251658240 -> 0; 
WARN  05:25:18 Read 2124 live and 4248 tombstoned cells in system.schema_columnfamilies (see tombstone_warn_threshold). 2147483639 columns was requested, slices=[-]
WARN  05:25:18 Read 33972 live and 70068 tombstoned cells in system.schema_columns (see tombstone_warn_threshold). 2147483575 columns was requested, slices=[-]
WARN  05:25:18 Read 2124 live and 4248 tombstoned cells in system.schema_columnfamilies (see tombstone_warn_threshold). 2147483639 columns was requested, slices=[-]
WARN  05:25:18 Read 2124 live and 4248 tombstoned cells in system.schema_columnfamilies (see tombstone_warn_threshold). 2147483639 columns was requested, slices=[-]
WARN  05:25:19 Read 33972 live and 70068 tombstoned cells in system.schema_columns (see tombstone_warn_threshold). 2147483575 columns was requested, slices=[-]
WARN  05:25:19 Read 33972 live and 70068 tombstoned cells in system.schema_columns (see tombstone_warn_threshold). 2147483575 columns was requested, slices=[-]
INFO  05:25:19 ParNew GC in 382ms.  CMS Old Gen: 1714792864 -> 1875460032; Par Eden Space: 251658240 -> 0; 
W

共有1个答案

诸修伟

2023-03-14

我怀疑您在cassandra中遇到了与模式中定义的大量CFS/列相关的边缘情况。通常，当您看到墓碑警告时，这是因为您搞乱了数据模型。但是，这些都在系统表中，所以显然您对表做了一些作者没有预料到的事情（很多很多的表，可能会删除/重新创建它们很多）。

添加这些警告是因为扫描墓碑寻找活动列会导致内存压力，这会导致GC，这会导致暂停，这会导致缓慢。

您可以将数据压缩到更少的列族中吗？您可能还想尝试清除墓碑（将该表的gcgs降为零，在允许的情况下在系统上运行主要压缩？，将其调回默认值）。

类似资料：

Datastax spark cassandra连接器-将DF写入cassandra表

我们最近开始了使用Scala、Spark和Cassandra的大数据项目，我对所有这些技术都是新手。我试图做简单的任务写到和读从卡桑德拉表。如果将属性名和列名都保留为小写或snake大小写（unserscores）就可以实现这一点，但我希望在scala代码中使用camel大小写。在Scala中使用camel case格式，在Cassandra中使用snake case格式，有没有更好的方法来实现这
使用spark-cassandra连接器的Cassandra插入器性能

谁能告诉我为什么火花连接器要花这么多时间插入？我在代码中做了什么错误吗？或者使用spark-cassandra连接器进行插入操作是否不可取？
无法使用Spark Cassandra连接器1.5.0连接Cassandra 3.0

问题-无法使用Spark Cassandra连接器1.5.0连接Cassandra 3.0 根据DataStax Spark Cassandra Connector文档，它说Spark Connector 1.5可以从Spark 1.5.0/1.6.0用于Cassandra 3.0。你能告诉我我是不是漏掉了哪一步？尝试的方法在“pom.xml”中添加了单独的番石榴依赖项提前谢了。
Spark cassandra连接器+连接超时

**dataframe2:从另一个来源获得的键的Dataframe（这些键是上表中ID列的分区键）-此表中不同键的数量约为0.15万** 现在，此代码总是导致“com.datastax.oss.driver.api.core.servererrors.ReadFailureException：在一致性LOCAL_ONE读取查询期间Cassandra失败（需要1个响应，但只有0个副本响应，1个失败）
Spark Cassandra连接器-perPartitionLimit

注意，这里是每个cassandra分区的限制，而不是每个spark分区的限制（连接器中现有的限制函数支持这一点）。 spark 2.0.1，连接器-2.0.0-M3
spark-cassandra连接器的Spark cassandra集成错误

我得到了一个错误：- 线程“main”java.lang.nosuchmethoderror：com.datastax.driver.core.queryoptions.setrefreshnodeintervalmillis（I）lcom/datastax/driver/core/queryoptions；**在com.datastax.spark.connector.cql.defaultCo

使用spark-cassandra连接器在cassandra中写入时间

共有1个答案

相关问答

相关文章

相关阅读

相关工具

相关文档