Problem:
Spark Streaming writes real-time aggregation results to MongoDB. With the mongo-java API, every write needs an extra round trip: query by the dimension field, update the metrics if a document exists, otherwise insert a new document with the dimension and metric fields. This read-then-write pattern is slow.
Switching to the mongo-scala (Casbah) API, the `upsert` mode of `update` performs insert-or-update in a single operation; the fields used in the query should have an index in MongoDB.
/**
 * Performs an update operation.
 * @param q search query for old object to update
 * @param o object with which to update `q`
 */
def update[A, B](q: A, o: B, upsert: Boolean = false, multi: Boolean = false,
                 concern: com.mongodb.WriteConcern = this.writeConcern,
                 bypassDocumentValidation: Option[Boolean] = None)(implicit queryView: A => DBObject, objView: B => DBObject,
                 encoder: DBEncoder = customEncoderFactory.map(_.create).orNull): WriteResult = {
  bypassDocumentValidation match {
    case None => underlying.update(queryView(q), objView(o), upsert, multi, concern, encoder)
    case Some(bypassValidation) => underlying.update(queryView(q), objView(o), upsert, multi, concern, bypassValidation, encoder)
  }
}
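A minimal sketch of the upsert pattern with Casbah. The database name `mydb`, collection `stats`, dimension field `dim`, and metric field `pv` are all illustrative; inside Spark Streaming this would typically run per partition in a `foreachRDD` / `foreachPartition` block so the connection is reused across records:

```scala
import com.mongodb.casbah.Imports._

object UpsertExample {
  def main(args: Array[String]): Unit = {
    // Connect to MongoDB (host/port are illustrative)
    val client = MongoClient("localhost", 27017)
    val coll = client("mydb")("stats")

    // Index the query field so each upsert is an index lookup,
    // not a collection scan
    coll.createIndex(MongoDBObject("dim" -> 1))

    // Single round trip per record: $inc the metric if a document
    // with this dimension exists, otherwise insert one (upsert = true)
    coll.update(
      MongoDBObject("dim" -> "2017-01-01"),
      $inc("pv" -> 1),
      upsert = true
    )

    client.close()
  }
}
```

Using `$inc` (rather than `$set`) keeps the increment atomic on the server, so concurrent batches do not overwrite each other's counts.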
Add the dependencies (pom.xml):
<dependency>
    <groupId>org.mongodb</groupId>
    <artifactId>casbah_2.11</artifactId>
    <version>3.1.1</version>
    <type>pom</type>
</dependency>
<dependency>
    <groupId>org.mongodb.spark</groupId>
    <artifactId>mongo-spark-connector_2.11</artifactId>
    <version>${spark.version}</version>
</dependency>
Note: remove any direct mongo-java-driver dependency from the pom. Casbah already pulls the Java driver in transitively, and a second copy at a different version causes conflicts.
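If another dependency also drags in its own copy of the Java driver, it can be excluded there instead of removed outright; a sketch using the connector dependency above (the exclusion coordinates are the Java driver's standard Maven coordinates):

```xml
<dependency>
    <groupId>org.mongodb.spark</groupId>
    <artifactId>mongo-spark-connector_2.11</artifactId>
    <version>${spark.version}</version>
    <exclusions>
        <!-- Let Casbah supply the Java driver to avoid version conflicts -->
        <exclusion>
            <groupId>org.mongodb</groupId>
            <artifactId>mongo-java-driver</artifactId>
        </exclusion>
    </exclusions>
</dependency>
```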