我想创建一个独立的scala代码,使用自定义设置在MongoDB网站上使用该代码从MongoDB读取。
当我运行SBT包时,我会遇到一些错误。我猜这与SparkSession的错误创作方法有关。你能给我一个提示来修理它吗?
我的建筑。sbt
内容
scalaVersion := "2.11.12"
libraryDependencies ++= Seq(
"org.mongodb.spark" %% "mongo-spark-connector" % "2.4.1",
"org.apache.spark" %% "spark-core" % "2.4.1",
"org.apache.spark" %% "spark-sql" % "2.4.1"
)
Firstapp.scala代码
package com.mongodb
import org.apache.spark.sql.SparkSession
import com.mongodb.spark.config.{ReadConfig,WriteConfig}
import com.mongodb.spark.MongoSpark
import org.bson.Document
object FirstApp {
def main(args: Array[String]) {
val sc = SparkSession.builder()
.master("local")
.appName("MongoSparkConnectorIntro")
.config("spark.mongodb.input.uri", "mongodb://127.0.0.1/test.myCollection")
.config("spark.mongodb.output.uri", "mongodb://127.0.0.1/test.myCollection")
.getOrCreate()
val readConfig = ReadConfig(Map("collection" -> "spark", "readPreference.name" -> "secondaryPreferred"), Some(ReadConfig(sc)))
val customRdd = MongoSpark.load(sc, readConfig)
println(customRdd.count)
println(customRdd.first.toJson)
}
}
以及运行sbt包
后的错误
value toJson is not a member of org.apache.spark.sql.Row
[error] println(customRdd.first.toJson)
[error] ^
[error] one error found
[error] (Compile / compileIncremental) Compilation failed
[error] Total time: 10 s, completed Jun 10, 2020 6:10:50 PM
编辑1:
我尝试了这个解决方案,但没有正确编译。Buid。sbt
内容同上。我改变了SimpleApp。将scala
转换为:
import com.mongodb.spark.MongoSpark
import com.mongodb.spark.config.ReadConfig
import org.apache.spark.sql.SparkSession
object FirstApp {
def main(args: Array[String]) {
val spark = SparkSession.builder()
.master("local")
.appName("MongoSparkConnectorIntro")
.config("spark.mongodb.input.uri", "mongodb://127.0.0.1/test.myCollection")
.config("spark.mongodb.output.uri", "mongodb://127.0.0.1/test.myCollection")
.getOrCreate()
val sc = spark.sparkContext
val readConfig = ReadConfig(Map("collection" -> "spark", "readPreference.name" -> "secondaryPreferred"), Some(ReadConfig(sc)))
val customRdd = MongoSpark.load(sc)
println(customRdd.count())
println(customRdd.first.toJson)
}
}
汇编结果如下:
$ spark-submit --class "FirstApp" --master local[4] target/scala-2.11/root-2_2.11-0.1.0-SNAPSHOT.jar
20/06/12 07:09:53 WARN Utils: Your hostname, Project resolves to a loopback address: 127.0.1.1; using 10.0.2.15 instead (on interface enp0s3)
20/06/12 07:09:53 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
20/06/12 07:09:54 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
20/06/12 07:09:54 INFO SparkContext: Running Spark version 2.4.5
20/06/12 07:09:54 INFO SparkContext: Submitted application: MongoSparkConnectorIntro
20/06/12 07:09:55 INFO SecurityManager: Changing view acls to: sadegh
20/06/12 07:09:55 INFO SecurityManager: Changing modify acls to: sadegh
20/06/12 07:09:55 INFO SecurityManager: Changing view acls groups to:
20/06/12 07:09:55 INFO SecurityManager: Changing modify acls groups to:
20/06/12 07:09:55 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(sadegh); groups with view permissions: Set(); users with modify permissions: Set(sadegh); groups with modify permissions: Set()
20/06/12 07:09:55 INFO Utils: Successfully started service 'sparkDriver' on port 33031.
20/06/12 07:09:55 INFO SparkEnv: Registering MapOutputTracker
20/06/12 07:09:55 INFO SparkEnv: Registering BlockManagerMaster
20/06/12 07:09:55 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
20/06/12 07:09:55 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
20/06/12 07:09:55 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-7405e1be-08e8-4f58-b88e-b8f01f8fe87e
20/06/12 07:09:55 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
20/06/12 07:09:55 INFO SparkEnv: Registering OutputCommitCoordinator
20/06/12 07:09:55 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
20/06/12 07:09:55 INFO Utils: Successfully started service 'SparkUI' on port 4041.
20/06/12 07:09:56 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.0.2.15:4041
20/06/12 07:09:56 INFO SparkContext: Added JAR file:/Folder/target/scala-2.11/root-2_2.11-0.1.0-SNAPSHOT.jar at spark://10.0.2.15:33031/jars/root-2_2.11-0.1.0-SNAPSHOT.jar with timestamp 1591938596069
20/06/12 07:09:56 INFO Executor: Starting executor ID driver on host localhost
20/06/12 07:09:56 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 42815.
20/06/12 07:09:56 INFO NettyBlockTransferService: Server created on 10.0.2.15:42815
20/06/12 07:09:56 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
20/06/12 07:09:56 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.0.2.15, 42815, None)
20/06/12 07:09:56 INFO BlockManagerMasterEndpoint: Registering block manager 10.0.2.15:42815 with 366.3 MB RAM, BlockManagerId(driver, 10.0.2.15, 42815, None)
20/06/12 07:09:56 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.0.2.15, 42815, None)
20/06/12 07:09:56 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 10.0.2.15, 42815, None)
Exception in thread "main" java.lang.NoClassDefFoundError: com/mongodb/spark/config/ReadConfig$
at FirstApp$.main(SimpleApp.scala:16)
at FirstApp.main(SimpleApp.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: com.mongodb.spark.config.ReadConfig$
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 14 more
20/06/12 07:09:56 INFO SparkContext: Invoking stop() from shutdown hook
20/06/12 07:09:56 INFO SparkUI: Stopped Spark web UI at http://10.0.2.15:4041
20/06/12 07:09:56 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
20/06/12 07:09:56 INFO MemoryStore: MemoryStore cleared
20/06/12 07:09:56 INFO BlockManager: BlockManager stopped
20/06/12 07:09:56 INFO BlockManagerMaster: BlockManagerMaster stopped
20/06/12 07:09:56 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
20/06/12 07:09:56 INFO SparkContext: Successfully stopped SparkContext
20/06/12 07:09:56 INFO ShutdownHookManager: Shutdown hook called
20/06/12 07:09:56 INFO ShutdownHookManager: Deleting directory /tmp/spark-7f90ac08-403c-4a3f-bb45-ea24a347c380
20/06/12 07:09:56 INFO ShutdownHookManager: Deleting directory /tmp/spark-78cb32aa-c6d1-4ba4-b94f-16d3761d181b
编辑2:
我添加了。配置(“spark.jars.packages”、“org.mongodb.spark:mongo-spark-connector_2.11:2.4.1”)
toSimpleApp。scala
但错误与EDIT1部分相同:
import com.mongodb.spark.MongoSpark
import com.mongodb.spark.config.ReadConfig
import org.apache.spark.sql.SparkSession
object FirstApp {
def main(args: Array[String]) {
val spark = SparkSession.builder()
.master("local")
.appName("MongoSparkConnectorIntro")
.config("spark.mongodb.input.uri", "mongodb://127.0.0.1/test.myCollection")
.config("spark.mongodb.output.uri", "mongodb://127.0.0.1/test.myCollection")
.config("spark.jars.packages", "org.mongodb.spark:mongo-spark-connector_2.11:2.4.1")
.getOrCreate()
val sc = spark.sparkContext
val readConfig = ReadConfig(Map("collection" -> "spark", "readPreference.name" -> "secondaryPreferred"), Some(ReadConfig(sc)))
val customRdd = MongoSpark.load(sc)
println(customRdd.count())
println(customRdd.first.toJson)
}
}
我不知道你是否能解决你的问题,但这就是我的工作方式。您需要在spark submit
网站提供--软件包
,如下所示:
spark-submit --class "FirstApp" --master local[4] --packages org.mongodb.spark:mongo-spark-connector_2.11:2.4.1 target/scala-2.11/root-2_2.11-0.1.0-SNAPSHOT.jar
我认为你的问题是你试图将SparkSession
用作SparkContext
,但它们不是一回事。如果将sc
替换为SparkContext
所有内容都将编译。
import com.mongodb.spark.MongoSpark
import com.mongodb.spark.config.ReadConfig
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder()
.master("local")
.appName("MongoSparkConnectorIntro")
.config("spark.mongodb.input.uri", "mongodb://127.0.0.1/test.myCollection")
.config("spark.mongodb.output.uri", "mongodb://127.0.0.1/test.myCollection")
.getOrCreate()
val sc = spark.sparkContext
val readConfig = ReadConfig(Map("collection" -> "spark", "readPreference.name" -> "secondaryPreferred"), Some(ReadConfig(sc)))
val customRdd = MongoSpark.load(sc)
println(customRdd.count())
println(customRdd.first.toJson)
下面是创建一个Scala项目的详细步骤,该项目使用Apache火花从MongoDB读取数据
您可以使用IDE或手动创建包含以下文件的项目
项目/插件。sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")
build.sbt
name := "SparkMongo"
version := "0.1"
scalaVersion := "2.11.12"
val sparkVersion = "2.4.1"
val mongoSparkVersion = "2.4.1"
libraryDependencies ++= Seq(
"org.mongodb.spark" %% "mongo-spark-connector" % mongoSparkVersion ,
"org.apache.spark" %% "spark-core" % sparkVersion,
"org.apache.spark" %% "spark-sql" % sparkVersion
)
assemblyJarName in assembly := s"${name.value}_${scalaBinaryVersion.value}-${version.value}.jar"
assemblyMergeStrategy in assembly := {
case PathList("META-INF", xs @ _*) => MergeStrategy.discard
case x => MergeStrategy.first
}
SparkMongo/src/main/scala/com/test/FirstMongoSparkApp.scala
package com.test
import com.mongodb.spark.MongoSpark
import com.mongodb.spark.config.ReadConfig
import org.apache.spark.sql.SparkSession
object FirstMongoSparkApp extends App {
val spark = SparkSession.builder()
.master("local")
.appName("MongoSparkProject")
.config("spark.mongodb.input.uri", "mongodb://localhost/test.cities")
.config("spark.mongodb.output.uri", "mongodb://localhost/test.outputCities")
.getOrCreate()
import spark.implicits._
val readConfig = ReadConfig(Map("collection" -> "cities", "readPreference.name" -> "secondaryPreferred"), Some(ReadConfig(spark.sparkContext)))
val customRdd = MongoSpark.load(spark.sparkContext, readConfig)
customRdd.toDF().show(false)
}
现在您可以执行sbt assembly
将生成一个jar文件SparkMongo_2.11-0.1。jar
可以按如下方式运行jar文件:
spark-submit --class "com.test.FirstMongoSparkApp" --master "local" target/scala-2.11/SparkMongo_2.11-0.1.jar
要无问题运行,请确保spark的版本与依赖项中的版本相同,在本例中为2.4.1和mongoDB 2.6版
我试图连接到我的本地机器上的Kafka(2.1),并在Flink(1.7.2)附带的scalashell中读取。 下面是我正在做的: 之后,最后一条语句我得到了以下错误: 我已经创建了一个名为“topic”的主题,我能够通过另一个客户端正确地生成和读取来自它的消息。我正在使用java版本1.8.0\u 201,并遵循https://ci.apache.org/projects/flink/flin
我在Android v4中读取APN时遇到问题。2(是读,不是写APN),它抛出了一个安全异常: 没有写入APN设置的权限:用户10068和当前进程都没有android。准许写入APN设置。 以前所有平台上都使用相同的代码,有人知道有什么解决方法吗? 谢谢
我有以下类,它从/到包裹读取和写入对象数组: 在上面的代码中,我在读取< code>readParcelableArray时得到一个< code>ClassCastException: 错误/AndroidRuntime(5880):原因:Java . lang . classcastexception:[land roid . OS . parcelable; 上面的代码有什么错误?在编写对象数
问题内容: 我有以下从缓冲读取器读取数据的示例: 每当缓冲读取器中出现某些情况时(在这种情况下),将执行循环中的代码。在我的情况下,如果客户端应用程序将某些内容写入套接字,则将执行循环中的代码(服务器应用程序中)。 但是我不明白它是如何工作的。等待直到缓冲读取器中出现某些内容,当其中出现某些内容时,它将返回并执行循环中的代码。但是什么时候可以退货。 还有另一个问题。上面的代码摘自一个方法,我在线程
下面是代码示例: 错误是: 错误:(17,13)value withFilter不是cats的成员。数据ReaderT[测试q.this.FailFast,映射[字符串,字符串],布尔值]b1 如何使用f2组合f1。f2必须仅在f1返回右(true)时应用。我通过以下方式解决了它: 但我希望有一个更优雅的解决方案。
我试图更好地理解Scala MongoDB src 使用scala mongodb驱动程序(api文档:http://mongodb.github.io/mongo-scala-driver/) 当我使用 似乎调用subscribe调用了一个新线程(因为它被称为subscribe),但我没有看到这个新线程是从src调用的? 当我使用? 更新: MongoClient.Close();正在inser