I am trying to run a Spark application on a remote cluster, but I am getting a serialization error. The Scala and Spark versions are the same. I am stuck at this point.
Spark version on the cluster:
root@a913008dd071:/usr/local/spark-2.1.1# ./bin/spark-submit --version
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.1
      /_/
Using Scala version 2.11.8, OpenJDK 64-Bit Server VM, 1.8.0_131
Branch
Compiled by user jenkins on 2017-04-25T23:51:10Z
Revision
Url
Type --help for more information.
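For comparison, the Spark and Scala versions the driver actually runs with can be printed from the application itself; a minimal sketch (the object name is illustrative):

// VersionCheck.scala -- prints the Spark and Scala versions the driver actually runs with
import org.apache.spark.sql.SparkSession

object VersionCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("version-check").getOrCreate()
    println(s"Spark: ${spark.version}")                       // e.g. 2.1.1 on the cluster
    println(s"Scala: ${scala.util.Properties.versionString}") // e.g. version 2.11.8
    spark.stop()
  }
}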
build.sbt:
import sbt.ExclusionRule
name := "hxfa"
version := "1.0"
scalaVersion := "2.11.8"
val elasticVersion = "5.4.1"
resolvers += "Spark Packages" at "https://dl.bintray.com/spark-packages/maven/"
resolvers += "Additional spark packages" at "https://dl.bintray.com/sbcd90/org.apache.spark"
resolvers += "Apache HBase" at "https://repository.apache.org/content/repositories/releases"
resolvers += "Thrift" at "http://people.apache.org/~rawson/repo/"
resolvers += "Spring Plugins" at "http://repo.spring.io/plugins-release/"
/* Dependencies */
libraryDependencies ++= Seq(
// Framework and configuration
"org.springframework.boot" % "spring-boot-starter-web" % "1.5.4.RELEASE",
"org.hibernate" % "hibernate-validator" % "5.2.4.Final",
/* Serializations */
"com.fasterxml.jackson.core" % "jackson-core" % "2.8.7",
"com.fasterxml.jackson.core" % "jackson-databind" % "2.8.7",
"com.fasterxml.jackson.module" % "jackson-module-scala_2.11" % "2.8.7",
"com.esotericsoftware" % "kryo" % "4.0.0",
// Spark and utilities
"org.apache.spark" %% "spark-core" % "2.1.0",
"org.apache.spark" %% "spark-sql" % "2.1.0" ,
"org.apache.spark" %% "spark-mllib" % "2.1.0" ,
"graphframes" % "graphframes" % "0.5.0-spark2.1-s_2.11",
// Spark connectors
"org.elasticsearch" % "elasticsearch-spark-20_2.11" % elasticVersion,
"org.mongodb.spark" % "mongo-spark-connector_2.11" % "2.0.0",
//JDBC
"mysql" % "mysql-connector-java" % "5.1.35",
// HBase
"org.apache.hbase" % "hbase" % "1.2.4",
"org.apache.hbase" % "hbase-client" % "1.2.4",
"org.apache.hbase" % "hbase-common" % "1.2.4",
// OrientDB
"com.orientechnologies" % "orientdb-graphdb" % "2.2.20"
).map(_.excludeAll(ExclusionRule("org.slf4j", "slf4j-log4j12"), ExclusionRule("log4j", "log4j")))
libraryDependencies ++= Seq(
"org.apache.hbase" % "hbase-server" % "1.2.4"
).map(_.excludeAll(
ExclusionRule("com.sun.jersey", "jersey-server"),
ExclusionRule("tomcat"),
ExclusionRule("log4j", "log4j")
))
/* Assembly */
mainClass in assembly := Some("com.x.x.hello.app.HX")
assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false, includeDependency = false)
assemblyMergeStrategy in assembly := {
case PathList("META-INF", xs@_*) => MergeStrategy.discard
case x => MergeStrategy.first
}
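For context, a jar assembled with these settings would typically be submitted along these lines; the master URL and host are placeholders, and the jar name follows sbt-assembly's default of <name>-assembly-<version>.jar for this build:

./bin/spark-submit \
  --class com.x.x.hello.app.HX \
  --master spark://<cluster-host>:7077 \
  target/scala-2.11/hxfa-assembly-1.0.jar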
Stack trace:
spark-submit --version shows the Spark and Scala versions that ship with that Spark distribution, not your system's versions, while sbt builds your application with the versions declared in your build.sbt. So please change
"org.apache.spark" %% "spark-core" % "2.1.0",
"org.apache.spark" %% "spark-sql" % "2.1.0" ,
"org.apache.spark" %% "spark-mllib" % "2.1.0" ,
to
"org.apache.spark" % "spark-core_2.11" % "2.1.1",
"org.apache.spark" % "spark-sql_2.11" % "2.1.1" ,
"org.apache.spark" % "spark-mllib_2.11" % "2.1.1" ,
If that does not help, please update your question with your system's Scala version, how you are submitting the application, and the Scala and Spark versions on your remote machine.
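Note that "org.apache.spark" % "spark-core_2.11" with % is equivalent to "spark-core" with %% when scalaVersion is 2.11.x, so the effective change above is the Spark version, 2.1.0 to 2.1.1. A common way to rule out this class of mismatch entirely is to mark the Spark artifacts as provided, so the cluster's own Spark 2.1.1 classes are used at runtime rather than whatever sbt resolves; a sketch:

// build.sbt sketch: compile against Spark, but let the cluster provide it at runtime
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "2.1.1" % "provided",
  "org.apache.spark" %% "spark-sql"   % "2.1.1" % "provided",
  "org.apache.spark" %% "spark-mllib" % "2.1.1" % "provided"
)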