我使用以下Dockerfile创建了一个Spark容器:
FROM ubuntu:16.04
RUN apt-get update -y && apt-get install -y \
default-jdk \
nano \
wget && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
RUN useradd --create-home --shell /bin/bash ubuntu
ENV HOME /home/ubuntu
ENV SPARK_VERSION 2.4.3
ENV HADOOP_VERSION 2.6
ENV MONGO_SPARK_VERSION 2.2.0
ENV SCALA_VERSION 2.11
WORKDIR ${HOME}
ENV SPARK_HOME ${HOME}/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}
ENV PATH ${PATH}:${SPARK_HOME}/bin
COPY files/times.json /home/ubuntu/times.json
COPY files/README.md /home/ubuntu/README.md
COPY files/examples.scala /home/ubuntu/examples.scala
COPY files/initDocuments.scala /home/ubuntu/initDocuments.scala
RUN chown -R ubuntu:ubuntu /home/ubuntu/*
USER ubuntu
# get spark
RUN wget http://apache.mirror.digitalpacific.com.au/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz && \
tar xvf spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz
RUN rm -fv spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz
我还有两个用Scala编程语言编写的文件,这对我来说听起来很新。问题在于容器只知道Java,而没有安装任何其他命令。有什么方法可以在容器上没有安装任何程序的情况下运行Scala?
文件名是examples.scala
和initDocuments.scala
。这是initDocuments.scala文件:
import com.mongodb.spark._
import com.mongodb.spark.config._
import org.bson.Document
val rdd = MongoSpark.load(sc)
if (rdd.count<1){
val t = sc.textFile("times.json")
val converted = t.map((tuple)=>Document.parse(tuple))
converted.saveToMongoDB(WriteConfig(Map("uri"->"mongodb://mongodb/spark.times")))
println("Documents inserted.")
} else {
println("Database 'spark' collection 'times' is not empty. Maybe you've loaded a data into the collection previously ? skipping process. ")
}
System.exit(0);
我也尝试了以下方法,但不起作用。
spark-shell --conf "spark.mongodb.input.uri=mongodb://mongodb:27017/spark.times" --conf "spark.mongodb.output.uri=mongodb://mongodb/spark.output" --packages org.mongodb.spark:mongo-spark-connector_${SCALA_VERSION}:${MONGO_SPARK_VERSION} -i ./initDocuments.scala
错误:
Ivy Default Cache set to: /home/ubuntu/.ivy2/cache
The jars for the packages stored in: /home/ubuntu/.ivy2/jars
:: loading settings :: url = jar:file:/home/ubuntu/spark-2.4.3-bin-hadoop2.6/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
org.mongodb.spark#mongo-spark-connector_2.11 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-d0f95242-e9b9-4d49-8dde-42afc7c55e9a;1.0
confs: [default]
You probably access the destination server through a proxy server that is not well configured.
You probably access the destination server through a proxy server that is not well configured.
You probably access the destination server through a proxy server that is not well configured.
You probably access the destination server through a proxy server that is not well configured.
:: resolution report :: resolve 40879ms :: artifacts dl 0ms
:: modules in use:
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 1 | 0 | 0 | 0 || 0 | 0 |
---------------------------------------------------------------------
:: problems summary ::
:::: WARNINGS
Host repo1.maven.org not found. url=https://repo1.maven.org/maven2/org/mongodb/spark/mongo-spark-connector_2.11/2.2.0/mongo-spark-connector_2.11-2.2.0.pom
Host repo1.maven.org not found. url=https://repo1.maven.org/maven2/org/mongodb/spark/mongo-spark-connector_2.11/2.2.0/mongo-spark-connector_2.11-2.2.0.jar
Host dl.bintray.com not found. url=https://dl.bintray.com/spark-packages/maven/org/mongodb/spark/mongo-spark-connector_2.11/2.2.0/mongo-spark-connector_2.11-2.2.0.pom
Host dl.bintray.com not found. url=https://dl.bintray.com/spark-packages/maven/org/mongodb/spark/mongo-spark-connector_2.11/2.2.0/mongo-spark-connector_2.11-2.2.0.jar
module not found: org.mongodb.spark#mongo-spark-connector_2.11;2.2.0
==== local-m2-cache: tried
file:/home/ubuntu/.m2/repository/org/mongodb/spark/mongo-spark-connector_2.11/2.2.0/mongo-spark-connector_2.11-2.2.0.pom
-- artifact org.mongodb.spark#mongo-spark-connector_2.11;2.2.0!mongo-spark-connector_2.11.jar:
file:/home/ubuntu/.m2/repository/org/mongodb/spark/mongo-spark-connector_2.11/2.2.0/mongo-spark-connector_2.11-2.2.0.jar
==== local-ivy-cache: tried
/home/ubuntu/.ivy2/local/org.mongodb.spark/mongo-spark-connector_2.11/2.2.0/ivys/ivy.xml
-- artifact org.mongodb.spark#mongo-spark-connector_2.11;2.2.0!mongo-spark-connector_2.11.jar:
/home/ubuntu/.ivy2/local/org.mongodb.spark/mongo-spark-connector_2.11/2.2.0/jars/mongo-spark-connector_2.11.jar
==== central: tried
https://repo1.maven.org/maven2/org/mongodb/spark/mongo-spark-connector_2.11/2.2.0/mongo-spark-connector_2.11-2.2.0.pom
-- artifact org.mongodb.spark#mongo-spark-connector_2.11;2.2.0!mongo-spark-connector_2.11.jar:
https://repo1.maven.org/maven2/org/mongodb/spark/mongo-spark-connector_2.11/2.2.0/mongo-spark-connector_2.11-2.2.0.jar
==== spark-packages: tried
https://dl.bintray.com/spark-packages/maven/org/mongodb/spark/mongo-spark-connector_2.11/2.2.0/mongo-spark-connector_2.11-2.2.0.pom
-- artifact org.mongodb.spark#mongo-spark-connector_2.11;2.2.0!mongo-spark-connector_2.11.jar:
https://dl.bintray.com/spark-packages/maven/org/mongodb/spark/mongo-spark-connector_2.11/2.2.0/mongo-spark-connector_2.11-2.2.0.jar
::::::::::::::::::::::::::::::::::::::::::::::
:: UNRESOLVED DEPENDENCIES ::
::::::::::::::::::::::::::::::::::::::::::::::
:: org.mongodb.spark#mongo-spark-connector_2.11;2.2.0: not found
::::::::::::::::::::::::::::::::::::::::::::::
:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
Exception in thread "main" java.lang.RuntimeException: [unresolved dependency: org.mongodb.spark#mongo-spark-connector_2.11;2.2.0: not found]
at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1306)
at org.apache.spark.deploy.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:54)
at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:315)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:143)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
PS: 我试图使用以下命令来更改代理地址,但我认为我的使用情况没有很好的代理。如果有人可以帮助我运行配置良好的代理来解决我的下载问题,我将不胜感激。
export JAVA_OPTS="$JAVA_OPTS -Dhttp.proxyHost=yourserver -Dhttp.proxyPort=8080 -Dhttp.proxyUser=username -Dhttp.proxyPassword=password"
根据下面的错误消息
:: org.mongodb.spark#mongo-spark-connector_2.11;2.2.0: not found
它指示该软件包丢失。检查当前可用的MongoDB Connector for Spark软件包,确认该软件包不再可用(已替换v2.2.6修补程序)。
您可以在sindbach / mongodb-spark-docker上查看带有Docker的MongoDB Spark连接器的更新示例。
附加信息: spark-shell是REPL(读取-评估-
打印循环)工具。它是程序员用于与框架进行交互的交互式外壳。您无需显式执行即可build
执行。当您指定它的--packages
参数时spark- shell
,它将自动获取程序包并将其包含在您的Shell环境中。
我编写docker文件是为了运行jar文件,而它并没有创建日志文件,请参阅下面的控制台是我的docker文件
问题内容: 我正在研究Centos7。我有一个运行Jenkins的Docker容器。在那个Jenkins容器中,我必须构建并运行其他Docker容器。但是詹金斯不认识码头工人。我能够执行一个shell并将docker安装在容器中。但是,是否有可能让容器在主机上使用我的docker- engine?如何使用? 在Jenkins-(docker)-容器中安装Docker的最佳选择是什么? 问题答案:
我已经在我的主机虚拟机上安装了docker。现在想用vi创建一个文件。 但它向我展示了一个错误:
问题内容: 地球人你好!我正在使用Airflow计划和运行Spark任务。我这次发现的所有内容都是Airflow可以管理的python DAG。 DAG示例: 问题是我的Python代码不好,并且有一些用Java编写的任务。我的问题是如何在python DAG中运行Spark Java jar?或者,也许还有其他方法吗?我发现了spark提交:http : //spark.apache.org/d
我有以下docker文件,它运行: 问题是,目前我必须首先在主机上运行 ,以便在路径的本地计算机上创建 jar 文件: 我如何将这个< code>gradle clean build放入我的docker文件中,以便构建步骤在容器内部完成? 编辑: 我希望用户的步骤是: < li >从< code>github克隆我的项目 < li >运行< code>docker build -t pokerst
Azure文档(https://docs.microsoft.com/en-us/Azure/devops/pipelines/tasks/build/docker?view=azure-devops)没有指定如何在Azure pipeline中运行docker容器。我们可以使用docker@2任务来构建/推送docker图像,但它没有运行容器的命令。通过查看旧版本Docker task的源代码,