This is not an officially supported Google product.
Project status: beta
Current API version: v1beta2
If you are currently using the v1beta1 version of the APIs in your manifests, please update them to use the v1beta2 version by changing apiVersion: "sparkoperator.k8s.io/&lt;version&gt;" to apiVersion: "sparkoperator.k8s.io/v1beta2". You will also need to delete the previous version of the CustomResourceDefinitions named sparkapplications.sparkoperator.k8s.io and scheduledsparkapplications.sparkoperator.k8s.io, and replace them with the v1beta2 version, either by installing the latest version of the operator or by running kubectl create -f manifest/crds.
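For example, assuming your checkout of the operator repository contains the manifest/crds directory mentioned above, the migration can be sketched as follows (my-spark-app.yaml is a hypothetical manifest file name):

```sh
# Point existing manifests at the v1beta2 API version.
sed -i 's|sparkoperator.k8s.io/v1beta1|sparkoperator.k8s.io/v1beta2|' my-spark-app.yaml

# Delete the old v1beta1 CustomResourceDefinitions.
kubectl delete crd sparkapplications.sparkoperator.k8s.io
kubectl delete crd scheduledsparkapplications.sparkoperator.k8s.io

# Recreate the CRDs from the v1beta2 manifests shipped with the operator.
kubectl create -f manifest/crds
```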
Customization of Spark pods, e.g., mounting arbitrary volumes and setting pod affinity, is implemented using a Kubernetes Mutating Admission Webhook, which became beta in Kubernetes 1.9. The mutating admission webhook is disabled by default if you install the operator using the Helm chart. Check out the Quick Start Guide on how to enable the webhook.
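As a sketch, enabling the webhook at install time is done through a chart value; the flag name below is an assumption (it has changed across chart versions), so confirm it against the chart's README:

```sh
# Assumed chart value; older chart versions exposed a differently named flag.
$ helm install my-release spark-operator/spark-operator \
    --namespace spark-operator --create-namespace \
    --set webhook.enable=true
```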
The operator also relies on the subresource support for CustomResourceDefinitions, which became beta in 1.13 and is enabled by default in 1.13 and higher.

The easiest way to install the Kubernetes Operator for Apache Spark is to use the Helm chart.
$ helm repo add spark-operator https://googlecloudplatform.github.io/spark-on-k8s-operator
$ helm install my-release spark-operator/spark-operator --namespace spark-operator --create-namespace
This will install the Kubernetes Operator for Apache Spark into the namespace spark-operator. By default the operator watches and handles SparkApplications in all namespaces. If you would like to limit the operator to watching and handling SparkApplications in a single namespace, e.g., default, add the following option to the helm install command:
--set sparkJobNamespace=default
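For example, combining this with the install command above:

```sh
$ helm install my-release spark-operator/spark-operator \
    --namespace spark-operator --create-namespace \
    --set sparkJobNamespace=default
```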
For configuration options available in the Helm chart, please refer to the chart's README.
The following table lists the most recent few versions of the operator.
| Operator Version | API Version | Kubernetes Version | Base Spark Version | Operator Image Tag |
|---|---|---|---|---|
| latest (master HEAD) | v1beta2 | 1.13+ | 3.0.0 | latest |
| v1beta2-1.2.3-3.1.1 | v1beta2 | 1.13+ | 3.1.1 | v1beta2-1.2.3-3.1.1 |
| v1beta2-1.2.0-3.0.0 | v1beta2 | 1.13+ | 3.0.0 | v1beta2-1.2.0-3.0.0 |
| v1beta2-1.1.2-2.4.5 | v1beta2 | 1.13+ | 2.4.5 | v1beta2-1.1.2-2.4.5 |
| v1beta2-1.0.1-2.4.4 | v1beta2 | 1.13+ | 2.4.4 | v1beta2-1.0.1-2.4.4 |
| v1beta2-1.0.0-2.4.4 | v1beta2 | 1.13+ | 2.4.4 | v1beta2-1.0.0-2.4.4 |
| v1beta1-0.9.0 | v1beta1 | 1.13+ | 2.4.0 | v2.4.0-v1beta1-0.9.0 |
When installing using the Helm chart, you can choose to use a specific image tag instead of the default one, using the following option:
--set image.tag=<operator image tag>
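For example, to pin the operator to the Spark 3.1.1 image listed in the table above:

```sh
$ helm install my-release spark-operator/spark-operator \
    --namespace spark-operator --create-namespace \
    --set image.tag=v1beta2-1.2.3-3.1.1
```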
Get started quickly with the Kubernetes Operator for Apache Spark using the Quick Start Guide.
If you are running the Kubernetes Operator for Apache Spark on Google Kubernetes Engine and want to use Google Cloud Storage (GCS) and/or BigQuery for reading/writing data, also refer to the GCP guide.
For more information, check the Design, API Specification and detailed User Guide.
The Kubernetes Operator for Apache Spark aims to make specifying and running Spark applications as easy and idiomatic as running other workloads on Kubernetes. It uses Kubernetes custom resources for specifying, running, and surfacing status of Spark applications. For a complete reference of the custom resource definitions, please refer to the API Definition. For details on its design, please refer to the design doc. It requires Spark 2.3 and above, which supports Kubernetes as a native scheduler backend.
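As an illustration, a minimal SparkApplication manifest looks roughly like the following. This is a sketch modeled on the spark-pi example shipped with the operator; the image, jar path, and service account name are assumptions that need to match your environment:

```yaml
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Scala
  mode: cluster
  # Assumed image and jar location; adjust to the Spark version you run.
  image: "gcr.io/spark-operator/spark:v3.1.1"
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar"
  sparkVersion: "3.1.1"
  restartPolicy:
    type: Never
  driver:
    cores: 1
    memory: "512m"
    serviceAccount: spark   # assumed service account allowed to create executor pods
  executor:
    cores: 1
    instances: 2
    memory: "512m"
```

Applying this manifest with kubectl creates the SparkApplication object, and the operator then runs spark-submit for it as described below.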
The Kubernetes Operator for Apache Spark currently supports the following list of features:
- Automatically runs spark-submit on behalf of users for each SparkApplication eligible for submission.
- Supports automatic application re-submission for updated SparkApplication objects with updated specification.
- Supports mounting local Hadoop configuration as a Kubernetes ConfigMap automatically via sparkctl.
- Supports automatically staging local application dependencies to Google Cloud Storage (GCS) via sparkctl.
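A typical sparkctl workflow, sketched under the assumption that spark-pi.yaml contains a SparkApplication manifest like the one above, looks like this:

```sh
# Create the SparkApplication (and stage any local dependencies).
$ sparkctl create spark-pi.yaml

# Check application status and fetch driver logs.
$ sparkctl status spark-pi
$ sparkctl log spark-pi

# Tear the application down when finished.
$ sparkctl delete spark-pi
```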
Please check out CONTRIBUTING.md and the Developer Guide.