
spark-on-k8s-operator

License: Apache-2.0
Development language: Go
Category: Cloud Computing / Cloud Native
Software type: Open source software
Operating system: Cross-platform

Software Overview

This is not an officially supported Google product.

Community

Project Status

Project status: beta

Current API version: v1beta2

If you are currently using the v1beta1 version of the APIs in your manifests, please update them to use the v1beta2 version by changing apiVersion: "sparkoperator.k8s.io/<version>" to apiVersion: "sparkoperator.k8s.io/v1beta2". You will also need to delete the previous version of the CustomResourceDefinitions named sparkapplications.sparkoperator.k8s.io and scheduledsparkapplications.sparkoperator.k8s.io, and replace them with the v1beta2 version either by installing the latest version of the operator or by running kubectl create -f manifest/crds.
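
For example, a minimal sequence of commands for the CRD replacement described above might look like the following (this assumes you are working from a checkout of the operator repository so that manifest/crds is available):

$ kubectl delete crd sparkapplications.sparkoperator.k8s.io scheduledsparkapplications.sparkoperator.k8s.io
$ kubectl create -f manifest/crds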

Customization of Spark pods, e.g., mounting arbitrary volumes and setting pod affinity, is implemented using a Kubernetes Mutating Admission Webhook, which became beta in Kubernetes 1.9. The mutating admission webhook is disabled by default if you install the operator using the Helm chart. Check out the Quick Start Guide on how to enable the webhook.
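
As a sketch of what enabling the webhook looks like, chart versions contemporary with this operator expose a value named webhook.enable (treat the exact value name as an assumption and confirm it against the chart's README or the Quick Start Guide):

$ helm install my-release spark-operator/spark-operator \
    --namespace spark-operator --create-namespace \
    --set webhook.enable=true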

Prerequisites

  • Version >= 1.13 of Kubernetes (see the Version Matrix below).

Installation

The easiest way to install the Kubernetes Operator for Apache Spark is to use the Helm chart.

$ helm repo add spark-operator https://googlecloudplatform.github.io/spark-on-k8s-operator

$ helm install my-release spark-operator/spark-operator --namespace spark-operator --create-namespace

This will install the Kubernetes Operator for Apache Spark into the namespace spark-operator. By default, the operator watches and handles SparkApplications in all namespaces. If you would like to limit the operator to watching and handling SparkApplications in a single namespace, e.g., default, add the following option to the helm install command:

--set sparkJobNamespace=default
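
Put together, a complete install command restricted to the default namespace would look like this:

$ helm install my-release spark-operator/spark-operator \
    --namespace spark-operator --create-namespace \
    --set sparkJobNamespace=default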

For configuration options available in the Helm chart, please refer to the chart's README.

Version Matrix

The following table lists the most recent few versions of the operator.

Operator Version       API Version   Kubernetes Version   Base Spark Version   Operator Image Tag
latest (master HEAD)   v1beta2       1.13+                3.0.0                latest
v1beta2-1.2.3-3.1.1    v1beta2       1.13+                3.1.1                v1beta2-1.2.3-3.1.1
v1beta2-1.2.0-3.0.0    v1beta2       1.13+                3.0.0                v1beta2-1.2.0-3.0.0
v1beta2-1.1.2-2.4.5    v1beta2       1.13+                2.4.5                v1beta2-1.1.2-2.4.5
v1beta2-1.0.1-2.4.4    v1beta2       1.13+                2.4.4                v1beta2-1.0.1-2.4.4
v1beta2-1.0.0-2.4.4    v1beta2       1.13+                2.4.4                v1beta2-1.0.0-2.4.4
v1beta1-0.9.0          v1beta1       1.13+                2.4.0                v2.4.0-v1beta1-0.9.0

When installing using the Helm chart, you can choose to use a specific image tag instead of the default one, using the following option:

--set image.tag=<operator image tag>
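
For example, to pin the operator to the image built for Spark 3.1.1 from the Version Matrix above:

--set image.tag=v1beta2-1.2.3-3.1.1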

Get Started

Get started quickly with the Kubernetes Operator for Apache Spark using the Quick Start Guide.

If you are running the Kubernetes Operator for Apache Spark on Google Kubernetes Engine and want to use Google Cloud Storage (GCS) and/or BigQuery for reading/writing data, also refer to the GCP guide.

For more information, check the Design, API Specification and detailed User Guide.

Overview

The Kubernetes Operator for Apache Spark aims to make specifying and running Spark applications as easy and idiomatic as running other workloads on Kubernetes. It uses Kubernetes custom resources for specifying, running, and surfacing status of Spark applications. For a complete reference of the custom resource definitions, please refer to the API Definition. For details on its design, please refer to the design doc. It requires Spark 2.3 and above that supports Kubernetes as a native scheduler backend.
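
As an illustration of this declarative model, a minimal SparkApplication could be submitted as shown below. This is only a sketch: the image, main class, jar path, and service account name are placeholders modeled on the operator's example manifests, not values taken from this document.

$ kubectl apply -f - <<EOF
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: "gcr.io/spark-operator/spark:v3.1.1"   # placeholder image
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar"
  sparkVersion: "3.1.1"
  restartPolicy:
    type: Never
  driver:
    cores: 1
    memory: "512m"
    serviceAccount: spark   # assumes a service account with the required RBAC exists
  executor:
    cores: 1
    instances: 1
    memory: "512m"
EOF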

The Kubernetes Operator for Apache Spark currently supports the following list of features:

  • Supports Spark 2.3 and up.
  • Enables declarative application specification and management of applications through custom resources.
  • Automatically runs spark-submit on behalf of users for each SparkApplication eligible for submission.
  • Provides native cron support for running scheduled applications.
  • Supports customization of Spark pods beyond what Spark natively is able to do through the mutating admission webhook, e.g., mounting ConfigMaps and volumes, and setting pod affinity/anti-affinity.
  • Supports automatic application re-submission for updated SparkApplication objects with updated specification.
  • Supports automatic application restart with a configurable restart policy.
  • Supports automatic retries of failed submissions with optional linear back-off.
  • Supports mounting local Hadoop configuration as a Kubernetes ConfigMap automatically via sparkctl.
  • Supports automatically staging local application dependencies to Google Cloud Storage (GCS) via sparkctl.
  • Supports collecting and exporting application-level metrics and driver/executor metrics to Prometheus.

Contributing

Please check out CONTRIBUTING.md and the Developer Guide.
