
spark-on-k8s-operator

License: Apache-2.0
Development language: Go
Category: Cloud Computing / Cloud Native
Software type: Open source software
Operating system: Cross-platform

Software Overview

This is not an officially supported Google product.

Community

Project Status

Project status: beta

Current API version: v1beta2

If you are currently using the v1beta1 version of the APIs in your manifests, please update them to use the v1beta2 version by changing apiVersion: "sparkoperator.k8s.io/<version>" to apiVersion: "sparkoperator.k8s.io/v1beta2". You will also need to delete the previous version of the CustomResourceDefinitions named sparkapplications.sparkoperator.k8s.io and scheduledsparkapplications.sparkoperator.k8s.io, and replace them with the v1beta2 version either by installing the latest version of the operator or by running kubectl create -f manifest/crds.
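
For example, a minimal sequence of commands for the CRD replacement described above might look like the following (this assumes you are working from a checkout of the operator repository so that manifest/crds is available):

$ kubectl delete crd sparkapplications.sparkoperator.k8s.io scheduledsparkapplications.sparkoperator.k8s.io
$ kubectl create -f manifest/crds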

Customization of Spark pods, e.g., mounting arbitrary volumes and setting pod affinity, is implemented using a Kubernetes Mutating Admission Webhook, which became beta in Kubernetes 1.9. The mutating admission webhook is disabled by default if you install the operator using the Helm chart. Check out the Quick Start Guide on how to enable the webhook.
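
As a sketch of what enabling the webhook looks like, chart versions contemporary with this operator expose a value named webhook.enable (treat the exact value name as an assumption and confirm it against the chart's README or the Quick Start Guide):

$ helm install my-release spark-operator/spark-operator \
    --namespace spark-operator --create-namespace \
    --set webhook.enable=true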

Prerequisites

  • Version >= 1.13 of Kubernetes (see the Version Matrix below).

Installation

The easiest way to install the Kubernetes Operator for Apache Spark is to use the Helm chart.

$ helm repo add spark-operator https://googlecloudplatform.github.io/spark-on-k8s-operator

$ helm install my-release spark-operator/spark-operator --namespace spark-operator --create-namespace

This will install the Kubernetes Operator for Apache Spark into the namespace spark-operator. By default, the operator watches and handles SparkApplications in all namespaces. If you would like to limit the operator to watching and handling SparkApplications in a single namespace, e.g., default, add the following option to the helm install command:

--set sparkJobNamespace=default
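
Put together, a complete install command restricted to the default namespace would look like this:

$ helm install my-release spark-operator/spark-operator \
    --namespace spark-operator --create-namespace \
    --set sparkJobNamespace=default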

For configuration options available in the Helm chart, please refer to the chart's README.

Version Matrix

The following table lists the most recent few versions of the operator.

Operator Version       API Version   Kubernetes Version   Base Spark Version   Operator Image Tag
latest (master HEAD)   v1beta2       1.13+                3.0.0                latest
v1beta2-1.2.3-3.1.1    v1beta2       1.13+                3.1.1                v1beta2-1.2.3-3.1.1
v1beta2-1.2.0-3.0.0    v1beta2       1.13+                3.0.0                v1beta2-1.2.0-3.0.0
v1beta2-1.1.2-2.4.5    v1beta2       1.13+                2.4.5                v1beta2-1.1.2-2.4.5
v1beta2-1.0.1-2.4.4    v1beta2       1.13+                2.4.4                v1beta2-1.0.1-2.4.4
v1beta2-1.0.0-2.4.4    v1beta2       1.13+                2.4.4                v1beta2-1.0.0-2.4.4
v1beta1-0.9.0          v1beta1       1.13+                2.4.0                v2.4.0-v1beta1-0.9.0

When installing using the Helm chart, you can choose to use a specific image tag instead of the default one, using the following option:

--set image.tag=<operator image tag>
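
For example, to pin the operator to the image built for Spark 3.1.1 from the Version Matrix above:

--set image.tag=v1beta2-1.2.3-3.1.1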

Get Started

Get started quickly with the Kubernetes Operator for Apache Spark using the Quick Start Guide.

If you are running the Kubernetes Operator for Apache Spark on Google Kubernetes Engine and want to use Google Cloud Storage (GCS) and/or BigQuery for reading/writing data, also refer to the GCP guide.

For more information, check the Design, API Specification and detailed User Guide.

Overview

The Kubernetes Operator for Apache Spark aims to make specifying and running Spark applications as easy and idiomatic as running other workloads on Kubernetes. It uses Kubernetes custom resources for specifying, running, and surfacing status of Spark applications. For a complete reference of the custom resource definitions, please refer to the API Definition. For details on its design, please refer to the design doc. It requires Spark 2.3 and above that supports Kubernetes as a native scheduler backend.
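
As an illustration of this declarative model, a minimal SparkApplication could be submitted as shown below. This is only a sketch: the image, main class, jar path, and service account name are placeholders modeled on the operator's example manifests, not values taken from this document.

$ kubectl apply -f - <<EOF
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: "gcr.io/spark-operator/spark:v3.1.1"   # placeholder image
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar"
  sparkVersion: "3.1.1"
  restartPolicy:
    type: Never
  driver:
    cores: 1
    memory: "512m"
    serviceAccount: spark   # assumes a service account with the required RBAC exists
  executor:
    cores: 1
    instances: 1
    memory: "512m"
EOF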

The Kubernetes Operator for Apache Spark currently supports the following list of features:

  • Supports Spark 2.3 and up.
  • Enables declarative application specification and management of applications through custom resources.
  • Automatically runs spark-submit on behalf of users for each SparkApplication eligible for submission.
  • Provides native cron support for running scheduled applications.
  • Supports customization of Spark pods beyond what Spark natively is able to do through the mutating admission webhook, e.g., mounting ConfigMaps and volumes, and setting pod affinity/anti-affinity.
  • Supports automatic application re-submission for updated SparkApplication objects with updated specification.
  • Supports automatic application restart with a configurable restart policy.
  • Supports automatic retries of failed submissions with optional linear back-off.
  • Supports mounting local Hadoop configuration as a Kubernetes ConfigMap automatically via sparkctl.
  • Supports automatically staging local application dependencies to Google Cloud Storage (GCS) via sparkctl.
  • Supports collecting and exporting application-level metrics and driver/executor metrics to Prometheus.

Contributing

Please check out CONTRIBUTING.md and the Developer Guide.
