How to Use the Bitnami Apache Airflow Stack
I’ve been using Apache Airflow for around two years now to build out custom workflow interfaces, like those used for Laboratory Information Management Systems (LIMS), computer vision pre- and post-processing pipelines, and set-and-forget genomics pipelines.
My favorite feature of Airflow is how completely agnostic it is to the work you are doing or where that work is taking place. It could take place locally, on a Docker image, on Kubernetes, on any number of AWS services, on an HPC system, etc. Using Airflow allows me to concentrate on the business logic of what I’m trying to accomplish without getting too bogged down in implementation details.
During that time I’ve adopted a set of systems that I use to quickly build out the main development stack with Docker and Docker Compose, using the Bitnami Apache Airflow stack. Generally, I deploy to production using either the same Docker Compose stack, if it’s a small enough instance that is isolated, or Kubernetes when I need to interact with other services or file systems.
Bitnami vs Roll Your Own
I used to roll my own Airflow containers using Conda. I still use this approach for most of my other containers, including microservices that interact with my Airflow system, but configuring Airflow is a lot more than just installing packages. Even installing those packages is a pain, and I could rarely count on a rebuild actually working without some trouble. Then, on top of the packages, you need to configure database connections and a message queue.
In come the Bitnami Apache Airflow Docker Compose stack for dev and the Bitnami Apache Airflow Helm chart for prod!
Bitnami, in their own words:
Bitnami makes it easy to get your favorite open source software up and running on any platform, including your laptop, Kubernetes and all the major clouds. In addition to popular community offerings, Bitnami, now part of VMware, provides IT organizations with an enterprise offering that is secure, compliant, continuously maintained and customizable to your organizational policies. https://bitnami.com/
Bitnami stacks (usually) work exactly the same from their Docker Compose stacks to their Helm charts. This means I can test and develop locally using my Compose stack, build out new images, versions, packages, etc., and then deploy to Kubernetes. The configuration, environment variables, and everything else act the same. It would be a fairly large undertaking to do all this from scratch, so I use Bitnami.
They have plenty of enterprise offerings, but everything included here is open source and there is no paywall involved.
And no, I am not affiliated with Bitnami, although I have kids that eat a lot and don’t have any particular ethical aversions to selling out. ;-) I’ve just found their offerings to be excellent.
Project Structure
I like to have my projects organized so that I can run tree and have a general idea of what's happening.
Apache Airflow has three main components: the application, the worker, and the scheduler. Each of these has its own Docker image to separate out the services. Additionally, there is a database and a message queue, but we won’t be doing any customization to these.
.
└── docker
    └── bitnami-apache-airflow-1.10.10
        ├── airflow
        │   └── Dockerfile
        ├── airflow-scheduler
        │   └── Dockerfile
        ├── airflow-worker
        │   └── Dockerfile
        ├── dags
        │   └── tutorial.py
        ├── docker-compose.yml
So what we have here is a directory called bitnami-apache-airflow-1.10.10, which brings us to a very important point: pin your versions! It will save you so, so much pain and frustration!
Then we have one Dockerfile per Airflow piece.
Create this directory structure with:
mkdir -p docker/bitnami-apache-airflow-1.10.10/{airflow,airflow-scheduler,airflow-worker,dags}
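The compose file below also mounts ./plugins and ./db_backups from this same directory. They aren't in the tree above, so I also create them up front (just my own habit, nothing the Bitnami stack requires) so the bind mounts resolve cleanly on the first run:

mkdir -p docker/bitnami-apache-airflow-1.10.10/{plugins,db_backups}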
The Docker Compose File
This is my preference for the docker-compose.yml file. I made a few changes for my own preferences: I pin versions, build my own Docker images, and add volume mounts for the dags, plugins, and database backups, along with mounting the Docker socket so I can run DockerOperators from within my stack.
You can always go and grab the original docker-compose here.
version: '2'
services:
  postgresql:
    image: 'docker.io/bitnami/postgresql:10-debian-10'
    volumes:
      - 'postgresql_data:/bitnami/postgresql'
    environment:
      - POSTGRESQL_DATABASE=bitnami_airflow
      - POSTGRESQL_USERNAME=bn_airflow
      - POSTGRESQL_PASSWORD=bitnami1
      - ALLOW_EMPTY_PASSWORD=yes
  redis:
    image: docker.io/bitnami/redis:5.0-debian-10
    volumes:
      - 'redis_data:/bitnami'
    environment:
      - ALLOW_EMPTY_PASSWORD=yes
  airflow-scheduler:
    # image: docker.io/bitnami/airflow-scheduler:1-debian-10
    build:
      context: airflow-scheduler
    environment:
      - AIRFLOW_DATABASE_NAME=bitnami_airflow
      - AIRFLOW_DATABASE_USERNAME=bn_airflow
      - AIRFLOW_DATABASE_PASSWORD=bitnami1
      - AIRFLOW_EXECUTOR=CeleryExecutor
      # If you'd like to load the example DAGs change this to yes!
      - AIRFLOW_LOAD_EXAMPLES=no
      # only works with 1.10.11
      #- AIRFLOW__WEBSERVER__RELOAD_ON_PLUGIN_CHANGE=true
      #- AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION=False
    volumes:
      - airflow_scheduler_data:/bitnami
      - ./plugins:/opt/bitnami/airflow/plugins
      - ./dags:/opt/bitnami/airflow/dags
      - ./db_backups:/opt/bitnami/airflow/db_backups
      - /var/run/docker.sock:/var/run/docker.sock
  airflow-worker:
    # image: docker.io/bitnami/airflow-worker:1-debian-10
    build:
      context: airflow-worker
    environment:
      - AIRFLOW_DATABASE_NAME=bitnami_airflow
      - AIRFLOW_DATABASE_USERNAME=bn_airflow
      - AIRFLOW_DATABASE_PASSWORD=bitnami1
      - AIRFLOW_EXECUTOR=CeleryExecutor
      - AIRFLOW_LOAD_EXAMPLES=no
      # only works with 1.10.11
      #- AIRFLOW__WEBSERVER__RELOAD_ON_PLUGIN_CHANGE=true
      #- AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION=False
    volumes:
      - airflow_worker_data:/bitnami
      - ./plugins:/opt/bitnami/airflow/plugins
      - ./dags:/opt/bitnami/airflow/dags
      - ./db_backups:/opt/bitnami/airflow/db_backups
      - /var/run/docker.sock:/var/run/docker.sock
  airflow:
    # image: docker.io/bitnami/airflow:1-debian-10
    build:
      # You can also specify the build context
      # as cwd and point to a different Dockerfile
      context: .
      dockerfile: airflow/Dockerfile
    environment:
      - AIRFLOW_DATABASE_NAME=bitnami_airflow
      - AIRFLOW_DATABASE_USERNAME=bn_airflow
      - AIRFLOW_DATABASE_PASSWORD=bitnami1
      - AIRFLOW_EXECUTOR=CeleryExecutor
      - AIRFLOW_LOAD_EXAMPLES=no
      # only works with 1.10.11
      #- AIRFLOW__WEBSERVER__RELOAD_ON_PLUGIN_CHANGE=True
      #- AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION=False
    ports:
      - '8080:8080'
    volumes:
      - airflow_data:/bitnami
      - ./dags:/opt/bitnami/airflow/dags
      - ./plugins:/opt/bitnami/airflow/plugins
      - ./db_backups:/opt/bitnami/airflow/db_backups
      - /var/run/docker.sock:/var/run/docker.sock
volumes:
  airflow_scheduler_data:
    driver: local
  airflow_worker_data:
    driver: local
  airflow_data:
    driver: local
  postgresql_data:
    driver: local
  redis_data:
    driver: local
Pin Your Versions
The version of Apache Airflow used here is 1.10.10. The 1.10.11 release has some cool updates I would like to incorporate, so I will keep an eye on it!
You can always keep up with the latest Apache Airflow versions by checking out the changelog on the main site.
We are using Bitnami, which has bots that automatically build and update their images as new releases come along.
While this approach is great for bots, I strongly recommend against simply hoping that the latest version will be backwards compatible and work with your setup.
Instead, pin a version, and when a new version comes along, test it out in your dev stack. At the time of writing the most recent version is 1.10.11, but it doesn't quite work out of the box, so we are using 1.10.10.
Bitnami Apache Airflow Docker Tags
Generally speaking, a Docker tag corresponds to the application version. Sometimes there are other variants as well, such as the base OS. Here we can just go with the application version.
Bitnami Apache Airflow Scheduler Image Tags
Bitnami Apache Airflow Worker Image Tags
Bitnami Apache Airflow Web Image Tags
Build Custom Images
In our docker-compose file we have placeholders in order to build custom images.
We’ll just create a minimal Dockerfile for each for now. Later I’ll show you how to customize your Docker container with extra system or Python packages.
Airflow Application
echo "FROM docker.io/bitnami/airflow:1.10.10" > docker/bitnami-apache-airflow-1.10.10/airflow/Dockerfile
This will give you the following Airflow application Dockerfile.
FROM docker.io/bitnami/airflow:1.10.10
Airflow Scheduler
echo "FROM docker.io/bitnami/airflow-scheduler:1.10.10" > docker/bitnami-apache-airflow-1.10.10/airflow-scheduler/Dockerfile
This will give you the following Airflow scheduler Dockerfile.
FROM docker.io/bitnami/airflow-scheduler:1.10.10
Airflow Worker
echo "FROM docker.io/bitnami/airflow-worker:1.10.10" > docker/bitnami-apache-airflow-1.10.10/airflow-worker/Dockerfile
This will give you the following Airflow worker Dockerfile.
FROM docker.io/bitnami/airflow-worker:1.10.10
Bring Up the Stack
Grab the docker-compose file above and let's get rolling!
cd docker/bitnami-apache-airflow-1.10.10
docker-compose up
If this is your first time running the command this will take some time. Docker will fetch any images it doesn’t already have, and build all the airflow-* images.
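If you want to check what actually came up without scrolling through the logs, plain Docker Compose tooling works here (nothing Bitnami-specific):

docker-compose ps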
Navigate to the UI
Once everything is up and running, navigate to the UI at http://localhost:8080.
Unless you changed the configuration, your default username/password is user/bitnami.
Login to check out your Airflow web UI!
Add in a Custom DAG
Here’s a DAG that I grabbed from the Apache Airflow Tutorial. I’ve only included it here for the sake of completeness.
from datetime import timedelta

# The DAG object; we'll need this to instantiate a DAG
from airflow import DAG
# Operators; we need this to operate!
from airflow.operators.bash_operator import BashOperator
from airflow.utils.dates import days_ago

# These args will get passed on to each operator
# You can override them on a per-task basis during operator initialization
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': days_ago(2),
    'email': ['airflow@example.com'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
    # 'queue': 'bash_queue',
    # 'pool': 'backfill',
    # 'priority_weight': 10,
    # 'end_date': datetime(2016, 1, 1),
    # 'wait_for_downstream': False,
    # 'dag': dag,
    # 'sla': timedelta(hours=2),
    # 'execution_timeout': timedelta(seconds=300),
    # 'on_failure_callback': some_function,
    # 'on_success_callback': some_other_function,
    # 'on_retry_callback': another_function,
    # 'sla_miss_callback': yet_another_function,
    # 'trigger_rule': 'all_success'
}

dag = DAG(
    'tutorial',
    default_args=default_args,
    description='A simple tutorial DAG',
    schedule_interval=timedelta(days=1),
)

# t1, t2 and t3 are examples of tasks created by instantiating operators
t1 = BashOperator(
    task_id='print_date',
    bash_command='date',
    dag=dag,
)

t2 = BashOperator(
    task_id='sleep',
    depends_on_past=False,
    bash_command='sleep 5',
    retries=3,
    dag=dag,
)

dag.doc_md = __doc__

t1.doc_md = """\
#### Task Documentation
You can document your task using the attributes `doc_md` (markdown),
`doc` (plain text), `doc_rst`, `doc_json`, `doc_yaml` which gets
rendered in the UI's Task Instance Details page.
![img](http://montcs.bloomu.edu/~bobmon/Semesters/2012-01/491/import%20soul.png)
"""

templated_command = """
{% for i in range(5) %}
    echo "{{ ds }}"
    echo "{{ macros.ds_add(ds, 7)}}"
    echo "{{ params.my_param }}"
{% endfor %}
"""

t3 = BashOperator(
    task_id='templated',
    depends_on_past=False,
    bash_command=templated_command,
    params={'my_param': 'Parameter I passed in'},
    dag=dag,
)

t1 >> [t2, t3]
Anyways, grab this file and put it in your docker/bitnami-apache-airflow-1.10.10/dags folder. The name of the file itself doesn't matter. The DAG name will be whatever you set in the file.
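For example, assuming you saved the DAG above as tutorial.py in your current directory (the filename is arbitrary):

cp tutorial.py docker/bitnami-apache-airflow-1.10.10/dags/tutorial.py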
Airflow will restart itself automatically, and if you refresh the UI you should see your new tutorial DAG listed.
Build Custom Airflow Docker Containers
If you'd like to add additional system or Python packages, you can do so.
# docker/bitnami-apache-airflow-1.10.10/airflow/Dockerfile
FROM docker.io/bitnami/airflow:1.10.10

# From here - https://github.com/bitnami/bitnami-docker-airflow/blob/master/1/debian-10/Dockerfile
USER root

RUN apt-get update && apt-get upgrade -y && \
    apt-get install -y vim && \
    rm -r /var/lib/apt/lists /var/cache/apt/archives

RUN bash -c "source /opt/bitnami/airflow/venv/bin/activate && \
    pip install flask-restful && \
    deactivate"
To be clear, I don't especially endorse this approach anymore, except that I like to add flask-restful for creating custom REST API plugins.
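As a rough sketch of what I mean (my own minimal example, not something that ships with the Bitnami stack; the plugin and endpoint names are hypothetical), a flask-restful plugin dropped into the mounted plugins folder looks roughly like this:

# plugins/hello_api.py - hypothetical flask-restful Airflow plugin
from airflow.plugins_manager import AirflowPlugin
from flask import Blueprint
from flask_restful import Api, Resource

# A blueprint that the Airflow webserver registers for us
bp = Blueprint('hello_api', __name__, url_prefix='/hello_api')
api = Api(bp)

class Health(Resource):
    def get(self):
        # Trivial endpoint to confirm the plugin is loaded
        return {'status': 'ok'}

api.add_resource(Health, '/health')

class HelloApiPlugin(AirflowPlugin):
    name = 'hello_api'
    flask_blueprints = [bp]

With the webserver up, that should (in theory) answer at http://localhost:8080/hello_api/health.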
I like to treat Apache Airflow the way I treat web applications. I’ve been burned too many times, so now my web apps take care of routing and rendering views, and absolutely nothing else.
Airflow is about the same, except it handles the business logic of my workflows and absolutely nothing else. If I have some crazy pandas/tensorflow/opencv/whatever stuff I need to do I’ll build that into a separate microservice and not touch my main business logic. I like to think of Airflow as the spider that sits in the web.
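Since the compose file mounts /var/run/docker.sock into every service, one way I keep that separation is a DockerOperator task that just runs the containerized microservice. Here's a minimal sketch; the image name and command are hypothetical, and it assumes the docker Python package is installed in the custom images above, since I don't believe the Bitnami images ship it by default:

# dags/docker_microservice_example.py - hypothetical DAG
from datetime import timedelta

from airflow import DAG
from airflow.operators.docker_operator import DockerOperator
from airflow.utils.dates import days_ago

dag = DAG(
    'docker_microservice_example',
    default_args={'owner': 'airflow', 'retries': 1, 'retry_delay': timedelta(minutes=5)},
    start_date=days_ago(1),
    schedule_interval=None,
)

# The heavy pandas/tensorflow/opencv work lives in its own image;
# Airflow only orchestrates it through the mounted Docker socket.
preprocess = DockerOperator(
    task_id='run_preprocessing_service',
    image='my-registry/preprocessing:0.1.0',  # hypothetical image
    command='python preprocess.py --input /data/raw --output /data/processed',
    auto_remove=True,
    dag=dag,
)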
Still, I’m paranoid enough that I like to build my own images so I can then push them to my own docker repo.
Wrap Up and Where to Go from Here
Now that you have your foundation, it's time to build out your data science workflows! Add some custom DAGs, create some custom plugins, and generally build stuff.
If you’d like to request a tutorial please feel free to reach out to me at jillian@dabbleofdevops.com or on twitter.
Cheat Sheet
Here are some hopefully helpful commands and resources.
Log into Your Apache Airflow Instance
The default username and password is user and bitnami.
Docker Compose Commands
Build
cd docker/bitnami-apache-airflow-1.10.10/
docker-compose build
Bring up your stack! Running docker-compose up makes all your logs come up on STDERR/STDOUT.
cd docker/bitnami-apache-airflow-1.10.10/
docker-compose build && docker-compose up
If you'd like to run it in the background instead, use -d.
cd docker/bitnami-apache-airflow-1.10.10/
docker-compose build && docker-compose up -d
Bitnami Apache Airflow Configuration
You can further customize your Airflow instance using environment variables that you pass into the docker-compose file. Check out the Bitnami README for details.
Load DAG Files
Custom DAG files can be mounted to /opt/bitnami/airflow/dags or copied in during the Docker build phase.
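If you'd rather bake the DAGs into the image than mount them, a minimal sketch of the copy-at-build-time option (my own convention; this works for the airflow image above because its build context is the whole project folder, so dags/ is available to COPY) looks like:

# appended to docker/bitnami-apache-airflow-1.10.10/airflow/Dockerfile
FROM docker.io/bitnami/airflow:1.10.10
COPY dags/ /opt/bitnami/airflow/dags/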
Specifying Environment Variables Using Docker Compose
version: '2'
services:
  airflow:
    image: bitnami/airflow:latest
    environment:
      - AIRFLOW_FERNET_KEY=46BKJoQYlPPOexq0OhDZnIlNepKFf87WFwLbfzqDDho=
      - AIRFLOW_EXECUTOR=CeleryExecutor
      - AIRFLOW_DATABASE_NAME=bitnami_airflow
      - AIRFLOW_DATABASE_USERNAME=bn_airflow
      - AIRFLOW_DATABASE_PASSWORD=bitnami1
      - AIRFLOW_PASSWORD=bitnami123
      - AIRFLOW_USERNAME=user
      - AIRFLOW_EMAIL=user@example.com
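If you'd rather not reuse the AIRFLOW_FERNET_KEY from the example above, you can generate your own. A quick sketch, assuming the cryptography package is available (Airflow itself depends on it):

# prints a key suitable for AIRFLOW_FERNET_KEY
from cryptography.fernet import Fernet
print(Fernet.generate_key().decode())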
Clean Up After Docker
Docker can take up a lot of room on your filesystem.
If you’d like to clean up just the Airflow stack then:
cd docker/bitnami-apache-airflow-1.10.10
docker-compose stop
docker-compose rm -f -v
Running docker-compose rm -f forcibly removes all the containers, and the -v flag also removes all data volumes.
Remove All Docker Images Everywhere
This will stop all running containers and remove them.
docker container stop $(docker container ls -aq)
docker system prune -f -a
This will remove all containers AND data volumes
docker system prune -f -a --volumes
Originally published at https://www.dabbleofdevops.com.