Pulsar Operations and Monitoring

史烈

2023-12-01

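Message queues are used to decouple programs asynchronously.

Queuing: each message is consumed once, in no particular order.

Streaming: messages can be consumed multiple times, in a specific order.

Pulsar supports both models.

Exclusive, Failover: streaming consumption modes.

Shared, Key_Shared: queue consumption modes (Key_Shared still preserves ordering per message key).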
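A small bash sketch of the mapping above. sub_model and consume_cmd are hypothetical helpers, not part of Pulsar. consume_cmd only prints a pulsar-client consume command; it relies on the -t/--subscription-type flag, whose accepted values are Exclusive, Shared, Failover and Key_Shared.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a subscription type as streaming or queuing,
# following the grouping in the notes above.
sub_model() {
  case "$1" in
    Exclusive|Failover) echo "streaming" ;;
    Shared|Key_Shared)  echo "queuing" ;;
    *) echo "unknown subscription type: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: print (not run) a pulsar-client consume command that
# selects the given subscription type with -t.
consume_cmd() {
  local type="$1" topic="$2" sub="$3"
  sub_model "$type" >/dev/null || return 1
  printf 'bin/pulsar-client consume %s -s %s -t %s\n' "$topic" "$sub" "$type"
}
```

For example, consume_cmd Shared my-topic first-subscription prints a command that several consumers can run against the same subscription to share the work.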

Pulsar features: Durability, Ordering, Delivery Guarantees, High throughput, Low Latency, Unified messaging model, Multi-tenancy, Geo-replication, Highly scalable & available.

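1. Apache Pulsar Core Components

Broker: the compute (serving) layer. Parses the protocol used by producers and consumers and handles their interactions.

BookKeeper/Bookie: the storage layer. Topic data is stored in segments spread across bookies.

ZooKeeper: the coordination layer. Manages cluster metadata and provides service discovery (detecting nodes that join or go down).

Component interaction: brokers and bookies both register with ZooKeeper. Brokers handle producer and consumer read/write requests and issue the corresponding requests to bookies.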

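Component ports (defaults, configurable):

Broker: TCP/6650: Pulsar client connections

HTTP/8080: exposes Prometheus metrics and the Pulsar admin API

BookKeeper: TCP/3181: the port brokers use to connect to bookies

HTTP/8000: exposes Prometheus metrics

ZooKeeper: TCP/2181: the port brokers and bookies use to connect

HTTP/8000: exposes Prometheus metrics

TCP ports are mainly used for internal communication between components and for client access.

HTTP ports are mainly used to serve the REST API and expose Prometheus metrics.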
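A quick way to confirm that the default ports are reachable, as a bash sketch. check_port and check_default_ports are hypothetical helpers that rely on bash's /dev/tcp pseudo-device and the coreutils timeout command; the host name passed in is a placeholder for one of your nodes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether host:port accepts TCP connections.
# Uses bash's /dev/tcp pseudo-device; prints "host:port UP" or
# "host:port DOWN" and returns non-zero when the port is down.
check_port() {
  local host="$1" port="$2"
  if timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "${host}:${port} UP"
  else
    echo "${host}:${port} DOWN"
    return 1
  fi
}

# Hypothetical helper: check default ports on one host
# (6650 and 8080 for the broker, 3181 for the bookie, 2181 for ZooKeeper).
check_default_ports() {
  local host="${1:-localhost}" port rc=0
  for port in 6650 8080 3181 2181; do
    check_port "$host" "$port" || rc=1
  done
  return "$rc"
}
```

On a machine running bin/pulsar standalone, check_default_ports localhost should report all four ports as UP.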

Minimum node counts for a distributed deployment:

Broker: at least 2

BookKeeper: at least 3

ZooKeeper: an odd number, at least 3
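The odd-number rule for ZooKeeper comes from its majority quorum: 2f+1 nodes tolerate f failures, so a fourth node costs more without tolerating an extra failure. A bash sketch that checks a planned layout against the minimums above; check_topology is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare planned node counts with the minimums above.
# ZooKeeper needs a majority quorum: 2f+1 nodes tolerate f failures, so an
# even count adds a node without tolerating any extra failure.
check_topology() {
  local brokers="$1" bookies="$2" zks="$3" rc=0
  if (( brokers < 2 )); then
    echo "broker: need at least 2, got $brokers"; rc=1
  fi
  if (( bookies < 3 )); then
    echo "bookkeeper: need at least 3, got $bookies"; rc=1
  fi
  if (( zks < 3 || zks % 2 == 0 )); then
    echo "zookeeper: need an odd number >= 3, got $zks"; rc=1
  fi
  return "$rc"
}
```

Usage: check_topology 2 3 3 succeeds silently; check_topology 2 3 4 prints the ZooKeeper complaint and returns 1.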

2. Getting Started with Pulsar

2.1 Pulsar Downloads

Download page: Apache Pulsar

Tsinghua University mirror: Index of /apache/pulsar

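2.2 Standalone Mode for Local Development

Start in the foreground: bin/pulsar standalone

Start in the background: bin/pulsar-daemon start standalone

List clusters: bin/pulsar-admin clusters list

List the brokers of a cluster: bin/pulsar-admin brokers list standalone (the argument is a cluster name as returned by clusters list; standalone mode names its cluster standalone)

List topics: bin/pulsar-admin topics list public/default

Produce a message from the command line: bin/pulsar-client produce my-topic --messages "hello-pulsar"

Consume messages from the command line: bin/pulsar-client consume my-topic -s "first-subscription"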
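The two pulsar-client commands above can be wrapped into a smoke test. This is a bash sketch assuming PULSAR_HOME points at the unpacked distribution and the standalone broker is already running; smoke_test and SMOKE_WAIT are hypothetical names. The consumer is started first because a new subscription starts at the latest position by default, so it would not receive a message produced before it existed.

```shell
#!/usr/bin/env bash
# Hypothetical smoke test for a standalone Pulsar. Assumes PULSAR_HOME points at
# the unpacked distribution and the broker is already running.
smoke_test() {
  local bin="${PULSAR_HOME:-.}/bin"
  local topic="${1:-my-topic}" sub="${2:-first-subscription}"
  # Subscribe first: a new subscription starts at the latest position by
  # default. -n 1 makes the consumer exit after receiving one message.
  "$bin/pulsar-client" consume "$topic" -s "$sub" -n 1 &
  local consumer=$!
  # Crude wait for the subscription to be created (SMOKE_WAIT is hypothetical).
  sleep "${SMOKE_WAIT:-5}"
  if ! "$bin/pulsar-client" produce "$topic" --messages "hello-pulsar"; then
    kill "$consumer" 2>/dev/null
    return 1
  fi
  # Return the consumer's exit status.
  wait "$consumer"
}
```

The fixed sleep is a simplification; on a slow machine, raising SMOKE_WAIT gives the consumer more time to subscribe.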

2.3 Cluster Mode

For running a cluster in a test environment, Docker is recommended.

2.4 Operations Tool: Pulsar Manager

wget https://dist.apache.org/repos/dist/release/pulsar/pulsar-manager/pulsar-manager-0.2.0/apache-pulsar-manager-0.2.0-bin.tar.gz
tar -zxvf apache-pulsar-manager-0.2.0-bin.tar.gz
cd pulsar-manager
tar -xvf pulsar-manager.tar
cd pulsar-manager
cp -r ../dist ui

Enabling the following two settings is recommended:

1. application.properties

bookie.enable=true
pulsar.peek.message=true

2. bkvm.conf

bookie.enable=true
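Instead of editing the files by hand, the settings can be applied with a small helper. A bash sketch assuming GNU sed (for sed -i without a backup suffix); set_prop is a hypothetical helper, not part of Pulsar Manager. It replaces an existing line for the key, even a commented-out one, and appends the key otherwise.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set key=value in a properties-style file.
# If the key already exists (commented out or not), that line is replaced;
# otherwise key=value is appended. The key is used in a regex as-is.
set_prop() {
  local file="$1" key="$2" value="$3"
  if grep -qE "^[#[:space:]]*${key}=" "$file"; then
    sed -i -E "s|^[#[:space:]]*${key}=.*|${key}=${value}|" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}
```

Usage, from the directory that holds the two files: set_prop application.properties bookie.enable true, set_prop application.properties pulsar.peek.message true, and set_prop bkvm.conf bookie.enable true.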

Start the program: bin/pulsar-manager

Initialize the superuser account once Pulsar Manager is running (the requests go to its backend on port 7750):

CSRF_TOKEN=$(curl http://localhost:7750/pulsar-manager/csrf-token)
curl \
    -H "X-XSRF-TOKEN: $CSRF_TOKEN" \
    -H "Cookie: XSRF-TOKEN=$CSRF_TOKEN;" \
    -H 'Content-Type: application/json' \
    -X PUT http://localhost:7750/pulsar-manager/users/superuser \
    -d '{"name": "admin", "password": "apachepulsar", "description": "test", "email": "username@test.org"}'

Pulsar web UI (user admin, password apachepulsar): http://localhost:7750/ui/index.html

Bookie web UI (user admin, password admin): http://localhost:7750/bkvm/

2.5 Monitoring Tools: Prometheus & Grafana

2.5.1 Prometheus

Reference template for prometheus.yml: cluster.yml.template in the streamnative/apache-pulsar-grafana-dashboard repository on GitHub (master branch):

#
# Copyright (c) 2018 Sijie. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

---
global:
  scrape_interval:     15s # By default, scrape targets every 15 seconds.
  evaluation_interval: 15s # By default, evaluate rules every 15 seconds.
  # scrape_timeout is set to the global default (10s).
  external_labels:
    # TODO: replace `<cluster-name>` with the right cluster name. E.g.
    #
    # cluster: test-cluster
    cluster: <cluster-name>

# Load and evaluate rules in these files every 'evaluation_interval' seconds.
# rule_files:

scrape_configs:

  - job_name: "proxy"
    honor_labels: true # don't overwrite job & instance labels
    static_configs:
    - targets:
      # TODO: add the proxies to monitor
      #
      # - 'proxy1:8080'
      # - 'proxy2:8080'
      # - ...

  - job_name: "broker"
    honor_labels: true # don't overwrite job & instance labels
    static_configs:
    - targets:
      # TODO: add the brokers to monitor
      #
      # - 'broker1:8080'
      # - 'broker2:8080'
      # - ...

  - job_name: "bookie"
    honor_labels: true # don't overwrite job & instance labels
    static_configs:
    - targets:
      # TODO: add the bookies to monitor
      #
      # - 'bookie1:8000'
      # - 'bookie2:8000'
      # - ...

  - job_name: "zookeeper"
    honor_labels: true
    static_configs:
    - targets:
      # TODO: add the zookeeper nodes to monitor
      #
      # - 'zookeeper1:8000'
      # - 'zookeeper2:8000'
      # - ...

  - job_name: "node_metrics"
    honor_labels: true # don't overwrite job & instance labels
    static_configs:
    - targets:
      # TODO: add the physical machines to monitor
      #
      # - 'node1:9100'
      # - 'node2:9100'
      # - ...
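Before listing a target in prometheus.yml, it helps to confirm that it actually serves metrics. A bash sketch using curl; count_samples is a hypothetical helper that counts sample lines, skipping the "# HELP"/"# TYPE" comment lines of the Prometheus text format. The -L flag follows a redirect in case the endpoint path redirects.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fetch a Prometheus metrics endpoint and count the
# sample lines (everything except comment lines and blank lines).
count_samples() {
  local url="$1"
  curl -sfL --max-time 5 "$url" | grep -cv -e '^#' -e '^$'
}
```

For example, count_samples http://broker1:8080/metrics/ or count_samples http://bookie1:8000/metrics (the template's placeholder hosts) should print a non-zero count for a healthy node; 0 means the target is not serving metrics yet.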

After editing the configuration, start Prometheus.

Prometheus UI: http://localhost:9090

2.5.2 Grafana

Start Grafana with bin/grafana-server

Grafana UI: http://localhost:3000

Pulsar dashboards: the streamnative/apache-pulsar-grafana-dashboard repository on GitHub (Apache Pulsar Grafana Dashboard)

From the root of that repository, run:

./scripts/generate_dashboards.sh <prometheus-url> <clustername>

<prometheus-url>: The URL of your Prometheus service, e.g. http://localhost:9090

<clustername>: Your pulsar cluster name.

In Grafana, go to Manage -> Import -> Upload JSON File and select the JSON files to import from the apache-pulsar-grafana-dashboard/target/dashboards directory.

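2.5.3 Load Testing with pulsar-perf

Pulsar provides a command-line load-testing tool. Produce messages with the following command:

-r: total messages produced per second (across all producers)
-n: number of producers
-s: size of each message (bytes)
The topic name comes last.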

bin/pulsar-perf produce -r 100 -n 2 -s 1024 test-perf

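# Output fields, from left to right:
# Messages produced per second: 87.2
# Throughput: 0.7 Mbit/s
# Failed messages per second: 0
# Mean latency: 5.478 ms
# Median latency: 4.642 ms
# 95% of latencies are within 11.263 ms
# 99% of latencies are within 25.802 ms
# 99.9% of latencies are within 43.757 ms
# 99.99% of latencies are within 51.956 ms
# Max latency: 51.956 ms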

... Throughput produced:   87.2  msg/s ---      0.7 Mbit/s --- failure      0.0 msg/s --- Latency: mean:   5.478 ms - med:   4.642 - 95pct:  11.263 - 99pct:  25.802 - 99.9pct:  43.757 - 99.99pct:  51.956 - Max:  51.956

Consume messages with the following command:

bin/pulsar-perf consume test-perf

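# Output fields, from left to right:
# Messages consumed per second: 100.007
# Throughput: 0.781 Mbit/s
# Mean latency: 9.273 ms
# Median latency: 9 ms
# 95% of latencies are within 14 ms
# 99% of latencies are within 15 ms
# 99.9% of latencies are within 28 ms
# 99.99% of latencies are within 34 ms
# Max latency: 34 ms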
... Throughput received: 100.007  msg/s -- 0.781 Mbit/s --- Latency: mean: 9.273 ms - med: 9 - 95pct: 14 - 99pct: 15 - 99.9pct: 28 - 99.99pct: 34 - Max: 34
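To compare runs, the key figures can be pulled out of these stats lines. A sketch in awk called from bash; perf_summary is a hypothetical helper that only looks for the number before the first msg/s token and the number after the 99pct: token, because the exact line layout may differ between Pulsar versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read pulsar-perf stats lines on stdin and print the
# message rate (number before the first "msg/s") and the p99 latency
# (number after "99pct:") for every line that contains a rate.
perf_summary() {
  awk '{
    rate = ""; p99 = ""
    for (i = 2; i <= NF; i++) {
      if ($i == "msg/s" && rate == "") rate = $(i - 1)
      if ($i == "99pct:" && i < NF) p99 = $(i + 1)
    }
    if (rate != "") printf "rate=%s p99=%s\n", rate, p99
  }'
}
```

Usage: bin/pulsar-perf produce -r 100 -n 2 -s 1024 test-perf 2>&1 | perf_summary. Comparing rate with the -r target shows whether the producers kept up: in the produce example above the target was 100 msg/s and the measured rate 87.2 msg/s.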

Appendix: Apache Pulsar Getting-Started Resources

TGIP-CN: the streamnative/tgip-cn repository on GitHub. TGIP-CN (Thank God It's Pulsar) is a weekly Chinese-language live video stream about Apache Pulsar.

Bilibili: StreamNative's channel on Bilibili

WeChat official account: Apache Pulsar "From Beginner to Practice" collection (Holiday Recharge Pack | Apache Pulsar from Beginner to Practice)

Official documentation: Pulsar (Index of /docs), BookKeeper (Apache BookKeeper - Apache BookKeeper 4.5.0-SNAPSHOT Documentation)

Examples: https://github.com/streamnative/examples

Workshop code: streamnative/psat_exercise_code on GitHub (Pulsar Summit Asia workshop exercise code)
