(1) Add a user
# useradd tigk
# passwd tigk
Changing password for user tigk.
New password:
BAD PASSWORD: The password is shorter than 8 characters
Retype new password:
passwd: all authentication tokens updated successfully.
(2) Grant privileges
A regular user has full permissions only under its own home directory; other directories require authorization. Root privileges are needed often, so grant them through the sudoers file and use the sudo command.
# Grant write permission on the file
# chmod -v u+w /etc/sudoers
mode of ‘/etc/sudoers’ changed from 0440 (r--r-----) to 0640 (rw-r-----)
Edit the sudoers file with vi /etc/sudoers and add the new user line "tigk ALL=(ALL) ALL":
## Allow root to run any commands anywhere
root ALL=(ALL) ALL
tigk ALL=(ALL) ALL
Revoke the write permission:
# chmod -v u-w /etc/sudoers
mode of ‘/etc/sudoers’ changed from 0640 (rw-r-----) to 0440 (r--r-----)
Create the tigk installation directory:
# su - tigk
$ mkdir /home/tigk/.local
(3) Create directories to hold the TIGK-related files
# mkdir /data/tigk
# chown tigk:tigk /data/tigk
# su - tigk
$ mkdir /data/tigk/telegraf
$ mkdir /data/tigk/influxdb
$ mkdir /data/tigk/kapacitor
Download and unpack the Telegraf tarball (downloaded here to /opt/package, matching the path used below):
$ wget -P /opt/package https://dl.influxdata.com/telegraf/releases/telegraf-1.14.4_linux_amd64.tar.gz
$ tar xf /opt/package/telegraf-1.14.4_linux_amd64.tar.gz -C /home/tigk/.local/
The executable ends up at {telegraf root}/usr/bin/telegraf. The configuration file lives under the unpacked etc directory, or can be generated directly:
Show help: telegraf --help
Generate a configuration file: telegraf config > telegraf.conf
Generate a configuration file with only the cpu, mem, http_listener, and influxdb plugins:
telegraf --input-filter cpu:mem:http_listener --output-filter influxdb config > telegraf.conf
Run: telegraf --config telegraf.conf
Run in the background: nohup telegraf --config telegraf.conf > /dev/null 2>&1 &
$ cd /home/tigk/.local/telegraf/usr/bin
$ ./telegraf --help
$ ./telegraf config > telegraf.conf
$ ./telegraf --input-filter cpu:mem:http_listener --output-filter influxdb config > telegraf.conf
$ mkdir /data/tigk/telegraf/logs
$ mkdir /data/tigk/telegraf/conf
$ cp /home/tigk/.local/telegraf/usr/bin/telegraf.conf /data/tigk/telegraf/conf
$ vim /data/tigk/telegraf/conf/telegraf.conf
Find the [[outputs.influxdb]] section and provide the username and password; also point the agent log file at the logs directory created above:
[[outputs.influxdb]]
urls = ["http://10.0.165.2:8085"]
timeout = "5s"
username = "tigk"
password = "tigk"
[agent]
logfile = "/data/tigk/telegraf/logs/telegraf.log"
Start:
$ cd /home/tigk/.local/telegraf/usr/bin
$ nohup ./telegraf --config /data/tigk/telegraf/conf/telegraf.conf &
The steps below install Telegraf from the official RPM package instead of the tarball.
(1) Download the RPM package
wget https://dl.influxdata.com/telegraf/releases/telegraf-1.14.4-1.x86_64.rpm
(2) Install the RPM package
sudo yum localinstall telegraf-1.14.4-1.x86_64.rpm
(3) Start the service and enable it at boot
systemctl start telegraf.service
systemctl status telegraf.service
systemctl enable telegraf.service
(4) Check the version and edit the configuration file
telegraf --version
Default configuration file location: /etc/telegraf/telegraf.conf
Edit the telegraf configuration file:
vim /etc/telegraf/telegraf.conf
(5) Restart to apply the changes
systemctl restart telegraf.service
(1) Command overview: telegraf -h
$ ./telegraf -h
Telegraf, The plugin-driven server agent for collecting and reporting metrics.
Usage:
telegraf [commands|flags]
The commands & flags are:
config print out full sample configuration to stdout
version print the version to stdout
--aggregator-filter <filter> filter the aggregators to enable, separator is :
--config <file> configuration file to load
--config-directory <directory> directory containing additional *.conf files
--plugin-directory directory containing *.so files, this directory will be
searched recursively. Any Plugin found will be loaded
and namespaced.
--debug turn on debug logging
--input-filter <filter> filter the inputs to enable, separator is :
--input-list print available input plugins.
--output-filter <filter> filter the outputs to enable, separator is :
--output-list print available output plugins.
--pidfile <file> file to write our pid to
--pprof-addr <address> pprof address to listen on, don't activate pprof if empty
--processor-filter <filter> filter the processors to enable, separator is :
--quiet run in quiet mode
--section-filter filter config sections to output, separator is :
Valid values are 'agent', 'global_tags', 'outputs',
'processors', 'aggregators' and 'inputs'
--sample-config print out full sample configuration
--test gather metrics, print them out, and exit;
processors, aggregators, and outputs are not run
--test-wait wait up to this many seconds for service
inputs to complete in test mode
--usage <plugin> print usage for a plugin, ie, 'telegraf --usage mysql'
--version display the version and exit
Examples:
# generate a telegraf config file:
telegraf config > telegraf.conf
# generate config with only cpu input & influxdb output plugins defined
telegraf --input-filter cpu --output-filter influxdb config
# run a single telegraf collection, outputing metrics to stdout
telegraf --config telegraf.conf --test
# run telegraf with all plugins defined in config file
telegraf --config telegraf.conf
# run telegraf, enabling the cpu & memory input, and influxdb output plugins
telegraf --config telegraf.conf --input-filter cpu:mem --output-filter influxdb
# run telegraf with pprof
telegraf --config telegraf.conf --pprof-addr localhost:6060
(2) Command usage
Command | Description |
---|---|
telegraf --help | Show help |
telegraf config > telegraf.conf | Write a full sample configuration template to stdout |
telegraf --input-filter cpu --output-filter influxdb config | Generate a configuration template containing only the cpu input and influxdb output plugins |
telegraf --config telegraf.conf --test | Run one collection with the given configuration file and print the gathered metrics to stdout |
telegraf --config telegraf.conf | Start telegraf with the given configuration file |
telegraf --config telegraf.conf --input-filter cpu:mem --output-filter influxdb | Start telegraf with the given configuration file, enabling only the cpu and mem inputs and the influxdb output |
(3) Configuration file locations
Install method | Default location | Default supplementary config directory |
---|---|---|
Linux RPM package | /etc/telegraf/telegraf.conf | /etc/telegraf/telegraf.d |
Linux tarball | {install dir}/etc/telegraf/telegraf.conf | {install dir}/etc/telegraf/telegraf.d |
(4) How configuration is loaded
By default telegraf loads telegraf.conf plus every *.conf file under /etc/telegraf/telegraf.d; the --config and --config-directory options change this behavior (e.g. telegraf --config /data/tigk/telegraf/conf/telegraf.conf --config-directory /etc/telegraf/telegraf.d). Each input block in the configuration is collected by its own thread, so duplicated input blocks waste resources.
(5) Global tags
Define key = "value" pairs in the [global_tags] section of the configuration file; every collected metric is then tagged with them.
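A minimal sketch (the tag names and values here are illustrative, not from the original setup):
[global_tags]
# Added to every metric collected on this host.
dc = "cn-north-1"
owner = "tigk"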
(6) Agent configuration
The [agent] section configures the collection behavior of the agent for the whole host.
Setting | Description |
---|---|
interval | Data collection interval |
round_interval | Round collection times to the interval. With interval = "10s", collections happen at :00, :10, :20, :30... of every minute |
metric_batch_size | Batch size of metrics sent to outputs |
metric_buffer_limit | Size of the metric buffer kept for each output |
collection_jitter | Maximum random sleep before each collection, so agents do not all collect at the same instant |
flush_interval | Interval between flushes to outputs |
flush_jitter | Maximum random sleep before each flush, to avoid large write spikes when many agents flush together |
precision | Timestamp precision |
logfile | Log file name |
debug | Run in debug mode |
quiet | Run in quiet mode (error messages only) |
hostname | Defaults to os.Hostname(); overrides it if set |
omit_hostname | If true, do not add the host tag to metrics |
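Putting the table together, an [agent] section might look like the following; the values mirror the defaults in the sample telegraf.conf, with the log path taken from the setup above:
[agent]
interval = "10s"          # collect from all inputs every 10 seconds
round_interval = true     # align collections to :00, :10, :20, ...
metric_batch_size = 1000
metric_buffer_limit = 10000
collection_jitter = "0s"
flush_interval = "10s"
flush_jitter = "0s"
precision = ""
logfile = "/data/tigk/telegraf/logs/telegraf.log"
omit_hostname = false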
(7) Common input plugin settings
Setting | Description |
---|---|
interval | Collection interval for this input; overrides the [agent] setting if present |
name_override | Replace the output measurement name entirely |
name_prefix | Prefix prepended to the measurement name |
name_suffix | Suffix appended to the measurement name |
tags | A map of extra tags to add to the output measurement |
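For instance, a cpu input that overrides the agent interval, renames its measurement, and adds a tag (a sketch; the interval, prefix, and tag are arbitrary):
[[inputs.cpu]]
interval = "30s"        # overrides the [agent] interval for this input only
name_prefix = "dev_"    # points are written to the measurement dev_cpu
[inputs.cpu.tags]
team = "ops"            # extra tag on every point from this input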
(8) Common output plugin settings: none
(9) Measurement filtering; can be defined in input, output, and other plugins
Setting | Description |
---|---|
namepass | Only points whose measurement name matches one of these glob patterns pass |
namedrop | Points whose measurement name matches one of these glob patterns are dropped |
fieldpass | Only fields whose key matches one of these glob patterns pass |
fielddrop | Fields whose key matches are dropped |
tagpass | Only points with a tag matching one of the patterns pass |
tagdrop | Points with a matching tag are dropped |
taginclude | Keep only the tags whose key matches; other tags are removed from the point |
tagexclude | Remove the tags whose key matches from the point |
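A sketch of both filter styles (the glob patterns are illustrative):
[[inputs.disk]]
# tagpass/tagdrop are sub-tables and must come last in the plugin block.
[inputs.disk.tagpass]
fstype = ["ext4", "xfs"]   # pass only points from ext4 and xfs filesystems

[[outputs.influxdb]]
namedrop = ["swap*"]       # drop swap-related measurements before writing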
(10) Typical configuration examples
①Input - System – cpu
# Read metrics about cpu usage
[[inputs.cpu]]
## Whether to report per-cpu stats or not
percpu = true
## Whether to report total system cpu stats or not
totalcpu = true
## If true, collect raw CPU time metrics.
collect_cpu_time = false
## If true, compute and report the sum of all non-idle CPU states.
report_active = false
②Input - System – disk
# Read metrics about disk usage by mount point
[[inputs.disk]]
## By default stats will be gathered for all mount points.
## Set mount_points will restrict the stats to only the specified mount points.
# mount_points = ["/"]
## Ignore mount points by filesystem type.
ignore_fs = ["tmpfs", "devtmpfs", "devfs", "iso9660", "overlay", "aufs", "squashfs"]
③Input - System – kernel
# Get kernel statistics from /proc/stat
[[inputs.kernel]]
# no configuration
④Input - System – MEM
# Read metrics about memory usage
[[inputs.mem]]
# no configuration
⑤Input - System – netstat
# # Read TCP metrics such as established, time wait and sockets counts.
# [[inputs.netstat]]
# # no configuration
⑥Input - System – processes
# Get the number of processes and group them by status
[[inputs.processes]]
# no configuration
⑦Input - System – system
# Read metrics about system load & uptime
[[inputs.system]]
## Uncomment to remove deprecated metrics.
# fielddrop = ["uptime_format"]
⑧Input - System – ping
# # Ping given url(s) and return statistics
# [[inputs.ping]]
# ## Hosts to send ping packets to.
# urls = ["example.org"]
#
# ## Method used for sending pings, can be either "exec" or "native". When set
# ## to "exec" the systems ping command will be executed. When set to "native"
# ## the plugin will send pings directly.
# ##
# ## While the default is "exec" for backwards compatibility, new deployments
# ## are encouraged to use the "native" method for improved compatibility and
# ## performance.
# # method = "exec"
#
# ## Number of ping packets to send per interval. Corresponds to the "-c"
# ## option of the ping command.
# # count = 1
#
# ## Time to wait between sending ping packets in seconds. Operates like the
# ## "-i" option of the ping command.
# # ping_interval = 1.0
#
# ## If set, the time to wait for a ping response in seconds. Operates like
# ## the "-W" option of the ping command.
# # timeout = 1.0
#
# ## If set, the total ping deadline, in seconds. Operates like the -w option
# ## of the ping command.
# # deadline = 10
#
# ## Interface or source address to send ping from. Operates like the -I or -S
# ## option of the ping command.
# # interface = ""
#
# ## Specify the ping executable binary.
# # binary = "ping"
#
# ## Arguments for ping command. When arguments is not empty, the command from
# ## the binary option will be used and other options (ping_interval, timeout,
# ## etc) will be ignored.
# # arguments = ["-c", "3"]
#
# ## Use only IPv6 addresses when resolving a hostname.
# # ipv6 = false
⑨Input - App – procstat
# [[inputs.procstat]]
# ## PID file to monitor process
# pid_file = "/var/run/nginx.pid"
# ## executable name (ie, pgrep <exe>)
# # exe = "nginx"
# ## pattern as argument for pgrep (ie, pgrep -f <pattern>)
# # pattern = "nginx"
# ## user as argument for pgrep (ie, pgrep -u <user>)
# # user = "nginx"
# ## Systemd unit name
# # systemd_unit = "nginx.service"
# ## CGroup name or path
# # cgroup = "systemd/system.slice/nginx.service"
#
# ## Windows service name
# # win_service = ""
#
# ## override for process_name
# ## This is optional; default is sourced from /proc/<pid>/status
# # process_name = "bar"
#
# ## Field name prefix
# # prefix = ""
#
# ## When true add the full cmdline as a tag.
# # cmdline_tag = false
#
# ## Add PID as a tag instead of a field; useful to differentiate between
# ## processes whose tags are otherwise the same. Can create a large number
# ## of series, use judiciously.
# # pid_tag = false
#
# ## Method to use when finding process IDs. Can be one of 'pgrep', or
# ## 'native'. The pgrep finder calls the pgrep executable in the PATH while
# ## the native finder performs the search directly in a manor dependent on the
# ## platform. Default is 'pgrep'
# # pid_finder = "pgrep"
⑩Input – App – redis
# # Read metrics from one or many redis servers
# [[inputs.redis]]
# ## specify servers via a url matching:
# ## [protocol://][:password]@address[:port]
# ## e.g.
# ## tcp://localhost:6379
# ## tcp://:password@192.168.99.100
# ## unix:///var/run/redis.sock
# ##
# ## If no servers are specified, then localhost is used as the host.
# ## If no port is specified, 6379 is used
# servers = ["tcp://localhost:6379"]
#
# ## specify server password
# # password = "s#cr@t%"
#
# ## Optional TLS Config
# # tls_ca = "/etc/telegraf/ca.pem"
# # tls_cert = "/etc/telegraf/cert.pem"
# # tls_key = "/etc/telegraf/key.pem"
# ## Use TLS but skip chain & host verification
# # insecure_skip_verify = true
⑪Input – App – kafka_consumer
# # Read metrics from Kafka topics
# [[inputs.kafka_consumer]]
# ## Kafka brokers.
# brokers = ["localhost:9092"]
#
# ## Topics to consume.
# topics = ["telegraf"]
#
# ## When set this tag will be added to all metrics with the topic as the value.
# # topic_tag = ""
#
# ## Optional Client id
# # client_id = "Telegraf"
#
# ## Set the minimal supported Kafka version. Setting this enables the use of new
# ## Kafka features and APIs. Must be 0.10.2.0 or greater.
# ## ex: version = "1.1.0"
# # version = ""
#
# ## Optional TLS Config
# # enable_tls = true
# # tls_ca = "/etc/telegraf/ca.pem"
# # tls_cert = "/etc/telegraf/cert.pem"
# # tls_key = "/etc/telegraf/key.pem"
# ## Use TLS but skip chain & host verification
# # insecure_skip_verify = false
#
# ## SASL authentication credentials. These settings should typically be used
# ## with TLS encryption enabled using the "enable_tls" option.
# # sasl_username = "kafka"
# # sasl_password = "secret"
#
# ## SASL protocol version. When connecting to Azure EventHub set to 0.
# # sasl_version = 1
#
# ## Name of the consumer group.
# # consumer_group = "telegraf_metrics_consumers"
#
# ## Initial offset position; one of "oldest" or "newest".
# # offset = "oldest"
#
# ## Consumer group partition assignment strategy; one of "range", "roundrobin" or "sticky".
# # balance_strategy = "range"
#
# ## Maximum length of a message to consume, in bytes (default 0/unlimited);
# ## larger messages are dropped
# max_message_len = 1000000
#
# ## Maximum messages to read from the broker that have not been written by an
# ## output. For best throughput set based on the number of metrics within
# ## each message and the size of the output's metric_batch_size.
# ##
# ## For example, if each message from the queue contains 10 metrics and the
# ## output metric_batch_size is 1000, setting this to 100 will ensure that a
# ## full batch is collected and the write is triggered immediately without
# ## waiting until the next flush_interval.
# # max_undelivered_messages = 1000
#
# ## Data format to consume.
# ## Each data format has its own unique set of configuration options, read
# ## more about them here:
# ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
# data_format = "influx"
⑫Input – App – exec
# # Read metrics from one or more commands that can output to stdout
# [[inputs.exec]]
# ## Commands array
# commands = [
# "/tmp/test.sh",
# "/usr/bin/mycollector --foo=bar",
# "/tmp/collect_*.sh"
# ]
#
# ## Timeout for each command to complete.
# timeout = "5s"
#
# ## measurement name suffix (for separating different commands)
# name_suffix = "_mycollector"
#
# ## Data format to consume.
# ## Each data format has its own unique set of configuration options, read
# ## more about them here:
# ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
# data_format = "influx"
⑬Output – influxdb_v2
# # Configuration for sending metrics to InfluxDB
# [[outputs.influxdb_v2]]
# ## The URLs of the InfluxDB cluster nodes.
# ##
# ## Multiple URLs can be specified for a single cluster, only ONE of the
# ## urls will be written to each interval.
# ## ex: urls = ["https://us-west-2-1.aws.cloud2.influxdata.com"]
# urls = ["http://127.0.0.1:9999"]
#
# ## Token for authentication.
# token = ""
#
# ## Organization is the name of the organization you wish to write to; must exist.
# organization = ""
#
# ## Destination bucket to write into.
# bucket = ""
#
# ## The value of this tag will be used to determine the bucket. If this
# ## tag is not set the 'bucket' option is used as the default.
# # bucket_tag = ""
#
# ## If true, the bucket tag will not be added to the metric.
# # exclude_bucket_tag = false
#
# ## Timeout for HTTP messages.
# # timeout = "5s"
#
# ## Additional HTTP headers
# # http_headers = {"X-Special-Header" = "Special-Value"}
#
# ## HTTP Proxy override, if unset values the standard proxy environment
# ## variables are consulted to determine which proxy, if any, should be used.
# # http_proxy = "http://corporate.proxy:3128"
#
# ## HTTP User-Agent
# # user_agent = "telegraf"
#
# ## Content-Encoding for write request body, can be set to "gzip" to
# ## compress body or "identity" to apply no encoding.
# # content_encoding = "gzip"
#
# ## Enable or disable uint support for writing uints influxdb 2.0.
# # influx_uint_support = false
#
# ## Optional TLS Config for use on HTTP connections.
# # tls_ca = "/etc/telegraf/ca.pem"
# # tls_cert = "/etc/telegraf/cert.pem"
# # tls_key = "/etc/telegraf/key.pem"
# ## Use TLS but skip chain & host verification
# # insecure_skip_verify = false
For example, to collect the running applications in YARN and store them in InfluxDB: ① use the exec input plugin to run a script whose standard output prints lines in the InfluxDB line protocol, and ② have the script call the YARN REST API to fetch the applications that are currently running.
#!/usr/bin/env python2
# Query the YARN ResourceManager REST API for RUNNING applications of type
# "Apache Flink" and print one InfluxDB line-protocol line per application.
import json
import urllib
import httplib

host = "10.0.165.3:8088"
path = "/ws/v1/cluster/apps"
data = urllib.urlencode({"state": "RUNNING", "applicationTypes": "Apache Flink"})
path = path + "?" + data
headers = {"Accept": "application/json"}

conn = httplib.HTTPConnection(host)
conn.request("GET", path, headers=headers)
result = conn.getresponse()
if result.status == 200:
    content = result.read()
    apps = json.loads(content)["apps"]["app"]
    for app in apps:
        # Skip test jobs.
        if "test" in app["name"] or "TEST" in app["name"] or "Test" in app["name"]:
            continue
        # Escape spaces in tag values, as the line protocol requires.
        app["escaped_name"] = app["name"].replace(" ", "\\ ")
        print "APPLICATION.RUNNING,appname=%s,appid=%s field_appname=\"%s\",field_appid=\"%s\"" % (
            app["escaped_name"], app["id"], app["name"], app["id"])
Sample output: APPLICATION.RUNNING,appname=iot_road_traffic,appid=application_1592979353214_0175 field_appname="iot_road_traffic",field_appid="application_1592979353214_0175"
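Each printed line follows the InfluxDB line protocol:
measurement,tag1=v1,tag2=v2 field1="fv1",field2="fv2" [timestamp]
When the timestamp is omitted, as it is here, the metric is stamped with the collection time.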
Configure the exec input plugin as follows:
[[inputs.exec]]
## Commands array; each command must print metrics in the configured data format.
commands = ["python /data/tigk/telegraf/exec/getRunningFlinkJob.py"]
## Timeout for each command to complete.
timeout = "5s"
## Data format to consume.
## Each data format has its own unique set of configuration options, read
## more about them here:
## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
data_format = "influx"
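After saving the configuration, restart telegraf so the new input takes effect (for the tarball install above: nohup ./telegraf --config /data/tigk/telegraf/conf/telegraf.conf & from /home/tigk/.local/telegraf/usr/bin), then verify that points arrive in InfluxDB, e.g. with the InfluxQL query SELECT * FROM "APPLICATION.RUNNING".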