kubectl-debug is an out-of-tree solution for troubleshooting running pods. It lets you run a new container inside a running pod for debugging purposes (see examples). The new container joins the pid, network, user and ipc namespaces of the target container, so you can use arbitrary troubleshooting tools without pre-installing them in your production container image.

Quick Start

Install the kubectl debug plugin

Homebrew:

brew install aylei/tap/kubectl-debug

Download the binary:

export PLUGIN_VERSION=0.1.1
# linux x86_64
curl -Lo kubectl-debug.tar.gz https://github.com/aylei/kubectl-debug/releases/download/v${PLUGIN_VERSION}/kubectl-debug_${PLUGIN_VERSION}_linux_amd64.tar.gz
# macos
curl -Lo kubectl-debug.tar.gz https://github.com/aylei/kubectl-debug/releases/download/v${PLUGIN_VERSION}/kubectl-debug_${PLUGIN_VERSION}_darwin_amd64.tar.gz

tar -zxvf kubectl-debug.tar.gz kubectl-debug
sudo mv kubectl-debug /usr/local/bin/
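
The two download commands above differ only in the OS part of the archive name, so they can be folded into a small helper. This is a minimal sketch: the function name kubectl_debug_release_url is hypothetical, and it covers only the two archives listed above (linux and darwin, both amd64).

```shell
# Hypothetical helper: print the release archive URL for a version and OS.
# Only the linux and darwin amd64 archives named in this README are covered.
kubectl_debug_release_url() (
  version=$1
  # Default to the local OS name, lowercased (Linux -> linux, Darwin -> darwin).
  os=${2:-$(uname -s | tr '[:upper:]' '[:lower:]')}
  case "$os" in
    linux|darwin) ;;
    *) echo "unsupported OS: $os" >&2; exit 1 ;;
  esac
  printf 'https://github.com/aylei/kubectl-debug/releases/download/v%s/kubectl-debug_%s_%s_amd64.tar.gz\n' \
    "$version" "$version" "$os"
)

# Usage (network access required):
#   curl -Lo kubectl-debug.tar.gz "$(kubectl_debug_release_url 0.1.1)"
```

The function runs in a subshell, so its variables do not leak into the calling shell and an unsupported OS only fails the call itself.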

Windows users: download the latest archive from the release page, decompress the package, and add it to your PATH.

(Optional) Install the debug agent DaemonSet

kubectl-debug requires an agent pod to communicate with the container runtime. In agentless mode (enabled by default), the agent pod is created when a debug session starts and cleaned up when the session ends.

While convenient, creating the agent pod before each debug session can be time-consuming. To skip this step, install the debug agent DaemonSet in advance and pass --agentless=false:

# if your Kubernetes version is v1.16 or newer
kubectl apply -f https://raw.githubusercontent.com/aylei/kubectl-debug/master/scripts/agent_daemonset.yml
# if your Kubernetes version is older than v1.16, change the apiVersion to extensions/v1beta1 as follows
wget https://raw.githubusercontent.com/aylei/kubectl-debug/master/scripts/agent_daemonset.yml
# note: `sed -i ''` is the BSD/macOS form; with GNU sed (Linux), drop the empty '' argument
sed -i '' '1s/apps\/v1/extensions\/v1beta1/g' agent_daemonset.yml
kubectl apply -f agent_daemonset.yml
# or use helm
helm install kubectl-debug -n=debug-agent ./contrib/helm/kubectl-debug
# use the DaemonSet agent (turns off agentless mode)
kubectl debug --agentless=false POD_NAME
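
The version rule in the comments above (apps/v1 for v1.16 and newer, extensions/v1beta1 before that) can be scripted. The following is a minimal sketch, assuming a 1.x cluster and that you pass in the server's minor version yourself. Both function names are hypothetical. The patch step writes to a new file instead of using sed -i, whose syntax differs between GNU and BSD sed.

```shell
# Hypothetical helper: choose the DaemonSet apiVersion for a Kubernetes 1.x
# minor version. Vendor suffixes such as "15+" are stripped first.
daemonset_api_version() (
  minor=${1%%[!0-9]*}
  if [ "$minor" -ge 16 ]; then
    echo "apps/v1"
  else
    echo "extensions/v1beta1"
  fi
)

# Hypothetical helper: rewrite the apiVersion on line 1 of a manifest,
# reading from $1 and writing to $2 (portable across GNU and BSD sed).
patch_daemonset_manifest() (
  sed '1s#apps/v1#extensions/v1beta1#' "$1" > "$2"
)

# Usage:
#   [ "$(daemonset_api_version 15)" = "extensions/v1beta1" ] &&
#     patch_daemonset_manifest agent_daemonset.yml agent_daemonset.patched.yml
```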

Debug instructions

Try it out!

# requires kubectl 1.12.0 or higher
kubectl debug -h
# if you installed the debug agent DaemonSet, use --agentless=false to speed up startup
# the following commands use the default agentless mode
kubectl debug POD_NAME

# if your pod is stuck in the `CrashLoopBackOff` state and cannot be connected to,
# you can fork a new pod and diagnose the problem in the forked pod
kubectl debug POD_NAME --fork

# in fork mode, use --fork-pod-retain-labels to keep selected labels of the original pod
# on the forked pod (comma-separated label keys, no spaces allowed)
# if not set, the parameter is empty by default: no labels of the original pod are retained,
# and the forked pod has no labels
kubectl debug POD_NAME --fork --fork-pod-retain-labels=<labelKeyA>,<labelKeyB>,<labelKeyC>

# port-forward mode is enabled by default so that nodes without a public IP or direct access
# (because of a firewall, for example) can still be reached
# if you don't need port-forward mode, turn it off with --port-forward=false
kubectl debug POD_NAME --port-forward=false --agentless=false --daemonset-ns=kube-system --daemonset-name=debug-agent

# old versions of kubectl cannot discover plugins; you can execute the binary directly
kubectl-debug POD_NAME

# to pull the debug image from a private docker registry, provide a Kubernetes secret with the registry credentials
# the default registry-secret-name is kubectl-debug-registry-secret, the default namespace is default
# set the secret data as {Username: <username>, Password: <password>}
kubectl-debug POD_NAME --image calmkart/netshoot:latest --registry-secret-name <k8s_secret_name> --registry-secret-namespace <namespace>
# in the default agentless mode, you can set the agent pod's resource requests/limits
# (not set by default), for example:
kubectl-debug POD_NAME --agent-pod-cpu-requests=250m --agent-pod-cpu-limits=500m --agent-pod-memory-requests=200Mi --agent-pod-memory-limits=500Mi
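
The --fork-pod-retain-labels flag expects comma-separated label keys with no spaces. A minimal sketch of building that value from a list of keys follows. join_label_keys is a hypothetical helper, not part of kubectl-debug.

```shell
# Hypothetical helper: join label keys with commas, rejecting empty keys
# and keys that contain whitespace or a comma.
join_label_keys() (
  for key in "$@"; do
    case "$key" in
      ''|*[[:space:],]*) echo "invalid label key: '$key'" >&2; exit 1 ;;
    esac
  done
  # "$*" joins the arguments with the first character of IFS.
  IFS=,
  printf '%s\n' "$*"
)

# Usage:
#   kubectl debug POD_NAME --fork --fork-pod-retain-labels="$(join_label_keys app tier)"
```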

You can configure the default arguments to simplify usage; refer to Configuration.
Refer to Examples for practical debugging examples.

(Optional) Create a Secret for Use with Private Docker Registries

You can use a new or existing Kubernetes dockerconfigjson secret. For example:

# Be sure to run "docker login" beforehand.
kubectl create secret generic kubectl-debug-registry-secret \
    --from-file=.dockerconfigjson=<path/to/.docker/config.json> \
    --type=kubernetes.io/dockerconfigjson

Alternatively, you can create a secret with the key authStr and a JSON payload containing a Username and Password. For example:

echo -n '{"Username": "calmkart", "Password": "calmkart"}' > ./authStr
kubectl create secret generic kubectl-debug-registry-secret --from-file=./authStr
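
To avoid hard-coding credentials in a command line, the authStr payload can be generated from variables. This is a minimal sketch: write_auth_str is a hypothetical helper, and it assumes the username and password contain no double quotes or backslashes, because it does no JSON escaping.

```shell
# Hypothetical helper: write the authStr payload to a file.
# Assumes the values contain no double quotes or backslashes,
# since no JSON escaping is performed.
write_auth_str() (
  username=$1
  password=$2
  out=${3:-./authStr}
  umask 077  # keep the credentials file private to the current user
  printf '{"Username": "%s", "Password": "%s"}' "$username" "$password" > "$out"
)

# Usage:
#   write_auth_str "$REGISTRY_USER" "$REGISTRY_PASSWORD" ./authStr
#   kubectl create secret generic kubectl-debug-registry-secret --from-file=./authStr
```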

Refer to the official Kubernetes documentation on Secrets for more ways to create them.

Build from source

Clone this repo and:

# make will build plugin binary and debug-agent image
make
# install plugin
mv kubectl-debug /usr/local/bin

# build plugin only
make plugin
# build agent only
make agent-docker

Port-forward mode and agentless mode (enabled by default)

port-forward mode: Without port-forward mode, kubectl-debug connects directly to the target host. When kubectl-debug cannot reach targetHost:agentPort, port-forward mode is needed: the local machine listens on localhost:agentPort and forwards data to and from targetPod:agentPort.
agentless mode: Without agentless mode, debug-agent must be pre-deployed on every node of the cluster, where it consumes cluster resources all the time even though debugging a Pod is a low-frequency operation. To avoid wasting cluster resources, agentless mode was added in #31. In agentless mode, kubectl-debug first starts debug-agent on the host where the target Pod is located, and debug-agent then starts the debug container. After the user exits, kubectl-debug deletes the debug container and finally deletes the debug-agent pod.
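
Conceptually, port-forward mode resembles running kubectl port-forward against the agent pod yourself. The sketch below only illustrates that idea and is not how kubectl-debug implements it. forward_agent_port and the KUBECTL override variable are hypothetical, and 10027 is the default agentPort from the Configuration section.

```shell
# Hypothetical helper: forward localhost:PORT to PORT on the agent pod.
forward_agent_port() (
  pod=$1
  namespace=${2:-default}
  port=${3:-10027}
  # KUBECTL can be overridden (for example KUBECTL=echo) for a dry run.
  ${KUBECTL:-kubectl} port-forward -n "$namespace" "pod/$pod" "$port:$port"
)

# Usage (dry run, prints the kubectl arguments instead of running them):
#   KUBECTL=echo forward_agent_port AGENT_POD_NAME default
```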

Configuration

kubectl-debug uses nicolaka/netshoot as the default image for the debug container and bash as the default entrypoint.

You can override the default image and entrypoint with CLI flags or, even better, with the config file ~/.kube/debug-config:

# debug agent listening port (outside the container)
# defaults to 10027
agentPort: 10027

# whether to use agentless mode
# defaults to true
agentless: true
# namespace of the debug-agent pod, used in agentless mode
# defaults to 'default'
agentPodNamespace: default
# name prefix of the debug-agent pod, used in agentless mode
# defaults to 'debug-agent-pod'
agentPodNamePrefix: debug-agent-pod
# image of the debug-agent pod, used in agentless mode
# defaults to 'aylei/debug-agent:latest'
agentImage: aylei/debug-agent:latest

# daemonset name of the debug-agent, used in port-forward mode
# defaults to 'debug-agent'
debugAgentDaemonset: debug-agent
# daemonset namespace of the debug-agent, used in port-forward mode
# defaults to 'default'
debugAgentNamespace: kube-system
# whether to use port-forward when connecting to the debug-agent
# defaults to true
portForward: true
# image of the debug container
# default as shown
image: nicolaka/netshoot:latest
# start command of the debug container
# defaults to ['bash']
command:
- '/bin/bash'
- '-l'
# Kubernetes secret holding the private docker registry credentials
# default registrySecretName is kubectl-debug-registry-secret
# default registrySecretNamespace is default
registrySecretName: my-debug-secret
registrySecretNamespace: debug
# in agentless mode, you can set the agent pod's resource requests/limits
# not set by default
agentCpuRequests: ""
agentCpuLimits: ""
agentMemoryRequests: ""
agentMemoryLimits: ""
# in fork mode, list the label keys of the original pod that the forked pod should retain
# format is []string
# if not set, the parameter is empty by default: no labels of the original pod are retained,
# and the forked pod has no labels
forkPodRetainLabels: []
# you can disable the SSL certificate check when communicating with the image registry by
# setting registrySkipTLSVerify to true
registrySkipTLSVerify: false
# you can set the log level with the verbosity setting
verbosity: 0

If the debug-agent is not accessible from the host port, it is recommended to set portForward: true to use port-forward mode.

PS: kubectl-debug always overrides the entrypoint of the container. This is by design, to avoid users running an unwanted service by mistake (of course, you can always do this explicitly).
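
As an illustration of the config file, the sketch below writes a small ~/.kube/debug-config for a cluster where the debug agent DaemonSet is already installed in kube-system. It is an assumption-laden example: write_debug_config is a hypothetical helper, the chosen values are only one possible setup, and the function overwrites any existing file at the target path. All keys come from the reference above.

```shell
# Hypothetical helper: write a minimal debug-config that uses the
# pre-installed DaemonSet instead of agentless mode.
# WARNING: overwrites the target file if it already exists.
write_debug_config() (
  out=${1:-$HOME/.kube/debug-config}
  mkdir -p "$(dirname "$out")"
  cat > "$out" <<'EOF'
agentless: false
portForward: true
debugAgentDaemonset: debug-agent
debugAgentNamespace: kube-system
image: nicolaka/netshoot:latest
command:
- '/bin/bash'
- '-l'
EOF
)

# Usage:
#   write_debug_config            # writes ~/.kube/debug-config
#   write_debug_config ./preview  # writes to another path for inspection
```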

Authorization

Currently, kubectl-debug reuses the privilege of the pod/exec sub resource for authorization, which means it has the same privilege requirements as the kubectl exec command.
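
Because the requirement matches kubectl exec, you can check up front whether your account is likely to be allowed to debug in a namespace. This is a minimal sketch, assuming the relevant permission is the create verb on the exec subresource of pods. can_debug_in_namespace and the KUBECTL override variable are hypothetical.

```shell
# Hypothetical helper: ask the API server whether the current user may
# exec into pods in the given namespace.
can_debug_in_namespace() (
  namespace=${1:-default}
  # KUBECTL can be overridden (for example KUBECTL=echo) for a dry run.
  ${KUBECTL:-kubectl} auth can-i create pods --subresource=exec -n "$namespace"
)

# Usage:
#   can_debug_in_namespace default
```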

Auditing / Security

Some teams may want to limit what debug image users are allowed to use and to have an audit record for each command they run in the debug container.

You can use the environment variable KCTLDBG_RESTRICT_IMAGE_TO to restrict the agent to a specific container image. For example, putting the following in the container spec section of your daemonset YAML forces the agent to always use the image docker.io/nicolaka/netshoot:latest, regardless of what the user specifies on the kubectl-debug command line:

          env:
            - name: KCTLDBG_RESTRICT_IMAGE_TO
              value: docker.io/nicolaka/netshoot:latest

If KCTLDBG_RESTRICT_IMAGE_TO is set and, as a result, the agent uses an image different from the one the user requested, the agent logs a message to standard output announcing this. The message includes the URIs of both images.
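
The override rule can be sketched as follows. This illustrates the described behavior and is not the agent's actual code: resolve_debug_image is a hypothetical name, the message wording is invented, and the sketch logs to standard error (rather than standard output, as the agent does) so that the function's output is only the chosen image.

```shell
# Hypothetical sketch of the image restriction rule.
resolve_debug_image() (
  requested=$1
  restricted=${KCTLDBG_RESTRICT_IMAGE_TO:-}
  if [ -n "$restricted" ] && [ "$restricted" != "$requested" ]; then
    # Invented wording; the real agent message differs.
    echo "requested image $requested overridden by $restricted" >&2
    echo "$restricted"
  else
    echo "$requested"
  fi
)

# Usage:
#   KCTLDBG_RESTRICT_IMAGE_TO=docker.io/nicolaka/netshoot:latest
#   resolve_debug_image busybox:latest
```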

Auditing can be enabled by placing audit: true in the agent's config file.

There are 3 settings related to auditing.

audit

Boolean value that indicates whether auditing should be enabled or not. Default value is false

audit_fifo

Template of the path to a FIFO that is used to pass audit information from the debug container to the agent. The default value is /var/data/kubectl-debug-audit-fifo/KCTLDBG-CONTAINER-ID. If auditing is enabled, the agent will:

Prior to creating the debug container, create a FIFO based on the value of audit_fifo. The agent replaces KCTLDBG-CONTAINER-ID with the ID of the debug container it is creating.
Create a thread that reads lines of text from the FIFO and writes log messages to standard output. The log messages look similar to the example below:
2020/05/22 17:59:58 runtime.go:717: audit - user: USERNAME/885cbd0506868985a6fc491bb59a2d3c debugee: 48107cbdacf4b478cbf1e2e34dbea6ebb48a2942c5f3d1effbacf0a216eac94f exec: 265 execve("/bin/tar", ["tar", "--help"], 0x55a8d0dfa6c0 /* 7 vars */) = 0
Here USERNAME is the Kubernetes user as determined by the client that launched the debug container, and debugee is the container ID of the container being debugged.
Bind mount the FIFO it created into the debug container.

audit_shim

String array that is placed before the command run in the debug container. The default value is {"/usr/bin/strace", "-o", "KCTLDBG-FIFO", "-f", "-e", "trace=/exec"}. The agent replaces KCTLDBG-FIFO with the FIFO path (see above). If auditing is enabled, the agent runs the concatenation of the array specified by audit_shim and the original command array it was going to use.
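
The two placeholder substitutions can be sketched as follows. This is an illustration of the described behavior, not the agent's code: both function names are hypothetical, and the container ID is assumed to be a plain hex string with no characters that are special to sed.

```shell
# Hypothetical sketch: expand the audit_fifo template for one debug container.
# Assumes the container ID contains no '|', '&' or '\' characters.
expand_audit_fifo() (
  template=$1
  container_id=$2
  printf '%s\n' "$template" | sed "s|KCTLDBG-CONTAINER-ID|$container_id|g"
)

# Hypothetical sketch: prepend the default audit_shim to the original
# command, with KCTLDBG-FIFO already replaced by the FIFO path.
build_audited_command() (
  fifo=$1
  shift
  set -- /usr/bin/strace -o "$fifo" -f -e trace=/exec "$@"
  printf '%s\n' "$*"
)

# Usage:
#   fifo=$(expand_audit_fifo /var/data/kubectl-debug-audit-fifo/KCTLDBG-CONTAINER-ID "$CONTAINER_ID")
#   build_audited_command "$fifo" bash -l
```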

The easiest way to enable auditing is to define a config map in the YAML you use to deploy the daemonset. You can do this by placing

apiVersion: v1
kind: ConfigMap
metadata:
  name: kubectl-debug-agent-config
data:
  agent-config.yml: |
    audit: true
---

at the top of the file, adding a configmap volume like so

        - name: config
          configMap:
            name: kubectl-debug-agent-config

and a volume mount like so

            - name: config
              mountPath: "/etc/kubectl-debug/agent-config.yml"
              subPath: agent-config.yml

Roadmap

kubectl-debug is meant to be just a troubleshooting helper, and it is going to be replaced by the native kubectl debug command when this proposal is implemented and merged in a future Kubernetes release. For now, there is still some work to do to improve kubectl-debug.

Security: currently, kubectl-debug does authorization on the client side, which should be moved to the server side (debug-agent)
More unit tests
More real-world debugging examples
e2e tests

If you are interested in any of the above features, please file an issue to avoid potential duplication.

Contribute

Feel free to open issues and pull requests. Any feedback is highly appreciated!

Acknowledgement

This project would not be here without the effort of our contributors, thanks!
