当前位置: 首页 > 知识库问答 >
问题:

Kubernetes即使在发送SIGTERM后也会向pod发送流量

公西永嘉
2023-03-14

我有一个配置了优雅关机的SpringBoot项目。部署在k8s1.12.7这是日志,

2019-07-20 10:23:16.180 INFO [service,,,] 1 --- [ Thread-7] com.jay.util.GracefulShutdown : Received shutdown event
2019-07-20 10:23:16.180 INFO [service,,,] 1 --- [ Thread-7] com.jay.util.GracefulShutdown : Waiting for 30s to finish
2019-07-20 10:23:16.273 INFO [service,fd964ebaa631a860,75a07c123397e4ff,false] 1 --- [io-8080-exec-10] com.jay.resource.ProductResource : GET /products?id=59
2019-07-20 10:23:16.374 INFO [service,9a569ecd8c448e98,00bc11ef2776d7fb,false] 1 --- [nio-8080-exec-1] com.jay.resource.ProductResource : GET /products?id=68
...
2019-07-20 10:23:33.711 INFO [service,1532d6298acce718,08cfb8085553b02e,false] 1 --- [nio-8080-exec-9] com.jay.resource.ProductResource : GET /products?id=209
2019-07-20 10:23:46.181 INFO [service,,,] 1 --- [ Thread-7] com.jay.util.GracefulShutdown : Resumed after hibernation
2019-07-20 10:23:46.216 INFO [service,,,] 1 --- [ Thread-7] o.s.s.concurrent.ThreadPoolTaskExecutor : Shutting down ExecutorService 'applicationTaskExecutor'

应用程序已从库伯内特斯收到10:23:16.180处的SIGTERM。根据Pods的终止,点#5表示终止Pod已从服务的endpoint列表中删除,但它与发送SIGTERM信号后转发请求17秒(直到10:23:33.711)相矛盾。是否缺少任何配置?

Dockerfile

FROM openjdk:8-jre-slim
MAINTAINER Jay

RUN apt update && apt install -y curl libtcnative-1 gcc && apt clean

ADD build/libs/sample-service.jar /

CMD ["java", "-jar" , "sample-service.jar"]

优雅关机

// https://github.com/spring-projects/spring-boot/issues/4657
class GracefulShutdown(val waitTime: Long, val timeout: Long) : TomcatConnectorCustomizer, ApplicationListener<ContextClosedEvent> {

    @Volatile
    private var connector: Connector? = null

    override fun customize(connector: Connector) {
        this.connector = connector
    }

    override fun onApplicationEvent(event: ContextClosedEvent) {

        log.info("Received shutdown event")

        val executor = this.connector?.protocolHandler?.executor
        if (executor is ThreadPoolExecutor) {
            try {
                val threadPoolExecutor: ThreadPoolExecutor = executor

                log.info("Waiting for ${waitTime}s to finish")
                hibernate(waitTime * 1000)

                log.info("Resumed after hibernation")
                this.connector?.pause()

                threadPoolExecutor.shutdown()
                if (!threadPoolExecutor.awaitTermination(timeout, TimeUnit.SECONDS)) {
                    log.warn("Tomcat thread pool did not shut down gracefully within $timeout seconds. Proceeding with forceful shutdown")

                    threadPoolExecutor.shutdownNow()

                    if (!threadPoolExecutor.awaitTermination(timeout, TimeUnit.SECONDS)) {
                        log.error("Tomcat thread pool did not terminate")
                    }
                }
            } catch (ex: InterruptedException) {
                log.info("Interrupted")
                Thread.currentThread().interrupt()
            }
        }else
            this.connector?.pause()
    }

    private fun hibernate(time: Long){
        try {
            Thread.sleep(time)
        }catch (ex: Exception){}
    }

    companion object {
        private val log = LoggerFactory.getLogger(GracefulShutdown::class.java)
    }
}
@Configuration
class GracefulShutdownConfig(@Value("\${app.shutdown.graceful.wait-time:30}") val waitTime: Long,
                             @Value("\${app.shutdown.graceful.timeout:30}") val timeout: Long) {

    companion object {
        private val log = LoggerFactory.getLogger(GracefulShutdownConfig::class.java)
    }

    @Bean
    fun gracefulShutdown(): GracefulShutdown {

        return GracefulShutdown(waitTime, timeout)
    }

    @Bean
    fun webServerFactory(gracefulShutdown: GracefulShutdown): ConfigurableServletWebServerFactory {

        log.info("GracefulShutdown configured with wait: ${waitTime}s and timeout: ${timeout}s")

        val factory = TomcatServletWebServerFactory()
        factory.addConnectorCustomizers(gracefulShutdown)
        return factory
    }
}

部署文件

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    k8s-app: service
  name: service
spec:
  progressDeadlineSeconds: 420
  replicas: 1
  revisionHistoryLimit: 1
  selector:
    matchLabels:
      k8s-app: service
  strategy:
    rollingUpdate:
      maxSurge: 2
      maxUnavailable: 0
    type: RollingUpdate
  template:
    metadata:
      labels:
        k8s-app: service
    spec:
      terminationGracePeriodSeconds: 60
      containers:
      - env:
        - name: SPRING_PROFILES_ACTIVE
          value: dev
        image: service:2
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 20
          httpGet:
            path: /actuator/health
            port: 8080
          initialDelaySeconds: 60
          periodSeconds: 30
          timeoutSeconds: 5
        name: service
        ports:
        - containerPort: 8080
          protocol: TCP
        readinessProbe:
          failureThreshold: 60
          httpGet:
            path: /actuator/health
            port: 8080
          initialDelaySeconds: 100
          periodSeconds: 10
          timeoutSeconds: 5

更新时间:

添加了自定义健康检查endpoint

@RestControllerEndpoint(id = "live")
@Component
class LiveEndpoint {

    companion object {
        private val log = LoggerFactory.getLogger(LiveEndpoint::class.java)
    }

    @Autowired
    private lateinit var gracefulShutdownStatus: GracefulShutdownStatus

    @GetMapping
    fun live(): ResponseEntity<Any> {

        val status = if(gracefulShutdownStatus.isTerminating())
            HttpStatus.INTERNAL_SERVER_ERROR.value()
        else
            HttpStatus.OK.value()

        log.info("Status: $status")
        return ResponseEntity.status(status).build()
    }
}

更改了livenessProbe,

  livenessProbe:
    httpGet:
      path: /actuator/live
      port: 8080
    initialDelaySeconds: 100
    periodSeconds: 5
    timeoutSeconds: 5
    failureThreshold: 3

这是更改后的日志,

2019-07-21 14:13:01.431  INFO [service,9b65b26907f2cf8f,9b65b26907f2cf8f,false] 1 --- [nio-8080-exec-2] com.jay.util.LiveEndpoint          : Status: 200
2019-07-21 14:13:01.444  INFO [service,3da259976f9c286c,64b0d5973fddd577,false] 1 --- [nio-8080-exec-3] com.jay.resource.ProductResource   : GET /products?id=52
2019-07-21 14:13:01.609  INFO [service,,,] 1 --- [       Thread-7] com.jay.util.GracefulShutdown      : Received shutdown event
2019-07-21 14:13:01.610  INFO [service,,,] 1 --- [       Thread-7] com.jay.util.GracefulShutdown      : Waiting for 30s to finish
...
2019-07-21 14:13:06.431  INFO [service,002c0da2133cf3b0,002c0da2133cf3b0,false] 1 --- [nio-8080-exec-3] com.jay.util.LiveEndpoint          : Status: 500
2019-07-21 14:13:06.433  INFO [service,072abbd7275103ce,d1ead06b4abf2a34,false] 1 --- [nio-8080-exec-4] com.jay.resource.ProductResource   : GET /products?id=96
...
2019-07-21 14:13:11.431  INFO [service,35aa09a8aea64ae6,35aa09a8aea64ae6,false] 1 --- [io-8080-exec-10] com.jay.util.LiveEndpoint          : Status: 500
2019-07-21 14:13:11.508  INFO [service,a78c924f75538a50,0314f77f21076313,false] 1 --- [nio-8080-exec-2] com.jay.resource.ProductResource   : GET /products?id=110
...
2019-07-21 14:13:16.431  INFO [service,38a940dfda03956b,38a940dfda03956b,false] 1 --- [nio-8080-exec-9] com.jay.util.LiveEndpoint          : Status: 500
2019-07-21 14:13:16.593  INFO [service,d76e81012934805f,b61cb062154bb7f0,false] 1 --- [io-8080-exec-10] com.jay.resource.ProductResource   : GET /products?id=152
...
2019-07-21 14:13:29.634  INFO [service,38a32a20358a7cc4,2029de1ed90e9539,false] 1 --- [nio-8080-exec-6] com.jay.resource.ProductResource   : GET /products?id=191
2019-07-21 14:13:31.610  INFO [service,,,] 1 --- [       Thread-7] com.jay.util.GracefulShutdown      : Resumed after hibernation
2019-07-21 14:13:31.692  INFO [service,,,] 1 --- [       Thread-7] o.s.s.concurrent.ThreadPoolTaskExecutor  : Shutting down ExecutorService 'applicationTaskExecutor'

使用3次故障的livenessProbe,kubernetes在liveness故障后为流量服务13秒,即从14:13:16.431到14:13:29.634。

更新2:事件顺序(由于Eamonn McEvoy)

seconds | healthy | events
   0    |    ✔    |   * liveness probe healthy
   1    |    ✔    |   - SIGTERM
   2    |    ✔    |   
   3    |    ✔    |   
   4    |    ✔    |   
   5    |    ✔    |   * liveness probe unhealthy (1/3)
   6    |    ✔    |   
   7    |    ✔    |   
   8    |    ✔    |   
   9    |    ✔    |   
   10   |    ✔    |   * liveness probe unhealthy (2/3)
   11   |    ✔    |   
   12   |    ✔    |   
   13   |    ✔    |   
   14   |    ✔    |   
   15   |    ✘    |   * liveness probe unhealthy (3/3)
   ..   |    ✔    |   * traffic is served       
   28   |    ✔    |   * traffic is served
   29   |    ✘    |   * pod restarts

共有1个答案

郑博厚
2023-03-14

SIGTERM不会立即将pod置于终止状态。您可以在日志中看到您的应用程序在10:23:16.180开始优雅地关闭

就库伯内特斯而言,吊舱在优雅的停堆期间看起来还不错。您需要在部署中添加一个活跃度探测器;当它变得不健康时,交通就会停止。

livenessProbe:
  httpGet:
    path: /actuator/health
    port: 8080
  initialDelaySeconds: 100
  periodSeconds: 10
  timeoutSeconds: 5

更新:

这是因为您的故障阈值为3,因此在sigterm之后,您最多允许15秒的通信量;

例如。

seconds | healthy | events
   0    |    ✔    |   * liveness probe healthy
   1    |    ✔    |   - SIGTERM
   2    |    ✔    |   
   3    |    ✔    |   
   4    |    ✔    |   
   5    |    ✔    |   * liveness probe issued
   6    |    ✔    |       .
   7    |    ✔    |       .
   8    |    ✔    |       .
   9    |    ✔    |       .
   10   |    ✔    |   * liveness probe timeout - unhealthy (1/3)
   11   |    ✔    |   
   12   |    ✔    |   
   13   |    ✔    |   
   14   |    ✔    |   
   15   |    ✔    |   * liveness probe issued
   16   |    ✔    |       .
   17   |    ✔    |       .
   18   |    ✔    |       .
   19   |    ✔    |       .
   20   |    ✔    |   * liveness probe timeout - unhealthy (2/3)
   21   |    ✔    |   
   22   |    ✔    |   
   23   |    ✔    |   
   24   |    ✔    |   
   25   |    ✔    |   * liveness probe issued
   26   |    ✔    |       .
   27   |    ✔    |       .
   28   |    ✔    |       .
   29   |    ✔    |       .
   30   |    ✘    |   * liveness probe timeout - unhealthy (3/3)
        |         |   * pod restarts

这是假设endpoint在正常关闭期间返回不正常的响应。由于您有5秒的超时时间,如果探测器只是超时,这将花费更长的时间,在发出活性探测请求和接收响应之间会有5秒的延迟。这种情况可能是容器在达到活性阈值之前实际死亡,并且您仍然可以看到原始行为

 类似资料:
  • 问题内容: 这是我的代码: 我的回应是: 500服务器错误 我打开我的变量,然后看到: POST / rest / platform / domain / list HTTP / 1.1 即使我曾经将其设置为GET,为什么也将其设置为POST ? 问题答案: 将请求方法隐式设置为POST,因为这是您要发送请求正文时的默认方法。 如果要使用GET,请删除该行并删除该行。您无需发送GET请求的请求正文

  • 这是一个经过验证的API,我试图调用...我可以看到,OPTIONs调用返回200 OK和适当的响应头,仍然火狐不发送API调用,完美地工作在Chrome...有什么想法吗 一些值隐藏在下面... 响应报头 HTTP/1.1 200 OK 访问-控制-允许-起源:* 访问-控制-允许-方法:[POST,GET] 访问-控制-允许-报头:授权 日期:周三,02 Oct 2013 20:52:02 G

  • 我已经为这个问题挣扎了几天了 我有一个android应用程序,用户可以指定植物浇水的频率。我想根据植物浇水的频率发送通知。(例如,每3天发送一次通知) 我知道有很多问题可以解决这个问题,但我仍然无法让它发挥作用 我从使用报警管理器实现/启动服务开始 主要问题是,一旦我的应用程序被终止,通知将不再发送 我正在使用firebase存储所有数据 我读了一些关于firebase函数的内容,但我不知道这是否

  • 我有一个关于Spring WebFlux的问题。我想创建一个使用内容类型text/event-stream的反应endpoint。不是生产而是消费。我们的一个服务需要向另一个服务发送大量的小对象,我们认为这样流式传输可能是一个很好的解决方案。 流量是每1秒产生一个值的流。我遇到的问题是,WebClient完全读取发布服务器,然后将数据作为一个整体发送,而不是一个接一个地流式传输。我能用这个客户机或

  • 问题内容: 尽管JVM会将SIGTERM和类似的信号转换为关闭挂钩,但是许多服务关闭脚本使用TCP端口启动关闭。(例如,Tomcat的关闭端口,Java Service Wrapper ,JBoss的管理接口等) 所以我认为不建议使用信号和关闭钩子来正常关闭Java服务,直到发现Play!框架通过关闭钩子管理服务生命周期,并且由生成的启动脚本假定将信号发送到JVM的PID。 我知道信号是与平台相关

  • 我正在从Bash脚本启动一个名为 的Java代码。Bash 脚本启动 Java 代码,然后运行 Java 代码。在Java程序结束时,我想发送一个信号回到Bash脚本以终止。请记住,Bash 脚本在 PID = 1 的情况下运行。我必须杀死PID 1过程。 我设置了bash脚本,使其在无限循环中运行,并< code >等待终止信号: 我正在使用Docker实例,信号是< code>sigterm。