preempt_disable导致的cyclictest测试延时超标

蒋嘉颖

2023-12-01

1，内核：4.19.90 preempt rt

架构：arm64

2，测试步骤和结果

1）cyclictest -p 90 -m -c 0 -i 1000 （v1.10）

2）ltpstress.sh -p -n -t 48

测试结果：cyclictest的最大延时超过10毫秒

3，ftrace log

1）方法

trace-cmd start -e all;cyclictest -p 90 -m -c 0 -i 1000 -b 1000 --tracemark

ltpstress.sh -p -n -t 48

2）日志

ksoftirq-9       0d..20   712.754481: sched_migrate_task:   comm=modprobe pid=4551 prio=120 orig_cpu=1 dest_cpu=0                      ------进程modprobe从cpu1迁移到cpu0
ksoftirq-9       0d..20   712.754481: sched_stat_runtime:   comm=ksoftirqd/0 pid=9 runtime=4292 [ns] vruntime=157448265788 [ns]
ksoftirq-9       0d..20   712.754483: sched_switch:         ksoftirqd/0:9 [120] S ==> modprobe:4551 [120]

modprobe-4551 0d..20 712.754486: irq_enable: caller=_raw_spin_unlock_irq+0x24 parent=finish_task_switch+0xb8

============关掉了抢占，也就是说，内核不能发生抢占了preempt_disable
modprobe-4551 0...10 712.754529: preempt_disable: caller=0xffff0001019900c8 parent=0xffff0001019900c8

modprobe-4551    0d..20   712.754529: irq_disable:          caller=rcu_irq_exit_irqson+0x30 parent=trace_preempt_off+0xf4
modprobe-4551    0d..20   712.754530: irq_enable:           caller=rcu_irq_exit_irqson+0x40 parent=trace_preempt_off+0xf4
modprobe-4551    0d..10   712.754589: irq_disable:          caller=el1_irq+0x7c parent=0xffff0001019900dc
modprobe-4551    0d.h10   712.754590: irq_handler_entry:    irq=2 name=arch_timer
modprobe-4551    0d.h20   712.754591: hrtimer_cancel:       hrtimer=0xffff8020f4983d98
modprobe-4551    0d.h10   712.754591: hrtimer_expire_entry: hrtimer=0xffff8020f4983d98 now=712720218245 function=hrtimer_wakeup/0x0
modprobe-4551    0d.h20   712.754592: sched_waking:         comm=cyclictest pid=2212 prio=9 target_cpu=000   ================下一个进程本来是要调度cyclictest，但没有发生抢占
modprobe-4551    0dNh30   712.754593: sched_wakeup:         cyclictest:2212 [9] success=1 CPU:000

modprobe-4551    0dNh10   712.754594: hrtimer_expire_exit: hrtimer=0xffff8020f4983d98
modprobe-4551    0dNh10   712.754594: irq_handler_exit:     irq=2 ret=handled
modprobe-4551    0dN.10   712.754595: irq_enable:           caller=el1_irq+0xe4 parent=0xffff0001019900dc ==========中断处理结束

modprobe-4551    0dN.10   712.756505: irq_disable:          caller=el1_irq+0x7c parent=0xffff0001019900dc      ============过了很久，又来了一个中断，irq=42 name=enp5s
modprobe-4551    0dNh10   712.756508: irq_handler_entry:    irq=42 name=enp5s0
modprobe-4551    0dNh10   712.756508: irq_handler_exit:     irq=42 ret=handled
modprobe-4551    0dNh20   712.756509: sched_waking:         comm=irq/42-enp5s0 pid=781 prio=49 target_cpu=000         =============== 唤醒了irq/42-enp5s0，但也没有抢占
modprobe-4551    0dNh30   712.756511: sched_wakeup:         irq/42-enp5s0:781 [49] success=1 CPU:000
modprobe-4551    0dN.10   712.756512: irq_enable:           caller=el1_irq+0xe4 parent=0xffff0001019900dc

。。。

modprobe-4551    0dN.10   712.924394: irq_disable:          caller=__schedule+0x84 parent=preempt_schedule_common+0xa8
modprobe-4551    0dN.10   712.924394: rcu_utilization:      Start context switch                       ================最终modprobe自己放弃了cpu，调度器找到了最高优先级的migration
modprobe-4551    0dN.10   712.924394: rcu_utilization:      End context switch
modprobe-4551    0d..20   712.924396: sched_switch:         modprobe:4551 [120] R ==> migration/0:17 [0]
migratio-17      0d..20   712.924397: irq_enable:           caller=_raw_spin_unlock_irq+0x24 parent=finish_task_switch+0xb8

3）分析

总结问题：

进程modprobe从cpu1迁移到cpu0
modprobe-4551 0...10 712.754529: preempt_disable: caller=0xffff0001019900c8 parent=0xffff0001019900c8，关掉了抢占，也就是说，内核不能发生抢占了
hrtimer时钟中断到来
下一个进程本来是要调度cyclictest，但没有发生抢占，也不能发生抢占
在此期间，又来了好几个中断

irq_handler_entry: irq=42 name=enp5s0

irq_handler_entry: irq=2 name=arch_timer

irq_handler_entry: irq=43 name=radeon

都没有发生任务切换

直到最后modprobe主动放弃了cpu

4）重现

内核提供了preemptirq_delay_test，里面提供了preempt_enable/disable的测试，可以很容易的重现出本例的问题：

modprobe preemptirq_delay_test test_mode=preempt delay=500000 cpu_affinity=3
cyclictest -p 90 -m -c 0 -i 1000 -a 3

preemptirq_delay_test在测试的时候有可能与cyclictest没有跑在同一个cpu上，那么问题就不会重新，我给preemptirq_delay_test增加了一个cpu_affinity的参数，可以指定运行的cpu。该patch已经被社区接受并合入主干。

4，代码

el1_irq执行完中断handler后要切换到下一个任务，代码如下：

el1_irq:
   kernel_entry 1
   enable_da_f
#ifdef CONFIG_TRACE_IRQFLAGS
   bl   trace_hardirqs_off
#endif

   irq_handler    ----------中断的handler

#ifdef CONFIG_PREEMPT
   ldr   w24, [tsk, #TSK_TI_PREEMPT]   // get preempt count 判断preempt count是不是为0，不是的话，就不能抢占，本例就是这个问题
   cbnz   w24, 2f               // preempt count != 0
   ldr   x0, [tsk, #TSK_TI_FLAGS]   // get flags
   tbnz   x0, #TIF_NEED_RESCHED, 1f   // needs rescheduling?   TIF_NEED_RESCHED这个标志也要置为1，否则也不能切换任务

   ldr   w24, [tsk, #TSK_TI_PREEMPT_LAZY] // get preempt lazy count
   cbnz   w24, 2f               // preempt lazy count != 0
   tbz   x0, #TIF_NEED_RESCHED_LAZY, 2f   // needs rescheduling?
1:
   bl   el1_preempt
2:
#endif
#ifdef CONFIG_TRACE_IRQFLAGS
   bl   trace_hardirqs_on
#endif
   kernel_exit 1
ENDPROC(el1_irq)

#ifdef CONFIG_PREEMPT
el1_preempt:
   mov   x24, lr
1:   bl   preempt_schedule_irq       // irq en/disable is done inside
   ldr   x0, [tsk, #TSK_TI_FLAGS]   // get new tasks TI_FLAGS
   tbnz   x0, #TIF_NEED_RESCHED, 1b   // needs rescheduling?
   tbnz   x0, #TIF_NEED_RESCHED_LAZY, 1b   // needs rescheduling?
   ret   x24
#endif

如果上述条件（preempt_count和TIF_NEED_RESCHED）都满足的话，系统会执行preempt_schedule_irq，真正的去切换进程的上下文。本例中，el1_preempt并没有被调用，也就是说，由于modprobe调用了preempt_disable，导致条件不满足。

PS：el1_preempt -- preempt_schedule_irq -- __schedule进行任务上下文切换

5，尝试修复

static enum hrtimer_restart hrtimer_wakeup(struct hrtimer *timer)
{
   struct hrtimer_sleeper *t =
       container_of(timer, struct hrtimer_sleeper, timer);
   struct task_struct *task = t->task;

   t->task = NULL;
   if (task)
       wake_up_process(task);
#if 0
   trace_printk("!!!current:%s, next:%s, count: %d\n",
               current->comm, task->comm, preempt_count());
   trace_printk("!!!current need reshed:%d, next need resched:%d\n",
               test_tsk_need_resched(current), test_tsk_need_resched(task));

   if(preempt_count())
       preempt_count_set(0);
   set_tsk_need_resched(task);
#endif
   return HRTIMER_NORESTART;
}

调用preempt_count_set(0);和set_tsk_need_resched(task);企图强制el1_irq调用el1_preempt，但是系统崩溃且没有输出信息。

6，求助社区

社区回复，设计如此"This works as designed. Running a latency sensitive application has some inherent rules. You need to stick to that rules, otherwise excessive latencies are to be expected. "

7，后续

我个人觉得，preempt rt方案提供了顶半部的线程化，sleeping spinlock，都是为了保证代码错误的行为的时候，系统依然能保证实时的确定性，在这一点上，preempt_disable/enable不能保证有效的容错性。

正确把panic信息打印出来看一看，有没有方式优化。

preempt_disable导致的cyclictest测试延时超标

相关阅读

相关文章

相关问答

相关文档