soft lockup

梁丘逸仙

2023-12-01

static void dump_softlock_debug(unsigned long data);

DEFINE_TIMER(softlock_timer, dump_softlock_debug, 0, 0);

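/* DEFINE_TIMER() above already initializes the timer statically, so this call is redundant; if kept, it belongs inside an init function, not at file scope. */
init_timer(&softlock_timer);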

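static void dump_softlock_debug(unsigned long data)
{
    int i, reboot = 0;  /* must start at 0; only set when a CPU looks wedged */
    u64 system[NR_CPUS], num_jifs;

    num_jifs = jiffies - beattime;  /* jiffies elapsed since the last heartbeat */
    for_each_possible_cpu(i) {
        /* system time this CPU accumulated since the last heartbeat */
        system[i] = kcpustat_cpu(i).cpustat[CPUTIME_SYSTEM] - heartbeats[i];
    }

    for_each_possible_cpu(i) {
        /* If elapsed time minus system time is under 10 ms, the CPU spent
         * almost the whole interval in system mode: treat it as wedged. */
        if ((num_jifs - cputime_to_jiffies(system[i])) < msecs_to_jiffies(10)) {
            WARN(1, "cpu %d wedged\n", i);
            smp_call_function_single(i, smp_dumpstack, NULL, 1);
            reboot = 1;
        }
    }

    if (reboot) {
        panic_timeout = 10;  /* reboot 10 seconds after the panic */
        trigger_all_cpu_backtrace();
        panic("Soft lock on CPUs\n");
    }
}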

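Inside some tasklet function:

{
    int i;

    beattime = jiffies;  /* record when this heartbeat was taken */

    for_each_possible_cpu(i) {
        /* snapshot each CPU's accumulated system time */
        heartbeats[i] = kcpustat_cpu(i).cpustat[CPUTIME_SYSTEM];
    }

    /* re-arm the check to fire SOFT_LOCK_TIME seconds from now */
    mod_timer(&softlock_timer, jiffies + SOFT_LOCK_TIME * HZ);
}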
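The arithmetic behind the check in dump_softlock_debug() can be tried out in userspace. The sketch below is only a model, not kernel code: find_wedged_cpus() and its parameters are hypothetical names, all times are plain tick counts, and the subtraction is clamped at zero, which the kernel snippet above does not do.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical userspace model of the wedge test.
 * A CPU counts as wedged when the time it did NOT spend in system mode
 * during the window is below slack_ticks. Sets wedged[i] to 1 or 0 for
 * every CPU and returns how many CPUs were flagged.
 */
size_t find_wedged_cpus(uint64_t elapsed_ticks,
                        const uint64_t *sys_before,
                        const uint64_t *sys_now,
                        size_t ncpus,
                        uint64_t slack_ticks,
                        int *wedged)
{
    size_t count = 0;

    for (size_t i = 0; i < ncpus; i++) {
        /* system time accumulated since the heartbeat snapshot */
        uint64_t sys_delta = sys_now[i] - sys_before[i];
        /* clamp so accounting jitter cannot make the subtraction wrap */
        uint64_t other = sys_delta >= elapsed_ticks
                             ? 0
                             : elapsed_ticks - sys_delta;

        wedged[i] = other < slack_ticks;
        if (wedged[i])
            count++;
    }
    return count;
}
```

With a window of 1000 ticks and a slack of 10, a CPU that accumulated 995 ticks of system time is flagged, while one that accumulated 200 is not.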

================================================

How to Deliberately Crash a System when Soft Lockup Occurs

Information

When the system experiences soft lockups, e.g. "BUG: soft lockup - CPU#1 stuck for 15s! [swapper:0] Pid: 0", one needs to generate a vmcore at the time of the soft lockup so that the issue can be investigated further.

Details

Starting from Red Hat Enterprise Linux 5.3, it is now possible to have the vmcore dump generated automatically at the time of a soft-lockup.

To implement this, first set up and test kdump.

Then run the command below to make the kernel panic when a soft lockup occurs. This changes the running system only; to keep the setting across reboots, also add kernel.softlockup_panic=1 to /etc/sysctl.conf.

# sysctl -w kernel.softlockup_panic=1

This should now result in the system deliberately crashing and generating a vmcore at the time of a soft-lockup.
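To confirm that the setting is active, the value can be read back from /proc/sys/kernel/softlockup_panic, the procfs file that corresponds to the sysctl name. The helper below is a minimal sketch of that read-back; read_sysctl_long() is a hypothetical name, not part of any system library.

```c
#include <stdio.h>

/*
 * Hypothetical helper: reads one integer from a sysctl-style file such as
 * /proc/sys/kernel/softlockup_panic.
 * Returns 0 on success and stores the number in *value; returns -1 if the
 * file cannot be opened or does not start with an integer.
 */
int read_sysctl_long(const char *path, long *value)
{
    FILE *fp = fopen(path, "r");
    int ok;

    if (fp == NULL)
        return -1;
    ok = (fscanf(fp, "%ld", value) == 1);
    fclose(fp);
    return ok ? 0 : -1;
}
```

Calling read_sysctl_long("/proc/sys/kernel/softlockup_panic", &v) on a kernel that exposes this sysctl should leave 1 in v once the command above has been run; on kernels without it the call returns -1.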

Soft lockups are situations in which the kernel's scheduler subsystem has not been given a chance to perform its job for more than 10 seconds.

They can be caused by defects in the kernel, by hardware issues or by extremely high workloads. The kernel includes code (in kernel/softlockup.c) to detect these situations and take action on them.
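The detection idea described above can be modeled in a few lines. This is a deliberately simplified sketch and not the code in kernel/softlockup.c: the struct and function names are hypothetical, time is in whole seconds, and the model assumes that a low-priority watchdog task refreshes a timestamp whenever the scheduler runs it while the timer tick compares that timestamp against the threshold.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU state for the model. */
struct softlockup_state {
    uint64_t touch_ts;  /* seconds; last time the watchdog task ran */
    uint64_t thresh;    /* seconds allowed between two updates */
};

/* Modeled watchdog task: runs only when the scheduler gets to schedule it. */
void softlockup_touch(struct softlockup_state *s, uint64_t now)
{
    s->touch_ts = now;
}

/* Modeled timer tick: true once the timestamp is older than the threshold. */
bool softlockup_detected(const struct softlockup_state *s, uint64_t now)
{
    return now > s->touch_ts + s->thresh;
}
```

With a threshold of 10 and a last touch at second 100, the model reports nothing at second 110 and reports a lockup from second 111 onward, until the watchdog task runs again.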

Issue

End users may see CPU soft lockup messages in the log files under heavy load. These are informational messages indicating that a CPU did not respond to the soft lockup timer within the timer window (currently 10 seconds on Red Hat Enterprise Linux). When they appear only under heavy load, they do not by themselves indicate a problem with the system.

Solution

The current upstream setting for this soft lockup timer parameter is 60 seconds.

Raising kernel.softlockup_thresh from its default of 10 to 30 or above gets rid of these messages:

# sysctl -w kernel.softlockup_thresh=30

Add this line to /etc/sysctl.conf (takes effect on next reboot):

kernel.softlockup_thresh=30

Change value dynamically; only affects the system's current value:

echo 30 > /proc/sys/kernel/softlockup_thresh
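Because the persistent and the dynamic settings live in different places, it can help to check what /etc/sysctl.conf actually assigns. The function below is a small sketch of such a check over sysctl.conf-style text; sysctl_conf_lookup() is a hypothetical helper, and it assumes the usual "key = value" syntax with "#" or ";" starting a comment line.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: scans sysctl.conf-style text for "key = value" and
 * stores the last integer assigned to key in *value.
 * Returns 1 if an assignment was found, 0 otherwise.
 */
int sysctl_conf_lookup(const char *text, const char *key, long *value)
{
    size_t keylen = strlen(key);
    const char *line = text;
    int found = 0;

    while (*line != '\0') {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);
        const char *stop = line + len;
        const char *p = line;

        while (p < stop && isspace((unsigned char)*p))
            p++;
        /* skip comment lines, then require the exact key followed by '=' */
        if (p < stop && *p != '#' && *p != ';' &&
            (size_t)(stop - p) > keylen && strncmp(p, key, keylen) == 0) {
            p += keylen;
            while (p < stop && isspace((unsigned char)*p))
                p++;
            if (p < stop && *p == '=') {
                char *numend;
                long v = strtol(p + 1, &numend, 10);

                /* accept only a number that lies on this same line */
                if (numend != p + 1 && numend <= stop) {
                    *value = v;
                    found = 1;
                }
            }
        }
        line = end ? end + 1 : stop;
    }
    return found;
}
```

Given text containing the line kernel.softlockup_thresh=30, a lookup for kernel.softlockup_thresh stores 30; a commented-out line is ignored.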
