static void dump_softlock_debug(unsigned long data);
/* DEFINE_TIMER statically initializes the timer, so no separate init_timer() call is needed. */
DEFINE_TIMER(softlock_timer, dump_softlock_debug, 0, 0);
static void dump_softlock_debug(unsigned long data)
{
	int i, reboot = 0;	/* must start at 0, otherwise we may panic spuriously */
	u64 system[NR_CPUS], num_jifs;

	/* Time elapsed since the last heartbeat. */
	num_jifs = jiffies - beattime;
	/* System-mode CPU time each CPU has accumulated since the heartbeat. */
	for_each_possible_cpu(i) {
		system[i] = kcpustat_cpu(i).cpustat[CPUTIME_SYSTEM] - heartbeats[i];
	}
	for_each_possible_cpu(i) {
		/*
		 * If the elapsed time minus the system time is under 10 ms, the CPU
		 * spent nearly the whole interval in the kernel: something is wrong.
		 */
		if ((num_jifs - cputime_to_jiffies(system[i])) < msecs_to_jiffies(10)) {
			WARN(1, "cpu %d wedged\n", i);
			smp_call_function_single(i, smp_dumpstack, NULL, 1);
			reboot = 1;
		}
	}
	if (reboot) {
		panic_timeout = 10;
		trigger_all_cpu_backtrace();
		panic("Soft lock on CPUs\n");
	}
}
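Inside some tasklet function (the heartbeat):
{
	/* Record when the heartbeat ran and each CPU's system time at that moment. */
	beattime = jiffies;
	for_each_possible_cpu(i) {
		heartbeats[i] = kcpustat_cpu(i).cpustat[CPUTIME_SYSTEM];
	}
	/* Re-arm the check to fire SOFT_LOCK_TIME seconds from now. */
	mod_timer(&softlock_timer, jiffies + SOFT_LOCK_TIME * HZ);
}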
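The arithmetic behind the check can be tried outside the kernel. The following is a minimal userspace sketch, not the code above: the names cpu_looks_wedged, count_wedged and WEDGE_SLACK_JIFFIES are hypothetical, all times are plain jiffy counts, and HZ=1000 is assumed so that 10 jiffies equal 10 ms. Unlike the original comparison, it treats a system-time delta at or above the elapsed time as wedged instead of letting the unsigned subtraction wrap around.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical slack: assumes HZ=1000, so 10 jiffies == 10 ms. */
#define WEDGE_SLACK_JIFFIES 10

/*
 * Returns true when a CPU spent all but WEDGE_SLACK_JIFFIES of the
 * interval in system mode. Both arguments are in jiffies.
 */
static bool cpu_looks_wedged(uint64_t elapsed, uint64_t system_delta)
{
	/* Avoid unsigned wrap-around if accounting overshoots the interval. */
	if (system_delta >= elapsed)
		return true;
	return (elapsed - system_delta) < WEDGE_SLACK_JIFFIES;
}

/*
 * Counts wedged CPUs from two per-CPU snapshots of system time:
 * one taken at the heartbeat and one taken when the timer fires.
 */
static int count_wedged(const uint64_t *sys_at_beat, const uint64_t *sys_now,
			int ncpus, uint64_t elapsed)
{
	int i, wedged = 0;

	assert(sys_at_beat && sys_now);
	for (i = 0; i < ncpus; i++) {
		if (cpu_looks_wedged(elapsed, sys_now[i] - sys_at_beat[i]))
			wedged++;
	}
	return wedged;
}
```

With an interval of 5000 jiffies, a CPU that accumulated 4995 jiffies of system time is reported as wedged, while one that accumulated 500 is not.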
================================================
How to Deliberately Crash a System when Soft Lockup Occurs
Information
When the system experiences soft lockups, e.g.
BUG: soft lockup - CPU#1 stuck for 15s! [swapper:0] Pid: 0
one needs to generate a vmcore at the time of the soft lockup, which can then be used for further investigation of the issue.
Details
Starting with Red Hat Enterprise Linux 5.3, a vmcore dump can be generated automatically at the time of a soft lockup.
To set this up, first configure and test kdump.
Then make the system panic when a soft lockup occurs by running the command below (to keep the setting across reboots, also add kernel.softlockup_panic=1 to /etc/sysctl.conf).
# sysctl -w kernel.softlockup_panic=1
This should now result in the system deliberately crashing and generating a vmcore at the time of a soft-lockup.
Soft lockups are situations in which the kernel's scheduler subsystem has not been given a chance to perform its job for more than 10 seconds.
They can be caused by defects in the kernel, by hardware issues, or by extremely high workloads. The kernel includes code (in kernel/softlockup.c) to detect these situations and act on them.
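The detection idea can be illustrated with a small model: something that only runs when the scheduler is working refreshes a timestamp, and a periodic check compares the current time against that timestamp. This is a simplified userspace sketch under those assumptions, not the code in kernel/softlockup.c; the struct and function names are hypothetical and times are in whole seconds.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU watchdog state; times are in seconds. */
struct watchdog_state {
	unsigned long touch_ts;	/* last time the watchdog task got to run */
	unsigned long thresh;	/* allowed silence before reporting a lockup */
};

/* Called by the watchdog task each time the scheduler lets it run. */
static void watchdog_touch(struct watchdog_state *w, unsigned long now)
{
	assert(w);
	w->touch_ts = now;
}

/* Called from a periodic timer: true if the task has been silent too long. */
static bool softlockup_detected(const struct watchdog_state *w, unsigned long now)
{
	assert(w);
	return now - w->touch_ts > w->thresh;
}
```

With thresh set to 10, a check made 10 seconds after the last touch stays quiet, and a check made 11 seconds after it reports a lockup.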
Issue
End users may see CPU soft lockup messages in the log files under heavy load. These are informational messages indicating that a CPU did not respond to the soft lockup timer within the timer window (currently 10 seconds on Red Hat Enterprise Linux). On their own, they do not indicate a problem with the system.
Solution
The current upstream default for this soft lockup threshold is 60 seconds.
Raising kernel.softlockup_thresh from its default of 10 to 30 or above should suppress these messages.
# sysctl -w kernel.softlockup_thresh=30
OR
Add this line to /etc/sysctl.conf (takes effect on next reboot):
kernel.softlockup_thresh=30
OR
Change the value dynamically; this only affects the running system:
# echo 30 > /proc/sys/kernel/softlockup_thresh
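The same dynamic change can be made from a program by writing the number to the file. Below is a minimal sketch; write_sysctl_int is a hypothetical helper, not an existing API, and it takes the path as a parameter so it can be pointed at /proc/sys/kernel/softlockup_thresh (as root) or at an ordinary file for testing.

```c
#include <stdio.h>

/*
 * Hypothetical helper: writes `value` followed by a newline to `path`,
 * the programmatic equivalent of `echo value > path`.
 * Returns 0 on success, -1 if the file cannot be opened or written.
 */
static int write_sysctl_int(const char *path, int value)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fprintf(f, "%d\n", value) > 0;
	/* fclose flushes the buffer, so a failed write can also show up here. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

A call such as write_sysctl_int("/proc/sys/kernel/softlockup_thresh", 30) would need root privileges and, like the echo command, only changes the running system.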