Sanjeev Sharma Blog
Debug kernel panics
Debugging and Analysis of Kernel Panics and Kernel Oopses Using System.map:
There are several ways to debug a kernel: printing messages, using kernel symbols, or using a kernel debugger. This page describes some tricks and techniques for interpreting an Oops message and a kernel panic. Before going further, we should understand what a kernel Oops and a kernel panic are.
A kernel panic is the action an operating system takes when it detects an internal fatal error from which it cannot safely recover. The kernel forces a controlled system hang or reboot in response to a detected run-time malfunction, which is not necessarily an Oops. Some of the panic behaviour can be controlled through run-time sysctl settings, for example whether a detected hung task triggers a panic.
An Oops is reported when the kernel exception handler runs. This includes macros such as BUG(), which is defined as an invalid instruction. Each exception has a unique number. Some oopses are bad enough that the kernel decides to invoke panic and stop running immediately. In other words, an Oops is a kernel crash that is optionally followed by a panic.
When a running kernel hits an Oops, it prints a message such as ([ 67.994624] Internal error: Oops: 5 [#1] XXXXXXXXXXX) on the console. The Oops message contains the values of the CPU registers, the address of the instruction that faulted (the PC), the stack, and the name of the process that was executing. With this information you can begin to debug the specific problem in the kernel. Sometimes, however, the Oops message alone is insufficient.
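As a rough illustration of what the header line carries, here is a minimal Python sketch that splits it into fields. It assumes an ARM kernel using the short-descriptor fault status layout, where the number after "Oops:" is a hexadecimal fault status value; the two status names in the table are only the subset needed here, and the helper name parse_oops_header is made up for this example.

```python
import re

# Header line copied from the sample log in this post.
OOPS_LINE = "[ 67.994624] Internal error: Oops: 5 [#1] PREEMPT SMP ARM"

# Assumed subset of ARM short-descriptor fault status values (low 4 bits only;
# the extra status bit at position 10 is ignored in this sketch).
FAULT_STATUS = {
    0x5: "section translation fault",
    0x7: "page translation fault",
}

def parse_oops_header(line):
    """Split an ARM 'Internal error: Oops' line into its fields."""
    m = re.search(r"Oops: ([0-9a-fA-F]+) \[#(\d+)\]\s*(.*)", line)
    if m is None:
        raise ValueError("not an Oops header line")
    code = int(m.group(1), 16)
    return {
        "code": code,                                  # raw fault status value
        "count": int(m.group(2)),                      # how many oopses so far
        "flags": m.group(3).split(),                   # e.g. PREEMPT, SMP, ARM
        "status": FAULT_STATUS.get(code & 0xF, "unknown"),
        "is_write": bool(code & (1 << 11)),            # assumed write-not-read bit
    }

info = parse_oops_header(OOPS_LINE)
print(info)
```

For the sample line this reports code 5 as a read access that hit a section translation fault, which fits the "*pgd=00000000" line in the log further down.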
In Linux, the System.map file is a symbol table for the kernel. It is needed whenever you have to find the address of a symbol name, or the symbol name for an address, which makes it especially useful for debugging kernel panics and kernel oopses. When CONFIG_KALLSYMS is enabled, the kernel does the address-to-name translation itself, so tools like ksymoops are not required.
Note: Addresses in System.map may change from one build to the next, because a new System.map is generated for every build of the kernel. To debug a problem you must therefore have the System.map of the same kernel build on which the panic or Oops was reported.
Note: In the kernel backtraces in the logs, the kernel reports the nearest symbol at or below the address being analysed. Not all function symbols are available, because of inlining, static functions, and optimisation, so the reported function name is sometimes not the real location of the failure.
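The nearest-symbol lookup can be sketched in a few lines of Python. This is only an illustration of the idea, not the kernel's own kallsyms code: the add_range entry and the address 80049F3C come from the worked example in this post, while the two neighbouring symbols are hypothetical fillers.

```python
import bisect

# add_range is taken from this post; the other two entries are hypothetical.
SYSTEM_MAP = """\
80049ee0 T hypothetical_prev_func
80049f28 T add_range
80049f94 T hypothetical_next_func
"""

def load_symbols(text):
    """Return (address, name) pairs sorted by address."""
    symbols = []
    for line in text.splitlines():
        addr, _type, name = line.split()
        symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols

def nearest_symbol(symbols, address):
    """Find the last symbol whose address is <= address, plus the offset."""
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, address) - 1
    if i < 0:
        raise LookupError("address is below the first symbol")
    base, name = symbols[i]
    return name, address - base

symbols = load_symbols(SYSTEM_MAP)
name, offset = nearest_symbol(symbols, 0x80049F3C)
print(f"{name}+{offset:#x}")
```

If the faulting code had been inlined into add_range from another function, this lookup would still answer add_range, which is exactly the caveat in the note above.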
How to Debug Kernel Panics and Oopses:
[ 67.994406] Unable to handle kernel paging request at virtual address 02120bc4
[ 67.994495] pgd = 94240000
[ 67.994553] [02120bc4] *pgd=00000000
[ 67.994624] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
[ 67.994926] CPU: 0 Not tainted (3.8.13.23-XXXXXXXX #1)
[ 67.994996] PC is at add_range+0x14/0x6c
[ 67.995056] LR is at XXXXXXX+0x38/0x44
[ 67.995117] pc : [<80049F3C>] lr : [<8004a1ec>] psr: 20000013
[ 67.995117] sp : 9423fd90 ip : 9423fda8 fp : 9423fda4
[ 67.995176] r10: 00000000 r9 : 9423ff60 r8 : 8000da84
[ 67.995233] r7 : 000041fd r6 : 00000081 r5 : aa068088 r4 : aa068088
[ 67.995290] r3 : ac8ceb80 r2 : 021ab618 r1 : 00000000 r0 : 02120bc0
[ 67.995348] Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
[ 67.995406] Control: 10c5387d Table: 2424004a DAC: 00000015
[ 67.995462] Process cat (pid: 1352, stack limit = 0x9423e238)
[ 67.995518] Stack: (0x9423fd90 to 0x94240000)
[ 67.995577] fd80: aa068088 aa068088 9423fdb4 9423fda8
Here is a kernel backtrace in which the kernel crashes in the "add_range" function (strictly speaking, add_range is the nearest function symbol to the crash address). Let's analyze it step by step.
1. According to the backtrace, the crash occurs at the location below.
PC is at add_range+0x14/0x6c
2. Search for add_range in the System.map file and note the address of the symbol, i.e. 80049f28.
80049f28 T add_range
3. Substitute the address of the add_range symbol into "add_range+0x14": 80049f28 + 0x14 = 80049F3C.
4. "80049F3C" should match the PC address in the backtrace. It does, which means that the kernel build I am using is the same as the one on which the issue was reported (this also depends on identical .config settings). Let's move on to the next step.
5. Run objdump on vmlinux to get the disassembly. The objdump program and the vmlinux file are described below.
objdump: a program for displaying various information about object files. For instance, it can be used as a disassembler to view an executable in assembly form.
vmlinux: a statically linked executable file that contains the Linux kernel in one of the object file formats supported by Linux. The vmlinux file may be required for kernel debugging, symbol table generation, or other operations.
#objdump -D -S --show-raw-insn --prefix-addresses --line-numbers vmlinux > vmlinux.objdump
6. Find "add_range" in vmlinux.objdump and look for the PC address calculated above, i.e. 80049F3C.
80049F3C <add_range+0x14> e5903004 ldr r3, [r0, #4]
7. The crash point can now be identified.
ldr r3, [r0, #4] = r0+4 = 02120bc0+4 = 02120bc4 /* replace r0 with the r0 register value from the backtrace */
8. This is the same as the fault address:
Unable to handle kernel paging request at virtual address 02120bc4
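The arithmetic in the steps above can be double-checked with a short Python sketch. It only re-does the two additions on lines copied from the log and from System.map in this post; the variable names are made up for the illustration, and the regular expressions assume the ARM log layout shown above.

```python
import re

# Lines copied from the Oops log and the System.map entry shown in this post.
LOG = """\
[ 67.994406] Unable to handle kernel paging request at virtual address 02120bc4
[ 67.994996] PC is at add_range+0x14/0x6c
[ 67.995117] pc : [<80049F3C>] lr : [<8004a1ec>] psr: 20000013
[ 67.995348] r3 : ac8ceb80 r2 : 021ab618 r1 : 00000000 r0 : 02120bc0
"""
SYSTEM_MAP_LINE = "80049f28 T add_range"

def hexint(text):
    return int(text, 16)

# Steps 1-2: symbol and offset from the log, base address from System.map.
symbol, offset = re.search(r"PC is at (\w+)\+0x([0-9a-f]+)/", LOG).groups()
base, _type, map_symbol = SYSTEM_MAP_LINE.split()
assert map_symbol == symbol

# Steps 3-4: base + offset must equal the pc value printed in the log.
computed_pc = hexint(base) + hexint(offset)
logged_pc = hexint(re.search(r"pc : \[<([0-9a-fA-F]+)>\]", LOG).group(1))
map_matches_kernel = computed_pc == logged_pc

# Steps 7-8: "ldr r3, [r0, #4]" reads from r0 + 4.
registers = {name: hexint(value)
             for name, value in re.findall(r"\b(r\d+)\s*: ([0-9a-f]{8})", LOG)}
fault_address = hexint(re.search(r"virtual address ([0-9a-f]+)", LOG).group(1))
effective_address = registers["r0"] + 4

print(f"computed pc = {computed_pc:#010x}, matches log: {map_matches_kernel}")
print(f"r0 + 4      = {effective_address:#010x}, "
      f"matches fault address: {effective_address == fault_address}")
```

If the first check fails, the System.map does not belong to the kernel that produced the log, and the remaining steps would be misleading.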
Conclusion: r0 points to an invalid address. From the disassembly we found the instruction that dereferences r0; the remaining work is to find out why r0 points to an invalid address.
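The raw instruction word printed by objdump in step 6 (e5903004) can also be decoded by hand, which is a useful cross-check when the disassembly looks suspicious. The following Python sketch assumes the classic 32-bit ARM (A32) encoding of LDR/STR with a 12-bit immediate offset; the function names are invented for this example and it handles only that one encoding.

```python
def decode_ldr_str_immediate(word):
    """Decode an A32 single data transfer with a 12-bit immediate offset."""
    if (word >> 26) & 0b11 != 0b01 or (word >> 25) & 1:
        raise ValueError("not an immediate-offset LDR/STR encoding")
    return {
        "cond": (word >> 28) & 0xF,            # 0xE means "always"
        "pre_indexed": bool((word >> 24) & 1),
        "add_offset": bool((word >> 23) & 1),
        "byte": bool((word >> 22) & 1),
        "writeback": bool((word >> 21) & 1),
        "load": bool((word >> 20) & 1),
        "rn": (word >> 16) & 0xF,              # base register
        "rt": (word >> 12) & 0xF,              # register loaded or stored
        "imm12": word & 0xFFF,
    }

def render(fields):
    """Render the simple case only: word access, pre-indexed, no writeback."""
    op = "ldr" if fields["load"] else "str"
    sign = "" if fields["add_offset"] else "-"
    return f"{op} r{fields['rt']}, [r{fields['rn']}, #{sign}{fields['imm12']}]"

fields = decode_ldr_str_immediate(0xE5903004)
print(render(fields))
```

The decoded base register is r0 and the offset is 4, which matches the r0+4 calculation used to reach the fault address.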
A quick and easy way to find the line of code where your kernel panicked or oopsed is to use the GDB list command. You can do this as follows.
Let's assume your panic/oops message says something like:
[ 174.507084] Stack:
[ 174.507163] ce0bd8ac 00000008 00000000 ce4a7e90 c039ce30 ce0bd8ac c0718b04 c07185a0
[ 174.507380] ce4a7ea0 c0398f22 ce0bd8ac c0718b04 ce4a7eb0 c037deee ce0bd8e0 ce0bd8ac
[ 174.507597] ce4a7ec0 c037dfe0 c07185a0 ce0bd8ac ce4a7ed4 c037d353 ce0bd8ac ce0bd8ac
[ 174.507888] Call Trace:
[ 174.508125] [<c039ce30>] ? sd_remove+0x20/0x70
[ 174.508235] [<c0398f22>] ? scsi_bus_remove+0x32/0x40
[ 174.508326] [<c037deee>] ? __device_release_driver+0x3e/0x70
[ 174.508421] [<c037dfe0>] ? device_release_driver+0x20/0x40
[ 174.508514] [<c037d353>] ? bus_remove_device+0x73/0x90
[ 174.508606] [<c037bccf>] ? device_del+0xef/0x150
[ 174.508693] [<c0399207>] ? __scsi_remove_device+0x47/0x80
[ 174.508786] [<c0399262>] ? scsi_remove_device+0x22/0x40
[ 174.508877] [<c0399324>] ? __scsi_remove_target+0x94/0xd0
[ 174.508969] [<c03993c0>] ? __remove_child+0x0/0x20
[ 174.509060] [<c03993d7>] ? __remove_child+0x17/0x20
[ 174.509148] [<c037b868>] ? device_for_each_child+0x38/0x60
[ 174.509241] [<c039938f>] ? scsi_remove_target+0x2f/0x60
[ 174.509393] [<d0c38907>] ? __iscsi_unbind_session+0x77/0xa0 [scsi_transport_iscsi]
[ 174.509699] [<c015272e>] ? run_workqueue+0x6e/0x140
[ 174.509801] [<d0c38890>] ? __iscsi_unbind_session+0x0/0xa0 [scsi_transport_iscsi]
[ 174.509977] [<c0152888>] ? worker_thread+0x88/0xe0
[ 174.510047] [<c01566a0>] ? autoremove_wake_function+0x0/0x40
Let's say you want to know which line of code corresponds to sd_remove+0x20/0x70. cd to your kernel tree and run gdb on the ".o" file that contains the function sd_remove(), in this case sd.o (the object has to be built with debug information for gdb to know the source lines). Then use the gdb "list" command, (gdb) list *(function+0xoffset); here the function is sd_remove() and the offset is 0x20. gdb should tell you the line number where you hit the panic or oops. This works reliably in most cases.
#gdb sd.o
(gdb) list *(sd_remove+0x20)
0x1650 is in sd_remove (Kernel/linux-xxx/drivers/scsi/sd.c:2125).
2120 static int sd_remove(struct device *dev)
2121 {
2122 struct scsi_disk *sdkp;
2123
2124 async_synchronize_full();
2125 sdkp = dev_get_drvdata(dev);
2126 blk_queue_prep_rq(sdkp->device->request_queue, scsi_prep_fn);
2127 device_del(&sdkp->dev);
2128 del_gendisk(sdkp->disk);
2129 sd_shutdown(dev);
(gdb)
So dev_get_drvdata() is the function where the crash happened. Let's analyze why dev_get_drvdata(dev) is crashing.
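Turning a call trace entry into the matching gdb command is mechanical, so it can be scripted. Below is a minimal Python sketch that parses one entry in the "[<address>] ? symbol+0xoffset/0xsize" form used in the trace above. The function name parse_trace_entry is made up for this example, and the reading of "?" as "this frame may be unreliable" is my understanding of the trace format rather than something stated in this post.

```python
import re

# One entry from the call trace in this post.
TRACE_LINE = "[<c039ce30>] ? sd_remove+0x20/0x70"

def parse_trace_entry(line):
    """Split '[<addr>] ? symbol+0xoff/0xsize [module]' into its parts."""
    m = re.search(
        r"\[<([0-9a-f]+)>\]\s+(\?\s+)?(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
        r"(?:\s+\[(\w+)\])?",
        line,
    )
    if m is None:
        raise ValueError("not a call trace entry")
    address = int(m.group(1), 16)
    offset = int(m.group(4), 16)
    return {
        "address": address,
        "unreliable": m.group(2) is not None,   # assumed meaning of '?'
        "symbol": m.group(3),
        "offset": offset,
        "size": int(m.group(5), 16),
        "module": m.group(6),                   # None for built-in code
        "function_start": address - offset,
    }

entry = parse_trace_entry(TRACE_LINE)
gdb_command = f"list *({entry['symbol']}+{entry['offset']:#x})"
print(gdb_command)
print(f"function starts at {entry['function_start']:#x}")
```

The printed command is the same one typed into gdb above. For entries that carry a module name, such as the scsi_transport_iscsi lines, the object file to open would be that module's object instead of a built-in one.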
Disassembling the kernel
Cross tools are needed. The main utility used for this is objdump.
arm-none-linux-gnueabi-objdump -dr vmlinux /* If we have the object code handy, we can also disassemble an individual object file, e.g. objdump -S panic.o */
gdb on vmlinux
One can disassemble a built kernel using gdb on the vmlinux image. This is useful when one gets a kernel Oops message and a stack dump: one can then disassemble the object code and see where the Oops is occurring. For example:
#arm-none-linux-gnueabi-gdb --silent vmlinux
(gdb) disassemble printk
Dump of assembler code for function printk:
0xffffffff8023dce0 <printk+0>: sub $0xd8,%rsp
0xffffffff8023dce7 <printk+7>: lea 0xe0(%rsp),%rax
0xffffffff8023dcef <printk+15>: mov %rsi,0x28(%rsp)
0xffffffff8023dcf4 <printk+20>: mov %rsp,%rsi
0xffffffff8023dcf7 <printk+23>: mov %rdx,0x30(%rsp)
0xffffffff8023dcfc <printk+28>: mov %rcx,0x38(%rsp)
0xffffffff8023dd01 <printk+33>: mov %rax,0x8(%rsp)
0xffffffff8023dd06 <printk+38>: lea 0x20(%rsp),%rax
0xffffffff8023dd0b <printk+43>: mov %r8,0x40(%rsp)
0xffffffff8023dd10 <printk+48>: mov %r9,0x48(%rsp)
0xffffffff8023dd15 <printk+53>: movl $0x8,(%rsp)
0xffffffff8023dd1c <printk+60>: movl $0x30,0x4(%rsp)
0xffffffff8023dd24 <printk+68>: mov %rax,0x10(%rsp)
0xffffffff8023dd29 <printk+73>: callq 0xffffffff8023d980 <vprintk>
0xffffffff8023dd2e <printk+78>: add $0xd8,%rsp
0xffffffff8023dd35 <printk+85>: retq
End of assembler dump.
First of all, we should disassemble the kernel function, either with the objdump utility or with gdb on the vmlinux kernel image, as described in the section above. As an example, here is the disassembly of the add_range() kernel function, which I will use to demonstrate how this all works. The output will differ depending on how the compiler optimizes, but it should give an idea.
(gdb) disassemble add_range
Dump of assembler code for function add_range:
0x8004c4d8 <+0>: mov r12, sp
0x8004c4dc <+4>: push {r4, r5, r6, r7, r11, r12, lr, pc}
0x8004c4e0 <+8>: sub r11, r12, #4
0x8004c4e4 <+12>: ldrd r6, [r11, #4]
0x8004c4e8 <+16>: ldrd r4, [r11, #12]
0x8004c4ec <+20>: cmp r7, r5
0x8004c4f0 <+24>: cmpeq r6, r4
0x8004c4f4 <+28>: bcs 0x8004c510 <add_range+56>
0x8004c4f8 <+32>: cmp r2, r1
0x8004c4fc <+36>: lsllt r3, r2, #4
0x8004c500 <+40>: addlt r2, r2, #1
0x8004c504 <+44>: addlt r1, r0, r3
0x8004c508 <+48>: strdlt r6, [r0, r3]
0x8004c50c <+52>: strdlt r4, [r1, #8]
0x8004c510 <+56>: mov r0, r2
0x8004c514 <+60>: ldm sp, {r4, r5, r6, r7, r11, sp, pc}
End of assembler dump.
Corresponding kernel C function:
int add_range(struct range *range, int az, int nr_range, u64 start, u64 end)
{
    if (start >= end)
        return nr_range;

    /* Out of slots: */
    if (nr_range >= az)
        return nr_range;

    range[nr_range].start = start;
    range[nr_range].end = end;
    nr_range++;

    return nr_range;
}
Let's analyse the first three instructions, which are more or less common to all functions. Here r12 = IP (the intra-procedure-call scratch register) and r11 = FP (the frame pointer). The FP marks the current function's frame on the stack, which is how variables are tracked from function to function; please explore the basic frame layout for more detail. In simple words, SP is where the stack is and FP is where the stack was, much as PC is where execution is and LR is where it was.
0x8004c4d8 <+0>: mov r12, sp /*get a copy of sp*/
0x8004c4dc <+4>: push {r4, r5, r6, r7, r11, r12, lr, pc} /*Save the frame,link register,program counter and other Register on to the stack */
0x8004c4e0 <+8>: sub r11, r12, #4 /*Set the new frame pointer.*/
The next two instructions load the values located 4 bytes and 12 bytes above the frame pointer: the doubleword stored at r11+4 is loaded starting at r6, and the doubleword stored at r11+12 is loaded starting at r4.
Note: LDRD loads a doubleword, so the second word of each value is also loaded, into r7 and r5 respectively. This function deals with 64-bit arguments, and these 64-bit arguments are passed on the stack.
0x8004c4e4 <+12>: ldrd r6, [r11, #4]
0x8004c4e8 <+16>: ldrd r4, [r11, #12]
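To see where [r11, #4] and [r11, #12] point, the prologue can be replayed with plain arithmetic. This Python sketch models only the pointer updates of the three prologue instructions; the entry stack pointer value 0x1000 is an arbitrary assumption and any word-aligned address gives the same offsets.

```python
# Symbolic model of the prologue; the entry stack pointer value is arbitrary.
ENTRY_SP = 0x1000              # assumed example value
WORD = 4                       # bytes per 32-bit register

sp = ENTRY_SP
r12 = sp                       # mov r12, sp
saved = ["r4", "r5", "r6", "r7", "r11", "r12", "lr", "pc"]
sp -= WORD * len(saved)        # push {r4, r5, r6, r7, r11, r12, lr, pc}
r11 = r12 - 4                  # sub r11, r12, #4

start_addr = r11 + 4           # ldrd r6, [r11, #4]
end_addr = r11 + 12            # ldrd r4, [r11, #12]

print(f"start is read from entry sp + {start_addr - ENTRY_SP}")
print(f"end   is read from entry sp + {end_addr - ENTRY_SP}")
print(f"frame pointer sits {r11 - sp} bytes above the new sp")
```

Both loads land at or above the stack pointer the caller had at the time of the call, which is where stack-passed arguments live. That is consistent with the two 64-bit arguments being read from the stack rather than from registers.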
Note: The first four registers r0-r3 are used to pass argument values into a subroutine, and r0 is used to return a result value from a function. So r0 = range, r1 = az, r2 = nr_range. The two 64-bit arguments, start and end, are not passed in registers here: they are passed on the stack and loaded from there into r6/r7 and r4/r5, as the two ldrd instructions above show.
The next instructions can easily be mapped to the C code.
Note: The mapping here differs somewhat from the usual C-to-assembly mapping, because 64-bit values (u64 start and u64 end) are passed as function arguments. To deal with 64-bit data, each value has to be held in a register pair, and it is retrieved from the stack with the ldrd instruction, addressed through the frame pointer.
0x8004c4ec <+20>: cmp r7, r5 /* compare r7 and r5, which hold the upper 32 bits of start and end loaded from the stack */
0x8004c4f0 <+24>: cmpeq r6, r4 /* executed only if the previous cmp r7, r5 found the values equal (r7 == r5); compares the lower 32 bits */
0x8004c4f4 <+28>: bcs 0x8004c510 <add_range+56> /* branch to the return when start >= end (unsigned comparison) */
0x8004c4f8 <+32>: cmp r2, r1 /* compare r2 and r1, which hold the arguments nr_range and az */
Corresponding C code is
if (start >= end)
return nr_range;
/* Out of slots: */
if (nr_range >= az)
return nr_range;
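The way "start >= end" is evaluated on two register pairs can be modelled in Python. This is a sketch of the flag logic only, under the assumption that the comparison is unsigned and that the upper halves are compared first, as in the cmp / cmpeq / bcs sequence above; the function name is made up for the illustration.

```python
def u64_ge_via_register_pairs(a, b):
    """Mimic 'cmp hi_a, hi_b; cmpeq lo_a, lo_b; bcs' on unsigned 64-bit values."""
    mask = 0xFFFFFFFF
    lo_a, hi_a = a & mask, (a >> 32) & mask
    lo_b, hi_b = b & mask, (b >> 32) & mask

    # cmp r7, r5: carry is set when hi_a >= hi_b (unsigned), zero when equal.
    carry, zero = hi_a >= hi_b, hi_a == hi_b
    # cmpeq r6, r4: runs only if the high words were equal, and replaces the flags.
    if zero:
        carry = lo_a >= lo_b
    # bcs: the branch (early return in the C code) is taken when carry is set.
    return carry

samples = [
    (0x0000000100000000, 0x00000000FFFFFFFF),
    (0x00000000FFFFFFFF, 0x0000000100000000),
    (0x0000000200000005, 0x0000000200000005),
    (0x0000000200000004, 0x0000000200000005),
]
for a, b in samples:
    print(f"{a:#018x} >= {b:#018x}: {u64_ge_via_register_pairs(a, b)}")
```

The first two samples show why the upper words have to be compared first: comparing only the lower words would give the opposite answer.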
Let's move on to the next instructions.
0x8004c4fc <+36>: lsllt r3, r2, #4
0x8004c500 <+40>: addlt r2, r2, #1
0x8004c504 <+44>: addlt r1, r0, r3
0x8004c508 <+48>: strdlt r6, [r0, r3]
0x8004c50c <+52>: strdlt r4, [r1, #8]
Corresponding C code is
range[nr_range].start = start;
range[nr_range].end = end;
nr_range++;
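The shift by 4 and the extra offset of 8 in these instructions follow from the layout of struct range. As a hedged cross-check, the Python sketch below mirrors the structure with ctypes, assuming it consists of exactly two u64 fields as the C code suggests, and replays the address arithmetic with an arbitrary base address.

```python
import ctypes

class Range(ctypes.Structure):
    """User-space mirror of struct range, assumed to be two u64 fields."""
    _fields_ = [("start", ctypes.c_uint64), ("end", ctypes.c_uint64)]

def element_addresses(base, nr_range):
    """Addresses written by the two strd instructions for range[nr_range]."""
    r3 = nr_range << 4             # lsllt r3, r2, #4  (nr_range * 16)
    r1 = base + r3                 # addlt r1, r0, r3
    return base + r3, r1 + 8       # strdlt r6, [r0, r3] / strdlt r4, [r1, #8]

BASE = 0x1000                      # arbitrary example address of range[0]
start_addr, end_addr = element_addresses(BASE, 3)
print(ctypes.sizeof(Range), Range.end.offset)
print(hex(start_addr), hex(end_addr))
```

The structure size of 16 bytes explains the left shift by 4, and the offset of the end field explains the "#8" in the second store.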
0x8004c510 <+56>: mov r0, r2 /* move the content of r2 (nr_range) into r0, the register that holds the return value of the function */
0x8004c514 <+60>: ldm sp, {r4, r5, r6, r7, r11, sp, pc} /* LDM loads multiple registers; here it works like a stack pop that restores the saved registers and returns by loading pc */
Corresponding C code is
return nr_range;
Here are the ARM register definitions for your reference. Please keep these registers in mind when you are mapping a C function to ARM registers.
Note: In addition to the techniques above, don't forget to visit the Tour of ARM Assembly (http://www.coranac.com/tonc/text/asm.htm), which will help you understand the following topics in deeper detail. After going through it, I bet you will be able to produce some nice ARM assembly, or at least be able to read it well enough.
General assembly
The ARM instruction set
References:
Procedure Call Standard for the ARM® Architecture
ARM Procedure Call Standard
Arm Instruction set manual
Tagged Debug kernel panics, Disassembling the kernel, Interpret Assembly Language, Kernel oopses
Jun23