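0x00 Preface
Original article: https://miasm.re/blog/2016/01/27/re150.html
GitHub: https://github.com/cea-sec/miasm
reverseMe: http://www.grehack.fr/data/grehack2015/re/Grehack%202015%20-%20Reverse%20-%20150.zip
This post is mainly a detailed walkthrough of the article above. Miasm has since gone through major version changes, so the original scripts have problems with the current version and need some modifications.
0x01 Introduction (copied directly from the official description)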
What is Miasm?
Miasm is a free and open source (GPLv2) reverse engineering framework. Miasm aims to analyze / modify / generate binary programs. Here is a non exhaustive list of features:
- Opening / modifying / generating PE / ELF 32 / 64 LE / BE
- Assembling / Disassembling X86 / ARM / MIPS / SH4 / MSP430
- Representing assembly semantic using intermediate language
- Emulating using JIT (dynamic code analysis, unpacking, ...)
- Expression simplification for automatic de-obfuscation
- ...
See the official blog for more examples and demos.
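0x02 Emulating the program in a sandbox and generating the CFG
from miasm.analysis.sandbox import Sandbox_Linux_x86_32
# Reuse the sandbox's standard command-line parser and add the target file
parser = Sandbox_Linux_x86_32.parser(description='ELF sandbox')
parser.add_argument('filename', help='ELF Filename')
options = parser.parse_args()
# Load the ELF into an emulated Linux x86-32 environment and run it
sb = Sandbox_Linux_x86_32(options.filename, options, globals())
sb.run()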
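There is not much to say about this code: it simply runs the ELF file given on the command line. Save it as sandbox.py and run it with python sandbox.py -b reverseMe (-b makes the sandbox log each disassembled block, which produces the output below).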
-> c_to:loc_804b1f0
loc_804b1f0
MOV EBX, 0x1
MOV EAX, 0x4
INT 0x80
-> c_next:loc_804b1fc
Traceback (most recent call last):
File "Sandbox.py", line 22, in <module>
sb.run()
File "/usr/local/lib/python2.7/dist-packages/miasm/analysis/sandbox.py", line 617, in run
super(Sandbox_Linux_x86_32, self).run(addr)
File "/usr/local/lib/python2.7/dist-packages/miasm/analysis/sandbox.py", line 133, in run
self.jitter.continue_run()
File "/usr/local/lib/python2.7/dist-packages/miasm/jitter/jitload.py", line 405, in continue_run
return next(self.run_iterator)
File "/usr/local/lib/python2.7/dist-packages/miasm/jitter/jitload.py", line 373, in runiter_once
assert(self.get_exception() == 0)
AssertionError
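The emulation crashes at the end. The error is the AssertionError raised by assert(self.get_exception() == 0): the exception triggered by INT 0x80 (the Linux system call interrupt) is not handled by the sandbox. If we open the program in IDA, we see that much of the code is encrypted: the code section is self-modifying (SMC) and decrypts itself at run time. Reaching INT 0x80 means execution has arrived at newly decrypted code rather than still-encrypted bytes, so we can now dump the memory, disassemble it and generate a CFG (control flow graph). Run python -i Sandbox.py -b reverseMe so that an interactive shell starts once the script ends.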
# Using -i, we obtain an interactive Python shell at the end of the script
$ python -i sandbox_elf_x86_32.py -b reverseMe
...
AssertionError
# sb is our Sandbox instance
# sb.jitter is the emulator instance
# sb.jitter.vm is the emulator virtual memory
>>> sb.jitter.vm
# Emulated memory. Columns are: address, size, memory permissions
# 0x130000 -> stack
ad 0x130000 size 0x10000 RW_
# 0x8048000 -> mapped binary
ad 0x8048000 size 0x4000 RW_
# Dump the binary (the ELF structure is conserved because in this specific
# binary, memory addresses are directly linked with file offset)
>>> open("dump.bin", "wb").write(sb.jitter.vm.get_mem(0x8048000, 0x4000))
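The comment above notes that the dump is still a valid ELF because memory addresses map directly to file offsets. The sketch below shows that mapping for a little-endian ELF32 file: a virtual address inside a PT_LOAD segment lives at p_offset + (vaddr - p_vaddr) in the file, so when the segment starts at file offset 0 and is loaded at 0x8048000, dumping 0x4000 bytes from 0x8048000 reproduces the file layout. make_toy_elf, load_segments and vaddr_to_offset are hypothetical helpers, not Miasm API, and the toy image (one PT_LOAD segment, file offset 0 at 0x8048000) is an assumption mirroring the mapping listed above.

```python
import struct

PT_LOAD = 1


def load_segments(elf):
    """Return (p_offset, p_vaddr, p_filesz) for each PT_LOAD segment of a
    little-endian ELF32 image (sketch: only the magic and class are checked)."""
    if elf[:4] != b"\x7fELF" or elf[4] != 1:
        raise ValueError("expected an ELF32 image")
    # ELF32 header: e_phoff at 0x1C, e_phentsize and e_phnum at 0x2A/0x2C
    (e_phoff,) = struct.unpack_from("<I", elf, 0x1C)
    e_phentsize, e_phnum = struct.unpack_from("<HH", elf, 0x2A)
    segments = []
    for i in range(e_phnum):
        # First five fields of Elf32_Phdr: type, offset, vaddr, paddr, filesz
        p_type, p_offset, p_vaddr, _p_paddr, p_filesz = struct.unpack_from(
            "<5I", elf, e_phoff + i * e_phentsize)
        if p_type == PT_LOAD:
            segments.append((p_offset, p_vaddr, p_filesz))
    return segments


def vaddr_to_offset(elf, vaddr):
    """Map a virtual address to its file offset, or None if not file-backed."""
    for p_offset, p_vaddr, p_filesz in load_segments(elf):
        if p_vaddr <= vaddr < p_vaddr + p_filesz:
            return p_offset + (vaddr - p_vaddr)
    return None


def make_toy_elf(base=0x8048000, size=0x4000):
    """Hypothetical minimal image: header + one PT_LOAD mapping offset 0 at base."""
    ehdr = bytearray(0x34)
    ehdr[:7] = b"\x7fELF\x01\x01\x01"           # ELF32, little-endian, version 1
    struct.pack_into("<I", ehdr, 0x1C, 0x34)    # e_phoff: right after the header
    struct.pack_into("<HH", ehdr, 0x2A, 32, 1)  # e_phentsize, e_phnum
    phdr = struct.pack("<8I", PT_LOAD, 0, base, base, size, size, 5, 0x1000)
    return bytes(ehdr) + phdr + bytes(size - len(ehdr) - len(phdr))


elf = make_toy_elf()
print(hex(vaddr_to_offset(elf, 0x804b1f0)))  # block seen in the sandbox output
```

With a non-zero p_offset or several segments the dump would no longer line up with the file, which is why the comment stresses that this holds for this specific binary.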
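The code section in the dumped ELF file is now decrypted, but only the code on the path previously executed in the sandbox. We can generate its CFG with Miasm using the following script:
from miasm.analysis.binary import Container
from miasm.analysis.machine import Machine
# Open the dump in binary mode and let Miasm detect the architecture
cont = Container.from_stream(open('dump.bin', 'rb'))
bin_stream = cont.bin_stream
machine = Machine(cont.arch)
mdis = machine.dis_engine(bin_stream)
# Recursively disassemble from the entry point
blocks = mdis.dis_multiblock(cont.entry_point)
open('cfg.dot', 'w').write(blocks.dot())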
This produces the flow graph cfg.dot. On Linux it can be viewed directly with xdot; on Windows, use Graphviz. The image is too long to include here, so here is a link instead: https://miasm.re/blog/_images/re150_graph_execflow_2.svg
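blocks.dot() returns the graph as Graphviz DOT text. As a rough illustration of that format only (Miasm's real output also renders each block's instructions inside its node), the sketch below turns a hypothetical adjacency dict of block labels into DOT, labelling edges with the c_next/c_to constraints seen in the sandbox output. cfg_to_dot and the label loc_804b1e8 are hypothetical, not Miasm API or data from the binary.

```python
def cfg_to_dot(cfg):
    """Render {label: [(successor, constraint), ...]} as a DOT digraph."""
    lines = ["digraph cfg {"]
    for src, edges in cfg.items():
        lines.append('    "%s";' % src)
        for dst, constraint in edges:
            # The edge label records the kind of transition (fall-through, jump, ...)
            lines.append('    "%s" -> "%s" [label="%s"];' % (src, dst, constraint))
    lines.append("}")
    return "\n".join(lines)


toy_cfg = {
    "loc_804b1e8": [("loc_804b1f0", "c_to")],
    "loc_804b1f0": [("loc_804b1fc", "c_next")],
    "loc_804b1fc": [],
}
print(cfg_to_dot(toy_cfg))
```

Saved to a .dot file, this text can be opened with xdot or rendered with dot -Tsvg cfg.dot -o cfg.svg.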
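0x03 CFG pattern matching
A close look at the graph reveals a problem: the same decryption pattern is repeated many times, essentially junk code inserted for obfuscation. All occurrences share a common shape: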
# Pattern we want to match:
# |
# +----v----+
# | (dad) |
# | PUSH |
# | MOV |
# +----+----+
# |
# +----v----+
# | (middle)|<---+
# +----+--+-+ |
# | +------+
# +----v----+
# | (end) |
# +----+----+
# |
# v
Now let's match this pattern in the flow graph:
# First, generate the flow graph as before
from miasm.analysis.binary import Container
from miasm.analysis.machine import Machine
cont = Container.from_stream(open('dump.bin', 'rb'))
bin_stream = cont.bin_stream
machine = Machine(cont.arch)
mdis = machine.dis_engine(bin_stream)
blocks = mdis.dis_multiblock(cont.entry_point)
# Then create one MatchGraphJoker per block of the pattern and chain them into a MatchGraph
from miasm.core.graph import MatchGraphJoker
def dad_filter(graph, node):
    # Graph nodes are loc_keys; fetch the block to inspect its instructions
    block = graph.loc_key_to_block(node)
    return (block is not None and
            len(block.lines) == 2 and
            block.lines[0].name == 'PUSH' and
            block.lines[1].name == 'MOV')
dad = MatchGraphJoker(name='dad', restrict_in=False, filt=dad_filter)
middle = MatchGraphJoker(name='middle')
end = MatchGraphJoker(name='end', restrict_out=False)
# middle >> middle adds the self-loop on the middle block shown in the diagram
matcher = dad >> middle >> middle >> end
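The first half of the code was explained earlier, so let's focus on the second half. dad_filter is the condition the dad block must satisfy (exactly two instructions, PUSH then MOV), and dad is created from it with MatchGraphJoker; middle and end have no constraints. By default a matched node must have the same number of predecessors and successors as its joker in the pattern: restrict_in=False lets dad have incoming edges from nodes outside the pattern, and restrict_out=False lets end have outgoing edges to nodes outside the pattern. Finally the MatchGraphJokers are chained with the >> operator to form a MatchGraph.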
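Chaining with >> reads left to right: each >> adds an edge from the previous joker to the next one, so repeating a joker, as in middle >> middle, produces the self-loop drawn on middle in the diagram. The toy classes below (Joker and Pattern are hypothetical, not Miasm's implementation) show how Python operator overloading can build such an edge list.

```python
class Joker:
    """Hypothetical stand-in for a pattern node (not Miasm's MatchGraphJoker)."""

    def __init__(self, name):
        self.name = name

    def __rshift__(self, other):
        # a >> b starts a pattern with the edge a -> b
        return Pattern([(self, other)], last=other)


class Pattern:
    """Edge list built by chaining >>; `last` is where the next >> starts from."""

    def __init__(self, edges, last):
        self.edges = edges
        self.last = last

    def __rshift__(self, other):
        # (... >> a) >> b adds the edge a -> b
        return Pattern(self.edges + [(self.last, other)], last=other)


dad, middle, end = Joker("dad"), Joker("middle"), Joker("end")
pattern = dad >> middle >> middle >> end
print([(a.name, b.name) for a, b in pattern.edges])
```

Because Python evaluates >> left to right, the expression is ((dad >> middle) >> middle) >> end, which yields the edges dad -> middle, middle -> middle and middle -> end.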
We can then match this pattern in blocks:
for sol in matcher.match(blocks):
    # sol[dad] is a loc_key; print the corresponding block
    print(blocks.loc_key_to_block(sol[dad]))
loc_00000000080496DE:0x080496de
PUSH EDX
MOV EDX, 0x8048AC0
-> c_next:loc_00000000080496E4:0x080496e4
loc_00000000080482F2:0x080482f2
PUSH EDX
MOV EDX, 0x804A9CE
-> c_next:loc_00000000080482F8:0x080482f8
loc_00000000080496C5:0x080496c5
PUSH EDX
MOV EDX, 0x8049DCB
-> c_next:loc_00000000080496CB:0x080496cb
loc_0000000008049981:0x08049981
PUSH EDX
MOV EDX, 0x8048DAE
-> c_next:loc_0000000008049987:0x08049987
loc_000000000804881F:0x0804881f
PUSH EDX
MOV EDX, 0x8049614
-> c_next:loc_0000000008048825:0x08048825
loc_0000000008049C88:0x08049c88
PUSH EDX
MOV EDX, 0x80480CA
-> c_next:loc_0000000008049C8E:0x08049c8e
loc_0000000008048E78:0x08048e78
PUSH EDX
MOV EDX, 0x8048A5C
-> c_next:loc_0000000008048E7E:0x08048e7e
...
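MatchGraph.match() returns a generator. Each element is a dict whose keys are the MatchGraphJokers and whose values are the matched graph nodes; in this version of Miasm the nodes are loc_keys, and blocks.loc_key_to_block() returns the actual block.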
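To make the shape of these solutions concrete, here is a hand-written matcher for this one pattern on a plain Python graph. It yields dicts like match() does, except that the keys are the role names 'dad', 'middle' and 'end' instead of MatchGraphJoker objects. The default restrictions are approximated with exact predecessor/successor checks; match_pattern and the toy graph are hypothetical and are not Miasm's general matching algorithm.

```python
def predecessors(succ):
    """Invert {node: set(successors)} into {node: set(predecessors)}."""
    pred = {n: set() for n in succ}
    for n, targets in succ.items():
        for t in targets:
            pred.setdefault(t, set()).add(n)
    return pred


def match_pattern(succ, is_dad):
    """Yield {'dad': ..., 'middle': ..., 'end': ...} for each dad -> middle
    (self-loop) -> end chain. Only dad's predecessors and end's successors may
    lie outside the pattern (like restrict_in/restrict_out=False)."""
    pred = predecessors(succ)
    for dad in succ:
        if not is_dad(dad) or len(succ[dad]) != 1:
            continue
        (middle,) = succ[dad]
        mid_succ = succ.get(middle, set())
        # middle must loop on itself and have exactly one other successor
        if middle == dad or middle not in mid_succ or len(mid_succ) != 2:
            continue
        if pred[middle] != {dad, middle}:
            continue
        (end,) = mid_succ - {middle}
        if end != dad and pred[end] == {middle}:
            yield {"dad": dad, "middle": middle, "end": end}


# Toy CFG: node -> set of successors, plus each block's mnemonics
toy_succ = {"A": {"B"}, "B": {"C"}, "C": {"C", "D"}, "D": {"E"}, "E": set()}
toy_code = {"A": ["CMP", "JZ"], "B": ["PUSH", "MOV"], "C": ["XOR", "LOOP"],
            "D": ["POP"], "E": ["RET"]}
for sol in match_pattern(toy_succ, lambda n: toy_code[n] == ["PUSH", "MOV"]):
    print(sol)
```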
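0x04 Simplifying the flow graph
Since this pattern is repeated so many times, let's remove every occurrence and treat it as a black box, so we can see what the rest of the program does.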
def block_merge(graph):
    # Collect all matches first so the graph is not modified while matching
    for sol in list(matcher.match(graph)):
        successors = graph.successors(sol[end])
        for pred in graph.predecessors(sol[dad]):
            for succ in successors:
                # Bypass the pattern, keeping the constraint of the edge entering dad
                graph.add_edge(pred, succ, graph.edges2constraint[(pred, sol[dad])])
        # Remove dad, middle and end
        for node in sol.values():
            graph.del_node(node)
block_merge(blocks)
open('cfg_after.dot', 'w').write(blocks.dot())
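The logic is simple: connect every predecessor of dad to every successor of end, then delete the matched nodes. The resulting cfg_after.dot is much clearer; the image is here: https://miasm.re/blog/_images/re150_cfg_after_0.svg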
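The same splice can be written against a plain successor dict. The sketch below (splice_out and the toy graph are hypothetical) reconnects every outside predecessor of dad to every outside successor of end and then deletes the matched nodes. Unlike block_merge it ignores edge constraints, which Miasm's AsmCFG requires and which block_merge copies from the edge entering dad.

```python
def splice_out(succ, sol):
    """Replace a matched dad/middle/end chain by direct edges, in place.

    succ: {node: set(successors)}; sol: {'dad': ..., 'middle': ..., 'end': ...}
    """
    matched = set(sol.values())
    outside_preds = [n for n, targets in succ.items()
                     if sol["dad"] in targets and n not in matched]
    outside_succs = [s for s in succ[sol["end"]] if s not in matched]
    for p in outside_preds:
        succ[p].discard(sol["dad"])
        succ[p].update(outside_succs)  # pred -> succ bypasses the pattern
    for n in matched:
        del succ[n]
    return succ


toy_succ = {"A": {"B"}, "B": {"C"}, "C": {"C", "D"}, "D": {"E"}, "E": set()}
print(splice_out(toy_succ, {"dad": "B", "middle": "C", "end": "D"}))
```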
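0x05 Executing the correct path
Clearly the program took the wrong path, so we need to rewrite Sandbox.py to make it take the correct one. In the graph above there is a cmp eax, 2 instruction, and the conditional branch after it leads to the wrong path. The instruction before it is pop eax, so the value on top of the stack must be 2. At the ELF entry point the top of the stack holds argc, followed by the argv pointers, so the program expects exactly one command-line argument (argv[1], the password). We therefore push the two argument strings and argc = 2, then run it in the sandbox: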
from miasm.analysis.sandbox import Sandbox_Linux_x86_32
from miasm.core.types import Str, set_allocator
from miasm.os_dep.common import heap
parser = Sandbox_Linux_x86_32.parser(description='ELF sandbox')
parser.add_argument('filename', help='ELF Filename')
options = parser.parse_args()
sb = Sandbox_Linux_x86_32(options.filename, options, globals())
# Allocate the argument strings on a simulated heap in the emulator's memory
set_allocator(heap().vm_alloc)
MemStrAnsi = Str().lval
addr_pwd = MemStrAnsi.from_str(sb.jitter.vm, 'password').get_addr()
addr_bin = MemStrAnsi.from_str(sb.jitter.vm, 'reverseMe').get_addr()
# Build the entry stack: argv[1], argv[0], then argc = 2 on top
sb.jitter.push_uint32_t(addr_pwd)
sb.jitter.push_uint32_t(addr_bin)
sb.jitter.push_uint32_t(0x2)
sb.run()
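Why push in this order? The stack grows downwards, so the last value pushed ends up on top. The toy model below (ToyStack and the two string addresses are hypothetical; the 0x130000 base and 0x10000 size mirror the stack region listed earlier) replays the three pushes and reads back the layout a Linux program expects at its entry point: argc at [ESP], argv[0] at [ESP+4], argv[1] at [ESP+8]. A real process stack also holds a NULL terminator after argv and the environment pointers; the script above and this toy omit them.

```python
import struct


class ToyStack:
    """Hypothetical 32-bit little-endian stack that grows downwards."""

    def __init__(self, base=0x130000, size=0x10000):
        self.base = base
        self.mem = bytearray(size)
        self.esp = base + size  # empty stack: ESP points just past the region

    def push_uint32(self, value):
        self.esp -= 4
        struct.pack_into("<I", self.mem, self.esp - self.base, value)

    def read_uint32(self, addr):
        return struct.unpack_from("<I", self.mem, addr - self.base)[0]


ADDR_PWD, ADDR_BIN = 0x20000000, 0x20001000  # hypothetical string addresses
stack = ToyStack()
stack.push_uint32(ADDR_PWD)  # argv[1]
stack.push_uint32(ADDR_BIN)  # argv[0]
stack.push_uint32(2)         # argc, now on top of the stack
for offset, name in ((0, "argc"), (4, "argv[0]"), (8, "argv[1]")):
    print(name, hex(stack.read_uint32(stack.esp + offset)))
```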
After this change, the program raises an exception at a new location:
RAX 0000000000000000 RBX 0000000020000000 RCX 000000000000000D RDX 0000000000000000
RSI 0000000000000000 RDI 0000000000000000 RSP 000000000013FFF0 RBP 0000000000000000
zf 0000000000000000 nf 0000000000000000 of 0000000000000000 cf 0000000000000001
RIP 000000000804B1DD
0804B1DE XOR BYTE PTR [EBX+ECX], 0x33
WARNING: address 0x2000000D is not mapped in virtual memory:
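The faulting instruction XOR BYTE PTR [EBX+ECX], 0x33 accesses 0x2000000D (EBX = 0x20000000, ECX = 0xD), an address that is not mapped: the buffer we allocated is too small. The allocation size is determined by the string length, so our input is simply too short. We need to supply a long enough input (or allocate enough space); 28 characters are needed here, so we modify the script: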
from string import printable
# printable has no repeated characters, which is useful for debugging
addr_pwd = MemStrAnsi.from_str(sb.jitter.vm, printable[:28]).get_addr()
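The crash arithmetic can be checked directly. The sketch below assumes the password buffer starts at EBX = 0x20000000 (from the register dump above) and holds exactly the string plus a NUL terminator; that size rule is an assumption, not something read from Miasm. It also shows why a string without repeated characters helps: the offset of a faulting access identifies exactly which input character was being processed. buffer_size, in_buffer and char_at are hypothetical helpers.

```python
from string import printable

PWD_BASE = 0x20000000    # EBX in the register dump above
FAULT_ADDR = 0x2000000D  # EBX + ECX from the faulting XOR


def buffer_size(s):
    # Assumption: the string bytes plus one NUL terminator
    return len(s) + 1


def in_buffer(addr, base, s):
    return base <= addr < base + buffer_size(s)


def char_at(addr, base, s):
    """Input character stored at addr, or None if addr is outside the string."""
    index = addr - base
    return s[index] if 0 <= index < len(s) else None


long_pwd = printable[:28]
print(in_buffer(FAULT_ADDR, PWD_BASE, "password"))  # 9-byte buffer
print(in_buffer(FAULT_ADDR, PWD_BASE, long_pwd))    # 29-byte buffer
print(char_at(FAULT_ADDR, PWD_BASE, long_pwd))      # input[13]
```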
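After dumping the memory again, we get a new CFG: https://miasm.re/blog/_images/re150_cfg_after_1.svg
One of its paths is clearly bogus, so we tell the disassembler not to disassemble at its address: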
mdis = machine.dis_engine(bin_stream, dont_dis=[0x8049215])
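The final CFG is: https://miasm.re/blog/_images/re150_cfg_after_2.svg
This graph shows that the program checks our input with XOR. The XOR values could be collected by hand to get the flag, or Miasm can extract them from the CFG. The complete final script: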
from miasm.analysis.binary import Container
from miasm.analysis.machine import Machine
from miasm.core.graph import MatchGraphJoker
def dad_filter(blocks, loc_key):
    # Match blocks made of exactly PUSH then MOV
    block = blocks.loc_key_to_block(loc_key)
    return (block is not None and
            len(block.lines) == 2 and
            block.lines[0].name == 'PUSH' and
            block.lines[1].name == 'MOV')
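# dump2.bin: memory dumped after running with the 28-character password
cont = Container.from_stream(open("dump2.bin", "rb"))
bin_stream = cont.bin_stream
machine = Machine(cont.arch)
# Do not disassemble the bogus path at 0x8049215
mdis = machine.dis_engine(bin_stream, dont_dis=[0x8049215])
blocks = mdis.dis_multiblock(cont.entry_point)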
dad = MatchGraphJoker(name='dad', restrict_in=False, filt=dad_filter)
middle = MatchGraphJoker(name='middle')
end = MatchGraphJoker(name='end', restrict_out=False)
matcher = dad >> middle >> middle >> end
def block_merge(graph):
    # Collect all matches first so the graph is not modified while matching
    for sol in list(matcher.match(graph)):
        successors = graph.successors(sol[end])
        for pred in graph.predecessors(sol[dad]):
            for succ in successors:
                graph.add_edge(pred, succ, graph.edges2constraint[(pred, sol[dad])])
        for node in sol.values():
            graph.del_node(node)
block_merge(blocks)
open('cfg_after2.dot', 'w').write(blocks.dot())
# Collect the immediate operand of the XOR in each 3-instruction check block
key = []
head = blocks.heads()[0]
for loc_key in blocks.walk_depth_first_forward(head):
    block = blocks.loc_key_to_block(loc_key)
    if block is not None and len(block.lines) == 3 and block.lines[1].name == 'XOR':
        cst = block.lines[1].args[1]
        key.append(chr(int(cst.arg)))
print(''.join(key))
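The last loop can be tried without the binary. In this toy version (the CFG, its instructions and the constants are made up for illustration), each block is a list of (mnemonic, operands) tuples, the walk is a plain depth-first traversal from the head, and the key is built from the immediate operand of the XOR in every three-instruction block, the same rule the script above uses. Miasm's walk_depth_first_forward may visit nodes in a different order than this sketch.

```python
def walk_depth_first(cfg, head):
    """Yield labels of {label: (instructions, successors)} in depth-first order."""
    seen, todo = set(), [head]
    while todo:
        label = todo.pop()
        if label in seen:
            continue
        seen.add(label)
        yield label
        # Push successors reversed so the first successor is visited first
        todo.extend(reversed(cfg[label][1]))


def collect_xor_key(cfg, head):
    key = []
    for label in walk_depth_first(cfg, head):
        lines = cfg[label][0]
        if len(lines) == 3 and lines[1][0] == "XOR":
            key.append(chr(lines[1][1][1]))  # second operand: the immediate
    return "".join(key)


# Hypothetical check blocks: AL must equal the XOR constant or JNZ fails
toy_cfg = {
    "check0": ([("MOV", ("AL", "[ESI]")), ("XOR", ("AL", 0x4F)), ("JNZ", ("fail",))],
               ["check1", "fail"]),
    "check1": ([("MOV", ("AL", "[ESI+1]")), ("XOR", ("AL", 0x4B)), ("JNZ", ("fail",))],
               ["good", "fail"]),
    "good": ([("RET", ())], []),
    "fail": ([("RET", ())], []),
}
print(collect_xor_key(toy_cfg, "check0"))
```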