A First Look at Miasm (1)

史昊焱

2023-12-01

Starting from the first example on GitHub

Assembling / Disassembling

Import Miasm x86 architecture:

	>>> from miasm.arch.x86.arch import mn_x86
	>>> from miasm.core.locationdb import LocationDB

mn_x86 is the class that represents the x86 architecture; it is what we use to assemble and disassemble x86 instructions.

About LocationDB:

	class LocationDB(__builtin__.object)
	 |  LocationDB is a "database" of information associated to location.
	 |  
	 |  An entry in a LocationDB is uniquely identified with a LocKey.
	 |  Additional information which can be associated with a LocKey are:
	 |  - an offset (uniq per LocationDB)
	 |  - several names (each are uniqs per LocationDB)
	 #	i.e. every location is identified by a unique LocKey, which may also carry
	 #	one offset and several names, each of them unique within the LocationDB
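To make these uniqueness rules concrete, here is a toy model of the idea in plain Python. It is only a sketch under my own assumptions: ToyLocationDB, add_location and add_name are hypothetical names, keys are plain ints instead of Miasm's LocKey objects, and this is not how Miasm implements LocationDB.

```python
import itertools


class ToyLocationDB:
    """Hypothetical toy model of the LocationDB idea, not Miasm's real class."""

    def __init__(self):
        self._next_key = itertools.count()
        self.key_to_offset = {}   # key -> offset (at most one offset per key)
        self.offset_to_key = {}   # offset -> key (an offset is unique per db)
        self.name_to_key = {}     # name -> key (a name is unique per db)

    def add_location(self, name=None, offset=None):
        """Create a new key, optionally with one name and one offset."""
        if offset is not None and offset in self.offset_to_key:
            raise ValueError(f"offset {offset:#x} is already used")
        if name is not None and name in self.name_to_key:
            raise ValueError(f"name {name!r} is already used")
        key = next(self._next_key)
        if offset is not None:
            self.key_to_offset[key] = offset
            self.offset_to_key[offset] = key
        if name is not None:
            self.name_to_key[name] = key
        return key

    def add_name(self, key, name):
        """Attach one more name to an existing key; names must stay unique."""
        if name in self.name_to_key:
            raise ValueError(f"name {name!r} is already used")
        self.name_to_key[name] = key


db = ToyLocationDB()
entry = db.add_location(name="entry", offset=0x0)
db.add_name(entry, "main")            # one key may carry several names
other = db.add_location(offset=0x10)
print(entry, other, db.key_to_offset)
```

Trying to reuse offset 0x10 or the name "main" for a new key raises ValueError, which mirrors the "uniq per LocationDB" wording in the docstring.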

Get a location db:

	>>> loc_db = LocationDB()

Assemble a line:

	>>> l = mn_x86.fromstring('XOR ECX, ECX', loc_db, 32)
	>>> print(l)
	XOR        ECX, ECX
	>>> mn_x86.asm(l)
	['1\xc9', '3\xc9', 'g1\xc9', 'g3\xc9']

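mn_x86.fromstring takes three arguments: the instruction text (which must be uppercase), a LocationDB object, and the mode, here 32 for 32-bit.

The returned l is a miasm.arch.x86.arch.instruction_x86 object.

mn_x86.asm converts this object into machine code and returns a list of byte strings; every item in the list is a valid encoding of XOR ECX, ECX (x86 often has several encodings for the same instruction).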
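Why four items? As far as I can tell they are alternative encodings of the same instruction: opcode 0x31 ('1', XOR r/m32, r32) and opcode 0x33 ('3', XOR r32, r/m32), each optionally preceded by the 0x67 address-size prefix ('g'), which changes nothing when both operands are registers. The toy decoder below (decode_xor is a hypothetical helper, not part of Miasm) splits the ModRM byte into its mod/reg/rm fields to check this. It only handles register operands and the plain [reg] memory form, which also covers the \x33\x30 bytes used in the Machine example.

```python
REGS32 = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]


def decode_xor(code: bytes) -> str:
    """Decode XOR 31 /r or 33 /r when ModRM is a register (mod 3) or plain [reg] (mod 0)."""
    prefixed = code[0] == 0x67            # 'g': address-size override prefix
    if prefixed:
        code = code[1:]
    opcode, modrm = code[0], code[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod == 3:                          # register operand: the prefix changes nothing
        rm_text = REGS32[rm]
    elif mod == 0 and rm not in (4, 5) and not prefixed:
        rm_text = f"DWORD PTR [{REGS32[rm]}]"   # rm 4 needs a SIB byte, rm 5 means disp32
    else:
        raise ValueError("addressing form not handled by this sketch")
    if opcode == 0x31:                    # XOR r/m32, r32: the r/m operand is the destination
        return f"XOR {rm_text}, {REGS32[reg]}"
    if opcode == 0x33:                    # XOR r32, r/m32: the reg operand is the destination
        return f"XOR {REGS32[reg]}, {rm_text}"
    raise ValueError(f"opcode {opcode:#x} is not handled by this sketch")


for enc in [b"1\xc9", b"3\xc9", b"g1\xc9", b"g3\xc9"]:
    print(enc, "->", decode_xor(enc))
```

The same decoder turns b"1\xc8" and b"3\xc1" from the modified-operand example into XOR EAX, ECX: the opcode decides which ModRM field is the destination, so the two forms swap the reg and rm values.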

Modify an operand:

	>>> l.args[0] = mn_x86.regs.EAX
	>>> print(l)
	XOR        EAX, ECX
	>>> a = mn_x86.asm(l)
	>>> print(a)
	['1\xc8', '3\xc1', 'g1\xc8', 'g3\xc1']

l.args[0] is the first operand of the XOR instruction, and mn_x86.regs.EAX represents the EAX register.

Disassemble the result:

	>>> print(mn_x86.dis(a[0], 32))
	XOR        EAX, ECX

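mn_x86.dis is the disassembly method. It takes two arguments: the first can be one item of the list returned by asm above, or any machine code given as a byte string; the second is the mode, again 32 for 32-bit.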

Using Machine abstraction:

	>>> from miasm.analysis.machine import Machine
	>>> mn = Machine('x86_32').mn
	>>> print(mn.dis(b'\x33\x30', 32))
	XOR        ESI, DWORD PTR [EAX]

About Machine:

	class Machine(__builtin__.object)
	 |  Abstract machine architecture to restrict architecture dependent code
	 #	i.e. it keeps architecture-dependent code behind one common interface

Machine('x86_32').mn actually returns mn_x86 itself.

For Mips:

	>>> mn = Machine('mips32b').mn
	>>> print(mn.dis(b'\x97\xa3\x00 ', "b"))
	LHU        V1, 0x20(SP)

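This shows how to disassemble MIPS machine code; the second argument "b" selects big-endian byte order (mips32b is big-endian MIPS32).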
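To see what "b" changes, the sketch below decodes the same four bytes by hand as a big-endian MIPS I-type word (6-bit opcode, rs, rt, 16-bit signed immediate). decode_lhu is a hypothetical helper that only understands LHU (opcode 0x25); reading the bytes little-endian yields a different opcode altogether.

```python
# Conventional MIPS register names, indexed by register number 0-31.
MIPS_REGS = (["ZERO", "AT", "V0", "V1"] + [f"A{i}" for i in range(4)]
             + [f"T{i}" for i in range(8)] + [f"S{i}" for i in range(8)]
             + ["T8", "T9", "K0", "K1", "GP", "SP", "FP", "RA"])


def decode_lhu(code: bytes, byteorder: str = "big") -> str:
    """Decode one I-type LHU word (opcode 0x25) as 'LHU rt, imm(rs)'."""
    word = int.from_bytes(code, byteorder)
    opcode = word >> 26
    if opcode != 0x25:
        raise ValueError(f"opcode {opcode:#x} is not LHU")
    rs, rt, imm = (word >> 21) & 0x1F, (word >> 16) & 0x1F, word & 0xFFFF
    offset = imm - 0x10000 if imm & 0x8000 else imm   # 16-bit signed displacement
    return f"LHU {MIPS_REGS[rt]}, {offset:#x}({MIPS_REGS[rs]})"


code = b"\x97\xa3\x00 "
print(decode_lhu(code))                            # big-endian, like the "b" argument
print(hex(int.from_bytes(code, "little") >> 26))   # little-endian: a different opcode
```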

Intermediate representation

Intermediate representation is what is usually abbreviated as IR.

The official example uses the ARM architecture; I changed it to x86_32.

Create an instruction:

	>>> machine = Machine('x86_32')
	>>> instr = machine.mn.dis(b'\x33\xc1', 32)
	>>> print(instr)
	XOR        EAX, ECX

Create an intermediate representation object:

	>>> ira = machine.ira(loc_db)

machine.ira takes one argument: the LocationDB.

Create an empty ircfg

	>>> ircfg = ira.new_ircfg()

ircfg is a miasm.ir.ir.IRCFG object; as the name suggests, it is (roughly) the control flow graph of the IR :)

Add instruction to the pool:

	>>> ira.add_instr_to_ircfg(instr, ircfg)

Again as the name says: add_instr_to_ircfg adds instr, translated to IR, to ircfg.

Print current pool:

	>>> for lbl, irblock in ircfg.blocks.items():
	...     print(irblock.to_string(loc_db))
	...
	
	loc_key_0:
	pf = parity((EAX ^ ECX) & 0xFF)
	
	zf = FLAG_EQ_CMP(EAX, ECX)
	
	of = 0x0
	
	EAX = EAX ^ ECX
	
	cf = 0x0
	
	nf = FLAG_SIGN_SUB(EAX ^ ECX, 0x0)
	
	IRDst = loc_key_1
	#	(Seeing all these flag assignments, the official docs seem to have had a good reason to pick ARM)

ircfg.blocks is a dict mapping each LocKey to its IRBlock, and ircfg.blocks.items() returns the (LocKey, IRBlock) pairs:

	[(<LocKey 0>, <miasm.ir.ir.IRBlock object at 0x7fd38cb052d0>)]

So in the for loop, lbl is the LocKey and irblock is a miasm.ir.ir.IRBlock object.

About IRBlock:

	class IRBlock(__builtin__.object)
	 |  Intermediate representation block object.
	 |  
	 |  Stand for an intermediate representation  basic block.
	 #	i.e. one basic block of the IR; ircfg.blocks maps LocKeys to these
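As a concrete check of the IR block printed earlier, here is a plain-Python evaluation of the same assignments for XOR EAX, ECX. It assumes, as the nf line suggests, that every right-hand side reads the original EAX and ECX, and that Miasm's parity follows the x86 PF rule; eval_xor_eax_ecx is a hypothetical helper, not Miasm API.

```python
MASK32 = 0xFFFFFFFF


def parity(value: int) -> int:
    """x86 PF rule: 1 if the low byte has an even number of set bits."""
    return 1 if bin(value & 0xFF).count("1") % 2 == 0 else 0


def eval_xor_eax_ecx(eax: int, ecx: int) -> dict:
    """Evaluate the printed IR of XOR EAX, ECX on concrete 32-bit values.

    All right-hand sides use the ORIGINAL eax and ecx (assumed parallel semantics).
    """
    result = (eax ^ ecx) & MASK32
    return {
        "pf": parity(result),            # parity((EAX ^ ECX) & 0xFF)
        "zf": 1 if eax == ecx else 0,    # FLAG_EQ_CMP(EAX, ECX): the XOR is 0 iff equal
        "of": 0,                         # of = 0x0
        "EAX": result,                   # EAX = EAX ^ ECX
        "cf": 0,                         # cf = 0x0
        "nf": result >> 31,              # FLAG_SIGN_SUB(EAX ^ ECX, 0x0): sign bit
    }


print(eval_xor_eax_ecx(0x12345678, 0x12345678))
print(eval_xor_eax_ecx(0x80000000, 0x1))
```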

Working with IR, for instance by getting side effects:

	>>> for lbl, irblock in ircfg.blocks.items():
	...     for assignblk in irblock:
	...         rw = assignblk.get_rw()
	...         for dst, reads in rw.items():
	...             print('read:   ', [str(x) for x in reads])
	...             print('written:', dst)
	...             print()
	...
	
	read:  ['ECX', 'EAX']
	written:  pf

	read:  ['ECX', 'EAX']
	written:  zf

	read:  []
	written:  of

	read:  ['ECX', 'EAX']
	written:  nf

	read:  []
	written:  cf

	read:  ['ECX', 'EAX']
	written:  EAX
	
	read:  []
	written:  IRDst

(This is actually quite easy to follow.)
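What get_rw reports can be approximated with a toy, string-based version that looks for the 32-bit registers mentioned in each right-hand side. This is only a sketch: toy_get_rw and the regex are my own assumptions, and Miasm's get_rw works on expression objects rather than strings.

```python
import re

# The printed IR of XOR EAX, ECX, copied as strings (destination -> source expression).
IR_BLOCK = {
    "pf": "parity((EAX ^ ECX) & 0xFF)",
    "zf": "FLAG_EQ_CMP(EAX, ECX)",
    "of": "0x0",
    "EAX": "EAX ^ ECX",
    "cf": "0x0",
    "nf": "FLAG_SIGN_SUB(EAX ^ ECX, 0x0)",
    "IRDst": "loc_key_1",
}

# In this toy, only 32-bit general-purpose registers count as reads.
GPR = re.compile(r"\bE(?:AX|BX|CX|DX|SI|DI|SP|BP)\b")


def toy_get_rw(block: dict) -> dict:
    """Map each written destination to the sorted registers its source reads."""
    return {dst: sorted(set(GPR.findall(src))) for dst, src in block.items()}


for dst, reads in toy_get_rw(IR_BLOCK).items():
    print("read:   ", reads)
    print("written:", dst)
    print()
```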

Emulation

Giving a shellcode:

	00000000 8d4904      lea    ecx, [ecx+0x4]
	00000003 8d5b01      lea    ebx, [ebx+0x1]
	00000006 80f901      cmp    cl, 0x1
	00000009 7405        jz     0x10
	0000000b 8d5bff      lea    ebx, [ebx-1]
	0000000e eb03        jmp    0x13
	00000010 8d5b01      lea    ebx, [ebx+0x1]
	00000013 89d8        mov    eax, ebx
	00000015 c3          ret
	>>> s = b'\x8dI\x04\x8d[\x01\x80\xf9\x01t\x05\x8d[\xff\xeb\x03\x8d[\x01\x89\xd8\xc3'
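The string s is just the hex column of the listing concatenated. The sketch below rebuilds it and recomputes the two short-jump targets: a rel8 jump lands at the address of the next instruction plus the signed 8-bit offset. short_jump_target is a hypothetical helper. The total length, 22 bytes (0x16), is also the size of the page that appears later in the jitter.vm output.

```python
# (offset, hex bytes) of each instruction, copied from the listing above.
LISTING = [
    (0x00, "8d4904"), (0x03, "8d5b01"), (0x06, "80f901"), (0x09, "7405"),
    (0x0b, "8d5bff"), (0x0e, "eb03"), (0x10, "8d5b01"), (0x13, "89d8"),
    (0x15, "c3"),
]

shellcode = b"".join(bytes.fromhex(h) for _, h in LISTING)


def short_jump_target(offset: int, insn: bytes) -> int:
    """Target of a 2-byte short jump (74 rel8 / EB rel8): next instruction + signed rel8."""
    rel8 = insn[1] - 0x100 if insn[1] & 0x80 else insn[1]
    return offset + len(insn) + rel8


for offset, h in LISTING:
    insn = bytes.fromhex(h)
    if insn[0] in (0x74, 0xEB):          # JZ rel8 and JMP rel8
        print(f"{offset:#04x}: jumps to {short_jump_target(offset, insn):#x}")
print(len(shellcode), hex(len(shellcode)))
```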

Import the shellcode thanks to the Container abstraction:

	>>> from miasm.analysis.binary import Container
	>>> c = Container.from_string(s)
	>>> c
	<miasm.analysis.binary.ContainerUnknown object at 0x7f34cefe6090>

About Container:

	class Container(__builtin__.object)
	 |  Container abstraction layer
	 |  
	 |  This class aims to offer a common interface for abstracting container
	 |  such as PE or ELF.
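	 #	i.e. one common interface over container formats such as PE and ELF; as the name
	 #	suggests, it is basically a holder for the binary and what is parsed from it.
	 #	Raw shellcode is neither PE nor ELF, hence the ContainerUnknown object above.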

About Container.from_string:

	from_string(cls, data, *args, **kwargs) method of __builtin__.type instance
	    Instantiate a container and parse the binary
	    @data: str containing the binary

		#	i.e. build a container directly from the raw binary held in @data

Disassembling the shellcode at address 0:

	>>> from miasm.analysis.machine import Machine
	>>> machine = Machine('x86_32')
	>>> mdis = machine.dis_engine(c.bin_stream)
	>>> asmcfg = mdis.dis_multiblock(0)
	>>> for block in asmcfg.blocks:
	...  print(block.to_string(asmcfg.loc_db))
	...
	loc_0
	LEA        ECX, DWORD PTR [ECX + 0x4]
	LEA        EBX, DWORD PTR [EBX + 0x1]
	CMP        CL, 0x1
	JZ         loc_10
	->      c_next:loc_b    c_to:loc_10
	loc_10
	LEA        EBX, DWORD PTR [EBX + 0x1]
	->      c_next:loc_13
	loc_b
	LEA        EBX, DWORD PTR [EBX + 0xFFFFFFFF]
	JMP        loc_13
	->      c_to:loc_13
	loc_13
	MOV        EAX, EBX
	RET
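Reading the CFG above, the shellcode can be modelled by hand as a small Python function. This is my own translation, not something Miasm produces, and it assumes ECX and EBX start at 0, which is consistent with EAX ending up 0 in the run later on.

```python
MASK32 = 0xFFFFFFFF


def model_shellcode(ecx: int = 0, ebx: int = 0) -> int:
    """Hand-written Python model of the shellcode's CFG; returns the final EAX."""
    ecx = (ecx + 4) & MASK32          # loc_0:  LEA ECX, [ECX + 0x4]
    ebx = (ebx + 1) & MASK32          #         LEA EBX, [EBX + 0x1]
    if (ecx & 0xFF) == 0x1:           #         CMP CL, 0x1 ; JZ loc_10
        ebx = (ebx + 1) & MASK32      # loc_10: LEA EBX, [EBX + 0x1]
    else:
        ebx = (ebx - 1) & MASK32      # loc_b:  LEA EBX, [EBX + 0xFFFFFFFF] ; JMP loc_13
    return ebx                        # loc_13: MOV EAX, EBX ; RET


print(model_shellcode())              # ECX = EBX = 0 at entry: JZ not taken
print(model_shellcode(ecx=0xFD))      # CL becomes 0x01, so JZ is taken
```

Note that adding 0xFFFFFFFF in loc_b is just subtracting 1 modulo 2^32, which is why the model writes it as ebx - 1.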

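About machine.dis_engine:

machine.dis_engine is an attribute of machine holding the disassembler class for the architecture, here miasm.arch.x86.disasm.dis_x86_32; calling it with c.bin_stream creates the disassembly engine mdis.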

About mdis.dis_multiblock:

	dis_multiblock(self, offset, blocks=None) method of miasm.arch.x86.disasm.dis_x86_32 instance
	    Disassemble every block reachable from @offset regarding
	    specific disasmEngine conditions
	    Return an AsmCFG instance containing disassembled blocks
	    @offset: starting offset
	    @blocks: (optional) AsmCFG instance of already disassembled blocks to
	            merge with
	    
	    #	i.e. follow the control flow from @offset, subject to the engine's settings,
	    #	and return the blocks as an AsmCFG (which looks a lot like IRCFG to me);
	    #	@blocks lets you merge the result into an AsmCFG you already have

Initializing the Jit engine with a stack:

	>>> jitter = machine.jitter(jit_type='python')
	>>> jitter.init_stack()

JIT stands for just-in-time compilation: code is compiled while the program is running, rather than ahead of time.
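A toy illustration of that idea, and not how Miasm's jitter is implemented: each block of a hypothetical mini-program is turned into a Python function the first time execution reaches it, and the cached function is reused on later visits. ToyJit and PROGRAM are made up for this sketch.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical program: block address -> Python statements; each block sets `nxt`.
PROGRAM: Dict[int, List[str]] = {
    0x0: ["regs['ecx'] = (regs['ecx'] + 1) & 0xFFFFFFFF",
          "nxt = 0x0 if regs['ecx'] < 3 else 0x10"],
    0x10: ["regs['eax'] = regs['ecx']", "nxt = None"],
}


class ToyJit:
    """Compile each block on first use, then reuse the compiled function."""

    def __init__(self, program: Dict[int, List[str]]):
        self.program = program
        self.cache: Dict[int, Callable[[dict], Optional[int]]] = {}
        self.compiled = 0   # how many blocks were translated
        self.executed = 0   # how many block executions happened

    def _compile(self, addr: int) -> Callable[[dict], Optional[int]]:
        body = "\n".join("    " + stmt for stmt in self.program[addr])
        source = f"def block(regs):\n{body}\n    return nxt\n"
        namespace: dict = {}
        exec(source, namespace)          # compilation happens at run time
        return namespace["block"]

    def run(self, addr: Optional[int], regs: dict) -> dict:
        while addr is not None:
            if addr not in self.cache:   # first visit: translate the block
                self.cache[addr] = self._compile(addr)
                self.compiled += 1
            addr = self.cache[addr](regs)  # every visit runs the cached code
            self.executed += 1
        return regs


jit = ToyJit(PROGRAM)
regs = jit.run(0x0, {"eax": 0, "ecx": 0})
print(regs, jit.compiled, jit.executed)
```

Block 0x0 runs three times but is compiled only once, which is the whole point of caching the translation.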

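If the first line fails with Unsupported jit arch: x86, run Python from outside the miasm source directory and the error goes away (presumably because Python otherwise imports the local source tree instead of the installed package).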

Add the shellcode in an arbitrary memory location:

	>>> run_addr = 0x40000000
	>>> from miasm.jitter.csts import PAGE_READ, PAGE_WRITE
	>>> jitter.vm.add_memory_page(run_addr, PAGE_READ | PAGE_WRITE, s)

About jitter.vm.add_memory_page:

	add_memory_page(...)
	    add_memory_page(address, access, content [, cmt]) -> Maps a memory page at @address of len(@content) bytes containing @content with protection @access
	    @cmt is a comment linked to the memory page
	
		#	i.e. map len(@content) bytes at @address, filled with @content, with access rights @access;
		#	@cmt is just a label for the page, shown in the Comment column of jitter.vm (e.g. "Stack")

miasm.jitter.csts defines many related constants, such as PAGE_READ and PAGE_WRITE; see the source for the full list.
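For example, the access argument is a bit mask, which is why the flags are combined with |. The sketch below assumes PAGE_READ = 1, PAGE_WRITE = 2 and PAGE_EXEC = 4 (check csts.py before relying on these values) and renders a mask in the "RW_" style seen in the jitter.vm listing; access_string is a hypothetical helper.

```python
# Assumed flag values; verify them against miasm/jitter/csts.py.
PAGE_READ, PAGE_WRITE, PAGE_EXEC = 1, 2, 4


def access_string(access: int) -> str:
    """Render an access mask as R/W/X letters, '_' for each missing right."""
    flags = (("R", PAGE_READ), ("W", PAGE_WRITE), ("X", PAGE_EXEC))
    return "".join(letter if access & bit else "_" for letter, bit in flags)


print(access_string(PAGE_READ | PAGE_WRITE))
print(access_string(PAGE_READ | PAGE_WRITE | PAGE_EXEC))
```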

Create a sentinelle to catch the return of the shellcode:

	def code_sentinelle(jitter):
	    jitter.run = False
	    jitter.pc = 0
	    return True
	
	>>> jitter.add_breakpoint(0x1337beef, code_sentinelle)
	>>> jitter.push_uint32_t(0x1337beef)

About add_breakpoint:

	add_breakpoint(self, addr, callback) method of miasm.arch.x86.jit.jitter_x86_32 instance
	    Add a callback associated with addr.
	    @addr: breakpoint address
	    @callback: function with definition (jitter instance)

		#	i.e. when execution reaches @addr, @callback is called with the jitter instance as its argument

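The callback sets jitter.run to False, which stops the JIT's run loop.

jitter.pc is the program counter, i.e. EIP for x86_32. (Anyone who has studied the 8086 will remember that in 16-bit real mode the next instruction is addressed as CS:IP.)

jitter.push_uint32_t pushes an unsigned 32-bit value onto the stack, so jitter.init_stack must be called first. Here it pushes 0x1337beef as a fake return address: when the shellcode's final RET pops it, execution jumps to 0x1337beef, the breakpoint fires, and code_sentinelle stops the run.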
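The sentinel trick in miniature: a hypothetical ToyCpu (not Miasm's jitter) pushes the fake return address, executes a RET, and lets the breakpoint callback stop the run. The stack top 0x1240000 is an assumption derived from the Stack page in the jitter.vm listing (base 0x1230000, size 0x10000).

```python
SENTINEL = 0x1337BEEF
STACK_TOP = 0x1230000 + 0x10000   # assumed: end of the "Stack" page shown by jitter.vm


class ToyCpu:
    """Toy model of push_uint32_t, RET and breakpoints; not Miasm's jitter."""

    def __init__(self):
        self.esp = STACK_TOP
        self.pc = 0x40000000
        self.memory = {}          # address -> 32-bit value
        self.breakpoints = {}     # address -> callback(cpu)
        self.run = True

    def push_uint32_t(self, value: int) -> None:
        self.esp -= 4
        self.memory[self.esp] = value & 0xFFFFFFFF

    def ret(self) -> None:
        """RET pops the return address into pc; a breakpoint there fires."""
        self.pc = self.memory.pop(self.esp)
        self.esp += 4
        if self.pc in self.breakpoints:
            self.breakpoints[self.pc](self)


def code_sentinelle(cpu: ToyCpu) -> bool:
    cpu.run = False   # stop the run loop
    cpu.pc = 0
    return True


cpu = ToyCpu()
cpu.breakpoints[SENTINEL] = code_sentinelle
cpu.push_uint32_t(SENTINEL)   # fake return address for the shellcode's RET
cpu.ret()                     # what the shellcode's final RET does
print(cpu.run, hex(cpu.pc), hex(cpu.esp))
```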

Active logs:

	>>> jitter.set_trace_log()

About jitter.set_trace_log:

	set_trace_log(self, trace_instr=True, trace_regs=True, trace_new_blocks=False) method of miasm.arch.x86.jit.jitter_x86_32 instance
	    Activate/Deactivate trace log options
	    @trace_instr: activate instructions tracing log
	    @trace_regs: activate registers tracing log
	    @trace_new_blocks: dump new code blocks log

		#	i.e. switch tracing on or off; called without arguments, as above, it logs
		#	every instruction and the registers, but does not dump new blocks

Run at arbitrary address:

	>>> jitter.init_run(run_addr)
	>>> jitter.continue_run()

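About jitter.continue_run:

	continue_run(self, step=False) method of miasm.arch.x86.jit.jitter_x86_32 instance
	    PRE: init_run.
	    Continue the run of the current session until iterator returns or run is
	    set to False.
	    If step is True, run only one time.
	    Return the iterator value

		#	i.e. call init_run first; execution then continues until the run iterator finishes
		#	or jitter.run is set to False (exactly what code_sentinelle does);
		#	with step=True only one step is executed

In the official documentation the trace output lists 64-bit registers; presumably they got that wrong.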

Interacting with the jitter:

	>>> jitter.vm
	Addr               Size               Access Comment
	0x1230000          0x10000            RW_    Stack
	0x40000000         0x16               RW_    

	
	>>> hex(jitter.cpu.EAX)
	'0x0L'
	>>> jitter.cpu.ESI = 12

Symbolic execution

(Not covered here.)