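Parsing bytecode is a common requirement. Once parsed, the bytecode can serve many purposes, such as numerical analysis or machine learning.
What bytecode is: Java introduced the concept of a virtual machine, an abstract machine layered between the physical machine and the compiled program. This virtual machine presents the same interface to compiled programs on every platform. The compiler only needs to target the virtual machine and emit code that it understands; an interpreter then translates that code into machine code for the particular system. In Java, the code understood by the virtual machine is called bytecode (stored in files with the .class extension). It targets no specific processor, only the virtual machine. The interpreter differs from platform to platform, but the virtual machine it implements is the same. Java source is compiled to bytecode, the virtual machine hands each bytecode instruction to the interpreter, and the interpreter translates it into machine code for the host and runs it. This is why Java is said to combine compilation and interpretation.
Advantages of bytecode: bytecode lets Java address, to some extent, the slow execution of traditional interpreted languages while keeping their portability. Java programs therefore run fairly efficiently, and because bytecode is not tied to any particular machine, a Java program can run on many different computers without being recompiled.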
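On Android, the bytecode of every method is stored in the Dalvik (DEX) file. Androguard offers three ways to obtain it: as raw bytes, as a disassembled listing, and as decompiled source.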
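Dalvik bytecode is built from 16-bit units, but Androguard displays it in 8-bit units (bytes). Offsets given in the bytecode are therefore expressed in bytes as well, and all indices are given as byte lengths too.
Because Dalvik is closely related to Java, all integer values are represented as a signed "int" (32 bits) or "long" (64 bits).
Values are shown in decimal or hexadecimal. A hexadecimal value carries the suffix "h": for example, "f7a0h" is the same value as "63392".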
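As a small illustration of the notation described above, the following sketch converts a displayed value into a Python integer and reinterprets a raw 32-bit pattern as a signed int. The helper names parse_displayed_value and as_signed_int32 are hypothetical and are not part of Androguard.

```python
import struct


def parse_displayed_value(text):
    """Parse a value as displayed: decimal, or hexadecimal with an 'h' suffix."""
    text = text.strip()
    if text.lower().endswith("h"):
        return int(text[:-1], 16)
    return int(text, 10)


def as_signed_int32(raw):
    """Reinterpret an unsigned 32-bit pattern as a signed int."""
    return struct.unpack("<i", struct.pack("<I", raw & 0xFFFFFFFF))[0]


print(parse_displayed_value("f7a0h"))   # 63392
print(parse_displayed_value("63392"))   # 63392
print(as_signed_int32(0xFFFFFFFF))      # -1
```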
To get the raw byte representation of a method's bytecode, the first step is, as before, to load the APK under test:
ubuntu@ubuntu:~$ androguard analyze /home/ubuntu/Desktop/meeting.apk
Please be patient, this might take a while.
Found the provided file is of type 'APK'
[INFO ] androguard.apk: Starting analysis on AndroidManifest.xml
[INFO ] androguard.apk: APK file was successfully validated!
[INFO ] androguard.analysis: Adding DEX file version 35
[INFO ] androguard.analysis: Reading bytecode took : 0min 00s
[INFO ] androguard.analysis: Adding DEX file version 35
[INFO ] androguard.analysis: Reading bytecode took : 0min 00s
[INFO ] androguard.analysis: End of creating cross references (XREF) run time: 0min 00s
Added file to session: SHA256::689673bed0f4d6121a63f3c9fd88efb538ec316561d426120c440d8be89f6256
Loaded APK file...
>>> a
<androguard.core.bytecodes.apk.APK object at 0x7f3b27bb8390>
>>> d
[<androguard.core.bytecodes.dvm.DalvikVMFormat object at 0x7f3b1ae9d978>, <androguard.core.bytecodes.dvm.DalvikVMFormat object at 0x7f3b1ae36b00>]
>>> dx
<analysis.Analysis VMs: 2, Classes: 85, Methods: 340, Strings: 122>
Androguard version 3.4.0a1 started
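Then obtain the byte representation of every method with a few lines of Python:
In [1]: for method in dx.get_methods():
   ...:     if method.is_external():       # skip methods not defined in the APK
   ...:         continue
   ...:     m = method.get_method()        # the underlying encoded method
   ...:     if m.get_code():               # abstract/native methods have no code
   ...:         print(m.get_code().get_bc().get_raw())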
The output consists largely of binary data, for example:
bytearray(b'p\x10\t\x00\x00\x00\x0e\x00')
bytearray(b'p\x10\t\x00\x00\x00\x0e\x00')
bytearray(b'p\x10\xf3\x00\x00\x00\x0e\x00')
bytearray(b'\x12\x01i\x01E\x00\x1a\x00C\x01i\x00C\x00\x1a\x00\xa1\x02i\x00F\x00i\x01D\x00\x0e\x00')
bytearray(b'p\x10\x00\x00\x00\x00\x0e\x00')
bytearray(b'\x1d\x02b\x00C\x00\x1a\x01\x08\x00n \x03\x01\x10\x00\n\x008\x00\x1e\x00"\x00G\x00o\x10\x03\x00\x02\x00\x0c\x01q\x10\x06\x01\x01\x00\x0c\x01p \t\x01\x10\x00b\x01C\x00n \x0f\x01\x10\x00\x0c\x00n\x10\x11\x01\x00\x00\x0c\x00i\x00C\x00\x12\x10\x1e\x02\x0f\x00b\x00C\x00\x1a\x01\x08\x00n \xfe\x00\x10\x00\n\x00;\x00\xf5\xff"\x00G\x00o\x10\x03\x00\x02\x00\x0c\x01q\x10\x06\x01\x01\x00\x0c\x01p \t\x01\x10\x00\x1a\x01\x08\x00n \x0f\x01\x10\x00\x0c\x00b\x01C\x00n \x0f\x01\x10\x00\x0c\x00n\x10\x11\x01\x00\x00\x0c\x00i\x00C\x00(\xd4\r\x00\x1e\x02\'\x00')
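To relate these raw bytes to the 16-bit units mentioned earlier, the first bytearray above can be split into little-endian code units; the low byte of an instruction's first unit is its opcode. This is a minimal sketch using only the standard library. code_units is a hypothetical helper, and the instruction length is taken from the Dalvik instruction format (invoke-direct occupies three units) rather than computed.

```python
import struct

raw = bytearray(b'p\x10\t\x00\x00\x00\x0e\x00')  # first method printed above


def code_units(data):
    """Split raw Dalvik bytecode into 16-bit little-endian code units."""
    count = len(data) // 2
    return list(struct.unpack("<%dH" % count, bytes(data[:count * 2])))


units = code_units(raw)
print([hex(u) for u in units])   # ['0x1070', '0x9', '0x0', '0xe']

# The opcode is the low byte of the first unit of each instruction.
print(units[0] & 0xFF)           # 112 (0x70, invoke-direct), byte offset 0
# invoke-direct spans three units, so the next instruction starts at
# unit 3, i.e. byte offset 3 * 2 = 6.
print(units[3] & 0xFF)           # 14 (0x0e, return-void), byte offset 6
```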
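The disassembled form can be obtained as follows:
In [2]: for method in dx.get_methods():
   ...:     if method.is_external():       # skip methods not defined in the APK
   ...:         continue
   ...:     m = method.get_method()
   ...:     # idx is the byte offset of the instruction within the method
   ...:     for idx, ins in m.get_instructions_idx():
   ...:         print(idx, ins.get_op_value(), ins.get_name(), ins.get_output())
   ...:
The output is: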
0 112 invoke-direct v0, Ljava/lang/Object;-><init>()V
6 14 return-void
0 112 invoke-direct v0, Ljava/lang/Object;-><init>()V
6 14 return-void
0 18 const/4 v1, 0
2 105 sput-object v1, Lcom/wrapper/proxyapplication/WrapperProxyApplication;->shellApp Landroid/app/Application;
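Since machine learning was mentioned as one use of parsed bytecode, here is a hedged sketch of turning such a listing into a simple feature: a count of each mnemonic. It parses the printed text rather than calling Androguard, so it runs on its own. opcode_histogram is a hypothetical helper, and the sample lines are copied from the output above.

```python
from collections import Counter

listing = """\
0 112 invoke-direct v0, Ljava/lang/Object;-><init>()V
6 14 return-void
0 112 invoke-direct v0, Ljava/lang/Object;-><init>()V
6 14 return-void
0 18 const/4 v1, 0
"""


def opcode_histogram(text):
    """Count mnemonics in lines of the form '<offset> <opcode> <name> [operands]'."""
    counts = Counter()
    for line in text.splitlines():
        parts = line.split(None, 3)
        # Skip anything that is not an instruction line (e.g. method headers).
        if len(parts) >= 3 and parts[0].isdigit() and parts[1].isdigit():
            counts[parts[2]] += 1
    return counts


print(opcode_histogram(listing).most_common())
# [('invoke-direct', 2), ('return-void', 2), ('const/4', 1)]
```

In a live session the same counter could presumably be fed from ins.get_name() directly instead of from text.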
To get the disassembly for a specific class (find_methods can also filter by method name), do the following:
In [3]: for m in dx.find_methods("Lcom/tencent/wemeet/app/MyWrapperProxyApplication;"):
   ...:     print(m.full_name)             # class, method name and descriptor
   ...:     for idx, ins in m.get_method().get_instructions_idx():
   ...:         print(idx, ins.get_op_value(), ins.get_name(), ins.get_output())
   ...:
The output will contain:
Lcom/tencent/wemeet/app/MyWrapperProxyApplication; <init> ()V
0 112 invoke-direct v0, Lcom/wrapper/proxyapplication/WrapperProxyApplication;-><init>()V
6 14 return-void
Lcom/tencent/wemeet/app/MyWrapperProxyApplication; initProxyApplication (Landroid/content/Context;)V
0 110 invoke-virtual v7, Landroid/content/Context;->getApplicationInfo()Landroid/content/pm/ApplicationInfo;
6 12 move-result-object v4
8 84 iget-object v0, v4, Landroid/content/pm/ApplicationInfo;->sourceDir Ljava/lang/String;
12 18 const/4 v1, 0
14 34 new-instance v2, Ljava/util/zip/ZipFile;
18 112 invoke-direct v2, v0, Ljava/util/zip/ZipFile;-><init>(Ljava/lang/String;)V
24 7 move-object v1, v2
26 57 if-nez v1, +00dh
30 113 invoke-static Landroid/os/Process;->myPid()I
36 10 move-result v4
38 113 invoke-static v4, Landroid/os/Process;->killProcess(I)V
44 18 const/4 v4, 0
46 113 invoke-static v4, Ljava/lang/System;->exit(I)V
52 113 invoke-static v7, v1, Lcom/wrapper/proxyapplication/Util;->PrepareSecurefiles(Landroid/content/Context; Ljava/util/zip/ZipFile;)I
58 110 invoke-virtual v1, Ljava/util/zip/ZipFile;->close()V
64 98 sget-object v4, Lcom/wrapper/proxyapplication/Util;->CPUABI Ljava/lang/String;
68 26 const-string v5, "x86"
72 51 if-ne v4, v5, +031h
76 34 new-instance v4, Ljava/lang/StringBuilder;
80 110 invoke-virtual v7, Landroid/content/Context;->getFilesDir()Ljava/io/File;
86 12 move-result-object v5
88 110 invoke-virtual v5, Ljava/io/File;->getAbsolutePath()Ljava/lang/String;
94 12 move-result-object v5
96 113 invoke-static v5, Ljava/lang/String;->valueOf(Ljava/lang/Object;)Ljava/lang/String;
102 12 move-result-object v5
104 112 invoke-direct v4, v5, Ljava/lang/StringBuilder;-><init>(Ljava/lang/String;)V
110 26 const-string v5, "/prodexdir/"
114 110 invoke-virtual v4, v5, Ljava/lang/StringBuilder;->append(Ljava/lang/String;)Ljava/lang/StringBuilder;
120 12 move-result-object v4
122 98 sget-object v5, Lcom/wrapper/proxyapplication/Util;->libname Ljava/lang/String;
126 110 invoke-virtual v4, v5, Ljava/lang/StringBuilder;->append(Ljava/lang/String;)Ljava/lang/StringBuilder;
132 12 move-result-object v4
134 110 invoke-virtual v4, Ljava/lang/StringBuilder;->toString()Ljava/lang/String;
140 12 move-result-object v4
142 113 invoke-static v4, Ljava/lang/System;->load(Ljava/lang/String;)V
148 14 return-void
150 13 move-exception v3
152 110 invoke-virtual v3, Ljava/io/IOException;->printStackTrace()V
158 40 goto -42h
160 13 move-exception v3
162 110 invoke-virtual v3, Ljava/io/IOException;->printStackTrace()V
168 40 goto -34h
170 98 sget-object v4, Lcom/wrapper/proxyapplication/Util;->libname Ljava/lang/String;
174 113 invoke-static v4, Ljava/lang/System;->loadLibrary(Ljava/lang/String;)V
180 40 goto -10h
Lcom/tencent/wemeet/app/MyWrapperProxyApplication; onCreate ()V
0 111 invoke-super v0, Lcom/wrapper/proxyapplication/WrapperProxyApplication;->onCreate()V
6 14 return-void
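The branch operands in the listing above (for example +00dh or -42h) appear to be raw operands counted in 16-bit code units, as in the Dalvik instruction encoding, whereas the first column is a byte offset. Under that assumption the target of a branch is offset + 2 × operand, and the results land on instruction boundaries in the listing (26 + 2 × 0xd = 52). The sketch below works on the printed text only; branch_target is a hypothetical helper, not an Androguard function.

```python
import re

# Matches a trailing branch operand such as "+00dh" or "-42h".
_BRANCH = re.compile(r"([+-][0-9a-fA-F]+)h$")


def branch_target(line):
    """Return the byte offset a branch line jumps to, or None if there is none.

    Assumes the line has the form '<byte offset> <opcode> <name> <operands>'
    and that the branch operand counts 16-bit code units (2 bytes each).
    """
    parts = line.split(None, 3)
    if len(parts) < 4:
        return None
    match = _BRANCH.search(parts[3].strip())
    if match is None:
        return None
    return int(parts[0]) + 2 * int(match.group(1), 16)


print(branch_target("26 57 if-nez v1, +00dh"))      # 52
print(branch_target("72 51 if-ne v4, v5, +031h"))   # 170
print(branch_target("158 40 goto -42h"))            # 26
print(branch_target("148 14 return-void"))          # None
```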
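In [9]: for method in dx.get_methods():
   ...:     if method.is_external():       # skip methods not defined in the APK
   ...:         continue
   ...:     m = method.get_method()
   ...:     print(m.source())              # decompiled source of the method
   ...:
The code above prints the decompiled source of every method. Sample output: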
private declared_synchronized boolean Fixappname()
{
    try {
        if (!com.wrapper.proxyapplication.WrapperProxyApplication.className.startsWith(.)) {
            if (com.wrapper.proxyapplication.WrapperProxyApplication.className.indexOf(.) < 0) {
                com.wrapper.proxyapplication.WrapperProxyApplication.className = new StringBuilder(String.valueOf(super.getPackageName())).append(.).append(com.wrapper.proxyapplication.WrapperProxyApplication.className).toString();
            }
        } else {
            com.wrapper.proxyapplication.WrapperProxyApplication.className = new StringBuilder(String.valueOf(super.getPackageName())).append(com.wrapper.proxyapplication.WrapperProxyApplication.className).toString();
        }
    } catch (String v0_11) {
        throw v0_11;
    }
    return 1;
}
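The DAD decompiler can also produce an abstract syntax tree (AST), which is convenient for analyzing the code itself. This works as follows:
from pprint import pprint
from androguard.decompiler.dad.decompile import DvMethod

for method in dx.get_methods():
    if method.is_external():       # skip methods not defined in the APK
        continue
    dv = DvMethod(method)
    dv.process(doAST=True)         # build the AST instead of source text
    pprint(dv.get_ast())
The output looks like this: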
{'body': ['BlockStatement',
None,
[['LocalDeclarationStatement',
None,
[['TypeName', ('.int', 0)], ['Local', 'v1_0']]],
['LocalDeclarationStatement',
['ClassInstanceCreation',
(java/io/File, <init>, (Ljava/lang/String;)V),
[['Local', 'p5']],
['TypeName', (java/io/File, 0)]],
[['TypeName', (java/io/File, 0)], ['Local', 'v0_1']]],
['IfStatement',
None,
['BinaryInfix',
[['Parenthesis',
[['Unary',
[['MethodInvocation',
[['Local', 'v0_1']],
(java/io/File, exists, ()Z),
exists,
True]],
'!',
False]]],
['Parenthesis',
[['BinaryInfix',
[['MethodInvocation',
[['Local', 'v0_1']],
(java/io/File, length, ()J),
length,
True],
['Local', 'p6']],
'!=']]]],
'||'],
[['BlockStatement',
None,
[['ExpressionStatement',
['Assignment',
[['Local', 'v1_0'], ['Literal', '0', ('.int', 0)]],
'']]]],
['BlockStatement',
None,
[['ExpressionStatement',
['Assignment',
[['Local', 'v1_0'], ['Literal', '1', ('.int', 0)]],
'']]]]]],
['ReturnStatement', ['Local', 'v1_0']]]],
'comments': [],
'flags': ['private', 'static'],
'params': [[['TypeName', (java/lang/String, 0)], ['Local', 'p5']],
[['TypeName', ('.long', 0)], ['Local', 'p6']]],
'ret': ['TypeName', ('.boolean', 0)],
'triple': (com/wrapper/proxyapplication/Util,
isFileValid,
(Ljava/lang/String;J)Z)}
The AST above is equivalent to the following source code:
private static boolean isFileValid(String p5, long p6)
{
    int v1_0;
    java.io.File v0_1 = new java.io.File(p5);
    if ((!v0_1.exists()) || (v0_1.length() != p6)) {
        v1_0 = 0;
    } else {
        v1_0 = 1;
    }
    return v1_0;
}
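To show why the AST form is convenient for analysis, the sketch below walks a fragment of the AST shown earlier and collects every method that is invoked. The fragment is the if-condition, copied by hand with the triples written as quoted Python tuples, which is an assumption about their real type. invoked_methods is a hypothetical helper, not part of Androguard.

```python
# Hand-copied fragment of the AST above (the if-condition), with the
# triples written as quoted Python tuples.
condition = ['BinaryInfix',
             [['Parenthesis',
               [['Unary',
                 [['MethodInvocation',
                   [['Local', 'v0_1']],
                   ('java/io/File', 'exists', '()Z'),
                   'exists',
                   True]],
                 '!',
                 False]]],
              ['Parenthesis',
               [['BinaryInfix',
                 [['MethodInvocation',
                   [['Local', 'v0_1']],
                   ('java/io/File', 'length', '()J'),
                   'length',
                   True],
                  ['Local', 'p6']],
                 '!=']]]],
             '||']


def invoked_methods(node):
    """Recursively collect the triples of all MethodInvocation nodes."""
    found = []
    if isinstance(node, dict):
        for value in node.values():
            found.extend(invoked_methods(value))
    elif isinstance(node, (list, tuple)):
        # A MethodInvocation node keeps its (class, name, descriptor) at index 2.
        if len(node) >= 3 and node[0] == 'MethodInvocation':
            found.append(tuple(node[2]))
        for child in node:
            found.extend(invoked_methods(child))
    return found


print(invoked_methods(condition))
# [('java/io/File', 'exists', '()Z'), ('java/io/File', 'length', '()J')]
```

Because the function also descends into dictionaries, it could be applied to the whole result of dv.get_ast() in the same way.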