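I recently needed a way to detect the malware samples I had analyzed. After looking at the options, I found that YARA rules fit my needs well.
This article covers an introduction, installation and usage, rule syntax, and calling YARA from Python.
Introduction: YARA is a tool developed by VirusTotal (VT) for writing rules that identify and classify malware.
Official GitHub repository (releases): https://github.com/VirusTotal/yara/releases
Official documentation: https://yara.readthedocs.io
A simple example: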
rule silent_banker : banker
{
meta:
description = "This is just an example"
threat_level = 3
in_the_wild = true
strings:
$a = {6A 40 68 00 30 00 00 6A 14 8D 91}
$b = {8D 4D B0 2B C1 83 C0 27 99 6A 4E 59 F7 F9}
$c = "UVODFRYSIHLNWPEJXQZAKCBGMT"
condition:
$a or $b or $c
}
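Installation: download a release; it runs as-is with no further setup.
Usage: yara.exe rule.yara <file or directory to scan>
A rule generally consists of two parts: strings and a condition.
The strings section defines the strings that may appear in the software.
The condition section combines the string matches into an expression so that the rule selects programs more precisely.
// Two simple forms of string: text and hex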
rule ExampleRule
{
strings:
$my_text_string = "text here"
$my_hex_string = { E2 34 A1 C8 23 FB }
condition:
$my_text_string or $my_hex_string
}
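Identifier naming follows conventions similar to C. The following keywords are reserved and cannot be used as identifiers: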
all and any ascii at base64 base64wide condition
contains endswith entrypoint false filesize for fullword global
import icontains iendswith iequals in include int16 int16be
int32 int32be int8 int8be istartswith matches meta nocase
none not of or private rule startswith strings
them true uint16 uint16be uint32 uint32be uint8 uint8be
wide xor defined
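String identifiers start with $ and are named using letters, digits, and underscores. Text strings are defined with double quotes ("") and hex strings with braces ({}).
$my_hex_string = { E2 34 A1 C8 23 FB }
$hex_string = { E2 34 ?? C8 A? FB } // ? is a wildcard for one nibble (half a byte)
$hex_string = { F4 23 [4-6] 62 B4 } // jump: any 4 to 6 bytes
$hex_string = { F4 23 ( 62 B4 | 56 ) 45 } // alternatives: either 62 B4 or 56
$my_text_string = "text here\" \\ \r \t \n \xdd" // escape sequences work as in C strings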
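To make the hex-string syntax above more concrete, here is a minimal sketch that rewrites the three patterns by hand as Python byte regexes. This is only an analogy for reading the patterns; it is not how YARA itself matches them, and the sample byte sequences are made up for illustration.

```python
import re

# { E2 34 ?? C8 A? FB }: ?? is any byte, A? is any byte whose high nibble is A
wildcard = re.compile(rb"\xE2\x34.\xC8[\xA0-\xAF]\xFB", re.DOTALL)

# { F4 23 [4-6] 62 B4 }: a jump of 4 to 6 arbitrary bytes
jump = re.compile(rb"\xF4\x23.{4,6}\x62\xB4", re.DOTALL)

# { F4 23 ( 62 B4 | 56 ) 45 }: either 62 B4 or 56 in the middle
alternative = re.compile(rb"\xF4\x23(?:\x62\xB4|\x56)\x45", re.DOTALL)

# Hypothetical sample data, one sequence per pattern
wildcard_hit = bytes.fromhex("E2 34 00 C8 A7 FB")
wildcard_miss = bytes.fromhex("E2 34 00 C8 B7 FB")   # high nibble is B, not A
jump_hit = bytes.fromhex("F4 23 01 02 03 04 05 62 B4")  # 5 filler bytes
jump_miss = bytes.fromhex("F4 23 01 02 62 B4")          # only 2 filler bytes
alt_hit_long = bytes.fromhex("F4 23 62 B4 45")
alt_hit_short = bytes.fromhex("F4 23 56 45")

print(bool(wildcard.search(wildcard_hit)), bool(wildcard.search(wildcard_miss)))
print(bool(jump.search(jump_hit)), bool(jump.search(jump_miss)))
print(bool(alternative.search(alt_hit_long)), bool(alternative.search(alt_hit_short)))
```

re.DOTALL is needed so that "." also matches the byte 0x0A, since a YARA wildcard matches any byte value.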
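String modifiers: after a string is defined, modifiers can be appended to change how it is matched, and several modifiers can be used together. For example, nocase makes the match case-insensitive.
$text_string = "foobar" nocase // case-insensitive: matches Foobar, FOOBAR, and fOoBaR
$wide_string = "Borland" wide // matches the wide form, two bytes per character, such as B\x00o\x00
$wide_and_ascii_string = "Borland" wide ascii // matches both the wide and the ascii form
$xor_string = "This program cannot" xor // finds the string after single-byte XOR encoding
$a = "This program cannot" base64 // finds the base64-encoded form of the string
$a = "This program cannot" base64("!@#$%^&*(){}[].,|ABCDEFGHIJ\x09LMNOPQRSTUVWXYZabcdefghijklmnopqrstu")
// a custom base64 alphabet is also supported
The fullword modifier requires the whole word to match. For example, domain does not match www.mydomain.com, but it does match www.my-domain.com and …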
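The fullword behaviour described above can be approximated in Python with lookarounds. This is a rough sketch under the assumption that "whole word" means the match is delimited by non-alphanumeric characters (or the start or end of the data); the helper name fullword_search is hypothetical and not part of YARA.

```python
import re

def fullword_search(data: bytes, word: bytes) -> bool:
    """Return True if word occurs in data delimited by non-alphanumeric bytes."""
    pattern = rb"(?<![0-9A-Za-z])" + re.escape(word) + rb"(?![0-9A-Za-z])"
    return re.search(pattern, data) is not None

print(fullword_search(b"www.mydomain.com", b"domain"))   # preceded by "y"
print(fullword_search(b"www.my-domain.com", b"domain"))  # delimited by "-" and "."
```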
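Restrictions on combining modifiers
Modifier | Effect | Cannot be combined with |
---|---|---|
nocase | Case-insensitive match | xor base64 base64wide |
wide | Wide characters (UTF-16 style, two bytes per character) | |
ascii | Matches ASCII characters | |
xor | Single-byte XOR | nocase base64 base64wide |
base64 | Matches the base64-encoded string | nocase xor fullword |
base64wide | Matches the base64-encoded string in wide form (interleaved with 0x00) | nocase xor fullword |
fullword | Matches only complete words | base64 base64wide |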
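To see what byte sequences the wide, xor, and base64 modifiers are looking for, the following sketch builds those forms in plain Python. It assumes that xor without arguments covers every single-byte key from 0x00 to 0xFF, and it only shows the base64 form for a string aligned at the start of the encoded data (YARA also handles the shifted alignments). The helper name xor_forms is made up for this example.

```python
import base64

text = "This program cannot"

ascii_form = text.encode("ascii")
wide_form = text.encode("utf-16-le")  # each character followed by 0x00

def xor_forms(data: bytes) -> dict:
    """Map every single-byte key to data XORed with that key."""
    return {key: bytes(b ^ key for b in data) for key in range(256)}

base64_form = base64.b64encode(ascii_form)

print(wide_form[:8])
print(xor_forms(ascii_form)[0x01])
print(base64_form)
```

Key 0x00 leaves the string unchanged, so under this assumption a plain xor string also matches the original text.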
Regular expressions: enclose the pattern between a pair of forward slashes (regex tutorial: https://www.runoob.com/regexp/regexp-tutorial.html)
$re1 = /md5: [0-9a-fA-F]{32}/
$re2 = /state: (on|off)/
$re1 = /foo/i // case-insensitive
$re2 = /bar./s // In this regexp the dot matches everything, including new-line
$re3 = /baz./is // Both modifiers can be used together
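Regular expression special characters
Symbol | Meaning |
---|---|
\ | Escapes the next metacharacter (a backslash, pipe, asterisk, and so on) so it matches literally |
^ | Matches the beginning |
$ | Matches the end |
. | Matches any single character |
() | Groups the enclosed expression |
[] | Matches any one of the characters inside the brackets |
* | Matches 0 or more times |
+ | Matches at least once |
? | Matches 0 or 1 times |
{n} | Matches exactly n times |
{n,} | Matches at least n times |
{,m} | Matches at most m times |
{n,m} | Matches n to m times |
\t | Tab |
\n | Newline |
\r | Carriage return |
\xNN | The character with hexadecimal code NN |
\w | Matches a word character (letter, digit, or underscore) |
\W | Matches a non-word character |
\s | Matches a whitespace character |
\S | Matches a non-whitespace character |
\d | Matches a digit |
\D | Matches a non-digit |
\b | Word boundary |
\B | Non-word boundary |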
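The regular expressions shown earlier can be tried out with Python's re module, which uses a similar syntax for these constructs. This is only a sketch for experimenting: YARA has its own regex engine, and the mapping of the i and s modifiers to re.IGNORECASE and re.DOTALL is an analogy. The sample inputs are made up.

```python
import re

md5_re = re.compile(rb"md5: [0-9a-fA-F]{32}")   # /md5: [0-9a-fA-F]{32}/
state_re = re.compile(rb"state: (on|off)")      # /state: (on|off)/
foo_i = re.compile(rb"foo", re.IGNORECASE)      # /foo/i
bar_plain = re.compile(rb"bar.")                # /bar./
bar_s = re.compile(rb"bar.", re.DOTALL)         # /bar./s

line = b"md5: d41d8cd98f00b204e9800998ecf8427e state: off"

print(bool(md5_re.search(line)))
print(state_re.search(line).group(1))
print(bool(foo_i.search(b"FoO")))
print(bool(bar_plain.search(b"bar\n")), bool(bar_s.search(b"bar\n")))
```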
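Conditions are essentially Boolean expressions, as in ordinary programming languages.
Boolean operators: and, or, not
Relational operators: >=, <=, <, >, ==, !=
Arithmetic operators: +, -, *, \, % (\ is integer division)
Bitwise operators: &, |, <<, >>, ~, ^
The hash sign (#) gives the number of occurrences of a string.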
rule CountExample
{
strings:
$a = "dummy1"
$b = "dummy2"
condition:
#a == 6 and #b > 10 and
#a in (filesize-500..filesize) == 2 // occurrences can also be counted within a range
}
at tests whether a string occurs at a given file offset or virtual address.
rule AtExample
{
strings:
$a = "dummy1"
$b = "dummy2"
condition:
$a at 100 and $b at 200 // $a at offset 100 and $b at offset 200
}
in searches for a string within a range of offsets.
rule InExample
{
strings:
$a = "dummy1"
$b = "dummy2"
condition:
$a in (0..100) and $b in (100..filesize)
}
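The #, at, and in operators above can be mimicked on raw bytes in Python. The following is a minimal sketch with hypothetical helper names (offsets, count, at, in_range); it assumes that ranges are inclusive at both ends and that overlapping occurrences are counted, which may not reflect every detail of YARA's behaviour.

```python
def offsets(data: bytes, pattern: bytes) -> list:
    """Return every offset at which pattern occurs in data (overlaps allowed)."""
    found = []
    start = data.find(pattern)
    while start != -1:
        found.append(start)
        start = data.find(pattern, start + 1)
    return found

def count(data: bytes, pattern: bytes) -> int:
    """Analogue of #a: number of occurrences."""
    return len(offsets(data, pattern))

def at(data: bytes, pattern: bytes, offset: int) -> bool:
    """Analogue of '$a at offset': pattern starts exactly at offset."""
    return data.startswith(pattern, offset)

def in_range(data: bytes, pattern: bytes, low: int, high: int) -> bool:
    """Analogue of '$a in (low..high)': some occurrence starts within the range."""
    return any(low <= off <= high for off in offsets(data, pattern))

# Made-up sample: "dummy1" at offset 100 and "dummy2" at offset 200
data = b"x" * 100 + b"dummy1" + b"y" * 94 + b"dummy2"
filesize = len(data)

print(count(data, b"dummy1"), offsets(data, b"dummy2"))
print(at(data, b"dummy1", 100) and at(data, b"dummy2", 200))
print(in_range(data, b"dummy1", 0, 100) and in_range(data, b"dummy2", 100, filesize))
```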
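The filesize keyword holds the size of the file being scanned. The rule below matches files larger than 200KB. It only takes effect when scanning files.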
rule FileSizeExample
{
condition:
filesize > 200KB
}
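The entrypoint keyword holds the program's entry point. It is commonly used to check whether a file is packed or infected. Newer YARA versions deprecate it in favor of pe.entry_point from the pe module.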
rule EntryPointExample1
{
strings:
$a = { E8 00 00 00 00 }
condition:
$a at entrypoint
}
rule EntryPointExample2
{
strings:
$a = { 9C 50 66 A1 ?? ?? ?? 00 66 A9 ?? ?? 58 0F 85 }
condition:
$a in (entrypoint..entrypoint + 10)
}
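Reading data at a file offset or at a virtual address in memory. The intXX/uintXX functions read little-endian integers; the variants ending in be read big-endian integers.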
int8(<offset or virtual address>)
int16(<offset or virtual address>)
int32(<offset or virtual address>)
uint8(<offset or virtual address>)
uint16(<offset or virtual address>)
uint32(<offset or virtual address>)
int8be(<offset or virtual address>)
int16be(<offset or virtual address>)
int32be(<offset or virtual address>)
uint8be(<offset or virtual address>)
uint16be(<offset or virtual address>)
uint32be(<offset or virtual address>)
rule IsPE
{
condition:
// MZ signature at offset 0 and ...
uint16(0) == 0x5A4D and
// ... PE signature at offset stored in MZ header at 0x3C
uint32(uint32(0x3C)) == 0x00004550
}
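The IsPE condition can be reproduced outside YARA to check the arithmetic. The sketch below uses Python's struct module; the helpers uint16, uint32, and is_pe are written for this example only, and returning None for an out-of-bounds read is an assumption that stands in for YARA's undefined value. The sample header is synthetic.

```python
import struct

def uint16(data: bytes, offset: int):
    """Little-endian unsigned 16-bit read, or None if out of bounds."""
    if offset < 0 or offset + 2 > len(data):
        return None
    return struct.unpack_from("<H", data, offset)[0]

def uint32(data: bytes, offset: int):
    """Little-endian unsigned 32-bit read, or None if out of bounds."""
    if offset < 0 or offset + 4 > len(data):
        return None
    return struct.unpack_from("<I", data, offset)[0]

def is_pe(data: bytes) -> bool:
    # MZ signature at offset 0 ...
    if uint16(data, 0) != 0x5A4D:
        return False
    # ... and PE signature at the offset stored at 0x3C
    pe_offset = uint32(data, 0x3C)
    return pe_offset is not None and uint32(data, pe_offset) == 0x00004550

# Synthetic minimal header: "MZ", PE offset 0x80 stored at 0x3C, "PE\0\0" at 0x80
sample = bytearray(0x84)
sample[0:2] = b"MZ"
struct.pack_into("<I", sample, 0x3C, 0x80)
sample[0x80:0x84] = b"PE\x00\x00"

print(is_pe(bytes(sample)), is_pe(b"MZ"), is_pe(b"\x7fELF" + b"\x00" * 64))
```

Because the reads are little-endian, the bytes "MZ" (4D 5A) compare equal to 0x5A4D, which is why the constant in the rule looks reversed.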
String sets: a set can be written as a parenthesized list or with the * wildcard, and them stands for all strings in the rule.
rule OfExample1
{
strings:
$a = "dummy1"
$b = "dummy2"
$c = "dummy3"
$foo1 = "foo1"
$foo2 = "foo2"
$foo3 = "foo3"
condition:
2 of ($a,$b,$c) and
2 of ($foo*) and // equivalent to 2 of ($foo1,$foo2,$foo3)
1 of them // equivalent to 1 of ($*)
}
all of them // all strings in the rule
any of them // any string in the rule
all of ($a*) // all strings whose identifier starts by $a
any of ($a,$b,$c) // any of $a, $b or $c
1 of ($*) // same that "any of them"
none of ($b*) // zero of the set of strings that start with "$b"
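The set expressions above all reduce to counting how many strings in a set were found. The following sketch shows that idea in Python, using fnmatch for the * wildcard. The helper count_of and the sample data are hypothetical, and "N of" is taken to mean "at least N".

```python
import fnmatch

def count_of(data: bytes, strings: dict, selectors: list) -> tuple:
    """Return (matched, total) for the strings selected by the given identifiers or wildcards."""
    selected = [name for name in strings
                if any(fnmatch.fnmatchcase(name, sel) for sel in selectors)]
    matched = sum(1 for name in selected if strings[name] in data)
    return matched, len(selected)

strings = {
    "$a": b"dummy1", "$b": b"dummy2", "$c": b"dummy3",
    "$foo1": b"foo1", "$foo2": b"foo2", "$foo3": b"foo3",
}
data = b"header dummy1 body foo1 tail foo3"

abc = count_of(data, strings, ["$a", "$b", "$c"])
foo = count_of(data, strings, ["$foo*"])
them = count_of(data, strings, ["$*"])

print(abc[0] >= 2)          # 2 of ($a,$b,$c)
print(foo[0] >= 2)          # 2 of ($foo*)
print(them[0] >= 1)         # any of them
print(them[0] == them[1])   # all of them
print(count_of(data, strings, ["$b*"])[0] == 0)  # none of ($b*)
```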
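Applying a condition to each string in a set: inside the loop, # is the number of occurrences, @ is the offset of the first occurrence, and ! is the match length of the current string.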
for all of them : ( # > 3 )
for all of ($a*) : ( @ > @b )
Iterating over values
for any section in pe.sections : ( section.name == ".text" )
for any i in (0..pe.number_of_sections-1) : ( pe.sections[i].name == ".text" )
for any k,v in some_dict : ( k == "foo" and v == "bar" )
for <quantifier> <variables> in <iterable> : ( <some condition using the loop variables> )
Referencing other rules: a previously defined rule can be reused directly in another rule's condition.
rule Rule1
{
strings:
$a = "dummy1"
condition:
$a
}
rule Rule2
{
strings:
$a = "dummy2"
condition:
$a and Rule1
}
Global rules (global): their conditions are imposed on all other rules.
global rule SizeLimit
{
condition:
filesize < 2MB
}
Private rules: they produce no detection output and serve as supporting rules for other rules.
private rule PrivateRuleExample
{
...
}
Metadata: stores descriptive information about the rule.
rule MetadataExample
{
meta:
my_identifier_1 = "Some string data"
my_identifier_2 = 24
my_identifier_3 = true
strings:
$my_text_string = "text here"
$my_hex_string = { E2 34 A1 C8 23 FB }
condition:
$my_text_string or $my_hex_string
}
Importing modules (extension libraries)
import "pe"
import "cuckoo"
rule Test
{
strings:
$a = "some string"
condition:
$a and pe.entry_point == 0x1000
}
Including other YARA files
include "other.yar"
include "./includes/other.yar"
include "../includes/other.yar"
Installing the yara-python library
pip install yara-python
A simple demo
import os
import yara

# Collect the YARA rule files in a directory and compile them together.
def getRules(path):
    filepath = {}
    for index, file in enumerate(os.listdir(path)):
        rupath = os.path.join(path, file)
        key = "rule" + str(index)  # namespace for this rule file
        filepath[key] = rupath
    yararule = yara.compile(filepaths=filepath)
    return yararule

# Scan every file in a directory with the compiled rules.
def scan(rule, path):
    for file in os.listdir(path):
        mapath = os.path.join(path, file)
        if not os.path.isfile(mapath):
            continue  # skip subdirectories
        print(mapath)
        with open(mapath, 'rb') as fp:
            matches = rule.match(data=fp.read())
        if len(matches) > 0:
            print(file, matches)

if __name__ == '__main__':
    rulepath = "/home/authenticate/yara/rule_yara/"  # directory of YARA rules
    malpath = "/home/authenticate/yara/test_simple/"  # directory of samples to scan
    # Compile the YARA rules
    yararule = getRules(rulepath)
    # Run the scan
    scan(yararule, malpath)
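Rule writing comes down to writing strings and writing conditions, and neither is hard. Writing rules that are accurate, general, and low in false positives is still quite difficult, though; it takes practice and some imagination.
References:
Official GitHub repository (releases): https://github.com/VirusTotal/yara/releases
Official documentation: https://yara.readthedocs.io
Demo of using YARA from Python: https://blog.csdn.net/weixin_40596016/article/details/79865670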