llvm:Call Graph And Control Flow Graph

谭光辉

2023-12-01

前言

最近对llvm框架进行了初步的了解，才体会了llvm真正的魅力。它不仅是一个编译器框架，更是研究者们研究程序的一个有力的工具。本篇文章主要介绍一下如何对llvm的中间语言IR进行处理从而生成Call Graph(CG)和Control Flow Graph。

Call Graph

Call Graph又叫做函数调用图，用来记录程序中的函数调用关系的。比如：

    void foo(){
        a();
        b();
    }

在上面的例子中，foo函数调用了a函数和b函数，则程序的函数调用图为 foo -> a, foo -> b.

我们在llvm的IR层面上进行分析(llvm允许我们自己编写Pass分析程序来对IR进行分析或修改）。具体的Pass教程可参考how to write a pass

在实现Call Graph过程中所需要了解的是在IR中间代码中，调用函数有两种表示形式，一种是使用call调用，另一种是使用invoke调用。这两种形式的区别是call调用的函数中没有异常需要捕捉，而invoke调用的函数中有异常需要捕捉。因此在invoke指令中除了所调用的函数这一标签外还有exception这一标签

<result> = invoke [cconv] [ret attrs] <ptr to function ty> <function ptr val>(<function args>) [fn attrs]
 to label <normal label> unwind label <exception label>

上面除了normal标签（调用的函数）之外，还有exception标签（捕捉的异常）

所以我们对于CallInst和InvokeInst都要进行处理。

因此，我声明一个class继承自ModulePass，并且重载runOnModule(Module &M)函数。我们首先找到c/c++语言的入口函数main,并遍历main函数的每一个指令，使用dyn_cast函数来判断指令是callInst还是invokeInst,当是这两个指令的时候，解析这两个指令，通过getCalledFunction()函数来获得所调用的函数，如果该函数还没有被遍历到，就将其加入到栈中，等到以后遍历到。具体的函数过程如下所示：


bool CGPass::runOnModule(Module &M) {

    Function *main = M.getFunction("main");

    G = new CallGraph(main);
    G->valueList.push_back(main);
    if (!main) return false;

    std::deque<Function*> list;


    list.push_back(main);

    while (!list.empty()) {

        Function* func = list.front();

        list.pop_front();

        for (Function::iterator iter = func->begin(); iter != func->end(); ++iter)

        {



            for (BasicBlock::iterator Biter = iter->begin(); Biter != iter->end(); ++Biter)

            {

                Instruction *I = &*Biter;

                if (CallInst *inst = dyn_cast<CallInst>(I))

                {

                    //errs() <<"instruction\n";

                    Function* called = inst->getCalledFunction();

                    if (called)

                    {

                        //errs() <<"instruction1\n";
                        //errs() <<"instruction2\n";

                        G->AddEdge(func, called);

                        if (!G->hasFunction(called))

                        {
                            list.push_back(called);
                            G->valueList.push_back(called);
                        }

                        //}

                    }

                }

                if (InvokeInst *inst = dyn_cast<InvokeInst>(I))

                {

                    Function* called = inst->getCalledFunction();

                    errs() << "hello\n";

                    if (called)

                    {

                        G->AddEdge(func, called);

                        if (!G->hasFunction(called))

                        {
                            list.push_back(called);
                            G->valueList.push_back(called);
                        }

                    }

                }

            }

        }

    }

    //G->print();

    G->dump();

}

完整的代码请参考我的github

注：我在实际的实验过程中，发现c++的new函数在llvm中的表示为@_Znwm；并且由于C++的动态绑定特性，其所调用的虚函数只有在运行的时候才能确定，虚函数在c++中都保存在一个vtable中，并且在llvm中在调用虚函数的时候是直接取得该函数在vtable中的位置的指针，即该函数的指针。具体的操作可以参考如下示例：

 %16 = getelementptr inbounds void (%class.A*)*, void (%class.A*)** %15, i64 0
  %17 = load void (%class.A*)*, void (%class.A*)** %16, align 8
  call void %17(%class.A* %13)

因为vtable是从一个实例类的0偏移开始存储的，所以在该实例中从(%class.A*)*指向的地址开始获取，由于是第一个虚函数，所以getelementptr的偏移为0，然后调用load函数获得该虚函数，随后调用该虚函数。可是遗憾的是我们根据CallInst指令获得该函数的时候函数为空，可能因为函数的动态绑定，只能在运行的时候才能确定具体的函数吧。

Control Flow Graph

Control Flow Graph又叫做控制流程图，表示一个函数之间Basic Block的控制关系。Basic Block是一个函数中的基本块，在llvm中有BasicBlock表示基本块，并可以通过

for (Function::iterator B_iter = F.begin(); B_iter != F.end(); ++B_iter)

来遍历一个函数中的所有基本块。
并且在llvm中有专门的函数successors(BasicBlock *B)来获得BasicBlock B的后继基本块。所以获得一个函数的Control Flow Graph的逻辑还是比较简单的。关键代码如下：

bool runOnFunction(Function &F) override {
            raw_string_ostream rso(str);
            StringRef name(F.getName().str() + ".dot");

            enum sys::fs::OpenFlags F_None;
            raw_fd_ostream file(name, error, F_None);
            //std::ofstream os;
            //os.open(name.str() + ".dot");
            //if (!os.is_open()){
            //  errs() << "Could not open the " << name << "file\n";
            //  return false;
            //}
            file << "digraph \"CFG for'" + F.getName() + "\' function\" {\n";
            for (Function::iterator B_iter = F.begin(); B_iter != F.end(); ++B_iter){
                BasicBlock* curBB = &*B_iter;
                std::string name = curBB->getName().str();
                int fromCountNum;
                int toCountNum;
                if (basicBlockMap.find(curBB) != basicBlockMap.end())
                {
                    fromCountNum = basicBlockMap[curBB];
                }
                else
                {
                    fromCountNum = bbCount;
                    basicBlockMap[curBB] = bbCount++;
                }

                file << "\tBB" << fromCountNum << " [shape=record, label=\"{";
                file << "BB" << fromCountNum << ":\\l\\l";
                for (BasicBlock::iterator I_iter = curBB->begin(); I_iter != curBB->end(); ++I_iter) {
                    //printInstruction(&*I_iter, os);
                    file << *I_iter << "\\l\n";
                }
                file << "}\"];\n";
                for (BasicBlock *SuccBB : successors(curBB)){
                    if (basicBlockMap.find(SuccBB) != basicBlockMap.end())
                    {
                        toCountNum = basicBlockMap[SuccBB];
                    }
                    else
                    {
                        toCountNum = bbCount;
                        basicBlockMap[SuccBB] = bbCount++;
                    }

                    file << "\tBB" << fromCountNum<< "-> BB"
                        << toCountNum << ";\n";
                }
            }
            file << "}\n";
            file.close();
            return false;
        }
        //void printInstruction(Instruction *inst, std::ofstream os) {

    //}
    };

完整的代码请参考我的github

llvm:Call Graph And Control Flow Graph

前言

Call Graph

Control Flow Graph

相关阅读

相关文章

相关问答