[GCC Compilation and Optimization Series] static vs. inline: Differences and Connections (RT-Thread Technical Forum Featured Article)

冀越

2023-12-01

1 Where the problem came from

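Today I happened to notice a question on the RT-Thread forum, titled roughly "Building an RTT project with the RTT-VSCODE plugin gives a different result from RTT Studio". Build problems like this are my favorite kind to dig into, so I clicked through.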

The core of the problem is a function defined roughly like this (I have trimmed the code to keep the explanation short):

/* main.c */

inline void test_func(int a, int b)
{
    printf("%d, %d\n", a, b);
}

int main(int argc, const char *argv[])
{
    /* do something */
    
    /* call func */
    test_func(1, 2);
    
    return 0;
}

The problem: the same project code builds successfully in RT-Thread Studio but fails in VSCODE, and the error is, of all things, undefined reference to 'test_func'.

2 Analyzing the problem

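Anyone familiar with how C code is built will recognize undefined reference to 'test_func' as a classic link error: it occurs at the link stage, and it means the linker cannot find a definition (a function body) for test_func.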

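I bet you have plenty of question marks too???

Isn't test_func defined right there in main.c???

Isn't it sitting directly above main???

How can that possibly be a link error???

Isn't this how we write functions every day???

Could the inline be the troublemaker???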

3 Background knowledge

3.1 What does the inline keyword do?

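inline is a keyword in C++, and it has also been a standard C keyword since C99 (GCC supported it even earlier as an extension, with different semantics), which is why we see it so often in C code. Placing inline before the return type in a function declaration or definition marks the function as an inline function; this is a hint to the compiler, not a command. One C-specific detail matters for this article: in C99 and later, if a function is only ever declared with inline in a translation unit (without extern or static), its definition there is an "inline definition", and it does not provide an external definition that the linker can resolve calls against.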

3.2 How does inline differ from a macro?

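Macros are handled in the preprocessing stage and are pure text substitution with no syntax or type checking, so type mismatches and syntax problems that only show up after expansion can be a real headache;
inline functions are handled in the compilation stage with full syntax and type checking, and in a Debug build they can be debugged with breakpoints just like ordinary functions;
Because they are handled at different stages, a function-like macro whose expansion still contains a function call still pays the cost of that call, including setting up and tearing down the stack frame; an inline function that the compiler actually inlines has its body substituted at the call site, so no call overhead is incurred, which is where inline gets its efficiency.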
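
One concrete consequence of text substitution is that a macro argument is pasted in wherever the parameter appears, so an argument with side effects may be evaluated more than once, whereas a function argument is evaluated exactly once. The sketch below illustrates this; TWICE_MACRO, twice_inline, next_value and the two counting helpers are hypothetical names invented for this illustration, not code from the forum thread.

```c
/* Hypothetical illustration: none of these names come from the original post. */

static int call_count = 0;

/* Has a side effect: counts how many times it has been called. */
static int next_value(void)
{
    return ++call_count;
}

/* Text substitution: the argument expression appears twice after expansion. */
#define TWICE_MACRO(x) ((x) + (x))

/* Real function: the argument is evaluated once, then passed by value. */
static inline int twice_inline(int x)
{
    return x + x;
}

/* Returns how many times next_value() ran when passed through the macro. */
int macro_evaluations(void)
{
    call_count = 0;
    (void)TWICE_MACRO(next_value());
    return call_count;
}

/* Returns how many times next_value() ran when passed to the function. */
int inline_evaluations(void)
{
    call_count = 0;
    (void)twice_inline(next_value());
    return call_count;
}
```

Calling macro_evaluations() yields 2, while inline_evaluations() yields 1: the macro silently ran the side effect twice.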

3.3 How does an inline function differ from an ordinary function?

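As mentioned above, calling an ordinary function involves, at the assembly level, passing the arguments (pushed on the stack or placed in registers, depending on the calling convention), a call instruction, setting up a stack frame for the callee, and tearing the frame down when it returns. When a function is inlined, the compiler expands its body at the call site during compilation, which removes the cost of the call and of setting up and tearing down the frame.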

3.4 How does a static function differ from an ordinary function?

The only difference between the two is their visibility (linkage):

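A function without the static keyword can be called from anywhere in the project, i.e. it is global; as long as the function is compiled and its .o file takes part in the link, you will not get **undefined reference to 'xxx'**;
A function with the static keyword is visible only inside the C file that defines it, i.e. it changes from global to local. If a function in another C file tries to call it, sorry, the link stage will fail with **undefined reference to 'xxx'**.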
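
The visibility rule can be sketched in a single file. In this hypothetical example (scale_by_two and public_api are made-up names), another C file could declare and call public_api, but any attempt to call scale_by_two from outside this file would fail at link time.

```c
/* Hypothetical names, for illustration only. */

/* Internal linkage: visible only inside this C file. */
static int scale_by_two(int value)
{
    return value * 2;
}

/* External linkage: callable from any C file in the project. */
int public_api(int value)
{
    return scale_by_two(value) + 1;
}
```

When built without optimization, nm would typically list scale_by_two with a lowercase t (local) and public_api with an uppercase T (global).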

4 Solutions

Back to the original problem: how do we fix it? I see two approaches:

4.1 Give up the benefits of inline and make it an ordinary function

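This approach is simple: remove inline, downgrade the function to an ordinary one, and the build will link without errors. But the original author presumably added inline to get its efficiency, so removing it is clearly not the wisest choice.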

4.2 Add static to the inline function

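This approach solves the problem neatly. A function marked both static and inline is a static inline function: it is still visible only within the current C file, it keeps the properties of an inline function, and because it has internal linkage the compiler emits a local copy of it whenever a call is not inlined.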
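
For completeness, ISO C (C99 and later) also offers a way to keep the function externally visible: leave the inline definition as it is and add an extern inline declaration in exactly one C file, which makes that file emit the external definition. This is a minimal sketch assuming the compiler uses C99 inline semantics (in GCC's older gnu89 mode the keywords behave differently); add_pair and call_add_pair are hypothetical names, and this variant is not what the forum thread tested.

```c
/* Inline definition: on its own it does not give the linker a symbol. */
inline int add_pair(int a, int b)
{
    return a + b;
}

/* In C99 and later, this declaration makes this file emit add_pair
 * as an ordinary external function that the linker can resolve. */
extern inline int add_pair(int a, int b);

int call_add_pair(void)
{
    /* Links even when the call is not inlined, e.g. at -O0. */
    return add_pair(1, 2);
}
```

The static inline fix from 4.2 remains the simpler choice when the function is only needed in one file.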

5 Knowing what works, and why it works

5.1 Practice is the test of truth

To check that the change in 4.2 works, I ran a quick test in rt-thread/bsp/qemu-vexpress-a9 by adding the following test code to applications/main.c:

/* applications/main.c */
static inline void test_func(int a, int b)
{
    printf("%d, %d\n", a, b);
}

int main(void)
{
    printf("hello rt-thread\n");

    test_func(1, 2);

    return 0;
}

For the record, the cross toolchain I used is: gcc-arm-none-eabi-5_4-2016q3/bin/arm-none-eabi-gcc

Building with scons then succeeded as expected, and running rtthread.elf showed everything working normally.

And when I removed static, the expected link error duly appeared.

LINK rtthread.elf
build/applications/main.o: In function `main':
/home/recan/win_share_workspace/rt-thread-share/rt-thread/bsp/qemu-vexpress-a9/applications/main.c:253: undefined reference to `test_func'
collect2: error: ld returned 1 exit status
scons: *** [rtthread.elf] Error 1
scons: building terminated because of errors.

For further verification I added a compile option to CFLAGS in rtconfig.py: -save-temps=obj. This option makes the compiler keep the intermediate files it produces along the way, namely:

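xxx.i: the output of preprocessing; look at this file if, for example, you want to see what a macro expands to;
xxx.s: the assembly file produced by compiling the preprocessed xxx.i; it contains assembly instructions;
xxx.o: the binary object file generated for a single C file; this is the file that finally takes part in linking the executable.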

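I have written up the topic of **the complete process of compiling a C program with GCC** and am sharing it with everyone, since this knowledge is a great help when tracking down build problems.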

5.2 Analyzing the results

For comparison I ran the whole build twice, once with static and once without.

5.2.1 Comparing the .i files

The comparison, made with the Linux diff command, is as follows:

diff ./build/applications/main.i.nostatic ./build/applications/main.i.static
4516c4516
<             inline void test_func(int a, int b)
---
> static inline void test_func(int a, int b)

As expected, the nostatic version differs from the static one only by the missing static specifier; everything else is identical.

5.2.2 Comparing the .s files

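Comparing the .s files with a text diff tool shows that the version with static contains the assembly for test_func, while the version without static contains no body for the function at all. Strictly speaking the function was not "optimized away": the compiler still emitted a call to test_func, but for an inline definition without static or extern it does not emit a standalone copy of the function, which is consistent with C99 inline semantics.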

5.2.3 Comparing the .o files

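Since .o files are no longer readable text, we have to inspect them with command-line tools. I recommend nm on the Linux command line; see man nm for its purpose and usage. Here are the results directly: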

nm -a ./build/applications/main.o.nostatic | grep test_func
         U test_func

nm -a ./build/applications/main.o.static | grep test_func  
000002d8 t test_func

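OK, the key difference is now visible: in the version without static, the test_func defined in main.c is treated as an external function (marked U, undefined), whereas with static it is a locally implemented function (marked t, a local symbol in the text section).
A symbol marked U has to be implemented somewhere else, which explains why the nostatic version reports undefined reference to 'test_func': nothing else implements the function.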
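
Which set of inline rules applies depends on the language mode the compiler is running in. GCC documents two predefined macros, __GNUC_STDC_INLINE__ and __GNUC_GNU_INLINE__, that report whether inline follows the ISO C99 rules or GCC's older gnu89 rules. The helper below (inline_semantics is a hypothetical name) is a small sketch for checking this in a given build; whether a difference in such settings is what separated the two IDEs in the forum post is an assumption that this article does not verify.

```c
/* Reports which inline rules the current compilation uses.
 * The two macros are predefined by GCC; other compilers may define neither. */
const char *inline_semantics(void)
{
#if defined(__GNUC_STDC_INLINE__)
    return "ISO C99 inline semantics";
#elif defined(__GNUC_GNU_INLINE__)
    return "GNU89 inline semantics";
#else
    return "unknown (not reported by this compiler)";
#endif
}
```

Printing the returned string from both build environments would be one way to compare how each of them treats a plain inline definition.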

5.4 The ultimate experiment

5.4.1 Additional test code

To pin down the differences between these keywords, why adding inline still does not inline, and how to get real inlining, I extended the test code:

#include <stdio.h>

#if 0
/* only inline function : link error ! */
inline void test_func(int a, int b)
{
    printf("%d, %d\n", a, b);
}
#endif

/* normal function: OK */
void test_func1(int a, int b)
{
    printf("%d, %d\n", a, b);
}

/* static function: OK */
static void test_func2(int a, int b)
{
    printf("%d, %d\n", a, b);
}

/* static inline function: OK, but no real inline */
static inline void test_func3(int a, int b)
{
    printf("%d, %d\n", a, b);
}

/* always_inline is very important*/
#define FORCE_FUNCTION  __attribute__((always_inline))

/* static inline function: OK, it real inline. */
FORCE_FUNCTION static inline void test_func4(int a, int b)
{
    printf("%d, %d\n", a, b);
}

int main(int argc, const char *argv[])
{
    printf("Hello world !\n");

    /* call these functions with the same input parameters */
    //test_func(1, 2);
    test_func1(1, 2); // normal
    test_func2(1, 2); // static
    test_func3(1, 2); // static inline (real inline ?)
    test_func4(1, 2); // static inline (real inline ?)

    return 0;
}

5.4.2 Build and verify

Run the build:

gcc main.c -save-temps=obj -Wall -o test_static -Wl,-Map=test_static.map

It builds successfully and runs without any problem.

./test_static 
Hello world !
1, 2
1, 2
1, 2
1, 2

5.4.3 Deeper analysis

From the earlier sections we know to focus on the .s and .o files. Since the .o file is not readable, we inspect it with nm -a:

 nm -a test_static.o | grep test_func
0000000000000000 T test_func1
000000000000002e t test_func2
000000000000005c t test_func3

test_func4 is not in the list, so it looks like it was really inlined?
Let's open the .s file to confirm:

    .file   "main.c"
    .text
    .section    .rodata
.LC0:
    .string "%d, %d\n"
    .text
    .globl  test_func1
    .type   test_func1, @function
test_func1:
.LFB0:
    .cfi_startproc
    endbr64
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    subq    $16, %rsp
    movl    %edi, -4(%rbp)
    movl    %esi, -8(%rbp)
    movl    -8(%rbp), %edx
    movl    -4(%rbp), %eax
    movl    %eax, %esi
    leaq    .LC0(%rip), %rdi
    movl    $0, %eax
    call    printf@PLT
    nop
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:
    .size   test_func1, .-test_func1
    .type   test_func2, @function
test_func2:
.LFB1:
    .cfi_startproc
    endbr64
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    subq    $16, %rsp
    movl    %edi, -4(%rbp)
    movl    %esi, -8(%rbp)
    movl    -8(%rbp), %edx
    movl    -4(%rbp), %eax
    movl    %eax, %esi
    leaq    .LC0(%rip), %rdi
    movl    $0, %eax
    call    printf@PLT
    nop
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE1:
    .size   test_func2, .-test_func2
    .type   test_func3, @function
test_func3:
.LFB2:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    subq    $16, %rsp
    movl    %edi, -4(%rbp)
    movl    %esi, -8(%rbp)
    movl    -8(%rbp), %edx
    movl    -4(%rbp), %eax
    movl    %eax, %esi
    leaq    .LC0(%rip), %rdi
    movl    $0, %eax
    call    printf@PLT
    nop
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE2:
    .size   test_func3, .-test_func3
    .section    .rodata
.LC1:
    .string "Hello world !"
    .text
    .globl  main
    .type   main, @function
main:
.LFB4:
    .cfi_startproc
    endbr64
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    subq    $32, %rsp
    movl    %edi, -20(%rbp)
    movq    %rsi, -32(%rbp)
    leaq    .LC1(%rip), %rdi
    call    puts@PLT
    movl    $2, %esi
    movl    $1, %edi
    call    test_func1
    movl    $2, %esi
    movl    $1, %edi
    call    test_func2
    movl    $2, %esi
    movl    $1, %edi
    call    test_func3
    movl    $1, -8(%rbp)
    movl    $2, -4(%rbp)
    movl    -4(%rbp), %edx
    movl    -8(%rbp), %eax
    movl    %eax, %esi
    leaq    .LC0(%rip), %rdi
    movl    $0, %eax
    call    printf@PLT
    nop
    movl    $0, %eax
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE4:
    .size   main, .-main
    .ident  "GCC: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0"
    .section    .note.GNU-stack,"",@progbits
    .section    .note.gnu.property,"a"
    .align 8
    .long    1f - 0f
    .long    4f - 1f
    .long    5
0:
    .string  "GNU"
1:
    .align 8
    .long    0xc0000002
    .long    3f - 2f
2:
    .long    0x3
3:
    .align 8
4:

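From this we can see that test_func1 differs from test_func2 in that test_func1 is GLOBAL (it has a .globl directive) while test_func2 is LOCAL; test_func2 and test_func3 are essentially identical (test_func3 only lacks the endbr64 instruction), which means static inline did not get test_func3 inlined at all. That is expected here: the build command passes no -O option, and GCC does not inline when not optimizing unless the function is marked always_inline.
Looking for test_func4, we find no trace of it. Was it really inlined? Let's look at the calls in main: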

main:
.LFB4:
    .cfi_startproc
    endbr64
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    subq    $32, %rsp
    movl    %edi, -20(%rbp)
    movq    %rsi, -32(%rbp)
    leaq    .LC1(%rip), %rdi
    call    puts@PLT

    movl    $2, %esi
    movl    $1, %edi
    call    test_func1  // call to test_func1

    movl    $2, %esi
    movl    $1, %edi
    call    test_func2  // call to test_func2

    movl    $2, %esi
    movl    $1, %edi
    call    test_func3  // call to test_func3

    // "call" to test_func4: inlined, the body is copied here and there is no real function call
    movl    $1, -8(%rbp)
    movl    $2, -4(%rbp)
    movl    -4(%rbp), %edx
    movl    -8(%rbp), %eax
    movl    %eax, %esi
    leaq    .LC0(%rip), %rdi
    movl    $0, %eax
    call    printf@PLT
    nop
    movl    $0, %eax
    leave


    .cfi_def_cfa 7, 8

Wow, sure enough, this is what real inlining looks like. The mystery is finally unveiled.
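
Whether a plain inline hint is honored can also be checked at compile time. GCC documents a predefined macro __NO_INLINE__ that is defined when no functions will be inlined into their callers (when not optimizing, or with -fno-inline); always_inline functions are the exception, as test_func4 showed. A minimal sketch, with inlining_disabled as a hypothetical helper name:

```c
/* Returns 1 when the compiler reports that ordinary inline hints are ignored
 * in this build (for GCC: no -O option, or -fno-inline), otherwise 0. */
int inlining_disabled(void)
{
#if defined(__NO_INLINE__)
    return 1;
#else
    return 0;
#endif
}
```

For a GCC build like the one in 5.4.2, which passes no -O option, this would be expected to return 1, matching the observation that test_func3 stayed an ordinary call.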

5.5 Lessons learned

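inline has pros and cons. When you use it, it is best to pair it with static; otherwise it can cause problems beyond what you would expect.
Adding inline does not mean the compiler will inline just because you asked; it also depends on how the code is written and on the optimization level.
To force inlining you need an extra attribute, and every C compiler spells it differently; with gcc, for example, mark the function definition with **__attribute__((always_inline))**.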
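
Because the spelling differs between compilers, projects often hide it behind a single macro. The sketch below assumes GCC/Clang's __attribute__((always_inline)) and MSVC's __forceinline, and falls back to plain static inline elsewhere; FORCE_INLINE, clamp_u8 and saturate_sample are hypothetical names chosen for this illustration.

```c
/* Hypothetical portability macro; extend it for other toolchains as needed. */
#if defined(__GNUC__) || defined(__clang__)
#define FORCE_INLINE static inline __attribute__((always_inline))
#elif defined(_MSC_VER)
#define FORCE_INLINE static __forceinline
#else
#define FORCE_INLINE static inline   /* only a hint on other compilers */
#endif

/* Clamps a value to the range 0..255. */
FORCE_INLINE int clamp_u8(int value)
{
    if (value < 0) {
        return 0;
    }
    if (value > 255) {
        return 255;
    }
    return value;
}

/* Ordinary external function whose body contains the inlined clamp. */
int saturate_sample(int value)
{
    return clamp_u8(value);
}
```

Keeping static in the macro follows the advice above: the function never needs an external definition, so the link error from section 1 cannot occur.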