The Ultimate Guide to Reducing Code Size with GNU GCC for ARM Cortex-M
The embedded industry has been talking about code size for decades. In particular, this discussion has applied to microcontroller compilers. Nowadays, more or less every C/C++ compiler on the market is very good at optimizations, and differences in code-size are pretty minimal in most cases.
Additionally, different compilers produce different results depending on the source code you feed into them. A compiler that beats the competition on one set of source files may itself be beaten by another compiler on a different set. Even so, there might be times when you want to optimize the code size as much as possible. Read this blog post to learn how to do this with the GNU GCC C/C++ compiler for ARM Cortex-M!
Many developers are using the GNU C/C++ compiler tool suite for ARM Cortex devices (for example, many STM32, LPC or Kinetis developers). Furthermore, this excellent compiler is also included in many popular ARM Cortex IDEs; most notably, Atollic TrueSTUDIO.
What options do you have if you want to produce a size-optimized build of an ARM Cortex project using the ARM GNU GCC compiler?
To start with, instruct the compiler to optimize for size in release mode builds. You do this using the -Os compiler command line option. This produces a release build that is optimized for small size.
If you want to reduce the code size of debug mode builds (i.e., builds with debug instrumentation), use the -Og compiler command line option instead. -Og enables only those optimizations that do not interfere with debugging, so the code cannot be optimized as much as in release mode, as the binary must still be debuggable. It usually still becomes a lot smaller than a build with no optimization at all (-O0, the default).
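As a rough sketch of the two build flavors described above, the script below compiles the same file once with -Os and once with -Og -g and compares the object sizes. The arm-none-eabi- tool prefix, the -mcpu=cortex-m4 target, and the file names are assumptions made for illustration; substitute the values for your own toolchain and device.

```shell
#!/bin/sh
# Sketch only: tool prefix, CPU, and file names are illustrative assumptions.
CC="${CC:-arm-none-eabi-gcc}"
SIZE="${SIZE:-arm-none-eabi-size}"
ARCH_FLAGS="-mcpu=cortex-m4 -mthumb"
RELEASE_FLAGS="$ARCH_FLAGS -Os"      # release build, optimized for size
DEBUG_FLAGS="$ARCH_FLAGS -Og -g"     # debug build, debugger-friendly optimization

workdir=$(mktemp -d)
cat > "$workdir/demo.c" <<'EOF'
int sum_to(int n) { int s = 0; int i; for (i = 1; i <= n; i++) s += i; return s; }
EOF

if command -v "$CC" >/dev/null 2>&1; then
    # The flag variables are left unquoted on purpose so they split into options.
    $CC $RELEASE_FLAGS -c "$workdir/demo.c" -o "$workdir/demo_release.o"
    $CC $DEBUG_FLAGS   -c "$workdir/demo.c" -o "$workdir/demo_debug.o"
    # Print the text/data/bss sizes of both objects side by side.
    if command -v "$SIZE" >/dev/null 2>&1; then
        "$SIZE" "$workdir/demo_release.o" "$workdir/demo_debug.o"
    fi
else
    echo "$CC not found; the commands above are shown for reference"
fi
rm -rf "$workdir"
```

On a one-function file the difference will be small; the comparison becomes more telling on a real project.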
Commercially polished GNU tools, like Atollic TrueSTUDIO, provide simple GUI options to enable different types of compiler optimization levels.
The compiler translates the C/C++ source code file being compiled into an intermediate object file, that is later fed into the linker for final processing. The job of the linker is to combine code and data from many object files into one flashable output binary file.
Because the compiler only compiles one source code file at a time, it cannot know if code or data in the file is unused or not. Sure, it knows if a function in the file is called by another function in the same file. But if the function in a file is not called by any other function in the same file, the compiler has no way of knowing if the function is called by a function in some other file, or if the function is unused and can be removed.
This knowledge is only available at link time, in the final phase where the linker combines all the intermediate object files and resolves any interdependencies across the files. This gives the linker the power to remove unused code and data objects from the output binary flash file, thus saving valuable memory and reducing waste. But the compiler can’t do it.
The only problem is that the GNU toolchain does not enable this linker feature by default. Dead code and dead data removal must be enabled manually. Thus, many rookie developers trying out the GNU tools end up with massive binary output files, as neither unused application functions nor unused library functions are removed.
Commercially polished distributions of the GNU ARM compiler often handle this. The Atollic TrueSTUDIO IDE, for example, has dead code and dead data removal enabled by default. If you haven’t taken the smooth path of using a commercially polished GNU GCC for ARM distribution, like Atollic TrueSTUDIO, you will have to enable this yourself.
To enable dead code removal, all the C/C++ source code files must be compiled using the command line option -ffunction-sections, which places each function in its own section. Furthermore, the linker must be launched with the linker command line option --gc-sections (written as -Wl,--gc-sections when you link through the gcc driver rather than calling ld directly).
To enable dead data removal, all the C/C++ source code files must be compiled using the command line option -fdata-sections, which places each data object in its own section. The linker again needs the --gc-sections option.
Beware that any 3rd party software libraries that you receive in binary format may not be built in this way. This means that any unused functions in the library might not be removed by the linker, and your flashable binary output file may be bloated.
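The dead code and dead data settings described above can be sketched as the following script. The tool prefix, the CPU option, and the demo source are assumptions for illustration. The -nostdlib and -Wl,-e,main options are there only so the demo links without startup code or a custom linker script; they are not part of the technique.

```shell
#!/bin/sh
# Sketch only: tool prefix, CPU, and file names are illustrative assumptions.
CC="${CC:-arm-none-eabi-gcc}"
ARCH_FLAGS="-mcpu=cortex-m4 -mthumb"
# One section per function and per data object, so the linker can drop them.
CFLAGS="$ARCH_FLAGS -Os -ffunction-sections -fdata-sections"
# The -Wl, prefix forwards an option from the gcc driver to the linker.
LDFLAGS="$ARCH_FLAGS -Wl,--gc-sections -Wl,--print-gc-sections"

workdir=$(mktemp -d)
cat > "$workdir/demo.c" <<'EOF'
int unused_table[64] = {1};                 /* never referenced */
int unused_fn(int x) { return x * 3; }      /* never called     */
int main(void) { return 0; }
EOF

if command -v "$CC" >/dev/null 2>&1; then
    $CC $CFLAGS -c "$workdir/demo.c" -o "$workdir/demo.o"
    # --print-gc-sections asks the linker to list every section it discards.
    # A real project would link its own startup files and linker script
    # instead of using -nostdlib and -Wl,-e,main.
    $CC $LDFLAGS -nostdlib -Wl,-e,main "$workdir/demo.o" -o "$workdir/demo.elf"
else
    echo "$CC not found; the commands above are shown for reference"
fi
rm -rf "$workdir"
```

With these flags the linker is expected to list the sections holding unused_fn and unused_table among the removed ones, which is a quick way to confirm that garbage collection is active in your own build.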
If you use C++, make sure to disable RTTI and exception handling, as these can produce a lot of extra code. Use the compiler options -fno-rtti and -fno-exceptions to do this.
Atollic TrueSTUDIO does this by default, although you can easily re-configure this should you want to override this behavior for full C++ functionality.
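A minimal sketch of a C++ compile step with RTTI and exception handling disabled might look like this. The arm-none-eabi-g++ name, the CPU option, and the demo class are assumptions for illustration.

```shell
#!/bin/sh
# Sketch only: tool prefix, CPU, and file names are illustrative assumptions.
CXX="${CXX:-arm-none-eabi-g++}"
ARCH_FLAGS="-mcpu=cortex-m4 -mthumb"
CXXFLAGS="$ARCH_FLAGS -Os -fno-rtti -fno-exceptions"

workdir=$(mktemp -d)
cat > "$workdir/demo.cpp" <<'EOF'
struct Led { bool state; virtual void toggle() { state = !state; } };
void blink(Led &led) { led.toggle(); }
EOF

if command -v "$CXX" >/dev/null 2>&1; then
    # Virtual functions still work. Code that uses try/catch/throw, typeid,
    # or a dynamic_cast that needs run-time type information is rejected.
    $CXX $CXXFLAGS -c "$workdir/demo.cpp" -o "$workdir/demo.o"
else
    echo "$CXX not found; the command above is shown for reference"
fi
rm -rf "$workdir"
```

Because the compiler rejects the disabled features outright, any part of the code base that depends on them shows up as a build error rather than as a silent change in behavior.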
The GNU compiler traditionally ships with the Newlib C runtime library by default. This library is standards compliant, but not compact enough for most Cortex-M based projects. You can use the Newlib Nano library for smaller code size.
Additionally, Atollic TrueSTUDIO contains a super-compact “tiny printf” implementation for further code-size reductions.
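In many GNU ARM toolchain distributions, Newlib Nano is selected at link time with --specs=nano.specs. The sketch below assumes that your toolchain ships nano.specs and nosys.specs (the latter provides stub system calls so the demo links without board support code). The tool prefix, CPU option, and demo program are likewise illustrative.

```shell
#!/bin/sh
# Sketch only: tool prefix, CPU, spec files, and file names are assumptions.
CC="${CC:-arm-none-eabi-gcc}"
ARCH_FLAGS="-mcpu=cortex-m4 -mthumb"
CFLAGS="$ARCH_FLAGS -Os -ffunction-sections -fdata-sections"
# nano.specs selects Newlib Nano; nosys.specs supplies stub system calls.
LDFLAGS="$ARCH_FLAGS --specs=nano.specs --specs=nosys.specs -Wl,--gc-sections"

workdir=$(mktemp -d)
cat > "$workdir/demo.c" <<'EOF'
#include <stdio.h>
int main(void) { printf("value=%d\n", 42); return 0; }
EOF

if command -v "$CC" >/dev/null 2>&1; then
    if $CC $CFLAGS -c "$workdir/demo.c" -o "$workdir/demo.o" &&
       $CC $LDFLAGS "$workdir/demo.o" -o "$workdir/demo_nano.elf"; then
        echo "linked with --specs=nano.specs"
    else
        echo "build failed; check that this toolchain provides nano.specs"
    fi
else
    echo "$CC not found; the commands above are shown for reference"
fi
rm -rf "$workdir"
```

Linking the same object once more without --specs=nano.specs and comparing the two ELF files with the size tool shows how much the library choice matters for your own code.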
As I outlined above, the C compiler traditionally compiles one source code file at a time, and translates it to a corresponding intermediate object file. The linker finishes the process by combining all intermediate object files, resolving any interdependencies, and producing a flashable binary output file.
A consequence of this architecture is that the C compiler has to treat each C file as an individual sandbox during compilation. When it applies various optimization techniques, it can only do so on the file itself – without considering how optimizations could be improved if it knew the contents of the other files.
And so, there is one more trick the toolchain can do to reduce the code-size of your project. That is to optimize the code globally, i.e. also across different C files, not only within the individual C files.
This is now possible with the GNU compiler toolchain, using a re-sequencing technique called LTO (Link Time Optimization). With LTO, some optimization algorithms are moved from the compiler to the linker – where they can be applied to the whole code base, not just locally inside each file in isolation. This enables the toolchain to optimize also across files, not just within them. This can reduce the code size further.
You can use the command line option -flto on both the compiler and linker to enable LTO. Keep in mind, however, that this is a reasonably new addition to the GNU toolchain that has not yet been proven in the field for years. Your mileage may thus vary. Since LTO is an experimental feature at this time, Atollic does not provide any quality guarantee or support related to LTO. But it can be worth exploring this further if you are aggressively trying to reduce the code size of your ARM Cortex-M project, and are prepared to be at the forefront of testing new technologies.
The GNU GCC compiler for ARM Cortex-M is a rock solid, proven C/C++ compiler with strong optimization. But with additional knowledge and hand-crafted tweaking, you might get much better results than with the default settings. Alternatively, you can choose a commercially polished ARM Cortex-M IDE, like Atollic TrueSTUDIO, that handles most of these issues automatically.
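As a hedged sketch of an LTO build, the script below passes -flto (together with the optimization level) to every compile step and to the link step. The tool prefix, CPU option, and the two demo files are assumptions for illustration. As in any self-contained demo, -nostdlib and -Wl,-e,main are used only to avoid needing startup code.

```shell
#!/bin/sh
# Sketch only: tool prefix, CPU, and file names are illustrative assumptions.
CC="${CC:-arm-none-eabi-gcc}"
ARCH_FLAGS="-mcpu=cortex-m4 -mthumb"
# The same flags are used for compiling and for linking.
LTO_FLAGS="$ARCH_FLAGS -Os -flto"

workdir=$(mktemp -d)
cat > "$workdir/scale.c" <<'EOF'
int scale(int x) { return x * 4; }
EOF
cat > "$workdir/main.c" <<'EOF'
int scale(int x);                       /* defined in scale.c */
int main(void) { return scale(10); }
EOF

if command -v "$CC" >/dev/null 2>&1; then
    if $CC $LTO_FLAGS -c "$workdir/scale.c" -o "$workdir/scale.o" &&
       $CC $LTO_FLAGS -c "$workdir/main.c"  -o "$workdir/main.o" &&
       $CC $LTO_FLAGS -nostdlib -Wl,-e,main \
           "$workdir/scale.o" "$workdir/main.o" -o "$workdir/demo.elf"; then
        echo "LTO build succeeded"
    else
        echo "LTO build failed; this toolchain may lack LTO support"
    fi
else
    echo "$CC not found; the commands above are shown for reference"
fi
rm -rf "$workdir"
```

Here the call from main.c into scale.c crosses a file boundary, which is the kind of case where link-time optimization has more information than a per-file compile. Whether it actually shrinks a given project has to be measured.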