Hints to the compiler to enable or disable loop unrolling and jamming. These pragmas can only be applied to iterative FOR loops.
In short: these pragmas tell the compiler to turn loop unroll-and-jam on or off, and they apply only to for loops.
Syntax
#pragma unroll_and_jam
#pragma unroll_and_jam (n)
#pragma nounroll_and_jam
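Terminology note: a pragma is a compiler directive, that is, an instruction telling the compiler what to do. "Fuse" and "jam" mean the same thing here; "jam" is simply the customary term.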
Arguments
n: the unrolling factor, representing the number of times to unroll a loop; it is an integer constant from 0 through 255.
Note: n is the unrolling factor, an integer from 0 to 255. In my own testing, however, the latest Intel compiler (Version 19.1.1.217 Build 20200306) accepted a maximum of only 16.
Description
The unroll_and_jam pragma partially unrolls one or more loops higher in the nest than the innermost loop and fuses/jams the resulting loops back together. This transformation allows more data reuse in the loop.
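That is, the unroll_and_jam pragma unrolls loops other than the innermost one (so it needs a nest at least two levels deep and cannot be used to unroll a single loop), then fuses (jams) the unrolled copies back together. This is loosely reminiscent of OpenMP's collapse, but only loosely: collapse merges the iteration spaces of nested loops into one, whereas unroll-and-jam replicates the body of an outer loop and fuses the resulting copies of the inner loop.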
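The two steps described above can be written out by hand. The sketch below is not compiler output; it is a minimal illustration with a factor of 2, and the function names, ROWS and COLS are hypothetical choices made for this example.

```c
#include <assert.h>

#define ROWS 4   /* illustrative size; even, so a factor of 2 needs no remainder loop */
#define COLS 3

/* Original nest: y[i] = sum over j of a[i][j] * x[j]. */
void matvec_original(int a[ROWS][COLS], const int x[COLS], int y[ROWS])
{
    for (int i = 0; i < ROWS; i++) {
        y[i] = 0;
        for (int j = 0; j < COLS; j++)
            y[i] += a[i][j] * x[j];
    }
}

/* Step 1 (unroll): the outer loop advances by 2, so the inner loop
   now appears twice, once for row i and once for row i + 1. */
void matvec_unrolled(int a[ROWS][COLS], const int x[COLS], int y[ROWS])
{
    assert(ROWS % 2 == 0);
    for (int i = 0; i < ROWS; i += 2) {
        y[i] = 0;
        y[i + 1] = 0;
        for (int j = 0; j < COLS; j++)
            y[i] += a[i][j] * x[j];
        for (int j = 0; j < COLS; j++)
            y[i + 1] += a[i + 1][j] * x[j];
    }
}

/* Step 2 (jam): the two inner loops are fused into one, so each x[j]
   is used for both rows within the same inner iteration. */
void matvec_jammed(int a[ROWS][COLS], const int x[COLS], int y[ROWS])
{
    assert(ROWS % 2 == 0);
    for (int i = 0; i < ROWS; i += 2) {
        y[i] = 0;
        y[i + 1] = 0;
        for (int j = 0; j < COLS; j++) {
            y[i]     += a[i][j]     * x[j];
            y[i + 1] += a[i + 1][j] * x[j];
        }
    }
}
```

All three functions compute the same result; only the loop structure differs. With a single loop there is no inner loop to fuse, which is why the pragma needs a nest at least two levels deep.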
This pragma is not effective on innermost loops. Ensure that the immediately following loop is not the innermost loop after compiler-initiated interchanges are completed.
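In other words, the pragma has no effect on the innermost loop. Because the compiler may interchange (reorder) the loops of a nest, make sure the loop immediately after the pragma is still not the innermost one once those compiler-initiated interchanges are done.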
Specifying this pragma is a hint to the compiler that the unroll and jam sequence is legal and profitable. The compiler enables this transformation whenever possible.
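In other words, the pragma tells the compiler that unrolling and jamming this loop nest is legal and worthwhile, and the compiler applies the transformation whenever it can.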
The unroll_and_jam pragma must precede the FOR statement for each FOR loop it affects. If n is specified, the optimizer unrolls the loop n times. If n is omitted or if it is outside the allowed range, the optimizer assigns the number of times to unroll the loop. The compiler generates correct code by comparing n and the loop count.
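In other words, the pragma must come before the for statement. If n is given, the optimizer unrolls the loop n times; if n is omitted or out of range, the optimizer picks the factor itself (in my experience this works less well than specifying n by hand). The compiler compares n with the loop count to generate correct code.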
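One way to picture "comparing n and the loop count" is a main loop that handles full groups of n outer iterations plus a remainder loop for the leftovers. The following is a hand-written sketch under that assumption, not what the Intel compiler actually emits; the function name matvec_jam4 and the constant UNROLL are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { UNROLL = 4 };  /* hypothetical unroll-and-jam factor for this sketch */

/* y[i] = sum over j of a[i*cols + j] * x[j], with the outer loop
   unrolled and jammed by UNROLL and a remainder loop for leftover rows. */
void matvec_jam4(size_t rows, size_t cols,
                 const double *a, const double *x, double *y)
{
    assert(a != NULL && x != NULL && y != NULL);

    size_t i = 0;

    /* Main part: runs only while a full group of UNROLL rows remains. */
    for (; i + UNROLL <= rows; i += UNROLL) {
        double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
        for (size_t j = 0; j < cols; j++) {
            double xj = x[j];  /* read once, used for four rows */
            s0 += a[(i + 0) * cols + j] * xj;
            s1 += a[(i + 1) * cols + j] * xj;
            s2 += a[(i + 2) * cols + j] * xj;
            s3 += a[(i + 3) * cols + j] * xj;
        }
        y[i + 0] = s0;
        y[i + 1] = s1;
        y[i + 2] = s2;
        y[i + 3] = s3;
    }

    /* Remainder: the last rows % UNROLL rows use the original loop body. */
    for (; i < rows; i++) {
        double s = 0.0;
        for (size_t j = 0; j < cols; j++)
            s += a[i * cols + j] * x[j];
        y[i] = s;
    }
}
```

When rows is smaller than UNROLL, the main loop never executes and only the remainder loop runs, so the result stays correct for any loop count.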
This pragma is supported only when compiler option O3 is set. The unroll_and_jam pragma overrides any setting of loop unrolling from the command line.
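In other words, the pragma is supported only when compiler option O3 is enabled, and it overrides any loop-unrolling setting given on the command line (for example, -funroll-loops).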
When unrolling a loop increases register pressure and code size it may be necessary to prevent unrolling of a nested/imperfect nested loop. In such cases, use the nounroll_and_jam pragma. The nounroll_and_jam pragma hints to the compiler not to unroll a specified loop.
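In other words, when unrolling a loop would increase register pressure and code size, it may be necessary to keep a nested or imperfectly nested loop from being unrolled. In that case, use the nounroll_and_jam pragma, which hints to the compiler not to unroll the specified loop.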
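A minimal sketch of where nounroll_and_jam would go, assuming it is placed the same way as unroll_and_jam, directly before the outer for statement. The function name and the size N are hypothetical. A compiler that does not recognize the pragma should simply ignore it, possibly with a warning.

```c
#include <assert.h>

#define N 8  /* illustrative size, not taken from the original example */

/* Imperfect nest: statements sit between the outer and inner loop
   headers, and after the inner loop. */
void scaled_row_sums(int m[N][N], int out[N])
{
    assert(m && out);

    /* Hint: do not unroll and jam the nest that follows. */
#pragma nounroll_and_jam
    for (int i = 0; i < N; i++) {
        int acc = 0;                 /* outside the inner loop */
        for (int j = 0; j < N; j++)
            acc += m[i][j];
        out[i] = acc * (i + 1);      /* outside the inner loop */
    }
}
```

The pragma only affects how the loop is compiled; the computed values are the same with or without it.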
Example: Using unroll_and_jam pragma
int a[10][10];
int b[10][10];
int c[10][10];
int d[10][10];

/* The caller must pass n <= 10, the extent of the arrays above. */
void unroll(int n)
{
    int i, j, k;
#pragma unroll_and_jam (6)
    for (i = 1; i < n; i++) {
#pragma unroll_and_jam (6)
        for (j = 1; j < n; j++) {
            for (k = 1; k < n; k++) { // innermost loop: the pragma cannot be placed here
                a[i][j] += b[i][k] * c[k][j];
            }
        }
    }
}
Parallel Programming Guide for HP-UX Systems, p. 88
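How it works
The purpose of jamming is data reuse.
The unroll-and-jam transformation is mainly meant to increase register utilization and to reduce memory loads and stores across the iterations of a nested loop. Better register usage reduces the need for memory accesses and allows better use of certain machine instructions.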
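The reduction in memory accesses can be made visible by counting reads in a hand-transformed version of the matrix multiplication from the example above. This is only a sketch: the counter records reads written in the source, not what the hardware does, and a real compiler may already keep values in registers. The helper load_c, the counter c_loads, the size N and the factor of 2 are all assumptions made for illustration.

```c
#include <assert.h>

#define N 4  /* illustrative size; even, so a factor of 2 needs no remainder loop */

long c_loads = 0;  /* number of source-level reads of c[k][j] */

/* Read c[k][j] and count the read. */
int load_c(int c[N][N], int k, int j)
{
    c_loads++;
    return c[k][j];
}

/* Plain triple loop: c[k][j] is read once per (i, j, k). */
void matmul_plain(int a[N][N], int b[N][N], int c[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                a[i][j] += b[i][k] * load_c(c, k, j);
}

/* Outer i loop unrolled by 2 and jammed: each c[k][j] is read once
   and reused for rows i and i + 1, halving the reads of c. */
void matmul_jam2(int a[N][N], int b[N][N], int c[N][N])
{
    assert(N % 2 == 0);
    for (int i = 0; i < N; i += 2)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++) {
                int ckj = load_c(c, k, j);  /* held in a local, reused below */
                a[i][j]     += b[i][k]     * ckj;
                a[i + 1][j] += b[i + 1][k] * ckj;
            }
}
```

With N = 4 the plain version reads c a total of N*N*N = 64 times and the jammed version 32 times, while both produce the same matrix a. Reusing the same value shortly after it was first read is the temporal kind of reuse.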
According to p. 74 of the book cited above, data reuse falls into two kinds: spatial reuse and temporal reuse.
Miscellaneous
unroll_and_jam pragma ignored but no reason specified
My knowledge is limited; if you spot any mistakes, corrections are welcome!