当我们在做多线程编程的时候,会涉及到一个称为memory order的问题。
例如
int x(0),y(0);
x=4;
y=3;
请问,实际执行的时候,这两条赋值语句谁先执行,谁后执行? 会不会有某个时间点,在某个CPU看来,y比x大?
答案很复杂。本文的目的是从非常实践的角度来考虑这个问题。
首先,它分为两个层面。在编译器看来,x和y是两个没有关联的变量,那么编译器有权利调整这两行代码的执行顺序,只要它乐意。
其次,CPU也有权利这么做。
如果我非要严格要求顺序,那么就应该插入一个memory barrier
int x(0),y(0);
x=4;
在此插入memory barrier指令
y=3;
下面要论述,中间那行怎么写。请耐心看下去,因为大多数人都在瞎整。
gcc的手册中有一节叫做”Built-in functions for atomic memory access”,然后里面列举了这样一个函数:
__sync_synchronize (…)
This builtin issues a full memory barrier.
来,我们写段代码试下:
int main(){
__sync_synchronize();
return 0;
}
然后用gcc4.2编译,
# gcc -S -c test.c
然后看对应的汇编代码,
main:
pushq %rbp
movq %rsp, %rbp
movl $0, %eax
leave
ret
嗯?Nothing at all !!! 不信你试一试,我的编译环境是Freebsd 9.0 release, gcc (GCC) 4.2.1 20070831 patched [FreeBSD]。 好,我换个高版本的gcc编译器试一试,gcc46 (FreeBSD Ports Collection) 4.6.3 20120113 (prerelease)
main:
pushq %rbp
movq %rsp, %rbp
mfence
movl $0, %eax
popq %rbp
ret
看,多了一行,mfence。 怎么回事呢?这是gcc之前的一个BUG:http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36793 。 2008年被发现,然后修复的。其实它之所以是一个BUG,关键在于gcc的开发者很迷惑,mfence在x86 CPU上到底有没有用?有嘛用? 说到这里,我们得到一个结论:gcc的__sync_synchronize()尽量别用,因为你的代码在低版本的gcc下会有BUG。大部分人用的gcc都比4.4低。从CentOS 6开始,默认的编译器才是gcc 4.4。
那么mfence到底能不能提供我们想要的结果呢? 之前intel的手册一直语焉不详,没说清楚。
最新的手册对mfence的解释是:
“Serializes all store and load operations that occurred prior to the MFENCE instruction in the
program instruction stream”
并且特别强调,这个指令影响的是data memory子系统,而不是指令执行流。
对于单个CPU来说,
"Reads cannot pass earlier MFENCE instructions”
“Writes cannot pass earlier MFENCE instructions. ”
“MFENCE instructions cannot pass earlier reads or writes”
而对于多个CPU来说,
• Individual processors use the same ordering principles as in a single-processor system.
• Writes by a single processor are observed in the same order by all processors.
• Writes from an individual processor are NOT ordered with respect to the writes from other processors.
• Memory ordering obeys causality (memory ordering respects transitive visibility).
• Any two stores are seen in a consistent order by processors other than those performing the stores
简单点说,对于单个CPU,即便你不用mfence,写入顺序也是保证的。
假如你在C++中, std::string* str=new std::string();
那么不会出现str指针已经被赋值但是它指向的对象还未被初始化好的情况。
另一个有趣的问题是,gcc有一个汇编指令是用来控制内存顺序的,请看这段文档:
Accesses to non-volatile objects are not ordered with respect to volatile accesses. You cannot use a volatile object as a memory barrier to order a sequence of writes to non-volatile memory. For instance:
int *ptr = something; volatile int vobj; *ptr = something; vobj = 1;
Unless *ptr and vobj can be aliased, it is not guaranteed that the write to *ptr occurs by the time the update of vobj happens. If you need this guarantee, you must use a stronger memory barrier such as:
int *ptr = something; volatile int vobj; *ptr = something; asm volatile ("" : : : "memory"); vobj = 1;
经我测试,asm volatile (“” : : : “memory”);并不生成任何汇编代码。也就是说,这个仅仅是给编译器看的。
为了进一步证实我的观点,请看如下从Intel的Threading Building Blocks函数库中摘取的代码:
#define __TBB_compiler_fence() __asm__ __volatile__(“”: : :”memory”)
#define __TBB_control_consistency_helper() __TBB_compiler_fence()
#define __TBB_acquire_consistency_helper() __TBB_compiler_fence()
#define __TBB_release_consistency_helper() __TBB_compiler_fence()
#ifndef __TBB_full_memory_fence
#define __TBB_full_memory_fence() __asm__ __volatile__(“mfence”: : :”memory”)
#endif
能同时起编译器和硬件内存屏障作用的是__asm__ __volatile__(“mfence”: : :”memory”)。注意:mfence!
另外,我们在intel cpu上用的CAS指令都是带lock前缀的。所以在使用CAS的时候完全不必考虑memory order的问题。
最后推荐一篇文章:《Mathematizing C++ Concurrency》http://www.cl.cam.ac.uk/~pes20/cpp/popl085ap-sewell.pdf 第一作者是剑桥的某在读博士。)
This article is from: https://www.sunchangming.com/blog/post/1632.html