From: Paolo Bonzini
Date: Sat, 10 Sep 2011 19:29:56 +0000 (-0700)
Subject: cmm: do not generate code for smp_rmb/smp_wmb on x86_64
X-Git-Tag: v0.6.5~28
X-Git-Url: https://git.lttng.org./?a=commitdiff_plain;h=4e029f656588009dec8ce6c47e57b11e658bf2d5;p=userspace-rcu.git

cmm: do not generate code for smp_rmb/smp_wmb on x86_64

We can assume, on x86, that no accesses to write-combining memory occur,
and that there are no non-temporal loads/stores (people would presumably
write those with assembly or intrinsics and insert the appropriate
lfence/sfence manually). Under these assumptions, rmb and wmb are no-ops
on x86, according to the updated x86 memory models:

  INTEL CORPORATION. Intel 64 Architecture Memory Ordering White Paper, 2007.
  http://developer.intel.com/products/processor/manuals/318147.pdf

  ADVANCED MICRO DEVICES. AMD x86-64 Architecture Programmer's Manual
  Volume 2: System Programming, 2007.

  Paul E. McKenney. Memory Ordering in Modern Microprocessors.
  www.rdrop.com/users/paulmck/scalability/paper/ordering.2007.09.19a.pdf

x86 does not reorder loads against loads, or stores against stores, for
normal memory accesses, with the notable exceptions of the Pentium Pro
(which reorders reads) and the WinChip (which reorders writes). It is
therefore safe not to emit fence instructions for cmm_smp_rmb()/cmm_smp_wmb()
on x86_64, while keeping the memory fences in place on x86_32 to cover
those two special cases.

Define cmm_smp_rmb and cmm_smp_wmb as the "common" operations that require
no fence instruction, while leaving cmm_rmb and cmm_wmb in place for more
sophisticated uses.
Signed-off-by: Paolo Bonzini
Signed-off-by: Mathieu Desnoyers
---

diff --git a/urcu/arch/x86.h b/urcu/arch/x86.h
index 9e5411f..c1e2e07 100644
--- a/urcu/arch/x86.h
+++ b/urcu/arch/x86.h
@@ -33,12 +33,27 @@ extern "C" {
 
 #ifdef CONFIG_RCU_HAVE_FENCE
 #define cmm_mb() asm volatile("mfence":::"memory")
-#define cmm_rmb() asm volatile("lfence":::"memory")
-#define cmm_wmb() asm volatile("sfence"::: "memory")
+
+/*
+ * Define cmm_rmb/cmm_wmb to "strict" barriers that may be needed when
+ * using SSE or working with I/O areas.  cmm_smp_rmb/cmm_smp_wmb are
+ * only compiler barriers, which is enough for general use.
+ */
+#define cmm_rmb() asm volatile("lfence":::"memory")
+#define cmm_wmb() asm volatile("sfence"::: "memory")
+#define cmm_smp_rmb() cmm_barrier()
+#define cmm_smp_wmb() cmm_barrier()
 #else
 /*
- * Some non-Intel clones support out of order store. cmm_wmb() ceases to be a
- * nop for these.
+ * We leave smp_rmb/smp_wmb as full barriers for processors that do not have
+ * fence instructions.
+ *
+ * An empty cmm_smp_rmb() may not be enough on old PentiumPro multiprocessor
+ * systems, due to an erratum.  The Linux kernel says that "Even distro
+ * kernels should think twice before enabling this", but for now let's
+ * be conservative and leave the full barrier on 32-bit processors.  Also,
+ * IDT WinChip supports weak store ordering, and the kernel may enable it
+ * under our feet; cmm_smp_wmb() ceases to be a nop for these processors.
  */
 #define cmm_mb() asm volatile("lock; addl $0,0(%%esp)":::"memory")
 #define cmm_rmb() asm volatile("lock; addl $0,0(%%esp)":::"memory")