Fixes for an issue with my_atomic


Hi,

I had a problem with my group commit patch, and tracked it down to a bug in
my_atomic.

The issue is that my_atomic_cas*(val, cmp, new) accesses *cmp after a
successful CAS operation (in one place reading it, in another place writing
it). Here is the fix:

Index: work-5.1-groupcommit/include/atomic/gcc_builtins.h
===================================================================
--- work-5.1-groupcommit.orig/include/atomic/gcc_builtins.h	2010-06-09 11:53:59.000000000 +0200
+++ work-5.1-groupcommit/include/atomic/gcc_builtins.h	2010-06-09 11:54:06.000000000 +0200
@@ -19,8 +19,9 @@
   v= __sync_lock_test_and_set(a, v);
 #define make_atomic_cas_body(S)                     \
   int ## S sav;                                     \
-  sav= __sync_val_compare_and_swap(a, *cmp, set);   \
-  if (!(ret= (sav == *cmp))) *cmp= sav;
+  int ## S cmp_val= *cmp;                           \
+  sav= __sync_val_compare_and_swap(a, cmp_val, set);\
+  if (!(ret= (sav == cmp_val))) *cmp= sav
 
 #ifdef MY_ATOMIC_MODE_DUMMY
 #define make_atomic_load_body(S)   ret= *a
Index: work-5.1-groupcommit/include/atomic/x86-gcc.h
===================================================================
--- work-5.1-groupcommit.orig/include/atomic/x86-gcc.h	2010-06-09 11:53:59.000000000 +0200
+++ work-5.1-groupcommit/include/atomic/x86-gcc.h	2010-06-09 11:54:06.000000000 +0200
@@ -45,8 +45,12 @@
 #define make_atomic_fas_body(S)				\
   asm volatile ("xchg %0, %1;" : "+r" (v) , "+m" (*a))
 #define make_atomic_cas_body(S)					\
+  int ## S sav;                                     \
   asm volatile (LOCK_prefix "; cmpxchg %3, %0; setz %2;"	\
-               : "+m" (*a), "+a" (*cmp), "=q" (ret): "r" (set))
+               : "+m" (*a), "=a" (sav), "=q" (ret)              \
+               : "r" (set), "a" (*cmp));                        \
+  if (!ret)                                                     \
+    *cmp= sav
 
 #ifdef MY_ATOMIC_MODE_DUMMY
 #define make_atomic_load_body(S)   ret=*a

This makes the behaviour consistent with the other implementations; see for
example generic-msvc.h.
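
For reference, this is the contract the patched implementations now share,
written out as a plain C function (just a sketch; the function name and the
<stdint.h> types are mine, the real code is generated from the macros above):

  #include <stdint.h>

  /* Sketch of the intended CAS semantics: *cmp is read exactly once
     before the CAS, and written only when the CAS fails. */
  static inline int
  cas32_sketch(int32_t volatile *a, int32_t *cmp, int32_t set)
  {
    int32_t cmp_val= *cmp;          /* single read, before the CAS */
    int32_t sav= __sync_val_compare_and_swap(a, cmp_val, set);
    int ret= (sav == cmp_val);
    if (!ret)
      *cmp= sav;                    /* written only on failure */
    return ret;
  }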

(It is also important for correct operation. In my code, I use
my_atomic_casptr() to atomically enqueue a struct from one thread; as soon as
it is enqueued, another thread may grab the struct and change it, including
the field pointed to by cmp. So it is essential that my_atomic_casptr()
neither reads nor writes *cmp after a successful CAS, as the above patch
ensures.)
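
To illustrate the pattern (a hypothetical sketch, not the actual group commit
code; it assumes the usual my_atomic_casptr(a, cmp, set) signature and points
cmp into the node being enqueued):

  struct node { struct node *next; /* ... payload ... */ };

  static void
  enqueue(struct node * volatile *head, struct node *n)
  {
    n->next= *head;                           /* expected old head */
    while (!my_atomic_casptr((void * volatile *)head,
                             (void **)&n->next, n))
    {
      /* On failure, my_atomic_casptr() stores the current head into
         n->next, so the retry needs no extra work.  On success it must
         not touch n->next at all: the consumer may already own the node
         and be rewriting that field. */
    }
  }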

It was, btw., a bit funny how I tracked this down. After digging for a few
hours in my own code without success, I got the idea to check the CAS
implementation, and spotted the problem in gcc_builtins.h. After fixing it I
was baffled as to why my code still failed, until I realised I was not using
gcc_builtins.h but x86-gcc.h, and found and fixed the similar problem there ;-)

Now the question is, where should I push this (if at all)? Any opinions?

-----------------------------------------------------------------------

While I was there, I also noticed another potential problem in x86-gcc.h,
suggesting this patch:

Index: work-5.1-groupcommit/include/atomic/x86-gcc.h
===================================================================
--- work-5.1-groupcommit.orig/include/atomic/x86-gcc.h	2010-06-09 11:37:12.000000000 +0200
+++ work-5.1-groupcommit/include/atomic/x86-gcc.h	2010-06-09 11:52:47.000000000 +0200
@@ -38,17 +38,31 @@
 #define asm __asm__
 #endif
 
+/*
+  The atomic operations imply a memory barrier for the CPU, to ensure that all
+  prior writes are flushed from cache, and all subsequent reads reloaded into
+  cache.
+
+  We need to imply a similar memory barrier for the compiler, so that all
+  pending stores (to memory that may be aliased in other parts of the code)
+  will be flushed to memory before the operation, and all reads from such
+  memory be re-loaded. This is achieved by adding the "memory" pseudo-register
+  to the clobber list, see GCC documentation for more explanation.
+
+  The compiler and CPU memory barriers are needed to make sure changes in one
+  thread are made visible in another by the atomic operation.
+*/
 #ifndef MY_ATOMIC_NO_XADD
 #define make_atomic_add_body(S)					\
-  asm volatile (LOCK_prefix "; xadd %0, %1;" : "+r" (v) , "+m" (*a))
+  asm volatile (LOCK_prefix "; xadd %0, %1;" : "+r" (v) , "+m" (*a): : "memory")
 #endif
 #define make_atomic_fas_body(S)				\
-  asm volatile ("xchg %0, %1;" : "+r" (v) , "+m" (*a))
+  asm volatile ("xchg %0, %1;" : "+r" (v) , "+m" (*a) : : "memory")
 #define make_atomic_cas_body(S)					\
   int ## S sav;                                     \
   asm volatile (LOCK_prefix "; cmpxchg %3, %0; setz %2;"	\
                : "+m" (*a), "=a" (sav), "=q" (ret)              \
-               : "r" (set), "a" (*cmp));                        \
+               : "r" (set), "a" (*cmp) : "memory");             \
   if (!ret)                                                     \
     *cmp= sav
 
@@ -63,9 +77,9 @@
 #define make_atomic_load_body(S)				\
   ret=0;							\
   asm volatile (LOCK_prefix "; cmpxchg %2, %0"			\
-               : "+m" (*a), "+a" (ret): "r" (ret))
+               : "+m" (*a), "+a" (ret) : "r" (ret) : "memory")
 #define make_atomic_store_body(S)				\
-  asm volatile ("; xchg %0, %1;" : "+m" (*a), "+r" (v))
+  asm volatile ("; xchg %0, %1;" : "+m" (*a), "+r" (v) : : "memory")
 #endif
 
 /* TODO test on intel whether the below helps. on AMD it makes no difference */

The comment in the patch explains the idea, I think. Basically, these
memory-barrier operations also need to act as compiler barriers. Otherwise
nothing prevents GCC from moving unrelated stores across the memory-barrier
operation. This is a problem, for example, when filling in a structure and
then atomically linking it into a shared list. The CPU memory barrier ensures
that the stores to the structure's fields are visible to other threads at the
hardware level, but GCC must also be told to keep those stores ordered before
the store that links the structure into the shared list.
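
Reusing the enqueue sketch from above to make this concrete (hypothetical
names again, only the ordering matters here):

  struct entry { int payload; struct entry *next; };
  struct entry * volatile shared_list;

  void
  publish(struct entry *e, int value)
  {
    e->payload= value;                        /* ordinary field store */
    e->next= shared_list;
    while (!my_atomic_casptr((void * volatile *)&shared_list,
                             (void **)&e->next, e))
      ;
    /* Without the "memory" clobber, GCC may keep e->payload in a register
       or sink the store below the cmpxchg, so a consumer that sees the
       new list head could still read a stale payload.  With the clobber,
       the store must be emitted before the asm, and reads of shared
       memory are not cached across it. */
  }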

(It also makes the volatile declaration on the updated memory location
unnecessary, though perhaps it is still needed for other implementations. It
shouldn't be, though; volatile is almost always wrong.)

 - Kristian.