The Userspace RCU compatibility layer around sys_futex has a race
condition which makes pretty much all "benchmark" tests hang pretty
quickly on non-Linux systems (tested on Mac OS X).
I narrowed it down to a bug in compat_futex_noasync: this compat layer
uses a single pthread mutex and condition variable for all callers,
independently of their uaddr. The FUTEX_WAKE performs a pthread cond
broadcast to all waiters. FUTEX_WAIT must then compare *uaddr with val
to see which thread has been awakened.
Unfortunately, the check was not done again after each return from
pthread_cond_wait(), thus causing the race.
This race affects threads using the futex_noasync() compatibility layer
concurrently, thus it affects only on non-Linux systems.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
assert(!ret);
switch (op) {
case FUTEX_WAIT:
- if (*uaddr != val)
- goto end;
- pthread_cond_wait(&__urcu_compat_futex_cond, &__urcu_compat_futex_lock);
+ /*
+ * Wait until *uaddr is changed to something else than "val".
+ * Comparing *uaddr content against val figures out which
+ * thread has been awakened.
+ */
+ while (*uaddr == val)
+ pthread_cond_wait(&__urcu_compat_futex_cond,
+ &__urcu_compat_futex_lock);
break;
case FUTEX_WAKE:
+ /*
+ * Each wake is sending a broadcast, thus attempting wakeup of
+ * all awaiting threads, independently of their respective
+ * uaddr.
+ */
pthread_cond_broadcast(&__urcu_compat_futex_cond);
break;
default:
gret = -EINVAL;
}
-end:
ret = pthread_mutex_unlock(&__urcu_compat_futex_lock);
assert(!ret);
return gret;