Observed issue
==============
The futex futex_nto1_wait() implements a futex wait/wakeup scheme
identical to the liburcu workqueue code, which has an issue with
spurious wakeups.
A spurious wakeup on futex_nto1_wait can cause futex_nto1_wait to return
with a futex state of -1, which is unexpected.
futex_nto1_wait is used by the relayd live dispatcher thread, by the
relayd main dispatcher thread, as well as by the sessiond dispatcher
thread.
Given that following a futex_nto1_wait returning due to a spurious
wakeup futex_nto1_prepare will set the futex value to -1, things go
back to normal for the following futex_nto1_wait calls.
Therefore, the only impact of this issue is to spuriously use slightly
more CPU time than strictly required.
The effect is even shorter-lasting that in the liburcu counterparts
because futex_nto1_prepare explicitly sets the futex state to -1 rather
than use an atomic decrement, which immediately sets to state back to
a consistent state.
Cause
=====
From futex(5):
FUTEX_WAIT
Returns 0 if the caller was woken up. Note that a wake-up can
also be caused by common futex usage patterns in unrelated code
that happened to have previously used the futex word's memory
location (e.g., typical futex-based implementations of Pthreads
mutexes can cause this under some conditions). Therefore, call‐
ers should always conservatively assume that a return value of 0
can mean a spurious wake-up, and use the futex word's value
(i.e., the user-space synchronization scheme) to decide whether
to continue to block or not.
Solution
========
We therefore need to validate whether the value differs from -1 in
user-space after the call to FUTEX_WAIT returns 0.
Known drawbacks
===============
None.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I8a1b6aaf77b6a2f941fd4f89b61bed71cf17906b
{
cmm_smp_mb();
- if (uatomic_read(futex) != -1)
- goto end;
- while (futex_async(futex, FUTEX_WAIT, -1, NULL, NULL, 0)) {
+ while (uatomic_read(futex) == -1) {
+ if (!futex_async(futex, FUTEX_WAIT, -1, NULL, NULL, 0)) {
+ /*
+ * Prior queued wakeups queued by unrelated code
+ * using the same address can cause futex wait to
+ * return 0 even through the futex value is still
+ * -1 (spurious wakeups). Check the value again
+ * in user-space to validate whether it really
+ * differs from -1.
+ */
+ continue;
+ }
switch (errno) {
- case EWOULDBLOCK:
+ case EAGAIN:
/* Value already changed. */
goto end;
case EINTR:
/* Retry if interrupted by signal. */
- break; /* Get out of switch. */
+ break; /* Get out of switch. Check again. */
default:
/* Unexpected error. */
PERROR("futex_async");