From: Mathieu Desnoyers
Date: Thu, 23 Jun 2022 20:15:51 +0000 (-0400)
Subject: Fix: futex wait: handle spurious futex wakeups
X-Git-Url: https://git.lttng.org./?a=commitdiff_plain;h=36962e16797b5ca590c54a385ca594e2859ef8c2;p=lttng-tools.git

Fix: futex wait: handle spurious futex wakeups

Observed issue
==============

futex_nto1_wait() implements a futex wait/wakeup scheme identical to
the liburcu workqueue code, which has an issue with spurious wakeups.

A spurious wakeup on futex_nto1_wait can cause futex_nto1_wait to return
with a futex state of -1, which is unexpected. futex_nto1_wait is used
by the relayd live dispatcher thread, by the relayd main dispatcher
thread, as well as by the sessiond dispatcher thread.

Given that, after futex_nto1_wait returns due to a spurious wakeup,
futex_nto1_prepare will set the futex value to -1, things go back to
normal for the following futex_nto1_wait calls.

Therefore, the only impact of this issue is to spuriously use slightly
more CPU time than strictly required.

The effect is even shorter-lasting than in the liburcu counterparts
because futex_nto1_prepare explicitly sets the futex state to -1 rather
than using an atomic decrement, which immediately sets the state back to
a consistent state.

Cause
=====

From futex(5):

       FUTEX_WAIT
              Returns 0 if the caller was woken up. Note that a wake-up
              can also be caused by common futex usage patterns in
              unrelated code that happened to have previously used the
              futex word's memory location (e.g., typical futex-based
              implementations of Pthreads mutexes can cause this under
              some conditions). Therefore, callers should always
              conservatively assume that a return value of 0 can mean a
              spurious wake-up, and use the futex word's value (i.e.,
              the user-space synchronization scheme) to decide whether
              to continue to block or not.

Solution
========

We therefore need to validate whether the value differs from -1 in
user-space after the call to FUTEX_WAIT returns 0.

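The re-check described above can be sketched in isolation as follows.
This is a minimal, Linux-only illustration and not the lttng-tools code:
the sketch_* names are hypothetical, the raw futex system call stands in
for futex_async(), and the GCC/Clang __atomic builtins stand in for the
liburcu uatomic_read() and cmm_smp_mb() primitives.

```cpp
// Sketch only: the sketch_* names are hypothetical, not lttng-tools API.
#include <cerrno>
#include <cstdint>

#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

// Thin wrapper around the raw futex system call.
static long sketch_futex(int32_t *addr, int op, int32_t val)
{
	return syscall(SYS_futex, addr, op, val, nullptr, nullptr, 0);
}

// Block for as long as *futex == -1. The futex word, not the return
// value of FUTEX_WAIT, decides whether the wait is over.
static void sketch_wait(int32_t *futex)
{
	while (__atomic_load_n(futex, __ATOMIC_SEQ_CST) == -1) {
		if (sketch_futex(futex, FUTEX_WAIT, -1) == 0) {
			// Woken up, possibly spuriously: re-read the value.
			continue;
		}
		if (errno == EAGAIN || errno == EINTR) {
			// EAGAIN: the value was no longer -1 when the kernel
			// checked. EINTR: interrupted by a signal. Re-check.
			continue;
		}
		// Unexpected error: stop waiting rather than spin.
		return;
	}
}

// Publish a value other than -1, then wake one waiter.
static void sketch_wake(int32_t *futex)
{
	__atomic_store_n(futex, 1, __ATOMIC_SEQ_CST);
	sketch_futex(futex, FUTEX_WAKE, 1);
}
```

In this sketch, one thread would call sketch_wait() on a word
initialized to -1 while another calls sketch_wake() on the same word.
A wakeup that arrives while the word is still -1 only sends the waiter
around the loop once more.
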
Known drawbacks
===============

None.

Signed-off-by: Mathieu Desnoyers
Signed-off-by: Jérémie Galarneau
Change-Id: I8a1b6aaf77b6a2f941fd4f89b61bed71cf17906b
---

diff --git a/src/common/futex.cpp b/src/common/futex.cpp
index 21ab09ab7..e178b7e1b 100644
--- a/src/common/futex.cpp
+++ b/src/common/futex.cpp
@@ -72,16 +72,25 @@ void futex_nto1_wait(int32_t *futex)
 {
 	cmm_smp_mb();
 
-	if (uatomic_read(futex) != -1)
-		goto end;
-	while (futex_async(futex, FUTEX_WAIT, -1, NULL, NULL, 0)) {
+	while (uatomic_read(futex) == -1) {
+		if (!futex_async(futex, FUTEX_WAIT, -1, NULL, NULL, 0)) {
+			/*
+			 * Prior queued wakeups queued by unrelated code
+			 * using the same address can cause futex wait to
+			 * return 0 even through the futex value is still
+			 * -1 (spurious wakeups). Check the value again
+			 * in user-space to validate whether it really
+			 * differs from -1.
+			 */
+			continue;
+		}
 		switch (errno) {
-		case EWOULDBLOCK:
+		case EAGAIN:
 			/* Value already changed. */
 			goto end;
 		case EINTR:
 			/* Retry if interrupted by signal. */
-			break;	/* Get out of switch. */
+			break;	/* Get out of switch. Check again. */
 		default:
 			/* Unexpected error. */
 			PERROR("futex_async");