Fix: waiter: futex wait: handle spurious futex wakeups
authorMathieu Desnoyers <mathieu.desnoyers@efficios.com>
Thu, 23 Jun 2022 20:27:41 +0000 (16:27 -0400)
committerJérémie Galarneau <jeremie.galarneau@efficios.com>
Mon, 22 Aug 2022 18:01:41 +0000 (14:01 -0400)
Observed issue
==============

The waiter lttng_waiter_wait() implements a futex wait/wakeup
scheme similar to the liburcu workqueue code, which has an issue with
spurious wakeups.

A spurious wakeup on lttng_waiter_wait can cause
lttng_waiter_wait to reach label skip_futex_wait with a
waiter->state state of WAITER_WAITING, which is unexpected. It would
cause busy-waiting on WAITER_TEARDOWN state to start early. The
wait-teardown stage is done with WAIT_ATTEMPTS active attempts,
following by attempts spaced by 10ms sleeps. I do not expect that these
spurious wakeups will cause user-observable effects other than being
slightly less efficient that it should be.

This issue will cause spurious unexpected high CPU use, but will not
lead to data corruption.

Cause
=====

From futex(5):

       FUTEX_WAIT
              Returns 0 if the caller was woken up.  Note that a  wake-up  can
              also  be caused by common futex usage patterns in unrelated code
              that happened to have previously used the  futex  word's  memory
              location  (e.g., typical futex-based implementations of Pthreads
              mutexes can cause this under some conditions).  Therefore, call‐
              ers should always conservatively assume that a return value of 0
              can mean a spurious wake-up, and  use  the  futex  word's  value
              (i.e.,  the user-space synchronization scheme) to decide whether
              to continue to block or not.

Solution
========

We therefore need to validate whether the value differs from
WAITER_WAITING in user-space after the call to FUTEX_WAIT returns 0.

Known drawbacks
===============

None.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ida9905d1f0b5d9543c8b85ecbd7d748a6f7c1c97

src/common/waiter.c

index 779754e1ba168cd2d3e115bfc39e82d81109728d..db2de6557f5df5a5465d66c8e2df591202953983 100644 (file)
@@ -52,15 +52,25 @@ void lttng_waiter_wait(struct lttng_waiter *waiter)
                }
                caa_cpu_relax();
        }
-       while (futex_noasync(&waiter->state, FUTEX_WAIT, WAITER_WAITING,
-                       NULL, NULL, 0)) {
+       while (uatomic_read(&waiter->state) == WAITER_WAITING) {
+               if (!futex_noasync(&waiter->state, FUTEX_WAIT, WAITER_WAITING, NULL, NULL, 0)) {
+                       /*
+                        * Prior queued wakeups queued by unrelated code
+                        * using the same address can cause futex wait to
+                        * return 0 even through the futex value is still
+                        * WAITER_WAITING (spurious wakeups). Check
+                        * the value again in user-space to validate
+                        * whether it really differs from WAITER_WAITING.
+                        */
+                       continue;
+               }
                switch (errno) {
-               case EWOULDBLOCK:
+               case EAGAIN:
                        /* Value already changed. */
                        goto skip_futex_wait;
                case EINTR:
                        /* Retry if interrupted by signal. */
-                       break;  /* Get out of switch. */
+                       break;  /* Get out of switch. Check again. */
                default:
                        /* Unexpected error. */
                        PERROR("futex_noasync");
This page took 0.026285 seconds and 4 git commands to generate.