Fix: sessiond wait futex: handle spurious futex wakeups
author Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Thu, 23 Jun 2022 19:58:04 +0000 (15:58 -0400)
committer Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mon, 27 Jun 2022 14:28:15 +0000 (10:28 -0400)
Observed issue
==============

The LTTng-UST scheme that lets listener threads wait on a futex until
the session daemon wakes them up is similar to the liburcu workqueue
code, which has an issue with spurious wakeups.

This wait/wakeup scheme is only used after the LTTng-UST listener thread
has been unable to connect to the session daemon.

A spurious wakeup on wait_for_sessiond can cause wait_for_sessiond to
return with a sock_info->wait_shm_mmap state of 0, which is unexpected.

However, this should not cause any user-observable issues other than
using slightly more CPU time than strictly needed, because this spurious
wakeup will only cause an additional connection attempt to the session
daemon to fail.

Cause
=====

From futex(2):

       FUTEX_WAIT
              Returns 0 if the caller was woken up. Note that a wake-up can
              also be caused by common futex usage patterns in unrelated code
              that happened to have previously used the futex word's memory
              location (e.g., typical futex-based implementations of Pthreads
              mutexes can cause this under some conditions). Therefore,
              callers should always conservatively assume that a return value
              of 0 can mean a spurious wake-up, and use the futex word's value
              (i.e., the user-space synchronization scheme) to decide whether
              to continue to block or not.

Solution
========

We therefore need to validate whether the value differs from 0 in
user-space after the call to FUTEX_WAIT returns 0.
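
As a rough illustration of this pattern outside of LTTng-UST, the
following minimal sketch re-checks the futex word after every return
from FUTEX_WAIT. It assumes Linux and the raw futex system call; the
helper names futex_wait() and wait_until_nonzero() are hypothetical and
are not part of LTTng-UST or liburcu.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical thin wrapper around the raw futex system call. */
static long futex_wait(int32_t *uaddr, int32_t expected)
{
	return syscall(SYS_futex, uaddr, FUTEX_WAIT, expected, NULL, NULL, 0);
}

/*
 * Hypothetical helper: block until *word becomes non-zero.
 * Returns 0 once the word is non-zero, or -1 with errno set on an
 * error this sketch does not handle.
 */
static int wait_until_nonzero(int32_t *word)
{
	while (!__atomic_load_n(word, __ATOMIC_SEQ_CST)) {
		if (!futex_wait(word, 0)) {
			/*
			 * Woken up, possibly spuriously: only the loop
			 * condition decides whether to stop waiting.
			 */
			continue;
		}
		switch (errno) {
		case EAGAIN:
			/* Word was not 0 when the call was made. */
			continue;
		case EINTR:
			/* Interrupted by a signal: check again. */
			continue;
		default:
			return -1;
		}
	}
	return 0;
}
```

The key point is that a return value of 0 from FUTEX_WAIT never ends
the wait by itself: the loop only exits once the word is observed to be
non-zero in user-space.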

Known drawbacks
===============

None.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I468d8ff302f467ee9924e6edb04476fcb031b4b9

src/lib/lttng-ust/lttng-ust-comm.c

index ed5bb8c8f4339f2d067ec7a934b5c6e72b3b319a..a46ca4e40bb4b79bf0002fc1a2f7ea0cd35e9627 100644 (file)
@@ -1757,18 +1757,25 @@ void wait_for_sessiond(struct sock_info *sock_info)
 
        DBG("Waiting for %s apps sessiond", sock_info->name);
        /* Wait for futex wakeup */
-       if (uatomic_read((int32_t *) sock_info->wait_shm_mmap))
-               goto end_wait;
-
-       while (lttng_ust_futex_async((int32_t *) sock_info->wait_shm_mmap,
-                       FUTEX_WAIT, 0, NULL, NULL, 0)) {
+       while (!uatomic_read((int32_t *) sock_info->wait_shm_mmap)) {
+               if (!lttng_ust_futex_async((int32_t *) sock_info->wait_shm_mmap, FUTEX_WAIT, 0, NULL, NULL, 0)) {
+                       /*
+                        * Prior queued wakeups queued by unrelated code
+                        * using the same address can cause futex wait to
+                        * return 0 even though the futex value is still
+                        * 0 (spurious wakeups). Check the value again
+                        * in user-space to validate whether it really
+                        * differs from 0.
+                        */
+                       continue;
+               }
                switch (errno) {
-               case EWOULDBLOCK:
+               case EAGAIN:
                        /* Value already changed. */
                        goto end_wait;
                case EINTR:
                        /* Retry if interrupted by signal. */
-                       break;  /* Get out of switch. */
+                       break;  /* Get out of switch. Check again. */
                case EFAULT:
                        wait_poll_fallback = 1;
                        DBG(