Fix: call_rcu: futex wait: handle spurious futex wakeups
authorMathieu Desnoyers <mathieu.desnoyers@efficios.com>
Wed, 22 Jun 2022 20:38:06 +0000 (16:38 -0400)
committerMathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mon, 27 Jun 2022 14:30:07 +0000 (10:30 -0400)
Observed issue
==============

The urcu call_rcu() and rcu_barrier() each implement a futex wait/wakeup
scheme identical to the workqueue code, which has an issue with spurious
wakeups.

* call_rcu

A spurious wakeup on call_rcu_wait can cause call_rcu_wait to return
with a crdp->futex state of -1, which is unexpected. It would cause the
following loops in call_rcu_thread() to decrement the crdp->futex to
values below -1, thus actively using CPU time as values will be
decremented to very low negative values until the futex value underflows
back to 0. The state is *not* restored to 0 when the callback list is
found to be non-empty, so this unexpected state will persist until the
crdp->futex state underflows back to 0, or until the call_rcu_thread is
stopped. What prevents this from having too much user-observable effects
is that the call rcu thread has a 10ms sleep between loops, to favor
batching of callbacks. Therefore, rather than being a purely 100% active
busy-wait, this scenario leads to a busy-wait which is paced by 10ms
sleeps.

Therefore the observed issue will be that the call_rcu_thread will
unexpectedly wake up the CPU each 10ms after this spurious wakeup
happens.

* rcu_barrier

A spurious wakeup on call_rcu_completion_wait can cause
call_rcu_completion_wait to return with a completion->futex state of -1,
which is unexpected. It would cause the following loops in rcu_barrier()
to decrement the completion->futex to values below -1, thus actively
using CPU time as values will be decremented to very low negative values
until either the barrier count reaches 0 or until the futex value
underflows to 0.

Therefore the observed issue will be that rcu_barrier() will
unexpectedly use a lot of CPU time when this spurious wakeup happens.

These issues will cause spurious unexpected high CPU use, but will not
lead to data corruption.

Cause
=====

From futex(5):

       FUTEX_WAIT
              Returns 0 if the caller was woken up.  Note that a  wake-up  can
              also  be caused by common futex usage patterns in unrelated code
              that happened to have previously used the  futex  word's  memory
              location  (e.g., typical futex-based implementations of Pthreads
              mutexes can cause this under some conditions).  Therefore, call‐
              ers should always conservatively assume that a return value of 0
              can mean a spurious wake-up, and  use  the  futex  word's  value
              (i.e.,  the user-space synchronization scheme) to decide whether
              to continue to block or not.

Solution
========

We therefore need to validate whether the value differs from -1 in
user-space after the call to FUTEX_WAIT returns 0.

Known drawbacks
===============

None.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I3e625f1689462f8eb9f1223b5b24b1a754bad324

src/urcu-call-rcu-impl.h

index 288686528997004c890cea7f31dc6af2af5e5398..136337290ce07be59e524d971b320e4b8fc28912 100644 (file)
@@ -240,17 +240,25 @@ static void call_rcu_wait(struct call_rcu_data *crdp)
 {
        /* Read call_rcu list before read futex */
        cmm_smp_mb();
-       if (uatomic_read(&crdp->futex) != -1)
-               return;
-       while (futex_async(&crdp->futex, FUTEX_WAIT, -1,
-                       NULL, NULL, 0)) {
+       while (uatomic_read(&crdp->futex) == -1) {
+               if (!futex_async(&crdp->futex, FUTEX_WAIT, -1, NULL, NULL, 0)) {
+                       /*
+                        * Prior queued wakeups queued by unrelated code
+                        * using the same address can cause futex wait to
+                        * return 0 even through the futex value is still
+                        * -1 (spurious wakeups). Check the value again
+                        * in user-space to validate whether it really
+                        * differs from -1.
+                        */
+                       continue;
+               }
                switch (errno) {
-               case EWOULDBLOCK:
+               case EAGAIN:
                        /* Value already changed. */
                        return;
                case EINTR:
                        /* Retry if interrupted by signal. */
-                       break;  /* Get out of switch. */
+                       break;  /* Get out of switch. Check again. */
                default:
                        /* Unexpected error. */
                        urcu_die(errno);
@@ -274,17 +282,25 @@ static void call_rcu_completion_wait(struct call_rcu_completion *completion)
 {
        /* Read completion barrier count before read futex */
        cmm_smp_mb();
-       if (uatomic_read(&completion->futex) != -1)
-               return;
-       while (futex_async(&completion->futex, FUTEX_WAIT, -1,
-                       NULL, NULL, 0)) {
+       while (uatomic_read(&completion->futex) == -1) {
+               if (!futex_async(&completion->futex, FUTEX_WAIT, -1, NULL, NULL, 0)) {
+                       /*
+                        * Prior queued wakeups queued by unrelated code
+                        * using the same address can cause futex wait to
+                        * return 0 even through the futex value is still
+                        * -1 (spurious wakeups). Check the value again
+                        * in user-space to validate whether it really
+                        * differs from -1.
+                        */
+                       continue;
+               }
                switch (errno) {
-               case EWOULDBLOCK:
+               case EAGAIN:
                        /* Value already changed. */
                        return;
                case EINTR:
                        /* Retry if interrupted by signal. */
-                       break;  /* Get out of switch. */
+                       break;  /* Get out of switch. Check again. */
                default:
                        /* Unexpected error. */
                        urcu_die(errno);
This page took 0.027562 seconds and 4 git commands to generate.