From: Mathieu Desnoyers
Date: Thu, 11 Oct 2012 15:41:48 +0000 (-0400)
Subject: call_rcu: remove head field alignement, explain wfcqueue motivation
X-Git-Tag: v0.8.0~198
X-Git-Url: https://git.lttng.org./?a=commitdiff_plain;h=0b8ab7df078a6d8e1439b1db5849638892e1cc83;p=userspace-rcu.git

The following commit:

commit 5161f31e09ce33dd79afad8d08a2372fbf1c4fbe
Author: Mathieu Desnoyers
Date:   Tue Sep 25 10:50:49 2012 -0500

    call_rcu: use wfcqueue, eliminate false-sharing

    Eliminate false-sharing between call_rcu (enqueuer) and worker threads
    on the queue head and tail.

introduced a change in call_rcu: it moved from "wfqueue" to "wfcqueue".
Its changelog states that the goal is to eliminate false-sharing, but
that rationale is wrong.

The actual primary goal is to use the "splice" operation (which is
similar to the "dequeue_all" operation proposed by Lai Jiangshan)
instead of open-coding this operation directly within the call_rcu
implementation. The objective stated by Lai was to make testing of this
code path easier, and he was right: we ended up noticing a bug in the
original call_rcu implementation (in this open-coded splice operation)
that was really hard to trigger, and which was fixed by the move to
wfcqueue.

About false-sharing: in the case of call_rcu callback-invocation threads
vs call_rcu callers, we do not care about false-sharing, because call_rcu
callback-invocation threads use batching ("splice") to get an entire
list of callbacks, which effectively empties the queue and requires
touching the tail anyway.

Ensuring that head and tail are placed on different cache lines would
matter only if we were using "dequeue" in the callback-invocation
thread, which is not the case: we grab the whole queue, and then iterate
from our local head to our local tail.

Signed-off-by: Mathieu Desnoyers
---

diff --git a/urcu-call-rcu-impl.h b/urcu-call-rcu-impl.h
index dca98e4..4e5879f 100644
--- a/urcu-call-rcu-impl.h
+++ b/urcu-call-rcu-impl.h
@@ -48,15 +48,14 @@
 
 struct call_rcu_data {
 	/*
-	 * Align the tail on cache line size to eliminate false-sharing
-	 * with head. Small note, however: the "qlen" field, kept for
-	 * debugging, will cause false-sharing between enqueue and
-	 * dequeue.
+	 * We do not align head on a different cache-line than tail
+	 * mainly because call_rcu callback-invocation threads use
+	 * batching ("splice") to get an entire list of callbacks, which
+	 * effectively empties the queue, and requires to touch the tail
+	 * anyway.
 	 */
 	struct cds_wfcq_tail cbs_tail;
-	/* Alignment on cache line size will add padding here */
-
-	struct cds_wfcq_head __attribute__((aligned(CAA_CACHE_LINE_SIZE))) cbs_head;
+	struct cds_wfcq_head cbs_head;
 	unsigned long flags;
 	int32_t futex;
 	unsigned long qlen; /* maintained for debugging. */
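
The reasoning in the commit message can be illustrated with a small,
single-threaded sketch. It is not the wfcqueue implementation: the
toy_* types and functions, the layout_* structures, and the
CACHE_LINE_SIZE value of 64 are all hypothetical stand-ins, and the
real queue uses atomic operations where this sketch uses plain stores.
It shows two things: what the removed aligned attribute did to the
structure layout, and that a splice writes to both the head and the
tail of the source queue, so the consumer has to own both cache lines
regardless of how they are laid out.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE_SIZE 64 /* assumption: stand-in for CAA_CACHE_LINE_SIZE */

/* Hypothetical, single-threaded stand-ins for the wfcqueue types. */
struct toy_node { struct toy_node *next; };
struct toy_head { struct toy_node node; }; /* dummy node; node.next is the first element */
struct toy_tail { struct toy_node *p; };   /* last node, or the dummy node when empty */

/* Layout before the patch: head is pushed onto the next cache line. */
struct layout_aligned {
	struct toy_tail cbs_tail;
	struct toy_head __attribute__((aligned(CACHE_LINE_SIZE))) cbs_head;
	unsigned long flags;
};

/* Layout after the patch: head directly follows tail. */
struct layout_packed {
	struct toy_tail cbs_tail;
	struct toy_head cbs_head;
	unsigned long flags;
};

void toy_init(struct toy_head *head, struct toy_tail *tail)
{
	head->node.next = NULL;
	tail->p = &head->node;
}

/*
 * Enqueue writes the tail and the previous last node (which is the
 * dummy head node when the queue is empty).
 */
void toy_enqueue(struct toy_tail *tail, struct toy_node *node)
{
	struct toy_node *old_tail = tail->p;

	node->next = NULL;
	tail->p = node; /* the real queue uses an atomic exchange here */
	old_tail->next = node;
}

/* Move every node from src to dest; returns the number of nodes moved. */
size_t toy_splice(struct toy_tail *dest_tail,
		  struct toy_head *src_head, struct toy_tail *src_tail)
{
	struct toy_node *first = src_head->node.next;
	struct toy_node *last, *n;
	size_t count = 0;

	if (!first)
		return 0;
	src_head->node.next = NULL;    /* write to the source head ... */
	last = src_tail->p;
	src_tail->p = &src_head->node; /* ... and to the source tail */
	dest_tail->p->next = first;    /* append the whole chain to dest */
	dest_tail->p = last;
	for (n = first; n; n = n->next)
		count++;
	return count;
}

/* Enqueue three callbacks, grab them all at once, return how many moved. */
size_t splice_demo(void)
{
	struct toy_head q_head, local_head;
	struct toy_tail q_tail, local_tail;
	struct toy_node cbs[3];
	size_t i, moved;

	toy_init(&q_head, &q_tail);
	toy_init(&local_head, &local_tail);
	for (i = 0; i < 3; i++)
		toy_enqueue(&q_tail, &cbs[i]);
	moved = toy_splice(&local_tail, &q_head, &q_tail);
	/* The shared queue is empty again: both head and tail were reset. */
	assert(q_head.node.next == NULL && q_tail.p == &q_head.node);
	/* The local queue now runs from the first to the last callback. */
	assert(local_head.node.next == &cbs[0] && local_tail.p == &cbs[2]);
	return moved;
}
```

With GCC or Clang, offsetof(struct layout_aligned, cbs_head) equals
CACHE_LINE_SIZE, while in struct layout_packed the head sits
immediately after the tail. Because toy_splice() stores to both
src_head and src_tail, separating them would not spare the consumer a
cache-line transfer; it would only help a consumer that dequeues one
node at a time.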