call_rcu: remove head field alignement, explain wfcqueue motivation
The following commit:
commit
5161f31e09ce33dd79afad8d08a2372fbf1c4fbe
Author: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Date: Tue Sep 25 10:50:49 2012 -0500
call_rcu: use wfcqueue, eliminate false-sharing
Eliminate false-sharing between call_rcu (enqueuer) and worker threads
on the queue head and tail.
introduced a change in call_rcu: it moved from "wfqueue" to "wfcqueue".
Its changelog states that the goal is to eliminate false-sharing, but
the changelog rationale is wrong.
The actual primary goal is to use the "splice" operation (which is
similar to the "dequeue_all" operation proposed by Lai Jiangshan),
instead of open-coding this operation directly within the call_rcu
implementation. The objective stated by Lai was to make testing of this
code-path easier, and he was right: we ended up noticing a bug in the
original call_rcu implementation (in this open-coded splice operation)
that was really hard to trigger, which was fixed by the move to
wfcqueue.
About false-sharing: In the case of call_rcu callback invokation threads
vs call_rcu callers, we do not care about false-sharing because call_rcu
callback-invocation threads use batching ("splice") to get an entire
list of callbacks, which effectively empties the queue, and requires to
touch the tail anyway. Ensuring that head and tail are placed on
different cache lines would matter only if we would be using "dequeue"
in the callback-invocation thread, which is not the case: we grab the
whole queue, and then iterate from our local head to our local tail.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>