Paolo Bonzini [Wed, 17 Aug 2011 09:42:51 +0000 (05:42 -0400)]
urcu-qsbr: avoid useless futex wakeups and burning CPU for long grace periods
I noticed that urcu makes exactly _one_ attempt at using futexes
to avoid busy looping on synchronize_rcu. The attached patch instead
switches from busy waiting to futexes after RCU_QS_ACTIVE_ATTEMPTS.
To limit the number of system calls, reading threads remember whether
they already had a quiescent state in this grace period; if so they were
already removed from the list, and can avoid signaling the futex.
Performance measured with rcutorture (nreaders: 10, nupdaters: 1,
duration: 10, median of nine runs):
RCU_QS_ACTIVE_ATTEMPTS == 100, no patch      n_updates = 292
RCU_QS_ACTIVE_ATTEMPTS == 1,   no patch      n_updates = 290
RCU_QS_ACTIVE_ATTEMPTS == 100, with patch    n_updates = 408
RCU_QS_ACTIVE_ATTEMPTS == 1,   with patch    n_updates = 404
(the first two cases are obviously the same; the only change is
when the futex is used, but over many calls there is no difference).
This patch matches the update to the Promela model.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
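The spin-then-block idea in the commit above can be sketched as follows. This is a minimal illustration, not the urcu-qsbr code: a pthread condition variable stands in for the futex, and QS_ACTIVE_ATTEMPTS, struct gp_waiter, struct reader, and every function name are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

#define QS_ACTIVE_ATTEMPTS 100	/* hypothetical stand-in for RCU_QS_ACTIVE_ATTEMPTS */

struct gp_waiter {
	atomic_int readers_pending;	/* readers not yet seen in a quiescent state */
	pthread_mutex_t lock;
	pthread_cond_t wake;		/* stands in for the futex */
};

struct reader {
	bool reported;	/* already reported in this grace period? (reset when a new one starts) */
};

static void gp_waiter_init(struct gp_waiter *w, int nreaders)
{
	atomic_init(&w->readers_pending, nreaders);
	pthread_mutex_init(&w->lock, NULL);
	pthread_cond_init(&w->wake, NULL);
}

/* Reader side: returns true only when a wakeup was actually issued. */
static bool reader_quiescent(struct gp_waiter *w, struct reader *r)
{
	if (r->reported)
		return false;	/* already off the list: skip the wakeup */
	r->reported = true;
	pthread_mutex_lock(&w->lock);
	atomic_fetch_sub(&w->readers_pending, 1);
	pthread_cond_signal(&w->wake);
	pthread_mutex_unlock(&w->lock);
	return true;
}

/* Updater side: busy-wait a bounded number of times, then block. Returns the spin count. */
static int wait_for_readers(struct gp_waiter *w)
{
	int spins = 0;

	while (atomic_load(&w->readers_pending) > 0) {
		if (spins < QS_ACTIVE_ATTEMPTS) {
			spins++;
			continue;
		}
		pthread_mutex_lock(&w->lock);
		while (atomic_load(&w->readers_pending) > 0)
			pthread_cond_wait(&w->wake, &w->lock);
		pthread_mutex_unlock(&w->lock);
	}
	return spins;
}
```

The per-reader flag is what keeps the number of wakeups down: a reader that has already reported returns immediately without touching the lock or the condition variable.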
Paolo Bonzini [Wed, 17 Aug 2011 09:31:21 +0000 (05:31 -0400)]
test api cleanup: remove unused primitives
[ Mathieu: the rationale for this is that we can always add back that
code if ever needed. Removing leftover GPLv2 test code is an
incentive to create the appropriate library-wide LGPL/MIT-style
abstractions.]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Paolo Bonzini [Tue, 9 Aug 2011 12:37:14 +0000 (08:37 -0400)]
wfqueue: fix type-incorrect assignment
The "old_tail = q->tail, q->tail = node" assignment in wfqueue
is not type safe; q->tail is a pointer to pointer to node and the
correct value to assign is &node->next. While the arithmetic is
the same, it is better to be tidy.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
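A small sketch of the type-correct enqueue described above, assuming a queue whose tail is a pointer to the last node's next field. The struct layout and the names wfq_node, wfq_queue, wfq_init, and wfq_enqueue are illustrative; GCC/Clang __atomic builtins stand in for the library's own exchange primitive.

```c
#include <stddef.h>

struct wfq_node {
	struct wfq_node *next;	/* first member, so &node->next and node share an address */
};

struct wfq_queue {
	struct wfq_node *head;
	struct wfq_node **tail;	/* points at the 'next' field of the last node */
	struct wfq_node dummy;
};

static void wfq_init(struct wfq_queue *q)
{
	q->dummy.next = NULL;
	q->head = &q->dummy;
	q->tail = &q->dummy.next;
}

static void wfq_enqueue(struct wfq_queue *q, struct wfq_node *node)
{
	struct wfq_node **old_tail;

	node->next = NULL;
	/* Type-correct: the new tail is &node->next, not node itself. */
	old_tail = __atomic_exchange_n(&q->tail, &node->next, __ATOMIC_SEQ_CST);
	/* Link the previous last node to the new one. */
	__atomic_store_n(old_tail, node, __ATOMIC_RELEASE);
}
```

Because next is the first member, the two pointer values are numerically equal, which is why the untyped assignment happened to work.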
Duncan Sands [Tue, 16 Aug 2011 11:10:01 +0000 (07:10 -0400)]
Fix choice of default flavour in urcu/map/urcu.h
Hi, I noticed in the 0.64 release (and git too) that if a flavour is not
specified explicitly then RCU_MB is chosen in urcu/map/urcu.h, while the
docs say and the Makefile expects RCU_MEMBARRIER. Note that the header
file urcu/static/urcu.h has similar logic but uses RCU_MEMBARRIER for
the default.
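The intended default selection can be sketched with the preprocessor as below. This is an illustration of the logic only, not the contents of urcu/map/urcu.h; RCU_SIGNAL is assumed here as a third flavour macro, and the enum and function names are hypothetical.

```c
/* Fall back to the documented default when no flavour was requested. */
#if !defined(RCU_MEMBARRIER) && !defined(RCU_SIGNAL) && !defined(RCU_MB)
#define RCU_MEMBARRIER
#endif

enum flavour_sketch { FLAVOUR_MEMBARRIER, FLAVOUR_SIGNAL, FLAVOUR_MB };

/* Report which flavour the preprocessor logic above ended up selecting. */
static enum flavour_sketch selected_flavour(void)
{
#if defined(RCU_MEMBARRIER)
	return FLAVOUR_MEMBARRIER;
#elif defined(RCU_SIGNAL)
	return FLAVOUR_SIGNAL;
#else
	return FLAVOUR_MB;
#endif
}
```

Keeping the same fallback in every header that performs this selection avoids the mismatch the report describes.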
urcu tests: hold mutex across use of custom allocator
A thread preempted for a long period of time could race, when scheduled
again, with another thread that would have been allocating/freeing
entries (thus wrapping-around the available buffer), which would trigger
this race only when overcommitting the number of threads compared to the
number of available CPUs.
Taking the mutex across alloc and free to fix this.
mremap keeps the same virtual pages for the old/new mappings. So
explicitly copying from the old mapping is not needed, and probably
buggy, since the old mapping might have been unmapped.
call_rcu: per_cpu_call_rcu_data should be non-const
On FreeBSD:
In file included from urcu.c:438:
urcu-call-rcu-impl.h: In function 'get_cpu_call_rcu_data_mb':
urcu-call-rcu-impl.h:325: warning: return discards qualifiers from pointer target type
compat_arch_x86.c: In function '_compat_uatomic_set':
compat_arch_x86.c:104:16: warning: variable 'result' set but not used [-Wunused-but-set-variable]
* Incorrect prototype for uatomic_and and uatomic_or in i386
compatibility code.
* Missing $(COMPAT) code inclusion in wfq/lfq tests.
* Silence gcc warnings about compat code (branch voluntarily causing a
linker error, which can never return).
./urcu/uatomic/generic.h: In function '_uatomic_and':
./urcu/uatomic/generic.h:310:2: warning: 'return' with a value, in
function returning void
./urcu/uatomic/generic.h: In function '_uatomic_or':
./urcu/uatomic/generic.h:374:2: warning: 'return' with a value, in
function returning void
Even though int is 32-bit on all architectures supported by liburcu so
far, make it future-proof by using an int32_t, which enforces the same
type width used by the system call in the kernel.
Using int32_t and not uint32_t to make comparison with 0 more
straightforward.
Reported-by: Darren Hart <dvhart@linux.intel.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Yannick Brosseau [Fri, 10 Jun 2011 15:35:49 +0000 (11:35 -0400)]
Add library version information
Following the guidelines from libtool
(http://www.gnu.org/software/libtool/manual/html_node/Updating-version-info.htm)
this patch adds version information to the distributed libraries.
For the next release, the version will be 1:0:0.
It will need to be updated before each release.
Paolo Bonzini [Thu, 9 Jun 2011 16:54:38 +0000 (12:54 -0400)]
arm: remove useless declarations
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
commit bc94ca9bada25f7403e3e859caa241146ae8e338 changed the !RT behavior
slightly: when the list is not empty, it does not wait for a delay
anymore. Add this delay back, to ensure we don't flood the system with
frequent synchronize_rcu() calls, which would slow down readers.
The wait scheme has an implementation problem: if the list is not empty
when the !RT scheme checks for it, it will restart the loop and
decrement the futex (again) without calling call_rcu_wait() (which would
wait until it is set back to 0). So in this case, we can end up
decrementing "futex" to values well below -1.
Fix this by moving the decrement before the loop, and duplicating it after
return from call_rcu_wait() + poll() delay. Also move the "set futex to
0 upon stopping" outside of the loop: this is the only way the loop can
be stopped anyway.
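The control flow of the fix can be modelled single-threaded as below. This is a simulation under stated assumptions, not the call_rcu worker itself: call_rcu_wait_stub() pretends a waker has already reset the word to 0, the poll() delay is only marked by a comment, and futex_word, run_worker, and the list_empty array are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

static int32_t futex_word;	/* 0: worker running, -1: worker may sleep */

/* Stub: the real wait would sleep until a waker sets the word back to 0. */
static void call_rcu_wait_stub(void)
{
	futex_word = 0;
}

/*
 * Simulated worker. list_empty[i] says whether the callback list is empty
 * on iteration i. Returns the lowest value the word ever held.
 */
static int32_t run_worker(const bool *list_empty, int rounds)
{
	int32_t lowest;

	futex_word = 0;
	futex_word--;			/* decrement once, before the loop */
	lowest = futex_word;
	for (int i = 0; i < rounds; i++) {
		if (list_empty[i]) {
			call_rcu_wait_stub();	/* returns with the word at 0 */
			/* the poll() delay would go here */
			futex_word--;		/* re-arm after the wait */
		}
		/* Non-empty list: process callbacks, delay, loop; word untouched. */
		if (futex_word < lowest)
			lowest = futex_word;
	}
	futex_word = 0;			/* reset only when the loop stops */
	return lowest;
}
```

With the decrement tied to the wait, the word only ever moves between 0 and -1, whatever mix of empty and non-empty iterations the worker sees.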
Paolo Bonzini [Thu, 9 Jun 2011 14:13:13 +0000 (10:13 -0400)]
use generic-size macros for common implementation of atomic ops
The definition of _uatomic_cmpxchg is different in x86 and other
architectures. For x86 it is a 4-argument macro, for other
architectures it is a 3-argument function. This patch makes it easier
to implement atomic operations incrementally (first as a generic version
and then in machine-specific code), which aids testing and
bisectability.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
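One way to read the commit above: generic fallbacks are written against a single size-generic, 3-argument compare-and-swap macro, so they do not care how each architecture implements the underlying primitive. The sketch below assumes that reading; all sketch_* names are hypothetical, and GCC/Clang __sync and __atomic builtins stand in for the architecture-specific code.

```c
#include <stddef.h>

/* Size-dispatching helper: the 4-argument form, with the operand size passed explicitly. */
static inline unsigned long
sketch_cmpxchg(void *addr, unsigned long old, unsigned long new_val, int len)
{
	switch (len) {
	case 4:
		return __sync_val_compare_and_swap((unsigned int *)addr,
				(unsigned int)old, (unsigned int)new_val);
#if (__SIZEOF_LONG__ == 8)
	case 8:
		return __sync_val_compare_and_swap((unsigned long *)addr,
				old, new_val);
#endif
	}
	__builtin_trap();	/* unsupported operand size */
	return 0;
}

/* Generic-size wrapper: 3 arguments, size taken from the pointed-to type. */
#define sketch_uatomic_cmpxchg(addr, old, new_val)			\
	((__typeof__(*(addr))) sketch_cmpxchg((addr),			\
		(unsigned long)(old), (unsigned long)(new_val),		\
		sizeof(*(addr))))

/* A generic operation built only on the wrapper: add and return the new value. */
static inline unsigned int sketch_add_return(unsigned int *addr, unsigned int val)
{
	unsigned int old, seen = __atomic_load_n(addr, __ATOMIC_RELAXED);

	do {
		old = seen;
		seen = sketch_uatomic_cmpxchg(addr, old, old + val);
	} while (seen != old);	/* retry if another thread changed *addr */
	return old + val;
}
```

An architecture can later replace sketch_add_return with a native instruction without touching callers, which is the incremental path the commit message describes.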
Paolo Bonzini [Thu, 9 Jun 2011 13:32:58 +0000 (09:32 -0400)]
call_rcu: drop mutex
The mutex is being used only to protect OR accesses to the flags.
Just use atomic operations for that.
[ Edit: this also fixes busy-looping on flags that were previously read
without volatile access, which could lead to a never-ending loop given the
appropriate set of compiler optimisations. ]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
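Replacing a mutex-protected flag update with an atomic OR can be sketched as follows. C11 atomics are used here in place of the library's own atomic helpers, and the flag names, struct, and functions are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define CRD_FLAG_STOP		(1UL << 0)	/* hypothetical flag bits */
#define CRD_FLAG_STOPPED	(1UL << 1)

struct crd_sketch {
	atomic_ulong flags;
};

static void crd_init(struct crd_sketch *d)
{
	atomic_init(&d->flags, 0UL);
}

/* Set one or more flag bits without taking a lock. */
static void crd_set_flags(struct crd_sketch *d, unsigned long bits)
{
	atomic_fetch_or(&d->flags, bits);
}

/* Atomic load: the compiler cannot hoist this read out of a polling loop. */
static bool crd_test_flags(struct crd_sketch *d, unsigned long bits)
{
	return (atomic_load(&d->flags) & bits) != 0;
}
```

A thread polling crd_test_flags() in a loop re-reads the flags on every iteration, which addresses the never-ending loop mentioned in the edit note.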