Michael Jeanson [Mon, 3 Jun 2019 20:36:43 +0000 (16:36 -0400)]
Fix: SONAME bump to 6.1.0
In commit d6c78161aed9b2d550ce201b0a8cd5b3ee515ac8 we bumped the 'age'
part of the library version with the intention of keeping the same major
SONAME because we only introduced new symbols. However, by bumping the
'age' and not the 'current' we subtracted 1 from the major SONAME, which
we did not intend. It seems we missed this in testing.
Fix it by bumping the 'current' to end up with an SONAME of 6.1.0 which
is what we originally intended.
From the libtool manual, for reference:
Programs using the previous version may use the new version as drop-in
replacement, but programs using the new version may use APIs not present
in the previous one. In other words, a program linking against the new
version may fail with “unresolved symbols” if linking against the old
version at runtime: set revision to 0, bump current and age.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
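The arithmetic behind this fix can be sketched in C. This is only an illustration, assuming the Linux-style libtool mapping where the SONAME major is 'current' minus 'age' and the file suffix is (current - age).age.revision. The struct and function names are hypothetical. The triples used in the usage note are inferred from the SONAMEs mentioned above, not quoted from the build files.

```c
#include <stdio.h>

/* libtool -version-info triple (CURRENT:REVISION:AGE). */
struct lt_version {
	unsigned int current;
	unsigned int revision;
	unsigned int age;
};

/* On Linux-style targets, the SONAME major is current - age. */
static unsigned int lt_soname_major(struct lt_version v)
{
	return v.current - v.age;
}

/* Format the library suffix as (current - age).age.revision. */
static int lt_format_suffix(struct lt_version v, char *buf, size_t len)
{
	return snprintf(buf, len, "%u.%u.%u",
			lt_soname_major(v), v.age, v.revision);
}
```

Under these assumptions, a triple of 6:0:1 (only 'age' bumped) yields major 5, while 7:0:1 (both 'current' and 'age' bumped) yields the intended 6.1.0.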
Cleanup: update code layout to fix old gcc warning
Some CI jobs show:
urcu-pointer.o
13:46:22 In file included from urcu.c:49:0:
13:46:22 urcu-wait.h:70:9: warning: missing initializer for field 'lock' of 'struct cds_wfs_stack' [-Wmissing-field-initializers]
13:46:22 struct urcu_wait_queue name = URCU_WAIT_QUEUE_HEAD_INIT(name)
13:46:22 ^
13:46:22 urcu.c:150:8: note: in expansion of macro 'DEFINE_URCU_WAIT_QUEUE'
13:46:22 static DEFINE_URCU_WAIT_QUEUE(gp_waiters);
13:46:22 ^
13:46:22 In file included from urcu-wait.h:27:0,
13:46:22 from urcu.c:49:
13:46:22 ../include/urcu/wfstack.h:92:18: note: 'lock' declared here
13:46:22 pthread_mutex_t lock;
13:46:22
Building liburcu with --enable-cds-lfht-iter-debug and rebuilding
application to match the ABI change allows finding cases where the
hash table iterator is re-purposed to be used on a different hash
table while still being used to iterate on a hash table.
This is a common programming mistake that happens often enough
to justify creating a debugging mode to track this automatically.
Michael Jeanson [Wed, 12 Dec 2018 20:01:37 +0000 (15:01 -0500)]
Port: no symbols aliases on MacOS
There is no equivalent to symbol aliases on MacOS. This will
unfortunately break the ABI for SONAME(6) and will require a rebuild of
client applications.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Fri, 30 Nov 2018 19:28:51 +0000 (14:28 -0500)]
Add -Wextra to CFLAGS
Edited by Mathieu Desnoyers:
Use /* fall through */ rather than __attribute__((fallthrough)) to
stay compatible with clang and gcc < 7. The fallthrough attribute
was introduced in gcc 7.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
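A minimal sketch of the comment-based annotation mentioned above. The function and its stages are hypothetical. The point is only that a `/* fall through */` comment placed right before the next case label documents an intentional fallthrough without relying on a compiler-specific attribute.

```c
/* Count how many stages run when starting at a given stage. */
static int stages_from(int stage)
{
	int count = 0;

	switch (stage) {
	case 0:
		count++;
		/* fall through */
	case 1:
		count++;
		/* fall through */
	case 2:
		count++;
		break;
	default:
		break;
	}
	return count;
}
```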
Use new header locations for includes from urcu code
This also moves urcu/static/urcu-pointer.h to urcu/static/pointer.h.
Considering that it is not meant to be included directly by library
users, it should not cause any problem.
urcu-call-rcu.h is included by rculfqueue.h only for the struct rcu_head
forward declaration, but as a result the urcu flavor needs to be chosen
beforehand, which prevents using rculfqueue.h with multiple
urcu flavors in a given compile unit.
Remove that include and do a forward declaration of struct rcu_head
instead.
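A sketch of why a forward declaration is enough here, assuming the header only passes pointers to struct rcu_head around. The demo_queue type and demo_queue_init() are hypothetical and do not reproduce the real rculfqueue.h definitions.

```c
/* Forward declaration only: no urcu flavor header is pulled in. */
struct rcu_head;

/* Hypothetical queue type that merely passes rcu_head pointers around. */
struct demo_queue {
	void (*queue_call_rcu)(struct rcu_head *head,
			void (*func)(struct rcu_head *head));
	int nr_queued;
};

/* Store the flavor-specific call_rcu function chosen by the caller. */
static void demo_queue_init(struct demo_queue *q,
		void (*queue_call_rcu)(struct rcu_head *head,
			void (*func)(struct rcu_head *head)))
{
	q->queue_call_rcu = queue_call_rcu;
	q->nr_queued = 0;
}
```

Because the struct is never dereferenced or embedded by value, the compiler does not need its full definition, so the choice of flavor can be deferred to the caller.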
The header include/urcu/urcu.h now includes the right header between the
memb, signal, or mb flavors based on the compiler defines.
The symbol names of liburcu flavors are cleaned up, favoring the
following hierarchy:
urcu_<flavor name>_...
This is an ABI-breaking change; however, the previous symbol names were
kept as aliases to maintain backward compatibility. They will be removed
when the next SONAME bump occurs.
The new liburcu-memb.so shared object is introduced, properly
namespacing this flavor. It is a duplicate of the previous liburcu.so,
which is kept around for backward compatibility.
The new URCU_API_MAP macro is introduced, controlling whether the
urcu API "mapping" should stay defined after inclusion of the flavor
headers. Users wishing to use the prior urcu API should either
explicitly define URCU_API_MAP before including the urcu/urcu*.h flavor
headers, or include the flavor header files from the include toplevel
directory, which are placeholders for backward compatibility. Use of
many urcu flavors within the same _LGPL_SOURCE compile unit should not
use the "map" APIs.
Internally, the "map" header files are split into one header per
flavor. The include guards are removed, so their effect can be
applied more than once. A new include/urcu/map/clear.h header is
introduced, which undefines the mappings at the end of the flavor
header if URCU_API_MAP is not set.
The new APIs namespaced for each urcu flavor are the recommended way to
use liburcu. We can expect the prior APIs to become deprecated over
time.
Fix: only wait if work queue is empty in real-time mode
Unconditionally waiting for 10 ms after the completion of every batch
of jobs of the work queue in real-time mode appears to be a behaviour
inherited from the call-rcu thread.
While this is a fair trade-off in the context of call-rcu, it is less
evident that it is desirable in the context of a general-purpose
work queue. Waiting when work is available artificially degrades the
latency characteristics of the work queue.
If a workqueue user ever needs the explicit delay for batching (e.g. if
a call-rcu implementation would ever use the workqueue worker thread),
it can add it within the worker_before_wait_fct callback received as
argument from workqueue creation.
Fix: don't wait after completion of a work queue job batch
As indicated in the previous patch of this series, waiting on
completion of a job batch from the work queue artificially increases
the latency of the work queue.
The previous patch removed the wait that is performed when the
work queue is observed to be empty, which was identified as the cause
of a performance problem.
It is likely that waiting when the queue is observed to be non-empty
is similarly unintended. Note that I have not observed such a problem
myself.
If a workqueue user ever needs the explicit delay for batching (e.g. if
a call-rcu implementation would ever use the workqueue worker thread),
it can add it within the worker_before_wait_fct callback received as
argument from workqueue creation.
Fix: don't wait after completion of job batch if work queue is empty
On completion of a batch of jobs from the work queue, a wait of 10
ms (using poll()) is performed if there is no work present in the
work queue before waiting on its futex.
The work queue thread's structure is inspired by the call-rcu thread.
In the context of the call-rcu thread, my understanding is that the
intention is to ensure that the thread does not continuously wake-up
to process a single queued item. This is fine as an application should
not wait for a call-rcu job to be executed (or at least I don't see a
use-case for that).
In the context of the work queue, waiting for more work to be available
artificially slows down the execution of work on which an application
may wait.
I have observed a case where the shutdown of LTTng's session daemon
takes around 4 seconds as a large number of cds_lfht objects are
destroyed. Removing the wait reduces the duration of this phase of the
shutdown to about 10 ms.
If a workqueue user ever needs the explicit delay for batching (e.g. if
a call-rcu implementation would ever use the workqueue worker thread),
it can add it within the worker_before_wait_fct callback received as
argument from workqueue creation.
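As a rough sketch of that last suggestion, a batching delay could live in the callback itself. The workqueue API is internal, so the callback type below is a hypothetical stand-in and the real prototype may differ. Only poll() with no file descriptors, used as a millisecond sleep, is a real call.

```c
#include <poll.h>
#include <stddef.h>

struct demo_workqueue;	/* hypothetical stand-in for the internal type */

/* Hypothetical callback type; the real prototype may differ. */
typedef void (*demo_before_wait_fct)(struct demo_workqueue *wq, void *priv);

/* Sleep for the delay (in ms) stored behind priv before the worker blocks. */
static void demo_batching_delay(struct demo_workqueue *wq, void *priv)
{
	const int *delay_ms = priv;

	(void) wq;
	(void) poll(NULL, 0, *delay_ms);
}
```

Keeping the delay in a user-supplied callback means only users that want batching pay the latency cost.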
Fix: mixup between URCU_WORKQUEUE_RT and URCU_CALL_RCU_RT
The work queue implementation is derived from the call-rcu thread. A
number of references seem to have been left in place when adapting the
code for its new purpose.
The URCU_CALL_RCU_RT flag is used by wake_worker_thread() while the
rest of the workqueue.c code uses URCU_WORKQUEUE_RT to determine if
the work queue was configured in real-time mode. Both flags are defined
to the same value (0x1) and the current internal user of the
work queue (lfht) never specifies any flags.
In practice, this does not cause any problem, but this mixup should
be fixed nevertheless.
Michael Jeanson [Fri, 23 Nov 2018 21:47:18 +0000 (16:47 -0500)]
Fix: pthread_rwlock initialization on Cygwin
On Cygwin the PTHREAD_RWLOCK_INITIALIZER macro is not
sufficient to get a properly initialized pthread_rwlock_t
struct. Use the pthread_rwlock_init function instead which
should work on all platforms.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Fri, 23 Nov 2018 20:27:04 +0000 (15:27 -0500)]
Fix: compat_futex_noasync on Cygwin
The futex_noasync compat code uses a weak symbol to share state across
different shared objects, which is not possible on Windows with the
Portable Executable format. Use the async compat code for both cases.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Eric Wong [Wed, 1 Aug 2018 18:54:45 +0000 (18:54 +0000)]
wfcqueue: allow defining CDS_WFCQ_WAIT_SLEEP to override `poll'
Users may want to use alternative sleeping behavior instead of
`poll'. Make CDS_WFCQ_WAIT_SLEEP a macro which may be defined
before including wfcqueue.h.
This alternative behavior could include logging, performing
low-priority cleanup work, sleeping a shorter/longer interval
or any combination of that.
This will also make integration into glibc easier, as `poll'
linkage causes conformance test failures even when relegated
to an impossible code path:
https://public-inbox.org/libc-alpha/20180801092626.jrwyrojfye4avcis@whir/
Signed-off-by: Eric Wong <normalperson@yhbt.net>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
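The override pattern described here can be sketched as follows. The macro name comes from the commit title, but its argument-less form, the 10 ms default, and the demo_busy_wait() helper are assumptions made for illustration, not the actual wfcqueue.h contents.

```c
#include <poll.h>
#include <stddef.h>

/*
 * Default sleep used by the slow path. A user may define the macro before
 * this point to substitute another behavior. Its exact form is an assumption.
 */
#ifndef CDS_WFCQ_WAIT_SLEEP
#define CDS_WFCQ_WAIT_SLEEP	((void) poll(NULL, 0, 10))
#endif

/* Hypothetical helper: spin a few times, then sleep through the macro. */
static int demo_busy_wait(int *attempt, int max_spins)
{
	if (++(*attempt) >= max_spins) {
		CDS_WFCQ_WAIT_SLEEP;
		*attempt = 0;
		return 1;	/* slept */
	}
	return 0;		/* still spinning */
}
```

Since the macro is expanded at the call site, a translation unit that defines it first never references poll at all, which is the property the glibc integration relies on.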
Michael Jeanson [Fri, 8 Dec 2017 16:00:17 +0000 (11:00 -0500)]
Tests: Replace prove by autotools tap runner
This patch removes the dependency on the prove perl script
to run the TAP test suite. It replaces it with the autotools
shell TAP driver that only requires a shell and awk.
Custom arguments can be passed to the test runner with
env variables as follows:
env LOG_DRIVER_FLAGS='--comments --ignore-exit' \
TESTS='foo.test baz.test' make -e check
This tap driver also creates a log file for each test that
can then be used by another system to build a test report.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
liburcu-bp: Use membarrier private expedited when available
For the liburcu-bp flavor, use the membarrier private expedited command
when available. It is faster than the shared expedited command, but has
only been introduced in 4.14 Linux kernels.
When configured with --disable-sys-membarrier-fallback, liburcu-bp
will abort if running on a kernel that does not provide the membarrier
private expedited command (e.g. CONFIG_MEMBARRIER=n or kernel version
below 4.14).
liburcu: Use membarrier private expedited when available
For the liburcu flavor, use the membarrier private expedited
command when available. It is faster than the shared expedited
command, but has only been introduced in 4.14 Linux kernels.
When configured with --disable-sys-membarrier-fallback, liburcu
will abort if running on a kernel that provides neither the shared
nor the private expedited membarrier commands. This is the case
if running on a CONFIG_MEMBARRIER=n kernel, or a kernel version
below 4.3.
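Detecting the private expedited command at runtime can be sketched as below. This is not the library's actual detection code. It assumes Linux kernel headers recent enough to define MEMBARRIER_CMD_PRIVATE_EXPEDITED, and the demo_ function names are hypothetical. The system call is invoked through syscall(2).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Thin wrapper around the raw membarrier system call. */
static int demo_membarrier(int cmd, int flags)
{
	return (int) syscall(__NR_membarrier, cmd, flags);
}

/*
 * Return 1 if the kernel advertises the private expedited command,
 * 0 otherwise (including when the system call is missing or blocked).
 */
static int demo_has_private_expedited(void)
{
	int mask = demo_membarrier(MEMBARRIER_CMD_QUERY, 0);

	if (mask < 0)
		return 0;
	return (mask & MEMBARRIER_CMD_PRIVATE_EXPEDITED) != 0;
}
```

Note that a process must also register with MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED before issuing the private expedited command. That step is omitted from this sketch.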
Michael Jeanson [Fri, 28 Jul 2017 15:51:15 +0000 (11:51 -0400)]
Fix: don't use overlapping mmap mappings on Cygwin
The allocation scheme used by the mmap based RCU hash table is to make a
large inaccessible mapping to reserve memory without allocating it.
Then smaller chunks are allocated by overlapping read/write mappings which
do allocate memory. Deallocation is done by an overlapping inaccessible
mapping.
This scheme was tested on Linux, macOS and Solaris. However, on Cygwin the
mmap wrapper is based on the Windows NtMapViewOfSection API which doesn't
support overlapping mappings.
An alternative to the overlapping mappings is to use mprotect to change the
protection on chunks of the large mapping, read/write to allocate and none
to deallocate. This works perfectly on Cygwin and Solaris, but on Linux a
call to madvise is also required to deallocate, and it just doesn't work on
macOS.
For this reason, we keep the original scheme on all platforms except Cygwin.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Wed, 26 Jul 2017 17:31:04 +0000 (13:31 -0400)]
Tests fix: errors in shell scripts
Fix all shellcheck errors in the test scripts and switch to
POSIX-compatible syntax. Remove duplicated code already included in
common.sh. Call the tap.sh cleanup code from our exit trap instead of
overriding it.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
The initial-exec model seems to behave differently than global-dynamic
with respect to lazy initialization, causing locks to be taken the
first time each thread touches the TLS. This introduces deadlocks
with library constructors waiting on other threads.
The initial-exec tls model removes the requirement to perform memory
allocation the first time a tls variable is touched by any given thread.
This is needed to ensure usage of the TLS from a signal handler works
fine.
Given that the link-editor figures out the right model to use at
runtime, we can change the tls model without changing the soname major
version.
This also brings interesting speedups over the GD model. This does not
affect TLS accesses performed by executables, but does affect TLS
accesses performed by libraries.
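Selecting the initial-exec model for a single variable can be sketched with the GCC/clang tls_model attribute. The variable and helpers below are hypothetical and do not reproduce the macros liburcu uses to declare its TLS variables.

```c
/*
 * Per-thread nesting counter placed in the initial-exec TLS model, so that
 * the first access from a thread does not need lazy allocation.
 */
static __thread unsigned long demo_nesting
	__attribute__((tls_model("initial-exec")));

/* Increment and return the calling thread's nesting count. */
static unsigned long demo_enter(void)
{
	return ++demo_nesting;
}

/* Decrement and return the calling thread's nesting count. */
static unsigned long demo_exit(void)
{
	return --demo_nesting;
}
```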
Fix: don't use membarrier SHARED syscall command in liburcu-bp
One main user of liburcu-bp (lttng-ust) invokes synchronize_rcu()
repeatedly, without batching (does not use call_rcu).
The delays introduced by the sys_membarrier SHARED command significantly
impact application startup time. Therefore, revert to not using the
membarrier SHARED command.
The RCU lock-free hash table currently requires that the destroy
function should not be called from within RCU read-side critical
sections. This is caused by the lazy resize, which uses the call_rcu
worker thread, even though all it really needs is a workqueue/worker
thread scheme.
Use the new internal workqueue API instead of call_rcu in rculfhash to
overcome this limitation.
Michael Jeanson [Tue, 2 May 2017 21:40:45 +0000 (17:40 -0400)]
Fix: Don't override user variables within the build system
Instead, use the appropriately prefixed AM_* variables so as not to
interfere when a user variable is passed to a make command. The proper
use of flag variables is documented at:
Jonathan Rajotte [Mon, 23 Jan 2017 19:26:59 +0000 (14:26 -0500)]
Add --enable-rcu-debug to configure
When used, CONFIG_RCU_DEBUG is defined in urcu/config.h, so the
debugging self-tests are enabled at all times. This enables a permanent
built-in debugging behaviour.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
"Hans Boehm pointed out that we were using dmb sy instead of dmb ish.
Given that the ARM-ARM says that the inner shareability domain is really
the one that contains all PE's controlled by a single hypervisor or
operating system, it would be safe to replace all dmb sy's with dmb
ish's."
Keep full system barriers for cmm_mb()/cmm_rmb()/cmm_wmb().
Cleanup: remove cmm_wmb() from rcu_xchg_pointer and rcu_cmpxchg_pointer
Both rcu_xchg_pointer() and rcu_cmpxchg_pointer() imply both release and
acquire barriers. Therefore, the extra cmm_wmb() is redundant and can be
removed.
Fix: uatomic arm32: add missing release barrier before uatomic_xchg
__sync_lock_test_and_set() only implies an acquire barrier, but
uatomic_xchg() guarantees both acquire and release barrier semantics.
Therefore, add the missing release barrier.
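The shape of such a fix can be sketched with GCC builtins. This is not the actual uatomic code: demo_xchg() is a hypothetical name, and a full barrier from __sync_synchronize() is used here as a conservative way to obtain release ordering before the exchange.

```c
/*
 * Exchange with both release and acquire ordering.
 * __sync_lock_test_and_set() is not a full barrier on its own, so an
 * explicit barrier is issued before it.
 */
static unsigned long demo_xchg(unsigned long *addr, unsigned long val)
{
	__sync_synchronize();
	return __sync_lock_test_and_set(addr, val);
}
```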
When using the default (liburcu.so) and bulletproof (liburcu-bp.so)
flavours of Userspace RCU, kernel support for sys-membarrier is detected
dynamically and stored in the rcu_has_sys_membarrier_memb and
urcu_bp_has_sys_membarrier global variables.
Checking the value of these variables adds a small but measurable overhead
to smp_mb_slave. On systems which support sys-membarrier, it would be
nice to have a way of avoiding that overhead.
Here is the proposed approach: if CONFIG_RCU_FORCE_SYS_MEMBARRIER is
defined then rcu_has_sys_membarrier_memb/urcu_bp_has_sys_membarrier are
replaced with the constant 1, eliminating the overhead in smp_mb_slave.
As a sanity check, support for sys-membarrier is still detected at
startup. A program using liburcu or liburcu-bp compiled with this option
aborts in the library constructor if the membarrier system call is not
supported by the operating system.
Fix: rcutorture: work-around signal issue on mac os x
Our MacOS X test machine with the following config:
15.6.0 Darwin Kernel Version 15.6.0
root:xnu-3248.60.10~1/RELEASE_X86_64
appears to have issues with liburcu-signal signal being delivered on top
of pthread_cond_wait. It seems to make the thread continue, and
therefore corrupt the rcu_head. Work around this issue by unregistering
the RCU read-side thread immediately after call_rcu (call_rcu needs us
to be registered RCU readers).
Michael Jeanson [Tue, 28 Jun 2016 14:50:18 +0000 (10:50 -0400)]
Fix: Do not use wildcards in include/Makefile.am
Wildcards are not officially supported by autotools
in Makefiles since it needs to know the exact list
of files it has to work with.
Using an absolute path was a hack that worked as long
as the path to the header files from the top source dir
was the same as the install path of those files, which
is not the case anymore.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Thu, 23 Jun 2016 17:39:39 +0000 (13:39 -0400)]
Cleanup: Re-organise source dir
Re-organise the sources, add a top level "src" and "include" dir and
move relevant files.
Disable autotools automated includes and define them manually. This
fixes problems with collision of header names with system headers.
Include the autoconf config.h in the default includes and remove it
where it's explicitly included. Remove _GNU_SOURCE defines since
it's detected at configure for platforms that require it and added
to the config.h.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fix: add missing destroy functions to queues/stack APIs
Queue and stack APIs that invoke pthread_mutex_init() should have a
"destroy" counterpart which calls pthread_mutex_destroy(), otherwise
this causes small memory leaks on platforms where pthread_mutex_init
performs memory allocation.
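The init/destroy pairing can be sketched as follows. The demo_stack type is a hypothetical stand-in for a mutex-protected container, not the actual cds_wfs_stack definition.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stack head protected by a mutex. */
struct demo_stack {
	void *head;
	pthread_mutex_t lock;
};

/* Initialize the stack, including its mutex. */
static int demo_stack_init(struct demo_stack *s)
{
	s->head = NULL;
	return pthread_mutex_init(&s->lock, NULL);
}

/* Counterpart to init: release whatever pthread_mutex_init() allocated. */
static int demo_stack_destroy(struct demo_stack *s)
{
	return pthread_mutex_destroy(&s->lock);
}
```

Callers would invoke demo_stack_destroy() once the stack is empty and no longer shared with other threads.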