Fix: tls-compat.h exposes compiler-dependent public configuration
Exposing the storage class chosen by ax_tls.m4 in a public header is
a bad idea, because if a recent gcc is used when configuring
liburcu, thus detecting C11, it will choose _Thread_local. Then, if an
external project uses urcu/tls-compat.h with an older gcc (e.g. 4.8),
it will fail to build, because that storage class is unknown, and
__thread should be used instead.
Therefore, use preprocessor conditionals instead: if __cplusplus
indicates C++11, use thread_local; else if __STDC_VERSION__ indicates
C11, use _Thread_local; else if _MSC_VER is defined, use
__declspec(thread); otherwise fall back on __thread.
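The selection order described above can be sketched as follows. This is
only an illustration: DEMO_TLS and demo_bump() are hypothetical names,
not the macros actually defined by urcu/tls-compat.h.

```c
#include <assert.h>

/*
 * Pick the thread-local storage class at compile time of the *user*
 * of the header, rather than at configure time of the library.
 */
#if defined(__cplusplus) && (__cplusplus >= 201103L)
# define DEMO_TLS thread_local
#elif defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 201112L)
# define DEMO_TLS _Thread_local
#elif defined(_MSC_VER)
# define DEMO_TLS __declspec(thread)
#else
# define DEMO_TLS __thread
#endif

/* One counter per thread. */
static DEMO_TLS int demo_counter;

/* Add a positive amount to the calling thread's counter. */
int demo_bump(int by)
{
	assert(by > 0);
	demo_counter += by;
	return demo_counter;
}
```

Because the conditionals are evaluated by whichever compiler includes
the header, an older gcc falls through to __thread even if the library
itself was configured with a C11 compiler.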
On architectures where "char" is signed, it should be cast to unsigned
char before being passed as parameter to isdigit or isspace. Based on
their man page:
These functions check whether c, which must have the value of an
unsigned char or EOF, falls into a certain character class according to
the specified locale.
Passing a signed char as parameter is invalid if the values fall into
the negative range of the signed char.
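A minimal sketch of the cast described above; demo_count_digits() is a
hypothetical helper written for illustration, not code from the tree.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Count the decimal digits in a NUL-terminated string. */
size_t demo_count_digits(const char *s)
{
	size_t n = 0;

	assert(s != NULL);
	for (; *s != '\0'; s++) {
		/*
		 * Cast before the call: where char is signed, bytes
		 * above 0x7f would otherwise be passed as negative
		 * values, which isdigit() does not accept.
		 */
		if (isdigit((unsigned char) *s))
			n++;
	}
	return n;
}
```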
Cleanup: update code layout to fix old gcc warning
Some CI jobs show:
urcu-pointer.o
13:46:22 In file included from urcu.c:49:0:
13:46:22 urcu-wait.h:70:9: warning: missing initializer for field 'lock' of 'struct cds_wfs_stack' [-Wmissing-field-initializers]
13:46:22 struct urcu_wait_queue name = URCU_WAIT_QUEUE_HEAD_INIT(name)
13:46:22 ^
13:46:22 urcu.c:150:8: note: in expansion of macro 'DEFINE_URCU_WAIT_QUEUE'
13:46:22 static DEFINE_URCU_WAIT_QUEUE(gp_waiters);
13:46:22 ^
13:46:22 In file included from urcu-wait.h:27:0,
13:46:22 from urcu.c:49:
13:46:22 ../include/urcu/wfstack.h:92:18: note: 'lock' declared here
13:46:22 pthread_mutex_t lock;
13:46:22
Michael Jeanson [Fri, 23 Nov 2018 21:47:18 +0000 (16:47 -0500)]
Fix: pthread_rwlock initialization on Cygwin
On Cygwin the PTHREAD_RWLOCK_INITIALIZER macro is not
sufficient to get a properly initialized pthread_rwlock_t
struct. Use the pthread_rwlock_init function instead which
should work on all platforms.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Fri, 23 Nov 2018 20:27:04 +0000 (15:27 -0500)]
Fix: compat_futex_noasync on Cygwin
The futex_noasync compat code uses a weak symbol to share state across
different shared objects, which is not possible on Windows with the
Portable Executable format. Use the async compat code for both cases.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Fri, 28 Jul 2017 15:51:15 +0000 (11:51 -0400)]
Fix: don't use overlapping mmap mappings on Cygwin
The allocation scheme used by the mmap based RCU hash table is to make a
large inaccessible mapping to reserve memory without allocating it.
Then smaller chunks are allocated by overlapping read/write mappings which
do allocate memory. Deallocation is done by an overlapping inaccessible
mapping.
This scheme was tested on Linux, macOS and Solaris. However, on Cygwin the
mmap wrapper is based on the Windows NtMapViewOfSection API which doesn't
support overlapping mappings.
An alternative to the overlapping mappings is to use mprotect to change the
protection on chunks of the large mapping, read/write to allocate and none
to deallocate. This works perfectly on Cygwin and Solaris, but on Linux a
call to madvise is also required to deallocate, and it just doesn't work on
macOS.
For this reason, we keep the original scheme on all platforms except Cygwin.
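The original overlapping-mapping scheme can be sketched roughly as
below. The demo_* helpers are hypothetical and simplified; they are not
the hash table's actual allocator, and they assume a platform that
provides MAP_ANONYMOUS and supports overlapping MAP_FIXED mappings.

```c
#define _DEFAULT_SOURCE	/* glibc: expose MAP_ANONYMOUS in strict modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Reserve address space without allocating memory. NULL on failure. */
void *demo_reserve(size_t len)
{
	void *p = mmap(NULL, len, PROT_NONE,
			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}

/* Allocate a chunk by overlapping a read/write mapping. 0 on success. */
int demo_commit(void *addr, size_t len)
{
	void *p;

	assert(addr != NULL);
	p = mmap(addr, len, PROT_READ | PROT_WRITE,
			MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	return p == addr ? 0 : -1;
}

/* Deallocate by overlapping an inaccessible mapping. 0 on success. */
int demo_decommit(void *addr, size_t len)
{
	void *p;

	assert(addr != NULL);
	p = mmap(addr, len, PROT_NONE,
			MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	return p == addr ? 0 : -1;
}
```

The mprotect-based alternative would replace the two MAP_FIXED calls
with mprotect(addr, len, PROT_READ | PROT_WRITE) and
mprotect(addr, len, PROT_NONE) respectively.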
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fix: don't use membarrier SHARED syscall command in liburcu-bp
One main user of liburcu-bp (lttng-ust) invokes synchronize_rcu()
repeatedly, without batching (does not use call_rcu).
The delays introduced by the sys_membarrier SHARED command significantly
impact application startup time. Therefore, revert to not using the
membarrier SHARED command.
Michael Jeanson [Mon, 8 May 2017 20:05:46 +0000 (16:05 -0400)]
Fix: Don't override user variables within the build system
Instead, use the appropriately prefixed AM_* variables so as not to interfere
when a user variable is passed to a make command. The proper use of flag
variables is documented at :
Fix: uatomic arm32: add missing release barrier before uatomic_xchg
__sync_lock_test_and_set() only implies an acquire barrier, but
uatomic_xchg() guarantees both acquire and release barrier semantics.
Therefore, add the missing release barrier.
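A minimal sketch of the idea, using GCC builtins only. demo_xchg() is a
hypothetical helper and not the actual arm32 uatomic_xchg()
implementation; GCC documents __sync_lock_test_and_set() as not being a
full barrier, so an explicit barrier is issued before it.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Exchange *addr with v and return the previous value. The full
 * barrier issued first provides the ordering the builtin lacks on
 * its own.
 */
static inline unsigned long demo_xchg(unsigned long *addr, unsigned long v)
{
	assert(addr != NULL);
	__sync_synchronize();
	return __sync_lock_test_and_set(addr, v);
}
```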
Fix: rcutorture: work-around signal issue on mac os x
Our MacOS X test machine with the following config:
15.6.0 Darwin Kernel Version 15.6.0
root:xnu-3248.60.10~1/RELEASE_X86_64
appears to have issues with liburcu-signal signal being delivered on top
of pthread_cond_wait. It seems to make the thread continue, and
therefore corrupt the rcu_head. Work around this issue by unregistering
the RCU read-side thread immediately after call_rcu (call_rcu needs us
to be registered RCU readers).
Fix: add missing destroy functions to queues/stack APIs
Queue and stack APIs that invoke pthread_mutex_init() should have a
"destroy" counterpart which calls pthread_mutex_destroy(), otherwise
this causes small memory leaks on platforms where pthread_mutex_init
performs memory allocation.
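The init/destroy pairing can be illustrated as below. struct demo_stack
and its functions are hypothetical stand-ins, not the wfstack or
wfcqueue API.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct demo_stack {
	void *head;
	pthread_mutex_t lock;
};

/* Initialize the stack, including its mutex. */
void demo_stack_init(struct demo_stack *s)
{
	int ret;

	assert(s != NULL);
	s->head = NULL;
	ret = pthread_mutex_init(&s->lock, NULL);
	assert(!ret);
	(void) ret;
}

/* Release whatever pthread_mutex_init() may have allocated. */
void demo_stack_destroy(struct demo_stack *s)
{
	int ret;

	assert(s != NULL);
	ret = pthread_mutex_destroy(&s->lock);
	assert(!ret);
	(void) ret;
}
```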
Fix: urcu-bp: re-initialize list head on library exit
In case an application would try to create threads after the urcu-bp
library destructor has run, make sure the arena chunk list is
re-initialized after the memory mappings are unmapped.
"make distcheck" marks each source file on the srcdir in the extracted
dist tarball read-only. The examples copy from the srcdir into the
builddir before running the "make" examples, but this keeps the
read-only flag on the builddir directories, which fails the build
because the resulting objects cannot be created.
Fix this by ensuring the copied target directory for each example is
user-writeable.
Introduce __cds_wfcq_head_cast and cds_wfcq_head_cast for compatibility
of wfcqueue with C++. Those are effect-less in C, where transparent
unions are supported. However, in C++, those transform struct
cds_wfcq_head and struct __cds_wfcq_head pointers to
cds_wfcq_head_ptr_t.
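The pattern can be sketched with hypothetical demo_* types standing in
for the wfcqueue ones. This is a simplified illustration of the
technique, not the library's header; it assumes a compiler that
supports the GCC transparent_union attribute in C.

```c
#include <assert.h>
#include <stddef.h>

struct demo_raw_head { int len; };
struct demo_head { struct demo_raw_head raw; int lock_placeholder; };

#ifndef __cplusplus
/* C: either pointer type converts implicitly at the call site. */
typedef union {
	struct demo_raw_head *_h;
	struct demo_head *h;
} __attribute__((__transparent_union__)) demo_head_ptr_t;

/* The casts have no effect in C. */
static inline struct demo_raw_head *
demo_raw_head_cast(struct demo_raw_head *head) { return head; }
static inline struct demo_head *
demo_head_cast(struct demo_head *head) { return head; }
#else
/* C++: no transparent unions, so build the union explicitly. */
typedef union {
	struct demo_raw_head *_h;
	struct demo_head *h;
} demo_head_ptr_t;

static inline demo_head_ptr_t demo_raw_head_cast(struct demo_raw_head *head)
{
	demo_head_ptr_t ret;

	ret._h = head;
	return ret;
}
static inline demo_head_ptr_t demo_head_cast(struct demo_head *head)
{
	demo_head_ptr_t ret;

	ret.h = head;
	return ret;
}
#endif

/* Accepts both head flavors; raw is the first member of demo_head. */
int demo_len(demo_head_ptr_t p)
{
	assert(p._h != NULL);
	return p._h->len;
}
```

Callers written as demo_len(demo_head_cast(&head)) then compile
unchanged as either C or C++.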
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
The urcu refcounting API features a look and feel similar to the Linux
kernel reference counting API, which has been the subject of
CVE-2016-0728 (use-after-free). Therefore, improve the urcu refcounting
API by dealing with reference counting overflow.
For urcu_ref_get(), handle this by comparing the prior value with
LONG_MAX before updating it with a cmpxchg. When an overflow would
occur, trigger an abort() rather than allowing the overflow (which is a
use-after-free security concern).
For urcu_ref_get_unless_zero(), in addition to comparing the prior value
to 0, also compare it to LONG_MAX, and return failure (false) in both
cases.
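The two checks can be sketched as follows, using GCC atomic builtins in
place of the library's uatomic layer. The demo_ref names are
hypothetical and this is not the urcu/ref.h implementation.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

struct demo_ref {
	long refcount;
};

/* Take a reference; abort rather than let the counter overflow. */
void demo_ref_get(struct demo_ref *ref)
{
	long old, seen;

	assert(ref != NULL);
	old = __atomic_load_n(&ref->refcount, __ATOMIC_RELAXED);
	for (;;) {
		if (old == LONG_MAX)
			abort();
		seen = __sync_val_compare_and_swap(&ref->refcount,
				old, old + 1);
		if (seen == old)
			return;
		old = seen;	/* lost a race: retry with the new value */
	}
}

/* Take a reference unless the count is 0 or would overflow. */
bool demo_ref_get_unless_zero(struct demo_ref *ref)
{
	long old, seen;

	assert(ref != NULL);
	old = __atomic_load_n(&ref->refcount, __ATOMIC_RELAXED);
	for (;;) {
		if (old == 0 || old == LONG_MAX)
			return false;
		seen = __sync_val_compare_and_swap(&ref->refcount,
				old, old + 1);
		if (seen == old)
			return true;
		old = seen;
	}
}
```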
Fix: compat_futex should work-around futex signal-restart kernel bug
When testing liburcu on a 3.18 Linux kernel, 2-core MIPS (cpu model :
Ingenic JZRISC V4.15 FPU V0.0), we notice that a blocked sys_futex
FUTEX_WAIT returns -1, errno=ENOSYS when interrupted by a SA_RESTART
signal handler. This spurious ENOSYS behavior causes hangs in liburcu
0.9.x. Running a MIPS 3.18 kernel under a QEMU emulator exhibits the
same behavior. This might affect earlier kernels.
This issue appears to be fixed in 3.19 since commit e967ef022 "MIPS: Fix
restart of indirect syscalls", but nevertheless, we should try to handle
this kernel bug more gracefully than a user-space hang due to unexpected
spurious ENOSYS return value.
Therefore, fall back on the "async-safe" version of compat_futex in those
situations where FUTEX_WAIT returns ENOSYS. This async-safe fallback has
the nice property of being OK to use concurrently with other FUTEX_WAKE
and FUTEX_WAIT futex() calls, because it's simply a busy-wait scheme.
The 4.2 kernel on parisc, and likely newer kernels too, is also
affected by a similar issue.
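A rough, Linux-only sketch of the fallback logic. demo_futex_wait() and
demo_busy_wait() are hypothetical names, and the polling loop is a
simplified stand-in for the library's async-safe compat code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <poll.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Poll until *uaddr no longer holds val. Always returns 0. */
static int demo_busy_wait(int32_t *uaddr, int32_t val)
{
	while (__atomic_load_n(uaddr, __ATOMIC_SEQ_CST) == val)
		(void) poll(NULL, 0, 10);	/* sleep about 10 ms */
	return 0;
}

/* Wait while *uaddr == val; tolerate a spurious ENOSYS. */
int demo_futex_wait(int32_t *uaddr, int32_t val)
{
	long ret;

	assert(uaddr != NULL);
	ret = syscall(SYS_futex, uaddr, FUTEX_WAIT, val, NULL, NULL, 0);
	if (ret < 0 && errno == ENOSYS) {
		/* Kernel bug or no futex support: poll instead. */
		return demo_busy_wait(uaddr, val);
	}
	return (int) ret;
}
```

Since the fallback only reads the futex word, it can coexist with
other threads that use the real futex() calls on the same address.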
Didier Nadeau [Wed, 16 Dec 2015 20:02:47 +0000 (15:02 -0500)]
Support for Xeon-Phi with newer MPSS
The Xeon-Phi is now considered as a new architecture instead of a
vendor in MPSS version 3.4.4. This change is backward compatible with
previous MPSS versions.
Now that the membarrier system call is allocated on sparc, allocate
its number in our architecture header if the system headers don't
allocate it. This allows using the membarrier system call as soon as
implemented in the kernel, even if the distribution has old kernel
headers.
Now that the membarrier system call is allocated on hppa (parisc),
allocate its number in our architecture header if the system headers
don't allocate it. This allows using the membarrier system call as soon
as implemented in the kernel, even if the distribution has old kernel
headers.
The signal-based urcu flavor calls smp_mb_master() within the wait_gp()
function. Since commit "Fix: deadlock when thread join is issued in
read-side C.S.", wait_gp() is called without the registry lock held.
Ensure that the registry lock is only released around the wait per se,
not around the call to smp_mb_master(), otherwise we end up iterating on
a non-consistent thread registry in smp_mb_master().
- Migrate benchmarks and regression tests to tap,
- Replace the "bench" make target by "short_bench" and "long_bench".
The short benchmark is 3 seconds per test, and the long one is 30
seconds per test,
- make regtest now invokes the benchmarks with only 1 second per
benchmark.
- Now use "nproc" command to detect the number of available CPUs rather
than hardcoding a value.
- rcutorture in "stress" mode is now executed.
Now that the membarrier system call is allocated on tile, allocate
its number in our architecture header if the system headers don't
allocate it. This allows using the membarrier system call as soon as
implemented in the kernel, even if the distribution has old kernel
headers.
Do so by creating headers specifically for tile, which rely on the
gcc atomic and memory barrier builtins.
Now that the membarrier system call is allocated on ia64, allocate
its number in our architecture header if the system headers don't
allocate it. This allows using the membarrier system call as soon as
implemented in the kernel, even if the distribution has old kernel
headers.
Do so by creating headers specifically for ia64, which rely on the
gcc atomic and memory barrier builtins.
Now that the membarrier system call is allocated on aarch64, allocate
its number in our architecture header if the system headers don't
allocate it. This allows using the membarrier system call as soon as
implemented in the kernel, even if the distribution has old kernel
headers.
Do so by creating headers specifically for aarch64, which rely on the
gcc atomic and memory barrier builtins.
powerpc64le was originally added to urcu with the "gcc" generic
architecture support. After testing, it appears that the "ppc"
architecture works as well.
Move to the "ppc" architecture so it becomes the same as other powerpc
32/64 (big endian) architectures.
Doing so wires up the membarrier system call on powerpc64le.
Now that the membarrier system call is allocated on ARM, allocate its
number in our architecture header if the system headers don't allocate
it. This allows using the membarrier system call as soon as implemented
in the kernel, even if the distribution has old kernel headers.
Now that the membarrier system call is allocated on s390/s390x, allocate
its number in our architecture header if the system headers don't
allocate it. This allows using the membarrier system call as soon as
implemented in the kernel, even if the distribution has old kernel
headers.
Now that the membarrier system call is allocated on powerpc, allocate
its number in our architecture header if the system headers don't
allocate it. This allows using the membarrier system call as soon as
implemented in the kernel, even if the distribution has old kernel
headers.
The documentation of the RCU-based synchronization technique in lfstack
is too strict. It currently states that the cds_lfs_node structure
cannot be overwritten before a grace period has passed. However, lfstack
pop only uses the next pointer as the replacement value when doing the
cmpxchg on the head. After the node has been pop'd from the stack,
concurrent cmpxchg trying to pop that same node will necessarily fail as
long as there is a grace period before pop/pop_all and re-adding the
node into the stack.
It is therefore sufficient to wait for a grace period between:
1) pop/pop_all and
2) freeing the node (to ensure existence for concurrent pop trying to
read node->next) or re-adding the node into the stack.
This node re-use constraint relaxation is only possible because we don't
care about node->next content read by concurrent pop: it will be simply
discarded by the cmpxchg on head. Be careful not to apply this relaxed
constraint to other data structures which care about the content of the
node's next pointer (e.g. wfstack).
This relaxed constraint allows implementing efficient free-lists (memory
allocation) with a lock-free allocation/free based on lfstack: it allows
re-using the memory backing the free-list node immediately after
allocation. The only requirement with respect to this use-case is to
wait for a grace period before putting the node back into the free-list.
Also update the test_urcu_lfs to poison the next pointer immediately
after pop/pop_all to make sure we test this relaxed constraint.
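To illustrate why a stale node->next is harmless, here is a simplified
lock-free stack written with GCC builtins. The demo_* names are
hypothetical; the real lfstack additionally requires pop to run with
the RCU read-side lock held (or with pops otherwise serialized), which
is omitted here.

```c
#include <assert.h>
#include <stddef.h>

struct demo_node {
	struct demo_node *next;
};

struct demo_stack {
	struct demo_node *head;
};

void demo_push(struct demo_stack *s, struct demo_node *node)
{
	struct demo_node *old, *seen;

	assert(s != NULL && node != NULL);
	old = __atomic_load_n(&s->head, __ATOMIC_RELAXED);
	for (;;) {
		node->next = old;
		seen = __sync_val_compare_and_swap(&s->head, old, node);
		if (seen == old)
			return;
		old = seen;
	}
}

struct demo_node *demo_pop(struct demo_stack *s)
{
	struct demo_node *head, *next;

	assert(s != NULL);
	for (;;) {
		head = __atomic_load_n(&s->head, __ATOMIC_CONSUME);
		if (head == NULL)
			return NULL;
		/*
		 * next is used only as the replacement value. If head
		 * was popped concurrently, the cmpxchg below fails and
		 * whatever was read here is discarded.
		 */
		next = __atomic_load_n(&head->next, __ATOMIC_RELAXED);
		if (__sync_val_compare_and_swap(&s->head, head, next) == head)
			return head;
	}
}
```

A free-list built this way may overwrite the popped node's memory right
away, as long as a grace period elapses before the node is pushed back
or freed.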
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Lai Jiangshan <jiangshanlai@gmail.com>
CC: lttng-dev@lists.lttng.org
CC: rp@svcs.cs.pdx.edu
Cleanup: tests: Branch condition evaluates to a garbage value
scan-build reported this:
Logic error: Branch condition evaluates to a garbage value
tests/benchmark/test_urcu_hash_rw.c, line 170
Logic error: Branch condition evaluates to a garbage value
tests/benchmark/test_urcu_hash_rw.c, line 274
It should never happen based on code review, but silence this warning by
initializing to NULL.
CID 1021635 (#1 of 2): Unchecked return value (CHECKED_RETURN)
7. check_return: Calling pthread_mutex_unlock without checking return value
(as is done elsewhere 29 out of 33 times).
CID 1021634 (#2 of 2): Unchecked return value (CHECKED_RETURN)
12. check_return: Calling pthread_mutex_unlock without checking return value
(as is done elsewhere 29 out of 33 times).
CID 1021642 (#1 of 2): Side effect in assertion (ASSERT_SIDE_EFFECT)
assert_side_effect: Argument test_array of assert() has a side effect
because the variable is volatile. The containing function might work
differently in a non-debug build.
Now that the membarrier system call is allocated on x86 32/64, allocate
its number in our architecture header if the system headers don't
allocate it. This allows using the membarrier system call as soon as
implemented in the kernel, even if the distribution has old kernel
headers.
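The fallback definition can be sketched for x86 as below. The numbers
are the ones assigned in the Linux syscall tables for x86-64 (324) and
i386 (375); the x32 ABI is deliberately not handled, and
demo_membarrier() is a hypothetical wrapper rather than the library's
own code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>	/* may or may not define __NR_membarrier */
#include <unistd.h>

/* Supply the number only when the system headers are too old. */
#ifndef __NR_membarrier
# if defined(__x86_64__) && !defined(__ILP32__)
#  define __NR_membarrier	324
# elif defined(__i386__)
#  define __NR_membarrier	375
# endif
#endif

/* Invoke membarrier, or fail with ENOSYS if no number is known. */
int demo_membarrier(int cmd, unsigned int flags)
{
	assert(cmd >= 0);
#ifdef __NR_membarrier
	return (int) syscall(__NR_membarrier, cmd, flags);
#else
	(void) flags;
	errno = ENOSYS;
	return -1;
#endif
}
```

A kernel that predates the system call still returns ENOSYS at run
time, so callers need a fallback path either way.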
Allows getting a reference atomically if the reference count is not
zero. Returns true if the reference is taken, false otherwise. This
needs to be used in conjunction with another synchronization technique
(e.g. RCU or mutex) to ensure existence of the reference count.