Olivier Dion [Mon, 21 Oct 2024 17:31:28 +0000 (13:31 -0400)]
Introduce _CMM_TOOLCHAIN_SUPPORT_C11_MM
If the toolchain used supports the C11 memory model, define the private
_CMM_TOOLCHAIN_SUPPORT_C11_MM macro.
This is used to implement `uatomic_load()/uatomic_store' in term of
`__atomic' builtins instead of using volatile accesses, therefore
reducing possible redundant memory barriers for some memory orders.
Change-Id: I0df3c202daf74ac012338583b761de15687d25f3 Signed-off-by: Olivier Dion <odion@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Olivier Dion [Mon, 21 Oct 2024 17:18:08 +0000 (13:18 -0400)]
Seperate uatomic and uatomic_mo
The API for uatomic is now defined under `urcu/uatomic/api.h', which is
included by `urcu/uatomic.h'. All definitions are macros that dispatch
to their `_mo' counterpart. The default argument is set for the memory
order for backward compatibility.
This means that only the `uatomic_*_mo' must be implemented either
generically or by architecture.
This also remove the C11 compatibility layer in x86. Indeed, since RMW
operations are always guaranteed to have a full fence, then it is safe
to ignore the memory ordering. This is because all used sync operations
are documented as been "considered as a full barrier". See
https://gcc.gnu.org/onlinedocs/gcc/_005f_005fsync-Builtins.html.
Change-Id: I6be8c45b1758f268e7406bb17ab0086f9e9f5d4e Signed-off-by: Olivier Dion <odion@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
When using rcu_dereference() in the method of a templated class on a pointer
whose type is determined by the template arguments, the following error occurs
(using g++ 13.2.0):
../../src/common/urcu.hpp:387:39: error: need 'typename' before 'std::remove_cv<__typeof__ (((lttng::urcu::rcu_list_iteration_adapter<ContainedType, Member>::iterator*)this)->_node_contents.next)>::type' because 'std::remove_cv<__typeof__ (((lttng::urcu::rcu_list_iteration_adapter<ContainedType, Member>::iterator*)this)->_node_contents.next)>' is a dependent scope
`typename` is necessary here since the result of the template argument provided
to std::remove_cv can be a dependant type and we are required to inform the
compiler of this dependency.
Michael Jeanson [Mon, 22 Jul 2024 17:47:33 +0000 (13:47 -0400)]
Allow building with GCC >= 13.3 on RISC-V
Building with GCC was marked as broken on RISC-V in
"46980605309e922d14f646c6705d184cb674c0f7". The patches fixing the issue
with atomic operations were included in the GCC 14 branch and backported
to 13.3.0.
Add the appropriate checks to allow building on RISC-V with the fixed
GCC versions.
Tested with GCC 13.3.0 on a VisionFive 2 board.
Change-Id: I9e4498640116b2b5fe73dd8e1822309444130998 Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
futex.h: Use urcu_posix_assert to validate unused values
When building on FreeBSD, uaddr2 and val3 are unused. Add a
urcu_posix_assert() to validate that they are zero and hence allow users
of the API to quickly figure out that those are not effectively used.
When building on OpenBSD, val3 is unused. Add a urcu_posix_assert() to
validate that it is zero.
Those asserts are already present in the compat code. Use the same
mechanism to prevent users from expecting futex arguments to be used
when they are in fact discarded.
fix: handle EINTR correctly in get_cpu_mask_from_sysfs
If the read() in get_cpu_mask_from_sysfs() fails with EINTR, the code is
supposed to retry, but the while loop condition has (bytes_read > 0),
which is false when read() fails with EINTR. The result is that the code
exits the loop, having only read part of the string.
Use (bytes_read != 0) in the while loop condition instead, since the
(bytes_read < 0) case is already handled in the loop.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I565030d4625ae199cabc4c2ab5eb8ac49ea4dfcb
Olivier Dion [Tue, 22 Aug 2023 20:23:17 +0000 (16:23 -0400)]
uatomic/x86: Remove redundant memory barriers
When liburcu is configured to _not_ use atomic builtins, the
implementation of atomic operations is done using inline assembler for
each architecture.
Because we control the emitted assembler, we know whether specific
operations (e.g. lock; cmpxchg) already have an implicit memory barrier.
In those cases, emitting an explicit cmm_smp_mb() before/after the
operation is redundant and hurts performance.
Remove those redundant barriers on x86.
Change-Id: Ic1f6cfe9c2afe250946549cf6187f8fa88f5b009 Signed-off-by: Olivier Dion <odion@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Xenofon Foukas [Thu, 15 Feb 2024 15:21:42 +0000 (15:21 +0000)]
Add support for custom memory allocators for rculfhash
The current implementation of rculfhash relies on calloc()
to allocate memory for its buckets. This can in some cases
lead to latency spikes when accessing the hash table, which
can be avoided by using an optimized custom memory allocator.
However, there is currently no way of replacing the default
allocator with a custom one.
This commit allows custom allocators to be used during the
table initialization. The default behavior of the hash table
remains unaffected, by using the stdlib calloc() and free(),
if no custom allocator is given.
Sergey Fedorov [Fri, 5 Jan 2024 10:44:18 +0000 (18:44 +0800)]
ppc.h: use mftb on ppc
Older versions of GNU as do not support mftbl. The issue affects Darwin
PowerPC, as well as some older versions of NetBSD and Linux. Since mftb
is equivalent and universally understood, just use that.
Sam James [Sun, 5 Nov 2023 22:27:17 +0000 (22:27 +0000)]
Fix -Walloc-size
GCC 14 introduces a new -Walloc-size included in -Wextra which gives:
```
urcu-call-rcu-impl.h:912:20: warning: allocation of insufficient size '1' for type 'struct call_rcu_completion' with size '16' [-Walloc-size[https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Walloc-size]]
urcu-call-rcu-impl.h:927:22: warning: allocation of insufficient size '1' for type 'struct call_rcu_completion_work' with size '24' [-Walloc-size[https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Walloc-size]]
urcu-call-rcu-impl.h:912:20: warning: allocation of insufficient size '1' for type 'struct call_rcu_completion' with size '16' [-Walloc-size[https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Walloc-size]]
urcu-call-rcu-impl.h:927:22: warning: allocation of insufficient size '1' for type 'struct call_rcu_completion_work' with size '24' [-Walloc-size[https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Walloc-size]]
urcu-call-rcu-impl.h:912:20: warning: allocation of insufficient size '1' for type 'struct call_rcu_completion' with size '16' [-Walloc-size[https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Walloc-size]]
urcu-call-rcu-impl.h:927:22: warning: allocation of insufficient size '1' for type 'struct call_rcu_completion_work' with size '24' [-Walloc-size[https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Walloc-size]]
urcu-call-rcu-impl.h:912:20: warning: allocation of insufficient size '1' for type 'struct call_rcu_completion' with size '16' [-Walloc-size[https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Walloc-size]]
urcu-call-rcu-impl.h:927:22: warning: allocation of insufficient size '1' for type 'struct call_rcu_completion_work' with size '24' [-Walloc-size[https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Walloc-size]]
urcu-call-rcu-impl.h:912:20: warning: allocation of insufficient size '1' for type 'struct call_rcu_completion' with size '16' [-Walloc-size[https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Walloc-size]]
urcu-call-rcu-impl.h:927:22: warning: allocation of insufficient size '1' for type 'struct call_rcu_completion_work' with size '24' [-Walloc-size[https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Walloc-size]]
workqueue.c:401:20: warning: allocation of insufficient size '1' for type 'struct urcu_workqueue_completion' with size '16' [-Walloc-size[https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Walloc-size]]
workqueue.c:432:14: warning: allocation of insufficient size '1' for type 'struct urcu_workqueue_completion_work' with size '24' [-Walloc-size[https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Walloc-size]]
urcu-call-rcu-impl.h:912:20: warning: allocation of insufficient size '1' for type 'struct call_rcu_completion' with size '16' [-Walloc-size[https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Walloc-size]]
urcu-call-rcu-impl.h:927:22: warning: allocation of insufficient size '1' for type 'struct call_rcu_completion_work' with size '24' [-Walloc-size[https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Walloc-size]]
qsbr.c:49:14: warning: allocation of insufficient size ‘1’ for type ‘struct mynode’ with size ‘40’ [-Walloc-size[https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Walloc-size]]
mb.c:50:14: warning: allocation of insufficient size ‘1’ for type ‘struct mynode’ with size ‘40’ [-Walloc-size[https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Walloc-size]]
membarrier.c:50:14: warning: allocation of insufficient size ‘1’ for type ‘struct mynode’ with size ‘40’ [-Walloc-size[https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Walloc-size]]
signal.c:49:14: warning: allocation of insufficient size ‘1’ for type ‘struct mynode’ with size ‘40’ [-Walloc-size[https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Walloc-size]]
bp.c:49:14: warning: allocation of insufficient size ‘1’ for type ‘struct mynode’ with size ‘40’ [-Walloc-size[https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Walloc-size]]
```
The calloc prototype is:
```
void *calloc(size_t nmemb, size_t size);
```
So, just swap the number of members and size arguments to match the prototype, as
we're initialising 1 struct of size `sizeof(struct ...)`. GCC then sees we're not
doing anything wrong.
Signed-off-by: Sam James <sam@gentoo.org> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: Id84ce5cf9a1b97bfa942597aa188ef6e27e7c10d
Olivier Dion [Thu, 28 Sep 2023 16:53:46 +0000 (12:53 -0400)]
urcu/uatomic/riscv: Mark RISC-V as broken
Implementations of some atomic operations of GCC for RISC-V are
insufficient for sequential consistency. For this reason Userspace RCU
is currently marked as `broken' for RISC-V with GCC. However, it is
still possible to use other toolchains.
See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104831 for details.
For now, we mark every version of GCC as unsupported. Distribution
package maintainers will have to cherry-pick the relevant patches in GCC
then remove the #error in Userspace RCU if they want to support it.
As for us, we will incrementally add specific versions of GCC that have
fixed the issue whenever new stable releases are made from the GCC
project.
Change-Id: I2cd7c8f12068628b845a096e03f5f8100eacbe43 Signed-off-by: Olivier Dion <odion@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
This is a port from a fix in LTTng-UST's embedded urcu (d1a0fad8). The
original message follows:
Running the LTTng-tools tests (test_valid_filter, for example) under
address sanitizer results in the following warning:
/usr/include/lttng/urcu/static/urcu-ust.h:155:6: runtime error: member access within misaligned address 0x7fc45db3a020 for type 'struct lttng_ust_urcu_reader', which requires 128 byte alignment
0x7fc45db3a020: note: pointer points here
c4 7f 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
^
While the node member of lttng_ust_urcu_reader has an "aligned"
attribute of CAA_CACHE_LINE_SIZE, the compiler can't ensure the
alignment of members for dynamically allocated instances.
The `data` pointer is changed from char* to struct
lttng_ust_urcu_reader*, allowing the compiler to enforce the expected
alignment constraints.
Since `data` was addressed in bytes, the code using this field is
adapted to use element counts. As the chunks are only used to allocate
reader instances (and not other types), it makes the code a bit easier
to read.
Section 2.2.7 "Atomic Memory Access Instructions" only lists atomic
operations for 32-bit and 64-bit integers. As detailed in Section
2.2.7.1, LL/SC instructions operating on 32-bit and 64-bit integers are
also available. Those are used by the compiler to support atomics on
byte and short types.
This means atomics on 32-bit and 64-bit types have stronger forward
progress guarantees than those operating on 8-bit and 16-bit types.
Tests: Add test for byte/short atomics on addresses which are not word-aligned
Add a unit test to catch architectures which do not allow byte and short
atomic operations on addresses which are not word aligned.
If an architecture supports byte and short atomic operations, it should
be valid to issue those operations on variables which are not
word-aligned, otherwise the architecture should not define
UATOMIC_HAS_ATOMIC_BYTE nor UATOMIC_HAS_ATOMIC_SHORT.
This should help identify architectures which mistakenly define
UATOMIC_HAS_ATOMIC_BYTE and UATOMIC_HAS_ATOMIC_SHORT.
This commit completes removal of the urcu-signal flavor.
Users can migrate to liburcu-memb with a kernel implementing the
membarrier(2) system call to have similar read-side performance without
requiring use of a reserved signal, and with improved grace period
performance.
Fix: Add missing cmm_smp_mb() in deprecated urcu-signal
commit 97d13221f8a1 ("Phase 1 of deprecating liburcu-signal") miss a
cmm_smp_mb() at the beginning of the read-side critical sections, which
causes spurious failures in the CI tests.
Olivier Dion [Mon, 14 Aug 2023 20:40:30 +0000 (16:40 -0400)]
Phase 1 of deprecating liburcu-signal
The first phase of liburcu-signal deprecation consists of implementing
it in term of liburcu-mb. In other words, liburcu-signal is identical to
liburcu-mb at the exception of the function symbols and public header
files.
This is done by:
1) Removing the RCU_SIGNAL specific code in urcu.c
2) Making the RCU_MB specific code also specific to RCU_SIGNAL in
urcu.c
3) Rewriting _urcu_signal_read_unlock_update_and_wakeup to use a
atomic store with CMM_SEQ_CST instead of a store CMM_RELAXED with
cmm_barrier() around it. We could keep the explicit barriers, but that
would require to add some cmm_annotate annotations. Therefore, to be
less intrusive in a public header file, simply use the CMM_SEQ_CST
like for the mb flavor.
Change-Id: Ie406f7df2f47da0a9f464df94b968ad9204821f3 Signed-off-by: Olivier Dion <odion@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
abort(3) was explicitly declared external to avoid including
<stdlib.h>. However, this emit a redundant declaration warning if it was
already declared before including <urcu/uatomic.h>.
Fix this by including <stdlib.h> and not declaring abort().
Change-Id: If9557814c311e2b531e85fec8c41788462338fe4 Signed-off-by: Olivier Dion <odion@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Olivier Dion [Mon, 29 May 2023 15:21:11 +0000 (11:21 -0400)]
Add cmm_emit_legacy_smp_mb()
Some public APIs stipulate implicit memory barriers on operations. These
were coherent with the memory model used at that time. However, with the
migration to a memory model closer to the C11 memory model, these memory
barriers are not strictly emitted by the atomic operations in the new
memory model.
Therefore, introducing the `--disable-legacy-mb' configuration
option. By default, liburcu is configured to emit these legacy memory
barriers, thus keeping backward compatibility at the expense of slower
performances. However, users can opt-out by disabling the legacy memory
barriers.
This options is publicly exported in the system configuration header
file and can be overrode manually on a compilation unit basis by
defining `CONFIG_RCU_EMIT_LEGACY_MB' before including any liburcu files.
The usage of this macro requires to re-write atomic operations in term
of the CMM memory model. This is done for the queue and stack APIs.
Change-Id: Ia5ce3b3d8cd1955556ce96fa4408a63aa098a1a6 Signed-off-by: Olivier Dion <odion@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Olivier Dion [Fri, 31 Mar 2023 17:47:17 +0000 (13:47 -0400)]
urcu/annotate: Add CMM annotation
The CMM annotation is highly experimental and not meant to be used by
user for now, even though it is exposed in the public API since some
parts of the liburcu public API require those annotations.
The main primitive is the cmm_annotate_t which denotes a group of memory
operations associated with a memory barrier. A group follows a state
machine, starting from the `CMM_ANNOTATE_VOID' state. The following are
the only valid transitions:
The macro `cmm_annotate_define(name)' can be used to create an
annotation object on the stack. The rest of the `cmm_annotate_*' macros
can be used to change the state of the group after validating that the
transition is allowed. Some of these macros also inject TSAN annotations
to help it understand the flow of events in the program since it does
not currently support thread fence.
Sometime, a single memory access does not need to be associated with a
group. In the case, the acquire/release macros variant without the
`group' infix can be used to annotate memory accesses.
Note that TSAN can not be used on the liburcu-signal flavor. This is
because TSAN hijacks calls to sigaction(3) and places its own handler
that will deliver the signal to the application at a synchronization
point.
Thus, the usage of TSAN on the signal flavor is undefined
behavior. However, there's at least one known behavior which is a
deadlock between readers that want to unregister them-self by locking
the `rcu_registry_lock' while a synchronize RCU is made on the writer
side which has already locked that mutex until all the registered
readers execute a memory barrier in a signal handler defined by
liburcu-signal. However, TSAN will not call the registered handler while
waiting on the mutex. Therefore, the writer spin infinitely on
pthread_kill(3p) because the reader simply never complete the handshake.
Olivier Dion [Fri, 31 Mar 2023 14:53:43 +0000 (10:53 -0400)]
benchmark: Use uatomic for accessing global states
Global states accesses were protected via memory barriers. Use the
uatomic API with the CMM memory model so that TSAN can understand the
ordering imposed by the synchronization flags.
Olivier Dion [Wed, 29 Mar 2023 19:22:13 +0000 (15:22 -0400)]
tests: Use uatomic for accessing global states
Global states accesses were protected via memory barriers. Use the
uatomic API with the CMM memory model so that TSAN does not warn about
non-atomic concurrent accesses.
Also, the thread id map mutex must be unlocked after setting the new
created thread id in the map. Otherwise, the new thread could observe an
unset id.
Olivier Dion [Thu, 30 Mar 2023 20:09:50 +0000 (16:09 -0400)]
urcu-wait: Fix wait state load/store
The state of a wait node must be accessed atomically. Also, the action
of busy loading until the teardown state is seen must follow a
CMM_ACQUIRE semantic while storing the teardown must follow a
CMM_RELEASE semantic.
The CMM memory model reflects the C11 memory model with an additional
CMM_SEQ_CST_FENCE memory order. The memory order can be selected through
the enum cmm_memorder.
* With Atomic Builtins
If configured with atomic builtins, the correspondence between the CMM
memory model and the C11 memory model is a one to one at the exception
of the CMM_SEQ_CST_FENCE memory order which implies the memory order
CMM_SEQ_CST and a thread fence after the operation.
* Without Atomic Builtins
However, if not configured with atomic builtins, the following stipulate
the memory model.
For load operations with uatomic_load(), the memory orders CMM_RELAXED,
CMM_CONSUME, CMM_ACQUIRE, CMM_SEQ_CST and CMM_SEQ_CST_FENCE are
allowed. A barrier may be inserted before and after the load from memory
depending on the memory order:
- CMM_RELAXED: No barrier
- CMM_CONSUME: Memory barrier after read
- CMM_ACQUIRE: Memory barrier after read
- CMM_SEQ_CST: Memory barriers before and after read
- CMM_SEQ_CST_FENCE: Memory barriers before and after read
For store operations with uatomic_store(), the memory orders
CMM_RELAXED, CMM_RELEASE, CMM_SEQ_CST and CMM_SEQ_CST_FENCE are
allowed. A barrier may be inserted before and after the store to memory
depending on the memory order:
- CMM_RELAXED: No barrier
- CMM_RELEASE: Memory barrier before operation
- CMM_SEQ_CST: Memory barriers before and after operation
- CMM_SEQ_CST_FENCE: Memory barriers before and after operation
For load/store operations with uatomic_and_mo(), uatomic_or_mo(),
uatomic_add_mo(), uatomic_sub_mo(), uatomic_inc_mo(), uatomic_dec_mo(),
uatomic_add_return_mo() and uatomic_sub_return_mo(), all memory orders
are allowed. A barrier may be inserted before and after the operation
depending on the memory order:
- CMM_RELAXED: No barrier
- CMM_ACQUIRE: Memory barrier after operation
- CMM_CONSUME: Memory barrier after operation
- CMM_RELEASE: Memory barrier before operation
- CMM_ACQ_REL: Memory barriers before and after operation
- CMM_SEQ_CST: Memory barriers before and after operation
- CMM_SEQ_CST_FENCE: Memory barriers before and after operation
For the exchange operation uatomic_xchg_mo(), any memory order is
valid. A barrier may be inserted before and after the exchange to memory
depending on the memory order:
- CMM_RELAXED: No barrier
- CMM_ACQUIRE: Memory barrier after operation
- CMM_CONSUME: Memory barrier after operation
- CMM_RELEASE: Memory barrier before operation
- CMM_ACQ_REL: Memory barriers before and after operation
- CMM_SEQ_CST: Memory barriers before and after operation
- CMM_SEQ_CST_FENCE: Memory barriers before and after operation
For the compare exchange operation uatomic_cmpxchg_mo(), the success
memory order can be anything while the failure memory order cannot be
CMM_RELEASE nor CMM_ACQ_REL and cannot be stronger than the success
memory order. A barrier may be inserted before and after the store to
memory depending on the memory orders:
Success memory order:
- CMM_RELAXED: No barrier
- CMM_ACQUIRE: Memory barrier after operation
- CMM_CONSUME: Memory barrier after operation
- CMM_RELEASE: Memory barrier before operation
- CMM_ACQ_REL: Memory barriers before and after operation
- CMM_SEQ_CST: Memory barriers before and after operation
- CMM_SEQ_CST_FENCE: Memory barriers before and after operation
Barriers after the operations are only emitted if the compare exchange
succeed.
Failure memory order:
- CMM_RELAXED: No barrier
- CMM_ACQUIRE: Memory barrier after operation
- CMM_CONSUME: Memory barrier after operation
- CMM_SEQ_CST: Memory barriers before and after operation
- CMM_SEQ_CST_FENCE: Memory barriers before and after operation
Barriers after the operations are only emitted if the compare exchange
failed. Barriers before the operation are never emitted by this
memory order.
If the toolchain supports atomic builtins and the user ask for atomic
builtins, use them for the uatomic API. This requires that the
toolchains used to compile the library and the user application supports
such builtins.
The advantage of using these builtins is that they are well known
synchronization primitives by several tools such as TSAN.
However, they may introduce redundant memory barriers, mainly on
strongly ordered architectures.
Change-Id: Ia8e97112681f744f17816dbc4cbbec805a483331 Co-authored-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Olivier Dion <odion@efficios.com> Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Olivier Dion [Fri, 14 Jul 2023 16:49:24 +0000 (12:49 -0400)]
tests/regression/rcutorture: Use urcu-wait
pthread_cond_wait(3) can have spurious wakeups on some OS. To detect
such spurious wakeup, a global variable is shared between the waiter and
the waker.
We can use urcu-wait instead.
Change-Id: I6a2d2f3c9104ea23df16a7c8ba3557bb5d58306c Signed-off-by: Olivier Dion <odion@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Thu, 6 Jul 2023 15:35:11 +0000 (11:35 -0400)]
Complete REUSE support
The SPDX identifiers [1] are a legally binding shorthand, which can be
used instead of the full boiler plate text. This is the final step
towards implementing the full REUSE spec [2] to help with copyright and
licensing audits and compliance.
This will reduce a lot a manual work required for the licensing audit
required in Debian on each update.
Michael Jeanson [Wed, 5 Jul 2023 18:20:10 +0000 (14:20 -0400)]
extras/abi: license data files under CC-1.0
The SPDX identifiers [1] are a legally binding shorthand, which can be
used instead of the full boiler plate text. This is another step towards
implementing the full REUSE spec [2] to help with copyright and
licensing audits and compliance.
This will reduce a lot a manual work required for the licensing audit
required in Debian on each update.
These are generated files, use the CC-1.0 license to make their
licensing clear.
Michael Jeanson [Wed, 5 Jul 2023 15:17:24 +0000 (11:17 -0400)]
examples: use SPDX identifiers
The SPDX identifiers [1] are a legally binding shorthand, which can be
used instead of the full boiler plate text. This is another step towards
implementing the full REUSE spec [2] to help with copyright and
licensing audits and compliance.
This will reduce a lot a manual work required for the licensing audit
required in Debian on each update.
Relicense all examples from 'Boehm-GC' to the more well-known and
functionnaly identical 'MIT' license. This is possible since all the
examples were written by Mathieu Desnoyers and only a few trivial fixes
from external contributors were applied over the years.
Michael Jeanson [Tue, 4 Jul 2023 20:53:30 +0000 (16:53 -0400)]
tests: use SPDX identifiers
The SPDX identifiers [1] are a legally binding shorthand, which can be
used instead of the full boiler plate text. This is another step towards
implementing the full REUSE spec [2] to help with copyright and
licensing audits and compliance.
This will reduce a lot a manual work required for the licensing audit
required in Debian on each update.
For files that lacked copyright and licensing information, I used the
following guidelines. Use the author from the git history and the test
scripts license as stated in LICENSE, 'GPL-2.0-only'.
Michael Jeanson [Tue, 4 Jul 2023 20:53:07 +0000 (16:53 -0400)]
src: use SPDX identifiers
The SPDX identifiers [1] are a legally binding shorthand, which can be
used instead of the full boiler plate text. This is another step towards
implementing the full REUSE spec [2] to help with copyright and
licensing audits and compliance.
This will reduce a lot a manual work required for the licensing audit
required in Debian on each update.
Michael Jeanson [Tue, 4 Jul 2023 20:52:00 +0000 (16:52 -0400)]
Public headers: use SPDX identifiers
The SPDX identifiers [1] are a legally binding shorthand, which can be
used instead of the full boiler plate text. This is another step towards
implementing the full REUSE spec [2] to help with copyright and
licensing audits and compliance.
This will reduce a lot a manual work required for the licensing audit
required in Debian on each update.
For files that lacked copyright and licensing information, I used the
following guidelines. Use the author from the git history and the main
project license 'LGPL-2.1-or-later'.
Michael Jeanson [Tue, 4 Jul 2023 20:47:07 +0000 (16:47 -0400)]
Build system: use SPDX identifiers
The SPDX identifiers [1] are a legally binding shorthand, which can be
used instead of the full boiler plate text. This is the first step
towards implementing the full REUSE spec [2] to help with copyright and
licensing audits and compliance.
This will reduce a lot a manual work required for the licensing audit
required in Debian on each update.
For files that lacked copyright and licensing information, I used the
following guidelines. If a clear author could be determined from the git
history use it, otherwise use 'EfficiOS Inc.'. For build system files,
use 'MIT', for documentation 'CC-BY-4.0' and for data files 'CC-1.0'.
The approach taken by caa_unqual_scalar_typeof requires use of _Generic
which requires full C11 support. Currently liburcu supports C99.
Therefore, this approach is not appropriate for now.
Instead, introduce caa_container_of_check_null which returns NULL if the
ptr is NULL before offsetting by the member offset.
Avoid calling caa_container_of on NULL pointer in cds_lfht macros
The cds_lfht_for_each_entry and cds_lfht_for_each_entry_duplicate macros
would call caa_container_of() macro on NULL pointer. This is not a
problem under normal circumstances as the check in the for loop fails
and the loop-statement is not called with invalid (pos) value.
However AddressSanitizer doesn't like that and complains about this:
Move the cds_lfht_iter_get_node(iter) != NULL from the cond-expression
of the for loop into both init-clause and iteration-expression as
conditional operator and check for (pos) value in the cond-expression
instead. Introduce the cds_lfht_entry() macro to eliminate code
duplication.
Michael Jeanson [Thu, 23 Mar 2023 18:23:55 +0000 (14:23 -0400)]
fix: warning 'noreturn' function does return on ppc
On a ppc64 system with gcc 9.5.0 I get the following error when building
with -O0 :
/usr/include/urcu/uatomic/generic.h: In function 'void _uatomic_link_error()':
/usr/include/urcu/uatomic/generic.h:53:1: warning: 'noreturn' function does return
53 | }
| ^
Split the inline function in 2 variants and apply the noreturn attribute
only on the builtin_trap one.
Change-Id: I5ae8e764c4cc27af0463924a653b9eaa9f698c34 Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Ondřej Surý [Fri, 17 Mar 2023 15:44:10 +0000 (16:44 +0100)]
Fix: use __noreturn__ for C11-compatibility
The noreturn convenience macro provided by stdnoreturn.h might get
included before urcu headers, use __noreturn__ for better compatibility
with code using <stdnoreturn.h> header.
Brad Smith [Sat, 25 Feb 2023 05:53:06 +0000 (00:53 -0500)]
Adjust shell scripts to allow Bash in other locations
Linux-based OS for the most part provide Bash and being located in /bin,
but on other OS's the shell would be in another location. Utilize env(1)
and allow it to be located elsewhere.
Signed-off-by: Brad Smith <brad@comstyle.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I9d4d4a3feaf993754c64b740ea91e42b336ba2b4
Brad Smith [Sat, 25 Feb 2023 02:17:16 +0000 (21:17 -0500)]
Add support for OpenBSD
- Add OpenBSD to syscall compatibility header as appropriate.
- Add function for retrieving the thread id in urcu_get_thread_id().
- Rely on pthread cond variables for futex compatibility.
It builds on all of our archs and fully run time tested on amd64.
Signed-off-by: Brad Smith <brad@comstyle.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I5cca5962ba3dc3113c9bd12e544b6e6f77dfdb61
Fix: call_rcu: teardown default call_rcu worker on application exit
Teardown the default call_rcu worker thread if there are no queued
callbacks on process exit. This prevents leaking memory.
Here is how an application can ensure graceful teardown of this
worker thread:
- An application queuing call_rcu callbacks should invoke
rcu_barrier() before it exits.
- When chaining call_rcu callbacks, the number of calls to
rcu_barrier() on application exit must match at least the maximum
number of chained callbacks.
- If an application chains callbacks endlessly, it would have to be
modified to stop chaining callbacks when it detects an application
exit (e.g. with a flag), and wait for quiescence with rcu_barrier()
after setting that flag.
- The statements above apply to a library which queues call_rcu
callbacks, only it needs to invoke rcu_barrier in its library
destructor.
Fix a deadlock for auto-resize hash tables when cds_lfht_destroy
is called with RCU read-side lock held.
Example stack track of a hang:
Thread 2 (Thread 0x7f21ba876700 (LWP 26114)):
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1 0x00007f21beba7aa0 in futex (val3=0, uaddr2=0x0, timeout=0x0, val=-1, op=0, uaddr=0x7f21bedac308 <urcu_memb_gp+8>) at ../include/urcu/futex.h:81
#2 futex_noasync (timeout=0x0, uaddr2=0x0, val3=0, val=-1, op=0, uaddr=0x7f21bedac308 <urcu_memb_gp+8>) at ../include/urcu/futex.h:90
#3 wait_gp () at urcu.c:265
#4 wait_for_readers (input_readers=input_readers@entry=0x7f21ba8751b0, cur_snap_readers=cur_snap_readers@entry=0x0,
qsreaders=qsreaders@entry=0x7f21ba8751c0) at urcu.c:357
#5 0x00007f21beba8339 in urcu_memb_synchronize_rcu () at urcu.c:498
#6 0x00007f21be99f93f in fini_table (last_order=<optimized out>, first_order=13, ht=0x5651cec75400) at rculfhash.c:1489
#7 _do_cds_lfht_shrink (new_size=<optimized out>, old_size=<optimized out>, ht=0x5651cec75400) at rculfhash.c:2001
#8 _do_cds_lfht_resize (ht=ht@entry=0x5651cec75400) at rculfhash.c:2023
#9 0x00007f21be99fa26 in do_resize_cb (work=0x5651e20621a0) at rculfhash.c:2063
#10 0x00007f21be99dbfd in workqueue_thread (arg=0x5651cec74a00) at workqueue.c:234
#11 0x00007f21bd7c06db in start_thread (arg=0x7f21ba876700) at pthread_create.c:463
#12 0x00007f21bd4e961f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Thread 1 (Thread 0x7f21bf285300 (LWP 26098)):
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1 0x00007f21be99d8b7 in futex (val3=0, uaddr2=0x0, timeout=0x0, val=-1, op=0, uaddr=0x5651d8b38584) at ../include/urcu/futex.h:81
#2 futex_async (timeout=0x0, uaddr2=0x0, val3=0, val=-1, op=0, uaddr=0x5651d8b38584) at ../include/urcu/futex.h:113
#3 futex_wait (futex=futex@entry=0x5651d8b38584) at workqueue.c:135
#4 0x00007f21be99e2c8 in urcu_workqueue_wait_completion (completion=completion@entry=0x5651d8b38580) at workqueue.c:423
#5 0x00007f21be99e3f9 in urcu_workqueue_flush_queued_work (workqueue=0x5651cec74a00) at workqueue.c:452
#6 0x00007f21be9a0c83 in cds_lfht_destroy (ht=0x5651d8b2fcf0, attr=attr@entry=0x0) at rculfhash.c:1906
This deadlock is easy to reproduce when rapidly adding a large number of
entries in the cds_lfht, removing them, and calling cds_lfht_destroy().
The deadlock will occur if the call to cds_lfht_destroy() takes place
while a resize of the hash table is ongoing.
Fix this by moving the teardown of the lfht worker thread to libcds
library destructor, so it does not have to wait on synchronize_rcu from
a resize callback from within a read-side critical section. As a
consequence, the atfork callbacks are left registered within each urcu
flavor for which a resizeable hash table is created until the end of the
executable lifetime.
The other part of the fix is to move the hash table destruction to the
worker thread for auto-resize hash tables. This prevents having to wait
for resize callbacks from RCU read-side critical section. This is
guaranteed by the fact that the worker thread serializes previously
queued resize callbacks before the destroy callback.
Christopher Ng [Fri, 3 Feb 2023 12:16:06 +0000 (12:16 +0000)]
Fix building on MSYS2
Update cygwin libtool config in `configure.ac` to match MSYS2 build
environments as well. MSYS2 is also a Windows build environment that
produces DLLs.
Signed-off-by: Christopher Ng <facboy@gmail.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I48ca648123fd40b8003c72c0447c70a8b4bde6d6