git.lttng.org Git - urcu.git/commit

Fix: urcu-bp segfault in glibc pthread_kill()

This fixes an issue that appears after this recent urcu-bp fix is
applied:

  Fix: urcu-bp: Bulletproof RCU arena resize bug

Prior to this fix, on Linux at least, the behavior was to allocate
(and leak) one memory map region per reader thread. It worked, except
for the unfortunate leak. The fact that it worked, even though not the
way we had intended it to, is is why testing did not raise any red flag.
That state of affairs has prevailed for a long time, but it was
side-tracking some issues. After fixing the underlying bug that was
causing the memory map leak, another issue appears.

The garbage collection scheme reclaiming the thread tracking structures
in urcu-bp fails in stress tests to due a bug in glibc (tested against
glibc 2.13 and 2.17). Under this workload, on a 2-core/hyperthreaded i7:

./test_urcu_bp 40 4 10

we can easily trigger a segmentation fault in the pthread_kill() code.

Program terminated with signal 11, Segmentation fault.

Backtrace:

#0  __pthread_kill (threadid=140723681437440, signo=0) at ../nptl/sysdeps/unix/sysv/linux/pthread_kill.c:42
42 ../nptl/sysdeps/unix/sysv/linux/pthread_kill.c: No such file or directory.
(gdb) bt full
#0  __pthread_kill (threadid=140723681437440, signo=0) at ../nptl/sysdeps/unix/sysv/linux/pthread_kill.c:42
         __x = <optimized out>
         pd = 0x7ffcc90b2700
         tid = <optimized out>
         val = <optimized out>
#1  0x0000000000403009 in rcu_gc_registry () at ../../urcu-bp.c:437
         tid = 140723681437440
         ret = 0
         chunk = 0x7ffcca0b8000
         rcu_reader_reg = 0x7ffcca0b8120
         __PRETTY_FUNCTION__ = "rcu_gc_registry"
#2  0x0000000000402b9c in synchronize_rcu_bp () at ../../urcu-bp.c:230
         cur_snap_readers = {next = 0x7ffcb4888cc0, prev = 0x7ffcb4888cc0}
         qsreaders = {next = 0x7ffcb4888cd0, prev = 0x7ffcb4888cd0}
         newmask = {__val = {18446744067267100671, 18446744073709551615 <repeats 15 times>}}
         oldmask = {__val = {0, 140723337334144, 0, 0, 0, 140723690351643, 0, 140723127058464, 4, 0, 140723698253920, 140723693868864, 4096, 140723690370432,
             140723698253920, 140723059951840}}
         ret = 0
         __PRETTY_FUNCTION__ = "synchronize_rcu_bp"
#3  0x0000000000401803 in thr_writer (_count=0x76b2f0) at test_urcu_bp.c:223
         count = 0x76b2f0
         new = 0x7ffca80008c0
         old = 0x7ffca40008c0
#4  0x00007ffcc9c83f8e in start_thread (arg=0x7ffcb4889700) at pthread_create.c:311
         __res = <optimized out>
         pd = 0x7ffcb4889700
         now = <optimized out>
         unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140723337336576, 6546223316613858487, 0, 140723698253920, 140723693868864, 4096, -6547756131873848137,
                 -6547872135220034377}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
         not_first_call = 0
         pagesize_m1 = <optimized out>
         sp = <optimized out>
         freesize = <optimized out>
         __PRETTY_FUNCTION__ = "start_thread"
#5  0x00007ffcc99ade1d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

It appears that the memory backing the thread information can be
relinquished by NPTL concurrently with execution of pthread_kill()
targeting an already joined thread and cause this segfault. We were
using pthread_kill(tid, 0) to discover if the target thread was alive or
not, as documented in pthread_kill(3):

       If  sig  is 0, then no signal is sent, but error checking is still per‐
       formed; this can be used to check for the existence of a thread ID.

but it appears that the glibc implementation is racy.

Instead of using the racy pthread_kill implementation, implement cleanup
using a pthread_key destroy notifier for a dummy key. This notifier is
called for each thread exit and destroy.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

author	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
	Wed, 2 Oct 2013 00:06:37 +0000 (20:06 -0400)
committer	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
	Wed, 2 Oct 2013 03:14:31 +0000 (23:14 -0400)
commit	8c1bd0782ff0ce28b96d394a5b27a294af6156e0
tree	4b7498085d7fc9dd848b5a73f773dc40c9d18d37	tree \| snapshot
parent	a0d529ecc9949d776800f5a006974d6bf410910e	commit \| diff