From: Mathieu Desnoyers Date: Wed, 13 Jun 2012 08:26:34 +0000 (-0400) Subject: Fix: liblttng-ust-fork deadlock X-Git-Tag: v2.1.0-rc1~60 X-Git-Url: https://git.lttng.org./?a=commitdiff_plain;h=e8508a495280a1396a0319a157409a7ea61ffc5a;p=lttng-ust.git Fix: liblttng-ust-fork deadlock * Mathieu Desnoyers (mathieu.desnoyers@efficios.com) wrote: > * Burton, Michael (mburton@ciena.com) wrote: > > Mathieu, > > > > I think there is a deadlock scenario in UST, which has been causing my problem. > > Good catch ! > > > > > sessiond is started as root: > > - creates global sockets ONLY > > - DOES NOT CREATE shm in $HOME/lttng-ust-wait- > > > > application linked against ust is run as root: > > - in lttng_ust_init constructor > > - ust_listener_thread (local_apps) > > - fails to connect to local_apps in $HOME/.lttng (as expected) > > - prev_connect_failed=1 > > - ust_unlock() > > - restart > > - wait_for_sessiond() > > --> - ust_lock() > > | - get_map_shm() > > | - get_wait_shm() > > DEADLOCK - shm_open() FAILS (not created by sessiond when run by root) > > | - fork() (trying to create shared memory itself) > > | - ust_before_fork() > > ------------> - ust_lock() > > > > > > You should be able to create this with an empty main, with no > > tracepoints. As long as sessiond is started as root so > > $HOME/lttng-ust-wait- is not created. You can also make the > > lttng-ust constructor (lttng_ust_init) wait forever and then you'll be > > able to see the deadlock in gdb without even leaving the > > lttng_ust_init constructor. > > Ah, I see. This deadlock is caused by the interaction between > [ liblttng-ust-fork ] and liblttng-ust (the fork override is > performed by [ liblttng-ust-fork ]). This can be reproduced easily with the in-tree tests: by removing the lttng-ust-apps-wait* files belonging to the user in /dev/shm, running the "tests/fork" test (with ./run) hangs. If we run "hello" first, and then the fork test, it works fine. Fixing this by keeping a nesting counter around the fork() call, so we return immediately from the pre/post fork handlers if they are overridden by liblttng-ust-fork. Reported-by: Michael Burton Signed-off-by: Mathieu Desnoyers --- diff --git a/liblttng-ust/lttng-ust-comm.c b/liblttng-ust/lttng-ust-comm.c index 37f21c80..7a909701 100644 --- a/liblttng-ust/lttng-ust-comm.c +++ b/liblttng-ust/lttng-ust-comm.c @@ -78,6 +78,12 @@ static sem_t constructor_wait; */ static int sem_count = { 2 }; +/* + * Counting nesting within lttng-ust. Used to ensure that calling fork() + * from liblttng-ust does not execute the pre/post fork handlers. + */ +static int __thread lttng_ust_nest_count; + /* * Info about socket and associated listener thread. */ @@ -481,7 +487,9 @@ int get_wait_shm(struct sock_info *sock_info, size_t mmap_size) * If the open failed because the file did not exist, try * creating it ourself. */ + lttng_ust_nest_count++; pid = fork(); + lttng_ust_nest_count--; if (pid > 0) { int status; @@ -1031,6 +1039,8 @@ void ust_before_fork(sigset_t *save_sigset) sigset_t all_sigs; int ret; + if (lttng_ust_nest_count) + return; /* Disable signals */ sigfillset(&all_sigs); ret = sigprocmask(SIG_BLOCK, &all_sigs, save_sigset); @@ -1056,6 +1066,8 @@ static void ust_after_fork_common(sigset_t *restore_sigset) void ust_after_fork_parent(sigset_t *restore_sigset) { + if (lttng_ust_nest_count) + return; DBG("process %d", getpid()); rcu_bp_after_fork_parent(); /* Release mutexes and reenable signals */ @@ -1073,6 +1085,8 @@ void ust_after_fork_parent(sigset_t *restore_sigset) */ void ust_after_fork_child(sigset_t *restore_sigset) { + if (lttng_ust_nest_count) + return; DBG("process %d", getpid()); /* Release urcu mutexes */ rcu_bp_after_fork_child();