From: Jérémie Galarneau
Date: Wed, 14 Feb 2018 21:13:51 +0000 (-0500)
Subject: Fix: consumer socket lock not held during snapshot record
X-Git-Tag: v2.10.3~13
X-Git-Url: https://git.lttng.org./?a=commitdiff_plain;h=705c13ed534670994e761d4d79742c3f95b7140c;p=lttng-tools.git

This missing lock was identified while stress-testing the snapshot
tracing mode. The "post_mortem" test case would sometimes hang on a
push_metadata() call waiting for a status reply from the consumer
daemon.

This test demonstrated a race that consists of killing an application
and taking a snapshot near-simultaneously. This causes the "apps
management" thread to issue a "push metadata" command to the consumerd
while the lttng client is issuing a "snapshot record" command.

Since the snapshot record does not acquire the consumer socket lock,
the "push metadata" and "snapshot" commands end up mixed up on the
socket, which ultimately causes the "apps management" thread to wait
for a reply forever while holding the socket's lock.

This prevents the client, invoked by the test script, from completing
the "stop" operation on the session.

Signed-off-by: Jérémie Galarneau
---

diff --git a/src/bin/lttng-sessiond/consumer.c b/src/bin/lttng-sessiond/consumer.c
index 251944606..126b01a44 100644
--- a/src/bin/lttng-sessiond/consumer.c
+++ b/src/bin/lttng-sessiond/consumer.c
@@ -1442,7 +1442,9 @@ int consumer_snapshot_channel(struct consumer_socket *socket, uint64_t key,
 	}
 
 	health_code_update();
+	pthread_mutex_lock(socket->lock);
 	ret = consumer_send_msg(socket, &msg);
+	pthread_mutex_unlock(socket->lock);
 	if (ret < 0) {
 		goto error;
 	}
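
The locking pattern applied by this fix can be illustrated in isolation. The following is a minimal sketch, not the lttng-tools code: `demo_socket`, `demo_send_msg()`, `demo_worker_run()`, `demo_peer_run()` and `demo_run()` are hypothetical names, and the "consumer" is simulated by a thread that echoes each command back over a `socketpair()`. It assumes, as the commit message suggests, that sending a command also involves waiting for a status reply on the same socket, so the lock has to cover the whole exchange.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for a consumer socket shared between threads. */
struct demo_socket {
	int fd;
	pthread_mutex_t *lock;	/* Guards one full request/reply exchange. */
};

struct demo_worker {
	struct demo_socket *sock;
	uint32_t id;		/* Command value sent by this worker. */
	int rounds;
	int mismatches;		/* Replies that did not match our command. */
};

/* Send one command and wait for its reply. The caller must hold sock->lock. */
static int demo_send_msg(struct demo_socket *sock, uint32_t cmd, uint32_t *reply)
{
	assert(sock != NULL && reply != NULL);
	if (send(sock->fd, &cmd, sizeof(cmd), 0) != (ssize_t) sizeof(cmd)) {
		return -1;
	}
	if (recv(sock->fd, reply, sizeof(*reply), MSG_WAITALL) !=
			(ssize_t) sizeof(*reply)) {
		return -1;
	}
	return 0;
}

/* Worker thread: same lock/send/unlock sequence as in the patch. */
static void *demo_worker_run(void *arg)
{
	struct demo_worker *w = arg;

	for (int i = 0; i < w->rounds; i++) {
		uint32_t reply = 0;
		int ret;

		pthread_mutex_lock(w->sock->lock);
		ret = demo_send_msg(w->sock, w->id, &reply);
		pthread_mutex_unlock(w->sock->lock);
		if (ret < 0 || reply != w->id) {
			w->mismatches++;
		}
	}
	return NULL;
}

/* Simulated peer: echoes every command back as its "status reply". */
static void *demo_peer_run(void *arg)
{
	int fd = *(int *) arg;
	uint32_t cmd;

	while (recv(fd, &cmd, sizeof(cmd), MSG_WAITALL) == (ssize_t) sizeof(cmd)) {
		if (send(fd, &cmd, sizeof(cmd), 0) != (ssize_t) sizeof(cmd)) {
			break;
		}
	}
	return NULL;
}

/* Run two workers on one socket; returns mismatched replies, or -1 on error. */
int demo_run(int rounds)
{
	int fds[2];
	pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
	struct demo_socket sock;
	struct demo_worker workers[2];
	pthread_t peer, threads[2];
	int started = 0, total = 0;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0) {
		return -1;
	}
	sock.fd = fds[0];
	sock.lock = &lock;
	if (pthread_create(&peer, NULL, demo_peer_run, &fds[1]) != 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	for (int i = 0; i < 2; i++) {
		workers[i] = (struct demo_worker) {
			.sock = &sock, .id = (uint32_t) i + 1, .rounds = rounds,
		};
		if (pthread_create(&threads[i], NULL, demo_worker_run,
				&workers[i]) != 0) {
			break;
		}
		started++;
	}
	for (int i = 0; i < started; i++) {
		pthread_join(threads[i], NULL);
		total += workers[i].mismatches;
	}
	close(fds[0]);		/* The peer sees EOF and exits. */
	pthread_join(peer, NULL);
	close(fds[1]);
	return started == 2 ? total : -1;
}
```

With the lock held across each exchange, every worker reads back only its own reply. Removing the `pthread_mutex_lock()`/`pthread_mutex_unlock()` pair in `demo_worker_run()` lets one thread consume the reply meant for the other, which is the kind of mix-up the commit message describes between the "push metadata" and "snapshot" commands.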