From: Jérémie Galarneau
Date: Wed, 14 Feb 2018 21:13:51 +0000 (-0500)
Subject: Fix: consumer socket lock not held during snapshot record
X-Git-Tag: v2.9.8~13
X-Git-Url: https://git.lttng.org./?a=commitdiff_plain;h=d31d74e226f91d92a421c2e9056c557620097617;p=lttng-tools.git

Fix: consumer socket lock not held during snapshot record

This missing lock was identified while stress-testing the snapshot
tracing mode. The "post_mortem" test case would sometimes hang on a
push_metadata() call waiting for a status reply from the consumer
daemon.

This test demonstrated a race that consists in killing an application
and taking a snapshot near-simultaneously. This causes the app
management thread to issue a "push metadata" command to the consumerd
while the lttng client is issuing a snapshot record command.

Since the snapshot record does not acquire the consumer socket lock,
the "push metadata" and "snapshot" commands end up mixed up on the
socket, which ultimately causes the "apps management" thread to wait
for a reply forever while holding the socket's lock. This prevents the
client, invoked by the test script, from completing the "stop"
operation on the session.

Signed-off-by: Jérémie Galarneau
---

diff --git a/src/bin/lttng-sessiond/consumer.c b/src/bin/lttng-sessiond/consumer.c
index 7c8f4e830..cad1587d2 100644
--- a/src/bin/lttng-sessiond/consumer.c
+++ b/src/bin/lttng-sessiond/consumer.c
@@ -1406,7 +1406,9 @@ int consumer_snapshot_channel(struct consumer_socket *socket, uint64_t key,
 	}
 
 	health_code_update();
+	pthread_mutex_lock(socket->lock);
 	ret = consumer_send_msg(socket, &msg);
+	pthread_mutex_unlock(socket->lock);
 	if (ret < 0) {
 		goto error;
 	}
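
The locking discipline this fix restores can be illustrated with a minimal standalone sketch. It does not use any lttng-tools code: `demo_socket`, `echo_daemon`, `client` and `run_demo` are hypothetical names, and the "daemon" is a thread that echoes each request id back over a `socketpair()`. It assumes a command consists of one send followed by one status reply on the same socket. Two client threads share the socket and hold its mutex across the whole exchange, so each one reads back its own reply.

```c
#include <pthread.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for a consumer socket shared by several threads. */
struct demo_socket {
	int fd;
	pthread_mutex_t *lock;
};

struct client_args {
	struct demo_socket *sock;
	uint32_t first_id;	/* First request id used by this client. */
	int count;		/* Number of request/reply exchanges. */
	int mismatches;		/* Failed or mismatched exchanges. */
};

/* Echo "daemon": answers every request with the id it received. */
static void *echo_daemon(void *arg)
{
	int fd = *(int *) arg;
	uint32_t id;

	while (read(fd, &id, sizeof(id)) == (ssize_t) sizeof(id)) {
		if (write(fd, &id, sizeof(id)) != (ssize_t) sizeof(id)) {
			break;
		}
	}
	return NULL;
}

static void *client(void *arg)
{
	struct client_args *args = arg;
	int i;

	for (i = 0; i < args->count; i++) {
		uint32_t request = args->first_id + (uint32_t) i;
		uint32_t reply = 0;

		/* Hold the lock across the send AND the reply. */
		pthread_mutex_lock(args->sock->lock);
		if (write(args->sock->fd, &request, sizeof(request)) !=
				(ssize_t) sizeof(request) ||
				read(args->sock->fd, &reply, sizeof(reply)) !=
				(ssize_t) sizeof(reply) ||
				reply != request) {
			args->mismatches++;
		}
		pthread_mutex_unlock(args->sock->lock);
	}
	return NULL;
}

/* Returns the number of mismatched exchanges, or -1 on setup failure. */
int run_demo(void)
{
	int fds[2];
	pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
	struct demo_socket sock;
	struct client_args a, b;
	pthread_t daemon_thread, ta, tb;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0) {
		return -1;
	}
	sock.fd = fds[0];
	sock.lock = &lock;
	a = (struct client_args) { &sock, 0, 1000, 0 };
	b = (struct client_args) { &sock, 1000000, 1000, 0 };

	pthread_create(&daemon_thread, NULL, echo_daemon, &fds[1]);
	pthread_create(&ta, NULL, client, &a);
	pthread_create(&tb, NULL, client, &b);
	pthread_join(ta, NULL);
	pthread_join(tb, NULL);

	close(fds[0]);	/* EOF on the peer ends the daemon loop. */
	pthread_join(daemon_thread, NULL);
	close(fds[1]);
	return a.mismatches + b.mismatches;
}
```

Calling `run_demo()` from a `main()` and linking with `-pthread` should report zero mismatches. If the lock/unlock pair in `client()` is removed, one thread can read the reply meant for the other, which is the kind of mix-up the commit message describes.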