This missing lock was identified while stress-testing the
snapshot tracing mode.
The "post_mortem" test case would sometimes hang on a
push_metadata() call waiting for a status reply from the
consumer daemon.
This test exposed a race that occurs when an application is killed
and a snapshot is taken near-simultaneously.
This causes the apps management thread to issue a "push metadata"
command to the consumerd while the lttng client is issuing
a snapshot record command.
Since the snapshot record does not acquire the consumer socket lock,
the "push metadata" and "snapshot" commands end up interleaved on
the socket, which ultimately causes the apps management thread
to wait forever for a reply while holding the socket's lock.
This prevents the client, invoked by the test script, from
completing the "stop" operation on the session.
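The interleaving can be reproduced outside of lttng-tools with a small sketch. None of the names below come from lttng-tools: `struct msg_header`, `struct client_arg`, `xfer()`, `consumer_thread()`, `client_thread()` and `run_clients()` are hypothetical, and the header-plus-payload framing is an assumption chosen for illustration. Several client threads share one end of a `socketpair()` while a single thread plays the consumer daemon. Each client holds a `pthread_mutex_t` for its whole exchange (header, payload and reply), so no other thread can write between its messages or consume its reply.

```c
#include <pthread.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_CLIENTS 16
#define ITERATIONS 1000

/* Hypothetical wire format: a fixed-size header followed by a payload. */
struct msg_header {
	uint32_t cmd; /* sender id, echoed back in the reply */
	uint32_t len; /* payload length in bytes */
};

struct client_arg {
	int fd;                /* shared by every client */
	pthread_mutex_t *lock; /* shared by every client */
	uint32_t id;
	int mismatches;
};

/* Transfer exactly len bytes; return 0 on success, -1 on error or EOF. */
static int xfer(int fd, void *buf, size_t len, int is_write)
{
	for (char *p = buf; len > 0;) {
		ssize_t n = is_write ? write(fd, p, len) : read(fd, p, len);
		if (n <= 0)
			return -1;
		p += n;
		len -= (size_t) n;
	}
	return 0;
}

/* Stands in for the daemon: answers each request with the sender's id. */
static void *consumer_thread(void *data)
{
	int fd = *(int *) data;
	struct msg_header hdr;
	uint32_t payload;
	while (xfer(fd, &hdr, sizeof(hdr), 0) == 0 &&
	       hdr.len == sizeof(payload) &&
	       xfer(fd, &payload, sizeof(payload), 0) == 0 &&
	       xfer(fd, &hdr.cmd, sizeof(hdr.cmd), 1) == 0)
		; /* stop on EOF, I/O error or a desynchronized stream */
	return NULL;
}

static void *client_thread(void *data)
{
	struct client_arg *arg = data;
	for (uint32_t i = 0; i < ITERATIONS; i++) {
		struct msg_header hdr = { .cmd = arg->id, .len = sizeof(i) };
		uint32_t reply = 0;

		/* One lock spans header, payload and reply: no interleaving. */
		pthread_mutex_lock(arg->lock);
		int failed = xfer(arg->fd, &hdr, sizeof(hdr), 1) != 0 ||
			     xfer(arg->fd, &i, sizeof(i), 1) != 0 ||
			     xfer(arg->fd, &reply, sizeof(reply), 0) != 0;
		pthread_mutex_unlock(arg->lock);
		if (failed || reply != arg->id)
			arg->mismatches++;
	}
	return NULL;
}

/* Returns the number of mismatched replies, or -1 on setup failure. */
int run_clients(int nclients)
{
	int fds[2], total = 0, created = 0;
	pthread_mutex_t lock;
	pthread_t server, clients[MAX_CLIENTS];
	struct client_arg args[MAX_CLIENTS];
	if (nclients < 1 || nclients > MAX_CLIENTS ||
	    socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
		return -1;
	if (pthread_create(&server, NULL, consumer_thread, &fds[1]) != 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	pthread_mutex_init(&lock, NULL);
	for (; created < nclients; created++) {
		args[created] = (struct client_arg){ fds[0], &lock, (uint32_t) created + 1, 0 };
		if (pthread_create(&clients[created], NULL, client_thread, &args[created]))
			break;
	}
	for (int i = 0; i < created; i++) {
		pthread_join(clients[i], NULL);
		total += args[i].mismatches;
	}
	close(fds[0]); /* the consumer thread then reads EOF and exits */
	pthread_join(server, NULL);
	close(fds[1]);
	pthread_mutex_destroy(&lock);
	return created < nclients ? -1 : total;
}
```

Built with `-pthread` and called from `main()`, `run_clients(4)` is expected to return 0, since each client only ever reads the reply to its own request. Dropping the `pthread_mutex_lock()`/`pthread_mutex_unlock()` pair lets the threads' writes and reads interleave, which can produce mismatched replies or leave the consumer thread on a desynchronized stream with clients blocked in `read()`, a hang comparable to the one described above.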
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>

 	}
 	health_code_update();
+	pthread_mutex_lock(socket->lock);
 	ret = consumer_send_msg(socket, &msg);
+	pthread_mutex_unlock(socket->lock);
 	if (ret < 0) {
 		goto error;
 	}