The following crash was reported when short-lived applications
are traced in a live session with per-pid buffering channels.
From the original report:
```
Thread 1 (Thread 0x7f72b67fc700 (LWP
1912155)):
#0 0x00005650b3f6ccbd in commit_one_metadata_packet (stream=0x7f729c010bf0) at ust-consumer.c:2537
#1 0x00005650b3f6cf58 in lttng_ustconsumer_sync_metadata (ctx=0x5650b588ce60, metadata=0x7f729c010bf0) at ust-consumer.c:2608
#2 0x00005650b3f4dba3 in do_sync_metadata (metadata=0x7f729c010bf0, ctx=0x5650b588ce60) at consumer-stream.c:471
#3 0x00005650b3f4dd3c in consumer_stream_sync_metadata (ctx=0x5650b588ce60, session_id=0) at consumer-stream.c:548
#4 0x00005650b3f6de78 in lttng_ustconsumer_read_subbuffer (stream=0x7f729c0058e0, ctx=0x5650b588ce60) at ust-consumer.c:2917
#5 0x00005650b3f45196 in lttng_consumer_read_subbuffer (stream=0x7f729c0058e0, ctx=0x5650b588ce60) at consumer.c:3524
#6 0x00005650b3f42da7 in consumer_thread_data_poll (data=0x5650b588ce60) at consumer.c:2894
#7 0x00007f72bdc476db in start_thread (arg=0x7f72b67fc700) at pthread_create.c:463
#8 0x00007f72bd97088f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
The segfault happen on the access to 'stream->chan->metadata_cache->lock'
chan value here is zero.
```
The problem is easily reproducible if a sleep(1) is added just after
the call to lttng_ustconsumer_request_metadata(), before the metadata
stream lock is re-acquired.
During the execution of the "request_metadata", an application can
close. This will cause the session daemon to push any remaining
metadata to the consumer daemon and to close the metadata channel.
Closing the metadata channel closes the metadata stream's wait_fd,
which is an internal pipe. The closure of the metadata pipe is
detected by the metadata_poll thread, which will ensure that all
metadata has been consumed before issuing the deletion of the metadata
stream and channel.
During the deletion, the channel's "stream" attribute the stream's
"chan" attribute are set to NULL as both are logically deleted and
should not longer be used.
Meanwhile, the thread executing commit_one_metadata_packet()
re-acquires the metadata stream lock and trips on the now-NULL "chan"
member.
The fix consists in checking if the metadata stream is logically
deleted after its lock is re-acquired. It is correct for the
sync_metadata operation to then complete successfully as the metadata
is synced: the metadata guarantees this before deleting the
stream/channel.
Since the metadata stream's lifetime is protected by its lock, there
may be other sites that need such a check. The lock and deletion check
could be combined into a single consumer_stream_lock() helper in
follow-up fixes.
Reported-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
end:
return ret;
}
+
+bool consumer_stream_is_deleted(struct lttng_consumer_stream *stream)
+{
+ /*
+ * This function does not take a const stream since
+ * cds_lfht_is_node_deleted was not const before liburcu 0.12.
+ */
+ assert(stream);
+ return cds_lfht_is_node_deleted(&stream->node.node);
+}
*/
int consumer_stream_rotate_output_files(struct lttng_consumer_stream *stream);
+/*
+ * Indicates whether or not a stream is logically deleted. A deleted stream
+ * should no longer be used; its existence is only garanteed by the RCU lock
+ * held by the caller.
+ *
+ * This function must be called with the RCU read side lock held.
+ */
+bool consumer_stream_is_deleted(struct lttng_consumer_stream *stream);
+
#endif /* LTTNG_CONSUMER_STREAM_H */
* interacting with sessiond, else we cause a deadlock with live
* awaiting on metadata to be pushed out.
*
+ * The RCU read side lock must be held by the caller.
+ *
* Return 0 if new metadatda is available, EAGAIN if the metadata stream
* is empty or a negative value on error.
*/
int lttng_ustconsumer_sync_metadata(struct lttng_consumer_local_data *ctx,
- struct lttng_consumer_stream *metadata)
+ struct lttng_consumer_stream *metadata_stream)
{
int ret;
int retry = 0;
+ struct lttng_consumer_channel *metadata_channel;
assert(ctx);
- assert(metadata);
+ assert(metadata_stream);
- pthread_mutex_unlock(&metadata->lock);
+ metadata_channel = metadata_stream->chan;
+ pthread_mutex_unlock(&metadata_stream->lock);
/*
* Request metadata from the sessiond, but don't wait for the flush
* because we locked the metadata thread.
*/
- ret = lttng_ustconsumer_request_metadata(ctx, metadata->chan, 0, 0);
- pthread_mutex_lock(&metadata->lock);
+ ret = lttng_ustconsumer_request_metadata(ctx, metadata_channel, 0, 0);
+ pthread_mutex_lock(&metadata_stream->lock);
if (ret < 0) {
goto end;
}
- ret = commit_one_metadata_packet(metadata);
+ /*
+ * The metadata stream and channel can be deleted while the
+ * metadata stream lock was released. The streamed is checked
+ * for deletion before we use it further.
+ *
+ * Note that it is safe to access a logically-deleted stream since its
+ * existence is still guaranteed by the RCU read side lock. However,
+ * it should no longer be used. The close/deletion of the metadata
+ * channel and stream already guarantees that all metadata has been
+ * consumed. Therefore, there is nothing left to do in this function.
+ */
+ if (consumer_stream_is_deleted(metadata_stream)) {
+ DBG("Metadata stream %" PRIu64 " was deleted during the metadata synchronization",
+ metadata_stream->key);
+ ret = 0;
+ goto end;
+ }
+
+ ret = commit_one_metadata_packet(metadata_stream);
if (ret <= 0) {
goto end;
} else if (ret > 0) {
retry = 1;
}
- ret = ustctl_snapshot(metadata->ustream);
+ ret = ustctl_snapshot(metadata_stream->ustream);
if (ret < 0) {
if (errno != EAGAIN) {
ERR("Sync metadata, taking UST snapshot");