From: Jérémie Galarneau Date: Tue, 21 Jun 2022 20:56:23 +0000 (-0400) Subject: consumerd: send a buffer static sample on flush command X-Git-Url: https://git.lttng.org./?a=commitdiff_plain;h=9cc4ae91845c03b141af7ef58a86a2a9689dfafd;p=lttng-tools.git consumerd: send a buffer static sample on flush command When application exits during per-pid tracing, both the session and consumer daemons notice it. The session daemon sees the application's command pipe hanging-up, while the consumer daemon sees the application's data-ready pipe hanging-up. Upon handling this event, both daemons tear down their representation of the channels. In an ideal world, we'd want to sample the streams' "consumed_size" at the last possible moment to get the size of all consumed data for this stream. However, this is problematic in the following scenario: - the sessiond destroys the channel before the consumer daemon, - the consumer daemon sends a final buffer stats sample on tear down, - the sessiond can do nothing with the sample as it doesn't know that channel anymore. Note that the session daemon handles the case where it doesn't know a channel gracefully. When an application being traced in per-pid mode is torn down, the session requests a flush of its buffers to the consumer daemon. We can use this opportunity to emit a buffer stats sample. This is still racy since the tear down of the channel could complete on the session daemon's end before that last sample can be processed. In practice, though, it markedly improves the precision of size-based rotations in per-pid tracing mode. On my work machine, I see the size-based rotation tests pass with archive sizes within ~10% of the size threshold. Before this, we lost a lot of samples from short-lived buffers and it would not be rare to see archives end-up multiple times (5x-10x) larger than the size-threshold. Another problem is that the consumed_size returned by the consumer daemon will not include the packets that have yet to be consumed. Whether or not this is a fix is debatable since it arguably just improves the precision of size-based rotations. Signed-off-by: Jérémie Galarneau Change-Id: I8a72328ba1733ac2f50c77a1ff81d7a6aaac095c --- diff --git a/src/common/consumer/consumer-timer.cpp b/src/common/consumer/consumer-timer.cpp index f37bfa350..4e308383d 100644 --- a/src/common/consumer/consumer-timer.cpp +++ b/src/common/consumer/consumer-timer.cpp @@ -637,11 +637,8 @@ end: return ret; } -/* - * Execute action on a monitor timer. - */ -static -void monitor_timer(struct lttng_consumer_channel *channel) +/* Sample and send channel buffering statistics to the session daemon. */ +void sample_and_send_channel_buffer_stats(struct lttng_consumer_channel *channel) { int ret; int channel_monitor_pipe = @@ -787,7 +784,7 @@ void *consumer_timer_thread(void *data) struct lttng_consumer_channel *channel; channel = (lttng_consumer_channel *) info.si_value.sival_ptr; - monitor_timer(channel); + sample_and_send_channel_buffer_stats(channel); } else if (signr == LTTNG_CONSUMER_SIG_EXIT) { LTTNG_ASSERT(CMM_LOAD_SHARED(consumer_quit)); goto end; diff --git a/src/common/consumer/consumer.hpp b/src/common/consumer/consumer.hpp index dd8eb40d4..44dd5d1bb 100644 --- a/src/common/consumer/consumer.hpp +++ b/src/common/consumer/consumer.hpp @@ -1085,5 +1085,6 @@ enum lttcomm_return_code lttng_consumer_open_channel_packets( struct lttng_consumer_channel *channel); int consumer_metadata_wakeup_pipe(const struct lttng_consumer_channel *channel); void lttng_consumer_sigbus_handle(void *addr); +void sample_and_send_channel_buffer_stats(struct lttng_consumer_channel *channel); #endif /* LIB_CONSUMER_H */ diff --git a/src/common/ust-consumer/ust-consumer.cpp b/src/common/ust-consumer/ust-consumer.cpp index 034043444..30d1f1021 100644 --- a/src/common/ust-consumer/ust-consumer.cpp +++ b/src/common/ust-consumer/ust-consumer.cpp @@ -702,6 +702,14 @@ static int flush_channel(uint64_t chan_key) next: pthread_mutex_unlock(&stream->lock); } + + /* + * Send one last buffer statistics update to the session daemon. This + * ensures that the session daemon gets at least one statistics update + * per channel even in the case of short-lived channels, such as when a + * short-lived app is traced in per-pid mode. + */ + sample_and_send_channel_buffer_stats(channel); error: rcu_read_unlock(); return ret;