From: Jonathan Rajotte Date: Thu, 6 May 2021 15:14:52 +0000 (-0400) Subject: Fix: relayd: live: data is missing between viewer attach and retry X-Git-Tag: v2.13.0-rc2~29 X-Git-Url: https://git.lttng.org./?a=commitdiff_plain;h=36c209074c0fb45662eaba6903a185c22793e786;p=lttng-tools.git Fix: relayd: live: data is missing between viewer attach and retry Observed issue ============== Data produced between the time a live viewer attach and a viewer subsequent "polling" is missing from the point of view of the viewer. Using the following reproducer: # A lttng-sessiond is already running lttng-relayd -b -vvv > relayd.log 2>&1 lttng create live --live lttng enable-event -u "hello_world:*" lttng start date # Set the "polling" interval at ~30 seconds babeltrace2 --retry-duration=30000000 -i lttng-live net://localhost/host/$HOSTNAME/live & pid=$! # Make sure we are in the retry phase of bt2 date sleep 2 # Produce events # Note here that hello perform a sleep of 1 seconds before generating # 10 events to make sure lttng-ust had time to register. A total of # 100 events is generated. date for i in `seq 1 10`; do ./hello done # Here we expect to see 100 events outputted at some point (~18 # seconds (30 - 2 - 10) from that time). # No events are received by babeltrace2. date sleep 40 date # We retry the same exercise a second time. echo "Moving to phase 2" date for i in `seq 1 10`; do ./hello done # Here data is received correctly. date sleep 40 date kill $pid wait lttng destroy live pkill lttng-relayd Cause ===== At the moment the viewer attach is done, no streams exist. On the following viewer_get_new_streams (~30 seconds later), streams now exist since they were allocated lazily following the first application registration, but LTTNG_VIEWER_SEEK_LAST is used. This essentially "discards" any events in the stream that happened between the attach time and the viewer_get_new_stream execution. Note that the same problem would happen if a new UID (in per-uid mode) would happen to begin tracing in between viewer retry. Solution ======== Always use `LTTNG_VIEWER_SEEK_BEGINNING` during `viewer_get_new_streams`. Known drawbacks ========= This does not fix the per-pid problem where an app run and die in between the viewer retries. References ========== https://lists.lttng.org/pipermail/lttng-dev/2021-May/029953.html Signed-off-by: Jonathan Rajotte Signed-off-by: Jérémie Galarneau Change-Id: I92fbe6511fb93e31d20922b4eef5c9a221e07716 --- diff --git a/src/bin/lttng-relayd/live.c b/src/bin/lttng-relayd/live.c index 5877467c0..e884cdc8a 100644 --- a/src/bin/lttng-relayd/live.c +++ b/src/bin/lttng-relayd/live.c @@ -1142,10 +1142,19 @@ int viewer_get_new_streams(struct relay_connection *conn) goto send_reply; } + /* + * For any new stream, create it with LTTNG_VIEWER_SEEK_BEGINNING since + * that at this point the client is already attached to the session.Aany + * initial stream will have been created with the seek type at attach + * time (for now most readers use the LTTNG_VIEWER_SEEK_LAST on attach). + * Otherwise any event happening in a new stream between the attach and + * a call to viewer_get_new_streams will be "lost" (never received) from + * the viewer's point of view. + */ pthread_mutex_lock(&session->lock); ret = make_viewer_streams(session, conn->viewer_session, - LTTNG_VIEWER_SEEK_LAST, &nb_total, &nb_unsent, + LTTNG_VIEWER_SEEK_BEGINNING, &nb_total, &nb_unsent, &nb_created, &closed); if (ret < 0) { goto error_unlock_session;