Fix: relayd: compare viewer chunks by ID rather than address
authorMathieu Desnoyers <mathieu.desnoyers@efficios.com>
Thu, 18 Nov 2021 15:37:29 +0000 (10:37 -0500)
committerJérémie Galarneau <jeremie.galarneau@efficios.com>
Fri, 17 Dec 2021 04:48:17 +0000 (23:48 -0500)
Observed issue
==============

In "Fix: relayd: live: erroneous message timestamp observed from live
viewer", we observe that sometimes viewer streams unexpectedly end up
performing a viewer stream rotation at session destroy.

Cause
=====

This may happen in the following scenario:

  1) Relay stream A is rotated to NULL.

  2) viewer_get_next_index for viewer stream A:
  2.1) observes a NULL rstream->trace_chunk, updates the viewer session
       current trace chunk to NULL (viewer_session_set_trace_chunk_copy).
  2.2) "Transition the viewer stream into the latest trace chunk
       available." does not issue viewer_stream_rotate_to_trace_chunk, because
       the condition (rstream->completed_rotation_count ==
       vstream->last_seen_rotation_count + 1 && !rstream->trace_chunk)
       evaluates to "true", and thus the entire if () evaluates to false.

  3) check_index_status detects rstream->closed and
     index_received_seqcount == index_sent_seqcount, thus replying HUP to
     viewer, effectively releasing ownership of the viewer stream.

  4) viewer_get_next_index for viewer stream B (not rotated to NULL yet):
  4.1) observes a non-NULL rstream->trace_chunk, updates the viewer
       session current trace chunk to *a new copy* of the non-NULL
       rstream->trace_chunk (viewer_session_set_trace_chunk_copy).
  4.2) the comparison (conn->viewer_session->current_trace_chunk !=
       vstream->stream_file.trace_chunk) done by pointer don't match, because
       the viewer session current trace chunk is a new copy.

Therefore, due to those stream close scenarios where the viewer session
can go back and forth between NULL and _different copies_ of the relay
chunk, we cannot use a comparison of chunks by address on the viewer
chunks.

Solution
========

Compare the viewer stream chunks by ID rather than address.

Known drawbacks
===============

The comparison is probably slightly slower, but I don't expect this to
be significant.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I05e4f97f26b2659c007726cc29d3edafa17bdb98

src/bin/lttng-relayd/live.c

index 2667fa6ff212bd3e34b3735209573b11561edfe5..70b9b1eb6f1a4efe593a0843c6011503cb56c205 100644 (file)
@@ -1659,9 +1659,9 @@ int viewer_get_next_index(struct relay_connection *conn)
         * This allows clients to consume all the packets of a trace chunk
         * after a session's destruction.
         */
-       if (conn->viewer_session->current_trace_chunk != vstream->stream_file.trace_chunk &&
+       if (!lttng_trace_chunk_ids_equal(conn->viewer_session->current_trace_chunk, vstream->stream_file.trace_chunk) &&
                        !(rstream->completed_rotation_count == vstream->last_seen_rotation_count + 1 && !rstream->trace_chunk)) {
-               DBG("Viewer session and viewer stream chunk differ: "
+               DBG("Viewer session and viewer stream chunk IDs differ: "
                                "vsession chunk %p vstream chunk %p",
                                conn->viewer_session->current_trace_chunk,
                                vstream->stream_file.trace_chunk);
@@ -2028,8 +2028,8 @@ int viewer_get_metadata(struct relay_connection *conn)
        }
 
        if (conn->viewer_session->current_trace_chunk &&
-                       conn->viewer_session->current_trace_chunk !=
-                                       vstream->stream_file.trace_chunk) {
+                       !lttng_trace_chunk_ids_equal(conn->viewer_session->current_trace_chunk,
+                                       vstream->stream_file.trace_chunk)) {
                bool acquired_reference;
 
                DBG("Viewer session and viewer stream chunk differ: "
This page took 0.027056 seconds and 4 git commands to generate.