Fix: relayd comm: improperly packed rotate streams command header
authorJérémie Galarneau <jeremie.galarneau@efficios.com>
Tue, 14 Dec 2021 19:52:24 +0000 (14:52 -0500)
committerJérémie Galarneau <jeremie.galarneau@efficios.com>
Thu, 16 Dec 2021 16:05:56 +0000 (11:05 -0500)
Observed issue
==============

The implicit rotation of a session performed during its destruction
fails on LTTng 2.12 (and upwards) with the following errors:
  Error: Relayd rotate streams replied error 152
  Error: Relayd rotate stream failed. Cleaning up relayd 2
  Error: Rotate channel failed
  Failed to find relay daemon socket: relayd_id = 2
  Error: Failed to perform a quiet rotation as part of the destruction of session "my_session": Rotation failure on consumer

Cause
=====

Error 152 matches the LTTNG_ERR_INVALID_PROTOCOL error, which implies
that the consumer daemon sent an unexpected command to the relay daemon.

It was determined that the RELAYD_ROTATE_STREAMS command header is not
properly packed since the LTTNG_PACKED annotation was omitted from its
`new_chunk_id` optional field. The documentation of LTTNG_OPTIONAL_COMM
duly indicates that this is required.

Without the use of LTTNG_PACKED, various lengths of padding (3 or 7
bytes) are inserted between new_chunk_id's `is_set` and `value` field to
align `value`, which results in an incorrect interpretation of the
command's arguments.

The relay daemon catches the protocol error when it is built in a
configuration that inserts 7 bytes of padding, while the consumer only
inserts three.

Solution
========

The solution proposed here is not perfect, see "Known drawbacks".

First, if we were to annotate the field, patched consumer daemons would
send unintelligible command headers to unpatched relay daemons. Leaving
it as is is the least of all evils, see "Known drawbacks" for more
details.

From the relay daemon end, we can easily anticipate the padding problem
by reading the `stream_count` field and use it to determine the expected
size of the payload.

The difference between the actual size of the payload and the expected
size allows us to determine the padding size and use the appropriate
declaration of the structure to interpret the command's arguments.

Known drawbacks
===============

While this fix causes the relay daemon to handle all improperly packed
command headers received from an unpatched consumer daemon, the reverse
is not completely true.

The following tables show which cases are known to work and which are
known to be broken when patched and unpatched versions of the relay
and consumer daemons are mixed, with the various alignment constraints.

Note that here:
  - 4 byte alignement implies "daemon running on an architecture on
    which uint64_t is aligned on an 4-byte boundary" (e.g. x86),
  - 8 byte alignement implies "daemon running on an architecture on
    which uint64_t is aligned on an 8-byte boundary" (e.g. x86-64).

Scenario 1 - Unpatched relay daemon and unpatched consumer daemon
 -----------------------------------------------------------------------------------
| Architecture alignment | 4 byte alignement consumerd | 8 byte alignment consumerd |
|------------------------|-----------------------------|----------------------------|
| 4 byte alignment relay |           Works             |           Fails            |
| 8 byte alignment relay |           Fails             |           Works            |
 -----------------------------------------------------------------------------------

Scenario 2 - Patched relay daemon and unpatched consumer daemon
 -----------------------------------------------------------------------------------
| Architecture alignment | 4 byte alignement consumerd | 8 byte alignment consumerd |
|------------------------|-----------------------------|----------------------------|
| 4 byte alignment relay |           Works             |           Works            |
| 8 byte alignment relay |           Works             |           Works            |
 -----------------------------------------------------------------------------------

Scenario 3 - Unpatched relay daemon and patched consumer daemon
 -----------------------------------------------------------------------------------
| Architecture alignment | 4 byte alignement consumerd | 8 byte alignment consumerd |
|------------------------|-----------------------------|----------------------------|
| 4 byte alignment relay |           Works             |           Fails            |
| 8 byte alignment relay |           Fails             |           Works            |
 -----------------------------------------------------------------------------------

Scenario 4 - Patched relay daemon and patched consumer daemon
 -----------------------------------------------------------------------------------
| Architecture alignment | 4 byte alignement consumerd | 8 byte alignment consumerd |
|------------------------|-----------------------------|----------------------------|
| 4 byte alignment relay |           Works             |           Works            |
| 8 byte alignment relay |           Works             |           Works            |
 -----------------------------------------------------------------------------------

Note that Scenarios 1 and 3 are the same since this fix doesn't
change the behaviour of the consumer daemon.

Also note that packing the `new_chunk_id` field would break the two
working cases of scenario 3 which are, in all likelyhood, the more
common cases.

A new command using a properly packed version of the command's header
could be implemented in future versions, but this presents no benefit as
part of this fix.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I3b822a9766ea5be51996a771a0d4a90efe29ce0b

src/bin/lttng-relayd/main.cpp
src/common/sessiond-comm/relayd.h

index 07d79adee6fa929642d17b46fe080a90e3708656..adbf3f3b4c18be0693770903599934588c0c37e5 100644 (file)
@@ -2513,6 +2513,125 @@ end_no_session:
        return ret;
 }
 
+static ssize_t relay_unpack_rotate_streams_header(
+               const struct lttng_buffer_view *payload,
+               struct lttcomm_relayd_rotate_streams *_rotate_streams)
+{
+       struct lttcomm_relayd_rotate_streams rotate_streams;
+       /*
+        * Set to the smallest version (packed) of `lttcomm_relayd_rotate_streams`.
+        * This is the smallest version of this structure, but it can be larger;
+        * this variable is updated once the proper size of the structure is known.
+        *
+        * See comment at the declaration of this structure for more information.
+        */
+       ssize_t header_len = sizeof(struct lttcomm_relayd_rotate_streams_packed);
+       size_t expected_payload_size_no_padding,
+               expected_payload_size_3_bytes_padding,
+               expected_payload_size_7_bytes_padding;
+
+       if (payload->size < header_len) {
+               ERR("Unexpected payload size in \"relay_rotate_session_stream\": expected >= %zu bytes, got %zu bytes",
+                               header_len, payload->size);
+               goto error;
+       }
+
+       /*
+        * Some versions incorrectly omitted the LTTNG_PACKED annotation on the
+        * `new_chunk_id` optional field of struct lttcomm_relayd_rotate_streams.
+        *
+        * We start by "unpacking" `stream_count` to figure out the padding length
+        * emited by our peer.
+        */
+       memcpy(&rotate_streams.stream_count, payload->data,
+                       sizeof(rotate_streams.stream_count));
+       rotate_streams = (typeof(rotate_streams)) {
+               .stream_count = be32toh(rotate_streams.stream_count),
+       };
+
+       /*
+        * Payload size expected given the possible padding lengths in
+        * `struct lttcomm_relayd_rotate_streams`.
+        */
+       expected_payload_size_no_padding = (rotate_streams.stream_count *
+               sizeof(*rotate_streams.rotation_positions)) +
+               sizeof(lttcomm_relayd_rotate_streams_packed);
+       expected_payload_size_3_bytes_padding = (rotate_streams.stream_count *
+               sizeof(*rotate_streams.rotation_positions)) +
+               sizeof(lttcomm_relayd_rotate_streams_3_bytes_padding);
+       expected_payload_size_7_bytes_padding = (rotate_streams.stream_count *
+               sizeof(*rotate_streams.rotation_positions)) +
+               sizeof(lttcomm_relayd_rotate_streams_7_bytes_padding);
+
+       if (payload->size == expected_payload_size_no_padding) {
+               struct lttcomm_relayd_rotate_streams_packed packed_rotate_streams;
+
+               /*
+                * This handles cases where someone might build with
+                * -fpack-struct or any other toolchain that wouldn't produce
+                * padding to align `value`.
+                */
+               DBG("Received `struct lttcomm_relayd_rotate_streams` with no padding");
+
+               header_len = sizeof(packed_rotate_streams);
+               memcpy(&packed_rotate_streams, payload->data, header_len);
+
+               /* Unpack the packed structure to the natively-packed version. */
+               *_rotate_streams = (typeof(*_rotate_streams)) {
+                       .stream_count = be32toh(packed_rotate_streams.stream_count),
+                       .new_chunk_id = (typeof(_rotate_streams->new_chunk_id)) {
+                               .is_set = !!packed_rotate_streams.new_chunk_id.is_set,
+                               .value = be64toh(packed_rotate_streams.new_chunk_id.value),
+                       }
+               };
+       } else if (payload->size == expected_payload_size_3_bytes_padding) {
+               struct lttcomm_relayd_rotate_streams_3_bytes_padding padded_rotate_streams;
+
+               DBG("Received `struct lttcomm_relayd_rotate_streams` with 3 bytes of padding (4-byte aligned peer)");
+
+               header_len = sizeof(padded_rotate_streams);
+               memcpy(&padded_rotate_streams, payload->data, header_len);
+
+               /* Unpack the 3-byte padded structure to the natively-packed version. */
+               *_rotate_streams = (typeof(*_rotate_streams)) {
+                       .stream_count = be32toh(padded_rotate_streams.stream_count),
+                       .new_chunk_id = (typeof(_rotate_streams->new_chunk_id)) {
+                               .is_set = !!padded_rotate_streams.new_chunk_id.is_set,
+                               .value = be64toh(padded_rotate_streams.new_chunk_id.value),
+                       }
+               };
+       } else if (payload->size == expected_payload_size_7_bytes_padding) {
+               struct lttcomm_relayd_rotate_streams_7_bytes_padding padded_rotate_streams;
+
+               DBG("Received `struct lttcomm_relayd_rotate_streams` with 7 bytes of padding (8-byte aligned peer)");
+
+               header_len = sizeof(padded_rotate_streams);
+               memcpy(&padded_rotate_streams, payload->data, header_len);
+
+               /* Unpack the 7-byte padded structure to the natively-packed version. */
+               *_rotate_streams = (typeof(*_rotate_streams)) {
+                       .stream_count = be32toh(padded_rotate_streams.stream_count),
+                       .new_chunk_id = (typeof(_rotate_streams->new_chunk_id)) {
+                               .is_set = !!padded_rotate_streams.new_chunk_id.is_set,
+                               .value = be64toh(padded_rotate_streams.new_chunk_id.value),
+                       }
+               };
+
+               header_len = sizeof(padded_rotate_streams);
+       } else {
+               ERR("Unexpected payload size in \"relay_rotate_session_stream\": expected %zu, %zu or %zu bytes, got %zu bytes",
+                               expected_payload_size_no_padding,
+                               expected_payload_size_3_bytes_padding,
+                               expected_payload_size_7_bytes_padding,
+                               payload->size);
+               goto error;
+       }
+
+       return header_len;
+error:
+       return -1;
+}
+
 /*
  * relay_rotate_session_stream: rotate a stream to a new tracefile for the
  * session rotation feature (not the tracefile rotation feature).
@@ -2530,11 +2649,11 @@ static int relay_rotate_session_streams(
        struct lttcomm_relayd_rotate_streams rotate_streams;
        struct lttcomm_relayd_generic_reply reply = {};
        struct relay_stream *stream = NULL;
-       const size_t header_len = sizeof(struct lttcomm_relayd_rotate_streams);
        struct lttng_trace_chunk *next_trace_chunk = NULL;
        struct lttng_buffer_view stream_positions;
        char chunk_id_buf[MAX_INT_DEC_LEN(uint64_t)];
        const char *chunk_id_str = "none";
+       ssize_t header_len;
 
        if (!session || !conn->version_check_done) {
                ERR("Trying to rotate a stream before version check");
@@ -2548,24 +2667,12 @@ static int relay_rotate_session_streams(
                goto end_no_reply;
        }
 
-       if (payload->size < header_len) {
-               ERR("Unexpected payload size in \"relay_rotate_session_stream\": expected >= %zu bytes, got %zu bytes",
-                               header_len, payload->size);
+       header_len = relay_unpack_rotate_streams_header(payload, &rotate_streams);
+       if (header_len < 0) {
                ret = -1;
                goto end_no_reply;
        }
 
-       memcpy(&rotate_streams, payload->data, header_len);
-
-       /* Convert header to host endianness. */
-       rotate_streams = (typeof(rotate_streams)) {
-               .stream_count = be32toh(rotate_streams.stream_count),
-               .new_chunk_id = (typeof(rotate_streams.new_chunk_id)) {
-                       .is_set = !!rotate_streams.new_chunk_id.is_set,
-                       .value = be64toh(rotate_streams.new_chunk_id.value),
-               }
-       };
-
        if (rotate_streams.new_chunk_id.is_set) {
                /*
                 * Retrieve the trace chunk the stream must transition to. As
@@ -2603,7 +2710,7 @@ static int relay_rotate_session_streams(
                        chunk_id_str);
 
        stream_positions = lttng_buffer_view_from_view(payload,
-                       sizeof(rotate_streams), -1);
+                       header_len, -1);
        if (!stream_positions.data ||
                        stream_positions.size <
                                        (rotate_streams.stream_count *
@@ -2662,8 +2769,6 @@ end_no_reply:
        return ret;
 }
 
-
-
 /*
  * relay_create_trace_chunk: create a new trace chunk
  */
index fe7d2936d869a0f03dcc82078b005d3592cdbd90..36a91070cbfd0cf6a587d238f6712912b9b78de0 100644 (file)
@@ -239,17 +239,70 @@ struct lttcomm_relayd_stream_rotation_position {
        uint64_t rotate_at_seq_num;
 } LTTNG_PACKED;
 
+/*
+ * For certain releases, the LTTNG_PACKED annotation was missing on the
+ * `new_chunk_id` field which causes padding to be added between the
+ * "optional" structure's `is_set` and `value` fields.
+ *
+ * Three alignment cases are handled:
+ *   - `value` is aligned to the next byte boundary after `is_set`
+ *     no padding is produced, see
+ *     `struct lttcomm_relayd_rotate_streams_packed`,
+ *   - `value` is aligned to the next 4-byte boundary after `is_set`
+ *     (e.g. x86), 3 bytes of padding are produced, see
+ *     `struct lttcomm_relayd_rotate_streams_3_bytes_padding`,
+ *   - `value` is aligned to the next 8-byte boundary after `is_set`
+ *     (e.g. x86-64), 7 bytes of padding are produced, see
+ *     `struct lttcomm_relayd_rotate_streams_7_bytes_padding`.
+ *
+ * Note that since this structure's advertised size is used to determine
+ * the size of the padding it includes, it can't be extended with new
+ * optional fields. A new command would be needed.
+ */
 struct lttcomm_relayd_rotate_streams {
        uint32_t stream_count;
        /*
         * Streams can be rotated outside of a chunk but not be parented to
         * a new chunk.
+        *
+        * Improperly packed, but left as-is for backwards compatibility
+        * with unpatched relay daemons.
         */
        LTTNG_OPTIONAL_COMM(uint64_t) new_chunk_id;
        /* `stream_count` positions follow. */
        struct lttcomm_relayd_stream_rotation_position rotation_positions[];
 } LTTNG_PACKED;
 
+struct lttcomm_relayd_rotate_streams_packed {
+       uint32_t stream_count;
+       LTTNG_OPTIONAL_COMM(uint64_t) LTTNG_PACKED new_chunk_id;
+       struct lttcomm_relayd_stream_rotation_position rotation_positions[];
+} LTTNG_PACKED;
+
+struct lttcomm_relayd_rotate_streams_3_bytes_padding {
+       uint32_t stream_count;
+       struct {
+               union {
+                       uint8_t is_set;
+                       uint32_t padding;
+               };
+               uint64_t value;
+       } LTTNG_PACKED new_chunk_id;
+       struct lttcomm_relayd_stream_rotation_position rotation_positions[];
+} LTTNG_PACKED;
+
+struct lttcomm_relayd_rotate_streams_7_bytes_padding {
+       uint32_t stream_count;
+       struct {
+               union {
+                       uint8_t is_set;
+                       uint64_t padding;
+               };
+               uint64_t value;
+       } LTTNG_PACKED new_chunk_id;
+       struct lttcomm_relayd_stream_rotation_position rotation_positions[];
+} LTTNG_PACKED;
+
 struct lttcomm_relayd_create_trace_chunk {
        uint64_t chunk_id;
        /* Seconds since EPOCH. */
This page took 0.0309469999999999 seconds and 4 git commands to generate.