Fix: combined tracing of lttng-ust 2.12/2.13 generates corrupted traces
Observed issue
==============
When applications linked against lttng-ust 2.12 and lttng-ust 2.13 are
traced concurrently by lttng-tools 2.13 into the same per-uid buffers,
with the "procname" context enabled, babeltrace fails with "Event id NN
is outside range" when reading the trace:
[14:51:58.717006865] (+5.927872956) x lttng_ust_statedump:start: { cpu_id = 1 }, { procname = "sample-2.13-ust" }, { }
[error] Event id 41984 is outside range.
[error] Reading event failed.
Error printing trace.
Cause
=====
Inspection of the trace reveals that the layout of the procname context
field changed from 17 bytes to 16 bytes between 2.12 and 2.13. This is
an issue when applications share a per-uid ring buffer, because context
fields are associated with channels, and need to have the same layout
across all processes tracing into a given channel.
The layout of the procname field described by the trace metadata is that
of the first application which happens to register that channel in the
session lifetime.
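The "first registration wins" behaviour described above can be pictured with a toy model. This is only a sketch: the ChannelRegistry class, its register() method, and the channel name are hypothetical and do not reflect how lttng-sessiond is implemented.

```python
class ChannelRegistry:
    """Toy model (hypothetical): the first application to register a
    channel fixes the procname layout described in the trace metadata."""

    def __init__(self):
        self._procname_len = {}

    def register(self, channel, app_procname_len):
        # setdefault() keeps the layout of the first registrant and
        # ignores the layout announced by later applications.
        return self._procname_len.setdefault(channel, app_procname_len)


registry = ChannelRegistry()
# An application using the 16-byte layout registers first...
described_for_a = registry.register("chan0", 16)
# ...then an application that writes 17-byte procname fields registers.
described_for_b = registry.register("chan0", 17)

# The metadata describes 16 bytes, yet the second application writes 17.
mismatch = described_for_b != 17
```

In this model the second application's records no longer match the layout a trace reader is told to expect.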
Therefore, the procname context field length is part of the LTTng-UST
ABI and cannot be changed without breaking the LTTng-UST ABI (bumping
LTTNG_UST_ABI_MAJOR_VERSION_OLDEST_COMPATIBLE), which is unwanted
between 2.12 and 2.13. Keeping compatibility for combined use by
different applications between lttng-ust 2.12 and 2.13 is a required
feature for this release, because lttng-ust 2.13 introduces a library
ABI break (soname bump).
An example scenario leading to this issue:
1) A tracing session is created with per-uid buffers,
2) The procname context is added,
3) Tracing is started,
4) Application [a] linked against lttng-ust 2.13 registers the channel to
lttng-sessiond, sending its context descriptions with a 16-byte
procname context field,
5) Application [b] linked against lttng-ust 2.12 registers the same channel
to lttng-sessiond,
6) Application [b] traces an event with the procname context, followed
by an event payload with a single "string" field.
7) A trace viewer will observe the procname context, followed by an
extra null character, and thus mistakenly consider the event payload
to be an empty string. Reading the next event header will fail
because the string payload will be expected to contain an event ID.
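The misparse in step 7 can be reproduced with a small simulation. This is a sketch under simplifying assumptions: the record layout used here (a 16-bit little-endian event ID, a fixed-size procname field, then one NUL-terminated string) is hypothetical and much simpler than the real CTF event layout, and write_event() and read_events() are illustrative helpers, not part of any LTTng or babeltrace API.

```python
import struct


def write_event(event_id, procname, payload, procname_len):
    """Serialize one simplified record with a fixed-size procname field."""
    name = procname.encode()[:procname_len - 1]
    context = name.ljust(procname_len, b"\0")  # pad with NUL bytes
    return struct.pack("<H", event_id) + context + payload.encode() + b"\0"


def read_events(buf, procname_len, known_ids):
    """Yield (event_id, procname, payload) tuples, assuming procname_len."""
    offset = 0
    while offset < len(buf):
        (event_id,) = struct.unpack_from("<H", buf, offset)
        if event_id not in known_ids:
            raise ValueError(f"Event id {event_id} is outside range")
        offset += 2
        field = buf[offset:offset + procname_len]
        procname = field.split(b"\0", 1)[0].decode()
        offset += procname_len
        end = buf.index(b"\0", offset)  # end of the string payload
        payload = buf[offset:end].decode()
        offset = end + 1
        yield event_id, procname, payload


# The writer uses a 17-byte procname field; the reader is told 16 bytes.
buf = write_event(1, "sample-2.12-ust", "hello", 17)
reader = read_events(buf, 16, {1})
first = next(reader)  # the payload is misread as an empty string
try:
    next(reader)      # "he" is then interpreted as an event ID
    error = None
except ValueError as exc:
    error = str(exc)
```

With a matching 17-byte reader the same buffer decodes to a single event with payload "hello"; with the 16-byte reader, the 17th byte of the procname field is taken for the string terminator and the following bytes are read as an event header.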
Solution
========
Revert the procname context field size to 17 bytes to stay compatible
with lttng-ust 2.12.
In an abundance of caution, also revert the size of the
lttng_ust_statedump:procname "procname" field to 17, so there won't be
duplicated event IDs for this event when applications linked against
lttng-ust 2.12 and 2.13 are traced concurrently for the same user ID
in per-uid tracing.
History
=======
This issue was introduced by commit 0db3d6ee9be ("port: fix
pthread_setname_np integration") within the 2.13 development cycle.
Known drawbacks
===============
Applications currently running which are linked against a liblttng-ust
2.13 without this fix should be restarted after upgrading the library to
liblttng-ust 2.13 with this fix.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I206086df8b71463c248ca186343baaff5452762b