From: Jérémie Galarneau Date: Thu, 25 Jun 2020 23:22:24 +0000 (-0400) Subject: Fix: post-clear trace chunk has a late beginning packet X-Git-Tag: v2.12.2~22 X-Git-Url: http://git.lttng.org./?a=commitdiff_plain;h=fc0c52e9fb0608ede702556178a68d15a1d25af8;hp=fc0c52e9fb0608ede702556178a68d15a1d25af8;p=lttng-tools.git Fix: post-clear trace chunk has a late beginning packet Observed issue ============== In the following scenario: - create a regular session - enable an event that occurs on only one core - start tracing - trace for a while - stop tracing - clear the session - start - trace for a while (referred to the "active period" later on) - stop The resulting trace will contain a very short stream intersection as the active stream will contain packets spanning the entire active period. However, all other streams will contain a single packet at the end of the active period with a duration of 0 ns. This presents two problems: 1) This makes the stream intersection mode of viewers unusable, 2) This misleads the user into thinking that tracing was not active for some buffers. Cause ===== The packet beginning timestamps of the buffers are initialized on creation (on the first "start" of a tracing session). When a "clear" is performed on a session, all open packets are closed and the existing contents are purged. If a stream is inactive, it is possible for no packet to be "opened" until the "stop" of the tracing session. On stop, a "flush_empty" is performed. Such a flush opens a packet (if it was not already the case), closes it, and marks the packet as being ready for consumption. Solution ======== Attempt to flush an empty packet as close to the rotation point as possible. In the event where a stream remains inactive after the rotation point, this ensures that the new trace chunk has a beginning timestamp set at the begining of the trace chunk instead of only creating an empty packet when the trace chunk is stopped. This indicates to the viewers that the stream was being recorded, but more importantly it allows viewers to determine a useable trace intersection. This presents a problem in the case where the ring-buffer is completely full. Consider the following scenario: - The consumption of data is slow (slow network, for instance), - The ring buffer is full, - A rotation is initiated, - The flush below does nothing (no space left to open a new packet), - The other streams rotate very soon, and new data is produced in the new chunk, - This stream completes its rotation long after the rotation was initiated - The session is stopped before any event can be produced in this stream's buffers. The resulting trace chunk will have a single packet temporaly at the end of the trace chunk for this stream making the stream intersection more narrow than it should be. To work-around this, an empty flush is performed after the first consumption of a packet during a rotation if the initial flush failed. The idea is that consuming a packet frees enough space to switch packets in this scenario and allows the tracer to "stamp" the beginning of the new trace chunk at the earliest possible point. Note that metadata streams are always skipped when opening a packet. This is done for two reasons: 1) Timestamps are not relevant to the metadata stream 2) Inserting an empty packet in the metadata stream after a rotation breaks the use of "clear" in live mode. The contents of the metadata streams of successive chunks must be strict superset of one another as live clients only receive the information appended to a metadata stream (i.e. the parts it already has received can't change). If a flush_empty was performed after a clear/rotation, it would result in an empty packet being inserted at the beginning of the metadata stream that wasn't present in the first chunk. This would cause the live client and relay daemon to have mismatching copies of the metadata stream. Known drawbacks =============== In the case of an inactive stream, this results in the completed trace chunk archive containing an extra empty packet at the beginning of the stream (typically 4kB). In the case of an active stream, this change will cause the first packet to be empty or contain few events. Those are all efficiency losses that are inevitable (AFAIK) given the current buffer control APIs. It will be possible to recoup those losses if an API allowing the consumer daemon to open a new packet is introduced. As noted in the comments, this patch is not final. The flush after the rotation should not be open-coded in lttng_consumer_read_subbuffer. It should be a data-stream specific "post-consume" step. Signed-off-by: Jérémie Galarneau Change-Id: I7d8ab876e55e9d0718a55ec1bb77ec6466accc02 ---