Fix: ust-consumer: metadata thread not woken-up after version change
Issue observed
==============
The metadata regeneration test fails, very rarely, in the "streaming"
case on the CI. The interesting part of the test boils down to:
1) start session
2) launch an app tracing one event
3) stop session
4) delete metadata file
5) start session
6) regenerate metadata
7) stop session
8) destroy session
9) read trace: babeltrace fails on an invalid metadata file.
The problem is hard to capture, but modifying the test allows us to see
that there appears to be a short window between steps 7 and 8 where the
metadata file is empty or doesn't exist.
Cause
=====
When metadata is regenerated, its version is bumped and the metadata
cache is "reset". In some cases, such as in this test, the new metadata
will have exactly the same size as it had prior as nothing happened to
change that (e.g. no new apps/probes were registered).
When this occurs, the metadata thread is not woken-up by
consumer_metadata_cache_write() as it sees that max_offset of the
metadata cache didn't change; the data was replaced but it has the same
size.
The metadata consumption thread also checks for version bumps and
resets the amount of consumed metadata. Hence, if the "cache write"
operation woke up the metadata consumption thread, the stream's
"ust metadata pushed" state would be reset and the new contents would
be consumed.
Solution
========
The metadata stream's "ust metadata pushed" position is directly reset
to zero when a metadata version change is detected by the metadata
cache. The metadata poll thread is also woken up to resume the
consumption of the newly-available data.
It is unclear why the change to the consumption position was only done
on the metadata consumption thread's code path and not directly by the
session daemon command handling.
Note that a session rotation will also result in a reset of the pushed
position and a wake-up of the metadata poll thread from the command
handling thread. I am speculating that this couldn't be done due to the
design of the locking at the time of the original
implementation (I haven't checked).
In implementing this change, the metadata reception code path is
untangled a bit to separate the logic that affects the metadata stream
from the logic that manages the metadata cache. I suspect the original
error stems from a mix-up/confusion between both concerns.
When a metadata version change happens, the metadata cache resets its
'max_offset' (in other words, it's current size) and notifies the
caller. The caller then resets the "ust pushed metadata" position to
zero and wakes-up the metadata thread to consume the new contents of the
metadata cache.
Known drawbacks
===============
None.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I142ef957140d497ac7fc4294ca65a55c12518598
This page took 0.026937 seconds and 4 git commands to generate.