Observed Issue
==============
The LTTng-IVC tests fail on the `regenerate metadata` tests which
essentially:
- Setups a user space session
- Enables events
- Traces an application
- Stops tracing
- Validates the trace
- Truncates the metadata file (empties it)
- Starts tracing
- Regenerates the metadata
- Stops the session
- Validates the trace
The last trace validation step fails on an empty file (locally) or
a garbled file (remote).
The in-tree tests did no catch any of this since they essentially don't
test much. They verify that the command works (returns 0) but do not
validate any of its effects.
The issue was bisected down to:
commit
6f9449c22eef59294cf1e1dc3610a5cbf14baec0 (HEAD)
Author: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Date: Sun May 10 18:00:26 2020 -0400
consumerd: refactor: split read_subbuf into sub-operations
[...]
Cause
=====
The commit that introduced the issue refactored the sub-buffer
consumption loop to eliminate code duplications between the user space
and kernel consumer daemons.
In doing so, it eleminated a metadata version check from the consumption
path.
The consumption of a metadata sub-buffer follows those relevant
high-level steps:
- `get` the sub-buffer
- /!\ user space specific /!\
- if the `get` fails, attempt to flush the metadata cache's
contents to the ring-buffer
- populate `stream_subbuffer` properties (size, version, etc.)
- check the sub-buffer's version against the last known metadata
version (pre-consume step)
- if they don't match, a metadata regeneration occurred: reset the
metadata consumed position
- consume (actual write/send)
- `put` sub-buffer
[...]
As shown above, the user space consumer must manage the flushing of the
metadata cache explicitly as opposed to the kernel domain for which the
tracer performs the flushing implicitly through the `get` operation.
When the user space consumer encounters a `get` failure, it checks
if all the metadata cache was flushed (consumed position != cache size),
and flushes any remaining contents.
However, the metadata version could have changed and yielded an
identical cache size: a regeneration without any new metadata will
yield the same cache size.
Since
6f9449c22, the metadata version check is only performed after
a successful `get`. This means that after a regeneration, `get`
never succeeds (there is seemingly nothing to consume), and the
metadata version check is never performed.
Therefore, the metadata stream is never put in the `reset` mode,
effectively not regenerating the data.
Note that producing new metadata (e.g. a newly registering app
announcing new events) would work around the problem here.
Solution
========
Add a metadata version check when failing to `get` a metadata
sub-buffer. This is done in `commit_one_metadata_packet()` when the
cache size is seen to be equal to the consumed position.
When this occurs, `consumer_stream_metadata_set_version()`, a new
consumer stream method, is invoked which sets the new metadata version,
sets the `reset` flag, and discards any previously bucketized metadata.
The metadata cache's consumed position is also reset, allowing the
cache flush to take place.
`metadata_stream_reset_cache()` is renamed to
`metadata_stream_reset_cache_consumed_position()` since its name is
misleading and since it is used as part of the fix.
Know drawbacks
==============
None.
Change-Id: I3b933c8293f409f860074bd49bebd8d1248b6341
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Reported-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
}
DBG("New metadata version detected");
- stream->metadata_version = subbuffer->info.metadata.version;
- stream->reset_metadata_flag = 1;
-
- if (stream->metadata_bucket) {
- metadata_bucket_reset(stream->metadata_bucket);
- }
+ consumer_stream_metadata_set_version(stream,
+ subbuffer->info.metadata.version);
if (stream->read_subbuffer_ops.reset_metadata) {
stream->read_subbuffer_ops.reset_metadata(stream);
end:
return ret;
}
+
+void consumer_stream_metadata_set_version(
+ struct lttng_consumer_stream *stream, uint64_t new_version)
+{
+ assert(new_version > stream->metadata_version);
+ stream->metadata_version = new_version;
+ stream->reset_metadata_flag = 1;
+
+ if (stream->metadata_bucket) {
+ metadata_bucket_reset(stream->metadata_bucket);
+ }
+}
int consumer_stream_enable_metadata_bucketization(
struct lttng_consumer_stream *stream);
+/*
+ * Set the version of a metadata stream (i.e. following a metadata
+ * regeneration).
+ *
+ * Changing the version of a metadata stream will cause any bucketized metadata
+ * to be discarded and will mark the metadata stream for future `reset`.
+ */
+void consumer_stream_metadata_set_version(
+ struct lttng_consumer_stream *stream, uint64_t new_version);
+
#endif /* LTTNG_CONSUMER_STREAM_H */
}
static
-void metadata_stream_reset_cache(struct lttng_consumer_stream *stream)
+void metadata_stream_reset_cache_consumed_position(
+ struct lttng_consumer_stream *stream)
{
DBG("Reset metadata cache of session %" PRIu64,
stream->chan->session_id);
stream->ust_metadata_pushed = 0;
- stream->metadata_version = stream->chan->metadata_cache->version;
- stream->reset_metadata_flag = 1;
}
/*
int ret;
pthread_mutex_lock(&stream->chan->metadata_cache->lock);
- if (stream->chan->metadata_cache->max_offset
- == stream->ust_metadata_pushed) {
- ret = 0;
- goto end;
+ if (stream->chan->metadata_cache->max_offset ==
+ stream->ust_metadata_pushed) {
+ /*
+ * In the context of a user space metadata channel, a
+ * change in version can be detected in two ways:
+ * 1) During the pre-consume of the `read_subbuffer` loop,
+ * 2) When populating the metadata ring buffer (i.e. here).
+ *
+ * This function is invoked when there is no metadata
+ * available in the ring-buffer. If all data was consumed
+ * up to the size of the metadata cache, there is no metadata
+ * to insert in the ring-buffer.
+ *
+ * However, the metadata version could still have changed (a
+ * regeneration without any new data will yield the same cache
+ * size).
+ *
+ * The cache's version is checked for a version change and the
+ * consumed position is reset if one occurred.
+ *
+ * This check is only necessary for the user space domain as
+ * it has to manage the cache explicitly. If this reset was not
+ * performed, no metadata would be consumed (and no reset would
+ * occur as part of the pre-consume) until the metadata size
+ * exceeded the cache size.
+ */
+ if (stream->metadata_version !=
+ stream->chan->metadata_cache->version) {
+ metadata_stream_reset_cache_consumed_position(stream);
+ consumer_stream_metadata_set_version(stream,
+ stream->chan->metadata_cache->version);
+ } else {
+ ret = 0;
+ goto end;
+ }
}
write_len = ustctl_write_one_packet_to_channel(stream->chan->uchan,
goto end;
}
- subbuf->info.metadata.version = stream->chan->metadata_cache->version;
+ subbuf->info.metadata.version = stream->metadata_version;
end:
return ret;
stream->read_subbuffer_ops.extract_subbuffer_info =
extract_metadata_subbuffer_info;
stream->read_subbuffer_ops.reset_metadata =
- metadata_stream_reset_cache;
+ metadata_stream_reset_cache_consumed_position;
if (stream->chan->is_live) {
stream->read_subbuffer_ops.on_sleep = signal_metadata;
ret = consumer_stream_enable_metadata_bucketization(