Fix: cleanup inactive FDs in the consumer polling thread
Users have reported assert() hitting on consumerd shutdown on a
non-empty data stream hash table.
Relevant stack trace:
[...] in lttng_ht_destroy (ht=0x6) at hashtable.c:162
[...] in lttng_consumer_cleanup () at consumer.c:1207
[...] in main ([...]) at lttng-consumerd.c:625
This is reproducible when a consumerd is shutting down at the same
time as one of its relay daemon peers.
On failure to reach a relay daemon, all of that relay daemons'
associated streams are marked as having an inactive endpoint (see
cleanup_relayd(), consumer.c:467). The data polling thread is notified
of the change through an empty message on the "data" pipe.
Before blocking on the next poll(), the data polling thread checks if
it needs to update its poll set using the "need_update" flag. This
flag is set anytime a stream is added or deleted.
While building a new poll set, streams that are now marked as inactive
or as having an inactive endpoint are not included in the new poll
set. Those inactive streams are in a transitional state, awaiting
a clean-up.
After updating the poll set, the data polling thread checks if it
should quit (via the consumer_quit flag). Assuming this flag is set,
the thread cannot simply exit; it must clean-up any remaining data
stream.
The thread currently performs this check at consumer.c:2532. This
check is erroneous as it assumes that the number of FDs in the poll set is
indicative of the number of FDs the thread has ownership of.
If all streams are inactive, the poll set will contain no FDs to
monitor and the thread will assume that it can exit. This will leave
streams in "data_ht", causing an assertion to hit in the main thread
during the clean-up.
This patch adds an inactive FD count which must also reach zero before
the data polling thread can exit.
The clean-up of the inactive streams occurs as the data polling thread
wakes-up on its "data" pipe. Upon being woken-up on the "data" pipe,
the data polling thread will validate the endpoint status of every
data stream and close those that have been marked as inactive
(see consumer_del_stream(), consumer.c:525).
This occurs as often as necessary to allow the thread to clean-up all
of its inactive streams and exit cleanly.
Signed-off-by: Julien Desfossez <jdesfossez@efficios.com>
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
This page took 0.028096 seconds and 4 git commands to generate.