From: Julien Desfossez
Date: Thu, 1 Feb 2018 19:24:10 +0000 (-0500)
Subject: Fix: cleanup inactive FDs in the consumer polling thread
X-Git-Tag: v2.10.2~2
X-Git-Url: http://git.lttng.org./?a=commitdiff_plain;ds=sidebyside;h=2afa177e790cc3223abe9f1a591536017270e0ab;hp=2afa177e790cc3223abe9f1a591536017270e0ab;p=lttng-tools.git

Fix: cleanup inactive FDs in the consumer polling thread

Users have reported an assert() being hit on consumerd shutdown
because the data stream hash table is not empty.

Relevant stack trace:

  [...] in lttng_ht_destroy (ht=0x6) at hashtable.c:162
  [...] in lttng_consumer_cleanup () at consumer.c:1207
  [...] in main ([...]) at lttng-consumerd.c:625

This is reproducible when a consumerd is shutting down at the same
time as one of its relay daemon peers.

On failure to reach a relay daemon, all of that relay daemon's
associated streams are marked as having an inactive endpoint (see
cleanup_relayd(), consumer.c:467). The data polling thread is notified
of the change through an empty message on the "data" pipe.

Before blocking on the next poll(), the data polling thread checks if
it needs to update its poll set using the "need_update" flag. This
flag is set anytime a stream is added or deleted.

While building a new poll set, streams that are now marked as inactive
or as having an inactive endpoint are not included in the new poll
set. Those inactive streams are in a transitional state, awaiting
clean-up.

After updating the poll set, the data polling thread checks if it
should quit (via the consumer_quit flag). Assuming this flag is set,
the thread cannot simply exit; it must clean up any remaining data
streams.

The thread currently performs this check at consumer.c:2532. This
check is erroneous as it assumes that the number of FDs in the poll
set is indicative of the number of FDs the thread has ownership of.

If all streams are inactive, the poll set will contain no FDs to
monitor and the thread will assume that it can exit. This will leave
streams in "data_ht", causing an assertion to fail in the main thread
during the clean-up.

This patch adds an inactive FD count which must also reach zero before
the data polling thread can exit.

The clean-up of the inactive streams occurs as the data polling thread
wakes up on its "data" pipe. Upon being woken up on the "data" pipe,
the data polling thread will validate the endpoint status of every
data stream and close those that have been marked as inactive (see
consumer_del_stream(), consumer.c:525).

This occurs as often as necessary to allow the thread to clean up all
of its inactive streams and exit cleanly.

Signed-off-by: Julien Desfossez
Signed-off-by: Jonathan Rajotte
Signed-off-by: Jérémie Galarneau
---
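
Illustration only, not part of the patch: the exit condition described
in the message can be modeled in a few lines of C. This is a
simplified sketch under stated assumptions. Every type and function
name prefixed with "model_" is hypothetical and does not come from
consumer.c, and the real thread tracks far more state (hash tables,
the poll array, pipes) than the two counters used here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a data stream owned by the polling thread. */
struct model_stream {
	bool endpoint_active; /* false once the relay daemon is unreachable */
	bool closed;          /* true once the stream has been cleaned up */
};

/* Hypothetical counters produced while rebuilding the poll set. */
struct model_counts {
	int nb_fd;          /* open streams placed in the poll set */
	int nb_inactive_fd; /* open streams left out, awaiting clean-up */
};

/* Count the open streams, split by endpoint status. */
static struct model_counts model_update_poll_set(
		const struct model_stream *streams, size_t count)
{
	struct model_counts counts = { 0, 0 };
	size_t i;

	assert(streams || count == 0);
	for (i = 0; i < count; i++) {
		if (streams[i].closed) {
			continue;
		}
		if (streams[i].endpoint_active) {
			counts.nb_fd++;
		} else {
			counts.nb_inactive_fd++;
		}
	}
	return counts;
}

/* Models a wake-up on the "data" pipe: close every inactive stream. */
static void model_close_inactive(struct model_stream *streams, size_t count)
{
	size_t i;

	assert(streams || count == 0);
	for (i = 0; i < count; i++) {
		if (!streams[i].closed && !streams[i].endpoint_active) {
			streams[i].closed = true;
		}
	}
}

/* Check as described before the fix: only the poll set size matters. */
static bool model_can_exit_old(struct model_counts counts, bool quit)
{
	return quit && counts.nb_fd == 0;
}

/* Check as described after the fix: inactive streams must be gone too. */
static bool model_can_exit_new(struct model_counts counts, bool quit)
{
	return quit && counts.nb_fd == 0 && counts.nb_inactive_fd == 0;
}
```

With two open streams whose endpoints are both inactive and the quit
flag set, the old check allows the thread to exit while it still owns
both streams. The new check refuses until model_close_inactive() has
run and the counts have been recomputed.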