From: Jérémie Galarneau Date: Tue, 1 Dec 2020 21:51:23 +0000 (-0500) Subject: Fix: sessiond: metadata not created on app unregistration during start X-Git-Tag: v2.11.6~4 X-Git-Url: https://git.lttng.org./?a=commitdiff_plain;h=d02e041f27640bee1badaddc6c3ea75a72741d3c;p=lttng-tools.git Fix: sessiond: metadata not created on app unregistration during start Issue observed ============== A test for an incoming feature (trigger actions on on-event conditions) hangs. While this problem was discovered using this test, it exercises a scenario that is problematic as of this fix. The destruction of a session can hang if a single application being traced unregisters (dies) during the 'start' of a session. Cause ===== When a per-uid session is started, its buffers (channels and streams) are allocated only if an instrumented application is registered to the session daemon at that moment. For historical reasons, the 'data' and 'metadata' buffers are allocated in separate code paths. The 'data' buffers are allocated in ust_app_synchronize() and the 'metadata' buffers are allocated in ust_app_start_trace(). Both functions perform their own look-up for an application session and will gracefully fail if an application session can't be found; it typically means the application has exited. This leaves a race window open where ust_app_synchronize() can succeed in looking-up the application session, and ust_app_start_trace() can fail following the death of the application. When this occurs, the session is left with 'data' buffers allocated and unallocated ''metadata' buffers. This is an unexpected state and results in the rotation code attempting to rotate a partially initialized metadata stream. The rotation of this partially initialized metadata stream never completes which, in turn, never allows the session to complete its implicit rotation on destruction. This race window is fairly narrow, but can be reproduced by sleep()-ing at the beginning of ust_app_start_trace() and killing an application that is being traced during the sleep period. Solution ======== The creation of the metadata channel is performed as part of ust_app_synchronize() if the application look-up succeeds. When it fails, both 'data' and 'metadata' streams will fail to be created resulting in an expected and valid state. Known drawbacks =============== None. Signed-off-by: Jérémie Galarneau Change-Id: Ice0ec16734a39a6bb885986d3ad70d20cd2618e0 --- diff --git a/src/bin/lttng-sessiond/ust-app.c b/src/bin/lttng-sessiond/ust-app.c index 41f201c48..dedf4e600 100644 --- a/src/bin/lttng-sessiond/ust-app.c +++ b/src/bin/lttng-sessiond/ust-app.c @@ -4430,15 +4430,6 @@ int ust_app_start_trace(struct ltt_ust_session *usess, struct ust_app *app) goto skip_setup; } - /* - * Create the metadata for the application. This returns gracefully if a - * metadata was already set for the session. - */ - ret = create_ust_app_metadata(ua_sess, app, usess->consumer); - if (ret < 0) { - goto error_unlock; - } - health_code_update(); skip_setup: @@ -5104,6 +5095,7 @@ void ust_app_synchronize(struct ltt_ust_session *usess, } rcu_read_lock(); + cds_lfht_for_each_entry(usess->domain_global.channels->ht, &uchan_iter, uchan, node.node) { struct ust_app_channel *ua_chan; @@ -5147,6 +5139,21 @@ void ust_app_synchronize(struct ltt_ust_session *usess, } } } + + /* + * Create the metadata for the application. This returns gracefully if a + * metadata was already set for the session. + * + * The metadata channel must be created after the data channels as the + * consumer daemon assumes this ordering. When interacting with a relay + * daemon, the consumer will use this assumption to send the + * "STREAMS_SENT" message to the relay daemon. + */ + ret = create_ust_app_metadata(ua_sess, app, usess->consumer); + if (ret < 0) { + goto error_unlock; + } + rcu_read_unlock(); end: