Issue observed
==============
A test for an incoming feature (trigger actions on on-event conditions)
hangs. While this problem was discovered using this test, it exercises a
scenario that is problematic as of this fix.
The destruction of a session can hang if a single application being
traced unregisters (dies) during the 'start' of a session.
Cause
=====
When a per-uid session is started, its buffers (channels and streams)
are allocated only if an instrumented application is registered to the
session daemon at that moment.
For historical reasons, the 'data' and 'metadata' buffers are allocated
in separate code paths. The 'data' buffers are allocated in
ust_app_synchronize() and the 'metadata' buffers are allocated in
ust_app_start_trace(). Both functions perform their own look-up for an
application session and will gracefully fail if an application session
can't be found; it typically means the application has exited.
This leaves a race window open where ust_app_synchronize() can succeed
in looking-up the application session, and ust_app_start_trace() can
fail following the death of the application.
When this occurs, the session is left with 'data' buffers allocated and
unallocated ''metadata' buffers. This is an unexpected state and results
in the rotation code attempting to rotate a partially initialized
metadata stream.
The rotation of this partially initialized metadata stream never
completes which, in turn, never allows the session to complete its
implicit rotation on destruction.
This race window is fairly narrow, but can be reproduced by sleep()-ing
at the beginning of ust_app_start_trace() and killing an application
that is being traced during the sleep period.
Solution
========
The creation of the metadata channel is performed as part of
ust_app_synchronize() if the application look-up succeeds. When it
fails, both 'data' and 'metadata' streams will fail to be created
resulting in an expected and valid state.
Known drawbacks
===============
None.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ice0ec16734a39a6bb885986d3ad70d20cd2618e0
goto skip_setup;
}
- /*
- * Create the metadata for the application. This returns gracefully if a
- * metadata was already set for the session.
- */
- ret = create_ust_app_metadata(ua_sess, app, usess->consumer);
- if (ret < 0) {
- goto error_unlock;
- }
-
health_code_update();
skip_setup:
}
rcu_read_lock();
+
cds_lfht_for_each_entry(usess->domain_global.channels->ht, &uchan_iter,
uchan, node.node) {
struct ust_app_channel *ua_chan;
}
}
}
+
+ /*
+ * Create the metadata for the application. This returns gracefully if a
+ * metadata was already set for the session.
+ *
+ * The metadata channel must be created after the data channels as the
+ * consumer daemon assumes this ordering. When interacting with a relay
+ * daemon, the consumer will use this assumption to send the
+ * "STREAMS_SENT" message to the relay daemon.
+ */
+ ret = create_ust_app_metadata(ua_sess, app, usess->consumer);
+ if (ret < 0) {
+ goto error_unlock;
+ }
+
rcu_read_unlock();
end: