David Goulet [Thu, 13 Dec 2012 01:16:33 +0000 (20:16 -0500)]
Fix data pending for inflight streaming
The consumer_data_pending() function had a misnamed goto label. The
data_not_pending label actually jumped to the code returning 1, the
value meaning that data is pending. This patch renames the label so that
its name matches the value returned.
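As an illustration only, here is a minimal sketch of the label/return
value pairing described above. The function name and its array argument
are hypothetical and do not come from the consumer code.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for the consumer check: returns 1 when at least
 * one stream still has data pending, 0 otherwise.
 */
int sketch_data_pending(const int *stream_pending, size_t nr_streams)
{
	size_t i;

	for (i = 0; i < nr_streams; i++) {
		if (stream_pending[i]) {
			/* The label name matches the value returned. */
			goto data_pending;
		}
	}

	/* No stream reported pending data. */
	return 0;

data_pending:
	return 1;
}
```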
Add a missing destroy of the relayd session id mapping hash table.
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Wed, 12 Dec 2012 22:05:45 +0000 (17:05 -0500)]
Add the relayd create session command
This is needed to fix a specific data pending condition that requires
streams to be associated with a session. The command will also be used
for new features in the future.
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Wed, 12 Dec 2012 16:23:20 +0000 (11:23 -0500)]
Make the consumer send an ACK after each command
This is needed to avoid buffer bloat when communication between the
consumer and the relayd is throttled. With a very low bandwidth limit
between the relayd and the consumerd, the session daemon could send
commands to the consumer at a high rate without the unix socket queue
ever being emptied. The UNIX socket then reaches buffer-full conditions,
which are prone to trigger corner-case behaviors in blocking send/recv
with MSG_WAITALL. This is likely the cause of the hang experienced when
limiting relayd bandwidth.
Adding an ACK to each command tells the session daemon that we, the
consumer, have emptied the unix socket buffer.
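A minimal sketch of the command/ACK exchange, assuming a made-up wire
format (struct sketch_cmd and struct sketch_ack are hypothetical, not the
lttcomm structures). Both ends live in one process over a socketpair()
so that the sequence is easy to follow.

```c
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wire format, for illustration only. */
struct sketch_cmd { uint32_t type; };
struct sketch_ack { int32_t status; };

/* Consumer side: read one command, then acknowledge it. */
static int sketch_consumer_handle_one(int sock)
{
	struct sketch_cmd cmd;
	struct sketch_ack ack = { .status = 0 };

	if (recv(sock, &cmd, sizeof(cmd), 0) != (ssize_t) sizeof(cmd)) {
		return -1;
	}
	/* The command has left the socket queue: tell the sender. */
	if (send(sock, &ack, sizeof(ack), 0) != (ssize_t) sizeof(ack)) {
		return -1;
	}
	return 0;
}

/* Session daemon side: block until the consumer acknowledges. */
static int sketch_sessiond_wait_ack(int sock)
{
	struct sketch_ack ack;

	if (recv(sock, &ack, sizeof(ack), 0) != (ssize_t) sizeof(ack)) {
		return -1;
	}
	return ack.status;
}

/* Returns 0 when one command was sent and acknowledged. */
int sketch_ack_roundtrip(void)
{
	int sv[2], ret = -1;
	struct sketch_cmd cmd = { .type = 1 };

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
		return -1;
	}
	if (send(sv[0], &cmd, sizeof(cmd), 0) != (ssize_t) sizeof(cmd)) {
		goto end;
	}
	if (sketch_consumer_handle_one(sv[1]) < 0) {
		goto end;
	}
	ret = sketch_sessiond_wait_ack(sv[0]);
end:
	close(sv[0]);
	close(sv[1]);
	return ret;
}
```

Because the sender waits for the ACK before issuing the next command,
at most one command sits in the socket queue at any time.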
NOTE: In consumer_add_relayd_socket(), there might be a problem with the
error path and message status to the sessiond. A subsequent patch might
fix a possible issue but for now it is not at all critical since any
critical error on the consumer side will notify the sessiond through the
error socket.
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Wed, 12 Dec 2012 18:39:37 +0000 (13:39 -0500)]
Remove MSG_WAITALL on every recvmsg() socket type
In order to handle messages that are possibly larger than the socket
buffer size set by the wmem_max and rmem_max /proc files, ensure that
the receiving side reads the data chunk-wise rather than hanging on
MSG_WAITALL.
In addition to fixing this issue, chances are that it will also help fix
the hangs detected when UNIX socket buffers fill up. The MSG_WAITALL
behavior in such situations might be unexpected.
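A chunk-wise receive loop could look like the sketch below. The helper
names are hypothetical; the only assumption is a blocking stream socket
read with recv() and no MSG_WAITALL flag.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read up to len bytes, one chunk at a time, without MSG_WAITALL.
 * Returns the number of bytes read (less than len if the peer shut
 * down early) or -1 on error.
 */
ssize_t sketch_recv_all(int sock, void *buf, size_t len)
{
	char *p = buf;
	size_t done = 0;

	while (done < len) {
		ssize_t ret = recv(sock, p + done, len - done, 0);

		if (ret < 0) {
			if (errno == EINTR) {
				continue;	/* interrupted, retry */
			}
			return -1;
		}
		if (ret == 0) {
			break;		/* orderly shutdown by the peer */
		}
		done += (size_t) ret;
	}
	return (ssize_t) done;
}

/* Demo: two separate sends are reassembled by one call. Returns 1 on success. */
int sketch_recv_demo(void)
{
	int sv[2], ok = 0;
	char buf[10];

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
		return 0;
	}
	if (send(sv[0], "hello", 5, 0) == 5 &&
	    send(sv[0], "world", 5, 0) == 5 &&
	    sketch_recv_all(sv[1], buf, sizeof(buf)) == 10 &&
	    memcmp(buf, "helloworld", 10) == 0) {
		ok = 1;
	}
	close(sv[0]);
	close(sv[1]);
	return ok;
}
```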
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Mon, 10 Dec 2012 21:03:58 +0000 (16:03 -0500)]
Fix: Use stream deletion function when cleaning up
In theory, by the time the destroy stream ht function is called, the
hash table should be empty. However, after some fatal errors it might
not be, so it's imperative that we gracefully delete each stream and
free it using an RCU call so that both hash tables (the stream one and
the one for the pending command) stay synchronized.
Simply freeing the stream could have leaked fds and left an invalid node
in the data pending hash table.
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Mon, 10 Dec 2012 17:16:15 +0000 (12:16 -0500)]
Fix: Relayd and sessiond version check
The check now only requires the major versions to be equal. After the
2.1 stable release, both components will adapt to the lowest minor
version for the same major version. For this, the session daemon now
sends its version values to the relayd, hence a slight change in the
protocol.
For instance, with a relayd 2.4 talking to a sessiond 2.8, the
communication and available features will only be those of version 2.4.
For a relayd at, say, 3.2 and a sessiond at 2.2, the communication stops
right there since the major versions differ.
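The rule above can be sketched as a small helper. The function name and
signature are hypothetical; only the "same major, lowest minor" rule is
taken from this message.

```c
/*
 * Returns the minor version both sides should speak, or -1 when the
 * major versions differ and communication must stop.
 */
int sketch_agreed_minor(unsigned int relayd_major, unsigned int relayd_minor,
		unsigned int sessiond_major, unsigned int sessiond_minor)
{
	if (relayd_major != sessiond_major) {
		return -1;
	}
	/* Same major: fall back to the lowest minor of the two. */
	return (int) (relayd_minor < sessiond_minor ?
			relayd_minor : sessiond_minor);
}
```

With the two cases given above, a relayd 2.4 and a sessiond 2.8 agree on
minor 4, while a relayd 3.2 and a sessiond 2.2 get -1.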
Acked-by: Julien Desfossez <julien.desfossez@efficios.com> Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Fri, 7 Dec 2012 18:54:44 +0000 (13:54 -0500)]
Fix: UST app session teardown process
This patch removes the ht_del of sessions from the delete_ust_app RCU
call and puts it in the unregister app function, just before the
call_rcu is done.
To be able to free the sessions in the RCU call, a list is added. When
tearing down an application or a session, this list is used to get the
session references for deletion.
Note that in the RCU call we are assured that the list is accessed
exclusively, thus there is no need for any locking.
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Fri, 7 Dec 2012 17:05:24 +0000 (12:05 -0500)]
Fix: check ht_del ret value of ust app session
A UST app session can be destroyed by two execution paths: either the
app unregisters or a destroy session is triggered. So, a failing ht_del
is allowed and means that the session is already scheduled for teardown
in an RCU call.
Furthermore, a lookup that does not find a UST app session is now a
valid outcome since it means the session is in the teardown process.
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Tue, 4 Dec 2012 23:17:55 +0000 (18:17 -0500)]
Fix: don't steal key when adding a metadata stream
This was corrupting the stream node key when the stream->key of the
metadata matched a stream wait_fd, making the stream impossible to find
and triggering an assert when getting out of the metadata poll wait.
Now we look up the stream before adding it to make sure it's unique, and
we no longer try to steal the key since wait_fd is unique to the
consumer.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Wed, 28 Nov 2012 20:30:37 +0000 (15:30 -0500)]
Fix: enable-consumer for all domains missing dir
The fix here is to enable the consumer for all domains if no domain is
given on the command line. This way, the session daemon can correctly
handle the trace directory path for the right domain.
Note that there is *no* switch for all domains (-a, --all), so omitting
the domain (-u or -k) automatically creates a UST and a kernel session,
if none exist, and sets the consumer for both of them. If one fails, the
command is stopped. All in all, to be sure, use a domain with the
command ;).
Fixes #333
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Thu, 22 Nov 2012 15:22:09 +0000 (10:22 -0500)]
lttng.h API update: set filter becomes enable event with filter
lttng_set_event_filter() is changed to
lttng_enable_event_with_filter(), which takes the same arguments. The
lttng UI now only uses this call. Note that the original
lttng_enable_event() is still available but sets the filter to NULL.
This is done because UST now allows enabling the same event with
different filters and/or loglevels. So, the events are still hashed by
name but matched by the name/filter/loglevel triplet. In order to add an
event to the hash table, those three attributes are needed at creation
time, hence this API call which takes them all at once.
There are some fixes in the match functions and filter setting from the
previous commit that were needed to make the overlap.sh tests work.
The loglevel_match function is removed because the match is now only
done in the hash table match function, which will eventually be merged
into a single loglevel match call site, so this function becomes
useless.
Furthermore, filter.c/.h are no longer required since the filter is now
added at event creation and CAN NOT be set afterwards.
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Wed, 21 Nov 2012 17:28:03 +0000 (12:28 -0500)]
Change the UST event hash table match function
Events are now matched, when looked up, with the name/filter/loglevel
triplet since the UST tracer now allows us to enable the same event with
different filters and/or loglevels.
The disable command, however, only takes the event name, so for now it
disables all events matching that name, which is why we still hash by
event name.
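A minimal sketch of a match on the name/filter/loglevel triplet, assuming
a hypothetical key structure (struct sketch_event_key is not the
session daemon's own type, and the filter is treated here as an opaque
byte buffer).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup key; the hash itself would only use the name. */
struct sketch_event_key {
	const char *name;
	const unsigned char *filter;	/* NULL when no filter is set */
	size_t filter_len;
	int loglevel;
};

/* Returns 1 when both keys describe the same event, 0 otherwise. */
int sketch_event_match(const struct sketch_event_key *a,
		const struct sketch_event_key *b)
{
	if (strcmp(a->name, b->name) != 0) {
		return 0;
	}
	if (a->loglevel != b->loglevel) {
		return 0;
	}
	if (a->filter_len != b->filter_len) {
		return 0;
	}
	if (a->filter_len > 0 &&
	    memcmp(a->filter, b->filter, a->filter_len) != 0) {
		return 0;
	}
	return 1;
}
```

Two events sharing a name land in the same hash bucket, and the match
function is what tells them apart.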
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: David Goulet <dgoulet@efficios.com>
Simon Marchi [Tue, 13 Nov 2012 19:28:42 +0000 (14:28 -0500)]
Add default subbuf sizes getter functions
This patch adds functions to retrieve the default subbuf sizes. They use
the DEFAULT_*_SUBBUF_SIZE defines from defaults.h but also make sure
that the values are at least as big as the page size.
The functions are defined as static inline in defaults.h.
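As a rough sketch of such a getter: the function name and its parameter
are hypothetical, and the page size is assumed to come from
sysconf(_SC_PAGESIZE).

```c
#include <stddef.h>
#include <unistd.h>

/*
 * Returns the compiled-in default, raised to the page size when the
 * default is smaller than one page.
 */
static inline size_t sketch_get_subbuf_size(size_t compiled_default)
{
	long page_size = sysconf(_SC_PAGESIZE);

	if (page_size > 0 && (size_t) page_size > compiled_default) {
		return (size_t) page_size;
	}
	return compiled_default;
}
```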
Signed-off-by: Simon Marchi <simon.marchi@polymtl.ca> Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Fri, 9 Nov 2012 18:36:02 +0000 (13:36 -0500)]
Fix: Don't set filter if enable event fails
Furthermore, the session daemon now returns an "already exists" error if
the event is enabled twice. Since we can now set a filter, it is
misleading to tell the user that the event was enabled but that the
filter failed, and then to disable an event that was already enabled by
a previous command.
Fixes #363 #364
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Thu, 8 Nov 2012 19:08:28 +0000 (14:08 -0500)]
Fix: Enable event after start command
Refer to the bug tracker for more information on this bug. In a
nutshell, it was not possible to enable a channel/event after a start
command had been issued (for both kernel and UST).
Note that it is still NOT possible to enable a channel after a start,
but the use case here was the following:
The first start does NOT create a kernel or UST session, so the
following enable event created the session for the given domain but
failed to start the session afterwards.
Fix: Teardown of thread_manage_clients on failure of listen/create_poll
Currently, if the call to lttcomm_listen_unix_sock or
create_thread_poll_set fails, the error handling and thread teardown
code path is triggered via a jump to an error label. This error handling
path closes the sockets that were used and cleans up the poll set. If
the listen fails, the code attempts to clean the poll set even though it
has never been initialized.
This patch adds 2 labels to differentiate the error handling paths
needed in case of failure of lttcomm_listen_unix_sock or
create_thread_poll_set.
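The layout of the labels could look like the sketch below. It only
records which cleanup steps run; the enum, the flags and the exact label
names are assumptions made for illustration and are not copied from
thread_manage_clients.

```c
/* Where the simulated setup fails. */
enum sketch_failure {
	SKETCH_FAIL_LISTEN,
	SKETCH_FAIL_CREATE_POLL,
	SKETCH_FAIL_LATER,
};

#define SKETCH_CLOSED_SOCK	(1 << 0)
#define SKETCH_CLEANED_POLL	(1 << 1)

/* Returns a bitmask of the cleanup actions taken on the error path. */
int sketch_teardown_actions(enum sketch_failure where)
{
	int actions = 0;

	if (where == SKETCH_FAIL_LISTEN) {
		goto error_listen;
	}
	if (where == SKETCH_FAIL_CREATE_POLL) {
		goto error_create_poll;
	}
	/* From here on the poll set exists and must be cleaned. */
	goto error;

error:
	actions |= SKETCH_CLEANED_POLL;
error_listen:
error_create_poll:
	/* The socket is closed on every path. */
	actions |= SKETCH_CLOSED_SOCK;
	return actions;
}
```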
Signed-off-by: Christian Babeux <christian.babeux@efficios.com> Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Tue, 6 Nov 2012 15:18:26 +0000 (10:18 -0500)]
Remove consumer poll timeout in data thread
This was originally used to stop any poll() after a grace period when a
session daemon dies, in case of weird stuck fd(s).
However, things have changed a bit, especially on the UST tracer side
where all stream handles are now cleaned up if the session daemon socket
hangs up. This in turn releases every stream on the consumer, which
finally quits gracefully.
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Tue, 6 Nov 2012 20:21:18 +0000 (15:21 -0500)]
Fix: RCU hash table seed
Remove the dangerous static value 0x42UL. Use time(NULL) value to set
the new lttng_ht_seed variable when the hash table library constructor
is called.
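A minimal sketch of seeding at library load time, assuming the GCC/Clang
constructor attribute. Only time(NULL) and the idea of a seed variable
come from this message; the sketch_ names and the toy string hash are
hypothetical.

```c
#include <time.h>

/* Set once when the library (here, the program) is loaded. */
static unsigned long sketch_ht_seed;

__attribute__((constructor))
static void sketch_ht_init(void)
{
	sketch_ht_seed = (unsigned long) time(NULL);
}

unsigned long sketch_get_seed(void)
{
	return sketch_ht_seed;
}

/* Toy string hash that starts from the seed instead of a constant. */
unsigned long sketch_hash_str(const char *key)
{
	unsigned long h = sketch_ht_seed;

	while (*key) {
		h = h * 31 + (unsigned char) *key++;
	}
	return h;
}
```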
Fixes #388
Signed-off-by: David Goulet <dgoulet@efficios.com>
The libraries libhealthexit and libhealthstall should not be installed
on the user system. They are only useful for the health check tests.
Furthermore, when adding libraries to noinst_LTLIBRARIES, libtool will
only build these as static libraries (see [1] for a workaround). This is
fine for most use cases, but for the health tests, we _must_ have shared
libraries (the nature of the tests requires LD_PRELOAD), hence we force
the build of a shared object.
Forcing a shared object has the unfortunate side effect of breaking
builds where configure was invoked with the "--disable-shared" flag.
Instead of failing badly, detect this flag and skip the health tests
altogether.
David Goulet [Tue, 6 Nov 2012 14:41:35 +0000 (09:41 -0500)]
Fix: Add EPIPE error handling on buffer splice
Even though this is _not_ documented in splice(2), if fd_out is a socket
that is closed on one end, splice returns a negative value and sets
errno to EPIPE. The man page specifies EBADF but I guess both are
possible (and EPIPE is, according to the kernel 3.6.2 source).
So, when streaming a kernel session (using splice), if the relayd quits,
a splice on the socket returns EPIPE.
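One way to fold this into the error handling is sketched below. The enum
and function are hypothetical, and treating EBADF the same way as EPIPE
is an assumption based on the remark above, not on the consumer code.

```c
#include <errno.h>
#include <sys/types.h>

enum sketch_splice_outcome {
	SKETCH_SPLICE_OK,
	SKETCH_SPLICE_PEER_GONE,
	SKETCH_SPLICE_ERROR,
};

/* Classify the return value of a splice() call and the errno it left. */
enum sketch_splice_outcome sketch_check_splice(ssize_t ret, int saved_errno)
{
	if (ret >= 0) {
		return SKETCH_SPLICE_OK;
	}
	switch (saved_errno) {
	case EPIPE:	/* observed when the socket peer has quit */
	case EBADF:	/* the errno documented in splice(2) */
		return SKETCH_SPLICE_PEER_GONE;
	default:
		return SKETCH_SPLICE_ERROR;
	}
}
```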
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Mon, 5 Nov 2012 21:49:40 +0000 (16:49 -0500)]
Fix: Wrong poll events on UST application socket
The thread manage apps was adding UST sockets to the poll set with the
POLLIN event registered. However, the thread was not handling this
event, which could ultimately cause an infinite loop if the application
sends data.
This has been observed with bug #386 when an application is stopped and
an lttng command is sent.
https://bugs.lttng.org/issues/386
Furthermore, during the time window between the send of a UST command
and the recv of its reply, the app manager was looping actively because
of a POLLIN event on the socket caused by the reply from the
application, which is only handled a bit later by the client thread.
This was not that problematic but led to a lot of repeated debug
messages and wasted CPU time.
This application thread *only* handles error events, usually triggered
by a close() on the UST socket, so it is OK to *not* wait for
POLLIN/POLLOUT events.
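The effect can be reproduced with plain poll(2), as in the sketch below.
This is not the lttng poll wrapper; it assumes Linux behavior for a UNIX
stream socketpair(), where a peer close() is reported as POLLHUP even
when no event was requested.

```c
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Polls one end of a socket pair with no requested events after the
 * peer has written one byte. When close_peer is non-zero, the peer is
 * also closed before polling. Returns the reported revents, or -1.
 */
int sketch_poll_error_only(int close_peer)
{
	int sv[2], ret;
	struct pollfd pfd;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
		return -1;
	}
	/* Incoming data must not wake up an error-only poller. */
	if (write(sv[1], "x", 1) != 1) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}
	if (close_peer) {
		close(sv[1]);
	}

	pfd.fd = sv[0];
	pfd.events = 0;		/* no POLLIN/POLLOUT requested */
	pfd.revents = 0;
	ret = poll(&pfd, 1, 0);

	close(sv[0]);
	if (!close_peer) {
		close(sv[1]);
	}
	return ret < 0 ? -1 : pfd.revents;
}
```

With pending data only, poll() reports nothing, so the thread does not
spin. Once the peer closes its end, the hang up is still reported.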
Signed-off-by: David Goulet <dgoulet@efficios.com>
The -lurcu-cds link flag is not mandatory to link any of the tests.
Also, adding this flag can cause multiple symbol definition errors when
linking statically because the libhashtable contains symbol names that
are also present in urcu-cds.
Signed-off-by: Christian Babeux <christian.babeux@efficios.com> Signed-off-by: David Goulet <dgoulet@efficios.com>
Fix: Missing librt dependency in configure check for lttng-ust-ctl
The lttng-ust-ctl library depends on librt. The AC_CHECK_LIBRARY macro
can't automatically resolve dependent libraries (à la libtool), so any
additional dependencies must be specified manually.
Signed-off-by: Christian Babeux <christian.babeux@efficios.com> Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Fri, 2 Nov 2012 17:45:05 +0000 (13:45 -0400)]
Fix: Deny session creation name 'auto'
This is a reserved keyword for default session(s). Note that this is
only for the lttng command line tool. Using 'auto' is possible with the
API but the current date and time is automatically appended to it for
now so be aware of that.
Fixes #359
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Fri, 2 Nov 2012 16:38:33 +0000 (12:38 -0400)]
Fix: Add output option to enable-channel command
Allow specifying an output mode for the channel (MMAP or SPLICE).
A new error code is introduced to notify the user that the operation is
not supported. Useful for user space tracing which does NOT support
splice mode.
Fixes #335
Signed-off-by: David Goulet <dgoulet@efficios.com>
Christian Babeux [Wed, 31 Oct 2012 13:00:45 +0000 (09:00 -0400)]
Tests: Add filtering tests for uncovered cases
While investigating the code coverage of the filtering feature, a couple
of possible test cases were uncovered:
Error tests:
* Strings can't be IR root node
* Unary ! not allowed on string type
* Comparison with string type not allowed
* Logical operator not allowed with string types
* Nesting of binary operator not allowed
Valid tests:
* Cover all left/right operands permutations with
fields ref. and numeric values.
Signed-off-by: Christian Babeux <christian.babeux@efficios.com> Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Thu, 1 Nov 2012 20:27:08 +0000 (16:27 -0400)]
Fix: Sync issue when deleting a data stream
A data stream was rescheduled for deletion after a flush on hang up.
In the normal read code path, on error, the stream is deleted and then
processed for the ERR|HUP error, which could delete the stream again,
causing an assert() and a failed trace.
We fix that by setting the local array entry to NULL for that stream
once deleted, and by ignoring the stream in the subsequent loop if it is
NULL.
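The idea can be sketched with a toy stream type. struct sketch_stream and
the two passes below are hypothetical simplifications of the data thread
loop; the delete function only counts calls.

```c
#include <stddef.h>

struct sketch_stream {
	int read_error;	/* the normal read path failed */
	int hung_up;	/* ERR|HUP was reported for this stream */
	int deleted;	/* number of times delete was called */
};

static void sketch_delete_stream(struct sketch_stream *stream)
{
	stream->deleted++;
}

void sketch_process_streams(struct sketch_stream **local, size_t nr)
{
	size_t i;

	/* Read pass: delete on error and forget the stream. */
	for (i = 0; i < nr; i++) {
		if (local[i] && local[i]->read_error) {
			sketch_delete_stream(local[i]);
			local[i] = NULL;
		}
	}
	/* ERR|HUP pass: skip entries already deleted above. */
	for (i = 0; i < nr; i++) {
		if (!local[i]) {
			continue;
		}
		if (local[i]->hung_up) {
			sketch_delete_stream(local[i]);
			local[i] = NULL;
		}
	}
}

/* A stream with both conditions set is deleted exactly once. */
int sketch_double_delete_demo(void)
{
	struct sketch_stream stream = { 1, 1, 0 };
	struct sketch_stream *local[1] = { &stream };

	sketch_process_streams(local, 1);
	return stream.deleted;
}
```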
Fixes #390
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Thu, 1 Nov 2012 17:02:58 +0000 (13:02 -0400)]
Rename data_available to data_pending
This is just a rename and a change of semantics.
lttng_data_pending returns 0 if _NO_ data is pending, meaning that
the buffers are ready to be read safely. A value of 1 means that data is
still pending so the buffers are not ready for any read.
This is the same semantics as lttng_data_available but with the return
values reversed.
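A caller would typically loop until 0 is returned. The sketch below uses
a function pointer and a fake implementation so that it stands alone;
both are hypothetical, and a real caller would pass a wrapper around
lttng_data_pending and sleep between attempts.

```c
/* Same shape as a "data pending" query: 0 = ready, 1 = still pending. */
typedef int (*sketch_pending_fn)(const char *session_name);

/*
 * Polls until no data is pending. Returns 0 when the buffers can be
 * read, 1 if data is still pending after max_tries, or a negative
 * value propagated from the query on error.
 */
int sketch_wait_no_data_pending(sketch_pending_fn pending,
		const char *session_name, int max_tries)
{
	int i, ret = 1;

	for (i = 0; i < max_tries; i++) {
		ret = pending(session_name);
		if (ret <= 0) {
			break;	/* ready (0) or error (< 0) */
		}
		/* A real caller would sleep here before retrying. */
	}
	return ret;
}

/* Fake query for the demo: reports pending data a fixed number of times. */
int sketch_fake_remaining;

int sketch_fake_pending(const char *session_name)
{
	(void) session_name;
	return sketch_fake_remaining-- > 0 ? 1 : 0;
}
```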
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Tue, 30 Oct 2012 19:11:18 +0000 (15:11 -0400)]
Fix: Bad return error code handling
Two things here. First, if the tracer dies before we have time to create
the session on it, the returned value (-1) was not handled, causing a
segfault in the following loop.
Second, we no longer assert on the return value of the application PID
hash table delete, since we use add_replace on app registration, which
makes it possible for a key to be reused by a different node.
Signed-off-by: David Goulet <dgoulet@efficios.com>