Christian Babeux [Tue, 18 Dec 2012 21:31:14 +0000 (16:31 -0500)]
run-report: Use libtool wrapper to spawn the sessiond for tests
The run-report script was using the sessiond binary generated via
libtool under the ".libs/" folder. When using this binary, the consumerd
used when starting the sessiond is the one installed system-wide (if
any). This could lead to test failures if no consumer is installed on
the system or if a version mismatch occurs.
This commit fixes this by using the consumerd that was built with libtool
in the local source tree.
Signed-off-by: Christian Babeux <christian.babeux@efficios.com> Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Tue, 18 Dec 2012 19:02:14 +0000 (14:02 -0500)]
Fix: flag metadata stream on quiescent control cmd
For the relayd, when doing a quiescent control command, we have to flag
the corresponding metadata stream or else it will simply stay alive
until a close stream command, and the end data pending command will
always report that data is in flight.
Add a stream id to the relayd command so the relayd can identify which
stream to flag.
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Tue, 18 Dec 2012 17:05:24 +0000 (12:05 -0500)]
Fix: remove ua_sess->started assert on stop trace
It's totally possible that a start failed for a specific app while the
started flag is set for the global session, making a stop trace possible
on a session whose start failed for that app.
The assert is no longer valid since this code flow is possible.
Signed-off-by: David Goulet <dgoulet@efficios.com>
Julien Desfossez [Mon, 17 Dec 2012 17:13:38 +0000 (12:13 -0500)]
Set classes of traffic in high_throughput_limits
This patch creates 2 classes for the bandwidth limited test instead of
one. The intent is to have multiple queues in the kernel instead of just
one. That way we can prioritize the control port over the data port and
make sure it gets its share of the bandwidth.
With this update, the control port gets 1/10th of the limit and the data
port gets the remaining 9/10ths. If the control port leaves its share
unused, the data connection can borrow the remaining bandwidth.
Signed-off-by: Julien Desfossez <jdesfossez@efficios.com> Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Mon, 17 Dec 2012 17:19:56 +0000 (12:19 -0500)]
Fix: force the poll() return value to be nb_fd
With poll(), we have to iterate over all fds in the pollset since it is
handled in user space, whereas we don't have to with epoll.
This is a first patch to fix the fact that we should iterate over the
number of fds that the lttng_poll_wait() call returns: for epoll, the
number of returned events; for poll, the whole set of fds.
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: David Goulet <dgoulet@efficios.com>
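Illustration for the commit above (editorial sketch, not taken from the
patch): with plain poll(), the return value only says how many entries
have events, not which ones, so the caller walks the whole pollset. The
helper names count_ready_fds() and demo_poll_scan() are hypothetical and
do not exist in lttng-tools.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: poll the set, then scan all nb_fd entries. */
static int count_ready_fds(struct pollfd *fds, nfds_t nb_fd, int timeout_ms)
{
    int ready = 0;
    nfds_t i;

    assert(fds != NULL);

    if (poll(fds, nb_fd, timeout_ms) < 0) {
        return -1;
    }

    /* poll() does not reorder the set: every entry must be inspected. */
    for (i = 0; i < nb_fd; i++) {
        if (fds[i].revents != 0) {
            ready++;
        }
    }
    return ready;
}

/* Usage: two pipes, only the second one has data waiting. */
static int demo_poll_scan(void)
{
    int quiet[2], busy[2], ready = -1;
    struct pollfd fds[2];

    if (pipe(quiet) < 0) {
        return -1;
    }
    if (pipe(busy) < 0) {
        close(quiet[0]);
        close(quiet[1]);
        return -1;
    }

    if (write(busy[1], "x", 1) == 1) {
        fds[0].fd = quiet[0];
        fds[0].events = POLLIN;
        fds[0].revents = 0;
        fds[1].fd = busy[0];
        fds[1].events = POLLIN;
        fds[1].revents = 0;
        ready = count_ready_fds(fds, 2, 0);
    }

    close(quiet[0]);
    close(quiet[1]);
    close(busy[0]);
    close(busy[1]);
    return ready;
}
```

The ready descriptor sits at index 1, so a loop bounded by the poll()
return value (1) would stop after index 0 and miss it.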
Christian Babeux [Thu, 13 Dec 2012 23:39:13 +0000 (18:39 -0500)]
Tests: Add health check testpoint fail test
This test triggers a failure in a specified thread by using the testpoint
mechanism. The testpoint behavior is implemented in health_fail.c. The
testpoint code simply returns 1 (non-zero values are considered errors
for testpoints) to trigger the specific thread's error handling mechanism.
This test ensures that we can detect a health failure for each thread's
error handling path.
Signed-off-by: Christian Babeux <christian.babeux@efficios.com> Signed-off-by: David Goulet <dgoulet@efficios.com>
Christian Babeux [Thu, 13 Dec 2012 23:38:56 +0000 (18:38 -0500)]
Add return code to the testpoint mechanism
The testpoint processing could fail and currently there is no mechanism
to notify the caller of such failures. This patch adds an int return
code to the testpoint prototype. A non-zero return code indicates failure.
When using the testpoint mechanism, the caller should properly handle
testpoint failure cases and trigger the appropriate response (error
handling, thread teardown, etc.).
Signed-off-by: Christian Babeux <christian.babeux@efficios.com> Signed-off-by: David Goulet <dgoulet@efficios.com>
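Illustration for the commit above (editorial sketch, not taken from the
patch): a caller checking an int-returning testpoint and taking its
error path on a non-zero value. The type testpoint_fn, the enum and all
function names are hypothetical; the real testpoint declarations in
lttng-tools are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical testpoint signature: 0 means success, non-zero failure. */
typedef int (*testpoint_fn)(void);

static int testpoint_ok(void)
{
    return 0;
}

static int testpoint_fail(void)
{
    return 1;
}

enum thread_status_sketch {
    THREAD_RUNNING,
    THREAD_ERROR
};

/* One step of a thread's main loop, guarded by a testpoint. */
static enum thread_status_sketch run_thread_step(testpoint_fn tp)
{
    assert(tp != NULL);

    if (tp() != 0) {
        /* Testpoint failed: go through the thread's error handling. */
        return THREAD_ERROR;
    }

    /* ... regular work of the thread would happen here ... */
    return THREAD_RUNNING;
}
```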
David Goulet [Thu, 13 Dec 2012 22:51:45 +0000 (17:51 -0500)]
Fix: RCU unlock out of error path
On channel error, the RCU read-side lock was not released. Furthermore,
remove a check for a NULL session that also returned without going
through an RCU unlock, and change it to an assert.
This also adds a channel subbuf size check when enabling a channel.
Signed-off-by: David Goulet <dgoulet@efficios.com>
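Illustration for the commit above (editorial sketch, not taken from the
patch): an error path that jumps to a single exit label so the read side
is always unlocked. To stay self-contained, read_lock_sketch() and
read_unlock_sketch() are stand-ins that only count nesting; they are not
the liburcu rcu_read_lock()/rcu_read_unlock() calls, and
enable_channel_sketch() is a hypothetical function.

```c
#include <assert.h>

/* Stand-ins for the RCU read-side primitives: they only track nesting. */
static int read_side_depth;

static void read_lock_sketch(void)
{
    read_side_depth++;
}

static void read_unlock_sketch(void)
{
    read_side_depth--;
}

static int enable_channel_sketch(int session_valid, int channel_ok)
{
    int ret = 0;

    /* A missing session is a programming error: assert, don't return. */
    assert(session_valid);

    read_lock_sketch();

    if (!channel_ok) {
        ret = -1;
        /* Jump to the common exit instead of returning while locked. */
        goto error;
    }

    /* ... normal channel setup would happen here ... */

error:
    read_unlock_sketch();
    return ret;
}

/* True when every lock has been paired with an unlock. */
static int read_side_balanced(void)
{
    return read_side_depth == 0;
}
```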
David Goulet [Thu, 13 Dec 2012 01:16:33 +0000 (20:16 -0500)]
Fix data pending for inflight streaming
The consumer_data_pending() function had a badly named label. The
goto label data_not_pending was actually going to the return value of
pending data (1). So, this patch fixes that by renaming the label to
match what it does.
Add a missing destroy of the relayd session id mapping hash table.
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: David Goulet <dgoulet@efficios.com>
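Illustration for the commit above (editorial sketch, not taken from the
patch): a goto label whose name matches the value returned after it. The
label name data_pending, the struct and the function are assumptions;
the commit only states that data_not_pending was the misleading name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stream state, reduced to a single flag. */
struct stream_sketch {
    int has_unread_data;
};

/* Returns 1 if any stream still has data to consume, 0 otherwise. */
static int data_pending_sketch(const struct stream_sketch *streams,
        size_t count)
{
    size_t i;

    assert(streams != NULL || count == 0);

    for (i = 0; i < count; i++) {
        if (streams[i].has_unread_data) {
            goto data_pending;
        }
    }

    /* Every stream is drained. */
    return 0;

data_pending:
    /* The label name now says what the return value means. */
    return 1;
}
```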
David Goulet [Wed, 12 Dec 2012 22:05:45 +0000 (17:05 -0500)]
Add the relayd create session command
This is needed in order to fix a specific condition of the data pending
where we need to have streams associated with a session and this command
will be used for new features in the future.
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Wed, 12 Dec 2012 16:23:20 +0000 (11:23 -0500)]
Make the consumer send an ACK after each command
This is needed to avoid buffer bloat when throttling communication
between the consumer and the relayd. With a very low bandwidth limit
between the relayd and the consumerd, the session daemon would send
commands to the consumer at a high rate without the unix socket queue
ever being emptied. The UNIX socket then reaches buffer-full conditions,
which are prone to trigger corner-case behaviors in blocking send/recv
with MSG_WAITALL. This is likely the cause of the hang experienced when
limiting relayd bandwidth.
Adding an ACK to each command makes sure that we, the consumer,
acknowledge to the session daemon that we have emptied the unix socket
buffer.
NOTE: In consumer_add_relayd_socket(), there might be a problem with the
error path and message status to the sessiond. A subsequent patch might
fix a possible issue but for now it is not at all critical since any
critical error on the consumer side will notify the sessiond through the
error socket.
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: David Goulet <dgoulet@efficios.com>
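Illustration for the commit above (editorial sketch, not taken from the
patch): a command/ACK round trip over a UNIX socket pair, where the
sender waits for the acknowledgement before issuing anything else. The
message layouts and all function names are hypothetical and do not match
the real sessiond/consumerd protocol.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wire messages. */
struct cmd_sketch {
    uint32_t cmd_type;
};

struct ack_sketch {
    int32_t ret_code;
};

/* Session daemon side: push one command into the socket. */
static int sessiond_send_cmd(int sock, uint32_t cmd_type)
{
    struct cmd_sketch cmd = { cmd_type };

    assert(sock >= 0);
    return send(sock, &cmd, sizeof(cmd), 0) == (ssize_t) sizeof(cmd) ? 0 : -1;
}

/* Consumer side: drain one command, then acknowledge it. */
static int consumer_handle_cmd(int sock)
{
    struct cmd_sketch cmd;
    struct ack_sketch ack = { 0 };

    if (recv(sock, &cmd, sizeof(cmd), 0) != (ssize_t) sizeof(cmd)) {
        return -1;
    }
    /* ... act on cmd.cmd_type here ... */
    if (send(sock, &ack, sizeof(ack), 0) != (ssize_t) sizeof(ack)) {
        return -1;
    }
    return 0;
}

/* Session daemon side: block until the consumer has acknowledged. */
static int sessiond_wait_ack(int sock)
{
    struct ack_sketch ack;

    if (recv(sock, &ack, sizeof(ack), 0) != (ssize_t) sizeof(ack)) {
        return -1;
    }
    return ack.ret_code;
}

/* Usage: both ends live in one process, connected by a socketpair. */
static int demo_ack_roundtrip(void)
{
    int sv[2], ret;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }
    ret = sessiond_send_cmd(sv[0], 1);
    if (ret == 0) {
        ret = consumer_handle_cmd(sv[1]);
    }
    if (ret == 0) {
        ret = sessiond_wait_ack(sv[0]);
    }
    close(sv[0]);
    close(sv[1]);
    return ret;
}
```

Because the sender blocks in sessiond_wait_ack(), at most one command is
queued in the socket at any time in this sketch.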
David Goulet [Wed, 12 Dec 2012 18:39:37 +0000 (13:39 -0500)]
Remove MSG_WAITALL on every recvmsg() socket type
In order to handle messages that are possibly larger than the socket
buffer size set by wmem_max and rmem_max /proc files, ensure that the
recv-side reads the data chunk-wise rather than hanging on a
MSG_WAITALL.
In addition to fixing this issue, chances are that it will also help
fix hangs detected due to UNIX socket buffers filling up. The
MSG_WAITALL behavior in such situations might be unexpected.
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: David Goulet <dgoulet@efficios.com>
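Illustration for the commit above (editorial sketch, not taken from the
patch): reading a fixed-size message chunk by chunk with flags set to 0
instead of MSG_WAITALL. The sketch uses recv() for brevity where the
commit talks about recvmsg(); recv_all_chunked() and
demo_chunked_recv() are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Read up to len bytes, accepting whatever each recv() call delivers. */
static ssize_t recv_all_chunked(int sock, void *buf, size_t len)
{
    char *p = buf;
    size_t got = 0;

    assert(buf != NULL);

    while (got < len) {
        ssize_t ret = recv(sock, p + got, len - got, 0);

        if (ret < 0) {
            if (errno == EINTR) {
                continue;   /* Interrupted by a signal: retry. */
            }
            return -1;
        }
        if (ret == 0) {
            break;          /* Peer performed an orderly shutdown. */
        }
        got += (size_t) ret;
    }
    return (ssize_t) got;
}

/* Usage: the sender writes the message in two pieces. */
static int demo_chunked_recv(void)
{
    int sv[2], ok = 0;
    char buf[11];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }
    if (send(sv[0], "hello ", 6, 0) == 6 && send(sv[0], "world", 5, 0) == 5) {
        ssize_t got = recv_all_chunked(sv[1], buf, sizeof(buf));

        ok = got == (ssize_t) sizeof(buf)
                && memcmp(buf, "hello world", sizeof(buf)) == 0;
    }
    close(sv[0]);
    close(sv[1]);
    return ok ? 0 : -1;
}
```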
David Goulet [Mon, 10 Dec 2012 21:03:58 +0000 (16:03 -0500)]
Fix: Use stream deletion function when cleaning up
In theory, by the time the destroy stream ht function is called, the hash
table should be empty. However, for some fatal errors, it might not be,
so it's imperative that we gracefully delete the stream and free it
using an RCU call so both hash tables (stream and the one for the
pending command) are synchronized.
Simply freeing the stream could have created fd leaks and invalid nodes
in the data pending hash table.
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Mon, 10 Dec 2012 17:16:15 +0000 (12:16 -0500)]
Fix: Relayd and sessiond version check
Now only checks for the major version to be equal. After the 2.1 stable
release, both components will adapt to the lowest minor version for the
same major version. For this, the session daemon now sends its version
values to the relayd, which is a slight change in the protocol.
For instance, with a relayd 2.4 talking to a sessiond 2.8, the
communication and available features will only be those of version 2.4.
With a relayd at, say, 3.2 and a sessiond at 2.2, the communication stops
right there since the major versions differ.
Acked-by: Julien Desfossez <julien.desfossez@efficios.com> Signed-off-by: David Goulet <dgoulet@efficios.com>
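Illustration for the commit above (editorial sketch, not taken from the
patch): the compatibility rule expressed as a function. Majors must be
equal, and the agreed minor is the lower of the two. The function name
negotiate_minor() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns the minor version both sides should speak, or -1 when the
 * major versions differ and communication must stop.
 */
static int negotiate_minor(uint32_t relayd_major, uint32_t relayd_minor,
        uint32_t sessiond_major, uint32_t sessiond_minor)
{
    /* Keep the sketch's int return value well defined. */
    assert(relayd_minor <= INT32_MAX && sessiond_minor <= INT32_MAX);

    if (relayd_major != sessiond_major) {
        return -1;
    }
    return (int) (relayd_minor < sessiond_minor ? relayd_minor
            : sessiond_minor);
}
```

With the examples from the message: relayd 2.4 and sessiond 2.8 agree on
minor 4, while relayd 3.2 and sessiond 2.2 are rejected.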
David Goulet [Fri, 7 Dec 2012 18:54:44 +0000 (13:54 -0500)]
Fix: UST app session teardown process
This patch removes the ht_del of sessions from the delete_ust_app RCU
call and puts it in the unregister app function just before the call_rcu
is done.
To be able to free the sessions in the call_rcu, a list is added. When
tearing down an application or session, this list is used to get the
session reference for deletion.
Note that when in the RCU call, we are assured that the list is
accessed exclusively, thus no locking is needed.
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Fri, 7 Dec 2012 17:05:24 +0000 (12:05 -0500)]
Fix: check ht_del ret value of ust app session
A UST app session can be destroyed by two execution paths: either the app
unregisters or a destroy session is triggered. So, allowing a ht_del to
fail means that the session is already scheduled for teardown in an RCU
call.
Furthermore, this means that looking up a UST app session and not
finding it becomes a valid outcome, since it means the session is in the
teardown process.
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Tue, 4 Dec 2012 23:17:55 +0000 (18:17 -0500)]
Fix: don't steal key when adding a metadata stream
This was corrupting the stream's node key if the stream->key of the
metadata matched a stream wait_fd, making the stream impossible to find
and triggering an assert when getting out of the metadata poll wait.
Now we look up the stream before adding it to make sure it's unique and
don't try to steal the key anymore since wait_fd is unique to the
consumer.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Wed, 28 Nov 2012 20:30:37 +0000 (15:30 -0500)]
Fix: enable-consumer for all domains missing dir
The fix here is to enable the consumer for all domains if no domain is
given on the command line. This way, the session daemon can correctly
handle the trace directory path for the right domain.
Note that there is *no* switch for all domains (-a, --all) so omitting
the domain (-u or -k) automatically creates a UST and a kernel session,
if none exists, and sets the consumer for both of them. If one fails, the
command is stopped. All in all, to be sure, use a domain with the command ;).
Fixes #333
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Thu, 22 Nov 2012 15:22:09 +0000 (10:22 -0500)]
lttng.h API update: set filter becomes enable event with filter
The lttng_set_event_filter() is changed to
lttng_enable_event_with_filter() taking the same arguments. The lttng UI
now only uses this call. Note that the original lttng_enable_event() is
still available but will set the filter to NULL.
This is done since UST now allows enabling the same event with
different filters and/or loglevels. So, the events are still hashed by
name but matched by the name/filter/loglevel triplet. In order to add an
event to the hash table, those three attributes are needed at
creation time, hence this new API call which takes them all at once.
There are some fixes in the match functions and filter setting from the
previous commit that were needed to make the overlap.sh tests work.
The loglevel_match function is removed because it is now only done in
the hash table match function which will eventually get merged making a
single loglevel match call site hence this function becomes useless.
Furthermore, the filter.c/.h are no longer required since the filter is
now added at event creation and CAN NOT be set after.
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Wed, 21 Nov 2012 17:28:03 +0000 (12:28 -0500)]
Change the UST event hash table match function
Events are now matched, when looked up, by the name/filter/loglevel
triplet since the UST tracer now allows us to enable the same event with
different filters and/or loglevels.
The disable command, however, only takes the event name, so for now it
disables all events matching that name, which is why we still hash by
event name.
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: David Goulet <dgoulet@efficios.com>
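Illustration for the commit above (editorial sketch, not taken from the
patch): a match function that compares the name/filter/loglevel triplet.
The struct and function names are hypothetical, and the filter is
modelled as a plain string for simplicity; the real code may well
compare a compiled filter representation instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup key: events hash by name, match on all three. */
struct event_key_sketch {
    const char *name;
    const char *filter;     /* NULL means "no filter". */
    int loglevel;
};

/* Returns 1 when both keys describe the same event, 0 otherwise. */
static int event_match(const struct event_key_sketch *a,
        const struct event_key_sketch *b)
{
    assert(a != NULL && b != NULL);
    assert(a->name != NULL && b->name != NULL);

    if (strcmp(a->name, b->name) != 0) {
        return 0;
    }
    if (a->loglevel != b->loglevel) {
        return 0;
    }
    /* Two absent filters match; an absent and a present one do not. */
    if (a->filter == NULL || b->filter == NULL) {
        return a->filter == b->filter;
    }
    return strcmp(a->filter, b->filter) == 0;
}
```

Two keys with the same name but different filters land in the same hash
bucket (same name) yet do not match, which is what lets both events
coexist in the table.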
Simon Marchi [Tue, 13 Nov 2012 19:28:42 +0000 (14:28 -0500)]
Add default subbuf sizes getter functions
This patch adds functions to retrieve default subbuf sizes. They use the
DEFAULT_*_SUBBUF_SIZE defines from defaults.h but also make sure that
the values are at least as big as the page size.
The functions are defined as static inline in defaults.h.
Signed-off-by: Simon Marchi <simon.marchi@polymtl.ca> Signed-off-by: David Goulet <dgoulet@efficios.com>
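Illustration for the commit above (editorial sketch, not taken from the
patch): a static inline getter that returns the larger of a compile-time
default and the runtime page size. DEFAULT_SKETCH_SUBBUF_SIZE, its value
and the function names are hypothetical; the real defines and getters
live in defaults.h and are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical default; the real values are defined in defaults.h. */
#define DEFAULT_SKETCH_SUBBUF_SIZE 4096

static inline size_t max_size_sketch(size_t a, size_t b)
{
    return a > b ? a : b;
}

/* Default subbuf size, never smaller than one page. */
static inline size_t default_get_sketch_subbuf_size(void)
{
    long page = sysconf(_SC_PAGESIZE);

    /* A POSIX system always reports a page size; treat failure as a bug. */
    assert(page > 0);

    return max_size_sketch(DEFAULT_SKETCH_SUBBUF_SIZE, (size_t) page);
}
```

On a system with 64 KiB pages, this getter would return the page size
rather than the 4096-byte default assumed here.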