lttng-tools.git
David Goulet [Mon, 21 Jan 2013 16:24:29 +0000 (11:24 -0500)] 
Fix: change function name for better meaning

Mostly to avoid confusion in the future for patches, reviews and
contributors.

Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Mon, 21 Jan 2013 15:50:44 +0000 (10:50 -0500)] 
Fix: improve error handling for UST stream creation

Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Wed, 16 Jan 2013 19:15:42 +0000 (14:15 -0500)] 
Fix: remove duplicate set ust event filter

Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Fri, 11 Jan 2013 15:56:44 +0000 (10:56 -0500)] 
Update version to v2.1.1

Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Thu, 10 Jan 2013 17:07:35 +0000 (12:07 -0500)] 
Fix: update next_net_seq_num after sending header

Increment the sequence number after we are sure that the relayd has
correctly received the data header. If an error occurs when sending the
header, the data won't be extracted from the buffers, thus keeping this
sequence number untouched.

Furthermore, after sending the header, if the relayd dies, this value
won't matter much and if there is an error on the stream when reading
the trace data, the stream will be deleted thus closed on the relayd
making this value useless.

It's important to note that this sequence number is updated on the
relayd side if the full expected data packet was received. So,
incrementing the value after the transmission of the header does not
change anything in terms of value coherency. The point is to have clear
semantics: update the value once it has been read and used successfully
(transmitted to the relayd).

In that code flow, the stream's lock is acquired so there is no need to
read/update it atomically. I've also added a comment to better
explain the purpose of this variable and how to use it.

Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
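Illustration for the commit above (not part of the patch): a minimal sketch of "send the header first, increment on success only". Only the field name next_net_seq_num comes from the commit message; struct stream, send_data_header() and the two fake transports are hypothetical names introduced for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the consumer stream; only the field name
 * next_net_seq_num comes from the commit message. */
struct stream {
	uint64_t next_net_seq_num;
};

/* Hypothetical transport hook: 0 on success, negative on error. */
typedef int (*send_header_fn)(uint64_t net_seq_num);

/*
 * Send the data header, then bump the sequence number only on success.
 * The caller is assumed to hold the stream's lock, so a plain
 * (non-atomic) increment is enough.
 */
static int send_data_header(struct stream *stream, send_header_fn send_header)
{
	int ret;

	assert(stream && send_header);

	ret = send_header(stream->next_net_seq_num);
	if (ret < 0) {
		/* Header not sent: leave the sequence number untouched. */
		return ret;
	}

	stream->next_net_seq_num++;
	return 0;
}

/* Two fake transports used to exercise both paths. */
static int send_header_ok(uint64_t net_seq_num)
{
	(void) net_seq_num;
	return 0;
}

static int send_header_fail(uint64_t net_seq_num)
{
	(void) net_seq_num;
	return -1;
}
```

With a stream whose next_net_seq_num is 4, a failed send leaves the value at 4 and a successful one moves it to 5.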
David Goulet [Thu, 10 Jan 2013 15:18:31 +0000 (10:18 -0500)] 
Fix: wrong loop continuation in metadata thread

The validation of the endpoint status can change the metadata hash table,
meaning stream(s) can be removed from it and from the poll set. After that,
continuing the for loop was making the thread use possibly invalid file
descriptors that were no longer in the hash table, triggering the lookup
assert of the node just after the for loop.

The very important part here is that when the metadata ht changes, we
MUST go back to the poll wait() to synchronize the subset of fd we are
looking at.

Reported-by: Jesus Garcia <jesus.garcia@ericsson.com>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
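Illustration for the commit above (not part of the patch): a simplified sketch of the rule "once the fd set changes, stop iterating and go back to the poll wait". It does not use the real lttng poll API; process_ready_fds(), handle_fd_fn and the demo handler are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-fd handler: returns true if it modified the fd set
 * (for instance by removing a stream from the hash table). */
typedef bool (*handle_fd_fn)(int fd, void *ctx);

/*
 * Walk the fds returned by one poll wait. Returns true as soon as a
 * handler reports that the set changed: the remaining entries may be
 * stale, so the caller must go back to its poll wait instead of
 * continuing the loop.
 */
static bool process_ready_fds(const int *fds, size_t nb_fd,
		handle_fd_fn handle, void *ctx)
{
	size_t i;

	assert(fds && handle);

	for (i = 0; i < nb_fd; i++) {
		if (handle(fds[i], ctx)) {
			return true;
		}
	}
	return false;
}

/* Demo context: pretend that handling dead_fd removes a stream. */
struct demo_ctx {
	int dead_fd;
	size_t handled;
};

static bool demo_handle(int fd, void *ctx)
{
	struct demo_ctx *demo = ctx;

	demo->handled++;
	return fd == demo->dead_fd;
}
```

A caller would loop on its wait call and restart the wait whenever process_ready_fds() returns true.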
David Goulet [Wed, 9 Jan 2013 22:06:38 +0000 (17:06 -0500)] 
Fix: lttng create session memleaks

The uri_parse() function call was leaking copies of the lttng_uri
structure.

Fixes #420

Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Wed, 9 Jan 2013 15:14:15 +0000 (10:14 -0500)] 
Fix: remove unused session id map

Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Wed, 9 Jan 2013 15:03:38 +0000 (10:03 -0500)] 
Fix: wrong session id used on relayd lookup

The relayd session id might not be unique with multiple relayd so the
lookup could choose the wrong relayd for the given sessiond session id.

Fixes #419

Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Mon, 7 Jan 2013 19:37:16 +0000 (14:37 -0500)] 
Fix: add missing UST abi header for make dist

Reported-by: Samuel Martin <smartin@aldebaran-robotics.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Mon, 7 Jan 2013 18:45:29 +0000 (13:45 -0500)] 
Fix: add missing rcu read side lock/unlock

Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Thu, 20 Dec 2012 18:53:18 +0000 (13:53 -0500)] 
Update version to v2.1.0

Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Thu, 20 Dec 2012 19:13:07 +0000 (14:13 -0500)] 
Fix: lttng create URI parsing and check

Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Thu, 20 Dec 2012 18:06:50 +0000 (13:06 -0500)] 
Fix: missing scripts for make dist

Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Thu, 20 Dec 2012 17:17:11 +0000 (12:17 -0500)] 
Add disable-event to man page and clarify enable-event

Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Thu, 20 Dec 2012 15:51:41 +0000 (10:51 -0500)] 
Fix: update to latest UST abi

Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Thu, 20 Dec 2012 01:56:04 +0000 (20:56 -0500)] 
Fix: bad check of accept() return value

Also fix a missing ret = -1 assignment. Although a positive ret value
that does not match the structure size is unlikely, better safe than
sorry.

Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Thu, 20 Dec 2012 01:37:14 +0000 (20:37 -0500)] 
Fix: missing mutex lock if relayd was not created

Also add missing ret = -1 assignment on error in error path when adding
a relayd socket in the consumer.

Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Thu, 20 Dec 2012 01:21:54 +0000 (20:21 -0500)] 
Fix: return error if sendmsg fails on relayd

Also, remove a FIXME that was referring to something that disappeared
(data_size).

Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Thu, 20 Dec 2012 00:58:30 +0000 (19:58 -0500)] 
Fix: variable usage for data pending and add comments

Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Wed, 19 Dec 2012 23:49:37 +0000 (18:49 -0500)] 
Fix: print ret value on ust_app start/stop error

Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Wed, 19 Dec 2012 23:30:37 +0000 (18:30 -0500)] 
Fix: compare write() return value to size

Now also check if the ret value of a write() operation is not equal to
the given size.

Signed-off-by: David Goulet <dgoulet@efficios.com>
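Illustration for the commit above (not part of the patch): a minimal sketch of checking the write() return value against the requested size, with a retry on EINTR. write_exact() is a hypothetical helper name, and treating a short write as a plain failure is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 0 only if a single write() call wrote
 * exactly len bytes, -1 otherwise (error or short write).
 */
static int write_exact(int fd, const void *buf, size_t len)
{
	ssize_t ret;

	assert(buf != NULL);

	/* Restart the call if a signal interrupted it before any data moved. */
	do {
		ret = write(fd, buf, len);
	} while (ret < 0 && errno == EINTR);

	if (ret < 0) {
		/* errno describes the failure. */
		return -1;
	}
	if ((size_t) ret != len) {
		/* Short write: not an errno case, but not a success either. */
		return -1;
	}
	return 0;
}
```

Checking only for a negative value would let a short write pass as a success, which is the case the commit closes.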
David Goulet [Wed, 19 Dec 2012 23:25:49 +0000 (18:25 -0500)] 
Fix: handle orderly shutdown from transport layer

Print a debug statement if a shutdown is detected or else an error. The
transport layer will print the perror in case of an error.

Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Wed, 19 Dec 2012 23:11:12 +0000 (18:11 -0500)] 
Fix: change perror to debug statement

Most of the changes here remove a double PERROR, since the transport
layer already prints one. The debug message now indicates where
the transport error was.

Also, don't print an error if the relayd is not found. This is possible
if the relayd dies so an error here is useless to the common user but
useful as a debug statement.

Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Wed, 19 Dec 2012 22:54:25 +0000 (17:54 -0500)] 
Fix: don't print EPIPE error which can happen

Anytime a relayd is killed, writing on a closed fd is totally possible
so the PERROR of an EPIPE error is useless as an error but we do print
it as a dbg message now.

Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Wed, 19 Dec 2012 22:51:25 +0000 (17:51 -0500)] 
Fix: handle shutdown on recv reply in relayd

Print a meaningful error when the recvmsg for the reply gets an orderly
shutdown or an error.

Return a negative value each time since this means that we have to stop
everything for that socket and clean up.

Signed-off-by: David Goulet <dgoulet@efficios.com>
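Illustration for the commit above (not part of the patch): a minimal sketch that tells an orderly shutdown (return value 0) apart from an error (negative return value) and reports a negative value in both cases. It uses recv() instead of recvmsg() for brevity; recv_reply() and the message texts are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns the number of bytes received (> 0), or
 * -1 when the peer shut the connection down or an error occurred. In
 * both failure cases the caller should stop using the socket.
 */
static ssize_t recv_reply(int sock, void *buf, size_t len)
{
	ssize_t ret;

	/* With len == 0, a return value of 0 would not mean "shutdown". */
	assert(buf && len > 0);

	do {
		ret = recv(sock, buf, len, 0);
	} while (ret < 0 && errno == EINTR);

	if (ret == 0) {
		fprintf(stderr, "reply: peer performed an orderly shutdown\n");
		return -1;
	}
	if (ret < 0) {
		fprintf(stderr, "reply: recv failed: %s\n", strerror(errno));
		return -1;
	}
	return ret;
}
```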
David Goulet [Wed, 19 Dec 2012 20:36:59 +0000 (15:36 -0500)] 
Fix: Off by one in seq num for data pending command

Like the close stream command, the data pending command needs to use the
stream's next sequence number minus 1, or else the check on the relayd is
off by one: 4 data packets, for instance, mean a prev_seq value of 4 but
a last_next_seq_num of 5.

Furthermore, the check was actually wrong on the relayd side. Having a
previous sequence number lower than the last one seen does NOT mean that
the data is not pending, so the check needed was actually "equal or
greater".

Signed-off-by: David Goulet <dgoulet@efficios.com>
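Illustration for the commit above (not part of the patch): the arithmetic of the off-by-one, written as two small functions. The function names are hypothetical and the comparison is a plain reading of the commit message ("equal or greater" means not pending); the real relayd code may handle more cases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Consumer side: the value to send with the data pending command is the
 * last sequence number actually used, i.e. the next one minus 1.
 * Assumes at least one packet was sent.
 */
static uint64_t last_sent_seq_num(uint64_t next_net_seq_num)
{
	assert(next_net_seq_num > 0);
	return next_net_seq_num - 1;
}

/*
 * Relayd side: data is NOT pending once the last sequence number seen
 * (prev_seq) is equal to or greater than the one the consumer reports.
 */
static bool relayd_data_pending(uint64_t prev_seq, uint64_t last_net_seq_num)
{
	return prev_seq < last_net_seq_num;
}
```

With the example from the commit (prev_seq of 4, next sequence number of 5), sending 5 would report data as pending forever, while sending 5 - 1 = 4 reports that nothing is pending.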
David Goulet [Wed, 19 Dec 2012 19:13:24 +0000 (14:13 -0500)] 
Fix: wrong check on session started on stop command

This is problematic for applications that live longer than the tracing
session. Unfortunately, make check did not catch this problem since
we either kill the applications before the stop or wait for them to die.

I will quote a colleague of mine on IRC after discovering this:
14:14 < cbab> moar tests!

:)

Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Tue, 18 Dec 2012 21:50:59 +0000 (16:50 -0500)] 
Fix: for librelayd, fix negative reply ret code

Negating a uint32_t does not yield a negative value, so set ret to -1 and
print the actual host byte order ret code as an error.

Signed-off-by: David Goulet <dgoulet@efficios.com>
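Illustration for the commit above (not part of the patch): a minimal sketch of why the reply code cannot simply be negated. check_reply_code() and its ok_code parameter are hypothetical, and ntohl() stands in for whatever byte order conversion the real code uses.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical check of a reply code received in network byte order.
 * Returns 0 if the code matches ok_code, -1 otherwise.
 */
static int check_reply_code(uint32_t ret_code_be, uint32_t ok_code)
{
	uint32_t code = ntohl(ret_code_be);

	if (code != ok_code) {
		/*
		 * "-code" would still be an unsigned value (it wraps around),
		 * so report a fixed -1 and log the host byte order code.
		 */
		fprintf(stderr, "relayd replied with error %" PRIu32 "\n", code);
		return -1;
	}
	return 0;
}
```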
Christian Babeux [Tue, 18 Dec 2012 21:31:18 +0000 (16:31 -0500)] 
run-report: Add filtering, health and streaming tests

Signed-off-by: Christian Babeux <christian.babeux@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
Christian Babeux [Tue, 18 Dec 2012 21:31:17 +0000 (16:31 -0500)] 
run-report: Allow tests to spawn and control their own sessiond

The run-report script can spawn a sessiond if the 'daemon' key value is
set to 'True' in the test description dictionary. If the 'daemon' key is
set to 'False', the TEST_NO_SESSIOND environment variable is set so no
sessiond can be spawned in the tests. This variable is also set when the
run-report spawns its own sessiond.

This behavior has the unfortunate side-effect of restricting any kind of
spawning and control of the sessiond via the tests.

Fix this issue by allowing the tests to spawn their own sessiond. We
need to pass an additional env dictionary to the TestWorker in order to
spawn the test with the proper environment variables set.

To indicate that a test will spawn and manage its own sessiond, the
'daemon' key value should be set to the "test" string.

Signed-off-by: Christian Babeux <christian.babeux@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
Christian Babeux [Tue, 18 Dec 2012 21:31:16 +0000 (16:31 -0500)] 
run-report: Fix CPU usage stats computation

The CPU usage statistics are computed by grepping the top command
output. The top output format has since changed, so the CPU usage
statistics were not properly computed.

Fix this by adjusting to the new top command output format.

Signed-off-by: Christian Babeux <christian.babeux@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
Christian Babeux [Tue, 18 Dec 2012 21:31:15 +0000 (16:31 -0500)] 
run-report: Restore SIGPIPE default handler in subprocess calls

Python overrides the SIGPIPE default handler because it prefers to check
every write and raise an IOError exception rather than taking SIGPIPE
[1].

This behavior has the unfortunate side-effect of polluting stdout with
broken pipe messages on shell pipeline invocations (e.g. echo foo |
grep something | etc.) in shell scripts spawned via subprocess.Popen().

This commit fixes the polluting of stdout by restoring the default SIGPIPE
handler on subprocess calls.

[1] - http://bugs.python.org/issue1652

Signed-off-by: Christian Babeux <christian.babeux@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
Christian Babeux [Tue, 18 Dec 2012 21:31:14 +0000 (16:31 -0500)] 
run-report: Use libtool wrapper to spawn the sessiond for tests

The run-report script was using the sessiond binary generated via
libtool under the ".libs/" folder. When using this binary, the consumerd
used when starting the sessiond is the one installed system-wide (if
any). This could lead to test failures if no consumer is installed in
the system or if a version mismatch occurs.

This commit fixes this by using the consumerd that was built with libtool
in the local source tree.

Signed-off-by: Christian Babeux <christian.babeux@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Tue, 18 Dec 2012 21:19:34 +0000 (16:19 -0500)] 
Fix: sessiond write() to handle EINTR

Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Tue, 18 Dec 2012 21:04:19 +0000 (16:04 -0500)] 
Fix: change ERR/PERROR statement to DBG

Most of the explanation is added as comments in the code.

Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Tue, 18 Dec 2012 20:38:25 +0000 (15:38 -0500)] 
Fix: DBG statement in relayd

Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Tue, 18 Dec 2012 20:30:25 +0000 (15:30 -0500)] 
Fix: handle EINTR for every read()

Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Tue, 18 Dec 2012 20:21:33 +0000 (15:21 -0500)] 
Fix: handle consumer data pipe read error

Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Tue, 18 Dec 2012 20:18:27 +0000 (15:18 -0500)] 
Fix: don't print usage when listing fails

Fixes #414

Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Tue, 18 Dec 2012 19:50:51 +0000 (14:50 -0500)] 
Fix: possible invalid free in kernel thread

Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Tue, 18 Dec 2012 19:02:14 +0000 (14:02 -0500)] 
Fix: flag metadata stream on quiescent control cmd

For the relayd, when doing a quiescent control command, we have to flag
the corresponding metadata stream, or else it will simply stay alive
until a close stream, and the end data pending command will always
return that data is inflight.

Add a stream id to the relayd command so the relayd can identify which
stream to flag.

Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Tue, 18 Dec 2012 00:04:13 +0000 (19:04 -0500)] 
Fix: prioritize control socket communication in relayd

Add the LTTNG_POLL_GET_PREV_FD for the relayd listener thread that needs
to access the previous valid fd during a poll loop.

Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Mon, 17 Dec 2012 20:46:28 +0000 (15:46 -0500)] 
Fix: poll and epoll fd set reallocation

Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
Mathieu Desnoyers [Mon, 17 Dec 2012 23:32:27 +0000 (18:32 -0500)] 
Fix: cppcheck linter cleanups

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Tue, 18 Dec 2012 17:09:09 +0000 (12:09 -0500)] 
Fix: add missing goto pending if data is inflight

There was only a detection for data NOT inflight. For data inflight,
if a relayd was found, the code was simply exiting the loop and returning
no data pending.

Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Tue, 18 Dec 2012 17:05:24 +0000 (12:05 -0500)] 
Fix: remove ua_sess->started assert on stop trace

It's totally possible that a start failed for a specific app while the
started flag is set for the global session, making a stop trace possible
on a session that failed to start.

The assert is no longer valid since this code flow is possible.

Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Tue, 18 Dec 2012 13:59:07 +0000 (08:59 -0500)] 
Fix: remove bash quote when starting relayd in tests

Signed-off-by: David Goulet <dgoulet@efficios.com>
Julien Desfossez [Mon, 17 Dec 2012 17:13:38 +0000 (12:13 -0500)] 
Set classes of traffic in high_throughput_limits

This patch creates 2 classes for the bandwidth limited test instead of
one. The intent is to have multiple queues in the kernel instead of just
one. That way we can prioritize the control port over the data port and
make sure it gets its share of the bandwidth.

With this update, the control port gets 1/10th of the limit and the data
port gets the remaining 9/10th. If unused, the data connection can borrow
the remaining bandwidth.

Signed-off-by: Julien Desfossez <jdesfossez@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Mon, 17 Dec 2012 17:37:42 +0000 (12:37 -0500)] 
Fix: use the poll wait ret value when iterating on fd(s)

Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Mon, 17 Dec 2012 17:19:56 +0000 (12:19 -0500)] 
Fix: force the poll() return value to be nb_fd

With poll(), we have to iterate over all fd in the pollset since it is
handled in user space, whereas we don't have to with epoll.

This is a first patch to fix the fact that we should iterate over the
number of fd the lttng_poll_wait() call returns which is for epoll the
number of returned events and with poll the whole set of fd.

Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
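Illustration for the commit above (not part of the patch): a minimal sketch of a poll(2) wrapper that reports the whole set size instead of the raw poll() return value. poll_wait_nb_fd() is a hypothetical name, not the real lttng_poll_wait() implementation.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical wrapper for the poll(2) backend. Returns the number of
 * pollfd entries the caller has to iterate over, 0 on timeout, or -1 on
 * error.
 */
static int poll_wait_nb_fd(struct pollfd *fds, nfds_t nb_fd, int timeout_ms)
{
	int ret;

	assert(fds != NULL);

	do {
		/* The timeout restarts on EINTR, which is fine for a sketch. */
		ret = poll(fds, nb_fd, timeout_ms);
	} while (ret < 0 && errno == EINTR);

	if (ret <= 0) {
		return ret;
	}

	/*
	 * poll() returned how many entries have events, but those entries
	 * can sit anywhere in the array, so report the whole set size.
	 */
	return (int) nb_fd;
}
```

If only the last of two fds is readable, poll() itself returns 1, and a loop bounded by that value would inspect the first entry only and miss the ready one.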
David Goulet [Mon, 17 Dec 2012 16:30:24 +0000 (11:30 -0500)] 
Fix: add missing pollset reset in relayd listener thread

Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Fri, 14 Dec 2012 20:11:49 +0000 (15:11 -0500)] 
Fix: Wrong check of node when cleaning up ht

The node should NOT be in the hash table to ignore the deletion and not
the contrary.

Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Fri, 14 Dec 2012 15:28:31 +0000 (10:28 -0500)] 
Revert adding LTTNG_PACKED in lttng.h

Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Fri, 14 Dec 2012 14:47:21 +0000 (09:47 -0500)] 
Fix: cleanup high_throughput_limits test

Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Fri, 14 Dec 2012 01:40:53 +0000 (20:40 -0500)] 
Fix: set started flag of ust app after ustctl

Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Fri, 14 Dec 2012 01:30:50 +0000 (20:30 -0500)] 
Fix: memory leak in add relayd socket error path

Signed-off-by: David Goulet <dgoulet@efficios.com>
Julien Desfossez [Fri, 14 Dec 2012 01:01:52 +0000 (20:01 -0500)] 
Move relay commands out of lttcomm_sessiond_command

Introduce a new enum for relayd commands: lttcomm_relayd_command. This
will make further additions to either enum cleaner.

Signed-off-by: Julien Desfossez <jdesfossez@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
Christian Babeux [Thu, 13 Dec 2012 23:39:13 +0000 (18:39 -0500)] 
Tests: Add health check testpoint fail test

This test triggers a failure in a specified thread by using the testpoint
mechanism. The testpoint behavior is implemented in health_fail.c. The
testpoint code simply returns 1 (non-zero values are considered as errors
for testpoints) to trigger the specific thread error handling mechanism.

This test ensures that we can detect health failures for each thread's
error handling path.

Signed-off-by: Christian Babeux <christian.babeux@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
Christian Babeux [Thu, 13 Dec 2012 23:38:56 +0000 (18:38 -0500)] 
Add return code to the testpoint mechanism

The testpoint processing could fail and currently there is no mechanism
to notify the caller of such failures. This patch adds an int return
code to the testpoint prototype. A non-zero return code indicates failure.

When using the testpoint mechanism, the caller should properly handle
testpoint failure cases and trigger the appropriate response (error
handling, thread teardown, etc.).

Signed-off-by: Christian Babeux <christian.babeux@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
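Illustration for the commit above (not part of the patch): a minimal sketch of a caller reacting to a testpoint return code. It does not reproduce the real testpoint macros; testpoint_fn, thread_step() and the two sample testpoints are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical testpoint signature: 0 on success, non-zero on failure. */
typedef int (*testpoint_fn)(void);

/*
 * One step of a thread that runs a testpoint before doing its work.
 * Returns 0 to continue, -1 when the thread must take its error path
 * (error handling, teardown, etc.).
 */
static int thread_step(testpoint_fn testpoint)
{
	assert(testpoint != NULL);

	if (testpoint() != 0) {
		return -1;
	}

	/* Normal work of the thread would go here. */
	return 0;
}

/* A testpoint that succeeds and one that simulates a failure. */
static int testpoint_ok(void)
{
	return 0;
}

static int testpoint_fail(void)
{
	return 1;
}
```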
David Goulet [Thu, 13 Dec 2012 23:27:23 +0000 (18:27 -0500)] 
Fix: put back the high-throughput test removed by mistake

Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Thu, 13 Dec 2012 23:15:56 +0000 (18:15 -0500)] 
Fix: Bad error handling when enable channel fails

Fixes #403

Signed-off-by: David Goulet <dgoulet@efficios.com>
Christian Babeux [Mon, 10 Dec 2012 19:46:15 +0000 (14:46 -0500)] 
Tests: Fix sleep interruption in health stall test

The sleep(3) call can return the number of seconds left to sleep if
interrupted. Handle the interruption in the health stall test.

Signed-off-by: Christian Babeux <christian.babeux@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
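Illustration for the commit above (not part of the patch): a minimal sketch of restarting sleep(3) with the remaining time when a signal interrupts it. sleep_fully() is a hypothetical helper; returning the number of restarts is only there to make the behavior observable.

```c
#include <assert.h>
#include <unistd.h>

/*
 * Sleep for the full duration even if signals interrupt the call.
 * Returns how many times sleep() had to be restarted.
 */
static unsigned int sleep_fully(unsigned int seconds)
{
	unsigned int restarts = 0;

	while (seconds > 0) {
		/* sleep() returns 0 when done, else the seconds left. */
		unsigned int left = sleep(seconds);

		assert(left <= seconds);
		if (left > 0) {
			restarts++;
		}
		seconds = left;
	}
	return restarts;
}
```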
David Goulet [Thu, 13 Dec 2012 22:51:45 +0000 (17:51 -0500)] 
Fix: RCU unlock out of error path

On channel error, RCU was not unlocking the read side. Furthermore,
remove a check for a NULL session that was also not going through an RCU
unlock. Change it to an assert.

This also adds a channel subbuf size check when enabling a channel.

Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Thu, 13 Dec 2012 22:30:40 +0000 (17:30 -0500)] 
Fix: update file listing for licensing

Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Thu, 13 Dec 2012 21:55:08 +0000 (16:55 -0500)] 
Fix: missing health exit in registration app thread

Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Thu, 13 Dec 2012 21:41:57 +0000 (16:41 -0500)] 
Fix: add packed attribute to filter structure

Also fix the internal UST abi by swapping two variables and fit the
upstream UST abi.

Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Thu, 13 Dec 2012 21:35:44 +0000 (16:35 -0500)] 
Fix: Add missing health code update for consumer command

Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Thu, 13 Dec 2012 20:25:03 +0000 (15:25 -0500)] 
Fix: packed every sessiond-comm.h structure pass over sockets

Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Thu, 13 Dec 2012 20:15:10 +0000 (15:15 -0500)] 
Add LTTNG_PACKED macro

This adds the macro and sets it on all lttng.h structures. Also, replace
the already packed relayd structure with the macro.

Signed-off-by: David Goulet <dgoulet@efficios.com>
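Illustration for the commit above (not part of the patch): a minimal sketch of what a packed attribute changes for a structure sent over a socket. The macro definition shown here is an assumption (GCC/Clang attribute syntax), since the commit message does not show it, and both structures are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed definition; the commit message does not show the real one. */
#define LTTNG_PACKED __attribute__((__packed__))

/* Without the attribute, the compiler may insert padding after cmd. */
struct padded_msg {
	uint8_t cmd;
	uint32_t stream_id;
};

/* With the attribute, the layout is exactly 1 + 4 bytes. */
struct packed_msg {
	uint8_t cmd;
	uint32_t stream_id;
} LTTNG_PACKED;

/* Compile-time check that no padding was added (C11 static_assert). */
static_assert(sizeof(struct packed_msg) == 5, "packed_msg must not be padded");
```

A fixed layout matters when both ends of a socket exchange the raw bytes of a structure, because padding may differ between compilers and ABIs.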
David Goulet [Thu, 13 Dec 2012 18:58:31 +0000 (13:58 -0500)] 
Fix: clear the fixme in high_throughput_limits

Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Thu, 13 Dec 2012 01:16:33 +0000 (20:16 -0500)] 
Fix data pending for inflight streaming

The consumer_data_pending() function call had a bad label naming. The
goto label data_not_pending was actually going to the return value of
pending data (1). So, this patch fixes that by renaming the label to the
right meaning.

Add a missing destroy of the relayd session id mapping hash table.

Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Wed, 12 Dec 2012 22:39:06 +0000 (17:39 -0500)] 
Map session id of relayd and sessiond in consumer

Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Wed, 12 Dec 2012 22:05:45 +0000 (17:05 -0500)] 
Add the relayd create session command

This is needed in order to fix a specific condition of the data pending
check where we need to have streams associated with a session. This
command will also be used for new features in the future.

Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Wed, 12 Dec 2012 16:23:20 +0000 (11:23 -0500)] 
Make the consumer sends a ACK after each command

This is needed to avoid buffer bloating when throttling communication
between the consumer and the relayd. Considering a very low bandwidth
limit between the relayd and consumerd, the session daemon would send a
high rate of commands to the consumer without the unix socket queue
ever being emptied. This makes the UNIX socket reach buffer-full
conditions, which are prone to trigger corner-case behaviors in blocking
send/recv with MSG_WAITALL, and this is likely the cause of the hang
experienced when limiting relayd bandwidth.

Adding an ACK to each command makes sure that we acknowledge to the
session daemon that we, the consumer, have emptied the unix socket buffer.

NOTE: In consumer_add_relayd_socket(), there might be a problem with the
error path and message status to the sessiond. A subsequent patch might
fix a possible issue but for now it is not at all critical since any
critical error on the consumer side will notify the sessiond through the
error socket.

Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Wed, 12 Dec 2012 18:39:37 +0000 (13:39 -0500)] 
Remove MSG_WAITALL on every recvmsg() socket type

In order to handle messages that are possibly larger than the socket
buffer size set by wmem_max and rmem_max /proc files, ensure that the
recv-side reads the data chunk-wise rather than hanging on a
MSG_WAITALL.

In addition to fixing this issue, chances are that it will also help
fixing hangs detected due to UNIX socket buffers filling up. The
MSG_WAITALL behavior in such situations might be unexpected.

Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
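Illustration for the commit above (not part of the patch): a minimal sketch of reading a message chunk-wise with plain recv() calls instead of a single call with MSG_WAITALL. recv_all() is a hypothetical helper name and uses recv() rather than recvmsg() for brevity.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Receive up to len bytes, looping over partial reads. Returns the
 * number of bytes received (less than len if the peer shut down
 * early), or -1 on error.
 */
static ssize_t recv_all(int sock, void *buf, size_t len)
{
	char *pos = buf;
	size_t received = 0;

	assert(buf != NULL);

	while (received < len) {
		ssize_t ret = recv(sock, pos + received, len - received, 0);

		if (ret < 0) {
			if (errno == EINTR) {
				continue;	/* Interrupted: retry. */
			}
			return -1;
		}
		if (ret == 0) {
			break;		/* Orderly shutdown from the peer. */
		}
		received += (size_t) ret;
	}
	return (ssize_t) received;
}
```

Each recv() call takes whatever chunk is available, so the total message size is not tied to what fits in the socket buffer at once.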
David Goulet [Mon, 10 Dec 2012 23:24:42 +0000 (18:24 -0500)] 
Fix: overlap bash escaping for wildcard event name

Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Mon, 10 Dec 2012 22:18:23 +0000 (17:18 -0500)] 
Fix: Wrong path in the overlap test

Also, activate the overlap.sh tests by default in the make check.

Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Mon, 10 Dec 2012 21:27:55 +0000 (16:27 -0500)] 
Fix: Add missing relayd ht cleanup and ht destroy

Add a function to cleanup every element of the relayd ht and free them
in a call_rcu.

Also, destroy the stream_list_ht on cleanup.

Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Mon, 10 Dec 2012 21:11:15 +0000 (16:11 -0500)] 
Fix: Allocate stream hash table in respective threads

Allocation and destroy are now in the same thread for both metadata and
data hash table.

Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Mon, 10 Dec 2012 21:03:58 +0000 (16:03 -0500)] 
Fix: Use stream deletion function when cleaning up

In theory, once the destroy stream ht function is called with the hash
table, it should be empty. However, for some fatal errors, it might not
be, so it's imperative that we gracefully delete the stream and free it
using an RCU call so both hash tables (stream and the one for the
pending command) are synchronized.

Simply freeing the stream could have created possible fd leaks and
invalid nodes in the data pending hash table.

Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Mon, 10 Dec 2012 18:45:45 +0000 (13:45 -0500)] 
Fix: Missing umask when using run as no clone

Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Mon, 10 Dec 2012 17:16:15 +0000 (12:16 -0500)] 
Fix: Relayd and sessiond version check

Now only checks for the major version to be equal. After the 2.1 stable
release, both components will adapt to the lowest minor version for the
same major version. For this, the session daemon now sends its version
values to the relayd, so there is a slight change in the protocol here.

For instance, with a relayd 2.4 talking to a sessiond 2.8, the
communication and available features will only be those of version 2.4.

For a relayd at, let's say, 3.2 and a sessiond at 2.2, the communication
stops right there since the major versions differ.

Acked-by: Julien Desfossez <julien.desfossez@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
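Illustration for the commit above (not part of the patch): a minimal sketch of the rule "same major version required, lowest minor version wins". struct proto_version and negotiate_minor() are hypothetical names, not the real protocol structures.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical version pair exchanged between sessiond and relayd. */
struct proto_version {
	uint32_t major_nr;
	uint32_t minor_nr;
};

/*
 * Returns 0 and stores the minor version both sides can use, or -1 if
 * the major versions differ (in which case *minor is left untouched).
 */
static int negotiate_minor(const struct proto_version *relayd,
		const struct proto_version *sessiond, uint32_t *minor)
{
	assert(relayd && sessiond && minor);

	if (relayd->major_nr != sessiond->major_nr) {
		return -1;
	}

	/* Both sides fall back to the lowest minor version. */
	*minor = relayd->minor_nr < sessiond->minor_nr ?
			relayd->minor_nr : sessiond->minor_nr;
	return 0;
}
```

With the examples from the commit, a relayd 2.4 and a sessiond 2.8 settle on minor version 4, while a relayd 3.2 and a sessiond 2.2 are rejected.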
11 years agoFix: FD leak on consumer add relayd socket error
David Goulet [Mon, 10 Dec 2012 16:38:35 +0000 (11:38 -0500)] 
Fix: FD leak on consumer add relayd socket error

Signed-off-by: David Goulet <dgoulet@efficios.com>
11 years agoFix: Consumer sockets leak on error
David Goulet [Mon, 10 Dec 2012 16:20:30 +0000 (11:20 -0500)] 
Fix: Consumer sockets leak on error

Signed-off-by: David Goulet <dgoulet@efficios.com>
11 years agoFix: Use endpoint status enum value in checks
David Goulet [Fri, 7 Dec 2012 21:03:04 +0000 (16:03 -0500)] 
Fix: Use endpoint status enum value in checks

Signed-off-by: David Goulet <dgoulet@efficios.com>
11 years agoFix: protect consumer_find_channel with rcu locking
David Goulet [Fri, 7 Dec 2012 21:00:48 +0000 (16:00 -0500)] 
Fix: protect consumer_find_channel with rcu locking

Signed-off-by: David Goulet <dgoulet@efficios.com>
11 years agoFix: Rename ust_app_destroy_trace and set it static
David Goulet [Fri, 7 Dec 2012 20:54:19 +0000 (15:54 -0500)] 
Fix: Rename ust_app_destroy_trace and set it static

Signed-off-by: David Goulet <dgoulet@efficios.com>
11 years agoFix: UST app session teardown process
David Goulet [Fri, 7 Dec 2012 18:54:44 +0000 (13:54 -0500)] 
Fix: UST app session teardown process

This patch removes the ht_del of sessions from the delete_ust_app RCU
call and puts it in the unregister app function just before the call_rcu
is done.

To be able to free the sessions in the RCU call, a list is added. When
tearing down an application or session, this list is used to get the
session references for deletion.

Note that when in the RCU call, we are assured that the list is
accessed exclusively, so no locking is needed.

Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
11 years agoFix: check ht_del ret value of ust app session
David Goulet [Fri, 7 Dec 2012 17:05:24 +0000 (12:05 -0500)] 
Fix: check ht_del ret value of ust app session

A UST app session can be destroyed by two execution paths: either the app
unregisters or a destroy session is triggered. So, allowing a ht_del to
fail means that the session is already scheduled for teardown in an RCU
call.

Furthermore, this means that a lookup of a UST app session that finds
nothing is now valid, since it means the session is in the teardown
process.

Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
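
The return-value contract described above can be modelled with a small single-threaded sketch. Everything here is hypothetical (`toy_session`, `toy_ht_del()`, `toy_try_teardown()`); the real code relies on its concurrent hash table to arbitrate between the two paths, which this toy does not attempt to reproduce.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a UST app session. */
struct toy_session {
	bool in_table;      /* true while the session is still in the table */
	int teardown_count; /* number of times teardown was scheduled */
};

/* Mimics a hash table delete: 0 on success, -1 if already removed. */
static int toy_ht_del(struct toy_session *s)
{
	assert(s != NULL);

	if (!s->in_table) {
		return -1;
	}
	s->in_table = false;
	return 0;
}

/*
 * Called by both paths (app unregister and session destroy). Only the
 * path whose delete succeeds schedules the teardown; the other one
 * treats the failure as "teardown already pending" and does nothing.
 */
static bool toy_try_teardown(struct toy_session *s)
{
	if (toy_ht_del(s) != 0) {
		return false;
	}
	s->teardown_count++;
	return true;
}
```

Whichever path runs second sees the failed delete and backs off, so the teardown is scheduled exactly once.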
11 years agoFix: locking order between consumer and stream
David Goulet [Tue, 4 Dec 2012 23:10:45 +0000 (18:10 -0500)] 
Fix: locking order between consumer and stream

Also, lock the stream BEFORE calling the read subbuffer so as not to race
with the data pending command.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
11 years agoFix: don't steal key when adding a metadata stream
David Goulet [Tue, 4 Dec 2012 23:17:55 +0000 (18:17 -0500)] 
Fix: don't steal key when adding a metadata stream

This was corrupting the node key when the stream->key of the metadata
matched a stream wait_fd, making the stream impossible to find and
triggering an assert when getting out of the metadata poll wait.

Now we look up the stream before adding it to make sure it's unique and
no longer try to steal the key, since wait_fd is unique to the
consumer.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
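
The "look up before adding" approach can be illustrated with a toy table. This is a sketch only: `toy_table`, `toy_lookup()` and `toy_add_unique()` are hypothetical names, and a flat array stands in for the consumer's real hash table.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_STREAMS 8

/* Hypothetical stream with a key that must be unique in the table. */
struct toy_stream {
	int key;
};

struct toy_table {
	struct toy_stream *slots[TOY_MAX_STREAMS];
	size_t count;
};

/* Returns the stream registered under key, or NULL if there is none. */
static struct toy_stream *toy_lookup(const struct toy_table *t, int key)
{
	for (size_t i = 0; i < t->count; i++) {
		if (t->slots[i]->key == key) {
			return t->slots[i];
		}
	}
	return NULL;
}

/*
 * Adds s only if its key is not already present. Returns 0 on success
 * and -1 on a duplicate key or a full table. An existing entry is never
 * replaced, which is the opposite of "stealing" its key.
 */
static int toy_add_unique(struct toy_table *t, struct toy_stream *s)
{
	assert(t != NULL && s != NULL);

	if (toy_lookup(t, s->key) != NULL) {
		return -1;
	}
	if (t->count == TOY_MAX_STREAMS) {
		return -1;
	}
	t->slots[t->count++] = s;
	return 0;
}
```

Refusing the duplicate leaves the first stream findable under its key, whereas overwriting it would silently orphan an entry that other code still expects to find.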
11 years agoConsumer hold mutex for add stream
Mathieu Desnoyers [Thu, 6 Dec 2012 14:20:11 +0000 (09:20 -0500)] 
Consumer hold mutex for add stream

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: David Goulet <dgoulet@ev0ke.net>
11 years agoFix: audit all close/fclose and check returned code
David Goulet [Mon, 3 Dec 2012 21:57:57 +0000 (16:57 -0500)] 
Fix: audit all close/fclose and check returned code

Signed-off-by: David Goulet <dgoulet@efficios.com>
11 years agoFix: update/clean lttng.h comments
David Goulet [Mon, 3 Dec 2012 21:43:43 +0000 (16:43 -0500)] 
Fix: update/clean lttng.h comments

Signed-off-by: David Goulet <dgoulet@efficios.com>
11 years agoFix: install lttng health check man page
David Goulet [Mon, 3 Dec 2012 21:14:31 +0000 (16:14 -0500)] 
Fix: install lttng health check man page

Signed-off-by: David Goulet <dgoulet@efficios.com>
11 years agoFix: ship relevant documentations with tarball
David Goulet [Mon, 3 Dec 2012 21:07:45 +0000 (16:07 -0500)] 
Fix: ship relevant documentations with tarball

Signed-off-by: David Goulet <dgoulet@efficios.com>
11 years agoRemove useless AUTHORS and NEWS files
David Goulet [Mon, 3 Dec 2012 21:01:09 +0000 (16:01 -0500)] 
Remove useless AUTHORS and NEWS files

Authors are listed in each code file alongside the copyright statement.

AUTHORS is useless and out of date. NEWS contains nothing.

Signed-off-by: David Goulet <dgoulet@efficios.com>
11 years agoFix: update urcu version in README and configure.ac
David Goulet [Mon, 3 Dec 2012 21:00:45 +0000 (16:00 -0500)] 
Fix: update urcu version in README and configure.ac

Signed-off-by: David Goulet <dgoulet@efficios.com>
11 years agoUpdate version to v2.1.0-rc9 v2.1.0-rc9
David Goulet [Mon, 3 Dec 2012 20:08:48 +0000 (15:08 -0500)] 
Update version to v2.1.0-rc9

Signed-off-by: David Goulet <dgoulet@efficios.com>