Clang 3.3 with -O2 optimisations is especially picky about arithmetic
on NULL pointers: it treats such arithmetic as undefined behavior and
uses that to optimize out the NULL checks that follow. Fix the undefined
behavior by checking the pointer directly, instead of going back and
forth around NULL with pointer arithmetic.
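The pattern being removed typically converts a possibly-NULL member pointer to its container with container_of-style arithmetic, and only then tests the result against NULL. Below is a minimal sketch of the safe ordering. The `struct item` type and the local `container_of` macro are assumptions made for illustration, not the project's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Local container_of-style macro (assumed for illustration). */
#define container_of(ptr, type, member) \
	((type *) ((char *) (ptr) - offsetof(type, member)))

struct node {
	struct node *next;
};

/* Hypothetical structure embedding a list node at a non-zero offset. */
struct item {
	int value;
	struct node node;
};

/*
 * Problematic pattern (do not use): convert first, check afterwards.
 *   it = container_of(n, struct item, node);
 *   if (&it->node == NULL) ...
 * When n is NULL this is arithmetic on a NULL pointer (undefined
 * behavior), so the compiler may assume the check is always false
 * and drop it.
 *
 * Fixed pattern: test the original pointer before any arithmetic.
 */
static struct item *item_from_node(struct node *n)
{
	if (n == NULL) {
		return NULL;
	}
	return container_of(n, struct item, node);
}
```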
David Goulet [Thu, 28 Nov 2013 18:08:10 +0000 (13:08 -0500)]
Fix: don't fail on push metadata if no channel
The comments in the code explain it well but, in a nutshell, this is an
acceptable race between the creation of the metadata on the consumer
side and the metadata push from the session daemon for that channel.
The race resolves itself either when the consumer requests the metadata
or when the session is stopped; in both situations, the metadata is
pushed to the consumer.
Without this fix, the session daemon flags the registry's metadata as
"closed", which usually indicates that the consumer is not responding,
and the consumer thread in the session daemon then exits.
Acked-by: Julien Desfossez <julien.desfossez@efficios.com> Signed-off-by: David Goulet <dgoulet@efficios.com>
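The consumer-side behavior can be sketched as treating "channel not found" as a benign outcome of a metadata push rather than as an error. This is a minimal sketch. The channel table, `find_channel()`, `push_metadata()` and the status codes are all hypothetical and do not reflect the consumer's real API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical channel record and table (illustration only). */
struct channel {
	uint64_t key;
	size_t metadata_len;
};

struct channel_table {
	struct channel *channels;
	size_t count;
};

enum push_status {
	PUSH_OK,	/* metadata appended to the channel */
	PUSH_IGNORED,	/* channel not created yet: benign, not an error */
};

static struct channel *find_channel(struct channel_table *t, uint64_t key)
{
	for (size_t i = 0; i < t->count; i++) {
		if (t->channels[i].key == key) {
			return &t->channels[i];
		}
	}
	return NULL;
}

static enum push_status push_metadata(struct channel_table *t,
		uint64_t key, size_t len)
{
	struct channel *chan = find_channel(t, key);

	if (chan == NULL) {
		/*
		 * Acceptable race: the metadata will be pushed again when the
		 * consumer requests it or when the session is stopped.
		 */
		return PUSH_IGNORED;
	}
	chan->metadata_len += len;
	return PUSH_OK;
}
```

The caller would only flag the registry's metadata as "closed" on a genuine failure, never on `PUSH_IGNORED`.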
David Goulet [Wed, 6 Nov 2013 15:49:43 +0000 (10:49 -0500)]
Fix: application SIGBUS when starting in parallel with sessiond
There is a race between application startup and sessiond startup: in an
intermediate state where the shm has been created but not yet ftruncated,
applications that see a zero-sized shm can get a SIGBUS.
On the sessiond side, we need to ensure that the shared memory is writeable by
applications as long as its size is 0, which allows applications to perform
ftruncate and extend its size.
On the UST side, another commit needs to ensure that UST can read the shared
memory file descriptor with a read() system call before trying to access it
through a memory map (which triggers the SIGBUS if the access goes beyond the
file size).
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: David Goulet <dgoulet@efficios.com>
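The UST-side check can be sketched as probing the last byte of the region with pread(), a read()-family call, before mapping it. If the probe hits end-of-file, the file is still shorter than the mapping, and touching the mapping would raise SIGBUS. This is a hedged sketch: `shm_covers()` and `map_if_ready()` are hypothetical helpers. The commit only states that a read() is performed before the shm is accessed through the memory map.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Return true if the file behind fd is at least len bytes long, so that
 * accessing a len-byte shared mapping of it cannot fault with SIGBUS.
 * Reads one byte at offset len - 1; a return value of 0 means EOF.
 */
static bool shm_covers(int fd, size_t len)
{
	char c;
	ssize_t ret;

	assert(fd >= 0);
	if (len == 0) {
		return true;
	}
	do {
		ret = pread(fd, &c, 1, (off_t) (len - 1));
	} while (ret < 0 && errno == EINTR);
	return ret == 1;
}

/* Map len bytes of fd only once the file is known to be large enough. */
static void *map_if_ready(int fd, size_t len)
{
	void *p;

	if (!shm_covers(fd, len)) {
		return NULL;	/* size still 0 (or too small): retry later */
	}
	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return NULL;
	}
	return p;
}
```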
David Goulet [Mon, 4 Nov 2013 18:19:06 +0000 (13:19 -0500)]
Fix: arguments in the wrong order for fd-limit
This is related to a bug we've been seeing under a very high load of
applications registering at the same time, where the get/put counters got
out of sync, exhausting the fd pool quite rapidly even though there was
no fd leak.
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: David Goulet <dgoulet@efficios.com>
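In C, an enum argument and an unsigned count both convert implicitly to integer parameters, so swapped arguments compile silently. The sketch below uses a hypothetical accounting pair, `fd_get()`/`fd_put()`. It is not the real fd-limit API, but it shows how a single swapped call desynchronizes the counters even though no fd leaks.

```c
#include <assert.h>

/* Hypothetical fd categories and per-category counters (illustration). */
enum fd_type {
	FD_APPS = 0,
	FD_CONSUMER = 1,
	FD_NR_TYPES,
};

static unsigned int fd_count[FD_NR_TYPES];

static void fd_get(enum fd_type type, unsigned int nr)
{
	assert(type < FD_NR_TYPES);
	fd_count[type] += nr;
}

static void fd_put(enum fd_type type, unsigned int nr)
{
	assert(type < FD_NR_TYPES);
	fd_count[type] -= nr;
}
```

For example, `fd_put(1, FD_APPS)` compiles, but it really means `fd_put(FD_CONSUMER, 0)`. It releases nothing, so the FD_APPS count only ever grows and the pool appears exhausted.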
David Goulet [Fri, 11 Oct 2013 19:35:39 +0000 (15:35 -0400)]
Fix: backported fix uses msec vs sec in stable-2.2
In stable-2.2, setting the timeout expects seconds, not milliseconds as in
stable-2.3 and above. This fixes the previous commit, which was backported
from master.
Signed-off-by: David Goulet <dgoulet@efficios.com>
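The size of the unit mismatch is easy to quantify: a value meant as milliseconds but read as seconds is 1000 times too long. The minimal conversion helper below is hypothetical. The commit does not name the stable-2.2 setter, and rounding up is a choice made only in this sketch, so that a small non-zero timeout never becomes 0.

```c
#include <assert.h>

/*
 * Convert a millisecond timeout to whole seconds for an API that only
 * accepts seconds, rounding up so a non-zero timeout never becomes 0.
 */
static unsigned int msec_to_sec_ceil(unsigned int msec)
{
	return msec / 1000 + (msec % 1000 != 0);
}
```

Passing 5000 (meant as 5000 ms) straight to a seconds-based setter would instead request 5000 s, which is about 83 minutes.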
David Goulet [Thu, 29 Aug 2013 14:38:12 +0000 (10:38 -0400)]
Fix: remove bad check after epoll wait in consumer
The returned nb_fd value is the number of FDs ready for the requested I/O,
so an fd with revents set to 0 is expected, since not all fds are ready.
This makes the check irrelevant and a needless waste of resources.
Signed-off-by: David Goulet <dgoulet@efficios.com>
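The returned count only says how many descriptors are ready. The loop should therefore skip entries whose revents is 0 rather than treat them as errors. This is a minimal sketch that uses POSIX poll() on pipes for illustration. The consumer uses its own poll wrapper, and `handle_ready()` is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Wait on the given descriptors and return how many were handled.
 * Entries with revents == 0 are simply not ready: they are skipped,
 * not reported as errors.
 */
static int handle_ready(struct pollfd *fds, nfds_t nfds, int timeout_ms)
{
	int nb_fd = poll(fds, nfds, timeout_ms);
	int handled = 0;

	if (nb_fd <= 0) {
		return nb_fd;	/* timeout (0) or error (-1) */
	}
	for (nfds_t i = 0; i < nfds && handled < nb_fd; i++) {
		if (fds[i].revents == 0) {
			continue;	/* not ready: expected, not an error */
		}
		if (fds[i].revents & POLLIN) {
			char c;

			(void) read(fds[i].fd, &c, 1);	/* consume one byte */
		}
		handled++;
	}
	return handled;
}
```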
We should output at least one packet before a stream can be considered
readable. So far, for per-PID buffers, if an application exited at the
wrong moment before a stop waiting for data pending, empty streams could
be seen by a babeltrace executed after data pending incorrectly
returned false.
Fix it by considering a stream for which the consumerd has written 0
bytes to the output as having data pending.
This applies to 2.3-rc and stable-2.2.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: David Goulet <dgoulet@efficios.com>
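The rule can be sketched as an early check in the data pending test. The `struct stream` fields are hypothetical stand-ins for the consumer's per-stream state. A stream that has had 0 bytes written to its output reports data pending, so a stop that waits on data pending does not return while a reader could still see an empty stream.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-stream state (field names are illustrative). */
struct stream {
	uint64_t output_written;	/* bytes written to the trace output */
	bool has_unread_data;		/* data still in the ring buffer */
};

static bool stream_data_pending(const struct stream *s)
{
	/* Not a single packet written yet: the stream is not readable. */
	if (s->output_written == 0) {
		return true;
	}
	return s->has_unread_data;
}
```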
Fix: hash table growth (for small tables) should be limited (v2)
Buckets with many entries encountered in a hash table could cause it to
grow to a large size, beyond the scope for which this mechanism is
expected to play a role when node accounting is available. Indeed, when
the hash table grows to a larger size, split-counter node accounting is
expected to deal with resize/shrink rather than relying on a heuristic
based on the largest bucket size.
This fixes an issue where we saw hash tables sometimes reaching a 65k-entry
index (65536 * 8 = 524288 bytes) for a workload limited to adding
1000 entries and then removing all of them, in a loop (random
keys).
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: David Goulet <dgoulet@efficios.com>
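The policy can be sketched as a guard on the chain-length heuristic. Long buckets may trigger growth only while the table is small; once node accounting is in use beyond that size, resizing is left to the split counters. The constants and names here are assumptions, and this is not userspace-rcu's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative constants (assumed values). */
#define CHAIN_LEN_THRESHOLD	3	/* bucket length that triggers growth */
#define SMALL_TABLE_MAX		1024	/* buckets; above this, counters decide */

/*
 * Decide whether a long bucket chain should grow the table.
 * With node accounting, large tables rely on the split counters instead.
 */
static bool chain_len_should_grow(size_t nr_buckets, size_t chain_len,
		bool accounting)
{
	if (chain_len < CHAIN_LEN_THRESHOLD) {
		return false;
	}
	if (accounting && nr_buckets >= SMALL_TABLE_MAX) {
		return false;	/* leave resize/shrink to node accounting */
	}
	return true;
}
```

With 8-byte bucket pointers, an index of 65536 buckets occupies 65536 * 8 = 524288 bytes. That is the size reported above for a workload that never holds more than 1000 entries.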
David Goulet [Tue, 27 Aug 2013 19:23:18 +0000 (15:23 -0400)]
Fix: set the health delta tcp timeout aware
The health check subsystem now initializes the time delta using the TCP
timeout. It takes the maximum value between our default internal delta
and the TCP timeout fetched by the lttcomm inet subsystem.
Signed-off-by: David Goulet <dgoulet@efficios.com>
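The selection amounts to taking a maximum. In this minimal sketch, `DEFAULT_HEALTH_DELTA_SEC` and its value of 20 are assumptions, since the commit does not give the default. A negative TCP timeout stands for "unknown", for example when the /proc files could not be read.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical default; the real value is not given in the commit. */
#define DEFAULT_HEALTH_DELTA_SEC	20

/*
 * The health delta must be at least as long as the longest TCP timeout,
 * otherwise a thread blocked in a legitimate network call looks stalled.
 */
static time_t health_delta_sec(long tcp_timeout_sec)
{
	if (tcp_timeout_sec > DEFAULT_HEALTH_DELTA_SEC) {
		return (time_t) tcp_timeout_sec;
	}
	return DEFAULT_HEALTH_DELTA_SEC;
}
```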
David Goulet [Tue, 27 Aug 2013 19:11:40 +0000 (15:11 -0400)]
Get the maximum TCP timeout in sessiond
This simply opens some /proc files when lttcomm_init_inet() is called
and computes the maximum TCP timeout value over every possible
operation (send/recv/connect).
This is the first patch to fix the health check issue of having a
smaller delta than the TCP maximum possible timeout.
Signed-off-by: David Goulet <dgoulet@efficios.com>
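Reading the values and keeping the largest can be sketched as follows. `read_proc_long()` and `max_timeout()` are hypothetical helpers. The sketch assumes each file already holds a timeout in seconds; converting retry counts into a duration, which a real implementation would need for settings such as SYN retries, is deliberately left out.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read a single integer from a /proc-style file.
 * Returns -1 if the file cannot be opened or parsed.
 */
static long read_proc_long(const char *path)
{
	FILE *f = fopen(path, "r");
	long value = -1;

	if (f == NULL) {
		return -1;
	}
	if (fscanf(f, "%ld", &value) != 1) {
		value = -1;
	}
	fclose(f);
	return value;
}

/* Keep the largest value found; unreadable files are ignored. */
static long max_timeout(const char *const paths[], size_t n)
{
	long max = -1;

	for (size_t i = 0; i < n; i++) {
		long v = read_proc_long(paths[i]);

		if (v > max) {
			max = v;
		}
	}
	return max;
}
```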
David Goulet [Tue, 27 Aug 2013 17:56:57 +0000 (13:56 -0400)]
Fix: don't report error if UST app dies
An application dying unexpectedly is normal behavior, so execution
must continue without reporting an error when sending commands to the
dying application.
Signed-off-by: David Goulet <dgoulet@efficios.com>
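One way to express this is to classify send failures. When the peer has gone away, the socket reports EPIPE (or ECONNRESET), which should be treated as an expected outcome rather than as an error to report. This is a minimal sketch. `send_cmd()` and its status codes are hypothetical, and the session daemon uses its own UST command helpers and error codes.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

enum cmd_status {
	CMD_OK,
	CMD_APP_GONE,	/* peer exited: normal, keep going, no error report */
	CMD_ERROR,
};

static enum cmd_status send_cmd(int sock, const void *buf, size_t len)
{
	ssize_t ret;

	do {
		/* MSG_NOSIGNAL: get EPIPE instead of being killed by SIGPIPE. */
		ret = send(sock, buf, len, MSG_NOSIGNAL);
	} while (ret < 0 && errno == EINTR);

	if (ret == (ssize_t) len) {
		return CMD_OK;
	}
	if (ret < 0 && (errno == EPIPE || errno == ECONNRESET)) {
		return CMD_APP_GONE;
	}
	return CMD_ERROR;
}
```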
Julien Desfossez [Tue, 13 Aug 2013 21:40:28 +0000 (17:40 -0400)]
Fix: reset out_fd_offset when we rotate the trace file
This value is only used for sync_file_range(), but it has to be reset
to 0 when we start writing to a new trace file; otherwise, the values
passed to this call are bogus.
Applied to 2.3, but could probably be backported to 2.2.
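The bookkeeping can be sketched with a hypothetical output-state structure. Each write records the start offset that would be passed to sync_file_range() for that range, and a rotation restarts the offset at 0 for the new file. The names are illustrative, and the sketch does not call sync_file_range() itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical output state of one stream (illustration only). */
struct out_stream {
	uint64_t out_fd_offset;		/* bytes written to the current file */
	unsigned int file_index;	/* index of the current trace file */
};

/*
 * Record a write of len bytes and return its start offset, i.e. the
 * offset that would be passed to sync_file_range() for this range.
 */
static uint64_t stream_wrote(struct out_stream *s, uint64_t len)
{
	uint64_t start = s->out_fd_offset;

	s->out_fd_offset += len;
	return start;
}

/* Switch to a new trace file: offsets restart from the beginning. */
static void stream_rotate(struct out_stream *s)
{
	s->file_index++;
	s->out_fd_offset = 0;	/* without this, offsets refer to the old file */
}
```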
Decrease the switch timer period to 100ms (instead of 1s), since the
application only runs for about 2s. If the system is busy, nothing really
guarantees that the timer will fire during this time frame (so,
strictly speaking, this test could still fail).
Also, kill the application before trying to read the trace: this should
ensure that the trace is not appended to concurrently with validation.
There is a data pending race involving late population of the streams in
the stream hash table, and flushes applied to streams that are not yet
globally visible.
This is caused by the fact that streams are added to the hash table only
when received by the data-handling consumer thread.
This results in data_pending() incorrectly returning that there is no
data pending in some cases.
This was discovered by adding a 1s delay in the read subbuffer function
for testing.
This lock is currently a shadow of the channel lock. It will eventually be
used to protect the channel timer handler from concurrent channel updates
without being held when the timer is stopped (future commit).
Fix: consumer: 64-bit index for relayd rather than 32-bit (v2)
Relayd "unique" ids wrap around at 32 bits, and in some cases negative
values are considered errors.
Change this to make the error value specifically -1ULL and use a direct
comparison (since we now use an unsigned 64-bit integer, a comparison
against 0 to detect errors is no longer valid).
Since we now use a 64-bit ID, it is assumed to _never_ wrap-around
(remember, value -1ULL is an _error_). Therefore,
consumer_add_relayd_socket() can become much more strict than it was:
instead of accepting re-use of net_seq_idx, we can now assert that upon
receiving an LTTNG_STREAM_CONTROL socket we have indeed allocated a relayd
object, and upon LTTNG_STREAM_DATA we have found a relayd object.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: David Goulet <dgoulet@efficios.com>
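The reason for the direct comparison is that with an unsigned 64-bit id, `id < 0` is always false, so the error must be a specific sentinel compared for equality. This is a minimal sketch. `RELAYD_ID_ERROR`, `relayd_id_alloc()` and `relayd_id_is_error()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Error sentinel for a 64-bit unsigned id (all bits set, i.e. -1ULL). */
#define RELAYD_ID_ERROR	((uint64_t) -1ULL)

static uint64_t next_id;

/* Hypothetical allocator: 64-bit ids are assumed never to wrap around. */
static uint64_t relayd_id_alloc(void)
{
	uint64_t id = next_id++;

	assert(id != RELAYD_ID_ERROR);	/* would mean a 2^64 wrap-around */
	return id;
}

static int relayd_id_is_error(uint64_t id)
{
	/* Direct comparison: "id < 0" can never be true for a uint64_t. */
	return id == RELAYD_ID_ERROR;
}
```

An id such as 0x80000000, which looked negative once truncated to a signed 32-bit value, is an ordinary valid id here.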
Julien Desfossez [Thu, 27 Jun 2013 21:57:06 +0000 (17:57 -0400)]
Fix: send per-pid session id in channel creation
The registry indexing for per-pid sessions is done with a per-pid
session id. So for per-pid buffers, we need to send the per-pid session
id as well as the global session id to the consumer in order to give it
enough information if it needs to request metadata later.
This patch adds the session_id_per_pid to the channel creation message
and to the consumer. When the sessiond receives a metadata_request,
depending on the buffer type (per-pid or per-uid), it selects the right
id to do the registry lookup.
Signed-off-by: Julien Desfossez <jdesfossez@efficios.com> Signed-off-by: David Goulet <dgoulet@efficios.com>
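The lookup-key selection can be sketched as a single branch on the buffer type. The structure and function names are hypothetical. For simplicity, the sketch keys per-uid lookups on the global session id alone, although a real per-uid registry lookup may need more than that.

```c
#include <assert.h>
#include <stdint.h>

enum buffer_type {
	BUFFER_PER_PID,
	BUFFER_PER_UID,
};

/* Hypothetical subset of what the consumer keeps per channel. */
struct chan_ids {
	enum buffer_type type;
	uint64_t session_id;		/* global tracing session id */
	uint64_t session_id_per_pid;	/* per-pid session id */
};

/* Pick the id used for the registry lookup on a metadata request. */
static uint64_t registry_lookup_id(const struct chan_ids *c)
{
	return c->type == BUFFER_PER_PID ?
		c->session_id_per_pid : c->session_id;
}
```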