Mathieu Desnoyers [Tue, 5 Nov 2019 23:56:34 +0000 (18:56 -0500)]
align.h: Implement ALIGN_FLOOR macro
Implement the ALIGN_FLOOR macro which aligns the given value to the
previous alignment boundary, or keeps the value as-is if it is already
aligned.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I36d981e8fb705fafa3ff23ba2d82ec1babe73e45
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Mathieu Desnoyers [Tue, 3 Dec 2019 09:08:36 +0000 (04:08 -0500)]
Fix: relayd: per-pid live: no new metadata vs close
When using lttng-live on a per-pid UST trace, the metadata stream is
closed after an application exits. The live viewer observes that the
stream has no more data when getting the metadata returns 0 bytes.
Internally in the relay daemon, it is assumed that a metadata stream
can be put (and thus removed) as soon as all the metadata has been sent
to the viewer, but this is not quite right. The viewer actually needs to
observe a 0-byte reply after receiving all the metadata in order to
gracefully perceive the metadata stream hang up.
Therefore, add a no_new_metadata_notified flag to the metadata stream
to track whether that 0-byte message has been sent to the viewer since
the last metadata content was sent, and postpone the reclamation of the
metadata stream until it is closed _and_ that 0-byte reply was sent to
the live viewer.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I0a05332284299d62b832046e4f9d22b6029c3a3e
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Mathieu Desnoyers [Tue, 5 Nov 2019 15:40:58 +0000 (10:40 -0500)]
Fix: relayd: use packet sequence number for rotation position
The "network" sequence number (net_seq_num) is a 64-bit sequence number
tagging each packet sent over the network. The net_seq_num increments
monotonically (+1) for each packet sent from consumer daemon to relay
daemon, on a per-stream basis. It is tagged by the consumer daemon when
sending a trace packet to the relay daemon.
The LTTng kernel and user-space ring buffer "consumed position"
(consumed_pos) and "produced position" (produced_pos) are free-running
counters counting the number of bytes consumed and produced so far by
each stream. Because those counters are updated atomically, they are
limited to a size of 32-bit on 32-bit architectures.
The "packet" sequence number (packet_seq_num) is a sequence number
found in the packet header starting from LTTng 2.8. It is a 64-bit
sequence number assigned by the lttng-modules and lttng-ust ring
buffers. It increments monotonically (+1) for each packet produced
within a given ring buffer (stream).
Using produced_pos as rotation position and comparing it to the
net_seq_num has a few issues:
1) It breaks on 32-bit producers after generating more than 4GB of
data per stream, due to overflow. The net_seq_num is a 64-bit
counter, which does not overflow, but the produced_pos overflows
after 4GB on 32-bit architectures. This can lead to never-completing
rotations.
2) It breaks scenarios where ring buffers are configured in
overwrite mode, and streaming to a relay daemon. Indeed, when
the ring buffer moves the consumed_pos ahead, actually overwriting
data within the ring buffer, it introduces an offset between the
produced_pos and the net_seq_num. Therefore, if producers are
generating a low- (or no-) throughput in some streams, the
rotation may never complete, even on 64-bit architectures.
The solution proposed for this issue is to use the packet_seq_num as
rotation position rather than the net_seq_num. It takes care of
the two problematic scenarios, since the counter is always 64-bit
(even on 32-bit architectures), and because the counter is managed
by the producer, which therefore tracks progress of the ring buffer
overwrites.
This commit introduces changes required at the relayd side. A
separate commit introduces the changes required in the consumerd.
In relayd, one major restriction is the fact that the packet_seq_num
is not sent over the data socket, only through the control socket
receiving the indexes.
Therefore, in order to figure out the pivot position for the data
socket for a given stream, the associated index first needs to be
received. At that point, the corresponding net_seq_num is known,
which provides the pivot position for the data stream. Given that
the data and index sockets provide no ordering guarantees with
respect to their arrival, we handle the fact that data might have
been saved to disk in the wrong (previous) trace chunk by moving
it to the next trace chunk when the pivot position is known.
In order to allow "jumps" in the sequence numbers produced by
overwrite mode buffers, try_rotate_stream_index(), which previously
asserted that each sequence number was received in sequence, now
uses the packet_seq_num pivot position as a lower (inclusive) bound.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I755329e313f0980655a164b7bdb57e4f3d8e944a
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Mathieu Desnoyers [Wed, 18 Dec 2019 15:56:29 +0000 (10:56 -0500)]
Fix: relayd stream.c: LTTNG_OPTIONAL_GET address confusion
The lack of proper care around use of macro arguments in LTTNG_OPTIONAL_GET
allowed this code to compile, but caused the following issue (manually
replacing "optional" argument with the a
Using LTTNG_OPTIONAL_GET(&stream->ongoing_rotation)->next_trace_chunk
with:
#define LTTNG_OPTIONAL_GET(optional) \
({ \
assert(optional.is_set); \
optional.value; \
})
translates to:
({
assert(&stream->ongoing_rotation.is_set);
&stream->ongoing_rotation.value;
})->next_trace_chunk
The issue here is the assert(), which just checks that the address
is not NULL, when it should actually check that the value is set.
The prior commit fixing optional.h to add proper parentheses to the
macro rightfully fails to compile this code. Fix the erroneous user.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I0dd45fb60573cae6ae3e831e24266aff4406f57f
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Mathieu Desnoyers [Wed, 18 Dec 2019 15:50:00 +0000 (10:50 -0500)]
Fix: optional.h macro missing parentheses and guards
The are a few coding style issues with optional.h which leads to
unexpected effects when the macros are used in the caller code.
All macro parameters need to be surrounded by () (except when used near
commas, which is the C operator with least precedence).
All macros that emit code need to be surrounded by do { } while (0) so
not to emit extra ; or omit ;, which can cause subtle issues when used
with if/else statements.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: Iba6fde7c267f4d8c9ec1a89147045f0bcda3a67a
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Fri, 13 Dec 2019 21:47:57 +0000 (16:47 -0500)]
dynamic-array: fix documentation of lttng_dynamic_pointer_array_get_pointer
The documentation of this function says that mutating the array
invalidates the return pointer. This is not true, since the returned
pointer is the element itself, not a pointer to the array.
Change-Id: I8b8978cddb2d1d6ce0b42ed313e9843ca418c96c
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Fri, 13 Dec 2019 20:25:28 +0000 (15:25 -0500)]
Fix: make dist fails on sdt-probe test with autotools v1.16+
The 'dist' target fails when building with autotools 1.16.1 as the
behaviour of BUILT_SOURCES has changed in 1.16 [1].
BUILT_SOURCES are now built as part of the 'dist' target which causes
it to fail on machines that don't have dtrace installed.
The following error is produced: DTRACE foobar_provider.h /bin/sh: -s:
command not found
The BUILT_SOURCES is skipped for build configurations where the build
of the test helper 'userspace-probe-sdt-binary' is disabled.
The resulting artifact, foobar_provider.h, is not part of the
distributed tarball (marked as nodist), so this doesn't have any
adverse effect.
[1] https://www.mail-archive.com/automake@gnu.org/msg20156.html
Signed-off-by: Michael Jeanson <michael.jeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ibba4689775a8b002e8a959079b1f3c8db9917c5f
Jonathan Rajotte [Wed, 11 Dec 2019 20:58:58 +0000 (15:58 -0500)]
Fix: tests: metadata presence on relayd is not deterministic
There is no synchronization point guaranteeing the presence of metadata on
lttng-relayd side when the live metadata request is done. It must be
present at some point in the future but might not be at the moment we
perform the request.
Add a retry phase to this step. It must succeed in the future.
Not sure how this test was never marked as flaky.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Change-Id: I033005c6a228013e699c74d8e8faafcb3272dd98
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Tue, 10 Dec 2019 16:15:39 +0000 (11:15 -0500)]
Fix: move testpoint after state update
This prevent failure and hang for the long_regression test suite.
Otherwise the sessiond error out before the test completion due to an
invalid thread state.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Change-Id: Ie4bf3bea6f84f49f5eceb413ae75a7e6fff08a8f
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Fri, 6 Dec 2019 21:07:03 +0000 (16:07 -0500)]
doc: fix typo in lttng-enable-event man page
Change-Id: I81ce8f5d4eb3857ace5b1bc3a3ed3c62751dc503
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Mon, 9 Dec 2019 19:31:33 +0000 (14:31 -0500)]
Fix: sessiond: RCU read lock imbalance on get trace chunk id error
The error path when cmd_setup_relayd() fails to get the ID of a trace
chunk causes the RCU read lock to be released even though it was never
acquired.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I86aeb99e6cd6ba11634283ab7456a3760a0a6b0d
Jonathan Rajotte [Tue, 3 Dec 2019 16:46:24 +0000 (11:46 -0500)]
Fix: build: ust -> kernel mix-up in noinst_SCRIPTS
This is simply a copy paste error as test_notification_ust appears
twice.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Change-Id: I4e170a092338f5a95479a58d48247088d28d3e31
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 3 Dec 2019 09:59:10 +0000 (04:59 -0500)]
Fix: relayd: missing metadata stream causes all traces to be skipped
Commit
123ed7c22 intends for a trace that doesn't have a metadata
stream to be skipped when creating viewer streams. However, the loop
over ctf_traces should be "continued" rather then "broken" from when
this situation arises. Otherwise, all ctf_traces of the session are
skipped, which is not the intention here.
Moreover, a reference to the current ctf_trace is leaked when the
break (now continue) occurs.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ic630521c6f050c77a78f2c1e20c5741a6b3a66a1
Simon Marchi [Mon, 27 May 2019 15:52:46 +0000 (11:52 -0400)]
configure.ac: Remove duplicated CMD_DESCR_ROTATE definition
CMD_DESCR_ROTATE is defined twice by configure. This breaks configuring
with "-Werror -Wall", as some test programs don't compile due to:
configure:16990: gcc -o conftest -Wall -Werror conftest.c >&5
conftest.c:163: error: "CONFIG_CMD_DESCR_ROTATE" redefined [-Werror]
#define CONFIG_CMD_DESCR_ROTATE "Archive a tracing session’s current trace chunk"
conftest.c:154: note: this is the location of the previous definition
#define CONFIG_CMD_DESCR_ROTATE "Archive a tracing session's current trace chunk"
cc1: all warnings being treated as errors
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Sat, 23 Nov 2019 00:04:16 +0000 (19:04 -0500)]
Fix: relayd: fully initialize viewer stream before publishing it
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Francis Deslauriers [Tue, 12 Nov 2019 22:55:25 +0000 (17:55 -0500)]
Fix: relayd: don't send streams if there is no metadata
Issue
=====
When tracing short-live UST apps in per-pid mode, it happens that traces
are closed before we can send all the streams to the viewer. This can
place the viewer in an uncomfortable position where it is aware of data
streams of a trace but can't get the metadata stream to decode the
events.
Solution
========
Only send the data streams if we have the metadata stream or if the
metadata stream was already sent. This ensures that the viewer will
either have all the {data,metadata} streams or none of them.
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Mon, 18 Nov 2019 20:12:20 +0000 (15:12 -0500)]
Fix: update apps on untrack only when session is active
This mimics what is done on the track side.
Fixes #1210
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Fri, 22 Nov 2019 20:08:15 +0000 (15:08 -0500)]
Fix: consumerd: assert on null trace chunk on session restart
The consumer daemon asserts on
`stream->net_seq_idx != (uint64_t) 1ULL || stream>trace_chunk'
when a session is re-started following a rotation.
Steps to reproduce, while an instrumented application is running:
$ lttng create
$ lttng enable-event -u -a
$ lttng start
$ lttng stop
$ lttng rotate
$ lttng start
-> assertion fails
The trace chunk of a stream can be null in this scenario as the
"create trace chunk" consumer command immediately sets the streams'
current trace chunk when they are transitioning from a "null" trace
chunk.
Since
1f4962443, a session restart will cause a rotation to occur from
"null" to the streams' new trace chunk. When this rotation occurs, the
channel and their streams are seen to be in the same trace chunk.
This is unexpected as this should only occur when transitioning from a
trace chunk to the "null" trace chunk (no output). In that case, the
streams are considered to have reached the point where they should
discard their current trace chunk to enter the "null trace chunk"
state (having no current output). This is exactly the opposite of
the situation here as streams are transitioning from a null to a
non-null trace chunk.
Hence, the immediate assignation of the trace chunk to the streams
should no longer be performed on creation of a trace chunk on the
consumer daemon. The streams transition from the "null" trace chunk to
a new trace chunk through the "rotate channel" command as is done for
all other rotation operations. Removing this bypass also ensures that
the rotation behaviour of both the consumer and relay daemons match.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 20 Nov 2019 21:09:50 +0000 (16:09 -0500)]
Fix: sessiond: don't wait for a rotation from a null chunk to finish
The rotation completion checking does not handle waiting for the
completion of a rotation _from_ a NULL trace chunk. This is correct as
there is essentially nothing to check. Streams always rotate out of a
null trace chunk immediately as it means the stream had no output and
could not receive data.
Concretely, this happens when stopping a session, rotating, and
re-starting a session.
The fix consists in simply re-working the logic of the "rotate"
command to not launch a rotation completion check job and not put the
session in the "rotation ongoing" state.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 20 Nov 2019 21:06:30 +0000 (16:06 -0500)]
Fix: sessiond: duplicated rotation notification sent
A duplicate notification that a rotation was launched and completed
is sent by the rotation thread when a rotation is performed after
a session was stopped.
This code is removed at it is a left-over from the time when the
rotation thread had to explicitly rename trace output folders
when a rotation had completed.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 19 Nov 2019 18:41:51 +0000 (13:41 -0500)]
Fix: relayd: remove assert of non-null stream trace chunk on rotate
A stream can rotate from a "NULL" trace chunk to a new trace chunk
and the relay daemon should not assert on this condition.
This happens when a session is stopped, rotated, and started again
later on.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Mon, 18 Nov 2019 21:09:21 +0000 (16:09 -0500)]
Docs: verb/noun confusion in comment
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Mon, 18 Nov 2019 21:01:44 +0000 (16:01 -0500)]
Fix: sessiond: no rotation performed from null chunk to new chunk
The session daemon must rotate from a null trace chunk to a new trace
chunk when a session's current trace chunk is null. This situation
happens when a session is stopped, rotated, and started again.
This bug leaves existing streams in the "null" trace chunk and causes
the relay daemon to report a protocol error on the reception of the
first data packet following such a start.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Mon, 18 Nov 2019 19:43:06 +0000 (14:43 -0500)]
Fix: invalid use of destructor in dynamic pointer array
A dynamic pointer array is built on top of a dynamic array and uses
the dynamic array's internal "destructor" field to store the
user-specified destructor.
lttng_dynamic_pointer_array_remove_pointer currently uses
the dynamic array's remove_element directly which causes the
user destructor to be called with the underlying storage of the
pointer rather than with the pointer itself.
This change re-uses the same pattern as
lttng_dynamic_pointer_array_reset(), namely using the destructor
explicitly and setting it to NULL for the duration of the call to
the dynamic array API.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Fri, 15 Nov 2019 22:16:32 +0000 (17:16 -0500)]
Fix: relayd: check for a trace chunk before writing a packet
Protocol errors can cause a packet to be written to a stream that
doesn't have an active trace chunk. Validate this condition for the
init and write packet operations on a stream.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Thu, 14 Nov 2019 22:12:08 +0000 (17:12 -0500)]
Fix: relayd: viewer session trace chunk not released on detach
The 'attach' command on a viewer session expects (asserts) the trace
chunk of the viewer session to be NULL. This is reasonable as there is
no reason to hold a reference to a trace chunk while no clients are
attached.
Release the reference to the trace chunk on detach. The relay
session's trace chunk will be re-sampled (copied) when the next client
attaches to the viewer session.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Michael Jeanson [Thu, 7 Nov 2019 19:02:55 +0000 (14:02 -0500)]
Require automake >= 1.12
The test suite LOG_DRIVER statement requires that automake >= 1.12 be used
during bootstrap.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 12 Nov 2019 04:38:25 +0000 (23:38 -0500)]
Fix: relayd: session trace chunk is copied too late
In per-pid buffering mode, a viewer can attach to a session while it
is active and find it has been closed by the time it requests new
streams. The viewer session's trace chunk is created as a side-effect
of the "get_new_streams" command and can find that the relay_session's
trace chunk has now been closed, causing the creation of the viewer
session trace chunk to fail.
This results in an unexpected error being reported to the live client.
This fix moves the creation of the viewer session's trace chunk to the
"attach" command. If the creation fails, the session is reported as
being "unknown".
Reported-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Tested-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Jérémie Galarneau [Fri, 8 Nov 2019 22:45:20 +0000 (17:45 -0500)]
Fix: relayd: disallow 0-length session names for 2.4+ peers
The group-by-session-name feature assumes sessions are named for peers
more recent than 2.3. Enforce this assumption at session creation
time.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Mathieu Desnoyers [Fri, 1 Nov 2019 20:23:05 +0000 (16:23 -0400)]
Fix: sessiond: ust: deadlock with per-pid buffers
Do not hold the registry lock while communicating with the consumerd,
because doing so causes inter-process deadlocks between consumerd and
sessiond with the metadata request notification.
The deadlock involves both sessiond and consumerd:
* lttng-sessiond:
thread 11 - thread_application_management
close_metadata()
pthread_mutex_lock(®istry->lock);
consumer_close_metadata()
pthread_mutex_lock(socket->lock);
thread 15 - thread_consumer_management
ust_consumer_metadata_request()
pthread_mutex_lock(&ust_reg->lock);
thread 8 - thread_manage_clients
consumer_is_data_pending
pthread_mutex_lock(socket->lock);
consumer_socket_recv()
* lttng-consumerd:
thread 4 - consumer_timer_thread
sample_channel_positions()
pthread_mutex_lock(&stream->lock);
thread 8 - consumer_thread_sessiond_poll
consumer_data_pending
pthread_mutex_lock(&consumer_data.lock);
pthread_mutex_lock(&stream->lock);
thread 7 - consumer_thread_data_poll
lttng_consumer_read_subbuffer
pthread_mutex_lock(&stream->chan->lock);
pthread_mutex_lock(&stream->lock);
do_sync_metadata
pthread_mutex_lock(&metadata->lock);
lttng_ustconsumer_sync_metadata
pthread_mutex_unlock(&metadata_stream->lock);
lttng_ustconsumer_request_metadata()
pthread_mutex_lock(&ctx->metadata_socket_lock);
lttcomm_recv_unix_sock()
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 6 Nov 2019 19:43:18 +0000 (14:43 -0500)]
relayd: close viewer stream trace chunk earlier on release
A viewer stream puts its references to its stream and index files
within its "release" method (called when its reference count reaches
0).
However, the reference to its trace chunk is only released during the
RCU reclamation of the viewer stream. This unnecessarily delays the
clean-up of the viewer trace chunk.
For cleanliness' sake, move the release of the viewer stream's trace
chunk to the release method, just after the release of the various
file handles of that stream.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Mathieu Desnoyers [Fri, 1 Nov 2019 20:23:04 +0000 (16:23 -0400)]
Fix: relayd: put chunk reference when closing stream
If a stream is closed by an application exiting (per-pid buffers), it
needs to put its reference on the stream trace chunk right away, because
otherwise still holding the reference on the trace chunk could allow a
viewer stream (which holds a reference to the stream) to postpone
destroy waiting for the chunk to cease to exist endlessly until the
viewer is detached.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Mathieu Desnoyers [Fri, 1 Nov 2019 20:23:03 +0000 (16:23 -0400)]
Fix: relayd: tracefile rotation: viewer opening missing index file
Moving the head position of the tracefile array when the data is
received opens a window where a viewer attaching to the session could
try to open a missing index file (which has not been received yet).
However, we want to bump the tail position as soon as we receive
data, because the prior tail is not valid anymore.
Solve this by introducing two head positions: the "read" head
and the "write" head. The "write" head is the position of the
newest data file (equivalent to the prior "head" position). We
also introduce a "read" head position, which is only moved
forward when the index is received.
The viewer now uses the "read" head position as upper bound, which
ensures it never attempts to open a non-existing index file.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Fri, 25 Oct 2019 22:12:04 +0000 (18:12 -0400)]
Tests: fix shellcheck warning
No need to use random string for session name here, use the test name.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Fri, 25 Oct 2019 22:12:03 +0000 (18:12 -0400)]
Tests: base path: lttng load for session configuration
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Fri, 25 Oct 2019 22:12:02 +0000 (18:12 -0400)]
Cleanup: remove unused internal lttng_session_descriptor_get_base_path
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Fri, 25 Oct 2019 22:12:01 +0000 (18:12 -0400)]
Refactor: Move set session path to own function
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Fri, 25 Oct 2019 22:12:00 +0000 (18:12 -0400)]
Fix: move set base_path of session to URI configuration
The load code still uses the "old" API to create and configure network
session output (lttng_create_session followed by
lttng_set_consumer_url). The session base_path is only set in the
cmd_create_session_from_descriptor function. This results in invalid
network output paths when using a loaded session configuration (xml).
While we might want to move the load code to the new API, there is a
case to be made in not breaking the previous API behaviour.
To restore the expected behaviour of lttng_set_consumer_url, move the
assignation of the session base_path to the cmd_set_consumer_uri
function (LTTNG_SET_CONSUMER_URI).
Both the previous and session descriptor based creation API uses this
code path when needed (network output).
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Fri, 25 Oct 2019 21:56:26 +0000 (17:56 -0400)]
Fix: lttng: initialize sessions pointer to NULL
lttng_list_sessions does not set the passed pointer to NULL on empty
return. This leads to a deallocation of an invalid pointer (segfault).
For returns of size 0, the value of the passed argument should be
considered "undefined".
Refactor error handling a bit by removing the "error" jump. Always
call free on the 'sessions' object.
Fixes #1205
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Michael Jeanson [Thu, 31 Oct 2019 20:12:46 +0000 (16:12 -0400)]
Fix: check for dtrace and sdt.h before enabling SDT uprobe tests
Add a configure switch '--enable-sdt-uprobe / --disable-sdt-uprobe', the
default behavior of enabling the test if the requirements are found is
kept but it's now possible to explicitly disable it.
Also add the detection of the dtrace binary and its override trough the
DTRACE environment variable.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 30 Oct 2019 19:35:28 +0000 (15:35 -0400)]
Fix: consumerd: crash occurs when taking snapshot of ust channel
Commit
8e1ef46e8 added an acquisition of the metadata_stream's lock
during consumer_metadata_cache_flushed() as stream attributes are
used. However, when this function is called, the metadata channel's
stream can already be NULL, as indicated by the function's comments.
Check if the stream is NULL before attempting to acquire its lock.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 29 Oct 2019 19:57:59 +0000 (15:57 -0400)]
Fix: trace-chunk: log the cause of file open failures
Use the PERROR macro instead of ERR to obtain the "errno" message
when an error occurs while opening a file relative to a trace
chunk.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 29 Oct 2019 03:57:01 +0000 (23:57 -0400)]
Fix: relayd: live: crash when creating viewer streams
Viewer streams can be creating while serving a "GET_STREAMS" viewer
client command for a session that is being destroyed. If this happens,
the viewer streams will be created with a NULL viewer trace chunk,
which would result in a crash.
The fix consists in returning a stream error when such a command
happens during the destruction of a session. This is the same
behaviour than if the session could not be found at all, introducing
no meaningful change in behaviour from the viewer's perspective.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 29 Oct 2019 03:32:36 +0000 (23:32 -0400)]
Fix: relayd: live: crash on attach to a session without trace chunk
Attaching to a session that doesn't have a current trace chunk results
in a crash when the viewer streams are created from a NULL viewer
trace chunk.
Live clients are prevented from attaching to sessions without a
current trace chunk as those sessions are either being destroyed or
too young to have a trace chunk, meaning that they don't have streams
yet. Live clients will receive the "unknown" status code that they
already receive when asking an unknown session. Since such sessions
are not listed, this shouldn't change any exposed behaviour.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 29 Oct 2019 03:29:53 +0000 (23:29 -0400)]
Fix: relayd: live: some listed sessions are not attacheable
The list sessions command currently returns sessions that do not
have a current trace chunk. This can be caused by the session
either being destroyed or being so young that it hasn't had a
trace chunk created against it yet.
In both cases, such sessions would not be attacheable in their
current condition. This fix omits them from from the listing
to reduce the number of failures at the "attach session" and
"attach stream" step.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Mon, 28 Oct 2019 20:53:48 +0000 (16:53 -0400)]
Fix: relayd: don't put un-acquired trace chunk reference
stream_create() should not release (put) the reference to the current
trace chunk in its error path if it could not acquire a new reference
in the first place.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Mon, 28 Oct 2019 20:26:48 +0000 (16:26 -0400)]
Fix: relayd: don't put un-acquired viewer trace chunk reference
viewer_stream_create() should not release (put) the reference to
the viewer_trace_chunk in its error path if it could not acquire
a new reference in the first place.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Mon, 28 Oct 2019 19:37:34 +0000 (15:37 -0400)]
Fix: consumerd: NULL pointer dereference during metadata sync
The following crash was reported when short-lived applications
are traced in a live session with per-pid buffering channels.
From the original report:
```
Thread 1 (Thread 0x7f72b67fc700 (LWP
1912155)):
#0 0x00005650b3f6ccbd in commit_one_metadata_packet (stream=0x7f729c010bf0) at ust-consumer.c:2537
#1 0x00005650b3f6cf58 in lttng_ustconsumer_sync_metadata (ctx=0x5650b588ce60, metadata=0x7f729c010bf0) at ust-consumer.c:2608
#2 0x00005650b3f4dba3 in do_sync_metadata (metadata=0x7f729c010bf0, ctx=0x5650b588ce60) at consumer-stream.c:471
#3 0x00005650b3f4dd3c in consumer_stream_sync_metadata (ctx=0x5650b588ce60, session_id=0) at consumer-stream.c:548
#4 0x00005650b3f6de78 in lttng_ustconsumer_read_subbuffer (stream=0x7f729c0058e0, ctx=0x5650b588ce60) at ust-consumer.c:2917
#5 0x00005650b3f45196 in lttng_consumer_read_subbuffer (stream=0x7f729c0058e0, ctx=0x5650b588ce60) at consumer.c:3524
#6 0x00005650b3f42da7 in consumer_thread_data_poll (data=0x5650b588ce60) at consumer.c:2894
#7 0x00007f72bdc476db in start_thread (arg=0x7f72b67fc700) at pthread_create.c:463
#8 0x00007f72bd97088f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
The segfault happen on the access to 'stream->chan->metadata_cache->lock'
chan value here is zero.
```
The problem is easily reproducible if a sleep(1) is added just after
the call to lttng_ustconsumer_request_metadata(), before the metadata
stream lock is re-acquired.
During the execution of the "request_metadata", an application can
close. This will cause the session daemon to push any remaining
metadata to the consumer daemon and to close the metadata channel.
Closing the metadata channel closes the metadata stream's wait_fd,
which is an internal pipe. The closure of the metadata pipe is
detected by the metadata_poll thread, which will ensure that all
metadata has been consumed before issuing the deletion of the metadata
stream and channel.
During the deletion, the channel's "stream" attribute the stream's
"chan" attribute are set to NULL as both are logically deleted and
should not longer be used.
Meanwhile, the thread executing commit_one_metadata_packet()
re-acquires the metadata stream lock and trips on the now-NULL "chan"
member.
The fix consists in checking if the metadata stream is logically
deleted after its lock is re-acquired. It is correct for the
sync_metadata operation to then complete successfully as the metadata
is synced: the metadata guarantees this before deleting the
stream/channel.
Since the metadata stream's lifetime is protected by its lock, there
may be other sites that need such a check. The lock and deletion check
could be combined into a single consumer_stream_lock() helper in
follow-up fixes.
Reported-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Mon, 28 Oct 2019 18:52:44 +0000 (14:52 -0400)]
consumerd: clean-up: stream attribute accessed without locking stream
consumer_metadata_cache_flushed makes use of the metadata stream's
ust_metadata_pushed attribute without locking while it is updated by
commit_one_metadata_packet() which holds the metadata stream lock.
This is marked as a clean-up since the attribute appears to always be
accessed while the metadata cache lock is held. However this is a
_channel_ attribute and the stream and channel lifetimes do not match,
making the locking assumptions conceptually dubious.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Michael Jeanson [Wed, 23 Oct 2019 15:08:34 +0000 (11:08 -0400)]
Fix: check for lttng-ust >= 2.11 at configure
We don't support building lttng-tools against an older version of
lttng-ust, make this check explicitly at configure.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Michael Jeanson [Fri, 18 Oct 2019 21:40:06 +0000 (17:40 -0400)]
Typo: occured -> occurred
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Michael Jeanson [Fri, 18 Oct 2019 21:12:22 +0000 (17:12 -0400)]
Fix typo 'Attemp' -> 'Attempt'
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Fri, 18 Oct 2019 20:39:00 +0000 (16:39 -0400)]
Fix: sessiond: use system LTTng-UST headers when available
The LTTng-Tools tree includes a local copy of three LTTng-UST headers:
* ust-error.h
* ust-ctl.h
* ust-abi.h
The system headers should be used when UST support is configured to
ensure the appropriate ABI definitions are used. The local copies of
the headers should only be used when LTTng-Tools is built with the
--without-lttng-ust configuration option. Those headers are needed
since some UST support code is compiled-in even though the support
is deactivated.
A misconfiguration in the CI setup allowed us to notice that
sessiond-config.c is using the internal header unconditionally.
To ensure this doesn't happen in the future, the local copies
are renamed:
* ust-error.h -> ust-error-internal.h
* ust-ctl.h -> ust-ctl-internal.h
* ust-abi.h -> ust-abi-internal.h
All code should use the `lttng-` prefixed versions of the headers
which include either the local or "system" copy of the headers
depending on the build configuration.
Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Philippe Proulx [Fri, 18 Oct 2019 19:53:05 +0000 (15:53 -0400)]
doc/man: use specific revision date for each manual page
This patch makes each manual page indicate its own revision date with
the `revdate` AsciiDoc attribute.
In `asciidoc.conf`, we use this attribute to specify the DocBook
reference page date (see
<https://tdg.docbook.org/tdg/4.5/refentryinfo.html> and
<https://tdg.docbook.org/tdg/4.5/date.html>).
Without the DocBook date tag, `xmlto` uses the current date. You can
see this date at the bottom of the rendered manual page:
...
SEE ALSO
lttng-enable-rotation(1), lttng-disable-rotation(1), lttng(1)
LTTng 2.12.0-pre 10/18/2019 LTTNG-ROTATE(1)
Using the manual page generation date seems unexpected for the reader
here.
For this initial change, I used the last commit date for each source
file.
Signed-off-by: Philippe Proulx <eeppeliteloop@gmail.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Philippe Proulx [Fri, 18 Oct 2019 19:29:31 +0000 (15:29 -0400)]
lttng-rotate.1.txt: update voice and document the `archives` subdir.
Signed-off-by: Philippe Proulx <eeppeliteloop@gmail.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Thu, 17 Oct 2019 19:35:49 +0000 (15:35 -0400)]
Update version to v2.11.0
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Thu, 17 Oct 2019 18:37:37 +0000 (14:37 -0400)]
Fix: sessiond: unbalanced health register/unregister on error
A number of threads do not correctly pair registrations and
unregistrations to the health monitoring subsystem when an error
forces them to abort early. Since the pattern is mostly the same
in the notification and rotation thread, they are both fixed in
the same commit.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Thu, 17 Oct 2019 18:35:34 +0000 (14:35 -0400)]
Fix: sessiond: NULL thread_state provided to pthread_cleanup callback
The callback registered through pthread_cleanup_push(...),
thread_init_cleanup, now expects a non-NULL thread_state argument.
The thread_state is passed to match this recent change.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 16 Oct 2019 22:53:47 +0000 (18:53 -0400)]
Fix: sessiond: leak of trace chunk on destruction error
By design, a trace chunk can be leaked on the consumer daemon's end if
the session daemon does not close it. This is because the consumer
daemon has no "top-level" session object which could bound the
lifetime of a trace chunk.
It was reported that errors during a session destruction operation
could result in a trace chunk leak being reported by the consumer
daemon on shut down.
In the case that was reported, the failure to launch an application
caused the metadata channel to never be created. When the session was
destroyed, the rotation of the metadata channel failed with a "channel
does not exist" error. This error caused cmd_rotate_session() to abort
before the trace chunk close command was sent to the consumer
daemon(s). This ultimately results in the leak described earlier.
The fix consists in performing the trace chunk close command on the
consumer daemon even if the rotation itself fails.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 16 Oct 2019 22:25:57 +0000 (18:25 -0400)]
common: cleanup error message mentioning mkdir
While there is a good chance that mkdir is the actual syscall that
fails when this error code is returned, the error messages should
describe the operation that failed and not its implementation.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 16 Oct 2019 22:22:32 +0000 (18:22 -0400)]
Fix: sessiond: session destruction errors are unreported
The session daemon does not report errors which occur while setting-up
a session's destruction. For instance, if the implicit rotation or
rotation to the "null" chunk fails. While the session will be
destroyed (it will no longer appear in session listings), the session
daemon could have failed to destroy it properly and it could be
corrupted/unreadable.
This reports those errors so the user does not expect the session to
be readable (but it _could_ be).
This was discovered while investigating another, unrelated, issue.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 16 Oct 2019 16:35:23 +0000 (12:35 -0400)]
Fix: consumer: double unlock of rcu read lock on error
Commit
6b584c2e changed the implementation of the "trace_chunk_exists"
command to use the then-new "exists" method of the trace chunk
registry. Before this change, the trace chunks were looked-up to check
for their existence.
Since the "exists" method doesn't require the caller to hold the rcu
read lock, it is acquired later on in the function. However, the rcu
read lock is still released on error when this function fails,
resulting in a double-unlock.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 15 Oct 2019 20:56:22 +0000 (16:56 -0400)]
Fix: sessiond: application channel creation failure stops start cmd
The creation of an application's channel can fail when, for instance,
a context can't be created. This causes applications that would have
been started _after_ it to never be started.
This keeps the iteration going on error and starts all applications
that could be started. This is more in line with the behaviour of
2.10 (and earlier) since those channel creations would occur as
applications registered and not on tracing "start".
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Fri, 11 Oct 2019 21:12:26 +0000 (17:12 -0400)]
sessiond: clean-up: enhance logging on event allocation failure
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Fri, 11 Oct 2019 20:26:29 +0000 (16:26 -0400)]
Fix: sessiond: don't assert on event creation error
Don't assert if an application tracer reports that an event already
exists. This could be caused by a bug on the tracer end or memory
corruption on the application's end. In either case, an assert() is
too strict; simply report the error.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Francis Deslauriers [Wed, 9 Oct 2019 13:32:40 +0000 (09:32 -0400)]
Fix: lttng-elf.c: dereferencing pointer before null check
Coverity report:
CID
1405899 (#1 of 1): Dereference before null check (REVERSE_INULL)
check_after_deref: Null-checking elf suggests that it may be null, but it has
already been dereferenced on all paths leading to the check.
Reported-by: Coverity (1405899) Dereference before null check
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 8 Oct 2019 21:34:51 +0000 (17:34 -0400)]
Fix: sessiond: unbounded elf section data size allocation
The size of ELF sections is read from a user-provided file descriptor
to an ELF file which could be malformed. In theory it would not
really be a problem as the run-as process is automatically restarted
after a crash (e.g. SIGBUS).
The alloctions are now bounded to the smallest of 512MB or the
file's size. The limit is kept high to accomodate very large
binaries and not impose an artificial limitation.
In time, this should be replaced by an mmap() of the section's
data rather than copying to a private set of pages.
1405558 Untrusted value as argument
The argument could be controlled by an attacker, who could invoke the
function with arbitrary values (for example, a very high or negative
buffer size).
In lttng_elf_get_sdt_probe_offsets: An unscrutinized value from an
untrusted source used as argument to a function (for example, a buffer
size) (CWE-20)
Reported-by: Coverity Scan
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 8 Oct 2019 19:15:28 +0000 (15:15 -0400)]
Fix: sessiond: double socket close on allocation failure
The application registration thread performs a double close() on
an application socket whenever it fails to allocate a ust_command.
Assign `-1` to `sock` after the initial close() to follow the
pattern of other close paths.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 8 Oct 2019 18:18:31 +0000 (14:18 -0400)]
Fix: sessiond: TOCTOU error on save of session configuration
The session_save() function checks for the existance and access rights
on the target session configuration filename before opening it. This
results in a TOCTOU (Time of check, time of use) problem.
Defer the check and error reporting to the run_as_open() call.
1191754 Time of check time of use
An attacker could change the filename's file association or other
attributes between the check and use. In save_session: A check occurs
on a file's attributes before the file is used in a privileged
operation, but things may have changed (CWE-367)
Reported-by: Coverity Scan
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 8 Oct 2019 18:01:54 +0000 (14:01 -0400)]
Fix: tests: replace truncation-prone logging helper
The printerr() error logging scheme in test_utils_expand_path
is prone to unexpected truncations which results in a lot of
warnings when building using GCC 9.2.
It is replaced by a variable-argument macro that uses fprintf()
directly.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Francis Deslauriers [Tue, 1 Oct 2019 15:43:39 +0000 (11:43 -0400)]
Fix: relayd: Dereference after null check
There is no legitimate case where a stream's trace chunk would be NULL
while receiving a data packet. It could only result from an internal
error. Hence, stream->trace_chunk != NULL can be considered a
pre-condition of this function.
Coverity report:
CID
1404937 (#1 of 1): Dereference after null check (FORWARD_NULL)
11. var_deref_model: Passing null pointer stream->index_file to
relay_index_set_file, which dereferences it
Reported-by: Coverity (1404937) Dereference after null check
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Mon, 7 Oct 2019 23:34:08 +0000 (19:34 -0400)]
Fix: sessiond: app sock and notif shm not created by the main thread
The application registration socket and application notification
shared memory are not created by the session daemon's main thread
which causes calls to umask() to be issued from multiple threads.
Since umask manipulates a process-wide ressource, it should be only be
used by a single thread.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Mon, 7 Oct 2019 22:26:51 +0000 (18:26 -0400)]
Fix: sessiond: client socket not created by the main thread
The client socket is not created by the session daemon's main thread
which causes calls to umask() to be issued from multiple threads.
Since umask manipulates a process-wide ressource, it should be only be
used by a single thread.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Francis Deslauriers [Fri, 4 Oct 2019 11:42:36 +0000 (07:42 -0400)]
Fix: relayd: Dereference before null check
Coverity report:
CID
1405858 (#1 of 1): Dereference before null check
(REVERSE_INULL)check_after_deref: Null-checking base_path suggests that
it may be null, but it has already been dereferenced on all paths
leading to the check.
Reported-by: Coverity (1405858) Dereference before null check
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Francis Deslauriers [Tue, 1 Oct 2019 19:46:35 +0000 (15:46 -0400)]
Fix: relayd: unchecked return values
Coverity reports:
CID
1171571 (#1 of 1): Unchecked return value (CHECKED_RETURN)
24. check_return: Calling compat_epoll_add without checking return
value (as is done elsewhere 11 out of 13 times).
CID
1171572 (#1 of 1): Unchecked return value (CHECKED_RETURN)
24. check_return: Calling compat_epoll_add without checking return
value (as is done elsewhere 11 out of 13 times).
Reported-by: Coverity (1171571) Unchecked return value
Reported-by: Coverity (1171572) Unchecked return value
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Francis Deslauriers [Tue, 1 Oct 2019 15:06:30 +0000 (11:06 -0400)]
Fix: ust-consumer.c: Double unlock of channel lock
Coverity report:
CID
1404942 (#1 of 1): Double unlock (LOCK)
15. double_unlock: pthread_mutex_unlock unlocks channel->lock while it is
unlocked.
Reported-by: Coverity (1404942) Double unlock
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Francis Deslauriers [Tue, 1 Oct 2019 14:35:28 +0000 (10:35 -0400)]
Fix: sessiond: Dereference before null check
Coverity report:
CID
1404933 (#1 of 1): Dereference before null check
(REVERSE_INULL)check_after_deref: Null-checking
session->chunk_being_archived suggests that it may be null, but it has
already been dereferenced on all paths leading to the check.
Reported-by: Coverity (1404933) Dereference before null check
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Francis Deslauriers [Tue, 1 Oct 2019 14:19:49 +0000 (10:19 -0400)]
Fix: sessiond: Dereference after null check
Coverity report:
CID
1404943 (#1 of 1): Dereference after null check (FORWARD_NULL)9.
var_deref_model: Passing null pointer session->chunk_being_archived to
lttng_trace_chunk_get_id, which dereferences it.
Reported-by: Coverity (1404943) Dereference after null check
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Francis Deslauriers [Fri, 27 Sep 2019 15:49:23 +0000 (11:49 -0400)]
Fix: relayd: Explicit null dereferenced
Coverity warns about an explicit null dereference on an early exit path.
Coverity report:
CID
1405577 (#1 of 1): Explicit null dereferenced (FORWARD_NULL)7.
var_deref_model: Passing null pointer previous_stream_fd to
stream_fd_put, which dereferences it.
Reported-by: Coverity (1405577) Explicit null dereferenced
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Francis Deslauriers [Fri, 27 Sep 2019 15:56:24 +0000 (11:56 -0400)]
Cleanup: relayd: Logically dead code
Coverity report:
CID
1404935 (#1 of 1): Logically dead code (DEADCODE)dead_error_line:
Execution cannot reach the expression 0UL inside this statement:
base_path_len = (base_path
Reported-by: Coverity (1404935) Logically dead code
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Francis Deslauriers [Tue, 1 Oct 2019 13:04:12 +0000 (09:04 -0400)]
Fix: enable_events.c: typo in `WARN()` message
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Francis Deslauriers [Tue, 1 Oct 2019 13:12:47 +0000 (09:12 -0400)]
Cleanup: enable_events.c: fix erroneous comment
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Francis Deslauriers [Tue, 1 Oct 2019 13:35:19 +0000 (09:35 -0400)]
Cleanup: relayd: identical code for different branches
Coverity report:
CID
1404928 (#1 of 1): Identical code for different branches
(IDENTICAL_BRANCHES)identical_branches: The same code is executed when
the condition ret < 0 is true or false, because the code in the if-then
branch and after the if statement is identical. Should the if statement
be removed?
Reported-by: Coverity (1404928) Identical code for different branches
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Francis Deslauriers [Tue, 1 Oct 2019 13:43:03 +0000 (09:43 -0400)]
Fix: common: Unchecked return value of `closedir()`
Coverity report:
CID
1404930 (#1 of 1): Unchecked return value (CHECKED_RETURN)1.
check_return: Calling closedir without checking return value (as is done
elsewhere 5 out of 6 times).
Reported-by: Coverity (1404930) Unchecked return value
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Francis Deslauriers [Tue, 1 Oct 2019 13:48:49 +0000 (09:48 -0400)]
Fix: relayd: Dereference after null check
Coverity report:
CID
1404934 (#1 of 1): Dereference after null check (FORWARD_NULL)
11. var_deref_model: Passing null pointer element to
trace_chunk_registry_ht_element_put, which dereferences it.
Reported-by: Coverity (1404934) Dereference after null check
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Francis Deslauriers [Tue, 1 Oct 2019 15:50:43 +0000 (11:50 -0400)]
Fix: Tests: test_session.c: Structurally dead code
Coverity report:
CID
1400680 (#1 of 1): Structurally dead code (UNREACHABLE)
unreachable: This code cannot be reached: ret = 0;.
Reported-by: Coverity (1400680) Structurally dead code
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Francis Deslauriers [Tue, 1 Oct 2019 18:21:00 +0000 (14:21 -0400)]
Fix: session-descriptor.c: Dereference before null check
Coverity report:
CID
1400682 (#1 of 1): Dereference before null check
(REVERSE_INULL)check_after_deref: Null-checking descriptor suggests that
it may be null, but it has already been dereferenced on all paths
leading to the check.
Reported-by: (1400682) Dereference before null check
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Francis Deslauriers [Tue, 1 Oct 2019 20:06:04 +0000 (16:06 -0400)]
Fix: common: Dereference after null check
Coverity report:
CID
1404929 (#1 of 1): Dereference after null check (FORWARD_NULL)
25. var_deref_model: Passing null pointer directory_to_rename to
lttng_directory_handle_rename_as_user, which dereferences it.
Reported-by: Coverity (1404929) Dereference after null check
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Francis Deslauriers [Tue, 1 Oct 2019 20:22:37 +0000 (16:22 -0400)]
Fix: test_utils_compat_poll.c: Unchecked return value
Coverity report:
CID
1401360 (#1 of 1): Unchecked return value (CHECKED_RETURN)4.
check_return: Calling compat_epoll_create without checking return value
(as is done elsewhere 6 out of 7 times).
Reported-by: Coverity (1401360) Unchecked return value
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Thu, 3 Oct 2019 00:28:39 +0000 (20:28 -0400)]
Fix: liblttng-ctl: wrong variable used during argument validation
Local 'handle' variable is used to check for NULL arguments while
the provided argument is named '_handle'. This results in failures
to destroy a session.
Rename the variable used in the argument check.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 2 Oct 2019 22:04:43 +0000 (18:04 -0400)]
Fix: liblttng-ctl: ABI-breaking size change of lttng_session struct
abidiff reports that the size of struct lttng_session has changed
since 2.10:
[C]'function int lttng_list_sessions(lttng_session**)' at lttng-ctl.c:2065:1 has some indirect sub-type changes:
parameter 1 of type 'lttng_session**' has sub-type changes:
in pointed to type 'lttng_session*':
in pointed to type 'struct lttng_session' at session.h:38:1:
type size changed from 35008 to 35072 (in bits)
1 data member deletion:
'char lttng_session::padding[12]', at offset 34912 (in bits) at session.h:50:1
1 data member insertion:
'union {char padding[12]; void* ptr;} lttng_session::extended', at offset 34944 (in bits) at session.h:57:1
The offset after the 'live_timer_interval' field is aligned on 4
bytes, but not on 8 bytes. This causes some compilers (such as gcc and
clang) to align the following 'extended' union on 8 bytes, making the
overall structure larger.
To preserve the size of 'struct lttng_session', four bytes of padding
are added after 'live_timer_interval', resulting in an aligned offset
for both bitnesses.
The 'extended' union's padding is reduced from 12 to 8 bytes,
essentially ensuring that 'ptr' always occupies 8 bytes, even on
32-bit builds.
Tested on clang and gcc for x64, x86, PPC32, PPC64, ARM, ARM64,
AVR, MIPS, MIPS64, and MSVC (32-bit and 64-bit).
Reviewed-by: Michael Jeanson <michael.jeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 2 Oct 2019 21:55:06 +0000 (17:55 -0400)]
Fix: liblttng-ctl: config and mi strings inadvertantly exported
abidiff reports that a number of configuration and MI string symbols
are exported by liblttng-ctl:
'const char* const config_event_type_userspace_probe' {config_event_type_userspace_probe}
'const char* const mi_lttng_element_command_disable_rotation' {mi_lttng_element_command_disable_rotation}
'const char* const mi_lttng_element_command_enable_rotation' {mi_lttng_element_command_enable_rotation}
'const char* const mi_lttng_element_command_rotate' {mi_lttng_element_command_rotate}
Those strings are marked as LTTNG_HIDDEN.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 2 Oct 2019 21:53:58 +0000 (17:53 -0400)]
Fix: liblttng-ctl: compat_sync_file_range inadvertantly exported
abidiff reports that compat_sync_file_range is exported by
liblttng-ctl.
'function int compat_sync_file_range(int, off64_t, off64_t, unsigned int)' {compat_sync_file_range}
Mark the function as LTTNG_HIDDEN.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 2 Oct 2019 21:25:00 +0000 (17:25 -0400)]
Fix: liblttng-ctl: poll compatibility symbols inadvertently exported
abidiff reports that the following poll/epoll compatibility layer
symbols were inadventently exported by liblttng-ctl:
'function int compat_epoll_add(compat_epoll_event*, int, uint32_t)' {compat_epoll_add}
'function int compat_epoll_create(compat_epoll_event*, int, int)' {compat_epoll_create}
'function int compat_epoll_del(compat_epoll_event*, int)' {compat_epoll_del}
'function int compat_epoll_mod(compat_epoll_event*, int, uint32_t)' {compat_epoll_mod}
'function int compat_epoll_set_max_size()' {compat_epoll_set_max_size}
'function int compat_epoll_wait(compat_epoll_event*, int, bool)' {compat_epoll_wait}
Those functions and their 'poll' equivalent are marked as
LTTNG_HIDDEN.
The poll_max_size variable is also made static and removed from the
compatibility header since it is never used apart from within the
implementation files.
'unsigned int poll_max_size' {poll_max_size}
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 2 Oct 2019 18:46:26 +0000 (14:46 -0400)]
Fix: lttng-ctl: unvalidated session destruction handle API arguments
The liblttng-ctl API is not performance sensitive and normally adopts
a defensive stance with regards to supplied arguments. The session
destruction handle API introduced in 2.11 does not check user-supplied
arguments for NULLs which does not fit with existing liblttng-ctl API
conventions.
Add NULL checks for all arguments which cannot be legitimately left
NULL and return a suitable "invalid parameters" return code.
Moreover, note that lttng_destroy_session_ext() is now used by
lttng_destroy_session(), which previously checked for a NULL session
name. Not checking for this case in the new 'ext' version introduced a
change in behaviour.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 2 Oct 2019 18:42:26 +0000 (14:42 -0400)]
Docs: document the session destruction handle API
Add function headers to the session destruction API declared in
destruction-handle.h.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 1 Oct 2019 20:38:45 +0000 (16:38 -0400)]
Update version to v2.11.0-rc4
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Francis Deslauriers [Fri, 27 Sep 2019 19:14:17 +0000 (15:14 -0400)]
Fix: Tests: Segfault in `test_utils_expand_path()`
Background
==========
I have a file named "/a" on my file system (don't ask why).
Issue
=====
While running the `test_utils_expand_path` test case on my machine, I
get a Segfault. Here is the gdb backtrace:
#0 __strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:62
#1 0x0000555555559eb7 in expand_double_slashes_dot_and_dotdot (path=0x0) at utils.c:223
#2 0x000055555555a2d8 in _utils_expand_path (path=0x55555556b250 "/a/b/c/d/e", keep_symlink=true) at utils.c:384
#3 0x000055555555a408 in utils_expand_path (path=0x55555556b250 "/a/b/c/d/e") at utils.c:423
#4 0x000055555555859e in test_utils_expand_path () at test_utils_expand_path.c:291
#5 0x00005555555589b0 in main (argc=1, argv=0x7fffffffe5e8) at test_utils_expand_path.c:352
I get this backtrace because the function `utils_partial_realpath()`
returns NULL when it tries to expand the "/a/b/c/d/e" path and realize
that it could not exist since "/a" is a file and not a directory.
Anyways, the returned NULL pointer is ignored and directly used in the
`expand_double_slashes_dot_and_dotdot()` function right after.
This configuration ("/a" being a file) is expected to fail but not to segfault.
It could be reproduce in a real scenario when creating directory
structures.
Solution
========
Return an error if `utils_partial_realpath()` returns NULL.
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 1 Oct 2019 17:54:22 +0000 (13:54 -0400)]
Fix: lttng-ctl: missing __cplusplus closing brace
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Mon, 30 Sep 2019 19:57:33 +0000 (15:57 -0400)]
Fix: trace chunk reported unknown before close command execution
The check for the existence of a trace chunk is done by using the
regular 'find' operations on a trace chunk registry in both the relay
and consumer daemons.
The 'find' operations attempt to acquire a reference to the trace
chunk being looked-up. On failure to acquire a reference, a trace
chunk is reported as being "unknown". The rotation completion check
uses this mechanism to determine if a trace chunk is still in use.
A close command defers a given operation until a trace chunk is no
longer in use (when its last reference is dropped). Hence, a thread
can attempt to 'find' a trace chunk, fail to acquire a reference, and
fail to see the effects of the close command.
In other words, the thread that has dropped the last reference to
the chunk could still be running the close command of a trace chunk
that is reported to be "unknown" by the consumer and relay daemons.
To fix this, dedicated "chunk_exists" operations are introduced. These
operations do not attempt to acquire a trace chunk. They only look it
up in the registry's internal hash table. As the removal of the trace
chunk from the hash table is performed _after_ the execution of its
close command, it provides a reliable way to ensure that a chunk that
had a close command could complete it before being reported as
"unknown"/no longer in use.
In terms of provided guarantees, this changes the moment at which a
trace chunk is considered to no longer exist from the moment its last
reference was dropped to the moment after the execution of its close
command has completed.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
This page took 0.05241 seconds and 4 git commands to generate.