Jérémie Galarneau [Mon, 30 Sep 2019 19:57:33 +0000 (15:57 -0400)]
Fix: trace chunk reported unknown before close command execution
The check for the existence of a trace chunk is done by using the
regular 'find' operations on a trace chunk registry in both the relay
and consumer daemons.
The 'find' operations attempt to acquire a reference to the trace
chunk being looked-up. On failure to acquire a reference, a trace
chunk is reported as being "unknown". The rotation completion check
uses this mechanism to determine if a trace chunk is still in use.
A close command defers a given operation until a trace chunk is no
longer in use (when its last reference is dropped). Hence, a thread
can attempt to 'find' a trace chunk, fail to acquire a reference, and
fail to see the effects of the close command.
In other words, the thread that has dropped the last reference to
the chunk could still be running the close command of a trace chunk
that is reported to be "unknown" by the consumer and relay daemons.
To fix this, dedicated "chunk_exists" operations are introduced. These
operations do not attempt to acquire a trace chunk. They only look it
up in the registry's internal hash table. As the removal of the trace
chunk from the hash table is performed _after_ the execution of its
close command, it provides a reliable way to ensure that a chunk that
had a close command could complete it before being reported as
"unknown"/no longer in use.
In terms of provided guarantees, this changes the moment at which a
trace chunk is considered to no longer exist from the moment its last
reference was dropped to the moment after the execution of its close
command has completed.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 25 Sep 2019 22:04:14 +0000 (18:04 -0400)]
Fix: sessiond: leak of application socket on chmod failure
The apps_sock fd is leaked whenever the chmod of the application
socket fails. Add a clean-up error path.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 25 Sep 2019 21:06:23 +0000 (17:06 -0400)]
sessiond: clean-up: silence warning that agent event is leaked
Both Coverity and scan-build got confused by this
function. Essentially, they warn that aevent can be leaked if
it is created in an already enabled state. We know that this can't
happen as the events are created in a disabled state.
Add an assert that created events are not enabled to help the static
analyzers. This could also catch the leak should this change in the
future.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 25 Sep 2019 20:55:35 +0000 (16:55 -0400)]
Fix: tests: leak of prefix on error to register lttng namespace
prefix should always be free'd in this function.
1402045 Resource leak
The system resource will not be reclaimed and reused, reducing the
future availability of the resource.
In register_lttng_namespace: Leak of memory or pointers to system
resources (CWE-404)
Reported-by: Coverity Scan
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Tue, 24 Sep 2019 15:24:17 +0000 (11:24 -0400)]
Fix: use newly created event filter for condition check
The following commit introduced a regression while
fixing the filter and filter_expression ownership.
commit
b0a23296344e57bd2e48e62ec2d7e0d8a38661bb
Author: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Date: Sat Jan 12 14:53:56 2019 -0500
Fix: leak of filter bytecode and expression on agent event re-enable
The agent subsystem does not properly assume the clean-up of an
event's filter bytecode and expression when a previously disabled
event is re-enabled.
This change ensures that the ownership of both the filter bytecode
and expression is assumed by the agent subsystem and discarded
when a matching event is found.
Steps to reproduce the leak:
$ lttng create
$ lttng enable-event --python allo --filter 'a[42] == 241'
$ lttng disable-event --python allo
$ lttng enable-event --python allo --filter 'a[42] == 241'
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Setting the "filter" object to NULL prevents the call to
add_filter_app_ctx when needed.
We use the filter from the newly created event to
perform the check and the call to add_filter_app_ctx.
Fixes coverity #
1399733
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Mathieu Desnoyers [Mon, 23 Sep 2019 18:31:33 +0000 (14:31 -0400)]
Fix: lttng-crash: detect truncated files
Detect truncated files which size is smaller than the ring buffer
header.
This can be caused by a situation where sessiond is killed with SIGKILL
while doing a metadata regenerate command.
Without this fix, lttng-crash is killed with a "Bus error" when
encountering a truncated file.
Fixes: #1166
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 24 Sep 2019 05:10:58 +0000 (01:10 -0400)]
Fix: sessiond: fs.protected_regular sysctl breaks app registration
I observed that userspace tracing no longer worked when an
instrumented application (linked against liblttng-ust) was launched
before the session daemon.
While investigating this, I noticed that the shm_open() of
'/lttng-ust-wait-8' failed with EACCES. As the permissions on the
'/dev/shm' directory and the file itself should have allowed the
session daemon to open the shm, this pointed to a change in kernel
behaviour.
Moreover, it appeared that this could only be reproduced on my
system (running Arch Linux) and not on other systems.
It turns out that Linux 4.19 introduces a new protected_regular sysctl
to allow the mitigation of a class of TOCTOU security issues related
to the creation of files and FIFOs in sticky directories.
When this sysctl is not set to '0', it specifically blocks the way the
session daemon attempts to open the app notification shm that an
application has already created.
To quote a comment added in linux's fs/namei.c as part of
30aba6656f:
```
Block an O_CREAT open of a FIFO (or a regular file) when:
- sysctl_protected_fifos (or sysctl_protected_regular) is enabled
- the file already exists
- we are in a sticky directory
- we don't own the file
- the owner of the directory doesn't own the file
- the directory is world writable
```
While the concerns that led to the inclusion of this patch are valid,
the risks that are being mitigated do not apply to the session
daemon's and instrumented application's use of this shm. This shm is
only used to wake-up applications and get them to attempt to connect
to the session daemon's application socket. The application socket is
the part that is security sensitive. At worst, an attacker controlling
this shm could wake up the UST thread in applications which would then
attempt to connect to the session daemon.
Unfortunately (for us, at least), systemd v241+ sets the
protected_regular sysctl to 1 by default (see systemd commit
27325875), causing the open of the shm by the session daemon to fail.
Introduce a fall-back to attempt a shm_open without the O_CREAT flag
when opening it with 'O_RDWR | O_CREAT' fails. The comments detail the
reason why those attempts are made in that specific order.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Mon, 23 Sep 2019 18:46:42 +0000 (14:46 -0400)]
relayd: clean-up: mix-up between LTTNG_PATH_MAX and LTTNG_NAME_MAX
LTTNG_PATH_MAX and LTTNG_NAME_MAX are mixed up in
cmd_create_session_2_4(). While Coverity warns of a possible buffer
overrun, this is not possible since the length of the received
buffer is correctly checked against LTTNG_NAME_MAX.
Change the use of LTTNG_PATH_MAX for LTTNG_NAME_MAX even though
strcpy() could be used safely here.
1405634 Out-of-bounds access
Access of memory not owned by this buffer may cause crashes or incorrect computations.
In relay_create_session: Out-of-bounds access to a buffer (CWE-119)
Reported-by: Coverity Scan
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Mathieu Desnoyers [Fri, 20 Sep 2019 21:41:14 +0000 (17:41 -0400)]
Fix: destroy command: put consumer output after destroy notifier
The destroy notifier needs to access the consumer output to format
the absolute path to the last chunk.
The observed problematic behavior can be observed by doing a
rotate and a destroy command in quick succession. Sometimes,
the resulting path printed by the destroy command is incomplete:
e.g. /archives/20190920T163616-0400-20190920T163618-0400-1
when we would expect:
/home/efficios/lttng-traces/auto-
20190920-164425/archives/20190920T164437-0400-20190920T164439-0400-1
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Fri, 20 Sep 2019 15:40:37 +0000 (11:40 -0400)]
Refactor: remove logically dead code
Fixes coverity #
1400681
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Fri, 20 Sep 2019 15:15:12 +0000 (11:15 -0400)]
Fix: Null check before destroying health_sessiond
Fixes coverity #
1399735
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Fri, 20 Sep 2019 15:07:55 +0000 (11:07 -0400)]
Fix: Move initialization of queue_pipe_fd after null check of handle
Fixes coverity #
1399732
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Fri, 20 Sep 2019 14:47:30 +0000 (10:47 -0400)]
Fix: release reference to new chunk on copy error
Fixes coverity #
1405775
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Fri, 20 Sep 2019 15:34:21 +0000 (11:34 -0400)]
Fix: Close socket handle on error
Fixes coverity #
1399739
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Thu, 19 Sep 2019 18:24:34 +0000 (14:24 -0400)]
Fix: lttng: out-of-bound copy of arguments in 'view' command handler
The 'size' operand of memcpy() does not indicate the length of the
opts array; it is the size of the resulting array once the opts array
is concatenated with the options being added in this function. This
results in out-of-bound read(s) in the opts array.
Use 'sizeof(char *) * opts_len' as the length to copy at the beginning
of the resulting array.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Thu, 19 Sep 2019 18:18:50 +0000 (14:18 -0400)]
Revert "lttng: fix: out-of-bounds copy of original 'view' command arguments"
This reverts commit
fc1465822a1075d98160bb74ae8e974ac8c010a8 as it
broke the 'lttng view' command on live sessions. The original commit
was meant to fix a warning, but it was (evidently) not an appropriate
change.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Thu, 19 Sep 2019 17:48:20 +0000 (13:48 -0400)]
Fix: relayd: session destruction does not complete in live mode
Live clients currently hold a reference to their relay_session's trace
chunk through the viewer streams. This causes the destruction of live
session to hang for as long as a client is consuming a session's
contents.
The reason for the hang itself is that the "quiet" rotation performed
as part of the session's destruction can never complete. Indeed, a
rotation completes when the last reference to the trace chunk being
archived is released. Since viewer streams hold references to this
trace chunk, they bound its lifetime to their own lifetime, resulting
in the destruction operation appearing to hang to users of
liblttng-ctl / the lttng client.
Since session rotations are not allowed for live sessions, it is safe
to presume that a session's output folder hierarchy will not change
over its lifetime with the exception of sub-folder additions, such as
in per-PID tracing mode where new traces may be added at any time
within the session's output directory.
The viewer session now copies its corresponding relay_session's trace
chunk to obtain a 'user' mode trace chunk. A user mode trace chunk
will prevent the viewer session from modifying the session output
hierarchy and prevent it from keeping the relay_session's trace chunk
"alive". The viewer stream simply use this trace chunk to perform
their filesystem operations, but they no longer extend the lifetime of
the relayd_session trace chunks.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 18 Sep 2019 16:34:07 +0000 (12:34 -0400)]
Add a copy method to the trace chunk interface
Trace chunks may now be copied. This will allow, in a follow-up fix,
streams to use the trace chunk facilities without extending the
life-time of an existing chunk.
However, note that the copy of an "owner" chunk returns a "user"
chunk. While this may be surprising at first, it makes no sense for
two "owner" trace chunks to point to the same location.
In such a case, it would be legal for both trace chunks to arbitrarily
modify their underlying sub-folder hierarchies without coordinating
among themselves. This would break the existing "move to completed"
close command, for instance.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Thu, 19 Sep 2019 17:47:08 +0000 (13:47 -0400)]
relayd: move viewer stream chunk reference release to destroy
Move the viewer stream's release of its trace chunk to the destroy
method (as opposed to the release method) as destroy is used directly
when an error occurs in the viewer stream's creation function.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Thu, 19 Sep 2019 17:22:17 +0000 (13:22 -0400)]
relayd: move relay_session locking outside of make_viewer_streams
The locking period of the relay_session is slightly extended outside
of make_viewer_streams(). This makes a follow-up change easier to
review, but is also a clean-up cleaner as some of the session's
attribute are accessed without holding the lock (`live_timer` and
`id`) assuming (correctly, for the moment) that they don't change over
the lifetime of the relayd_session. It also allows us to move the
relayd_session locking logic from viewer_session_attach().
The signature of send_viewer_streams() is changed to accept the
session's id rather than a pointer to the relay_session, therefore
making the locking scope simpler to follow. Changing this makes
follow-up changes easier to review.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 18 Sep 2019 17:47:26 +0000 (13:47 -0400)]
Fix: release reference to trace chunk on index file creation failure
The reference to a trace chunk that is acquired in
_lttng_index_file_create_from_trace_chunk() is leaked on error.
Release the reference in the function's error path.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 18 Sep 2019 16:33:15 +0000 (12:33 -0400)]
trace-chunk: clean-up: mark close command properties as static const
Command properties should not be experted outside of trace-chunk.c
and should be const. Mark them as such.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 18 Sep 2019 16:30:22 +0000 (12:30 -0400)]
trace-chunk: clean-up: misleading label name
The 'end_unlock' label in lttng_trace_chunk_set_close_command() no
longer needs to unlock the chunk (and doesn't do it). Rename it to
'end' to alleviate any confusion in the future.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Sun, 15 Sep 2019 19:10:20 +0000 (15:10 -0400)]
ust-consumer: fix: metadata stream lock taken before destroy
The lock of a metadata stream is taken when calling the stream's
destroy function after the completion of a snapshot. This is invalid
and does not appear to protect anything.
I am guessin that this was meant to be an 'unlock' invoked in the
error path of this functions.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Sun, 15 Sep 2019 19:01:33 +0000 (15:01 -0400)]
sessiond: fix: memory leak of section name in elf parser
lttng_elf_get_section_name() returns a dynamically-allocated section
name. However, lttng_elf_get_section_hdr_by_name() never frees
this returned section name.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Sun, 15 Sep 2019 18:56:23 +0000 (14:56 -0400)]
kconsumer: clean-up: initialize ctf_index before populating it
The 'offset' field of the ctf_index is used uninitialized when it is
populated. Its uninitialized value is read and stored back. This is
not a problem as the offset is populated later-on.
Coveity reports
1405630 Uninitialized scalar variable
The variable will contain an arbitrary value left from earlier computations.
In lttng_kconsumer_read_subbuffer: Use of an uninitialized variable (CWE-457)
Reported-by: Coverity Scan
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Sun, 15 Sep 2019 18:27:40 +0000 (14:27 -0400)]
sessiond: fix: strncpy called with source length
strncpy is called with the source's length in two cases in the
session save code. Use the destination and remaining destination
length as intended by the API.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Sun, 15 Sep 2019 17:19:22 +0000 (13:19 -0400)]
sessiond: fix: possible unaligned access in packed structure
A number of (potentially) unaligned pointers are used in client
command handlers to pass command parameters to other internal
functions. Fixing those forces a number of const-correctness
changes where contexts are passed as arguments.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Sat, 14 Sep 2019 20:53:22 +0000 (16:53 -0400)]
relayd: clean-up: strncpy uses the length of the source as length
strncpy is used with the source length of both session_name and
hostname. This doesn't make sense at face value. However, this is not
a "bug" as both src and dst are guaranteed to be the same length in
all existing code paths.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Sat, 14 Sep 2019 20:49:07 +0000 (16:49 -0400)]
lttng: fix: out-of-bounds copy of original 'view' command arguments
A mix-up between 'size' and 'opts_len' can cause the copy of the
original arguments passed to the 'view' command can result in
out-of-bounds accesses.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Sat, 14 Sep 2019 20:43:38 +0000 (16:43 -0400)]
lttng: clean-up: silence bogus string truncation warning
gcc 9.1.0 reports
enable_events.c:827:2: warning: ‘strncpy’ output truncated copying 9 bytes from a string of length 11 [-Wstringop-truncation]
827 | strncpy(ret, preamble, length);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
gcc seems confused and using "sizeof" the preamble seems to make
it understand the code flow and not generate the warning anymore.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Sat, 14 Sep 2019 20:27:45 +0000 (16:27 -0400)]
sessiond: clean-up: mixed log levels enums used to look-up event
lttng_ust_loglevel_type and lttng_loglevel_type enums are mixed. This
is correct as they share all values. A cast is added to silence a
clang warning. Normally we would assign the values of one to the
other. This is not practical here as the ust variant is defined by the
lttng-ust API and the other is part of the public lttng-ctl API.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Sat, 14 Sep 2019 20:16:56 +0000 (16:16 -0400)]
sessiond: fix: possible unaligned access in packed structure
Use a temporary snapshot_id in cmd_rotate_get_info() to obtain the
return value of cmd_snapshot_add_output() and explicitly assign it to
the reply communication structure. Otherwise, &reply.id may be
unaligned.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Sat, 14 Sep 2019 20:09:35 +0000 (16:09 -0400)]
sessiond: fix: possible unaligned access in packed structure
Use temporary control and data port variables to get rotation
destination ports and explicitly copy them to the return communication
structure.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Sat, 14 Sep 2019 20:02:33 +0000 (16:02 -0400)]
sessiond: fix: possible unaligned access in packed structure
'&rsock->sock.fd' is passed to consumer_send_fds and may result in an
unaligned pointer value. Use the ALIGNED_CONST_PTR macro to create
an aligned copy of the fd that is being passed.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Sat, 14 Sep 2019 19:58:17 +0000 (15:58 -0400)]
sessiond: fix: possible unaligned access in packed structure
Use a temporary relayd_session_id variable to prevent an unaligned
access in the packed communication structure 'msg'.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Sat, 14 Sep 2019 19:51:32 +0000 (15:51 -0400)]
lttng-ctl: fix: possible unaligned access in packed structure
Wrap all lttng_domain copies with COPY_DOMAIN_PACKED which copies the
source domain to a temporary destination (on stack) and then assign
this temporary domain to the destination domain. This ensures the
compiler generates the code needed to perform the unaligned accesses
to the domain.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Sat, 14 Sep 2019 18:38:44 +0000 (14:38 -0400)]
lttng-ctl rotate: fix: possible unaligned access in packed structure
Fix the warnings that unaligned pointers can be passed as parameters
emitted when sampling rotation schedules.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Sat, 14 Sep 2019 18:31:40 +0000 (14:31 -0400)]
runas: fix: possible unaligned access in packed structure
Fix the warning that an unaligned pointers can be passed as parameter
emitted in _extract_elf_symbol_offset.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Sat, 14 Sep 2019 18:25:25 +0000 (14:25 -0400)]
consumer: fix: possible unaligned access in packed structure
Fix the warnings that unaligned pointers can be passed as parameters
emitted when sampling buffer statistics.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Thu, 12 Sep 2019 15:51:52 +0000 (11:51 -0400)]
inet: fix: possible unaligned access in packed structure (inet/inet6)
Fix the warnings that unaligned pointers can be passed as parameters
emitted when building inet.c and inet6.c.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Fri, 13 Sep 2019 20:48:18 +0000 (16:48 -0400)]
consumer: fix: unaligned accesses to index fields
The ctf_index structure, being part of the ABI, is explicitly packed
using the LTTNG_PACKED macro. However, populating it by using pointers
to its members is not acceptable as it may cause the ust and kernel
tracer APIs to populate write their return values using unaligned
pointers.
Use automatic storage variables to fetch the various index fields and
populate the index at-once using a compound literal.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Thu, 12 Sep 2019 14:23:23 +0000 (10:23 -0400)]
lttng-sessiond: clean-up: set free'd pointer to NULL
Set the wait_node and ust_cmd pointers to NULL after they have been
free'd to make this function easier to follow. This may also help
scan-build analyze this function as it gets confused about the values
of those variables. Currently, scan-build (clang 8.0.1) reports a
use-after free of both variables when 'app' is simultaneously null and
non-null... you read that right.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Thu, 12 Sep 2019 14:17:41 +0000 (10:17 -0400)]
lttng: fix: potential 0-length allocation in pid list parsing
Check that count is > 0 before allocating pid list. This would
only happen after a prior error, but check it anyway.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Thu, 12 Sep 2019 14:12:41 +0000 (10:12 -0400)]
Tests: fix: uninitialized session_id used on list_sessions failure
Stop live test when list_sessions() fails since the session_id used
further on would be uninitialized.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Thu, 12 Sep 2019 14:08:44 +0000 (10:08 -0400)]
Tests: fix: uninitialized values passed to close() on error
The fds array is not initialized resulting in uninitialized file
descriptors being passed to close() when an error is encountered in
the epoll-setting loop.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Thu, 12 Sep 2019 14:02:06 +0000 (10:02 -0400)]
Clean-up: assert that get_count_order() returns a positive value
The callers of get_count_order() don't handle negative values
correctly. However, those should not happen without a previous
bug. Assert that its return value is >= 0.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Thu, 12 Sep 2019 13:50:29 +0000 (09:50 -0400)]
Clean-up: suppress bogus scan-build warning
scan-build seems confused about the address passed to free() and
reports that chunk elements are passed with an offset of 8 bytes from
the address that was returned by calloc().
Move the inner-trace chunk of an element to the start of the element
wrapper-structure to silence this warning.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 11 Sep 2019 17:30:18 +0000 (13:30 -0400)]
Tests: fix: leak caused by misuse of realloc in multi-lib-test
realloc() can return NULL, in which case 'libraries' will never
be free'd in the case where the reallocation happens outside of
the while()'s first iteration.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 11 Sep 2019 17:22:32 +0000 (13:22 -0400)]
sessiond: clean-up: init ret value of _session_set_trace_chunk_no_lock_check
This fixes are warning that 'ret' may not be initialized when a
session has no ust nor kernel session. In such a case, the sessiond
would never start a session and thus would not assign a trace chunk to
the session. Still, this can be confuging so it is better-off being
fixed.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 11 Sep 2019 17:16:48 +0000 (13:16 -0400)]
sessiond: fix: print_escaped_ctf_string mishandles empty string
The return value of print_escaped_ctf_string() is uninitialized in the
case of an empty string ("").
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 11 Sep 2019 16:18:15 +0000 (12:18 -0400)]
lttng-ctl: fix: lttng_data_pending confuses communication status
lttng_ctl_ask_sessiond can return a positive value even though it
failed to receive the variable length payload of a session message
reply. In this case, lttng_ctl_ask_sessiond ends up calling into
lttng_ctl_ask_sessiond_fds_varlen() which will return the (negated)
error code returned by the session daemon if it was not LTTNG_OK.
The peer could return anything here, which lttng_data_pending will end
up interpreting as the length of the variable data that was received.
In this case, if the sessiond returns '-1', '1' will be returned to
lttng_data_pending, which it will interpret as being the length of the
'data_pending' byte flag. It will then dereference 'pending', which is
NULL, and (most likely) crash.
Check for NULL on top of checking for the return code. This
communication layer needs love as much as it needs a bulldozer.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 11 Sep 2019 16:03:38 +0000 (12:03 -0400)]
relayd: fix: rotate_truncate_stream() assumes non-null next chunk
While the protocol doesn't allow a stream rotation position in the
past when rotating to a "NULL" trace chunk, a misbehaving peer could
express this. Report the protocol error and abort the truncation
operation in this case.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 11 Sep 2019 00:35:19 +0000 (20:35 -0400)]
Fix: dereference of NULL pointer in stream_write()
stream_write() can be used with a NULL packet to write padding only to
a stream. In the case of a metadata stream, packet is assumed to
always be non-NULL which may not always be true.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 11 Sep 2019 00:26:47 +0000 (20:26 -0400)]
Fix: report bytecode_push failure when pushing symbol
The last use of bytecode_push's return value is not checked in
visit_node_load_expression_legacy().
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 11 Sep 2019 00:24:20 +0000 (20:24 -0400)]
Fix: only invoke PERROR() on failure to close sessiond_socket
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 11 Sep 2019 00:17:34 +0000 (20:17 -0400)]
Clean-up: lttng: check status returned when checking rotation state
While the use of the destruction handle in the lttng client guarantees
that obtaining the rotation's state will succeed, it is a poor example
to give of using this API. Moreover, we don't wait to give the
impression that this could never change.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 11 Sep 2019 00:08:30 +0000 (20:08 -0400)]
Fix: communication error unreported in relay_rotate_session_streams
'ret' is set to 0 even if the communication with the relay daemon
fails in relay_rotate_session_streams().
This also fixes a dead-assignment warning.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 11 Sep 2019 00:04:33 +0000 (20:04 -0400)]
Fix: unreported error in relay_close_trace_chunk
Errors encountered in relay_close_trace_chunk are not reported to the
caller as the 'ret' value is re-used to check for the successful
'append' to a dynamic buffer. Use a separate variable.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 11 Sep 2019 00:02:32 +0000 (20:02 -0400)]
Clean-up: remove dead assignment from ht cleanup thread launcher
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 11 Sep 2019 00:01:32 +0000 (20:01 -0400)]
Clean-up: remove dead assignment from thread_rotation
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 10 Sep 2019 23:59:20 +0000 (19:59 -0400)]
Clean-up: remove dead assignment in setup_channel_trace_path
setup_channel_trace_path() returns NULL on error. There is no
need to update a return code.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 10 Sep 2019 23:56:18 +0000 (19:56 -0400)]
Clean-up: remove dead assignments while setting session trace chunk
The error path always returns -1 making more specific error reporting
impossible. Don't bother reporting more precise error codes.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 10 Sep 2019 23:53:20 +0000 (19:53 -0400)]
Fix: report path truncation on addition of local uri to consumer
Return an error whenever a session's destinatio path exceeds the
maximal allowed length (LTTNG_PATH_MAX).
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 10 Sep 2019 23:47:28 +0000 (19:47 -0400)]
Clean-up: lttng: silence warning in regenerate command handler
argv[0] is used before checking argc, resulting in a warning
that argv could be of zero-length.
This is not a reachable bug; it really is a false positive.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 10 Sep 2019 23:43:16 +0000 (19:43 -0400)]
Clean-up: lttng: silence warning in metadata command handler
argv[0] is used before checking argc, resulting in a warning
that argv could be of zero-length.
This is not a reachable bug; it really is a false positive.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 10 Sep 2019 23:35:32 +0000 (19:35 -0400)]
Clean-up: remove NULL check on sesison_create mandatory arguments
session_name, base_path, and hostname are assumed to be non-null in
the session_create() function. Checking the pointers for null is
therefore useless.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 10 Sep 2019 23:30:33 +0000 (19:30 -0400)]
Fix: session may be NULL in relay_create_session error path
A reply is sent in response to a session's creation even if it failed
to be created. In those cases, session is null and its output_path
should not be accessed.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 10 Sep 2019 17:44:26 +0000 (13:44 -0400)]
Clean-up: silence erroneous leak warning
Add an assert to validate that a newly created uevent is not enabled.
If it was enabled at creation time, it would be leaked before it is
published.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 10 Sep 2019 17:18:32 +0000 (13:18 -0400)]
Fix: leak of lttng_elf_shdr in lttng-elf.c
Most lttng_elf_shdr uses in lttng-elf.c end-up leaking the header.
Since those headers instances are always short-lived and relatively
small, these leaks are fixed by not using dynamic allocations and
instead using automatic storage.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 10 Sep 2019 16:44:44 +0000 (12:44 -0400)]
Fix: leaked chunk reference in lttng_consumer_create_trace_chunk
Error paths using the 'end' label leak the reference of the
'created_chunk'. The labels are renamed to error and error_unlock
to follow the convention used in most of the code base.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Mon, 9 Sep 2019 16:39:02 +0000 (12:39 -0400)]
Clean-up: remove unused stream file creation and unlink functions
No code should be creating or unlinking stream files without going
through the trace chunk interface. These functions are removed since
there are no more users and no legitimate future users.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 10 Sep 2019 14:27:22 +0000 (10:27 -0400)]
Fix: use the trace chunk to truncate streams on late rotation
A stream's rotation can occur after the reception of data that should
be part of the "next" trace chunk. In those cases, the current stream
file and the next one (belonging to the new trace chunk) need to be
opened. The misplaced data is copied between both files and the
now-old file is closed.
This code was not transitioned to use the trace chunk interface and is
the last user of raw stream file FDs. This patch transitions the
function (rewrites it, really) to use the trace chunk interface.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Mon, 9 Sep 2019 16:14:54 +0000 (12:14 -0400)]
Clean-up: format using remaining buffer len rather than total len
end_datetime_suffix is has an extra byte to contain a '-' prefix.
This means that its effective length is ISO8601_STR_LEN and not its
sizeof(). This is not considered a fix as time_to_iso8601_str() does
not write more than ISO8601_STR_LEN (for time being).
The ISO8601_STR_LEN macro is used for the buffer lengths for clarity;
no behaviour change is intended.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Mon, 9 Sep 2019 16:08:21 +0000 (12:08 -0400)]
Fix sessiond: report flush errors on session stop
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Mon, 9 Sep 2019 15:22:39 +0000 (11:22 -0400)]
Fix: leak of trace_path on error in ust_app_snapshot_record
trace_path is leaked in some error paths of ust_app_snapshot_record().
Lift trace_path to the function's scope and free it whenever it is not
NULL. This is not clean, but this function should be cleaned-up in a
separate patch.
Moreover, this fixes a use after free in the PER_PID-case as
trace_path is free'd when a channel is not found.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Mon, 9 Sep 2019 15:14:47 +0000 (11:14 -0400)]
Fix: uninitialized directory handle finalized on error path
relay_create_trace_chunk() creates a session output directory handle
and assigns it to a trace chunk. Since a handle doesn't have a
dedicated "uninitialized" state, reduce its lifetime and don't
generically finalize it in error paths.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Mon, 9 Sep 2019 15:00:30 +0000 (11:00 -0400)]
Fix: release reference to created chunk if it can't be published
The reference to a trace chunk created in
lttng_consumer_create_trace_chunk() should be dropped if it can't be
published. Failing to do so results in a leak.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Mon, 9 Sep 2019 14:53:45 +0000 (10:53 -0400)]
Fix: destroy chunk registry on element creation failure
A failure path in trace_chunk_registry_ht_element_create() does not
destroy a newly created chunk registry, resulting in a leak.
The ownership of the registry is transfered to the element being
created as soon as possible. Error paths that release the reference on
the new element will then naturally clean-up the trace chunk registry.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Mon, 9 Sep 2019 14:46:21 +0000 (10:46 -0400)]
Fix: return NULL on trace chunk registry failure
A free'd registry object can be returned in the error paths
of lttng_trace_chunk_registry_create(). Right now, this happens
whenever the allocation of the registry's hash table fails.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Mon, 9 Sep 2019 14:39:23 +0000 (10:39 -0400)]
Fix: forward fatal error in evaluate_condition_for_client()
A fatal error during the evaluation of a condition on behalf
of a client of a notification channel should be forwarded up
the stack to the notification thread.
Here, 'ret' is silently ignored assuming that a check of "evaluation"
is equivalent, which is not the case.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Mon, 9 Sep 2019 14:22:58 +0000 (10:22 -0400)]
Fix relayd: check for NULL in session_put
The session and relay daemons both define their own "session"
APIs (ltt_session and relay_session) which define a session_put()
function.
Coverity reports that a fair amount of callers now assume that
session_put() assumes that a NULL check is performed (as in the
sessiond).
Since the session daemon's variant checks for NULL, it makes sense to
bring both implementation to parity to fix the problems reported and
make this function less confusing to use. This also allows
simplifications to the error handling paths in the relay daemon
(not included in this patch).
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Geneviève Bastien [Fri, 13 Apr 2018 19:51:28 +0000 (15:51 -0400)]
Fix typo in regenerate statedump test util
Signed-off-by: Geneviève Bastien <gbastien+lttng@versatic.net>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Fri, 6 Sep 2019 17:57:01 +0000 (13:57 -0400)]
Update version to v2.11.0-rc3
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Fri, 6 Sep 2019 17:33:18 +0000 (13:33 -0400)]
Fix: time constants already defined on macOS
Include the compatibility time.h header in the common/time.h
to override any platform-defined constants. It seems that the
NSEC_PER_SEC (and other similar definitions) are defined to
nothing on macOS. This was already fixed in the compatibility
header in the past.
This also adds USEC_PER_SEC to the list of undef-initions
on macOS.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Fri, 6 Sep 2019 17:09:54 +0000 (13:09 -0400)]
Fix: missing include strings.h for bcopy on Solaris 10/11
bcopy() is defined in strings.h on Solaris 10 and 11.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Fri, 6 Sep 2019 16:21:58 +0000 (12:21 -0400)]
Fix: strndup does not exist on Solaris 10
strndup does not exist on Solaris 10. Use the lttng_strndup
compatibility wrapper.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Fri, 6 Sep 2019 16:21:14 +0000 (12:21 -0400)]
Fix: strnlen is not defined on Solaris 10
strnlen does not exist on Solaris 10. Use the lttng_strnlen
compatibility wrapper.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Fri, 6 Sep 2019 15:56:02 +0000 (11:56 -0400)]
Fix: redefinition of USEC_PER_SEC macro on macOS
USEC_PER_SEC is defined by system headers on macOS, resulting in
a warning that it is redefined in time.h. Define it conditionally.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Fri, 6 Sep 2019 15:46:23 +0000 (11:46 -0400)]
Fix: sessiond does not build without lttng-ust support
A syntax error in the stub of ust_registry_session_init()
(extra semi-colon) prevents the build of the lttng-sessiond
from succeeding when configured with the --without-lttng-ust
option.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Mathieu Desnoyers [Thu, 5 Sep 2019 19:18:40 +0000 (15:18 -0400)]
Fix: sessiond: handle NULL control output in session descriptor base path getter
Creating a session with "lttng create --live" without specifying any
URL triggers a SEGFAULT (dereferencing a NULL pointer) because the
output is not set when getting the session descriptor base path.
Indeed, the destination output URL will only be set later in
cmd_create_session_from_descriptor(), when setting the default output.
When the default output is used, no base path override is possible,
therefore it is fine to assign the base_path to NULL in the base path
getter.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Thu, 5 Sep 2019 23:24:04 +0000 (19:24 -0400)]
Tests: metadata env scope values
The tests extract information from the metadata env scope of the trace
and reconstruct an lttng directory hierarchy to validate that the
information found in the metadata is correct.
Testing this way simulate a viewer that must reconstruct a lttng directory
hierarchy from metadata information only. This information is mostly there
for this purpose.
While there we validate the value of other env field when possible.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Thu, 5 Sep 2019 22:23:54 +0000 (18:23 -0400)]
Support LTTNG_KERNEL_SESSION_SET_CREATION_DATETIME of lttng-modules
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Thu, 5 Sep 2019 22:23:53 +0000 (18:23 -0400)]
Support LTTNG_KERNEL_SESSION_SET_NAME of lttng-modules
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Thu, 5 Sep 2019 22:23:52 +0000 (18:23 -0400)]
Metadata: add env fields to ease lttng path hierarchy creation for viewer
Add the following fields in the env section of the metadata:
- trace_name
The session name without datetime information.
Hence when the session is an auto-generated one, only print
LTTNG_DEFAULT_NAME.
- trace_creation_datetime:
The datetime at which the session was created.
We use session->creation time for it.
- tracer_buffering_scheme
The buffering scheme used. The value can be uid or pid.
- tracer_buffering_id
The key used by the buffering scheme (uid/pid).
- architecture_bit_width
The bit width of the computer architecture (e.g 32 or 64)
- vpid_datetime
The registration time of the vpid for per-pid mode.
Adding these fields ensure that the trace itself carry information that
is normally carried via folder hierarchy. e.g
test-
20190417-174951/ <- trace_name, trace_creation_datetime
└── ust <- domain
└── uid <- tracer_buffering_scheme
└── 1000 <- tracer_buffering_id
└── 64-bit <- architecture_bit_width
├── channel0_0
├── index
│ ├── channel0_0.idx
└── metadata
Per-pid buffering is quite similar.
auto-
20190722-174816 <- trace_name, trace_creation_datetime
└── ust <- domain
└── pid <- tracer_buffering_scheme
└── sample-ust-7640-
20190722-174818 <- procname, tracer_buffering_id, vpid_datetime
├── my-channel_0
├── index
│ ├── my-channel_0.idx
├── metadata
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Wed, 4 Sep 2019 22:30:00 +0000 (18:30 -0400)]
Fix: lttng_directory_handle_init fails on opening base relayd output
lttng_directory_handle_init, called from session_set_anonymous_chunk,
fails to open "$LTTNG_HOME/lttng-traces" directory as the folder is
not yet created at this time.
Rename init_session_output_directory_handle to
session_init_output_directory_handle and move it to session.h
For an anonymous chunk, the session->output_path is empty. The
resulting output directory handle is at the "root" node of the
lttng-relayd chosen output.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Wed, 4 Sep 2019 22:29:59 +0000 (18:29 -0400)]
Fix: init_session_output_path is valid for peer >= 2.11 only
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Wed, 4 Sep 2019 22:29:58 +0000 (18:29 -0400)]
Fix: chunk state is not set when relayd does not support trace chunks
Being explicit here enforces that for a lttng-relayd that does not
support chunks, the queried chunk will never exist.
This caused a lttng destroy command to hang during backward
compatibility testing with older lttng-relayd.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Wed, 28 Aug 2019 20:36:03 +0000 (16:36 -0400)]
Fix: check validity of a stream before invoking ust flush command
At the time ustctl_flush_buffer is called the ustream object might have
already been freed on lttng-ust side.
This can happen following a lttng_consumer_cleanup_relayd and concurrent
consumer flush command (lttng stop).
The chain of events goes as follows.
An error on communication with lttng-relayd occurs.
lttng_consumer_cleanup_relayd flags the streams for deletion
(CONSUMER_ENDPOINT_INACTIVE). validate_endpoint_status_data_stream calls
consumer_del_stream.
At the same time the hash table of streams is iterated over in the
flush_channel function following a stop command. The loop is iterating on
a given stream. The current thread is unscheduled before taking the stream
lock.
In the initial thread, the same stream is the current iteration of
cds_lfht_for_each_entry in validate_endpoint_status_data_stream.
consumer_del_stream is called on it. The stream lock is acquired, and
destroy_close_stream is called. lttng_ustconsumer_del_stream is eventually
called and at this point the ustream is freed.
Going back to the iteration in flush_channel. The current stream is still
valid from the point of view of the iteration, ustctl_flush_buffer is then
called on a freed ustream object.
This can lead to unknown behaviour since there is no validation on the
lttng-ust side. The underlying memory of the ustream object is garbage at
this point.
To prevent such scenario, we check for the presence of the node in the
hash table via cds_lfht_is_node_deleted while holding the stream lock.
This is valid because the stream destruction removes the node from
the hash table and frees the ustream object with the stream lock held.
This duplicate similar "validation" check of the stream object. [1][2]
[1] src/common/consumer/consumer.c:consumer_close_channel_streams
[2] src/common/ust-consumer/ust-consumer.c:close_metadata
This issue can be reproduced by the following scenario:
Modify flush_channel to sleep (i.e 10s) before acquiring the lock on
a stream.
Modify lttng-ust ustctl_destroy_stream to set the
ring_buffer_clock_read callback to NULL.
Note: An assert on !cds_lfht_is_node_deleted in flush channel
after acquiring the lock can provide the same information. We are
modifying the callback to simulate the original backtrace from our
customer.
lttng-relayd
lttng-sessiond
lttng create --live
lttng enable-event -u -a
lttng start
Start some applications to generate data.
lttng stop
The stop command force a flush of the channel/streams.
pkill -9 lttng-relayd
Expect assert or segfault
The original customer backtrace:
0 lib_ring_buffer_try_switch_slow (handle=<optimized out>, tsc=<synthetic pointer>, offsets=0x3fffa9b76c80, chan=0x3fff98006e90, buf=<optimized out>,
mode=<optimized out>) at /usr/src/debug/lttng-ust/2.9.1/git/libringbuffer/ring_buffer_frontend.c:1834
1 lib_ring_buffer_switch_slow (buf=0x3fff98016b40, mode=<optimized out>, handle=0x3fff98017670)
at /usr/src/debug/lttng-ust/2.9.1/git/libringbuffer/ring_buffer_frontend.c:1952
2 0x00003fffac680940 in ustctl_flush_buffer (stream=<optimized out>, producer_active=<optimized out>)
at /usr/src/debug/lttng-ust/2.9.1/git/liblttng-ust-ctl/ustctl.c:1568
3 0x0000000010031bc8 in flush_channel (chan_key=<optimized out>) at ust-consumer.c:772
4 lttng_ustconsumer_recv_cmd (ctx=<optimized out>, sock=<optimized out>, consumer_sockpoll=<optimized out>) at ust-consumer.c:1651
5 0x000000001000de50 in lttng_consumer_recv_cmd (ctx=<optimized out>, sock=<optimized out>, consumer_sockpoll=<optimized out>) at consumer.c:2011
6 0x0000000010014208 in consumer_thread_sessiond_poll (data=0x10079430) at consumer.c:3192
7 0x00003fffac608b30 in start_thread (arg=0x3fffa9b7bdb0) at pthread_create.c:462
8 0x00003fffac530d0c in .__clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:96
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Michael Jeanson [Mon, 3 Jun 2019 19:25:52 +0000 (15:25 -0400)]
Fix: namespace our gettid wrapper
Since glibc 2.30, a gettid wrapper was added that conflicts with our
static declaration. Namespace our wrapper so there is no conflict,
we'll add support for the glibc provided wrapper in a further commit.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
This page took 0.047763 seconds and 4 git commands to generate.