Jérémie Galarneau [Thu, 16 May 2019 16:18:10 +0000 (12:18 -0400)]
Remove unused bitfield.h header
There are no users of the bitfield.h header. It was previously
used to list syscalls from a kernel channel in 834978fd, but this
function was removed in 9897fbc9.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 12 Dec 2018 20:11:29 +0000 (15:11 -0500)]
Use uuid_to_str() when formatting metadata
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 12 Dec 2018 20:10:36 +0000 (15:10 -0500)]
Add an internal uuid formatting utility
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Thu, 18 Jul 2019 19:51:49 +0000 (15:51 -0400)]
Tests build fix: undefined MAGIC_VALUE macro
The MAGIC_VALUE macro is only defined when the build is configured
in epoll mode. This value should be defined unconditionally as it
is used in both poll and epoll test modes.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Thu, 18 Jul 2019 19:49:28 +0000 (15:49 -0400)]
Build fix: undeclared variable in poll compat
No `ipfd` variable exists in the compat_poll_wait function. The
author meant to use `idle_pfd`.
Reported-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Geneviève Bastien [Mon, 17 Jun 2019 16:56:21 +0000 (12:56 -0400)]
Fix: error when listing sessions with no session
lttng_list_sessions() returns a "fatal error" code when
lttng_ctl_ask_sessiond() returns 0. This was interpreted as the
control socket being shut down unexpectedly. However, it is
(more often) caused by no sessions being available. Given that, it
makes more sense to report that no sessions are available.
More clean-up/refactoring would be needed to report unexpected socket
shutdowns.
Fixes #1188
Signed-off-by: Geneviève Bastien <gbastien@versatic.net>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Fri, 24 May 2019 19:24:22 +0000 (15:24 -0400)]
Update version to v2.11.0-rc2
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Thu, 23 May 2019 18:11:35 +0000 (14:11 -0400)]
Update base test for binding
This test is not run for now as it is not part of the test suite.
Use a temporary directory to store the trace.
Split into two test suites, one for UST and the other for kernel.
Partially fix formatting.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Thu, 23 May 2019 18:02:26 +0000 (14:02 -0400)]
Fix: python binding: expose domain buffer type
On enable_channel, the domain buffer type is used to create a temporary
channel. This currently fails for kernel channels since the buffer type is
not exposed at the binding level and defaults to LTTNG_BUFFER_PER_PID.
Channels for the kernel domain can only be created in LTTNG_BUFFER_GLOBAL
mode.
Exposing the buffer type also allows userspace channels to use the per-uid
buffering scheme.
The current bindings are in a rough state. This is to at least get them
to work with the kernel domain.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Yannick Lamarre [Thu, 25 Apr 2019 22:23:34 +0000 (18:23 -0400)]
Clean-up: correct typo from epoll to poll
Signed-off-by: Yannick Lamarre <ylamarre@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Yannick Lamarre [Thu, 25 Apr 2019 22:23:33 +0000 (18:23 -0400)]
Clean code base from redundant verification
Remove redundant checks for file descriptors with 0 revents from the
code base.
Signed-off-by: Yannick Lamarre <ylamarre@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Yannick Lamarre [Thu, 25 Apr 2019 22:23:32 +0000 (18:23 -0400)]
Change lttng_poll_wait behaviour of compat-poll to match compat-epoll
This removes the need to check for idle file descriptors and mitigates
the risk of bugs caused by a behaviour mismatch.
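To illustrate the epoll-like contract described above, here is a minimal sketch, assuming a plain poll() backend: after the call, only descriptors with non-zero revents are handed to the caller, so callers never have to skip idle entries. The function name poll_wait_ready_only() and its signature are hypothetical and do not reflect the actual compat-poll code.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>

/*
 * Hypothetical helper: wait on `fds` with poll(), then copy every entry
 * with non-zero revents into `ready`, so the caller iterates over ready
 * descriptors only, as it would after epoll_wait().
 * Returns the number of ready entries, or -1 on error.
 */
int poll_wait_ready_only(struct pollfd *fds, nfds_t nfds,
		struct pollfd *ready, int timeout_ms)
{
	int ret = poll(fds, nfds, timeout_ms);
	int nb_ready = 0;

	if (ret < 0) {
		return -1;
	}

	/* poll() returns the number of entries with non-zero revents. */
	for (nfds_t i = 0; i < nfds && nb_ready < ret; i++) {
		if (fds[i].revents == 0) {
			continue; /* Idle fd: never exposed to the caller. */
		}
		ready[nb_ready++] = fds[i];
	}

	return nb_ready;
}
```

With this shape, the count returned by the wait call is also the number of entries the caller may iterate over, which is the same guarantee epoll_wait() gives.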
Signed-off-by: Yannick Lamarre <ylamarre@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Yannick Lamarre [Thu, 25 Apr 2019 22:23:31 +0000 (18:23 -0400)]
Fix: hang in thread_rotation when using compat-poll
Add a missing check to prevent a blocking read on an empty fd.
Signed-off-by: Yannick Lamarre <ylamarre@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Yannick Lamarre [Thu, 25 Apr 2019 22:23:30 +0000 (18:23 -0400)]
Adapt poll layer behaviour to match the epoll layer
Signed-off-by: Yannick Lamarre <ylamarre@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Yannick Lamarre [Thu, 25 Apr 2019 22:23:29 +0000 (18:23 -0400)]
Change LTTNG_POLL_GETNB behaviour for poll flavor
Modify LTTNG_POLL_GETNB to provide compatibility with the epoll flavor.
Since it is only used after an lttng_poll_wait call, with no modification
(add, del, mod) in between, this change does not alter the behaviour in
its current usage while ensuring similar API behaviour between
compatibility layer implementations.
Signed-off-by: Yannick Lamarre <ylamarre@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Yannick Lamarre [Thu, 25 Apr 2019 22:23:28 +0000 (18:23 -0400)]
Add Unit test to poll compatibility layer
Signed-off-by: Yannick Lamarre <ylamarre@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Yannick Lamarre [Thu, 25 Apr 2019 22:23:27 +0000 (18:23 -0400)]
Fix: lttng_poll_mod calls compat_(e)poll_add
lttng_poll_mod should call compat_(e)poll_mod.
Signed-off-by: Yannick Lamarre <ylamarre@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Tue, 30 Apr 2019 23:09:24 +0000 (19:09 -0400)]
Fix: getenv can return null
On systems where LANG is not defined, getenv() returns NULL.
An example of such a system is the LAVA runner used by ci.lttng.org.
https://ci.lttng.org/view/System%20Tests/
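A minimal sketch of the defensive pattern, assuming a hypothetical getenv_or_default() helper (not part of lttng-tools): the NULL return of getenv() is checked before the value is ever used.

```c
#include <stdlib.h>

/*
 * Hypothetical helper: return the value of environment variable `name`,
 * or `fallback` when it is not set. getenv() returns NULL for unset
 * variables, so its result must not reach strcmp(), strlen(), etc.
 * without this check.
 */
const char *getenv_or_default(const char *name, const char *fallback)
{
	const char *value = getenv(name);

	return value ? value : fallback;
}
```

A caller can then write getenv_or_default("LANG", "C") instead of dereferencing the result of getenv("LANG") directly.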
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Mathieu Desnoyers [Tue, 30 Apr 2019 18:59:51 +0000 (14:59 -0400)]
Bump LTTNG_UST_ABI to 8.0
See commit 6c737d05 in lttng-ust for the rationale behind this
change.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Sun, 28 Apr 2019 22:06:11 +0000 (18:06 -0400)]
Fix: directory handle credentials parameter is not const
There is no reason for the "as user" operations on a directory
handle not to take the credentials as a const parameter. Not
passing credentials as const makes their ownership ambiguous
and makes it harder to write const-correct code.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Geneviève Bastien [Thu, 4 Apr 2019 19:18:07 +0000 (15:18 -0400)]
doc: Add reference to USDT probes
SDT probes are known by most as USDT probes. People may be looking for them
by that name.
Reviewed-by: Philippe Proulx <eeppeliteloop@gmail.com>
Signed-off-by: Geneviève Bastien <gbastien+lttng@versatic.net>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Yannick Lamarre [Thu, 28 Mar 2019 19:07:43 +0000 (15:07 -0400)]
Clean-up: Remove double buffer initialisation
Signed-off-by: Yannick Lamarre <ylamarre@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Mon, 1 Apr 2019 20:33:41 +0000 (16:33 -0400)]
Fix: getgrnam is not MT-Safe, use getgrnam_r
Running the test suite under a Yocto musl build resulted in musl
core dumps due to a double free.
We get the following backtraces:
0 a_crash () at ./arch/x86_64/atomic_arch.h:108
1 unmap_chunk (self=<optimized out>) at src/malloc/malloc.c:515
2 free (p=<optimized out>) at src/malloc/malloc.c:526
3 0x00007f46d9dc3849 in __getgrent_a (f=f@entry=0x7f46d9d1f7e0, gr=gr@entry=0x7f46d9e24460 <gr>, line=line@entry=0x7f46d9e26058 <line>, size=size@entry=0x7f46d92db550, mem=mem@entry=0x7f46d9e26050 <mem>, nmem=nmem@entry=0x7f46d92db558, res=0x7f46d92db548) at src/passwd/getgrent_a.c:45
4 0x00007f46d9dc2e6b in __getgr_a (name=0x487242 "tracing", gid=gid@entry=0, gr=gr@entry=0x7f46d9e24460 <gr>, buf=buf@entry=0x7f46d9e26058 <line>, size=size@entry=0x7f46d92db550, mem=mem@entry=0x7f46d9e26050 <mem>, nmem=0x7f46d92db558, res=0x7f46d92db548) at src/passwd/getgr_a.c:30
5 0x00007f46d9dc3733 in getgrnam (name=<optimized out>) at src/passwd/getgrent.c:37
6 0x0000000000460b29 in utils_get_group_id (name=<optimized out>) at ../../../lttng-tools-2.10.6/src/common/utils.c:1241
7 0x000000000044ee69 in thread_manage_health (data=<optimized out>) at ../../../../lttng-tools-2.10.6/src/bin/lttng-sessiond/main.c:4115
8 0x00007f46d9de1541 in start (p=<optimized out>) at src/thread/pthread_create.c:195
9 0x00007f46d9dee661 in __clone () at src/thread/x86_64/clone.s:22
From another run:
0 a_crash () at ./arch/x86_64/atomic_arch.h:108
1 unmap_chunk (self=<optimized out>) at src/malloc/malloc.c:515
2 free (p=<optimized out>) at src/malloc/malloc.c:526
3 0x00007f5abc210849 in __getgrent_a (f=f@entry=0x7f5abc2733e0, gr=gr@entry=0x7f5abc271460 <gr>, line=line@entry=0x7f5abc273058 <line>, size=size@entry=0x7f5abaef5510, mem=mem@entry=0x7f5abc273050 <mem>, nmem=nmem@entry=0x7f5abaef5518, res=0x7f5abaef5508) at src/passwd/getgrent_a.c:45
4 0x00007f5abc20fe6b in __getgr_a (name=0x487242 "tracing", gid=gid@entry=0, gr=gr@entry=0x7f5abc271460 <gr>, buf=buf@entry=0x7f5abc273058 <line>, size=size@entry=0x7f5abaef5510, mem=mem@entry=0x7f5abc273050 <mem>, nmem=0x7f5abaef5518, res=0x7f5abaef5508) at src/passwd/getgr_a.c:30
5 0x00007f5abc210733 in getgrnam (name=<optimized out>) at src/passwd/getgrent.c:37
6 0x0000000000460b29 in utils_get_group_id (name=<optimized out>) at ../../../lttng-tools-2.10.6/src/common/utils.c:1241
7 0x000000000042dee4 in notification_channel_socket_create () at ../../../../lttng-tools-2.10.6/src/bin/lttng-sessiond/notification-thread.c:238
8 init_thread_state (state=0x7f5abaef5560, handle=0x7f5abbf9be40) at ../../../../lttng-tools-2.10.6/src/bin/lttng-sessiond/notification-thread.c:375
9 thread_notification (data=0x7f5abbf9be40) at ../../../../lttng-tools-2.10.6/src/bin/lttng-sessiond/notification-thread.c:495
10 0x00007f5abc22e541 in start (p=<optimized out>) at src/thread/pthread_create.c:195
11 0x00007f5abc23b661 in __clone () at src/thread/x86_64/clone.s:22
The problem was easily reproducible (~6 crashes in ~300 runs). A prototype
fix using a mutex around getgrnam() yielded no crashes in over 1000 runs.
This patch yielded the same results as the prototype fix.
Unfortunately, we cannot rely on a mutex in liblttng-ctl since we cannot
enforce the locking in applications using the library.
Use getgrnam_r instead.
The previous implementation of utils_get_group_id returned the gid of
the root group (0) on error/not found. lttng_check_tracing_group needs
to know whether an error occurred or the group was not found; returning
the root group is not enough. We now return the gid via the passed
parameter. The caller is responsible for either defaulting to the root
group or propagating the error.
We also do not want to warn when used in a liblttng-ctl context. We might
want to move the warning elsewhere in the future. For now, pass a bool
indicating whether or not to warn.
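The following is a minimal sketch of the approach described above, not the actual utils_get_group_id() implementation: getgrnam_r() is called with a caller-owned buffer that is grown on ERANGE, the gid is returned through an out parameter, and a bool controls the warning. The get_group_id() name and the 1024-byte fallback buffer size are assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <grp.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical thread-safe group lookup. On success, the gid is stored in
 * `*gid` and 0 is returned. If the group is not found or an error occurs,
 * -1 is returned and `*gid` is left untouched: the caller decides whether
 * to default to the root group or to propagate the error.
 */
int get_group_id(const char *name, bool warn, gid_t *gid)
{
	struct group grp, *result = NULL;
	long initial = sysconf(_SC_GETGR_R_SIZE_MAX);
	size_t buflen = initial > 0 ? (size_t) initial : 1024;
	char *buf = NULL;
	int ret;

	for (;;) {
		char *new_buf = realloc(buf, buflen);

		if (!new_buf) {
			ret = ENOMEM;
			break;
		}
		buf = new_buf;
		ret = getgrnam_r(name, &grp, buf, buflen, &result);
		if (ret != ERANGE) {
			break;
		}
		buflen *= 2; /* Buffer too small: grow it and retry. */
	}

	if (ret == 0 && result) {
		*gid = result->gr_gid;
		free(buf);
		return 0;
	}

	if (warn) {
		if (ret) {
			fprintf(stderr, "getgrnam_r(\"%s\"): %s\n", name, strerror(ret));
		} else {
			fprintf(stderr, "Group \"%s\" not found\n", name);
		}
	}
	free(buf);
	return -1;
}
```

All state lives in locals and a heap buffer owned by the call, so concurrent callers never share the static storage that getgrnam() may use, which is the non-MT-safe behaviour behind the crashes above.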
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Mathieu Desnoyers [Tue, 2 Apr 2019 17:41:17 +0000 (13:41 -0400)]
Fix: logging: log_add_time() save/restore errno
The debugging logging macros (e.g. DBG()) are used like printf in the
lttng-tools source files. The printf() implementation does not alter the
errno value, so the fact that log_add_time() (through clock_gettime())
can alter errno is unexpected. For instance, adding a logging statement
for debugging purposes within a function for which errno is expected to
stay unchanged on return will change the behaviour between verbose (-vvv)
and non-verbose executions.
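A minimal sketch of the save/restore pattern, assuming a hypothetical format_log_time() helper rather than the actual log_add_time() code: errno is captured on entry and restored before returning, whatever clock_gettime() or snprintf() did in between.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <time.h>

/*
 * Hypothetical log-prefix helper: format the current CLOCK_REALTIME time
 * into `buf` while preserving the caller's errno, so that enabling verbose
 * logging cannot change program behaviour.
 */
void format_log_time(char *buf, size_t len)
{
	int saved_errno = errno;
	struct timespec ts;

	if (clock_gettime(CLOCK_REALTIME, &ts) == 0) {
		snprintf(buf, len, "%lld.%09ld", (long long) ts.tv_sec, ts.tv_nsec);
	} else if (len > 0) {
		buf[0] = '\0';
	}

	/* Undo any errno change made by clock_gettime() or snprintf(). */
	errno = saved_errno;
}
```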
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Mathieu Desnoyers [Wed, 24 Apr 2019 22:56:05 +0000 (18:56 -0400)]
Fix relayd: initialize beacon to -1ULL
The relayd stream beacon_ts_end field is expected to have the value
-1ULL when unset (no beacon has been received since the last index).
However, its initial state is wrong: it is left at the value 0, which
indicates that a live beacon has been received (which is untrue).
This in turn causes a live beacon with a ctf_stream_id of -1ULL to be
sent to babeltrace, which does not expect it, and fails.
This issue can be triggered with the following scenario:
1) create live session
2) setup UST per-uid buffers tracing
3) start tracing, without any active traced application
4) hook with babeltrace live client to view the trace
5) run a traced application
Step 5) will cause the babeltrace live client to receive a stream_id of
-1ULL, and error out.
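A minimal sketch of the initialization fix, using a hypothetical, trimmed-down stream structure (the real relayd stream has many more fields): the -1ULL sentinel is set explicitly at creation instead of relying on zero-initialization.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down stand-in for a relayd stream. */
struct relay_stream_sketch {
	uint64_t stream_id;
	uint64_t beacon_ts_end; /* -1ULL means "no beacon since last index". */
};

struct relay_stream_sketch *stream_sketch_create(uint64_t stream_id)
{
	struct relay_stream_sketch *stream = calloc(1, sizeof(*stream));

	if (!stream) {
		return NULL;
	}
	stream->stream_id = stream_id;
	/*
	 * calloc() leaves beacon_ts_end at 0, which would wrongly mean that
	 * a beacon was received; set the "unset" sentinel explicitly.
	 */
	stream->beacon_ts_end = -1ULL;
	return stream;
}

bool stream_sketch_has_beacon(const struct relay_stream_sketch *stream)
{
	return stream->beacon_ts_end != -1ULL;
}
```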
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Mathieu Desnoyers [Wed, 3 Apr 2019 20:26:45 +0000 (16:26 -0400)]
Fix: relayd: handling of lttng_read errors >= 0
errno is only set when lttng_read returns a negative value. Otherwise, we
need to print an ERR() statement rather than use PERROR().
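A minimal sketch of the error-reporting rule, assuming a hypothetical read_full() stand-in for lttng_read() that returns -1 with errno set on failure and a short count on end of file: only the negative case consults errno (PERROR()-style), while a short read is reported without it (ERR()-style).

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical lttng_read()-like loop: -1 on error, short count on EOF. */
static ssize_t read_full(int fd, void *buf, size_t len)
{
	size_t done = 0;

	while (done < len) {
		ssize_t r = read(fd, (char *) buf + done, len - done);

		if (r < 0) {
			if (errno == EINTR) {
				continue;
			}
			return -1;
		}
		if (r == 0) {
			break; /* End of file. */
		}
		done += (size_t) r;
	}
	return (ssize_t) done;
}

/* Returns 0 if exactly `len` bytes were read, -1 otherwise. */
int read_exact_or_report(int fd, void *buf, size_t len)
{
	ssize_t ret = read_full(fd, buf, len);

	if (ret < 0) {
		/* errno is meaningful only here. */
		fprintf(stderr, "read: %s\n", strerror(errno));
		return -1;
	}
	if ((size_t) ret < len) {
		/* errno is stale here: report the short read without it. */
		fprintf(stderr, "Short read: got %zd of %zu bytes\n", ret, len);
		return -1;
	}
	return 0;
}
```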
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Michael Jeanson [Tue, 16 Apr 2019 20:43:48 +0000 (16:43 -0400)]
Harmonize pprint macro across projects
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Michael Jeanson [Tue, 16 Apr 2019 20:43:47 +0000 (16:43 -0400)]
Update the ac_define_dir macro from the autoconf archive
This macro was removed from the archive many years ago because it crosses
the boundary between configure-time and make-time variables. For the
moment, update it to the latest released version.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Michael Jeanson [Tue, 16 Apr 2019 20:43:46 +0000 (16:43 -0400)]
Harmonize rw_prog_cxx_works macro across projects
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Michael Jeanson [Tue, 16 Apr 2019 20:43:45 +0000 (16:43 -0400)]
Namespace check_sdt_works custom macro
The ax_ prefix is for macros that are copied from the
Autoconf archive project.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Michael Jeanson [Tue, 16 Apr 2019 20:43:44 +0000 (16:43 -0400)]
Update macros from the autoconf archive
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 24 Apr 2019 20:06:01 +0000 (16:06 -0400)]
Fix: relayd not spawned on default-url live session creation
b178f53e9 introduced a regression that causes the lttng client to
not spawn a relay daemon automatically when a live session is
created using the default url parameters.
This fix re-introduces an equivalent check to restore the
previous behaviour.
Reported-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 24 Apr 2019 20:05:36 +0000 (16:05 -0400)]
Clean-up: remove empty line in lttng create command
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 17 Apr 2019 20:55:27 +0000 (16:55 -0400)]
Add mkdirat utils and runas wrappers
The lttng_directory_handle allows its user to keep a handle to
a directory and to create subdirectories relative to it.
On platforms implementing POSIX.1-2008, a directory file descriptor
is used to maintain a handle to an existing directory and is used
in conjunction with mkdirat() to create subdirectories.
Derelict platforms (such as Solaris 10) use an alternative
implementation which carries the location of the root directory
and builds subdirectory paths on creation.
The existing mkdir utils are re-implemented using this new
interface (using the special AT_FDCWD file descriptor value, when
applicable) to limit code duplication.
The implementation of the directory handle and its users is
automatically selected based on the presence of the dirfd() function,
but can also be explicitly chosen using the --enable/disable-dirfd
configuration option.
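A minimal sketch of the file-descriptor flavour of such a directory handle, with hypothetical names (dir_handle_sketch and its functions are not the lttng_directory_handle API): the handle keeps a directory file descriptor and mkdirat() creates subdirectories relative to it, while AT_FDCWD stands for the current working directory.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical minimal directory handle. */
struct dir_handle_sketch {
	int dirfd;
};

/* A NULL path yields a handle on the current working directory. */
int dir_handle_init(struct dir_handle_sketch *handle, const char *path)
{
	if (!path) {
		handle->dirfd = AT_FDCWD;
		return 0;
	}
	handle->dirfd = open(path, O_RDONLY | O_DIRECTORY);
	return handle->dirfd < 0 ? -1 : 0;
}

/* Create `subdir` relative to the directory held by the handle. */
int dir_handle_mkdir(const struct dir_handle_sketch *handle,
		const char *subdir, mode_t mode)
{
	return mkdirat(handle->dirfd, subdir, mode);
}

void dir_handle_fini(struct dir_handle_sketch *handle)
{
	/* AT_FDCWD is not a real descriptor and must not be closed. */
	if (handle->dirfd != AT_FDCWD && handle->dirfd >= 0) {
		close(handle->dirfd);
	}
	handle->dirfd = -1;
}
```

Keeping the descriptor open means later mkdirat() calls stay relative to the same directory even if it is renamed, which is the main benefit over rebuilding full paths on platforms without dirfd().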
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Thu, 11 Apr 2019 17:54:57 +0000 (13:54 -0400)]
Clean-up: remove commented code from test
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 10 Apr 2019 20:46:35 +0000 (16:46 -0400)]
Fix tests: NULL pointer dereference in ltt_session unit tests
Skip the session destruction test if the target session is not
found. Otherwise, a NULL pointer dereference will occur.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 10 Apr 2019 20:37:42 +0000 (16:37 -0400)]
Fix tests: NULL pointer dereference in ust channel unit tests
The test_create_ust_channel() test case erroneously checks for
a NULL session instead of a channel. This can result in a
NULL pointer dereference on failure to create a ust channel.
The scope of usess is reduced to prevent similar mistakes in the
future. Moving 'dom' has made it obvious that this variable is
unused. Hence, it is removed.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 10 Apr 2019 20:25:20 +0000 (16:25 -0400)]
Fix tests: NULL pointer dereference in ltt_ust_context unit tests
The check for the expected context's type must be skipped when
trace_ust_create_context() fails. Otherwise, a NULL pointer
dereference will occur.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 10 Apr 2019 20:16:43 +0000 (16:16 -0400)]
Fix tests: NULL pointer dereference in ltt_session unit tests
The check for a NULL kernel session must be skipped when
session_find_by_name() fails to find a session; otherwise, a NULL
pointer dereference will occur.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 10 Apr 2019 19:28:15 +0000 (15:28 -0400)]
Log the wait-shm's path on shm_open failure
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Mon, 8 Apr 2019 19:44:01 +0000 (15:44 -0400)]
Generate session name and default output on sessiond's end
The lttng client currently generates the default session name and
output parameters. This has, over time, resulted in a number of
problems. Notably, it is possible for scripts to create sessions
too quickly using automatically-generated session names that would
clash, since the session's creation timestamp is the only variable part
of an "automatic" session name. Hence, sessions created within the same
second would clash and result in spurious session creation failures.
More importantly, generating session names and outputs on the client
end makes it impossible to reliably differentiate output locations that
were automatically generated vs. those that were explicitly provided.
This causes destinations to be "opaque" to the LTTng daemons as
the subdir, session name, and session's creation timestamp are all
"cooked" as part of the output destination path/subdir. Keeping these
path components separate will make it easier to implement output path
configurations that allow the grouping of session outputs by name, by
host, etc.
Since a session's creation time is used as part of its shm-path, an
accessor to the session's creation time is added to the public API:
lttng_session_get_creation_time(). This creation time attribute can be
accessed when an lttng_session structure is created using the session
listing API.
Note that existing session creation functions are preserved to
maintain binary compatibility with existing liblttng-ctl users.
The session creation functions are reimplemented on top of this
newly-introduced API. The only function for which compatibility is
dropped is the hidden _lttng_create_session_ext().
Overhaul of path separation
---
Not generating paths on the client-end has uncovered a number
of problems in the path handling of the session daemon, especially
when a network output was used. A lot of code presumed that a
network session would be created with a URL containing a sub-directory
of the form "session_name-timestamp". While this is true for
remote sessions created by the lttng client, a sub-directory is
not required when liblttng-ctl is used directly.
Hence, this commit ensures that session directories are split
into a base path, a chunk directory, a domain directory, and an
application directory.
A number of changes in this fix ensure that a session's base path
contains everything up to the "session" path element _or_ up to
the user-specified output directory.
For example, when creating a local session using the default output
settings, the session base output path is:
/home/user/lttng-traces/session-timestamp
When creating a remote session using the default output settings, the
session base output path is:
/hostname/session-timestamp/
Using custom output directories, whether locally or remotely, causes
the session base path to be set to that custom output directory.
For example, using a local output path of /tmp/my_path will result
in a session base path of the form:
/tmp/my_path
Whereas creating a session with a network output of
net://localhost/my_path will result in a session base path of the
form:
/hostname/my_path
Another problematic element is the subdir of the kernel_session
and ust_session consumer outputs, which, in different scenarios,
contained chunk names and arbitrary parts of the path hierarchy.
The consumer output subdir has been renamed to 'domain_subdir'
and now only ever contains: "kernel/", "ust/", or "".
Finally, the chunk_path session attribute only contains the name
of the current chunk directory being produced.
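To make the separation concrete, here is a minimal sketch, assuming a hypothetical compose_output_path() helper, that joins the four components (base path, chunk directory, domain subdirectory, application directory) while skipping empty ones, such as the chunk directory before any rotation. It is an illustration of the layout, not the session daemon's code.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: join path components with '/', skipping empty
 * components and trailing separators (e.g. the domain subdir "ust/").
 * Returns 0 on success, -1 if the result does not fit in `out`.
 */
int compose_output_path(char *out, size_t len,
		const char *const parts[], size_t nb_parts)
{
	size_t used = 0;

	if (len == 0) {
		return -1;
	}
	out[0] = '\0';
	for (size_t i = 0; i < nb_parts; i++) {
		const char *part = parts[i];
		size_t part_len = part ? strlen(part) : 0;
		int ret;

		while (part_len > 0 && part[part_len - 1] == '/') {
			part_len--; /* Drop trailing separators. */
		}
		if (part_len == 0) {
			continue; /* Empty component, e.g. no chunk directory yet. */
		}
		ret = snprintf(out + used, len - used, "%s%.*s",
				(used > 0 && part[0] != '/') ? "/" : "",
				(int) part_len, part);
		if (ret < 0 || (size_t) ret >= len - used) {
			return -1; /* Truncated. */
		}
		used += (size_t) ret;
	}
	return 0;
}
```

With the illustrative components "/home/user/lttng-traces/my-session-20190408-154401", "" (no chunk directory), "ust/" and "uid/1000/64-bit", the result is "/home/user/lttng-traces/my-session-20190408-154401/ust/uid/1000/64-bit".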
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Mon, 1 Apr 2019 22:03:57 +0000 (18:03 -0400)]
Move completed trace archive chunks to an "archives" sub-folder
Users have expressed the desire to read all completed trace archive
chunks at once. Initially, the requirement called for users to wait,
using the notification API, for chunks to be made available following
the completion of a rotation. Upon the reception of a rotation
completion notification, which contains the chunk's location, it would
be possible to safely consume the resulting trace archive chunk.
Given that using the notification API was deemed too complex for
certain users, it was decided that the creation of a folder of the
form "<start-time>-<end-time>-<idx>" must guarantee the readability
of its contents. This guarantee is currently not honored for the first
trace archive chunk. This is a known bug which will be addressed
before the release.
This requirement has evolved and it must now be possible for users to
point a reader to a session output directory, after an arbitrary
number of rotations, to consume all completed trace archive chunks
while tracing is ongoing.
To make this possible (and reliable), a reader must have a way to
infer which trace archive chunks are still being produced, and which
have been completed so as to not attempt to consume a trace that
is still being produced.
First, a quick refresher on the hierarchy of a session output path.
Before a rotation occurs, a session output path has the following
hierarchy:
my_trace
|__ust
|__[...]
|__kernel
|__[...]
And, after four completed rotations:
my_trace
|__<start-time>-<end-time>-0 <--- completed trace archive chunk
|__<start-time>-<end-time>-1
|__<start-time>-<end-time>-2
|__<start-time>-<end-time>-3
|__<start-time>-4 <--- trace archive being produced
As a consequence of this behaviour, it is not possible to safely point
a CTF reader to a trace output directory without a special
configuration option that would indicate that the reader should ignore
the usual lttng hierarchy ('ust' and 'kernel'). Indeed, it is not
possible to distinguish a completed trace from a trace being produced
before the completion of the first rotation of a session.
Moreover, relying on the format of the name of trace archive chunks to
infer their completeness is awfully restrictive in terms of our
ability to alter the trace archive chunk directory format in the
future.
This format is also not part of any CTF specification document, meaning
that implementing this logic in a reference CTF
implementation (babeltrace) is ill-advised. It would be unexpected for
a reader to fail to read a trace simply because it is stored in a
directory of the form <ISO8601-timestamp>-<integer>, which is a
valid trace output directory.
Hence, this commit changes the hierarchy of the trace output directory
so that completed (and safe to read) trace archive chunks are stored
in an "archives" sub-folder under the tracing session output
directory. This results in the following trace output directory
hierarchy:
my_trace
|__archives
| |__<start-time>-<end-time>-0 <--- completed trace archive chunk
| |__<start-time>-<end-time>-1
| |__<start-time>-<end-time>-2
| |__<start-time>-<end-time>-3
|__<start-time>-4 <--- trace archive being produced
Pointing to the "archives" directory requires no LTTng-specific logic
to be implemented in the reader(s) to achieve the users' requirement.
It does require that users wait for the presence of an "archives"
directory in the trace output path.
Reference shell code is provided to implement such a check:
"""
if [ -d "${trace_output_path}/archives" ]; then
# invoke reader
fi
"""
Given that a user would have to wait for a first completed chunk to
appear or ensure (in whichever way) that a rotation has occurred before
consuming trace archive chunks, this alternative does not seem more
complex.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 10 Apr 2019 03:38:43 +0000 (23:38 -0400)]
Fix: lttng_rotate_session does not handle socket close
lttng_ctl_ask_sessiond may return 0 if the sessiond process is killed
or if its client socket is closed unexpectedly. This causes
lttng_rotate_session to assume a rotation command reply has
been received, resulting in a NULL pointer dereference later on.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Mon, 8 Apr 2019 20:03:17 +0000 (16:03 -0400)]
Fix: hide internal libcommon time utilities
libcommon's time-handling utilities are used by liblttng-ctl.
Like other symbols of libcommon, they must be marked as hidden
to prevent them from being exported as part of the liblttng-ctl
interface.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Sun, 7 Apr 2019 18:39:08 +0000 (14:39 -0400)]
lttng: make the configuration file interface const correct
The interface defined in conf.h is not const-correct, thus hindering
the use of const-correct code in the CLI client.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Sat, 6 Apr 2019 19:16:38 +0000 (15:16 -0400)]
Fix: command reply message is leaked for variable-len replies
Commands which return a variable-length payload re-setup the
command context using setup_lttng_msg() (and its wrappers).
In doing so, the lttcomm_lttng_msg structure (plus its trailing
variable-length payload) is re-allocated. However, the previous
instance of lttcomm_lttng_msg is leaked.
This is solved by free()-ing the original lttcomm_lttng_msg when
setup_lttng_msg() is used. When it is only used once, free() is
called on a NULL pointer, which has no effect.
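A minimal sketch of the fix, using hypothetical simplified structures in place of the command context and lttcomm_lttng_msg: the previous reply is freed before a new one is allocated, which is safe on the first call because free(NULL) is a no-op.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical reply message with a trailing variable-length payload. */
struct reply_msg_sketch {
	size_t payload_len;
	char payload[];
};

/* Hypothetical command context; `reply` is NULL until the first setup. */
struct cmd_ctx_sketch {
	struct reply_msg_sketch *reply;
};

/* (Re-)allocate the reply; calling this more than once does not leak. */
int setup_reply_sketch(struct cmd_ctx_sketch *ctx,
		const void *payload, size_t len)
{
	free(ctx->reply); /* No-op on the first call (NULL). */
	ctx->reply = malloc(sizeof(*ctx->reply) + len);
	if (!ctx->reply) {
		return -1;
	}
	ctx->reply->payload_len = len;
	if (len > 0) {
		memcpy(ctx->reply->payload, payload, len);
	}
	return 0;
}
```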
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Michael Jeanson [Wed, 20 Mar 2019 21:49:00 +0000 (17:49 -0400)]
Fix: skip test when ust doesn't have perf support
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Mon, 25 Mar 2019 18:49:39 +0000 (14:49 -0400)]
Tests: check for lttng-modules presence
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Yannick Lamarre [Tue, 26 Mar 2019 19:53:06 +0000 (15:53 -0400)]
Fix: Properly sanitize input parameter
The lttng client now uses the size of the containing buffer, defined as
LTTNG_SYMBOL_NAME_LEN, for input string sanitization instead of the
libc-defined macro NAME_MAX. lttng-enable_channel improperly verified
user input and wrongly discarded valid input when NAME_MAX was less
than the size of the containing buffer for the channel's name.
This patch also fixes a potential buffer overflow caused by an improperly
bounded strcpy in the case where NAME_MAX would have been greater than
LTTNG_SYMBOL_NAME_LEN.
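A minimal sketch of the corrected validation, assuming a hypothetical channel attribute structure and an assumed buffer length of 256 standing in for LTTNG_SYMBOL_NAME_LEN: the length check uses sizeof() of the destination buffer itself, so the copy that follows can never overflow it, regardless of the platform's NAME_MAX.

```c
#include <string.h>

/* Assumed value, standing in for LTTNG_SYMBOL_NAME_LEN. */
#define SYMBOL_NAME_LEN 256

/* Hypothetical, trimmed-down channel attributes. */
struct channel_attr_sketch {
	char name[SYMBOL_NAME_LEN];
};

/*
 * Returns 0 on success, -1 if `name` plus its NUL terminator does not
 * fit in the destination buffer.
 */
int set_channel_name(struct channel_attr_sketch *attr, const char *name)
{
	if (strlen(name) >= sizeof(attr->name)) {
		return -1;
	}
	strcpy(attr->name, name); /* Bounded by the check above. */
	return 0;
}
```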
Signed-off-by: Yannick Lamarre <ylamarre@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Michael Jeanson [Tue, 19 Mar 2019 20:56:03 +0000 (16:56 -0400)]
Fix tests: link libpause_consumer on liblttng-ctl
This preload test library uses symbols from liblttng-ctl which are
resolved when preloaded by GLIBC but not by MUSL.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Michael Jeanson [Wed, 13 Mar 2019 21:50:55 +0000 (17:50 -0400)]
tap-driver.sh: flush stdout after each test result
This is useful in a CI system where stdout is fully buffered and you
look at the console output to see which test is hanging.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Thu, 28 Mar 2019 15:18:38 +0000 (11:18 -0400)]
Fix tests: snapshot size validation failure runs too many test cases
The snapshot max size test is reported as both passing and failing
when the test case fails.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Tue, 12 Mar 2019 18:30:31 +0000 (14:30 -0400)]
Fix tests: the tree origin can be a symlink itself
Problem:
The base tree is defined as "/tmp/.....XXXXXX".
On systems where "/tmp/" is itself a symlink, utils_expand_path will
expand the tree origin itself.
For example, on a base core-image-minimal Yocto build, /tmp is a symlink
to "/var/tmp", which is a symlink to "/var/volatile".
utils_expand_path will return something like this for the symlink test:
"/var/volatile/.....XXXXXX/...." which is the valid result.
Solution:
Simply use realpath on the tree_origin and use this path to perform the
test validation.
This work was performed as part of the effort to fully support Yocto and
to be able to run the test suite to detect problems as early as possible.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Tue, 12 Mar 2019 18:30:30 +0000 (14:30 -0400)]
Fix tests: skip test_getcpu_override on single core systems
There is no value in performing this test on single-core systems
since the only valid value for the cpu field is 0.
This test currently fails on single-core systems (e.g. Yocto runqemu)
on the test_getcpu_override_fail test case.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Yannick Lamarre [Fri, 22 Feb 2019 19:33:38 +0000 (14:33 -0500)]
Enforce DL_LIBS value instead of hard coded -ldl
Generated makefiles would ignore the DL_LIBS value selected by the
configure script and use the hard-coded value -ldl. Generated makefiles
will now use DL_LIBS.
Refs: #1165
Signed-off-by: Yannick Lamarre <ylamarre@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Yannick Lamarre [Fri, 22 Feb 2019 19:33:37 +0000 (14:33 -0500)]
Fix: Add POPT_CFLAGS to lttng_CFLAGS
The generated makefile was ignoring POPT_CFLAGS when compiling
lttng, but was adding POPT_LIBS to lttng_LDADD. With this commit,
make now honors both settings for applications and tests.
Fixes: #1165
Signed-off-by: Yannick Lamarre <ylamarre@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Mathieu Desnoyers [Tue, 19 Feb 2019 22:47:49 +0000 (17:47 -0500)]
Fix: consumer snapshot: handle unsigned long overflow
Comparing the consumed iterator and the produced position without
using a difference generates an empty snapshot when the iterator is
before unsigned long overflow and the produced position is after
unsigned long overflow.
This applies to both UST and kernel consumers.
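A minimal illustration of the difference-based comparison, assuming both positions are free-running unsigned long counters. The function names are hypothetical and not the consumer's actual code.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Buggy: wrong once 'produced' has wrapped past ULONG_MAX but 'consumed' has not. */
static bool data_pending_naive(unsigned long consumed, unsigned long produced)
{
	return consumed < produced;
}

/*
 * Overflow-safe: unsigned subtraction is modular, so the difference is
 * the true distance between the positions as long as they are less than
 * half the counter range apart. The conversion to long is
 * implementation-defined in ISO C for values above LONG_MAX; GCC and
 * Clang define it as two's complement, which this idiom relies on.
 */
static bool data_pending(unsigned long consumed, unsigned long produced)
{
	return (long) (consumed - produced) < 0;
}
```

For example, with consumed = ULONG_MAX - 10 and produced = 5 (the producer having wrapped), data_pending_naive() returns false, which yields the empty snapshot described above, while data_pending() returns true.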
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 26 Feb 2019 16:21:14 +0000 (11:21 -0500)]
Clean-up: hide internal kernel_consumer_add_channel() symbol
kernel_consumer_add_channel() is not used outside of
kernel-consumer.c.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 27 Mar 2019 22:36:15 +0000 (18:36 -0400)]
Fix: no-output sessions do not enforce snapshot constraints
A number of scenarios can lead to failures to record snapshots when a
session is created in "no-output" mode (rather than in snapshot mode)
and has snapshot outputs added later on.
These changes prevent a user from adding snapshot outputs after the
creation of a kernel channel that makes use of the 'splice' output
type, since such channels do not allow the capture of snapshots.
Moreover, the output type of kernel channels is overridden when a
snapshot output is present in a session so that it behaves like a
snapshot session would.
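The two rules above can be sketched with a simplified, hypothetical model of a session. The types and function names are illustrative only and do not mirror the session daemon's data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a session's kernel channels. */
enum channel_output { CHANNEL_OUTPUT_SPLICE, CHANNEL_OUTPUT_MMAP };

struct kernel_channel {
	enum channel_output output;
};

struct session {
	struct kernel_channel *channels;
	size_t channel_count;
	size_t snapshot_output_count;
};

/* Refuse a snapshot output if any existing kernel channel uses 'splice'. */
static bool can_add_snapshot_output(const struct session *session)
{
	assert(session);
	for (size_t i = 0; i < session->channel_count; i++) {
		if (session->channels[i].output == CHANNEL_OUTPUT_SPLICE) {
			return false;
		}
	}
	return true;
}

/*
 * With a snapshot output present, new kernel channels get the output
 * type a snapshot session would use ('mmap'), whatever was requested.
 */
static enum channel_output effective_channel_output(
		const struct session *session, enum channel_output requested)
{
	assert(session);
	return session->snapshot_output_count > 0 ?
			CHANNEL_OUTPUT_MMAP : requested;
}
```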
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 27 Mar 2019 19:42:19 +0000 (15:42 -0400)]
Fix: wrong error code returned by kernel_snapshot_record()
On snapshot error, kernel_snapshot_record() can return
LTTNG_ERR_KERN_CONSUMER_FAIL, which means that the kernel consumer
daemon failed to launch. In this path, the appropriate error to
return is LTTNG_ERR_KERN_META_FAIL.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 27 Mar 2019 19:31:13 +0000 (15:31 -0400)]
Clarify incorrect channel output type logging message
Recording a snapshot is only supported for channels that have
an "mmap" output type. Add the channel's name and an explanation
of the error to the consumer daemon's log, since the integral
representation of the channel's 'output' is of limited use.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 27 Mar 2019 19:28:54 +0000 (15:28 -0400)]
Mark lttng_kconsumer_snapshot_channel as static
lttng_kconsumer_snapshot_channel() is not used outside of
kernel-consumer.c. It should therefore not be exported outside
of this TU.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 27 Mar 2019 18:12:47 +0000 (14:12 -0400)]
Docs: clarify the meaning of the snapshot_mode flag in ltt_session
The snapshot_mode flag of an ltt_session only affects the default
channel creation attributes. It is not used to prevent the
modification of snapshot outputs.
Since a snapshot session is necessarily in "no-output" mode, it
is redundant to check for this flag in rotation-related commands.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 27 Mar 2019 17:36:06 +0000 (13:36 -0400)]
lttng: clean-up the printout of snapshot outputs
The printout of snapshot outputs, when they are added or listed,
inconsistently uses "-1" and "0" to indicate that no "max-size"
parameter is specified.
This commit only prints the "max-size" setting when one is
actually specified and changes the spelling of "max-size" to
"max size".
Moreover, the unit (bytes) is printed when a max-size parameter
is specified on a snapshot output.
This changes the output of the listing of snapshot outputs as follows.
Before:
$ lttng snapshot list-output
Snapshot output list for session auto-20190327-133932
[1] snapshot-1: /home/jgalar/lttng-traces/auto-20190327-133932 (max-size: 0)
After:
$ lttng snapshot list-output
Snapshot output list for session auto-20190327-133932
[1] snapshot-1: /home/jgalar/lttng-traces/auto-20190327-133932
Creating a snapshot output without a max-size no longer indicates
"(max-size: -1)".
Before:
$ lttng snapshot add-output /tmp/keb
Snapshot output successfully added for session auto-20190327-134055
[1] snapshot-1: /tmp/keb (max-size: -1)
After:
$ lttng snapshot add-output /tmp/keb
Snapshot output successfully added for session auto-20190327-134055
[1] snapshot-1: /tmp/keb
Max sizes, when specified, are printed in bytes.
Before:
$ lttng snapshot add-output /tmp/keb --max-size=498394
Snapshot output successfully added for session auto-20190327-134055
[1] snapshot-1: /tmp/keb (max-size: 498394)
After:
$ lttng snapshot add-output /tmp/keb --max-size=498394
Snapshot output successfully added for session auto-20190327-134055
[1] snapshot-1: /tmp/keb (max size: 498394 bytes)
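A sketch of the new printing rule, assuming an unset max size is stored either as 0 or as -1 (UINT64_MAX once stored in a uint64_t), the two sentinels visible in the old output. The function name and signature are hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical formatter: append " (max size: N bytes)" only when a max
 * size was actually specified. Returns the snprintf() result.
 */
static int format_snapshot_output(char *buf, size_t len, unsigned int id,
		const char *name, const char *location, uint64_t max_size)
{
	assert(buf && name && location);

	/* Both sentinels mean "no max size": print nothing about it. */
	if (max_size == 0 || max_size == UINT64_MAX) {
		return snprintf(buf, len, "[%u] %s: %s", id, name, location);
	}

	return snprintf(buf, len, "[%u] %s: %s (max size: %" PRIu64 " bytes)",
			id, name, location, max_size);
}
```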
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 26 Mar 2019 23:43:50 +0000 (19:43 -0400)]
lttng: clean-up printout of session output destination
This commit fixes three UX problems with lttng's list command
output.
1) The list command currently prints the session's output location
twice, e.g.
auto-20190326-192630 (/home/jgalar/lttng-traces/auto-20190326-192630) [inactive]
Trace path: /home/jgalar/lttng-traces/auto-20190326-192630
2) In the case of a snapshot session, the parentheses are empty
and the "Trace path:" line is empty, e.g.
auto-20190326-192613 () [inactive snapshot]
Trace path:
3) The term "path" is used even though the output may be a
network location, e.g.
auto-20190326-194856 (tcp4://127.0.0.1:5342/ [data: 5343]) [inactive]
Trace path: tcp4://127.0.0.1:5342/ [data: 5343]
The new output omits the output location in parentheses, doesn't
print the "Trace path" line if no output is specified, and
uses the generic "output" terminology rather than "path".
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Fri, 22 Mar 2019 21:51:14 +0000 (17:51 -0400)]
Docs: document the format of the lttng_session path member
Document that the path returned through a session listing operation
is not necessarily a path nor a standard URL. While a UNIX path will be
returned when a session is configured to trace locally, a liblttng-ctl
user should not expect this field to contain a valid URL when a network
streaming (or live) output destination is configured. The "path"
field will hold a custom-formatted string describing the output.
This is arguably unexpected, but since this is currently the only
way to obtain the destination of an existing session, this format
will not be changed, to preserve compatibility with existing tools
which could rely on it.
A description of the formatting used by the session daemon is
added as part of this patch.
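A consequence for liblttng-ctl users, sketched with hypothetical helpers: since only local outputs are guaranteed to be UNIX paths, a client can at most distinguish an absolute path from the custom network description and should not try to parse the latter as a URL. Treating a leading '/' as the marker of a local output is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: an empty string means no output is configured. */
static bool session_has_output(const char *path)
{
	assert(path);
	return path[0] != '\0';
}

/*
 * Hypothetical: local outputs are absolute UNIX paths; anything else
 * (e.g. "tcp4://127.0.0.1:5342/ [data: 5343]") is a custom description.
 */
static bool session_output_is_local(const char *path)
{
	assert(path);
	return path[0] == '/';
}
```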
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Fri, 22 Mar 2019 20:08:11 +0000 (16:08 -0400)]
Docs: lttng-ctl has no default live timer period
The documentation of liblttng-ctl mentions that a 1 second timer
is used by default.
The library itself (and the session daemon) has no concept of a
default live timer period. The CLI governs this default value.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 26 Mar 2019 15:50:41 +0000 (11:50 -0400)]
Fix: missing include can cause structures to not be packed
A number of files declaring "packed" structures (using the LTTNG_PACKED
macro) do not include common/macros.h, which defines this macro.
This results in structures being used in their "unpacked" form, or
under both packed and unpacked forms, depending on the other files
included at the point of definition and use of these structures.
It is unclear which of the users of these structures were actually
affected by the bug. Most of these structures are used for IPC
over a UNIX socket. In these cases, it is reasonable to assume that
lttng-tools will be rebuilt completely to take this change into
account.
However, the structures declared in common/sessiond-comm/relayd.h are
more worrying as they are part of the relay daemon's network protocol.
Fortunately, adding the following directive to
common/sessiond-comm/relayd.h confirms that the header is included
transitively where those structures are used.
> #ifndef LTTNG_PACKED
> #error Not defined!
> #endif
Instances of this issue were found using the following script.
for file in $(ag -l LTTNG_PACKED); do
	ag "#include \<common/macros\.h\>" -l ${file} > /dev/null
	if [ $? -ne 0 ]; then
		echo "Missing include in" $file
	fi
done
Running this script produces the following output (annotated):
Missing include in include/lttng/channel-internal.h
Missing include in include/lttng/condition/buffer-usage-internal.h
Missing include in include/lttng/condition/session-consumed-size-internal.h
Missing include in include/lttng/condition/session-rotation-internal.h
Missing include in src/common/sessiond-comm/sessiond-comm.h
Missing include in src/common/sessiond-comm/relayd.h
Missing include in src/common/sessiond-comm/agent.h
> LTTNG_PACKED mentioned in comments
Missing include in src/common/optional.h
> Unneeded.
Missing include in src/common/macros.h
> lttng-ust-abi.h defines its own version of LTTNG_PACKED
> and is included by lttng-ust-ctl.h
Missing include in src/bin/lttng-sessiond/lttng-ust-ctl.h
Missing include in src/bin/lttng-sessiond/lttng-ust-abi.h
Missing include in src/lib/lttng-ctl/filter/filter-bytecode.h
> False positives (not source files)
Missing include in packed.sh
Missing include in ChangeLog
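A sketch of why the missing include could go unnoticed, assuming LTTNG_PACKED expands to the GCC/Clang __attribute__((__packed__)) attribute; the structure is hypothetical. A _Static_assert on the expected size is one additional safeguard (not part of this change) that catches an unpacked wire structure at build time.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed definition of the macro, for GCC and Clang. */
#define LTTNG_PACKED __attribute__((__packed__))

/*
 * Guard for headers that use LTTNG_PACKED: if the definition was not
 * included, the build fails here instead of producing an unpacked struct.
 */
#ifndef LTTNG_PACKED
#error "LTTNG_PACKED is not defined"
#endif

/* Hypothetical wire message: 1 + 4 bytes, no padding when packed. */
struct example_msg {
	uint8_t type;
	uint32_t length;
} LTTNG_PACKED;

/*
 * Had LTTNG_PACKED been undefined, the declaration above would still
 * compile: it would declare a variable named LTTNG_PACKED of type
 * struct example_msg, and the structure would keep its padding.
 */
_Static_assert(sizeof(struct example_msg) == 5,
		"struct example_msg must be packed");
```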
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Fri, 22 Mar 2019 21:51:40 +0000 (17:51 -0400)]
Fix: check illegal combinations of ctrl-url/data-url/output/set-url
The lttng CLI must check for illegal combinations of the
--ctrl-url, --data-url, --set-url, and --output options.
The following combinations are mutually exclusive:
1) --set-url
2) --ctrl-url + --data-url
3) --output
Combining these incompatible options resulted in unhelpful
generic error messages since the error was caught much later
than it should have been.
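The exclusivity rule can be sketched as a count of option groups in use. The function name and the convention of NULL meaning "option not given" are assumptions, not the CLI's actual implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical check: --set-url, --ctrl-url/--data-url, and --output
 * form three mutually exclusive groups. Returns true if at most one
 * group is used.
 */
static bool output_options_are_valid(const char *set_url, const char *ctrl_url,
		const char *data_url, const char *output)
{
	int groups = 0;

	groups += set_url != NULL;
	groups += ctrl_url != NULL || data_url != NULL;
	groups += output != NULL;

	return groups <= 1;
}
```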
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 19 Mar 2019 19:37:28 +0000 (15:37 -0400)]
Fix: lttng_uri structure must be packed as it is used for IPC
The lttng_create_session commands send lttng_uri structures over
a UNIX socket. As such, the structure must be packed to preclude
the inclusion of any padding.
Moreover, the 'in_port_t' type is replaced by uint16_t and PATH_MAX
is replaced by LTTNG_PATH_MAX to prevent conflicts if both ends
of the IPC are not built with the same toolchain/definitions.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 19 Mar 2019 01:30:35 +0000 (21:30 -0400)]
Fix: missing mentions of tracing session rotation in basic help
The basic help shown when the 'lttng' binary is invoked without
a command is currently out of sync with the LTTNG(1) man page.
This adds the short descriptions of the 'rotate', 'enable-rotation',
and 'disable-rotation' commands, which were added as part of the
2.11 release.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 26 Feb 2019 03:05:32 +0000 (22:05 -0500)]
Fix: release reference to ltt_session on error instead of free()
Since ltt_session objects within the session daemon are now
reference counted, it is more appropriate to release a reference
on error rather than calling free() directly in session_create().
The session_release() function also performs additional clean-up that
can be needed in some of the error paths of session_create().
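A minimal sketch of the pattern, with hypothetical names and a plain int standing in for an atomic reference count: the error path of the constructor drops the initial reference, so the same release function performs all clean-up whether creation fails or the last user goes away.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted object (single-threaded sketch). */
struct example_session {
	int refcount;
	char *name;
};

/* All teardown lives here, shared by error paths and normal release. */
static void example_session_release(struct example_session *session)
{
	free(session->name);
	free(session);
}

static void example_session_put(struct example_session *session)
{
	assert(session->refcount > 0);
	if (--session->refcount == 0) {
		example_session_release(session);
	}
}

static struct example_session *example_session_create(const char *name)
{
	struct example_session *session = calloc(1, sizeof(*session));

	if (!session) {
		return NULL;
	}
	session->refcount = 1;

	session->name = strdup(name);
	if (!session->name) {
		goto error;
	}
	return session;
error:
	/* Drop the creation reference instead of calling free() directly. */
	example_session_put(session);
	return NULL;
}
```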
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 19 Feb 2019 21:25:17 +0000 (16:25 -0500)]
Fix relayd: session leaked on communication error during creation
A relay_session object can be leaked if the relay daemon fails
to reply to the RELAYD_CREATE_SESSION command.
Since the relay daemon's peer can't know the session's id, the
session will never be referenced in the future. Moreover, the
session will never be tied to the connection(s) in order to bound
its lifetime.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Francis Deslauriers [Sat, 17 Nov 2018 03:51:06 +0000 (22:51 -0500)]
Prevent channel buffer allocation larger than memory
Background
==========
Until recently (before lttng-modules commit 1f0ab1e) it was possible to
trigger an Out-Of-Memory crash by creating a kernel channel buffer
larger than the currently usable memory on the system. The following
commands were triggering the issue on my laptop:
lttng create
lttng enable-channel -k --subbuf-size=100G --num-subbuf=1 chan0
The lttng-modules commit 1f0ab1e adds a verification based on an
estimate to prevent this from happening. Since this kernel tracer sanity
check is based on an estimate, it would be safer to do a similar check on
the session daemon side.
Approach
========
Verify that there is enough memory available on the system to do all the
allocations needed to enable the channel. If the available memory is
insufficient for the buffer allocation, return an error to the user
without trying to allocate the buffers.
Use the `/proc/meminfo` procfile to get an estimate of the current size
of available memory (using `MemAvailable`). The `MemAvailable` field was
added in Linux 3.14, so if it's absent, fall back to verifying
that the requested buffer is smaller than the physical memory on the
system.
Compute the size of the requested buffers using the following equation:
requested_memory = number_subbuffer * size_subbuffer * number_cpu
The following error is returned to the command line user:
lttng enable-channel -k --subbuf-size=100G --num-subbuf=1 chan0
Error: Channel chan0: Not enough memory (session auto-20181121-161146)
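A minimal sketch of the check, assuming the "Field:   value kB" line format of /proc/meminfo. The helper names are hypothetical, MemTotal stands in for "physical memory" in the fallback (an assumption), and __builtin_mul_overflow() is a GCC/Clang builtin used here to keep a huge request from wrapping around and passing the comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Look up "<field>:" in the text of /proc/meminfo and return its value
 * in bytes (the file reports kB). Returns false if the field is absent.
 */
static bool meminfo_get_bytes(const char *meminfo, const char *field,
		uint64_t *bytes)
{
	const char *line = meminfo;
	size_t field_len;

	assert(meminfo && field && bytes);
	field_len = strlen(field);
	while (line && *line) {
		unsigned long long kb;

		if (strncmp(line, field, field_len) == 0 &&
				line[field_len] == ':' &&
				sscanf(line + field_len + 1, "%llu", &kb) == 1) {
			*bytes = (uint64_t) kb * 1024;
			return true;
		}
		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line) {
			line++;
		}
	}
	return false;
}

/* requested_memory = number_subbuffer * size_subbuffer * number_cpu */
static bool channel_fits_in_memory(const char *meminfo, uint64_t num_subbuf,
		uint64_t subbuf_size, uint64_t ncpus)
{
	uint64_t limit, requested;

	/* MemAvailable exists since Linux 3.14; fall back to MemTotal. */
	if (!meminfo_get_bytes(meminfo, "MemAvailable", &limit) &&
			!meminfo_get_bytes(meminfo, "MemTotal", &limit)) {
		return false;	/* This sketch refuses when nothing is known. */
	}

	/* Reject requests whose size does not even fit in 64 bits. */
	if (__builtin_mul_overflow(num_subbuf, subbuf_size, &requested) ||
			__builtin_mul_overflow(requested, ncpus, &requested)) {
		return false;
	}
	return requested <= limit;
}
```

In the session daemon, the text would come from reading /proc/meminfo, and the CPU count from the system (for instance via sysconf(_SC_NPROCESSORS_CONF)).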
Side effect
===========
This patch has the interesting side effect of alerting the user with an
error when buffer allocation fails because of insufficient memory, in
both --kernel and --userspace channel creation.
Drawback
========
The fallback check on older kernels is imperfect and is only to prevent
obvious user errors.
Note
====
In the future, there might be a need for a way to deactivate this check
(by using an environment variable) if a case arises where
`/proc/meminfo` doesn't accurately reflect the state of memory for a
particular use case.
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Jérémie Galarneau [Fri, 14 Dec 2018 20:36:20 +0000 (15:36 -0500)]
Fix: destroy called twice on quit pipe
A consumer management thread can be launched successfully and yet
still report an error encountered during its initialization. If
such an error occurs, the cleanup function is invoked explicitly
in the error path and will be called again when the last reference
to the thread is released.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Michael Jeanson [Thu, 20 Dec 2018 21:16:47 +0000 (16:16 -0500)]
Remove duplicate check for dlopen
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Fri, 8 Feb 2019 01:25:41 +0000 (20:25 -0500)]
Tests: take multiple snapshots in streaming mode
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Fri, 8 Feb 2019 01:25:42 +0000 (20:25 -0500)]
Fix: don't destroy the sockets if the snapshot was successful
A goto was missing to skip the error path, which destroyed the
relayd sockets even when a snapshot was successful. We want to keep them
open to reuse them for the next snapshots.
This is taken verbatim from fix
1371fc1228461eb532118280e67ab3e9de015757; it is the same fix.
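The shape of the bug, sketched with hypothetical names: without the goto on the success path, execution falls through into the error label and closes sockets that should stay open for the next snapshot.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relay daemon sockets of an output. */
struct relayd_sockets {
	bool open;
};

static void close_relayd_sockets(struct relayd_sockets *socks)
{
	socks->open = false;
}

static int record_snapshot(struct relayd_sockets *socks, bool fail)
{
	int ret = 0;

	assert(socks->open);
	if (fail) {
		ret = -1;
		goto error;
	}
	/* Success: skip the error path so the sockets stay open for reuse. */
	goto end;

error:
	close_relayd_sockets(socks);
end:
	return ret;
}
```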
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Wed, 16 Jan 2019 18:38:57 +0000 (13:38 -0500)]
Fix: run-as thread deadlock on itself in restart error path
The deadlock was found using this backtrace:
Thread 5:
0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
1 0x00007efc6b650023 in __GI___pthread_mutex_lock (mutex=mutex@entry=0x55fc37128400 <worker_lock>) at ../nptl/pthread_mutex_lock.c:78
2 0x000055fc36efbe05 in run_as_destroy_worker () at runas.c:1233
3 0x000055fc36efc2e7 in run_as_restart_worker (worker=<optimized out>) at runas.c:998
4 run_as (cmd=cmd@entry=RUN_AS_UNLINK, data=data@entry=0x7efc5b7fa630, ret_value=ret_value@entry=0x7efc5b7fa510, uid=uid@entry=1000, gid=gid@entry=1000) at runas.c:1033
5 0x000055fc36efc9ce in run_as_unlink (path=path@entry=0x7efc5b7fb690 "/home/joraj/lttng-traces/auto-20190116-111518/20190116T111729-0500-33/kernel/index/channel0_3.idx", uid=uid@entry=1000, gid=gid@entry=1000) at runas.c:1120
6 0x000055fc36ef7feb in utils_unlink_stream_file (path_name=path_name@entry=0x7efc5b7fc7e0 "/home/joraj/lttng-traces/auto-20190116-111518/20190116T111729-0500-33/kernel/index", file_name=file_name@entry=0x7efc500085d4 "channel0_3", size=size@entry=0, count=count@entry=0, uid=uid@entry=1000, gid=gid@entry=1000, suffix=0x55fc36f19b26 ".idx") at utils.c:929
7 0x000055fc36f01d4e in lttng_index_file_create (path_name=path_name@entry=0x7efc500087a0 "/home/joraj/lttng-traces/auto-20190116-111518/20190116T111729-0500-33/kernel", stream_name=stream_name@entry=0x7efc500085d4 "channel0_3", uid=1000, gid=1000, size=0, count=0, major=1, minor=1) at index.c:79
8 0x000055fc36ed9475 in rotate_local_stream (ctx=<optimized out>, stream=0x7efc50008460) at consumer.c:4105
9 0x000055fc36ed98b5 in lttng_consumer_rotate_stream (ctx=ctx@entry=0x55fc37428d80, stream=stream@entry=0x7efc50008460, rotated=rotated@entry=0x7efc5b7fdb27) at consumer.c:4181
10 0x000055fc36ee354e in lttng_kconsumer_read_subbuffer (stream=stream@entry=0x7efc50008460, ctx=ctx@entry=0x55fc37428d80, rotated=rotated@entry=0x7efc5b7fdb27) at kernel-consumer.c:1740
11 0x000055fc36ed7a30 in lttng_consumer_read_subbuffer (stream=0x7efc50008460, ctx=0x55fc37428d80) at consumer.c:3383
12 0x000055fc36ed4b74 in consumer_thread_data_poll (data=0x55fc37428d80) at consumer.c:2751
13 0x00007efc6b64d6db in start_thread (arg=0x7efc5b7fe700) at pthread_create.c:463
14 0x00007efc6af6488f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
The owner of the lock is itself:
print worker_lock.__data.__owner
$2 = 25725
thread find 25725
Thread 5 has target id 'Thread 0x7efc5b7fe700 (LWP 25725)'
The worker_lock is first taken in frame #4: run_as runas.c:1033
pthread_mutex_lock(&worker_lock);
if (use_clone()) {
...
/*
* If the worker thread crashed the errno is set to EIO. we log
* the error and start a new worker process.
*/
if (ret == -1 && saved_errno == EIO) {
DBG("Socket closed unexpectedly... "
"Restarting the worker process");
-> ret = run_as_restart_worker(global_worker);
if (ret == -1) {
ERR("Failed to restart worker process.");
goto err;
}
Solution
========
Create run_as_restart_worker_no_lock, which does not take the lock.
Use run_as_restart_worker_no_lock at the run_as error path call site.
Use run_as_restart_worker_no_lock inside run_as_restart_worker while
holding the worker lock to provide identical behaviour to other call sites.
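The locking pattern of the solution can be sketched as follows; the names and the worker_generation counter are hypothetical stand-ins for the run-as worker state.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical worker state protected by a non-recursive mutex. */
static pthread_mutex_t worker_lock = PTHREAD_MUTEX_INITIALIZER;
static int worker_generation;

/* Caller must hold worker_lock; this function never locks it. */
static int restart_worker_no_lock(void)
{
	/* Destroying and respawning the worker would happen here. */
	worker_generation++;
	return 0;
}

/* For call sites that do not already hold worker_lock. */
static int restart_worker(void)
{
	int ret;

	pthread_mutex_lock(&worker_lock);
	ret = restart_worker_no_lock();
	pthread_mutex_unlock(&worker_lock);
	return ret;
}

static int run_command(int worker_crashed)
{
	int ret = 0;

	pthread_mutex_lock(&worker_lock);
	if (worker_crashed) {
		/*
		 * worker_lock is already held: calling restart_worker() here
		 * would lock it a second time from the same thread. POSIX
		 * leaves that undefined; with a default glibc mutex it blocks
		 * forever.
		 */
		ret = restart_worker_no_lock();
	}
	pthread_mutex_unlock(&worker_lock);
	return ret;
}
```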
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Wed, 19 Dec 2018 18:47:23 +0000 (13:47 -0500)]
Fix: session list lock must be held on session put operation
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Fri, 14 Dec 2018 21:32:12 +0000 (16:32 -0500)]
Support minute and hour as time suffixes
utils_parse_time_suffix now supports the following suffixes:
"us" for microsecond,
"ms" for millisecond,
"s" for second,
"m" for minute,
"h" for hour
This removes the use of "m" for milliseconds and "u" for microseconds.
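A sketch of suffix parsing that yields microseconds. The function name, the microsecond output unit, and the treatment of a bare number as microseconds are assumptions for illustration; this is not the signature of utils_parse_time_suffix().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical parser: "<integer><suffix>" -> microseconds.
 * Returns 0 on success, -1 on invalid input, unknown suffix, or overflow.
 */
static int parse_time_us(const char *str, uint64_t *us)
{
	static const struct {
		const char *suffix;
		uint64_t multiplier;
	} units[] = {
		{ "", 1 },	/* Assumption: no suffix means microseconds. */
		{ "us", 1 },
		{ "ms", 1000ULL },
		{ "s", 1000ULL * 1000 },
		{ "m", 60ULL * 1000 * 1000 },
		{ "h", 60ULL * 60 * 1000 * 1000 },
	};
	unsigned long long value;
	char *end;

	assert(str && us);
	/* strtoull() would accept "-1" and " 1"; require a leading digit. */
	if (str[0] < '0' || str[0] > '9') {
		return -1;
	}

	errno = 0;
	value = strtoull(str, &end, 10);
	if (errno == ERANGE) {
		return -1;
	}

	/* Exact suffix match: "m" is minutes, "ms" milliseconds, "u" is rejected. */
	for (size_t i = 0; i < sizeof(units) / sizeof(units[0]); i++) {
		if (strcmp(end, units[i].suffix) == 0) {
			if (value > UINT64_MAX / units[i].multiplier) {
				return -1;
			}
			*us = value * units[i].multiplier;
			return 0;
		}
	}
	return -1;
}
```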
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Sat, 2 Feb 2019 13:09:55 +0000 (08:09 -0500)]
Test fix: passing bool argument to va_start is undefined
clang warns that "passing an object that undergoes default argument
promotion to 'va_start' has undefined behaviour [-Wvarargs]".
The behaviour of va_start is undefined when its last named parameter has
a type that changes under default argument promotion: a boolean argument
is promoted to 'int', which is not guaranteed to have the same size
as 'bool'.
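A sketch of a well-defined alternative: declare the last named parameter as int, which is what a bool becomes after promotion, and read the variadic arguments as int as well. The function is hypothetical.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>

/*
 * Hypothetical helper: counts true values in a list terminated by -1.
 * 'first' is an int, not a bool, so the va_start() call is well defined.
 */
static int count_true(int first, ...)
{
	va_list args;
	int count = 0;
	int value = first;

	va_start(args, first);
	while (value >= 0) {
		count += value != 0;
		/* Variadic bools arrive promoted to int, so read them as int. */
		value = va_arg(args, int);
	}
	va_end(args);
	return count;
}
```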
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 23 Jan 2019 20:29:14 +0000 (15:29 -0500)]
Fix: missing rcu read locking in trigger "unregister all" command
While the notification subsystem runs entirely within a single thread,
the iteration over the triggers hash table must be protected using
the RCU read-side lock since the RCU worker may resize the hash
table while the iteration is performed.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Fri, 18 Jan 2019 17:40:47 +0000 (12:40 -0500)]
Fix: create_kernel_session asserts on failure
create_kernel_session() will call trace_kernel_destroy_session()
on failure to create a kernel session (e.g. modules failed to load).
This can be reproduced by enabling kernel events on a session after
the session daemon has failed to load the LTTng kernel modules.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Mon, 14 Jan 2019 22:13:32 +0000 (17:13 -0500)]
Fix: only free trace_path when it is dynamically allocated
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Mon, 14 Jan 2019 22:09:42 +0000 (17:09 -0500)]
Fix: wrong error check on kernel session creation
create_kernel_session() returns a positive lttng error code
on error and returns LTTNG_OK on success.
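A minimal sketch of the corrected check; the enum values and function names are illustrative stand-ins, not the actual lttng error codes. Because success is itself a positive code, only a comparison against the success value detects errors.

```c
#include <assert.h>

/* Illustrative codes: success and failure are both positive values. */
enum example_error_code {
	EXAMPLE_OK = 10,
	EXAMPLE_ERR_KERN_SESS_FAIL = 11,
};

/* Hypothetical stand-in for a session creation that may fail. */
static enum example_error_code create_kernel_session_stub(int fail)
{
	return fail ? EXAMPLE_ERR_KERN_SESS_FAIL : EXAMPLE_OK;
}

static int enable_kernel_event(int fail)
{
	enum example_error_code ret = create_kernel_session_stub(fail);

	/* Wrong: 'if (ret < 0)' never triggers since error codes are positive. */
	if (ret != EXAMPLE_OK) {
		return -1;
	}
	return 0;
}
```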
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Mon, 14 Jan 2019 21:53:38 +0000 (16:53 -0500)]
Fix: don't put() thread on shutdown failure
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Mon, 14 Jan 2019 21:36:21 +0000 (16:36 -0500)]
Fix: dereference on NULL pointer on allocation failure
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Sat, 12 Jan 2019 19:53:56 +0000 (14:53 -0500)]
Fix: leak of filter bytecode and expression on agent event re-enable
The agent subsystem does not properly assume the clean-up of an
event's filter bytecode and expression when a previously disabled
event is re-enabled.
This change ensures that the ownership of both the filter bytecode
and expression is assumed by the agent subsystem and discarded
when a matching event is found.
Steps to reproduce the leak:
$ lttng create
$ lttng enable-event --python allo --filter 'a[42] == 241'
$ lttng disable-event --python allo
$ lttng enable-event --python allo --filter 'a[42] == 241'
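A sketch of the ownership rule with a hypothetical, simplified event structure: the enable function always takes ownership of the filter expression and bytecode, and frees them when an existing event is re-enabled instead of leaking them. Matching is reduced to "an event already exists" for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical agent event holding its own filter. */
struct agent_event {
	bool enabled;
	char *filter_expression;
	void *filter_bytecode;
};

/*
 * Takes ownership of 'expression' and 'bytecode' in every case: they are
 * stored in a new event or released when a matching event already exists.
 */
static int agent_enable_event(struct agent_event **existing,
		char *expression, void *bytecode)
{
	assert(existing);
	if (*existing) {
		/* Re-enabling a known event: the new copies are redundant. */
		free(expression);
		free(bytecode);
		(*existing)->enabled = true;
		return 0;
	}

	*existing = calloc(1, sizeof(**existing));
	if (!*existing) {
		free(expression);
		free(bytecode);
		return -1;
	}
	(*existing)->enabled = true;
	(*existing)->filter_expression = expression;
	(*existing)->filter_bytecode = bytecode;
	return 0;
}
```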
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Sat, 12 Jan 2019 19:21:24 +0000 (14:21 -0500)]
Test fix: python logging test spams its output
A set -x/+x pair was erroneously committed as part of the
test_python_logging test script which causes the test to be
unnecessarily verbose.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Sat, 12 Jan 2019 19:17:58 +0000 (14:17 -0500)]
Fix: leak of lttng-consumerd global HTs in run-as worker
All resources allocated by the consumerd before the launch
of the run-as worker process are leaked since the run-as process
is only fork()'ed (the original process image is preserved).
Moving the launch of the worker earlier in the initialization
of the consumerd works around this problem.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Fri, 11 Jan 2019 20:49:44 +0000 (15:49 -0500)]
Fix: leak of sessiond configuration on launch of run-as worker
The run-as worker is spawned through fork() without using
exec*(). This means that any resource allocated by the session
daemon before the launch of the run-as worker will be leaked in
the run-as worker's process.
A callback is added to the run_as launch interface to give users
a chance to clean up after the fork occurs. This mechanism is
fragile as it may not always be easy (or possible) to track all
such resources in the future. This makes a strong argument for using a
new process image (through exec*()) at some point, to avoid any such
problem altogether.
The lttng-consumerd suffers from a similar (and more severe) problem with
its own run-as worker. A fix addressing the consumerd's problem follows.
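A minimal sketch of a fork()-only launch with a post-fork clean-up hook; all names are hypothetical. The callback runs in the child only, releasing the child's copy of resources the worker does not need, while the parent's copy is unaffected.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical hook, run in the worker (child) process right after fork(). */
typedef void (*post_fork_cleanup_cb)(void *data);

/* Example hook: release the child's copy of a configuration buffer. */
static void free_config(void *data)
{
	free(data);
}

static void worker_main(void)
{
	/* The worker's command loop would run here. */
}

/*
 * fork() without exec*(): the child starts with a copy of everything the
 * parent allocated so far, so the hook is its chance to release what the
 * worker does not need.
 */
static pid_t launch_worker(post_fork_cleanup_cb cleanup, void *data)
{
	pid_t pid = fork();

	if (pid == 0) {
		if (cleanup) {
			cleanup(data);
		}
		worker_main();
		_exit(EXIT_SUCCESS);
	}
	return pid;	/* -1 on failure, the worker's pid in the parent. */
}
```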
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Fri, 11 Jan 2019 20:10:08 +0000 (15:10 -0500)]
Fix: leak of rundir config string
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Thu, 10 Jan 2019 18:48:28 +0000 (13:48 -0500)]
Fix: only synchronize application configuration on tracing start
The UST configuration of applications is currently replicated from the
ltt_ust_{session, channel, event} data structures to their ust_app_*
equivalents as they are modified.
While this worked correctly for the most part, it caused a problem in
per-PID mode since the buffers would get allocated
(and files created, in applicable tracing modes) even though tracing
was never started during some applications' lifetime.
A previous fix attempt, 0498a00cb, addressed this problem but
introduced a regression that caused configurations to become
mismatched between the sessiond and applications in cases where a
tracing session was started, stopped, modified, and started again
within the lifetime of a given application.
This change introduces an explicit "synchronize" set of operations
that ensures that a session's channels and events configurations, as
known by the application(s), match those of the session daemon
whenever a session is started.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Mathieu Desnoyers [Thu, 13 Dec 2018 18:56:35 +0000 (13:56 -0500)]
Fix: run_command_wait() handle partial write
Use lttng_write() to handle partial writes (writing less than the
requested number of bytes) as well as interrupted writes (ret = -1, errno == EINTR).
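A sketch of a write loop with those semantics; write_full() is a hypothetical name and is not claimed to match lttng_write()'s exact contract.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: retry on EINTR and keep writing after partial
 * writes. Returns the number of bytes written, or -1 on error.
 */
static ssize_t write_full(int fd, const void *buf, size_t len)
{
	const char *p = buf;
	size_t remaining = len;

	assert(buf || len == 0);
	while (remaining > 0) {
		ssize_t ret = write(fd, p, remaining);

		if (ret < 0) {
			if (errno == EINTR) {
				continue;	/* Interrupted before writing anything. */
			}
			return -1;
		}
		/* Partial write: advance past what was written and retry. */
		p += ret;
		remaining -= (size_t) ret;
	}
	return (ssize_t) len;
}
```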
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Mathieu Desnoyers [Wed, 12 Dec 2018 22:24:11 +0000 (17:24 -0500)]
Fix: do not repurpose iterator while it is being used
The hash table iteration uses an iterator that needs to stay valid for
the next loop iteration. Using that same iterator variable in a nested
lookup in a different hash table leads to a segmentation fault.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Mathieu Desnoyers [Wed, 12 Dec 2018 20:11:15 +0000 (15:11 -0500)]
Fix: handle_notification_thread_command: handle partial read
Use lttng_read() to handle partial reads (returning less than the
requested number of bytes) as well as interrupted reads (ret = -1, errno == EINTR).
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Mathieu Desnoyers [Wed, 12 Dec 2018 20:11:14 +0000 (15:11 -0500)]
Fix: notification thread: free session trigger list on error
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Mathieu Desnoyers [Wed, 12 Dec 2018 17:16:44 +0000 (12:16 -0500)]
Fix: notification thread: RCU-safe reclaim of hash table nodes
Nodes that are put in a rculfhash hash table created with the
"auto resize" flag must account for the fact that a worker thread can
access the hash table nodes concurrently as an RCU reader, and that this worker
thread can modify the hash table content, effectively adding and
removing "bucket" nodes, and changing the size of the hash table
index.
Therefore, even though only a single thread reads and updates the hash
table, a grace period is needed before reclaiming the memory holding
the rculfhash nodes.
Moreover, handle_notification_thread_command_add_channel() is missing an
RCU read-side lock around iteration on the triggers hash table. Failure
to hold this read-side lock could cause segmentation faults when
accessing hash table objects if a hash table resize is done by the
worker thread in parallel with iteration over the hash table.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 18 Dec 2018 19:01:08 +0000 (14:01 -0500)]
Fix: error logged on partial recvmsg() in MSG_DONTWAIT
The relay daemon logs a "Resource temporarily unavailable" error
message when lttcomm_recvmsg_inet_sock() is invoked and
no data is left to be consumed from the lttcomm_sock.
The "recvmsg" socket operation is called in a loop by the relay
daemon to consume the data being received in 64k chunks. If, on
one of those iterations, 0 bytes are available, recvmsg() will
return an error (-1, errno = EAGAIN). This should not be
logged in non-blocking mode.
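A sketch of the non-blocking receive, with a hypothetical helper name: EINTR is retried, and EAGAIN/EWOULDBLOCK are treated as "no data yet" rather than logged. The relay daemon's chunked receive loop is not reproduced.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/uio.h>

/*
 * Hypothetical non-blocking receive of up to 'len' bytes. Returns the
 * recvmsg() result unchanged; -1 with errno set to EAGAIN/EWOULDBLOCK
 * only means "no data right now" in MSG_DONTWAIT mode and is not logged.
 */
static ssize_t recv_nonblock(int fd, void *buf, size_t len)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };
	ssize_t ret;

	assert(buf);
	do {
		ret = recvmsg(fd, &msg, MSG_DONTWAIT);
	} while (ret < 0 && errno == EINTR);

	if (ret < 0 && errno != EAGAIN && errno != EWOULDBLOCK) {
		int saved_errno = errno;

		/* Only genuine failures are worth logging. */
		perror("recvmsg");
		errno = saved_errno;	/* perror() may clobber errno. */
	}
	return ret;
}
```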
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>