Jonathan Rajotte [Thu, 9 Dec 2021 20:14:26 +0000 (15:14 -0500)]
Fix: sessiond: assert on lttng_ht_add_unique_str on ltt_sessions_ht_by_name
Observed issue
==============
The lttng-sessiond asserts with the following backtrace on lttng create:
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007ffff7ab5859 in __GI_abort () at abort.c:79
#2 0x00007ffff7ab5729 in __assert_fail_base (fmt=0x7ffff7c4b588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x5555556ab0a6 "node_ptr == &node->node", file=0x5555556ab085 "hashtable.c", line=298, function=<optimized out>) at a
#3 0x00007ffff7ac6f36 in __GI___assert_fail (assertion=assertion@entry=0x5555556ab0a6 "node_ptr == &node->node", file=file@entry=0x5555556ab085 "hashtable.c", line=line@entry=298, function=function@entry=0x5555556ab380 <__PRETTY_FUNCTIO
#4 0x000055555560be44 in lttng_ht_add_unique_str (ht=<optimized out>, node=0x7fffe0026c58) at hashtable.c:298
#5 0x000055555558fb6a in add_session_ht (ls=0x7fffe0024970) at session.c:372
#6 session_create (name=<optimized out>, uid=1000, gid=1000, out_session=out_session@entry=0x7fffedfddbd8) at session.c:1308
#7 0x000055555559b219 in cmd_create_session_from_descriptor (creds=<optimized out>, creds=<optimized out>, home_path=<optimized out>, descriptor=<optimized out>) at cmd.c:3040
#8 cmd_create_session (cmd_ctx=cmd_ctx@entry=0x7fffedfe5fa0, sock=<optimized out>, return_descriptor=return_descriptor@entry=0x7fffedfddd68) at cmd.c:3176
#9 0x00005555555cc341 in process_client_msg (sock_error=0x7fffedfddd10, sock=0x7fffedfddd0c, cmd_ctx=0x7fffedfe5fa0) at client.c:2177
#10 thread_manage_clients (data=<optimized out>) at client.c:2742
#11 0x00005555555c5fff in launch_thread (data=0x55555571b780) at thread.c:66
#12 0x00007ffff7c8b609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#13 0x00007ffff7bb2293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
The issue can be reproduced with modifications to the rotation thread
code and the following scenario:
$ lttng create test
$ lttng enable-event -u -a
$ lttng start
run any app just so that we have a complete valid session. (might not be necessary)
$ lttng destroy --no-wait
$ lttng create test
^ Should assert here.
The diff to be applied:
diff --git a/src/bin/lttng-sessiond/rotation-thread.cpp b/src/bin/lttng-sessiond/rotation-thread.cpp
index
ac149c845..
c11f068ed 100644
--- a/src/bin/lttng-sessiond/rotation-thread.cpp
+++ b/src/bin/lttng-sessiond/rotation-thread.cpp
@@ -565,6 +565,8 @@ int handle_job_queue(struct rotation_thread_handle *handle,
{
int ret = 0;
+ sleep(5);
+
for (;;) {
struct ltt_session *session;
struct rotation_thread_job *job;
Note that the initial report for this issue was on a system under load
for which the `lttng destroy` completion check failed and a `lttng
create` was performed. As of today the exact reason why the completion
check failed is not known. Still we can "fix" the race leading to the
lttng-sessiond assertion considering a user might use the `--no-wait`
variant of `lttng destroy` and could easily end up in this
situation.
Cause
=====
Note: all `lttng create` commands have the same session name passed as
argument.
On `lttng destroy` the ltt_session object is flagged as destroyed
(ltt_session::destroyed). The removal of the object from the hash
table (ltt_sessions_ht_by_name) will be performed during the
`session_release` which is driven by the session refcount.
A reference on the `ltt_session` object is held for the
rotation initiated by the `lttng destroy` command. The rotation is
enqueued by the rotation thread.
At this point the system is busy and the rotation thread does not run.
We simulate this with a `sleep(5)` during the `handle_job_queue`.
The `lttng destroy --no-wait` returns. If the `--no-wait` option is not
passed the `lttng destroy` command will work as expected and wait for
completion. We can SIGINT the `lttng destroy` command and perform a
`lttng create` yielding the same backtrace.
On `lttng create`, `session_create` validates that the name does not
conflict with an existing session using `session_find_by_name`. It is
important to note that `session_find_by_name` discriminates also on the
`session->destroyed` flag (introduced by [1]).
The `ltt_sessions_ht_by_name` hash table was introduced by [2] to remove
the need to lock the session list to sample a session id during the
queueing of actions to be executed related to a trigger. The assumption
was made that, during the creation phase, the session would
always be unique in that hash table based on its name. This is simply
not true since multiple sessions with the same name can coexist as long
as only a single one is marked as "not destroyed". This is an important
concept due to the refcounting of the object and the feature relying on
the lifetime of the object (i.e rotation). This is mostly valid when
talking about the global session list.
Solution
========
Move the hash table removal earlier during the release of the session
object.
Move the removal from `del_session_ht`, which is done during the
`session_release` function, to the `lttng_session_destroy` function.
It is safe to do so since currently the only user of that hash table
(the action executor) does not care much about destroyed session at that
point.
This ensures that we maintain the uniqueness property of the key (name)
for that hash table on insertion.
The alternative was to expose an hash table that could contain
duplicates and force the handling of a set on all lookups.
Known drawbacks
=========
None.
References
==========
[1] https://git.lttng.org/?p=lttng-tools.git;a=commit;h=
e32d7f274604b77bcd83c24994e88df3761ed658
[2] https://git.lttng.org/?p=lttng-tools.git;a=commit;h=
e1bbf98908a6399f39a9a8bc95bd8b59cecaa816
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I2f1d0d6c04ee7210166e9847a850afbe6eaa7609
Jérémie Galarneau [Wed, 3 Nov 2021 02:31:15 +0000 (22:31 -0400)]
Fix: sessiond: snapshot: leak of trace chunk
Valgrind reports a leak after every snapshot record command:
==827791== 430 (280 direct, 150 indirect) bytes in 1 blocks are definitely lost in loss record 34 of 37
==827791== at 0x48435FF: calloc (vg_replace_malloc.c:1117)
==827791== by 0x223D01: zmalloc (macros.h:45)
==827791== by 0x224B79: lttng_trace_chunk_allocate (trace-chunk.c:387)
==827791== by 0x224E41: lttng_trace_chunk_create (trace-chunk.c:427)
==827791== by 0x150B55: session_create_new_trace_chunk (session.c:656)
==827791== by 0x164A11: snapshot_record (cmd.c:5113)
==827791== by 0x1651EE: cmd_snapshot_record (cmd.c:5302)
==827791== by 0x196E74: process_client_msg (client.c:2166)
==827791== by 0x198AF1: thread_manage_clients (client.c:2742)
==827791== by 0x18E245: launch_thread (thread.c:66)
==827791== by 0x4B9E258: start_thread (in /usr/lib/libpthread-2.33.so)
==827791== by 0x4CB45E2: clone (in /usr/lib/libc-2.33.so)
session_set_trace_chunk() on line 5162 returns a reference to the
current trace chunk which is never released.
This also causes tests/regression/tools/snapshots/test_ust_long to fail
due to a file descriptor exhaustion (presumably from using too many
directory file descriptors) when it is executed by an unprivileged user.
The CI doesn't catch this since the long regression test suite is
executed as root.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I4b6e45df48c3daafa2294c80ccd8a2b4d91401e1
Francis Deslauriers [Thu, 28 Oct 2021 15:01:22 +0000 (11:01 -0400)]
Tests: add missing kernel test cases to make check target
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I4ae8b8b1c130dd370322c2629f851d6aa1f557e7
Simon Marchi [Fri, 20 Aug 2021 19:19:42 +0000 (15:19 -0400)]
configure: enable -Wsuggest-attribute=format
Enable this warning, which suggests adding format attributes to some
functions, to help with format string validation. Fix the warnings it
generates by using the new ATTR_FORMAT_PRINTF macro.
There is only one spot we can't fix, that is in modprobe.c. The
compiler suggests we add an attribute to the kmod_set_log_fn
declaration, which we can't do, since it's not our code:
/home/simark/src/lttng-tools/src/bin/lttng-sessiond/modprobe.c: In function ‘setup_kmod_ctx’:
/home/simark/src/lttng-tools/src/bin/lttng-sessiond/modprobe.c:286:31: error: argument 2 of ‘kmod_set_log_fn’ might be a candidate for a format attribute [-Werror=suggest-attribute=format]
286 | kmod_set_log_fn(*ctx, log_kmod, NULL);
| ^~~~~~~~
I don't see any other choice but to explicitly ignore that spot.
Introduce some macros to abstract how to ignore specific diagnostics.
This is useful since not all compilers support the same diagnostic
flags. For example, telling clang to ignore -Wsuggest-attribute=format
would give an error.
Change-Id: I71278d7e2cdc66d4bbc59bd966469d0b427e963d
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Fri, 20 Aug 2021 19:22:16 +0000 (15:22 -0400)]
tests: sync tests/utils/tap with Babeltrace repository
There are a few fixes / improvements in the Babeltrace version of the
tap library, bring them here. The Babeltrace commit used is
0022a87819b0 ("Fix: clear_string_field(): set first character to 0") In
particular, I'm looking for the TAP_PRINTF_FORMAT fixes, to be able to
enable -Wsuggest-attribute=format. But I think it's easier to keep them
in sync, so bring in all changes, instead of just the TAP_PRINTF_FORMAT
ones.
The only diff not brought are the semicolons in the pass / fail
definitions, which were removed in the previous patch. Those changes
should be brought to the Babeltrace repository.
Bringing in the TAP_PRINTF_FORMAT changes finds a few format string
mistakes in the tests, fix them.
Change-Id: I0d125b313265e72303be8d704ba40554bca87cd1
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Philippe Proulx [Tue, 13 Jul 2021 14:48:19 +0000 (10:48 -0400)]
lttng_strerror(): accept positive and negative codes
Some functions return positive `lttng_error_code` enumerator values
while other return negative values.
Make lttng_strerror() accept positive and negative codes. This improves
the API UX and will make the documentation less complex.
Signed-off-by: Philippe Proulx <eeppeliteloop@gmail.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I359cef98ae32991d1a01b7f2cd484ece7bab2df9
Jonathan Rajotte [Tue, 23 Nov 2021 17:15:42 +0000 (12:15 -0500)]
Fix: test: use BABELTRACE_BIN instead of babeltrace
Observed issue
==============
The System tests jobs fails on multi-session test since the move to bt2.
Cause
=====
The tests uses `babeltrace` instead of `BABELTRACE_BIN`.
Solution
========
Use `BABELTRACE_BIN`.
Add a babelrace bail out.
While there fix easy shellcheck warning.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I441d736e85c447c5765bffd520ec2f267c86048f
Michael Jeanson [Wed, 11 Nov 2020 16:18:55 +0000 (11:18 -0500)]
Cleanup: consumerd: update stale comment
Change-Id: If65648f32e5b78cfd4e16d86f4cf9eac6c1b3f43
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Thu, 11 Nov 2021 20:02:54 +0000 (15:02 -0500)]
Fix: action executor: ref count imbalance for session object
Observed issue
==============
The following scenario leads to a hang on `lttng destroy`.
# Start lttng-sessiond under gdb
$ gdb lttng-sessiond
set pagination off
set non-stop
start
break action_executor_snapshot_session_handler
$ lttng add-trigger --name my_trigger --condition=event-rule-matches --type=user:tracepoint --name=sample_component:message --action=snapshot-session my_snapshot
$ lttng create --snapshot my_snapshot
$ lttng enable-event -u -a
$ lttng start
$ start an app producing a single sample_component:message
# gdb should break on thread 6
# inside gdb
thread 6
$ lttng destroy my_snapshot
$ lttng create --snapshot my_snapshot
$ lttng enable-event -u -a
$ lttng start
# inside gdb use `continue`
$ lttng destroy my_snapshot
The destroy command hang:
Destroying session my_snapshot.... ....
Cause
=====
The scenario forces the usage of the following code path:
if (session->id != LTTNG_OPTIONAL_GET(item->context.session_id)) {
624├───────────────> DBG("Session id for session `%s` (id: %" PRIu64
625│ " is not the same that was sampled (id: %" PRIu64
626│ " at the moment the work item was enqueued for %s` action of trigger `%s`",
627│ session_name, session->id,
628│ LTTNG_OPTIONAL_GET(item->context.session_id),
629│ get_action_name(action),
630│ get_trigger_name(work_item->trigger));
631│ ret = 0;
632│ goto error_unlock_list;
633│ }
At that point a reference on the session object was taken on line:
610│ session = session_find_by_name(session_name);
But the reference is never put on `error_unlock_list` resulting in a ref
count problem.
Solution
========
Use `session_put` for the code path.
Note that most of the handler also have the same problem that was
introduced by commit
72365501d3148ca977a09bad8de0ec51b427bdd8 [1]
Known drawbacks
=========
None.
Refs
=========
[1] https://github.com/lttng/lttng-tools/commit/
72365501d3148ca977a09bad8de0ec51b427bdd8
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I23c3c089866df74854bbfe64320310c4b28ee41d
Mathieu Desnoyers [Thu, 2 Dec 2021 22:33:55 +0000 (17:33 -0500)]
Fix: relayd: `!vsession->current_trace_chunk` assertion failed
Observed issue
==============
When performing:
#!/bin/bash
lttng create py_syscalls --live
lttng enable-event -u -a
lttng enable-event -k -a
lttng start
babeltrace2 -i lttng-live net://localhost/host/raton/py_syscalls
The relay daemon hits this assertion:
Thread 8 (Thread 0x7fffeeffd700 (LWP 167040) "lttng-relayd"):
#0 0x00007ffff7b1618b in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007ffff7af5859 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007ffff7af5729 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3 0x00007ffff7b06f36 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
#4 0x00005555555889bb in viewer_session_attach (vsession=0x7fffdc001400, session=session@entry=0x7fffe8001180) at viewer-session.c:80
#5 0x000055555557bcff in viewer_attach_session (conn=0x7fffd0001140) at live.c:1275
#6 process_control (conn=0x7fffd0001140, recv_hdr=0x7fffeeffcaf0) at live.c:2341
#7 thread_worker (data=<optimized out>) at live.c:2515
#8 0x00007ffff7ccd609 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#9 0x00007ffff7bf2293 in clone () from /lib/x86_64-linux-gnu/libc.so.6
Cause
=====
This assert appears to be entirely wrong.
It checks that the "viewer session" has a NULL current trace chunk when
attaching a session to a viewer session, but in the case where a viewer
session has multiple sessions (e.g. with kernel and ust tracing
combined), we are attaching each session individually to the viewer
session, and we set the current trace chunk of the viewer session when
we attach the first session to it.
So it is expected to be non-NULL when attaching the second session.
Solution
========
Remove the assertion.
Known limitations
=================
None.
Fixes: #1335
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I4d8f5d5347b4588144ddf449976cae5a94b81b3a
Jérémie Galarneau [Wed, 8 Dec 2021 21:07:29 +0000 (16:07 -0500)]
Docs: relayd: document the semantics of a viewer session trace chunk
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ia0af16114a5adc54b90f60f26ffffe682eecb1a6
Michael Jeanson [Fri, 15 Oct 2021 16:08:13 +0000 (12:08 -0400)]
relayd: do not link lttng-relayd on liblttng-ctl
Building on Cygwin, we get:
libtool: link: gcc -shared .libs/cyglttng-ctl-0.dll.def .libs/lttng-ctl.o .libs/snapshot.o .libs/lttng-ctl-health.o .libs/save.o .libs/load.o .libs/deprecated-symbols.o .libs/channel.o .libs/rotate.o .libs/event.o .libs/destruction-handle.o .libs/clear.o .libs/tracker.o -Wl,--whole-archive ../../../src/common/sessiond-comm/.libs/libsessiond-comm.a ../../../src/common/.libs/libcommon.a -Wl,--no-whole-archive -L/cygdrive/c/Users/jenkins/workspace/lttng-tools_master_winbuild/arch/cygwin64/babeltrace_version/stable-2.0/build/std/conf/relayd-only/liburcu_version/master/test_type/base/deps/build/lib -lxml2 -L/build/lib -lurcu -lurcu-common -lurcu-cds -lrt -pthread -g -O2 -pthread -o .libs/cyglttng-ctl-0.dll -Wl,--enable-auto-image-base -Xlinker --out-implib -Xlinker .libs/liblttng-ctl.dll.a
/usr/lib/gcc/x86_64-pc-cygwin/11/../../../../x86_64-pc-cygwin/bin/ld: cannot export error_log_time: symbol not defined
This is caused by commit
ca806b0b247f ("liblttng-ctl: use export list to
define exported symbols"). liblttng-ctl is not needed on Cygwin, so
rather than fight it to resolve the issue and have liblttng-ctl build on
Cygwin, skip building liblttng-ctl on Cygwin.
liblttng-ctl is currently built on Cygwin because relayd depends on it.
This should not be the case, as relayd does not use liblttng-ctl. Remove
the dependency on liblttng-ctl from lttng-relayd/Makefile.am. Remove the
dependency on libconfig.la (used for session config save and load) as
well. It's included in libcommon.la, so it's redundant, but relayd
doesn't need it anyway.
Change-Id: If7e1944c4a30b8adcdd6e6d3083a94f27988697e
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Fri, 15 Oct 2021 19:15:51 +0000 (15:15 -0400)]
common: split ini-config in its own convenience library
The src/common/config contains code related to two kinds of unrelated
"config": the ini config, used for configuration files, and the XML
session configuration (used for loading/saving sessions, and
incidentally MI).
Split the ini config in its own convenience library, in
src/common/ini-config and keep the rest under src/common/config.
Move ini-related things out of config/session-config.{cpp,h} and into
ini-config/ini-config.{cpp,h}.
Change-Id: Ia0b2b6cdcc15198e20444aa30f1fc86c053176d9
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Michael Jeanson [Fri, 19 Nov 2021 16:40:08 +0000 (11:40 -0500)]
Fix: C++ syntax of macOS compat code
Minors fixes to the macOS compat code to build with a C++ compiler
following the conversion of the binaries.
Change-Id: Ic879d56d0025c838d5a34b0b45d02bae02a12053
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Fri, 19 Nov 2021 19:35:51 +0000 (14:35 -0500)]
Docs: optional.h: incorrect declaration example
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I1c9f6e8c3200bc72a5eb29e6459c3449bf0771cd
Michael Jeanson [Thu, 18 Nov 2021 22:43:55 +0000 (17:43 -0500)]
Fix: g++ 4.8 doesn't support non-trivial designated initializers
GCC 4.8 doesn't support initializing a compound literal or an anonymous
structure with only empty brackets '{}'. Add members until it stops
complaining.
Change-Id: I22f05d3f8791e34da2618b0cae282da994c670f3
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Michael Jeanson [Thu, 18 Nov 2021 16:15:29 +0000 (11:15 -0500)]
Fix: pass explicit type to std::min for 32-bit platforms
Provide an explicit type to templates when comparing a 'uint64_t' with a
type that is less than 64-bits on 32-bit platforms.
main.cpp: In function ‘relay_connection_status relay_process_data_receive_payload(relay_connection*)’:
main.cpp:3632:44: error: no matching function for call to ‘min(uint64_t&, const size_t&)’
3632 | size_t recv_size = std::min(left_to_receive, chunk_size);
| ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Change-Id: I8fe8725c5c888cce9c54d564bd668e9723c0f947
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Francis Deslauriers [Wed, 17 Nov 2021 20:53:15 +0000 (15:53 -0500)]
Cleanup: ust-app.c: remove extra parenthesis in log statements
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ie3e59dbf9444e48edae9ae97c4a38770f739b698
Michael Jeanson [Mon, 1 Nov 2021 21:06:35 +0000 (17:06 -0400)]
Cleanup: typo: commiting -> committing
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: If2c6ad336e783efb0a93f29f4c5090ad4c360b0a
Michael Jeanson [Mon, 1 Nov 2021 21:05:51 +0000 (17:05 -0400)]
Cleanup: typo: existance -> existence
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I1573b1858896ee631756578afa9639533e75ca47
Francis Deslauriers [Tue, 16 Nov 2021 19:49:17 +0000 (14:49 -0500)]
Cleanup: typo: everythong -> everything
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I21604bf997ba8afd1824849f865cc3adf6c1e528
Francis Deslauriers [Fri, 1 Oct 2021 21:02:27 +0000 (17:02 -0400)]
Cleanup: typo: mutliple -> multiple
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ia9b1b4adcaed7045a60418bae93a4cf73a588f25
Simon Marchi [Fri, 15 Oct 2021 19:55:28 +0000 (15:55 -0400)]
Remove extern "C" from internal headers
All internal code is now compiled as C++, we can now remove all 'extern
"C"' declarations from internal headers. This means files will see each
other's declarations as C++, and we can now use things in headers.
Remove the min/min_t/max/max_t macros from macros.h as well.
Change-Id: I5a6b7ef60be5f46160c6d5ca39f082d2137d5a07
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Mon, 6 Sep 2021 13:45:35 +0000 (09:45 -0400)]
tests: compile some tools/tests as C++
These tests use things from the common libs, or at least include header
files from src/common. These files are going to contain C++-specific
things in a following commit, so it's easier if we compile them
tools/tests as C++.
Change-Id: Ib99f2373beb414c50eaa10b35e0d895bc37e4e64
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Sat, 4 Sep 2021 23:32:47 +0000 (19:32 -0400)]
tests: compile all unit tests as C++
Change-Id: I9938237cee13a534d5c52288983238c0871dc504
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Wed, 22 Sep 2021 12:21:14 +0000 (08:21 -0400)]
common: compile libcommon as C++
Change-Id: I5160ef36932d71a4d925018fe0763bbc78b88009
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Fri, 3 Sep 2021 21:31:29 +0000 (17:31 -0400)]
common: compile libconsumer, libust-consumer, libkernel-consumer as C++
Change-Id: I6d51a069b360121152286a674d551fd5e80bfe2f
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Fri, 3 Sep 2021 21:31:29 +0000 (17:31 -0400)]
common: compile libindex as C++
Change-Id: Ia9ade43bad7d53a652e83529d0e4f3d29b7839c8
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Fri, 3 Sep 2021 21:31:29 +0000 (17:31 -0400)]
common: compile libtestpoint as C++
Change-Id: I2dae562c4632ea3ecfa1365f573baa6d6ba232bf
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Fri, 3 Sep 2021 21:31:29 +0000 (17:31 -0400)]
common: compile librelayd as C++
Change-Id: I59e983fce3bc3a9850a9df7c1b76966d2c9f175c
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Wed, 22 Sep 2021 12:16:19 +0000 (08:16 -0400)]
common: compile libsessiond-common as C++
Convert lttcomm_readable_code to a switch in a function, and rework
lttcomm_get_readable_code a bit. That changes the behavior a bit, but
probably not in a meaningful way.
Change-Id: I92da95f5d27de9df176835e820dd81ab93fb7b89
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Fri, 3 Sep 2021 21:31:29 +0000 (17:31 -0400)]
common: compile libkernel-ctl as C++
Change-Id: If0c92341d8efea2d093d404d25399bd8d75fc059
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Fri, 3 Sep 2021 21:31:29 +0000 (17:31 -0400)]
common: compile libhealth as C++
Change-Id: Ie5362170a7b58850bbb00b30e130dfb0986a2dd3
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Wed, 22 Sep 2021 12:13:04 +0000 (08:13 -0400)]
common: compile libhashtable as C++
Change-Id: Ia91d3207ffffb0cd45ee987107058dc9e4690c94
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Wed, 6 Oct 2021 16:16:33 +0000 (12:16 -0400)]
common: compile libfilter as C++
This patch renames filter-lexer.l to filter-lexer.lpp and
filter-parser.y to filter-parser.ypp. That makes automake pass the
right options to flex/bison to generate C++ code.
In filter-lexer.lpp, Instead of having declarations with the `unused`
attribute for yyunput and yyinput, use the noinput and nounput options.
The rest of the changes are standard C to C++ conversion stuff.
Change-Id: Ie4bf1981b970145f97e8db1d88edaa2d9b95aef4
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Wed, 22 Sep 2021 01:06:30 +0000 (21:06 -0400)]
common: compile libfd-tracker as C++
Change-Id: I6eed13d24dd8a306f4aa474f7d739931f0750635
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Wed, 22 Sep 2021 01:00:28 +0000 (21:00 -0400)]
common: compile libconfig as C++
As seen previously, make some global variables that need to be exported
non-const.
config_element_tracker_pid_target_legacy seems unused and clang gives an
unused warning. It was not exposed publicly, so I think it's safe to
remove it.
Change-Id: I6f9e7d77a7a04b02ae6585383c11389869b4a79a
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Fri, 3 Sep 2021 21:31:29 +0000 (17:31 -0400)]
common: compile libcompat as C++
I got errors like these for programs that use libcompat (usually through
libcommon), but are still linked with gcc, rather than g++:
CCLD filter-grammar-test
/usr/bin/ld: ./.libs/libcommon.a(directory-handle.o):(.data.rel.local.DW.ref.__gxx_personality_v0[DW.ref.__gxx_personality_v0]+0x0): undefined reference to `_
_gxx_personality_v0'
Automake still links them with gcc, because they don't contain any C++
source directly. Fix that by changing them to be C++ source.
Change-Id: I3eeca3d9af8940795b69f48d306f282ae0b08589
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 17 Nov 2021 23:55:35 +0000 (18:55 -0500)]
.gitignore: ignore generated filter parser sources
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I19fd4aa8cb4ef0c48fbacf61aa869497777e12fa
Simon Marchi [Wed, 22 Sep 2021 00:57:54 +0000 (20:57 -0400)]
common: compile libbytecode as C++
Change-Id: I33642d43db58b9631772ad99f6fb9d4babf9d888
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Fri, 3 Sep 2021 21:31:29 +0000 (17:31 -0400)]
common: compile libstring-utils as C++
The code of string-utils.cpp is compiled as C++, but the functions are
still exported as C symbols for the moment (until all users are
converted to C++).
The only thing of interest here is this error:
CXX string-utils.lo
/home/simark/src/lttng-tools/src/common/string-utils/string-utils.cpp: In function ‘star_glob_pattern_type_flags strutils_test_glob_pattern(const char*)’:
/home/simark/src/lttng-tools/src/common/string-utils/string-utils.cpp:89:37: error: invalid conversion from ‘int’ to ‘star_glob_pattern_type_flags’ [-fpermissive]
89 | ret |= STAR_GLOB_PATTERN_TYPE_FLAG_END_ONLY;
| ~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| |
| int
In C++, you can't freely use bitwise operator on enumerators. I added
an operator|= free function to handle it, which converts to the
underlying type and back. If we have many of these enums used as flags,
we could think of adding a class for that, like enum_flags in GDB:
https://gitlab.com/gnutools/binutils-gdb/-/blob/
7a6cb96b710257a4f5bc7e85cc103b6bf8dfc25c/gdbsupport/enum-flags.h
Change-Id: I64b458a6f6c1e5a131525826a116607eef824aaa
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Wed, 6 Oct 2021 16:15:33 +0000 (12:15 -0400)]
lib: compile liblttng-ctl as C++
Same as the previous commits, but compile the liblttng-ctl library as
C++ code (while still offering a C interface).
Some exported global variables (for example in deprecated-symbols.cpp)
have to be made non-const, otherwise we get:
CXX deprecated-symbols.lo
/home/simark/src/lttng-tools/src/lib/lttng-ctl/deprecated-symbols.cpp:21:33: error: ‘visibility’ attribute ignored [-Werror=attributes]
21 | LTTNG_EXPORT const char * const config_element_pid_tracker;
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
I think this is related to the fact that const global variables
automatically have internal linkage in C++.
Despite using -export-symbols in src/lib/lttng-ctl/Makefile.am, some new
ELF symbols become exposed. It could be related to this, in the ld man
page:
--retain-symbols-file does not discard undefined symbols, or
symbols needed for relocations.
One new symbol I see, for example, is `_Z16connect_sessiondv`. Looking
at liblttng-ctl.so, I indeed see a relocation for this symbol:
000000000010b778 0000053e00000007 R_X86_64_JUMP_SLOT
00000000000314a2 _Z16connect_sessiondv + 0
And that would explain why the linker keeps that symbol visible. This
is related to the entry for this function in the procedure linkage
table. I'm not entirely sure why these functions didn't generate a PLT
entry in C, but do in C++.
To avoid these new symbols, build everything with -fvisibility=hidden
and -fvisibility-inlines-hidden, and tag functions we really want to
export with LTTNG_EXPORT, a new macro defined to
__attribute__((visibility("default"))). This macro is publicly visible,
because it has to be used in distributed header files (although it's of
no use for users of liblttng-ctl).
Change-Id: Ie51bf0a2edfb87e5f46f9c39eed5309d9f8c41d6
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Fri, 3 Sep 2021 21:31:29 +0000 (17:31 -0400)]
bin: compile lttng-consumerd as a C++
Same as the previous commits, but change lttng-consumerd to be a C++
program.
Function lttng_consumer_get_type is dlsym-ed by
tests/regression/tools/notification/consumer_testpoints.c, so we have
two choices: make the dlsym use the mangled name or make the function
have a C linkage name. The latter is better, because the mangled name
is implementation-defined, and therefore more fragile.
Change-Id: I1986e38c193a935721e52c89426c89fd434ccd6b
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Fri, 3 Sep 2021 21:31:29 +0000 (17:31 -0400)]
bin: compile lttng-crash as a C++
Same as the previous commits, but change lttng-crash to be a C++
program.
Change-Id: Ia487e12d2db07e2cd091637d38669289bcaff5d4
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Fri, 3 Sep 2021 21:31:29 +0000 (17:31 -0400)]
bin: compile lttng-relayd as a C++
Same as the previous commit, but change lttng-relayd to be a C++
program. In addition to what was mentioned in previous commits:
- Test test_relayd_backward_compat_group_by_session uses an object file
from the relayd binary, so it's simpler to change this test to be C++
as well.
- There are some declarations related to fs_handle in fd-tracker.h.
Remove them, as they are already in fs-handle.h.
Change-Id: I72c5fdecd2e82c30633563a47bd641af1dcfa29c
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Wed, 6 Oct 2021 16:14:41 +0000 (12:14 -0400)]
bin: compile lttng-sessiond as C++
Same as commit
48a400056134 ("bin: compile lttng as C++"), but change
lttng-sessiond to be a C++ program. In addition to the categories of
changes already mentioned in that commit's message, here are some
interesting changes:
- Add an include in trigger.h, an exported header, to fix:
CXX notification-thread.lo
In file included from /home/simark/src/lttng-tools/src/bin/lttng-sessiond/notification-thread.cpp:9:
/home/simark/src/lttng-tools/include/lttng/trigger/trigger.h:142:13: error: use of enum ‘lttng_error_code’ without previous declaration
142 | extern enum lttng_error_code lttng_register_trigger_with_name(
| ^~~~~~~~~~~~~~~~
- We get this with clang:
CXX lttng-conf.o
In file included from /home/simark/src/lttng-tools/src/bin/lttng/conf.cpp:18:
In file included from /home/simark/src/lttng-tools/src/common/common.h:14:
In file included from /home/simark/src/lttng-tools/src/common/runas.h:17:
In file included from /home/simark/src/lttng-tools/src/common/sessiond-comm/sessiond-comm.h:38:
In file included from /home/simark/src/lttng-tools/src/common/unix.h:17:
/home/simark/src/lttng-tools/src/common/payload-view.h:82:27: error: 'lttng_payload_view_from_payload' has C-linkage specified, but returns user-defined type 'struct lttng_payload_view' which is incompatible with C [-Werror,-Wreturn-type-c-linkage]
struct lttng_payload_view lttng_payload_view_from_payload(
^
Turns out that because of the "const" field in lttng_payload_view,
clang doesn't consider that type incompatible with C. I don't
really want to remove the "const" for C code using that API, so
conditionally remove it if we are compiling with clang in C++.
- clang gives:
CXX event.lo
In file included from /home/simark/src/lttng-tools/src/bin/lttng-sessiond/event.cpp:19:
/home/simark/src/lttng-tools/src/common/bytecode/bytecode.h:50:1: error: struct has size 0 in C, size 1 in C++ [-Werror,-Wextern-c-compat]
struct literal_string {
^
It looks like that type isn't even used? Remove it.
- it's not possible to initialize some union members, for example with
lttcomm_consumer_msg, in consumer.cpp. Initialize it in a separate
statement.
- It's not possible to use the transparent union trick when calling
urcu function, for example in thread_application_registration, in
register.cpp. We need to instantiate a cds_wfcq_head_ptr_t object,
assign the appropriate field, and pass that object to the function.
- the ALIGNED_CONST_PTR trick does not work in C++:
CXX consumer.lo
In file included from /home/simark/src/lttng-tools/src/common/error.h:19,
from /home/simark/src/lttng-tools/src/common/common.h:12,
from /home/simark/src/lttng-tools/src/bin/lttng-sessiond/consumer.cpp:19:
/home/simark/src/lttng-tools/src/bin/lttng-sessiond/consumer.cpp: In function ‘int consumer_send_relayd_socket(consumer_socket*, lttcomm_relayd_sock*, consumer_output*, lttng_stream_type, uint64_t, const char*, const char*, const char*, int, const uint64_t*, time_t, bool)’:
/home/simark/src/lttng-tools/src/common/macros.h:116:58: error: expected primary-expression before ‘]’ token
116 | #define ALIGNED_CONST_PTR(value) (((const typeof(value) []) { value }))
| ^
/home/simark/src/lttng-tools/src/bin/lttng-sessiond/consumer.cpp:1192:48: note: in expansion of macro ‘ALIGNED_CONST_PTR’
1192 | ret = consumer_send_fds(consumer_sock, ALIGNED_CONST_PTR(rsock->sock.fd), 1);
| ^~~~~~~~~~~~~~~~~
Replace uses with copying the data in a local variable (which is
properly aligned), and pass the address to that variable to the
function.
- In consumer.h, an array field in a structure is defined using
the max macro. It can't be replaced with std::max, since std::max
isn't constexpr in C++11. Define a max_constexpr function locally
and use it.
- g++ 7 doesn't support non-trivial designated initializers, leading to
errors like:
CXX globals.lo
/home/smarchi/src/lttng-tools/src/bin/lttng-sessiond/globals.cpp:44:1: sorry, unimplemented: non-trivial designated initializers not supported
};
^
Change consumer_data to have a constructor instead. Change
initializations of some structures, such as lttcomm_lttng_msg, to
initialize the fields separate from the variable declaration. This
requires making these variable non-const which is not ideal. But
once everything is C++, these types could get a fancy constructor,
and then they can be made const again.
- When compiling without UST support the stub versions of functions
ust_app_rotate_session & co, in ust-app.h, are used. Some of them
have the return type "enum lttng_error_code", but return 0, an invalid
value, causing:
CXX main.o
In file included from /home/smarchi/src/lttng-tools/src/bin/lttng-sessiond/lttng-sessiond.h:22:0,
from /home/smarchi/src/lttng-tools/src/bin/lttng-sessiond/main.cpp:45:
/home/smarchi/src/lttng-tools/src/bin/lttng-sessiond/ust-app.h: In function ‘lttng_error_code ust_app_snapshot_record(ltt_ust_session*, const consumer_output*, int, uint64_t)’:
/home/smarchi/src/lttng-tools/src/bin/lttng-sessiond/ust-app.h:575:9: error: invalid conversion from ‘int’ to ‘lttng_error_code’ [-fpermissive]
return 0;
^
Change these functions to return LTTNG_ERR_UNK. These functions are
not supposed to be called if UST support is not included. But even
if they were: all their callers check that the return value is not
LTTNG_OK. The value 0 would be considered an error, so will be
LTTNG_ERR_UNK.
Change-Id: I2cdd34459a54b1943087b43843ef20b35b7bf7d8
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Wed, 10 Nov 2021 13:42:25 +0000 (08:42 -0500)]
Fix: tests: fix unused-but-set warning in test_fd_tracker.c
When building with clang-14 on Ubuntu 20.04, I get:
CC test_fd_tracker.o
/home/smarchi/src/lttng-tools/tests/unit/test_fd_tracker.c:169:15: error: variable 'fds_set_to_minus_1' set but not used [-Werror,-Wunused-but-set-variable]
unsigned int fds_set_to_minus_1 = 0;
^
The compiler seems right, so remove fds_set_to_minus_1. It might be
that the intention was to assert something using this variable, but I
couldn't figure it out.
Change-Id: I12bfd07bca7829de8d5b85d375d9b52bd84d677a
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Wed, 10 Nov 2021 13:39:22 +0000 (08:39 -0500)]
Fix: sessiond: fix possible buffer overflow warning
When compiling with clang-14 on Ubuntu 20.04, I get:
CC lttng-syscall.lo
/home/smarchi/src/lttng-tools/src/bin/lttng-sessiond/lttng-syscall.c:70:13: error: 'fscanf' may overflow; destination buffer in argument 4 has size 255, but the corresponding specifier may require size 256 [-Werror,-Wfortify-source]
&index, name, &bitness) == 3) {
^
I think the compiler is right, we read a string when length up to 255 in
a buffer of size 255. We need one more byte for the NULL terminator,
fix that.
Change-Id: I6b2eec401af3ef6230dd4b6c8559032de9b54584
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Wed, 17 Nov 2021 02:29:16 +0000 (21:29 -0500)]
common: fix compilation when strnlen, strnup and memrchr are not available
Add a few casts to fix build failure when these functions are not
available. For example:
CXX lttng-conf.o
In file included from /home/smarchi/src/lttng-tools/src/common/macros.h:15:0,
from /home/smarchi/src/lttng-tools/src/common/error.h:19,
from /home/smarchi/src/lttng-tools/src/common/common.h:12,
from /home/smarchi/src/lttng-tools/src/bin/lttng/conf.cpp:18:
/home/smarchi/src/lttng-tools/src/common/compat/string.h: In function ‘size_t lttng_strnlen(const char*, size_t)’:
/home/smarchi/src/lttng-tools/src/common/compat/string.h:28:14: error: invalid conversion from ‘const void*’ to ‘const char*’ [-fpermissive]
end = memchr(str, 0, max);
~~~~~~^~~~~~~~~~~~~
This can be tested by modifying config.cache and setting the relevant
config variables to "no".
Change-Id: Ieb6debf32c82927767ad32d2f4aa62fb829a9f9f
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Mon, 23 Aug 2021 21:12:28 +0000 (17:12 -0400)]
Fix: tests: app unregistering is not guaranteed by app lifetime
Observed issue
==============
The per-pid timer based rotation tests fail on a minimal ptest
yocto image.
The test suite report that the second archive is not empty as it
expects.
Note that the yocto/OE image is running under QEMU without
KVM.
Cause
=====
Since the image is running under QEMU without KVM, the overall
processing capability of the VM is quite limited.
The test seems to assume that between the first and the second rotation
the app will be unregistered by the time the second rotation is issued.
Note that the observable lifetime of an app is not equal to the
lttng-sessiond/consumerd app visibility since we deal with app
unregistration via a polling mechanism.
Note, that as far as I understand, this is a testing issue only.
It is still relevant in the context of rotation to validate that the second
rotation archive does NOT contain info for a "dead" app under per-pid
configuration.
Solution
========
Move the rotation timer operation after the app is registered and
considered unregistered from the point of view of
lttng-sessiond/lttng-consumerd. This should give us a more robust
approach.
Known drawbacks
=========
None.
References
==========
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ie8c542d29ef8bdb325efc05de14e80b179c68754
Jérémie Galarneau [Tue, 2 Nov 2021 19:35:27 +0000 (15:35 -0400)]
Clean-up: lttngctl: copy string using strdup instead of asprintf
Use strdup to copy a string instead of using the formatting utilities.
The code of the function is also adapted to follow the coding standards.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I13328737e42a0287a7402761f75c67eef0dbdfdd
Jonathan Rajotte [Tue, 19 Oct 2021 19:22:39 +0000 (15:22 -0400)]
Fix: lttng-ctl: tracing_group memory leaks
Observed issue
==============
liblttng-ctl leaks memory if `lttng_set_tracing_group` is called at least
1 time by an API client.
joraj@~/lttng/master/lttng-tools-dev [master][]$ valgrind --leak-check=full lttng --group=joraj list
==24823== Memcheck, a memory error detector
==24823== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==24823== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==24823== Command: lttng --group=joraj list
==24823==
Error: No session daemon is available
==24823==
==24823== HEAP SUMMARY:
==24823== in use at exit: 8 bytes in 1 blocks
==24823== total heap usage: 55 allocs, 54 frees, 87,023 bytes allocated
==24823==
==24823== 8 bytes in 1 blocks are definitely lost in loss record 1 of 1
==24823== at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==24823== by 0x4BA7DC7: __vasprintf_internal (vasprintf.c:71)
==24823== by 0x4C4B742: __asprintf_chk (asprintf_chk.c:34)
==24823== by 0x48687D9: asprintf (stdio2.h:181)
==24823== by 0x48687D9: lttng_set_tracing_group (lttng-ctl.c:2620)
==24823== by 0x4011B89: call_init.part.0 (dl-init.c:72)
==24823== by 0x4011C90: call_init (dl-init.c:30)
==24823== by 0x4011C90: _dl_init (dl-init.c:119)
==24823== by 0x4001139: ??? (in /usr/lib/x86_64-linux-gnu/ld-2.31.so)
==24823== by 0x2: ???
==24823== by 0x1FFEFFFCFE: ???
==24823== by 0x1FFEFFFD04: ???
==24823== by 0x1FFEFFFD12: ???
==24823==
==24823== LEAK SUMMARY:
==24823== definitely lost: 8 bytes in 1 blocks
==24823== indirectly lost: 0 bytes in 0 blocks
==24823== possibly lost: 0 bytes in 0 blocks
==24823== still reachable: 0 bytes in 0 blocks
==24823== suppressed: 0 bytes in 0 blocks
==24823==
==24823== For lists of detected and suppressed errors, rerun with: -s
==24823== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Cause
=====
The allocated pointer in the library constructor is not freed on
subsequent assignation.
Solution
========
Free the pointer.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ie1d4c45df2764a88c74d56de691783df9215633c
Jonathan Rajotte [Fri, 15 Oct 2021 19:50:19 +0000 (15:50 -0400)]
Cleanup: missing ')'
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I3198306b7f7cb081155dd53631db26e282f23526
Francis Deslauriers [Tue, 2 Nov 2021 13:33:02 +0000 (09:33 -0400)]
Fix: use <unistd.h> instead of <sys/unistd.h>
Fixes: #1330
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I07cabde5a0295de06f7c6f42dd12de803b57c907
Francis Deslauriers [Mon, 1 Nov 2021 19:31:26 +0000 (15:31 -0400)]
Fix: Tests: unchecked `close()` return value
CID
1465101 (#1 of 1): Unchecked return value (CHECKED_RETURN)
9. check_return: Calling close without checking return value (as is done
elsewhere 177 out of 185 times).
CID
1465100 (#1 of 1): Unchecked return value (CHECKED_RETURN)
4. check_return: Calling close without checking return value (as is done
elsewhere 177 out of 185 times)
CID
1465099 (#1 of 1): Unchecked return value (CHECKED_RETURN) 4.
check_return: Calling close without checking return value (as is done
elsewhere 177 out of 185 times).
CID
1465098 (#1 of 1): Unchecked return value (CHECKED_RETURN) 4.
check_return: Calling close without checking return value (as is done
elsewhere 177 out of 185 times).
CID
1465097 (#1 of 1): Unchecked return value (CHECKED_RETURN) 4.
check_return: Calling close without checking return value (as is done
elsewhere 177 out of 185 times).
Reported-by: Coverity Scan
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I8e2552c75ab7cec5aa3707e2c1c4d9f2484b501a
Jérémie Galarneau [Mon, 1 Nov 2021 19:43:55 +0000 (15:43 -0400)]
Fix: relayd: live: mishandled initial null trace chunk
Observed issue
==============
As reported in #1323 (https://bugs.lttng.org/issues/1323), crashes of
the relay daemon are observed when running the user space clear tests.
The crash occurs with the following stack trace:
#0 0x000055fbb861d6ae in urcu_ref_get_unless_zero (ref=0x28) at /usr/local/include/urcu/ref.h:85
#1 lttng_trace_chunk_get (chunk=0x0) at trace-chunk.c:1836
#2 0x000055fbb86051e2 in make_viewer_streams (relay_session=relay_session@entry=0x7f6ea002d540, viewer_session=<optimized out>, seek_t=seek_t@entry=LTTNG_VIEWER_SEEK_BEGINNING, nb_total=nb_total@entry=0x7f6ea9607b00, nb_unsent=nb_unsent@entry=0x7f6ea9607aec, nb_created=nb_created@entry=0x7f6ea9607ae8, closed=<optimized out>) at live.c:405
#3 0x000055fbb86061d9 in viewer_get_new_streams (conn=0x7f6e94000fc0) at live.c:1155
#4 process_control (conn=0x7f6e94000fc0, recv_hdr=0x7f6ea9607af0) at live.c:2353
#5 thread_worker (data=<optimized out>) at live.c:2515
#6 0x00007f6eae86a609 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#7 0x00007f6eae78f293 in clone () from /lib/x86_64-linux-gnu/libc.so.6
The race window during which this occurs seems very small as it can take
hours to reproduce this crash. However, a minimal reproducer could be
identified, as stated in the bug report.
Essentially, the same crash can be reproduced by attaching a live viewer
to a session that has seen events being produced, been stopped and been
cleared.
Cause
=====
The crash occurs as an attempt is made to take a reference to a viewer
session’s trace chunk as viewer streams are created. The crux of the
problem is that the code doesn’t expect a viewer session’s trace chunk
to be NULL.
The viewer session’s current trace chunk is initially set, when a viewer
attaches to the viewer session, to a copy the corresponding
relay_session’s current trace chunk.
A live session always attempts to "catch-up" to the newest available
trace chunk. This means that when a viewer reaches the end of a trace
chunk, the viewer session may not transition to the "next" one: it jumps
to the most recent trace chunk available (the one being produced by the
relay_session). Hence, if the producer performs multiple rotations
before a viewer completes the consumption of a trace chunk, it will skip
over those "intermediary" trace chunks.
A viewer session updates its current trace chunk when:
1) new viewer streams are created,
2) a new index is requested,
3) metadata is requested.
Hence, as a general principle, the viewer session will reference the
most recent trace chunk available _even if its streams do not point to
it_. It indicates which trace chunk viewer streams should transition to
when the end of their current trace chunk is reached.
The live code properly handles transitions to a null chunk. This can be
verified by attaching a viewer to a live session, stopping the session,
clearing it (thus entering a null trace chunk), and resuming tracing.
The only issue is that the case where the first trace chunk of a viewer
session is "null" (no active trace chunk) is mishandled in two places:
1) in make_viewer_streams(), where the crash is observed,
2) in viewer_get_metadata().
Solution
========
In make_viewer_streams(), it is assumed that a viewer session will have
a non-null trace chunk whenever a rotation is not ongoing. This is
reflected by the fact that a reference is always acquired on the viewer
session’s trace chunk.
That code is one of the three places that can cause a viewer session’s
trace chunk to be updated. We still want to update the viewer session to
the most recently seen trace chunk (null, in this case). However, there
is no reference to acquire and the trace chunk to use for the creation
of the viewer stream is NULL. This is properly handled by
viewer_stream_create().
The second site to change is viewer_get_metadata() which doesn’t handle
a viewer metadata stream not having an active trace chunk at all.
Thankfully, the protocol allows us to express this condition by
returning the LTTNG_VIEWER_NO_NEW_METADATA status code when a viewer
metadata stream doesn’t have an open file and doesn’t have a current
trace chunk.
Surprisingly, this bug didn’t trigger in the case where a transition to
a null chunk occurred _after_ attaching to a viewer session.
This is because viewers will typically ask for metadata as a result of an
LTTNG_VIEWER_FLAG_NEW_METADATA reply to the GET_NEXT_INDEX command. When
a session is stopped and all data was consumed, this command returns
that no new data is available, causing the viewers to wait and ask again
later.
However, when attaching, babeltrace2 (at least, and probably babeltrace 1.x)
always asks for an initial segment of metadata before asking for an
index.
Known drawbacks
===============
None.
Fixes: #1323
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I516fca60755e6897f6b7170c12d706ef57ad61a5
Jérémie Galarneau [Mon, 1 Nov 2021 19:44:04 +0000 (15:44 -0400)]
Docs: relayd: document the lifetime of viewer session trace chunks
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I167ae4e651099c824bb24d5497199d1414b6ca1a
Francis Deslauriers [Fri, 8 Oct 2021 13:41:04 +0000 (09:41 -0400)]
Tests: use babeltrace2 for all tests
- Change value of `BABELTRACE_BIN` to `babeltrace2`,
- Replace all direct calls to babeltrace with calls to `BABELTRACE_BIN`
variable,
- Add `bail_out_if_no_babeltrace` bash function to fail the test if
Babeltrace 2 is needed but not found.
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ieeffe4977660f47561732223e94a1b8f5a00ef0e
Francis Deslauriers [Thu, 7 Oct 2021 19:25:28 +0000 (15:25 -0400)]
Tests: port validate_select_poll_epoll.py to bt2 python bindings
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I6ed3177e5bd31a56f4a1e84b53bf1d9025ac3161
Francis Deslauriers [Thu, 30 Sep 2021 18:43:11 +0000 (14:43 -0400)]
Fix: configure.ac: reporting SDT uprobe as a UST feature
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I86638d6a148b04e7131e4af7ec830c5e56817fdc
Francis Deslauriers [Thu, 28 Oct 2021 19:36:49 +0000 (15:36 -0400)]
Cleanup: Tests: remove trace after test
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I28024856d22f395922996fbb69c490fb1783b2e9
Francis Deslauriers [Wed, 20 Oct 2021 14:46:39 +0000 (10:46 -0400)]
Cleanup: Remove unused `live_find_viewer_stream_by_id()` func declaration
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I78f977a94ea3609fbee1176608ff97399e314e7f
Francis Deslauriers [Thu, 7 Oct 2021 18:52:27 +0000 (14:52 -0400)]
Fix: Tests: leaking epoll fd
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I5ec4fcdb87159f35932c20e7314cda764d14967c
Francis Deslauriers [Mon, 25 Oct 2021 15:32:24 +0000 (11:32 -0400)]
Typo: occurences -> occurrences
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I719e26febd639f3b047b6aa6361fc6734088e871
Jérémie Galarneau [Mon, 18 Oct 2021 18:43:18 +0000 (14:43 -0400)]
Fix: lttng: add-context: silence coverity warning
Coverity reports that:
1464653 Uninitialized scalar field
The field will contain an arbitrary value left over from earlier
computations.
In ctx_opts::ctx_opts(): A scalar field is not initialized by the
constructor.
uninit_member: Non-static class member hide_help is not initialized
in this constructor nor in any functions that it calls.
In our case it doesn't matter since a nullptr symbol indicates the end
the ctx_opts array. An "unknown" value is added to the context array
and used to initialize the end-of-list item to an invalid value.
Calling the ctx_opts(const char *symbol_, context_type ctx_type_, bool
hide_help_ = false) constructor causes `hide_help` to be initialized.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ie685fef5712705a1938d9e6b8a63d23ab5bc6b39
Jérémie Galarneau [Fri, 15 Oct 2021 21:03:38 +0000 (17:03 -0400)]
Build fix: Missing message in LTTNG_DEPRECATED invocation
Coverity scan build jobs fail since LTTNG_DEPRECATED expects a string
and none is provided at the lttng_metadata_regenerate use site.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I7e6701abd24c679f578b0adead771ac93b6566cd
Simon Marchi [Fri, 3 Sep 2021 21:31:28 +0000 (17:31 -0400)]
bin: compile lttng as C++
Compile the code of the lttng binary as C++ source.
- start by renaming all files under src/bin/lttng to have the .cpp
extension, adjust Makefile.am accordingly
- apply the sophisticated algorithm:
while does_not_build():
fix_error()
until completion
Fixes fall in these categories:
- add extern "C" to headers of functions implemented in C. This is
likely temporary: at some point some of these things will be
implemented in C++, at which point we'll remove the extern "C".
- rename mi_lttng_version to mi_lttng_version_data, to avoid a -Wshadow
warning about the mi_lttng_version function hiding the
mi_lttng_version's struct constructor
- we have the same warning about lttng_calibrate, but we can't rename
it, it's exposed in a public header. Add some pragmas to disable the
warning around there. We will need more macro smartness in case we
need to support a compiler that doesn't understand these pragmas.
- in filter-ast.h, add a dummy field to the empty struct, to avoid a
-Wextern-c-compat warning with clang++ (it warns us that the struct
has size 0 in C but size 1 in C++).
- in add_context.cpp, we can't initialize ctx_opts' union field like we
did in C. Fix that by adding a ctx_opts constructor for each kind of
context and implement the PERF_* macros to use them.
- need to explicitly cast void pointer to type of the destination, for
example the eturn value of allocation functions, or parameter of
"destroy" functions
- need to explicitly cast when passing an int to an enum parameter, for
example an lttng_error_code parameter
- remove use of designated array initializers, for example for
schedule_type_str in disable_rotation.cpp
- fix order of struct initializers to match order of field
declarations, for example in list_triggers.cpp, function
cmd_list_triggers
- rename some things to avoid clashing with keywords, for example in
runas.h
Change-Id: Id743b141552a412b4104af4dda8969eef5032388
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Michael Jeanson [Wed, 6 Oct 2021 16:30:25 +0000 (12:30 -0400)]
Cleanup: always use sysconf to get the page size
Use 'sysconf(_SC_PAGE_SIZE)' across the code base which is works on all
our supported platforms.
Change-Id: I4231d45e0b03301de1274c0a5a4903cd17b4a80a
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Michael Jeanson [Fri, 16 Jul 2021 17:58:10 +0000 (13:58 -0400)]
Cleanup: namespace 'align' macros
Remove the duplicate ALIGN() and ALIGN_TO() macro and replace them with
namespaced variants 'lttng_align_ceil()' and 'lttng_align_floor()'.
Change-Id: I683baccb4e97874e647cf557bad9653a336f4a6d
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Thu, 7 Oct 2021 20:19:41 +0000 (16:19 -0400)]
Fix: sessiond: previously created channel cannot be enabled
Observed issue
==============
A previously created channel cannot be enabled back once a session is
started.
Cause
=====
The check validating that the session was started is to early in the
`cmd_enable_channel` function.
Solution
========
Move the check at the creation code path when the channel is not found.
Known drawbacks
=========
None.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Change-Id: I8e7d62b7e97246e65f1cf9022270293a6dd34cc9
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Thu, 8 Jul 2021 18:17:51 +0000 (14:17 -0400)]
Fix: ust: app stuck on recv message during UST comm timeout scenario
Observed issue
==============
The following scenario lead to the UST thread to be "stuck" on recvmsg
on the notify socket.
The problem manifest itself when an application is unresponsive during
the ustctl_start_session call. Note that the default timeout for ust
communication is 5 seconds.
# Start an instrumented app
./app
gdb lttng-sessiond
# put a breakpoint on ustctl_start_session
lttng create my_session
lttng enable-event -u -a
lttng start
# The tracepoint should hit. Do not continue.
kill -s SIGSTOP $(pgrep app)
# Continue lttng-sessiond.
sleep 5 # This make sure lttng-sessiond unregister the app from its point of view
kill -s SIGCONT $(pgrep app)
gdb -p $(pgrep app)
thread apply all bt
App stack trace:
Thread 3 (Thread 0x7fe2c6f58700 (LWP 48172)):
#0 __libc_recvmsg (flags=0, msg=0x7fe2c6f56ac0, fd=4) at ../sysdeps/unix/sysv/linux/recvmsg.c:28
#1 __libc_recvmsg (fd=fd@entry=4, msg=msg@entry=0x7fe2c6f56ac0, flags=flags@entry=0) at ../sysdeps/unix/sysv/linux/recvmsg.c:25
#2 0x00007fe2c7a010ba in ustcomm_recv_unix_sock (sock=sock@entry=4, buf=buf@entry=0x7fe2c6f56ea0, len=len@entry=48) at lttng-ust-comm.c:308
#3 0x00007fe2c7a037c3 in ustcomm_register_channel (sock=4, session=session@entry=0x7fe2c0000ba0, session_objd=<optimized out>, channel_objd=<optimized out>, nr_ctx_fields=nr_ctx_fields@entry=0, ctx_fields=<optimized out>, chan_id=0x7fe2
c6f5716c, header_type=0x7fe2c0012b18) at lttng-ust-comm.c:1544
#4 0x00007fe2c7a10787 in lttng_session_enable (session=0x7fe2c0000ba0) at lttng-events.c:444
#5 0x00007fe2c7a0b785 in lttng_session_cmd (objd=1, cmd=128, arg=
140611977311672, uargs=0x7fe2c6f57800, owner=0x7fe2c7a5da00 <local_apps>) at lttng-ust-abi.c:576
#6 0x00007fe2c7a07d6d in handle_message (lum=0x7fe2c6f57590, sock=3, sock_info=0x7fe2c7a5da00 <local_apps>) at lttng-ust-comm.c:1003
#7 ust_listener_thread (arg=0x7fe2c7a5da00 <local_apps>) at lttng-ust-comm.c:1712
#8 0x00007fe2c7993609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#9 0x00007fe2c78ba293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
...
Cause
=====
When the app continues after the timeout from lttng-sessiond side, the
actual start_session message is received on the application side then
UST, app side, send commands on the notify socket. On lttng-sessiond
side, the command is received but no reply is sent.
This is due to the fact that the lookup against the
ust_app_ht_by_notify_sock hash table (find_app_by_notify_sock)
return nothing since the app is unregistered at this point and the hash
table node was removed on unregistration.
Solution
========
When the app lookup fails, return an error that will trigger the cleanup
of the notify socket.
Known drawbacks
=========
None
Note
=========
Subsequent error path in reply_ust_register_channel,
add_event_ust_registry, and add_enum_ust_registry might lead to the same
type of problem since no reply is sent to the app. Still, for those
cases the complete application/notify socket should not be destroyed
since the error path relate to either a session or a sub object of a
session.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Change-Id: Iea0dc027ca1ee772e84c7e545114f1be69fd1f63
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Wed, 23 Jun 2021 02:17:03 +0000 (22:17 -0400)]
Fix: ust: UST communication can return -EAGAIN
Observed issue
==============
The following scenario lead to an abort on event creation. The
problem manifest itself when an application is unresponsive. Note that
the default timeout for ust communication is 5 seconds.
# Start an instrumented app
./app
gdb lttng-sessiond
# put a breakpoint on ustctl_create_event.
lttng create my_session
lttng enable-event -u -a
lttng start
# The tracepoint should hit. Do not continue.
kill -s SIGSTOP $(pgrep app)
# Continue lttng-sessiond.
# lttng-sessiond will abort.
Note that for UST this is not an expected behaviour. Expected
communication failure with a single app should not invalidate the
complete channel, compromise its setup or result in an abort.
Note that a similar scenario for the following ustctl call sites also
lead to scenario where failure of a single app lead to error reporting
and/or error propagation to upper level object.
Problematic callsites:
ustctl_set_exclusion
ustctl_set_filter
ustctl_disable_channel
These callsites are also fixed by this patch.
Cause
=====
For an unresponsive application, EAGAIN is returned and is treated as an
"unknown" hard error.
In this particular case the abort() call was introduced by commit:
88e3c2f5610b9ac89b0923d448fee34140fc46fb [1]. It is not clear if this is
a leftover from debugging session since this is the only callsite where
an abort is issued on communication failure via ustctl.
Solution
========
Handle EAGAIN coming from ustctl_* and treat it the same way a
dying application is handled. The only minor difference is that we WARN
on communication time out. Albeit not the most useful thing for a CLI
client, it could help overall user of lttng-sessiond in time out
situation.
Most call site already handled "unknown" error correctly. For those call
site we simply end up bringing more info in regards to the timeout
issue instead of mentioning that "-11" was returned.
Note, the reclamation of "app" is handled by the poll loop and
ust_app_unregister since the socket is shutdown by lttng-ust internally
on error, including EAGAIN.
Note that the application will try to register itself back to the
lttng-sessiond based on its configuration.
Known drawbacks
=========
None
Note
==========
Some logging call sites used the ppid of the app instead of the pid.
Those have been changed to pid.
References
==========
[1] https://github.com/lttng/lttng-tools/commit/
88e3c2f5610b9ac89b0923d448fee34140fc46fb
Fixes: #1384
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Change-Id: If364b5d48e7fd2b664276a0fb1b7eec2c45ed683
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Mon, 12 Jul 2021 20:44:38 +0000 (16:44 -0400)]
Fix: ust: segfault on lttng start on filter bytecode copy
Observed issue
==============
A segmentation fault is observed for multiple UST timeout scenarios.
Backtrace:
#0 __memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:384
#1 0x0000557fe0395df9 in copy_filter_bytecode (orig_f=0x7f9c5802b790) at ust-app.c:1196
#2 0x0000557fe0397702 in shadow_copy_event (ua_event=0x7f9c58025ff0, uevent=0x7f9c58033560) at ust-app.c:1824
#3 0x0000557fe039ac46 in create_ust_app_event (ua_sess=0x7f9c5802ec20, ua_chan=0x7f9c58025cc0, uevent=0x7f9c58033560, app=0x7f9c5c001da0) at ust-app.c:3192
#4 0x0000557fe03a054d in ust_app_channel_synchronize_event (ua_chan=0x7f9c58025cc0, uevent=0x7f9c58033560, ua_sess=0x7f9c5802ec20, app=0x7f9c5c001da0) at ust-app.c:5096
#5 0x0000557fe03a0772 in ust_app_synchronize (usess=0x7f9c580074a0, app=0x7f9c5c001da0) at ust-app.c:5173
#6 0x0000557fe03a0a70 in ust_app_global_update (usess=0x7f9c580074a0, app=0x7f9c5c001da0) at ust-app.c:5255
#7 0x0000557fe03a00e0 in ust_app_start_trace_all (usess=0x7f9c580074a0) at ust-app.c:4987
#8 0x0000557fe0355c6a in cmd_start_trace (session=0x7f9c5800a190) at cmd.c:2668
#9 0x0000557fe0382e70 in process_client_msg (cmd_ctx=0x7f9c58003d70, sock=0x7f9c74bf44e0, sock_error=0x7f9c74bf44e4) at client.c:1527
#10 0x0000557fe03848a2 in thread_manage_clients (data=0x557fe06d9440) at client.c:2200
#11 0x0000557fe037d1cb in launch_thread (data=0x557fe06d94b0) at thread.c:75
#12 0x00007f9c796af609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#13 0x00007f9c795b6293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
The scenario:
# Start an instrumented app
./app
gdb lttng-sessiond
# put a breakpoint on ustctl_set_filter
lttng create my_session
lttng enable-event -u tp:tp_test
lttng start
lttng enable-event -u __dummy --filter 'my_field == "user34"'
# The tracepoint should hit. Do not continue.
kill -s SIGSTOP $(pgrep app)
# Continue lttng-sessiond.
# enable-event will return an error. This a bug in itself, still let's
# continue with the current bug.
lttng stop
# Start a new app that will register.
./app &
sleep 1
lttng start
# lttng-sessiond should segfault.
Cause
=====
During the "lttng enable-event" command, the timeout error bubbles up
all the way to event_ust_enable_tracepoint and is different from
LTTNG_UST_ERR_EXIST. `trace_ust_destroy_event` is called and frees the
`uevent` object. Note that contrary to the comment `uevent` is added to
the channel event hash table at this point.
On the next `lttng start` command, the event node is still present in
the hash table and is iterated on. lttng-sessiond segfault on the first
data access of the previously freed memory.
The problem was introduced by commit
88e3c2f5610b9ac89b0923d448fee34140fc46fb [1]. Which essentially move the
callsite of `add_unique_ust_event` before `ust_app_*_event_glb` calls.
Solution
========
Go to `end` label to prevent freeing of the uevent object.
Note that app synchronization should not force an error at the channel
level, since a single app can fail but the whole channel should not.
The `error` label is now obsolete.
Known drawbacks
=========
None.
References
==========
[1] https://github.com/lttng/lttng-tools/commit/
88e3c2f5610b9ac89b0923d448fee34140fc46fb
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Change-Id: Ifaf3f4c71bb2da869c7b441aaa4b367f8f7cbdd6
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Francis Deslauriers [Mon, 27 Sep 2021 13:42:54 +0000 (09:42 -0400)]
Fix: notification-thread: handling event from a removed tracer event src
Issue
=====
The issue is caused by a race condition where the `lttng_poll_wait()`
returns a _REMOVE_TRACER_EVENT_SOURCE event followed by an actual
notification event on the removed event source fd.
This causes the notification thread to remove the fd from the potential
notification sources list and later fail to find that same fd in the
next iteration.
This race condition can lead to the notification thread to hang
indefinitely or to failed assertions within the `fini_thread_state()`
function.
Fix
===
When removing an tracer event source, force the notification thread
`lttng_poll_wait()` loop to restart to ignore events from the removed
fd.
Use the `restart_poll` for that purpose (see note below).
Reproducer
==========
It's easy to reproduce this issue by adding a `usleep(5000)` just before
the `lttng_poll_wait()` call in the notification thread.
Note
====
It's the second time that I fix this issue.
It was first fixed by this commit by adding the `restart_poll` flag:
commit
8b5240601e4ddf6127e4291b7194dd5179cb35b5
Author: Francis Deslauriers <francis.deslauriers@efficios.com>
Date: Thu Dec 10 15:41:29 2020 -0500
notification-thread: drain all tracer notification on removal
and later, that other commit refactored that code but accidently removed
the use of the `restart_poll`:
commit
34bf4f69e49d8a69331a6aa6826ef1f155e20ede
Author: Francis Deslauriers <francis.deslauriers@efficios.com>
Date: Wed May 26 16:05:16 2021 -0400
notification-thread: remove fd from pollset on LPOLLHUP and friends
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I6da0ed4374b612934adc72fb88d5c142505c5d53
Simon Marchi [Fri, 3 Sep 2021 21:31:28 +0000 (17:31 -0400)]
common/macros: don't define min and max macros in C++
Later in the series, the min macro conflicted with a C++ header file.
Since the min and max macros are not useful in C++ (std::min/std::max
are preferred, don't define them when building a C++ file. Don't define
min_t and max_t either, they too can be replaced with std::min and
std::max, which are templated / type-safe.
Change-Id: I3d56d325f6508c32baba674c335c3f4ab0ecc582
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Fri, 3 Sep 2021 21:31:28 +0000 (17:31 -0400)]
Add mandatory C++ compiler build dependency
This change is in preparation of converting some internal code to C++.
- Add ax_cxx_compile_stdcxx.m4, which provides AX_CXX_COMPILE_STDCXX, a
macro to find a C++ compiler matching certain characteristics.
- Find which warning flags work for the C++ compiler (certain warning
flags may be C or C++ specific).
- Do a bit of reorganizing to group all C compiler things together, all
C++ compiler things together.
Change-Id: I35a16996fa9ba1fbbb040f7fa5f826e6bc95ea29
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Tue, 21 Sep 2021 13:40:37 +0000 (09:40 -0400)]
liblttng-ctl: use export list to define exported symbols
Symbols are currently exported by default by liblttng-ctl.so (usable by
other shared libraries / programs using liblttng-ctl.so), so we must use
LTTNG_HIDDEN on all symbols that are meant to be internal to
liblttng-ctl.so. Of course, this is easy to forget, so over the years
many symbols that were not meant to be exported were exported, and must
now stay exported to avoid breaking the ABI.
As explained here [1], a better method is to make symbols hidden by
default, and mark those we want to be exported as such. I have tried to
use this, but when subsequently converting the code to C++, I have
noticed that some symbols related to the STL were exported anyway, which
is bad.
The other alternative, implemented in this patch, is to use an explicit
symbol export list [2], using libtool's -export-symbols (which uses the
linker's -version-script option). Only the symbols listed here are
exported.
So, in practice, this patch:
- Adds an liblttng-ctl.sym file with the list of exported symbols and
adjusts the Makefile to use the -export-symbol option
- Removes LTTNG_HIDDEN and all its uses
abidiff shows no changes for liblttng-ctl.so between before and after
this patch.
[1] https://gcc.gnu.org/wiki/Visibility
[2] https://www.gnu.org/software/libtool/manual/libtool.html#Link-mode
Change-Id: I5d8c558303894b0ad8113c6e52f79a053bb580e1
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Wed, 29 Sep 2021 13:37:01 +0000 (09:37 -0400)]
tests: add some diags in utils.sh
When debugging test cases and trying to reproduce manually, it helps me
a lot to see what commands are executed by the test cases.
Change utils.sh such that all invocations of the lttng binary go through
a function that logs (using `diag`) the executed command.
Also add some diagnostics when starting lttng-sessiond, to make that
easier to replicate by hand.
It's not perfect, if an argument contains some spaces, the diag output
will not be quoted properly, so you won't be able to copy paste the
command directly. But from what I saw, a lot of things in our testsuite
would not handle spaces (in session names for example) properly, so it's
not likely to happen often.
Here's an example of the result:
$ ./tests/regression/ust/multi-lib/test_multi_lib
1..55
# UST - Dynamic loading and unloading of libraries
# export LTTNG_SESSION_CONFIG_XSD_PATH=/home/smarchi/build/lttng-tools/tests/../src/common/config/
# env /home/smarchi/build/lttng-tools/tests/../src/bin/lttng-sessiond/lttng-sessiond --background --consumerd64-path=/home/smarchi/build/lttng-tools/tests/../src/bin/lttng-consumerd/lttng-consumerd 1
ok 1 - Start session daemon
# ./tests/regression/ust/multi-lib//../../../../src/bin/lttng/lttng create multi_lib -o /tmp/tmp.test_multi_lib_ust_trace_path.SRYlHj
ok 2 - Create session multi_lib in -o /tmp/tmp.test_multi_lib_ust_trace_path.SRYlHj
# dlopen 2 providers, same event name, same payload
# ./tests/regression/ust/multi-lib//../../../../src/bin/lttng/lttng enable-event multi:tp -s multi_lib -u
ok 3 - Enable ust event multi:tp for session multi_lib
# ./tests/regression/ust/multi-lib//../../../../src/bin/lttng/lttng start multi_lib
ok 4 - Start tracing for session multi_lib
# ./tests/regression/ust/multi-lib//../../../../src/bin/lttng/lttng stop multi_lib
ok 5 - Stop lttng tracing for session multi_lib
ok 6 - Trace match with 2 event multi:tp
ok 7 - Metadata match with the metadata of 1 event(s) named multi:tp
# ./tests/regression/ust/multi-lib//../../../../src/bin/lttng/lttng destroy multi_lib
ok 8 - Destroy session multi_lib
# ./tests/regression/ust/multi-lib//../../../../src/bin/lttng/lttng create multi_lib -o /tmp/tmp.test_multi_lib_ust_trace_path.dQ0kMN
ok 9 - Create session multi_lib in -o /tmp/tmp.test_multi_lib_ust_trace_path.dQ0kMN
# dlopen 2 providers, same event name, different payload
# ./tests/regression/ust/multi-lib//../../../../src/bin/lttng/lttng enable-event multi:tp -s multi_lib -u
ok 10 - Enable ust event multi:tp for session multi_lib
...
Change-Id: I312fc0890a2dfaedf199b9021baebac8d3bf632b
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Wed, 6 Oct 2021 15:41:19 +0000 (11:41 -0400)]
include: add missing "extern"
Change-Id: I37574b25adede7c639a04c508f6e4be8256339d9
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Wed, 6 Oct 2021 14:57:24 +0000 (10:57 -0400)]
include: remove spurious spaces in condition/session-rotation.h
Change-Id: Ia525d24c3b4098dff5c50fb2c5d93c16f6e08f5c
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Tue, 5 Oct 2021 20:10:18 +0000 (16:10 -0400)]
tests: fix header of regression/ust/getcpu-override/run-getcpu-override
The "SPDX-License-Identifier:" header is not in a comment, so is
interpreted as a bash command. This is harmless, but it appears in the
test output:
ok 13 - Start tracing for session sequence-cpu
# Launching app with getcpu-plugin wrapper
./tests/regression/ust/getcpu-override//run-getcpu-override: 2: SPDX-License-Identifier:: not found
ok 14 - Application with wrapper done
Fix that, and add a proper copyright notice, based on the other files
that were added at the same time as this one.
Change-Id: Icdf5e2fd5aec4080b2e5cad10cca4813bad26394
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Michael Jeanson [Tue, 5 Oct 2021 17:49:09 +0000 (13:49 -0400)]
cleanup: remove urcu_bp dependency from ust tests
lttng-ust >= 2.13 has its own internal copy of urcu and doesn't require
applications to link with urcu_bp anymore.
Change-Id: I0a11af7cf284952dbc26d0657eb490040acb7438
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Michael Jeanson [Thu, 5 Aug 2021 20:48:51 +0000 (16:48 -0400)]
fix: wrong define used for GCC version check
As far as I can tell, the __GNUC_MAJOR__ define has never existed, the
proper define for the major version is __GNUC__. See
https://gcc.gnu.org/onlinedocs/cpp/Common-Predefined-Macros.html for
more details.
Change-Id: I0d47d524e7efd204fd2f8976311c62e872eb6170
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Wed, 29 Sep 2021 14:38:44 +0000 (10:38 -0400)]
Cleanup: unnecessary declaration
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I4a5e45d3981064b16d0790d511a1dc7ea33904af
Jonathan Rajotte [Wed, 29 Sep 2021 14:30:31 +0000 (10:30 -0400)]
Cleanup: firing policy source files
This is a leftover of a manipulation error during a rebase.
None of these files are considered by the build system.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I8c923d65e36c850e883552a8e46333ee854588a3
Jérémie Galarneau [Mon, 4 Oct 2021 16:41:51 +0000 (12:41 -0400)]
Fix: userspace-probe: unreported error on string copy error
Issue
=====
String copy errors, either due to the length or an allocation failure,
are not reported by
lttng_userspace_probe_location_tracepoint_create_from_payload
and don't log a clear error message.
This allowed truncation bugs like the one fixed in
b45a296 to go
unnoticed.
Fix
===
Return an "invalid" status code and log a more descriptive error
message.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ia07cac7cba315ea79337262e9082dd06eb60950f
Francis Deslauriers [Fri, 1 Oct 2021 20:10:24 +0000 (16:10 -0400)]
Fix: userspace-probe: truncating binary path for SDT
Issue
=====
This issue was uncovered when we enabled the testing of the SDT
userspace probe instrumentation on the CI, where the paths to file are
specially long.
The reported error is:
- rule: ma-probe-sdt (type: kernel:uprobe, location type: SDT, location: /root/workspace/dev_gerrit_lttng-tools_rootbuild/arch/amd64/babeltrace_version/stable-2.0/build/std/conf/agents/liburcu_version/master/node/amd64-rootnode/test_type/base/src/lttng-tools/tests/utils/testapp/userspace-probe-sdt-binary/.libs/userspace-probe-sdt-binary:foobar:tp1)
+ rule: ma-probe-sdt (type: kernel:uprobe, location type: SDT, location: /root/workspace/dev_gerrit_lttng-tools_rootbuild/arch/amd64/babeltrace_version/stable-2.0/build/std/conf/agents/liburcu_version/master/node/amd64-rootnode/test_type/base/src/lttng-tools/tests/utils/testapp/userspace-probe-sdt-binary/.libs/userspace-probe-s:foobar:tp1)
The important part to notice is that the path to the binary is truncated
compared to was is expected by the test case.
The problem is caused by the
`lttng_userspace_probe_location_tracepoint_create_from_payload()`
function that strdup() the path string using the wrong defined value.
Fix
===
Use LTTNG_PATH_MAX rather then LTTNG_SYMBOL_NAME_LEN to copy the binary
path.
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I24cbf413baba405bf4c4b534ccbc2b18f8d5d43f
Jérémie Galarneau [Tue, 7 Sep 2021 21:37:37 +0000 (17:37 -0400)]
Clean-up: sessiond: remove unused ust_app_{lock, unlock}_list declarations
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ic01a48881756322ffa5ba6a977dde1bc81f4a0d0
Jérémie Galarneau [Thu, 1 Jul 2021 19:50:47 +0000 (15:50 -0400)]
Fix: lttng: add-trigger: don't provide a default event rule type
There is no reason for an event rule to have a default type. The
--type parameter is required.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ic7f03453fac410c96ca6bb3b3ca0bdfb297a10d1
Francis Deslauriers [Fri, 20 Aug 2021 19:26:01 +0000 (15:26 -0400)]
Force usage of assert() condition when NDEBUG is defined
Reuse the BT2 approach to force the usage of the assertion condition
even when assert() are removed by the NDEBUG define.
See `BT_USE_EXPR()` macro and documentation in Babeltrace commit[0]:
commit
1778c2a4134647150b199b2b57130817144446b0
Author: Philippe Proulx <eeppeliteloop@gmail.com>
Date: Tue Apr 21 11:15:42 2020 -0400
lib: assign a unique ID to each pre/postcond. and report it on failure
0: https://github.com/efficios/babeltrace/commit/
1778c2a4134647150b199b2b57130817144446b0
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I3844b6ae7e95952d90033898397ac936540b785c
Francis Deslauriers [Thu, 19 Aug 2021 21:14:46 +0000 (17:14 -0400)]
Fix: statements with side-effects in assert statements
Background
==========
When building with the NDEBUG definition the `assert()` statements are
removed.
Issue
=====
Currently, a few `assert()` statements in the code base contain
statements that have side effects and removing them changes the
behavior for the program.
Fix
===
Extract the statements with side effects out of the `assert()`
statements.
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I0b11c8e25c3380563332b4c0fad15f70b09a7335
Jonathan Rajotte [Thu, 16 Sep 2021 15:20:07 +0000 (11:20 -0400)]
Fix: lttng_trace_archive_location_serialize is called on freed memory
Observed issue
==============
The following backtrace have been reported [1].
#0 __GI_raise (sig=sig@entry=6) at /usr/src/debug/glibc/2.31+gitAUTOINC+
f84949f1c4-r0/git/sysdeps/unix/sysv/linux/raise.c:50
#1 0x0000003123025528 in __GI_abort () at /usr/src/debug/glibc/2.31+gitAUTOINC+
f84949f1c4-r0/git/stdlib/abort.c:79
#2 0x0000000000419884 in lttng_trace_archive_location_serialize (location=0x7f1c9c001160, buffer=0x7f1cb961c320) at /usr/src/debug/lttng-tools/2.13.0-r0/lttng-tools-2.13.0/src/common/location.c:230
#3 0x00000000004c8f06 in lttng_evaluation_session_rotation_serialize (evaluation=0x7f1cb000a7f0, payload=0x7f1cb961c320) at /usr/src/debug/lttng-tools/2.13.0-r0/lttng-tools-2.13.0/src/common/conditions/session-rotation.c:539
#4 0x00000000004a80fa in lttng_evaluation_serialize (evaluation=0x7f1cb000a7f0, payload=0x7f1cb961c320) at /usr/src/debug/lttng-tools/2.13.0-r0/lttng-tools-2.13.0/src/common/evaluation.c:42
#5 0x00000000004bc24f in lttng_notification_serialize (notification=0x7f1cb961c310, payload=0x7f1cb961c320) at /usr/src/debug/lttng-tools/2.13.0-r0/lttng-tools-2.13.0/src/common/notification.c:63
#6 0x0000000000458b7d in notification_client_list_send_evaluation (client_list=0x7f1cb0008f90, trigger=0x7f1ca40113d0, evaluation=<optimized out>, source_object_creds=0x7f1cb000a874, client_report=0x475840 <client_handle_transmission_status>, user_data=0x7f1cb0006010) at /usr/src/debug/lttng-tools/2.13.0-r0/lttng-tools-2.13.0/src/bin/lttng-sessiond/notification-thread-events.c:4379
#7 0x0000000000476586 in action_executor_generic_handler (item=0x7f1cb0009600, work_item=0x7f1cb000a820, executor=0x7f1cb0006010) at /usr/src/debug/lttng-tools/2.13.0-r0/lttng-tools-2.13.0/src/bin/lttng-sessiond/action-executor.c:696
#8 action_work_item_execute (work_item=0x7f1cb000a820, executor=0x7f1cb0006010) at /usr/src/debug/lttng-tools/2.13.0-r0/lttng-tools-2.13.0/src/bin/lttng-sessiond/action-executor.c:715
#9 action_executor_thread (_data=0x7f1cb0006010) at /usr/src/debug/lttng-tools/2.13.0-r0/lttng-tools-2.13.0/src/bin/lttng-sessiond/action-executor.c:797
#10 0x0000000000462327 in launch_thread (data=0x7f1cb00060b0) at /usr/src/debug/lttng-tools/2.13.0-r0/lttng-tools-2.13.0/src/bin/lttng-sessiond/thread.c:66
#11 0x0000003123408ea4 in start_thread (arg=<optimized out>) at /usr/src/debug/glibc/2.31+gitAUTOINC+
f84949f1c4-r0/git/nptl/pthread_create.c:477
#12 0x00000031230f8dcf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
This can be easily reproduced with the following session and trigger
configuration:
lttng create test
lttng enable-event -u -a
lttng start
# Register two similar triggers via a dummy C program since rotation
# completed condition is not exposed on the CLI for now. Yielding the
# following triggers:
lttng list-triggers
- name: trigger0
owner uid: 1000
condition: session rotation completed
session name: test
errors: none
action:notify
errors: none
- name: trigger1
owner uid: 1000
condition: session rotation completed
session name: test
errors: none
action:notify
errors: none
lttng rotate <- abort happens here.
Cause
=====
The problem lies in how the location (`lttng_trace_archive_location`)
object is assigned to the `lttng_evaluation` objects. A single location
object can end up being shared between multiple `lttng_evaluation` objects
since we iterate over all triggers and create an `lttng_evaluation` object
with the location each time as needed.
See `src/bin/lttng-sessiond/notification-thread-events.c:1956`.
The location object is then freed when the first notification is
completely serialized. The second serialization end up having a
reference to a freed `lttng_trace_archive_location` object.
Solution
========
Implement ref counting for the lttng_trace_archive_location object.
Note
=======
This also fixes a leak that was present in `cmd_destroy_session_reply`.
The location is created by `session_get_trace_archive_location` and is
never `destroyed`/`put`.
Known drawbacks
=========
None.
References
==========
[1] https://bugs.lttng.org/issues/1325
Fixes: #1325
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Change-Id: I99dc595ee5b0288c727b193ed061f5273752bd24
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Mon, 13 Sep 2021 20:49:48 +0000 (16:49 -0400)]
Fix: sessiond: ust session is inactive during ust_app_global_update
Observed issue
==============
The following scenario leads to an abort of lttng-sessiond.
lttng-sessiond (with kernel tracing available)
lttng create system-trace --snapshot -U /tmp/snapshot
lttng enable-channel -k system-trace --subbuf-size=4k --num-subbuf=256
lttng enable-event -c system-trace -k 'sched_wak*' -s system-trace
lttng start system-trace
lttng enable-event -u -a
Fails as expected with:
Error: Events: The command tried to enable an event in a new domain for
a session that has already been started once. (channel channel0,
session system-trace)
Launch any ust app such as easy_ust from the lttng-ust repository.
The following backtrace is generated:
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007ffff7af0859 in __GI_abort () at abort.c:79
#2 0x00007ffff7af0729 in __assert_fail_base (fmt=0x7ffff7c86588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x55555564b765 "usess->active", file=0x555555649a60 "ust-app.c", line
#3 0x00007ffff7b01f36 in __GI___assert_fail (assertion=0x55555564b765 "usess->active", file=0x555555649a60 "ust-app.c", line=5123, function=0x55555564ecf0 <__PRETTY_FUNCTION__.14199> "ust_
#4 0x00005555555d1f5e in ust_app_global_update (usess=0x7fffe001fb90, app=0x7fffac000b80) at ust-app.c:5123
#5 0x00005555555b60d4 in update_ust_app (app_sock=82) at dispatch.c:71
#6 0x00005555555b7025 in thread_dispatch_ust_registration (data=0x5555556a07f0) at dispatch.c:409
#7 0x00005555555ad5ab in launch_thread (data=0x5555556a0810) at thread.c:65
#8 0x00007ffff7ce6609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#9 0x00007ffff7bed293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
This also happens for the track command. You can replace the `lttng
enable-event -u -a` with `lttng track --userspace --vuid=0` then launch
an app and the same backtrace gets generated.
Cause
=====
During `process_client_msg` the `create_ust_session` function is called
and a ust session is assigned to the "system_trace" session with a
state of `active` set to 0 (false). This is not a problem.
The problem seems to lie with a single call site for
`ust_app_global_update` in `update_ust_app`. The status of the ust
session is not checked before calling the `ust_app_global_update`. It is
important to note that all `ust_app_global_update_all` callsites guard
the call with a check against the status of the session.
Solution
========
Guard the call to `ust_app_global_update` with a check of the ust
session active state.
Known drawbacks
=========
None.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I14d25d99d0609689247cdfa86130bd0219613581
Jonathan Rajotte [Tue, 14 Sep 2021 20:10:36 +0000 (16:10 -0400)]
Fix: common: error query for trigger action protocol error
Observed issue
==============
When listing a trigger with a single non-list action the CLI reports an
error in the protocol resulting in an output with no error accounting
for the action.
$ lttng list-triggers
- name: trigger0
owner uid: 1000
condition: session rotation ongoing
session name: test
errors: none
action:notify
Error: Failed to query errors of trigger 'trigger0' (owner uid: 1000): Protocol error occurred
Cause
=====
The `action_path` associated with the query has an index count of 0 as
it should considering that the single root element action element is not
a `list` object.
Inside `lttng_action_path_create_from_payload` a payload view is
initialized with a `len` of 0 since `header->index_count` is 0 as it
should.
The payload view is then validated and is considered invalid since the
validation check for `len` > 0. The error then bubbles up.
Solution
========
Since that the payload view is considered invalid when it is equal to
zero simply handle this special case and call directly
`lttng_action_path_create` with the appropriate parameter.
Known drawbacks
=========
None.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I8f302c3aa78835342c665793908dc02f0a9dece4
Simon Marchi [Tue, 21 Sep 2021 14:31:55 +0000 (10:31 -0400)]
Fix: common: un-hide two rate policy functions
These functions are part of the liblttng-ctl API/ABI, they should not be
hidden.
Change-Id: Ic04bb4e7a0bfd0c7d661228b7ccf5d17dccfd9ba
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Tue, 21 Sep 2021 13:30:09 +0000 (09:30 -0400)]
Fix: include: remove unneeded declaration of lttng_session_descriptor_get_session_name
There is a declaration of lttng_session_descriptor_get_session_name in
both session-descriptor.h and session-descriptor-internal.h. Since this
is a function exposed by the API, the one in -internal.h is not needed,
remove it.
Since the removed declaration had LTTNG_HIDDEN, this has the effect of
making the lttng_session_descriptor_get_session_name symbol of
liblttng-ctl exported / part of the ABI. I think it was a mistake that
it wasn't previously exported.
Change-Id: I79d383f012d161a6df42240c6849b1b3af109def
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Francis Deslauriers [Wed, 8 Sep 2021 14:16:23 +0000 (10:16 -0400)]
Fix: Tests: race condition in test_ns_contexts_change
Issue
=====
The test script doesn't wait for the test application to complete before
stopping the tracing session. The race is that depending on the
scheduling the application is not always done generating events when the
session is stopped.
Fix
===
Make the test script wait for the termination of the test app before
stopping the session.
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I29d9b41d2a2ed60a6c42020509c2067442ae332c
Francis Deslauriers [Tue, 7 Sep 2021 21:10:31 +0000 (17:10 -0400)]
Fix: Tests: race condition in test_event_tracker
Background
==========
The `test_event_tracker` file contains test cases when the event
generating app in executed in two distinct steps. Those two steps are
preparation and execution.
1. the preparation is the launching the app in the background, and
2. the execution is actually generating the event that should or
should not be traced depending on the test case.
This is useful to test the tracker feature since we want to ensure that
already running apps are notified properly when changing their tracking
status.
Issue
=====
The `test_event_vpid_track_untrack` test case suffers from a race
condition that is easy to reproduce on Yocto.
The issue is that sometimes events are end up the trace when none is
expected.
This is due to the absence of synchronization point at the launch of the
app which leads to the app being scheduled in-between the track-untrack
calls leading to events being recorded to the trace.
It's easy to reproduce this issue on my machine by adding a `sleep 5`
between the track and untrack calls and setting the `NR_USEC_WAIT`
variable to 1.
Fix
===
Using the testapp `--sync-before-last-event-touch` flag to make the app
create a file when all but the last event are executed. We then have the
app wait until we create a file (`--sync-before-last-event`) to generate
that last event. This way, we are sure no event will be generated when
running the track and untrack commands.
Notes
=====
- This issue affects other test cases in this file.
- This commit fixes a typo in the test header.
- This commit adds `diag` calls to help tracking to what test the output
relates to when reading the log.
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ia2b68128dc9a805526f9748f31ec2c2d95566f31
Simon Marchi [Fri, 20 Aug 2021 19:36:50 +0000 (15:36 -0400)]
tests: tap: remove semicolons in pass / fail definitions
I wanted to write something like this in a test:
if (cond)
fail("message 1");
else
fail("message 2");
And was met with
CC test_action.o
/home/simark/src/lttng-tools/tests/unit/test_action.c: In function ‘main’:
/home/simark/src/lttng-tools/tests/unit/test_action.c:544:9: error: ‘else’ without a previous ‘if’
544 | else
| ^~~~
I then remembered that it was in our coding style to use braces:
if (cond) {
fail("message 1");
} else {
fail("message 2");
}
... which avoids the error. But still, the former form should work.
Fix this by removing the semi-colons in the pass and fail definitions, I
don't think they belong there. Doing so finds a spot in ini_config.c
where a semi-colon is missing, add it.
Change-Id: I6ff09d496a0b12f34baa6f993cffc69eef611df0
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Simon Marchi [Mon, 23 Aug 2021 18:01:17 +0000 (14:01 -0400)]
configure: add -Wno-gnu-folding-constant
When building with clang 12, I get:
CC tp.o
In file included from /home/simark/src/lttng-tools/tests/utils/testapp/gen-ust-events/tp.c:10:
In file included from /home/simark/src/lttng-tools/tests/utils/testapp/gen-ust-events/./tp.h:75:
In file included from /tmp/lttng/include/lttng/tracepoint-event.h:69:
In file included from /tmp/lttng/include/lttng/ust-tracepoint-event.h:1021:
/home/simark/src/lttng-tools/tests/utils/testapp/gen-ust-events/./tp.h:30:1: error: variable length array folded to constant array as an extension [-Werror,-Wgnu-folding-constant]
TRACEPOINT_EVENT(tp, tptest,
^
/tmp/lttng/include/lttng/tracepoint.h:791:27: note: expanded from macro 'TRACEPOINT_EVENT'
#define TRACEPOINT_EVENT LTTNG_UST_TRACEPOINT_EVENT
^
/tmp/lttng/include/lttng/ust-tracepoint-event.h:84:2: note: expanded from macro 'LTTNG_UST_TRACEPOINT_EVENT'
LTTNG_UST__TRACEPOINT_EVENT_CLASS(_provider, _name, \
^
/tmp/lttng/include/lttng/ust-tracepoint-event.h:940:10: note: expanded from macro 'LTTNG_UST__TRACEPOINT_EVENT_CLASS'
size_t __dynamic_len[__num_fields]; \
^
From what I understand, this warning simply says that the compiler did
figure out that __num_fields could be known at compile-time, so
__dynamic_len ends up as a regular static array and not a variable
length array (which is what we want). And it warns us that this
behavior is not standard C, but an extension that originated from GNU.
So I think it's fine to ignore it, as it simply warns us that the
behavior we want happens.
Change-Id: Ib7273e7f86c6b04742f8463f925cdbb1fa14041d
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
This page took 0.0572279999999999 seconds and 4 git commands to generate.