lttng-tools.git
8 months agolttng: enable-channel: move kernel tracer status check to util
Jérémie Galarneau [Wed, 6 Dec 2023 19:35:27 +0000 (14:35 -0500)] 
lttng: enable-channel: move kernel tracer status check to util

In order to re-use the same logic in the enable-event command, move the
kernel status checking and printing to a common utility function.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I85956298c27fd7073ac02aac901d39e3d82bb280

8 months agolttng-ctl: manage memory automatically in kernel tracer status check
Jérémie Galarneau [Tue, 5 Dec 2023 21:06:07 +0000 (16:06 -0500)] 
lttng-ctl: manage memory automatically in kernel tracer status check

Use a unique_ptr to manage the dynamically allocated payload returned by
lttng_ctl_ask_sessiond.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I685fc03c1da7ff7903503ed82636d27f98f9895e

8 months agosessiond: kernel: log kernel tracer status on change
Jérémie Galarneau [Tue, 5 Dec 2023 19:51:25 +0000 (14:51 -0500)] 
sessiond: kernel: log kernel tracer status on change

Log the value of newly-introduced the kernel tracer status when it is
set.

This will make it easier to confirm that the
lttng_get_kernel_tracer_status API returns a given value given the logs
of the session daemon.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I7a49493d05bf34868acd32f7f7fba301a41f69d5

8 months agosessiond: kernel: clean-up: move static variables to anonymous namespace
Jérémie Galarneau [Tue, 5 Dec 2023 19:26:42 +0000 (14:26 -0500)] 
sessiond: kernel: clean-up: move static variables to anonymous namespace

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I3ee0e5ff2ff4c47c23f6f27f5707be08107f70f1

8 months agosessiond: lttng: Add command to check kernel tracer status
Kienan Stewart [Wed, 22 Nov 2023 21:28:01 +0000 (16:28 -0500)] 
sessiond: lttng: Add command to check kernel tracer status

Issue observed
--------------

When `lttng enable-channel --kernel` fails, little feedback is
available to user to help them to understand the cause.

Eg.

```
Error: Channel asdf: Kernel tracer not available (session auto-20231123-092621)
```

Solution
--------

The semantic status of the kernel tracer is tracked and persisted in
the session daemon (through `init_kernel_tracer` and
`cleanup_tracer_tracer`.

A new client command `lttng_kernel_tracer_status` is added to request
the current value of the `kernel_tracer_status`. The `lttng` client
uses this command after enabling a kernel-domain channel fails to
provide the user with a more specific cause of the failure.

Eg.

```
Error: Channel asdf: Kernel tracer not available (session auto-20231123-092621)
        Missing one or more required kernel modules
        Consult lttng-sessiond logs for more information
```

The kernel tracer status is tracked with an enum defined in
`include/lttng/kernel.h` to avoid passing potentially different errno values
or locale-dependant strings between the LTTng client and session
daemon.

Loading modules and checking signatures can fail with a number of
different errno values. For example:

C.f. https://gitlab.com/linux-kernel/stable/-/blob/master/kernel/module/signing.c#L70

 * `EKEYREJECTED`
 * Any other error code

C.f. https://gitlab.com/linux-kernel/stable/-/blob/master/Documentation/security/keys/core.rst

 * `EKEYREVOKED`
 * `EKEYEXPIRED`
 * `ENOKEY`
 * Others, such as `ENOMEM`

Known drawbacks
---------------

None.

Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I2ae4b188f0110a472200c2511439b9e3e600527d

8 months agoTests: Avoid looping on app PIDs during cleanup
Kienan Stewart [Thu, 8 Jun 2023 15:05:17 +0000 (11:05 -0400)] 
Tests: Avoid looping on app PIDs during cleanup

When the list of app PIDs becomes long, eg. in the case for
`tests/regression/ust/nprocesses/test_nprocesses`, then performing the
following type of loop can be quite slow, especially if the apps have
some teardown time internally:

```
for p in ${APP_PIDS} ; do
  kill ${p}
  wait ${p}
done
```

Both `kill` and `wait` take a list of PIDs, so the cleanup is easily
factorable to

```
kill ${APP_PIDS}
wait ${APP_PIDS}
```

In the case of `test_nprocesses`, the test run time drops from ~25s to
~4s. The difference is less important in tests with fewer apps.

Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I9619939b4f1201a99f9666dee3e19551a67c9fb6

9 months agotests: Correct timing of python tests with python3 < 3.7
Kienan Stewart [Wed, 10 Jan 2024 16:11:08 +0000 (11:11 -0500)] 
tests: Correct timing of python tests with python3 < 3.7

`time.monotonic_ns()` was introduced in python 3.7. Prior to that, the
other available monotonic time function was `time.monotonic()` which was
itself introduced in python 3.3.

For python3 < 3.3, the automatic timing of TAP tests is disabled.

The use of underscores for readable integers was also only introduced in
python 3.6.

Change-Id: Ibf85669c4d108347097d2cea7ab5d28cde9d0cc6
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 months agotests: Remove debug statement from tap.sh
Kienan Stewart [Wed, 10 Jan 2024 15:59:40 +0000 (10:59 -0500)] 
tests: Remove debug statement from tap.sh

It looks like a debug statement got left in for commit 2a69bf1437. The
debug statement uses print, which is not necessarily a commonly
available command, and led to spurious errors on certain CI nodes.

Eg. https://ci.lttng.org/job/lttng-tools_master_slesbuild/976/babeltrace_version=stable-2.0,build=std,conf=agents,liburcu_version=master,platform=sles15sp4-amd64/consoleFull

Change-Id: If035e57489b5bf6185f73d1046bfdbd03f4a1fed
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
11 months agotests: Reduce runtime of tools/tracker/test_event_tracker tests
Kienan Stewart [Tue, 26 Sep 2023 15:36:24 +0000 (11:36 -0400)] 
tests: Reduce runtime of tools/tracker/test_event_tracker tests

Observed issue
==============

When running the test as root, so both UST and kernel tests are
exercised, this test takes about 100s to run on my development system.

Solution
========

By using session destroy with '--no-wait', the runtime is reduced by
30-40s. For the kernel tests in particular this introduces a detail to
keep in mind with regards to unloading the lttng-tests kernel
module. More details in the 'Known drawbacks' section.

The test applications (both userspace and kernel) also execute much
more quickly than the 0.5s sleep they are given. By reducing the sleep
to a hundreth of a second, another 15s or so can be shaved off the
test runtime.

Overall, the test runtime is reduced from 102s to 45s on my
development machine.

Known drawbacks
===============

If `modprobe -r lttng-tests` is run too quickly after the last session
destruction with `--no-wait`, the removal will fail. This patch uses a
simple one second sleep to give some time for the processes using that
module to get completely shutdown. I think it could be somewhat
brittle on systems that are slow or overcommitted; however, it seemed
more 'maintainable' than remembering to ensure that the last kernel
test session destruction doesn't use `--no-wait`.

Change-Id: Ib953ef22299d30507f46d2e6507fbd0f5641aa27
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
11 months agotests: Use `--no-wait` when destroying sessions in relayd-grouping
Kienan Stewart [Mon, 25 Sep 2023 15:00:17 +0000 (11:00 -0400)] 
tests: Use `--no-wait` when destroying sessions in relayd-grouping

By using `--no-wait`, the test completes about 22s faster on
my development machine.

Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I53cb0665bbb5a038fbe8da6ece924579e8e91549

11 months agotests: Use '--no-wait' to reduce clear/test_ust runtime
Kienan Stewart [Thu, 21 Sep 2023 16:08:17 +0000 (12:08 -0400)] 
tests: Use '--no-wait' to reduce clear/test_ust runtime

Motivation
==========

`regression/clear/test_ust` is one of the longer running regression
tests.

Solution
========

By using `--no-wait` when destroying sessions and reducing the value
of `DELAYUS` from the default of `1000000` (us) to `500000` (us) the
test runtime decreases from 5m48s to 4m36s on my development machine.

Known drawbacks
===============

When `DELAYUS` is further decreased, the events are no longer recorded
by the connected babeltrace live viewer.

Using `--no-wait` causes additional warnings and "errors" to be
emitted, which increase the effort required to find actual errors in
the test logs. For example,

```
PERROR - 12:07:59.932183981 [Rotation]: sendmsg: Broken pipe (in lttcomm_send_unix_sock() at unix.cpp:300)
Error: Failed to send result of the destruction of session
"s5yRHQuEBtpmQ8sF" to client
```

Change-Id: I58e71fe34cb4ee97e0879ebd5e08e6d9a1e6c07f
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
11 months agotests: Add POPT_CFLAGS to gen-ust-events-ns and gen-ns-events
Kienan Stewart [Thu, 21 Sep 2023 15:49:27 +0000 (11:49 -0400)] 
tests: Add POPT_CFLAGS to gen-ust-events-ns and gen-ns-events

Observed issue
==============

When building in an environment where popt was in a non-standard
location, the builds for these two test binaries failed with the
following error:

```
gen-ns-events.cpp:19:10: fatal error: popt.h: No such file or directory
   19 | #include <popt.h>
      |          ^~~~~~~~
```

Solution
========

Set the binary-specific CPPFLAGS in the `Makefile.am`

Known drawbacks
===============

None

Change-Id: I5563e24f330be86d630c68c32eaafaedf53a6c87
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
11 months agotests: Use destroy with no-wait during filter tests
Kienan Stewart [Wed, 6 Sep 2023 15:56:45 +0000 (11:56 -0400)] 
tests: Use destroy with no-wait during filter tests

Motivation
==========

The regression/tool/sfiltering/test_valid_filter test is one of the
longest running tests in the test suite, largely due to the number of
filters tested.

The session destroy step for each test takes in the order of 500-600ms
on my development machine and accounts for an important sum of time
across a run as each filter test will create and destroy a session.

Solution
========

By passing "--no-wait" to the session destroy command, the step time
is reduced from 500-600ms to ~20ms. The overall test time is
reduced (when running without root tests) from ~150s to ~40s.

Using "--no-wait" with the session destroy seems safe in this case as
the session stop is called before, which will finalize the writes to
disk.

Known drawbacks
===============

The patch causes errors similar to the following to be logged to
stderr:

```
PERROR - 11:55:26.730314528 [Rotation]: sendmsg: Broken pipe (in lttcomm_send_unix_sock() at unix.cpp:300)
Error: Failed to send result of the destruction of session "valid_filter" to client
```

Change-Id: Ic005116bbe910cb3da3e99aa85dc90244ed72f5b
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
11 months agotests: Use session discarded counter to validate blocking test
Kienan Stewart [Thu, 31 Aug 2023 20:06:09 +0000 (16:06 -0400)] 
tests: Use session discarded counter to validate blocking test

Motivation
==========

On my development machine, the block test takes about 75s to complete,
with the majority of the time being used in the infinite blocking
test.

Solution
========

Using the discarded counter provided by `lttng list SESSION`, the test
no longer spends time reading and processing the ~1G of traces with
babeltrace2. This reduces the run time of the entire suite from 75s to
about 15s.

Known drawbacks
===============

There's no longer a validation that the trace files written to disk
are parseable by a CTF reader.

Change-Id: I0ccdef53ef80f1ffe5cfff970b92c3de4ba460ec
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
11 months agotests: Reduce sleep time when waiting for various daemons to die
Kienan Stewart [Thu, 31 Aug 2023 18:00:26 +0000 (14:00 -0400)] 
tests: Reduce sleep time when waiting for various daemons to die

Motivation
==========

Using the rough duration of the tests 'Wait after kill \w+ daemon',
approximately 200s were being spent in those areas when running
non-root, non-destructive tests on my development machine.

```
$ find . -iname '*.log' -exec grep -A4 -HE 'Wait for kill [a-z]+ daemon'
{} \; | grep duration_ms | cut -d':' -f2 | datamash sum 1
198078.707216
```

Solution
========

The `stop_x_opt()` functions with timers in `tests/utils/utils.sh`
have the timeouts multiplied by 5 to reduce the counted sleep interval
to 0.1s.

After applying this change, the time spent in the tests as counted
with the find command in the 'Motivation' section above was reduced to
92061.768ms, a reduction of about 1.5 minutes.

Known drawbacks
===============

None

Change-Id: I1d4ad899808f37f6cb7b88bbab0ff05ab0c2b787
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
11 months agotests: Reduce sleep in regression/tools/clear/test_ust
Kienan Stewart [Thu, 31 Aug 2023 17:23:12 +0000 (13:23 -0400)] 
tests: Reduce sleep in regression/tools/clear/test_ust

Motivation
==========

This test is one of the longer running non-kernel
non-destructive tests, taking ~7 minutes to run
in my development environment.

Solution
========

By reducing the the sleeps from a half second to a tenth of a second,
the test passes from 364.0s to 309.1s.

Known drawbacks
===============

None

Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ibe6c37d474edf5ca118c82669e073089d888ff84

11 months agotests: Automatically time TAP tests
Kienan Stewart [Fri, 25 Aug 2023 19:31:16 +0000 (15:31 -0400)] 
tests: Automatically time TAP tests

Output approximate test timing by default for TAP tests by
initializing a timer when then TAP tests plan is initialized, and
resetting after every result.

Automatic timing of tests may be disabled by setting `TAP_AUTOTIME=0`
in the environment. In `tap.sh`, `autotime` is provided as a public
command to update configuration at runtime.

tap.sh and the tap-driver.sh scripts use a small helper
`tests/utils/tap/clock` to print the result of `lttng_clock_gettime`.

Originally `date` was used, but there are two principal drawbacks:

 * the `%N` formatter to provided sub-second resolution is specific to
 GNU date, and unavailable on other platforms (eg. FreeBSD).
 * destructive tests that modify the system date would cause strange
 results (eg. a test that takes 10 years to run)

Known drawbacks
===============

The automatic timing depends on having plan called first. If plan
isn't called (eg. in a late plan mode), the first test time will be
wrong.

The duration key is hardcoded to "duration_ms", as used by
https://github.com/jenkinsci/tap-plugin

As the timing information for the TAP tests is stored in a multiline
YAML block following the result line, any unexpected output (eg. from
stderr) could be written in that region. These lines can cause tools
parsing the TAP log to fail as the lines written may not be valid
YAML. For ci.lttng.org, the TAP parser should be configured to remove
invalid YAML instead of causing the build to become "unstable".

After a test run, lines other than `duration_ms` in the TAP YAML block
can be identified with the following command:

  find . -iname '*.log' -exec sed -n '/  ---/,/  \.\.\./p' {} \; \
  | grep -v -E '  ---|  \.\.\.|    duration_ms'

Some solutions to the above issue were considered and discarded:

 * Using a named pipe to pass stderr through sed so lines may be
 prefixed with '# '
 * Switching the `tap-driver.sh` to run with bash and take advantage
 of the Process substition for performing the prefixing of '# '

The above options ended up causing more coherency issues in the output
file than were resolved due to differing types of buffering and
processing delays.

 * Redirection to the stderr of the test script to another file

The '*.log' and '*.trs' cleanups are driven by the automake log driver
implementation, which is not aware of other files that may be produced
during the invocation of the modified `tap-driver.sh`. I didn't find
an easy way to hook into the automake behaviour to add additional file
patterns to cleanup.

 * Cleanup in the various test scripts to systematically prefix
   stderr, or to respect the `ERROR_OUTPUT_DEST` of
   `tests/utils/utils.sh`.

The scope of the patch would be significantly increased and for
relatively low added value compared to instructing the CI systems to
discard invalid YAML. Furthermore the values `OUTPUT_DEST` and
`ERROR_OUTPUT_DEST` are set to `/dev/null`, which would further
reduce the ability to understand failures based on the test logs.

Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Iabc35fc00f14085ef035d4b4e19a2c30cd86d851

11 months agotests: Output test time to trs files
Kienan Stewart [Wed, 23 Aug 2023 18:43:54 +0000 (14:43 -0400)] 
tests: Output test time to trs files

Observed issue
==============

The test suite for lttng-tools is relatively large and can take
upwards of an hour or more to run in CI, especially with root
tests enabled.

Neither automake driver (`tests/utils/tap-driver.sh`) nor the
standalone tap script (`tests/utils/tap/tap.sh`) provide timing
information for the test runs or the steps within a test run.

Solution
========

The TAP driver invoked by `make check` has been modified to include a
`:time-taken:` metadata field (in seconds) in the produced trs file.

Known drawbacks
===============

`date` on FreeBSD does not support the `%N` formatter for nanoseconds,
which results in the precision dropping to 1s.

Metadata which isn't recognized by the automake test harness is
currently ignored, but it is a behaviour that remains open to change.

C.f. https://www.gnu.org/software/automake/manual/html_node/Log-files-generation-and-test-results-recording.html

Further work is required on the CI side to make use of the information.

Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I94c7ddd7eb9388c595794e76dbbedc9bfe64d206

11 months agotests: Fix typo in tests/regression/kernel/test_ns_contexts
Kienan Stewart [Wed, 20 Dec 2023 15:58:42 +0000 (10:58 -0500)] 
tests: Fix typo in tests/regression/kernel/test_ns_contexts

Change-Id: I50e6027f87b6d4a08a61337782356f8fbc6a64ae
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
11 months agoFix: sessiond: freeze on channel creation on restart
Jérémie Galarneau [Tue, 12 Dec 2023 21:54:41 +0000 (16:54 -0500)] 
Fix: sessiond: freeze on channel creation on restart

Issue observed
--------------

When using lttng via a script, the session and consumer daemons appear
to completely lock up when we request that a channel be created. The
conditions for this lockup seem to be created by destroying a sessiond
and then creating a sessiond in quick sequence.

This can be reproduced, on some systems, by launching a session daemon
and running the following commands:
  $ sudo killall lttng-sessiond
  $ sudo lttng-sessiond --daemonize
  $ lttng create my_session --snapshot --output /tmp/demo-output
  $ lttng enable-channel --kernel my_channel

Note that 'killall' above is racy as it does not wait for the session
daemon to be killed. Hence, it is not unexpected for the incoming
session daemon to see the smoldering ashes of the "outgoing" session
daemon. However, it would be helpful if the second session daemon
instance warned the user of the existing session daemon instance.

From the logs captured from both instances of the lttng-sessiond (the
outgoing and incoming instances), there appears to be a time period
during which both session daemons are active at once.

This behaviour is unexpected as the session daemon guards itself (in
theory) from running multiple conflicting instances.

The guarding mechanism works in two steps (see the implementation of
`check_existing_daemon` @ src/bin/lttng-sessiond/main.cpp:926)

When a session daemon is launched, it attempts to connect to any active
session daemon's 'client' endpoint (a UNIX socket, the same used by
liblttng-ctl to communicate with the session daemon).

If the daemon receives a reply, it can assume that another session
daemon instance is already active and abort its launch. Conversely, when
no reply is received, it uses a "lock file" mechanism to check for other
running instances.

The lock file-based check creates a file (typically
/var/run/lttng/lttng-sessiond.lck in the case of a root session daemon)
and acquires an exclusive (write) POSIX lock on it [1]. The assumption
is that any other instance would own the lock and cause the operation to
fail.

On a reproducer system, we could notice that the client thread of the
outgoing sessiond daemon was torn down before the launch of the
initialization of the incoming session daemon. This caused the incoming
session daemon to not receive a reply to its connection attempt and
fall-back to the lock file-based mechanism.

Surprinsingly, it appears that the lock file checks succeeds even though
the outgoing session daemon still holds the lock file according to its
log.

See the original bug report for more information about the investigation
and how to reproduce the problem.

Cause
-----

The POSIX file locking API has a number of surprising behaviours[2] that
have seen it being superseded by platform-specific APIs. In our case,
the one that bit us is that any file lock held by a process is
automatically released when any of the file descriptors that reference
the file's description is released.

In practical terms, if a process forks and its child dies, it loses its
file lock since the child's file descriptors are closed on exit.

The LWN article linked below describes a similar scenario:

  It's common to have a library routine that opens a file, reads or
  writes to it, and then closes it again, without the calling
  application ever being aware that has occurred. If the application
  happens to be holding a lock on the file when that occurs, it can lose
  that lock without ever being aware of it.

The problem affects any use of the --background/--daemonize options
since, as part of the daemonization process (which occurs after the lock
file acquisition), the session daemon forks and its parent process
exits. This causes one of the descriptors pointing to the lock file to
be closed and the lock to be released.

After that point, any other instance of the session daemon process would
succeed in acquiring the lock file and assume it is the sole instance on
the system.

Solution
--------

The lock file code is modified to use the non-POSIX `flock`[3]
interface which is available on Linux and some BSDs[4]. `flock` provides
us with the guarantee we thought we had: that the file lock is only
released when _all_ file descriptors pointing to a given file
description are closed.

Drawbacks
---------

As a fallback, platforms that don't support `flock` will use the original
locking mechanism. Since this is a "hint" to warn users when erroneously
launch a second privileged session daemon, it seems acceptable for it
to not be completely reliable on secondary platforms.

References
----------

[1] https://man7.org/linux/man-pages/man2/fcntl.2.html (see F_SETLK)
[2] https://lwn.net/Articles/586904/
[3] https://linux.die.net/man/2/flock
[4] https://man.freebsd.org/cgi/man.cgi?query=flock&sektion=2

Fixes #1405

Reported-by: Erica Bugden <ebugden@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ic505ff0671c321f808050831ef2b7152cdbf4b8a

11 months agocommon: move utils_create_lock_file to its own file
Jérémie Galarneau [Tue, 12 Dec 2023 21:13:59 +0000 (16:13 -0500)] 
common: move utils_create_lock_file to its own file

A follow-up change introduces platform-specific implementations of this
functions. Moving the function to a separate file makes it possible to
add other implementations without polluting utils.cpp with more
platform-specific code.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ibd566d8710380fe378a8f3df9454e21e83655b62

11 months agotests: tools/clear/test_ust wait for specific test app pid
Kienan Stewart [Tue, 19 Dec 2023 19:01:47 +0000 (14:01 -0500)] 
tests: tools/clear/test_ust wait for specific test app pid

Observed issue
==============

When debugging failing tests manually, one step that is sometimes done
is to quickly swap the commands that start the relay or sessiond in
`tests/utils/utils.sh` (eg. in `start_lttng_relayd_opt`) for the version
which uses a verbose output to a logfile.

When doing this, the `relayd` wasn't using the background
`process_mode`, and was a child of the running test.

This caused `test_ust_local_snapshot_per_pid` in
`tests/regression/tools/clear/test_ust` to hang as it waited for all
child processes to terminate.

Solution
========

The test has been updated to wait for only the specific test application
pid.

Known drawbacks
===============

None.

Change-Id: I8761649a52fceda92a5545c71818dc2eb027bfcf
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
11 months agoREADME.adoc: Jenkins CI badge points to non-existent job
Jérémie Galarneau [Mon, 18 Dec 2023 16:44:01 +0000 (11:44 -0500)] 
README.adoc: Jenkins CI badge points to non-existent job

The lttng-tools_master_build job has been superseded by
lttng-tools_master_linuxbuild. The badge now points to this job as it is
the "main" build configuration.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I72e4b53709ccef95c4bb708dc3ac49f45f48b6d7

11 months agosessiond: log error message when libkmod operations fail
Kienan Stewart [Wed, 22 Nov 2023 16:18:23 +0000 (11:18 -0500)] 
sessiond: log error message when libkmod operations fail

Issue observed
--------------

When loading or unloading modules and libkmod is being used, there are
no details available in the `lttng-sessiond` error ouput.

For example,

```
$ sudo -E lttng-sessiond
Error: Unable to load required module lttng-ring-buffer-client-discard
Warning: No kernel tracer available
```

When libkmod is not available or disabled in the configuration
options, `lttng-sessiond` falls back to invoking `modprobe` via a
`system` call. The command's error output will be visible and provide
the necessary details, eg.

```
$ sudo -E lttng-sessiond
modprobe: FATAL: Module lttng-ring-buffer-client-discard not found in directory /usr/lib/modules/6.2.0-36-generic
Error: Unable to load required module lttng-ring-buffer-client-discard
Warning: No kernel tracer available
```

Solution
--------

Include the error message from `strerror` in the message that is
logged via the `DBG` or `ERR` macros.

The error is no clearer for users, eg.

```
$ sudo -E lttng-sessiond
PERROR - 12:00:05.004593045 [Main]: Unable to load required module
lttng-ring-buffer-client-discard: No such file or directory (in
modprobe_lttng() at modprobe.cpp:396)
Warning: No kernel tracer available
```

Known drawbacks
---------------

The debug message emitted when a non-obligatory kernel module fails to
load is now printed with `PERROR`.

Change-Id: Ibd25614a6c5b5dd3b801063eafc272a4017058cd
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
11 months agoFix: sessiond: crash when sending data_pending to an active session
Kienan Stewart [Thu, 13 Jul 2023 19:13:12 +0000 (15:13 -0400)] 
Fix: sessiond: crash when sending data_pending to an active session

Observed Issue
==============

When a data_pending command is sent to an active session, the sessiond
crashes with the following assert

```
lttng-sessiond: client.cpp:2647: void* thread_manage_clients(void*): Assertion `cmd_ctx.reply_payload.buffer.size >= sizeof(*llm)' failed.
Error: 1 trace chunks are leaked by lttng-consumerd. This can be caused by an internal error of the session daemon.
```

Cause
=====

When a session is active, cmd.cpp:cmd_data_pending() returns
LTTNG_ERR_SESSION_STARTED. In client.cpp:process_client_msg(), this
return value causes the execution to go the the setup_error label. In
the setup_error label, no default LLM header is added to the reply,
meaning the reply has a zero size and triggering the assert above.

Solution
========

When cmd_data_pending() returns a value that is neither 0 nor 1, the
return code is set appropriately as follows:

  * when the return value is outside the range of lttng error codes,
  LTTNG_ERR_UNK is used
  * otherwise, the return value is used

The execution then jumps to the error label so that the default LLM
message header can be added.

Known Drawbacks
===============

None.

Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Iff46f87c7725d25c131a86ac3dbaed5c99b4d16b

11 months agodocs: Update contributing guide
Kienan Stewart [Thu, 23 Nov 2023 21:34:18 +0000 (16:34 -0500)] 
docs: Update contributing guide

Indicate that Gerrit (https://review.lttng.org) is the principal place
where patches are submitted and reviewed, rather than the mailing list.

Based on feedback received on the mailing list:
https://lists.lttng.org/pipermail/lttng-dev/2023-November/030670.html

Change-Id: Icb0bc3e45bb35fa85eca272d8043e5553465f700
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
13 months agoFix: sessiond: cmd potentially used uninitialized
Jérémie Galarneau [Fri, 6 Oct 2023 19:17:41 +0000 (15:17 -0400)] 
Fix: sessiond: cmd potentially used uninitialized

GCC reports the following warning:

  libtool: compile:  g++ -std=gnu++11 -DHAVE_CONFIG_H -I../../../include -I../../../include -I../../../src -I../../../src -include config.h -I/home/mjeanson/opt/include -I/home/mjeanson/opt/include -DINSTALL_BIN_PATH=\"/home/mjeanson/opt/lib/lttng/libexec\" -DINSTALL_LIB_PATH=\"/home/mjeanson/opt/lib\" -fvisibility=hidden -fvisibility-inlines-hidden -fno-strict-aliasing -Wall -Wextra -Wmissing-declarations -Wnull-dereference -Wundef -Wredundant-decls -Wshadow -Wsuggest-attribute=format -Wwrite-strings -Wformat=2 -Wstrict-aliasing -Wmissing-noreturn -Wduplicated-cond -Wduplicated-branches -Wlogical-op -Winit-self -Wno-incomplete-setjmp-declaration -Wno-gnu-folding-constant -Wno-sign-compare -Werror -pthread -g -O2 -MT notification-thread-events.lo -MD -MP -MF .deps/notification-thread-events.Tpo -c notification-thread-events.cpp  -fPIC -DPIC -o .libs/notification-thread-events.o
  In file included from field.hpp:14,
                   from ust-field-convert.hpp:11,
                   from ust-app.hpp:14,
                   from event-notifier-error-accounting.hpp:11,
                   from notification-thread-events.cpp:12:
  ../../../src/vendor/optional.hpp: In function ‘int handle_notification_thread_command(notification_thread_handle*, notification_thread_state*)’:
  ../../../src/vendor/optional.hpp:877:17: error: ‘cmd’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
    877 |         return &data;
        |                 ^~~~
  notification-thread-events.cpp:3181:45: note: ‘cmd’ was declared here
   3181 |         struct notification_thread_command *cmd;
        |

When failing to pop a command, cmd can indeed be used uninitialized when
jumping to the error label.

Reported-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I5e63451ed4936638e9fbe05146f16c86ea2e42e2

13 months ago.gitignore: add man{1,3,7,8} symlinks
Jérémie Galarneau [Mon, 2 Oct 2023 19:37:58 +0000 (15:37 -0400)] 
.gitignore: add man{1,3,7,8} symlinks

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I9695833394aa51c1fa62f86cc77ce8981088cbbf

13 months agoutils: Allow users to define LTTNG_MANPATH
Olivier Dion [Mon, 2 Oct 2023 19:09:44 +0000 (15:09 -0400)] 
utils: Allow users to define LTTNG_MANPATH

Currently, the configured value `MANPATH` is used when executing `man`.
This forces `lttng --help` to use man pages where they will be
installed, even with the `pre-inst-env` script.

Instead, let the user provide a `LTTNG_MANPATH` environment variable. If
not defined, fallback to the configured `MANPATH`.

This allows developers to do:

  $ ./pre-inst-env lttng --help

to read the locally generated man pages where the `pre-inst-env` script
was generated.

Also adding the `LTTNG_MAN_BIN_PATH` to `pre-inst-env` since `man` could
be installed someplace else.

Change-Id: I32d9af480737bb80732dc5d690f947242aacac4f
Signed-off-by: Olivier Dion <odion@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
13 months agodoc/man/Makefile: Mimic mandb(5) path hierarchy
Olivier Dion [Mon, 2 Oct 2023 16:01:29 +0000 (12:01 -0400)] 
doc/man/Makefile: Mimic mandb(5) path hierarchy

The following allows developers to read locally generated man pages by
using the `pre-inst-env' script. For example:

  $ ./pre-inst-env man lttng-add-context

will open the `lttng-add-context.1' man pages in the build directory
under which the `pre-inst-env' was generated.

This is done by:

  1. Simlinking `build/doc/man{1,3,7,8}' to `build/doc/man'

  2. Adding MANPATH to `pre-inst-env'

The symlinking part is a hack to force `man' to use our current doc
layout, doing a less invasive change than would be required otherwise

Change-Id: I2ea1af779f237fe1808a1d44d4f3b1c3a8535e2d
Signed-off-by: Olivier Dion <odion@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
13 months agoTests: valid_filters: temporary trace path uses 'invalid'
Jérémie Galarneau [Wed, 27 Sep 2023 12:54:22 +0000 (08:54 -0400)] 
Tests: valid_filters: temporary trace path uses 'invalid'

Likely a result of a copy/paste error, the temporary trace path used for
the 'valid' filter tests uses the word 'invalid'.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I70123f372b0f52660f9d3cb2c27e7ff90e9c3c7a

13 months agowaiter: modernize the waiter interface
Jérémie Galarneau [Sat, 23 Sep 2023 15:24:26 +0000 (11:24 -0400)] 
waiter: modernize the waiter interface

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I3e8ffac4324e36dc3bf7a7f79d874f228a48def6

15 months agoLicense: common: error_query: fix typo in SPDX specifier
Jérémie Galarneau [Thu, 24 Aug 2023 17:33:44 +0000 (13:33 -0400)] 
License: common: error_query: fix typo in SPDX specifier

The error-query API files were erroneously licensed under "GPL-2.1", a
license which doesn't exist.

As the author of those files, I hereby confirm that the intention was to
license these files under LGPL-2.1 as evidenced by their presence in the
libcommon_lgpl internal library.

Reported-by: Christophe Bedard <christophe.bedard@apex.ai>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I0d56372ca009e44732e0737bd8f10ad4a4d000c5

15 months agoDocs: CodingStyle.md: remove extraneous underscore in emphasis
Jérémie Galarneau [Tue, 22 Aug 2023 19:22:22 +0000 (15:22 -0400)] 
Docs: CodingStyle.md: remove extraneous underscore in emphasis

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I802551140889d23dc0ac6521d95254ea441ffa22

15 months agodocs: fix: Match stated automake requirement
Erica Bugden [Thu, 10 Aug 2023 21:16:25 +0000 (17:16 -0400)] 
docs: fix: Match stated automake requirement

to the applied requirement. Previously, the README stated the minimum
required version of automake was 1.10. However, since commit 343a7a984
(Require automake >= 1.12), the configuration script (configure.ac)
actually enforced a minimum of 1.12:

AM_INIT_AUTOMAKE([1.12 foreign dist-bzip2 no-dist-gzip tar-pax ...

Change-Id: I4f9fcc3aca340e4638e93d69155c51f82247e29d
Signed-off-by: Erica Bugden <ebugden@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
15 months agoTests: use CPU ids from online ranges
Kienan Stewart [Mon, 19 Jun 2023 19:32:54 +0000 (15:32 -0400)] 
Tests: use CPU ids from online ranges

test_tracefile_count could fail randomly on systems where there are CPUs
present but not online. For example:

  $ cat /sys/devices/system/cpu/online
  0-7
  $ cat /sys/devices/system/cpu/present
  0-39

When a CPU is present, it will have an entry in
/sys/devices/system/cpu/cpuX for it's ID, and thus the test may pick
that CPU's ID. However, a present CPU which is not online is not a valid
target for taskset.

In cases where `get_any_available_cpu` is used with task set, the tests
could fail for a similar reason. This case can be somewhat less common,
because it would return the numerically lowest CPU first; however, with
online as follows cpu 0 isn't available and taskset fails.

  $ cat /sys/devices/system/cpu/online
  18-19,135,142
  $ cat /sys/devices/system/cpu/present
  0-167

Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ia7fb7ff69ecdd7aa6bac9dcfdf72344df08f6782

15 months agoMove lttng_session unique_ptr to lttng/session-internal.hpp
Simon Marchi [Tue, 27 Jun 2023 17:02:04 +0000 (13:02 -0400)] 
Move lttng_session unique_ptr to lttng/session-internal.hpp

Make it possible to use this unique_ptr elsewhere.

Change-Id: I30141efac45d842f4bc3414ca03fffb2e4ba5cce
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
15 months agolttng: make session_storage::_array a unique_ptr to array
Simon Marchi [Tue, 27 Jun 2023 16:39:42 +0000 (12:39 -0400)] 
lttng: make session_storage::_array a unique_ptr to array

By making this field a unique_ptr of lttng_session[] instead of
lttng_session, we can use the subscript operator on it, making the code
in session_list_operations a bit simpler.

Change-Id: Ic4e441a23be834e68c5af3f4cca2794f86f2f57e
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
15 months agosessiond: remove ust-metadata.cpp
Simon Marchi [Mon, 26 Jun 2023 19:38:11 +0000 (15:38 -0400)] 
sessiond: remove ust-metadata.cpp

This file is essentially empty, remove it.  Note that it's no longer
listed in Makefile.am, so it isn't compiled.

Change-Id: I88251890cbe10e10ed7c135dc88be5d4c0e2ef5a
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
15 months agoTests: fix formatting in gen-ust-events.cpp
Simon Marchi [Tue, 20 Jun 2023 20:40:48 +0000 (16:40 -0400)] 
Tests: fix formatting in gen-ust-events.cpp

I see this change when running format-cpp.

Change-Id: If80837682bb65dfc14fe8a5df22aca94e7047e44
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
15 months agosessiond: disable clang-format to work around unstable output
Simon Marchi [Tue, 20 Jun 2023 20:38:14 +0000 (16:38 -0400)] 
sessiond: disable clang-format to work around unstable output

When running format-cpp multiple times, I see clang-format-14
alternating between these two forms:

        _environment += lttng::format(
            "   {} = \"{}\";\n", field.name, escape_tsdl_env_string_value(field.value));

        _environment += lttng::format(" {} = \"{}\";\n",
                          field.name,
                          escape_tsdl_env_string_value(field.value));

Disable clang-format locally to avoid always having some spurious
changes.

Change-Id: I71b10a2ad1a5264f26c61f54743f298eb10917bf
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
15 months agoformat-cpp: run clang-format in parallel
Simon Marchi [Tue, 20 Jun 2023 20:32:51 +0000 (16:32 -0400)] 
format-cpp: run clang-format in parallel

Use the -P option of GNU xargs to run multiple instances of clang-format
in parallel, which speeds up the execution quite a bit (depending on the
number of cores, of course).

Inspired by this babeltrace commit:

  http://git.efficios.com/?p=babeltrace.git;a=commit;h=66c3bce11973e6e96a3791c378a9e5f98ddaa280

Change-Id: I201535244ef4c3614dfd742ae6f1c427994e6147
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
15 months agoFix: format-cpp: don't pass -i twice to clang-format
Jérémie Galarneau [Thu, 27 Jul 2023 18:09:00 +0000 (14:09 -0400)] 
Fix: format-cpp: don't pass -i twice to clang-format

From Simon Marchi's original commit message:

I'm trying to run format-cpp, with clang-format-14 in my PATH, and I
get a ton of these messages:

    clang-format-14: for the -i option: may only occur zero or one times!

This is because -i is present in the FORMATTER variables, as well as in
the command line where the formatter is invoked.

Remove the one in the command-line.

Instead of assuming the FORMATTER variable contains '-i', assume it
doesn't since the options are not semantically part of the formatter's
name. The '-i' option is passed to the formatter invocation directly
since it is always needed.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I0c26ff0d8d4d99b3f161ca9e5aa94ff867e3f916

15 months agoTests: python: path-like object introduced in python 3.6
Michael Jeanson [Tue, 27 Jun 2023 17:33:18 +0000 (17:33 +0000)] 
Tests: python: path-like object introduced in python 3.6

Prior to python 3.6 the os.path() function expected a string or bytes
object for the pathname. Use a compat method to convert the path-like
object to a string on interpreters that lack PEP-519 [1] support.

Traceback (most recent call last):
  File "tests/regression/tools/context/test_ust.py", line 156, in <module>
    tap, test_env, lttngtest.VpidContextType(), lambda test_app: test_app.vpid
  File "tests/regression/tools/context/test_ust.py", line 114, in test_static_context
    test_app = test_env.launch_wait_trace_test_application(50)
  File "tests/utils/lttngtest/environment.py", line 541, in launch_wait_trace_test_application
    wait_before_exit_file_path,
  File "tests/utils/lttngtest/environment.py", line 163, in __init__
    self._wait_for_file_to_be_created(pathlib.Path(app_ready_file_path))
  File "tests/utils/lttngtest/environment.py", line 168, in _wait_for_file_to_be_created
    if os.path.exists(sync_file_path):
  File "/usr/lib/python3.5/genericpath.py", line 19, in exists
    os.stat(path)
TypeError: argument should be string, bytes or integer, not PosixPath

[1] https://peps.python.org/pep-0519/

Change-Id: I783e36f61223d44667294ccbf4b3aec5bff68701
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
15 months agoFix: lttng-add-context: context type options possible null dereference
Jérémie Galarneau [Thu, 27 Jul 2023 16:26:42 +0000 (12:26 -0400)] 
Fix: lttng-add-context: context type options possible null dereference

Coverity reports that:
** CID 1518091:  Null pointer dereferences  (FORWARD_NULL)
/src/bin/lttng/commands/add_context.cpp: 820 in destroy_ctx_type(<unnamed>::ctx_type *)(

Free application context options only if type->opt isn't null.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Icc27d04480c4821ed33127f5baf293510cdb314e

15 months agoFix: consumerd: slow metadata push slows down application registration
Jérémie Galarneau [Thu, 29 Jun 2023 18:04:37 +0000 (14:04 -0400)] 
Fix: consumerd: slow metadata push slows down application registration

Issue observed
--------------

When rotating the channels of a session configured with a "per-pid"
buffer sharing policy, applications with a long registration
timeout (e.g. LTTNG_UST_REGISTER_TIMEOUT=-1, see LTTNG-UST(3)) sometimes
experience long start-up times.

Cause
-----

The session list lock is held during the registration of an application
and during the setup of a rotation.

While setting up a rotation in the userspace domain, the session daemon
flushes its metadata cache to the userspace consumer daemon and waits
for a confirmation that all metadata emitted before that point in time
has been serialized (whether on disk or sent through a network output).

As the consumer daemon waits for the metadata to be consumed, it
periodically checks the metadata stream's output position with a 200ms
delay (see DEFAULT_METADATA_AVAILABILITY_WAIT_TIME).

In practice, in per-uid mode, this delay is seldomly encountered since
the metadata has already been pushed by the consumption thread.
Moreover, if it was not, a single polling iteration will typically
suffice.

However, in per-pid buffering mode and with a sustained "heavy" data
production rate, this delay becomes problematic since:
  - metadata is pushed for every application,
  - the delay is hit almost systematically as the consumption thread
    is busy and has to catch up to consume the most recent metadata.

Hence, some rotation setups can easily take multiple seconds (at least
200ms per application). This makes the locking scheme employed on that
path unsuitable as it blocks some operations (like application
registrations) for an extended period of time.

Solution
--------

The polling "back-off" delay is eliminated by using a waiter that allows
the consumer daemon thread that runs the metadata push command to
wake-up whenever the criteria used to evaluate the "pushed" metadata
position are changed.

Those criteria are:
  - the metadata stream's pushed position
  - the lifetime of the metadata channel's stream
  - the status of the session's endpoint

Whenever those states are affected, the waiters are woken-up to force a
re-evaluation of the metadata cache flush position and, eventually,
cause the metadata push command to complete.

Note
----

The waiter queue is adapted from urcu-wait.h of liburcu (also LGPL
licensed).

Change-Id: Ib86c2e878abe205c73f930e6de958c0b10486a37
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
16 months agoDocs: Fix broken reference in lttng-add-trigger
Kienan Stewart [Thu, 29 Jun 2023 14:32:59 +0000 (10:32 -0400)] 
Docs: Fix broken reference in lttng-add-trigger

Change-Id: I4068570d188fbf75e402898234944b6e21cfa2a1
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
16 months agoDocs: Fix broken reference to lttng-concepts(7) man page
Kienan Stewart [Thu, 6 Jul 2023 20:20:24 +0000 (16:20 -0400)] 
Docs: Fix broken reference to lttng-concepts(7) man page

Change-Id: Iaa700e06ec98a3a451f10b4e287c7b28e6ff4524
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
16 months agoTests: Preemptively fail infinite blocking tests when low on disk space
Kienan Stewart [Wed, 21 Jun 2023 13:39:06 +0000 (09:39 -0400)] 
Tests: Preemptively fail infinite blocking tests when low on disk space

In the system tests run by LAVA, the infinite blocking tests were
hanging when the system under test ran out of disk space. This is the
expected behaviour of the failing test, but the condition can be
detected and the tests preemptively failed with a clear error of what
needs to be addressed in the system being tested.

Change-Id: I9e6126408b57c2cd5aa64c2e360e0672f9eb2314
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
16 months agoFix: sessiond: bad fd used while rotating exiting app's buffers
Jérémie Galarneau [Wed, 17 May 2023 17:41:03 +0000 (13:41 -0400)] 
Fix: sessiond: bad fd used while rotating exiting app's buffers

Issue observed
--------------

From bug #1372:

We are observing seemingly random crashes in the LTTng consumer daemon
when tracing a C++ application with LTTng-UST. Our workload has a single
printf-like tracepoint, where each string is in the order of 1kb and the
total output is around 30MB/s.

LTTng is set up with a single session and channel enabling this
tracepoint, and we enabled rotation with a maximum size of 100MB or
every 30 seconds. We are periodically starting new traced processes and
the system runs close to 100% CPU load. This ran on an AWS
Graviton2 (ARM) instance with CentOS 7 and a 5.4 kernel, using LTTng-UST
2.13.5 and LTTng-tools 2.13.8.

The first reported error is a write to a bad file descriptor (-1),
apparently when waking up the metadata poll thread during a rotation.

Cause
-----

Inspecting the logs, we see that the metadata channel with key 574 has a
negative poll fd write end which causes the write in
consumer_metadata_wakeup_pipe to fail because of an invalid file
descriptor:

  DBG1 - 15:12:13.271001175 [6593/6605]: Waking up metadata poll thread (writing to pipe): channel name = 'metadata', channel key = 574 (in consumer_metadata_wakeup_pipe() at consumer.c:888)
  DBG3 - 15:12:13.271010093 [6593/6605]: write() fd = -1 (in consumer_metadata_wakeup_pipe() at consumer.c:892)
  PERROR - 15:12:13.271014655 [6593/6605]: Failed to write to UST metadata pipe while attempting to wake-up the metadata poll thread: Bad file descriptor (in consumer_metadata_wakeup_pipe() at consumer.c:907)
  Error: Failed to dump the metadata cache
  Error: Rotate channel failed

Meanwhile, a lot of applications seem to be unregistering. Notably, the
application associated with that metadata channel is being torn down.

Leading up to the use of a bad file descriptor, the chain of events is:

1) The "rotation" thread starts to issue "Consumer rotate channel" on
   key 574 (@ `15:12:12.865621802`), but blocks on the consumer socket
   lock. We can deduce this from the fact that thread "6605" in the
   consumer wakes up to process an unrelated command originating from the
   same socket.

   We don't see that command being issued by the session daemon, most
   likely because it occurs just before the captured logs start. All
   call sites that use this socket take the socket lock, issue their
   command, wait for a reply, and release the socket lock.

2) The application unregisters (@ `15:12:13.269722736`). The
   `registry_session`, which owns the metadata contents, is destroyed
   during `delete_ust_app_session` which is done directly as a consequence
   of the app unregistration (through a deferred RCU call), see
   `ust_app_unregister`.

   This is problematic since the consumer will request the metadata during
   the rotation of the metadata channel. In the logs, we can see that
   the "close_metadata" command blocks on the consumer socket lock.
   However, the problem occurs when the `manage-apps` acquires the lock
   before the "rotation" thread. In this instance, the "close-metadata"
   command is performed by the consumer daemon, closing the metadata
   poll file descriptor.

3) As the "close_metadata" command completes, the rotation thread
   successfully acquires the socket lock. It is not aware of the
   unregistration of the application and of the subsequent tear-down of the
   application, registry, and channels since it was already iterating on
   the application's channels.

   The consumer starts to process the channel rotation command (@
   `15:12:13.270633213`) which fails on the metadata poll fd.

Essentially, we must ensure that the lifetime of metadata
channel/streams exceeds any ongoing rotation, and prevent a rotation
from being launched when an application is being torn-down in per-PID
buffering mode.

The problem is fairly hard to reproduce as it requires threads to
wake-up in the problematic order described above. I don't have a
straight-forward reproducer for the moment.

Solution
--------

During the execution of a rotation on a per-pid session, the session
daemon iterates on all applications to rotate their data and metadata
channels.

The `ust_app` itself is correctly protected: it is owned by an RCU HT
(`ust_app_ht`) and the RCU read lock is acquired as required to protect
the lifetime of the storage of `ust_app`. However, there is no way to
lock an `ust_app` instance itself.

The rotation command assumes that if it finds the `ust_app`, it will be
able to rotate all of its channels. This isn't true: the `ust_app` can
be unregistered by the `manage-applications` thread which monitors the
application sockets for their deaths in order to teardown the
applications.

The `ust_app` doesn't directly own its channels; they are owned by an
`ust_app_session` which, itself, has a `lock` mutex. Also, the metadata
of the application is owned by the "session registry", which itself can
also be locked.

At a high-level, we want to ensure that the metadata isn't closed while
a rotation is being setup. The registry lock could provide this
guarantee. However, it currently needs to remain unlocked during the
setup of the rotation as it is used when providing the metadata to the
consumer daemon.

Taking the registry lock over the duration of the setup would result in
a deadlock like so:

- the consumer buffer consumption thread consumed a data buffer and attempts
  a metadata sync,
- the command handling thread of the consumer daemon attempts to rotate
  any stream that is already at its rotation position and locks on the
  channel lock held by the consumption thread,
- the metadata sync launches a metadata request against the session
  daemon which attempts to refresh the metadata contents through the
  command socket,
- the command handling thread never services the metadata "refresh" sent
  by the session daemon since it is locked against the same channel as
  the buffer consumption thread, resulting in a deadlock.

Instead, a different approach is required: extending the lifetime of the
application's channels over the duration of the setup of a rotation.

To do so, the `ust_app` structure (which represents a registered
application) is now reference-counted. A reference is acquired over the
duration of the rotation's setup phase. This reference transitively
holds a reference the application's tracing buffers.

Note that taking a reference doesn't prevent applications from
unregistering; it simply defers the reclamation of their buffers to the
end of the rotation setup.

As the rotation completes its setup phase, the references to the
application (and thus, its tracing buffers) are released, allowing the
reclamation of all buffering ressources.

Note that the setup phase of the rotation doesn't last long so it
shouldn't significantly change the observable behaviour in terms of
memory usage. The setup phase mostly consists in sampling the
consumption/production positions of all buffers in order to establish a
switch-over point between the old and new files.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I8dc1ee45dd00c85556dd70d34a3af4f3a4d4e7cb

16 months agoFix: sessiond: leak of application context in channel
Jérémie Galarneau [Mon, 24 Jul 2023 20:45:07 +0000 (16:45 -0400)] 
Fix: sessiond: leak of application context in channel

Issue observed
--------------

ASAN generates the following report when the session daemon exists after
running the tests/regression/tools/context/test_ust.py test suite.

  lttng-sessiond: ==930543==ERROR: LeakSanitizer: detected memory leaks
  lttng-sessiond: Direct leak of 8 byte(s) in 1 object(s) allocated from:
  lttng-sessiond:      0 0x7f8d1706c33a in __interceptor_strdup /usr/src/debug/gcc/gcc/libsanitizer/asan/asan_interceptors.cpp:454
  lttng-sessiond:      1 0x55e36fa6d107 in alloc_ust_app_ctx /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/ust-app.cpp:1368
  lttng-sessiond:      2 0x55e36fa82f73 in create_ust_app_channel_context /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/ust-app.cpp:2912
  lttng-sessiond:      3 0x55e36fa9eeac in ust_app_channel_create /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/ust-app.cpp:5062
  lttng-sessiond:      4 0x55e36faa9fef in find_or_create_ust_app_channel /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/ust-app.cpp:5936
  lttng-sessiond:      5 0x55e36faab610 in ust_app_synchronize_all_channels /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/ust-app.cpp:6147
  lttng-sessiond:      6 0x55e36faac12e in ust_app_synchronize /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/ust-app.cpp:6208
  lttng-sessiond:      7 0x55e36faacc29 in ust_app_global_update(ltt_ust_session*, ust_app*) /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/ust-app.cpp:6268
  lttng-sessiond:      8 0x55e36faa910e in ust_app_start_trace_all(ltt_ust_session*) /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/ust-app.cpp:5850
  lttng-sessiond:      9 0x55e36f920343 in cmd_start_trace(ltt_session*) /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/cmd.cpp:2826
  lttng-sessiond:      10 0x55e36f9ffac5 in process_client_msg /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/client.cpp:1779
  lttng-sessiond:      11 0x55e36fa077c0 in thread_manage_clients /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/client.cpp:2588
  lttng-sessiond:      12 0x55e36f9e4d85 in launch_thread /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/thread.cpp:67
  lttng-sessiond:      13 0x7f8d15c9d44a  (/usr/lib/libc.so.6+0x8744a) (BuildId: 2f005a79cd1a8e385972f5a102f16adba414d75e)
  lttng-sessiond: Direct leak of 5 byte(s) in 1 object(s) allocated from:
  lttng-sessiond:      0 0x7f8d1706c33a in __interceptor_strdup /usr/src/debug/gcc/gcc/libsanitizer/asan/asan_interceptors.cpp:454
  lttng-sessiond:      1 0x55e36fa6d059 in alloc_ust_app_ctx /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/ust-app.cpp:1367
  lttng-sessiond:      2 0x55e36fa82f73 in create_ust_app_channel_context /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/ust-app.cpp:2912
  lttng-sessiond:      3 0x55e36fa9eeac in ust_app_channel_create /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/ust-app.cpp:5062
  lttng-sessiond:      4 0x55e36faa9fef in find_or_create_ust_app_channel /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/ust-app.cpp:5936
  lttng-sessiond:      5 0x55e36faab610 in ust_app_synchronize_all_channels /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/ust-app.cpp:6147
  lttng-sessiond:      6 0x55e36faac12e in ust_app_synchronize /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/ust-app.cpp:6208
  lttng-sessiond:      7 0x55e36faacc29 in ust_app_global_update(ltt_ust_session*, ust_app*) /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/ust-app.cpp:6268
  lttng-sessiond:      8 0x55e36faa910e in ust_app_start_trace_all(ltt_ust_session*) /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/ust-app.cpp:5850
  lttng-sessiond:      9 0x55e36f920343 in cmd_start_trace(ltt_session*) /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/cmd.cpp:2826
  lttng-sessiond:      10 0x55e36f9ffac5 in process_client_msg /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/client.cpp:1779
  lttng-sessiond:      11 0x55e36fa077c0 in thread_manage_clients /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/client.cpp:2588
  lttng-sessiond:      12 0x55e36f9e4d85 in launch_thread /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng-sessiond/thread.cpp:67
  lttng-sessiond:      13 0x7f8d15c9d44a  (/usr/lib/libc.so.6+0x8744a) (BuildId: 2f005a79cd1a8e385972f5a102f16adba414d75e)
  lttng-sessiond: SUMMARY: AddressSanitizer: 13 byte(s) leaked in 2 allocation(s).

Cause
-----

In the case of application contexts, alloc_ust_app_ctx() copies the
provider and application context names. However, these fields are not
free'd by delete_ust_app_ctx().

Solution
--------

The application context and provider names are free'd during
delete_ust_app_ctx() when the context type is LTTNG_UST_ABI_CONTEXT_APP.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I0759018ec1811cf6246b5a80d4f5a7545c63910a

16 months agoFix: lttng-add-context: leak of application context parameters
Jérémie Galarneau [Mon, 24 Jul 2023 20:10:50 +0000 (16:10 -0400)] 
Fix: lttng-add-context: leak of application context parameters

Issue observed
--------------

ASAN reports the following leak when running the
tests/regression/tools/context/test_ust.py test suite:

  Direct leak of 8 byte(s) in 1 object(s) allocated from:
    #0 0x7f32e5ae0cd1 in __interceptor_calloc /usr/src/debug/gcc/gcc/libsanitizer/asan/asan_malloc_linux.cpp:77
    #1 0x5653e1092088 in zmalloc_internal ../../../src/common/macros.hpp:60
    #2 0x5653e10922b3 in char* calloc<char>(unsigned long) string-utils/../macros.hpp:113
    #3 0x5653e119d68f in get_context_type commands/add_context.cpp:1012
    #4 0x5653e119ddf5 in cmd_add_context(int, char const**) commands/add_context.cpp:1059
    #5 0x5653e11e12e7 in handle_command /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng/lttng.cpp:237
    #6 0x5653e11e2027 in parse_args /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng/lttng.cpp:427
    #7 0x5653e11e24e1 in _main /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng/lttng.cpp:474
    #8 0x5653e11e25bd in main /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng/lttng.cpp:485
    #9 0x7f32e3e3984f  (/usr/lib/libc.so.6+0x2384f) (BuildId: 2f005a79cd1a8e385972f5a102f16adba414d75e)

  Direct leak of 5 byte(s) in 1 object(s) allocated from:
    #0 0x7f32e5ae0cd1 in __interceptor_calloc /usr/src/debug/gcc/gcc/libsanitizer/asan/asan_malloc_linux.cpp:77
    #1 0x5653e1092088 in zmalloc_internal ../../../src/common/macros.hpp:60
    #2 0x5653e10922b3 in char* calloc<char>(unsigned long) string-utils/../macros.hpp:113
    #3 0x5653e119d2ae in get_context_type commands/add_context.cpp:1003
    #4 0x5653e119ddf5 in cmd_add_context(int, char const**) commands/add_context.cpp:1059
    #5 0x5653e11e12e7 in handle_command /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng/lttng.cpp:237
    #6 0x5653e11e2027 in parse_args /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng/lttng.cpp:427
    #7 0x5653e11e24e1 in _main /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng/lttng.cpp:474
    #8 0x5653e11e25bd in main /home/jgalar/EfficiOS/src/lttng-tools/src/bin/lttng/lttng.cpp:485
    #9 0x7f32e3e3984f  (/usr/lib/libc.so.6+0x2384f) (BuildId: 2f005a79cd1a8e385972f5a102f16adba414d75e)

Cause
-----

The context and provider names are dynamically allocated by
get_context_type() and stored in ctx_type. However, destroy_ctx_type()
never frees those members when the structure is of type
CONTEXT_APP_CONTEXT.

Solution
--------

Free both names when an application context type is destroyed.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I86dde1eed9f0cc63499c936cf373b094168035e2

16 months agoFix: sessiond: memory leak of lttng_pipe structure
Jérémie Galarneau [Mon, 24 Jul 2023 19:25:39 +0000 (15:25 -0400)] 
Fix: sessiond: memory leak of lttng_pipe structure

Issue observed
--------------

When running the session daemon under ASAN, the following report is
produced:

  Direct leak of 104 byte(s) in 1 object(s) allocated from:
      #0 0x7f93866e0cd1 in __interceptor_calloc /usr/src/debug/gcc/gcc/libsanitizer/asan/asan_malloc_linux.cpp:77
      #1 0x55c55a7c4963 in zmalloc_internal /home/simark/src/lttng-tools/src/common/macros.hpp:60
      #2 0x55c55a7c4973 in lttng_pipe* zmalloc<lttng_pipe>() /home/simark/src/lttng-tools/src/common/macros.hpp:88
      #3 0x55c55a7c26eb in _pipe_create /home/simark/src/lttng-tools/src/common/pipe.cpp:111
      #4 0x55c55a7c351d in lttng_pipe_open(int) /home/simark/src/lttng-tools/src/common/pipe.cpp:185
      #5 0x55c55a586dd6 in operator() /home/simark/src/lttng-tools/src/bin/lttng-sessiond/rotation-thread.cpp:403
      #6 0x55c55a58744a in lttng::sessiond::rotation_thread::rotation_thread(lttng::sessiond::rotation_thread_timer_queue&, notification_thread_handle&) /home/simark/src/lttng-tools/src/bin/lttng-sessiond/rotation-thread.cpp:402
      #7 0x55c55a46377f in std::unique_ptr<lttng::sessiond::rotation_thread, std::default_delete<lttng::sessiond::rotation_thread> > lttng::make_unique<lttng::sessiond::rotation_thread, lttng::sessiond::rotation_thread_timer_queue&, notification_thread_handle&>(lttng::sessiond::rotation_thread_timer_queue&, notification_thread_handle&) /home/simark/src/lttng-tools/src/common/make-unique.hpp:18
      #8 0x55c55a455024 in _main /home/simark/src/lttng-tools/src/bin/lttng-sessiond/main.cpp:1773
      #9 0x55c55a455c2e in main /home/simark/src/lttng-tools/src/bin/lttng-sessiond/main.cpp:1982
      #10 0x7f9385c1484f  (/usr/lib/libc.so.6+0x2384f) (BuildId: 2f005a79cd1a8e385972f5a102f16adba414d75e)

Cause
-----

On destruction, the std::unique_ptr wrapper of
lttng_pipe (lttng_pipe::uptr) invokes `lttng_pipe_close` (which only
closes the file descriptors of the underlying pipe) rather than
`lttng_pipe_destroy` which closes the file descriptors _and_ frees the
memory allocated by lttng_open.

Currently, the rotation thread is the only user of this wrapper (through
its quit_pipe).

Solution
--------

The deleter of lttng_pipe::uptr is replaced to invoke lttng_pipe_destroy.

Fixes #1380
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I5715ac6131c5aa134cfd18d8b677f31aabed36f0

16 months agoevent-rule: set event rule loglevel to domain specific value when unset
Jérémie Galarneau [Fri, 14 Jul 2023 21:26:26 +0000 (17:26 -0400)] 
event-rule: set event rule loglevel to domain specific value when unset

The various domains that support log levels define different values for
ALL/UNSET that are not -1. Using these domain values makes reasoning
about the code simpler as -1 does not necessarily have a defined meaning
in all domains.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I2e818d42f72d3b12e44e375ce30367bf1f6d5463

16 months agoFix: sessiond: preserve jul/log4j domain loglevels
Kienan Stewart [Wed, 28 Jun 2023 15:28:55 +0000 (11:28 -0400)] 
Fix: sessiond: preserve jul/log4j domain loglevels

Issue observed
==============

Following dcd24bbf7dbc74e3584d1d0d52715e749023c452, the
lttng-ust-java-tests started failing with a number of errors such as
the following [1]:

```
org.opentest4j.AssertionFailedError: expected: java.util.Collections$SingletonList@3270d194<[Event name = eventA, Log level selector = (LTTNG_EVENT_LOGLEVEL_ALL), Filter string = logger_name == "eventA"]> but was: java.util.ArrayList@235834f2<[Event name = eventA, Log level selector = (LTTNG_EVENT_LOGLEVEL_ALL), Filter string = logger_name == "eventA"]>
at org.lttng.ust.agent.integration.client.TcpClientIT.testEnableEvent(TcpClientIT.java:187
```
While the assertion failure print out looks like the events are the
same, there is a difference in between the objects which is not
printed: the loglevel integer value. For example:

```
eventA [level -2147483648, type 0]: logger_name == "eventA"
eventB [level -2147483648, type 0]: logger_name == "eventB"

eventA [level -1, type 0]: logger_name == "eventA"
eventB [level -1, type 0]: logger_name == "eventB"
```

Cause
=====

When events are created from payloads in
`src/common/event.cpp:lttng_event_create_from_payload`, the loglevel
value is coerced to `-1` when the loglevel_type is
LTTNG_EVENT_LOGLEVEL_ALL.

Consider the event created in `lttng enable-event --jul
eventName`. The loglevel_type and loglevel will be set as follows:

* loglevel_type: LTTNG_EVENT_LOGLEVEL_ALL (0)
* loglevel: LTTNG_EVENT_LOGLEVEL_JUL_ALL (-2147483648)

The event created is then serialized and sent to the sessiond which
recreates it from the payload removing the value set initially.

The normalization performed in `src/bin/lttng-sessiond/cmd.cpp` has
the same effect.

Solution
========

Remove the normalization of the the loglevel to -1 when events with
loglevel_type LTTNG_EVENT_LOGLEVEL_ALL are created from payloads or
processed via `_cmd_enable_event`.

A test has been added to confirm that the modification doesn't regress
on the issue flagged in https://bugs.lttng.org/issues/1373 which lead
to the normalization changes being made.

This change isn't an exhaustive audit of the packet outputs which may
or may not leak the '-1' "unset" value. One potential change to the
normalization may be to have the normalization be domain-aware and
always default to the domain's "ALL" value. Note that not all domains
have the concept of an "unset" level.

References
==========

[1] https://ci.lttng.org/job/lttng-ust-java-tests_master_build/java_version=java-11-openjdk,platform=bionic-amd64/3302/consoleFull

Refs: #1373

Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Iac653157190b61b44d5ff18ac968fef58330a106

17 months agoFix: lttng-elf: add missing include for uint64_t
Michael Jeanson [Thu, 27 Apr 2023 18:47:53 +0000 (14:47 -0400)] 
Fix: lttng-elf: add missing include for uint64_t

Change-Id: Ic10f9c51197d1c6942ce76a7ee4be4a75f51db47
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
17 months agoTests: add a recording rule listing test
Jérémie Galarneau [Thu, 15 Jun 2023 20:38:08 +0000 (16:38 -0400)] 
Tests: add a recording rule listing test

Add a test that validates that recording rules can be added to a channel
and listed back. Most of the changes are to the lttngtest package in
order to provide the requisite capabilities to the test itself.

This test is also intended as a reproducer for #1373.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I1c6b5e072934ba9760cb1f838395e18a4c6bb211

17 months agoFix: sessiond: crash enabling event rules that differ only by loglevel type
Jérémie Galarneau [Wed, 14 Jun 2023 23:03:12 +0000 (19:03 -0400)] 
Fix: sessiond: crash enabling event rules that differ only by loglevel type

Issue observed
--------------

Summarizing bug #1373, an assertion fails when enabling two event-rules
that only differ by their log level selection type (all, range, or
single).

This issue can be reproduced by launching an instrumented
application (which remains active over the duration of this test) and
running:
  $ lttng create test_session
  $ lttng enable-channel --userspace test_channel
  $ lttng start test_session
  $ lttng enable-event --userspace --session test_session --channel test_channel 'l*' --loglevel-only=TRACE_DEBUG_LINE
  $ lttng enable-event --userspace --session test_session --channel test_channel 'l*' --loglevel=TRACE_DEBUG_LINE

Cause
-----

A number of sites conflate log level values and type. A clean-up had
been performed previously as of 2106efa08 and through follow-up commits.
However, some instances were apparently missed at the time.

As such, add_unique_ust_app_event mixed loglevel values and types when
initializing the key used for the unique insertion. The log level type,
for its part, is completely ignored.

Going back to the reproducer above, the first insertion succeeds as
expected. The second insertion fails since there is already an app event
rule with the `TRACE_DEBUG_LINE` log level type.

Moreover, the matching function only matches on the log level
type (which is really the value), which is also a bug.

The "matching" function is invoked on the key of the second event rule
and the first event rule since the hashing is only performed on the
event-rule's name pattern, resulting in a collision.

Solution
--------

Both the log level value and log level types are used to perform the
matching within the ust-app module. This implies extending the
ust_app_ht_key to store the log level value.

To simplify the matching logic (which attempted to handle 0 and -1
having the same meaning when the "ALL" log level type was used), the log
level value is normalized to '-1' throughout.

Fixes #1373
Reported-by: Bogdan Codres <bogdan.codres@windriver.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I869a0fb7a6554da7d84bc71df6ee91a7e46937cc

17 months agoTests fix: lttngtest: quote session output path
Jérémie Galarneau [Fri, 16 Jun 2023 21:01:13 +0000 (17:01 -0400)] 
Tests fix: lttngtest: quote session output path

The output path used with the `--output` option must be quoted since it
can contain spaces.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I7bf557f64d3fb72a10230ad9da9d59872672e11c

17 months agoTests: remove accidental import of cgi.test
Jérémie Galarneau [Thu, 15 Jun 2023 20:37:33 +0000 (16:37 -0400)] 
Tests: remove accidental import of cgi.test

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I17c1776e8981e212636b65244a58389f194b5d4c

17 months agoTests: lttngtest: raise NotImplementedError in abstract class methods
Jérémie Galarneau [Thu, 15 Jun 2023 16:16:59 +0000 (12:16 -0400)] 
Tests: lttngtest: raise NotImplementedError in abstract class methods

It is preferable to error-out when a Controller doesn't support a method
rather than silently failing.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I3751bf4cff97f400ae53e07efb2740e8426992e9

17 months agoTests: lttngtest: add methods to control session rotations
Jérémie Galarneau [Thu, 15 Jun 2023 16:14:01 +0000 (12:14 -0400)] 
Tests: lttngtest: add methods to control session rotations

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I8b8513fa37e6a51e034e5832499085c501e34ac4

17 months agoTests: lttngctl.py: allow the creation of per-pid buffers
Jérémie Galarneau [Thu, 15 Jun 2023 16:09:05 +0000 (12:09 -0400)] 
Tests: lttngctl.py: allow the creation of per-pid buffers

Add BufferSharingPolicy which can be used with the add_channel method of
a Session to create per-pid channels.

This is done to allow tests for the per-pid buffer sharing mode to be
written using the python utils.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ice3b5d3a8db69c95aaa632c45f42069ab2be590c

17 months agoTests: introduce WaitTraceTestApplicationGroup
Jérémie Galarneau [Thu, 15 Jun 2023 16:00:40 +0000 (12:00 -0400)] 
Tests: introduce WaitTraceTestApplicationGroup

A new test helper, WaitTraceTestApplicationGroup, is added to make it
easier to write test that require a group of applications.

A WaitTraceTestApplicationGroup can be used to:
  - launch a given number of applications,
  - make them trace (almost) all at once,
  - wait for the group of applications to complete

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I18023f1c507993a3e5647f87cd74ead3be392163

17 months agoClean-up: sessiond: ust-app: ua_sess is never populated on failure
Jérémie Galarneau [Mon, 15 May 2023 19:49:04 +0000 (15:49 -0400)] 
Clean-up: sessiond: ust-app: ua_sess is never populated on failure

When find_or_create_ust_app_session() fails, it doesn't populate its
return parameter. Therefore, it is unnecessary to destroy the app
session when it returns < 0.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I8f5cc75f718d96d32fb67fc67135034eb95365d7

17 months agoTests: user space context: remove unneeded test import
Jérémie Galarneau [Wed, 3 May 2023 19:24:51 +0000 (15:24 -0400)] 
Tests: user space context: remove unneeded test import

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I2c109562022fdb4934fdd950ce106cfd66509cd3

17 months agoTests: environment: base WaitTraceTestApplication on gen-ust-events
Jérémie Galarneau [Wed, 3 May 2023 18:55:09 +0000 (14:55 -0400)] 
Tests: environment: base WaitTraceTestApplication on gen-ust-events

WaitTraceTestApplication wraps gen-ust-nevents, but that test app offers
few synchronization points compared to gen-ust-events.

The synchronization points to determine that the application has reached
its `main()` and just before emiting its first event are added to
gen-ust-events. WaitTraceTestApplication is adapted to use those new
options.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Idf717fbaa9108d48a3f7d2b26946a4e5c5dfffd5

17 months agoTests: environment: use a context manager to restore original signal handler
Jérémie Galarneau [Wed, 3 May 2023 18:14:51 +0000 (14:14 -0400)] 
Tests: environment: use a context manager to restore original signal handler

The original signal handler for SIGUSR1 was not restored when launching
the session daemon. This didn't cause any issue, but is unexpected and
would have been a real head scratcher if we wanted to use it later.

To prevent us from forgetting to restore signals when using a
_SignalWaitQueue, a context manager is provided to handle that
automatically.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I86599846ad2000af5d1e3c5cb2a0a73fb1c91455

17 months agocommon: Add a default nullptr argument to make_unique_wrapper
Jérémie Galarneau [Sat, 4 Feb 2023 00:24:54 +0000 (19:24 -0500)] 
common: Add a default nullptr argument to make_unique_wrapper

When wrapping C libraries that return unmanaged pointers,
lttng::make_unique_wrapper makes it easier to locally "wrap" returned
pointers.

  auto val = lttng::make_unique_ptr<struct foo *,
                                    lttng::free>(some_func());

However, in its current form, a nullptr must be passed to define an
alias:

  using my_type_uptr =
    decltype(lttng::make_unique_wrapper<lttng_session,
                                        lttng::free>(nullptr));

Adding a default nullptr argument cuts down a bit of boiler plate.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I9d05d8162d28cc156b1e9ec6fe623f1cc02e9c8e

17 months agoTests: Reduce runtime for high_throughput_limits test
Kienan Stewart [Fri, 9 Jun 2023 20:30:10 +0000 (16:30 -0400)] 
Tests: Reduce runtime for high_throughput_limits test

The test currently takes about 2 hours to run. The iteration count and
range of bandwidth limits have been adjusted to reduce the runtime,
while hopefully covering the same goal: attempting to catch rare race
conditions in low-bandwidth scenarios.

The iteration count has been set so there should be some dropped
events, and a diagnostic message emitted if ever there are no dropped
events.

Change-Id: I670414b60c6881c3a6f7aafd2c1b0c4540e6707f
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
17 months agofix: tests: grep for '$key =' in metadata
Michael Jeanson [Tue, 23 May 2023 19:22:04 +0000 (15:22 -0400)] 
fix: tests: grep for '$key =' in metadata

Always grep for '$key =' to avoid a collision with a value, for example
if you are looking for the 'domain' key and your hostname contains
'domain'.

While we are here, add a bunch of logging to facilitate debugging in the
future.

Change-Id: I09197169ab7f842921c9139fdeb36007d7b20cfb
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
17 months agoBuild fix: workaround g++ 4.8 decltype handling bug
Jérémie Galarneau [Fri, 9 Jun 2023 20:28:09 +0000 (16:28 -0400)] 
Build fix: workaround g++ 4.8 decltype handling bug

g++ 4.8.5 fails to build with the following error:
  g++ -std=gnu++11 -DHAVE_CONFIG_H   -I../../../include -I../../../include -I../../../src -I../../../src -include config.h  -I/home/mjeanson/opt/include   -I/home/mjeanson/opt/include   -DINSTALL_BIN_PATH=\""/home/mjeanson/opt/bin"\"   -fvisibility=hidden -fvisibility-inlines-hidden -fno-strict-aliasing -Wall -Wextra -Wmissing-declarations -Wundef -Wredundant-decls -Wshadow -Wsuggest-attribute=format -Wwrite-strings -Wformat=2 -Wstrict-aliasing -Wmissing-noreturn -Wlogical-op -Winit-self -Wno-incomplete-setjmp-declaration -Wno-gnu-folding-constant -Wno-sign-compare -pthread  -Wno-shadow -Wno-missing-field-initializers -MT commands/lttng-list_triggers.o -MD -MP -MF commands/.deps/lttng-list_triggers.Tpo -c -o commands/lttng-list_triggers.o `test -f 'commands/list_triggers.cpp' || echo './'`commands/list_triggers.cpp
  In file included from commands/../utils.hpp:12:0,
                   from commands/../command.hpp:12,
                   from commands/list_triggers.cpp:8:
  ../../../src/common/container-wrapper.hpp: In instantiation of ‘typename std::conditional<std::is_pointer<_Dp>::value, ElementType, ElementType&>::type lttng::utils::random_access_container_wrapper<ContainerType, ElementType, ContainerOperations>::operator[](std::size_t) [with ContainerType = const lttng_action*; ElementType = const lttng_action*; ContainerOperations = lttng::ctl::details::const_action_list_operations; typename std::conditional<std::is_pointer<_Dp>::value, ElementType, ElementType&>::type = const lttng_action*; std::size_t = long unsigned int]’:
  ../../../src/common/container-wrapper.hpp:78:21:   required from ‘typename std::conditional<std::is_pointer<U>::value, IteratorElementType, IteratorElementType&>::type lttng::utils::random_access_container_wrapper<ContainerType, ElementType, ContainerOperations>::_iterator<IteratorContainerType, IteratorElementType>::operator*() const [with IteratorContainerType = lttng::utils::random_access_container_wrapper<const lttng_action*, const lttng_action*, lttng::ctl::details::const_action_list_operations>; IteratorElementType = const lttng_action*; ContainerType = const lttng_action*; ElementType = const lttng_action*; ContainerOperations = lttng::ctl::details::const_action_list_operations; typename std::conditional<std::is_pointer<U>::value, IteratorElementType, IteratorElementType&>::type = const lttng_action*]’
  commands/list_triggers.cpp:1030:66:   required from here
  ../../../src/common/container-wrapper.hpp:133:69: error: ‘const’ qualifiers cannot be applied to ‘lttng::utils::random_access_container_wrapper<const lttng_action*, const lttng_action*, lttng::ctl::details::const_action_list_operations>&’
   const auto& const_this = static_cast<const decltype(*this)&>(*this);
                                                                     ^

This bug was fixed in g++ 5.0, see:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60420

In this case, we can simply restate the class' type to work around the
issue since the problem is confined to the handling of decltype
declaration specifiers.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I3ba6b012af0f43f7cd06d780e6800c42e16cc66c

17 months agoWrap calls to fmt::format to catch formatting exceptions
Jérémie Galarneau [Thu, 8 Jun 2023 18:53:15 +0000 (14:53 -0400)] 
Wrap calls to fmt::format to catch formatting exceptions

fmt::format throws when a formatting error is encountered.
Unfortunately, we can't ensure complete coverage of all logging call
sites (e.g. error paths) and it is not desirable for such an exception
to be thrown in those cases.

The formatting error is returned as the formatted string so that it ends
up in the logs or exception messages more or less transparently.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I1cb33a5fe87221139eaf9de918b47e0397daa89c

17 months agoWrap main functions to handle uncaught exceptions
Jérémie Galarneau [Thu, 8 Jun 2023 18:20:09 +0000 (14:20 -0400)] 
Wrap main functions to handle uncaught exceptions

Coverity reports multiple instances where formatting facilities can
throw (e.g. invalid format string).

A wrapper to handle formatting exceptions is added in a follow-up
change, but it is a good practice to catch exceptions at the top level
nonetheless.

Reported-by: Coverity Scan
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ie55af338b9ef1f7d6e8825055cfc2e7037cdd80e

17 months agoFix: container-wrapper: size container operation can throw
Jérémie Galarneau [Thu, 8 Jun 2023 17:42:50 +0000 (13:42 -0400)] 
Fix: container-wrapper: size container operation can throw

  1512923 Uncaught exception
  If the exception is ever thrown, the program will crash.

  In lttng::​utils::​random_access_container_wrapper<lttng_action const *, lttng_action const *, lttng::​ctl::​details::​const_action_list_operations>::​size(): A C++ exception is thrown but never caught (CWE-248)

Reported-by: Coverity Scan
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I5f8bffc64fb239e59b272985f6b3c959d238da0a

17 months agocommon: silence bogus coverity warning
Jérémie Galarneau [Wed, 7 Jun 2023 20:57:37 +0000 (16:57 -0400)] 
common: silence bogus coverity warning

Coverity reports:
  1512891 Uninitialized scalar variable
  The variable will contain an arbitrary value left from earlier computations.

  In lttng_event_serialize(lttng_event const *, unsigned int, char
  const * const *,   char const *, unsigned long, lttng_bytecode *, lttng_payload *): Use of an uninitialized variable (CWE-457)

This warning is bogus since lttng_event_exclusion_comm contains a single
field which is already initialized and is packed (no padding possible).

Initialize the header explicitly to silence the warning.

Reported-by: Coverity Scan
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ia1eeee779168b3ac0eb9d0796d503b2d9ab225f2

17 months agoFix: coverity warns of uncaught exception
Jérémie Galarneau [Wed, 7 Jun 2023 20:28:27 +0000 (16:28 -0400)] 
Fix: coverity warns of uncaught exception

Coverity warns that some container operations used by
random_access_container_wrapper can throw even though methods are marked
as noexcept.

  CID 1512893 (#1-2 of 2): Uncaught exception (UNCAUGHT_EXCEPT)
  exn_spec_violation: An exception of type lttng::invalid_argument_error is thrown but the exception specification noexcept doesn't allow it to be thrown. This will result in a call to terminate().

The noexcept specifier is remvoved from operator* and end() of
random_access_container_wrapper's iterator implementation.

To make this a bit clearer, a bounds check is performed in operator[]
directly which will make errors easier to catch.

Reported-by: Coverity Scan
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I31d51e8709d33b3c80d64c8c05a23e519e3a93e7

17 months agoBuild fix: brace-enclosed initlializer lists error with g++ 4.8
Michael Jeanson [Wed, 7 Jun 2023 14:42:57 +0000 (10:42 -0400)] 
Build fix: brace-enclosed initlializer lists error with g++ 4.8

Looks like g++ 4.8 is confused by a single argument brace enclosed
initializer list:

utils.hpp: In constructor 'lttng::cli::session_list::session_list(lttng::cli::session_list&&)':
utils.hpp:112:38: error: call of overloaded 'random_access_container_wrapper(<brace-enclosed initializer list>)' is ambiguous
    { std::move(original._container) })
                                      ^

Change-Id: I19da292ed9a49bada7dbda5753bf1bd1442e612f
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
17 months agoclang-tidy: ignore generated filter-parser.hpp
Jérémie Galarneau [Tue, 6 Jun 2023 18:09:44 +0000 (14:09 -0400)] 
clang-tidy: ignore generated filter-parser.hpp

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I92adc9c9fca588aad2dece474e89e2daa88f36d0

17 months agoFix clang-tidy cppcoreguidelines-pro-type-const-cast warning
Jérémie Galarneau [Tue, 6 Jun 2023 17:43:28 +0000 (13:43 -0400)] 
Fix clang-tidy cppcoreguidelines-pro-type-const-cast warning

clang-tidy reports:
cppcoreguidelines-pro-type-const-cast; do not use const_cast

The const_cast adds a const qualifier so this warning seems a bit
strict. Regardless, we can dodge the whole question by passing the
exclusion_list as `const char * const *`, which is closer to the
original intention of the API anyhow.

For more information on the safety of these types of casts, see:
https://isocpp.org/wiki/faq/const-correctness#constptrptr-conversion

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ia3129b7d1ed4e450f3f2d63920d2fd67c66a6d73

17 months agofmtlib: backport upstream fixes to suppress bogus gcc 13.1 warnings
Jérémie Galarneau [Tue, 6 Jun 2023 15:20:06 +0000 (11:20 -0400)] 
fmtlib: backport upstream fixes to suppress bogus gcc 13.1 warnings

gcc 13.1 erroneously warns of dangling references when using our custom
formatters. This was reported to both fmtlib and gcc and fixes have been
provided, but are not released yet.

This change backports two fixes from the master branch to our vendored
version:
https://github.com/fmtlib/fmt/commit/f61f15cc5b11582d50d02ba0514c5344f7b2600e
https://github.com/fmtlib/fmt/commit/ef55d4f52ec527668a8e910a56ea79d9b939dbc2

For more information on the issue, see:
https://github.com/fmtlib/fmt/issues/3415
https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=6b927b1297e66e26e62e722bf15c921dcbbd25b9

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I30bbbbe5e0aa2729e50228acdb528ee060d9df23

17 months agoFix: sessiond: incorrect use of exclusions array leads to crash
Jérémie Galarneau [Tue, 23 May 2023 23:35:10 +0000 (19:35 -0400)] 
Fix: sessiond: incorrect use of exclusions array leads to crash

Issue observed
--------------

When using the CLI to list the configuration of a session that has an
event rule which makes use of multiple exclusions, the session daemon
crashes with the following stack trace:

  (gdb) bt
  #0  0x00007fa9ed401445 in ?? () from /usr/lib/libc.so.6
  #1  0x0000560cd5fc5199 in lttng_strnlen (str=0x615f6f6c6c6568 <error: Cannot access memory at address 0x615f6f6c6c6568>, max=256) at ../../src/common/compat/string.h:19
  #2  0x0000560cd5fc6b39 in lttng_event_serialize (event=0x7fa9cc01d8b0, exclusion_count=2, exclusion_list=0x7fa9cc011794, filter_expression=0x0, bytecode_len=0, bytecode=0x0, payload=0x7fa9d3ffda88) at event.c:767
  #3  0x0000560cd5f380b5 in list_lttng_ust_global_events (nb_events=<synthetic pointer>, reply_payload=0x7fa9d3ffda88, ust_global=<optimized out>, channel_name=<optimized out>) at cmd.c:472
  #4  cmd_list_events (domain=<optimized out>, session=<optimized out>, channel_name=<optimized out>, reply_payload=0x7fa9d3ffda88) at cmd.c:3860
  #5  0x0000560cd5f6d76a in process_client_msg (cmd_ctx=0x7fa9d3ffa710, sock=0x7fa9d3ffa5b0, sock_error=0x7fa9d3ffa5b4) at client.c:1890
  #6  0x0000560cd5f6f876 in thread_manage_clients (data=0x560cd7879490) at client.c:2629
  #7  0x0000560cd5f65a54 in launch_thread (data=0x560cd7879500) at thread.c:66
  #8  0x00007fa9ed32d44b in ?? () from /usr/lib/libc.so.6
  #9  0x00007fa9ed3b0e40 in ?? () from /usr/lib/libc.so.6

Cause
-----

lttng_event_serialize expects a `char **` list of exclusion names, as
provided by the other callsite in liblttng-ctl. However, the callsite in
list_lttng_ust_global_events passes pointer to the exclusions as stored
in lttng_event_exclusion.

lttng_event_exclusion contains an array of fixed-length strings (with a
stride of 256 bytes) which isn't an expected layout for
lttng_event_serialize.

Solution
--------

A temporary array of pointers is constructed before invoking
lttng_event_serialize to construct a list of exclusions with the layout
that lttng_event_serialize expects.

The array itself is reused for all events, limiting the number of
allocations.

Note
----

None.

Change-Id: I266a1cc9e9f18e0476177a0047b1d8f468110575
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
17 months agolttng: reuse random_access_container_wrapper for session_list
Jérémie Galarneau [Tue, 6 Jun 2023 03:39:17 +0000 (23:39 -0400)] 
lttng: reuse random_access_container_wrapper for session_list

Reimplement lttng::cli::session_list in terms of the
random_access_container_wrapper utility since the code is essentially
duplicated.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I99160a54d5d2dfa7cf65a6287b271e9c2238c510

17 months agolttng: move session_list to the lttng::cli namespace
Jérémie Galarneau [Mon, 5 Jun 2023 21:39:39 +0000 (17:39 -0400)] 
lttng: move session_list to the lttng::cli namespace

This is a preliminary step to re-use the random_access_container util to
replace the implementation of session_list. No behaviour change is
intended by this change.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I32d0354a6985b8c6fb58813cf880ef04a10678dd

17 months agoProvide an idiomatic c++ interface for action lists
Jérémie Galarneau [Thu, 1 Jun 2023 20:19:31 +0000 (16:19 -0400)] 
Provide an idiomatic c++ interface for action lists

Replace for_each macros with the use of an iterator. It is done by using
a random_access_container_wrapper util which is intended to wrap
random_access containers implemented in C.

Change-Id: I1b22725b7335f267c9b2d02fc65f9375baf37426
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
17 months agoactions: list: Add `for_each_action_{const, mutable}()` macros
Francis Deslauriers [Wed, 23 Jun 2021 16:33:18 +0000 (12:33 -0400)] 
actions: list: Add `for_each_action_{const, mutable}()` macros

Accessing all the inner actions of a action list in a loop is a common
access pattern. This commit adds 2 `for_each` macros to iterate over all
elements either using a const or a mutable pointer.

Add a few unit tests for the list action to test these macros.

Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I9aff0b81e1f782b5d20c3fcb82ee7028da8dd810

17 months agotrace-ust: Rename `{next, used}_channel_id` to `{next, used}_event_container_id`
Francis Deslauriers [Fri, 18 Jun 2021 16:23:35 +0000 (12:23 -0400)] 
trace-ust: Rename `{next, used}_channel_id` to `{next, used}_event_container_id`

Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Idb07462c15cbebaf37a244ad76a32e893ecbab0c

17 months agoTests fix: test_callstack: output of addr2line incorrectly parsed
Jérémie Galarneau [Thu, 25 May 2023 23:15:20 +0000 (19:15 -0400)] 
Tests fix: test_callstack: output of addr2line incorrectly parsed

Issue observed
--------------

The test_callstack test fails with GCC 13.1 with the following output:

  Traceback (most recent call last):
  File "/usr/lib/lttng-tools/ptest/tests/regression/././kernel//../../utils/parse-callstack.py", line 160, in <module>
  main()
  File "/usr/lib/lttng-tools/ptest/tests/regression/././kernel//../../utils/parse-callstack.py", line 155, in main
  raise Exception('Expected function name not found in recorded callstack')
  Exception: Expected function name not found in recorded callstack
  ok 10 - Destroy session callstack
  PASS: kernel/test_callstack 10 - Destroy session callstack
  not ok 11 - Validate userspace callstack
  FAIL: kernel/test_callstack 11 - Validate userspace callstack

Cause
-----

parse-callstack.py uses 'split()' to split the lines of addr2line's
output. By default, 'split()' splits a string on any whitespace.
Typically this was fine as addr2line's output doesn't contain spaces and
the function then splits on new lines.

Typical output of addr2line:

  $ addr2line -e ./tests/regression/kernel//../../utils/testapp/gen-syscall-events-callstack/gen-syscall-events-callstack --functions --addresses 0x40124B
  0x000000000040124b
  my_gettid
  /tmp/test-callstack-master/src/lttng-tools/tests/utils/testapp/gen-syscall-events-callstack/gen-syscall-events-callstack.c:40

However, with the test app compiled using gcc 13.1, a "discriminator"
annotation is present:

  0x0000000000401279
  fct_b
  /tmp/test-callstack-master/src/lttng-tools/tests/utils/testapp/gen-syscall-events-callstack/gen-syscall-events-callstack.c:58 (discriminator 1)

Hence, by selecting the second to last element (-2, with negative
indexing), the addr2line function returns '(discriminator' as the
function name.

Solution
--------

The parsing code is changed to simply iterate on groups of 3 lines,
following addr2line's output format.

Fixes #1377

Change-Id: I8c1eab97e84ca7cad171904bed6660540061cf08
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
18 months agoTests: add more output to test_ust_constructor
Michael Jeanson [Tue, 2 May 2023 21:06:29 +0000 (17:06 -0400)] 
Tests: add more output to test_ust_constructor

To make debugging test failures easier, add a tap result for each lttng
command and for each expected event.

Change-Id: I54872494c967f1e13bf6d5b05dc5c202abf5865e
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
18 months agoTests: convert even more left-over type hint to type comment
Michael Jeanson [Tue, 2 May 2023 14:28:54 +0000 (10:28 -0400)] 
Tests: convert even more left-over type hint to type comment

Use type comments to support older python3 interpreters that can't
handle type hints (such as 3.4).

Change-Id: I4f97bc3f1e18b8701cb04175d97deb295178883b
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
18 months agoDocs: test environment: clarify the behavior of a TraceTestApplication
Jérémie Galarneau [Mon, 1 May 2023 21:34:36 +0000 (17:34 -0400)] 
Docs: test environment: clarify the behavior of a TraceTestApplication

A TraceTestApplication, as opposed to a WaitTraceTestApplication, traces
immediately when it is launched. This is useful to test tracing from a
constructor, but should most likely not be used for other purposes.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I7c25a8a60008d584b9376fce74fb925fc02b846a

18 months agoTests: convert more left-over type hint to type comment
Michael Jeanson [Mon, 1 May 2023 20:50:57 +0000 (16:50 -0400)] 
Tests: convert more left-over type hint to type comment

Use type comments to support older python3 interpreters that can't
handle type hints (such as 3.4).

Change-Id: I3599311dfcc172344c759f8c0e22af640759f1f5
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
18 months agoPort: fix -Wdeprecated-declarations warning about sprintf on macOS clang 14
Michael Jeanson [Tue, 18 Apr 2023 19:02:37 +0000 (15:02 -0400)] 
Port: fix -Wdeprecated-declarations warning about sprintf on macOS clang 14

Remove uses of sprintf to fix this warning:

    warning: 'sprintf' is deprecated: This function is provided for
    compatibility reasons only.  Due to security concerns inherent in the
    design of sprintf(3), it is highly recommended that you use snprintf(3)
    instead. [-Wdeprecated-declarations]

Change-Id: Idf3109f2eacafe0a7d18f4c132613f2f85afa09b
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
18 months agoTests: ust_constructor: convert left-over type hint to type comment
Jérémie Galarneau [Mon, 1 May 2023 17:45:27 +0000 (13:45 -0400)] 
Tests: ust_constructor: convert left-over type hint to type comment

Use type comments to support older python3 interpreters that can't
handle type hints (such as 3.4).

Python 3.4 reports the following error:
  File "./ust/ust-constructor/test_ust_constructor.py", line 176
    client: lttngtest.Controller = lttngtest.LTTngClient(test_env, log=tap.diagnostic)
          ^
  SyntaxError: invalid syntax
  ERROR: ust/ust-constructor/test_ust_constructor.py - missing test plan
  ERROR: ust/ust-constructor/test_ust_constructor.py - exited with status 1

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ic0ead4b9bedfca56e8999bf4012e103801794655

18 months agoTests: ust_constructor: convert type hints to type comments
Jérémie Galarneau [Mon, 1 May 2023 15:57:48 +0000 (11:57 -0400)] 
Tests: ust_constructor: convert type hints to type comments

Use type comments to support older python3 interpreters that can't
handle type hints (such as 3.4).

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Id8b040fb36a4f243a3af0d731843028ad2f4388d

18 months agoClean-up: lttng: utils: coding style fix
Jérémie Galarneau [Fri, 28 Apr 2023 14:46:08 +0000 (10:46 -0400)] 
Clean-up: lttng: utils: coding style fix

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ia23ad0168729819bcb0735ff8e55ce02fc9bcd20

18 months agoTests: Introduce test_ust_constructor
Mathieu Desnoyers [Fri, 17 Feb 2023 20:32:25 +0000 (15:32 -0500)] 
Tests: Introduce test_ust_constructor

Test instrumentation coverage of C/C++ constructors and destructors by
LTTng-UST tracepoints.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ia9e5a5a57bfa7fd4316f8a914ef97effd020262e

18 months agoTests: Introduce gen-ust-events-constructor test application
Mathieu Desnoyers [Fri, 17 Feb 2023 14:57:03 +0000 (09:57 -0500)] 
Tests: Introduce gen-ust-events-constructor test application

This test application tests C/C++ constructor/destructor instrumentation coverage.

* How to use:

lttng create
lttng enable-event -u 'tp*'
lttng start
./gen-ust-events-constructor
lttng stop
lttng view

* Before UST fixes:

[11:57:09.949917277] (+?.?????????) compudjdev tp_so:constructor_c_provider_shared_library: { cpu_id = 6 }, { }
[11:57:09.949962573] (+0.000045296) compudjdev tp_so:constructor_cplusplus_provider_shared_library: { cpu_id = 6 }, { msg = "global - shared library define and provider" }
[11:57:09.952145202] (+0.002182629) compudjdev tp:constructor_cplusplus: { cpu_id = 6 }, { msg = "global - same unit after provider" }
[11:57:09.952146517] (+0.000001315) compudjdev tp:constructor_c_across_units_after_provider: { cpu_id = 6 }, { }
[11:57:09.952146887] (+0.000000370) compudjdev tp:constructor_cplusplus: { cpu_id = 6 }, { msg = "global - across units after provider" }
[11:57:09.952634622] (+0.000487735) compudjdev tp:constructor_cplusplus: { cpu_id = 6 }, { msg = "main() local" }
[11:57:09.952635522] (+0.000000900) compudjdev tp_so:constructor_cplusplus_provider_shared_library: { cpu_id = 6 }, { msg = "main() local - shared library define and provider" }
[11:57:09.952636176] (+0.000000654) compudjdev tp_a:constructor_cplusplus_provider_static_archive: { cpu_id = 6 }, { msg = "main() local - static archive define and provider" }
[11:57:09.952636906] (+0.000000730) compudjdev tp:main: { cpu_id = 6 }, { }
[11:57:09.952637469] (+0.000000563) compudjdev tp_a:destructor_cplusplus_provider_static_archive: { cpu_id = 6 }, { msg = "main() local - static archive define and provider" }
[11:57:09.952638106] (+0.000000637) compudjdev tp_so:destructor_cplusplus_provider_shared_library: { cpu_id = 6 }, { msg = "main() local - shared library define and provider" }
[11:57:09.952638516] (+0.000000410) compudjdev tp:destructor_cplusplus: { cpu_id = 6 }, { msg = "main() local" }
[11:57:09.952681576] (+0.000043060) compudjdev tp:destructor_cplusplus: { cpu_id = 6 }, { msg = "global - across units after provider" }
[11:57:09.952682066] (+0.000000490) compudjdev tp:destructor_cplusplus: { cpu_id = 6 }, { msg = "global - same unit after provider" }
[11:57:09.952729603] (+0.000047537) compudjdev tp_so:destructor_cplusplus_provider_shared_library: { cpu_id = 6 }, { msg = "global - shared library define and provider" }

* After UST fixes:

[11:49:37.921028048] (+?.?????????) compudjdev tp_so:constructor_c_provider_shared_library: { cpu_id = 22 }, { }
[11:49:37.921033701] (+0.000005653) compudjdev tp_a:constructor_c_provider_static_archive: { cpu_id = 22 }, { }
[11:49:37.921036278] (+0.000002577) compudjdev tp_so:constructor_cplusplus_provider_shared_library: { cpu_id = 22 }, { msg = "global - shared library define and provider" }
[11:49:37.921037961] (+0.000001683) compudjdev tp_a:constructor_cplusplus_provider_static_archive: { cpu_id = 22 }, { msg = "global - static archive define and provider" }
[11:49:37.921039431] (+0.000001470) compudjdev tp:constructor_c_across_units_before_define: { cpu_id = 22 }, { }
[11:49:37.921040288] (+0.000000857) compudjdev tp:constructor_cplusplus: { cpu_id = 22 }, { msg = "global - across units before define" }
[11:49:37.921041208] (+0.000000920) compudjdev tp:constructor_c_same_unit_before_define: { cpu_id = 22 }, { }
[11:49:37.921042021] (+0.000000813) compudjdev tp:constructor_c_same_unit_after_define: { cpu_id = 22 }, { }
[11:49:37.921042568] (+0.000000547) compudjdev tp:constructor_cplusplus: { cpu_id = 22 }, { msg = "global - same unit before define" }
[11:49:37.921043161] (+0.000000593) compudjdev tp:constructor_cplusplus: { cpu_id = 22 }, { msg = "global - same unit after define" }
[11:49:37.921044058] (+0.000000897) compudjdev tp:constructor_c_across_units_after_define: { cpu_id = 22 }, { }
[11:49:37.921044585] (+0.000000527) compudjdev tp:constructor_cplusplus: { cpu_id = 22 }, { msg = "global - across units after define" }
[11:49:37.921045585] (+0.000001000) compudjdev tp:constructor_c_same_unit_before_provider: { cpu_id = 22 }, { }
[11:49:37.921046385] (+0.000000800) compudjdev tp:constructor_c_same_unit_after_provider: { cpu_id = 22 }, { }
[11:49:37.921046938] (+0.000000553) compudjdev tp:constructor_cplusplus: { cpu_id = 22 }, { msg = "global - same unit before provider" }
[11:49:37.921047548] (+0.000000610) compudjdev tp:constructor_cplusplus: { cpu_id = 22 }, { msg = "global - same unit after provider" }
[11:49:37.921048428] (+0.000000880) compudjdev tp:constructor_c_across_units_after_provider: { cpu_id = 22 }, { }
[11:49:37.921048918] (+0.000000490) compudjdev tp:constructor_cplusplus: { cpu_id = 22 }, { msg = "global - across units after provider" }
[11:49:37.921050001] (+0.000001083) compudjdev tp:constructor_cplusplus: { cpu_id = 22 }, { msg = "main() local" }
[11:49:37.921050628] (+0.000000627) compudjdev tp_so:constructor_cplusplus_provider_shared_library: { cpu_id = 22 }, { msg = "main() local - shared library define and provider" }
[11:49:37.921051368] (+0.000000740) compudjdev tp_a:constructor_cplusplus_provider_static_archive: { cpu_id = 22 }, { msg = "main() local - static archive define and provider" }
[11:49:37.921052098] (+0.000000730) compudjdev tp:main: { cpu_id = 22 }, { }
[11:49:37.921052758] (+0.000000660) compudjdev tp_a:destructor_cplusplus_provider_static_archive: { cpu_id = 22 }, { msg = "main() local - static archive define and provider" }
[11:49:37.921053758] (+0.000001000) compudjdev tp_so:destructor_cplusplus_provider_shared_library: { cpu_id = 22 }, { msg = "main() local - shared library define and provider" }
[11:49:37.921054595] (+0.000000837) compudjdev tp:destructor_cplusplus: { cpu_id = 22 }, { msg = "main() local" }
[11:49:37.921055698] (+0.000001103) compudjdev tp:destructor_cplusplus: { cpu_id = 22 }, { msg = "global - across units after provider" }
[11:49:37.921056455] (+0.000000757) compudjdev tp:destructor_cplusplus: { cpu_id = 22 }, { msg = "global - same unit after provider" }
[11:49:37.921057011] (+0.000000556) compudjdev tp:destructor_cplusplus: { cpu_id = 22 }, { msg = "global - same unit before provider" }
[11:49:37.921057558] (+0.000000547) compudjdev tp:destructor_cplusplus: { cpu_id = 22 }, { msg = "global - across units after define" }
[11:49:37.921058188] (+0.000000630) compudjdev tp:destructor_cplusplus: { cpu_id = 22 }, { msg = "global - same unit after define" }
[11:49:37.921058658] (+0.000000470) compudjdev tp:destructor_cplusplus: { cpu_id = 22 }, { msg = "global - same unit before define" }
[11:49:37.921059168] (+0.000000510) compudjdev tp:destructor_cplusplus: { cpu_id = 22 }, { msg = "global - across units before define" }
[11:49:37.921059768] (+0.000000600) compudjdev tp_a:destructor_cplusplus_provider_static_archive: { cpu_id = 22 }, { msg = "global - static archive define and provider" }
[11:49:37.921060445] (+0.000000677) compudjdev tp_so:destructor_cplusplus_provider_shared_library: { cpu_id = 22 }, { msg = "global - shared library define and provider" }
[11:49:37.921067265] (+0.000006820) compudjdev tp:destructor_c_across_units_after_provider: { cpu_id = 22 }, { }
[11:49:37.921067901] (+0.000000636) compudjdev tp:destructor_c_same_unit_after_provider: { cpu_id = 22 }, { }
[11:49:37.921068515] (+0.000000614) compudjdev tp:destructor_c_same_unit_before_provider: { cpu_id = 22 }, { }
[11:49:37.921069128] (+0.000000613) compudjdev tp:destructor_c_across_units_after_define: { cpu_id = 22 }, { }
[11:49:37.921069831] (+0.000000703) compudjdev tp:destructor_c_same_unit_after_define: { cpu_id = 22 }, { }
[11:49:37.921070445] (+0.000000614) compudjdev tp:destructor_c_same_unit_before_define: { cpu_id = 22 }, { }
[11:49:37.921071075] (+0.000000630) compudjdev tp:destructor_c_across_units_before_define: { cpu_id = 22 }, { }
[11:49:37.921071721] (+0.000000646) compudjdev tp_a:destructor_c_provider_static_archive: { cpu_id = 22 }, { }
[11:49:37.921072605] (+0.000000884) compudjdev tp_so:destructor_c_provider_shared_library: { cpu_id = 22 }, { }

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I4572c2548acf5e295f70e88137ab12b3b86d17c9

18 months agoFix: io-hint: add missing include for off_t
Michael Jeanson [Thu, 27 Apr 2023 17:08:16 +0000 (13:08 -0400)] 
Fix: io-hint: add missing include for off_t

Change-Id: I73ca36d43c7a9b9b5f67e77e835426964b315f65
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
18 months agoBuild fix: g++ 4.8 incorrectly disambiguates enum and member
Jérémie Galarneau [Wed, 26 Apr 2023 22:13:02 +0000 (18:13 -0400)] 
Build fix: g++ 4.8 incorrectly disambiguates enum and member

g++ 4.8 fails to build with the following error:

  commands/start.cpp: In function ‘cmd_error_code {anonymous}::start_tracing(const session_spec&)’:
  commands/start.cpp:123:76: error: ‘session_spec::type’ is not a class, namespace, or enumeration
    if (!listing_failed && sessions.size() == 0 && spec.type == session_spec::type::NAME) {
                                                                              ^
  commands/start.cpp:144:36: error: ‘session_spec::type’ is not a class, namespace, or enumeration
       if (spec.type != session_spec::type::NAME) {
                                    ^

The `type` member is renamed to type_ to workaround this compiler bug.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Id46ca4219d4d96db71a9d4523c3571303b2e97a7

This page took 0.058932 seconds and 4 git commands to generate.