git.lttng.org Git - lttng-tools.git/log
lttng-tools.git
5 days ago tests: Add helper functions to validate if lttng modules are loaded
Kienan Stewart [Thu, 4 Jul 2024 13:25:51 +0000 (09:25 -0400)] 
tests: Add helper functions to validate if lttng modules are loaded

Observed issue
==============

Regression tests exercising kernel modules don't validate before and
after the test execution that there are no LTTng kernel modules
loaded.

Solution
========

Add convenient tap helper functions that may be used by shell TAP
tests.
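
As an illustration only, the kind of check such a helper performs can be
sketched in Python as follows. The function name `loaded_lttng_modules`
and the sample input are hypothetical; the helpers added by this commit
are shell functions and are not reproduced here.

```python
def loaded_lttng_modules(lsmod_output):
    """Return the names of modules whose name starts with "lttng".

    `lsmod_output` is text in the format printed by `lsmod`: one module
    per line, with the module name in the first column.
    """
    names = []
    for line in lsmod_output.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("lttng"):
            names.append(fields[0])
    return names


# Hypothetical sample input in `lsmod` format.
sample = """Module                  Size  Used by
lttng_clock            12288  0
ext4                 1000000  1
"""
print(loaded_lttng_modules(sample))
```

A test using such a check would expect an empty list both before and
after its execution.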

Known drawbacks
===============

None.

Change-Id: I8244358137c2d049ed72e6dc7d9290cb9dda10e3
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 days ago Fix: Update lttng-modules load list
Kienan Stewart [Wed, 3 Jul 2024 19:51:29 +0000 (15:51 -0400)] 
Fix: Update lttng-modules load list

Observed issue
==============

After starting and then stopping a root sessiond, lttng-modules modules
would still be loaded in the kernel.

```
$ lsmod | grep lttng || echo "nothing"
nothing

$ lttng-sessiond -b

$ killall lttng-sessiond

$ lsmod | grep lttng || echo "nothing"
lttng_statedump       757760  0
lttng_wrapper          16384  1 lttng_statedump
lttng_uprobes          12288  0
lttng_clock            12288  0
lttng_kprobes          12288  0
lttng_lib_ring_buffer    90112  0
lttng_kretprobes       12288  0
```

Cause
=====

Not all modules are listed in the core/data modules in `modprobe.cpp`.

Solution
========

Add missing modules.

Known drawbacks
===============

None.

Change-Id: I28525c55eadb95467f77ffac0b9152ac8576e0fc
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 days ago Fix: sessiond: notification-thread: add to pollset fails silently
Jérémie Galarneau [Wed, 20 Nov 2024 16:44:12 +0000 (16:44 +0000)] 
Fix: sessiond: notification-thread: add to pollset fails silently

Noticed when reviewing the code, so I don't have a reproducer
for the issue. However, the "ADD_TRACER_EVENT_SOURCE" command
returns LTTNG_OK even when the notification thread fails to add
the tracer event source to the poll set.

The error path properly performs the requisite clean-up, but
the command emitter will be under the impression that the command
succeeded. As a result, it will most likely issue the
"REMOVE_TRACER_EVENT_SOURCE" command at a later point, which will
trigger an assertion failure when the `source_element` isn't found.

Change-Id: I4a400d6affa21d2c2247ecfb845ca1e4aa730b5d
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
8 days ago docs: Fix up test documentation
Kienan Stewart [Tue, 29 Oct 2024 12:40:33 +0000 (08:40 -0400)] 
docs: Fix up test documentation

Observed issue
==============

Principally, `LTTNG_TEST_GDBSERVER_SESSIOND` was listed twice in the
table, and `LTTNG_TEST_GDBSERVER_SESSIOND_WAIT` was not listed.

Solution
========

The duplicate `LTTNG_TEST_GDBSERVER_SESSIOND` has been removed and
`LTTNG_TEST_GDBSERVER_SESSIOND_WAIT` added.

The wording on the `*_GDBSERVER_WAIT` variables has been clarified.

The ordering of the new entries was modified to preserve alphabetical ordering.

Known drawbacks
===============

None.

Change-Id: I932a892a6aa78f37bb9a3a1c5ddaf8fadb552f56
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
8 days ago docs: Update coding standard blurb in contributions readme
Kienan Stewart [Tue, 29 Oct 2024 12:48:45 +0000 (08:48 -0400)] 
docs: Update coding standard blurb in contributions readme

Observed issue
==============

The CodingStyle document now includes style guidelines for Python,
Shell (bash), and legacy C code.

Solution
========

* Fix wording in the first sentence and use a relative link to the file
* Remove the paragraph stating that other languages do not have a coding style
* Add a link to the test README

Known drawbacks
===============

None.

Change-Id: I057ae8d51afaa07892aa40b47c2f1716574b3b06
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
8 days ago Clean-up: sessiond: document ownership of agent
Jérémie Galarneau [Tue, 26 Nov 2024 15:44:52 +0000 (15:44 +0000)] 
Clean-up: sessiond: document ownership of agent

`agent` instances are lazily created at various sites and their
ownership can be confusing for both analysis tools and meat bags.

Document that the ownership of the instance is transferred by the
`agent_add` function.

Change-Id: I709d2908611bdebd88c82261d1e5d5ee3bde3a09
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
8 days ago Fix: sessiond: cmd_add_ctx: leak of internal channel members
Jérémie Galarneau [Tue, 26 Nov 2024 15:27:10 +0000 (15:27 +0000)] 
Fix: sessiond: cmd_add_ctx: leak of internal channel members

lttng_channel instances must be released using channel_attr_destroy.
However, an error path of cmd_add_ctx uses free() directly, which causes
internal structures of lttng_channel to be leaked.

Wrap the lttng_channel instance to use a unique_ptr which invokes
channel_attr_destroy on release.

Change-Id: I77443c8a57475437dbb11792869e70840680492f
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
8 days ago Clean-up: sessiond: enable_event: use unique_ptr to manage memory
Jérémie Galarneau [Tue, 26 Nov 2024 15:07:53 +0000 (15:07 +0000)] 
Clean-up: sessiond: enable_event: use unique_ptr to manage memory

Simplify the error paths used to create the internal events associated
with agent events as copies of the filter bytecode and expressions are
performed.

Change-Id: I260bcabb8965e4c86cbc9e40fdd056a294ced676
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
8 days ago Fix: sessiond: leak of filter_expression, bytecode, and exclusion
Jérémie Galarneau [Tue, 26 Nov 2024 14:56:53 +0000 (14:56 +0000)] 
Fix: sessiond: leak of filter_expression, bytecode, and exclusion

The filter_expression, bytecode, and exclusion arguments are leaked
whenever a client omits the transmission of an event-rule as part of an
enable-event command (a protocol error).

Ensure the three arguments are automatically managed using smart
pointers before performing the protocol consistency check.

Change-Id: I4c4f1d5d7b6acdd215ef34e1fc8f2c4bc81a674a
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
8 days ago Fix: lttng-ctl: Validate lttng_enable_event_with_exclusions inputs at beginning of...
Mathieu Desnoyers [Tue, 26 Nov 2024 14:41:18 +0000 (14:41 +0000)] 
Fix: lttng-ctl: Validate lttng_enable_event_with_exclusions inputs at beginning of function

Checking for ev == nullptr after having dereferenced ev triggers this
Coverity warning:

    CID 1566411:  Null pointer dereferences  (REVERSE_INULL)
    Null-checking "ev" suggests that it may be null, but it has already been dereferenced on all paths leading to the check.

Move the handle and ev nullptr checks before the LTTNG_EVENT_ALL
special-case. This is relevant because within LTTNG_EVENT_ALL, after
modifying the event type, lttng_enable_event_with_exclusions is invoked
again with the modified type, which will end up doing the input
validation anyway.

Move the original_filter_expression before the LTTNG_EVENT_ALL event
type check for the same reason: it will end up being checked with the
modified event type in the nested call anyway, so favor early checking
of input arguments.

Change-Id: Id1c931300aa49477a9f480698f5dad645240b904
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
8 days ago Fix: sessiond: uninitialized ust_app event_notifier_group
Kienan Stewart [Thu, 21 Nov 2024 20:03:56 +0000 (15:03 -0500)] 
Fix: sessiond: uninitialized ust_app event_notifier_group

Observed issue
==============

On SLES12SP5 (gcc 4.8.5), occasional crashes were observed with the
following log entry:

```
DBG1 - 18:49:41.084261033 [Notification]: Poll wait returned (1) (in thread_notification() at notification-thread.cpp:632)
DBG1 - 18:49:41.084264635 [Notification]: Handling fd (28) activity (1) (in thread_notification() at notification-thread.cpp:656)
DBG1 - 18:49:41.084269451 [Notification]: Received `REMOVE_TRACER_EVENT_SOURCE` command (in handle_notification_thread_command() at notification-thread-events.cpp:3178)
lttng-sessiond: notification-thread-events.cpp:2228: int
handle_notification_thread_command_remove_tracer_event_source(notification_thread_state*,
int, lttng_error_code*): Assertion `source_element' failed
```

The easiest way to reproduce was to comment out almost all tests in
`regression/ust/ust-app-ctl-paths/test_ust_app_ctl_paths` except
`test_trace_self_default_paths`, then run a loop until a failure is
detected, e.g.

```
make -j4 && \
ITER=0 && \
while true ; do LTTNG_UST_DEBUG=1 LTTNG_TEST_LOG_DIR=- LTTNG_TEST_VERBOSE_SESSIOND=1 ./tests/regression/ust/ust-app-ctl-paths/test_ust_app_ctl_paths 2>&1 | tee sessiond.log ; if grep -qE '^not ok' sessiond.log ; then break; fi ; ITER=$((ITER+1)) ; echo "Done $ITER iterations" ; sleep 1 ; done
```

Cause
=====

After investigation, it appears that `lta = new ust_app;` may
occasionally return the same address multiple times when a previous
allocation is rapidly de-allocated.

For example, the following log snippet tracks an application address
and shows its re-use:

```
DBG1 - 19:47:29.103043500 [UST registration dispatch]: Create new ust app allocation 0x7fce14001da0 for pid 23256 sock 53 (in ust_app_create() at ust-app.cpp:4061)
DBG1 - 19:47:29.115347169 [UST registration dispatch]: wait_node_in_queue looked up app 0x7fce14001da0 pid 23256 (in thread_dispatch_ust_registration() at dispatch.cpp:362)
DBG1 - 19:47:29.115933991 [UST registration dispatch]: ust app version app 0x7fce14001da0 pid 23256 (in thread_dispatch_ust_registration() at dispatch.cpp:392)
DBG1 - 19:47:29.115954872 [UST registration dispatch]: Added ust app 0x7fce14001da0 pid 23256 (in thread_dispatch_ust_registration() at dispatch.cpp:403)
DBG1 - 19:47:29.115958125 [UST registration dispatch]: ust_app_setup_event_notifier_group app 0x7fce14001da0 pid 23256 (in ust_app_setup_event_notifier_group() at ust-app.cpp:4226)
DBG1 - 19:47:29.116367064 [UST registration dispatch]: app 0x7fce14001da0 add tracer event source in setup_event_notifier_group fd 60 ret 10 (in ust_app_setup_event_notifier_group() at ust-app.cpp:4279)
DBG1 - 19:47:29.117653161 [UST registration dispatch]: app 0x7fce14001da0 pid 23256 return from setup trace event notifier group ret 0, event notifier group 0x7fce14003180 (in ust_app_setup_event_notifier_group() at ust-app.cpp:4313)
DBG1 - 19:47:29.152058781 [23308/23332]: delete_ust_app 0x7fce14001da0 pid 23256 sock 53 (in delete_ust_app() at ust-app.cpp:1021)
DBG1 - 19:47:29.152076408 [23308/23332]: No event notifier group object remove tracer event source fd 60 app 0x7fce14001da0 (in delete_ust_app() at ust-app.cpp:1075)
DBG1 - 19:47:29.191427936 [UST registration dispatch]: Create new ust app allocation 0x7fce14001da0 for pid 23371 sock 71 (in ust_app_create() at ust-app.cpp:4061)
DBG1 - 19:47:29.204450310 [UST registration dispatch]: wait node 0x7fce14001d10, app 0x7fce14001da0 (in sanitize_wait_queue() at dispatch.cpp:144)
DBG1 - 19:47:29.204454373 [UST registration dispatch]: Culling app from wait queue: pid=23371 app 0x7fce14001da0 (in sanitize_wait_queue() at dispatch.cpp:146)
DBG1 - 19:47:29.259291727 [23308/23332]: delete_ust_app 0x7fce14001da0 pid 23371 sock 71 (in delete_ust_app() at ust-app.cpp:1021)
DBG1 - 19:47:29.590230014 [23308/23332]: No event notifier group object remove tracer event source fd 81 app 0x7fce14001da0 (in delete_ust_app() at ust-app.cpp:1075)
lttng-sessiond: notification-thread-events.cpp:2233: int handle_notification_thread_command_remove_tracer_event_source(notification_thread_state*, int, lttng_error_code*):
Assertion `source_element' failed.
```

As this variable is default initialized[1], the memory contents are
indeterminate. In `ust_app_create`, many fields are manually
initialized; however `lta->event_notifier_group.object` is not set
explicitly.

The next time `event_notifier_group.object` is set is when
`notification_thread_command_add_tracer_event_source` succeeds. If the
app dies before then, or if that registration fails,
`ust_app.event_notifier_group.object` may be non-NULL. When this
object is passed to `delete_ust_app`, the only check made before trying
to remove the tracer event
source (`notification_thread_command_remove_tracer_event_source`) is
whether `event_notifier_group.object` is NULL or not.

If the file descriptor hasn't been previously registered, the assert
is triggered.

While I haven't confirmed my hypothesis, it is possible that the
behaviour of GCC regarding default initialization has changed between
GCC 4.8.5 and now.

Solution
========

Use an explicit initialization for the new allocation.

Known drawbacks
===============

None.

References
==========

[1]: https://en.cppreference.com/w/cpp/language/default_initialization (See case 2)

Change-Id: I12de1864d2995be014d5526283fb843d0f99093e
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
8 days ago Docs: Correct `LTTNG_RUNDIR` environment variable name
Kienan Stewart [Fri, 1 Nov 2024 15:33:39 +0000 (11:33 -0400)] 
Docs: Correct `LTTNG_RUNDIR` environment variable name

Change-Id: I13c803f946bf25cc9c622e2d4d156ecc3af3457d
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
8 days ago Docs: Correct lttng-ust man page references
Kienan Stewart [Fri, 1 Nov 2024 15:17:18 +0000 (11:17 -0400)] 
Docs: Correct lttng-ust man page references

==== Observed issue ====

Deploying the lttng-www site fails with the following broken link:

```
urlname;parentname;base;result;warningstring;infostring;valid;url;line;column;name;dltime;size;checktime;cached;level;modified
/man/1/lttng-ust/v2.13;http://localhost:10000/man/8/lttng-sessiond/v2.13/;;404
Not
Found;;;False;http://localhost:10000/man/1/lttng-ust/v2.13;730;8;lttng-ust(1);-1;160;3.2490103244781494;0;3;
```

==== Cause ====

LTTng-UST does not ship a man page in section 1[1].

==== Solution ====

Correct the lttng-ust man page references in `lttng-sessiond.8.txt`.

==== Known drawbacks ====

None.

==== References ====

[1]: https://github.com/lttng/lttng-ust/tree/5cc8729236a95d784e9561abbcb93a0fce90890c/doc/man

Change-Id: If697be273a9c4db38a7723adc469f45818bad355
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 days ago Fix: Close per-process event notifier error accounting fds on registration
Mathieu Desnoyers [Mon, 4 Nov 2024 18:26:50 +0000 (13:26 -0500)] 
Fix: Close per-process event notifier error accounting fds on registration

On application registration, the event notifier error accounting file
descriptors are duplicated to send the error accounting counter objects
to the application.

Those are left open until the application unregisters.

There is one file descriptor per CPU, so on larger systems (228 CPUs
Intel or 192 CPUs AMD EPYC), this adds up to a lot of file descriptors
when the number of registered applications is large, which can result in
file descriptor exhaustion errors.

Moreover, the application unregistration is done from delete_ust_app(),
which is used from a call_rcu() worker thread, thus after an RCU grace
period delay. This means that a steady stream of short-lived
applications with a short enough lifetime could end up allocating more
file descriptors than can be closed.

Fix this by closing those file descriptors immediately after the objects
are sent to the application, similarly to what is done for the ring
buffer streams.
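
A rough worked example of the scale involved, assuming one error
accounting file descriptor per CPU for each registered application, as
described above. The figure of 100 registered applications is an
arbitrary illustration, not a measurement.

```python
def accounting_fds_held(cpu_count, registered_apps):
    """File descriptors held open when every registered application
    keeps one error accounting file descriptor per CPU."""
    return cpu_count * registered_apps


for cpus in (192, 228):
    # 100 registered applications is an assumed, illustrative figure.
    print(cpus, "CPUs:", accounting_fds_held(cpus, 100), "file descriptors")
```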

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ia1bbc3ff09a20f37d069ade7e267fb043ea1ac7f

9 days ago Maintain recording channel configuration objects in ltt_session
Jérémie Galarneau [Sat, 14 Sep 2024 09:36:09 +0000 (05:36 -0400)] 
Maintain recording channel configuration objects in ltt_session

Maintain the state of event rules in the ltt_session's channel
configurations. This is laying the groundwork to maintain triggers (and
ultimately, event-rules) associated with map channels.

liblttng-ctl now converts the lttng_event specifications into
event-rules which are used by the session daemon to maintain recording
channel configurations.

The objective is to share the synchronization code of
tracers (essentially, the description of enablers and channel
configuration) with the session configurations when adding support for
maps.

As of this commit, the event-rule-configurations are maintained
internally, but aren't used to synchronize the tracer configuration.

Change-Id: I6067ef31ca4b9cead55cf4c9dc2bac8dd8ca3400
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 days ago Fix: unhandled value in lttng_consumer_type formatter
Jérémie Galarneau [Mon, 25 Nov 2024 19:57:37 +0000 (19:57 +0000)] 
Fix: unhandled value in lttng_consumer_type formatter

Change-Id: I6780fafba72c326a8ea036d0500db6f15524886f
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 days ago Fix: path: handle truncation in utils_partial_realpath()
Jérémie Galarneau [Mon, 25 Nov 2024 19:53:57 +0000 (19:53 +0000)] 
Fix: path: handle truncation in utils_partial_realpath()

gcc warns that:
  warning: 'char* strncpy(char*, const char*, size_t)' specified bound 4096 equals destination size [-Wstringop-truncation]

Return an error when lttng_strncpy reports that a truncation occurred.

Change-Id: I03889d6d275566413df2848974e4d3ad83565b17
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
9 days ago Fix: path: handle truncation in utils_partial_realpath()
Jérémie Galarneau [Mon, 25 Nov 2024 19:49:05 +0000 (19:49 +0000)] 
Fix: path: handle truncation in utils_partial_realpath()

gcc warns that:
  warning: 'char* strncpy(char*, const char*, size_t)' specified bound 4096 equals destination size [-Wstringop-truncation]

Error-out when snprintf indicates that a truncation occurred.

Change-Id: I8c514da6b0ccb1a59d1555b02bfea2dd3c57febb
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
12 days ago Fix: sessiond: assertion on inconsistent filter bytecode and expression
Jérémie Galarneau [Tue, 1 Oct 2024 19:47:57 +0000 (15:47 -0400)] 
Fix: sessiond: assertion on inconsistent filter bytecode and expression

The session daemon correctly expects that a recording event-rule that
specifies a filter must have both a bytecode and an expression (or
neither of them).

However, it shouldn't assert as those elements are user-specified. The
handling is changed to return an "invalid parameter" error to the
client.
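
The "both or neither" rule can be sketched as below. This is a Python
illustration of the logic only, not the session daemon's C++ code; the
function name and the returned strings are hypothetical stand-ins for
the actual status codes.

```python
def check_filter_consistency(expression, bytecode):
    """Validate user-supplied filter parts without asserting.

    A filter is consistent when both the expression and the bytecode
    are present, or when both are absent (None).
    """
    if (expression is None) != (bytecode is None):
        # Inconsistent user input: report an error instead of aborting.
        return "invalid parameter"
    return "ok"


print(check_filter_consistency("a > 2", None))
print(check_filter_consistency(None, None))
```

Returning an error keeps a malformed client request from taking down
the daemon, which is what an assertion would do.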

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I3df6ce79bc6ce1fa9dadb184a7fd2b528dc2a597

12 days ago lttng: list: refer to domain recording rules as "event rules"
Jérémie Galarneau [Fri, 11 Oct 2024 15:42:17 +0000 (15:42 +0000)] 
lttng: list: refer to domain recording rules as "event rules"

Change-Id: I1aeb6ad091fc1a65c055ff9e5a616c615e258c6f
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
12 days ago uprobe: log binary path on failure to open
Jérémie Galarneau [Fri, 15 Nov 2024 16:22:00 +0000 (11:22 -0500)] 
uprobe: log binary path on failure to open

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Iacf81482ffe71272a4cc4eea306d6bf99d03bd24

12 days ago Fix: uprobe event-rule: logging mentions pattern instead of name
Jérémie Galarneau [Fri, 15 Nov 2024 14:38:09 +0000 (09:38 -0500)] 
Fix: uprobe event-rule: logging mentions pattern instead of name

A uprobe event rule names a new event; it does not match existing
events. Thus, it doesn't make sense to use the "pattern" terminology
used by some other event rule types.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I86ed75f48723c37480016645a48c0c2bd11cf37d

12 days ago Tests: use _run_lttng_cmd instead of directly calling the client
Jérémie Galarneau [Thu, 14 Nov 2024 21:37:25 +0000 (16:37 -0500)] 
Tests: use _run_lttng_cmd instead of directly calling the client

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I1ad6601fb8cdf3b9ca973f1476c12bba2ff867cd

12 days ago Tests: python logging: log enable-event command
Jérémie Galarneau [Thu, 14 Nov 2024 20:54:46 +0000 (15:54 -0500)] 
Tests: python logging: log enable-event command

Log the enable-event commands by using the _run_lttng_cmd util
to improve the debuggability of failed test cases.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: If250e014a7a22438ea3320fb90bf194dd7646a3d

4 weeks ago sessiond: allocate ltt_session using `new`
Jérémie Galarneau [Sat, 14 Sep 2024 14:06:38 +0000 (10:06 -0400)] 
sessiond: allocate ltt_session using `new`

In order to make a follow-up change which adds non-POD members to
ltt_session, allocate it using the `new` operator, which causes
ltt_session's constructor to run.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ifd1fa8dc3e7a8cd3862985e1d30c5f032d0d4a8b

4 weeks ago Tests: Fix: Use '.logfile' instead of '.log' for test app output
Kienan Stewart [Thu, 31 Oct 2024 12:33:04 +0000 (08:33 -0400)] 
Tests: Fix: Use '.logfile' instead of '.log' for test app output

Observed issue
==============

Frequent CI errors related to parsing TAP log files.

E.g.

```
17:04:55 Parsing TAP test result [/var/lib/jenkins/jobs/dev_review_lttng-tools_master_linuxbuild/configurations/axis-babeltrace_version/stable-2.0/axis-build/std/axis-conf/std/axis-liburcu_version/stable-0.14/axis-platform/deb12-amd64/builds/857/tap-master-files/tap/regression/ust/ust-app-ctl-paths/test_blocking.log.d/babeltrace.err.uMIm4i.log].
17:04:55 org.tap4j.parser.ParserException: Error parsing TAP Stream:
Missing TAP Plan.
```

Cause
=====

The TAP collector in the [CI][1] uses `**/*.*`[2] as the ANT pattern
for globbing test log files.

Consequently if, for example, `./test.log.d/session.XXXXX.log` exists,
the collector will try to parse its content as a TAP report and
potentially fail.

The Jenkins TAP plugin[3] does not expose the option of adding
excludes[4] to the globbing pattern.

Solution
========

Use `.logfile` as the suffix and modify the CI TAP collector
configuration to use `**/*.log`, as `.log` is the suffix for test logs
produced when running `make check`[5].
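
The effect of the two patterns can be approximated with the Python
sketch below. It is a simplified model of ANT-style globbing (only `*`
and a `**/` directory prefix are handled), not the matcher used by
Jenkins, and the file paths are shortened, hypothetical examples.

```python
import re


def ant_to_regex(pattern):
    """Translate a simplified ANT-style pattern to a compiled regex.

    "**/" matches zero or more directories and "*" matches any run of
    characters within a single path component.
    """
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")
            i += 3
        elif pattern[i] == "*":
            out.append("[^/]*")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out) + r"\Z")


def matches(pattern, path):
    return ant_to_regex(pattern).match(path) is not None


# Hypothetical paths: a TAP log from `make check`, and test app output
# stored under the old (.log) and new (.logfile) suffixes.
tap_log = "tests/regression/test_blocking.log"
old_app_output = "tests/regression/test_blocking.log.d/babeltrace.err.log"
new_app_output = "tests/regression/test_blocking.log.d/babeltrace.err.logfile"

for pattern in ("**/*.*", "**/*.log"):
    for path in (tap_log, old_app_output, new_app_output):
        print(pattern, path, matches(pattern, path))
```

Under this model, only the combination of the `.logfile` suffix and the
`**/*.log` pattern keeps test app output out of the TAP collector.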

Known drawbacks
===============

None.

References
==========

[1]: https://ci.lttng.org
[2]: https://github.com/lttng/lttng-ci/blob/74c5e73d5d1961fcd8c530323b180c688f624e5c/jobs/lttng-tools.yaml#L430
[3]: https://github.com/jenkinsci/tap-plugin/
[4]: https://github.com/jenkinsci/tap-plugin/blob/8f9fb2d2b0ac9628380c6a56e3c7c97caf29d9ca/src/main/java/org/tap4j/plugin/TapPublisher.java#L472
[5]: https://www.gnu.org/software/automake/manual/html_node/Parallel-Test-Harness.html

Change-Id: I76b7203ce02645e222c7319ac576a9a1272cf985
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
4 weeks ago sessiond/lttng-ctl: Introduce LTTNG_RUNDIR
Kienan Stewart [Thu, 18 Apr 2024 20:35:40 +0000 (16:35 -0400)] 
sessiond/lttng-ctl: Introduce LTTNG_RUNDIR

Observed issue
==============

Starting multiple instances of lttng-sessiond as the root user isn't
possible, even with setting different values for the `LTTNG_HOME`
environment variable.

Cause
=====

When starting `lttng-sessiond`, the `apps_unix_sock_path`,
`client_unix_sock_path`, and `health_unix_sock_path` are set to
configure-time static defines under `CONFIG_LTTNG_SYSTEM_RUNDIR`,
e.g. `/var/lib/lttng`.

Solution
========

A new environment variable `LTTNG_RUNDIR` is introduced to control the
base directory used by applications to find communication sockets.

The default behaviour is to emulate the existing divide: if
`LTTNG_RUNDIR` is not set the root `lttng-sessiond` will continue to
use `CONFIG_LTTNG_SYSTEM_RUNDIR`, and a non-root user `lttng-sessiond`
will use `LTTNG_HOME/.lttng`.

When `LTTNG_RUNDIR` is set, it takes priority over `LTTNG_HOME` for
determining the base directory. The output directory for traces is
not affected by `LTTNG_RUNDIR`, it continues to respect `LTTNG_HOME`.
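
The precedence described above can be sketched as follows. This is a
Python illustration rather than the actual implementation: the function
name is hypothetical, `system_rundir` stands in for
`CONFIG_LTTNG_SYSTEM_RUNDIR`, and the fallback from `LTTNG_HOME` to
`HOME` is an assumption of this sketch.

```python
import posixpath


def resolve_rundir(env, is_root, system_rundir):
    """Pick the base directory for the communication sockets.

    `env` is a mapping of environment variables, such as os.environ.
    """
    rundir = env.get("LTTNG_RUNDIR")
    if rundir:
        # LTTNG_RUNDIR takes priority over everything else.
        return rundir
    if is_root:
        return system_rundir
    # Assumption: fall back to HOME when LTTNG_HOME is not set.
    home = env.get("LTTNG_HOME") or env["HOME"]
    return posixpath.join(home, ".lttng")


print(resolve_rundir({"LTTNG_RUNDIR": "/tmp/a"}, True, "/var/lib/lttng"))
print(resolve_rundir({"LTTNG_HOME": "/tmp/h"}, False, "/var/lib/lttng"))
```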

Example with starting multiple root `lttng-sessiond`s:

```
DIR_A=$(mktemp -d)
DIR_B=$(mktemp -d)

LTTNG_RUNDIR="${DIR_A}" lttng-sessiond -b
LTTNG_RUNDIR="${DIR_B}" lttng-sessiond -b

LTTNG_RUNDIR="${DIR_A}" lttng list

LTTNG_RUNDIR="${DIR_B}" lttng list
```

In the example above, as `LTTNG_HOME` is not set, the default output
directory for traces of both `lttng-sessiond` instances will
be `$HOME/lttng-traces`.

The following will also work:

```
DIR_A=$(mktemp -d)
LTTNG_RUNDIR="${DIR_A}" lttng create
LTTNG_RUNDIR="${DIR_A}" lttng enable-event -u --all
LTTNG_RUNDIR="${DIR_A}" lttng start

LTTNG_UST_APP_PATH="${DIR_A}" test-application
```

The `LTTNG_UST_CTL_PATH` can be set to a location other than the
rundir as follows:

```
DIR_A=$(mktemp -d)
DIR_B=$(mktemp -d)
LTTNG_UST_CTL_PATH="${DIR_B}" LTTNG_RUNDIR="${DIR_A}" lttng create
LTTNG_RUNDIR="${DIR_A}" lttng enable-event -u --all
LTTNG_RUNDIR="${DIR_A}" lttng start

LTTNG_UST_APP_PATH="${DIR_B}" test-application

LTTNG_UST_APP_PATH="${DIR_A}" test-application
```

Known drawbacks
===============

None.

Change-Id: I371c2c72644277b7dcafaf970ee7b75d9bfbaedc
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
4 weeks ago tests: Add test to cover daemon socket paths
Kienan Stewart [Fri, 26 Apr 2024 21:04:17 +0000 (17:04 -0400)] 
tests: Add test to cover daemon socket paths

Drawbacks
=========

The shared memory test uses `/dev/shm` which is only available on Linux
systems. The test is skipped otherwise.

Change-Id: I2e60c5b3191c638d6e36ceafa88e01473891c250
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
4 weeks ago docs: Add man page entry for LTTNG_UST_APP_PATH and LTTNG_UST_CTL_PATH
Kienan Stewart [Fri, 27 Oct 2023 18:58:03 +0000 (14:58 -0400)] 
docs: Add man page entry for LTTNG_UST_APP_PATH and LTTNG_UST_CTL_PATH

Add manual entries for the LTTNG_UST_APP_PATH and LTTNG_UST_CTL_PATH
environment variables in lttng-sessiond(8).

Change-Id: I9c7fd672d006b0e8afad7a8dfacbb3dd41e28f8f
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
4 weeks ago tests: Add tests for LTTNG_UST_APP_PATH and LTTNG_UST_CTL_PATH
Kienan Stewart [Thu, 26 Oct 2023 20:08:27 +0000 (16:08 -0400)] 
tests: Add tests for LTTNG_UST_APP_PATH and LTTNG_UST_CTL_PATH

test_blocking_mode: Verifies that the sessiond starts (or doesn't start)
appropriately depending on the combination of path settings in
conjunction with `LTTNG_UST_ALLOW_BLOCKING` and `--blocking-timeout`.

test_path_separators: Verifies the behaviour of the
sessiond and applications with multiple `LTTNG_UST_APP_PATH`s and
multiple `LTTNG_UST_CTL_PATH`s, including the verification of Java JUL
and Python agents.

test_ust_app_ctl_paths: Verifies the sessiond and traced applications
with different combinations of `LTTNG_UST_APP_PATH` and
`LTTNG_UST_CTL_PATH` settings.

Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I9f25cdc20fd482cdcc71294aadff3a2a6b34c01e

4 weeks ago sessiond: exit early if it's possible to self-trace with blocking mode set
Kienan Stewart [Thu, 23 Nov 2023 19:16:21 +0000 (14:16 -0500)] 
sessiond: exit early if it's possible to self-trace with blocking mode set

Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I13e76160135e1cd9f0cb7d5ebe16e9662c1aee56

5 weeks ago sessiond: Add initial support for multiple LTTNG_UST_CTL_PATHs
Kienan Stewart [Mon, 27 Nov 2023 16:32:42 +0000 (11:32 -0500)] 
sessiond: Add initial support for multiple LTTNG_UST_CTL_PATHs

LTTNG_UST_CTL_PATH may contain multiple paths separated with ':'. There
is no provision for escaping the ':' path separator (similar to
`$PATH`). Subsequent paths after the first are ignored.
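
A minimal Python sketch of the behaviour described above, for
illustration only; the function name is hypothetical and the actual
parsing is done in the session daemon's C++ code.

```python
def split_ctl_paths(value):
    """Split a LTTNG_UST_CTL_PATH value on ':'.

    Returns the path that is used and the list of paths that are
    ignored. No escaping of ':' is supported.
    """
    paths = value.split(":")
    return paths[0], paths[1:]


used, ignored = split_ctl_paths("/tmp/ctl-a:/tmp/ctl-b")
print("used:", used, "ignored:", ignored)
```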

Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I80f7d63e4be164f7afe3fe478794bd55c71c31fb

5 weeks ago Introduce LTTNG_UST_CTL_PATH environment variable
Mathieu Desnoyers [Fri, 20 Oct 2023 20:39:40 +0000 (16:39 -0400)] 
Introduce LTTNG_UST_CTL_PATH environment variable

Example use to trace lttng-sessiond (tracee) with another lttng-sessiond
(tracer):

```
mkdir /tmp/tracer-ust
mkdir /tmp/tracee-ust
mkdir /tmp/tracer-home
mkdir /tmp/tracee-home
LTTNG_HOME=/tmp/tracer-home LTTNG_UST_CTL_PATH=/tmp/tracer-ust lttng-sessiond --daemon
LTTNG_HOME=/tmp/tracee-home LTTNG_UST_CTL_PATH=/tmp/tracee-ust \
LTTNG_UST_APP_PATH=/tmp/tracer-ust \
LD_PRELOAD=/usr/local/lib/liblttng-ust-fork.so:/usr/local/lib/liblttng-ust-fd.so \
lttng-sessiond --daemon
```

* Control the tracer sessiond by setting LTTNG_HOME=/tmp/tracer-home
* Control the tracee sessiond by setting LTTNG_HOME=/tmp/tracee-home
* Run an application traced by the tracee sessiond

```
LTTNG_UST_APP_PATH=/tmp/tracee-ust myapp
```

* Run an application traced by the tracer sessiond

```
LTTNG_UST_APP_PATH=/tmp/tracer-ust myapp
```

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ifcd05c366004b5fc0a4a362ad3556d7e6d8658b1

5 weeks ago Clean-up: unify comment style
Jérémie Galarneau [Thu, 24 Oct 2024 21:01:44 +0000 (17:01 -0400)] 
Clean-up: unify comment style

Some comment blocks use /** in lieu of /* on the first line. Since the
latter style is prevalent in the code base, replace all instances to
attain One True Style.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ica1aee472bdbeaaa40ee32f67274e3c43f7f9850

5 weeks ago Bump required clang-format version to 16
Simon Marchi [Mon, 16 Sep 2024 09:00:35 +0000 (05:00 -0400)] 
Bump required clang-format version to 16

Bump the required clang-format version from 14 to 16. For reference,
this is the version the Babeltrace project uses [1], and it is checked
by the LTTng CI [2], so I think it makes sense to go to that version.
More recent versions might not be readily available on the CI machines
(mostly running Debian 12 at the moment).

FWIW, I think that the code changes generated by this version change are
improvements.

[1] https://github.com/efficios/babeltrace/blob/d16ccfd174984c3d18f4f4427e4e438b1a64730e/tools/format-cpp.sh#L7
[2] https://ci.lttng.org/view/Babeltrace/job/babeltrace_master_lint/

Change-Id: I188f04928357ac7562b1a2dce3238a622189b04a
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 weeks ago Fix: relayd: leaked socket for live connections
Kienan Stewart [Thu, 17 Oct 2024 14:40:48 +0000 (10:40 -0400)] 
Fix: relayd: leaked socket for live connections

Observed issue
==============

While exploring an issue where Python programs using the `bt2` module
would crash on shutdown, it was noticed that the same case also
highlighted a leaked file descriptor.

The following Python program may be used to demonstrate the leak.

```

import os
import socket
import time

import bt2

def is_connected():
    ctf_live_cc = bt2.find_plugin("ctf").source_component_classes["lttng-live"]
    q = bt2.QueryExecutor(ctf_live_cc, "sessions", params={"url": "net://localhost"})
    connected = False
    try:
        for x in q.query():
            print(x)
            if x['session-name'] == 'test' and x['client-count'] >= 1:
                connected = True
                break
    except Exception as e:
        print(e)
    return connected

os.system("lttng create test --live")
os.system("lttng enable-event -u --all")
os.system("lttng start")

ctf_live_cc = bt2.find_plugin("ctf").source_component_classes["lttng-live"]
iterator = bt2.TraceCollectionMessageIterator(bt2.ComponentSpec(ctf_live_cc, {'inputs': ["net://localhost/host/{}/test".format(socket.gethostname())], 'session-not-found-action': 'end'}))

data = []
while not is_connected():
    try:
        data.append(next(iterator))
    except Exception as e:
        print(e)
    time.sleep(0.1)

os.system("lttng stop")
os.system("lttng destroy")
os.system("killall lttng-sessiond")
os.system("killall lttng-relayd")
```

Cause
=====

During the clean-up at the end of the live worker thread, all remaining
viewer connections have their references put (which should cause the
connection to be released).

When releasing connections, `close` is never called on the socket's file
descriptor. Furthermore, the release doesn't modify `the_fd_tracker`.

Solution
========

Explicitly close and stop tracking the sockets for viewer connections
just prior to the final put.
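
The effect of closing before the final put can be illustrated with a
small Python model. This is only a sketch under stated assumptions:
`FdTracker` and `ViewerConnection` are hypothetical stand-ins for
`the_fd_tracker` and the relayd viewer connection objects, and do not
reproduce the actual relayd code.

```python
import os


class FdTracker:
    """Hypothetical stand-in for relayd's `the_fd_tracker`."""

    def __init__(self):
        self.tracked = set()

    def track(self, fd):
        self.tracked.add(fd)

    def untrack(self, fd):
        self.tracked.discard(fd)


class ViewerConnection:
    """Hypothetical reference-counted viewer connection."""

    def __init__(self, fd, tracker):
        self.fd = fd
        self.tracker = tracker
        self.refcount = 1
        tracker.track(fd)

    def close_socket(self):
        # The step added by the fix: close the descriptor and stop tracking it.
        if self.fd is not None:
            os.close(self.fd)
            self.tracker.untrack(self.fd)
            self.fd = None

    def put(self):
        # Dropping the reference never touches the descriptor, which
        # mirrors the behaviour described under "Cause".
        self.refcount -= 1


def cleanup(connections, close_before_put):
    for conn in connections:
        if close_before_put:
            conn.close_socket()
        conn.put()


def demo(close_before_put):
    """Return the number of descriptors still tracked after clean-up."""
    tracker = FdTracker()
    read_fd, write_fd = os.pipe()
    os.close(write_fd)
    conn = ViewerConnection(read_fd, tracker)
    cleanup([conn], close_before_put)
    leaked = len(tracker.tracked)
    # Make sure the demo itself does not leak the descriptor.
    conn.close_socket()
    return leaked
```

In this model, `demo(False)` leaves one descriptor tracked while
`demo(True)` leaves none.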

Known drawbacks
===============

None.

Change-Id: I5eb0a3e5b9cb14dc1199be463bc312cbc72d8244
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 weeks agoTests: Document common environment variables used to control tests
Kienan Stewart [Tue, 15 Oct 2024 19:04:50 +0000 (19:04 +0000)] 
Tests: Document common environment variables used to control tests

Change-Id: I4837d506de0a366d847fd529e0c747b48701ccf1
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 weeks agoTests: Convert tests readme to adoc format
Kienan Stewart [Tue, 15 Oct 2024 18:19:54 +0000 (18:19 +0000)] 
Tests: Convert tests readme to adoc format

New documentation should be in adoc, and it's not a big hassle to
change this one.

Change-Id: I43b82749db0795b488d936925683aaa3a6a4f95b
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 weeks agoTests: Add test to cover live viewer hangs with an early inactive app
Kienan Stewart [Fri, 15 Dec 2023 14:34:28 +0000 (09:34 -0500)] 
Tests: Add test to cover live viewer hangs with an early inactive app

Refs: https://bugs.lttng.org/issues/1406

Change-Id: Iefe796d39ff39a04055acc5f95fc9440d5ee244a
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 weeks agoTests: Add environment variables in tests to attach gdbserver
Kienan Stewart [Fri, 11 Oct 2024 15:40:34 +0000 (15:40 +0000)] 
Tests: Add environment variables in tests to attach gdbserver

When debugging tests, the current infrastructure in both the Python and
bash test harnesses requires that the user start the sessiond or relayd
beforehand, and run the tests with environment variables to stop the
spawning of the respective programs.

To facilitate the process, new environment variables are added to allow
gdbserver to be spawned and attach to the relayd or sessiond. The user
may then connect with gdb, for example: `gdb -ex "target remote
localhost:1001"`.

Change-Id: Id4d1b446c7d6682c011ef27682198fb4a503f5f4
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 weeks agoTests: Add wait_until_disconnected() to python test LiveViewer
Kienan Stewart [Thu, 10 Oct 2024 19:36:17 +0000 (19:36 +0000)] 
Tests: Add wait_until_disconnected() to python test LiveViewer

Change-Id: I70cf5e46a08c658b3fecbcea9317e1b2d9f769fb
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 weeks agoTests: Add _LiveViewer.is_connected()
Kienan Stewart [Thu, 10 Oct 2024 18:42:25 +0000 (18:42 +0000)] 
Tests: Add _LiveViewer.is_connected()

Change-Id: I3adf51ceea08d62e3d23598b57460427cf5561ea
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 weeks agoTests: Stop draining LiveViewer when iterator is inactive
Kienan Stewart [Thu, 10 Oct 2024 18:41:45 +0000 (18:41 +0000)] 
Tests: Stop draining LiveViewer when iterator is inactive

Change-Id: Ic67cea7c3fa57ed5c48bbab7d21d206a827852ad
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
6 weeks agodoc: Specify that string memory must be freed
Erica Bugden [Thu, 4 Jul 2024 19:26:16 +0000 (15:26 -0400)] 
doc: Specify that string memory must be freed

Specify that the caller of `get_session_name()` must explicitly free the
memory associated with the returned string.

Historically this software was written in C. C programmers have the
reflex to assume they are responsible for freeing memory. However, now
that this project is mixed C/C++ and is transitioning towards C++ the
assumption that developers will automatically know to free memory
(according to C programming conventions) does not hold as well. For this
reason, clarify that memory must be explicitly freed by the function
caller.

There are other C functions in this software that return variables that
must explicitly be freed by the caller. However, these changes have not
been applied consistently to all these cases. The assumption is that
this individual clarification still reduces confusion even if not
applied consistently.

Change-Id: I72d3568b5c1a7d8b7ff9b3d241bca1c5c55e47c1
Signed-off-by: Erica Bugden <ebugden@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
6 weeks agoFix: relayd: viewer_stream leak causes assertion failure on exit
Jérémie Galarneau [Fri, 11 Oct 2024 20:26:59 +0000 (16:26 -0400)] 
Fix: relayd: viewer_stream leak causes assertion failure on exit

Observed issue
==============

Running the test proposed in change #11584[1], the relay daemon aborts
when destroying the viewer_streams_ht as it is not empty.

Cause
=====

A viewer stream reference is leaked when sending streams to the live
client causing them to remain published in the viewer_streams_ht beyond
the lifetime of the viewer_connection.

The send_viewer_streams() function operates in two phases. First, it
iterates over the viewer_streams_ht to find streams that belong to the
target session and have not been sent yet.

In the second phase, it iterates over the session's unannounced stream
list. The commit message of 98b82dfa2 gives more background on the role
of the unannounced stream list.

When a viewer stream is created, two references are acquired:
  - one belongs to the global viewer_streams_ht,
  - the other belongs to the unannounced stream list.

When the viewer stream is eventually sent to the client, it is removed
from the unannounced stream list and that reference must be dropped.

Unfortunately, the reference is not dropped during the first phase.

Solution
========

Put the reference of the viewer streams that are sent during the first
phase of send_viewer_streams().
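
The reference accounting can be modelled with a short Python sketch.
It is a simplified illustration, not the relayd implementation:
`ViewerStream`, `create_viewer_stream()` and `remaining_refs()` are
hypothetical names, and the hash table and unannounced list are plain
Python containers.

```python
class ViewerStream:
    """Hypothetical model of a reference-counted viewer stream."""

    def __init__(self, name):
        self.name = name
        self.refcount = 0
        self.sent = False

    def get(self):
        self.refcount += 1

    def put(self):
        assert self.refcount > 0
        self.refcount -= 1


def create_viewer_stream(name, streams_ht, unannounced):
    stream = ViewerStream(name)
    stream.get()  # Reference owned by the global hash table.
    streams_ht[name] = stream
    stream.get()  # Reference owned by the unannounced stream list.
    unannounced.append(stream)
    return stream


def send_viewer_streams(streams_ht, unannounced, put_in_first_phase):
    # Phase 1: streams found through the global hash table.
    for stream in streams_ht.values():
        if stream.sent:
            continue
        stream.sent = True
        if stream in unannounced:
            unannounced.remove(stream)
            if put_in_first_phase:
                stream.put()  # The reference that the fix drops.
    # Phase 2: whatever is left on the unannounced list.
    while unannounced:
        stream = unannounced.pop()
        stream.sent = True
        stream.put()


def remaining_refs(put_in_first_phase):
    """Return the reference count left after the connection goes away."""
    streams_ht, unannounced = {}, []
    stream = create_viewer_stream("chan_0", streams_ht, unannounced)
    send_viewer_streams(streams_ht, unannounced, put_in_first_phase)
    # Connection teardown: the hash table drops its own reference.
    del streams_ht[stream.name]
    stream.put()
    return stream.refcount
```

Without the put in the first phase, `remaining_refs(False)` leaves one
reference behind; with it, `remaining_refs(True)` reaches zero.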

Known drawbacks
===============

None.

[1] https://review.lttng.org/c/lttng-tools/+/11584

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I7d5d5d3a77f6f08744b712d9ef1233a1d19a7124

2 months agoUse compiler-agnostic defines to silence warning
Jérémie Galarneau [Mon, 9 Sep 2024 15:53:28 +0000 (11:53 -0400)] 
Use compiler-agnostic defines to silence warning

g++ emits warnings that it can't recognize the clang-specific diagnostic
pragmas. They are replaced by the internal compiler-specific macros so
that nothing is emitted when g++ is used.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I543987a861d2322aa2ef3b7d631f280d2ac999bf

2 months agoDisable clang warning for injected class name ambiguity in non_copyable_reference
Jérémie Galarneau [Fri, 6 Sep 2024 21:18:21 +0000 (21:18 +0000)] 
Disable clang warning for injected class name ambiguity in non_copyable_reference

clang raises a warning (-Winjected-class-name) due to ambiguity between
a constructor name and a type within the non_copyable_reference code.

Since clang could not infer the correct type context, this commit uses
`#pragma clang diagnostic` to disable the specific warning in the
affected area of the code.

The `push` and `pop` pragmas ensure that the warning is disabled only
where needed, preventing it from affecting other parts of the codebase,
and allowing us to maintain clean and clear code without unnecessary
compiler warnings.

A static_assert enforces that CustomDeleter::deleter is indeed a type,
although interpreting it as a constructor would be nonsensical here.

Change-Id: Ic0aac06d7af4272438f6f3d0275f29dc57a32194
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
2 months agolttng: remove use of variable length array
Jérémie Galarneau [Fri, 6 Sep 2024 21:39:23 +0000 (21:39 +0000)] 
lttng: remove use of variable length array

Use fmtlib to format the session attribute string when saving
the current session to .lttngrc. This eliminates a warning
emitted by clang (VLAs are not standard in C++).

Change-Id: Icdb8c1cc47adcbdfd82eefa8d2f1bf37a042a028
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
2 months agoTests: Add and set new log-file-d for the tap driver
Kienan Stewart [Mon, 12 Aug 2024 19:59:15 +0000 (15:59 -0400)] 
Tests: Add and set new log-file-d for the tap driver

This adds a new option `--log-file-d` to the tap-driver, which will
create a `*.log.d` folder for each test when running `make check` and
set `LTTNG_TEST_LOG_DIR` accordingly.

Doing so allows the tests to be run in verbose and create logs in a
predictable location. These log folders are removed when running `make
clean`.

Change-Id: Ibcf7e2cb54098a3e9ccd828ca76df6efcf33431d
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
2 months agoTests: Use _run_babeltrace_cmd where possible
Kienan Stewart [Wed, 31 Jul 2024 18:54:14 +0000 (14:54 -0400)] 
Tests: Use _run_babeltrace_cmd where possible

Change-Id: I5b279201a1806e8975de39fa85e666da3d0c5204
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
2 months agoTests: Add environment variables for verbosity and log directory
Kienan Stewart [Tue, 30 Jul 2024 20:38:13 +0000 (16:38 -0400)] 
Tests: Add environment variables for verbosity and log directory

Observed issue
==============

When working locally with test failures, changes to `utils.sh` are often
required to produce verbose output.

Solution
========

Add environment variables that allow running the various tools with
higher verbosity and, optionally, sending their output to either stderr
or temporary files in a given directory. Test runners now have the
option to quickly get more information.

Known drawbacks
===============

Some tests depend on parsing either stderr or stdout, and these global
defaults may potentially make developing robust tests more
complicated.

Change-Id: I4128c421cdf9ce12827adc017dba5a298b62b6de
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
2 months agoTests: Clean-up: Remove duplicate babeltrace calls
Kienan Stewart [Wed, 31 Jul 2024 18:55:07 +0000 (14:55 -0400)] 
Tests: Clean-up: Remove duplicate babeltrace calls

Change-Id: Iec2471dcc97c8a2d73df6afe1b93db95b6316c4f
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
2 months agoClean-up: fix string truncation warning
Jérémie Galarneau [Thu, 5 Sep 2024 21:29:53 +0000 (21:29 +0000)] 
Clean-up: fix string truncation warning

Some versions of g++ emit the following warning:
  'char* strncpy(char*, const char*, size_t)' output may be truncated
  copying 255 bytes from a string of length 255 [-Wstringop-truncation]

Using the internal strncpy wrapper, which checks for truncation,
fixes the problem.

Change-Id: I8ab0f2cca0247eee329b137f547d3dfed32c995f
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
2 months agoFix: unload all kernel modules on sessiond exit
Michael Jeanson [Tue, 27 Aug 2024 18:19:03 +0000 (14:19 -0400)] 
Fix: unload all kernel modules on sessiond exit

Stopping a root lttng-sessiond that has loaded kernel modules currently
leaves some modules loaded. Add them in the correct order so that all
of them can be unloaded.

Change-Id: I71f25c798f8c42737d295f32a5e3708287168bc6
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
2 months agoclang-format: run clang-format on the tree
Jérémie Galarneau [Thu, 5 Sep 2024 20:23:39 +0000 (16:23 -0400)] 
clang-format: run clang-format on the tree

It appears re-running clang-format produces additional changes after
7c8d0f41c. This state now seems stable for the moment.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: If46d7cab85fa6a5df41453b3ea3aa0e80782768d

2 months agoTests: test_session: reduce session count in unit test
Jérémie Galarneau [Thu, 5 Sep 2024 19:28:10 +0000 (19:28 +0000)] 
Tests: test_session: reduce session count in unit test

The destruction of the sessions takes 45 seconds to complete and
I don't see what testing 10 000 iterations tests that is not
achieved by 1 000 iterations.

That step now completes in 32ms on my development machine 🌪

Change-Id: I84e807750678144045974f27d084dd45f0f9713b
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
2 months agoTests: test_session: fix conditionally-supported offsetof warning
Jérémie Galarneau [Thu, 5 Sep 2024 19:25:34 +0000 (19:25 +0000)] 
Tests: test_session: fix conditionally-supported offsetof warning

g++ produces the following warning when building test_session on the CI:

  /home/jenkins/workspace/dev_review_lttng-tools_master_linuxbuild/babeltrace_version/stable-2.0/build/std/conf/std/liburcu_version/stable-0.14/platform/deb12-amd64/deps/build/include/urcu/list.h:160:49: warning: ‘offsetof’ within non-standard-layout type ‘ltt_session’ is conditionally-supported [-Winvalid-offsetof]
     160 |         for (pos = cds_list_entry((head)->next, __typeof__(*(pos)), member); \
   /home/jenkins/workspace/dev_review_lttng-tools_master_linuxbuild/babeltrace_version/stable-2.0/build/std/conf/std/liburcu_version/stable-0.14/platform/deb12-amd64/deps/build/include/urcu/list.h:126:49: note: in expansion of macro ‘caa_container_of’
     126 | #define cds_list_entry(ptr, type, member)       caa_container_of(ptr, type, member)
         |                                                 ^~~~~~~~~~~~~~~~

Replace the problematic liburcu macros with the C++-ified internal
replacement wrappers.

Change-Id: I1396e9f6e1d99710150e03cabc33d07df3d8558a
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
2 months agovendor/argpar: sync with upstream repository
Simon Marchi [Mon, 18 Mar 2024 18:42:10 +0000 (14:42 -0400)] 
vendor/argpar: sync with upstream repository

Sync with argpar, commit 88d1c8ae5c47 ("tap: import some changes").

Change-Id: I2338119ffa55391f836ed2bacc09360fc3f47217
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
2 months agoUpdate fmtlib to 11.0.2 and use the built library
Jérémie Galarneau [Fri, 30 Aug 2024 22:24:15 +0000 (18:24 -0400)] 
Update fmtlib to 11.0.2 and use the built library

Change-Id: I51c80c7d1829604a035e84dac7d87a48bb6fe05b
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
2 months agoClean-up: sessiond: logging typo hast -> hash
Jérémie Galarneau [Thu, 5 Sep 2024 16:04:12 +0000 (12:04 -0400)] 
Clean-up: sessiond: logging typo hast -> hash

The comment refers to hash tables, not The Hate Tables 🤘

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Idac6e7e5d9b4bebca9db9e3187ec33a4dd335ddf

2 months agoclang-format: run clang-format on the tree
Jérémie Galarneau [Thu, 5 Sep 2024 15:26:33 +0000 (11:26 -0400)] 
clang-format: run clang-format on the tree

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ifa81e86645655c1e4c1ee81375dad3ba45e850d1

2 months agoMove argpar to vendor directory
Simon Marchi [Mon, 18 Mar 2024 18:21:45 +0000 (14:21 -0400)] 
Move argpar to vendor directory

Since this is source copied as-is from another project, I think it
belongs to the vendor directory.  This will make it so it will be
skipped by format-cpp, for instance.

Change-Id: I78892f80c4cbb3a2e863567b0021e895c6489402
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
2 months agoconfigure.ac: enable automake's subdir-objects option
Simon Marchi [Mon, 18 Mar 2024 18:39:26 +0000 (14:39 -0400)] 
configure.ac: enable automake's subdir-objects option

From [1]:

    subdir-objects

        If this option is specified, then objects are placed into the
        subdirectory of the build directory corresponding to the
        subdirectory of the source file. For instance, if the source file is
        subdir/file.cxx, then the output file would be subdir/file.o. See
        Program and Library Variables.

This will allow reducing the number of Makefiles, by placing rules in
Makefiles in parent directories, instead of having Makefiles in every
single directory with something that needs to be built.

[1] https://www.gnu.org/software/automake/manual/html_node/List-of-Automake-options.html

Change-Id: Ic0c6ee5d809dee6b9e560239abd15408b28d9695
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
3 months agoTests: Remove unused helper `default_pipe_size_getter`
root [Wed, 28 Aug 2024 19:24:53 +0000 (19:24 +0000)] 
Tests: Remove unused helper `default_pipe_size_getter`

Change-Id: If72cb1456edbf26c340fcffdf93c8fb874ab030e
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
3 months agoTests: Make notifier discard count test more robust
root [Wed, 28 Aug 2024 18:52:09 +0000 (18:52 +0000)] 
Tests: Make notifier discard count test more robust

Observed issue
==============

In the CI, this test would intermittently fail. During failures,
the calculated pipe size from the `default_pipe_size_getter`
application was 8192, while in other cases it was 65536.

```
ERROR: tools/notification/test_notification_notifier_discarded_count
====================================================================

1..41
ok 1 - Add trigger my_trigger
PASS: tools/notification/test_notification_notifier_discarded_count 1 - Add trigger my_trigger
  ---
    duration_ms: 1323.966137
  ...
ok 2 - No discarded tracer notification
PASS: tools/notification/test_notification_notifier_discarded_count 2 - No discarded tracer notification
  ---
    duration_ms: 22.021590
  ...
ok 3 - Generating 390 tracer notifications
PASS: tools/notification/test_notification_notifier_discarded_count 3 - Generating 390 tracer notifications
  ---
    duration_ms: 154.790871
  ...
not ok 4 - Discarded tracer notification number non-zero (0) as expected
FAIL: tools/notification/test_notification_notifier_discarded_count 4 - Discarded tracer notification number non-zero (0) as expected
  ---
    duration_ms: 24.323759
  ...
```

Cause
=====

The initial size of pipes in Linux may have different values:

1) `16 * PAGE_SIZE` (as documented in `man 7 pipe`) (since Linux 2.6.11)
2) When a user has many pipes open and is above a soft limit:

  * `2 * PAGE_SIZE` (undocumented, see [1]), as of Linux 5.14 [2]
  * `1 * PAGE_SIZE` since Linux 2.6.35 [3]

Because the program `default_pipe_size_getter` opened a pipe to check
its size, it could return the lower value whenever the user had enough
pipe buffers open to exceed the soft limit, even though the sessiond
started earlier may have opened its pipe with the larger default pipe
size.

Solution
========

Use the maximum pipe size (on Linux, from
`/proc/sys/fs/pipe-max-size`) for the estimated pipe size rather than
opening a pipe and checking its size.
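
The two approaches can be compared with a minimal Python sketch. It is
an illustration only and not the code used by the test suite:
`probe_pipe_size()` and `max_pipe_size()` are hypothetical helper
names, and the sketch assumes a Linux system.

```python
import fcntl
import os

# F_GETPIPE_SZ is Linux-specific; fall back to its numeric value when the
# running Python version does not expose the constant.
F_GETPIPE_SZ = getattr(fcntl, "F_GETPIPE_SZ", 1032)


def probe_pipe_size():
    """Open a pipe and ask the kernel for its capacity (the old approach)."""
    read_fd, write_fd = os.pipe()
    try:
        return fcntl.fcntl(write_fd, F_GETPIPE_SZ)
    finally:
        os.close(read_fd)
        os.close(write_fd)


def max_pipe_size(path="/proc/sys/fs/pipe-max-size"):
    """Read the system-wide maximum pipe size (the new approach)."""
    with open(path) as f:
        return int(f.read().strip())
```

The probed value depends on how many pipe buffers the user already has
open, while the value read from `/proc` is an upper bound that does not
vary between runs.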

Known drawbacks
===============

When the maximum pipe size value is much larger than the actual size
of the notification pipe, many more events are emitted than is
necessary to complete the test.

References
==========

[1]: https://gitlab.com/linux-kernel/stable/-/blob/3e9bff3bbe1355805de919f688bef4baefbfd436/fs/pipe.c#L809
[2]: See upstream commit 46c4c9d1beb7f5b4cec4dd90e7728720583ee348
[3]: See upstream commit 6a6ca57de92fcae34603551ac944aa74758c30d4

Change-Id: Id547a1d772b5a7f9b18ffa686ff6644afca4ab15
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
3 months agodocs: Update relayd architecture
Kienan Stewart [Mon, 6 May 2024 20:14:49 +0000 (16:14 -0400)] 
docs: Update relayd architecture

Observed issue
==============

The sessiond sessions do not map one-to-one with relay
sessions. Rather, there can be one relay session associated with
each of the active consumers, e.g. ustconsumerd64, ustconsumerd32,
and kconsumerd of each lttng-sessiond session.

Solution
========

The phrasing of the relay session has been updated to
"per-consumer". An additional mention is added to say that
attaching a viewer session to multiple lttng-sessiond
sessions is not supported.

Change-Id: I1df18c4e97c0ee9ec4ee17b3bf35c6e74c90774f
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
3 months agotests: Add test for missing short lived applications
Kienan Stewart [Tue, 28 May 2024 14:33:07 +0000 (10:33 -0400)] 
tests: Add test for missing short lived applications

Change-Id: I691b9fb2cfae7603e40ca4c668acaff752ba3f28
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
3 months agoFix: relayd: live: Catch short lived applications for attached viewers
Kienan Stewart [Fri, 12 Apr 2024 18:27:09 +0000 (14:27 -0400)] 
Fix: relayd: live: Catch short lived applications for attached viewers

Observed issue
==============

When a live viewer is attached to a session and a new application
starts, emits events, and exits, the viewer may not see the produced
events.

With per-UID buffer allocation, the application needs to run as a new
user that hasn't had streams allocated before. With per-PID buffers,
spawning a new traced application is sufficient.

Cause
=====

When the new relay streams are created, associated viewer streams are
not immediately created. As a result, there is a window between the
creation of the relay streams and the time at which the live viewer
sends a GET_NEW_STREAMS command, during which the session may start
being destroyed and/or the relay streams may be unpublished. When the
relay streams are unpublished for any reason, the reference to the
relay stream in the ctf_trace is removed. The search for new and unsent
streams iterates over the relay streams in each ctf_trace. Therefore,
relay streams that were created and unpublished while the live viewer
was already attached to the session can be completely missed.

Solution
========

The solution has three main aspects:

1. When new relayd streams are published and a viewer is attached for the
corresponding relay session, or when a live viewer session attaches to
an existing relay session, the viewer streams are created immediately.

2. The unsent viewer streams are tracked in a per-viewer session
list so that there continues to be a reference (via the
viewer_stream->stream backreference) held for the relay stream, and that
unpublished relay streams can be found without iterating over the
entire relay streams hashtable.

3. To cover cases where a relay stream has been closed but there are
still known trace chunks available, an additional check has been added
to the `get_next_index` viewer stream transition checks. When the
seen rotation count and relay stream rotation count are the same and
the relay stream no longer has an active trace chunk, the
viewer stream is not forcibly rotated. This stops the final drop to
the trace chunk reference (via
viewerstream->stream_file->trace_chunk). Later, when the relay stream
is fully closed, there is a final rotation that is performed.

Known drawbacks
===============

The current implementation adds a global hash table which holds
references to created viewer sessions. When searching to determine if
new viewer streams should be created, the search is O(N*M) where N is the
number of viewer sessions and M is the number of relay sessions.

A different approach to recording references from relay sessions to
viewer sessions (if any exist) could reduce the search space.

Change-Id: Ie8f00697a4dafd5c9b0bfe60a872d1c1882f6944
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
3 months agoTests: Add controls to run python tests with verbose output
Kienan Stewart [Thu, 11 Apr 2024 17:52:45 +0000 (13:52 -0400)] 
Tests: Add controls to run python tests with verbose output

When running failing tests, it can be useful to get verbose output
immediately without trying to run an environment with a separate
sessiond and/or relayd.

Setting the `LTTNG_TEST_VERBOSE_RELAYD` or `LTTNG_TEST_VERBOSE_SESSIOND`
environment variables will cause the corresponding application to be run
in its most verbose configuration.
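
As a rough illustration, a test harness could map these variables to
daemon arguments as follows. This is a sketch, not the lttngtest code:
`daemon_args()` is a hypothetical helper, and the assumptions are that
the mere presence of a variable enables verbosity and that "most
verbose" corresponds to `-vvv` (plus `--verbose-consumer` for the
sessiond).

```python
import os


def daemon_args(program, environ=None):
    """Build the argument list for a daemon, honouring the verbose variables."""
    environ = os.environ if environ is None else environ
    args = [program]
    if program == "lttng-sessiond" and "LTTNG_TEST_VERBOSE_SESSIOND" in environ:
        # Assumption: highest verbosity for the sessiond and its consumers.
        args += ["-vvv", "--verbose-consumer"]
    elif program == "lttng-relayd" and "LTTNG_TEST_VERBOSE_RELAYD" in environ:
        # Assumption: highest verbosity for the relayd.
        args += ["-vvv"]
    return args
```

For example, `daemon_args("lttng-relayd", {"LTTNG_TEST_VERBOSE_RELAYD": "1"})`
returns `["lttng-relayd", "-vvv"]` in this sketch.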

Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Change-Id: Ic2dd84a36f61837dfbca99d06d6a438ae884f782
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
3 months agoClean-up: Tests: Use lttngtest to run live viewer for test_live_hang.py
Kienan Stewart [Tue, 2 Apr 2024 12:00:57 +0000 (08:00 -0400)] 
Clean-up: Tests: Use lttngtest to run live viewer for test_live_hang.py

Change-Id: Iadc40684c8cd5f0ce64e45e3c78747ca54f5bc89
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
3 months agoFix: Tests: Use wait_before_exit_file_path in WaitTraceTestApplication
Kienan Stewart [Fri, 29 Mar 2024 20:58:20 +0000 (16:58 -0400)] 
Fix: Tests: Use wait_before_exit_file_path in WaitTraceTestApplication

The `_WaitTraceTestApplication`'s `__init__` method accepted the
`wait_before_exit` and `wait_before_exit_file_path` parameters; however,
the parameters weren't then passed onwards to the trace test
application's invocation.

Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Change-Id: I9055aa206a8fd943012bacfa49d6ff152f2dfbde
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
3 months agoTests: Add LiveViewer to lttngtest environment
Kienan Stewart [Fri, 29 Mar 2024 20:58:03 +0000 (16:58 -0400)] 
Tests: Add LiveViewer to lttngtest environment

Drawbacks
=========

With the current python bindings, the relayd seems to leak a file
descriptor; however, this doesn't stop the tests from working.

E.g.

```
ok 1 - BT2 live viewer exited successfully

Change-Id: I13994f7c8b0b6cffcee0d0ea0f8fca22538de651
---
  duration_ms: 1097.302968
...

 Killing session daemon (pid = 3340512)
Session daemon killed
lttng-relayd: Error: A file descriptor leak has been detected: 1
tracked file descriptors are still being tracked
```

Change-Id: Ie4294dd7238d4b6074af2d4cf193e1ca9949a741
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
3 months agoTests: Allow the creation of dummy users in the lttngtest environment
Kienan Stewart [Fri, 29 Mar 2024 20:57:20 +0000 (16:57 -0400)] 
Tests: Allow the creation of dummy users in the lttngtest environment

There are tests that need other user accounts created (e.g. to exercise
per-UID buffers with more than one user). Those accounts may be created
by the test environment and cleaned up on deletion.

An option has been added to the _WaitTraceTestApplication to run the
application as another user using `su`.

Change-Id: Ie003e628258fdfbea1972f1f8825c4466fc2792b
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
3 months agoTests: Do not remove interrupted test log files by default
Kienan Stewart [Fri, 26 Jul 2024 14:46:40 +0000 (10:46 -0400)] 
Tests: Do not remove interrupted test log files by default

Observed issue
==============

During CI runs, builds may time out or be killed for another reason.
Those test logs are deleted and cannot be checked for diagnostic
information, warnings, or errors.

Cause
=====

By default, the test log for the currently running test is deleted by
automake so that subsequent invocations of `make check` will re-run the test.

Solution
========

Add a disable flag `--disable-precious-tests` and set
`PRECIOUS_TESTS` to true by default when configuring lttng-tools. When
`PRECIOUS_TESTS` is set, all test logs in `tests/regression` will be
marked as `.PRECIOUS` and subsequently not deleted when interrupted.

Known drawbacks
===============

This could make interrupting a test and re-running during test
development more of a hassle.

References
==========

[1]: https://www.gnu.org/software/make/manual/html_node/Special-Targets.html
[2]: https://automake.gnu.narkive.com/1TjEGbH2/delete-on-error-test-suite-log-and-precious

Change-Id: I08b4a1bb29eb609827cc1c047f141f94b210effe
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
3 months agolttng: add-trigger: clarify terminology for log levels
Michael Jeanson [Tue, 28 May 2024 21:18:35 +0000 (17:18 -0400)] 
lttng: add-trigger: clarify terminology for log levels

To eliminate ambiguity in the code, the terminology for log levels has
been updated. The previous terms "min" and "max" log levels have been
replaced with "least_severe" and "most_severe" respectively.

This change addresses the varying conventions across different logging
domains, where numerical values for severity can either increase or
decrease with severity. The new terminology provides clarity, making it
easier to understand the severity levels regardless of the logging
domain's convention.

Change-Id: Ie90bcc8e4c07b8b7437d9580e166141fae5c6d2f
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
3 months agoFix: inverted logic in loglevel_parse_range_string_common function
Michael Jeanson [Tue, 28 May 2024 19:08:22 +0000 (15:08 -0400)] 
Fix: inverted logic in loglevel_parse_range_string_common function

The mapping of numerical severity levels to their corresponding names
varies across different logging domains. Some domains, like
Java Util Logging, use higher numerical values for more severe logging
levels, while others, like Log4j2, use lower values for the same
purpose.

To accommodate this variation, the `loglevel_parse_range_string_common`
function has been updated. It now accepts the numerical value
representing the most severe logging level in a given domain. This
change ensures that log level specifications in the format `TRACE..` are
parsed correctly, regardless of the domain's convention.
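
The idea can be sketched in Python as follows. This is not the
lttng-tools implementation: `parse_open_range()` and `matches()` are
hypothetical helpers, the level tables are partial, and the "most
severe" values passed by the caller (`SEVERE` for
java.util.logging, `FATAL` for Log4j 2.x) are assumptions made for
the illustration.

```python
# Partial level tables: java.util.logging grows with severity,
# Log4j 2.x shrinks with severity.
JUL_LEVELS = {"FINEST": 300, "INFO": 800, "SEVERE": 1000}
LOG4J2_LEVELS = {"TRACE": 600, "INFO": 400, "FATAL": 100}


def parse_open_range(spec, levels, most_severe_value):
    """Parse 'LEVEL..' into a (least_severe, most_severe) pair of values."""
    if not spec.endswith(".."):
        raise ValueError("expected 'LEVEL..', got " + repr(spec))
    least_severe = levels[spec[:-2].upper()]
    return least_severe, most_severe_value


def matches(value, least_severe, most_severe):
    """Return True if value lies between the bounds, whichever is larger."""
    low, high = sorted((least_severe, most_severe))
    return low <= value <= high
```

Because the caller supplies the most severe value, the same parser
handles `INFO..` for java.util.logging (800 up to 1000) and `TRACE..`
for Log4j 2.x (600 down to 100) without hard-coding a direction.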

Change-Id: Idbc3949ac33b69c71fce484a6d8912f59cdbe08d
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
3 months agoAdd 'log4j2' domain event-rule unit tests
Michael Jeanson [Fri, 11 Feb 2022 15:38:48 +0000 (15:38 +0000)] 
Add 'log4j2' domain event-rule unit tests

Change-Id: I8a075516de8cb0b682791e4bf44d9a5d7780680e
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
3 months agoAdd Log4j 2.x agent tests for the 'log4j2' domain
Michael Jeanson [Fri, 11 Feb 2022 15:38:10 +0000 (15:38 +0000)] 
Add Log4j 2.x agent tests for the 'log4j2' domain

Add integration tests for the new Log4j 2.x agent in its native mode
using the new 'log4j2' domain. Use the new configure switch
'--enable-test-java-agent-log4j2' to enable them, or
'--enable-test-java-agent-all' to enable all Java agent tests.

To run only this new test, use this command:

  cd tests/regression && make check TESTS="ust/java-log4j2/test_agent_log4j2_domain_log4j2"

Change-Id: Idfac151d2e523b5ac109f2dae2f182b0bc9415d8
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
3 months ago Add 'log4j2' domain to the documentation
Michael Jeanson [Wed, 22 May 2024 20:40:29 +0000 (16:40 -0400)] 
Add 'log4j2' domain to the documentation

Change-Id: Ie76c686583f10bc09b9769db66e8e079b8472a37
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
3 months ago Add 'log4j2' domain to zsh completion
Michael Jeanson [Wed, 22 May 2024 20:41:39 +0000 (16:41 -0400)] 
Add 'log4j2' domain to zsh completion

Change-Id: Ic9121630022981909962a0b5943a20dbe5240558
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
3 months ago Add 'log4j2' domain to common test code
Michael Jeanson [Wed, 22 May 2024 20:44:30 +0000 (16:44 -0400)] 
Add 'log4j2' domain to common test code

Change-Id: I84961823eb875c673e525d83d8096291506c1edb
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
3 months ago Add a Log4j 2.x agent specific domain 'log4j2'
Michael Jeanson [Wed, 2 Feb 2022 20:04:09 +0000 (20:04 +0000)] 
Add a Log4j 2.x agent specific domain 'log4j2'

The initial version of the new LTTng-UST Log4j 2.x agent only operated
in a compatibility mode making use of the existing 'log4j' tracing
domain currently implemented in LTTng-Tools.

While this is useful when migrating existing Log4j applications using
the compatibility bridge, it does require converting the log levels from
the new Log4j 2.x values to the old Log4j 1.x standard. This hides the
actual log level values from users of applications that natively use
Log4j 2.x.

Exposing the native Log4j 2.x log level values requires a new domain
since the changes are significant:

  * The same list of standard log levels and names
  * Each standard log level has a new integer value
  * The log levels scale is reversed and shortened from
    'int32_max -> int32_min' to '0 -> int32_max'
  * The interval between standard log levels has changed

This new 'log4j2' domain is basically a straight copy of the current
'log4j' domain with minor adjustments for the reversed and shortened
scale.
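
For reference, the differences listed above can be sketched as two
enumerations. The enum and function names are hypothetical and are not
the identifiers used by LTTng-Tools; the numerical values are those of
the standard levels in the upstream Log4j 1.x and Log4j 2.x `Level`
classes.

```cpp
#include <cassert>
#include <cstdint>

/* Log4j 1.x: a higher value means a more severe level. */
enum class log4j1_level : std::int32_t {
    all = INT32_MIN,
    trace = 5000,
    debug = 10000,
    info = 20000,
    warn = 30000,
    error = 40000,
    fatal = 50000,
    off = INT32_MAX,
};

/* Log4j 2.x: a lower value means a more severe level. */
enum class log4j2_level : std::int32_t {
    off = 0,
    fatal = 100,
    error = 200,
    warn = 300,
    info = 400,
    debug = 500,
    trace = 600,
    all = INT32_MAX,
};

/* Severity comparison for the Log4j 1.x scale. */
constexpr bool more_severe(log4j1_level a, log4j1_level b)
{
    return a > b;
}

/* Severity comparison for the reversed Log4j 2.x scale. */
constexpr bool more_severe(log4j2_level a, log4j2_level b)
{
    return a < b;
}
```

The level names are identical in both enumerations, but every value, the
direction of the scale, and the spacing between levels differ, which is
why a value from one domain cannot be interpreted in the other.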

Change-Id: I89f9c0a428ffe1d0bd26f7af547e9e21503de653
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
3 months ago clang-format: run clang-format on the tree
Jérémie Galarneau [Fri, 30 Aug 2024 20:37:20 +0000 (16:37 -0400)] 
clang-format: run clang-format on the tree

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I45c2a1f375e5b19b649ee5d11fb299ccd77c6845

3 months ago clang-format: ignore generated files
Jérémie Galarneau [Fri, 30 Aug 2024 19:52:37 +0000 (15:52 -0400)] 
clang-format: ignore generated files

Two auto-generated files cause clang-format < v17 to hang when
they are being formatted. I have not looked into the root cause,
but formatting them is useless anyhow.

Adding them to .clang-format-ignore works around the problem for the
moment.

Since clang-format 14 does not support ignore files, their support is
crudely emulated here using grep to filter out find's results.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I1cd10349ef1a66b1d595105b0ec1a4beef9dcc9a

3 months ago Fix: Do not null out lttng_consumer_stream channel on deletion
Kienan Stewart [Thu, 11 Jul 2024 14:34:52 +0000 (10:34 -0400)] 
Fix: Do not null out lttng_consumer_stream channel on deletion

Change-Id: Ic98a27e6704d2683d24b8645d345955cee8b038c
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
3 months ago Fix: Crash when unregistering UST apps during shutdown
Kienan Stewart [Wed, 10 Jul 2024 18:14:14 +0000 (14:14 -0400)] 
Fix: Crash when unregistering UST apps during shutdown

Observed issue
==============

The following crash has been observed in v2.12.2:

```
function=0x55ac7c4c9600 <__PRETTY_FUNCTION__.12873> "lttng_ustconsumer_close_metadata") at assert.c:92
function=0x55ac7c4c9600 <__PRETTY_FUNCTION__.12873> "lttng_ustconsumer_close_metadata") at assert.c:101
```

The underlying cause is applicable in the current master branch as
well.

Cause
=====

There is a potential race between the consumerd control thread, which
handles commands coming from the sessiond, and the main thread when
shutting down a consumerd.

It is possible that the following happens:

1. `destroy_metadata_stream_ht` has the locks on `consumer_data`,
   `channel`, `stream`
2. `lttng_ustconsumer_close_all_metadata` looks up the channel and
   starts to try and acquire a channel lock (`stream->chan->lock`)
3. `destroy_metadata_stream_ht` sets `stream->chan` to `null`
4. `destroy_metadata_stream_ht` releases the `stream`, `channel`, and
   `consumer_data` locks
5. `lttng_ustconsumer_close_all_metadata` now has the channel lock, and
   looks up `stream->chan` again to call `destroy_metadata_stream_ht`,
   and that member is now null

Solution
========

Acquire the stream lock after acquiring the channel lock.

Part 2 follows: don't set `stream->chan` to null.
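
The lock ordering described above can be sketched as follows. This is an
illustration under stated assumptions, not the consumerd code: the
`channel` and `stream` structures and the `destroy_stream` and
`close_metadata` functions are hypothetical stand-ins, and `std::mutex`
replaces the real locking primitives.

```cpp
#include <cassert>
#include <mutex>

/* Hypothetical stand-in for a consumer channel. */
struct channel {
    std::mutex lock;
    bool metadata_closed = false;
};

/* Hypothetical stand-in for a consumer stream. */
struct stream {
    std::mutex lock;
    /* Set once at creation and never reset to nullptr (part 2). */
    channel *const chan;
    bool destroyed = false;

    explicit stream(channel& owner) : chan(&owner) {}
};

/* Teardown path: take the channel lock first, then the stream lock. */
void destroy_stream(stream& s)
{
    std::lock_guard<std::mutex> channel_guard(s.chan->lock);
    std::lock_guard<std::mutex> stream_guard(s.lock);

    /* s.chan is left untouched, so other threads never read nullptr. */
    s.destroyed = true;
}

/* Close path: same order, channel lock first, then stream lock. */
bool close_metadata(stream& s)
{
    std::lock_guard<std::mutex> channel_guard(s.chan->lock);
    std::lock_guard<std::mutex> stream_guard(s.lock);

    if (s.destroyed) {
        return false;
    }

    s.chan->metadata_closed = true;
    return true;
}
```

Because both paths take the locks in the same order and `chan` is never
cleared, the close path can dereference `s.chan` at any time and sees the
stream state only while holding both locks.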

Known drawbacks
===============

None.

Change-Id: I1d27ea6ac08f3e7ed4624a8921cffb675be649d2
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
3 months ago Fix: Compilation failure deducing type of `auto` variables in GCC 4.8
Kienan Stewart [Tue, 6 Aug 2024 15:32:57 +0000 (11:32 -0400)] 
Fix: Compilation failure deducing type of `auto` variables in GCC 4.8

Observed issue
==============

When compiling with GCC 4.8.5 or GCC 5.5.0 on SLES12SP5, the following
error happens:

```
save.cpp: In function 'int save_agent_events(config_writer*, agent*)':
save.cpp:1185:43: error: use of 'agent_event' before deduction of 'auto'
       lttng::urcu::lfht_iteration_adapter<agent_event,
                                           ^
save.cpp:1185:43: error: use of 'agent_event' before deduction of 'auto'
save.cpp:1185:43: error: use of 'agent_event' before deduction of 'auto'
save.cpp:1187:26: error: template argument 1 is invalid
        &agent_event::node>(*agent->events->ht)) {
                          ^
save.cpp:1187:26: error: creating pointer to member of non-class type '<type error>'
save.cpp:1187:26: note: invalid template non-type parameter
In file included from ../../../src/vendor/fmt/core.h:3316:0,
                 from ../../../src/common/format.hpp:20,
                 from ../../../src/common/error.hpp:13,
                 from ../../../src/common/common.hpp:12,
                 from snapshot.hpp:13,
                 from consumer.hpp:12,
                 from session.hpp:11,
                 from kernel.hpp:13,
                 from save.cpp:10:
```

Cause
=====

This appears to be a limitation in older versions of GCC. I did not
find specific commit(s) or bugs which highlight the issue, but
compilation of this code works as of GCC 6.5.0 on SLES12SP5. Previous
point releases of GCC 6.x were not tested.

Solution
========

Explicitly define the type of the pointer and the type passed to
`lttng::urcu::lfht_iteration_adapter` so the compiler does not have
to perform type deduction.

Known drawbacks
===============

None.

Change-Id: I71c5937a38336756ece4f396ea5ba7af7f3d36c3
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
3 months ago Fix: Compilation failure in session_not_found_error with GCC 4.8
Kienan Stewart [Tue, 6 Aug 2024 15:14:49 +0000 (11:14 -0400)] 
Fix: Compilation failure in session_not_found_error with GCC 4.8

Observed issue
==============

When compiling with GCC 4.8.5, the compilation fails with the
following error:

```
session.hpp:577:2: error: function 'lttng::sessiond::exceptions::session_not_found_error::session_not_found_error(lttng::sessiond::exceptions::session_not_found_error&&)' defaulted on its first declaration with an exception-specification that differs from the implicit declaration 'lttng::sessiond::exceptions::session_not_found_error::session_not_found_error(lttng::sessiond::exceptions::session_not_found_error&&)'
session.hpp:577:2: error: function 'lttng::sessiond::exceptions::session_not_found_error::session_not_found_error(lttng::sessiond::exceptions::session_not_found_error&&)' defaulted on its first declaration with an exception-specification that differs from the implicit declaration 'lttng::sessiond::exceptions::session_not_found_error::session_not_found_error(lttng::sessiond::exceptions::session_not_found_error&&)'
```

Cause
=====

This is due to a bug in GCC which is fixed as of GCC 5.0 [1].

Solution
========

Do not explicitly define the move_assignable for
`lttng::sessiond::exceptions::session_not_found_error` as
`noexcept`. The function should be implicitly generated as `noexcept`.
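
A minimal sketch of the approach, using a hypothetical exception class
rather than the actual lttng-tools definition: the move constructor is
defaulted without an exception specification, and a `static_assert`
checks that the compiler still deduces it as `noexcept`.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <utility>

/* Hypothetical exception type, for illustration only. */
class session_not_found_error : public std::runtime_error {
public:
    explicit session_not_found_error(std::string session_name) :
        std::runtime_error("Session not found: " + session_name),
        name(std::move(session_name))
    {
    }

    /* Defaulted without `noexcept`; the specification is deduced. */
    session_not_found_error(session_not_found_error&&) = default;
    session_not_found_error(const session_not_found_error&) = default;

    std::string name;
};

/* The deduced exception specification is still non-throwing. */
static_assert(std::is_nothrow_move_constructible<session_not_found_error>::value,
              "defaulted move constructor should be deduced as noexcept");
```

Since every base and member here has a non-throwing move or copy
constructor, dropping the explicit `noexcept` does not change the
resulting exception specification.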

Known drawbacks
===============

None.

References
==========
[1]: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59526

Change-Id: I3368633ce3b45627f2e67f7d2def361e662eec3d
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
4 months ago sessiond: main.cpp: iterate on list using list_iteration_adapter
Jérémie Galarneau [Wed, 31 Jul 2024 01:10:16 +0000 (01:10 +0000)] 
sessiond: main.cpp: iterate on list using list_iteration_adapter

Change-Id: I492b597b70040c0e1f3eb826aadb66ca44550fb5
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
4 months ago sessiond: agent-thread.cpp: iterate on list using list_iteration_adapter
Jérémie Galarneau [Wed, 31 Jul 2024 01:06:11 +0000 (01:06 +0000)] 
sessiond: agent-thread.cpp: iterate on list using list_iteration_adapter

Change-Id: Ibd02f3e2c8d91fc8aa09097be2ea7b563001b1da
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
4 months ago sessiond: trace-ust.cpp: iterate on list using list_iteration_adapter
Jérémie Galarneau [Wed, 31 Jul 2024 01:04:45 +0000 (01:04 +0000)] 
sessiond: trace-ust.cpp: iterate on list using list_iteration_adapter

Change-Id: I20a5549e8a93b7fe0d111b72548af8e74d80531a
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
4 months ago sessiond: ust-app.cpp: iterate on list using list_iteration_adapter
Jérémie Galarneau [Tue, 30 Jul 2024 20:44:53 +0000 (20:44 +0000)] 
sessiond: ust-app.cpp: iterate on list using list_iteration_adapter

Change-Id: I77d7ecb33f297561ec9c887495c7a798fa7f73ce
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
4 months ago sessiond: client.cpp: iterate on list using list_iteration_adapter
Jérémie Galarneau [Tue, 30 Jul 2024 20:36:13 +0000 (20:36 +0000)] 
sessiond: client.cpp: iterate on list using list_iteration_adapter

Change-Id: Ibb45513080329e805c757cfc69a99eaa14387ac0
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
4 months ago relayd: ctf-trace.cpp: iterate on rcu list using rcu_list_iteration_adapter
Jérémie Galarneau [Tue, 30 Jul 2024 20:32:50 +0000 (20:32 +0000)] 
relayd: ctf-trace.cpp: iterate on rcu list using rcu_list_iteration_adapter

Change-Id: I5c16d02d44fc90b9bf9336ac6cd795398d4ab4f5
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
4 months ago sessiond: manage-kernel.cpp: iterate on list using list_iteration_adapter
Jérémie Galarneau [Tue, 30 Jul 2024 20:30:46 +0000 (20:30 +0000)] 
sessiond: manage-kernel.cpp: iterate on list using list_iteration_adapter

Change-Id: Icf0e10d675e1d0ba116c09b92d9426309b7cb606
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
4 months ago sessiond: dispatch.cpp: iterate on list using list_iteration_adapter
Jérémie Galarneau [Tue, 30 Jul 2024 20:20:09 +0000 (20:20 +0000)] 
sessiond: dispatch.cpp: iterate on list using list_iteration_adapter

Change-Id: Ie8a45753922b0a5dd476be06ce15a1f7d6883c08
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
4 months ago relayd: viewer-session.cpp: iterate on rcu list using rcu_list_iteration_adapter
Jérémie Galarneau [Tue, 30 Jul 2024 20:04:23 +0000 (20:04 +0000)] 
relayd: viewer-session.cpp: iterate on rcu list using rcu_list_iteration_adapter

Change-Id: Ie8110d36a9c59e687366309e6ee399e6a3f93bbc
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
4 months ago relayd: session.cpp: iterate on rcu list using rcu_list_iteration_adapter
Jérémie Galarneau [Tue, 30 Jul 2024 20:00:15 +0000 (20:00 +0000)] 
relayd: session.cpp: iterate on rcu list using rcu_list_iteration_adapter

Change-Id: I9cfca29e54873c696ef6b8c84454e77e299ddd10
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
4 months ago relayd: live.cpp: iterate on rcu list using rcu_list_iteration_adapter
Jérémie Galarneau [Tue, 30 Jul 2024 19:58:10 +0000 (19:58 +0000)] 
relayd: live.cpp: iterate on rcu list using rcu_list_iteration_adapter

Change-Id: I89824eb36bb317a424880f34dc962cf7b1eca1ed
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
4 months ago relayd: main.cpp: iterate on rcu list using rcu_list_iteration_adapter
Jérémie Galarneau [Tue, 30 Jul 2024 19:43:01 +0000 (19:43 +0000)] 
relayd: main.cpp: iterate on rcu list using rcu_list_iteration_adapter

Change-Id: Id3070b39458b3e44185875c87dca069c2dbb6ed6
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>