Kienan Stewart [Fri, 11 Oct 2024 15:40:34 +0000 (15:40 +0000)]
Tests: Add environment variables in tests to attach gdbserver
When debugging tests, the current infrastructure in both the Python and
bash test harnesses requires that the user start the sessiond or relayd
beforehand, and run the tests with environment variables to stop the
spawning of the respective programs.
To facilitate the process, new environment variables are added to allow
gdbserver to be spawned and attach to the relayd or sessiond. The user
may then connect with gdb, for example:
`gdb -ex "target remote localhost:1001"`.
Change-Id: Id4d1b446c7d6682c011ef27682198fb4a503f5f4 Signed-off-by: kienan Stewart <kstewart@efficios.com> Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Specify that the caller of `get_session_name()` must explicitly free the
memory associated with the returned string.
Historically this software was written in C. C programmers have the
reflex to assume they are responsible for freeing memory. However, now
that this project is mixed C/C++ and is transitioning towards C++ the
assumption that developers will automatically know to free memory
(according to C programming conventions) does not hold as well. For this
reason, clarify that memory must be explicitly freed by the function
caller.
There are other C functions in this software that return variables that
must explicitly be freed by the caller. However, these changes have not
been applied consistently to all these cases. The assumption is that
this individual clarification still reduces confusion even if not
applied consistently.
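A minimal sketch of the ownership contract described above. The function `get_session_name_copy()` is a hypothetical stand-in for a C-style function that returns a `malloc()`-allocated string; it is not the project's `get_session_name()`. Wrapping the result in a `std::unique_ptr` with a `free()`-based deleter is one way a C++ caller might honor the "caller must free" rule without a manual `free()` call.

```cpp
#include <cstdlib>
#include <cstring>
#include <memory>
#include <string>

// Hypothetical stand-in for a C-style API: returns a heap-allocated copy
// that the caller must release with free().
char *get_session_name_copy(const char *name)
{
	const std::size_t len = std::strlen(name) + 1;
	char *copy = static_cast<char *>(std::malloc(len));

	if (copy) {
		std::memcpy(copy, name, len);
	}

	return copy;
}

// Deleter so that std::unique_ptr calls free() rather than delete.
struct free_deleter {
	void operator()(char *ptr) const noexcept
	{
		std::free(ptr);
	}
};

using c_string_uptr = std::unique_ptr<char, free_deleter>;

// The returned buffer is freed automatically when `owned` goes out of scope.
std::string session_name_as_string(const char *name)
{
	c_string_uptr owned(get_session_name_copy(name));

	return owned ? std::string(owned.get()) : std::string();
}
```

Calling `session_name_as_string("my-session")` copies the name into a `std::string` and releases the C buffer before returning.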
Fix: relayd: viewer_stream leak causes assertion failure on exit
Observed issue
==============
Running the test proposed in change #11584[1], the relay daemon aborts
when destroying the viewer_streams_ht as it is not empty.
Cause
=====
A viewer stream reference is leaked when sending streams to the live
client causing them to remain published in the viewer_streams_ht beyond
the lifetime of the viewer_connection.
The send_viewer_streams() function operates in two phases. First, it
iterates over the viewer_streams_ht to find streams that belong to the
target session and have not been sent yet.
In the second phase, it iterates over the session's unannounced stream
list. The commit message of 98b82dfa2 gives more background on the role
of the unannounced stream list.
When a viewer stream is created, two references are acquired:
- one belongs to the global viewer_streams_ht,
- the other belongs to the unannounced stream list.
When the viewer stream is eventually sent to the client, it is removed
from the unannounced stream list and that reference must be dropped.
Unfortunately, the reference is not dropped during the first phase.
Solution
========
Put the reference of the viewer streams that are sent during the first
phase of send_viewer_streams().
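The reference accounting described above can be sketched as follows. This is a simplified model under stated assumptions: the types, the `publish()` and `send_from_ht()` helpers, and the use of `std::vector` are all hypothetical and do not reflect the relay daemon's actual data structures. It only shows that a stream sent during the first phase must also leave the unannounced list and drop that list's reference.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical, simplified model of a reference-counted viewer stream.
struct viewer_stream {
	int refcount = 0;
	bool sent = false;
};

struct viewer_session {
	std::vector<viewer_stream *> streams_ht; // stands in for viewer_streams_ht
	std::vector<viewer_stream *> unannounced; // unannounced stream list
};

void viewer_stream_put(viewer_stream& stream)
{
	--stream.refcount;
}

// Creation acquires two references: one per container.
void publish(viewer_session& session, viewer_stream& stream)
{
	stream.refcount += 2;
	session.streams_ht.push_back(&stream);
	session.unannounced.push_back(&stream);
}

// First phase: send streams found in the hash table. A stream that is sent
// here is removed from the unannounced list and that reference is put.
void send_from_ht(viewer_session& session)
{
	for (viewer_stream *stream : session.streams_ht) {
		if (stream->sent) {
			continue;
		}

		stream->sent = true;

		const auto it = std::find(
			session.unannounced.begin(), session.unannounced.end(), stream);
		if (it != session.unannounced.end()) {
			session.unannounced.erase(it);
			viewer_stream_put(*stream);
		}
	}
}
```

After `send_from_ht()`, only the hash table's reference remains, so the stream can be fully released once it is unpublished from the table.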
g++ emits warnings that it can't recognize the clang-specific diagnostic
pragmas. They are replaced by the internal compiler-specific macros so
that nothing is emitted when g++ is used.
Disable clang warning for injected class name ambiguity in non_copyable_reference
clang raises a warning (-Winjected-class-name) due to ambiguity between
a constructor name and a type within the non_copyable_reference code.
Since clang could not infer the correct type context, this commit uses
`#pragma clang diagnostic` to disable the specific warning in the
affected area of the code.
The `push` and `pop` pragmas ensure that the warning is disabled only
where needed, preventing it from affecting other parts of the codebase,
and allowing us to maintain clean and clear code without unnecessary
compiler warnings.
A static_assert enforces that CustomDeleter::deleter is indeed a type,
although interpreting it as a constructor would be nonsensical here.
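A minimal sketch of the push/ignore/pop pattern wrapped in compiler-specific macros. The `EXAMPLE_DIAGNOSTIC_*` macro names, `example_int_deleter`, and `read_through_owner()` are hypothetical and are not the project's own macros or classes. The example does not itself trigger `-Winjected-class-name`; it only shows how the pragmas can be scoped and hidden from g++, and how a `static_assert` can enforce that `CustomDeleter::deleter` names a type.

```cpp
#include <memory>
#include <type_traits>

// Hypothetical macro names; they expand to clang pragmas only under clang,
// so g++ never sees a pragma it does not recognize.
#if defined(__clang__)
#define EXAMPLE_DIAGNOSTIC_PUSH _Pragma("clang diagnostic push")
#define EXAMPLE_DIAGNOSTIC_IGNORE_INJECTED_CLASS_NAME \
	_Pragma("clang diagnostic ignored \"-Winjected-class-name\"")
#define EXAMPLE_DIAGNOSTIC_POP _Pragma("clang diagnostic pop")
#else
#define EXAMPLE_DIAGNOSTIC_PUSH
#define EXAMPLE_DIAGNOSTIC_IGNORE_INJECTED_CLASS_NAME
#define EXAMPLE_DIAGNOSTIC_POP
#endif

// Hypothetical deleter holder exposing a nested `deleter` type.
struct example_int_deleter {
	struct deleter {
		void operator()(int *value) const
		{
			delete value;
		}
	};
};

// The warning is disabled only between the push and the pop.
EXAMPLE_DIAGNOSTIC_PUSH
EXAMPLE_DIAGNOSTIC_IGNORE_INJECTED_CLASS_NAME
template <typename CustomDeleter>
int read_through_owner(int value)
{
	static_assert(std::is_class<typename CustomDeleter::deleter>::value,
		      "CustomDeleter::deleter must be a type");

	std::unique_ptr<int, typename CustomDeleter::deleter> owner(new int(value));
	return *owner;
}
EXAMPLE_DIAGNOSTIC_POP
```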
Use fmtlib to format the session attribute string when saving
the current session to .lttngrc. This eliminates a warning
emitted by clang (VLAs are not standard in C++).
Kienan Stewart [Mon, 12 Aug 2024 19:59:15 +0000 (15:59 -0400)]
Tests: Add and set new log-file-d for the tap driver
This adds a new option `--log-file-d` to the tap-driver, which will
create a `*.log.d` folder for each test when running `make check` and
set `LTTNG_TEST_LOG_DIR` accordingly.
Doing so allows the tests to be run in verbose mode and to create logs in a
predictable location. These log folders are removed when running `make
clean`.
Change-Id: Ibcf7e2cb54098a3e9ccd828ca76df6efcf33431d Signed-off-by: Kienan Stewart <kstewart@efficios.com> Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Kienan Stewart [Tue, 30 Jul 2024 20:38:13 +0000 (16:38 -0400)]
Tests: Add environment variables for verbosity and log directory
Observed issue
==============
When working locally with test failures, changes to `utils.sh` are often
required to produce verbose output.
Solution
========
Environment variables are added to allow running the various tools
with higher verbosity and, optionally, outputting to either stderr or
temporary files in a given directory. Test runners now have the option
to quickly get more information.
Known drawbacks
===============
Some tests depend on parsing either stderr or stdout, and these global
defaults may potentially make developing robust tests more
complicated.
Change-Id: I4128c421cdf9ce12827adc017dba5a298b62b6de Signed-off-by: Kienan Stewart <kstewart@efficios.com> Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Some versions of g++ emit the following warning:
'char* strncpy(char*, const char*, size_t)' output may be truncated
copying 255 bytes from a string of length 255 [-Wstringop-truncation]
Using the internal strncpy wrapper, which checks for truncation,
fixes the problem.
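A minimal sketch of a truncation-checking copy, assuming a hypothetical `checked_strncpy()` helper; the project's internal wrapper may differ in name, signature, and return convention. The copy is refused rather than silently truncated when the source, including its terminator, does not fit in the destination.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies `src` into `dst` only if it fits entirely,
// terminator included. Returns 0 on success, -1 if it would be truncated.
int checked_strncpy(char *dst, const char *src, std::size_t dst_len)
{
	const std::size_t src_len = std::strlen(src);

	if (src_len >= dst_len) {
		// Not enough room for the string and its null terminator.
		return -1;
	}

	std::memcpy(dst, src, src_len + 1);
	return 0;
}
```

Because the length is validated before copying, the destination is always null-terminated on success and left untouched on failure.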
Michael Jeanson [Tue, 27 Aug 2024 18:19:03 +0000 (14:19 -0400)]
Fix: unload all kernel modules on sessiond exit
Stopping a root lttng-sessiond that has loaded kernel modules currently
leaves some modules loaded. Add them in the correct order to allow
unloading them all.
Change-Id: I71f25c798f8c42737d295f32a5e3708287168bc6 Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Tests: test_session: reduce session count in unit test
The destruction of the sessions takes 45 seconds to complete and
I don't see what testing 10 000 iterations tests that is not
achieved by 1 000 iterations.
That step now completes in 32ms on my development machine 🌪
Simon Marchi [Mon, 18 Mar 2024 18:21:45 +0000 (14:21 -0400)]
Move argpar to vendor directory
Since this is source copied as-is from another project, I think it
belongs to the vendor directory. This will make it so it will be
skipped by format-cpp, for instance.
Change-Id: I78892f80c4cbb3a2e863567b0021e895c6489402 Signed-off-by: Simon Marchi <simon.marchi@efficios.com> Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
If this option is specified, then objects are placed into the
subdirectory of the build directory corresponding to the
subdirectory of the source file. For instance, if the source file is
subdir/file.cxx, then the output file would be subdir/file.o. See
Program and Library Variables.
This will allow reducing the number of Makefiles by placing rules in
Makefiles in parent directories, instead of having Makefiles in every
single directory with something that needs to be built.
root [Wed, 28 Aug 2024 18:52:09 +0000 (18:52 +0000)]
Tests: Make notifier discard count test more robust
Observed issue
==============
In the CI, this test would intermittently fail. During failures,
the calculated pipe size from the `default_pipe_size_getter`
application was 8192, while in other cases it was 65536.
```
1..41
ok 1 - Add trigger my_trigger
PASS: tools/notification/test_notification_notifier_discarded_count 1 - Add trigger my_trigger
---
duration_ms: 1323.966137
...
ok 2 - No discarded tracer notification
PASS: tools/notification/test_notification_notifier_discarded_count 2 - No discarded tracer notification
---
duration_ms: 22.021590
...
ok 3 - Generating 390 tracer notifications
PASS: tools/notification/test_notification_notifier_discarded_count 3 - Generating 390 tracer notifications
---
duration_ms: 154.790871
...
not ok 4 - Discarded tracer notification number non-zero (0) as expected
FAIL: tools/notification/test_notification_notifier_discarded_count 4 - Discarded tracer notification number non-zero (0) as expected
---
duration_ms: 24.323759
...
```
Cause
=====
The initial size of pipes in Linux may have different values:
1) `16 * PAGE_SIZE` (as documented in `man 7 pipe`) (since Linux 2.6.11)
2) When a user has many pipes open and is above a soft limit:
* `2 * PAGE_SIZE` (undocumented, see[1]), as of Linux 5.14[2]
* `1 * PAGE_SIZE` since Linux 2.6.35[3]
As the program `default_pipe_size_getter` opened a pipe to check its
size, there could be times in a system where a user has many pipe
buffers open beyond the soft limit and the lower value would be
returned; however, the previously opened sessiond may have had a pipe
opened with the larger default pipe size.
Solution
========
Use the maximum pipe size (on Linux, from
`/proc/sys/fs/pipe-max-size`) for the estimated pipe size rather than
opening a pipe and checking its size.
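A minimal sketch of reading the system-wide maximum pipe size instead of probing a freshly opened pipe. The helper names `parse_pipe_size()` and `read_pipe_max_size()` are hypothetical; the test's actual helper program may be written differently. Parsing is split from file access so the logic can be exercised without depending on the host's `/proc` contents.

```cpp
#include <fstream>
#include <istream>
#include <sstream>

// Parses a positive decimal size from `input`.
// Returns -1 when no usable number can be read.
long parse_pipe_size(std::istream& input)
{
	long size = -1;

	if (!(input >> size) || size <= 0) {
		return -1;
	}

	return size;
}

// Reads the maximum pipe size on Linux; returns -1 if the file is
// unavailable (for example, on another operating system).
long read_pipe_max_size()
{
	std::ifstream file("/proc/sys/fs/pipe-max-size");

	return file ? parse_pipe_size(file) : -1;
}
```

Unlike opening a pipe, this value does not depend on how many pipe buffers the current user already holds.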
Known drawbacks
===============
When the maximum pipe size value is much larger than the actual size
of the notification pipe, many more events are emitted than is
necessary to complete the test.
Kienan Stewart [Mon, 6 May 2024 20:14:49 +0000 (16:14 -0400)]
docs: Update relayd architecture
Observed issue
==============
The sessiond sessions do not map one-to-one with relay
sessions. Rather, there can be one relay session associated with
each of the active consumers, e.g. ustconsumerd64, ustconsumerd32,
and kconsumerd of each lttng-sessiond session.
Solution
========
The phrasing of the relay session has been updated to
"per-consumer". An additional mention is added to say that
attaching a viewer session to multiple lttng-sessiond
sessions is not supported.
Change-Id: I1df18c4e97c0ee9ec4ee17b3bf35c6e74c90774f Signed-off-by: Kienan Stewart <kstewart@efficios.com> Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Kienan Stewart [Fri, 12 Apr 2024 18:27:09 +0000 (14:27 -0400)]
Fix: relayd: live: Catch short lived applications for attached viewers
Observed issue
==============
When a live viewer is attached to a session and a new application
starts, emits events, and exits, the viewer may not see the produced
events.
With per-UID buffer allocation, the application needs to run as a new
user that hasn't had streams allocated before. With per-PID buffers,
spawning a new traced application is sufficient.
Cause
=====
When the new relay streams are created, associated viewer streams are
not immediately created. As a result, there is a window, before the
live viewer sends a GET_NEW_STREAMS command, in which the session may
start being destroyed and/or the relay streams unpublished. When the
relay streams are unpublished for any reason, the reference to the relay
stream in the ctf_trace is removed. The search for new and unsent
streams iterates over the relay streams in each ctf_trace.
Therefore, relay streams that were created and unpublished while
the live viewer was already attached to the session can be completely
missed.
Solution
========
The solution has three main aspects:
1. When new relayd streams are published and a viewer is attached for the
corresponding relay session or when a live viewer session attaches to
an existing relay session the viewer streams are created immediately.
2. The unsent viewer streams are tracked in a per-viewer session
list so that there continues to be a reference (via the
viewer_stream->stream backreference) held for the relay stream, and that
unpublished relay streams can be found without iterating over the
entire relay streams hashtable.
3. To cover cases where a relay stream has been closed but there are
still known trace chunks available, an additional check has been added
to the `get_next_index` viewer stream transition checks. When the
seen rotation count and relay stream rotation count are the same and
the relay stream no longer has an active trace chunk, the
viewer stream is not forcibly rotated. This stops the final drop to
the trace chunk reference (via
viewerstream->stream_file->trace_chunk). Later, when the relay stream
is fully closed, there is a final rotation that is performed.
Known drawbacks
===============
The current implementation adds a global hash table which holds
references to created viewer sessions. When searching to determine if
new viewer streams should be created, the search is O(N*M) where N is the
number of viewer sessions and M is the number of relay sessions.
A different approach to recording references from relay sessions to
viewer sessions (if any exist) could reduce the search space.
Change-Id: Ie8f00697a4dafd5c9b0bfe60a872d1c1882f6944 Signed-off-by: Kienan Stewart <kstewart@efficios.com> Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Kienan Stewart [Thu, 11 Apr 2024 17:52:45 +0000 (13:52 -0400)]
Tests: Add controls to run python tests with verbose output
When running failing tests, it can be useful to get verbose output
immediately without trying to run an environment with a separate
sessiond and/or relayd.
Setting `LTTNG_TEST_VERBOSE_RELAYD` or `LTTNG_TEST_VERBOSE_SESSIOND`
environment variables will cause the corresponding application to be run
in its most verbose configuration.
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Change-Id: Ic2dd84a36f61837dfbca99d06d6a438ae884f782 Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Kienan Stewart [Fri, 29 Mar 2024 20:58:20 +0000 (16:58 -0400)]
Fix: Tests: Use wait_before_exit_file_path in WaitTraceTestApplication
The `_WaitTraceTestApplication`'s `__init__` method accepted the
`wait_before_exit` and `wait_before_exit_file_path` parameters; however,
the parameters weren't then passed onwards to the trace test
application's invocation.
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Change-Id: I9055aa206a8fd943012bacfa49d6ff152f2dfbde Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
```
Killing session daemon (pid = 3340512)
Session daemon killed
lttng-relayd: Error: A file descriptor leak has been detected: 1
tracked file descriptors are still being tracked
```
Change-Id: Ie4294dd7238d4b6074af2d4cf193e1ca9949a741 Signed-off-by: Kienan Stewart <kstewart@efficios.com> Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Kienan Stewart [Fri, 29 Mar 2024 20:57:20 +0000 (16:57 -0400)]
Tests: Allow the creation of dummy users in the lttngtest environment
There are tests that need other user accounts created (e.g. to exercise
per-UID buffers with more than one user). Those accounts may be created
by the test environment and cleaned up on deletion.
An option has been added to the _WaitTraceTestApplication to run the
application as another user using `su`.
Change-Id: Ie003e628258fdfbea1972f1f8825c4466fc2792b Signed-off-by: Kienan Stewart <kstewart@efficios.com> Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Kienan Stewart [Fri, 26 Jul 2024 14:46:40 +0000 (10:46 -0400)]
Tests: Do not remove interrupted test log files by default
Observed issue
==============
During CI runs, builds may time out or be killed for another reason.
Those test logs are deleted and cannot be checked for diagnostic
information, warnings, or errors.
Cause
=====
By default, the test log for the currently running test is deleted by
automake so that subsequent invocations of `make check` will re-run the test.
Solution
========
Add a disable flag `--disable-precious-tests` and set
`PRECIOUS_TESTS` to true by default when configuring lttng-tools. When
`PRECIOUS_TESTS` is set, all test logs in `tests/regression` will be
marked as `.PRECIOUS` and subsequently not deleted when interrupted.
Known drawbacks
===============
This could make interrupting a test and re-running during test
development more of a hassle.
Michael Jeanson [Tue, 28 May 2024 21:18:35 +0000 (17:18 -0400)]
lttng: add-trigger: clarify terminology for log levels
To eliminate ambiguity in the code, the terminology for log levels has
been updated. The previous terms "min" and "max" log levels have been
replaced with "least_severe" and "most_severe" respectively.
This change addresses the varying conventions across different logging
domains, where numerical values for severity can either increase or
decrease with severity. The new terminology provides clarity, making it
easier to understand the severity levels regardless of the logging
domain's convention.
Change-Id: Ie90bcc8e4c07b8b7437d9580e166141fae5c6d2f Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Michael Jeanson [Tue, 28 May 2024 19:08:22 +0000 (15:08 -0400)]
Fix: inverted logic in loglevel_parse_range_string_common function
The mapping of numerical severity levels to their corresponding names
varies across different logging domains. Some domains, like
Java Util Logging, use higher numerical values for more severe logging
levels, while others, like Log4j2, use lower values for the same
purpose.
To accommodate this variation, the `loglevel_parse_range_string_common`
function has been updated. It now accepts the numerical value
representing the most severe logging level in a given domain. This
change ensures that log level specifications in the format `TRACE..` are
parsed correctly, regardless of the domain's convention.
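A minimal sketch of how an open-ended range such as `TRACE..` can be resolved once the domain's most severe value is known. The `loglevel_range` type and both helpers are hypothetical and are not the signature of `loglevel_parse_range_string_common`. The numeric values used in the usage note are the standard java.util.logging and Log4j 2.x level values.

```cpp
#include <algorithm>

// Hypothetical range type using the "least_severe"/"most_severe" terminology.
struct loglevel_range {
	int least_severe;
	int most_severe;
};

// Resolves "NAME..": from the named level up to the domain's most severe
// level, whichever direction the domain's numeric scale runs in.
loglevel_range open_ended_range(int named_level, int domain_most_severe)
{
	return { named_level, domain_most_severe };
}

// Membership test that does not assume which bound is numerically larger.
bool range_contains(const loglevel_range& range, int level)
{
	const int low = std::min(range.least_severe, range.most_severe);
	const int high = std::max(range.least_severe, range.most_severe);

	return level >= low && level <= high;
}
```

With java.util.logging (FINEST = 300, SEVERE = 1000), `open_ended_range(300, 1000)` contains INFO (800). With Log4j 2.x (TRACE = 600, FATAL = 100), `open_ended_range(600, 100)` contains ERROR (200), even though the scale is reversed.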
Change-Id: Idbc3949ac33b69c71fce484a6d8912f59cdbe08d Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Michael Jeanson [Fri, 11 Feb 2022 15:38:10 +0000 (15:38 +0000)]
Add Log4j 2.x agent tests for the 'log4j2' domain
Add integration tests for the new Log4j 2.x agent in its native mode
using the new 'log4j2' domain. Use the new configure switch
'--enable-test-java-agent-log4j2' to enable them, or
'--enable-test-java-agent-all' to enable all Java agent tests.
To run only this new test, use this command:
cd tests/regression && make check TESTS="ust/java-log4j2/test_agent_log4j2_domain_log4j2"
Change-Id: Idfac151d2e523b5ac109f2dae2f182b0bc9415d8 Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Michael Jeanson [Wed, 2 Feb 2022 20:04:09 +0000 (20:04 +0000)]
Add a Log4j 2.x agent specific domain 'log4j2'
The initial version of the new LTTng-UST Log4j 2.x agent only operated
in a compatibility mode making use of the existing 'log4j' tracing
domain currently implemented in LTTng-Tools.
While this is useful when migrating existing Log4j applications using
the compatibility bridge it does require converting the log levels from
the new Log4j 2.x values to the old Log4j 1.x standard. This results in
hiding the actual log level values from the users for applications
natively using Log4j 2.x.
Exposing the native Log4j 2.x log level values requires a new domain
since the changes are significant:
* The same list of standard log levels and names
* Each standard log level has a new integer value
* The log levels scale is reversed and shortened from
'int32_max -> int32_min' to '0 -> int32_max'
* The interval between standard log levels has changed
This new 'log4j2' domain is basically a straight copy of the current
'log4j' domain with minor adjustments for the reversed and shortened
scale.
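The differences listed above can be illustrated with the standard level values of both Log4j generations. The enumerator and function names below are hypothetical and chosen for this sketch only; the numeric values are the standard ones published by Log4j 1.x and Log4j 2.x.

```cpp
// Standard Log4j 1.x levels: larger values are more severe.
enum log4j1_level : int {
	LOG4J1_TRACE = 5000,
	LOG4J1_DEBUG = 10000,
	LOG4J1_INFO = 20000,
	LOG4J1_WARN = 30000,
	LOG4J1_ERROR = 40000,
	LOG4J1_FATAL = 50000,
};

// Standard Log4j 2.x levels: smaller values are more severe.
enum log4j2_level : int {
	LOG4J2_FATAL = 100,
	LOG4J2_ERROR = 200,
	LOG4J2_WARN = 300,
	LOG4J2_INFO = 400,
	LOG4J2_DEBUG = 500,
	LOG4J2_TRACE = 600,
};

// The same question needs opposite comparisons in each domain.
constexpr bool log4j1_at_least_as_severe(int level, int threshold)
{
	return level >= threshold;
}

constexpr bool log4j2_at_least_as_severe(int level, int threshold)
{
	return level <= threshold;
}
```

The level names are shared, but both the spacing between levels and the direction of the scale differ, which is why a plain value conversion hides the native levels from users.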
Change-Id: I89f9c0a428ffe1d0bd26f7af547e9e21503de653 Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Two auto-generated files cause clang-format < v17 to hang when
they are being formatted. I have not looked into the root-cause,
but formatting them is useless anyhow.
Adding them to .clang-format-ignore works around the problem for the
moment.
Since clang-format 14 does not support ignore files, their support is
crudely emulated here using grep to filter out find's results.
Kienan Stewart [Wed, 10 Jul 2024 18:14:14 +0000 (14:14 -0400)]
Fix: Crash when unregistering UST apps during shutdown
Observed issue
==============
The following crash has been observed in v2.12.2:
```
function=0x55ac7c4c9600 <__PRETTY_FUNCTION__.12873> "lttng_ustconsumer_close_metadata") at assert.c:92
function=0x55ac7c4c9600 <__PRETTY_FUNCTION__.12873> "lttng_ustconsumer_close_metadata") at assert.c:101
```
The underlying cause is applicable in the current master branch as
well.
Cause
=====
There is a potential race between the consumerd control thread, which
handles commands coming from the sessiond, and the main thread when
shutting down a consumerd.
It is possible that the following happens:
1. `destroy_metadata_stream_ht` has the locks on `consumer_data`,
`channel`, `stream`
2. `lttng_ustconsumer_close_all_metadata` looks up the channel and starts to try and acquire a channel lock (`stream->chan->lock`)
3. `destroy_metadata_stream_ht` sets `stream->chan` to `null`
4. `destroy_metadata_stream_ht` releases the `stream`, `channel`, and `consumer_data` locks
5. `lttng_ustconsumer_close_all_metadata` now has the channel lock, and looks up `stream->chan` again to call `destroy_metadata_stream_ht`, and that member is now null
Solution
========
Acquire the stream lock after acquiring the channel lock.
Part 2 follows: do not set `stream->chan` to null.
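A minimal sketch of the lock ordering described in the solution, using `std::mutex` in place of the consumer daemon's own locks. The `example_channel` and `example_stream` types and the `close_metadata()` function are hypothetical. The sketch assumes, as the second part of the fix states, that the stream's channel pointer is never reset while the stream exists.

```cpp
#include <mutex>

// Hypothetical, simplified stand-ins for the consumer's channel and stream.
struct example_channel {
	std::mutex lock;
	bool metadata_closed = false;
};

struct example_stream {
	std::mutex lock;
	example_channel *chan = nullptr;
};

// Always take the channel lock first, then the stream lock, so that every
// path agrees on a single acquisition order.
bool close_metadata(example_stream& metadata_stream)
{
	example_channel *chan = metadata_stream.chan;

	if (!chan) {
		return false;
	}

	std::lock_guard<std::mutex> channel_guard(chan->lock);
	std::lock_guard<std::mutex> stream_guard(metadata_stream.lock);

	chan->metadata_closed = true;
	return true;
}
```

Both guards are released in reverse order when the function returns, so the channel lock can be acquired again afterwards.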
Known drawbacks
===============
None.
Change-Id: I1d27ea6ac08f3e7ed4624a8921cffb675be649d2 Signed-off-by: Kienan Stewart <kstewart@efficios.com> Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Kienan Stewart [Tue, 6 Aug 2024 15:32:57 +0000 (11:32 -0400)]
Fix: Compilation failure deducing type of `auto` variables in GCC 4.8
Observed issue
==============
When compiling with GCC 4.8.5 or GCC 5.5.0 on SLES12SP5, the following
error happens:
```
save.cpp: In function 'int save_agent_events(config_writer*, agent*)':
save.cpp:1185:43: error: use of 'agent_event' before deduction of 'auto'
lttng::urcu::lfht_iteration_adapter<agent_event,
^
save.cpp:1185:43: error: use of 'agent_event' before deduction of 'auto'
save.cpp:1185:43: error: use of 'agent_event' before deduction of 'auto'
save.cpp:1187:26: error: template argument 1 is invalid
&agent_event::node>(*agent->events->ht)) {
^
save.cpp:1187:26: error: creating pointer to member of non-class type '<type error>'
save.cpp:1187:26: note: invalid template non-type parameter
In file included from ../../../src/vendor/fmt/core.h:3316:0,
from ../../../src/common/format.hpp:20,
from ../../../src/common/error.hpp:13,
from ../../../src/common/common.hpp:12,
from snapshot.hpp:13,
from consumer.hpp:12,
from session.hpp:11,
from kernel.hpp:13,
from save.cpp:10:
```
Cause
=====
This appears to be a limitation in older versions of GCC. I did not
find specific commit(s) or bugs which highlight the issue, but
compilation of this code works as of GCC 6.5.0 on SLES12SP5. Previous
point releases of GCC 6.x were not tested.
Solution
========
Explicitly define the type of the pointer and the type passed to
`lttng::urcu::lfht_iteration_adapter` so the compiler does not have
to perform type deduction.
Known drawbacks
===============
None.
Change-Id: I71c5937a38336756ece4f396ea5ba7af7f3d36c3 Signed-off-by: Kienan Stewart <kstewart@efficios.com> Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Kienan Stewart [Tue, 6 Aug 2024 15:14:49 +0000 (11:14 -0400)]
Fix: Compilation failure in session_not_found_error with GCC 4.8
Observed issue
==============
When compiling with gcc 4.8.5, the compilation fails with the
following error:
```
session.hpp:577:2: error: function 'lttng::sessiond::exceptions::session_not_found_error::session_not_found_error(lttng::sessiond::exceptions::session_not_found_error&&)' defaulted on its first declaration with an exception-specification that differs from the implicit declaration 'lttng::sessiond::exceptions::session_not_found_error::session_not_found_error(lttng::sessiond::exceptions::session_not_found_error&&)'
session.hpp:577:2: error: function 'lttng::sessiond::exceptions::session_not_found_error::session_not_found_error(lttng::sessiond::exceptions::session_not_found_error&&)' defaulted on its first declaration with an exception-specification that differs from the implicit declaration 'lttng::sessiond::exceptions::session_not_found_error::session_not_found_error(lttng::sessiond::exceptions::session_not_found_error&&)'
```
Cause
=====
This is due to a bug in GCC which is fixed as of GCC 5.0[1].
Solution
========
Do not explicitly define the move_assignable for
`lttng::sessiond::exceptions::session_not_found_error` as
`noexcept`. The function should be implicitly generated as `noexcept`.
Add a type-safe cds_list iteration adapter. Like those provided for
the lfht, this adapter provides type-safe range-for semantics for
cds_list structures.
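A minimal sketch of what a type-safe iteration adapter over an intrusive list can look like. Everything here is hypothetical: `list_node` only mimics the shape of a circular doubly-linked list head, and the adapter is not the project's implementation. To stay within well-defined C++, this sketch assumes the node is the first member of a standard-layout element type, which avoids the need for `offsetof`.

```cpp
#include <type_traits>

// Hypothetical circular doubly-linked list node.
struct list_node {
	list_node *next;
	list_node *prev;
};

inline void list_init(list_node& head)
{
	head.next = head.prev = &head;
}

inline void list_add_tail(list_node& node, list_node& head)
{
	node.prev = head.prev;
	node.next = &head;
	head.prev->next = &node;
	head.prev = &node;
}

// Assumes the list_node is the FIRST member of ElementType.
template <typename ElementType>
class list_iteration_adapter {
public:
	static_assert(std::is_standard_layout<ElementType>::value,
		      "ElementType must be standard-layout");

	class iterator {
	public:
		explicit iterator(list_node *position) : _position(position) {}
		ElementType& operator*() const
		{
			// Valid because the node is the element's first member.
			return *reinterpret_cast<ElementType *>(_position);
		}
		iterator& operator++()
		{
			_position = _position->next;
			return *this;
		}
		bool operator!=(const iterator& other) const
		{
			return _position != other._position;
		}

	private:
		list_node *_position;
	};

	explicit list_iteration_adapter(list_node& head) : _head(head) {}
	iterator begin() const { return iterator(_head.next); }
	iterator end() const { return iterator(&_head); }

private:
	list_node& _head;
};

// Example element; `node` must remain the first member.
struct session_entry {
	list_node node;
	int id;
};
```

With this in place, `for (session_entry& entry : list_iteration_adapter<session_entry>(head))` visits each element with its proper type, without any macro at the call site.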
The urcu lfht macros often make use of caa_container_of (and other equivalent
variations) which use offsetof. Unfortunately, offsetof is only conditionally
supported by compilers for non-POD types.
The tree already has lttng::utils::container_of to work around this
problem. This new utility makes it possible to iterate over the
elements of an lfht that match a given key without using those macros.
Those iterations are the main reason such warnings are emitted. The
interface of lfht_filtered_iteration_adapter also allows the use of
range-based for loops.