glibc 2.34 implements close_range(2), which is used by the ssh client
(amongst others). This needs to be overridden to make sure ssh does not
close lttng-ust file descriptors.
Olivier Dion [Thu, 21 Mar 2024 18:42:13 +0000 (14:42 -0400)]
lttng-ust(3): Fix wrong len_type for sequence
`len_type' of a sequence field must be an unsigned integer type. Some
examples in the man page incorrectly used a signed integer type, which
compiled correctly but caused errors while decoding.
Change-Id: Icc685b330d0704660b36f703075f453d71c5e4cb Signed-off-by: Olivier Dion <odion@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fix: libc wrapper: use initial-exec for malloc_nesting TLS
Use the initial-exec TLS model for the malloc_nesting nesting guard
variable to ensure that the glibc implementation of the TLS access does
not trigger infinite recursion by calling the memory allocator wrapper
functions, which can happen with global-dynamic.
Considering that the libc wrapper is meant to be loaded with LD_PRELOAD
anyway (never with dlopen(3)), we always expect the libc to have enough
space to hold the malloc_nesting variable.
In addition to changing malloc_nesting from global-dynamic to
initial-exec, this removes the URCU TLS compatibility layer from the
libc wrapper, which is a good thing: this compatibility layer relies
on pthread key and calloc internally, which makes it a bad fit for TLS
accesses guarding access to malloc wrappers, due to possible infinite
recursion.
Michael Jeanson [Thu, 14 Dec 2023 15:46:56 +0000 (10:46 -0500)]
fix: -Wsingle-bit-bitfield-constant-conversion with clang16
We get the following warning with Clang 16:
lttng-ust-abi.c:558:38: warning: implicit truncation from 'int' to a one-bit wide bit-field changes value from 1 to -1 [-Wsingle-bit-bitfield-constant-conversion]
lttng_chan_buf->priv->parent.tstate = 1;
My understanding is that there is no bug because we only check if the
values are zero or not, so we can silence the warning by making the
variables unsigned.
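As a minimal sketch of why the warning appears, assuming a compiler such as GCC or Clang where a signed one-bit bit-field can only hold 0 and -1. The structure and function names below are hypothetical and are not lttng-ust code.

```c
#include <assert.h>

/* Hypothetical structures, for illustration only. */
struct flags_signed {
	signed int tstate:1;	/* representable values: 0 and -1 */
};

struct flags_unsigned {
	unsigned int tstate:1;	/* representable values: 0 and 1 */
};

/* Store v in a signed one-bit field and read it back. */
int store_signed(int v)
{
	struct flags_signed f = { 0 };

	f.tstate = v;	/* 1 does not fit; GCC and Clang store -1 */
	return f.tstate;
}

/* Store v (0 or 1) in an unsigned one-bit field and read it back. */
int store_unsigned(unsigned int v)
{
	struct flags_unsigned f = { 0 };

	assert(v <= 1);
	f.tstate = v;
	return f.tstate;
}
```

Code that only tests the field against zero behaves the same with both variants, which matches the reasoning above that the warning does not point at an actual bug.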
Change-Id: Ic4e02164d5adf4271fa24e5b13e5d320ae19de2e Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Tue, 17 Oct 2023 19:02:44 +0000 (15:02 -0400)]
fix: clean java inner class files in examples
Java classes that contain inner classes will result in additional class
files being created when compiled in the form of
'Class$InnerClass.class'. Expand the clean target to delete those
additional files.
Change-Id: I0ed7939dcaefa5ca26db9438f7a9b34e57d78f21 Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Eliminate iteration over unmodified enablers when synchronizing the
enablers vs event state.
The intent is to turn an O(m*n) algorithm (m = number of enablers, n =
number of event probes) into an O(n) one when enabling many additional
events while tracing is active.
This change is done both for event enablers and for event notifier
enablers.
Running the LTTng-tools tests (test_valid_filter, for example) under
address sanitizer results in the following warning:
/usr/include/lttng/urcu/static/urcu-ust.h:155:6: runtime error: member access within misaligned address 0x7fc45db3a020 for type 'struct lttng_ust_urcu_reader', which requires 128 byte alignment
0x7fc45db3a020: note: pointer points here
c4 7f 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
^
While the node member of lttng_ust_urcu_reader has an "aligned"
attribute of CAA_CACHE_LINE_SIZE, the compiler can't ensure the
alignment of members for dynamically allocated instances.
The `data` pointer is changed from char* to struct
lttng_ust_urcu_reader*, allowing the compiler to enforce the expected
alignment constraints.
Since `data` was addressed in bytes, the code using this field is
adapted to use element counts. As the chunks are only used to allocate
reader instances (and not other types), it makes the code a bit easier
to read.
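The idea of a typed, element-indexed chunk can be sketched as follows. This is not the lttng-ust implementation: `struct reader`, `struct reader_chunk` and the helper functions are hypothetical, the 128-byte alignment is taken from the sanitizer report above, and `aligned_alloc()` is used only to keep the sketch self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define READER_ALIGN 128	/* assumed cache-line size */

/* Hypothetical stand-in for struct lttng_ust_urcu_reader. */
struct reader {
	unsigned long ctr;
	char alloc;
} __attribute__((aligned(READER_ALIGN)));

/* Chunk whose storage is typed and counted in elements, not bytes. */
struct reader_chunk {
	struct reader *data;
	size_t capacity;	/* number of struct reader slots */
	size_t used;		/* number of slots handed out */
};

/* Allocate capacity zeroed slots. Return 0 on success, -1 on failure. */
int chunk_init(struct reader_chunk *chunk, size_t capacity)
{
	assert(capacity > 0);
	/* sizeof(struct reader) is a multiple of its alignment. */
	chunk->data = aligned_alloc(READER_ALIGN,
			capacity * sizeof(struct reader));
	if (!chunk->data)
		return -1;
	memset(chunk->data, 0, capacity * sizeof(struct reader));
	chunk->capacity = capacity;
	chunk->used = 0;
	return 0;
}

/* Hand out the next free slot, or NULL when the chunk is full. */
struct reader *chunk_get_reader(struct reader_chunk *chunk)
{
	if (chunk->used == chunk->capacity)
		return NULL;
	return &chunk->data[chunk->used++];	/* element index, no byte math */
}
```

Indexing `data` as an array of `struct reader` keeps every slot on a multiple of the structure size, so each slot inherits the alignment of the base address.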
Olivier Dion [Tue, 15 Aug 2023 14:47:06 +0000 (10:47 -0400)]
ustfork: Fix warning about volatile qualifier
Clang is strict about the volatile qualifier on function pointers. It
also wants pointers to be passed to atomic builtins, even for
functions. Therefore, use the addresses of function pointers even if
this is unnecessary according to the C standard.
Change-Id: I5d553a46671cc4bfbe8de5cec2425201459f60d2 Signed-off-by: Olivier Dion <odion@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Olivier Dion [Wed, 9 Aug 2023 21:35:40 +0000 (17:35 -0400)]
ustfork: Fix possible race conditions
Assuming that `dlsym(RTLD_NEXT, "symbol")' is invariant for "symbol",
then we could think that memory operations on the `plibc_func' pointers can
be safely done without atomics.
However, consider what would happen if a load from a `plibc_func' pointer
is torn apart by the compiler. Then a thread could see:
1) NULL
2) The stored value as returned by a dlsym() call
3) A mix of 1) and 2)
The same goes for other optimizations that a compiler is authorized to
do (e.g. store tearing, load fusing).
One could question whether such race condition is even possible for the
clone(2) wrapper. Indeed, a thread must be cloned to get into
existence. Therefore, the main thread would always store the value of
`plibc_func' at least once before creating the first sibling thread,
preventing any possible race condition for this wrapper. However, this
assumes that the main thread will not call the clone system call directly
before calling the libc wrapper! Thus, to be on the safe side, we do the
same for the clone wrapper.
Fix the race conditions by using the uatomic_read/uatomic_set functions,
on access to `plibc_func' pointers.
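The pattern can be sketched as below. The commit uses liburcu's uatomic_read/uatomic_set; this sketch swaps in the GCC/Clang `__atomic_load`/`__atomic_store` builtins to stay self-contained, and `lookup_symbol()` is a hypothetical stand-in for the `dlsym(RTLD_NEXT, ...)` call.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*fork_fn)(void);

static fork_fn plibc_func;	/* cached pointer, shared between threads */
static int lookup_count;	/* slow-path counter, for this demo only */

/* Hypothetical stand-in for the real libc function. */
static int fake_libc_fork(void)
{
	return 42;
}

/* Hypothetical stand-in for dlsym(RTLD_NEXT, "fork"). */
static fork_fn lookup_symbol(void)
{
	lookup_count++;
	return fake_libc_fork;
}

int call_wrapped(void)
{
	fork_fn func;

	/* Atomic load: the compiler may not tear or fuse this access. */
	__atomic_load(&plibc_func, &func, __ATOMIC_RELAXED);
	if (func == NULL) {
		func = lookup_symbol();
		/* Racing threads would store the same value. */
		__atomic_store(&plibc_func, &func, __ATOMIC_RELAXED);
	}
	return func();
}
```

Note that the builtins receive the address of the function pointer, in line with the Clang behavior described in the ustfork warning fix.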
Change-Id: Ic4be25983b8836d2b333f367af9c18d2f6b75879 Signed-off-by: Olivier Dion <odion@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Wed, 14 Jun 2023 20:55:28 +0000 (16:55 -0400)]
fix: python agent: use stdlib distutils when setuptools is installed
When the setuptools package is installed, it monkey patches the standard
library distutils even if the user code doesn't import setuptools.
This results in a failure to install the python agent in a directory
which isn't in the current PYTHONPATH. To allow this, setuptools requires
the '--single-version-externally-managed' option, which is not
implemented in distutils.
To resolve this, force the use of distutils for python < 3.12 even when
setuptools is installed with the 'SETUPTOOLS_USE_DISTUTILS' environment
variable and use the proper setuptools option with python >= 3.12 which
doesn't include distutils anymore.
Change-Id: Idf477ca61bed460c9f6be7f481fe3b84624f328c Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Wed, 14 Jun 2023 19:58:32 +0000 (15:58 -0400)]
fix: python agent: install on Debian python >= 3.10
Starting with Debian's Python 3.10, the default install scheme is
'posix_local' which is a Debian specific scheme based on 'posix_prefix'
but with an added 'local' prefix. This is the default so users doing
system wide manual installations of python modules end up in
'/usr/local'. This interferes with our autotools based install which
already defaults to '/usr/local' and expects a provided prefix to be used
verbatim.
Monkeypatch sysconfig to override this scheme and use 'posix_prefix' instead.
Change-Id: I08fe77b6c8807515765e3ad0344aa6849e573b90 Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fix: segmentation fault on filter interpretation in "switch" mode
When building the interpreter with `INTERPRETER_USE_SWITCH`, I get the
following crash when interpreting a bytecode:
Program terminated with signal SIGSEGV, Segmentation fault.
(gdb) bt
#0 0x00007f5789aee443 in lttng_bytecode_interpret (ust_bytecode=0x555dfe90a650, interpreter_stack_data=0x7ffd12615500 "", probe_ctx=0x7ffd12615620,
caller_ctx=0x7ffd126154bc) at lttng-bytecode-interpreter.c:885
#1 0x00007f5789af4da2 in lttng_ust_interpret_event_filter (event=0x555dfe90a580, interpreter_stack_data=0x7ffd12615500 "", probe_ctx=0x7ffd12615620,
event_filter_ctx=0x0) at lttng-bytecode-interpreter.c:2548
#2 0x0000555dfe02d2d4 in lttng_ust__event_probe__tp___the_string (__tp_data=0x555dfe90a580, i=0, arg_i=2, str=0x7ffd12617cfa "hypothec") at ././tp.h:16
#3 0x0000555dfe02cac0 in lttng_ust_tracepoint_cb_tp___the_string (str=0x7ffd12617cfa "hypothec", arg_i=2, i=0)
at /tmp/lttng-master/src/lttng-tools/tests/utils/testapp/gen-ust-nevents-str/tp.h:16
#4 main (argc=39, argv=0x7ffd12615818) at gen-ust-nevents-str.cpp:38
This appears to be caused by `bytecode->data` being used to determine
the `start_pc` address. In my case, `data` is NULL. A quick look around
the code seems to show that this member is not used except during the
transmission of the bytecode.
I am basing the fix on the implementation of START_OP in the default
case which uses `code` in lieu of `data` and can confirm that it fixes
the crash on my end.
Fix: c99: static assert: clang build fails due to multiple typedef
Unlike C11, C99 does not allow redefinition of the same typedef, and
clang is strict about it. Building code with tracepoints with -std=c99
with clang fails with:
warning: redefinition of typedef 'lttng_ust_static_assert_Tracepoint_name_length_is_too_long' is a C11 feature [-Wtypedef-redefinition]
Fix this by placing the (potentially negative size) array as argument to
a function prototype instead.
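A minimal sketch of the prototype-based technique, using a hypothetical macro name rather than the actual lttng-ust one: a false predicate gives the array parameter a negative size, which the compiler rejects, while repeating an identical prototype is valid in C99.

```c
#include <assert.h>

/*
 * Hypothetical macro. A true predicate yields char[1], a false one
 * yields char[-1], which fails to compile.
 */
#define SKETCH_STATIC_ASSERT(predicate) \
	void sketch_static_assert_proto(char assertion[2 * !!(predicate) - 1])

#define SKETCH_NAME_MAX_LEN 16	/* assumed limit, for illustration */

/* Two assertions in one compile unit: the prototype is simply repeated. */
SKETCH_STATIC_ASSERT(sizeof("short_name") <= SKETCH_NAME_MAX_LEN);
SKETCH_STATIC_ASSERT(sizeof("other_name") <= SKETCH_NAME_MAX_LEN);
```

The prototype is never defined or called, so it adds no symbol to the resulting object file.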
It is caused by the fact that tracef.h includes tracepoint.h in a
context which has LTTNG_UST_TRACEPOINT_DEFINE undefined, and this is not
re-evaluated for the following includes.
Fix this by lifting the definition code in tracepoint.h outside of the
header include guards, and #undef the old LTTNG_UST__DEFINE_TRACEPOINT
before re-defining it to its new semantic. Use a new
_LTTNG_UST_TRACEPOINT_DEFINE_ONCE include guard within the
LTTNG_UST_TRACEPOINT_DEFINE defined case to ensure symbols are not
duplicated.
Wrap constructor and destructor functions to invoke them as functions with
the constructor/destructor GNU C attributes, which ensures that those
constructors/destructors are ordered before/after C++
constructors/destructors.
Wrap constructor and destructor functions as the constructor/destructor of a
variable defined within an anonymous namespace when building as C++ with
LTTNG_UST_ALLOCATE_COMPOUND_LITERAL_ON_HEAP defined. With this option,
there are no guarantees that the events in C++ constructors/destructors will
be traced.
Fixes: 05bfa3dc3a6e ("Fix: generate probe registration constructor as a C++ constuctor") Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: If058b15af6b4d8852fa29d0a21b8233bcb4b43a2
Adding a priority (150) to the tracepoint and tracepoint provider
constructors/destructors ensures that we trace tracepoints located
within C constructors/destructors with a higher priority value,
including the default init priority of 65535, when the tracepoint vs
tracepoint definition vs tracepoint probe provider are in different
compile units (and in various link order one compared to another).
Fix: use unaligned pointer accesses for lttng_inline_memcpy
lttng_inline_memcpy receives pointers which can be unaligned. This
causes issues (traps) specifically on arm 32-bit with 8-byte strings
(including \0).
Use unaligned pointer accesses for loads/stores within
lttng_inline_memcpy instead.
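One common way to express an unaligned access with GCC and Clang is a packed wrapper structure. The sketch below uses that approach under the assumption that it resembles what the fix does; the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Packed wrapper: tells the compiler that the value may sit at any
 * address. may_alias allows using it on top of a char buffer.
 */
struct unaligned_u64 {
	uint64_t v;
} __attribute__((packed, may_alias));

/* Copy 8 bytes without assuming that src or dest is 8-byte aligned. */
void copy_u64_unaligned(void *dest, const void *src)
{
	const struct unaligned_u64 *s = src;
	struct unaligned_u64 *d = dest;

	d->v = s->v;
}
```

On architectures with cheap unaligned accesses the compiler can still emit a single load and store, which is consistent with the "unchanged" entries in the list of architectures in this commit message.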
There is an impact on code generation on some architectures. Using the
following test code on godbolt.org:
The resulting assembler (gcc 12.2.0 in -O2) between aligned and
unaligned:
- x86-32: unchanged.
- x86-64: unchanged.
- powerpc32: unchanged.
- powerpc64: unchanged.
- arm32: 16 and 32-bit copy: unchanged. Added code for 64-bit unaligned copy.
- aarch64: unchanged.
- mips32: added code for unaligned.
- mips64: added code for unaligned.
- riscv: added code for unaligned.
If we want to improve the situation on mips and riscv, this would
require introducing a new "lttng_inline_integer_copy" and exposing
additional ring buffer client APIs in addition to event_write() which
take integers as inputs. Let's not introduce that complexity yet until
it is justified.
Reject specialized load ref and get context ref instructions so a
bytecode crafted with nefarious intent cannot read a memory area larger
than the memory targeted by the instrumentation.
This prevents bytecode received from the session daemon from performing
out of bound memory accesses and from disclosing the content of
application memory beyond what has been targeted by the instrumentation.
Reject specialized load instructions so a bytecode crafted with
nefarious intent cannot read a memory area larger than the memory
targeted by the instrumentation.
This prevents bytecode received from the session daemon from performing
out of bound memory accesses and from disclosing the content of
application memory beyond what has been targeted by the instrumentation.
Validate that the buffer length is large enough to hold empty capture
fields.
If the buffer is initially not large enough to hold empty capture fields
for each field to capture, discard the notification.
If after capturing a field there is not enough room anymore in the
buffer to write empty capture fields, skip the offending large field by
writing an empty capture field in its place.
The code currently assumes that the forked process is the only child
process at that point in time. However, there can be unreaped child
processes as reported in the original bug.
From wait(3), as currently used, "status is requested for any child
process."
Using the pid explicitly ensures a wait on the expected child process.
More context is available at:
https://bugs.lttng.org/issues/1359
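The difference between waiting on any child and waiting on a specific pid can be sketched as follows. `spawn_child()` and `wait_for_child()` are hypothetical helpers written for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits immediately with the given code. */
pid_t spawn_child(int code)
{
	pid_t pid = fork();

	if (pid == 0)
		_exit(code);
	return pid;	/* -1 on error */
}

/* Wait for one specific child; return its exit code, or -1 on error. */
int wait_for_child(pid_t pid)
{
	int status;
	pid_t ret;

	do {
		/* Unlike wait(), this ignores other unreaped children. */
		ret = waitpid(pid, &status, 0);
	} while (ret == -1 && errno == EINTR);
	if (ret != pid || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

With two children alive at the same time, `wait_for_child()` returns the status of the requested one even if the other child exited first.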
Michael Jeanson [Thu, 7 Jul 2022 21:01:54 +0000 (17:01 -0400)]
fix: 'make dist' without javah
Don't use 'BUILT_SOURCES' for the header file generated by javah /
javac, files added to this target will be generated on 'make dist'
regardless of the configuration or presence of the required tools.
Add proper make dependencies between the different targets instead of
using 'all-local'.
Set JAVAROOT to a temporary directory to properly clean class files and
avoid confusing javah when it's used to generate the JNI header.
Change-Id: I8544d0418039ba667d062cb01c924368ab702ab7 Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fix: disable array/sequence compile-time type check in C
Disable this compile-time check in C. Indeed, the C implementation of
lttng_ust_is_pointer_type does not support opaque pointer types, because
it relies on pointer arithmetic.
Therefore, remove this check to keep supporting opaque pointers as
array/sequence elements in probe providers.
The worst that could happen is that users providing an unsupported
type as array/sequence element will end up with a meaningless integer
field.
Michael Jeanson [Fri, 29 Jul 2022 15:12:57 +0000 (11:12 -0400)]
fix: add missing closedir in _get_max_cpuid_from_sysfs()
As reported by Coverity:
*** CID 1490849: (RESOURCE_LEAK)
/src/common/smp.c: 84 in _get_max_cpuid_from_sysfs()
78 * CPU num of 0.
79 */
80 if (max_cpuid < 0 || max_cpuid > INT_MAX)
81 max_cpuid = -1;
82
83 end:
>>> CID 1490849: (RESOURCE_LEAK)
>>> Variable "cpudir" going out of scope leaks the storage it points to.
84 return max_cpuid;
85 }
86
87 /*
88 * As a fallback to parsing the CPU mask in "/sys/devices/system/cpu/possible",
89 * iterate on all the folders in "/sys/devices/system/cpu" that start with
Change-Id: I2048e2473d66aaa2a275fe2923da84a7e105f235 Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Wed, 27 Jul 2022 20:19:35 +0000 (16:19 -0400)]
fix: Unify possible CPU number fallback
The MUSL specific fallback to get the number of possible CPUs in the
system has the same issue with hot-unplugged CPUs as the Glibc
implementation we worked around by using the possible CPU mask from
sysfs.
To address this, unify our fallback code across all C libraries to get
the maximum CPU id from the directories in "/sys/devices/system/cpu".
Change-Id: I5541742dc1de8e011a942880825fa88c656f0905 Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Wed, 27 Jul 2022 14:54:53 +0000 (10:54 -0400)]
fix: removed accidental VLA in _get_num_possible_cpus()
The LTTNG_UST_PAGE_SIZE define can either point to a literal value or
the sysconf() function making buf[] a VLA. Replace this by a
cpumask-specific define that will always be a literal value.
Change-Id: I8d329f314878e8018939f979861918969e3ec8ac Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Wed, 20 Jul 2022 18:49:56 +0000 (14:49 -0400)]
fix: num_possible_cpus() with hot-unplugged CPUs
We rely on sysconf(_SC_NPROCESSORS_CONF) to get the maximum possible
number of CPUs that can be attached to the system for the lifetime of an
application. We use this value to allocate an array of per-CPU buffers
that is indexed by the numerical id of the CPUs.
As such we expect that the highest possible CPU id would be one less
than the number returned by sysconf(_SC_NPROCESSORS_CONF), which is
unfortunately not always the case and can vary across libc
implementations and versions.
Glibc up to 2.35 will count the number of "cpuX" directories in
"/sys/devices/system/cpu" which doesn't include CPUs that were
hot-unplugged.
This information is however provided by the kernel in
"/sys/devices/system/cpu/possible" in the form of a mask listing all the
CPUs that could possibly be hot-plugged in the system.
This patch changes the implementation of num_possible_cpus() to first
try parsing the possible CPU mask to extract the highest possible value
and, if this fails, fall back to the previous behavior.
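Extracting the highest CPU id from the contents of "/sys/devices/system/cpu/possible" could look like the sketch below. It assumes the usual list format such as "0-7" or "0,2-5", works on a string that has already been read from the file, and uses a hypothetical function name; the actual lttng-ust parser may differ.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Return the highest CPU id found in a list such as "0-3" or "0,2-5\n",
 * or -1 if the string contains no usable id.
 */
int max_cpuid_from_list(const char *list)
{
	long max_cpuid = -1;
	const char *p = list;

	assert(list != NULL);
	while (*p != '\0') {
		char *end;
		long val;

		if (!isdigit((unsigned char) *p)) {
			p++;	/* skip '-', ',' and the trailing newline */
			continue;
		}
		val = strtol(p, &end, 10);
		if (val > INT_MAX)
			return -1;
		if (val > max_cpuid)
			max_cpuid = val;
		p = end;
	}
	return (int) max_cpuid;
}
```

The number of possible CPUs is then the returned id plus one, and a return value of -1 is the signal to use the fallback path.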
Change-Id: I1a3cb1a446154ec443a391d6689cb7d4165726fd Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Thu, 21 Jul 2022 13:10:30 +0000 (09:10 -0400)]
fix: Disable warnings for GNU extensions on Clang
Some versions of Clang enabled '-Wgnu' in '-Wall'. Since we rely on
GNUisms in the code, this results in numerous errors. Check if the
compiler accepts '-Wno-gnu' to disable those warnings.
Change-Id: I9d1126744e427a6cf7c18e219cae5431227a43c0 Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
takeshi.iwanari [Fri, 24 Jun 2022 13:17:39 +0000 (22:17 +0900)]
Fix: Use negative value for error code of lttng_ust_ctl_duplicate_ust_object_data
[As is]
- `lttng_ust_ctl_duplicate_ust_object_data` function is called by the following functions:
- `event_notifier_error_accounting_register_app` (lttng-tools)
- `duplicate_stream_object` (lttng-tools)
- `duplicate_channel_object` (lttng-tools)
- `lttng_ust_ctl_duplicate_ust_object_data` function returns a positive value (= errno = 24 = EMFILE) when the `dup` system call returns an error
- However, the `duplicate_stream_object` and `duplicate_channel_object` functions expect a negative value as the error code
- As a result, these functions cannot handle the error, and a segmentation fault occurs when using `stream->handle`
[Proposal]
- Currently, the `lttng_ust_ctl_duplicate_ust_object_data` function returns either a positive or a negative value when an error happens
- The convention appears to be to use a negative value for error codes (e.g. `-ENOMEM`)
- So, I propose to change `errno` to `-errno`
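The negative error code convention can be illustrated with a small wrapper around `dup`. `duplicate_fd()` is a hypothetical helper, not the lttng-ust function discussed above.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Duplicate fd into *new_fd. Return 0 on success, -errno on failure. */
int duplicate_fd(int fd, int *new_fd)
{
	int ret;

	assert(new_fd != NULL);
	ret = dup(fd);
	if (ret < 0)
		return -errno;	/* e.g. -EMFILE or -EBADF, never positive */
	*new_fd = ret;
	return 0;
}
```

Callers can then rely on a single `ret < 0` check, which is what `duplicate_stream_object` and `duplicate_channel_object` are described as expecting.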
LTTng-UST scheme for letting listener threads wait on session daemon
to wake up a futex is similar to the liburcu workqueue code, which has
an issue with spurious wakeups.
This wait/wakeup scheme is only used after the LTTng-UST listener thread
has been unable to connect to the session daemon.
A spurious wakeup on wait_for_sessiond can cause wait_for_sessiond to
return with a sock_info->wait_shm_mmap state of 0, which is unexpected.
However, this should not cause any user-observable issues other than
using slightly more CPU time than strictly needed, because this spurious
wakeup will only cause an additional connection attempt to the session
daemon to fail.
Cause
=====
From futex(2):
FUTEX_WAIT
Returns 0 if the caller was woken up. Note that a wake-up can
also be caused by common futex usage patterns in unrelated code
that happened to have previously used the futex word's memory
location (e.g., typical futex-based implementations of Pthreads
mutexes can cause this under some conditions). Therefore, callers
should always conservatively assume that a return value of 0
can mean a spurious wake-up, and use the futex word's value
(i.e., the user-space synchronization scheme) to decide whether
to continue to block or not.
Solution
========
We therefore need to validate whether the value differs from 0 in
user-space after the call to FUTEX_WAIT returns 0.
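A minimal, Linux-only sketch of such a re-check loop is shown below. It is not the lttng-ust code: `wait_until_nonzero()` is a hypothetical name and the futex word is a plain `int32_t` supplied by the caller.

```c
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Block until *word becomes nonzero. Return 0, or -errno on error. */
int wait_until_nonzero(int32_t *word)
{
	assert(word != NULL);
	while (__atomic_load_n(word, __ATOMIC_ACQUIRE) == 0) {
		/* The kernel only sleeps if *word is still 0. */
		if (syscall(SYS_futex, word, FUTEX_WAIT, 0, NULL, NULL, 0) == 0)
			continue;	/* woken up, possibly spuriously: re-check */
		if (errno != EAGAIN && errno != EINTR)
			return -errno;
	}
	return 0;
}
```

The loop condition, not the return value of FUTEX_WAIT, decides whether the caller proceeds, so a spurious wakeup only costs one extra iteration.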
ust_lock_nocheck is meant to be async-signal-safe for use from the
fork() override helper (and fork(2) is async-signal-safe).
Remove calls to strerror() from ust lock functions and from the
cancelstate helper because strerror is not async-signal-safe and indeed
allocates memory.
Fix: remove non-async-signal-safe fflush from ERR()
Commit ff1fedb9f2e8 ("usterr: make error reporting functions signal safe")
changed the logging printout mechanism to use patient_write() to a file
descriptor to ensure signal-safety of the ERR() logging mechanism.
However, the fflush(stderr) was left in place, although it was useless.
Unfortunately, fflush() is not async-signal-safe.
Fix: Pointers are rejected by integer element compile time assertion for array and sequence
commit 2df82195d140b ("Add compile time assertion that array and
sequence have integer elements") introduced a check to validate that
sequences and arrays only contain integers. This was meant to refuse
arrays of double/float which are not supported.
However, as a side-effect, this also refuses arrays and sequences of
pointers, which were accepted prior to lttng-ust 2.13.
Introduce a lttng_ust_is_pointer_type() and use it in the array/sequence
type validation. The trick here is to use the fact that a difference
between two pointers in C is an integer. Therefore, we can validate that
an argument type is a pointer similarly to C++ is_pointer.
for (i = 0; i < 5; i++) {
tracepoint(sample_component, message, "Hello World");
usleep(1);
}
printf("Run `lttng regenerate statedump. Press enter \n");
getchar();
dlclose(handle_dog);
printf("Run `lttng regenerate statedump. Press enter \n");
getchar();
dlclose(handle_cat);
return 0;
}
On lttng side:
lttng create
lttng enable-event -u -a
lttng start
valgrind sample
Issue `lttng regenerate statedump` as the app suggests.
The second `lttng regenerate statedump` results in:
==934747== Invalid read of size 8
==934747== at 0x48BA90F: iter_end (lttng-ust-statedump.c:439)
==934747== by 0x48BAD73: lttng_ust_dl_update (lttng-ust-statedump.c:586)
==934747== by 0x48BADC0: do_baddr_statedump (lttng-ust-statedump.c:599)
==934747== by 0x48BAE62: do_lttng_ust_statedump (lttng-ust-statedump.c:633)
==934747== by 0x489F820: lttng_handle_pending_statedump (lttng-events.c:969)
==934747== by 0x488C000: handle_pending_statedump (lttng-ust-comm.c:717)
==934747== by 0x488DCF7: handle_message (lttng-ust-comm.c:1110)
==934747== by 0x48905EA: ust_listener_thread (lttng-ust-comm.c:1756)
==934747== by 0x4B62608: start_thread (pthread_create.c:477)
==934747== by 0x4A4D162: clone (clone.S:95)
==934747== Address 0x4c4ea88 is 4,152 bytes inside a block of size 4,176 free'd
==934747== at 0x483CA3F: free (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==934747== by 0x48B9588: free_dl_node (lttng-ust-statedump.c:123)
==934747== by 0x48BA90A: iter_end (lttng-ust-statedump.c:450)
==934747== by 0x48BAD73: lttng_ust_dl_update (lttng-ust-statedump.c:586)
==934747== by 0x48BADC0: do_baddr_statedump (lttng-ust-statedump.c:599)
==934747== by 0x48BAE62: do_lttng_ust_statedump (lttng-ust-statedump.c:633)
==934747== by 0x489F820: lttng_handle_pending_statedump (lttng-events.c:969)
==934747== by 0x488C000: handle_pending_statedump (lttng-ust-comm.c:717)
==934747== by 0x488DCF7: handle_message (lttng-ust-comm.c:1110)
==934747== by 0x48905EA: ust_listener_thread (lttng-ust-comm.c:1756)
==934747== by 0x4B62608: start_thread (pthread_create.c:477)
==934747== by 0x4A4D162: clone (clone.S:95)
==934747== Block was alloc'd at
==934747== at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==934747== by 0x48B936A: zmalloc (helper.h:27)
==934747== by 0x48B936A: alloc_dl_node (lttng-ust-statedump.c:85)
==934747== by 0x48B98F7: find_or_create_dl_node (lttng-ust-statedump.c:184)
==934747== by 0x48BA205: extract_baddr (lttng-ust-statedump.c:339)
==934747== by 0x48BABC6: extract_bin_info_events (lttng-ust-statedump.c:528)
==934747== by 0x4A8D2F4: dl_iterate_phdr (dl-iteratephdr.c:75)
==934747== by 0x48BAD4C: lttng_ust_dl_update (lttng-ust-statedump.c:583)
==934747== by 0x48BADC0: do_baddr_statedump (lttng-ust-statedump.c:599)
==934747== by 0x48BAE62: do_lttng_ust_statedump (lttng-ust-statedump.c:633)
==934747== by 0x489F820: lttng_handle_pending_statedump (lttng-events.c:969)
==934747== by 0x488C000: handle_pending_statedump (lttng-ust-comm.c:717)
==934747== by 0x488DCF7: handle_message (lttng-ust-comm.c:1110)
==934747==
Cause
=========
Nodes can be removed during the `cds_hlist_for_each_entry_2` iteration which
is not meant to be used when items are removed within the traversal.
Solution
=========
Use `cds_hlist_for_each_entry_safe_2`.
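The principle behind a "safe" iteration can be sketched with a hand-written singly linked list, which stands in for the liburcu hlist here. All names below are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node type, standing in for an hlist entry. */
struct node {
	int value;
	struct node *next;
};

/* Prepend a new node holding value; return the new head. */
struct node *push(struct node *head, int value)
{
	struct node *n = malloc(sizeof(*n));

	assert(n != NULL);
	n->value = value;
	n->next = head;
	return n;
}

/* Free every node holding an even value; return the new head. */
struct node *remove_even(struct node *head)
{
	struct node **link = &head;
	struct node *pos, *next;

	for (pos = head; pos != NULL; pos = next) {
		next = pos->next;	/* saved before pos can be freed */
		if (pos->value % 2 == 0) {
			*link = next;
			free(pos);
		} else {
			link = &pos->next;
		}
	}
	return head;
}
```

A non-safe loop would read `pos->next` after `free(pos)`, which is the same kind of invalid read that valgrind reports in `iter_end`.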
Change-Id: Ibf3d94a4d6f7abac19ed9740eeacfbcb1bdf1f4f Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fix: bytecode interpreter context_get_index() leaves byte order uninitialized
Observed Issue
==============
When using the event notification capture feature to capture a context
field, e.g. '$ctx.cpu_id', the captured value is often observed in
reverse byte order.
Cause
=====
Within the bytecode interpreter, context_get_index() leaves the "rev_bo"
field uninitialized in the top of stack.
This only affects the event notification capture bytecode because the
BYTECODE_OP_GET_SYMBOL bytecode instruction (as of lttng-tools 2.13)
is only generated for capture bytecode in lttng-tools. Therefore, only
capture bytecode targeting contexts are affected by this issue. The
reason why lttng-tools uses the "legacy" bytecode instruction to get
context (BYTECODE_OP_GET_CONTEXT_REF) for the filter bytecode is to
preserve backward compatibility of filtering when interacting with
applications linked against LTTng-UST 2.12.
Solution
========
Initialize the rev_bo field based on the context field type
reverse_byte_order field.
Sampling the discarded events count in the buffer_end callback is done
out of order, and may therefore include increments performed by following
events (in following packets) if the thread doing the end-of-packet
event write is preempted for a long time.
Sampling the event discarded counts before reserving space for the last
event in a packet, and keeping this as part of the private ring buffer
context, should fix this race.
When compiling with -DLTTNG_RING_BUFFER_COUNT_EVENTS, the lttng-ust
libringbuffer can count events (with additional overhead). This is never
used or enabled by default. Fix this code so it compiles again when the
define is enabled.
If exec(2) is executed by the application concurrently with LTTng-UST
listener threads between the creation of a file descriptor with
socket(2), recvmsg(2), or pipe(2) and the call to fcntl(2) FD_CLOEXEC, those
file descriptors will stay open after the exec, which is not intended.
As a consequence, shared memory files for ring buffers can stay present
on the file system for long-running traced processes.
Use:
- pipe2(2) O_CLOEXEC (supported since Linux 2.6.27, and by FreeBSD),
- socket(2) SOCK_CLOEXEC (supported since Linux 2.6.27, and by FreeBSD),
- recvmsg(2) MSG_CMSG_CLOEXEC (supported since Linux 2.6.23 and by FreeBSD),
rather than fcntl(2) FD_CLOEXEC to make sure the file descriptors are
closed on exec immediately upon their creation.
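For the socket(2) case, setting the flag at creation time can be sketched as follows, assuming Linux or another system that provides SOCK_CLOEXEC. The helper names are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a Unix stream socket that is close-on-exec from the start. */
int create_cloexec_socket(void)
{
	/* No window between creation and a later fcntl(F_SETFD) call. */
	return socket(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0);
}

/* Return 1 if FD_CLOEXEC is set on fd, 0 if not, -1 on error. */
int fd_is_cloexec(int fd)
{
	int flags = fcntl(fd, F_GETFD);

	if (flags < 0)
		return -1;
	return (flags & FD_CLOEXEC) ? 1 : 0;
}
```

The same reasoning applies to pipe2(2) with O_CLOEXEC and to recvmsg(2) with MSG_CMSG_CLOEXEC.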
Michael Jeanson [Thu, 10 Feb 2022 15:25:02 +0000 (15:25 +0000)]
Add 'domain' parameter to the Log4j 2.x agent
The initial Log4j 2.x agent commit only implemented a compatibility mode
to be used with the existing LOG4J domain in lttng-tools.
In this mode the agent converts the new Log4j 2.x loglevel values to
their corresponding Log4j 1.x values in the same way the upstream
compatibility bridge does.
This is great when doing in-place migration using the upstream
compatibility bridge but doesn't cover the usecase of an application
that natively uses Log4j 2.x.
This commit adds a new mandatory 'domain' parameter to the Log4j2 agent
which currently only implements the 'LOG4J' compatibility domain in
preparation for adding a 'LOG4J2' domain.
The configuration for a single appender in Log4j 1.x compat mode will
now look like this:
Michael Jeanson [Wed, 2 Feb 2022 19:04:50 +0000 (19:04 +0000)]
fix: Convert custom loglevels in Log4j 2.x agent
The loglevel integer representation has changed between log4j 1.x and
2.x. We currently convert the standard loglevels but pass through the
custom ones.
This can be problematic when using severity ranges as custom loglevels
won't be properly filtered.
Use the same strategy as the upstream Log4j 2.x compatibility layer by
converting the custom loglevels to their equivalent standard loglevel
value.
Change-Id: I8cbd4706cb774e334380050cf0b407e19d7bc7c4 Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
This backport differs from the master branch in that the
'--enable-java-agent-all' option won't select this new agent, since we
wanted to avoid introducing a new dependency in existing configurations.
The name of the new agent jar file is "lttng-ust-agent-log4j2.jar".
It will be installed in the arch-agnostic "$prefix/share/java" path
e.g: "/usr/share/java".
It uses the same jni library "liblttng-ust-log4j-jni.so" as the Log4j 1.x agent.
The agent was designed as a mostly drop-in replacement for applications
upgrading from Log4j 1.x to 2.x. It requires no modification to the
tracing configuration as it uses the same domain "-l / LOG4J" and the
loglevels integer representations are converted to the Log4j 1.x values
(excluding custom loglevels).
The recommended way to use this agent with Log4j 2.x is to add an
"Lttng" Appender with an arbitrary name and associate it with one or more
Loggers using an AppenderRef.
For example, here is a basic log4j2 xml configuration that would send
all logging statements exclusively to an lttng appender:
More examples can be found in the 'doc/examples' directory.
The implementation of the appender is based on this[1] great guide by
Keith D. Gregory which is so much more detailed than the official
documentation, my thanks to him.
Michael Jeanson [Tue, 18 Jan 2022 19:14:33 +0000 (19:14 +0000)]
Fix: may be used uninitialized on powerpc
Fix the following warning on powerpc:
In file included from ../../src/common/counter/counter-internal.h:16,
from ../../src/common/counter/counter-api.h:16,
from counter-clients/percpu-64-modular.c:12:
In function ‘__lttng_counter_add_percpu’,
inlined from ‘lttng_counter_add’ at ../../src/common/counter/counter-api.h:265:10,
inlined from ‘counter_add’ at counter-clients/percpu-64-modular.c:53:9:
include/urcu/compiler.h:25:42: warning: ‘move_sum’ may be used uninitialized [-Wmaybe-uninitialized]
25 | #define caa_unlikely(x) __builtin_expect(!!(x), 0)
| ^~~~~
../../src/common/counter/counter-api.h:244:13: note: in expansion of macro ‘caa_unlikely’
244 | if (caa_unlikely(move_sum))
| ^~~~~~~~~~~~
In file included from counter-clients/percpu-64-modular.c:12:
counter-clients/percpu-64-modular.c: In function ‘counter_add’:
../../src/common/counter/counter-api.h:237:17: note: ‘move_sum’ declared here
237 | int64_t move_sum;
| ^~~~~~~~
Change-Id: I65dc61a567c0337735124a35f1af96697d416054 Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
lttng-sessiond does not expect the variant_nestable type when generating
UST metadata. This fix only belongs to the master branch, not to a
stable branch.
LTTng-UST 2.13 serializes the contents of the variant_nestable union
field, but keeps the "atype" as lttng_ust_ctl_atype_variant.
It happens to work by pure chance because the binary layout of the
variant_nestable and legacy.variant union fields are the same, except
for the alignment field of variant_nestable which is zeroed padding in
the legacy.variant. Therefore, as long as the variant_nestable has a
padding of 0, everything works out fine (which is currently the case).
But it's better to fix this discrepancy in case we ever plan to use a
nonzero variant alignment.
Fix: doc/examples/java-log4j: fix paths to directories
Since the directory hierarchy refactoring introduced in the 2.13 release
of lttng-ust, the paths in `run` are wrong: they are missing a `src`
component.
ust-compiler: constructor/destructor whitespaces layout and macro dependency
Introduce LTTNG_UST_COMPILER_COMBINE_TOKENS in lttng/ust-compiler.h to
eliminate a circular dependency from ust-compiler.h to
LTTNG_UST__TP_COMBINE_TOKENS (defined in tracepoint.h). Use it in
LTTNG_UST_DECLARE_CONSTRUCTOR_DESTRUCTOR.
Change the layout of LTTNG_UST_DECLARE_CONSTRUCTOR_DESTRUCTOR to use
tabs rather than spaces.
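The indirection that a token-combining macro of this kind typically relies on
can be sketched as follows. This is only an illustration, not the definition
found in lttng/ust-compiler.h: the EXAMPLE_* macros, the `my_provider_`
prefix and `example_read_counter` are hypothetical names. The two-level
expansion is what lets a macro argument be expanded before it is pasted.

```c
/*
 * Hypothetical stand-ins for a token-combining helper; the real
 * LTTNG_UST_COMPILER_COMBINE_TOKENS definition may differ.
 */
#define EXAMPLE_COMBINE_TOKENS1(a, b)	a##b
#define EXAMPLE_COMBINE_TOKENS(a, b)	EXAMPLE_COMBINE_TOKENS1(a, b)

/* A macro argument: it must be expanded before the ## paste happens. */
#define EXAMPLE_PREFIX	my_provider_

/* Expands to: int my_provider_counter = 42; */
int EXAMPLE_COMBINE_TOKENS(EXAMPLE_PREFIX, counter) = 42;

/* Refers to the pasted identifier by its final name. */
int example_read_counter(void)
{
	return my_provider_counter;
}
```

With a single-level `a##b` macro, EXAMPLE_PREFIX would be pasted unexpanded,
producing the identifier `EXAMPLE_PREFIXcounter` instead.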
The default behavior, for AC_CHECK_LIB when the `action-if-found` is NOT
defined, is to prepend the library to LIBS. [1]
"
If action-if-found is not specified, the default action prepends
-llibrary to LIBS and defines ‘HAVE_LIBlibrary’ (in all capitals).
"
It is important to note that the LIBS variable is used for ALL linking.
This is normally not a problem for most distributions since they force
the use of `--as-needed` at the toolchain level (gcc specs) (for example
Debian [2]). One could also pass the `--as-needed` flag manually, but
libtool reorganizes flags in the case of shared object creation [3].
In our case, we always explicitly state the dependencies via the *_LIBADD
automake clause. We do not rely on the LIBS variable.
Simply force the define of HAVE_LIBNUMA to prevent the prepending to
LIBS.
Michael Jeanson [Thu, 2 Dec 2021 21:11:21 +0000 (16:11 -0500)]
fix: Allow disabling some abi compat tests
Allow disabling ABI compat tests that rely on a library using a symbol from
the global offset table even if it provides its own copy, which is the
default behavior on Linux.
This situation happens when using the '-Bsymbolic-functions' linker flag
which binds references to public symbols in a library to the definition
within the library, bypassing the global offset table.
To disable those tests when running the test suite, set the
UST_TESTS_LD_SYMBOLIC_FUNC environment variable to any value, for
example :
make check UST_TESTS_LD_SYMBOLIC_FUNC=1
Change-Id: I1ed23d90bbe1b174ab7b4fccfb40b701b291c074 Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fix: generate probe registration constructor as a C++ constructor
Observed issue
==============
Applications which transitively dlopen() a library which, in turn,
dlopen()s providers crash when they are compiled with clang or
if LTTNG_UST_ALLOCATE_COMPOUND_LITERAL_ON_HEAP is defined.
Core was generated by `././myapp.exe'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007fa94f860bc2 in check_event_provider (probe_desc=<optimized out>) at lttng-probes.c:153
153 if (!check_type_provider(field->type)) {
[Current thread is 1 (Thread 0x7fa94fcbc740 (LWP 511754))]
(gdb) bt
#0 0x00007fa94f860bc2 in check_event_provider (probe_desc=<optimized out>) at lttng-probes.c:153
#1 lttng_ust_probe_register (desc=0x7fa94fe9dc80 <lttng_ust__probe_desc___embedded_sys>)
at lttng-probes.c:242
#2 0x00007fa94fe9ba3c in lttng_ust__tracepoints__ptrs_destroy ()
at /usr/include/lttng/tracepoint.h:590
#3 0x00007fa94fedfe2e in call_init () from /lib64/ld-linux-x86-64.so.2
#4 0x00007fa94fedff1c in _dl_init () from /lib64/ld-linux-x86-64.so.2
#5 0x00007fa94fdf7d45 in _dl_catch_exception () from /usr/lib/libc.so.6
#6 0x00007fa94fee420a in dl_open_worker () from /lib64/ld-linux-x86-64.so.2
#7 0x00007fa94fdf7ce8 in _dl_catch_exception () from /usr/lib/libc.so.6
#8 0x00007fa94fee39bb in _dl_open () from /lib64/ld-linux-x86-64.so.2
#9 0x00007fa94fe8d36c in ?? () from /usr/lib/libdl.so.2
#10 0x00007fa94fdf7ce8 in _dl_catch_exception () from /usr/lib/libc.so.6
#11 0x00007fa94fdf7db3 in _dl_catch_error () from /usr/lib/libc.so.6
#12 0x00007fa94fe8db99 in ?? () from /usr/lib/libdl.so.2
#13 0x00007fa94fe8d3f8 in dlopen () from /usr/lib/libdl.so.2
#14 0x00007fa94fecc647 in mon_constructeur () at mylib.cpp:20
#15 0x00007fa94fedfe2e in call_init () from /lib64/ld-linux-x86-64.so.2
#16 0x00007fa94fedff1c in _dl_init () from /lib64/ld-linux-x86-64.so.2
#17 0x00007fa94fdf7d45 in _dl_catch_exception () from /usr/lib/libc.so.6
#18 0x00007fa94fee420a in dl_open_worker () from /lib64/ld-linux-x86-64.so.2
#19 0x00007fa94fdf7ce8 in _dl_catch_exception () from /usr/lib/libc.so.6
#20 0x00007fa94fee39bb in _dl_open () from /lib64/ld-linux-x86-64.so.2
#21 0x00007fa94fe8d36c in ?? () from /usr/lib/libdl.so.2
#22 0x00007fa94fdf7ce8 in _dl_catch_exception () from /usr/lib/libc.so.6
#23 0x00007fa94fdf7db3 in _dl_catch_error () from /usr/lib/libc.so.6
#24 0x00007fa94fe8db99 in ?? () from /usr/lib/libdl.so.2
#25 0x00007fa94fe8d3f8 in dlopen () from /usr/lib/libdl.so.2
#26 0x00005594f478c18c in main ()
Cause
=====
Building tracepoint instrumentation as C++ using clang causes
LTTNG_UST_ALLOCATE_COMPOUND_LITERAL_ON_HEAP to be defined due to a
compiler version detection problem addressed by another patch.
However, building with LTTNG_UST_ALLOCATE_COMPOUND_LITERAL_ON_HEAP
defined still results in the crash.
When LTTNG_UST_ALLOCATE_COMPOUND_LITERAL_ON_HEAP is defined, the
lttng_ust_event_field lttng_ust__event_fields__[...] structure is
initialized by dynamically-allocating field structures for the various
fields.
As the initialization can't be performed statically, it is performed at
run-time _after_ the execution of the library constructors has
completed.
Moreover, the generated initialization function of the provider
(lttng_ust__events_init__[...]) is declared as being a library
constructor. Hence, this causes it to run before the tracepoint field
structures have a chance to be initialized.
This all results in a NULL pointer dereference during the validation of
the fields.
Solution
========
When building providers as C++, the initialization function is defined
as the constructor of a class. This class is, in turn, instantiated in
an anonymous namespace.
For the purposes of this patch, the use of an anonymous namespace is
equivalent to declaring the instance as 'static', but it is preferred in
C++11.
Known drawbacks
===============
None.
References
==========
A reproducer is available:
https://github.com/jgalar/ust-clang-reproducer
Problem initially reported on dotnet/runtime's issue tracker:
https://github.com/dotnet/runtime/issues/62398
The pthread cancelstate is disabled to ensure threads are not cancelled
while holding mutexes which are used in library destructors. This
mechanism does not currently support nesting of those mutexes, and
generates error messages when using the fork and fd helpers with
LTTNG_UST_DEBUG=1. As a result, the pthread cancelstate can be
re-enabled too soon, when the first unlock is performed (in a nested
lock scenario), thus allowing the thread to be cancelled while still
holding a lock, and causing a deadlock on application exit.
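One way to make such a cancelstate guard nesting-aware is a per-thread
nesting counter, sketched below. This is a minimal illustration under
assumptions, not the code of the actual fix: every `example_*` name is
hypothetical, and the two mutexes merely stand in for locks that can be
taken one inside the other.

```c
#include <pthread.h>

/* Per-thread nesting depth and the state saved by the outermost lock. */
static _Thread_local int example_cancel_nesting;
static _Thread_local int example_saved_cancelstate;

/* Hypothetical locks which may be taken one inside the other. */
pthread_mutex_t example_outer_mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t example_inner_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Disable cancellation, saving the state only at the outermost level. */
void example_lock(pthread_mutex_t *mutex)
{
	if (example_cancel_nesting++ == 0)
		pthread_setcancelstate(PTHREAD_CANCEL_DISABLE,
				&example_saved_cancelstate);
	pthread_mutex_lock(mutex);
}

/* Restore the saved state only when the outermost lock is released. */
void example_unlock(pthread_mutex_t *mutex)
{
	int unused;

	pthread_mutex_unlock(mutex);
	if (--example_cancel_nesting == 0)
		pthread_setcancelstate(example_saved_cancelstate, &unused);
}

/* Query the calling thread's cancel state without changing it. */
int example_cancelstate(void)
{
	int state, unused;

	pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &state);
	pthread_setcancelstate(state, &unused);
	return state;
}
```

With this scheme, releasing the inner lock leaves cancellation disabled
until the outer lock is released as well.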
Fix: abort on decrement_sem_count during concurrent tracing start and teardown
Observed issue
==============
The following backtrace has been reported:
#0 __GI_raise (sig=sig@entry=6)
at /usr/src/debug/glibc/2.31/git/sysdeps/unix/sysv/linux/raise.c:50
#1 0x0000007f90b3fdd4 in __GI_abort () at /usr/src/debug/glibc/2.31/git/stdlib/abort.c:79
#2 0x0000007f90b4bf50 in __assert_fail_base (fmt=0x7f90c3da98 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
assertion=assertion@entry=0x7f9112cb90 "uatomic_read(&sem_count) >= count",
file=file@entry=0x7f9112cb30 "/usr/src/debug/lttng-ust/2_2.13.0-r0/lttng-ust-2.13.0/src/lib/lttng-ust/lttng-ust-comm.c",
line=line@entry=664, function=function@entry=0x7f911317e8 <__PRETTY_FUNCTION__.10404> "decrement_sem_count")
at /usr/src/debug/glibc/2.31/git/assert/assert.c:92
#3 0x0000007f90b4bfb4 in __GI___assert_fail (assertion=assertion@entry=0x7f9112cb90 "uatomic_read(&sem_count) >= count",
file=file@entry=0x7f9112cb30 "/usr/src/debug/lttng-ust/2_2.13.0-r0/lttng-ust-2.13.0/src/lib/lttng-ust/lttng-ust-comm.c",
line=line@entry=664, function=function@entry=0x7f911317e8 <__PRETTY_FUNCTION__.10404> "decrement_sem_count")
at /usr/src/debug/glibc/2.31/git/assert/assert.c:101
#4 0x0000007f910e3830 in decrement_sem_count (count=<optimized out>)
at /usr/src/debug/lttng-ust/2_2.13.0-r0/lttng-ust-2.13.0/src/lib/lttng-ust/lttng-ust-comm.c:664
#5 0x0000007f910e5d28 in handle_pending_statedump (sock_info=0x7f9115c608 <global_apps>)
at /usr/src/debug/lttng-ust/2_2.13.0-r0/lttng-ust-2.13.0/src/lib/lttng-ust/lttng-ust-comm.c:737
#6 handle_message (lum=0x7f8dde46d8, sock=3, sock_info=0x7f9115c608 <global_apps>)
at /usr/src/debug/lttng-ust/2_2.13.0-r0/lttng-ust-2.13.0/src/lib/lttng-ust/lttng-ust-comm.c:1410
#7 ust_listener_thread (arg=0x7f9115c608 <global_apps>)
at /usr/src/debug/lttng-ust/2_2.13.0-r0/lttng-ust-2.13.0/src/lib/lttng-ust/lttng-ust-comm.c:2055
#8 0x0000007f90af73e0 in start_thread (arg=0x7fc27a82f6)
at /usr/src/debug/glibc/2.31/git/nptl/pthread_create.c:477
#9 0x0000007f90bead5c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78
It turns out that the main thread is at that point iterating over the
libraries destructors:
Thread 3 (LWP 1983):
#0 0x0000007f92a68a0c in _dl_fixup (l=0x7f9054e510, reloc_arg=432)
at /usr/src/debug/glibc/2.31/git/elf/dl-runtime.c:69
#1 0x0000007f92a6ea3c in _dl_runtime_resolve () at ../sysdeps/aarch64/dl-trampoline.S:100
#2 0x0000007f905170f8 in __do_global_dtors_aux () from <....>/crash/work/rootfs/usr/lib/libbsd.so.0
#3 0x0000007f92a697f8 in _dl_fini () at /usr/src/debug/glibc/2.31/git/elf/dl-fini.c:138
#4 0x0000007f90b54864 in __run_exit_handlers (status=0, listp=0x7f90c65648 <__exit_funcs>,
run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true)
at /usr/src/debug/glibc/2.31/git/stdlib/exit.c:108
#5 0x0000007f90b549f4 in __GI_exit (status=<optimized out>)
at /usr/src/debug/glibc/2.31/git/stdlib/exit.c:139
#6 0x0000000000404c98 in a_function_name (....) at main.c:152
#7 0x0000000000404a98 in main (argc=3, argv=0x7fc27a8858, env=0x7fc27a8878) at main.c:97
Cause
=====
An enable command is processed at the same time that the lttng-ust
destructor is run. At the end of the command handling,
`handle_pending_statedump` is called. Multiple variables from the
`sock_info` struct are checked outside the UST lock at that point.
lttng-ust-comm.c +1406:
/*
* Performed delayed statedump operations outside of the UST
* lock. We need to take the dynamic loader lock before we take
* the UST lock internally within handle_pending_statedump().
*/
handle_pending_statedump(sock_info);
`statedump_pending` is set during the enable command
(`lttng_session_statedump`, lttng-events.c +631) in the same thread.
As for `registration_done` and `initial_statedump_done`, they are invariant
from the registration of the app until `lttng_ust_cleanup` is called.
`cleanup_sock_info`, called by `lttng_ust_cleanup`, itself called by
`lttng_ust_exit`, resets the `registration_done` and
`initial_statedump_done` fields. Note that these operations are done
outside of the listener thread.
Note that by that point `lttng_ust_exit` expects all "getters" on
`sock_info` to fail while trying to acquire the UST lock due to
`lttng_ust_comm_should_quit` set to 1. Note that the listener threads
can still exist because we do not join them, we only execute
pthread_cancel which is async.
Clearly we are missing mutual exclusion provided by locking
when accessing `registration_done` and `initial_statedump_done`.
Solution
========
Here we can do better and simply not require any mutual exclusion based on locking.
`registration_done` and `initial_statedump_done` only need to be reset
to zero when we are not actually exiting (`lttng_ust_after_fork_child`).
In this case, no concurrent listener thread exists at that point
that could access those fields during the reset. Hence we can move the
reset to only the non-exiting code path and alleviate the current
situation.
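The idea can be sketched as follows. This is a simplified illustration, not
the actual patch: the `example_sock_info` structure only carries the two
fields discussed here, and the `exiting` parameter is an assumption about
how the two code paths are told apart.

```c
/* Hypothetical, reduced version of the per-socket state. */
struct example_sock_info {
	int registration_done;
	int initial_statedump_done;
};

void example_cleanup_sock_info(struct example_sock_info *sock_info,
		int exiting)
{
	/* Resources shared by both code paths would be released here. */

	/*
	 * When exiting, a cancelled listener thread may still be reading
	 * these fields, so leave them untouched. After fork in the child,
	 * no listener thread exists and the reset is safe.
	 */
	if (!exiting) {
		sock_info->registration_done = 0;
		sock_info->initial_statedump_done = 0;
	}
}
```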
Known drawbacks
===============
None.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I45ba3eaee20c49a3988837a87fa680ce0a6ed953
Michael Jeanson [Mon, 6 Dec 2021 20:05:59 +0000 (15:05 -0500)]
fix: allocating C++ compound literal on heap with Clang
Exclude Clang from the GCC version macro check for <= 4.8 since most
versions of Clang seem to identify themselves as GCC 4.2 which in this
case forces the allocation of C++ compound literals on the heap which
is only supported starting with Clang >= 6.0.
The macro was also broken for GCC <= 4.8 in C mode; add the missing
parentheses around the 'or' statement to properly distinguish between C
and C++.
Also document Clang 4.0 as the minimal supported version to build C++
probe providers.
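The effect of the missing parentheses can be illustrated with ordinary
functions. This sketch assumes the condition has the general shape "C++ and
GCC version <= 4.8"; the two predicates below are hypothetical models and
not the macro from the LTTng-UST headers.

```c
/*
 * Without parentheses around the 'or', "&&" binds tighter than "||", so
 * the expression is evaluated as written here: the language check only
 * applies to the first version test.
 */
int example_heap_literal_broken(int is_cxx, int major, int minor)
{
	return (is_cxx && major < 4) || (major == 4 && minor <= 8);
}

/* Parentheses around the 'or' apply the C++ check to the whole test. */
int example_heap_literal_fixed(int is_cxx, int major, int minor)
{
	return is_cxx && (major < 4 || (major == 4 && minor <= 8));
}
```

For GCC 4.8 in C mode, the first predicate is true although no C++ code is
being compiled, while the second is false.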
Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I62eea00381b7dc5958a09b13044ad9e7f7caf2ab
Michael Jeanson [Wed, 8 Dec 2021 20:54:53 +0000 (15:54 -0500)]
Check for C++11 when building C++ probe providers
The compiler used to build probe providers might differ from the one
used to build lttng-ust. Make sure that when a probe provider is built
by a C++ compiler, it supports C++11.
Change-Id: I2a17e923316ff87c023d8e50c53efdbe35386a21 Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Tue, 9 Mar 2021 17:38:06 +0000 (12:38 -0500)]
fix: liblttng-ust-fd async-signal-safe close()
"close(2)" is documented as async-signal-safe (see signal-safety(7)) and
as such our override function should also be. This means we shouldn't do
lazy initialization of the function pointer to the original libc close
symbol in the override function.
If we move the initialization of the function pointer into the library
constructor, we risk breaking other constructors that may run before ours
and call close().
A compromise is to explicitly do the initialization in the constructor
but keep a lazy init scheme if close() is called before it is executed.
The dlsym call is now done only once; if it fails, the wrappers will
return -1 and set errno to ENOSYS.
Change-Id: I05c66d2f5d51b2022c6803ca215340fb9c00f5a8 Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
tracepoints: print debug message when lttng-ust-tracepoint.so is not found
Rather than silently disable tracepoints when lttng-ust-tracepoint.so is
not found in the library search path, print a debug message when either
the compile unit including tracepoint.h has defined LTTNG_UST_DEBUG, or
when the LTTNG_UST_DEBUG environment variable is set.
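The two conditions can be sketched as below. This is a hedged illustration
only: `example_debug_enabled` and `example_report_missing_lib` are
hypothetical helpers, and the message text is made up for the example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Debug output is on if the compile unit or the environment asks for it. */
int example_debug_enabled(void)
{
#ifdef LTTNG_UST_DEBUG
	return 1;	/* the compile unit defined LTTNG_UST_DEBUG */
#else
	return getenv("LTTNG_UST_DEBUG") != NULL;
#endif
}

/* Print a debug message rather than failing silently. */
void example_report_missing_lib(const char *soname)
{
	if (example_debug_enabled())
		fprintf(stderr, "debug: %s not found, tracepoints disabled\n",
			soname);
}
```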
gcc 4.8 introduces support for C11, and gcc 4.6 introduces support for
_Static_assert. Therefore, using _Static_assert when C11 is detected is
always OK.
However, using static_assert in C11 depends on glibc >= 2.16. Even
though the minimum version requirement for glibc is not documented in
the README.md file, make a best effort to keep compatibility with older
glibc.
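A wrapper following this reasoning could look like the sketch below. The
`example_static_assert` macro and its pre-C11 fallback are assumptions made
for illustration; they are not the definitions shipped by LTTng-UST.

```c
#include <stdint.h>

/*
 * Use the _Static_assert keyword directly when C11 is detected, so that
 * the static_assert macro from the libc's <assert.h> is not required.
 */
#if defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 201112L)
# define example_static_assert(pred, msg, id)	_Static_assert(pred, msg)
#else
/* Pre-C11 fallback: a negative array size fails the build. */
# define example_static_assert(pred, msg, id)	\
	typedef char example_static_assert_##id[(pred) ? 1 : -1]
#endif

example_static_assert(sizeof(uint32_t) == 4, "uint32_t must be 4 bytes",
		uint32_t_is_4_bytes);

/* Runtime view of the property checked at compile time above. */
int example_uint32_size(void)
{
	return (int) sizeof(uint32_t);
}
```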
Fix: combined tracing of lttng-ust 2.12/2.13 generates corrupted traces
Observed issue
==============
When tracing applications linked against lttng-ust 2.12 and lttng-ust
2.13 in parallel with lttng-tools 2.13 into the same per-uid buffers,
with the "procname" context enabled, babeltrace fails with "Event id NN
is outside range" when reading the trace:
[14:51:58.717006865] (+5.927872956) x lttng_ust_statedump:start: { cpu_id = 1 }, { procname = "sample-2.13-ust" }, { }
[error] Event id 41984 is outside range.
[error] Reading event failed.
Error printing trace.
Cause
=====
Inspection of the trace reveals that the layout of the procname context
field changed from 17 bytes to 16 bytes between 2.12 and 2.13. This is
an issue when applications share a per-uid ring buffer, because context
fields are associated with channels, and need to have the same layout
across all processes tracing into a given channel.
The layout of the procname field described by the trace metadata is that
of the first application which happens to register that channel in the
session lifetime.
Therefore, the procname context field length is part of the LTTng-UST
ABI and cannot be changed without breaking the LTTng-UST ABI (bumping
LTTNG_UST_ABI_MAJOR_VERSION_OLDEST_COMPATIBLE), which is unwanted
between 2.12 and 2.13. Keeping compatibility for combined use by
different applications between lttng-ust 2.12 and 2.13 is a required
feature for this release, because lttng-ust 2.13 introduces a library
ABI break (soname bump).
An example scenario leading to this issue:
1) A trace is created with per-uid buffers,
2) The procname context is added,
3) Tracing is started,
4) Application [a] linked against lttng-ust 2.13 registers the channel to
lttng-sessiond, sending its context descriptions with a 16-byte
procname context field,
5) Application [b] linked against lttng-ust 2.12 registers the same channel
to lttng-sessiond,
6) Application [b] traces an event with the procname context, followed
by an event payload with a single "string" field.
7) A trace viewer will observe the procname context, followed by an
extra null character, and thus mistakenly consider the event payload
to be an empty string. Reading the next event header will fail
because the string payload will be expected to contain an event ID.
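The misparse in step 7 can be reproduced with a simplified model. The sketch
below ignores event headers and alignment, and all `example_*` and
`EXAMPLE_*` names are hypothetical; it only shows what happens when a
17-byte procname context is read with a 16-byte layout.

```c
#include <stddef.h>
#include <string.h>

#define EXAMPLE_PROCNAME_LEN_2_12	17	/* layout written by 2.12 */
#define EXAMPLE_PROCNAME_LEN_2_13	16	/* layout described by 2.13 */

/*
 * Write a zero-padded procname context of procname_len bytes followed by
 * a null-terminated string payload. Returns the number of bytes written.
 */
size_t example_write_event(char *buf, size_t procname_len,
		const char *procname, const char *payload)
{
	size_t name_len = strlen(procname);
	size_t payload_len = strlen(payload) + 1;

	if (name_len > procname_len - 1)
		name_len = procname_len - 1;
	memset(buf, 0, procname_len);
	memcpy(buf, procname, name_len);
	memcpy(buf + procname_len, payload, payload_len);
	return procname_len + payload_len;
}

/* Locate the string payload, assuming the given procname context length. */
const char *example_read_payload(const char *buf, size_t procname_len)
{
	return buf + procname_len;
}
```

Reading a buffer written with the 17-byte layout using the 16-byte layout
lands on the trailing null byte of the context, so the payload appears to be
an empty string.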
Solution
========
Revert the procname context field size to 17 bytes to stay compatible
with lttng-ust 2.12.
In an abundance of caution, also revert the size of the
lttng_ust_statedump:procname "procname" field to 17, so there won't be
duplicated event IDs for this event when applications linked against
lttng-ust 2.12 and 2.13 are traced concurrently for the same user ID
in per-uid tracing.
History
=======
This issue was introduced by commit 0db3d6ee9be ("port: fix
pthread_setname_np integration") within the 2.13 development cycle.
Known drawbacks
===============
Applications currently running which are linked against a liblttng-ust
2.13 without this fix should be restarted after upgrading the library to
liblttng-ust 2.13 with this fix.
Documentation: clarify API backward compatibility comment
Considering that the ABI (soname major version) and API version can
evolve independently in the future, remove references to the soname
major version from the API compatibility documentation.
Philippe Proulx [Wed, 9 Jun 2021 19:39:25 +0000 (15:39 -0400)]
doc/man: only mention `-llttng-ust-common` in synopses (conditionally)
LTTng-UST only requires that you link your application or tracepoint
provider package with `-llttng-ust-common` if you define
`_LGPL_SOURCE` before you include, directly or indirectly,
`<lttng/tracepoint.h>`.
The `_LGPL_SOURCE` definition is specific to the EfficiOS/LTTng
projects.
Because defining `_LGPL_SOURCE` is not considered the typical scenario,
remove instructions to link with `-llttng-ust-common` throughout the
manual pages, except in synopses, to make such instructions more
readable/light.
Signed-off-by: Philippe Proulx <eeppeliteloop@gmail.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I460a2f746d5e2904660a11b3151d0d01776361db
Philippe Proulx [Wed, 9 Jun 2021 19:26:01 +0000 (15:26 -0400)]
doc/man: remove vtracef() and vtracelog() manual pages
Following 2268c76f ("Remove vtracelog and vtracef from v0 compat API"),
this patch removes the manual pages of vtracef() and vtracelog() which
don't exist.
Signed-off-by: Philippe Proulx <eeppeliteloop@gmail.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I1a07c74b330015ee74bb92235db2171066751503
vtracelog and vtracef were introduced between lttng-ust 2.12 and 2.13
(not released yet). They are replaced by lttng_ust_vtracelog and
lttng_ust_vtracef in the v1 API, newly introduced in lttng-ust 2.13.
Therefore, there is no need to expose a v0 compat API for the vtracelog
and vtracef macros which were never officially part of any release
other than the 2.13 release candidates.
Michael Jeanson [Wed, 3 Mar 2021 16:56:49 +0000 (11:56 -0500)]
Add serialized ABI definition files
This commit contains the serialized ABI definitions for a typical build
of the lttng-ust libraries. This information is extracted using
libabigail (https://sourceware.org/libabigail/).
The artefacts used to generate these were built with CFLAGS="-O0 -ggdb"
and all optional configure switches enabled.
You can compare the serialized ABI with a shared object to check for
changes. For example, here we compare an in-tree built version of
liblttng-ust.so with the serialized ABI of stable-2.13 :
Philippe Proulx [Tue, 25 May 2021 17:06:56 +0000 (13:06 -0400)]
doc/man: document LTTng-UST 2.13
Significant changes:
* Prefix all macro/definition names with `LTTNG_UST_` or `lttng_ust_`
where needed.
* Prefix all log level definitions with `LTTNG_UST_TRACEPOINT_LOGLEVEL`.
* lttng-ust(3):
* Add "Compatibility with previous APIs" section to explain
the new `LTTNG_UST_COMPAT_API_VERSION` definition.
* Document the new tracepoint class provider name parameter of
`LTTNG_UST_TRACEPOINT_EVENT_INSTANCE()`.
Update examples accordingly.
* Mention `liblttng-ust-common` where missing.
* tracef(3), vtracef(3), tracelog(3), and vtracelog(3) now indicate that
the macros are part of version 0 of the LTTng-UST API, albeit still
available, and point to lttng_ust_tracef(3), lttng_ust_vtracef(3),
lttng_ust_tracelog(3), and lttng_ust_vtracelog(3).
* New lttng_ust_do_tracepoint(3), lttng_ust_tracepoint(3),
and lttng_ust_tracepoint_enabled(3) manual pages which source
lttng-ust(3).
Signed-off-by: Philippe Proulx <eeppeliteloop@gmail.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I07d6ace0d6f219c36d7c99a455726bbf4b0736a2