Move symbol preventing unloading of probe providers
Issue
=====
Calling dlclose on the probe provider library that first loaded
__tracepoints__disable_destructors in the symbol table does not
unregister the probes from the callsites as the destructors are not
executed.
The __tracepoints__disable_destructors weak symbol is exposed by probe
providers, liblttng-ust.so and liblttng-ust-tracepoint.so libraries. If
a probe provider is loaded first into the address space, its definition
is bound to the symbol. All the subsequent loaded libraries using the
symbol will use the existing definition of the symbol, thus creating a
situation where liblttng-ust.so or liblttng-ust-tracepoint.so depend on
the probe provider library.
This prevents the dynamic loader from unloading the library as it is
still in use by other libraries. Because of this, the execution of its
destructors and the unregistration of the probes is postponed. Since the
unregistration of the probes is postponed, events will be generated if
the callsite is executed even though the probes should not be loaded.
Solution
========
To overcome this issue, we no longer expose this symbol in the
tracepoint.h file to remove the explicit dependency of the probe
provider on the symbol. We instead use the existing dlopen handle on
liblttng-ust-tracepoint.so and use dlsym to get handles on functions
that disable and get the state of the destructors.
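A minimal sketch of that lookup follows. It assumes the two functions are
exported under the names tp_disable_destructors and
tp_get_destructors_state; these names, the struct and the helpers are
assumptions made for illustration and should be checked against the library.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical holder for the function pointers resolved at runtime. */
struct destructor_syms {
	void (*disable)(void);   /* disables the tracepoint destructors */
	int (*get_state)(void);  /* reports whether destructors are enabled */
};

/*
 * Resolve both functions through an existing dlopen() handle. The symbol
 * names are assumptions. A missing symbol resolves to NULL, which lets the
 * caller detect that it runs against an older library.
 */
struct destructor_syms lookup_destructor_syms(void *handle)
{
	struct destructor_syms syms = { NULL, NULL };

	if (!handle)
		return syms;
	syms.disable = (void (*)(void)) dlsym(handle, "tp_disable_destructors");
	syms.get_state = (int (*)(void)) dlsym(handle, "tp_get_destructors_state");
	return syms;
}

/* Non-zero when both functions of the new API were found. */
int has_new_destructor_api(const struct destructor_syms *syms)
{
	return syms->disable != NULL && syms->get_state != NULL;
}
```

When has_new_destructor_api() returns zero, a caller would use the
compatibility path based on the old weak symbol.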
Version compatibility
=====================
- This change is backward compatible with UST applications and libraries
built against lttng-ust versions before 2.11. Those applications will use
the __tracepoints__disable_destructors symbol that is now exposed
as a weak symbol in the liblttng-ust-tracepoint.so library. This
symbol is always checked in 2.11 in case an old app is running.
- Applications built with this change will also work with older versions
of lttng-ust, as there is a check to see if the new destructor state
checking method should be used; if it is not, we fall back to a
compatibility method. To ensure compatibility in this case, we also
look up and keep up-to-date the __tracepoints__disable_destructors
value using the dlopen-dlsym combo.
- A mix of applications/probes built in part against 2.10 and in part
against 2.11 also works. When setting the destructor state from a binary
built against 2.11 headers, both old/new states are set, so a binary built
against 2.10 will correctly see the old state. When querying the state
from a binary built against 2.11 headers, both old and new states are
queried, so if the state has been set from a binary built against
2.10 headers, the old state will be set and is therefore seen.
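The mixed 2.10/2.11 case can be modelled as follows. This is only a sketch of
the set-both/query-both rule; the variable and function names are
hypothetical and do not come from the lttng-ust sources.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two pieces of state. */
static int old_disable_flag;  /* models __tracepoints__disable_destructors */
static bool new_disabled;     /* models the state behind the 2.11 functions */

/* Binary built against 2.10 headers: only knows the old symbol. */
void old_disable_destructors(void)
{
	old_disable_flag = 1;
}

bool old_destructors_disabled(void)
{
	return old_disable_flag != 0;
}

/* Binary built against 2.11 headers: sets both states. */
void new_disable_destructors(void)
{
	new_disabled = true;
	old_disable_flag = 1;
}

/* Binary built against 2.11 headers: queries both states. */
bool new_destructors_disabled(void)
{
	return new_disabled || old_disable_flag != 0;
}

/* Helper for experiments: return to the initial state. */
void reset_destructor_state(void)
{
	new_disabled = false;
	old_disable_flag = 0;
}
```

Whichever side disables the destructors, the other side observes it, which is
the property the bullet above relies on.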
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Fri, 2 Mar 2018 22:36:26 +0000 (17:36 -0500)]
Fix: cache the result of getpid() internally
On Linux we called getpid() directly on each tracepoint and relied on
the glibc pid cache. However, in glibc 2.25, released on 2017-02-05, the
pid cache was removed which results in a getpid syscall on each event
when the vpid context is enabled.
Remove the Linux specific case and use our internal cache all the time.
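A minimal sketch of such an internal cache, with hypothetical names. It is
not the lttng-ust implementation and ignores memory-ordering concerns between
threads; a cached value must also be invalidated in the child after fork().

```c
#include <sys/types.h>
#include <unistd.h>

/* 0 means "not cached yet"; a real pid is never 0 for a user process. */
static pid_t cached_pid;

/* Return the process id, issuing the getpid() call only on first use. */
pid_t cached_getpid(void)
{
	pid_t pid = cached_pid;

	if (pid == 0) {
		pid = getpid();
		cached_pid = pid;
	}
	return pid;
}

/* Invalidate the cache, for instance from a fork() child handler. */
void reset_cached_pid(void)
{
	cached_pid = 0;
}
```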
Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Fri, 2 Mar 2018 22:36:25 +0000 (17:36 -0500)]
Fix: reset cached vpid context on fork
We currently reset the cached vtid on fork but not the vpid. This is not
a problem on Linux because we don't cache the vpid internally but call
getpid() directly and rely on the glibc pid cache.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Wed, 25 Oct 2017 18:28:04 +0000 (14:28 -0400)]
Fix: build example SO when PIE is enabled
In the example Makefiles, when building shared object libraries, make sure
we set the custom linker options after the CFLAGS/LDFLAGS so that they
override them. This is useful when the build system sets some hardening
features like PIE in the CFLAGS.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
With this commit, it's now possible to dlclose() a library containing an
actively used probe provider.
The destructor of such library will now iterate over all the sessions
and over all probe definitions to unregister them from the respective
callsites in the process.
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
dlopen() liblttng-ust.so from constructor to prevent unloading
The support of probe provider dlclose() allows for the following
problematic scenario:
- Application is not linked against the liblttng-ust.so
- Application dlopen() a probe provider library that is linked against
liblttng-ust.so
- Application dlclose() the probe provider
In this scenario, the probe provider has a dependency on
liblttng-ust.so, so when it's loaded by the application, liblttng-ust.so
is loaded too. The probe provider library now has the only reference to
the liblttng-ust.so library. When the application calls dlclose() on
it, all its references are dropped, thus triggering the unloading of
both the probe provider library and liblttng-ust.so.
This scenario is problematic because lttng ust_listener_threads are in
DETACHED state. We cannot join them and therefore we cannot unload the
library containing the code they run. Only the operating system can free
those resources.
The reason why those threads are in DETACHED state is to quickly
tear down applications on process exit.
A possible solution to investigate: if we can determine whether
liblttng-ust.so is being dlopened (directly or indirectly) or is linked
against the application, we could set the detached state accordingly.
To prevent that unloading, we pin it in memory by grabbing an extra
reference on the library, with a RTLD_NODELETE flag. This will prevent
the dynamic loader from ever removing the liblttng-ust.so library from
the process' address space.
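The pinning step can be sketched as below. The helper name is hypothetical;
the soname to pass (for example the versioned name of liblttng-ust.so) is left
to the caller.

```c
#include <dlfcn.h>

/*
 * Take an extra reference on a library and mark it as non-unloadable.
 * Returns 0 on success, -1 if the library could not be opened.
 */
int pin_library(const char *soname)
{
	void *handle = dlopen(soname, RTLD_LAZY | RTLD_NODELETE);

	/* The handle is intentionally never passed to dlclose(). */
	return handle ? 0 : -1;
}
```

Calling such a helper from a library constructor keeps the library mapped
even after the last dlclose() from the application.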
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
It's now possible to register a probe provider with a name that has
already been registered. This is useful when wanting to load a new
version of a shared library in an already running process.
Changes are necessary in the lttng-session daemon to support cases where
the newly registered event has a different probe payload.
Taking a simple case where a probe provider is registered twice, the
tracepoint call site will have two probes registered to it and thus will
generate two events in the trace.
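The double-registration case can be illustrated with a toy call site. All
types and functions here are hypothetical and much simpler than the real
tracepoint structures.

```c
#include <stddef.h>

#define MAX_PROBES 8

typedef void (*probe_fn)(void *data);

/* Toy call site: a fixed-size list of probes and their private data. */
struct callsite {
	probe_fn probes[MAX_PROBES];
	void *data[MAX_PROBES];
	size_t count;
};

/* Attach a probe; the same probe may be attached more than once. */
int callsite_register(struct callsite *cs, probe_fn fn, void *data)
{
	if (cs->count == MAX_PROBES)
		return -1;
	cs->probes[cs->count] = fn;
	cs->data[cs->count] = data;
	cs->count++;
	return 0;
}

/* Executing the call site invokes every registered probe in order. */
void callsite_fire(const struct callsite *cs)
{
	size_t i;

	for (i = 0; i < cs->count; i++)
		cs->probes[i](cs->data[i]);
}

/* Example probe: counts one event per invocation. */
void count_event(void *data)
{
	(*(int *) data)++;
}
```

Registering count_event twice and firing the call site once increments the
counter twice, matching the two events described above.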
08:51:35 lttng-ust-fd-tracker.c: In function 'dup_std_fd':
08:51:35 lttng-ust-fd-tracker.c:174:2: error: 'for' loop initial
declarations are only allowed in C99 mode
08:51:35 for (int i = 0; i < STDERR_FILENO + 1; i++) {
08:51:35 ^
08:51:35 lttng-ust-fd-tracker.c:174:2: note: use option -std=c99 or
-std=gnu99 to compile your code
08:51:35 lttng-ust-fd-tracker.c:195:11: error: redefinition of 'i'
08:51:35 for (int i = 0; i < fd_to_close_count; i++) {
08:51:35 ^
08:51:35 lttng-ust-fd-tracker.c:174:11: note: previous definition of 'i'
was here
08:51:35 for (int i = 0; i < STDERR_FILENO + 1; i++) {
08:51:35 ^
08:51:35 lttng-ust-fd-tracker.c:195:2: error: 'for' loop initial
declarations are only allowed in C99 mode
08:51:35 for (int i = 0; i < fd_to_close_count; i++) {
08:51:35 ^
08:51:35 Makefile:412: recipe for target 'lttng-ust-fd-tracker.lo'
failed
08:51:35 make[2]: *** [lttng-ust-fd-tracker.lo] Error 1
08:51:35 make[2]: *** Waiting for unfinished jobs....
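These errors are typical of a compiler that defaults to C89/gnu89. A portable
form declares the loop counter once at the top of the block, as in the sketch
below; the function is a made-up example, not the body of dup_std_fd().

```c
/* Sum the integers in [0, first_end) and in [0, second_end). */
int sum_two_ranges(int first_end, int second_end)
{
	int i;          /* declared once, at the top of the block */
	int total = 0;

	for (i = 0; i < first_end; i++)
		total += i;
	for (i = 0; i < second_end; i++)
		total += i;
	return total;
}
```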
Michael Jeanson [Tue, 21 Nov 2017 16:11:15 +0000 (11:11 -0500)]
Fix: specify SONAME in python-lttngust LoadLibrary
When loading the python agent library with ctypes in the python
bindings, specify the SONAME. This will make sure we load the proper
library in the event of a SONAME bump and the bindings will work without
having to install the "dev" package which in most distros contains the
non-versioned ".so".
Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Jonathan Rajotte [Fri, 10 Nov 2017 16:06:41 +0000 (11:06 -0500)]
Fix: fd of an elf object must be registered to the fd tracker
The open call takes place inside ust, so it must be tracked to prevent external
closing.
The bug can be hit while tracing an application for which the probe
provider is loaded using LD_PRELOAD in combination with the fd utility
shared object. The application is responsible for closing all possible file
descriptors.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
The initial-exec model seems to behave differently than global-dynamic
with respect to lazy initialization, causing locks to be taken the
first time each thread touches the TLS. This introduces deadlocks with
library constructors waiting on other threads.
Philippe Proulx [Mon, 6 Nov 2017 20:46:03 +0000 (15:46 -0500)]
configure.ac: add --disable-examples option to not build/install examples
Some environments and distributions do not need the LTTng-UST examples
to be built because they remove them anyway. Continue to build them by
default, but add --disable-examples to explicitly disable them.
Signed-off-by: Philippe Proulx <eeppeliteloop@gmail.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Mon, 6 Nov 2017 19:09:30 +0000 (14:09 -0500)]
Disable NUMA by default on 32bit arm
There is currently no NUMA support on 32bit arm; disable the dependency
on libnuma by default on this architecture. It can still be forced with
--enable-numa.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fix: sync buffer file metadata on buffer allocation
Synchronizing the file metadata on disk after zeroing the whole file (on
buffer allocation) will make the crash extraction feature (--shm-path
create option) more robust. It ensures the content of the file metadata
backing the buffers does not have to be updated while tracing into the
memory map. Therefore, the on-disk metadata will never be out of sync at
the point where a system crash occurs.
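A sketch of the zero-then-sync step on a plain file descriptor. The function
name is hypothetical, and error handling (for example retrying on EINTR) is
reduced to a minimum.

```c
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Fill the first `len` bytes of `fd` with zeros, then flush file data and
 * metadata to disk. Returns 0 on success, -1 on error.
 */
int zero_and_sync(int fd, size_t len)
{
	char zeros[4096];
	size_t done = 0;

	memset(zeros, 0, sizeof(zeros));
	if (lseek(fd, 0, SEEK_SET) == (off_t) -1)
		return -1;
	while (done < len) {
		size_t left = len - done;
		size_t chunk = left < sizeof(zeros) ? left : sizeof(zeros);
		ssize_t ret = write(fd, zeros, chunk);

		if (ret <= 0)
			return -1;
		done += (size_t) ret;
	}
	/* fsync() flushes both the file content and its metadata. */
	return fsync(fd);
}
```

Once this returns, the file size and block allocation are already on disk, so
later stores through the memory map do not change the file metadata.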
Philippe Proulx [Thu, 27 Jul 2017 23:28:40 +0000 (19:28 -0400)]
Fix: doc/man: use a single XSL file and match local names
Matching the local name instead of the full name, that is:
*[local-name() = 'co']
instead of just `co` matches both the non-namespaced element and the
DocBook-namespaced element whether we're using the DocBook 4.5 or
DocBook 5.0 stylesheets.
Signed-off-by: Philippe Proulx <eeppeliteloop@gmail.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Introduce the LTTNG_UST_ALLOW_BLOCKING env. var. to control whether
applications are allowed to block when a buffer is full. If set, it
allows the tracer to block the application when buffers are full.
The blocking is now controlled by a per-channel configuration option in
the LTTng control interface for channels with the "--blocking-timeout"
parameter, which is specified in usec (or -1 to block forever).
This replaces the LTTNG_UST_BLOCKING_RETRY_TIMEOUT env. var., which
actually never made it into a stable release (we therefore remove this
env. var).
Allow context length calculation to have side-effects which trigger
event tracing by moving the calculation outside of the buffer space
reservation retry loop.
This also paves the way to have dynamically sized contexts in lttng-ust,
which would expect to put their size on the internal stack. Note that
the context length calculation is performed *after* the event payload
field length calculation, so the stack needs to be used accordingly.
Currently, the only dynamically sized contexts we have are provided by
Java integration, which keeps its own stack.
Michael Jeanson [Tue, 9 May 2017 18:25:01 +0000 (14:25 -0400)]
Fix: Don't override user variables within the build system
Instead use the appropriately prefixed AM_* variables so as not to interfere
when a user variable is passed to a make command. The proper use of flag
variables is documented at :
The protocol's minor version is bumped since a new API entry
point is introduced. The so name's "current" and "age" fields are
bumped in accordance with the libtool guidelines[1].
Philippe Proulx [Wed, 15 Mar 2017 00:48:18 +0000 (20:48 -0400)]
doc/man: add typical `$` and `#` prompts to command lines
It is more instinctive for the typical reader to immediately recognize
command lines when they start with the classic prompts.
On the online version of the man pages, those prompts are treated
specially to make them non-selectable. This makes it possible to copy
multiple command lines at once (without copying the prompts) and to
paste them to your shell.
Signed-off-by: Philippe Proulx <eeppeliteloop@gmail.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fix: race between lttng-ust getenv() and application setenv()
The LTTng-UST listener threads invoke getenv(), which can cause issues
if the application issues setenv() concurrently. This is a legitimate
use by the application because it may have a single thread and not be
aware that it runs with liblttng-ust.
Fix this by keeping our own environment variable table for the variables
we care about. Initialize this table within the lttng-ust library
constructor, when we don't race with the application.
As this thread shows:
https://sourceware.org/bugzilla/show_bug.cgi?id=5069#c10
getenv() does _not_ appear to be thread-safe if an application uses
setenv() or putenv().
Use SIZE_MAX instead of -1ULL for size_t parameter
strutils_star_glob_match() receives a size_t. Passing -1ULL truncates
the value implicitly on systems where size_t is 32-bit. It is cleaner to
use SIZE_MAX.
Support generic globbing patterns in the Java agent
Replace the separate eventNames and eventNamePrefixes maps by
one map tracking generic Patterns instead. This will allow
matching against patterns containing more than one wildcard
character, which is now supported by UST.
Philippe Proulx [Fri, 17 Feb 2017 09:26:59 +0000 (04:26 -0500)]
Add support for star globbing patterns in event names
This patch adds support for full star-only globbing patterns used in
the event names (enabler names).
strutils_star_glob_match() is always used to perform the match when
the enabler is LTTNG_ENABLER_STAR_GLOB. This enabler is set when it is
detected that its name contains at least one non-escaped star with
strutils_is_star_glob_pattern().
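The detection of a non-escaped star can be sketched like this. It is not the
code of strutils_is_star_glob_pattern(); the function name is hypothetical and
the sketch assumes that a backslash escapes the following character.

```c
#include <stdbool.h>

/* Return true if `pattern` contains at least one star not preceded by '\'. */
bool has_unescaped_star(const char *pattern)
{
	const char *p;

	for (p = pattern; *p != '\0'; p++) {
		if (*p == '\\') {
			/* Skip the escaped character, if there is one. */
			if (p[1] == '\0')
				break;
			p++;
			continue;
		}
		if (*p == '*')
			return true;
	}
	return false;
}
```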
While exclusions could, until now, be checked before the enabler name match,
they must now be checked after we know there's a match because the
intersection of exclusion names and event name is not always
checked on the LTTng-tools side (too much complexity for too little
gain).
The match itself is performed by strutils_star_glob_match(), the same
function that the filter interpreter uses.
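For illustration, a star-only match can be written with a single backtracking
point. This sketch is not strutils_star_glob_match(): the name is
hypothetical, it works on NUL-terminated strings only, and it does not handle
escaped stars.

```c
#include <stdbool.h>
#include <stddef.h>

/* Match `candidate` against `pattern`, where '*' matches any substring. */
bool star_glob_match_sketch(const char *pattern, const char *candidate)
{
	const char *p = pattern;
	const char *s = candidate;
	const char *star = NULL;   /* position of the last star seen */
	const char *retry = NULL;  /* candidate position to retry from */

	while (*s != '\0') {
		if (*p == '*') {
			/* Remember the star and first try matching nothing. */
			star = p++;
			retry = s;
		} else if (*p == *s) {
			p++;
			s++;
		} else if (star != NULL) {
			/* Let the last star absorb one more character. */
			p = star + 1;
			s = ++retry;
		} else {
			return false;
		}
	}
	/* Only trailing stars may remain in the pattern. */
	while (*p == '*')
		p++;
	return *p == '\0';
}
```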
Signed-off-by: Philippe Proulx <eeppeliteloop@gmail.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Philippe Proulx [Fri, 17 Feb 2017 09:14:40 +0000 (04:14 -0500)]
Filtering: add support for star-only globbing patterns
This patch adds the support for "full" star-only globbing patterns to be
used in filter literal strings. A star-only globbing pattern is a
globbing pattern with the star (`*`) being the only special character.
This means `?` and character sets (`[abc-k]`) are not supported here. We
cannot support them without a strategy to differentiate the globbing
pattern because `?` and `[` are not special characters in filter literal
strings right now. The eventual strategy to support them would probably
look like this:
filename =* "?sys*.[ch]"
The filter bytecode generator in LTTng-tools's session daemon creates
the new FILTER_OP_LOAD_STAR_GLOB_STRING operation when the interpreter
should load a star globbing pattern literal string. Even if both
"plain", or legacy strings and star globbing pattern strings are literal
strings, they do not represent the same thing, that is, the == and !=
operators act differently.
The validation process checks that:
1. There's no binary operator between two
FILTER_OP_LOAD_STAR_GLOB_STRING operations. It is illegal to compare
two star globbing patterns, as this is not trivial to implement, and
completely useless as far as I know.
2. Only the == and != binary operators are allowed between a
star globbing pattern and a string.
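The two checks can be sketched as a small validation function. The register
type names come from this commit message; the operator enum and the function
itself are hypothetical and only cover string registers.

```c
#include <errno.h>

enum reg_type { REG_STRING, REG_STAR_GLOB_STRING };
enum binary_op { OP_EQ, OP_NE, OP_GT, OP_LT };

/* Return 0 if the operation is allowed, -EINVAL otherwise. */
int validate_binary_op(enum binary_op op, enum reg_type left,
		enum reg_type right)
{
	int left_glob = (left == REG_STAR_GLOB_STRING);
	int right_glob = (right == REG_STAR_GLOB_STRING);

	/* Rule 1: two star globbing patterns cannot be compared. */
	if (left_glob && right_glob)
		return -EINVAL;
	/* Rule 2: pattern versus string only supports == and !=. */
	if ((left_glob || right_glob) && op != OP_EQ && op != OP_NE)
		return -EINVAL;
	return 0;
}
```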
For the special case of star globbing patterns with a star at the end
only, the current behaviour is not changed to preserve a maximum of
backward compatibility. This is also why the UST ABI version is changed
from 7.1 to 7.2, not to 8.0.
== or != operations between REG_STRING and REG_STAR_GLOB_STRING
registers are specialized to FILTER_OP_EQ_STAR_GLOB_STRING and
FILTER_OP_NE_STAR_GLOB_STRING. Which side is the actual globbing pattern
(the one with the REG_STAR_GLOB_STRING type) is checked at execution
time. The strutils_star_glob_match() function is used to perform the
match operation. See the implementation for more details.
Signed-off-by: Philippe Proulx <eeppeliteloop@gmail.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
This Makefile was using Distutils' setup.py to install the Python agent
but was using Autoconf's $pkgpythondir variable for the uninstall
process. The two folders can be different on some distributions, which
made the uninstall attempt to delete a non-existent folder, effectively
not uninstalling anything.
We now run a phony installation of the bindings in a temporary directory
and use the tree structure of the install folder to infer the location
of the files on the system to delete them.
Also, we print a warning if the install directory is not included in the
PYTHONPATH variable.
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
LTTNG_UST_BLOCKING_RETRY_TIMEOUT
    Maximum duration (milliseconds) to retry event tracing when
    there’s no space left for the event record in the
    sub-buffer.
    0 (default)
        Never block the application.
    Positive value
        Block the application for the specified number of
        milliseconds. If there’s no space left after this
        duration, discard the event record.
    Negative value
        Block the application until there’s space left for the
        event record.
    This option can be useful in workloads generating very
    large trace data throughput, where blocking the application
    is an acceptable trade-off to prevent discarding event
    records.
    Warning
        Setting this environment variable to a non-zero value
        may significantly affect application timings.
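The three cases above can be sketched as a small parser and classifier. All
names are hypothetical, and treating an unset or malformed value as 0 is an
assumption of this sketch, not documented behaviour.

```c
#include <errno.h>
#include <stdlib.h>

enum retry_policy {
	RETRY_NEVER,    /* 0: never block the application */
	RETRY_BOUNDED,  /* positive: block for at most that many milliseconds */
	RETRY_FOREVER   /* negative: block until space is available */
};

/* Parse the variable's value; unset or malformed input yields 0 (assumed). */
long parse_retry_timeout(const char *str)
{
	char *end;
	long value;

	if (str == NULL || *str == '\0')
		return 0;
	errno = 0;
	value = strtol(str, &end, 10);
	if (errno != 0 || *end != '\0')
		return 0;
	return value;
}

/* Map the timeout in milliseconds to one of the three documented cases. */
enum retry_policy classify_retry_timeout(long timeout_ms)
{
	if (timeout_ms == 0)
		return RETRY_NEVER;
	if (timeout_ms > 0)
		return RETRY_BOUNDED;
	return RETRY_FOREVER;
}
```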
Fix: loglevel and model_emf_uri with g++ compiled probes
Fix the loglevel and model_emf_uri features for probe providers compiled
with g++. They previously had no effect because of C++ symbol name
mangling. The weakref was referring to the non-mangled symbol, but C++
emits a mangled symbol for the static variable.
Fix this by emitting an extern "C" symbol with hidden visibility on C++.
With a C compiler, this simply turns a static variable into a variable
with hidden visibility.