Support generic globbing patterns in the Java agent
Replace the separate eventNames and eventNamePrefixes maps by
one map tracking generic Patterns instead. This will allow
matching against patterns containing more than one wildcard
character, which is now supported by UST.
Philippe Proulx [Fri, 17 Feb 2017 09:26:59 +0000 (04:26 -0500)]
Add support for star globbing patterns in event names
This patch adds support for full star-only globbing patterns used in
the event names (enabler names).
strutils_star_glob_match() is always used to perform the match when
the enabler is LTTNG_ENABLER_STAR_GLOB. This enabler is set when it is
detected that its name contains at least one non-escaped star with
strutils_is_star_glob_pattern().
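The detection step described above can be sketched as follows. This is a minimal illustration, not the strutils_is_star_glob_pattern() implementation: the helper name has_unescaped_star() is hypothetical, and the sketch assumes that a backslash is the escape character.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not the LTTng-UST code): returns true when `name`
 * contains at least one star that is not escaped. Assumption: a backslash
 * escapes the character that follows it.
 */
static bool has_unescaped_star(const char *name)
{
	for (const char *p = name; *p != '\0'; p++) {
		if (*p == '\\') {
			/* Skip the escaped character, if there is one. */
			if (p[1] == '\0') {
				break;
			}
			p++;
		} else if (*p == '*') {
			return true;
		}
	}
	return false;
}
```

With such a helper, a name like "my_app:*" would select the star globbing enabler, while "plain" or "a\*b" would not.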
Until now, exclusions could be checked before the enabler name match.
They must now be checked after we know there's a match, because the
intersection of exclusion names and the event name is not always
checked on the LTTng-tools side (too much complexity for too little
gain).
The match itself is performed by strutils_star_glob_match(), the same
function that the filter interpreter uses.
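The ordering described in this commit (name match first, exclusions second) can be sketched as follows. Everything here is an assumption made for illustration: star_glob_match() is a simplified matcher without escape handling, not strutils_star_glob_match(), and struct enabler and enabler_matches() are hypothetical names. The sketch also assumes that exclusion names may themselves be star globbing patterns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified star-only matcher (no escape handling), for illustration. */
static bool star_glob_match(const char *pat, const char *str)
{
	const char *star = NULL;	/* last star seen in the pattern */
	const char *retry = NULL;	/* where that star started matching */

	while (*str != '\0') {
		if (*pat == '*') {
			star = pat++;
			retry = str;
		} else if (*pat == *str) {
			pat++;
			str++;
		} else if (star != NULL) {
			/* Let the last star absorb one more character. */
			pat = star + 1;
			str = ++retry;
		} else {
			return false;
		}
	}
	/* Only trailing stars may remain in the pattern. */
	while (*pat == '*') {
		pat++;
	}
	return *pat == '\0';
}

/* Hypothetical enabler: a name pattern and a NULL-terminated exclusion list. */
struct enabler {
	const char *name;
	const char *const *exclusions;
};

static bool enabler_matches(const struct enabler *e, const char *event_name)
{
	if (!star_glob_match(e->name, event_name)) {
		return false;
	}
	/* Exclusions are only examined once the name is known to match. */
	for (size_t i = 0; e->exclusions != NULL && e->exclusions[i] != NULL; i++) {
		if (star_glob_match(e->exclusions[i], event_name)) {
			return false;
		}
	}
	return true;
}
```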
Signed-off-by: Philippe Proulx <eeppeliteloop@gmail.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Philippe Proulx [Fri, 17 Feb 2017 09:14:40 +0000 (04:14 -0500)]
Filtering: add support for star-only globbing patterns
This patch adds the support for "full" star-only globbing patterns to be
used in filter literal strings. A star-only globbing pattern is a
globbing pattern with the star (`*`) being the only special character.
This means `?` and character sets (`[abc-k]`) are not supported here. We
cannot support them without a strategy to differentiate the globbing
pattern because `?` and `[` are not special characters in filter literal
strings right now. The eventual strategy to support them would probably
look like this:
filename =* "?sys*.[ch]"
The filter bytecode generator in LTTng-tools's session daemon creates
the new FILTER_OP_LOAD_STAR_GLOB_STRING operation when the interpreter
should load a star globbing pattern literal string. Even though both
plain (legacy) strings and star globbing pattern strings are literal
strings, they do not represent the same thing: the == and !=
operators act differently on them.
The validation process checks that:
1. There's no binary operator between two
FILTER_OP_LOAD_STAR_GLOB_STRING operations. It is illegal to compare
two star globbing patterns, as this is not trivial to implement, and
completely useless as far as I know.
2. Only the == and != binary operators are allowed between a
star globbing pattern and a string.
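The two validation rules above can be sketched as a small predicate. The enumerators and the function name are illustrative assumptions and do not correspond to the actual bytecode validator.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative operand and operator kinds (not the bytecode's own names). */
enum operand_kind { OPERAND_STRING, OPERAND_STAR_GLOB_STRING };
enum binary_op { OP_EQ, OP_NE, OP_LT, OP_GT, OP_LE, OP_GE };

static bool string_comparison_is_valid(enum operand_kind lhs,
		enum binary_op op, enum operand_kind rhs)
{
	bool lhs_glob = (lhs == OPERAND_STAR_GLOB_STRING);
	bool rhs_glob = (rhs == OPERAND_STAR_GLOB_STRING);

	if (lhs_glob && rhs_glob) {
		/* Rule 1: two star globbing patterns cannot be compared. */
		return false;
	}
	if (lhs_glob || rhs_glob) {
		/* Rule 2: only == and != between a pattern and a string. */
		return op == OP_EQ || op == OP_NE;
	}
	/* Two plain strings: not restricted by the two rules above. */
	return true;
}
```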
For the special case of star globbing patterns with a star at the end
only, the current behaviour is not changed to preserve a maximum of
backward compatibility. This is also why the UST ABI version is changed
from 7.1 to 7.2, not to 8.0.
== or != operations between REG_STRING and REG_STAR_GLOB_STRING
registers are specialized to FILTER_OP_EQ_STAR_GLOB_STRING and
FILTER_OP_NE_STAR_GLOB_STRING. Which side is the actual globbing pattern
(the one with the REG_STAR_GLOB_STRING type) is checked at execution
time. The strutils_star_glob_match() function is used to perform the
match operation. See the implementation for more details.
Signed-off-by: Philippe Proulx <eeppeliteloop@gmail.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
This Makefile was using Distutils' setup.py to install the Python agent
but was using Autoconf's $pkgpythondir variable for the uninstall
process. The two folders can differ on some distributions, which made
the uninstall target attempt to delete a nonexistent folder and
effectively not uninstall anything.
We now run a phony installation of the bindings in a temporary directory
and use the tree structure of the install folder to infer the location
of the files on the system to delete them.
Also, we print a warning if the install directory is not included in the
PYTHONPATH variable.
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
LTTNG_UST_BLOCKING_RETRY_TIMEOUT
Maximum duration (milliseconds) to retry event tracing when
there’s no space left for the event record in the
sub-buffer.
0 (default)
Never block the application.
Positive value
Block the application for the specified number of
milliseconds. If there’s no space left after this
duration, discard the event record.
Negative value
Block the application until there’s space left for the
event record.
This option can be useful in workloads generating very
large trace data throughput, where blocking the application
is an acceptable trade-off to prevent discarding event
records.
Warning
Setting this environment variable to a non-zero value
may significantly affect application timings.
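The three cases of LTTNG_UST_BLOCKING_RETRY_TIMEOUT can be summarized by the following sketch. The helper name and its parameters are hypothetical; the sketch only restates the documented semantics and is not the tracer's retry loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: given the configured timeout and the time already
 * spent waiting for space in the sub-buffer (both in milliseconds),
 * decide whether to keep retrying or to discard the event record.
 */
static bool should_keep_retrying(int64_t timeout_ms, int64_t elapsed_ms)
{
	if (timeout_ms == 0) {
		return false;	/* default: never block the application */
	}
	if (timeout_ms < 0) {
		return true;	/* negative: block until there is space */
	}
	return elapsed_ms < timeout_ms;	/* positive: bounded blocking */
}
```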
Fix: loglevel and model_emf_uri with g++ compiled probes
Fix the loglevel and model_emf_uri features for probe providers compiled
with g++. They previously had no effect because of C++ symbol name
mangling. The weakref was referring to the non-mangled symbol, but C++
emits a mangled symbol for the static variable.
Fix this by emitting an extern "C" symbol with hidden visibility on C++.
With a C compiler, this simply turns a static variable into a variable
with hidden visibility.
Fix: perform statedump before replying to sessiond
If a stop command immediately follows a start command, the consumer
daemon will stop event recording in the ring buffers shared memory
control structures before the sessiond sends further commands to the
application. Therefore, a stop-after-start may be performed concurrently
with the statedump, leaving parts of the statedump missing. This can
always happen if an application exits during statedump, but an
incomplete statedump is not expected in the stop-after-start use case.
The session daemon statedump regeneration tests expect that the
statedump is completed when the regeneration command returns. This also
requires that we perform the statedump in lttng-ust before replying to
the session daemon command.
This library overrides close() and closeall() libc functions, and uses
lttng_ust_safe_close_fd() to check whether the application can
interact with the file descriptor or if it should be left to lttng-ust.
This takes care of bugs caused by applications doing bulk close() or
closefrom() of file descriptors soon after forking.
Introduce a tracker for file descriptors used by lttng-ust. It exposes
a new API in an internal header lttng_ust_safe_close_fd(), which is
meant to be used by a LD_PRELOADed library overriding close() and
closefrom() (BSD).
This takes care of bugs caused by applications doing bulk close() or
closefrom() of file descriptors soon after forking.
We need to hold the ust_lock() to protect the fd tracker lock against
fork. Since the fd tracker is needed across connect() (which allocates a
file descriptor), we need to hold the ust_lock across connect().
Needed if we want to hold the ust_lock() while we connect to the session
daemon without blocking the application forever if the session daemon is
hung on SIGSTOP.
This only triggers if we launch _many_ applications with a session
daemon SIGSTOP'd (e.g. 1000 in parallel), so we fill the socket queue,
and applications hang there until the session daemon is SIGCONT'd.
Fix: perform TLS fixup in all UST entry points from each thread
Each entry point into lttng-ust that ends up taking the ust lock needs to
perform a TLS fixup for each thread. Add a TLS fixup in both listener
threads, in fork and base address dump helper libs, and in app context
and tracepoint probe registration/unregistration functions, which can be
called from application threads.
Those ensure we don't take the libc dl lock within the ust lock when
performing the TLS lazy fixup.
Because all length parameters received for serializing data coming from
applications go through a callback, they are never constant, and it
hurts performance to perform a call to memcpy each time.
Performance: mark ring buffer do_copy callers always inline
The underlying copy operation is more efficient if the size is a
constant, which only happens if this function is inlined in the caller.
Otherwise, we end up calling memcpy for each field.
Force inlining for performance reasons for:
- lib_ring_buffer_write,
- lib_ring_buffer_do_strcpy,
- lib_ring_buffer_strcpy.
Note that in lttng-ust, the probe provider serialization functions need
to call the lttng_event_write() client callback, which will fall back to
the memcpy operation.
Inlining those functions helps for the event header code, which can
inline them.
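The effect of forced inlining on constant-size copies can be sketched as follows. copy_field() is a hypothetical stand-in, not the ring buffer's do_copy code, and the always_inline attribute assumes GCC or Clang. Once the function is inlined into a caller that passes a compile-time constant length, the switch typically folds into a single fixed-size copy, which compilers usually lower to a plain load and store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical copy helper. With a constant `len` at the (inlined) call
 * site, only one case survives; with a variable `len`, the default branch
 * performs a regular memcpy() call.
 */
static inline __attribute__((always_inline))
void copy_field(void *dest, const void *src, size_t len)
{
	switch (len) {
	case 1:
		memcpy(dest, src, 1);
		break;
	case 2:
		memcpy(dest, src, 2);
		break;
	case 4:
		memcpy(dest, src, 4);
		break;
	case 8:
		memcpy(dest, src, 8);
		break;
	default:
		memcpy(dest, src, len);
		break;
	}
}
```

A call such as copy_field(&out, &in, sizeof(uint32_t)) benefits from the constant size, whereas a length received through a callback parameter does not.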
Performance: cache the backend pages pointer in context
Getting the backend pages pointer requires walking through the ring
buffer backend tables through multiple shmp operations. Cache the
current value so it can be re-used for all backend write operations
writing fields for the same event.
Performance: Relax atomicity constraints for crash handling
Use a store rather than a cmpxchg() for the update of the
sequential commit counter. This speeds up commit. The downside
is that short race windows between the if() check to see if the
counter is larger than the new value and the update could result
in the counter going backwards, in unlikely preemption or signal
delivery scenarios.
Accept that we may lose a few events in a crash dump for the
benefit of tracing speed.
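The trade-off described above can be sketched with C11 atomics. Both function names are hypothetical and the counter type is an assumption; the actual code uses the ring buffer's own counter primitives.

```c
#include <assert.h>
#include <stdatomic.h>

/* Strict variant: the counter can never move backwards. */
static void update_seq_cmpxchg(atomic_long *seq, long new_value)
{
	long old = atomic_load_explicit(seq, memory_order_relaxed);

	while (new_value > old) {
		if (atomic_compare_exchange_weak_explicit(seq, &old, new_value,
				memory_order_relaxed, memory_order_relaxed)) {
			break;
		}
		/* `old` now holds the current value; the loop re-checks it. */
	}
}

/*
 * Relaxed variant: a check followed by a plain store. A concurrent update
 * landing between the check and the store can be overwritten by a smaller
 * value, which is the short race window accepted by this commit.
 */
static void update_seq_store(atomic_long *seq, long new_value)
{
	if (new_value > atomic_load_explicit(seq, memory_order_relaxed)) {
		atomic_store_explicit(seq, new_value, memory_order_relaxed);
	}
}
```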
Disable event counting in the ring buffer, which can count the number of
events produced per ring-buffer, as well as the number of events
overwritten in overwrite mode.
This feature is currently unused anyway: it is not saved in the ring
buffer header, nor made available to lttng-tools.
This saves 70 ns/event in lttng-ust on the ARM32 Cubietruck.
Performance: remove rcu read lock from ring buffer get/put cpu
The tracepoints are already protected by a RCU-bp read-side lock, so
trying to take this nested lock is useless. We gain 132 ns/event on the
ARM32 Cubietruck by removing this nested rcu read-side lock.
On ARMv7l (Cubietruck), the compiler generates a function call for each
lib_ring_buffer_check_deliver call, even though it typically only does an
unlikely check. Split it into an inline fast path, and a function call
for the slow path. This brings a performance gain of about 500ns/event
on the Cubietruck.
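The fast path/slow path split can be sketched as follows. The names and the delivery condition are placeholders chosen for illustration (the real condition in lib_ring_buffer_check_deliver is different), and the attributes assume GCC or Clang.

```c
#include <assert.h>

static int slow_path_calls;	/* only here to observe the behaviour */

/* Out-of-line slow path: the rare, heavier work lives here. */
static __attribute__((noinline))
void check_deliver_slow(unsigned long commit_count)
{
	(void) commit_count;
	slow_path_calls++;
}

/*
 * Inline fast path: a cheap check marked as unlikely. Placeholder
 * condition: the commit count reached a sub-buffer boundary.
 * `subbuf_size` must be non-zero.
 */
static inline void check_deliver(unsigned long commit_count,
		unsigned long subbuf_size)
{
	if (__builtin_expect(commit_count % subbuf_size == 0, 0)) {
		check_deliver_slow(commit_count);
	}
}
```

In the common case, the caller only pays for the comparison and no function call is emitted on the hot path.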
Fix: perf counters build against kernel headers < 3.12
Copy Linux kernel perf_event.h installed headers into lttng-ust to know
the recent ABI layout, and use the bit description detailed in the
following Linux kernel commit:
Fall back on the perf read system call for kernels prior to 3.12,
because older kernels have an ABI bug where a union was used for both
cap_usr_time and cap_usr_rdpmc.
Ensure setup_perf sets the pc pointer value before checking whether we
need to keep the file descriptor open or not.
Batch invocation of synchronize_rcu() when unregistering many events
from a session.
Also batch invocation of synchronize_rcu() when registering the same
events within many concurrent sessions (starting from the 2nd session).
Those slowdowns are noticeable with application processes that have a
short life-time, e.g. shell scripts spawning multiple short-lived
processes take significantly longer to complete when LD_PRELOADing a UST
probe provider.
This slowdown only occurs when UST tracing sessions are created in the
session daemon.
tracepoint_probe_update_all() (currently unused) implements a similar
mechanism which has the downside of iterating on all events in all probe
libraries (not as efficient). Move synchronize_rcu() in
tracepoint_probe_update_all() outside of the iteration on all events to
free in this function, because it is only needed between the last
callsite update and the first memory reclaim, not between list removal
and reclaim.
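The batching idea can be sketched as follows: unpublish every entry first, wait for a single grace period, then reclaim. This is a simplified single-threaded model. fake_synchronize_rcu() is a stand-in that only counts calls, and struct probe and unregister_all() are hypothetical names; real code would use the liburcu primitives and proper RCU list removal.

```c
#include <assert.h>
#include <stdlib.h>

static int grace_periods;

/* Stand-in for synchronize_rcu(): it only counts how often it is called. */
static void fake_synchronize_rcu(void)
{
	grace_periods++;
}

struct probe {
	struct probe *next;
};

static void unregister_all(struct probe **head)
{
	struct probe *removed = *head;

	*head = NULL;			/* 1. unpublish the whole batch */
	if (removed == NULL) {
		return;
	}
	fake_synchronize_rcu();		/* 2. one grace period for all entries */
	while (removed != NULL) {	/* 3. reclaim the memory */
		struct probe *next = removed->next;

		free(removed);
		removed = next;
	}
}
```

Calling the synchronization primitive once per entry instead would cost one grace period per unregistered event.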
Faulting the TLS variable when accessed for the first time can trigger
deadlocks with internal libc lock when using the liblttng-ust-malloc
wrapper. Fix this by pre-faulting it in a library constructor, similarly
to the approach taken for all other TLS variables in lttng-ust.
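Pre-faulting a TLS variable from a library constructor can be sketched as follows. The variable and function names are hypothetical, and the sketch assumes GCC or Clang (for __thread, the constructor attribute and the inline asm syntax).

```c
#include <assert.h>

static __thread int malloc_nesting;	/* hypothetical TLS variable */
static int prefaulted;			/* only here to observe the constructor */

static void touch_tls(void)
{
	/*
	 * An empty asm statement with a memory operand forces the address
	 * of the TLS variable to be resolved now, for the calling thread,
	 * rather than lazily on first real access.
	 */
	__asm__ __volatile__ ("" : : "m" (malloc_nesting));
}

/* Runs when the library (or program) is loaded, before main(). */
static void __attribute__((constructor)) prefault_tls(void)
{
	touch_tls();
	prefaulted = 1;
}
```

A constructor only covers the thread that runs it; other threads still need their own fixup at the entry points they use.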
Fix: cleanup local_apps.allowed flag on lib cleanup
In case of applications using fork/clone, which drop their privileges,
we need to clear the local_apps.allowed flag, otherwise those
applications hit an assertion when using the liblttng-ust-fork helper,
e.g.:
varnishd: lttng-ust-comm.c:423: setup_local_apps: Assertion `local_apps.allowed == 0' failed.
CID 1357641 (#1 of 1): Out-of-bounds write (OVERRUN)
2. sprintf_overrun: sprintf will overrun its first argument &name[len],
which can accommodate 4 bytes. The number of bytes written may be
5 bytes, including the terminating null.
This variable can be tested by applications to check whether lttng-ust
is loaded. They simply have to define their own "lttng_ust_loaded" weak
symbol, and test it. It is set to 1 by the library constructor.
The main use-case is to allow applications to detect that they should
not try to close file descriptors that do not belong to them (e.g. BSD
closefrom). This is a common pattern with applications invoking
daemon(3).
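An application-side check based on the weak symbol can be sketched as follows. The symbol name lttng_ust_loaded comes from the text above; the helper function tracer_is_loaded() is a hypothetical name, and the weak attribute assumes GCC or Clang.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * The application's own weak definition. It stays 0 unless the lttng-ust
 * library constructor sets it to 1.
 */
int lttng_ust_loaded __attribute__((weak));

static bool tracer_is_loaded(void)
{
	return lttng_ust_loaded != 0;
}
```

An application that is about to close a range of file descriptors could test tracer_is_loaded() first and leave the descriptors that do not belong to it alone.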
Set internally by liblttng-ust's constructor. Can be used by
applications to detect if lttng-ust is loaded, even if liblttng-ust is
not directly linked by the application.
The main use-case is to allow applications to detect that they should
not try to close file descriptors that do not belong to them (e.g.
BSD closefrom). This is a common pattern with applications invoking
daemon(3).
Note that this environment variable is passed to children of a traced
process, and through exec calls. Therefore, an application might think
that lttng-ust is loaded even though it's not loaded in its own address
space if it was loaded by one of its parent processes.
On this error path, we should not free lttng_chan, because it is located
within an allocated shm memory area associated with the channel now. It
is invalid to free this pointer.
This takes care of correctly tracing the mapping of direct dependencies
of dlopen'd libraries, which was not appropriately done by tracing just
dlopen events.
Julien Desfossez [Mon, 27 Jun 2016 21:40:01 +0000 (17:40 -0400)]
Add perf context support for ARMv7
Allow adding perf contexts to UST traces. ARMv7 does not have a reliable
way to read perf PMU counters entirely from user-space like we do on
x86, so this approach requires a system call every time a counter needs
to be read, which has a significant performance impact.
ARMv7 does not have a way to read the PMU from user-space because doing
so requires write access to the debug coprocessor to select which PMU
counter to read, which defeats user-space/kernel protection. For that
reason, the bits required to allow user-space access to those registers
are not enabled in the kernel and Perf does not expose any information
in the shared mmap page, so we do not know the counter index. Also, on
ARMv7 we cannot set the exclude_kernel flag, so the counter stays active
even when the process is executing in kernel context (system calls
mainly).
This generic approach might work on other architectures, but it has not
yet been tested, so it is not enabled in the code.
Julien Desfossez [Mon, 27 Jun 2016 21:40:00 +0000 (17:40 -0400)]
Keep perf context FD open for other architectures
Instead of closing the perf context after the page has been mapped, keep
it open so it can be used with the read() system call if the
architecture does not support direct access from user-space.
The Makefile.am file in the examples directory was modified to include
the new cmake example. The new example is built when make is invoked in
the root directory, and it also ships in the tarball when the latter is
created ("make distcheck").
The new example lives in doc/examples/cmake-multiple-shared-libraries/.
It requires a C++ compiler (HAVE_CXX) and a cmake executable
(HAVE_CMAKE). It relies on the cmake module called FindLTTngUST, which
was kindly provided by Philippe Proulx.
The alignment of a few lines was improved in the FindLTTngUST module.
This new example generates a shared library (tracepoint provider) that
links with dl and with lttng-ust. Two other shared libraries are also
generated, and these two are linked with the tracepoint provider shared
library.
In the configure.ac file, detect if a cmake executable is
available. Two new variables are now available in any
Makefile.am file: HAVE_CXX (which tells if a C++ compiler
is available) and HAVE_CMAKE (which indicates whether a cmake
executable is available).
Non-LGPL modules that use tracepoint instrumentation, but have no
compile unit defining either TRACEPOINT_DEFINE or
TRACEPOINT_CREATE_PROBES fail to build due to undefined reference to
`tracepoint_dlopen_ptr'.
Add -ust to the name of UST threads of the application
Add the required functions to change the thread name of the UST
threads and add the -ust string at its end. This will help to
identify LTTng-UST processes when analyzing the trace of a process.
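Building the suffixed name can be sketched as follows. The helper is hypothetical and is not the function added by this commit. It relies on the Linux limit of 16 bytes for a thread name, terminating NUL included, and assumes that the base name is truncated so that the suffix always fits.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define THREAD_NAME_LEN 16	/* Linux limit, terminating NUL included */

/*
 * Hypothetical helper: writes "<name>-ust" into `out`, truncating <name>
 * when needed so that the "-ust" suffix is never cut off.
 */
static void make_ust_thread_name(const char *name, char out[THREAD_NAME_LEN])
{
	const char *suffix = "-ust";
	size_t max_base = THREAD_NAME_LEN - 1 - strlen(suffix);

	snprintf(out, THREAD_NAME_LEN, "%.*s%s", (int) max_base, name, suffix);
}
```

The resulting string could then be applied with prctl(PR_SET_NAME, ...) or pthread_setname_np().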
Comment the locking mechanisms in ContextInfoManager
Coverity didn't like our non-locking of the get() method. The
lock is actually only needed for the registration/unregistration
of retrievers, the get() can access the ConcurrentHashMap
directly.
Fix: Include child loggers in the output of "lttng list"
The case where a parent logger has a handler attached but the
tracepoint comes from a child logger is not correctly handled
by the "lttng list -j/-l" command.
For example, if the logger "org.myapp" has an LTTng handler
attached, its child logger "org.myapp.mycomponent" would be
absent from the lttng list output even if it exists.
When checking for events to list, search through the parent
tree of each logger to find a potential LTTng handler.
This should also fix the problem of "lttng list" always
returning empty when the deprecated, but still supported,
LTTngAgent API was used, since that one attaches only one
handler to the root logger.
Fix: Handle both agent config files pointing to same port
The expected locations for the user and root agent sessiond
configuration files are ~/.lttng/agent.port and
/var/run/lttng/agent.port, respectively. These files indicate
which port an agent should connect to in order to reach its respective
sessiond.
If by some bad luck both files indicate the same port, then
both Java TCP clients would end up connecting to the same
sessiond, resulting in weird results, like "lttng list" listing
all events twice.
Make sure the target ports are different, and avoid duplicate
connections in case they are not.