Philippe Proulx [Thu, 27 Jul 2017 23:28:40 +0000 (19:28 -0400)]
Fix: doc/man: use a single XSL file and match local names
Matching the local name instead of the full name, that is:
*[local-name() = 'co']
instead of just `co` matches both the non-namespaced element and the
DocBook-namespaced element whether we're using the DocBook 4.5 or
DocBook 5.0 stylesheets.
Signed-off-by: Philippe Proulx <eeppeliteloop@gmail.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Introduce the LTTNG_UST_ALLOW_BLOCKING env. var. to control whether
applications are allowed to block when a buffer is full. If set, it
allows the tracer to block the application when buffers are full.
The blocking is now controlled by a per-channel configuration option in
the LTTng control interface for channels with the "--blocking-timeout"
parameter, which is specified in usec (or -1 to block forever).
This replaces the LTTNG_UST_BLOCKING_RETRY_TIMEOUT env. var., which
never actually made it into a stable release (we therefore remove this
env. var.).
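A minimal C sketch of how the two levels of control described above could combine: the process-wide opt-in gates blocking, and the per-channel timeout (in usec, -1 meaning forever) applies only once the application has opted in. Assumptions not taken from lttng-ust: only the presence of LTTNG_UST_ALLOW_BLOCKING is checked, 0 stands for "do not block", and both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Sentinel for "block until space is available" (the -1 of the text). */
#define BLOCK_FOREVER_US ((int64_t)-1)

/* Hypothetical: true if the application opted in (presence check only). */
bool app_allows_blocking(void)
{
	return getenv("LTTNG_UST_ALLOW_BLOCKING") != NULL;
}

/*
 * Hypothetical: effective timeout (usec) used when a buffer is full.
 * Without the opt-in the tracer never blocks (0 in this sketch);
 * otherwise the channel's --blocking-timeout value applies as-is.
 */
int64_t effective_blocking_timeout_us(bool app_allowed, int64_t channel_timeout_us)
{
	if (!app_allowed)
		return 0;
	return channel_timeout_us;
}
```

Keeping the opt-in separate from the channel setting means a session can request blocking, but an application that never set the variable keeps its timing guarantees.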
Michael Jeanson [Tue, 9 May 2017 18:25:01 +0000 (14:25 -0400)]
Fix: Don't override user variables within the build system
Instead, use the appropriately prefixed AM_* variables so as not to interfere
when a user variable is passed to a make command. The proper use of flag
variables is documented at:
The protocol's minor version is bumped since a new API entry
point is introduced. The soname's "current" and "age" fields are
bumped in accordance with the libtool guidelines[1].
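As a worked example of the libtool rules mentioned above (assuming libtool's documented -version-info current:revision:age scheme and its Linux naming, where the soname major number is current - age), bumping both "current" and "age" for a release that only adds an entry point leaves the soname unchanged, so existing binaries keep loading the library. The starting triple in the check is illustrative, not lttng-ust's real version.

```c
/* libtool version triple, as passed to -version-info current:revision:age */
struct lt_version {
	unsigned current;
	unsigned revision;
	unsigned age;
};

/* Apply libtool's update rules for a release that only adds interfaces. */
struct lt_version lt_add_interfaces(struct lt_version v)
{
	v.current += 1;	/* the interface set changed */
	v.revision = 0;	/* reset whenever current is bumped */
	v.age += 1;	/* additions keep backward compatibility */
	return v;
}

/* On Linux, libtool names the library lib<name>.so.(current - age). */
unsigned lt_linux_soname_major(struct lt_version v)
{
	return v.current - v.age;
}
```

For instance, going from 0:0:0 to 1:0:1 keeps the soname major number at 0.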
Philippe Proulx [Wed, 15 Mar 2017 00:48:18 +0000 (20:48 -0400)]
doc/man: add typical `$` and `#` prompts to command lines
It is more instinctive for the typical reader to immediately recognize
command lines when they start with the classic prompts.
On the online version of the man pages, those prompts are treated
specially to make them non-selectable. This makes it possible to copy
multiple command lines at once (without copying the prompts) and to
paste them into your shell.
Signed-off-by: Philippe Proulx <eeppeliteloop@gmail.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fix: race between lttng-ust getenv() and application setenv()
The LTTng-UST listener threads invoke getenv(), which can cause issues
if the application issues setenv() concurrently. This is a legitimate
use by the application because it may have a single thread and not be
aware that it runs with liblttng-ust.
Fix this by keeping our own environment variable table for the variables
we care about. Initialize this table within the lttng-ust library
constructor, when we don't race with the application.
As this thread shows:
https://sourceware.org/bugzilla/show_bug.cgi?id=5069#c10
getenv() does _not_ appear to be thread-safe if an application uses
setenv() or putenv().
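A minimal C sketch of the private environment table described above: the library constructor copies the values it cares about once, and later lookups (for example from listener threads) only read those copies, so they can never race with an application's setenv(). The variable list and the private_getenv() name are illustrative assumptions, not lttng-ust's actual table or API.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct env_entry {
	const char *key;
	char *value;	/* private copy taken at init, NULL if unset */
};

/* Illustrative list of variables the library cares about. */
static struct env_entry env_table[] = {
	{ "LTTNG_UST_DEBUG", NULL },
	{ "LTTNG_UST_REGISTER_TIMEOUT", NULL },
};

#define ENV_TABLE_LEN (sizeof(env_table) / sizeof(env_table[0]))

/*
 * Library constructor: per the text, this runs when we do not race
 * with the application, so calling getenv() here is safe.
 */
__attribute__((constructor))
static void env_table_init(void)
{
	for (size_t i = 0; i < ENV_TABLE_LEN; i++) {
		const char *v = getenv(env_table[i].key);

		if (v) {
			size_t n = strlen(v) + 1;

			env_table[i].value = malloc(n);
			if (env_table[i].value)
				memcpy(env_table[i].value, v, n);
		}
	}
}

/* Hypothetical lookup: never touches the real environment. */
const char *private_getenv(const char *key)
{
	for (size_t i = 0; i < ENV_TABLE_LEN; i++) {
		if (strcmp(env_table[i].key, key) == 0)
			return env_table[i].value;
	}
	return NULL;
}
```

Copying the strings matters: the pointer returned by getenv() may be invalidated by a later setenv(), so caching the pointer alone would not be enough.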
Use SIZE_MAX instead of -1ULL for size_t parameter
strutils_star_glob_match() receives a size_t. Passing -1ULL truncates
the value implicitly on systems where size_t is 32-bit. It is cleaner to
use SIZE_MAX.
Support generic globbing patterns in the Java agent
Replace the separate eventNames and eventNamePrefixes maps by
one map tracking generic Patterns instead. This will allow
matching against patterns containing more than one wildcard
character, which is now supported by UST.
Philippe Proulx [Fri, 17 Feb 2017 09:26:59 +0000 (04:26 -0500)]
Add support for star globbing patterns in event names
This patch adds support for full star-only globbing patterns used in
the event names (enabler names).
strutils_star_glob_match() is always used to perform the match when
the enabler is LTTNG_ENABLER_STAR_GLOB. This enabler is set when it is
detected that its name contains at least one non-escaped star with
strutils_is_star_glob_pattern().
Until now, exclusions could be checked before matching the enabler
name. They must now be checked after we know there's a match, because
the intersection of exclusion names and event names is not always
checked on the LTTng-tools side (too much complexity for too little
gain).
The match itself is performed by strutils_star_glob_match(), the same
function that the filter interpreter uses.
Signed-off-by: Philippe Proulx <eeppeliteloop@gmail.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
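To make "star-only globbing" concrete, here is a minimal C sketch of such a matcher using the common greedy technique that backtracks to the last star. It is not the strutils_star_glob_match() implementation. It only mirrors a (pattern, length, candidate, length) shape, under the assumption that a length of SIZE_MAX means "read up to the terminating NUL", and it treats a backslash as escaping the next pattern character, in line with the "non-escaped star" wording above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Character at index i, or '\0' past the end (len or NUL, whichever first). */
static inline char glob_at(const char *s, size_t len, size_t i)
{
	return i < len ? s[i] : '\0';
}

/* Hypothetical star-only matcher: '*' matches any (possibly empty) run. */
bool star_glob_match(const char *pattern, size_t pattern_len,
		     const char *candidate, size_t candidate_len)
{
	size_t p = 0, c = 0;
	size_t star_p = SIZE_MAX;	/* pattern index just after the last '*' */
	size_t star_c = 0;		/* candidate index the last '*' resumes from */

	for (;;) {
		char pc = glob_at(pattern, pattern_len, p);
		char cc = glob_at(candidate, candidate_len, c);
		size_t step = 1;

		if (pc == '*') {
			/* Remember the star; first try letting it match nothing. */
			star_p = ++p;
			star_c = c;
			continue;
		}
		if (cc == '\0') {
			/* Candidate exhausted: only trailing stars may remain. */
			while (glob_at(pattern, pattern_len, p) == '*')
				p++;
			return glob_at(pattern, pattern_len, p) == '\0';
		}
		if (pc == '\\' && glob_at(pattern, pattern_len, p + 1) != '\0') {
			pc = glob_at(pattern, pattern_len, p + 1);	/* escaped literal */
			step = 2;
		}
		if (pc != '\0' && pc == cc) {
			p += step;
			c++;
			continue;
		}
		if (star_p == SIZE_MAX)
			return false;	/* mismatch with no star to fall back on */
		/* Let the last star swallow one more candidate character. */
		p = star_p;
		c = ++star_c;
	}
}
```

With this shape, star_glob_match("sys_*", SIZE_MAX, "sys_open", SIZE_MAX) matches, while an escaped pattern such as "foo\\*" only matches the literal string "foo*".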
Philippe Proulx [Fri, 17 Feb 2017 09:14:40 +0000 (04:14 -0500)]
Filtering: add support for star-only globbing patterns
This patch adds support for "full" star-only globbing patterns to be
used in filter literal strings. A star-only globbing pattern is a
globbing pattern with the star (`*`) being the only special character.
This means `?` and character sets (`[abc-k]`) are not supported here. We
cannot support them without a strategy to differentiate the globbing
pattern because `?` and `[` are not special characters in filter literal
strings right now. The eventual strategy to support them would probably
look like this:
filename =* "?sys*.[ch]"
The filter bytecode generator in LTTng-tools's session daemon creates
the new FILTER_OP_LOAD_STAR_GLOB_STRING operation when the interpreter
should load a star globbing pattern literal string. Even though both
"plain" (legacy) strings and star globbing pattern strings are literal
strings, they do not represent the same thing: the == and != operators
act differently on them.
The validation process checks that:
1. There's no binary operator between two
FILTER_OP_LOAD_STAR_GLOB_STRING operations. It is illegal to compare
two star globbing patterns, as this is not trivial to implement, and
completely useless as far as I know.
2. Only the == and != binary operators are allowed between a
star globbing pattern and a string.
For the special case of star globbing patterns with a star at the end
only, the current behaviour is not changed to preserve a maximum of
backward compatibility. This is also why the UST ABI version is changed
from 7.1 to 7.2, not to 8.0.
== or != operations between REG_STRING and REG_STAR_GLOB_STRING
registers are specialized to FILTER_OP_EQ_STAR_GLOB_STRING and
FILTER_OP_NE_STAR_GLOB_STRING. Which side is the actual globbing pattern
(the one with the REG_STAR_GLOB_STRING type) is checked at execution
time. The strutils_star_glob_match() function is used to perform the
match operation. See the implementation for more details.
Signed-off-by: Philippe Proulx <eeppeliteloop@gmail.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
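The two validation rules listed in this commit can be sketched as a type check on the operands of a binary operator. The enums and the function name below are hypothetical and only loosely modelled on the register types named in the text. One extra assumption is labelled in the code: a globbing pattern compared against a non-string operand is also rejected.

```c
#include <stdbool.h>

/* Hypothetical register types and binary operators for the sketch. */
enum reg_type { REG_S64, REG_DOUBLE, REG_STRING, REG_STAR_GLOB_STRING };
enum bin_op { OP_EQ, OP_NE, OP_LT, OP_GT, OP_LE, OP_GE };

/* Returns true if `op` may be applied to operands of types `a` and `b`. */
bool validate_glob_binop(enum reg_type a, enum reg_type b, enum bin_op op)
{
	bool a_glob = (a == REG_STAR_GLOB_STRING);
	bool b_glob = (b == REG_STAR_GLOB_STRING);

	/* Rule 1: no binary operator between two globbing patterns. */
	if (a_glob && b_glob)
		return false;

	if (a_glob || b_glob) {
		enum reg_type other = a_glob ? b : a;

		/* Assumption: a pattern can only meet a plain string. */
		if (other != REG_STRING)
			return false;
		/* Rule 2: only == and != between a pattern and a string. */
		return op == OP_EQ || op == OP_NE;
	}
	/* No globbing pattern involved: outside the scope of this sketch. */
	return true;
}
```

Because either operand may be the pattern, the check is symmetric; at execution time the interpreter still has to find which side holds the REG_STAR_GLOB_STRING, as described above.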
This Makefile was using Distutils' setup.py to install the Python agent
but was using Autoconf's $pkgpythondir variable for the uninstall
process. The two folders can differ on some distributions, which
made the uninstall step attempt to delete a nonexistent folder,
effectively uninstalling nothing.
We now run a phony installation of the bindings in a temporary directory
and use the tree structure of that install folder to infer the location
of the files on the system in order to delete them.
Also, we print a warning if the install directory is not included in the
PYTHONPATH variable.
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
LTTNG_UST_BLOCKING_RETRY_TIMEOUT
Maximum duration (milliseconds) to retry event tracing when
there’s no space left for the event record in the
sub-buffer.
0 (default)
Never block the application.
Positive value
Block the application for the specified number of
milliseconds. If there’s no space left after this
duration, discard the event record.
Negative value
Block the application until there’s space left for the
event record.
This option can be useful in workloads generating very
large trace data throughput, where blocking the application
is an acceptable trade-off to prevent discarding event
records.
Warning
Setting this environment variable to a non-zero value
may significantly affect application timings.
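The documented semantics of this timeout (0 never blocks, a positive value retries for that many milliseconds, a negative value retries until space is available) map onto a small retry loop. This C sketch is an illustration under assumptions: the callback type, the 1 ms back-off and every name are hypothetical, and the real tracer's waiting strategy is not described here.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical callback: try to reserve space, true on success. */
typedef bool (*try_reserve_fn)(void *ctx);

static int64_t now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Returns true if the event record was written, false if discarded. */
bool reserve_with_retry(try_reserve_fn try_reserve, void *ctx, int64_t timeout_ms)
{
	const int64_t deadline = timeout_ms > 0 ? now_ms() + timeout_ms : 0;

	for (;;) {
		if (try_reserve(ctx))
			return true;		/* space found: write the record */
		if (timeout_ms == 0)
			return false;		/* 0: never block, discard */
		if (timeout_ms > 0 && now_ms() >= deadline)
			return false;		/* > 0: deadline reached, discard */
		/* < 0, or deadline not reached: back off 1 ms (arbitrary). */
		struct timespec pause = { .tv_sec = 0, .tv_nsec = 1000000 };
		nanosleep(&pause, NULL);
	}
}

/* Example callback for illustration: fails `*remaining` times, then succeeds. */
bool succeed_after(void *ctx)
{
	int *remaining = ctx;

	return (*remaining)-- <= 0;
}
```

The sleep in the loop is where the application thread is actually delayed, which is why the warning above about application timings applies to any non-zero value.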
Fix: loglevel and model_emf_uri with g++ compiled probes
Fix the loglevel and model_emf_uri features for probe providers compiled
with g++. They previously had no effect because of C++ symbol name
mangling: the weakref was referring to the non-mangled symbol, but C++
emits a mangled symbol for the static variable.
Fix this by emitting an extern "C" symbol with hidden visibility on C++.
With a C compiler, this simply turns a static variable into a variable
with hidden visibility.
Fix: perform statedump before replying to sessiond
If a stop command immediately follows a start command, the consumer
daemon will stop event recording in the ring buffers shared memory
control structures before the sessiond sends further commands to the
application. Therefore, a stop-after-start may be performed concurrently
with the statedump, causing parts of the statedump to be missing.
Incomplete statedumps can still happen if an application exits during
the statedump, but they are not expected in the stop-after-start use
case.
The session daemon statedump regeneration tests expect that the
statedump is completed when the regeneration command returns. This also
requires that we perform the statedump in lttng-ust before replying to
the session daemon command.
This library overrides the close() and closefrom() libc functions, and uses
lttng_ust_safe_close_fd() to check whether the application can
interact with the file descriptor or if it should be left to lttng-ust.
This takes care of bugs caused by applications doing bulk close() or
closefrom() of file descriptors soon after forking.
Introduce a tracker for file descriptors used by lttng-ust. It exposes
a new API, lttng_ust_safe_close_fd(), in an internal header; it is
meant to be used by an LD_PRELOADed library overriding close() and
closefrom() (BSD).
This takes care of bugs caused by applications doing bulk close() or
closefrom() of file descriptors soon after forking.
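A minimal C sketch of the tracking idea: the tracer marks the descriptors it owns, and a close() override asks the tracker before really closing anything. Everything here is an assumption for illustration: the bitmap with a fixed 1024-descriptor bound, the function names, and the choice to report success when skipping a tracked descriptor. It is not the lttng_ust_safe_close_fd() implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <limits.h>
#include <pthread.h>
#include <stdbool.h>
#include <unistd.h>

#define TRACKED_FD_MAX 1024	/* assumption: fixed bound for the sketch */
#define FD_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

static unsigned long tracked_fds[TRACKED_FD_MAX / FD_BITS_PER_WORD];
static pthread_mutex_t tracker_lock = PTHREAD_MUTEX_INITIALIZER;

static bool fd_in_range(int fd)
{
	return fd >= 0 && fd < TRACKED_FD_MAX;
}

/* Hypothetical: called by the tracer right after opening its own fd. */
void fd_tracker_add(int fd)
{
	if (!fd_in_range(fd))
		return;
	pthread_mutex_lock(&tracker_lock);
	tracked_fds[fd / FD_BITS_PER_WORD] |= 1UL << (fd % FD_BITS_PER_WORD);
	pthread_mutex_unlock(&tracker_lock);
}

/* Hypothetical: called by the tracer just before closing its own fd. */
void fd_tracker_del(int fd)
{
	if (!fd_in_range(fd))
		return;
	pthread_mutex_lock(&tracker_lock);
	tracked_fds[fd / FD_BITS_PER_WORD] &= ~(1UL << (fd % FD_BITS_PER_WORD));
	pthread_mutex_unlock(&tracker_lock);
}

/*
 * What a close() override would call: tracer-owned descriptors are left
 * open (this sketch reports success), others go through close_cb.
 * Holding the lock keeps ownership of `fd` stable during the decision.
 */
int safe_close_fd(int fd, int (*close_cb)(int fd))
{
	int ret;

	pthread_mutex_lock(&tracker_lock);
	if (fd_in_range(fd) &&
	    (tracked_fds[fd / FD_BITS_PER_WORD] & (1UL << (fd % FD_BITS_PER_WORD))))
		ret = 0;
	else
		ret = close_cb(fd);
	pthread_mutex_unlock(&tracker_lock);
	return ret;
}
```

An application doing a bulk close loop after fork() would then leave the tracer's sockets and shared-memory descriptors intact while still closing its own.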
We need to hold the ust_lock() to protect the fd tracker lock against
fork. Since the fd tracker is needed across connect() (which allocates a
file descriptor), we need to hold the ust_lock across connect().
Needed if we want to hold the ust_lock() while we connect to the session
daemon without blocking the application forever if the session daemon is
hung on SIGSTOP.
This only triggers if we launch _many_ applications with a session
daemon SIGSTOP'd (e.g. 1000 in parallel), so we fill the socket queue,
and applications hang there until the session daemon is SIGCONT'd.
Fix: perform TLS fixup in all UST entry points from each thread
Each entry point into lttng-ust that ends up taking the ust lock needs to
perform a TLS fixup for each thread. Add a TLS fixup in both listener
threads, in fork and base address dump helper libs, and in app context
and tracepoint probe registration/unregistration functions, which can be
called from application threads.
Those ensure we don't take the libc dl lock within the ust lock when
performing the TLS lazy fixup.
Because all length parameters received for serializing data coming from
applications go through a callback, they are never constant, and it
hurts performance to perform a call to memcpy each time.
Performance: mark ring buffer do_copy callers always inline
The underlying copy operation is more efficient if the size is a
constant, which only happens if this function is inlined in the caller.
Otherwise, we end up calling memcpy for each field.
Force inlining for performance reasons for:
- lib_ring_buffer_write,
- lib_ring_buffer_do_strcpy,
- lib_ring_buffer_strcpy.
Note that in lttng-ust, the probe provider serialization functions need
to call the lttng_event_write() client callback, which will fall back to
the memcpy operation.
Inlining those functions still helps the event header code, where they
can be inlined.
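A minimal C sketch of why forced inlining matters here: once the copy helper is inlined, the compiler sees a compile-time constant length at each call site and can lower memcpy() to a few loads and stores instead of a library call. The helper and caller names are hypothetical, and whether the call is actually eliminated depends on the compiler and optimization level.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical copy helper, forced inline into every caller (GCC/Clang). */
static inline __attribute__((always_inline))
void buf_do_copy(char *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);
}

/* Hypothetical caller writing a fixed-size header field. */
size_t write_u32_field(char *buf, size_t offset, unsigned int value)
{
	/* sizeof(value) is a compile-time constant at this call site. */
	buf_do_copy(buf + offset, &value, sizeof(value));
	return offset + sizeof(value);
}
```

When the length only arrives through a callback, as with the probe provider serialization path above, it is no longer constant at the call site and the generic memcpy() call remains.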
Performance: cache the backend pages pointer in context
Getting the backend pages pointer requires walking through the ring
buffer backend tables through multiple shmp operations. Cache the
current value so it can be re-used for all backend write operations
writing fields for the same event.
Performance: Relax atomicity constraints for crash handling
Use a store rather than a cmpxchg() for the update of the
sequential commit counter. This speeds up commit. The downside
is that short race windows between the if() check to see if the
counter is larger than the new value and the update could result
in the counter going backwards, in unlikely preemption or signal
delivery scenarios.
Accept that we may lose a few events in a crash dump for the
benefit of tracing speed.
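The trade-off can be sketched with C11 atomics: a compare-and-swap loop never lets the counter move backwards, while a check followed by a plain store is cheaper but can lose a larger value published by another thread between the check and the store. Both functions are hypothetical illustrations, not lttng-ust's commit code, and memory ordering is left at the C11 default on purpose.

```c
#include <stdatomic.h>

/* Strict: the counter is monotonic even under concurrent updates. */
void commit_seq_update_cas(atomic_ulong *seq, unsigned long new_val)
{
	unsigned long old = atomic_load(seq);

	/* A failed CAS reloads `old`, so the loop re-checks the condition. */
	while (old < new_val && !atomic_compare_exchange_weak(seq, &old, new_val))
		;
}

/*
 * Relaxed: check, then store. If another thread stores a larger value
 * between the check and the store (preemption, signal delivery), this
 * store moves the counter backwards. In exchange, no CAS is needed.
 */
void commit_seq_update_store(atomic_ulong *seq, unsigned long new_val)
{
	if (atomic_load(seq) < new_val)
		atomic_store(seq, new_val);
}
```

Single-threaded, the two behave identically; they only differ in the unlikely interleavings that the commit message accepts as a source of a few lost events in a crash dump.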
Disable event counting in the ring buffer, which can count the number of
events produced per ring-buffer, as well as the number of events
overwritten in overwrite mode.
This feature is currently unused anyway: it is not saved in the ring
buffer header, nor made available to lttng-tools.
This saves 70 ns/event in lttng-ust on the ARM32 Cubietruck.
Performance: remove rcu read lock from ring buffer get/put cpu
The tracepoints are already protected by an RCU-bp read-side lock, so
trying to take this nested lock is useless. We gain 132 ns/event on the
ARM32 Cubietruck by removing this nested rcu read-side lock.
On ARMv7l (Cubietruck), the compiler generates a function call for each
lib_ring_buffer_check_deliver, even though it typically only does an
unlikely check. Split it into an inline fast path, and a function call
for the slow path. This brings a performance gain of about 500ns/event
on the Cubietruck.
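The fast-path/slow-path split can be sketched in C: the inline part is a single test, marked unlikely with __builtin_expect, that costs almost nothing when it fails, and the rare work lives in a separate noinline function. The sub-buffer structure and all names are hypothetical simplifications of lib_ring_buffer_check_deliver, not its real logic.

```c
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical, much simplified sub-buffer state. */
struct subbuf {
	unsigned long commit_count;
	unsigned long size;
	unsigned long delivered;
};

/* Slow path: kept out of line so it does not bloat every caller. */
__attribute__((noinline))
static void check_deliver_slow(struct subbuf *sb)
{
	/* Sub-buffer full: hand it over (here, just count the delivery). */
	sb->delivered++;
	sb->commit_count = 0;
}

/* Fast path: one usually-false comparison, inlined into the caller. */
static inline void check_deliver(struct subbuf *sb)
{
	if (unlikely(sb->commit_count == sb->size))
		check_deliver_slow(sb);
}

/* Hypothetical commit operation using the split check. */
void commit(struct subbuf *sb, unsigned long len)
{
	sb->commit_count += len;
	check_deliver(sb);
}
```

Most commits only pay for the inlined comparison; the function call happens once per filled sub-buffer.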
Fix: perf counters build against kernel headers < 3.12
Copy Linux kernel perf_event.h installed headers into lttng-ust to know
the recent ABI layout, and use the bit description detailed in the
following Linux kernel commit:
Fall-back on the perf read system call for kernels prior to 3.12,
because older kernels have an ABI bug where a union was used for both
cap_usr_time and cap_usr_rdpmc.
Ensure setup_perf sets the pc pointer value before checking whether we
need to keep the file descriptor open or not.
Batch invocation of synchronize_rcu() when unregistering many events
from a session.
Also batch invocation of synchronize_rcu() when registering the same
events within many concurrent sessions (starting from the 2nd session).
Those slowdowns are noticeable with application processes that have a
short life-time, e.g. shell scripts spawning multiple short-lived
processes take significantly longer to complete when LD_PRELOADing a UST
probe provider.
This slowdown only occurs when UST tracing sessions are created in the
session daemon.
tracepoint_probe_update_all() (currently unused) implements a similar
mechanism which has the downside of iterating on all events in all probe
libraries (not as efficient). Move synchronize_rcu() in
tracepoint_probe_update_all() outside of the iteration on all events to
free in this function, because it is only needed between the last
callsite update and the first memory reclaim, not between list removal
and reclaim.
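The batching described above follows the usual RCU pattern: unpublish everything first, wait for a single grace period, then reclaim everything. This C sketch assumes a hypothetical wait_grace_period() standing in for liburcu's synchronize_rcu() (it only counts calls so the batching is visible), and it clears the list head with a plain store where real code would use the RCU list primitives.

```c
#include <stdlib.h>

/* Hypothetical stand-in for synchronize_rcu(): only counts calls here. */
static unsigned grace_periods;

static void wait_grace_period(void)
{
	grace_periods++;
}

struct event {
	struct event *next;
};

/* Unregister a whole list of events with a single grace period. */
void unregister_events_batched(struct event **list)
{
	struct event *head = *list;

	/* 1. Unpublish: new readers can no longer reach any of them. */
	*list = NULL;
	/* 2. One grace period covers every removal done in step 1. */
	wait_grace_period();
	/* 3. No reader can still hold a reference: reclaim them all. */
	while (head) {
		struct event *next = head->next;

		free(head);
		head = next;
	}
}
```

Freeing N events this way costs one grace period instead of N, which is where the short-lived processes in the description lose their startup and teardown overhead.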
Faulting the TLS variable when accessed for the first time can trigger
deadlocks with an internal libc lock when using the liblttng-ust-malloc
wrapper. Fix this by pre-faulting it in a library constructor, similarly
to the approach taken for all other TLS variables in lttng-ust.
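A minimal C sketch of pre-faulting a thread-local variable from a library constructor, so that any lazy TLS setup for the loading thread happens there rather than on first use inside a malloc() wrapper. The variable name is hypothetical, and the volatile read is one way to force an access the compiler cannot drop; lttng-ust's own mechanism may differ. As the TLS fixup commit above notes, a constructor only covers the thread that runs it; other threads need the fixup at each entry point.

```c
/* Hypothetical thread-local state used by a malloc() wrapper. */
static __thread int malloc_nesting;

/*
 * Runs at library load: touch the TLS variable once so its lazy
 * allocation does not happen later while libc locks may be held.
 */
__attribute__((constructor))
static void prefault_tls(void)
{
	/* A volatile access the compiler cannot optimize away. */
	(void)*(volatile int *)&malloc_nesting;
}

/* Accessor for illustration. */
int get_malloc_nesting(void)
{
	return malloc_nesting;
}
```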
Fix: cleanup local_apps.allowed flag on lib cleanup
In case of applications using fork/clone, which drop their privileges,
we need to clear the local_apps.allowed flag; otherwise, those
applications hit an assertion when using the liblttng-ust-fork helper,
e.g.:
varnishd: lttng-ust-comm.c:423: setup_local_apps: Assertion `local_apps.allowed == 0' failed.
CID 1357641 (#1 of 1): Out-of-bounds write (OVERRUN) 2. sprintf_overrun:
sprintf will overrun its first argument &name[len] which can accommodate
4 bytes. The number of bytes written may be 5 bytes, including the
terminating null.
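The usual remedy for this kind of report is to give the formatting call the real remaining space and to check for truncation. This C sketch shows that pattern with snprintf(). The helper name and the numeric suffix are hypothetical, since the original format string is not shown here.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: append a numeric suffix to `name`, a buffer of
 * `size` bytes. snprintf() never writes more than the space it is given,
 * and its return value reveals truncation. Returns 0 on success, -1 if
 * the suffix did not fit.
 */
int append_suffix(char *name, size_t size, unsigned int n)
{
	size_t len = strlen(name);
	int ret;

	if (len >= size)
		return -1;
	ret = snprintf(name + len, size - len, "%u", n);
	if (ret < 0 || (size_t)ret >= size - len)
		return -1;	/* output was truncated */
	return 0;
}
```

With an 8-byte buffer holding "abc", a 4-digit suffix fits exactly (4 characters plus the terminating null), while a 5-digit suffix is reported as truncated instead of overrunning the buffer.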
This variable can be tested by applications to check whether lttng-ust
is loaded. They simply have to define their own "lttng_ust_loaded" weak
symbol, and test it. It is set to 1 by the library constructor.
The main use-case is to allow applications to detect that they should
not try to close file descriptors that do not belong to them (e.g. BSD
closefrom). This is a common pattern with applications invoking
daemon(3).
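As the text describes, an application can define its own weak lttng_ust_loaded and test it. A minimal C sketch follows; the helper name is hypothetical. It assumes default ELF symbol interposition: when liblttng-ust is loaded, its references to lttng_ust_loaded resolve to the same object, and its constructor stores 1 there, otherwise the application's zero-initialized definition stays 0.

```c
#include <stdbool.h>

/* Weak definition in the application, zero-initialized. */
int lttng_ust_loaded __attribute__((weak));

/* Hypothetical helper: true if liblttng-ust's constructor ran. */
bool app_sees_lttng_ust(void)
{
	return lttng_ust_loaded != 0;
}
```

An application about to call closefrom() after daemon(3) could, for example, skip the bulk close when app_sees_lttng_ust() returns true.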
Set internally by liblttng-ust's constructor. Can be used by
applications to detect if lttng-ust is loaded, even if liblttng-ust is
not directly linked by the application.
The main use-case is to allow applications to detect that they should
not try to close file descriptors that do not belong to them (e.g.
BSD closefrom). This is a common pattern with applications invoking
daemon(3).
Note that this environment variable is passed to children of a traced
process, and through exec calls. Therefore, an application might think
that lttng-ust is loaded even though it's not loaded in its own address
space if it was loaded by one of its parent processes.
On this error path, we should not free lttng_chan, because it is located
within an allocated shm memory area associated with the channel now. It
is invalid to free this pointer.