Fix: loglevel and model_emf_uri with g++ compiled probes
Fix the loglevel and model_emf_uri features for probe providers compiled
with g++. They previously had no effect because of C++ symbol name
mangling. The weakref was referring to the non-mangled symbol, but C++
emits a mangled symbol for the static variable.
Fix this by emitting an extern "C" symbol with hidden visibility on C++.
With a C compiler, this simply turns a static variable into a variable
with hidden visibility.
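A minimal sketch of the idea, assuming a GCC-compatible compiler. The macro, variable and function names are hypothetical and are not the ones emitted by the lttng-ust probe macros:

```c
/*
 * Hypothetical helper: give the variable C linkage when built as C++,
 * so its symbol name is not mangled, and hidden visibility in both
 * languages so the symbol stays private to the shared object.
 */
#ifdef __cplusplus
#define DEMO_HIDDEN_C_VAR extern "C" __attribute__((visibility("hidden")))
#else
#define DEMO_HIDDEN_C_VAR __attribute__((visibility("hidden")))
#endif

/* Declaration first, then the definition: same symbol name in C and C++. */
DEMO_HIDDEN_C_VAR int demo_loglevel_value;
int demo_loglevel_value = 13;

/* Accessor used only to show that the variable is reachable by name. */
int demo_get_loglevel(void)
{
	return demo_loglevel_value;
}
```

Unlike a static variable, the hidden symbol keeps a predictable name that other references within the same object can target.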
Fix: perform statedump before replying to sessiond
If a stop command immediately follows a start command, the consumer
daemon will stop event recording in the ring buffers shared memory
control structures before the sessiond sends further commands to the
application. Therefore, a stop-after-start may be performed concurrently
with the statedump, causing parts of the statedump to be
missing. This case may always happen if an application exits during
statedump, but an incomplete statedump is not expected in the
stop-after-start use case.
Needed if we want to hold the ust_lock() while we connect to the session
daemon without blocking the application forever if the session daemon is
hung on SIGSTOP.
This only triggers if we launch _many_ applications with a session
daemon SIGSTOP'd (e.g. 1000 in parallel), so we fill the socket queue,
and applications hang there until the session daemon is SIGCONT'd.
Fix: perform TLS fixup in all UST entry points from each thread
Each entry point into lttng-ust that ends up taking the ust lock needs to
perform a TLS fixup for each thread. Add a TLS fixup in both listener
threads, in fork and base address dump helper libs, and in app context
and tracepoint probe registration/unregistration functions, which can be
called from application threads.
Those ensure we don't take the libc dl lock within the ust lock when
performing the TLS lazy fixup.
Copy Linux kernel perf_event.h installed headers into lttng-ust to know
the recent ABI layout, and use the bit description detailed in the
following Linux kernel commit:
Fall-back on the perf read system call for kernels prior to 3.12,
because older kernels have an ABI bug where a union was used for both
cap_usr_time and cap_usr_rdpmc.
Also fall-back on the perf read system call for kernels that do not
support rdpmc.
Ensure setup_perf sets the pc pointer value before checking whether we
need to keep the file descriptor open or not.
This combines the following master commits:
* Fix: perf counters build against kernel headers < 3.12
* Add generic fallback for perf counter read
* Fix: lttng context perf: missing stdbool.h header include
* Add perf context support for ARMv7
(removed the ARM-specific lines when combining)
* Keep perf context FD open for other architectures
Since this is a bugfix, we explicitly do not enable building perf
support for other architectures, as this would introduce a feature in
the stable release cycle.
Batch invocation of synchronize_rcu() when unregistering many events
from a session.
Also batch invocation of synchronize_rcu() when registering the same
events within many concurrent sessions (starting from the 2nd session).
Those slowdowns are noticeable with application processes that have a
short life-time, e.g. shell scripts spawning multiple short-lived
processes take significantly longer to complete when LD_PRELOADing a UST
probe provider.
This slowdown only occurs when UST tracing sessions are created in the
session daemon.
tracepoint_probe_update_all() (currently unused) implements a similar
mechanism which has the downside of iterating on all events in all probe
libraries (not as efficient). Move synchronize_rcu() in
tracepoint_probe_update_all() outside of the iteration on all events to
free in this function, because it is only needed between the last
callsite update and the first memory reclaim, not between list removal
and reclaim.
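The effect of batching can be sketched as follows. This is an illustration only: demo_synchronize_rcu() is a counting stub, not the liburcu function, and the event structure and function names are hypothetical.

```c
#include <stddef.h>

/* Stub standing in for synchronize_rcu(): it only counts invocations. */
static unsigned int demo_sync_count;

static void demo_synchronize_rcu(void)
{
	demo_sync_count++;
}

struct demo_event {
	int registered;	/* 1 while readers may still reach the event */
	int reclaimed;	/* 1 once its memory would have been freed */
};

/* Unbatched: one grace period per unregistered event. */
void demo_unregister_each(struct demo_event *events, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		events[i].registered = 0;
		demo_synchronize_rcu();
		events[i].reclaimed = 1;
	}
}

/* Batched: unlink everything, wait once, then reclaim everything. */
void demo_unregister_batched(struct demo_event *events, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		events[i].registered = 0;
	demo_synchronize_rcu();
	for (i = 0; i < n; i++)
		events[i].reclaimed = 1;
}

unsigned int demo_sync_calls(void)
{
	return demo_sync_count;
}

void demo_sync_reset(void)
{
	demo_sync_count = 0;
}
```

With 100 events, the first variant waits for 100 grace periods and the second for a single one, which is where the saving for short-lived processes comes from.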
Faulting the TLS variable when accessed for the first time can trigger
deadlocks with internal libc lock when using the liblttng-ust-malloc
wrapper. Fix this by pre-faulting it in a library constructor, similarly
to the approach taken for all other TLS variables in lttng-ust.
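A minimal sketch of pre-faulting a TLS variable from a library constructor, assuming GCC or clang. The variable and function names are hypothetical. Lazy TLS allocation only matters when the code lives in a shared object; in a plain executable the touch is harmless.

```c
/* Hypothetical per-thread counter standing in for the real TLS variable. */
static __thread int demo_tls_nesting;
static int demo_constructor_ran;

/*
 * Touch the TLS variable through an empty asm statement with a memory
 * operand. This forces its address to be computed, so any lazy TLS
 * allocation for the calling thread happens here and not at first use.
 */
void demo_fixup_tls(void)
{
	__asm__ __volatile__ ("" : : "m" (demo_tls_nesting));
}

/* Runs when the object is loaded, before any wrapped function is called. */
__attribute__((constructor))
static void demo_init(void)
{
	demo_fixup_tls();
	demo_constructor_ran = 1;
}

int demo_constructor_has_run(void)
{
	return demo_constructor_ran;
}
```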
Fix: cleanup local_apps.allowed flag on lib cleanup
In case of applications using fork/clone, which drop their privileges,
we need to clear the local_apps.allowed flag, otherwise those
applications hit an assertion when using the liblttng-ust-fork helper:
e.g.
varnishd: lttng-ust-comm.c:423: setup_local_apps: Assertion `local_apps.allowed == 0' failed.
On this error path, we should not free lttng_chan, because it is located
within an allocated shm memory area associated with the channel now. It
is invalid to free this pointer.
Non-LGPL modules that use tracepoint instrumentation, but have no
compile unit defining either TRACEPOINT_DEFINE or
TRACEPOINT_CREATE_PROBES fail to build due to undefined reference to
`tracepoint_dlopen_ptr'.
Jonathan Rajotte [Thu, 26 May 2016 22:05:12 +0000 (18:05 -0400)]
Fix: log4j example: set logger level to prevent unexpected level inheritance
BSF or other jars can ship with an embedded log4j.properties file. This
causes problems when launching an application with a general class path (e.g.
/usr/share/java/*), since log4j will look for a properties file in all
loaded jars. If any of them contains a directive for the root logger, it will
affect every logger with no level that sits directly under the root logger.
This could result in unexpected behaviour (e.g. no events triggered).
Link: https://issues.apache.org/jira/browse/BSF-24
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fix: initialize RCU callbacks with mixed LGPL/non-LGPL objects
Linking both _LGPL_SOURCE and non-_LGPL_SOURCE objects into the same
module may result in having the RCU callbacks left to NULL, which
prevents tracing for tracepoints and/or probes which sit in the non-LGPL
compile unit.
This happens if the constructor of the LGPL compile unit is executed
first, thus incrementing the __tracepoint_registered counter, which will
prevent later execution of that same constructor in the non-LGPL compile
unit to initialize the RCU callbacks in __tracepoint__init_urcu_sym().
Fix: incorrect structure layout with mixed LGPL/non-LGPL objects
Linking both _LGPL_SOURCE and non-_LGPL_SOURCE objects into the same
module may result in corruption. If the tracepoint_dlopen object used is
the one declared by a LGPL compile unit, a non-LGPL compile unit may try
to initialize fields beyond the end of the structure.
Fix: don't call __builtin_return_address(0) on 32-bit powerpc
Invoking __builtin_return_address(0) corrupts the stack, as previously
noticed for the "ip" context. Disable its use on 32-bit powerpc
everywhere else in the lttng-ust code base.
Fix: update debug message about weak-hidden symbols
We actually deal OK with compilers that treat weak-hidden symbols as
having different addresses between compile units that are part of the
same module.
Simply report this without statement on whether or not the compiler
producing this code is broken.
Fix: work-around gcc optimisation oddness on 32-bit powerpc
Deal with gcc O1 optimisation issues with weak hidden symbols. gcc 4.8
and prior do not have the same behavior for symbol scoping on 32-bit
powerpc depending on the object size: symbols for objects of 8 bytes or
less have the same address throughout a module, whereas they have
different addresses between compile units for objects larger than 8
bytes. Add this pointer indirection to ensure that the symbol scoping
matches that of the other weak hidden symbols found in tracepoint.h.
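The pointer indirection can be sketched as follows, assuming GCC or clang. The structure and symbol names are hypothetical: the large object is only ever reached through a small weak hidden pointer, which falls in the "8 bytes or less" case described above.

```c
#include <stddef.h>

/* Hypothetical object larger than 8 bytes. */
struct demo_dlopen {
	void *handle;
	void *callbacks[4];
};

/* Weak hidden object: its address may differ between compile units. */
struct demo_dlopen demo_dlopen_obj
	__attribute__((weak, visibility("hidden")));

/* Weak hidden pointer: small enough to resolve to a single address. */
struct demo_dlopen *demo_dlopen_ptr
	__attribute__((weak, visibility("hidden")));

/* The first caller publishes its object; later callers reuse that one. */
struct demo_dlopen *demo_dlopen_get(void)
{
	if (!demo_dlopen_ptr)
		demo_dlopen_ptr = &demo_dlopen_obj;
	return demo_dlopen_ptr;
}
```

Every access then goes through demo_dlopen_get() (or the pointer) instead of naming the object directly.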
"make distcheck" marks each source file on the srcdir in the extracted
dist tarball read-only. The examples copy from the srcdir into the
builddir before running the "make" examples, but this keeps the
read-only flag on the builddir directories, which fails the build
because the resulting objects cannot be created.
Fix this by ensuring the copied target directory for each example is
user-writeable.
We need to byteswap integers passed to the filter when they are tagged
as being in an endianness which differs from the architecture
endianness, so the integer comparisons make sense in terms of value
rather than raw bytes for those fields.
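A minimal sketch of the byteswap for a 32-bit field. The function names and the way the field endianness is passed in are assumptions made for illustration, not the filter interpreter's actual interface:

```c
#include <stdbool.h>
#include <stdint.h>

/* Detect the host byte order at run time. */
static bool demo_host_is_big_endian(void)
{
	const uint16_t probe = 1;

	return *(const unsigned char *) &probe == 0;
}

/* Portable 32-bit byte swap. */
static uint32_t demo_bswap32(uint32_t v)
{
	return ((v & 0x000000ffu) << 24) |
	       ((v & 0x0000ff00u) << 8) |
	       ((v & 0x00ff0000u) >> 8) |
	       ((v & 0xff000000u) >> 24);
}

/*
 * Return the field value in host byte order, swapping only when the
 * field is tagged with an endianness that differs from the host.
 */
uint32_t demo_filter_load_u32(uint32_t raw, bool field_is_big_endian)
{
	if (field_is_big_endian != demo_host_is_big_endian())
		return demo_bswap32(raw);
	return raw;
}
```

After this step, a comparison such as "field > 10" operates on the value the field represents and not on its in-memory byte layout.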
This has been detected in the lttng-modules port of the filter
interpreter by Coverity. The intent of the code in UST is similar, and
we can find the same dead code, although Coverity may not see it as dead
code because it cannot prove that the string is not modified between the
two uses. Since we know it is not modified, remove the dead code.
Assertions in the lttng-ust-comm init function are slightly too harsh
for their own good. In situations involving incoherent seccomp profiles
(e.g. accepting futex, poll, nanosleep, clock_nanosleep, but not
restart_syscall), unexpected errno values can be returned by
sem_timedwait.
Print an error in those situations, but let the application proceed.
Fix: Ensure the Java JUL messages are correctly formatted
It is possible for log records to contain messages that need some
formatting, for example if the string contains localized elements
or if the log(Level, String, Object[]) method is used.
In these cases, we need to make sure to format the string and not
pass the "raw" string to the tracepoint.
This only applies to the JUL API. log4j 1.2.x did not handle such
formatting, although log4j 2.x does.
This is a backport of commit 4721f9c to the stable-2.7 branch.
When adding a large context (e.g. callstack), headers larger than 256
bytes cause a discrepancy between the calculated size and the size written into
the trace buffers. This generates a corrupted trace and triggers a
warning in the ring buffer backend, which triggers a safety net disabling
tracing for the current channel.
Mikael Beckius [Tue, 12 May 2015 09:04:34 +0000 (11:04 +0200)]
Fix: live timer calculation error
There is a calculation error in the live timer. The variable
chan->switch_timer_interval is expressed in microseconds, so it is incorrect
to assign chan->switch_timer_interval mod 1000000 directly to tv_nsec, which
is expressed in nanoseconds.
Signed-off-by: Mikael Beckius <mikael.beckius@windriver.com>
Signed-off-by: Jianchuan Wang <jianchuan.wang@windriver.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
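The unit conversion described in this fix can be sketched as follows. The helper name is hypothetical; the point is that the sub-second remainder, which is in microseconds, must be scaled by 1000 before it is stored in tv_nsec.

```c
#include <time.h>

#define DEMO_USEC_PER_SEC	1000000ULL
#define DEMO_NSEC_PER_USEC	1000ULL

/* Convert an interval expressed in microseconds into a struct timespec. */
struct timespec demo_usec_to_timespec(unsigned long long interval_usec)
{
	struct timespec ts;

	ts.tv_sec = (time_t) (interval_usec / DEMO_USEC_PER_SEC);
	/* Remainder is in microseconds: scale it to nanoseconds. */
	ts.tv_nsec = (long) ((interval_usec % DEMO_USEC_PER_SEC)
			* DEMO_NSEC_PER_USEC);
	return ts;
}
```

For an interval of 1500000 microseconds, this yields tv_sec = 1 and tv_nsec = 500000000. Without the scaling, tv_nsec would be 500000, i.e. 0.0005 s instead of 0.5 s.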
Stelios Bounanos [Wed, 14 Oct 2015 16:31:36 +0000 (17:31 +0100)]
Fix: Don't (re)define STAP_PROBEV
Define a new LTTNG_STAP_PROBEV macro to avoid clobbering STAP_PROBEV or
emitting unwanted sdt probes when lttng-ust has been built without sdt
support.
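A sketch of the wrapper-macro approach. DEMO_HAVE_SDT and DEMO_STAP_PROBEV are hypothetical names used for illustration; STAP_PROBEV is the variadic probe macro provided by <sys/sdt.h>.

```c
/*
 * Use a private macro name instead of (re)defining STAP_PROBEV, so that
 * an application which includes <sys/sdt.h> itself is left untouched.
 */
#ifdef DEMO_HAVE_SDT
#include <sys/sdt.h>
#define DEMO_STAP_PROBEV(...)	STAP_PROBEV(__VA_ARGS__)
#else
/* Built without sdt support: expand to a no-op, emit no probe. */
#define DEMO_STAP_PROBEV(...)	do { } while (0)
#endif

/* Hypothetical instrumented function. */
int demo_fire(int value)
{
	DEMO_STAP_PROBEV(demo_provider, demo_event, value);
	return value;
}
```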
Fix: Argument with 'nonnull' attribute passed null
Reported by scan-build
API Argument with 'nonnull' attribute passed null libringbuffer/ring_buffer_backend.c 380
API Argument with 'nonnull' attribute passed null libringbuffer/ring_buffer_backend.c 420
CID 1021259 (#1 of 1): Improper use of negative value (NEGATIVE_RETURNS)
5. negative_returns: sysconf(_SC_PAGESIZE) is passed to a parameter that
cannot be negative.
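A minimal sketch of the kind of check that addresses this class of report: validate the sysconf() return value before using it as a size. The helper name is hypothetical.

```c
#include <stddef.h>
#include <unistd.h>

/*
 * Fetch the page size, rejecting the -1 error return (and zero) before
 * converting the result to an unsigned type.
 * Returns 0 on success, -1 on error.
 */
int demo_get_page_size(size_t *page_size)
{
	long ret = sysconf(_SC_PAGESIZE);

	if (ret <= 0)
		return -1;
	*page_size = (size_t) ret;
	return 0;
}
```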
Philippe Proulx [Sat, 5 Sep 2015 17:38:01 +0000 (13:38 -0400)]
Fix: Python agent: do not register twice to same port
It is possible that one of the session daemons left its agent.port
file on the file system, for example when killed with SIGKILL. It
is also common that both those session daemons use the same port for
listening to agent connections. In this case, if one session daemon
is running, but two agent.port files exist, the Python agent would
connect its two threads to the same session daemon, leading to
everything done twice: list shows events twice, tracing records
events twice, etc.
This patch ensures that if two agent.port files are found and have
the same content, only one thread is used.
Signed-off-by: Philippe Proulx <eeppeliteloop@gmail.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fix: close socket on protocol error, sendmsg MSG_NOSIGNAL
Don't try to keep interacting with sessiond when a protocol error is
detected at the UST application side: this means we cannot trust the
protocol anymore, so there is no reason for keeping the socket open.
For instance, if the application is exiting and we receive a new stream,
we're effectively not reading the stream data, and we return an error.
Unfortunately, the session daemon may try to send us another command,
but we will try interpreting the stream data as a command, which is
invalid.
Also, use the MSG_NOSIGNAL flag in the fds sendmsg, so the session daemon
does not get killed with SIGPIPE when it cannot send to the socket because
the connection is closed.
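The MSG_NOSIGNAL behaviour can be sketched with a local socket pair, assuming Linux. The function names are hypothetical: sending to a closed peer reports EPIPE to the caller instead of raising SIGPIPE.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send a buffer with sendmsg(), never raising SIGPIPE on a closed peer. */
int demo_send_nosignal(int fd, const void *buf, size_t len)
{
	struct msghdr msg;
	struct iovec iov;
	ssize_t ret;

	memset(&msg, 0, sizeof(msg));
	iov.iov_base = (void *) buf;
	iov.iov_len = len;
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	do {
		ret = sendmsg(fd, &msg, MSG_NOSIGNAL);
	} while (ret < 0 && errno == EINTR);
	return ret < 0 ? -errno : (int) ret;
}

/* Demonstration: send on a socket whose peer has already been closed. */
int demo_send_to_closed_peer(void)
{
	int fds[2];
	int ret;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0)
		return -errno;
	close(fds[1]);
	ret = demo_send_nosignal(fds[0], "x", 1);
	close(fds[0]);
	return ret;
}
```

The sender gets a negative error code it can handle (here by closing the socket) and keeps running.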
Listener threads can be cancelled with ust lock held, which can hang the
following ust cleanup routine, because tracepoint probe unregister needs
to take the ust lock.
Fix this by disabling pthread cancellation for the entire duration of
the ust lock.
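A sketch of a lock wrapper that disables cancellation for as long as the lock is held. The names are hypothetical and error handling is reduced to returning the pthread error code.

```c
#include <pthread.h>

static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;
/* Cancel state of the lock holder before it took the lock. */
static __thread int demo_saved_cancelstate;

/* Disable cancellation first, then take the lock. */
int demo_lock(void)
{
	int oldstate = PTHREAD_CANCEL_ENABLE;
	int ignored;
	int ret;

	ret = pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &oldstate);
	if (ret)
		return ret;
	ret = pthread_mutex_lock(&demo_mutex);
	if (ret) {
		(void) pthread_setcancelstate(oldstate, &ignored);
		return ret;
	}
	demo_saved_cancelstate = oldstate;
	return 0;
}

/* Release the lock, then restore the previous cancel state. */
int demo_unlock(void)
{
	int oldstate = demo_saved_cancelstate;
	int ignored;
	int ret;

	ret = pthread_mutex_unlock(&demo_mutex);
	if (ret)
		return ret;
	return pthread_setcancelstate(oldstate, &ignored);
}
```

A thread cancelled between demo_lock() and demo_unlock() only acts on the cancellation request after the lock has been released, so the cleanup path can always acquire it.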
Jonathan Rajotte [Mon, 10 Aug 2015 18:45:34 +0000 (14:45 -0400)]
Build: python agent: use setup.py over autoconf
This change provides a valid way of installing the agent. The autoconf
python macro provides the wrong installation path for
python 3 under Debian and Ubuntu due to distro-specific packaging for
python 2.7 and 3.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fix: Cleanup local_apps sock_info in lttng_ust_cleanup
LTTng-UST will deadlock after a fork while waiting on the
"constructor_wait" semaphore if local apps, handled by the session daemon
running under the current UID, are disabled or "not_allowed".
This deadlock can be triggered by setting an infinite registration
timeout, clearing the HOME environment variable and launching an app
which calls FORK(3). This will cause setup_local_apps() to fail to
determine the local_apps sock_path, thus leaving
local_apps.allowed == 0.
This, in turn, would cause lttng_ust_cleanup to skip the cleanup
of the local_apps sock_info after a fork,
leaving local_apps.constructor_sem_posted == 1. This would cause
handle_register_done() in the child to skip over the decrementation
of sem_count and post of the constructor_wait semaphore.
Handling sys_futex EINTR allows us to retry waiting on the futex
immediately if we are interrupted by a signal without having to try
connecting to the session daemon.
This should not impact correctness though, as the wait scheme would
simply try to connect, fail, and try waiting again.
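A minimal sketch of retrying a futex wait on EINTR, assuming Linux and the raw futex system call. The wrapper name and its return convention are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Wait until *uaddr no longer holds "expected".
 * Returns 0 when woken up or when the value had already changed,
 * a negative errno value on any other error.
 */
int demo_futex_wait(int32_t *uaddr, int32_t expected)
{
	for (;;) {
		long ret = syscall(SYS_futex, uaddr, FUTEX_WAIT, expected,
				NULL, NULL, 0);

		if (ret == 0)
			return 0;
		switch (errno) {
		case EINTR:
			/* Interrupted by a signal: wait again right away. */
			continue;
		case EAGAIN:
			/* Value already differs from "expected". */
			return 0;
		default:
			return -errno;
		}
	}
}
```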
Michael Jeanson [Tue, 30 Jun 2015 20:59:54 +0000 (16:59 -0400)]
Fix: java class check when uudecode is not present
On systems where 'uudecode' is present, the java class check will
run a precompiled class file in the local directory. If the CLASSPATH
variable doesn't contain "." the test will fail regardless of the
presence of the tested for class on the classpath.
This fix makes the behavior of the test consistent whether 'uudecode'
is present or not.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
- Use camelCase for regular members and FULL_CAPS for static final
ones (except for API classes).
- All members of an interface are public static final by default,
no need to repeat the modifiers.
- Marked final some fields that could be.
Java has a @Deprecated annotation to indicate deprecated classes
(or methods, members, etc.). This will keep the code available for
backwards compatibility, but will emit a warning to any user calling
it.
- Remove unused imports.
- Access static fields statically.
- Mark static methods that can be.
- Remove exception declarations that are not actually thrown
(on non-API methods only).
Trace viewers use the tracer minor version to know which event name to
expect for the statedump. Since we changed the namespacing of those
events for 2.7, switch the tracer minor version to "7" before rc1, hence
the use of "pre" at this stage of the release process.
Because tracepoint state is updated by lttng-ust threads, and read by
application threads, we need to perform a volatile load of that state.
Not having the volatile load can cause the compiler to optimize away the
reload of that state, keeping a local copy instead. For instance, a main
program consisting of a loop could keep the tracepoint state on its
stack or in registers without ever reloading it from memory.
Perform a volatile load (CMM_LOAD_SHARED()) of the tracepoint state.
Add the matching volatile store (CMM_STORE_SHARED()) in tracepoint.c for
the state update.
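A simplified sketch of the volatile access pattern, assuming GCC or clang. The DEMO_* macros are stand-ins that only perform the volatile access and are not the liburcu CMM_LOAD_SHARED()/CMM_STORE_SHARED() definitions. The structure and function names are hypothetical.

```c
/* Force a real memory access each time the expression is evaluated. */
#define DEMO_LOAD_SHARED(x)	(*(volatile __typeof__(x) *) &(x))
#define DEMO_STORE_SHARED(x, v)	(*(volatile __typeof__(x) *) &(x) = (v))

struct demo_tracepoint {
	int state;	/* non-zero when the tracepoint is enabled */
};

static struct demo_tracepoint demo_tp;

/* Updater side (tracer thread in the real code base). */
void demo_tracepoint_set_state(int state)
{
	DEMO_STORE_SHARED(demo_tp.state, state);
}

/*
 * Reader side (application thread): the state is reloaded from memory
 * on every iteration instead of being cached in a register.
 */
unsigned long demo_count_enabled(unsigned long iterations)
{
	unsigned long i, hits = 0;

	for (i = 0; i < iterations; i++) {
		if (DEMO_LOAD_SHARED(demo_tp.state))
			hits++;
	}
	return hits;
}
```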