Fix: ring buffer: handle concurrent update in nested buffer wrap around check
With stress-test loads that trigger sub-buffer switches very frequently
(small 4kB sub-buffers, frequent flush) in lttng-modules, we currently
observe this kind of warning once every few minutes:
[65335.896208] ring buffer relay-overwrite-mmap, cpu 5: records were lost. Caused by:
[65335.896208] [ 0 buffer full, 1 nest buffer wrap-around, 0 event too big ]
It appears that the check for nested buffer wrap-around does not take
into account that concurrent execution contexts (either nested for
per-cpu buffers, or from another CPU or nested for global buffers) can
update the commit_count value.
What we really want to do with this check is to ensure that if we enter
a sub-buffer that had an unbalanced reserve/commit count, assuming there
is no hope that this gets rebalanced promptly, we detect this and drop
the current event. However, in the case where the commit counter has
been concurrently updated by another reserve or a switch, we want to
retry the entire reserve operation.
One way to detect this is to sample the reserve offset twice, around the
commit counter read, along with the appropriate memory barriers.
Therefore, we can detect if the mismatch between reserve and commit
counter is actually caused by a concurrent update, which necessarily has
updated the reserve counter.
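The double-sampling idea above can be sketched as follows. This is a
minimal illustration using C11 atomics, not the lttng-modules code:
struct sketch_buf, sketch_check_wrap() and the expected_commit parameter
are hypothetical names, and sequentially consistent loads stand in for
the "appropriate memory barriers".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the ring buffer counters. */
struct sketch_buf {
	atomic_ulong reserve_offset;	/* moved by every reserve or switch */
	atomic_ulong commit_count;	/* moved by every commit */
};

enum sketch_result {
	SKETCH_OK,	/* counters are balanced, the reserve can proceed */
	SKETCH_RETRY,	/* concurrent update detected, redo the reserve */
	SKETCH_DROP,	/* sub-buffer really is unbalanced, drop the event */
};

/*
 * expected_commit is the commit count a balanced sub-buffer would show
 * (an assumption of this sketch, computed by the caller).
 */
static enum sketch_result
sketch_check_wrap(struct sketch_buf *buf, unsigned long expected_commit)
{
	unsigned long before, commit, after;

	assert(buf != NULL);

	/* Sequentially consistent loads keep the three reads in order. */
	before = atomic_load(&buf->reserve_offset);
	commit = atomic_load(&buf->commit_count);
	after = atomic_load(&buf->reserve_offset);

	if (commit == expected_commit)
		return SKETCH_OK;
	/* The reserve offset moved: the mismatch may be a concurrent update. */
	if (before != after)
		return SKETCH_RETRY;
	/* Stable reserve offset with a mismatch: unbalanced sub-buffer. */
	return SKETCH_DROP;
}
```

A caller would loop on SKETCH_RETRY and restart the whole reserve
operation, and account a lost record only on SKETCH_DROP.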
Cleanup: lib_ring_buffer_switch_new_end() only calls subbuffer_set_data_size()
lib_ring_buffer_switch_new_end() is always called when an event exactly
fills a sub-buffer, which makes padding_size always 0. However, there is
one side-effect that lib_ring_buffer_switch_new_end() needs to have: it
calls subbuffer_set_data_size() to update the size of the data to be
read from the sub-buffer.
lib_ring_buffer_write() could be passed a length of 0. This typically
has no side-effect as far as writing into the buffers is concerned,
except for one detail: in overwrite mode, there is a check to make sure
the sub-buffer can be written into. This check is performed even if
length is 0. In the case where this would fall exactly at the end of a
sub-buffer, the check would fail, because the offset would fall exactly
at the beginning of the next sub-buffer.
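The boundary arithmetic described above can be illustrated with a short
sketch. The 4096-byte sub-buffer size and both helper names are
assumptions made for this example, and skipping the check for a
zero-length write is shown only as one possible guard.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_SUBBUF_SIZE 4096UL	/* hypothetical sub-buffer size */

/* Index of the sub-buffer that contains a given write offset. */
static size_t sketch_subbuf_index(size_t offset)
{
	return offset / SKETCH_SUBBUF_SIZE;
}

/*
 * One possible guard: a zero-length write touches no byte, so the
 * overwrite-mode "can this sub-buffer be written into" check is skipped.
 */
static int sketch_needs_writable_check(size_t len)
{
	return len != 0;
}
```

With these definitions, offset 4095 still maps to sub-buffer 0, while a
zero-length write positioned at offset 4096 maps to sub-buffer 1, which
is the sub-buffer the check would wrongly be applied to.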
Cleanup: ring buffer: remove lib_ring_buffer_switch_new_end()
lib_ring_buffer_switch_new_end() is a leftover from the days where an
event that would exactly fill the current sub-buffer would automatically
trigger a sub-buffer switch into the next sub-buffer.
Even before the ring buffer code has been moved into lttng-modules, this
behavior had been changed: an event that exactly fills a sub-buffer only
fills this current sub-buffer, and does not need to switch into the
next one to populate the sub-buffer header. This change was made so that
the periodic timer switch, which shares the same semantics as an event
exactly filling a sub-buffer, would not create tons of empty
sub-buffers.
However, when this change was made, lib_ring_buffer_switch_new_end() was
not removed, but clearly should have been. Its job is now performed
by the event "commit".
lib_ring_buffer_switch_new_end() has no effect, since padding_size is
always 0.
Takes care of autotools issue caused by renaming tp.c to tp.cpp. make
distclean was required when switching between old and newer versions.
It's not needed anymore.
Fix: Add --no-as-needed to the demo example's Makefile
Some distributions now ship with the --as-needed linker flag
set by default (Ubuntu 13.04). This will cause the linker to
remove the references to lttng-ust from the provider objects
thus causing the application to fail when preloading them.
Fix: liblttng-ust process startup hang when sessiond is stopped
Ensure the listener thread owns socket and notify_socket, so they don't
have to hold the ust_lock() while connecting to the sessiond and reading
from this socket.
Therefore, after process fork, we can safely clean up those resources,
because the thread has been removed by the operating system. On exit,
however, let the OS tear down those sockets, so the exit path does not
race with the listener thread.
Zifei Tong [Thu, 30 May 2013 14:11:52 +0000 (10:11 -0400)]
Allow tracepoint providers to be compiled with g++
Move enumeration definitions out of lttng_ust_lib_ring_buffer_config to
make them visible at global scope for C++ compilers.
Modify designated initializers: reorder initializers, add missing
initializers, and reformat nested initializers, so that g++ can compile
them.
Relevant discussion:
> So each field need to be listed ? We usually don't put NULL
> initialization for structures that are always in zero-initialized
> memory. (coding style)
This is related to a known issue of g++:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55606 (Bug 55606 - sorry,
unimplemented: non-trivial designated initializers not supported).
g++'s 'trivial designated initializers' means no out-of-order
initialization, no missing initialization (except the fields on the tail
of a struct), and nested initialization should be done in the form
{.foo = {.bar = 1}} instead of {.foo.bar = 1}. That's why I made these
modifications.
> Are those changes also compatible with the LLVM c++ compiler ?
Actually, clang++ supports designated initializers better than g++.
None of the modifications to designated initializers are required for
clang++. No need to add NULL initialization, reorder initializations or
change {.foo.bar = 1} into {.foo = {.bar = 1}}. These (ugly) hacks are just
to make g++ happy.
[ Updates done by Mathieu Desnoyers to fix merge conflicts. Updated
README. ]
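The "trivial designated initializer" form discussed above can be
sketched as follows. The struct and field names are hypothetical and
only mirror the {.foo = {.bar = 1}} notation used in the discussion.
They are not the lttng-ust definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types, for illustration only. */
struct sketch_inner {
	int bar;
	int baz;
};

struct sketch_config {
	int mode;
	struct sketch_inner foo;
	const char *name;	/* tail field: may be left uninitialized */
};

/*
 * Fields are listed in declaration order, none is skipped before the
 * tail, and the nested struct is written as .foo = { .bar = 1 } rather
 * than .foo.bar = 1.
 */
static const struct sketch_config sketch_cfg = {
	.mode = 0,
	.foo = {
		.bar = 1,
		.baz = 0,
	},
};
```

The omitted tail field is zero-initialized, so sketch_cfg.name is NULL.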
Actually, $^ here is "demo.o", not "demo". Also, the libs should appear
after the objects on the command line. See the "-l" section in
http://gcc.gnu.org/onlinedocs/gcc/Link-Options.html. On most setups
this doesn't matter, since -Wl,--no-as-needed was the default pretty
much everywhere. Ubuntu decided to use -Wl,--as-needed to avoid
unnecessary dependencies, so the order becomes important. If you try
to build manually on a recent Ubuntu you will get undefined references
to dlopen and such. So this patch is good.
If you read carefully the log sent by Alexandre, you see that it is
when building the shared libs in this directory
(lttng-ust-provider-ust-tests-demo.so) that the build fails. I don't
know why it fails, but Alexandre hinted that passing "-fPIE -pie" to
build a shared library is weird (it is usually -fPIC -pic). I am not
sure where that comes from. This behaviour only happens when building
the package, not when building manually.
* Zifei Tong <soariez@gmail.com> wrote:
> I did some debugging on this issue. The problem only occurs when we
> have more than one context field.
> So this will not work either:
>
> lttng create
> lttng enable-event -a -u
> lttng add-context -u -t vpid
> lttng add-context -u -t vtid
> lttng start
> $@
> lttng stop
> sleep 1
> lttng view
> lttng destroy
>
> The problem I found is that the wrong `fields` argument is passed to
> `ustcomm_register_channel`.
> The `fields` argument passed is a pointer to the `event_field` of the
> first element in a `lttng_ctx_field` array, not a
> `lttng_event_field` array as expected.
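The stride mismatch described in the quoted analysis can be sketched as
follows. The sketch_* types are simplified, hypothetical stand-ins for
`lttng_event_field` and `lttng_ctx_field`, and building a contiguous
copy is only one possible way to hand the callee the array layout it
expects.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the event field description. */
struct sketch_event_field {
	const char *name;
	int type;
};

/*
 * Hypothetical stand-in for a context field: the embedded event_field
 * is followed by extra members, so consecutive event_field members are
 * sizeof(struct sketch_ctx_field) bytes apart, not
 * sizeof(struct sketch_event_field).
 */
struct sketch_ctx_field {
	struct sketch_event_field event_field;
	size_t (*get_size)(void);
};

/*
 * Copy each embedded event_field into a contiguous array.
 * Returns NULL when n is 0 or on allocation failure. Caller frees.
 */
static struct sketch_event_field *
sketch_ctx_to_event_fields(const struct sketch_ctx_field *ctx, size_t n)
{
	struct sketch_event_field *out;
	size_t i;

	if (n == 0)
		return NULL;
	out = calloc(n, sizeof(*out));
	if (!out)
		return NULL;
	for (i = 0; i < n; i++)
		out[i] = ctx[i].event_field;
	return out;
}
```

With a single context field the first element happens to sit where the
callee expects it, which is consistent with the problem only showing up
with more than one context field.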
Fixes #529
Reported-by: Francis Giraldeau <francis.giraldeau@gmail.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Move "hello-static-lib" to doc/examples and add non-automake Makefiles
The examples are now automatically built as part of the default make
target and plain Makefiles with no dependency on automake are provided
for clarity.
Update the manpage and README to reflect the change and remove lots of
trailing whitespace.
There is not much we can do for this compatibility bug in lttng-ust 2.0
and 2.1 (already stable). Adding this check so that starting with
lttng-ust 2.2, when liblttng-ust encounters a probe provider with a
provider version major number higher than it supports, it will reject
it.
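The rejection rule described above can be sketched as follows. The
descriptor layout, the SKETCH_SUPPORTED_MAJOR value and the function
name are all hypothetical, and the actual liblttng-ust structures may
differ.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_SUPPORTED_MAJOR 1U	/* hypothetical supported major */

/* Hypothetical probe provider descriptor. */
struct sketch_probe_desc {
	const char *provider;
	unsigned int major;
	unsigned int minor;
};

/* Return 0 if the provider can be registered, -1 if it is too recent. */
static int sketch_check_provider_version(const struct sketch_probe_desc *desc)
{
	assert(desc != NULL);

	if (desc->major > SKETCH_SUPPORTED_MAJOR)
		return -1;	/* major higher than supported: reject */
	return 0;
}
```

In this sketch only the major number is compared, so a provider with a
higher minor number and a supported major is still accepted.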
Timer management is not called under ust_lock(). It is only called from
the consumer. Add internal locking for timer start/stop and
synchronization management.
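A minimal sketch of internal locking for timer start/stop, assuming a
pthread mutex embedded in a hypothetical timer structure. The names
below are invented for illustration and the kind of lock actually used
is not stated here.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical timer state with its own lock. */
struct sketch_timer {
	pthread_mutex_t lock;	/* protects the fields below */
	int running;
	unsigned int nr_starts;
};

#define SKETCH_TIMER_INIT { PTHREAD_MUTEX_INITIALIZER, 0, 0 }

/* Start the timer. Returns -1 if it is already running. */
static int sketch_timer_start(struct sketch_timer *t)
{
	int ret = 0;

	pthread_mutex_lock(&t->lock);
	if (t->running) {
		ret = -1;
	} else {
		t->running = 1;
		t->nr_starts++;
	}
	pthread_mutex_unlock(&t->lock);
	return ret;
}

/* Stop the timer. Returns -1 if it is not running. */
static int sketch_timer_stop(struct sketch_timer *t)
{
	int ret = 0;

	pthread_mutex_lock(&t->lock);
	if (!t->running)
		ret = -1;
	else
		t->running = 0;
	pthread_mutex_unlock(&t->lock);
	return ret;
}
```

Because the lock lives inside the timer object, callers do not need to
hold any outer lock such as ust_lock() around start and stop.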
In file included from ../include/lttng/ust-tracepoint-event.h:357,
from ../include/lttng/tracepoint-event.h:62,
from lttng-ust-cyg-profile.h:63,
from lttng-ust-cyg-profile.c:27:
././lttng-ust-cyg-profile.h: In function ‘__event_prepare_filter_stack__lttng_ust_cyg_profile___func_entry’:
././lttng-ust-cyg-profile.h:35: warning: cast from pointer to integer of different size
././lttng-ust-cyg-profile.h:35: warning: cast from pointer to integer of different size
././lttng-ust-cyg-profile.h:35: warning: cast from pointer to integer of different size
././lttng-ust-cyg-profile.h:35: warning: cast from pointer to integer of different size
././lttng-ust-cyg-profile.h: In function ‘__event_prepare_filter_stack__lttng_ust_cyg_profile___func_exit’:
././lttng-ust-cyg-profile.h:46: warning: cast from pointer to integer of different size
././lttng-ust-cyg-profile.h:46: warning: cast from pointer to integer of different size
././lttng-ust-cyg-profile.h:46: warning: cast from pointer to integer of different size
././lttng-ust-cyg-profile.h:46: warning: cast from pointer to integer of different size
CCLD liblttng-ust-cyg-profile.la
CC lttng-ust-cyg-profile-fast.lo
In file included from ../include/lttng/ust-tracepoint-event.h:357,
from ../include/lttng/tracepoint-event.h:62,
from lttng-ust-cyg-profile-fast.h:59,
from lttng-ust-cyg-profile-fast.c:27:
././lttng-ust-cyg-profile-fast.h: In function ‘__event_prepare_filter_stack__lttng_ust_cyg_profile_fast___func_entry’:
././lttng-ust-cyg-profile-fast.h:35: warning: cast from pointer to integer of different size
././lttng-ust-cyg-profile-fast.h:35: warning: cast from pointer to integer of different size
Optimisation: implement callsite hash table in tracepoint.c
Instead of iterating on every tracepoint callsite each time a probe is
registered/unregistered, use a hash table of callsites to only update
tracepoint sites matching the probe name.
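The lookup described above can be sketched with a small chained hash
table keyed by name. The table size, the djb2-style hash and all
sketch_* names are assumptions for illustration, not the tracepoint.c
implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_TABLE_SIZE 64	/* hypothetical, must be a power of two */

/* Hypothetical callsite entry. */
struct sketch_callsite {
	const char *name;		/* name the callsite is keyed by */
	int enabled;			/* state updated on (un)register */
	struct sketch_callsite *next;	/* bucket chain */
};

static struct sketch_callsite *sketch_table[SKETCH_TABLE_SIZE];

/* djb2-style string hash. */
static unsigned long sketch_hash(const char *s)
{
	unsigned long h = 5381;

	while (*s)
		h = h * 33 + (unsigned char)*s++;
	return h;
}

/* Insert a callsite at the head of its bucket. */
static void sketch_add_callsite(struct sketch_callsite *site)
{
	unsigned long idx = sketch_hash(site->name) & (SKETCH_TABLE_SIZE - 1);

	site->next = sketch_table[idx];
	sketch_table[idx] = site;
}

/* Update only the callsites whose name matches. Returns how many. */
static int sketch_sync_callsites(const char *name, int enabled)
{
	unsigned long idx = sketch_hash(name) & (SKETCH_TABLE_SIZE - 1);
	struct sketch_callsite *site;
	int updated = 0;

	for (site = sketch_table[idx]; site; site = site->next) {
		if (strcmp(site->name, name) != 0)
			continue;	/* hash collision, different name */
		site->enabled = enabled;
		updated++;
	}
	return updated;
}
```

Only one bucket is walked per registration or unregistration, instead
of every callsite in the program.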
Fix: tracepoint instrumentation constructor order issue
If the linker decides to run a constructor from a tracepoint probe
before the constructor from the application, a recent modification
(commit 558b9d86247004f8e9bbaf8c982f3b2b182093d1) made it possible for
this constructor execution order to prevent the program's tracepoints
from being registered.
Paul Woegerer [Wed, 27 Mar 2013 14:16:26 +0000 (10:16 -0400)]
Fix: forwarding of call_site argument to field
I ran some tests with the new function entry/exit instrumentations.
The tracepoint provider for lttng_ust_cyg_profile:func_entry and
func_exit does not properly forward the call_site argument to the
call_site field. The patch below fixes the problem.
Removal of disabled tests. Fixes ./configure failing
in the distribution package because of missing Makefiles
in tests/ust-basic-tracing and tests/ust-multi-test.
The "fast" .so (liblttng-ust-cyg-profile-fast.so) is for use-cases where
we expect a complete event stream to be recorded, so we can skip
duplicate information.
The verbose .so (liblttng-ust-cyg-profile.so) is for use-cases where
discarded events are expected, and the trace analyzer needs extra
information to be able to reconstruct the program flow.