Hollis Blanchard <hollis_blanchard@mentor.com> wrote:
> I seem to have hit a little problem with a "hello world" test app and
> lttng-ust 2.0.3. lttng-ust.git seems to be affected as well. Basically,
> I created a single UST tracepoint, but as soon as I run "lttng
> enable-event -u -a", my app segfaults. The problem seems to be that when
> creating the event to pass to ltt_event_create(), we try to memcpy the
> full 256 bytes of name. However, the name might be shorter, and if we
> get unlucky it falls within 256 bytes of the segment boundary...
Fix the 3 sites where this issue arises. Manually inspecting every
memcpy in the UST code returned by grep did the job.
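A minimal sketch of one way to avoid the overread, assuming a 256-byte
name field as described in the report. SYM_NAME_LEN, struct event_req
and set_event_name() are hypothetical names for illustration, not the
actual lttng-ust code:

```c
#include <string.h>

#define SYM_NAME_LEN 256	/* hypothetical: size of the fixed name field */

struct event_req {
	char name[SYM_NAME_LEN];
};

/*
 * Stop reading at the source's terminating NUL instead of doing
 * memcpy(req->name, name, SYM_NAME_LEN), which always reads 256 bytes
 * even when the source string is shorter. strncpy() zero-pads the rest
 * of the destination, and the last byte is forced to NUL.
 */
static void set_event_name(struct event_req *req, const char *name)
{
	strncpy(req->name, name, SYM_NAME_LEN - 1);
	req->name[SYM_NAME_LEN - 1] = '\0';
}
```

With this shape, a short name located near the end of a mapped segment
is never read past its terminator.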
Christian Babeux [Sat, 29 Sep 2012 17:37:40 +0000 (13:37 -0400)]
Fix: reloc offset validation error out on filters with no reloc table
The reloc table is currently appended at the end of the bytecode data.
With this scheme, the reloc table offset will be equal to the length
of the bytecode data.
Val. Operator
---- --------
0x40 (FILTER_OP_LOAD_STRING)
0x6D m
0x79 y
0x53 S
0x74 t
0x72 r
0x69 i
0x6E n
0x67 g
0x00 \0
0x40 (FILTER_OP_LOAD_STRING)
0x79 y
0x6F o
0x75 u
0x72 r
0x53 S
0x74 t
0x72 r
0x69 i
0x6E n
0x67 g
0x00 \0
0x0C (FILTER_OP_EQ)
0x01 (FILTER_OP_RETURN)
In this case, we see that the reloc table offset (24) is indeed equal to
the length of the bytecode (24), but the reloc table is _empty_. Thus,
the reloc_offset received in handle_message() will be equal to the
data_size and will be wrongly flagged as not within the data even though
the filter is entirely valid.
The fix is to simply allow a reloc_offset to be equal to the data_size.
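The bounds check can be sketched as follows. reloc_offset_is_valid() is
a hypothetical helper written for illustration; the actual check lives
in handle_message() and may be shaped differently:

```c
#include <stdint.h>

/*
 * Hypothetical helper: the reloc table is appended after the bytecode,
 * so an offset equal to data_size denotes an empty table and is valid.
 * Only offsets strictly past the end of the data are rejected.
 */
static int reloc_offset_is_valid(uint32_t reloc_offset, uint32_t data_size)
{
	return reloc_offset <= data_size;
}
```

For the 24-byte filter above, an offset of 24 is accepted and an offset
of 25 is rejected.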
Fixes #342
Signed-off-by: Christian Babeux <christian.babeux@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
The main issue is that get_wait_shm() bypasses the fork() wrapper (with
lttng_ust_nest_count), which is responsible for holding the UST mutex
across fork(). Therefore, when exiting the context of the child process,
we execute the destructor, which tries to grab the UST mutex, which might
be in pretty much any state.
Given that we don't want this process to try to register to
lttng-sessiond (because this is internal to lttng-ust), we might want to
let it skip the destructor execution. This would actually be the easiest
way out.
Fix: Filter ABI changes to support FILTER_BYTECODE_MAX_LEN (65536)
In order to support the filter bytecode maximum length (65536 bytes),
the lttng_ust_filter_bytecode len field type must be able to
hold more than a uint16_t. Change the field type to a uint32_t.
Also, since the relocation table is located at the end of the actual
bytecode, the reloc_table_offset (reloc_offset in ust-abi) field must
support offset values larger than 65535. Change the field type to a
uint32_t. This change will allow support of relocation table appended
to larger bytecode without breaking the ABI if the need arises in the
future.
Both changes currently break the filter ABI, but this should be a
reasonable compromise since the filtering feature has not been
released yet.
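A small illustration of why a 16-bit field is not enough: the maximum
uint16_t value is 65535, so a length of 65536 wraps to 0. The helpers
store_len16() and store_len32() are hypothetical and only demonstrate
the conversion; they are not part of the ABI:

```c
#include <stdint.h>

#define FILTER_BYTECODE_MAX_LEN 65536

/* Hypothetical helpers: store a length in a 16-bit vs. a 32-bit field. */
static uint16_t store_len16(uint32_t len)
{
	return (uint16_t) len;	/* reduced modulo 65536 */
}

static uint32_t store_len32(uint32_t len)
{
	return len;		/* value preserved */
}
```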
Signed-off-by: Christian Babeux <christian.babeux@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
We keep compatibility with applications (so we're still in the 2.x
versions), but we are breaking compatibility with lttng-consumerd.
Therefore, push the internal version number to 3.0.0.
Compiling on x86-32 shows the following warnings for filter (with gcc
4.3 and 4.4):
././ust_tests_hello.h:28: note: initialized from here
././ust_tests_hello.h:28: error: dereferencing pointer ‘__stack_data.18’ does break strict-aliasing rules
Fix it by using memcpy when copying to the temporary "stack" array used
to send arguments to the filter.
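A minimal sketch of the memcpy approach, assuming the temporary "stack"
is a char array. push_arg() and read_arg() are hypothetical helpers,
not the macros generated by the tracepoint headers:

```c
#include <string.h>

/*
 * Hypothetical sketch: append one argument to a char "stack" buffer.
 * Writing through *(unsigned long *) &stack[offset] would access the
 * char array through an incompatible lvalue type, which is what the
 * strict-aliasing diagnostic is about. memcpy() copies the bytes
 * without creating such an access.
 */
static size_t push_arg(char *stack, size_t offset, unsigned long arg)
{
	memcpy(&stack[offset], &arg, sizeof(arg));
	return offset + sizeof(arg);
}

/* Read an argument back the same way. */
static unsigned long read_arg(const char *stack, size_t offset)
{
	unsigned long arg;

	memcpy(&arg, &stack[offset], sizeof(arg));
	return arg;
}
```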
When the consumerd dies (from a SIGKILL), it may close all of its file
descriptors rather abruptly.
We ensured that the UST command threads have all signals blocked, and
they use MSG_NOSIGNAL when sending messages to the sessiond over
sockets.
However, the consumer scheme uses a pipe(2) to transport the "wakeup"
info from the application tracing site to the consumer daemon. It may
send a SIGPIPE to the application in that case, which could kill the
application, an unwanted side-effect.
Block thread SIGPIPE around write() and wait for the signal to fix this.
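One common way to implement this pattern with POSIX calls is sketched
below. write_nosigpipe() is a hypothetical wrapper written for
illustration; the actual lttng-ust change may differ in detail:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stddef.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: write() that never lets the SIGPIPE caused by
 * this write reach the application. Returns what write() returned and
 * preserves errno.
 */
static ssize_t write_nosigpipe(int fd, const void *buf, size_t len)
{
	sigset_t pipe_set, orig_mask, pending;
	struct timespec no_wait = { 0, 0 };
	int was_pending, saved_errno;
	ssize_t ret;

	sigemptyset(&pipe_set);
	sigaddset(&pipe_set, SIGPIPE);

	/* If a SIGPIPE was already pending, it is not ours: leave it. */
	sigpending(&pending);
	was_pending = sigismember(&pending, SIGPIPE);

	/* Block SIGPIPE for the calling thread only. */
	pthread_sigmask(SIG_BLOCK, &pipe_set, &orig_mask);

	ret = write(fd, buf, len);
	saved_errno = errno;

	/* Consume the SIGPIPE generated by our own write(), if any. */
	if (ret < 0 && saved_errno == EPIPE && !was_pending) {
		while (sigtimedwait(&pipe_set, NULL, &no_wait) < 0
				&& errno == EINTR)
			;
	}

	/* Restore the thread's original signal mask. */
	pthread_sigmask(SIG_SETMASK, &orig_mask, NULL);
	errno = saved_errno;
	return ret;
}
```

The caller still sees EPIPE as an error return, so it can stop using
the pipe, but the application's own SIGPIPE disposition is never
triggered by the tracer.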
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Christian Babeux <christian.babeux@efficios.com>
CC: David Goulet <dgoulet@efficios.com>
Fix: Libtool fails to find dependent libraries when cross-compiling lttng-ust
This problem arises when cross-compiling and linking libraries with
indirect library dependencies (such as liblttng-ust). This "bug" is
caused by an upstream modification in the libtool package on Debian
systems. The libtool "link_all_deplibs" flag is set to "no" by default
on Linux targets (AFAIK, other distros set it to "unknown").
The chosen solution is to detect such cases via the configure script
and automagically patch the libtool.m4 by forcing the "link_all_deplibs"
to "unknown".
This fixup can be disabled with the appropriate configure flag:
./configure --disable-libtool-linkdep-fixup
Sample configure output on affected systems:
checking for occurence(s) of link_all_deplibs = no in
./config/libtool.m4... 3
configure: WARNING: the detected libtool will not link all
dependencies, forcing link_all_deplibs = unknown
Fixes: #321
Signed-off-by: Christian Babeux <christian.babeux@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Paul Woegerer [Wed, 18 Jul 2012 19:28:44 +0000 (15:28 -0400)]
Make lttng-ust robust against -finstrument-functions.
[ Edit by Mathieu Desnoyers:
We need to declare the no_instrument_function attribute on function
declarations (rather than definition) for g++. Moved the attribute prior
to the function declaration (rather than after) to follow the coding
style within LTTng-UST. ]
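A sketch of the attribute placement described in the edit note.
probe_helper() is a hypothetical function used only to show the syntax;
the exact placement in the LTTng-UST headers may differ:

```c
/*
 * Hypothetical helper: the attribute is attached to the declaration and
 * written before the return type, a placement that both gcc and g++
 * accept. The definition below does not repeat it.
 */
static __attribute__((no_instrument_function)) int probe_helper(int value);

static int probe_helper(int value)
{
	/* Not instrumented, even when built with -finstrument-functions. */
	return value + 1;
}
```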
Fix c99 compatibility: tp_rcu_dereference_bp() should not use braced-groups within expressions
Allow tp_rcu_dereference_bp() to be used within programs compiled with
--std=c99 -pedantic -Werror. Fixes the following:
In file included from hello.c:34:0:
ust_tests_hello.h: In function ‘__tracepoint_cb_ust_tests_hello___tptest’:
ust_tests_hello.h:28:1: warning: ISO C forbids braced-groups within expressions [-pedantic]
ust_tests_hello.h:28:1: warning: ISO C forbids conversion of object pointer to function pointer type [-pedantic]
We can easily fix this one since tp_rcu_dereference_bp() really
evaluates only a single expression.
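The kind of rewrite involved can be sketched with a hypothetical macro.
READ_ONCE_INT() and read_counter() are illustrations only, not the
definition of tp_rcu_dereference_bp():

```c
/*
 * Hypothetical macro, not the lttng-ust definition.
 *
 * GNU statement-expression form, rejected by -std=c99 -pedantic:
 *   #define READ_ONCE_INT(x)  ({ int v_ = *(volatile int *) &(x); v_; })
 *
 * Because the body evaluates a single expression, it can be written as
 * a plain parenthesized expression that strict ISO C accepts:
 */
#define READ_ONCE_INT(x)	(*(volatile int *) &(x))

/* Example user of the macro. */
static int read_counter(int *counter)
{
	return READ_ONCE_INT(*counter);
}
```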
* Mathieu Desnoyers (mathieu.desnoyers@efficios.com) wrote:
> * Burton, Michael (mburton@ciena.com) wrote:
> > Mathieu,
> >
> > I think there is a deadlock scenario in UST, which has been causing my problem.
>
> Good catch !
>
> >
> > sessiond is started as root:
> > - creates global sockets ONLY
> > - DOES NOT CREATE shm in $HOME/lttng-ust-wait-<uid>
> >
> > application linked against ust is run as root:
> > - in lttng_ust_init constructor
> > - ust_listener_thread (local_apps)
> > - fails to connect to local_apps in $HOME/.lttng (as expected)
> > - prev_connect_failed=1
> > - ust_unlock()
> > - restart
> > - wait_for_sessiond()
> > --> - ust_lock()
> > | - get_map_shm()
> > | - get_wait_shm()
> > DEADLOCK - shm_open() FAILS (not created by sessiond when run by root)
> > | - fork() (trying to create shared memory itself)
> > | - ust_before_fork()
> > ------------> - ust_lock()
> >
> >
> > You should be able to create this with an empty main, with no
> > tracepoints. As long as sessiond is started as root so
> > $HOME/lttng-ust-wait-<uid> is not created. You can also make the
> > lttng-ust constructor (lttng_ust_init) wait forever and then you'll be
> > able to see the deadlock in gdb without even leaving the
> > lttng_ust_init constructor.
>
> Ah, I see. This deadlock is caused by the interaction between
> [ liblttng-ust-fork ] and liblttng-ust (the fork override is
> performed by [ liblttng-ust-fork ]).
This can be reproduced easily with the in-tree tests: by removing the
lttng-ust-apps-wait* files belonging to the user in /dev/shm, running
the "tests/fork" test (with ./run) hangs. If we run "hello" first, and
then the fork test, it works fine.
Fixing this by keeping a nesting counter around the fork() call, so we
return immediately from the pre/post fork handlers if they are
overridden by liblttng-ust-fork.
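The nesting-counter idea can be sketched as a small simulation. All
names below are hypothetical, and lock_depth stands in for the UST
mutex so that a second acquisition can be counted instead of
deadlocking:

```c
/*
 * Hypothetical sketch, not the lttng-ust code.
 */
static __thread int fork_nest_count;	/* > 0 while the tracer itself forks */
static int lock_depth;			/* stand-in for the UST mutex */
static int relock_attempts;		/* acquisitions that would deadlock */

static void ust_lock_sim(void)
{
	if (lock_depth > 0)
		relock_attempts++;	/* a non-recursive mutex would hang here */
	lock_depth++;
}

static void ust_unlock_sim(void)
{
	lock_depth--;
}

/* Pre/post fork handlers, as invoked by the fork() override. */
static void before_fork_sim(void)
{
	if (fork_nest_count)
		return;			/* internal fork: lock is already held */
	ust_lock_sim();
}

static void after_fork_sim(void)
{
	if (fork_nest_count)
		return;
	ust_unlock_sim();
}

/* Internal code path that forks while already holding the lock. */
static void internal_fork_sim(void)
{
	ust_lock_sim();
	fork_nest_count++;
	before_fork_sim();		/* what the overridden fork() calls first */
	/* the real fork() would run here */
	after_fork_sim();
	fork_nest_count--;
	ust_unlock_sim();
}
```

An application-initiated fork still goes through the handlers with the
counter at zero, so the lock is taken and released as before.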
Reported-by: Michael Burton <mburton@ciena.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fix strict ISO-C compatibility for ust-tracepoint-event.h public header
../../include/lttng/ust-tracepoint-event.h:328:23: warning: ISO C does not permit named variadic macros [-Wvariadic-macros]
Use ... and __VA_ARGS__ instead of args... and args.
This enables ISO-C compatibility for the tracepoint headers for program
instrumentation. Note that the probes need to be built _without_ strict
C99 flags (they require gnu extensions).
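The two macro forms, shown on a hypothetical FORMAT_MSG() macro rather
than the actual tracepoint macros:

```c
#include <stdio.h>

/*
 * GNU named variadic form, flagged by -Wvariadic-macros in strict ISO C:
 *   #define FORMAT_MSG(buf, len, fmt, args...) \
 *           snprintf(buf, len, fmt, args)
 *
 * ISO C99 form using ... and __VA_ARGS__ (FORMAT_MSG is hypothetical):
 */
#define FORMAT_MSG(buf, len, ...)	snprintf(buf, len, __VA_ARGS__)
```

Folding the format string into the variadic part keeps the macro usable
when no extra arguments follow the format.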
From Hollis Blanchard <hollis_blanchard@mentor.com>:
> Hi, I was adding an LTTng UST 2.0 tracepoint to an application that uses
> -warn-common (see http://www.math.utah.edu/docs/info/ld_2.html). I created
> a simple tracepoint, had lttng-gen-tp produce tracepoints.o, then linked
> that to the application, along with -llttng-ust. This results in some
> warnings:
>
> tracepoints.o: warning: common of `handle' overridden by definition
> /usr/local/lib/liblttng-ust.so: warning: defined here
> tracepoints.o: warning: common of `lttng_client_callbacks_overwrite' overridden
> +by definition
> /usr/local/lib/liblttng-ust.so: warning: defined here
> tracepoints.o: warning: common of `lttng_client_callbacks_discard' overridden by
> +definition
> /usr/local/lib/liblttng-ust.so: warning: defined here
> tracepoints.o: warning: common of `lttng_client_callbacks_metadata' overridden
> +by definition
> /usr/local/lib/liblttng-ust.so: warning: defined here
> /usr/local/lib/liblttng-ust-tracepoint.so.0: warning: multiple common of
> +`handle'
> tracepoints.o: warning: previous common is here
>
> This seems to be a valid warning. The LTTng UST headers contain
> definitions like this in include/lttng/ringbuffer-config.h:
> struct lttng_ust_shm_handle *handle;
>
> If two objects use that header, each will get a copy of "handle", right?
handle: This was meant to be a forward declaration of
struct lttng_ust_shm_handle
so just removing the "*handle" part. This can be considered as a
cleanup (or a fix without actual runtime effect).
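The difference between the two header forms can be sketched as follows.
handle_is_set() is a hypothetical function added only to show that the
incomplete type remains usable through pointers:

```c
/*
 * With the stray identifier, every object that includes the header
 * defines a common symbol named "handle":
 *   struct lttng_ust_shm_handle *handle;
 *
 * A forward declaration introduces only the type name and emits no
 * symbol:
 */
struct lttng_ust_shm_handle;

/* Pointers to the incomplete type can still be declared and passed. */
static int handle_is_set(const struct lttng_ust_shm_handle *h)
{
	return h != 0;
}
```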
lttng_client_callbacks_*: if the cb values had been used in the
consumer daemon, this would have caused an issue: these would be set to
NULL instead of the actual callback pointers. So in a way this is a fix,
but it does not have any runtime impact at this point.
Then when the demo is run with LTTNG_UST_DEBUG=1, a warning is shown,
like:
liblttng_ust_tracepoint[3315/3315]: Warning: Tracepoint signature mismatch, not
enabling one or more tracepoints. Ensure that the tracepoint probes prototypes
match the application. (in set_tracepoint() at tracepoint.c:310)
liblttng_ust_tracepoint[3315/3315]: Warning: Tracepoint "ust_tests_demo3:done"
signatures: call: "_Bool, value" vs probe: "bool, value". (in set_tracepoint()
at tracepoint.c:312)
It seems that TP_ARGS does not perform preprocessor expansion on the
"bool" type spec, while something underneath TP_FIELDS does. And since
(at least on this CentOS 6.2 box) stdbool.h uses a #define rather than a
typedef to make bool equivalent to _Bool, liblttng detects a mismatch.
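The preprocessor behaviour behind this mismatch can be reproduced in
isolation. The macro my_bool is a hypothetical stand-in for a stdbool.h
that does "#define bool _Bool", and STR_RAW()/STR_EXPANDED() are
illustrative, not the macros used by TP_ARGS or TP_FIELDS:

```c
#include <string.h>

/* Hypothetical stand-in for a stdbool.h that does "#define bool _Bool". */
#define my_bool	_Bool

#define STR_RAW(x)	#x		/* stringifies the token as written */
#define STR_EXPANDED(x)	STR_RAW(x)	/* expands macros, then stringifies */

/* Two signature strings built from the same type spelling. */
static const char *raw_sig = STR_RAW(my_bool) ", value";
static const char *expanded_sig = STR_EXPANDED(my_bool) ", value";

/* A plain string comparison sees the two signatures as different. */
static int signatures_match(void)
{
	return strcmp(raw_sig, expanded_sig) == 0;
}
```

Here raw_sig is "my_bool, value" while expanded_sig is "_Bool, value",
so the comparison fails even though both describe the same type.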
Reported-by: John Steele Scott <toojays@toojays.net>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
The listener thread does not block signals and receives signals that are
intended for the application. As this can cause applications to fail,
the listener thread should block all signals. The attached patch is
derived from an old commit and fixes the issue.
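One way to make a helper thread start with all signals blocked is to
set the mask in the creating thread around pthread_create(), since a
new thread inherits its creator's mask. This is a sketch under that
assumption; create_thread_signals_blocked() and report_sigusr1_blocked()
are hypothetical names, and the attached patch may block signals
differently:

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <stddef.h>

/*
 * Hypothetical helper: create a thread that starts with every signal
 * blocked, then restore the caller's mask so the application keeps
 * receiving its own signals.
 */
static int create_thread_signals_blocked(pthread_t *tid,
		void *(*fn)(void *), void *arg)
{
	sigset_t all, orig;
	int ret;

	sigfillset(&all);
	ret = pthread_sigmask(SIG_BLOCK, &all, &orig);
	if (ret)
		return ret;
	ret = pthread_create(tid, NULL, fn, arg);
	/* Restore the caller's mask whether or not creation succeeded. */
	pthread_sigmask(SIG_SETMASK, &orig, NULL);
	return ret;
}

/* Example thread body: reports whether SIGUSR1 is blocked in this thread. */
static void *report_sigusr1_blocked(void *arg)
{
	sigset_t cur;

	pthread_sigmask(SIG_SETMASK, NULL, &cur);
	*(int *) arg = sigismember(&cur, SIGUSR1);
	return NULL;
}
```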