David Goulet [Fri, 22 Jun 2012 15:02:51 +0000 (11:02 -0400)]
Fix: enable event loglevel match function
For the loglevel type ALL, the value set in an event inside the session
daemon is -1 where the value received from the client is 0. For this
loglevel type, the loglevel does not matter.
The following example was NOT returning a correct error message.
"event1" is set with a loglevel "TRACE_CRIT". The normal behavior here
is that once enabled, you cannot change the loglevel of the enabled
event on the tracer side with a second command. It now returns a new
error message like so:
This commit makes the session daemon verify if _both_ the name and
loglevel are the same when enabling an event or else an error is
returned.
Also, the session daemon will continue enabling events and not return an
error if the loglevel does not match an event for a UST app on the
tracer, which returns an EPERM at that stage. This is to address the case
where two applications have the same event name but with different
loglevels.
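A minimal sketch of the match rule described above, assuming hypothetical
type and function names (struct event_key, event_match) that are not the
actual lttng-tools identifiers. Comparing the loglevel type as well is an
assumption; the commit text only states that name and loglevel must match
and that the value is irrelevant for the ALL type.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical names; the real lttng-tools types and constants differ. */
enum loglevel_type {
	LOGLEVEL_TYPE_ALL,
	LOGLEVEL_TYPE_SINGLE
};

struct event_key {
	const char *name;
	enum loglevel_type type;
	int loglevel;	/* ignored when type is LOGLEVEL_TYPE_ALL */
};

/* Return 1 when both the name and the loglevel match, 0 otherwise. */
int event_match(const struct event_key *a, const struct event_key *b)
{
	if (strcmp(a->name, b->name) != 0)
		return 0;
	if (a->type != b->type)
		return 0;
	/* For ALL, the stored value (-1) and the client value (0) both match. */
	if (a->type == LOGLEVEL_TYPE_ALL)
		return 1;
	return a->loglevel == b->loglevel;
}
```

With this rule, a second enable command using a different loglevel for
the same name does not match the existing event, so the caller can
report an error instead of silently succeeding.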
Reported-by: Tan Dung Le Tran <tan.dung.le.tran@ericsson.com> Signed-off-by: David Goulet <dgoulet@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fix: close all file descriptors when executed as daemon
Both sessiond and consumerd support the option "-d" to run as daemon. In
some specific cases, e.g. when launched from dpkg installation scripts,
file descriptors 3, 4, 5 are left open and don't seem to have O_CLOEXEC
flag set, so the install script hangs because the sessiond still holds a
reference to them. daemon(3) only closes standard FD 0, 1, 2.
Fix this issue by closing all file descriptors after calling daemon(3).
Note: we make sure no file descriptor is opened before calling daemon(3)
by moving the init_thread_quit_pipe() call after the FD close.
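A minimal sketch of the FD close step, assuming a hypothetical helper
name (close_fd_range) and an explicit upper bound supplied by the caller.
The arguments passed to daemon(3) in the comment are an assumption, not
taken from the commit.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: close every descriptor in [first_fd, max_fd).
 * Errors (e.g. EBADF for descriptors that are not open) are ignored.
 */
void close_fd_range(int first_fd, int max_fd)
{
	int fd;

	for (fd = first_fd; fd < max_fd; fd++)
		(void) close(fd);
}

/*
 * Intended call order (not executed here):
 *
 *   daemon(0, 0);                  handles only FD 0, 1, 2
 *   close_fd_range(3, max_fd);     drop everything inherited above 2
 *   ...only then create pipes such as the thread quit pipe
 */
```

The upper bound could come from sysconf(_SC_OPEN_MAX), keeping in mind
that this limit can be very large on some systems, which makes the loop
slow.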
Fix: consumer fd recv thread should write into non-blocking pipe
Writing into a blocking pipe will cause the writer thread to block on
the poll fds thread when the pipe is full. Given that we would like to
batch stream array reallocation as much as possible, this wakeup should
not block.
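A minimal sketch of a non-blocking wakeup write, assuming hypothetical
helper names (set_nonblocking, wakeup_poll_thread). Treating a full pipe
as success is an assumption: a full pipe already holds pending wakeups,
so dropping one more byte loses nothing in this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Switch a descriptor to non-blocking mode; 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/*
 * Hypothetical wakeup helper: write one byte to the poll thread's pipe.
 * Returns 0 when the byte was written or the pipe was full (EAGAIN),
 * -1 on any other error. It never blocks on a non-blocking pipe.
 */
int wakeup_poll_thread(int pipe_wfd)
{
	const char byte = 1;
	ssize_t ret;

	do {
		ret = write(pipe_wfd, &byte, 1);
	} while (ret < 0 && errno == EINTR);

	if (ret < 0 && errno != EAGAIN)
		return -1;
	return 0;
}
```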
Fix: work-around glibc __nptl_setxid vs clone hang
Exiting hash table resize threads end up setting a "locked" state within
libc pthread, which deadlocks with seteuid/setegid called from the
cloned process in runas.c when runas() is called exactly when a resize
thread exits.
Temporarily fix this issue by adding a mutex across this resize
operation, which holds mutual exclusion with runas() usage.
We should investigate whether we want to properly call exec() from the
runas.c clone child before touching any non-async-signal-safe libc call.
However, given that this change is more intrusive, let's first use this
mutex-based work-around.
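A minimal sketch of the mutual exclusion described above, assuming a
hypothetical lock name (runas_resize_lock) and wrapper
(with_runas_resize_lock). The two callbacks are stand-ins for the real
resize and runas() work; they are not lttng-tools functions.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical lock shared by the resize path and the runas() path. */
static pthread_mutex_t runas_resize_lock = PTHREAD_MUTEX_INITIALIZER;

/* Run fn(data) while holding the shared lock and return its result. */
int with_runas_resize_lock(int (*fn)(void *), void *data)
{
	int ret;

	pthread_mutex_lock(&runas_resize_lock);
	ret = fn(data);
	pthread_mutex_unlock(&runas_resize_lock);
	return ret;
}

/* Stand-in for the hash table resize operation. */
int fake_resize(void *data)
{
	*(int *) data += 1;
	return 0;
}

/* Stand-in for the clone + seteuid/setegid work done by runas(). */
int fake_run_as(void *data)
{
	*(int *) data += 10;
	return 0;
}
```

Because both paths go through the same wrapper, a resize thread cannot
be exiting while a runas() clone is in flight, at the cost of
serializing the two operations.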
Before this fix, running 1000 instances of "demo-trace 300" with
sessiond running as root, and:
lttng create
lttng enable-event -u -a
lttng start
would sometimes lead to consumerd hang with the following clone child
backtrace:
setxid_mark_thread (cmdp=<optimized out>, t=0x7f52dd47c700)
at allocatestack.c:995
995 allocatestack.c: No such file or directory.
(gdb) bt full
at allocatestack.c:995
ch = <optimized out>
at allocatestack.c:1088
t = 0x80
signalled = <optimized out>
result = <optimized out>
runp = 0x7f52dd47c9c0
at ../sysdeps/unix/sysv/linux/setegid.c:44
__p = 0xfffffffffffffe00
__cmd = {syscall_no = 119, id = {-1, 1000, -1}, cntr = 0}
result = <optimized out>
data = 0x7f52e66e1930
writelen = <optimized out>
writeleft = <optimized out>
index = <optimized out>
sendret = {i = 0, c = "\000\000\000"}
ret = <optimized out>
__func__ = "child_run_as"
at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
No locals.
No symbol table info available.
Was causing the sessiond to fail to receive streams under heavy load,
because this test needs to be done with a mask rather than equality.
Testing equality was failing as soon as POLLPRI (or any other flag) was
set.
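A minimal sketch of the difference between the two tests, assuming the
flag being checked is POLLIN (the text above does not name it) and using
hypothetical function names.

```c
#include <assert.h>
#include <poll.h>

/* Buggy form: false as soon as any extra flag (e.g. POLLPRI) is set. */
int is_readable_equality(short revents)
{
	return revents == POLLIN;
}

/* Fixed form: true whenever POLLIN is set, whatever else is set. */
int is_readable_mask(short revents)
{
	return (revents & POLLIN) != 0;
}
```

For revents equal to POLLIN | POLLPRI, the equality test reports "not
readable" while the mask test reports "readable".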
David Goulet [Tue, 10 Apr 2012 17:42:46 +0000 (13:42 -0400)]
Fix: wait for sessiond to stop in tests
Before returning from the stop_sessiond bash function, we wait until the
sessiond daemon has completely stopped. If it hangs at that point, the
kill did not work and investigation can begin.
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Fri, 6 Apr 2012 19:24:53 +0000 (15:24 -0400)]
Don't report back error on syscalls fail for -a -k
lttng enable-event -a -k will not report an error anymore if enabling
syscall events has failed. Please refer to the commitdiff for a detailed
comment on why this is done like so.
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Thu, 5 Apr 2012 19:56:43 +0000 (15:56 -0400)]
Fix: destroy context hash table being NULL
Passing an unknown event loglevel type to the session daemon (for the UST
domain) was triggering an error code path that destroys the context hash
table of the event, which is not yet created when the error is hit.
This fixes a segfault that completely killed the session daemon.
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Thu, 5 Apr 2012 15:39:59 +0000 (11:39 -0400)]
Fix: make lttng expand path for trace output opt
lttng create --output was passing the path string to the session daemon
and thus, for relative path like './mytraces', it was created in the
current directory of the session daemon.
Now the lttng command line uses the realpath(3) of the --output string and
denies creation if multiple levels of directories do not exist (Ex:
/tmp/foo/bar/chap, if foo/ does not exist, it is refused).
Directory creation still occurs on the session daemon side.
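A minimal sketch of one way to get this behavior, assuming a
hypothetical helper name (expand_output_path). It resolves the parent
directory with realpath(3) and appends the last component, so only the
final directory may be missing. This is an illustration, not the code
from the commit.

```c
#define _XOPEN_SOURCE 700	/* for realpath(3) */
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: write the absolute form of 'path' into 'out'.
 * Returns 0 on success, -1 if the parent directory cannot be resolved
 * or the result does not fit.
 */
int expand_output_path(const char *path, char *out, size_t out_len)
{
	char parent[PATH_MAX];
	char resolved[PATH_MAX];
	const char *slash = strrchr(path, '/');
	const char *last = slash ? slash + 1 : path;
	size_t parent_len = slash ? (size_t) (slash - path) : 0;
	int n;

	if (parent_len >= sizeof(parent))
		return -1;
	if (!slash) {
		strcpy(parent, ".");	/* bare name: relative to the cwd */
	} else if (parent_len == 0) {
		strcpy(parent, "/");	/* "/name": parent is the root */
	} else {
		memcpy(parent, path, parent_len);
		parent[parent_len] = '\0';
	}

	/* Fails when any directory above the last component is missing. */
	if (!realpath(parent, resolved))
		return -1;

	n = snprintf(out, out_len, "%s/%s",
			strcmp(resolved, "/") == 0 ? "" : resolved, last);
	return (n < 0 || (size_t) n >= out_len) ? -1 : 0;
}
```

With this sketch, './mytraces' expands relative to the client's current
directory, while /tmp/foo/bar/chap is refused when foo/ does not exist.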
Reported-by: Ettore Del Negro <ettore@ettoredelnegro.me> Signed-off-by: David Goulet <dgoulet@efficios.com>
Keep 25% of the file descriptors reserved for commands/kernel
tracing/internal communication within the sessiond by limiting
applications to 75% of the available file descriptors. This ensures
traced applications cannot cause a sessiond denial of service.
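A minimal sketch of the 75%/25% split, assuming hypothetical helper
names (app_fd_budget, current_app_fd_budget) and assuming the total is
taken from the soft RLIMIT_NOFILE limit, which the text above does not
state.

```c
#include <assert.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: number of descriptors applications may use.
 * Rounds down, so the application share never exceeds 75%.
 */
unsigned long app_fd_budget(unsigned long total_fds)
{
	return total_fds / 4 * 3;
}

/* Compute the budget from the current soft limit; 0 on success. */
int current_app_fd_budget(unsigned long *budget)
{
	struct rlimit rlim;

	if (getrlimit(RLIMIT_NOFILE, &rlim) < 0)
		return -1;
	*budget = app_fd_budget((unsigned long) rlim.rlim_cur);
	return 0;
}
```

For a limit of 1024 descriptors, applications get 768 and the remaining
256 stay available to the sessiond.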
Julien Desfossez [Tue, 20 Mar 2012 15:19:14 +0000 (11:19 -0400)]
Fix: lttng view, error message and exit code
lttng view is a frontend command. When the viewer is not found in the
path, the error message should be human readable (no need for the
developer-oriented debug message).
Also the return code must indicate that something went wrong if the
viewer is not on the system.
(fix #144)
Signed-off-by: Julien Desfossez <julien.desfossez@efficios.com> Signed-off-by: David Goulet <dgoulet@ev0ke.net>
The session lock is broken in that it does not handle teardown correctly
(use after free). Surround each usage by the session list lock for now
to fix this issue, and don't unlock the session lock after free. Since
each session lock usage is surrounded by session list lock, no other
thread will be left waiting on this lock when the session destroy is
performed.
This effectively renders the per-session lock useless. Leave it there
for now to minimize code change before 2.0 final.
This locking scheme will be revisited for lttng 2.1.
Acked-by: David Goulet <dgoulet@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
David Goulet [Tue, 13 Mar 2012 15:13:25 +0000 (11:13 -0400)]
Fix double PID registration race
Introduce a second hash table indexed by application socket which has
the exact same content as the hash table indexed by PID.
On unregister, we now use a direct lookup per socket instead of using
the key map between sock and PID. This prevents the PID-sock lookup race
when the unregister happens just after the replace and before the
close(fd).
We also use an add_replace call on application registration for the PID
hash table and keep the add_unique for the socket hash table.
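A toy sketch of the two-table scheme, using small linear arrays in place
of the real hash tables. All names (struct app, app_register,
app_unregister, find_by_pid, find_by_sock) are hypothetical, and storage
is never reused, to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_APPS 8

struct app {
	int pid;
	int sock;
};

static struct app storage[MAX_APPS];
static struct app *by_pid[MAX_APPS];	/* toy table indexed by PID */
static struct app *by_sock[MAX_APPS];	/* toy table indexed by socket */
static int nr_allocated;

struct app *find_by_pid(int pid)
{
	int i;

	for (i = 0; i < MAX_APPS; i++)
		if (by_pid[i] && by_pid[i]->pid == pid)
			return by_pid[i];
	return NULL;
}

struct app *find_by_sock(int sock)
{
	int i;

	for (i = 0; i < MAX_APPS; i++)
		if (by_sock[i] && by_sock[i]->sock == sock)
			return by_sock[i];
	return NULL;
}

/* Store 'app' in the first free slot of 'table'. */
static void put(struct app **table, struct app *app)
{
	int i;

	for (i = 0; i < MAX_APPS; i++) {
		if (!table[i]) {
			table[i] = app;
			return;
		}
	}
}

int app_register(int pid, int sock)
{
	struct app *app;
	int i;

	/* add_unique semantics on the socket table. */
	if (find_by_sock(sock) || nr_allocated == MAX_APPS)
		return -1;
	app = &storage[nr_allocated++];
	app->pid = pid;
	app->sock = sock;

	/* add_replace semantics on the PID table. */
	for (i = 0; i < MAX_APPS; i++) {
		if (by_pid[i] && by_pid[i]->pid == pid) {
			by_pid[i] = app;
			break;
		}
	}
	if (i == MAX_APPS)
		put(by_pid, app);
	put(by_sock, app);
	return 0;
}

int app_unregister(int sock)
{
	struct app *app = NULL;
	int i;

	/* Direct lookup per socket; no sock-to-PID key map involved. */
	for (i = 0; i < MAX_APPS; i++) {
		if (by_sock[i] && by_sock[i]->sock == sock) {
			app = by_sock[i];
			by_sock[i] = NULL;
			break;
		}
	}
	if (!app)
		return -1;
	/* Drop the PID entry only if it still points at this app. */
	for (i = 0; i < MAX_APPS; i++)
		if (by_pid[i] == app)
			by_pid[i] = NULL;
	return 0;
}
```

If a second registration with the same PID replaces the first one, a
late unregister on the old socket removes only the old socket entry and
leaves the new PID entry in place.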
(closes #7)
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: David Goulet <dgoulet@efficios.com>
That file's license was not recognized by licensecheck. I suggest using
the MIT/X11 text which is quite similar, and is already used elsewhere
in the project (in Babeltrace).
Most of the source files had the wrong FSF address. Also reworded the
first paragraph so that licensecheck's regexes actually pick up the
version number.
I made sure lttng.h and lttng-ctl.c advertise LGPL-2.1. Every other file
touched in this patch was and remains under GPL2, please make sure this
is correct. (Some files related to the RCU hashtable are under
LGPL-2.1+, but their headers were already clean).
David Goulet [Tue, 6 Mar 2012 16:16:26 +0000 (11:16 -0500)]
Fix error.h non-static variables for liblttng-ctl
Linking with liblttng-ctl made the variables opt_quiet and opt_verbose
undefined if nonexistent in the linked application.
Rename the variables by adding the prefix lttng_* and declare them in
liblttng-ctl as global variables. The user can now control the verbosity
of the library by simply setting them.
Future work will most likely add an API call to control verbosity.
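A minimal sketch of how an application might use such globals, assuming
the renamed variables are lttng_opt_quiet and lttng_opt_verbose (inferred
from the lttng_* prefix mentioned above) and assuming they are plain
ints. They are defined locally here so the sketch builds on its own; in
a real application they would come from liblttng-ctl. should_print is a
hypothetical helper.

```c
#include <assert.h>

/* Assumed names and types; defined here only so the sketch links alone. */
int lttng_opt_quiet;
int lttng_opt_verbose;

/*
 * Hypothetical helper: decide whether a message should be printed.
 * Quiet mode silences everything; debug messages need verbose mode.
 */
int should_print(int is_debug)
{
	if (lttng_opt_quiet)
		return 0;
	if (is_debug)
		return lttng_opt_verbose > 0;
	return 1;
}
```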
(closes #151)
Signed-off-by: David Goulet <dgoulet@efficios.com>
Fix: Use PERROR all across lttng-tools, never make it quiet
We never want to hide these errors, even in quiet mode. For those
"errors" that are expected and part of the normal operation (e.g. send
consumer channel: Socket operation on non-socket), we will have to
handle the return values and errno explicitly in the code.
David Goulet [Thu, 1 Mar 2012 15:36:06 +0000 (10:36 -0500)]
Fix security permission on lttng run directory
Add the execute flag for other (r+x) on the lttng run directory at
/var/run/lttng so instrumented applications *not* in the tracing group
can register to the global session daemon running as root.
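A minimal sketch of the permission change, assuming a hypothetical
helper name (add_other_rx). The base mode used in the usage note is an
assumption; the commit text only states that "other" gains r+x.

```c
#include <assert.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: grant read and execute (search) to "other". */
mode_t add_other_rx(mode_t mode)
{
	return mode | S_IROTH | S_IXOTH;
}
```

For a directory created with mode 0770, the result is 0775: processes
outside the tracing group can traverse the directory to reach the
registration socket, but cannot create or remove entries in it.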
(refs #141)
Signed-off-by: David Goulet <dgoulet@efficios.com>
Raphaël Beamonte [Mon, 27 Feb 2012 22:37:18 +0000 (17:37 -0500)]
Fix documentation in lttng.h
Some functions in lttng.h are intended to be used not only for the kernel
but also for UST. This patch changes only the references made to 'kernel'
to make the documentation generic in the comments.
(closes #126)
Signed-off-by: David Goulet <dgoulet@efficios.com>