Jonathan Rajotte [Fri, 13 Jan 2017 22:04:42 +0000 (17:04 -0500)]
Man: move [SESSION] before options
The previous synopses for the live mode can confuse users, since
following them leads to an error when trying one of the simplest create
commands for a live session that the synopsis proposes:
lttng create --live test.
Other synopses are modified for symmetry.
Fixes #1081
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Session daemon should not send streams to consumer daemon
repeatedly when CPU hotplug is performed while doing kernel
tracing.
This causes the consumer daemon to have multiple file descriptors
on the same stream, and thus try to perform operations like reading
a sub-buffer and checking for data pending concurrently. This triggers
safety-net warnings in the kernel tracer.
The failure/exit of any of the consumerd, relayd or applications
(in per-PID buffer mode) will cause the metadata closed flag to
be set.
While pushing new metadata updates to the consumerd (and relayd
in streaming/live scenarios) will fail, those conditions should
be handled in-place.
Applications are _expected_ to exit during the course of a per-PID
session. However, they will typically have pushed their metadata
to the metadata cache before doing so. The session daemon must
flush the unconsumed metadata to the consumerd in this case.
Failure to answer the metadata request originating from the
consumerd can cause it to keep the stream lock held and, thus,
prevent the channel poll thread from cleaning up on channel
close.
Fix: consumerd: order of metadata cache vs stream lock
The locking order comment in consumer.h is incorrect. First, its
description of locking order is not in sync with the comment found in
consumer-metadata-cache.h. The comment in struct consumer_metadata_cache
only states that the metadata cache lock nests inside the consumer_data
lock, and does not mention the stream lock, which implies that the
metadata cache lock does NOT nest inside the stream lock. But let's
investigate further to confirm:
* lttng_consumer_read_subbuffer() acquires the stream lock, and then
calls lttng_ustconsumer_read_subbuffer() with stream lock held,
and then invokes commit_one_metadata_packet(), which acquires the
metadata cache lock.
* lttng_ustconsumer_sync_metadata() acquires the metadata stream lock,
and calls commit_one_metadata_packet(), which takes the metadata cache
lock.
Therefore, update the comment in consumer.h to state that the metadata
cache lock nests INSIDE the stream lock, and update
consumer_del_metadata_stream() accordingly.
This should take care of fixing the locking order reversal found by
Coverity.
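The nesting described above can be illustrated with a minimal sketch. The
struct and function names below are hypothetical stand-ins, not the real
consumerd definitions; only the order of acquisition (stream lock first,
metadata cache lock second) is taken from this commit message.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the metadata cache; not the real layout. */
struct metadata_cache_sketch {
	pthread_mutex_t lock;	/* Nests INSIDE the stream lock. */
	unsigned long committed_packets;
};

/* Hypothetical stand-in for a metadata stream. */
struct stream_sketch {
	pthread_mutex_t lock;	/* Outer lock, taken first. */
	struct metadata_cache_sketch *cache;
};

/*
 * Plays the role of commit_one_metadata_packet(): the caller already
 * holds stream->lock, and the cache lock is taken inside it.
 */
static unsigned long commit_one_packet_sketch(struct stream_sketch *stream)
{
	unsigned long committed;

	pthread_mutex_lock(&stream->cache->lock);
	committed = ++stream->cache->committed_packets;
	pthread_mutex_unlock(&stream->cache->lock);
	return committed;
}

/* Plays the role of the read-subbuffer path: stream lock, then cache lock. */
static unsigned long read_subbuffer_sketch(struct stream_sketch *stream)
{
	unsigned long committed;

	pthread_mutex_lock(&stream->lock);
	committed = commit_one_packet_sketch(stream);
	pthread_mutex_unlock(&stream->lock);
	return committed;
}
```

Any other path that needs both locks would have to follow the same order;
taking the cache lock first and the stream lock second is the reversal that
an ORDER_REVERSAL report points at.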
CID 1368314 (#1 of 1): Thread deadlock (ORDER_REVERSAL)
CID 1368319: Program hangs (ORDER_REVERSAL)
Fixes: 5feafd4130 "Fix: protect the channel's metadata stream using the metadata cache lock"
Fixes: 1ea6cc572b "Fix: lock nesting order reversed"
Fixes: fb549e7ac2 "Fix: reverse channel and metadata cache lock nesting order"
Reported-by: Coverity Scan
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Fix: add missing rcu_barrier before daemon teardown
When performing the "cleanup" of sessiond, consumerd, and relayd, we
destroy data structures that may still be concurrently accessed by
the call_rcu worker thread.
Ensure no more work is pending in the call_rcu worker thread by issuing
an rcu_barrier(). Note that this assumes that call_rcu handlers do not
chain work to other call_rcu handlers.
Fix: support for older versions of Babeltrace in test script
A new context field was introduced in version LTTng 2.8 that is printed
by Babeltrace prior to v1.2.5. This regex thus fails to match the
output. Since the context fields are not used by the script, we create a
non-capturing group for these fields that matches on both old and new
Babeltrace.
This is causing problems on Ubuntu 14.04 Trusty when building
lttng-tools from source and using the Babeltrace package from the
official repository (v1.2.1) to run the test suite.
Also, this patch removes commented-out and unused code in the function but
keeps the names of non-capturing groups for readability.
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
CC: Philippe Proulx <pproulx@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Fix: only lock the metadata_cache in userspace consumers
The kernel consumer, which re-uses the consumer_del_metadata_stream
function, has no metadata cache. Therefore, it can't be used to
protect the metadata stream (see 5feafd41).
However, only the userspace consumers invoke
consumer_metadata_cache_write(), which the previous fix sought to
protect against. It is therefore safe to omit this lock in the
kernel consumer case.
Fix: protect the channel's metadata stream using the metadata cache lock
The consumer_thread_data_poll and consumer_thread_metadata_poll
both access the channel's metadata stream.
During a session destruction, consumer_thread_metadata_poll will
destroy all metadata streams. However, the consumer_thread_data_poll
may still invoke a consumer_metadata_cache_write() triggered
by a "ready" subbuffer. Hence, the metadata stream must be protected
from this action by the metadata cache lock.
relayd and consumerd 2.7 and 2.8 are expected to negotiate compatibility
with the lowest common minor version.
If a consumer daemon 2.8 interacts with a relayd 2.7, it needs to send
the index fields for ctf index 1.0. Same if a relayd 2.8 interacts with
a consumer daemon 2.7: relayd should expect ctf index 1.0 fields, and
generate a ctf index 1.0 index file layout.
If both relayd and consumerd versions are 2.8+, then we can send the ctf
index 1.1 fields over the protocol, and store them in the index files.
Whenever the relayd live viewer server opens and reads an index file,
it needs to use the file's header to figure out the index "element"
size.
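The selection rule described above can be sketched as follows. The function
and enum names are hypothetical, and the sketch assumes both peers report
major version 2; it only encodes the rule stated in this message (index 1.1
when both sides are 2.8 or later, index 1.0 otherwise).

```c
#include <assert.h>
#include <stdint.h>

/* CTF index format minor versions named in the text (1.0 and 1.1). */
enum index_minor_sketch {
	INDEX_MINOR_1_0 = 0,
	INDEX_MINOR_1_1 = 1,
};

/* Lowest common minor version between the two peers. */
static uint32_t lowest_common_minor_sketch(uint32_t consumerd_minor,
		uint32_t relayd_minor)
{
	return consumerd_minor < relayd_minor ? consumerd_minor : relayd_minor;
}

/*
 * Assumption: both peers are major version 2. Index 1.1 fields are only
 * used when the negotiated minor version is 8 or later.
 */
static enum index_minor_sketch index_minor_for_peers_sketch(
		uint32_t consumerd_minor, uint32_t relayd_minor)
{
	if (lowest_common_minor_sketch(consumerd_minor, relayd_minor) >= 8) {
		return INDEX_MINOR_1_1;
	}
	return INDEX_MINOR_1_0;
}
```

For example, a 2.8 consumerd talking to a 2.7 relayd yields
INDEX_MINOR_1_0, while 2.8 on both sides yields INDEX_MINOR_1_1.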
[ Should be applied to master, stable-2.9, stable-2.8. ]
Liguang Li [Mon, 28 Nov 2016 08:37:47 +0000 (16:37 +0800)]
Fix: truncate the metadata file in shm-path
In shm-path mode, the metadata is backed up to a metadata
file. When the lttng command "lttng metadata regenerate" is run to
resample the wall time following a major NTP correction, the metadata
file is not truncated and regenerated.
Add the function clear_metadata_file() to truncate and regenerate the
metadata file.
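As a rough illustration of the truncation step only, here is a minimal
sketch. The helper name is hypothetical and this is not the body of
clear_metadata_file(); it assumes the metadata file is already open and
shows one way to empty it and rewind it before new content is written.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: empty an already-open file and move the file
 * offset back to the start so that regenerated content begins at 0.
 * Returns 0 on success, -1 on error.
 */
static int truncate_and_rewind_sketch(int fd)
{
	if (ftruncate(fd, 0) < 0) {
		perror("ftruncate");
		return -1;
	}
	if (lseek(fd, 0, SEEK_SET) < 0) {
		perror("lseek");
		return -1;
	}
	return 0;
}
```

Rewinding matters because ftruncate() does not change the file offset:
without the lseek(), the next write would land at the old offset and leave
a hole at the beginning of the file.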
Signed-off-by: Liguang Li <liguang.li@windriver.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Philippe Proulx [Fri, 28 Oct 2016 22:33:19 +0000 (18:33 -0400)]
doc/man: only require asciidoc-attrs.conf when building the man pages
Situations:
* If you want to and can build the man pages:
* If it's a tarball tree:
* Make the man page destinations depend on asciidoc-attrs.conf.
Since it's a generated file, its date is greater than the
date of the prebuilt man pages, therefore the man pages are
built again, which is a good thing because they include the
default values of this build.
* If it's a Git tree:
* Always build the man pages anyway (no prebuilt man pages here).
* If you want to, but cannot build the man pages:
* If it's a tarball tree:
* Make the man page destinations NOT depend on asciidoc-attrs.conf,
because its recent date would ask said destinations to be rebuilt
and this is not possible because we don't have the tools.
However, warn the user at configure time that the prebuilt man
pages will be installed, which means that they will contain the
project's default values, not this build's default values.
* If it's a Git tree:
* Not valid: error at configure time as usual.
Signed-off-by: Philippe Proulx <eeppeliteloop@gmail.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Fix: stop lttng-relayd threads on health thread error
The lttng-relayd health thread may fail to initialize for
a variety of reasons (notably, an overly long UNIX domain socket
address), which will cause it to never notify that it is
ready.
In such circumstances, the lttng-relayd command, in background or
daemonize mode, will never return as the daemon's "readiness"
will never be signaled.
Issuing fprintf() to stderr (thus write() to the standard error file
descriptor) within the SIGPIPE signal handler is bad: it can trigger
SIGPIPE repeatedly if the listening end has closed its end of the pipe.
Set the SIGPIPE action to SIG_IGN in relayd, sessiond, and consumerd.
This was affecting sessiond and relayd. The consumerd did not print
anything to stderr.
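A minimal sketch of the SIG_IGN approach, using only POSIX calls. The
helper name is hypothetical and this is not the daemons' actual signal
setup code. With SIGPIPE ignored, a write() to a pipe whose read end is
closed fails with EPIPE instead of raising the signal.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: ignore SIGPIPE for the whole process.
 * Returns 0 on success, -1 on error (errno set by sigaction()).
 */
static int ignore_sigpipe_sketch(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = SIG_IGN;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGPIPE, &sa, NULL);
}
```

Callers then handle the broken pipe at the write() call site by checking
for a -1 return with errno set to EPIPE, rather than in a handler.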
Fix: handle backward compatibility with lttng-modules 2.7
There is no major version bump between the lttng-modules 2.7 and 2.8 ABIs.
Even though we do not guarantee compatibility, do a best effort to
maintain it when possible.
Tests: tap.sh spams tests' output when no plan is set
Some tests are implemented in C (using tap.h) or in Python
and don't use tap.sh's facilities. However, it is sourced
by utils.sh and prints an error message during its clean-up
because a plan was never set.
Use -M parameter instead of --manpath when invoking man(1)
Older versions of man (and the implementation used in FreeBSD) do
not support the long version of the --manpath/-M option. Use
'-M' in the interest of portability.
Fix: Mark ASCIIDOC_ATTRS_CONF as a dependency of man page targets
ASCIIDOC_ATTRS_CONF contains the various paths set by autoconf,
such as datadir, sysconfdir and prefix, and it may be changed
by the user by invoking ./configure with different options. In
such a case, the man pages should be regenerated to take the new
paths into account.
Fix: validate number of subbuffers after tweaking properties
There are properties that are tweaked by each of ust and kernel channel
create functions after a validation on the number of subbuffers for
overwrite channels. Move the validation after those property
modifications.
The ht_cleanup thread is shut down before the queue of rcu
callbacks is emptied by the rcu_barrier(). Since callbacks added
by call_rcu can push hash tables through the ht_cleanup pipe, we run
into cases where the clean-up thread has been shut down and
hash tables pushed through the clean-up pipe are leaked.
For channels configured with large sub-buffer size, the relayd copies
the entire trace sub-buffer (trace packet) into a large buffer, and then
copies the large buffer to disk. This is inefficient from a cache
locality standpoint.
Use a 64k buffer on the stack instead, and move the data piece-wise.
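The piece-wise approach can be sketched as below. The function names are
hypothetical and this is not the relayd code; it only shows the general
technique of moving a payload of known length between two file descriptors
through a fixed 64k stack buffer.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define COPY_CHUNK_SIZE_SKETCH 65536	/* 64k, as stated in the text. */

/* Write exactly len bytes, retrying on short writes and EINTR. */
static int write_all_sketch(int fd, const char *buf, size_t len)
{
	while (len > 0) {
		ssize_t ret = write(fd, buf, len);

		if (ret < 0) {
			if (errno == EINTR) {
				continue;
			}
			return -1;
		}
		buf += ret;
		len -= (size_t) ret;
	}
	return 0;
}

/*
 * Move len bytes from in_fd to out_fd in chunks of at most 64k.
 * Returns 0 on success, -1 on error or premature end of input.
 */
static int copy_piecewise_sketch(int in_fd, int out_fd, size_t len)
{
	char buf[COPY_CHUNK_SIZE_SKETCH];

	while (len > 0) {
		size_t want = len < sizeof(buf) ? len : sizeof(buf);
		ssize_t got = read(in_fd, buf, want);

		if (got < 0) {
			if (errno == EINTR) {
				continue;
			}
			return -1;
		}
		if (got == 0) {
			return -1;	/* Input ended before len bytes. */
		}
		if (write_all_sketch(out_fd, buf, (size_t) got) < 0) {
			return -1;
		}
		len -= (size_t) got;
	}
	return 0;
}
```

The working set stays bounded by the chunk size regardless of the
sub-buffer size, and no heap allocation proportional to the packet is
needed.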
Fix: reduce scope of kconsumer consumed_pos and produced_pos
The consumed_pos and produced_pos accesses are protected by the
stream mutex, which is fine as-is. However, consumed_pos is
passed to consumer_get_consume_start_pos() and is flagged by
Coverity as a possible use of a "stale" consumed_pos.
From an analyzer's standpoint, this makes sense since
both lttng_kconsumer_get_produced_snapshot() and
lttng_kconsumer_get_consumed_snapshot() could leave their output
parameter uninitialized and return 0 since they both assume that
ioctl() will set errno if ret != 0.
IOCTL(3P) specifies that errno is only set if ret < 0.
A bug in lttng-modules could cause ioctl() to return a positive
value, leaving the errno variable unset. In such a case,
both functions would return 0, leaving the positions uninitialized.
A follow-up fix enforces this assumption (ret never > 0) as part
of the kernctl API.
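The difference between the two checks can be sketched as follows. Both
function names are hypothetical, and mapping an unexpected positive value
to -EINVAL is an assumption made for illustration; the actual kernctl
handling may differ.

```c
#include <assert.h>
#include <errno.h>

/*
 * Problematic pattern described above: treats any non-zero result as
 * "errno is set". With a positive raw_ret and errno left at 0, this
 * reports success (0) even though the output was never written.
 */
static int ret_from_nonzero_check_sketch(int raw_ret, int saved_errno)
{
	return raw_ret != 0 ? -saved_errno : 0;
}

/*
 * Stricter pattern: errno is only trusted when raw_ret < 0, and a
 * positive value is rejected explicitly.
 * Assumption: -EINVAL is used here as the error for that case.
 */
static int ret_from_strict_check_sketch(int raw_ret, int saved_errno)
{
	if (raw_ret < 0) {
		return -saved_errno;
	}
	if (raw_ret > 0) {
		return -EINVAL;
	}
	return 0;
}
```

With raw_ret == 1 and errno == 0, the first function returns 0 while the
second returns -EINVAL, so the caller never reads an uninitialized
position.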
Jonathan Rajotte [Thu, 26 May 2016 22:14:37 +0000 (18:14 -0400)]
Fix: set the logger level to prevent unexpected level inheritance
BSF and other jars can ship with an embedded log4j.properties
file. This causes problems when launching an application with a general
class path (e.g. /usr/share/java/*), since log4j will look for a
configuration file in all loaded jars. If any contains a directive for
the root logger, it will affect any logger with no level that is
directly under the root logger. This can result in unexpected
behaviour (e.g. no events triggered).
Link: https://issues.apache.org/jira/browse/BSF-24
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Cleanup error.h __lttng_print() used for message printing
The loglevels have never really been a mask, and it is useless to try to
use them as masks, because the compiler statically knows the value of
the loglevel requested, and can therefore optimise away all the logic.
This takes care of Coverity warning about mixed bitwise and boolean
logic, which was technically correct, but more complex than needed.