Fix: protect the channel's metadata stream using the metadata cache lock
The consumer_thread_data_poll and consumer_thread_metadata_poll
both access the channel's metadata stream.
During a session destruction, consumer_thread_metadata_poll will
destroy all metadata streams. However, the consumer_thread_data_poll
may still invoke a consumer_metadata_cache_write() triggered
by a "ready" subbuffer. Hence, the metadata stream must be protected
from this action by the metadata cache lock.
relay and consumerd 2.7 and 2.8 are expected to negociate compatibility
with the lowest common minor version.
If a consumer daemon 2.8 interacts with a relayd 2.7, it needs to send
the index fields for ctf index 1.0. Same if a relayd 2.8 interacts with
a consumer daemon 2.7: relayd should expect ctf index 1.0 fields, and
generate a ctf index 1.0 index file layout.
If both relayd and consumerd versions are 2.8+, then we can send the ctf
index 1.1 fields over the protocol, and store them in the index files.
Whenever the relayd live viewer server opens and reads an index file,
it needs to use the file's header to figure out the index "element"
size.
[ Should be applied to master, stable-2.9, stable-2.8. ]
Liguang Li [Mon, 28 Nov 2016 08:37:47 +0000 (16:37 +0800)]
Fix: truncate the metadata file in shm-path
In the shm-path mode, the metadata will be backuped to a metadata
file, when run the lttng command "lttng metadata regenerate" to
resample the wall time following a major NTP correction, the metadata
file will not be truncated and regenerated.
Add the function clear_metadata_file() to truncate and regenerate the
metadata file.
Signed-off-by: Liguang Li <liguang.li@windriver.com> Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Philippe Proulx [Fri, 28 Oct 2016 22:33:19 +0000 (18:33 -0400)]
doc/man: only require asciidoc-attrs.conf when building the man pages
Situations:
* If you want to and can build the man pages:
* If it's a tarball tree:
* Make the man page destinations depend on asciidoc-attrs.conf.
Since it's a generated file, its date is greater than the
date of the prebuilt man pages, therefore the man pages are
built again, which is a good thing because they include the
default values of this build.
* If it's a Git tree:
* Always build the man pages anyway (no prebuilt man pages here).
* If you want to, but cannot build the man pages:
* If it's a tarball tree:
* Make the man page destinations NOT depend on asciidoc-attrs.conf,
because its recent date would ask said destinations to be rebuilt
and this is not possible because we don't have the tools.
However, warn the user at configure time that the prebuilt man
pages will be installed, which means that they will contain the
project's default values, not this build's default values.
* If it's a Git tree:
* Not valid: error at configure time as usual.
Signed-off-by: Philippe Proulx <eeppeliteloop@gmail.com> Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Fix: stop lttng-relayd threads on health thread error
The lttng-relayd health thread may fail to initialize for
a variety of reason (notably, a too long unix domain socket
address), which will cause it to never notify that it is
ready.
In such circumstances, the lttng-relayd command, in background or
daemonize mode, will never return as the daemon's "readyness"
will never be signaled.
Issuing fprintf() to stderr (thus write() to the standard error file
descriptor) within the SIGPIPE signal handler is bad: it can trigger
SIGPIPE repeatedly if the listening end has closed its end of the pipe.
Set the SIGPIPE action to SIG_IGN in relayd, sessiond, and consumerd.
This was affecting sessiond and relayd. The consumerd did not print
anything to stderr.
Fix: handle backward compatibility with lttng-modules 2.7
There is no major version bump between lttng-module 2.7 and 2.8 ABI.
Even though we do not guarantee compatibility, do a best effort to
maintain it when possible.
Tests: tap.sh spams tests' output when no plan is set
Some tests are implemented in C (using tap.h) or in Python
and don't use tap.sh's facilities. However, it is sourced
by utils.sh and prints an error message during its clean-up
because a plan was never set.
Use -M parameter instead of --manpath when invoking man(1)
Older versions of man (and the implementation used in FreeBSD) do
not support the long version of the --manpath/-M option. Use
'-M' in the interest of portability.
Fix: Mark ASCIIDOC_ATTRS_CONF as a dependency of man page targets
ASCIIDOC_ATTRS_CONF contains the various paths set by autoconf,
such as datadir, syscondif and prefix, and it may be changed
by the user by invoking ./configure with different options. In
such a case, the man pages should be regenerated to take the new
paths into account.
Fix: validate number of subbuffers after tweaking properties
There are properties that are tweaked by each of ust and kernel channel
create functions after a validation on the number of subbuffers for
overwrite channels. Move validation after those properties
modifications.
The ht_cleanup thread is shut down before the queue of rcu
callbacks is emptied by the rcu_barrier(). Since callbacks added
by call_rcu can push hash tables through the ht_cleanup pipe, we run
into cases where the clean-up thread has been shutdown and
hash tables pushed through the clean-up pipe are leaked.
For channels configured with large sub-buffer size, the relayd copies
the entire trace sub-buffer (trace packet) into a large buffer, and then
copies the large buffer to disk. It is inefficient from a point of view
of cache locality.
Use a 64k buffer on the stack instead, and move the data piece-wise.
Fix: reduce scope of kconsumer consumed_pos and produced_pos
The consumed_pos and produced_pos accesses are protected by the
stream mutex, which is fine as-is. However, consumed_pos is
passed to consumer_get_consume_start_pos() and is flagged by
Coverity as a possible use of a "stale" consumed_pos.
From an analyzer's standpoint, this makes sense since
both lttng_kconsumer_get_produced_snapshot() and
lttng_kconsumer_get_consumed_snapshot() could leave their output
parameter uninitialized and return 0 since they both assume that
ioctl() will set errno if ret != 0.
IOCTL(3P) specifies that errno is only set if ret < 0.
A bug in lttng-modules could cause ioctl() to return a positive
value, leaving the errno variable unset. In such a case,
both functions would return 0, leaving the positions uninitialized.
A follow-up fix enforces this assumption (ret never > 0) as part
of the kernctl API.
Jonathan Rajotte [Thu, 26 May 2016 22:14:37 +0000 (18:14 -0400)]
Fix: set the logger level to prevent unexpected level inheritance
BSF and other jars can ship with an embedded log4j.properties
file. This causes problem when launching an application with a general
class path (e.g /usr/share/java/*) since log4j will look for a
configuration file in all loaded jars. If any contains a directive for
the root logger, it will affect any logger with no level that are
directly under the root logger. This can result in an unexpected
behaviour (e.g no events triggered etc.).
Link: https://issues.apache.org/jira/browse/BSF-24 Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com> Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Cleanup error.h __lttng_print() used for message printing
The loglevels have never really been a mask, and it is useless to try to
use them as masks, because the compiler statically knows the value of
the loglevel requested, and can therefore optimise away all the logic.
This takes care of Coverity warning about mixed bitwise and boolean
logic, which was technically correct, but more complex than needed.
Philippe Proulx [Tue, 17 May 2016 23:30:39 +0000 (19:30 -0400)]
doc/man: put AsciiDoc attributes in their own file
This facilitates the generation of man pages using another
asciidoc.conf file, but keeping the same attributes, without
having to split the generated configuration file.
Signed-off-by: Philippe Proulx <eeppeliteloop@gmail.com> Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
The new environment variable LTTNG_ABORT_ON_ERROR allows each
lttng-tools program to call abort() on PERROR() and ERR() after the
error message has been printed to stderr.
Fix: ust-consumer: flush empty packets on snapshot channel
Snapshot operation on a non-stopped stream should use a "final" flush to
ensure empty packets are flushed, so we gather timestamps at the moment
where the snapshot is taken. This is important for streams that have a
low amount of activity, which might be on an empty packet when the
snapshot is triggered.
PRINT_ERR maps to 0x1, PRINT_WARN maps to 0x2, which is fine so far to
use as masks, but PRINT_BUG maps to 0x3, which is the same as both
PRINT_ERR and PRINT_WARN, and does not make sense to use in masks with
__lttng_print:
(type & (PRINT_WARN | PRINT_ERR | PRINT_BUG))
Fix this by ensuring PRINT_BUG has its own mask, and express all
constants as shifts to eliminate the risk of re-introducing a similar
bug in the future.
We should flush the last packet after stop, not before. Otherwise, we
may end up with events written immediately after the flush, which
defeats the purpose of flushing.
Fix: UST should not generate packet at destroy after stop
In the following scenario:
- create, enable events (ust),
- start
- ...
- stop (await for data_pending to complete)
- destroy
- rm the trace directory
We would expect that the "rm" operation would not conflict with the
consumer daemon trying to output data into the trace files, since the
"stop" operation ensured that there was no data_pending.
However, the "destroy" operation currently generates an extra packet
after the data_pending check (the "on_stream_hangup"). This causes the
consumer daemon to try to perform trace file rotation concurrently with
the trace directory removal in the scenario above, which triggers
errors. The main reason why this empty packet is generated by "destroy"
is to deal with trace start/stop scenario which would otherwise generate
a completely empty stream.
Therefore, introduce the concept of a "quiescent stream". It is
initialized at false on stream creation (first packet is empty). When
tracing is started, it is set to false (for cases of start/stop/start).
When tracing is stopped, if the stream is not quiescent, perform a
"final" flush (which will generate an empty packet if the current packet
was empty), and set quiescent to true. On "destroy" stream and on
application hangup: if the stream is not quiescent, perform a "final"
flush, and set the quiescent state to true.
The test case for '*', which enables all events, is flaky by its
nature since buffers may be filled by other kernel events preventing
the test script from finding the test event (it is often discarded).
Fix: bad file descriptors on close after rotation error
Ensure we don't try to close output stream file descriptors twice when a
trace file rotation error occurs (once at tracefile rotation, once when
closing the stream). Set the fd value to -1 after the first close to
ensure we don't try to close it again.
Michael Jeanson [Wed, 18 May 2016 19:43:06 +0000 (15:43 -0400)]
Fix: merge tap tests stdout and stderr
This makes the output and error statement ordered in the log
file and ensure that the first line is the tap test plan. Some tap
parser are confused if the test plan is not on the first line.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>