Accessing the lttng channel and lttng session from the metadata ring
buffer client is a bad idea, because we don't have any reference
ensuring those are valid for the lifetime of the metadata cache.
Therefore, rather than keeping an lttng channel as the private data pointer
for the metadata ring buffer channel, keep a pointer to the metadata
cache instead: this ensures we don't shoot ourselves in the foot by
accessing data for which coherency is not guaranteed (we don't hold any
reference to it).
Anyway, the only reason why we needed to access the lttng session from
the metadata client in the first place was the UUID of the session. Copy
it into the metadata cache instead.
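A minimal userspace sketch of this ownership change follows. All struct and function names are hypothetical stand-ins (not the real lttng-modules types), and the 16-byte UUID length is an assumption. The ring buffer channel's private pointer refers to the metadata cache, and the cache holds its own copy of the session UUID, so the client never dereferences the session.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_UUID_LEN 16 /* assumption: 16-byte session UUID */

/* Hypothetical stand-ins for the real session/cache/channel structures. */
struct sketch_session {
	unsigned char uuid[SKETCH_UUID_LEN];
};

struct sketch_metadata_cache {
	char *data;                          /* serialized metadata */
	size_t metadata_written;             /* bytes of metadata in the cache */
	unsigned char uuid[SKETCH_UUID_LEN]; /* private copy of the session UUID */
};

struct sketch_rb_channel {
	void *priv; /* points to the metadata cache, not to an lttng channel */
};

/* Copy the UUID once, while the session is known to be alive. */
static void sketch_cache_init(struct sketch_metadata_cache *cache,
			      const struct sketch_session *session)
{
	memset(cache, 0, sizeof(*cache));
	memcpy(cache->uuid, session->uuid, SKETCH_UUID_LEN);
}

/* The metadata client only reads data owned by the cache. */
static const unsigned char *sketch_client_uuid(const struct sketch_rb_channel *chan)
{
	const struct sketch_metadata_cache *cache = chan->priv;

	return cache->uuid;
}
```

Because the UUID is copied rather than referenced, later changes to (or teardown of) the session object cannot affect what the client reads.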
Fix: handle concurrent flush vs get_next_subbuf on metadata cache
The "flush" operation can be performed on the metadata file descriptor
concurrently with get_next_subbuffer operations by different processes
(e.g. lttng session daemon issuing flush at "stop" concurrently with
consumer daemon issuing get_next_subbuf for metadata I/O). We need
to protect the metadata cache from such concurrent use by introducing a
mutex.
This fixes a race where metadata from a kernel trace is corrupted due to
this scenario. The corruption shows up as the same metadata content (a
metadata packet) being written twice consecutively in the metadata
stream, thus triggering a babeltrace "parse error" when trying to read
the trace.
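A simplified userspace model of the fix is shown below, assuming a hypothetical cache layout with a "consumed" position. In the kernel the lock would be a `struct mutex` taken with `mutex_lock()`/`mutex_unlock()`; this sketch uses a pthread mutex instead. Reading the consumed position and advancing it happen under one lock, so a flush racing with get_next_subbuf cannot copy the same range twice.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical, simplified metadata cache guarded by a mutex. */
struct sketch_metadata_cache {
	pthread_mutex_t lock;
	const char *data;        /* serialized metadata */
	size_t metadata_written; /* bytes available in the cache */
	size_t consumed;         /* bytes already copied into the stream */
};

/*
 * Copy up to 'len' not-yet-consumed bytes into 'dst'. Both the flush path
 * and the get_next_subbuf path would call this; holding the mutex makes
 * "read consumed, copy, advance consumed" a single critical section.
 */
static size_t sketch_cache_copy_next(struct sketch_metadata_cache *cache,
				     char *dst, size_t len)
{
	size_t avail, n;

	pthread_mutex_lock(&cache->lock);
	avail = cache->metadata_written - cache->consumed;
	n = avail < len ? avail : len;
	memcpy(dst, cache->data + cache->consumed, n);
	cache->consumed += n;
	pthread_mutex_unlock(&cache->lock);
	return n;
}
```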
Fix: statedump: check whether "files" is NULL, RCU semantic fix
We need to check if p->files is NULL before passing it to
files_fdtable(). Moreover, since the fdt is now protected by RCU, we
have to assume it can change between the read from
lttng_enumerate_task_fd() and the internal in-kernel read in
iterate_fd(). Therefore, move this RCU dereference into
lttng_dump_one_fd(), and perform the appropriate checks on max fds.
lttng_enumerate_file_descriptors should check the pointer returned by
__get_free_page() (check if NULL).
do_lttng_statedump should check the sub-function return values. For
lttng_enumerate_block_devices(), we allow -ENOSYS to continue (if not
implemented).
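The return-value policy described above can be sketched as follows. The function-pointer parameters are hypothetical stand-ins for lttng_enumerate_file_descriptors() and lttng_enumerate_block_devices(); only the error-handling shape is illustrated. Any failure is propagated, except -ENOSYS from the block device step, which is treated as "not implemented, continue".

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical signature for one statedump enumeration step. */
typedef int (*sketch_enum_fn)(void);

static int sketch_do_statedump(sketch_enum_fn enum_fds,
			       sketch_enum_fn enum_block_devices)
{
	int ret;

	ret = enum_fds();
	if (ret)
		return ret; /* e.g. -ENOMEM if the page allocation failed */

	ret = enum_block_devices();
	if (ret == -ENOSYS)
		ret = 0;    /* not implemented on this kernel: keep going */
	if (ret)
		return ret;

	return 0;
}

/* Stub steps used to exercise the policy. */
static int sketch_ok(void) { return 0; }
static int sketch_nosys(void) { return -ENOSYS; }
static int sketch_nomem(void) { return -ENOMEM; }
```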
Antoine Busque [Wed, 11 Jun 2014 14:45:44 +0000 (10:45 -0400)]
Fix: correct typo in kernel version number
A typo in a preprocessor conditional checking which version range the
current kernel falls into caused a build failure for modules on 3.5.0
specifically: the modules tried to use the new API for block_rq_complete,
which has not been backported from the 3.15 branch to the 3.5 branch.
Signed-off-by: Antoine Busque <antoine.busque@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
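A small model of this kind of version-range test is shown below. The macro mirrors the encoding of the kernel's KERNEL_VERSION(); the helper names are hypothetical, and only the mainline 3.15 boundary is shown. The actual stable-backport ranges checked by lttng-modules are not reproduced, and the exact typo is not stated above. The sketch only shows that a single wrong digit in one bound is enough to pull an unrelated version such as 3.5.0 into the "new API" range.

```c
#include <assert.h>

/* Same encoding as the kernel's KERNEL_VERSION() macro. */
#define SKETCH_KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Nonzero if lo <= code < hi. */
static int sketch_version_in_range(int code, int lo, int hi)
{
	return code >= lo && code < hi;
}

/* Hypothetical: use the new block_rq_complete API from 3.15 onward only. */
static int sketch_use_new_block_rq_complete(int code)
{
	return code >= SKETCH_KERNEL_VERSION(3, 15, 0);
}
```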
Julien Desfossez [Fri, 23 May 2014 13:06:07 +0000 (09:06 -0400)]
Add TID field to some block_* events
Most of the block events have the "comm" field, but we have no way to
match a block event to a certain thread, which makes performing
accurate per-thread analyses difficult. Add the TID field to the events
that already export current->comm.
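To illustrate why the TID matters for per-thread analysis, here is a sketch over hypothetical decoded events. The struct layout and the nr_sector field are assumptions for illustration, not the trace's actual field set. Threads of one process often share the same comm, so only the TID can separate their I/O.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical decoded block event carrying the new TID field. */
struct sketch_block_event {
	int tid;
	char comm[16];
	unsigned int nr_sector;
};

/* Sum the sectors requested by one thread, keyed on TID rather than comm. */
static unsigned long sketch_sectors_for_tid(const struct sketch_block_event *ev,
					    size_t n, int tid)
{
	unsigned long total = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (ev[i].tid == tid)
			total += ev[i].nr_sector;
	return total;
}
```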
Within lttng-modules instrumentation starting from kernel 3.9.0, the
block_bio_queue instrumentation has an incorrect rwbs field type, and does
not print "comm" as an array of text.
The result is that we are writing values to what we believe to be an
"unsigned int", which is actually described as an array of RWBS_LEN bytes
(8 bytes). This discrepancy between type description and the actual
tp_assign() incrementing the write offset leads to what appears as
corruption of the following "comm" field in the trace viewer output: the
viewer will skip the first bytes of the "comm" field, erroneously
thinking they belong to the previous "rwbs" field.
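The following userspace model reproduces the symptom. The writer advances the offset by sizeof(uint32_t) for rwbs, while a reader trusting the metadata expects an 8-byte rwbs. All names are hypothetical. With the mismatched description, the reader swallows the first 4 bytes of "comm".

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_COMM_LEN 16

/* Writer: a 4-byte unsigned int for rwbs, then comm. */
static size_t sketch_write_event(unsigned char *buf, uint32_t rwbs, const char *comm)
{
	memcpy(buf, &rwbs, sizeof(rwbs));
	memset(buf + sizeof(rwbs), 0, SKETCH_COMM_LEN);
	strncpy((char *)buf + sizeof(rwbs), comm, SKETCH_COMM_LEN - 1);
	return sizeof(rwbs) + SKETCH_COMM_LEN;
}

/*
 * Reader: trusts the metadata and looks for comm right after an rwbs field
 * of 'declared_rwbs_size' bytes. 'out' must hold SKETCH_COMM_LEN + 1 bytes.
 */
static void sketch_read_comm(const unsigned char *buf, size_t buf_len,
			     size_t declared_rwbs_size, char *out)
{
	size_t n = 0;

	if (declared_rwbs_size < buf_len)
		n = buf_len - declared_rwbs_size;
	if (n > SKETCH_COMM_LEN)
		n = SKETCH_COMM_LEN;
	memcpy(out, buf + declared_rwbs_size, n);
	out[n] = '\0';
}
```

Reading with a declared size of 4 recovers "kworker/0:1"; reading with the declared 8 (RWBS_LEN) yields "ker/0:1".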
lib_ring_buffer_write_commit_counter()'s 'buf_offset' argument should
contain the offset of the beginning of the area used by the record being
committed. However, lib_ring_buffer_commit() passes ctx->buf_offset, which
gets advanced by lib_ring_buffer_write() and thus points just past the
end of the record at lib_ring_buffer_commit() time. This causes
lib_ring_buffer_write_commit_counter() to return without changing
commit_hot[idx].seq, due to
	if (unlikely(subbuf_offset(offset - commit_count, chan)))
		return;
Since the after-crash data extraction tool checks the 'seq' field to find
out how much data is in the buffer, this results in unavailability of
data from partially-filled subbuffers for after-crash analysis.
This patch modifies lib_ring_buffer_write_commit_counter() and all its
callers to pass and expect the end of the area, so the code works as it
should and complete information becomes visible in the crash dump.
[ Changelog inspired from Nikita Yushchenko's original patch. ]
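The simplified model below shows why the offset convention matters. It assumes a power-of-two sub-buffer size and a commit count measured from the sub-buffer's aligned start. The names are hypothetical, and this is not the lib ring buffer code. The `subbuf_offset(offset - commit_count)` test only passes, and 'seq' is only updated, when the caller passes the offset the callee expects (here, the end of the record).

```c
#include <assert.h>
#include <stddef.h>

/* Offset within a power-of-two sized sub-buffer. */
static size_t sketch_subbuf_offset(size_t offset, size_t subbuf_size)
{
	return offset & (subbuf_size - 1);
}

/* Publish commit_count into *seq only if 'end_offset' minus the committed
 * byte count lands exactly on a sub-buffer boundary. */
static void sketch_write_commit_counter(size_t *seq, size_t end_offset,
					size_t commit_count, size_t subbuf_size)
{
	if (sketch_subbuf_offset(end_offset - commit_count, subbuf_size))
		return; /* convention mismatch: seq is left untouched */
	*seq = commit_count;
}
```

For a 100-byte record starting at offset 8192 in 4096-byte sub-buffers, passing the end (8292) updates 'seq' to 100, while passing the beginning (8192) returns early.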
show_type() is only used in TP_printk(), which is not used by LTTng
modules. Moreover, it is already defined by the in-kernel
instrumentation. Therefore, we can remove it from the lttng
instrumentation.
Fix: module instrumentation: update to 3.15 kernel
Remove show_module_flags() define from lttng module.h instrumentation.
It is already defined within the in-kernel module.h, _and_ LTTng does
not use TP_printk.
Add a user-space ABI (new file /proc/lttng-logger) to lttng-modules
which can be written into by any user on the system. The content is
saved into the kernel trace stream into the "lttng_logger" kernel event.
The content of a single write is written into an lttng_logger event.
The write count is truncated to 1024 bytes (if larger), which is much
smaller than the smallest subbuffer size available (4096 bytes). This
ensures all written data makes it into the active tracing buffers.
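Here is a userspace sketch of writing to this ABI, together with a helper that mirrors the 1024-byte truncation rule. The helper names are hypothetical. Each write() call becomes one lttng_logger event. What the kernel returns from an oversized write is not stated above, so the sketch simply returns whatever write() returns.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define SKETCH_LOGGER_MAX 1024 /* truncation limit described above */

/* Mirrors the truncation rule: at most 1024 bytes of payload per write. */
static size_t sketch_logger_payload_len(size_t count)
{
	return count > SKETCH_LOGGER_MAX ? SKETCH_LOGGER_MAX : count;
}

/* Write one message as one event. Returns write()'s result, or -1 if the
 * file cannot be opened (e.g. lttng-modules is not loaded). */
static ssize_t sketch_log(const char *path, const char *msg)
{
	ssize_t ret;
	int fd = open(path, O_WRONLY);

	if (fd < 0)
		return -1;
	ret = write(fd, msg, strlen(msg));
	close(fd);
	return ret;
}
```

Typical use: `sketch_log("/proc/lttng-logger", "checkpoint reached\n");`.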
Don't use the ring buffer client's struct lttng_channel from ioctls that
apply to ring buffer streams, because lttng_channel is freed while the lib
ring buffer stream and channel are still in use. Their lifetime persists
until the consumer daemon releases its handles on the related stream
file descriptors.
"Introduce API to remap event names exposed by LTTng"
failed to map the event names enabled by the user to tracepoint names
known to the kernel. For instance, tracing with the kmem_kmalloc event
enabled is not gathering any event. This issue applies to all tracepoint
events declared with a different name within LTTng than within the Linux
kernel.
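The sketch below illustrates the required name resolution. The table and function are hypothetical, and the kmem_kmalloc to kmalloc pair is given for illustration. Before attaching a probe, the LTTng event name enabled by the user must be translated into the tracepoint name known to the kernel. Names without an entry are identical on both sides.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical remapping table: LTTng event name -> kernel tracepoint name. */
struct sketch_event_map {
	const char *lttng_name;
	const char *kernel_name;
};

static const struct sketch_event_map sketch_map[] = {
	{ "kmem_kmalloc", "kmalloc" }, /* illustrative entry */
};

/* Resolve the kernel tracepoint to attach to for a user-enabled event. */
static const char *sketch_kernel_tracepoint(const char *lttng_name)
{
	size_t i;

	for (i = 0; i < sizeof(sketch_map) / sizeof(sketch_map[0]); i++)
		if (strcmp(sketch_map[i].lttng_name, lttng_name) == 0)
			return sketch_map[i].kernel_name;
	return lttng_name; /* same name in LTTng and in the kernel */
}
```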
It should use lib_ring_buffer_read_offset_address() to get the packet
being read, rather than lib_ring_buffer_offset_address(), which is only
meant to be used when writing to the packet.
By using the timestamp sampled at space reservation when the packet is
being filled as "end timestamp" for a packet, we can ensure there is no
overlap between packet timestamp ranges, so that a packet's timestamp end
is <= the following packet's timestamp begin.
Overlap between consecutive packets becomes an issue when the end
timestamp of a packet is greater than the end timestamp of a following
packet; in other words, a packet completely contains the timestamp range
of a following packet. This kind of situation does not allow trace viewers
to do a binary search within the packet timestamps. This situation
will typically not occur if packets are significantly larger than the
event size, but this fix ensures it cannot happen even in theory.
The only case where packets can still theoretically overlap is if they
have equal begin and end timestamps, which is valid.
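A sketch of the viewer-side benefit follows, over a hypothetical per-packet index. If every packet satisfies end(i) <= begin(i + 1) (and begin <= end), end timestamps are non-decreasing. A binary search for "the first packet whose end >= ts" is then valid. With a containing packet such as {0, 50} followed by {10, 20}, that ordering breaks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-packet index entry. */
struct sketch_packet {
	uint64_t ts_begin, ts_end;
};

/* Nonzero if end(i) <= begin(i + 1) for every consecutive pair. */
static int sketch_no_overlap(const struct sketch_packet *p, size_t n)
{
	size_t i;

	for (i = 0; i + 1 < n; i++)
		if (p[i].ts_end > p[i + 1].ts_begin)
			return 0;
	return 1;
}

/* Index of the first packet with ts_end >= ts, or n if none.
 * Requires ts_end to be non-decreasing, which no-overlap guarantees. */
static size_t sketch_find_packet(const struct sketch_packet *p, size_t n, uint64_t ts)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (p[mid].ts_end < ts)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}
```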
lttng-statedump-impl: Use generic hard irqs for Linux >= 3.12
Quoting the original patch changelog from Otavio Salvador:
> The Linux kernel 3.12 uses the generic hard irqs system for all
> architectures and dropped the GENERIC_HARDIRQ option, as can be seen
> at the commit quoted below:
>
> ,----
> | commit 0244ad004a54e39308d495fee0a2e637f8b5c317
> | Author: Martin Schwidefsky <schwidefsky@de.ibm.com>
> | Date: Fri Aug 30 09:39:53 2013 +0200
> |
> | Remove GENERIC_HARDIRQ config option
> |
> | After the last architecture switched to generic hard irqs the config
> | options HAVE_GENERIC_HARDIRQS & GENERIC_HARDIRQS and the related code
> | for !CONFIG_GENERIC_HARDIRQS can be removed.
> |
> | Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
> `----
Introduce wrapper/irq.h to move the feature availability testing logic
into a specific wrapper header. It now tests if the kernel version is
>= 3.12 or if CONFIG_GENERIC_HARDIRQS is defined (for older kernels).
Introduce the lttng-specific CONFIG_LTTNG_HAS_LIST_IRQ to track
availability of this feature within LTTng.
Reported-by: Philippe Mangaud <r49081@freescale.com>
Reported-by: Otavio Salvador <otavio@ossystems.com.br>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
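The feature test described above can be sketched as a self-contained preprocessor block. SKETCH_LINUX_VERSION_CODE is an assumption standing in for LINUX_VERSION_CODE from <linux/version.h>, pinned to 3.12.0 so the example compiles in userspace. The condition has the same shape as the one described: a kernel >= 3.12, or an older kernel that defines CONFIG_GENERIC_HARDIRQS.

```c
#include <assert.h>

#define SKETCH_KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Assumption: pretend we build against 3.12.0. */
#define SKETCH_LINUX_VERSION_CODE SKETCH_KERNEL_VERSION(3, 12, 0)

#if (SKETCH_LINUX_VERSION_CODE >= SKETCH_KERNEL_VERSION(3, 12, 0) \
	|| defined(CONFIG_GENERIC_HARDIRQS))
#define CONFIG_LTTNG_HAS_LIST_IRQ
#endif

/* Code elsewhere tests the LTTng-specific symbol, not the kernel one. */
static int sketch_has_list_irq(void)
{
#ifdef CONFIG_LTTNG_HAS_LIST_IRQ
	return 1;
#else
	return 0;
#endif
}
```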
Import fix from LTSI: 3.4+ RT kernels use CONFIG_PREEMPT_RT_FULL
Initial LTSI commit:
From: Paul Gortmaker <paul.gortmaker@windriver.com>
> fix reference to obsolete RT Kconfig variable.
>
> The preempt-rt patches no longer use CONFIG_PREEMPT_RT in
> the 3.4 (and newer) versions. So even though LTSI doesn't
> include RT, having this define present can lead to an easy
> to overlook bug for anyone who does try to layer RT onto
> the LTSI baseline.
>
> Update it to use the currently used define name by RT.
>
> Reported-by: Jim Somerville <Jim.Somerville@windriver.com>
> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Merged with kernel version checks for >= 3.4 to support both old and
newer kernels.
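Here is a sketch of merging the RT symbol rename with a version check. The version code is pinned and CONFIG_PREEMPT_RT_FULL is defined by hand so the example is self-contained. LTTNG_SKETCH_RT is a hypothetical name for whatever LTTng derives from the RT configuration. Kernels >= 3.4 test the new symbol, and older kernels keep testing CONFIG_PREEMPT_RT.

```c
#include <assert.h>

#define SKETCH_KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))
#define SKETCH_LINUX_VERSION_CODE SKETCH_KERNEL_VERSION(3, 4, 0) /* assumption */
#define CONFIG_PREEMPT_RT_FULL 1 /* pretend: an RT-patched 3.4 kernel */

#if (SKETCH_LINUX_VERSION_CODE >= SKETCH_KERNEL_VERSION(3, 4, 0))
# if defined(CONFIG_PREEMPT_RT_FULL)
#  define LTTNG_SKETCH_RT 1
# endif
#else
# if defined(CONFIG_PREEMPT_RT)
#  define LTTNG_SKETCH_RT 1
# endif
#endif

#ifndef LTTNG_SKETCH_RT
# define LTTNG_SKETCH_RT 0
#endif
```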
These new calls export the data required for the consumer to generate
the index while tracing:
- timestamp begin
- timestamp end
- events discarded
- content size
- packet size
- stream id
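One way a consumer could hold these six values is shown below, as a hypothetical index entry with two basic consistency checks. The struct name and the 64-bit field widths are assumptions, not the on-disk index format.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical index entry holding the six values listed above. */
struct sketch_index_entry {
	uint64_t timestamp_begin;
	uint64_t timestamp_end;
	uint64_t events_discarded;
	uint64_t content_size; /* bytes of the packet actually filled */
	uint64_t packet_size;  /* total packet size, same unit as content_size */
	uint64_t stream_id;
};

/* Sanity checks a consumer might apply before writing an entry. */
static int sketch_index_entry_valid(const struct sketch_index_entry *e)
{
	return e->timestamp_begin <= e->timestamp_end &&
	       e->content_size <= e->packet_size;
}
```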
This patch allows LTTng to override the file operations of the lib ring
buffer.
For now it does not provide any additional functions, but it prepares
the work of adding LTTng-specific ioctls to the ring buffer.
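The override pattern can be sketched with a trimmed-down, hypothetical operations table. The wrapper handles its own commands and delegates everything else to the generic ring buffer handler. The LTTng-specific command is a placeholder for the future ioctls mentioned above; the patch itself adds none. Returning -ENOTTY for unknown commands is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, trimmed-down file operations table. */
struct sketch_fops {
	long (*ioctl)(unsigned int cmd, unsigned long arg);
};

#define SKETCH_RB_CMD_GET_SUBBUF 1u   /* generic ring buffer command */
#define SKETCH_LTTNG_CMD_EXAMPLE 100u /* hypothetical LTTng-specific command */

/* Generic lib ring buffer handler. */
static long sketch_rb_ioctl(unsigned int cmd, unsigned long arg)
{
	(void)arg;
	return cmd == SKETCH_RB_CMD_GET_SUBBUF ? 0 : -ENOTTY;
}

/* LTTng wrapper: handle LTTng commands, delegate the rest. */
static long sketch_lttng_ioctl(unsigned int cmd, unsigned long arg)
{
	switch (cmd) {
	case SKETCH_LTTNG_CMD_EXAMPLE:
		return 42; /* placeholder result */
	default:
		return sketch_rb_ioctl(cmd, arg);
	}
}

static const struct sketch_fops sketch_rb_fops = { .ioctl = sketch_rb_ioctl };
static const struct sketch_fops sketch_lttng_fops = { .ioctl = sketch_lttng_ioctl };
```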
Linux kernels 3.10 and 3.11 introduce a deadlock in the timekeeping
subsystem. See
http://lkml.kernel.org/r/1378943457-27314-1-git-send-email-john.stultz@linaro.org
for details. Awaiting patch merge into Linux master, stable-3.10 and
stable-3.11 for fine-grained kernel version blacklisting.
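Until fixed stable releases are known, the blacklist can only be coarse: every 3.10.x and 3.11.x kernel is rejected. A module would typically express this at build time with `#if` and `#error`. The sketch below (hypothetical names) uses a function so the range can be exercised.

```c
#include <assert.h>

#define SKETCH_KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Coarse blacklist: all of 3.10.x and 3.11.x. */
static int sketch_kernel_blacklisted(int code)
{
	return code >= SKETCH_KERNEL_VERSION(3, 10, 0) &&
	       code < SKETCH_KERNEL_VERSION(3, 12, 0);
}
```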
The metadata stream should only reference the metadata cache, not the
session. Otherwise, we end up in a catch-22 situation:
- Stream POLLHUP is only given when the session is destroyed, but,
- The session is only destroyed when all references to session are
released, including references from channels, but,
- If the metadata stream holds a reference on the metadata session, we
end up with a circular dependency loop.
Fix this by making sure the metadata stream uses neither the lttng
channel nor the lttng session.
The OOPS at bug #622 is likely caused by a missing reference on the
lttng channel structure, which could lead to accessing the object after
it has been destroyed if the lttng channel file descriptor is closed
while the metadata stream fd is still in use.
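The following refcount model uses hypothetical structs and plain ints in place of kref. The metadata stream holds a reference on the cache only. Dropping the last session reference therefore destroys the session and signals hangup through the cache, while the cache stays valid until the stream releases it. If the stream held a session reference instead, the session count could never reach zero before the stream was released, and the stream would never see POLLHUP.

```c
#include <assert.h>

/* Hypothetical refcounted objects modelling the dependency above. */
struct sketch_cache {
	int refcount;
	int hangup; /* stands in for POLLHUP delivery to stream readers */
};

struct sketch_session {
	int refcount;
	int destroyed;
	struct sketch_cache *cache;
};

static void sketch_cache_put(struct sketch_cache *c)
{
	c->refcount--;
}

/* Last session reference: destroy, signal hangup, drop the cache ref. */
static void sketch_session_put(struct sketch_session *s)
{
	if (--s->refcount == 0) {
		s->destroyed = 1;
		s->cache->hangup = 1;
		sketch_cache_put(s->cache);
	}
}
```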
However, we don't want to populate data from the metadata cache into the
stream until put_next_subbuf is issued. Add a check to ensure that it is
not populated until required.
Also, disallow get_subbuf() ioctl on metadata channel: its random-access
semantics do not play well with serialization of the metadata cache on
demand.
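A simplified model of both rules uses hypothetical state and commands. Data is serialized from the cache on get_next_subbuf only when no sub-buffer is currently held (that is, after put_next_subbuf), and random-access get_subbuf is rejected. The specific error codes (-EINVAL, -EBUSY, -EAGAIN) are assumptions of the sketch, and population is collapsed to "copy everything new" for brevity.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical metadata stream state. */
struct sketch_md_stream {
	size_t cache_written; /* bytes in the metadata cache */
	size_t populated;     /* bytes already copied into the ring buffer */
	int subbuf_held;      /* reader currently holds a sub-buffer */
};

enum sketch_cmd { SKETCH_GET_SUBBUF, SKETCH_GET_NEXT_SUBBUF, SKETCH_PUT_NEXT_SUBBUF };

static long sketch_md_ioctl(struct sketch_md_stream *s, enum sketch_cmd cmd)
{
	switch (cmd) {
	case SKETCH_GET_SUBBUF:
		return -EINVAL; /* random access not allowed on metadata */
	case SKETCH_GET_NEXT_SUBBUF:
		if (s->subbuf_held)
			return -EBUSY; /* previous sub-buffer not put yet */
		if (s->populated == s->cache_written)
			return -EAGAIN; /* nothing new to serialize */
		s->populated = s->cache_written; /* serialize on demand */
		s->subbuf_held = 1;
		return 0;
	case SKETCH_PUT_NEXT_SUBBUF:
		s->subbuf_held = 0;
		return 0;
	}
	return -EINVAL;
}
```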