Within the lttng-modules instrumentation, starting from kernel 3.9.0,
the block_bio_queue instrumentation has an incorrect rwbs field type
and does not print "comm" as an array of text.
The result is that we are writing values to what we believe to be an
"unsigned int", while the field is actually described as an array of
RWBS_LEN bytes (8 bytes). This discrepancy between the type description
and the amount by which the actual tp_assign() advances the write
offset leads to what appears as corruption of the following "comm"
field in the trace viewer output: the viewer will skip the first bytes
of the "comm" field, erroneously thinking they belong to the previous
"rwbs" field.
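For illustration only, a minimal user-space sketch of the resulting
layout skew (this is not lttng-modules code and the values are made
up): the writer advances the write offset by the 4 bytes of an unsigned
int, while the field description tells the viewer to consume RWBS_LEN
(8) bytes, so "comm" is decoded 4 bytes too late.
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        char record[32] = { 0 };
        unsigned int rwbs = 0x57;   /* writer: tp_assign() stores a 4-byte int */

        memcpy(record, &rwbs, sizeof(rwbs));    /* write offset advances by 4 */
        memcpy(record + 4, "swapper/0", 10);    /* writer: comm follows right after */

        /* viewer: the description says rwbs is RWBS_LEN (8) bytes wide, so
         * comm is read 4 bytes too late and loses its first characters. */
        printf("comm as decoded: %s\n", record + 8);   /* prints "per/0" */
        return 0;
    }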
lib_ring_buffer_write_commit_counter()'s 'buf_offset' argument should
contain the offset of the beginning of the area used by the record
being committed. However, lib_ring_buffer_commit() passes
ctx->buf_offset, which gets advanced by lib_ring_buffer_write() and
thus points just past the end of the record by the time
lib_ring_buffer_commit() runs. This causes
lib_ring_buffer_write_commit_counter() to return without changing
commit_hot[idx].seq, due to
if (unlikely(subbuf_offset(offset - commit_count, chan)))
return;
Since the after-crash data extraction tool checks the 'seq' field to
find out how much data is in the buffer, this makes the data from a
partially filled subbuffer unavailable for after-crash analysis.
This patch modifies lib_ring_buffer_write_commit_counter() and all its
callers to pass and expect the end of the area, so the code works as it
should and the complete information becomes visible in the crash dump.
[ Changelog inspired from Nikita Yushchenko's original patch. ]
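For illustration only, a user-space sketch of that guard with an
assumed 4096-byte sub-buffer (not the actual helper): subbuf_offset()
masks the offset within the current sub-buffer, so the seq update only
goes through when the offset handed in and the commit counter line up
on the same position within the sub-buffer.
    #include <stdio.h>

    #define SUBBUF_SIZE 4096UL    /* assumed sub-buffer size for the example */

    /* Same masking idea as subbuf_offset(): position within the sub-buffer. */
    static unsigned long subbuf_offset(unsigned long offset)
    {
        return offset & (SUBBUF_SIZE - 1);
    }

    int main(void)
    {
        unsigned long commit_count = 256;    /* commit counter after the record */
        unsigned long record_end = 256;      /* end of the committed area */
        unsigned long record_begin = 192;    /* beginning of the same 64-byte record */

        /* Offset matching what the commit counter covers: guard is 0,
         * commit_hot[idx].seq gets updated. */
        printf("end:   %lu\n", subbuf_offset(record_end - commit_count));

        /* Mismatched offset: guard is non-zero, the function returns early
         * and seq is never advanced. */
        printf("begin: %lu\n", subbuf_offset(record_begin - commit_count));
        return 0;
    }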
show_type() is only used in TP_printk(), which is not used by LTTng
modules. Moreover, it is already defined by the in-kernel
instrumentation. Therefore, we can remove it from the lttng
instrumentation.
Fix: module instrumentation: update to 3.15 kernel
Remove show_module_flags() define from lttng module.h instrumentation.
It is already defined within the in-kernel module.h, _and_ LTTng does
not use TP_printk.
Add a user-space ABI (new file /proc/lttng-logger) to lttng-modules
which can be written into by any user on the system. The content is
saved into the kernel trace stream as the "lttng_logger" kernel event.
The content of a single write is recorded as one lttng_logger event.
Writes larger than 1024 bytes are truncated to 1024 bytes, which is
much smaller than the smallest subbuffer size available (4096 bytes).
This ensures all written data makes it into the active tracing buffers.
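A minimal user-space usage sketch (illustration only; only the
/proc/lttng-logger path and the 1024-byte truncation come from this
change):
    #include <fcntl.h>
    #include <string.h>
    #include <sys/types.h>
    #include <unistd.h>

    int main(void)
    {
        const char msg[] = "hello from the lttng-logger ABI";
        int fd = open("/proc/lttng-logger", O_WRONLY);
        ssize_t ret;

        if (fd < 0)
            return 1;
        /* One write() becomes one lttng_logger event in the kernel trace;
         * anything beyond 1024 bytes would be truncated. */
        ret = write(fd, msg, strlen(msg));
        close(fd);
        return ret < 0;
    }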
Don't use the ring buffer client's struct lttng_channel from ioctls
that apply to ring buffer streams, because the lttng_channel is freed
while the lib ring buffer stream and channel are still in use. Their
lifetime persists until the consumer daemon releases its handles on the
related stream file descriptors.
"Introduce API to remap event names exposed by LTTng"
failed to map the event names enabled by the user to tracepoint names
known to the kernel. For instance, tracing with the kmem_kmalloc event
enabled is not gathering any event. This issue applies to all tracepoint
events declared with a different name within LTTng than within the Linux
kernel.
It should use lib_ring_buffer_read_offset_address() to get the packet
being read, rather than lib_ring_buffer_offset_address(), which is only
meant to be used when writing to the packet.
By using the timestamp sampled at space reservation, while the packet
is being filled, as the "end timestamp" of a packet, we can ensure
there is no overlap between packet timestamp ranges, so that packet
timestamp end <= following packet's timestamp begin.
Overlap between consecutive packets becomes an issue when the end
timestamp of a packet is greater than the end timestamp of a following
packet, IOW a packet completely contains the timestamp range of a
following packet. Such a situation does not allow trace viewers to do a
binary search within the packet timestamps. It will typically never
occur if packets are significantly larger than the event size, but this
fix ensures it can never happen, even theoretically.
The only case where packets can still theoretically overlap is if they
have equal begin and end timestamps, which is valid.
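For illustration only (hypothetical struct and function names, not the
on-disk index format), the kind of per-packet binary search that this
invariant makes possible:
    #include <stddef.h>
    #include <stdint.h>

    struct packet_index {
        uint64_t ts_begin;
        uint64_t ts_end;
    };

    /* Only correct when pkt[i].ts_end <= pkt[i + 1].ts_begin for all i;
     * packets with equal begin and end timestamps remain valid. */
    static long find_packet(const struct packet_index *pkt, size_t nr,
                            uint64_t ts)
    {
        size_t low = 0, high = nr;

        while (low < high) {
            size_t mid = low + (high - low) / 2;

            if (ts < pkt[mid].ts_begin)
                high = mid;
            else if (ts > pkt[mid].ts_end)
                low = mid + 1;
            else
                return (long) mid;
        }
        return -1;    /* timestamp not covered by any packet */
    }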
lttng-statedump-impl: Use generic hard irqs for Linux >= 3.12
Quoting the original patch changelog from Otavio Salvador:
> The Linux kernel 3.12 uses the generic hard irqs system for all
> architectures and dropped the GENERIC_HARDIRQ option, as can be seen
> at the commit quoted below:
>
> ,----
> | commit 0244ad004a54e39308d495fee0a2e637f8b5c317
> | Author: Martin Schwidefsky <schwidefsky@de.ibm.com>
> | Date: Fri Aug 30 09:39:53 2013 +0200
> |
> | Remove GENERIC_HARDIRQ config option
> |
> | After the last architecture switched to generic hard irqs the config
> | options HAVE_GENERIC_HARDIRQS & GENERIC_HARDIRQS and the related code
> | for !CONFIG_GENERIC_HARDIRQS can be removed.
> |
> | Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
> `----
Introduce wrapper/irq.h to move the feature availability testing logic
into a specific wrapper header. It now tests if the kernel version is
>= 3.12 or if CONFIG_GENERIC_HARDIRQS is defined (for older kernels).
Introduce the lttng-specific CONFIG_LTTNG_HAS_LIST_IRQ to track
availability of this feature within LTTng.
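A sketch of the wrapper's logic (close to, but not necessarily verbatim
from, the actual header):
    /* wrapper/irq.h */
    #ifndef _LTTNG_WRAPPER_IRQ_H
    #define _LTTNG_WRAPPER_IRQ_H

    #include <linux/version.h>

    /*
     * Starting from Linux 3.12, all architectures use the generic hard
     * irqs system; on older kernels the feature is tracked by
     * CONFIG_GENERIC_HARDIRQS.
     */
    #if (LINUX_VERSION_CODE >= KERNEL_VERSION(3,12,0) \
            || defined(CONFIG_GENERIC_HARDIRQS))
    # define CONFIG_LTTNG_HAS_LIST_IRQ
    #endif

    #endif /* _LTTNG_WRAPPER_IRQ_H */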
Reported-by: Philippe Mangaud <r49081@freescale.com>
Reported-by: Otavio Salvador <otavio@ossystems.com.br>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Import fix from LTSI: 3.4+ RT kernels use CONFIG_PREEMPT_RT_FULL
Initial LTSI commit:
From: Paul Gortmaker <paul.gortmaker@windriver.com>
> fix reference to obsolete RT Kconfig variable.
>
> The preempt-rt patches no longer use CONFIG_PREEMPT_RT in
> the 3.4 (and newer) versions. So even though LTSI doesn't
> include RT, having this define present can lead to an easy
> to overlook bug for anyone who does try to layer RT onto
> the LTSI baseline.
>
> Update it to use the currently used define name by RT.
>
> Reported-by: Jim Somerville <Jim.Somerville@windriver.com>
> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Merged with kernel version checks for >= 3.4 to support both old and
newer kernels.
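A sketch of the merged check (illustration only; the guarded code is
whatever RT-specific handling the wrapper needs):
    #include <linux/version.h>

    #if (LINUX_VERSION_CODE >= KERNEL_VERSION(3,4,0) \
            && defined(CONFIG_PREEMPT_RT_FULL)) \
        || (LINUX_VERSION_CODE < KERNEL_VERSION(3,4,0) \
            && defined(CONFIG_PREEMPT_RT))
    /* ... RT-specific definitions ... */
    #endif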
These new calls export the data required for the consumer to generate
the index while tracing:
- timestamp begin
- timestamp end
- events discarded
- content size
- packet size
- stream id
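For illustration only (a hypothetical struct whose field names merely
mirror the list above), the per-packet information a consumer can now
gather to build its index:
    #include <stdint.h>

    struct packet_index_entry {
        uint64_t timestamp_begin;
        uint64_t timestamp_end;
        uint64_t events_discarded;
        uint64_t content_size;    /* bytes of payload actually used */
        uint64_t packet_size;     /* full packet size, including padding */
        uint64_t stream_id;
    };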
This patch allows LTTng to override the file operations of the lib ring
buffer.
For now it does not provide any additional functions, but it prepares
the ground for adding LTTng-specific ioctls to the ring buffer.
Linux kernels 3.10 and 3.11 introduce a deadlock in the timekeeping
subsystem. See
http://lkml.kernel.org/r/1378943457-27314-1-git-send-email-john.stultz@linaro.org
for details. Awaiting patch merge into Linux master, stable-3.10 and
stable-3.11 for fine-grained kernel version blacklisting.
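One way such a coarse version blacklist can look (a sketch only, not
necessarily the exact check that was committed):
    #include <linux/version.h>

    #if (LINUX_VERSION_CODE >= KERNEL_VERSION(3,10,0) \
            && LINUX_VERSION_CODE < KERNEL_VERSION(3,12,0))
    # error "Linux kernels 3.10 and 3.11 hit a deadlock in the timekeeping subsystem when tracing is active. Please use another kernel version."
    #endif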
The metadata stream should only reference the metadata cache, not the
session. Otherwise, we end up in a catch-22 situation:
- Stream POLLHUP is only given when the session is destroyed, but,
- The session is only destroyed when all references to the session are
  released, including references from channels, but,
- If the metadata stream holds a reference on the metadata session, we
  end up with a circular dependency loop.
Fix this by making sure the metadata stream uses neither the lttng
channel nor the lttng session.
The OOPS at bug #622 is likely caused by a missing reference on the
lttng channel structure, which could lead to accessing the object after
it has been destroyed if the lttng channel file descriptor is closed
while the metadata stream fd is still in use.
However, we don't want to populate data from the metadata cache into the
stream until put_next_subbuf is issued. Add a check to ensure that it is
not populated until required.
Also, disallow get_subbuf() ioctl on metadata channel: its random-access
semantic does not play well with serialization of the metadata cache on
demand.
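A simplified sketch of the resulting ownership (hypothetical field
layout, not the actual structures): the metadata stream pins only the
metadata cache, so the lttng channel and session can go away
independently.
    #include <linux/kref.h>

    struct lttng_metadata_cache {
        char *data;                  /* cached metadata text */
        unsigned int cache_alloc;    /* allocated size */
        struct kref refcount;        /* shared by the channel and its streams */
    };

    struct lttng_metadata_stream {
        struct lttng_metadata_cache *metadata_cache;    /* counted reference */
        unsigned int metadata_out;   /* bytes already pushed to the stream */
        /* ring buffer stream state, but no lttng_channel/lttng_session
         * back-pointers */
    };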
- Don't require 2kB of stack anymore (requiring so much kernel stack
space should be considered a bug in itself),
- Add support for 3.10 kernel printk instrumentation.
This patch uses "Introduce __dynamic_array_enc_ext_2() and
tp_memcpy_dyn_2()".
Fix: ring buffer: handle concurrent update in nested buffer wrap around check
With stress-test loads that trigger sub-buffer switches very frequently
(small 4kB sub-buffers, frequent flush), we currently observe this kind
of warning once every few minutes:
[65335.896208] ring buffer relay-overwrite-mmap, cpu 5: records were lost. Caused by:
[65335.896208] [ 0 buffer full, 1 nest buffer wrap-around, 0 event too big ]
It appears that the check for nested buffer wrap-around does not take
into account that concurrent execution contexts (either nested for
per-cpu buffers, or from another CPU or nested for global buffers) can
update the commit_count value concurrently.
What we really want to do with this check is to ensure that if we enter
a sub-buffer that had an unbalanced reserve/commit count, assuming there
is no hope that this gets rebalanced promptly, we detect this and drop
the current event. However, in the case where the commit counter has
been concurrently updated by another reserve or a switch, we want to
retry the entire reserve operation.
One way to detect this is to sample the reserve offset twice, around the
commit counter read, along with the appropriate memory barriers.
Therefore, we can detect if the mismatch between reserve and commit
counter is actually caused by a concurrent update, which necessarily has
updated the reserve counter.
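For illustration only, a user-space sketch of that double-sampling
scheme (names are hypothetical; the real code lives in the ring buffer
frontend and uses the lib ring buffer's own accessors and barriers):
    #include <stdatomic.h>
    #include <stdbool.h>

    struct buf_counters {
        _Atomic unsigned long reserve_offset;
        _Atomic unsigned long commit_count;
    };

    /*
     * Sample the reserve offset on both sides of the commit counter read.
     * If the two samples differ, the apparent reserve/commit imbalance may
     * simply be a concurrent reserve or sub-buffer switch in progress, so
     * the caller should retry the whole reservation instead of dropping
     * the event.
     */
    static bool concurrent_update_detected(struct buf_counters *buf,
                                           unsigned long *commit_count)
    {
        unsigned long off_before, off_after;

        off_before = atomic_load_explicit(&buf->reserve_offset,
                                          memory_order_acquire);
        *commit_count = atomic_load_explicit(&buf->commit_count,
                                             memory_order_acquire);
        off_after = atomic_load_explicit(&buf->reserve_offset,
                                         memory_order_acquire);
        return off_before != off_after;
    }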
Cleanup: lib_ring_buffer_switch_new_end() only calls subbuffer_set_data_size()
lib_ring_buffer_switch_new_end() is always called when an event exactly
fills a sub-buffer, which makes padding_size always 0. However, there is
one side-effect that lib_ring_buffer_switch_new_end() needs to have: it
calls subbuffer_set_data_size() to update the size of the data to be
read from the sub-buffer.
lib_ring_buffer_write(), lib_ring_buffer_memset() and
lib_ring_buffer_copy_from_user_inatomic() could be passed a length of 0.
This typically has no side-effect as far as writing into the buffers is
concerned, except for one detail: in overwrite mode, there is a check to
make sure the sub-buffer can be written into. This check is performed
even if length is 0. In the case where this would fall exactly at the
end of a sub-buffer, the check would fail, because the offset would fall
exactly at the beginning of the next sub-buffer.
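For illustration, a guard of the following form at the top of these
helpers avoids the spurious failure for empty writes (a sketch only,
not necessarily the exact change):
    if (unlikely(!len))
        return;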
Fix: ring buffer: RING_BUFFER_FLUSH ioctl buffer corruption
lib_ring_buffer_switch_slow() clearly states:
* Note, however, that as a v_cmpxchg is used for some atomic
* operations, this function must be called from the CPU which owns the
* buffer for a ACTIVE flush.
But unfortunately, the RING_BUFFER_FLUSH ioctl does not follow these
important directives. Therefore, whenever the consumer daemon or session
daemon explicitly triggers a "flush" on a buffer, it can race with data
being written to the buffer, leading to corruption of the reserve/commit
counters, and therefore corruption of data in the buffer. It triggers
these warnings for overwrite mode buffers:
[65356.890016] WARNING: at
/home/compudj/git/lttng-modules/wrapper/ringbuffer/../../lib/ringbuffer/../../wrapper/ringbuffer/../../lib/ringbuffer/backend.h:110 lttng_event_write+0x118/0x140 [lttng_ring_buffer_client_mmap_overwrite]()
Which indicates that we are trying to write into a sub-buffer for which
we don't have exclusive access. It also causes the following warnings to
show up:
[65335.896208] ring buffer relay-overwrite-mmap, cpu 5: records were lost. Caused by:
[65335.896208] [ 0 buffer full, 80910 nest buffer wrap-around, 0 event too big ]
Which is caused by a corrupted commit counter.
Fix this by sending an IPI to the CPU owning the flushed buffer for
per-cpu synchronization. For global synchronization, no IPI is needed,
since we allow writes from remote CPUs.
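A simplified sketch of the idea (illustration only; the helper names
are hypothetical, and the lib ring buffer frontend declarations are
assumed to be in scope):
    #include <linux/smp.h>
    #include <linux/types.h>

    static void remote_flush_fn(void *info)
    {
        struct lib_ring_buffer *buf = info;

        /* Runs on the CPU that owns the buffer, so the v_cmpxchg-based
         * active flush cannot race with that CPU's writers. */
        lib_ring_buffer_switch_slow(buf, SWITCH_ACTIVE);
    }

    static void flush_buffer(struct lib_ring_buffer *buf, int cpu, bool per_cpu)
    {
        if (per_cpu)
            smp_call_function_single(cpu, remote_flush_fn, buf, 1);
        else
            /* Global buffers allow remote writers, no IPI needed. */
            lib_ring_buffer_switch_slow(buf, SWITCH_ACTIVE);
    }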