Import fix from LTSI: 3.4+ RT kernels use CONFIG_PREEMPT_RT_FULL
Initial LTSI commit:
From: Paul Gortmaker <paul.gortmaker@windriver.com>
> fix reference to obsolete RT Kconfig variable.
>
> The preempt-rt patches no longer use CONFIG_PREEMPT_RT in
> the 3.4 (and newer) versions. So even though LTSI doesn't
> include RT, having this define present can lead to an easy
> to overlook bug for anyone who does try to layer RT onto
> the LTSI baseline.
>
> Update it to use the currently used define name by RT.
>
> Reported-by: Jim Somerville <Jim.Somerville@windriver.com>
> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Merged with kernel version checks for >= 3.4 to support both old and
newer kernels.
Linux kernels 3.10 and 3.11 introduce a deadlock in the timekeeping
subsystem. See
http://lkml.kernel.org/r/1378943457-27314-1-git-send-email-john.stultz@linaro.org
for details. Awaiting patch merge into Linux master, stable-3.10 and
stable-3.11 for fine-grained kernel version blacklisting.
- Don't require 2kB of stack anymore (requiring so much kernel stack
space should be considered as a bug in itself),
- Add support for 3.10 kernel printk instrumentation.
This patch uses "Introduce __dynamic_array_enc_ext_2() and
tp_memcpy_dyn_2()".
Fix: ring buffer: handle concurrent update in nested buffer wrap around check
With stress-test loads that trigger sub-buffer switches very frequently
(small 4kB sub-buffers, frequent flush), we currently observe this kind
of warning once every few minutes:
[65335.896208] ring buffer relay-overwrite-mmap, cpu 5: records were lost. Caused by:
[65335.896208] [ 0 buffer full, 1 nest buffer wrap-around, 0 event too big ]
It appears that the check for nested buffer wrap-around does not take
into account that concurrent execution contexts (either nested for
per-cpu buffers, or from another CPU or nested for global buffers) can
update the commit_count value concurrently.
What we really want to do with this check is to ensure that if we enter
a sub-buffer that had an unbalanced reserve/commit count, assuming there
is no hope that this gets rebalanced promptly, we detect this and drop
the current event. However, in the case where the commit counter has
been concurrently updated by another reserve or a switch, we want to
retry the entire reserve operation.
One way to detect this is to sample the reserve offset twice, around the
commit counter read, along with the appropriate memory barriers.
Therefore, we can detect if the mismatch between reserve and commit
counter is actually caused by a concurrent update, which necessarily has
updated the reserve counter.
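A minimal user-space sketch of the double-sampling check described above, using C11 atomics in place of the ring buffer's own primitives. The names (classify_subbuf, sample_and_classify, the SUBBUF_* results) are hypothetical and do not come from lttng-modules; the default sequentially consistent atomic loads stand in for the memory barriers.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical outcomes of the nested wrap-around check. */
enum subbuf_check {
	SUBBUF_OK,	/* reserve/commit counts are balanced: proceed */
	SUBBUF_RETRY,	/* concurrent update observed: redo the reserve */
	SUBBUF_DROP,	/* genuinely unbalanced sub-buffer: drop the event */
};

/*
 * Pure decision step: the commit counter value is only trusted if the
 * reserve offset did not move while the commit counter was being read.
 */
static enum subbuf_check classify_subbuf(unsigned long offset_before,
		unsigned long commit_count, unsigned long offset_after,
		unsigned long expected_commit)
{
	if (offset_before != offset_after)
		return SUBBUF_RETRY;
	if (commit_count == expected_commit)
		return SUBBUF_OK;
	return SUBBUF_DROP;
}

/* Sample the reserve offset twice, around the commit counter read. */
static enum subbuf_check sample_and_classify(atomic_ulong *reserve_offset,
		atomic_ulong *commit_counter, unsigned long expected_commit)
{
	unsigned long before, commit, after;

	assert(reserve_offset && commit_counter);
	before = atomic_load(reserve_offset);
	commit = atomic_load(commit_counter);
	after = atomic_load(reserve_offset);
	return classify_subbuf(before, commit, after, expected_commit);
}
```

Keeping the decision step separate from the sampling makes the three outcomes easy to exercise without having to provoke a real race.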
lib_ring_buffer_write(), lib_ring_buffer_memset() and
lib_ring_buffer_copy_from_user_inatomic() could be passed a length of 0.
This typically has no side-effect as far as writing into the buffers is
concerned, except for one detail: in overwrite mode, there is a check to
make sure the sub-buffer can be written into. This check is performed
even if length is 0. In the case where this would fall exactly at the
end of a sub-buffer, the check would fail, because the offset would fall
exactly at the beginning of the next sub-buffer.
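The boundary case can be illustrated with a small model. The helper names and the idea of skipping the check for a zero length are assumptions made for illustration; the text above does not spell out how the fix is implemented.

```c
#include <assert.h>
#include <stddef.h>

/* Index of the sub-buffer containing `offset` (sub-buffers are contiguous). */
static size_t subbuf_index(size_t offset, size_t subbuf_size)
{
	assert(subbuf_size != 0);
	return offset / subbuf_size;
}

/*
 * Hypothetical write-side guard: returns 1 when the exclusive-access check
 * on the target sub-buffer has to run, 0 when it can be skipped because
 * nothing will be written.
 */
static int write_needs_subbuf_check(size_t len)
{
	return len != 0;
}
```

With 4kB sub-buffers, a record ending at offset 4096 leaves the write position in sub-buffer 1, although a zero-length write never touches a byte of that sub-buffer.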
Fix: ring buffer: RING_BUFFER_FLUSH ioctl buffer corruption
lib_ring_buffer_switch_slow() clearly states:
* Note, however, that as a v_cmpxchg is used for some atomic
* operations, this function must be called from the CPU which owns the
* buffer for a ACTIVE flush.
But unfortunately, the RING_BUFFER_FLUSH ioctl does not follow these
important directives. Therefore, whenever the consumer daemon or session
daemon explicitly triggers a "flush" on a buffer, it can race with data
being written to the buffer, leading to corruption of the reserve/commit
counters, and therefore corruption of data in the buffer. It triggers
these warnings for overwrite mode buffers:
[65356.890016] WARNING: at
/home/compudj/git/lttng-modules/wrapper/ringbuffer/../../lib/ringbuffer/../../wrapper/ringbuffer/../../lib/ringbuffer/backend.h:110 lttng_event_write+0x118/0x140 [lttng_ring_buffer_client_mmap_overwrite]()
Which indicates that we are trying to write into a sub-buffer for which
we don't have exclusive access. It also causes the following warnings to
show up:
[65335.896208] ring buffer relay-overwrite-mmap, cpu 5: records were lost. Caused by:
[65335.896208] [ 0 buffer full, 80910 nest buffer wrap-around, 0 event too big ]
Which is caused by a corrupted commit counter.
Fix this by sending an IPI to the CPU owning the flushed buffer for
per-cpu synchronization. For global synchronization, no IPI is needed,
since we allow writes from remote CPUs.
Jan Glauber [Thu, 23 May 2013 11:35:16 +0000 (07:35 -0400)]
Fix CPU hotplug section mismatches
Get rid of the following section mismatches:
WARNING: /home/jang/temp/lttng-modules-2.2.0-r0/git/lttng-tracer.o(.text+0x19dc0): Section mismatch in reference from the function lttng_add_perf_counter_to_ctx() to the function .cpuinit.text:lttng_perf_counter_cpu_hp_callback()
The function lttng_add_perf_counter_to_ctx() references
the function __cpuinit lttng_perf_counter_cpu_hp_callback().
This is often because lttng_add_perf_counter_to_ctx lacks a __cpuinit
annotation or the annotation of lttng_perf_counter_cpu_hp_callback is wrong.
WARNING: /home/jang/temp/lttng-modules-2.2.0-r0/git/lib/lttng-lib-ring-buffer.o(.text+0x1204): Section mismatch in reference from the function channel_backend_init() to the function .cpuinit.text:lib_ring_buffer_cpu_hp_callback()
The function channel_backend_init() references
the function __cpuinit lib_ring_buffer_cpu_hp_callback().
This is often because channel_backend_init lacks a __cpuinit
annotation or the annotation of lib_ring_buffer_cpu_hp_callback is wrong.
WARNING: /home/jang/temp/lttng-modules-2.2.0-r0/git/lib/lttng-lib-ring-buffer.o(.text+0x269c): Section mismatch in reference from the function channel_create() to the function .cpuinit.text:lib_ring_buffer_cpu_hp_callback()
The function channel_create() references
the function __cpuinit lib_ring_buffer_cpu_hp_callback().
This is often because channel_create lacks a __cpuinit
annotation or the annotation of lib_ring_buffer_cpu_hp_callback is wrong.
WARNING: /home/jang/temp/lttng-modules-2.2.0-r0/git/lib/lttng-lib-ring-buffer.o(.text+0x4a1c): Section mismatch in reference from the function channel_iterator_init() to the function .cpuinit.text:channel_iterator_cpu_hotplug()
The function channel_iterator_init() references
the function __cpuinit channel_iterator_cpu_hotplug().
This is often because channel_iterator_init lacks a __cpuinit
annotation or the annotation of channel_iterator_cpu_hotplug is wrong.
Signed-off-by: Jan Glauber <jan.glauber@gmail.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
We don't use this event anymore since we write the metadata directly
into the ring buffer, no need for an external event. This probe was
the only one in the lttng-probe-lttng module, so we can get rid of
this module as well.
Samuel Martin [Mon, 17 Jun 2013 14:28:51 +0000 (10:28 -0400)]
Fix build and load against linux-2.6.33.x
* lttng-event.h declared but did not implement
lttng_add_perf_counter_to_ctx on kernel >=2.6.33, the implementation
was in lttng-context-perf-counters.c, which was only included for
kernel >=2.6.34. This prevented the module from being loaded.
* on kernel 2.6.33.x, lttng-context-perf-counters.c complains about
implicit declaration for {get,put}_online_cpus and
{,un}register_cpu_notifier; so fix header inclusion.
Signed-off-by: Samuel Martin <smartin@aldebaran-robotics.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Jon Bernard [Mon, 13 May 2013 15:38:17 +0000 (11:38 -0400)]
Remove bashism in lttng-syscalls-generate-headers.sh
Options to echo are not portable. In particular, the 'echo -e' option is
implemented by some shells, including bash, to expand escape sequences.
However, dash belongs to the other family of shells, which expand
escape sequences by default.
The printf command is portable and much more reliable.
Previously, it just tried to compile with zero modules, which is
confusing if you thought you had configured everything correctly. Now it
throws an error that tells what went wrong.
We'll need to find a better way to instrument ARM-specific system calls
located at a far offset from the standard system calls. A 16MB
lttng-modules kernel module is really not acceptable.
Removing this instrumentation for now. sys_set_tls will appear as
sys_unknown.
Fixes #472
CC: Ryan Kyser <Ryan.Kyser@jci.com>
Ref: http://lists.lttng.org/pipermail/lttng-dev/2013-April/019990.html
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Andrew Gabbasov [Tue, 2 Apr 2013 13:25:37 +0000 (09:25 -0400)]
Clean up using global_dirty_limit wrapper for writeback probe
Move the wrapper around reading of global_dirty_limit to /wrapper/
directory. Introduce a new kallsyms_lookup_dataptr function for
obtaining the address unchanged and use it in global_dirty_limit
wrapper. Since the data address is available only if
CONFIG_KALLSYMS_ALL is set, omit the whole probe from building if this
config is missing.
[ Edit by Mathieu Desnoyers: small coding style fixes ]
Signed-off-by: Andrew Gabbasov <andrew_gabbasov@mentor.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Don't replicate internal structures from the kernel: this is asking for
serious trouble, and could lead to breakage if building on newer kernels
that have modified structures.
The proper approach, if we really need to extract this information,
would be to add APIs to the Linux kernel workqueue that allow getting
this information.
Maxin B. John [Fri, 22 Mar 2013 13:56:13 +0000 (09:56 -0400)]
lttng-module: sched.h: Fix compilation on 3.9 kernel
With commit 8bd75c77b7c6a3954140dd2e20346aef3efe4a35
included in 3.9-rc1 kernel, rt specific bits in "linux/sched.h"
were moved into new header file "linux/sched/rt.h".
Fixes this compilation error:
CC [M] /home/majo/lttng/lttng-modules/probes/lttng-probe-sched.o
...
/home/majo/lttng/lttng-modules/probes/../instrumentation/events/lttng-module
/../../../probes/../instrumentation/events/lttng-module/sched.h:
In function '__event_probe__sched_switch':
/home/majo/lttng/lttng-modules/probes/../instrumentation/events/lttng-module
/../../../probes/../instrumentation/events/lttng-module/sched.h:164:1:
error: 'MAX_RT_PRIO' undeclared (first use in this function)
...
Signed-off-by: Maxin B. John <maxin.john@enea.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fix: statedump hang/too early completion due to logic error
The previous "Fix: statedump hang due to incorrect wait/wakeup use" was
not actually fixing the real problem.
The issue is that we should pass the expected condition to wait_event()
rather than its contrary.
This bug has been sitting there for a while. I suspect that a recent
change in the Linux scheduler behavior for newly spawned worker threads
might have contributed to trigger the hang more reliably.
The effects of this bug are:
- possible hang of the lttng-sessiond (within the kernel) at tracing
start,
- the statedump end event is traced before all worker threads have
actually completed, which can confuse LTTng viewer state systems.
Reported-by: Phil Wilshire <sysdcs@gmail.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Andrew Gabbasov [Mon, 10 Dec 2012 16:12:14 +0000 (11:12 -0500)]
Update kernel probes to more detailed match to kernel versions
Some #ifdefs are added to the kernel probes instrumentation to match
the original tracepoints of each kernel version more closely.
Supported kernel versions range from 2.6.32 to 3.7.
Signed-off-by: Andrew Gabbasov <andrew_gabbasov@mentor.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Andrew Gabbasov [Mon, 10 Dec 2012 16:01:52 +0000 (11:01 -0500)]
Fix possible kernel build errors with linux-patches
Kernel sources of versions 2.6.32 to 2.6.34 with the patches from
linux-patches applied may fail to compile if the tracepoint samples
are configured to build. Parts of the relevant backported commits
are added to the kernel patches to avoid those errors.
Signed-off-by: Andrew Gabbasov <andrew_gabbasov@mentor.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Andrew Gabbasov [Tue, 27 Nov 2012 16:43:38 +0000 (17:43 +0100)]
Make upper bound of kernel version checking macro exclusive
It's more usable to have the upper limit exclusive. It helps to avoid
hardcoding of stable branch highest version number, i.e. having a range
from 3.1.0 up to 3.2.0 (exclusively) gives us all 3.1.x versions.
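A minimal sketch of a range check with an exclusive upper bound. The macro uses the same encoding as the kernel's KERNEL_VERSION() but is redefined under a different name so the sketch builds in user space; version_in_range() is a hypothetical helper, not the macro used by lttng-modules.

```c
#include <assert.h>

/* Same encoding as the kernel's KERNEL_VERSION(). */
#define SKETCH_KERNEL_VERSION(a, b, c)	(((a) << 16) + ((b) << 8) + (c))

/* Hypothetical helper: true when low <= code < high (upper bound exclusive). */
static int version_in_range(unsigned int code, unsigned int low,
		unsigned int high)
{
	assert(low <= high);
	return code >= low && code < high;
}
```

A range of [3.1.0, 3.2.0) then matches any 3.1.x release without naming the last stable version of that branch.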
Signed-off-by: Andrew Gabbasov <andrew_gabbasov@mentor.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Andrew Gabbasov [Sun, 25 Nov 2012 21:13:16 +0000 (16:13 -0500)]
sock instrumentation: fix fields to get referenced values
Due to the specifics of how values are passed in lttng-modules, passing
a pointer is not enough when the values themselves are meant to be
displayed: we need to store the actual values.
Signed-off-by: Andrew Gabbasov <andrew_gabbasov@mentor.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Andrew Gabbasov [Sun, 25 Nov 2012 21:12:46 +0000 (16:12 -0500)]
ext3 instrumentation: fix of assignment code conversion
Due to the specifics of how assignment code is handled in lttng-modules,
plain code in TP_fast_assign (outside tp_* macros) will not be reached.
Everything should be enclosed in tp_* fragments.
Signed-off-by: Andrew Gabbasov <andrew_gabbasov@mentor.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Simon Marchi [Fri, 23 Nov 2012 23:10:38 +0000 (18:10 -0500)]
Fix compilation for 3.0 branch (>= 3.0.39)
The isolate_mode_t type that appeared in 3.2 was backported to 3.0.39 so
the version check must be fixed. It was not backported to the 3.1 branch
though, so it must be excluded.
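The resulting condition can be sketched as follows. The macro mirrors the kernel's KERNEL_VERSION() encoding under a different name so the sketch builds in user space, and has_isolate_mode_t() is a hypothetical predicate written for illustration, not the actual preprocessor check in the instrumentation headers.

```c
#include <assert.h>

/* Same encoding as the kernel's KERNEL_VERSION(). */
#define SKETCH_KERNEL_VERSION(a, b, c)	(((a) << 16) + ((b) << 8) + (c))

/* The 3.0.39 backport sits below the whole 3.1 branch in version order. */
static_assert(SKETCH_KERNEL_VERSION(3, 0, 39) < SKETCH_KERNEL_VERSION(3, 1, 0),
	"3.0.x must sort before 3.1.0");

/* Hypothetical predicate: does this kernel version provide isolate_mode_t? */
static int has_isolate_mode_t(unsigned int code)
{
	/* Present since 3.2. */
	if (code >= SKETCH_KERNEL_VERSION(3, 2, 0))
		return 1;
	/* Backported to the 3.0 branch starting at 3.0.39, but not to 3.1. */
	return code >= SKETCH_KERNEL_VERSION(3, 0, 39)
		&& code < SKETCH_KERNEL_VERSION(3, 1, 0);
}
```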
Signed-off-by: Simon Marchi <simon.marchi@polymtl.ca>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
The current ABI does not work for compat 32/64 bits.
This patch moves the current ABI as old-abi and provides a new ABI in
which all the structures exchanged between user and kernel-space are
packed. Also this new ABI moves the "int overwrite" member of the
struct lttng_kernel_channel to remove the alignment added by the
compiler.
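A minimal sketch of why packing matters here, assuming a GCC-compatible compiler. The two structures are hypothetical, reduced stand-ins for struct lttng_kernel_channel; their member lists and the choice to move "overwrite" to the end are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Natural layout: the padding after `overwrite` depends on the ABI
 * (uint64_t struct members are 4-byte aligned on 32-bit x86 but 8-byte
 * aligned on x86-64), so a 32-bit user space and a 64-bit kernel can
 * disagree on member offsets.
 */
struct sketch_channel_old {
	int overwrite;
	uint64_t subbuf_size;
	uint64_t num_subbuf;
};

/*
 * Packed layout with fixed-width members: offsets no longer depend on
 * the compiler's alignment rules.
 */
struct sketch_channel_new {
	uint64_t subbuf_size;
	uint64_t num_subbuf;
	int32_t overwrite;
} __attribute__((packed));

static_assert(sizeof(struct sketch_channel_new) == 20, "no hidden padding");
static_assert(offsetof(struct sketch_channel_new, overwrite) == 16,
	"fixed offset on every ABI");
```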
A patch for lttng-tools has been developed in parallel to this one to
support the new ABI. These 2 patches have been tested in all
possible configurations (applied or not) on 64-bit and 32-bit kernels
(with CONFIG_COMPAT) and a user-space in 32 and 64-bit.
Here are the results of the tests:
kernel          | user-space      | result
k 64 compat     | u 32 compat     | OK
k 64 compat     | u 64 compat     | OK
k 64 compat     | u 32 non-compat | KO
k 64 compat     | u 64 non-compat | OK
k 64 non-compat | u 64 compat     | OK
k 64 non-compat | u 32 compat     | KO
k 64 non-compat | u 64 non-compat | OK
k 64 non-compat | u 32 non-compat | KO
k 32 compat     | u compat        | OK
k 32 compat     | u non-compat    | OK
k 32 non-compat | u compat        | OK
k 32 non-compat | u non-compat    | OK
The results are as expected:
- on 32-bit user-space and kernel, every configuration works.
- on 64-bit user-space and kernel, every configuration works.
- with 32-bit user-space on a 64-bit kernel the only configuration
where it works is when the compat patch is applied everywhere.
The type of fields exchanged between kernel and userspace must be
compat_ulong_t instead of unsigned long in case of compat where
userspace is 32 bits and kernel is 64 bits.
Fix ring_buffer_frontend.c: missing include lttng-tracer-core.h
In lib/ringbuffer/ring_buffer_frontend.c, RING_BUFFER_ALIGN is undefined,
leading to no alignment offset being recorded after the call to
config->cb.record_header_size() in lib_ring_buffer_try_reserve_slow().
However, lttng-ring-buffer-client.h does define RING_BUFFER_ALIGN, so
the alignment offset will be produced when the packet header is written
in lttng_write_event_header().
This discrepancy may be observed on architectures that don't set
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS, such as ARM, with a babeltrace
error such as:
indicating that the actual content size differs from the calculated one
due to the difference in alignment. Including the appropriate header
file in ring_buffer_frontend.c solves the problem.
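The size mismatch can be modelled with a small sketch. Both functions are hypothetical and written for illustration; the align_enabled flag stands in for whether RING_BUFFER_ALIGN is defined in the compilation unit doing the computation.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Padding needed to bring `offset` up to a multiple of `alignment`
 * (which must be a power of two).
 */
static size_t align_padding(size_t offset, size_t alignment)
{
	assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
	return (alignment - offset) & (alignment - 1);
}

/*
 * Hypothetical record-size computation: padding, then header, then payload.
 * Without alignment support, the padding is always zero.
 */
static size_t record_size(size_t offset, size_t header_len,
		size_t payload_len, size_t alignment, int align_enabled)
{
	size_t pad = align_enabled ? align_padding(offset, alignment) : 0;

	return pad + header_len + payload_len;
}
```

Starting at offset 5 with an 8-byte alignment, a 4-byte header and an 8-byte payload, the side computing without alignment accounts for 12 bytes while the side writing with alignment produces 15, which is the kind of content size difference described above.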
fix timestamps on architectures without CONFIG_KTIME_SCALAR
trace_clock_monotonic_wrapper() should return a u64 representing the
number of nanoseconds since system startup.
ktime_get() provides that value directly within its .tv64 field only
on those architectures defining CONFIG_KTIME_SCALAR, whereas in all
other cases (e.g. PowerPC) a ktime_to_ns() conversion (which
translates back to .tv64 when CONFIG_KTIME_SCALAR is defined)
becomes necessary.
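A user-space model of the difference, assuming a non-scalar representation that stores seconds in the high 32 bits and nanoseconds in the low 32 bits of the 64-bit view. The type and function names are hypothetical; only ktime_to_ns() and .tv64 are named in the text above.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NSEC_PER_SEC	1000000000LL

/* Model of a non-scalar ktime: seconds and nanoseconds stored separately. */
struct sketch_ktime {
	int32_t sec;
	int32_t nsec;
};

/*
 * What reading the 64-bit view directly would yield under the assumed
 * layout: this is not a nanosecond count.
 */
static int64_t sketch_raw_tv64(struct sketch_ktime kt)
{
	return (int64_t)(((uint64_t)(uint32_t)kt.sec << 32) | (uint32_t)kt.nsec);
}

/* Explicit conversion, analogous in spirit to ktime_to_ns(). */
static int64_t sketch_ktime_to_ns(struct sketch_ktime kt)
{
	assert(kt.nsec >= 0 && kt.nsec < SKETCH_NSEC_PER_SEC);
	return (int64_t)kt.sec * SKETCH_NSEC_PER_SEC + kt.nsec;
}
```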
Julien Desfossez [Thu, 23 Aug 2012 21:11:35 +0000 (17:11 -0400)]
lttng_statedump_process_state for each PID NS
When a process is in a namespace, its pid, tid and ppid are relative to
the namespace. Since namespaces can be nested, we need to know the
representation of each process in each namespace.
This patch changes lttng_enumerate_task_fd to iterate over each
PID namespace of a process, if any, so that the statedump contains
an entry for each process in each namespace it belongs to.
To know the nesting level, the field "level" is added to the
lttng_statedump_process_state event, 0 being the top-level.
For processes running on the top-level namespace, the statedump
behaviour is unchanged (except the added "level" field).
For example (no nesting, just one level of namespace):
lttng_statedump_process_state: {
tid = 32185, vtid = 1, pid = 32185,
vpid = 1, ppid = 32173, vppid = 0,
level = 1, name = "init" }
lttng_statedump_process_state: {
tid = 32185, vtid = 32185, pid = 32185,
vpid = 32185, ppid = 32173, vppid = 32173,
level = 0, name = "init" }
Confirmed that the process 32173 in the top-level namespace is indeed
the lxc-start command that created the container and its namespace.
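The per-namespace iteration can be sketched in user space as follows. The structures and dump_task_states() are hypothetical models written for illustration; they are not the kernel's PID structures nor the statedump code itself.

```c
#include <assert.h>

#define SKETCH_MAX_NS_LEVEL	8

/*
 * Model of a task's identifiers: one numeric ID per namespace level,
 * index 0 being the top-level namespace.
 */
struct sketch_task {
	unsigned int level;		/* deepest namespace level */
	int tid[SKETCH_MAX_NS_LEVEL];	/* tid as seen from each level */
};

struct sketch_state_entry {
	int tid;		/* ID in the top-level namespace */
	int vtid;		/* ID as seen from namespace `level` */
	unsigned int level;
};

/*
 * Emit one entry per namespace the task belongs to, deepest level first.
 * Returns the number of entries written to `out`.
 */
static unsigned int dump_task_states(const struct sketch_task *t,
		struct sketch_state_entry *out)
{
	unsigned int n = 0;

	assert(t->level < SKETCH_MAX_NS_LEVEL);
	for (int lvl = (int)t->level; lvl >= 0; lvl--) {
		out[n].tid = t->tid[0];
		out[n].vtid = t->tid[lvl];
		out[n].level = (unsigned int)lvl;
		n++;
	}
	return n;
}
```

Fed with the "init" process from the example (tid 32185 at level 0, tid 1 at level 1), the model yields two entries: level 1 with vtid 1, then level 0 with vtid 32185.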