Also remove the confusing comment about checking if a fd exists. I
could not find one instance in the entire kernel that still matches
the description or the reason for the name fcheck.
The need for better names became apparent in the last round of
discussion of this set of changes[1].
The canary __canary__get_pfnblock_flags_mask has never done its job of
detecting changes to the prototype of get_pfnblock_flags_mask because
it was actually calling the wrapper, because the wrapper/page_alloc.h
header maps get_pfnblock_flags_mask to wrapper_get_pfnblock_flags_mask.
Unfortunately, this wrapper is included by page_alloc.c only _after_ the
linux/pageblock-flags.h header is included, which means the
get_pfnblock_flags_mask prototype does _not_ have the wrapper prefix,
which prevents it from being useful for any kind of type validation.
This has been detected by a compiler warning stating that
wrapper_get_pfnblock_flags_mask() does not have a prior declaration.
Move the wrapper/page_alloc.h include _before_ including
pageblock-flags.h. This ensures the declaration has the wrapper_ prefix,
and therefore the compiler compares the declaration with the definition
of wrapper_get_pfnblock_flags_mask within page_alloc.c. The canary
function can be removed because it is redundant with this type check.
With this proper type check in place, we notice the following two
changes upstream:
commit 535b81e209219 ("mm/page_alloc.c: remove unnecessary end_bitidx for [set|get]_pfnblock_flags_mask()")
introduced in v5.9 removes the end_bitidx argument.
commit ca891f41c4c79 ("mm: constify get_pfnblock_flags_mask and get_pfnblock_migratetype")
introduced in v5.14 adds a const qualifier to the struct page pointer.
Adapt the code to match the evolution of this prototype.
Timers are added to the timer wheel off by one. This is required in
case a timer is queued directly before incrementing jiffies to prevent
early timer expiry.
When reading a timer trace and relying only on the expiry time of the timer
in the timer_start trace point and on the now in the timer_expiry_entry
trace point, it seems that the timer fires late. With the current
timer_expiry_entry trace point information only now=jiffies is printed but
not the value of base->clk. This makes it impossible to draw a conclusion
to the index of base->clk and makes it impossible to examine timer problems
without additional trace points.
Therefore add the base->clk value to the timer_expire_entry trace
point, to be able to calculate the index the timer base is located at
during collecting expired timers.
Change-Id: I2ebdbb637db0966ff51f45bf66916a59a496b50c Signed-off-by: Kienan Stewart <kstewart@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Kienan Stewart [Fri, 22 Mar 2024 13:28:08 +0000 (09:28 -0400)]
Fix: build kvm probe on EL 8.4+
The lower value of the EL range, 240.15.1, corresponds to the first
import of EL r8 kernels into Rocky Linux's kernel staging repo.
The change may have been introduced in an earlier RHEL 8 kernel,
prior to the history of imports into Rocky.
Change-Id: Icefe472d43e28cc09746e9e046b12299609ebab1 Signed-off-by: Kienan Stewart <kstewart@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Kienan Stewart [Fri, 22 Mar 2024 13:55:55 +0000 (09:55 -0400)]
Fix: support ext4_journal_start on EL 8.4+
The lower value of the EL range, 240.15.1, corresponds to the first
import of EL r8 kernels into Rocky Linux's kernel staging repo.
The change may have been introduced in an earlier RHEL 8 kernel,
prior to the history of imports into Rocky.
Change-Id: Ibec02b382478bee33947d079f33835823827f4c5 Signed-off-by: Kienan Stewart <kstewart@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Explicitly include mmu.h in spte.h instead of relying on the "parent" to
include mmu.h. spte.h references a variety of macros and variables that
are defined/declared in mmu.h, and so including spte.h before (or instead
of) mmu.h will result in build errors, e.g.
Signed-off-by: Kienan Stewart <kstewart@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I5c3fc87d3b006cefbcca198e6e15868a342cb8dd
Kienan Stewart [Fri, 8 Mar 2024 17:47:06 +0000 (12:47 -0500)]
Fix: Correct minimum version in jbd2 SLE kernel range
This range was introduced in commit b49650509ff072d37ec112cf45a5f14f382c9a31;
however, the range is wrong and worked because the kernel versions
(eg. `5.14.21-150400.24.100-default`) were evaluated to values
greater than `LTTNG_SLE_KERNEL_RANGE(5,14,21,24,46,1)`.
As a result builds of lttng-modules against older versions of SLE
kernels failed.
Change-Id: I23d97d84a23c7b24e957fe943932d6aefbe1b409 Signed-off-by: Kienan Stewart <kstewart@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Kienan Stewart [Fri, 8 Mar 2024 16:26:02 +0000 (11:26 -0500)]
Fix: Handle recent SLE major version codes
Starting in early 2022, the SLE linux version codes changed from the
previous style `5.3.18-59.40.1` to a new convention in which the major
version is a compound number consisting of the major release version,
the service pack version, and the auxillary version (currently unused
from my understanding) similar to the following `5.3.18-150300.59.43.1`[1].
The newer values used in the SLE major version causes the integer
value to "overflow" the expected number of digits and the comparisons
may fail. The `LTTNG_SLE_KERNEL_VERSION` macro also multiplies the
`LTTNG_KERNEL_VERSION` by `100000000ULL` which doesn't work in all
situations, as the resulting value is too large to be stored fully in
an `unsigned long long`.
Example of previous results:
```
// Example range comparison. True or false depending on the value of
// `LTTNG_SLE_VERSION_CODE` and `LTTNG_LINUX_VERSION_CODE`.
LTTNG_SLE_KERNEL_RANGE(5,15,21,150400,24,46, 5,15,0,0,0,0);
`LTTNG_KERNEL_VERSION` packs the kernel version into a 32-bit integer;
however, using that type of packing on the SLE kernel version will not
work well:
In this patch, the SLE version is packed into a 64-bit integer
with 48 bits for the major version, 8 bits for each of the minor and
patch versions.
As a result of packing the SLE version into a 64-bit integer,
it is not possible to coherently combine an `LTTNG_KERNEL_VERSION` and
an `LTTNG_SLE_KERNEL_VERSION`. Doing so would require an integer
larger than 64-bits. Therefore, the `LTTNG_SLE_KERNEL_RANGE` macro has
been adjusted to perform the range comparisons using the two values
separately. The usage of the `LTTNG_SLE_KERNEL_RANGE` remains
unchanged, as `LTTNG_SLE_VERSION` is only used inside that macro.
Using the adjusted macros:
```
// Example range comparison. True or false depending on the value of
// `LTTNG_SLE_VERSION_CODE` and `LTTNG_LINUX_VERSION_CODE`.
LTTNG_SLE_KERNEL_RANGE(5,15,21,150400,24,46, 5,15,0,0,0,0);
It's possible that future releases of SLE kernels have minor or patch
values that exceed 255 (SLE15SP1 has a release using `197`, for example),
requiring an adjustment to using more bits for those fields when
packing into a 64-bit integer.
The schema of multiplying an `LTTNG_KERNEL_VERSION` by a large value
is used for other distributions. RHEL in particular uses
`100000000ULL`, which could lead to overflow issues with certain
comparisons similar to the previous behaviour of
`LTTNG_SLE_KERNEL_VERSION(5,15,0,0,0,0);`.
`struct load_op` has a trailing 0-length array `data` member that is
used to refer, in the context of BYTECODE_OP_LOAD_STAR_GLOB_STRING, to
an immediate string operand that follows it.
During the validation of a filtering bytecode, strnlen is properly used
to determine the size of the immediate string operand, with a `maxlen`
parameter that is used to ensure the string operand is contained within
the bytecode (see lttng-bytecode-validator.c:434).
However, recent KSPP-related changes have enabled additional overrun
checks when statically-sized and flexible arrays are used. Those are
enabled when the kernel is built with CONFIG_UBSAN_BOUNDS and/or
CONFIG_FORTIFY_SOURCE configured.
The KBUILD CFLAGS now contain `-fstrict-flex-arrays=3`, which is
recognized by gcc 13+[1] and allows proper coverage of dynamically sized
trailing arrays when those configuration options are used.
With those validations in place, the kernel assumes that the `data`
array is truly of length 0 and it BUGs to warn of an invalid access.
The commit linked above contains a number of links explaining the
rationale for transitioning uses of the trailing zero-length arrays (a
gcc extension) to C99 flexible array members (FAM).
This was discussed at this year's GNU Cauldron [2].
Solution
--------
Uses of zero-length arrays (`foo[0]`) are replaced by flexible array
members (`foo[]`). The only cases that are left untouched are those
where the zero-length array is used to indicate the end of a
structure (i.e. it doesn't indicate that a variable number of elements
follow), see the `metadata_packet_header`, `metadata_record_header`,
`event_notifier_packet_header`, and `event_notifier_record_header`
structures.
It may be desirable to use the new `counted_by` attribute for some of
those in the future (`lttng_kernel_abi_filter_bytecode`,
`lttng_kernel_abi_capture_bytecode`, and `bytecode_runtime`) [3].
Note
----
While this is tagged as a memory handling 'fix', it has no security
implication as far as I can tell. The accesses that are flagged by the
new validations were valid.
This merely allows the runtime validations to understand the memory
layout properly.
sched/tracing: Append prev_state to tp args instead
Commit fa2c3254d7cf (sched/tracing: Don't re-read p->state when emitting
sched_switch event, 2022-01-20) added a new prev_state argument to the
sched_switch tracepoint, before the prev task_struct pointer.
This reordering of arguments broke BPF programs that use the raw
tracepoint (e.g. tp_btf programs). The type of the second argument has
changed and existing programs that assume a task_struct* argument
(e.g. for bpf_task_storage access) will now fail to verify.
If we instead append the new argument to the end, all existing programs
would continue to work and can conditionally extract the prev_state
argument on supported kernel versions.
Change-Id: Ife2ec88a8bea2743562590cbd357068d7773863f Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
There is no good reason to keep genhd.h separate from the main blkdev.h
header that includes it. So fold the contents of genhd.h into blkdev.h
and remove genhd.h entirely.
Change-Id: I7cf2aaa3a4c133320b95f2edde49f790f9515dbd Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
The print format error was found when using ftrace event:
<...>-1406 [000] .... 23599442.895823: jbd2_end_commit: dev 252,8 transaction -1866216965 sync 0 head -1866217368
<...>-1406 [000] .... 23599442.896299: jbd2_start_commit: dev 252,8 transaction -1866216964 sync 0
Use the correct print format for transaction, head and tid.
Change-Id: Ieee3d39ed1f2515e096e87d18b5ea8f921c54bd0 Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
The print format error was found when using ftrace event:
<...>-1406 [000] .... 23599442.895823: jbd2_end_commit: dev 252,8 transaction -1866216965 sync 0 head -1866217368
<...>-1406 [000] .... 23599442.896299: jbd2_start_commit: dev 252,8 transaction -1866216964 sync 0
Use the correct print format for transaction, head and tid.
Change-Id: I7601f5cbb86495c2607be7b11e02724c90b3ebf9 Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
The print format error was found when using ftrace event:
<...>-1406 [000] .... 23599442.895823: jbd2_end_commit: dev 252,8 transaction -1866216965 sync 0 head -1866217368
<...>-1406 [000] .... 23599442.896299: jbd2_start_commit: dev 252,8 transaction -1866216964 sync 0
Use the correct print format for transaction, head and tid.
Change-Id: Ic053f0e0c1e24ebc75bae51d07696aaa5e1c0094 Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
x86 x32 system calls are not supported by LTTng. They are currently not
traced simply because their system call number is beyond the range of
NR_compat_syscalls.
However, this mostly happens by accident rather than by design.
Enforce this with an explicit check for in_x32_syscall(), which clearly
documents that those are not supported.
Fix: bytecode interpreter: LOAD_FIELD: handle user fields
The instructions for recursive traversal through composed types
are used by the filter expressions which access fields nested within
composed types.
Instructions BYTECODE_OP_LOAD_FIELD_STRING and
BYTECODE_OP_LOAD_FIELD_SEQUENCE were leaving the "user" attribute
uninitialized. Initialize those to 0.
The handling of userspace strings and integers is missing in LOAD_FIELD
instructions. Therefore, ensure that the specialization leaves the
generic LOAD_FIELD instruction in place for userspace input.
Add a "user" attribute to:
- struct bytecode_get_index_data elem field (produced by the
specialization),
- struct vstack_load used by the specialization,
- struct load_ptr used by the interpreter.
Use this "user" attribute in dynamic_load_field() for integer, string
and string_sequence object types to ensure that the proper
userspace-aware accesses are performed when loading those fields.
This prevents events with userspace input arguments (e.g. pipe2 system
call fildes field) from oopsing the kernel or reading arbitrary kernel
memory when used by the filter bytecode.
The "user" field attribute (copy from userspace) is not taken into
account in the bytecode specialization and interpreter recursive
traversal through composed types (LOAD_FIELD bytecode instructions).
Those are currently used by the filter expressions which access fields
nested within composed types.
Move the "user" attribute from the event fields to the integer and
string types. This will allow ensuring that the bytecode specialization
and interpreter have access to this user attribute even in nested types
(e.g. arrays, sequences) in a subsequent change.
He Zhe [Tue, 27 Sep 2022 07:59:42 +0000 (15:59 +0800)]
wrapper: powerpc64: fix kernel crash caused by do_get_kallsyms
Kernel crashes on powerpc64 ABIv2 as follow when lttng_tracer initializes,
since do_get_kallsyms in lttng_wrapper fails to return a proper address of
kallsyms_lookup_name.
root@qemuppc64:~# lttng create trace_session --live -U net://127.0.0.1
Spawning a session daemon
lttng_kretprobes: loading out-of-tree module taints kernel.
BUG: Unable to handle kernel data access on read at 0xfffffffffffffff8
Faulting instruction address: 0xc0000000001f6fd0
Oops: Kernel access of bad area, sig: 11 [#1]
<snip>
NIP [c0000000001f6fd0] module_kallsyms_lookup_name+0xf0/0x180
LR [c0000000001f6f28] module_kallsyms_lookup_name+0x48/0x180
Call Trace:
module_kallsyms_lookup_name+0x34/0x180 (unreliable)
kallsyms_lookup_name+0x258/0x2b0
wrapper_kallsyms_lookup_name+0x4c/0xd0 [lttng_wrapper]
wrapper_get_pfnblock_flags_mask_init+0x28/0x60 [lttng_wrapper]
lttng_events_init+0x40/0x344 [lttng_tracer]
do_one_initcall+0x78/0x340
do_init_module+0x6c/0x2f0
__do_sys_finit_module+0xd0/0x120
system_call_exception+0x194/0x2f0
system_call_vectored_common+0xe8/0x278
<snip>
do_get_kallsyms makes use of kprobe_register and in turn kprobe_lookup_name
to get the address of the kernel function kallsyms_lookup_name. In case of
PPC64_ELF_ABI_v2, when kprobes are placed at function entry,
kprobe_lookup_name adjusts the global entry point of the function returned
by kallsyms_lookup_name to the local entry point(at some fixed offset of
global one). This adjustment is all for kprobes to be able to work properly.
Global and local entry point are defined in powerpc64 ABIv2.
When the local entry point is given, some instructions at the beginning of
the function are skipped and thus causes the above kernel crash. We just
want to make a simple function call which needs global entry point.
This patch adds 4 bytes which is the length of one instruction to
kallsyms_lookup_name so that it will not trigger the global to local
adjustment, and then substracts 4 bytes from the returned address. See the
following kernel change for more details.
Michael Jeanson [Wed, 10 Aug 2022 15:07:14 +0000 (11:07 -0400)]
fix: tie compaction probe build to CONFIG_COMPACTION
The definition of 'struct compact_control' in 'mm/internal.h' depends on
CONFIG_COMPACTION being defined. Only build the compaction probe when
this configuration option is enabled.
Thanks to Bruce Ashfield <bruce.ashfield@gmail.com> for reporting this
issue.
Change-Id: I81e77aa9c1bf10452c152d432fe5224df0db42c9 Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Tue, 31 May 2022 19:24:48 +0000 (15:24 -0400)]
fix: 'random' tracepoints removed in stable kernels
The upstream commit 14c174633f349cb41ea90c2c0aaddac157012f74 removing
the 'random' tracepoints is being backported to multiple stable kernel
branches, I don't see how that qualifies as a fix but here we are.
Use the presence of 'include/trace/events/random.h' in the kernel source
tree instead of the rather tortuous version check to determine if we
need to build 'lttng-probe-random.ko'.
Change-Id: I8f5f2f4c9e09c61127c49c7949b22dd3fab0460d Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
These explicit tracepoints aren't really used and show sign of aging.
It's work to keep these up to date, and before I attempted to keep them
up to date, they weren't up to date, which indicates that they're not
really used. These days there are better ways of introspecting anyway.
Signed-off-by: He Zhe <zhe.he@windriver.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I0b7eb8aa78b5bd2039e20ae3e1da4c5eb9018789
These explicit tracepoints aren't really used and show sign of aging.
It's work to keep these up to date, and before I attempted to keep them
up to date, they weren't up to date, which indicates that they're not
really used. These days there are better ways of introspecting anyway.
Change-Id: I3b8c3e2732e7efdd76ce63204ac53a48784d0df6 Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
commit ad5e2bb2 ("Fix: tracepoint event: allow same provider and event name")
erroneously removes an include from the tracepoint event header. This
issue crept into the cherry-pick of the patch into the stable-2.12
branch.
The commit "fix: mm: compaction: fix the migration stats in trace_mm_compaction_migratepages() (v5.17)"
Triggers this warning:
LTTng: event provider mismatch: The event name needs to start with provider name + _ + one or more letter, provider: compaction, event name: mm_compaction_migratepages
Rename SKB_DROP_REASON_SOCKET_FILTER, which is used
as the reason of skb drop out of socket filter before
it's part of a released kernel. It will be used for
more protocols than just TCP in future series.
random: rather than entropy_store abstraction, use global
Originally, the RNG used several pools, so having things abstracted out
over a generic entropy_store object made sense. These days, there's only
one input pool, and then an uneven mix of usage via the abstraction and
usage via &input_pool. Rather than this uneasy mixture, just get rid of
the abstraction entirely and have things always use the global. This
simplifies the code and makes reading it a bit easier.
Change-Id: I1a2a14d7b6e69a047804e1e91e00fe002f757431 Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
btrfs: pass fs_info to trace_btrfs_transaction_commit
The root on the trans->root can be anything, and generally we're
committing from the transaction kthread so it's usually the tree_root.
Change this to just take an fs_info, and to maintain compatibility
simply put the ROOT_TREE_OBJECTID as the root objectid for the
tracepoint. This will allow use to remove trans->root.
Change-Id: Ie5a4804330edabffac0714fcb9c25b8c8599e424 Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
mm: compaction: fix the migration stats in trace_mm_compaction_migratepages()
Now the migrate_pages() has changed to return the number of {normal
page, THP, hugetlb} instead, thus we should not use the return value to
calculate the number of pages migrated successfully. Instead we can
just use the 'nr_succeeded' which indicates the number of normal pages
migrated successfully to calculate the non-migrated pages in
trace_mm_compaction_migratepages().
block: don't call blk_status_to_errno in blk_update_request
We only need to call it to resolve the blk_status_t -> errno mapping for
tracing, so move the conversion into the tracepoints that are not called
at all when tracing isn't enabled.
Change-Id: Ic556cee1d82e44a93a1467f55d45b6e17a48d387 Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
KVM: MMU: change tracepoints arguments to kvm_page_fault
Pass struct kvm_page_fault to tracepoints instead of extracting the
arguments from the struct. This also lets the kvm_mmu_spte_requested
tracepoint pick the gfn directly from fault->gfn, instead of using
the address.
Change-Id: I5ee3f344a8d1cd0ed185cdeecb3a3121183c9b43 Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
KVM: x86: Get exit_reason as part of kvm_x86_ops.get_exit_info
Extend the get_exit_info static call to provide the reason for the VM
exit. Modify relevant trace points to use this rather than extracting
the reason in the caller.
Change-Id: I28903e658eb7cbfc6666e35ba4cffba5e49d1445 Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Daniel Gomez [Fri, 8 Oct 2021 17:38:02 +0000 (19:38 +0200)]
fix: implicit-int error in EXPORT_SYMBOL_GPL
Add module header to fix error:
error: type defaults to 'int' in declaration of 'EXPORT_SYMBOL_GPL'
[-Werror=implicit-int]
Log:
/workdir/build/workspace/sources/lttng-modules/src/wrapper/random.c:54:1:
warning: data definition has no type or storage class
54 EXPORT_SYMBOL_GPL(wrapper_get_bootid);
^~~~~~~~~~~~~~~~~
/workdir/build/workspace/sources/lttng-modules/src/wrapper/random.c:54:1:
error: type defaults to 'int' in declaration of 'EXPORT_SYMBOL_GPL'
[-Werror=implicit-int]
/workdir/build/workspace/sources/lttng-modules/src/wrapper/random.c:54:1:
warning: parameter names (without types) in function declaration
cc1: some warnings being treated as errors
make[3]: ***
[/workdir/build/tmp/work-shared/qt5222/kernel-source/scripts/Makefile.build:271:
/workdir/build/workspace/sources/lttng-modules/src/wrapper/random.o]
Signed-off-by: Daniel Gomez <daniel@qtec.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: If1b96f6fcc19988b94dc09b12d071f2c61491b83
Michael Jeanson [Mon, 13 Sep 2021 18:16:22 +0000 (14:16 -0400)]
fix: Revert "Makefile: Enable -Wimplicit-fallthrough for Clang" (v5.15)
Starting with v5.15, "-Wimplicit-fallthrough=5" was added to the build
flags which requires the use of "__attribute__((__fallthrough__))" to
annotate fallthrough case statements.
It turns out that the problem with the clang -Wimplicit-fallthrough
warning is not about the kernel source code, but about clang itself, and
that the warning is unusable until clang fixes its broken ways.
In particular, when you enable this warning for clang, you not only get
warnings about implicit fallthroughs. You also get this:
warning: fallthrough annotation in unreachable code [-Wimplicit-fallthrough]
which is completely broken becasue it
(a) doesn't even tell you where the problem is (seriously: no line
numbers, no filename, no nothing).
(b) is fundamentally broken anyway, because there are perfectly valid
reasons to have a fallthrough statement even if it turns out that
it can perhaps not be reached.
In the kernel, an example of that second case is code in the scheduler:
switch (state) {
case cpuset:
if (IS_ENABLED(CONFIG_CPUSETS)) {
cpuset_cpus_allowed_fallback(p);
state = possible;
break;
}
fallthrough;
case possible:
where if CONFIG_CPUSETS is enabled you actually never hit the
fallthrough case at all. But that in no way makes the fallthrough
wrong.
So the warning is completely broken, and enabling it for clang is a very
bad idea.
In the meantime, we can keep the gcc option enabled, and make the gcc
build use
-Wimplicit-fallthrough=5
which means that we will at least continue to require a proper
fallthrough statement, and that gcc won't silently accept the magic
comment versions. Because gcc does this all correctly, and while the odd
"=5" part is kind of obscure, it's documented in [1]:
"-Wimplicit-fallthrough=5 doesn’t recognize any comments as
fallthrough comments, only attributes disable the warning"
so if clang ever fixes its bad behavior we can try enabling it there again.
Change-Id: Iea69849592fb69ac04fb9bb28efcd6b8dce8ba88 Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
The counting 'rwsem' hackery of get|put_online_cpus() is going to be
replaced by percpu rwsem.
Rename the functions to make it clear that it's locking and not some
refcount style interface. These new functions will be used for the
preparatory patches which make the code ready for the percpu rwsem
conversion.
Rename all instances in the cpu hotplug code while at it.
Change-Id: I5a37cf5afc075a402b7347989fac637dfa60a1ed Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change the type and name of task_struct::state. Drop the volatile and
shrink it to an 'unsigned int'. Rename it in order to find all uses
such that we can use READ_ONCE/WRITE_ONCE as appropriate.
Change-Id: I3a379192d6b977753fe58d4f67833a78dd7a0a47 Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
btrfs: pass btrfs_inode to btrfs_writepage_endio_finish_ordered()
There is a pretty bad abuse of btrfs_writepage_endio_finish_ordered() in
end_compressed_bio_write().
It passes compressed pages to btrfs_writepage_endio_finish_ordered(),
which is only supposed to accept inode pages.
Thankfully the important info here is the inode, so let's pass
btrfs_inode directly into btrfs_writepage_endio_finish_ordered(), and
make @page parameter optional.
By this, end_compressed_bio_write() can happily pass page=NULL while
still getting everything done properly.
Also, to cooperate with such modification, replace @page parameter for
trace_btrfs_writepage_end_io_hook() with btrfs_inode.
Although this removes page_index info, the existing start/len should be
enough for most usage.
Change-Id: If96e99c2d9533d96d9d1aa6460bb7fd3ac9ed7ab Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Tue, 11 May 2021 21:22:12 +0000 (17:22 -0400)]
Disable block rwbs bitwise enum in default build
Only generate the bitwise enumerations when
CONFIG_LTTNG_EXPERIMENTAL_BITWISE_ENUM is enabled, so the default build
does not generate traces which lead to warnings when viewed with
babeltrace 1.x and babeltrace 2 with default options.
Michael Jeanson [Tue, 11 May 2021 21:15:15 +0000 (17:15 -0400)]
Disable sched_switch bitwise enum in default build
Only generate the bitwise enumerations when
CONFIG_LTTNG_EXPERIMENTAL_BITWISE_ENUM is enabled, so the default build
does not generate traces which lead to warnings when viewed with
babeltrace 1.x and babeltrace 2 with default options.
Michael Jeanson [Wed, 12 May 2021 20:21:50 +0000 (16:21 -0400)]
Add experimental bitwise enum config option
Only generate the bitwise enumerations when
CONFIG_LTTNG_EXPERIMENTAL_BITWISE_ENUM is enabled, so the default build
does not generate traces which lead to warnings when viewed with
babeltrace 1.x and babeltrace 2 with default options.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: Id54ae3df470b9cdbc0edc0a528fa79532493d1ad
The only use of I_DIRTY_TIME_EXPIRE is to detect in
__writeback_single_inode() that inode got there because flush worker
decided it's time to writeback the dirty inode time stamps (either
because we are syncing or because of age). However we can detect this
directly in __writeback_single_inode() and there's no need for the
strange propagation with I_DIRTY_TIME_EXPIRE flag.
Change-Id: I6e7c0ced13acd4fcd88bcd572d0ba1f9b254c58c Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Michael Jeanson [Mon, 10 May 2021 15:39:24 +0000 (11:39 -0400)]
fix: block: remove disk_part_iter (v5.12)
In v5.12 a refactoring of the genhd code was started and the symbols
related to 'disk_part_iter' were unexported. In v5.13 they were
completely removed.
This patch replaces the short lived compat code that is specific to
v5.12 and replaces it with a generic internal implementation that
iterates directly on the 'disk->part_tbl' xarray which will be used
on v5.12 and up.
This seems like a better option than keeping the compat code that will
only work on v5.12 and make maintenance more complicated. The compat was
backported to the stable branches but isn't yet part of a point release
so can be safely replaced.
Now that no fast path lookups in the partition table are left, there is
no point in micro-optimizing the data structure for it. Just use a bog
standard xarray.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I43d8ef8463cb7a83dc8859a32dc29502cd897ddf
Fix: increment buffer offset when failing to copy from user-space
Upon failure to copy from user-space due to failing access ok check, the
ring buffer offset is not incremented, which could generate unreadable
traces because we don't account for the padding we write into the ring
buffer.
Note that this typically won't affect a common use-case of copying
strings from user-space, because unless mprotect is invoked within a
narrow race window (between user strlen and user strcpy), the strlen
will fail on access ok when calculating the space to reserve, which will
match what happens on strcpy.
Sync `show_inode_state()` macro with Ubuntu 4.15 kernel
The following commit changed the `show_inode_state()` macro which
triggered a warning on our CI build:
commit 63388062bea96e5cd8b8d7abf7b7142f8666ca1f
Author: Jan Kara <jack@suse.cz>
Date: Mon Jan 25 12:37:43 2021 -0800
writeback: Drop I_DIRTY_TIME_EXPIRE
Also, this commit adds a comment to clarify why we keep these
`#if/#elif` even though we don't use it the macro.
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I2dd53a1a286ab8a431977bda6cde01f700f0c7d9
He Zhe [Mon, 19 Apr 2021 09:09:28 +0000 (09:09 +0000)]
fix: mm, tracing: kfree event name mismatching with provider kmem (v5.12)
a8bc8ae5c932 ("fix: mm, tracing: record slab name for kmem_cache_free() (v5.12)")
introduces the following call trace for kfree. This is caused by mismatch
between kfree event and its provider kmem.