lttng-modules.git
9 months ago - Fix: ext4_discard_preallocations changed in linux 6.8.0-rc3
Kienan Stewart [Mon, 5 Feb 2024 13:52:29 +0000 (08:52 -0500)] 
Fix: ext4_discard_preallocations changed in linux 6.8.0-rc3

See upstream commit:

    commit f0e54b6087de9571ec61c189d6c378b81edbe3b2
    Author: Kemeng Shi <shikemeng@huaweicloud.com>
    Date:   Fri Jan 5 17:21:02 2024 +0800

        ext4: remove 'needed' in trace_ext4_discard_preallocations

        As 'needed' to trace_ext4_discard_preallocations is always 0 which
        is meaningless. Just remove it.

Change-Id: Ib6b698ca553c4beebd4ca791c83bbbb927901758
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
9 months ago - Fix: btrfs_get_extent flags and compress_type changed in linux 6.8.0-rc1
Kienan Stewart [Mon, 22 Jan 2024 18:13:36 +0000 (13:13 -0500)] 
Fix: btrfs_get_extent flags and compress_type changed in linux 6.8.0-rc1

See upstream commit:

    commit f86f7a75e2fb5fd7d31d00eab8a392f97ba42ce9
    Author: Filipe Manana <fdmanana@suse.com>
    Date:   Mon Dec 4 16:20:33 2023 +0000

        btrfs: use the flags of an extent map to identify the compression type

        Currently, in struct extent_map, we use an unsigned int (32 bits) to
        identify the compression type of an extent and an unsigned long (64 bits
        on a 64 bits platform, 32 bits otherwise) for flags. We are only using
        6 different flags, so an unsigned long is excessive and we can use flags
        to identify the compression type instead of using a dedicated 32 bits
        field.

        We can easily have tens or hundreds of thousands (or more) of extent maps
        on busy and large filesystems, specially with compression enabled or many
        or large files with tons of small extents. So it's convenient to have the
        extent_map structure as small as possible in order to use less memory.

        So remove the compression type field from struct extent_map, use flags
        to identify the compression type and shorten the flags field from an
        unsigned long to a u32. This saves 8 bytes (on 64 bits platforms) and
        reduces the size of the structure from 136 bytes down to 128 bytes, using
        now only two cache lines, and increases the number of extent maps we can
        have per 4K page from 30 to 32. By using a u32 for the flags instead of
        an unsigned long, we no longer use test_bit(), set_bit() and clear_bit(),
        but that level of atomicity is not needed as most flags are never cleared
        once set (before adding an extent map to the tree), and the ones that can
        be cleared or set after an extent map is added to the tree, are always
        performed while holding the write lock on the extent map tree, while the
        reader holds a lock on the tree or tests for a flag that never changes
        once the extent map is in the tree (such as compression flags).

Change-Id: I95402d43f064c016b423b48652e4968d3db9b8a9
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
9 months ago - Fix: btrfs_chunk tracepoints changed in linux 6.8.0-rc1
Kienan Stewart [Mon, 22 Jan 2024 17:17:33 +0000 (12:17 -0500)] 
Fix: btrfs_chunk tracepoints changed in linux 6.8.0-rc1

See upstream commit:

    commit 7dc66abb5a47778d7db327783a0ba172b8cff0b5
    Author: Filipe Manana <fdmanana@suse.com>
    Date:   Tue Nov 21 13:38:38 2023 +0000

        btrfs: use a dedicated data structure for chunk maps

        Currently we abuse the extent_map structure for two purposes:

        1) To actually represent extents for inodes;
        2) To represent chunk mappings.

        This is odd and has several disadvantages:

        1) To create a chunk map, we need to do two memory allocations: one for
           an extent_map structure and another one for a map_lookup structure, so
           more potential for an allocation failure and more complicated code to
           manage and link two structures;

        2) For a chunk map we actually only use 3 fields (24 bytes) of the
           respective extent map structure: the 'start' field to have the logical
           start address of the chunk, the 'len' field to have the chunk's size,
           and the 'orig_block_len' field to contain the chunk's stripe size.

           Besides wasting a memory, it's also odd and not intuitive at all to
           have the stripe size in a field named 'orig_block_len'.

           We are also using 'block_len' of the extent_map structure to contain
           the chunk size, so we have 2 fields for the same value, 'len' and
           'block_len', which is pointless;

        3) When an extent map is associated to a chunk mapping, we set the bit
           EXTENT_FLAG_FS_MAPPING on its flags and then make its member named
           'map_lookup' point to the associated map_lookup structure. This means
           that for an extent map associated to an inode extent, we are not using
           this 'map_lookup' pointer, so wasting 8 bytes (on a 64 bits platform);

        4) Extent maps associated to a chunk mapping are never merged or split so
           it's pointless to use the existing extent map infrastructure.

        So add a dedicated data structure named 'btrfs_chunk_map' to represent
        chunk mappings, this is basically the existing map_lookup structure with
        some extra fields:

        1) 'start' to contain the chunk logical address;
        2) 'chunk_len' to contain the chunk's length;
        3) 'stripe_size' for the stripe size;
        4) 'rb_node' for insertion into a rb tree;
        5) 'refs' for reference counting.

        This way we do a single memory allocation for chunk mappings and we don't
        waste memory for them with unused/unnecessary fields from an extent_map.

        We also save 8 bytes from the extent_map structure by removing the
        'map_lookup' pointer, so the size of struct extent_map is reduced from
        144 bytes down to 136 bytes, and we can now have 30 extents map per 4K
        page instead of 28.

Change-Id: Ie52b5ac83df4bc6abeb84d958c4f5d24ae0d8c75
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
9 months ago - Fix: strlcpy removed in linux 6.8.0-rc1
Kienan Stewart [Mon, 22 Jan 2024 16:47:40 +0000 (11:47 -0500)] 
Fix: strlcpy removed in linux 6.8.0-rc1

See upstream commit:

    commit d26270061ae66b915138af7cd73ca6f8b85e6b44
    Author: Kees Cook <keescook@chromium.org>
    Date:   Thu Jan 18 12:31:55 2024 -0800

        string: Remove strlcpy()

        With all the users of strlcpy() removed[1] from the kernel, remove the
        API, self-tests, and other references. Leave mentions in Documentation
        (about its deprecation), and in checkpatch.pl (to help migrate host-only
        tools/ usage). Long live strscpy().

The replacement interface, `strscpy`, has been available since linux
4.3, introduced in the upstream commit
30c44659f4a3e7e1f9f47e895591b4b40bf62671.

As lttng-modules master branch targets linux 4.4+ at this time,
`strlcpy` can be replaced with `strscpy`.
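
A hedged sketch of the mechanical replacement (the helper name and call site
are illustrative; lttng-modules has several such call sites):

    #include <linux/string.h>

    /* Illustrative helper; the real call sites copy into fixed-size buffers. */
    static void copy_name(char *dst, size_t dst_size, const char *src)
    {
            /* strlcpy(dst, src, dst_size);   -- removed in Linux 6.8 */
            strscpy(dst, src, dst_size);      /* in-tree replacement since 4.3 */
    }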

Change-Id: I27cdff70a504b25340cc59150ed8e959d9629e43
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
9 months ago - Fix: timer_start changed in linux 6.8.0-rc1
Kienan Stewart [Mon, 22 Jan 2024 16:33:39 +0000 (11:33 -0500)] 
Fix: timer_start changed in linux 6.8.0-rc1

See upstream commit

    commit dbcdcb62b59db2cf6a24113873b90da15c6f0b19
    Author: Anna-Maria Behnsen <anna-maria@linutronix.de>
    Date:   Fri Dec 1 10:26:26 2023 +0100

        tracing/timers: Enhance timer_start tracepoint

        For starting a timer, the timer is enqueued into a bucket of the timer
        wheel. The bucket expiry is the defacto expiry of the timer but it is not
        equal the timer expiry because of increasing granularity when bucket is in
        a higher level of the wheel. To be able to figure out in a trace whether a
        timer expired in time or not, the bucket expiry time is required as well.

        Add bucket expiry time to the timer_start tracepoint and thereby simplify
        the arguments.

Change-Id: I4868092765745b1efd0c48f13c0b837f2007dcb6
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
9 months ago - Fix: sched_stat_runtime changed in linux 6.8.0-rc1
Kienan Stewart [Mon, 22 Jan 2024 16:10:37 +0000 (11:10 -0500)] 
Fix: sched_stat_runtime changed in linux 6.8.0-rc1

See upstream commit:

    commit 5fe6ec8f6ab549b6422e41551abb51802bd48bc7
    Author: Peter Zijlstra <peterz@infradead.org>
    Date:   Mon Nov 6 13:41:43 2023 +0100

        sched: Remove vruntime from trace_sched_stat_runtime()

        Tracing the runtime delta makes sense, observer can sum over time.
        Tracing the absolute vruntime makes less sense, inconsistent:
        absolute-vs-delta, but also vruntime delta can be computed from
        runtime delta.

        Removing the vruntime thing also makes the two tracepoint sites
        identical, allowing to unify the code in a later patch.

Change-Id: I24ebb4e06dbb646a1af75ac62b74f3821ff197de
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
10 months ago - Version 2.13.11 (tag: v2.13.11)
Mathieu Desnoyers [Wed, 10 Jan 2024 20:35:48 +0000 (15:35 -0500)] 
Version 2.13.11

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I0f91343f361271cc5b51f5fade12c7cc7ed90da4

10 months ago - Fix: Include linux/sched/rt.h for kernels v3.9 to v3.14
Mathieu Desnoyers [Wed, 10 Jan 2024 01:55:58 +0000 (20:55 -0500)] 
Fix: Include linux/sched/rt.h for kernels v3.9 to v3.14

From kernel v3.0 to v3.8, MAX_RT_PRIO is defined in linux/sched.h.

From kernel v3.9 to v3.14, MAX_RT_PRIO is defined in linux/sched/rt.h,
which is not included by linux/sched.h (hence this work-around).

From kernel v3.15 onwards, MAX_RT_PRIO is defined in linux/sched/prio.h,
which is included by linux/sched.h.

Add the missing linux/sched/rt.h include for the affected kernel version
range.
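
A sketch of the resulting version-gated include; the actual lttng-modules
wrapper uses its own kernel-version helper macros, so the exact guard may
differ:

    #include <linux/version.h>
    #include <linux/sched.h>

    #if LINUX_VERSION_CODE >= KERNEL_VERSION(3, 9, 0) && \
        LINUX_VERSION_CODE < KERNEL_VERSION(3, 15, 0)
    /* MAX_RT_PRIO lives in linux/sched/rt.h for v3.9..v3.14 only. */
    #include <linux/sched/rt.h>
    #endif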

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: Ie7e1d9dc710621deca04553a9b5ba7f9a4d83c15

10 months ago - Fix: Disable IBT around indirect function calls
Mathieu Desnoyers [Mon, 8 Jan 2024 18:31:04 +0000 (13:31 -0500)] 
Fix: Disable IBT around indirect function calls

When the Intel IBT feature is enabled, a CPU supporting this feature
validates that all indirect jumps/calls land on an ENDBR64 instruction.

The kernel seals functions which are not meant to be called indirectly,
which means that calling functions indirectly from their address fetched
using kallsyms or kprobes trigger a crash.

Use the MSR_IA32_S_CET CET_ENDBR_EN MSR bit to temporarily disable ENDBR
validation around indirect calls to kernel functions. Considering that
the main purpose of this feature is to prevent ROP-style attacks,
disabling the ENDBR validation temporarily around the call from a kernel
module does not affect the ROP protection.
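
A minimal sketch of that approach, assuming x86-64, that the caller disables
preemption around the pair, and the MSR/feature names from asm/msr-index.h
and asm/cpufeatures.h; the actual lttng-modules wrapper may differ:

    #include <linux/types.h>
    #include <asm/cpufeature.h>
    #include <asm/msr.h>
    #include <asm/msr-index.h>

    /* Clear CET_ENDBR_EN before an indirect call; returns the saved MSR value. */
    static u64 lttng_ibt_disable(void)
    {
            u64 msr_val = 0;

            if (cpu_feature_enabled(X86_FEATURE_IBT)) {
                    rdmsrl(MSR_IA32_S_CET, msr_val);
                    wrmsrl(MSR_IA32_S_CET, msr_val & ~CET_ENDBR_EN);
            }
            return msr_val;
    }

    /* Restore the saved value after the indirect call returns. */
    static void lttng_ibt_restore(u64 msr_val)
    {
            if (cpu_feature_enabled(X86_FEATURE_IBT))
                    wrmsrl(MSR_IA32_S_CET, msr_val);
    }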

Fixes: #1408
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I97f5d8efce093c1e956cede1f44de2fcebf30227

10 months ago - Inline implementation of task_prio()
Mathieu Desnoyers [Tue, 9 Jan 2024 15:36:31 +0000 (10:36 -0500)] 
Inline implementation of task_prio()

The task_prio() function has been implemented as "return p->prio -
MAX_RT_PRIO;" since at least kernel v3.0, so inline it into
lttng-modules rather than using kallsyms to call the kernel
implementation.
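
The inlined helper then boils down to the following sketch (naming is
illustrative):

    #include <linux/sched.h>

    /* Mirrors the kernel's task_prio() implementation. */
    static inline int lttng_task_prio(struct task_struct *p)
    {
            return p->prio - MAX_RT_PRIO;
    }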

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I7dd482a2da72a005c16b3e5864767b47d7bc3fd3

10 months ago - Fix: prio context NULL pointer exception
Mathieu Desnoyers [Tue, 9 Jan 2024 15:33:13 +0000 (10:33 -0500)] 
Fix: prio context NULL pointer exception

A missing call to wrapper_task_prio_init() causes the function pointer
for task_prio to stay NULL, which triggers an OOPS when trying to use the
prio context.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I417e84cb8a07db624e682c7ec2c033fbc2a7b8e7

11 months ago - Fix: MODULE_IMPORT_NS is introduced in kernel 5.4
Mathieu Desnoyers [Mon, 18 Dec 2023 18:17:07 +0000 (13:17 -0500)] 
Fix: MODULE_IMPORT_NS is introduced in kernel 5.4

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I4c5faafb3a3ff8178b45c0e411113b17643bbc78

11 months ago - Android: Import VFS namespace for android common kernel
Lei wang [Mon, 18 Dec 2023 10:16:33 +0000 (05:16 -0500)] 
Android: Import VFS namespace for android common kernel

The Android GKI kernel adds limitations on fs interface usage. The VFS
namespace needs to be imported explicitly to make lttng-modules work on
such kernels.

Signed-off-by: Lei wang <quic_leiwan@quicinc.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 months ago - Fix: get_file_rcu is missing in kernels < 4.1
Mathieu Desnoyers [Fri, 1 Dec 2023 14:52:08 +0000 (09:52 -0500)] 
Fix: get_file_rcu is missing in kernels < 4.1

Open-code get_file_rcu() using atomic_long_inc_not_zero() for kernel
versions < 4.1.
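
A sketch of the open-coded fallback, assuming the pre-4.1 struct file layout
with an atomic_long_t f_count (the helper name is illustrative):

    #include <linux/atomic.h>
    #include <linux/fs.h>

    /* Take a reference only if the file is not already being freed. */
    static inline int lttng_get_file_rcu(struct file *file)
    {
            return atomic_long_inc_not_zero(&file->f_count);
    }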

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I0fa905b078165ede8b1837bb8d77891d05d0e8ed

11 months ago - fix: lookup_fd_rcu replaced by lookup_fdget_rcu in linux 6.7.0-rc1
Kienan Stewart [Mon, 20 Nov 2023 16:34:40 +0000 (11:34 -0500)] 
fix: lookup_fd_rcu replaced by lookup_fdget_rcu in linux 6.7.0-rc1

See upstream commit:

    commit 0ede61d8589cc2d93aa78230d74ac58b5b8d0244
    Author: Christian Brauner <brauner@kernel.org>
    Date:   Fri Sep 29 08:45:59 2023 +0200

        file: convert to SLAB_TYPESAFE_BY_RCU

        In recent discussions around some performance improvements in the file
        handling area we discussed switching the file cache to rely on
        SLAB_TYPESAFE_BY_RCU which allows us to get rid of call_rcu() based
        freeing for files completely. This is a pretty sensitive change overall
        but it might actually be worth doing.

        The main downside is the subtlety. The other one is that we should
        really wait for Jann's patch to land that enables KASAN to handle
        SLAB_TYPESAFE_BY_RCU UAFs. Currently it doesn't but a patch for this
        exists.

        With SLAB_TYPESAFE_BY_RCU objects may be freed and reused multiple times
        which requires a few changes. So it isn't sufficient anymore to just
        acquire a reference to the file in question under rcu using
        atomic_long_inc_not_zero() since the file might have already been
        recycled and someone else might have bumped the reference.

        In other words, callers might see reference count bumps from newer
        users. For this reason it is necessary to verify that the pointer is the
        same before and after the reference count increment. This pattern can be
        seen in get_file_rcu() and __files_get_rcu().

        In addition, it isn't possible to access or check fields in struct file
        without first aqcuiring a reference on it. Not doing that was always
        very dodgy and it was only usable for non-pointer data in struct file.
        With SLAB_TYPESAFE_BY_RCU it is necessary that callers first acquire a
        reference under rcu or they must hold the files_lock of the fdtable.
        Failing to do either one of this is a bug.

        Thanks to Jann for pointing out that we need to ensure memory ordering
        between reallocations and pointer check by ensuring that all subsequent
        loads have a dependency on the second load in get_file_rcu() and
        providing a fixup that was folded into this patch.
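
For reference, a rough sketch of the verify-after-increment pattern described
above; this is simplified, not upstream or lttng-modules code, and the fd-slot
access is hypothetical:

    #include <linux/fs.h>
    #include <linux/rcupdate.h>

    static struct file *get_file_rcu_sketch(struct file __rcu **slot)
    {
            struct file *file;

            rcu_read_lock();
            for (;;) {
                    file = rcu_dereference(*slot);
                    if (!file)
                            break;
                    /* The object may be recycled: take a reference, then re-check. */
                    if (atomic_long_inc_not_zero(&file->f_count)) {
                            if (file == rcu_dereference(*slot))
                                    break;      /* still the same file */
                            fput(file);         /* recycled: drop and retry */
                    }
            }
            rcu_read_unlock();
            return file;
    }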

Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Change-Id: Iba3663f19a54820afd31a8eeec24b3b5d4b06589
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 months ago - fix: mm, vmscan signatures changed in linux 6.7.0-rc1
Kienan Stewart [Mon, 20 Nov 2023 16:33:14 +0000 (11:33 -0500)] 
fix: mm, vmscan signatures changed in linux 6.7.0-rc1

See upstream commit:

    commit 3dfbb555c98ac55b9d911f9af0e35014b445fb41
    Author: Vlastimil Babka <vbabka@suse.cz>
    Date:   Thu Sep 14 15:16:39 2023 +0200

        mm, vmscan: remove ISOLATE_UNMAPPED

        This isolate_mode_t flag is effectively unused since 89f6c88a6ab4 ("mm:
        __isolate_lru_page_prepare() in isolate_migratepages_block()") as
        sc->may_unmap is now checked directly (and only node_reclaim has a mode
        that sets it to 0).  The last remaining place is mm_vmscan_lru_isolate
        tracepoint for the isolate_mode parameter.  That one was mainly used to
        indicate the active/inactive mode, which the trace-vmscan-postprocess.pl
        script consumed, but that got silently broken.  After fixing the script by
        the previous patch, it does not need the isolate_mode anymore.  So just
        remove the parameter and with that the whole ISOLATE_UNMAPPED flag.

Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: Ie7346886d926a1a9d20bcb1570c587c5e943a1c3

11 months ago - fix: phys_proc_id and cpu_core_id moved in linux 6.7.0-rc1
Kienan Stewart [Mon, 20 Nov 2023 16:27:12 +0000 (11:27 -0500)] 
fix: phys_proc_id and cpu_core_id moved in linux 6.7.0-rc1

See upstream commit:

    commit 02fb601d27a7abf60d52b21bdf5b100a8d63da3f
    Author: Thomas Gleixner <tglx@linutronix.de>
    Date:   Mon Aug 14 10:18:30 2023 +0200

        x86/cpu: Move phys_proc_id into topology info

        Rename it to pkg_id which is the terminology used in the kernel.

        No functional change.

See upstream commit:

    commit e95256335d45cc965cd12c423535002974313340
    Author: Thomas Gleixner <tglx@linutronix.de>
    Date:   Mon Aug 14 10:18:34 2023 +0200

        x86/cpu: Move cpu_core_id into topology info

        Rename it to core_id and stick it to the other ID fields.

        No functional change.

Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I574b02430210d5bb72c4b9db901d0e3a6dc7bea0

13 months ago - Fix build for RHEL 8.8 with linux 4.18.0-477.10.1+
Kienan Stewart [Mon, 16 Oct 2023 14:10:09 +0000 (10:10 -0400)] 
Fix build for RHEL 8.8 with linux 4.18.0-477.10.1+

4.18.0-477.10.1 backports a change which updates the `kfree_skb` trace
event to the 3-argument version used in more recent kernel versions.

Change-Id: I5a1071a59659b76e1499beae3388159ca8ced1f7
Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
13 months ago - Fix: bytecode validator: oops during validation of immediate string
Jérémie Galarneau [Thu, 5 Oct 2023 21:02:57 +0000 (17:02 -0400)] 
Fix: bytecode validator: oops during validation of immediate string

Issue observed
--------------

Running Linux 6.5.5, lttng-modules @ 6be48c9f, all built with gcc
13.2.1, I got a 'BUG' in dmesg while enabling the following event
rule:

  $ lttng enable-event --kernel --syscall --channel chanK --all --filter '$ctx.procname == "UST reg*"'

The relevant parts of the 'BUG' output follow:

  [  +0.715480] detected buffer overflow in strnlen
  [  +0.000001] kernel BUG at lib/string_helpers.c:1031!
  [  +0.000008] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
  [  +0.000003] CPU: 2 PID: 157174 Comm: Client manageme Tainted: G S   U     OE      6.5.5-arch1-1 #1 d82a0f532dd8cfe67d5795c1738d9c01059a0c62
  [  +0.000001] RIP: 0010:fortify_panic+0x13/0x20
  [  +0.000006] Code: 41 5d c3 cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 48 89 fe 48 c7 c7 90 22 c8 86 e8 3d aa b1 ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90
  [  +0.000002] RSP: 0018:ffffa7c7c106f918 EFLAGS: 00010246
  [  +0.000002] RAX: 0000000000000023 RBX: 000000000000000b RCX: 0000000000000000
  [  +0.000002] RDX: 0000000000000000 RSI: ffff92766e4a16c0 RDI: ffff92766e4a16c0
  [  +0.000001] RBP: 0000000000000000 R08: 0000000000000000 R09: ffffa7c7c106f7c0
  [  +0.000001] R10: 0000000000000003 R11: ffffffff874ca068 R12: ffff927618202480
  [  +0.000001] R13: ffff9276182024d2 R14: ffff927453999c08 R15: ffff9273dc7aa478
  [  +0.000001] FS:  00007f06553f9680(0000) GS:ffff92766e480000(0000) knlGS:0000000000000000
  [  +0.000002] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [  +0.000002] CR2: 0000556d54eceaa8 CR3: 00000001ad9de002 CR4: 00000000003706e0
  [  +0.000001] Call Trace:
  [  +0.000002]  <TASK>
  [  +0.000002]  ? die+0x36/0x90
  [  +0.000004]  ? do_trap+0xda/0x100
  [  +0.000003]  ? fortify_panic+0x13/0x20
  [  +0.000002]  ? do_error_trap+0x6a/0x90
  [  +0.000002]  ? fortify_panic+0x13/0x20
  [  +0.000002]  ? exc_invalid_op+0x50/0x70
  [  +0.000003]  ? fortify_panic+0x13/0x20
  [  +0.000002]  ? asm_exc_invalid_op+0x1a/0x20
  [  +0.000005]  ? fortify_panic+0x13/0x20
  [  +0.000002]  ? fortify_panic+0x13/0x20
  [  +0.000003]  bytecode_validate_overflow+0x155/0x1f0 [lttng_tracer 759e3e4fee0e774ef575e93b67e8dc7955d0c2c2]
  [  +0.000330]  lttng_bytecode_validate_load+0x32/0x1e0 [lttng_tracer 759e3e4fee0e774ef575e93b67e8dc7955d0c2c2]
  [  +0.000183]  lttng_enabler_link_bytecode+0x135/0x5a0 [lttng_tracer 759e3e4fee0e774ef575e93b67e8dc7955d0c2c2]
  [  +0.000132]  lttng_sync_event_list+0xef/0x650 [lttng_tracer 759e3e4fee0e774ef575e93b67e8dc7955d0c2c2]
  [  +0.000123]  ? __wake_up_common+0x73/0x180
  [  +0.000004]  lttng_session_enable+0x3e/0x130 [lttng_tracer 759e3e4fee0e774ef575e93b67e8dc7955d0c2c2]
  [  +0.000121]  lttng_session_ioctl+0x5db/0x720 [lttng_tracer 759e3e4fee0e774ef575e93b67e8dc7955d0c2c2]
  [  +0.000120]  ? __slab_free+0xf1/0x330
  [  +0.000004]  ? __scm_recv_common.isra.0+0x144/0x180
  [  +0.000004]  ? unix_stream_read_generic+0x233/0xb60
  [  +0.000006]  __x64_sys_ioctl+0x94/0xd0
  [  +0.000004]  do_syscall_64+0x5d/0x90
  [  +0.000004]  ? switch_fpu_return+0x50/0xe0
  [  +0.000004]  ? exit_to_user_mode_prepare+0x132/0x1e0
  [  +0.000003]  ? syscall_exit_to_user_mode+0x2b/0x40
  [  +0.000002]  ? do_syscall_64+0x6c/0x90
  [  +0.000003]  ? do_syscall_64+0x6c/0x90
  [  +0.000002]  ? do_syscall_64+0x6c/0x90
  [  +0.000002]  ? do_syscall_64+0x6c/0x90
  [  +0.000002]  ? syscall_exit_to_user_mode+0x2b/0x40
  [  +0.000002]  ? do_syscall_64+0x6c/0x90
  [  +0.000002]  ? do_syscall_64+0x6c/0x90
  [  +0.000002]  ? do_syscall_64+0x6c/0x90
  [  +0.000002]  ? do_syscall_64+0x6c/0x90
  [  +0.000002]  ? exc_page_fault+0x7f/0x180
  [  +0.000003]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8

Cause
-----

`struct load_op` has a trailing 0-length array `data` member that is
used to refer, in the context of BYTECODE_OP_LOAD_STAR_GLOB_STRING, to
an immediate string operand that follows it.

During the validation of a filtering bytecode, strnlen is properly used
to determine the size of the immediate string operand, with a `maxlen`
parameter that is used to ensure the string operand is contained within
the bytecode (see lttng-bytecode-validator.c:434).

However, recent KSPP-related changes have added overrun checks for
statically-sized and flexible arrays. Those checks are enabled when the
kernel is built with CONFIG_UBSAN_BOUNDS and/or CONFIG_FORTIFY_SOURCE.

The KBUILD CFLAGS now contain `-fstrict-flex-arrays=3`, which is
recognized by gcc 13+[1] and allows proper coverage of dynamically sized
trailing arrays when those configuration options are used.

With those validations in place, the kernel assumes that the `data`
array is truly of length 0 and it BUGs to warn of an invalid access.

The kernel commit referenced above [1] contains a number of links explaining the
rationale for transitioning uses of the trailing zero-length arrays (a
gcc extension) to C99 flexible array members (FAM).

This was discussed at this year's GNU Cauldron [2].

Solution
--------

Uses of zero-length arrays (`foo[0]`) are replaced by flexible array
members (`foo[]`). The only cases that are left untouched are those
where the zero-length array is used to indicate the end of a
structure (i.e. it doesn't indicate that a variable number of elements
follow), see the `metadata_packet_header`, `metadata_record_header`,
`event_notifier_packet_header`, and `event_notifier_record_header`
structures.

It may be desirable to use the new `counted_by` attribute for some of
those in the future (`lttng_kernel_abi_filter_bytecode`,
`lttng_kernel_abi_capture_bytecode`, and `bytecode_runtime`) [3].
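
Illustrative shape of the change for `struct load_op` (field types simplified,
only the relevant members shown):

    struct load_op {
            unsigned char op;
            /* char data[0];     before: GNU zero-length array extension */
            char data[];      /* after: C99 flexible array member */
    } __attribute__((packed));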

Note
----

While this is tagged as a memory handling 'fix', it has no security
implication as far as I can tell. The accesses that are flagged by the
new validations were valid.

This merely allows the runtime validations to understand the memory
layout properly.

[1] https://github.com/torvalds/linux/commit/df8fc4e934c12b906d08050d7779f292b9c5c6b5
[2] https://gcc.gnu.org/wiki/cauldron2023talks?action=AttachFile&do=get&target=Most-wanted+Security+Features+in+GCC+for+Linux+Kernel.pdf
[3] https://lwn.net/Articles/930943/

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: Id39b101aaafe68f8fae6b86cd61806cba8cb1e6a

13 months ago - fix: lttng-probe-kvm-x86-mmu build with linux 6.6
Kienan Stewart [Tue, 26 Sep 2023 18:45:09 +0000 (14:45 -0400)] 
fix: lttng-probe-kvm-x86-mmu build with linux 6.6

A small change was made upstream in `spte.h` that requires
`arch/x86/kvm` to be added to the search path when
building lttng-probe-kvm-x86-mmu.o.

See upstream commit :

  commit d10f3780bc2f80744d291e118c0c8bade54ed3b8
  Author: Sean Christopherson <seanjc@google.com>
  Date:   Tue Aug 8 15:40:59 2023 -0700

      KVM: x86/mmu: Include mmu.h in spte.h

      Explicitly include mmu.h in spte.h instead of relying on the "parent" to
      include mmu.h.  spte.h references a variety of macros and variables that
      are defined/declared in mmu.h, and so including spte.h before (or instead
      of) mmu.h will result in build errors, e.g.

Signed-off-by: Kienan Stewart <kstewart@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I5c3fc87d3b006cefbcca198e6e15868a342cb8dd

15 months ago - fix: built-in lttng with kernel >= v6.1
Michael Jeanson [Fri, 18 Aug 2023 15:28:30 +0000 (11:28 -0400)] 
fix: built-in lttng with kernel >= v6.1

In kernel v6.1 the list of subdirectories was moved from Makefile to
Kbuild. Adjust our built-in.sh script to detect this change and use the
appropriate file to graft ourselves onto the kernel build system.

Thanks to Richa Bharti for the initial patch.

See upstream commit:

  commit 5750121ae7382ebac8d47ce6d68012d6cd1d7926
  Author: Masahiro Yamada <masahiroy@kernel.org>
  Date:   Sun Sep 25 03:19:10 2022 +0900

    kbuild: list sub-directories in ./Kbuild

    Use the ordinary obj-y syntax to list subdirectories.

Change-Id: Ifc0f1bdea5ee59b0e0b96cdb31c9c689deb20559
Reported-by: Richa Bharti <Richa.Bharti@siemens.com>
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
16 months ago - fix: ubuntu kinetic kernel range for jbd2
Michael Jeanson [Fri, 7 Jul 2023 17:27:15 +0000 (13:27 -0400)] 
fix: ubuntu kinetic kernel range for jbd2

Kinetic introduces a 'lowlatency' kernel flavor with a different ABI number
than the 'generic' flavor; add two kernel version ranges accordingly.

Change-Id: I89427e30672f3f25b2f6d698d6e1cabfb45d9366
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
17 months ago - Version 2.13.10 (tag: v2.13.10)
Mathieu Desnoyers [Wed, 7 Jun 2023 14:53:24 +0000 (10:53 -0400)] 
Version 2.13.10

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I9d1d358ba28dd0f9f68b54c52e784d002d9bc74c

19 months ago - Add support for RHEL 9.1
Michael Jeanson [Fri, 14 Apr 2023 19:09:25 +0000 (15:09 -0400)] 
Add support for RHEL 9.1

Change-Id: I2aaa8e385448b1e46c3c16edc4f36f2eb6906e76
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
19 months ago - Add support for RHEL 9.0
Michael Jeanson [Tue, 19 Jul 2022 19:07:22 +0000 (15:07 -0400)] 
Add support for RHEL 9.0

Change-Id: Ia01527c3d6243805445734f00f4f2f945efd16e7
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
19 months ago - fix: kallsyms wrapper on CONFIG_PPC64_ELF_ABI_V1
Michael Jeanson [Tue, 29 Nov 2022 17:10:17 +0000 (12:10 -0500)] 
fix: kallsyms wrapper on CONFIG_PPC64_ELF_ABI_V1

Change-Id: Ibdff5792a1511b678f7776f5d032758db739c5ad
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
20 months ago - fix: net: add location to trace_consume_skb() (v6.3)
Michael Jeanson [Tue, 7 Mar 2023 16:10:26 +0000 (11:10 -0500)] 
fix: net: add location to trace_consume_skb() (v6.3)

See upstream commit :

  commit dd1b527831a3ed659afa01b672d8e1f7e6ca95a5
  Author: Eric Dumazet <edumazet@google.com>
  Date:   Thu Feb 16 15:47:18 2023 +0000

    net: add location to trace_consume_skb()

    kfree_skb() includes the location, it makes sense
    to add it to consume_skb() as well.

Change-Id: I8d871187d90e7fe113a63e209b00aebe0df475f3
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
20 months ago - fix: btrfs: pass find_free_extent_ctl to allocator tracepoints (v6.3)
Michael Jeanson [Tue, 7 Mar 2023 16:26:25 +0000 (11:26 -0500)] 
fix: btrfs: pass find_free_extent_ctl to allocator tracepoints (v6.3)

See upstream commit :

  commit cfc2de0fce015d4249c674ef9f5e0b4817ba5c53
  Author: Boris Burkov <boris@bur.io>
  Date:   Thu Dec 15 16:06:31 2022 -0800

    btrfs: pass find_free_extent_ctl to allocator tracepoints

    The allocator tracepoints currently have a pile of values from ffe_ctl.
    In modifying the allocator and adding more tracepoints, I found myself
    adding to the already long argument list of the tracepoints. It makes it
    a lot simpler to just send in the ffe_ctl itself.

Change-Id: Iab4132a9d3df3a6369591a50fb75374b1e399fa4
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
20 months ago - fix: uuid: Decouple guid_t and uuid_le types and respective macros (v6.3)
Michael Jeanson [Tue, 7 Mar 2023 17:05:00 +0000 (12:05 -0500)] 
fix: uuid: Decouple guid_t and uuid_le types and respective macros (v6.3)

See upstream commit :

  commit 5e6a51787fef20b849682d8c49ec9c2beed5c373
  Author: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
  Date:   Tue Jan 24 15:38:38 2023 +0200

    uuid: Decouple guid_t and uuid_le types and respective macros

    The guid_t type and respective macros are being used internally only.
    The uuid_le has its user outside the kernel. Decouple these types and
    macros, and make guid_t completely internal type to the kernel.

Change-Id: I8644fd139b0630e9cf18886b84e33bffab1e5abd
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
20 months ago - fix: mm: introduce vma->vm_flags wrapper functions (v6.3)
Michael Jeanson [Tue, 7 Mar 2023 16:41:14 +0000 (11:41 -0500)] 
fix: mm: introduce vma->vm_flags wrapper functions (v6.3)

See upstream commit :

  commit bc292ab00f6c7a661a8a605c714e8a148f629ef6
  Author: Suren Baghdasaryan <surenb@google.com>
  Date:   Thu Jan 26 11:37:47 2023 -0800

    mm: introduce vma->vm_flags wrapper functions

    vm_flags are among VMA attributes which affect decisions like VMA merging
    and splitting.  Therefore all vm_flags modifications are performed after
    taking exclusive mmap_lock to prevent vm_flags updates racing with such
    operations.  Introduce modifier functions for vm_flags to be used whenever
    flags are updated.  This way we can better check and control correct
    locking behavior during these updates.

Change-Id: I2cf662420d9d7748e5e310d3ea4bac98ba7d7f94
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
20 months ago - Version 2.13.9 (tag: v2.13.9)
Mathieu Desnoyers [Fri, 3 Mar 2023 15:39:24 +0000 (10:39 -0500)] 
Version 2.13.9

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: Ie5399d8f24102ee78aa6950222aa64289bbdb6ed

22 months ago - fix: jbd2: use the correct print format (v5.4.229)
Michael Jeanson [Wed, 18 Jan 2023 21:32:04 +0000 (16:32 -0500)] 
fix: jbd2: use the correct print format (v5.4.229)

See upstream commit :

  commit ecb9d0d2e123874bcdd2efdecda0f4e0c3dc566d
  Author: Bixuan Cui <cuibixuan@linux.alibaba.com>
  Date:   Tue Oct 11 19:33:44 2022 +0800

    jbd2: use the correct print format

    [ Upstream commit d87a7b4c77a997d5388566dd511ca8e6b8e8a0a8 ]

    The print format error was found when using ftrace event:
        <...>-1406 [000] .... 23599442.895823: jbd2_end_commit: dev 252,8 transaction -1866216965 sync 0 head -1866217368
        <...>-1406 [000] .... 23599442.896299: jbd2_start_commit: dev 252,8 transaction -1866216964 sync 0

    Use the correct print format for transaction, head and tid.

Change-Id: Ieee3d39ed1f2515e096e87d18b5ea8f921c54bd0
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
22 months ago - fix: jbd2 upper bound for v5.10.163
Michael Jeanson [Tue, 17 Jan 2023 17:16:04 +0000 (12:16 -0500)] 
fix: jbd2 upper bound for v5.10.163

Use the correct upper bound of 5,11,0.

Change-Id: I435b44b940c7346ed8c3ef0d445365ed156702d0
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
22 months ago - fix: jbd2: use the correct print format (v5.10.163)
Michael Jeanson [Tue, 17 Jan 2023 16:03:12 +0000 (11:03 -0500)] 
fix: jbd2: use the correct print format (v5.10.163)

See upstream commit :

  commit d87a7b4c77a997d5388566dd511ca8e6b8e8a0a8
  Author: Bixuan Cui <cuibixuan@linux.alibaba.com>
  Date:   Tue Oct 11 19:33:44 2022 +0800

    jbd2: use the correct print format

    The print format error was found when using ftrace event:
        <...>-1406 [000] .... 23599442.895823: jbd2_end_commit: dev 252,8 transaction -1866216965 sync 0 head -1866217368
        <...>-1406 [000] .... 23599442.896299: jbd2_start_commit: dev 252,8 transaction -1866216964 sync 0

    Use the correct print format for transaction, head and tid.

Change-Id: I7601f5cbb86495c2607be7b11e02724c90b3ebf9
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
22 months ago - fix: btrfs: move accessor helpers into accessors.h (v6.2)
Michael Jeanson [Mon, 16 Jan 2023 20:01:51 +0000 (15:01 -0500)] 
fix: btrfs: move accessor helpers into accessors.h (v6.2)

See upstream commit :

  commit 07e81dc94474eb62705c6f96d9ab1a5a797b8703
  Author: Josef Bacik <josef@toxicpanda.com>
  Date:   Wed Oct 19 10:51:00 2022 -0400

    btrfs: move accessor helpers into accessors.h

    This is a large patch, but because they're all macros it's impossible to
    split up.  Simply copy all of the item accessors in ctree.h and paste
    them in accessors.h, and then update any files to include the header so
    everything compiles.

Change-Id: I1f0876dd8b7a8687f6802b60c3e3baabd017cc52
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
22 months ago - Version 2.13.8 (tag: v2.13.8)
Mathieu Desnoyers [Fri, 13 Jan 2023 21:08:06 +0000 (16:08 -0500)] 
Version 2.13.8

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I3686e4e44475aa8cd6aa435f74e4994f5e36da2e

22 months ago - fix: jbd2: use the correct print format
Michael Jeanson [Thu, 12 Jan 2023 18:52:22 +0000 (13:52 -0500)] 
fix: jbd2: use the correct print format

See upstream commit :

  commit d87a7b4c77a997d5388566dd511ca8e6b8e8a0a8
  Author: Bixuan Cui <cuibixuan@linux.alibaba.com>
  Date:   Tue Oct 11 19:33:44 2022 +0800

    jbd2: use the correct print format

    The print format error was found when using ftrace event:
        <...>-1406 [000] .... 23599442.895823: jbd2_end_commit: dev 252,8 transaction -1866216965 sync 0 head -1866217368
        <...>-1406 [000] .... 23599442.896299: jbd2_start_commit: dev 252,8 transaction -1866216964 sync 0

    Use the correct print format for transaction, head and tid.

Change-Id: Ic053f0e0c1e24ebc75bae51d07696aaa5e1c0094
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
23 months ago - Fix: in_x32_syscall was introduced in v4.7.0
Mathieu Desnoyers [Thu, 1 Dec 2022 16:33:20 +0000 (11:33 -0500)] 
Fix: in_x32_syscall was introduced in v4.7.0

Prior to v4.7.0, is_x32_task() was the API to query whether the current
system call is following the x32 ABI.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I783bd3bb46ec5e863ae209f79cee2f1bb415e661

23 months ago - Explicitly skip tracing x32 system calls
Mathieu Desnoyers [Wed, 30 Nov 2022 20:41:02 +0000 (15:41 -0500)] 
Explicitly skip tracing x32 system calls

x86 x32 system calls are not supported by LTTng. They are currently not
traced simply because their system call number is beyond the range of
NR_compat_syscalls.

However, this mostly happens by accident rather than by design.

Enforce this with an explicit check for in_x32_syscall(), which clearly
documents that those are not supported.
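
A sketch of the guard, assuming it runs in syscall entry/exit context on
x86-64; the wrapper name is illustrative, and the pre-v4.7 fallback reflects
the companion fix above:

    #include <linux/types.h>
    #include <linux/version.h>
    #include <asm/compat.h>

    static inline bool lttng_in_x32_syscall(void)
    {
    #if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 7, 0)
            return in_x32_syscall();
    #else
            return is_x32_task();   /* pre-v4.7 name for the same query */
    #endif
    }

    /* In the syscall probes: if (lttng_in_x32_syscall()) return; */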

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I1235c32c5cf03612bf9c36785cf7c4f8f49d292b

23 months ago - fix: kallsyms wrapper on ppc64el
Michael Jeanson [Thu, 24 Nov 2022 19:25:33 +0000 (14:25 -0500)] 
fix: kallsyms wrapper on ppc64el

The 'PPC64_ELF_ABI_v2' macro in 'asm/types.h' was removed in v5.19 and
replaced by a config option 'CONFIG_PPC64_ELF_ABI_V2'.

See upstream commit :

  commit 5b89492c03e5c0a2c259b97d7d4c1bb9b02860aa
  Author: Christophe Leroy <christophe.leroy@csgroup.eu>
  Date:   Mon May 9 07:36:08 2022 +0200

    powerpc: Finalise cleanup around ABI use

    Now that we have CONFIG_PPC64_ELF_ABI_V1 and CONFIG_PPC64_ELF_ABI_V2,
    get rid of all indirect detection of ABI version.

Link: https://lore.kernel.org/r/709d9d69523c14c8a9fba4486395dca0f2d675b1.1652074503.git.christophe.leroy@csgroup.eu
Change-Id: Ibd00e35cab5516a6224bdfa5a6b540119b42dc55
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
2 years ago - fix: Adjust ranges for RHEL 8.6 kernels
Michael Jeanson [Fri, 11 Nov 2022 15:47:54 +0000 (10:47 -0500)] 
fix: Adjust ranges for RHEL 8.6 kernels

Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I0b2c90f3678d0fb4503f61f336a4af185de2b39d

2 years ago - fix: kvm-x86 requires CONFIG_KALLSYMS_ALL
Michael Jeanson [Tue, 8 Nov 2022 16:26:46 +0000 (11:26 -0500)] 
fix: kvm-x86 requires CONFIG_KALLSYMS_ALL

Fixes: #1363
Change-Id: I6da15f77123c393ccb9109b562c7c8dc5bbb96a5
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
2 years ago - fix: mm/slab_common: drop kmem_alloc & avoid dereferencing fields when not using...
Michael Jeanson [Mon, 17 Oct 2022 17:49:51 +0000 (13:49 -0400)] 
fix: mm/slab_common: drop kmem_alloc & avoid dereferencing fields when not using (v6.1)

See upstream commit:

  commit 2c1d697fb8ba6d2d44f914d4268ae1ccdf025f1b
  Author: Hyeonggon Yoo <42.hyeyoo@gmail.com>
  Date:   Wed Aug 17 19:18:24 2022 +0900

    mm/slab_common: drop kmem_alloc & avoid dereferencing fields when not using

    Drop kmem_alloc event class, and define kmalloc and kmem_cache_alloc
    using TRACE_EVENT() macro.

    And then this patch does:
       - Do not pass pointer to struct kmem_cache to trace_kmalloc.
         gfp flag is enough to know if it's accounted or not.
       - Avoid dereferencing s->object_size and s->size when not using kmem_cache_alloc event.
       - Avoid dereferencing s->name in when not using kmem_cache_free event.
       - Adjust s->size to SLOB_UNITS(s->size) * SLOB_UNIT in SLOB

Change-Id: Icd7925731ed4a737699c3746cb7bb7760a4e8009
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
2 years ago - Version 2.13.7 (tag: v2.13.7)
Mathieu Desnoyers [Fri, 30 Sep 2022 21:11:06 +0000 (17:11 -0400)] 
Version 2.13.7

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I28620e23756e3e91965839801ea8828b3f2b919c

2 years ago - Fix: handle integer capture page faults as skip field
Mathieu Desnoyers [Fri, 30 Sep 2022 20:19:16 +0000 (16:19 -0400)] 
Fix: handle integer capture page faults as skip field

Now that we have the appropriate save/restore position mechanism for
error handling in place, we can handle page faults on integer
copy-from-user by skipping the offending captured field entirely rather
than relying on an arbitrary 0 value.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I4ec6243d96753ce7e9c6230563713aeacb126567

2 years ago - Version 2.13.6 (tag: v2.13.6)
Mathieu Desnoyers [Fri, 30 Sep 2022 19:18:34 +0000 (15:18 -0400)] 
Version 2.13.6

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: Idecfca64078038637c4a790adac84d87893d2bdd

2 years ago - Fix: bytecode validator: reject specialized load field/context ref instructions
Mathieu Desnoyers [Fri, 30 Sep 2022 14:14:18 +0000 (10:14 -0400)] 
Fix: bytecode validator: reject specialized load field/context ref instructions

Reject specialized load field/context ref instructions so a bytecode
crafted with nefarious intent cannot:

- Read user-space memory without proper get_user accessors,
- Read a memory area larger than the memory targeted by the instrumentation.

This prevents bytecode received from a tracing group user from oopsing
the kernel or disclosing the content of kernel memory to the tracing
group.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I2bda938a3a050f20be1d3d542aefe638b1b8bf73

2 years ago - Fix: bytecode validator: reject specialized load instructions
Mathieu Desnoyers [Thu, 29 Sep 2022 19:29:21 +0000 (15:29 -0400)] 
Fix: bytecode validator: reject specialized load instructions

Reject specialized load instructions so a bytecode crafted with
nefarious intent cannot:

- Read user-space memory without proper get_user accessors,
- Read a memory area larger than the memory targeted by the instrumentation.

This prevents bytecode received from a tracing group user from oopsing
the kernel or disclosing the content of kernel memory to the tracing
group.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I6bcdf37d4a8601164082b3c24358bf0e765a2c92

2 years ago - Fix: honor "user" attribute for array/sequence of user integers
Mathieu Desnoyers [Thu, 29 Sep 2022 18:26:27 +0000 (14:26 -0400)] 
Fix: honor "user" attribute for array/sequence of user integers

The macro _lttng_kernel_static_type_integer_from_type() should map to
_lttng_kernel_static_type_integer() to pass the "_user" attribute.
Otherwise, userspace fields such as the pipe2 system call's fildes field (a
ctf_user_array()) can trigger NULL pointer exceptions and arbitrary kernel
memory reads if the pipe2 system call receives a bogus pointer as input
while filtering/capture accesses this field.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I44276d751b822f214804184d1ce4d9b10b47d89d

2 years ago - wrapper: powerpc64: fix kernel crash caused by do_get_kallsyms
He Zhe [Tue, 27 Sep 2022 07:59:42 +0000 (15:59 +0800)] 
wrapper: powerpc64: fix kernel crash caused by do_get_kallsyms

The kernel crashes on powerpc64 ABIv2 as follows when lttng_tracer initializes,
since do_get_kallsyms in lttng_wrapper fails to return a proper address for
kallsyms_lookup_name.

root@qemuppc64:~# lttng create trace_session --live -U net://127.0.0.1
Spawning a session daemon
lttng_kretprobes: loading out-of-tree module taints kernel.
BUG: Unable to handle kernel data access on read at 0xfffffffffffffff8
Faulting instruction address: 0xc0000000001f6fd0
Oops: Kernel access of bad area, sig: 11 [#1]
<snip>
NIP [c0000000001f6fd0] module_kallsyms_lookup_name+0xf0/0x180
LR [c0000000001f6f28] module_kallsyms_lookup_name+0x48/0x180
Call Trace:
module_kallsyms_lookup_name+0x34/0x180 (unreliable)
kallsyms_lookup_name+0x258/0x2b0
wrapper_kallsyms_lookup_name+0x4c/0xd0 [lttng_wrapper]
wrapper_get_pfnblock_flags_mask_init+0x28/0x60 [lttng_wrapper]
lttng_events_init+0x40/0x344 [lttng_tracer]
do_one_initcall+0x78/0x340
do_init_module+0x6c/0x2f0
__do_sys_finit_module+0xd0/0x120
system_call_exception+0x194/0x2f0
system_call_vectored_common+0xe8/0x278
<snip>

do_get_kallsyms makes use of kprobe registration, and in turn kprobe_lookup_name,
to get the address of the kernel function kallsyms_lookup_name. In the case of
PPC64_ELF_ABI_v2, when kprobes are placed at function entry,
kprobe_lookup_name adjusts the global entry point of the function returned
by kallsyms_lookup_name to the local entry point (at some fixed offset from
the global one). This adjustment is required for kprobes to work properly.
Global and local entry points are defined by the powerpc64 ABIv2.

When the local entry point is used, some instructions at the beginning of
the function are skipped, which causes the above kernel crash. We just
want to make a simple function call, which needs the global entry point.

This patch adds 4 bytes (the length of one instruction) to the
kallsyms_lookup_name address so that the lookup does not trigger the global
to local adjustment, and then subtracts 4 bytes from the returned address.
See the following kernel change for more details.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=290e3070762ac80e5fc4087d8c4de7e3f1d90aca
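
A rough sketch of the +4/-4 trick using a kprobe-based lookup; this is
simplified, the offset handling only makes sense on PPC64 ELF ABIv2, and the
real do_get_kallsyms differs:

    #include <linux/kprobes.h>

    static unsigned long get_kallsyms_lookup_name_addr(void)
    {
            struct kprobe kp = {
                    .symbol_name = "kallsyms_lookup_name",
                    .offset = 4,    /* avoid the global->local entry point adjustment */
            };
            unsigned long addr = 0;

            if (!register_kprobe(&kp)) {
                    addr = (unsigned long)kp.addr - 4;  /* back to the global entry */
                    unregister_kprobe(&kp);
            }
            return addr;
    }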

Signed-off-by: He Zhe <zhe.he@windriver.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I34e68e886b97e3976d0b5e25be295a8bb866c1a4

2 years ago - Fix: event notification: Remove duplicate event enabled check
Mathieu Desnoyers [Wed, 28 Sep 2022 14:44:05 +0000 (10:44 -0400)] 
Fix: event notification: Remove duplicate event enabled check

The event enabled checks are already done by the event notification
callers, so there is no point in checking them again.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I8033c053d6a601cf646a008d5325d556dba5a8f9

2 years ago - Fix: event notification capture: validate buffer length
Mathieu Desnoyers [Wed, 28 Sep 2022 14:34:42 +0000 (10:34 -0400)] 
Fix: event notification capture: validate buffer length

Validate that the buffer length is large enough to hold empty capture
fields.

If the buffer is initially not large enough to hold empty capture fields
for each field to capture, discard the notification.

If after capturing a field there is not enough room anymore in the
buffer to write empty capture fields, skip the offending large field by
writing an empty capture field in its place.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: Ifa2cdaf084e2ebee2efa052331107cb4d9095243

2 years ago - Fix: handle capture page faults as skip field
Mathieu Desnoyers [Tue, 27 Sep 2022 20:31:29 +0000 (16:31 -0400)] 
Fix: handle capture page faults as skip field

Now that we have the appropriate save/restore position mechanism for
error handling in place, we can handle page faults on copy-from-user by
skipping the offending captured field entirely rather than relying on an
empty string.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: Ibe1e818f57f8218d2b83281a572895884fc28b86

2 years ago - Fix: event notification capture error handling
Mathieu Desnoyers [Tue, 27 Sep 2022 19:07:24 +0000 (15:07 -0400)] 
Fix: event notification capture error handling

When the captured fields end up taking more than 512 bytes of space for
the msgpack message, the notification append capture fails.

Currently, this is handled by emitting a WARN_ON_ONCE() splat on the console,
along with an "Error appending capture to notification" printk warning.

Considering that this kind of error is very much legitimate, spamming
the console with warnings is not the way we want to handle this.

Rather than print a warning on the console, reset the msgpack writer
position to skip the problematic captured field entirely when it is
erroneous.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I4c98dc85266dd7af5e11bbd3d73ab5118c9e03af

2 years ago - Fix: capture_sequence_element_{un,}signed: handle user-space input
Mathieu Desnoyers [Mon, 5 Sep 2022 22:19:16 +0000 (18:19 -0400)] 
Fix: capture_sequence_element_{un,}signed: handle user-space input

The "user" attribute (copy from userspace) is not applied to
sequence/array of integer field capture within event notifications. This
could eventually lead to unsafe copy of integers from user-space.

Currently, the only array/sequence of integers which are read from
user-space are the arguments to sys_select (e.g. `readfds` field). Those
are expressed as "custom" fields, which are skipped by the filter and
capture bytecode.

This is therefore not an issue with the current instrumentation, but we
should properly handle this nevertheless.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: Icf0c141d333f63402d8a76051bcd53fcdd5ed8c2

2 years ago - Fix: notification capture: handle userspace strings
Mathieu Desnoyers [Tue, 6 Sep 2022 15:59:17 +0000 (11:59 -0400)] 
Fix: notification capture: handle userspace strings

The "user" attribute (copy from userspace) is not applied to string
field captures within event notifications. This causes strings copied
from user-space (e.g. the `filename` field from sys_open) to end up going
through strlen/memcpy on user-space data. This can cause a kernel OOPS due to
unhandled page faults, and it also allows reading kernel memory through
the event notification capture mechanism. As a result, the users within
the `tracing` group can read arbitrary kernel memory.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I3241b144fea849004a3f0a19276506c9f1b0d5e5

2 years ago - Implement lttng_msgpack_write_user_str
Mathieu Desnoyers [Tue, 6 Sep 2022 15:57:58 +0000 (11:57 -0400)] 
Implement lttng_msgpack_write_user_str

Implement lttng_msgpack_write_user_str to allow safely capturing
user-space strings.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I0354382cdd599b041fd20e59bb673fda7d72b2be

2 years ago - Fix: bytecode interpreter: LOAD_FIELD: handle user fields
Mathieu Desnoyers [Tue, 6 Sep 2022 19:10:17 +0000 (15:10 -0400)] 
Fix: bytecode interpreter: LOAD_FIELD: handle user fields

The instructions for recursive traversal through composed types
are used by the capture bytecode, and by filter expressions which
access fields nested within composed types.

Instructions BYTECODE_OP_LOAD_FIELD_STRING and
BYTECODE_OP_LOAD_FIELD_SEQUENCE were leaving the "user" attribute
uninitialized. Initialize those to 0.

The handling of userspace strings and integers is missing in LOAD_FIELD
instructions. Therefore, ensure that the specialization leaves the
generic LOAD_FIELD instruction in place for userspace input.

Add a "user" attribute to:
- struct bytecode_get_index_data elem field (produced by the
  specialization),
- struct vstack_load used by the specialization,
- struct load_ptr used by the interpreter.
- struct lttng_interpreter_output used by the event notification
  capture.

Use this "user" attribute in dynamic_load_field() for integer, string
and string_sequence object types to ensure that the proper
userspace-aware accesses are performed when loading those fields.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: Ib8d4db5b7da5064e5897ab3802ab47e063607036

2 years ago - Fix: move "user" attribute from field to type
Mathieu Desnoyers [Mon, 5 Sep 2022 20:45:39 +0000 (16:45 -0400)] 
Fix: move "user" attribute from field to type

The "user" field attribute (copy from userspace) is not taken into
account in the bytecode specialization and interpreter recursive
traversal through composed types (LOAD_FIELD bytecode instructions).

Those are currently used by the event notification capture bytecode, and
by filter expressions which access fields nested within composed types.

Move the "user" attribute from the event fields to the integer and
string types. This will allow ensuring that the bytecode specialization,
interpreter and event notification output capture have access to this
user attribute even in nested types (e.g. arrays, sequences) in a
subsequent change.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I044a0845b256b5e2cf65aa0888af2b906678d19d

2 years ago - Introduce lttng_copy_from_user_check_nofault
Mathieu Desnoyers [Mon, 5 Sep 2022 21:55:37 +0000 (17:55 -0400)] 
Introduce lttng_copy_from_user_check_nofault

This code will be re-used by the event notification capture code, so
move it out of the ring buffer.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I482adb5f619944285703425e278a70c601ce99b3

2 years ago - fix: adjust range v5.10.137 in block probe
Michael Jeanson [Mon, 22 Aug 2022 18:16:27 +0000 (14:16 -0400)] 
fix: adjust range v5.10.137 in block probe

See upstream commit, backported in v5.10.137 :

commit 1cb3032406423b25aa984854b4d78e0100d292dd
Author: Christoph Hellwig <hch@lst.de>
Date:   Thu Dec 3 17:21:39 2020 +0100

    block: remove the request_queue to argument request based tracepoints

    [ Upstream commit a54895fa057c67700270777f7661d8d3c7fda88a ]

    The request_queue can trivially be derived from the request.

Change-Id: I01f96a437641421faf993b4b031171c372bd0374
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
2 years ago - Version 2.13.5 (tag: v2.13.5)
Mathieu Desnoyers [Fri, 19 Aug 2022 18:47:13 +0000 (14:47 -0400)] 
Version 2.13.5

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: If7bcc8f8709c264d140a5e7c8c815efbec8d0508

2 years ago - Fix: incorrect stub prototypes when CONFIG_HAVE_SYSCALL_TRACEPOINTS=n
Mathieu Desnoyers [Fri, 19 Aug 2022 14:37:58 +0000 (10:37 -0400)] 
Fix: incorrect stub prototypes when CONFIG_HAVE_SYSCALL_TRACEPOINTS=n

The stub prototypes do not match the expected argument types, and extra
erroneous semicolons are present. This has been fixed by a refactoring
in the master branch:

commit f2db8be348380b48e3795d14e49cc585b3c357fd
Author: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Date:   Mon Nov 1 15:14:44 2021 -0400

    Cleanup: syscall filter enable/disable event

Fixes: #1357
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I43130c8ebb7fbc6961a3b73a7b04845bef59d318

2 years ago - fix: mm/tracing: add 'accounted' entry into output of allocation tracepoints (v6.0)
Michael Jeanson [Mon, 15 Aug 2022 21:22:47 +0000 (17:22 -0400)] 
fix: mm/tracing: add 'accounted' entry into output of allocation tracepoints (v6.0)

See upstream commit :

  commit b347aa7b57477f71c740e2bbc6d1078a7109ba23
  Author: Vasily Averin <vasily.averin@linux.dev>
  Date:   Fri Jun 3 06:21:49 2022 +0300

    mm/tracing: add 'accounted' entry into output of allocation tracepoints

    Slab caches marked with SLAB_ACCOUNT force accounting for every
    allocation from this cache even if __GFP_ACCOUNT flag is not passed.
    Unfortunately, at the moment this flag is not visible in ftrace output,
    and this makes it difficult to analyze the accounted allocations.

    This patch adds boolean "accounted" entry into trace output,
    and set it to 'true' for calls used __GFP_ACCOUNT flag and
    for allocations from caches marked with SLAB_ACCOUNT.
    Set it to 'false' if accounting is disabled in configs.

Change-Id: I023a355b94e79931499e1a1f648e2649d6dd3c89
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
2 years ago - fix: block: remove bdevname (v6.0)
Michael Jeanson [Mon, 15 Aug 2022 18:39:42 +0000 (14:39 -0400)] 
fix: block: remove bdevname (v6.0)

See upstream commit :

  commit 900d156bac2bc474cf7c7bee4efbc6c83ec5ae58
  Author: Christoph Hellwig <hch@lst.de>
  Date:   Wed Jul 13 07:53:17 2022 +0200

    block: remove bdevname

    Replace the remaining calls of bdevname with snprintf using the %pg
    format specifier.

Change-Id: I09f2afe91e549be2746334a4a09fc00be09b0778
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
2 years agofix: fs/jbd2: Fix the documentation of the jbd2_write_superblock() callers (v6.0)
Michael Jeanson [Mon, 15 Aug 2022 21:21:20 +0000 (17:21 -0400)] 
fix: fs/jbd2: Fix the documentation of the jbd2_write_superblock() callers (v6.0)

See upstream commit :

  commit 6669797b0dd41ced457760b6e1014fdda8ce19ce
  Author: Bart Van Assche <bvanassche@acm.org>
  Date:   Thu Jul 14 11:07:22 2022 -0700

    fs/jbd2: Fix the documentation of the jbd2_write_superblock() callers

    Commit 2a222ca992c3 ("fs: have submit_bh users pass in op and flags
    separately") renamed the jbd2_write_superblock() 'write_op' argument into
    'write_flags'. Propagate this change to the jbd2_write_superblock()
    callers. Additionally, change the type of 'write_flags' into blk_opf_t.

Change-Id: I65b8af95b3d07438763dd94f409c197e3b400733
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
2 years agofix: tie compaction probe build to CONFIG_COMPACTION
Michael Jeanson [Wed, 10 Aug 2022 15:07:14 +0000 (11:07 -0400)] 
fix: tie compaction probe build to CONFIG_COMPACTION

The definition of 'struct compact_control' in 'mm/internal.h' depends on
CONFIG_COMPACTION being defined. Only build the compaction probe when
this configuration option is enabled.

Thanks to Bruce Ashfield <bruce.ashfield@gmail.com> for reporting this
issue.

Change-Id: I81e77aa9c1bf10452c152d432fe5224df0db42c9
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
2 years agofix: net: skb: introduce kfree_skb_reason() (v5.15.58..v5.16)
Mathieu Desnoyers [Fri, 29 Jul 2022 19:37:43 +0000 (15:37 -0400)] 
fix: net: skb: introduce kfree_skb_reason() (v5.15.58..v5.16)

See upstream commit :

  commit c504e5c2f9648a1e5c2be01e8c3f59d394192bd3
  Author: Menglong Dong <imagedong@tencent.com>
  Date:   Sun Jan 9 14:36:26 2022 +0800

    net: skb: introduce kfree_skb_reason()

    Introduce the interface kfree_skb_reason(), which is able to pass
    the reason why the skb is dropped to 'kfree_skb' tracepoint.

    Add the 'reason' field to 'trace_kfree_skb', so that users can get
    more detailed information about abnormal skbs with 'drop_monitor' or
    eBPF.

    All drop reasons are defined in the enum 'skb_drop_reason', and
    they will be printed as strings in the 'kfree_skb' tracepoint in the
    format 'reason: XXX'.

    ( Maybe the reasons should be defined in a uapi header file, so that
    user space can use them? )

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: Ib3c039207739dad10f097cf76474e0822e351273
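
A sketch of how such a stable-branch backport can be handled on the probe
side, using a version-range guard so the extra 'reason' argument is only
expected on kernels that carry the change (macro names follow lttng-modules
conventions; the exact guard and field list are illustrative, not the actual
probe source):

  #if (LTTNG_LINUX_VERSION_CODE >= LTTNG_KERNEL_VERSION(5,17,0) \
          || LTTNG_KERNEL_RANGE(5,15,58, 5,16,0))
  LTTNG_TRACEPOINT_EVENT(kfree_skb,
          TP_PROTO(struct sk_buff *skb, void *location,
                  enum skb_drop_reason reason),
          TP_ARGS(skb, location, reason),
          TP_FIELDS(
                  ctf_integer_hex(void *, skbaddr, skb)
                  ctf_integer_hex(void *, location, location)
                  ctf_integer(unsigned short, protocol, ntohs(skb->protocol))
                  ctf_integer(unsigned int, reason, reason)
          )
  )
  #else
  LTTNG_TRACEPOINT_EVENT(kfree_skb,
          TP_PROTO(struct sk_buff *skb, void *location),
          TP_ARGS(skb, location),
          TP_FIELDS(
                  ctf_integer_hex(void *, skbaddr, skb)
                  ctf_integer_hex(void *, location, location)
                  ctf_integer(unsigned short, protocol, ntohs(skb->protocol))
          )
  )
  #endif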

2 years agofix: workqueue: Fix type of cpu in trace event (v5.19)
Michael Jeanson [Wed, 15 Jun 2022 16:07:16 +0000 (12:07 -0400)] 
fix: workqueue: Fix type of cpu in trace event (v5.19)

See upstream commit :

  commit 873a400938b31a1e443c4d94b560b78300787540
  Author: Wonhyuk Yang <vvghjk1234@gmail.com>
  Date:   Wed May 4 11:32:03 2022 +0900

    workqueue: Fix type of cpu in trace event

    The trace event "workqueue_queue_work" use unsigned int type for
    req_cpu, cpu. This casue confusing cpu number like below log.

    $ cat /sys/kernel/debug/tracing/trace
    cat-317  [001] ...: workqueue_queue_work: ... req_cpu=8192 cpu=4294967295

    So, change the unsigned type to a signed type in the trace event. After
    applying this patch, the cpu number will be printed as -1 instead of
    4294967295, as follows.

    $ cat /sys/kernel/debug/tracing/trace
    cat-1338  [002] ...: workqueue_queue_work: ... req_cpu=8192 cpu=-1

Change-Id: I478083c350b6ec314d87e9159dc5b342b96daed7
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
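
On the probe side the adjustment is confined to the recorded field types; a
hedged fragment of the workqueue_queue_work TP_FIELDS() list (lttng-modules
macro names assumed, the cpu expression mirrors the upstream trace event, and
the surrounding event definition is omitted):

  #if (LTTNG_LINUX_VERSION_CODE >= LTTNG_KERNEL_VERSION(5,19,0))
                ctf_integer(int, req_cpu, req_cpu)
                ctf_integer(int, cpu, pwq->pool->cpu)
  #else
                ctf_integer(unsigned int, req_cpu, req_cpu)
                ctf_integer(unsigned int, cpu, pwq->pool->cpu)
  #endif
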
2 years agofix: fs: Remove flags parameter from aops->write_begin (v5.19)
Michael Jeanson [Wed, 8 Jun 2022 17:07:59 +0000 (13:07 -0400)] 
fix: fs: Remove flags parameter from aops->write_begin (v5.19)

See upstream commit :

  commit 9d6b0cd7579844761ed68926eb3073bab1dca87b
  Author: Matthew Wilcox (Oracle) <willy@infradead.org>
  Date:   Tue Feb 22 14:31:43 2022 -0500

    fs: Remove flags parameter from aops->write_begin

    There are no more aop flags left, so remove the parameter.

Change-Id: I82725b93e13d749f52a631b2ac60df81a5e839f8
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
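
Tracepoints that carried the removed flags argument (for example
ext4_write_begin) need a version-guarded prototype; a hedged sketch with a
trimmed field list (lttng-modules macro names assumed, not the actual probe
source):

  #if (LTTNG_LINUX_VERSION_CODE >= LTTNG_KERNEL_VERSION(5,19,0))
  LTTNG_TRACEPOINT_EVENT(ext4_write_begin,
          TP_PROTO(struct inode *inode, loff_t pos, unsigned int len),
          TP_ARGS(inode, pos, len),
          TP_FIELDS(
                  ctf_integer(ino_t, ino, inode->i_ino)
                  ctf_integer(loff_t, pos, pos)
                  ctf_integer(unsigned int, len, len)
          )
  )
  #else
  LTTNG_TRACEPOINT_EVENT(ext4_write_begin,
          TP_PROTO(struct inode *inode, loff_t pos, unsigned int len,
                  unsigned int flags),
          TP_ARGS(inode, pos, len, flags),
          TP_FIELDS(
                  ctf_integer(ino_t, ino, inode->i_ino)
                  ctf_integer(loff_t, pos, pos)
                  ctf_integer(unsigned int, len, len)
                  ctf_integer(unsigned int, flags, flags)
          )
  )
  #endif
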
2 years agofix: mm/page_alloc: fix tracepoint mm_page_alloc_zone_locked() (v5.19)
Michael Jeanson [Wed, 8 Jun 2022 16:56:36 +0000 (12:56 -0400)] 
fix: mm/page_alloc: fix tracepoint mm_page_alloc_zone_locked() (v5.19)

See upstream commit :

  commit 10e0f7530205799e7e971aba699a7cb3a47456de
  Author: Wonhyuk Yang <vvghjk1234@gmail.com>
  Date:   Thu May 19 14:08:54 2022 -0700

    mm/page_alloc: fix tracepoint mm_page_alloc_zone_locked()

    Currently, the tracepoint mm_page_alloc_zone_locked() doesn't show correct
    information.

    First, when alloc_flag has ALLOC_HARDER/ALLOC_CMA, pages can be allocated
    from MIGRATE_HIGHATOMIC/MIGRATE_CMA. Nevertheless, the tracepoint uses the
    requested migration type, not MIGRATE_HIGHATOMIC or MIGRATE_CMA.

    Second, after commit 44042b4498728 ("mm/page_alloc: allow high-order pages
    to be stored on the per-cpu lists") the percpu list can store high-order
    pages. But the tracepoint determines whether it is a refill of the percpu
    list by comparing the requested order with 0.

    To handle these problems, make mm_page_alloc_zone_locked() only be called
    by __rmqueue_smallest with the correct migration type. With a new argument
    called percpu_refill, it can show roughly whether it is a refill of the
    percpu list.

Change-Id: I2e4a57393757f12b9c5a4566c4d1102ee2474a09
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
2 years agoVersion 2.13.4 v2.13.4
Mathieu Desnoyers [Fri, 3 Jun 2022 19:00:26 +0000 (15:00 -0400)] 
Version 2.13.4

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I37d9d03bfc0bae7c271ce3956e71919c4e2fb6e7

2 years agoFix: event notifier: racy use of last subbuffer record
Mathieu Desnoyers [Mon, 4 Apr 2022 19:42:00 +0000 (15:42 -0400)] 
Fix: event notifier: racy use of last subbuffer record

The lttng-modules event notifiers use the ring buffer internally. When
reading the payload of the last event in a sub-buffer with a multi-part
read (e.g. two read system calls), we should not "put" the sub-buffer
holding this data, else continuing reading the data in the following
read system call can observe corrupted data if it has been concurrently
overwritten by the producer.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: Idb051e50ee8a25958cfd63a9b143f4943ca2e01a

2 years agoFix: bytecode interpreter context_get_index() leaves byte order uninitialized
Mathieu Desnoyers [Wed, 30 Mar 2022 18:24:54 +0000 (14:24 -0400)] 
Fix: bytecode interpreter context_get_index() leaves byte order uninitialized

Observed Issue
==============

When using the event notification capture feature to capture a context
field, e.g. '$ctx.cpu_id', the captured value is often observed in
reverse byte order.

Cause
=====

Within the bytecode interpreter, context_get_index() leaves the "rev_bo"
field uninitialized in the top-of-stack entry.

This only affects the event notification capture bytecode because the
BYTECODE_OP_GET_SYMBOL bytecode instruction (as of lttng-tools 2.13)
is only generated for capture bytecode in lttng-tools. Therefore, only
capture bytecode targeting contexts are affected by this issue. The
reason why lttng-tools uses the "legacy" bytecode instruction to get
context (BYTECODE_OP_GET_CONTEXT_REF) for the filter bytecode is to
preserve backward compatibility of filtering when interacting with
applications linked against LTTng-UST 2.12.

Solution
========

Initialize the rev_bo field based on the context field type's
reverse_byte_order field.

Known drawbacks
===============

None.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I1483642b0b8f6bc28d5b68be170a04fb419fd9b3
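
A minimal sketch of the fix, assuming the lttng-modules naming for the
interpreter stack entry (rev_bo) and the integer type descriptor
(reverse_byte_order); identifiers approximate the interpreter code rather
than quote it:

  /* context_get_index(): loading an integer context field onto the
   * top of the interpreter stack. */
  ptr->object_type = OBJECT_TYPE_S64;
  ptr->u.s64 = v.s64;
  /* The fix: propagate the field's byte order instead of leaving
   * rev_bo uninitialized. */
  ptr->rev_bo = integer_type->reverse_byte_order;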

2 years agofix: 'random' tracepoints removed in stable kernels
Michael Jeanson [Tue, 31 May 2022 19:24:48 +0000 (15:24 -0400)] 
fix: 'random' tracepoints removed in stable kernels

The upstream commit 14c174633f349cb41ea90c2c0aaddac157012f74 removing
the 'random' tracepoints is being backported to multiple stable kernel
branches; I don't see how that qualifies as a fix, but here we are.

Use the presence of 'include/trace/events/random.h' in the kernel source
tree instead of the rather tortuous version check to determine if we
need to build 'lttng-probe-random.ko'.

Change-Id: I8f5f2f4c9e09c61127c49c7949b22dd3fab0460d
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
2 years agofix: random: remove unused tracepoints (v5.10, v5.15)
He Zhe [Thu, 2 Jun 2022 06:36:08 +0000 (06:36 +0000)] 
fix: random: remove unused tracepoints (v5.10, v5.15)

The following kernel commit has been backported to v5.10.119 and v5.15.44.

commit 14c174633f349cb41ea90c2c0aaddac157012f74
Author: Jason A. Donenfeld <Jason@zx2c4.com>
Date:   Thu Feb 10 16:40:44 2022 +0100

  random: remove unused tracepoints

  These explicit tracepoints aren't really used and show signs of aging.
  It's work to keep these up to date, and before I attempted to keep them
  up to date, they weren't up to date, which indicates that they're not
  really used. These days there are better ways of introspecting anyway.

Signed-off-by: He Zhe <zhe.he@windriver.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I0b7eb8aa78b5bd2039e20ae3e1da4c5eb9018789

2 years agofix: sched/tracing: Append prev_state to tp args instead (v5.18)
Michael Jeanson [Tue, 17 May 2022 15:46:29 +0000 (11:46 -0400)] 
fix: sched/tracing: Append prev_state to tp args instead (v5.18)

See upstream commit :

  commit 9c2136be0878c88c53dea26943ce40bb03ad8d8d
  Author: Delyan Kratunov <delyank@fb.com>
  Date:   Wed May 11 18:28:36 2022 +0000

    sched/tracing: Append prev_state to tp args instead

    Commit fa2c3254d7cf (sched/tracing: Don't re-read p->state when emitting
    sched_switch event, 2022-01-20) added a new prev_state argument to the
    sched_switch tracepoint, before the prev task_struct pointer.

    This reordering of arguments broke BPF programs that use the raw
    tracepoint (e.g. tp_btf programs). The type of the second argument has
    changed and existing programs that assume a task_struct* argument
    (e.g. for bpf_task_storage access) will now fail to verify.

    If we instead append the new argument to the end, all existing programs
    would continue to work and can conditionally extract the prev_state
    argument on supported kernel versions.

Change-Id: Ife2ec88a8bea2743562590cbd357068d7773863f
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
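
On the probe side this typically becomes a version-guarded prototype for the
sched_switch event; a hedged fragment (only TP_PROTO/TP_ARGS shown,
lttng-modules macro names assumed):

  #if (LTTNG_LINUX_VERSION_CODE >= LTTNG_KERNEL_VERSION(5,18,0))
          /* v5.18+: prev_state appended after the task pointers. */
          TP_PROTO(bool preempt, struct task_struct *prev,
                  struct task_struct *next, unsigned int prev_state),
          TP_ARGS(preempt, prev, next, prev_state),
  #else
          /* Older kernels: no prev_state argument. */
          TP_PROTO(bool preempt, struct task_struct *prev,
                  struct task_struct *next),
          TP_ARGS(preempt, prev, next),
  #endif
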
2 years agofix: mm: compaction: cleanup the compaction trace events (v5.18)
Michael Jeanson [Mon, 4 Apr 2022 19:14:01 +0000 (15:14 -0400)] 
fix: mm: compaction: cleanup the compaction trace events (v5.18)

See upstream commit :

  commit abd4349ff9b8d242376b67711254221f64f447c7
  Author: Baolin Wang <baolin.wang@linux.alibaba.com>
  Date:   Tue Mar 22 14:45:56 2022 -0700

    mm: compaction: cleanup the compaction trace events

    As Steven suggested [1], we should access the pointers from the trace
    event to avoid dereferencing them to the tracepoint function when the
    tracepoint is disabled.

    [1] https://lkml.org/lkml/2021/11/3/409

Change-Id: I6c08250df8596e8dbc76780ae5d95c899c12e6fe
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
2 years agofix: scsi: core: Remove <scsi/scsi_request.h> (v5.18)
Michael Jeanson [Mon, 4 Apr 2022 19:08:48 +0000 (15:08 -0400)] 
fix: scsi: core: Remove <scsi/scsi_request.h> (v5.18)

See upstream commit :

  commit 26440303310591e29121964ede0048583cb3126d
  Author: Christoph Hellwig <hch@lst.de>
  Date:   Thu Feb 24 18:55:52 2022 +0100

    scsi: core: Remove <scsi/scsi_request.h>

    This header is empty now except for an include of <linux/blk-mq.h>, so
    remove it.

Change-Id: Ic8ee3352f1e8bddfcd44c31be9b788db82f183aa
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
2 years agofix: kprobes: Use rethook for kretprobe if possible (v5.18)
Michael Jeanson [Mon, 4 Apr 2022 19:02:10 +0000 (15:02 -0400)] 
fix: kprobes: Use rethook for kretprobe if possible (v5.18)

See upstream commit :

  commit 73f9b911faa74ac5107879de05c9489c419f41bb
  Author: Masami Hiramatsu <mhiramat@kernel.org>
  Date:   Sat Mar 26 11:27:05 2022 +0900

    kprobes: Use rethook for kretprobe if possible

    Use rethook for kretprobe function return hooking if the arch sets
    CONFIG_HAVE_RETHOOK=y. In this case, CONFIG_KRETPROBE_ON_RETHOOK is
    set to 'y' automatically, and the kretprobe internal data fields
    switches to use rethook. If not, it continues to use kretprobe
    specific function return hooks.

Change-Id: I2b7670dc04e4769c1e3c372582ad2f555f6d7a66
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
2 years agofix: random: remove unused tracepoints (v5.18)
Michael Jeanson [Mon, 4 Apr 2022 18:33:42 +0000 (14:33 -0400)] 
fix: random: remove unused tracepoints (v5.18)

See upstream commit :

  commit 14c174633f349cb41ea90c2c0aaddac157012f74
  Author: Jason A. Donenfeld <Jason@zx2c4.com>
  Date:   Thu Feb 10 16:40:44 2022 +0100

    random: remove unused tracepoints

    These explicit tracepoints aren't really used and show signs of aging.
    It's work to keep these up to date, and before I attempted to keep them
    up to date, they weren't up to date, which indicates that they're not
    really used. These days there are better ways of introspecting anyway.

Change-Id: I3b8c3e2732e7efdd76ce63204ac53a48784d0df6
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
2 years agofix: scsi: block: Remove REQ_OP_WRITE_SAME support (v5.18)
Michael Jeanson [Mon, 4 Apr 2022 18:12:13 +0000 (14:12 -0400)] 
fix: scsi: block: Remove REQ_OP_WRITE_SAME support (v5.18)

See upstream commit :

  commit 73bd66d9c834220579c881a3eb020fd8917075d8
  Author: Christoph Hellwig <hch@lst.de>
  Date:   Wed Feb 9 09:28:28 2022 +0100

    scsi: block: Remove REQ_OP_WRITE_SAME support

    No more users of REQ_OP_WRITE_SAME or drivers implementing it are left,
    so remove the infrastructure.

Change-Id: Ifbff71f79f8b590436fc7cb79f82d90c6e033d84
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
2 years agofix: block: remove genhd.h (v5.18)
Michael Jeanson [Mon, 4 Apr 2022 17:54:59 +0000 (13:54 -0400)] 
fix: block: remove genhd.h (v5.18)

See upstream commit :

  commit 322cbb50de711814c42fb088f6d31901502c711a
  Author: Christoph Hellwig <hch@lst.de>
  Date:   Mon Jan 24 10:39:13 2022 +0100

    block: remove genhd.h

    There is no good reason to keep genhd.h separate from the main blkdev.h
    header that includes it.  So fold the contents of genhd.h into blkdev.h
    and remove genhd.h entirely.

Change-Id: I7cf2aaa3a4c133320b95f2edde49f790f9515dbd
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
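
Probe sources that previously pulled in <linux/genhd.h> can guard the include
on the kernel version; a minimal sketch (the lttng kernel-version header path
follows lttng-modules 2.13 conventions):

  #include <lttng/kernel-version.h>

  #if (LTTNG_LINUX_VERSION_CODE >= LTTNG_KERNEL_VERSION(5,18,0))
  #include <linux/blkdev.h>     /* genhd.h was folded into blkdev.h */
  #else
  #include <linux/genhd.h>
  #endif
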
2 years agofix: sched/tracing: Don't re-read p->state when emitting sched_switch event (v5.18)
Michael Jeanson [Mon, 4 Apr 2022 17:52:57 +0000 (13:52 -0400)] 
fix: sched/tracing: Don't re-read p->state when emitting sched_switch event (v5.18)

See upstream commit :

  commit fa2c3254d7cfff5f7a916ab928a562d1165f17bb
  Author: Valentin Schneider <valentin.schneider@arm.com>
  Date:   Thu Jan 20 16:25:19 2022 +0000

    sched/tracing: Don't re-read p->state when emitting sched_switch event

    As of commit

      c6e7bd7afaeb ("sched/core: Optimize ttwu() spinning on p->on_cpu")

    the following sequence becomes possible:

                          p->__state = TASK_INTERRUPTIBLE;
                          __schedule()
                            deactivate_task(p);
      ttwu()
        READ !p->on_rq
        p->__state=TASK_WAKING
                            trace_sched_switch()
                              __trace_sched_switch_state()
                                task_state_index()
                                  return 0;

    TASK_WAKING isn't in TASK_REPORT, so the task appears as TASK_RUNNING in
    the trace event.

    Prevent this by pushing the value read from __schedule() down the trace
    event.

Change-Id: I46743cd006be4b4d573cae2d77df7d6d16744d04
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
2 years agofix: KVM: x86: Unexport kvm_x86_ops (v5.18)
Michael Jeanson [Mon, 4 Apr 2022 20:28:26 +0000 (16:28 -0400)] 
fix: KVM: x86: Unexport kvm_x86_ops (v5.18)

See upstream commit :

  commit dfc4e6ca041135217c07ebcd102b6694cea22856
  Author: Sean Christopherson <seanjc@google.com>
  Date:   Fri Jan 28 00:51:56 2022 +0000

    KVM: x86: Unexport kvm_x86_ops

    Drop the export of kvm_x86_ops now it is no longer referenced by SVM or
    VMX.  Disallowing access to kvm_x86_ops is very desirable as it prevents
    vendor code from incorrectly modifying hooks after they have been set by
    kvm_arch_hardware_setup(), and more importantly after each function's
    associated static_call key has been updated.

    No functional change intended.

Change-Id: Icee959a984570f95ab9b71354225b5aeecea7da0
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
2 years agoFix: do not warn on unknown counter ioctl
Mathieu Desnoyers [Fri, 8 Apr 2022 18:33:20 +0000 (14:33 -0400)] 
Fix: do not warn on unknown counter ioctl

It is perfectly valid for a newer lttng-tools to try to use an unknown
ioctl and handle -ENOSYS.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: Ia9f6472ca1196f983eee1327805b0ad69d028a98

2 years agoFix: tracepoint event: allow same provider and event name
Mathieu Desnoyers [Mon, 4 Apr 2022 19:49:32 +0000 (15:49 -0400)] 
Fix: tracepoint event: allow same provider and event name

Using the same name for the provider (TRACE_SYSTEM) and event name
causes a compilation error because the same identifiers are emitted
twice.

Fix this by prefixing the provider identifier with
"__provider_event_desc___".

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I8cdf8f859e35b8bd5c19737860d12f1ed546dfc2

2 years agoFix: compaction migratepages event name
Mathieu Desnoyers [Tue, 29 Mar 2022 20:34:07 +0000 (16:34 -0400)] 
Fix: compaction migratepages event name

The commit "fix: mm: compaction: fix the migration stats in trace_mm_compaction_migratepages() (v5.17)"

Triggers this warning:

    LTTng: event provider mismatch: The event name needs to start with provider name + _ + one or more letter, provider: compaction, event name: mm_compaction_migratepages

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I01c7485af765084dafb33bf33ae392e60bfbf1e7

2 years agoVersion 2.13.3 v2.13.3
Mathieu Desnoyers [Fri, 25 Mar 2022 18:06:26 +0000 (14:06 -0400)] 
Version 2.13.3

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: Ief0169c89e54c719907186e654d314d65a527098

2 years agoDocument expected ISO8601 time formats in ABI header
Mathieu Desnoyers [Mon, 14 Mar 2022 17:31:24 +0000 (13:31 -0400)] 
Document expected ISO8601 time formats in ABI header

Document the expected ISO8601 time formats in the ABI header to justify
the choice of string maximum length.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I4dedde83b5fb81c376245338773ea63677401a09

2 years agoFix: lttng ABI: lttng_counter_ioctl() tainted scalar
Mathieu Desnoyers [Mon, 14 Mar 2022 15:25:56 +0000 (11:25 -0400)] 
Fix: lttng ABI: lttng_counter_ioctl() tainted scalar

Found by Coverity:

>>>     CID 1476250:    (TAINTED_SCALAR)
>>>     Using tainted variable "local_counter_aggregate.index.number_dimensions" as a loop boundary.

>>>     CID 1476250:    (TAINTED_SCALAR)
>>>     Using tainted variable "local_counter_clear.index.number_dimensions" as a loop boundary.

>>>     CID 1476250:    (TAINTED_SCALAR)
>>>     Using tainted variable "local_counter_read.index.number_dimensions" as a loop boundary.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I7d35cf96781bb18837fe4564e4e8a34aa2ddc310
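
The usual remedy for this class of report is to bound the user-supplied
dimension count before it is used as a loop limit; a hedged sketch (the
maximum constant and variable names are illustrative, not the exact lttng
ABI identifiers):

  /* number_dimensions was copied from userspace via the ioctl argument. */
  if (local_counter_read.index.number_dimensions > LTTNG_COUNTER_DIMENSION_MAX)
          return -EINVAL;
  /* Only then use it as the loop boundary over the dimension indexes. */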

2 years agoFix: sample discarded events count before reserve
Mathieu Desnoyers [Thu, 10 Mar 2022 19:20:47 +0000 (14:20 -0500)] 
Fix: sample discarded events count before reserve

Sampling the discarded events count in the buffer_end callback is done
out of order, and may therefore include increments performed by following
events (in following packets) if the thread doing the end-of-packet
event write is interrupted for a long time.

Sampling the event discarded counts before reserving space for the last
event in a packet, and keeping this as part of the private ring buffer
context, should fix this race.

In lttng-modules, this scenario would only happen if an interrupt
handler produces many events, when nested over an event between its
reserve and commit. Note that if lttng-modules supports faultable
tracepoints in the future, this may become easier to trigger due to
preemption.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I696a206b3926fc1abbee35caa9af65461ff56c68

2 years agoCleanup: comment alignment in ring buffer config.h
Mathieu Desnoyers [Thu, 10 Mar 2022 19:01:14 +0000 (14:01 -0500)] 
Cleanup: comment alignment in ring buffer config.h

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I38fdfd786dfb60e1339634780be2645968351ed8

2 years agoVersion 2.13.2 v2.13.2
Mathieu Desnoyers [Mon, 7 Mar 2022 20:54:28 +0000 (15:54 -0500)] 
Version 2.13.2

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I8796907b413e09d417a1d087da5e4bdaf2a2bbf3

2 years agoFix: incorrect in/out direction for syscall exit
Mathieu Desnoyers [Mon, 10 May 2021 15:01:02 +0000 (11:01 -0400)] 
Fix: incorrect in/out direction for syscall exit

Syscall exit should fetch the "sc_out" parameters. This issue was
introduced by commit e42c4f49c15b ("Split syscall tracepoint generation in their own files").

Fixes: e42c4f49c15b ("Split syscall tracepoint generation in their own files")
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: Ib34912005323ea34b6d11ca9acc5edf491649cdd

2 years agofix: net: socket: rename SKB_DROP_REASON_SOCKET_FILTER (v5.17)
Michael Jeanson [Mon, 31 Jan 2022 15:47:53 +0000 (10:47 -0500)] 
fix: net: socket: rename SKB_DROP_REASON_SOCKET_FILTER (v5.17)

No version check is needed since this change is between two RCs; see
upstream commit :

  commit 364df53c081d93fcfd6b91085ff2650c7f17b3c7
  Author: Menglong Dong <imagedong@tencent.com>
  Date:   Thu Jan 27 17:13:01 2022 +0800

    net: socket: rename SKB_DROP_REASON_SOCKET_FILTER

    Rename SKB_DROP_REASON_SOCKET_FILTER, which is used
    as the reason of skb drop out of socket filter before
    it's part of a released kernel. It will be used for
    more protocols than just TCP in future series.

Link: https://lore.kernel.org/all/20220127091308.91401-2-imagedong@tencent.com/
Change-Id: I666461a5b541fe9e0bf53ad996ce33237af4bfbb
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
2 years agofix: net: skb: introduce kfree_skb_reason() (v5.17)
Michael Jeanson [Wed, 26 Jan 2022 19:49:11 +0000 (14:49 -0500)] 
fix: net: skb: introduce kfree_skb_reason() (v5.17)

See upstream commit :

  commit c504e5c2f9648a1e5c2be01e8c3f59d394192bd3
  Author: Menglong Dong <imagedong@tencent.com>
  Date:   Sun Jan 9 14:36:26 2022 +0800

    net: skb: introduce kfree_skb_reason()

    Introduce the interface kfree_skb_reason(), which is able to pass
    the reason why the skb is dropped to 'kfree_skb' tracepoint.

    Add the 'reason' field to 'trace_kfree_skb', so that users can get
    more detailed information about abnormal skbs with 'drop_monitor' or
    eBPF.

    All drop reasons are defined in the enum 'skb_drop_reason', and
    they will be printed as strings in the 'kfree_skb' tracepoint in the
    format 'reason: XXX'.

    ( Maybe the reasons should be defined in a uapi header file, so that
    user space can use them? )

Change-Id: I6766678a288da959498a4736fc3f95bf239c3e94
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
2 years agofix: random: rather than entropy_store abstraction, use global (v5.17)
Michael Jeanson [Wed, 26 Jan 2022 19:53:41 +0000 (14:53 -0500)] 
fix: random: rather than entropy_store abstraction, use global (v5.17)

See upstream commit :

  commit 90ed1e67e896cc8040a523f8428fc02f9b164394
  Author: Jason A. Donenfeld <Jason@zx2c4.com>
  Date:   Wed Jan 12 17:18:08 2022 +0100

    random: rather than entropy_store abstraction, use global

    Originally, the RNG used several pools, so having things abstracted out
    over a generic entropy_store object made sense. These days, there's only
    one input pool, and then an uneven mix of usage via the abstraction and
    usage via &input_pool. Rather than this uneasy mixture, just get rid of
    the abstraction entirely and have things always use the global. This
    simplifies the code and makes reading it a bit easier.

Change-Id: I1a2a14d7b6e69a047804e1e91e00fe002f757431
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
2 years agofix: btrfs: pass fs_info to trace_btrfs_transaction_commit (v5.17)
Michael Jeanson [Wed, 26 Jan 2022 19:37:52 +0000 (14:37 -0500)] 
fix: btrfs: pass fs_info to trace_btrfs_transaction_commit (v5.17)

See upstream commit :

  commit 2e4e97abac4c95f8b87b2912ea013f7836a6f10b
  Author: Josef Bacik <josef@toxicpanda.com>
  Date:   Fri Nov 5 16:45:29 2021 -0400

    btrfs: pass fs_info to trace_btrfs_transaction_commit

    The root on the trans->root can be anything, and generally we're
    committing from the transaction kthread so it's usually the tree_root.
    Change this to just take an fs_info, and to maintain compatibility
    simply put the ROOT_TREE_OBJECTID as the root objectid for the
    tracepoint.  This will allow us to remove trans->root.

Change-Id: Ie5a4804330edabffac0714fcb9c25b8c8599e424
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
2 years agofix: mm: compaction: fix the migration stats in trace_mm_compaction_migratepages...
Michael Jeanson [Sun, 23 Jan 2022 18:26:17 +0000 (13:26 -0500)] 
fix: mm: compaction: fix the migration stats in trace_mm_compaction_migratepages() (v5.17)

See upstream commit :

  commit 84b328aa81216e08804d8875d63f26bda1298788
  Author: Baolin Wang <baolin.wang@linux.alibaba.com>
  Date:   Fri Jan 14 14:08:40 2022 -0800

    mm: compaction: fix the migration stats in trace_mm_compaction_migratepages()

    Now the migrate_pages() has changed to return the number of {normal
    page, THP, hugetlb} instead, thus we should not use the return value to
    calculate the number of pages migrated successfully.  Instead we can
    just use the 'nr_succeeded' which indicates the number of normal pages
    migrated successfully to calculate the non-migrated pages in
    trace_mm_compaction_migratepages().

Link: https://lkml.kernel.org/r/b4225251c4bec068dcd90d275ab7de88a39e2bd7.1636275127.git.baolin.wang@linux.alibaba.com
Change-Id: Ib8e8f2a16a273f16cd73fe63afbbfc25c0a2540c
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>