Issue observed
--------------
Running Linux 6.5.5, lttng-modules @
6be48c9f, all built with gcc
13.2.1, I got a 'BUG' in dmesg while enabling the following event
rule:
$ lttng enable-event --kernel --syscall --channel chanK --all --filter '$ctx.procname == "UST reg*"'
The relevant parts of the 'BUG' output follow:
[ +0.715480] detected buffer overflow in strnlen
[ +0.000001] kernel BUG at lib/string_helpers.c:1031!
[ +0.000008] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ +0.000003] CPU: 2 PID: 157174 Comm: Client manageme Tainted: G S U OE 6.5.5-arch1-1 #1
d82a0f532dd8cfe67d5795c1738d9c01059a0c62
[ +0.000001] RIP: 0010:fortify_panic+0x13/0x20
[ +0.000006] Code: 41 5d c3 cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 48 89 fe 48 c7 c7 90 22 c8 86 e8 3d aa b1 ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90
[ +0.000002] RSP: 0018:
ffffa7c7c106f918 EFLAGS:
00010246
[ +0.000002] RAX:
0000000000000023 RBX:
000000000000000b RCX:
0000000000000000
[ +0.000002] RDX:
0000000000000000 RSI:
ffff92766e4a16c0 RDI:
ffff92766e4a16c0
[ +0.000001] RBP:
0000000000000000 R08:
0000000000000000 R09:
ffffa7c7c106f7c0
[ +0.000001] R10:
0000000000000003 R11:
ffffffff874ca068 R12:
ffff927618202480
[ +0.000001] R13:
ffff9276182024d2 R14:
ffff927453999c08 R15:
ffff9273dc7aa478
[ +0.000001] FS:
00007f06553f9680(0000) GS:
ffff92766e480000(0000) knlGS:
0000000000000000
[ +0.000002] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ +0.000002] CR2:
0000556d54eceaa8 CR3:
00000001ad9de002 CR4:
00000000003706e0
[ +0.000001] Call Trace:
[ +0.000002] <TASK>
[ +0.000002] ? die+0x36/0x90
[ +0.000004] ? do_trap+0xda/0x100
[ +0.000003] ? fortify_panic+0x13/0x20
[ +0.000002] ? do_error_trap+0x6a/0x90
[ +0.000002] ? fortify_panic+0x13/0x20
[ +0.000002] ? exc_invalid_op+0x50/0x70
[ +0.000003] ? fortify_panic+0x13/0x20
[ +0.000002] ? asm_exc_invalid_op+0x1a/0x20
[ +0.000005] ? fortify_panic+0x13/0x20
[ +0.000002] ? fortify_panic+0x13/0x20
[ +0.000003] bytecode_validate_overflow+0x155/0x1f0 [lttng_tracer
759e3e4fee0e774ef575e93b67e8dc7955d0c2c2]
[ +0.000330] lttng_bytecode_validate_load+0x32/0x1e0 [lttng_tracer
759e3e4fee0e774ef575e93b67e8dc7955d0c2c2]
[ +0.000183] lttng_enabler_link_bytecode+0x135/0x5a0 [lttng_tracer
759e3e4fee0e774ef575e93b67e8dc7955d0c2c2]
[ +0.000132] lttng_sync_event_list+0xef/0x650 [lttng_tracer
759e3e4fee0e774ef575e93b67e8dc7955d0c2c2]
[ +0.000123] ? __wake_up_common+0x73/0x180
[ +0.000004] lttng_session_enable+0x3e/0x130 [lttng_tracer
759e3e4fee0e774ef575e93b67e8dc7955d0c2c2]
[ +0.000121] lttng_session_ioctl+0x5db/0x720 [lttng_tracer
759e3e4fee0e774ef575e93b67e8dc7955d0c2c2]
[ +0.000120] ? __slab_free+0xf1/0x330
[ +0.000004] ? __scm_recv_common.isra.0+0x144/0x180
[ +0.000004] ? unix_stream_read_generic+0x233/0xb60
[ +0.000006] __x64_sys_ioctl+0x94/0xd0
[ +0.000004] do_syscall_64+0x5d/0x90
[ +0.000004] ? switch_fpu_return+0x50/0xe0
[ +0.000004] ? exit_to_user_mode_prepare+0x132/0x1e0
[ +0.000003] ? syscall_exit_to_user_mode+0x2b/0x40
[ +0.000002] ? do_syscall_64+0x6c/0x90
[ +0.000003] ? do_syscall_64+0x6c/0x90
[ +0.000002] ? do_syscall_64+0x6c/0x90
[ +0.000002] ? do_syscall_64+0x6c/0x90
[ +0.000002] ? syscall_exit_to_user_mode+0x2b/0x40
[ +0.000002] ? do_syscall_64+0x6c/0x90
[ +0.000002] ? do_syscall_64+0x6c/0x90
[ +0.000002] ? do_syscall_64+0x6c/0x90
[ +0.000002] ? do_syscall_64+0x6c/0x90
[ +0.000002] ? exc_page_fault+0x7f/0x180
[ +0.000003] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
Cause
-----
`struct load_op` has a trailing 0-length array `data` member that is
used to refer, in the context of BYTECODE_OP_LOAD_STAR_GLOB_STRING, to
an immediate string operand that follows it.
During the validation of a filtering bytecode, strnlen is properly used
to determine the size of the immediate string operand, with a `maxlen`
parameter that is used to ensure the string operand is contained within
the bytecode (see lttng-bytecode-validator.c:434).
However, recent KSPP-related changes have enabled additional overrun
checks when statically-sized and flexible arrays are used. Those are
enabled when the kernel is built with CONFIG_UBSAN_BOUNDS and/or
CONFIG_FORTIFY_SOURCE configured.
The KBUILD CFLAGS now contain `-fstrict-flex-arrays=3`, which is
recognized by gcc 13+[1] and allows proper coverage of dynamically sized
trailing arrays when those configuration options are used.
With those validations in place, the kernel assumes that the `data`
array is truly of length 0 and it BUGs to warn of an invalid access.
The commit linked above contains a number of links explaining the
rationale for transitioning uses of the trailing zero-length arrays (a
gcc extension) to C99 flexible array members (FAM).
This was discussed at this year's GNU Cauldron [2].
Solution
--------
Uses of zero-length arrays (`foo[0]`) are replaced by flexible array
members (`foo[]`). The only cases that are left untouched are those
where the zero-length array is used to indicate the end of a
structure (i.e. it doesn't indicate that a variable number of elements
follow), see the `metadata_packet_header`, `metadata_record_header`,
`event_notifier_packet_header`, and `event_notifier_record_header`
structures.
It may be desirable to use the new `counted_by` attribute for some of
those in the future (`lttng_kernel_abi_filter_bytecode`,
`lttng_kernel_abi_capture_bytecode`, and `bytecode_runtime`) [3].
Note
----
While this is tagged as a memory handling 'fix', it has no security
implication as far as I can tell. The accesses that are flagged by the
new validations were valid.
This merely allows the runtime validations to understand the memory
layout properly.
[1] https://github.com/torvalds/linux/commit/
df8fc4e934c12b906d08050d7779f292b9c5c6b5
[2] https://gcc.gnu.org/wiki/cauldron2023talks?action=AttachFile&do=get&target=Most-wanted+Security+Features+in+GCC+for+Linux+Kernel.pdf
[3] https://lwn.net/Articles/930943/
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: Id39b101aaafe68f8fae6b86cd61806cba8cb1e6a