Francis Deslauriers [Fri, 1 Oct 2021 20:10:24 +0000 (16:10 -0400)]
Fix: userspace-probe: truncating binary path for SDT
Issue
=====
This issue was uncovered when we enabled the testing of the SDT
userspace probe instrumentation on the CI, where the paths to file are
specially long.
The reported error is:
- rule: ma-probe-sdt (type: kernel:uprobe, location type: SDT, location: /root/workspace/dev_gerrit_lttng-tools_rootbuild/arch/amd64/babeltrace_version/stable-2.0/build/std/conf/agents/liburcu_version/master/node/amd64-rootnode/test_type/base/src/lttng-tools/tests/utils/testapp/userspace-probe-sdt-binary/.libs/userspace-probe-sdt-binary:foobar:tp1)
+ rule: ma-probe-sdt (type: kernel:uprobe, location type: SDT, location: /root/workspace/dev_gerrit_lttng-tools_rootbuild/arch/amd64/babeltrace_version/stable-2.0/build/std/conf/agents/liburcu_version/master/node/amd64-rootnode/test_type/base/src/lttng-tools/tests/utils/testapp/userspace-probe-sdt-binary/.libs/userspace-probe-s:foobar:tp1)
The important part to notice is that the path to the binary is truncated
compared to was is expected by the test case.
The problem is caused by the
`lttng_userspace_probe_location_tracepoint_create_from_payload()`
function that strdup() the path string using the wrong defined value.
Fix
===
Use LTTNG_PATH_MAX rather then LTTNG_SYMBOL_NAME_LEN to copy the binary
path.
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I24cbf413baba405bf4c4b534ccbc2b18f8d5d43f
Jonathan Rajotte [Mon, 13 Sep 2021 20:49:48 +0000 (16:49 -0400)]
Fix: sessiond: ust session is inactive during ust_app_global_update
Observed issue
==============
The following scenario leads to an abort of lttng-sessiond.
lttng-sessiond (with kernel tracing available)
lttng create system-trace --snapshot -U /tmp/snapshot
lttng enable-channel -k system-trace --subbuf-size=4k --num-subbuf=256
lttng enable-event -c system-trace -k 'sched_wak*' -s system-trace
lttng start system-trace
lttng enable-event -u -a
Fails as expected with:
Error: Events: The command tried to enable an event in a new domain for
a session that has already been started once. (channel channel0,
session system-trace)
Launch any ust app such as easy_ust from the lttng-ust repository.
The following backtrace is generated:
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007ffff7af0859 in __GI_abort () at abort.c:79
#2 0x00007ffff7af0729 in __assert_fail_base (fmt=0x7ffff7c86588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x55555564b765 "usess->active", file=0x555555649a60 "ust-app.c", line
#3 0x00007ffff7b01f36 in __GI___assert_fail (assertion=0x55555564b765 "usess->active", file=0x555555649a60 "ust-app.c", line=5123, function=0x55555564ecf0 <__PRETTY_FUNCTION__.14199> "ust_
#4 0x00005555555d1f5e in ust_app_global_update (usess=0x7fffe001fb90, app=0x7fffac000b80) at ust-app.c:5123
#5 0x00005555555b60d4 in update_ust_app (app_sock=82) at dispatch.c:71
#6 0x00005555555b7025 in thread_dispatch_ust_registration (data=0x5555556a07f0) at dispatch.c:409
#7 0x00005555555ad5ab in launch_thread (data=0x5555556a0810) at thread.c:65
#8 0x00007ffff7ce6609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#9 0x00007ffff7bed293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
This also happens for the track command. You can replace the `lttng
enable-event -u -a` with `lttng track --userspace --vuid=0` then launch
an app and the same backtrace gets generated.
Cause
=====
During `process_client_msg` the `create_ust_session` function is called
and a ust session is assigned to the "system_trace" session with a
state of `active` set to 0 (false). This is not a problem.
The problem seems to lie with a single call site for
`ust_app_global_update` in `update_ust_app`. The status of the ust
session is not checked before calling the `ust_app_global_update`. It is
important to note that all `ust_app_global_update_all` callsites guard
the call with a check against the status of the session.
Solution
========
Guard the call to `ust_app_global_update` with a check of the ust
session active state.
Known drawbacks
=========
None.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I14d25d99d0609689247cdfa86130bd0219613581
Jérémie Galarneau [Fri, 20 Aug 2021 19:12:20 +0000 (15:12 -0400)]
Fix: man: lttng-rotate: trace file count/size limitation does not apply
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I21f4809215df1b8e9b1cf24edabd478e9b530969
Jérémie Galarneau [Wed, 16 Jun 2021 19:08:21 +0000 (15:08 -0400)]
Fix: rotation client example: leak of handle on error
1452927 Resource leak
The system resource will not be reclaimed and reused, reducing the
future availability of the resource.
In setup_session: Leak of memory or pointers to system
resources (CWE-404)
CID
1452927 (#1 of 1): Resource leak (RESOURCE_LEAK)8. leaked_storage:
Variable chan_handle going out of scope leaks the storage it points to
Reported-by: Coverity Scan
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I4c215ac4a86f9f70fd5c9d3aa13f944d3d7a2cc7
Francis Deslauriers [Wed, 16 Jun 2021 16:10:42 +0000 (12:10 -0400)]
Fix: use of uninitialised bytes valgrind warning
Issue
=====
Valgrind reports usage of uninitialised stack allocated memory:
==
2961363== Thread 9 Client manageme:
==
2961363== Syscall param sendmsg(msg.msg_iov[0]) points to uninitialised byte(s)
==
2961363== at 0x521418D: __libc_sendmsg (sendmsg.c:28)
==
2961363== by 0x521418D: sendmsg (sendmsg.c:25)
==
2961363== by 0x53411B: lttcomm_send_unix_sock (unix.c:294)
==
2961363== by 0x48AA8C: send_unix_sock (client.c:896)
==
2961363== by 0x484F45: thread_manage_clients (client.c:2865)
==
2961363== by 0x480FB4: launch_thread (thread.c:66)
==
2961363== by 0x5208608: start_thread (pthread_create.c:477)
==
2961363== by 0x5346292: clone (clone.S:95)
==
2961363== Address 0x7575389 is 25 bytes inside a block of size 16,384 alloc'd
==
2961363== at 0x483DFAF: realloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==
2961363== by 0x4EB618: lttng_dynamic_buffer_set_capacity (dynamic-buffer.c:166)
==
2961363== by 0x4EB52C: lttng_dynamic_buffer_append (dynamic-buffer.c:55)
==
2961363== by 0x48CBA1: setup_lttng_msg (client.c:125)
==
2961363== by 0x48AD70: setup_lttng_msg_no_cmd_header (client.c:860)
==
2961363== by 0x489825: process_client_msg (client.c:2253)
==
2961363== by 0x484A97: thread_manage_clients (client.c:2807)
==
2961363== by 0x480FB4: launch_thread (thread.c:66)
==
2961363== by 0x5208608: start_thread (pthread_create.c:477)
==
2961363== by 0x5346292: clone (clone.S:95)
==
2961363== Uninitialised value was created by a stack allocation
==
2961363== at 0x485FE4: process_client_msg (client.c:928)
After some digging, I found that this warning was caused by the padding
of the `struct lttng_session_list_schedules_return` during the
`LTTNG_SESSION_LIST_ROTATION_SCHEDULES` command.
All the fields are of the stack allocated struct are initialised by the
designated initializer but the padding is not.
These padding bytes are reported by Valgrind as being used
uninitialised.
Fix
===
Remove the padding by adding the LTTNG_PACKED attribute to the nested
structs in `struct lttng_session_list_schedules_return`.
Notes
=====
In light of the actual root cause, this is stacktrace is not really
useful.
The realloc call to grow the buffer makes it hard to find what is the
actual uninitialised stack allocation because Valgrind reports the
realloc call as the problematic site.
I was able to track this issue by adding a "consuming" step in the
`lttng_dynamic_buffer_append()` function. This consuming step would sum
all the bytes of the `buf` parameter so as to force Valgrind to check
each byte and not wait until the `sendmsg()` call. This way, I was able
to get a more precise location of the root cause of the issue.
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ib4a729575e9117cf95716ad25e1417c833f4232b
Jérémie Galarneau [Mon, 14 Jun 2021 19:21:23 +0000 (15:21 -0400)]
Fix: bump minimal urcu dependency to 0.11
cds_lfht_destroy is used from within RCU read-side critical sections
which is only allowed for urcu >= 0.10 (see userspace-rcu commit
d0ec0ed2f).
Such uses were introduced as part of the 2.11 release (contemporary
to urcu 0.10).
This version of the fix differs from
879000628 since the configure
script has changed significantly for the 2.13 release (use of PKG_CONFIG
rather than looking for a version-specific symbol).
The configure script looks for `urcu_memb_barrier` which was introduced
in liburcu 0.11.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I7d8daba877cbdcfaf93cdc4c9b16a3b3df974782
Mathieu Desnoyers [Thu, 27 May 2021 21:11:22 +0000 (17:11 -0400)]
Fix: list_lttng_agent_events: unbalanced RCU read-side lock on error
The error label jumps to the end label which releases the RCU read-side
lock. There are many error paths in this function which goto error
without holding the RCU read-side lock, thus causing unbalanced RCU
read-side lock.
There is no point in keeping so short RCU read-side critical sections,
so cover the entire function with a single read-side critical section.
[ Applies to stable-2.12 and possibly prior versions. Does _not_ apply
to stable-2.13+. ]
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I5b20c229a5df22d22ecfdc64dbbb87ee118649d2
Mathieu Desnoyers [Thu, 27 May 2021 20:45:55 +0000 (16:45 -0400)]
Fix: consumer: unbalanced RCU read-side lock on error
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I22ddc8585af84d07e8f3d19e9df582599352a1b8
Jérémie Galarneau [Tue, 18 May 2021 21:54:01 +0000 (17:54 -0400)]
Update version to v2.11.7
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Thu, 22 Apr 2021 18:47:14 +0000 (14:47 -0400)]
Fix: sessiond: leak of config_path on duplicate --config option
1452207 Resource leak
The system resource will not be reclaimed and reused, reducing the
future availability of the resource.
CID
1452373 (#3 of 3): Resource leak (RESOURCE_LEAK)
26. overwrite_var: Overwriting config_path in config_path =
utils_expand_path(optarg) leaks the storage that config_path points to.
Reported-by: Coverity Scan
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I70eff0f4de5ac9b84b699e88232669259eab54db
Jonathan Rajotte [Thu, 6 May 2021 15:14:52 +0000 (11:14 -0400)]
Fix: relayd: live: data is missing between viewer attach and retry
Observed issue
==============
Data produced between the time a live viewer attach and a viewer
subsequent "polling" is missing from the point of view of the viewer.
Using the following reproducer:
# A lttng-sessiond is already running
lttng-relayd -b -vvv > relayd.log 2>&1
lttng create live --live
lttng enable-event -u "hello_world:*"
lttng start
date
# Set the "polling" interval at ~30 seconds
babeltrace2 --retry-duration=
30000000 -i lttng-live net://localhost/host/$HOSTNAME/live &
pid=$!
# Make sure we are in the retry phase of bt2
date
sleep 2
# Produce events
# Note here that hello perform a sleep of 1 seconds before generating
# 10 events to make sure lttng-ust had time to register. A total of
# 100 events is generated.
date
for i in `seq 1 10`; do
./hello
done
# Here we expect to see 100 events outputted at some point (~18
# seconds (30 - 2 - 10) from that time).
# No events are received by babeltrace2.
date
sleep 40
date
# We retry the same exercise a second time.
echo "Moving to phase 2"
date
for i in `seq 1 10`; do
./hello
done
# Here data is received correctly.
date
sleep 40
date
kill $pid
wait
lttng destroy live
pkill lttng-relayd
Cause
=====
At the moment the viewer attach is done, no streams exist. On the
following viewer_get_new_streams (~30 seconds later), streams now exist
since they were allocated lazily following the first application
registration, but LTTNG_VIEWER_SEEK_LAST is used. This essentially
"discards" any events in the stream that happened between the attach
time and the viewer_get_new_stream execution.
Note that the same problem would happen if a new UID (in per-uid mode)
would happen to begin tracing in between viewer retry.
Solution
========
Always use `LTTNG_VIEWER_SEEK_BEGINNING` during
`viewer_get_new_streams`.
Known drawbacks
=========
This does not fix the per-pid problem where an app run and die in between
the viewer retries.
References
==========
https://lists.lttng.org/pipermail/lttng-dev/2021-May/029953.html
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: If0e2df639572e4bce91bac03994494ddbe138315
Jérémie Galarneau [Mon, 3 May 2021 16:50:25 +0000 (12:50 -0400)]
Fix: lttng-ctl: erroneous check if user is part of the tracing group
in_tgroup is set to `-1` whenever the current user is not part of the
tracing group _or_ if an error occurred while looking up if the user
is part of the tracing group. In other words, the value '0' is unused.
in_tgroup must be explicitly checked against '1' and can't be assumed
to behave as a boolean value.
This is _not_ a security issue: if the user is not part of the tracing
group, she will fail to open the root session damon's socket because
of the kernel-side permission checking. However, the behaviour of the
lttng client (and error reporting) will be confusing.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I1c12285fd0f42bb73c560a32a56cab03f1a5a6e1
Mathieu Desnoyers [Fri, 30 Apr 2021 16:02:47 +0000 (12:02 -0400)]
Fix: kernel consumer: get next subbuffer EAGAIN handling
The caller of get next subbuffer (data and metadata) callbacks only
expects -ENODATA when there is no data to read. However, the kernel
tracer distinguishes between no data for a finalized stream (-ENODATA)
and no data for a non-finalized stream (-EAGAIN).
Given that the consumer daemon uses the POLLHUP returned by epoll to
detect stream end of life, it does not care about the distinction
between -EAGAIN and -ENODATA when streaming.
However, taking a snapshot of a metadata stream uses the distinction
between nodata and again. Change this so it considers a return value of
0 from lttng_consumer_read_subbuffer to mean there is no more data to
read, so we can combine -EAGAIN and -ENODATA within get next subbuffer
callbacks and return -ENODATA for both.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ifb157dbe28498279dce30e9efa3b5aedcf3f9b1d
Mathieu Desnoyers [Fri, 30 Apr 2021 14:44:44 +0000 (10:44 -0400)]
Fix: kernel consumer: signal metadata ready condition variable
The behaviour of the kernel consumer does not match that of the user
space consumer. When a live metadata stream goes back to "sleep", it
must broadcast on the metadata_rdv to wake-up any thread that is
performing a metadata sync.
This mismatch causes a hang in the kernel consumer during the kernel
clear tests.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I7c93007b6d08f340881c08f23931582443bfa397
Jérémie Galarneau [Thu, 29 Apr 2021 20:17:45 +0000 (16:17 -0400)]
Fix: consumerd: unbalanced subbuffer 'get' when checking operation availability
A WARN_ON in lttng-modules is hit in the get_subbuf ioctl, which
pointed to an unbalanced get/put pair when accessing the subbuffers
of a channel.
517716.168856] ------------[ cut here ]------------
[517716.171559] WARNING: CPU: 1 PID: 19313 at /home/efficios/git/lttng-modules/src/lib/ringbuffer/ring_buffer_frontend.c:1263 lib_ring_buffer_get_subbuf+0x24f/0x260 [lttng_lib_ring_buffer]
[517716.180096] Modules linked in: lttng_test(O) lttng_probe_x86_exceptions(O) lttng_probe_x86_irq_vectors(O) lttng_probe_writeback(O) lttng_probe_workqueue(O) lttng_probe_vmscan(O) lttng_probe_udp(O) lttng_probe_timer(O) lttng_probe_sunrpc(O) lttng_probe_statedump(O) lttng_probe_sock(O) lttng_probe_skb(O) lttng_probe_signal(O) lttng_probe_scsi(O) lttng_probe_sched(O) lttng_probe_regulator(O) lttng_probe_regmap(O) lttng_probe_rcu(O) lttng_probe_random(O) lttng_probe_printk(O) lttng_probe_power(O) lttng_probe_net(O) lttng_probe_napi(O) lttng_probe_module(O) lttng_probe_kmem(O) lttng_probe_jbd2(O) lttng_probe_irq(O) lttng_probe_i2c(O) lttng_probe_gpio(O) lttng_probe_ext4(O) lttng_probe_compaction(O) lttng_probe_btrfs(O) lttng_probe_block(O) lttng_counter_client_percpu_32_modular(O) lttng_counter_client_percpu_64_modular(O) lttng_counter(O) lttng_ring_buffer_event_notifier_client(O) lttng_ring_buffer_metadata_mmap_client(O) lttng_ring_buffer_client_mmap_overwrite(O)
[517716.180815] lttng_ring_buffer_client_mmap_discard(O) lttng_ring_buffer_metadata_client(O) lttng_ring_buffer_client_overwrite(O) lttng_ring_buffer_client_discard(O) lttng_tracer(O) lttng_statedump(O) lttng_wrapper(O) lttng_uprobes(O) lttng_clock(O) lttng_kprobes(O) lttng_lib_ring_buffer(O) lttng_kretprobes(O) [last unloaded: lttng_wrapper]
[517716.213228] CPU: 1 PID: 19313 Comm: lttng-consumerd Tainted: G O 5.11.2 #80
[517716.215573] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[517716.220341] RIP: 0010:lib_ring_buffer_get_subbuf+0x24f/0x260 [lttng_lib_ring_buffer]
[517716.222579] Code: 50 f0 ff 00 0f 0b 49 03 5f 28 44 8b 85 80 00 00 00 49 8b 77 30 45 85 c0 48 89 d9 0f 85 4f ff ff ff e9 25 ff ff ff f0 ff 45 00 <0f> 0b b8 f0 ff ff ff e9 a6 fe ff ff 0f 1f 44 00 00 0f 1f 44 00 00
[517716.227610] RSP: 0018:
ffffbafd09023e88 EFLAGS:
00010202
[517716.229147] RAX:
0000000000000000 RBX:
ffff9906ed069a00 RCX:
ffff9905c4e4a400
[517716.231186] RDX:
ffffdafcffc4ec90 RSI:
0000000000200000 RDI:
ffffdafcffc4e9f0
[517716.233265] RBP:
ffff9905c4e4a400 R08:
0000000000300000 R09:
0000000000200000
[517716.235284] R10:
0000000000200000 R11:
0000000000000000 R12:
0000000000000000
[517716.237333] R13:
0000000000000000 R14:
000000000000005e R15:
0000000000000000
[517716.239360] FS:
00007ff327fff700(0000) GS:
ffff9905a7a40000(0000) knlGS:
0000000000000000
[517716.241634] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[517716.243262] CR2:
000055b91cf07730 CR3:
000000076f45e003 CR4:
00000000001706e0
[517716.245316] Call Trace:
[517716.246281] lib_ring_buffer_ioctl+0x181/0x300 [lttng_lib_ring_buffer]
[517716.248301] lttng_stream_ring_buffer_ioctl+0x1a3/0x200 [lttng_tracer]
[517716.252621] __x64_sys_ioctl+0x8e/0xd0
[517716.253931] do_syscall_64+0x33/0x80
[517716.255016] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[517716.256448] RIP: 0033:0x7ff3372f46d7
[517716.257586] Code: b3 66 90 48 8b 05 b1 47 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 47 2d 00 f7 d8 64 89 01 48
[517716.262472] RSP: 002b:
00007ff327ffe2c8 EFLAGS:
00000246 ORIG_RAX:
0000000000000010
[517716.264634] RAX:
ffffffffffffffda RBX:
00007ff310002740 RCX:
00007ff3372f46d7
[517716.266674] RDX:
0000000000000000 RSI:
000000000000f605 RDI:
000000000000005e
[517716.268713] RBP:
00007ff327ffe310 R08:
00007ff310002870 R09:
a002000000000000
[517716.270732] R10:
000055992b0a6530 R11:
0000000000000246 R12:
000055992c47da70
[517716.272768] R13:
00007ff318005e80 R14:
00007ff310002740 R15:
000055992b0a6528
[517716.274800] irq event stamp:
4526705
[517716.275897] hardirqs last enabled at (
4526713): [<
ffffffff9016a474>] console_unlock+0x4b4/0x5b0
[517716.278320] hardirqs last disabled at (
4526722): [<
ffffffff9016a3d0>] console_unlock+0x410/0x5b0
[517716.280759] softirqs last enabled at (
4526658): [<
ffffffff9120030f>] __do_softirq+0x30f/0x432
[517716.285125] softirqs last disabled at (
4526653): [<
ffffffff91001052>] asm_call_irq_on_stack+0x12/0x20
[517716.287648] ---[ end trace
506e55b312b731bf ]---
The check for the availability of the 'get_next_check_metadata'
operation attempts to use the operation at the creation of the metadata
stream. Most of the time this occurs before any metadata could be
generated.
However, the check will sometimes (very rarely) occur after the
generation of some metadata causing the 'get' to succeed and,
consequently, a subbuffer to be acquired. In those cases, the subbuffer
must be released immediately.
Fixes #1313
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ic4f3deb6fa665a13e3673fd859b23d7ef803f963
Jérémie Galarneau [Thu, 22 Apr 2021 23:54:47 +0000 (19:54 -0400)]
Fix: consumerd: strlen called on uninitialized path
closed_trace_chunk_path is uninitialized when relayd_id != 0.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I71564bdce0617de7749fc5848764a8cda4d122ea
Jérémie Galarneau [Thu, 22 Apr 2021 16:32:26 +0000 (12:32 -0400)]
Fix: tests: health thread stall: only stop consumerd when required
Since
a0f8e3109, stop_lttng_consumerd will report a failure when
there is no consumer daemon to kill. This fix ensures it is only
invoked for tests that launch a consumer daemon.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ifba6c0a87158d5cf20eeacb2b65fe049197d9b55
Francis Deslauriers [Mon, 19 Apr 2021 18:58:31 +0000 (14:58 -0400)]
Fix: tests: quote variable in case it's not set
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I34d39498a271e3a1644f49ac7f86f3e7f6d37a0f
Jonathan Rajotte [Mon, 15 Mar 2021 21:52:24 +0000 (17:52 -0400)]
Fix: sessiond: session destroy hang in per-uid when context cannot be added
Observed issue
==============
The system_test CI jobs hang on the perf test suite during the destroy
command steps of the ust perf raw subtest.
Cause
=====
The system_test are running inside a kvm as root. It turns out that the
PMU (UNHALTED_REFERENCE_CYCLES) the test suite is trying to add is
unavailable on the qemu host.
ustctl_add_context return -1024 since it fails to add the context.
This leads us down the error path for the callstack leading to the
ustctl_add_context call.
1) `ust_app_channel_create` returns `ret` != 0;
2) `find_or_create_ust_app_channel` returns `ret != 0`;
3) `ust_app_synchronize` based on the `ret` value goes directly to the
end of the function to an error path without passing on the
`create_ust_app_metadata` function and clean-up structure related to
the app.
Note that being in per-uid mode, data and metadata
channel/streams/buffer allocation is done on the fly for the first app
during `ust_app_synchronize` and its callee. For the current problematic
scenario, only the data channels have been allocated on the consumer for
the uid at that point. The metadata for that uid is not yet created.
Now that we know more of what is going on during an ""add context""
let's take a look at the actual hang.
The client never complete the destroy command since the consumerd
indicates that the trace chunk for the session is not closed. The trace
chunk still exists despite the fact that a close chunk command has been
issued. This is the case since its refcount never reaches zero and thus
the release does not complete.
In a normal execution without the use of contexts, the release of the
trace hunk (refcount == 0) occurs during the final rotation on destroy.
Upon further comparison between a working execution and a non-working
execution, in a non-working execution the `cmd_rotate_session` does not
issue the rotation for the data channels since the loop detects that no
metadata is present. Which, as we discussed earlier, can happen if we
fail to add the context to the app channel.
[1]
```
cds_list_for_each_entry(reg, &usess->buffer_reg_uid_list, lnode) {
struct buffer_reg_channel *reg_chan;
struct consumer_socket *socket;
if (!reg->registry->reg.ust->metadata_key) {
/* Skip since no metadata is present */
continue;
}
....
/* Rotate the data channels. */
cds_lfht_for_each_entry(reg->registry->channels->ht, &iter.iter,
reg_chan, node.node) {
ret = consumer_rotate_channel(socket,
reg_chan->consumer_key,
usess->uid, usess->gid,
usess->consumer,
/* is_metadata_channel */ false);
if (ret < 0) {
cmd_ret = LTTNG_ERR_ROTATION_FAIL_CONSUMER;
goto error;
}
}
....
}
```
Solution
========
Move the metadata check after the data channel rotation since it is
possible to have data channels but no metadata channel, although it is a
corner case.
Note that per-pid mode and kernel are not affected by the current bug
since a complete teardown of all objects is done. This only affect
per-uid due to the "on the fly" allocation nature of it since we need to
share the channel/stream/buffers across apps.
Known drawbacks
=========
None.
References
==========
[1] https://github.com/lttng/lttng-tools/blob/
3d1384a4add389c38f8554130e8dec2e2d06009d/src/bin/lttng-sessiond/ust-app.c#L7057
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I611761ac4051c9a80158705a87e59541fb37660c
Christophe Bedard [Fri, 28 Aug 2020 15:00:30 +0000 (11:00 -0400)]
Fix: lttng_destroy_session_no_wait: return 0 on success
lttng_destroy_session_no_wait() is supposed to behave like
lttng_destroy_session():
> Return 0 on success else a negative LTTNg error code.
However, it returns LTTNG_OK on success. Make it return 0 instead.
Signed-off-by: Christophe Bedard <christophe.bedard@apex.ai>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I787307d4291babfb9ddcb12937efdbfbb5fd7009
Simon Marchi [Fri, 2 Apr 2021 19:52:27 +0000 (15:52 -0400)]
config: fix typo in error message
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Iba45933cf70452a8c4e89cb471a11cd42bcd08b8
Mathieu Desnoyers [Mon, 25 May 2020 20:19:18 +0000 (16:19 -0400)]
Tests: array expressions without contant index are invalid
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Iaff8d199295359aae9d82d645a30f6b3bf81212b
Mathieu Desnoyers [Mon, 25 May 2020 20:15:40 +0000 (16:15 -0400)]
Fix: validate that array expression contains constant
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ic4f9b178d918b2c7c9dd27bc6a30c1798a171727
Jonathan Rajotte [Wed, 2 Dec 2020 22:01:35 +0000 (17:01 -0500)]
Fix: test: base-path tests are not run
Observed issue
==============
The base-path tests are never run during `make check`.
Cause
=====
Albeit
2a1668643ca94195d5c3889d0337e19165805a42 [1] introduces the tests,
the test file is not added to the test list.
Solution
========
Add the test file to the test list.
One of the substest is failing.
not ok 20 - Snapshot recorded
# Failed test 'Snapshot recorded'
# in .//../../../utils/utils.sh:lttng_snapshot_record() at line 1468.
ok 21 - Destroy session ust_app_snapshot_base_path
not ok 22 - Validate trace for event tp:tptest
This is caused by the removal of `trace_path=$2` in
c28fcefd993b7539716bb5cd9557a08a217ec463 [2]. The removal is most
probably a merge error or simply a mistake on my end. This commit revert
that change.
Also add load-stream-extra-path.lttng in EXTRA_DIST.
Known drawbacks
=========
None
References
==========
[1] https://github.com/lttng/lttng-tools/commit/
2a1668643ca94195d5c3889d0337e19165805a42
[2] https://github.com/lttng/lttng-tools/commit/
2a1668643ca94195d5c3889d0337e19165805a42
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I255908165d59e02e0e11b123d3538a74f53fb584
Simon Marchi [Fri, 9 Apr 2021 15:19:17 +0000 (11:19 -0400)]
Fix: filter: memory leak in filter_parser_ctx
When running
$ lttng add-trigger --condition on-event -u ust_tests_demo2:loop --capture intfield --action notify
I get the leaks pasted below. It seems like filter_parser_ctx_free
doesn't free everything in filter_parser_ctx. Add what's missing.
Re-order the frees so that they are in the same order as the members of
the struct, just because it's easier to follow and make sure we didn't
forget anything.
=================================================================
==
1073803==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 128 byte(s) in 1 object(s) allocated from:
#0 0x7ffff767783a in __interceptor_realloc /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cpp:164
#1 0x5555556833be in bytecode_reserve /home/simark/src/lttng-tools/src/common/bytecode/bytecode.c:59
#2 0x55555568360f in bytecode_push /home/simark/src/lttng-tools/src/common/bytecode/bytecode.c:79
#3 0x5555556a3d61 in filter_visitor_bytecode_generate /home/simark/src/lttng-tools/src/common/filter/filter-visitor-generate-bytecode.c:667
#4 0x55555569c9b1 in filter_parser_ctx_create_from_filter_expression /home/simark/src/lttng-tools/src/common/filter/filter-parser.y:394
#5 0x55555560542e in parse_event_rule /home/simark/src/lttng-tools/src/bin/lttng/commands/add_trigger.c:704
#6 0x555555607429 in handle_condition_event /home/simark/src/lttng-tools/src/bin/lttng/commands/add_trigger.c:1088
#7 0x555555608760 in parse_condition /home/simark/src/lttng-tools/src/bin/lttng/commands/add_trigger.c:1326
#8 0x55555560bca0 in cmd_add_trigger /home/simark/src/lttng-tools/src/bin/lttng/commands/add_trigger.c:1925
#9 0x555555616b55 in handle_command /home/simark/src/lttng-tools/src/bin/lttng/lttng.c:237
#10 0x555555617516 in parse_args /home/simark/src/lttng-tools/src/bin/lttng/lttng.c:421
#11 0x555555617812 in main /home/simark/src/lttng-tools/src/bin/lttng/lttng.c:470
#12 0x7ffff700bb24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)
Direct leak of 112 byte(s) in 1 object(s) allocated from:
#0 0x7ffff767783a in __interceptor_realloc /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cpp:164
#1 0x5555556833be in bytecode_reserve /home/simark/src/lttng-tools/src/common/bytecode/bytecode.c:59
#2 0x55555568360f in bytecode_push /home/simark/src/lttng-tools/src/common/bytecode/bytecode.c:79
#3 0x5555556a1b94 in visit_node_load_expression_legacy /home/simark/src/lttng-tools/src/common/filter/filter-visitor-generate-bytecode.c:198
#4 0x5555556a1d18 in visit_node_load_expression /home/simark/src/lttng-tools/src/common/filter/filter-visitor-generate-bytecode.c:231
#5 0x5555556a2540 in visit_node_load /home/simark/src/lttng-tools/src/common/filter/filter-visitor-generate-bytecode.c:399
#6 0x5555556a3a8b in recursive_visit_gen_bytecode /home/simark/src/lttng-tools/src/common/filter/filter-visitor-generate-bytecode.c:622
#7 0x5555556a12fa in visit_node_root /home/simark/src/lttng-tools/src/common/filter/filter-visitor-generate-bytecode.c:53
#8 0x5555556a3a76 in recursive_visit_gen_bytecode /home/simark/src/lttng-tools/src/common/filter/filter-visitor-generate-bytecode.c:620
#9 0x5555556a3c55 in filter_visitor_bytecode_generate /home/simark/src/lttng-tools/src/common/filter/filter-visitor-generate-bytecode.c:661
#10 0x55555569c9b1 in filter_parser_ctx_create_from_filter_expression /home/simark/src/lttng-tools/src/common/filter/filter-parser.y:394
#11 0x55555560542e in parse_event_rule /home/simark/src/lttng-tools/src/bin/lttng/commands/add_trigger.c:704
#12 0x555555607429 in handle_condition_event /home/simark/src/lttng-tools/src/bin/lttng/commands/add_trigger.c:1088
#13 0x555555608760 in parse_condition /home/simark/src/lttng-tools/src/bin/lttng/commands/add_trigger.c:1326
#14 0x55555560bca0 in cmd_add_trigger /home/simark/src/lttng-tools/src/bin/lttng/commands/add_trigger.c:1925
#15 0x555555616b55 in handle_command /home/simark/src/lttng-tools/src/bin/lttng/lttng.c:237
#16 0x555555617516 in parse_args /home/simark/src/lttng-tools/src/bin/lttng/lttng.c:421
#17 0x555555617812 in main /home/simark/src/lttng-tools/src/bin/lttng/lttng.c:470
#18 0x7ffff700bb24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)
Direct leak of 40 byte(s) in 1 object(s) allocated from:
#0 0x7ffff7677639 in __interceptor_calloc /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cpp:154
#1 0x5555556a3dd2 in make_op_root /home/simark/src/lttng-tools/src/common/filter/filter-visitor-generate-ir.c:35
#2 0x5555556a73a5 in generate_ir_recursive /home/simark/src/lttng-tools/src/common/filter/filter-visitor-generate-ir.c:874
#3 0x5555556a74d6 in filter_visitor_ir_generate /home/simark/src/lttng-tools/src/common/filter/filter-visitor-generate-ir.c:903
#4 0x55555569c859 in filter_parser_ctx_create_from_filter_expression /home/simark/src/lttng-tools/src/common/filter/filter-parser.y:353
#5 0x55555560542e in parse_event_rule /home/simark/src/lttng-tools/src/bin/lttng/commands/add_trigger.c:704
#6 0x555555607429 in handle_condition_event /home/simark/src/lttng-tools/src/bin/lttng/commands/add_trigger.c:1088
#7 0x555555608760 in parse_condition /home/simark/src/lttng-tools/src/bin/lttng/commands/add_trigger.c:1326
#8 0x55555560bca0 in cmd_add_trigger /home/simark/src/lttng-tools/src/bin/lttng/commands/add_trigger.c:1925
#9 0x555555616b55 in handle_command /home/simark/src/lttng-tools/src/bin/lttng/lttng.c:237
#10 0x555555617516 in parse_args /home/simark/src/lttng-tools/src/bin/lttng/lttng.c:421
#11 0x555555617812 in main /home/simark/src/lttng-tools/src/bin/lttng/lttng.c:470
#12 0x7ffff700bb24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)
Indirect leak of 40 byte(s) in 1 object(s) allocated from:
#0 0x7ffff7677639 in __interceptor_calloc /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cpp:154
#1 0x5555556a4f1d in make_op_load_expression /home/simark/src/lttng-tools/src/common/filter/filter-visitor-generate-ir.c:280
#2 0x5555556a696f in make_expression /home/simark/src/lttng-tools/src/common/filter/filter-visitor-generate-ir.c:637
#3 0x5555556a73df in generate_ir_recursive /home/simark/src/lttng-tools/src/common/filter/filter-visitor-generate-ir.c:882
#4 0x5555556a7382 in generate_ir_recursive /home/simark/src/lttng-tools/src/common/filter/filter-visitor-generate-ir.c:870
#5 0x5555556a74d6 in filter_visitor_ir_generate /home/simark/src/lttng-tools/src/common/filter/filter-visitor-generate-ir.c:903
#6 0x55555569c859 in filter_parser_ctx_create_from_filter_expression /home/simark/src/lttng-tools/src/common/filter/filter-parser.y:353
#7 0x55555560542e in parse_event_rule /home/simark/src/lttng-tools/src/bin/lttng/commands/add_trigger.c:704
#8 0x555555607429 in handle_condition_event /home/simark/src/lttng-tools/src/bin/lttng/commands/add_trigger.c:1088
#9 0x555555608760 in parse_condition /home/simark/src/lttng-tools/src/bin/lttng/commands/add_trigger.c:1326
#10 0x55555560bca0 in cmd_add_trigger /home/simark/src/lttng-tools/src/bin/lttng/commands/add_trigger.c:1925
#11 0x555555616b55 in handle_command /home/simark/src/lttng-tools/src/bin/lttng/lttng.c:237
#12 0x555555617516 in parse_args /home/simark/src/lttng-tools/src/bin/lttng/lttng.c:421
#13 0x555555617812 in main /home/simark/src/lttng-tools/src/bin/lttng/lttng.c:470
#14 0x7ffff700bb24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)
Indirect leak of 24 byte(s) in 1 object(s) allocated from:
#0 0x7ffff7677639 in __interceptor_calloc /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cpp:154
#1 0x5555556a484d in create_load_expression /home/simark/src/lttng-tools/src/common/filter/filter-visitor-generate-ir.c:201
#2 0x5555556a5040 in make_op_load_expression /home/simark/src/lttng-tools/src/common/filter/filter-visitor-generate-ir.c:287
#3 0x5555556a696f in make_expression /home/simark/src/lttng-tools/src/common/filter/filter-visitor-generate-ir.c:637
#4 0x5555556a73df in generate_ir_recursive /home/simark/src/lttng-tools/src/common/filter/filter-visitor-generate-ir.c:882
#5 0x5555556a7382 in generate_ir_recursive /home/simark/src/lttng-tools/src/common/filter/filter-visitor-generate-ir.c:870
#6 0x5555556a74d6 in filter_visitor_ir_generate /home/simark/src/lttng-tools/src/common/filter/filter-visitor-generate-ir.c:903
#7 0x55555569c859 in filter_parser_ctx_create_from_filter_expression /home/simark/src/lttng-tools/src/common/filter/filter-parser.y:353
#8 0x55555560542e in parse_event_rule /home/simark/src/lttng-tools/src/bin/lttng/commands/add_trigger.c:704
#9 0x555555607429 in handle_condition_event /home/simark/src/lttng-tools/src/bin/lttng/commands/add_trigger.c:1088
#10 0x555555608760 in parse_condition /home/simark/src/lttng-tools/src/bin/lttng/commands/add_trigger.c:1326
#11 0x55555560bca0 in cmd_add_trigger /home/simark/src/lttng-tools/src/bin/lttng/commands/add_trigger.c:1925
#12 0x555555616b55 in handle_command /home/simark/src/lttng-tools/src/bin/lttng/lttng.c:237
#13 0x555555617516 in parse_args /home/simark/src/lttng-tools/src/bin/lttng/lttng.c:421
#14 0x555555617812 in main /home/simark/src/lttng-tools/src/bin/lttng/lttng.c:470
#15 0x7ffff700bb24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)
Indirect leak of 24 byte(s) in 1 object(s) allocated from:
#0 0x7ffff7677639 in __interceptor_calloc /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cpp:154
#1 0x5555556a4e64 in create_load_expression /home/simark/src/lttng-tools/src/common/filter/filter-visitor-generate-ir.c:262
#2 0x5555556a5040 in make_op_load_expression /home/simark/src/lttng-tools/src/common/filter/filter-visitor-generate-ir.c:287
#3 0x5555556a696f in make_expression /home/simark/src/lttng-tools/src/common/filter/filter-visitor-generate-ir.c:637
#4 0x5555556a73df in generate_ir_recursive /home/simark/src/lttng-tools/src/common/filter/filter-visitor-generate-ir.c:882
#5 0x5555556a7382 in generate_ir_recursive /home/simark/src/lttng-tools/src/common/filter/filter-visitor-generate-ir.c:870
#6 0x5555556a74d6 in filter_visitor_ir_generate /home/simark/src/lttng-tools/src/common/filter/filter-visitor-generate-ir.c:903
#7 0x55555569c859 in filter_parser_ctx_create_from_filter_expression /home/simark/src/lttng-tools/src/common/filter/filter-parser.y:353
#8 0x55555560542e in parse_event_rule /home/simark/src/lttng-tools/src/bin/lttng/commands/add_trigger.c:704
#9 0x555555607429 in handle_condition_event /home/simark/src/lttng-tools/src/bin/lttng/commands/add_trigger.c:1088
#10 0x555555608760 in parse_condition /home/simark/src/lttng-tools/src/bin/lttng/commands/add_trigger.c:1326
#11 0x55555560bca0 in cmd_add_trigger /home/simark/src/lttng-tools/src/bin/lttng/commands/add_trigger.c:1925
#12 0x555555616b55 in handle_command /home/simark/src/lttng-tools/src/bin/lttng/lttng.c:237
#13 0x555555617516 in parse_args /home/simark/src/lttng-tools/src/bin/lttng/lttng.c:421
#14 0x555555617812 in main /home/simark/src/lttng-tools/src/bin/lttng/lttng.c:470
#15 0x7ffff700bb24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)
Indirect leak of 24 byte(s) in 1 object(s) allocated from:
#0 0x7ffff7677639 in __interceptor_calloc /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cpp:154
#1 0x5555556a4bbc in create_load_expression /home/simark/src/lttng-tools/src/common/filter/filter-visitor-generate-ir.c:233
#2 0x5555556a5040 in make_op_load_expression /home/simark/src/lttng-tools/src/common/filter/filter-visitor-generate-ir.c:287
#3 0x5555556a696f in make_expression /home/simark/src/lttng-tools/src/common/filter/filter-visitor-generate-ir.c:637
#4 0x5555556a73df in generate_ir_recursive /home/simark/src/lttng-tools/src/common/filter/filter-visitor-generate-ir.c:882
#5 0x5555556a7382 in generate_ir_recursive /home/simark/src/lttng-tools/src/common/filter/filter-visitor-generate-ir.c:870
#6 0x5555556a74d6 in filter_visitor_ir_generate /home/simark/src/lttng-tools/src/common/filter/filter-visitor-generate-ir.c:903
#7 0x55555569c859 in filter_parser_ctx_create_from_filter_expression /home/simark/src/lttng-tools/src/common/filter/filter-parser.y:353
#8 0x55555560542e in parse_event_rule /home/simark/src/lttng-tools/src/bin/lttng/commands/add_trigger.c:704
#9 0x555555607429 in handle_condition_event /home/simark/src/lttng-tools/src/bin/lttng/commands/add_trigger.c:1088
#10 0x555555608760 in parse_condition /home/simark/src/lttng-tools/src/bin/lttng/commands/add_trigger.c:1326
#11 0x55555560bca0 in cmd_add_trigger /home/simark/src/lttng-tools/src/bin/lttng/commands/add_trigger.c:1925
#12 0x555555616b55 in handle_command /home/simark/src/lttng-tools/src/bin/lttng/lttng.c:237
#13 0x555555617516 in parse_args /home/simark/src/lttng-tools/src/bin/lttng/lttng.c:421
#14 0x555555617812 in main /home/simark/src/lttng-tools/src/bin/lttng/lttng.c:470
#15 0x7ffff700bb24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)
Indirect leak of 9 byte(s) in 1 object(s) allocated from:
#0 0x7ffff761fa69 in __interceptor_strdup /build/gcc/src/gcc/libsanitizer/asan/asan_interceptors.cpp:452
#1 0x5555556a4c41 in create_load_expression /home/simark/src/lttng-tools/src/common/filter/filter-visitor-generate-ir.c:238
#2 0x5555556a5040 in make_op_load_expression /home/simark/src/lttng-tools/src/common/filter/filter-visitor-generate-ir.c:287
#3 0x5555556a696f in make_expression /home/simark/src/lttng-tools/src/common/filter/filter-visitor-generate-ir.c:637
#4 0x5555556a73df in generate_ir_recursive /home/simark/src/lttng-tools/src/common/filter/filter-visitor-generate-ir.c:882
#5 0x5555556a7382 in generate_ir_recursive /home/simark/src/lttng-tools/src/common/filter/filter-visitor-generate-ir.c:870
#6 0x5555556a74d6 in filter_visitor_ir_generate /home/simark/src/lttng-tools/src/common/filter/filter-visitor-generate-ir.c:903
#7 0x55555569c859 in filter_parser_ctx_create_from_filter_expression /home/simark/src/lttng-tools/src/common/filter/filter-parser.y:353
#8 0x55555560542e in parse_event_rule /home/simark/src/lttng-tools/src/bin/lttng/commands/add_trigger.c:704
#9 0x555555607429 in handle_condition_event /home/simark/src/lttng-tools/src/bin/lttng/commands/add_trigger.c:1088
#10 0x555555608760 in parse_condition /home/simark/src/lttng-tools/src/bin/lttng/commands/add_trigger.c:1326
#11 0x55555560bca0 in cmd_add_trigger /home/simark/src/lttng-tools/src/bin/lttng/commands/add_trigger.c:1925
#12 0x555555616b55 in handle_command /home/simark/src/lttng-tools/src/bin/lttng/lttng.c:237
#13 0x555555617516 in parse_args /home/simark/src/lttng-tools/src/bin/lttng/lttng.c:421
#14 0x555555617812 in main /home/simark/src/lttng-tools/src/bin/lttng/lttng.c:470
#15 0x7ffff700bb24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)
Indirect leak of 8 byte(s) in 1 object(s) allocated from:
#0 0x7ffff7677639 in __interceptor_calloc /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cpp:154
#1 0x5555556a4829 in create_load_expression /home/simark/src/lttng-tools/src/common/filter/filter-visitor-generate-ir.c:196
#2 0x5555556a5040 in make_op_load_expression /home/simark/src/lttng-tools/src/common/filter/filter-visitor-generate-ir.c:287
#3 0x5555556a696f in make_expression /home/simark/src/lttng-tools/src/common/filter/filter-visitor-generate-ir.c:637
#4 0x5555556a73df in generate_ir_recursive /home/simark/src/lttng-tools/src/common/filter/filter-visitor-generate-ir.c:882
#5 0x5555556a7382 in generate_ir_recursive /home/simark/src/lttng-tools/src/common/filter/filter-visitor-generate-ir.c:870
#6 0x5555556a74d6 in filter_visitor_ir_generate /home/simark/src/lttng-tools/src/common/filter/filter-visitor-generate-ir.c:903
#7 0x55555569c859 in filter_parser_ctx_create_from_filter_expression /home/simark/src/lttng-tools/src/common/filter/filter-parser.y:353
#8 0x55555560542e in parse_event_rule /home/simark/src/lttng-tools/src/bin/lttng/commands/add_trigger.c:704
#9 0x555555607429 in handle_condition_event /home/simark/src/lttng-tools/src/bin/lttng/commands/add_trigger.c:1088
#10 0x555555608760 in parse_condition /home/simark/src/lttng-tools/src/bin/lttng/commands/add_trigger.c:1326
#11 0x55555560bca0 in cmd_add_trigger /home/simark/src/lttng-tools/src/bin/lttng/commands/add_trigger.c:1925
#12 0x555555616b55 in handle_command /home/simark/src/lttng-tools/src/bin/lttng/lttng.c:237
#13 0x555555617516 in parse_args /home/simark/src/lttng-tools/src/bin/lttng/lttng.c:421
#14 0x555555617812 in main /home/simark/src/lttng-tools/src/bin/lttng/lttng.c:470
#15 0x7ffff700bb24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)
SUMMARY: AddressSanitizer: 409 byte(s) leaked in 9 allocation(s).
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ic338ea1689d3f002bf9cade6d4f23e62d935968b
Simon Marchi [Wed, 31 Mar 2021 19:06:34 +0000 (15:06 -0400)]
Fix: sessiond: fix -Wshadow error in save.c
I think this actually fixes a bug. Because of the second ret variable,
the return value from init_ust_event_from_agent_event or save_ust_event
would not be forwarded correctly, in case of error.
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ieb61d9b267ab90e18cc349d86754d9b89b5703f1
Simon Marchi [Mon, 12 Apr 2021 17:23:39 +0000 (13:23 -0400)]
Fix: utils: avoid strncpy overlap in utils_partial_realpath
When running the test_utils_expand_path test with ASan enabled, I get:
➜ lttng-tools ./tests/unit/test_utils_expand_path
1..29
INPUT: /a/b/c/d/e
=================================================================
==
1485873==ERROR: AddressSanitizer: strncpy-param-overlap: memory ranges [0x621000021d00,0x621000021d0b) and [0x621000021d00, 0x621000021d0b) overlap
#0 0x7ffff761fd97 in __interceptor_strncpy /build/gcc/src/gcc/libsanitizer/asan/asan_interceptors.cpp:481
#1 0x555555573834 in utils_partial_realpath /home/simark/src/lttng-tools/src/common/utils.c:195
#2 0x55555557410b in _utils_expand_path /home/simark/src/lttng-tools/src/common/utils.c:374
#3 0x555555574340 in utils_expand_path /home/simark/src/lttng-tools/src/common/utils.c:420
#4 0x555555570b28 in test_utils_expand_path /home/simark/src/lttng-tools/tests/unit/test_utils_expand_path.c:274
#5 0x55555557119e in main /home/simark/src/lttng-tools/tests/unit/test_utils_expand_path.c:345
#6 0x7ffff725fb24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)
#7 0x55555556fa3d in _start (/home/simark/build/lttng-tools/tests/unit/test_utils_expand_path+0x1ba3d)
0x621000021d00 is located 0 bytes inside of 4096-byte region [0x621000021d00,0x621000022d00)
allocated by thread T0 here:
#0 0x7ffff7677639 in __interceptor_calloc /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cpp:154
#1 0x55555557269d in zmalloc /home/simark/src/lttng-tools/src/common/macros.h:45
#2 0x555555573d34 in _utils_expand_path /home/simark/src/lttng-tools/src/common/utils.c:335
#3 0x555555574340 in utils_expand_path /home/simark/src/lttng-tools/src/common/utils.c:420
#4 0x555555570b28 in test_utils_expand_path /home/simark/src/lttng-tools/tests/unit/test_utils_expand_path.c:274
#5 0x55555557119e in main /home/simark/src/lttng-tools/tests/unit/test_utils_expand_path.c:345
#6 0x7ffff725fb24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)
0x621000021d00 is located 0 bytes inside of 4096-byte region [0x621000021d00,0x621000022d00)
allocated by thread T0 here:
#0 0x7ffff7677639 in __interceptor_calloc /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cpp:154
#1 0x55555557269d in zmalloc /home/simark/src/lttng-tools/src/common/macros.h:45
#2 0x555555573d34 in _utils_expand_path /home/simark/src/lttng-tools/src/common/utils.c:335
#3 0x555555574340 in utils_expand_path /home/simark/src/lttng-tools/src/common/utils.c:420
#4 0x555555570b28 in test_utils_expand_path /home/simark/src/lttng-tools/tests/unit/test_utils_expand_path.c:274
#5 0x55555557119e in main /home/simark/src/lttng-tools/tests/unit/test_utils_expand_path.c:345
#6 0x7ffff725fb24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)
The sole caller of utils_partial_realpath, _utils_expand_path, passes
the same buffer (resolved_path) for the input and output. This causes
utils_partial_realpath to call strncpy with overlapping strings.
Fix it by making utils_partial_realpath allocate new memory for the
returned string itself. This causes one more allocation than the
current code, because we don't re-use the existing buffer, but this
should be fine since this isn't exactly performance-critical code.
I think the code is easier to follow as a result.
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Iab983e2f44fa57563b11ac6e9c03a41150669d9e
Simon Marchi [Mon, 12 Apr 2021 17:18:15 +0000 (13:18 -0400)]
Clean-up: utils: make utils_partial_realpath static
It is only used within its file.
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I1467f5eeb11c66ecbe358290a172c37db34d881a
Jérémie Galarneau [Wed, 7 Apr 2021 15:55:26 +0000 (11:55 -0400)]
Fix: sessiond: kernel: invalid error code check
The `EEXIST` error code does not make sense for a kernctl_disable()
operation. In fact, not existing would be an error.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I8f66c6d6769a4a10e8ff446736c6a7d55a9c32be
Jérémie Galarneau [Mon, 8 Feb 2021 19:40:33 +0000 (14:40 -0500)]
Fix: ust-consumer: metadata thread not woken-up after version change
Issue observed
==============
The metadata regeneration test fails, very rarely, in the "streaming"
case on the CI. The interesting part of the test boils down to:
1) start session
2) launch an app tracing one event
3) stop session
4) delete metadata file
5) start session
6) regenerate metadata
7) stop session
8) destroy session
9) read trace: babeltrace fails on an invalid metadata file.
The problem is hard to capture, but modifying the test allows us to see
that there appears to be a short window between steps 7 and 8 where the
metadata file is empty or doesn't exist.
Cause
=====
When metadata is regenerated, its version is bumped and the metadata
cache is "reset". In some cases, such as in this test, the new metadata
will have exactly the same size as it had prior as nothing happened to
change that (e.g. no new apps/probes were registered).
When this occurs, the metadata thread is not woken-up by
consumer_metadata_cache_write() as it sees that max_offset of the
metadata cache didn't change; the data was replaced but it has the same
size.
The metadata consumption thread also checks for version bumps and
resets the amount of consumed metadata. Hence, if the "cache write"
operation woke up the metadata consumption thread, the stream's
"ust metadata pushed" state would be reset and the new contents would
be consumed.
Solution
========
The metadata stream's "ust metadata pushed" position is directly reset
to zero when a metadata version change is detected by the metadata
cache. The metadata poll thread is also woken up to resume the
consumption of the newly-available data.
It is unclear why the change to the consumption position was only done
on the metadata consumption thread's code path and not directly by the
session daemon command handling.
Note that a session rotation will also result in a reset of the pushed
position and a wake-up of the metadata poll thread from the command
handling thread. I am speculating that this couldn't be done due to the
design of the locking at the time of the original
implementation (I haven't checked).
In implementing this change, the metadata reception code path is
untangled a bit to separate the logic that affects the metadata stream
from the logic that manages the metadata cache. I suspect the original
error stems from a mix-up/confusion between both concerns.
When a metadata version change happens, the metadata cache resets its
'max_offset' (in other words, it's current size) and notifies the
caller. The caller then resets the "ust pushed metadata" position to
zero and wakes-up the metadata thread to consume the new contents of the
metadata cache.
Known drawbacks
===============
None.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I142ef957140d497ac7fc4294ca65a55c12518598
Jérémie Galarneau [Mon, 8 Feb 2021 18:00:56 +0000 (13:00 -0500)]
Clean-up: ust-consumer: simplify metadata cache unlock on error path
The metadata cache lock can be released directly after the call to
consumer_metadata_cache_write() since nothing depends on the cache
being locked after.
This simplifies the unlocking of the metadata cache by removing an
extra "unlock" that is specific to an error path.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ib0151d9640309005dfc83e95efb538e3c4f0df4a
Jérémie Galarneau [Wed, 10 Feb 2021 19:25:57 +0000 (14:25 -0500)]
Fix: sessiond: timer: unitiliazed sigevent fields
Coverity reports:
1445773 Uninitialized scalar variable
The variable will contain an arbitrary value left from earlier computations.
In timer_start: Use of an uninitialized variable (CWE-457)
CID
1445773 (#1 of 1): Uninitialized scalar variable (UNINIT)2.
uninit_use_in_call: Using uninitialized value sev. Field sev._sigev_un
is uninitialized when calling timer_create
Reported-by: Coverity Scan
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I306ea7e58192223408fac9d354dbbfebe6c0cc70
Jérémie Galarneau [Wed, 10 Feb 2021 19:20:34 +0000 (14:20 -0500)]
Fix: sessiond: ust-registry: dereference of NULL pointer on allocation failure
Coverity reports:
1445771 Dereference after null check
Either the check against null is unnecessary, or there may be a null pointer dereference.
In ust_registry_channel_add: Pointer is checked against null but then dereferenced anyway (CWE-476)
Reported-by: Coverity Scan
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I3976e0d7b059b47ca233bc9a20585eff2e1290fc
Jérémie Galarneau [Mon, 8 Feb 2021 18:03:20 +0000 (13:03 -0500)]
Fix: ust-consumer: metadata cache lock not taken when sampling max offset
Found by inspecting the code while searching for an unrelated problem.
The 'max_offset' field is probably only accessed by the sessiond
poll thread, but this isn't a documented (nor reasonably maintainable)
guarantee.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I1d2b29e9a483aa8e49c2db589284b853cce65491
Jérémie Galarneau [Tue, 23 Feb 2021 03:14:37 +0000 (22:14 -0500)]
Update version to v2.11.6
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Mathieu Desnoyers [Wed, 3 Feb 2021 15:21:42 +0000 (10:21 -0500)]
Fix: use MT-safe strtok_r in multithreaded context
append_list_to_probes uses the non-multithread-safe "strtok" while in
multithreaded context.
It is called by modprobe_lttng_data(), which is called from
init_kernel_tracer().
init_kernel_tracer is used from main() after other threads are created,
and also from process_client_msg() when a user attempts to interact with
the kernel domain if the kernel tracer has not yet been successfully
initialized.
Fixed by using the MT-safe strtok_r() instead.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ic3f81551b8508451e0f9733c5de56a4d4407b813
Jérémie Galarneau [Tue, 12 Jan 2021 22:41:54 +0000 (17:41 -0500)]
Fix: liblttng-ctl: unreported truncations when copying strings
gcc 10.2 reports a large number of string truncation warning in
liblttng-ctl. Replace the uses of lttng_ctl_copy_string() util by
lttng_strncpy() (handling the null source case when applicable) and
report the truncations when they occur.
Example gcc warning:
lttng-ctl.c:86:3: warning: ‘strncpy’ output may be truncated copying 254 bytes from a string of length 254 [-Wstringop-truncation]
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I673720963c9ba456e7d900b1a845e72646eda97c
Jérémie Galarneau [Mon, 11 Jan 2021 23:11:46 +0000 (18:11 -0500)]
Fix: configure: support Autoconf 2.70
The newly-released autoconf 2.70 introduces a number of breaking
changes [1] and is being rolled-out by some distros.
Amongst those changes, the AC_PROG_CC_STDC macro is marked as obsolete
and was merged into AC_PROG_CC, which we already use. On 2.70, this
results in a warning which we handle as an error.
A version check is added to invoke the AC_PROG_CC_STDC macro only when
running a pre-2.70 version of autoconf, fixing the issue.
A single use of the AC_HELP_STRING macro is replaced by AS_HELP_STRING
as the former was marked as obsolete.
The AC_PROG_LEX now takes an argument, and the argument-less version is
marked as obsolete. The macro is invoked with the `noyywrap` option, as
recommended in the documentation.
Also, the AX_PTHREAD macro makes use of the $as_echo built-in shell
variable which no longer exists in 2.70. A patch was submitted to the
GNU Autoconf archive in March, but there have been no signs of life
given since then [2].
As such, our local copy is updated to the latest version and the patch
(which looks fairly straight-forward / safe) is applied. This should
minimize changes once we go back to an "official" version of the macro.
[1] https://lwn.net/Articles/839395/
[2] https://savannah.gnu.org/patch/?9906
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Reviewed-by: Michael Jeanson <michael.jeanson@efficios.com>
Change-Id: I80366d810e5daaa7c35923bdd3a49b34dc1ca25e
Jérémie Galarneau [Tue, 1 Dec 2020 21:51:23 +0000 (16:51 -0500)]
Fix: sessiond: metadata not created on app unregistration during start
Issue observed
==============
A test for an incoming feature (trigger actions on on-event conditions)
hangs. While this problem was discovered using this test, it exercises a
scenario that is problematic as of this fix.
The destruction of a session can hang if a single application being
traced unregisters (dies) during the 'start' of a session.
Cause
=====
When a per-uid session is started, its buffers (channels and streams)
are allocated only if an instrumented application is registered to the
session daemon at that moment.
For historical reasons, the 'data' and 'metadata' buffers are allocated
in separate code paths. The 'data' buffers are allocated in
ust_app_synchronize() and the 'metadata' buffers are allocated in
ust_app_start_trace(). Both functions perform their own look-up for an
application session and will gracefully fail if an application session
can't be found; it typically means the application has exited.
This leaves a race window open where ust_app_synchronize() can succeed
in looking-up the application session, and ust_app_start_trace() can
fail following the death of the application.
When this occurs, the session is left with 'data' buffers allocated and
unallocated ''metadata' buffers. This is an unexpected state and results
in the rotation code attempting to rotate a partially initialized
metadata stream.
The rotation of this partially initialized metadata stream never
completes which, in turn, never allows the session to complete its
implicit rotation on destruction.
This race window is fairly narrow, but can be reproduced by sleep()-ing
at the beginning of ust_app_start_trace() and killing an application
that is being traced during the sleep period.
Solution
========
The creation of the metadata channel is performed as part of
ust_app_synchronize() if the application look-up succeeds. When it
fails, both 'data' and 'metadata' streams will fail to be created
resulting in an expected and valid state.
Known drawbacks
===============
None.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ice0ec16734a39a6bb885986d3ad70d20cd2618e0
Francis Deslauriers [Tue, 15 Sep 2020 16:10:18 +0000 (12:10 -0400)]
Cleanup: use `modprobe --remove` rather than `rmmod`
Background
==========
According to the rmmod(8) man page:
rmmod is a trivial program to remove a module (when module unloading
support is provided) from the kernel. Most users will want to use
modprobe(8) with the -r option instead.
`rmmod` simply unloads the provided module and decrements the refcount
of the modules it depended on but doesn't unload those dependencies if
their refcount is zero.
Issue
=====
With the following scenario we can end up if modules with a zero
refcount still loaded in the kernel:
modprobe lttng-test
lttng-sessiond
... (test case) ...
ctrl+c sessiond
rmmod lttng-test
When we teardown the lttng-sessiond, some modules are kept in the kernel
because the `lttng-test` module depends on them. So unloading
`lttng-test` using `rmmod` keeps those dependencies in the kernel.
Solution
========
Use `modprobe --remove` to unload modules and their now unused
dependencies.
From the modprobe(8) man page:
-r, --remove
This option causes modprobe to remove rather than insert a module.
If the modules it depends on are also unused, modprobe will try to
remove them too. Unlike insertion, more than one module can be
specified on the command line
Note
====
This commit also replaces existing uses of `modprobe -r` to `modprobe
--remove` for consistency.
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I7be83a645097e1eddd478cfbb717906b971f04ea
Jonathan Rajotte [Thu, 28 May 2020 01:29:05 +0000 (21:29 -0400)]
Tests: Fix: 99% fill ratio for high buffer usage is too high for larger events
If the event being registered is bigger than 1% of a subbuffer, the 99%
ratio cannot be achieved since the "last event" necessary to go over 99%
will always be dropped by the tracer.
e.g:
DBG1 - 19:31:07.
665963875 [Notification]: [notification-thread] High buffer usage condition being evaluated: threshold = 16220, highest usage = 16196 (in evaluate_buffer_usage_condition() at notification-thread-events.c:3733)
We use a ratio of 90% to keep a little headroom.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I06180735e0b5e88209b888e51cc83b4ac7d98193
Jérémie Galarneau [Fri, 16 Oct 2020 18:43:39 +0000 (14:43 -0400)]
Fix: common: poll: compat_poll_wait never finishes
compat_poll_wait hangs when poll returns an array of file
descriptors of the form:
[ Inactive Active ]
The logic to find the first idle pollfd entry is bogus and actually
skips the first idle entry. This causes the follow-up loop to never
conclude.
The pollfd array defragmentation logic is re-written in a simpler
style to handle those cases appropriately.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I8669a870df1ec1160f05e35e83671917bb80d6f9
Jérémie Galarneau [Fri, 11 Sep 2020 03:57:54 +0000 (23:57 -0400)]
Build fix: implicit declaration of function 'PERROR' on Solaris
Solaris 10/11 CI builds complain that PERROR is not declared
in memstream.h. This addresses the warning on both platforms.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I09ce7cea216e075d0ef7256f738ea7400d193b6b
Jérémie Galarneau [Tue, 15 Sep 2020 20:22:14 +0000 (16:22 -0400)]
Fix: PERROR spam when `tracing` group does not exist
The session daemon prints a PERROR on launch when the tracing group does
not exist. This should not occur when the group simply does not exist as
this is not an error. In that case (ESRCH), a DBG statement is
sufficient.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I3ade29071a8f4e9fe2eb56bf05ff4150b70fd463
Jérémie Galarneau [Thu, 20 Aug 2020 19:38:18 +0000 (15:38 -0400)]
Fix: memcpy used on potentially overlapping regions
Caught by reviewing unrelated code, these two uses of memcpy
can operate on overlapping buffers. I checked all other uses
of "raw" memcpy and those appear safe.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I72b1204bc52a92015042adb6a67b022d140f5b4e
Jérémie Galarneau [Mon, 3 Aug 2020 20:52:08 +0000 (16:52 -0400)]
Update version to v2.11.5
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 23 Jun 2020 03:35:02 +0000 (23:35 -0400)]
Fix: lttng: leak of userspace probe path on listing
realpath() returns a malloc'ed string when `resolved_path` is NULL.
`binary_path` can be safely free'd after it is printed.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ic35862f3275f58274ae2572e1d6be736f8a2107e
Jérémie Galarneau [Mon, 15 Jun 2020 22:44:23 +0000 (18:44 -0400)]
Fix: uprobe: missing error code on allocation failure
lttng_userspace_probe_location_function_create_from_payload
does not report an allocation failure when strdup fails.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ibed9906cabeca81081647d220a7aef1f54d251d4
Jérémie Galarneau [Mon, 15 Jun 2020 22:43:02 +0000 (18:43 -0400)]
Fix: sessiond: don't negate error code on list error
Listing errors are already negative. Negating in the error path
causes error codes to be interpreted as a number of events and
cause a communication error further on.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I6df6fa14e4e443798dfdc1aa7a8fd7cfe380835b
Jonathan Rajotte [Thu, 9 Jul 2020 20:22:19 +0000 (16:22 -0400)]
Fix: send/received actual size is overwritten by 'expected' size
Observed issue
==============
None.
Cause
=====
None.
Solution
========
Return the actual send/received size. Caller code correctly handles
partial send/receive.
Known drawbacks
=========
None
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ie52c39d0ab778108e3fde1b2375c0535caec97b1
Jonathan Rajotte [Thu, 9 Jul 2020 18:57:57 +0000 (14:57 -0400)]
Fix: partial recv lead to client disconnect
On EWOULDBLOCK and EAGAIN, the ret value from
`lttcomm_send_unix_sock_non_block` and
`lttcomm_recv_unix_sock_non_block` would be equal to -1.
Solution
=====
For such cases, set ret equal to zero since it is effectively what was
received/sent. Caller should treat ret >= 0 as `success` anyway.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I47677f26850bf544f672d270972de7cc2c6a2817
Michael Jeanson [Mon, 13 Jul 2020 19:41:01 +0000 (15:41 -0400)]
tests: return the proper TAP exit code
The C TAP library provides the 'exit_status()' function that will return
the proper exit code according to the number of tests that succeeded or
failed.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I0de2349609eb34b1c5e58f09012c1db0126923c0
Francis Deslauriers [Thu, 9 Jul 2020 20:44:32 +0000 (16:44 -0400)]
Tests: live/test_{lttng_,}kernel: use lttng_test_filter_event instead of sched_switch
Background
==========
These tests currently rely on system load (the `sched_switch` event) to
generate trace data.
Issue
=====
This is can be problematic for the `test_kernel`
test case because it has a fixed sized buffer to store the trace:
#define mmap_size 524288
This caused this test failure to randomly happen on my machine:
ok 7 - Get one index per stream
# mmap_size not big enough
not ok 8 - Get one data packet for stream 0, offset 0, len 4096
# Failed test (live_test.c:main() at line 709)
[error] Error detaching viewer session
not ok 9 - Detach viewer session
# Failed test (live_test.c:main() at line 715)
Solution
========
Using the `lttng_test_filter_event` event to control the size and
number of the event expected in the trace rather then depending on how
many Electon apps are currently fighting for my CPUs.
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I15d500d5becf9c5e526ae11ff0b2a2f4f6b753ac
Francis Deslauriers [Thu, 9 Jul 2020 21:38:17 +0000 (17:38 -0400)]
Fix: consumer: Move sanity check within `consumer_subbuffer` functions
The sanity check on the number bytes written by the `consumer_subbuffer`
callback was not correct channel configured in the splice output type in
a live session.
To simplify this, move checks in the callback themselves so they can be
specialized.
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I4e47a305860684c461ba7ffffd5e3bb3a21990b0
Francis Deslauriers [Tue, 7 Jul 2020 16:27:34 +0000 (12:27 -0400)]
Cleanup: typo in DBG() statements
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I19518292b4e7bd9319cafa05f08c1854f7693168
Jérémie Galarneau [Thu, 16 Jul 2020 16:39:20 +0000 (12:39 -0400)]
Fix: relayd: send_viewer_streams sends stack data in padding
A single stack-allocated instance of `struct lttng_viewer_stream` is
used to send the various streams to the live viewer. This structure
contains a path and channel name which remain uninitialized beyond the
null terminator.
The structure is zeroed on every iteration.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I1c60ee18e01e8f56157f0fe44bd3bd64b1194703
Jérémie Galarneau [Tue, 7 Jul 2020 22:55:19 +0000 (18:55 -0400)]
Fix: kconsumer: missing wait for metadata thread in do_sync_metadata
The `do_sync_metadata` function is invoked everytime a data sub-buffer
is consumed in live mode.
In the user space case, lttng_ustconsumer_sync_metadata() returns
EAGAIN (positive) when there is new metadata to consume. This causes the
"metadata rendez-vous" synchronization to take place. However, the
kernel variant of this function returns 0 when there is new data to
consume, causing the "rendez-vous" to be skipped.
I have not observed an issue caused by this first-hand, but the check
appears bogus and skips over an essential synchronization step.
This check has been in place since at least 2013, although the callees
and their return values may have changed at some point in the past.
Solution
--------
The user space and kernel code paths mix various return code conventions
(negative errno, positive errno, 0/-1) which makes it difficult to
understand the final return codes and most likely lead to this confusion
in the first place.
Moreover, returning EAGAIN to indicate that data is ready to be consumed
is not appropriate in view of the existing conventions in the code base.
An explicit `enum sync_metadata_status` is returned by the domains'
sync_metadata operations which allows the common code to handle the
various conditions in a straight-forward manner and for the
"rendez-vous" to take place in the kernel case.
Reported-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ib022eee97054c0b376853dd05593e3b94bc9a8ca
Jérémie Galarneau [Thu, 9 Jul 2020 15:44:47 +0000 (11:44 -0400)]
Fix: tests: interrupting get_next_notification causes test to fail
Attaching a debugger to the `notification` test application causes
tests to fail when the application is waiting for a notification
as the function will return an `INTERRUPTED` status code.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I4ac9ac13d02a96366791e81746e53bb5dde94885
Francis Deslauriers [Mon, 6 Jul 2020 16:31:00 +0000 (12:31 -0400)]
Fix: consumer.c: wrong order of parameter in `DBG()` statement
This debug statement is printing the padded_subbuf_size as the
subbuf_size and inversely.
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Change-Id: I8c719e21999ffaa8df2de6089b65b90f1b211a72
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Mon, 29 Jun 2020 21:29:03 +0000 (17:29 -0400)]
Fix: kernel metadata file outside of kernel/ directory
3ef395a9, a backport of
a2814ea, introduces a regression that
causes the kernel trace metadata file to be created outside of
the "kernel" domain sub-directory.
master and stable-2.12 no longer need the domain sub-directory
to be passed to the consumer daemon. However, it was still used
in the 2.11 release.
Fixes #1275
Reported-by: Christophe Bedard <bedard.christophe@gmail.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I1fe10ddd414cc87264b5e6d6a6d81e45ebc800d4
Jonathan Rajotte [Thu, 25 Jun 2020 20:52:33 +0000 (16:52 -0400)]
Fix: consumer: dangling chunk on buffer allocation failure
Observed issue
==============
Using docker the /dev/shm mount size is 64 MB by default. On system
with many threads (256), combined with the default subbuffer count and
subbufer size, allocation failure of the subbuffer is inevitable since
the /dev/shm mountpoint is too small.
When a user try to destroy the problematic session, the destroy
command hangs.
# Force the size of /dev/shm, make sure to remount it to its orignal
# size when done testing.
sudo mount -o remount,size=1G /dev/shm
lttng create
lttng enable-channel --subbuf-size 500M -u test
lttng enable-event -u -a -c test
lttng start
# Run an app;
../lttng-ust/doc/examples/easy-ust/sample
lttng-sessiond should output:
Error: ask_channel_creation consumer command failed
Error: Error creating UST channel "test" on the consumer daemon
lttng destroy
Output hang:
Destroying session auto-
20200626-112733..........
Cause
=====
The hang is caused by the check of ongoing rotation which never finishes.
The consumer reports that the trace chunk still exists. This is caused
by an imbalance of the reference count for the trace chunk on close.
This is caused by a missing lttng_trace_chunk_put in destroy_channel
which is called on the error path for the consumer channel creation.
Solution
========
Call lttng_trace_chunk_put if the channel->trace_chunk is assigned
inside the channel_destroy function.
The error handling in ust_app_channel_create is also reworked since the
return value for ust_app_channel_send would be squashed for scenario
where contexts are present.
Known drawbacks
=========
None
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Idf19b1d306cf347619ccfda1621d91d8303f3c7c
Jérémie Galarneau [Wed, 17 Jun 2020 23:13:50 +0000 (19:13 -0400)]
Fix: consumerd: uninitialized written_bytes on no-data sleep
e66d26f51 introduces a jump to a label which causes `written_bytes`
(the return value) to not be initialized.
written_bytes may be used uninitialized in this function [-Wmaybe-uninitialized]
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I72c5c07298093f27fa72b72cb157ce4eedb81adb
Jérémie Galarneau [Wed, 17 Jun 2020 22:54:09 +0000 (18:54 -0400)]
Build fix: consumerd misnamed label
`sleep_stream` label was changed from `stream_sleep` as part of
a fixup/rebase which didn't make it into the master branch.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ifd257e2b6d1f522d018cf0284a89f49a92b12b02
Jérémie Galarneau [Wed, 17 Jun 2020 22:27:52 +0000 (18:27 -0400)]
consumerd: on_sleep not called on stream when no data is available
The `on_sleep` stream operation is not invoked when a stream's `get`
operation returns ENODATA (no data available).
Since this is not an error, the normal steps of the consumption loops
should be assumed.
Not marked as a fix as this is not problematic right now. However, it
could prove misleading in the future.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I0812e3af4c967390ebba4128781787abf45c76a1
Jonathan Rajotte [Wed, 17 Jun 2020 19:55:36 +0000 (15:55 -0400)]
Fix: invalid discarded events on start/stop without event production
Observed issue
==============
On consecutive start/stop command sequence the reported discarded event
count is N * CPU, where N is the number of start/stop pair executed.
Note that no event generation occurred between each start/stop pair.
lttng start
lttng stop
Tracing stopped for session auto-
20200616-094338
lttng start
lttng stop
Waiting for data availability
Warning: 4 events were discarded, please refer to the documentation on channel configuration.
Tracing stopped for session auto-
20200616-094338
lttng start
lttng stop
Waiting for data availability
Warning: 8 events were discarded, please refer to the documentation on channel configuration.
Tracing stopped for session auto-
20200616-094338
The issue was bisected down to:
commit
6f9449c22eef59294cf1e1dc3610a5cbf14baec0 (HEAD)
Author: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Date: Sun May 10 18:00:26 2020 -0400
consumerd: refactor: split read_subbuf into sub-operations
[...]
Cause
=====
The discarded event local variable, in `consumer_stream_update_stats()`
is initialized with the subbuffer sequence count instead of the
subbuffer discarded event count.
Solution
========
Use the subbuffer discarded event count to initialized the variable.
Known drawbacks
=========
None
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I5ff213d0464cdb591b550f6e610bf15085b18888
Jonathan Rajotte [Wed, 17 Jun 2020 19:05:48 +0000 (15:05 -0400)]
tests: truncate metadata file for regenerate metadata test
Truncating the metadata file ensure that we test the effect of the
regenerate metadata command. Otherwise we simply test the command
return.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I762dc849f69d2cf3fe8bf73c5a77d5c2a4aa4ae5
Jérémie Galarneau [Wed, 17 Jun 2020 16:59:24 +0000 (12:59 -0400)]
Fix: consumerd: user space metadata not regenerated
Observed Issue
==============
The LTTng-IVC tests fail on the `regenerate metadata` tests which
essentially:
- Setups a user space session
- Enables events
- Traces an application
- Stops tracing
- Validates the trace
- Truncates the metadata file (empties it)
- Starts tracing
- Regenerates the metadata
- Stops the session
- Validates the trace
The last trace validation step fails on an empty file (locally) or
a garbled file (remote).
The in-tree tests did no catch any of this since they essentially don't
test much. They verify that the command works (returns 0) but do not
validate any of its effects.
The issue was bisected down to:
commit
6f9449c22eef59294cf1e1dc3610a5cbf14baec0 (HEAD)
Author: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Date: Sun May 10 18:00:26 2020 -0400
consumerd: refactor: split read_subbuf into sub-operations
[...]
Cause
=====
The commit that introduced the issue refactored the sub-buffer
consumption loop to eliminate code duplications between the user space
and kernel consumer daemons.
In doing so, it eleminated a metadata version check from the consumption
path.
The consumption of a metadata sub-buffer follows those relevant
high-level steps:
- `get` the sub-buffer
- /!\ user space specific /!\
- if the `get` fails, attempt to flush the metadata cache's
contents to the ring-buffer
- populate `stream_subbuffer` properties (size, version, etc.)
- check the sub-buffer's version against the last known metadata
version (pre-consume step)
- if they don't match, a metadata regeneration occurred: reset the
metadata consumed position
- consume (actual write/send)
- `put` sub-buffer
[...]
As shown above, the user space consumer must manage the flushing of the
metadata cache explicitly as opposed to the kernel domain for which the
tracer performs the flushing implicitly through the `get` operation.
When the user space consumer encounters a `get` failure, it checks
if all the metadata cache was flushed (consumed position != cache size),
and flushes any remaining contents.
However, the metadata version could have changed and yielded an
identical cache size: a regeneration without any new metadata will
yield the same cache size.
Since
6f9449c22, the metadata version check is only performed after
a successful `get`. This means that after a regeneration, `get`
never succeeds (there is seemingly nothing to consume), and the
metadata version check is never performed.
Therefore, the metadata stream is never put in the `reset` mode,
effectively not regenerating the data.
Note that producing new metadata (e.g. a newly registering app
announcing new events) would work around the problem here.
Solution
========
Add a metadata version check when failing to `get` a metadata
sub-buffer. This is done in `commit_one_metadata_packet()` when the
cache size is seen to be equal to the consumed position.
When this occurs, `consumer_stream_metadata_set_version()`, a new
consumer stream method, is invoked which sets the new metadata version,
sets the `reset` flag, and discards any previously bucketized metadata.
The metadata cache's consumed position is also reset, allowing the
cache flush to take place.
`metadata_stream_reset_cache()` is renamed to
`metadata_stream_reset_cache_consumed_position()` since its name is
misleading and since it is used as part of the fix.
Know drawbacks
==============
None.
Change-Id: I3b933c8293f409f860074bd49bebd8d1248b6341
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Reported-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Jonathan Rajotte [Wed, 27 May 2020 22:49:25 +0000 (18:49 -0400)]
Fix: tests: output_dir contains the consumerd pipe
Prevents failure on teardown. Otherwise, testpoint fails when removing
the consumerd pipe.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I9cbfa211e2545350c28e3b10a34fb00aac0493cb
Jérémie Galarneau [Thu, 13 Feb 2020 23:21:08 +0000 (18:21 -0500)]
Fix: unix: don't PERROR on EAGAIN for non-blocking sockets
EAGAIN is expected on non-blocking UNIX socket operations. This
results in a spammy sessiond log.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I58ba711dad193b8d6849501f3e090797813e18ac
Depends-on: lttng-ust: I5a800fc92e588c2a6a0e26282b0ad5f31c044479
Jérémie Galarneau [Tue, 21 Apr 2020 21:08:10 +0000 (17:08 -0400)]
Fix: sessiond: sessiond and agent deadlock on destroy
Observed issue
--------------
While running the out-of-tree java agent tests [1], the session daemon
and agent often end up in a deadlock.
Attaching gdb to the session daemon, we can see that two threads are
blocked in an intriguing state.
Thread 13 (Thread 0x7f89027fc700 (LWP 9636)):
#0 0x00007f891e81a4cf in __lll_lock_wait () from /usr/lib/libpthread.so.0
#1 0x00007f891e812e03 in pthread_mutex_lock () from /usr/lib/libpthread.so.0
#2 0x000055637f1fbd92 in session_lock_list () at session.c:156
#3 0x000055637f25dc47 in update_agent_app (app=0x7f88ec003480) at agent-thread.c:56
#4 0x000055637f25ec0a in thread_agent_management (data=0x556380cd2400) at agent-thread.c:426
#5 0x000055637f22fb3a in launch_thread (data=0x556380cd24a0) at thread.c:65
#6 0x00007f891e81046f in start_thread () from /usr/lib/libpthread.so.0
#7 0x00007f891e7203d3 in clone () from /usr/lib/libc.so.6
Thread 8 (Thread 0x7f8919309700 (LWP 9631)):
#0 0x00007f891e81b44d in recvmsg () from /usr/lib/libpthread.so.0
#1 0x000055637f267847 in lttcomm_recvmsg_inet_sock (sock=0x7f88ec0033c0, buf=0x7f89192f5d5c, len=4, flags=0) at inet.c:367
#2 0x000055637f2146c6 in recv_reply (sock=0x7f88ec0033c0, buf=0x7f89192f5d5c, size=4) at agent.c:275
#3 0x000055637f215202 in app_context_op (app=0x7f88ec003400, ctx=0x7f8908020900, cmd=AGENT_CMD_APP_CTX_DISABLE) at agent.c:552
#4 0x000055637f215c2d in disable_context (ctx=0x7f8908020900, domain=LTTNG_DOMAIN_JUL) at agent.c:841
#5 0x000055637f217480 in agent_destroy (agt=0x7f890801dc20) at agent.c:1326
#6 0x000055637f243448 in trace_ust_destroy_session (session=0x7f8908004010) at trace-ust.c:1408
#7 0x000055637f1fd775 in session_release (ref=0x7f8908001e70) at session.c:873
#8 0x000055637f1fb9ac in urcu_ref_put (ref=0x7f8908001e70, release=0x55637f1fd62a <session_release>) at /usr/include/urcu/ref.h:68
#9 0x000055637f1fdad2 in session_put (session=0x7f8908000d10) at session.c:942
#10 0x000055637f2369e6 in process_client_msg (cmd_ctx=0x7f890800e6e0, sock=0x7f8919308560, sock_error=0x7f8919308564) at client.c:2102
#11 0x000055637f2375ab in thread_manage_clients (data=0x556380cd1840) at client.c:2347
#12 0x000055637f22fb3a in launch_thread (data=0x556380cd18b0) at thread.c:65
#13 0x00007f891e81046f in start_thread () from /usr/lib/libpthread.so.0
#14 0x00007f891e7203d3 in clone () from /usr/lib/libc.so.6
T8 is holding session list lock while the cmd_destroy_session
command is being processed. More specifically, it is attempting
to destroy an "agent_context" by communicating with an "agent"
application.
Meanwhile, T13 is still registering that same "agent" application.
Cause
-----
The deadlock itself is pretty simple to understand.
The "agent thread" (T13) has the responsability of accepting new agent
application connections. When such a connection occurs, the thread
creates a new `agent_app` instance and sends the current sessions'
configuration (i.e. their event rules and contexts) to the agent
application. When that "update" is complete, a "registration done"
message is sent to the new agent application.
From the stacktrace above, we can see that T13 is attempting to update
the agent application with its initial configuration, but it is
blocked on the acquisition of the session list lock. The application's
agent is also blocked since it is waiting for the "registration done"
message before allowing tracing to proceed (not shown here, but seen
in the test logs).
Meanwhile, T8 is holding the session list lock while destroying a
session. This is expected as all client commands are executed with
this lock held. It is, amongst other reasons, used to serialize
changes to the sessions' configuration and configuration updates sent
to the tracers (i.e. because new apps appear or to keep existing
tracers in sync with the users' session configuration).
The question becomes: why is T8 tearing down an application that is
not yet registered?
First, inspecting `agent_app` immediately shows that this structure
has no built-in synchronization mechanism. Therefore, the fact that
two threads are accessing it at the same time raises a big red flag.
Speculating on the intentions of the original design, my intuition is
that the "agent_management" thread's role is limited to instantiating
an `agent_app` and synchronizing it with the various sessions'
configuration. Once that synchronization is performed, the agent
application should be published and never accessed again by the "agent
thread".
Configuration updates (i.e. new event rules, contexts) are then sent
synchronously as they are requested by a client in the context of the
client thread. Those updates are performed while holding the session
list lock.
Hence, there is only one thread that should manipulate the agent
application at any given time making an explicit `agent_app` lock
unnecessary.
Overall, this would echo what is done when a 'user space tracer'
application registers to the session daemon (see dispatch.c:368).
Evidently this isn't what is happening here.
The agent thread creates the `agent_app`, publishes it, and then
performs an "agent app update" (sending the configuration) while
holding the session list lock. This means that there is a window where
an agent application is visible to the other threads, yet has not been
properly registered.
Solution
--------
The acquisition of the session list lock is moved outside of
update_agent_app() to allow the "agent thread" to hold the session
list lock during the "configuration update" phase of the agent
application registration.
Essentially, the sequence of operation changes from:
- Agent tcp connection established
- call handle_registration()
- agent version check
- allocation of agent_app instance
- new agent_add is published through the global agent_apps_ht_by_sock
hashtable
***
it is now reachable by all other threads without any form of
exclusivity synchronization.
***
- update_agent_app
- acquire session list lock
- iterate over sessions
- send configuration
- release session list lock
- send registration done
to:
- Agent tcp connection established
- call accept_agent_registration()
- agent version check
- allocation of agent_app instance
- acquire session list lock
- update_agent_app
- iterate over sessions
- send configuration
- send registration done
- new agent_add is published through the global agent_apps_ht_by_sock
hashtable
- release session list lock
Links
-----
[1] https://github.com/lttng/lttng-ust-java-tests
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ia34c5ad81ed3936acbca756b425423e0cb8dbddf
Jérémie Galarneau [Wed, 27 May 2020 15:27:26 +0000 (11:27 -0400)]
Fix: incorrect specifier %lu used with size_t argument
Fixes the following warning on 32-bit targets:
libtool: compile: gcc -DHAVE_CONFIG_H -I../../../include -I../../../include -I../../../src -include config.h -I/build/include -I/home/jenkins/workspace/lttng-tools_master_portbuild/arch/armhf/babeltrace_version/stable-1.5/build/std/conf/std/liburcu_version/master/test_type/base/deps/build/include -Wall -Wno-incomplete-setjmp-declaration -Wdiscarded-qualifiers -Wmissing-declarations -Wmissing-prototypes -Wmissing-parameter-type -fno-strict-aliasing -pthread -g -O2 -MT consumer-stream.lo -MD -MP -MF .deps/consumer-stream.Tpo -c consumer-stream.c -fPIC -DPIC -o .libs/consumer-stream.o
In file included from ../../../src/common/common.h:12:0,
from consumer.c:25:
consumer.c: In function ‘lttng_consumer_on_read_subbuffer_mmap’:
../../../src/common/error.h:161:35: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 7 has type ‘size_t {aka unsigned int}’ [-Wformat=]
#define DBG(fmt, args...) _ERRMSG("DEBUG1", PRINT_DBG, fmt, ## args)
^
../../../src/common/error.h:136:51: note: in definition of macro ‘__lttng_print’
fprintf((type) == PRINT_MSG ? stdout : stderr, fmt, ## args); \
^~~
../../../src/common/error.h:161:27: note: in expansion of macro ‘_ERRMSG’
#define DBG(fmt, args...) _ERRMSG("DEBUG1", PRINT_DBG, fmt, ## args)
^~~~~~~
consumer.c:1688:2: note: in expansion of macro ‘DBG’
DBG("Consumer mmap write() ret %zd (len %lu)", ret, write_len);
^~~
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Id9a571d8e94105428833baa053c6463b91484a03
Jérémie Galarneau [Thu, 14 May 2020 18:24:17 +0000 (14:24 -0400)]
Fix: consumerd: live client receives incomplete metadata
Observed issue
==============
Babeltrace 1.5.x and Babeltrace 2.x can both report errors (albeit
differently) when using the "lttng-live" protocol that imply that the
metadata they received is incomplete.
For instance, babeltrace 1.5.3 reports the following error:
```
[error] Error creating AST
[error] [Context] Cannot open_mmap_trace of format ctf.
[error] Error adding trace
[warning] [Context] Cannot open_trace of format lttng-live at path net://localhost:xxxx/host/session/live_session.
[warning] [Context] cannot open trace "net://localhost:xxxx/host/session/live_session" for reading.
[error] opening trace "net://localhost:xxxx/host/session/live_session" for reading.
[error] none of the specified trace paths could be opened.
```
While debugging both viewers, I noticed that both were attempting to
receive the available metadata before consuming the "data" streams'
content.
Typically, the following exchange between the relay daemon and the
lttng-live client occurs when the problem is observed:
bt lttng-live:
emits LTTNG_VIEWER_GET_METADATA command
relayd:
returns LTTNG_VIEWER_METADATA_OK, len = 4096 (default packet size)
bt lttng-live:
consume 4096 bytes of metadata
emits LTTNG_VIEWER_GET_METADATA command
relayd:
returns LTTNG_VIEWER_NO_NEW_METADATA
When the lttng-live client receives the LTTNG_VIEWER_NO_NEW_METADATA
status code, it attempts to parse all the metadata it has received
since the last LTTNG_VIEWER_NO_NEW_METADATA reply. In effect, it is
expected that this forms a logical unit of metadata that is parseable
on its own.
If this is the first time metadata is received for that trace, the
metadata is expected to contain a trace declaration, packet header
declaration, etc.
If metadata was already received, it is expected that the newly parsed
declarations can be "appended" to the existing trace schema.
It appears that the relay daemon sends the
LTTNG_VIEWER_NO_NEW_METADATA while the metadata it has sent up to that
point is not parseable on its own.
The live protocol description does not require or imply that a viewer
should attempt to parse metadata packets until it hopefully succeeds
at some point. Anyhow:
1) This would make it impossible for a live viewer to correctly
handle a corrupted metadata stream beyond retrying forever,
2) This behaviour is not implemented by the two reference
implementations of the protocol.
Cause
=====
The relay daemon provides a guarantee that it will send any available
metadata before allowing a data stream packet to be served to the
client.
In other words, a client requesting a data packet will receive the
LTTNG_VIEWER_FLAG_NEW_METADATA status code (and no data) if it
attempts to get a data stream packet while the relay daemon has
metadata already available.
This guarantee is properly enforced as far as I can tell. However,
looking at the consumer daemon implementation, it appears that
metadata packets are sent as soon as they are available.
A metadata packet is not guaranteed to be parseable on its own. For
instance, it can end in the middle the an event declaration.
Hence, this hints at a race involving the tracer, the consumer daemon,
the relay daemon, and the lttng-live client.
Consider the following scenario:
- Metadata packets (sub-buffers) are configured to be 4kB in size,
- a large number of kernel events are enabled (e.g. --kernel --all),
- the network connection between the consumer and relay daemons is
slow
1) The kernel tracer will produce enough TSDL metadata to fill the
first sub-buffer of the "metadata" ring-buffer and signal the
consumer daemon that a buffer is ready. The tracer then starts
writing the remaining data in the following available sub-buffers.
2) The consumer daemon metadata thread is woken up and consumes the
first metadata sub-buffer and sends it to the relay daemon.
3) A live client establishes an lttng-live connection to the relay
daemon and attempts to consume the available metadata. It receives
the first packet and, since the relay daemon doesn't know about any
follow-up metadata, receives LTTNG_VIEWER_NO_NEW_METADATA on the
next attempt.
4) Having received LTTNG_VIEWER_NO_NEW_METADATA, the lttng-live client
attempts to parse the metadata it has received and fails.
This scenario is easy to reproduce by inserting a "sleep(1)" at
src/bin/lttng-relayd/main.c:1978 (as of this revision). This simulates
a relay daemon that would be slow to receive/process metadata packets
from the consumer daemon.
This problem similarly applies to the user space tracer.
Solution
========
Having no knowledge of TSDL, the relay daemon can't "bundle" packets
of metadata until they form a "parseable unit" to send to the consumer
daemon.
To provide the parseability guarantee expected by the viewers, and by
the relay daemon implicitly, we need to ensure that the consumer
daemons only send "parseable units" of metadata to the relay daemon.
Unfortunately, the consumer daemons do not know how to parse TSDL
either. In fact, only the metadata producers are able to provide the
boundaries of the "parseable units" of metadata.
The general idea of the fix is to accumulate metadata up to a point
where a "parseable unit" boundary has been identified and send that
content in one request to the relay daemon. Note that the solution
described here only concerns the live mode. In other cases, the
mechanisms described are simply bypassed.
A "metadata bucket" is added to lttng_consumer_stream when it is
created from a live channel. This bucket is filled until the
consumption position reaches the "parseable unit" end position.
A refresher about the handling of metadata in live mode
-------------------------------------------------------
Three "events" are of interest here and can cause metadata to be
consumed more or less indirectly:
1) A metadata packet is closed, causing the metadata thread to wake
up
2) The live timer expires
3) A data sub-buffer is closed, causing the data thread to wake-up
1) The first case is simple and happens regardless of whether or not
the tracing session is in live mode or not. Metadata is always
consumed by the metadata thread in the same way. However, this
scenario can be "caused" by (2) and (3). See [1]. A sub-buffer is
"acquired" from the metadata ring-buffer and sent to the relayd
daemon as the payload of a "RELAYD_SEND_METADATA" command.
2) When the live timer expires [2], the 'check_stream' function is
called on all data streams of the session. As its name clearly
implies, this function is responsible for flushing all streams or
sending a "live beacon" (called an "empty index" in the code) if
there is no data to flush. Any flushed data will result in (3).
3) When a data sub-buffer is ready to be consumed, [1] is invoked
by the data thread. This function acquires a sub-buffer and sends
it to the relay daemon through the data connection.
Then, an important synchronization step takes place. The index of
the newly-sent packet will be sent through the control
connection. The relay daemon waits for both the data packet and its
matching index before making the new packet visible to live
viewers.
Since a data packet could contain data that requires "newer"
metadata to be decoded, the data thread flushes the metadata stream
and enters a "waiting" phase to pause until all metadata present in
the metadata ring buffer has been consumed [3].
At the end of this waiting phase, the data thread sends the data
packet's index to the relay daemon, allowing the relayd to make it
visible to its live clients.
How to identify a "parseable unit" boundary?
--------------------------------------------
In the case of the kernel domain, the kernel tracer produces the
actual TSDL descriptions directly. The TSDL metadata is serialized to
a metadata cache and is flushed "just in time" to the metadata
ring-buffer when a "get next" operation is performed.
There is no way, from user space, to query whether or not the metadata
cache of the kernel tracer is empty. Hence, a new
RING_RING_BUFFER_GET_NEXT_SUBBUF_METADATA_CHECK command was added to
query whether or not the kernel tracer's metadata cache is empty when
acquiring a sub-buffer.
This allows the consumer daemon to identify a "coherent" position in
the metadata stream that is safe to use as a "parseable unit"
boundary.
As for the user space domain, since the session daemon is responsible
for generating the TSDL representation of the metadata, there is no
need to change LTTng-ust APIs.
The session daemon generates coherent units of metadata and adds them
to its "registry" at once (protected by the registry's lock). It then
flushes the contents to the consumer daemon and waits for that data to
be consumed before proceeding further.
On the consumer daemon side, the metadata cache is filled with the
newly-produced contents. This is done atomically with respect to
accesses to the metadata cache as all accesses happen through a
dedicated metadata cache lock.
When the consumer's metadata polling thread is woken-up, it will
attempt to acquire (`get_next`) a sub-buffer from the metadata stream
ring-buffer. If it fails, it will flush a sub-buffer's worth of
metadata to the ring-buffer and attempt to acquire a sub-buffer again.
At this point, it is possible to determine if that sub-buffer is the
last one of a parseable metadata unit: the cache must be empty and the
ring-buffer must be empty following the consumption of this
sub-buffer. When those conditions are met, the resulting metadata
`stream_subbuffer` is tagged as being `coherent`.
Metadata bucket
---------------
A helper interface, metadata_bucket, is introduced as part of this
fix. A metadata_bucket is `fill`ed with `stream_subbuffer`s, and is
eventually `flushed` when it is filled by a `coherent` sub-buffer.
As older versions of LTTng-modules must remain supported, this new
helper is not used when the
RING_RING_BUFFER_GET_NEXT_SUBBUF_METADATA_CHECK operation is not
available. When the operation is available, the metadata stream's
bucketization is enabled, causing a bucket to be created and the
`consume` callback to be swapped.
The `consume` callback of the metadata streams is replaced by a new
implementation when the metadata bucketization is activated on the
stream. This implementation returns the padded size of the consumed
sub-buffer when they could be added to the bucket. When the bucket is
flushed, the regular `mmap`-based consumption function is called with
the bucket's contents.
Known drawbacks
===============
This implementation causes the consumer daemon to buffer the whole
initial unit of metadata before sending it. In practice, this is not
expected to be a problem since the largest metadata files we have seen
in real use are a couple of megabytes wide.
Beyond the (temporary) memory use, this causes the metadata thread to
block while this potentially large chunk of metadata is sent (rather
than blocking while sending 4kb at a time).
The second point is just a consequence of existing shortcomings of the
consumerd; slow IO should not affect other unrelated streams. The
fundamental problem is that blocking IO is used and we should switch
to non-blocking communication if this is a problem (as is done in the
relay daemon).
The first point is more problematic given the existing tracer APIs.
If the tracer could provide the boundary of a "parseable unit" of
metadata, we could send the header of the RELAYD_SEND_METADATA command
with that size and send the various metadata packets as they are made
available. This would make no difference to the relay daemon as it is
not blocking on that socket and will not make the metadata size change
visible to the "live server" until it has all been received.
This size can't be determined right now since it could exceed the
total size of the "metadata" ring buffer. In other words, we can't wait
for the production of metadata to complete before starting to consume.
Finally, while implementing this fix, I also realized that the
computation of the rotation position of the metadata streams is
erroneous. The rotation code makes use of the ring-buffer's positions
to determine the rotation position. However, since both user space and
kernel domains make use of a "cache" behind the ring-buffer, that
cached content must be taken into account when computing the metadata
stream's rotation position.
References
==========
[1] https://github.com/lttng/lttng-tools/blob/
d5ccf8fe0/src/common/consumer/consumer.c#L3433
[2] https://github.com/lttng/lttng-tools/blob/
d5ccf8fe0/src/common/consumer/consumer-timer.c#L312
[3] https://github.com/lttng/lttng-tools/blob/
d5ccf8fe0/src/common/consumer/consumer-stream.c#L492
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I40ee07e5c344c72d9aae2b9b15dc36c00b21e5fa
Jérémie Galarneau [Sun, 10 May 2020 22:00:26 +0000 (18:00 -0400)]
consumerd: refactor: split read_subbuf into sub-operations
The read_subbuf code paths intertwine domain-specific operations and
metadata/data-specific logic which makes modifications error prone and
introduces a fair amount of code duplication.
lttng_consumer_read_subbuffer is effectively turned into a template
method invoking overridable callbacks making most of the consumption
logic domain and data/metadata agnostic.
The goal is not to extensively clean-up that code path. A follow-up
fix introduces metadata buffering logic which would not reasonably fit
in the current scheme. This clean-up makes it easier to safely
introduce those changes.
No changes in behaviour are intended by this change.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I9366f2e2a38018ca9b617b93ad9259340180c55d
Jérémie Galarneau [Fri, 8 May 2020 20:00:11 +0000 (16:00 -0400)]
consumerd: move rotation logic to domain-agnostic read path
The "rotation ready" logic is duplicated in both user space and kernel
specializations of the read subbuffer functions.
It is moved to the domain-agnostic caller where it is needed only
once. This makes it easier to implement a follow-up fix and reduces
code duplication.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Iae952a2cd52fa458cec956ae219492557e4adf79
Jérémie Galarneau [Tue, 5 May 2020 22:54:32 +0000 (18:54 -0400)]
sessiond: enforce mmap output type for kernel metadata channel
A follow-up fix causes the consumer daemon to accumulate metadata
packets into a complete "unit" that can be parsed before sending it to
the relay daemon.
The consumer daemon will also need to extract the contents of the
metadata cache when computing a rotation position (follow-up fix too).
Hence, it is not possible to rely on the splice back-end as the
consumer daemon may need to accumulate more content than can be backed
by the ring buffer's underlying pages.
In both cases, the splice output mode could still be used when
combined with a memfd, but I see no tangible benefit. Moreover, it
would require a 3.17 kernel.
Curiously the kernel metadata channel configuration appears to be
hard-coded twice; once in the ltt_kernel_session's
ltt_kernel_metadata, and once again in
kernel_consumer_add_metadata(). kernel_consumer_add_metadata is
modified to use the kernel session's metadata configuration.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ia4cad82f595d3eee50d081851c234d4c2ef7ee5f
Jérémie Galarneau [Tue, 5 May 2020 19:48:05 +0000 (15:48 -0400)]
consumerd: tag metadata channel as being part of a live session
metadata channels that are part of a live session must be handled
differently than when they are part of non-live sessions since
complete "metadata units" must be accumulated before they are
forwarded to a relay daemon.
This allows a follow-up fix to use this information since the
live_timer_interval of a metadata channel is always 0.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I53db4bc717b149ed20e0309531db6f0241e873e1
Jérémie Galarneau [Tue, 5 May 2020 17:13:03 +0000 (13:13 -0400)]
consumerd: pass channel instance to stream creation function
Both callsites of consumer_allocate_stream() set the stream's "chan"
pointer after the creation. Pass the channel directly to the stream
creation function so it can initialize the stream according to the
channel's settings.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Icea7088e7695e310585bf398e14e6443d67a30bb
Jérémie Galarneau [Mon, 4 May 2020 23:04:02 +0000 (19:04 -0400)]
consumerd: cleanup: use buffer view interface for mmap read subbuf
Replace explicit pointer + size parameters by an lttng_buffer_view
in lttng_consumer_on_read_subbuffer_mmap().
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I76f35b3e295c596cdf4bbb8a6d01168a850a721a
Jérémie Galarneau [Mon, 4 May 2020 22:21:48 +0000 (18:21 -0400)]
consumerd: move address computation from on_read_subbuffer_mmap
The computation of the subbuffer's address is moved outside of
lttng_consumer_on_read_subbuffer_mmap to make it usable with a regular
buffer. This facilitates an upcoming change.
Moreover this has the benefit of isolating domain-specific logic from
this function which is supposed to be domain-agnostic.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I16f8ccaa73804f98fa03e69136548e6d6b7782e5
Jérémie Galarneau [Wed, 29 Apr 2020 04:03:43 +0000 (00:03 -0400)]
consumerd: refactor: combine duplicated check_*_functions
The check_ust_stream and check_kernel_stream functions are identical
except for the call to the domain-specific call to
consumer_flush_*_index.
A "flush_index" callback is passed to check_stream in order to share
the rest of that code.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Iafdb64192322c0106a555b67f54290dadc4f0579
Jérémie Galarneau [Wed, 29 Apr 2020 01:40:12 +0000 (21:40 -0400)]
kerner-ctl: add RING_RING_BUFFER_GET_NEXT_SUBBUF_METADATA_CHECK
Add a wrapper for RING_RING_BUFFER_GET_NEXT_SUBBUF_METADATA_CHECK
which gets the next metadata subbuffer and returns a boolean flag
indicating whether the metadata is guaranteed to be in a consistent
state at the end of this sub-buffer (can be parsed).
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I13fbdfe51c3c4ef04581409e0fbc9837ed6d555d
Jérémie Galarneau [Fri, 15 May 2020 20:04:11 +0000 (16:04 -0400)]
Add lttng_dynamic_buffer_append_view util
Add lttng_dynamic_buffer_append_view() which appends the contents
of a buffer view to a dynamic buffer.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I4082ba2c848b79aa2116847987067453638de441
Jérémie Galarneau [Thu, 14 May 2020 20:08:56 +0000 (16:08 -0400)]
Fix: lttng: Destroying session message repeated during destruction
Observed Issue
==============
The `Destroying session X` is repeated indifinitely whenever
the data pending phase lasts more than one iteration.
```
$ lttng destroy
Destroying session eloi_turcotte.Destroying session
eloi_turcotte.Destroying session eloi_turcotte.Destroying session
eloi_turcotte.D
```
Cause
=====
Missing check that the message has been printed.
Solution
========
Use the same check as is done later for
lttng_destruction_handle_wait_for_completion().
Known drawbacks
===============
None.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I6cd29d917925644a4994c515b4177bbd05ffa98e
Jérémie Galarneau [Tue, 19 May 2020 21:00:23 +0000 (17:00 -0400)]
Update version to v2.11.4
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Tue, 19 May 2020 16:23:18 +0000 (12:23 -0400)]
Fix: common: abort on rotation after time manipulation
Observed issue
==============
Core dump:
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x0000003eb4025548 in __GI_abort () at abort.c:79
#2 0x0000003eb402542f in __assert_fail_base (fmt=0x3eb4184ae0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x4cdee0 "(trace_chunk->timestamp_close).is_set",
file=0x4cde78 "../../../lttng-tools-2.11.3/src/common/trace-chunk.c", line=903, function=0x4cf4a0 <__PRETTY_FUNCTION__.6756> "lttng_trace_chunk_move_to_completed")
at assert.c:92
#3 0x0000003eb4033af2 in __GI___assert_fail (assertion=assertion@entry=0x4cdee0 "(trace_chunk->timestamp_close).is_set",
file=file@entry=0x4cde78 "../../../lttng-tools-2.11.3/src/common/trace-chunk.c", line=line@entry=903,
function=function@entry=0x4cf4a0 <__PRETTY_FUNCTION__.6756> "lttng_trace_chunk_move_to_completed") at assert.c:101
#4 0x000000000047f37e in lttng_trace_chunk_move_to_completed (trace_chunk=0x7fcb5c00e570) at ../../../lttng-tools-2.11.3/src/common/trace-chunk.c:903
#5 0x0000000000480755 in lttng_trace_chunk_release (ref=0x7fcb5c00e598) at ../../../lttng-tools-2.11.3/src/common/trace-chunk.c:1117
#6 urcu_ref_put (release=<optimized out>, ref=0x7fcb5c00e598) at /usr/include/urcu/ref.h:68
#7 lttng_trace_chunk_put (chunk=0x7fcb5c00e570) at ../../../lttng-tools-2.11.3/src/common/trace-chunk.c:1150
#8 0x0000000000429c22 in cmd_rotate_session (session=0x7fcb5c003ff0, rotate_return=rotate_return@entry=0x7fcb6b7ed470, quiet_rotation=quiet_rotation@entry=false)
at ../../../../lttng-tools-2.11.3/src/bin/lttng-sessiond/cmd.c:5037
#9 0x00000000004451d7 in process_client_msg (cmd_ctx=0x7fcb5c00e760, sock=sock@entry=0x7fcb6b7fd4c0, sock_error=sock_error@entry=0x7fcb6b7fd4c4)
at ../../../../lttng-tools-2.11.3/src/bin/lttng-sessiond/client.c:1852
#10 0x00000000004474c6 in thread_manage_clients (data=<optimized out>) at ../../../../lttng-tools-2.11.3/src/bin/lttng-sessiond/client.c:2199
#11 0x00000000004422f2 in launch_thread (data=0x4f97a0) at ../../../../lttng-tools-2.11.3/src/bin/lttng-sessiond/thread.c:75
#12 0x0000003eb4408ed4 in start_thread (arg=<optimized out>) at pthread_create.c:479
#13 0x0000003eb40f8e6f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Reproduction:
Disable ntp/any time management mechanism.
lttng create
lttng enable-event -u 'lttng_ust_tracef:*'
lttng start
lttng rotate
date --set="$(date --date='-1 hour')"
lttng rotate auto-
20200515-142503
Waiting for rotation to complete
Error: Failed to query the state of the rotation.
Logs:
DEBUG1 - 12:25:28.
570037987 [2660/2717]: Setting trace chunk close command to "move to completed chunk folder" (in lttng_trace_chunk_set_close_command() at ../../../lttng-tools-2.11.3/src/common/trace-chunk.c:1073)
Error: Failed to set trace chunk close timestamp: close timestamp is before creation timestamp
Error: Failed to set the close timestamp of the current trace chunk of session "auto-
20200515-142503"
lttng-sessiond: ../../../lttng-tools-2.11.3/src/common/trace-chunk.c:903: lttng_trace_chunk_move_to_completed: Assertion `(trace_chunk->timestamp_close).is_set' failed.
...
Aborted (core dumped)
root@X10SDV-8C-TLN4F:~# DEBUG1 - 12:25:29.
534263017 [2739/2739]: Releasing trace chunk registry to all trace chunks (in lttng_trace_chunk_registry_put_each_chunk() at ../../../lttng-tools-2.11.3/src/common/trace-chunk.c:1414)
DEBUG1 - 12:25:29.
534317468 [2739/2739]: Releasing reference to trace chunk: session_id = 0chunk_id = 2, name = "20200515T122528+0000-2", status = closed (in lttng_trace_chunk_registry_put_each_chunk() at ../../../lttng-tools-2.11.3/src/common/trace-chunk.c:1435)
DEBUG1 - 12:25:29.
534365653 [2739/2739]: Releasing reference to trace chunk: session_id = 0chunk_id = 1, name = "20200515T142520+0000-1", status = closed (in lttng_trace_chunk_registry_put_each_chunk() at ../../../lttng-tools-2.11.3/src/common/trace-chunk.c:1435)
DEBUG1 - 12:25:29.
534400638 [2739/2739]: Released reference to 2 trace chunks in lttng_trace_chunk_registry_put_each_chunk() (in lttng_trace_chunk_registry_put_each_chunk() at ../../../lttng-tools-2.11.3/src/common/trace-chunk.c:1447)
Error: 2 trace chunks are leaked by lttng-consumerd. This can be caused by an internal error of the session daemon.
Cause
=====
The trace_chunk->timestamp_close is not set since the result from time()
is smaller than the creation timestamp.
The close timestamp is smaller because the calendar system time is
modified by an administrator.
time() offers no monotonicity guarantee and hence is exposed to time
modification of the system.
The begin and close timestamps are strictly used in the name generation
of the chunk/archives. Given the current usage of these timestamps
validating monotonicity should not be a fatal error. Name uniqueness is
provided by the chunk name suffix (auto increment).
Solution
========
Do not enforce monotonicity for the begin and close timestamps but warn
on unexpected return (begin > close).
Known drawbacks
=========
None.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ic4b17285d150358d1569d6821c451c243e64e9a1
Francis Deslauriers [Mon, 4 May 2020 19:59:22 +0000 (15:59 -0400)]
Fix: tests: `-Wstringop-overflow` warning
I get the following warning when compiling with `-Wall -Werror` with gcc
9.3.0:
In file included from ../../src/common/macros.h:15,
from ../../src/common/lttng-kernel.h:14,
from ../../src/bin/lttng-sessiond/trace-kernel.h:14,
from test_kernel_data.c:16:
In function ‘lttng_strnlen’,
inlined from ‘lttng_strncpy’ at ../../src/common/macros.h:120:6,
inlined from ‘test_create_kernel_event’ at test_kernel_data.c:136:2:
../../src/common/compat/string.h:28:8: error: ‘memchr’ reading 256 bytes from a region of size 11 [-Werror=stringop-overflow=]
28 | end = memchr(str, 0, max);
| ^~~~~~~~~~~~~~~~~~~
Fix this warning by using the RANDOM_STRING_LEN value as the max number
of bytes to copy.
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Change-Id: I61752ee17163c4d642aad21b296c0fc4fad5b7a6
(cherry picked from commit
1dd622b10db0821d77490c937caee80c65332f14)
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jonathan Rajotte [Tue, 7 Apr 2020 20:27:05 +0000 (16:27 -0400)]
Fix: load: incomplete error handling for load_session_from_file
Observed issue
==============
lttng-ivc test fails to fail.
test_save_load_blocking_timeout[lttng-tools-2.12-lttng-tools-2.11-False]
Here we load a xml created by lttng-tools-2.12 and try to load it using
lttng-tools 2.11. We expect this to fail on the load.
The command report an error on the stderr but the command return code
value is zero.
From lttng-ivc test runtime.log:
Command #0
Return value: 0
Command: lttng load --input-path=/home/joraj/lttng/lttng-ivc/.tox/py3/tmp/test_save_load_blocking_timeou0/save_load saved_trace
STDOUT:
Session saved_trace has been loaded successfully
STDERR:
XML Error: Element 'process_attr_trackers': This element is not expected.
Error: Session configuration file validation failed
Cause
==============
The error coming from load_session_from_file is not handled correctly.
Solution
========
Rework error handling in load_session_from_path and
load_session_from_file.
LTTNG_ERR_LOAD_SESSION_NOENT is NOT an error when session_name is
specified in load_session_from_path. In this scenario, we are actively
looking for the configuration of the session.
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I0fea474045b718664e70ffe169e4e4fa125791a8
Jérémie Galarneau [Thu, 16 Apr 2020 19:20:18 +0000 (15:20 -0400)]
Update version to v2.11.3
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Wed, 1 Apr 2020 22:34:59 +0000 (18:34 -0400)]
Fix: lttng-destroy: missing newline on session destruction message
The lost packets/discarded events statistics are printed on the same
line as the session destruction progress message when the session is
stopped as part of the `destroy` command.
This is a consequence of printing the statistics as they are
retrieved; the statistics must be fetched before the destruction,
but the progress indicator is still being printed.
The statistics output is now formatted to a buffer and printed
after the session's destruction has completed.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I2105056b109774a57c83be3e30984038880c0fb7
Simon Marchi [Wed, 25 Mar 2020 22:39:30 +0000 (18:39 -0400)]
Fix: common: add `void` parameter to log_add_time declaration
It makes it match the definition and fixes this warning:
CC error.lo
/home/smarchi/src/lttng-tools/src/common/error.c:33:13: error: no previous prototype for ‘log_add_time’ [-Werror=missing-prototypes]
const char *log_add_time(void)
^~~~~~~~~~~~
Change-Id: Ibbf5eebd8171504bc7050c754e2a5d113b6b5387
Signed-off-by: Simon Marchi <simon.marchi@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Mon, 23 Mar 2020 23:12:26 +0000 (19:12 -0400)]
Fix: lttng-list: don't warn when the kernel domain has no channels
Some commands beside lttng-enable-channel have the side-effect of
creating a domain. For instance, the lttng-track and lttng-untrack
commands will implicitly create their target domains if they don't
exist. Thus, it is not unexpected for a domain to exist without
channels.
Currently, tracking process attributes in the user space and kernel
domains will result in a warning being printed when lttng-status
(or lttng-list `the_session`) is invoked.
Example output:
Tracing session arielle_bolduc: [inactive]
Trace output: /home/jgalar/lttng-traces/arielle_bolduc-
20200323-191128
=== Domain: Linux kernel ===
Tracked process attributes
Process IDs: all
Virtual Process IDs: 12365, 526, 41
User IDs: all
Virtual User IDs: all
Group IDs: all
Virtual Group IDs: all
Warning: No kernel channel
=== Domain: User space ===
Buffering scheme: per-user
Tracked process attributes
Virtual Process IDs: 12365, 526, 41
Virtual User IDs: all
Virtual Group IDs: all
The warning is removed since it can only confuse users.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I733ee6c3d8c1d297194842c7bdbdf3f237661e51
Jérémie Galarneau [Thu, 5 Mar 2020 21:52:32 +0000 (16:52 -0500)]
Fix: sessiond: occasional badfd error on repeated SIGTERM
The session daemon occasionally prints the following messages
when it received multiple SIGTERM signals:
PERROR - 16:50:18.
505585257 [49845/49845]: write poll pipe: Bad file
descriptor (in notify_thread_pipe() at utils.c:35)
This is caused by a (somewhat inevitable) race between the teardown of
the daemon and the closing of its quit pipe. This happens more often
when kernel modules take a long time to be unloaded and the user
spams ctrl+c in the hope of convincing the daemon process to close
faster since modules are unloaded after closing the quit pipe.
Setting closed pipe fds to '-1' is safe anyway and is already
handled by the notify_thread_pipe() util.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: If08a455c65055dcb56ffbad3703173e6a45cfba5
Jérémie Galarneau [Wed, 4 Mar 2020 23:27:28 +0000 (18:27 -0500)]
Fix: lttng: incorrect domain list printed when no domain is provided
The following commands make use of a common utility function to
validate the count of domains specified and print an error when it
is invalid:
- lttng-enable-event,
- lttng-disable-event,
- lttng-track,
- lttng-untrack,
- lttng-add-context,
- lttng-enable-channel,
- lttng-disable-channel.
Those commands do not allow the same domains to be used. In fact, they
all expect --kernel or --userspace only, except for the
lttng-enable-event, lttng-disable-event, and lttng-add-context
commands which allow the --log4j, --jul, and --python domain options
to be used.
Currently, the error message when no domain is specified is incorrect
for all of those commands. The error reads as follows:
`Error: Please specify a domain (-k/-u/-j).`
For most commands, the -j option cannot be used. For those that allow
agent domains, the message is missing the -l and -p domains.
This ensures that the expected domains are printed for each of those
commands.
Moreover, the message is clarified by using the long form option
names.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I45aee075dbf6c62c4120bdeb06697b88b2d8716c
Francis Deslauriers [Mon, 9 Mar 2020 14:49:51 +0000 (10:49 -0400)]
directory-handle: print `errno` reason on `unlinkat()` error
This is helpful while troubleshooting clear and rotation issues.
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I92f9551a49fced7603c8dca832566de71b188528
Jonathan Rajotte [Tue, 3 Mar 2020 20:30:10 +0000 (15:30 -0500)]
Fix: lttng-ctl: _handle can be NULL
lttng_destroy_session_no_wait does not care for the returned handle
and passes a NULL _handle.
This leads to an immediate LTTNG_ERR_INVALID failure on
lttng_destroy_session_no_wait calls.
lttng_destroy_session_ext already performs a null check on _handle
before assigning it to transfer ownership of the handle.
Fixes #1241
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Change-Id: I9e8bee28382fa2250dee720037efdf77b4e776b8
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Jérémie Galarneau [Tue, 10 Mar 2020 02:06:01 +0000 (22:06 -0400)]
Silence `POSIX Yacc` warnings
Flex/Bison warn that we make use of two non-POSIX directives,
%code and %define. Coincidentally, we don't use POSIX Yacc!
From the bison documentation, "[...] -Wno-yacc will hide the warnings
about POSIX Yacc incompatibilities." [1].
Thus, this flag is added to the YFLAGS.
YACC filter-parser.c
/home/jgalar/EfficiOS/src/lttng-tools/src/lib/lttng-ctl/filter/filter-parser.y:293.1-5:
warning: POSIX Yacc does not support %code [-Wyacc]
293 | %code provides
| ^~~~~
/home/jgalar/EfficiOS/src/lttng-tools/src/lib/lttng-ctl/filter/filter-parser.y:301.1-7:
warning: POSIX Yacc does not support %define [-Wyacc]
301 | %define api.pure
| ^
[1] https://www.gnu.org/software/bison/manual/html_node/Diagnostics.html
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Iaf65534bc47514d3a99b97ce414880a6afc723b9
Jérémie Galarneau [Tue, 18 Feb 2020 01:05:22 +0000 (20:05 -0500)]
Fix: relayd: use of relay_session ref count before initialization
The relay_session's reference count is used before it is initialized
on multiple code paths of session_create(). The initialization of the
reference count, mutexes, and intrusive data structure nodes are
initialized earlier to make their use safe in the event of an error.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I1be53ad88a3e783b85b4c568527df1a75ce58d3a
Jérémie Galarneau [Fri, 14 Feb 2020 22:09:58 +0000 (17:09 -0500)]
Fix: lttng: uninitialized pointer free'd when no sessiond is present
The error path of get_schedules assumes that schedules_comm is !NULL,
which is not the case currently.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I16e2d9d45cd3df9cfa80214abe03bcc782fa1f11
Jérémie Galarneau [Wed, 12 Feb 2020 17:25:23 +0000 (12:25 -0500)]
Update version to v2.11.2
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Id506bae1bbb778096e92af867d0cb5324992c864
Jérémie Galarneau [Fri, 20 Dec 2019 03:13:10 +0000 (22:13 -0500)]
Fix: sessiond: check for lttng-modules ABI 2.1 rather than 2.8
The clear patchset introduces a regression that breaks session
rotations when the kernel domain is used. The 2.8 lttng-modules ABI
does not exist yet.
The packet sequence number functionality was introduced in LTTng 2.8,
which introduced the 2.1 kernel tracer ABI.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ief97e7e25f6f0fcb5b5b97b39abb417c1eb327ec
This page took 0.063165 seconds and 4 git commands to generate.