git.lttng.org Git - lttng-ust.git/commit

Fix: ring buffer: handle concurrent update in nested buffer wrap around check

With stress-test loads that trigger sub-buffer switches very frequently
(small 4kB sub-buffers, frequent flush) in lttng-modules, we currently
observe this kind of warning once every few minutes:

[65335.896208] ring buffer relay-overwrite-mmap, cpu 5: records were lost. Caused by:
[65335.896208] [ 0 buffer full, 1 nest buffer wrap-around, 0 event too big ]

It appears that the check for nested buffer wrap-around does not take
into account that concurrent execution contexts (either nested for
per-cpu buffers, or from another CPU or nested for global buffers) can
update the commit_count value concurrently.

What we really want to do with this check is to ensure that if we enter
a sub-buffer that had an unbalanced reserve/commit count, assuming there
is no hope that this gets rebalanced promptly, we detect this and drop
the current event. However, in the case where the commit counter has
been concurrently updated by another reserve or a switch, we want to
retry the entire reserve operation.

One way to detect this is to sample the reserve offset twice, around the
commit counter read, along with the appropriate memory barriers.
That way, we can detect whether the mismatch between reserve and commit
counter is actually caused by a concurrent update, which necessarily has
updated the reserve counter.
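
The double-sampling idea above can be sketched roughly as follows. This is
a simplified illustration and not the code from this commit: the struct, the
enum and the sketch_* function names are hypothetical, the expected commit
count is passed in instead of being derived from the reserve offset, and C11
acquire fences stand in for the read barriers mentioned above.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical, simplified model of the counters involved. */
struct sketch_buf {
	atomic_ulong reserve_offset;	/* models the reserve counter */
	atomic_ulong commit_count;	/* models the commit counter of the
					 * sub-buffer being entered */
};

enum sketch_result {
	SKETCH_OK,	/* counters balanced, reserve may proceed */
	SKETCH_RETRY,	/* concurrent update seen, redo the reserve */
	SKETCH_DROP,	/* genuine imbalance, drop the event */
};

/* Decide what to do from the three sampled values. */
static enum sketch_result
sketch_classify(unsigned long offset_before, unsigned long commit,
		unsigned long offset_after, unsigned long expected_commit)
{
	/*
	 * The reserve offset moved while the commit counter was read:
	 * the sampled commit count may be stale, so do not trust it.
	 */
	if (offset_before != offset_after)
		return SKETCH_RETRY;
	/* Stable reserve offset, yet the counters disagree: drop. */
	if (commit != expected_commit)
		return SKETCH_DROP;
	return SKETCH_OK;
}

/* Sample the reserve offset twice, around the commit counter read. */
static enum sketch_result
sketch_check_wrap(struct sketch_buf *buf, unsigned long expected_commit)
{
	unsigned long before, commit, after;

	assert(buf);
	before = atomic_load_explicit(&buf->reserve_offset,
				      memory_order_relaxed);
	/* Read the reserve offset before the commit counter. */
	atomic_thread_fence(memory_order_acquire);
	commit = atomic_load_explicit(&buf->commit_count,
				      memory_order_relaxed);
	/* Read the commit counter before re-reading the reserve offset. */
	atomic_thread_fence(memory_order_acquire);
	after = atomic_load_explicit(&buf->reserve_offset,
				     memory_order_relaxed);
	return sketch_classify(before, commit, after, expected_commit);
}
```

In this sketch a caller would restart the whole reserve operation on
SKETCH_RETRY and count a lost record only on SKETCH_DROP, which matches the
retry-versus-drop distinction described above.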

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

author	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
	Mon, 1 Jul 2013 21:10:52 +0000 (17:10 -0400)
committer	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
	Mon, 1 Jul 2013 21:16:31 +0000 (17:16 -0400)
commit	ad50273e30c66aaff8af9e801b61f9402f0d70e4
tree	bb9175c22680f2f4ce6b3243365ba19d3d3911b3
parent	3ff6e388aad7e21dbbd556cbe1d53f1d6714534e

libringbuffer/frontend_internal.h
libringbuffer/ring_buffer_frontend.c