LTTng calibrate command documentation
Mathieu Desnoyers, August 6, 2011

The LTTng calibrate command can be used to find out the combined average
overhead of the LTTng tracer and the instrumentation mechanisms used.
This overhead can be calibrated in terms of time or using any of the PMU
performance counters available on the system.

For now, the only calibration implemented is that of the kernel function
instrumentation (kretprobes).


* Calibrate kernel function instrumentation

Let's use an example to show this calibration. We use an i7 processor
with 4 general-purpose PMU registers. This information is available by
issuing dmesg and looking for "generic registers".

The following sequence of commands gathers a trace of a kretprobe hooked
on an empty function, recording LLC (Last Level Cache) miss information
from the PMU counters (see lttng add-context --help for the list of
available PMU counters).

(as root)
lttng create calibrate-function
lttng enable-event calibrate --kernel --function lttng_calibrate_kretprobe
lttng add-context --kernel -t perf:LLC-load-misses -t perf:LLC-store-misses \
        -t perf:LLC-prefetch-misses
lttng start
for a in $(seq 1 10); do \
        lttng calibrate --kernel --function;
done
lttng destroy
babeltrace $(ls -1drt ~/lttng-traces/calibrate-function-* | tail -n 1)

The output from babeltrace can be saved to a text file and opened in a
spreadsheet (e.g. oocalc) to focus on the per-PMU counter delta between
consecutive "calibrate_entry" and "calibrate_return" events. Note that
these counters are per-CPU, so scheduling events would need to be
present to account for migration between CPUs. Therefore, for
calibration purposes, only entry/return pairs staying on the same CPU
should be considered.

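As a hypothetical post-processing sketch, the same-CPU deltas for one
counter can be extracted directly from the babeltrace text output. The
exact field syntax ("cpu_id = N", "perf_LLC_load_misses = N") is an
assumption and may differ between babeltrace versions, so adjust the
patterns to match your output:

```shell
# Sketch: print the perf_LLC_load_misses delta for each
# calibrate_entry/calibrate_return pair that stays on the same CPU.
# The field layout matched below is an assumption; adapt it to the
# actual babeltrace text output on your system.
compute_deltas() {
	awk '
	{
		# Pull cpu_id and the counter value out of the current line.
		cpu = ""; val = ""
		if (match($0, /cpu_id = [0-9]+/))
			cpu = substr($0, RSTART + 9, RLENGTH - 9)
		if (match($0, /perf_LLC_load_misses = [0-9]+/))
			val = substr($0, RSTART + 23, RLENGTH - 23)
	}
	/calibrate_entry/ {
		entry_cpu = cpu; entry_val = val
	}
	/calibrate_return/ {
		# Skip pairs that migrated between CPUs.
		if (cpu != "" && cpu == entry_cpu)
			print val - entry_val
		entry_cpu = ""
	}'
}
```

The babeltrace output shown above would simply be piped through this
filter, e.g.
babeltrace $(ls -1drt ~/lttng-traces/calibrate-function-* | tail -n 1) | compute_deltas
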
The average result, for the i7, on 10 samples:

                            Average  Std.Dev.
perf_LLC_load_misses:           5.0     0.577
perf_LLC_store_misses:          1.6     0.516
perf_LLC_prefetch_misses:       9.0    14.742

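Statistics like the table above can be computed from the column of
per-pair deltas (one number per line) with a short awk filter; this is
a sketch of the calculation, not necessarily the tool used originally:

```shell
# Sketch: average and sample standard deviation (n - 1 denominator)
# of a column of numbers read from stdin, one per line.
mean_stddev() {
	awk '
	{ sum += $1; sumsq += $1 * $1; n++ }
	END {
		mean = sum / n
		# Sample variance: (sum of squares - n * mean^2) / (n - 1).
		printf "%.1f %.3f\n", mean, sqrt((sumsq - n * mean * mean) / (n - 1))
	}'
}
```

For example, feeding the ten per-sample deltas of one counter into
mean_stddev yields its row of the table.
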
As we can see, the load and store misses are relatively stable across
runs (their standard deviation is relatively low) compared to the
prefetch misses. We can conclude from this information that LLC load and
store misses can be accounted for quite precisely, but prefetches within
a function behave too erratically (there is little causal link between
the code executed and the CPU prefetch activity) to be accounted for.