--- /dev/null
+LTTng calibrate command documentation
+Mathieu Desnoyers, August 6, 2011
+
+The LTTng calibrate command can be used to find out the combined average
+overhead of the LTTng tracer and the instrumentation mechanisms used.
+This overhead can be calibrated in terms of time or using any of the PMU
+performance counters available on the system.
+
+For now, the only calibration implemented is that of the kernel function
+instrumentation (kretprobes).
+
+
+* Calibrate kernel function instrumentation
+
+Let's use an example to show this calibration. We use an i7 processor
+with 4 general-purpose PMU registers. This information is available in
+the dmesg output; look for "generic registers".
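+
+For instance, the number of general-purpose PMU registers detected at
+boot can be checked with the following command (the exact dmesg wording
+may vary between kernel versions):
+
+dmesg | grep -i "generic registers"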
+
+This sequence of commands gathers a trace of a kretprobe hooked on an
+empty function, recording LLC (Last Level Cache) miss information from
+the PMU counters (see lttng add-context --help for the list of
+available PMU counters).
+
+(as root)
+lttng create calibrate-function
+lttng enable-event calibrate --kernel --function lttng_calibrate_kretprobe
+lttng add-context --kernel -t perf:LLC-load-misses -t perf:LLC-store-misses \
+ -t perf:LLC-prefetch-misses
+lttng start
+for a in $(seq 1 10); do \
+ lttng calibrate --kernel --function;
+done
+lttng destroy
+babeltrace $(ls -1drt ~/lttng-traces/calibrate-function-* | tail -n 1)
+
+The output from babeltrace can be saved to a text file and opened in a
+spreadsheet (e.g. oocalc) to focus on the per-PMU counter delta between
+consecutive "calibrate_entry" and "calibrate_return" events. Note that
+these counters are per-CPU, so scheduling events would need to be
+present to account for migration between CPUs. Therefore, for
+calibration purposes, only events staying on the same CPU should be
+considered.
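+
+As a rough command-line alternative to the spreadsheet, the per-CPU
+deltas for one counter could be extracted along the following lines.
+This is only a sketch: the babeltrace text layout and the "cpu_id"
+field name are assumptions and may differ between versions; the
+perf_LLC_load_misses name comes from the add-context command above.
+
+babeltrace $(ls -1drt ~/lttng-traces/calibrate-function-* | tail -n 1) | \
+awk '
+  # extract "name = value" from the current babeltrace text line
+  function field(name,    re) {
+    re = name " = [0-9]+"
+    if (match($0, re))
+      return substr($0, RSTART + length(name) + 3, RLENGTH - length(name) - 3)
+    return ""
+  }
+  /calibrate_entry/  { cpu = field("cpu_id"); entry[cpu] = field("perf_LLC_load_misses") }
+  /calibrate_return/ {
+    cpu = field("cpu_id")
+    if (cpu in entry) {            # only pair events staying on the same CPU
+      print field("perf_LLC_load_misses") - entry[cpu]
+      delete entry[cpu]
+    }
+  }'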
+
+The average results, for the i7, over 10 samples:
+
+                            Average   Std.Dev.
+perf_LLC_load_misses:           5.0      0.577
+perf_LLC_store_misses:          1.6      0.516
+perf_LLC_prefetch_misses:       9.0     14.742
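+
+The average and sample standard deviation for each counter can be
+computed from its per-sample deltas (e.g. saved one per line in a
+hypothetical deltas.txt file) with a one-liner such as:
+
+awk '{ s += $1; ss += $1 * $1; n++ }
+     END { m = s / n; print "avg", m, "stddev", sqrt((ss - n*m*m) / (n-1)) }' deltas.txt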
+
+As we can see, the load and store misses are relatively stable across
+runs (their standard deviation is relatively low) compared to the
+prefetch misses. We can conclude from this information that LLC load and
+store misses can be accounted for quite precisely, but that prefetches
+within a function behave too erratically (there is little causal link
+between the code executed and the CPU prefetch activity) to be accounted
+for.