LTTng calibrate command documentation
Mathieu Desnoyers, August 6, 2011

The LTTng calibrate command can be used to find out the combined average
overhead of the LTTng tracer and the instrumentation mechanisms used.
This overhead can be calibrated in terms of time or using any of the PMU
performance counters available on the system.

For now, the only calibration implemented is that of the kernel function
instrumentation (kretprobes).


* Calibrate kernel function instrumentation

Let's use an example to show this calibration. We use an i7 processor
with 4 general-purpose PMU registers. This information is available by
issuing dmesg and looking for "generic registers".

The following sequence of commands gathers a trace of a kretprobe hooked
on an empty function, recording LLC (Last Level Cache) miss information
from the PMU counters (see lttng add-context --help for the list of
available PMU counters).

(as root)
lttng create calibrate-function
lttng enable-event calibrate --kernel --function lttng_calibrate_kretprobe
lttng add-context --kernel -t perf:LLC-load-misses -t perf:LLC-store-misses \
        -t perf:LLC-prefetch-misses
lttng start
for a in $(seq 1 10); do \
        lttng calibrate --kernel --function;
done
lttng destroy
babeltrace $(ls -1drt ~/lttng-traces/calibrate-function-* | tail -n 1)

The output from babeltrace can be saved to a text file and opened in a
spreadsheet (e.g. oocalc) to focus on the per-PMU counter delta between
consecutive "calibrate_entry" and "calibrate_return" events. Note that
these counters are per-CPU, so scheduling events would need to be
present to account for migration between CPUs. Therefore, for
calibration purposes, only entry/return pairs staying on the same CPU
should be considered.

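As a hypothetical post-processing sketch, the same-CPU deltas for one
counter can be extracted directly from the babeltrace text output. The
exact field syntax ("cpu_id = N", "perf_LLC_load_misses = N") is an
assumption and may differ between babeltrace versions, so adjust the
patterns to match your output:

```shell
# Sketch: print the perf_LLC_load_misses delta for each
# calibrate_entry/calibrate_return pair that stays on the same CPU.
# The field layout matched below is an assumption; adapt it to the
# actual babeltrace text output on your system.
compute_deltas() {
	awk '
	{
		# Pull cpu_id and the counter value out of the current line.
		cpu = ""; val = ""
		if (match($0, /cpu_id = [0-9]+/))
			cpu = substr($0, RSTART + 9, RLENGTH - 9)
		if (match($0, /perf_LLC_load_misses = [0-9]+/))
			val = substr($0, RSTART + 23, RLENGTH - 23)
	}
	/calibrate_entry/ {
		entry_cpu = cpu; entry_val = val
	}
	/calibrate_return/ {
		# Skip pairs that migrated between CPUs.
		if (cpu != "" && cpu == entry_cpu)
			print val - entry_val
		entry_cpu = ""
	}'
}
```

The babeltrace output shown above would simply be piped through this
filter, e.g.
babeltrace $(ls -1drt ~/lttng-traces/calibrate-function-* | tail -n 1) | compute_deltas
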
The average result, for the i7, on 10 samples:

                            Average  Std.Dev.
perf_LLC_load_misses:           5.0     0.577
perf_LLC_store_misses:          1.6     0.516
perf_LLC_prefetch_misses:       9.0    14.742

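Statistics like the table above can be computed from the column of
per-pair deltas (one number per line) with a short awk filter; this is
a sketch of the calculation, not necessarily the tool used originally:

```shell
# Sketch: average and sample standard deviation (n - 1 denominator)
# of a column of numbers read from stdin, one per line.
mean_stddev() {
	awk '
	{ sum += $1; sumsq += $1 * $1; n++ }
	END {
		mean = sum / n
		# Sample variance: (sum of squares - n * mean^2) / (n - 1).
		printf "%.1f %.3f\n", mean, sqrt((sumsq - n * mean * mean) / (n - 1))
	}'
}
```

For example, feeding the ten per-sample deltas of one counter into
mean_stddev yields its row of the table.
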
As we can see, the load and store misses are relatively stable across
runs (their standard deviation is relatively low) compared to the
prefetch misses. We can conclude from this information that LLC load and
store misses can be accounted for quite precisely, but prefetches within
a function behave too erratically (there is little causal link between
the code executed and the CPU prefetch activity) to be accounted for.