| 1 | <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/2002/REC-xhtml1-20020801/DTD/xhtml1-strict.dtd"> |
| 2 | <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> |
| 3 | |
| 4 | <head> |
| 5 | <title>The LTTng trace format</title> |
| 6 | </head> |
| 7 | |
| 8 | <body> |
| 9 | |
| 10 | <h1>The LTTng trace format</h1> |
| 11 | |
| 12 | <p> |
| 13 | <em>Last update: 2008/06/02</em> |
| 14 | </p> |
| 15 | |
| 16 | <p> |
| 17 | This document describes the LTTng trace format. It is mainly useful to |
| 18 | developers who work on the LTTng tracer or on the LTTV traceread library; other |
| 19 | users can rely on that library, which offers all the necessary abstractions on top of the raw trace data. |
| 20 | </p> |
| 21 | |
| 22 | <p> |
| 23 | A trace is contained in a directory tree. To send a trace remotely, the |
| 24 | directory tree may be tar-gzipped. The trace <tt>foo</tt>, placed in the home |
| 25 | directory of user john, /home/john, would have the following contents: |
| 26 | </p> |
| 27 | |
| 28 | <pre><tt> |
| 29 | $ cd /home/john |
| 30 | $ tree foo |
| 31 | foo/ |
| 32 | |-- control |
| 33 | | |-- facilities_0 |
| 34 | | |-- facilities_1 |
| 35 | | |-- facilities_... |
| 36 | | |-- interrupts_0 |
| 37 | | |-- interrupts_1 |
| 38 | | |-- interrupts_... |
| 39 | | |-- modules_0 |
| 40 | | |-- modules_1 |
| 41 | | |-- modules_... |
| 42 | | |-- network_0 |
| 43 | | |-- network_1 |
| 44 | | |-- network_... |
| 45 | | |-- processes_0 |
| 46 | | |-- processes_1 |
| 47 | | `-- processes_... |
| 48 | |-- cpu_0 |
| 49 | |-- cpu_1 |
| 50 | `-- cpu_... |
| 51 | |
| 52 | </tt></pre> |
| 53 | |
| 54 | <p> |
| 55 | The root directory contains one tracefile per CPU, numbered from 0, |
| 56 | in .trace format. A trace of a uniprocessor system thus contains only the file cpu_0. |
| 57 | A multiprocessor system with some unused (possibly hotplug) CPU slots may skip some |
| 58 | CPU numbers. For instance, an 8-way SMP board with 6 CPUs installed in arbitrary slots |
| 59 | may produce tracefiles named cpu_0, cpu_1, cpu_2, cpu_4, cpu_6 and cpu_7. |
| 60 | </p> |
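<p>
As a hedged illustration of the naming scheme above, the following Python
sketch enumerates the per-CPU tracefiles found directly under a trace
directory. The function name <tt>list_cpu_tracefiles</tt> is hypothetical and
is not part of LTTng or LTTV; the sketch only assumes the <tt>cpu_N</tt> naming
shown in the tree listing, and it tolerates gaps in the CPU numbering.
</p>

```python
import re
from pathlib import Path


def list_cpu_tracefiles(trace_dir):
    """Return {cpu_number: path} for the cpu_N files directly under trace_dir.

    Hypothetical helper: it relies only on the cpu_N file naming and does not
    inspect the file contents.
    """
    pattern = re.compile(r"cpu_(\d+)")
    found = {}
    for entry in Path(trace_dir).iterdir():
        match = pattern.fullmatch(entry.name)
        if match and entry.is_file():
            found[int(match.group(1))] = entry
    # Sort by CPU number so callers see a stable order even with gaps.
    return dict(sorted(found.items()))
```

<p>
Returning a mapping keyed by CPU number, rather than a list, keeps the CPU
identity explicit when some numbers are missing.
</p>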
| 61 | |
| 62 | <p> |
| 63 | The files in the control directory also follow the .trace format and are |
| 64 | also per CPU. The "facilities" files contain only the "core" marker_id, |
| 65 | marker_format and time_heartbeat events. The first two describe the |
| 66 | events that are in the trace. The other control files contain the initial |
| 67 | system state and various important subsequent events, for example process |
| 68 | creations and exits. The benefit of placing such subsequent events in control |
| 69 | tracefiles instead of (or in addition to) the per-CPU tracefiles is that |
| 70 | they can be accessed more quickly and conveniently, and that they can be kept even |
| 71 | when the per-CPU files are overwritten in "flight recorder mode". |
| 72 | </p> |
| 73 | |
| 74 | <h2>Trace format</h2> |
| 75 | |
| 76 | <p> |
| 77 | Each tracefile is divided into equal-size blocks, with a header at the beginning |
| 78 | of each block. Events are packed sequentially in the block, starting right after |
| 79 | the block header. |
| 80 | </p> |
| 81 | |
| 82 | <p> |
| 83 | Each block consists of: |
| 84 | </p> |
| 85 | |
| 86 | <pre><tt> |
| 87 | block start/end header |
| 88 | trace header |
| 89 | event 1 header |
| 90 | event 1 variable length data |
| 91 | event 2 header |
| 92 | event 2 variable length data |
| 93 | .... |
| 94 | padding |
| 95 | </tt></pre> |
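<p>
Because all blocks in a tracefile have the same size, a reader can walk the
file with simple offset arithmetic. The sketch below is a minimal illustration
in Python; <tt>iter_blocks</tt> is a hypothetical name, and the block size is
assumed to be known already (for example from the <tt>buf_size</tt> field of
the block header). A trailing partial block, if any, is skipped.
</p>

```python
def iter_blocks(data, block_size):
    """Yield (offset, block) pairs for each complete block in a tracefile.

    data is the raw tracefile content as bytes. block_size is the common
    block size, assumed to be known by the caller.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    # Stop before any trailing partial block.
    for offset in range(0, len(data) - block_size + 1, block_size):
        yield offset, data[offset:offset + block_size]
```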
| 96 | |
| 97 | <h3>The block start/end header</h3> |
| 98 | |
| 99 | <pre><tt> |
| 100 | begin |
| 101 | * the beginning of buffer information |
| 102 | uint64 cycle_count |
| 103 | * TSC at the beginning of the buffer |
| 104 | uint64 freq |
| 105 | * frequency of the CPUs at the beginning of the buffer. |
| 106 | end |
| 107 | * the end of buffer information |
| 108 | uint64 cycle_count |
| 109 | * TSC at the end of the buffer |
| 110 | uint64 freq |
| 111 | * frequency of the CPUs at the end of the buffer. |
| 112 | uint32 lost_size |
| 113 | * number of bytes of padding at the end of the buffer. |
| 114 | uint32 buf_size |
| 115 | * size of the sub-buffer. |
| 116 | </tt></pre> |
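<p>
The field list above can be decoded with a fixed-size unpack. The following
Python sketch is an illustration only: it assumes that the fields are stored
packed, in the order listed, and that the trace byte order is already known
(<tt>"&lt;"</tt> for little-endian, <tt>"&gt;"</tt> for big-endian). The names
<tt>BlockHeader</tt> and <tt>parse_block_header</tt> are hypothetical.
</p>

```python
import struct
from collections import namedtuple

# Field names follow the block start/end header description; the
# begin_/end_ prefixes are added here to tell the two groups apart.
BlockHeader = namedtuple(
    "BlockHeader",
    "begin_cycle_count begin_freq end_cycle_count end_freq lost_size buf_size",
)


def parse_block_header(data, byte_order="<"):
    """Decode a block header from the start of data.

    Assumption: packed layout, four uint64 followed by two uint32.
    Returns (header, header_size_in_bytes).
    """
    fmt = byte_order + "QQQQII"
    size = struct.calcsize(fmt)  # 40 bytes under the packed assumption
    return BlockHeader(*struct.unpack_from(fmt, data, 0)), size
```

<p>
With such a header, the number of bytes actually used in the block is
<tt>buf_size - lost_size</tt>, since <tt>lost_size</tt> counts the padding at
the end of the buffer.
</p>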
| 117 | |
| 118 | |
| 119 | |
| 120 | <h3>The trace header</h3> |
| 121 | |
| 122 | <pre><tt> |
| 123 | uint32 magic_number |
| 124 | * 0x00D6B7ED, used to check the trace byte order vs host byte order. |
| 125 | uint32 arch_type |
| 126 | * Architecture type of the traced machine. |
| 127 | uint32 arch_variant |
| 128 | * Architecture variant of the traced machine. May be unused on some arch. |
| 129 | uint32 float_word_order |
| 130 | * Byte order of floats and doubles, sometimes different from integer byte |
| 131 | order. Useful only for user space traces. |
| 132 | uint8 arch_size |
| 133 | * Size (in bytes) of the void * on the traced machine. |
| 134 | uint8 major_version |
| 135 | * major version of the trace. |
| 136 | uint8 minor_version |
| 137 | * minor version of the trace. |
| 138 | uint8 flight_recorder |
| 139 | * Is flight recorder mode activated? If yes, data might be missing |
| 140 | (overwritten) in the trace. |
| 141 | uint8 has_heartbeat |
| 142 | * Does this trace have the heartbeat timer event activated? |
| 143 | Yes (1) -> Event header has a 32-bit TSC |
| 144 | No (0) -> Event header has a 64-bit TSC |
| 145 | uint8 alignment |
| 146 | * Are event headers in this trace aligned? |
| 147 | Yes -> the value indicates the alignment |
| 148 | No (0) -> data is packed. |
| 149 | uint8 tsc_lsb_truncate |
| 150 | * Used for compact channels |
| 151 | uint8 tscbits |
| 152 | * Used for compact channels |
| 153 | uint8 compact_data_shift |
| 154 | * Used for compact channels |
| 155 | uint32 freq_scale |
| 156 | * Divisor applied to freq when converting cycles to time. Event time is always calculated from: |
| 157 | trace_start_time + ((event_tsc - trace_start_tsc) / (freq / freq_scale)) |
| 158 | uint64 start_freq |
| 159 | * CPU clock frequency at the beginning of the trace. |
| 160 | uint64 start_tsc |
| 161 | * TSC at the beginning of the trace. |
| 162 | uint64 start_monotonic |
| 163 | * monotonically increasing time at the beginning of the trace. |
| 164 | (currently not supported) |
| 165 | start_time |
| 166 | * Real time at the beginning of the trace (as given by date, adjusted by NTP). |
| 167 | This is the only time reference with the real world: the rest of the trace |
| 168 | has monotonically increasing time from this point (computed from the TSC |
| 169 | difference and the clock frequency). |
| 170 | uint32 seconds |
| 171 | uint32 nanoseconds |
| 172 | </tt></pre> |
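<p>
Two uses of the trace header lend themselves to a short illustration: the
byte-order check based on <tt>magic_number</tt>, and the conversion of an event
TSC to a time. The Python sketch below is hedged on several points. The
function names are hypothetical. The time conversion assumes that
<tt>freq</tt> is expressed in cycles per second, so that a cycle difference is
converted to seconds by dividing it by the effective frequency
<tt>freq / freq_scale</tt>. The start time is taken as a plain number of
seconds for simplicity.
</p>

```python
import struct

LTT_MAGIC = 0x00D6B7ED  # magic_number value given in the trace header


def detect_byte_order(magic_bytes):
    """Return "<" (little-endian) or ">" (big-endian) for a 4-byte magic field.

    The bytes are decoded in both orders; the order that yields the expected
    magic number is the byte order of the trace.
    """
    for prefix in ("<", ">"):
        if struct.unpack(prefix + "I", magic_bytes)[0] == LTT_MAGIC:
            return prefix
    raise ValueError("bad magic number: not an LTTng trace header")


def event_time_seconds(event_tsc, start_tsc, start_time, freq, freq_scale):
    """Convert an event TSC to seconds.

    Assumption: freq is in cycles per second, so elapsed seconds are
    elapsed cycles divided by the effective frequency freq / freq_scale.
    """
    return start_time + (event_tsc - start_tsc) / (freq / freq_scale)
```

<p>
For instance, with a 2 GHz clock and <tt>freq_scale</tt> equal to 1, an event
recorded 2,000,000,000 cycles after <tt>start_tsc</tt> falls one second after
the trace start time.
</p>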
| 173 | |
| 174 | |
| 175 | <h3>Event header</h3> |
| 176 | |
| 177 | <p> |
| 178 | Event headers differ according to two conditions: does the |
| 179 | traced system have a heartbeat timer, and is tracing alignment activated? |
| 180 | </p> |
| 181 | |
| 182 | <p> |
| 183 | Event header: |
| 184 | </p> |
| 185 | <pre><tt> |
| 186 | { uint32 timestamp |
| 187 | or |
| 188 | uint64 timestamp } |
| 189 | * if has_heartbeat: the 32 LSBs of the cycle counter at the event record time. |
| 190 | * else: the complete 64-bit cycle counter. |
| 191 | uint8 facility_id |
| 192 | * Numerical ID of the facility corresponding to the event. See the facility |
| 193 | tracefile to know which facility ID matches which facility name and |
| 194 | description. |
| 195 | uint8 event_id |
| 196 | * Numerical ID of the event inside the facility. |
| 197 | uint16 event_size |
| 198 | * Size of the variable length data that follows this header. |
| 199 | </tt></pre> |
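<p>
The two header variants can be decoded with one routine that selects the
timestamp width from <tt>has_heartbeat</tt>. The Python sketch below is an
illustration under stated assumptions: the header is packed (that is,
<tt>alignment</tt> is 0), the byte order is already known, and the names
<tt>EventHeader</tt> and <tt>parse_event_header</tt> are hypothetical.
</p>

```python
import struct
from collections import namedtuple

EventHeader = namedtuple("EventHeader", "timestamp facility_id event_id event_size")


def parse_event_header(data, offset, has_heartbeat, byte_order="<"):
    """Decode one event header at offset.

    Assumption: packed header (alignment == 0). The timestamp is a uint32
    when has_heartbeat is true and a uint64 otherwise.
    Returns (header, offset_of_the_variable_length_data).
    """
    ts_code = "I" if has_heartbeat else "Q"
    fmt = byte_order + ts_code + "BBH"
    header = EventHeader(*struct.unpack_from(fmt, data, offset))
    return header, offset + struct.calcsize(fmt)
```

<p>
Under these assumptions the header takes 8 bytes with a heartbeat and 12 bytes
without one, and the next event header starts <tt>event_size</tt> bytes after
the returned offset.
</p>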
| 200 | |
| 201 | <p> |
| 202 | Event header alignment |
| 203 | </p> |
| 204 | |
| 205 | <p> |
| 206 | If trace alignment is activated (nonzero <tt>alignment</tt> field in the trace header), the event header is |
| 207 | aligned on that value. In addition, padding is added after the event header so that |
| 208 | the variable-length data is aligned on the architecture size. |
| 209 | </p> |
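<p>
The padding implied by an alignment can be computed by rounding an offset up to
the next multiple of that alignment. The Python sketch below shows this
arithmetic only. <tt>align_up</tt> is a hypothetical helper, and the choice of
measuring offsets from the start of the block is an assumption, not something
this document specifies.
</p>

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment.

    An alignment of 0 means packed data, so the offset is returned as is.
    """
    if alignment == 0:
        return offset
    return (offset + alignment - 1) // alignment * alignment


def padding_before(offset, alignment):
    """Number of padding bytes needed to reach the aligned offset."""
    return align_up(offset, alignment) - offset
```

<p>
For example, with an alignment of 4, data that would otherwise start at offset
13 is preceded by 3 bytes of padding and starts at offset 16.
</p>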
| 210 | |
| 211 | <!-- |
| 212 | <h2>System description</h2> |
| 213 | |
| 214 | <p> |
| 215 | The system type description, in system.xml, looks like: |
| 216 | </p> |
| 217 | |
| 218 | <pre><tt> |
| 219 | <system |
| 220 | node_name="vaucluse" |
| 221 | domainname="polymtl.ca" |
| 222 | cpu=4 |
| 223 | arch_size="ILP32" |
| 224 | endian="little" |
| 225 | kernel_name="Linux" |
| 226 | kernel_release="2.4.18-686-smp" |
| 227 | kernel_version="#1 SMP Sun Apr 14 12:07:19 EST 2002" |
| 228 | machine="i686" |
| 229 | processor="unknown" |
| 230 | hardware_platform="unknown" |
| 231 | operating_system="Linux" |
| 232 | ltt_major_version="2" |
| 233 | ltt_minor_version="0" |
| 234 | ltt_block_size="100000" |
| 235 | > |
| 236 | Some comments about the system |
| 237 | </system> |
| 238 | </tt></pre> |
| 239 | |
| 240 | <p> |
| 241 | The system attributes kernel_name, node_name, kernel_release, |
| 242 | kernel_version, machine, processor, hardware_platform and operating_system |
| 243 | come from the uname(1) program. The domainname attribute is obtained from |
| 244 | the "hostname --domain" command. The arch_size attribute is one of |
| 245 | LP32, ILP32, LP64 or ILP64 and specifies the length in bits of integers (I), |
| 246 | long (L) and pointers (P). The endian attribute is "little" or "big". |
| 247 | While the arch_size and endian attributes could be deduced from the platform |
| 248 | type, having these explicit allows analysing traces from yet unknown |
| 249 | platforms. The cpu attribute specifies the maximum number of processors in |
| 250 | the system; only tracefiles 0 to this maximum - 1 may exist in the cpu |
| 251 | directory. |
| 252 | </p> |
| 253 | |
| 254 | <p> |
| 255 | Within the system element, the text enclosed may describe further the |
| 256 | system traced. |
| 257 | </p> |
| 258 | |
| 259 | |
| 260 | <h2>Bookmarks</h2> |
| 261 | |
| 262 | <p> |
| 263 | Bookmarks are user supplied information added to a trace. They contain user |
| 264 | annotations attached to a time interval. |
| 265 | </p> |
| 266 | |
| 267 | |
| 268 | <pre><tt> |
| 269 | <bookmarks> |
| 270 | <location name=name cpu=n start_time=t end_time=t>Some text</location> |
| 271 | ... |
| 272 | </bookmarks> |
| 273 | </tt></pre> |
| 274 | |
| 275 | <p> |
| 276 | The interval is defined using either "time=" or "start_time=" and |
| 277 | "end_time=", or "cycle=" or "start_cycle=" and "end_cycle=". |
| 278 | The time is in seconds with decimals up to nanoseconds and cycle counts |
| 279 | are unsigned integers with a 64 bits range. The cpu attribute is optional. |
| 280 | </p> |
| 281 | |
| 282 | --> |
| 283 | </body> |
| 284 | </html> |