<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<title>The LTTng trace format</title>
<h1>The LTTng trace format</h1>
This document describes the LTTng trace format. It is intended only for
developers working on the LTTng tracer or the traceread LTTV library; other
users should rely on that library, which offers all the necessary abstractions
on top of the raw trace data.
A trace is contained in a directory tree. To send a trace remotely,
the directory tree may be tar-gzipped. Trace foo, placed in the home
directory of user john, /home/john, would have the following content:
The eventdefs directory contains the event descriptions for all the
facilities used. The syntax is a simple subset of XML; XML is widely
known and easily parsed or hand edited. Each file contains one or more
<FACILITY NAME=name>...</FACILITY> elements. Indeed, several
facilities may have the same name but different content (and thus
generate a different checksum). This typically happens when, while tracing
is enabled, a module using the named facility is unloaded, modified
(along with the description of some events), recompiled and reloaded.
The trace will then contain events from two different, similarly named
facilities.
A small number of events are predefined, part of the "core" facility,
and are not present there. These "core" events include "facility_load",
"facility_unload", "time_heartbeat" and "state_dump_facility_load".
The root directory contains a tracefile for each cpu, numbered from 0,
in .trace format. A uniprocessor trace thus only contains the file cpu_0.
A multi-processor with some unused (possibly hotplug) CPU slots may have some
unused CPU numbers. For instance, an 8-way SMP board with 6 CPUs randomly
installed may produce tracefiles numbered 0, 1, 2, 4, 6 and 7.
The files in the control directory also follow the .trace format and are also
The "facilities" file only contains "core" facility_load, facility_unload,
time_heartbeat and state_dump_facility_load events
and is used to determine the facilities used and the code range assigned
to each facility. The other control files contain the initial system
state and various subsequent important events, for example process
creation and exit. The advantage of placing such subsequent events
in control trace files instead of (or in addition to) the per-cpu
trace files is that they may be accessed more quickly and conveniently,
and that they may be kept even when the per-cpu files are overwritten
in "flight recorder mode".
The info directory contains, in system.xml, a description of the system on which
the trace was created, as well as user annotations in bookmark.xml.
This directory may also contain various information about the trace generated
during trace analysis (statistics, index...).
<H2>Trace format</H2>
Each tracefile is divided into equal size blocks with a header at the beginning
of the block. Events are packed sequentially in the block starting right after
Each block consists of:
block start/end header
event 1 variable length data
event 2 variable length data
The block start/end header
* the beginning of buffer information
* Used only when no TSC is available.
* TSC at the beginning of the buffer
* frequency of the CPUs at the beginning of the buffer.
* the end of buffer information
* Used only when no TSC is available.
* TSC at the end of the buffer
* frequency of the CPUs at the end of the buffer.
* number of bytes of padding at the end of the buffer.
* size of the sub-buffer.
* 0x00D6B7ED, used to check the trace byte order vs host byte order.
* Architecture type of the traced machine.
* Architecture variant of the traced machine. May be unused on some arch.
uint32 float_word_order
* Byte order of floats and doubles, sometimes different from integer byte
  order. Useful only for user space traces.
* Size (in bytes) of the void * on the traced machine.
* major version of the trace.
* minor version of the trace.
uint8 flight_recorder
* Is flight recorder mode activated? If yes, data might be missing
  (overwritten) in the trace.
* Does this trace have the heartbeat timer event activated?
  Yes (1) -> Event header has 32 bits TSC
  No (0) -> Event header has 64 bits TSC
* Is the information in this trace aligned?
  Yes (1) -> aligned on min(arch size, atomic data size).
  No (0) -> data is packed.
* Does the traced machine have a working TSC?
  Yes (1) -> event time is calculated from:
    trace_start_time + ((event_tsc - trace_start_tsc) * freq)
  No (0) -> event time is calculated from:
    + (buffer start timestamp - trace start_monotonic)
* CPUs clock frequency at the beginning of the trace.
* TSC at the beginning of the trace.
uint64 start_monotonic
* monotonically increasing time at the beginning of the trace.
  (currently not supported)
* Real time at the beginning of the trace (as given by date, adjusted by NTP).
  This is the only time reference with the real world: the rest of the trace
  has monotonically increasing time from this point (with TSC difference and
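A reader can use the magic number listed above to decide how to decode every other multi-byte field of the trace. The Python sketch below shows one possible way to do this. The function name `detect_byte_order` is hypothetical, and the code assumes that the magic number is stored as a 4-byte unsigned integer at the very start of the buffer it is given; neither detail is confirmed by this document.

```python
import struct

LTT_MAGIC = 0x00D6B7ED  # magic number given in the trace header description


def detect_byte_order(header: bytes) -> str:
    """Return the struct prefix ('<' or '>') matching the trace byte order.

    Assumption: the magic number is a 4-byte unsigned integer located at
    the very start of `header`.
    """
    if len(header) < 4:
        raise ValueError("need at least 4 bytes to read the magic number")
    for prefix in ("<", ">"):
        # Try little-endian first, then big-endian.
        (value,) = struct.unpack(prefix + "I", header[:4])
        if value == LTT_MAGIC:
            return prefix
    raise ValueError("magic number not found")
```

The returned prefix can be passed directly to later `struct.unpack` calls, so the rest of a reader does not need to know the host byte order.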
Event headers differ depending on these conditions: does the traced system have
a heartbeat timer? Is tracing alignment activated?
* if has_heartbeat: 32 LSB of the cycle counter at the event record time.
* else: 64 bits complete cycle counter.
* note: if there is no working TSC (has_tsc == 0), then this field contains
  either the complete monotonically increasing time or the time delta from the
  previous heartbeat event. (unsupported)
* Numerical ID of the facility corresponding to the event. See the facility
  tracefile to know which facility ID matches which facility name and
* Numerical ID of the event inside the facility.
* Size of the variable length data that follows this header.
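When has_heartbeat is set, only the 32 least significant bits of the cycle counter are stored with each event, so a reader has to rebuild the full 64-bit value. A minimal sketch of one way to do this follows. It assumes that events are visited in recording order, that the full 64-bit counter is known at some starting point (for instance from the block header), and that the heartbeat event occurs at least once per 32-bit wraparound period. The class name `TscExtender` is hypothetical.

```python
MASK32 = 0xFFFFFFFF


class TscExtender:
    """Rebuild 64-bit cycle counts from their 32 least significant bits."""

    def __init__(self, start_tsc: int) -> None:
        # Full 64-bit counter known at the starting point (assumption).
        self.last = start_tsc

    def extend(self, tsc_lsb: int) -> int:
        """Return the full counter for an event carrying `tsc_lsb`."""
        high = (self.last >> 32) << 32
        if tsc_lsb < (self.last & MASK32):
            # The low 32 bits went backwards, so the counter wrapped once.
            high += 1 << 32
        self.last = high | tsc_lsb
        return self.last
```

If two consecutive events were separated by more than one wraparound, this approach would silently lose cycles, which is why the sketch depends on a periodic event such as the heartbeat.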
Event header alignment
If trace alignment is activated (has_alignment), the event header is aligned
on the architecture size (void pointer size). In addition, padding is
automatically added after the event header so that the variable length data is
also aligned on the architecture size.
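The alignment rule above amounts to rounding offsets up to the next multiple of the architecture size. The sketch below illustrates the arithmetic only. It assumes that offsets are counted in bytes from a position that is itself aligned, and the names `padding_for`, `place_event` and `header_size` are hypothetical.

```python
def padding_for(offset: int, arch_size: int) -> int:
    """Bytes needed to round `offset` up to a multiple of `arch_size`."""
    if arch_size <= 0:
        raise ValueError("arch_size must be positive")
    return (-offset) % arch_size


def place_event(offset: int, header_size: int, arch_size: int):
    """Return (header_start, data_start) for an event written at `offset`.

    The header is aligned first, then padding is added after it so that
    the variable length data is aligned as well.
    """
    header_start = offset + padding_for(offset, arch_size)
    data_start = header_start + header_size
    data_start += padding_for(data_start, arch_size)
    return header_start, data_start
```

For example, with an 8-byte architecture size and a 12-byte header, an event requested at offset 13 has its header at offset 16 and its data at offset 32.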
<H2>System description</H2>
The system type description, in system.xml, looks like:
domainname="polymtl.ca"
kernel_release="2.4.18-686-smp"
kernel_version="#1 SMP Sun Apr 14 12:07:19 EST 2002"
hardware_platform="unknown"
operating_system="Linux"
ltt_major_version="2"
ltt_minor_version="0"
ltt_block_size="100000"
Some comments about the system
The system attributes kernel_name, node_name, kernel_release,
kernel_version, machine, processor, hardware_platform and operating_system
come from the uname(1) program. The domainname attribute is obtained from
the "hostname --domain" command. The arch_size attribute is one of
LP32, ILP32, LP64 or ILP64 and specifies the length in bits of integers (I),
long (L) and pointers (P). The endian attribute is "little" or "big".
While the arch_size and endian attributes could be deduced from the platform
type, having them explicit allows analysing traces from yet unknown
platforms. The cpu attribute specifies the maximum number of processors in
the system; only tracefiles 0 to this maximum - 1 may exist in the cpu
Within the system element, the text enclosed may describe further the
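The arch_size naming scheme described above can be decoded mechanically: the letters name the types and the trailing number gives their length in bits. The following sketch is only an illustration; the function name `decode_arch_size` is hypothetical, and it reports only the types that the label names explicitly, since this document does not state the width of the others.

```python
import re

# Letters used in the arch_size attribute, as explained in the text.
TYPE_LETTERS = {"I": "int", "L": "long", "P": "pointer"}
ALLOWED = ("LP32", "ILP32", "LP64", "ILP64")


def decode_arch_size(value: str) -> dict:
    """Map an arch_size value such as "LP64" to widths in bits."""
    if value not in ALLOWED:
        raise ValueError("unexpected arch_size: %r" % value)
    match = re.fullmatch(r"([ILP]+)(\d+)", value)
    letters, bits = match.group(1), int(match.group(2))
    return {TYPE_LETTERS[letter]: bits for letter in letters}
```

With this sketch, "ILP32" yields 32 bits for int, long and pointer, while "LP64" yields 64 bits for long and pointer only.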
<H2>Event type descriptions</H2>
A facility contains the descriptions of several event types. When a structure
is reused in several event types, a named type is defined and may be referenced
by several other event types or named types.
<facility name=facility_name>
  <description>Some text</description>
  <event name=eventtype_name>
    <description>Some text</description>
  <type name=type_name>
The type structure may be one of the following primitive type elements.
Whenever the keyword isize is used, the allowed values are
short, medium, long, 1, 2, 4, 8, indicating the size in bytes.
The fsize keyword represents one of medium, long, 4 and 8 bytes.
<int size=isize format="printf format"/>
<uint size=isize format="printf format"/>
<float size=fsize format="printf format"/>
<string format="printf format"/>
<enum size=isize format="printf format">label1 label2 ...</enum>
The string is null terminated. For the enumeration, the size of the integer
used for its representation is specified.
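As an illustration of these two primitive types, the sketch below reads a null terminated string and an enumeration from a byte buffer. The function names are hypothetical. The code also assumes that string bytes can be decoded as ASCII and that enumeration labels are numbered from 0 in the order in which they are listed, neither of which is stated in this document.

```python
def read_string(data: bytes, offset: int = 0):
    """Return (text, next_offset) for a null terminated string."""
    end = data.index(b"\x00", offset)  # raises ValueError if unterminated
    text = data[offset:end].decode("ascii", errors="replace")
    return text, end + 1  # skip the terminating null byte


def read_enum(data: bytes, offset: int, size: int, labels, byteorder: str):
    """Return (label, next_offset) for an enum stored on `size` bytes.

    Assumption: labels are numbered from 0 in their listed order.
    """
    raw = data[offset:offset + size]
    if len(raw) != size:
        raise ValueError("truncated enum value")
    value = int.from_bytes(raw, byteorder)
    return labels[value], offset + size
```

The `byteorder` argument takes "little" or "big", matching the endian attribute of the system description.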
The type structure may also be a compound type.
<array size=n> --type structure-- </array>
<sequence lengthsize=isize> --type structure-- </sequence>
<field name=field_name>
  <description>Some text</description>
<union typecodesize=isize>
  <field name=field_name>
    <description>Some text</description>
Array is a fixed size array of length size. Sequence is a variable size
array with its length stored as a prepended uint of length lengthsize.
A structure is simply an aggregation of fields. A union is one of its n
fields (variant record), as indicated by a preceding code (0 to n - 1)
of the specified size typecodesize.
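The sequence layout described above (a length prefix followed by the elements) can be decoded as in the following sketch. It is an illustration only: the function name `read_sequence` is hypothetical, the elements are assumed to be fixed-size unsigned integers, and the data is assumed to be packed (no alignment padding).

```python
def read_sequence(data: bytes, offset: int, lengthsize: int,
                  elem_size: int, byteorder: str = "little"):
    """Return (values, next_offset) for a sequence of unsigned integers.

    The element count is stored first, as an unsigned integer of
    `lengthsize` bytes, followed by that many elements.
    """
    end_of_length = offset + lengthsize
    if end_of_length > len(data):
        raise ValueError("truncated sequence length")
    count = int.from_bytes(data[offset:end_of_length], byteorder)
    end = end_of_length + count * elem_size
    if end > len(data):
        raise ValueError("truncated sequence data")
    values = [
        int.from_bytes(data[pos:pos + elem_size], byteorder)
        for pos in range(end_of_length, end, elem_size)
    ]
    return values, end
```

An array would be read the same way, except that the element count comes from the size attribute of the description instead of from the data.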
Finally the type structure may be defined by referencing a named type.
<typeref name=type_name/>
<H2>Builtin events</H2>
The facility named "builtin" is always present and contains at least the
following event types.
<event name=facility_load>
  <description>Facility used in the trace</description>
  <field name="name"><string/></field>
  <field name="checksum"><uint size=4/></field>
  <field name="base_code"><uint size=4/></field>
<event name=block_start>
  <description>Block start timestamp</description>
  <typeref name=block_timestamp/>
<event name=block_end>
  <description>Block end timestamp</description>
  <typeref name=block_timestamp/>
<event name=time_heartbeat>
  <description>System time values sent periodically to minimize cycle counter
  drift with respect to real time clock and to detect cycle counter
  <typeref name=timestamp/>
<type name=block_timestamp>
  <field name=timestamp><typeref name=timestamp/></field>
  <field name=block_id><uint size=4/></field>
<type name=timestamp>
  <field name=time><typeref name=timespec/></field>
  <field name="cycle_count"><uint size=8/></field>
<type name=timespec>
  <field name="seconds"><uint size=4/></field>
  <field name="nanoseconds"><uint size=4/></field>
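To show how the timestamp and timespec types above map onto bytes, here is a small Python sketch that decodes a timestamp value. It assumes a packed layout in which the fields follow each other in the listed order (seconds, nanoseconds, cycle_count); the function name `parse_timestamp` and the default little-endian byte order are choices made for this example only.

```python
import struct


def parse_timestamp(data: bytes, byteorder: str = "<") -> dict:
    """Decode a `timestamp` value: a timespec followed by a cycle count.

    Layout assumed here: uint32 seconds, uint32 nanoseconds,
    uint64 cycle_count, with no padding between fields.
    """
    layout = byteorder + "IIQ"
    if len(data) < struct.calcsize(layout):
        raise ValueError("truncated timestamp")
    seconds, nanoseconds, cycle_count = struct.unpack_from(layout, data)
    return {
        "seconds": seconds,
        "nanoseconds": nanoseconds,
        "cycle_count": cycle_count,
    }
```

The `byteorder` argument uses the `struct` prefixes "<" (little-endian) and ">" (big-endian), which also disable any native alignment padding.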
<H2>Control files</H2>
The interrupts file reflects the content of the /proc/interrupts system file.
It contains one event describing each interrupt. At trace start, events are
generated describing all the current interrupts. If the assignment of
interrupts changes later, due to devices or device drivers being activated or
deactivated, additional events may be added to the file. Each interrupt
event has the following structure.
<event name=interrupt>
  <description>Interrupt request number assignment</description>
  <field name="number"><uint size=4/></field>
  <field name="count"><uint size=4/></field>
  <field name="controller"><string/></field>
  <field name="name"><string/></field>
The processes file contains the list of processes already created when the
trace starts. Each process-describing event is modeled after the
/proc/self/status system file. The number of fields in this event is
expected to be expanded in the future to include groups, signal masks,
open file descriptors and address maps.
<event name=process>
  <description>Existing process</description>
  <field name="name"><string/></field>
  <field name="pid"><uint size=4/></field>
  <field name="ppid"><uint size=4/></field>
  <field name="tracer_pid"><uint size=4/></field>
  <field name="uid"><uint size=4/></field>
  <field name="euid"><uint size=4/></field>
  <field name="suid"><uint size=4/></field>
  <field name="fsuid"><uint size=4/></field>
  <field name="gid"><uint size=4/></field>
  <field name="egid"><uint size=4/></field>
  <field name="sgid"><uint size=4/></field>
  <field name="fsgid"><uint size=4/></field>
  <field name="state"><enum size=4>
    Running WaitInterruptible WaitUninterruptible Zombie Traced Paging
  </enum></field>
Facilities define a granularity of event grouping for filtering, activation
and compilation. Each facility costs a table entry in the kernel (name,
checksum, event type code range), somewhere between 20 and 30 bytes. Having
one facility per tracing statement in the kernel would be too much (assuming
that they eventually are routinely inserted in the kernel code and replace
the 80000+ printk statements in some proportion). However, having a few
facilities, up to a few tens, would make sense.
The "builtin" facility contains a small number of predefined events which must
always exist. The "core" facility contains a small subset of OS events which
are almost always of interest (scheduling, interrupts, faults, system calls).
Then, specialized facilities may exist for each subsystem (network, disks,
Bookmarks are user supplied information added to a trace. They contain user
annotations attached to a time interval.
<location name=name cpu=n start_time=t end_time=t>Some text</location>
The interval is defined using either "time=" or "start_time=" and
"end_time=", or "cycle=" or "start_cycle=" and "end_cycle=".
The time is in seconds with decimals up to nanoseconds and cycle counts
are unsigned integers with a 64-bit range. The cpu attribute is optional.
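The attribute values described above can be converted to exact integers as sketched below. The function names `parse_time` and `parse_cycle` are hypothetical. `Decimal` is used rather than `float` because a double cannot hold nanosecond precision for times of the magnitude of a Unix timestamp; digits beyond the ninth decimal are simply truncated in this sketch.

```python
from decimal import Decimal


def parse_time(text: str):
    """Convert a time such as "12.000000500" to (seconds, nanoseconds)."""
    value = Decimal(text)
    if value < 0:
        raise ValueError("time must not be negative")
    seconds = int(value)  # whole seconds
    nanoseconds = int((value - seconds) * 1000000000)
    return seconds, nanoseconds


def parse_cycle(text: str) -> int:
    """Convert a cycle count and check that it fits in 64 bits."""
    value = int(text)
    if not 0 <= value < 2 ** 64:
        raise ValueError("cycle count out of 64-bit range")
    return value
```

Keeping seconds and nanoseconds as two integers mirrors the timespec type used by the builtin events.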