<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<title>The new LTT trace format</title>
<h1>The new LTT trace format</h1>
A trace is contained in a directory tree. To send a trace remotely,
the directory tree may be tar-gzipped. Trace foo, placed in the home
directory of user john, /home/john, would have the following content:
The eventdefs directory contains the event descriptions for all the
facilities used. The syntax is a simple subset of XML; XML is widely
known and easily parsed or hand edited. Each file contains one or more
<facility name=name>...</facility> elements, because several
facilities may have the same name but different content (and thus
generate different checksums). This typically happens when the event
descriptions for a given facility change from one version to the next,
for example when a module is recompiled and reloaded during a trace.
A small number of predefined events, part of the "builtin" facility,
are not described in the eventdefs directory. These "builtin" events include
"facility_load", "block_start", "block_end" and "time_heartbeat".
The cpu directory contains one tracefile per CPU, in .trace format, named
after the CPU number starting from 0. A uniprocessor thus only contains the
file cpu/0. A multiprocessor with some unused (possibly hotplug) CPU slots may
skip some CPU numbers. For instance, an 8-way SMP board with 6 CPUs installed
in arbitrary slots may produce tracefiles named 0, 1, 2, 4, 6 and 7.
The files in the control directory also follow the .trace format.
The "facilities" file only contains "builtin" facility_load events
and is used to determine the facilities used and the event type code range
assigned to each facility. The other control files contain the initial system
state and various important subsequent events, for example process
creations and exits. Placing such subsequent events in control trace files,
instead of (or in addition to) the per-CPU trace files, means that they can
be accessed more quickly and conveniently, and that they may be kept even when
the per-CPU files are overwritten in "flight recorder mode".
The info directory contains system.xml, a description of the system on which
the trace was created, and bookmark.xml, which holds user annotations.
It may also contain other information about the trace generated during
trace analysis (statistics, index...).
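As a minimal sketch (Python, with a hypothetical function name `inspect_trace_dir`), the layout described above can be checked by looking for the entries the text names and by listing the CPU tracefiles numerically, so that gaps such as those of the 8-way example are easy to see. Treating eventdefs/, cpu/, control/facilities and info/system.xml as mandatory is an assumption of this sketch.

```python
from pathlib import Path

# Entries named in the text; treating them as mandatory is an assumption.
EXPECTED = ["eventdefs", "cpu", "control/facilities", "info/system.xml"]

def inspect_trace_dir(root):
    """Return (cpu_numbers, missing_entries) for a trace directory.

    cpu_numbers lists the numeric tracefile names found in cpu/, sorted
    numerically, so gaps left by unused CPU slots are easy to spot.
    """
    root = Path(root)
    missing = [entry for entry in EXPECTED if not (root / entry).exists()]
    cpu_dir = root / "cpu"
    cpus = []
    if cpu_dir.is_dir():
        cpus = sorted(int(f.name) for f in cpu_dir.iterdir() if f.name.isdigit())
    return cpus, missing
```

For the 8-way board of the example, `cpus` would be `[0, 1, 2, 4, 6, 7]`; a number at or above the cpu attribute of system.xml would point to an inconsistent trace.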
Each tracefile is divided into equal-size blocks, with a uint32 at the end of
each block giving the offset of the last event in the block. Events are packed
sequentially in the block, starting at offset 0 with a "block_start" event
and ending, at the offset stored in the last 4 bytes of the block, with a
"block_end" event. Both the block_start and block_end events
contain the kernel timestamp (timespec binary structure:
uint32 seconds, uint32 nanoseconds), the cycle counter (uint64 cycles)
and the buffer id (uint64).
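The block layout can be sketched in Python with the `struct` module. This is an illustration under stated assumptions, not a reference reader: events are taken to be packed with no padding, every event (block_start and block_end included) is assumed to start with the 6-byte event header (uint16 type id, uint32 time delta), and the buffer id is read as a uint64 as stated above. The builtin event definitions in this document declare block_id as a 4-byte uint, so `BLOCK_TS` would become `"=IIQI"` for a trace following that definition. The name `read_block_timestamps` is hypothetical.

```python
import struct

# Assumptions: native byte order ("="), no padding between fields or events,
# and a 6-byte header (uint16 type id, uint32 time delta) before every event.
EVENT_HEADER = struct.Struct("=HI")
BLOCK_TS = struct.Struct("=IIQQ")   # seconds, nanoseconds, cycles, buffer id
TRAILER = struct.Struct("=I")       # offset of the block_end event

def read_block_timestamps(data, block_size):
    """Yield (block_start, block_end) timestamp tuples, one pair per block."""
    if len(data) % block_size:
        raise ValueError("tracefile size is not a multiple of the block size")
    for base in range(0, len(data), block_size):
        block = data[base:base + block_size]
        # The last 4 bytes of the block hold the offset of block_end.
        (end_offset,) = TRAILER.unpack_from(block, block_size - TRAILER.size)
        start = BLOCK_TS.unpack_from(block, EVENT_HEADER.size)
        end = BLOCK_TS.unpack_from(block, end_offset + EVENT_HEADER.size)
        yield start, end
```

The block size itself would come from the ltt_block_size attribute of system.xml.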
Each event consists of an event type id (uint16: the event type id within
the facility plus the facility base id), a time delta (uint32, in cycles
or nanoseconds depending on the configuration, since the last time value
found in the block header or in a "time_heartbeat" event) and the
event-type-specific data. All values are packed in native byte order binary format.
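Decoding an event header can be sketched as follows (hypothetical helper name `read_event_header`). It assumes the reader keeps the reference time, that is the last value seen in the block header or in a time_heartbeat event, and adds the delta to it. It only returns the payload offset: skipping the payload requires the event type description, since the data length depends on the event type.

```python
import struct

# Event header: uint16 event type id, uint32 time delta, native byte order,
# packed without padding ("=" disables alignment in the struct module).
EVENT_HEADER = struct.Struct("=HI")

def read_event_header(buf, offset, reference_time):
    """Return (type_id, absolute_time, payload_offset) for the event at offset.

    reference_time is the last time value seen (block header or most recent
    time_heartbeat), in the same unit as the delta: cycles or nanoseconds,
    depending on the trace configuration.
    """
    type_id, delta = EVENT_HEADER.unpack_from(buf, offset)
    return type_id, reference_time + delta, offset + EVENT_HEADER.size
```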
<H2>System description</H2>
The system type description, in system.xml, looks like:
domainname="polymtl.ca"
kernel_release="2.4.18-686-smp"
kernel_version="#1 SMP Sun Apr 14 12:07:19 EST 2002"
hardware_platform="unknown"
operating_system="Linux"
ltt_major_version="2"
ltt_minor_version="0"
ltt_block_size="100000"
Some comments about the system
The system attributes kernel_name, node_name, kernel_release,
kernel_version, machine, processor, hardware_platform and operating_system
come from the uname(1) program. The domainname attribute is obtained from
the "hostname --domain" command. The arch_size attribute is one of
LP32, ILP32, LP64 or ILP64 and specifies the length in bits of integers (I),
longs (L) and pointers (P). The endian attribute is "little" or "big".
While the arch_size and endian attributes could be deduced from the platform
type, making them explicit allows analysing traces from platforms that are not
yet known. The cpu attribute specifies the maximum number of processors in
the system; only tracefiles 0 to this maximum - 1 may exist in the cpu directory.
Within the system element, the text enclosed may describe further the
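A reader needs arch_size and endian to interpret a trace taken on another machine. The sketch below (hypothetical function name `read_system`) parses a system element with Python's `xml.etree.ElementTree`; it assumes attribute values are quoted, as XML requires, and the sample used to check it has hypothetical values. The arch_size table is the standard meaning of these data-model names.

```python
import xml.etree.ElementTree as ET

# Widths in bits of (int, long, pointer) for each arch_size data model.
ARCH_SIZES = {
    "LP32": (16, 32, 32),
    "ILP32": (32, 32, 32),
    "LP64": (32, 64, 64),
    "ILP64": (64, 64, 64),
}
# struct-module byte-order prefixes for the endian attribute.
BYTE_ORDER = {"little": "<", "big": ">"}

def read_system(xml_text):
    """Return the layout-related attributes of a <system> element."""
    system = ET.fromstring(xml_text)
    int_bits, long_bits, pointer_bits = ARCH_SIZES[system.get("arch_size")]
    return {
        "int_bits": int_bits,
        "long_bits": long_bits,
        "pointer_bits": pointer_bits,
        "byte_order": BYTE_ORDER[system.get("endian")],
        "max_cpus": int(system.get("cpu")),
        "block_size": int(system.get("ltt_block_size")),
    }
```

The byte_order prefix can replace the `"="` (native order) prefix when decoding binary data, which is what lets a big-endian trace be read on a little-endian host and vice versa.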
<H2>Event type descriptions</H2>
A facility contains the descriptions of several event types. When a structure
is reused in several event types, a named type is defined and may be referenced
by several other event types or named types.
<facility name=facility_name>
<description>Some text</description>
<event name=eventtype_name>
<description>Some text</description>
<type name=type_name>
The type structure may be one of the following primitive type elements.
Wherever the keyword isize is used, the allowed values are
short, medium, long, 1, 2, 4 and 8, the numeric values indicating the size in bytes.
The fsize keyword stands for one of medium, long, 4 or 8 bytes.
<int size=isize format="printf format"/>
<uint size=isize format="printf format"/>
<float size=fsize format="printf format"/>
<string format="printf format"/>
<enum size=isize format="printf format">label1 label2 ...</enum>
Strings are null terminated. For an enumeration, the size attribute gives the
size of the integer used to represent it.
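A decoder for the primitive types can be sketched in Python as below (hypothetical names `decode_primitive` and `render`). The symbolic sizes short, medium and long are platform dependent and are left out; decoding enums as unsigned integers, reading strings as Latin-1, and mapping label k to value k are assumptions of this sketch, since the description does not say. The format attribute is applied with Python's `%` operator, which follows printf-style conversions.

```python
import struct

# struct codes for the numeric sizes; "=" gives native byte order with
# standard sizes and no alignment.
INT_CODES = {1: "b", 2: "h", 4: "i", 8: "q"}
FLOAT_CODES = {4: "f", 8: "d"}

def decode_primitive(kind, size, buf, offset):
    """Decode one int, uint, enum, float or string; return (value, next_offset)."""
    if kind == "string":
        end = buf.index(b"\0", offset)          # strings are null terminated
        return buf[offset:end].decode("latin-1"), end + 1
    if kind == "int":
        code = INT_CODES[size]
    elif kind in ("uint", "enum"):              # enum assumed unsigned
        code = INT_CODES[size].upper()
    elif kind == "float":
        code = FLOAT_CODES[size]
    else:
        raise ValueError(f"unsupported primitive {kind!r}")
    fmt = struct.Struct("=" + code)
    return fmt.unpack_from(buf, offset)[0], offset + fmt.size

def render(value, printf_format=None, labels=None):
    """Format a decoded value with its printf format or enum labels."""
    if labels is not None:
        return labels[value]                    # label k assumed to be value k
    return printf_format % value if printf_format else str(value)
```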
The type structure may also be a compound type.
<array size=n> --type structure-- </array>
<sequence lengthsize=isize> --type structure-- </sequence>
<field name=field_name>
<description>Some text</description>
<union typecodesize=isize>
<field name=field_name>
<description>Some text</description>
Array is a fixed-size array of length size. Sequence is a variable-size
array with its length stored as a prepended uint of size lengthsize.
A structure is simply an aggregation of fields. A union is one of its n
fields (variant record), as indicated by a preceding code (0 to n - 1)
of size typecodesize.
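The compound types compose recursively, which a small recursive decoder makes concrete. The sketch below uses a hypothetical tuple encoding of the type structures (not part of the format) and, for brevity, only the int, uint and string primitives; it assumes no padding and native byte order. Structure values and the selected union field are returned as dictionaries.

```python
import struct

def decode(t, buf, offset=0):
    """Decode a value of type t at offset; return (value, next_offset).

    Hypothetical type encoding used by this sketch:
      ("int", size), ("uint", size)         size in bytes: 1, 2, 4 or 8
      ("string",)                           null terminated
      ("array", n, elem)                    exactly n elements
      ("sequence", lengthsize, elem)        uint length prefix, then elements
      ("struct", [(name, type), ...])       fields in order
      ("union", typecodesize, [(name, type), ...])
    """
    kind = t[0]
    if kind in ("int", "uint"):
        code = {1: "b", 2: "h", 4: "i", 8: "q"}[t[1]]
        fmt = struct.Struct("=" + (code.upper() if kind == "uint" else code))
        return fmt.unpack_from(buf, offset)[0], offset + fmt.size
    if kind == "string":
        end = buf.index(b"\0", offset)
        return buf[offset:end].decode("latin-1"), end + 1
    if kind == "array":
        return decode_items(t[1], t[2], buf, offset)
    if kind == "sequence":
        count, offset = decode(("uint", t[1]), buf, offset)
        return decode_items(count, t[2], buf, offset)
    if kind == "struct":
        value = {}
        for name, field_type in t[1]:
            value[name], offset = decode(field_type, buf, offset)
        return value, offset
    if kind == "union":
        code, offset = decode(("uint", t[1]), buf, offset)
        name, field_type = t[2][code]       # code 0 to n - 1 selects the field
        value, offset = decode(field_type, buf, offset)
        return {name: value}, offset
    raise ValueError(f"unknown type {kind!r}")

def decode_items(count, elem, buf, offset):
    """Decode count consecutive elements of type elem."""
    items = []
    for _ in range(count):
        item, offset = decode(elem, buf, offset)
        items.append(item)
    return items, offset
```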
Finally the type structure may be defined by referencing a named type.
<typeref name=type_name/>
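Resolving typeref amounts to substituting named types, which can be sketched with a hypothetical tuple encoding in which `("typeref", name)` stands for `<typeref name=name/>`. The check uses the builtin chain block_timestamp, timestamp, timespec; wrapping their fields in a struct is an assumption of the sketch. Expansion stops with an error on a self-referencing type, because substituting it would never terminate.

```python
def expand(t, named, _stack=()):
    """Return t with every ("typeref", name) replaced by the named type.

    _stack holds the names being expanded on the current path; meeting one
    of them again means the type refers to itself, which cannot be expanded.
    """
    kind = t[0]
    if kind == "typeref":
        name = t[1]
        if name in _stack:
            raise ValueError("recursive typeref: " + " -> ".join(_stack + (name,)))
        return expand(named[name], named, _stack + (name,))
    if kind in ("array", "sequence"):
        return (kind, t[1], expand(t[2], named, _stack))
    if kind == "struct":
        return (kind, [(n, expand(ft, named, _stack)) for n, ft in t[1]])
    if kind == "union":
        return (kind, t[1], [(n, expand(ft, named, _stack)) for n, ft in t[2]])
    return t  # primitive types are returned unchanged
```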
<H2>Builtin events</H2>
The facility named "builtin" is always present and contains at least the
following event types.
<event name=facility_load>
<description>Facility used in the trace</description>
<field name="name"><string/></field>
<field name="checksum"><uint size=4/></field>
<field name="base_code"><uint size=4/></field>
<event name=block_start>
<description>Block start timestamp</description>
<typeref name=block_timestamp/>
<event name=block_end>
<description>Block end timestamp</description>
<typeref name=block_timestamp/>
<event name=time_heartbeat>
<description>System time values sent periodically to minimize cycle counter
drift with respect to real time clock and to detect cycle counter
<typeref name=timestamp/>
<type name=block_timestamp>
<field name=timestamp><typeref name=timestamp/></field>
<field name=block_id><uint size=4/></field>
<type name=timestamp>
<field name=time><typeref name=timespec/></field>
<field name="cycle_count"><uint size=8/></field>
<type name=timespec>
<field name="seconds"><uint size=4/></field>
<field name="nanoseconds"><uint size=4/></field>
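Reading the builtin events can be sketched in Python as follows (hypothetical helper names; native byte order and no padding assumed). The facility_load events of control/facilities give each facility's base_code, so a global event type id can be split back into a facility and a local id. The sketch assumes a facility's range extends up to the next facility's base code, since facility_load itself carries only the base.

```python
import bisect
import struct

def decode_facility_load(payload):
    """Decode a facility_load payload into (name, checksum, base_code)."""
    end = payload.index(b"\0")                  # name is null terminated
    name = payload[:end].decode("ascii")
    checksum, base_code = struct.unpack_from("=II", payload, end + 1)
    return name, checksum, base_code

def decode_timestamp(payload):
    """Decode a timestamp: timespec seconds, nanoseconds, then cycle_count."""
    return struct.unpack_from("=IIQ", payload, 0)

def resolve_event_id(type_id, facilities):
    """Map a global event type id to (facility_name, id_within_facility).

    facilities holds (name, checksum, base_code) tuples; each facility is
    assumed to own the codes from its base_code up to the next base_code.
    """
    ordered = sorted(facilities, key=lambda f: f[2])
    bases = [f[2] for f in ordered]
    i = bisect.bisect_right(bases, type_id) - 1
    if i < 0:
        raise KeyError(type_id)
    name, _checksum, base = ordered[i]
    return name, type_id - base
```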
<H2>Control files</H2>
The interrupts file reflects the content of the /proc/interrupts system file.
It contains one event describing each interrupt. At trace start, events are
generated describing all the current interrupts. If the assignment of
interrupts changes later, due to devices or device drivers being activated or
deactivated, additional events may be added to the file. Each interrupt
event has the following structure.
<event name=interrupt>
<description>Interrupt request number assignment</description>
<field name="number"><uint size=4/></field>
<field name="count"><uint size=4/></field>
<field name="controller"><string/></field>
<field name="name"><string/></field>
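Filling the interrupts file can be sketched as parsing a /proc/interrupts row and packing the event payload (hypothetical helper names). The parser assumes the 2.4-era layout: the interrupt number, one count column per CPU, a single-word controller, then the device name(s). Summing the per-CPU counts into the single count field and encoding strings as ASCII are assumptions of this sketch.

```python
import struct

def parse_proc_interrupts_line(line):
    """Parse one /proc/interrupts row into (number, count, controller, name).

    Returns None for non-numeric rows such as NMI or ERR. Per-CPU counts
    are summed into the single count field (an assumption of this sketch).
    """
    label, _, rest = line.partition(":")
    if not label.strip().isdigit():
        return None
    tokens = rest.split()
    counts = []
    while tokens and tokens[0].isdigit():
        counts.append(int(tokens.pop(0)))
    controller = tokens[0] if tokens else ""
    name = " ".join(tokens[1:])                 # shared IRQs: "eth0, usb-uhci"
    return int(label), sum(counts), controller, name

def encode_interrupt(number, count, controller, name):
    """Pack an interrupt payload: uint32 number, uint32 count, then the
    controller and name strings, each null terminated."""
    return (struct.pack("=II", number, count)
            + controller.encode("ascii") + b"\0"
            + name.encode("ascii") + b"\0")
```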
The processes file contains the list of processes already created when the
trace starts. Each event describing a process is modeled after the
/proc/self/status system file. The number of fields in this event is
expected to grow in the future to include groups, signal masks,
open file descriptors and address maps.
<event name=process>
<description>Existing process</description>
<field name="name"><string/></field>
<field name="pid"><uint size=4/></field>
<field name="ppid"><uint size=4/></field>
<field name="tracer_pid"><uint size=4/></field>
<field name="uid"><uint size=4/></field>
<field name="euid"><uint size=4/></field>
<field name="suid"><uint size=4/></field>
<field name="fsuid"><uint size=4/></field>
<field name="gid"><uint size=4/></field>
<field name="egid"><uint size=4/></field>
<field name="sgid"><uint size=4/></field>
<field name="fsgid"><uint size=4/></field>
<field name="state"><enum size=4>
Running WaitInterruptible WaitUninterruptible Zombie Traced Paging
</enum></field>
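Since the process event is modeled after /proc/self/status, its fields can be filled from that file's text, as sketched below (hypothetical names). The Uid and Gid lines of that file list the real, effective, saved and filesystem ids in that order. The letter-to-label mapping for state (in particular T to Traced and W to Paging) and numbering the labels by position are assumptions of this sketch.

```python
# Enum labels in declaration order; value k is assumed to be label k.
STATE_LABELS = ["Running", "WaitInterruptible", "WaitUninterruptible",
                "Zombie", "Traced", "Paging"]
# Hypothetical mapping from the State letter in /proc/<pid>/status.
STATE_LETTERS = {"R": 0, "S": 1, "D": 2, "Z": 3, "T": 4, "W": 5}

def process_fields(status_text):
    """Build the process event fields from /proc/<pid>/status text."""
    values = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            values[key] = value.strip()
    uid = [int(v) for v in values["Uid"].split()]   # real, effective, saved, fs
    gid = [int(v) for v in values["Gid"].split()]
    return {
        "name": values["Name"],
        "pid": int(values["Pid"]),
        "ppid": int(values["PPid"]),
        "tracer_pid": int(values["TracerPid"]),
        "uid": uid[0], "euid": uid[1], "suid": uid[2], "fsuid": uid[3],
        "gid": gid[0], "egid": gid[1], "sgid": gid[2], "fsgid": gid[3],
        "state": STATE_LETTERS[values["State"][0]],
    }
```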
Facilities define the granularity at which events are grouped for filtering,
activation and compilation. Each facility costs a table entry in the kernel
(name, checksum, event type code range), or somewhere between 20 and 30 bytes.
Having one facility per tracing statement in the kernel would be too much
(assuming that tracing statements are eventually inserted routinely in the
kernel code and replace some proportion of the 80000+ printk statements).
However, having a few facilities, up to a few tens, would make sense.
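A rough upper bound shows the scale, assuming (for illustration only) that every one of the 80000 printk statements became its own facility, compared with a hypothetical 50 facilities at the upper cost of 30 bytes each:

```latex
\begin{aligned}
80\,000 \times 20\ \text{bytes} &= 1.6\ \text{MB}, \qquad 80\,000 \times 30\ \text{bytes} = 2.4\ \text{MB},\\
50 \times 30\ \text{bytes} &= 1.5\ \text{kB}.
\end{aligned}
```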
The "builtin" facility contains a small number of predefined events which must
always exist. The "core" facility contains a small subset of OS events which
are almost always of interest (scheduling, interrupts, faults, system calls).
Then, specialized facilities may exist for each subsystem (network, disks,
Bookmarks are user-supplied information added to a trace: user
annotations attached to a time interval.
<location name=name cpu=n start_time=t end_time=t>Some text</location>
The interval is defined either in time, using "time=" or "start_time=" and
"end_time=", or in cycles, using "cycle=" or "start_cycle=" and "end_cycle=".
Times are in seconds with decimals down to nanoseconds, and cycle counts
are unsigned integers with a 64-bit range. The cpu attribute is optional.
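Converting a location element's interval into exact integers can be sketched in Python as below (hypothetical function name `bookmark_interval`, taking the element's attributes as a dictionary). Times are parsed with `decimal.Decimal` so that nanosecond decimals are not rounded by binary floating point. Treating `time=` and `cycle=` as zero-length intervals is an assumption of this sketch.

```python
from decimal import Decimal

def bookmark_interval(attrs):
    """Return (unit, start, end, cpu) for a <location> element's attributes.

    unit is "time" (integer nanoseconds) or "cycle"; cpu is None when absent.
    """
    def ns(text):
        value = Decimal(text) * 1_000_000_000
        if value != value.to_integral_value():
            raise ValueError(f"more than nanosecond precision: {text}")
        return int(value)

    def cycles(text):
        value = int(text)
        if not 0 <= value < 2**64:
            raise ValueError(f"cycle count outside the 64-bit range: {text}")
        return value

    if "time" in attrs:
        t = ns(attrs["time"])
        span = ("time", t, t)
    elif "start_time" in attrs and "end_time" in attrs:
        span = ("time", ns(attrs["start_time"]), ns(attrs["end_time"]))
    elif "cycle" in attrs:
        c = cycles(attrs["cycle"])
        span = ("cycle", c, c)
    elif "start_cycle" in attrs and "end_cycle" in attrs:
        span = ("cycle", cycles(attrs["start_cycle"]), cycles(attrs["end_cycle"]))
    else:
        raise ValueError("no time or cycle interval given")
    cpu = int(attrs["cpu"]) if "cpu" in attrs else None   # cpu is optional
    return span + (cpu,)
```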