--- /dev/null
+<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html>
+<head>
+ <title>The new LTT trace format</title>
+</head>
+ <body>
+
+<h1>The new LTT trace format</h1>
+
+<P>
+A trace is contained in a directory tree. To send a trace remotely,
+the directory tree may be tar-gzipped. Trace foo, placed in the home
+directory of user john, /home/john, would have the following content:
+
+<PRE><TT>
+$ cd /home/john
+$ tree foo
+foo/
+|-- eventdefs
+| |-- core.xml
+| |-- net.xml
+| |-- ipv4.xml
+| `-- ide.xml
+|-- info
+| |-- bookmarks.xml
+| `-- system.xml
+|-- control
+| |-- facilities
+| |-- interrupts
+| `-- processes
+`-- cpu
+ |-- 0
+ |-- 1
+ |-- 2
+ `-- 3
+</TT></PRE>
+
+<P>
+The eventdefs directory contains the event descriptions for all the
+facilities used. The syntax is a simple subset of XML; XML is widely
+known and easily parsed or hand edited. Each file contains one or more
+<facility name=name>...</facility> elements. Several
+facilities may have the same name but different content (and thus
+generate a different checksum), typically when the event descriptions
+for a given facility change from one version to the next, for example when
+a module is recompiled and reloaded during a trace.
+
+<P>
+A small number of events are predefined as part of the "builtin" facility
+and are not described there. These "builtin" events include "facility_load",
+"block_start", "block_end" and "time_heartbeat".
+
+<P>
+The cpu directory contains a tracefile for each cpu, numbered from 0,
+in .trace format. A uniprocessor trace thus only contains the file cpu/0.
+A multi-processor with some unused (possibly hotplug) CPU slots may have some
+unused CPU numbers. For instance, an 8-way SMP board with 6 CPUs randomly
+installed may produce tracefiles named 0, 1, 2, 4, 6, 7.
+
+<P>
+The files in the control directory also follow the .trace format.
+The "facilities" file only contains "builtin" facility_load events
+and is used to determine the facilities used and the code range assigned
+to each facility. The other control files contain the initial system
+state and various subsequent important events, for example process
+creations and exits. The advantage of placing such subsequent events
+in control trace files, instead of (or in addition to) the per cpu
+trace files, is that they may be accessed more quickly and conveniently,
+and that they may be kept even when the per cpu files are overwritten
+in "flight recorder mode".
+
+<P>
+The info directory contains, in system.xml, a description of the system on
+which the trace was created, as well as user annotations in bookmarks.xml.
+This directory may also contain various information about the trace, generated
+during trace analysis (statistics, index...).
+
+
+<H2>Trace format</H2>
+
+<P>
+Each tracefile is divided into equal size blocks, with a uint32 at the block
+end giving the offset to the last event in the block. Events are packed
+sequentially in the block, starting at offset 0 with a "block_start" event
+and ending, at the offset stored in the last 4 bytes of the block, with a
+"block_end" event. Both the block_start and block_end events
+contain the kernel timestamp (timespec binary structure,
+uint32 seconds, uint32 nanoseconds), the cycle counter (uint64 cycles),
+and the buffer id (uint64).
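+
+<P>
+As an illustration, the following C sketch shows how a reader could locate
+the last event (the block_end event) from the offset stored in the last
+4 bytes of a block. The function name is purely illustrative and the block
+is assumed to be in the host byte order.
+
+<PRE><TT>
+#include <stddef.h>
+#include <stdint.h>
+#include <string.h>
+
+/* Returns the offset, within the block, of the last event (the block_end
+   event), as stored in the last 4 bytes of the block. */
+static uint32_t last_event_offset(const unsigned char *block, size_t block_size)
+{
+    uint32_t offset;
+
+    memcpy(&offset, block + block_size - sizeof(uint32_t), sizeof(uint32_t));
+    return offset;
+}
+</TT></PRE>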
+
+<P>
+Each event consists of an event type id (a uint16 formed by adding the event
+type id within the facility to the facility base id), a time delta (a uint32,
+in cycles or nanoseconds depending on configuration, since the last time value
+found in the block header or in a "time_heartbeat" event) and the event type
+specific data. All values are packed in native byte order binary format.
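+
+<P>
+For illustration, here is a minimal C sketch of decoding one event header at
+a given offset in a block; the structure and function names are illustrative,
+not part of the LTT API, and the trace is assumed to be in the host byte
+order.
+
+<PRE><TT>
+#include <stddef.h>
+#include <stdint.h>
+#include <string.h>
+
+struct event_header {
+    uint16_t type_id;     /* facility base id + event id within the facility */
+    uint32_t time_delta;  /* cycles or nanoseconds since the last time value */
+};
+
+/* Reads the packed event header; the event type specific data follows it.
+   Returns the number of bytes consumed. */
+static size_t read_event_header(const unsigned char *p, struct event_header *h)
+{
+    memcpy(&h->type_id, p, sizeof(h->type_id));
+    memcpy(&h->time_delta, p + sizeof(h->type_id), sizeof(h->time_delta));
+    return sizeof(h->type_id) + sizeof(h->time_delta);
+}
+</TT></PRE>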
+
+
+<H2>System description</H2>
+
+<P>
+The system type description, in system.xml, looks like:
+
+<PRE><TT>
+<system
+ node_name="vaucluse"
+ domainname="polymtl.ca"
+ cpu="4"
+ arch_size="ILP32"
+ endian="little"
+ kernel_name="Linux"
+ kernel_release="2.4.18-686-smp"
+ kernel_version="#1 SMP Sun Apr 14 12:07:19 EST 2002"
+ machine="i686"
+ processor="unknown"
+ hardware_platform="unknown"
+ operating_system="Linux"
+ ltt_major_version="2"
+ ltt_minor_version="0"
+ ltt_block_size="100000"
+>
+Some comments about the system
+</system>
+</TT></PRE>
+
+<P>
+The system attributes kernel_name, node_name, kernel_release,
+kernel_version, machine, processor, hardware_platform and operating_system
+come from the uname(1) program. The domainname attribute is obtained from
+the "hostname --domain" command. The arch_size attribute is one of
+LP32, ILP32, LP64 or ILP64 and specifies the length in bits of integers (I),
+longs (L) and pointers (P). The endian attribute is "little" or "big".
+While the arch_size and endian attributes could be deduced from the platform
+type, having them explicit allows analysing traces from platforms not yet
+known to the analysis tools. The cpu attribute specifies the maximum number
+of processors in the system; only tracefiles 0 to this maximum - 1 may exist
+in the cpu directory.
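+
+<P>
+As a hypothetical example (not the actual LTT code), the arch_size and endian
+attributes could be determined on the traced host as follows:
+
+<PRE><TT>
+#include <stdint.h>
+#include <stdio.h>
+
+int main(void)
+{
+    /* LP32: 16 bit int; ILP32: all 32 bit; LP64: 32 bit int, 64 bit longs
+       and pointers; ILP64: all 64 bit. */
+    const char *arch_size =
+        (sizeof(long) == 8) ? (sizeof(int) == 8 ? "ILP64" : "LP64")
+                            : (sizeof(int) == 4 ? "ILP32" : "LP32");
+    uint16_t probe = 1;
+    const char *endian = (*(unsigned char *)&probe == 1) ? "little" : "big";
+
+    printf("arch_size=\"%s\" endian=\"%s\"\n", arch_size, endian);
+    return 0;
+}
+</TT></PRE>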
+
+<P>
+Within the system element, the text enclosed may describe further the
+system traced.
+
+
+<H2>Event type descriptions</H2>
+
+<P>
+A facility contains the descriptions of several event types. When a structure
+is reused in several event types, a named type is defined and may be referenced
+by several other event types or named types.
+
+<PRE><TT>
+<facility name=facility_name>
+ <description>Some text</description>
+ <event name=eventtype_name>
+ <description>Some text</description>
+ --type structure--
+ </event>
+ ...
+ <type name=type_name>
+ --type structure--
+ </type>
+</facility>
+</TT></PRE>
+
+<P>
+The type structure may be one of the following primitive type elements.
+Whenever the keyword isize is used, the allowed values are
+short, medium, long, 1, 2, 4 and 8, indicating the size in bytes.
+The fsize keyword represents one of medium, long, 4 or 8 bytes.
+
+<PRE><TT>
+<int size=isize format="printf format"/>
+
+<uint size=isize format="printf format"/>
+
+<float size=fsize format="printf format"/>
+
+<string format="printf format"/>
+
+<enum size=isize format="printf format">label1 label2 ...</enum>
+</TT></PRE>
+
+<P>
+The string is null terminated. For the enumeration, the size of the integer
+used for its representation is specified.
+
+<P>
+The type structure may also be a compound type.
+
+<PRE><TT>
+<array size=n> --type structure-- </array>
+
+<sequence lengthsize=isize> --type structure-- </sequence>
+
+<struct>
+ <field name=field_name>
+ <description>Some text</description>
+ --type structure--
+ </field>
+ ...
+</struct>
+
+<union typecodesize=isize>
+ <field name=field_name>
+ <description>Some text</description>
+ --type structure--
+ </field>
+ ...
+</union>
+</TT></PRE>
+
+<P>
+Array is a fixed size array whose length is given by the size attribute.
+Sequence is a variable size
+array with its length stored as a prepended uint of size lengthsize.
+A structure is simply an aggregation of fields. A union is one of its n
+fields (variant record), as indicated by a preceding code (0 to n - 1)
+of the specified size typecodesize.
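+
+<P>
+As an illustration of how a sequence is packed in the binary trace, the
+following sketch reads the prepended length of a sequence with lengthsize=4
+and returns a pointer to its elements; the function name is hypothetical and
+host byte order is assumed.
+
+<PRE><TT>
+#include <stdint.h>
+#include <string.h>
+
+/* Reads the prepended uint32 length of a sequence and returns a pointer
+   to the first element, which immediately follows the length. */
+static const unsigned char *read_sequence(const unsigned char *p,
+                                          uint32_t *element_count)
+{
+    memcpy(element_count, p, sizeof(uint32_t));
+    return p + sizeof(uint32_t);
+}
+</TT></PRE>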
+
+<P>
+Finally the type structure may be defined by referencing a named type.
+
+<PRE><TT>
+<typeref name=type_name/>
+</TT></PRE>
+
+<H2>Builtin events</H2>
+
+<P>
+The facility named "builtin" is always present and contains at least the
+following event types.
+
+<PRE><TT>
+<event name=facility_load>
+ <description>Facility used in the trace</description>
+ <struct>
+ <field name="name"><string/></field>
+ <field name="checksum"><uint size=4/></field>
+ <field name="base_code"><uint size=4/></field>
+ </struct>
+</event>
+
+<event name=block_start>
+ <description>Block start timestamp</description>
+ <typeref name=block_timestamp/>
+</event>
+
+<event name=block_end>
+ <description>Block end timestamp</description>
+ <typeref name=block_timestamp/>
+</event>
+
+<event name=time_heartbeat>
+ <description>System time values sent periodically to minimize cycle counter
+ drift with respect to real time clock and to detect cycle counter
+ rollovers
+ </description>
+ <typeref name=timestamp/>
+</event>
+
+<type name=block_timestamp>
+ <struct>
+    <field name=timestamp><typeref name=timestamp/></field>
+ <field name=block_id><uint size=4/></field>
+ </struct>
+</type>
+
+<type name=timestamp>
+ <struct>
+    <field name=time><typeref name=timespec/></field>
+ <field name="cycle_count"><uint size=8/></field>
+ </struct>
+</type>
+
+<type name=timespec>
+ <struct>
+ <field name="seconds"><uint size=4/></field>
+ <field name="nanoseconds"><uint size=4/></field>
+ </struct>
+</type>
+</TT></PRE>
+
+<H2>Control files</H2>
+
+<P>
+The interrupts file reflects the content of the /proc/interrupts system file.
+It contains one event describing each interrupt. At trace start, events are
+generated describing all the current interrupts. If the assignment of
+interrupts changes later, due to devices or device drivers being activated or
+deactivated, additional events may be added to the file. Each interrupt
+event has the following structure.
+
+<PRE><TT>
+<event name=interrupt>
+  <description>Interrupt request number assignment</description>
+ <struct>
+ <field name="number"><uint size=4/></field>
+ <field name="count"><uint size=4/></field>
+ <field name="controller"><string/></field>
+ <field name="name"><string/></field>
+ </struct>
+</event>
+</TT></PRE>
+
+<P>
+The processes file contains the list of processes already created when the
+trace starts. Each process-describing event is modeled after the
+/proc/self/status system file. The number of fields in this event is
+expected to expand in the future to include groups, signal masks,
+opened file descriptors and address maps.
+
+<PRE><TT>
+<event name=process>
+  <description>Existing process</description>
+ <struct>
+ <field name="name"><string/></field>
+ <field name="pid"><uint size=4/></field>
+ <field name="ppid"><uint size=4/></field>
+ <field name="tracer_pid"><uint size=4/></field>
+ <field name="uid"><uint size=4/></field>
+ <field name="euid"><uint size=4/></field>
+ <field name="suid"><uint size=4/></field>
+ <field name="fsuid"><uint size=4/></field>
+ <field name="gid"><uint size=4/></field>
+ <field name="egid"><uint size=4/></field>
+ <field name="sgid"><uint size=4/></field>
+ <field name="fsgid"><uint size=4/></field>
+ <field name="state"><enum size=4>
+ Running WaitInterruptible WaitUninterruptible Zombie Traced Paging
+ </enum></field>
+ </struct>
+</event>
+</TT></PRE>
+
+<H2>Facilities</H2>
+
+<P>
+Facilities define a granularity of event grouping for filtering, activation
+and compilation. Each facility costs a table entry in the kernel (name,
+checksum, event type code range), or somewhere between 20 and 30 bytes. Having
+one facility per tracing statement in the kernel would be too much (assuming
+that they eventually are routinely inserted in the kernel code and replace
+the 80000+ printk statements in some proportion). However, having a few
+facilities, up to a few tens, would make sense.
+
+<P>
+The "builtin" facility contains a small number of predefined events which must
+always exist. The "core" facility contains a small subset of OS events which
+are almost always of interest (scheduling, interrupts, faults, system calls).
+Then, specialized facilities may exist for each subsystem (network, disks,
+USB, SCSI...).
+
+
+<H2>Bookmarks</H2>
+
+<P>
+Bookmarks are user supplied information added to a trace. They contain user
+annotations attached to a time interval.
+
+<PRE><TT>
+<bookmarks>
+ <location name=name cpu=n start_time=t end_time=t>Some text</location>
+ ...
+</bookmarks>
+</TT></PRE>
+
+<P>
+The interval is defined using either "time=", or "start_time=" and
+"end_time=", or "cycle=", or "start_cycle=" and "end_cycle=".
+Times are in seconds with decimals up to nanoseconds, and cycle counts
+are unsigned integers with a 64-bit range. The cpu attribute is optional.
+
+</BODY>
+</HTML>
+
+
+
+
--- /dev/null
+<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html>
+<head>
+ <title>Linux Trace Toolkit Status</title>
+</head>
+ <body>
+
+<h1>Linux Trace Toolkit Status</h1>
+
+<p><i>Last updated July 1, 2003.</i> </p>
+
+<p>During the 2002 Ottawa Linux Symposium tracing BOF, a list of desirable
+ features for LTT was collected by Richard Moore. Since then, a lot of infrastructure
+ work on LTT has been taking place. This status report aims to track current
+ development efforts and the current status of the various features. This
+status page is most certainly incomplete; please send
+any additions and corrections to Michel Dagenais (michel.dagenais at polymtl.ca)</p>
+
+<p>As of this writing, the most active LTT contributors include Karim Yaghmour,
+author and maintainer from opersys.com, Tom Zanussi, Robert Wisniewski,
+Richard J Moore and others from IBM, mainly at the Linux Technology Center,
+XiangXiu Yang, Mathieu Desnoyers, Benoit des Ligneris and Michel Dagenais,
+from the department of Computer Engineering at Ecole Polytechnique de
+Montreal, and Frank Rowand, from MontaVista.</p>
+
+<h2>Work recently performed</h2>
+
+<p><b>Lockless per cpu buffers:</b> Tom Zanussi of IBM has implemented per-CPU
+lockless buffering, with low overhead and very fine grained timestamping, and
+has updated the kernel patch and the trace visualizer accordingly, except for
+viewing multiple per-CPU traces simultaneously. </p>
+
+<p><b>RelayFS:</b> Tom Zanussi has implemented RelayFS, a separate, simple
+and efficient component for moving data between the kernel and user space
+applications. This component is reusable by other projects (printk, evlog,
+lustre...) and removes a sizeable chunk from the current LTT, making each
+piece (relayfs and relayfs-based LTT) simpler, more modular and possibly
+more palatable for inclusion in the standard Linux kernel. Besides LTT on
+RelayFS, he has implemented printk over RelayFS with an automatically
+resizeable printk buffer. </p>
+
+<p><b>New trace format:</b> Karim Yaghmour and Michel Dagenais, with input
+from several LTT contributors, have designed a new trace format to accommodate
+per buffer tracefiles and dynamically defined event types. The new format
+includes both the binary trace format and the event type description format.
+XiangXiu Yang has developed a simple parser for the event type description
+format. This parser is used to generate the tracing macros in the kernel
+(genevent) and to support reading tracefiles in the trace reading library
+(libltt). </p>
+
+<h2>Ongoing work</h2>
+
+<p><b>Libltt:</b> XiangXiu Yang is finishing up an event reading library
+and API which parses event descriptions and accordingly reads traces and
+decodes events. </p>
+
+<p><b>lttv:</b> XiangXiu Yang, Mathieu Desnoyers and Michel Dagenais are
+remodeling the trace visualizer to use the new trace format and libltt API,
+and to allow compiled and scripted plugins, which can dynamically
+add new custom trace analysis functions. </p>
+
+<h2>Planned work</h2>
+
+<p>LTT already interfaces with Dynamic Probes. This feature will need to
+be updated for the new LTT version. </p>
+
+<p>The Kernel Crash Dump utilities are another very interesting complementary
+ project. Interfacing them with RelayFS will help implement useful
+flight-recorder like tracing for post-mortem analysis. </p>
+
+<p>User level tracing is available in the current LTT version but requires
+one system call per event. With the new RelayFS based infrastructure, it
+would be interesting to use a shared memory buffer directly accessible from
+user space. Having one RelayFS channel per user would allow an extremely
+efficient, yet secure, user level tracing mechanism. </p>
+
+<p>Sending important events (process creation, event types/facilities
+definitions) to a separate channel could be used to browse traces
+interactively more efficiently. Only this concise trace of important
+events would need to be processed in its entirety, other larger
+gigabyte size traces could be used in random access without requiring
+a first preprocessing pass. A separate channel would also be required
+in case of incomplete traces such as when tracing to a circular buffer
+in "flight recorder" mode; the important events would all be kept
+while only the last buffers of ordinary events would be kept. </p>
+
+<p>Once the visualizer is able to read and display several traces, it
+ will be interesting to produce side by side synchronized views
+ (events from two interacting machines A and B one above the other)
+ or even merged views (combined events from several CPUs in a single
+ merged graph). Time differences between interacting systems will
+ need to be estimated and somewhat compensated for. </p>
+
+<p>LTT currently writes a <i>proc</i> file at trace start time. This
+ file only contains minimal information about processes and
+ interrupts names. More information would be desirable for several
+ applications (process maps, opened descriptors, content of buffer
+ cache). Furthermore, this information may be more conveniently
+ gathered from within the kernel and simply written to the trace as
+ events at start time. </p>
+
+<h2>New features already implemented since LTT 0.9.5</h2>
+
+<ol>
+ <li> Per-CPU Buffering scheme. </li>
+ <li> Logging without locking. </li>
+ <li> Minimal latency - minimal or no serialisation. (<i>Lockless tracing
+using read_cycle_counter instead of gettimeofday.</i>) </li>
+ <li> Fine granularity time stamping - min=o(CPU cycle time),
+max=.05 Gb Ethernet interrupt rate. (<i>Cycle counter being used</i>). </li>
+ <li> Random access to trace event stream. (<i>Random access reading
+of events in the trace is already available in LibLTT. However, one first
+pass is required through the trace to find all the process creation events;
+the cost of this first pass may be reduced in the future if process creation
+ events are sent to a separate much smaller trace</i>.) </li>
+
+</ol>
+
+<h2>Features being worked on</h2>
+
+<ol>
+ <li> Simple wrapper macros for trace instrumentation. (<i>GenEvent</i>)
+ </li>
+ <li> Easily expandable with new trace types. (<i>GenEvent</i>) </li>
+ <li> Multiple buffering schemes - switchable globally or selectable
+by trace client. (<i>Will be simpler to obtain with RelayFS</i>.) </li>
+ <li> Global buffer scheme. (<i>Will be simpler to obtain with RelayFS</i>.)
+ </li>
+ <li> Per-process buffer scheme. (<i>Will be simpler to obtain with RelayFS.</i>)
+ </li>
+ <li> Per-NGPT thread buffer scheme. (<i>Will be simpler to obtain with
+ RelayFS</i>.) </li>
+ <li> Per-component buffer scheme. (<i>Will be simpler to obtain with
+RelayFS</i>.) </li>
+ <li> A set of extensible and modular performance analysis post-processing
+programs. (<i>Lttv</i>) </li>
+ <li> Filtering and selection mechanisms within formatting utility. (<i>Lttv</i>)
+ </li>
+ <li> Variable size event records. (<i>GenEvent, LibEvent, Lttv</i>)
+ </li>
+ <li> Data reduction facilities able to logically combine traces from
+ more than one system. (<i>LibEvent, Lttv</i>) </li>
+ <li> Data presentation utilities to be able to present data from multiple
+ trace instances in a logically combined form (<i>LibEvent, Lttv</i>)
+ </li>
+ <li> Major/minor code means of identification/registration/assignment.
+ (<i>GenEvent</i>) </li>
+ <li> A flexible formatting mechanism that will cater for structures
+and arrays of structures with recursion. (<i>GenEvent</i>) </li>
+
+</ol>
+
+<h2>Features already planned for</h2>
+
+<ol>
+ <li> Init-time tracing. (<i>To be part of RelayFS</i>.) </li>
+ <li>Updated interface for Dynamic Probes. (<i>As soon as things stabilize.</i>)
+ </li>
+ <li> Support "flight recorder" always on tracing with minimal resource
+consumption. (<i>To be part of RelayFS and interfaced to the Kernel crash
+dump facilities.)</i> </li>
+ <li> Fine grained dynamic trace instrumentation for kernel space and
+user subsystems. (<i>Dynamic Probes, more efficient user level tracing.</i>)</li>
+ <li>System information logged at trace start. (<i>New special events
+to add</i>.)</li>
+ <li>Collection of process memory map information at trace start/restart
+ and updates of that information at fork/exec/exit. This allows address-to-name
+ resolution for user space. </li>
+ <li>Include the facility to write system snapshots (total memory layout
+ for kernel, drivers, and all processes) to a file. This is required for
+ trace post-processing on a system other than the one producing the trace.
+ Perhaps some of this is already implemented in the Kernel Crash Dump.</li>
+ <li>Even more efficient tracing from user space.</li>
+ <li>Better integration with tools to define static trace hooks.</li>
+ <li> Better integration with tools to dynamically activate tracing statements.</li>
+
+</ol>
+
+<h2>Features not currently planned</h2>
+
+<ol>
+ <li>POSIX Tracing API compliance. </li>
+  <li>Ability to do function entry/exit tracing. (<i>Probably
+  a totally orthogonal mechanism using either Dynamic Probes hooks or static
+  code instrumentation using the suitable GCC options for basic block instrumentation.</i>)</li>
+ <li>Processor performance counter (which most modern CPUs have) sampling
+and recording. (<i>These counters can be read and their value sent in traced
+events. Some support to collect these automatically at specific state change
+times and to visualize the results would be nice.)</i></li>
+  <li>Suspend &amp; Resume capability. (<i>Why not simply stop the
+  trace and start a new one later? Otherwise, important information like process
+creations while suspended must be obtained in some other way.</i>)</li>
+ <li>Per-packet send/receive event. (<i>New event types will be easily
+added as needed.)</i></li>
+
+</ol>
+ <br>
+ <br>
+
+</body>
+</html>
+
+
+
--- /dev/null
+<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html>
+<head>
+ <title>Linux Trace Toolkit User tools</title>
+</head>
+ <body>
+
+<h1>Linux Trace Toolkit User tools</h1>
+
+<P>The Linux Trace Toolkit Visualizer, lttv, is a modular and extensible
+tool to read, analyze, annotate and display traces. It accesses traces through
+the libltt API and produces either textual output or graphical output using
+the GTK library. This document describes the architecture of lttv for
+developers.
+
+<P>Lttv is a small executable which links to the trace reading API, libltt,
+and to the glib and gobject base libraries.
+By itself it contains just enough code to
+convert a trace to a textual format and to load modules.
+The public
+functions defined in the main program are available to all modules.
+A number of
+<I>text</I> modules may be dynamically loaded to extend the capabilities of
+lttv, for instance to compute and print various statistics.
+
+<P>A more elaborate module, traceView, dynamically links to the GTK library
+and to a support library, libgtklttv. When loaded, it displays graphical
+windows in which one or more viewers in subwindows may be used to browse
+details of events in traces. A number of other graphical modules may be
+dynamically loaded to offer a choice of different viewers (e.g., process,
+CPU or block devices state versus time).
+
+<H2>Main program: main.c</H2>
+
+<P>The main program parses the command line options, loads the requested
+modules and executes the hooks registered in the global attributes
+(/hooks/main/before, /hooks/main/core, /hooks/main/after).
+
+<H3>Hooks for callbacks: hook.h (hook.c)</H3>
+
+<P>In a modular extensible application, each module registers callbacks to
+ensure that it gets called at appropriate times (e.g., after command line
+options processing, at each event to compute statistics...). Hooks and lists
+of hooks are defined for this purpose and are normally stored in the global
+attributes under /hooks/*.
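+
+<P>As a rough illustration, a hook is a callback paired with its own data,
+and hook lists are simply called in order; the type and function names below
+are a sketch, not the exact hook.h API.
+
+<PRE><TT>
+#include <glib.h>
+
+typedef gboolean (*sketch_hook)(gpointer hook_data, gpointer call_data);
+
+struct sketch_hook_entry {
+    sketch_hook f;
+    gpointer hook_data;
+};
+
+/* Calls every hook in the list with its registered hook_data and the
+   caller supplied call_data (for example an event context). */
+static void sketch_hooks_call(GArray *hooks, gpointer call_data)
+{
+    guint i;
+
+    for (i = 0; i < hooks->len; i++) {
+        struct sketch_hook_entry *e =
+            &g_array_index(hooks, struct sketch_hook_entry, i);
+        e->f(e->hook_data, call_data);
+    }
+}
+</TT></PRE>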
+
+<H3>Browsable data structures: iattribute.h (iattribute.c)</H3>
+
+<P>In several places, functions should operate on data structures for which the
+list of members is extensible. For example, the statistics printing
+module should not be
+modified each time new statistics are added by other modules.
+For this purpose, a gobject interface is defined in iattribute.h to
+enumerate and access members in a data structure. Even if new modules
+define custom data structures for efficiently storing statistics while they
+are being computed, they will be generically accessible for the printing
+routine as long as they implement the iattribute interface.
+
+<H3>Extensible data structures: attribute.h (attribute.c)</H3>
+
+<P>To allow each module to add its needed members to important data structures,
+for instance new statistics for processes, the LttvAttributes type is
+a container for named typed values. Each attribute has a textual key (name)
+and an associated typed value.
+It is similar to a C data structure except that the
+number and type of the members can change dynamically. It may be accessed
+either directly or through the iattribute interface.
+
+<P>Some members may be LttvAttributes objects, thus forming a tree of
+attributes, not unlike hierarchical file systems or registries. This is used
+for the global attributes, used to exchange information between modules.
+Attributes are also attached to trace sets, traces and contexts to allow
+storing arbitrary attributes.
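+
+<P>As an illustration of such a tree of attributes, the following sketch
+creates or finds a node from a '/' separated path like "/hooks/main/before";
+the types and functions are illustrative, not the actual attribute.h API.
+
+<PRE><TT>
+#include <glib.h>
+
+struct sketch_attributes {
+    GHashTable *children;    /* maps a name to a child sketch_attributes */
+};
+
+/* Walks the attribute tree along the '/' separated path, creating missing
+   intermediate nodes, and returns the final node. */
+static struct sketch_attributes *
+sketch_attributes_find(struct sketch_attributes *root, const gchar *path)
+{
+    gchar **names = g_strsplit(path, "/", -1);
+    struct sketch_attributes *node = root;
+    gint i;
+
+    for (i = 0; names[i] != NULL; i++) {
+        struct sketch_attributes *child;
+
+        if (names[i][0] == '\0')
+            continue;                   /* skip the empty leading component */
+        child = g_hash_table_lookup(node->children, names[i]);
+        if (child == NULL) {
+            child = g_new0(struct sketch_attributes, 1);
+            child->children = g_hash_table_new(g_str_hash, g_str_equal);
+            g_hash_table_insert(node->children, g_strdup(names[i]), child);
+        }
+        node = child;
+    }
+    g_strfreev(names);
+    return node;
+}
+</TT></PRE>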
+
+<H3>Modules: module.h (module.c)</H3>
+
+<P>The benefit of modules is to avoid recompiling the whole application when
+adding new functionality. It also helps ensure that only the needed code
+is loaded in memory.
+
+<P>Modules are loaded explicitly, being on the list of default modules or
+requested by a command line option, with g_module_open. The functions in
+the module are not directly accessible.
+Indeed, direct, compiled in, references to their functions would be dangerous
+since they would exist even before (if ever) the module is loaded.
+Each module contains a function named <i>init</i>. Its handle is obtained by
+the main program using g_module_symbol and is called.
+The <i>init</i> function of the module
+then calls everything it needs from the main program or from libraries,
+typically registering callbacks in hooks lists stored in the global attributes.
+No module function other than <i>init</i> is
+directly called. Modules cannot see the functions from other modules since
+they may or may not be loaded at the same time.
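+
+<P>The loading mechanism described above can be sketched as follows; error
+handling and the arguments passed to <i>init</i> are simplified, so this is
+illustrative rather than the actual module.c code.
+
+<PRE><TT>
+#include <gmodule.h>
+
+typedef void (*init_function)(void);
+
+static gboolean sketch_load_module(const gchar *path)
+{
+    GModule *module = g_module_open(path, G_MODULE_BIND_LAZY);
+    gpointer init = NULL;
+
+    if (module == NULL)
+        return FALSE;
+    if (!g_module_symbol(module, "init", &init) || init == NULL) {
+        g_module_close(module);
+        return FALSE;
+    }
+    ((init_function)init)();   /* the module registers its hooks here */
+    return TRUE;
+}
+</TT></PRE>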
+
+<P>The modules must see the declarations for the functions
+used, from the main program and from libraries, by including the associated
+.h files. The list of libraries used must be provided as an argument when
+a module is linked. This will ensure that these libraries get loaded
+automatically when that module is loaded.
+
+<P>Libraries contain a number of functions available to modules and to the main
+program. They are loaded automatically at start time if linked by the main
+program or at module load time if linked by that module. Libraries are
+useful to contain functions needed by several modules. Indeed, functions
+used by a single module could be simply part of that module.
+
+<P>A list of loaded modules is maintained. When a module is requested, it
+is first checked whether it is already loaded. A module may request other
+modules at the beginning of its init function. This will ensure that these
+modules get loaded and initialized before the init function of the current
+module proceeds. Circular dependencies are obviously to be avoided, as the
+initialization order among mutually dependent modules will be arbitrary.
+
+<H3>Command line options: option.h (option.c)</H3>
+
+<P>Command line options are added as needed by the main program and by modules
+as they are loaded. Thus, while options are scanned and acted upon (e.g.,
+options to load modules), the
+list of options to recognize continues to grow. The options module registers
+to get called by /hooks/main/before. It offers hooks /hooks/option/before
+and /hooks/option/after which are called just before and just after
+processing the options. Many modules register in their init function to
+be called in /hooks/option/after to verify the options specified and
+register further hooks accordingly.
+
+<H2>Trace Analysis</H2>
+
+<P>The main purpose of the lttv application is to process trace sets,
+calling registered hooks for each event in the traces and maintaining
+a context (system state, accumulated statistics).
+
+<H3>Trace Sets: traceSet.h (traceSet.c)</H3>
+
+<P>Trace sets are defined such that several traces can be analyzed together.
+Traces may be added to and removed from a trace set as needed.
+The main program stores a trace set in /trace_set/default.
+The content of the trace_set is defined by command line options and it is
+used by analysis modules (batch or interactive).
+
+<H3>Trace Set Analysis: processTrace.h (processTrace.c)</H3>
+
+<p>The function <i>lttv_process_trace_set</i> loops over all the events
+in the specified trace set for the specified time interval. <I>Before</I>
+hooks are first
+called for the trace set and for each trace and tracefile
+(one per cpu plus control tracefiles) in the trace set.
+Then hooks are called for
+each event in sorted time order. Finally, <i>after</i> hooks are called
+for the trace set and for each trace and tracefile in it.
+
+<P>To call all the event hooks in sorted time order, a priority queue
+(or sorted tree) is used. The first event from each tracefile is read and its
+time used as key in the sorted tree. The event with the lowest key is removed
+from the tree, the next event from that tracefile is read and reinserted in
+the tree.
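+
+<p>A self-contained sketch of this merge follows; each "tracefile" is reduced
+to an array of event timestamps, and events are delivered in globally sorted
+order by repeatedly taking the smallest next timestamp. A real implementation
+would use a priority queue or sorted tree rather than this linear scan, and
+the types below are illustrative only.
+
+<PRE><TT>
+#include <stdio.h>
+#include <stdint.h>
+#include <stddef.h>
+
+struct tracefile {           /* illustrative stand-in for a real tracefile */
+    const uint64_t *times;   /* timestamps of the events in this tracefile */
+    size_t count;
+    size_t next;             /* index of the next unread event */
+};
+
+static void process_in_time_order(struct tracefile *tf, size_t n)
+{
+    for (;;) {
+        struct tracefile *earliest = NULL;
+        size_t i;
+
+        for (i = 0; i < n; i++)
+            if (tf[i].next < tf[i].count &&
+                (earliest == NULL ||
+                 tf[i].times[tf[i].next] < earliest->times[earliest->next]))
+                earliest = &tf[i];
+        if (earliest == NULL)
+            break;                       /* all tracefiles are exhausted */
+        /* The event hooks would be called here with the proper context. */
+        printf("event at %llu\n",
+               (unsigned long long)earliest->times[earliest->next]);
+        earliest->next++;
+    }
+}
+</TT></PRE>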
+
+<p>Each hook is called with a LttvContext gobject as call data. The LttvContext
+object for the trace set before/after hooks is provided in the call to
+lttv_process_trace_set. Shallow copies of this context are made for each
+trace in the trace set for the trace before/after hooks. Again, shallow
+copies of each trace context are made for each tracefile in a trace.
+The context for each tracefile is used both for the tracefile before/after
+hooks and when calling the hooks for the contained events.
+
+<p>The lttv_process_trace_set function sets the fields in the
+context appropriately before calling a hook. For example, when calling an
+event hook, the context contains:
+
+<DL>
+<DT>trace_set_context<DD> context for the trace set.
+<DT>trace_context<DD> context for the trace.
+<DT>ts<DD> trace set.
+<DT>t<DD> trace.
+<DT>tf<DD> tracefile.
+<DT>e<DD> event.
+</DL>
+
+<P>The cost of providing all this information in the context is relatively
+low. When calling a hook from one event to the next, in the same tracefile,
+only the event field needs to be changed.
+The contexts used when processing traces are key to extensibility and
+performance. New modules may need additional data members in the context to
+store intermediate results. For this purpose, it is possible to derive
+subtypes of LttvContext in order to add new data members.
+
+
+<H3>Reconstructing the system state from the trace: state.h (state.c)</H3>
+
+<P>The events in a trace often represent state transitions in the traced
+system. When the trace is processed, and events accessed in time sorted
+order, it is thus possible to reconstruct in part the state of the
+traced system: state of each CPU, process, disk queue. The state of each
+process may contain detailed information such as opened file descriptors
+and memory map if needed by the analysis and if sufficient information is
+available in the trace. This incrementally updated state information may be
+used to display state graphs, or simply to compute state dependent
+statistics (time spent in user or system mode, waiting for a file...).
+
+<P>
+When tracing starts, at T0, no state is available. The OS state may be
+obtained through "initial state" events which enumerate the important OS data
+structures. Unless the state is obtained atomically, other events
+describing state changes may be interleaved in the trace and must be
+processed in the correct order. Once all the special initial state
+events are obtained, at Ts, the complete state is available. From there the
+system state can be deduced incrementally from the events in the trace.
+
+<P>
+Analysis tools must be prepared for missing state information. In some cases
+only a subset of events is traced, in others the trace may be truncated
+in <i>flight recorder</i> mode.
+
+<P>
+In interactive processing, the interval for which processing is required
+varies. After scrolling a viewer, the events in the new interval to display
+need to be processed in order to redraw the view. To avoid restarting
+the processing at the trace start to reconstruct incrementally the system
+state, the computed state may be memorized at regular intervals, for example
+every 100 000 events, in a time indexed database associated with a trace.
+To conserve space, it may be possible in some cases to only store state
+differences.
+
+<p>To process a specific time interval, the state at the beginning of the
+interval would be obtained by copying the last preceding saved state
+and processing the events since then to update the state.
+
+<p>A new subtype of LttvContext, LttvStateContext, is defined to add storage
+for the state information. It defines a trace set state as a set of trace
+states. The trace state is composed of processes, CPUs and block devices.
+Each CPU has a currently executing process, and each process state keeps
+track of the interrupt stack frames (faults, interrupts,
+system calls), executable file name and other information such as opened
+file descriptors. Each frame stores the process status, entry time
+and last status change time.
+
+<p>File state.c provides state updating hooks to be called when the trace is
+processed. When a scheduling change event is delivered to the hook, for
+instance, the current process for the CPU is changed and the state of the
+incoming and outgoing processes is changed.
+The state updating hooks are stored in the global attributes under
+/hooks/state/core/trace_set/before, after,
+/hooks/state/core/trace/before, after...
+to be used by processing functions requiring state updating (batch and
+interactive analysis, computing the state at time T by updating a preceding
+saved state...).
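+
+<p>As a simplified illustration of such a state updating hook, the sketch
+below handles a scheduling change; the structures, field names and event
+layout are hypothetical, not the actual state.h types.
+
+<PRE><TT>
+#include <glib.h>
+
+struct sketch_process_state {
+    guint pid;
+    enum { RUNNING, READY, BLOCKED } status;
+    guint64 last_change;                  /* time of the last status change */
+};
+
+struct sketch_cpu_state {
+    struct sketch_process_state *current; /* currently executing process */
+};
+
+struct sketch_sched_change {
+    guint64 time;
+    struct sketch_process_state *incoming;
+};
+
+/* hook_data points to the CPU state, call_data to the decoded event. */
+static gboolean sched_change_hook(gpointer hook_data, gpointer call_data)
+{
+    struct sketch_cpu_state *cpu = hook_data;
+    struct sketch_sched_change *e = call_data;
+
+    if (cpu->current != NULL) {           /* outgoing process stops running */
+        cpu->current->status = READY;
+        cpu->current->last_change = e->time;
+    }
+    e->incoming->status = RUNNING;        /* incoming process now runs */
+    e->incoming->last_change = e->time;
+    cpu->current = e->incoming;
+    return FALSE;                         /* return value unused in this sketch */
+}
+</TT></PRE>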
+
+<H3>Computing Statistics: stats.h (stats.c)</H3>
+
+<p>This file defines a subtype of LttvStateContext, LttvStatsContext,
+to store statistics on various aspects of a trace set. The LttvTraceSetStats
+structure contains a set of LttvTraceStats structures. Each such structure
+contains structures for CPUs, processes, interrupt types (IRQ, system call,
+fault), subtypes (individual system calls, IRQs or faults) and
+block devices. The CPUs also contain structures for processes, interrupt types,
+subtypes and block devices. Process structures similarly contain
+structures for interrupt types, subtypes and block devices. At each level
+(trace set, trace, cpu, process, interrupt stack frames)
+attributes are used to store statistics.
+
+<p>File stats.c provides statistics computing hooks to be called when the
+trace is processed. For example, when a <i>write</i> event is processed,
+the attribute <i>BytesWritten</i> in the corresponding system, cpu, process,
+interrupt type (e.g. system call) and subtype (e.g. write) is incremented
+by the number of bytes stored in the event. When the processing is finished,
+perhaps in the after hooks, the number of bytes written and other statistics
+may be summed over all CPUs for a given process, over all processes for a
+given CPU or over all traces.
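+
+<p>A minimal sketch of such a statistics hook follows; the attribute
+container and helper are hypothetical, standing in for the iattribute based
+storage described above.
+
+<PRE><TT>
+#include <glib.h>
+
+struct sketch_stats { GHashTable *attributes; };  /* key -> guint64 counter */
+
+static void add_to_counter(struct sketch_stats *s, const char *key, guint64 n)
+{
+    guint64 *value = g_hash_table_lookup(s->attributes, key);
+
+    if (value == NULL) {
+        value = g_new0(guint64, 1);
+        g_hash_table_insert(s->attributes, g_strdup(key), value);
+    }
+    *value += n;
+}
+
+/* Called for each write event; the three levels come from the context. */
+static void write_event_hook(struct sketch_stats *trace,
+                             struct sketch_stats *cpu,
+                             struct sketch_stats *process, guint64 byte_count)
+{
+    add_to_counter(trace,   "BytesWritten", byte_count);
+    add_to_counter(cpu,     "BytesWritten", byte_count);
+    add_to_counter(process, "BytesWritten", byte_count);
+}
+</TT></PRE>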
+
+<p>The basic set of statistics computed by stats.c includes, for the whole
+ trace set:
+
+<UL>
+<LI>Trace start time, end time and duration.
+<LI>Total number of events.
+<LI>Number of each event type (Interrupts, faults, system calls...)
+<LI>For each interrupt type and each subtype, the number of each event type.
+<LI>For each system:
+ <UL>
+ <LI>Total number of events.
+ <LI>Number of each event type (Interrupts, faults, system calls...)
+ <LI>For each interrupt type and each subtype, the number of each event type.
+ <LI>For each CPU:
+ <UL>
+ <LI> CPU id
+ <LI> User/System time
+ <LI> Number of each event type
+ <LI> For each interrupt type and each subtype,
+ the number of each event type.
+ </UL>
+ <LI>For each block device:
+ <UL>
+ <LI> block device name
+ <LI> time busy/idle, average queue length
+ <LI> Number of each relevant event type (requests added, merged, served)
+ </UL>
+ <LI>For each process:
+ <UL>
+ <LI> Exec'ed file names.
+ <LI> Start and end time, User/System time
+ <LI> Number of each event type
+ <LI> For each interrupt type and each subtype,
+ the number of each event type.
+ </UL>
+ </UL>
+</UL>
+
+<P>The structure to store statistics differs from the state storage structure
+in several ways. Statistics are maintained in different ways (per CPU all
+processes, per process all CPUs, per process on a given CPU...). Furthermore,
+statistics are maintained for all processes which existed during the trace
+while the state at time T only stores information about current processes.
+
+<P>The hooks defined by stats.c are stored in the global attributes under
+/hooks/stats/core/trace_set/before, after,
+/hooks/stats/core/trace/before, after to be used by processing functions
+interested in statistics.
+
+<H3>Filtering events: filter.h (filter.c)</H3>
+
+<P>
+Filters are used to select which events in a trace are shown in a viewer or
+used in a computation. The filtering rules are based on the values of
+event fields. The filter module receives a filter expression and computes
+a compiled filter. The compiled filter then serves as hook data for
+<i>check</i> event
+filter hooks which, given a context containing an event,
+return TRUE or FALSE to
+indicate if the event satisfies the filter. Trace and tracefile <i>check</i>
+filter hooks
+may be used to determine if a system and CPU satisfy the filter. Finally,
+the filter module has a function to return the time bounds, if any, imposed
+by a filter.
+
+<P>For some applications, the hooks provided by the filter module may not
+be sufficient, since they are based on simple boolean combinations
+of comparisons between fields and constants. In that case, custom code may be
+used for <i>check</i> hooks during the processing. An example of complex
+filtering could be to only show events belonging to processes which consumed
+more than 10% of the CPU in the last 10 seconds.
+
+<p>In module filter.c, filters are specified using textual expressions
+with AND, OR, NOT operations on
+nested subexpressions. Primitive expressions compare an event field to
+a constant. In the graphical user interface, a filter editor is provided.
+
+<PRE><TT>
+tokens: ( ! && || == <= >= > < != name [ ] int float string )
+
+expression = ( expression ) OR ! expression OR
+ expression && expression OR expression || expression OR
+ simple_expression
+
+simple_expression = field_selector OP value
+
+value = int OR float OR string OR enum
+
+field_selector = component OR component . field_selector
+
+component = name OR name [ int ]
+</TT></PRE>
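+
+<p>As an illustration (the available field names depend on the loaded event
+descriptions, so those used here are hypothetical), a filter selecting write
+system calls of process 1234 on CPU 0 could look like:
+
+<PRE><TT>
+( event.facility == "core" ) && ( event.name == "syscall_write" ) &&
+( process.pid == 1234 ) && ( cpu.id == 0 )
+</TT></PRE>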
+
+
+<H3>Batch Analysis: batchAnalysis.h (batchAnalysis.c)</H3>
+
+<p>This module registers to be called by the main program (/hooks/main/core).
+When called, it gets the current trace set (/trace_set/default),
+the state updating hooks (/hooks/state/*), the statistics hooks
+(/hooks/stats/*) and other analysis hooks (/hooks/batch/*),
+and runs lttv_process_trace_set for the entire
+trace set time interval. This simple processing of the complete trace set
+is normally sufficient for batch operations such as converting a trace to
+text and computing various statistics.
+
+
+<H3>Text output for events and statistics: textDump.h (textDump.c)</H3>
+
+<P>
+This module registers hooks (/hooks/batch)
+to print a textual representation of each event
+(event hooks) and to print the content of the statistics accumulated in the
+context (after trace set hook).
+
+<H2>Trace Set Viewers</H2>
+
+<p>
+A library, libgtklttv, is defined to provide utility functions for
+the second set of modules, which compose the interactive graphical user
+interface. It offers functions to create and interact with top level trace
+viewing windows, and to insert specialized embedded viewer modules.
+The libgtklttv library requires the gtk library.
+The viewer modules include a detailed event list, eventsTableView,
+a process state graph, processStateView, and a CPU state graph, cpuStateView.
+
+<p>
+The top level gtkTraceSet window, defined in libgtklttv,
+has the usual FILE, EDIT... menus and a toolbar.
+It has an associated trace set (and filter) and contains several tabs, each
+containing several vertically stacked time synchronized trace set viewers.
+It manages the space allocated to each contained viewer, the menu items and
+tools registered by each contained viewer and the current time and current
+time interval.
+
+<P>
+When viewers change the current time or time interval, the gtkTraceSet
+window notifies all contained viewers. When one or more viewers need
+redrawing, the gtkTraceSet window calls the lttv_process_trace_set
+function for the needed time interval, after computing the system state
+for the interval start time. While events are processed, drawing hooks
+from the viewers are called.
+
+<P>
+TO COMPLETE; description and motivation for the gtkTraceSet widget structure
+and interaction with viewers. Description and motivation for the detailed
+event view and process state view.
+
+</BODY>
+</HTML>