--- /dev/null
+<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html>
+<head>
+ <title>The new LTT trace format</title>
+</head>
+ <body>
+
+<h1>The new LTT trace format</h1>
+
+<P>
+A trace is contained in a directory tree. To send a trace remotely,
+the directory tree may be tar-gzipped. Trace foo, placed in the home
+directory of user john, /home/john, would have the following content:
+
+<PRE><TT>
+$ cd /home/john
+$ tree foo
+foo/
+|-- eventdefs
+| |-- core.xml
+| |-- net.xml
+| |-- ipv4.xml
+| `-- ide.xml
+|-- info
+| |-- bookmarks.xml
+| `-- system.xml
+|-- control
+| |-- facilities
+| |-- interrupts
+| `-- processes
+`-- cpu
+ |-- 0
+ |-- 1
+ |-- 2
+ `-- 3
+</TT></PRE>
+
+<P>
+The eventdefs directory contains the event descriptions for all the
+facilities used. The syntax is a simple subset of XML; XML is widely
+known and easily parsed or hand edited. Each file contains one or more
+<facility name=name>...</facility> elements. Several
+facilities may have the same name but different content (and thus
+generate a different checksum), typically when the event descriptions
+for a given facility change from one version to the next, for example when
+a module is recompiled and reloaded during a trace.
+
+<P>
+A small number of events are predefined as part of the "builtin" facility
+and are not described there. These "builtin" events include "facility_load",
+"block_start", "block_end" and "time_heartbeat".
+
+<P>
+The cpu directory contains a tracefile for each cpu, numbered from 0,
+in .trace format. A uniprocessor trace thus only contains the file cpu/0.
+A multi-processor with some unused (possibly hotplug) CPU slots may have some
+unused CPU numbers. For instance, an 8-way SMP board with 6 CPUs randomly
+installed may produce tracefiles named 0, 1, 2, 4, 6, 7.
+
+<P>
+The files in the control directory also follow the .trace format.
+The "facilities" file only contains "builtin" facility_load events
+and is used to determine the facilities used and the code range assigned
+to each facility. The other control files contain the initial system
+state and various subsequent important events, for example process
+creations and exits. The advantage of placing such subsequent events
+in control trace files, instead of (or in addition to) the per cpu
+trace files, is that they may be accessed more quickly and conveniently,
+and that they may be kept even when the per cpu files are overwritten
+in "flight recorder mode".
+
+<P>
+The info directory contains, in system.xml, a description of the system on
+which the trace was created, as well as user annotations in bookmarks.xml.
+This directory may also contain various information about the trace, generated
+during trace analysis (statistics, index...).
+
+
+<H2>Trace format</H2>
+
+<P>
+Each tracefile is divided into equal size blocks, with a uint32 at the block
+end giving the offset to the last event in the block. Events are packed
+sequentially in the block, starting at offset 0 with a "block_start" event
+and ending, at the offset stored in the last 4 bytes of the block, with a
+"block_end" event. Both the block_start and block_end events
+contain the kernel timestamp (timespec binary structure,
+uint32 seconds, uint32 nanoseconds), the cycle counter (uint64 cycles),
+and the buffer id (uint64).
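+
+<P>
+As an illustration, the following C sketch shows how a reader could locate
+the last event (the block_end event) from the offset stored in the last
+4 bytes of a block. The function name is purely illustrative and the block
+is assumed to be in the host byte order.
+
+<PRE><TT>
+#include <stddef.h>
+#include <stdint.h>
+#include <string.h>
+
+/* Returns the offset, within the block, of the last event (the block_end
+   event), as stored in the last 4 bytes of the block. */
+static uint32_t last_event_offset(const unsigned char *block, size_t block_size)
+{
+    uint32_t offset;
+
+    memcpy(&offset, block + block_size - sizeof(uint32_t), sizeof(uint32_t));
+    return offset;
+}
+</TT></PRE>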
+
+<P>
+Each event consists of an event type id (a uint16 formed by adding the event
+type id within the facility to the facility base id), a time delta (a uint32,
+in cycles or nanoseconds depending on configuration, since the last time value
+found in the block header or in a "time_heartbeat" event) and the event type
+specific data. All values are packed in native byte order binary format.
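+
+<P>
+For illustration, here is a minimal C sketch of decoding one event header at
+a given offset in a block; the structure and function names are illustrative,
+not part of the LTT API, and the trace is assumed to be in the host byte
+order.
+
+<PRE><TT>
+#include <stddef.h>
+#include <stdint.h>
+#include <string.h>
+
+struct event_header {
+    uint16_t type_id;     /* facility base id + event id within the facility */
+    uint32_t time_delta;  /* cycles or nanoseconds since the last time value */
+};
+
+/* Reads the packed event header; the event type specific data follows it.
+   Returns the number of bytes consumed. */
+static size_t read_event_header(const unsigned char *p, struct event_header *h)
+{
+    memcpy(&h->type_id, p, sizeof(h->type_id));
+    memcpy(&h->time_delta, p + sizeof(h->type_id), sizeof(h->time_delta));
+    return sizeof(h->type_id) + sizeof(h->time_delta);
+}
+</TT></PRE>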
+
+
+<H2>System description</H2>
+
+<P>
+The system type description, in system.xml, looks like:
+
+<PRE><TT>
+<system
+ node_name="vaucluse"
+ domainname="polymtl.ca"
+ cpu="4"
+ arch_size="ILP32"
+ endian="little"
+ kernel_name="Linux"
+ kernel_release="2.4.18-686-smp"
+ kernel_version="#1 SMP Sun Apr 14 12:07:19 EST 2002"
+ machine="i686"
+ processor="unknown"
+ hardware_platform="unknown"
+ operating_system="Linux"
+ ltt_major_version="2"
+ ltt_minor_version="0"
+ ltt_block_size="100000"
+>
+Some comments about the system
+</system>
+</TT></PRE>
+
+<P>
+The system attributes kernel_name, node_name, kernel_release,
+kernel_version, machine, processor, hardware_platform and operating_system
+come from the uname(1) program. The domainname attribute is obtained from
+the "hostname --domain" command. The arch_size attribute is one of
+LP32, ILP32, LP64 or ILP64 and specifies the length in bits of integers (I),
+longs (L) and pointers (P). The endian attribute is "little" or "big".
+While the arch_size and endian attributes could be deduced from the platform
+type, having them explicit allows analysing traces from platforms not yet
+known to the analysis tools. The cpu attribute specifies the maximum number
+of processors in the system; only tracefiles 0 to this maximum - 1 may exist
+in the cpu directory.
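+
+<P>
+As a hypothetical example (not the actual LTT code), the arch_size and endian
+attributes could be determined on the traced host as follows:
+
+<PRE><TT>
+#include <stdint.h>
+#include <stdio.h>
+
+int main(void)
+{
+    /* LP32: 16 bit int; ILP32: all 32 bit; LP64: 32 bit int, 64 bit longs
+       and pointers; ILP64: all 64 bit. */
+    const char *arch_size =
+        (sizeof(long) == 8) ? (sizeof(int) == 8 ? "ILP64" : "LP64")
+                            : (sizeof(int) == 4 ? "ILP32" : "LP32");
+    uint16_t probe = 1;
+    const char *endian = (*(unsigned char *)&probe == 1) ? "little" : "big";
+
+    printf("arch_size=\"%s\" endian=\"%s\"\n", arch_size, endian);
+    return 0;
+}
+</TT></PRE>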
+
+<P>
+Within the system element, the text enclosed may describe further the
+system traced.
+
+
+<H2>Event type descriptions</H2>
+
+<P>
+A facility contains the descriptions of several event types. When a structure
+is reused in several event types, a named type is defined and may be referenced
+by several other event types or named types.
+
+<PRE><TT>
+<facility name=facility_name>
+ <description>Some text</description>
+ <event name=eventtype_name>
+ <description>Some text</description>
+ --type structure--
+ </event>
+ ...
+ <type name=type_name>
+ --type structure--
+ </type>
+</facility>
+</TT></PRE>
+
+<P>
+The type structure may be one of the following primitive type elements.
+Whenever the keyword isize is used, the allowed values are
+short, medium, long, 1, 2, 4 and 8, indicating the size in bytes.
+The fsize keyword represents one of medium, long, 4 or 8 bytes.
+
+<PRE><TT>
+<int size=isize format="printf format"/>
+
+<uint size=isize format="printf format"/>
+
+<float size=fsize format="printf format"/>
+
+<string format="printf format"/>
+
+<enum size=isize format="printf format">label1 label2 ...</enum>
+</TT></PRE>
+
+<P>
+The string is null terminated. For the enumeration, the size of the integer
+used for its representation is specified.
+
+<P>
+The type structure may also be a compound type.
+
+<PRE><TT>
+<array size=n> --type structure-- </array>
+
+<sequence lengthsize=isize> --type structure-- </sequence>
+
+<struct>
+ <field name=field_name>
+ <description>Some text</description>
+ --type structure--
+ </field>
+ ...
+</struct>
+
+<union typecodesize=isize>
+ <field name=field_name>
+ <description>Some text</description>
+ --type structure--
+ </field>
+ ...
+</union>
+</TT></PRE>
+
+<P>
+Array is a fixed size array whose length is given by the size attribute.
+Sequence is a variable size
+array with its length stored as a prepended uint of size lengthsize.
+A structure is simply an aggregation of fields. A union is one of its n
+fields (variant record), as indicated by a preceding code (0 to n - 1)
+of the specified size typecodesize.
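+
+<P>
+As an illustration of how a sequence is packed in the binary trace, the
+following sketch reads the prepended length of a sequence with lengthsize=4
+and returns a pointer to its elements; the function name is hypothetical and
+host byte order is assumed.
+
+<PRE><TT>
+#include <stdint.h>
+#include <string.h>
+
+/* Reads the prepended uint32 length of a sequence and returns a pointer
+   to the first element, which immediately follows the length. */
+static const unsigned char *read_sequence(const unsigned char *p,
+                                          uint32_t *element_count)
+{
+    memcpy(element_count, p, sizeof(uint32_t));
+    return p + sizeof(uint32_t);
+}
+</TT></PRE>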
+
+<P>
+Finally the type structure may be defined by referencing a named type.
+
+<PRE><TT>
+<typeref name=type_name/>
+</TT></PRE>
+
+<H2>Builtin events</H2>
+
+<P>
+The facility named "builtin" is always present and contains at least the
+following event types.
+
+<PRE><TT>
+<event name=facility_load>
+ <description>Facility used in the trace</description>
+ <struct>
+ <field name="name"><string/></field>
+ <field name="checksum"><uint size=4/></field>
+ <field name="base_code"><uint size=4/></field>
+ </struct>
+</event>
+
+<event name=block_start>
+ <description>Block start timestamp</description>
+ <typeref name=block_timestamp/>
+</event>
+
+<event name=block_end>
+ <description>Block end timestamp</description>
+ <typeref name=block_timestamp/>
+</event>
+
+<event name=time_heartbeat>
+ <description>System time values sent periodically to minimize cycle counter
+ drift with respect to real time clock and to detect cycle counter
+ rollovers
+ </description>
+ <typeref name=timestamp/>
+</event>
+
+<type name=block_timestamp>
+ <struct>
+    <field name=timestamp><typeref name=timestamp/></field>
+ <field name=block_id><uint size=4/></field>
+ </struct>
+</type>
+
+<type name=timestamp>
+ <struct>
+    <field name=time><typeref name=timespec/></field>
+ <field name="cycle_count"><uint size=8/></field>
+ </struct>
+</type>
+
+<type name=timespec>
+ <struct>
+ <field name="seconds"><uint size=4/></field>
+ <field name="nanoseconds"><uint size=4/></field>
+ </struct>
+</type>
+</TT></PRE>
+
+<H2>Control files</H2>
+
+<P>
+The interrupts file reflects the content of the /proc/interrupts system file.
+It contains one event describing each interrupt. At trace start, events are
+generated describing all the current interrupts. If the assignment of
+interrupts changes later, due to devices or device drivers being activated or
+deactivated, additional events may be added to the file. Each interrupt
+event has the following structure.
+
+<PRE><TT>
+<event name=interrupt>
+  <description>Interrupt request number assignment</description>
+ <struct>
+ <field name="number"><uint size=4/></field>
+ <field name="count"><uint size=4/></field>
+ <field name="controller"><string/></field>
+ <field name="name"><string/></field>
+ </struct>
+</event>
+</TT></PRE>
+
+<P>
+The processes file contains the list of processes already created when the
+trace starts. Each process-describing event is modeled after the
+/proc/self/status system file. The number of fields in this event is
+expected to expand in the future to include groups, signal masks,
+opened file descriptors and address maps.
+
+<PRE><TT>
+<event name=process>
+  <description>Existing process</description>
+ <struct>
+ <field name="name"><string/></field>
+ <field name="pid"><uint size=4/></field>
+ <field name="ppid"><uint size=4/></field>
+ <field name="tracer_pid"><uint size=4/></field>
+ <field name="uid"><uint size=4/></field>
+ <field name="euid"><uint size=4/></field>
+ <field name="suid"><uint size=4/></field>
+ <field name="fsuid"><uint size=4/></field>
+ <field name="gid"><uint size=4/></field>
+ <field name="egid"><uint size=4/></field>
+ <field name="sgid"><uint size=4/></field>
+ <field name="fsgid"><uint size=4/></field>
+ <field name="state"><enum size=4>
+ Running WaitInterruptible WaitUninterruptible Zombie Traced Paging
+ </enum></field>
+ </struct>
+</event>
+</TT></PRE>
+
+<H2>Facilities</H2>
+
+<P>
+Facilities define a granularity of event grouping for filtering, activation
+and compilation. Each facility costs a table entry in the kernel (name,
+checksum, event type code range), or somewhere between 20 and 30 bytes. Having
+one facility per tracing statement in the kernel would be too much (assuming
+that they eventually are routinely inserted in the kernel code and replace
+the 80000+ printk statements in some proportion). However, having a few
+facilities, up to a few tens, would make sense.
+
+<P>
+The "builtin" facility contains a small number of predefined events which must
+always exist. The "core" facility contains a small subset of OS events which
+are almost always of interest (scheduling, interrupts, faults, system calls).
+Then, specialized facilities may exist for each subsystem (network, disks,
+USB, SCSI...).
+
+
+<H2>Bookmarks</H2>
+
+<P>
+Bookmarks are user supplied information added to a trace. They contain user
+annotations attached to a time interval.
+
+<PRE><TT>
+<bookmarks>
+ <location name=name cpu=n start_time=t end_time=t>Some text</location>
+ ...
+</bookmarks>
+</TT></PRE>
+
+<P>
+The interval is defined using either "time=", or "start_time=" and
+"end_time=", or "cycle=", or "start_cycle=" and "end_cycle=".
+Times are in seconds with decimals up to nanoseconds, and cycle counts
+are unsigned integers with a 64-bit range. The cpu attribute is optional.
+
+</BODY>
+</HTML>
+
+
+
+
--- /dev/null
+<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html>
+<head>
+ <title>Linux Trace Toolkit Status</title>
+</head>
+ <body>
+
+<h1>Linux Trace Toolkit Status</h1>
+
+<p><i>Last updated July 1, 2003.</i> </p>
+
+<p>During the 2002 Ottawa Linux Symposium tracing BOF, a list of desirable
+ features for LTT was collected by Richard Moore. Since then, a lot of infrastructure
+ work on LTT has been taking place. This status report aims to track current
+ development efforts and the current status of the various features. This
+status page is most certainly incomplete; please send
+any additions and corrections to Michel Dagenais (michel.dagenais at polymtl.ca)</p>
+
+<p>As of this writing, the most active LTT contributors include Karim Yaghmour,
+author and maintainer from opersys.com, Tom Zanussi, Robert Wisniewski,
+Richard J Moore and others from IBM, mainly at the Linux Technology Center,
+XiangXiu Yang, Mathieu Desnoyers, Benoit des Ligneris and Michel Dagenais,
+from the department of Computer Engineering at Ecole Polytechnique de
+Montreal, and Frank Rowand, from MontaVista.</p>
+
+<h2>Work recently performed</h2>
+
+<p><b>Lockless per cpu buffers:</b> Tom Zanussi of IBM has implemented per-CPU
+lockless buffering, with low overhead and very fine grained timestamping, and
+has updated the kernel patch and the trace visualizer accordingly, except for
+viewing multiple per-CPU traces simultaneously. </p>
+
+<p><b>RelayFS:</b> Tom Zanussi has implemented RelayFS, a separate, simple
+and efficient component for moving data between the kernel and user space
+applications. This component is reusable by other projects (printk, evlog,
+lustre...) and removes a sizeable chunk from the current LTT, making each
+piece (relayfs and relayfs-based LTT) simpler, more modular and possibly
+more palatable for inclusion in the standard Linux kernel. Besides LTT on
+RelayFS, he has implemented printk over RelayFS with an automatically
+resizeable printk buffer. </p>
+
+<p><b>New trace format:</b> Karim Yaghmour and Michel Dagenais, with input
+from several LTT contributors, have designed a new trace format to accommodate
+per buffer tracefiles and dynamically defined event types. The new format
+includes both the binary trace format and the event type description format.
+XiangXiu Yang has developed a simple parser for the event type description
+format. This parser is used to generate the tracing macros in the kernel
+(genevent) and to support reading tracefiles in the trace reading library
+(libltt). </p>
+
+<h2>Ongoing work</h2>
+
+<p><b>Libltt:</b> XiangXiu Yang is finishing up an event reading library
+and API which parses event descriptions and accordingly reads traces and
+decodes events. </p>
+
+<p><b>lttv:</b> XiangXiu Yang, Mathieu Desnoyers and Michel Dagenais are
+remodeling the trace visualizer to use the new trace format and libltt API,
+and to allow compiled and scripted plugins, which can dynamically
+add new custom trace analysis functions. </p>
+
+<h2>Planned work</h2>
+
+<p>LTT already interfaces with Dynamic Probes. This feature will need to
+be updated for the new LTT version. </p>
+
+<p>The Kernel Crash Dump utilities are another very interesting complementary
+ project. Interfacing them with RelayFS will help implement useful
+flight-recorder like tracing for post-mortem analysis. </p>
+
+<p>User level tracing is available in the current LTT version but requires
+one system call per event. With the new RelayFS based infrastructure, it
+would be interesting to use a shared memory buffer directly accessible from
+user space. Having one RelayFS channel per user would allow an extremely
+efficient, yet secure, user level tracing mechanism. </p>
+
+<p>Sending important events (process creation, event types/facilities
+definitions) to a separate channel could be used to browse traces
+interactively more efficiently. Only this concise trace of important
+events would need to be processed in its entirety, other larger
+gigabyte size traces could be used in random access without requiring
+a first preprocessing pass. A separate channel would also be required
+in case of incomplete traces such as when tracing to a circular buffer
+in "flight recorder" mode; the important events would all be kept
+while only the last buffers of ordinary events would be kept. </p>
+
+<p>Once the visualizer is able to read and display several traces, it
+ will be interesting to produce side by side synchronized views
+ (events from two interacting machines A and B one above the other)
+ or even merged views (combined events from several CPUs in a single
+ merged graph). Time differences between interacting systems will
+ need to be estimated and somewhat compensated for. </p>
+
+<p>LTT currently writes a <i>proc</i> file at trace start time. This
+ file only contains minimal information about processes and
+ interrupts names. More information would be desirable for several
+ applications (process maps, opened descriptors, content of buffer
+ cache). Furthermore, this information may be more conveniently
+ gathered from within the kernel and simply written to the trace as
+ events at start time. </p>
+
+<h2>New features already implemented since LTT 0.9.5</h2>
+
+<ol>
+ <li> Per-CPU Buffering scheme. </li>
+ <li> Logging without locking. </li>
+ <li> Minimal latency - minimal or no serialisation. (<i>Lockless tracing
+using read_cycle_counter instead of gettimeofday.</i>) </li>
+ <li> Fine granularity time stamping - min=o(CPU cycle time),
+max=.05 Gb Ethernet interrupt rate. (<i>Cycle counter being used</i>). </li>
+ <li> Random access to trace event stream. (<i>Random access reading
+of events in the trace is already available in LibLTT. However, one first
+pass is required through the trace to find all the process creation events;
+the cost of this first pass may be reduced in the future if process creation
+ events are sent to a separate much smaller trace</i>.) </li>
+
+</ol>
+
+<h2>Features being worked on</h2>
+
+<ol>
+ <li> Simple wrapper macros for trace instrumentation. (<i>GenEvent</i>)
+ </li>
+ <li> Easily expandable with new trace types. (<i>GenEvent</i>) </li>
+ <li> Multiple buffering schemes - switchable globally or selectable
+by trace client. (<i>Will be simpler to obtain with RelayFS</i>.) </li>
+ <li> Global buffer scheme. (<i>Will be simpler to obtain with RelayFS</i>.)
+ </li>
+ <li> Per-process buffer scheme. (<i>Will be simpler to obtain with RelayFS.</i>)
+ </li>
+ <li> Per-NGPT thread buffer scheme. (<i>Will be simpler to obtain with
+ RelayFS</i>.) </li>
+ <li> Per-component buffer scheme. (<i>Will be simpler to obtain with
+RelayFS</i>.) </li>
+ <li> A set of extensible and modular performance analysis post-processing
+programs. (<i>Lttv</i>) </li>
+ <li> Filtering and selection mechanisms within formatting utility. (<i>Lttv</i>)
+ </li>
+ <li> Variable size event records. (<i>GenEvent, LibEvent, Lttv</i>)
+ </li>
+ <li> Data reduction facilities able to logically combine traces from
+ more than one system. (<i>LibEvent, Lttv</i>) </li>
+ <li> Data presentation utilities to be able to present data from multiple
+ trace instances in a logically combined form (<i>LibEvent, Lttv</i>)
+ </li>
+ <li> Major/minor code means of identification/registration/assignment.
+ (<i>GenEvent</i>) </li>
+ <li> A flexible formatting mechanism that will cater for structures
+and arrays of structures with recursion. (<i>GenEvent</i>) </li>
+
+</ol>
+
+<h2>Features already planned for</h2>
+
+<ol>
+ <li> Init-time tracing. (<i>To be part of RelayFS</i>.) </li>
+ <li>Updated interface for Dynamic Probes. (<i>As soon as things stabilize.</i>)
+ </li>
+ <li> Support "flight recorder" always on tracing with minimal resource
+consumption. (<i>To be part of RelayFS and interfaced to the Kernel crash
+dump facilities.)</i> </li>
+ <li> Fine grained dynamic trace instrumentation for kernel space and
+user subsystems. (<i>Dynamic Probes, more efficient user level tracing.</i>)</li>
+ <li>System information logged at trace start. (<i>New special events
+to add</i>.)</li>
+ <li>Collection of process memory map information at trace start/restart
+ and updates of that information at fork/exec/exit. This allows address-to-name
+ resolution for user space. </li>
+ <li>Include the facility to write system snapshots (total memory layout
+ for kernel, drivers, and all processes) to a file. This is required for
+ trace post-processing on a system other than the one producing the trace.
+ Perhaps some of this is already implemented in the Kernel Crash Dump.</li>
+ <li>Even more efficient tracing from user space.</li>
+ <li>Better integration with tools to define static trace hooks.</li>
+ <li> Better integration with tools to dynamically activate tracing statements.</li>
+
+</ol>
+
+<h2>Features not currently planned</h2>
+
+<ol>
+ <li>POSIX Tracing API compliance. </li>
+  <li>Ability to do function entry/exit tracing. (<i>Probably
+  a totally orthogonal mechanism using either Dynamic Probes hooks or static
+  code instrumentation using the suitable GCC options for basic block instrumentation.</i>)</li>
+ <li>Processor performance counter (which most modern CPUs have) sampling
+and recording. (<i>These counters can be read and their value sent in traced
+events. Some support to collect these automatically at specific state change
+times and to visualize the results would be nice.)</i></li>
+  <li>Suspend &amp; Resume capability. (<i>Why not simply stop the
+  trace and start a new one later? Otherwise, important information like process
+creations while suspended must be obtained in some other way.</i>)</li>
+ <li>Per-packet send/receive event. (<i>New event types will be easily
+added as needed.)</i></li>
+
+</ol>
+ <br>
+ <br>
+
+</body>
+</html>
+
+
+
--- /dev/null
+<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html>
+<head>
+ <title>Linux Trace Toolkit User tools</title>
+</head>
+ <body>
+
+<h1>Linux Trace Toolkit User tools</h1>
+
+<P>The Linux Trace Toolkit Visualizer, lttv, is a modular and extensible
+tool to read, analyze, annotate and display traces. It accesses traces through
+the libltt API and produces either textual output or graphical output using
+the GTK library. This document describes the architecture of lttv for
+developers.
+
+<P>Lttv is a small executable which links to the trace reading API, libltt,
+and to the glib and gobject base libraries.
+By itself it contains just enough code to
+convert a trace to a textual format and to load modules.
+The public
+functions defined in the main program are available to all modules.
+A number of
+<I>text</I> modules may be dynamically loaded to extend the capabilities of
+lttv, for instance to compute and print various statistics.
+
+<P>A more elaborate module, traceView, dynamically links to the GTK library
+and to a support library, libgtklttv. When loaded, it displays graphical
+windows in which one or more viewers in subwindows may be used to browse
+details of events in traces. A number of other graphical modules may be
+dynamically loaded to offer a choice of different viewers (e.g., process,
+CPU or block devices state versus time).
+
+<H2>Main program: main.c</H2>
+
+<P>The main program parses the command line options, loads the requested
+modules and executes the hooks registered in the global attributes
+(/hooks/main/before, /hooks/main/core, /hooks/main/after).
+
+<H3>Hooks for callbacks: hook.h (hook.c)</H3>
+
+<P>In a modular extensible application, each module registers callbacks to
+ensure that it gets called at appropriate times (e.g., after command line
+options processing, at each event to compute statistics...). Hooks and lists
+of hooks are defined for this purpose and are normally stored in the global
+attributes under /hooks/*.
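+
+<P>As a rough illustration, a hook is a callback paired with its own data,
+and hook lists are simply called in order; the type and function names below
+are a sketch, not the exact hook.h API.
+
+<PRE><TT>
+#include <glib.h>
+
+typedef gboolean (*sketch_hook)(gpointer hook_data, gpointer call_data);
+
+struct sketch_hook_entry {
+    sketch_hook f;
+    gpointer hook_data;
+};
+
+/* Calls every hook in the list with its registered hook_data and the
+   caller supplied call_data (for example an event context). */
+static void sketch_hooks_call(GArray *hooks, gpointer call_data)
+{
+    guint i;
+
+    for (i = 0; i < hooks->len; i++) {
+        struct sketch_hook_entry *e =
+            &g_array_index(hooks, struct sketch_hook_entry, i);
+        e->f(e->hook_data, call_data);
+    }
+}
+</TT></PRE>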
+
+<H3>Browsable data structures: iattribute.h (iattribute.c)</H3>
+
+<P>In several places, functions should operate on data structures for which the
+list of members is extensible. For example, the statistics printing
+module should not be
+modified each time new statistics are added by other modules.
+For this purpose, a gobject interface is defined in iattribute.h to
+enumerate and access members in a data structure. Even if new modules
+define custom data structures for efficiently storing statistics while they
+are being computed, they will be generically accessible for the printing
+routine as long as they implement the iattribute interface.
+
+<H3>Extensible data structures: attribute.h (attribute.c)</H3>
+
+<P>To allow each module to add its needed members to important data structures,
+for instance new statistics for processes, the LttvAttributes type is
+a container for named typed values. Each attribute has a textual key (name)
+and an associated typed value.
+It is similar to a C data structure except that the
+number and type of the members can change dynamically. It may be accessed
+either directly or through the iattribute interface.
+
+<P>Some members may be LttvAttributes objects, thus forming a tree of
+attributes, not unlike hierarchical file systems or registries. This is used
+for the global attributes, used to exchange information between modules.
+Attributes are also attached to trace sets, traces and contexts to allow
+storing arbitrary attributes.
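+
+<P>As an illustration of such a tree of attributes, the following sketch
+creates or finds a node from a '/' separated path like "/hooks/main/before";
+the types and functions are illustrative, not the actual attribute.h API.
+
+<PRE><TT>
+#include <glib.h>
+
+struct sketch_attributes {
+    GHashTable *children;    /* maps a name to a child sketch_attributes */
+};
+
+/* Walks the attribute tree along the '/' separated path, creating missing
+   intermediate nodes, and returns the final node. */
+static struct sketch_attributes *
+sketch_attributes_find(struct sketch_attributes *root, const gchar *path)
+{
+    gchar **names = g_strsplit(path, "/", -1);
+    struct sketch_attributes *node = root;
+    gint i;
+
+    for (i = 0; names[i] != NULL; i++) {
+        struct sketch_attributes *child;
+
+        if (names[i][0] == '\0')
+            continue;                   /* skip the empty leading component */
+        child = g_hash_table_lookup(node->children, names[i]);
+        if (child == NULL) {
+            child = g_new0(struct sketch_attributes, 1);
+            child->children = g_hash_table_new(g_str_hash, g_str_equal);
+            g_hash_table_insert(node->children, g_strdup(names[i]), child);
+        }
+        node = child;
+    }
+    g_strfreev(names);
+    return node;
+}
+</TT></PRE>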
+
+<H3>Modules: module.h (module.c)</H3>
+
+<P>The benefit of modules is to avoid recompiling the whole application when
+adding new functionality. It also helps ensure that only the needed code
+is loaded in memory.
+
+<P>Modules are loaded explicitly, being on the list of default modules or
+requested by a command line option, with g_module_open. The functions in
+the module are not directly accessible.
+Indeed, direct, compiled in, references to their functions would be dangerous
+since they would exist even before (if ever) the module is loaded.
+Each module contains a function named <i>init</i>. Its handle is obtained by
+the main program using g_module_symbol and is called.
+The <i>init</i> function of the module
+then calls everything it needs from the main program or from libraries,
+typically registering callbacks in hooks lists stored in the global attributes.
+No module function other than <i>init</i> is
+directly called. Modules cannot see the functions from other modules since
+they may or may not be loaded at the same time.
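+
+<P>The loading mechanism described above can be sketched as follows; error
+handling and the arguments passed to <i>init</i> are simplified, so this is
+illustrative rather than the actual module.c code.
+
+<PRE><TT>
+#include <gmodule.h>
+
+typedef void (*init_function)(void);
+
+static gboolean sketch_load_module(const gchar *path)
+{
+    GModule *module = g_module_open(path, G_MODULE_BIND_LAZY);
+    gpointer init = NULL;
+
+    if (module == NULL)
+        return FALSE;
+    if (!g_module_symbol(module, "init", &init) || init == NULL) {
+        g_module_close(module);
+        return FALSE;
+    }
+    ((init_function)init)();   /* the module registers its hooks here */
+    return TRUE;
+}
+</TT></PRE>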
+
+<P>The modules must see the declarations for the functions
+used, from the main program and from libraries, by including the associated
+.h files. The list of libraries used must be provided as an argument when
+a module is linked. This will ensure that these libraries get loaded
+automatically when that module is loaded.
+
+<P>Libraries contain a number of functions available to modules and to the main
+program. They are loaded automatically at start time if linked by the main
+program or at module load time if linked by that module. Libraries are
+useful to contain functions needed by several modules. Indeed, functions
+used by a single module could be simply part of that module.
+
+<P>A list of loaded modules is maintained. When a module is requested, it
+is first checked whether it is already loaded. A module may request other
+modules at the beginning of its init function. This will ensure that these
+modules get loaded and initialized before the init function of the current
+module proceeds. Circular dependencies are obviously to be avoided, as the
+initialization order among mutually dependent modules will be arbitrary.
+
+<H3>Command line options: option.h (option.c)</H3>
+
+<P>Command line options are added as needed by the main program and by modules
+as they are loaded. Thus, while options are scanned and acted upon (e.g.,
+options to load modules), the
+list of options to recognize continues to grow. The options module registers
+to get called by /hooks/main/before. It offers hooks /hooks/option/before
+and /hooks/option/after which are called just before and just after
+processing the options. Many modules register in their init function to
+be called in /hooks/option/after to verify the options specified and
+register further hooks accordingly.
+
+<H2>Trace Analysis</H2>
+
+<P>The main purpose of the lttv application is to process trace sets,
+calling registered hooks for each event in the traces and maintaining
+a context (system state, accumulated statistics).
+
+<H3>Trace Sets: traceSet.h (traceSet.c)</H3>
+
+<P>Trace sets are defined such that several traces can be analyzed together.
+Traces may be added to and removed from a trace set as needed.
+The main program stores a trace set in /trace_set/default.
+The content of the trace_set is defined by command line options and it is
+used by analysis modules (batch or interactive).
+
+<H3>Trace Set Analysis: processTrace.h (processTrace.c)</H3>
+
+<p>The function <i>lttv_process_trace_set</i> loops over all the events
+in the specified trace set for the specified time interval. <I>Before</I>
+hooks are first
+called for the trace set and for each trace and tracefile
+(one per cpu plus control tracefiles) in the trace set.
+Then hooks are called for
+each event in sorted time order. Finally, <i>after</i> hooks are called
+for the trace set and for each trace and tracefile in it.
+
+<P>To call all the event hooks in sorted time order, a priority queue
+(or sorted tree) is used. The first event from each tracefile is read and its
+time used as key in the sorted tree. The event with the lowest key is removed
+from the tree, the next event from that tracefile is read and reinserted in
+the tree.
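+
+<p>A self-contained sketch of this merge follows; each "tracefile" is reduced
+to an array of event timestamps, and events are delivered in globally sorted
+order by repeatedly taking the smallest next timestamp. A real implementation
+would use a priority queue or sorted tree rather than this linear scan, and
+the types below are illustrative only.
+
+<PRE><TT>
+#include <stdio.h>
+#include <stdint.h>
+#include <stddef.h>
+
+struct tracefile {           /* illustrative stand-in for a real tracefile */
+    const uint64_t *times;   /* timestamps of the events in this tracefile */
+    size_t count;
+    size_t next;             /* index of the next unread event */
+};
+
+static void process_in_time_order(struct tracefile *tf, size_t n)
+{
+    for (;;) {
+        struct tracefile *earliest = NULL;
+        size_t i;
+
+        for (i = 0; i < n; i++)
+            if (tf[i].next < tf[i].count &&
+                (earliest == NULL ||
+                 tf[i].times[tf[i].next] < earliest->times[earliest->next]))
+                earliest = &tf[i];
+        if (earliest == NULL)
+            break;                       /* all tracefiles are exhausted */
+        /* The event hooks would be called here with the proper context. */
+        printf("event at %llu\n",
+               (unsigned long long)earliest->times[earliest->next]);
+        earliest->next++;
+    }
+}
+</TT></PRE>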
+
+<p>Each hook is called with a LttvContext gobject as call data. The LttvContext
+object for the trace set before/after hooks is provided in the call to
+lttv_process_trace_set. Shallow copies of this context are made for each
+trace in the trace set for the trace before/after hooks. Again, shallow
+copies of each trace context are made for each tracefile in a trace.
+The context for each tracefile is used both for the tracefile before/after
+hooks and when calling the hooks for the contained events.
+
+<p>The lttv_process_trace_set function sets the fields in the
+context appropriately before calling a hook. For example, when calling an
+event hook, the context contains:
+
+<DL>
+<DT>trace_set_context<DD> context for the trace set.
+<DT>trace_context<DD> context for the trace.
+<DT>ts<DD> trace set.
+<DT>t<DD> trace.
+<DT>tf<DD> tracefile.
+<DT>e<DD> event.
+</DL>
+
+<P>The cost of providing all this information in the context is relatively
+low. When calling a hook from one event to the next, in the same tracefile,
+only the event field needs to be changed.
+The contexts used when processing traces are key to extensibility and
+performance. New modules may need additional data members in the context to
+store intermediate results. For this purpose, it is possible to derive
+subtypes of LttvContext in order to add new data members.
+
+
+<H3>Reconstructing the system state from the trace: state.h (state.c)</H3>
+
+<P>The events in a trace often represent state transitions in the traced
+system. When the trace is processed, and events accessed in time sorted
+order, it is thus possible to reconstruct in part the state of the
+traced system: state of each CPU, process, disk queue. The state of each
+process may contain detailed information such as opened file descriptors
+and memory map if needed by the analysis and if sufficient information is
+available in the trace. This incrementally updated state information may be
+used to display state graphs, or simply to compute state dependent
+statistics (time spent in user or system mode, waiting for a file...).
+
+<P>
+When tracing starts, at T0, no state is available. The OS state may be
+obtained through "initial state" events which enumerate the important OS data
+structures. Unless the state is obtained atomically, other events
+describing state changes may be interleaved in the trace and must be
+processed in the correct order. Once all the special initial state
+events are obtained, at Ts, the complete state is available. From there the
+system state can be deduced incrementally from the events in the trace.
+
+<P>
+Analysis tools must be prepared for missing state information. In some cases
+only a subset of events is traced, in others the trace may be truncated
+in <i>flight recorder</i> mode.
+
+<P>
+In interactive processing, the interval for which processing is required
+varies. After scrolling a viewer, the events in the new interval to display
+need to be processed in order to redraw the view. To avoid restarting
+the processing at the trace start to reconstruct incrementally the system
+state, the computed state may be memorized at regular intervals, for example
+every 100 000 events, in a time indexed database associated with a trace.
+To conserve space, it may be possible in some cases to only store state
+differences.
+
+<p>To process a specific time interval, the state at the beginning of the
+interval would be obtained by copying the last preceding saved state
+and processing the events since then to update the state.
+
+<p>A new subtype of LttvContext, LttvStateContext, is defined to add storage
+for the state information. It defines a trace set state as a set of trace
+states. The trace state is composed of processes, CPUs and block devices.
+Each CPU has a currently executing process, and each process state keeps
+track of the interrupt stack frames (faults, interrupts,
+system calls), executable file name and other information such as opened
+file descriptors. Each frame stores the process status, entry time
+and last status change time.
+
+<p>File state.c provides state updating hooks to be called when the trace is
+processed. When a scheduling change event is delivered to the hook, for
+instance, the current process for the CPU is changed and the state of the
+incoming and outgoing processes is changed.
+The state updating hooks are stored in the global attributes under
+/hooks/state/core/trace_set/before, after,
+/hooks/state/core/trace/before, after...
+to be used by processing functions requiring state updating (batch and
+interactive analysis, computing the state at time T by updating a preceding
+saved state...).
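+
+<p>As a simplified illustration of such a state updating hook, the sketch
+below handles a scheduling change; the structures, field names and event
+layout are hypothetical, not the actual state.h types.
+
+<PRE><TT>
+#include <glib.h>
+
+struct sketch_process_state {
+    guint pid;
+    enum { RUNNING, READY, BLOCKED } status;
+    guint64 last_change;                  /* time of the last status change */
+};
+
+struct sketch_cpu_state {
+    struct sketch_process_state *current; /* currently executing process */
+};
+
+struct sketch_sched_change {
+    guint64 time;
+    struct sketch_process_state *incoming;
+};
+
+/* hook_data points to the CPU state, call_data to the decoded event. */
+static gboolean sched_change_hook(gpointer hook_data, gpointer call_data)
+{
+    struct sketch_cpu_state *cpu = hook_data;
+    struct sketch_sched_change *e = call_data;
+
+    if (cpu->current != NULL) {           /* outgoing process stops running */
+        cpu->current->status = READY;
+        cpu->current->last_change = e->time;
+    }
+    e->incoming->status = RUNNING;        /* incoming process now runs */
+    e->incoming->last_change = e->time;
+    cpu->current = e->incoming;
+    return FALSE;                         /* return value unused in this sketch */
+}
+</TT></PRE>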
+
+<H3>Computing Statistics: stats.h (stats.c)</H3>
+
+<p>This file defines a subtype of LttvStateContext, LttvStatsContext,
+to store statistics on various aspects of a trace set. The LttvTraceSetStats
+structure contains a set of LttvTraceStats structures. Each such structure
+contains structures for CPUs, processes, interrupt types (IRQ, system call,
+fault), subtypes (individual system calls, IRQs or faults) and
+block devices. The CPUs also contain structures for processes, interrupt types,
+subtypes and block devices. Process structures similarly contain
+structures for interrupt types, subtypes and block devices. At each level
+(trace set, trace, cpu, process, interrupt stack frames)
+attributes are used to store statistics.
+
+<p>File stats.c provides statistics computing hooks to be called when the
+trace is processed. For example, when a <i>write</i> event is processed,
+the attribute <i>BytesWritten</i> in the corresponding system, cpu, process,
+interrupt type (e.g. system call) and subtype (e.g. write) is incremented
+by the number of bytes stored in the event. When the processing is finished,
+perhaps in the after hooks, the number of bytes written and other statistics
+may be summed over all CPUs for a given process, over all processes for a
+given CPU or over all traces.
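+
+<p>A minimal sketch of such a statistics hook follows; the attribute
+container and helper are hypothetical, standing in for the iattribute based
+storage described above.
+
+<PRE><TT>
+#include <glib.h>
+
+struct sketch_stats { GHashTable *attributes; };  /* key -> guint64 counter */
+
+static void add_to_counter(struct sketch_stats *s, const char *key, guint64 n)
+{
+    guint64 *value = g_hash_table_lookup(s->attributes, key);
+
+    if (value == NULL) {
+        value = g_new0(guint64, 1);
+        g_hash_table_insert(s->attributes, g_strdup(key), value);
+    }
+    *value += n;
+}
+
+/* Called for each write event; the three levels come from the context. */
+static void write_event_hook(struct sketch_stats *trace,
+                             struct sketch_stats *cpu,
+                             struct sketch_stats *process, guint64 byte_count)
+{
+    add_to_counter(trace,   "BytesWritten", byte_count);
+    add_to_counter(cpu,     "BytesWritten", byte_count);
+    add_to_counter(process, "BytesWritten", byte_count);
+}
+</TT></PRE>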
+
+<p>The basic set of statistics computed by stats.c includes, for the whole
+ trace set:
+
+<UL>
+<LI>Trace start time, end time and duration.
+<LI>Total number of events.
+<LI>Number of each event type (Interrupts, faults, system calls...)
+<LI>For each interrupt type and each subtype, the number of each event type.
+<LI>For each system:
+ <UL>
+ <LI>Total number of events.
+ <LI>Number of each event type (Interrupts, faults, system calls...)
+ <LI>For each interrupt type and each subtype, the number of each event type.
+ <LI>For each CPU:
+ <UL>
+ <LI> CPU id
+ <LI> User/System time
+ <LI> Number of each event type
+ <LI> For each interrupt type and each subtype,
+ the number of each event type.
+ </UL>
+ <LI>For each block device:
+ <UL>
+ <LI> block device name
+ <LI> time busy/idle, average queue length
+ <LI> Number of each relevant event type (requests added, merged, served)
+ </UL>
+ <LI>For each process:
+ <UL>
+ <LI> Exec'ed file names.
+ <LI> Start and end time, User/System time
+ <LI> Number of each event type
+ <LI> For each interrupt type and each subtype,
+ the number of each event type.
+ </UL>
+ </UL>
+</UL>
+
+<P>The structure to store statistics differs from the state storage structure
+in several ways. Statistics are maintained in different ways (per CPU all
+processes, per process all CPUs, per process on a given CPU...). Furthermore,
+statistics are maintained for all processes which existed during the trace
+while the state at time T only stores information about current processes.
+
+<P>The hooks defined by stats.c are stored in the global attributes under
+/hooks/stats/core/trace_set/before, after,
+/hooks/stats/core/trace/before, after to be used by processing functions
+interested in statistics.
+
+<H3>Filtering events: filter.h (filter.c)</H3>
+
+<P>
+Filters are used to select which events in a trace are shown in a viewer or
+used in a computation. The filtering rules are based on the values of
+event fields. The filter module receives a filter expression and computes
+a compiled filter. The compiled filter then serves as hook data for
+<i>check</i> event
+filter hooks which, given a context containing an event,
+return TRUE or FALSE to
+indicate if the event satisfies the filter. Trace and tracefile <i>check</i>
+filter hooks
+may be used to determine if a system and CPU satisfy the filter. Finally,
+the filter module has a function to return the time bounds, if any, imposed
+by a filter.
+
+<P>For some applications, the hooks provided by the filter module may not
+be sufficient, since they are based on simple boolean combinations
+of comparisons between fields and constants. In that case, custom code may be
+used for <i>check</i> hooks during the processing. An example of complex
+filtering could be to only show events belonging to processes which consumed
+more than 10% of the CPU in the last 10 seconds.
+
+<p>In module filter.c, filters are specified using textual expressions
+with AND, OR, NOT operations on
+nested subexpressions. Primitive expressions compare an event field to
+a constant. In the graphical user interface, a filter editor is provided.
+
+<PRE><TT>
+tokens: ( ! && || == <= >= > < != name [ ] int float string )
+
+expression = ( expression ) OR ! expression OR
+ expression && expression OR expression || expression OR
+ simple_expression
+
+simple_expression = field_selector OP value
+
+value = int OR float OR string OR enum
+
+field_selector = component OR component . field_selector
+
+component = name OR name [ int ]
+</TT></PRE>
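+
+<p>As an illustration (the available field names depend on the loaded event
+descriptions, so those used here are hypothetical), a filter selecting write
+system calls of process 1234 on CPU 0 could look like:
+
+<PRE><TT>
+( event.facility == "core" ) && ( event.name == "syscall_write" ) &&
+( process.pid == 1234 ) && ( cpu.id == 0 )
+</TT></PRE>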
+
+
+<H3>Batch Analysis: batchAnalysis.h (batchAnalysis.c)</H3>
+
+<p>This module registers to be called by the main program (/hooks/main/core).
+When called, it gets the current trace set (/trace_set/default),
+the state updating hooks (/hooks/state/*), the statistics hooks
+(/hooks/stats/*) and other analysis hooks (/hooks/batch/*),
+and runs lttv_process_trace_set for the entire
+trace set time interval. This simple processing of the complete trace set
+is normally sufficient for batch operations such as converting a trace to
+text and computing various statistics.
+
+
+<H3>Text output for events and statistics: textDump.h (textDump.c)</H3>
+
+<P>
+This module registers hooks (/hooks/batch)
+to print a textual representation of each event
+(event hooks) and to print the content of the statistics accumulated in the
+context (after trace set hook).
+
+<H2>Trace Set Viewers</H2>
+
+<p>
+A library, libgtklttv, is defined to provide utility functions for
+the second set of modules, which compose the interactive graphical user
+interface. It offers functions to create and interact with top level trace
+viewing windows, and to insert specialized embedded viewer modules.
+The libgtklttv library requires the gtk library.
+The viewer modules include a detailed event list, eventsTableView,
+a process state graph, processStateView, and a CPU state graph, cpuStateView.
+
+<p>
+The top level gtkTraceSet window, defined in libgtklttv,
+has the usual FILE, EDIT... menus and a toolbar.
+It has an associated trace set (and filter) and contains several tabs, each
+containing several vertically stacked time synchronized trace set viewers.
+It manages the space allocated to each contained viewer, the menu items and
+tools registered by each contained viewer and the current time and current
+time interval.
+
+<P>
+When viewers change the current time or time interval, the gtkTraceSet
+window notifies all contained viewers. When one or more viewers need
+redrawing, the gtkTraceSet window calls the lttv_process_trace_set
+function for the needed time interval, after computing the system state
+for the interval start time. While events are processed, drawing hooks
+from the viewers are called.
+
+<P>
+TO COMPLETE; description and motivation for the gtkTraceSet widget structure
+and interaction with viewers. Description and motivation for the detailed
+event view and process state view.
+
+</BODY>
+</HTML>