| 1 | <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> |
| 2 | <html> |
| 3 | <head> |
| 4 | <title>Linux Trace Toolkit Status</title> |
| 5 | </head> |
| 6 | <body> |
| 7 | |
| 8 | <h1>Linux Trace Toolkit Status</h1> |
| 9 | |
| 10 | <p><i>Last updated July 1, 2003.</i> </p> |
| 11 | |
| 12 | <p>During the 2002 Ottawa Linux Symposium tracing BOF, a list of desirable |
| 13 | features for LTT was collected by Richard Moore. Since then, a lot of infrastructure |
| 14 | work on LTT has been taking place. This status report aims to track current |
| 15 | development efforts and the current status of the various features. This |
| 16 | status page is most certainly incomplete; please send |
| 17 | any additions and corrections to Michel Dagenais (michel.dagenais at polymtl.ca).</p> |
| 18 | |
| 19 | <p>As of this writing, the most active LTT contributors include Karim Yaghmour, |
| 20 | author and maintainer from opersys.com, Tom Zanussi, Robert Wisniewski, |
| 21 | Richard J Moore and others from IBM, mainly at the Linux Technology Center, |
| 22 | XiangXiu Yang, Mathieu Desnoyers, Benoit des Ligneris and Michel Dagenais, |
| 23 | from the department of Computer Engineering at Ecole Polytechnique de |
| 24 | Montreal, and Frank Rowand, from MontaVista.</p> |
| 25 | |
| 26 | <h2>Work recently performed</h2> |
| 27 | |
| 28 | <p><b>Lockless per-CPU buffers:</b> Tom Zanussi of IBM has implemented per-CPU lockless buffering with |
| 29 | low-overhead, very fine-grained timestamping, and has updated the kernel patch and |
| 30 | the trace visualizer accordingly. The visualizer cannot yet display multiple per-CPU |
| 31 | traces simultaneously. </p> |
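<p>As an illustration only, lockless logging into a per-CPU buffer can be sketched as a compare-and-swap reservation of space followed by a plain copy. This is a generic C11 sketch, not the algorithm used in the LTT kernel patch; the names <code>cpu_buffer</code>, <code>reserve_slot</code>, <code>log_event</code> and the buffer size are assumptions made for the example.</p>

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE 4096u /* hypothetical per-CPU buffer size in bytes */

/* One buffer per CPU (hypothetical layout, not the LTT one). */
struct cpu_buffer {
    _Atomic size_t write_offset;   /* offset of the next free byte */
    unsigned char data[BUF_SIZE];
};

void cpu_buffer_init(struct cpu_buffer *buf)
{
    assert(buf != NULL);
    atomic_init(&buf->write_offset, 0);
}

/* Reserve len bytes without taking a lock.
 * Returns the start offset of the reserved space, or (size_t)-1 if full. */
size_t reserve_slot(struct cpu_buffer *buf, size_t len)
{
    size_t old = atomic_load(&buf->write_offset);
    for (;;) {
        if (len > BUF_SIZE - old)
            return (size_t)-1;     /* not enough room left */
        /* On failure, old is reloaded with the current offset and we retry. */
        if (atomic_compare_exchange_weak(&buf->write_offset, &old, old + len))
            return old;
    }
}

/* Write a (timestamp, event id) record into the reserved space.
 * Returns 0 on success, -1 if the buffer is full. */
int log_event(struct cpu_buffer *buf, uint64_t timestamp, uint16_t event_id)
{
    size_t len = sizeof timestamp + sizeof event_id;
    size_t off = reserve_slot(buf, len);
    if (off == (size_t)-1)
        return -1;
    memcpy(buf->data + off, &timestamp, sizeof timestamp);
    memcpy(buf->data + off + sizeof timestamp, &event_id, sizeof event_id);
    return 0;
}
```

<p>Because each writer owns the bytes it reserved, the copy itself needs no synchronization; only the offset update is atomic.</p>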
| 32 | |
| 33 | <p><b>RelayFS:</b> Tom Zanussi has implemented RelayFS, a separate, simple |
| 34 | and efficient component for moving data between the kernel and user space |
| 35 | applications. This component is reusable by other projects (printk, evlog, |
| 36 | lustre...) and removes a sizeable chunk from the current LTT, making each |
| 37 | piece (relayfs and relayfs-based LTT) simpler, more modular and possibly |
| 38 | more palatable for inclusion in the standard Linux kernel. Besides LTT on |
| 39 | RelayFS, he has implemented printk over RelayFS with an automatically |
| 40 | resizable printk buffer. </p> |
| 41 | |
| 42 | <p><b>New trace format:</b> Karim Yaghmour and Michel Dagenais, with input |
| 43 | from several LTT contributors, have designed a new trace format to accommodate |
| 44 | per-buffer tracefiles and dynamically defined event types. The new format |
| 45 | includes both the binary trace format and the event type description format. |
| 46 | XiangXiu Yang has developed a simple parser for the event type description |
| 47 | format. This parser is used to generate the tracing macros in the kernel |
| 48 | (genevent) and to support reading tracefiles in the trace reading library |
| 49 | (libltt).</p> |
| 50 | |
| 51 | <h2>Ongoing work</h2> |
| 52 | |
| 53 | <p><b>Libltt:</b> XiangXiu Yang is finishing up an event reading library |
| 54 | and API that parses event descriptions and uses them to read traces and |
| 55 | decode events. </p> |
| 56 | |
| 57 | <p><b>lttv:</b> XiangXiu Yang, Mathieu Desnoyers and Michel Dagenais are |
| 58 | remodeling the trace visualizer to use the new trace format and libltt API, |
| 59 | and to allow compiled and scripted plugins, which can dynamically |
| 60 | add new custom trace analysis functions. </p> |
| 61 | |
| 62 | <h2>Planned work</h2> |
| 63 | |
| 64 | <p>LTT already interfaces with Dynamic Probes. This feature will need to |
| 65 | be updated for the new LTT version. </p> |
| 66 | |
| 67 | <p>The Kernel Crash Dump utilities are another very interesting complementary |
| 68 | project. Interfacing them with RelayFS will help implement useful |
| 69 | flight-recorder-like tracing for post-mortem analysis. </p> |
| 70 | |
| 71 | <p>User level tracing is available in the current LTT version but requires |
| 72 | one system call per event. With the new RelayFS-based infrastructure, it |
| 73 | would be interesting to use a shared memory buffer directly accessible from |
| 74 | user space. Having one RelayFS channel per user would allow an extremely |
| 75 | efficient, yet secure, user level tracing mechanism. </p> |
| 76 | |
| 77 | <p>Sending important events (process creation, event types/facilities |
| 78 | definitions) to a separate channel would make interactive trace browsing |
| 79 | more efficient. Only this concise trace of important |
| 80 | events would need to be processed in its entirety; the other, larger |
| 81 | gigabyte-size traces could be read in random access without requiring |
| 82 | a first preprocessing pass. A separate channel would also be required |
| 83 | for incomplete traces, such as when tracing to a circular buffer |
| 84 | in "flight recorder" mode: the important events would all be kept, |
| 85 | while only the last buffers of ordinary events would be kept. </p> |
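<p>The "flight recorder" behaviour for ordinary events can be pictured as a ring that overwrites its oldest entry once full. The following C sketch is illustrative only: the type and function names and the tiny capacity are assumptions, and the separate channel for important events is not shown.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SLOTS 4u /* hypothetical capacity, kept tiny for illustration */

struct event {
    uint64_t timestamp;
    uint16_t id;
};

/* Circular buffer of ordinary events: new events overwrite the oldest. */
struct flight_ring {
    struct event slots[RING_SLOTS];
    size_t next;    /* index of the slot that will be written next */
    size_t count;   /* number of valid events, at most RING_SLOTS */
};

void ring_init(struct flight_ring *r)
{
    assert(r != NULL);
    r->next = 0;
    r->count = 0;
}

/* Record an event, discarding the oldest one if the ring is full. */
void ring_record(struct flight_ring *r, struct event ev)
{
    r->slots[r->next] = ev;
    r->next = (r->next + 1) % RING_SLOTS;
    if (r->count < RING_SLOTS)
        r->count++;
}

/* Copy the surviving events, oldest first, into out (room for RING_SLOTS).
 * Returns the number of events copied. */
size_t ring_snapshot(const struct flight_ring *r, struct event *out)
{
    size_t start = (r->next + RING_SLOTS - r->count) % RING_SLOTS;
    for (size_t i = 0; i < r->count; i++)
        out[i] = r->slots[(start + i) % RING_SLOTS];
    return r->count;
}
```

<p>After six events are recorded into this four-slot ring, a snapshot returns only the last four, which is why events needed to interpret the trace (such as process creations) would have to be kept elsewhere.</p>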
| 86 | |
| 87 | <p>Once the visualizer is able to read and display several traces, it |
| 88 | will be interesting to produce side by side synchronized views |
| 89 | (events from two interacting machines A and B one above the other) |
| 90 | or even merged views (combined events from several CPUs in a single |
| 91 | merged graph). Time differences between interacting systems will |
| 92 | need to be estimated and somewhat compensated for. </p> |
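<p>One well-known way to estimate the time difference between two machines is the round-trip calculation used by NTP-style protocols, which assumes that network delay is roughly the same in both directions. The C sketch below shows that calculation only; it is not a method chosen by the LTT project, and the structure and function names are hypothetical.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Timestamps of one request/reply exchange between machines A and B.
 * a_* values are read on A's clock, b_* values on B's clock. */
struct exchange {
    int64_t a_send;  /* A sends the request */
    int64_t b_recv;  /* B receives the request */
    int64_t b_send;  /* B sends the reply */
    int64_t a_recv;  /* A receives the reply */
};

/* Estimate how far B's clock is ahead of A's clock,
 * assuming symmetric network delay. */
int64_t estimate_offset(struct exchange x)
{
    assert(x.a_recv >= x.a_send);
    return ((x.b_recv - x.a_send) + (x.b_send - x.a_recv)) / 2;
}

/* Convert a timestamp taken on B into A's time base. */
int64_t to_clock_a(int64_t b_timestamp, int64_t offset)
{
    return b_timestamp - offset;
}
```

<p>With the offset applied, events from B can be placed on A's time axis before the two traces are displayed side by side or merged.</p>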
| 93 | |
| 94 | <p>LTT currently writes a <i>proc</i> file at trace start time. This |
| 95 | file contains only minimal information about process and |
| 96 | interrupt names. More information would be desirable for several |
| 97 | applications (process maps, open descriptors, content of buffer |
| 98 | cache). Furthermore, this information may be more conveniently |
| 99 | gathered from within the kernel and simply written to the trace as |
| 100 | events at start time. </p> |
| 101 | |
| 102 | <h2>New features already implemented since LTT 0.9.5</h2> |
| 103 | |
| 104 | <ol> |
| 105 | <li> Per-CPU Buffering scheme. </li> |
| 106 | <li> Logging without locking. </li> |
| 107 | <li> Minimal latency - minimal or no serialisation. (<i>Lockless tracing |
| 108 | using read_cycle_counter instead of gettimeofday.</i>) </li> |
| 109 | <li> Fine granularity time stamping - min=o(CPU cycle time), |
| 110 | max=.05 Gb Ethernet interrupt rate. (<i>Cycle counter being used</i>). </li> |
| 111 | <li> Random access to trace event stream. (<i>Random access reading |
| 112 | of events in the trace is already available in LibLTT. However, one first |
| 113 | pass is required through the trace to find all the process creation events; |
| 114 | the cost of this first pass may be reduced in the future if process creation |
| 115 | events are sent to a separate much smaller trace</i>.) </li> |
| 116 | |
| 117 | </ol> |
| 118 | |
| 119 | <h2>Features being worked on</h2> |
| 120 | |
| 121 | <ol> |
| 122 | <li> Simple wrapper macros for trace instrumentation. (<i>GenEvent</i>) |
| 123 | </li> |
| 124 | <li> Easily expandable with new trace types. (<i>GenEvent</i>) </li> |
| 125 | <li> Multiple buffering schemes - switchable globally or selectable |
| 126 | by trace client. (<i>Will be simpler to obtain with RelayFS</i>.) </li> |
| 127 | <li> Global buffer scheme. (<i>Will be simpler to obtain with RelayFS</i>.) |
| 128 | </li> |
| 129 | <li> Per-process buffer scheme. (<i>Will be simpler to obtain with RelayFS.</i>) |
| 130 | </li> |
| 131 | <li> Per-NGPT thread buffer scheme. (<i>Will be simpler to obtain with |
| 132 | RelayFS</i>.) </li> |
| 133 | <li> Per-component buffer scheme. (<i>Will be simpler to obtain with |
| 134 | RelayFS</i>.) </li> |
| 135 | <li> A set of extensible and modular performance analysis post-processing |
| 136 | programs. (<i>Lttv</i>) </li> |
| 137 | <li> Filtering and selection mechanisms within formatting utility. (<i>Lttv</i>) |
| 138 | </li> |
| 139 | <li> Variable size event records. (<i>GenEvent, LibEvent, Lttv</i>) |
| 140 | </li> |
| 141 | <li> Data reduction facilities able to logically combine traces from |
| 142 | more than one system. (<i>LibEvent, Lttv</i>) </li> |
| 143 | <li> Data presentation utilities to be able to present data from multiple |
| 144 | trace instances in a logically combined form (<i>LibEvent, Lttv</i>) |
| 145 | </li> |
| 146 | <li> Major/minor code means of identification/registration/assignment. |
| 147 | (<i>GenEvent</i>) </li> |
| 148 | <li> A flexible formatting mechanism that will cater for structures |
| 149 | and arrays of structures with recursion. (<i>GenEvent</i>) </li> |
| 150 | |
| 151 | </ol> |
| 152 | |
| 153 | <h2>Features already planned for</h2> |
| 154 | |
| 155 | <ol> |
| 156 | <li> Init-time tracing. (<i>To be part of RelayFS</i>.) </li> |
| 157 | <li>Updated interface for Dynamic Probes. (<i>As soon as things stabilize.</i>) |
| 158 | </li> |
| 159 | <li> Support "flight recorder" always-on tracing with minimal resource |
| 160 | consumption. (<i>To be part of RelayFS and interfaced to the Kernel crash |
| 161 | dump facilities.</i>) </li> |
| 162 | <li> Fine grained dynamic trace instrumentation for kernel space and |
| 163 | user subsystems. (<i>Dynamic Probes, more efficient user level tracing.</i>)</li> |
| 164 | <li>System information logged at trace start. (<i>New special events |
| 165 | to add</i>.)</li> |
| 166 | <li>Collection of process memory map information at trace start/restart |
| 167 | and updates of that information at fork/exec/exit. This allows address-to-name |
| 168 | resolution for user space. </li> |
| 169 | <li>Include the facility to write system snapshots (total memory layout |
| 170 | for kernel, drivers, and all processes) to a file. This is required for |
| 171 | trace post-processing on a system other than the one producing the trace. |
| 172 | Perhaps some of this is already implemented in the Kernel Crash Dump.</li> |
| 173 | <li>Even more efficient tracing from user space.</li> |
| 174 | <li>Better integration with tools to define static trace hooks.</li> |
| 175 | <li> Better integration with tools to dynamically activate tracing statements.</li> |
| 176 | |
| 177 | </ol> |
| 178 | |
| 179 | <h2>Features not currently planned</h2> |
| 180 | |
| 181 | <ol> |
| 182 | <li>POSIX Tracing API compliance. </li> |
| 183 | <li>Function entry/exit tracing facility. (<i>Probably |
| 184 | a totally orthogonal mechanism using either Dynamic Probes hooks or static |
| 185 | code instrumentation using the suitable GCC options for basic block instrumentation.</i>)</li> |
| 186 | <li>Processor performance counter (which most modern CPUs have) sampling |
| 187 | and recording. (<i>These counters can be read and their values sent in traced |
| 188 | events. Some support to collect these automatically at specific state change |
| 189 | times and to visualize the results would be nice.</i>)</li> |
| 190 | <li>Suspend &amp; Resume capability. (<i>Why not simply stop the |
| 191 | trace and start a new one later? Otherwise, important information like process |
| 192 | creations while suspended must be obtained in some other way.</i>)</li> |
| 193 | <li>Per-packet send/receive event. (<i>New event types will be easily |
| 194 | added as needed.</i>)</li> |
| 195 | |
| 196 | </ol> |
| 197 | <br> |
| 198 | <br> |
| 199 | |
| 200 | </body> |
| 201 | </html> |
| 202 | |
| 203 | |
| 204 | |