<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Linux Trace Toolkit Status</title>
</head>
<body>

<h1>Linux Trace Toolkit Status</h1>

<p><i>Last updated July 1, 2003.</i></p>

<p>During the 2002 Ottawa Linux Symposium tracing BOF, Richard Moore
collected a list of desirable features for LTT. Since then, a lot of
infrastructure work on LTT has taken place. This status report aims to track
current development efforts and the current status of the various features.
This status page is most certainly incomplete; please send any additions
and corrections to Michel Dagenais (michel.dagenais at polymtl.ca).</p>

<p>As of this writing, the most active LTT contributors include Karim Yaghmour,
author and maintainer, from opersys.com; Tom Zanussi, Robert Wisniewski,
Richard J. Moore and others from IBM, mainly at the Linux Technology Center;
XiangXiu Yang, Mathieu Desnoyers, Benoit des Ligneris and Michel Dagenais,
from the department of Computer Engineering at Ecole Polytechnique de
Montreal; and Frank Rowand, from MontaVista.</p>

<h2>Work recently performed</h2>

<p><b>Lockless per-CPU buffers:</b> Tom Zanussi of IBM has implemented per-CPU
lockless buffering with low-overhead, very fine-grained timestamping, and has
updated the kernel patch and the trace visualizer accordingly, except for
viewing multiple per-CPU traces simultaneously.</p>

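<p>As an illustration of the idea (a hypothetical sketch, not LTT's actual code): each CPU writes only to its own buffer, so no locks are needed, and space is reserved with a single atomic fetch-and-add. All names below are invented for this example.</p>

```c
#include <stdint.h>

#define BUF_SIZE 4096

/* One buffer per CPU: no locks are needed because each CPU writes only
 * to its own buffer (illustrative sketch, not LTT's actual layout). */
struct cpu_buffer {
    uint8_t  data[BUF_SIZE];
    uint32_t offset;            /* next free byte */
};

/* Reserve len bytes in this CPU's buffer with an atomic fetch-and-add;
 * returns a pointer to the reserved slot, or 0 when the buffer is full
 * (a real tracer would switch to a fresh sub-buffer instead). */
static uint8_t *reserve_slot(struct cpu_buffer *buf, uint32_t len)
{
    uint32_t old = __sync_fetch_and_add(&buf->offset, len);
    if (old + len > BUF_SIZE)
        return (uint8_t *)0;
    return &buf->data[old];
}
```

<p>The atomic reservation is what keeps the fast path lock-free even when events are logged from interrupt context on the same CPU.</p>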
<p><b>RelayFS:</b> Tom Zanussi has implemented RelayFS, a separate, simple
and efficient component for moving data between the kernel and user space
applications. This component is reusable by other projects (printk, evlog,
lustre...) and removes a sizeable chunk from the current LTT, making each
piece (relayfs and relayfs-based LTT) simpler, more modular and possibly
more palatable for inclusion in the standard Linux kernel. Besides LTT on
RelayFS, he has also implemented printk over RelayFS with an automatically
resizeable printk buffer.</p>

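<p>The core mechanism can be pictured as a ring of sub-buffers shared between a kernel-side writer and a user-space reader. The sketch below is illustrative only; its names and policy do not match the actual RelayFS API.</p>

```c
#include <stdint.h>

#define NSUBBUFS    4
#define SUBBUF_SIZE 1024

/* A relay-style channel: the writer fills one sub-buffer at a time;
 * when it is full, the writer switches to the next one and the full
 * sub-buffer becomes readable from user space. */
struct relay_channel {
    uint8_t  bufs[NSUBBUFS][SUBBUF_SIZE];
    unsigned produced;   /* sub-buffers handed to the reader */
    unsigned consumed;   /* sub-buffers the reader has drained */
    unsigned offset;     /* write offset within the current sub-buffer */
};

/* Reserve len bytes, switching sub-buffers on overflow.  Returns the
 * reserved slot, or 0 when the reader has fallen a full ring behind
 * (overwriting instead would give "flight recorder" behaviour). */
static uint8_t *relay_reserve(struct relay_channel *ch, unsigned len)
{
    if (ch->offset + len > SUBBUF_SIZE) {
        if (ch->produced - ch->consumed >= NSUBBUFS - 1)
            return (uint8_t *)0;               /* ring is full */
        ch->produced++;                        /* switch sub-buffer */
        ch->offset = 0;
    }
    uint8_t *slot = &ch->bufs[ch->produced % NSUBBUFS][ch->offset];
    ch->offset += len;
    return slot;
}
```

<p>Whether a full ring drops events or overwrites the oldest sub-buffer is exactly the policy choice that distinguishes normal tracing from flight-recorder mode.</p>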
<p><b>New trace format:</b> Karim Yaghmour and Michel Dagenais, with input
from several LTT contributors, have designed a new trace format to accommodate
per-buffer tracefiles and dynamically defined event types. The new format
includes both the binary trace format and the event type description format.
XiangXiu Yang has developed a simple parser for the event type description
format. This parser is used to generate the tracing macros in the kernel
(genevent) and to support reading tracefiles in the trace reading library
(libltt).</p>

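<p>To make the discussion concrete, a variable-size binary event record might look like the following. The field names and sizes are illustrative assumptions for this sketch, not the actual new LTT format.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-disk layout of one variable-size event record: a
 * fixed header followed by a type-specific payload whose structure is
 * given by the event type description format. */
struct event_header {
    uint32_t timestamp;  /* cycle-counter delta from the buffer start */
    uint16_t event_id;   /* index into the event type descriptions */
    uint16_t size;       /* total record size, header included */
};

/* Step from one record to the next in a raw buffer; returns 0 past
 * the end.  The self-describing size field is what makes walking a
 * stream of variable-size events possible. */
static const struct event_header *
next_event(const uint8_t *buf, size_t buf_len, const struct event_header *ev)
{
    const uint8_t *next = (const uint8_t *)ev + ev->size;
    if (next + sizeof(struct event_header) > buf + buf_len)
        return (const struct event_header *)0;
    return (const struct event_header *)next;
}
```
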
<h2>Ongoing work</h2>

<p><b>Libltt:</b> XiangXiu Yang is finishing up an event reading library
and API which parses event descriptions and uses them to read traces and
decode events.</p>

<p><b>lttv:</b> XiangXiu Yang, Mathieu Desnoyers and Michel Dagenais are
remodeling the trace visualizer to use the new trace format and libltt API,
and to allow compiled and scripted plugins, which can dynamically
add new custom trace analysis functions.</p>

<h2>Planned work</h2>

<p>LTT already interfaces with Dynamic Probes. This feature will need to
be updated for the new LTT version.</p>

<p>The Kernel Crash Dump utilities are another very interesting complementary
project. Interfacing them with RelayFS will help implement useful
flight-recorder-like tracing for post-mortem analysis.</p>

<p>User level tracing is available in the current LTT version but requires
one system call per event. With the new RelayFS-based infrastructure, it
would be interesting to use a shared memory buffer directly accessible from
user space. Having one RelayFS channel per user would allow an extremely
efficient, yet secure, user level tracing mechanism.</p>

<p>Sending important events (process creation, event types/facilities
definitions) to a separate channel would make interactive trace browsing
more efficient. Only this concise trace of important events would need to
be processed in its entirety; the other, larger, gigabyte-size traces could
then be read in random access without requiring a first preprocessing pass.
A separate channel would also be required for incomplete traces, such as
when tracing to a circular buffer in "flight recorder" mode: the important
events would all be kept, while only the last buffers of ordinary events
would be retained.</p>

<p>Once the visualizer is able to read and display several traces, it
will be interesting to produce side-by-side synchronized views
(events from two interacting machines A and B, one above the other)
or even merged views (combined events from several CPUs in a single
merged graph). Time differences between interacting systems will
need to be estimated and somewhat compensated for.</p>

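<p>One standard way to estimate that offset, borrowed from NTP-style synchronization (an assumption here, since no method is specified above): machine A records send time t1, machine B records receive time t2 and reply time t3, and A records receive time t4. With symmetric network delay, B's clock offset relative to A is ((t2 - t1) + (t3 - t4)) / 2.</p>

```c
/* NTP-style round-trip offset estimate (illustrative; not prescribed
 * by the LTT design).  Assumes the one-way network delay is the same
 * in both directions. */
static double clock_offset(double t1, double t2, double t3, double t4)
{
    return ((t2 - t1) + (t3 - t4)) / 2.0;
}
```
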
<p>LTT currently writes a <i>proc</i> file at trace start time. This
file only contains minimal information about processes and
interrupt names. More information would be desirable for several
applications (process maps, open descriptors, content of the buffer
cache). Furthermore, this information may be more conveniently
gathered from within the kernel and simply written to the trace as
events at start time.</p>

<h2>New features already implemented since LTT 0.9.5</h2>

<ol>
<li> Per-CPU buffering scheme. </li>
<li> Logging without locking. </li>
<li> Minimal latency - minimal or no serialisation. (<i>Lockless tracing
using read_cycle_counter instead of gettimeofday.</i>) </li>
<li> Fine granularity time stamping - min=o(CPU cycle time),
max=.05 Gb Ethernet interrupt rate. (<i>Cycle counter being used</i>.) </li>
<li> Random access to trace event stream. (<i>Random access reading
of events in the trace is already available in LibLTT. However, one first
pass is required through the trace to find all the process creation events;
the cost of this first pass may be reduced in the future if process creation
events are sent to a separate, much smaller trace</i>.) </li>
</ol>

<h2>Features being worked on</h2>

<ol>
<li> Simple wrapper macros for trace instrumentation. (<i>GenEvent</i>)
</li>
<li> Easily expandable with new trace types. (<i>GenEvent</i>) </li>
<li> Multiple buffering schemes - switchable globally or selectable
by trace client. (<i>Will be simpler to obtain with RelayFS</i>.) </li>
<li> Global buffer scheme. (<i>Will be simpler to obtain with RelayFS</i>.)
</li>
<li> Per-process buffer scheme. (<i>Will be simpler to obtain with RelayFS</i>.)
</li>
<li> Per-NGPT thread buffer scheme. (<i>Will be simpler to obtain with
RelayFS</i>.) </li>
<li> Per-component buffer scheme. (<i>Will be simpler to obtain with
RelayFS</i>.) </li>
<li> A set of extensible and modular performance analysis post-processing
programs. (<i>Lttv</i>) </li>
<li> Filtering and selection mechanisms within the formatting utility. (<i>Lttv</i>)
</li>
<li> Variable size event records. (<i>GenEvent, LibEvent, Lttv</i>)
</li>
<li> Data reduction facilities able to logically combine traces from
more than one system. (<i>LibEvent, Lttv</i>) </li>
<li> Data presentation utilities able to present data from multiple
trace instances in a logically combined form. (<i>LibEvent, Lttv</i>)
</li>
<li> Major/minor code means of identification/registration/assignment.
(<i>GenEvent</i>) </li>
<li> A flexible formatting mechanism that caters for structures
and arrays of structures with recursion. (<i>GenEvent</i>) </li>
</ol>

<h2>Features already planned for</h2>

<ol>
<li> Init-time tracing. (<i>To be part of RelayFS</i>.) </li>
<li> Updated interface for Dynamic Probes. (<i>As soon as things stabilize.</i>)
</li>
<li> Support for "flight recorder" always-on tracing with minimal resource
consumption. (<i>To be part of RelayFS and interfaced to the Kernel Crash
Dump facilities.</i>) </li>
<li> Fine-grained dynamic trace instrumentation for kernel space and
user subsystems. (<i>Dynamic Probes, more efficient user level tracing.</i>)</li>
<li> System information logged at trace start. (<i>New special events
to add</i>.)</li>
<li> Collection of process memory map information at trace start/restart,
and updates of that information at fork/exec/exit. This allows address-to-name
resolution for user space. </li>
<li> A facility to write system snapshots (total memory layout
for kernel, drivers, and all processes) to a file. This is required for
trace post-processing on a system other than the one producing the trace.
Perhaps some of this is already implemented in the Kernel Crash Dump.</li>
<li> Even more efficient tracing from user space.</li>
<li> Better integration with tools to define static trace hooks.</li>
<li> Better integration with tools to dynamically activate tracing statements.</li>
</ol>

<h2>Features not currently planned</h2>

<ol>
<li> POSIX Tracing API compliance. </li>
<li> Function entry/exit tracing facility. (<i>Probably
a totally orthogonal mechanism, using either Dynamic Probes hooks or static
code instrumentation with the suitable GCC options for basic-block
instrumentation.</i>)</li>
<li> Processor performance counter (which most modern CPUs have) sampling
and recording. (<i>These counters can be read and their values sent in traced
events. Some support to collect these automatically at specific state change
times and to visualize the results would be nice.</i>)</li>
<li> Suspend &amp; resume capability. (<i>Why not simply stop the
trace and start a new one later? Otherwise, important information like process
creations while suspended must be obtained in some other way.</i>)</li>
<li> Per-packet send/receive event. (<i>New event types will be easily
added as needed.</i>)</li>
</ol>

<br>
<br>

</body>
</html>