Fix a problem with the detailed event list "seek backward". Under the following
conditions:
- Long interval between events (e.g. generated with power management suspend).
- Happening close to trace start.
- Trace start near 0s 0ns.
The subtraction could underflow. Fix this by checking the time to subtract
first and flooring to the trace start time if the subtraction would underflow.
The visible effect was that the detailed event list would seek to the end of the
trace rather than to the previous event when going "up" one event prior to the
suspend begin.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Viktor Rosendahl <viktor.rosendahl@nokia.com>
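The underflow guard described in the "seek backward" fix can be sketched as
follows. This is a minimal illustration, not the LTTV code: it assumes times
are held as unsigned 64-bit nanosecond counts, and the helper name
seek_back_target() is hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: compute the time to seek back to, given the current
 * time, the interval to go back by, and the trace start time (all in
 * nanoseconds, unsigned). The comparison is done BEFORE the subtraction so
 * that the unsigned arithmetic can never wrap around.
 */
static uint64_t seek_back_target(uint64_t current_ns, uint64_t interval_ns,
                                 uint64_t trace_start_ns)
{
    /* Already at or before the trace start: nothing to subtract. */
    if (current_ns <= trace_start_ns)
        return trace_start_ns;

    /* Going back by interval_ns would pass the trace start: floor to it. */
    if (interval_ns > current_ns - trace_start_ns)
        return trace_start_ns;

    return current_ns - interval_ns;
}
```

With a trace starting near 0 and a long suspend interval, the second test is
the one that triggers, and the result is the trace start time instead of a
wrapped value near the maximum of the type.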
Benjamin Poirier [Thu, 17 Dec 2009 16:28:29 +0000 (11:28 -0500)]
Rebuild traceset contexts after performing synchronization
This fixes an integration bug with the state system that caused the control
flow view display to become corrupted when zooming in closely to synchronized
traces. It also caused many messages like
WARNING **: Cannot find pin_in in schedchange 5
to be displayed.
Signed-off-by: Benjamin Poirier <benjamin.poirier@polymtl.ca>
Benjamin Poirier [Mon, 16 Nov 2009 22:04:54 +0000 (17:04 -0500)]
Store graph callbacks in a structure
Also support two classes of graphs: with "trace-trace" scale (both axes
present timestamp data); with "trace-time" scale (horizontal axis presents
timestamp data, vertical axis presents difference between timestamps)
Signed-off-by: Benjamin Poirier <benjamin.poirier@polymtl.ca>
These are very cool and fancy! In a single pass you get good resolution for
small values (not all lumped in one bin) without wasting many small bins for
large values. You also do not lose any values at all thanks to underflow and
overflow bins at each end. Number of bins is configurable.
Signed-off-by: Benjamin Poirier <benjamin.poirier@polymtl.ca>
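The message above does not say how the bin edges are chosen. One way to get
the properties it describes (fine bins for small values, few bins for large
values, nothing dropped) is geometric spacing with an underflow bin and an
overflow bin. The sketch below assumes that layout; struct log_hist and its
functions are hypothetical names, not the ones used in the patch.

```c
#include <stddef.h>

#define HIST_MAX_BINS 64

/* Hypothetical histogram: bin 0 is underflow, bin nbins-1 is overflow. */
struct log_hist {
    double min;      /* lower edge of the first regular bin, > 0 */
    double ratio;    /* each regular bin is `ratio` times wider, > 1 */
    size_t nbins;    /* configurable, 3 <= nbins <= HIST_MAX_BINS */
    unsigned long counts[HIST_MAX_BINS];
};

/* Map a value to its bin by walking the geometrically growing edges. */
static size_t log_hist_bin(const struct log_hist *h, double value)
{
    double upper;
    size_t i;

    if (value < h->min)
        return 0;                      /* underflow bin */

    upper = h->min * h->ratio;
    for (i = 1; i + 1 < h->nbins; i++) {
        if (value < upper)
            return i;                  /* regular bin i */
        upper *= h->ratio;
    }
    return h->nbins - 1;               /* overflow bin */
}

/* Every value lands in exactly one bin, so none is lost. */
static void log_hist_add(struct log_hist *h, double value)
{
    h->counts[log_hist_bin(h, value)]++;
}
```

With min = 1, ratio = 10 and nbins = 5, the regular bins cover [1, 10),
[10, 100) and [100, 1000); anything smaller goes to bin 0 and anything larger
or equal to 1000 goes to bin 4.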
Call the stats and graph functions from sync_chain
Versus the former daisy chain method, this avoids having the analysis stats
and graph functions called twice when there are many matching modules (via
matching_distributor).
Signed-off-by: Benjamin Poirier <benjamin.poirier@polymtl.ca>
Benjamin Poirier [Tue, 27 Oct 2009 17:56:25 +0000 (13:56 -0400)]
Add a batchanalysis module to build and run a sync chain
This is mostly to build a sync chain with an analysis module that evaluates
the quality of synchronization. It does not modify the time correction
factors in the traces.
Signed-off-by: Benjamin Poirier <benjamin.poirier@polymtl.ca>
Benjamin Poirier [Tue, 20 Oct 2009 18:25:06 +0000 (14:25 -0400)]
Adjust the marker names used for clock synchronization
... according to the patches posted to the ltt-dev list on 2009-10-21. There
are now regular and _extended versions of the markers. Synchronization needs
the extended version.
Signed-off-by: Benjamin Poirier <benjamin.poirier@polymtl.ca>
Make the synchronization module interfaces more generic
Instead of taking NetEvents and Packets, public interfaces take Events,
Messages and Exchanges. These are specialized into other structures for TCP.
This is to support the eventual integration of algorithms based on other event
types, like UDP.
Signed-off-by: Benjamin Poirier <benjamin.poirier@polymtl.ca>
Add a unittest program for clock synchronization modules
Allows testing matching and analysis modules with data read from text files.
Includes some sample data for simple good and bad (unsynchronizable) cases.
Signed-off-by: Benjamin Poirier <benjamin.poirier@polymtl.ca>
Benjamin Poirier [Fri, 14 Aug 2009 19:54:45 +0000 (15:54 -0400)]
Add convex hull algorithm-based synchronization
This analysis module implements an algorithm that provides a guarantee that the
synchronization will not result in inverted messages. It is now the default
algorithm, over linear regression.
Signed-off-by: Benjamin Poirier <benjamin.poirier@polymtl.ca>
Benjamin Poirier [Mon, 10 Aug 2009 20:13:40 +0000 (16:13 -0400)]
Do not use pkfree_skb events for synchronization
Don't rely on events indicating when sk_buff structures are freed. After a
receive, we wait for another event indicating that this receive was for TCP
data. In the case where the data was not TCP, instead of keeping information
about the receive, we used to discard it when the skb was freed. It turns out
that it is faster (and simpler) not to look at pkfree_skb events and keep the
information around anyways. Since sk_buff's are allocated in a pool, the
information will get overwritten and the size of pendingRecv will not grow
infinitely.
Signed-off-by: Benjamin Poirier <benjamin.poirier@polymtl.ca>
Benjamin Poirier [Wed, 22 Jul 2009 18:21:32 +0000 (14:21 -0400)]
Graphical mode synchronization
Hooks the trace synchronization code in lttvwindow. This allows trace
synchronization to be used in graphical mode when lttv is started with the "--sync"
option. Unfortunately, the viewer interface "freezes" while the
synchronization code is running. This can take a noticeable amount of time
(more than a minute) for large traces.
Signed-off-by: Benjamin Poirier <benjamin.poirier@polymtl.ca>
Benjamin Poirier [Fri, 10 Jul 2009 15:31:47 +0000 (11:31 -0400)]
Text mode clock synchronization
Adds linear regression based clock correction of a group of traces. This is
mostly a reimplementation of the work presented in
[1] Clément, E.: Synchronisation de traces dans un réseau distribué, École
Polytechnique de Montréal, 2006
This implementation does not use a synchronization "window" and timeouts as
described in figure 3.3 of [1]. Not using that algorithm, we don't have to
rely on the system time to be loosely synchronized; events can be processed
regardless of the order the traces are merge-sorted. The downside is that
there is no limit to the number of events that could be queued up, waiting for
their matching event from another trace. The solution to this would be to
insert extra logic (like a pluggable module) that controls the merge-sort,
upstream of event delivery to the synchronization code. Many possibilities
come to mind:
1) reimplement something like what is found in [1] that relies on system time
for a weak pre-synchronization
2) use a heuristic based on the size of the current queue for each trace in
the synchronization code. If a trace's queue is growing and growing, deliver
events to other traces.
This implementation also does not need to keep track of the IP addresses
assigned to local interfaces. If a packet is processed by the network
subsystem it can be used for synchronization. There is no need to know if the
node is really the packet's final destination.
Signed-off-by: Benjamin Poirier <benjamin.poirier@polymtl.ca>
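As a rough illustration of linear regression based clock correction, the
sketch below fits a straight line through pairs of timestamps by ordinary
least squares, giving a drift (slope) and an offset (intercept). It is a
simplification under stated assumptions: the inputs are taken to be
hypothetical pairs (x[i], y[i]) of times for matched events on a reference
trace and on the trace to correct, and struct clock_fit and fit_clock() are
illustrative names, not the interfaces of the synchronization module.

```c
#include <stddef.h>

/* Hypothetical result: corrected_time ~= offset + drift * time */
struct clock_fit {
    double offset;
    double drift;
};

/*
 * Ordinary least-squares fit of y = offset + drift * x.
 * Returns 0 on success, -1 if there are fewer than two points or if all
 * x values are identical (the slope is then undefined).
 */
static int fit_clock(const double *x, const double *y, size_t n,
                     struct clock_fit *out)
{
    double mean_x = 0.0, mean_y = 0.0, sxx = 0.0, sxy = 0.0;
    size_t i;

    if (n < 2)
        return -1;

    for (i = 0; i < n; i++) {
        mean_x += x[i];
        mean_y += y[i];
    }
    mean_x /= (double)n;
    mean_y /= (double)n;

    /* Centered sums keep the arithmetic better conditioned for large times. */
    for (i = 0; i < n; i++) {
        double dx = x[i] - mean_x;
        sxx += dx * dx;
        sxy += dx * (y[i] - mean_y);
    }
    if (sxx == 0.0)
        return -1;

    out->drift = sxy / sxx;
    out->offset = mean_y - out->drift * mean_x;
    return 0;
}
```

Because the fit only needs the accumulated sums, matched pairs can be fed in
whatever order the merge-sort delivers them, which is consistent with the
order independence mentioned above.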
IRQ tables, trap tables and softirq tables can grow. The copy mechanism for
these is not correct when saving/restoring state, because it always uses the
name table size (which grows) to copy the saved/restored state snapshots.
We should change the g_new allocations for g_arrays, and use the array length as
boundary for the copy rather than the name table len.
TODO / FIXME !
I currently increased the initial irq name table size to 512 to deal correctly
with ARM. This is a dumb temporary fix.
Chris Smowton [Fri, 27 Nov 2009 17:18:31 +0000 (12:18 -0500)]
wakeup wait for cpu display
Here's a slightly more constructive patch: this one adds support in
LTTV's generic state monitoring code and its control flow visualiser to
note when a process is woken by another process using the
sched_try_wakeup event.
Previously the woken process would continue to register its old state
until such time as it got scheduled; here the process transitions to
WAIT_CPU state (like preempted processes, indicating it is ready to run
but not currently scheduled). Ordinarily we see this state exist very
briefly, in between the device driver IRQ (typically) clearing it to run
and the scheduler invocation after irq_exit, but on a heavily loaded
system we might see a large stripe of dark yellow indicating the process
is ready but cannot yet be allocated a core.
The new code in controlflow/eventhooks.c is essentially a copy of the
second half of before_schedchange -- it would be nice to factor these
two and before_execmode, all of which basically identify a process,
create state objects if necessary, and draw its line up to a certain
time.
Chris Smowton [Wed, 18 Nov 2009 17:49:12 +0000 (12:49 -0500)]
LTTV trace control bug fix
Chris Smowton <cs448@cam.ac.uk>:
...whenever I tried
to start a trace using the GUI, it would freeze consuming 100% CPU after
I clicked "start". Turned out this was because in tracecontrol.c's
start_clicked callback, you poll(2) on an FD and use a switch()
statement to handle its return.
Unfortunately, poll(2) doesn't work that way -- it returns a *mask* of
bits, not a single value. Here poll was returning POLLIN | POLLHUP to
indicate there's data ready and the FD has been closed by the other
side, and since this != POLLIN and != POLLHUP, the poll loop spins
forever.
Attached is a patch to be applied to tracecontrol.c which fixes it to
check for set-bits instead. It's still strictly broken, as the read(fd,
buf, 256) call might not fully drain the child's output, but it's a step
in the right direction and means I can at least use the trace-control
thing.
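The bit-test approach described above can be sketched as follows. Note that
with poll(2) the event mask is reported in the revents field of struct pollfd,
while the return value of poll() is the number of ready descriptors. This is
an illustrative loop, not the patched tracecontrol.c code; the function name
read_until_hangup() is hypothetical. It also keeps reading until read(2)
reports end of file, which addresses the "might not fully drain" remark.

```c
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read from fd until the other side closes it.
 * Returns the total number of bytes read, or -1 on error.
 */
static long read_until_hangup(int fd)
{
    char buf[256];
    long total = 0;
    struct pollfd pfd;

    pfd.fd = fd;
    pfd.events = POLLIN;

    for (;;) {
        int ready = poll(&pfd, 1, -1);

        if (ready < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }

        /* revents is a bit mask: test individual bits, never compare with
         * == or switch on it, since POLLIN | POLLHUP can be set together. */
        if (pfd.revents & (POLLERR | POLLNVAL))
            return -1;

        if (pfd.revents & (POLLIN | POLLHUP)) {
            ssize_t n = read(fd, buf, sizeof(buf));

            if (n < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            if (n == 0)
                return total;   /* end of file: peer closed, all data read */
            total += n;
        }
    }
}
```

When the writer has sent data and then closed its end, the first poll()
typically reports POLLIN | POLLHUP; the loop reads the pending data, and a
later read() returning 0 ends it instead of spinning.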