Adding support for "compact" 32-bit events.

Use a separate channel for compact events.

Mux those events into this channel and magically they are "compact". Isn't it

wraps 32 times per second at 4 GHz
wraps are spaced 0.03125 s apart
100 Hz clock: ticks every 0.01 s
must detect wraps at least every 3 jiffies (dangerous, may miss a wrap)
granularity: 2^0 = 1 cycle: 0.25 ns @ 4 GHz
payload size known by facility
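The wrap rule above means a reader has to rebuild full timestamps from the truncated field, using a full TSC value it saw recently (for example from a heartbeat). A minimal C sketch, assuming the field stores the low 27 bits of the TSC with no LSB cut (32 wraps per second at 4 GHz, taking 4 GHz as roughly 2^32 Hz, gives about 2^27 cycles per wrap) and that less than one wrap elapsed since the last full value; `tsc_reconstruct()` is a hypothetical helper, not part of any tracer API:

```c
#include <stdint.h>

/*
 * Hypothetical helper: rebuild a full 64-bit TSC from a truncated field.
 * Assumes the field holds the low `bits` bits of the TSC (no LSB cut),
 * that bits < 64, and that fewer than 2^bits cycles elapsed since
 * `last_full`, the last full TSC value seen by the reader.
 */
uint64_t tsc_reconstruct(uint64_t last_full, uint32_t field, unsigned int bits)
{
	uint64_t mask = (UINT64_C(1) << bits) - 1;
	/* Keep the high bits of the last full value, splice in the new low bits. */
	uint64_t candidate = (last_full & ~mask) | ((uint64_t)field & mask);

	/* Low bits went backwards: the counter wrapped once since last_full. */
	if (candidate < last_full)
		candidate += mask + 1;
	return candidate;
}
```

If more than one wrap goes by (for instance after a missed heartbeat), the result is off by a multiple of 2^27 cycles, which is why checking only every 3 jiffies (0.03 s against a ~0.034 s wrap) leaves so little margin.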

wraps every second at 4 GHz
100 Hz clock: ticks every 0.01 s
granularity: 2^5 = 32 cycles: 8 ns @ 4 GHz
payload size known by facility

wraps every 0.5 s at 4 GHz
100 Hz clock: ticks every 0.01 s
granularity: 2^6 = 64 cycles: 16 ns @ 4 GHz
payload size known by facility

wraps every 0.5 s at 4 GHz
100 Hz clock: ticks every 0.01 s
granularity: 2^7 = 128 cycles: 32 ns @ 4 GHz
payload size known by facility
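The wrap periods and granularities in the variants above follow from two powers of two: the time covered by the stored bits plus the cut LSBs, and the weight of the lowest kept bit. A small C sketch of that arithmetic (function names are hypothetical); a 32-bit span gives about 1.07 s at 4 GHz and a 31-bit span about 0.54 s, matching the "every second" and "every 0.5 s" entries:

```c
#include <stdint.h>

/*
 * Seconds between two wraps of a timestamp that covers `span_bits` bits of
 * the TSC (stored bits plus the LSBs that were cut). Requires span_bits < 64.
 */
double tsc_wrap_seconds(unsigned int span_bits, double cpu_hz)
{
	return (double)(UINT64_C(1) << span_bits) / cpu_hz;
}

/* Resolution, in nanoseconds, when the `lsb_cut` lowest TSC bits are dropped. */
double tsc_granularity_ns(unsigned int lsb_cut, double cpu_hz)
{
	return (double)(UINT64_C(1) << lsb_cut) * 1e9 / cpu_hz;
}
```

For example, `tsc_granularity_ns(7, 4e9)` returns 32.0 and `tsc_wrap_seconds(27, 4e9)` returns about 0.034, the roughly 1/32 s wrap of the uncut variant.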

wraps every second at 4 GHz
100 Hz clock: ticks every 0.01 s
16-bit event ID (major 8, minor 8)
16-bit event size (extra)

96-bit header (full 64-bit TSC, useful when no heartbeat is available)
wraps every 146.14 years at 4 GHz
16-bit event ID (major 8, minor 8)
16-bit event size (extra)
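The 146.14-year figure for the full 64-bit TSC can be checked directly: 2^64 cycles at 4 GHz is about 4.61 x 10^9 s. A C sketch, assuming a 365.25-day year (the year length that reproduces 146.14); the function name is hypothetical:

```c
/* Seconds in a Julian year (365.25 days). */
#define SECONDS_PER_YEAR (365.25 * 24.0 * 3600.0)

/* Years before a full 64-bit TSC wraps when it counts cpu_hz cycles per second. */
double tsc64_wrap_years(double cpu_hz)
{
	/* 2^64 does not fit in a uint64_t, so build it as an exact double. */
	double span = 4294967296.0 * 4294967296.0;

	return span / cpu_hz / SECONDS_PER_YEAR;
}
```

`tsc64_wrap_years(4e9)` returns about 146.1355, which rounds to the 146.14 years quoted above.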

## Discussion of compact events
The event ID fields must come first in the large (64-bit and 96- to 128-bit) event headers.
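One possible 64-bit header that follows this rule, matching the 64-bit format listed earlier (32-bit TSC, 16-bit event ID split into 8-bit major and minor, 16-bit event size). The order of the fields after the ID, which byte holds the major number, and all names are illustrative assumptions, not a defined trace format:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 64-bit event header with the event ID as its first field. */
struct event_header_64 {
	uint16_t event_id;   /* assumed: major in the high byte, minor in the low byte */
	uint16_t event_size; /* payload size */
	uint32_t tsc;        /* low 32 bits of the TSC */
};

_Static_assert(sizeof(struct event_header_64) == 8, "header must be 64 bits");
_Static_assert(offsetof(struct event_header_64, event_id) == 0,
	       "event ID must come first");

uint8_t event_major(const struct event_header_64 *h)
{
	return (uint8_t)(h->event_id >> 8);
}

uint8_t event_minor(const struct event_header_64 *h)
{
	return (uint8_t)(h->event_id & 0xff);
}
```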
What is the minimum granularity required? (This tells us how many LSBs we can cut.)
- How far can synchronized CPU TSCs drift apart from one another?
http://en.wikipedia.org/wiki/Phase-locked_loop
static phase offset -> tracking jitter
25 MHz oscillator on the motherboard for the CPU
jitter: expressed in ± picoseconds, so it should stay below 0.25 ns (one cycle @ 4 GHz)
http://www.eetasia.com/ART_8800082274_480600_683c4e6b200103.HTM
- What is the cacheline synchronization latency between the CPUs?
Worst case: Intel Core 2, Intel Xeon 5100, Intel Core Solo, Intel Core Duo
Unified L2 cache. http://www.intel.com/design/processor/manuals/253668.pdf
Intel Core 2, Intel Xeon 5100
http://www.intel.com/design/processor/manuals/253665.pdf

http://www.xbitlabs.com/articles/mobile/display/core2duo_2.html
                   Intel Core Duo   Intel Core 2 Duo
L2 cache latency   14 cycles        14 cycles
(round-trip: 28 cycles, i.e. 7 ns @ 4 GHz)
sparc64: threads share the L1 cache.
Latency suspected to be ~2 cycles total (1+1) (to check).
- How close (in cycles) can two consecutive recorded events in the same
buffer be? (~200 ns, the time needed to log an event; ~800 cycles @ 4 GHz)
- Tracing code itself: if the probe hits a subbuffer boundary, it has more
checks to do. We must measure the maximum duration of a non-interrupted probe.
Worst case (with NMIs enabled): 6997 cycles, i.e. 1749 ns @ 4 GHz.
TODO: test with NMIs disabled and HT disabled.
Ordering can change if an interrupt arrives between the memory operation
and the tracer call. Therefore, we cannot rely on more precision than the
expected interrupt handler duration (guess: ~10000 cycles, 2500 ns @ 4 GHz).
- A faster interconnect between the CPUs could be a problem, but such
interconnects seem to be proprietary and are not in general use.
- IPIs are expected to take much longer than 28 cycles.
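The cycle and nanosecond figures in the list above (28 cycles = 7 ns, 800 cycles = 200 ns, 6997 cycles ≈ 1749 ns, 10000 cycles = 2500 ns) all use the same conversion at 4 GHz, i.e. 4,000,000 kHz. A C sketch, with the parameter named after the `cpu_khz` quantity in the MSB formula further down; the helper name is hypothetical:

```c
#include <stdint.h>

/*
 * Convert a cycle count to nanoseconds for a CPU clocked at cpu_khz kHz.
 * cpu_khz kilocycles per second is cpu_khz / 10^6 cycles per nanosecond.
 */
double cycles_to_ns(uint64_t cycles, uint64_t cpu_khz)
{
	return (double)cycles * 1e6 / (double)cpu_khz;
}
```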
What is the minimum wrap-around interval? (It must be safe against missed timer
interrupts and for every configurable timer HZ value and CPU MHz frequency.)
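
Granularity: 200 ns (800 cycles @ 4 GHz): 2^9 = 512 (remove 9 LSB)
A probe never takes 1 cycle.
Number of LSB skipped: max(0, (long)find_first_bit(probe_duration_in_cycles) - 1)
(here find_first_bit() means the 1-based position of the most significant set
bit, as returned by the Linux kernel's fls(); for 800 cycles: 10 - 1 = 9)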
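The LSB rule above can be sketched in C, assuming `find_first_bit()` means the 1-based position of the most significant set bit (the convention of the Linux kernel's `fls()`), which is the reading that turns 800 cycles into 9 skipped bits. `msb_pos()` and `lsb_to_skip()` are hypothetical names:

```c
#include <stdint.h>

/* 1-based position of the most significant set bit, 0 for x == 0. */
int msb_pos(uint64_t x)
{
	int pos = 0;

	while (x) {
		pos++;
		x >>= 1;
	}
	return pos;
}

/*
 * Number of LSBs that can be dropped so that one timestamp unit (2^n cycles)
 * never exceeds the shortest probe duration: 2^n <= probe_duration_cycles.
 */
int lsb_to_skip(uint64_t probe_duration_cycles)
{
	int n = msb_pos(probe_duration_cycles) - 1;

	return n > 0 ? n : 0;
}
```

`lsb_to_skip(800)` returns 9, so one timestamp unit is 512 cycles. Since two consecutive events are at least 800 cycles apart and 800 > 512, their truncated timestamps always differ.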

Min wrap: on a 100 Hz system, every 3 timer ticks: 0.03 s (removing 4 of the 32
MSB at 4 GHz leaves 2^28 cycles: ~0.067 s)
(heartbeat on every 100 Hz tick, to be safe)
Number of MSB to skip:
32 - find_first_bit((expected_longest_interrupt_latency()[ms] +
max_timer_interval[ms]) * cpu_khz[kcycles/s]) - 1
(ms × kcycles/s gives cycles. The trailing -1 makes sure we remove at most the
exact number of bits: round toward 0, never up.)
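The MSB formula can be sketched the same way. `msb_pos()` again assumes the 1-based most-significant-bit reading of `find_first_bit()`, and all names are hypothetical. Interval arguments are whole milliseconds, so ms × kcycles/s gives cycles:

```c
#include <stdint.h>

/* 1-based position of the most significant set bit, 0 for x == 0. */
int msb_pos(uint64_t x)
{
	int pos = 0;

	while (x) {
		pos++;
		x >>= 1;
	}
	return pos;
}

/*
 * MSBs that can be dropped from a 32-bit TSC while the kept bits still span
 * the longest gap between two full timestamps. A negative result means
 * 32 bits are not enough for that gap.
 */
int msb_to_skip(uint64_t irq_latency_ms, uint64_t max_timer_interval_ms,
		uint64_t cpu_khz)
{
	/* ms * kcycles/s = cycles */
	uint64_t cycles = (irq_latency_ms + max_timer_interval_ms) * cpu_khz;

	/* The trailing -1 rounds toward fewer skipped bits. */
	return 32 - msb_pos(cycles) - 1;
}
```

With a 30 ms interval (3 ticks at 100 Hz), negligible interrupt latency and `cpu_khz = 4000000`, the gap is 120,000,000 cycles, `msb_pos()` returns 27, and 32 - 27 - 1 = 4 MSB can be skipped.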

Event: 32 bytes in size
each timer tick: 100 Hz

9 LSB + 4 MSB = 13 bits total. 13 bits for event IDs: 8192 events.
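The final bit budget as a tiny C sketch (hypothetical helper): every TSC bit that is not stored goes to the event ID, so 9 + 4 freed bits leave 32 - 13 = 19 bits of TSC in the 32-bit header and allow 2^13 event IDs:

```c
#include <stdint.h>

/*
 * Number of distinct event IDs in a 32-bit compact header once `lsb_skipped`
 * low and `msb_skipped` high TSC bits are no longer stored.
 * Requires lsb_skipped + msb_skipped < 32.
 */
uint32_t compact_event_id_count(unsigned int lsb_skipped, unsigned int msb_skipped)
{
	unsigned int id_bits = lsb_skipped + msb_skipped;

	return UINT32_C(1) << id_bits;
}
```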