Adding support for "compact" 32-bit events.
1 bit to select event type
wraps ~32 times per second at 4GHz (taking 4GHz ≈ 2^32 Hz)
wraps spaced by 0.03125s
100HZ clock : tick each 0.01s
must detect a wrap at least every 3 jiffies (dangerous: a wrap may be missed)
granularity : 2^0 = 1 cycle : 0.25ns @4GHz
payload size known by facility
1 bit to select event type
wraps each second at 4GHz
100HZ clock : tick each 0.01s
granularity : 2^5 = 32 cycles : 8ns @4GHz
payload size known by facility
1 bit to select event type
wraps each 0.5 second at 4GHz
100HZ clock : tick each 0.01s
granularity : 2^6 = 64 cycles : 16ns @4GHz
payload size known by facility
1 bit to select event type
wraps each 0.5 second at 4GHz
100HZ clock : tick each 0.01s
granularity : 2^7 = 128 cycles : 32ns @4GHz
payload size known by facility
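A small Python sketch to recompute the figures of the four compact layouts above. It assumes (hypothetically: the field widths are inferred from the quoted wrap and granularity figures, not stated in these notes) that each layout stores TSC bits [lsb_cut, top_bit). It uses exactly 4e9 cycles/s, so its wrap periods come out slightly longer than the rounded figures (about 0.0336s rather than 0.03125s for the first layout, which takes 4GHz ≈ 2^32 Hz).

```python
# Sketch: granularity and wrap-around period of the four compact 32-bit layouts.
# Hypothetical assumption: each layout stores TSC bits [lsb_cut, top_bit).
CPU_HZ = 4_000_000_000   # exactly 4 GHz (not the 2**32 approximation)
TIMER_HZ = 100           # 100HZ clock: one tick every 0.01s

# (lsb_cut, top_bit) pairs inferred from the figures in the notes.
LAYOUTS = [(0, 27), (5, 32), (6, 31), (7, 31)]


def describe(lsb_cut: int, top_bit: int) -> dict:
    granularity_cycles = 1 << lsb_cut
    wrap_s = (1 << top_bit) / CPU_HZ
    return {
        "tsc_bits": top_bit - lsb_cut,
        "granularity_ns": granularity_cycles * 1e9 / CPU_HZ,
        "wrap_s": wrap_s,
        # Whole timer ticks that fit in one wrap period: a wrap check must run
        # at least this often, or a wrap can go unnoticed.
        "max_jiffies": int(wrap_s * TIMER_HZ),
    }


for lsb, top in LAYOUTS:
    d = describe(lsb, top)
    print(f"cut {lsb} LSB, keep {d['tsc_bits']} bits: "
          f"{d['granularity_ns']:g}ns granularity, wraps every {d['wrap_s']:.4f}s, "
          f"check at least every {d['max_jiffies']} jiffies")
```

The jiffies column makes the trade-off visible: the first layout leaves only 3 ticks between wraps at 100HZ, while the others leave 50 or more.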
1 bit to select event type
15 bits event ID (major 8, minor 8)
16 bits event size (extra)
wraps each second at 4GHz
100HZ clock : tick each 0.01s
96- or 128-bit header (full 64-bit TSC, useful when no heartbeat is available;
size depends on internal alignment)
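A sketch of packing the 64-bit header fields into one integer, with the event ID placed first. The exact bit order (type bit, ID, size, then the low 32 TSC bits) and the function names are hypothetical, chosen only to make the field widths concrete.

```python
# Sketch: 64-bit event header packed into one integer (hypothetical field order,
# event ID first): bit 63 = type selector, bits 48..62 = 15-bit event ID,
# bits 32..47 = 16-bit event size, bits 0..31 = low 32 bits of the TSC.


def pack_header64(event_type: int, event_id: int, size: int, tsc: int) -> int:
    if not 0 <= event_type <= 1:
        raise ValueError("event type is a single bit")
    if not 0 <= event_id < 1 << 15:
        raise ValueError("event ID must fit in 15 bits")
    if not 0 <= size < 1 << 16:
        raise ValueError("event size must fit in 16 bits")
    # Only the low 32 TSC bits are kept; they wrap about once per second at 4GHz.
    return (event_type << 63) | (event_id << 48) | (size << 32) | (tsc & 0xFFFF_FFFF)


def unpack_header64(header: int) -> tuple:
    return (header >> 63, (header >> 48) & 0x7FFF, (header >> 32) & 0xFFFF,
            header & 0xFFFF_FFFF)


h = pack_header64(1, 0x1234, 512, 0x1_2345_6789)
print(hex(h), unpack_header64(h))
```

The TSC passed in has bit 32 set; only its low 32 bits survive packing, which is exactly the truncation that makes a heartbeat necessary for this header size.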
1 bit to select event type
15 bits event ID (major 8, minor 8)
16 bits event size (extra)
wraps each 146.14 years at 4GHz
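A quick check of the 146.14-year figure: 2^64 cycles at 4GHz, converted using 365.25-day (Julian) years. With 365-day years the same computation gives about 146.24 years.

```python
# Sketch: wrap-around period of a full 64-bit TSC at 4GHz.
CPU_HZ = 4_000_000_000
JULIAN_YEAR_S = 365.25 * 24 * 3600   # 31,557,600 s

wrap_years = 2**64 / CPU_HZ / JULIAN_YEAR_S
print(f"{wrap_years:.2f} years")
```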
Must put the event ID fields first in the large (64-bit and 96/128-bit) event headers.
Create a "compact" facility which reserves the facility IDs with the MSB set to 1.
- or better: select a mapping for events
What is the minimum granularity required? (so we know how many LSBs to cut)
- How far can synchronized CPU TSCs drift apart from one another?
http://en.wikipedia.org/wiki/Phase-locked_loop
static phase offset -> tracking jitter
25 MHz oscillator on the motherboard for the CPU
jitter: expressed in ±picoseconds (so it should be lower than 0.25ns, one cycle @4GHz)
http://www.eetasia.com/ART_8800082274_480600_683c4e6b200103.HTM
- What is the cache-line synchronization latency between the CPUs?
Worst case: Intel Core 2, Intel Xeon 5100, Intel Core Solo, Intel Core Duo
Unified L2 cache. http://www.intel.com/design/processor/manuals/253668.pdf
Intel Core 2, Intel Xeon 5100
http://www.intel.com/design/processor/manuals/253665.pdf
http://www.xbitlabs.com/articles/mobile/display/core2duo_2.html
                    Intel Core Duo   Intel Core 2 Duo
L2 cache latency    14 cycles        14 cycles
round trip          28 cycles        28 cycles   (7ns @4GHz)
sparc64: threads share the L1 cache,
suspected to be ~2 cycles total (1+1) (to check)
- How close (in cycles) can two consecutive events recorded in the same
buffer be? (~200ns, the time to log an event) (~800 cycles @4GHz)
- Tracing code itself: if it is at a subbuffer boundary, there are more checks to do.
Must determine the maximum duration of a non-interrupted probe.
Worst case (with NMIs enabled): 6997 cycles, 1749ns @4GHz.
TODO: test with NMIs disabled and HT disabled.
Ordering can change if an interrupt comes between the memory operation
and the tracer call. Therefore, we cannot rely on more precision than the
expected interrupt handler duration. (guess: ~10000 cycles, 2500ns @4GHz)
- A faster interconnect between the CPUs could be a problem, but such
interconnects seem to be proprietary only, not in general use.
- IPIs are expected to take much more than 28 cycles.
What is the minimum wrap-around interval? (must be safe against missed timer
interrupts, for every (configurable) timer HZ and every CPU MHz frequency)
Must align _all_ headers on 32 bits, not 64.
Granularity: 800 cycles (200ns @4GHz): 2^9 = 512 (remove 9 LSBs)
Number of LSBs skipped: first_bit(probe_duration_in_cycles) - 1
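A sketch of the LSB formula, reading first_bit() as the 1-based index of the most significant set bit (like the Linux kernel's fls()); this is an assumption, but it is the reading under which 800 cycles gives 9 LSBs. The candidate intervals are the figures quoted in these notes; the helper names are hypothetical.

```python
# Sketch: how many TSC LSBs can be dropped for a given minimum interval.
# first_bit() is read here as the 1-based index of the most significant set
# bit (like Linux fls()); this reading is an assumption.
CPU_HZ = 4_000_000_000


def first_bit(x: int) -> int:
    return x.bit_length()          # 0 for x == 0, 1 for x == 1, 10 for 800


def lsb_to_skip(min_interval_cycles: int) -> int:
    if min_interval_cycles <= 0:
        raise ValueError("interval must be positive")
    # Keeps 2**skip <= min_interval_cycles, so the lost precision never
    # exceeds the shortest interval we need to resolve.
    return first_bit(min_interval_cycles) - 1


# Candidate minimum intervals quoted in the notes (in cycles @4GHz).
CANDIDATES = {
    "cache-line round trip": 28,
    "logging one event": 800,
    "interrupt handler (guess)": 10_000,
}

for name, cycles in CANDIDATES.items():
    skip = lsb_to_skip(cycles)
    step_ns = (1 << skip) * 1e9 / CPU_HZ
    print(f"{name}: {cycles} cycles -> skip {skip} LSBs ({step_ns:g}ns steps)")
```

Picking the event-logging time (800 cycles) gives 9 LSBs, i.e. 512-cycle (128ns) steps; the interrupt-handler guess would allow 13.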
Min wrap: 100HZ system, every 3 timer ticks: 0.03s (32-4 MSB at 4GHz: 2^28 cycles, ~0.067s)
(heartbeat at 100HZ, to be safe)
Number of MSBs to skip:
32 - first_bit(( (expected_longest_cli()[ms] + max_timer_interval[ms]) * 2 /
9 LSBs + 4 MSBs = 13 bits freed in total. 12 bits for event IDs: 4096 events.
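A sketch of the resulting 32-bit compact header, assuming a hypothetical layout: bit 31 as the type selector, 12 event ID bits, and TSC bits 9..27 in the low 19 bits. It also shows how a reader could rebuild the full timestamp from the last full TSC (for example from a heartbeat), provided less than one wrap period (2^28 cycles, ~0.067s @4GHz) elapsed in between. This modular-delta reconstruction is a common technique, not necessarily what the tracer does; all names are illustrative.

```python
# Sketch of the 32-bit compact header from the budget above.
# Hypothetical layout: bit 31 = type selector (1 = compact), bits 19..30 =
# 12-bit event ID, bits 0..18 = TSC bits 9..27 (9 LSBs and 4 MSBs dropped).
LSB_CUT = 9
MSB_CUT = 4
TSC_FIELD_BITS = 32 - LSB_CUT - MSB_CUT      # 19
ID_BITS = 12
FIELD_MASK = (1 << TSC_FIELD_BITS) - 1
WRAP_CYCLES = 1 << (32 - MSB_CUT)            # 2**28 cycles, ~0.067s at 4GHz


def pack_compact(event_id: int, tsc: int) -> int:
    if not 0 <= event_id < 1 << ID_BITS:
        raise ValueError("event ID must fit in 12 bits")
    field = (tsc >> LSB_CUT) & FIELD_MASK
    return (1 << 31) | (event_id << TSC_FIELD_BITS) | field


def unpack_compact(header: int) -> tuple:
    return (header >> TSC_FIELD_BITS) & ((1 << ID_BITS) - 1), header & FIELD_MASK


def extend_tsc(prev_tsc: int, field: int) -> int:
    """Rebuild a full TSC (rounded down to 512 cycles) from the 19-bit field.

    Assumes the event's TSC is not earlier than prev_tsc and that fewer than
    2**19 512-cycle steps (one wrap period) elapsed since prev_tsc.
    """
    prev_units = prev_tsc >> LSB_CUT
    # Elapsed time in 512-cycle units, modulo one wrap period.
    delta = (field - (prev_units & FIELD_MASK)) & FIELD_MASK
    return (prev_units + delta) << LSB_CUT


prev = 0x0FFF_F000                # last full TSC, e.g. from a heartbeat
tsc = prev + 0x2000               # next event, just after a 2**28 wrap
event_id, field = unpack_compact(pack_compact(42, tsc))
print(event_id, hex(extend_tsc(prev, field)))
```

The example event crosses a 2^28 boundary, yet the reconstruction recovers the full TSC because the elapsed time is far below one wrap period; the 100HZ heartbeat is what keeps that condition true.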