Adding support for "compact" 32-bit events.

Mathieu Desnoyers
March 9, 2007


event header


32-bit header
  Aligned on 32 bits
  1 bit to select event type
  4 bits event ID
    16 events (too few)
  27 bits TSC (cut MSB)
    wraps 32 times per second at 4GHz (taking 4GHz as roughly 2^32 Hz)
    successive wraps are 0.03125s apart
    100HZ clock : tick each 0.01s
    must detect wraps at least every 3 jiffies (dangerous: a late tick may miss a wrap)
    granularity : 2^0 = 1 cycle : 0.25ns @4GHz
  payload size known by facility
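
A minimal C sketch of the first layout, showing what "cut MSB" means for the trace reader: only the low 27 bits of the TSC are stored, so the full value has to be rebuilt from the last full TSC the reader knows (for instance from a heartbeat event). The bit positions (type bit 31, ID in bits 30..27) and all helper names are assumptions; the note only fixes the field widths. The rebuild is only right if less than one wrap (2^27 cycles, about 0.034s at 4GHz) separates the event from that reference, which is why missing a wrap check is dangerous.

```c
#include <stdint.h>

/* Hypothetical layout: bit 31 = event type selector, bits 30..27 =
 * event ID, bits 26..0 = low 27 bits of the TSC ("cut MSB"). */
#define H1_TYPE_SHIFT 31
#define H1_ID_SHIFT   27
#define H1_ID_MASK    0xFu
#define H1_TSC_BITS   27
#define H1_TSC_MASK   ((1u << H1_TSC_BITS) - 1)

static uint32_t h1_pack(unsigned type, unsigned id, uint64_t tsc)
{
	return ((uint32_t)(type & 1u) << H1_TYPE_SHIFT)
	     | ((uint32_t)(id & H1_ID_MASK) << H1_ID_SHIFT)
	     | ((uint32_t)tsc & H1_TSC_MASK);	/* keep only the low 27 bits */
}

static unsigned h1_id(uint32_t header)
{
	return (header >> H1_ID_SHIFT) & H1_ID_MASK;
}

/* Rebuild the full TSC of an event from its header and the last full
 * TSC known to the reader (`last`, taken at or before the event).
 * Correct only if fewer than 2^27 cycles elapsed since `last`. */
static uint64_t h1_expand(uint64_t last, uint32_t header)
{
	uint64_t full = (last & ~(uint64_t)H1_TSC_MASK)
		      | (header & H1_TSC_MASK);

	if (full < last)		/* low bits went backwards: one wrap */
		full += (uint64_t)1 << H1_TSC_BITS;
	return full;
}
```

For example, with a reference TSC of 0x7FFFFFF and an event 10 cycles later, the stored low bits are 9 and h1_expand() adds back one wrap.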

32-bit header
  Aligned on 32 bits
  1 bit to select event type
  4 bits event ID
    16 events (too few)
  27 bits TSC (cut LSB)
    wraps each second at 4GHz
    100HZ clock : tick each 0.01s
    granularity : 2^5 = 32 cycles : 8ns @4GHz
  payload size known by facility
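
For the "cut LSB" layout, a similar C sketch (hypothetical helper names) keeps bits 31..5 of the TSC. Two consequences show up: events less than 32 cycles apart may share the same stored value, and the reader can only rebuild the timestamp rounded down to a multiple of 32 cycles. The rebuild assumes the reader holds an earlier full TSC less than one wrap (about 2^32 cycles) old.

```c
#include <stdint.h>

/* Hypothetical helpers for the "cut LSB" layout: the 27-bit field holds
 * bits 31..5 of the TSC, so it counts in units of 2^5 = 32 cycles and
 * wraps every 2^32 cycles (about 1 s at 4GHz). */
#define H2_LSB_CUT  5
#define H2_TSC_BITS 27
#define H2_TSC_MASK ((1u << H2_TSC_BITS) - 1)

static uint32_t h2_tsc_field(uint64_t tsc)
{
	return (uint32_t)(tsc >> H2_LSB_CUT) & H2_TSC_MASK;
}

/* Rebuild the TSC, rounded down to a multiple of 32 cycles, from the
 * stored field and an earlier full TSC `last` known to the reader.
 * Works in 32-cycle units; correct only if less than one wrap
 * (2^27 units, about 2^32 cycles) separates the event from `last`. */
static uint64_t h2_expand(uint64_t last, uint32_t field)
{
	uint64_t ref = last >> H2_LSB_CUT;	/* reader clock in 32-cycle units */
	uint64_t full = (ref & ~(uint64_t)H2_TSC_MASK) | field;

	if (full < ref)				/* field went backwards: one wrap */
		full += (uint64_t)1 << H2_TSC_BITS;
	return full << H2_LSB_CUT;
}
```

For instance, h2_tsc_field(64) and h2_tsc_field(95) are both 2, while h2_tsc_field(96) is 3.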

32-bit header
  Aligned on 32 bits
  1 bit to select event type
  5 bits event ID
    32 events
  26 bits TSC (cut LSB)
    wraps each second at 4GHz (2^(26+6) = 2^32 cycles)
    100HZ clock : tick each 0.01s
    granularity : 2^6 = 64 cycles : 16ns @4GHz
  payload size known by facility

32-bit header
  Aligned on 32 bits
  1 bit to select event type
  6 bits event ID
    64 events
  25 bits TSC (cut LSB)
    wraps each second at 4GHz (2^(25+7) = 2^32 cycles)
    100HZ clock : tick each 0.01s
    granularity : 2^7 = 128 cycles : 32ns @4GHz
  payload size known by facility
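
The wrap period and granularity of each 32-bit layout follow from two numbers: the width of the stored TSC field and the number of LSBs cut. The field wraps every 2^(width + cut) cycles and advances in steps of 2^cut cycles. The C sketch below (hypothetical names) tabulates the four layouts with an exact 4GHz clock; the figures in the note treat 4GHz as 2^32 Hz, so exact values differ slightly (for example, 2^27 cycles last about 0.034s rather than 0.03125s).

```c
#include <stdint.h>

/* Hypothetical description of one compact header layout. */
struct compact_layout {
	unsigned id_bits;	/* event ID width */
	unsigned tsc_bits;	/* width of the stored TSC field */
	unsigned lsb_cut;	/* low-order TSC bits dropped */
};

/* Total header width: 1 type-selection bit + ID + TSC field. */
static unsigned layout_width(const struct compact_layout *l)
{
	return 1 + l->id_bits + l->tsc_bits;
}

/* Seconds until the stored field wraps: 2^(tsc_bits + lsb_cut) cycles. */
static double wrap_seconds(const struct compact_layout *l, double cpu_hz)
{
	return (double)((uint64_t)1 << (l->tsc_bits + l->lsb_cut)) / cpu_hz;
}

/* Time represented by one step of the stored field, in nanoseconds. */
static double granularity_ns(const struct compact_layout *l, double cpu_hz)
{
	return (double)((uint64_t)1 << l->lsb_cut) * 1e9 / cpu_hz;
}

/* The four 32-bit layouts listed in the note. */
static const struct compact_layout layouts[] = {
	{ 4, 27, 0 },	/* cut MSB */
	{ 4, 27, 5 },	/* cut LSB */
	{ 5, 26, 6 },
	{ 6, 25, 7 },
};
```

Because every "cut LSB" layout here keeps the top of a 32-bit TSC, each one wraps after 2^32 cycles (about 1.07s at 4GHz); only the granularity changes, e.g. granularity_ns(&layouts[1], 4e9) is 8.0.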

64-bit header
  Aligned on 32 bits
  1 bit to select event type
  15 bits event ID (major 8, minor 8)
    32768 events
  16 bits event size (extra)
  32 bits TSC
    wraps each second at 4GHz
    100HZ clock : tick each 0.01s

96- or 128-bit header (full 64-bit TSC, useful when no heartbeat is available;
the size depends on internal alignment)
  Aligned on 32 bits
  1 bit to select event type
  15 bits event ID (major 8, minor 8)
    32768 events
  16 bits event size (extra)
  Aligned on 64 bits
  64 bits TSC
    wraps each 146.14 years at 4GHz

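The "96 or 128 bits" size comes from the 64-bit alignment of the TSC: after the 32-bit word carrying the type bit, event ID and event size, 4 bytes of padding are needed when that word itself starts on a 64-bit boundary. A C sketch, assuming the header starts at a 32-bit-aligned byte offset in the buffer (function names are hypothetical), which also checks the 146.14-year wrap figure using 365.25-day years:

```c
#include <stddef.h>
#include <stdint.h>

/* Size in bytes of a large header starting at byte `offset` in the buffer
 * (offset assumed 32-bit aligned): a 4-byte word (type bit, 15-bit ID,
 * 16-bit size), padding up to an 8-byte boundary, then the 8-byte TSC. */
static size_t large_header_size(size_t offset)
{
	size_t after_word = offset + 4;
	size_t pad = (8 - after_word % 8) % 8;	/* 0 or 4 bytes */

	return 4 + pad + 8;
}

/* Years before a full 64-bit TSC wraps at the given frequency. */
static double tsc64_wrap_years(double cpu_hz)
{
	const double two_pow_64 = 18446744073709551616.0;	/* 2^64 */
	const double seconds_per_year = 365.25 * 24 * 3600;

	return two_pow_64 / cpu_hz / seconds_per_year;
}
```

large_header_size(0) is 16 bytes (128 bits) and large_header_size(4) is 12 bytes (96 bits).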
Must put the event ID fields first in the large (64-bit and 96/128-bit) event headers.
Create a "compact" facility which reserves the facility IDs with the MSB set to 1.
  - or better : select a mapping for events
What is the minimum granularity required? (so we know how many LSBs to cut)
  - How far can synchronized CPU TSCs drift apart from one another?
      PLL
        http://en.wikipedia.org/wiki/Phase-locked_loop
        static phase offset -> tracking jitter
      25 MHz oscillator on motherboard for CPU
        jitter : expressed in ±picoseconds (should therefore be lower than 0.25ns)
        http://www.eetasia.com/ART_8800082274_480600_683c4e6b200103.HTM
      NEED MORE INFO.
  - What is the cacheline synchronization latency between the CPUs?
      Worst case : Intel Core 2, Intel Xeon 5100, Intel Core Solo, Intel Core Duo
        Unified L2 cache. http://www.intel.com/design/processor/manuals/253668.pdf
      Intel Core 2, Intel Xeon 5100
        http://www.intel.com/design/processor/manuals/253665.pdf
        Up to 10.7 GB/s FSB
      http://www.xbitlabs.com/articles/mobile/display/core2duo_2.html
                           Intel Core Duo    Intel Core 2 Duo
        L2 cache latency   14 cycles         14 cycles
        (round-trip : 28 cycles, 7ns @4GHz)
      sparc64 : between threads (they share the L1 cache)
        suspected to be ~2 cycles total (1+1) (to check)
  - How close (cycle-wise) can two consecutive recorded events in the same
    buffer be? (~200ns, the time needed to log an event) (~800 cycles @4GHz)
  - Tracing code itself : if it is at a subbuffer boundary, there are more checks to do.
      Must measure the maximum duration of a non-interrupted probe.
      Worst case (with NMIs enabled) : 6997 cycles, 1749 ns @4GHz.
      TODO : test with NMIs disabled and HT disabled.
      Ordering can be changed if an interrupt comes between the memory operation
      and the tracer call. Therefore, we cannot rely on more precision than the
      expected interrupt handler duration. (guess : ~10000 cycles, 2500ns @4GHz)
  - A faster interconnect between the CPUs could be a problem, but such
    interconnects seem to be proprietary only and are not in general use.
  - IPIs are expected to take much more than 28 cycles.
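
All the figures gathered above are cycle counts at 4GHz. Converting them side by side makes it clear which one bounds the useful granularity: interrupt-induced reordering dwarfs the cache-line latency. A trivial C helper (hypothetical name), rounding down to whole nanoseconds; the cycle values are the ones quoted in the note:

```c
#include <stdint.h>

/* Convert cycles to nanoseconds, rounded down. A frequency of cpu_khz
 * means cpu_khz * 1000 cycles per second, hence cycles * 10^6 / cpu_khz. */
static uint64_t cycles_to_ns(uint64_t cycles, uint64_t cpu_khz)
{
	return cycles * 1000000ULL / cpu_khz;
}

/* Cycle counts quoted in the note (4GHz CPU). */
struct timing_figure {
	const char *what;
	uint64_t cycles;
};

static const struct timing_figure figures[] = {
	{ "L2 cache round-trip (Core Duo, Core 2 Duo)", 28 },
	{ "logging one event", 800 },
	{ "longest probe seen, NMIs enabled", 6997 },
	{ "interrupt handler (guess)", 10000 },
};
```

cycles_to_ns(6997, 4000000) gives 1749, the worst-case probe figure quoted above.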
What is the minimum wrap-around interval? (must be safe against missed timer
interrupts, and for any configured timer HZ and CPU MHz frequency)
Must align _all_ headers on 32 bits, not 64.

Granularity : 200ns (800 cycles @4GHz) : 2^9 = 512 (remove 9 LSBs)
Number of LSBs skipped : first_bit(probe_duration_in_cycles) - 1
  (first_bit() = 1-based position of the most significant set bit)
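
The "first_bit() - 1" rule picks the largest power of two that does not exceed the shortest interval to resolve, so two events at least that far apart always get distinct, correctly ordered stored values. A C sketch, reading first_bit() as the 1-based index of the most significant set bit (the convention of the Linux kernel's fls()/fls64()); fls_u64() and lsb_to_skip() are hypothetical helpers:

```c
#include <stdint.h>

/* 1-based index of the most significant set bit, 0 for x == 0
 * (same convention as the Linux kernel's fls64()). */
static unsigned fls_u64(uint64_t x)
{
	unsigned n = 0;

	while (x) {
		n++;
		x >>= 1;
	}
	return n;
}

/* Number of TSC LSBs that can be dropped: 2^result is the largest power
 * of two not exceeding the shortest interval we need to resolve. */
static unsigned lsb_to_skip(uint64_t probe_duration_cycles)
{
	return probe_duration_cycles ? fls_u64(probe_duration_cycles) - 1 : 0;
}
```

lsb_to_skip(800) returns 9, matching the 2^9 = 512-cycle granularity above.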

Min wrap : 100HZ system, every 3 timer ticks : 0.03s (cutting 4 MSBs at 4GHz : 2^28 cycles, ~0.067s)
  (heartbeat at each 100HZ tick, to be safe)
Number of MSBs to skip :
  32 - first_bit(( (expected_longest_cli()[ms] + max_timer_interval[ms]) * 2 *
  cpu_khz ))
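
A C sketch of the MSB rule, with hypothetical helper names (fls_u64() follows the same 1-based most-significant-bit convention as the Linux kernel's fls64()). Since milliseconds multiplied by kHz gives cycles, the interval is multiplied by cpu_khz; the factor 2 keeps the stored field covering twice the longest expected gap between timestamp references.

```c
#include <stdint.h>

/* 1-based index of the most significant set bit, 0 for x == 0. */
static unsigned fls_u64(uint64_t x)
{
	unsigned n = 0;

	while (x) {
		n++;
		x >>= 1;
	}
	return n;
}

/* MSBs of a 32-bit TSC that can be dropped while the remaining bits
 * still span twice (longest interrupts-off section + timer interval).
 * Hypothetical helper; inputs in ms and kHz, so ms * kHz = cycles. */
static unsigned msb_to_skip(uint64_t longest_cli_ms,
			    uint64_t max_timer_interval_ms, uint64_t cpu_khz)
{
	uint64_t cycles = (longest_cli_ms + max_timer_interval_ms) * 2 * cpu_khz;
	unsigned needed = fls_u64(cycles);

	return needed >= 32 ? 0 : 32 - needed;
}
```

With no extra interrupts-off time, a 30 ms timer interval (3 ticks at 100HZ) and a 4GHz CPU, msb_to_skip(0, 30, 4000000) gives 4, matching the 4 MSBs used in the total below.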

9 LSBs + 4 MSBs = 13 bits cut, leaving a 19-bit TSC field. 1 type bit + 19 TSC bits leave 12 bits for event IDs : 4096 events.
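
Putting the two cuts together, the resulting 32-bit header can be sketched in C as below. The field order (type bit 31, event ID in bits 30..19, TSC bits 27..9 in bits 18..0) and the helper names are assumptions for illustration. The stored TSC advances every 512 cycles (128ns at 4GHz) and wraps every 2^28 cycles (about 0.067s at 4GHz).

```c
#include <stdint.h>

/* Hypothetical final layout: bit 31 = type selector, bits 30..19 =
 * 12-bit event ID, bits 18..0 = TSC bits 27..9 (9 LSBs and the 4 MSBs
 * of the 32-bit TSC cut). */
#define C_ID_BITS    12
#define C_TSC_BITS   19
#define C_LSB_CUT    9
#define C_ID_SHIFT   C_TSC_BITS
#define C_TYPE_SHIFT (C_ID_SHIFT + C_ID_BITS)	/* 31 */
#define C_ID_MASK    ((1u << C_ID_BITS) - 1)
#define C_TSC_MASK   ((1u << C_TSC_BITS) - 1)

static uint32_t compact_pack(unsigned type, unsigned id, uint64_t tsc)
{
	return ((uint32_t)(type & 1u) << C_TYPE_SHIFT)
	     | ((uint32_t)(id & C_ID_MASK) << C_ID_SHIFT)
	     | ((uint32_t)(tsc >> C_LSB_CUT) & C_TSC_MASK);
}

static unsigned compact_type(uint32_t h) { return h >> C_TYPE_SHIFT; }
static unsigned compact_id(uint32_t h) { return (h >> C_ID_SHIFT) & C_ID_MASK; }
static uint32_t compact_tsc(uint32_t h) { return h & C_TSC_MASK; }
```

For example, compact_pack(1, 4095, 0x12345678) stores 0x11A2B in the TSC field.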