Adding support for "compact" 32 bits events. Mathieu Desnoyers March 12, 2007 Use a separate channel for compact events Mux those events into this channel and magically they are "compact". Isn't it beautiful. event header ### COMPACT EVENTS 32 bits header Aligned on 32 bits 5 bits event ID 32 events 27 bits TSC (cut MSB) wraps 32 times per second at 4GHz each wraps spaced from 0.03125s 100HZ clock : tick each 0.01s detect wrap at least each 3 jiffies (dangerous, may miss) granularity : 2^0 = 1 cycle : 0.25ns @4GHz payload size known by facility 32 bits header Aligned on 32 bits 5 bits event ID 32 events 27 bits TSC (cut LSB) wraps each second at 4GHz 100HZ clock : tick each 0.01s granularity : 2^5 = 32 cycles : 8ns @4GHz payload size known by facility 32 bits header Aligned on 32 bits 6 bits event ID 64 events 26 bits TSC (cut LSB) wraps each 0.5 second at 4GHz 100HZ clock : tick each 0.01s granularity : 2^6 = 64 cycles : 16ns @4GHz payload size known by facility 32 bits header Aligned on 32 bits 7 bits event ID 128 events 25 bits TSC (cut LSB) wraps each 0.5 second at 4GHz 100HZ clock : tick each 0.01s granularity : 2^7 = 128 cycles : 32ns @4GHz payload size known by facility ### NORMAL EVENTS 64 bits header Aligned on 64 bits 32 bits TSC wraps each second at 4GHz 100HZ clock : tick each 0.01s 16 bits event id, (major 8 minor 8) 65536 events 16 bits event size (extra) 96 bits header (full 64 bits TSC, useful when no heartbeat available) Aligned on 64 bits 64 bits TSC wraps each 146.14 years at 4GHz 16 bits event id, (major 8 minor 8) 65536 events 16 bits event size (extra) ## Discussion of compact events Must put the event ID fields first in the large (64, 96-128 bits) event headers What is the minimum granularity required ? (so we know how much LSB to cut) - How much can synchonized CPU TSCs drift apart one from another ? PLL http://en.wikipedia.org/wiki/Phase-locked_loop static phase offset -> tracking jitter 25 MHz oscillator on motherboard for CPU jitter : expressed in ±picoseconds (should therefore be lower than 0.25ns) http://www.eetasia.com/ART_8800082274_480600_683c4e6b200103.HTM NEED MORE INFO. - What is the cacheline synchronization latency between the CPUs ? Worse case : Intel Core 2, Intel Xeon 5100, Intel core solo, intel core duo Unified L2 cache. http://www.intel.com/design/processor/manuals/253668.pdf Intel Core 2, Intel Xeon 5100 http://www.intel.com/design/processor/manuals/253665.pdf Up to 10.7 GB/s FSB http://www.xbitlabs.com/articles/mobile/display/core2duo_2.html Intel Core Duo Intel Core 2 Duo L2 cache latency 14 cycles 14 cycles (round-trip : 28 cycles) 7ns @4GHz sparc64 : between threads : shares L1 cache. suspected to be ~2 cycles total (1+1) (to check) - How close (cycle-wise) can be two consecutive recorded events in the same buffer ? (~200ns, time for logging an event) (~800 cycles @4GHz) - Tracing code itself : if it's at a subbuffer boundary, more check to do. Must see the maximum duration of a non interrupted probe. Worse case (had NMIs enabled) : 6997 cycles. 1749 ns @4GHz. TODO : test with NMIs disabled and HT disabled. Ordering can be changed if an interrupt comes between the memory operation and the tracer call. Therefore, we cannot rely on more precision than the expected interrupt handler duration. (guess : ~10000cycles, 2500ns@4GHz) - If there is a faster interconnect between the CPUs, it can be a problem, but seems to only be proprietary interconnects, not used in general. - IPI are expected to take much more than 28 cycles. What is the minimum wrap-around interval ? (must be safe for timer interrupt miss and multiple timer HZ (configurable) and CPU MHZ frequencies) Granularity : 800ns (200 cycles@4GHz) : 2^9 = 512 (remove 9 LSB) Probe never takes 1 cycle. Number of LSB skipped : max(0, (long)find_first_bit(probe_duration_in_cycles)-1) Min wrap : 100HZ system, each 3 timer ticks : 0.03s (32-4 MSB for 4 GHZ : 0.26s) (heartbeat each 100HZ, to be safe) Number of MSB to skip : 32 - find_first_bit(( (expected_longest_interrupt_latency()[ms] + max_timer_interval[ms]) * cpu_khz[kcycles/s] )) - 1 (the last -1 is to make sure we remove less or exact amount of bits, round near to 0, not round up). Heartbeat timer : Each timer interrupt Event : 32 bytes in size each timer tick : 100HZ 3.2kB/s 9LSB + 4MSB = 13 bits total. 13 bits for event IDs : 8192 events.