--- /dev/null
+Adding support for "compact" 32-bit events.
+
+Mathieu Desnoyers
+March 12, 2007
+
+Use a separate channel for compact events
+
+Mux those events into this channel and, magically, they are "compact". Isn't it
+beautiful?
+
+event header
+
+### COMPACT EVENTS
+
+32 bits header
+Aligned on 32 bits
+ 5 bits event ID
+ 32 events
+ 27 bits TSC (cut MSB)
+ wraps about 32 times per second at 4GHz
+ wraps are spaced about 0.03125s apart
+ 100HZ clock : tick each 0.01s
+ must detect a wrap at least every 3 jiffies (dangerous, may miss one)
+ granularity : 2^0 = 1 cycle : 0.25ns @4GHz
+payload size known by facility
+
+32 bits header
+Aligned on 32 bits
+ 5 bits event ID
+ 32 events
+ 27 bits TSC (cut LSB)
+ wraps each second at 4GHz
+ 100HZ clock : tick each 0.01s
+ granularity : 2^5 = 32 cycles : 8ns @4GHz
+payload size known by facility
+
+32 bits header
+Aligned on 32 bits
+ 6 bits event ID
+ 64 events
+ 26 bits TSC (cut LSB)
+ wraps each second at 4GHz (26 bits * 64 cycles = 2^32 cycles)
+ 100HZ clock : tick each 0.01s
+ granularity : 2^6 = 64 cycles : 16ns @4GHz
+payload size known by facility
+
+32 bits header
+Aligned on 32 bits
+ 7 bits event ID
+ 128 events
+ 25 bits TSC (cut LSB)
+ wraps each second at 4GHz (25 bits * 128 cycles = 2^32 cycles)
+ 100HZ clock : tick each 0.01s
+ granularity : 2^7 = 128 cycles : 32ns @4GHz
+payload size known by facility
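The four compact layouts above differ only in how many bits go to the event ID and how many low-order TSC bits are cut. The following is a minimal sketch of that trade-off, not the tracer's actual code: the function names are hypothetical, and placing the event ID in the high bits of the 32-bit word is an assumption (the notes only give the field widths).

```python
TSC_HZ = 4_000_000_000  # 4 GHz, the clock rate assumed throughout the notes


def pack_compact(event_id, tsc, id_bits, lsb_cut):
    """Pack a 32-bit compact header (assumed layout: event ID high, TSC low)."""
    tsc_bits = 32 - id_bits
    if not 0 <= event_id < (1 << id_bits):
        raise ValueError("event ID does not fit in %d bits" % id_bits)
    # Drop lsb_cut low-order bits, then keep only the bits that fit the field.
    tsc_field = (tsc >> lsb_cut) & ((1 << tsc_bits) - 1)
    return (event_id << tsc_bits) | tsc_field


def unpack_compact(header, id_bits):
    """Return (event_id, truncated_tsc_field) from a 32-bit compact header."""
    tsc_bits = 32 - id_bits
    return header >> tsc_bits, header & ((1 << tsc_bits) - 1)


def wrap_period_s(id_bits, lsb_cut, hz=TSC_HZ):
    """Seconds before the truncated TSC field wraps around."""
    return (1 << (32 - id_bits + lsb_cut)) / hz


def granularity_ns(lsb_cut, hz=TSC_HZ):
    """Timestamp resolution in nanoseconds after cutting lsb_cut low bits."""
    return (1 << lsb_cut) * 1e9 / hz
```

For example, `wrap_period_s(5, 0)` is about 0.034 s (the "cut MSB" layout), while `wrap_period_s(5, 5)` is about 1.07 s with `granularity_ns(5)` equal to 8 ns.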
+
+
+
+### NORMAL EVENTS
+
+64 bits header
+Aligned on 64 bits
+ 32 bits TSC
+ wraps each second at 4GHz
+ 100HZ clock : tick each 0.01s
+ 16 bits event id, (major 8 minor 8)
+ 65536 events
+ 16 bits event size (extra)
+
+96 bits header (full 64 bits TSC, useful when no heartbeat available)
+Aligned on 64 bits
+ 64 bits TSC
+ wraps each 146.14 years at 4GHz
+ 16 bits event id, (major 8 minor 8)
+ 65536 events
+ 16 bits event size (extra)
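The wrap intervals quoted for the normal headers can be checked with a few lines of arithmetic, and the field widths map directly onto a packed record. This is only a sketch: the helper names are hypothetical, the field order follows the listing above, and little-endian byte order is an assumption.

```python
import struct

TSC_HZ = 4_000_000_000               # 4 GHz, as assumed in the notes
JULIAN_YEAR_S = 365.25 * 24 * 3600   # seconds in a 365.25-day year


def tsc_wrap_seconds(bits, hz=TSC_HZ):
    """Seconds before a TSC field of the given width wraps around."""
    return (1 << bits) / hz


def pack_normal64(tsc, major, minor, size):
    """64-bit header: 32-bit TSC, 8-bit major ID, 8-bit minor ID, 16-bit size."""
    return struct.pack("<IBBH", tsc & 0xFFFFFFFF, major, minor, size)


def pack_normal96(tsc, major, minor, size):
    """96-bit header: full 64-bit TSC followed by the same ID and size fields."""
    return struct.pack("<QBBH", tsc & 0xFFFFFFFFFFFFFFFF, major, minor, size)
```

`tsc_wrap_seconds(32)` is about 1.07 s, and `tsc_wrap_seconds(64) / JULIAN_YEAR_S` is about 146.14 years, matching the figures above.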
+
+
+## Discussion of compact events
+
+Must put the event ID fields first in the large (64, 96-128 bits) event headers
+What is the minimum granularity required ? (so we know how many LSBs to cut)
+ - How much can synchronized CPU TSCs drift apart from one another ?
+ PLL
+ http://en.wikipedia.org/wiki/Phase-locked_loop
+ static phase offset -> tracking jitter
+ 25 MHz oscillator on motherboard for CPU
+ jitter : expressed in ±picoseconds (should therefore be lower than 0.25ns)
+ http://www.eetasia.com/ART_8800082274_480600_683c4e6b200103.HTM
+ NEED MORE INFO.
+ - What is the cacheline synchronization latency between the CPUs ?
+ Worst case : Intel Core 2, Intel Xeon 5100, Intel Core Solo, Intel Core Duo
+ Unified L2 cache. http://www.intel.com/design/processor/manuals/253668.pdf
+ Intel Core 2, Intel Xeon 5100
+ http://www.intel.com/design/processor/manuals/253665.pdf
+ Up to 10.7 GB/s FSB
+ http://www.xbitlabs.com/articles/mobile/display/core2duo_2.html
+                     Intel Core Duo    Intel Core 2 Duo
+ L2 cache latency    14 cycles         14 cycles
+ (round-trip : 28 cycles) 7ns @4GHz
+ sparc64 : between threads : shares L1 cache.
+ suspected to be ~2 cycles total (1+1) (to check)
+ - How close (cycle-wise) can two consecutive recorded events be in the same
+ buffer ? (~200ns, time for logging an event) (~800 cycles @4GHz)
+ - Tracing code itself : if it's at a subbuffer boundary, more checks to do.
+ Must see the maximum duration of a non-interrupted probe.
+ Worst case (had NMIs enabled) : 6997 cycles. 1749 ns @4GHz.
+ TODO : test with NMIs disabled and HT disabled.
+ Ordering can be changed if an interrupt comes between the memory operation
+ and the tracer call. Therefore, we cannot rely on more precision than the
+ expected interrupt handler duration. (guess : ~10000cycles, 2500ns@4GHz)
+ - If there is a faster interconnect between the CPUs, it can be a problem, but
+ seems to only be proprietary interconnects, not used in general.
+ - IPIs are expected to take much more than 28 cycles.
+What is the minimum wrap-around interval ? (must be safe for timer interrupt
+miss and multiple timer HZ (configurable) and CPU MHZ frequencies)
+
+Granularity : 200ns (800 cycles@4GHz) : 2^9 = 512 (remove 9 LSB)
+ Probe never takes 1 cycle.
+ Number of LSB skipped : max(0, (long)find_first_bit(probe_duration_in_cycles)-1)
+
+Min wrap : 100HZ system, each 3 timer ticks : 0.03s (32-4 MSB for 4 GHZ : 2^28 cycles, 0.067s)
+ (heartbeat each 100HZ, to be safe)
+ Number of MSB to skip :
+ 32 - find_first_bit(( (expected_longest_interrupt_latency()[ms] + max_timer_interval[ms]) * cpu_khz )) - 1
+ (ms * kHz gives the number of cycles in that interval. The last -1 makes sure
+ we remove at most the exact number of bits : round toward 0, never up).
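The two bit-count formulas above can be tried out numerically. In this sketch, `find_first_bit` is read as "1-based position of the highest set bit" (what the kernel's `fls()` returns), because that is the reading that reproduces the 9 LSB figure; this interpretation is an assumption. The interval is multiplied by `cpu_khz`, since milliseconds times kHz gives cycles. The function names and the 20 ms + 10 ms split of the 0.03 s window are hypothetical.

```python
def lsb_to_skip(probe_duration_cycles):
    """Low-order TSC bits that can be dropped without losing event ordering."""
    # int.bit_length() is the 1-based index of the highest set bit, like fls().
    return max(0, probe_duration_cycles.bit_length() - 1)


def msb_to_skip(latency_ms, timer_interval_ms, cpu_khz, tsc_bits=32):
    """High-order TSC bits that can be dropped while still detecting wraps."""
    cycles = (latency_ms + timer_interval_ms) * cpu_khz  # ms * kHz = cycles
    # The trailing -1 keeps one spare bit, so we never drop too many.
    return tsc_bits - cycles.bit_length() - 1


def event_id_bits(probe_duration_cycles, latency_ms, timer_interval_ms, cpu_khz):
    """Bits freed in a 32-bit header, available for the event ID."""
    return (lsb_to_skip(probe_duration_cycles)
            + msb_to_skip(latency_ms, timer_interval_ms, cpu_khz))
```

With an 800-cycle probe on a 4 GHz CPU (`cpu_khz = 4_000_000`) and a 30 ms window, `lsb_to_skip(800)` gives 9 and `msb_to_skip(20, 10, 4_000_000)` gives 4, so 13 bits remain for event IDs.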
+
+Heartbeat timer :
+ Each timer interrupt
+ Event : 32 bytes in size
+ each timer tick : 100HZ
+ 3.2kB/s
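The heartbeat exists so that a trace reader sees at least one event per wrap of the truncated TSC. A minimal sketch of how a reader could then rebuild full cycle counts is shown below. It assumes events arrive in time order within one buffer and that no wrap interval passes without an event; the function name is hypothetical and this is not the actual reader code.

```python
def extend_timestamps(fields, field_bits, lsb_cut=0):
    """Rebuild increasing cycle counts from truncated TSC fields.

    fields     : truncated TSC values in the order they were recorded
    field_bits : width of the truncated field in the header
    lsb_cut    : number of low-order bits removed before storing
    """
    modulus = 1 << field_bits
    wraps = 0
    prev = None
    out = []
    for field in fields:
        if prev is not None and field < prev:
            wraps += 1  # the field went backwards, so the counter wrapped once
        prev = field
        # Re-attach the wrap count as high bits and restore the cut low bits as 0.
        out.append((wraps * modulus + field) << lsb_cut)
    return out


def heartbeat_bytes_per_second(event_size_bytes=32, hz=100):
    """Trace bandwidth consumed by one heartbeat event per timer tick."""
    return event_size_bytes * hz
```

The results are relative to the first wrap seen, since the upper TSC bits are not recorded. `heartbeat_bytes_per_second()` returns 3200, the 3.2kB/s quoted above.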
+
+9LSB + 4MSB = 13 bits total. 13 bits for event IDs : 8192 events.
+
+
+
+
+
+
+