doc/developer/tsc-smallv2.txt

Adding support for "compact" 32-bit events.

Mathieu Desnoyers
March 12, 2007

Use a separate channel for compact events

Mux those events into this channel and magically they are "compact". Isn't it
beautiful?

event header

### COMPACT EVENTS

32 bits header
Aligned on 32 bits
  5 bits event ID
    32 events
  27 bits TSC (cut MSB)
    wraps 32 times per second at 4GHz (taking 4GHz ~= 2^32 cycles/s)
    consecutive wraps are 0.03125s apart
    100HZ clock : tick each 0.01s
      must detect a wrap at least every 3 jiffies (dangerous, may miss)
    granularity : 2^0 = 1 cycle : 0.25ns @4GHz
payload size known by facility

32 bits header
Aligned on 32 bits
  5 bits event ID
    32 events
  27 bits TSC (cut LSB)
    wraps each second at 4GHz
    100HZ clock : tick each 0.01s
    granularity : 2^5 = 32 cycles : 8ns @4GHz
payload size known by facility

32 bits header
Aligned on 32 bits
  6 bits event ID
    64 events
  26 bits TSC (cut LSB)
    wraps each second at 4GHz (2^26 ticks of 64 cycles = 2^32 cycles)
    100HZ clock : tick each 0.01s
    granularity : 2^6 = 64 cycles : 16ns @4GHz
payload size known by facility

32 bits header
Aligned on 32 bits
  7 bits event ID
    128 events
  25 bits TSC (cut LSB)
    wraps each second at 4GHz (2^25 ticks of 128 cycles = 2^32 cycles)
    100HZ clock : tick each 0.01s
    granularity : 2^7 = 128 cycles : 32ns @4GHz
payload size known by facility
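
The four layouts above follow from two parameters: the event ID width and the
number of TSC LSBs cut. The Python sketch below recomputes the table and packs
one header. It is only an illustration: the function names are made up here,
the event ID is assumed to sit in the upper bits of the 32-bit word (the notes
do not fix the field order for compact headers), and the TSC is assumed to be a
32-bit cycle counter at exactly 4GHz. The "cut MSB" layout is modelled with
lsb_cut = 0.

```python
CPU_HZ = 4_000_000_000  # assumed 4GHz clock, as in the notes

def compact_layout(id_bits, lsb_cut):
    """Return the properties of a 32-bit compact header."""
    tsc_bits = 32 - id_bits
    granularity_cycles = 1 << lsb_cut
    wrap_cycles = 1 << (tsc_bits + lsb_cut)
    return {
        "events": 1 << id_bits,
        "tsc_bits": tsc_bits,
        "granularity_cycles": granularity_cycles,
        "granularity_ns": granularity_cycles * 1e9 / CPU_HZ,
        "wrap_seconds": wrap_cycles / CPU_HZ,
    }

def pack_compact(event_id, tsc, id_bits, lsb_cut):
    """Pack event ID (upper bits, assumed) and truncated TSC into 32 bits."""
    tsc_bits = 32 - id_bits
    if not 0 <= event_id < (1 << id_bits):
        raise ValueError("event ID does not fit")
    tsc_field = (tsc >> lsb_cut) & ((1 << tsc_bits) - 1)
    return (event_id << tsc_bits) | tsc_field

def unpack_compact(header, id_bits):
    """Split a 32-bit header back into (event_id, tsc_field)."""
    tsc_bits = 32 - id_bits
    return header >> tsc_bits, header & ((1 << tsc_bits) - 1)

if __name__ == "__main__":
    for id_bits, lsb_cut in [(5, 0), (5, 5), (6, 6), (7, 7)]:
        print(id_bits, lsb_cut, compact_layout(id_bits, lsb_cut))
```

With CPU_HZ at exactly 4e9 the cut-LSB layouts wrap after about 1.07s; the
round figures in the notes correspond to treating 4GHz as 2^32 cycles/s.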



### NORMAL EVENTS

64 bits header
Aligned on 64 bits
  32 bits TSC
    wraps each second at 4GHz
    100HZ clock : tick each 0.01s
  16 bits event id (major 8, minor 8)
     65536 events
  16 bits event size (extra)

96 bits header (full 64 bits TSC, useful when no heartbeat available)
Aligned on 64 bits
  64 bits TSC
    wraps each 146.14 years at 4GHz
  16 bits event id (major 8, minor 8)
     65536 events
  16 bits event size (extra)
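
As a rough check of the wrap figures and of the 64 bits header size, here is a
small Python sketch. The helper names are hypothetical. The field order (event
ID, then size, then TSC) and the little-endian byte order are assumptions made
for the example; the notes only say that the event ID fields must come first.

```python
import struct

CPU_HZ = 4_000_000_000                 # assumed 4GHz clock
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def wrap_seconds(tsc_bits, cpu_hz=CPU_HZ):
    """Time before a tsc_bits-wide cycle counter wraps."""
    return (1 << tsc_bits) / cpu_hz

def pack_normal_header(major, minor, size, tsc):
    """Pack a 64-bit header: 16-bit event ID, 16-bit size, 32-bit TSC."""
    event_id = (major << 8) | minor
    # "<" means little-endian with no padding, so the result is 8 bytes.
    return struct.pack("<HHI", event_id, size, tsc & 0xFFFFFFFF)

if __name__ == "__main__":
    print(wrap_seconds(32), "s for a 32-bit TSC")
    print(wrap_seconds(64) / SECONDS_PER_YEAR, "years for a 64-bit TSC")
    print(pack_normal_header(1, 2, 16, 0x100000005).hex())
```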


## Discussion of compact events

Must put the event ID fields first in the large (64, 96-128 bits) event headers
What is the minimum granularity required ? (so we know how many LSB to cut)
  - How much can synchronized CPU TSCs drift apart from one another ?
    PLL
    http://en.wikipedia.org/wiki/Phase-locked_loop
    static phase offset -> tracking jitter
    25 MHz oscillator on motherboard for CPU
    jitter : expressed in ±picoseconds (should therefore be lower than 0.25ns)
    http://www.eetasia.com/ART_8800082274_480600_683c4e6b200103.HTM
    NEED MORE INFO.
  - What is the cacheline synchronization latency between the CPUs ?
    Worst case : Intel Core 2, Intel Xeon 5100, Intel Core Solo, Intel Core Duo
    Unified L2 cache. http://www.intel.com/design/processor/manuals/253668.pdf
    Intel Core 2, Intel Xeon 5100
    http://www.intel.com/design/processor/manuals/253665.pdf
    Up to 10.7 GB/s FSB
    http://www.xbitlabs.com/articles/mobile/display/core2duo_2.html
                      Intel Core Duo     Intel Core 2 Duo
    L2 cache latency  14 cycles          14 cycles
    (round-trip : 28 cycles) 7ns @4GHz
    sparc64 : between threads : shares L1 cache.
    suspected to be ~2 cycles total (1+1) (to check)
  - How close (cycle-wise) can two consecutive recorded events be in the same
    buffer ? (~200ns, time for logging an event) (~800 cycles @4GHz)
  - Tracing code itself : if it's at a subbuffer boundary, more checks to do.
    Must see the maximum duration of a non-interrupted probe.
    Worst case (had NMIs enabled) : 6997 cycles. 1749 ns @4GHz.
    TODO : test with NMIs disabled and HT disabled.
    Ordering can be changed if an interrupt comes between the memory operation
    and the tracer call. Therefore, we cannot rely on more precision than the
    expected interrupt handler duration. (guess : ~10000 cycles, 2500ns @4GHz)
  - If there is a faster interconnect between the CPUs, it can be a problem, but
    these seem to be only proprietary interconnects, not used in general.
  - IPIs are expected to take much more than 28 cycles.
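
The cycle and nanosecond figures in the list above are tied together by the
4GHz clock rate (4 cycles per nanosecond). A small Python sketch, with helper
names made up for this example, recomputes them:

```python
CPU_HZ = 4_000_000_000  # assumed 4GHz clock, as in the notes

def cycles_to_ns(cycles, cpu_hz=CPU_HZ):
    """Convert a cycle count to nanoseconds at the given clock rate."""
    return cycles * 1e9 / cpu_hz

def ns_to_cycles(ns, cpu_hz=CPU_HZ):
    """Convert nanoseconds to cycles at the given clock rate."""
    return ns * cpu_hz / 1e9

if __name__ == "__main__":
    for label, cycles in [("L2 round-trip", 28),
                          ("event logging", 800),
                          ("longest probe seen", 6997),
                          ("interrupt handler (guess)", 10000)]:
        print(f"{label}: {cycles} cycles = {cycles_to_ns(cycles):.2f} ns")
```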
What is the minimum wrap-around interval ? (must be safe for timer interrupt
misses and multiple timer HZ (configurable) and CPU MHZ frequencies)

Granularity : 200ns (800 cycles@4GHz) : 2^9 = 512 (remove 9 LSB)
  Probe never takes 1 cycle.
  Number of LSB skipped : max(0, (long)find_first_bit(probe_duration_in_cycles)-1)
    (reading find_first_bit as the 1-based position of the highest set bit :
    800 cycles gives 10 - 1 = 9)
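
A minimal Python sketch of the LSB rule above. It assumes that find_first_bit
means the 1-based position of the highest set bit, which is what Python's
int.bit_length() returns; the notes do not define it, and the function name
lsb_to_skip is made up for this example.

```python
def lsb_to_skip(probe_duration_in_cycles):
    """Number of TSC LSBs that can be dropped for a given probe duration.

    Assumes find_first_bit in the notes is the 1-based position of the
    highest set bit, i.e. int.bit_length().
    """
    return max(0, probe_duration_in_cycles.bit_length() - 1)

if __name__ == "__main__":
    n = lsb_to_skip(800)      # ~200ns probe at 4GHz
    print(n, 1 << n)          # LSBs dropped, resulting granularity in cycles
```

For 800 cycles this gives 9 LSBs, i.e. a granularity of 512 cycles, which stays
below the probe duration.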

Min wrap : 100HZ system, each 3 timer ticks : 0.03s
  (32-4 MSB for 4 GHZ : 2^28 cycles : ~0.067s)
  (heartbeat each 100HZ, to be safe)
  Number of MSB to skip :
    32 - find_first_bit(( (expected_longest_interrupt_latency()[ms] +
       max_timer_interval[ms]) * cpu_khz[kcycles/s] )) - 1
    (the last -1 makes sure we never remove more bits than allowed : round
    toward 0, not up).
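
The MSB rule can be sketched the same way in Python. As before, bit_length()
stands in for find_first_bit (an assumption), the function name is made up,
and the max(0, ...) guard is an addition of this example. The 20ms + 10ms
split of the 0.03s window is hypothetical; only the 0.03s total comes from the
notes.

```python
def msb_to_skip(interrupt_latency_ms, max_timer_interval_ms, cpu_khz):
    """Number of MSBs of a 32-bit TSC that can be dropped.

    The bits that are kept must cover the longest expected time between
    two heartbeats. ms * kcycles/s gives a window in cycles.
    """
    window_cycles = (interrupt_latency_ms + max_timer_interval_ms) * cpu_khz
    # Guard added for this sketch: never return a negative bit count.
    return max(0, 32 - window_cycles.bit_length() - 1)

if __name__ == "__main__":
    skipped = msb_to_skip(20, 10, 4_000_000)   # 30ms window at 4GHz
    kept = 32 - skipped
    print(skipped, kept, (1 << kept) / 4e9)    # MSBs cut, bits kept, wrap (s)
```

With a 30ms window at 4GHz (120 million cycles) this yields 4 MSBs, leaving 28
bits, which wrap after 2^28 cycles and therefore cover the window.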

Heartbeat timer :
  Each timer interrupt
  Event : 32 bytes in size
  each timer tick : 100HZ
  3.2kB/s
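
Since the timer HZ is configurable, the heartbeat cost scales with it. A tiny
Python sketch (names made up for this example) of the arithmetic above:

```python
HEARTBEAT_EVENT_BYTES = 32  # event size given in the notes

def heartbeat_bytes_per_second(timer_hz, event_bytes=HEARTBEAT_EVENT_BYTES):
    """Trace bandwidth used by one heartbeat event per timer tick."""
    return timer_hz * event_bytes

if __name__ == "__main__":
    for hz in (100, 250, 1000):   # example HZ settings
        print(hz, heartbeat_bytes_per_second(hz), "bytes/s")
```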

9 LSB + 4 MSB = 13 bits removed from the 32 bits TSC, leaving 19 bits of TSC.
13 bits for event IDs : 8192 events.
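
To illustrate the resulting layout (13 bits of event ID, 19 bits of TSC with 9
LSBs cut), the Python sketch below truncates a TSC and rebuilds it from a
reference timestamp. The reconstruction scheme and all names are assumptions
made for this example; the notes only state that a heartbeat is needed to
detect wraps.

```python
ID_BITS = 13                     # event ID width from the notes
LSB_CUT = 9                      # LSBs dropped (granularity of 512 cycles)
TSC_BITS = 32 - ID_BITS          # 19 bits left for the TSC field
FIELD_MASK = (1 << TSC_BITS) - 1
WRAP_CYCLES = 1 << (TSC_BITS + LSB_CUT)   # 2^28 cycles

def tsc_field(tsc):
    """Truncated TSC as it would be stored in the compact header."""
    return (tsc >> LSB_CUT) & FIELD_MASK

def extend_tsc(reference_tsc, field):
    """Rebuild a full TSC from a header field.

    reference_tsc is the last full timestamp known (for example from a
    heartbeat). The event must be less than WRAP_CYCLES after it,
    otherwise a wrap is missed and the result is wrong.
    """
    ref_field = tsc_field(reference_tsc)
    delta = (field - ref_field) & FIELD_MASK      # ticks since the reference
    base = reference_tsc & ~((1 << LSB_CUT) - 1)  # drop sub-granularity bits
    return base + (delta << LSB_CUT)

if __name__ == "__main__":
    ref = 0x123456789
    event = ref + 100_000_000    # 25ms later at 4GHz, below WRAP_CYCLES
    print(hex(extend_tsc(ref, tsc_field(event))), hex(event))
```

The rebuilt value equals the event TSC rounded down to a multiple of 512
cycles, as long as the event falls within one wrap period of the reference.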