Atomic UP test results.

Tests run with test-time-probe2.ko.

Clock speed (cpu MHz in /proc/cpuinfo): 3000.077
"total time" below is the CPU cycle count for all 20000 loops of a run.

Tracing inactive

[  125.787229] test init
[  125.787303] test results : time per probe
[  125.787306] number of loops : 20000
[  125.787309] total time : 204413
[  125.787312] test end
[  175.660402] test init
[  175.660475] test results : time per probe
[  175.660479] number of loops : 20000
[  175.660482] total time : 203468
[  175.660484] test end
[  179.337362] test init
[  179.337436] test results : time per probe
[  179.337440] number of loops : 20000
[  179.337443] total time : 204757
[  179.337446] test end

Result: 10.21 cycles per loop (3.40 ns), mean of the three runs.

Atomic UP, one trace, flight recorder.

[  357.983971] test init
[  357.988837] test results : time per probe
[  357.988843] number of loops : 20000
[  357.988846] total time : 12349013
[  357.988849] test end
[  358.718896] test init
[  358.723049] test results : time per probe
[  358.723053] number of loops : 20000
[  358.723057] total time : 12332497
[  358.723059] test end
[  359.422038] test init
[  359.426173] test results : time per probe
[  359.426179] number of loops : 20000
[  359.426182] total time : 12332535
[  359.426185] test end

Result: 616.90 cycles per loop (205.63 ns), mean of the three runs.

Atomic SMP, one trace, flight recorder.

[  111.694180] test init
[  111.700191] test results : time per probe
[  111.700198] number of loops : 20000
[  111.700201] total time : 16925670
[  111.700204] test end
[  112.285716] test init
[  112.291321] test results : time per probe
[  112.291326] number of loops : 20000
[  112.291329] total time : 16766633
[  112.291332] test end
[  112.880602] test init
[  112.884739] test results : time per probe
[  112.884743] number of loops : 20000
[  112.884746] total time : 12358237
[  112.884748] test end

Result: 767.51 cycles per loop (255.83 ns), mean of the three runs.

(205.63 - 255.83) / 255.83 * 100% = -19.62 %
Atomic UP takes 19.62 % less time per loop than atomic SMP.

Comparison between the primitives (total cycles / 20000 loops):
cmpxchg           2967855/20000 = 148.39 cycles or  49.46 ns
cmpxchg-up         540577/20000 =  27.03 cycles or   9.01 ns
irq save/restore 12636562/20000 = 631.83 cycles or 210.60 ns

* Memory ordering

Variable       Written by    Read by
offset         local CPU     local CPU and other CPUs (reader)
commit_count   local CPU     local CPU and other CPUs (reader)
consumed       any CPU       any CPU
data           local CPU     any CPU

Test done in the reader:
if (consumed < offset)
  if (subbuf.commit_count is a multiple of SUBBUFSIZE)
    read data
    increment consumed

We must guarantee the following ordering:

* offset
Seen from the local CPU:
offset must always be incremented before the data is written (program order
on the local CPU already guarantees this).

Seen from other CPUs:
offset and data may become visible out of order. This is safe because offset
only ever increases: in an out-of-order case the reader sees an offset lower
than the data actually ready, and in any case the reader only reads the data
once commit_count _has_ been incremented, an increment that is preceded by a
store fence.

* commit_count
commit_count must always be seen by other CPUs after the data has been written.
Therefore, a store fence (smp_wmb()) must precede the commit_count write.

* consumed
Rarely updated, so it is modified with a LOCK-prefixed instruction, which acts
as a full memory barrier on x86.

Mathieu Desnoyers, November 2006