Atomic UP test results.

using test-time-probe2.ko

Clock speed : cpu MHz : 3000.077

Tracing inactive

[ 125.787229] test init
[ 125.787303] test results : time per probe
[ 125.787306] number of loops : 20000
[ 125.787309] total time : 204413
[ 125.787312] test end
[ 175.660402] test init
[ 175.660475] test results : time per probe
[ 175.660479] number of loops : 20000
[ 175.660482] total time : 203468
[ 175.660484] test end
[ 179.337362] test init
[ 179.337436] test results : time per probe
[ 179.337440] number of loops : 20000
[ 179.337443] total time : 204757
[ 179.337446] test end

Res : 10.21 cycles per loop

Atomic UP, one trace, flight recorder.

[ 357.983971] test init
[ 357.988837] test results : time per probe
[ 357.988843] number of loops : 20000
[ 357.988846] total time : 12349013
[ 357.988849] test end
[ 358.718896] test init
[ 358.723049] test results : time per probe
[ 358.723053] number of loops : 20000
[ 358.723057] total time : 12332497
[ 358.723059] test end
[ 359.422038] test init
[ 359.426173] test results : time per probe
[ 359.426179] number of loops : 20000
[ 359.426182] total time : 12332535
[ 359.426185] test end

Res : 616.90 cycles per loop
      205.63 ns per loop

Atomic SMP, one trace, flight recorder.

[ 111.694180] test init
[ 111.700191] test results : time per probe
[ 111.700198] number of loops : 20000
[ 111.700201] total time : 16925670
[ 111.700204] test end
[ 112.285716] test init
[ 112.291321] test results : time per probe
[ 112.291326] number of loops : 20000
[ 112.291329] total time : 16766633
[ 112.291332] test end
[ 112.880602] test init
[ 112.884739] test results : time per probe
[ 112.884743] number of loops : 20000
[ 112.884746] total time : 12358237
[ 112.884748] test end

Res : 767.51 cycles per loop
      255.83 ns per loop

Time saved by UP relative to SMP :
(255.83-205.63)/255.83 * 100% = 19.62 %
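
The Res figures can be reproduced from the raw totals. The following is a
minimal sketch in C, assuming that each "total time" value is a cycle count
summed over all loops and that the "cpu MHz" value is the frequency of that
cycle counter. The function names are hypothetical and are not part of the test
module.

```c
#include <assert.h>
#include <stddef.h>

/* Average cycles per loop over several runs.
 * totals[] holds the "total time" of each run (assumed to be in cycles). */
double cycles_per_loop(const unsigned long long *totals, size_t nr_runs,
                       unsigned long loops)
{
    unsigned long long sum = 0;

    assert(nr_runs > 0 && loops > 0);
    for (size_t i = 0; i < nr_runs; i++)
        sum += totals[i];
    return (double)sum / (double)nr_runs / (double)loops;
}

/* Convert cycles to nanoseconds: 1 MHz is one cycle per microsecond. */
double cycles_to_ns(double cycles, double cpu_mhz)
{
    assert(cpu_mhz > 0.0);
    return cycles * 1000.0 / cpu_mhz;
}

/* Time saved by the faster result relative to the slower one, in percent. */
double relative_gain_percent(double fast, double slow)
{
    assert(slow > 0.0);
    return (slow - fast) / slow * 100.0;
}
```

Feeding the three flight recorder totals of each configuration to
cycles_per_loop() with 20000 loops, then passing the result to cycles_to_ns()
with 3000.077, gives the per-loop figures quoted in the Res lines.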

Difference between synchronization primitives (total time / number of loops) :
cmpxchg            2967855/20000 = 148.39 cycles or  49.46 ns
cmpxchg-up          540577/20000 =  27.03 cycles or   9.01 ns
irq save/restore  12636562/20000 = 631.83 cycles or 210.60 ns
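
To make the comparison concrete, here is a sketch of a read-modify-write built
on compare-and-swap, using C11 atomics in place of the kernel primitives. It
assumes that cmpxchg is used to advance a write offset when reserving space.
These notes do not state that, and the names reserve_slot and write_offset are
hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical write offset of one buffer. */
static _Atomic size_t write_offset;

/*
 * Reserve `len` bytes and return the offset where the reserved slot begins.
 * The compare-and-swap retries if another context moved the offset between
 * the load and the exchange. On x86 this is typically compiled to a
 * LOCK-prefixed CMPXCHG (the "cmpxchg" row). The "cmpxchg-up" row is
 * presumably the same instruction without the LOCK prefix, which portable
 * C11 cannot express.
 */
size_t reserve_slot(size_t len)
{
    size_t old = atomic_load_explicit(&write_offset, memory_order_relaxed);

    assert(len > 0);
    /* On failure, `old` is updated to the current value and we retry. */
    while (!atomic_compare_exchange_weak_explicit(&write_offset, &old,
                                                  old + len,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed))
        ;
    return old;
}
```

The irq save/restore row corresponds to the other way of protecting the same
update: disable interrupts, do a plain load and store, then restore interrupts.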


* Memory ordering

offset
  written by local CPU
  read by local CPU and other CPUs (reader)

commit_count
  written by local CPU
  read by local CPU and other CPUs (reader)

consumed
  written by any CPU
  read by any CPU

data
  written by local CPU
  read by any CPU


test done in the reader :
  if (consumed < offset)
    if (subbuf.commit_count is a multiple of SUBBUFSIZE)
      read data
      inc consumed
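
The reader test could be written in C roughly as follows. This is a sketch
under assumptions, not the tracer's code: it uses a single sub-buffer of
hypothetical size SUBBUF_SIZE, C11 atomics instead of kernel primitives, and a
reader-side acquire fence to pair with the writer's store fence (these notes
only describe the writer side). Like the pseudocode, it does not distinguish
"nothing committed yet" from "fully committed". All identifiers are
illustrative.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SUBBUF_SIZE 64u   /* hypothetical sub-buffer size in bytes */

struct subbuf {
    _Atomic size_t offset;        /* bytes reserved; written by the local CPU */
    _Atomic size_t commit_count;  /* bytes committed; written by the local CPU */
    _Atomic size_t consumed;      /* bytes consumed; written by any CPU */
    unsigned char data[SUBBUF_SIZE];
};

/*
 * Reader side: copy the sub-buffer into `out` if it is ready.
 * Returns true when data was consumed, false when nothing is readable.
 */
bool try_read_subbuf(struct subbuf *b, unsigned char *out)
{
    size_t consumed = atomic_load_explicit(&b->consumed, memory_order_relaxed);
    size_t offset = atomic_load_explicit(&b->offset, memory_order_relaxed);
    size_t commit;

    assert(out != NULL);
    if (consumed >= offset)
        return false;               /* nothing reserved past consumed */

    commit = atomic_load_explicit(&b->commit_count, memory_order_relaxed);
    if (commit % SUBBUF_SIZE != 0)
        return false;               /* a write is still in progress */

    /* Assumed reader-side fence: order the commit_count load before the
     * data loads, pairing with the writer's store fence. */
    atomic_thread_fence(memory_order_acquire);
    memcpy(out, b->data, SUBBUF_SIZE);

    /* "inc consumed": a sequentially consistent read-modify-write. */
    atomic_fetch_add(&b->consumed, SUBBUF_SIZE);
    return true;
}
```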


We must guarantee the following ordering :

* offset
Seen from the local CPU :
offset must always be incremented before the data is written (already
consistent).

Seen from other CPUs :
offset and data can be written out of order. offset only ever increases, so in
an out-of-order case a reader sees an offset lower than the data actually
ready. This is harmless : the reader does not read the data until commit_count
has been incremented, and that increment is preceded by a store fence.

* commit_count
The commit_count update must be seen by other CPUs only after the data has been
written. Therefore, we must put a store fence (smp_wmb) before the commit_count
write.
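
The commit_count rule can be illustrated with C11 atomics, where a release
fence plays the role of smp_wmb. This is a sketch under assumptions: the
structure layout, the function name commit_record and the sub-buffer size are
hypothetical, and the slot is taken to be already reserved.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define SUBBUF_SIZE 64u   /* hypothetical sub-buffer size in bytes */

struct subbuf {
    _Atomic size_t commit_count;      /* bytes committed so far */
    unsigned char data[SUBBUF_SIZE];
};

/*
 * Writer side: copy a record into an already reserved slot, then publish it.
 * The fence keeps the data stores from becoming visible after the
 * commit_count update.
 */
void commit_record(struct subbuf *b, size_t slot_offset,
                   const void *rec, size_t len)
{
    assert(slot_offset + len <= SUBBUF_SIZE);

    memcpy(&b->data[slot_offset], rec, len);      /* 1. write the data */

    atomic_thread_fence(memory_order_release);    /* 2. store fence */

    /* 3. publish the record by advancing commit_count */
    atomic_fetch_add_explicit(&b->commit_count, len, memory_order_relaxed);
}
```

A reader that loads commit_count and then issues an acquire fence before
touching the data is guaranteed to see every byte written in step 1.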

* consumed
Rarely updated, so use the LOCK prefix, which also acts as a full memory
barrier.
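
In C11 terms, the closest counterpart of a LOCK-prefixed update is a
sequentially consistent atomic read-modify-write. A minimal sketch, assuming
the reader advances consumed with a compare-and-swap; the names consumed and
advance_consumed are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical consumed counter, written and read by any CPU. */
static _Atomic size_t consumed;

/*
 * Advance consumed from `old_val` by `len` bytes.
 * Returns false if another CPU changed consumed first.
 * The default (seq_cst) compare-exchange is a full barrier; on x86 it is
 * typically compiled to a LOCK-prefixed CMPXCHG.
 */
bool advance_consumed(size_t old_val, size_t len)
{
    assert(len > 0);
    return atomic_compare_exchange_strong(&consumed, &old_val, old_val + len);
}
```

Because the update is rare, the cost of the locked instruction measured in the
cmpxchg row is paid only occasionally, unlike on the per-event write path.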


Mathieu Desnoyers, November 2006