1 | |
2 | Atomic UP test results. |
3 | |
4 | |
5 | |
6 | |
7 | using test-time-probe2.ko |
8 | |
9 | Clock speed : cpu MHz : 3000.077 |
10 | |
11 | Tracing inactive |
12 | |
13 | [ 125.787229] test init |
14 | [ 125.787303] test results : time per probe |
15 | [ 125.787306] number of loops : 20000 |
16 | [ 125.787309] total time : 204413 |
17 | [ 125.787312] test end |
18 | [ 175.660402] test init |
19 | [ 175.660475] test results : time per probe |
20 | [ 175.660479] number of loops : 20000 |
21 | [ 175.660482] total time : 203468 |
22 | [ 175.660484] test end |
23 | [ 179.337362] test init |
24 | [ 179.337436] test results : time per probe |
25 | [ 179.337440] number of loops : 20000 |
26 | [ 179.337443] total time : 204757 |
27 | [ 179.337446] test end |
28 | |
29 | Res : 10.21 cycles per loop |
30 | |
31 | Atomic UP, one trace, flight recorder. |
32 | |
33 | [ 357.983971] test init |
34 | [ 357.988837] test results : time per probe |
35 | [ 357.988843] number of loops : 20000 |
36 | [ 357.988846] total time : 12349013 |
37 | [ 357.988849] test end |
38 | [ 358.718896] test init |
39 | [ 358.723049] test results : time per probe |
40 | [ 358.723053] number of loops : 20000 |
41 | [ 358.723057] total time : 12332497 |
42 | [ 358.723059] test end |
43 | [ 359.422038] test init |
44 | [ 359.426173] test results : time per probe |
45 | [ 359.426179] number of loops : 20000 |
46 | [ 359.426182] total time : 12332535 |
47 | [ 359.426185] test end |
48 | |
49 | Res : 616.90 cycles per loop. |
50 | 205.63 ns per loop |
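The Res figures above are consistent with averaging the three "total time" values, dividing by the number of loops, and converting cycles to nanoseconds with the reported clock speed. A minimal sketch of that arithmetic, assuming "total time" is a cycle count; the function names are hypothetical and are not taken from test-time-probe2.ko:

```c
#include <assert.h>
#include <stddef.h>

/* Average cycles per loop over several runs of `loops` iterations each.
 * Assumes each entry of `totals` is a cycle count for one whole run. */
static double cycles_per_loop(const unsigned long long *totals, size_t runs,
                              unsigned long loops)
{
    unsigned long long sum = 0;

    assert(runs > 0 && loops > 0);
    for (size_t i = 0; i < runs; i++)
        sum += totals[i];
    return (double)sum / (double)runs / (double)loops;
}

/* Convert a cycle count to nanoseconds for a clock given in MHz:
 * cycles / MHz yields microseconds, hence the factor of 1000. */
static double cycles_to_ns(double cycles, double cpu_mhz)
{
    assert(cpu_mhz > 0.0);
    return cycles * 1000.0 / cpu_mhz;
}
```

With the three flight recorder totals (12349013, 12332497, 12332535), 20000 loops and 3000.077 MHz, this gives about 616.90 cycles and 205.63 ns per loop.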
51 | |
52 | Atomic SMP, one trace, flight recorder. |
53 | |
54 | |
55 | [ 111.694180] test init |
56 | [ 111.700191] test results : time per probe |
57 | [ 111.700198] number of loops : 20000 |
58 | [ 111.700201] total time : 16925670 |
59 | [ 111.700204] test end |
60 | [ 112.285716] test init |
61 | [ 112.291321] test results : time per probe |
62 | [ 112.291326] number of loops : 20000 |
63 | [ 112.291329] total time : 16766633 |
64 | [ 112.291332] test end |
65 | [ 112.880602] test init |
66 | [ 112.884739] test results : time per probe |
67 | [ 112.884743] number of loops : 20000 |
68 | [ 112.884746] total time : 12358237 |
69 | [ 112.884748] test end |
70 | |
71 | Res : 767.51 cycles per loop |
72 | 255.83 ns per loop |
73 | |
74 | UP vs SMP : (255.83-205.63)/255.83 * 100% = 19.62 % less time per loop on UP |
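The percentage expresses the per-loop time saved by the UP build relative to the SMP build. A small sketch of the calculation; the helper name is hypothetical:

```c
#include <assert.h>

/* Percentage by which `fast` is lower than `slow`, relative to `slow`. */
static double percent_reduction(double slow, double fast)
{
    assert(slow > 0.0);
    return (slow - fast) / slow * 100.0;
}
```

Calling percent_reduction(255.83, 205.63) yields roughly 19.62.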
75 | |
76 | |
77 | Cost of each synchronization primitive (total cycles / 20000 loops) : |
78 | cmpxchg 2967855/20000 = 148.39 cycles or 49.46 ns |
79 | cmpxchg-up 540577/20000 = 27.02 cycles or 9.00 ns |
80 | irq save/restore 12636562/20000 = 631.82 cycles or 210.60 ns |
81 | |
82 | |
83 | |
84 | * Memory ordering |
85 | |
86 | offset |
87 | written by local CPU |
88 | read by local CPU and other CPUs (reader) |
89 | |
90 | commit count |
91 | written by local CPU |
92 | read by local CPU and other CPUs (reader) |
93 | |
94 | consumed |
95 | written by any CPU |
96 | read by any CPU |
97 | |
98 | data |
99 | written by local CPU |
100 | read by any CPU |
101 | |
102 | |
103 | test done in the reader : |
104 | if ( consumed < offset ) |
105 | if ( subbuf.commit_count == multiple of SUBBUFSIZE) |
106 | read data |
107 | inc consumed |
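The reader test can be sketched in C as follows. This is only an illustration of the control flow: the structure and names are hypothetical, the sub-buffer size is an arbitrary value, memory ordering is ignored here, and "inc consumed" is assumed to advance by one sub-buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SUBBUF_SIZE 4096u   /* hypothetical sub-buffer size, in bytes */

/* Hypothetical per-buffer state mirroring the fields listed above. */
struct subbuf_state {
    size_t offset;        /* write position, advanced by the writer */
    size_t consumed;      /* read position, advanced by readers */
    size_t commit_count;  /* bytes committed in the sub-buffer */
};

/* Returns true and advances `consumed` if a sub-buffer can be read. */
static bool reader_try_consume(struct subbuf_state *s)
{
    assert(s != NULL);
    if (s->consumed >= s->offset)
        return false;               /* nothing written past consumed */
    if (s->commit_count % SUBBUF_SIZE != 0)
        return false;               /* sub-buffer not fully committed */
    /* ... read data here ... */
    s->consumed += SUBBUF_SIZE;     /* assumption: step is one sub-buffer */
    return true;
}
```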
108 | |
109 | |
110 | We must guarantee the following ordering : |
111 | * offset |
112 | Seen from the local CPU : |
113 | offset must always be incremented before the data is written (already |
114 | consistent) |
115 | |
116 | Seen from other cpus : |
117 | offset and data can be written out of order. |
118 | offset only ever increases, so in an out-of-order case other CPUs see an offset |
119 | lower than the data actually ready. The reader still cannot read the data until |
120 | commit_count has been incremented, and that increment is preceded by a store fence. |
121 | |
122 | * commit_count |
123 | commit_count must always be seen by other CPUs after the data has been written. |
124 | Therefore, we must put a store fence before the commit_count write. (smp_wmb) |
125 | |
126 | * consumed |
127 | Rarely updated, use LOCK prefix. Acts as a full memory barrier. |
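The ordering rules above can be approximated in user space with C11 atomics. This is a sketch under stated assumptions, not the kernel implementation: the structure and function names are hypothetical, a release fence stands in for smp_wmb() (it is somewhat stronger than a pure store fence), the acquire fence on the reader side is an added assumption needed for the pairing in the C11 model, and a sequentially consistent compare-and-swap stands in for the LOCK-prefixed update of consumed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SUBBUF_SIZE 4096u            /* hypothetical sub-buffer size */

struct subbuf {                      /* hypothetical layout */
    char data[SUBBUF_SIZE];
    atomic_size_t commit_count;      /* written by the local CPU only */
    atomic_size_t consumed;          /* written by any CPU */
};

/* Writer: store the payload, fence, then publish through commit_count,
 * so other CPUs never see the new commit_count before the data. */
static void writer_commit(struct subbuf *b, size_t pos,
                          const void *src, size_t len)
{
    assert(pos <= SUBBUF_SIZE && len <= SUBBUF_SIZE - pos);
    memcpy(b->data + pos, src, len);
    atomic_thread_fence(memory_order_release);   /* role of smp_wmb() */
    atomic_fetch_add_explicit(&b->commit_count, len, memory_order_relaxed);
}

/* Reader: load commit_count, then fence before touching the data. */
static size_t reader_commit_count(struct subbuf *b)
{
    size_t n = atomic_load_explicit(&b->commit_count, memory_order_relaxed);

    atomic_thread_fence(memory_order_acquire);
    return n;
}

/* Any CPU: move consumed forward with a sequentially consistent
 * compare-and-swap; fails if another CPU already moved it. */
static bool consumed_advance(struct subbuf *b, size_t old_val, size_t new_val)
{
    assert(new_val >= old_val);
    return atomic_compare_exchange_strong(&b->consumed, &old_val, new_val);
}
```

On x86 a sequentially consistent compare-and-swap is typically compiled to a LOCK CMPXCHG instruction, which also orders all earlier and later memory accesses.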
128 | |
129 | |
130 | |
131 | Mathieu Desnoyers, November 2006 |