5 Use timestamp counter to calculate the time spent, with interrupts disabled.
6 Machine : Pentium 4 3GHz
7 Fully preemptible kernel
8 marker : MARK(subsys_mark1, "%d %p", 1, NULL);
9 Linux Kernel Markers 0.19
11 * Execute an empty loop
13 time delta (cycles): 15026497
14 cycles per loop : 1.50
15 - i386 "optimized" : immediate value, test and predicted branch
16 (non connected marker)
18 time delta (cycles): 40031640
19 cycles per loop : 4.00
20 cycles per loop for marker : 2.50
21 - i386 "generic" : load, test and predicted branch
22 (non connected marker)
24 time delta (cycles): 26697878
25 cycles per loop : 2.67
26 cycles per loop for marker : 1.17
28 * Execute a loop of memcpy 4096 bytes
31 time delta (cycles): 12981555
32 cycles per loop : 1298.16
33 - i386 "optimized" : immediate value, test and predicted branch
34 (non connected marker)
36 time delta (cycles): 12982290
37 cycles per loop : 1298.23
38 cycles per loop for marker : 0.074
39 - i386 "generic" : load, test and predicted branch
40 (non connected marker)
42 time delta (cycles): 13002788
43 cycles per loop : 1300.28
44 cycles per loop for marker : 2.123
47 The following tests are done with the "optimized" markers only
49 Execute a loop with marker enabled, with i386 "fastcall" register argument
52 time delta (cycles): 4407608
53 cycles per loop : 44.08
54 cycles per loop to setup arguments in registers : 44.08-4.00=40.08
56 Execute a loop with a marker enabled, with an empty probe. Var args argument
59 time delta (cycles): 5210587
60 cycles per loop : 52.11
61 additional cycles per loop to setup var args : 52.11-44.08=8.03
63 Execute a loop with marker enabled, with i386 "asmlinkage" arguments expected.
64 Data is copied by the probe.
66 time delta (cycles): 5299837
67 cycles per loop : 53.00
68 additional cycles per loop to get arguments in probe (from stack) on x86 :
71 Execute a loop with marker enabled, with var args probe expecting arguments.
72 Data is copied by the probe.
74 time delta (cycles): 5574300
75 cycles per loop : 55.74
76 additional cycles per loop to get expected variable arguments on x86 :
79 Execute a loop with marker enabled, with var args probe, format string
80 Data is copied by the probe.
83 time delta (cycles): 9622117
84 cycles per loop : 96.22
85 additional cycles per loop to dynamically parse arguments with format string :
94 static int my_open(struct inode *inode, struct file *file)
97 1: 89 e5 mov %esp,%ebp
98 3: 83 ec 0c sub $0xc,%esp
99 MARK(subsys_mark1, "%d %p", 1, NULL);
100 6: b0 00 mov $0x0,%al
101 8: 84 c0 test %al,%al
102 a: 75 07 jne 13 <my_open+0x13>
106 c: b8 ff ff ff ff mov $0xffffffff,%eax
109 13: b8 01 00 00 00 mov $0x1,%eax
110 18: e8 fc ff ff ff call 19 <my_open+0x19>
111 1d: c7 44 24 08 00 00 00 movl $0x0,0x8(%esp)
113 25: c7 44 24 04 01 00 00 movl $0x1,0x4(%esp)
115 2d: c7 04 24 0d 00 00 00 movl $0xd,(%esp)
116 34: ff 15 74 10 00 00 call *0x1074
117 3a: b8 01 00 00 00 mov $0x1,%eax
118 3f: e8 fc ff ff ff call 40 <my_open+0x40>
119 44: eb c6 jmp c <my_open+0xc>
124 static int my_open(struct inode *inode, struct file *file)
127 1: 89 e5 mov %esp,%ebp
128 3: 83 ec 0c sub $0xc,%esp
129 MARK(subsys_mark1, "%d %p", 1, NULL);
130 6: 0f b6 05 20 10 00 00 movzbl 0x1020,%eax
131 d: 84 c0 test %al,%al
132 f: 75 07 jne 18 <my_open+0x18>
136 11: b8 ff ff ff ff mov $0xffffffff,%eax
139 18: b8 01 00 00 00 mov $0x1,%eax
140 1d: e8 fc ff ff ff call 1e <my_open+0x1e>
141 22: c7 44 24 08 00 00 00 movl $0x0,0x8(%esp)
143 2a: c7 44 24 04 01 00 00 movl $0x1,0x4(%esp)
145 32: c7 04 24 0d 00 00 00 movl $0xd,(%esp)
146 39: ff 15 74 10 00 00 call *0x1074
147 3f: b8 01 00 00 00 mov $0x1,%eax
148 44: e8 fc ff ff ff call 45 <my_open+0x45>
149 49: eb c6 jmp 11 <my_open+0x11>
155 Adds 6 bytes in the "likely" path.
156 Adds 32 bytes in the "unlikely" path.
160 Adds 11 bytes in the "likely" path.
161 Adds 32 bytes in the "unlikely" path.
166 Compiling a 2.6.17 kernel on a Pentium 4 3GHz, 1GB ram, cold cache.
167 Running a 2.6.17 vanilla kernel :
172 Running a 2.6.17 kernel with lttng-0.6.0pre11 markers (no probe connected) :
177 Ping flood on loopback interface :
178 Running a 2.6.17 vanilla kernel :
179 136596 packets transmitted, 136596 packets received, 0% packet loss
180 round-trip min/avg/max = 0.0/0.0/0.1 ms
186 12601 packets transmitted/s
188 Running a 2.6.17 kernel with lttng-0.6.0pre11 markers (no probe connected) :
189 108504 packets transmitted, 108504 packets received, 0% packet loss
190 round-trip min/avg/max = 0.0/0.0/0.1 ms
196 12596 packets transmitted/s
202 In an empty loop, the generic marker is faster than the optimized marker. This
203 may be due to better performances of the movzbl instruction over the movb on the
204 Pentium 4 architecture. However, when we execute a loop of 4kB copy, the impact
205 of the movzbl becomes greater because it uses the memory bandwidth.
207 The preemption disabling and call to a probe itself costs 48.11 cycles, almost
208 as much as dynamically parsing the format string to get the variable arguments
211 There is almost no difference, on x86, between passing the arguments directly on
212 the stack and using a variable argument list when its layout is known
213 statically (0.89 cycles vs 2.74 cycles).