urcu-qsbr: batch concurrent synchronize_rcu()
Benchmarks of batching concurrent synchronize_rcu() calls show a
substantial scalability improvement. For example, on a 24-core AMD
machine with a write-heavy scenario (4 reader threads, 20 updater
threads, each updater calling synchronize_rcu()):
* Serialized grace periods:
./test_urcu_qsbr 4 20 20
SUMMARY ./test_urcu_qsbr testdur 20 nr_readers 4 rdur 0 wdur 0 nr_writers 20 wdelay 0 nr_reads 20251412728 nr_writes 1826331 nr_ops 20253239059
* Batched grace periods:
./test_urcu_qsbr 4 20 20
SUMMARY ./test_urcu_qsbr testdur 20 nr_readers 4 rdur 0 wdur 0 nr_writers 20 wdelay 0 nr_reads 15141994746 nr_writes 9382515 nr_ops 15151377261
This is a 9382515 / 1826331 = 5.14x speedup in write throughput for
20 updaters.
Readers do slow down (nr_reads drops by about 25%), probably because of
the increased update traffic, since the read-side code is unchanged.
Now let's look at the penalty of managing the stack for a single updater.
With 4 readers and a single updater:
* Serialized grace periods:
./test_urcu_qsbr 4 1 20
SUMMARY ./test_urcu_qsbr testdur 20 nr_readers 4 rdur 0 wdur 0 nr_writers 1 wdelay 0 nr_reads 19240784755 nr_writes 2130839 nr_ops 19242915594
* Batched grace periods:
./test_urcu_qsbr 4 1 20
SUMMARY ./test_urcu_qsbr testdur 20 nr_readers 4 rdur 0 wdur 0 nr_writers 1 wdelay 0 nr_reads 19160162768 nr_writes 2253068 nr_ops 19162415836
2253068 vs 2137036 nr_writes: a couple of runs show that this difference
is lost in the noise for a single updater.
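To make the batching idea and the "stack" mentioned above concrete, here
is a minimal sketch. It is an illustration under stated assumptions, not
the code of this patch: all names (batched_synchronize, wait_for_readers,
run_updaters) are hypothetical, the waiter stack is protected by a plain
mutex for simplicity, and the grace period is replaced by a counter. Each
caller pushes itself on the stack; the caller that finds the stack empty
becomes the leader, takes the whole stack, performs one grace period on
behalf of everyone queued, and wakes them up.

```c
#include <pthread.h>
#include <stddef.h>

/* One entry per thread currently blocked in batched_synchronize(). */
struct waiter {
	struct waiter *next;
	int done;
};

static pthread_mutex_t stack_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t gp_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t done_cond = PTHREAD_COND_INITIALIZER;
static struct waiter *waiters;		/* top of the waiter stack */
static unsigned long nr_grace_periods;	/* protected by gp_lock */

/* Hypothetical stand-in for the expensive wait on all readers. */
static void wait_for_readers(void)
{
	nr_grace_periods++;
}

static void batched_synchronize(void)
{
	struct waiter self = { NULL, 0 };
	struct waiter *batch, *w, *next;
	int leader;

	pthread_mutex_lock(&stack_lock);
	leader = (waiters == NULL);	/* first on an empty stack leads */
	self.next = waiters;
	waiters = &self;
	if (!leader) {
		/* Followers sleep until a leader completes their batch. */
		while (!self.done)
			pthread_cond_wait(&done_cond, &stack_lock);
		pthread_mutex_unlock(&stack_lock);
		return;
	}
	pthread_mutex_unlock(&stack_lock);

	pthread_mutex_lock(&gp_lock);	/* one grace period at a time */
	pthread_mutex_lock(&stack_lock);
	batch = waiters;		/* take everyone queued so far */
	waiters = NULL;
	pthread_mutex_unlock(&stack_lock);

	wait_for_readers();		/* a single wait serves the batch */

	pthread_mutex_lock(&stack_lock);
	for (w = batch; w != NULL; w = next) {
		next = w->next;		/* read before the waiter may return */
		w->done = 1;
	}
	pthread_cond_broadcast(&done_cond);
	pthread_mutex_unlock(&stack_lock);
	pthread_mutex_unlock(&gp_lock);
}

static void *updater_thread(void *arg)
{
	int i, nr_calls = *(int *)arg;

	for (i = 0; i < nr_calls; i++)
		batched_synchronize();
	return NULL;
}

/* Run up to 64 updater threads; return the grace periods performed. */
static unsigned long run_updaters(int nr_threads, int nr_calls)
{
	pthread_t tid[64];
	int i, started = 0;

	if (nr_threads > 64)
		nr_threads = 64;
	nr_grace_periods = 0;
	for (i = 0; i < nr_threads; i++) {
		if (pthread_create(&tid[i], NULL, updater_thread, &nr_calls))
			break;
		started++;
	}
	for (i = 0; i < started; i++)
		pthread_join(tid[i], NULL);
	return nr_grace_periods;
}
```

With a single updater every call is its own leader, so
run_updaters(1, 100) performs exactly 100 grace periods and the only
extra cost is the stack handling. With N concurrent updaters, one grace
period can serve up to N callers, which is where the speedup comes from.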
More benchmark results:
* Serialized synchronize_rcu() -- test_urcu_qsbr
./test_urcu_qsbr 4 1 20
SUMMARY ./test_urcu_qsbr testdur 20 nr_readers 4 rdur 0 wdur 0 nr_writers 1 wdelay 0 nr_reads 18841016559 nr_writes 1857130 nr_ops 18842873689
./test_urcu_qsbr 4 20 20
SUMMARY ./test_urcu_qsbr testdur 20 nr_readers 4 rdur 0 wdur 0 nr_writers 20 wdelay 0 nr_reads 20272811733 nr_writes 1837027 nr_ops 20274648760
./test_urcu_qsbr 12 12 20
SUMMARY ./test_urcu_qsbr testdur 20 nr_readers 12 rdur 0 wdur 0 nr_writers 12 wdelay 0 nr_reads 60343516643 nr_writes 2353685 nr_ops 60345870328
./test_urcu_qsbr 16 8 20
SUMMARY ./test_urcu_qsbr testdur 20 nr_readers 16 rdur 0 wdur 0 nr_writers 8 wdelay 0 nr_reads 78202711840 nr_writes 2326331 nr_ops 78205038171
./test_urcu_qsbr 20 4 20
SUMMARY ./test_urcu_qsbr testdur 20 nr_readers 20 rdur 0 wdur 0 nr_writers 4 wdelay 0 nr_reads 94553396003 nr_writes 2238396 nr_ops 94555634399
./test_urcu_qsbr 20 3 20
SUMMARY ./test_urcu_qsbr testdur 20 nr_readers 20 rdur 0 wdur 0 nr_writers 3 wdelay 0 nr_reads 95004708661 nr_writes 2165966 nr_ops 95006874627
./test_urcu_qsbr 20 2 20
SUMMARY ./test_urcu_qsbr testdur 20 nr_readers 20 rdur 0 wdur 0 nr_writers 2 wdelay 0 nr_reads 95386506198 nr_writes 2194352 nr_ops 95388700550
./test_urcu_qsbr 20 1 20
SUMMARY ./test_urcu_qsbr testdur 20 nr_readers 20 rdur 0 wdur 0 nr_writers 1 wdelay 0 nr_reads 84705972017 nr_writes 2609595 nr_ops 84708581612
* Batched synchronize_rcu() -- test_urcu_qsbr
./test_urcu_qsbr 4 1 20
SUMMARY ./test_urcu_qsbr testdur 20 nr_readers 4 rdur 0 wdur 0 nr_writers 1 wdelay 0 nr_reads 19154850714 nr_writes 2238834 nr_ops 19157089548
./test_urcu_qsbr 4 20 20
SUMMARY ./test_urcu_qsbr testdur 20 nr_readers 4 rdur 0 wdur 0 nr_writers 20 wdelay 0 nr_reads 15114131760 nr_writes 9370255 nr_ops 15123502015
./test_urcu_qsbr 12 12 20
SUMMARY ./test_urcu_qsbr testdur 20 nr_readers 12 rdur 0 wdur 0 nr_writers 12 wdelay 0 nr_reads 45541854970 nr_writes 5786496 nr_ops 45547641466
./test_urcu_qsbr 16 8 20
SUMMARY ./test_urcu_qsbr testdur 20 nr_readers 16 rdur 0 wdur 0 nr_writers 8 wdelay 0 nr_reads 66217337547 nr_writes 4257427 nr_ops 66221594974
./test_urcu_qsbr 20 4 20
SUMMARY ./test_urcu_qsbr testdur 20 nr_readers 20 rdur 0 wdur 0 nr_writers 4 wdelay 0 nr_reads 95048642908 nr_writes 2416266 nr_ops 95051059174
./test_urcu_qsbr 20 3 20
SUMMARY ./test_urcu_qsbr testdur 20 nr_readers 20 rdur 0 wdur 0 nr_writers 3 wdelay 0 nr_reads 96679609928 nr_writes 2211168 nr_ops 96681821096
./test_urcu_qsbr 20 2 20
SUMMARY ./test_urcu_qsbr testdur 20 nr_readers 20 rdur 0 wdur 0 nr_writers 2 wdelay 0 nr_reads 92166219811 nr_writes 1968725 nr_ops 92168188536
./test_urcu_qsbr 20 1 20
SUMMARY ./test_urcu_qsbr testdur 20 nr_readers 20 rdur 0 wdur 0 nr_writers 1 wdelay 0 nr_reads 87986181951 nr_writes 3278737 nr_ops 87989460688
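To compare the two result sets above more easily, the following small
helper computes the batched/serialized nr_writes ratio per
configuration. The nr_writes figures are copied from the runs listed
above; the struct and function names are made up for this illustration.

```c
#include <stdio.h>

/* nr_writes for one reader/writer configuration, from the runs above. */
struct run {
	int nr_readers;
	int nr_writers;
	unsigned long long serialized_writes;
	unsigned long long batched_writes;
};

static const struct run runs[] = {
	{  4,  1, 1857130ULL, 2238834ULL },
	{  4, 20, 1837027ULL, 9370255ULL },
	{ 12, 12, 2353685ULL, 5786496ULL },
	{ 16,  8, 2326331ULL, 4257427ULL },
	{ 20,  4, 2238396ULL, 2416266ULL },
	{ 20,  3, 2165966ULL, 2211168ULL },
	{ 20,  2, 2194352ULL, 1968725ULL },
	{ 20,  1, 2609595ULL, 3278737ULL },
};

#define NR_RUNS (sizeof(runs) / sizeof(runs[0]))

/* Ratio > 1.0 means the batched version completed more updates. */
static double write_ratio(const struct run *r)
{
	return (double)r->batched_writes / (double)r->serialized_writes;
}

/* Print one line per configuration. */
static void print_write_ratios(void)
{
	size_t i;

	for (i = 0; i < NR_RUNS; i++)
		printf("%2d readers %2d writers: %.2fx\n",
		       runs[i].nr_readers, runs[i].nr_writers,
		       write_ratio(&runs[i]));
}
```

Calling print_write_ratios() from main() shows ratios ranging from about
0.9 (20 readers, 2 writers) to about 5.1 (4 readers, 20 writers). The
gain grows with the number of concurrent updaters, while the results
with few updaters stay close to 1.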
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Lai Jiangshan <laijs@cn.fujitsu.com>
CC: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>