Mathieu Desnoyers [Mon, 9 Feb 2009 17:06:53 +0000 (12:06 -0500)]
don't define __USE_GNU in urcu.h
Not required. Let the .c do it.
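A minimal sketch of what "let the .c do it" looks like under the usual glibc convention (an assumption about how the fix was applied): the source file defines `_GNU_SOURCE` before its first `#include`, and glibc's `<features.h>` derives the internal `__USE_GNU` from it, so the header never has to set that internal macro. The helper `cpu0_only_set()` is hypothetical; `cpu_set_t` and the `CPU_*` macros are GNU extensions from `<sched.h>` that are only visible when `_GNU_SOURCE` is defined.

```c
/* Must come before any system header: glibc's <features.h> derives
 * the internal __USE_GNU from _GNU_SOURCE, so urcu.h need not touch it. */
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Hypothetical helper: build a CPU mask containing only CPU 0.
 * cpu_set_t and the CPU_* macros are only declared when
 * _GNU_SOURCE is defined before the includes. */
static cpu_set_t cpu0_only_set(void)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(0, &set);
	return set;
}
```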
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Mon, 9 Feb 2009 17:05:45 +0000 (12:05 -0500)]
Include pthread.h in urcu.h
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Mon, 9 Feb 2009 17:04:15 +0000 (12:04 -0500)]
Add branch prediction, fix xchg for -mtune=core2
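A sketch of the usual way branch prediction hints are written with GCC, assuming the commit follows the Linux kernel's `likely()`/`unlikely()` convention; `is_outermost()` is a hypothetical example use. The xchg change for `-mtune=core2` is not reproduced here.

```c
#include <assert.h>

/* Branch prediction hints in the style of the Linux kernel.
 * __builtin_expect is a GCC (and Clang) builtin that returns its first
 * argument; !!(x) normalizes any scalar to 0 or 1 so it can be
 * compared with the expected value. */
#define likely(x)	__builtin_expect(!!(x), 1)
#define unlikely(x)	__builtin_expect(!!(x), 0)

/* Hypothetical fast-path example: nested read-side sections are rare,
 * so the outermost case is marked as the likely branch. */
static int is_outermost(unsigned long nest_count)
{
	if (likely(nest_count == 0))
		return 1;
	return 0;
}
```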
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Mon, 9 Feb 2009 06:48:04 +0000 (01:48 -0500)]
Add urcu-asm.c
Small file to get the assembly output of the fast path.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Mon, 9 Feb 2009 06:44:01 +0000 (01:44 -0500)]
Add rcutorture with yield
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Mon, 9 Feb 2009 06:31:05 +0000 (01:31 -0500)]
Add rcutorture
Add a modified version of rcutorture.h and api.h.
Used by urcutorture.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Mon, 9 Feb 2009 06:27:48 +0000 (01:27 -0500)]
Fix lock -> unlock in synchronize_rcu
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Mon, 9 Feb 2009 05:54:50 +0000 (00:54 -0500)]
Use xchg in publish content
Also changes the publish content parameters: it now takes the pointer itself as
first parameter rather than a pointer to the pointer.
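A sketch of the new calling convention, assuming a GNU C statement-expression macro (which is what lets it take the pointer itself rather than its address) and the GCC `__atomic_exchange_n()` builtin, which compiles to an `xchg` on x86. `publish_content_sketch()` and `synchronize_rcu_stub()` are hypothetical names; the stub only counts calls where a real implementation would wait for a grace period.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder for the grace-period wait: a real implementation would
 * block until every pre-existing reader has finished. */
static int grace_periods_waited;

static void synchronize_rcu_stub(void)
{
	grace_periods_waited++;
}

/* Hypothetical sketch: takes the shared pointer itself (an lvalue),
 * atomically swaps in the new value with a full-barrier exchange,
 * waits for a grace period, then evaluates to the old value so the
 * caller can free it. Uses GNU statement expressions. */
#define publish_content_sketch(ptr, newval)				\
	({								\
		__typeof__(ptr) publish_old_ =				\
			__atomic_exchange_n(&(ptr), (newval),		\
					    __ATOMIC_SEQ_CST);		\
		synchronize_rcu_stub();					\
		publish_old_;						\
	})
```

The caller frees the returned old pointer after the macro returns; with a real grace-period wait, no reader can still hold it at that point.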
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Mon, 9 Feb 2009 05:29:58 +0000 (00:29 -0500)]
Add rcu_assign_pointer
rcu_assign_pointer has a memory barrier which lets the writer make sure the data
has been properly written to memory before setting the pointer.
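A sketch of the publish ordering described above. Expressing it with C11 atomics is an assumption: the commit describes a memory barrier before the pointer store, and a release store gives the same guarantee that the initialization of the structure is visible before the pointer is. `rcu_assign_pointer_sketch()`, `rcu_dereference_sketch()`, `struct foo` and `publish_new_foo()` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct foo {
	int a;
	int b;
};

/* Shared pointer read by RCU readers (initially NULL). */
static _Atomic(struct foo *) global_foo;

/* Release store: all writes made before it (the initialization of *v)
 * become visible no later than the new pointer value itself. */
#define rcu_assign_pointer_sketch(p, v) \
	atomic_store_explicit(&(p), (v), memory_order_release)

/* Reader-side counterpart. An acquire load is stronger than strictly
 * needed; the kernel relies on address-dependency ordering instead. */
#define rcu_dereference_sketch(p) \
	atomic_load_explicit(&(p), memory_order_acquire)

/* Writer: fully initialize the new structure, then publish it. */
static struct foo *publish_new_foo(int a, int b)
{
	struct foo *f = malloc(sizeof(*f));

	if (!f)
		return NULL;
	f->a = a;
	f->b = b;
	rcu_assign_pointer_sketch(global_foo, f);
	return f;
}
```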
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Mon, 9 Feb 2009 04:56:15 +0000 (23:56 -0500)]
Remove parameter from rcu_read_lock()
Also makes the read fast path twice as fast :
7 cycles instead of 14 on an 8-core x86_64.
Mathieu :
I limited the number of nested readers to 256. That should be enough, and it lets
us use testb generically.
Changed the 64-bit code to make it the same as the 32-bit code. I prefer to have
the exact same behavior on both architectures.
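A single-threaded sketch of a parameterless, nestable `rcu_read_lock()`, under assumed names and layout: the low 8 bits of a per-thread word hold the nesting depth and bit 8 holds the grace-period parity snapshot, so the "outermost reader?" check only looks at one byte, which is what makes a `testb`-style test possible. With an 8-bit field this sketch allows at most 255 nested levels. It omits the compiler and memory barriers a real reader needs.

```c
#include <assert.h>

#define GP_COUNT	1UL			/* one nesting level */
#define GP_CTR_BIT	(1UL << 8)		/* parity bit, flipped by the writer */
#define GP_NEST_MASK	(GP_CTR_BIT - 1)	/* low byte: nesting depth */

/* Global counter: nesting field preset to 1, parity bit initially 0. */
static unsigned long gp_ctr = GP_COUNT;
/* Per-thread reader state; a depth of 0 means "not in a read section". */
static __thread unsigned long active_readers;

static inline void rcu_read_lock_sketch(void)
{
	unsigned long tmp = active_readers;

	if (!(tmp & GP_NEST_MASK)) {
		/* Outermost lock: copy the parity and a depth of 1 at once. */
		active_readers = gp_ctr;
	} else {
		/* Nested lock: the 8-bit depth field must not overflow. */
		assert((tmp & GP_NEST_MASK) != GP_NEST_MASK);
		active_readers = tmp + GP_COUNT;
	}
}

static inline void rcu_read_unlock_sketch(void)
{
	active_readers -= GP_COUNT;
}
```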
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Mon, 9 Feb 2009 00:21:34 +0000 (19:21 -0500)]
Add randomness to yield debug test
Add randomness so that the yield is done with 1/2 probability.
This lets some paths go quickly and others not.
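A sketch of the randomized yield, assuming a debug switch (`yield_active`) and a per-thread `rand_r()` seed; both names and `debug_yield()` are hypothetical. Testing the low bit of the random value gives roughly 1/2 odds of calling `sched_yield()`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <stdlib.h>

/* Hypothetical debug switch: when nonzero, yield points are active. */
static int yield_active = 1;

/* Per-thread seed so rand_r() can be called from several threads. */
static __thread unsigned int yield_seed = 1;

/* Yield the CPU with roughly 1/2 probability, so some passes through
 * a code path run straight through and others are stretched out.
 * Returns 1 if it yielded (useful for testing), 0 otherwise. */
static int debug_yield(void)
{
	if (yield_active && (rand_r(&yield_seed) & 0x1)) {
		sched_yield();
		return 1;
	}
	return 0;
}
```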
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Mon, 9 Feb 2009 00:01:58 +0000 (19:01 -0500)]
Add DEBUG_YIELD, add test duration
Add some tests that call the scheduler, and add a duration parameter to
test_urcu.c.
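A sketch of a duration-bounded test run, assuming the main thread sleeps for the requested number of seconds and then raises a stop flag that the worker threads poll. `test_stop`, `reader_loop()` and `run_for()` are hypothetical names, not the code in test_urcu.c.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <unistd.h>

/* Hypothetical stop flag, raised once the test duration expires. */
static atomic_int test_stop;

/* Worker: spin until told to stop, counting iterations. */
static void *reader_loop(void *arg)
{
	unsigned long *loops = arg;

	while (!atomic_load(&test_stop))
		(*loops)++;
	return NULL;
}

/* Run one worker for 'duration' seconds and return its loop count. */
static unsigned long run_for(unsigned int duration)
{
	pthread_t tid;
	unsigned long loops = 0;

	atomic_store(&test_stop, 0);
	if (pthread_create(&tid, NULL, reader_loop, &loops) != 0)
		return 0;
	sleep(duration);
	atomic_store(&test_stop, 1);
	pthread_join(tid, NULL);	/* join makes 'loops' safe to read */
	return loops;
}
```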
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Sun, 8 Feb 2009 22:10:22 +0000 (17:10 -0500)]
Change API
* rcu_read_lock
> A patch below to allow nested rcu_read_lock() while keeping to the Linux
> kernel API, just FYI. One can argue that the overhead of accessing the
> extra per-thread variables is offset by the fact that there no longer
> needs to be a return value from rcu_read_lock() nor an argument to
> rcu_read_unlock(), but hard to say.
>
I ran your modified version within my benchmarks :
with return value : 14.164 cycles per read
without return value : 16.4017 cycles per read
So we have a 14% performance decrease due to this. We also pollute the
branch prediction buffer and we add a cache access due to the added
variables in the TLS. Returning the value has the clear advantage of
letting the compiler keep it around in registers or on the stack, which
clearly costs less.
So I think the speed factor outweighs the visual considerations. Maybe
we could switch to something like :
unsigned int qparity;
urcu_read_lock(&qparity);
...
urcu_read_unlock(&qparity);
That would be a bit like local_irq_save() in the kernel, except that we
could do it in a static inline because we pass the address. I
personally dislike the local_irq_save() way of hiding the fact that it
writes to the variable in a "clever" macro. I'd really prefer to leave
the " & ".
* rcu_write_lock
Removed.
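A worked check of the 14% figure in the benchmark discussion above, assuming it is the drop in read throughput (extra cycles divided by the slower per-read cost); measured against the faster variant, the same gap is about 16% more cycles per read:

```latex
\[
\frac{16.4017 - 14.164}{16.4017} = \frac{2.2377}{16.4017} \approx 0.136 \approx 14\%,
\qquad
\frac{2.2377}{14.164} \approx 0.158
\]
```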
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
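A sketch of the address-passing read-side API proposed in the "Change API" message above. Only the call shape `urcu_read_lock(&qparity)` / `urcu_read_unlock(&qparity)` comes from that message; the body (a global parity plus per-thread counters per parity, with no memory barriers) is an assumption for illustration.

```c
#include <assert.h>

/* Hypothetical state: the global grace-period parity (0 or 1) and,
 * per reader thread, one counter per parity. */
static unsigned int gp_parity;
static __thread long reader_count[2];

/* The caller owns qparity (typically a local variable), so the
 * compiler can keep it in a register or on the stack; the explicit
 * '&' at the call site shows that the function writes to it. */
static inline void urcu_read_lock(unsigned int *qparity)
{
	*qparity = gp_parity;
	reader_count[*qparity]++;
}

static inline void urcu_read_unlock(unsigned int *qparity)
{
	reader_count[*qparity]--;
}
```

Each nesting level declares its own `unsigned int qparity` local, which is what lets the compiler avoid the extra TLS access the message measures.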
Mathieu Desnoyers [Sun, 8 Feb 2009 20:42:08 +0000 (15:42 -0500)]
Add timing tests
initial results :
On an 8-core x86_64
test_rwlock_timing.c : 4339.07 cycles
test_urcu_timing.c : 14.16 cycles
Speedup : 306
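The speedup reported above is the ratio of the two per-read costs:

```latex
\[
\text{speedup} = \frac{4339.07}{14.16} \approx 306.4
\]
```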
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Paul E. McKenney [Sun, 8 Feb 2009 00:00:55 +0000 (19:00 -0500)]
Do two parity flips in the writer to fix a race condition
On Sat, Feb 07, 2009 at 07:10:28AM -0800, Paul E. McKenney wrote:
> So, how to fix? Here are some approaches:
>
> o Make urcu_publish_content() do two parity flips rather than one.
> I use this approach in my rcu_rcpg, rcu_rcpl, and rcu_rcpls
> algorithms in CodeSamples/defer.
>
> o Use a single free-running counter, in a manner similar to rcu_nest,
> as suggested earlier. This one is interesting, as I rely on a
> read-side memory barrier to handle the long-preemption case.
> However, if you believe that any thread that waits several minutes
> between executing adjacent instructions must have been preempted
> (which implies the memory barriers that are required to do a context
> switch), then a compiler barrier suffices. ;-)
>
> Of course, the probability of seeing this failure during test is quite
> low, since it is unlikely that thread 0 would run just long enough to
> execute its signal handler. However, it could happen. And if you were
> to adapt this algorithm for use in a real-time application, then priority
> boosting could cause this to happen naturally.
And here is a patch, taking the first approach. It also exposes a
synchronize_rcu() API that is used by the existing urcu_publish_content()
API. This allows easier handling of structures that are referenced by
more than one pointer. It should also make it easier to plug this
into my rcutorture test. ;-)
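A single-process sketch of the two-flip approach, under assumed names and data layout (a fixed table of per-reader, per-parity counters; the real code tracks reader threads differently). The writer flips the global parity and waits for the old parity's counters to drain, twice. A reader that loads the parity and is preempted before incrementing its counter can register under a parity the writer has already scanned; because every call waits on both parities, the next `synchronize_rcu_sketch()` still waits for it. With one flip per call, a later writer could wait only on the other parity and miss that reader.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define NR_READERS 4	/* hypothetical fixed reader registry */

static atomic_uint gp_parity;
/* reader_count[r][p]: reader r's sections that started under parity p */
static atomic_long reader_count[NR_READERS][2];
static pthread_mutex_t gp_lock = PTHREAD_MUTEX_INITIALIZER;
static int flips_done;	/* for testing only */

static unsigned int reader_enter(int r)
{
	unsigned int p = atomic_load(&gp_parity);
	/* The race window: preemption here delays registration. */
	atomic_fetch_add(&reader_count[r][p], 1);
	return p;
}

static void reader_exit(int r, unsigned int p)
{
	atomic_fetch_sub(&reader_count[r][p], 1);
}

/* Flip the parity, then wait until no reader remains registered
 * under the old parity. */
static void switch_parity_and_wait(void)
{
	unsigned int old = atomic_fetch_xor(&gp_parity, 1);

	for (int r = 0; r < NR_READERS; r++)
		while (atomic_load(&reader_count[r][old]) != 0)
			sched_yield();
	flips_done++;
}

/* Writers are serialized by gp_lock; each call waits on both parities. */
static void synchronize_rcu_sketch(void)
{
	pthread_mutex_lock(&gp_lock);
	switch_parity_and_wait();
	switch_parity_and_wait();
	pthread_mutex_unlock(&gp_lock);
}
```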
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Bert Wesarg [Sat, 7 Feb 2009 23:59:29 +0000 (18:59 -0500)]
test_urcu.c: use gettid()
It's probably better to print the tid for each thread, not the pid.
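A minimal sketch of getting a per-thread ID on Linux. Calling the raw system call through `syscall(SYS_gettid)` is an assumption about how the test does it; glibc only gained a `gettid()` wrapper in version 2.30, long after this commit. `my_gettid()` is a hypothetical name. In the initial thread the TID equals the PID, while every other thread gets its own TID, which is why printing it distinguishes the threads.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Linux-specific: return the kernel thread ID of the calling thread. */
static pid_t my_gettid(void)
{
	return (pid_t) syscall(SYS_gettid);
}
```

A thread would then print it with something like `printf("thread %d\n", (int) my_gettid());`.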
Signed-off-by: Bert Wesarg <bert.wesarg@googlemail.com>
Bert Wesarg [Fri, 6 Feb 2009 11:45:42 +0000 (06:45 -0500)]
URCU : use pthread_equal()
But you should use pthread_equal() for your equality test of pthread_t.
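A minimal sketch of the point above: POSIX leaves `pthread_t` opaque (it need not be an arithmetic type), so thread IDs are compared with `pthread_equal()`, which returns nonzero when both arguments name the same thread, rather than with `==`. `main_id`, `check_not_main()` and `new_thread_differs_from_main()` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static pthread_t main_id;

/* Runs in a new thread: its ID must differ from the creator's. */
static void *check_not_main(void *arg)
{
	int *differs = arg;

	*differs = !pthread_equal(pthread_self(), main_id);
	return NULL;
}

/* Returns 1 if the new thread's ID differs from the creator's,
 * 0 if not, and -1 if the thread could not be created. */
static int new_thread_differs_from_main(void)
{
	pthread_t tid;
	int differs = 0;

	main_id = pthread_self();
	if (pthread_create(&tid, NULL, check_not_main, &differs) != 0)
		return -1;
	pthread_join(tid, NULL);	/* join makes 'differs' safe to read */
	return differs;
}
```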
Signed-off-by: Bert Wesarg <bert.wesarg@googlemail.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Fri, 6 Feb 2009 11:13:49 +0000 (06:13 -0500)]
Run longer tests
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Fri, 6 Feb 2009 03:25:51 +0000 (22:25 -0500)]
update Makefile, -Wall
Mathieu Desnoyers [Fri, 6 Feb 2009 02:47:01 +0000 (21:47 -0500)]
remove ugly gcc warning removal hack, simply cast the caller parameter
Mathieu Desnoyers [Fri, 6 Feb 2009 02:41:04 +0000 (21:41 -0500)]
add acknowledgements, fix gcc warnings
Mathieu Desnoyers [Fri, 6 Feb 2009 02:14:20 +0000 (21:14 -0500)]
fix wait_for_quiescent_state
Mathieu Desnoyers [Fri, 6 Feb 2009 01:49:36 +0000 (20:49 -0500)]
add licensing
Mathieu Desnoyers [Fri, 6 Feb 2009 01:46:14 +0000 (20:46 -0500)]
modify test values
Mathieu Desnoyers [Fri, 6 Feb 2009 01:04:29 +0000 (20:04 -0500)]
runs
Mathieu Desnoyers [Fri, 6 Feb 2009 01:04:03 +0000 (20:04 -0500)]
runs
Mathieu Desnoyers [Fri, 6 Feb 2009 00:42:15 +0000 (19:42 -0500)]
add make clean
Mathieu Desnoyers [Fri, 6 Feb 2009 00:41:34 +0000 (19:41 -0500)]
runs
Mathieu Desnoyers [Fri, 6 Feb 2009 00:16:25 +0000 (19:16 -0500)]
compile
Mathieu Desnoyers [Fri, 6 Feb 2009 00:06:44 +0000 (19:06 -0500)]
init version