workqueue/waitqueue: use lock-free stack for wakeup

The operation whose latency we want to reduce is the wakeup, not the
"wait": a thread typically performs "wait" when it is about to put
itself to sleep until more work arrives.

wfstack has a blocking "pop" operation: if a worker thread is delayed
for an extremely long time in the middle of a push (add to waitqueue),
the pop (wakeup) can be delayed for that same amount of time.
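
As a rough illustration of that blocking window, here is a minimal
sketch of a wfstack-style push using C11 atomics. It is not the
liburcu code: the names wf_node, wf_stack, wf_end, wf_init, wf_push and
wf_next_blocking are hypothetical, and the sentinel scheme is an
assumption made to keep the example self-contained. The push swaps the
head first and links the node afterwards, so a reader that reaches the
node in between has to wait for the link to appear.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical types for illustration only, not the liburcu API. */
struct wf_node {
	_Atomic(struct wf_node *) next;
};

struct wf_stack {
	_Atomic(struct wf_node *) head;
};

/* End-of-stack sentinel, so that a NULL next means "not linked yet". */
static struct wf_node wf_end;

static void wf_init(struct wf_stack *s)
{
	atomic_init(&s->head, &wf_end);
}

/* Wait-free push: one exchange, no retry loop. */
static void wf_push(struct wf_stack *s, struct wf_node *n)
{
	struct wf_node *old;

	atomic_store_explicit(&n->next, NULL, memory_order_relaxed);
	old = atomic_exchange_explicit(&s->head, n, memory_order_acq_rel);
	/*
	 * Window: n is already reachable from head, but n->next is still
	 * NULL. A pusher delayed here stalls every reader that reaches n.
	 */
	atomic_store_explicit(&n->next, old, memory_order_release);
}

/* Reader side: must wait until the pusher has published n->next. */
static struct wf_node *wf_next_blocking(struct wf_node *n)
{
	struct wf_node *next;

	while ((next = atomic_load_explicit(&n->next,
			memory_order_acquire)) == NULL)
		;	/* busy-wait on the delayed pusher */
	return next == &wf_end ? NULL : next;
}
```

The busy-wait loop in wf_next_blocking() is where a wakeup would
inherit the delay of a stalled waiter.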

lfstack does not have this downside, at the cost of the "wait"
operation (push) being only lock-free rather than wait-free. Since we
do not need "wait" to be wait-free, it makes sense to use lfstack
here.
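
For comparison, a minimal sketch of the lfstack-style trade-off,
again with C11 atomics and hypothetical names (wait_node, wait_queue,
wait_queue_init, wait_queue_add, wait_queue_take_all) rather than the
actual liburcu API. The push links the node before publishing it and
retries on contention, so it is only lock-free. The wakeup side,
assumed here to detach all waiters at once, is a single exchange and
never waits on a delayed pusher.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical types for illustration only, not the liburcu API. */
struct wait_node {
	struct wait_node *next;
	int id;
};

struct wait_queue {
	_Atomic(struct wait_node *) head;
};

static void wait_queue_init(struct wait_queue *q)
{
	atomic_init(&q->head, NULL);
}

/*
 * "wait" side (push): lock-free. The node is fully linked before the
 * compare-and-swap publishes it; the loop retries if another thread
 * changed head in the meantime.
 */
static void wait_queue_add(struct wait_queue *q, struct wait_node *node)
{
	struct wait_node *old;

	old = atomic_load_explicit(&q->head, memory_order_relaxed);
	do {
		node->next = old;
	} while (!atomic_compare_exchange_weak_explicit(&q->head, &old,
			node, memory_order_release, memory_order_relaxed));
}

/*
 * Wakeup side (pop all): a single exchange detaches the whole list.
 * Every node on it already has a valid next pointer, so there is
 * nothing to wait for.
 */
static struct wait_node *wait_queue_take_all(struct wait_queue *q)
{
	return atomic_exchange_explicit(&q->head, NULL,
			memory_order_acquire);
}
```

The waker can then walk the returned list through node->next and wake
each waiter, newest first.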