I've been asked many times how our lock-free queue (you can read more about it here) differs from the LMAX Disruptor. The last time was in a discussion on Hacker News (it seems one of the Disruptor authors was asking). In LinkedIn discussions I was also asked about boost::lockfree::queue (it appeared in Boost recently). So I'm writing this post to share the answers.
The Disruptor source code I refer to below is available at https://github.com/redjack/varon-t.
The algorithms are simply different and solve different problems: the
Disruptor is a messaging queue (if a producer P0 emits M0 into the
queue, then all consumers C0..CN receive M0), while our queue is a
classical work queue (if P0 and P1 emit messages M0 and M1, then only
one consumer Ci receives M0 and only one consumer Cj receives M1; i may
be equal to j).
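The difference in delivery semantics can be illustrated with a small single-threaded model. This is only a sketch: the names `Inbox`, `broadcast` and `distribute` are hypothetical and belong to neither implementation, and round-robin delivery is assumed purely to keep the example deterministic (in a real work queue, whichever consumer pops first gets the message).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Each consumer is modelled as the list of messages it has received.
typedef std::vector<std::string> Inbox;

// Messaging semantics: every consumer receives every message.
void broadcast(const std::vector<std::string> &msgs,
               std::vector<Inbox> &consumers)
{
    for (std::size_t m = 0; m < msgs.size(); ++m)
        for (std::size_t c = 0; c < consumers.size(); ++c)
            consumers[c].push_back(msgs[m]);
}

// Work-queue semantics: each message goes to exactly one consumer.
// Round-robin is an assumption made only for determinism.
void distribute(const std::vector<std::string> &msgs,
                std::vector<Inbox> &consumers)
{
    assert(!consumers.empty());
    std::size_t next = 0;
    for (std::size_t m = 0; m < msgs.size(); ++m) {
        consumers[next].push_back(msgs[m]);
        next = (next + 1) % consumers.size();
    }
}
```

With two messages and two consumers, `broadcast` leaves both messages in both inboxes, while `distribute` leaves exactly one message in each.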
Our implementation competes with boost::lockfree::queue and it's much
faster, since the Boost implementation uses heavier synchronization
techniques. The benchmark on GitHub also includes the Boost
implementation, so you can compare both queues.
Unfortunately, there is no adequate algorithm description for the
Disruptor queue, only some vague descriptions more suitable for
business people than for engineers. So it was not easy to dig into its
source code. However, I studied it, and here are some notes about the
implementation.
The implementation is a bit inaccurate: there are a lot of branches
without branch prediction information available at compile time, and to
avoid cache line bouncing it wastes two cache lines instead of simply
aligning an item on a cache line (I mean vrt_padded_int). I didn't pay
much attention to memory barrier usage, but given that x86-64 provides
relatively strict memory ordering, some of the barriers could probably
be eliminated as well. The Disruptor uses very clever ideas, but I
believe its performance can be improved after a good code review.
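To make the two points above concrete, here is a minimal sketch of padding versus alignment and of a compile-time branch hint. All names (`PaddedCounter`, `AlignedCounter`, `QUEUE_UNLIKELY`, `try_consume`) are hypothetical, and `PaddedCounter` only shows the general padding pattern, not the actual definition of vrt_padded_int. A 64-byte cache line is assumed, and `__builtin_expect` is specific to GCC and Clang.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Assumed cache line size; the benchmark in this post takes it from
// `getconf LEVEL1_DCACHE_LINESIZE` instead of hard-coding it.
constexpr std::size_t kCacheLine = 64;

// Padding style: filler before and after the value, two lines in total.
struct PaddedCounter {
    char pad_before[kCacheLine];
    std::atomic<long> value;
    char pad_after[kCacheLine - sizeof(std::atomic<long>)];
};

// Alignment style: the value starts its own cache line, one line in total.
struct alignas(kCacheLine) AlignedCounter {
    std::atomic<long> value;
};

static_assert(sizeof(PaddedCounter) == 2 * kCacheLine, "two cache lines");
static_assert(sizeof(AlignedCounter) == kCacheLine, "one cache line");
static_assert(alignof(AlignedCounter) == kCacheLine, "line-aligned");

// Branch hint: tells the compiler the condition is rarely true.
#define QUEUE_UNLIKELY(x) __builtin_expect(!!(x), 0)

// Hypothetical single-consumer step: an empty queue is treated as the
// rare case, so the hot path falls through without a taken branch.
bool try_consume(AlignedCounter &tail, long head)
{
    long t = tail.value.load(std::memory_order_relaxed);
    assert(t <= head); // the consumer never overtakes the producer
    if (QUEUE_UNLIKELY(t == head))
        return false;  // nothing to consume
    tail.value.store(t + 1, std::memory_order_release);
    return true;
}
```

The `static_assert` lines show the memory cost directly: the padded variant occupies 128 bytes, the aligned one 64.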
One more point is that while our queue implementation is C++, it's
still self-sufficient and can be easily ported to C for use in kernel
space. In my humble opinion, it is questionable for a generic container
to depend on the non-standard libcork and, moreover, on a logging
library (clogger).
One more thing to note about our queue and the Disruptor: the Disruptor uses naive yielding logic. The problem with this logic is that if a thread has no work for a long time, it falls asleep for a time that increases from one yield call to the next. So if the queue has been idle for a long time and then suddenly gets a big spike, the thread typically takes some time to wake up before it starts to work. We solved this issue with a lock-free condition wait.
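The wake-up latency problem can be illustrated with a hypothetical backoff policy. This is not the Disruptor's actual yielding code: the function `backoff_delay`, the 16 spin rounds, the doubling step and the 1000 microsecond cap are all assumptions chosen for the sketch. It shows only the naive side of the comparison, not our condition wait.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

typedef std::chrono::microseconds usec;

// Hypothetical policy: the first `spin_rounds` idle polls do not sleep;
// after that the sleep time doubles on every idle poll, up to `cap`.
usec backoff_delay(unsigned idle_rounds,
                   unsigned spin_rounds = 16,
                   usec cap = usec(1000))
{
    assert(cap.count() > 0);
    if (idle_rounds < spin_rounds)
        return usec(0);                       // still busy-polling
    unsigned shift = std::min(idle_rounds - spin_rounds, 20u);
    return std::min(usec(1LL << shift), cap); // 1, 2, 4, ... up to cap
}
```

Under these assumptions, a consumer that has been idle for a long time sleeps for the full cap on every poll, so a burst of messages arriving right after it falls asleep waits up to that long before being processed. A condition wait avoids this because the producer wakes the consumer explicitly.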
boost::lockfree::queue uses at least one CAS operation (see, for example, do_push() in boost/lockfree/queue.hpp), which even in the best case is slower than a plain RMW operation (atomic increment in our case), so it's slower. You can use the benchmark for both queues on GitHub. On my Intel Core i7-4650U it gives:
$ g++ -Wall -std=c++0x -O2 -D DCACHE1_LINESIZE=`getconf LEVEL1_DCACHE_LINESIZE` -I/opt/boost_1_55_0/include/ lockfree_rb_q.cc -lpthread
check X data...
check X data...
check X data...
Note that I use the latest Boost version. The first result is for our queue, the second is for the naive queue implementation, and the last one is for Boost. That is, our implementation is more than 30% faster than boost::lockfree::queue.
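The difference between a CAS loop and a plain atomic increment, mentioned above as the reason for the gap, can be sketched as follows. The functions `claim_with_cas` and `claim_with_increment` are hypothetical and simplified: they only claim a slot index, which is neither the full do_push() logic of Boost nor the full push path of our queue.

```cpp
#include <atomic>
#include <cassert>

// CAS style: read the index, then try to publish index + 1.
// Under contention the exchange can fail and the loop retries.
unsigned long claim_with_cas(std::atomic<unsigned long> &head)
{
    unsigned long cur = head.load(std::memory_order_relaxed);
    while (!head.compare_exchange_weak(cur, cur + 1,
                                       std::memory_order_acq_rel,
                                       std::memory_order_relaxed))
        ; // a failed CAS reloads `cur`; just try again
    return cur;
}

// Increment style: a single atomic RMW that always succeeds.
unsigned long claim_with_increment(std::atomic<unsigned long> &head)
{
    return head.fetch_add(1, std::memory_order_acq_rel);
}

// Both strategies hand out consecutive slot indexes.
void self_check()
{
    std::atomic<unsigned long> head(0);
    assert(claim_with_cas(head) == 0);
    assert(claim_with_increment(head) == 1);
    assert(head.load() == 2);
}
```

Both functions return the same sequence of indexes; the difference is that the CAS version may execute its atomic instruction several times per claim when several producers compete, while the increment version executes it exactly once.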