I've been asked many times how our lock-free queue (you can read more about it here) differs from LMAX Disruptor. The last time was in a discussion at Hacker News (it seems one of the Disruptor authors was asking). In LinkedIn discussions I was also asked about boost::lockfree::queue (it appeared in Boost recently). So I'm writing this post to share the answers.
LMAX Disruptor
Its source code is available at https://github.com/redjack/varon-t.
The algorithms are simply different and solve different problems: Disruptor is a messaging queue (if a producer P0 emits M0 into the queue, then all consumers C0..CN receive M0), while our queue is a classical work queue (if P0 and P1 emit messages M0 and M1, then only one consumer Ci receives M0 and only one consumer Cj receives M1; i could be equal to j).
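As a rough illustration of the difference in consumption semantics (a simplified sketch, not the actual code of either queue): in a work queue consumers share a single cursor and each item is claimed by exactly one of them, while in a Disruptor-style messaging queue each consumer keeps its own cursor and reads every item.

#include <atomic>
#include <cstddef>

static const std::size_t N = 1024;
int ring[N];                        // hypothetical ring of already published items

// Work queue: consumers share one cursor, so each item goes to exactly
// one consumer.
std::atomic<std::size_t> shared_head(0);

int work_queue_pop()
{
    std::size_t pos = shared_head.fetch_add(1);
    return ring[pos % N];
}

// Messaging (Disruptor-style) queue: every consumer keeps its own cursor
// and therefore receives every published item.
struct messaging_consumer {
    std::size_t pos = 0;

    int next()
    {
        return ring[pos++ % N];
    }
};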
Our implementation competes with boost::lockfree::queue, and it's much faster since the Boost implementation uses heavier synchronization techniques. The benchmark on GitHub also includes the Boost implementation, so you can compare both queues.
Unfortunately, there is no adequate algorithm description for the Disruptor queue, only some indistinct descriptions mostly suitable for business people rather than engineers. So it was not easy to dig into its source code. However, I studied it, and here are some notes about the implementation.
The implementation is a bit sloppy: there are a lot of branches without branch prediction information available at compile time, and to avoid cache line bouncing it wastes two cache lines instead of simply aligning an item on a cache line boundary (I mean vrt_padded_int). I didn't pay too much attention to memory barrier usage, but given that x86-64 provides relatively strict memory ordering, probably some of them could also be eliminated. Disruptor uses very clever ideas, but I believe its performance could be improved after a good code review.
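For example, instead of padding a structure so that it spans two cache lines, one can simply align each item on a cache line boundary. A minimal sketch (not varon-t's actual code), assuming a 64-byte cache line:

#include <atomic>

#ifndef DCACHE1_LINESIZE
#define DCACHE1_LINESIZE 64     /* assumed cache line size */
#endif

/*
 * Aligning the counter on a cache line boundary keeps it alone on its line,
 * so concurrent writers do not invalidate neighbouring data (no false
 * sharing), and the structure occupies one cache line instead of two.
 */
struct alignas(DCACHE1_LINESIZE) padded_counter {
    std::atomic<unsigned long> value;
};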
One more point is that while our queue implementation is in C++, it's still self-sufficient and can easily be ported to C for use in kernel space. It's doubtful (in my humble opinion) that a generic container should depend on the non-standard libcork and, moreover, on a logging library (clogger).
One more thing to note about our queue and Disruptor: Disruptor uses naive yielding logic. The problem with this logic is that if a thread has had no work for a long time, it falls asleep for a time that increases from one yield call to the next; thus, if the queue is idle for a long time and then suddenly gets a big spike, the consumers typically spend some time waking up before they start to work. We solved this issue with a lock-free condition wait.
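To illustrate the problem (this is just an illustration of the naive back-off strategy, not Disruptor's actual wait code): the sleep interval grows while the queue is idle, so a consumer hit by a burst after a long idle period may still be parked in a long sleep.

#include <chrono>
#include <thread>

/*
 * Naive back-off yielding: the longer the queue stays empty, the longer the
 * consumer sleeps, so a sudden spike after a long idle period is noticed
 * only after a delay of up to max_sleep_us. A condition-wait approach parks
 * the thread once and is woken explicitly by the producer, so its wake-up
 * latency does not depend on how long the queue has been idle.
 */
template <class HasWork>
void naive_backoff_wait(HasWork has_work)
{
    unsigned sleep_us = 1;
    const unsigned max_sleep_us = 1000;

    while (!has_work()) {
        std::this_thread::yield();
        std::this_thread::sleep_for(std::chrono::microseconds(sleep_us));
        if (sleep_us < max_sleep_us)
            sleep_us *= 2;          // back-off keeps growing while idle
    }
}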
Boost::lockfree::queue
It uses at least one CAS operation (see, for example, do_push() in boost/lockfree/queue.hpp), which even in the best case is slower than a plain RMW operation (an atomic increment in our case), so it's slower overall. You can run the benchmark for both queues from GitHub; a small sketch of the CAS vs. RMW difference is shown below, after the benchmark results. For my Intel Core i7-4650U it gives:
$ g++ -Wall -std=c++0x -O2 -D DCACHE1_LINESIZE=`getconf LEVEL1_DCACHE_LINESIZE` -I/opt/boost_1_55_0/include/ lockfree_rb_q.cc -lpthread
$ ./a.out
13969ms
check X data...
Passed
69357ms
check X data...
Passed
20240ms
check X data...
Passed
Note that I use the latest Boost version. The first result is for our queue, the second is for a naive queue implementation, and the last one is for Boost. I.e., our implementation is more than 30% faster than boost::lockfree::queue.
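To make the CAS vs. plain RMW point more concrete, here is a minimal sketch (neither Boost's nor our actual code): a CAS-based claim has to retry under contention, while an atomic increment always succeeds in a single instruction.

#include <atomic>

std::atomic<unsigned long> tail(0);

/*
 * CAS-based claim (boost::lockfree::queue style, simplified): under
 * contention compare_exchange_weak fails, cur is reloaded and the loop
 * retries, burning extra cycles and cache line transfers.
 */
unsigned long claim_with_cas()
{
    unsigned long cur = tail.load(std::memory_order_relaxed);
    while (!tail.compare_exchange_weak(cur, cur + 1,
                                       std::memory_order_acquire,
                                       std::memory_order_relaxed))
        ;                           // cur now holds the updated value, try again
    return cur;
}

/*
 * Plain RMW claim (the approach our queue relies on): a single atomic
 * increment always succeeds, so there is no retry loop even under heavy
 * contention.
 */
unsigned long claim_with_rmw()
{
    return tail.fetch_add(1, std::memory_order_acquire);
}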
Interesting article!
Just have a side comment. I noticed you use the words "cache line". However, I think this should be "cache block". This is a general misconception. Cache line and cache block are different. The cache line is the physical placeholder inside a cache, whereas a cache block is the payload inside that physical placeholder. People tend to confuse them since they are both of the same size. Hence, movement is supposed to be of cache blocks and not of cache lines :)
Hi Ahmad,
Perhaps I've just never come across "cache block" in the literature, and "cache line" could be misleading, but in most cases it's clear from context whether an author writes about the physical placeholder in a cache or the payload. So I just prefer to follow common practice.
Alexander, the code review is great as usual, but the article has the following flaw: you are seemingly attributing varon-t's C implementation as the original LMAX Java Disruptor ("LMAX Disruptor .. Its source code available"), which isn't correct.
Hi Igor,
it was not my idea to reference varon-t's C implementation as LMAX Disruptor. Firstly, I was asked many times how our approach (the algorithm, not a particular C or Java implementation) differs from the Disruptor design. And secondly, akadien referred to the C implementation as a plug for C at Hacker News (https://news.ycombinator.com/item?id=7042771).