Over a month ago, we re-organized the architecture of devo.ps, a rather complex piece of software, around ZeroMQ to handle communication between a growing number of subsystems. For those of you who are not familiar with ZeroMQ, it is a messaging library originally written in C++. They have a great user guide (probably the best I’ve seen!) with code examples in 26 languages. ZeroMQ offers a set of communication patterns: Push-Pull, Pub-Sub and Req-Rep. Each pattern has its own message handling rules. You can use it over TCP to communicate across networks, or you can use its ipc and inproc transports for inter-process and inter-thread communication respectively. To quote the ZeroMQ team directly: “ZeroMQ looks like an embeddable networking library but acts like a concurrency framework.”
The honeymoon begins
Taking a step back
This is when we started wondering if there was something better out there. After the first version was out, we started caring about the less common scenarios: what if node X goes down, or what if process Y gets stuck? ZeroMQ does handle network problems well by transparently buffering messages and attempting to reconnect. But our application-level logic was restricted by how ZeroMQ’s Req-Rep works, especially once we started thinking about how to handle heartbeats properly. ZeroMQ does a lot of things behind the scenes for the developer, but a bit too much for our taste. For example, it mostly relies on buffering and retrying message delivery automatically, which is not always the desired behavior. By rethinking our messaging protocol we could have improved our usage, adopting something close to their Majordomo Pattern, but this is when we took a closer look at RabbitMQ.
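To make the heartbeat concern a bit more concrete, here is a minimal sketch of the bookkeeping an application ends up doing when it wants to notice that "node X" or "process Y" has gone quiet. It is not the devo.ps code and it makes no ZeroMQ calls; `HeartbeatMonitor`, `FakeClock` and the peer names are hypothetical, and the clock is injected only so the example runs deterministically.

```python
import time


class HeartbeatMonitor:
    """Remember when each peer was last heard from (hypothetical helper)."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_seen = {}

    def beat(self, peer):
        """Record that `peer` just sent a heartbeat."""
        self.last_seen[peer] = self.clock()

    def dead_peers(self):
        """Return, sorted by name, the peers silent for longer than the timeout."""
        now = self.clock()
        return sorted(
            peer for peer, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )


class FakeClock:
    """Manually advanced clock, used here instead of real time."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


clock = FakeClock()
monitor = HeartbeatMonitor(timeout_s=3.0, clock=clock)
monitor.beat("node-x")
monitor.beat("process-y")
clock.advance(2.0)
monitor.beat("node-x")       # node-x keeps reporting in
clock.advance(2.0)           # process-y has now been silent for 4 seconds
print(monitor.dead_peers())  # ['process-y']
```

The bookkeeping itself is simple. The hard part is deciding what to do with a peer once it is declared dead, and with the messages that were already queued for it.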
RabbitMQ is an AMQP-compatible message broker implemented in Erlang. We wanted a reliable and proven technology that we wouldn’t have to worry about. RabbitMQ seemed very good in that regard, so we checked whether its features matched our needs:
Push as well as Request-Reply (or RPC) style messaging
Message routing based on topic
Reliability (no lost messages when processes crash)
Persistency of queues
RabbitMQ provided all of these features out of the box. Its way of handling RPC was a bit unusual (check their tutorial to find out), but it still seemed doable. With devo.ps we value reliability and monitoring of our systems, so that part of the feature list was a huge plus. The proven speed and scalability of the broker also meant it would not become a bottleneck in our system (even at 800,000 messages/minute).
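For readers who have not seen it, the RPC style in RabbitMQ's tutorial is built from three pieces: a request queue, a reply queue owned by the caller, and a correlation id that ties each reply to its request. The sketch below only mimics that shape with in-process queues from Python's standard library. It is not the RabbitMQ client API and involves no broker; `rpc_queue`, `server` and `call` are hypothetical names chosen for illustration.

```python
import queue
import threading
import uuid

# Stand-in for the request queue that would live on the broker.
rpc_queue = queue.Queue()


def server(handler):
    """Consume requests and send each result to the caller's reply queue."""
    while True:
        msg = rpc_queue.get()
        if msg is None:  # shutdown sentinel, specific to this sketch
            break
        reply = {
            "correlation_id": msg["correlation_id"],
            "body": handler(msg["body"]),
        }
        msg["reply_to"].put(reply)


def call(body, timeout_s=1.0):
    """Send one request and wait for the reply carrying our correlation id."""
    reply_queue = queue.Queue()  # plays the role of the caller's callback queue
    correlation_id = str(uuid.uuid4())
    rpc_queue.put({
        "body": body,
        "reply_to": reply_queue,
        "correlation_id": correlation_id,
    })
    while True:
        # Raises queue.Empty if nothing arrives within timeout_s.
        reply = reply_queue.get(timeout=timeout_s)
        if reply["correlation_id"] == correlation_id:
            return reply["body"]
        # Any other reply does not belong to this request, so skip it.


worker = threading.Thread(target=server, args=(lambda n: n * 2,), daemon=True)
worker.start()
print(call(21))      # 42
rpc_queue.put(None)  # stop the worker
worker.join()
```

With a real broker the two queues live in RabbitMQ rather than in the process, which is what lets a request outlive a crashed worker.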
Moving forward for the better
After reviewing both ZeroMQ and RabbitMQ, we decided to re-implement our messaging strategy with RabbitMQ. The switch from ZeroMQ to RabbitMQ went relatively smoothly. Now the messaging part of the system feels much more reliable. Using RabbitMQ for messaging allows us to dedicate more time to developing higher-level application logic.
Looking back, we now have a good understanding of what ZeroMQ and RabbitMQ are good for. I’m glad we were open-minded enough to recognize a better tool in RabbitMQ and adopt it. I would still be happy to use ZeroMQ in future projects, but for our system its approach was too low-level.
The big idea
A high-level comparison of the strengths of each technology:
ZeroMQ
Is a communication protocol (think of one big level up from TCP)
Easy to get started with
Does not have/require a server
Works well for interprocess communication too (not just network messaging)
Lower level in general (allows more freedom, but also more effort)