Saturday, November 22, 2014

Scaling Message Buses on the X Axis


The X Axis
One of the key things I took from Scalabiltiy Rules book is what they refer to as the AKF Cube or the AKF Three Axes of Scale. Generally speaking, The cube suggests a schematic 3D description of a system's scalability options - a simple methodology to examine the ways to scale your system and it's components.



X: Horizontal duplication (Design to clone things) - duplicating your data and services for leveraging on the high read/writes ratio.
Y: Split by service (Design to split different things) - separating components of different services to avoid dependencies and shared resources and creating fault isolation architecture.
Z: Lookup oriented splits (Design to split similar things) - separating components by sharding.

 If every component of your system can scale on all three directions - then you are probably near infinite scalability.


Scaling Message Buses
When relating the service buses, The authors recommend scaling using the Y axis and Z axis. But when it comes to horizontal duplication the authors say that it does not work well when it comes to message buses - and  I decided to challenge that.

In eToro, we use RabbitMQ for asynchronous messaging and service bus. Up until now we implemented Y Axis scalability - we have different message bus for each service (or sub-domain) and we have some of them connected via federation for cross domain eventually consistent integration.

The Test
RabbitMQ Cluster
I have created a cluster using two virtual Ubuntu machines installed with latest RabbitMQ version.
I used HAProxy as a network load balancer to have the clients redirected to the available node once one drops.
The RabbitMQ Cluster enables you to join multiple nodes together into a single logical node. All queues on all nodes are accessible from all other nodes (n to n topology) when it comes to both writes and reads - transparent to both consumers and producers.
Measuring messages latency in a cluster where the readers and the writers are on different nodes (mixed) came out with ~0 latency.
Using a cluster enables you to increase the resources in the a single node horizontally by adding nodes to the cluster.
But, as only one node is considered the real owner of a queue - when that node drops - the queue drops with it and is not accessible anymore from other nodes in the cluster.

Mirrored Queues
This is exactly where duplication (or X Axis) kicks in: RabbitMQ suggests mirroring queues across a cluster to create high availability cluster

I created a queue on one of the nodes and added a policy to have it duplicated across all nodes.
I dropped one of my nodes and when my clients reconnected they continued consuming the queue - as expected.

When using mirroring queues all clients are working directly with the master queue and when a change happens (message is enqueued or acknowledged) it is simply mirrored across all slave queues. When the master queue drops other slave is being promoted to master.

Mirrored queues provides a good solution for high availability with a bit of a latency cost: When working with queues in a cluster I measured 0 latency on 1000/s messages. On Mirrored queues I measured ~1 millisecond latency.

Federated Queues
Mirrored queues provides high availability but does not solve a resources or load problems - for that, you can either use the Y and Z axis splits or federated queues.

Federating a queue creates a low priority consumer on a master queue - which implies that only messages that currently have no available consumer to handle them are passed through federation to a slave queue.

During the test I noticed that even without any consumers on the master queue the average messages/sec transferred to federated queue was ~2k (which gave me a 40% throughput boost).

Summary
RabbitMQ provides great features for scaling your message bus. I must say I would not give up so fast on scaling a system's service bus using the X axis.