Consistent Hashing
-
Consistent hashing is a clever way to distribute data across multiple servers. It minimizes how much data has to move when a server is added or goes down. Let’s see how it works.
-
When you need to store data across multiple servers (or nodes), you need a rule that decides which data goes where. The simplest approach is to hash each piece of data and take the hash modulo (%) the number of servers. But this has a big problem: when you add or remove a server, the divisor changes, so the result changes for most hash values and most of your data has to move to a different server. That creates unnecessary network traffic and puts strain on your system.
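To see the problem concretely, here is a minimal Python sketch of modulo placement that counts how many keys change servers when a fifth server joins four. The key names (`user:0`, `user:1`, ...) and the use of MD5 as the hash are assumptions for illustration; MD5 is used only because Python's built-in `hash()` is randomized per process for strings.

```python
import hashlib


def stable_hash(key: str) -> int:
    # Assumption: MD5 (first 8 bytes) as a stable hash across runs.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


def modulo_server(key: str, num_servers: int) -> int:
    # Naive placement: hash the key and take it modulo the server count.
    return stable_hash(key) % num_servers


def fraction_moved(keys, before: int, after: int) -> float:
    # Fraction of keys whose server index changes when the server count changes.
    moved = sum(1 for k in keys if modulo_server(k, before) != modulo_server(k, after))
    return moved / len(keys)


keys = [f"user:{i}" for i in range(10_000)]
print(f"4 -> 5 servers: {fraction_moved(keys, 4, 5):.0%} of keys move")
```

In expectation, a key stays put only when its hash leaves the same remainder modulo 4 and modulo 5, which happens for 4 of every 20 hash values, so roughly 80% of the keys move.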
-
Consistent hashing avoids this by arranging hash values on a ring: the hash space wraps around, so the largest value sits right next to 0. For intuition, suppose hash values run from 0 to 360, like degrees on a circle. Then 0 to 60 degrees on the ring holds hash values 0 to 60, the next 60 degrees holds values 60 to 120, and so on. With that picture in mind, here’s how consistent hashing works:
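A real hash function produces values far larger than 360, so in practice the hash space itself is treated as the ring. This small sketch assumes a 32-bit hash (the first 4 bytes of MD5, an arbitrary choice) and scales it onto the 0 to 360 degree picture; `ring_position`, `arc_index`, and `clockwise_distance` are hypothetical helper names.

```python
import hashlib

RING_DEGREES = 360  # the illustrative ring from the text


def ring_position(key: str) -> float:
    # Assumption: a 32-bit hash from the first 4 bytes of MD5, scaled to [0, 360).
    h = int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")
    return h * RING_DEGREES / 2**32


def arc_index(position: float, arc_size: float = 60) -> int:
    # Which 60-degree arc a position falls in: 0 -> [0, 60), 1 -> [60, 120), ...
    return int(position // arc_size)


def clockwise_distance(start: float, end: float) -> float:
    # Distance travelled clockwise from start to end; wraps past 360 back to 0.
    return (end - start) % RING_DEGREES


print(ring_position("user:42"), arc_index(ring_position("user:42")))
print(clockwise_distance(350, 10))  # 20
```

The wrap-around in `clockwise_distance` is what makes this a ring rather than a line: moving clockwise from 350 degrees reaches 10 degrees after just 20 degrees.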
-
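First, you pick a value K (let's say K=3) for how many "virtual copies" each server gets on the ring. Each server is placed at K different positions around the ring, for example by hashing its name together with a copy number (server-a#0, server-a#1, server-a#2). Spreading each server over several small arcs evens out how much of the ring, and therefore how much data, each server owns. To store a piece of data, you hash it to find its position on the ring, then move clockwise until you hit the first server position, wrapping past the top of the ring back to 0 if needed. The server that owns that position is where the data is stored.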
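The steps above can be sketched in Python as a sorted list of ring positions searched with the standard `bisect` module. This is a minimal sketch under assumptions not in the original: positions come from the first 8 bytes of an MD5 digest (a 64-bit ring rather than 360 degrees), virtual copies are named `"<server>#<i>"`, and `ConsistentHashRing`, `add_server`, `remove_server`, and `get_server` are hypothetical names for illustration.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    # Assumption: a 64-bit ring position taken from the first 8 bytes of MD5.
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    """Hypothetical minimal ring; ignores the (very unlikely) case of two
    virtual copies hashing to the same position."""

    def __init__(self, servers=(), k: int = 3):
        self.k = k                  # virtual copies per server (K in the text)
        self._positions = []        # sorted ring positions of all virtual copies
        self._owners = {}           # ring position -> owning server name
        for server in servers:
            self.add_server(server)

    def _virtual_positions(self, server: str):
        # Hypothetical naming scheme for the K copies: "server#0", "server#1", ...
        return [ring_hash(f"{server}#{i}") for i in range(self.k)]

    def add_server(self, server: str) -> None:
        for pos in self._virtual_positions(server):
            bisect.insort(self._positions, pos)
            self._owners[pos] = server

    def remove_server(self, server: str) -> None:
        for pos in self._virtual_positions(server):
            self._positions.remove(pos)
            del self._owners[pos]

    def get_server(self, key: str) -> str:
        if not self._positions:
            raise LookupError("no servers on the ring")
        # Index of the first virtual copy at or clockwise from the key's position.
        idx = bisect.bisect_left(self._positions, ring_hash(key))
        if idx == len(self._positions):
            idx = 0  # walked past the top of the ring: wrap around to the start
        return self._owners[self._positions[idx]]


ring = ConsistentHashRing(["server-a", "server-b", "server-c"], k=3)
print(ring.get_server("user:42"))
```

Lookup is a binary search over N·K positions. `insort` and `list.remove` are linear, which is fine for a sketch; a production ring might use a balanced tree or rebuild the sorted list in bulk.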
-
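The beauty of this approach is what happens when servers change. If a server fails, only the data assigned to that server needs to move: each of its keys goes to the next server position clockwise, and because the failed server had K positions, that data tends to be spread across several surviving servers instead of all landing on one. Everything else stays put. Similarly, when you add a new server, it takes over only the keys that sit just before each of its K positions, which previously belonged to the next server clockwise. On average, only about one server's share of the data (roughly 1/N of it, for N servers) moves, rather than triggering a massive reshuffle.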
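The claims in the paragraph above can be checked with a short self-contained Python sketch that assigns keys before and after a server is added or fails. As assumptions for illustration: ring positions come from an MD5 prefix, virtual copies are named `"<server>#<i>"`, the server and key names are made up, and K=100 is used instead of K=3 so that the ring is split more evenly.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    # Assumption: stable 64-bit ring position taken from an MD5 digest.
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


def build_ring(servers, k):
    # Each server gets k virtual positions, hashed from "name#i".
    nodes = sorted((ring_hash(f"{s}#{i}"), s) for s in servers for i in range(k))
    return [pos for pos, _ in nodes], [s for _, s in nodes]


def assign(keys, servers, k):
    positions, owners = build_ring(servers, k)
    result = {}
    for key in keys:
        # First virtual node clockwise from the key; "% len" wraps past the top.
        idx = bisect.bisect_left(positions, ring_hash(key)) % len(positions)
        result[key] = owners[idx]
    return result


keys = [f"user:{i}" for i in range(10_000)]
servers = ["server-a", "server-b", "server-c", "server-d"]
K = 100  # more virtual copies than the K=3 example, to smooth out the load

before = assign(keys, servers, K)
after_add = assign(keys, servers + ["server-e"], K)
after_fail = assign(keys, [s for s in servers if s != "server-b"], K)

moved_on_add = [key for key in keys if before[key] != after_add[key]]
moved_on_fail = [key for key in keys if before[key] != after_fail[key]]

# Every key that moves on an add lands on the new server...
assert all(after_add[key] == "server-e" for key in moved_on_add)
# ...and the only keys that move on a failure are the failed server's keys.
assert all(before[key] == "server-b" for key in moved_on_fail)

print(f"add server-e: {len(moved_on_add) / len(keys):.1%} of keys move")
print(f"lose server-b: {len(moved_on_fail) / len(keys):.1%} of keys move")
```

With five servers after the add, the new server should end up with a fraction of keys in the neighbourhood of 1/5, far fewer than the majority of keys that modulo placement would move.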