Building and breaking replica sets in MongoDB

In the first of our Write Stuff articles, Alex Kilpatrick, CTO at BeehiveID, looks at a subject most Compose users won't need to worry about – building and breaking MongoDB replica sets. If you think you have the Write Stuff to write about databases, why not submit your own ideas and earn up to $900 in cash and database credit?

The goal of this tutorial is to learn how to build a replica set in MongoDB, and more importantly how to break it and how to recover from failures. By doing this in a controlled environment, you will be ready to handle a real disaster when it happens.

Prerequisites

You need a machine (Windows, Ubuntu, or OS X) with MongoDB installed, and basic familiarity with the command line. If you are on Windows, I suggest you select the "custom" install option and install to C:\MongoDB to save yourself some typing.

We're not going to do much with complex MongoDB data types. If you are familiar with JSON, that is all you need.

Introduction

The main purpose of a replica set is to provide redundancy for your data. Any data you write is replicated among all members of the set, so if a member of your set goes down you can keep operating. I have personally had two production systems saved by the fact that I was using a replica set – the system never went down even though I lost a server (and in one case lost two at the same time).

In production, a replica set is always going to be deployed across different machines or cloud instances, often in different data centers, or even across different cloud providers, to enhance reliability. For this tutorial, however, we are going to run all of our instances on our local machine, just to make things a little easier. In our case, we are going to run instances on different ports, and with different data directories.

From MongoDB's perspective, they will behave almost exactly the same as if we ran them on different servers. The main difference is that replication, recovery, and general communication between the servers on a local machine will be faster because there is less latency. The core operations will be identical, though.

If you have access to multiple cloud servers through your cloud provider, you should be able to follow the tutorial using cloud servers instead of your local machine and it will work the same way.

Starting a MongoDB server

We are going to be creating a lot of windows for this tutorial, to simulate different machines. I've labeled my examples so you know which window is which, but you will want to keep them organized on your desktop.

Start by creating 3 data directories off of your MongoDB directory (or somewhere else if you prefer); on Linux or OS X use mkdir instead of md:

C:\MongoDB>md data1  
C:\MongoDB>md data2  
C:\MongoDB>md data3  

Start an instance of MongoDB like this:

Server window 1

C:\MongoDB>bin\mongod --dbpath data1 --port 27021  

On Windows, you will have to select 'OK' in the firewall dialog to allow connections on these ports.

You will see some log output, and the last line should say "waiting for connections on port XXX". One of the reasons we want to run this way (as opposed to running as a service) is that you can see all the log messages in the command windows. That gives some insight into what is going on behind the scenes.

Open another command prompt. We will use this window to query our first instance.

Query window 1

C:\MongoDB>bin\mongo --port 27021  
MongoDB shell version: 3.0.5  
connecting to: 127.0.0.1:27021/test  
>

So now we are connected to our MongoDB server running on port 27021. Let's insert a simple document.

> db.test.insert({name:'Alex Kilpatrick', occupation: 'CTO'})
WriteResult({ "nInserted" : 1 })  

So we inserted a simple JSON document. Without a schema it is that easy to start shoving data into MongoDB. Just to make sure, let's run a query:

> db.test.find({'occupation':'CTO'})
{ "_id" : ObjectId("55babf516605276687a92f39"), "name" : "Alex Kilpatrick", "occupation" : "CTO" }

Yep, it's there.

Replication

Now kill your MongoDB instance and restart it this way:

Server window 1

C:\MongoDB>bin\mongod --dbpath data1 --port 27021 --replSet mySet  

That tells MongoDB that it is part of a replica set. Go back to your query window, exit the shell, and reconnect. Now try to run your query again:

Query window 1

> db.test.find({'occupation':'CTO'})
Error: error: { "$err" : "not master and slaveOk=false", "code" : 13435 }  

This is because we have told MongoDB we are part of a replica set, but we haven't configured it yet. MongoDB will keep you out of trouble this way. Starting replication is easy, though. Go to your command window:

Query window 1

> rs.initiate()
{
  "info2" : "no configuration explicitly specified -- making one",
  "me" : "Alex-Yoga2:27021",
  "ok" : 1
}
mySet:OTHER>

Note that the command prompt has changed to show we are part of a set now, and it says "OTHER". More on that in a minute. Try our query again:

Query window 1

mySet:OTHER> db.test.find({'occupation':'CTO'})  
{ "_id" : ObjectId("55babf516605276687a92f39"), "name" : "Alex Kilpatrick", "occupation" : "CTO" }
mySet:PRIMARY>  

Our query works, and the command prompt now says "PRIMARY" because the replication magic has decided that we are now the master. This is important to know: if all of your replica set members die but one, you can still keep operating.
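Incidentally, rs.initiate() also accepts an explicit configuration document if you don't want to rely on the generated one. A minimal sketch; the hostname here is mine, so substitute your own:

> rs.initiate({
    _id: "mySet",                            // must match the --replSet name
    members: [
      { _id: 0, host: "Alex-Yoga2:27021" }   // first member; add more here or via rs.add()
    ]
  })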

Next, we will run the command that you will run most often in relation to replica sets: rs.status().

Query window 1

mySet:PRIMARY> rs.status()
{
  "set" : "mySet",
  "date" : ISODate("2015-07-31T00:33:40.724Z"),
  "myState" : 1,
  "members" : [{
    "_id" : 0,
    "name" : "Alex-Yoga2:27021",
    "health" : 1,
    "state" : 1,
    "stateStr" : "PRIMARY",
    "uptime" : 546,
    "optime" : Timestamp(1438302543, 1),
    "optimeDate" : ISODate("2015-07-31T00:29:03Z"),
    "electionTime" : Timestamp(1438302543, 2),
    "electionDate" : ISODate("2015-07-31T00:29:03Z"),
    "configVersion" : 1,
    "self" : true
  }],
  "ok" : 1
}

The most important things to note here are each member's state and, in particular, which member is currently the primary. Many times you will assume the server you are connected to is the master when it is actually another member.
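A related helper worth knowing is rs.conf(), which shows the set's configuration (the member list, priorities, and so on) rather than its runtime status:

mySet:PRIMARY> rs.conf()    // returns { "_id" : "mySet", "version" : 1, "members" : [...] }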

Now let's get things moving and replicating.

Adding Members

Open 2 more command prompt windows and start 2 new instances of MongoDB like this:

Server window 2

C:\MongoDB>bin\mongod --dbpath data2 --port 27022 --replSet mySet  

Server window 3

C:\MongoDB>bin\mongod --dbpath data3 --port 27023 --replSet mySet  

This is effectively the same thing as running MongoDB on three machines, since we have three different data directories and three different ports.

So we have three servers now, all ready to be part of a replica set, but only one of those is actually part of it. You force the other ones to join by issuing commands on the master.

Query window 1

mySet:PRIMARY> rs.add('Alex-Yoga2:27022')  
{ "ok" : 1 }

Replace 'Alex-Yoga2' with your machine name. The reason we have to do this is that MongoDB will not let you mix 'localhost' and real hostnames within a replica set. That won't be an issue in a real deployment. Now if you run rs.status() you will see the new member.

mySet:PRIMARY> rs.status()
{
  "set" : "mySet",
  "myState" : 1,
  "members" : [{
    "_id" : 0,
    "name" : "Alex-Yoga2:27021",
    "state" : 1,
    "stateStr" : "PRIMARY",
    ...
    "self" : true
  },
  {
    "_id" : 1,
    "name" : "Alex-Yoga2:27022",
    "state" : 2,
    "stateStr" : "SECONDARY",
    ...
  }],
  "ok" : 1
}
mySet:PRIMARY>  

I've deleted most of the extraneous stuff here so you can see the important parts. The new machine is now a secondary in our set, and we can see both members. By the time you have read this, all the replication magic has happened. Open up a new query window so we can see:

Query window 2

C:\MongoDB>bin\mongo --port 27022  
MongoDB shell version: 3.0.5  
connecting to: 127.0.0.1:27022/test  
mySet:SECONDARY> db.test.find()  
Error: error: { "$err" : "not master and slaveOk=false", "code" : 13435 }  
mySet:SECONDARY>  

Whoops! MongoDB really wants us to stick to the master. We can get around that, though:

Query window 2

mySet:SECONDARY> rs.slaveOk()  
mySet:SECONDARY> db.test.find()  
{ "_id" : ObjectId("55babf516605276687a92f39"), "name" : "Alex Kilpatrick", "occ
upation" : "CTO" }  
mySet:SECONDARY>  

We told MongoDB that we are ok with querying slaves. Magic! Our data has been replicated. Feel free to play around with adding data and watching it replicate. Our third server (the one we started on port 27023) is already running and waiting, so go back to the master query window and add it to the set too (use your machine name):

Query window 1

mySet:PRIMARY> rs.add('Alex-Yoga2:27023')  
{ "ok" : 1 }
mySet:PRIMARY>  

You always want an odd number of replica set members because they vote on who becomes primary; with an even number, a network split can leave neither side with a majority, and the set cannot elect a master. If you do want an even number of data-bearing machines, you can add an arbiter to break ties, as sketched just below.
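If you go that route, an arbiter is just another mongod that votes in elections but stores no data. A quick sketch; the port 27025 and the arb directory are my own inventions, so substitute whatever fits your setup:

C:\MongoDB>md arb
C:\MongoDB>bin\mongod --dbpath arb --port 27025 --replSet mySet

Then, from the master query window:

mySet:PRIMARY> rs.addArb('Alex-Yoga2:27025')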

You can connect to your third server and verify that the data has been replicated. Think about this for a bit. All we did was start up a machine and then go to the master and run rs.add() on it. That's all it takes to add a machine to a replica set.
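One more trick before we start breaking things: instead of connecting to a single member, you can point the shell at the set as a whole and let it discover the current primary. Something like this should work (hostnames assumed, as before):

C:\MongoDB>bin\mongo --host mySet/Alex-Yoga2:27021,Alex-Yoga2:27022,Alex-Yoga2:27023

Drivers take the same seed list in their connection strings, which is how an application keeps working when the primary changes.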

Handling Down Servers

Replica sets are very tolerant of faults, of course. Kill the master running in the first server window. Then go to the #2 query window and run this:

Query window 2

mySet:SECONDARY> rs.status()
{
  "set" : "mySet",
  "myState" : 2,
  "members" : [{
    "_id" : 0,
    "name" : "Alex-Yoga2:27021",
    "health" : 0,
    "state" : 8,
    "stateStr" : "(not reachable/healthy)",
    "pingMs" : 1,
    "lastHeartbeatMessage" : "Failed attempt to connect to Alex-Yoga2:27021; couldn't connect to server Alex-Yoga2:27021 (192.168.1.131), connection attempt failed",
  }, 
  {
    "_id" : 2,
    "name" : "Alex-Yoga2:27023",
    "stateStr" : "PRIMARY",
  }],
  "ok" : 1
}
mySet:SECONDARY>  

I've deleted a lot for brevity, but you can see that the first server is now unreachable and (in my case) the third machine is now the master. For you, the second machine may be the new master. It is really just a matter of timing. This is what would happen if you had to reboot for maintenance or had some other sporadic upset. However, your app would still plug right along using the other two machines.

Now you can bring back up the simulated dead server and run rs.status() until you see it come back up. It won't be the master, because there is no need to elect a new master. You can actually see this conversation in the logs:

2015-07-30T20:26:43.993-0500 I REPL     [ReplicationExecutor] Member Alex-Yoga2:27023 is now in state PRIMARY
2015-07-30T20:26:43.993-0500 I REPL     [ReplicationExecutor] not electing self, Alex-Yoga2:27022 would veto with 'Alex-Yoga2:27021 is trying to elect itself but Alex-Yoga2:27023 is already primary and more up-to-date'
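If you ever want to force this handover on purpose (before planned maintenance, say), you can ask the current primary to step down instead of killing it. The argument is the number of seconds it should refuse to stand for re-election:

mySet:PRIMARY> rs.stepDown(60)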

If you want a certain machine to always be the master, you can raise its priority in the replica set configuration, as sketched below.
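The usual recipe is to fetch the configuration, raise the priority of the member you favor, and reconfigure. A sketch, assuming the machine you want to prefer is members[0]:

mySet:PRIMARY> cfg = rs.conf()
mySet:PRIMARY> cfg.members[0].priority = 2    // the default is 1; higher priorities win elections
mySet:PRIMARY> rs.reconfig(cfg)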

Handling Dead Machines

Now imagine a typical cloud configuration. You would configure your replica set machine and probably set it to the default port of 27017. Then freeze that as a VM instance so you could run new clones of it any time you want. They don't need any special configuration other than to know they are part of a particular replica set name. When you run rs.add(), the master initiates the conversation and brings the new member into the fold. So if a machine dies, you can just spin up another one to take its place quite easily. Let's do that now.
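In an image like that, you would typically bake the set name into the mongod configuration file rather than pass it on the command line every time. A minimal sketch of the relevant YAML; the paths are assumptions, so adjust them to your layout:

storage:
  dbPath: /var/lib/mongodb
net:
  port: 27017
replication:
  replSetName: mySet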

Kill one of the non-primary servers with Control-C (in my case, server 2 on port 27022). We're going to pretend it is never coming back and start server 4 with a new data directory:

Server window 4

C:\MongoDB>mkdir data4

C:\MongoDB>bin\mongod --dbpath data4 --port 27024 --replSet mySet  

Now connect to your master instance and add it to the replica set:

Primary query window

C:\MongoDB>bin\mongo --port 27023  
MongoDB shell version: 3.0.5  
connecting to: 127.0.0.1:27023/test  
mySet:PRIMARY> rs.add('Alex-Yoga2:27024')  
2015-07-30T22:39:33.800-0500 I NETWORK  DBClientCursor::init call() failed  
2015-07-30T22:39:33.805-0500 I NETWORK  trying reconnect to 127.0.0.1:27021 (127.0.0.1) failed
2015-07-30T22:39:33.809-0500 I NETWORK  reconnect 127.0.0.1:27021 (127.0.0.1) ok

reconnected to server after rs command (which is normal)  

Now we have the new server added and everything is good. However, we have a stale element in our config that we need to get rid of. We can do it like this (remove whichever server you killed):

mySet:PRIMARY> rs.remove('Alex-Yoga2:27022')  
{ "ok" : 1 }

Now you can run rs.status() and see that everything is running fine. You may see the new server complain "could not find member to sync from", but that will go away in a few minutes. We just simulated bringing in a new machine to replace a failed one.

Handling Data Corruption

Finally, let's simulate data corruption. The way to handle this is quite easy. Go to server 4 (the one we just created), shut it down and do this:

Server window 4

C:\MongoDB>del data4\*.*  
C:\MongoDB\data4\*.*, Are you sure (Y/N)? y  

The easiest way to handle corrupt data in a server is just to delete it and start over! It is a replica set after all. Start it back up and it will realize it has no data and start syncing with the other members. This is exactly what happens with a brand new member. If you have a lot of data it may take hours or days to sync back up, but it will.
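While that resync is running, you can watch how far behind the member is from the primary's shell; rs.printSlaveReplicationInfo() reports how recently each secondary synced:

mySet:PRIMARY> rs.printSlaveReplicationInfo()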

Summary

I've shown you how to do a few things with replica sets: create one from scratch, add and remove members, survive a downed master, replace a dead machine entirely, and recover a member whose data was corrupted.

MongoDB replication is pretty easy to set up and very robust. You should now play around with more failure scenarios in this sandbox and practice recovering from them. Trust me, you don't want to be learning this when a crisis happens.

Dr. Alex Kilpatrick is the CTO & Co-founder of BeehiveID, a company providing online identity verification as a service. He has designed, developed and deployed over a dozen biometric identity systems in the Middle East, including Iraq, Afghanistan, Jordan, and Kurdistan. His research interests include social network analysis, biometrics, and image recognition. He primarily works in MongoDB, Neo4J, and when forced to, SQL Server.

This article is licensed with CC-BY-NC-SA 4.0 by Compose.