MongoDB Replica Example

Russ Bateman
11 October 2012
last update:

Table of Contents

What we expect to happen
Preview of steps
/etc/hosts
Set up the server hosts via scripts
blackpearl-replicas.sh
mimosa-replicas.sh
Launch the MongoDB shell
blackpearl-config.js
Deploying an arbiter
Setting up an arbiter
Appendix: the /etc/hosts additions
Appendix: the mongodb.conf files
Appendix: hidden replica nodes
Appendix: arbiter nodes
Appendix: Trouble, woe, woe, trouble...

This is a real example of creating a MongoDB replica set with 4 nodes across 2 separate Linux hosts. The replica nodes have geographical names that suggest where they might live. (There is no sharding going on here; that's a different topic and isn't being explored here.) This is all fanciful and they could have been called by really mechanical, boring names like rs0, rs1, ... rs3.

This is an attempt at modeling a real MongoDB replica set in preparation for doing it in a real data center down the road.

What we expect to happen

In the end, we expect to see four replica nodes, with one of them elected as primary.

PRIMARY staging-replica
SECONDARY americas-replica
SECONDARY emea-replica
SECONDARY asia-pacific-replica

Preview of steps

  1. Add replica node name entries to /etc/hosts on each host VM.
  2. Create mongodb.conf files for each replica.
  3. Run the set-up scripts blackpearl-replicas.sh and mimosa-replicas.sh (below) on their respective hosts. These scripts carry out steps 4 through 7; a condensed manual sketch for a single replica follows this list.
  4. Create replica subdirectories on each host.
  5. Copy the relevant configuration files (mongodb.conf equivalents, each named for the replica node it describes) to /mongodb-replicas on each host.
  6. Touch the logfiles, named for each replica, in /mongodb-replicas on each host.
  7. Launch mongod process using config files on each host.
  8. Launch MongoDB shell, connecting perhaps remotely to a mongod on one of the hosts.
    	$ mongo --host staging-replica --port 37017
    
  9. Prepare blackpearl-config.js (below), which holds the replica set configuration as a JavaScript assignment.
  10. Initiate the replica set by copying and pasting the JavaScript into the shell. At this point, the replicas should all come up. To "see" into the replica set later, reconnect the same way:
      $ mongo --host staging-replica --port 37017
    
  11. If an even number of nodes, create an arbiter to make the count odd.
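
Before the full scripts, here's a condensed, manual equivalent of steps 4 through 8 for a single replica (staging) on Black Pearl. It's only a sketch; the scripts in the next section do the same thing for every replica on each host, and the paths and names match the conf files in the appendix.

mkdir -p /mongodb-replicas/staging                        # step 4: the replica's subdirectory
cp staging.conf /mongodb-replicas                         # step 5: its configuration file
touch /mongodb-replicas/staging-replica.log               # step 6: its logfile
/usr/bin/mongod --config /mongodb-replicas/staging.conf   # step 7: launch the replica's mongod
mongo --host staging-replica --port 37017                 # step 8: connect a shell to it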

/etc/hosts

The two server hosts are named af-blackpearl.site (Black Pearl) and af-mimosa.site (Mimosa). In their /etc/hosts files are lines such as those shown in the appendix.

These entries are needed because we refer to the replica nodes by these names in the replica set configuration and on the mongo command line.
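
A quick sanity check that the names resolve (a sketch; it assumes getent is available, as it is on these Ubuntu hosts):

$ getent hosts staging-replica americas-replica emea-replica asia-pacific-replica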

Prior to using the set-up scripts described next, I created the mongodb.conf files, giving each a distinct name corresponding to the replica it describes. See the appendix below.

Set up the server hosts via scripts

First, the set-up scripts. These are run on each of the two server hosts to set up special MongoDB subdirectories. Both of these servers are otherwise running a perfectly normal MongoDB service. If a host is bounced, these scripts can simply be rerun to relaunch its replica nodes (the replicas are not set up as persistent services in this example).

blackpearl-replicas.sh:
#!/bin/sh
# ---------------------------------------------------------------------
# This script sets up two replicas on af-blackpearl.site. It is run as
# root. If this script is run a second time, it just ensures that the
# replicas are up and running.
#
# We added an arbiter since our total set-up would otherwise have
# resulted in a four-node replica set.
# ---------------------------------------------------------------------
MONGO_REPLICAS_DIR=/mongodb-replicas
       STAGING_DIR=staging
      AMERICAS_DIR=americas
       ARBITER_DIR=arbiter
  MONGOD_PROCESSES=/tmp/mongodb-replicas.deleteme

# Check for root...
id=`id -u 2>&1`
if [ $id -ne 0 ]; then
  echo "This script can only run as root."
  exit -1
fi

# Create subdirectories for MongoDB replicas. A separate MongoDB will
# run on each replica--separate even from the MongoDB running on this
# server.
if [  ! -d $MONGO_REPLICAS_DIR ]; then
  mkdir -p $MONGO_REPLICAS_DIR/$STAGING_DIR   # the primary will run here
  mkdir -p $MONGO_REPLICAS_DIR/$AMERICAS_DIR  # this is the first secondary
  mkdir -p $MONGO_REPLICAS_DIR/$ARBITER_DIR   # this is our arbiter
fi

# If there are configuration files locally, copy them up to the new
# subdirectory for use in launching mongod process(es).
conf_files=`ls *.conf`

for file in $conf_files; do
  echo "  $file"
  cp $file $MONGO_REPLICAS_DIR
done

cd $MONGO_REPLICAS_DIR
CWD=`pwd`
echo $CWD

# Are the mongods running? There will be the main one that's for the
# server and two others that are for our replicas.
ps -ef | grep "[m]ongod " > $MONGOD_PROCESSES
counts=`cat $MONGOD_PROCESSES | wc`
 count=`echo $counts | awk '{ print $1 }'`

echo "Found $count mongod process(es) running..."

case $count in
  1) echo "There's one mongod; starting the replicas..."
     ;;
  3) echo "Three mongods are running; this probably includes the replicas."
     exit 0
     ;;
  *) cat $MONGOD_PROCESSES
     echo "There aren't enough mongod processes running on this host."
     exit -2
     ;;
esac

# Launch the mongod processes that are our replicas:
# This will NOT work until we get the configuration files copied over
# there.
/usr/bin/mongod --config staging.conf
/usr/bin/mongod --config americas.conf
/usr/bin/mongod --config arbiter.conf

# vim: set tabstop=2 shiftwidth=2:

Here's the output from running the script on Black Pearl. (Note: I didn't actually set up the arbiter until after the four replica nodes were set up.)

root@af-blackpearl:~# ./blackpearl-replicas.sh
  americas.conf
  arbiter.conf
  staging.conf
/mongodb-replicas
Found 1 mongod process(es) running...
There's one mongod; starting the replicas...
forked process: 952
all output going to: /mongodb-replicas/staging-replica.log
child process started successfully, parent exiting
forked process: 973
all output going to: /mongodb-replicas/americas-replica.log
child process started successfully, parent exiting
root@af-blackpearl:~# ll /mongodb-replicas/
total 32
drwxr-xr-x  4 root root 4096 2012-10-11 16:30 .
drwxr-xr-x 25 root root 4096 2012-10-11 16:30 ..
drwxr-xr-x  2 root root 4096 2012-10-11 16:30 americas
-rw-r--r--  1 root root  204 2012-10-11 16:30 americas.conf
-rw-r--r--  1 root root 1953 2012-10-11 16:30 americas-replica.log
drwxr-xr-x  2 root root 4096 2012-10-11 16:30 arbiter
-rw-r--r--  1 root root  204 2012-10-11 16:30 arbiter.conf
-rw-r--r--  1 root root 1953 2012-10-11 16:30 arbiter.log
drwxr-xr-x  2 root root 4096 2012-10-11 16:30 staging
-rw-r--r--  1 root root  198 2012-10-11 16:30 staging.conf
-rw-r--r--  1 root root 1948 2012-10-11 16:30 staging-replica.log
root@af-blackpearl:~# ps -ef | grep [m]ongod
mongodb  692   1  0 Sep21 ?    00:27:22 /usr/bin/mongod --config /etc/mongodb.conf
root     952   1  0 16:30 ?    00:00:00 /usr/bin/mongod --config staging.conf
root     973   1  0 16:30 ?    00:00:00 /usr/bin/mongod --config americas.conf

Now, the almost identical script on Mimosa, for the two replicas that run there.

mimosa-replicas.sh:
#!/bin/sh
# ---------------------------------------------------------------------
# This script sets up two replicas on af-mimosa.site. It was run as
# root. If this script is run a second time, it just ensures that the
# replicas are up and running.
# ---------------------------------------------------------------------
MONGO_REPLICAS_DIR=/mongodb-replicas
          EMEA_DIR=emea
  ASIA_PACIFIC_DIR=asia-pacific
  MONGOD_PROCESSES=/tmp/mongodb-replicas.deleteme

# Check for root...
id=`id -u 2>&1`
if [ $id -ne 0 ]; then
  echo "This script can only run as root."
  exit -1
fi

# Create subdirectories for MongoDB replicas. A separate MongoDB will
# run on each replica--separate even from the MongoDB running on this
# server.
if [  ! -d $MONGO_REPLICAS_DIR ]; then
  mkdir -p $MONGO_REPLICAS_DIR/$EMEA_DIR          # the emea secondary will run here
  mkdir -p $MONGO_REPLICAS_DIR/$ASIA_PACIFIC_DIR  # the asia-pacific secondary will run here
fi

# If there are configuration files locally, copy them up to the new
# subdirectory for use in launching mongod process(es).
conf_files=`ls *.conf`

for file in $conf_files; do
  echo "  $file"
  cp $file $MONGO_REPLICAS_DIR
done

cd $MONGO_REPLICAS_DIR
CWD=`pwd`
echo $CWD

# Are the mongods running? There will be the main one that's for the
# server and two others that are for our replicas.
ps -ef | grep "[m]ongod " > $MONGOD_PROCESSES
counts=`cat $MONGOD_PROCESSES | wc`
 count=`echo $counts | awk '{ print $1 }'`

echo "Found $count mongod process(es) running..."

case $count in
  1) echo "There's one mongod; starting the replicas..."
     ;;
  3) echo "Three mongods are running; this probably includes the replicas."
     exit 0
     ;;
  *) cat $MONGOD_PROCESSES
     echo "There aren't enough mongod processes running on this host."
     exit -2
     ;;
esac

# Launch the mongod processes that are our replicas:
# This will NOT work until we get the configuration files copied over
# there.
/usr/bin/mongod --config emea.conf
/usr/bin/mongod --config asia-pacific.conf

# vim: set tabstop=2 shiftwidth=2:

And here's the result on Mimosa...

root@af-mimosa:~# ./mimosa-replicas.sh
  americas.conf
  asia-pacific.conf
  emea.conf
  staging.conf
/mongodb-replicas
Found 1 mongod process(es) running...
There's one mongod; starting the replicas...
forked process: 20881
all output going to: /mongodb-replicas/emea-replica.log
child process started successfully, parent exiting
forked process: 20900
all output going to: /mongodb-replicas/asia-pacific-replica.log
child process started successfully, parent exiting
root@af-mimosa:~# ll /mongodb-replicas/
total 40
drwxr-xr-x  4 root root 4096 2007-07-19 01:06 ./
drwxr-xr-x 22 root root 4096 2007-07-19 01:06 ../
-rw-r--r--  1 root root  204 2007-07-19 01:06 americas.conf
drwxr-xr-x  2 root root 4096 2007-07-19 01:06 asia-pacific/
-rw-r--r--  1 root root  216 2007-07-19 01:06 asia-pacific.conf
-rw-r--r--  1 root root 1971 2007-07-19 01:06 asia-pacific-replica.log
drwxr-xr-x  2 root root 4096 2007-07-19 01:06 emea/
-rw-r--r--  1 root root  192 2007-07-19 01:06 emea.conf
-rw-r--r--  1 root root 1931 2007-07-19 01:06 emea-replica.log
-rw-r--r--  1 root root  198 2007-07-19 01:06 staging.conf
root@af-mimosa:~# ps -ef | grep [m]ongod
mongodb  19465  1  0 Jul17 ?    00:01:28 /usr/bin/mongod --config /etc/mongodb.conf
root     20881  1  0 01:06 ?    00:00:00 /usr/bin/mongod --config emea.conf
root     20900  1  0 01:06 ?    00:00:00 /usr/bin/mongod --config asia-pacific.conf

Launch the MongoDB shell

With all four mongod processes running on the two server hosts, we'll run the MongoDB shell from our development host targeting one of the mongod processes on Black Pearl, the one for the staging-replica. We could have done this anywhere.

blackpearl-config.js:

This script is thus named because it just runs in a shell connected to a node on Black Pearl. This is arbitrary. In the end, we only need to scrape the config assignment statement out; the rest of what's in here is just comments.

/* ------------------------------------------------------------------------
- This is for configuring the two replicas for server Black Pearl. Copy
- and paste the lines below into the MongoDB shell to configure the
- replicas. Run this script in a shell launched to target Black Pearl
- remotely; port 37017 happens to be the primary. The command to launch
- the shell is:
-
- $ mongo --host staging-replica --port 37017
-
- You shouldn't do this until scripts blackpearl-replicas.sh and
- mimosa-replicas.sh have been run successfully on their respective hosts,
- af-blackpearl.site and af-mimosa.site. This is because it won't work
- until all four mongod processes are going.
- ------------------------------------------------------------------------ */

config =
    { _id:"blackpearl", members:
        [
            { _id:0, host:"staging-replica:37017" },
            { _id:2, host:"americas-replica:37018" },
            { _id:3, host:"emea-replica:37019" },
            { _id:4, host:"asia-pacific-replica:37020" }
        ]
    }

// Then issue these next commands to the shell:

rs.initiate( config );
rs.conf();
rs.status();

Copy and paste the assignment to config that will configure our replica nodes.

russ ~/blackpearl/accountmgr $ mongo --host staging-replica --port 37017
MongoDB shell version: 2.2.0
connecting to: staging-replica:37017/test
> show dbs
local	(empty)
> config =
...     { _id:"blackpearl", members:
...         [
...             { _id:0, host:"staging-replica:37017" },
...             { _id:2, host:"americas-replica:37018" },
...             { _id:3, host:"emea-replica:37019" },
...             { _id:4, host:"asia-pacific-replica:37020" }
...         ]
...     };
{
	"_id" : "blackpearl",
	"members" : [
		{
			"_id" : 0,
			"host" : "staging-replica:37017"
		},
		{
			"_id" : 2,
			"host" : "americas-replica:37018"
		},
		{
			"_id" : 3,
			"host" : "emea-replica:37019"
		},
		{
			"_id" : 4,
			"host" : "asia-pacific-replica:37020"
		}
	]
}
> rs.initiate( config );
{
	"info" : "Config now saved locally.  Should come online in about a minute.",
	"ok" : 1
}
> rs.conf()
{
	"_id" : "blackpearl",
	"version" : 1,
	"members" : [
		{
			"_id" : 0,
			"host" : "staging-replica:37017"
		},
		{
			"_id" : 2,
			"host" : "americas-replica:37018"
		},
		{
			"_id" : 3,
			"host" : "emea-replica:37019"
		},
		{
			"_id" : 4,
			"host" : "asia-pacific-replica:37020"
		}
	]
}
blackpearl:SECONDARY> rs.status()
{
	"set" : "blackpearl",
	"date" : ISODate("2012-10-12T18:37:20Z"),
	"myState" : 1,
	"members" : [
		{
			"_id" : 0,
			"name" : "staging-replica:37017",
			"health" : 1,
			"state" : 1,
			"stateStr" : "PRIMARY",
			"uptime" : 119,
			"optime" : Timestamp(1350067021000, 1),
			"optimeDate" : ISODate("2012-10-12T18:37:01Z"),
			"self" : true
		},
		{
			"_id" : 2,
			"name" : "americas-replica:37018",
			"health" : 1,
			"state" : 3,
			"stateStr" : "RECOVERING",
			"uptime" : 19,
			"optime" : Timestamp(1350067021000, 1),
			"optimeDate" : ISODate("2012-10-12T18:37:01Z"),
			"lastHeartbeat" : ISODate("2012-10-12T18:37:19Z"),
			"pingMs" : 0,
			"errmsg" : "initial sync done"
		},
		{
			"_id" : 3,
			"name" : "emea-replica:37019",
			"health" : 1,
			"state" : 3,
			"stateStr" : "RECOVERING",
			"uptime" : 19,
			"optime" : Timestamp(0, 0),
			"optimeDate" : ISODate("1970-01-01T00:00:00Z"),
			"lastHeartbeat" : ISODate("2012-10-12T18:37:19Z"),
			"pingMs" : 0,
			"errmsg" : "initial sync need a member to be primary or secondary to do our initial sync"
		},
		{
			"_id" : 4,
			"name" : "asia-pacific-replica:37020",
			"health" : 1,
			"state" : 3,
			"stateStr" : "RECOVERING",
			"uptime" : 17,
			"optime" : Timestamp(0, 0),
			"optimeDate" : ISODate("1970-01-01T00:00:00Z"),
			"lastHeartbeat" : ISODate("2012-10-12T18:37:19Z"),
			"pingMs" : 0,
			"errmsg" : "initial sync need a member to be primary or secondary to do our initial sync"
		}
	],
	"ok" : 1
}
blackpearl:PRIMARY> rs.status()
{
	"set" : "blackpearl",
	"date" : ISODate("2012-10-12T18:37:31Z"),
	"myState" : 1,
	"members" : [
		{
			"_id" : 0,
			"name" : "staging-replica:37017",
			"health" : 1,
			"state" : 1,
			"stateStr" : "PRIMARY",
			"uptime" : 130,
			"optime" : Timestamp(1350067021000, 1),
			"optimeDate" : ISODate("2012-10-12T18:37:01Z"),
			"self" : true
		},
		{
			"_id" : 2,
			"name" : "americas-replica:37018",
			"health" : 1,
			"state" : 2,
			"stateStr" : "SECONDARY",
			"uptime" : 30,
			"optime" : Timestamp(1350067021000, 1),
			"optimeDate" : ISODate("2012-10-12T18:37:01Z"),
			"lastHeartbeat" : ISODate("2012-10-12T18:37:29Z"),
			"pingMs" : 0
		},
		{
			"_id" : 3,
			"name" : "emea-replica:37019",
			"health" : 1,
			"state" : 3,
			"stateStr" : "RECOVERING",
			"uptime" : 30,
			"optime" : Timestamp(1350067021000, 1),
			"optimeDate" : ISODate("2012-10-12T18:37:01Z"),
			"lastHeartbeat" : ISODate("2012-10-12T18:37:29Z"),
			"pingMs" : 0,
			"errmsg" : "syncing to: staging-replica:37017"
		},
		{
			"_id" : 4,
			"name" : "asia-pacific-replica:37020",
			"health" : 1,
			"state" : 3,
			"stateStr" : "RECOVERING",
			"uptime" : 28,
			"optime" : Timestamp(1350067021000, 1),
			"optimeDate" : ISODate("2012-10-12T18:37:01Z"),
			"lastHeartbeat" : ISODate("2012-10-12T18:37:29Z"),
			"pingMs" : 0,
			"errmsg" : "syncing to: staging-replica:37017"
		}
	],
	"ok" : 1
}
blackpearl:PRIMARY> rs.status()
{
	"set" : "blackpearl",
	"date" : ISODate("2012-10-12T18:37:38Z"),
	"myState" : 1,
	"members" : [
		{
			"_id" : 0,
			"name" : "staging-replica:37017",
			"health" : 1,
			"state" : 1,
			"stateStr" : "PRIMARY",
			"uptime" : 137,
			"optime" : Timestamp(1350067021000, 1),
			"optimeDate" : ISODate("2012-10-12T18:37:01Z"),
			"self" : true
		},
		{
			"_id" : 2,
			"name" : "americas-replica:37018",
			"health" : 1,
			"state" : 2,
			"stateStr" : "SECONDARY",
			"uptime" : 37,
			"optime" : Timestamp(1350067021000, 1),
			"optimeDate" : ISODate("2012-10-12T18:37:01Z"),
			"lastHeartbeat" : ISODate("2012-10-12T18:37:37Z"),
			"pingMs" : 0
		},
		{
			"_id" : 3,
			"name" : "emea-replica:37019",
			"health" : 1,
			"state" : 2,
			"stateStr" : "SECONDARY",
			"uptime" : 37,
			"optime" : Timestamp(1350067021000, 1),
			"optimeDate" : ISODate("2012-10-12T18:37:01Z"),
			"lastHeartbeat" : ISODate("2012-10-12T18:37:37Z"),
			"pingMs" : 0
		},
		{
			"_id" : 4,
			"name" : "asia-pacific-replica:37020",
			"health" : 1,
			"state" : 2,
			"stateStr" : "SECONDARY",
			"uptime" : 35,
			"optime" : Timestamp(1350067021000, 1),
			"optimeDate" : ISODate("2012-10-12T18:37:01Z"),
			"lastHeartbeat" : ISODate("2012-10-12T18:37:37Z"),
			"pingMs" : 0
		}
	],
	"ok" : 1
}

Now you see all four nodes are up and awaiting our use. Here's proof...

blackpearl:PRIMARY> use accountmgrdb
switched to db accountmgrdb
blackpearl:PRIMARY> show collections
accounts
addresses
partners
system.indexes
blackpearl:PRIMARY> db.partners.find().pretty()
...
{
	"_id" : ObjectId("5078652ee4b0f9b645e6226c"),
	"companykey" : "ACME",
	"companyname" : "Acme Corporation.",
	"credentials" : {
		"secretkey" : "PvNisVJ9t7ItvDM/Rdbm/3R3fLASAMv/LyJpsRgIVLk=",
		"appkey" : "HEDvhQ9M7GRkQEhOiHXTsTRD4b3Oe3nExaq2fbsqe5s=",
		"created" : ISODate("2012-10-12T18:45:02.022Z"),
		"active" : true,
		"tokenttl" : 5
	},
	"created" : ISODate("2012-10-12T18:45:02.022Z"),
	"lastupdate" : ISODate("2012-10-12T18:45:02.022Z"),
	"forgotten" : false,
	"loggedin" : false
}
blackpearl:PRIMARY> db.accounts.find( { "fullname":"Jack the Ripper" } ).pretty()
{
	"_id" : ObjectId("507876f1e4b0d6576971343b"),
	"email" : "jack.the.ripper@eastend-murderers.co.uk",
	"password" : "Stabhardstaboften68",
	"partneroid" : ObjectId("100000000000000000000888"),
	"fullname" : "Jack the Ripper",
	"language" : "en",
	"created" : ISODate("2012-10-12T20:00:49.324Z"),
	"lastupdate" : ISODate("2012-10-12T20:00:49.324Z"),
	"forgotten" : false,
	"loggedin" : false,
	"loginfailurecount" : 0,
	"lockedflag" : false
}

Here we try another node...

~ $ mongo --host emea-replica --port 37019
MongoDB shell version: 2.2.0
connecting to: emea-replica:37019/test
blackpearl:SECONDARY> show dbs
accountmgrdb	0.0625GB
local	0.125GB
blackpearl:SECONDARY> use accountmgrdb
switched to db accountmgrdb
blackpearl:SECONDARY> show collections
Fri Oct 12 12:45:55 uncaught exception: error: { "$err" : "not master and slaveOk=false", "code" : 13435 }
blackpearl:SECONDARY> rs.slaveOk()
blackpearl:SECONDARY> show collections
accounts
addresses
partners
system.indexes
blackpearl:SECONDARY> db.partners.find().pretty()
...
{
	"_id" : ObjectId("5078652ee4b0f9b645e6226c"),
	"companykey" : "ACME",
	"companyname" : "Acme Corporation.",
	"credentials" : {
		"secretkey" : "PvNisVJ9t7ItvDM/Rdbm/3R3fLASAMv/LyJpsRgIVLk=",
		"appkey" : "HEDvhQ9M7GRkQEhOiHXTsTRD4b3Oe3nExaq2fbsqe5s=",
		"created" : ISODate("2012-10-12T18:45:02.022Z"),
		"active" : true,
		"tokenttl" : 5
	},
	"created" : ISODate("2012-10-12T18:45:02.022Z"),
	"lastupdate" : ISODate("2012-10-12T18:45:02.022Z"),
	"forgotten" : false,
	"loggedin" : false
}
blackpearl:SECONDARY> db.accounts.find( { "fullname":"Jack the Ripper" } ).pretty()
{
	"_id" : ObjectId("507876f1e4b0d6576971343b"),
	"email" : "jack.the.ripper@eastend-murderers.co.uk",
	"password" : "Stabhardstaboften68",
	"partneroid" : ObjectId("100000000000000000000888"),
	"fullname" : "Jack the Ripper",
	"language" : "en",
	"created" : ISODate("2012-10-12T20:00:49.324Z"),
	"lastupdate" : ISODate("2012-10-12T20:00:49.324Z"),
	"forgotten" : false,
	"loggedin" : false,
	"loginfailurecount" : 0,
	"lockedflag" : false
}
blackpearl:SECONDARY> exit
bye

Let's try our fourth replica...

~ $ mongo --host asia-pacific-replica --port 37020
MongoDB shell version: 2.2.0
connecting to: asia-pacific-replica:37020/test
blackpearl:SECONDARY> show dbs
accountmgrdb	0.0625GB
local	0.125GB
blackpearl:SECONDARY> use accountmgrdb
switched to db accountmgrdb
blackpearl:SECONDARY> rs.slaveOk()
blackpearl:SECONDARY> db.partners.find().pretty()
...
{
	"_id" : ObjectId("5078652ee4b0f9b645e6226c"),
	"companykey" : "ACME",
	"companyname" : "Acme Corporation.",
	"credentials" : {
		"secretkey" : "PvNisVJ9t7ItvDM/Rdbm/3R3fLASAMv/LyJpsRgIVLk=",
		"appkey" : "HEDvhQ9M7GRkQEhOiHXTsTRD4b3Oe3nExaq2fbsqe5s=",
		"created" : ISODate("2012-10-12T18:45:02.022Z"),
		"active" : true,
		"tokenttl" : 5
	},
	"created" : ISODate("2012-10-12T18:45:02.022Z"),
	"lastupdate" : ISODate("2012-10-12T18:45:02.022Z"),
	"forgotten" : false,
	"loggedin" : false
}
blackpearl:SECONDARY> db.accounts.find( { "fullname":"Jack the Ripper" } ).pretty()
{
	"_id" : ObjectId("507876f1e4b0d6576971343b"),
	"email" : "jack.the.ripper@eastend-murderers.co.uk",
	"password" : "Stabhardstaboften68",
	"partneroid" : ObjectId("100000000000000000000888"),
	"fullname" : "Jack the Ripper",
	"language" : "en",
	"created" : ISODate("2012-10-12T20:00:49.324Z"),
	"lastupdate" : ISODate("2012-10-12T20:00:49.324Z"),
	"forgotten" : false,
	"loggedin" : false,
	"loginfailurecount" : 0,
	"lockedflag" : false
}
blackpearl:SECONDARY> exit
bye

Four nodes, you say? With an even number of voting members, a split can leave each half of the set without the majority needed to elect a primary, which is why MongoDB recommends an odd number of voters. Yes, I'm going to come back and set up an arbiter soon.

Deploying an arbiter

An arbiter is just a special mongod instance that holds no data (and can never become primary), but can vote in any necessary election.

Because I'm experimenting with only two separate VM hosts (two old junk pieces of hardware on which I'd already set up Ubuntu Natty), I'm forced to set up my arbiter on one of them. It would be better to put the arbiter on a separate host, just as it was a little silly of me to set up the replica nodes two per VM in this test.

As this is only an experiment, and as long as everyone reading this gets it, we can continue.

Setting up an arbiter

  1. Add an entry for the arbiter, blackpearl-arbiter, to /etc/hosts just as was done for each replica node.
  2. mkdir /mongodb-replicas/arbiter
  3. mongod --port 37021 --dbpath /mongodb-replicas/arbiter --replSet blackpearl --logpath /mongodb-replicas/arbiter.log
  4. rs.addArb( "blackpearl-arbiter:37021" )

The instructions above are adapted to what I'm doing here (rather than being generic, which is what the MongoDB documentation gives you anyway). I'm going to accomplish them using a mongodb.conf file, which is handy as a permanent record, and I'm going to add the arbiter set-up to my original subdirectory set-up script (also as a permanent record).
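
Because arbiter.conf (see the appendix) already carries the port, replSet, dbpath and logpath, launching the arbiter's mongod on Black Pearl reduces to the same pattern the script uses for the replicas; this is just a sketch of the line already present in blackpearl-replicas.sh:

root@af-blackpearl:/mongodb-replicas# /usr/bin/mongod --config arbiter.conf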

Let's add the arbiter:

~ $ mongo --host blackpearl-arbiter --port 37017
MongoDB shell version: 2.2.0
connecting to: blackpearl-arbiter:37017/test
blackpearl:PRIMARY> rs.conf()
{
	"_id" : "blackpearl",
	"version" : 1,
	"members" : [
		{
			"_id" : 0,
			"host" : "staging-replica:37017"
		},
		{
			"_id" : 2,
			"host" : "americas-replica:37018"
		},
		{
			"_id" : 3,
			"host" : "emea-replica:37019"
		},
		{
			"_id" : 4,
			"host" : "asia-pacific-replica:37020"
		}
	]
}
blackpearl:PRIMARY> rs.addArb( "blackpearl-arbiter:37021" )
{ "down" : [ "blackpearl-arbiter:37021" ], "ok" : 1 }
blackpearl:PRIMARY> rs.conf()
{
	"_id" : "blackpearl",
	"version" : 2,
	"members" : [
		{
			"_id" : 0,
			"host" : "staging-replica:37017"
		},
		{
			"_id" : 2,
			"host" : "americas-replica:37018"
		},
		{
			"_id" : 3,
			"host" : "emea-replica:37019"
		},
		{
			"_id" : 4,
			"host" : "asia-pacific-replica:37020"
		},
		{
			"_id" : 5,
			"host" : "blackpearl-arbiter:37021",
			"arbiterOnly" : true
		}
	]
}
blackpearl:PRIMARY> rs.status()
{
	"set" : "blackpearl",
	"date" : ISODate("2012-10-15T15:49:41Z"),
	"myState" : 1,
	"members" : [
		{
			"_id" : 0,
			"name" : "staging-replica:37017",
			"health" : 1,
			"state" : 1,
			"stateStr" : "PRIMARY",
			"uptime" : 249260,
			"optime" : Timestamp(1350315961000, 1),
			"optimeDate" : ISODate("2012-10-15T15:46:01Z"),
			"self" : true
		},
		{
			"_id" : 2,
			"name" : "americas-replica:37018",
			"health" : 1,
			"state" : 2,
			"stateStr" : "SECONDARY",
			"uptime" : 249160,
			"optime" : Timestamp(1350315961000, 1),
			"optimeDate" : ISODate("2012-10-15T15:46:01Z"),
			"lastHeartbeat" : ISODate("2012-10-15T15:49:40Z"),
			"pingMs" : 0
		},
		{
			"_id" : 3,
			"name" : "emea-replica:37019",
			"health" : 1,
			"state" : 2,
			"stateStr" : "SECONDARY",
			"uptime" : 249160,
			"optime" : Timestamp(1350315961000, 1),
			"optimeDate" : ISODate("2012-10-15T15:46:01Z"),
			"lastHeartbeat" : ISODate("2012-10-15T15:49:40Z"),
			"pingMs" : 0
		},
		{
			"_id" : 4,
			"name" : "asia-pacific-replica:37020",
			"health" : 1,
			"state" : 2,
			"stateStr" : "SECONDARY",
			"uptime" : 249158,
			"optime" : Timestamp(1350315961000, 1),
			"optimeDate" : ISODate("2012-10-15T15:46:01Z"),
			"lastHeartbeat" : ISODate("2012-10-15T15:49:40Z"),
			"pingMs" : 0
		},
		{
			"_id" : 5,
			"name" : "blackpearl-arbiter:37021",
			"health" : 0,
			"state" : 8,
			"stateStr" : "(not reachable/healthy)",
			"uptime" : 0,
			"lastHeartbeat" : ISODate("1970-01-01T00:00:00Z"),
			"pingMs" : 0,
			"errmsg" : "socket exception [CONNECT_ERROR] for blackpearl-arbiter:37021"
		}
	],
	"ok" : 1
}
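
Note the arbiter reports "(not reachable/healthy)" above, presumably because its mongod hadn't been launched yet (and blackpearl-arbiter must also resolve in /etc/hosts, per step 1). Once it's running, a quick re-check from the same shell would look something like this (a sketch, not captured output):

blackpearl:PRIMARY> rs.status().members[ 4 ].stateStr   // expect "ARBITER" once it's reachable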



Appendix: the /etc/hosts additions

af-blackpearl.site:
127.0.1.1     staging-replica  americas-replica  blackpearl-replica
16.86.192.111 af-mimosa.site   emea-replica      asia-pacific-replica
af-mimosa.site:
127.0.1.1     emea-replica        asia-pacific-replica
16.86.192.110 af-blackpearl.site  staging-replica       americas-replica  blackpearl-replica

Appendix: the mongodb.conf files

staging.conf:
# This is mongodb.conf for the staging (original primary) node.
port=37017
replSet=blackpearl
logpath=/mongodb-replicas/staging-replica.log
dbpath=/mongodb-replicas/staging
fork=true
logappend=true
americas.conf:
# This is mongodb.conf for the americas (original secondary) node.
port=37018
replSet=blackpearl
logpath=/mongodb-replicas/americas-replica.log
dbpath=/mongodb-replicas/americas
fork=true
logappend=true
arbiter.conf:
# This is mongodb.conf for the arbiter node made necessary
# by the fact that we'd otherwise have only four nodes.
port=37021
replSet=blackpearl
logpath=/mongodb-replicas/arbiter.log
dbpath=/mongodb-replicas/arbiter
fork=true
logappend=true
emea.conf:
# This is mongodb.conf for the emea (original secondary) node.
port=37019
replSet=blackpearl
logpath=/mongodb-replicas/emea-replica.log
dbpath=/mongodb-replicas/emea
fork=true
logappend=true
asia-pacific.conf:
# This is mongodb.conf for the asia-pacific (original secondary) node.
port=37020
replSet=blackpearl
logpath=/mongodb-replicas/asia-pacific-replica.log
dbpath=/mongodb-replicas/asia-pacific
fork=true
logappend=true

Appendix: hidden replica nodes

Hidden replica nodes are "non-production" members of a replica set: they do not show up in db.isMaster(), so client applications never discover them or route reads and writes to them. Because they carry priority 0, they can never become primary, but they do still vote in elections.

Use a hidden replica node as the node from which to perform back-ups, e.g. the node against which to run mongodump (addressed explicitly by host and port).

Here's how to configure a node as a hidden replica:

    config = rs.conf()                      // fetch the current replica set configuration
    config.members[ 0 ].priority = 0        // index 0 is just an example; pick the member to hide
    config.members[ 0 ].hidden = true       // priority 0 plus hidden makes it invisible to clients
    rs.reconfig( config )                   // issue this from the primary

See Configure a Replica Set Member as Hidden.
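
For example, a back-up against a hidden member might look something like this (a sketch; americas-replica and the output path are only illustrations, since no member of this replica set is actually hidden):

$ mongodump --host americas-replica --port 37018 --out /backups/accountmgrdb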


Appendix: arbiter nodes

Arbiters hold no data (and so take up no significant disk space); they exist only to break ties in the election of new primaries.

Do not run an arbiter on a host that is already running a mongod serving as an active primary or secondary of the replica set the arbiter is meant to serve.

To set an arbiter up (see also live example Setting up an arbiter, above):

    mkdir /data/arb
    mongod --port 37021 --dbpath /data/arb --replSet blackpearl --logpath /data/arb/arbiter.log --fork
    mongo --host blackpearl-arbiter --port 37017    # port 37017 reaches the primary (staging-replica)
    rs.addArb( "blackpearl-arbiter:37021" )         // issued in the shell just launched

See Add an Arbiter to Replica Set.


Appendix: Trouble, woe, woe, trouble...

I once upgraded a three-VM cluster of MongoDB nodes from version 2.2.x to 2.4.5 and found that upstart no longer worked. I was able to launch mongod on each VM by hand. Once launched, rs.status() revealed that the replica set did come back up cleanly (though the primary moved to a different node).

The replica set in question was set up with only the following difference at the top of /etc/mongodb.conf:

# mongodb.conf
# --------------------------------------------------------------
# These values are added here in configuration of a replica set.
# No other values in this file were changed.
port=37017
replSet=cloudmgr-replicas
fork=true
# --------------------------------------------------------------

(The other nodes' versions of this file contained different port numbers, of course.)

Here's a console scrape of booting the second node:

~ $ ssh db2
Welcome to Ubuntu 12.04.2 LTS (GNU/Linux 3.2.0-51-generic x86_64)

...

0 packages can be updated.
0 updates are security updates.

Last login: Sat Aug 17 18:17:18 2013 from 192.168.0.102
russ@db-2:~$ sudo bash
root@db-2:~# ps -ef | grep [m]ongo
root@db-2:~# service mongodb restart
stop: Unknown instance:
mongodb start/running, process 1309
root@db-2:~# service mongodb status
mongodb stop/waiting
root@db-2:~# mongod --config /etc/mongodb.conf
about to fork child process, waiting until server is ready for connections.
forked process: 1320
all output going to: /var/log/mongodb/mongodb.log
child process started successfully, parent exiting

Of course, even getting them up by hand, I get:

root@db-3:~# ps -ef | grep [m]ongod
root  1353  1  0 Aug17 ?  00:15:17 mongod --config /etc/mongodb.conf
root@db-3:~# service mongodb status
mongodb stop/waiting

Solution

I have found the symptom, though I have no idea where the disease came from.

Poring through the log as I attempted to start the service (I don't know why this isn't a problem when I start mongod by hand), I see:

Wed Aug 21 10:22:50.408 [initandlisten] MongoDB starting : pid=2898 port=37017 dbpath=/var/lib/mongodb 64-bit host=db-1
Wed Aug 21 10:22:50.409 [initandlisten] db version v2.4.6
Wed Aug 21 10:22:50.409 [initandlisten] git version: b9925db5eac369d77a3a5f5d98a145eaaacd9673
Wed Aug 21 10:22:50.409 [initandlisten] build info: Linux ip-10-2-29-40 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 \
         EST 2009 x86_64 BOOST_LIB_VERSION=1_49
Wed Aug 21 10:22:50.409 [initandlisten] allocator: tcmalloc
Wed Aug 21 10:22:50.409 [initandlisten] options: { config: "/etc/mongodb.conf", dbpath: "/var/lib/mongodb", fork: "true", \
         logappend: "true", logpath: "/var/log/mongodb/mongodb.log", port: 37017, replSet: "cloudmgr-replicas" }
Wed Aug 21 10:22:50.418 [initandlisten] journal dir=/var/lib/mongodb/journal
Wed Aug 21 10:22:50.418 [initandlisten] recover begin
Wed Aug 21 10:22:50.418 [initandlisten] info no lsn file in journal/ directory
Wed Aug 21 10:22:50.418 [initandlisten] recover lsn: 0
Wed Aug 21 10:22:50.418 [initandlisten] recover /var/lib/mongodb/journal/j._0
Wed Aug 21 10:22:50.418 [initandlisten] couldn't open /var/lib/mongodb/journal/j._0 errno:13 Permission denied
Wed Aug 21 10:22:50.418 [initandlisten] Assertion: 13544:recover error couldn't open /var/lib/mongodb/journal/j._0
0xdddd81 0xd9f55b 0xd9fa9c 0x931ef1 0x932182 0x932aac 0x932d12 0x91df9f 0x6d6cbc 0x6d74ed 0x6ddfd0 0x6dfc29 0x7f4a97e7976d 0x6cf999

And, sure enough, I see:

root@db-1:/var/lib/mongodb/journal# ll
total 3145748
drwxr-xr-x 2 mongodb mongodb       4096 Aug 21 09:11 ./
drwxr-xr-x 3 mongodb mongodb       4096 Aug 21 10:22 ../
-rw------- 1 root    root    1073741824 Aug 21 09:11 j._0
-rw------- 1 mongodb mongodb 1073741824 Feb  2  2013 prealloc.1
-rw------- 1 mongodb mongodb 1073741824 Feb  2  2013 prealloc.2

On nodes db2 and db3, the file that wasn't properly owned was named prealloc.0.

I'm not sure whether the upgrade caused this permissions change or whether it happened because I launched mongod by hand as root. In any case, re-owning the file to mongodb, removing the lock file (in the directory above) and relaunching via upstart fixed the problem, though I had to cycle through my three servers and fix permissions one more time before the problem was completely cleared up.
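
In shell terms, the repair on each node amounted to something like this (a sketch; the mis-owned file was j._0 on db1 and prealloc.0 on db2 and db3):

root@db-1:~# chown mongodb:mongodb /var/lib/mongodb/journal/j._0
root@db-1:~# rm /var/lib/mongodb/mongod.lock
root@db-1:~# service mongodb start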