Transcription of Notes on
Setting Up MongoDB
in a Data Center

Russell Bateman
4 December 2012
last update:

Table of Contents

Checklist
If mongod fails to start...
MongoDB replica sets
Practical: setting up a replica set
Steps followed
Appendix: MongoDB helper files
Appendix: MongoDB shell scrapes
Appendix: MongoDB node launch sequence scrape
Appendix: Notes on a testing set-up
MongoDB configuration files
/data/mongodb/arbiter.conf
/data/mongodb/configsvr.conf
/data/mongodb/sharding.conf
Upstart files
/etc/init/mongodb-arbiter.conf
/etc/init/mongodb-configsvr.conf
/etc/init/mongodb-sharding.conf

This is really no more than a summary from other MongoDB notes and a tutorial I maintain. However, I have in recent times enhanced it with notes taken while setting up MongoDB in an actual data center.

The assumption is that you're setting up MongoDB on three or more VMs in a data center and that the set-up will be accomplished behind a firewall of some sort. Also, we're assuming Ubuntu server, probably Precise. Some of these instructions are from Install MongoDB on Ubuntu.

Here's the checklist...

  1. Get permanent root. If you can't do this, you'll need to prefix all the commands in this check list with sudo.
    $ sudo bash
  2. You'll need to attend to your advanced packaging tool repository. First, you'll need to get a public GPG key:
    $ apt-key adv --keyserver keyserver.ubuntu.com --recv 7F0CEB10

    If you can't do this because the command times out, it probably means that you're behind a firewall. To solve this, follow the instructions in Getting apt keys from behind firewall.
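In case that link is unavailable, one common workaround is sketched below: query the keyserver over its HTTP port (80) instead of hkp's default port 11371, assuming the firewall permits outbound HTTP. Call the function as root.

```shell
# Fetch 10gen's public GPG key over port 80 rather than the default
# hkp port 11371, which firewalls commonly block. Run as root.
fetch_10gen_key() {
    apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 7F0CEB10
}
```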

  3. Notify package manager tool of the 10gen repository. Create /etc/apt/sources.list.d/10gen.list. This file should consist of the following line of text:
    deb http://downloads-distro.mongodb.org/repo/ubuntu-upstart dist 10gen
  4. Update package manager tool's knowledge of repositories.
    $ apt-get update
  5. Use package manager tool to install MongoDB.
    $ apt-get install mongodb-10gen

At this point, you've installed MongoDB. Because Ubuntu is an "upstart" platform, you've also scheduled it as a service to be run should you ever need to bounce your VM (or bring it up after it goes down). The dæmon is named mongod.
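A quick post-install sanity check might look like this (a sketch; mongod and the mongodb service name are as installed by the 10gen package):

```shell
# Verify that the package put mongod in place and that upstart is running it.
check_mongod() {
    command -v mongod >/dev/null || { echo "mongod not on PATH" >&2; return 1; }
    mongod --version | head -n 1     # e.g. "db version v2.2.x"
    service mongodb status           # upstart: "mongodb start/running, ..."
}
```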

If mongod fails to start...

This is usually because mongod was shut down uncleanly, leaving a lock file behind. Remove the lock file and restart the service:

$ rm /var/lib/mongodb/mongod.lock
$ service mongodb start
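A slightly safer version of the same fix, as a sketch: it refuses to remove the lock while a mongod is actually running (the default lock path is the one above).

```shell
# Remove the stale lock and restart the service, but only if no mongod
# is actually running.
clear_stale_lock() {
    lock=${1:-/var/lib/mongodb/mongod.lock}
    if pgrep -x mongod >/dev/null; then
        echo "mongod is running; leaving $lock alone" >&2
        return 1
    fi
    rm -f "$lock"
    service mongodb start
}
```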

MongoDB replica sets

You'll want the MongoDB you just installed to belong to a replica set or to become a shard. Please find a demonstration of how to set up a replica set here. Please consult the MongoDB website for information on setting up sharding.


Practical: setting up a replica set

I'm keeping MongoDB as an upstart service—it's installed this way by 10gen's Debian package.

Assume the following MongoDB-relevant paths on each node.

  • configuration file: /etc/mongodb.conf
  • dbpath = /var/lib/mongodb
  • logpath = /var/log/mongodb/mongodb.log


Here are the 5 VMs and their MongoDB ports.

acme-db01 37017
acme-db02 37018
acme-db03 37019
acme-db04 37020
acme-db05 37021
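As the table suggests, each node's port is just 37017 plus its zero-based index; a throwaway loop reproduces the mapping (host names as assumed in these notes):

```shell
# Print each node's mongod port: 37017 plus the node's zero-based index.
base=37017
i=0
for host in acme-db01 acme-db02 acme-db03 acme-db04 acme-db05; do
    echo "$host $((base + i))"
    i=$((i + 1))
done
```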

Steps followed

  1. Edited each /etc/mongodb.conf file as shown in the appendix below.
  2. Verified that ping from any node finds any other, so I didn't have to muck with /etc/hosts.
  3. Copied config.js script to acme-db01:/home/russ. (See in appendix below.)
  4. Halted mongod on all hosts using
        $ sudo service mongodb stop
        $ ps -ef | grep mongo[d]
    
  5. (However, once mongod is running as a replica-set node, stopping it must be done via the MongoDB shell! See appendix below.)
  6. Logged into acme-db01.
  7. Brought the nodes up by starting MongoDB with the new replica-set configuration, one by one, in reverse order, via
        $ sudo service mongodb start
        $ ps -ef | grep mongo[d]
    
  8. Logged into acme-db01.
  9. Launched the MongoDB shell.
  10. Scraped the first command in config.js, pasted it into the shell and executed it; then executed rs.initiate( config ), rs.conf() and rs.status(). See scrapes here. (The first time, I found that VM acme-db03 had not been created properly, so I had to redo it.)

Appendix: MongoDB helper files

/etc/mongodb.conf:

Added to the top of /etc/mongodb.conf are the following lines; they set the port, the replica-set name and backgrounding for each replica node's mongod. (370[17-21] denotes the per-node range 37017 through 37021 from the table earlier.)

# ------------------------------------------------------------
# These values are added here in support of an Acme replica set.
# No other values in this file were changed.
port=370[17-21]
replSet=acme-replicas
fork=true
# ------------------------------------------------------------

Existing settings assumed (as already set by the 10gen package):

dbpath=/var/lib/mongodb
logpath=/var/log/mongodb/mongodb.log
logappend=true
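When provisioning several nodes, the edit above can be scripted. A sketch: the function below emits the fragment for a given node's port, to be appended to the stock configuration file as root (the helper name is invented here).

```shell
# Emit the lines to append to /etc/mongodb.conf for a node, given its port.
# Usage (as root, on acme-db01):  replica_conf 37017 >> /etc/mongodb.conf
replica_conf() {
    cat <<EOF
# ------------------------------------------------------------
# These values are added here in support of an Acme replica set.
port=$1
replSet=acme-replicas
fork=true
# ------------------------------------------------------------
EOF
}
replica_conf 37017
```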

/home/russ/config.js:
/* ------------------------------------------------------------------------
- This is for configuring the Acme replica nodes. Copy and paste the lines
- below into the MongoDB shell to configure the replicas. Run this script
- in a shell launched to target acme-db01 remotely; port 37017 happens to be
- the primary. The command to launch the shell is:
-
- $ mongo --host acme-db01 --port 37017
-
- You shouldn't do this until all the mongod processes on acme-db01-05 are
- running successfully on their respective hosts. This is because it won't
- work until all five mongod processes are going.
- ------------------------------------------------------------------------ */

config =
{ _id:"acme-replicas", members:
[
{ _id:0, host:"acme-db01:37017" },
{ _id:1, host:"acme-db02:37018" },
{ _id:2, host:"acme-db03:37019" },
{ _id:3, host:"acme-db04:37020" },
{ _id:4, host:"acme-db05:37021" }
]
}

// Then issue these next commands to the shell:

rs.initiate( config )
rs.conf()
rs.status()

Appendix: MongoDB shell scrapes

Setting up JavaScript config variable.

russ@acme-db01:~$ mongo -port 37017
MongoDB shell version: 2.2.1
connecting to: 127.0.0.1:37017/test
> config =
... { _id:"acme-replicas", members:
... [
... { _id:0, host:"acme-db01:37017" },
... { _id:1, host:"acme-db02:37018" },
... { _id:2, host:"acme-db03:37019" },
... { _id:3, host:"acme-db04:37020" },
... { _id:4, host:"acme-db05:37021" }
... ]
... }
{
  "_id" : "acme-replicas",
  "members" : [
    {
      "_id" : 0,
      "host" : "acme-db01:37017"
    },
    {
      "_id" : 1,
      "host" : "acme-db02:37018"
    },
    {
      "_id" : 2,
      "host" : "acme-db03:37019"
    },
    {
      "_id" : 3,
      "host" : "acme-db04:37020"
    },
    {
      "_id" : 4,
      "host" : "acme-db05:37021"
    }
  ]
}
...

Initializing (continues from previous scrape because it uses variable config)...

...
> rs.initiate( config )
{
  "info" : "Config now saved locally.  Should come online in about a minute.",
  "ok" : 1
}
> rs.conf()
{
  "_id" : "acme-replicas",
  "version" : 1,
  "members" : [
    {
      "_id" : 0,
      "host" : "acme-db01:37017"
    },
    {
      "_id" : 1,
      "host" : "acme-db02:37018"
    },
    {
      "_id" : 2,
      "host" : "acme-db03:37019"
    },
    {
      "_id" : 3,
      "host" : "acme-db04:37020"
    },
    {
      "_id" : 4,
      "host" : "acme-db05:37021"
    }
  ]
}

Verifying... (Notice how the MongoDB shell prompt has changed after the call to rs.conf().)

acme-replicas:PRIMARY> rs.status()
{
  "set" : "acme-replicas",
  "date" : ISODate("2013-01-08T00:58:22Z"),
  "myState" : 1,
  "members" : [
    {
      "_id" : 0,
      "name" : "acme-db01:37017",
      "health" : 1,
      "state" : 1,
      "stateStr" : "PRIMARY",
      "uptime" : 1607,
      "optime" : Timestamp(1357606555000, 1),
      "optimeDate" : ISODate("2013-01-08T00:55:55Z"),
      "self" : true
    },
    {
      "_id" : 1,
      "name" : "acme-db02:37018",
      "health" : 1,
      "state" : 2,
      "stateStr" : "SECONDARY",
      "uptime" : 147,
      "optime" : Timestamp(1357606555000, 1),
      "optimeDate" : ISODate("2013-01-08T00:55:55Z"),
      "lastHeartbeat" : ISODate("2013-01-08T00:58:21Z"),
      "pingMs" : 0
    },
    {
      "_id" : 2,
      "name" : "acme-db03:37019",
      "health" : 1,
      "state" : 2,
      "stateStr" : "SECONDARY",
      "uptime" : 147,
      "optime" : Timestamp(1357606555000, 1),
      "optimeDate" : ISODate("2013-01-08T00:55:55Z"),
      "lastHeartbeat" : ISODate("2013-01-08T00:58:21Z"),
      "pingMs" : 0
    },
    {
      "_id" : 3,
      "name" : "acme-db04:37020",
      "health" : 1,
      "state" : 2,
      "stateStr" : "SECONDARY",
      "uptime" : 147,
      "optime" : Timestamp(1357606555000, 1),
      "optimeDate" : ISODate("2013-01-08T00:55:55Z"),
      "lastHeartbeat" : ISODate("2013-01-08T00:58:22Z"),
      "pingMs" : 0
    },
    {
      "_id" : 4,
      "name" : "acme-db05:37021",
      "health" : 1,
      "state" : 2,
      "stateStr" : "SECONDARY",
      "uptime" : 147,
      "optime" : Timestamp(1357606555000, 1),
      "optimeDate" : ISODate("2013-01-08T00:55:55Z"),
      "lastHeartbeat" : ISODate("2013-01-08T00:58:21Z"),
      "pingMs" : 0
    }
  ],
  "ok" : 1
}

Appendix: MongoDB node launch sequence scrape

(Some fluff trimmed...)

russ@ssh-gw:~$ ssh acme-db05
Welcome to Ubuntu 12.04.1 LTS (GNU/Linux 3.2.0-23-generic x86_64)
Last login: Mon Jan  7 15:47:02 2013 from ssh-gw.tnt3-zone1.acme
russ@acme-db05:~$ sudo service mongodb start
mongodb start/running, process 13980
russ@acme-db05:~$ ps -ef | grep mongo[d]
mongodb  13983     1  0 16:02 ?        00:00:00 /usr/bin/mongod --config /etc/mongodb.conf
russ@acme-db05:~$ logout
Connection to acme-db05 closed.
russ@ssh-gw:~$ ssh acme-db04
Welcome to Ubuntu 12.04.1 LTS (GNU/Linux 3.2.0-23-generic x86_64)
Last login: Mon Jan  7 15:49:14 2013 from ssh-gw.tnt3-zone1.acme
russ@acme-db04:~$ sudo service mongodb start ; ps -ef | grep mongo[d]
mongodb start/running, process 13774
root     13774     1  0 16:03 ?        00:00:00 start-stop-daemon --start --quiet --chuid mongodb --exec /usr/bin/mongod -- --config /etc/mongodb.conf
russ@acme-db04:~$ logout
Connection to acme-db04 closed.
russ@ssh-gw:~$ ssh acme-db03
Welcome to Ubuntu 12.04.1 LTS (GNU/Linux 3.2.0-23-generic x86_64)
Last login: Mon Jan  7 15:43:00 2013 from ssh-gw.tnt3-zone1.acme
russ@acme-db03:~$ sudo service mongodb start ; ps -ef | grep mongo[d]
mongodb start/running, process 9645
mongodb   9645     1  0 16:03 ?        00:00:00 /usr/bin/mongod --config /etc/mongodb.conf
russ@acme-db03:~$ logout
Connection to acme-db03 closed.
russ@ssh-gw:~$ ssh acme-db02
Welcome to Ubuntu 12.04.1 LTS (GNU/Linux 3.2.0-23-generic x86_64)
Last login: Mon Jan  7 15:53:38 2013 from ssh-gw.tnt3-zone1.acme
russ@acme-db02:~$ sudo service mongodb start ; ps -ef | grep mongo[d]
mongodb start/running, process 13915
root     13915     1  0 16:04 ?        00:00:00 start-stop-daemon --start --quiet --chuid mongodb --exec /usr/bin/mongod -- --config /etc/mongodb.conf
russ@acme-db02:~$ logout
Connection to acme-db02 closed.
russ@ssh-gw:~$ ssh acme-db01
Welcome to Ubuntu 12.04.1 LTS (GNU/Linux 3.2.0-23-generic x86_64)
Last login: Mon Jan  7 15:54:05 2013 from ssh-gw.tnt3-zone1.acme
russ@acme-db01:~$ sudo service mongodb start ; ps -ef | grep mongo[d]
mongodb start/running, process 15724
root     15724     1  0 16:04 ?        00:00:00 start-stop-daemon --start --quiet --chuid mongodb --exec /usr/bin/mongod -- --config /etc/mongodb.conf

Appendix: Notes on a testing set-up

It seems useful to burden a single VM node, in this case acme-db05, with

  1. an arbiter (unless there's already an odd number of nodes in the replica set)
  2. a configuration server (a single one—this is for testing, after all)
  3. a sharding router (the mongos dæmon)

However, this presents a number of challenges. Here's what I found.

  1. It seemed useful to resort to a new area, /data/mongodb, for juggling all of these configuration and MongoDB files. Besides, in numerous MongoDB Chef recipes I've studied, everyone seems to be doing this.

  2. There were all sorts of problems of permissions and privileges to overcome. The entire /data/mongodb subdirectory tree had to be re-owned by user mongodb. Remember, the upstart configuration will launch the dæmons as owned by this user and not root (for good reason).

  3. I had to experiment directly with the start-stop-daemon command as user mongodb for realism before getting it right. E.g.:
        mongodb@acme-db05:/data/mongodb$ exec start-stop-daemon --start --quiet --chuid mongodb \
                     --exec /usr/bin/mongos -- --config /data/mongodb/sharding.conf

    Mostly, I had to do this in order to see why it wasn't working.

  4. I could not run /usr/bin/mongod once for the arbiter and again for the configuration server (also a mongod process), presumably because start-stop-daemon declines to start a second instance of the same --exec binary. The only (or, at least, best) way around this seems to be to copy the dæmon, which I did (to /data/mongodb/bin).
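The re-owning and dæmon-copying described above boil down to a couple of commands; a sketch (the function name is invented; paths and the mongodb user are the ones used in these notes):

```shell
# Make a private copy of mongod for the configuration server's upstart job
# and re-own the whole tree for the mongodb user.
# Usage (as root):  setup_data_area /data/mongodb /usr/bin/mongod mongodb
setup_data_area() {
    data=$1; src=$2; owner=$3
    mkdir -p "$data/bin"
    cp "$src" "$data/bin/mongod"   # second mongod binary for the configsvr job
    chown -R "$owner" "$data"      # upstart launches the dæmons as this user
}
```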

Filesystem result

Here's what I have at the end of working out all the trouble. (Note: MongoDB created the _tmp subdirectories by itself.)

    root@acme-db05:/data/mongodb# tree
    .
    |-- arbiter                 arbiter stuff...
    |   |-- journal
    |   |   |-- j._0
    |   |   |-- prealloc.1
    |   |   `-- prealloc.2
    |   |-- local.0
    |   |-- local.ns
    |   |-- log
    |   |   `-- arbiter.log
    |   `-- _tmp
    |-- arbiter.conf            arbiter's /etc/mongodb.conf...
    |-- bin
    |   `-- mongod              copy of mongod for configuration server's use...
    |-- configsvr               configuration server stuff...
    |   |-- config.0
    |   |-- config.1
    |   |-- config.ns
    |   |-- journal
    |   |   |-- j._0
    |   |   |-- lsn
    |   |   |-- prealloc.1
    |   |   `-- prealloc.2
    |   |-- local.0
    |   |-- local.ns
    |   |-- log
    |   |   `-- configsvr.log
    |   `-- _tmp
    |-- configsvr.conf          configuration server's /etc/mongodb.conf...
    |-- sharding                sharding router stuff...
    |   `-- log
    |       `-- sharding.log
    `-- sharding.conf           sharding router's /etc/mongodb.conf...

    11 directories, 23 files

MongoDB configuration files

As suggested by the subdirectory tree above, here are the files that in essence replace the stock /etc/mongodb.conf file. See the significance of this at MongoDB Training: Sharding: Sharding instructions.

/data/mongodb/arbiter.conf:
    port=37021
    replSet=acme-replicas
    fork=true
    dbpath=/data/mongodb/arbiter
    logpath=/data/mongodb/arbiter/log/arbiter.log

/data/mongodb/configsvr.conf:

The port number isn't completely arbitrary: it must be recorded and told to the sharding router.

    port=47021
    configsvr=true
    fork=true
    dbpath=/data/mongodb/configsvr
    logpath=/data/mongodb/configsvr/log/configsvr.log

/data/mongodb/sharding.conf:

The sharding router must know the configuration server's port number. (Note that mongos takes no dbpath: it stores no data of its own.)

    port=27017
    configdb=acme-db05:47021
    fork=true
    logpath=/data/mongodb/sharding/log/sharding.log

If this weren't a testing-only set-up...

Note that if there were the full complement of three configuration servers, the configdb value in each respective sharding.conf would look like this—the additional host names and port numbers are invented here. Any additional sharding routers would need to use this exact, same configdb string (but, if they shared a host, with port numbers other than 27017). Also, it wouldn't be a good idea to run the configuration servers all on the same host.

    configdb=acme-db05:47021,acme-db06:47022,acme-db07:47023
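For instance, a second sharding router's configuration file (port and log path invented here) would differ only in port and logpath; the configdb string must be identical on every router:

```
port=27018
configdb=acme-db05:47021,acme-db06:47022,acme-db07:47023
fork=true
logpath=/data/mongodb/sharding2/log/sharding.log
```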

Upstart files

Basically, I copied these from /etc/init/mongodb.conf as installed by the MongoDB Debian package, modifying them while leaving them mostly intact. This ensures that all three dæmons will be relaunched without my attention after the VM comes back up from going down for whatever reason.

/etc/init/mongodb-arbiter.conf:
    author "Russell Bateman"
    description "Keeps arbiter (mongod) running between boots"

    limit nofile 20000 20000
    kill timeout 300          # wait 300s between SIGTERM and SIGKILL.

    start on runlevel [2345]
    stop on runlevel [06]

    script
      ENABLE_MONGODB="yes"
      if [ -f /etc/default/mongodb ]; then . /etc/default/mongodb; fi
      if [ "x$ENABLE_MONGODB" = "xyes" ]; then
        exec start-stop-daemon --start --quiet --chuid mongodb --exec /usr/bin/mongod -- --config /data/mongodb/arbiter.conf
      fi
    end script

/etc/init/mongodb-configsvr.conf:
    author "Russell Bateman"
    description "Keeps configuration server (mongod) running between boots"

    limit nofile 20000 20000
    kill timeout 300          # wait 300s between SIGTERM and SIGKILL.

    start on runlevel [2345]
    stop on runlevel [06]

    script
      ENABLE_MONGODB="yes"
      if [ -f /etc/default/mongodb ]; then . /etc/default/mongodb; fi
      if [ "x$ENABLE_MONGODB" = "xyes" ]; then
        exec start-stop-daemon --start --quiet --chuid mongodb --exec /data/mongodb/bin/mongod -- --config /data/mongodb/configsvr.conf
      fi
    end script

/etc/init/mongodb-sharding.conf:
    author "Russell Bateman"
    description "Keeps mongos running between boots"

    limit nofile 20000 20000
    kill timeout 300          # wait 300s between SIGTERM and SIGKILL.

    start on started mongodb-configsvr
    stop on stopping mongodb-configsvr

    script
      ENABLE_MONGODB="yes"
      if [ -f /etc/default/mongodb ]; then . /etc/default/mongodb; fi
      if [ "x$ENABLE_MONGODB" = "xyes" ]; then
        exec start-stop-daemon --start --quiet --chuid mongodb --exec /usr/bin/mongos -- --config /data/mongodb/sharding.conf
      fi
    end script
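With the three upstart files in place, the jobs come back on boot by themselves. For bouncing them all by hand, a small sketch (job names as defined above; the function name is invented):

```shell
# Restart all three MongoDB upstart jobs (run as root):  bounce_mongodb_jobs
# Pass an alternate command (e.g. echo) to dry-run the loop.
bounce_mongodb_jobs() {
    svc=${1:-service}
    for job in mongodb-arbiter mongodb-configsvr mongodb-sharding; do
        "$svc" "$job" restart
    done
}
```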