Transcription of Notes on
Setting Up MongoDB and Chef
on a VM in a Data Center (obsolete)

Russell Bateman
16 July 2013
last update:

These are just random notes for now. Here's an illustration I want to use.

It's useful to use Chef to set up MongoDB for a data center installation. The primo Chef recipe for MongoDB is to be had from:

http://community.opscode.com/cookbooks/mongodb-10gen

The recipe I chose worked for me right off when I was a total newbie to Chef. Other possibilities include https://github.com/edelight/chef-mongodb and https://github.com/rbrcurtis/chef-scripts/tree/master/cookbooks/mongodb-10gen.

My recipe, and all the others I examined too, changes a few rules of the normal game if you've been setting up MongoDB on default installation paths. To wit: instead of using the traditional locations, this recipe lumps everything under the path /data/mongodb, which it creates.

This is sort of good because it puts everything side by side in one place. After all, this is for use on a replica node in a data center and no one is going to care that this stuff isn't in the canonical locations. Here's what you see after running chef-client on the database node:

    [email protected]:/# ll /data/mongodb/
    total 24
    drwxr-xr-x 6 mongodb mongodb 4096 Jul 12 22:13 ./
    drwxr-xr-x 3 root    root    4096 Jul 12 22:13 ../
    drwxr-xr-x 2 mongodb mongodb 4096 Jul 12 22:13 db/         # where the databases go
    drwxr-xr-x 2 mongodb mongodb 4096 Jul 12 22:13 etc/        # where mongodb.conf goes
    drwxr-xr-x 2 mongodb mongodb 4096 Jul 12 22:13 log/        # where mongodb.log goes
    drwxr-xr-x 2 mongodb mongodb 4096 Jul 12 22:13 misc/       # (we'll see what goes here)

Here's some help on executing commands from a Chef recipe in MongoDB shell: http://stackoverflow.com/questions/12306684/get-chef-to-execute-a-mongodb-script-after-mongodb-has-started.

Forcing MongoDB to restart between recipes...

Once you've set up the MongoDB configuration, meaning both the upstart configuration file, /etc/init/mongodb.conf, and the options given either on the command line when invoking one of the dæmons or in the /etc/mongodb.conf file, you likely need to restart MongoDB so that the next recipe that runs can perform MongoDB shell commands on a working, configured MongoDB. First you notify from within the template block, then you declare the service using Chef's upstart service provider. The use of :immediately is crucial because without it, the restart won't happen until the end of the Chef run (the default being :delayed). See Common (Resource) Functionality: Notifications.

template "/data/mongodb/mongodb.conf" do
.
.
.
# No padding inside service[...]: the text between the brackets is taken as the resource name.
notifies :restart, "service[#{ node[ :mongodb ][ :nodename ] }]", :immediately
end
# Restart the service so that replica-set.rb can do its magic.
service node[ :mongodb ][ :nodename ] do
provider Chef::Provider::Service::Upstart
action [ :enable, :start ]
end
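
To see why the timing matters, here's a toy model in plain Ruby. It is not Chef's implementation, only a sketch of the ordering idea: simulate_run and the step names are made up for illustration, and the only thing assumed about Chef is what's said above, that :immediately fires at once while :delayed waits for the end of the run.

```ruby
# Toy model only: NOT Chef's implementation, just the ordering idea.
# Each step is [ action, notified_action_or_nil ].
def simulate_run( steps, timing )
  log     = []   # actions in the order they were carried out
  delayed = []   # notifications held back until the end of the run

  steps.each do | action, notified |
    log << action
    next if notified.nil?

    if timing == :immediately
      log << notified        # fires before the next resource is converged
    else
      delayed << notified    # :delayed waits for the end of the run
    end
  end

  log + delayed
end

steps = [
  [ "write mongodb.conf", "restart mongodb" ],
  [ "run replica-set shell commands", nil ]
]

puts simulate_run( steps, :immediately ).join( " -> " )
puts simulate_run( steps, :delayed ).join( " -> " )
```

With :immediately the restart lands between the two steps; with :delayed the shell commands would run against a MongoDB that hasn't yet picked up its new configuration.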

Data bags: the solution to node definitions

Up to a point, it seems useful to incorporate the hostnames, port numbers, etc. directly in the "normal" section of the node-definition JSON file. With the concept of data bags this might not be necessary. For example, imagine the following scenario:

  1. Three replica nodes.
  2. The third is the roll-up, i.e.: it contains all the information for rolling up the replica set in MongoDB which means issuing the MongoDB shell commands that do this.
  3. This is ugly because it means changing the node definition when an IP address, hostname, port number, etc. changes.

Here's how I'd been doing this:

node/db03:
{
"normal":
{
"description" : "replica set roll-up",
"instructions" : "Ensure replica set name is known, IP address/DNS name of all members must be known",
"replicaset" : { "name" : "rs1" },
"replicanode" : { "hostname" : "16.86.193.100", "port" : 37017 },
"replica_1" : { "hostname" : "16.86.193.100", "port" : 37017 },
"replica_2" : { "hostname" : "16.86.193.101", "port" : 37018 },
"replica_3" : { "hostname" : "16.86.193.102", "port" : 37019 }
},
"name": "db03",
"override": { },
"default": { },
"json_class": "Chef::Node",
"automatic": { },
"run_list":
[
"recipe[apt]",
"recipe[mongodb]",
"recipe[mongodb::replica]",
"recipe[mongodb::replica-set]",
"role[install-database-node]",
"role[install-replica-node]",
"role[config-replica-set]"
],
"chef_type": "node"
}

Instead, this information could be bagged and a reference to it placed on the node that will tell the recipe where it is. That way, only the data bag needs to be modified.

Please note that there's no way to comment JSON files. However, it appears to be encouraged that a data bag carry a "comment" field. Here, I'm using "description".

data_bags/rollups/replicaset-1.json:
{
"id" : "replicaset-1",
"description" : "my replica set roll-up",
"replicaset" : { "name" : "rs1" },
"replicanode" : { "hostname" : "16.86.193.100", "port" : 37017 },
"replica_1" : { "hostname" : "16.86.193.100", "port" : 37017 },
"replica_2" : { "hostname" : "16.86.193.101", "port" : 37018 },
"replica_3" : { "hostname" : "16.86.193.102", "port" : 37019 }
}

node/db03:
{
"normal":
{
"nodetype" : "replica",
"bagid" : "replicaset-1"
},
"name": "db03",
"override": { },
"default": { },
"json_class": "Chef::Node",
"automatic": { },
"run_list":
[
"recipe[apt]",
"recipe[mongodb]",
"recipe[mongodb::replica]",
"recipe[mongodb::replica-set]",
"role[install-database-node]",
"role[install-replica-node]",
"role[config-replica-set]"
],
"chef_type": "node"
}

The recipe code for reading this (roll-up) data bag is:

roll_up = data_bag_item( "rollups", "replicaset-1" )
puts "Description:", roll_up[ :description ]
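
Outside of a Chef run there's no data_bag_item, so here's a minimal stand-in in plain Ruby, assuming the bag item is just the parsed JSON file. The inline text is a trimmed copy of the bag above (with the hostname quoted so that it parses); nothing here talks to a Chef server.

```ruby
require "json"

# Stand-in for what a recipe would get from
# data_bag_item( "rollups", "replicaset-1" ); here we just parse the text.
raw = <<-BAG
{
  "id"          : "replicaset-1",
  "description" : "my replica set roll-up",
  "replicaset"  : { "name" : "rs1" },
  "replica_1"   : { "hostname" : "16.86.193.100", "port" : 37017 }
}
BAG

roll_up = JSON.parse( raw )

# JSON.parse yields string keys, so index with strings here.
puts "Description: " + roll_up[ "description" ]
puts "Replica set: " + roll_up[ "replicaset" ][ "name" ]
puts "First member: %s:%d" % [ roll_up[ "replica_1" ][ "hostname" ], roll_up[ "replica_1" ][ "port" ] ]
```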

Data bags are global, so it's essential to name them well. Installing a data bag item can be done "from file" with the file living in the default location, i.e.: data_bags. However, this cannot be arbitrarily hierarchical: only one level of subdirectory (the bag name) is allowed. Please see About Data Bags.

$ knife data bag create rollups
$ knife data bag from file rollups replicaset-1.json
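
Before uploading, it can save a round trip to check that the file really is JSON and carries an "id". Here's a small pre-flight sketch; data_bag_problems is a hypothetical helper of mine, not part of knife, and it only checks these two things. Note that a dotted IP address has to be a quoted string: unquoted, it isn't valid JSON.

```ruby
require "json"

# Hypothetical helper: list the problems found in a data bag item's JSON
# text. An empty list means it passed these (minimal) checks.
def data_bag_problems( text )
  item = JSON.parse( text )
  return [ "top level must be a JSON object" ] unless item.is_a?( Hash )

  problems = []
  problems << "missing \"id\"" unless item[ "id" ].is_a?( String ) && !item[ "id" ].empty?
  problems
rescue JSON::ParserError
  [ "not valid JSON" ]
end

good = '{ "id" : "replicaset-1", "replica_1" : { "hostname" : "16.86.193.100", "port" : 37017 } }'
bad  = '{ "id" : "replicaset-1", "replica_1" : { "hostname" : 16.86.193.100, "port" : 37017 } }'

p data_bag_problems( good )
p data_bag_problems( bad )
```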


Example of data-bagging...

Here I've transformed my previous node-only approach (alluded to earlier) into a node + data bag approach.

~/recipes/data_bags/rollups $ cat *.json
{
"id" : "replicaset-1",
"name" : "rs1",
"description" : "shard 1 replica set roll-up--name, hostnames and ports",
"replicanode" : { "hostname" : "16.86.193.100", "port" : 37017 },
"replica_1" : { "hostname" : "16.86.193.100", "port" : 37017 },
"replica_2" : { "hostname" : "16.86.193.101", "port" : 37018 },
"replica_3" : { "hostname" : "16.86.193.102", "port" : 37019 }
}
{
"id" : "replicaset-2",
"name" : "rs2",
"description" : "shard 2 replica set roll-up--name, hostnames and ports",
"replicanode" : { "hostname" : "16.86.193.100", "port" : 37017 },
"replica_1" : { "hostname" : "16.86.193.100", "port" : 37017 },
"replica_2" : { "hostname" : "16.86.193.101", "port" : 37018 },
"replica_3" : { "hostname" : "16.86.193.102", "port" : 37019 }
}
{
"id" : "configsvr",
"description" : "configuration server list--hostnames and ports",
"configsvr_1" : { "hostname" : "16.86.193.100", "port" : 37017 },
"configsvr_2" : { "hostname" : "16.86.193.101", "port" : 37018 },
"configsvr_3" : { "hostname" : "16.86.193.102", "port" : 37019 }
}
{
"id" : "shard-1",
"description" : "shard 1 roll-up--name, hostnames and ports",
"replicaset_bag" : "replicaset-1",
"configsvr_bag" : "configsvr"
}
{
"id" : "shard-2",
"description" : "shard 2 roll-up--name, hostnames and ports",
"replicaset_bag" : "replicaset-2",
"configsvr_bag" : "configsvr"
}
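
The shard bags above don't hold hostnames themselves; they point at other bags by id. Here's a sketch of following those pointers, with the bags held in memory and trimmed to the fields used. The helper names (bags, fetch_bag, resolve_shard) are hypothetical; in a recipe the lookup would presumably be data_bag_item( "rollups", id ) instead.

```ruby
# In-memory stand-ins for the bags above, trimmed to the fields used here.
def bags
  {
    "replicaset-1" => { "id" => "replicaset-1", "name" => "rs1" },
    "replicaset-2" => { "id" => "replicaset-2", "name" => "rs2" },
    "configsvr"    => { "id" => "configsvr" },
    "shard-1"      => { "id" => "shard-1", "replicaset_bag" => "replicaset-1", "configsvr_bag" => "configsvr" },
    "shard-2"      => { "id" => "shard-2", "replicaset_bag" => "replicaset-2", "configsvr_bag" => "configsvr" }
  }
end

# Hypothetical lookup; fails loudly on a dangling reference.
def fetch_bag( id )
  bags.fetch( id ) { raise ArgumentError, "no such bag: #{ id }" }
end

# Follow a shard bag's two references to the bags they name.
def resolve_shard( shard_id )
  shard = fetch_bag( shard_id )
  {
    :replicaset => fetch_bag( shard[ "replicaset_bag" ] ),
    :configsvr  => fetch_bag( shard[ "configsvr_bag" ] )
  }
end

resolved = resolve_shard( "shard-2" )
puts "shard-2 uses replica set " + resolved[ :replicaset ][ "name" ]
puts "shard-2 uses config bag  " + resolved[ :configsvr ][ "id" ]
```

Failing on a dangling reference is a deliberate choice here: a typo in "replicaset_bag" is better caught before any MongoDB shell command gets built.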

Here are the relevant node-file particulars, db01.json through db11.json:

~/recipes/nodes $ cat db*.json
{
"normal": { "nodetype" : "replica node", "bagid" : "replicaset-1" },
"name": "db01",
...
}
{
"normal": { "nodetype" : "replica node", "bagid" : "replicaset-1" },
"name": "db02",
...
}
{
"normal": { "nodetype" : "replica roll-up node", "bagid" : "replicaset-1" },
"name": "db03",
...
}
{
"normal": { "nodetype" : "replica node", "bagid" : "replicaset-2" },
"name": "db04",
...
}
{
"normal": { "nodetype" : "replica node", "bagid" : "replicaset-2" },
"name": "db05",
...
}
{
"normal": { "nodetype" : "replica roll-up node", "bagid" : "replicaset-2" },
"name": "db06",
...
}
{
"normal": { "nodetype" : "configuration server", "bagid" : "configsvr", "which" : "configsvr_1" },
"name": "db07",
...
}
{
"normal": { "nodetype" : "configuration server", "bagid" : "configsvr", "which" : "configsvr_2" },
"name": "db08",
...
}
{
"normal": { "nodetype" : "configuration server", "bagid" : "configsvr", "which" : "configsvr_3" },
"name": "db09",
...
}
{
"normal": { "nodetype" : "sharding router", "bagid" : "shard-1" },
"name": "db10",
...
}
{
"normal": { "nodetype" : "sharding router", "bagid" : "shard-2" },
"name": "db11",
...
}

And here are some recipe fragment tests dealing with all of this. These are "tests" and not the actual recipes. They contain the guts of what will be the recipe, but brought into plain Ruby, which is easier to run and debug than a Chef recipe during a Chef run.

replica-set.rb:
#!/bin/ruby
# ====================================================================
# Test the Ruby code in the replica-set.rb recipe. This code simulates
# using a data bag that defines what's in the roll-up code, namely,
# what will have to be done in the MongoDB shell.
# ====================================================================

# This simulates 'node' as Chef will set it up in chef-client run.
# ----------------------------------------------------------------
node = Hash.new { | h,k | h[ k ] = Hash.new( &h.default_proc ) }

# Contributed by the "normal" field on the roll-up node, something like
# {
#     "normal": { "nodetype" : "replica roll-up node", "bagid" : "replicaset-1" },
#     "name": "db03",
#     ...
# }
node[ :bagid ] = "replicaset-1"

# This simulates 'data bag' as Chef will set it up in chef-client run.
# --------------------------------------------------------------------
bag = Hash.new { | h,k | h[ k ] = Hash.new( &h.default_proc ) }

# Contributed by a data bag, something like
# {
#     "id"          : "replicaset-1",
#     "name"        : "rs1",
#     "description" : "shard 1 replica set roll-up--name, hostnames and ports",
#     "replicanode" : { "hostname" : "16.86.193.100", "port" : 37017 },
#     "replica_1"   : { "hostname" : "16.86.193.100", "port" : 37017 },
#     "replica_2"   : { "hostname" : "16.86.193.101", "port" : 37018 },
#     "replica_3"   : { "hostname" : "16.86.193.102", "port" : 37019 }
# }
bag[ :id          ] = "replicaset-1"
bag[ :name        ] = "rs1"
bag[ :description ] = "shard 1 replica set roll-up--name, hostnames and ports"
bag[ :replicanode ][ :hostname ] = "16.86.193.100"
bag[ :replicanode ][ :port ]     = 37017
bag[ :replica_1   ][ :hostname ] = "16.86.193.100"
bag[ :replica_1   ][ :port ]     = 37017
bag[ :replica_2   ][ :hostname ] = "16.86.193.101"
bag[ :replica_2   ][ :port ]     = 37018
bag[ :replica_3   ][ :hostname ] = "16.86.193.102"
bag[ :replica_3   ][ :port ]     = 37019

# Recipe code we're testing as Chef will see it in chef-client run.
# --------------------------------------------------------------------
id         = 0
replica_id = "_id : \"%s\"" % bag[ :name ]

# We'll expect up to 4 replica nodes, though we'd really only expect 3. If
# there's an even number, there's likely going to be an arbiter, but we
# don't care about that in here.
members = []

[ :replica_1, :replica_2, :replica_3, :replica_4 ].each do | key |
   replica = bag[ key ]
   next if replica.empty?    # the simulated bag yields {} for a missing key

   hostname = replica[ :hostname ]
   port     = replica[ :port ]

   # MongoDB wants _id as the key and host as a quoted "hostname:port" string.
   members << "{ _id:%d, host:\"%s:%s\" }" % [ id, hostname, port ]
   id += 1
end

replica_members = members.join( ", " )

configuration = "config = { " + replica_id + ", members: [" + replica_members + "] }"

print "replicaset name = ", bag[ :name ]
puts
print "     replica id = ", bag[ :id ]
puts
print "replica_members = ", replica_members
puts
puts "  configuration = ", configuration

# ====================================================================
# What should be output?
# ~/dev/chef/tests $ ruby test-replicaset.rb
# replicaset name = rs1
#      replica id = replicaset-1
# replica_members = { _id:0, host:"16.86.193.100:37017" }, { _id:1, host:"16.86.193.101:37018" }, { _id:2, host:"16.86.193.102:37019" }
#   configuration =
# config = { _id : "rs1", members: [{ _id:0, host:"16.86.193.100:37017" }, { _id:1, host:"16.86.193.101:37018" }, { _id:2, host:"16.86.193.102:37019" }] }
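
An alternative to gluing the configuration together with format strings is to build it as a Ruby Hash and let the JSON library do the quoting. This is only a sketch: replica_set_config is a hypothetical helper, the bag is a plain Hash with string keys, and I'm assuming that the replica set name goes in "_id" and that each member's "host" is a "hostname:port" string, which is the shape rs.initiate() takes.

```ruby
require "json"

# Hypothetical helper: build the rs.initiate() argument from a bag-like Hash.
def replica_set_config( bag )
  # Pick out replica_1, replica_2, ... in numeric order; "replicanode" and
  # the other fields don't match and are skipped.
  keys = bag.keys.select { | k | k =~ /\Areplica_\d+\z/ }
  keys = keys.sort_by { | k | k[ /\d+/ ].to_i }

  members = keys.each_with_index.map do | key, index |
    host = "%s:%s" % [ bag[ key ][ "hostname" ], bag[ key ][ "port" ] ]
    { "_id" => index, "host" => host }
  end

  { "_id" => bag[ "name" ], "members" => members }
end

bag = {
  "id"          => "replicaset-1",
  "name"        => "rs1",
  "replicanode" => { "hostname" => "16.86.193.100", "port" => 37017 },
  "replica_1"   => { "hostname" => "16.86.193.100", "port" => 37017 },
  "replica_2"   => { "hostname" => "16.86.193.101", "port" => 37018 },
  "replica_3"   => { "hostname" => "16.86.193.102", "port" => 37019 }
}

puts "rs.initiate( " + JSON.generate( replica_set_config( bag ) ) + " )"
```

Since the selection is by key pattern rather than a fixed list, a fourth or fifth member in the bag gets picked up without touching the code.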

configsvr.rb:

This recipe doesn't have "roll-up" per se, but it uses data bag information to complete a template file to generate the MongoDB configuration file.

#!/bin/ruby
# ====================================================================
# Test the Ruby code in the configsvr.rb recipe. This code simulates
# using a data bag that defines what's in the roll-up code, namely,
# what will have to be done in the MongoDB shell.
# ====================================================================

# This simulates 'node' as Chef will set it up in chef-client run.
# ----------------------------------------------------------------
node = Hash.new { | h,k | h[ k ] = Hash.new( &h.default_proc ) }

# Contributed by the "normal" field on the configuration server node,
# something like
# {
#     "normal": { "nodetype" : "configuration server", "bagid" : "configsvr", "which" : "configsvr_1" },
#     "name": "db07",
#     ...
# }
node[ :bagid ] = "configsvr"
node[ :which ] = "configsvr_1"

# This simulates 'data bag' as Chef will set it up in chef-client run.
# --------------------------------------------------------------------
bag = Hash.new { | h,k | h[ k ] = Hash.new( &h.default_proc ) }

# Contributed by a data bag, something like
# {
#    "id"          : "configsvr",
#    "description" : "configuration server list--hostnames and ports",
#    "configsvr_1" : { "hostname" : "16.86.193.100", "port" : 37017 },
#    "configsvr_2" : { "hostname" : "16.86.193.101", "port" : 37018 },
#    "configsvr_3" : { "hostname" : "16.86.193.102", "port" : 37019 }
# }
bag[ :id          ] = "configsvr"
bag[ :description ] = "configuration server list--hostnames and ports"
bag[ :configsvr_1 ][ :hostname ] = "16.86.193.100"
bag[ :configsvr_1 ][ :port ]     = 37017
bag[ :configsvr_2 ][ :hostname ] = "16.86.193.101"
bag[ :configsvr_2 ][ :port ]     = 37018
bag[ :configsvr_3 ][ :hostname ] = "16.86.193.102"
bag[ :configsvr_3 ][ :port ]     = 37019

# Recipe code we're testing as Chef will see it in chef-client run.
# --------------------------------------------------------------------

# Get hostname/IP address and port number for the configuration server
which     = node[ :which ]
configsvr = bag[ which.to_sym ]
hostname  = configsvr[ :hostname ]
port      = configsvr[ :port ]

print "      which = ", which
puts
print "     bag id = ", bag[ :id ]
puts
print "description = ", bag[ :description ]
puts
print "   hostname = ", hostname
puts
print "       port = ", port
puts

# ====================================================================
# What should be output?
# ~/dev/chef/tests $ ruby test-configsvr.rb
#       which = configsvr_1
#      bag id = configsvr
# description = configuration server list--hostnames and ports
#    hostname = 16.86.193.100
#        port = 37017
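
The hostname and port found above are what would go into the configuration file template. Here's a sketch of that step using Ruby's own ERB outside of Chef. The template text and render_configsvr_conf are assumptions of mine, not the cookbook's actual template; the option names are MongoDB's old key = value settings and the paths follow the /data/mongodb layout shown earlier. In a real Chef template, the values would be passed through the template resource's variables and read as @hostname and @port.

```ruby
require "erb"

# Hypothetical template for a configuration server's mongodb.conf.
def render_configsvr_conf( hostname, port )
  template = <<-CONF
# mongodb.conf for a configuration server
bind_ip   = <%= hostname %>
port      = <%= port %>
configsvr = true
dbpath    = /data/mongodb/db
logpath   = /data/mongodb/log/mongodb.log
CONF

  # The template sees this method's local variables through the binding.
  ERB.new( template ).result( binding )
end

puts render_configsvr_conf( "16.86.193.100", 37017 )
```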

sharding.rb:
#!/bin/ruby
# ====================================================================
# Test the Ruby code in the sharding.rb recipe. This code simulates
# using a data bag that defines what's in the roll-up code, namely,
# what will have to be done in the MongoDB shell.
# ====================================================================

# This simulates 'node' as Chef will set it up in chef-client run.
# ----------------------------------------------------------------
node = Hash.new { | h,k | h[ k ] = Hash.new( &h.default_proc ) }

# Contributed by the "normal" field on the sharding router node, something like
# {
#    "normal": { "nodetype" : "sharding router", "bagid" : "shard-1" },
#    "name": "db10",
#     ...
# }
node[ :bagid ] = "shard-1"

# This simulates 'data bag' as Chef will set it up in chef-client run.
# Specially, we have to get the configuration server bag which is
# required by sharding. This is copied out of another test.
# --------------------------------------------------------------------
bag = Hash.new { | h,k | h[ k ] = Hash.new( &h.default_proc ) }

bag[ :id          ] = "configsvr"
bag[ :description ] = "configuration server list--hostnames and ports"
bag[ :configsvr_1 ][ :hostname ] = "16.86.193.100"
bag[ :configsvr_1 ][ :port ]     = 37017
bag[ :configsvr_2 ][ :hostname ] = "16.86.193.101"
bag[ :configsvr_2 ][ :port ]     = 37018
bag[ :configsvr_3 ][ :hostname ] = "16.86.193.102"
bag[ :configsvr_3 ][ :port ]     = 37019

# Recipe code we're testing as Chef will see it in chef-client run.
# --------------------------------------------------------------------
bagid = node[ :bagid ]

# Gather the list of configuration servers from the known bag.
configsvr_list = nil
configsvr_1 = bag[ :configsvr_1 ]
configsvr_2 = bag[ :configsvr_2 ]
configsvr_3 = bag[ :configsvr_3 ]

# Build "hostname:port" for one server, or nil when either half is missing.
# The type checks also reject the empty Hash the simulated bag yields for
# a missing key.
address = lambda do | configsvr |
  hostname = configsvr[ :hostname ]
  port     = configsvr[ :port ]
  if hostname.is_a?( String ) && !hostname.empty? && port.is_a?( Integer )
    hostname + ":" + port.to_s
  end
end

address_1 = address.call( configsvr_1 )
address_2 = address.call( configsvr_2 )
address_3 = address.call( configsvr_3 )

if address_1.nil?
  puts "Missing hostname or port number, we're screwed!"
elsif address_2.nil? or address_3.nil?
  # so, we'll ignore anything else halfway established...
  configsvr_list = address_1
else
  # well, we'll give all three a shot...
  configsvr_list = [ address_1, address_2, address_3 ].join( "," )
end

puts "configuration server list", configsvr_list

# ====================================================================
# What should be output?
# ~/dev/chef/tests $ ruby test-sharding.rb
# configuration server list
# 16.86.193.100:37017,16.86.193.101:37018,16.86.193.102:37019
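
The list built above is what the sharding router is started with. Here's a sketch of turning it into a mongos command line, assuming the --configdb option and the one-or-three rule for configuration servers that the test above also follows; mongos_command is a hypothetical helper.

```ruby
# Hypothetical helper: build the mongos invocation from a list of
# { :hostname, :port } Hashes.
def mongos_command( configsvrs )
  # Either a single configuration server or all three, nothing in between.
  unless [ 1, 3 ].include?( configsvrs.length )
    raise ArgumentError, "expected 1 or 3 configuration servers, got #{ configsvrs.length }"
  end

  list = configsvrs.map { | c | "%s:%d" % [ c[ :hostname ], c[ :port ] ] }.join( "," )
  "mongos --configdb " + list
end

servers = [
  { :hostname => "16.86.193.100", :port => 37017 },
  { :hostname => "16.86.193.101", :port => 37018 },
  { :hostname => "16.86.193.102", :port => 37019 }
]

puts mongos_command( servers )
```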

add-shard.rb:
#!/bin/ruby
# ====================================================================
# Test the Ruby code in the add-shard.rb recipe. This code simulates
# using a data bag that defines what's in the roll-up code, namely,
# what will have to be done in the MongoDB shell.
# ====================================================================

# This simulates 'node' as Chef will set it up in chef-client run.
# ----------------------------------------------------------------
node = Hash.new { | h,k | h[ k ] = Hash.new( &h.default_proc ) }

# Contributed by the "normal" field on the sharding router node, something like
# {
#    "normal": { "nodetype" : "sharding router", "bagid" : "shard-1" },
#    "name": "db10",
#     ...
# }
node[ :bagid ] = "shard-1"

# This simulates 'data bag' as Chef will set it up in chef-client run.
# --------------------------------------------------------------------
bag = Hash.new { | h,k | h[ k ] = Hash.new( &h.default_proc ) }

# Contributed by a data bag, something like
# {
#    "id"             : "shard-1",
#    "description"    : "shard 1 roll-up--name, hostnames and ports",
#    "replicaset_bag" : "replicaset-1",
#    "configsvr_bag"  : "configsvr"
# }
bag[ :id          ]      = "shard-1"
bag[ :description ]      = "shard 1 roll-up--name, hostnames and ports"
bag[ :replicaset_bag ]   = "replicaset-1"
bag[ :configsvr_bag ]    = "configsvr"

# Specially, we have to get a replica set bag that will be required by
# the add-shard bag. This was copied out of another test.
# --------------------------------------------------------------------
temp = Hash.new { | h,k | h[ k ] = Hash.new( &h.default_proc ) }
temp[ :id          ] = "replicaset-1"
temp[ :name        ] = "rs1"
temp[ :description ] = "shard 1 replica set roll-up--name, hostnames and ports"
temp[ :replicanode ][ :hostname ] = "16.86.193.100"
temp[ :replicanode ][ :port ]     = 37017
temp[ :replica_1   ][ :hostname ] = "16.86.193.100"
temp[ :replica_1   ][ :port ]     = 37017
temp[ :replica_2   ][ :hostname ] = "16.86.193.101"
temp[ :replica_2   ][ :port ]     = 37018
temp[ :replica_3   ][ :hostname ] = "16.86.193.102"
temp[ :replica_3   ][ :port ]     = 37019
replicaset_bag = temp

# Recipe code we're testing as Chef will see it in chef-client run.
# --------------------------------------------------------------------
bagid        = node[ :bagid ]
replset_name = replicaset_bag[ :name ]

# Get the name of the replica set that will belong to this shard, plus
# one of its hostname:port tuples.
hostname = ""
port     = nil

if !replicaset_bag[ :replica_1 ].empty?
   hostname = replicaset_bag[ :replica_1 ][ :hostname ]
   port     = replicaset_bag[ :replica_1 ][ :port ]
elsif !replicaset_bag[ :replica_2 ].empty?
   hostname = replicaset_bag[ :replica_2 ][ :hostname ]
   port     = replicaset_bag[ :replica_2 ][ :port ]
elsif !replicaset_bag[ :replica_3 ].empty?
   hostname = replicaset_bag[ :replica_3 ][ :hostname ]
   port     = replicaset_bag[ :replica_3 ][ :port ]
elsif !replicaset_bag[ :replica_4 ].empty?
   hostname = replicaset_bag[ :replica_4 ][ :hostname ]
   port     = replicaset_bag[ :replica_4 ][ :port ]
end

# sh.addShard() is the shell helper that takes "replset/hostname:port";
# rs.addArb() would add an arbiter to a replica set instead.
shell_command = "mongo --eval 'sh.addShard( \"#{replset_name}/#{hostname}:#{port}\" )'"

print "MongoDB shell command = ", shell_command
puts

# ====================================================================
# What should be output?
# ~/dev/chef/tests $ ruby test-add-shard.rb
# MongoDB shell command = mongo --eval 'sh.addShard( "rs1/16.86.193.100:37017" )'
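
One caveat about the simulated bags in these tests: they spring an empty Hash into being for any missing key, which is why .empty? works on them. A plain Hash (and, as far as I know, a real data bag item) gives back nil for a missing key instead, so .empty? would blow up. Here's a nil-safe sketch of the member selection; first_member and add_shard_command are hypothetical helpers, and the bag is an ordinary Hash with string keys.

```ruby
# Hypothetical helper: "hostname:port" of the first replica_N entry that
# has both fields, or nil when there is none.
def first_member( replicaset_bag )
  %w[ replica_1 replica_2 replica_3 replica_4 ].each do | key |
    member = replicaset_bag[ key ]
    next if member.nil? || member[ "hostname" ].nil? || member[ "port" ].nil?
    return "%s:%s" % [ member[ "hostname" ], member[ "port" ] ]
  end
  nil
end

# Build the shell helper call that registers the replica set as a shard.
def add_shard_command( replicaset_bag )
  member = first_member( replicaset_bag )
  raise ArgumentError, "no usable replica in bag" if member.nil?
  "sh.addShard( \"%s/%s\" )" % [ replicaset_bag[ "name" ], member ]
end

# replica_1 is deliberately absent to show the fall-through.
bag = { "name" => "rs1", "replica_2" => { "hostname" => "16.86.193.101", "port" => 37018 } }
puts add_shard_command( bag )
```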