Transcription of Notes on MongoDB

A full-featured MongoDB sample covering embedded array functionality, written from the CRUD point of view, may be found here. Mongo is nicely accessed via Morphia, a very lightweight mapper in the spirit of an object-relational mapper (ORM); Mongo doesn't need much of this.


Myriad miscellaneous notes

Some of this is from me, some from the course I took from 10gen, some from synthesizing [email protected], and some from ripping off voices like Scott Hernandez, Jenna Deboisblanc and others directly.

  • Data integrity. MongoDB MMS can help in finding problems where corruption stops certain nodes from responding in a timely fashion. It will e-mail you immediately. Journaling helps plus the usual back-up system, taking snapshots on a daily basis. See http://mongodb.org/display/DOCS/Backups. See also http://www.mongodb.org/display/DOCS/Durability+and+Repair, in a replica set: http://www.mongodb.org/display/DOCS/Replica+Set+Design+Concepts, also this thread: http://groups.google.com/forum/?fromgroups#!topic/mongodb-user/R3bB06Z0n-c
  • Limiting the database size. If this is important, try the trick outlined here: http://souptonuts.sourceforge.net/quota_tutorial.html.
  • It's possible to design one's schema using embedded documents, non-embedded (i.e.: separate documents) or a bucket (hybrid) structure. There's an excellent and short post about this here.
  • For help mapping from SQL, there is an SQL to Mongo Mapping Chart.
  • MongoDB service start (on Ubuntu). This is done:
        $ service mongodb start
    

    However, the service may not "take", as you'll see if you look for the process. This typically happens because mongod was shut down badly and left a lock file behind. Remove this lock file thus:

        $ rm /var/lib/mongodb/mongod.lock
    
  • Solution to getting MongoDB logging to come into our log files. This can be had if using Slf4j. See http://stackoverflow.com/questions/869945/how-to-send-java-util-logging-to-log4j.
  • To reach MongoDB's web console over HTTP, add 1000 to the port on which it's running. If your local host is running Mongo on the default port (27017), use http://localhost:28017. Some links require the ReST service to run; accomplish this by launching with --rest.
  • Commercial, inter-node SSL support for MongoDB is had at 10gen Customer Downloads and the price for this, very steep, can be seen in the "Enterprise" column here.

  • Many and more great links...

    Upstart

    When you install MongoDB using the Debian package, it establishes itself as a service via Upstart which isn't what you want if you're running the local installation as a replica.

    Whatever the reason for your interest in this matter, note that the script that governs the Upstart nature of MongoDB is /etc/init/mongodb.conf, not to be confused with /etc/mongodb.conf. The latter configures how MongoDB runs once started; the Upstart file determines that it starts at all, that it can be stopped, etc.

        # Ubuntu upstart file at /etc/init/mongodb.conf
    
        limit nofile 20000 20000
    
        kill timeout 300 # wait 300s between SIGTERM and SIGKILL.
    
        pre-start script
          mkdir -p /var/lib/mongodb/
          mkdir -p /var/log/mongodb/
        end script
    
        start on runlevel [2345]
        stop on runlevel [06]
    
        script
          ENABLE_MONGODB="yes"
          if [ -f /etc/default/mongodb ]; then
            . /etc/default/mongodb
          fi
          if [ "x$ENABLE_MONGODB" = "xyes" ]; then
            exec start-stop-daemon --start --quiet --chuid mongodb --exec  /usr/bin/mongod -- --config /etc/mongodb.conf
          fi
        end script
    

    A good link discussing this is ubuntu: start(upstart) second instance of mongodb.


    Quick Start

    Installation

    See http://www.javahotchocolate.com/tutorials/mongodb.html.

    Start up the console...

    ...and look around, including seeing which databases are available, switching focus to a database (use), examining a collection, forcing JSON output to be formatted, etc. (Some vertical white space inserted for clarity.)

        $ mongo
        MongoDB shell version: 2.0.5
        connecting to: test
    
        > show dbs
        accountmgrdb      0.203125GB
        local (empty)
        morphia_example   0.203125GB
        my_database       0.203125GB
        russ_trystuff_db  0.203125GB
        test              0.203125GB
        yourdb            0.203125GB
    
        > use accountmgrdb
        switched to db accountmgrdb
    
        > show collections
        Accounts
        system.indexes
    
        > db.Accounts.findOne();
        {
            "_id" : ObjectId("4fbbaa22e4b0b4e60c9820de"),
            "email" : "[email protected]",
            "password" : "passpass",
            "firstname" : "René",
            "lastname" : "de St. Exupéry",
            "fullname" : "René de St. Exupéry",
            "phone" : "33 (0) 3.29.90.66.65",
            "mobile" : "33 (0) 3.29.90.66.65",
            "fax" : "33 (0) 3.29.90.66.63"
        }
    
        > db.Accounts.find( { "firstname" : "René" } );
        { "_id" : ObjectId("4fbbaa22e4b0b4e60c9820de"), "email" : "[email protected]", \
            "password" : "passpass", "firstname" : "René", "lastname" : "de St. Exupéry", \
            "fullname" : "René de St. Exupéry", \
            "phone" : "33 (0) 3.29.90.66.65", "mobile" : "33 (0) 3.29.90.66.65", "fax" : "33 (0) 3.29.90.66.63" }
    
        > db.Accounts.find( { "firstname" : "René" } ).forEach( printjson);
        {
            "_id" : ObjectId("4fbbaa22e4b0b4e60c9820de"),
            "email" : "[email protected]",
            "password" : "passpass",
            "firstname" : "René",
            "lastname" : "de St. Exupéry",
            "fullname" : "René de St. Exupéry",
            "phone" : "33 (0) 3.29.90.66.65",
            "mobile" : "33 (0) 3.29.90.66.65",
            "fax" : "33 (0) 3.29.90.66.63"
        }
    

    CRUD

    Now let's have some real, useful fun...

    Create

    ...a new user or two:

        > db.Accounts.insert( { "email":"[email protected]", "password":"do 'em every time",
        ..."firstname":"Jack" } );
    

    Now, to make certain the new account was added...

        > db.Accounts.find( { "firstname":"Jack" } ).forEach( printjson );
        {
            "_id" : ObjectId("4fbbcb4e1b599c3db4747a6e"),
            "email" : "[email protected]",
            "password" : "do 'em every time",
            "firstname" : "Jack"
        }
    

    Let's add a second account for grins...

        > db.Accounts.insert( { "email":"[email protected]", "password":"don't hurt me",
        ..."firstname":"Bea" } );
    

    Read

    ...or locate stuff that might be in the database.

        > db.Accounts.find( { "firstname":"Jack" } ).forEach( printjson );
        {
            "_id" : ObjectId("4fbbcb4e1b599c3db4747a6e"),
            "email" : "[email protected]",
            "password" : "do 'em every time",
            "firstname" : "Jack"
        }
    

    If you wish to show all accounts whose e-mail addresses end in ".uk" use a regular expression! (Gotta love that, eh?)

        > db.Accounts.find( { "email": /[.]uk$/ } ).forEach( printjson );
        {
            "_id" : ObjectId("4fbbcdf21b599c3db4747a6f"),
            "email" : "[email protected].uk",
            "password" : "do 'em every time",
            "firstname" : "Jack"
        }
        {
            "_id" : ObjectId("4fbbce2d1b599c3db4747a70"),
            "email" : "beatrice.pansy@ladies-club.uk",
            "password" : "don't hurt me",
            "firstname" : "Bea"
        }
    

    Find all documents:

        > db.Accounts.find( { } );
        { "_id" : ObjectId("4fbbaa22e4b0b4e60c9820de"), "email" : "[email protected]", \
            "password" : "passpass", "firstname" : "René", "lastname" : "de St. Exupéry", \
            "fullname" : "René de St. Exupéry", \
            "phone" : "33 (0) 3.29.90.66.65", "mobile" : "33 (0) 3.29.90.66.65", "fax" : "33 (0) 3.29.90.66.63" }
        { "_id" : ObjectId("4fbbcdf21b599c3db4747a6f"), "email" : "[email protected]", \
            "password" : "do 'em every time", "firstname" : "Jack" }
        { "_id" : ObjectId("4fbbce2d1b599c3db4747a70"), "email" : "[email protected]", \
            "firstname" : "Bea", "lastname" : "pansy", "password" : "don't hurt me" }
    

    See field (i.e.: column in SQL) particulars only for query results. This abbreviates the document returned to only those fields that are to be used (_id comes along unless excluded). This is reminiscent of naming columns in an SQL SELECT rather than using SELECT *.

        > db.Addresses.find( { }, { "addresstype" : 1 } );
        { "_id" : ObjectId("4fc7a6b1e4b022644086cff6"), "addresstype" : 1 }
        { "_id" : ObjectId("4fc7c92be4b0cd36353c4a02"), "addresstype" : 2 }
    

    Find document by subdocument.

    Imagine a collection of documents each with a subdocument named data like:

        > db.tuples.findOne();
        {
            "_id" : ObjectId("502fb6a9674c381db9e9249a"),
            "rats" : "large mice",
            "x" : 1,
            "data" : {
                "this" : "uh-huh",
                "that" : "oh-oh",
                "other" : "poo-poo-pee-doo"
            }
        }
    

    Search for such a document by matching exactly one or more tuples. Here are two possible queries:

        > db.tuples.find( { "data.this":"uh-huh", "data.that":"oh-oh" } );
        { "_id" : ObjectId("502fb6a9674c381db9e9249a"), "rats" : "large mice", "x" : 1, \
            "data" : { "this" : "uh-huh", "that" : "oh-oh", "other" : "poo-poo-pee-doo" } }
        > db.tuples.find( { $and : [ { "data.this":"uh-huh" }, { "data.that":"oh-oh" } ] } );
        { "_id" : ObjectId("502fb6a9674c381db9e9249a"), "rats" : "large mice", "x" : 1, \
            "data" : { "this" : "uh-huh", "that" : "oh-oh", "other" : "poo-poo-pee-doo" } }
    

    See list of query operators.

    Update

    Find Bea's document (record) and add in her last name. Then, find and display the whole document.

        > db.Accounts.update( { "firstname":"Bea" }, { $set: { "lastname":"pansy" } } );
        > db.Accounts.find( { "firstname":"Bea" } ).forEach( printjson );
        {
            "_id" : ObjectId("4fbbce2d1b599c3db4747a70"),
            "email" : "[email protected]",
            "firstname" : "Bea",
            "lastname" : "pansy",
            "password" : "don't hurt me"
        }
    

    See list of update operators.

    Delete

    Remove a document from the collection. The empty command prompt caret shows Mongo's answer to a query that matches nothing.

        > db.Accounts.remove( { "firstname":"Jack" } );
        > db.Accounts.find( { "firstname":"Jack" } ).forEach( printjson );
        >
    

    Query operators

    MongoDB queries are clever in that they are more or less "query by example".



    Update operators

    Along with $set, these are possible operators for doing update operations:

    Update terminology

    Upsert means to create a document where none existed to be updated (or merely update as instructed).

    Multi-updates are updates applied to all documents that match the query; by default, update() changes only the first match.
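
    The two terms above can be sketched with a toy in-memory model. This is not the MongoDB driver: the class UpdateFlagsSketch, its methods and the list-of-maps "collection" are all hypothetical, and the model only covers equality queries with $set-style field merging. It is meant to show what the upsert and multi flags change, nothing more.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory model of update( query, fields-to-set, upsert, multi ); NOT the MongoDB driver.
class UpdateFlagsSketch
{
    // True when every key/value pair of the query appears in the document.
    static boolean matches( Map< String, Object > doc, Map< String, Object > query )
    {
        for( Map.Entry< String, Object > e : query.entrySet() )
            if( !e.getValue().equals( doc.get( e.getKey() ) ) )
                return false;
        return true;
    }

    // Merges 'fields' into matching documents; returns how many were touched or created.
    static int update( List< Map< String, Object > > collection, Map< String, Object > query,
                       Map< String, Object > fields, boolean upsert, boolean multi )
    {
        int touched = 0;

        for( Map< String, Object > doc : collection )
        {
            if( !matches( doc, query ) )
                continue;

            doc.putAll( fields );
            touched++;

            if( !multi )
                break;              // without multi, only the first match changes
        }

        if( touched == 0 && upsert )
        {
            // nothing matched: create a document from the query plus the new fields
            Map< String, Object > created = new LinkedHashMap< String, Object >( query );
            created.putAll( fields );
            collection.add( created );
            touched = 1;
        }

        return touched;
    }

    // Convenience for building a one-field document or query.
    static Map< String, Object > doc( String key, Object value )
    {
        Map< String, Object > map = new LinkedHashMap< String, Object >();
        map.put( key, value );
        return map;
    }

    public static void main( String[] args )
    {
        List< Map< String, Object > > accounts = new ArrayList< Map< String, Object > >();
        accounts.add( doc( "firstname", "Bea" ) );
        accounts.add( doc( "firstname", "Bea" ) );

        // multi: both Beas get a last name; prints 2
        System.out.println( update( accounts, doc( "firstname", "Bea" ), doc( "lastname", "pansy" ), false, true ) );
        // upsert: no Jack exists, so one is created; prints 1
        System.out.println( update( accounts, doc( "firstname", "Jack" ), doc( "lastname", "sprat" ), true, false ) );
        // prints 3
        System.out.println( accounts.size() );
    }
}
```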


    Deletion options

    After using a database, here's how to drop a) the database, b) a collection, c) a document (DELETE FROM Account WHERE...).

        > db.dropDatabase()
        > db.Account.drop()
        > db.Account.remove( { ... } )
    

    sort()

    How to sort query results: a) ascending order (1), b) descending order (-1):

        > db.Accounts.find().sort( { "email": 1 } ).forEach( printjson );
        {
            "_id" : ObjectId("4fbbce2d1b599c3db4747a70"),
            "email" : "[email protected]",
            "firstname" : "Bea",
            "lastname" : "pansy",
            "password" : "don't hurt me"
        }
        {
            "_id" : ObjectId("4fbbcdf21b599c3db4747a6f"),
            "email" : "[email protected]",
            "password" : "do 'em every time",
            "firstname" : "Jack"
        }
        {
            "_id" : ObjectId("4fbbaa22e4b0b4e60c9820de"),
            "email" : "[email protected]",
            "password" : "passpass",
            "firstname" : "René",
            "lastname" : "de St. Exupéry",
            "fullname" : "René de St. Exupéry",
            "phone" : "33 (0) 3.29.90.66.65",
            "mobile" : "33 (0) 3.29.90.66.65",
            "fax" : "33 (0) 3.29.90.66.63"
        }
        > db.Accounts.find().sort( { "email": -1 } ).forEach( printjson );
        {
            "_id" : ObjectId("4fbbaa22e4b0b4e60c9820de"),
            "email" : "[email protected]",
            "password" : "passpass",
            "firstname" : "René",
            "lastname" : "de St. Exupéry",
            "fullname" : "René de St. Exupéry",
            "phone" : "33 (0) 3.29.90.66.65",
            "mobile" : "33 (0) 3.29.90.66.65",
            "fax" : "33 (0) 3.29.90.66.63"
        }
        {
            "_id" : ObjectId("4fbbcdf21b599c3db4747a6f"),
            "email" : "[email protected]",
            "password" : "do 'em every time",
            "firstname" : "Jack"
        }
        {
            "_id" : ObjectId("4fbbce2d1b599c3db4747a70"),
            "email" : "[email protected]",
            "firstname" : "Bea",
            "lastname" : "pansy",
            "password" : "don't hurt me"
        }
    

    Indices (indexes)

    Indexing is a way to improve performance when the queries to be run against the data are well known. When encountering performance issues, missing or poorly designed indices are often to blame.

    Look at the output from explain().

        > db.Accounts.find( { "firstname":"Bea" } );
        { "_id" : ObjectId("4fbbce2d1b599c3db4747a70"), "email" : "[email protected]", \
            "firstname" : "Bea", "lastname" : "pansy", "password" : "don't hurt me" }
        > db.Accounts.find( { "firstname":"Bea" } ).explain();
        {
            "cursor" : "BasicCursor",
            "nscanned" : 3,
            "nscannedObjects" : 3,
            "n" : 1,
            "millis" : 0,
            "nYields" : 0,
            "nChunkSkips" : 0,
            "isMultiKey" : false,
            "indexOnly" : false,
            "indexBounds" : {
            }
        }
    

    [THIS IS A BAD EXAMPLE BECAUSE OUR DATA ARE NEITHER RICH NOR NUMEROUS.]

    Create an index thus:

        > db.Accounts.ensureIndex( { "email": 1 } );
    

    There are multikey indices for fields containing arrays, where each entry in the array appears in the index, and compound indices, where two or more fields are indexed together. Often, to get the best performance, a compound index is desirable, e.g.: querying the list of a user's tweets sorted by creation date.
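
    Why a compound index suits the tweets example can be seen in a small model. MongoDB's indices are B-trees, which behave roughly like a sorted map; the sketch below stands in for one with a java.util.TreeMap keyed on (user, created). The class CompoundIndexSketch, its Key type and the field names are hypothetical and no MongoDB code is involved. Because the keys sort by user first and creation time second, one user's tweets form a contiguous range that is already in date order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model of a compound index on ( user, created ); NOT MongoDB code.
class CompoundIndexSketch
{
    // Compound key: ordered by user first, then by creation time.
    static final class Key implements Comparable< Key >
    {
        final String user;
        final long   created;

        Key( String user, long created ) { this.user = user; this.created = created; }

        public int compareTo( Key other )
        {
            int byUser = user.compareTo( other.user );
            return ( byUser != 0 ) ? byUser : Long.compare( created, other.created );
        }
    }

    // The sorted map plays the role of the index; values are the tweet texts.
    final TreeMap< Key, String > index = new TreeMap< Key, String >();

    void add( String user, long created, String text )
    {
        index.put( new Key( user, created ), text );
    }

    // One range scan returns a user's tweets, already sorted by creation time.
    List< String > tweetsOf( String user )
    {
        Key from = new Key( user, Long.MIN_VALUE );
        Key to   = new Key( user, Long.MAX_VALUE );

        return new ArrayList< String >( index.subMap( from, true, to, true ).values() );
    }

    public static void main( String[] args )
    {
        CompoundIndexSketch tweets = new CompoundIndexSketch();

        tweets.add( "bea",  30L, "third" );
        tweets.add( "jack", 10L, "hello" );
        tweets.add( "bea",  10L, "first" );
        tweets.add( "bea",  20L, "second" );

        System.out.println( tweets.tweetsOf( "bea" ) );    // prints [first, second, third]
    }
}
```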


    The Java side

    This is sort of thrown together, simplistic and tentative. I may come back to do something a little better.

    Download the MongoDB Java driver and Javadoc JARs from here. Adjust the version as necessary.

    The current Java driver (Javadoc) documentation is usually found at: http://api.mongodb.org/java/current/.

    Here's our POJO...

    package com.acme.accountmgr;
    
    import org.bson.types.ObjectId;
    
    import com.mongodb.BasicDBObject;
    import com.mongodb.DBObject;
    
    public class Account
    {
        ObjectId id;
        String   email;
        String   password;
        String   firstname;
        String   lastname;
        String   fullname;
    
        public Account() { }
    
        public Account ( DBObject bson )
        {
            BasicDBObject b = ( BasicDBObject ) bson;
    
            this.id        = ( ObjectId ) b.get( "_id" );
            this.email     = ( String )   b.get( "email" );
            this.password  = ( String )   b.get( "password" );
            this.firstname = ( String )   b.get( "firstname" );
            this.lastname  = ( String )   b.get( "lastname" );
            this.fullname  = ( String )   b.get( "fullname" );
        }
    
        public ObjectId getId() { return this.id; }
    
        public String getEmail() { return this.email; }
        public void setEmail( String email ) { this.email = email; }
    
        public String getPassword() { return this.password; }
        public void setPassword( String password ) { this.password = password; }
    
        public String getFirstname() { return this.firstname; }
        public void setFirstname( String firstname ) { this.firstname = firstname; }
    
        public String getLastname() { return this.lastname; }
        public void setLastname( String lastname ) { this.lastname = lastname; }
    
        public String getFullname() { return this.fullname; }
        public void setFullname( String fullname ) { this.fullname = fullname; }
    }
    

    The notes earlier were all console work; the Java driver is available, of course. This code assumes the database we were playing with above.

    package com.acme.accountmgr;
    
    import java.net.UnknownHostException;
    import java.util.ArrayList;
    import java.util.List;
    
    import org.bson.types.ObjectId;
    
    import com.mongodb.BasicDBObject;
    import com.mongodb.DBObject;
    import com.mongodb.DB;
    import com.mongodb.DBCollection;
    import com.mongodb.DBCursor;
    import com.mongodb.Mongo;
    import com.mongodb.MongoException;
    
    import com.acme.accountmgr.Account;
    
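    public class MongoDemo
    {
        // Assumption: 'log' is an SLF4J logger; substitute whatever logging the project uses.
        private static final org.slf4j.Logger log = org.slf4j.LoggerFactory.getLogger( MongoDemo.class );
    
        // Held as a field so that the CRUD methods below can reach it.
        private DBCollection collection;
    
        public MongoDemo()
        {
            try
            {
                Mongo mongo = new Mongo( "localhost", 27017 );
                DB    db    = mongo.getDB( "accountmgrdb" );
    
                this.collection = db.getCollection( "Accounts" );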
            }
            catch( UnknownHostException e )
            {
                log.error( "MongoDB host not found", e );
            }
            catch( MongoException e )
            {
                log.error( "Runtime error attempting MongoDB connection", e );
            }
        }
    

    CRUD

    Create

        public void create( Account account )
        {
            BasicDBObject document = new BasicDBObject();
    
            document.put( "email",     account.getEmail() );
            document.put( "password",  account.getPassword() );
            document.put( "firstname", account.getFirstname() );
            document.put( "lastname",  account.getLastname() );
            document.put( "fullname",  account.getFullname() );
    
            collection.insert( document );
        }
    

    Read

        public List< Account > read( String property, String value )
        {
            BasicDBObject   query = new BasicDBObject();
            List< Account > list  = new ArrayList< Account >();
            DBCursor        cursor;
    
            query.put( property, value );
    
            cursor = collection.find( query );
    
            while( cursor.hasNext() )
            {
                DBObject object = cursor.next();
                list.add( new Account( object ) );
            }
    
            return list;
        }
    

    Update

        public void update( Account existing, Account replacement )
        {
            BasicDBObject document = new BasicDBObject();
    
            // for each field, keep the existing value unless the replacement supplies one
            document.put( "_id",       existing.getId() );
            document.put( "email",     ( replacement.getEmail() == null )     ? existing.getEmail()     : replacement.getEmail() );
            document.put( "password",  ( replacement.getPassword() == null )  ? existing.getPassword()  : replacement.getPassword() );
            document.put( "firstname", ( replacement.getFirstname() == null ) ? existing.getFirstname() : replacement.getFirstname() );
            document.put( "lastname",  ( replacement.getLastname() == null )  ? existing.getLastname()  : replacement.getLastname() );
            document.put( "fullname",  ( replacement.getFullname() == null )  ? existing.getFullname()  : replacement.getFullname() );
    
            // match the existing document by _id and replace it with the merged one
            collection.update( new BasicDBObject( "_id", existing.getId() ), document );
        }
    

    Delete

        public void delete( Account account )
        {
            BasicDBObject delete = new BasicDBObject().append( "_id", account.getId() );
            collection.remove( delete );
        }
    }
    

    JARs


    $or in Java

    How to embed $or, etc. in queries. Here, we're looking for a document in which a is either 10 or 5. First, we create the factors with values 10 and 5. Then, we create an operation that will OR them. We get a query ready.

    Next, we add the factors one at a time to the OR operation. Then, we tuck them into the query.

    public boolean lookForTensOrFives( ObjectId oid )
    {
        BasicDBObject factor1 = new BasicDBObject();
        BasicDBObject factor2 = new BasicDBObject();
        BasicDBList   or      = new BasicDBList();
        BasicDBObject query   = new BasicDBObject();
    
        factor1.put( "a", 10 );
        factor2.put( "a", 5 );
    
        or.add( factor1 );
        or.add( factor2 );
    
        query.put( "$or", or );
    
        DBCursor cursor  = collection.find( query );
        boolean  matched = false;
    
        while( cursor.hasNext() )
        {
            DBObject found = cursor.next();
            // as many times as we get here, 'found' is a document that matches!
            matched = true;
        }
    
        return matched;
    }
    

    ObjectId used as OIDs

    When sorting out ObjectIds, between String and ObjectId, try the following. The point is that if you don't know if the thing coming in is a string or an oid, this helper will ensure it's what Mongo wants (_id, etc.).

        import org.bson.types.ObjectId;
        ...
    
            ObjectId makeObjectId( Object id )
            {
                if( id instanceof String )
                    return new ObjectId( ( String ) id );
                else if( id instanceof ObjectId )
                    return ( ObjectId ) id;
    
                throw new RuntimeException( "Cannot convert " + id + " to an ObjectId" );
            }
    
            void method( String someOid, ObjectId anotherOid )
            {
                ObjectId oidA = makeObjectId( someOid );
                ObjectId oidB = makeObjectId( anotherOid );
                ...
    
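
    The helper above throws from inside the ObjectId constructor if the string isn't a legal oid. If you'd rather check first, here is a minimal sketch that relies only on the fact that an ObjectId's string form is 24 hexadecimal characters. The class and method names are hypothetical and the code deliberately avoids the driver; the driver's own ObjectId.isValid() should do a similar job.

```java
// Stand-alone check that a string has the shape of an ObjectId; no driver needed.
class ObjectIdCheck
{
    // True when 'candidate' is exactly 24 hexadecimal characters.
    static boolean looksLikeObjectId( String candidate )
    {
        if( candidate == null || candidate.length() != 24 )
            return false;

        for( int i = 0; i < candidate.length(); i++ )
        {
            char c = candidate.charAt( i );
            boolean hex = ( c >= '0' && c <= '9' )
                       || ( c >= 'a' && c <= 'f' )
                       || ( c >= 'A' && c <= 'F' );

            if( !hex )
                return false;
        }

        return true;
    }

    public static void main( String[] args )
    {
        System.out.println( looksLikeObjectId( "4fbbaa22e4b0b4e60c9820de" ) );   // prints true
        System.out.println( looksLikeObjectId( "not-an-oid" ) );                 // prints false
    }
}
```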

    Common errors

    Since JSON makes copious use of double-quoting and one sees double-quotes all over the place, it's easy to get lulled into looking in the wrong place for a failed query. For example, say you're representing some object type as an integer, but it shows up in some method as a string (for whatever reason). When stepping through the debugger, you may not notice that you have to pass it to BasicDBObject.put() as an Integer.

        void method( String type )
        {
            BasicDBObject query = new BasicDBObject();
    
            query.put( "type", Integer.parseInt( type ) );
    
            DBCursor cursor = collection.find( query );
            ...
    
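
    The trap can be reproduced without a database. The sketch below models a stored document as a plain map and compares one field the way a query by example would; the class and method names are hypothetical, and it only models the string-versus-integer case, not MongoDB's full comparison rules.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of matching one field by equality; NOT the MongoDB driver.
class TypeMismatchSketch
{
    // True when the stored field equals the wanted value, type included.
    static boolean fieldMatches( Map< String, Object > stored, String field, Object wanted )
    {
        return wanted.equals( stored.get( field ) );
    }

    public static void main( String[] args )
    {
        Map< String, Object > document = new LinkedHashMap< String, Object >();

        document.put( "type", 1 );                      // stored as an integer

        String type = "1";                              // arrives in the method as a string

        System.out.println( fieldMatches( document, "type", type ) );                      // prints false
        System.out.println( fieldMatches( document, "type", Integer.parseInt( type ) ) );  // prints true
    }
}
```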

    Schema solutions: arrays

    What's done in a JOIN in SQL might be done in the same document in MongoDB since the schema is so fluid. Here are various renderings of addresses in a user account.

    Array

    This is probably what I'd prefer since I like to tout an addresstype.

        {
            "_id" : "4fc5520ae4b0aa302dd16e0c",
            "email" : "[email protected]",
            "password" : "passpass",
            "addresses" :
            [
                {
                    "addresstype" : 3,
                    "fullname" : "Yosemite Sam Tucker",
                    "street1" : "PO Box 32",
                    "city" : "Culver City",
                    "state" : "ca",
                    "country" : "us",
                    "postalcode" : "90211",
                    "isdefault" : false
                },
                {
                    "addresstype" : 1,
                    "fullname" : "Yosemite Sam Tucker",
                    "street1" : "1321 Hollywood Blvd",
                    "street2" : "(back lot)",
                    "city" : "Culver City",
                    "state" : "ca",
                    "country" : "us",
                    "postalcode" : "90211",
                    "isdefault" : false
                }
            ]
        }
    

    Subdocument

    This works only if our interface makes use of strings such as "homeaddresses", "shippingaddresses", etc. It's nicer to look at in the Mongo console, but presents maybe no other benefit since users won't have gazillions of addresses anyway.

        {
            "_id" : "4fc5520ae4b0aa302dd16e0c",
            "email" : "[email protected]",
            "password" : "passpass",
            "homeaddresses" :
            [
                {
                    "fullname" : "Yosemite Sam Tucker",
                    "street1" : "1321 Hollywood Blvd",
                    "street2" : "(back lot)",
                    "city" : "Culver City",
                    "state" : "ca",
                    "country" : "us",
                    "postalcode" : "90211",
                    "isdefault" : true
                }
            ],
            "shippingaddresses" :
            [
                {
                    "fullname" : "Yosemite Sam Tucker",
                    "street1" : "PO Box 32",
                    "city" : "Culver City",
                    "state" : "ca",
                    "country" : "us",
                    "postalcode" : "90211",
                    "isdefault" : true
                },
                {
                    "fullname" : "Grandma Tucker",
                    "street1" : "2234 Cowtown Lane",
                    "city" : "Hastings",
                    "state" : "ne",
                    "country" : "us",
                    "postalcode" : "68901",
                    "isdefault" : false
                }
            ]
        }
    

    Exploring $set updates

    Here's some exploring of update. I used additional vertical space to make things clearer. The two updates done here do different things. When $set is used, the new construct is added to what's already there. Without $set, the update replaces the whole document except for the _id.

      > use funstuff
      switched to db funstuff
    
      > db.fun.insert( { _id : 123, "fun" : "things" } );
      > db.fun.findOne()
      { "_id" : 123, "fun" : "things" }
    
      > db.fun.update( { _id:123 }, { $set: { hello: "world" } } );
      > db.fun.findOne()
      { "_id" : 123, "fun" : "things", "hello" : "world" }
    
      > db.fun.remove( { _id : 123 } )
      > db.fun.insert( { _id : 123, "fun" : "things" } );
    
      > db.fun.update( { _id:123 }, { hello: "world" } );
      > db.fun.findOne()
      { "_id" : 123, "hello" : "world" }
    

    More exploring $set updates

    Here's a rather more complex exploration with interleaved Java code (that, at first at least, wasn't tested even for syntax).

        // What's going on in Enchiladas...
        > db.Enchiladas.findOne();
        {
            "_id" : ObjectId("4fccc8dde4b0d5de2eeab3c5"),
            "email" : "[email protected]",
            "password" : "passpass",
            "isdefault" : false
        }
    
        // Initialize 'sam' with the bucket that interests us.
        > var sam = db.Enchiladas.findOne( { "_id" : ObjectId("4fccc8dde4b0d5de2eeab3c5") } );
          
            BasicDBObject query = new BasicDBObject();
    
            query.put( "_id", new ObjectId( "4fccc8dde4b0d5de2eeab3c5" ) );
    
            DBCursor cursor = collection.find( query );
            DBObject sam    = null;
    
            while( cursor.hasNext() )
            {
                sam = cursor.next();
                break;
            }
    
            // sam "points" at his account!
    
    
    
        // Create a new field in document 'sam' to hold the address:
        > sam.address = { "addresstype":2, "fullname":"Yosemite Sam Tucker", "street1":"1321 Hollywood Blvd", \
            "street2":"(back lot)", "city":"Culver City", "state":"ca", "country":"us", "postalcode":"90211", \
            "isdefault":false }
        {
            "addresstype" : 2,
            "fullname" : "Yosemite Sam Tucker",
            "street1" : "1321 Hollywood Blvd",
            "street2" : "(back lot)",
            "city" : "Culver City",
            "state" : "ca",
            "country" : "us",
            "postalcode" : "90211",
            "isdefault" : false
        }
          
            BasicDBObject address = new BasicDBObject();
    
            address.put( "addresstype", 2 );
            address.put( "fullname", "Yosemite Sam Tucker" );
            address.put( "street1", "1321 Hollywood Blvd" );
            address.put( "street2", "(back lot)" );
            address.put( "city", "Culver City" );
            address.put( "state", "ca" );
            address.put( "country", "us" );
            address.put( "postalcode", "90211" );
            address.put( "isdefault", false );
    
            // this would replace the whole of sam's document with address; we don't want that,
            // so it's left commented out:
            // collection.update( sam, address );
    
            // update fodder here (how to do "$set"...)
            // the statements as if in Mongo console (JavaScript) are built progressively...
            BasicDBObject newsam    = new BasicDBObject().append( "address", address );
            BasicDBObject augmented = new BasicDBObject().append( "$set", newsam );
            collection.update( sam, augmented );
    
    
        // Here's the update adding an address to Sam's bucket:
        > db.Enchiladas.update( { "_id" : ObjectId("4fccc8dde4b0d5de2eeab3c5") }, sam );
    
        // Now, when we look at all we've got, we see different stuff in one bucket than in another.
        > db.Enchiladas.find( { } ).forEach( printjson );
        {
            "_id" : ObjectId("4fcccadbe4b0d5de2eeab3c6"),
            "email" : "[email protected]",
            "password" : "passpass",
            "isdefault" : false
        }
        {
            "_id" : ObjectId("4fcccb686ccd0f44d66a18a4"),
            "email" : "poop.abc.com",
            "ipaddress" : "192168.0.9",
            "password" : "passpass"
        }
        {
            "_id" : ObjectId("4fccc8dde4b0d5de2eeab3c5"),
            "email" : "[email protected]",
            "password" : "passpass",
            "isdefault" : false,
            "address" : {
                "addresstype" : 2,
                "fullname" : "Yosemite Sam Tucker",
                "street1" : "1321 Hollywood Blvd",
                "street2" : "(back lot)",
                "city" : "Culver City",
                "state" : "ca",
                "country" : "us",
                "postalcode" : "90211",
                "isdefault" : false
            }
        }
          
            // find all the documents...
            DBCursor cursor = collection.find();
    
            while( cursor.hasNext() )
                System.out.println( cursor.next() );
    

    Quick and dirty Mongo set-up code

    I have a project named TryIt in which I prototype things quickly if I wish to experiment. Here's a class in it. It might be referenced from other notes on this page.

    package experiment;
    
    import java.net.UnknownHostException;
    
    import com.mongodb.DBCollection;
    import com.mongodb.Mongo;
    
    public class MongoSetup
    {
        Mongo  mongo = null;
        String database;
    
        public MongoSetup()
        {
            setup();
        }
    
        public MongoSetup( String database )
        {
            setup();
            this.database = database;
        }
    
        private void setup()
        {
            try
            {
                mongo = new Mongo();
            }
            catch( UnknownHostException e )
            {
                System.out.println( "MongoDB host not found: " + e.getMessage() );
            }
        }
    
        public String getDatabase() { return this.database; }
        public void setDatabase( String database ) { this.database = database; }
    
        public DBCollection getCollection( String collection )
        {
            return this.getCollection( this.database, collection );
        }
    
        public DBCollection getCollection( String database, String collection )
        {
            return mongo.getDB( database ).getCollection( collection );
        }
    }
    

    Exploring arrays...

    This example is from MongoDB -> Home -> Drivers -> Java Language Center -> Java Types

    public static void main( String[] args )
    {
        MongoSetup mongo = new MongoSetup( "funstuff" );
    
        ArrayList< Serializable > x = new ArrayList< Serializable >();
    
        x.add( 1 );
        x.add( 2 );
        x.add( new BasicDBObject( "foo", "bar" ) );
        x.add( 4 );
    
        BasicDBObject doc = new BasicDBObject( "odd-array", x );
    
        DBCollection collection = mongo.getCollection( "array_demo" );
    
        collection.insert( doc );
    }
    

    The Java snippet above created the Mongo console experience below.

        > use funstuff
        switched to db funstuff
        > show collections
        array_demo
        system.indexes
        > db.array_demo.findOne();
        {
            "_id" : ObjectId("4fce18d55a374a574039b45b"),
            "odd-array" : [
                1,
                2,
                {
                    "foo" : "bar"
                },
                4
            ]
        }
    

    What's going on?

    A (Java-shabby) array is created for the purpose of demonstrating wild arrays embedded in a Mongo document. A Mongo document is created and the array embedded as odd-array before being inserted into the database collection shown.


    How to add an array to a MongoDB document in Java...

    That is, a complex object array.

    public class IdentityType
    {
        private String identity;
        private String type;
    
        public IdentityType() { }
        public IdentityType( String identity, String type ) { this.identity = identity; this.type = type; }
    
        public String getIdentity() { return identity; }
        public void setIdentity( String identity ) { this.identity = identity; }
        public String getType() { return type; }
        public void setType( String type ) { this.type = type; }
    
        public String toString()
        {
            StringBuilder sb = new StringBuilder();
    
            sb.append( "{\n" );
            sb.append( "  identity: " + this.identity + "\n" );
            sb.append( "  type:     " + this.type + "\n" );
            sb.append( "}" );
    
            return sb.toString();
        }
    }
    

    This is the relevant POJO code:

        private List< IdentityType > idtypes = new ArrayList< IdentityType >();
        public List< IdentityType >  getIdtypes()                             { return this.idtypes; }
        public void                  setIdtypes( List< IdentityType > types ) { this.idtypes = types; }
        public void                  addIdtype( IdentityType type )           { this.idtypes.add( type ); }
    

    Here's the trip from POJO to MongoDB document:

        public DBObject getBsonFromPojo()
        {
            BasicDBObject document = new BasicDBObject();   // the document being built

            if( getIdtypes().size() > 0 )
            {
                List< BasicDBObject > list = new ArrayList< BasicDBObject >();
    
                for( IdentityType idt : getIdtypes() )
                {
                    BasicDBObject idtype = new BasicDBObject();
    
                    idtype.put( "identity", idt.getIdentity() );
                    idtype.put( "type", idt.getType() );
                    list.add( idtype );
                }
    
                document.put( "idtypes", list );
            }
    
            return document;
        }
    

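    The trip back from MongoDB to POJO must convert each embedded document, since the driver hands back a BasicDBList of DBObjects rather than IdentityType instances:

        public void makePojoFromBson( DBObject bson )
        {
            BasicDBObject b = ( BasicDBObject ) bson;
    
            ...
            List< IdentityType > types = new ArrayList< IdentityType >();
            BasicDBList          list  = ( BasicDBList ) b.get( "idtypes" );
    
            if( list != null )
                for( Object o : list )   // each element is an embedded document
                    types.add( new IdentityType( ( ( BasicDBObject ) o ).getString( "identity" ),
                                                 ( ( BasicDBObject ) o ).getString( "type" ) ) );
            setIdtypes( types );
        }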
    

    The $ (positional) operator for updating array elements

    In an update, the $ (dollar sign) stands for the position of the first array element matched by the query, that is, by the first argument of the update operation.

    Imagine a document like this one (there happens to be only one in this collection):

        > db.accounts.findOne();
        {
          "_id" : ObjectId("4fcf8b055a3770c10a741edb"),
          "addresses" : [
            {
              "_id" : ObjectId("4fcf8b055a3770c10a741ed9"),
              "type" : 1,
              "street" : "123 My Street",
              "city" : "Bedford Falls",
              "state" : "NJ"
            },
            {
              "_id" : ObjectId("4fcf8b055a3770c10a741ed8"),
              "type" : 3,
              "street" : "789 My Street",
              "city" : "Bedford Falls",
              "state" : "NJ"
            }
          ],
          "name" : "Jack"
        }
    

    You wish to re-type the second of the two addresses from 3 to 2. First, create a query that will identify that address.

        > db.accounts.find( { "_id" : ObjectId( "4fcf8b055a3770c10a741edb" ),
        ... "addresses._id" : ObjectId( "4fcf8b055a3770c10a741ed8" ) } ).pretty();
        {
          "_id" : ObjectId("4fcf8b055a3770c10a741edb"),
          "addresses" : [
            {
              "_id" : ObjectId("4fcf8b055a3770c10a741ed9"),
              "type" : 1,
              "street" : "123 My Street",
              "city" : "Bedford Falls",
              "state" : "NJ"
            },
            {
              "_id" : ObjectId("4fcf8b055a3770c10a741ed8"),
              "type" : 3,
              "street" : "789 My Street",
              "city" : "Bedford Falls",
              "state" : "NJ"
            }
          ],
          "name" : "Jack"
        }
    

    With the right address identified, you can now use the positional operator to set its type field to 2.

        > db.accounts.update( { "_id" : ObjectId( "4fcf8b055a3770c10a741edb" ),
        ... "addresses._id" : ObjectId( "4fcf8b055a3770c10a741ed8" ) },
        ... { "$set" : { "addresses.$.type" : 2 } } );
    

    That did it. You can now reuse the query to determine that it actually happened.

        > db.accounts.find( { "_id" : ObjectId( "4fcf8b055a3770c10a741edb" ),
        ... "addresses._id" : ObjectId( "4fcf8b055a3770c10a741ed8" ) } ).pretty();
        {
          "_id" : ObjectId("4fcf8b055a3770c10a741edb"),
          "addresses" : [
            {
              "_id" : ObjectId("4fcf8b055a3770c10a741ed9"),
              "type" : 1,
              "street" : "123 My Street",
              "city" : "Bedford Falls",
              "state" : "NJ"
            },
            {
              "_id" : ObjectId("4fcf8b055a3770c10a741ed8"),
              "city" : "Bedford Falls",
              "state" : "NJ",
              "street" : "789 My Street",
              "type" : 2
            }
          ],
          "name" : "Jack"
        }
    

    To change two or more fields, just add more comma-separated field-value pairs to the $set:

        > db.accounts.update( { "_id" : ObjectId( "4fcf8b055a3770c10a741edb" ),
        ... "addresses._id" : ObjectId( "4fcf8b055a3770c10a741ed8" ) },
        ... { "$set" : { "addresses.$.type" : 3, "addresses.$.city" : "Potterville" } } );
        > db.accounts.findOne();
        {
          "_id" : ObjectId("4fcf8b055a3770c10a741edb"),
          "addresses" : [
            {
              "_id" : ObjectId("4fcf8b055a3770c10a741ed9"),
              "type" : 1,
              "street" : "123 My Street",
              "city" : "Bedford Falls",
              "state" : "NJ"
            },
            {
              "_id" : ObjectId("4fcf8b055a3770c10a741ed8"),
              "city" : "Potterville",
              "state" : "NJ",
              "street" : "789 My Street",
              "type" : 3
            }
          ],
          "name" : "Jack"
        }
    

    In Java...

    In Java, some of the above would look like this. Just as above, notice the dots and dollar signs. Incidentally, if only one of these fields, say street, were to change, the others just wouldn't be passed (hence the null check before each put).

    private static void update( ObjectId accountoid, Address address )
    {
        BasicDBObject match = new BasicDBObject();
        match.put( "_id", accountoid );
        match.put( "addresses._id", address.getOid() );   // same field the shell query matches on
    
        BasicDBObject addressSpec = new BasicDBObject();
        Integer type;
        String temp;
    
        if( ( type = address.getType() ) != null )
            addressSpec.put( "addresses.$.type", type );
        if( ( temp = address.getStreet() ) != null )
            addressSpec.put( "addresses.$.street", temp );
        if( ( temp = address.getCity() ) != null )
            addressSpec.put( "addresses.$.city", temp );
        if( ( temp = address.getState() ) != null )
            addressSpec.put( "addresses.$.state", temp );
    
        BasicDBObject update = new BasicDBObject();
        update.put( "$set", addressSpec );
    
        collection.update( match, update );
    }
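
    To make the role of the $ concrete, here is a minimal plain-Java model of what the update above does, assuming the array is held as a list of maps. It does not touch the MongoDB driver; the class PositionalModel and its methods are hypothetical names invented for this illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of the positional operator; hypothetical, not part of the MongoDB driver.
class PositionalModel
{
    /** Returns the index of the first array element whose field equals value, or -1. */
    static int firstMatch( List< Map< String, Object > > array, String field, Object value )
    {
        for( int i = 0; i < array.size(); i++ )
            if( value.equals( array.get( i ).get( field ) ) )
                return i;

        return -1;
    }

    /** Models { "$set" : { "array.$.f" : v, ... } }: only the matched element is changed. */
    static boolean setOnMatch( List< Map< String, Object > > array, String field, Object value,
                               Map< String, Object > changes )
    {
        int position = firstMatch( array, field, value );   // this index is what $ stands for

        if( position < 0 )
            return false;                                   // nothing matched, nothing updated

        array.get( position ).putAll( changes );
        return true;
    }

    /** Builds a cut-down address with shortened, made-up ids. */
    static Map< String, Object > address( String id, int type, String city )
    {
        Map< String, Object > a = new LinkedHashMap< String, Object >();

        a.put( "_id", id );
        a.put( "type", type );
        a.put( "city", city );
        return a;
    }

    public static void main( String[] args )
    {
        List< Map< String, Object > > addresses = new ArrayList< Map< String, Object > >();

        addresses.add( address( "ed9", 1, "Bedford Falls" ) );
        addresses.add( address( "ed8", 3, "Bedford Falls" ) );

        Map< String, Object > changes = new LinkedHashMap< String, Object >();
        changes.put( "type", 2 );

        setOnMatch( addresses, "_id", "ed8", changes );
        System.out.println( addresses );
    }
}
```

    Only the element found by the match is touched, which is why the query half of the update has to name the array element as well as the document.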
    

    MongoDB semantics

    Here is some semantic fall-out from MongoDB terminology and things we say about MongoDB.

    Sharding —Where data is split between more than one replica set. What is in one shard isn't in another. Sharding in MongoDB must be carefully configured; it doesn't come for free and takes a lot of extra work to achieve. Among other reasons to shard, sharding can be used to solve issues of geographic collation of data and scaling of that data.

    MongoDB configuration server —This is a special instance of the mongod dæmon that maintains sharded-cluster metadata to give to instances of mongos. It's the "how-to" section of the sharding mongos brain. There should be three of these since a sharded MongoDB cluster is dead in the water without at least one in good health. A configuration server (also called a "config server") can only mean sharding.

    Replica set —A collection of replica nodes. A MongoDB shard must have one of these, but a replica set doesn't need to be in a shard to stand on its own. A replica set ensures that data is written to more than one node (place)—effectively duplicating it or better. Note that as soon as you say multiple replica sets, you are necessarily referring to a sharded configuration.

    Replica node —A single instance of the mongod dæmon running usually alone on a VM or host.

    mongod —This is the basic MongoDB dæmon. In a sense, it just is MongoDB.

    mongos —This is a special dæmon that connects an application to a MongoDB sharding set-up and controls reading and writing to the appropriate shard for the data concerned. It uses information from a special mongod erected as a MongoDB configuration server. mongos can only mean sharding.
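
    As a rough illustration of what "the appropriate shard for the data concerned" means, here is a minimal sketch of range-based routing in plain Java. It assumes that each chunk of the shard-key space is identified by its lower bound and owned by one shard. The class ShardRouter, the shard names and the boundaries are all hypothetical; a real mongos gets this metadata from the configuration servers and does a great deal more.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of range-based routing; this is not how mongos is actually implemented.
class ShardRouter
{
    // lower bound of each chunk's key range -> shard (replica set) that holds the chunk
    private final TreeMap< String, String > chunks = new TreeMap< String, String >();

    void addChunk( String lowerBound, String shard )
    {
        chunks.put( lowerBound, shard );
    }

    /** Finds the chunk whose range contains the shard key and returns its shard. */
    String shardFor( String shardKey )
    {
        // the owning chunk is the one with the greatest lower bound not above the key
        Map.Entry< String, String > chunk = chunks.floorEntry( shardKey );

        if( chunk == null )
            throw new IllegalArgumentException( "no chunk covers " + shardKey );

        return chunk.getValue();
    }

    public static void main( String[] args )
    {
        ShardRouter router = new ShardRouter();

        router.addChunk( "", "rs0" );    // keys from the empty string up to "m"
        router.addChunk( "m", "rs1" );   // keys from "m" upward

        System.out.println( "jack -> " + router.shardFor( "jack" ) );
        System.out.println( "sam  -> " + router.shardFor( "sam" ) );
    }
}
```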


    WriteConcern

    See http://www.littlelostmanuals.com/2011/11/overview-of-basic-mongodb-java-write.html. Explore also "MongoDB tagging."

    A better, much later treatment exists as a subsection on write concerns in my MongoDB Error-handling Notes.

    (no write-concern argument)   Returns once the write is handed to the driver, which must still send it, potentially over the wire, to mongod.
    WriteConcern.SAFE             Returns after the operation is known to have reached mongod.
    WriteConcern.JOURNAL_SAFE     Returns after the operation is known to have reached mongod and been written to its journal.
    WriteConcern.MAJORITY         Like SAFE, but returns after the operation has been written to a simple majority of nodes in the replica set.
    WriteConcern.FSYNC_SAFE       Returns after the operation has been written to the server data file.

    Persist new records like user accounts, addresses and payment methods, etc. with

    collection.save( account, WriteConcern.FSYNC_SAFE );
    

    --takes a comparatively long time.

    Persist updates to addresses and payment methods with

    collection.save( new_address/new_payment, WriteConcern.FSYNC_SAFE );
    

    --because these are really new operations (the old one is "forgotten" and left in place). Use

    collection.merge( old_address/old_payment, WriteConcern.SAFE );
    

    to update the old entity with the “forgotten” flag.


    Voting to replace a primary...

    With respect to a replica set in Mongo, if the primary and/or other nodes are lost, you must have a "quorum" of voting nodes in order to elect a new primary and to retain full transactional status, i.e.: reading and writing. If you don't have a quorum, in many cases you can continue supporting reads, but no writes.

    A quorum (my terminology) is "more than half of the voting members configured in the replica set", counting the ones that can no longer be reached. 10gen doesn't use this obvious word, but they should: in a voting body, a quorum is the smallest number of members that can make a decision in the absence of others.

    Also, a replica set should have an odd number of voting members. This is a recommendation rather than a requirement: an even-sized set works, but its extra member adds no tolerance for failure, since a set of 4 needs 3 votes and can lose only one member, just like a set of 3.

    So, if we start with a primary and four secondaries, that makes 5 and the quorum is 3. Lose the primary and we have 4 left, which is still a quorum, so the survivors can elect a new primary. Lose two more and the remaining 2 cannot. Arbiters do count as members, both when voting and when calculating the quorum.

    Note: There is nothing wrong with arbiters; they're practically free being only mongods requiring virtually no disk and precious little memory.

    In a second case, if we started with a primary and three secondaries, that would make 4 total and the quorum is again 3. Lose the primary and there are 3 voting members, exactly enough to elect a new primary, but losing one more would leave the set read-only. Here an arbiter should be added to bring the total to an odd 5.
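
    The arithmetic can be sketched in a few lines of Java, assuming the rule that an election needs the votes of more than half of the voting members configured in the set, arbiters included. The class ReplicaSetVotes and its method names are made up for this note and are not part of any MongoDB API.

```java
// Hypothetical helper for quorum arithmetic; not part of the MongoDB driver.
class ReplicaSetVotes
{
    /** Smallest number of votes that is more than half of the configured voting members. */
    static int majority( int votingMembers )
    {
        return votingMembers / 2 + 1;
    }

    /** How many voting members can be lost while a primary can still be elected. */
    static int faultTolerance( int votingMembers )
    {
        return votingMembers - majority( votingMembers );
    }

    /** True if the members still reachable are enough to elect a primary. */
    static boolean canElectPrimary( int votingMembers, int reachable )
    {
        return reachable >= majority( votingMembers );
    }

    public static void main( String[] args )
    {
        for( int n = 3; n <= 5; n++ )
            System.out.println( n + " voting members: majority " + majority( n )
                              + ", can lose " + faultTolerance( n ) );
    }
}
```

    Under that rule a set of 4 tolerates no more failures than a set of 3, which is the usual argument for keeping the number of voting members odd.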


    Locking in MongoDB...

    ...is done at the database level beginning in 2.2. Someday, it's slated to be more granular still, at the collection level.

    In MongoDB, locks aren't really locks in the RDBMS sense, but more like mutexes that a process holds while in a critical section of work.

    A lock isn't held across multiple documents (rows) as it would be in RDBMS; the duration of the lock is measured in microseconds.

    Coming from RDBMS, one shouldn't expect that locks will be a limiting factor in MongoDB because locks can be used tens of thousands of times per second for writes (and reads).
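
    The mutex analogy can be sketched in plain Java, assuming one reader-writer lock per database that is held only for the length of the critical section. The class DatabaseLocks is hypothetical and only models the idea; it says nothing about how mongod implements its locks internally.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model of database-level locking; not MongoDB's actual implementation.
class DatabaseLocks
{
    private final ConcurrentHashMap< String, ReentrantReadWriteLock > locks =
            new ConcurrentHashMap< String, ReentrantReadWriteLock >();

    /** Returns the one lock that guards the named database, creating it on first use. */
    ReentrantReadWriteLock lockFor( String database )
    {
        ReentrantReadWriteLock fresh    = new ReentrantReadWriteLock();
        ReentrantReadWriteLock existing = locks.putIfAbsent( database, fresh );

        return ( existing != null ) ? existing : fresh;
    }

    /** Runs a write as a short critical section; writers to other databases are not blocked. */
    void write( String database, Runnable criticalSection )
    {
        ReentrantReadWriteLock lock = lockFor( database );

        lock.writeLock().lock();
        try
        {
            criticalSection.run();
        }
        finally
        {
            lock.writeLock().unlock();   // released as soon as the work is done
        }
    }

    public static void main( String[] args )
    {
        DatabaseLocks locks = new DatabaseLocks();

        locks.write( "test", new Runnable()
        {
            public void run() { System.out.println( "writing while holding the lock on test" ); }
        } );
    }
}
```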


    Colorizing the MongoDB interactive shell...

    You know that you can configure a few things using what are called "rc" files, typically kept in your home folder. You've seen them:

        .exrc, .vimrc, .bashrc
    

    etc.

    So, you shouldn't be surprised that someone came up with a very fun and useful way of injecting color into your MongoDB interactive shell session. Enter Tyler Brock, who replaces the (for now at least) zero-length .mongorc.js file with his own. You do have to be running, at the very oldest, MongoDB 2.2.x.

    You can git (pun intended) his stuff and set it up for the next time you run the MongoDB shell by doing the following:

    1. Pick a subdirectory where you'd like to drop his stuff. You'll be updating it, if there's ever need, the same way you'd ever update sources controlled by Git. I put mine under ~/dev.
    2. Do this:
          $ cd ~/dev
          $ git clone git@github.com:TylerBrock/mongo-hacker.git
          $ cd ~
          $ ln -s ~/dev/mongo-hacker/mongo_hacker.js .mongorc.js
      
    3. Then just launch (relaunch) MongoDB to see color when you do stuff:
          $ mongo
      
    4. If ever there's reason to do it, update mongo-hacker:
          $ cd ~/dev/mongo-hacker
          $ git pull origin master
      
    5. If you decide this is evil and you no longer want to be part of it, do this:
          $ rm -rf ~/dev/mongo-hacker
          $ rm ~/.mongorc.js
      

    Enjoy the ride. Here's something you might see. I don't care for all the colors, but there's probably a way to change that. I also don't like the long prompt he's added; I'll definitely smoke that. (Just edit mongo_hacker.js, look for "prompt" and comment out the whole paragraph of code.)


    Benchmarking MongoDB...

    An example.

    package com.mongodb;
    
    import com.mongodb.BasicDBObject;
    import com.mongodb.DBCollection;
    import com.mongodb.DBCursor;
    import com.mongodb.MongoClient;
    
    import java.net.UnknownHostException;
    
    public class PerfTest
    {
    	public static void main( String[] args ) throws UnknownHostException
    	{
    		MongoClient  m = new MongoClient();
    		DBCollection c = m.getDB( "test" ).getCollection( "PerfTest" );
    
    		/* Add this in to insert 500 documents before running the test:
    		 *     c.drop();
    		 *     for( int i = 0; i < 500; i++ )
    		 *         c.insert( new BasicDBObject( "_id", i ) );
    		 */
    		c.findOne();
    
    		DBCursor cursor = c.find();
    		long startTime = System.nanoTime();
    
    		try
    		{
    			while( cursor.hasNext() )
    				cursor.next();
    		}
    		finally
    		{
    			cursor.close();
    		}
    
    		long   estimatedTime = System.nanoTime() - startTime;
    		double seconds       = ( double ) estimatedTime / 1000000000.0;
    
    		System.out.println( "Done in " + seconds );
    	}
    }
    

    MongoDB 2.6 webinar notes

    Index maintenance

    Inconvenient to add new indices to existing collections, especially if big. Now possible to add them in the background.

    Auto-cancelation of operations by posting a maximum time in milliseconds for any operation, granular.

    Write commands delivered to the server (inserts, updates and deletes). All operations now deliverable in bulk, by some order, etc. Enables asynchronous communication with server.

    Power of 2 allocation enabled by default, resulting in easier predictability of storage requirements.

    Developers

    Improvements to query system.

    Index intersection, query introspection.

    Integrated text search. Beta in 2.4, released in 2.6 and integrated.

    New update operators $mul, $min, $max. Now testable and extensible to add these whereas in the past it was very hard.

    Aggregation pipeline enhancements, since 2.4, but in 2.6 unlocks large data sets (unlimited result set size vs. 16 MB).

    Enterprise security

    1. Authentication:
      Kerberos (2.4), LDAP (2.6), x.509 (2.6).
    2. Authorization:
      User-defined roles for DBs and collections.
    3. Encryption:
      Mixed-mode SSL. Obfuscation: Field-level redaction via aggregation framework.
    4. Auditing:
      Trails can be written to separate file or system log.

    MMS

    Monitoring, of course.

    Back-up