MongoDB Indices
—my experience and quick notes

Russ Bateman
29 April 2013

Here's some code I wrote while playing around with reading a MongoDB index from Java. Some information on this topic came from William Zola of 10gen.

'db.collection.getIndexes()' will return a JSON document describing all of the indexes on 'collection'. That sounds sufficient for your needs.

While 'db.collection.ensureIndex({...})' is a no-op if the specified index already exists, if it does not exist, it is a blocking operation that is potentially long-running, and could therefore halt all activity on the node. <DAMHIK>. I agree with your strategy of checking the indexes rather than just blindly re-creating them.

This is a little exposé of Java handling of MongoDB indices. First, here's what I did in the MongoDB shell to create a small collection.

    > db.poop.insert( { "x":1, "y":2, "name":"purple" } )
    Inserted 1 record(s) in 18ms
    > db.poop.insert( { "x":2, "y":3, "name":"orange" } )
    Inserted 1 record(s) in 0ms
    > db.poop.find().pretty()
    {
      "_id": ObjectId("517ea23ff5a7d45346d5fff8"),
      "x": 1,
      "y": 2,
      "name": "purple"
    }
    {
      "_id": ObjectId("517ea24df5a7d45346d5fff9"),
      "x": 2,
      "y": 3,
      "name": "orange"
    }
    Fetched 2 record(s) in 1ms -- Index[none] -- More[false]

Next, I create an ascending (that's what the 1 means) index on the x field for this collection.

    > db.poop.ensureIndex( { "x":1 } )
    Inserted 1 record(s) in 2ms

Let's get the index in the MongoDB shell so we can compare it to what we get in Java.

    > db.poop.getIndexes()
    [
      {
        "v": 1,
        "key": {
          "_id": 1
        },
        "ns": "test.poop",
        "name": "_id_"
      },
      {
        "v": 1,
        "key": {
          "x": 1
        },
        "ns": "test.poop",
        "name": "x_1"
      }
    ]

Next, here's the output from the Java code that follows:


    [0]: { "v" : 1 , "key" : { "_id" : 1} , "ns" : "test.poop" , "name" : "_id_"}
    [1]: { "v" : 1 , "key" : { "x" : 1.0} , "ns" : "test.poop" , "name" : "x_1"}
    The 0th index belongs to MongoDB; it's the second one that's ours:
        v:    1
        ns:   test.poop
        name: x_1
        key:  { "x" : 1.0}

And now the code...

package com.etretatlogiciels.mongodb;

import java.util.List;

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;

public class ExploreIndices
{
    public static void main( String[] args )
    {
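        // OldMongoDB is a helper class from this same package (not shown);
        // its getMongo() hands back the driver's connection object
        OldMongoDB   mongodb    = new OldMongoDB();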
        DB           database   = mongodb.getMongo().getDB( "test" );
        DBCollection collection = database.getCollection( "poop" );

        List< DBObject > indices = collection.getIndexInfo();

        int th = 0;

        for( DBObject dbo : indices )
            System.out.println( "[" + th++ + "]: " + dbo );

        DBObject o = indices.get( 1 );

        System.out.println( "The 0th index belongs to MongoDB; it's the second one that's ours:" );

        int    v     = ( Integer ) o.get( "v");
        String ns    = ( String )  o.get( "ns" );
        String name  = ( String )  o.get( "name" );
        System.out.println( "    v:    " + v );
        System.out.println( "    ns:   " + ns );
        System.out.println( "    name: " + name );

        BasicDBObject key = ( BasicDBObject )  o.get( "key" );
        System.out.println( "    key:  " + key );
    }
}

More and better...

Something that takes a little thought to appreciate is what happens if you define an index thus:

    > use accountsdb
    switched to db accountsdb
    > db.users.ensureIndex ( { "email":  1 } )
    Inserted 1 record(s) in 2ms

...but keep making queries like:

    public List< User > readByDto( User user )
    {
        Query< User > query = getDatastore().createQuery( User.class );

        query.field( "email" ).equal( user.getEmail() );
        query.field( "birthdate" ).equal( user.getBirthdate() );
        query.field( "address" ).equal( user.getAddress() );

        return query.asList();
    }

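...the index only helps with the e-mail part of the query. MongoDB can use it to narrow the candidates, but it must still fetch and examine each of those documents to test birthdate and address, and a query that matches no index at all results in a nasty collection scan that can take a very long time. The solution is to analyze your data. Spend time thinking about...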

If you don't want to bog down threads spending a lot of time in the database, you've got to accept the hit on disk space and memory to create one or more indices that accommodate what you're trying to do.

For instance, I once was involved in a situation similar to the one above in which most queries were looking for a user by e-mail address, but were also frequently looking for another field at the same time, something the original index didn't account for. Changing the index to the following:

    > use accountsdb
    switched to db accountsdb
    > db.users.dropIndex( "email_1" )
    { "nIndexesWas": 2, "ok": 1 }
    > db.users.ensureIndex ( { "email":  1, "birthdate": 1 } )
    Inserted 1 record(s) in 2ms
    > db.users.getIndexes()
    [
      {
        "v": 1,
        "key": {
          "_id": 1
        },
        "ns": "accountsdb.users",
        "name": "_id_"
      },
      {
        "v": 1,
        "key": {
          "email": 1,
          "birthdate": 1
        },
        "ns": "accountsdb.users",
        "name": "email_1_birthdate_1"
      }
    ]

...because we were often looking for users by their e-mail address and birthdate, made our searches about ten times faster (and this was for a users collection of only some 300-400 users).

And when you're only looking for a user by e-mail address, this index still suffices, since e-mail is its leading key.
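The reason the compound index still serves e-mail-only queries is that MongoDB can use any leading prefix of a compound index's keys. Here's a rough way to picture that rule in plain Java. This is only a simplified model of the idea, not how MongoDB's query planner is actually implemented; the class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified model only: NOT MongoDB's real planner. Names here are hypothetical.
class IndexPrefixSketch
{
    // Count how many leading index keys appear among the query's equality fields.
    static int usablePrefixLength( List< String > indexKeys, Set< String > queryFields )
    {
        int matched = 0;

        for( String key : indexKeys )
        {
            if( !queryFields.contains( key ) )
                break;                  // a missing key ends the usable prefix
            matched++;
        }

        return matched;
    }

    public static void main( String[] args )
    {
        List< String > index = Arrays.asList( "email", "birthdate" );

        Set< String > byEmail     = new HashSet<>( Arrays.asList( "email" ) );
        Set< String > byBoth      = new HashSet<>( Arrays.asList( "email", "birthdate" ) );
        Set< String > byBirthdate = new HashSet<>( Arrays.asList( "birthdate" ) );

        System.out.println( "email only:      " + usablePrefixLength( index, byEmail ) );
        System.out.println( "email+birthdate: " + usablePrefixLength( index, byBoth ) );
        System.out.println( "birthdate only:  " + usablePrefixLength( index, byBirthdate ) );
    }
}
```

A result of 0 means the index is of no help for matching, which is why a query on birthdate alone gets nothing out of an index that leads with e-mail.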

Here's code to create such an index from Java, but only if it doesn't already exist. It returns true if it had to create the index.

    private static boolean addUserIndices( MongoClient mongo )
    {
        DBCollection     collection = mongo.getDB( "accountsdb" ).getCollection( "users" );
        List< DBObject > indices    = collection.getIndexInfo();
        boolean          birthdate  = false;

        // look for our compound index among the existing ones...
        for( DBObject index : indices )
        {
            String name = ( String ) index.get( "name" );

            if( name.equals( "_id_" ) )        // because MongoDB always creates this one
                continue;

            if( name.equals( "email_1_birthdate_1" ) )
                birthdate = true;
        }

        if( birthdate )                        // already there: nothing to do
            return false;

        // the index is missing, so (and only so) do we pay for building it
        collection.createIndex( new BasicDBObject( "email", 1 ).append( "birthdate", 1 ) );

        log.info( "Index for birthdate added" );   // log is assumed to be a class-level logger
        return true;
    }
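The name being compared, email_1_birthdate_1, isn't arbitrary: when you don't name an index yourself, MongoDB derives the name from each key field and its direction, joined by underscores (just as the index on x came out as x_1 earlier). If you'd rather not hard-code the string, something like the following sketch could build it from the key specification. The class and method names are my own invention for illustration, and it only covers plain ascending (1) and descending (-1) keys.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the MongoDB driver.
class IndexNameSketch
{
    // Join each field and its direction with underscores, in key order.
    static String defaultIndexName( Map< String, Integer > keys )
    {
        StringBuilder name = new StringBuilder();

        for( Map.Entry< String, Integer > entry : keys.entrySet() )
        {
            if( name.length() > 0 )
                name.append( '_' );

            name.append( entry.getKey() ).append( '_' ).append( entry.getValue() );
        }

        return name.toString();
    }

    public static void main( String[] args )
    {
        // LinkedHashMap keeps insertion order, which matters for compound keys
        Map< String, Integer > keys = new LinkedHashMap<>();

        keys.put( "email", 1 );
        keys.put( "birthdate", 1 );

        System.out.println( defaultIndexName( keys ) );   // prints email_1_birthdate_1
    }
}
```

Key order matters here just as it does in the index itself, hence the LinkedHashMap rather than a plain HashMap.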

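With the index above, which includes the fields for e-mail and birthdate but not address, the query shown earlier is narrowed by the index on its first two fields, but MongoDB must still examine each remaining document to test address. The frequent query for a user by e-mail and birthdate, on the other hand, is matched by the index alone, with no extra filtering.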