Russell Bateman
7 June 2013
last update:


Should I shard? One way is to create a three by three system with the idea of standing up another set as you grow.

mongos is "almost" just an extension of the drivers.

Sharding is transparent to the application, but if you're doing updates, you issue an update, what rows, etc., but default it will...

mongos will get update and one by one ask the shards until it finds one that matches the criteria. You'll get failure if you issue an update without a specific shard key or include MULTI=TRUE.

Different errors come back from mongos than from mongod. If you've got retry logic that's error-specific, you could be unpleasantly sruprised.

Version upgrades when sharding

Replace the binary and restart the process (mongos). For all sharding upgrades:

  1. Disable the balancer and wait for any in-progress chunk migrations to finish before beginning the actual upgrade.
  2. Do a "rolling upgrade", secondaries and arbiters first.
  3. Ensure secondary has caught up before upgrading it.
  4. Step down the original primary, allow election to occur.
  5. Upgrade the original primary.

Memory size...

...for dedicated VM running mongod. Rule of thumb...

15% of total document size for index. If index is totally resident, then a 100Gb data set would require DB nodes run about 16Gb memory plus whatever the file system and other processes need. Plan on 24Gb memory per VM to cover this very well.

Replica sets

The oplog is a ring buffer logging what's going on. Idempotent.

Don't share memory (don't let hypervisor share memory), but sharing CPUs is less critical because MongoDB isn't compute-bound.

Here's a data center illustration:


WriteConcern.FSYNC_SAFE, what to use that's better. MongoClient, etc.


Download from 10gen, place on one server, discovers the other members of the replica set/sharding cluster.

Collects samples every minute, can look at 5-minute samplings, etc.


- Don't use Ext3 because of zeroing out at file creation.

- Don't use ReiserFs.

- Use Ext4.

- See

- RAID10.


This is how to set up two data centers.

    arbiter (2 votes)
    arbiter (1 vote)

This equals 7 votes (4 for a quorum).


When DC1 goes down, rig Chef to deploy another arbiter (using force option) to enable DC2 to elect itself a primary (because it then has 4 votes or a quorum). Do this in the case where a human determines that DC1 isn't coming back very soon and the down-streamers can't wait.

Java driver

Query retry: catch exception and try again. Be careful not to lock up the system. Why is the query failing in the first place?

Primary reads, fail-over, takes several seconds for an election. Do we get an exception? Can we do smart retries?

Problem is that sometimes get not-is-master. When primary election, old primary steps down, disposes of its sockets. What can happen is the exception is wrapped-IO connection (peer closed or somehthing). In general, trying to detect reliably is hard because it doesn't look any different from a transient network failure.

Driver will create new connections—won't tell you that the primary changed.


Return after 3 times on a MongoNetworkingException?

No, the sublasses of MongoException are too generic and the actual error depends hugely on whether returned from mongod or mongos. mongos is a pipe binding with the socket, so no help there.

Any network or server error is recoverable. DuplicateKey isn't recoverable. InternalException is driver's SOL error. NullPointerException shouldn't lead to retry. Most people retry quickly, then wait longer and longer, etc.

Election could take 10 seconds.

- If DB is unavailable, save query for later time (if such an action is applicable).

Mongo options

By default, timeout is 0, meaning never time out.

Close cursors

Will have a beneficent effect on garbage collection. Check Morphia out for access to this cursor!

Random notes last morning...

Can make call on connection to get replica set status stuff if need be. Based on that, you could make decisions on increasing or decreasing the WriteConcern level, Etc.

Discussed (but I drew no conclusions on this.)

WriteResult.getN() gets the number of documents updated.