atomic (all or none),
consistent (what's written is valid),
isolated (one operation at a time) and
durable (once committed, it's there).
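To make "atomic (all or none)" concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for a traditional RDBMS. The accounts table, the names and the transfer() helper are all made up for illustration. The credit is applied first on purpose, so that a failing debit has something to roll back.

```python
import sqlite3


def make_db():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)]
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # An overdraft violates the CHECK constraint and raises here,
            # after the credit above has already been applied.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

With a fresh make_db(), transfer(conn, "alice", "bob", 500) returns False and the credit to bob is rolled back along with the failed debit, so balances() is unchanged.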
consistent (all or nothing: all clients always have the same view of the
data),
available (service always available: each client can always read and
write) and
partition-tolerant (only complete network failure fails to give a
response: works well despite physical network partitions).
Most NoSQL databases yield 2 out of 3. MongoDB's 2 of 3 are (eventual)
consistency and partition tolerance. Examples:
If you shard heavily and only use key/value look-ups, then Riak is probably
easier to manage on a large scale. In fact, the Bump post says exactly this: "We
decided to move to Riak because it offers better operational qualities than
MongoDB...Nagios will email us instead of page us,..."
If you're using MongoDB heavily as a cache, maybe you end up using Membase /
Redis / HBase.
If you start using MongoDB as a queue, you will eventually want to look at real
queuing systems: RabbitMQ, ActiveMQ, ZeroMQ, etc.
If you start using MongoDB to do search, you will eventually find that things
like Solr and Sphinx are better suited to resolve the full spectrum of search
The key here is that MongoDB is really a set of trade-offs. Many of the
databases above are very specific in what they do. MongoDB is less specific but
is serviceable for handling many different cases.
MongoDB is superbly adapted to reads if requiring a serious amount of
understanding and effort (especially in picking suitable shard keys) to ensure
MongoDB (2007) -----------------------------------------------------------> MongoDB
CouchDB (2005) -------------------------------+ (personnel-only)
NorthScale (?, memcache) -------> Membase ------------------------------------> Couchbase
MongoDB has done a great job with sharding, something that was mostly an add-on
feature back in the days of traditional RDBMS. However, MongoDB is dedicated to
the document. Sharding works well, but a) it's challenging to choose the best
shard key and b) "document" implies operations weighted more heavily toward
reads than writes.
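Why the shard key is hard to pick can be shown with a toy model. This is only a sketch, not MongoDB's actual chunk balancer: the shard count, the chunk boundaries and both routing functions are assumptions for illustration. It compares range-based routing of a monotonically increasing key (think timestamps or counters) against hashing the same key.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # hypothetical cluster size


def range_shard(key, upper_bounds):
    """Range routing: the first chunk whose upper bound exceeds the key owns it."""
    for shard, bound in enumerate(upper_bounds):
        if key < bound:
            return shard
    return len(upper_bounds)  # the last shard has no upper bound


def hashed_shard(key, num_shards=NUM_SHARDS):
    """Hash routing: spread keys by a hash of the key instead of its order."""
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def write_distribution(keys, route):
    """Count how many writes each shard receives under a routing function."""
    return Counter(route(k) for k in keys)


# Boundaries were split on old data (keys 0..999); new keys keep growing.
bounds = [250, 500, 750]
new_keys = range(1000, 2000)

by_range = write_distribution(new_keys, lambda k: range_shard(k, bounds))
by_hash = write_distribution(new_keys, hashed_shard)
```

Under range routing every new write lands on the last shard, so one machine takes the whole write load. Hashing spreads the writes across all four shards, at the price of turning range queries on that key into scatter-gather reads.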
Couchbase is more recent, builds on the shoulders of other NoSQL work and
aims specifically to solve the difficulty of sharding via a sort of
"auto-sharding" for balancing write loads.
There are a lot of good things about Couch. The multi-master replication is
very nice, though it requires some special handling to deal with the inevitable
conflicts that occur if the same data is changed in multiple places before
replication occurs. Without that handling, you'll potentially lose track of
(not technically lose, since it's still there) some of your data.
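The conflict scenario can be sketched in a few lines. This is a toy model loosely inspired by revision trees, not Couch's real API or storage format: the Replica class, the revision-id scheme and the winner rule are all assumptions. It shows two masters editing the same document before replicating, and how the losing edit stays around but drops out of sight.

```python
import hashlib
import json


def make_rev(parent, body):
    """Build a revision id: generation number plus a short content hash."""
    generation = 1 if parent is None else int(parent.split("-")[0]) + 1
    payload = json.dumps([parent, body], sort_keys=True).encode()
    return f"{generation}-{hashlib.md5(payload).hexdigest()[:8]}"


class Replica:
    def __init__(self):
        self.revs = {}  # doc_id -> {rev: (parent_rev, body)}

    def put(self, doc_id, body, parent=None):
        """Store a new revision whose parent must already be known here."""
        history = self.revs.setdefault(doc_id, {})
        if parent is not None and parent not in history:
            raise KeyError(f"unknown parent revision {parent}")
        rev = make_rev(parent, body)
        history[rev] = (parent, body)
        return rev

    def leaves(self, doc_id):
        """Revisions nobody has built on; more than one means a conflict."""
        history = self.revs.get(doc_id, {})
        parents = {p for p, _ in history.values()}
        return sorted(r for r in history if r not in parents)

    def winner(self, doc_id):
        """Pick the same winner on every replica: highest generation, then id."""
        best = max(self.leaves(doc_id), key=lambda r: (int(r.split("-")[0]), r))
        return best, self.revs[doc_id][best][1]

    def conflicts(self, doc_id):
        """Losing leaf revisions: still stored, just not what a plain read sees."""
        best, _ = self.winner(doc_id)
        return [r for r in self.leaves(doc_id) if r != best]

    def replicate_to(self, other):
        """Copy every revision this replica knows into the other replica."""
        for doc_id, history in self.revs.items():
            other.revs.setdefault(doc_id, {}).update(history)
```

An application that never checks conflicts() would silently see only the winner, which is the "lose track of" case: the other edit is still stored, but nothing surfaces it.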
The main downside (or upside, depending on how you look at it) to Couch is that
whatever types of lookups you want must be defined as "views." A view is
basically an indexed map/reduce output, though it's actually only the map that
is indexed. It's very powerful, but in some ways limited.
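A rough sketch of the view idea, in plain Python rather than Couch's own JavaScript views: a map function emits (key, value) rows, the rows are kept sorted by key (that is the index), and any reduce runs at query time over the matching key range. The function names, the sample documents and the by_city map are invented for illustration.

```python
import bisect


def build_view(docs, map_fn):
    """Run the map function over every document and keep rows sorted by key."""
    rows = []
    for doc in docs:
        rows.extend(map_fn(doc))
    rows.sort(key=lambda row: row[0])  # the sorted map output is the index
    return rows


def query(view, startkey, endkey, reduce_fn=None):
    """Look up a key range in the view; optionally reduce the matching values."""
    keys = [key for key, _ in view]
    lo = bisect.bisect_left(keys, startkey)
    hi = bisect.bisect_right(keys, endkey)
    rows = view[lo:hi]
    if reduce_fn is None:
        return rows
    return reduce_fn([value for _, value in rows])


def by_city(doc):
    """Hypothetical map function: index order totals by city."""
    if doc.get("type") == "order":
        yield (doc["city"], doc["total"])


DOCS = [
    {"type": "order", "city": "Oslo", "total": 10},
    {"type": "order", "city": "Bergen", "total": 5},
    {"type": "order", "city": "Oslo", "total": 7},
    {"type": "customer", "city": "Oslo"},
]

view = build_view(DOCS, by_city)
```

The limitation shows up right away: this view answers "orders by city" and nothing else. A lookup on any other field means writing another map function and building another view.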
What MongoDB has over Couchbase is that it is a lot more flexible and is a good
middle ground, providing a lot of the query capabilities of a traditional RDBMS
while still giving the flexibility of a JSON document store. Couchbase requires
a major adjustment in thinking since you can’t just do look-ups on arbitrary
fields; basically you have to manage the indices explicitly and using
like Couch-Lucene to gain full text search, so there is a lot of flexibility
there if you’re willing to do the work for it.
Couchbase is key-value pairs; there is no indexing of secondary fields
possible. This is one way in which a document database like MongoDB is superior.
Is what you're doing more write- or read-intensive?
MongoDB is really about documents at the ready for reading with redundancy
backing up. Sharding was created to handle write-intensity, but if writing is
the lion's share of what you're doing, you may wish to choose something else.
In a recent, high-profile example, Israeli company Viber abandoned MongoDB
after years of swearing by it because of their very high write volume.
This said, Couchbase is, according to what I've heard, pretty nasty in the
"finding and reading objects" department. Here they use Cassandra though I'm
not confident they arrived at the decision in the right way. I've been told
mostly that it works better with Amazon S3, but I'm pretty sure they're wrong
about that and just didn't know what they were doing. Here, it's
read-intensive, so I think they've made a mistake, but I'm nobody anyway.
There's also old-fashioned RDBMS like MySQL and PostgreSQL (and Oracle,
hehehe), but these aren't so good for writing to and their sharding solutions
are add-ons. They also do objects with more difficulty (though the problem's
been mastered via Hibernate in Java).
Each uses a different approach with different features and drawbacks.
What you give up (in theory or in practice):
Here are some downsides to MongoDB, excerpted from an article.
This is a common tradeoff people make when using MongoDB to achieve high
performance. We ended up making the tradeoff too by specifying the most
aggressive write concern (error ignored) and read preference
(secondaryPreferred) for the most performance demanding modules. We would
rather not do that if MongoDB could give us both strong data
consistency/durability and high performance. The cost of the tradeoff was
potential data loss and data inconsistency. Although for these modules minor
data loss or temporary data inconsistency is acceptable, we want to react
quickly if the situation gets worse. That was why we ended up building
comprehensive monitoring support to watch the data and replication lag closely.
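The trade-off in the excerpt above can be sketched with a toy primary/secondary pair. This is not how MongoDB replicates internally: the TinyReplicaSet class, its manually stepped replication and the failover rule are all assumptions for illustration. It shows a stale read from a lagging secondary and the loss of acknowledged but unreplicated writes when the primary fails.

```python
class TinyReplicaSet:
    """Toy primary/secondary pair with asynchronous, manually stepped replication."""

    def __init__(self):
        self.primary = {}
        self.secondary = {}
        self.pending = []  # acknowledged writes the secondary has not applied yet

    def write(self, key, value):
        """Acknowledge as soon as the primary has the write (weak write concern)."""
        self.primary[key] = value
        self.pending.append((key, value))

    def replicate(self, max_ops=None):
        """Apply up to max_ops pending writes to the secondary (all if None)."""
        count = len(self.pending) if max_ops is None else min(max_ops, len(self.pending))
        for key, value in self.pending[:count]:
            self.secondary[key] = value
        del self.pending[:count]

    def read(self, key, prefer_secondary=False):
        """Read from the secondary when asked to, otherwise from the primary."""
        node = self.secondary if prefer_secondary else self.primary
        return node.get(key)

    def lag(self):
        """Replication lag, measured here as the number of unapplied writes."""
        return len(self.pending)

    def fail_primary(self):
        """The primary dies; the secondary takes over and pending writes are lost."""
        lost = self.pending
        self.primary, self.pending = dict(self.secondary), []
        return lost
```

Watching lag() is the toy equivalent of the monitoring the excerpt describes: the larger it grows, the staler secondary reads get and the more there is to lose on failover.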
In MongoDB, there is no support for joins. If the data is highly normalized,
then the client application has to issue more queries (more network round trips
that add latency) to fetch data from different collections (tables). We had to
de-normalize the data in DPS to reduce network round trips. The cost was that
same data could be scattered in different collections, which not only occupied
more disk space but also could easily lead to data inconsistency. Applications
also needed to do busy work duplicating data in different collections, which
became frustrating at times. It is a good idea to carefully design your schema
in order to make the right tradeoffs for your application.
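The join trade-off above, as a minimal sketch. The CountingStore class is an in-memory stand-in that counts each fetch as one network round trip; the collection names and documents are invented and have nothing to do with the DPS schema from the article.

```python
class CountingStore:
    """In-memory stand-in for a database; every fetch counts as one round trip."""

    def __init__(self):
        self.collections = {}
        self.round_trips = 0

    def insert(self, collection, doc):
        self.collections.setdefault(collection, {})[doc["_id"]] = doc

    def find_one(self, collection, doc_id):
        self.round_trips += 1
        return self.collections[collection][doc_id]


def build_store():
    """Hypothetical data: one customer, stored normalized and denormalized."""
    store = CountingStore()
    store.insert("customers", {"_id": 1, "name": "Ann"})
    store.insert("orders", {"_id": 10, "item": "book", "customer_id": 1})
    store.insert(
        "orders_denorm",
        {"_id": 10, "item": "book", "customer_id": 1, "customer_name": "Ann"},
    )
    return store


def load_order_normalized(store, order_id):
    """No joins: the client fetches the order, then the customer it points to."""
    order = store.find_one("orders", order_id)
    customer = store.find_one("customers", order["customer_id"])
    return {"item": order["item"], "customer_name": customer["name"]}


def load_order_denormalized(store, order_id):
    """One fetch, because the customer name was copied into the order."""
    order = store.find_one("orders_denorm", order_id)
    return {"item": order["item"], "customer_name": order["customer_name"]}


def rename_customer(store, customer_id, new_name, fix_copies):
    """Update the customer; optionally do the busy work of fixing every copy."""
    store.collections["customers"][customer_id]["name"] = new_name
    if fix_copies:
        for order in store.collections["orders_denorm"].values():
            if order["customer_id"] == customer_id:
                order["customer_name"] = new_name
```

The denormalized read costs one round trip instead of two, but calling rename_customer() with fix_copies=False leaves the embedded name stale, which is the data inconsistency the excerpt warns about.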
Using write concerns and read preferences can mitigate some of the data
consistency and durability problems without using transactions, but it cannot
guarantee atomic update across multiple documents or multiple collections. As
of now, we are still not confident to use MongoDB in Perfect Market Vault (a
web based admin tool that is used by both internal administrators and external
partners) because due to the nature of the data Vault manages the requirement
on multi-document or multi-collection data consistency for Vault is much more
demanding than it is for DPS.