A Brief Introduction to MongoDB

The success of any application depends heavily on its database schema. A well-normalized schema keeps data consistent and free of redundancy, but designing one is no piece of cake. Requirements also change, and each change can force programmers to alter both the code and the schema. A document-oriented database eases this by letting the objects in code map directly to documents in the database.

A document-oriented database is a schema-less database. One of the most popular is MongoDB. Unlike a relational database, which stores data as rows in tables, MongoDB stores data as documents. A document is a set of key-value pairs; a value can be a basic type such as an integer or a string, an array, or another document.
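To make the structure concrete, here is a minimal sketch of a document written as a Python dictionary. The field names and values are made up for illustration; only the shape (keys mapping to basic values, arrays, and embedded documents) comes from the description above. MongoDB itself stores documents in a binary JSON-like format (BSON), so plain JSON is used here only as a stand-in.

```python
import json

# A hypothetical document: every field name and value here is illustrative.
person = {
    "name": {"first": "John", "last": "Doe"},  # embedded document
    "age": 42,                                  # integer
    "keywords": ["storage", "DBMS"],            # array of strings
}

# Schema-less: another document in the same collection may have other fields.
other = {"name": {"first": "Jane", "last": "Roe"}, "email": "jane@example.com"}

# Documents serialize naturally to JSON-like text and back.
text = json.dumps(person, sort_keys=True)
restored = json.loads(text)
print(restored["name"]["last"])
```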

Why MongoDB?

Document-oriented

  • Documents (objects) map nicely to programming language data types.
  • Embedded documents and arrays reduce need for joins.
  • Dynamically typed (schema-less) for easy schema evolution.
  • No joins and no multi-document transactions for high performance and easy scalability.
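The point about embedding reducing the need for joins can be illustrated with a small pure-Python sketch. The posts and authors below are hypothetical data, and the two functions are illustrative helpers, not part of any MongoDB API.

```python
# Normalized (relational-style) layout: two "tables" linked by author_id.
authors = {1: {"name": "John Doe"}}
posts = [{"title": "Intro", "author_id": 1}]


def post_with_author_joined(post):
    """Resolve the author with a second lookup, i.e. a join."""
    author = authors[post["author_id"]]
    return {"title": post["title"], "author": author["name"]}


# Document-style layout: the author is embedded in the post itself.
embedded_posts = [{"title": "Intro", "author": {"name": "John Doe"}}]


def post_with_author_embedded(post):
    """Everything needed is already inside the one document."""
    return {"title": post["title"], "author": post["author"]["name"]}


joined = post_with_author_joined(posts[0])
embedded = post_with_author_embedded(embedded_posts[0])
```

Both calls produce the same result, but the embedded version reads a single document and never consults a second collection.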

High performance

  • No joins and embedding make reads and writes fast.
  • Indexes include indexing of keys from embedded documents and arrays.
  • Optional asynchronous writes.

High availability

  • Replicated servers with automatic master failover.

Easy scalability

  • Automatic sharding (auto-partitioning of data across servers).
  • Reads and writes are distributed over shards.
  • No joins or multi-document transactions make distributed queries easy and fast.
  • Eventually consistent reads can be distributed over replicated servers.

Rich query language

In short, MongoDB offers many useful features: embedded documents for speed and manageability, agile development with a schema-less model, and easier horizontal scalability because joins aren’t as important.

A large MongoDB deployment consists of:

1. One or more shards. Each shard holds a portion of the total data (managed automatically), and reads and writes are automatically routed to the appropriate shards. Each shard is backed by a replica set, which holds only the data for that shard.

A replica set is one or more servers, each holding a copy of the same data. At any given time one server is the primary and the rest are secondaries. If the primary goes down, one of the secondaries automatically takes over as primary. All writes and consistent reads go to the primary, and all eventually consistent reads are distributed among the secondaries.

2. Multiple config servers. Each one holds a copy of the metadata indicating which data lives on which shard.

3. One or more routers. Each one acts as a server for one or more clients. Clients issue queries and updates to a router, and the router routes them to the appropriate shard with the help of the config servers.

4. One or more clients. Each one is part of the user’s application and issues commands to a router via the Mongo client library (or driver) for its language.
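The replica set behaviour described under item 1 can be sketched as a toy model. This is not how MongoDB is implemented: the class, its method names, and the rule of promoting the first remaining member are all assumptions made for illustration, and a real election is more involved.

```python
class ReplicaSet:
    """Toy model of a replica set: one primary, the rest secondaries."""

    def __init__(self, members):
        self.members = list(members)    # names of the servers that are up
        self.primary = self.members[0]  # assumption: first member starts as primary

    def secondaries(self):
        return [m for m in self.members if m != self.primary]

    def fail(self, member):
        """Take a server down; promote a secondary if it was the primary."""
        self.members.remove(member)
        if member == self.primary:
            # Simplification: promote the first remaining member, if any.
            self.primary = self.members[0] if self.members else None

    def route_write(self):
        # Writes always go to the primary.
        return self.primary

    def route_read(self, consistent=True):
        # Consistent reads go to the primary; eventually consistent reads
        # may go to a secondary (here simply the first one, if any).
        if consistent or not self.secondaries():
            return self.primary
        return self.secondaries()[0]


rs = ReplicaSet(["a", "b", "c"])
before = rs.route_write()  # served by the initial primary
rs.fail("a")               # the primary goes down
after = rs.route_write()   # served by the newly promoted primary
```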

mongod is the server program (used for both data and config servers). mongos is the router program.
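As a rough illustration of how a router uses config metadata to pick a shard, here is a toy sketch. It assumes data is partitioned by ranges of a string key; the split points, shard names, and the Router class are hypothetical and stand in for what config servers and mongos do in a real deployment.

```python
import bisect

# Hypothetical config metadata: each shard owns a range of keys.
# Keys below "h" live on shard0, keys from "h" up to (not including) "p"
# live on shard1, and everything from "p" onwards lives on shard2.
SPLIT_POINTS = ["h", "p"]
SHARDS = ["shard0", "shard1", "shard2"]


def shard_for(key):
    """Look up which shard owns `key`, as a router would via config data."""
    return SHARDS[bisect.bisect_right(SPLIT_POINTS, key)]


class Router:
    """Toy router: forwards each operation to the shard that owns the key."""

    def __init__(self):
        # One in-memory dict per shard stands in for a real server.
        self.storage = {name: {} for name in SHARDS}

    def insert(self, key, doc):
        self.storage[shard_for(key)][key] = doc

    def find(self, key):
        return self.storage[shard_for(key)].get(key)


router = Router()
router.insert("doe", {"first": "John"})
router.insert("smith", {"first": "Anna"})
```

The client only ever talks to the router; which shard actually holds "doe" or "smith" is decided by the metadata lookup.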

The Mongo data model consists of:

  • A Mongo system (see deployment above) holds a set of databases.
  • A database holds a set of collections.
  • A collection holds a set of documents.
  • A document holds a set of fields.
  • A field is a key-value pair.
  • A key is a name (string).
  • A value is a basic type such as string, integer, float, timestamp, or binary; a document; or an array of values.

Mongo query language:

To retrieve documents from a collection, you supply a query document containing the fields that the desired documents should match. For example, {name: {first: 'John', last: 'Doe'}} will match all documents in the collection whose name is John Doe. Likewise, {'name.last': 'Doe'} will match all documents with a last name of Doe (a dotted key must be quoted). Also, {'name.last': /^D/} will match all documents whose last name starts with 'D' (a regular expression match).
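The matching rules above can be imitated with a small pure-Python matcher. This is a toy model of the semantics, not MongoDB's implementation: the helper names are made up, and it simplifies in at least one way, since Python dictionaries compare equal regardless of key order whereas MongoDB's exact match on an embedded document is sensitive to field order.

```python
import re


def get_path(doc, dotted):
    """Follow a dotted path such as 'name.last' into embedded documents."""
    current = doc
    for part in dotted.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def matches(doc, query):
    """Return True if every field in the query matches the document."""
    for path, wanted in query.items():
        actual = get_path(doc, path)
        if isinstance(wanted, re.Pattern):
            # Regular expression match against a string value.
            if not isinstance(actual, str) or not wanted.search(actual):
                return False
        elif actual != wanted:
            # Plain equality, including whole embedded documents.
            return False
    return True


people = [
    {"name": {"first": "John", "last": "Doe"}},
    {"name": {"first": "Jane", "last": "Doe"}},
    {"name": {"first": "Anna", "last": "Smith"}},
]

exact = [p for p in people if matches(p, {"name": {"first": "John", "last": "Doe"}})]
by_last = [p for p in people if matches(p, {"name.last": "Doe"})]
by_regex = [p for p in people if matches(p, {"name.last": re.compile(r"^D")})]
```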

Queries also match inside embedded arrays. For example, {keywords: 'storage'} will match all documents with 'storage' in their keywords array. Likewise, {keywords: {$in: ['storage', 'DBMS']}} will match all documents with 'storage' or 'DBMS' in their keywords array.
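Array matching and $in can be modelled the same way. The sketch below is a simplified stand-in for the real query engine; the function names and sample documents are hypothetical.

```python
def field_matches(actual, wanted):
    """Match one field value against a condition, looking inside arrays."""
    # An array field matches if any one of its elements matches.
    candidates = actual if isinstance(actual, list) else [actual]
    if isinstance(wanted, dict) and "$in" in wanted:
        return any(c in wanted["$in"] for c in candidates)
    return wanted in candidates


def find(collection, query):
    """Return the documents for which every queried field matches."""
    return [
        doc for doc in collection
        if all(field_matches(doc.get(k), v) for k, v in query.items())
    ]


docs = [
    {"title": "A", "keywords": ["storage", "cloud"]},
    {"title": "B", "keywords": ["DBMS"]},
    {"title": "C", "keywords": ["networking"]},
]

storage_titles = [d["title"] for d in find(docs, {"keywords": "storage"})]
either_titles = [
    d["title"] for d in find(docs, {"keywords": {"$in": ["storage", "DBMS"]}})
]
```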

If a collection holds many documents and you want a query to run fast, build an index for that query. For example, createIndex({'name.last': 1}) or createIndex({keywords: 1}) (older versions call this ensureIndex). Note that indexes occupy space and slow down updates a bit, so use them only when the tradeoff is worth it.
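The tradeoff can be seen in a toy index over the keywords array. A plain dictionary is used here purely as a stand-in for a real index structure, and the function names and sample documents are assumptions for illustration.

```python
from collections import defaultdict

docs = [
    {"_id": 1, "keywords": ["storage", "cloud"]},
    {"_id": 2, "keywords": ["DBMS", "storage"]},
    {"_id": 3, "keywords": ["networking"]},
]


def scan(collection, keyword):
    """Without an index: examine every document in the collection."""
    return [d["_id"] for d in collection if keyword in d["keywords"]]


def build_index(collection):
    """Map each keyword to the ids of the documents that contain it."""
    index = defaultdict(list)
    for d in collection:
        for kw in d["keywords"]:  # one entry per array element
            index[kw].append(d["_id"])
    return index


def insert(collection, index, doc):
    """Every write must also update the index: the cost of faster reads."""
    collection.append(doc)
    for kw in doc["keywords"]:
        index[kw].append(doc["_id"])


index = build_index(docs)
insert(docs, index, {"_id": 4, "keywords": ["storage"]})
```

A lookup such as index["storage"] now answers the query directly instead of scanning every document, at the price of extra memory and extra work on each insert.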

 
