It’s common to create indexes to enhance query performance. Fortunately, MongoDB’s indexes can be created easily from the shell. If you’re new to database indexes, this section should make the need for them clear; if you already have indexing experience, you’ll see how easy it is to create indexes and then profile queries against them using the explain() method.
2.2.1 Creating a large collection
An indexing example makes sense only if you have a collection with many documents.
So you’ll add 20,000 simple documents to a numbers collection. Because the MongoDB shell is also a JavaScript interpreter, the code to accomplish this is simple:
> for(i = 0; i < 20000; i++) {
    db.numbers.save({num: i});
  }
WriteResult({ "nInserted" : 1 })
1 For the full list of keyboard shortcuts, please visit http://docs.mongodb.org/v3.0/reference/program/mongo/#mongo-keyboard-shortcuts.
40 CHAPTER 2 MongoDB through the JavaScript shell
That’s a lot of documents, so don’t be surprised if the insert takes a few seconds to complete. Once it returns, you can run a couple of queries to verify that all the documents are present:
> db.numbers.count()
20000
> db.numbers.find()
{ "_id": ObjectId("4bfbf132dba1aa7c30ac830a"), "num": 0 }
{ "_id": ObjectId("4bfbf132dba1aa7c30ac830b"), "num": 1 }
{ "_id": ObjectId("4bfbf132dba1aa7c30ac830c"), "num": 2 }
{ "_id": ObjectId("4bfbf132dba1aa7c30ac830d"), "num": 3 }
{ "_id": ObjectId("4bfbf132dba1aa7c30ac830e"), "num": 4 }
{ "_id": ObjectId("4bfbf132dba1aa7c30ac830f"), "num": 5 }
{ "_id": ObjectId("4bfbf132dba1aa7c30ac8310"), "num": 6 }
{ "_id": ObjectId("4bfbf132dba1aa7c30ac8311"), "num": 7 }
{ "_id": ObjectId("4bfbf132dba1aa7c30ac8312"), "num": 8 }
{ "_id": ObjectId("4bfbf132dba1aa7c30ac8313"), "num": 9 }
{ "_id": ObjectId("4bfbf132dba1aa7c30ac8314"), "num": 10 }
{ "_id": ObjectId("4bfbf132dba1aa7c30ac8315"), "num": 11 }
{ "_id": ObjectId("4bfbf132dba1aa7c30ac8316"), "num": 12 }
{ "_id": ObjectId("4bfbf132dba1aa7c30ac8317"), "num": 13 }
{ "_id": ObjectId("4bfbf132dba1aa7c30ac8318"), "num": 14 }
{ "_id": ObjectId("4bfbf132dba1aa7c30ac8319"), "num": 15 }
{ "_id": ObjectId("4bfbf132dba1aa7c30ac831a"), "num": 16 }
{ "_id": ObjectId("4bfbf132dba1aa7c30ac831b"), "num": 17 }
{ "_id": ObjectId("4bfbf132dba1aa7c30ac831c"), "num": 18 }
{ "_id": ObjectId("4bfbf132dba1aa7c30ac831d"), "num": 19 }
Type "it" for more
The count() command shows that you’ve inserted 20,000 documents. The subsequent query displays the first 20 results (this number may be different in your shell). You can display additional results with the it command:
> it
{ "_id": ObjectId("4bfbf132dba1aa7c30ac831e"), "num": 20 }
{ "_id": ObjectId("4bfbf132dba1aa7c30ac831f"), "num": 21 }
{ "_id": ObjectId("4bfbf132dba1aa7c30ac8320"), "num": 22 }
...
The it command instructs the shell to return the next result set.2
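The batching behavior behind the it command can be modeled in a few lines of plain JavaScript. This is only an illustrative sketch, not the shell's actual implementation: the makeCursor helper and its batchSize parameter are hypothetical names, and the batch size of 20 is an assumption chosen to mirror the output above.

```javascript
// Illustrative model of a cursor that hands out results in batches.
// makeCursor and batchSize are hypothetical names, not MongoDB shell API.
function makeCursor(docs, batchSize) {
  let position = 0; // index of the next document to return
  return {
    // True while there are documents that have not been returned yet.
    hasNext() {
      return position < docs.length;
    },
    // Return the next batch and advance the cursor, like typing "it".
    nextBatch() {
      const batch = docs.slice(position, position + batchSize);
      position += batch.length;
      return batch;
    },
  };
}

// Build 20,000 in-memory documents shaped like those in the numbers collection.
const docs = [];
for (let i = 0; i < 20000; i++) {
  docs.push({ num: i });
}

const cursor = makeCursor(docs, 20);
const firstBatch = cursor.nextBatch();  // what find() displays first
const secondBatch = cursor.nextBatch(); // what "it" displays next

console.log(firstBatch.length, secondBatch[0].num);
```

The point of the sketch is that the cursor remembers its position between calls, which is why each it picks up where the previous batch stopped.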
With a sizable set of documents available, let’s try a couple of queries. Given what you know about MongoDB’s query engine, a simple query matching a document on its num attribute makes sense:
> db.numbers.find({num: 500})
{ "_id" : ObjectId("4bfbf132dba1aa7c30ac84fe"), "num" : 500 }
2 You may be wondering what’s happening behind the scenes here. All queries create a cursor, which allows for iteration over a result set. This is somewhat hidden when using the shell, so it isn’t necessary to discuss in detail at the moment. If you can’t wait to learn more about cursors and their idiosyncrasies, see chapters 3 and 4.
41 Creating and querying with indexes
RANGE QUERIES
More interestingly, you can also issue range queries using the special $gt and $lt operators. They stand for greater than and less than, respectively. Here’s how you query for all documents with a num value greater than 19,995:
> db.numbers.find( {num: {"$gt": 19995 }} )
{ "_id" : ObjectId("552e660b58cd52bcb2581142"), "num" : 19996 }
{ "_id" : ObjectId("552e660b58cd52bcb2581143"), "num" : 19997 }
{ "_id" : ObjectId("552e660b58cd52bcb2581144"), "num" : 19998 }
{ "_id" : ObjectId("552e660b58cd52bcb2581145"), "num" : 19999 }
You can also combine the two operators to specify upper and lower boundaries:
> db.numbers.find( {num: {"$gt": 20, "$lt": 25 }} )
{ "_id" : ObjectId("552e660558cd52bcb257c33b"), "num" : 21 }
{ "_id" : ObjectId("552e660558cd52bcb257c33c"), "num" : 22 }
{ "_id" : ObjectId("552e660558cd52bcb257c33d"), "num" : 23 }
{ "_id" : ObjectId("552e660558cd52bcb257c33e"), "num" : 24 }
You can see that by using a simple JSON document, you’re able to specify a range query in much the same way you might in SQL. $gt and $lt are only two of a host of operators that make up the MongoDB query language. Others include $gte for greater than or equal to, $lte for (you guessed it) less than or equal to, and $ne for not equal to. You’ll see other operators and many more example queries in later chapters.
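To make the meaning of these operators concrete, here is a minimal sketch in plain JavaScript of how a filter document such as {num: {"$gt": 20, "$lt": 25}} could be evaluated against a document. It is an illustration only and says nothing about how MongoDB's query engine is implemented; the matches function and the operators table are hypothetical names, and only the five operators mentioned above are handled.

```javascript
// Simplified evaluation of a filter document against one document.
// matches and operators are hypothetical names used for illustration only.
const operators = {
  $gt: (value, bound) => value > bound,
  $lt: (value, bound) => value < bound,
  $gte: (value, bound) => value >= bound,
  $lte: (value, bound) => value <= bound,
  $ne: (value, bound) => value !== bound,
};

function matches(doc, filter) {
  return Object.keys(filter).every((field) => {
    const condition = filter[field];
    // A plain value such as {num: 500} is treated as an equality test.
    if (typeof condition !== "object" || condition === null) {
      return doc[field] === condition;
    }
    // Otherwise every operator listed in the condition must hold.
    return Object.keys(condition).every((op) => {
      if (!Object.prototype.hasOwnProperty.call(operators, op)) {
        throw new Error("Unsupported operator in this sketch: " + op);
      }
      return operators[op](doc[field], condition[op]);
    });
  });
}

// In-memory stand-in for the numbers collection.
const numbers = [];
for (let i = 0; i < 20000; i++) {
  numbers.push({ num: i });
}

const inRange = numbers.filter((doc) => matches(doc, { num: { $gt: 20, $lt: 25 } }));
console.log(inRange.map((doc) => doc.num)); // [ 21, 22, 23, 24 ]
```

Combining $gt and $lt in one condition document means both must hold, which is why the bounds 20 and 25 themselves are excluded.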
Of course, queries like this are of little value unless they’re also efficient. In the next section, we’ll start thinking about query efficiency by exploring MongoDB’s indexing features.
2.2.2 Indexing and explain()
If you’ve spent time working with relational databases, you’re probably familiar with SQL’s EXPLAIN, an invaluable tool for debugging or optimizing a query. When any database receives a query, it must plan out how to execute it; this is called a query plan. EXPLAIN describes query paths and allows developers to diagnose slow operations by determining which indexes a query has used. Often a query can be executed in multiple ways, and sometimes this results in behavior you might not expect. EXPLAIN explains. MongoDB has its own version of EXPLAIN that provides the same service. To get an idea of how it works, let’s apply it to one of the queries you just issued. Try running the following on your system:
> db.numbers.find({num: {"$gt": 19995}}).explain("executionStats")
The result should look something like what you see in the next listing. The "executionStats" keyword is new to MongoDB 3.0 and requests a different mode that gives more detailed output.
Listing 2.1 Typical explain("executionStats") output for an unindexed query
{
  "queryPlanner" : {
    "plannerVersion" : 1,
    "namespace" : "tutorial.numbers",
    "indexFilterSet" : false,
    "parsedQuery" : {
      "num" : {
        "$gt" : 19995
      }
    },
    "winningPlan" : {
      "stage" : "COLLSCAN",
      "filter" : {
        "num" : {
          "$gt" : 19995
        }
      },
      "direction" : "forward"
    },
    "rejectedPlans" : [ ]
  },
  "executionStats" : {
    "executionSuccess" : true,
    "nReturned" : 4,
    "executionTimeMillis" : 8,
    "totalKeysExamined" : 0,
    "totalDocsExamined" : 20000,
    "executionStages" : {
      "stage" : "COLLSCAN",
      "filter" : {
        "num" : {
          "$gt" : 19995
        }
      },
      "nReturned" : 4,
      "executionTimeMillisEstimate" : 0,
      "works" : 20002,
      "advanced" : 4,
      "needTime" : 19997,
      "needFetch" : 0,
      "saveState" : 156,
      "restoreState" : 156,
      "isEOF" : 1,
      "invalidates" : 0,
      "direction" : "forward",
      "docsExamined" : 20000
    }
  },
  "serverInfo" : {
    "host" : "rMacBook.local",
    "port" : 27017,
    "version" : "3.0.6",
    "gitVersion" : "nogitversion"
  },
  "ok" : 1
}
Upon examining the explain() output,3 you may be surprised to see that the query engine has to scan the entire collection, all 20,000 documents (docsExamined), to return only four results (nReturned). The value of the totalKeysExamined field shows the number of index entries scanned, which is zero. Such a large difference between the number of documents scanned and the number returned marks this as an inefficient query. In a real-world situation, where the collection and the documents themselves would likely be larger, the time needed to process the query would be substantially greater than the eight milliseconds (millis) noted here (this may be different on your machine).
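The comparison described above can be turned into a small helper that reads the executionStats fields and reports how many documents were examined per document returned. This is a sketch under stated assumptions: summarizeExplain is a hypothetical helper rather than a shell function, and the input object is a hand-copied subset of the values in Listing 2.1 instead of live explain() output.

```javascript
// Summarize the executionStats section of an explain("executionStats") result.
// summarizeExplain is a hypothetical helper, not part of the MongoDB shell.
function summarizeExplain(explainOutput) {
  const stats = explainOutput.executionStats;
  return {
    returned: stats.nReturned,
    docsExamined: stats.totalDocsExamined,
    keysExamined: stats.totalKeysExamined,
    // Documents examined per document returned; lower is better.
    docsPerResult:
      stats.nReturned > 0 ? stats.totalDocsExamined / stats.nReturned : null,
  };
}

// Subset of the values shown in Listing 2.1 (unindexed query).
const unindexed = {
  executionStats: {
    nReturned: 4,
    executionTimeMillis: 8,
    totalKeysExamined: 0,
    totalDocsExamined: 20000,
  },
};

const summary = summarizeExplain(unindexed);
console.log(summary.docsPerResult); // 5000
```

A ratio of 5,000 documents examined for each one returned, combined with zero index keys examined, is the signature of a full collection scan. In the shell, the same function could presumably be given the object returned by explain("executionStats") directly, since it reads only fields that appear in the listing.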
What this collection needs is an index. You can create an index for the num key within the documents using the createIndex() method. Try entering the following index creation code:
> db.numbers.createIndex({num: 1})
{
  "createdCollectionAutomatically" : false,
  "numIndexesBefore" : 1,
  "numIndexesAfter" : 2,
  "ok" : 1
}
The createIndex() method replaces the ensureIndex() method in MongoDB 3. If you’re using an older MongoDB version, you should use ensureIndex() instead of createIndex(). In MongoDB 3, ensureIndex() is still valid as it’s an alias for createIndex(), but you should stop using it.
As with other MongoDB operations, such as queries and updates, you pass a document to the createIndex() method to define the index’s keys. In this case, the {num: 1} document indicates that an ascending index should be built on the num key for all documents in the numbers collection.
You can verify that the index has been created by calling the getIndexes() method:
> db.numbers.getIndexes()
[
  {
    "v" : 1,
    "key" : {
      "_id" : 1
    },
    "name" : "_id_",
    "ns" : "tutorial.numbers"
  },
  {
    "v" : 1,
    "key" : {
      "num" : 1
    },
    "name" : "num_1",
    "ns" : "tutorial.numbers"
  }
]
3 In these examples we’re inserting “hostname” as the machine’s hostname. On your platform this may appear as localhost, your machine’s name, or its name plus .local. Don’t worry if your output looks a little different from ours; it can vary based on your platform and your exact version of MongoDB.
The collection now has two indexes. The first is the standard _id index that’s automatically built for every collection; the second is the index you created on num. The indexes for those fields are called _id_ and num_1, respectively. If you don’t provide a name, MongoDB generates one automatically from the indexed keys.
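The name num_1 follows a simple pattern: the key name joined to its direction with an underscore. The sketch below reproduces that pattern in plain JavaScript. The defaultIndexName function is a hypothetical helper written for illustration, not a shell method, and it does not cover the built-in _id index, which is named _id_ as shown in the output above.

```javascript
// Derive an index name from a key specification document by joining
// each key and its direction with underscores.
// defaultIndexName is a hypothetical helper for illustration only.
function defaultIndexName(keySpec) {
  return Object.keys(keySpec)
    .map((field) => field + "_" + keySpec[field])
    .join("_");
}

const generatedName = defaultIndexName({ num: 1 });
console.log(generatedName); // num_1
```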
If you run your query with the explain() method, you’ll now see the dramatic difference in query response time, as shown in the following listing.
Listing 2.2 explain() output for an indexed query
> db.numbers.find({num: {"$gt": 19995 }}).explain("executionStats")
{
  "queryPlanner" : {
    "plannerVersion" : 1,
    "namespace" : "tutorial.numbers",
    "indexFilterSet" : false,
    "parsedQuery" : {
      "num" : {
        "$gt" : 19995
      }
    },
    "winningPlan" : {
      "stage" : "FETCH",
      "inputStage" : {
        "stage" : "IXSCAN",
        "keyPattern" : {
          "num" : 1
        },
        "indexName" : "num_1",          <-- Using the num_1 index
        "isMultiKey" : false,
        "direction" : "forward",
        "indexBounds" : {
          "num" : [
            "(19995.0, inf.0]"
          ]
        }
      }
    },
    "rejectedPlans" : [ ]
  },
  "executionStats" : {
    "executionSuccess" : true,
    "nReturned" : 4,                    <-- Four documents returned
    "executionTimeMillis" : 0,          <-- Much faster!
    "totalKeysExamined" : 4,
    "totalDocsExamined" : 4,            <-- Only four documents scanned
    "executionStages" : {
      "stage" : "FETCH",
      "nReturned" : 4,
      "executionTimeMillisEstimate" : 0,
      "works" : 5,
      "advanced" : 4,
      "needTime" : 0,
      "needFetch" : 0,
      "saveState" : 0,
      "restoreState" : 0,
      "isEOF" : 1,
      "invalidates" : 0,
      "docsExamined" : 4,
      "alreadyHasObj" : 0,
      "inputStage" : {
        "stage" : "IXSCAN",
        "nReturned" : 4,
        "executionTimeMillisEstimate" : 0,
        "works" : 4,
        "advanced" : 4,
        "needTime" : 0,
        "needFetch" : 0,
        "saveState" : 0,
        "restoreState" : 0,
        "isEOF" : 1,
        "invalidates" : 0,
        "keyPattern" : {
          "num" : 1
        },
        "indexName" : "num_1",          <-- Using the num_1 index
        "isMultiKey" : false,
        "direction" : "forward",
        "indexBounds" : {
          "num" : [
            "(19995.0, inf.0]"
          ]
        },
        "keysExamined" : 4,
        "dupsTested" : 0,
        "dupsDropped" : 0,
        "seenInvalidated" : 0,
        "matchTested" : 0
      }
    }
  },
  "serverInfo" : {
    "host" : "rMacBook.local",
    "port" : 27017,
    "version" : "3.0.6",
    "gitVersion" : "nogitversion"
  },
  "ok" : 1
}
Now that the query uses the num_1 index on num, it scans only the four documents that match the query. This reduces the reported time to serve the query from 8 ms to 0 ms, meaning it now completes in under a millisecond!
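The difference between the two plans can be imitated in plain JavaScript. The sketch below is a deliberately simplified model, not MongoDB's implementation: it stands in a sorted array with binary search for the real index structure, and collectionScan, buildIndex, and indexScan are hypothetical names. It counts how many documents and index keys each approach touches for the num > 19995 query.

```javascript
// Collection scan: examine every document and test the predicate on each.
function collectionScan(docs, lowerBound) {
  let docsExamined = 0;
  const results = [];
  for (const doc of docs) {
    docsExamined++;
    if (doc.num > lowerBound) results.push(doc);
  }
  return { results, docsExamined, keysExamined: 0 };
}

// Simplified ascending index on num: sorted entries pointing back at documents.
// (A sorted array is an assumption made for brevity.)
function buildIndex(docs) {
  return docs
    .map((doc, position) => ({ key: doc.num, position }))
    .sort((a, b) => a.key - b.key);
}

// Index scan: binary-search for the first key above the bound, then walk
// forward through the index, fetching only the matching documents.
function indexScan(docs, index, lowerBound) {
  let low = 0;
  let high = index.length;
  while (low < high) {
    const mid = Math.floor((low + high) / 2);
    if (index[mid].key > lowerBound) high = mid;
    else low = mid + 1;
  }
  let keysExamined = 0;
  const results = [];
  for (let i = low; i < index.length; i++) {
    keysExamined++;
    results.push(docs[index[i].position]); // the "fetch" step
  }
  return { results, docsExamined: results.length, keysExamined };
}

// In-memory stand-in for the numbers collection.
const collection = [];
for (let i = 0; i < 20000; i++) {
  collection.push({ num: i });
}

const scanStats = collectionScan(collection, 19995);
const indexStats = indexScan(collection, buildIndex(collection), 19995);
console.log(scanStats.docsExamined, indexStats.docsExamined, indexStats.keysExamined);
```

Both approaches return the same four documents, but the scan touches all 20,000 while the index-based lookup touches four keys and four documents, which matches the counts reported in the two listings.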
Indexes don’t come free; they take up some space and can make your inserts slightly more expensive, but they’re an essential tool for query optimization. If this example intrigues you, be sure to check out chapter 8, which is devoted to indexing and query optimization. Next you’ll look at the basic administrative commands required to get information about your MongoDB instance. You’ll also learn techniques for getting help from the shell, which will aid in mastering the various shell commands.