11.3 Record for NoSQL stores

Nearly every developer is familiar with SQL. It has been the reliable provider of data persistence for many years, from before the mass adoption of the internet right up to the present day. The continued growth of the internet means that applications have to deal with more and more data in increasingly write-oriented architectures. Simply put, the massive amount of interaction that applications commonly require these days is making SQL-based stores progressively trickier to scale.

We can also all appreciate the elegant logic behind the normalization of database schemas, but more often than not there’s a mismatch between this structure and modern web programming paradigms. Developers often place ORM systems, such as Mapper, between their application code and the underlying RDBMS in order to obtain a more object-oriented feel to their data access.

Increasingly, organizations have started to wonder whether there is a more natural way to work with their data that would better suit various specialized problem domains. Although these problem domains differ fairly widely, the resulting products are broadly united under the so-called NoSQL movement, because they all shun SQL in favor of a specialized interface. Examples include custom communication interfaces like Thrift, custom data formats like BSON, and custom query constructs like MapReduce. The Wikipedia article has more background about NoSQL (http://en.wikipedia.org/wiki/NoSQL).

NOTE The NoSQL movement is still a relatively new development, and if you haven’t had time to investigate it, you may be wondering what the purpose of all this specialized technology is. The majority of NoSQL solutions are designed to solve a specific use case, usually from the industry the vendor is from. Although many people are finding these technologies useful in a general sense, there’s no need to worry about them if they don’t fit your use case. Relational databases are still a really good fit for most applications.

There are many NoSQL stores currently available, and it’s somewhat beyond the scope of this book to list them all and their various nuances, so the following section specifically covers Lift’s integration with NoSQL stores and the Record abstractions the framework provides.

11.3.1 NoSQL support in Lift

NoSQL comes in many flavors, and each store provides different functionality. Lift’s support for the different backends has grown rather organically as the NoSQL scene has expanded and evolved. At the time of writing, Lift provides out-of-the-box NoSQL support for CouchDB (http://couchdb.apache.org/) and MongoDB (http://www.mongodb.org/).

Both Couch and Mongo are what is known as document-oriented data stores. This essentially means that rather than using tables, as relational database systems do, they store information in schemaless JSON documents, where each document has properties and collections that can be accessed just like those of any other JSON document. You can retrieve a specific document by asking for a specific key. For example, imagine asking for a specific ISBN to retrieve the book object you’re interested in from among a collection of books. You can think of these keys as being analogous to the primary keys in RDBMS tables.
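To make the key-to-document idea concrete, here’s a minimal sketch in plain Scala. It models a collection as an in-memory Map and is not the API of CouchDB, MongoDB, or Lift; the Document alias and the keys isbn-0001 and isbn-0002 are placeholders invented for illustration.

```scala
// Toy in-memory model of a document collection (not a real store's API).
// A document is a schemaless bag of named properties.
type Document = Map[String, Any]

// Placeholder keys standing in for real ISBNs.
val books: Map[String, Document] = Map(
  "isbn-0001" -> Map(
    "title" -> "Lift in Action",
    "publishedInYear" -> 2011,
    "authors" -> List("Tim Perrett")),
  // Documents in the same collection need not share a shape.
  "isbn-0002" -> Map("title" -> "Untitled draft")
)

// Retrieval is a lookup by key, much like a primary-key lookup in an RDBMS.
val found: Option[Document] = books.get("isbn-0001")
val title: Option[Any] = found.flatMap(_.get("title"))
val missing: Option[Document] = books.get("isbn-9999")
```

The second document deliberately carries fewer properties than the first, which is the practical meaning of schemaless here.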

Record also provides certain idioms so that the different storage mechanisms have similar if not identical operational semantics. Typically, records can be created and persisted like so:

MyThing.createRecord.fieldOne("value").fieldTwo("tester").save

This is true for both the CouchDB and MongoDB implementations covered here, and it should generally be the case for most Record implementations.
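The chaining in this idiom works because applying a value to a field hands back the record that owns the field. The following toy model in plain Scala sketches that idea only; ToyField, ToyThing, and the in-memory saved buffer are hypothetical names, and this is not Lift’s actual implementation.

```scala
import scala.collection.mutable

// A field that remembers its owner so that setting it can return the owner.
final class ToyField[Owner](owner: Owner, default: String) {
  private var current: String = default
  def get: String = current
  // Setting a value returns the owning record, which enables chaining.
  def apply(value: String): Owner = { current = value; owner }
}

final class ToyThing {
  val fieldOne = new ToyField[ToyThing](this, "")
  val fieldTwo = new ToyField[ToyThing](this, "")
  // "Persist" by appending to an in-memory buffer, then return the record.
  def save: ToyThing = { ToyThing.saved += this; this }
}

object ToyThing {
  val saved = mutable.ListBuffer.empty[ToyThing]
  def createRecord: ToyThing = new ToyThing
}

// Reads the same way as the Record idiom shown above.
val thing = ToyThing.createRecord.fieldOne("value").fieldTwo("tester").save
```

Because each step returns the record itself, the order in which the fields are set doesn’t matter, and save can always come last.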

Without further ado, let’s walk through some of the basic functionality that each abstraction provides before going on to explore the MongoDB abstraction in greater depth.

COUCHDB

One of the first NoSQL stores to land in popular IT culture was CouchDB. Broadly speaking, Couch and Mongo appear to have many similarities, but they’re mostly skin-deep. Couch excels in scenarios that call for master-master replication, which is typically found in applications that go offline or require the syncing of databases. A good example would be an email client syncing with the server: the local database would likely be out of date if the user was disconnected from the network for a period of time. In essence, if your problem requires eventual consistency over distributed storage nodes, or you require a MapReduce interface, CouchDB is a good candidate to evaluate.
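The email-client scenario can be sketched as two replicas that diverge while disconnected and agree again once they sync. This is a toy model in plain Scala under a simplifying assumption (the entry with the higher version number wins); CouchDB itself tracks document revisions rather than a single counter, and Versioned, Replica, and sync are names invented for this illustration.

```scala
// A value together with a version counter (a simplification of revisions).
final case class Versioned(value: String, version: Int)

type Replica = Map[String, Versioned]

// Deterministic winner: higher version first, value as a tie-breaker.
val byVersionThenValue: Ordering[Versioned] =
  Ordering.by((v: Versioned) => (v.version, v.value))

// Merge two replicas key by key, keeping the winning entry for each key.
def sync(a: Replica, b: Replica): Replica =
  (a.keySet ++ b.keySet).map { key =>
    key -> (a.get(key).toList ++ b.get(key).toList).max(byVersionThenValue)
  }.toMap

// Server and offline client start from the same state, then diverge.
val shared: Replica = Map("inbox/1" -> Versioned("unread", 1))
val server: Replica = shared + ("inbox/2" -> Versioned("unread", 1))
val client: Replica = shared + ("inbox/1" -> Versioned("read", 2))

// After reconnecting, both sides apply the same merge and hold the same data.
val serverAfter = sync(server, client)
val clientAfter = sync(client, server)
```

The point of the sketch is only that the merge is symmetric: whichever side initiates the sync, both replicas end up with the same documents.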

Eventual consistency

With the rise of distributed systems, it quickly became apparent that building distributed systems (particularly data stores) that maintained the ACID properties (atomicity, consistency, isolation, durability) was going to be exceedingly difficult, and that such systems would be unlikely to scale to the needs of the humongous systems being constructed now and looking to the future.

Subsequently, the idea of a system that was eventually consistent was born. Given a multi-node database to which an update is sent and a sufficiently long period of time, you can assume that all updates are either applied to all nodes, or that the nodes that didn’t take the updates retired from the service, so that the various distributed nodes of that system eventually become consistent. This is known as Basically Available, Soft-state, Eventual consistency (BASE), and it’s a principle that nearly all distributed NoSQL stores adopt.

Lift provides a Record abstraction to interoperate with CouchDB, and it allows you to interact with Couch in a manner that follows the Record idioms of having contextually rich fields. Before attempting to use the Couch module, make sure that you’ve included the dependency in your SBT project definition:

val couch = "net.liftweb" %% "lift-couchdb" % liftVersion

In order to start using the Lift integration with Couch, a small amount of setup is required for your Boot class:

import net.liftweb.couchdb.{CouchDB, Database}
import dispatch.{Http, StatusCode}

val database = new Database("bookstore")
database.createIfNotCreated(new Http())
CouchDB.defaultDatabase = database

The code in this example is pretty straightforward and should be fairly self-explanatory, with the possible exception of the new Http() statement. Lift’s CouchDB client builds on top of the HTTP Dispatch project (http://dispatch.databinder.net/) in order to communicate back and forth with the Couch server. This statement essentially hands the CouchDB record a vehicle through which it can make HTTP calls. In this particular case, a database is defined and specified in the CouchDB configuration object so you don’t have to pass the connection information later on, assuming you only want to communicate with a single Couch server.

With the database connection configured, you can start to interact with CouchDB by defining the specialized Record classes as detailed in the following listing.

Listing 11.6 Implementing a basic CouchDB record

import net.liftweb.record.field._
import net.liftweb.couchdb.{CouchRecord, CouchMetaRecord}

// Implement the Couch record types
class Book private () extends CouchRecord[Book] {
  def meta = Book
  // Define the fields
  val title = new StringField(this, "")
  val publishedInYear = new IntField(this, 1990)
}

object Book extends Book with CouchMetaRecord[Book]

The implementation here looks rather similar to the Squeryl variant detailed in listing 11.4. Specifically, note how the definitions of the fields are identical. The main difference between Squeryl and CouchDB here is the extension of the CouchRecord and CouchMetaRecord types. CouchDB requires a couple of different fields to be implemented in any given entity in order to control the versioning and revision systems, both of which are handled automatically for you by the CouchRecord supertype.

The CouchMetaRecord and Database types give you various convenience methods for creating and querying the views that Couch provides over stored documents. CouchDB querying essentially utilizes these MapReduce views in order to obtain query-style data. The views themselves can be precreated and then used in your application at runtime.

To create a view using lift-couchdb, you can do something like this:

import net.liftweb.json.Implicits.{int2jvalue, string2jvalue}
import net.liftweb.json.JsonAST.{JObject}
import net.liftweb.json.JsonDSL.{jobject2assoc, pair2Assoc, pair2jvalue}

// Design document holding a single view named "oldest"
val design: JObject =
  ("language" -> "javascript") ~
  ("views" -> ("oldest" ->
    // map: emit each Book's title and publication year
    (("map" -> "function(doc) { if (doc.type == 'Book') { emit(doc.title, doc.publishedInYear); } }") ~
    // reduce: keep the smallest (earliest) publication year
    ("reduce" -> "function(keys, values) { return Math.min.apply(null, values); }"))))

// Create the new design in the database
Http(database.design("design_name") put design)

If you’re not too familiar with Couch, this may look somewhat odd. The JSON is a CouchDB design document that defines a MapReduce view named oldest, whose purpose is to find the oldest book. The final line sends the design to the database under the assigned name “design_name”. Once it’s in place, you can run a query via the Book meta record as shown:

val book = Book.queryView("design_name", "oldest")

This one line calls Couch and executes the predefined view to retrieve the oldest Book title held in the database.

CouchDB is a large subject in and of itself, but this should give you a sense, at a high level, of how the Lift implementation operates.

MONGODB

MongoDB, like CouchDB, is a document-oriented store, but rather than using prewritten views to obtain query data, Mongo is better suited to creating dynamic queries, similar to what you might construct using traditional SQL. Mongo uses a custom query syntax for this; MapReduce is also supported, but it’s intended for data aggregation rather than general-purpose querying.

Mongo uses a custom binary protocol to communicate from your application to the data store, which generally yields a more flexible programming interface than is possible with HTTP. In addition, MongoDB positions itself as being a general-purpose NoSQL database that was designed from the ground up for use in internet applications.

Unlike the CouchDB implementation, the Mongo support in Lift comes in two parts: lift-mongodb provides a thin Scala wrapper around the MongoDB driver for Java, and lift-mongodb-record provides the integration for using Record with Mongo.

To get started, ensure you’ve added the dependency to your project and called update from the SBT shell:

val mongo = "net.liftweb" %% "lift-mongodb-record" % liftVersion

By default, Lift assumes that the MongoDB server is configured on the same machine (localhost), so for development and testing, it’s likely you’ll need no configuration in your Boot class. But if you need to specify where your Mongo installation is hosted, simply add the following lines:


import net.liftweb.mongodb.{MongoDB, DefaultMongoIdentifier, MongoAddress, MongoHost}

MongoDB.defineDb(

DefaultMongoIdentifier,

MongoAddress(MongoHost("localhost", 27017), "your_db"))

The call to MongoDB.defineDb essentially tells the MongoDB driver where to locate the MongoDB server. The following examples, however, assume that the MongoDB install is the default, local install.
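If the server location differs between environments, one option is to read it from Lift’s Props rather than hardcoding it. The following is a minimal sketch intended for the body of Boot.boot; the property names mongo.host, mongo.port, and mongo.db are hypothetical, and the fallbacks simply mirror the local defaults used in this section.

```scala
import net.liftweb.mongodb.{MongoDB, DefaultMongoIdentifier, MongoAddress, MongoHost}
import net.liftweb.util.Props

// Hypothetical property names; each falls back to the local default.
val mongoHost = Props.get("mongo.host", "localhost")
val mongoPort = Props.getInt("mongo.port", 27017)
val mongoDb   = Props.get("mongo.db", "your_db")

// Same call as shown above, with the values taken from the props file.
MongoDB.defineDb(
  DefaultMongoIdentifier,
  MongoAddress(MongoHost(mongoHost, mongoPort), mongoDb))
```

With no matching entries in the props file, this behaves the same as the explicit localhost configuration.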

Now that the connection is ready, the next thing is to define your Mongo Record. The next listing shows the most basic example.

Listing 11.7 Basic implementation of Mongo Record

import net.liftweb.record.field._
import net.liftweb.mongodb.record.{MongoRecord, MongoMetaRecord, MongoId}

object Book extends Book with MongoMetaRecord[Book]

// Extend the Mongo record types
class Book private () extends MongoRecord[Book] with MongoId[Book] {
  def meta = Book
  object title extends StringField(this, "")
  object publishedInYear extends IntField(this, 1990)
}

This is nearly identical to the CouchDB and Squeryl examples previously listed, with the main change being the two supertypes, which are now MongoRecord and MongoMetaRecord. MongoRecord supports the specialized querying for the backend store, just as CouchRecord does.

MongoDB deals with collections. These collections can be thought of as similar to tables, and each MongoDB Record entity you create generally represents a collection. By default, the collection will use the pluralized name of the class, which is Books in this instance. Each document in the collection will be represented by a Book entity instance.

Let’s assume you want to run a couple of queries:

import net.liftweb.json.JsonDSL._

Book.findAll("title" -> "Lift in Action")
Book.findAll("publishedInYear" -> ("$gte" -> 2005))
Book.findAll("$where" -> "function() { return this.publishedInYear == 2011; }")

There are three different queries here. The first one should be fairly self-explanatory: Mongo will go looking for titles that match “Lift in Action”. The second line defines a range query that will retrieve all documents where publishedInYear is greater than or equal to 2005. Finally, the last line makes use of the special MongoDB query construct $where and passes a JavaScript function to confine the result set. Mongo has a whole set of these special identifiers, documented at http://www.mongodb.org/display/DOCS/Advanced+Queries, and by using the Lift abstraction, you can use whatever combinations you prefer.
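To see what the JsonDSL pairs in these queries turn into, it can help to render them as JSON text before handing them to findAll. This is a minimal sketch, assuming lift-json is on the classpath (it should arrive transitively with lift-mongodb-record) and the Lift 2.x compact(render(...)) style of rendering; the value names are purely illustrative.

```scala
import net.liftweb.json._
import net.liftweb.json.JsonDSL._

// The same pairs passed to Book.findAll, typed explicitly as JObject
// so that the JsonDSL implicit conversions are applied.
val byTitle: JObject = ("title" -> "Lift in Action")
val fromYear: JObject = ("publishedInYear" -> ("$gte" -> 2005))

// Render each query document as compact JSON text.
val byTitleJson = compact(render(byTitle))
// {"title":"Lift in Action"}
val fromYearJson = compact(render(fromYear))
// {"publishedInYear":{"$gte":2005}}
```

Printing a query this way is a quick check that nested pairs such as the $gte clause have the structure Mongo expects.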

That’s the basics of using NoSQL with Record. Irrespective of these two different stores, you can see how Record brings a degree of uniformity that makes it smoother to change your backing store at a later date and also interoperate with other Lift infrastructure. Let’s take the information from this section and re-implement the bookstore from earlier in the chapter with MongoDB.

11.3.2 Bookstore with MongoDB

NoSQL solutions have a rather different way of handling their data, and in many respects this significantly alters the way we as developers need to model our entities. Specifically with MongoDB, it’s more idiomatic to store information using embedded documents that appear as collections on a given entity, provided that the data rarely changes. In practice, the data is just copied into each document. Sometimes having a reference is beneficial, but it depends on your use case.

With the Book, Publisher, and Author relationships, the Book entity will really be the main interaction point because once a Book has a Publisher, it’s largely immutable, and the same is true of Author. With this in mind, it isn’t a problem to simply embed the appropriate Publisher and Author documents so that they appear as properties of the Book entity.

TIP When using Mongo, a general rule of thumb is that you embed and copy data when it seems reasonable, and fall back to referencing separate entities when the use case demands it. Generally speaking, try to arrange your Mongo entities with the most commonly accessed aspect being the top level, and other aspects being either embedded documents or, in lesser cases, referenced entities. The classic scenario is a single blog post that has many comments; the comments are appended directly to the post entity document.
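The difference between embedding and referencing can be sketched with plain Scala case classes standing in for documents. Nothing here is Lift or MongoDB API; Post, Comment, PostWithAuthorRef, and the authorsById lookup are hypothetical names used only to show the two shapes.

```scala
final case class Comment(author: String, text: String)

// Embedded: the comments are stored inside the post document itself.
final case class Post(title: String, comments: List[Comment] = Nil) {
  def addComment(c: Comment): Post = copy(comments = comments :+ c)
}

// Referenced: the document stores only the key of a separate entity.
final case class PostWithAuthorRef(title: String, authorId: String)

// Reading an embedded post needs no further lookups.
val post = Post("Hello Mongo")
  .addComment(Comment("alice", "First!"))
  .addComment(Comment("bob", "Nice post"))

// Reading a referenced entity costs a second lookup by key.
val authorsById: Map[String, String] = Map("a1" -> "Tim")
val referenced = PostWithAuthorRef("Hello Mongo", "a1")
val resolvedAuthor: Option[String] = authorsById.get(referenced.authorId)
```

The embedded form keeps everything needed to display the post in one document, which is why it suits data that is read together and rarely changes.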

Let’s add those two additional fields for Publisher and Author to the Book record, as shown in the next listing.

Listing 11.8 The full Book entity using MongoRecord

import net.liftweb.record.field._
import net.liftweb.mongodb.{JsonObject, JsonObjectMeta}
import net.liftweb.mongodb.record.{MongoRecord, MongoMetaRecord, MongoId}
import net.liftweb.mongodb.record.field._

class Book private () extends MongoRecord[Book] with MongoId[Book] {
  def meta = Book
  object title extends StringField(this, "")
  object publishedInYear extends IntField(this, 1990)
  // Embedded publisher
  object publisher extends JsonObjectField[Book, Publisher](this, Publisher) {
    def defaultValue = Publisher("", "")
  }
  // Embedded authors list
  object authors extends MongoJsonObjectListField[Book, Author](this, Author)
}

object Book extends Book with MongoMetaRecord[Book]

// Publisher definition
case class Publisher(name: String, description: String) extends JsonObject[Publisher] {
  def meta = Publisher
}
object Publisher extends JsonObjectMeta[Publisher]

// Author definition
case class Author(firstName: String, lastName: String) extends JsonObject[Author] {
  def meta = Author
}
object Author extends JsonObjectMeta[Author]

There’s a fair amount going on here, over and above the initial implementation in listing 11.7. First, notice the publisher object. This inner object extends JsonObjectField, which essentially means it holds a nested Mongo document. In this particular case, the field is told that it should expect the Publisher type. The Publisher definition itself is a simple case class that extends JsonObject and has a companion object that extends JsonObjectMeta.

The same is true for the Author class, but because a single Book could feasibly have multiple authors, the entity property authors extends MongoJsonObjectListField. As you might imagine, this contains a list of documents, as opposed to the single document required by publisher, so in practice you can think of this field as a simple list or array of documents.

Now that you have the MongoRecord for Book in place, you can start to play around with constructing and querying instances of Book:

scala> import sample.model.mongo._
import sample.model.mongo._

scala> import net.liftweb.json.JsonDSL._
import net.liftweb.json.JsonDSL._

scala> Book.createRecord.title("sample").authors(List(Author("tim", "perrett"))).publisher(Publisher("Manning", "")).save
res2: sample.model.mongo.Book = class sample.model.mongo.Book={...}

scala> Book.findAll
res3: List[sample.model.mongo.Book] = List(...)

scala> Book.find("title" -> "Lift in Action")
res21: net.liftweb.common.Box[sample.model.mongo.Book] = ...

You can see in this code snippet that it’s easy to query Mongo for specific data in a simple case, such as finding a book by a title, but when you have larger, more complex queries, the syntax can become rather unwieldy. It’s at this point that it would be great to add some more type-safety to the querying, as opposed to passing everything around as strings. This is where the type-safe Rogue DSL comes in.


If you’d like to use Rogue, be sure to add the dependency to your project definition and run update from the SBT shell:

val rogue = "com.foursquare" %% "rogue" % "1.0.2"

When Rogue is present in your project, you can create queries simply by adding the following import statement:

import com.foursquare.rogue.Rogue._

This then allows you to interact with Mongo using the DSL:

Book where (_.publishedInYear gte 1990) fetch()

Book where (_.title eqs "Lift in Action") limit(1) fetch()

This is the tip of the iceberg, and the abstraction can do a whole set of other things that are somewhat out of the scope of this section. If you’d like to know more about Rogue, check out the Foursquare engineering blog (http://engineering.foursquare.com/), and particularly the entry on Rogue and type safety (http://mng.bz/R58g), or the Foursquare repository on github.com: https://github.com/foursquare/rogue.

In this section, you’ve seen the NoSQL support that Lift provides out of the box through the Record abstraction. NoSQL through Record could feasibly take many forms, but this section has primarily focused on CouchDB and MongoDB, showing you how to leverage these exciting new technologies and still have the familiar Lift semantics and integration with infrastructure like LiftScreen.
