Working with Data Models in MongoDB
This article discusses concepts for data modeling with MongoDB. Data modeling has been around as long as databases have. Typically, data models reflect the objects that your application handles, along with the relationships between those objects.
For example, consider an application that has an object called Car. The Car object is a data model. Another object is a Fan Belt, which is also a data model. However, Fan Belt is also an attribute of the Car object because each Car also has a Fan Belt.
The relationship between the Car and the Fan Belt is a whole-part ("has-a") relationship, known as aggregation or, in its stricter form, composition. It is not a form of inheritance, which models an "is-a" relationship.
Let’s also suppose that there is an object called Garage, and each Car object can be serviced by a Garage object. This type of relationship is referred to as an association.
The Garage is not part of a Car; instead, it services a Car. In a relational database, both kinds of relationship are represented by defining tables for the objects and using primary and foreign keys to define the relationships between them.
When objects are broken down into separate tables in this way, so that each fact is stored only once, the data is said to be normalized. Normalization, however, can cause performance issues because a query that reassembles an object has to join several tables.
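To make the cost of normalization concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and sample values are assumptions made for illustration; they are not taken from any real schema. It stores the Car, Fan Belt, and Garage in separate tables and then needs two joins to put one car back together.

```python
import sqlite3

# In-memory database; all table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE garage (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE car (
        id INTEGER PRIMARY KEY,
        model TEXT NOT NULL,
        garage_id INTEGER REFERENCES garage(id)     -- association: a garage services a car
    );
    CREATE TABLE fan_belt (
        id INTEGER PRIMARY KEY,
        part_number TEXT NOT NULL,
        car_id INTEGER NOT NULL REFERENCES car(id)  -- whole-part: a fan belt belongs to a car
    );
    INSERT INTO garage VALUES (1, 'Main Street Garage');
    INSERT INTO car VALUES (1, 'Sedan', 1);
    INSERT INTO fan_belt VALUES (1, 'FB-100', 1);
""")

# Reassembling one car from the normalized tables takes two joins.
row = conn.execute("""
    SELECT car.model, fan_belt.part_number, garage.name
    FROM car
    JOIN fan_belt ON fan_belt.car_id = car.id
    JOIN garage ON garage.id = car.garage_id
    WHERE car.id = 1
""").fetchone()
print(row)  # ('Sedan', 'FB-100', 'Main Street Garage')
```

With three small tables the joins are cheap; the concern in the text arises as the number of tables and rows grows.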
As you will see in this article, most of the data modeling concepts that apply to relational databases also apply to document-oriented NoSQL databases such as MongoDB. However, there are some key differences as well.
Document Data Structure
Relationships between documents in MongoDB can be represented in two ways. The first way is similar to that of a relational database, in which the ID column of one table is used as the foreign key in another table. This type of data relationship is referred to as a reference in MongoDB terminology.
In a relational database, an example is an Address table with a userID column that holds the ID of a row in the User table. The same idea applies in MongoDB: an Address document stores the _id of the User document it belongs to in its own UserID field.
For example, given the following two documents, you can see where this reference relationship exists:
{ "_id" : ObjectId("54211d7ce182f488849dee96"), "name" : "Jacks Teller", "email": "teller@soa.com"}
{ "_id" : ObjectId("84776d7ce182f488845dee95"), "street" : "123 St.", "city": "Charming", "state": "CA", "UserID": ObjectId("54211d7ce182f488849dee96")}
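The reference relationship above tells you that the User Jacks Teller has an address of 123 St., Charming, CA. It is very similar to a foreign key between two tables in a relational database, with one important difference: MongoDB does not enforce the reference as a constraint, so keeping the UserID value valid is the application's responsibility.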
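Following a reference takes two lookups: one for the user and one for the address that points back to it. The sketch below illustrates this under some loud assumptions: plain Python lists stand in for the two collections, plain strings stand in for ObjectId values, and find_one is a hypothetical helper written here, not a driver function. With a real driver each lookup would be a query sent to the server.

```python
# Plain lists stand in for the User and Address collections (an assumption
# made so the sketch runs without a MongoDB server). String IDs stand in
# for ObjectId values.
users = [
    {"_id": "54211d7ce182f488849dee96", "name": "Jacks Teller", "email": "teller@soa.com"},
]
addresses = [
    {"_id": "84776d7ce182f488845dee95", "street": "123 St.", "city": "Charming",
     "state": "CA", "UserID": "54211d7ce182f488849dee96"},
]

def find_one(collection, **criteria):
    """Hypothetical helper: return the first document matching all criteria, or None."""
    for doc in collection:
        if all(doc.get(key) == value for key, value in criteria.items()):
            return doc
    return None

# Lookup 1: find the user.
user = find_one(users, name="Jacks Teller")
# Lookup 2: find the address whose UserID refers to that user's _id.
address = find_one(addresses, UserID=user["_id"])

print(user["name"], "lives at", address["street"], address["city"], address["state"])

# Nothing stops a reference from pointing at a missing document;
# the lookup simply finds nothing.
print(find_one(addresses, UserID="no-such-id"))
```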
Another way to establish relationships between data within a MongoDB document structure is by embedding documents within documents. The process is referred to as embedding data.
A document in MongoDB is like a record in a relational database because both contain the attributes of a given object, such as a User object. Being able to embed one document within another is a big advantage of a document database over a relational database.
In a relational database, it is not really possible to embed a record within a record, so techniques have been developed to store objects within records in a serialized text format, such as YAML. You can also create parent and child record relationships, but they often require the use of more than one table.
In this sense, relational databases can embed data within records, but that data is not query-ready until the embedded text is converted back to an object within the application (as in the YAML approach).
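The limitation of serialized text columns can be sketched as follows. This example uses JSON from Python's standard library in place of YAML, purely because YAML needs a third-party package; the users table and its address_text column are hypothetical names chosen for illustration.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# The address is stored as serialized text in a single column (hypothetical schema).
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, address_text TEXT)")
address = {"street": "123 St.", "city": "Charming", "state": "CA"}
conn.execute(
    "INSERT INTO users (name, address_text) VALUES (?, ?)",
    ("Jacks Teller", json.dumps(address)),
)

# To an ordinary SQL comparison the column is just an opaque string, so
# filtering by city happens in the application after deserializing each row.
in_charming = []
for name, address_text in conn.execute("SELECT name, address_text FROM users"):
    if json.loads(address_text)["city"] == "Charming":
        in_charming.append(name)

print(in_charming)  # ['Jacks Teller']
```

Every row has to be fetched and parsed before the filter can run, which is the "not query-ready" cost described above.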
Here’s an example that illustrates the embedding data method of creating data relationships within MongoDB:
{ "_id" : ObjectId("54666d7ce182f488849dee96"), "name" : "Jacks Teller", "email": "teller@soa.com", "address": {"street" : "123 St.", "city": "Charming", "state": "CA"}}
In this example, the address is embedded directly into the User document. If you are familiar with how to do document queries in MongoDB, you know this is an advantage because you need to do only one query to get the data for both the user and address.
If you use a reference relationship between user and address, you have to query two documents, which reduces performance (although only marginally in this example).
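The single-lookup behavior of the embedded form can be sketched in the same spirit. Here a plain Python list again stands in for the collection and a string stands in for the ObjectId; get_path is a hypothetical helper that imitates the dotted field paths (such as address.city) that MongoDB queries use to reach into embedded documents.

```python
# The embedded-address user document from the text, with a string in place
# of the ObjectId so the sketch runs without a driver.
user = {
    "_id": "54666d7ce182f488849dee96",
    "name": "Jacks Teller",
    "email": "teller@soa.com",
    "address": {"street": "123 St.", "city": "Charming", "state": "CA"},
}
users = [user]  # stand-in for the collection

def get_path(doc, dotted_path):
    """Hypothetical helper: walk a dotted path such as 'address.city' through nested dicts."""
    value = doc
    for key in dotted_path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value

# One pass over one collection returns the user and the address together.
matches = [doc for doc in users if get_path(doc, "address.city") == "Charming"]
print(matches[0]["name"], matches[0]["address"]["street"])
```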
As your database gets very large, it may be time to denormalize, embedding related data so that fewer queries are needed to assemble an object. However, reference relationships are probably the best design choice overall if very high performance is not a necessity, which is usually the case with small- to medium-size applications and datasets.
You can also see that embedded data closely mirrors the structure of JSON. Internally, MongoDB stores documents as BSON, a binary encoding of JSON-like documents, so a stored document can be treated much like a JSON object.
By embedding documents within documents at several levels, you can represent even complex application objects by storing them in MongoDB as nested documents.
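As a rough illustration of multi-level embedding, the sketch below returns to the Car example from the start of the article. Every field name and value is hypothetical. The Fan Belt is embedded because it is part of the Car, while the Garage is kept as a reference because it is not.

```python
import json

# Hypothetical Car document: field names and values are illustrative only.
car = {
    "model": "Sedan",
    "engine": {
        "cylinders": 4,
        "fan_belt": {"part_number": "FB-100", "length_mm": 1100},  # embedded two levels deep
    },
    "garage_id": "g-001",  # reference: the garage services the car but is not part of it
}

# Nested values are reached by walking down the levels.
print(car["engine"]["fan_belt"]["part_number"])

# json.dumps shows the JSON text form of the same nested structure.
print(json.dumps(car, indent=2))
```

The JSON text is only a convenient way to view or exchange the document; the nested structure itself is what gets stored.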