Schema Basics
Before exploring the more advanced schemas in this book it’s important to revisit schema basics. In this chapter we will explore the basic relationships from traditional relational databases and how they relate to the document model in MongoDB.
We will start with a look at the One-To-One (1:1) relationship, then move on to the One-To-Many (1:N), and finally the Many-To-Many (N:M).
One-To-One (1:1)
The 1:1 relationship describes a relationship between two entities in which each instance of one is tied to exactly one instance of the other. In this case a User has a single Address: a User lives at a single Address, and an Address houses only a single User.
The 1:1 relationship can be modeled in two ways in MongoDB. The first is to embed the related data as a sub-document; the second is to link to a document in a separate collection. Let’s look at both ways of modeling the one-to-one relationship using two documents, a User and an Address.
Model
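As a rough illustration, the two documents might look like the sketch below. The field names and values are hypothetical placeholders, written as plain Python dictionaries rather than taken from any real data set.

```python
# Hypothetical User document (illustrative fields only).
user = {"name": "Ada Example", "age": 30}

# Hypothetical Address document (illustrative fields only).
address = {"street": "1 Sample Street", "city": "Exampleville"}
```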
Embedding
The first approach is simply to embed the Address document in the User document.
The strength of embedding the Address document directly in the User document is that we can retrieve the user and its address in a single read operation, instead of first reading the user document and then the address document for that user. Since an address has a strong affinity to its user, embedding makes sense here.
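A minimal sketch of the embedded form, assuming hypothetical field names and using a plain Python dictionary in place of a stored document:

```python
# Hypothetical User document with the Address embedded as a sub-document.
user = {
    "name": "Ada Example",
    "age": 30,
    "address": {"street": "1 Sample Street", "city": "Exampleville"},
}

# Reading the user also yields the address; no second read is needed.
city = user["address"]["city"]
```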
Linking
The second approach is to link the Address and User documents using a foreign key.
This is similar to how traditional relational databases would store the data. It is important to note that MongoDB does not enforce any foreign key constraints so the relation only exists as part of the application level schema.
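The linked form can be sketched with two in-memory lists standing in for collections. The `user_id` field, the `find_one` helper, and all values are assumptions made for illustration. The helper only imitates an equality lookup and is not a driver API.

```python
# In-memory stand-ins for two collections (hypothetical data).
users = [{"_id": 1, "name": "Ada Example", "age": 30}]
addresses = [
    {"_id": 10, "user_id": 1, "street": "1 Sample Street", "city": "Exampleville"},
]

def find_one(collection, **criteria):
    """Return the first document whose fields equal all criteria, else None."""
    for doc in collection:
        if all(doc.get(key) == value for key, value in criteria.items()):
            return doc
    return None

# Two reads: first the user, then the address that references it.
user = find_one(users, name="Ada Example")
address = find_one(addresses, user_id=user["_id"])

# Nothing prevents an address from pointing at a user that does not exist;
# the application has to detect such dangling references itself.
addresses.append({"_id": 11, "user_id": 999, "street": "Nowhere", "city": "Void"})
orphan_owner = find_one(users, _id=999)
```

The last lines show the consequence of unenforced foreign keys: the second address is stored without complaint even though no matching user exists.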
One-To-Many (1:N)
The 1:N relationship describes a relationship where one side can be related to many entities on the other side, while each of those entities is related back to only one. An example is a Blog, where a Blog might have many Comments but a Comment is related to only a single Blog.
The 1:N relationship can be modeled in several different ways using MongoDB. In this chapter we will explore three of them: embedding, linking, and a bucketing strategy that is useful for cases like time series. Let’s use the model of a Blog Post and its Comments.
Model
A Blog Post is a single document that describes one specific blog post.
For each Blog Post we can have one or more Comments.
Embedding
The first approach is to embed the Comments in the Blog Post.
Embedding the comments in the Blog Post means we can easily retrieve all the comments belonging to a particular Blog Post. Adding a new comment is as simple as appending the new comment document to the end of the comments array.
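A small sketch of the embedded layout, again using a plain dictionary with hypothetical fields. Against a real database, the append would typically be expressed as an update with the `$push` operator. Here it is only imitated with a list append.

```python
# Hypothetical Blog Post document with its comments embedded as an array.
blog_post = {
    "title": "An example post",
    "url": "/posts/example",
    "text": "Body of the post.",
    "comments": [
        {"name": "Reader One", "comment": "First comment."},
    ],
}

# Adding a comment appends it to the end of the embedded array.
new_comment = {"name": "Reader Two", "comment": "Second comment."}
blog_post["comments"].append(new_comment)

# One read of the post returns every comment along with it.
all_comments = blog_post["comments"]
```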
However, there are three potential problems with this approach that one should be aware of.
- The first is that the comments array might grow larger than the maximum document size of 16 MB.
- The second relates to write performance. As comments get added to a Blog Post over time, it becomes hard for MongoDB to predict the correct document padding to apply when a new document is created. MongoDB would need to allocate new space for the growing document. In addition, it would have to copy the document to the new memory location and update all indexes. This could cause a lot more IO load and could impact overall write performance.
- The third problem is exposed when one tries to paginate the comments. There is no way to limit the comments returned from the Blog Post using normal finds, so we have to retrieve the whole Blog Post document and filter the comments in our application.
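The application-side filtering described in the third point might look like the sketch below. The `comments_page` helper, the zero-based page numbering, and the data are all assumptions for illustration.

```python
# Hypothetical Blog Post with 100 embedded comments (illustrative data only).
blog_post = {
    "title": "An example post",
    "comments": [{"comment": f"Comment {i}"} for i in range(1, 101)],
}

def comments_page(post, page, page_size):
    """Slice one zero-based page out of a post that was already read in full."""
    start = page * page_size
    return post["comments"][start:start + page_size]

# All 100 comments had to be loaded just to show 10 of them.
page = comments_page(blog_post, page=2, page_size=10)
```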
Linking
The second approach is to link comments to the Blog Post using a more traditional foreign key.
An advantage of this model is that additional comments will not grow the original Blog Post document, making it less likely that the application will run into the maximum document size of 16 MB. It’s also much easier to return paginated comments, as the application can slice and dice the comments more easily. On the downside, if we have 1000 comments on a blog post, we need to retrieve all 1000 documents to show them all, causing a lot of reads from the database.
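A sketch of the linked layout, with a list standing in for a comments collection. The `blog_post_id` field and the `comments_page` helper are hypothetical. The helper imitates a filtered, sorted, skip-and-limit style query in plain Python.

```python
# In-memory stand-in for a comments collection (hypothetical data).
# Each comment carries the _id of the Blog Post it belongs to.
comments = [
    {"_id": i, "blog_post_id": 1, "comment": f"Comment {i}"} for i in range(1, 8)
]
comments.append({"_id": 100, "blog_post_id": 2, "comment": "On another post"})

def comments_page(collection, blog_post_id, page, page_size):
    """Return one zero-based page of a post's comments, ordered by _id."""
    matching = [c for c in collection if c["blog_post_id"] == blog_post_id]
    matching.sort(key=lambda c: c["_id"])
    start = page * page_size
    return matching[start:start + page_size]

# Only the requested page of comment documents reaches the caller.
second_page = comments_page(comments, blog_post_id=1, page=1, page_size=3)
```

Each comment is its own document here, so showing every comment on a busy post means reading one document per comment.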
Bucketing
The third approach is a hybrid of the two above. Basically, it tries to balance the rigidity of the embedding strategy with the flexibility of the linking strategy. For this example, we will split the comments into buckets with a maximum of 50 comments in each bucket.
The main benefit of using buckets in this case is that we can perform a single read to fetch 50 comments at a time, allowing for efficient pagination.
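The bucketing idea can be sketched as follows. The bucket fields (`page`, `count`, `comments`) and both helper functions are assumptions for illustration, not a prescribed layout. Only the limit of 50 comments per bucket comes from the text.

```python
BUCKET_SIZE = 50  # maximum number of comments per bucket, as in the text

def add_comment(buckets, blog_post_id, comment):
    """Append to the post's newest bucket, opening a new bucket when it is full."""
    post_buckets = [b for b in buckets if b["blog_post_id"] == blog_post_id]
    if post_buckets and post_buckets[-1]["count"] < BUCKET_SIZE:
        bucket = post_buckets[-1]
    else:
        bucket = {
            "blog_post_id": blog_post_id,
            "page": len(post_buckets) + 1,
            "count": 0,
            "comments": [],
        }
        buckets.append(bucket)
    bucket["comments"].append(comment)
    bucket["count"] += 1

def get_page(buckets, blog_post_id, page):
    """Fetch one bucket, that is, up to BUCKET_SIZE comments in a single read."""
    for bucket in buckets:
        if bucket["blog_post_id"] == blog_post_id and bucket["page"] == page:
            return bucket["comments"]
    return []

# In-memory stand-in for a buckets collection, filled with 120 comments.
buckets = []
for i in range(120):
    add_comment(buckets, 1, {"comment": f"Comment {i}"})
```

With 120 comments this yields three buckets holding 50, 50, and 20 comments, so a page of comments maps to exactly one bucket document.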
Many-To-Many (N:M)
An N:M relationship is a relationship between two entities where each side can be related to many entities on the other side. An example might be a Book that was written by many Authors. At the same time an Author might have written many Books.
N:M relationships are modeled in a relational database by using a join table. A good example is the relationship between books and authors.
- An author might have authored multiple books (1:N).
- A book might have multiple authors (1:M).
This leads to an N:M relationship between authors and books. Let’s look at how this can be modeled.
Two Way Embedding
Embedding the books in an Author document
Model
In Two Way Embedding we will include the Book foreign keys under the books field.
Mirroring the Author document, for each Book we include the Author foreign keys under the Author field.
Queries
Either direction takes two queries. The first finds the author or the book, and the second performs an $in query to find the related books or authors.
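A sketch of the two-way layout and its two-step lookups, using lists as stand-ins for the two collections. The field names (`books`, `authors`), the data, and the `find_one` and `find_in` helpers are hypothetical. `find_in` only imitates what an `$in` match on `_id` would return.

```python
# In-memory stand-ins for two collections (hypothetical data).
authors = [
    {"_id": 1, "name": "Author One", "books": [1, 2]},
    {"_id": 2, "name": "Author Two", "books": [2]},
]
books = [
    {"_id": 1, "title": "Book A", "authors": [1]},
    {"_id": 2, "title": "Book B", "authors": [1, 2]},
]

def find_one(collection, **criteria):
    """Return the first document whose fields equal all criteria, else None."""
    for doc in collection:
        if all(doc.get(key) == value for key, value in criteria.items()):
            return doc
    return None

def find_in(collection, ids):
    """Imitate an `$in` match on `_id`: every document whose _id is in ids."""
    wanted = set(ids)
    return [doc for doc in collection if doc["_id"] in wanted]

# Author -> books: find the author, then fetch the referenced books.
author = find_one(authors, name="Author One")
books_by_author = find_in(books, author["books"])

# Book -> authors: find the book, then fetch the referenced authors.
book = find_one(books, title="Book B")
authors_of_book = find_in(authors, book["authors"])
```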
One Way Embedding
The One Way Embedding strategy optimizes the read performance of an N:M relationship by embedding the references in only one side of the relationship. An example might be where each book belongs to a few categories, but a couple of categories have many books. Let’s look at an example, pulling the categories out into a separate document.
Model
The reason we choose to embed all the references to categories in the books is that there are a lot more books in the drama category than there are categories in a book. If one embeds the books in the category document, it’s easy to foresee that one could break the 16 MB maximum document size for certain broad categories.
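A sketch of the one-way layout, where only the books hold references. Apart from the drama category mentioned above, the field names and data are hypothetical, and plain list comprehensions stand in for the queries.

```python
# In-memory stand-ins for two collections (hypothetical data).
# Category documents stay small: they hold no list of books.
categories = [{"_id": 1, "name": "drama"}, {"_id": 2, "name": "crime"}]

# Each book embeds the ids of the few categories it belongs to.
books = [
    {"_id": 1, "title": "Book A", "categories": [1, 2]},
    {"_id": 2, "title": "Book B", "categories": [1]},
    {"_id": 3, "title": "Book C", "categories": [2]},
]

# Book -> categories: read the book, then fetch the referenced categories.
book = next(b for b in books if b["title"] == "Book A")
book_categories = [c["name"] for c in categories if c["_id"] in book["categories"]]

# Category -> books: read the category, then match books whose array holds its id.
drama = next(c for c in categories if c["name"] == "drama")
drama_books = [b["title"] for b in books if drama["_id"] in b["categories"]]
```

However many books join the drama category, the category document itself never grows; only each book's short categories array is affected.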