MongoDB and the Trouble with DBRefs


DBRefs look great on paper, but experience tells us that they don't work anywhere near as well as the name might suggest. Join us as we look at the feature that no one should use and why.

Sometimes a feature just looks too good not to use. That was the case some years back with MongoDB's DBRef fields. Just add these references to other documents in your database and, if your driver supported it, the driver could also retrieve the referenced document automatically. It was almost... relational.

What does a DBRef really do?

Under the covers, there was no relational magic. A DBRef is just three fields wrapped up to hide what they really are: a $ref name for the collection the referenced document lives in, an $id holding that document's _id (typically an ObjectId) and, optionally, a $db database name if the document is in a different database.
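
As a rough illustration of that structure, the sketch below builds the plain sub-document a DBRef boils down to. The helper name makeDbRefDoc is hypothetical, and a string stands in for the ObjectId so the snippet runs without a driver.

```javascript
// Hypothetical helper: builds the plain sub-document behind a DBRef.
// Keys are added in the order $ref, $id, then (optionally) $db.
function makeDbRefDoc(collection, id, database) {
  const ref = { $ref: collection, $id: id };
  if (database !== undefined) {
    ref.$db = database; // only needed when the target lives in another database
  }
  return ref;
}

// A string stands in for the ObjectId here (assumption, for illustration only).
const home = makeDbRefDoc("ships", "5....c81d");
console.log(Object.keys(home)); // [ '$ref', '$id' ]
```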

When a document is retrieved, DBRefs are left as references to those other documents. A driver could theoretically take it upon itself to find all the DBRefs, make a second round trip to fetch each document a DBRef points at, and then merge it in place of the DBRef value. That's hard to manage and pathologically performance-busting behavior, so no driver does that.

Instead, drivers tend to help the user create a query from the DBRef, which the user can then execute to retrieve the referenced document. So, what we have with DBRefs is a flag to a client driver that it could compose a query for the user to retrieve another document.

Remember though that there's another thing that can also retrieve a document on demand: your application. It's not like you can't store an object id yourself - and optionally a collection name if it might be in a different collection - and use that to retrieve a document.
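
A minimal sketch of that do-it-yourself approach might look like the following. The helper fetchReferenced, the field name home_id and the in-memory fakeDb are all hypothetical; the only thing assumed about a real database handle is that it offers collection(name).findOne(filter), as the Node.js driver's Db object does.

```javascript
// Hypothetical helper: resolves a manual reference with one extra query.
// `db` is assumed to expose collection(name).findOne(filter).
async function fetchReferenced(db, collectionName, id) {
  return db.collection(collectionName).findOne({ _id: id });
}

// Minimal in-memory stand-in for a database, for illustration only.
const fakeDb = {
  data: { ships: [{ _id: "ship-1", name: "Babylon 5", size: "station" }] },
  collection(name) {
    const docs = this.data[name] || [];
    return {
      findOne: async (filter) => docs.find((d) => d._id === filter._id) || null,
    };
  },
};

// The crew document stores only the id; the application knows it points at "ships".
const crewMember = { name: "Delenn", home_id: "ship-1" };
fetchReferenced(fakeDb, "ships", crewMember.home_id).then((ship) => {
  console.log(ship.name); // Babylon 5
});
```

The second round trip is still there, but it happens where the application can see it and decide whether it is worth making.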

DBRefs and fragility

DBRefs are functionally only client deep; there's nothing going on in the server that makes them special. They are a standardized coercion of specially named fields into a pseudo-type, and that's pretty much it.

This wouldn't be quite so bad if DBRefs were reliable, and this is where we get to the, well, hacky part. There is no DBRef operator. A DBRef happens when you insert an object with a $ref field, then a $id field and optionally a $db field. We'll demonstrate that now. Let's make a ship, in one collection, which will be home to crew members...

mongos> db.createCollection("ships")  
{ "ok" : 1 }
mongos> db.ships.insertOne({ "name":"Babylon 5", "size":"station" })  
{
    "acknowledged" : true,
    "insertedId" : ObjectId("5....c81d")
}

We've shortened the object ids to make them easier to read. Now let's make a crew collection and insert a crew member with a DBRef to that ship.

mongos> db.createCollection("crew")  
{ "ok" : 1 }
mongos> db.crew.insertOne({ name:"Delenn", "home": { "$ref":"ships", "$id": ObjectId("5....c81d") } })  
{
    "acknowledged" : true,
    "insertedId" : ObjectId("5....c81f")
}

Notice that nothing marks out the DBRef at insert time; it's just some special field names in a particular order. When we go to query it:

mongos> db.crew.findOne()  
{
    "_id" : ObjectId("5....c81f"),
    "name" : "Delenn",
    "home" : DBRef("ships", ObjectId("5....c81d"))
}

It magically becomes a DBRef. Now, let's add another crew member, but this time let's not put the fields in the "right" order:

mongos> db.crew.insertOne({ name:"Sheridan", "home": { "$id": ObjectId("5....c81d"), "$ref":"ships" } })  
{
    "acknowledged" : true,
    "insertedId" : ObjectId("5....c821")
}
mongos> db.crew.find()  
{ "_id" : ObjectId("5....c81f"), "name" : "Delenn", "home" : DBRef("ships", ObjectId("5....c81d")) }
{ "_id" : ObjectId("5....c821"), "name" : "Sheridan", "home" : { "$id" : ObjectId("5....c81d"), "$ref" : "ships" } }

And now, the reference is just two fields with $ prefixed field names. The information is there, but it is no longer presented with a DBRef wrapper. And when the data is being processed, you'll have to work with a mix of DBRef(...) wrappers and { ... } JSON objects.
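
One way to guard against this before inserting is to check the key order of the reference sub-document. The sketch below is only that, a sketch: hasDbRefFieldOrder is a hypothetical helper, it works on plain objects rather than driver types, it assumes no extra fields beyond $ref, $id and $db, and strings again stand in for ObjectIds.

```javascript
// Hypothetical check: does this plain object carry its reference fields in
// the $ref, $id[, $db] order that the shell session above shows as a DBRef?
function hasDbRefFieldOrder(value) {
  if (value === null || typeof value !== "object") return false;
  const keys = Object.keys(value); // string keys keep insertion order
  if (keys[0] !== "$ref" || keys[1] !== "$id") return false;
  return keys.length === 2 || (keys.length === 3 && keys[2] === "$db");
}

const delennHome = { $ref: "ships", $id: "5....c81d" };
const sheridanHome = { $id: "5....c81d", $ref: "ships" };

console.log(hasDbRefFieldOrder(delennHome));   // true
console.log(hasDbRefFieldOrder(sheridanHome)); // false
```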

The DBRef Risk

You may think that this isn't a big risk. All you have to do is rigorously ensure that you always use the right order of fields, which is harder than you think. Field ordering is a fuzzy thing with JSON objects; sometimes the fields get sorted, sometimes they just keep whatever order the unmarshalling operation produced.

This can even affect applications which copy data between databases: they read a JSON document from one database, the unmarshalling and marshalling shuffle some fields around, and before you know it you have those unwrapped DBRef fields sitting in another database.

It's not a disaster, and with a bit of work you can recover and recreate the DBRefs. But it can break aggregations and other operations which rely on the reference being consistent, and there's no real plus to having used DBRefs in the first place.
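
That "bit of work" could be sketched roughly as follows. The helper reorderRefFields and the database name "fleet" are hypothetical, and the snippet only rebuilds the sub-document in memory; writing the repaired value back to the collection is left out.

```javascript
// Hypothetical repair step: rebuild a reference sub-document with its keys
// in $ref, $id, $db order, whatever order they arrived in.
function reorderRefFields(ref) {
  if (!("$ref" in ref) || !("$id" in ref)) {
    throw new Error("not a reference: $ref and $id are both required");
  }
  const fixed = { $ref: ref.$ref, $id: ref.$id };
  if ("$db" in ref) fixed.$db = ref.$db;
  return fixed;
}

// Keys shuffled the way a sort-by-name marshaller might leave them.
const shuffled = { $db: "fleet", $id: "5....c81d", $ref: "ships" };
console.log(JSON.stringify(reorderRefFields(shuffled)));
// {"$ref":"ships","$id":"5....c81d","$db":"fleet"}
```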

Looking up the alternative

And that's why the MongoDB documentation recommends manual references unless your references need to point at documents in multiple collections. The arrival of the $lookup aggregation stage also makes the referred-to document accessible in the pipeline, meaning you can process it as part of a server-side operation. $lookup doesn't work with DBRefs, so that's a big win for manual references.
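
For instance, with a manual reference a pipeline along these lines could pull each crew member's ship into the same result. This is a sketch under assumptions: the field name home_id is hypothetical (the shell examples above store a DBRef in "home" instead), and the pipeline is only built here, not run against a server.

```javascript
// Assumes crew documents store the ship's _id in a field named home_id
// (a hypothetical name chosen for this illustration).
const crewWithShips = [
  {
    $lookup: {
      from: "ships",          // collection to join against
      localField: "home_id",  // field in crew holding the ship's _id
      foreignField: "_id",    // field in ships to match on
      as: "home",             // matches arrive as an array in this field
    },
  },
  { $unwind: "$home" },       // turn the one-element array into a sub-document
];

// In the mongo shell this would be run as: db.crew.aggregate(crewWithShips)
console.log(crewWithShips.length); // 2
```

Note that $unwind, used this way, drops crew members whose home_id matches no ship; leave that stage out if you need to keep them.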

When you do have multiple collections for your references to target, we'd suggest it's still easier to store the object id and collection name in your own object, without the special naming of DBRefs. And that won't break if someone inserts the fields out of order.
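
Such a home-grown reference might be read like this. The field names coll and id and the helper readRef are hypothetical; the point of the sketch is only that ordinary fields are read by name, so their order has no effect.

```javascript
// Hypothetical shape for a home-grown reference: ordinary field names,
// read by name, so key order makes no difference.
function readRef(ref) {
  const { coll, id } = ref;
  if (typeof coll !== "string" || id === undefined) {
    throw new Error("reference needs both coll and id");
  }
  // Returns what the application needs to run its own findOne.
  return { collection: coll, filter: { _id: id } };
}

const inOrder = readRef({ coll: "ships", id: "5....c81d" });
const outOfOrder = readRef({ id: "5....c81d", coll: "ships" });
console.log(
  inOrder.collection === outOfOrder.collection &&
    inOrder.filter._id === outOfOrder.filter._id
); // true
```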

DBRefs haven't been deprecated in MongoDB and the functionality is unlikely to go away, but hopefully you can see there are good reasons to avoid introducing them into new projects.



attribution Giammarco Boscaro

Dj Walker-Morgan
Dj Walker-Morgan was Compose's resident Content Curator, and has been both a developer and writer since Apples came in II flavors and Commodores had Pets.
