Compose Compare: MongoDB vs. PostgreSQL

Published

Compose offers hosted services that conquer the data layer of your applications, but choosing the right components for your data layer can be a challenge. Compose Compare pits two of these components together to help you determine which style best fits your needs.

It can be difficult for developers to keep up with the latest advancements in database technology. It's tempting for developers to continually use the same databases for every project because of familiarity, despite there being a system more suited to that project's data needs.

In this Compose Compare, we'll take a look to two of our most popular offerings, MongoDB and PostgreSQL, and help you decide which one would be a good fit for your project.

MongoDB

MongoDB is a widely deployed open-source document storage database (colloquially called a NoSQL database). It stores data in the form of a binary-encoded version of the JSON format called BSON, which allows for the efficient storage and transmission of JSON data structures.

Organization of Data

MongoDB stores data in Documents, which are then organized into Collections. Documents are stored in collections similarly to how files are stored in folders in the traditional file system analogy, and the same way Records are stored in Tables in a relational database. Databases like MongoDB are also called document databases because of this abstraction.

MongoDB Document Collections

Connecting to MongoDB

MongoDB uses an open-source wire protocol over TCP / IP to connect to the database, so you'll want to use native drivers for your platform when connecting to a MongoDB deployment. The mongos protocol specifically addresses using SSL with this wire protocol, and all connections to Compose MongoDB use this secure layer.

MongoDB has a command-line utility called the mongo shell which allows you to execute commands using a REPL interface. It comes bundled with every MongoDB deployment, and runs on Mac OSX, Windows, and Linux platforms.

When you create a MongoDB deployment on Compose, the Connection Info section of the Overview page has a command line section you can use to connect to your deployment via the mongo shell.

MongoDB Connection Info

mongo --ssl --sslAllowInvalidCertificates aws-us-east-1-portal.23.dblayer.com:17249/admin -u <username> -p <password>  

Managing Data

MongoDB is a class of databases informally referred to as NoSQL, which loosely means that it doesn't use SQL to retrieve or manage data. Instead, MongoDB uses function calls that you can send to the database server through an open connection. For example, once you have a connection established you can create a new database by calling the use syntax:

use dbname;  

In the Mongo Shell, this creates a new database if none existed before, and creates a db variable which references that database. From that point on, all functions are executed on the db reference.

Some examples of functions you can call on this db reference are below:

// Find all documents in a collection
db.collection_name.find();

// Search for documents matching a specific criteria
db.collection_name.find({"name": "John"});

// Add an index to a field
db.collection_name.ensureIndex({"name": 1 });

You can find additional information in the getting started guide.

Relationships Between Entities

Documents in MongoDB don't have relationships the way that relational databases do. Documents that have a one-to-one relationship with other documents are typically just embedded into the parent document directly. For example, if we have a program containing "student" and "primary residence" objects, we would store the primary residence object within the student object using standard JSON embedded object syntax:

{
   "firstName": "John",
   "lastName": "Doe",
   "ssn": 123456789,
   "primaryResidence": {
      "address": "123 Main Street",
      "city": "Anywhereville",
      "state": "CA",
      "zip" : 12345-6789
   }
}

We can then query for users in a specific city by using the dot syntax with the find() function:

db.students.find({"primaryResidence.city": "Anywhereville" });  

For many-to-one relationships, we can also embed arrays of objects into our document, as with a phoneNumbers field:

{
   "firstName": "John",
   "lastName": "Doe",
...
   "phoneNumbers": [
       {
          "type": "home",
          "number": 123456789
       }
   ]
}

Many-to-many relationships are modeled using ObjectID references, but there is no standard way of enforcing these relationships - ie: there are no "foreign key" constraints that can be automatically applied to MongoDB data.

When is it a good fit?

MongoDB is a good fit when you have data that changes frequently or data where the schema is unknown. There is no data migration required for documents in a collection, and collections can contain documents with varying fields. This makes MongoDB ideal for prototyping and for environments where there are large amounts of heterogeneous data with simple and pre-defined relationships between entities.

Since documents are stored in JSON format, MongoDB is a natural fit with teams using Javascript since it essentially serves as an object database for Javascript developers.

When is it not a good fit?

In environments where data is highly structured and highly relational, MongoDB may not be an ideal choice. Since data in a collection is heterogeneous, queries across multiple fields that have highly complex relationships can take much longer and require much more RAM than similar queries in a relational database. Typically, if your data is highly structured and unlikely to change, or if you plan on querying across many different entities with complex relationships, then you'll want to choose a relational database such as PostgreSQL.

PostgreSQL

PostgreSQL is an open-source relational database, sometimes referred to as a SQL database. While it also has features for storing unstructured documents, PostgreSQL is typically used for storing and querying highly structured, relational data.

Organization of Data

Being a traditional relational database, data in PostgreSQL is organized into tables and records. Within a table, each record must have exactly the same properties with the same data types for each field. Developers can further add constraints to these fields to require the data in a record to conform to a set of rules. These rules are enforced at the time a record is inserted into the database, and insertion will fail if the data in the record fails those constraints.

Data in a PostgreSQL database can be visualized as a spreadsheet, with columns representing the field names and rows representing each record of data.

PostgreSQL Data

Connecting to PostgreSQL

PostgreSQL uses the PostgreSQL wire protocol over TCP / IP to connect to the database, so you'll want to use native drivers for your platform when connecting to a PostgresSQL deployment. Version 3 of the protocol includes support for SSL encryption and all connections to PostgreSQL on Compose use SSL connections.

PostgreSQL has a command-line utility called psql which allows you to execute specific commands from the command line. It comes bundled with every PostgreSQL deployment and runs on Mac OSX, Windows, and Linux platforms. There are also a large number of other third-party tools available to PostgreSQL developers.

When you create a PostgreSQL deployment on Compose, the Connection Info section of the Overview page has a Command Line section you can to connect to your deployment via the psql shell. You can reveal your default connection credentials by clicking on the Show/Change button in the Credentials section.

PostgreSQL credentials

psql "sslmode=require host=aws-us-east-1-portal.27.dblayer.com port=23012 dbname=compose user=[username]"  

Managing Data

PostgreSQL is a class of databases informally referred to as SQL, which loosely means that it uses a database-specific variant of the Structured Query Language (SQL) to retrieve or manage data. SQL is the primary way of interacting with PostgreSQL, and all data functions, from creating new tables and records to adding constraints and updating indexes, are performed by an SQL command.

Here are some examples of some common commands that can be run on PostgreSQL:

# Create a users table
CREATE TABLE employees (  
    id SERIAL NOT NULL PRIMARY KEY, 
    firstname VARCHAR(255),
    lastname VARCHAR(255),
    jobtitle VARCHAR(255),
    age FLOAT8,
);

# Get a row from the database that matches a criteria.
SELECT * FROM employees WHERE firstname = "John"

# Insert a record into a database
INSERT INTO employees (firstname, lastname, jobtitle) VALUES ('John', 'OConnor', 'Technical Content Creator');  

You can find additional information in the SQL tutorial on the PostgreSQL website. Keep in mind that, while SQL is an ANSI standard and variants of SQL share keywords and basic language constructs, additional language features are often tightly coupled to the specific database they were written for. Make sure you're reading the PostgreSQL documentation specifically, as SQL for other databases (such as MySQL or SQLite) may not work the same.

Relationships Between Entities

PostgreSQL has a robust set of commands for handling the relationships between entities, with the main feature being constraints.

One-to-one relationships are modeled as a field in one table referencing the other, and vice-versa. Since the records in both tables have a single ID field that can reference the other, this guarantees that one entity is only ever associated with that other entity. A foreign-key constraint can be added to the field for both of the tables, ensuring that a foreign-key field in one record refers to the ID of another.

Here's an example in spreadsheet form:
One-To-One Join

One-to-many relationships, where a single record in one table is referenced in multiple records of another, can be modeled by having a foreign-key field in one of the two related tables, with the many side of the relationship containing a foreign key for the one side of the relationship. Since the ID field in the many side is tied specifically to a record in the one side, the foreign key constraint is applied to that field.

You'll want to perform queries on the many table to manage the one-to-many relationship between the records.

One-To-Many Join

When multiple records in one table need to refer to multiple records in another table in a many-to-many relationship, a join table is the best way to go. A join table contains two fields, one for each record you'd like to join.

Join Table

When you want to query a many-to-many, you'll use the join table to manage the relationships between records. Here's an example of a query that uses a join table to associate the data together:

SELECT  
    e.name AS employee
    m.name AS manager
FROM  
    employees e 
    INNER JOIN managers m
    ON e.manager_id = m.id

When is it a good fit?

PostgreSQL is a great fit for data that is highly structured in nature, and that has a lot of relationships that may change over time. It's easy to create and manage new relationships in PostgreSQL, so anytime you have a lot of data with highly complex relationships you should check out PostgreSQL.

PostgreSQL also provides guarantees about how data is inserted into the database, so if you want data to conform to a pre-defined standard and want to guarantee that any data in the database will match that standard, PostgreSQL is a solid choice.

PostgreSQL also has two programming features called triggers and stored procedures that allow some degree of programming to happen in the database itself. Triggers are events that can be generated when various database actions occur, such as when a document is inserted or validate. Stored Procedures are functions written in a procedural language that can be executed an run in the database when various triggers occur. PostgreSQL supports four procedural languages by default and extensions can be added to include other procedural languages as well. This functionality can also be a factor in choose PostgreSQL.

When is it not a good fit?

If your data structures are constantly changing, or if you're trying to find a data design that works well in multiple situations, the rigidity and structure of PostgreSQL may become frustrating. Each change in data structure requires the developer to execute a series of SQL calls which migrate old data so it conforms to the new structure. This can be very tedious when data structures change frequently and may indicate that PostgreSQL isn't the right choice.

Data stored and retrieved in PostgreSQL usually needs to be translated from the relational structure to objects in each programming language via Object-Relational Mapping. Whether this is done manually or through an automated ORM library, this translation can be time-consuming to convert. If the mapping between objects and tables is complex and your data would fit better in an Object database, you may want to consider using a document store.

Wrapping Up

When you're starting out on a project, choosing the right data layer makes a huge difference. If you're building prototypes or need a high degree of flexibility, MongoDB is a great choice. If you have highly relational data that offers guarantees about structure and consistency, PostgreSQL is a great option.

Whether your needs skew more toward NoSQL, or whether Relational databases seem more appropriate, Compose has hosted solutions that will ensure your data layer is available when you need it. We'll explore more comparisons in future articles in this series so you can make the right choice for your project.

John O'Connor
John O'Connor is a code junky, educator, and amateur dad that loves letting the smoke out of gadgets, turning caffeine into code, and writing about it all. Love this article? Head over to John O'Connor’s author page to keep reading.

Conquer the Data Layer

Spend your time developing apps, not managing databases.