Choosing the Right Solution for You - Compose PB&J

If you're new to some of the databases that Compose offers, you might be wondering which ones you should choose for your project. This article provides an overview of Compose offerings so that you can have a better understanding of what each one does and how they fit together.

A Note About SQL vs NoSQL

Before we get started, let's get a couple of things straight. "NoSQL" and "schema-less" have become such buzzwords that it almost feels like you must be doing something wrong if you're not already using one of these kinds of databases. Jumping onto the "schema-less" bandwagon has led some developers to shoehorn data into these databases that would be better suited to an RDBMS (relational database management system) or, because of confusion between types of "NoSQL" models, to use a document store when a simple key/value store would've solved their problem. Beware of the lure of round holes when you've got square pegs. While many of Compose's offerings do indeed fall under these broad categories of databases (and they can be extremely beneficial for many situations), "NoSQL" and "schema-less" are not a one-size-fits-all solution. We touched on this in our article Schema-Less Is (Usually) a Lie. As we go through each of the Compose offerings below, it'll become evident that they're as different from each other as peanut butter is from jelly... And just like peanut butter goes really well with jelly, many of Compose's offerings go together in yummy ways.

Compose Relational Database - PostgreSQL

Compose offers PostgreSQL as a relational database solution. It's a mature, open source database that is ACID-compliant, adheres strongly to the SQL standard, and provides a robust and customizable set of features. For those coming from the world of MS SQL Server, Oracle or IBM DB2, PostgreSQL will make you feel right at home. One of the key benefits of using PostgreSQL instead of one of those proprietary databases is that you can get started quickly and easily without a lengthy contract or licensing fees. At Compose, your databases are provisioned instantly. We've also written an article about what PostgreSQL has over other open-source SQL databases to highlight how PostgreSQL differs from MySQL, MariaDB, and Firebird.

PostgreSQL can be used for any situation that calls for a relational schema or requires transactions. An example is recording purchases where the purchase can be related to the products purchased and to the users doing the purchasing. Relational data is extremely convenient for generating structured business reports as well. At Compose, we use PostgreSQL for account and financial reporting. In our case, we transport data from our MongoDB collections into a structured schema in PostgreSQL to enable easy retrieval and reporting on certain dimensions and metrics in the data. MongoDB is our peanut butter in this situation and PostgreSQL is our jelly.
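To make the purchase example concrete, here is a minimal sketch of a relational schema that links users, products, and purchases and then reports on them with a join. It uses Python's built-in sqlite3 module purely as a stand-in so that it runs without a server. The table names, column names, and sample rows are all hypothetical, and a real PostgreSQL deployment would use its own driver and data types.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical purchases schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE products (
            id INTEGER PRIMARY KEY,
            title TEXT NOT NULL,
            price_cents INTEGER NOT NULL
        );
        CREATE TABLE purchases (
            id INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL REFERENCES users(id),
            product_id INTEGER NOT NULL REFERENCES products(id),
            quantity INTEGER NOT NULL
        );
    """)
    # One transaction: either every row below is recorded or none are.
    with conn:
        conn.execute("INSERT INTO users VALUES (1, 'Joe')")
        conn.executemany(
            "INSERT INTO products VALUES (?, ?, ?)",
            [(1, "Peanut butter", 350), (2, "Jelly", 275)],
        )
        conn.executemany(
            "INSERT INTO purchases (user_id, product_id, quantity) VALUES (?, ?, ?)",
            [(1, 1, 2), (1, 2, 1)],
        )
    return conn


def spend_per_user(conn):
    """Report total spend per user by joining the three tables."""
    return conn.execute("""
        SELECT u.name, SUM(p.quantity * pr.price_cents)
        FROM purchases p
        JOIN users u ON u.id = p.user_id
        JOIN products pr ON pr.id = p.product_id
        GROUP BY u.name
        ORDER BY u.name
    """).fetchall()
```

Because each purchase row only references a user and a product, the same data can be sliced by user, by product, or by both without duplicating anything, which is what makes this shape convenient for reporting.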

Note that PostgreSQL is so robust that it also provides for "schema-less" data storage through its JSON and JSONB data types. So you may be asking whether you even need a "NoSQL" solution if there are SQL databases that handle "schema-less" data. Seems to be some peanut butter here already... In Is PostgreSQL Your Next JSON Database?, we explain why these data types in PostgreSQL do not replace the need for JSON-oriented stores, but having JSON-oriented storage in PostgreSQL is still a useful feature in some circumstances. For those who are new to JSON but feel most comfortable in a SQL database, this can be a good way to get familiar with JSON objects.

Take a look at PostgreSQL on Compose if this database sounds like it will work for your needs.

Compose Document Stores - MongoDB, RethinkDB and Elasticsearch

Compose offers two open source general purpose document stores, MongoDB and RethinkDB. Elasticsearch, while categorized as a document store, is actually a more specialized open source solution, typically used for search as its name implies.

One of the misunderstandings with document stores is that "schema-less" means there is no schema. That's extremely unlikely. There is usually some form of schema inherent in the data. Rather, "schema-less" in these databases means that the schema can be flexible and dynamic as opposed to the static structures of a SQL database. Here's a snippet of a document from our helpdesk ticket system in MongoDB to demonstrate these points:

{ id: 2222,
  type: "customer",
  subject: "Timeout Errors",
  status: "active",
  createdAt: "2014-10-09T14:38:42Z",
  createdBy: { id: 11111,
    firstName: "Joe",
    lastName: "Smith",
    email: "me@mycompany.com",
    type: "customer" },
  body: "Dear Compose Support team,\n\nThanks for following up so quickly. We optimized some queries and that solved our timeout issue.\n\nBest, Joe",
  createdByCustomer: true 
}

Notice that the helpdesk ticket has a fairly well-defined schema of field-value pairs, but also notice that the customer info is nested within one of the elements, which illustrates the flexibility of the structure. In our case, we use MongoDB for this data because helpdesk tickets can have different elements and levels of nesting depending on what they're about: whether it is a message received from the customer or a message we've sent back, an action we've taken or an annotation, and which type of message it is (email vs messages received through our UI vs phone calls). Also, our helpdesk tickets are threaded conversations. We never know how many threads there will be for any one ticket. Having a flexible structure allows us to keep all the data for a ticket in one document so our support team can easily see every message and action taken for the ticket.

One of our Write Stuff articles, written by guest author Rob Ludwig, discusses schema flexibility in MongoDB for further reading on this topic.

MongoDB

As you've probably guessed then, MongoDB is used for its flexibility in storing documents. There will likely be an inherent schema in the data, as we saw above, but the beauty of document stores is that the structure does not have to be pre-defined and imposed on the database beforehand. The documents in MongoDB are expressed in JSON (and stored internally in a binary form, BSON) and arranged in collections within the database. Because MongoDB does not have to split the fields of a JSON document across database fields in a variety of tables the way a relational database would, it is also known for its speed. Data is maintained in replica sets across a server cluster in a MongoDB architecture and can also be sharded as your data needs grow. A large volume of data can be stored and retrieved extremely quickly at the document level with this kind of setup.

If your application requires more speed than a relational database, will operate at the document level, and needs schema flexibility, then MongoDB is a great solution to choose. Some example applications that benefit from MongoDB include content management systems, product catalogs, and threaded conversations (like our helpdesk tickets). Additionally, this flexibility and speed can be taken advantage of during rapid development and prototyping, before a more structured schema is settled on. In that way, MongoDB might be a stepping stone to a relational database once the structure is better defined. Note that MongoDB also has a large and committed developer community for help getting started and additional support. And if you need something more official to show to your boss about why MongoDB is a good choice for your project, Gartner placed MongoDB in the Leaders category of the Operational Database Management Systems (ODBMS) Magic Quadrant for 2015.

As we mentioned above, we use MongoDB not only for our helpdesk ticketing system, but also for other data such as accounts and deployments. Now, because MongoDB is not the right kind of database to use for detailed report building (like our financial and accounts reports), we actually move some of the data we need for those reports over to PostgreSQL and transform it into a relational structure along the way. That way we maintain the structural flexibility that our documents require and the speed we rely on in our apps while also being able to do some advanced querying and report development for back office processes. Peanut butter and jelly.
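As a rough illustration of that kind of transformation, the sketch below flattens a nested ticket document, shaped like the helpdesk snippet earlier, into flat rows suited to relational tables. The function name, the row layout, and the choice of two tables are assumptions made for illustration, not a description of Compose's actual pipeline.

```python
def flatten_ticket(doc):
    """Split one nested ticket document into a ticket row and a user row.

    Hypothetical layout: the nested createdBy object becomes its own row,
    and the ticket row keeps only a reference to it.
    """
    author = doc.get("createdBy", {})
    user_row = {
        "id": author.get("id"),
        "first_name": author.get("firstName"),
        "last_name": author.get("lastName"),
        "email": author.get("email"),
    }
    ticket_row = {
        "id": doc["id"],
        "subject": doc.get("subject"),
        "status": doc.get("status"),
        "created_at": doc.get("createdAt"),
        "created_by_id": author.get("id"),  # foreign key to the user row
    }
    return ticket_row, user_row


# Sample document modeled on the helpdesk ticket shown earlier.
ticket = {
    "id": 2222,
    "subject": "Timeout Errors",
    "status": "active",
    "createdAt": "2014-10-09T14:38:42Z",
    "createdBy": {
        "id": 11111,
        "firstName": "Joe",
        "lastName": "Smith",
        "email": "me@mycompany.com",
    },
}
ticket_row, user_row = flatten_ticket(ticket)
```

Fields that only some documents carry simply come through as None here, which is one place where the fixed relational shape and the flexible document shape have to be reconciled.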

If you're interested in MongoDB, you may actually want to try out MongoDB+ instead. This is our beta offering of MongoDB on a more full-featured platform. Learn more in our article Why You're Going to Like Our New MongoDB Deployments and Going SSL with MongoDB+.

RethinkDB

RethinkDB is also a document store that handles JSON documents (so it's "NoSQL" and "schema-less", too), but it's got some differentiating qualities from MongoDB. One of these is the language developed for use with RethinkDB: ReQL. ReQL is a robust query language that allows you to develop complex queries against your collections and within documents, even allowing for joins across tables (yes, it has the concept of tables!) based on relational data. Because of the richness of ReQL and some aspects of relational models applied against the document store, RethinkDB is the gourmet version of goober mix in the peanut butter and jelly world - it's made for document storage, but with a more structured, even relational, overlay that allows for powerful querying. It's also a distributed solution built for real-time operations using a push mechanism for updated data so it's fast. All of this makes it great for applications that need real-time data that changes frequently such as real-time collaboration tools, multi-player gaming, and auctions.

Note that RethinkDB is the newer kid on the block compared to MongoDB so its developer community and relative popularity are not yet as substantial, but they are growing rapidly.

Read our 2-part series to learn more about real-time operations in RethinkDB or just go ahead and give RethinkDB a try and discover the power for yourself.

Elasticsearch

So, even though Elasticsearch is in the group with MongoDB and RethinkDB, it's not intended to be used as a generalized document store the way the other two are. Elasticsearch has been upfront about some reasons why not, for those of you who are curious. Also, as with all the document stores, transactions are supported only at the document level. If you need something with more nitty-gritty transactional operations then PostgreSQL is the way to go.

Elasticsearch is great for situations where the document is not changing frequently, but needs to be retrieved frequently and quickly, using a variety of methods. That makes it perfect for search. Elasticsearch is a distributed solution built on Apache Lucene, which underlies many web search engines. For documents stored in Elasticsearch, all fields in the documents are automatically indexed so complex queries can be used to retrieve the data. It also provides full-text search that supports auto-complete, contextual search, and multiple languages.
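Lucene-style search rests on an inverted index, which maps each term to the documents that contain it. The toy sketch below builds one over every string field of a couple of sample documents. It is only an illustration of the idea: the function names and sample data are hypothetical, and it does not reflect how Elasticsearch actually tokenizes, scores, or stores data.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for value in doc.values():
            if isinstance(value, str):  # index every string field
                for term in value.lower().split():
                    index[term.strip(".,!?")].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    1: {"subject": "Timeout errors", "body": "Queries time out under load."},
    2: {"subject": "Billing question", "body": "Invoice shows timeout credit."},
}
index = build_index(docs)
```

With this index, a query for a single term returns every document that mentions it in any field, and adding terms narrows the result. The cost of building the index is paid at write time, which is why this approach suits documents that are read far more often than they change.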

Elasticsearch is sweet jelly for your peanut butter databases, allowing you to discover related documents and surface insights about the data within. We've written a couple of articles about how to use Elasticsearch with MongoDB (Elasticsearch at Compose - How It Fits and Optimizing MongoDB Queries with ElasticSearch), but it can also be paired with RethinkDB or even PostgreSQL (double jelly!).

If you need fast, advanced search capability for your document collections, you should give Elasticsearch a try.

Compose Key/Value Stores - Redis and etcd

Though Redis and etcd are officially open source key/value stores and fall under the broad categories of "NoSQL" and "schema-less", each of them is a specialized solution, just like Elasticsearch above. We consider both of these to be infrastructure solutions, as opposed to data solutions, aimed at helping your applications be more efficient.

Redis

Redis is an in-memory key/value store, so its claim to fame is its lightning speed. Because of this, it is often used for data caching or even as a buffer while data moves through it to another destination. Redis can be used for any application that requires instantaneous data operations, such as a chat application. Redis persists data to disk in the background and, at Compose, it also comes with automatic failover and a 3-node server cluster with an HAProxy portal, so availability and reliability are not an issue. Compose also provides auto-scaling for Redis (as we do with many of our other offerings) so as your data needs grow, your deployment resources can expand automatically as needed.

Redis is perfect to use alongside your other databases to take some of the load off of them and to perform frequently needed operations speedily, in memory, for applications that need that level of immediacy. You could use it with RethinkDB or PostgreSQL, but to demonstrate its power we've written a couple of articles about using Redis with MongoDB: Why (and how to) Redis with your MongoDB and Redis, MongoDB and the Power of Incremency. Redis is for making bite-sized PB&J, easy to consume.
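The load-shedding pattern described here is often called cache-aside: check the fast store first and fall back to the primary database only on a miss. The sketch below uses a plain Python dict as a stand-in for Redis and a hypothetical slow_lookup function as a stand-in for the primary database, so it shows only the control flow and not a real Redis client.

```python
import time


class CacheAside:
    """Cache-aside with a time-to-live; a dict stands in for Redis."""

    def __init__(self, loader, ttl_seconds=60.0, clock=time.monotonic):
        self._loader = loader    # function that reads the primary database
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so expiry can be exercised
        self._store = {}         # key -> (expires_at, value)
        self.misses = 0

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None and entry[0] > self._clock():
            return entry[1]      # fresh hit: the primary database is not touched
        self.misses += 1
        value = self._loader(key)  # miss or expired: go to the primary database
        self._store[key] = (self._clock() + self._ttl, value)
        return value


def slow_lookup(key):
    """Hypothetical stand-in for a query against the primary database."""
    return f"profile-for-{key}"


cache = CacheAside(slow_lookup)
first = cache.get("joe")   # miss: loaded from the primary database
second = cache.get("joe")  # hit: served from the cache
```

The time-to-live bounds how stale a cached value can get. Choosing it is a trade-off between how much load the cache removes and how quickly changes in the primary database become visible.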

Try Redis now and see how quick and easy it is to use.

Note that Compose is also offering Disque in alpha to existing customers. Disque is an in-memory job queue that is based on the same concepts as Redis.

etcd

Compose's other key/value store is actually a distributed cluster configuration management service. etcd is like the bread that holds our PB&J sandwiches together... and you can even cut the crusts off - configure your PB&J the best way for your tastebuds. If you've been following us so far, you probably now see the value of running MongoDB or RethinkDB, PostgreSQL, Elasticsearch, and Redis all together for your applications. They each have their part to play. etcd helps them stay in lockstep. At its simplest, etcd can be used just to manage the connection configuration information for these different deployments, but it can do so much more. Check out our 3-part series on etcd, which uses the hypothetical company Exampleco to show in detail how to use its rich features for your configurations. At Compose, we use etcd to manage our distributed PostgreSQL deployments.

etcd is a highly recommended component for your stack. No matter what your applications do, it will make your life easier. The peanut butter and jelly need something to hold them together.

Compose Messaging - RabbitMQ

And then we have RabbitMQ, an open source distributed asynchronous message broker. It acts kind of like a temporary database for messages to pass through, so developers sometimes try to treat it like a database... but it's not a database. It routes messages where they need to go and provides a temporary home until they can be picked up. It's another one of Compose's infrastructure solutions. Our 2-part series on configuring RabbitMQ provides an example use case for developing a user notification app with RabbitMQ, but it can be used for any situation where applications, databases, or other services need to communicate with each other asynchronously. We've demonstrated how to do this using RethinkDB change notifications and PostgreSQL row inserts to produce messages for RabbitMQ to pass along. RabbitMQ is like the knife that spreads the peanut butter and the jelly, bringing the two together.
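The core idea of asynchronous messaging, a producer that hands messages to an intermediary and a consumer that picks them up later, can be sketched with Python's standard library alone. Here queue.Queue stands in for the broker and a thread stands in for a separate consuming service. The function name and message strings are hypothetical, and nothing here uses a RabbitMQ client or models its exchanges and routing.

```python
import queue
import threading


def run_demo(messages):
    """Pass messages from a producer to a consumer through a queue.

    queue.Queue stands in for the broker: the producer never calls the
    consumer directly, it only hands messages to the queue.
    """
    broker = queue.Queue()
    delivered = []

    def consumer():
        while True:
            message = broker.get()   # blocks until a message is available
            if message is None:      # sentinel: no more messages
                break
            delivered.append(f"notified: {message}")

    worker = threading.Thread(target=consumer)
    worker.start()
    for message in messages:         # producer side
        broker.put(message)
    broker.put(None)                 # tell the consumer to stop
    worker.join()
    return delivered
```

The producer returns from put immediately, whether or not the consumer is ready, which is the decoupling that makes this pattern useful between services that run at different speeds.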

Get started with RabbitMQ to see how you can put this messaging service to work for you.

Summary

At Compose, we advocate using the right tool(s) for the job. We hope this article has given you a better understanding of the differences between these various solutions and has clarified why "NoSQL" and "schema-less" are not very useful descriptors. Depending on the kind of data you have and what you're trying to do with it, one (or probably several) of Compose's offerings is sure to be a perfect fit for you. Go ahead... make your own PB&J!

And keep an eye out! We're adding new solutions all the time and we've got some goodies planned for the new year...