Loss prevention with RethinkDB

RethinkDB is a great database. Its sharding, replication and failover facilities are baked into the system. There is a catch though, and it's one that can trip up developers and admins who've come from other databases: you need to turn those features on for them to work.

Here's the TL;DR version:

When you create a table in RethinkDB, it is created in one shard and with one replica. The RethinkDB team may change this in the future but currently, for resilience, you will want to change replicas from the default to whatever is appropriate for you. On Compose you should make use of your three data nodes by setting replicas to 3.

Ok, that's the essential information out of the way. Now, how can you know that a table isn't correctly configured? On Compose, we detect it and show it in the Compose console overview like this:

Deployment Errors

Here you can see the secondary servers in the cluster have information about the existence of tables (examples.actors and examples.shows) and that those tables are not replicated to them. There's also a pointer to the RethinkDB documentation, but let's go fix that lack of replication here.

Fixing in the UI

First we need to go to the RethinkDB Administration UI in the browser. The URL is in the overview page. When you get there, select Tables in the options at the top of the page...

Table View

Actors and shows both only have one shard and one replica. They are living dangerously on the primary. Let's fix actors manually first. Click on actors in the list of tables and you'll get to the Table Overview view:

Table Overview

There's lots of information about our table here. There are statistics about its activity and replicas, a chart showing how data is distributed across shards, secondary indexes and a list of servers that this table lives on. As you can see, in the last case, that's one server. Go to the Sharding and Replication panel and click on Reconfigure. Up comes this panel:

Shard/Replica

Now, the easy thing to do is enter 3 into the Replicas per shard field and then click Apply configuration. Before you click apply, though, click on Where's my data going. This will show where the data is proposed to go when you apply the change. If you're happy with it, click the Apply button, knowing that your table will be unavailable for a little while as it is replicated. Once it's complete the Servers Used table looks like this:

Servers Used

Fixing with ReQL

Now, how can we make that change programmatically? We'll fix the examples.shows table with ReQL to show how. Go to the Data Explorer and run:

r.db('examples').table('shows').config()  

which will, in this case, show:

Config Results

This is basically the text version of the Servers Used table in the UI: one replica in one place. The reconfigure() command on a table lets us change shards and replicas. Run r.db('examples').table('shows').reconfigure( { shards: 1, replicas: 3 } ) to set the replicas to 3 and the result will be... pretty long, as it'll be a complete record of the configuration changes. If we run r.db('examples').table('shows').config() again though, we'll see the changes confirmed:

After Reconfig
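If you would rather check a config() result in code than by eye, here is a minimal sketch. It assumes the document has a shards array in which each entry lists its replicas by server name. The helper names and the sample server name are made up for illustration.

```javascript
// Hypothetical helpers for reading a table config document, such as the one
// returned by r.db('examples').table('shows').config().

// Number of replicas in each shard of the table.
function replicasPerShard(config) {
  return config.shards.map(shard => shard.replicas.length);
}

// True if any shard has fewer replicas than we want.
function isUnderReplicated(config, wanted) {
  return replicasPerShard(config).some(count => count < wanted);
}

// Sample document shaped like a config() result; 'server_a' is a placeholder.
const showsConfig = {
  db: 'examples',
  name: 'shows',
  shards: [{ primary_replica: 'server_a', replicas: ['server_a'] }]
};

console.log(replicasPerShard(showsConfig));     // [ 1 ]
console.log(isUnderReplicated(showsConfig, 3)); // true
```

Once the reconfigure has completed, the same check against the new config document should come back false.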

Avoiding the issue

If need be, you can automate those config changes in an application, but it's better to stop the problem from occurring in the first place. You can do that by making more use of the capabilities of tableCreate() when you make your new tables. It takes an optional list of parameters. Keeping it simple, if we're making a new table serials then all we need to do is add replicas: 3 to that list:

r.db('examples').tableCreate( 'serials', { replicas: 3 } )  

And our new table will already be replicated over three nodes. If a failover occurs, your data will be there for your applications to work with. If you are wondering why this isn't turned on by default, it's something that's been considered and discussed, but when you start doing things automatically like that, you make assumptions which may or may not hold. Until the default changes, make sure you are creating enough replicas on RethinkDB by adding { replicas: 3 } when you create a table.
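One way to make that a habit in application code is to wrap table creation in a small function that always passes a replica count. This is only a sketch: createReplicatedTable is a hypothetical name, and it assumes you pass in the RethinkDB JavaScript driver module as r along with a connection you have already opened.

```javascript
// Hypothetical wrapper: always create tables with an explicit replica count.
// `r` is the RethinkDB driver module and `conn` an open connection; both are
// passed in, so the function makes no assumptions about how you connect.
function createReplicatedTable(r, conn, dbName, tableName, replicas = 3) {
  return r.db(dbName).tableCreate(tableName, { replicas: replicas }).run(conn);
}

// Example use (connection details are placeholders, not real values):
//   const r = require('rethinkdb');
//   r.connect({ host: 'your-host', port: 28015 })
//     .then(conn => createReplicatedTable(r, conn, 'examples', 'serials'));
```

Defaulting to 3 matches the three data nodes described above; pass a different value if your cluster has a different shape.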