etcd Introduced

When Compose put PostgreSQL onto the Compose platform, we explained how we built Governor, a high availability system for PostgreSQL using a combination of scripting and etcd. It was during that process that we also realised exactly how powerful etcd is as a glue for disparate services. That experience has led us to think other people should know more about etcd, so let's talk about it.

About etcd

etcd has its roots with the developers at CoreOS. The name comes from the idea of distributing the Unix "/etc" directory, where many configuration files live, across multiple machines – /etc distributed. This simple enough idea is not so easy to achieve in practice. The CoreOS developers wanted something highly available and reliable, so they could use it at the heart of their infrastructure. They wanted it to be sequentially consistent, so that events occur in the same order everywhere on the network. They wanted to be able to watch it, so applications could sit looking for events, and to make that a little harder to achieve, they wanted it all exposed with an HTTP front end. The system had to be reconfigurable on-the-fly, unlike its contemporaries which needed to go down when configurations changed. Finally, they wanted to make it open source.

To help them achieve what they wanted, they looked to a consensus algorithm called RAFT, which takes on the tricky problem of figuring out how to get a cluster of systems to agree on a value. The original 2013 paper on RAFT (PDF) was the start of a process which saw various developers and companies refine RAFT such that, in 2015, it is a "go-to" algorithm for getting that agreement between systems; there are many implementations, and many systems and research projects make use of it.

The CoreOS developers used it to make their distributed /etc plan work. RAFT allows developers to write systems that agree on the state of values and, in etcd's case, those values represent a hierarchical keyspace with directories and keys.

Basic etcd

The API to manipulate those keys is very simple. In these examples, we're assuming etcd is running on the local host. To create a key with a value, you do an HTTP PUT...

$ curl http://127.0.0.1:2379/v2/keys/control -XPUT -d value="total"
{"action":"set","node":{"key":"/control","value":"total","modifiedIndex":6,"createdIndex":6}}

The response tells you the "set" action succeeded, creating a new node /control with a value of "total". The other two values returned are etcd's indexes, which track modifications and creations across the entire database and happily increment with each operation. We mention them because these indexes are critical to how etcd clusters stay in sync or recover sync. It should be fairly obvious that to get the value of that key, we do a GET:

$ curl http://127.0.0.1:2379/v2/keys/control -XGET               
{"action":"get","node":{"key":"/control","value":"total","modifiedIndex":6,"createdIndex":6}}

A get action doesn't increment either index counter as no changes have occurred. When we update that key's value, though, we can see the change:

$ curl http://127.0.0.1:2379/v2/keys/control -XPUT -d value="partial"
{"action":"set","node":{"key":"/control","value":"partial","modifiedIndex":7,"createdIndex":7},"prevNode":{"key":"/control","value":"total","modifiedIndex":6,"createdIndex":6}}

And finally, we can delete our new key with an HTTP DELETE:

$ curl http://127.0.0.1:2379/v2/keys/control -XDELETE
{"action":"delete","node":{"key":"/control","modifiedIndex":8,"createdIndex":7},"prevNode":{"key":"/control","value":"partial","modifiedIndex":7,"createdIndex":7}}

Notice how the modified index went up, but the created index didn't change. When there's a cluster in play, those indexes can be used to ensure that etcd cluster members are not just consistent, but sequentially consistent. This is more than eventually consistent; it means that each member gets to the same result by executing the same operations in the same order. Behind the scenes, the members of the cluster divide themselves up, with one elected leader and every other member following, using the RAFT consensus algorithm to ensure the cluster stays in sync.
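You can peek at that leader and follower arrangement yourself through the v2 stats endpoints; a quick sketch against our local single node (the output below is abbreviated and will vary with your setup):

$ curl http://127.0.0.1:2379/v2/stats/self
{"name":"default","state":"StateLeader", ...}

On a single node you will always be the leader; in a real cluster, each member reports whether it is currently leading or following.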

But there's more. The etcd "set" operation lets you specify a TTL, a time-to-live in seconds, as the ttl= parameter, along with the ability to only set a value if it previously existed by specifying prevExist=true. That means an application can create a key and regularly refresh it; if it misses a heartbeat refresh, the key expires and the application can't simply restart and refresh it as if nothing had happened. You can get an even better assurance of that if you make the value a unique id. Then you can use prevValue=id to ensure that your process really owned the key, because if the previous value wasn't the application's id, it won't be allowed to set it. This is an atomic compare-and-swap operation, and it is complemented by a compare-and-delete operation.
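Here's a sketch of that heartbeat pattern with curl; the key name and the id value are our own invention:

# claim the key with a 30 second lifetime, but only if nobody else holds it
$ curl http://127.0.0.1:2379/v2/keys/lead -XPUT -d value="app-1234" -d ttl=30 -d prevExist=false
# refresh the claim, but only if we still own it
$ curl "http://127.0.0.1:2379/v2/keys/lead?prevValue=app-1234" -XPUT -d value="app-1234" -d ttl=30

If the first request finds the key already exists, or the second finds someone else's id in it, etcd rejects the set with an error rather than quietly overwriting another process's claim.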

How would you use those features? Well, the Compose Governor uses them to select a lead PostgreSQL in a cluster and coordinate that with other PostgreSQL instances through a Python script. You can read about that in High Availability for PostgreSQL, Batteries Not Included.

As well as ensuring your changes are consistent, you can watch for changes. Add a ?wait=true to your URL in a get operation and etcd will only return the new value (and its previous value) when the key changes. If you point at a directory and add ?recursive=true to the URL, it'll wait for a change in that directory and all its children. That's where directories become really useful, as you can logically group your nodes into directories and wait on changes in those groups. For example, you could have an '/available' directory where worker processes put their key to say they are ready to work. A scheduler could get ../available?wait=true&recursive=true and get a response whenever a worker process posted.
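A sketch of that worker and scheduler pattern, with names of our own choosing:

# in one terminal, the scheduler blocks waiting for any change under /available
$ curl "http://127.0.0.1:2379/v2/keys/available?wait=true&recursive=true"
# in another terminal, a worker announces itself
$ curl http://127.0.0.1:2379/v2/keys/available/worker1 -XPUT -d value="ready"

As soon as the worker's PUT lands, the waiting curl returns with the set action for /available/worker1.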

Getting local with etcd

If you want to experience etcd, you can go hands-on. The easiest way is to run a single-node cluster. For etcd 2.2, your best option for getting the binaries is to go to the 2.2.0 release entry on GitHub. You'll find basic instructions on how to install - download, unpack and run, basically - for Mac OS X, Linux and Windows, along with source code for the compile-first people out there.
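For a local single node the defaults are sensible, but making them explicit can help; a minimal sketch, where the node name and data directory are our own choices:

$ ./etcd --name local --data-dir ./local.etcd \
    --listen-client-urls http://127.0.0.1:2379 \
    --advertise-client-urls http://127.0.0.1:2379

Once you have etcd running you can use the examples of querying from earlier. You can also find out what version of etcd you are running by doing: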

$ curl -XGET -L http://127.0.0.1:2379/version
{"etcdserver":"2.2.0","etcdcluster":"2.2.0"}

Now you are ready to explore what etcd has to offer. For a deep dive, check out the etcd API guide, but before you go there, read on...

Quicker etcd

Curl can be a little hard on the fingers, though, if you are spending any time with etcd. When you unpacked etcd for installation, you should have noticed another file, etcdctl. This is a command line application for talking to an etcd server. Our example commands from earlier can be reduced down to:

$ etcdctl set /control total
total  
$ etcdctl get /control
total  
$ etcdctl set /control partial
partial  
$ etcdctl rm /control
PrevNode.Value: partial  
$

Each time, etcdctl by default returns the value or previous value as its sole response. It can, though, produce more detailed information by using -o extended or -o json:

$ etcdctl -o extended set /control partial
Key: /control  
Created-Index: 23  
Modified-Index: 23  
TTL: 0  
Index: 23

partial  
$ etcdctl -o json set /control total
{"action":"set","node":{"key":"/control","value":"total","nodes":null,"createdIndex":24,"modifiedIndex":24},"prevNode":{"key":"/control","value":"partial","nodes":null,"createdIndex":23,"modifiedIndex":23}}
$

Etcdctl also has its own vocabulary of commands: set, get and rm, as above, are joined by mk, mkdir, rmdir, ls, setdir, update, updatedir, watch, exec-watch and others. We showed the curl versions of the commands first because HTTP is the underlying language of etcd's API. You can see what etcdctl does as curl commands by using --debug like so:

$ etcdctl --debug -o json set /control independent
start to sync cluster using endpoints(http://127.0.0.1:2379,http://127.0.0.1:4001)  
cURL Command: curl -X GET http://127.0.0.1:2379/v2/members  
got endpoints(http://localhost:2379,http://localhost:4001) after sync  
Cluster-Endpoints: http://localhost:2379, http://localhost:4001  
cURL Command: curl -X PUT http://localhost:2379/v2/keys/control -d "value=independent"  
{"action":"set","node":{"key":"/control","value":"independent","nodes":null,"createdIndex":26,"modifiedIndex":26},"prevNode":{"key":"/control","value":"total","nodes":null,"createdIndex":24,"modifiedIndex":24}}

Etcdctl can even work well with scripts. The exec-watch command allows commands to be executed when a key is updated, so we could have a shell script called details.sh which reads:

#!/bin/sh
echo "It's News Time with $ETCD_WATCH_VALUE"

Then if we run:

$ etcdctl exec-watch /news -- ./details.sh

Now, every time /news changes, the script will run and echo out the new value. When we do etcdctl set /news "Brad Chilters" in another window, we see It's News Time with Brad Chilters where the watch is running. Not that exciting when done locally, but think about that working reliably across your cloud server instances.

Further etcd

We've only just touched on the capabilities of etcd as a component of your systems management and configuration. Its use of RAFT gives it a high level of dependability, and you can tap into that. For example, we mentioned, in passing, the index values that appear when operations occur. You can use those indexes to ensure you don't miss events or to disambiguate your queries, as they reflect the index values being passed around within the RAFT cluster. With this kind of power, you can lean on etcd to provide a range of services to your applications.
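As one last sketch of that, a watch can pick up from a known point by adding the waitIndex parameter; using the index values from our earlier examples, asking for events on /control from index 7 onwards returns the historic set immediately rather than blocking:

$ curl "http://127.0.0.1:2379/v2/keys/control?wait=true&waitIndex=7"
{"action":"set","node":{"key":"/control","value":"partial","modifiedIndex":7,"createdIndex":7},"prevNode":{"key":"/control","value":"total","modifiedIndex":6,"createdIndex":6}}

An application that remembers the last modifiedIndex it processed can therefore reconnect and carry on without missing recent events.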