Essentially etcd - Part 3 - Indexes, Ordering and APIs

In this final part of the Essentially etcd series, we're going to look at what the index means in etcd (it's important) and at ordering keys and hiding them. We'll also take a whistle-stop tour of the libraries and tools available for etcd. If you've missed the previous parts, so far in this series we've looked at how to use etcd to distribute configurations and how to track status reliably. Now, on to the Index.

But what of the Index?

We've talked pretty much exclusively about the keys and values in etcd so far, but we've not mentioned the indexes. When we did a set operation earlier – as an example – we saw:

{
    "action": "set",
    "node": {
        "createdIndex": 71,
        "key": "/config/server-15/database",
        "modifiedIndex": 71,
        "value": "postgresql1"
    }
}

... the createdIndex and modifiedIndex, which tie the state of the server to what has happened to the node. When this node was created, the etcd server's state index was at 71, so the createdIndex is set to 71. Creating a key also counts as modifying it, and whenever a modifying action takes place on a key, the modifiedIndex is set to the etcd server's state index too. The state index is an integer which ticks upwards in step with changes to the server. Which is all very interesting, but what use is it to me, you may wonder. Well, the modifiedIndex gives you the ability to ensure no one has changed a key since you last looked at it.

Consider the server heartbeat from part 2. It is possible, though unlikely, that another process could read the heartbeat's value and use it to make an apparently correct update to the heartbeat. If, though, we made a note of the modifiedIndex when we updated the heartbeat key, we could ensure that the key hadn't been changed in the interim and was still the same key we left. To do this well, we'd want that comparison to be atomic, and it just so happens that etcd comes with an equivalent to prevValue for indexes, prevIndex. Set that parameter to the modifiedIndex you recorded and, if it doesn't match, etcd will return an error. If it does match, then it'll carry out the update or deletion.
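To make the comparison concrete, here is a minimal in-memory sketch of the check in Go. It is a toy model, not etcd's implementation: the toyStore type, its field names, the error values and the /status/server-15 key are all hypothetical, and the only behaviour it mirrors is the one described above, where an update goes through only if the supplied prevIndex equals the key's current modifiedIndex.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical errors for this toy model; etcd reports its own error codes.
var (
	errKeyNotFound   = errors.New("key not found")
	errIndexMismatch = errors.New("compare failed: modifiedIndex has changed")
)

// toyKey holds a value and the state index at which it was last modified.
type toyKey struct {
	value         string
	modifiedIndex uint64
}

// toyStore is an in-memory stand-in for an etcd server (an assumption for
// illustration only).
type toyStore struct {
	stateIndex uint64
	keys       map[string]*toyKey
}

// compareAndSwap updates key only if its modifiedIndex still equals prevIndex.
// On success the state index ticks up and becomes the key's new modifiedIndex.
func (s *toyStore) compareAndSwap(key, value string, prevIndex uint64) (uint64, error) {
	k, ok := s.keys[key]
	if !ok {
		return 0, errKeyNotFound
	}
	if k.modifiedIndex != prevIndex {
		return 0, errIndexMismatch
	}
	s.stateIndex++
	k.value = value
	k.modifiedIndex = s.stateIndex
	return k.modifiedIndex, nil
}

func main() {
	s := &toyStore{
		stateIndex: 71,
		keys:       map[string]*toyKey{"/status/server-15": {value: "alive", modifiedIndex: 71}},
	}

	// The owner of the heartbeat remembered index 71, so its update succeeds.
	idx, err := s.compareAndSwap("/status/server-15", "alive", 71)
	fmt.Println(idx, err)

	// Anyone still holding the stale index 71 is now rejected.
	_, err = s.compareAndSwap("/status/server-15", "alive", 71)
	fmt.Println(err)
}
```

The point of the model is that the check and the write happen as one step, so a process holding a stale index cannot slip an update in between.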

If your heartbeat values carry no meaning of their own (that is, you aren't encoding status information in the heartbeat value), then tracking the modifiedIndex is easier still. You may even want to just update and track an existing key. Rather than reading it once to get the modifiedIndex value, you can set prevIndex to 0 on that first update, a check that always passes, and then take the new modifiedIndex from the response, which you should be reading anyway.
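As a rough sketch of that update-and-track cycle against the REST API, the Go below builds a PUT carrying prevIndex and pulls the modifiedIndex out of a reply so it can be used for the next beat. It uses only the standard library. The heartbeatRequest and nextPrevIndex helpers, the base URL, the /status/server-15 key and the sample reply are assumptions made for illustration; the prevIndex parameter and the value form field are the ones used elsewhere in this series.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"strconv"
	"strings"
)

// keyResponse picks out the one field of the JSON reply this sketch needs.
type keyResponse struct {
	Node struct {
		ModifiedIndex uint64 `json:"modifiedIndex"`
	} `json:"node"`
}

// heartbeatRequest builds a PUT for key that carries prevIndex as a query
// parameter and the new value as a form field. The key must start with "/".
func heartbeatRequest(base, key, value string, prevIndex uint64) (*http.Request, error) {
	u, err := url.Parse(base + "/v2/keys" + key)
	if err != nil {
		return nil, err
	}
	q := u.Query()
	q.Set("prevIndex", strconv.FormatUint(prevIndex, 10))
	u.RawQuery = q.Encode()

	form := url.Values{"value": {value}}
	req, err := http.NewRequest(http.MethodPut, u.String(), strings.NewReader(form.Encode()))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	return req, nil
}

// nextPrevIndex extracts node.modifiedIndex from a reply body; that number is
// the prevIndex to send with the following update.
func nextPrevIndex(body []byte) (uint64, error) {
	var r keyResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return 0, err
	}
	return r.Node.ModifiedIndex, nil
}

func main() {
	// Placeholder endpoint and key; substitute your own deployment's details.
	req, err := heartbeatRequest("http://127.0.0.1:2379", "/status/server-15", "alive", 71)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(req.Method, req.URL)

	// A made-up reply in the same shape as the responses shown in this article.
	reply := []byte(`{"node":{"key":"/status/server-15","value":"alive","modifiedIndex":72,"createdIndex":71}}`)
	fmt.Println(nextPrevIndex(reply))
}
```

Sending the request (for example with http.DefaultClient.Do) and handling a failed comparison are left out to keep the sketch short.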

Order in the directory

When you want to create a queue, or anything that is order sensitive, the index can play a part in that process too. Going back to when we were talking about creating keys and values, you may recall we used an HTTP PUT method to do that. There's another HTTP method that is used in REST interfaces, POST. Now, the technical difference between PUT and POST is that repeating a PUT should end up with the same results, a property called idempotence, while repeating a POST can have a different outcome each time. That's where the etcd developers put it to good use. If we POST a value at a directory, the value is put into a newly generated key inside it. For example, we'll POST the value "Job1" to the key "/myqueue":

$ curl -sS -L https://user:pass@host:port/v2/keys/myqueue -XPOST -d value="Job1"  | python -mjson.tool         
{
  "action": "create",
  "node": {
    "key": "/myqueue/00000000000000050842",
    "value": "Job1",
    "modifiedIndex": 50842,
    "createdIndex": 50842
  }
}

You can see now there's a directory called /myqueue with a key inside it named 00000000000000050842, which has the value "Job1". Where did that generated key come from though? If you look down, you see the etcd index value is 50842 - that's where the key comes from. Remember what we said earlier about the index always going up according to the operations on the etcd database? This is exploiting that fact to generate keys which, when sorted, will always reflect the order they were POSTed. We can go on POSTing values:

$ curl -sS -L https://user:pass@host:port/v2/keys/myqueue -XPOST -d value="Job2" | python -mjson.tool        
{
  "action": "create",
  "node": {
    "key": "/myqueue/00000000000000050843",
    "value": "Job2",
    "modifiedIndex": 50843,
    "createdIndex": 50843
  }
}
$ curl -sS -L https://user:pass@host:port/v2/keys/myqueue -XPOST -d value="Job3"  | python -mjson.tool                
{
  "action": "create",
  "node": {
    "key": "/myqueue/00000000000000050844",
    "value": "Job3",
    "modifiedIndex": 50844,
    "createdIndex": 50844
  }
}
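The zero padding in those generated keys matters. As a small illustration in Go, the sketch below formats index numbers to 20 digits, the width seen in the output above, and shows that a plain string sort then agrees with numeric order, while unpadded numbers would not. The orderedKey helper is hypothetical, and treating the ordering as a simple string comparison is an assumption made for the demonstration.

```go
package main

import (
	"fmt"
	"sort"
)

// orderedKey builds a key shaped like the generated ones: the directory,
// a slash, then the index zero-padded to 20 digits.
func orderedKey(dir string, index uint64) string {
	return fmt.Sprintf("%s/%020d", dir, index)
}

func main() {
	// Deliberately out of order, with one index that has an extra digit.
	padded := []string{
		orderedKey("/myqueue", 100000),
		orderedKey("/myqueue", 50843),
		orderedKey("/myqueue", 50842),
	}
	sort.Strings(padded)
	for _, k := range padded {
		fmt.Println(k) // 50842, then 50843, then 100000
	}

	// Without padding, "100000" sorts ahead of "50842" because '1' < '5'.
	unpadded := []string{"/myqueue/100000", "/myqueue/50843", "/myqueue/50842"}
	sort.Strings(unpadded)
	fmt.Println(unpadded[0]) // /myqueue/100000
}
```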

When you want to get the child keys in order, just make sure to add the "sorted=true" query parameter to the URL of the directory you are getting:

$ curl -sS -L https://user:pass@host:port/v2/keys/myqueue\?sorted\=true -XGET | python -mjson.tool
{
    "action": "get",
    "node": {
        "createdIndex": 50842,
        "dir": true,
        "key": "/myqueue",
        "modifiedIndex": 50842,
        "nodes": [
            {
                "createdIndex": 50842,
                "key": "/myqueue/00000000000000050842",
                "modifiedIndex": 50842,
                "value": "Job1"
            },
            {
                "createdIndex": 50843,
                "key": "/myqueue/00000000000000050843",
                "modifiedIndex": 50843,
                "value": "Job2"
            },
            {
                "createdIndex": 50844,
                "key": "/myqueue/00000000000000050844",
                "modifiedIndex": 50844,
                "value": "Job3"
            }
        ]
    }
}

And there you have it: automatically generated keys, based on the etcd index, which give you a guaranteed insertion order.
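If you are consuming that sorted listing without a client library, the JSON is simple to decode. The following Go sketch, using only the standard library, reads the nodes array from a reply shaped like the one above and returns the values in the order they arrive. The listing and node types and the queuedValues helper are names invented for this example, and the sample body is a cut-down copy of the response shown earlier.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// node mirrors the fields of a node in the JSON reply that this sketch uses.
type node struct {
	Key   string `json:"key"`
	Value string `json:"value"`
	Dir   bool   `json:"dir"`
	Nodes []node `json:"nodes"`
}

// listing is the top level of a GET reply.
type listing struct {
	Action string `json:"action"`
	Node   node   `json:"node"`
}

// queuedValues returns the values of the directory's child keys, preserving
// the order in which the reply lists them. Child directories are skipped.
func queuedValues(body []byte) ([]string, error) {
	var l listing
	if err := json.Unmarshal(body, &l); err != nil {
		return nil, err
	}
	var values []string
	for _, n := range l.Node.Nodes {
		if n.Dir {
			continue
		}
		values = append(values, n.Value)
	}
	return values, nil
}

func main() {
	body := []byte(`{"action":"get","node":{"dir":true,"key":"/myqueue","nodes":[
		{"key":"/myqueue/00000000000000050842","value":"Job1"},
		{"key":"/myqueue/00000000000000050843","value":"Job2"},
		{"key":"/myqueue/00000000000000050844","value":"Job3"}]}}`)

	values, err := queuedValues(body)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(values) // [Job1 Job2 Job3]
}
```

The helper does no sorting of its own; it relies on the request having asked for sorted=true.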

Of course this wouldn't be complete without showing how you do this with the Go API. There's a CreateInOrder method in the etcd keys API which does the work of switching to a POST method to get the non-idempotent behaviour. For reference, there's a Create method too, which is a wrapper around the Set method with the PrevNoExist option set. Anyway, CreateInOrder does all the work, so we've added two examplco commands: fillqueue, which will put ten randomly named jobs into a named queue, and dumpqueue, which dumps out the contents of the queue. Here's the code behind them:

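// doFillQueue adds ten values, in random order, to the named queue.
// queuebase and the queuename flag are defined elsewhere in the application.
func doFillQueue(kapi client.KeysAPI) {
    var key = queuebase + *queuename
    list := rand.Perm(10) // the numbers 0 to 9, shuffled
    for _, v := range list {
        value := "Value" + strconv.Itoa(v)

        // CreateInOrder POSTs the value, so etcd generates the key name.
        resp, err := kapi.CreateInOrder(context.TODO(), key, value, nil)

        if err != nil {
            log.Fatal(err)
        }

        fmt.Println(resp.Action + " " + resp.Node.Key + " to " + resp.Node.Value)
    }
}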

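// doDumpQueue prints every key and value in the named queue, in key order.
func doDumpQueue(kapi client.KeysAPI) {
    var key = queuebase + *dumpqueuename

    // Sort: true asks etcd to return the child nodes sorted by key.
    resp, err := kapi.Get(context.TODO(), key, &client.GetOptions{Sort: true})

    if err != nil {
        log.Fatal(err)
    }
    for _, v := range resp.Node.Nodes {
        fmt.Println(v.Key + " set to " + v.Value)
    }
}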

Note that the GetOptions field for sorting the Get is Sort (set to true) rather than the REST API's sorted. You'll find this code in the final version of our examplco demonstration application, available in the examplco3 repository on GitHub. And if you've been following the code, you'll find this version is now split into more comprehensible, easier to navigate files.

Hiding in plain sight

With a filesystem-like structure for the key/value store, etcd does have other similar functionality. For example, if you give a key a name with an underscore, "_", as its first character, then it doesn't show up when you GET the contents of the enclosing directory. This is etcd's version of the Unix system's dotted files like ".profile" and ".bashrc". What use, you may ask, is this in a key/value store like etcd? Well, consider an application which creates a complex hierarchy of directories and files which applications traverse to locate locks and config files. One thing you may want to do with these directories is include a key for metadata, so that when a directory is located, metadata-aware applications can retrieve it by name while applications unaware of the metadata keys will not see this extra data.
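To picture the effect of the rule, here is a small Go sketch that applies the same naming convention to a list of keys. It models the behaviour described above rather than etcd's own code: the isHidden and visible helpers and the /services/web keys are made up for the example, and only the final path segment of each key is examined.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// isHidden reports whether the last segment of key starts with an underscore.
func isHidden(key string) bool {
	return strings.HasPrefix(path.Base(key), "_")
}

// visible returns the keys a directory listing would show under this rule.
func visible(keys []string) []string {
	var out []string
	for _, k := range keys {
		if !isHidden(k) {
			out = append(out, k)
		}
	}
	return out
}

func main() {
	keys := []string{
		"/services/web/config",
		"/services/web/_meta", // hidden from listings, still readable by name
		"/services/web/lock",
	}
	fmt.Println(visible(keys)) // [/services/web/config /services/web/lock]
}
```

A metadata-aware application would simply GET /services/web/_meta directly, since hiding only affects directory listings.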

APIs and other animals

Etcd has just the one official API, the REST API, and only one official client library in Go. That doesn't mean there isn't an etcd client library for your particular language or platform though. The REST API makes it easy for any system that can sensibly talk HTTP to talk to etcd. To save time, bear in mind that etcd has been developing rapidly and unofficial clients may or may not have kept up – look for libraries that support the v2 etcd API.

A page on the etcd GitHub repository lists the various tools and client libraries that are available. As is fair, it doesn't single out any third-party clients as favourites, but many of the clients listed are functional yet rarely used or maintained. So, the rule of thumb is to look for the popular, active libraries.

Go not only has the official client library listed but also the older go-etcd library. Although there are some attractions in the older library, it has been deprecated by the authors who suggest new development should be based on the official library.

Node.js has many etcd libraries available for it, but probably the most used is node-etcd which has many other Node libraries depending on it.

Ruby has three etcd libraries listed, but one of them still lacks HTTPS support, ruling it out for use with Compose deployments, which require SSL connections. The most popular library is, unsurprisingly, etcd-ruby, which does have SSL support.

Java has a few etcd libraries. Boon/etcd implements a task-oriented API built on top of Vert.x, while a client like etcd4j has a thinner abstraction built on top of Netty. It's hard to judge popularity with the Java libraries, but most are actively maintained.

For Python, the python-etcd library tops the list; actively maintained, updated and depended on.

For most other languages, we refer you back to the list. We've talked about how the REST API affords easy accessibility to etcd, even from the shell using curl, but we would remind you of etcdctl. It was previously a stand-alone application but it's now part of the etcd installation which means it's a bit heavier to install if you only want the etcdctl part of it – you can happily ignore the rest of the etcd package though. Anyway, if you are prepared to install it, etcdctl offers an alternative to formulating curl commands and can even generate them for you with debug on (see Introducing etcd).

Etcetera Etcetera...

To review, etcd is great as a solid source of truth for your configuration and status information for your applications. It's got battle-hardened capabilities like the atomic "compareAnd..." operations and time-to-live for keys or directories that ensure you can rely on freshness and correctness of the data you are storing and retrieving. If you are planning a large system and need to configure and coordinate the various moving parts of that system, etcd is a great place to find the functionality you'll need.