Essentially etcd - Part 1

With the announcement of etcd on Compose, you may be wondering how you could actually use a service like etcd within your applications. To help answer that question, let's consider how we'd distribute regularly updated configuration data to a group of diverse servers.

The Examplco Problem

At Examplco, they have numerous servers which all have to be updated with a set of rules regularly. They've tried all sorts of ways of getting data out to each server and each time the cycle goes something like this...

"So each application has a library to download configurations..."
"-- Won't we need a server to distribute these configurations?"
"Yes."
"How will we make it reliable?"
"We'll make a cluster of servers and write synchronisation and failover code."
"But that's going to be a lot of work..."
"Compared to writing clients for this new server in every language we use?"

That's doing it the hard way -- and they haven't even considered whether to poll for updates or to keep a long-running connection open, a choice that affects how quickly each server gets to see its updated configuration.

etcd at Examplco

The first part of the problem that Compose-hosted etcd solves for Examplco is the issue of having a server, or servers, with all the synchronisation and failover support to distribute their configuration data. Compose's etcd is a three-node cluster with two access nodes. The nodes stay in lockstep with each other through the use of the Raft consensus algorithm.

With that out of the way, the Examplco developers can concentrate on getting the configuration data to the various servers. The etcd REST API is the foundation for all the other client libraries and - as REST is just HTTP methods with associated data - you can make use of it where no client library currently exists. Don't worry, we will talk about etcd client libraries, just not quite yet.

Setting some configuration data

The Examplco developers have decided on a configuration hierarchy where they have a directory of configs with a sub-directory for each server. Within that, any configuration is stored as named key-values. So, the name of the database server to use for server-15 would be found at /config/server-15/database. Using the raw REST API, let's set that value using the curl command:

$ curl -L https://user:pass@aws-us-east-1-portal8.dblayer.com:10329/v2/keys/config/server-15/database -XPUT -d value="postgresql1" 
{"action":"set","node":{"key":"/config/server-15/database","value":"postgresql1","modifiedIndex":71,"createdIndex":71}}

The -L parameter in curl just tells it to follow any redirection. In this case, we are talking to just one of the two access portals for the etcd cluster: aws-us-east-1-portal8.dblayer.com:10329 - you can get your particular values from the Compose etcd dashboard under Connection Strings on the Overview page. The host and port information is preceded by the user and password information, also available in the same location on the dashboard.

The etcd database has a versioned REST API which means any endpoints for it contain the version number. We're using the current version 2 of the API, hence the /v2 following the host and port in the URL. That's followed by /keys to indicate we want to manipulate a particular key... which is what follows: /config/server-15/database.

Having said what we want to work with, we then say what we want to do with it. -XPUT says: "use the HTTP PUT method". This, in REST APIs, is how you set a value -- that value is what comes next. With -d value="postgresql1", the data is encoded and PUT into that key. When we hit return, what we get back is some JSON which tells us how that PUT worked out. JSON isn't the easiest thing to read when it's all on one line but if we pipe the output into | python -mjson.tool it comes out prettified:

{
    "action": "set",
    "node": {
        "createdIndex": 71,
        "key": "/config/server-15/database",
        "modifiedIndex": 71,
        "value": "postgresql1"
    }
}

Now, we can see that etcd translated our PUT into what it calls its set action. It then tells us that it set a particular node in the tree with a key of /config/server-15/database to a value of postgresql1.

Getting some configuration data

So, when a server in Examplco wakes up and wants to know what database to talk to, it will now look up /config/[servername]/database on etcd. To do that, it will HTTP GET the value like this:

$ curl -sS -L https://user:pass@aws-us-east-1-portal8.dblayer.com:10329/v2/keys/config/server-15/database -XGET | python -mjson.tool
{
    "action": "get",
    "node": {
        "createdIndex": 98,
        "key": "/config/server-15/database",
        "modifiedIndex": 98,
        "value": "postgresql1"
    }
}

(Note: We've added -sS to the curl parameters; we're piping the output to our python pretty-printer and without -sS, a progress meter would be output)

This is almost identical to the response from when we set the value for the database key, but....

Directories

We skipped over the fact that when we created the key for the database, we had included path components. The config and the server-15, to be exact. When you create a key with a path, if any part of the path doesn't exist, a directory is created to represent it automatically. That means there is now a config and a server-15 directory. We can "get" their values, too. Let's get the server-15 directory:

$ curl -sS -L https://user:pass@aws-us-east-1-portal8.dblayer.com:10329/v2/keys/config/server-15 -XGET | python -mjson.tool
{
    "action": "get",
    "node": {
        "createdIndex": 71,
        "dir": true,
        "key": "/config/server-15",
        "modifiedIndex": 71,
        "nodes": [
            {
                "createdIndex": 98,
                "key": "/config/server-15/database",
                "modifiedIndex": 98,
                "value": "postgresql1"
            }
        ]
    }
}

As a directory, it doesn't have a value but instead has an array of child nodes associated with it. When we query it like this, we get to see the values of the child nodes. If we moved up the path hierarchy, we'd see that we only get the values for the immediate children of the node. To see everything below the node, we'd have to use one of the options in the API by adding ?recursive=true to the URL and query the /config/ key. Notice how we've enclosed the URL in quotes because we just added a "?" into it -- if left unquoted, the shell would try to expand it. Keep this in mind whenever you add an option.

$ curl -sS -L "https://user:pass@aws-us-east-1-portal8.dblayer.com:10329/v2/keys/config/?recursive=true" -XGET | python -mjson.tool
{
    "action": "get",
    "node": {
        "createdIndex": 71,
        "dir": true,
        "key": "/config",
        "modifiedIndex": 71,
        "nodes": [
            {
                "createdIndex": 152,
                "dir": true,
                "key": "/config/server-11",
                "modifiedIndex": 152,
                "nodes": [
                    {
                        "createdIndex": 152,
                        "key": "/config/server-11/database",
                        "modifiedIndex": 152,
                        "value": "mongodb2"
                    }
                ]
            },
            ...
        ]
    }
}

We've cut this short, but you'll see the contents of all the nodes within directories going down the hierarchy - all the way to the actual values.

Getting fresh configuration data

When one of the servers has woken up and received its initial configuration data, it needs to know if there's been a change in the data. This is where we can use another of the options in the API. You can tell a GET call to wait for a change rather than return immediately by adding ?wait=true to the URL. So if we wanted to wait for the database to change we could do:

$ curl -sS -L "https://user:pass@aws-us-east-1-portal8.dblayer.com:10329/v2/keys/config/server-15/database?wait=true" -XGET | python -mjson.tool

This command won't return yet. Over in another window we can update that key:

$ curl -L https://user:pass@aws-us-east-1-portal8.dblayer.com:10329/v2/keys/config/server-15/database -XPUT -d value="postgres2"
{"action":"set","node":{"key":"/config/server-15/database","value":"postgres2","modifiedIndex":218,"createdIndex":218},"prevNode":{"key":"/config/server-15/database","value":"postgresql1","modifiedIndex":98,"createdIndex":98}}

This is a set. You'll notice we can see the previous value in the prevNode section, as well as the newly created node. If we look back to our GET curl, we'll be able to see that it has now returned some data:

{
    "action": "set",
    "node": {
        "createdIndex": 218,
        "key": "/config/server-15/database",
        "modifiedIndex": 218,
        "value": "postgres2"
    },
    "prevNode": {
        "createdIndex": 98,
        "key": "/config/server-15/database",
        "modifiedIndex": 98,
        "value": "postgresql1"
    }
}

It's come back with the same information as the PUT curl came back with (albeit pretty-printed). We could wait on every single key we're interested in, but that wouldn't be very manageable. We can, though, use the recursive and wait options together. Our server-15 could just wait for any changes, new keys, updates or deletions in the /config/server-15 directory, and that directory could have its own set of directories and this...

$ curl -sS -L "https://user:pass@aws-us-east-1-portal8.dblayer.com:10329/v2/keys/config/server-15?recursive=true&wait=true" -XGET | python -mjson.tool

...would still work. So now when we make a change, it returns with the update from the directory:

{
    "action": "set",
    "node": {
        "createdIndex": 274,
        "key": "/config/server-15/database",
        "modifiedIndex": 274,
        "value": "postgres3"
    },
    "prevNode": {
        "createdIndex": 218,
        "key": "/config/server-15/database",
        "modifiedIndex": 218,
        "value": "postgres2"
    }
}

The application could now process that update and change configuration as needed.
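The same waiting GET can be sketched in Go with the standard library alone. The watchURL and waitForChange helpers and the Event type below are illustrative names of our own, and the stub server is a hypothetical stand-in that holds the request for a moment before reporting a change, which is roughly how the curl call above behaves from the client's side.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
	"time"
)

// Event holds the parts of the response we care about here.
type Event struct {
	Action string `json:"action"`
	Node   struct {
		Key   string `json:"key"`
		Value string `json:"value"`
	} `json:"node"`
}

// watchURL builds the waiting GET URL; building the query in code
// sidesteps the shell quoting needed for "?" and "&" on the command line.
func watchURL(base, key string, recursive bool) string {
	q := url.Values{"wait": {"true"}}
	if recursive {
		q.Set("recursive", "true")
	}
	return base + "/v2/keys" + key + "?" + q.Encode()
}

// waitForChange blocks until the server answers, then decodes the change.
// http.Get uses a client with no overall timeout, which suits a long wait.
func waitForChange(u string) (Event, error) {
	var ev Event
	resp, err := http.Get(u)
	if err != nil {
		return ev, err
	}
	defer resp.Body.Close()
	err = json.NewDecoder(resp.Body).Decode(&ev)
	return ev, err
}

func main() {
	// Hypothetical stub: holds the request briefly, then reports a change.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(200 * time.Millisecond)
		fmt.Fprint(w, `{"action":"set","node":{"key":"/config/server-15/database","value":"postgres3"}}`)
	}))
	defer srv.Close()

	ev, err := waitForChange(watchURL(srv.URL, "/config/server-15", true))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(ev.Action, ev.Node.Key, "to", ev.Node.Value)
}
```

A real application would call waitForChange again after handling each event, which is essentially what a client library's watcher does for you.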

Go to the code

There's one official API client for etcd and it's made for the Go language, the same language etcd is written in. There was a previous version of this API called go-etcd, but that's been deprecated in favour of the new and current client, which can be found in the etcd source tree on GitHub. You'll find the reference documentation on GoDoc.org. Go developers can install it with go get github.com/coreos/etcd/client.

To demonstrate the facets of etcd we've talked about, the Examplco engineers have produced their own proof of concept app called 'examplco'. You can find the source for it in the Compose examples repository on GitHub. It's a command line application with two modes. One of these modes - "config" - will let you set a field in a server's configuration to a particular value while the other - "server" - will listen for changes in a server configuration and update its local copy of that configuration.

As a command line application, we need to parse the command line. For that, we're using Kingpin, mainly because its fluent API makes it easy to read in examples. Here's the entire definition of the command line:

var (  
    app          = kingpin.New("examplco", "An etcd demonstration")
    peerlist     = app.Flag("peers", "etcd peers").Default("http://127.0.0.1:4001,http://127.0.0.1:2379").OverrideDefaultFromEnvar("EX_PEERS").String()
    username     = app.Flag("user", "etcd User").OverrideDefaultFromEnvar("EX_USER").String()
    password     = app.Flag("pass", "etcd Password").OverrideDefaultFromEnvar("EX_PASS").String()
    config       = app.Command("config", "Change config data")
    configserver = config.Arg("server", "Server name").Required().String()
    configvar    = config.Arg("var", "Config variable").Required().String()
    configval    = config.Arg("val", "Config value").Required().String()
    server       = app.Command("server", "Go into server mode and listen for changes")
    servername   = server.Arg("server", "Server name").Required().String()
)


var configbase = "/config/"  

The application will always need to know the list of servers we can talk to. In the curl examples, we only talked to one server. The API supports multiple etcd servers and on Compose we have two visible access points for an application to talk to. In common with other etcd tools, we give this list as a comma separated list of URLs in the parameter of the --peers flag. If not specified, the peers default to a typical local etcd setup. Then, there are the "optional" username and password flags, too.

Just to make it super convenient, you can also set the environment variables EX_PEERS, EX_USER and EX_PASS to the values for --peers, --user and --pass and you won't need to repeatedly type them in.

That's enough for us to get started. If we hop forward in the code we find the initialisation code which sets up our connection:

    kingpin.Version("0.0.1")
    command := kingpin.MustParse(app.Parse(os.Args[1:]))
    peers := strings.Split(*peerlist, ",")

We split up the comma separated list of peers because when we get to creating the client.Config struct, used to pass all the connection setup information to the API, it expects a slice of endpoints rather than a comma separated list...

    cfg := client.Config{
        Endpoints:               peers,
        Transport:               client.DefaultTransport,
        HeaderTimeoutPerRequest: time.Minute,
        Username:                *username,
        Password:                *password,
    }

We'll leave the Transport set at DefaultTransport and discuss it at another time. We've set HeaderTimeoutPerRequest to a minute; this is how long the client will wait for a response header before a request fails. Finally, we copied the username and password from the command line parsing into the Config. Once we've done that, we can turn that into an etcd client:

    etcdclient, err := client.New(cfg)

To keep things brief, we'll be skipping the error checking (and bailing out with log.Fatal()) that is in the actual program, with one exception. There are a couple of APIs available from within the client, but the one we are interested in is the KeysAPI. We can get at that by handing our new etcdclient to the NewKeysAPI function:

    kapi := client.NewKeysAPI(etcdclient)

Setting the Config

We can now start doing things with this API. Let's start with the "config" command. The command takes a "server name", a config variable and a config value. We then construct our key using the first two of those (and another variable, configbase which is set to /config/):

    var key = configbase + *configserver + "/" + *configvar

Using this key, we call on the "KeysAPI Set" (kapi.Set) function:

    resp, err := kapi.Set(context.TODO(), key, *configval, nil)

The first parameter, context, sets out the overarching behaviour of this call; if you don't know what behaviour you want, set it to the handy context.TODO(). The next parameter is the key, followed by the value we want to set it to. The nil is us not setting the SetOptions that are available. When this successfully returns we can print out the various parts of the response:

    fmt.Println(resp.Action + " " + resp.Node.Key + " to " + resp.Node.Value)

The JSON response from etcd is already decanted into an easy to access structure, an analog to the JSON's structure. If we wanted to, we could turn the response into printable JSON by doing:

    b, err := json.MarshalIndent(resp, "  ", "  ")
    fmt.Println(string(b))

... which is a useful thing to do when debugging. When we run the complete program we get...

$ ./examplco --peers https://aws-eu-west-1-portal1.dblayer.com:10810,https://aws-eu-west-1-portal0.dblayer.com:10683 --user user --pass password config server-15 database "postgresql12"
set /config/server-15/database to postgresql12  

And server-15's database configuration has been set. That is pretty much all we need for setting the configuration... now, let's see how we could make a server react to these changes.

Watching the config

For this example, we're just going to keep a simple duplicate map of the config settings up to date with etcd. We need to use our given server name to create our key and create a map for storing our results in first.

    var key = configbase + *servername
    settings := make(map[string]string)

Now we are ready to get whatever is currently set under the key.

    resp, err := kapi.Get(context.TODO(), key, &client.GetOptions{Recursive: true})

This time we're using the "Keys API Get" (kapi.Get) call. We're also handing over an option to the call setting the Recursive option to true. Although we are recursing here, it's mainly for illustrative purposes because we are only going to handle the immediate children. However, it does show how simple it is to get that section of the hierarchy. Once we've got it we can iterate over it, extract the name of the key in the directory and then set that in the map with the value of the node.

    for _, node := range resp.Node.Nodes {
        _, setting := path.Split(node.Key)
        settings[setting] = node.Value
    }
    fmt.Println(settings)

We finish up by printing that map out. Now we need to move to keeping it up to date. For this we're going to use the "Keys API's Watcher" (kapi.Watcher). When you place a watcher on a key, there is an HTTP call underneath it waiting on that key to change. In the curl examples when a change happened, curl delivered a response. With the Watcher, we can call Watcher.Next() which will block until the next change happens. So, let's make a Watcher:

    watcher := kapi.Watcher(key, &client.WatcherOptions{Recursive: true})

We've added an option here for the Watcher to be recursive. It'll come back to us with any changes in the directory's nodes and all the way down to each value. Again, we're only processing the immediate children, but this shows how you can react to a more complex configuration tree with only one Watcher.
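If the tree did go deeper, taking only the last path component would let two different keys collide. The sketch below contrasts that approach with one that keeps the whole path below the server's directory. The relativeName helper and the cache/host and queue/host keys are hypothetical, invented here purely for illustration.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// leafName keeps only the last path component, as the example program does.
func leafName(key string) string {
	_, name := path.Split(key)
	return name
}

// relativeName is a hypothetical alternative that keeps the whole path
// below base, so nested settings do not overwrite one another.
func relativeName(base, key string) string {
	return strings.TrimPrefix(strings.TrimPrefix(key, base), "/")
}

func main() {
	base := "/config/server-15"
	keys := []string{
		"/config/server-15/database",
		"/config/server-15/cache/host", // hypothetical nested key
		"/config/server-15/queue/host", // hypothetical nested key
	}
	for _, key := range keys {
		fmt.Printf("%-30s leaf=%-10s relative=%s\n", key, leafName(key), relativeName(base, key))
	}
}
```

Both nested keys share the leaf name host, while their relative names stay distinct.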

Now, we need to go into a loop and block in watcher.Next():

    for {
        resp, err := watcher.Next(context.TODO())

We set a context.TODO() on the call's behaviour, and wait. When it returns it should give us a response or an error. This is one occasion where we want to keep an eye on the error returned.

        if err != nil {
            if _, ok := err.(*client.ClusterError); ok {
                continue
            }
            log.Fatal(err)
        }

This is specific to working with a Watcher. We've found we can get spurious ClusterErrors from the system when doing a long-lived call, as a Watcher does. So, we check to see if it was a ClusterError and if it was, we go back around the loop and wait again. Otherwise, we just error out.

As we now have a valid response from the Watcher, we can act on it. Here, we switch on the Action field of the structure. If it is "set", we do what we did when we created the map:

        switch resp.Action {
        case "set":
            _, setting := path.Split(resp.Node.Key)
            settings[setting] = resp.Node.Value

If, instead, the action was "delete" or "expire", we delete the entry from the map.

        case "delete", "expire":
            _, setting := path.Split(resp.Node.Key)
            delete(settings, setting)
        }

The "expire" is what happens when a node removes itself from the etcd data; an expire happens when a TTL (time-to-live) has run out. Now, we've updated our settings. We can act on them or, in our case, just print them out:

        fmt.Println(settings)
    }

Build this code, then start the server running in one window:

$ ./examplco --peers https://aws-eu-west-1-portal1.dblayer.com:10810,https://aws-eu-west-1-portal0.dblayer.com:10683 --user root --pass password server server-15

You can now go to another window - and as you are going to be typing the command a few times - set the environment variables, then modify server-15's configuration...

$ export EX_PEERS=https://aws-eu-west-1-portal1.dblayer.com:10810,https://aws-eu-west-1-portal0.dblayer.com:10683
$ export EX_USER=root
$ export EX_PASS=password
$ ./examplco config server-15 socket open                             
set /config/server-15/socket to open  
$ ./examplco config server-15 database postgresql19                   
set /config/server-15/database to postgresql19  
$ ./examplco config server-15 socket closed                           
set /config/server-15/socket to closed  
$

Over in the server window you'll see the changes as they happen:

map[database:postgresql12]  
map[database:postgresql12 socket:open]  
map[database:postgresql19 socket:open]  
map[database:postgresql19 socket:closed]  

.Next()

In the next part of Essentially etcd, we'll look at etcd's time-to-live settings, the power of PrevExist and PrevValue operations and tap into more of the API. We'll also take a quick look at some other etcd libraries. Till .Next() time.

Update: Part 2 of Essentially etcd is now available for you to continue your etcd reading.