Building a dynamic configuration service with etcd and Python

A dive into etcd and the creation of a Python library to manage dynamic configuration are the subject of Gigi Sayfan's latest Write Stuff article.

Building robust and performant distributed systems is hard. The main reason is that typically everything is in flux. There is no well-defined spec for hardware, software and usage, and everything evolves at the same time. Your system's capabilities evolve. If you run in the cloud, your cloud provider may change the hardware under you. Changes in data volume and user access patterns will push you to new architectural choices. At scale, the cost of hardware, storage and bandwidth becomes the dominant factor, rather than the people managing your infrastructure. Add to that the dynamic nature of networked systems, where component failure is a day-to-day reality and not a theoretical possibility.

Managing that change and turmoil requires some deliberate design decisions. You can't hard-code all this flexibility into your software, which means you'll have to be able to configure it without making code changes. There are several well-known mechanisms to do that such as command-line arguments, environment variables and configuration files. All have their own time and place.

But in a large-scale distributed systems context (think hundreds or thousands of servers) they all suffer from a significant downside. They all require a deployment (if pushing a new configuration file) and/or restarting a process in order to provide new command-line arguments or environment variables. This push-based approach is problematic because it's fragile. Some servers will be down and you'll have to keep track of which servers got the new configuration and which ones didn't. The ones that didn't can create all kinds of issues when they eventually come back online and keep running with the old configuration.

An alternative is a pull-based approach, where processes periodically pull the most up-to-date configuration from a central configuration repository. The ultimate solution is for a process to be notified in real time whenever its configuration changes, so it can reconfigure itself almost immediately. In this article I'll show you how to do just that and build a dynamic configuration service based on etcd using a Python library called conman.

A Quick Introduction to etcd

Etcd is a distributed key-value store that provides a reliable way to store data across a cluster of machines. It uses the Raft consensus algorithm and is great at keeping critical data correct and available. You can organize your keys in directories to get hierarchies, and get notified when keys are added, removed or changed in directories you watch.

Let's play around with etcd to get a sense of how the different concepts work together. You can install a local etcd cluster by following the instructions here for creating a local etcd cluster.

But it may be easier to play with an existing cluster. Compose.io offers a 30-day trial of a full-fledged 3-node etcd cluster running on AWS (or Softlayer - editor). You can sign up here. You'll get access to a nice dashboard.

etcdctl is your command-line client. You can use it interactively or in scripts, although for serious programming I recommend using a proper etcd client library in your favorite language.

I created this alias to quickly connect to my Compose.io etcd instance using etcdctl.

alias e='etcdctl --ca-file ~/compose_etcd.pk --no-sync --peers https://aws-us-east-1-portal10.dblayer.com:10835,https://aws-us-east-1-portal11.dblayer.com:27265 -u root:*********'

Note the --ca-file ~/compose_etcd.pk option. You need to create a file that contains the public key provided to you in the compose.io etcd dashboard.

Here are the commands supported by etcdctl:

   backup          backup an etcd directory
   cluster-health  check the health of the etcd cluster
   mk              make a new key with a given value
   mkdir           make a new directory
   rm              remove a key or a directory
   rmdir           removes the key if it is an empty directory or a key-value pair
   get             retrieve the value of a key
   ls              retrieve a directory
   set             set the value of a key
   setdir          create a new or existing directory
   update          update an existing key with a given value
   updatedir       update an existing directory
   watch           watch a key for changes
   exec-watch      watch a key for changes and exec an executable
   member          member add, remove and list subcommands
   import          import a snapshot to a cluster
   user            user add, grant and revoke subcommands
   role            role add, grant and revoke subcommands
   auth            overall auth controls
   help, h         Shows a list of commands or help for one command

Running etcdctl --help will list these commands along with the global options.

Playing with etcd

Let's play a little with etcd and on the way I'll introduce some key concepts. Etcd is a key-value store. Let's create some keys then using the "mk" command:

~ > e mk x 3
3  
~ > e mk y 123
123  

Now we can observe the keys using the "ls" command:

~ > e ls
/x
/y

To get the value of the key you use... wait for it... the "get" command:

~ > e get /y
123  

That's pretty straightforward. You can also create keys (like "mk") or update existing keys with the "set" command.

~ > e set new 6
6  
~ > e ls
/x
/y
/new
~ > e set x 777
777  
~ > e get new
6  
~ > e get x
777  

You may have noticed that etcd adds a forward slash / before keys. That's because keys are organized in a directory structure. If you just create keys like I did, they are put in the root directory, /. But you can also create directories, either explicitly using the "mkdir" command or implicitly by setting keys that include paths. For example, to create a directory "d" that contains two keys, "a" with the value 4 and "b" with the value 5:

~ > e set d/a 4
4

~ > e set d/b 5
5

~ > e ls d
/d/a
/d/b

~ > e get d/a
4

~ > e get d/b
5  

You can also remove keys or directories using the "rm" command, which has the --recursive option. Let's get rid of the entire "d" directory:

~ > e rm --recursive d
~ > e ls
/x
/y
/new

So, etcd looks pretty similar to a distributed file system from an administrative point of view. That's great because you should have a mental model for how to organize information and how to manage it in etcd.

Etcd has another cool feature: TTLs (time to live). When you create a key you can decide how long it hangs around. There are many situations where you may want keys to expire after a while, and with etcd you don't have to implement this logic yourself and remember to delete keys after they expire.

Here I set a key with a TTL of 5 seconds and try to get it several times. See what happens after 5 seconds:

~ > e mk e 4 --ttl "5"
4  
~ > e get e
4  
~ > e get e
4  
~ > e get e
Error:  100: Key not found (/e) [45]  
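To make the expiry behavior concrete, here is a minimal in-memory sketch of TTL semantics in Python. It only illustrates the idea and is not how etcd implements TTLs. The TTLStore class and its injectable clock are hypothetical names I made up for this example.

```python
import time


class TTLStore:
    """In-memory illustration of key expiry (hypothetical, not etcd code)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so expiry can be tested without sleeping
        self._data = {}       # key -> (value, expiry time or None)

    def set(self, key, value, ttl=None):
        expires_at = None if ttl is None else self._clock() + ttl
        self._data[key] = (value, expires_at)

    def get(self, key):
        value, expires_at = self._data[key]   # KeyError if the key was never set
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]               # lazily drop the expired key
            raise KeyError(key)
        return value


if __name__ == '__main__':
    store = TTLStore()
    store.set('e', '4', ttl=5)
    print(store.get('e'))   # still available right after it was set
```

Just like in the transcript above, a get() within the TTL returns the value and a get() after it fails with a "not found" style error.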

So, TTLs are cool. What is even cooler is the "watch" command. With "watch" you can get notified when watched keys or directories are modified. Let's see it in action. Since "watch" is blocking (waiting for something to happen) I'll make my changes in a separate terminal window. Here is the watch terminal originally, watching the root recursively:

~ > e watch / --recursive

This command will block until some change occurs in the watched key or any of its descendants, since I specified the "--recursive" option.

Now, in a separate terminal window I set a new key:

~ > e set a "what this."
what this.  

And immediately, in the original window the "watch" command returned with the information:

~ > e watch / --recursive
[set] /a
what this.  
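The difference between a plain watch and a recursive watch can be sketched in a few lines of Python. This is a toy in-memory model under my own assumptions, not etcd's implementation. WatchableStore and its methods are hypothetical names.

```python
class WatchableStore:
    """Toy key-value store that notifies watchers on set() (hypothetical)."""

    def __init__(self):
        self._data = {}
        self._watchers = []   # list of (path, recursive, callback)

    def watch(self, path, callback, recursive=False):
        # Normalize '/d/' to '/d', but keep the root as '/'.
        self._watchers.append((path.rstrip('/') or '/', recursive, callback))

    def set(self, key, value):
        self._data[key] = value
        for path, recursive, callback in self._watchers:
            if key == path or (recursive and self._is_under(key, path)):
                callback('set', key, value)

    @staticmethod
    def _is_under(key, path):
        # '/d/a' is under '/d', but '/dx' is not.
        prefix = path if path.endswith('/') else path + '/'
        return key.startswith(prefix)


if __name__ == '__main__':
    store = WatchableStore()
    store.watch('/', lambda action, key, value: print([action], key, value),
                recursive=True)
    store.set('/a', 'what this.')
```

A non-recursive watcher fires only for the exact key it watches, while a recursive watcher on / sees every change in the store.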

Normally, when you watch a key, you would want to do something when it is modified. The "exec-watch" command lets you execute arbitrary commands when the watched key is modified.

Other Things You Can Do With Etcd

I will just mention a few other things etcd provides. You can manage authentication, users and roles, grant and revoke read/write privileges on a path basis. You can backup and import snapshots and you can manage the members of the cluster.

The ConMan Library

The ConMan library is a Python library for program configuration. It supports configuration files in various formats but most importantly etcd. It is built on top of python-etcd.

The ConManEtcd class allows you to connect to an etcd instance and then add keys you're interested in. Then it exposes the key and all its subdirectories as a nested Python dictionary. You can refresh at any point or even watch the key and get notified when something is changed.
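To picture what "a nested Python dictionary" means here, the following sketch converts flat etcd-style key paths into nested dicts. It is my own illustration of the mapping, not conman's actual code, and the nest() function is a hypothetical name.

```python
def nest(flat):
    """Turn {'/d/a': '4', '/d/b': '5'} into {'d': {'a': '4', 'b': '5'}}.

    Assumes every path has at least one non-empty segment.
    """
    tree = {}
    for path, value in flat.items():
        parts = [p for p in path.split('/') if p]
        node = tree
        for part in parts[:-1]:
            node = node.setdefault(part, {})   # create directories on demand
        node[parts[-1]] = value
    return tree


if __name__ == '__main__':
    print(nest({'/x': '777', '/d/a': '4', '/d/b': '5'}))
```

Each directory becomes a dict and each leaf key becomes an entry in the dict of its parent directory.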

Good examples can be found in conman_etcd_test.py.

Let's look at one of the test methods to get an idea of how to use ConManEtcd, and break it down. Here is the entire test_refresh() function:

def test_refresh(self):  
    self.assertFalse('refresh_test' in self.conman)

    # Insert a new key to etcd
    set_key(self.conman.client, 'refresh_test', dict(a='1'))

    # The new key should still not be visible by conman
    self.assertFalse('refresh_test' in self.conman)

    # Refresh to get the new key
    self.conman.refresh('refresh_test')

    # The new key should now be visible by conman
    self.assertEqual(dict(a='1'), self.conman['refresh_test'])

    # Change the key
    set_key(self.conman.client, 'refresh_test', dict(b='3'))

    # The previous value should still be visible by conman
    self.assertEqual(dict(a='1'), self.conman['refresh_test'])

    # Refresh again
    self.conman.refresh('refresh_test')

    # The new value should now be visible by conman
    self.assertEqual(dict(b='3'), self.conman['refresh_test'])

The test_refresh() function makes sure that when the data for a particular key is modified on etcd and conman's refresh() method is called, the updated data shows up. "self.conman" is a ConManEtcd object that was created in the setup phase. "set_key()" is a utility function that sets keys on the etcd instance.

First, let's verify that the key "refresh_test" is not present in self.conman. ConManEtcd exposes a dict interface, so you can use the "in" operator to check if keys exist or not:

self.assertFalse('refresh_test' in self.conman)  

Then, let's add the key to the etcd instance using the set_key() helper. The value is a dictionary, "dict(a='1')", so it will actually create a directory on etcd with a sub-key "/refresh_test/a" whose value is "1".

# Insert a new key to etcd
set_key(self.conman.client, 'refresh_test', dict(a='1'))  
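As a rough sketch of how a dictionary value could map onto etcd keys, here is a small function that flattens a nested dict into (path, value) pairs. This is an assumption about what a helper like set_key() has to do conceptually, not the actual helper from the conman tests. flatten() is a hypothetical name.

```python
def flatten(key, value):
    """Yield (path, leaf_value) pairs for a possibly nested dict."""
    if isinstance(value, dict):
        for sub_key, sub_value in value.items():
            # Each dict level becomes one more path segment (a directory).
            yield from flatten('{}/{}'.format(key, sub_key), sub_value)
    else:
        yield '/' + key.lstrip('/'), value


if __name__ == '__main__':
    for path, leaf in flatten('refresh_test', dict(a='1')):
        print(path, leaf)
```

Each pair could then be written to etcd as an ordinary key, which implicitly creates the enclosing directory.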

So, the state has changed on etcd, but our self.conman wasn't refreshed since the change, so it should still be unaware of the new state under the "refresh_test" key.

# The new key should still not be visible by conman
self.assertFalse('refresh_test' in self.conman)  

OK. Let's refresh and verify self.conman has the new state.

# Refresh to get the new key
self.conman.refresh('refresh_test')

# The new key should now be visible by conman
self.assertEqual(dict(a='1'), self.conman['refresh_test'])  

Yeah, it worked. But, what if we change an existing key?

# Change the key
set_key(self.conman.client, 'refresh_test', dict(b='3'))

# The previous value should still be visible by conman
self.assertEqual(dict(a='1'), self.conman['refresh_test'])

# Refresh again
self.conman.refresh('refresh_test')

# The new value should now be visible by conman
self.assertEqual(dict(b='3'), self.conman['refresh_test'])  

Yep. The refresh() method of ConManEtcd works as expected and can sync with the etcd instance.
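The behavior the test verifies boils down to a local snapshot that changes only when refresh() is called. Here is a minimal sketch of that idea with a plain dict standing in for the etcd cluster. CachedConfig is a hypothetical class for illustration and says nothing about how ConManEtcd is implemented internally.

```python
import copy


class CachedConfig:
    """Dict-like local snapshot that changes only on refresh() (hypothetical)."""

    def __init__(self, remote):
        self._remote = remote   # stand-in for the etcd cluster
        self._local = {}

    def refresh(self, key):
        # Replace the local copy of one key with the current remote state.
        self._local[key] = copy.deepcopy(self._remote[key])

    def __contains__(self, key):
        return key in self._local

    def __getitem__(self, key):
        return self._local[key]


if __name__ == '__main__':
    remote = {'refresh_test': {'a': '1'}}
    conf = CachedConfig(remote)
    print('refresh_test' in conf)   # the remote change is not visible yet
    conf.refresh('refresh_test')
    print(conf['refresh_test'])
```

Reads never touch the remote store, which is what makes them cheap, and also why they can be stale until the next refresh.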

Dynamic Configuration with ConMan

Refreshing state is fine, but very often you want to always work with the most up-to-date state. You could repeatedly call refresh(), but a much better way is to watch for changes. ConManEtcd supports the watch functionality with a nice callback interface. dyn_conf_program.py is a complete sample program that is configured dynamically. Whenever the configuration on etcd is changed, it gets notified via a callback and writes the change to a file. Finally, when a special key called "/dyn_conf/stop" is set to 1, it exits.

Let's go over the different aspects of the dyn_conf_program.py program. The whole program is contained in a single class called Program that is instantiated when the script is executed. You need to pass a key to watch and a filename as well as various connectivity arguments, but it connects by default to a local etcd instance.

if __name__ == '__main__':  
    Program(*sys.argv[1:])

The __init__() method creates a ConManEtcd object, passing its own on_configuration_change() method as the on_change callback argument. It also opens and truncates the file indicated by filename (for testing purposes). It then stores the filename and the key, initializes a variable called "last_change" to None and calls its own run() method.

class Program(object):
    def __init__(self,
                 key,
                 filename,
                 protocol='http',
                 host='127.0.0.1',
                 port=4001,
                 username=None,
                 password=None):
        self.conman = ConManEtcd(protocol=protocol,
                                 host=host,
                                 port=int(port),
                                 username=username,
                                 password=password,
                                 on_change=self.on_configuration_change,
                                 watch_timeout=5)
        self.filename = filename
        # Create or truncate the output file, then close it right away
        open(self.filename, 'w').close()
        self.key = key
        self.last_change = None
        self.run()

The run() method starts by refreshing the key so it is up to date with the current state, then it watches the key by calling self.conman.watch(). Note that this is NOT a blocking call because conman runs each watch in a separate thread. Then the run() method gets into a loop where it constantly checks if there is a sub-key called "stop" with a value of "1". In that case it writes "Stopping..." to the output file, stops the watchers and returns, effectively ending the program. Otherwise, it sleeps for one second and checks conman again. This is a classic example of dynamic configuration where you can terminate the program remotely by setting the "stop" key in etcd to "1". The program is very efficient because it doesn't refresh the entire state from etcd every second. It just checks the state of the local conman.

def run(self):  
    self.conman.refresh(self.key)
    self.conman.watch(self.key)
    while True:
        if self.conman[self.key].get('stop') == '1':
            open(self.filename, 'a').write('Stopping...\n')
            self.conman.stop_watchers()
            return
        time.sleep(1)
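The pattern here is a main loop that polls cheap local state while a background thread keeps that state current. The sketch below reproduces it with the standard library only. The fake_watcher() function is a hypothetical stand-in for conman's watch thread, and the lock and the short intervals are my own choices for the example.

```python
import threading
import time


def run_until_stopped(config, lock, poll_interval=0.01):
    """Poll a local dict until its 'stop' entry equals '1'."""
    polls = 0
    while True:
        with lock:
            if config.get('stop') == '1':
                return polls
        polls += 1
        time.sleep(poll_interval)


def fake_watcher(config, lock, delay=0.05):
    # Stand-in for the watch thread: updates the local state after a delay.
    time.sleep(delay)
    with lock:
        config['stop'] = '1'


config, lock = {}, threading.Lock()
watcher = threading.Thread(target=fake_watcher, args=(config, lock))
watcher.start()
polls = run_until_stopped(config, lock)
watcher.join()
print('stopped after', polls, 'polls')
```

The main loop never talks to the remote store. It only reads the local dict, and the background thread is the only writer.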

OK. So, if run() checks only the local conman's state and never refreshes, how does it know when the "stop" key becomes "1"? That's the beauty of the watch functionality. Whenever someone modifies the watched key, the callback method "on_configuration_change()" will be called. This method ignores immediate repeats of the same change, which can happen in a distributed database like etcd, so handling a repeated notification is idempotent. When a new change arrives, it writes it to the output file and also does a refresh, so the local conman is up to date with the state on etcd.

def on_configuration_change(self, key, action, value):  
    # Sometimes the same change is reported multiple times. Ignore repeats
    if self.last_change == (key, action, value):
        return

    self.last_change = (key, action, value)
    line = 'key: {}, action: {}, value: {}\n'.format(key,
                                                    action,
                                                    value)
    open(self.filename, 'a').write(line)
    self.conman.refresh(self.key)

There are also full-fledged integration tests, which run 3 separate dyn_conf_program.py programs, set keys on etcd and eventually set the "stop" sub-key to "1" in order to terminate all 3 programs. Check it out here.

Conclusion

Etcd is a powerful and very reliable distributed database designed for special use cases. One of these use cases is program configuration. This article explained the need and rationale for reliable dynamic configuration of a large number of programs across a distributed system and introduced a Python library called conman that can be used to dynamically configure your programs remotely. Give etcd a try. It's fun and useful.

Gigi Sayfan is the director of software infrastructure at Aclima (http://aclima.io), a start-up company that designs and deploys distributed sensor networks that enable a higher level of environmental awareness. Gigi has been developing software professionally for 20 years in domains as diverse as instant messaging, morphing, chip fabrication process control, embedded multi-media application for game consoles, brain-inspired machine learning, custom browser development, web services for 3D distributed game platform and most recently IoT/sensors.

This article is licensed with CC-BY-NC-SA 4.0 by Compose.