Integration Testing Against Real Databases

Integration testing can be challenging, and adding a database to the mix makes it even more so. In this Write Stuff contribution, Graham Cox talks about integration testing against live databases on both local and Compose-managed deployments.

If you've ever developed any software larger than a toy, you will no doubt know the importance of testing to ensure that it works, and especially that it interacts with other systems in an expected manner. The larger the software project, the more important it is that these tests are automated so that they are run correctly and regularly in order to catch any bugs that might be introduced.

The most fragile part of automated testing is the connection to external systems. If you're connecting to other applications that have a well-defined API, creating "mocks" that simulate responses from those APIs can be a valuable technique. However, this technique can fall short when the system you're connecting to is a database.

Mocking database calls is notoriously difficult, and to make matters worse, software systems are usually heavily dependent on their databases, which makes the data access layer a crucial thing to test. Not only must the connection to the database be handled and tested, but the queries sent to the database must also be tested for efficiency and to ensure that they return the correct data in all scenarios. Bugs at the data access layer, especially ones where the data being returned is inconsistent, can be notoriously difficult to discover and fix without a thorough test rig for your application.

Testing is made more complicated by the nuanced behavior between different database engines, and even different versions of the same engine. For example, certain queries are significantly more efficient in PostgreSQL than they are in MySQL, and some queries are more efficient in MySQL 5.6 than they are in MySQL 5.1.

This article will show some techniques for testing your code against the same database engines and versions that you run in production, so that your tests reproduce production-like environments much more accurately.

Java Libraries for "Embedded" Databases

If you are writing your application in Java, there are a number of pre-existing libraries that can be used to load up a local database system as part of your development process. These libraries will generally download the database software, set it up, run the database engine, and then shut it all down and tidy up afterward. This makes them really simple to integrate into your build when running integration tests, and into your software when you start a development version on your local system.

This section will cover a small number of database engines for which this is possible, but there are many more that can be found with some searching.

PostgreSQL

PostgreSQL can be run from inside your application, either in build-time integration tests or when running the application locally on your development system, by using the Embedded PostgreSQL Server library from Yandex QATools. This library lets you start a PostgreSQL database, downloading the version that you have specified, and stop it again afterward.

In order to use this, you simply need to add the appropriate dependency to your project, as follows:

<dependency>  
    <groupId>ru.yandex.qatools.embed</groupId>
    <artifactId>postgresql-embedded</artifactId>
    <version>2.2</version>
</dependency>  

and then write a wrapper that sets up the database exactly as you need, as follows:

import java.io.IOException;
import java.net.ServerSocket;

import ru.yandex.qatools.embed.postgresql.EmbeddedPostgres;

import static ru.yandex.qatools.embed.postgresql.distribution.Version.V9_6_2;

/**
 * Wrapper around the PostgreSQL database
 */
public class PostgresWrapper {
    /** The embedded postgres wrapper */
    private EmbeddedPostgres embeddedPostgres;
    /** The URL to connect on */
    private String connectionUrl;

    /**
     * Start PostgreSQL running
     * @throws IOException if an error occurs starting Postgres
     */
    public void start() throws IOException {
        if (embeddedPostgres == null) {
            int port = getFreePort();
            embeddedPostgres = new EmbeddedPostgres(V9_6_2);
            connectionUrl = embeddedPostgres.start("localhost", port, "dbName", "userName", "password");
        }
    }

    /**
     * Stop PostgreSQL
     */
    public void stop() {
        if (embeddedPostgres != null) {
            embeddedPostgres.stop();
            embeddedPostgres = null;
        }
    }

    /**
     * Get the URL to use to connect to the database
     * @return the connection URL
     */
    public String getConnectionUrl() {
        return connectionUrl;
    }

    /**
     * Get a free port to listen on
     * @return the port
     * @throws IOException if an error occurs finding a port
     */
    private static int getFreePort() throws IOException {
        // Open a socket on an OS-assigned port, then release it so that the database can bind to it
        try (ServerSocket s = new ServerSocket(0)) {
            return s.getLocalPort();
        }
    }
}

When you call the start() method of this class, it will download PostgreSQL 9.6.2 and start it running on a randomly selected free port, then make a connection URL available for a data source to connect to. When you call the stop() method, it will stop and delete the downloaded PostgreSQL database, completely cleaning up after itself. This code can be triggered from anywhere appropriate; for example, you might include it in a Spring context, or wrap it in a JUnit rule, as sketched below.
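
For example, a minimal sketch of the JUnit approach might look like the following, assuming JUnit 4 is on your test classpath (the PostgresRule class is our own invention, built on the PostgresWrapper above):

import java.io.IOException;

import org.junit.rules.ExternalResource;

/**
 * JUnit rule that starts PostgreSQL before the tests run and stops it afterward
 */
public class PostgresRule extends ExternalResource {
    private final PostgresWrapper wrapper = new PostgresWrapper();

    @Override
    protected void before() throws IOException {
        wrapper.start();
    }

    @Override
    protected void after() {
        wrapper.stop();
    }

    /** Expose the connection URL for the tests to use */
    public String getConnectionUrl() {
        return wrapper.getConnectionUrl();
    }
}

A test class can then declare this as a @ClassRule to get a single database that is shared by all of the tests in the class and torn down at the end.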

MongoDB

MongoDB also has a library that can be used to start and stop an embedded MongoDB database. It shares the same base code and API with the Embedded PostgreSQL library from the previous section, so it works in a very similar way.

You simply need to add the appropriate dependency to your project:

<dependency>  
    <groupId>de.flapdoodle.embed</groupId>
    <artifactId>de.flapdoodle.embed.mongo</artifactId>
    <version>2.0.0</version>
</dependency>  

and then write a wrapper that sets up the database exactly as you need, as follows:

import java.io.IOException;
import java.net.ServerSocket;

import de.flapdoodle.embed.mongo.MongodExecutable;
import de.flapdoodle.embed.mongo.MongodStarter;
import de.flapdoodle.embed.mongo.config.IMongodConfig;
import de.flapdoodle.embed.mongo.config.MongodConfigBuilder;
import de.flapdoodle.embed.mongo.config.Net;
import de.flapdoodle.embed.mongo.distribution.Version;
import de.flapdoodle.embed.process.runtime.Network;

/**
 * Wrapper around the MongoDB database
 */
public class MongoWrapper {
    /** The embedded MongoDB executable */
    private MongodExecutable mongodExecutable;
    /** The URL to connect on */
    private String connectionUrl;

    /**
     * Start MongoDB running
     * @throws IOException if an error occurs starting MongoDB
     */
    public void start() throws IOException {
        if (mongodExecutable == null) {
            int port = getFreePort();

            MongodStarter starter = MongodStarter.getDefaultInstance();
            IMongodConfig mongodConfig = new MongodConfigBuilder()
                .version(Version.V3_4_1)
                .net(new Net("localhost", port, Network.localhostIsIPv6()))
                .build();

            mongodExecutable = starter.prepare(mongodConfig);
            mongodExecutable.start();

            connectionUrl = "mongodb://localhost:" + port;
        }
    }

    /**
     * Stop MongoDB
     */
    public void stop() {
        if (mongodExecutable != null) {
            mongodExecutable.stop();
            mongodExecutable = null;
        }
    }

    /**
     * Get the URL to use to connect to the database
     * @return the connection URL
     */
    public String getConnectionUrl() {
        return connectionUrl;
    }

    /**
     * Get a free port to listen on
     * @return the port
     * @throws IOException if an error occurs finding a port
     */
    private static int getFreePort() throws IOException {
        // Open a socket on an OS-assigned port, then release it so that the database can bind to it
        try (ServerSocket s = new ServerSocket(0)) {
            return s.getLocalPort();
        }
    }
}

This wrapper works in exactly the same way as the PostgreSQL version, with the same start(), stop(), and getConnectionUrl() methods that you can use to control the database.
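
As a quick usage sketch, assuming the MongoDB Java driver 3.x is on the classpath (the database and collection names here are purely illustrative):

import org.bson.Document;

import com.mongodb.MongoClient;
import com.mongodb.MongoClientURI;
import com.mongodb.client.MongoDatabase;

public class MongoWrapperExample {
    public static void main(String[] args) throws Exception {
        MongoWrapper wrapper = new MongoWrapper();
        wrapper.start();

        // Connect using the URL that the wrapper exposes
        MongoClient client = new MongoClient(new MongoClientURI(wrapper.getConnectionUrl()));
        MongoDatabase database = client.getDatabase("test");
        database.getCollection("people").insertOne(new Document("name", "Graham"));

        client.close();
        wrapper.stop();
    }
}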

Elasticsearch

To round out our top three in Java, we'll take a look at doing the same with Elasticsearch, using the embedded-elasticsearch library from Allegro Tech. While this is a spiritual sibling to the previous two libraries, it has a slightly different API, and our wrapper is more involved this time because Elasticsearch takes more setting up.

In this case, we need to provide two ports to listen on rather than just one. If we don't provide both ports, it will fall back on the defaults, which might clash with other services running on the same machine. We also need to configure any plugins and indexes that we want to use.

In the following example, we configure the analysis-stempel plugin, which performs stemming of Polish-language text, and an index called cars whose settings are based on the contents of the car-mapping.json file on the classpath.

As before, we need to add the appropriate dependency to our system:

<dependency>  
    <groupId>pl.allegro.tech</groupId>
    <artifactId>embedded-elasticsearch</artifactId>
    <version>2.1.0</version>
</dependency>  

And then write our wrapper around it.

import java.io.IOException;
import java.net.ServerSocket;

import pl.allegro.tech.embeddedelasticsearch.EmbeddedElastic;
import pl.allegro.tech.embeddedelasticsearch.IndexSettings;
import pl.allegro.tech.embeddedelasticsearch.PopularProperties;

import static java.lang.ClassLoader.getSystemResourceAsStream;

/**
 * Wrapper around the Elasticsearch database
 */
public class ElasticsearchWrapper {
    /** The embedded Elasticsearch instance */
    private EmbeddedElastic embeddedElastic;
    /** The URL to connect on */
    private String connectionUrl;

    /**
     * Start Elasticsearch running
     * @throws IOException if an error occurs starting Elasticsearch
     * @throws InterruptedException if interrupted while waiting for Elasticsearch to start
     */
    public void start() throws IOException, InterruptedException {
        if (embeddedElastic == null) {
            int httpPort = getFreePort();
            int transportPort = getFreePort();

            embeddedElastic = EmbeddedElastic.builder()
                .withElasticVersion("5.4.0")
                .withSetting(PopularProperties.TRANSPORT_TCP_PORT, transportPort)
                .withSetting(PopularProperties.HTTP_PORT, httpPort)
                .withSetting(PopularProperties.CLUSTER_NAME, "my_cluster")
                .withPlugin("analysis-stempel")
                .withIndex("cars", IndexSettings.builder()
                    .withType("car", getSystemResourceAsStream("car-mapping.json"))
                    .build())
                .build()
                .start();

            connectionUrl = "http://localhost:" + httpPort;
        }
    }

    /**
     * Stop Elasticsearch
     */
    public void stop() {
        if (embeddedElastic != null) {
            embeddedElastic.stop();
            embeddedElastic = null;
        }
    }

    /**
     * Get the URL to use to connect to the database
     * @return the connection URL
     */
    public String getConnectionUrl() {
        return connectionUrl;
    }

    /**
     * Get a free port to listen on
     * @return the port
     * @throws IOException if an error occurs finding a port
     */
    private static int getFreePort() throws IOException {
        // Open a socket on an OS-assigned port, then release it so that the database can bind to it
        try (ServerSocket s = new ServerSocket(0)) {
            return s.getLocalPort();
        }
    }
}
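
Once started, you can sanity-check the wrapper by calling Elasticsearch's standard _cluster/health endpoint over plain HTTP. A minimal sketch using only JDK classes:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class ElasticsearchWrapperExample {
    public static void main(String[] args) throws Exception {
        ElasticsearchWrapper wrapper = new ElasticsearchWrapper();
        wrapper.start();

        // _cluster/health returns a small JSON document describing the cluster state
        URL url = new URL(wrapper.getConnectionUrl() + "/_cluster/health");
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            System.out.println(reader.readLine());
        }

        wrapper.stop();
    }
}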

Docker

If you are not using a language with similar support to the above, or you cannot find suitable libraries for the database that you wish to use, then the next option is to run the databases locally yourself.

Docker makes it incredibly easy to achieve this by including an appropriate Dockerfile or docker-compose.yml file along with your build.

This is slightly more complicated than the above because it involves integrating Docker into your build process as well. Depending on how your system works, this can be done by calling the Docker or Docker Compose tools as part of the build process, or else by interacting directly with the Docker API. However, the advantage here is that you can more closely mirror your production environments, giving more confidence that what you are running in your tests is representative of how things will work live. This is even more true if you are using Docker to deploy to production as well.

This example will use the Docker Compose tool, since we are only interested in running software rather than packaging it, and Docker Compose allows us to start as many services as are needed; for example, if you are running MySQL and RabbitMQ side by side in your environment.

An example docker-compose.yml file for running a MySQL database and a RabbitMQ message queue might look like this:

version: '2'  
services:  
    mysql:
        image: mysql/mysql-server:5.7
        ports:
            - "3306:3306"
        environment:
            - MYSQL_ROOT_PASSWORD=myRootPassword
            - MYSQL_DATABASE=testDatabase
            - MYSQL_USER=testUser
            - MYSQL_PASSWORD=testPassword
    rabbitmq:
        image: rabbitmq:3.6.9-management
        ports:
            - "5672:5672"
            - "15672:15672"
        environment:
            - RABBITMQ_DEFAULT_USER=rabbitUser
            - RABBITMQ_DEFAULT_PASS=rabbitPass

This gives us:

- A MySQL 5.7 server listening on port 3306, with a database named testDatabase accessible to the user testUser
- A RabbitMQ 3.6.9 broker listening on port 5672, with its management interface on port 15672

Any time that you want to set these services up, you need only run "docker-compose up" against this file, and then you can run your application or test suite against them. When you stop Docker Compose and start it again, you will get exactly the same database versions as before, set up completely clean again. You can also drive this from your build rather than by hand, as sketched below.
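
A minimal sketch of driving Docker Compose from Java using ProcessBuilder might look like the following; the DockerComposeRunner class is hypothetical and assumes that docker-compose is on the PATH and that the docker-compose.yml file above is in the working directory:

import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DockerComposeRunner {
    /** Run a docker-compose command and fail if it exits non-zero */
    private static void dockerCompose(String... args) throws IOException, InterruptedException {
        List<String> command = new ArrayList<>();
        command.add("docker-compose");
        command.addAll(Arrays.asList(args));

        Process process = new ProcessBuilder(command)
            .inheritIO() // stream docker-compose output to our console
            .start();
        if (process.waitFor() != 0) {
            throw new IOException("docker-compose " + String.join(" ", args) + " failed");
        }
    }

    public static void main(String[] args) throws Exception {
        dockerCompose("up", "-d"); // start MySQL and RabbitMQ in the background
        try {
            // ... run the application or test suite against localhost:3306 and localhost:5672 ...
        } finally {
            dockerCompose("down"); // stop and remove the containers again
        }
    }
}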

Compose API

Compose allows you to take this a step further by offering an API for managing its hosted databases. This allows you to programmatically create and destroy deployments across a large number of database engines and versions. The Compose API can easily be integrated into your integration tests, and even into your software when it starts up in development mode.

If you are using Compose to host your production database systems, then this is the perfect way to test your database integration, since it gives the test databases an identical platform to run on. It is even possible to run tests against a database pre-populated with a copy of your live production database, which can be invaluable for certain types of tests.

In order to use the Compose API, you will need to know both your Account ID and your Access Token, both of which can be obtained from the Compose.io account screen. It is strongly recommended that these credentials are not stored in your source code, but are passed in from the outside. This ensures that they are never committed to source control and thus can never leak out by accident.

For these examples, we will use an Account ID of "thisIsMyAccountId" and an Access Token of "theSecretAccessToken". These are not real credentials, just obvious placeholders for the real ones, which are hexadecimal numbers.

We will also show how to do this by making the raw HTTP calls from the command line. In reality, this would be done from your application using whatever makes the most sense: for example, the Spring Framework's RestTemplate in a Java application.

Starting Fresh

In order to create a clean new database for testing, you simply need to know the type and version of the database to create, and where you wish to create it. Unless you are specifically doing performance testing, the data center that hosts the database doesn't really matter, so for simplicity we will always use "us-east-1" on AWS.

Here, we are going to create a clean RethinkDB deployment for our tests. This is done by issuing a POST to "https://api.compose.io/2016-07/deployments", as follows:

$ curl -v -H "Content-Type: application/json" -H "Authorization: Bearer theSecretAccessToken" https://api.compose.io/2016-07/deployments -X POST --data '{"deployment": {"name": "article-test", "account_id": "thisIsMyAccountId", "type": "rethink", "version": "2.3.5", "datacenter": "aws:us-east-1"}}'

> Host: api.compose.io
> User-Agent: curl/7.51.0
> Accept: */*
> Content-Type: application/json
> Authorization: Bearer theSecretAccessToken
> Content-Length: 132

< Date: Mon, 29 May 2017 16:37:05 GMT  
< Connection: close  
< X-Frame-Options: SAMEORIGIN  
< X-XSS-Protection: 1; mode=block  
< X-Content-Type-Options: nosniff  
< Content-Type: application/json; charset=utf-8  
< Cache-Control: no-cache  
< X-Request-Id: 5611FCB3E92E_0A00000220FB_592C4E2A_60DA00CAC  
< X-Runtime: 6.967557  
< Vary: Accept-Encoding, Origin  
< Strict-Transport-Security: max-age=31536000  
< X-Rack-CORS: miss; no-origin  
{
  "id": "592c4e2ac6a81f001900038e",
  "account_id": "thisIsMyAccountId",
  "name": "article-test",
  "type": "rethink",
  "created_at": "2017-05-29T16:36:58Z",
  "notes": null,
  "customer_billing_code": null,
  "ca_certificate_base64": "........",
  "connection_strings": {
    "direct": [
      "rethinkdb://admin:ba355df9096a48740b03eba585d806be@aws-us-east-1-portal.25.dblayer.com:19673"
    ],
    "cli": null,
    "maps": null,
    "ssh": null,
    "health": null,
    "admin": [
      "https://aws-us-east-1-portal25.dblayer.com:19673"
    ],
    "cli_token": null,
    "cli_basic_auth": null,
    "cli_token_auth": null
  },
  "provision_recipe_id": "592c4e2ac6a81f001900038d",
  "_links": {
    "compose_web_ui": {
      "href": "https://app.compose.io/n-a-237/deployments/article-test{?embed}",
      "templated": true
    },
    "scalings": {
      "href": "https://api.compose.io/2016-07/deployments/592c4e2ac6a81f001900038e/scalings{?embed}",
      "templated": true
    },
    "backups": {
      "href": "https://api.compose.io/2016-07/deployments/592c4e2ac6a81f001900038e/backups{?embed}",
      "templated": true
    },
    "alerts": {
      "href": "https://api.compose.io/2016-07/deployments/592c4e2ac6a81f001900038e/alerts{?embed}",
      "templated": true
    },
    "cluster": {
      "href": "https://api.compose.io/2016-07/clusters/55889965e8e2b432ef000009{?embed}",
      "templated": true
    }
  }
}

The request payload contains the name, type, and version of the database to be created, as well as the Account ID that it should be created under and the data center in which to create it.

This response has two very important pieces of information that we need to keep track of: the id field, which identifies the deployment in any future API calls, including deleting it, and the connection_strings, which tell us how to connect to the newly created database.

Once you are finished with the database at the end of your tests, or after you stop the development instance of your application, you will want to delete it. This is done by sending a DELETE to "https://api.compose.io/2016-07/deployments/[id]", where [id] is the id field from the creation response, as follows:

$ curl -v -H "Authorization: Bearer theSecretAccessToken" https://api.compose.io/2016-07/deployments/592c4e2ac6a81f001900038e -X DELETE

> Host: api.compose.io
> User-Agent: curl/7.51.0
> Accept: */*
> Authorization: Bearer theSecretAccessToken

< HTTP/1.1 202 Accepted  
< Date: Mon, 29 May 2017 16:55:11 GMT  
< Connection: close  
< X-Frame-Options: SAMEORIGIN  
< X-XSS-Protection: 1; mode=block  
< X-Content-Type-Options: nosniff  
< Content-Type: application/json; charset=utf-8  
< Cache-Control: no-cache  
< X-Request-Id: 5611FCB3EA87_0A00000220FB_592C526E_108590213  
< X-Runtime: 0.390463  
< Vary: Accept-Encoding, Origin  
< Strict-Transport-Security: max-age=31536000  
< X-Rack-CORS: miss; no-origin

{
  "id": "592c526f155cd70010000285",
  "account_id": null,
  "template": "Recipes::Deployment::Deprovision",
  "status": "running",
  "status_detail": "Running destroy_capsule on aws-us-east-1-portal.25.",
  "created_at": "2017-05-29T16:55:11Z",
  "updated_at": "2017-05-29T16:55:11Z",
  "deployment_id": "592c4e2ac6a81f001900038e",
  "name": "Deprovision",
  "_embedded": {
    "recipes": []
  }
}

This indicates that the deployed database has been queued for deletion, which means that the name is free for future tests and that you will no longer be billed for its existence.
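
As a rough sketch of how these two calls might look from inside a Java application, using only the JDK's HttpURLConnection (the ComposeApiClient class is hypothetical; in practice you might prefer something like RestTemplate, plus a JSON library such as Jackson to pull the id and connection_strings fields out of the responses):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

public class ComposeApiClient {
    private static final String DEPLOYMENTS = "https://api.compose.io/2016-07/deployments";
    private final String accessToken;

    public ComposeApiClient(String accessToken) {
        this.accessToken = accessToken;
    }

    /** Create a deployment from a JSON payload, returning the raw JSON response */
    public String createDeployment(String json) throws IOException {
        return call(DEPLOYMENTS, "POST", json);
    }

    /** Queue a deployment for deletion */
    public String deleteDeployment(String id) throws IOException {
        return call(DEPLOYMENTS + "/" + id, "DELETE", null);
    }

    private String call(String url, String method, String body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod(method);
        conn.setRequestProperty("Authorization", "Bearer " + accessToken);
        if (body != null) {
            conn.setRequestProperty("Content-Type", "application/json");
            conn.setDoOutput(true);
            conn.getOutputStream().write(body.getBytes(StandardCharsets.UTF_8));
        }
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            return reader.lines().collect(Collectors.joining("\n"));
        }
    }
}

Passing the same JSON payload as in the curl example above to createDeployment creates the deployment, and passing the returned id to deleteDeployment cleans it up again.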

Production Data

If you want to create a database from an existing backup of another database, the Compose API also makes this really simple to achieve. You need to know the ID of the deployed database that the backup came from, and the ID of the backup that you wish to restore. Once you know these two things, you can create a brand new database containing an exact copy of that backup, as follows.

In this case, we are using a Deployment ID of theDeploymentId and a Backup ID of theBackupId:

$ curl -v -H "Content-type: application/json" -H "Authorization: Bearer theSecretAccessToken" https://api.compose.io/2016-07/deployments/theDeploymentId/backups/theBackupId/restore -X POST --data '{"deployment": {"name": "restore-test", "datacenter": "aws:us-east-1"}}' | json

> POST /2016-07/deployments/theDeploymentId/backups/theBackupId/restore HTTP/1.1
> Host: api.compose.io
> User-Agent: curl/7.51.0
> Accept: */*
> Content-type: application/json
> Authorization: Bearer theSecretAccessToken
> Content-Length: 71

< Date: Mon, 29 May 2017 17:19:11 GMT  
< Connection: close  
< X-Frame-Options: SAMEORIGIN  
< X-XSS-Protection: 1; mode=block  
< X-Content-Type-Options: nosniff  
< Content-Type: application/json; charset=utf-8  
< Cache-Control: no-cache  
< X-Request-Id: 5611FCB3EC72_0A00000220FB_592C5803_9D101CF6  
< X-Runtime: 11.828383  
< Vary: Accept-Encoding, Origin  
< Strict-Transport-Security: max-age=31536000  
< X-Rack-CORS: miss; no-origin

{
  "id": "592c5803328838001400000c",
  "account_id": "thisIsMyAccountId",
  "name": "restore-test",
  "type": "mongodb",
  "created_at": "2017-05-29T17:18:59Z",
  "notes": null,
  "customer_billing_code": null,
  "ca_certificate_base64": null,
  "connection_strings": {
    "direct": [
      "mongodb://admin:MQGWQMWLNACGXEZQ@aws-us-east-1-portal.20.dblayer.com:16049,aws-us-east-1-portal.19.dblayer.com:16049/admin?ssl=true"
    ],
    "cli": [
      "mongo --ssl --sslAllowInvalidCertificates aws-us-east-1-portal.20.dblayer.com:16049/admin -u admin -p MQGWQMWLNACGXEZQ"
    ],
    "maps": null,
    "ssh": null,
    "health": null,
    "admin": null,
    "cli_token": null,
    "cli_basic_auth": null,
    "cli_token_auth": null
  },
  "provision_recipe_id": "592c5803328838001400000a",
  "_links": {
    "compose_web_ui": {
      "href": "https://app.compose.io/n-a-237/deployments/restore-test{?embed}",
      "templated": true
    },
    "scalings": {
      "href": "https://api.compose.io/2016-07/deployments/592c5803328838001400000c/scalings{?embed}",
      "templated": true
    },
    "backups": {
      "href": "https://api.compose.io/2016-07/deployments/592c5803328838001400000c/backups{?embed}",
      "templated": true
    },
    "alerts": {
      "href": "https://api.compose.io/2016-07/deployments/592c5803328838001400000c/alerts{?embed}",
      "templated": true
    },
    "cluster": {
      "href": "https://api.compose.io/2016-07/clusters/572dee2d71133800160006ae{?embed}",
      "templated": true
    }
  }
}

The response from this works in exactly the same way as before, giving you the Deployment ID needed for cleaning up and the connection URL needed for connecting to the database. The only difference is that this database will already be populated with the data present in the backup instead of being completely empty.

Note that this will have created a new deployed database with the name provided in the request; you do not need to create a database to restore the backup into.

As before, you should clean up afterward to ensure that you do not leave resources lying around that are not needed.

Summary

Integration testing against a real database can be an invaluable tool for your testing toolbox. It can give you early indications of problems and allow you to test and tune your application in a real-world setup.

This article has hopefully given you some ideas on how to integrate this level of testing into your own applications. And, as always, there are many more options than are covered here, so please explore and have fun.


Do you want to shed light on a favorite feature in your preferred database? Why not write about it for Write Stuff?

Graham Cox is a Software Developer from the UK who has been writing software for almost 15 years, predominantly in Java and JavaScript but also covering a wide variety of other languages.

Image attribution: Denys Nevozhai

This article is licensed with CC-BY-NC-SA 4.0 by Compose.