API Caching with Redis and Node.js


We'll show you how to use a combination of Redis and Node.js so you can cache API queries and make your applications run quickly and predictably.

Picture the scene. There's an API endpoint you rely on, and it's slow or not as reliable as you'd like. Or you just don't want to add load to a shared resource. You want to ensure your application has predictable performance. This is where effective caching can be your biggest asset.

Let us, for example, look at querying one of data.gov's APIs, the College Scorecard. You can get an API key by signing up with your email for access to any Data.gov API. Once you have a key, put it in the DATAGOVAPIKEY environment variable with export DATAGOVAPIKEY=[your api key].

With No Cache

You can refer to the College Scorecard API documentation to learn more about this particular API, but for our purposes we're just going to use the schools endpoint, search on college name and get back the latitude and longitude for each result.

$ curl -s "https://api.data.gov/ed/collegescorecard/v1/schools?api_key=$DATAGOVAPIKEY&school.name=Harvard&fields=school.name,location.lon,location.lat&per_page=100" | jq
{
  "metadata": {
    "total": 1,
    "page": 0,
    "per_page": 100
  },
  "results": [
    {
      "school.name": "Harvard University",
      "location.lat": 42.374429,
      "location.lon": -71.118177
    }
  ]
}

We're getting data, but if you measure the time taken, it's a significant chunk of time with network latency and processing time. And it's not like Harvard is going to move.

All the code in this article is available from https://github.com/compose-ex/redisapicache.

Let's look at how we could speed this up without using a database. Here's a small app with no caching, nocache.js:

"use strict";
// Add the express web framework
const express = require("express");  
const app = express();  
const fetch = require("node-fetch");

const apikey = process.env.DATAGOVAPIKEY;

app.get("/schools", (req, resp) => {
  let terms = req.query.name;
  fetch(
    "https://api.data.gov/ed/collegescorecard/v1/schools?api_key=" +
      apikey +
      "&school.name=" +
      encodeURIComponent(terms) +
      "&fields=school.name,location.lon,location.lat&per_page=100"
  )
    .then(res => res.json())
    .then(json => {
      resp.send(json);
    })
    .catch(err => {
      console.error(err);
      resp.sendStatus(500);
    });
});

app.listen(3000);  

This uses the Express framework to run a small webserver waiting for queries on /schools and taking the name parameter as a search term. It then creates a request on the data.gov endpoint, gets the result and returns that to the requester. Let's test that....
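If you want to run nocache.js yourself, it has just two dependencies. Assuming npm and a recent Node.js, something like this sets them up:

```shell
# Install the two packages nocache.js requires
npm install express node-fetch
```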

❯ curl -s "http://localhost:3000/schools?name=Harvard" | jq
{
  "metadata": {
    "total": 1,
    "page": 0,
    "per_page": 100
  },
  "results": [
    {
      "school.name": "Harvard University",
      "location.lat": 42.374429,
      "location.lon": -71.118177
    }
  ]
}

So that works. Let's do it ten times and time the requests:

❯ for n in 1 2 3 4 5 6 7 8 9 10
for> curl -s "http://localhost:3000/schools?name=Harvard" -o /dev/null -w "\n%{time_total}"

1.054434  
0.570914  
0.676394  
0.360369  
0.749712  
0.970191  
0.570921  
0.562341  
0.514966  
0.546428  

Protip: The -s stops curl displaying progress, -o /dev/null discards the actual output, and -w "\n%{time_total}" writes out a newline and the total time the call took.

Simple. And slow. And showing all the network latency variation you might expect on the modern internet.

With A Dumb Cache

So let's do some caching, without any kind of database backing it up. We'll create a plain JavaScript object to use as a map...

let cache = {};  

and then before we go to do the fetch request, we'll check that cache.

app.get("/schools", (req, resp) => {  
  let terms = req.query.name;
  let result = cache[terms];
  if (result != null) {
    console.log("Cache hit for " + terms);
    resp.send(result);
  } else {
    console.log("Cache missed for " + terms);

Now, if we miss finding the value in the cache, then we do the fetch request. One last change to make it all work...

     .then(json => {
        cache[terms] = json;
        resp.send(json);
      })

... when we get the results back, we save them in the cache. Let's run our test again.

❯ for n in 1 2 3 4 5 6 7 8 9 10
for> curl -s "http://localhost:3000/schools?name=Harvard" -o /dev/null -w "\n%{time_total}"

0.705937  
0.008542  
0.006950  
0.016867  
0.005305  
0.005843  
0.005379  
0.006736  
0.016139  
0.005329  

Now requests are fast. Super fast, even. But there are plenty of problems hidden away in here.

First, our cache will grow and grow and never stop. Unless the app stops, in which case the whole cache is lost and every query generates a fresh request until the misses stop. Results also hang around forever and never get refreshed. And that cache can't be shared between multiple instances of the front end.
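Before fixing those problems, it's worth distilling what this dumb cache actually does. The sketch below separates the cache-then-fetch logic from Express; `fetcher` and `cachedLookup` are hypothetical names standing in for the node-fetch call and the route handler's body, not part of the article's app.

```javascript
// A distilled sketch of the cache-then-fetch logic, with the network
// call abstracted into a `fetcher` function so it can be tested alone.
const cache = {};

function cachedLookup(terms, fetcher) {
  const hit = cache[terms];
  if (hit != null) {
    // Cache hit: resolve immediately, no network round trip
    return Promise.resolve(hit);
  }
  // Cache miss: fetch, store the result, then return it
  return fetcher(terms).then(json => {
    cache[terms] = json;
    return json;
  });
}
```

Any async function returning a promise works as the `fetcher`, which also makes the logic easy to exercise without touching the network.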

With a Redis Cache

Imagine now if we could move that cache variable into a shared service. One that could take care of managing how much memory it used for this. One that could automatically expire old data and evict the least-used data when memory was tight. That's where Redis comes in.

Redis is all about data structures. It's a key/value store so we'll make use of that to store our results. First we need a Redis database. You can create your own local Redis or you can go to Compose and create one. When you do create one on Compose, you should take note of the cache option:

Creating a Redis Deployment

Redis on Compose has two easy-to-set modes: storage and cache. Storage persists all data and scales up as that data grows. Cache, on the other hand, has a fixed memory usage that you set and uses Redis's eviction algorithms to manage what's retained in memory. It also doesn't write the data to disk at any time, keeping it lean and fast. Select that checkbox and create your deployment. Once it's created, get the Connection String from the Compose Console's Overview - and don't forget to get the password - and put it in a new environment variable, COMPOSE_REDIS_URL.

export COMPOSE_REDIS_URL=redis://admin:PASSWORD@sl-eu-lon-2-portal.8.dblayer.com:23176  
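If you ever need to pull a connection string like that apart, Node's built-in WHATWG URL class handles the redis:// scheme fine; the TLS branch in the connection code below uses exactly this trick to get the hostname. The string here is the example one from above, not a live server:

```javascript
// Sketch: dissecting a Redis connection string with Node's built-in
// URL class. Username, password, host and port all come out parsed.
const parts = new URL("redis://admin:PASSWORD@sl-eu-lon-2-portal.8.dblayer.com:23176");

console.log(parts.protocol); // "redis:"
console.log(parts.username); // "admin"
console.log(parts.password); // "PASSWORD"
console.log(parts.hostname); // "sl-eu-lon-2-portal.8.dblayer.com"
console.log(parts.port);     // "23176"
```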

Ok, our Redis server is ready, let's go work on our code. Let's start with where we initialized the cache. We'll need to replace that with the code to connect to Redis.

const redis = require("redis");

let connectionString = process.env.COMPOSE_REDIS_URL;

if (connectionString === undefined) {  
  console.error("Please set the COMPOSE_REDIS_URL environment variable");
  process.exit(1);
}

let client = null;

if (connectionString.startsWith("rediss://")) {  
  client = redis.createClient(connectionString, {
    tls: { servername: new URL(connectionString).hostname }
  });
} else {
  client = redis.createClient(connectionString);
}

The next change we need to make is where we looked up the incoming request's search term. This now does a Redis GET command for a key made up of "schools/" and the query term.

app.get("/schools", (req, resp) => {  
  let terms = req.query.name;
  client.get("schools/" + terms, (err, result) => {

Notice that we are prefixing the query terms with the string "schools/". Requests could come in for any string, and we could be sharing the Redis server with some other applications. So we want to avoid eating up the key namespace with the various college names and we do that here by adding a prefix. We could also hash the terms into a fixed length string which may, or may not, aid indexing. Here though, we'll keep them readable.

The only other change is how we save results. Where we added the result to the cache variable, we now write to Redis instead.

.then(json => {
          client.setex("schools/" + terms, 300, JSON.stringify(json));
          resp.send(json);
        })

There's a little to unpack here. We're using Redis's SET command to set a key, again with the "schools/" prefix, to a given value, which we get by converting the JSON to a string. To be precise, though, we're using the SETEX command, which sets a key and also sets a time to live, or EXpiry, on that key in seconds. It's set to 300 (5*60) here so that you can see entries expire after 5 minutes. For real code, set it to whatever is appropriate.
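You can watch this behaviour from redis-cli too. The transcript below is illustrative; the exact TTL value you see depends on how long after the SETEX you ask:

```
127.0.0.1:6379> SETEX schools/Example 300 "{\"results\":[]}"
OK
127.0.0.1:6379> TTL schools/Example
(integer) 297
127.0.0.1:6379> GET schools/Example
"{\"results\":[]}"
```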

Now if we start the Redis-enabled version, it'll start caching with our Redis database. We're ready to test it now.

❯ for n in 1 2 3 4 5 6 7 8 9 10
for> curl -s "http://localhost:3000/schools?name=Harvard" -o /dev/null -w "\n%{time_total}"

0.540623  
0.023686  
0.022845  
0.023647  
0.025248  
0.024173  
0.030334  
0.031094  
0.024204  
0.019742  

Not as fast as keeping it all in memory, but acceptably fast. The big difference comes if we stop the application. Now the data will still be available in Redis until it expires. That means if we just restart our program, it'll carry on without having to reload the cache. We can also start second and third copies of the application to service more requests.

This is the simplest application level cache we could create. We could go beyond this to caching results of calculations on data retrieved from different or multiple APIs, only doing the work when a completely unseen request comes in.

Caching Up

We've run through the basics of caching with Redis here. Caching does have other considerations to bear in mind when it comes to applications and, when misapplied, can actually add load to a system. Identify the data you can effectively cache first and you will reap the benefits.


Read more articles about Compose databases - use our Curated Collections Guide for articles on each database type. If you have any feedback about this or any other Compose article, drop the Compose Articles team a line at articles@compose.com. We're happy to hear from you.


Dj Walker-Morgan
Dj Walker-Morgan was Compose's resident Content Curator, and has been both a developer and writer since Apples came in II flavors and Commodores had Pets.
