Metrics Maven: Calculating a Weighted Moving Average in PostgreSQL


In our Metrics Maven series, Compose's data scientist shares database features, tips, tricks, and code you can use to get the metrics you need from your data. In this article, we'll calculate a weighted moving average.

In this article, we'll learn how to calculate a weighted moving average in PostgreSQL. To do that, we'll combine what we learned in our last article on weighted averages with one of the approaches from our earlier article on calculating a moving average. With the concepts from both already under our belts, it'll be a breeze.

Let's go.

Weighted moving average

A weighted moving average smooths out a trend over time, making it easier to read how the trend is progressing, while letting us apply a weighting factor so that some of the data is treated as "more important" than the rest. Weighted moving averages are often applied to stock prices to make it easy to see whether the price is generally moving up or down, giving more weight to recent dates and less weight to earlier ones. That's because recent prices are expected to be more indicative of future prices than older ones. In business, weighted moving averages are sometimes used to forecast future costs or revenue from historical data.

As usual, it's easier to explain by example, so let's use the hypothetical pet supply business from our recent articles on mean, median, and mode to understand the concept better. What we'd like to know for our business is how the 7-day average order value is trending. Similar to stock prices, we believe that more recent order values are more indicative of future order values than older ones are. To see how the average order value is trending, we're going to calculate a 7-day weighted moving average for it.

Our data

For our weighted moving average calculation, we're going to want summary data at the daily level.

For the daily average order value, we can use what we learned in our article on mean. The example in that article has one day's worth of orders, which we calculated to have an average value of 19.98. If we follow that technique to get the average value for each day, ignoring orders with 0 items since those are invalid, we can create an orders summary table that looks like this:

date       | total_orders | total_order_items | total_order_value | average_order_items | average_order_value  
--------------------------------------------------------------------------------------------------------------
2017-01-01 | 14           | 18                | 106.84            | 1.29                | 7.63  
2017-01-02 | 10           | 21                | 199.79            | 2.10                | 19.98  
2017-01-03 | 12           | 17                | 212.98            | 1.42                | 17.75  
2017-01-04 | 12           | 15                | 100.93            | 1.25                | 8.41  
2017-01-05 | 10           | 13                | 108.54            | 1.30                | 10.85  
2017-01-06 | 14           | 20                | 216.78            | 1.43                | 15.48  
2017-01-07 | 13           | 16                | 198.32            | 1.23                | 15.26  
2017-01-08 | 10           | 12                | 124.67            | 1.20                | 12.47  
2017-01-09 | 10           | 16                | 140.88            | 1.60                | 14.09  
2017-01-10 | 17           | 19                | 136.98            | 1.12                | 8.06  
2017-01-11 | 12           | 14                | 99.67             | 1.17                | 8.31  
2017-01-12 | 11           | 15                | 163.52            | 1.36                | 14.87  
2017-01-13 | 10           | 18                | 207.43            | 1.80                | 20.74  
2017-01-14 | 14           | 20                | 199.68            | 1.43                | 14.26  
2017-01-15 | 16           | 22                | 207.56            | 1.38                | 12.97  
2017-01-16 | 14           | 19                | 176.76            | 1.36                | 12.63  
2017-01-17 | 13           | 18                | 184.48            | 1.38                | 14.19  
2017-01-18 | 14           | 25                | 265.98            | 1.79                | 19.00  
2017-01-19 | 10           | 17                | 178.42            | 1.70                | 17.84  
2017-01-20 | 19           | 24                | 139.67            | 1.26                | 7.35  
2017-01-21 | 15           | 21                | 187.66            | 1.40                | 12.51  
2017-01-22 | 19           | 24                | 226.98            | 1.26                | 11.95  
2017-01-23 | 17           | 24                | 212.64            | 1.41                | 12.51  
2017-01-24 | 16           | 21                | 187.43            | 1.31                | 11.71  
2017-01-25 | 19           | 27                | 244.67            | 1.42                | 12.88  
2017-01-26 | 20           | 29                | 267.44            | 1.45                | 13.37  
2017-01-27 | 17           | 25                | 196.43            | 1.47                | 11.55  
2017-01-28 | 21           | 28                | 234.87            | 1.33                | 11.18  
2017-01-29 | 18           | 29                | 214.66            | 1.61                | 11.93  
2017-01-30 | 14           | 20                | 199.68            | 1.43                | 14.26  
2017-02-01 | 19           | 27                | 189.98            | 1.42                | 10.00  
2017-02-02 | 22           | 31                | 274.98            | 1.41                | 12.50  
2017-02-03 | 20           | 28                | 213.76            | 1.40                | 10.69  
2017-02-04 | 21           | 30                | 242.78            | 1.43                | 11.56  
2017-02-05 | 22           | 34                | 267.88            | 1.55                | 12.18  
2017-02-06 | 19           | 24                | 209.56            | 1.26                | 11.03  
2017-02-07 | 21           | 33                | 263.76            | 1.57                | 12.56  

This will be our base data for calculating our weighted moving average in this article, referred to as daily_orders_summary.
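If you don't already have a summary table like this, here's a minimal sketch of how one might be built. It assumes a hypothetical orders table with order_date, item_count, and order_value columns; adjust the names to match your own schema:

-- A sketch for building the summary from a hypothetical "orders" table
CREATE TABLE daily_orders_summary AS
SELECT order_date AS date,
       COUNT(*) AS total_orders,
       SUM(item_count) AS total_order_items,
       SUM(order_value) AS total_order_value,
       ROUND(AVG(item_count), 2) AS average_order_items,
       ROUND(AVG(order_value), 2) AS average_order_value
FROM orders
WHERE item_count > 0 -- skip invalid orders with 0 items
GROUP BY order_date;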

The weighting factor

In a weighted moving average, we're going to add weights to our values before aggregating them, similar to how we did in our previous article on weighted averages; however, the weights for a weighted moving average are fractional values that must add up to 1. Since we're doing a 7-day weighted moving average, we're going to create our weights as fractions of 28. We get 28 by summing the days 1-7: (7 + 6 + 5 + 4 + 3 + 2 + 1) = 28.

The date furthest in the past will have the least weight, 1/28. The second oldest date will have the weight 2/28. The third oldest date will have the weight 3/28. And so on...

We can construct a weighting factor table for reference that looks like this:

days_past | fraction | weight
-----------------------------
0         | 7/28     | 0.25
1         | 6/28     | 0.21
2         | 5/28     | 0.18
3         | 4/28     | 0.14
4         | 3/28     | 0.11
5         | 2/28     | 0.07
6         | 1/28     | 0.04

The most recent date, 0 days in the past, carries the most weight at 0.25 in this scenario. You can adjust the weights to suit your situation. If you're also doing a 7-day weighted moving average and 0.25 feels too low for the most recent date, you can increase it as long as you decrease the weight elsewhere. Just make sure that the weights cover all the days you want to consider and that they add up to 1. For us, the check works out: (0.25 + 0.21 + 0.18 + 0.14 + 0.11 + 0.07 + 0.04) = 1.0.
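If you'd rather not work out the weights by hand, you can also derive them in SQL. Here's a quick sketch using generate_series that produces the same 7 rounded weights as the table above; it's just a sanity check, since the query in the next section hard-codes the rounded values:

-- Derive each weight as (7 - days_past) / 28, rounded to 2 decimal places
SELECT days_past,
       (7 - days_past) || '/28' AS fraction,
       ROUND((7 - days_past) / 28.0, 2) AS weight
FROM generate_series(0, 6) AS days_past;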

Our 7-day weighted moving average

The most straightforward way to calculate a weighted moving average in PostgreSQL builds on the alternative method we shared in our previous article on calculating a moving average. That approach joins two aliases of our data table, comparing one against the other based on date intervals. Take a look at the section called "An alternative method for a simple moving average" in the previous article if you aren't familiar with this technique or just need a refresher.

We're going to expand on that approach by adding a CASE expression to apply the weights and then summing the weighted values to get our weighted moving average. Our query looks like this:

SELECT dos_1.date,
       ROUND(AVG(dos_2.average_order_value), 2) AS moving_average,
       ROUND(SUM(CASE
           WHEN dos_1.date - dos_2.date = 0 THEN 0.25 * dos_2.average_order_value -- most recent date
           WHEN dos_1.date - dos_2.date = 1 THEN 0.21 * dos_2.average_order_value
           WHEN dos_1.date - dos_2.date = 2 THEN 0.18 * dos_2.average_order_value
           WHEN dos_1.date - dos_2.date = 3 THEN 0.14 * dos_2.average_order_value
           WHEN dos_1.date - dos_2.date = 4 THEN 0.11 * dos_2.average_order_value
           WHEN dos_1.date - dos_2.date = 5 THEN 0.07 * dos_2.average_order_value
           WHEN dos_1.date - dos_2.date = 6 THEN 0.04 * dos_2.average_order_value -- date furthest in the past
       END), 2) AS weighted_moving_average
FROM daily_orders_summary dos_1
JOIN daily_orders_summary dos_2
    ON dos_2.date >= dos_1.date - interval '6 days'
    AND dos_2.date <= dos_1.date
GROUP BY dos_1.date
ORDER BY dos_1.date;

By comparing the dates in the two aliases across a window of 7 total days (the current date plus the 6 days before it), we can determine how many days in the past each row falls and apply the corresponding weight to its average order value. Summing the 7 weighted values gives us the weighted moving average for each date.
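As a quick sanity check, we can work the first usable row by hand. For 2017-01-07, the 7 weighted values come from the daily averages for January 1 through January 7:

(0.25 * 15.26) + (0.21 * 15.48) + (0.18 * 10.85) + (0.14 * 8.41) + (0.11 * 17.75) + (0.07 * 19.98) + (0.04 * 7.63) = 13.85

That matches the first weighted_moving_average value in the results below.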

Notice that we've also included the simple moving average in our query for comparison. For both values, we're using ROUND to 2 decimal places, a technique we covered in our article on making data pretty.

Our results will look like this:

date       | moving_average | weighted_moving_average  
------------------------------------------------------
2017-01-07 | 13.62          | 13.85  
2017-01-08 | 14.31          | 13.59  
2017-01-09 | 13.47          | 13.55  
2017-01-10 | 12.09          | 12.15  
2017-01-11 | 12.07          | 11.25  
2017-01-12 | 12.65          | 11.95  
2017-01-13 | 13.40          | 13.97  
2017-01-14 | 13.26          | 14.13  
2017-01-15 | 13.33          | 14.09  
2017-01-16 | 13.12          | 13.89  
2017-01-17 | 14.00          | 14.19  
2017-01-18 | 15.52          | 15.43  
2017-01-19 | 15.95          | 16.03  
2017-01-20 | 14.03          | 13.86  
2017-01-21 | 13.78          | 13.51  
2017-01-22 | 13.64          | 13.02  
2017-01-23 | 13.62          | 12.78  
2017-01-24 | 13.27          | 12.27  
2017-01-25 | 12.39          | 12.21  
2017-01-26 | 11.75          | 12.39  
2017-01-27 | 12.35          | 12.37  
2017-01-28 | 12.16          | 12.07  
2017-01-29 | 12.16          | 12.02  
2017-01-30 | 12.41          | 12.54  
2017-01-31 | 12.17          | 11.93  
2017-02-01 | 12.11          | 12.04  
2017-02-02 | 11.73          | 11.65  
2017-02-03 | 11.73          | 11.64  
2017-02-04 | 11.87          | 11.72  
2017-02-05 | 11.75          | 11.54  
2017-02-06 | 11.50          | 11.72  

We'll ignore dates before January 7 since, in our data set, earlier dates don't have enough prior days to accurately calculate a 7-day weighted moving average.
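As an aside, if your summary table is guaranteed to have exactly one row per calendar day, you could compute the same weighted moving average without the self-join by using the LAG window function. This is just a sketch under that assumption: LAG counts rows rather than days, so any gap in the dates would silently shift the weights, whereas the interval-based join handles gaps correctly.

-- Window-function sketch (assumes one row per day with no date gaps)
SELECT date,
       ROUND(0.25 * average_order_value
           + 0.21 * LAG(average_order_value, 1) OVER w
           + 0.18 * LAG(average_order_value, 2) OVER w
           + 0.14 * LAG(average_order_value, 3) OVER w
           + 0.11 * LAG(average_order_value, 4) OVER w
           + 0.07 * LAG(average_order_value, 5) OVER w
           + 0.04 * LAG(average_order_value, 6) OVER w, 2) AS weighted_moving_average
FROM daily_orders_summary
WINDOW w AS (ORDER BY date)
ORDER BY date;

A convenient side effect: LAG returns NULL for the first 6 rows, so the whole expression comes out NULL for the warm-up dates we're ignoring anyway.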

Now, if we plot the moving average and the weighted moving average against the original daily average order values, we'd have a chart that looks like this:

[Chart: comparing the daily average order value, the simple moving average, and the weighted moving average]

As we can see, the original daily average order values show peaks and valleys, making it difficult to spot a trend. The simple moving average smooths out the spikiness, but over- or under-corrects in places. The weighted moving average smooths the trend further while keeping enough signal for us to see that the trend was climbing around the middle of January but has since declined. Looks like we've got some work to do to improve the average order value in our pet supply business.

Wrapping up

In this article, we've calculated a weighted moving average using concepts we covered in previous articles. We've seen how the weighted moving average can help us spot trends in our business that might otherwise be hidden by too much volatility or too much correcting. You can use a similar approach to apply an exponentially weighted moving average, where the weights decrease exponentially the further into the past we go. We'll look at that metric more closely in our next article.
