Building OHLC Data in PostgreSQL

Published

In this Write Stuff article, Lucero Del Alba illustrates the power of PostgreSQL's aggregate and date functions to analyze financial data, and shows you how to display the results in a web browser using TechanJS.

OHLC charts (open, high, low, close) are essential in finance to illustrate the change in prices, which are generated from ticks (bid and ask prices), most often on the application layer. Let’s review here how to build them in a database by using PostgreSQL features.

We'll go step-by-step from getting raw financial data from a reliable source, storing it in a database, and generating from it what's necessary to make a candlestick chart, which we'll visualize in the end. Also, we'll introduce some of the terminology, so hopefully you'll be able to follow along even if you don't know much about finance by just having a foundation of programming and databases.

Tick Data

A tick is a measurement of the upwards or downwards movement in the price of a currency pair, a stock, or any other exchange traded asset. They are normally expressed in milliseconds and come with a bid (buying) and an ask (selling) price, and they are the minimum amount of price movement that you can track in terms of time. Sometimes there are just a few milliseconds between one tick and another when there's volatility in the market. And, sometimes it may take a few seconds before a new tick is registered in a calm market. From the ticks, you can reconstruct what happened, what the price evolution of a stock was, or the price of a currency pair in a given second, minute, hour, day, or even weeks and months.

The level of granularity of time that's less than seconds is very relevant not only for high-frequency trading (HFT), but also very necessary when building a market simulator that will accurately model trading conditions and to analyze the markets in such a way to make, hopefully, useful predictions.

Getting the Ticks

Google Finance, Yahoo! Finance, and many other free web services won't provide you normally with price information to the granularity of ticks; you'll need to acquire this historical or live data from a specialized source, often, at a price. For this article, we'll use free data from the Swiss forex bank and marketplace Dukascopy, and from these ticks we'll later build data at a high level of granularity.

Let's go ahead and download ticks for the forex symbol EUR/USD for November 8, 2016. Go to the Dukascopy Historical Data Feed, create a free account or login with one of the accepted social networks, and you're ready to go; when asked, choose to download a CSV file. You should end up with a file called EURUSD_Ticks_08.11.2016-08.11.2016.csv of about 3.7 MB.

To download data in a more systematic way from Dukascopy, I'd recommend you use the free version of Tickstory. Keep in mind that carefully selecting your data sources will be critical to prevent garbage in, garbage out (GIGO) situations in your analysis.

Storing Ticks in the DB

If we have a look at the file we just downloaded,

head EURUSD_Ticks_08.11.2016-08.11.2016.csv  

we'll see that it has five columns:

Local time,Ask,Bid,AskVolume,BidVolume  
08.11.2016 00:00:00.186,1.10405,1.1040100000000002,0.75,0.75  
08.11.2016 00:00:00.734,1.10405,1.104,0.75,0.75  
08.11.2016 00:00:01.931,1.10405,1.1039700000000001,0.75,2.25  
08.11.2016 00:00:03.058,1.10408,1.104,0.75,0.75  
08.11.2016 00:00:05.202,1.10407,1.10399,0.75,0.75  
08.11.2016 00:00:07.959,1.10406,1.10398,0.75,0.75  
08.11.2016 00:00:11.335,1.10406,1.10404,0.75,2.1  
08.11.2016 00:00:12.463,1.1041100000000001,1.10404,0.75,2.1  
08.11.2016 00:00:36.313,1.1041200000000002,1.10404,0.75,2.85  

So we'll log into Postgres and use the following data schema to create a table for our data:

CREATE TABLE "EUR/USD"  
(
  dt timestamp without time zone NOT NULL,
  bid numeric NOT NULL,
  ask numeric NOT NULL,
  bid_vol numeric,
  ask_vol numeric,
  CONSTRAINT "EUR/USD_pkey" PRIMARY KEY (dt)
);

Next, we'll make use of PostgreSQL COPY to import the CSV data into the newly created EUR/USD table (you'll need root privileges to import from files), but before doing so we'll need to temporarily set DateStyle to match that of the CVS file:

SET DateStyle = 'ISO,DMY';

COPY "EUR/USD" FROM '<PATH>/EURUSD_Ticks_08.11.2016-08.11.2016.csv' CSV HEADER;  

And now, for some SQL magic...

OHLC Data

OHLC stands for open, high, low, close, which are key points typically used in finance to keep track of prices of a given instrument (in our case, the EUR/USD) within a given period of time (let's say, an hour).

As mentioned in the introduction, OHLC data is typically generated in the application layer, that is, by a program or script that's processing the ticks. What we'll do next, however, is use PostgreSQL features to generate OHLC data right in the database when querying the ticks.

Building in the DB

To build the OHLC data, we'll make use the following PostgreSQL date/time and aggregate functions when using the GROUP BY clause:

The first argument for date_trunc should indicate the precision to which we want to truncate the date/time:

Let's say we want to generate OHLC data for hours, this is what the SQL code will look like:

SELECT  
    date_trunc('hour', dt) dt,
    (array_agg(bid ORDER BY dt ASC))[1] o,
    MAX(bid) h,
    MIN(bid) l,
    (array_agg(bid ORDER BY dt DESC))[1] c,
    SUM(bid_vol) bid_vol,
    SUM(ask_vol) ask_vol,
    COUNT(*) ticks
FROM "EUR/USD"  
WHERE dt BETWEEN '2016-11-08' AND '2016-11-09'  
GROUP BY date_trunc('hour', dt)  
ORDER BY dt  
LIMIT 100;  

We arbitrarily chose the bid price for doing the calculations, but you can use the ask price as well just by replacing bid with ask on the array_agg, and MAX() and MIN() arguments.

For the high price (h column) we simply chose the maximum and the low price (l column) for the minimum.

The volumes (bid_vol and ask_vol) are a summation of their respective values, and we conveniently added a ticks column with a count of how many ticks were in this period, which is a very relevant piece of information to illustrate how volatile the period was.

Finally, notice that the WHERE and LIMIT clauses won't change the result set for now, but it's a good idea to put them in place for when we need them, as we'll surely do as soon as we move forward.

A variant with the arithmetic average between the bid and ask prices, in case you want to show a less biased price, would look like this:

SELECT  
    date_trunc('hour', dt) dt,
    ((array_agg(bid ORDER BY dt ASC))[1] +
     (array_agg(ask ORDER BY dt ASC))[1])/2 o,
    (MAX(bid) + MAX(ask))/2 h,
    (MIN(bid) + MIN(ask))/2 l,
    ((array_agg(bid ORDER BY dt DESC))[1] +
     (array_agg(ask ORDER BY dt DESC))[1])/2 c,
    SUM(bid_vol) bid_vol,
    SUM(ask_vol) ask_vol,
    COUNT(*) ticks
FROM "EUR/USD"  
WHERE dt BETWEEN '2016-11-08' AND '2016-11-09'  
GROUP BY date_trunc('hour', dt)  
ORDER BY dt  
LIMIT 100;  

Back End: Making the Data Feed

The OHLC data alone, however, is of little use if we are not going to parse it, analyze it or, at least, visualize it. So let's go ahead and create a small script that's going to feed OHLC data to the front end layer of our application.

I'll use Python to this end, but even if this is not your language of choice, it will only exemplify how to make a prototypical data feed, and we promise you that the rest of the guide will not be Python-specific.

The script has a functional approach:

  1. Query the ticks database with the query_ticks() function (which could later be expanded to accept a date range, a valid period, and a limit for the result set).
  2. Format the results returned by the database with format_as_csv() so that we have a valid CSV file containing the OHLC data.
#!/usr/bin/python3 -u
"""OHLC data feed."""
import cgitb  
import psycopg2

conn = psycopg2.connect(database='test')  # set the appropriate credentials  
cursor = conn.cursor()

SQL = '''SELECT  
    date_trunc('hour', dt) dt,
    (array_agg(bid ORDER BY dt ASC))[1] o,
    MAX(bid) h,
    MIN(bid) l,
    (array_agg(bid ORDER BY dt DESC))[1] c,
    SUM(bid_vol) bid_vol,
    SUM(ask_vol) ask_vol,
    COUNT(*) ticks
FROM "EUR/USD"  
WHERE dt BETWEEN '2016-11-08' AND '2016-11-09'  
GROUP BY date_trunc('hour', dt)  
ORDER BY dt  
LIMIT 100;'''


def query_ticks(date_from=None, date_to=None, period=None, limit=None):  
    """Dummy arguments for now.  Return OHLC result set."""
    cursor.execute(SQL)
    ohlc_result_set = cursor.fetchall()

    return ohlc_result_set


def format_as_csv(ohlc_data, header=False):  
    """Dummy header argument.  Return CSV data."""
    csv_data = 'dt,o,h,l,c,vol\n'

    for row in ohlc_data:
        csv_data += ('%s, %s, %s, %s, %s, %s\n' %
                     (row[0], row[1], row[2], row[3], row[4], row[5] + row[6]))

    return csv_data

if __name__ == '__main__':  
    cgitb.enable()

    ohlc_result_set = query_ticks()
    csv_data = format_as_csv(ohlc_result_set)

    print('Content-Type: text/plain; charset=utf-8\n')
    print(csv_data)

    cursor.close()

Front End: Visualizing a Candlestick Chart

Since we were already using Python as a scripting language, we could have easily generated a chart image with the matplotlib.finance API (see demo chart), or even an eye-candy rich-featured chart for the browser with Bokeh; however, we intend to make this guide non-Python specific so all the code in this section will be a pretty much drop-in regardless of your back end solution.

To that end, we'll use TechanJS, which is an open source stock charting and technical analysis library built on D3.js, to build interactive financial charts for desktop and mobile browsers.

We'll plot a candlestick chart from our OHLC data feed, which is a combination of line and bar charts to represent open, high, low and close prices for a given period of time (day, hours, minutes).

Just for your information, these are the components of a candle in a candlestick chart:


Candlestick chart Wikimedia Commons.

The following code is an adaptation of the TechanJS Crosshair chart. The data source we're passing to d3.csv() is our data_feed.py served by the web browser, but you can just drop in yours and it will work just the same as long as you are returning a valid CSV file with the appropriate header (dt,o,h,l,c,vol).

The resulting chart should look like this:


chart

Use the following HTML code to produce the chart:

<!DOCTYPE html>  
  <meta charset="utf-8">
  <title>Candlesitck chart with crosshair</title>
  <style>
    body { font: 10px sans-serif; }

    path.candle { stroke: #000000; }
    path.candle.body { stroke-width: 0; }
    path.candle.up { fill: #00AA00; stroke: #00AA00; }
    path.candle.down { fill: #FF0000; stroke: #FF0000; }

    .crosshair { cursor: crosshair; }
    .crosshair path.wire { stroke: #DDDDDD; stroke-dasharray: 1, 1; }
    .crosshair .axisannotation path { fill: #DDDDDD; }
  </style>
  <body>
  <script src="http://d3js.org/d3.v4.min.js"></script>
  <script src="http://techanjs.org/techan.min.js"></script>
  <script>
    var margin = {top: 20, right: 50, bottom: 30, left: 50},
            width = 960 - margin.left - margin.right,
            height = 500 - margin.top - margin.bottom;

    var parseDate = d3.timeParse("%Y-%m-%d %H:%M:%S");

    var x = techan.scale.financetime()
            .range([0, width]);

    var y = d3.scaleLinear()
            .range([height, 0]);

    var candlestick = techan.plot.candlestick()
            .xScale(x)
            .yScale(y);

    var xAxis = d3.axisBottom(x);

    var xTopAxis = d3.axisTop(x);

    var yAxis = d3.axisLeft(y);

    var yRightAxis = d3.axisRight(y);

    var ohlcAnnotation = techan.plot.axisannotation()
            .axis(yAxis)
            .orient('left')
            .format(d3.format(',.4f'));

    var ohlcRightAnnotation = techan.plot.axisannotation()
            .axis(yRightAxis)
            .orient('right')
            .translate([width, 0]);

    var timeAnnotation = techan.plot.axisannotation()
            .axis(xAxis)
            .orient('bottom')
            .format(d3.timeFormat('%Y-%m-%d %H:%M'))
            .width(90)
            .translate([0, height]);

    var timeTopAnnotation = techan.plot.axisannotation()
            .axis(xTopAxis)
            .orient('top')
            .format(d3.timeFormat('%Y-%m-%d %H:%M'))
            .width(90);

    var crosshair = techan.plot.crosshair()
            .xScale(x)
            .yScale(y)
            .xAnnotation([timeAnnotation, timeTopAnnotation])
            .yAnnotation([ohlcAnnotation, ohlcRightAnnotation])
            .on("enter", enter)
            .on("out", out)
            .on("move", move);

    var svg = d3.select("body").append("svg")
            .attr("width", width + margin.left + margin.right)
            .attr("height", height + margin.top + margin.bottom)
            .append("g")
            .attr("transform", "translate(" + margin.left + "," + margin.top + ")");

    var coordsText = svg.append('text')
            .style("text-anchor", "end")
            .attr("class", "coords")
            .attr("x", width - 5)
            .attr("y", 15);

    d3.csv("data_feed.py", function(error, data) {
        var accessor = candlestick.accessor();

        data = data.slice(0, 200).map(function(d) {
            return {
                date: parseDate(d.dt),
                open: +d.o,
                high: +d.h,
                low: +d.l,
                close: +d.c,
                volume: +d.vol
            };
        }).sort(function(a, b) { return d3.ascending(accessor.d(a), accessor.d(b)); });

        x.domain(data.map(accessor.d));
        y.domain(techan.scale.plot.ohlc(data, accessor).domain());

        svg.append("g")
                .datum(data)
                .attr("class", "candlestick")
                .call(candlestick);

        svg.append("g")
                .attr("class", "x axis")
                .call(xTopAxis);

        svg.append("g")
                .attr("class", "x axis")
                .attr("transform", "translate(0," + height + ")")
                .call(xAxis);

        svg.append("g")
                .attr("class", "y axis")
                .call(yAxis);

        svg.append("g")
                .attr("class", "y axis")
                .attr("transform", "translate(" + width + ",0)")
                .call(yRightAxis);

        svg.append('g')
                .attr("class", "crosshair")
                .datum({ x: x.domain()[10], y: 1.104 })
                .call(crosshair)
                .each(function(d) { move(d); }); // Display the current data

        svg.append('text')
                .attr("x", 5)
                .attr("y", 15)
                .text("EUR/USD (hourly)");
    });

    function enter() {
        coordsText.style("display", "inline");
    }

    function out() {
        coordsText.style("display", "none");
    }

    function move(coords) {
        coordsText.text(
            timeAnnotation.format()(coords.x) + ", " + ohlcAnnotation.format()(coords.y)
        );
    }
    </script>
  </body>
</html>  

Other JavaScript Libraries

Before ending, we'll leave you with a quick review of JS libraries for financial and time series charting:

For financial charts in general (including candlesticks charts):

For time series charts:

Wrap-Up

We went all the way from downloading ticks for a financial instrument to generating OHLC data that's typically used to display and analyze prices for such instruments. The visualization was a plus. What was interesting about what we did was we let the database do most of the heavy work by using PostgreSQL features to generate the OHLC data for which we would have otherwise needed to write a separate program.

Generating the OHLC in the database comes at a computational expense, however, but you can save CPU performance and take this implementation further by using materialized views (PostgreSQL >= 9.3) to store every result set on a spare table, so that every OHLC set that's been queried before will be transparently retrieved from that table instead of being computed again.

And naturally, the more you can compute in the DB, the more you can profit from services like Hosted PostgreSQL with Compose.

Lucero dances, [plays music](http://luzdealba.bandcamp.com/), writes about random topics, leads projects to varying and doubtful degrees of success, and keeps trying to be real and he keeps failing.

attribution Didier Weemaels

This article is licensed with CC-BY-NC-SA 4.0 by Compose.