TimescaleDB: A Beginners Guide

Intro to TimescaleDB

If you don’t learn to deal with timeseries data then you’ll never learn algorithmic trading. I mean let’s face it, at it’s core that’s all we’re really doing. Every time you look at a chart, it’s time series data. Any time you look at credit card data, parking lot data, all of that is irrelevant without a timestamp attached.

But the real question is. Do you need to learn a specific timeseries database like timescaledb, or is standard SQL good enough to get the job done?

In this blog we’ll look at:

What TimescaleDB is

How to install and get started with TimescaleDB

The main Benefits of TimescaleDB

Conclusion

What is TimescaleDB?

TimescaleDB is a time-series specialist database built on top of PostgreSQL. It prides itself on being exponentially faster than traditional SQL databases. They quote on their website to be 1000x faster at queries than traditional postgres.

Due to its speed it is often used by trading firms that rely on vast amounts of data to be inserted and queried at any time. Making it ideal for quantitative trading, although it is used by many other companies that rely on time series data.

How to install TimescaleDB?

The easiest way is to use the official timescaledb docker image. Install docker if you haven’t already here: https://docs.docker.com/engine/install/ You can then go ahead and open up your terminal and enter the following command:

Note that you have to declare a pg version and latest alone won’t work for this image. You can read more and find the latest version here: https://hub.docker.com/r/timescale/timescaledb

Now we have our docker container running with timescaledb software we can register a connection in pgadmin. This step isn’t necessary but will allow you to visualise your data more easily. You can learn how to do that here: https://www.jonjowadwa.com/blogs/python-postgresql-stock-database

Connect to TimescaleDB with Python

We can connect to our database in a python file with the following code:

We can now create a table in the same way as we do with Postgres, with a slight twist. Instead of adding a primary key, we’ll add a timestamp with a timzone as follows

and then we’ll convert the table to a hypertable:

We can now create a stock and add it to the table with:

Benefits of TimescaleDB

But the true value comes from when we have a lot of data. If we were to for example loop through every number from 1 to 1million and add that price to the database, with 1 million rows, then a tradition database would start becoming slower.

TimescaleDB on the other hand, won’t slow down. The reason for this is it performs automatic partitioning of the data. Earlier when we called the create_hypertable method, we told timescaledb to break our database into chunks, based on the time column. In other words, when the rows get over a certain number, break off and start a new table.

You might then ask but what about if I want to query something like the average price over a total period, then the partitioning becomes redundant as I need access to all the data? Well, you’d be wrong. Timescale does something called continuous aggregation. What this means is that each mini table or chunk of table can have an average price attached to it, saving you querying all the data, and increasing speed by a lot.

Testing Timescale vs Postgres

Test one was to insert 500,000 rows into each db respectively. Timescale was significantly faster. Presumably because it has to partition the tables.

Blog post image

When running advanced queries, however, the speed differences are highly noticeable. We can see this advanced query took nearly three times as long in postgres vs timescaledb in a table of 800,000 rows.

Blog post image

Should you use TimescaleDB over PostgreSQL?

The main benefits of TimescaleDB seem to be focused around speed. With evergrowingly faster hardware, the need for speed is becoming more and more redundant. However, if you have a project that requires a lot of speed and lot of data, then it is a no-brainer. To have such a small learning curve, especially if you come from a postgres background, then the benefits are well worth it.

For me, I will use it for any timeseries data projects I work with. I love the partitioning of data based on the timestamp and that its so close to postgres that many other developer will be able to pick it up. Unlike other time-series solutions like KDB+ where you have to learn an entirely new language.