How to Build an Options Database Like Jim Simons (Step-by-Step)
Who is Jim Simons
If you had invested 100 dollars with Jim Simons 30 years ago, you would have 1.4 billion dollars right now.
Now I know what you’re thinking. Stop explaining and tell me how to invest in his fund. Well, I have some bad news. The hedge fund, known as Renaissance Technologies, doesn’t let anyone invest unless they work for the fund itself.
But it’s not all bad news. Simons often said that his edge came from large quantities of data. Every decision the fund made was based on models, and the only way for those models to get good was to feed them large quantities of data.
In this blog, we’re going to look at how you can get started building a data powerhouse to drive your next billion-dollar venture.
How to Find an Edge with Data
Choose Rare Data
When choosing data, make sure it is not easily accessible to the general public. If it is, then everyone has it, and you no longer have an edge. Be careful though: just because an area has lots of data does not mean that data is being stored at every level of detail. For example, there is lots of options data available, but usually only on a daily timeframe. If you were to start storing tick-level options data for a basket of products, you might have an edge. Equally, you could choose a dataset that is rare in and of itself, something that no one else is likely to be storing.
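As a rough illustration of what storing tick options data could look like, here is a minimal sketch that appends each tick to a CSV file for that day. The column names, the file naming scheme and the append_tick helper are all assumptions made for this example. Your data feed will dictate the real layout.

```python
import csv
from pathlib import Path

# Hypothetical column layout for one options tick; adjust to your data feed.
FIELDS = ["timestamp", "symbol", "expiry", "strike", "option_type", "bid", "ask"]


def append_tick(data_dir: Path, tick: dict) -> Path:
    """Append one tick to that day's CSV file, writing a header for new files."""
    day = tick["timestamp"][:10]  # assumes an ISO timestamp starting YYYY-MM-DD
    path = data_dir / f"ticks_{day}.csv"
    is_new = not path.exists()
    with path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(tick)
    return path
```

Calling append_tick(Path("data"), tick) for every incoming tick gives you one file per day, provided the data directory already exists. Daily files are easy to compress, move or delete on their own.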
Choose Long-Term Data
It is also crucial that you can keep collecting and storing this data for a long period of time. A dataset you cannot sustain is worth very little, because the model will never have enough history to get a fair chance at learning anything from it.
How to Structure your Data
Only Store What You Need
If you are saving data over a 5, 10 or 20 year period, then the file sizes are going to get large. Don’t make this any worse by storing unnecessary data. For example, if you have a running total in one column, just delete it. You can always recalculate it easily at a later date. The same goes for a column that stores the name of the data type when that information is already included in another column.
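To make the running total example concrete, here is a small sketch that drops a derived column before storage and rebuilds it when needed. The column names volume and running_volume and the sample rows are made up for this example.

```python
from itertools import accumulate

# Hypothetical raw rows: traded volume per tick plus a redundant running total.
rows = [
    {"timestamp": "2024-01-02T14:30:00", "volume": 10, "running_volume": 10},
    {"timestamp": "2024-01-02T14:30:05", "volume": 4, "running_volume": 14},
    {"timestamp": "2024-01-02T14:30:09", "volume": 7, "running_volume": 21},
]


def strip_derived(rows, derived=("running_volume",)):
    """Return copies of the rows without columns that can be recomputed."""
    return [{k: v for k, v in row.items() if k not in derived} for row in rows]


def running_total(rows, column="volume"):
    """Rebuild the running total from the stored column when it is needed."""
    return list(accumulate(row[column] for row in rows))


stored = strip_derived(rows)      # this is what goes to disk
rebuilt = running_total(stored)   # this is computed later, on demand
```

The stored rows carry one column fewer, and the running total can be recovered exactly from what is left.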
Keep it Stupid Simple
Do not try to be too fancy here. Think of data formats that live forever. Think CSV, SQL, these types of simple solutions. You do not want to get caught out in five years’ time when the newest fancy database goes bust and doesn’t support you anymore.
How to Store your Data
Now this might come as a shock, but you do not want anything fancy here. Remember, the sole purpose of this is to store the data. Using CSV files is completely fine. We don’t need fancy indexing at this point; we can add that later. For now we just want the files to be as small as possible to keep our costs down. An SQL database file can take up to double the storage of a CSV file holding exactly the same data. If you want to go one step further, you can use Parquet files, which are even smaller.
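If you want to check the size difference on your own data rather than take it on faith, a quick measurement is easy to set up. The sketch below writes the same synthetic rows as a CSV file, a gzipped CSV file and an SQLite database, then reports each size in bytes. Parquet needs a third-party library, so gzip from the standard library stands in here as the compressed option. The row layout and the compare_sizes helper are assumptions for this example, and the exact ratios will depend on your data.

```python
import csv
import gzip
import os
import sqlite3
import tempfile


def compare_sizes(n_rows: int = 10_000) -> dict:
    """Write identical rows as CSV, gzipped CSV and SQLite; return sizes in bytes."""
    # Synthetic placeholder rows, not real market data.
    rows = [(i, "SPY", 400 + (i % 50), round(1.0 + (i % 100) * 0.01, 2))
            for i in range(n_rows)]
    header = ["seq", "symbol", "strike", "price"]

    with tempfile.TemporaryDirectory() as d:
        csv_path = os.path.join(d, "ticks.csv")
        gz_path = os.path.join(d, "ticks.csv.gz")
        db_path = os.path.join(d, "ticks.db")

        # Plain CSV
        with open(csv_path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(rows)

        # Gzip-compressed CSV
        with gzip.open(gz_path, "wt", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(rows)

        # SQLite database file, no indexes
        con = sqlite3.connect(db_path)
        con.execute(
            "CREATE TABLE ticks (seq INTEGER, symbol TEXT, strike INTEGER, price REAL)"
        )
        con.executemany("INSERT INTO ticks VALUES (?, ?, ?, ?)", rows)
        con.commit()
        con.close()

        return {
            "csv": os.path.getsize(csv_path),
            "csv_gz": os.path.getsize(gz_path),
            "sqlite": os.path.getsize(db_path),
        }
```

Running print(compare_sizes()) on a sample that resembles your real data gives you a concrete basis for choosing a format before you commit to years of storage.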
Where to Store your Data
In an ideal world you would store this in two places, both in the cloud and on-prem. For a beginner, this could look something like storing one copy in Google Cloud Storage and another on a hard drive bought from Amazon. But maintaining two copies does increase the effort required. For this reason I’d suggest starting with just a cloud option. You can automate it, and you don’t need to worry about buying new hard drives when they run out of space.
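If you do decide to keep a second copy, the step worth automating is the check that the copy actually matches the original. Here is a minimal sketch, assuming the backup location is just a directory path such as a mounted external drive. The backup_file and sha256_of helpers are names invented for this example, and uploading to a cloud bucket would need your provider’s own tooling, which is not shown here.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large data files do not need to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def backup_file(src: Path, backup_dir: Path) -> Path:
    """Copy src into backup_dir and verify the copy byte-for-byte via SHA-256."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    dest = backup_dir / src.name
    shutil.copy2(src, dest)  # copy2 also preserves file timestamps where possible
    if sha256_of(src) != sha256_of(dest):
        raise IOError(f"Checksum mismatch for {dest}")
    return dest
```

Running this on a schedule for each new data file keeps the second copy in step with the first without any manual work.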