Backtesting is a mathematical simulation used by traders to evaluate the performance of a trading strategy. The simulation leverages historical market data in an attempt to calculate how well a trading strategy would have done in the past.
At its core, backtesting is a way for traders to try predicting whether or not a strategy will be profitable when implemented with real capital. Traders use backtesting to filter out any strategy that hasn’t been profitable historically.
Although historical performance does not guarantee future results, backtesting is still the most reliable way to identify robust strategies. It’s necessary to study these simulations to filter out strategies that clearly underperform. That way we improve our chances of making money and avoid testing weak strategies with real funds.
As cryptocurrency trading tools have become more popular, so has backtesting. Today, it’s recommended that traders thoroughly backtest every strategy before releasing it into the wild crypto market. That way we can gain confidence that the strategy has the potential to perform optimally.
Before we can start backtesting strategies, we must understand the different data types that developers use to build backtesting tools and how they each represent the real-world market.
The most common way to implement a backtesting tool is for developers to use OHLCV candlestick data. Most developers use this data because it’s readily available.
Unfortunately, although it’s the easiest data to access for building these tools, it is also the least reliable. A candle only records the open, high, low, and close prices and the total volume for an interval, so it says nothing about which orders were actually available at the moment a trade is simulated. In fact, using OHLCV candlestick data to run backtests can be the difference between building a profitable strategy and losing your money.
The situation turns even worse when traders use aggregated candlestick data from sources like CoinMarketCap. Aggregated data is not a valid representation of the actual orders that were available on a specific exchange at the time.
Don’t use candlestick data to build backtesting tools.
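To see why candles lose information, here is a minimal sketch of how individual trades collapse into a single OHLCV candle. The `Trade` class, the `to_ohlcv` helper, and the trade values are all hypothetical and exist only for illustration; they are not taken from any exchange or from the Shrimpy APIs.

```python
from dataclasses import dataclass


@dataclass
class Trade:
    timestamp: float  # seconds from the start of the candle
    price: float
    amount: float


def to_ohlcv(trades):
    """Collapse time-ordered trades into one (open, high, low, close, volume) tuple."""
    if not trades:
        raise ValueError("need at least one trade to build a candle")
    prices = [t.price for t in trades]
    volume = sum(t.amount for t in trades)
    return (prices[0], max(prices), min(prices), prices[-1], volume)


# Hypothetical trades inside a single one-minute candle.
trades = [
    Trade(0.0, 0.2060, 1000.0),
    Trade(12.0, 0.2075, 250.0),
    Trade(31.0, 0.2052, 4000.0),
    Trade(58.0, 0.2066, 500.0),
]

candle = to_ohlcv(trades)
print(candle)
```

Four trades become five numbers. The candle no longer tells us when each price was reached, how much traded at each price, or what the bid and ask were, which is exactly the information a realistic trade simulation needs.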
Tick-by-tick trade data can be a useful component for historical backtesting tools. Individual tick trades are the exact trades that were executed on an exchange at each moment. These individual trades represent real orders that were filled, so we know for certain that there must have been an open order available at that price on the exchange.
Although tick trade data can be a powerful aspect of backtesting services, it will still only be slightly more accurate than OHLCV candlestick data. Individual trade data points don’t provide information about the state of the order book at the time of the trade. As a result, developers can’t accurately assess what orders would have been available on the exchange at that exact moment when a simulated trade is executed.
Using tick-by-tick trade data for backtesting tools is discouraged.
The last type of data that is common in backtesting tools is order book snapshots. Order book snapshots provide the exact state of a market at the time of the snapshot. The intention is to have a full representation of what orders were available on the exchange at a particular time.
When building backtesting tools, this is the most powerful type of data to use. Since the data includes the precise orders that were available at the time a trade is simulated, we can calculate the exact trades we could take and the price of each of those trades.
Order book snapshots allow developers to simulate the impact of the bid-ask spread, slippage, and liquidity.
Order book snapshots are highly recommended as the data type for backtesting tools.
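As a rough illustration of what a snapshot makes possible, the sketch below derives the bid-ask spread and the available liquidity from one order book snapshot. The snapshot layout (lists of `(price, quantity)` tuples), the function names, and the numbers are assumptions made for this example, not the format of any particular data provider.

```python
def best_bid_ask(bids, asks):
    """Return the highest bid price and the lowest ask price."""
    best_bid = max(price for price, _ in bids)
    best_ask = min(price for price, _ in asks)
    return best_bid, best_ask


def spread_pct(bids, asks):
    """Bid-ask spread as a percentage of the mid price."""
    best_bid, best_ask = best_bid_ask(bids, asks)
    mid = (best_bid + best_ask) / 2
    return (best_ask - best_bid) / mid * 100


def depth_within(asks, limit_price):
    """Total quantity offered at or below limit_price."""
    return sum(qty for price, qty in asks if price <= limit_price)


# Hypothetical snapshot: (price, quantity) pairs.
bids = [(0.2050, 3000.0), (0.2045, 5000.0)]
asks = [(0.2060, 2000.0), (0.2065, 4000.0), (0.2080, 10000.0)]

print(best_bid_ask(bids, asks))
print(round(spread_pct(bids, asks), 4))
print(depth_within(asks, 0.2065))
```

None of these values can be recovered from candles or from individual trades alone, because both describe what already traded rather than what was on offer.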
The primary source for order book data is each individual crypto exchange itself. In most cases, this data is streamed live through an exchange’s websockets. However, due to the sheer volume of data, exchanges generally don’t store this data for the long term. That means once the data is sent through an exchange’s websocket, it’s gone forever.
Unless, of course, someone collects the data from the exchange and makes it available through a third-party service. This is where data providers enter the picture. Data providers are essentially companies that collect data from each exchange and store it so other people can access it later.
Data providers for historical order book snapshots are few and far between. Due to the limited supply of this data, developers have resorted to alternative data sets, like OHLCV candlesticks, which can cause inaccuracies for backtests. As a result, most backtesting tools available in the market today misrepresent the performance of strategies.
After a recent partnership between Shrimpy and Kaiko, Shrimpy is now able to offer a complete historical catalog of order book snapshots across every major exchange. Kaiko has been meticulously collecting tick-by-tick trade data, order book snapshots, and OHLCV candlesticks, with records dating back to as early as 2014.
Developers can access this data through the Shrimpy Developer APIs. Using the simple on-demand pricing model, customers can query for snapshots across different time-frames, trading pairs, and exchanges.
Kaiko provides the most precise data in the market. Now, every developer can access Kaiko’s data to accurately simulate backtests through the Shrimpy APIs.
To precisely calculate how a strategy will perform, a backtest requires the most exact numbers possible. Some factors that must be considered during a backtest include:
When simulating the buying of an asset, we must use the asking price on the order book. The best asking price is the lowest price at which anyone on the exchange is willing to sell the asset. Don’t forget to also factor in the trading fee and slippage.
Using the order book in Figure 1 as an example, let’s imagine we want to buy 1,500 USDT worth of ENJ. For the sake of this example, let’s assume this order book is for Binance, which has a base trading fee of 0.1%.
We could simulate the buying of 1,500 USDT worth of ENJ by incrementally increasing our order price over the order book until we have purchased our desired amount of 1,500 USDT worth of ENJ. The consecutive trades we would execute include the following:
Notice that there was some left over on the order book that we could not buy at the 0.20663894 price point. The amount we didn’t buy would remain on the exchange for another market participant to take.
In total, we bought exactly 7260.08410553 ENJ after all trades were completed. If we had only used OHLCV candlestick data, our estimate would likely have been as far off as 7319.76112984. This is a difference of almost 60 ENJ, or roughly 0.8%. It might not seem like a lot, but this small percentage compounds incredibly quickly if we are simulating hundreds or thousands of trades.
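The order book walk described above can be sketched in a few lines. This is a minimal illustration, not the implementation behind the numbers in this article: the ask levels are hypothetical and are not the levels from Figure 1, the function name `simulate_market_buy` is made up, and the sketch assumes the 0.1% fee is deducted from the asset we receive.

```python
def simulate_market_buy(asks, quote_budget, fee_rate):
    """Spend quote_budget against the asks, cheapest level first.

    asks: list of (price, quantity) tuples from one order book snapshot.
    Returns (net_base_received, fills, unspent_quote).
    """
    remaining = quote_budget
    fills = []
    for price, available in sorted(asks):
        if remaining <= 0:
            break
        # Take the whole level, or only as much as the remaining budget allows.
        cost = min(available * price, remaining)
        fills.append((price, cost / price))
        remaining -= cost
    gross_base = sum(qty for _, qty in fills)
    # Assumption: the fee is charged in the purchased asset.
    net_base = gross_base * (1 - fee_rate)
    return net_base, fills, remaining


# Hypothetical ask side of an ENJ/USDT book: (price, quantity).
asks = [(0.2060, 2000.0), (0.2065, 4000.0), (0.2080, 10000.0)]

net, fills, unspent = simulate_market_buy(asks, quote_budget=1500.0, fee_rate=0.001)

# Naive estimate that assumes the whole order fills at the best ask.
naive = 1500.0 / 0.2060 * (1 - 0.001)

for price, qty in fills:
    print(f"bought {qty:.4f} ENJ at {price:.4f}")
print(f"net ENJ received: {net:.4f}, naive estimate: {naive:.4f}")
```

The last level is only partially consumed, so the remainder stays on the book. The gap between `net` and `naive` is the slippage that a single-price estimate ignores. If the book runs out of liquidity before the budget is spent, `unspent` comes back greater than zero and the backtest can decide how to handle it.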
Once the trade simulation is complete, record the results of the order so we can use those funds to trade to another asset later in the backtest. Using this detailed trade record, we can keep meticulous logs of every trade that was made during the backtest. These logs can be used to calculate additional stats like the trading volume we execute, how many trades we performed, and the frequency of buying or selling a particular asset.
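One possible shape for such a trade log is sketched below. The `TradeRecord` fields, the `summarize` helper, and the sample records are hypothetical; the volume calculation also assumes every trade is quoted in the same currency.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class TradeRecord:
    timestamp: int      # position of the trade within the backtest
    side: str           # "buy" or "sell"
    base: str           # asset bought or sold
    quote: str          # asset the price is expressed in
    price: float
    base_amount: float


def summarize(log):
    """Derive simple statistics from a list of TradeRecord entries."""
    quote_volume = sum(t.price * t.base_amount for t in log)
    per_asset = Counter((t.base, t.side) for t in log)
    return {
        "trade_count": len(log),
        "quote_volume": quote_volume,
        "per_asset": dict(per_asset),
    }


# Hypothetical records appended as the backtest runs.
log = [
    TradeRecord(1, "buy", "ENJ", "USDT", 0.2060, 2000.0),
    TradeRecord(1, "buy", "ENJ", "USDT", 0.2065, 4000.0),
    TradeRecord(2, "sell", "ENJ", "USDT", 0.2100, 1000.0),
]

stats = summarize(log)
print(stats)
```

Keeping one record per fill, rather than one per order, preserves the price of every partial fill so the statistics can be recomputed later without rerunning the backtest.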
Calculating the performance of a strategy is simple. All we need to do is calculate the value of our portfolio at the beginning of the backtest and compare it to the value of our portfolio at the end of the backtest.
The value of a portfolio is calculated by multiplying the amount of each asset we hold by the price of that asset and summing the values of all assets in the portfolio.
By doing this calculation at the beginning of the backtest and once again at the end of the backtest, we can get the change of value for our portfolio over the course of the backtest.
Calculating performance can then be done by using the equation:
Performance = [( Vf - Vi) / Vi] x 100
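Where Vi is the value of the portfolio at the beginning of the backtest and Vf is the value of the portfolio at the end of the backtest.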
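The valuation and the performance equation can be sketched as follows, taking Vi as the portfolio value at the beginning of the backtest and Vf as its value at the end. The function names, holdings, and prices are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def portfolio_value(holdings, prices):
    """Sum of amount * price over every asset held."""
    return sum(amount * prices[asset] for asset, amount in holdings.items())


def performance_pct(v_initial, v_final):
    """Performance = [(Vf - Vi) / Vi] x 100."""
    if v_initial <= 0:
        raise ValueError("initial portfolio value must be positive")
    return (v_final - v_initial) / v_initial * 100


# Hypothetical holdings and prices, all valued in USDT.
start_holdings = {"USDT": 1000.0, "BTC": 0.1}
start_prices = {"USDT": 1.0, "BTC": 10000.0}

end_holdings = {"USDT": 500.0, "BTC": 0.15}
end_prices = {"USDT": 1.0, "BTC": 12000.0}

v_i = portfolio_value(start_holdings, start_prices)
v_f = portfolio_value(end_holdings, end_prices)

print(f"Vi = {v_i:.2f}, Vf = {v_f:.2f}")
print(f"performance = {performance_pct(v_i, v_f):.2f}%")
```

Both valuations must use the same reference currency. For a realistic result, the end-of-backtest prices should also reflect what the holdings could actually be sold for, not only the last traded price.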
Notice that the purpose of a backtest is not only to optimize for performance. Essentially, just because a specific strategy performs well under the backtest conditions, that doesn’t automatically mean it’s a good strategy. We must also consider the consistency and robustness of the strategy.
Backtest Consistency - The ability to produce similar results over different historical periods and varying market conditions.
Backtest Robustness - The ability to produce similar results even when minor changes are made to the strategy parameters.
A strategy without robustness can see big performance swings when even the smallest changes are made to a strategy’s parameters. Similarly, a strategy that isn’t consistent will likely experience vastly different results when testing different historical time periods.
In the ideal case, we want to use a strategy that can be backtested on any historical time period and produce similar results. Likewise, our strategy’s performance should not experience large swings when minor changes are made to the strategy.
Strategies without consistency or robustness can lead to wildly unpredictable future performance. If backtesting a variety of historical time periods and configurations for our strategy produces widely varying results, it could indicate that our strategy is unpredictable. In that case, selecting only a single configuration or backtesting period to evaluate would essentially be overfitting the strategy to a particular situation. Results of an overfit backtest would not be a general representation of the strategy.
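One simple way to probe both properties is to rerun the same backtest across several periods and several parameter values, then look at how far the results spread. The sketch below assumes a hypothetical `run_backtest(period, rebalance_hours)` callable that returns a performance percentage. Here it is replaced by a lookup table of made-up placeholder numbers so the example runs on its own; those numbers are not real backtest results.

```python
def consistency_and_robustness(run_backtest, periods, base_param, param_variants):
    """Measure how far results spread across periods and across parameters."""
    # Consistency: same parameters, different historical periods.
    period_results = [run_backtest(p, base_param) for p in periods]
    # Robustness: same period, slightly different parameters.
    variant_results = [run_backtest(periods[0], v) for v in param_variants]
    return {
        "consistency_spread": max(period_results) - min(period_results),
        "robustness_spread": max(variant_results) - min(variant_results),
    }


# Placeholder stand-in for a real backtest engine (made-up values).
placeholder_results = {
    ("2018", 24): 12.0,
    ("2019", 24): 10.0,
    ("2020", 24): 11.0,
    ("2018", 12): 12.5,
    ("2018", 48): 11.5,
}


def run_backtest(period, rebalance_hours):
    return placeholder_results[(period, rebalance_hours)]


report = consistency_and_robustness(
    run_backtest,
    periods=["2018", "2019", "2020"],
    base_param=24,
    param_variants=[12, 24, 48],
)
print(report)
```

A small spread on both measures suggests the strategy behaves similarly across conditions. A large spread on either one is a hint that a single impressive backtest may be an overfit. The max-minus-min spread is only one possible measure; a standard deviation or a full distribution of results would serve the same purpose.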
An example of a consistent strategy we have found is rebalancing. In the vast majority of cases, rebalancing outperformed hodling, even when we adjusted the rebalance period from 1 hour to 1 day to 1 month.
Due to the technical nature of backtesting, it is sometimes difficult to identify if a backtest is reliable. The following red flag items will help you identify if the results of a backtest are reasonable. This is not an exhaustive list, but it covers some of the most common cases.
Before accepting the results of a backtest at face value, use these red flags to identify issues with the simulated trades.
There have been a few major themes throughout this article. Primarily, we have demonstrated how difficult it can be to build a robust backtesting tool. However, at the same time, we’ve been able to illustrate the importance of backtesting a strategy before deploying the strategy live.
The first step to building a backtesting tool has always been to have high-quality data. Without high-quality order book data, the results will be highly inaccurate. Ultimately, making decisions based on faulty backtesting tools can be costly. It can give us unrealistic expectations for a strategy that then eats away at our portfolio.
When building a backtesting tool, don’t forget to simulate trading fees, slippage, and the bid-ask spread. Each of these aspects of a backtest can make a big difference. Removing even one of these components from the backtest can be the difference between a profitable and unprofitable strategy.
Finally, before deploying a strategy based on backtests, keep testing. When you think you’re done testing, test again. Instead of 100 tests, run 100,000 tests. Backtesting is the best way for us to understand the behavior of a strategy. Try forming new hypotheses for strategies and testing those hypotheses to identify new strategies. Continue the experimentation cycle until you find strategies that work for you.