Introduction to Backtesting

Backtesting is a mathematical simulation used by traders to evaluate the performance of a trading strategy. The simulation leverages historical market data in an attempt to calculate how well a trading strategy would have done in the past.

At its core, backtesting is a way for traders to estimate whether a strategy is likely to be profitable when implemented with real capital. Traders use backtesting to filter out any strategy that hasn’t been profitable historically.

Although historical performance does not guarantee future results, backtesting is still one of the most reliable ways to screen for robust strategies. Studying these simulations lets us discard strategies that clearly underperform. That improves our chances of making money and means we don’t need to test unproven strategies with real funds.

As cryptocurrency trading tools have become more popular, so has backtesting. Today, it’s recommended that traders thoroughly backtest every strategy before releasing it into the wild crypto market. That way we can gain confidence that the strategy has the potential to perform optimally.

Backtesting Data Requirements

Before we can start backtesting strategies, we must understand the different data types that developers use to build backtesting tools and how they each represent the real-world market.

Candlestick Data

The most common way to implement a backtesting tool is to use OHLCV (open, high, low, close, volume) candlestick data. Most developers use this data because it’s readily available.

Unfortunately, although it’s the easiest data to access for building these tools, it is also the least reliable. A candle records only four prices and the traded volume for an interval, so it carries no information about the bid-ask spread or the liquidity that was available at the moment a simulated trade executes. In fact, using OHLCV candlestick data to run backtests can be the difference between building a profitable strategy and losing your money.

The situation gets even worse when traders use aggregated candlestick data from sources like CoinMarketCap. Aggregated data blends prices from many exchanges, so it is not a valid representation of the actual orders that were available on a specific exchange at the time.

Don’t use candlestick data to build backtesting tools.
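To make the information loss concrete, here is a minimal sketch that collapses a handful of trades into a single OHLCV candle. The trades, the `Trade` class, and the `to_ohlcv` helper are made up for illustration and are not taken from any exchange or library.

```python
from dataclasses import dataclass


@dataclass
class Trade:
    timestamp: int  # seconds into the candle interval (made-up values)
    price: float
    amount: float


def to_ohlcv(trades):
    """Collapse trades (already sorted by time) into one OHLCV candle."""
    prices = [t.price for t in trades]
    return {
        "open": prices[0],
        "high": max(prices),
        "low": min(prices),
        "close": prices[-1],
        "volume": sum(t.amount for t in trades),
    }


# Hypothetical trades within a single one-minute interval.
trades = [
    Trade(0, 0.2060, 500.0),
    Trade(20, 0.2075, 1200.0),
    Trade(45, 0.2052, 300.0),
    Trade(59, 0.2066, 800.0),
]
candle = to_ohlcv(trades)
print(candle)
```

The resulting candle holds five numbers. Nothing in it says how wide the spread was or how much could have been bought at any given price, which is exactly what a trade simulation needs to know.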

Tick-by-Tick Trade Data

Tick-by-tick trade data can be a useful component for historical backtesting tools. Individual tick trades are the exact trades that were executed on an exchange at each moment. These individual trades represent real orders that were filled, so we know for certain that there must have been an open order available at that price on the exchange.

Although tick trade data can be a powerful aspect of backtesting services, it will still only be slightly more accurate than OHLCV candlestick data. Individual trade data points don’t provide information about the state of the order book at the time of the trade. As a result, developers can’t accurately assess what orders would have been available on the exchange at that exact moment when a simulated trade is executed.

Using tick-by-tick trade data for backtesting tools is discouraged.

Order Book Snapshot Data

The last type of data that is common in backtesting tools is order book snapshots. Order book snapshots provide the exact state of a market at the time of the snapshot. The intention is to have a full representation of what orders were available on the exchange at a particular time.

When building backtesting tools, this is the most powerful type of data to use. Since the data includes the precise orders that were available at the time a trade is simulated, we can calculate the exact trades we could take and the price of each of those trades.

Order book snapshots allow developers to simulate the impact of the bid-ask spread, slippage, and liquidity.

Order book snapshots are highly recommended as the data type for backtesting tools.
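As a rough illustration of what a snapshot makes possible, the sketch below computes the bid-ask spread and the ask-side liquidity from one snapshot. The snapshot layout (a dict of `bids` and `asks`, each a list of price and amount pairs) and the helper names are assumptions made for this example, not the format of any particular data provider.

```python
def best_bid_ask(snapshot):
    """Return the highest bid price and the lowest ask price."""
    best_bid = max(price for price, _ in snapshot["bids"])
    best_ask = min(price for price, _ in snapshot["asks"])
    return best_bid, best_ask


def spread_percent(snapshot):
    """Bid-ask spread as a percentage of the mid price."""
    bid, ask = best_bid_ask(snapshot)
    mid = (bid + ask) / 2
    return (ask - bid) / mid * 100


def ask_depth_quote(snapshot, max_price):
    """Quote-currency value of all asks priced at or below max_price."""
    return sum(p * q for p, q in snapshot["asks"] if p <= max_price)


# Made-up snapshot: (price, amount) pairs on each side of the book.
snapshot = {
    "bids": [(99.0, 2.0), (98.0, 5.0)],
    "asks": [(101.0, 1.0), (102.0, 3.0), (105.0, 10.0)],
}
print(spread_percent(snapshot))
print(ask_depth_quote(snapshot, 102.0))
```

Neither number can be recovered from candlesticks or individual trades, because both depend on orders that were resting on the book and never filled.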

Data Sources

The primary source for order book data is each individual crypto exchange itself. In most cases, this data is streamed live through an exchange’s WebSocket feeds. However, due to the sheer volume of data, exchanges generally don’t store this data for the long term. That means once the data is sent through an exchange’s WebSocket, it’s gone forever.

Unless, of course, someone collects the data from the exchange and makes it available through a third-party service. This is where data providers enter the picture. Data providers are essentially companies that collect data from each exchange and store it so other people can access it later.

Data providers for historical order book snapshots are few and far between. Due to the limited supply of this data, developers have resorted to alternative data sets, like OHLCV candlesticks, which can cause inaccuracies in backtests. As a result, most backtesting tools available in the market today misrepresent the performance of strategies.

After a recent partnership between Shrimpy and Kaiko, Shrimpy is now able to offer a complete historical catalog of order book snapshots across every major exchange. Dating back to as early as 2014, Kaiko has been meticulously collecting tick-by-tick trade data, order book snapshots, and OHLCV candlesticks.

Developers can access this data through the Shrimpy Developer APIs. Using the simple on-demand pricing model, customers can query for snapshots across different time-frames, trading pairs, and exchanges.

Kaiko provides the most precise data in the market. Now, every developer can access Kaiko’s data to accurately simulate backtests through the Shrimpy APIs.

Simulating A Backtest

Figure 1: An example order book for the ENJ-USDT trading pair.

To precisely calculate how a strategy would have performed, a backtest requires the most accurate inputs possible. Some factors that must be considered during a backtest include:

The exchange’s trading fee
The bid-ask spread for the trading pair
Market slippage on the order book
Timing for each individual trade

When simulating the purchase of an asset, we must use the ask side of the order book. The best ask is the lowest price at which anyone on the exchange is willing to sell the asset. When simulating a sale, the same logic applies to the bid side. Don’t forget to also factor in the trading fee and slippage.

Using the order book in Figure 1 as an example, let’s imagine we want to buy 1,500 USDT worth of ENJ. For the sake of this example, let’s assume this order book is for Binance, which has a base trading fee of 0.1%.

We can simulate this purchase by walking up the ask side of the order book, filling our order at each successive price level until the full 1,500 USDT has been spent. The consecutive trades we would execute are the following:

Buy 1151.74904126 ENJ at 0.20559424 USDT each = 236.97296881 USDT + 0.2369729 USDT in fees (1262.79005829 USDT left)
Buy 2559.954 ENJ at 0.20640294 USDT each = 528.38203186 USDT + 0.52838203 USDT in fees (733.8796444 USDT left)
Buy 1992.51418976 ENJ at 0.20659518 USDT each = 411.64382769 USDT + 0.41164382 USDT in fees (321.82417288 USDT left)
Buy 1555.85587451 ENJ at 0.20663894 USDT each = 321.50267164 USDT + 0.32150267 USDT in fees (0 USDT left)

Notice that the order at the 0.20663894 price level was only partially filled. The amount we didn’t buy would remain on the exchange for another market participant to take.
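The walk up the order book described above can be sketched in a few lines. This is a minimal illustration, not a definitive implementation: the function name `simulate_market_buy`, the `(price, amount)` layout of the ask levels, and the toy order book are all assumptions, and the fee is modeled as a fraction of each fill’s cost paid on top in the quote currency, as in the trades listed above. A real exchange may charge fees differently.

```python
def simulate_market_buy(asks, quote_budget, fee_rate):
    """Simulate spending quote_budget on the ask side of an order book.

    asks: list of (price, base_amount) levels from a snapshot.
    fee_rate: fee as a fraction of each fill's cost (0.001 for 0.1%).
    Returns (fills, base_bought, quote_left).
    """
    fills = []
    remaining = quote_budget
    for price, available in sorted(asks):  # cheapest ask first
        if remaining <= 0:
            break
        # Largest amount we can afford at this level once the fee is included.
        affordable = remaining / (price * (1 + fee_rate))
        amount = min(available, affordable)
        cost = amount * price
        fee = cost * fee_rate
        fills.append({"price": price, "amount": amount, "cost": cost, "fee": fee})
        if affordable <= available:
            # Budget exhausted; the rest of this level stays on the book.
            remaining = 0.0
            break
        remaining -= cost + fee
    base_bought = sum(fill["amount"] for fill in fills)
    return fills, base_bought, remaining


# Toy order book with made-up prices and sizes (not the book from Figure 1).
toy_asks = [(10.0, 5.0), (20.0, 5.0), (30.0, 5.0)]
fills, base_bought, quote_left = simulate_market_buy(toy_asks, 101.0, 0.01)
for fill in fills:
    print(fill)
print(base_bought, quote_left)
```

If the book runs out of asks before the budget is spent, the function returns the unspent amount in `quote_left`, which a backtest should treat as an unfilled remainder instead of assuming the full order executed. Floats keep the sketch short; a production tool would more likely use `decimal.Decimal` to avoid rounding drift.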

In total, we bought exactly 7260.08410553 ENJ after all trades were completed. If we had only used OHLCV candlestick data, our estimate could have been as far off as 7319.76112984 ENJ. This is a difference of almost 60 ENJ, or about 0.8%. It might not seem like a lot, but this small percentage compounds quickly if we are simulating hundreds or thousands of trades.
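To get a feel for how quickly such an error compounds, the sketch below takes the two ENJ amounts quoted above and applies the same relative overestimate to every trade. That is a simplifying assumption: it supposes each trade is overstated by the same fraction and that the full balance is reinvested every time. The function name is illustrative.

```python
def compounded_overestimate(per_trade_error, num_trades):
    """Fraction by which the final balance is overstated after num_trades."""
    return (1 + per_trade_error) ** num_trades - 1


# Per-trade error implied by the two ENJ amounts quoted above.
candle_estimate = 7319.76112984
order_book_result = 7260.08410553
per_trade_error = (candle_estimate - order_book_result) / order_book_result

after_100_trades = compounded_overestimate(per_trade_error, 100)
print(f"per-trade error: {per_trade_error:.4%}")
print(f"overestimate after 100 trades: {after_100_trades:.1%}")
```

Under these assumptions, 100 trades are already enough for the simulated balance to be more than twice the realistic one.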

Once the trade simulation is complete, record the results of the order so we can use those funds to trade to another asset later in the backtest. Using this detailed trade record, we can keep meticulous logs of every trade that was made during the backtest. These logs can be used to calculate additional stats like the trading volume we execute, how many trades we performed, and the frequency of buying or selling a particular asset.
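A trade log can be as simple as a list of records. The sketch below shows one possible shape and how the statistics mentioned above could be derived from it. The field names and the entries are made up for illustration and are not a required schema.

```python
from collections import Counter


def summarize_trades(trade_log):
    """Aggregate simple statistics from a list of simulated trades."""
    return {
        "trade_count": len(trade_log),
        "volume": sum(t["quote_value"] for t in trade_log),
        "buys": Counter(t["asset"] for t in trade_log if t["side"] == "buy"),
        "sells": Counter(t["asset"] for t in trade_log if t["side"] == "sell"),
    }


# Made-up log entries recorded during a backtest.
trade_log = [
    {"time": 1, "side": "buy", "asset": "ENJ", "quote_value": 1500.0, "fee": 1.5},
    {"time": 2, "side": "sell", "asset": "ENJ", "quote_value": 750.0, "fee": 0.75},
    {"time": 3, "side": "buy", "asset": "BTC", "quote_value": 750.0, "fee": 0.75},
    {"time": 4, "side": "buy", "asset": "ENJ", "quote_value": 250.0, "fee": 0.25},
]
stats = summarize_trades(trade_log)
print(stats)
```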

Performance Results

Calculating the performance of a strategy is simple. All we need to do is calculate the value of our portfolio at the beginning of the backtest and compare it to the value of our portfolio at the end of the backtest.

The value of a portfolio is calculated by multiplying the amount of each asset we hold by the price of that asset and summing the values of all assets in the portfolio.

By doing this calculation at the beginning of the backtest and once again at the end of the backtest, we can get the change of value for our portfolio over the course of the backtest.
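The valuation step can be sketched as a single sum. The holdings and prices below are made up, and the function name is illustrative. Which price to use for each asset (last trade, mid price, or best bid) is a design choice this sketch leaves to the caller.

```python
def portfolio_value(holdings, prices):
    """Sum of amount * price over every asset held.

    holdings: {asset: amount}
    prices: {asset: price in the quote currency}
    """
    return sum(amount * prices[asset] for asset, amount in holdings.items())


# Made-up holdings and USDT prices for illustration.
holdings = {"BTC": 0.5, "ETH": 2.0, "USDT": 100.0}
prices = {"BTC": 20000.0, "ETH": 1500.0, "USDT": 1.0}
print(portfolio_value(holdings, prices))
```

Calling the same function with the holdings and prices at the start and at the end of the backtest gives the two values needed to measure the change.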

Calculating performance can then be done by using the equation:

Performance = [(Vf - Vi) / Vi] x 100

Where,

Vf is the final value of the portfolio
Vi is the initial value of the portfolio
Multiply by 100 to convert from a decimal to a percentage
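The same equation as a small function, with made-up portfolio values. The function name and the guard against a non-positive initial value are additions for this sketch.

```python
def performance_percent(initial_value, final_value):
    """Percentage change in portfolio value: [(Vf - Vi) / Vi] x 100."""
    if initial_value <= 0:
        raise ValueError("initial_value must be positive")
    return (final_value - initial_value) / initial_value * 100


# Made-up example: a portfolio that grows from 1,000 to 1,250 USDT.
print(performance_percent(1000.0, 1250.0))
# And one that shrinks from 1,000 to 900 USDT.
print(performance_percent(1000.0, 900.0))
```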

Note that the purpose of a backtest is not only to optimize for performance. Just because a strategy performs well under one set of backtest conditions doesn’t automatically mean it’s a good strategy. We must also consider the consistency and robustness of the strategy.

Backtest Consistency - The ability to produce similar results over different historical periods and varying market conditions.

Backtest Robustness - The ability to produce similar results even when minor changes are made to the strategy parameters.

A strategy without robustness can see big performance swings when even the smallest changes are made to a strategy’s parameters. Similarly, a strategy that isn’t consistent will likely experience vastly different results when testing different historical time periods.

In the ideal case, we want to use a strategy that can be backtested on any historical time period and produce similar results. Likewise, our strategy’s performance should not experience large swings when minor changes are made to the strategy.

Strategies without consistency or robustness can lead to wildly unpredictable future performance. If backtesting a variety of historical time periods and configurations for our strategy produces widely varying results, it could indicate that our strategy is unpredictable. In that case, selecting only a single configuration or backtesting period to evaluate would essentially be overfitting the strategy to a particular situation. The results of an overfit backtest would not be a general representation of the strategy.
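One simple way to quantify consistency and robustness is to run the same backtest many times and look at how far the results spread. The sketch below assumes the backtests have already been run and only summarizes their returns. The return figures, the parameter labels, and the function name are made up for illustration.

```python
from statistics import mean, pstdev


def summarize_results(returns_percent):
    """Summarize how much a set of backtest returns (in percent) varies."""
    return {
        "mean": mean(returns_percent),
        "stdev": pstdev(returns_percent),
        "range": max(returns_percent) - min(returns_percent),
    }


# Made-up returns for illustration only.
# Same parameters, different historical periods (consistency).
by_period = {"2018": 12.0, "2019": 10.0, "2020": 14.0}
# Same period, slightly different strategy parameter (robustness).
by_parameter = {"4%": 12.0, "5%": -30.0, "6%": 45.0}

consistency = summarize_results(list(by_period.values()))
robustness = summarize_results(list(by_parameter.values()))
print(consistency)
print(robustness)
```

In this made-up case the strategy looks consistent across periods (a range of 4 percentage points) but not robust (a range of 75 percentage points across neighboring parameter values), which would be a reason to distrust any single configuration’s result.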

An example of a consistent strategy we have found is rebalancing. In the vast majority of cases, rebalancing outperformed hodling, even when we adjusted the rebalance period from 1 hour to 1 day to 1 month.

Backtest Red Flags

Due to the technical nature of backtesting, it is sometimes difficult to tell whether a backtest is reliable. The following red flags will help you judge whether the results of a backtest are reasonable. This is not an exhaustive list, but it covers some of the most common cases.

Performance increases after every trade. If the performance consistently increases, especially after each trade, this can indicate there is a calculation error in the trading logic.
Consistent exponential growth in funds. When the performance results of a backtest grow exponentially over time, this is often the result of using OHLCV candlesticks for the trade simulations, or it points to a calculation error in how percentages are applied.
High-frequency trading strategies don’t decrease in value. In general, a strategy that trades a significant amount will lose value due to trading fees. If a high-frequency trading strategy doesn’t lose value, the backtest might not be considering exchange fees.
Low-liquidity markets perform the same as high-liquidity markets. A simple way to detect if a backtesting tool uses OHLCV candlestick data or aggregated data is to run the strategy on a low-liquidity market that typically has a large spread. High-frequency trading on a low-liquidity market should result in large portfolio losses.
Changing exchanges doesn’t impact results. Every exchange has different liquidity and trading fees. When backtesting strategies on different exchanges, you should get different results. If you get the same results on different exchanges, that suggests the backtesting tool is using aggregated data and doesn’t use the correct trading fee for each individual exchange.

Before accepting the results of a backtest at face value, use these red flags to identify issues with the simulated trades.
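The fee-related red flag can be turned into a quick sanity check. The sketch below estimates how much value trading fees alone should remove in a flat market, under the simplifying assumption that every trade turns over the whole portfolio, and flags a backtest whose reported result is noticeably better than that. The function names and the tolerance are hypothetical. Spread and slippage are ignored here, and they would only make the expected loss larger.

```python
def expected_value_after_fees(initial_value, fee_rate, num_trades):
    """Value left after num_trades full-portfolio trades in a flat market."""
    return initial_value * (1 - fee_rate) ** num_trades


def looks_fee_free(initial_value, reported_final_value, fee_rate, num_trades,
                   tolerance=0.01):
    """True if a flat-market backtest kept more value than fees should allow."""
    expected = expected_value_after_fees(initial_value, fee_rate, num_trades)
    return reported_final_value > expected * (1 + tolerance)


# Made-up check: 1,000 trades at a 0.1% fee on a 10,000 USDT portfolio.
expected = expected_value_after_fees(10000.0, 0.001, 1000)
print(round(expected, 2))
print(looks_fee_free(10000.0, 10000.0, 0.001, 1000))
```

A tool that reports an unchanged 10,000 USDT after a thousand trades in a flat market is very likely not charging fees at all.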

Conclusions

There have been a few major themes throughout this article. Primarily, we have demonstrated how difficult it can be to build a robust backtesting tool. However, at the same time, we’ve been able to illustrate the importance of backtesting a strategy before deploying the strategy live.

The first step to building a backtesting tool has always been to have high-quality data. Without high-quality order book data, the results will be highly inaccurate. Ultimately, making decisions based on faulty backtesting tools can be costly. It can leave us with unrealistic expectations for a strategy that, in practice, eats away at our portfolio.

When building a backtesting tool, don’t forget to simulate trading fees, slippage, and the bid-ask spread. Each of these aspects of a backtest can make a big difference. Removing even one of these components from the backtest can be the difference between a profitable and unprofitable strategy.

Finally, before deploying a strategy based on backtests, keep testing. When you think you’re done testing, test again. Instead of 100 tests, run 100,000 tests. Backtesting is the best way for us to understand the behavior of a strategy. Try forming new hypotheses for strategies and testing those hypotheses to identify new strategies. Continue the experimentation cycle until you find strategies that work for you.
