FreqAI — from price to prediction

Emergent Methods
16 min read · May 16, 2023


A guide to feature engineering for algorithmic trading using machine learning
- by Robert Caulk and Elin Törnquist

Introduction

Market dynamics remain one of the most stochastic processes known to humankind. This is likely due to the origin of the stochasticity: human emotion. Although the study of market dynamics dates back thousands of years to futures pricing for olives in antiquity, equity price movements remain difficult to predict. While equity traders like to find and exploit market patterns to make their own predictions, the discovery and exploitation of these patterns affect how future patterns may or may not emerge. This phenomenon, combined with the chaotic nature of human emotion, results in a very nasty parameter space. It is therefore a perfect proving ground for adaptive modeling methods and feature engineering.

Figure 1: Some typical market patterns used by traders to make decisions about their trading strategies.

Until the advent of machine learning, market patterns were described using standard math and statistical quantities. For example, the Simple Moving Average (SMA) over the last n candles is defined as SMA_n(t) = (close(t) + close(t−1) + … + close(t−n+1)) / n.

The SMA equation can be summarized as finding the mean for the asset closing price over a prescribed historical time window. Asset closing price is simply a discretization of the price movement on the asset during a specific time frame commonly referred to as the “candle time frame”. This means that the SMA can be computed for various time frames, such as 5 minute closing price (5m), 1 hour closing price (1h), and so on. Indicators, such as the above example SMA, are numerous. In general, these indicators are all various combinations of rolling standard deviations and rolling sums on asset volume and price.
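As a concrete illustration, the SMA described above can be computed in a few lines of plain Python. The function name and the closing prices below are hypothetical, for illustration only:

```python
def sma(closes, n):
    """Simple Moving Average: the mean of the last n closing prices."""
    if len(closes) < n:
        # Not enough history yet; the indicator is undefined (NaN territory)
        return None
    return sum(closes[-n:]) / n

# Hypothetical 1h closing prices (illustrative values only)
closes = [100.0, 101.5, 99.8, 102.3]
print(sma(closes, 4))  # mean of the last 4 closes -> 100.9
```

The same computation is typically done with `pandas.Series.rolling(window=n).mean()` in practice, which also makes the leading NaN region explicit.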

Indicators like these provide fertile ground for feature set engineering in machine learning. Each indicator is used to explain different market dynamics at different scales. Since these market dynamics derive themselves from human emotion, the indicators themselves are a derivation of human emotion. Thus, indicators provide a coarse look at how traders are feeling and how they may act or react at any given moment.

The behavior of technical indicators, such as the SMA, is often used to characterize market sentiment. An example is when the 50-day SMA crosses above the 200-day SMA — this is referred to as a “golden cross” and indicates a shift in market momentum that suggests higher prices to come.

Figure 2: The relationship between technical indicators can be used to assess market sentiment.

The first instinct for any machine learning enthusiast may be to simply feed raw indicators directly into a neural network or regression algorithm and train it to the price movements. However, the raw indicators do not contain sufficient information to make a probabilistically relevant prediction. We need to build a more sound data set if we hope to properly capture market sentiment.

Establishing the parameter space

At its core, machine learning is simply an optimized method for finding a functional description of a unique “parameter space.” The parameter space is another way to say “the complex relationships between market variables that produce the observable behavior.” Let’s simplify this a bit and look at the driving components of the market dynamics:

Figure 3: The orthogonality of disparate data sources helps expand the parameter space of the feature set.

While this is a gross simplification of the true parameter space (which itself is changing with time), it gives an idea of the relationship between some of the key contributors in the market parameter space. Keep in mind, we cannot perfectly describe the true parameter space, we can only try our best to “fill it out” with our observations (e.g., price data, on-chain data, social data, news, etc.). Still, other parameter space variables exist that are not observable to us. For example, we know that traders interacting with the market are also influencing the market. Say a trader gets stuck in traffic and has a mood change which affects his/her trading for the day. This is a contributing variable to the parameter space that we cannot know or model.

Keep in mind that the parameter space itself is dynamic in time, especially for chaotic systems like crypto markets. This is why FreqAI offers adaptive modeling capabilities, to allow the model to adjust to the changing parameter space.

Knowing the limitations of the information available to us, we need to think about how to best fill out the parameter space so that the machine learning algorithm can properly interpolate within it. Enter feature set engineering.

Deeper feature engineering

Feature set curation is a hot topic in the machine learning world, and rightly so as it has a profound effect on the modeling outcome. The discussion here is to demonstrate the nature of feature set design in market dynamics, to guide feature set engineers as they begin transforming their data sets in creative ways, and to introduce how feature engineering is done in FreqAI — an open-source project for real-time adaptive modeling of cryptocurrency markets using machine learning, developed by Emergent Methods with the help of more than 15 open-source contributors. Details about the FreqAI architecture are published in the Journal of Open Source Software [1].

The first step of feature set design requires the creation of a diverse base feature set with a goal of spreading our parameter space as broadly as possible. Technically speaking, this is called “increasing feature variance.” Feature variance is easily quantifiable by assessing the correlation between features through building a “correlation matrix.” In some machine learning applications, this correlation matrix may be used to identify strong correlations. However, in market dynamic prediction, we seek the opposite — we are looking for features that are uncorrelated. This non-correlation is an indication that the feature is providing some additional information about the parameter space that is not captured by any of the other indicators (see Figure 3). The importance of this step cannot be stressed enough — this is the foundation of the feature set design. A weak foundation will lead to a poorly described parameter space, which leads to an incorrect model and thus, poor predictions.
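As a sketch of this check, assuming pandas and NumPy are available, a correlation matrix over a toy feature frame can flag redundant columns. The column names and data below are illustrative, not FreqAI conventions:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Hypothetical base features: two nearly-identical SMAs and one
# volume-based feature drawn from an unrelated information source.
n = 500
price = pd.Series(rng.normal(0, 1, n).cumsum())
features = pd.DataFrame({
    "sma_10": price.rolling(10).mean(),
    "sma_12": price.rolling(12).mean(),   # almost redundant with sma_10
    "volume": rng.normal(0, 1, n),        # uncorrelated with price features
}).dropna()

# Pairwise Pearson correlations; values near +/-1 flag redundant features,
# while values near 0 indicate a feature adds new information.
corr = features.corr()
print(corr.round(2))
```

Here the two SMA columns would show a correlation close to 1 (one of them adds little), while the volume column sits near 0, the kind of feature this step is hunting for.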

Figure 4: Components and transformations to use during feature set engineering.

The second step of market feature design is best described as “transforming” the base set of indicators. These transformations are where we can inject our creativity into the model. It is where we “sculpt” the coarse parameter space outlined by our base features.

Before transforming the base indicators, we should consider that any transformation we perform on the data should be anchored to the origin of market dynamics: human emotion. In other words, we first need to answer the question, “why do traders make decisions?” before we can start transforming our base feature set. It is illustrative to examine how traders look at price movements in their TradingView charts. Traders don’t look at an indicator at a single moment in time — they look at the historic evolution of the indicator through time before making their decision to buy or sell an asset. It makes sense: an increasing SMA may suggest the market is gaining strength, while a decreasing SMA could suggest the opposite. Thus, when engineering features for a machine learning model, we should transform the base data set to include the change of the indicators. This can be done in a variety of ways, such as taking the first derivative of the time series, or by flattening the desired number of time steps into the input vector of the model. In the latter case, the input vector is one-dimensional, with length equal to N features × T time points of interest.
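A minimal sketch of the flattening approach, assuming NumPy and a hypothetical feature matrix `X` where rows are candles and columns are features:

```python
import numpy as np

# Hypothetical feature matrix: 6 candles x 2 features (illustrative values).
X = np.arange(12, dtype=float).reshape(6, 2)

T = 3  # number of past candles to include in each input vector

# Flatten each window of T consecutive rows into a single 1-D input vector
# of length N_features * T, preserving the temporal ordering.
windows = np.stack([X[i : i + T].ravel() for i in range(len(X) - T + 1)])

print(windows.shape)  # (4, 6): 4 samples, each 2 features x 3 time steps
```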

FreqAI feature: Expanding the base feature set with a temporal shift is done by setting an integer for the number of requested shifts: `"include_shifted_candles": 4`.

Note: This temporal expansion is a good tactic for decision trees where the temporal aspect of the time series is otherwise lost. However, in other cases, such as for autoregressive methods, the temporal nature of the feature is inherent to the model training/inferencing and is decided by dataset creation and batch sizing.

Traders also tend to watch multiple temporal resolutions (“time frames”) to evaluate both micro- and macro-movements. Thus, it makes sense to replicate all base indicators for a variety of time frames so that the parameter space includes the relationship between micro- and macro-movements.

FreqAI feature: Expanding the base feature set for multiple time frames is done by providing a list of time frames: `"include_timeframes": ["5m", "15m", "4h", "1d"]`.

Traders also like to vary the look-back window on their indicators. These windows are often quite personal and traders become tuned in to their “preferred” look-back windows. This means the feature set should also include a varied look-back period such as SMA(10 candles), SMA(20 candles), and SMA(30 candles). By including these look-back periods, the model gains a sense of this additional dimension that traders use before clicking “buy” or “sell.”

FreqAI feature: Expanding the base feature set for multiple look-back windows is done by providing a list of window sizes: `"include_periods_candles": [10, 20, 30]`.

These are some of the transformations that we may consider when building our feature set. But it should be noted that these transformations start to enlarge the data set in an exponential manner. For example, if we want three look-back periods, for each of three time frames, and we want the past three candles for each of those, the total feature set becomes 3 * 3 * 3 = 27 features based on a single indicator selected during the first step of feature design. If the feature set includes 20 indicators, the total feature set contains 540 features.
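The arithmetic above can be verified with a few lines; the counts mirror the example in the text, including the informative assets introduced below:

```python
# Feature-set growth from stacking the transformations described above.
base_indicators = 20
look_back_periods = 3   # e.g. windows of 10, 20, 30 candles
timeframes = 3          # e.g. 5m, 15m, 4h
shifted_candles = 3     # past candles included per feature

per_indicator = look_back_periods * timeframes * shifted_candles
total = base_indicators * per_indicator
print(per_indicator, total)  # 27 540

# Adding two informative assets multiplies the set again:
assets = 1 + 2  # the traded asset plus two informative assets
print(total * assets)  # 1620
```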

It is common for traders to also keep “informative assets” in mind when entering or exiting a trade on an asset. Informative assets are usually market leaders that tend to exert a strong influence on market movements. For example, a cryptocurrency trader would perform poorly if they did not have the health of Bitcoin in mind at all times when entering and exiting trades. Likewise, the S&P500 plays a similar role in traditional equity markets. Thus, a good feature set should include informative asset data for all the base indicators, as well as all the transformations discussed above. Further, traders typically use more than a single informative asset. Adding informative assets expands the feature set even more: If we include two informative assets in our feature set, the base feature set of 20 indicators has now grown to 1620 in total.

FreqAI feature: Including informative assets in the feature set is simply done by providing a list of the assets: `"include_corr_pairlist": ["BTC/USDT", "ETH/USDT"]`.

Time frames, temporal shifts, look-back windows, and informative assets are only a few possible components to add to the feature set. It is only the imagination of the feature set engineer that is the limit.

Data preprocessing

Data set preprocessing is equally important to base feature set design. It cannot be ignored. This is another piece of the model design process where we can inject creativity into the model. However, the basis of data set preprocessing is not dependent on human emotion, instead it is dependent on maintaining statistical control over the dataset.

Figure 5: Preprocessing steps to ensure a statistically robust feature set.

For illustrative purposes, we take our feature set from the previous section, which comprises 1620 columns (each column represents one of the features). Each row of this data set is a single data point representing the feature set at a single moment in time (typically the time of the candle close in traditional assets and cryptocurrencies).

Dealing with missing data

The first step of preprocessing this data set is to identify the non-existing data entries and remove them (Figure 5: NaN removal). These are referred to as “NaNs” in data science, and their mismanagement is a typical pitfall for early machine learning enthusiasts. NaNs originate from missing data points. These missing data points may be holes in the data set due to difficulty collecting data at the original source, but more commonly in market analytics, NaNs originate from the look-back period associated with the majority of market indicators. For example, the SMA('1h', window=30) computes the SMA on 1 hour candles using statistical quantities associated with the previous 30 candles. Thus, the feature requires 30 hours of closing price data before the SMA('1h', window=30) has sufficient data. Until that point, the data set is filled with NaNs. These NaNs must be properly dealt with or else the parameter space will be incorrectly defined, resulting in poor model performance.

Many early machine learning enthusiasts believe that these NaNs can simply be searched for and replaced with “0” or some other quantity that has no “apparent” relevance. This is a mistake — inserting 0s in place of NaNs for our SMA('1h', window=30) would bias the model to associate a single outlier value, 0, with price action. Other data science students may think more critically about this and decide to replace the NaNs with a statistical quantity of the column, such as the mean or median. This is a better approach, but these students should consider that there are other features associated with the same data points, and the combination of the median from Feature A with a real value in Feature B is an artificial combination that never occurred in reality. If we feed unrealistic data to the model, how can we expect it to make realistic predictions?

The only statistically consistent and realistic way to handle NaNs is to remove the entire data point (the full row, including the other existing feature data).
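A minimal pandas sketch of this whole-row removal, on a hypothetical frame containing a short-window and a long-window indicator:

```python
import numpy as np
import pandas as pd

# Hypothetical feature frame: the slower indicator needs more history,
# so it has more leading NaNs than the faster one.
df = pd.DataFrame({
    "sma_fast": [np.nan, 1.0, 1.1, 1.2, 1.3],
    "sma_slow": [np.nan, np.nan, np.nan, 1.1, 1.2],
})

# Drop the full row wherever ANY feature is NaN, so every remaining data
# point is a combination of values that actually occurred together.
clean = df.dropna(axis=0, how="any")

print(len(clean))  # 2 rows survive: the set is limited by the slowest indicator
```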

The astute reader would recognize that the number of data points available in our indicator data set is strictly limited by the indicator at the highest time frame using the longest look-back window. Despite the fact that our data set is mostly filled by the shorter indicators relying on much less time, the data corresponding to time points where the longer time frame indicators are NaN must be sacrificed to satisfy strict statistical assumptions about the data. This sacrifice is not in vain: it enforces a clear picture of the parameter space.

FreqAI feature: FreqAI automates this entire process so that the user does not need to worry about how much data is needed or where the data should come from. For example, if the user asks for one full month of training data given a custom set of indicators, FreqAI will identify the longest indicator and preload the data so that there are no time-frame-induced NaNs in the entire month of data.

Feature scaling

Once the dataset is cleaned, each feature column should be normalized so that all features have matching amplitudes. The normalization process is typical in machine learning for a variety of reasons. The principal motivation for normalization originates from the optimization process that each of the popular machine learning algorithms employs. Deferring details to [2], in summary: it is more efficient to run an optimization on a set of features that are all scaled to the same interval.

While neural networks perform much better with normalized data, decision tree algorithms do not warrant the same feature scaling because they operate based on rules relating to each individual feature. Nevertheless, scaling features for use in decision tree algorithms has no adverse effects and is hence still advisable.

FreqAI feature: FreqAI automatically scales training features to the interval [-1, 1]. The scaling information from the train dataset is then used to transform the test and prediction data to prevent data leakage.
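This train-fit, test-transform pattern can be sketched with scikit-learn's `MinMaxScaler`; the values below are hypothetical, and FreqAI's internal implementation may differ:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical split: scaling parameters come from the training data only,
# and the identical transform is then applied to the test data (no leakage).
X_train = np.array([[0.0], [5.0], [10.0]])
X_test = np.array([[2.5], [12.0]])  # 12.0 lies outside the training range

scaler = MinMaxScaler(feature_range=(-1, 1)).fit(X_train)

print(scaler.transform(X_train).ravel())  # [-1.  0.  1.]
print(scaler.transform(X_test).ravel())   # [-0.5  1.4] -- may exceed [-1, 1]
```

Note that unseen data can legitimately land outside the [-1, 1] interval; that in itself is a useful hint that the point lies outside the training distribution.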

Feature scaling is not only important for optimizing model training. Many data manipulation techniques rely on the features being on the same scale. One such example is dimensionality reduction techniques, such as Principal Component Analysis (PCA), which will not yield proper transformations unless the input data is scaled. Principal component transformation is useful when we have a very large feature set that takes a long time to train on. The transform creates a new set of features where each one contains components of the original features. The new PC features are ordered in terms of how much of the variance of the original features they contain. By keeping only those representing the largest variance, we can reduce the size of our feature set while retaining as much information as possible.

FreqAI feature: FreqAI allows users to reduce the dimensionality of their feature set by simply setting `"principal_component_analysis": true` in their configuration file.
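A minimal PCA sketch with scikit-learn, using a hypothetical redundant feature set rather than FreqAI's internal pipeline:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical redundant feature set: 3 observed columns all driven by a
# single latent factor, plus a small amount of noise.
latent = rng.normal(0, 1, (200, 1))
X = np.hstack([latent, 2 * latent, -latent]) + rng.normal(0, 0.01, (200, 3))

# Keep just enough components to explain 95% of the original variance.
pca = PCA(n_components=0.95).fit(X)
X_reduced = pca.transform(X)

print(X_reduced.shape[1])  # a single component suffices here
print(pca.explained_variance_ratio_)
```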

Another important effect of feature scaling is seen for distance-based algorithms, such as SVMs and k-means, as they rely on distance measurements for their computations. Such algorithms can be used for both feature engineering and outlier detection.

Outlier detection

As with most chaotic systems, the financial market suffers from non-patterned events that can be seen as noise in the feature set. These events lie in the tail of the statistical distribution of the parameter space and can be classified as outlier data points.

Classifying outlier data points is an extremely beneficial practice in chaotic systems. On one hand, we can identify that an incoming point does not belong to our known parameter space (such data are known as “novelty data”). That means our model will not have high confidence in a prediction associated with this point, since it will need to extrapolate into unknown territory. On the other hand, we can make choices about which data points we want to train on to help sculpt our model’s understanding of the parameter space. In other words, we can purposefully exclude outlier points that we believe do not contribute to the structure of the parameter space.

Figure 6: Example of outlier detection using the Dissimilarity Index in FreqAI.

Irrespective of the technique, novelty and outlier identification should be based on a characterization of the training data set in order to circumvent data leakage, just as for data scaling. As such, if a test data set is present, outliers should be identified and removed using the parameters obtained from the training data; novelty data in incoming inference data should be identified and removed in the same manner.

FreqAI feature: FreqAI implements three different methods for outlier and novelty detection: Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Support Vector Machine (SVM), and a custom metric — the Dissimilarity Index (DI).
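As a rough sketch of SVM-based novelty detection, here is scikit-learn's `OneClassSVM` fit on the training data only; the data are hypothetical, and FreqAI's actual implementation may differ:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)

# Hypothetical scaled training features: a tight cluster in 2-D.
X_train = rng.normal(0, 1, (300, 2))

# Fit the decision boundary on training data only (no leakage); nu bounds
# the fraction of training points allowed to fall outside the boundary.
detector = OneClassSVM(nu=0.05).fit(X_train)

# Incoming inference points: one typical, one far outside the training cloud.
X_new = np.array([[0.1, -0.2], [8.0, 8.0]])
labels = detector.predict(X_new)  # +1 = inlier, -1 = outlier/novelty

print(labels)
```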

Creating model targets

The creation of targets (“labels”) is just as important as the creation of features. Targets require many of the same data processing treatments, such as NaN removal and normalization. But the decision of what to target is one of the most creative parts of the model design.

For modeling the financial market, targets can be as simple as looking at the price change X candles into the future. But there are both advantages and disadvantages to using such a naive approach. For example, you still need to make a decision post inference about what threshold of price change you are willing to accept before entering a trade. Any trader knows that the expected price change will fluctuate, by definition, with market volatility. So a better target is one that adapts with the market. One example used in FreqAI is the detection of local minima/maxima of the closing price in, for example, a 200-candle moving window.

Figure 7: Extrema (maxima and minima) points can be identified using a sliding window.

Deciding the length of the moving window is where we need to step back and reconsider our sculpted parameter space. Does our parameter space explain a 200-candle window on the 5 minute time frame? Establishing a rule for these things can be difficult, so it is more illustrative to consider the periodicity in our parameter space. Do we have fluctuations across many 200-candle windows on the 5 minute time frame? This would mean that we have a periodic description of that time scale. An example limit case would be to use a 1-week candle target in a model that is trained on 1 week of 5m and 15m candles — there is not enough weekly pattern described in the parameter space to properly inform the model about how to describe the target.
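One way to sketch such an extrema target, using SciPy's `argrelextrema` on hypothetical closing prices; this is an illustration of the idea, not FreqAI's exact implementation:

```python
import numpy as np
from scipy.signal import argrelextrema

# Hypothetical closing prices with clear local structure.
close = np.array([1.0, 2.0, 3.0, 2.0, 1.0, 0.5, 1.5, 2.5, 2.0, 1.0])

order = 2  # half-width of the comparison window, in candles

# Indices where close is a local max/min relative to `order` neighbors
# on each side.
maxima = argrelextrema(close, np.greater, order=order)[0]
minima = argrelextrema(close, np.less, order=order)[0]

# Encode as a classification target: 1 = maximum, -1 = minimum, 0 = neither.
target = np.zeros(len(close), dtype=int)
target[maxima] = 1
target[minima] = -1
print(maxima, minima)  # [2 7] [5]
```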

FreqAI feature: FreqAI allows the user to define any target and to use the resulting model predictions for custom enter and exit conditions.

The need for adaptive training

Market regime changes are, by definition, the cause of price movement. Traders speculate on market regime changes at all scales, starting with day traders looking for afternoon market changes, to macro-traders looking to buy into decadal capitulation events. In any case, the market character is always changing, and this means that the model used to describe it needs to change with it.

There are a variety of machine learning approaches designed to accommodate changing environments. Common approaches include fine-tuning models on the fly, LSTM methods, autoregressive roll-outs, and reinforcement learning techniques. While variations of these are already available in FreqAI, the architecture of FreqAI introduces a baseline for adaptive modeling, which is to simply retrain the model on a sliding window as new data streams into the system (Figure 8). The hypothesis is that the chaotic changes to the parameter space are so strong that starting from scratch may be more adaptive than fine-tuning on the fly (however, all methods are available for experimentation in FreqAI).
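The sliding-window retraining idea can be sketched in a few lines; `train_model` and the data stream below are hypothetical stand-ins, not FreqAI APIs:

```python
def train_model(window):
    # Placeholder: in practice, fit any regressor from scratch on the window.
    return {"trained_on": (window[0], window[-1])}

data = list(range(10))   # stand-in for a stream of candle timestamps
window_size = 6          # look-back window used for each (re)training
retrain_every = 2        # retrain after this many new candles arrive

models = []
for t in range(window_size, len(data) + 1, retrain_every):
    window = data[t - window_size : t]
    models.append(train_model(window))  # model rebuilt from scratch each time

print([m["trained_on"] for m in models])  # [(0, 5), (2, 7), (4, 9)]
```

Each retraining sees the most recent `window_size` points only, so the model's picture of the parameter space slides forward with the market.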

Figure 8: Adaptive modeling of a system that changes over time requires updating the model when new data become available.

FreqAI feature: The robust architecture of FreqAI takes care of updating and saving data, training models, inferencing trained models as new data arrives, and providing actionable predictions based on the most up-to-date model available.

Conclusion

This article touches upon a variety of important considerations for feature set engineers to keep in mind when setting up their machine learning workflow. We hope that it can help improve the feature set design process and encourage out-of-the-box thinking, while keeping in mind the fundamentals crucial to ending up with a robust and balanced feature set.

If you want to start watching, learning from, and trading on the cryptocurrency market, all you need to do to run the FreqAI software is install Freqtrade and run the following command:

freqtrade trade --config config_examples/config_freqai.example.json --strategy FreqaiExampleStrategy --freqaimodel LightGBMRegressor --strategy-path freqtrade/templates

You are welcome to join the FreqAI discord, where you will find a bunch of like-minded people to share your experiences with, and visit the Emergent Methods’ website, where we have ongoing experiments showcasing the predictive effects of different machine learning algorithms, feature engineering, and more.

References

  1. FreqAI: generalizing adaptive modeling for chaotic time-series market forecasts. Robert A. Caulk, Elin Törnquist, Matthias Voppichler, and others. Journal of Open Source Software (2022), doi:10.21105/joss.04864
  2. Studying the effects of feature scaling in machine learning. Hanan Alshaher. Doctoral dissertation, North Carolina Agricultural and Technical State University (2021).

DISCLAIMER FreqAI is not affiliated with any cryptocurrency offerings. FreqAI is, and always will be, a not-for-profit, open-source project. FreqAI does not have a crypto token, FreqAI does not sell signals, and FreqAI does not have a domain besides the Freqtrade documentation https://www.freqtrade.io/en/latest/freqai/. Please beware of imposter projects, and help us by reporting them to the official FreqAI discord server.


Emergent Methods

A computational science company focused on applied machine learning for real-time adaptive modeling of dynamic systems.