Real-time head-to-head: Adaptive modeling of financial market data using XGBoost and CatBoost | by Emergent Methods | Jun, 2023


Which gradient boosted decision tree algorithm is superior for adaptive modeling of financial market data, XGBoost or CatBoost? There are many articles comparing these algorithms on arbitrary static datasets, but how do they perform in a live, chaotic environment? How about resource usage like average training times, average inference times, CPU usage, and RAM consumption? And finally, how well do these predictions translate into profit? To answer these questions, we designed a benchmark experiment, ran it live, and collected the results. Spoiler alert: XGBoost was much faster and won by almost 4x in terms of profitability.

Financial markets are inherently chaotic, with price action reacting to unforeseen news events, market manipulation, and herd mentality. Traditional modeling techniques often struggle to keep up with such unpredictability. This is where adaptive modeling comes into play by providing a dynamic framework that can adjust and adapt to changing market conditions on the fly.

Figure 1. Adaptive modeling of a system that changes over time requires training new models when new data becomes available.

Adaptive modeling boils down to data management. Data is streamed in to the model for both re-training and inferencing. In live environments, re-training and inference occur simultaneously. Meanwhile, the data processing pipeline also needs careful consideration in terms of feature engineering, normalization, outlier removal, and any other manipulation techniques.

With FreqAI [1], we have put together all of these different mechanics for you to run adaptive modeling on cryptocurrency market data via a user-friendly, community-tested, open-source interface, using any external ML library of your choosing.

FreqAI is built on top of the open-source algorithmic trading software Freqtrade, which allows access to a variety of open-source exchange APIs and provides a set of data analysis and visualization tools for evaluating both live and backtesting performance. On top of this, FreqAI currently provides 18 pre-configured prediction models for XGBoost, CatBoost, LightGBM, PyTorch, and Stable Baselines, and hosts a range of custom algorithms and methodologies aimed at improving computational and predictive performance.

To try the software, all you need to do is install Freqtrade and run the following command:

freqtrade trade --config config_examples/config_freqai.example.json --strategy FreqaiExampleStrategy --freqaimodel LightGBMRegressor --strategy-path freqtrade/templates

The ease of switching between models enabled us to pit XGBoost and CatBoost against each other, comparing their performance in the field of algorithmic trading for cryptocurrency. In fact, running them both simply boils down to:

freqtrade trade --strategy QuickAdapterV3 --freqaimodel XGBoostRegressor
freqtrade trade --strategy QuickAdapterV3 --freqaimodel CatboostRegressor

All code related to this experiment is open-source and available for inspection/reproduction. The underlying FreqAI source is available on GitHub. Meanwhile, the strategy and configuration files are available on the FreqAI discord.

To showcase the capabilities of FreqAI, we used two of the most popular ML regressors, XGBoost and CatBoost, to perform a 3-week long study of their performance on live predictive modeling of chaotic time-series data from the cryptocurrency market.

Between February 16th and March 12th, two FreqAI instances, one for each regressor, were configured to train separate models for 19 coin pairs (/USDT). The instances were hosted on separate, identical, recycled servers (12 core Xeon X5660 2.8GHz, 64 GB DDR3). All servers were benchmarked to confirm that they had identical performance.

The accuracy of the predictions produced by each regressor was assessed via two accuracy metrics: the balanced accuracy (the arithmetic mean of sensitivity and specificity) and a custom accuracy score (the normalized temporal distance between a prediction and its closest target).

The regressors were run with default settings, except for setting the number of estimators for XGBoost to 1000 (which is the default value for CatBoost, as compared to the XGBoost default of 100), to showcase their default behavior when applied to this type of problem.

All in all, the cluster was actively producing 38 models (2 per coin x 19 coins) with ~3.3k features for each one, every 2 hours.

Feature engineering

The feature set was based on pair-specific price and volume data from a sliding window of 14 days leading up to the current time point, acquired from the cryptocurrency exchange Binance using the open-source CCXT trading library.

The feature set for each coin pair contained 42 base indicators computed for the base candle timeframe (5 minutes) as well as for 15 minutes, 1 hour, and 4 hours using the open-source libraries TA-Lib, Pandas-TA, QTPyLib, and Freqtrade technical indicators. A subset of the indicators was calculated for multiple time periods (8 minutes, 16 minutes, and 32 minutes), and each was shifted 3 candles to add recency information. We also added day-of-the-week and hour-of-the-day as features, and used BTC and ETH as correlated data for all other coin pairs. In total, this amounted to 3266 features for each coin pair, except for BTC and ETH, which had 2178 features each.

The feature engineering portions of the code are available in the FreqAI discord, and the configuration for feature engineering is rather simple:

"freqai": {
    "feature_parameters": {
        "include_corr_pairlist": [ ... ],
        "include_timeframes": [ ... ],
        "label_period_candles": 100,
        "include_shifted_candles": 3,
        "DI_threshold": 20,
        "weight_factor": 0.9,
        "indicator_periods_candles": [8, 16, 32],
        "noise_standard_deviation": 0.02,
        "buffer_train_data_candles": 100
    }
}


The training labels were defined as the extrema (minimum and maximum) points within a sliding window of 200 candles (1000 minutes). We defined the label as &s-extrema, with a value of -1 for minima and 1 for maxima, and passed a Gaussian filter over them to smooth the labels and improve the regression. The code to reproduce these labels in FreqAI can be found below:

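import numpy as np
from scipy.signal import argrelextrema

def set_freqai_targets(self, dataframe, **kwargs):
    """
    Set targets for FreqAI model. Any column prepended with `&`
    will be treated as a training target.
    """
    # NOTE: the `order` argument is cut off in the source. Using
    # label_period_candles (100 candles on each side, i.e. a 200-candle
    # window) is an assumption based on the configuration above.
    order = self.freqai_info["feature_parameters"]["label_period_candles"]
    dataframe["&s-extrema"] = 0
    min_peaks = argrelextrema(
        dataframe["low"].values, np.less, order=order)
    max_peaks = argrelextrema(
        dataframe["high"].values, np.greater, order=order)
    # label minima with -1 and maxima with 1
    for mp in min_peaks[0]:
        dataframe.at[mp, "&s-extrema"] = -1
    for mp in max_peaks[0]:
        dataframe.at[mp, "&s-extrema"] = 1
    dataframe["minima-exit"] = np.where(
        dataframe["&s-extrema"] == -1, 1, 0)
    dataframe["maxima-exit"] = np.where(dataframe["&s-extrema"] == 1, 1, 0)
    # smooth the labels with a 5-candle Gaussian window
    dataframe["&s-extrema"] = dataframe["&s-extrema"].rolling(
        window=5, win_type="gaussian", center=True).mean(std=0.5)
    return dataframe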
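
To see what the Gaussian smoothing does to a single label, the sketch below rebuilds a 5-candle Gaussian window with std=0.5 in plain Python. It assumes that the weighted rolling mean divides by the sum of the window weights; the helper names are hypothetical and not part of FreqAI.

```python
import math

def gaussian_weights(window=5, std=0.5):
    """Symmetric Gaussian window weights, normalized to sum to 1."""
    center = (window - 1) / 2
    raw = [math.exp(-0.5 * ((i - center) / std) ** 2) for i in range(window)]
    total = sum(raw)
    return [w / total for w in raw]

def smooth(labels, window=5, std=0.5):
    """Centered weighted rolling mean; positions without a full window stay None."""
    weights = gaussian_weights(window, std)
    half = window // 2
    out = [None] * len(labels)
    for i in range(half, len(labels) - half):
        segment = labels[i - half:i + half + 1]
        out[i] = sum(w * x for w, x in zip(weights, segment))
    return out

# A lone maximum label (+1) surrounded by zeros.
labels = [0, 0, 0, 0, 1, 0, 0, 0, 0]
smoothed = smooth(labels)
print([None if v is None else round(v, 4) for v in smoothed])
```

With these settings the extremum keeps roughly 0.79 of its weight and each direct neighbour receives roughly 0.11, so the regressor sees a narrow bump rather than a single-candle spike.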

This label definition means two things: 1) we have an unbalanced classification problem, and 2) we’re using regression for a classification problem:

1. Identifying extrema points within a sliding window of 200 candles means that we’ll have only 2 true positives (one maximum and one minimum) but 198 true negatives.

2. We want to predict if the incoming candle corresponds to a maximum or a minimum or neither, which means that we’re dealing with a multi-class classification problem.

FreqAI gives you access to both the regression and classification versions of each of the ML libraries we used. While we need to pick a threshold to make the final classification when using either classification or regression, we found that the regressor version performs better than the classifier for these predictions.
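
A quick arithmetic sketch of the imbalance described in point 1, using only the numbers stated above (one maximum and one minimum per 200 candles). It illustrates why plain accuracy is a poor yardstick here: a model that never predicts an extremum is still right 99% of the time.

```python
window_candles = 200
extrema_per_window = 2  # one maximum and one minimum
neutral = window_candles - extrema_per_window

positive_share = extrema_per_window / window_candles
# Accuracy of a trivial model that always answers "no extremum".
always_neutral_accuracy = neutral / window_candles

print(f"extrema share: {positive_share:.1%}")
print(f"always-neutral accuracy: {always_neutral_accuracy:.1%}")
```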

Adaptive thresholding

Using a regressor for a classification task requires handling the output predictions after inferencing the trained model. Since a regressor returns a real-valued prediction, this needs to be converted into a binary decision regarding whether or not it’s an extremum. For this, we used an adaptive threshold that was calculated using the mean of the six highest and lowest historical predictions within the previous 50 hours for the maxima and minima, respectively. This was implemented in FreqAI with:

num_candles = 600  # 50 hours of 5 minute candles
pred_df_full = self.dd.historic_predictions[pair].tail(num_candles).reset_index(drop=True)

# keep only the numeric columns, so that every column can be sorted
pred_df_sorted = pred_df_full.select_dtypes(include="number").copy()

# sort each column independently
for col in pred_df_sorted:
    pred_df_sorted[col] = pred_df_sorted[col].sort_values(
        ascending=False, ignore_index=True)

# number of expected maxima (or minima) during the last 50 hours
frequency = num_candles / 200
# get the mean of the top and bottom candles, use this for the threshold
maxima_sort_threshold = pred_df_sorted.iloc[:int(frequency)].mean()
minima_sort_threshold = pred_df_sorted.iloc[-int(frequency):].mean()

Using a dynamic threshold ensures that our regressors output predictions that are adapted to the state of the market that the model was trained on. It also means that there is a 50 hour “warmup” period before enough historical predictions are available to threshold the real-time inference for classification.
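
The thresholding logic can also be sketched without pandas. The function below is a hypothetical stand-alone illustration for a single prediction column, not the FreqAI code; it raises an error during the warmup period, which is an assumption about how one might handle missing history.

```python
def adaptive_thresholds(predictions, label_window=200):
    """Return (maxima_threshold, minima_threshold) from recent predictions.

    predictions: regressor outputs for the lookback period
    (600 five-minute candles = 50 hours in the experiment).
    """
    k = int(len(predictions) / label_window)  # expected extrema of each kind
    if k < 1:
        raise ValueError("not enough history to set a threshold yet")
    ordered = sorted(predictions, reverse=True)
    maxima_threshold = sum(ordered[:k]) / k
    minima_threshold = sum(ordered[-k:]) / k
    return maxima_threshold, minima_threshold

# Hypothetical history: mostly flat predictions with three clear peaks and dips.
history = [0.0] * 594 + [0.9, 0.8, 0.7, -0.9, -0.8, -0.7]
max_thr, min_thr = adaptive_thresholds(history)
print(round(max_thr, 3), round(min_thr, 3))
```

A new prediction is then treated as a maximum if it exceeds max_thr and as a minimum if it falls below min_thr.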

Outlier detection

As we touched upon in FreqAI - from price to prediction, our guide to feature engineering for algorithmic trading using machine learning, outlier detection is paramount to minimizing risk when using machine learning for algorithmic trading. In that article, we described a number of different techniques for outlier and novelty detection. For the experiment we’re presenting here, we opted for the Dissimilarity Index, a custom metric available only in FreqAI.

The Dissimilarity Index (DI) aims to quantify the uncertainty associated with each prediction made by the model by comparing the incoming data used for the prediction to the training data. If the prediction data is far away from the training data, the model will not be able to properly assess it and the resulting prediction should not be acted upon.
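
As a rough illustration of the idea, the sketch below scores a new data point by its distance to the nearest training point, scaled by the average distance between training points. This is a simplified, distance-based stand-in written for this article; the exact FreqAI definition may differ, and all names are hypothetical.

```python
import math
from itertools import combinations

def dissimilarity_index(train_points, new_point):
    """Distance from new_point to its nearest training point,
    scaled by the mean pairwise distance within the training set."""
    pairwise = [math.dist(a, b) for a, b in combinations(train_points, 2)]
    mean_train_distance = sum(pairwise) / len(pairwise)
    nearest = min(math.dist(p, new_point) for p in train_points)
    return nearest / mean_train_distance

# Toy training set: the four corners of a unit square in feature space.
train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
di_inside = dissimilarity_index(train, (0.5, 0.5))  # close to the training data
di_far = dissimilarity_index(train, (5.0, 5.0))     # far from the training data
print(round(di_inside, 3), round(di_far, 3))
```

The point inside the square scores well below 1, while the distant point scores several times higher, which is the kind of separation a cutoff can act on.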

As with turning regressor predictions into binary decisions, the DI needs a threshold to be compared to in order to determine if the prediction data is close to the training data or not. Here, we fit the historical (previous 50 hours) DI data to a Weibull distribution and used the 0.999 quantile as the threshold. The choice of cutoff value affects how conservative the DI is, as it allows more or less dissimilar prediction data to be acted upon.
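
Once the distribution has been fitted, the cutoff follows from the Weibull quantile function. The sketch below assumes a two-parameter Weibull (location fixed at zero) and uses made-up shape and scale values; the fitting step itself is left out, and the function names are hypothetical.

```python
import math

def weibull_quantile(p, shape, scale):
    """Inverse CDF of a two-parameter Weibull distribution."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return scale * (-math.log(1.0 - p)) ** (1.0 / shape)

# Hypothetical fitted parameters for the recent DI values.
shape, scale = 2.0, 0.5
di_cutoff = weibull_quantile(0.999, shape, scale)

def is_outlier(di_value, cutoff=di_cutoff):
    """Flag a prediction whose DI lies above the cutoff."""
    return di_value > cutoff

print(round(di_cutoff, 4))
```

Lowering the quantile (for example to 0.99) lowers the cutoff and makes the filter more conservative, since more predictions are then discarded as outliers.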

The cutoff is used to classify outliers as shown in the following figure:

Figure 2. Dissimilarity index (DI) and DI cutoff for outlier detection for BTC/USDT throughout the experiment, for each of the two regressors. The BTC/USDT close price is shown in the background to give some insight into the market conditions relative to the regressors’ performance. We see how market regime changes (solid arrows) are identified by the DI crossing the DI cutoff (dashed arrows).

Trading strategy

The predictions from the regressors were used to determine whether to enter or exit a trade:
– If the regressor predicted that the incoming candle was a maximum or a minimum and it was not currently in a trade, it would enter a long (for a predicted minimum) or a short (for a predicted maximum).
– If the regressor predicted that the incoming candle was a maximum or minimum and it was in a trade, it would exit a long if it predicted a maximum, or exit a short if it predicted a minimum.

The entry logic was defined in FreqAI using the populate_entry_trend() method in the strategy:

from functools import reduce

from pandas import DataFrame

def populate_entry_trend(self, df: DataFrame, metadata: dict) -> DataFrame:
    """
    Define the entry criteria for going long or short.
    """
    # go long when a trusted prediction falls below the minima threshold
    enter_long_conditions = [
        df["do_predict"] == 1,
        df["DI_catch"] == 1,
        df["&s-extrema"] < df["minima_sort_threshold"],
    ]

    if enter_long_conditions:
        df.loc[
            reduce(lambda x, y: x & y, enter_long_conditions),
            ["enter_long", "enter_tag"],
        ] = (1, "long")

    # go short when a trusted prediction rises above the maxima threshold
    enter_short_conditions = [
        df["do_predict"] == 1,
        df["DI_catch"] == 1,
        df["&s-extrema"] > df["maxima_sort_threshold"],
    ]

    if enter_short_conditions:
        df.loc[
            reduce(lambda x, y: x & y, enter_short_conditions),
            ["enter_short", "enter_tag"],
        ] = (1, "short")

    return df
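
Stripped of the dataframe machinery, the entry rule reduces to a per-candle decision. The sketch below is a plain-Python illustration; entry_signal and its arguments are hypothetical names, and it assumes that do_predict == 1 and DI_catch == 1 both mean the prediction may be acted upon, as the conditions above suggest.

```python
from typing import Optional

def entry_signal(prediction: float, minima_threshold: float,
                 maxima_threshold: float, do_predict: int,
                 di_catch: int) -> Optional[str]:
    """Return "long", "short", or None for a single candle."""
    if do_predict != 1 or di_catch != 1:
        return None  # prediction flagged as unreliable or as an outlier
    if prediction < minima_threshold:
        return "long"   # predicted minimum
    if prediction > maxima_threshold:
        return "short"  # predicted maximum
    return None

print(entry_signal(-0.9, -0.8, 0.8, do_predict=1, di_catch=1))
print(entry_signal(0.9, -0.8, 0.8, do_predict=1, di_catch=1))
print(entry_signal(-0.9, -0.8, 0.8, do_predict=1, di_catch=0))
```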

However, we also had other guardrails put in place to help improve the performance:
– In a previous test, we noticed that staying too long in a trade generally resulted in a low profit. Because of this, we limited the duration of trades to 24 hours and any trade reaching this limit was exited.
– A stop loss of -4% was put in place to exit any trades that reached -4% profit.
– If the target calculation identified an extremum in the most recent candle and this was not already predicted by the regressor, an active long trade would be exited if the identified extremum was a minimum, and an active short trade would be exited if the identified extremum was a maximum.
– As was discussed in the section above about Outlier detection, the custom FreqAI outlier detection method, the Dissimilarity Index, was used to determine whether a predicted extremum would be disregarded or not.

These additional components were handled inside the custom_exit() method in the strategy:

from datetime import datetime

from freqtrade.exchange import timeframe_to_prev_date
from freqtrade.persistence import Trade

def custom_exit(
    self, pair: str, trade: Trade, current_time: datetime,
    current_rate: float, current_profit: float, **kwargs
):
    """
    User defines custom trade exit criteria
    """
    dataframe, _ = self.dp.get_analyzed_dataframe(
        pair=pair, timeframe=self.timeframe)

    last_candle = dataframe.iloc[-1].squeeze()
    # NOTE: the value subtracted from the open date is cut off in the source
    trade_date = timeframe_to_prev_date(
        self.timeframe, (trade.open_date_utc - ...))
    trade_candle = dataframe.loc[(dataframe["date"] == trade_date)]

    if trade_candle.empty:
        return None
    trade_candle = trade_candle.squeeze()

    entry_tag = trade.enter_tag

    # total_seconds() keeps counting past 24 hours, unlike .seconds
    trade_duration = (current_time - trade.open_date_utc).total_seconds() / 60

    # exit trades that have been open for too long (duration in minutes)
    if trade_duration > 1000:
        return "trade expired"

    # exit when the Dissimilarity Index flags the latest candle as an outlier
    if last_candle["DI_catch"] == 0:
        return "Outlier detected"

    # a predicted minimum closes an open short
    if (
        last_candle["&s-extrema"] < last_candle["minima_sort_threshold"]
        and entry_tag == "short"
    ):
        return "minimia_detected_short"

    # a predicted maximum closes an open long
    if (
        last_candle["&s-extrema"] > last_candle["maxima_sort_threshold"]
        and entry_tag == "long"
    ):
        return "maxima_detected_long"

Model accuracy

Since we’re dealing with an unbalanced classification problem, and we prioritize both negative and positive predictions (that is, we care just as much that our regressors don’t predict an extremum when there is none as we do about them correctly predicting one), we assessed the performance of the models using the balanced accuracy score, the arithmetic mean of sensitivity and specificity.
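
Written out with the usual confusion-matrix counts (true positives TP, true negatives TN, false positives FP, false negatives FN, a notation assumed here rather than taken from the experiment code), the arithmetic mean of sensitivity and specificity reads:

```latex
\text{balanced accuracy} = \frac{1}{2}\left(\frac{TP}{TP + FN} + \frac{TN}{TN + FP}\right)
```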

The balanced accuracy was calculated based on a sliding window of 600 candles (50 hours).
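
A minimal sketch of the score for one 600-candle window follows. It collapses minima and maxima into a single "extremum" class, which is a simplification, and the confusion counts are invented for illustration.

```python
def balanced_accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Arithmetic mean of sensitivity (TPR) and specificity (TNR)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2

# 600-candle window with 6 true extrema; hypothetical model output:
# 4 extrema caught, 2 missed, 12 false alarms.
score = balanced_accuracy(tp=4, tn=582, fp=12, fn=2)

# A model that never flags an extremum: 99% plain accuracy, but only 0.5 here.
never_flags = balanced_accuracy(tp=0, tn=594, fp=0, fn=6)
print(round(score, 3), never_flags)
```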

On top of the balanced accuracy, we also devised our own accuracy metric to address the fact that we are predicting targets in a time series and are therefore interested in the temporal accuracy.

Here, we look at the temporal difference between a prediction and its closest target, normalized to the sliding window of 200 candles (1000 minutes) that we used to identify the targets. A perfect match means we have a temporal accuracy of 1, whilst a prediction further away than 1000 minutes gives a negative value.
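
The metric is described here only in words, so the expression below is one form consistent with that description (1 minus the time gap to the closest target, divided by the 1000-minute window) and not necessarily the exact definition used in the experiment. Names and sample times are hypothetical.

```python
def temporal_accuracy(prediction_minute, target_minutes, window_minutes=1000.0):
    """1 minus the time gap to the closest target, in units of the label window."""
    closest_gap = min(abs(prediction_minute - t) for t in target_minutes)
    return 1.0 - closest_gap / window_minutes

targets = [0.0, 1000.0, 2400.0]  # hypothetical target times in minutes
perfect = temporal_accuracy(1000.0, targets)  # prediction hits a target exactly
near = temporal_accuracy(1250.0, targets)     # 250 minutes from the closest target
late = temporal_accuracy(4000.0, targets)     # 1600 minutes from the closest target
print(perfect, near, round(late, 3))
```

Under this form a perfect match scores 1, a prediction exactly one window away scores 0, and anything further away is negative.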

The figure below shows the two accuracy metrics throughout the experiment for the XGBoost regressor, predicting extrema for BTC/USDT. The balanced accuracy was updated at each candle, whilst the temporal accuracy only updated when there was a predicted extremum.

Figure 3. Balanced (top) and temporal (bottom) accuracy scores for the XGBoost regressor predicting extrema for BTC/USDT. The sliding window of 50 hours used to calculate the balanced accuracy is indicated in the top plot by a shaded area. All regressors and coins were tracked similarly via our live dashboard.

From Table 1, we can see that the regressors performed similarly in terms of both balanced and temporal accuracy.

Table 1. Average plus/minus standard deviation of the model accuracy for the individual regressors.

With our target being defined as extrema points within a window of 1000 minutes, the models were trained to expect one minimum/maximum every 16.7 hours. As you can see from Table 2, the regressors predicted twice the number of extrema compared to the number of targets that were identified.

Table 2. Average plus/minus standard deviation of the ratio between predictions and targets for the individual regressors.

Resource usage

The resource usage (Table 3) for the two regressors shows that CatBoost was slow, in terms of both training and predicting, compared to XGBoost. With new data incoming every 5 minutes, the models produced by CatBoost were not always trained using the most recently available data because the regressor was too slow at completing the model training. However, thanks to the parallelized architecture of FreqAI there is always a model available for inferencing.

Table 3. Average plus/minus standard deviation of resource usage during the 3-week experiment for the individual regressors.


Despite being given the exact same input features, the XGBoost and CatBoost regressors performed very differently in terms of profitability (Table 4). At the end of the experiment, both were in profit but XGBoost had clearly outperformed its competitor by ending up at a 7% profit compared to 2% for CatBoost.

Table 4. Average plus/minus standard deviation of close profit (% of stake amount), and final cumulative profit (% of total wallet) for the individual regressors.

Disruptive market events (some of which are indicated in Figure 4) clearly affected the profitability of the regressors. The regressors were trained on historical data, so when the market exhibited sudden changes that were not seen in the training data, the regressors performed poorly. This is, however, expected behavior as no machine learning method is able to predict behavior that has not been included in the data used for training. Instead, the key here is adaptability. As the previously unseen data is included in the subsequent training of the regressors, they are imbued with new knowledge and should be able to better handle similar events occurring in the future. One clear example of this is event F in Figure 4: events A-E represent previously unseen or infrequent changes, and hence lead to poor regressor performance. However, after enough of these events had been included in the training data set, the regressors (especially XGBoost) managed to instead take advantage of event F.

Figure 4. Cumulative profit normalized to total wallet size throughout the experiment, for each of the two regressors. The BTC/USDT close price is shown in the background to give some insight into the market conditions relative to the regressors’ performance. Keep in mind that the regressors traded on 19 coins each, many of which are only weakly correlated with BTC. Some disruptive market events (solid arrows), with the resulting profit behavior (dashed arrows), are indicated with labeled circles. The same market events were discussed regarding the Dissimilarity Index in Figure 2.

During the 3-week experiment, we tracked the balanced and temporal accuracies, and resource usage of each regressor for each coin, in addition to the profit for each regressor. Keep in mind that no hyper-parameter optimization was done to tune the regressors, so their performance in terms of accuracy and profit should only be interpreted as relative to each other. More than anything, this experiment is a proof-of-concept to show the potential of FreqAI for real-time adaptive modeling of streaming data.

Ongoing experiment

We’re currently running a new experiment that you can check out by visiting our live dashboard. If you want to try running your own bot, join our discord server, where you will find a group of like-minded people to share your experience with.

DISCLAIMER FreqAI is not affiliated with any cryptocurrency offerings. FreqAI is, and always will be, a not-for-profit, open-source project. FreqAI does not have a crypto token, FreqAI does not sell signals, and FreqAI does not have a website other than the Freqtrade documentation. Please beware of imposter projects, and help us by reporting them to the official FreqAI discord server.

1. Caulk, R. A., & others (2022). FreqAI: generalizing adaptive modeling for chaotic time-series market forecasts. Journal of Open Source Software, 7(80), 4864.
