1 Introduction

Fixed-income markets are an important financing source for governments, domestic and international organizations, banks, and both private and public companies. The development of fixed-income trading can contribute to financial stability in general and enhance financial intermediation by increasing competition and developing the associated financial infrastructure, products, and services (Nunes et al., 2019; International Monetary Fund, 2021). Government bonds are the main instrument of most fixed-income asset markets, for developed and developing economies alike. The main data sources for public securities trading activity are GovPX in the United States and MTS in Europe. MTS is a fully electronic, quote-driven interbank market comprising multiple trading platforms; all MTS platforms use identical trading technology, but each platform maintains its own rule set and market participants (Biais & Green, 2019; Friewald & Nagler, 2019). Fixed-income securities are commonly traded over-the-counter (OTC), on inter-dealer wholesale platforms and, less frequently, on retail platforms where liquidity is provided by dealers. Transactions are bilateral and not anonymous, so the conditions of negotiation are determined by search and trading frictions: in the absence of a focal point, dealers have to proactively seek out and negotiate with possible counterparties to obtain the "best" offer (Darbha & Dufour, 2013; Glode & Opp, 2020; Neklyudov, 2019).

Fixed-income trading has generally received less attention from researchers than equity market trading, even though fixed-income markets involve significantly more capital raising than equity markets. Electronic bond trading, however, is on the rise, and it will be interesting to evaluate the impact of electronically supported trading on the performance of the fixed-income market as its contribution expands. In addition, some liquidity providers for corporate bonds already respond to requests for trades under specific trade size limits using algorithms instead of human participants (Bessembinder et al., 2020).

Technological advances have transformed how investors can operate in financial markets. High-Frequency Trading (HFT), algorithmic trading (AT) distinguished by high-speed trade execution, exemplifies these changes in technology (Frino et al., 2020). HFT is an approach to financial market intervention that uses complex software tools, guided by mathematical algorithms, to execute high-frequency trades in the markets for stocks, options, bonds, derivative instruments, commodities, and other assets (Rundo, 2019). Hendershott et al. (2011) find that AT reduces trading costs and increases quote informativeness; liquidity providers' revenues also increase with AT, although this effect seems to be transitory. In short, financial trading demands that an AT system scan the environment for suitable and prompt decisions in the absence of monitored data (Aloud & Alkhamees, 2021).

In academic research, the term "high-frequency trading" is often used loosely, referring for example to asset price windows of 10 or 60 min (Christiansen & Ranaldo, 2007). For instance, a study published in the Journal of Financial Markets defines high-frequency trading as "trading that takes place in intervals of a few seconds to a few minutes" (Aldrich, 2013). Similarly, a paper published in the Journal of Financial Economics found that HFT can improve liquidity provision in corporate bond markets, particularly for less liquid bonds (Mahanti et al., 2008).

In practice, however, the term "high-frequency trading" is often used more narrowly to refer to even shorter time windows. For example, in the US equities market, the Securities and Exchange Commission defines a high-frequency trader as someone who trades at least 2 million shares or $20 million in securities in a single day, with an average holding time of less than 0.5 s (SEC, 2014).

Despite the varying definitions of high-frequency trading used in academic research and in practice, there is general agreement that this type of trading has significant implications for bond markets. Some studies have suggested that HFT can increase market efficiency and liquidity, while others have argued that it can exacerbate market volatility and lead to market instability (e.g., Frino et al., 2013; Schestag et al., 2016). As HFT continues to evolve and shape financial markets, it is likely that academic researchers and practitioners will continue to debate its effects and implications.

AT, whether driven by fundamental and technical indicator analysis or supported by machine learning techniques, has been examined by several researchers. According to Goldblum et al. (2021), Machine Learning (ML) plays an important and growing role in financial business applications. Deep learning (DL), a subclass of ML methods based on deep neural networks, provides algorithms that can be trained on complex data to predict outputs. Today numerous financial firms, ranging from hedge funds, investment banks, and retail brokers to modern FinTech providers, are investing in developing expertise in data science and ML (Goodell et al., 2021).

As market turmoil and uncertainty in financial markets have increased considerably, ML algorithms are well suited to the analysis of financial markets and, in particular, the fixed-income market. The marketplace is highly complex, and its unpredictability stems from the uncertainty of the many events that occur in it (Goldblum et al., 2021). Deep neural networks draw knowledge from data that can then be utilised to forecast and produce further data; this feedback decreases unreliability by pointing to specific solutions. ML is especially useful for handling problems where an explicit analytical solution is not available, such as complicated categorisation tasks or recognition of trends (Ghoddusi et al., 2019). The benefit of deep neural network methods over those offered by classical statistics and econometrics is that ML algorithms can handle huge quantities of structured and unstructured information and provide quick predictions or conclusions (Milana & Ashta, 2021).

Publications applying ML techniques specifically to fixed-income markets are scarce, whereas other financial areas, particularly the equity and foreign exchange markets, have attracted much more interest in the research literature (Nunes et al., 2018). Most of these studies involve the stock market, mainly forecasting with artificial neural network (ANN), support vector machine (SVM), and random forest (RF) models, which have been shown to produce excellent results for financial time series forecasting (Deng et al., 2021, 2022). For example, Kara et al. (2011) proposed an ANN-based model for predicting daily price movements in the stock market that yielded high forecast accuracy. Akyildirim et al. (2022) compare the trading behaviour of several advanced forecasting techniques, including ANN, the autoregressive integrated moving average (ARIMA), nearest neighbours, the naïve Bayes method, and logistic regression, to forecast stock price movements from past prices. They apply these methods to high-frequency data on 27 blue-chip stocks traded on the Istanbul Stock Exchange. Their results highlight that, among the chosen methodologies, naïve Bayes, nearest neighbours, and ANN can detect future price directions as well as percentage changes at a satisfactory level, while ARIMA and logistic regression perform worse than the random walk model. These authors also propose, as a future line of research, testing the chosen methods in other markets to obtain more accurate and more general results.

Some authors have made predictions about the performance of fixed-income assets through neural networks. Vukovic et al. (2020) analyze a neural network model that forecasts the Sharpe ratio. Their results demonstrate that neural networks are accurate in predicting nonlinear series, with 82% precision in the test cases when forecasting future Sharpe ratio dynamics and the position of the investor's portfolio. For future research they propose analyzing more data with stronger artificial intelligence technologies, such as Long Short-Term Memory (LSTM) neural networks, concluding that these adaptive methodologies should provide more accurate analysis and forecasting and that this area of study requires additional attention and effort. Li et al. (2021) analyze sovereign CDS to prevent investment risks and propose a hybrid ensemble forecasting model: an autoregressive integrated moving average (ARIMA) model predicts the trend component, while the relevance vector machine (RVM) technique forecasts the market volatility and noise components, giving the model excellent robustness. They establish that, although the suggested model exhibits satisfactory prediction efficiency, there is scope for further improvement, and that beyond sovereign CDS time series the prediction model may be applied to other financial time series to test its generalizability. Nunes et al. (2019) concentrate on yield curve forecasting, the centerpiece of the bond markets. They apply ML, specifically multilayer perceptrons (MLPs), to analyze the yield curve as a whole, and show that MLPs can be utilized effectively to forecast the yield curve. As future work, they identify multitask learning as an important area of interest, believing that further research is required to identify the terms and conditions under which their methodology could be applied with enhanced performance.

To fill the gap in this research area, our study aims to predict bond price movements based on past prices through high-frequency trading. We compare machine learning methods applied to the fixed-income markets (sovereign, corporate and high-yield debt) in advanced and emerging countries in the one-year bond market for the period from 15 February 2000 to 12 April 2023.

Despite the limited number of observations at the 10-min frequency, the methodologies applied in this work are still able to produce estimates, something that would be impossible for conventional statistical methodologies and even for some simple computational ones. Some previous works have found greater availability of such data from OTC markets. Despite these limitations, several studies have investigated corporate bond data at 10-min intervals, such as Nowak et al. (2009), Aldana (2017), Gomber and Haferkorn (2015), Holden et al. (2018), and Gündüz et al. (2023).

We make at least two further contributions to the literature. First, we analyze the fixed-income market through HFT by comparing a wide range of innovative computational machine learning methodologies, whereas most previous studies employ statistical and econometric methods. In addition, the prior literature on portfolio optimization with fixed-income assets alone is limited, and even fewer studies deal with the use of HFT. With the ongoing advancement of financial markets, the share of HFT, generally characterized by fast update frequency and high trading speed, has increased steadily in recent years. HFT can also produce beneficial market effects, such as increasing market liquidity and improving risk-handling ability (Deng et al., 2021). Second, our study predicts bond price movements globally, not restricted to developed countries, making it relevant for those responsible for economic policy in any country. Whereas the relevance of public debt markets has generated innumerable papers on these markets in the United States and other advanced countries, comparatively limited research exists on emerging bond markets (Bai et al., 2013). In addition, our study considers not only sovereign bonds but also corporate and high-yield debt.

The rest of the paper is organized as follows. Sect. 2 describes the methodologies. Sect. 3 details the sample and data involved in the research. Sect. 4 presents the results and findings. Finally, Sect. 5 sets out the conclusions reached.

2 Methodologies

We have used different methods to predict bond price movements through HFT. Applying various techniques aims to obtain a robust model, tested not just via one categorisation technique but using those that have proven successful in the prior literature and in other areas. Specifically, this study applies the Quantum-Fuzzy Approach, Adaptive Boosting-Genetic Algorithm, Support Vector Machine-Genetic Algorithm, Deep Learning Neural Network-Genetic Algorithm, Quantum Genetic Algorithm, Adaptive Neuro-Fuzzy Inference System-Quantum Genetic Algorithm, Deep Recurrent Convolutional Neural Networks, Convolutional Neural Networks-Long Short-Term Memory, Gated Recurrent Unit-Convolutional Neural Networks, and Quantum Recurrent Neural Networks. The Deep Recurrent Convolutional Neural Networks and Quantum Genetic Algorithm techniques obtained the best results, as shown in Sect. 4, so these two methodologies are explained below; the rest are described in “Appendix 1” of this study.

2.1 Quantum Genetic Algorithm (QGA)

The quantum evolutionary algorithm (QEA) is an evolutionary algorithm built on the concepts of quantum computing. It introduces notions such as superposition states and incorporates the single encoding form to obtain improved experimental results in combinatorial optimisation problems. Nevertheless, when QEA is used to optimise multimodal functions, particularly high-dimensional multimodal function optimisation problems, it is likely to fall into a local optimum and its computational efficiency is poor.

This study improves the global optimisation capacity of the genetic algorithm and its local search ability according to the quantum probabilistic model, introducing a new type of quantum evolutionary algorithm, the "quantum genetic algorithm", to deal with the above deficiencies of QEA. This algorithm utilises the quantum probabilistic vector encoding mechanism and combines the crossover operator of the genetic algorithm with the updating strategy of quantum computation to effectively optimise the global search capacity of the quantum algorithm.

The quantum genetic algorithm steps are:

2.1.1 Step 1: Population Initialisation

The lowest unit of information in QGA is a quantum bit. The state of a quantum bit can be 0 or 1, expressed as:

$$ \left| \Psi \right\rangle = \alpha \left| 0 \right\rangle + \beta \left| 1 \right\rangle $$
(1)

where \(\alpha\) and \(\beta\) are two complex numbers satisfying the normalisation condition \(\left| \alpha \right|^{2} + \left| \beta \right|^{2} = 1\); \(\left| \alpha \right|^{2}\) and \(\left| \beta \right|^{2}\) denote the probabilities of the quantum bit being in state 0 and state 1, respectively.

The most commonly adopted coding techniques in EA are binary coding, decimal coding, and symbolic coding. QGA introduces a new method of coding based on the quantum bit, namely the use of a pair of complex numbers to describe each quantum bit. A system with m quantum bits is expressed as

$$ \left[ \begin{array}{c|c|c|c} \alpha_{1} & \alpha_{2} & \cdots & \alpha_{m} \\ \beta_{1} & \beta_{2} & \cdots & \beta_{m} \end{array} \right] $$
(2)

In the equation, \(\left| {\alpha_{i} } \right|^{2} + \left| {\beta_{i} } \right|^{2} = 1\) (i = 1, 2, …, m). This representation can describe any linear superposition of states. For instance, consider a system of three quantum bits with the following probability amplitudes:

$$ \left[ \begin{array}{c|c|c} \frac{1}{\sqrt{2}} & \frac{\sqrt{3}}{2} & \frac{1}{2} \\ \frac{1}{\sqrt{2}} & \frac{1}{2} & \frac{\sqrt{3}}{2} \end{array} \right] $$
(3)

The system state can be defined as

$$ \frac{\sqrt{3}}{4\sqrt{2}}\left| 000 \right\rangle + \frac{3}{4\sqrt{2}}\left| 001 \right\rangle + \frac{1}{4\sqrt{2}}\left| 010 \right\rangle + \frac{\sqrt{3}}{4\sqrt{2}}\left| 011 \right\rangle + \frac{\sqrt{3}}{4\sqrt{2}}\left| 100 \right\rangle + \frac{3}{4\sqrt{2}}\left| 101 \right\rangle + \frac{1}{4\sqrt{2}}\left| 110 \right\rangle + \frac{\sqrt{3}}{4\sqrt{2}}\left| 111 \right\rangle $$
(4)
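As a quick numerical check (ours, not part of the original exposition), the joint amplitudes of Eq. (4) can be reproduced in Python as the Kronecker product of the three qubits' amplitude vectors:

import numpy as np

# Amplitude vectors (alpha, beta) of the three qubits in Eq. (3)
q1 = np.array([1 / np.sqrt(2), 1 / np.sqrt(2)])
q2 = np.array([np.sqrt(3) / 2, 1 / 2])
q3 = np.array([1 / 2, np.sqrt(3) / 2])

# Joint state: amplitudes of |000>, |001>, ..., |111>
state = np.kron(np.kron(q1, q2), q3)
for idx, amp in enumerate(state):
    print(f"|{idx:03b}>: {amp:.4f}")
print("total probability:", np.sum(state ** 2))  # sums to 1.0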

2.1.2 Step 2: Conduct Individual Coding and Measuring of the Population Generating Units

QGA is a probabilistic algorithm analogous to EA. The population is \(H\left( t \right) = \left\{ {Q_{1}^{t} ,Q_{2}^{t} , \ldots ,Q_{h}^{t} , \ldots ,Q_{l}^{t} } \right\}\) (h = 1, 2, …, l), where l is the population size, and \(Q_{h}^{t} = \left\{ {q_{1}^{t} ,q_{2}^{t} , \ldots ,q_{j}^{t} , \ldots ,q_{n}^{t} } \right\}\), where n represents the number of generator units, t denotes the evolution generation, and \(q_{j}^{t}\) denotes the binary coding of the generation volume of the jth generator unit. Its chromosome is shown below:

$$ q_{j}^{t} = \left[ \begin{array}{c|c|c|c} \alpha_{1}^{t} & \alpha_{2}^{t} & \cdots & \alpha_{m}^{t} \\ \beta_{1}^{t} & \beta_{2}^{t} & \cdots & \beta_{m}^{t} \end{array} \right] $$
(5)

(j = 1, 2, …, n) (m is the length of the quantum chromosome).

During the “initialization of H(t)”, if all \(\alpha_{i}^{t}\) and \(\beta_{i}^{t}\) (i = 1, 2, …, m) in \(q_{j}^{t}\) are initialized equally, all the possible linear superposition states occur with equal likelihood. In the step of “generating S(t) from H(t)”, a common solution set S(t) is created through observation of the state of H(t), where, in the tth generation, \(S\left( t \right) = \left\{ {P_{1}^{t} ,P_{2}^{t} , \ldots ,P_{h}^{t} , \ldots ,P_{l}^{t} } \right\}\) and \(P_{h}^{t} = \left\{ {x_{1}^{t} ,x_{2}^{t} , \ldots ,x_{j}^{t} , \ldots ,x_{n}^{t} } \right\}\). Every \(x_{j}^{t}\) (j = 1, 2, …, n) is a string \(\left( {x_{1} ,x_{2} , \ldots ,x_{i} , \ldots ,x_{m} } \right)\) of length m, obtained from the quantum bit amplitudes \(\left| {\alpha_{i}^{t} } \right|^{2}\) or \(\left| {\beta_{i}^{t} } \right|^{2}\) (i = 1, 2, …, m). The procedure in the binary case is to draw a random number in [0, 1]: take “1” if it is larger than \(\left| {\alpha_{i}^{t} } \right|^{2}\), and “0” otherwise.
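This observation step can be sketched in Python as follows; the population size, chromosome length, and equal-amplitude initialisation are illustrative assumptions rather than the paper's settings:

import numpy as np

rng = np.random.default_rng(0)
l, m = 20, 16                                # population size, chromosome length

# Equal-superposition initialisation: every |alpha|^2 = 1/2
alpha = np.full((l, m), 1 / np.sqrt(2))

def observe(alpha, rng):
    # Collapse each qubit: take 1 if a uniform draw exceeds |alpha|^2, else 0
    r = rng.random(alpha.shape)
    return (r > alpha ** 2).astype(int)

S = observe(alpha, rng)                      # binary solution set S(t), shape (l, m)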

2.1.3 Step 3: Make An Individual Measure for Every Item in S(t)

Employ a fitness assessment function to test each object in S(t) and keep the best object of the generation. If a satisfactory solution is obtained, the algorithm stops; otherwise, it proceeds to the fourth step. When dealing with non-binary optimization problems, the chromosome is usually represented by a set of real-valued parameters rather than a binary string. In such cases, the fitness function is often a continuous function that maps the parameter values to a scalar value representing the fitness of the solution.

Consider a non-binary optimization problem with a chromosome composed of three real-valued parameters x1, x2, and x3; the fitness function for this problem could be defined as:

$$ f\left( {x_{1} ,x_{2} ,x_{3} } \right) = \left( {x_{1} - 3} \right)^{2} + \left( {x_{2} + 1} \right)^{2} + \left( {x_{3} - 2} \right)^{2} $$
(6)

The objective in this case is to minimize the fitness function: the QGA searches for a set of parameter values that produces the minimum fitness value. The process is similar to that for a binary problem, with the genetic operators applied to the real-valued parameters rather than to binary strings.
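A minimal sketch of this step follows, assuming a common binary-to-real decoding scheme that the paper does not specify:

import numpy as np

def fitness(x1, x2, x3):
    # Fitness function of Eq. (6); lower is better
    return (x1 - 3) ** 2 + (x2 + 1) ** 2 + (x3 - 2) ** 2

def decode(bits, low=-5.0, high=5.0):
    # Map a binary substring to a real value in [low, high] (one possible choice)
    value = int("".join(map(str, bits)), 2)
    return low + (high - low) * value / (2 ** len(bits) - 1)

chromosome = np.array([0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0])   # three 4-bit parameters
x1, x2, x3 = (decode(chromosome[i:i + 4]) for i in (0, 4, 8))
print("fitness:", fitness(x1, x2, x3))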

2.1.4 Step 4: Apply Genetic Operators to Create New Individuals

The crossover operator is applied by swapping some of the qubits between two chromosomes. One of the most commonly used crossover operators in QGA is the uniform crossover, which selects each qubit from one of the two parent chromosomes with a certain probability. The crossover operator can be represented mathematically as:

$$ \left| {\psi_{child} } \right\rangle = \alpha \left| {\psi_{parent1} } \right\rangle + \beta \left| {\psi_{parent2} } \right\rangle $$
(7)

where ∣ψparent1⟩ and ∣ψparent2⟩ are the two parent chromosomes, ∣ψchild⟩ is the resulting child chromosome, and α and β are complex coefficients determined by the crossover probability.

The mutation operator randomly flips some of the qubits in a chromosome. Mathematically, the mutation operator can be represented as:

$$ \left| {\psi_{mutated} } \right\rangle = U_{m} \left| {\psi_{original} } \right\rangle $$
(8)

where \(U_m\) is a single-qubit unitary gate that applies a random rotation around a Bloch sphere axis to the qubit being mutated. The mutation rate determines the probability of applying the mutation operator to each qubit in a chromosome.

It’s important to note that the application of genetic operators in QGA can be done in different ways, and the specific equations used can vary depending on the implementation and problem being solved.
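By way of illustration, one possible implementation of these operators on (α, β) amplitude pairs stored as 2 × m arrays is sketched below; the swap-based uniform crossover and the small random rotation standing in for U_m are our assumptions, not the paper's code:

import numpy as np

rng = np.random.default_rng(1)

def uniform_crossover(parent1, parent2, p=0.5):
    # Swap each qubit's (alpha, beta) column between the parents with probability p
    mask = rng.random(parent1.shape[1]) < p
    child1, child2 = parent1.copy(), parent2.copy()
    child1[:, mask], child2[:, mask] = parent2[:, mask], parent1[:, mask]
    return child1, child2

def mutate(chrom, rate=0.05):
    # Apply a small random rotation (the gate U_m) to each qubit with probability `rate`
    for j in range(chrom.shape[1]):
        if rng.random() < rate:
            theta = rng.uniform(-np.pi / 8, np.pi / 8)
            c, s = np.cos(theta), np.sin(theta)
            chrom[:, j] = [c * chrom[0, j] - s * chrom[1, j],
                           s * chrom[0, j] + c * chrom[1, j]]
    return chrom

p1 = np.vstack([np.full(8, 1 / np.sqrt(2)), np.full(8, 1 / np.sqrt(2))])
c1, c2 = uniform_crossover(p1, mutate(p1.copy()))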

2.1.5 Step 5: Apply An Appropriate Quantum Rotation Gate U(t) to Update S(t)

The conventional genetic algorithm uses mating, mutation, and similar operations to keep the population diverse, whereas the quantum genetic algorithm applies a logic gate to the probability amplitudes of the quantum states to preserve the diversity of the population. Hence, updating by a quantum gate is the essence of the quantum genetic algorithm. In the classical setting, the binary system, adaptation values, and the probability amplitude comparison technique are utilised for updating via a quantum gate. This approach is adequate for combinatorial optimisation problems whose optimum is known in principle. For real optimisation problems, however, especially the optimisation of multivariable continuous functions, the best solutions are in principle not available beforehand. Hence, a quantum rotation gate is adopted here for the new quantum genetic algorithm.

$$ U = \left[ {\begin{array}{*{20}c} {\cos \theta } & { - \sin \theta } \\ {\sin \theta } & {\cos \theta } \\ \end{array} } \right] $$
(9)

where \(\theta\) is the quantum gate rotation angle, whose value is given by

$$ \theta = k \cdot f\left( {\alpha_{i} ,\beta_{i} } \right) $$
(10)
$$ k = \pi \cdot \exp \left( { - \frac{t}{{iter_{\max } }}} \right) $$
(11)

We consider k a variable linked to the evolution generation, adjusting the mesh size in a self-adaptive way. Here t is the evolution generation and \(iter_{\max }\) is a constant that depends on the complexity of the optimization problem. The function \(f\left( {\alpha_{i} ,\beta_{i} } \right)\) causes the algorithm to search in the best direction: it is based on the idea of gradually bringing the current search solution closer to the optimal solution, thereby setting the direction of the quantum rotation gate.

Thus, the process of applying the quantum rotation gate to the entire probability amplitude of each individual in the population, namely applying the quantum rotation gate U(t) to update S(t), may be written as:

$$ S\left( {t + 1} \right) = U\left( t \right) \times S\left( t \right) $$
(12)

where t is the evolution generation, U(t) is the tth-generation quantum rotation gate, S(t) is the tth-generation probability amplitude of a given individual, and S(t + 1) is the \((t+1)\)th-generation probability amplitude of that individual.
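A hedged sketch of this update is given below; the direction argument stands in for the problem-specific look-up function f(α_i, β_i), which the paper leaves unspecified:

import numpy as np

def rotation_update(alpha, beta, direction, t, iter_max=200):
    # Eq. (11): step size k shrinks as the generation t grows
    k = np.pi * np.exp(-t / iter_max)
    # Eq. (10): rotation angle, with direction in {-1, 0, +1} per qubit
    theta = k * direction
    # Eq. (9): rotate each (alpha, beta) pair, i.e. S(t+1) = U(t) S(t) as in Eq. (12)
    c, s = np.cos(theta), np.sin(theta)
    return c * alpha - s * beta, s * alpha + c * beta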

2.1.6 Step 6: Perturbation

Since QGA is inclined to get caught at a local extreme value, we perturb the population. QGA analysis has shown that if the best individual of the present generation is a local extremum, the algorithm finds it very difficult to escape; the algorithm is considered stuck at a local extremum if the best individual remains unchanged over subsequent generations.

Finally, we show the pseudocode for the implementation of this method for the problem studied and the flowchart (Fig. 1) with the steps to follow, as described above.


Pseudocode of Quantum Genetic Algorithm

Fig. 1 Flowchart of quantum genetic algorithm

2.2 Deep Recurrent Convolutional Neural Network (DRCNN)

The RCNN model consists of a stack of recurrent convolutional layers (RCLs) and may include max pooling layers. To save computational resources, the first layer is a standard feed-forward convolutional layer with no recurrent connections, followed by a max pooling layer. Four RCLs are used with a max pooling layer in the middle, and there are only feed-forward links between adjacent RCLs. Both pooling operations have a stride of 2 and a size of 3. The output of the fourth RCL feeds a global max pooling layer that takes the maximum of each feature map, yielding a feature vector that represents the input. This approach differs from the model of Krizhevsky et al. (2017), which uses fully connected layers, and from the models of Lin et al. (2013) and Szegedy et al. (2017), which use global average pooling. Finally, a softmax layer classifies the feature vectors into C categories, with output:

$$ y_{k} = \frac{{\exp \left( {W_{k}^{T} x} \right)}}{{\sum\nolimits_{k^{\prime}} {\exp \left( {W_{k^{\prime}}^{T} x} \right)} }}\quad \left( {k = 1,2, \ldots ,C} \right) $$
(13)

where \(y_{k}\) is the predicted probability of belonging to the kth category and x is the feature vector generated by the global max pooling.

RNNs have been deployed successfully in many time series forecasting fields owing to their enormous predictive power. The standard RNN framework is structured so that the output depends on past estimations (Wan et al., 2017): a hidden state stores information about past inputs and is combined with the current input to make a prediction for the output at the current time step. The RCNN model incorporates this framework by using recurrent convolutional layers (RCLs) to capture the temporal dependencies in sequential data; the output of each RCL is a sequence of hidden states that can be used to make predictions about future inputs. The DRCNN model extends the RCNN by stacking RCLs into a deep architecture, with each layer applying a convolutional operation to the hidden states generated by the previous layer. The output of the last layer is then fed into a supervised learning layer to produce a prediction for the output at the current time step. The output of this RNN can be written as:

$$ y_{t} = f\left( {W_{y} s_{t} + b_{y} } \right) $$
(14)

where yt is the output at time step t, st is the hidden state at time step t, Wy is the weight matrix connecting the hidden state to the output, by is the bias term, and f is the activation function.

Given an input sequence vector x, the hidden states of a recurrent layer s and the output of a single hidden layer y can be obtained from formulas (15) and (16):

$$ s_{t} = \sigma \left( {W_{xs} x_{t} + W_{ss} s_{t - 1} + b_{s} } \right) $$
(15)
$$ y_{t} = o\left( {W_{so} s_{t} + b_{y} } \right) $$
(16)

where \(W_{xs}\), \(W_{ss}\), and \(W_{so}\) are the weights from the input layer x to the hidden layer s, from the hidden layer to itself, and from the hidden layer to its output layer, respectively; \(b_{s}\) and \(b_{y}\) are the biases of the hidden layer and output layer; and \(\sigma\) and \(o\) denote the activation functions in formulas (15) and (16).
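A minimal numpy sketch of this recurrence follows; the dimensions, tanh hidden activation, and identity output activation are illustrative choices rather than the paper's configuration:

import numpy as np

rng = np.random.default_rng(2)
n_in, n_hid, n_out, T = 1, 16, 1, 60          # e.g. 60 intraday returns

W_xs = rng.normal(0, 0.1, (n_hid, n_in))      # input -> hidden
W_ss = rng.normal(0, 0.1, (n_hid, n_hid))     # hidden -> hidden
W_so = rng.normal(0, 0.1, (n_out, n_hid))     # hidden -> output
b_s, b_y = np.zeros(n_hid), np.zeros(n_out)

x = rng.normal(size=(T, n_in))                # input sequence
s = np.zeros(n_hid)                           # initial hidden state
for t in range(T):
    s = np.tanh(W_xs @ x[t] + W_ss @ s + b_s) # Eq. (15)
    y = W_so @ s + b_y                        # Eq. (16)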

$$ STFT\left\{ {z\left( t \right)} \right\}\left( {\tau ,\omega } \right) = \int_{ - \infty }^{ + \infty } {z\left( t \right)\,\omega \left( {t - \tau } \right)e^{ - j\omega t} \,dt} $$
(17)

where z(t) denotes the input signals and ω(t) is a Gaussian window function centred around 0; the result T(τ, ω) is a complex function describing the signals over time and frequency. The hidden layers are then computed with the convolutional operations in formulas (18) and (19):

$$ S_{t} = \sigma \left( {W_{TS} * T_{t} + W_{SS} * S_{t - 1} + B_{s} } \right) $$
(18)
$$ Y_{t} = o\left( {W_{YS} * S_{t} + B_{y} } \right) $$
(19)

where W denotes the convolution kernels. Below we show pseudocode for the activation function of these RCNNs:

Pseudocode for Activation function

# Sigmoid activation applied layer by layer to the features
# produced by a convolution layer
# Input: feature vector from the convolution layer
# Output: activations squashed into (0, 1)

import numpy as np

activation_function = lambda y: 1.0 / (1.0 + np.exp(-y))

input_func = np.random.random(2)                         # feature vector (length 2)
K1, a1 = np.random.random((4, 2)), np.random.random(4)   # layer-1 weights and biases
K2, a2 = np.random.random((1, 4)), np.random.random(1)   # layer-2 weights and biases
K3, a3 = np.random.random((1, 1)), np.random.random(1)   # output weights and bias

layer1 = activation_function(np.dot(K1, input_func) + a1)
layer2 = activation_function(np.dot(K2, layer1) + a2)
output = np.dot(K3, layer2) + a3                         # linear output layer

To establish a deep architecture, recurrent convolutional neural networks (RCNNs) can be stacked to form the DRCNN (Huang & Narayanan, 2017). In this case, the last part of the model is a supervised learning layer, given by formula (20).

$$ \hat{r} = \sigma \left( {W_{h} *h + b_{h} } \right) $$
(20)

where Wh is the weight and bh the bias. The error between predicted and actual observations in the training data can be estimated and fed back into model training (Ma & Mao, 2019). Stochastic gradient descent is implemented to optimise parameter learning. Assuming that the real observation at time t is r, the loss function is given in formula (21):

$$ L\left( {r,\hat{r}} \right) = \frac{1}{2}\left\| {r - \hat{r}} \right\|_{2}^{2} $$
(21)

The number of filters equals the number of neurons, since each neuron performs a different convolution on the layer input. Filters are typically set in multiples of 32, within a range of 32–512. The filter size defines how many neighbouring data points a convolutional layer sees; the sizes most used in this work were 3 × 3 and 5 × 5. Stride and padding are filter parameters that modify the amount of movement over the observations; in this work, as is usual, a stride no greater than 2 × 2 and padding no greater than 1 × 1 were used. Finally, Fig. 2 provides a flowchart of the steps required to run this DRCNN.
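Complementing that flowchart, the sketch below, written with TensorFlow/Keras as an assumption (the paper does not name its framework), illustrates a stacked convolutional-recurrent forecaster using the hyperparameter ranges above; an LSTM layer stands in for the recurrent part of the stack:

import tensorflow as tf

def build_drcnn(seq_len=60, n_features=1):
    model = tf.keras.Sequential([
        # First feed-forward convolution: 32 filters, kernel 5, stride 2
        tf.keras.layers.Conv1D(32, kernel_size=5, strides=2, padding="same",
                               activation="relu",
                               input_shape=(seq_len, n_features)),
        tf.keras.layers.MaxPooling1D(pool_size=3, strides=2, padding="same"),
        tf.keras.layers.Conv1D(64, kernel_size=3, padding="same",
                               activation="relu"),
        tf.keras.layers.LSTM(64),              # recurrent part of the stack
        tf.keras.layers.Dense(1),              # next-interval price change
    ])
    # Squared-error loss as in Eq. (21), optimised by stochastic gradient descent
    model.compile(optimizer=tf.keras.optimizers.SGD(0.01), loss="mse")
    return model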

Fig. 2 Deep recurrent convolutional neural network flowchart

3 Sample and Data

We employ bond prices for a one-year bond market in the period from February 15th, 2000 to April 12th, 2023. The sample consists of ten sovereign bonds in five advanced economies (Germany, United States, Italy, Spain, and Japan) and five emerging countries (Turkey, Mexico, Indonesia, Nigeria, and Poland); eleven corporate bonds in five advanced economies (Walmart, Johnson & Johnson, Verizon, Unilever PLC, Rio Tinto PLC) and in six emerging economies (Air Liquide, Ambev, Cemex, Turkish Airlines, KCE Electronics, Telekomunikacja Polska); and, finally, ten high-yield bonds in five developed countries (Caesars Resort Collection LLC, Asurion LLC, Intelsat Jackson Holdings, Athenahealth Group, Great Outdoors Group LLC) and in five developing markets (Petroleos Mexicanos, Petrobras, Sands China Ltd, Indonesia Asahan Aluminium, Longfor Properties). “Appendix 3” displays detailed information about the features of every bond in the sample. We obtained the bond price data from Refinitiv's Eikon database. The trade data in Refinitiv comprise information on executed trades, such as price and volume, timestamped up to the microsecond with tools like Refinitiv Tick History. The order book information includes the limit price and order volume for both the bid and ask sides, covering levels one to ten. This information has been used by recent studies such as Clapham et al. (2022), Hansen and Borch (2022), and Dodd et al. (2023). Table 1 summarizes the sample according to every category of the fixed-income market used.

Table 1 Sample of bonds used

We categorize all trades that occur in the continuous session across the day as "continuous trades" and build "all trades" by adding the trades executed in the opening and closing sessions to the "continuous trades". To avoid dealing with asynchronous observations, we sample our data at 10, 30, and 60 min.

In addition, we measure the cost-effectiveness of our selected forecasting techniques with the following ratios.

3.1 Sign Prediction Ratio (SPR)

A correctly predicted price direction change is assigned 1, and − 1 otherwise. This ratio is defined as:

$$ SPR = \frac{{\mathop \sum \nolimits_{j = 1 + M/2}^{M} matches \left( {Y_{j} ,Y_{j}^{\prime} } \right)}}{M/2} $$
(22)

where “matches” is defined as

$$ matches \left( {Y_{j} ,Y_{j}^{\prime} } \right) = \left\{ {\begin{array}{*{20}l} {1 \;\;\;\;if\; sign \left( {Y_{j} } \right) = sign \left( {Y_{j}^{\prime} } \right)} \hfill \\ {0\;\;\;\; otherwise} \hfill \\ \end{array} } \right. $$
(23)

where the “sign function” assigns + 1 to positive arguments and − 1 to negative arguments.
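A compact Python sketch of this ratio on the out-of-sample half of the series (variable names are ours) is:

import numpy as np

def sign_prediction_ratio(y_true, y_pred):
    # Eqs. (22)-(23): share of out-of-sample intervals with matching sign
    M = len(y_true)
    start = M // 2                            # second half is out-of-sample
    matches = np.sign(y_true[start:]) == np.sign(y_pred[start:])
    return matches.mean()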

To correct possible deficiencies of the model in the precision of the predicted direction of price movements, we modify the previous equation by adding the Moving Average Convergence Divergence (MACD) indicator. The MACD is commonly calculated using the following equation:

$$ \text{MACD Line} = \text{12-day Exponential Moving Average (EMA)} - \text{26-day EMA} $$
(24)

The MACD line represents the difference between the 12-day EMA and the 26-day EMA. The EMA is a type of moving average that gives more weight to recent data points. By subtracting the longer-term EMA from the shorter-term EMA, the MACD line aims to capture the momentum and trend direction of the underlying asset (Chong & Ng, 2008; Ramlall, 2016; Sezer & Ozbayoglu, 2018). The approach of using MACD as a correction factor in the SPR can be modeled using the following equation:

$$ SPR_{Adjusted} = SPR + \left( {1 - SPR} \right) \times \left( {1 - MACD} \right) $$
(25)

where SPR is the original sign prediction ratio, MACD is the signal generated by the MACD model, and 1-MACD is used as a correction factor. The term (1-SPR) represents the complement of the original SPR, reflecting the portion of the original SPR that is not considered accurate. The term (1-MACD) represents the complement of the MACD signal, reflecting the extent to which the MACD signal indicates a potential reversal or correction in the market. The intuition behind this equation is that when the MACD signal is positive, it is likely that the market is trending upwards and the original SPR is more accurate. However, when the MACD signal is negative, it suggests a market reversal or a correction, and the original SPR may not be as accurate. In this case, the correction factor is used to adjust the SPR downward to reflect the possibility of a trend reversal (de Almeida & Neves, 2022; Ramlall, 2016; Slade, 2017).

By adding the correction factor to the original SPR, the adjusted SPRAdjusted takes into account the possibility of trend reversals or corrections indicated by the MACD signal. When the MACD signal is positive, the original SPR is considered more accurate and is only slightly adjusted. However, when the MACD signal is negative, indicating a potential trend reversal, the original SPR is adjusted more significantly downward to reflect the increased likelihood of a reversal.

This approach aims to combine the predictive power of the original SPR with the insights provided by the MACD signal, adjusting the SPR to account for potential trend changes. It recognizes that the MACD signal can act as a corrective factor when the market conditions indicate a higher likelihood of a trend reversal or correction.
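A hedged sketch of Eqs. (24)-(25) follows; clipping the raw MACD value into [0, 1] before use as a correction factor is one plausible reading of the text, not the authors' stated normalisation:

import pandas as pd

def adjusted_spr(prices: pd.Series, spr: float) -> float:
    ema12 = prices.ewm(span=12, adjust=False).mean()
    ema26 = prices.ewm(span=26, adjust=False).mean()
    macd = (ema12 - ema26).iloc[-1]           # Eq. (24): MACD line
    macd_unit = min(max(macd, 0.0), 1.0)      # clip to [0, 1] (assumption)
    return spr + (1 - spr) * (1 - macd_unit)  # Eq. (25)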

3.2 Ideal Profit Ratio (IPR)

This is the ratio between the total return and the maximum return.

$$ IPR = \frac{Total\;Return}{Maximum\;Return} $$
(26)

The Total Return is computed with the following formula, where “sign” denotes the sign function; the better the forecasting approach, the higher the total return.

$$ Total \;Return = \mathop \sum \limits_{j = 1 + M/2}^{M} sign \left( {Y_{j}^{\prime} } \right)*Y_{j} $$
(27)

The Maximum Return is determined by summing the absolute values of all observations and reflects the maximum achievable return under a perfectly foreseeable forecast. It is defined as:

$$ Maximum \;Return = \mathop \sum \limits_{j = 1 + M/2}^{M} abs \left( {Y_{j} } \right) $$
(28)
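In Python, Eqs. (26)-(28) can be sketched as follows (again on the out-of-sample half, with illustrative variable names):

import numpy as np

def ideal_profit_ratio(y_true, y_pred):
    M = len(y_true)
    start = M // 2                                                   # out-of-sample half
    total_return = np.sum(np.sign(y_pred[start:]) * y_true[start:])  # Eq. (27)
    maximum_return = np.sum(np.abs(y_true[start:]))                  # Eq. (28)
    return total_return / maximum_return                             # Eq. (26)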

The expected return is estimated using the Nelson-Siegel model, introduced by Nelson and Siegel in 1987. The model is based on the idea that the yield curve can be decomposed into three factors: the level factor, the slope factor, and the curvature factor. These factors capture the average level of interest rates, the steepness of the yield curve, and its degree of curvature, respectively. The model can be expressed mathematically as follows:

$$ r\left( t \right) = \beta_{1} + \beta_{2} \frac{{1 - e^{ - t/\tau } }}{t/\tau } + \beta_{3} \left( {\frac{{1 - e^{ - t/\tau } }}{t/\tau } - e^{ - t/\tau } } \right) $$
(29)

where r(t) represents the yield on a bond with time to maturity t, and β1, β2, β3, and τ are parameters to be estimated. The parameter β1 represents the long-term mean level of interest rates, β2 represents the slope of the yield curve at short maturities, β3 represents the curvature of the yield curve, and τ represents the time scale over which the yield curve adjusts to its long-term mean.
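The curve of Eq. (29) is straightforward to evaluate; the parameter values below are illustrative only, since the paper estimates β1, β2, β3, and τ from the data:

import numpy as np

def nelson_siegel(t, beta1, beta2, beta3, tau):
    # Eq. (29): level + slope + curvature components
    x = t / tau
    return (beta1
            + beta2 * (1 - np.exp(-x)) / x
            + beta3 * ((1 - np.exp(-x)) / x - np.exp(-x)))

maturities = np.array([0.25, 0.5, 1, 2, 5, 10])   # years
yields = nelson_siegel(maturities, beta1=0.04, beta2=-0.02, beta3=0.01, tau=1.5)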

The rolling regression method involves estimating the relationship between the excess returns of the bond portfolio and changes in the yield curve over a specified rolling time period, such as one month or one quarter. The slope of the regression line represents the expected excess return of the portfolio for a given change in the yield curve (Grinold & Ronald, 1999; Ibbotson & Kaplan, 2000).

The equation for the rolling regression model can be written as follows:

$$ \text{Excess Return} = \alpha + \beta \times \text{Yield Curve Change} + \varepsilon $$
(30)

where Excess Return is the excess return of the bond over the risk-free rate, typically estimated using a 3-month U.S. Treasury bond as the benchmark (Campbell et al., 2001), and Yield Curve Change is the change in the yield curve over the rolling period, computed from the Nelson-Siegel yields at each maturity point for the two curves. The term α is the intercept of the regression line, representing the expected excess return of the bond when the yield curve change is zero; β is the slope, representing the expected excess return of the bond for a one-unit change in the yield curve; and ε is the residual error term, the deviation of the actual excess return from the predicted excess return.

For its part, the Ideal Profit Ratio equation has been modified following works such as Elton et al. (1995) and Grinold and Ronald (1999). The ideal profit ratio measures the performance of an investment strategy relative to a benchmark; here it is calculated as the difference between the total return of the strategy and the expected return, divided by the maximum return.

To incorporate the excess return based on the yield curve into the calculation of the ideal profit ratio, you could modify the equation as follows:

$$ \text{Ideal Profit Ratio} = \frac{\text{Total Return} - \text{Expected Return}}{\text{Maximum Return}} $$
(31)

This modified equation measures the performance of the investment strategy relative to the benchmark, while taking into account the impact of the yield curve on the expected return of the portfolio. A higher ideal profit ratio indicates better performance relative to the benchmark.

Finally, after calculating the Ideal Profit Ratio above, its final value is the net value after applying transaction costs. In our case we used the difference between the average customer buy price and the average customer sell price on each day to quantify transaction costs, following the specification of Hong and Warga (2000) and Chakravarty and Sarkar (2003):

$$ TC_{AvgBidAsk} = \frac{{\overline{{P_{t}^{buy} }} - \overline{{P_{t}^{sell} }} }}{{0.5\left( {\overline{{P_{t}^{buy} }} + \overline{{P_{t}^{sell} }} } \right)}} $$
(32)

where \(\overline{{P_{t}^{buy/sell} }}\) is the average price of all customer buy/sell trades on day t. We calculate TCAvgBidAsk for each day on which there is at least one buy and one sell trade and use the monthly mean as a monthly transaction cost measure, following the specifications of previous works (Schestag et al., 2016).
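A sketch of this daily measure and its monthly aggregation is given below; the trade-record column names are our assumptions for illustration:

import pandas as pd

def monthly_transaction_cost(trades: pd.DataFrame) -> pd.Series:
    # trades: columns ['date', 'price', 'side'], side in {'buy', 'sell'} (assumed layout)
    daily = trades.pivot_table(index="date", columns="side",
                               values="price", aggfunc="mean").dropna()
    # Eq. (32): average buy minus average sell, scaled by the mid-price
    tc = (daily["buy"] - daily["sell"]) / (0.5 * (daily["buy"] + daily["sell"]))
    # Monthly mean, as in Schestag et al. (2016)
    return tc.groupby(pd.to_datetime(tc.index).to_period("M")).mean()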

4 Results

From the data described in the previous section, we build samples at 10-, 30-, and 60-min intervals and then implement the ten methods defined in Sect. 2. The training sample for the whole daily forecasting horizon is 50% of the total sample size, rounded to the nearest integer; the remaining 50% is used as the out-of-sample data set.

We use the two key performance measures defined in Sect. 3. The first is the sign prediction ratio, representing the proportion of times that the corresponding methodology accurately estimates the direction of the future price (up or down). Since correctly guessing the direction alone does not ensure better results, we also contrast the methodologies in profit terms: the ideal profit ratio is the relationship between the profitability generated by a particular method and that of a perfect sign forecast.

We implement the process above for "continuous trades" over the sample period and also apply it to "all trades" as a robustness check. Tables 2, 3, 4, 5, 6, and 7 display the results for each bond at different time scales for "continuous trades". The results for the "all trades" scenario are presented in “Appendix 2” via Tables 10, 11, 12, 13, 14, and 15.

Table 2 Sign Prediction Ratio (10 min) for continuous trades
Table 3 Ideal profit ratio (10 min) for continuous trades
Table 4 Sign Prediction Ratio (30 min) for continuous trades
Table 5 Ideal profit ratio (30 min) for continuous trades
Table 6 Sign Prediction Ratio (60 min) for continuous trades
Table 7 Ideal profit ratio (60 min) for continuous trades

Table 2 reports the sign prediction ratios of continuous trading for the ten techniques on the considered bonds at the 10-min frequency. QGA performs best across the 31 bonds, with an accuracy rate of over 0.772 and a mean of 0.881. DRCNN and DLNN-GA are the second- and third-best methods at predicting the change in bond price direction, with averages of 0.850 and 0.847, respectively. SVM-GA may also be regarded as a reference model for the other machine learning algorithms, being fourth best in the comparison. The fuzzy approach is the worst-performing technique, with an overall mean of 0.770 for the Qfuzzy method.

Table 3 shows the ideal profit ratios for the selected bonds at the 10-min time scale for every methodology. In line with the success rates in Table 2, QGA is again the best-performing method, as all bonds have a positive ideal profit ratio, with a mean value of 0.0175. Nevertheless, in contrast to the accuracy results, QRNN becomes the second-best performing method, as all bonds also have a positive ideal profit ratio, with a mean of 0.0140; QRNN is followed by ANFIS-QGA, with an average of 0.0134. In this case, SVM-GA and CNN-LSTM are the worst-performing models regarding profit generation for continuous 10-min trades. The maximum ideal profit ratio among sovereign bonds is 0.0212, reached by QGA for Turkey. Among corporate bonds, Telekomunikacja Polska reaches the maximum value of 0.0192, again under QGA. Finally, among high-yield bonds, Caesars Resort Collection LLC stands out with a value of 0.0192, also under QGA.

Table 4 presents the success ratios of continuous operations at the 30-min frequency. As at the 10-min frequency, the QGA method is once again the best performer in terms of mean sign prediction ratio, with an average of 0.860. Under QGA, Germany has the highest ratio among sovereign bonds at 0.906; among corporate bonds, Ambev has the highest value at 0.918; and among high-yield bonds, Caesars Resort Collection LLC ranks highest at 0.849. The next methods that correctly forecast the future direction of bond prices are DLNN-GA and DRCNN, with means of 0.839 and 0.829, respectively. SVM-GA could also be accepted as a good model for the sign prediction ratio. On the other hand, as at the 10-min frequency, the Qfuzzy method shows low sign prediction capacity, with a mean value of 0.750.

Examining the ideal profit ratios for continuous 30-min trades in Table 5, QGA again emerges as the best model, yielding the greatest profit ratio, with an average of 0.0193 and all bonds having a positive ratio. This result is in line with the success rates of the QGA method in Table 4. However, the techniques that predict the future direction of bond prices worst are in this case not the fuzzy ones but GRU-CNN and AdaBoost-GA, with mean values of 0.0063 and 0.0080, respectively.

On considering a sampling frequency of 60 min, Table 6 reveals that, in line with the previous results, QGA outperforms the other methods in mean sign prediction ratio, with an average of 0.838. As at the 10-min and 30-min time scales, the maximum success ratio across all bonds for QGA is achieved for the corporate bond "Ambev". Moreover, DRCNN and DLNN-GA correctly predict all bonds at a rate above 0.701. Furthermore, as at the 10-min and 30-min frequencies, the Qfuzzy method displays weak sign prediction ability, with a mean value of 0.732.

If we analyze Table 7, it is evident that the genetic algorithms obtain the best results for the 60-min frequency in the ideal profit ratio. QGA, ANFIS-QGA, and SVM-GA reflect, in this order, the highest ideal profit ratios, with all bonds having a positive ratio. DRCNN and DLNN-GA come next with average ratios of 0.0131 and 0.0128, respectively. The lowest values, in contrast to the previous table, are those of GRU-CNN and CNN-LSTM, with average values of 0.0032 and 0.0048, respectively.

When we examine continuous trading time series at 10, 30, and 60 min, we can reveal the impact of the sampling frequency on the prediction. Figure 3 shows that all methods achieve better Sign Prediction Ratio results at shorter sampling intervals. Nevertheless, as illustrated in Fig. 4, the case is otherwise for the Ideal Profit Ratio: AdaBoost-GA, SVM-GA, ANFIS-QGA, and DRCNN perform best for trading strategy setting at 60-min sampling, and QGA performs best at 30-min sampling. Following our results, not only the bonds and methodology but also the prediction intervals matter. Consequently, no single method is suitable for everything: a method appropriate for coarse data may not be appropriate for fine-grained data.

Fig. 3 Sign Prediction Ratio for continuous trades at different sampling frequencies of each method

Fig. 4 Ideal profit ratio for continuous trades at different sampling frequencies of each method

Tables 8 and 9 show the results for sovereign bonds at 1- and 5-min frequency intervals. In Table 8, the methodology with the best results is QGA for both ratios, sign prediction and ideal profit. For the sign prediction ratio at the 1-min frequency, German sovereign bonds reach the highest value (0.924); at the 5-min frequency, Italian sovereign bonds lead with a ratio of 0.946. For the ideal profit ratio, the best result is obtained by Turkey at the 1-min frequency (0.0229) and by Spain at the 5-min frequency (0.0247). For all trades, Table 9 shows that the best sovereign bond performance in sign prediction ratio at the 1-min frequency is Japan (0.930) under the Qfuzzy method; at the 5-min frequency, the DLNN-GA method performs best, with Germany having the highest ratio. Regarding the ideal profit ratio, QRNN is the best methodology, with Turkey obtaining the highest values at both frequencies: 0.0244 at 1 min and 0.0195 at 5 min.

Table 8 Sign prediction and ideal profit ratios in small frequency for continuous trades of sovereign bonds
Table 9 Sign prediction and ideal profit ratios in small frequency for all trades of sovereign bonds

In comparison with other works, Vukovic et al. (2020) obtain 82% accuracy on test cases for predicting future Sharpe ratio dynamics with neural networks. Nunes et al. (2019) achieve RMSE reductions, compared to the model without synthetic data, in the range of 11% to 70% (mean values, for forecast horizons of 15 and 20 days) when predicting the bond market yield curve with multilayer perceptrons. In summary, our study achieves high precision and exceeds the accuracy of previous work, with the genetic algorithms, especially QGA, obtaining the best results. Moreover, the previous literature dealing with fixed-income assets is not concerned with the use of HFT. Our results show that bond market transactions executed through HFT are faster and trading volume increases considerably, enhancing the liquidity of the bond market.

Finally, we analyse the cumulative net profits for each bond market (sovereign, corporate, and high-yield) and for each price window (10-min, 30-min, and 60-min, as well as 1-min and 5-min). These results are presented in “Appendix 4” via Figs. 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, and 15. Over the two decades under examination, all models encountered drawdowns of varying magnitudes, ranging from 5 to 15% at different points in time; on average, these drawdowns persisted for approximately 2.5 months. Model underperformance became evident during periods of extreme market volatility, exemplified by the 2008 financial crisis, which witnessed model losses surpassing the 20% mark; similarly, unexpected geopolitical events posed challenges, with losses reaching up to 18%. Our models demonstrated a propensity to falter when confronted with 'black swan' events of exceptional magnitude that surpassed historical data, as exemplified by the impact of the COVID-19 pandemic (Papadamou et al., 2021). These models also lose some performance when forecasting abrupt market shifts induced by unprecedented occurrences, such as major regulatory changes. Despite these limitations in predicting and achieving profits, our models achieve a higher and more consistent level of cumulative profits over time than previous work on algorithmic trading models, especially high-frequency trading models (Dixon et al., 2018; Rundo, 2019; Lahmiri & Bekiros, 2021; Goudarzi & Bazzana, 2023).

Fig. 5 Cumulative net profits for sovereign bonds (10-min)
Fig. 6 Cumulative net profits for corporate bonds (10-min)
Fig. 7 Cumulative net profits for high-yield bonds (10-min)
Fig. 8 Cumulative net profits for sovereign bonds (30-min)
Fig. 9 Cumulative net profits for corporate bonds (30-min)
Fig. 10 Cumulative net profits for high-yield bonds (30-min)
Fig. 11 Cumulative net profits for sovereign bonds (60-min)
Fig. 12 Cumulative net profits for corporate bonds (60-min)
Fig. 13 Cumulative net profits for high-yield bonds (60-min)
Fig. 14 Cumulative net profits for sovereign bonds (1-min)
Fig. 15 Cumulative net profits for sovereign bonds (5-min)

Sovereign Bonds exhibit a fluctuating pattern over the years, with a negative start in 2001 but a significant shift towards positive gains in 2002. This positive trend continued until 2006, followed by intermittent fluctuations. By 2023, net gains had stabilized at a relatively positive level, demonstrating the resilience of these bonds. Corporate Bonds also had a negative start in 2001 but saw notable improvements in 2002, with consistent gains until around 2006. There was volatility in the subsequent years, with lucrative moments such as in 2010 but also difficulties. By 2023, corporate net gains appear to have regained a positive trajectory. High-Yield Bonds started in the negative in 2001 and remained mostly so until 2003. They then experienced a period of consistent gains until around 2011, followed by volatility. In 2023, they maintain positive cumulative gains, albeit more moderate.

At the beginning of the 2000s, central banks, particularly the U.S. Federal Reserve, had a more neutral monetary policy stance. Interest rates were relatively higher compared to the 2010s (Jarrow, 2019). However, following the burst of the dot-com bubble and the September 11 attacks in 2001, central banks, including the Federal Reserve, lowered interest rates to stimulate economic growth. These rate cuts resulted in lower yields on government bonds (Fabozzi & Fabozzi, 2021).

Bond yields, especially in the U.S., remained relatively low during the first half of the decade but started to rise as the economy improved. The latter part of the 2000s was marked by the U.S. housing bubble and the subsequent global financial crisis of 2008. These events led to a flight to safety, with investors seeking refuge in government bonds, particularly U.S. Treasuries (Gilchrist et al., 2019). This increased demand for government bonds drove prices up and yields down.

Corporate bonds in the early 2000s offered higher yields compared to government bonds, reflecting the risk premium associated with corporate debt. However, during the financial crisis, corporate bond yields rose significantly as investors became concerned about the creditworthiness of corporations (Jarrow, 2019). Bond spreads, which measure the difference in yields between corporate bonds and government bonds, widened substantially during this period. Emerging market bonds experienced mixed performance during the 2000s. Some emerging market economies attracted foreign investment, leading to lower yields on their bonds. However, there were instances of bond market turmoil in emerging markets, driven by factors such as currency devaluations and political instability (Beirne and Sugandi, 2023).

Regarding the 2010s decade, central banks, particularly in developed economies like the United States, Europe, and Japan, implemented accommodative monetary policies in response to the global financial crisis of 2008 (Albagli et al., 2018). These policies included near-zero or negative interest rates and large-scale bond-buying programs (quantitative easing) aimed at stimulating economic growth. As a result, yields on government bonds, which serve as benchmarks for other fixed-income securities, remained historically low (Blanchard, 2023). The low yield environment prompted investors to seek higher-yielding assets, which sometimes led to increased demand for riskier bonds, such as high yield or corporate bonds. This increased demand pushed up bond prices and drove yields lower. The global economy experienced a prolonged period of low inflation and, at times, deflationary pressures during the 2010s. Low inflation expectations are often associated with lower yields on fixed-income securities (Fabozzi & Fabozzi, 2021).

Regulatory changes in the financial industry, such as Basel III banking regulations, encouraged financial institutions to hold more high-quality liquid assets, including government bonds (Ranaldo et al., 2019). This increased demand for government bonds also contributed to lower yields. While low yields were a prominent feature of the 2010s bond market, it's essential to note that not all bonds experienced the same level of yield compression. The extent of yield compression varied among different types of bonds, and some segments of the bond market, like high yield or emerging market bonds, offered higher yields to compensate for increased risk (Fabozzi & Fabozzi, 2021).

5 Conclusions

This study has developed a comparison of methodologies to predict bond price movements based on past prices through high-frequency trading. We compare ten machine learning methods applied to the fixed-income markets in sovereign, corporate, and high-yield debt, in both developed and emerging countries, in the one-year bond market for the period from 15 February 2000 to 12 April 2023. Our results indicate that QGA, DRCNN, and DLNN-GA can correctly interpret the expected future direction of bond prices and rate changes satisfactorily. Qfuzzy, in contrast, is not adequate for forecasting high-frequency returns, and dealers ought to avoid this model in their trading decisions in the sampled bond market.

Our study shows that all methods deliver better Sign Prediction Ratio results at shorter sampling intervals. At the 10-min frequency, the QGA method is the best performer across all bonds, with an accuracy rate higher than 0.772 and a mean of 0.881; DRCNN and DLNN-GA are the second- and third-best methods at predicting the change in bond price direction, with averages of 0.850 and 0.847, respectively. For the Ideal Profit Ratio, however, not all methods perform best at the highest frequency: SVM-GA, ANFIS-QGA, and DRCNN perform best for the trading strategy configuration at 60-min sampling, and QGA performs best at 30-min sampling. It is therefore important to consider that, depending on the sampling frequency and the objective of the approach, one method does not fit all, and a mixture of different alternative techniques must be examined.

In contrast to previous research, this study achieves better accuracy and compares innovative ML methods using HFT, which had not previously been applied in the bond market. ML algorithms have become widely available for fixed-income market analysis, especially as uncertainty in financial markets has risen sharply. In addition, our study predicts bond price movements globally, so it is not exclusively focused on industrialized countries. Finally, it includes not only sovereign bonds but also corporate and high-yield debt, making it of interest to policymakers in any country.

Our study provides important benefits in the field of finance. From a trading perspective, it strengthens the implementation of reliable and fast bond price forecasting systems, including the pursuit of returns and volatility targeting, and can exploit the information in indirect market-based monetary policy instruments and the macro environment. Adequate bond price predictability can reduce medium- and long-term debt servicing costs through the development of a deep and liquid market for government securities. At the microeconomic level, a robust bond price prediction model can increase overall financial stability and enhance financial intermediation via increased competition and the development of related financial infrastructure, products, and services. More generally, financial crises tend to arise in credit markets; our model has the potential to give financial institutions information on the effects of policy measures on credit market fragility and a better understanding of how market trends influence liquidity provision, implementation costs, and transaction prices.

In summary, our paper has broad prospective impact. It can facilitate the work of trading professionals at financial institutions as well as private investors and other stakeholders. This research makes an important contribution to high-frequency trading, as its conclusions have important implications for investors and market participants seeking economic and financial profits from the bond market.

Our work is limited by data availability at the 10- and 30-min price frequencies for corporate debt securities. For this type of research to generalize better for fixed-income market practitioners, greater data availability would be necessary. We leave this issue for future research, in which more complex trading strategies can be organized to test and demonstrate the effectiveness of the techniques presented in this work for trading in debt securities.

In addition, further research should broaden the scope of the comparative analysis of methodologies to cover crypto-assets, such as cryptocurrencies and fan tokens, since financial institutions have increasingly incorporated crypto-assets into their portfolios in recent years.