1 Introduction

Traditionally, economic systems have depended on third-party financial organizations (e.g., banks). These organizations act as intermediaries between the parties and have complete control over the transactions and the process of exchanging funds. Such traditional systems allow only limited money transactions and lack security, trust, flexibility, and transparency [1]. To address these issues, a system that can minimize the role of financial intermediaries is needed. In 2008, Satoshi Nakamoto published a paper [2] describing the notion of peer-to-peer (P2P) electronic payment transmission without the involvement of any intermediary financial organization. In financial systems, a cryptocurrency is a decentralized virtual currency [3]. Cryptography is used to secure this currency, making it practically impossible to double-spend or counterfeit [4]. Cryptocurrencies are not issued by a central authority (e.g., a central bank). Instead, they are created using blockchain technology [5]. This technology is complex and seeks to store data in a way that makes hacking, altering, or defrauding the system extremely difficult. The blockchain is composed of two essential modules, namely, the transaction and the block. The transaction defines the action initiated by a participant, while the block records the transaction and additional details (e.g., creation timestamp and correct sequence) [6].

For more than 2 years, Bitcoin was the first and only blockchain-based cryptocurrency [7]. Since the first cryptocurrency was proposed by Satoshi Nakamoto, more than 5200 cryptocurrencies, such as Bitcoin, Ethereum, Cardano, Ripple, Monero, Stellar, Litecoin, and Dash, have come to be traded [8]. Because they combine monetary units with encryption technology, cryptocurrencies have recently gained a lot of attention in the domains of cryptography, computer science, and economics [7]. Generally, cryptocurrencies are grouped into three major categories (i.e., currency, platform, and application). Currency cryptocurrencies serve as an exchange medium (i.e., a payment method). Platform cryptocurrencies enable the development of a wide range of blockchain-based apps. Finally, application cryptocurrencies are used in specific sectors [1].

Since the inception of cryptocurrency, there has been a significant increase in usage, particularly in the previous 5 years. According to Blockchain.com, the number of blockchain wallet users increased from 2015 to 2021, as illustrated in Fig. 1 [9, 10]. Bitcoin went from having no value in 2008 to reaching a record price of $20,089.00 in 2017. Since then, the price of Bitcoin has not fallen below $3000. In mid-April 2021, Bitcoin prices reached all-time highs of more than $60,000 as Coinbase (a cryptocurrency exchange) went public [11]. On Nov. 10, 2021, Bitcoin again reached an all-time high of $68,789 before closing at $64,995 [12]. On June 13, 2022, cryptocurrency prices dropped, and the Bitcoin price fell below $23,000 for the first time since December 2020 [13]. The total market capitalization of all cryptocurrencies was approximately 19 billion USD as of February 2017. According to [14], the top 15 currencies accounted for more than 97% of the market, while seven accounted for 90% of the market capitalization.

Fig. 1 Global blockchain wallet usership from 2015 to 2021

As mentioned before, cryptocurrencies have gained broad market acceptance and developed rapidly. Many financial institutions have included cryptocurrency-related assets in their trading strategies. Cryptocurrency trading is the act of buying and selling cryptocurrencies to make a profit. Kyriazis [15] surveyed the predictability of cryptocurrency pricing and reported that the efficient market hypothesis is rejected, so speculation is possible through trading. Additionally, Fang et al. [16] presented a comprehensive survey of cryptocurrency trading research covering various aspects (e.g., cryptocurrency trading systems, prediction of volatility and return, and technical trading). Makarov and Schoar [17] studied arbitrage and price formation in the cryptocurrency market. Chava et al. [18] used Google’s Search Volume Index as a proxy for retail investor attention. They found that celebrity endorsements of crypto tokens, initial coin offerings (ICOs), non-fungible tokens (NFTs), and crypto platforms providing high yields to investors attract greater attention from areas with higher lottery sales per capita.

The Bitcoin price has gained the interest of scholars all over the world. Unfortunately, cryptocurrency prices are volatile and dynamic. They are determined by several factors (for example, popularity, mining difficulty, transaction cost, market trends, prices of alternative coins, sentiments, stock markets, and some legal issues) [19]. Additionally, small cryptocurrencies with a small market share can become a source of shocks that benefit or harm other cryptocurrencies. These factors make cryptocurrency prices unstable, rapidly changing, and difficult to predict. Moreover, scams, suspected hacks, or other hidden problems can lead to dramatic drops in cryptocurrency prices [20]. For example, on June 26, 2019, the Bitcoin price lost more than 10% in a few minutes because of a crash of the Coinbase digital exchange. Consequently, price prediction has become a critical task for researchers [21]. Thousands of coins are in use around the world. The focus of this study is on three of the most popular cryptocurrencies (i.e., Bitcoin, Ethereum, and Cardano). Bitcoin leads at $960.79 billion in market capitalization, followed by Ethereum at $189.98 billion, Binance Coin at $39.91 billion, Tether at $35.96 billion, and Cardano at $33.36 billion [22]. It is worth mentioning that these values change regularly with the market.

Different machine and deep learning architectures have been proposed to perform the task of cryptocurrency price prediction (e.g., Support Vector Machines (SVM) [23], Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM) [24], and Deep Neural Networks (DNN) [25]). This study proposes a deep learning-based system to predict cryptocurrency values, employing two RNN algorithms, specifically, Bidirectional Long Short-Term Memory (BiLSTM) and Gated Recurrent Unit (GRU). Three datasets of past cryptocurrency prices, namely, Bitcoin USD (BTC-USD) [26], Ethereum USD (ETH-USD) [27], and Cardano USD (ADA-USD) [28], are utilized for training the algorithms to anticipate cryptocurrency values. Figure 2 shows the flow of a cryptocurrency price prediction system.

Fig. 2 The flow for a cryptocurrency price prediction system

Cryptocurrency prices are unstable and dynamic; hence, their prediction is a crucial task. In the current study, deep learning-based algorithms (i.e., BiLSTM and GRU) are used to predict the prices of three of the most used cryptocurrencies (i.e., Bitcoin, Ethereum, and Cardano). This method seeks to uncover hidden patterns in data, integrate them, and generate more accurate forecasts. The current study’s contributions can be summarized as follows:

  • Utilizing deep learning-based algorithms (i.e., BiLSTM and GRU) to predict the three cryptocurrencies’ prices (i.e., Bitcoin, Ethereum, and Cardano).

  • Utilizing the Grid Search approach for the hyperparameter optimization process.

  • Evaluating the performance of the proposed models utilizing evaluation metrics such as MSE, RMSE, MAE, and MAPE.

The rest of this paper is structured as follows: Sect. 2 introduces the related work in this field. Section 3 discusses the methodology, including the data acquisition phase, data preprocessing phase, classification and optimization phase, and performance evaluation phase. Section 4 explores the experimental results. Section 5 presents the limitations of the current study. Section 6 concludes the paper and presents the future work.

2 Related studies

The prediction of the cryptocurrency price is a time series problem [29]. A time series is a sequence of measurements of a variable made over time, usually at equally spaced times. Time series problems differ from other problems for the following reasons. First, time series problems are time-dependent; hence, the basic assumption of a linear regression model that the observations are independent does not hold in this case. Second, most of these problems exhibit some form of seasonality (i.e., variations specific to a particular time frame) along with an increasing or decreasing trend [30]. Time series prediction uses previously observed values to predict future values. Some methods include Autoregressive (AR) [31], Autoregressive Integrated Moving Average (ARIMA) [32], Seasonal Autoregressive Integrated Moving Average (SARIMA) [33], Exponential Smoothing (ES) [34], DeepAR [35], and N-BEATS [36]. Machine learning (ML) models used to solve time series problems include K-Nearest Neighbor (KNN), Classification and Regression Trees (CART), Decision Trees, Support Vector Regression (SVR), and boosting ensembles such as XGBoost and AdaBoost, while deep learning (DL) algorithms include Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), and Convolutional Neural Network (CNN). The most significant advantage of ML-based approaches is that they are computationally inexpensive for implementing online models. DL-based models show several advantages over other prediction models, as they not only produce results that are nearly or exactly the same as the actual values but also enhance the accuracy of the results [37].

Deep learning (DL) is a type of artificial intelligence that learns from previous data to predict the future. DNN, RNN, LSTM, and Deep Belief Networks are examples of DL architectures.

Derbentseva et al. [38] presented two machine learning (ML) techniques for time-series data forecasting: random forests (RF) and stochastic gradient boosting machine (SGBM). They forecasted the prices of three of the most valuable cryptocurrencies: Bitcoin, Ethereum, and Ripple. Their reported results show that ML ensemble techniques can be used to forecast cryptocurrency prices. Hamayel and Owda [39] proposed three types of recurrent neural network techniques to predict the prices of three cryptocurrencies (i.e., Bitcoin, Litecoin, and Ethereum). Their proposed methods produce accurate forecasts based on the mean absolute percentage error (MAPE). According to the results, the GRU model outperformed the LSTM and bidirectional LSTM (BiLSTM) models in prediction for the three cryptocurrencies. GRU provides MAPE percentages of 0.2454%, 0.2116%, and 0.8267% for Bitcoin, Litecoin, and Ethereum, respectively.

Pour et al. [40] proposed a hybrid model for Bitcoin price prediction that uses Long Short-Term Memory and Bayesian Optimization. Their model was validated using MSE, RMSE, and NRMSE. Their reported results showed that the proposed model had good predictive power. Patel et al. [1] proposed a hybrid cryptocurrency prediction method based on LSTM and GRU, concentrating on Litecoin and Monero. According to the reported results, the proposed algorithm predicted the prices with great accuracy, and it may be used in various applications based on cryptocurrency price prediction. Kim et al. [41] proposed a multiple-LSTM model based on self-attention. Their proposed model contains numerous LSTM modules for on-chain variable groups as well as an attention mechanism. The suggested framework was assessed in experiments on real Bitcoin price data, which resulted in MAE, RMSE, MSE, and MAPE values of 0.3462, 0.5035, 0.2536, and 1.3251, respectively.

Miura et al. [42] utilized several ML models to predict future values based on prior samples, including multilayer perceptron, GRU, LSTM, SVM, and regression. The results demonstrated that the proposed system accurately predicts prices; thus, the technology might be utilized to anticipate prices for other cryptocurrencies. Yiying and Yeze [43] proposed an artificial intelligence approach combining a fully connected Artificial Neural Network (ANN) and LSTM to study Bitcoin, Ethereum, and Ripple price fluctuations. Their stated results revealed that ANN was more dependent on long-term history. LSTM, on the other hand, depended more on short-term dynamics, indicating that it extracts information from historical memory more efficiently than ANN.

A machine learning-based strategy was proposed by Shahbazi and Byun [44] to perform cryptocurrency price predictions (i.e., Litecoin and Monero) for a financial institution. Their proposed method included a reinforcement learning algorithm to assess and forecast cryptocurrency prices and a blockchain infrastructure to ensure a secure transaction environment. Their findings revealed that the proposed approach predicted prices more accurately than previous state-of-the-art algorithms. Kalariya et al. [45] proposed a stochastic neural network model for cryptocurrency price prediction based on random walk theory. The Multi-Layer Perceptron (MLP) and LSTM models were trained, and the experimental results for Bitcoin, Ethereum, and Litecoin were reported. According to their findings, the model outperformed the deterministic models.

Uras et al. [46] used statistical methodologies and ML algorithms to forecast the prices of Bitcoin, Litecoin, and Ethereum. Simple Linear Regression was used to forecast univariate series using only price data, whereas Multiple Linear Regression was used to forecast multivariate series utilizing both price and volume data. Multilayer Perceptron and Long Short-Term Memory were the deep learning methods employed. In addition, a deep learning-based hybrid model employing GRU and LSTM was proposed by Tanwar et al. [47] to forecast the prices of Litecoin and Zcash. Their proposed model has the potential to be utilized in real-time applications. According to their published results, the model forecasted prices with great accuracy compared to existing models.

Related studies comparison: Table 1 presents a comparison between the mentioned literature and the current study. They are organized in descending order.

Table 1 Comparison between related studies and the current study

Research Gap: In light of the mentioned literature, more studies address Bitcoin than the other altcoins, as seen in Table 1. For traders, security and privacy are critical concerns for gaining trust while trading. Hence, the possibility of using ML- and DL-based algorithms to address the anonymity, security, and privacy levels of other cryptocurrencies needs to be explored. ML ensemble algorithms have not been examined much in the field of cryptocurrency price prediction. Furthermore, not enough attention has been paid to optimizing ML techniques to enhance accuracy.

3 The suggested DLCP2F

As mentioned earlier, the popularity of cryptocurrency increased in 2017 as its market value rose rapidly for several months in a row. The market value peaked at around $800 billion in January 2018 [48]. The current study suggests a framework for cryptocurrency price prediction that utilizes state-of-the-art deep learning architectures. The proposed framework consists of five phases: (1) a data acquisition phase, where the data is acquired from a public source, (2) a data preprocessing phase to prepare the dataset for the next phase, (3) a classification phase to learn and optimize the models, (4) a performance evaluation phase, and (5) a future prediction phase. It is summarized graphically in Fig. 3.

Fig. 3 The suggested framework for the cryptocurrency price prediction

3.1 Data acquisition phase

The current study depends on three public real-time cryptocurrency datasets retrieved from “Yahoo Finance”. Their “Historical Prices” daily records are retrieved up to “August 9, 2022”. The first dataset is named Bitcoin USD (BTC-USD) and consists of 2885 daily records starting from “September 17, 2014” [26]. The second dataset is named Ethereum USD (ETH-USD) and consists of 1735 daily records starting from “November 9, 2017” [27]. The third dataset is named Cardano USD (ADA-USD) and consists of 1735 daily records starting from “November 9, 2017” [28]. The three datasets consist of 7 columns: “Date”, “Open”, “High”, “Low”, “Close”, “Adj Close”, and “Volume”. The “Open” and “Close” prices represent the currency market’s opening and closing prices on a specific “Date”. The “High” and “Low” prices represent the currency market’s maximum and minimum prices on a specific “Date”. The “Volume” is the amount of money in circulation on a particular “Date”. Table 2 summarizes the details of the datasets. Figure 4 shows a summary of the close prices for the three datasets. As it shows, the close prices are low in the initial period and then rise. After that, the prices fluctuate but remain in the high region. Consequently, forecasting the cryptocurrency prices from the given trading features is a recognizable challenge. Statistics on the three datasets are reported in Table 3. Skewness measures the asymmetry of a distribution. A distribution (i.e., dataset) is symmetric if its right and left sides look the same from the center point. Kurtosis measures whether the data is heavy- or light-tailed compared to a normal distribution. Thus, datasets with high kurtosis (i.e., heavy-tailed) tend to contain outliers, while datasets with low kurtosis (i.e., light-tailed) lack outliers [49, 50]. Table 3 shows that the last column has a very high standard deviation compared to the other columns.
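As a minimal sketch of the skewness and kurtosis notions described above, the following computes both from population moments. The estimator choice, the function name `moments_summary`, and the two toy lists are assumptions made for illustration; the study does not state which estimator produced Table 3, and statistical libraries often apply bias corrections that give slightly different numbers.

```python
import math


def moments_summary(values):
    """Return (mean, std, skewness, excess_kurtosis) from population moments.

    Assumption: plain (biased) moment estimators; not necessarily the
    estimator used for the statistics reported in the study.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    std = math.sqrt(m2)
    skewness = m3 / m2 ** 1.5
    # Subtracting 3 makes a normal distribution score 0 (excess kurtosis).
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return mean, std, skewness, excess_kurtosis


# Toy data (made up): a symmetric sample and one with a long right tail.
symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
right_tailed = [1.0, 1.0, 1.0, 1.0, 10.0]

sym_stats = moments_summary(symmetric)
tail_stats = moments_summary(right_tailed)
```

The symmetric sample has zero skewness, while the sample with a single large value has positive skewness and a higher kurtosis, which is the behavior the paragraph associates with outliers.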

Table 2 The datasets used in the current study

Fig. 4 The close prices summary for the three datasets (i.e., BTC-USD, ETH-USD, and ADA-USD) from their start dates until “August 9, 2022”

Table 3 Statistics on the used datasets in the current study

3.2 Data preprocessing phase

The data is arranged chronologically and recorded at regular intervals (i.e., 1 day). It is therefore time series data that requires special treatment with the used models (i.e., BiLSTM and GRU). The first step is to filter the features. The current study uses the “Open”, “Close”, “Adj Close”, and “Volume” features, while the other features are dropped. As the target of the current study is to predict the price of the cryptocurrency, it depends only on the selected columns. After that, the features are scaled to the [0, 1] range using the min-max scaler (Eq. 1), where \(X_i\) is the input record and \(X_o\) is the scaled output record. This helps the optimization algorithm converge faster.

$$\begin{aligned} {X_o} = \frac{X_i-\min {(X_i)}}{\max {(X_i)}-\min {(X_i)}} \end{aligned}$$
(1)
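Equation 1 can be illustrated with a short sketch. The function name `min_max_scale` and the sample prices are hypothetical, and the handling of a constant column (returning zeros) is an assumption added here to avoid a division by zero; the study does not discuss that case.

```python
def min_max_scale(column):
    """Scale a list of numbers to [0, 1] following Eq. 1."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumption: a constant column carries no information, map it to 0.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


# Made-up close prices, not taken from the datasets.
close_prices = [100.0, 150.0, 125.0, 200.0]
scaled = min_max_scale(close_prices)  # [0.0, 0.5, 0.25, 1.0]
```

Each feature column would be scaled separately, so that a large-valued column such as the volume does not dominate the others.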

The last step in the preprocessing phase is to build the data sequences. Building sequences begins with creating a sequence of a specific length (i.e., window size) at position 0. Then a new sequence is created by shifting one position to the right. This is continued until all of the available positions have been utilized. The inputs and outputs are created using the same approach. The only difference between them is that the outputs are shifted by a specified value, namely the “days shift”.

The models are controlled by two variables (i.e., days shift and sequence length). The days shift is the time gap between the input (i.e., features) and the output (i.e., close price). For example, if the value of the days shift is 5 and the first 10 days are taken as an input, the output will be from the 6th to the 15th day. How does this affect the prediction? When 10 days are entered as an input (i.e., from the 1st to the 10th day) and the value of the days shift is 3, the model is supposed to predict the output from the 4th to the 13th day. Since the goal is to predict future data, the last three elements of the predicted output values are the future values. The lowest value of the days shift is 1; hence, the data of the next day is predicted along with the previous days. The sequence length determines how the data are passed to the model. When the value of the sequence length is 10, the input is divided into groups, each group consisting of 10 records, and each group is treated as one record. For example, if the input X consists of 100 records and the sequence length is set to 25, four sequences will be generated, and each is treated as one record by the model. A higher sequence length tends to give better performance because each record contains more information; however, the training time increases. Thus, the current work aims to determine the best value for both days shift and sequence length through the use of the grid search approach.
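The sequence-building step can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name `make_sequences` is hypothetical, windows slide by one position as described in the preprocessing text, and a window is emitted only when its shifted output lies fully inside the data. With 1-based day numbering, inputs covering days 1 to 10 and a days shift of 5 are paired with outputs covering days 6 to 15.

```python
def make_sequences(features, targets, sequence_length, days_shift):
    """Pair each input window with the target window shifted by `days_shift`.

    Assumption: windows slide one position at a time, and a window is kept
    only if its shifted output window fits inside `targets`.
    """
    inputs, outputs = [], []
    last_start = len(features) - sequence_length - days_shift
    for start in range(last_start + 1):
        end = start + sequence_length
        inputs.append(features[start:end])
        outputs.append(targets[start + days_shift:end + days_shift])
    return inputs, outputs


# Day numbers 1..15 stand in for daily records (toy data).
days = list(range(1, 16))
X, Y = make_sequences(days, days, sequence_length=10, days_shift=5)
# X[0] covers days 1..10 and Y[0] covers days 6..15, so the last five
# output elements lie beyond the input window.
```

In this sketch the last `days_shift` elements of each output window are the values that are unknown at input time, which is what makes the shifted output a forecast.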

Figure 5 shows a graphical sample of the training and testing inputs and outputs process. In this example, a 1000-record dataset is split into 900 for training and 100 for testing where the days shift value is set to 5. The training inputs start from 0 while the training outputs start from 5 (i.e., the days shift value). Hence, the input X is the first 850 records and the output Y is the last 850 records. This means that the prediction will be the forecast for the next 5 days based on the current inputs.

Fig. 5 A graphical sample of the process for the training and testing inputs and outputs

3.3 Classification and optimization phase

The current phase creates two state-of-the-art deep learning models (i.e., BiLSTM and GRU) and optimizes them based on the input data. Long Short-Term Memory (LSTM) works by allowing each internal layer to use certain gates to access information from both previous and current layers. The data passes through several gates (for example, the forget and input gates) and activation functions (such as the Tanh and ReLU functions) before being delivered by the LSTM cells. The main advantage is that each LSTM cell can recall patterns for a specific time. It is important to note that LSTM can remember important information while forgetting irrelevant information. Furthermore, an LSTM’s default behavior is to remember information for a long time [51].

Bidirectional LSTM (BiLSTM) is a recurrent neural network (RNN) that is commonly used to process natural language. In contrast to a typical LSTM, the input flows in both directions, so the network can use information from both sides. In short, BiLSTM adds another LSTM layer in which the input sequence flows backward. The outputs from both LSTM layers are combined in one of several ways, including average, sum, multiplication, and concatenation [52]. The suggested BiLSTM network consists of: (1) an input LSTM layer with a number of units equal to the sequence length and a Tanh activation function, (2) a 50% dropout layer, (3) a BiLSTM layer with 256 units, (4) another 50% dropout layer, and (5) an output layer with a linear activation function. Figure 6 presents the hierarchy of the BiLSTM model using a sequence length of 50. The “None” keyword means that any value is accepted.

Fig. 6 A graphical presentation of the hierarchy of the BiLSTM model using a sequence length of 50. The graph is generated from TensorFlow

GRU (Gated Recurrent Unit) is an RNN that seeks to tackle the vanishing gradient problem and might be regarded as a variant of the LSTM. It employs the so-called update gate and reset gate to overcome the vanishing gradient problem of a regular RNN. These two gates are vectors that determine what information should be passed to the output. They are special in that they can be trained to retain information from the distant past without it being washed out over time, or to discard information that is irrelevant to the forecast [53]. The suggested GRU network consists of: (1) an input GRU layer with 50 units and a Tanh activation function, (2) a 25% dropout layer, (3) another GRU layer with 100 units and a Tanh activation function, (4) another 25% dropout layer, and (5) an output layer with a linear activation function. Figure 7 presents the hierarchy of the GRU model using a sequence length of 50. The “None” keyword means that any value is accepted.
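For reference, one common textbook formulation of a single GRU step is sketched below. The symbols are introduced here as assumptions and are not defined in the study: \(x_t\) is the input at time t, \(h_{t-1}\) is the previous hidden state, \(z_t\) and \(r_t\) are the update and reset gates, \(\tilde{h}_t\) is the candidate state, \(W\), \(U\), and \(b\) are trainable weights and biases, \(\sigma\) is the logistic sigmoid, and \(\odot\) is the element-wise product. References and libraries differ on whether \(z_t\) or \(1-z_t\) weights the previous state.

$$\begin{aligned} z_t&= \sigma (W_z x_t + U_z h_{t-1} + b_z) \\ r_t&= \sigma (W_r x_t + U_r h_{t-1} + b_r) \\ \tilde{h}_t&= \tanh (W_h x_t + U_h (r_t \odot h_{t-1}) + b_h) \\ h_t&= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t \end{aligned}$$

When \(z_t\) is close to zero, the previous state is carried forward almost unchanged, which is how the cell can retain information over many steps.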

Fig. 7 A graphical presentation of the hierarchy of the GRU model using a sequence length of 50. The graph is generated from TensorFlow

For both networks, the AdaGrad optimizer [54] is used. It has several advantages: (1) it eliminates the need to manually tune the learning rate, (2) it achieves faster and more reliable convergence than the basic SGD when the weight scaling is unequal, and (3) it is not sensitive to the size of the step. It uses the update rule in Eq. 2, where \(\eta\) is the learning rate, \(g_t\) is the partial derivative of the objective function, and \(G_t\) is a diagonal matrix whose entries accumulate the squares of the past gradients. \(\varepsilon\) is added to avoid any division by zero. A hyperparameter is a configuration value that is external to the model and cannot be estimated from the data. Before the learning process can begin, the hyperparameter’s value must be determined. The grid search (GS) is used to identify the model’s optimal hyperparameters, i.e., those that produce the most accurate predictions [55]. The goal of the GS approach is to find the best combination of the sequence length and days shift values. The sequence length range is [10, 20, 30, 40, 50] and the days shift range is [1, 2, 3, 4, 5].

$$\begin{aligned} \theta _{t+1} = \theta _{t} - \frac{\eta }{\sqrt{G_t + \varepsilon }} \times g_t \end{aligned}$$
(2)
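The update rule in Eq. 2 can be sketched per parameter as follows. This is a plain-Python illustration, not the optimizer implementation used in the study: the function name `adagrad_step`, the learning rate, the value of \(\varepsilon\), and the toy objective are assumptions. The diagonal of \(G_t\) is stored as a list of accumulated squared gradients.

```python
import math


def adagrad_step(theta, grad, accum, learning_rate=0.01, eps=1e-8):
    """Apply one AdaGrad update (Eq. 2) to every parameter.

    `accum` holds the diagonal of G_t: the running sum of squared gradients.
    """
    new_accum = [a + g * g for a, g in zip(accum, grad)]
    new_theta = [
        t - learning_rate / math.sqrt(a + eps) * g
        for t, g, a in zip(theta, grad, new_accum)
    ]
    return new_theta, new_accum


# Toy objective (assumption): f(theta) = theta_0^2 + 10 * theta_1^2,
# whose gradient is (2 * theta_0, 20 * theta_1). The two coordinates are
# deliberately scaled differently.
theta, accum = [1.0, 1.0], [0.0, 0.0]
for _ in range(100):
    grad = [2.0 * theta[0], 20.0 * theta[1]]
    theta, accum = adagrad_step(theta, grad, accum, learning_rate=0.5)
```

Because each step is divided by the accumulated gradient magnitude of its own coordinate, both parameters shrink at nearly the same rate even though their gradients differ by a factor of ten, which illustrates the advantage listed for unequal weight scaling.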

3.4 Performance evaluation phase

For each epoch, the performance is evaluated. The current study trains for 100 epochs with early stopping set to 10. The dataset is split into training, validation, and testing subsets. The testing size is set to 100. The validation size is set to 10% of the remaining data. The mean squared error (MSE) is used as the loss and evaluation function; the lower the value, the better the model. It is defined in Eq. 3, where N is the number of records, \(y_i\) is the actual value, and \(y^*_i\) is the predicted value. Also, the root mean squared error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and R2 score are calculated; the equations for the RMSE, MAE, and MAPE are shown in Eqs. 4 to 6. It is worth mentioning that Eqs. 3 and 4 can be derived from one another.

$$\begin{aligned} \text {MSE}= & {} \frac{1}{N} \times \sum _{i=1}^{N}{(y_i-y^*_i)^2} \end{aligned}$$
(3)
$$\begin{aligned} \text {RMSE}= & {} \sqrt{\text {MSE}} = \sqrt{\frac{1}{N} \times \sum _{i=1}^{N}{(y_i-y^*_i)^2}} \end{aligned}$$
(4)
$$\begin{aligned} \text {MAE}= & {} \frac{1}{N} \times \sum _{i=1}^{N}{\left|y_i-y^*_i\right|} \end{aligned}$$
(5)
$$\begin{aligned} \text {MAPE}= & {} \frac{1}{N} \times \sum _{i=1}^{N}{\left|\frac{y_i-y^*_i}{y_i}\right|} \end{aligned}$$
(6)
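Equations 3 to 6 translate directly into code. The sketch below uses a hypothetical function name and toy values. The R2 score is included as well, but since no equation for it is given in the text, the standard coefficient of determination is assumed. The MAPE is returned as a fraction, as in Eq. 6, and requires non-zero actual values.

```python
import math


def regression_metrics(actual, predicted):
    """Compute MSE, RMSE, MAE, MAPE (Eqs. 3-6) and an assumed standard R2."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    squared_error_sum = sum(e * e for e in errors)
    mse = squared_error_sum / n
    rmse = math.sqrt(mse)
    mae = sum(abs(e) for e in errors) / n
    mape = sum(abs(e / a) for e, a in zip(errors, actual)) / n
    # Assumption: R2 = 1 - SS_res / SS_tot (not defined in the text).
    mean_actual = sum(actual) / n
    total_sum_squares = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - squared_error_sum / total_sum_squares
    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "MAPE": mape, "R2": r2}


# Toy values, not taken from the experiments.
actual = [1.0, 2.0, 4.0]
predicted = [1.0, 3.0, 2.0]
metrics = regression_metrics(actual, predicted)
```

For these toy values the R2 score is negative, which happens whenever the predictions have a larger squared error than simply predicting the mean of the actual values.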

4 Experiments and discussions

The current section reports the executed experiments and their discussions. The experiments’ configurations are reported in Table 4.

Table 4 The experiments’ configurations in the current study
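The experiments sweep every combination of the sequence length and days shift ranges given in Sect. 3.3. A minimal sketch of such a grid search is shown below. The function names are hypothetical, and `toy_evaluate` is only a placeholder that stands in for training a model and returning its validation error; its scores are not results from the study.

```python
from itertools import product

SEQUENCE_LENGTHS = [10, 20, 30, 40, 50]
DAYS_SHIFTS = [1, 2, 3, 4, 5]


def grid_search(evaluate, sequence_lengths=SEQUENCE_LENGTHS,
                days_shifts=DAYS_SHIFTS):
    """Evaluate every (sequence_length, days_shift) pair; lower is better."""
    results = {}
    for sequence_length, days_shift in product(sequence_lengths, days_shifts):
        results[(sequence_length, days_shift)] = evaluate(
            sequence_length, days_shift
        )
    best = min(results, key=results.get)
    return best, results


def toy_evaluate(sequence_length, days_shift):
    # Placeholder score (assumption), NOT a real model error.
    return days_shift / sequence_length


best, results = grid_search(toy_evaluate)
```

With five values per hyperparameter, 25 configurations are evaluated for each model and dataset, so the cost of the search grows with the product of the range sizes.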

4.1 The “Bitcoin USD (BTC-USD)” dataset experiments

Table 5 shows the reported performance metrics for the “Bitcoin USD (BTC-USD)” dataset using the two models (i.e., GRU and BiLSTM). The minimum MSE, RMSE, MAE, MAPE, and R2 are 0.00029, 0.01711, 0.02214, 0.07036, and −2.89528, respectively, for the GRU and 0.00210, 0.04582, 0.03358, 0.10942, and −10.71676, respectively, for the BiLSTM. The maximum MSE, RMSE, MAE, MAPE, and R2 are 0.00264, 0.05136, 0.07599, 0.21793, and 0.74536, respectively, for the GRU and 0.00793, 0.08907, 0.08577, 0.25511, and 0.44163, respectively, for the BiLSTM. For the GRU model, the best sequence length and days shift are 50 and 1 with respect to the MSE (and RMSE), 50 and 4 with respect to the MAE, and 50 and 4 with respect to the MAPE. For the BiLSTM model, the best sequence length and days shift are 50 and 1 with respect to the MSE (and RMSE), the MAE, and the MAPE. From these, the majority-voted sequence length and days shift are 50 and 1, respectively. Figures 8 and 9 summarize the reported RMSE results graphically using BiLSTM and GRU, where the x-axis represents the days shift range (i.e., [1, 2, 3, 4, 5]) and the y-axis represents the RMSE values. In both figures, the sequence length of 50 reports the best RMSE values, for the GRU as well as for the BiLSTM.
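The majority vote over the best configurations can be reproduced with a few lines. The list below copies the best (sequence length, days shift) pairs stated in the paragraph above for the BTC-USD dataset. Voting on whole pairs is an assumption, since the text does not specify how the vote is taken.

```python
from collections import Counter

# Best (sequence_length, days_shift) per model and metric for BTC-USD,
# as stated in the text. Pair-wise voting is an assumption.
best_by_metric = [
    (50, 1),  # GRU, MSE (and RMSE)
    (50, 4),  # GRU, MAE
    (50, 4),  # GRU, MAPE
    (50, 1),  # BiLSTM, MSE (and RMSE)
    (50, 1),  # BiLSTM, MAE
    (50, 1),  # BiLSTM, MAPE
]

winner, votes = Counter(best_by_metric).most_common(1)[0]
# winner is (50, 1) with 4 of the 6 votes
```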

Table 5 The reported performance metrics for the “Bitcoin USD (BTC-USD)” dataset using the two models (i.e., GRU and BiLSTM)

Fig. 8 A graphical summarization of the reported results using BiLSTM and “Bitcoin USD (BTC-USD)” dataset

Fig. 9 A graphical summarization of the reported results using GRU and “Bitcoin USD (BTC-USD)” dataset

4.2 The “Ethereum USD (ETH-USD)” dataset experiments

Table 6 shows the reported performance metrics for the “Ethereum USD (ETH-USD)” dataset using the two models (i.e., GRU and BiLSTM). The minimum MSE, RMSE, MAE, MAPE, and R2 are 0.00071, 0.02662, 0.05350, 19.67071, and −0.48171, respectively, for the GRU and 0.00446, 0.06681, 0.05710, 0.23379, and −3.91045, respectively, for the BiLSTM. The maximum MSE, RMSE, MAE, MAPE, and R2 are 0.00411, 0.06414, 0.09458, 33.75066, and 0.73784, respectively, for the GRU and 0.01289, 0.11355, 0.10970, 0.40123, and 0.11536, respectively, for the BiLSTM. For the GRU model, the best sequence length and days shift are 50 and 1 with respect to the MSE (and RMSE), 50 and 5 with respect to the MAE, and 50 and 5 with respect to the MAPE. For the BiLSTM model, the best sequence length and days shift are 10 and 2 with respect to the MSE (and RMSE), 50 and 1 with respect to the MAE, and 50 and 1 with respect to the MAPE. From these, the majority-voted sequence length and days shift are 50 and 1, respectively. Figures 10 and 11 summarize the reported RMSE results graphically using BiLSTM and GRU, where the x-axis represents the days shift range (i.e., [1, 2, 3, 4, 5]) and the y-axis represents the RMSE values. They show that the GRU reports its best RMSE values with a sequence length of 50, while the BiLSTM reports its best RMSE values with a sequence length of 10.

Table 6 The reported performance metrics for the “Ethereum USD (ETH-USD)” dataset using the two models (i.e., GRU and BiLSTM)

Fig. 10 A graphical summarization of the reported results using BiLSTM and “Ethereum USD (ETH-USD)” dataset

Fig. 11 A graphical summarization of the reported results using GRU and “Ethereum USD (ETH-USD)” dataset

4.3 The “Cardano USD (ADA-USD)” dataset experiments

Table 7 shows the reported performance metrics for the “Cardano USD (ADA-USD)” dataset using the two models (i.e., GRU and BiLSTM). The minimum MSE, RMSE, MAE, MAPE, and R2 are 0.00007, 0.00852, 0.00971, 0.06256, and −2.59725, respectively, for the GRU and 0.00031, 0.01752, 0.01599, 0.10515, and −8.22629, respectively, for the BiLSTM. The maximum MSE, RMSE, MAE, MAPE, and R2 are 0.00068, 0.02604, 0.02152, 0.13393, and 0.24750, respectively, for the GRU and 0.00165, 0.04058, 0.03424, 0.21699, and −0.20519, respectively, for the BiLSTM. For the GRU model, the best sequence length and days shift are 50 and 1 with respect to the MSE (and RMSE), 50 and 4 with respect to the MAE, and 50 and 4 with respect to the MAPE. For the BiLSTM model, the best sequence length and days shift are 30 and 1 with respect to the MSE (and RMSE), 40 and 3 with respect to the MAE, and 40 and 3 with respect to the MAPE. From these, the majority-voted sequence length and days shift are 50 and 1, respectively. Figures 12 and 13 summarize the reported RMSE results graphically using BiLSTM and GRU, where the x-axis represents the days shift range (i.e., [1, 2, 3, 4, 5]) and the y-axis represents the RMSE values. They show that the GRU reports its best RMSE values with a sequence length of 50, while the BiLSTM reports its best RMSE values with sequence lengths of 30 and 40.

Table 7 The reported performance metrics for the “Cardano USD (ADA-USD)” dataset using the two models (i.e., GRU and BiLSTM)
Fig. 12 A graphical summarization of the reported results using BiLSTM and the “Cardano USD (ADA-USD)” dataset

Fig. 13 A graphical summarization of the reported results using GRU and the “Cardano USD (ADA-USD)” dataset

5 Limitations

One of the main limitations of this work is that each cryptocurrency was treated independently, neglecting its potential relations with other cryptocurrencies. Additionally, the non-stationarity and complexity of cryptocurrency time-series data are not considered. Furthermore, all the coins considered in the study have a high market capitalization; thus, their behavior differs from that of newly launched ICO coins with a low market capitalization. The proposed approach also cannot operate instantaneously, as the training of the models is the most time-consuming stage. Moreover, only two RNN architectures are used.

6 Conclusion and future work

A precise methodology for predicting cryptocurrency prices is critical in digital financial markets. Because forecasting capability varies from coin to coin, artificial intelligence and machine learning approaches are appealing. This study introduced a framework based on two types of deep learning algorithms (i.e., BiLSTM and GRU). They are utilized to predict the prices of three of the most famous cryptocurrencies (i.e., Bitcoin, Ethereum, and Cardano). The framework consists of five main phases. First, data is retrieved from a public real-time cryptocurrency source, “Yahoo Finance.” Second, the data is preprocessed to prepare the dataset for the next phase via filtering and feature squishing. Third, the BiLSTM and GRU models are trained and optimized. Fourth, performance is evaluated for each epoch. Finally, future prices are predicted. Evaluation metrics, namely MSE, RMSE, MAE, MAPE, and R2, are applied to test the accuracy of the models. For the “Bitcoin USD (BTC-USD)” dataset, the minimum MSE, RMSE, MAE, and MAPE are 0.00029, 0.01711, 0.02214, and 0.07036, respectively, for the GRU model and 0.00210, 0.04582, 0.03358, and 0.10942, respectively, for the BiLSTM model. For the “Ethereum USD (ETH-USD)” dataset, the minimum MSE, RMSE, MAE, and MAPE are 0.00071, 0.02662, 0.05350, and 19.67071, respectively, for the GRU model and 0.00446, 0.06681, 0.05710, and 0.23379, respectively, for the BiLSTM model. For the “Cardano USD (ADA-USD)” dataset, the minimum MSE, RMSE, MAE, and MAPE are 0.00007, 0.00852, 0.00971, and 0.06256, respectively, for the GRU model and 0.00031, 0.01752, 0.01599, and 0.10515, respectively, for the BiLSTM model. The results indicate that the GRU outperformed the BiLSTM for Bitcoin, Ethereum, and Cardano. The minimum R2 for the GRU model was found to be −2.89528, −0.48171, and −2.59725 for Bitcoin, Ethereum, and Cardano, respectively. Based on these results, the GRU model is more efficient and reliable in predicting the prices of cryptocurrencies than the BiLSTM, although both algorithms deliver excellent results.
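
To make the roles of the sequence length and the days shift concrete, the following minimal sketch shows one plausible way to turn a price series into supervised samples for a recurrent model. It assumes that the sequence length is the number of past days fed to the model and that the days shift is how many days after the end of the window the target lies; the text does not confirm this reading, and `make_windows` is a hypothetical helper rather than the authors' code.

```python
import numpy as np


def make_windows(series, seq_len, days_shift):
    """Build (X, y) pairs from a 1-D series.

    X[i] holds seq_len consecutive values; y[i] is the value days_shift
    steps after the last value in X[i] (assumed interpretation).
    """
    series = np.asarray(series, dtype=float)
    n = len(series) - seq_len - days_shift + 1
    if n <= 0:
        raise ValueError("series is too short for this seq_len/days_shift")

    X = np.stack([series[i:i + seq_len] for i in range(n)])
    first_target = seq_len + days_shift - 1
    y = series[first_target:first_target + n]

    # Add a trailing feature axis: (samples, time steps, features).
    return X[..., np.newaxis], y


# Toy series standing in for scaled closing prices.
prices = np.arange(10.0)
X, y = make_windows(prices, seq_len=3, days_shift=2)
print(X.shape, y)
```

The trailing feature axis follows the (samples, time steps, features) layout that recurrent layers in common deep learning libraries expect for a univariate series.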

In future work, other factors that affect the cryptocurrency market will be investigated. Autoencoder-based time-series neural networks will be applied to predict the time-series data. Moreover, other hyperparameter tuning algorithms, such as random search, metaheuristic optimization (e.g., the genetic algorithm and particle swarm optimization), and Bayesian optimization, can be applied instead of grid search. Additionally, the effect of social media on the price and trading volume of cryptocurrencies will be studied. To this end, sentiment analysis and natural language processing techniques will be used to analyze posts and tweets and extract insights.
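
As a minimal sketch of what replacing grid search with random search could look like, the example below samples configurations at random instead of enumerating every combination. Everything in it is an assumption made for illustration: the search space, the `random_search` helper, and in particular `toy_objective`, which merely stands in for training a model and returning its validation RMSE.

```python
import random

# Illustrative search space (assumed, not the grid used in the study).
SEARCH_SPACE = {
    "sequence_length": [10, 20, 30, 40, 50],
    "days_shift": [1, 2, 3, 4, 5],
}


def random_search(objective, space, n_trials, seed=0):
    """Sample n_trials random configurations and keep the lowest score."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("inf")
    for _ in range(n_trials):
        cfg = {name: rng.choice(values) for name, values in space.items()}
        score = objective(cfg)
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


def toy_objective(cfg):
    """Hypothetical stand-in for 'train the model, return validation RMSE'."""
    return abs(cfg["sequence_length"] - 50) / 100 + abs(cfg["days_shift"] - 1) / 10


best_cfg, best_score = random_search(toy_objective, SEARCH_SPACE, n_trials=10)
print(best_cfg, best_score)
```

With 10 trials the sketch evaluates fewer than half of the 25 combinations in this space, which is the main appeal of random search when each evaluation requires training a network.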