1 Introduction

In late 2019, the outbreak of the COVID-19 pandemic caused major disruptions, with heightened uncertainty and lockdowns that led to increased unemployment rates and a severe economic slowdown. Regarded as one of the most painful economic crises since the Great Depression (1929–1933), it resulted in massive financial and economic damage worldwide, with far-reaching consequences for most industries (Alshater et al., 2021; Banna et al., 2022). The global energy system witnessed a significant drop in energy consumption, with rising uncertainty about energy supply (Shamsudheen et al., 2022) and consumption patterns (Hallack & Weiss, 2020a, b; Olubusoye et al., 2021a, 2021b). Compulsory lockdowns significantly constrained human mobility, and manufacturing shutdowns damaged economic activity, leading to a contraction in energy demand. This unprecedented shock caused a dramatic drop in energy equity prices and severe market instability. Remarkably, the US energy market, one of the largest global energy consumers, was hit hard. For the first time in history, West Texas Intermediate (WTI) crude oil futures prices crashed from USD 85 per barrel on January 15, 2020, to a negative price of USD 40.32 per barrel on April 20, 2020. The volatility of crude oil and other energy commodities had devastating effects, because these commodities are considered the main production factors and leading strategic energy sources for the global economy (Tien & Hung, 2022). While the market slowly recovered, the Russian invasion of Ukraine changed it dramatically and resulted in a more volatile energy sector.

This economic turmoil raised investors’ concerns worldwide, and an approach to mitigate its harmful consequences became crucial. The present paper aims to examine three essential aspects of the US energy equity markets. First, we predict energy prices through uncertainty indices. Several studies have examined the connectedness between uncertainty indices and market volatility and found that uncertainty indices can strongly predict oil prices (Chen et al., 2021). Second, we compare conventional methods with Machine Learning (ML) methods, as many studies found that the latter outperform the former (Ghoddusi et al., 2019). Finally, we determine the most effective method for an Early Warning System (EWS) for the energy sector during uncertain economic conditions; for instance, Okur et al. (2020) stated that few studies have conducted a comparative examination of ML models during uncertain and extraordinary economic conditions.

We employ ML and conventional approaches to achieve the objective of this study. ML is especially effective for problems that cannot be addressed directly by an analytical solution, such as model matching or complex regression and classification tasks. ML models have gained popularity in many aspects of the energy business, inter alia energy prices. This is due to their better data processing and categorization, their proactive use of large pools of complex data, and their ability to handle overlapping dynamics associated with a high level of uncertainty. In this context, predicting energy prices requires more sophisticated models than conventional approaches. ML is a reliable approach as it makes it possible to focus on non-linear dependencies and interactions between predictors. It also improves forecasting accuracy using all possible input variables (Ghoddusi et al., 2019). Accurate prediction of highly volatile energy prices provides a reliable reference for controlling costs and grasping market trends and opportunities, as well as a scientific tool and evidence-based data for policymakers and market regulators (Lu et al., 2021). Accordingly, the integration of EWS in energy economics and finance becomes indispensable to achieve major goals like lowering operational expenses, securing energy sectors, and contemplating economic revival.

The contributions of this paper to the extant literature are manifold. First, this study is, to the best of our knowledge, the first to test the power of EWS to predict energy prices during the pandemic. Second, it evaluates the risk of uncertainty during the COVID-19 period in the USA by using uncertainty indices as independent variables (predictors) and energy equity indices as dependent variables (responses). The analysis was performed in the context of the USA since it is one of the largest global energy consumers and the related data is available. Third, it performs the predictability simulation by relying on conventional approaches (like regression analysis) and ML models during the pandemic to highlight ML superiority and provide accurate energy price forecasts. Fourth, it extends the ML literature by incorporating around 26 Artificial Intelligence (AI) predictive approaches and highlighting their effectiveness in uncovering key determinants of energy price predictability. Our results show that the Nonlinear Autoregressive with External (Exogenous) parameters (NARX) Neural Network (NN) achieved significantly better accuracy than all other ML models and conventional approaches.

The remainder of the paper is structured as follows. Section 2 synthesizes the related literature review. Section 3 describes the data and methodology, while the empirical results are presented in Sect. 4. Section 5 highlights our discussions and Sect. 6 concludes the analysis.

2 Literature review

In the United States, the energy system is built on a massive interconnected network that produces and distributes energy from a wide range of sources to keep pace with increasing demand. Industry dynamics are driven by a variety of complex, uncertain and diverse factors beyond supply and demand frameworks. Exogenous shocks such as the Middle East crises, the subprime mortgage financial crisis, natural disasters, economic uncertainty, and other extreme events such as the coronavirus pandemic influence energy stock prices and aggravate their volatility (Ftiti et al., 2020). Understanding the complex linkages and dependencies between energy prices and external factors is therefore crucial for investors and policy makers to develop and evaluate appropriate strategies and alternatives in times of crisis. A growing literature strand has sought accurate mechanisms for predicting stock price volatility, which has a significant impact on energy consumption. In doing so, many researchers employed conventional methods (regression-based models, e.g., ARIMA, GARCH) to predict volatility. Yet, in the past decade, another approach emerged based on machine learning (ML) and Artificial Intelligence (AI).

ML is a data science-driven model that can identify existing data patterns and enhance temporal effectiveness. The beginning of ML dates back to the 1950s and 1960s, when researchers' curiosity led them to mimic human learning through computers. Accordingly, insightful information can be extracted, which can subsequently be utilized for the projection and generation of new knowledge. Initially, ML implementation started to flourish in the economics and finance fields. One of its earliest uses in energy economics was to forecast electricity prices. In the 2000s, ample publications attempted to estimate power costs using traditional Artificial Neural Network (ANN) methodologies. For instance, Khosravi et al. (2013) employed the delta and bootstrap methodologies to generate power price prediction intervals (PIs) for parameter uncertainty. Papadimitriou et al. (2014) evaluated the predictive power of Support Vector Machines (SVM)-based forecast models to assess the next-day direction shift in power prices. Their findings indicate that SVM is a solid strategy for forecasting short-term power prices with a predictive efficiency of 76.12% across a 200-day timeframe.

Another literature strand combined ML with econometric models, and this combination gained popularity among scholars. Godarzi et al. (2014) created a dynamic Non-Linear Autoregressive model incorporating Exogenous inputs (NARX), similar to the Levenberg–Marquardt approach, using the Ensemble Empirical Mode Decomposition (EEMD) approach. Zhang et al. (2015) decomposed global crude oil prices into a variety of separate Intrinsic Mode Functions (IMFs) and residual terms. They also created a combined Least Square Support Vector Machines (LSSVM) and Particle Swarm Optimization (PSO) approach, and they applied the Generalized Autoregressive Conditional Heteroskedasticity (GARCH) model to anticipate the time-varying and non-linear components of crude oil prices. Earlier models frequently relied on an autoregressive framework. In contrast, the latest studies employed a hybrid strategy capable of managing a diverse collection of input factors such as demand, supply, and equity market indices (Chai et al., 2018). Dogah and Premaratne (2018) investigated the susceptibility of BRICS stock returns to changes in oil risk indicators. They integrated the Vector Autoregressive (VAR) model with a random forest approach to address some VAR shortcomings and tackle oil-risk issues.

In this regard, it is essential to set out the benefits of sophisticated ML approaches and the sources of their superiority over traditional economic models. The scientific community reported a marginally significant increase in out-of-sample forecast accuracy and performance assessment measures. To illustrate the disparities between actual and anticipated values, a standard key metric, the Root Mean Square Error (RMSE), was used. A low RMSE indicates better predictive power. In the same vein, Collado and Creamer (2016) anticipated natural gas prices using an approximate dynamic programming technique that combines a time series method (ARIMA) with two machine learning algorithms (Support Vector Machine and Random Forests). This technique surpasses logistic regression, which was treated as a baseline. Debnath and Mourshed (2018) explored prediction models for Energy Planning Models (EPMs) and discovered that the Artificial Neural Network (ANN) is the most used approach. Athey (2017) and Mullainathan and Spiess (2017) presented a concise, non-technical summary and evaluation of machine learning's economic applicability. Economic research has recently attached great importance to exploring ML capabilities and limits in testing hypotheses and causation, especially in the wake of the COVID-19 pandemic.

However, a more sophisticated variant of ANNs, the Deep Learning (DL) models, is still rarely used in predicting crude oil prices. Zhao et al. (2017) were an exception, applying the DL algorithm Stacked Denoising Autoencoders (SDAE) with bootstrap aggregation (bagging). The bagging process created a large number of data sets for training a series of SDAE base models. Tang et al. (2015) presented an ensemble paradigm that combined Extended Extreme Learning Machine (EELM) and Complementary Ensemble Empirical Mode Decomposition (CEEMD) to improve the accuracy of petroleum price predictions. The findings revealed that the model is a promising prediction tool for complex time series data with significant instability and irregularities. Zhu et al. (2016) proposed an Adaptive Multiscale Ensemble Learning (AMEL) paradigm that integrated LSSVM, PSO, and EEMD with a kernel function prototype.

Other studies include Dudek (2016) who proposed a probabilistic power price forecasting strategy based on a Feed-Forward Neural Network (FFNN) precluding the pre-processing of extra data. Panapakidis and Dagoumas (2016) studied predicting models that include ANN-based prices for the day ahead. To anticipate energy prices, Wang et al. (2017) proposed a two-layer decomposition strategy before building a hybrid model built on Fast Ensemble Empirical Mode Decomposition (FEEMD), Variational Mode Decomposition (VMD), and BPNN and enhanced with a swarm intelligence-based approach. The model outperformed the one-step and multi-step power price estimates. To address the limitations of the classic ANN model, Singh et al. (2017) used a modified neuron model to anticipate the short-term power pricing of the energy market in Australia.

Table 1 presents the previous studies in the field and reports their contextual methodological use.

Table 1 Machine Learning use in energy economics and finance

3 Data and methodology

3.1 Data

This study uses daily data from June 1, 2011 to January 18, 2022. The sample contains 2677 observations, and the data were collected from Thomson Reuters DataStream and the policy uncertainty website. To assess the predictive capability of ML models, we split the sample into two periods: the pre-pandemic period from June 1, 2011 to December 31, 2019 and the COVID-19 pandemic period from January 1, 2020 to January 18, 2022. We identify three critical dates related to COVID-19 events. The first date was January 27, 2020, when the World Health Organization (WHO) described COVID-19 as a global threat based on assessed risk worldwide. The second date was February 24, 2020, marking two critical events: (1) a significant increase in COVID-19 cases outside mainland China and (2) the collapse of the global market. The third date was September 3, 2020, when a complete global lockdown was put in place. The indices are presented in Table 2, along with their descriptive statistics. The data is separated into Panels A and B. Panel A includes the energy equity indices, the dependent variables (responses). Panel B contains the economic uncertainty indices, the independent variables (predictors). Precisely, Panel A consists of the US Renewable Energy Equity index, US Oil, Gas, and Coal Equity index, US Pipelines Equity index, US Oil Equity & Services index, US Oil & Gas Refining & Marketing Equity index, US Oil: Crude Production Equity index, US International Oil & Gas index, and the main US Energy index. Panel B consists of the Twitter Economic Uncertainty (TEU-USA) index, the EMV Infections Uncertainty (EMV) index, the Economic Policy Uncertainty (EPU) index, and the Chicago Board Options Exchange (CBOE) Market Volatility Index (VIX). The description and sources of the variables are provided in Appendix 1.
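The two-period split can be illustrated with a minimal Python sketch. The function name `split_by_period`, the constant `PANDEMIC_START`, and the `(date, value)` row layout are assumptions introduced here for illustration only; the cutoff of January 1, 2020 is the one stated in the text.

```python
from datetime import date

# First day of the COVID-19 sample period, as stated in the text.
PANDEMIC_START = date(2020, 1, 1)


def split_by_period(rows):
    """Split (date, value) rows into pre-pandemic and pandemic lists.

    Rows dated before PANDEMIC_START go to the first list; all other
    rows go to the second. Input order is preserved.
    """
    pre_pandemic = [row for row in rows if row[0] < PANDEMIC_START]
    pandemic = [row for row in rows if row[0] >= PANDEMIC_START]
    return pre_pandemic, pandemic
```

Keeping the cutoff in a single named constant makes it easy to rerun the same comparison with a different break date.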

Table 2 Descriptive Statistics

In Panel A, almost all indices are skewed: the mean lies below the median for left-skewed indices and above it for right-skewed ones. Based on standard deviation, the US Oil & Gas Refining & Marketing Equity index has the highest volatility, while the US Renewable Energy Equity index and the US Pipelines Equity index have the lowest. In almost all cases, the kurtosis is less than 3, which means the indices have lighter tails than a normal distribution. Furthermore, most indices are negatively skewed, which implies that large negative deviations from the mean occur more frequently than large positive ones. In Panel B, TEU-USA and the EPU index are the most volatile, while the VIX is the least. All indices in Panel B are right-skewed (positive skewness) and have kurtosis higher than three, indicating heavier tails than a normal distribution. Furthermore, the Augmented Dickey-Fuller (ADF) test does not reject the null hypothesis of a unit root for any index in levels; the more negative the ADF statistic, the stronger the rejection of the hypothesis that there is a unit root at a given level of confidence. However, after computing the differences between consecutive observations, the series become stationary. This method is known as first differencing (1-lag). The Phillips–Perron (PP) test confirms the ADF unit root results.
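The descriptive measures discussed above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure used to build Table 2: the function names are hypothetical, and the moment-based (population) estimators of skewness and kurtosis are an assumption, since the text does not say which estimator was used.

```python
def first_difference(series):
    """Return the 1-lag differences x[t] - x[t-1] of a series."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


def _central_moment(values, order):
    """Population central moment of the given order."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Moment-based skewness: negative means a longer left tail."""
    return _central_moment(values, 3) / _central_moment(values, 2) ** 1.5


def kurtosis(values):
    """Moment-based kurtosis: a normal distribution has a value of 3."""
    return _central_moment(values, 4) / _central_moment(values, 2) ** 2
```

A kurtosis below 3 from this function corresponds to the "lighter tails" reading in the text, and a value above 3 to heavier tails. A formal unit root test such as ADF requires a statistics library and is not reproduced here.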

3.2 Methodology

This study uses ML and traditional approaches (i.e., regression) to forecast the prices of energy equity indices. The Root Mean Square Error (RMSE) is used to assess each model's performance and predictive accuracy. In Table 1 above, we gathered previous studies and listed their respective methodologies in chronological order. Notably, previous studies have used one or two ML approaches in combination with conventional models (such as regression) to predict the results. In the present paper, we combine almost all ML models available in the literature with conventional models, including the Multiple Linear Regression model (MLR), to predict energy prices before and during the COVID-19 pandemic. The latter period is characterized by a high level of uncertainty. It thus serves to assess whether economic uncertainty indices can be helpful in measuring the pandemic's impact in energy economics and finance. As previously mentioned, the demand for energy fluctuated dramatically during the pandemic, making it difficult to predict the actual supply needs. Within this framework, the MLR is the most suitable conventional method for relating predictors to responses, as it closely fits our model structure and the set of dependent and independent variables (Aertsen et al., 2010; Raju & Laxmi, 2020).

The present study employed the following approach to predict the prices of the energy sectors:

To proxy energy prices by sector, we use the energy equity sector indices listed in Panel A of Table 2. First, we estimate the energy sectors using a simple Multiple Linear Regression model (MLR) and check its performance. Second, we evaluate the results of supervised regression-based Machine Learning models and compare them to MLR to determine which strategy is more effective in forecasting energy prices. We run both sets of models for both periods with the same features to cross-check the models’ performance in the pre-pandemic period (in-sample) and during the uncertain COVID-19 period (out-of-sample).
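To make the MLR baseline concrete, the following is a minimal sketch of ordinary least squares with an intercept, solved through the normal equations. It is an illustration rather than the estimation code behind Table 3: the function names are hypothetical, the pure-Python solver is chosen only to keep the example self-contained, and in practice a statistics package would normally be used.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Work on an augmented copy so the inputs are left untouched.
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(aug[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (aug[i][n] - known) / aug[i][i]
    return solution


def fit_mlr(features, targets):
    """Return [intercept, b1, ..., bk] minimizing the squared error."""
    design = [[1.0] + list(row) for row in features]  # prepend intercept column
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict_mlr(coefs, features):
    """Apply fitted coefficients to new feature rows."""
    return [coefs[0] + sum(b * x for b, x in zip(coefs[1:], row)) for row in features]
```

In the setting of this study, each feature row would hold the four uncertainty indices for one day and each target the corresponding energy equity index value.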

We follow an exhaustive application of ML methods previously used in the literature (ML Generalized Linear Model (GLM), Support Vector Regression (SVR), Gaussian Process Regression (GPR), Regression Trees, Ensemble Methods, and Neural Networks). The last method belongs to the Deep Learning family and specifically uses the Levenberg–Marquardt algorithm for non-linear least squares models to train networks with dynamic time series. The accuracy of the model is calculated with the Root Mean Square Error (RMSE), which in its standard form is:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, and $n$ is the number of observations.
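As a sketch, the standard RMSE (the square root of the mean squared difference between actual and predicted values) can be computed as follows. The function name `rmse` and the list-based inputs are illustrative assumptions.

```python
import math


def rmse(actual, predicted):
    """Root Mean Square Error between two equally long sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Because RMSE is expressed in the units of the dependent variable, values are directly comparable across models fitted to the same index but not across indices with different price levels.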


4 Empirical results

This paper extends the literature by combining various ML models and comparing their efficiency with conventional methods in order to propose the most suitable technique for predicting volatility in times of crisis. Table 3 presents the results of the Multiple Linear Regression analysis before and during COVID-19. In the pre-pandemic period, R-squared is low in all cases. However, during COVID-19 most of the variables are statistically significant, which means that the uncertainty indices are suitable and can be used to predict energy equity prices. The F-statistic is significant in all cases, while R-squared varies from 0.44 to 0.67. Not surprisingly, the uncertainty indices are negatively correlated with energy prices. Lastly, the RMSE looks satisfactory in all cases except the US Oil & Gas Refining & Marketing Equity index, for which it is high in both the pre-pandemic and COVID-19 periods, at 13,798 and 6,534, respectively. Finally, the RMSE is lower during the COVID-19 pandemic, as increased uncertainty explains the co-movements of energy prices and uncertainty indices.

Table 3 Multiple Linear Regression analysis results

The intercept parameter is positive and statistically significant in all cases. The intercept is the expected value of the dependent variable when all independent variables equal zero. As the intercept estimates are relatively high, significant factors unrelated to the uncertainty indices appear to affect energy prices. Moreover, the VIX index is negatively correlated in all estimations, but it is not statistically significant for US Oil Equity & Services and US Oil & Gas Refining & Marketing in the pre-pandemic period. During the pandemic, it is not statistically significant for the US Renewable Energy index. The EMV Infections Uncertainty index is not statistically significant except for the US Pipelines Equity index (pre-pandemic) and the US Renewable Energy index (during the pandemic). The US Economic Policy Uncertainty and US Twitter Economic Uncertainty indices are statistically significant in almost all cases, with remarkably small coefficient values. Both indices have the highest impact on predicting energy prices in our model.

In Figs. 1 and 2, we illustrate the actual and predicted prices along with the residuals based on the MLR analysis. During the COVID-19 pandemic, prices became very volatile compared to the previous period, especially after the WHO announcement about the global spread of COVID-19; since then, prices have remained unstable, with dramatic decreases and extreme volatility. The MLR model clearly struggles to capture price movements over that period (from January 2020 onwards). In addition, as noted above for Table 3, the residuals are less volatile during the COVID-19 pandemic. This makes sense since increased uncertainty during the pandemic explains the co-movements of energy prices and uncertainty indices. Lastly, the residuals of the US Oil & Gas Refining & Marketing Equity index are highly volatile, as indicated by the high RMSE value.

Fig. 1 Actual and predicted prices with residuals from the MLR model: pre-pandemic period. Notes: The blue line shows the actual prices and the orange line the predictions from the MLR model. The name of each energy sector category appears in the upper right corner of every subfigure. The depicted data is non-stationary

Fig. 2 Actual and predicted prices with residuals from the MLR model: COVID-19 pandemic period

In Tables 4 and 5, we outline the training outcomes obtained from the ML models before and during the pandemic. Notably, ML models outperform the MLR approach: the RMSE of most ML models is lower than that of MLR, and R-squared is higher under ML models. Figures 3 and 4 show the rankings of the predictions based on RMSE for all the models employed in this study. The Neural Network (NN) with the Levenberg–Marquardt algorithm clearly achieves the best result of all models. The Gaussian Process Regression models come second, while Regression Trees and SVMs come third and fourth. The Multiple Linear Regression model is always in last position. In summary, ML models outperform conventional econometric approaches such as MLR.

Table 4 Machine learning training results—pre pandemic period (1/2)
Table 5 Machine learning training results—COVID-19 pandemic period (2/2)
Fig. 3 Rankings of RMSE for all forecasting models: pre-pandemic period

Fig. 4 Rankings of RMSE for all forecasting models: COVID-19 pandemic period

It is worth highlighting the outstanding forecasting performance of the NN with the Levenberg–Marquardt algorithm. As shown in Figs. 3 and 4, the NN has a far lower RMSE than any other model (machine learning and MLR). The key to this performance lies in the NN structure. Figure 5 shows the structure of the NN with the Levenberg–Marquardt algorithm. The NN works as a Nonlinear Autoregressive model with Exogenous parameters. Specifically, the y(t) input is the actual price of the equity index, while the four exogenous input parameters are the uncertainty indices. In the model, we use ten hidden neurons and two delays (after optimization). The input is an (observations) × 4 matrix representing dynamic data: (observations) timesteps of four elements (the uncertainty indices). The target is an (observations) × 1 matrix representing dynamic data: (observations) timesteps of one element (the series to be predicted).
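The way a NARX model consumes lagged data can be illustrated by building its input rows explicitly. The sketch below assumes the common NARX form y(t) = f(y(t-1), ..., y(t-d), x(t-1), ..., x(t-d)), with the same number of delays d applied to the target and to the exogenous inputs; the function name `build_narx_rows` is hypothetical, and the network itself (hidden layer and Levenberg–Marquardt training) is not reproduced.

```python
def build_narx_rows(y, exog, delays=2):
    """Pair each target y[t] with lagged targets and lagged exogenous inputs.

    y     : list of T target values (e.g., an equity index price series)
    exog  : list of T rows, each holding k exogenous values for one timestep
    delays: number of past timesteps fed to the model

    Each input row is [y[t-1], ..., y[t-d], exog[t-1]..., ..., exog[t-d]...].
    The first `delays` timesteps are dropped because they lack a full history.
    """
    if len(y) != len(exog):
        raise ValueError("y and exog must cover the same timesteps")
    inputs, targets = [], []
    for t in range(delays, len(y)):
        row = [y[t - lag] for lag in range(1, delays + 1)]
        for lag in range(1, delays + 1):
            row.extend(exog[t - lag])
        inputs.append(row)
        targets.append(y[t])
    return inputs, targets
```

With four uncertainty indices and two delays, each input row under this assumption has 2 + 4 × 2 = 10 elements, which a hidden layer would then map to the one-step-ahead prediction.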

Fig. 5 Structure of the Neural Network with Levenberg–Marquardt. Notes: We use ten hidden neurons and two delays. The upper left corner shows the 4 exogenous parameters (the uncertainty indices), while the bottom left shows the energy prices, forming a Nonlinear Autoregressive with External (Exogenous) parameters (NARX) model

The network is created and trained in open-loop mode, as illustrated in Fig. 5. Open-loop (single-step) training is more efficient than closed-loop (multi-step) training. The open loop allows us to supply the network with the correct past outputs while training it to produce the correct current outputs. Following training, the network may be converted into closed-loop form or any other form required by the application. The timesteps are divided into three sets: training, validation, and testing. During training, the network is presented with the training timesteps and adjusted according to its error. Validation timesteps are used to measure network generalization and to halt training when generalization stops improving. Because testing timesteps have no effect on training, they provide an independent measure of network performance both during and after training. Prediction using these NN models is a type of dynamic filtering in which past values of one or more time series are used to forecast future values. Non-linear filtering and prediction are performed using dynamic neural networks with tapped delay lines. Figures 6 and 7 present the time series response of the trained NN with the Levenberg–Marquardt algorithm. The residuals (verified through RMSE) are less volatile than those of any other model employed in this study. The model tracks the actual prices closely and persistently during the pandemic. Lastly, regarding the validity of the models, Fig. 5 presents the Mean Square Errors (MSE) of the most accurate model in the study, the Levenberg–Marquardt algorithm approach. The literature (e.g., Abedin et al., 2021; D’Ecclesia & Clementi, 2021; Yazici et al., 2020) considers a model well trained only when its errors on the training and test sets are very similar. If the RMSE of the test set is much higher than that of the training set, the model is likely overfitted. In our study, the Levenberg–Marquardt algorithm approach reaches its best validation performance between 11 and 20 epochs, which is generally very low and supports the high accuracy of the model.
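The validation-based stopping rule and the overfitting check described above can be sketched as follows. Both functions are illustrative assumptions: the names are hypothetical, the `patience` of 6 consecutive non-improving epochs and the `tolerance` ratio of 1.5 are arbitrary placeholder values, and the text does not report the settings actually used.

```python
def best_validation_epoch(validation_errors, patience=6):
    """Return (epoch, error) of the best validation result before stopping.

    Training is considered halted once the validation error has failed to
    improve for `patience` consecutive epochs.
    """
    best_epoch, best_error, failures = 0, float("inf"), 0
    for epoch, error in enumerate(validation_errors):
        if error < best_error:
            best_epoch, best_error, failures = epoch, error, 0
        else:
            failures += 1
            if failures >= patience:
                break
    return best_epoch, best_error


def looks_overfitted(train_rmse, test_rmse, tolerance=1.5):
    """Flag a model whose test RMSE is much higher than its training RMSE."""
    return test_rmse > tolerance * train_rmse
```

A best epoch that arrives early, together with similar training and test errors, is the pattern the text associates with a well-trained model.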

Fig. 6 Time series response of the trained NN with the Levenberg–Marquardt algorithm: pre-pandemic period

Fig. 7 Time series response of the trained NN with the Levenberg–Marquardt algorithm: COVID-19 pandemic period

Overall, the Artificial Intelligence and Machine Learning models allow us to make more accurate predictions than the traditional Multiple Linear Regression model. The Neural Network is the most accurate of the Machine Learning models. This can be explained by its flexibility to use autoregression simultaneously with exogenous input parameters, which makes the predicted response highly accurate and minimizes RMSE relative to the rest of the ML models.

5 Discussions

ML models have grown in popularity in many aspects of the energy industry due to their superior performance in processing, categorizing, and projecting complex and large-scale data. Comparing ML's features with classic econometric models (e.g., MLR) validates ML's rising appeal in energy economics analyses. The advantage of ML approaches lies in the ability of algorithms to handle massive volumes of structured and unstructured information while making quick decisions or forecasts. In this context, many studies relied on ML techniques, and some of them reached conflicting results. Some scholars have applied ML to predicting solar radiation (Voyant et al., 2017), renewable energy integration (Perera et al., 2014), and calculating customers' electric power use (Zemene and Khedkar, 2017). Weron (2014) explored ways of forecasting power prices and devoted a significant part of the text to ML methodologies under the title Computational Intelligence (CI). Debnath and Mourshed (2018) investigated forecasting models for Energy Planning Models (EPMs) and found that the Artificial Neural Network (ANN) is the most often utilized forecasting technique. Athey (2017) and Mullainathan and Spiess (2017) provided a non-technical overview and assessment of ML's economic/econometric applications.

Moreover, some methodologies combine ML and econometric models. For instance, Godarzi et al. (2014) developed a dynamic Non-Linear Autoregressive model with Exogenous inputs (NARX), similar to what we achieved using the Levenberg–Marquardt approach. Remarkably, most previous models employed an autoregressive framework, while recent studies used a hybrid strategy capable of managing a diverse collection of input factors such as demand, supply, and equity market indices (Chai et al., 2018). Khosravi et al. (2013) created power price Prediction Intervals (PIs) for uncertainty quantification through the use of the delta and bootstrap approaches, while Papadimitriou et al. (2014) investigated the efficacy of SVM-based forecasting models for predicting the next-day directional change in electricity prices. The literature has reported a marginally significant increase in out-of-sample predictive ability. Previous studies have relied on common performance indicators such as the Root Mean Square Error (RMSE) to illustrate the disparities between actual and anticipated values. However, no study has conducted a thorough comparison and ranking of ML models by accuracy as done in the present paper.

ML models have several advantages in dealing with data, processing, and analysis. First, ML, and DL models in particular, can manage a large and diverse set of inputs, sparing the model's designer the burden of selecting a minimal number of appropriate input variables. Second, ML techniques can accept dozens of potential input variables and choose the right components (or features) for prediction regardless of collinearity problems. Third, ML algorithms are frequently capable of handling a wide range of data, both qualitative and quantitative. This feature is particularly useful in the energy industry since it can combine text from articles or publications with time-series data to enhance projections. Fourth, ML models can uncover complicated linkages and investigate various topologies for potential links between input and output data. Bayesian Model Averaging (BMA) methods make it possible to run a model with a specific data set in the space of classical models. Yet this technique requires the modeler to specify each model's structure, while ML models do not need a predefined structure. Fifth, ML models go beyond linear correlations to discover complicated, non-linear, and high-dimensional interactions among many input factors and intended outputs. Sixth, ML models are less sensitive to data quality. Traditional economic models have long grappled with data difficulties (e.g., scarce data or data sets that include outliers). Some methods (for example, Fuzzy and GNN models) were successfully developed within the ML field to allow the system to work with lower-quality input. For instance, Alobaidi et al. (2018) provided guidance on ensemble models' capacity to deliver enhanced forecast performance with limited inputs. Seventh, ML models can function with little data pre-processing. Seasonal fluctuations, structural breaks, regime shifts, unit roots, and heteroskedasticity are all recognized properties of energy time-series data that should be tackled prior to implementing an econometric model. ML models do not require extensive data pre-processing since they may incorporate extra data features. Naturally, when features are customized to the demands of each projection via suitable ratios or kernel tweaks, ML algorithm performance improves. In a nutshell, most ML approaches include advanced features to capture the traits critical for the eventual prediction models.

Moreover, ML’s primary focus here is forecasting market values. Many ML experts outside economics fail to recognize the core difference between predicting traded asset prices and predicting physical events. Predicting the temperature does not affect climate behavior, yet predicting crude oil and equity prices could immediately impact current prices and stimulate trading activity. In this regard, each ML algorithm that attempts to forecast prices creates an externality for other algorithms. Each effective forecasting model makes the task of the following algorithm more difficult and increases the chances of re-using underused approaches. A broad ML research topic is theory-driven ML, which combines theory and technique. For example, Gu et al. (2020) investigated several methods and found them superior in capturing commodity price changes, especially in risk premia evaluation. The recent advances in Deep Learning approaches are effectively transforming the whole ML domain. Due to their multi-layer structure, DL techniques allow the algorithms to handle a substantially greater number of inputs in a very consistent manner without requiring any prior feature specification. On the other hand, ML approaches often require a high volume of input data, which demands additional effort and time.

The organizational and financial implications of machine learning and artificial intelligence in the energy business and other economic areas are constantly affected by ML innovations. The protracted impact on the energy industry's architecture has to be observed and analyzed, as ML might replace or supplement human talents. In the replacement scenario, automation will increase and replace human interventions, while in the supplementary role ML is expected to improve personnel's abilities. As a result, an industry reorganization, with new industry structures and actors, is expected. Accordingly, future research should investigate and assess the influence of ML on critical aspects such as energy efficiency and cost, smart networks, the effectiveness of energy markets and exchanges, and the workforce in business during the pandemic. Given environmental disruption, the pervasiveness of sustainable power, and the spread of smart grids, machine learning and artificial intelligence (ML/AI) could be utilized to augment forecasting abilities and integrate volatile renewable energy resources. This would contribute to balancing energy grids and understanding trends related to needs and demands.

Finally, since large energy commodity markets are very efficient, the advantages of price projection based on advanced methodologies are still modest. Such benefits may shrink further when giant firms adopt the technology, persuading the industry of its effectiveness while using latent data to reach equilibrium prices, which is effectively a serious hurdle. Hence, ML becomes more useful for forecasting and anticipating market dangers: paired with network models, it can depict the dispersion of shocks on equilibrium variables and help optimize risk management strategies.

6 Conclusions

In the present study, we introduced an Early Warning System (EWS) to forecast energy equity prices and performed a comparative analysis of ML and conventional regression models. We relied on equity indices since the valuation of such assets is tied to expected corporate profits, which are assumed to positively affect investors' demand. We gathered daily data from 1/6/2011 to 18/1/2022 to assess the predictability of the models before and during the COVID-19 pandemic. We used indices from renewable energy, oil, gas, coal, pipelines, and gas refining to proxy energy prices. We investigated the impact of economic uncertainty indices (Twitter Economic Uncertainty (TEU-USA); Infections Uncertainty (EMV); Economic Policy Uncertainty (EPU) index; and CBOE Market Volatility Index (VIX)) on the prediction of energy prices, using RMSE as the measure of performance and accuracy.
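To make the performance criterion concrete, the following is a minimal sketch of how RMSE ranks two competing sets of forecasts. The function name `rmse` and all numbers are hypothetical illustrations and are not drawn from the study's data or code; a lower RMSE indicates forecasts that lie closer to the observed values.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("inputs must be non-empty")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical index levels and two hypothetical sets of forecasts
# (illustrative values only, not results from the paper).
actual = [100.0, 102.0, 101.0, 105.0]
forecast_a = [101.0, 101.0, 102.0, 104.0]  # every forecast is off by 1
forecast_b = [103.0, 99.0, 104.0, 102.0]   # every forecast is off by 3

rmse_a = rmse(actual, forecast_a)
rmse_b = rmse(actual, forecast_b)

# The model with the smaller RMSE is preferred under this criterion.
better = "A" if rmse_a < rmse_b else "B"
```

Because the errors are squared before averaging, RMSE penalizes large misses more heavily than small ones, which matters when the aim is to flag sharp price moves.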

We applied 25 ML approaches, including NNs, SVRs, Regression Trees, and GPR models. We found that ML models outperformed the MLR approach in all cases. The Nonlinear Autoregressive model with External (Exogenous) parameters (NARX) was superior, as it significantly improved accuracy. The simultaneous use of the Levenberg–Marquardt algorithm and autoregression with exogenous input parameters yielded a highly accurate response and minimized RMSE relative to other NN machine learning models. In conclusion, Artificial Intelligence (AI) and Machine Learning (ML) models allowed more accurate predictions than traditional Multiple Linear Regression (MLR) models. Neural Networks appeared to be the superior model.
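The input structure of a NARX-type model can be sketched as follows, assuming the common form in which the target y(t) is explained by its own past values and by past values of an exogenous series x. The sketch only builds the lagged input rows; it does not train a neural network or apply the Levenberg–Marquardt algorithm. The function name `narx_design`, the lag orders, the single exogenous series, and all numbers are hypothetical simplifications rather than the configuration used in the study.

```python
def narx_design(y, x, y_lags, x_lags):
    """Build (inputs, targets) pairs for a NARX-type model.

    Each input row is [y[t-1], ..., y[t-y_lags], x[t-1], ..., x[t-x_lags]]
    and the matching target is y[t].
    """
    if len(y) != len(x):
        raise ValueError("y and x must have the same length")
    if y_lags < 1 or x_lags < 1:
        raise ValueError("lag orders must be at least 1")
    start = max(y_lags, x_lags)  # first t with all required lags available
    rows, targets = [], []
    for t in range(start, len(y)):
        past_y = [y[t - k] for k in range(1, y_lags + 1)]
        past_x = [x[t - k] for k in range(1, x_lags + 1)]
        rows.append(past_y + past_x)
        targets.append(y[t])
    return rows, targets


# Hypothetical energy index levels and a hypothetical uncertainty index
# (illustrative values only, not data from the paper).
prices = [10.0, 11.0, 12.0, 13.0, 14.0]
uncertainty = [0.5, 0.4, 0.6, 0.7, 0.3]

rows, targets = narx_design(prices, uncertainty, y_lags=2, x_lags=1)
```

Any nonlinear regressor can then be fitted on `rows` and `targets`; with several uncertainty indices, the lagged values of each index would be appended to every row in the same way.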

Although the study produced important findings, it has some limitations. First, the study was limited to the US context, though it would be beneficial to perform it at the global level. This is due to the availability of US data on energy equity prices and uncertainty indices. Second, high-frequency data cannot be easily accessed. In this regard, real-time data analysis might yield valuable insights and important conclusions. Most databases provide high-frequency data for only three months, which restricted our analysis to daily frequency and ruled out high-frequency analysis over a more extended period (two years or more), which could lead to valuable results.

We recommend expanding our suggested framework and the Early Warning System (EWS) to incorporate other sectors. While large energy commodity markets are exceptionally efficient, price projection models based on advanced methodologies might lead to undesirable outcomes, as some powerful companies might benefit from this innovation and induce more effectiveness through improved integration of latent data into equilibrium prices. This can be considered a genuine leap, based on sophisticated ML forecasts, in the spectrum of energy economics. Machine learning appears valuable in determining and anticipating market threats when integrated with network architectures. It might improve the efficacy of risk management systems by accurately capturing the dispersion of disturbances on equilibrium parameters.