1 Introduction

Wind power forecasting predicts the potential electricity output of wind farms to support scheduling plans. The instability and fluctuation of wind power make it challenging to integrate into the grid [1, 2], so accurate predictions are required for stable grid operation [3,4,5]. Accurate wind power forecasting is therefore crucial for wind farm operation and grid stability.

Wind power forecasting methods can be categorized on the basis of time scales: short-term, medium-term, and long-term predictions. Short-term wind power forecasting covers the next few hours and is critical for real-time wind power scheduling [6,7,8]. Medium-term wind power forecasting covers the next week and is primarily used for unit mix and standby arrangements [9,10,11]. Long-term wind power forecasting generally covers months or quarters and is essential for wind resource assessment and maintenance arrangements [12,13,14,15]. The main purpose of all forecasting methods is to improve the accuracy of wind power prediction as much as possible. Traditional wind power forecasting methods are physical and statistical methods [16, 17], whereas intelligent methods comprise machine learning and deep learning approaches. Physical models attempt to estimate wind speed time series while considering the physical characteristics of the environment [18], utilizing numerical weather prediction (NWP) data to calculate actual wind power generation [19]. However, physical methods suffer from drawbacks such as high computational costs and difficulty in obtaining NWP data from meteorological stations. Statistical models seek relationships among historical data parameters to predict future wind speed and wind power [20]. These methods include autoregressive (AR) [21], vector autoregression (VAR) [22], autoregressive moving average (ARMA) [23], autoregressive integrated moving average (ARIMA) [24], fractional autoregressive integrated moving average (FARIMA) [25], and models based on Kalman filtering [26]. However, statistical methods cannot accurately handle the highly volatile noise and nonlinear input features of wind power time series data, and they often lead to significant errors in wind power predictions [27].
Artificial neural network (ANN) forecasting method can provide more information about the uncertainty of wind power prediction [28], with results generated by the output layer [29]. ANN learns the relationship between wind power input and output by training with historical wind power values [30]. Extreme learning machine (ELM) [31] and backpropagation (BP) neural network [32] are the most commonly used ANNs in wind power forecasting. However, due to the inherent randomness of initial network parameters such as weights, thresholds, and smoothing factors, it is challenging to obtain accurate and reliable predictions from ANN. Deep learning approaches generally achieve higher forecasting accuracy but require higher computational resources. Machine learning approaches are simpler to scale up. Intelligent approaches can improve energy effectiveness, reduce energy waste, and make instantaneous decisions in the wind energy sector [33,34,35,36]. Intelligent approaches have great potential in wind power forecasting. In addition, large language models (LLMs) are mainly designed for natural language processing and have the ability to capture long-term dependencies in sequences. Due to the sequential nature of textual and temporal data, recent research has extended LLMs to handle time series and spatiotemporal tasks [37].

Table 1 All abbreviations used in the article

Several researchers have provided reviews of wind power forecasting techniques over time. Colak et al. [5] provided a brief overview of data mining and wind power forecasting to identify the top forecasting models for wind power at various time frames. The article found that ANN is suitable and efficient for short-term wind power forecasting, while multilayer perceptrons are effective for medium to long-term wind power forecasting. Wang et al. [38] provided a thorough overview of artificial intelligence algorithms in wind farms, summarizing the strengths and limitations of ANN, support vector machine (SVM), and seagull optimization algorithm (SOA) for wind farm applications. The article concluded that numerous applications for wind farms may be created using artificial intelligence algorithms. Hanifi et al. [39] provided a systematic classification of wind power forecasting methods, which includes physical, statistical, and hybrid methods. The article also reviews factors that affect the accuracy and computation time of predictive modeling efforts. These three review articles reflect the application and development of ANN in wind power forecasting research over the last decade. However, they only address a single forecasting model and do not consider the latest processing and optimization techniques in hybrid models. Additionally, visualizing the historical development of the wind power forecasting problem in a single discussion can be challenging.

Over time, the quantity of published works on wind power forecasting has increased, with more and more researchers joining the field. To comprehensively summarize the literature and identify evidence on challenges, researchers often use a systematic approach known as a systematic review. In wind power forecasting, researchers have attempted to review the research area from various perspectives. For instance, Lu et al. [40] gave a general overview of how meta-heuristic algorithms are used in wind power forecasting and the difficulties they encounter. The article’s major focus is on using meta-heuristic techniques to optimize important ANN parameters, such as initial weights and thresholds and smoothing factors, to enhance the accuracy of wind power predictions. Qian et al. [41] conducted a review and discussion of decomposition algorithms used in hybrid models, focusing on wavelet transform (WT) and empirical modal decomposition (EMD). This article also examines techniques employed in recent years to improve decomposition-based models’ performance from three perspectives: feature selection, parameter optimization, and predictability analysis and filtering. Bokde et al. [42] focused on the advantages, timely growth, and future possibilities of hybrid EMD or ensemble empirical mode decomposition (EEMD) models for wind power forecasting. Wang et al. [43] presented a thorough analysis of the various deep learning methods applied to the prediction of wind speed and power, including the many steps of data processing, feature extraction, and relationship learning. The article identifies three challenges to accurate wind speed/power predictions under complex circumstances, namely data uncertainty, incomplete features, and intricate nonlinear relationships. These four articles each focus on a specific aspect of wind power forecasting, such as parameter optimization or data decomposition preprocessing. 
However, they do not provide a comprehensive overview of the wind power forecasting framework, and as such, cannot fully illustrate current trends in wind power forecasting research.

The aim of this survey is to address the aforementioned gap in the literature. Unlike previous review articles, this article focuses on reviewing the wind power forecasting problem based on existing literature, establishing a systematic knowledge map of research in this area, and identifying the relevant data and findings. Additionally, the survey visualizes the reported results, which has not been done in earlier work. Moreover, a complete forecasting framework involves three essential phases: data pre-processing, model selection, and model optimization. This paper provides a comprehensive overview of the intelligent approaches highlighted by keyword analysis from these perspectives, in the context of the complete forecasting process.

This paper focuses on handling the following research questions:

(1) What has been the progress in wind power forecasting research in the last two decades based on the collected literature, with regard to countries, institutions, journals, and authors? Answering this question aims to present a comprehensive understanding of the historical development of the wind power forecasting problem from multiple perspectives, such as time, technology, and research institutions.

(2) How are intelligent approaches and other emerging technologies (non-autoregressive and LLMs) being applied in current research, given their potential in wind power forecasting? This question aims to analyze and evaluate the methods represented by hot co-occurring and bursting keywords in the literature from multiple perspectives, and explore the development trend of wind power forecasting incorporating these approaches.

(3) What are the challenges and opportunities for intelligent approaches in wind power forecasting research, based on the current state of research? By synthesizing the knowledge map of the previous literature in this research area and the current state of research on intelligent approaches, answering this question can inform and open up ideas for future developments.

This work provides a comprehensive review of the application of intelligent approaches, specifically machine learning and deep learning, in wind power forecasting. Sections 2, 3 and 4 address the research questions from different perspectives, providing valuable insights into the current state of research. Compared to previous studies, the contributions of this paper can be summarized in three key aspects:

    • A novel production of a knowledge map of the literature on wind power forecasting research, including a scientometric analysis and visual presentation of the development status and trends.

    • A systematic analysis of the application of intelligent approaches, which are the most important trends in research, from the perspectives of data pre-processing and parameter optimization.

    • A detailed analysis of the challenges and future directions for research in wind power forecasting incorporating emerging technologies such as attention mechanisms, non-autoregressive, and LLMs, providing valuable insights for future developments in this field.

The paper is structured as follows: Sect. 2 presents the findings of the CiteSpace-based systematic knowledge mapping and scientometric statistical analysis. It includes a detailed explanation of co-occurring and bursting keywords analysis and highlights highly cited reference authors and journals. In Sect. 3, machine learning approaches, deep learning approaches, attention approaches, and other emerging technologies (non-autoregressive and LLMs) are further analyzed based on the keyword hotspots identified in Sect. 2. Section 4 discusses future trends and recommendations related to wind power forecasting research. Finally, the paper is concluded in Sect. 5. Table 1 gives some abbreviations used throughout the paper.

2 Knowledge map analysis

In this section, we address research question 1 by analyzing and visualizing the knowledge map of wind power forecasting literature in the Web of Science (WOS) database over the last 20 years. We compare the cooperation networks and development channels and examine the number of publications, frequency of citations, publications in journals, and influence of authors and institutions. We also study changes in co-occurring and bursting keywords to reveal shifting research trends.

To guarantee the veracity and credibility of our data, we sourced information from the WOS core database and searched for literature from the last two decades using subject terms such as “wind power prediction/forecasting” and “time series.” We searched and screened the data, obtaining 222 articles for visualization and analysis.

2.1 Temporal analysis of literature production

The trend in the number of papers published in a particular research area is a reflection of the level of attention given to that field. In this study, we analyzed 222 pieces of literature and obtained a line graph depicting the number of papers published on wind power forecasting from 2000 to 2022. As shown in Fig. 1, the frequency of citations and the number of published articles have been increasing from 2006 to 2022. The field experienced a phase of slow growth followed by rapid growth, indicating that scholars have continued to prioritize research in this area. According to the statistics of the top three countries in terms of publication volume shown in Fig. 2 from 2006 to 2022, it is not difficult to see that scholars paid less attention to wind power forecasting research before 2014, with a relatively stable and low publication volume. During the period when stable generation sources like thermal, hydropower, and nuclear power were dominant, wind farms were relatively uncommon, which meant that accurate forecasting was not critical due to their low impact on grid power volatility.

From 2014 to 2017, there was a transitional phase in the global wind power development process, with the number of publications increasing steadily, particularly in research related to wind power forecasting. Following 2017, the global growth of installed wind power started to accelerate, leading to a rapid increase in both the number of publications and the frequency of citations. Furthermore, from 2020 to 2022, the number of publications in China surged rapidly. Figure 1 reveals that the change in citation frequency and literature production follows the same trend over time, with 1214, 1444, and 1976 citations in 2020, 2021, and 2022, respectively.

Fig. 1 Number of publications and citations of research on wind power forecasting from 2006 to 2022

Fig. 2 Number of articles published by the top three countries

2.2 Analysis of journals and authors’ publications and influence

The distribution of articles across journals is an important indicator of the research direction and impact. Table 2 reveals that Energies journal has the highest number of articles and the fifth highest citation frequency. Meanwhile, IEEE Transactions on Power Systems has a total of 1584 citations and an average of 153.41 citations, suggesting that the content of articles in these two journals aligns with the focus of the journal, and these articles are highly regarded. The high citation rate indicates a growing interest in wind power forecasting research, with topics ranging from energy policy to other relevant areas. The increasing intensity of development and construction also indicates support for relevant policies.

Regarding the top 10 authors with the most articles published on wind power forecasting research from 2000 to 2022 in Table 3, Pierre Pinson had the most published articles and highest citations, with 10 and 778, respectively. P.K. Dash had the third-highest number of published articles and the second-highest citations, with 378.

Table 2 Top ten journals in the number of publications on wind power forecasting
Table 3 Top ten authors in the number of publications on wind power forecasting

2.3 Analysis of country and institution cooperation networks

The country collaboration network in Fig. 3 shows that China, the USA, Denmark, Iran, and Australia have larger nodes, indicating that they have more publications and occupy a leading position in wind power forecasting research. Among them, China (125), the USA (21), and Denmark (16) have significantly more publications than other countries and are ranked in the top three. Table 4 shows that these countries, China, the USA, and Denmark, are also ranked in the top three regarding the frequency of citations of their publications.

Taking into account the 310.629 GW of cumulative onshore wind power installed in China by the end of 2021, which exceeds that of the USA (134.354 GW) and Germany (56.814 GW), as presented in Table 5, the continuous growth of wind power installations serves as a significant driver for research enthusiasm and the level of research conducted in the field. Furthermore, Fig. 3 shows 48 nodes, 82 connections, and a connection strength of 0.0727, which indicates that a collaborative network for inter-country wind power forecasting research has begun to form.
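The reported connection strength can be checked against the node and link counts. The sketch below assumes that "connection strength" is the usual density of an undirected network, 2E / (N(N - 1)); the text does not state this definition, but the reported values are consistent with it. The function name `network_density` is illustrative.

```python
def network_density(nodes: int, links: int) -> float:
    """Density of an undirected simple graph: actual links / possible links."""
    if nodes < 2:
        raise ValueError("density needs at least two nodes")
    return 2 * links / (nodes * (nodes - 1))


# Country network in Fig. 3: 48 nodes, 82 links.
print(round(network_density(48, 82), 4))
# Institution network in Fig. 4: 207 nodes, 182 links.
print(round(network_density(207, 182), 4))
```

Under this assumption the country network reproduces the reported 0.0727 and the institution network the reported 0.0085, so both networks realize only a small fraction of their possible links.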

Fig. 3 Visualizing map of national cooperation network

Table 4 Top ten countries in the number of publications on wind power forecasting
Table 5 Top ten countries regarding the installed wind power in 2021

The significance of a node in the field can be measured by its betweenness centrality. China and Denmark have high betweenness centrality, indicating their central role in wind power forecasting research and their position at the forefront of research development. Although the USA has the second-highest number of publications (21), its betweenness centrality is only 0.08, suggesting that it conducts research largely independently and does not have close collaborations with other countries.

Figure 4 presents a collaboration map of highly productive institutions in wind power forecasting research. Node circles in the figure represent the number of publications of the institution, while the number of connecting lines indicates the degree of collaboration between institutions. It is evident from the figure that the circles of all nodes are relatively small, implying that each institution has a limited number of publications. However, some institutions have larger nodes, indicating that they are high-producing institutions. Furthermore, the higher number of links between high-producing institutions compared to low-producing institutions suggests that high-producing institutions possess significant scientific research and professional authority. This allows them to attract cooperation and academic exchange from other institutions.

Figure 4 displays 207 nodes, 182 links, and a connection strength of 0.0085, indicating the initial formation of a collaborative network among research institutions. On the other hand, Table 6 reveals that Tech Univ Denmark holds the first position with 16 publications and 886 total citations. Notably, four out of the top five institutions in terms of publications are from China, indicating a significant contribution to the field of research from Chinese institutions.

Fig. 4 Visualizing map of publishing institution cooperation network

Table 6 Top ten institutions in the number of publications on wind power forecasting

2.4 Analysis of co-occurring keywords

Figure 5 displays a co-occurring keywords map of the wind power forecasting field, which consists of 329 nodes. Each node’s size reflects how frequently it appears in the network. Among the top three keywords, “wind power forecasting,” “neural network,” and “time series” occur more than 40 times. The frequency of “neural network” and “time series” is relatively high, with 55 and 43 occurrences, respectively, indicating that neural network methods are the most common methods applied to time series data for wind power forecasting. After 2011, the keywords “empirical mode decomposition,” “wavelet transform,” “variational mode decomposition,” and optimization algorithms continue to appear, showing that research in wind power forecasting no longer relies solely on intelligent approaches. Instead, it focuses on improving forecasting models through data pre-processing and optimization to enhance prediction accuracy.

Fig. 5 Visualizing map of co-occurring keywords network

Fig. 6 Proportion of various algorithms in machine learning

In recent years, data-driven artificial intelligence technologies have developed rapidly, with machine learning transitioning from shallow to deep learning. Wind power forecasting models based on artificial intelligence have emerged as a hot topic. Many studies have shown that artificial intelligence-based models exhibit superior forecasting performance compared to traditional statistical models. These models hold the potential to overcome the technical bottlenecks of wind power forecasting and achieve significant improvements in forecasting accuracy. We conducted statistical analysis on various machine learning algorithms used in the literature, and the results are shown in Fig. 6. Artificial intelligence-based wind power forecasting models include traditional machine learning models such as traditional ANN, SVM, and ELM, among others. With the advancement of artificial intelligence, an increasing number of deep learning models, such as long short-term memory (LSTM), convolutional neural networks (CNN), auto-encoders, gated recurrent units (GRU), are being adopted, proving highly effective in addressing complex nonlinear problems.

2.5 Analysis of bursting keywords

Bursting keywords refer to keywords that experience a sudden surge in usage frequency within a specific timeframe, indicating the emergence of hotspots in research during that period. By analyzing the keywords used in wind power forecasting from 2000 to 2022, 20 burst keywords were identified as representing the current forefront of research, as shown in Fig. 7.

Fig. 7 Top 20 keywords with the strongest citation bursts

In the early years, wind energy and the market were the main factors considered in wind power forecasting research, including the linear variation of wind speed and temperature in different regions. After 2010, the time series approach became the dominant method. In 2016, the terms “artificial neural network” (particularly “extreme learning machine”) and “support vector machine” gained attention as methods capable of fitting nonlinear relationships in wind power variation and achieving good forecasting results. In recent years, “deep learning” and “feature selection” have emerged as prominent keywords, with deep learning approaches widely developed and applied to wind power forecasting. Additionally, feature selection has been shown to further improve forecasting accuracy.

3 A review of hot spots in wind power forecasting research

In response to research question 2, we will delve into the research hotspots involved in co-occurring and bursting keywords analysis in Sects. 2.4 and 2.5, namely the intelligent approaches represented by machine learning and deep learning.

3.1 Machine learning approaches based on data pre-processing

Time series data are typically high-dimensional and data-intensive, and using them directly for input modeling can greatly affect the accuracy of predictions [44]. Therefore, data pre-processing is an essential step before machine learning modeling. Researchers have employed various techniques to further analyze time series data for enhancing forecasting accuracy.

According to [45], improved results in wind power forecasting can be obtained by decomposing the original time series data and reconstructing the forecasting model for the sub-series data. They suggested a combined wind power forecasting model using reconstructed data and dynamic adaptive weights, which performs well in short-term forecasting and requires a high degree of adaptability in long-term forecasting. Reconstruction of time series data by chaotic phase space not only achieves dimensionality reduction but also considers the chaotic characteristics in time series data of wind power. Wang et al. [46] also applied the reconstruction of time series data by chaotic phase space and analyzed typical chaotic time series forecasting models accordingly, and constructed a hybrid model using a regime-switching optimal mechanism. This forecasting model outperformed traditional artificial intelligence algorithms, with an average improvement of 11.91% for mean absolute error (MAE) and 35.38% for root mean square error (RMSE).
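Phase space reconstruction is commonly carried out by time-delay embedding. The following is a minimal sketch of that idea, not the procedure used in [45] or [46]; the function name `delay_embed` and the choice of embedding dimension `dim` and delay `tau` are assumptions made for illustration.

```python
import numpy as np


def delay_embed(series, dim, tau):
    """Build delay vectors [x[t], x[t + tau], ..., x[t + (dim - 1) * tau]].

    Returns an array of shape (n_vectors, dim), one delay vector per row.
    """
    x = np.asarray(series, dtype=float)
    n_vectors = x.size - (dim - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series too short for this dim and tau")
    # Column i is the series shifted by i * tau samples.
    return np.stack([x[i * tau: i * tau + n_vectors] for i in range(dim)], axis=1)


# Toy series standing in for wind power measurements.
embedded = delay_embed(np.arange(10), dim=3, tau=2)
print(embedded.shape)
```

Each row can then serve as the input vector of a forecasting model, with the next value of the series as the target. In practice `dim` and `tau` are chosen from the data rather than fixed by hand.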

The interaction between wind energy and physical weather conditions is complex, resulting in inherent stochastic fluctuations, nonlinearity, and non-stationarity in wind power data. As a result, predicting wind power with high precision is challenging. To improve forecasting accuracy, many researchers have utilized techniques such as WT [47, 48] or EMD to decompose the time series into multiple levels and then combine them for predictions. Decomposing wind power data by frequency has been shown to enhance forecasting accuracy. In [49], the authors proposed the use of wavelet decomposition (WD) to decompose the original wind power time series data into a number of sub-signals with improved behavior and contours. EEMD, as used in [50], is suitable for non-stationary and nonlinear signal analysis. By adding white noise to the signal, EEMD improves decomposition accuracy compared to EMD and addresses the mode mixing problem of the EMD algorithm [51]. EEMD has been utilized to decompose chaotic wind power time series into intrinsic mode functions with different characteristic scales, resulting in improved forecasting accuracy. The approach has been examined and evaluated using real wind power data from China [50].

To improve the analysis of wind power data, Zhang et al. [52] reconstructed non-stationary wind power time series through complete ensemble EMD combined with Lempel-Ziv complexity, obtaining stable and mutually independent subsequences. Variational mode decomposition (VMD), employed in [53, 54], is used to process the original time series signal, preventing interaction between different modes. Meanwhile, Wu et al. [55] adopted a mean trend detector (MTD) to decompose the time series into a mean trend component and a stochastic component. These two components have a clear physical meaning and can be interpreted as the trend and high-frequency fluctuations in the variability of the time series, respectively. By using this method, a more accurate and stable combined forecast is obtained.
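The split into a trend component and a stochastic component can be illustrated with a simple stand-in. The sketch below uses a centered moving average as the trend; this is an assumption for illustration only and is not the mean trend detector of [55], whose definition is not given here. The name `split_trend` and the `window` parameter are hypothetical.

```python
import numpy as np


def split_trend(series, window):
    """Split a series into a moving-average trend and a residual component."""
    x = np.asarray(series, dtype=float)
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    # Repeat the edge values so the trend has the same length as the input.
    padded = np.pad(x, half, mode="edge")
    kernel = np.ones(window) / window
    trend = np.convolve(padded, kernel, mode="valid")
    return trend, x - trend


# Synthetic series: a slow ramp plus a fast oscillation.
t = np.arange(200)
power = 0.05 * t + np.sin(0.8 * t)
trend, stochastic = split_trend(power, window=9)
print(trend.shape, stochastic.shape)
```

The two components sum back to the original series, so separate models can forecast each one and their outputs can be added to form the combined forecast.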

Based on a synthesis of the 222 publications, the use of data pre-processing techniques in wind power forecasting studies has increased significantly from 2012 to 2022, as shown in Fig. 8. The number of published studies has increased steadily, particularly in 2021 and 2022. Figure 9 illustrates the proportion of data pre-processing-based models among all forecasting models in studies published from 2018 to 2022. It indicates that the share of wind power forecasting models based on data pre-processing exceeded 50% in 2022, a 20% increase from 2018.

Fig. 8 Number of models based on data preprocessing in 222 publications

Fig. 9 Proportion of models based on data preprocessing in the total

3.2 Machine learning approaches based on parameter optimization

Parameter optimization is a crucial step in selecting an optimal set of parameters for machine learning algorithms. The process of parameter selection and optimization strategies can enhance forecasting accuracy and decrease computing time by efficiently training the model [56].

ELM, which consists of an input layer, a hidden layer, and an output layer, is the most popular ANN used in wind power forecasting according to [40]. Although ELM uses initialization parameters to construct the model, it suffers from the disadvantage of redundant nodes in the hidden layer, which hinders the accuracy of the forecasting model to some extent. Zhang et al. [52] utilized ant lion optimization (ALO) to optimize ELM’s parameters. They applied this method to wind power data collected from a wind farm in East China from March 2015 to May 2015. Their findings demonstrate that the suggested model performs well and is especially strong for 48-h advance wind power predictions. Meanwhile, Ding et al. [54] presented an improved ELM forecasting model using the gray wolf optimization (GWO) algorithm. The ELM forecasting model is applied to the sub-time series obtained from the VMD decomposition and optimized with the GWO algorithm. The forecasting results showed that the suggested VMD-improved-GWO-ELM forecasting model performs better on the forecasting curves compared to ELM and BP models. Zhao et al. [57] proposed a backward forecasting model based on ELM with a bidirectional mechanism, which corrects prediction errors by comparing the prediction results in both directions. Taken separately, the forward and backward prediction results both have significant errors at some outliers, and they do not perform consistently due to the sensitivity of the optimization. The bidirectional approach produced beneficial outcomes in medium-term wind power forecasting ranging from 1 to 6 h. The bidirectional mechanism can also be extended to other algorithmic models such as SVM and gray models to improve their performance.
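The basic ELM training step can be sketched as follows: the hidden-layer weights are drawn at random and left fixed, and only the output weights are solved by least squares. This is a minimal illustration under those assumptions, not the optimized models of [52], [54], or [57]; the class name `ELMRegressor`, the tanh activation, and the hidden-layer size are choices made for the example.

```python
import numpy as np


class ELMRegressor:
    """Single-hidden-layer network with random, untrained hidden weights."""

    def __init__(self, n_hidden=20, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Hidden-layer output matrix H for inputs X.
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        # Random input weights and biases: these are the initialization
        # parameters that meta-heuristics such as ALO or GWO would tune.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # Output weights from the Moore-Penrose pseudoinverse of H.
        self.beta = np.linalg.pinv(H) @ np.asarray(y, dtype=float)
        return self

    def predict(self, X):
        return self._hidden(np.asarray(X, dtype=float)) @ self.beta


# Toy regression problem standing in for a wind power mapping.
X = np.linspace(-1, 1, 50).reshape(-1, 1)
y = np.sin(3 * X[:, 0])
model = ELMRegressor(n_hidden=30, seed=0).fit(X, y)
print(float(np.mean((model.predict(X) - y) ** 2)))
```

Because the hidden weights are never trained, the fit depends on the random draw and on the number of hidden nodes, which is why the reviewed studies pair ELM with an optimization algorithm.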

SVM is based on statistical learning theory, as proposed by [58]. SVM is not only useful for high-dimensional identification but also for forecasting in nonlinear and small-sample scenarios. By mapping wind power time series to a high-dimensional feature space through nonlinear mapping, SVM can be used in the regression problem of wind power forecasting and achieve good results [59]. For high-dimensional data spaces like wind power, the inner product function of SVM is usually replaced by a kernel function. In Wang et al. [60], a hybrid PSO-SVM-ARMA model was suggested to enhance forecasting performance by applying a Gaussian radial basis function (RBF) kernel and optimizing the parameters with the particle swarm optimization (PSO) algorithm. To address the problem of computational complexity in SVM, the least squares support vector machine (LSSVM) was suggested [61], which uses squared errors and equality constraints to transform the quadratic programming problem into a system of linear equations, lowering the amount of computation needed for model learning. An error-corrected LSSVM model was also proposed in [62] for wind power forecasting, which considers the error correction process of the model. The proposed method reduces the average absolute error by 52% compared with a single LSSVM forecasting model. In the studies by Wang et al. [50] and Li et al. [51], the LSSVM parameters are optimized using the improved sparrow search algorithm (SSA) during model construction. This optimization method leads to improved forecasting accuracy and faster convergence speed in the 24-h forecast. Meanwhile, Lu et al. [56] employed a new heuristic optimization algorithm, namely the gravitational search algorithm (GSA), to optimize the hyperparameters of LSSVM. The resulting wind power forecasting model exhibits better regression forecasting accuracy when compared to other benchmark models in the validation of the Hebei province wind power dataset.
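The reduction of LSSVM training to a linear system can be made concrete with a short sketch. It assumes the standard LSSVM regression formulation with an RBF kernel, in which the bias and the coefficients are obtained from one linear solve; the function names and the values of the regularization parameter `gamma` and kernel width `sigma` are illustrative and are not taken from [61] or [62].

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """Gaussian RBF kernel matrix between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2 * sigma**2))


def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    solution = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return solution[0], solution[1:]  # bias b, coefficients alpha


def lssvm_predict(X_train, bias, alpha, X_new, sigma=1.0):
    K = rbf_kernel(np.asarray(X_new, dtype=float), np.asarray(X_train, dtype=float), sigma)
    return K @ alpha + bias


# Toy data standing in for a wind power regression task.
X = np.linspace(0, 1, 30).reshape(-1, 1)
y = np.sin(2 * np.pi * X[:, 0])
bias, alpha = lssvm_fit(X, y, gamma=1e3, sigma=0.2)
pred = lssvm_predict(X, bias, alpha, X, sigma=0.2)
print(float(np.max(np.abs(pred - y))))
```

The two hyperparameters `gamma` and `sigma` are the quantities that search algorithms such as SSA or GSA would tune in the reviewed studies. Here they are simply fixed by hand.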

3.3 Deep learning approaches

In practical applications, traditional machine learning models are usually used for short-term wind power forecasting under ideal conditions. However, in more complex conditions or for long-term forecasting, the mean absolute percentage error (MAPE) can reach between 10% and 17%, which may not satisfy the engineering specifications needed for typical renewable energy projects [63].
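For reference, the error measures quoted in this section (MAE, RMSE, and MAPE) can be computed as in the sketch below. The definitions are the standard ones; the function names and the sample values are illustrative and are not taken from any of the reviewed studies.

```python
import numpy as np


def mae(actual, predicted):
    """Mean absolute error."""
    a, p = np.asarray(actual, dtype=float), np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(a - p)))


def rmse(actual, predicted):
    """Root mean square error."""
    a, p = np.asarray(actual, dtype=float), np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((a - p) ** 2)))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Requires nonzero actual values."""
    a, p = np.asarray(actual, dtype=float), np.asarray(predicted, dtype=float)
    if np.any(a == 0):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return float(np.mean(np.abs((a - p) / a)) * 100)


# Made-up power values in kW.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
print(mae(actual, predicted), rmse(actual, predicted), mape(actual, predicted))
```

MAPE divides by the actual value, so it is undefined whenever a turbine produces zero output and becomes very large near zero output. This should be kept in mind when comparing MAPE values across datasets.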

To address this issue, deep learning has emerged as a powerful tool with its automatic feature extraction and complex nonlinear relationship fitting capabilities, which have been widely applied in renewable energy exploitation, especially in wind power forecasting [43]. Deep learning models utilize a multilayer convolutional neural architecture with a gradient descent algorithm to minimize estimation errors and are applicable in a wide range of wind power forecasting scenarios [63]. Table 7 lists the ten most frequently cited articles and models using deep learning methods in the reviewed literature. In Yu et al. [64], CNN was employed to simulate and predict spatiotemporal processes of wind farms. The mean square error (MSE) of wind power forecasting results of this method is decreased by 49.83% compared with existing methods, and the model’s training time is reduced by more than 150 times on average. In Shao et al. [65], the proposed model integrated the benefits of feature selection and recurrent neural network (RNN) by using sorted features as input data to obtain the forecasting results through deep CNN. Validated on the dataset of the National Renewable Energy Laboratory (NREL), the accuracy of wind power forecasting is more than 10% higher than that of traditional methods. For wind power forecasting, Lin et al. [63] developed a feature-based learning model and applied a multilayer convolutional architecture, the temporal convolutional network (TCN), with a gradient descent algorithm to minimize the estimation model error.

LSTM is an enhanced architecture over the traditional RNN that overcomes the latter’s limitations on long-term dependency problems and can retain dependency information over long time spans [66,67,68]. The addition of a special hidden unit called a memory cell helps to mitigate the vanishing gradient problem and leads to a significant reduction in time series forecasting errors. In Lopez et al. [69], LSTM was combined with an echo state network (ESN), resulting in improved forecasting results compared to traditional models. Furthermore, Yu et al. [70] proposed an optimized LSTM with an enhanced forget gate for wind power forecasting. The model filters the data by clustering and uses the correlation among turbine data to enhance wind power feature extraction. Devi et al. [71] employed data pre-processing and algorithmic hyperparameter optimization to improve the performance of the LSTM network structure. The resulting LSTM model, which combined EEMD and the cuckoo search algorithm, exhibited superior forecasting accuracy compared to BP, SVM, and the original LSTM when tested on the given time series data. In another study, Neshat et al. [72] used a hybrid optimization approach that combined adaptive differential evolution techniques with sine cosine optimization to tune the LSTM’s hyperparameters. The resulting hybrid model not only improved forecasting accuracy but also reduced prediction time. Meanwhile, Liu et al. [73] proposed a simple forecasting method for wind power using the discrete wavelet transform (DWT) and LSTM. Ewees et al. [67] proposed an LSTM-based method that divides wind time series data into sub-series and uses the heap-based optimizer (HBO), a meta-heuristic optimization algorithm, to train the LSTM. The comparison with existing models indicates that HBO can enhance the predictive performance of the underlying LSTM model.
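
To make the role of the memory cell concrete, the following is a minimal sketch of one step of a single-unit LSTM in plain Python. It follows the standard LSTM gate equations, not the specific variants of [69] to [73]; the function names, the scalar weights in `params`, and the input values are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """Advance a single-unit LSTM by one time step.

    params maps each gate name ("f", "i", "o", "g") to a tuple
    (input weight, recurrent weight, bias); all values are scalars.
    """
    def pre(gate):
        w_x, w_h, b = params[gate]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("f"))    # forget gate: share of the old memory to keep
    i = sigmoid(pre("i"))    # input gate: share of the candidate to write
    o = sigmoid(pre("o"))    # output gate: share of the memory to expose
    g = math.tanh(pre("g"))  # candidate memory content
    c = f * c_prev + i * g   # additive memory-cell update
    h = o * math.tanh(c)     # new hidden state
    return h, c


# Hypothetical weights: forget gate biased open, input gate biased shut.
params = {
    "f": (0.0, 0.0, 10.0),
    "i": (0.0, 0.0, -10.0),
    "o": (0.0, 0.0, 0.0),
    "g": (1.0, 0.0, 0.0),
}

h, c = 0.0, 1.0
for x in [0.3, -0.8, 0.5, 0.1]:  # made-up normalized wind power readings
    h, c = lstm_step(x, h, c, params)
```

With the forget gate held close to 1, the cell state `c` stays near its initial value across all steps. The update is additive rather than a repeated multiplication through a squashing nonlinearity, which is why information (and gradient) can survive over long spans.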

As the depth of a deep neural network (DNN) increases, its performance usually improves. However, when the network size exceeds a certain threshold, overfitting may occur, which can degrade the overall performance of the DNN [68, 74, 75]. To address this issue, Schuster et al. [76] proposed bidirectional long short-term memory (BiLSTM), which trains on the sequence in both directions to improve sequence learning performance [77]. In Liu et al. [78], a BiLSTM model was found to outperform traditional LSTM networks in forecasting accuracy. The simulation results showed that the BiLSTM model improved forecasting accuracy by 10.25\(\%\), 6.71\(\%\), and 12.18\(\%\) over the LSTM model.

An extensive analysis and synthesis of 222 papers focused on deep learning approaches in wind power forecasting revealed that 44 papers used LSTM and its improved variants, making it the most commonly used method, followed by CNN. As shown in Fig. 10, LSTM has shown remarkable results in wind power forecasting.

Fig. 10 The proportion of various algorithms in deep learning

Table 7 Top ten articles in cited frequency on wind power forecasting based on deep learning

In addition, the main characteristics of time series, i.e., seasonality or periodicity and trend patterns [83], can be efficiently handled by Transformers. Transformers exhibit strong modeling capability for long-distance dependencies and interactions in time series data, thus having high application potential in long-term forecasting [84]. In the research of [85, 86], incorporating sequence periodicity or frequency processing into the time series Transformer can significantly improve performance.

To handle long-distance information extraction and storage in sequence-to-sequence tasks, researchers developed the Transformer model [87]. It overcomes the serial computation of RNN-based models by using parallel matrix operations and achieves high performance in machine translation. Zheng et al. [88] combined the Transformer with VMD and validated the forecasting model on two real datasets from Chinese wind farms. Wang et al. [89] compared three improved encoder–decoder architectures for multi-step wind power forecasting; the Transformer performed best in forecasting accuracy and training efficiency. On wind power data from 12 regions in China, the Transformer model reduced RMSE by 21.8\(\%\) and 16.3\(\%\) in 9-step and 3-step predictions, respectively, compared with existing models. Xiao et al. [90] established a hybrid wind power probability density forecasting method that combines a Transformer network with expected regression and kernel density estimation. The proposed method outperforms the mainstream RNN model and captures the uncertainty in wind power prediction more accurately. Huang et al. [91] combined deep learning with the Transformer and used its network structure to achieve more effective feature extraction from time series data. Compared with the LSTM forecasting model, both the accuracy and the efficiency of the forecasts improved. We have collected the latest literature and summarized recent Transformer-based improvements in Table 8.

Table 8 Recent articles on Transformer and improvements for time series forecasting
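
The mechanism that lets a Transformer relate distant time steps directly is scaled dot-product self-attention. The sketch below implements it in plain Python for a toy sequence. It omits the learned query, key, and value projections, multiple heads, and positional encoding; the function names and the input values are hypothetical.

```python
import math


def softmax(scores):
    """Normalize a list of scores into weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Scaled dot-product attention over a sequence of vectors.

    Every output position is a weighted average of all value vectors,
    so distant time steps are reachable in a single step.
    """
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in keys]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(w_j * v[d] for w_j, v in zip(w, values))
                        for d in range(len(values[0]))])
    return outputs, weights


# Made-up 4-step sequence with 2 features per step (e.g. power, wind speed).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
outputs, weights = self_attention(x, x, x)  # learned Q/K/V projections omitted
```

The loop over queries is written out for readability. Each iteration is independent of the others, so in practice the whole computation is expressed as a few matrix products, which is the parallelism that the RNN recurrence does not allow.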

3.4 Other emerging technologies

3.4.1 Non-autoregressive models

When discussing generative models in machine learning, we often think of autoregressive models that forecast from time series, such as the ARMA model [101] in load forecasting. These models generate sequence data incrementally. Their advantage is that the generated data are typically well structured and syntactically coherent. A drawback is slower generation, because each element can be generated only after the previous one. In contrast, non-autoregressive generative models do not generate data sequentially; they produce the entire sequence simultaneously, typically by parallelizing all generation steps, which makes generation faster.
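
The difference between the two generation styles can be sketched with a toy linear model. In the example below, `autoregressive_forecast` feeds each prediction back as input, while `direct_forecast` computes every horizon from the same observed window. Both functions, the AR(1) coefficient, and the data are hypothetical illustrations and are not taken from any of the cited models.

```python
def autoregressive_forecast(history, coeffs, steps):
    """Roll an AR(p) model forward one step at a time.

    coeffs[0] multiplies the most recent value, coeffs[1] the one before it,
    and so on. Each prediction is fed back as an input to the next step.
    """
    window = list(history[-len(coeffs):])
    forecasts = []
    for _ in range(steps):
        nxt = sum(c * v for c, v in zip(coeffs, reversed(window)))
        forecasts.append(nxt)
        window = window[1:] + [nxt]  # slide the window over the new prediction
    return forecasts


def direct_forecast(history, horizon_weights):
    """Produce every horizon from the same observed window.

    horizon_weights[h] holds the weights for horizon h + 1. No output
    depends on another output, so the horizons could be computed in parallel.
    """
    p = len(horizon_weights[0])
    window = list(history[-p:])
    return [sum(w * v for w, v in zip(weights, reversed(window)))
            for weights in horizon_weights]


history = [0.42, 0.47, 0.55, 0.60]   # made-up normalized wind power values
a = 0.9                              # hypothetical AR(1) coefficient
recursive = autoregressive_forecast(history, [a], steps=3)
direct = direct_forecast(history, [[a ** (h + 1)] for h in range(3)])
```

For this AR(1) case the two approaches give the same three values, because the direct weights were set to the powers of `a`. The point of the sketch is the dependency structure: the recursive version needs three sequential steps, whereas the direct version evaluates all horizons independently.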

Non-autoregressive models for wind power forecasting are based on multivariate modeling. Because the wind process is influenced by various atmospheric factors, additional variables may provide additional information for wind power forecasting [45]. Peng et al. [102] used a combination of historical wind power data and features extracted from NWP through encoders and attention layers as composite inputs. The NWP data comprise 24 wind farm features, including temperature, momentum flux, and wind direction. Jency et al. [103] employed a homogeneous point mutual feature selection model to design robust wind power forecasting, using input wind turbine data (low-pressure active power, wind speed at turbine hub height, theoretical power curve, wind direction) and related features. Ouyang et al. [104] proposed a novel combined multivariate model, with a case study employing nine variables, including wind power generation and temperature, to enhance wind power forecasting performance. Table 9 summarizes the input data variables used in non-autoregressive models for wind power forecasting based on recent publications.

Table 9 Non-autoregressive models and their input data variables for wind power forecasting

3.4.2 Large language models

Recently, in the field of deep learning, large language models (LLMs) have developed rapidly, bringing changes and opportunities to various disciplines. LLMs are considered to be one of the key technologies for future general artificial intelligence. Initially developed for processing natural language, LLMs leverage large-scale data to learn the structure, semantics, and contextual information of language, enabling them to generate high-quality language outputs based on given instructions. In the context of wind power forecasting, LLMs could potentially be used to analyze high-dimensional data related to wind power, thereby enhancing predictions for long sequences and complex scenarios.

LLMs demonstrate powerful capabilities in understanding natural language and solving complex tasks through text generation. Currently, LLMs primarily use the Transformer architecture [87]. Bidirectional encoder representations from transformers (BERT) [111] is a deep bidirectional representation learning model based on the Transformer architecture, where bidirectional means that context from both directions is considered simultaneously. Generative pre-trained transformer (GPT) is a unidirectional Transformer decoder model. Text-to-Text Transfer Transformer (T5) [112] is built on the Transformer’s encoder and decoder architecture. Recent studies have shown that LLMs perform excellently in time series analysis tasks, such as prediction, classification, anomaly detection, interpolation, few-shot learning, and zero-shot learning. For instance, Cao et al. [113] used GPT as the basis for time series forecasting. Jin et al. [114] reprogrammed LLMs for time series forecasting while keeping the underlying language model intact. Comprehensive evaluations indicate that the proposed TIME-LLM is a robust time series learner, outperforming state-of-the-art specialized forecasting models. Zhou et al. [115] applied the GPT-2 model to key tasks in time series analysis such as few-shot learning, full-data learning, and zero-shot learning. Their research demonstrates that the proposed LLM method performs comparably to or better than state-of-the-art methods in nearly all time series tasks. With their generality and logical reasoning ability, LLMs open up new research directions for time series forecasting, particularly for wind power forecasting.

3.5 Open-source wind power datasets

Open-source datasets play a crucial role in wind power research. However, many studies in the current wind power forecasting literature rely on unique datasets that cannot be replicated by other researchers. To improve standardization in the field and create a uniform evaluation standard, different researchers need to construct forecasting models and compare their forecasting results on the same dataset.

Wind power data are typically generated by individual turbines, and the turbines also collect a large amount of supervisory control and data acquisition (SCADA) data. An aggregated dataset is a summary of wind turbine data within a wind farm; it generally does not contain turbine-level information such as SCADA data but provides overall wind power data for the entire wind farm, typically at 30-minute to 1-hour intervals. Table 10 provides a summary of some open-source wind power datasets.

Table 10 A summary of wind power datasets

3.6 Evaluation indicators of forecasting model

Model evaluation is essential for gauging a model’s ability to accurately forecast wind power. It serves as the final step in the modeling process, aiding in identifying the best model to fit the data and estimating its future performance. Table 11 presents a compilation of commonly used evaluation metrics in wind power forecasting.

Table 11 Evaluation indicators of the forecasting model
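
As a concrete reference for the error measures mentioned throughout this review (mean absolute error, MSE, RMSE, and MAPE), the following sketch computes them in plain Python using their standard definitions. The function names and the sample values are hypothetical; the exact set of indicators in Table 11 may differ.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error, in the same unit as the data."""
    return math.sqrt(mse(actual, predicted))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Undefined when an actual value is zero, which happens whenever a
    turbine is idle, so such samples have to be handled beforehand.
    """
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


actual = [10.0, 20.0, 40.0, 50.0]      # made-up wind power values in MW
predicted = [12.0, 18.0, 44.0, 45.0]   # made-up forecasts in MW
```

On this sample, `mae` gives 3.25 MW, `mse` gives 12.25, `rmse` gives 3.5 MW, and `mape` gives 12.5 percent. RMSE penalizes the single 5 MW miss more heavily than MAE does, which is one reason studies often report several indicators side by side.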

4 Discussion and future challenges

In answer to research question 3, we have compiled the current research status and knowledge graph of intelligent methods in wind power forecasting to identify the latest development trends and provide future suggestions for forecasting methods. A brief comparison summary of the forecasting techniques is presented in Table 12.

Table 12 A summary of wind power forecasting methods from different perspectives

Our analysis of journal publications indicates that because of the irregular and intermittent nature of wind power time series data, many machine learning models rely on data preprocessing techniques to enhance forecasting accuracy. Data preprocessing techniques involve analyzing and processing time series data through techniques such as decomposition, feature de-noising, or feature extraction. By transforming the original time series data into multiple sequences or matrices, the models can identify more distinct features, thus improving their accuracy in predicting wind power.

As previously mentioned, the most commonly used data preprocessing methods for wind power forecasting are WD and mode decomposition. Within the mode decomposition family, EMD, VMD, and EEMD are widely used. Our review found that applying appropriate preprocessing to time series data before machine learning modeling yields data that are more predictable than the original data, which might enhance prediction accuracy.
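
The idea of splitting a series into easier-to-model components can be illustrated with a deliberately simple stand-in. The sketch below uses a centered moving average, which is not WD, EMD, VMD, or EEMD; it only shows the decompose, model separately, and recombine pattern those methods share. The function names and data are hypothetical.

```python
def moving_average(series, window):
    """Centered moving average; the window shrinks near the edges."""
    half = window // 2
    smoothed = []
    for t in range(len(series)):
        chunk = series[max(0, t - half): t + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def decompose(series, window=3):
    """Split a series into a smooth component and a residual component."""
    trend = moving_average(series, window)
    residual = [x - m for x, m in zip(series, trend)]
    return trend, residual


power = [0.30, 0.35, 0.50, 0.45, 0.70, 0.65, 0.80]  # made-up normalized values
trend, residual = decompose(power)
reconstructed = [m + r for m, r in zip(trend, residual)]
```

The two components sum back to the original series, so a forecasting model can be fitted to each component separately and the component forecasts added together afterwards.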

In addition, parameter optimization is another common practice in machine learning approaches that can improve the performance of forecasting models. For instance, ELM is the most widely used ANN in wind power forecasting problems, and research has focused on applying optimization methods to tune crucial parameters such as thresholds and initial weights. Because such a network is a shallow model, its capacity to learn sample characteristics is only moderate. After reviewing various studies, we conclude that optimization methods can enhance the forecasting accuracy of almost every machine learning model. However, it is worth noting that comparing different optimization algorithms can be challenging because of variations in settings such as population size, the number of iterations, and the number of time scales used for wind power forecasting.

Deep learning approaches have proven highly effective in capturing complex nonlinear relationships and extracting advanced feature representations from large-scale data. By thoroughly exploring high-dimensional state spaces, deep learning models can efficiently extract dynamic characteristics and latent variables from intricate network topologies, which can significantly increase prediction accuracy. The current generation of deep learning methods for wind power forecasting comprises CNN, RNN, and the two main RNN variants, LSTM and GRU. Among these models, LSTM networks can process and predict time series with temporal delays and moderate interval lengths, thus achieving accurate predictions. As a result, LSTM has the most widespread application in wind power forecasting.

Other emerging technologies have also opened up new research directions in wind power forecasting. The attention mechanism of Transformers can effectively model long-term dependencies in time series data. Capturing the seasonality and periodicity of time series data gives Transformers a significant advantage in long-term wind power forecasting. LLMs were primarily developed for natural language processing but can capture the features of time series data by adjusting the way they learn from textual data. Expanding LLMs for wind power forecasting aligns with two key trends in technological development. Firstly, there is a trend toward greater flexibility, aiming to reduce model size and computational costs. The varying scales of wind farms and fluctuations in the electricity market demand high flexibility in the application of LLMs. Secondly, there is a shift toward multimodality. LLMs need to have a universal interface supporting multimodal input and output to meet the requirements of diverse applications involving different types of wind power and meteorological data.

Opportunities for future work in wind power forecasting research are outlined below:

(1) Data pre-processing is essential for wind power forecasting. As there is no one-size-fits-all optimal method for data pre-processing, it is worth exploring the application of suitable pre-processing methods in different models to improve forecasting accuracy.

(2) Machine learning algorithms based on parameter optimization are widely used. However, selecting the most suitable optimization algorithm for different machine learning models is critical in wind power forecasting. Hybrid models using data pre-processing and parameter optimization are also common.

(3) Deep learning-based neural networks, such as CNN, RNN, LSTM, and GRU, outperform general machine learning methods in wind power forecasting problems. These algorithms can achieve more accurate predictions because of their capacity to simulate intricate nonlinear functions. However, it is crucial to remember that these methods can be computationally heavy, and proper feature selection, data cleaning and optimizers, and network selection are crucial to enhancing the performance of these models.

(4) For long-term time series data, wind power forecasting is difficult, with forecasting accuracy decreasing as the time scale increases. The success of Transformers and LLMs in natural language processing indicates their strong potential for time series forecasting. However, applying these emerging technologies to complex wind power forecasting requires further refinement and adjustment.

5 Conclusion

Since the turn of the 21st century, wind power has become an increasingly significant source of energy, and wind power forecasting techniques have been fully developed. In this article, we aim to address three research questions related to wind power forecasting by combing through relevant literature and research methods. Through the use of knowledge graph visualization and scientometric analysis, we provide a broad overview of the wind power forecasting problem.

Firstly, we conduct a bibliometric analysis of articles on wind power forecasting in the WOS core collection over the last 20 years using CiteSpace software. We count publications and collaborations by country, institution, journal, and author, and we create visual knowledge maps of the wind power forecasting literature, including a scientometric analysis and visual presentation of its development status and trends. This presents the research situation of different countries, institutions, and authors clearly and concisely, providing researchers with valuable data references and insights.

Secondly, co-occurrence mapping of keywords in the literature and analysis of keyword bursts reveal the popular research directions in wind power forecasting, namely machine learning and deep learning. The paper also provides further information and insight into the state of research on the methods represented by some of the highlighted keywords. Machine learning algorithms are simple and easy to use and are often applied together with data pre-processing and parameter optimization methods. Deep learning models, on the other hand, can explore deeper information in the data, and an appropriate model needs to be selected for each kind of wind power time series data. In addition, some emerging technologies, such as Transformers and LLMs, have shown strong modeling abilities for long-range dependencies and interactions in time series data, and their application to wind power forecasting deserves further study.

Finally, it is recommended that future wind power forecasting studies should take into account the observations given in Sect. 4 to get better results.