A Design of the Real-Time Simulation for Wind Turbine Modeling with Machine Learning

Power system operators have recently introduced AI-based techniques for load prediction, fault diagnosis, scheduling, and maintenance. Operators require grid analyses that include wind turbines in order to mitigate the impact of environmental factors. Accordingly, wind turbine modeling methods that reflect both the dynamic characteristics and the output characteristics of the turbine are receiving attention. Recently, data-based power curve modeling has been adopted to build simplified models that retain these characteristics as much as possible. However, a simplified EMT simulation has difficulty accurately reflecting the output characteristics under nonlinear wind conditions. This paper proposes a wind turbine model based on artificial neural network techniques using real supervisory control and data acquisition (SCADA) data from a wind farm. Through the trained wind turbine model, the proposed strategy derives output values close to the real output under various wind scenarios. To verify the proposed strategy, a case study was conducted using a real-time digital simulator (RTDS).


Introduction
As wind power generation increases in the power system, the development of real-time operation management technology that can manage wind power efficiently is attracting attention. As the installed capacity of wind systems grows to a large scale, research on the dynamics of wind turbines is expanding [1]. In this process, the importance of modeling that reflects mechanical characteristics has emerged. Modeling of wind farms has focused on several characteristics aimed at assessing the impact on the power system [2][3][4]. To carry out various simulations of a grid with connected wind farms, the dynamic characteristics of the generation system need to be considered. Above all, because output fluctuations caused by intermittent wind speed are frequent in wind power generation, the dynamic response of the mechanical factors must be reflected when performing the electrical simulation [5].
The model that can consider the output characteristics of the wind turbine depends on the design method. Generally, such models can be divided into two types: physical models and data-driven models. A physical model mathematically represents the relationship between a wind turbine's input and output data. This model involves complex processes based on integrals of differential equations and logarithmic calculations. M. Lydia et al. used a logistic function to obtain parametric models of the power curve [6]. Multiple input variables from SCADA were introduced into the Gaussian process-based wind power curve in [7]. In [8], the annual energy of wind power was calculated through advanced modeling that considers the properties of the wind turbine and the environment. The proposed advanced modeling represents wind turbines by formulating the air density parameters that affect output. Reference [9] proposed an electrical model of the wind turbine through detailed analytical modeling of the wind turbine structure. In particular, the structure of the turbine was analyzed via the Euler-Lagrange approach [10] and modeled by applying the blade element momentum (BEM) method [11,12]. In [13], a model based on the internal voltage fluctuation equation on the electromagnetic timescale was proposed, considering the operating characteristics of the doubly-fed induction generator (DFIG) wind turbine. These methods have the advantage of achieving high accuracy with respect to actual wind turbines in certain respects. However, when the power system is analyzed on a large scale, they impose a heavy computational burden.
To compensate for these problems, some studies have used a dynamic equivalent model (DEM) of the wind farm. An equivalent aggregate modeling method based on the equivalent admittance of wind farms is proposed in [14,15]. A modeling method based on state-space equations for large-scale wind farm complexes is proposed in [16][17][18]. In this approach, the parameters of the wind power model are determined as optimal values by a proposed optimization equation. However, a large wind farm has various wind turbines with different characteristics and output capacities. Therefore, it is challenging for this method to create an equivalent model that reflects the detailed output characteristics required for power system analysis. Moreover, such modeling is regarded as unrealistic when the turbines are distributed over a large geographical area [19]. With the development of machine learning technologies, various approaches have been proposed for wind turbine modeling. Many methods have been considered to fit the power curve, and essentially they aim to accurately estimate the nonlinear relationship between the power and other variables [20]. The authors in [21] provided a high-precision wind power curve modeling method based on wind speed vectors, including the wind speeds and wind directions at different heights of the wind measuring tower. A probabilistic approach based on spline regression models, which are used to generate inputs to a neural network for power forecasting, was considered to model the wind turbine power curve [22]. A stochastic gradient boosted regression tree with various historical environmental data, such as the wind speed, wind direction, blade speed, and yaw angle, is used to model the wind power profile [23]. Rogers et al. presented a methodology based on Bayesian probabilistic modeling for wind turbine power curves [24].
The artificial intelligence-based methods listed above learn from historical measurement data to provide an optimal estimator for wind turbine power curves, which reduces their output error. However, these previous studies appear to overemphasize the problem of fitting the wind turbine power curve. Such power curve models assume a fixed relationship between output power and wind speed. Although optimization of a fixed power curve has been widely studied, the continuity of the time-series data that arises under real operating conditions, including variable wind scenarios, is barely considered. To address this issue, this paper proposes a deep learning-based wind turbine model. This strategy considers the continuity of the relationship between wind conditions and actual output. A model designed from the nonlinear relationship between wind conditions and power output can derive accurate output.
In this paper, we utilize continuously measured data from a SCADA system for machine learning-based wind turbine design. By applying data preprocessing and time-series techniques, including data mining, we present a wind turbine model design. We build an algorithm based on the proposed wind turbine model and embed it in a turbine controller so that the turbine can react to real-time wind conditions, thereby verifying the model in a real-time situation. The hardware-based verification system has been designed using an RTDS to simulate the signal exchange between the machine learning-based wind turbine model and the simplified model. A verification study has also been performed in a hardware-in-the-loop simulation (HILS) environment to explore various wind scenarios. The major contributions of this paper are summarized as follows.
In contrast to conventional studies, in this paper, wind turbines are modeled based on artificial neural network techniques. To complement the existing fixed power curve, we applied an algorithm that considers the spatiotemporal characteristics of large volumes of data. Through the trained wind turbine model, the proposed strategy can derive output values close to the real output under various wind scenarios.
In this paper, we perform verification of a wind turbine model using HILS. By using HILS, which connects real hardware and real-time simulation, we can more accurately evaluate the performance of the proposed model. In addition, we can improve the performance and reliability of the model. This paper is composed of five main sections. In Sect. 2, as related work, we describe a learning structure design dedicated to machine learning-based wind power generation systems. Section 3 discusses machine learning-based wind turbine model design as part of the learning model design. Section 4 analyzes the evaluation results of the wind turbine model using the real-time simulator. Finally, Sect. 5 concludes with expected effects and future research plans.

Convolutional Neural Networks (CNN) Model
A CNN consists of convolution and an artificial neural network structure. CNN is useful for extracting spatial and temporal features of input data. It applies several filters to extract various spatial features. Correlations for spatially adjacent signals are extracted by applying a non-linear filter. These CNNs are distinguished by dimension, and Table 1 shows the characteristics of Conv1D, Conv2D, and Conv3D by dimension.
The model composed in this paper used Conv1D to handle the 1D sequential data. The execution of Conv1D proceeds in three stages: feature extraction, subsampling, and pooling. In the first step, the weighted kernel slides across the input at a constant step and calculates multiple convolutions in parallel. In the second step, the values calculated in the first step are passed through an activation function to extract features of the input data and output them to a feature map. The third step uses the pooling function to reduce the size of the feature map. Repeating these steps extracts the characteristics of the data. The features extracted in the previous steps are then classified via the fully connected layer. The Conv1D equation used to extract the features of the data is presented below.
where $l$ is the layer index, $\sigma$ is the activation function, $b_j$ is the bias of the $j$th feature map, $M$ is the kernel size, $W_j^m$ is the weight of the $j$th feature map, and $m$ is the filter index.
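The three Conv1D stages described above can be illustrated with a small pure-Python sketch. It assumes a common form of the operation, $\sigma\left(b_j + \sum_{m=1}^{M} W_j^m x_{i+m-1}\right)$, with ReLU as the activation and non-overlapping max pooling; the kernel values, the input sequence, and the function names are illustrative assumptions and are not taken from the paper.

```python
def conv1d(x, kernel, bias):
    """'Valid' 1D convolution as used in deep learning libraries
    (cross-correlation): y[i] = bias + sum_m kernel[m] * x[i + m]."""
    m = len(kernel)
    return [bias + sum(kernel[k] * x[i + k] for k in range(m))
            for i in range(len(x) - m + 1)]


def relu(values):
    """Activation step: negative responses are set to zero."""
    return [max(0.0, v) for v in values]


def max_pool1d(values, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]


# Illustrative 1-s wind speed samples (m/s) and a hypothetical kernel
# that responds to rising wind speed.
wind_speed = [5.0, 6.0, 8.0, 7.0, 4.0, 3.0, 6.0]
kernel = [-1.0, 0.0, 1.0]

feature_map = relu(conv1d(wind_speed, kernel, bias=0.0))
pooled = max_pool1d(feature_map, size=2)
```

In a trained network the kernel weights and bias are learned rather than fixed by hand, and several kernels run in parallel to produce several feature maps.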

BiLSTM Model
LSTM is widely used as a model that mitigates the long-term dependency problem of recurrent neural network (RNN) models. However, because the input data is processed in chronological order, the output tends to be influenced mainly by the pattern that was input immediately before. To address this limitation, we applied a BiLSTM model, which introduces the concept of bidirectional recurrent neural networks. This structure learns via two separate recurrent neural networks, one forward and one backward. Figure 1 shows a structural diagram of the BiLSTM. In a BiLSTM, the input and output layers are connected through a forward hidden layer and an additional backward hidden layer. This is a combination of two existing unidirectional LSTM models, a forward layer and a backward layer. Data is fed to the forward layer in chronological order and to the backward layer in reverse order. Because BiLSTMs build on the basic performance of LSTM and the attention mechanism, their performance does not deteriorate even when the data sequence is long. This paper adopted BiLSTM to learn many years of time series data.
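The forward and backward passes described above can be sketched with a single-unit LSTM cell in pure Python. This is a minimal illustration under stated assumptions: the weights are hypothetical and untrained, the hidden state is a scalar, and the names `lstm_pass` and `bilstm` are introduced here only for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_pass(sequence, w):
    """Run a single-unit LSTM over `sequence`; return the hidden state per step.

    `w` maps a gate name to (input weight, recurrent weight, bias).
    """
    h, c, hidden_states = 0.0, 0.0, []
    for x in sequence:
        pre = {name: wx * x + wh * h + b for name, (wx, wh, b) in w.items()}
        f = sigmoid(pre["forget"])       # how much of the old cell state to keep
        i = sigmoid(pre["input"])        # how much of the candidate to write
        o = sigmoid(pre["output"])       # how much of the cell state to expose
        g = math.tanh(pre["candidate"])  # candidate cell content
        c = f * c + i * g
        h = o * math.tanh(c)
        hidden_states.append(h)
    return hidden_states


def bilstm(sequence, w_forward, w_backward):
    """Pair each time step's forward state with its backward state."""
    forward = lstm_pass(sequence, w_forward)
    # The backward layer reads the sequence in reverse; its outputs are
    # flipped back so that index t refers to the same time step in both.
    backward = lstm_pass(sequence[::-1], w_backward)[::-1]
    return list(zip(forward, backward))


# Hypothetical, untrained weights for illustration only.
W_FWD = {"forget": (0.5, 0.1, 0.0), "input": (0.6, 0.2, 0.0),
         "output": (0.4, 0.3, 0.0), "candidate": (0.9, 0.5, 0.0)}
W_BWD = {"forget": (0.3, 0.2, 0.0), "input": (0.7, 0.1, 0.0),
         "output": (0.5, 0.2, 0.0), "candidate": (0.8, 0.4, 0.0)}

wind = [0.2, 0.5, 0.9, 0.4]  # normalized wind speed, illustrative
states = bilstm(wind, W_FWD, W_BWD)
```

At every time step the forward state summarizes the samples seen so far, while the backward state summarizes the samples that follow, so the pair carries context from both directions.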

CNN-BiLSTM Model
This paper aims to design a data-based wind turbine model that reflects past output characteristics. To reflect the output characteristics, the algorithm has to be constructed with data patterns in mind. Therefore, this paper performed machine learning based on the CNN-BiLSTM algorithm, a combination of CNN and BiLSTM. The construction of the CNN-BiLSTM algorithm can be divided into three steps, as shown in Fig. 2. The first step consists of the CNN and max-pooling layers. In this structure, periodic and aperiodic features of the time-series data are extracted by the CNN layers to generate the feature map output, and this process is repeated to extract further periodic and aperiodic features. The second step is deep iterative learning in the BiLSTM layer. The BiLSTM layer learns the relationship between the past and future data of the features output by the CNN by executing iterative calculations based on the LSTM algorithm. The third step is model evaluation, which assesses the performance and accuracy of the proposed algorithm using metric scores. The evaluation metrics used for this study were selected based on the recommendations of studies and reports in the field of machine learning. The following section describes the proposed method, which applies the CNN-BiLSTM algorithm to wind turbine model design.

Machine Learning Based Wind Turbine Model Design
In this paper, we propose a wind turbine model based on artificial neural network technology using SCADA data of an actual wind farm. The overall flowchart of the proposed model is shown in Fig. 3. The flowchart is divided into three parts.

Learning Data Collection
The selected wind farm comprises 15 Hanjin 2 MW doubly fed induction generator (DFIG) wind turbines. The specifications of the wind turbines constituting the wind farm are listed in Table 2. The long-term continuous operation dataset was collected from one of the wind turbines in the Dongbok wind farm. A SCADA system is designed to transmit status information from individual wind turbines to a central control system for effective operation and management of wind farm equipment. The system provides detailed operational information on wind turbines, such as power generation, reactive power, and wind conditions. We selected wind speed, wind direction, power coefficient, and active power as the variables used for learning. The collected data is sampled at 1-s intervals and covers Jan 1, 2020, to Dec 31, 2022.

Dataset Preprocessing
Building an ML model involves preprocessing the collected data, training the model through deep learning algorithms, and saving the trained model. Raw data from the SCADA system contains unstable records caused by, for example, communication protocol failures and time synchronization errors. These records cause several problems if the raw data is learned directly. Data preprocessing is therefore required to clean the data and make it suitable for a machine learning model, which also increases the accuracy and efficiency of the model. The flowchart of the raw SCADA data preprocessing, which refines the data into a trainable dataset, is shown in Fig. 4. The preprocessing steps are performed sequentially, starting with removing missing values, followed by removing outliers using quantiles, and then normalizing the data. Missing values bias the output results because the sample becomes less representative, and the model performance suffers. Therefore, we remove all missing values to eliminate the uncertainty they introduce into the dataset. Due to operational issues such as wind turbine failures, the raw data contains extreme values that deviate significantly from the normal range. These values degrade performance when learning patterns in the data. In this paper, the box-plot method was applied to find outliers, and the outliers were removed based on the interquartile range (IQR). Then, the scaling process converts the ranges of the different variables to a common level. This process prevents the machine learning model from being biased toward specific data. It also excludes problems arising from unit differences between variables. Therefore, we applied Min-Max scaling to the input data for training. Table 3 illustrates the structure for deriving the active power using the CNN-BiLSTM algorithm.
The inputs of this structure are wind speed, wind direction, and power coefficient, and the output is active power. The convolution and pooling layers extract the time-series characteristics of the multivariate input variables, which are then passed to the BiLSTM layers. The BiLSTM layers use the transferred time-series features to model the output value with minimized loss. Finally, the CNN-BiLSTM method derives the active power in a fully connected layer.
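The preprocessing sequence described in this section (missing-value removal, IQR-based outlier removal, and Min-Max scaling) can be sketched as follows. The sketch makes several assumptions that the text does not state: rows are tuples of (wind speed, active power), missing values are represented as `None`, the box-plot fences are placed at 1.5 times the IQR, and the sample values are invented for illustration.

```python
import statistics


def drop_missing(rows):
    """Remove every row that contains a missing value."""
    return [r for r in rows if all(v is not None for v in r)]


def iqr_bounds(values, k=1.5):
    """Box-plot fences: [Q1 - k*IQR, Q3 + k*IQR] (k = 1.5 is an assumption)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(rows):
    """Apply the IQR rule column by column."""
    for col in range(len(rows[0])):
        low, high = iqr_bounds([r[col] for r in rows])
        rows = [r for r in rows if low <= r[col] <= high]
    return rows


def min_max_scale(rows):
    """Scale each column to [0, 1]; a constant column maps to 0."""
    cols = list(zip(*rows))
    mins = [min(c) for c in cols]
    maxs = [max(c) for c in cols]
    return [tuple((v - lo) / (hi - lo) if hi > lo else 0.0
                  for v, lo, hi in zip(r, mins, maxs))
            for r in rows]


# Invented (wind speed in m/s, active power in kW) samples.
raw = [(5.0, 400.0), (6.0, 650.0), (None, 700.0), (7.0, 900.0),
       (8.0, 1200.0), (6.5, 9999.0), (7.5, 1000.0)]

clean = drop_outliers(drop_missing(raw))
scaled = min_max_scale(clean)
```

In practice the scaling minima and maxima would be taken from the training data and reused when scaling new inputs, so that the controller and the training pipeline see values on the same scale.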

Construction of the ML Model
To prevent overfitting of the proposed model, we divided the training set and the test set in a ratio of 7:3. Here, the test set is applied to validate the trained model. We use the loss function as a metric to verify that the model is well-trained. Figure 5 shows the mean squared error loss over the training epochs for both the train (red) and test (blue) sets. The model converged reasonably quickly, and both train and test performance remained equivalent. This suggests that the neural network configuration of the proposed model is suitable.
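The 7:3 split and the loss comparison can be sketched as below. The text does not say whether the split is chronological or random; this sketch assumes a chronological split, which is a common choice for time-series data, and the function names and sample values are illustrative.

```python
def split_train_test(samples, train_ratio=0.7):
    """Chronological split: the first 70% for training, the rest for testing."""
    cut = round(len(samples) * train_ratio)
    return samples[:cut], samples[cut:]


def mean_squared_error(actual, predicted):
    """Loss used to compare training and test performance."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Ten illustrative, already-ordered samples.
samples = list(range(10))
train, test = split_train_test(samples)

# Comparing the loss on both sets after each epoch shows whether the model
# generalizes: a test loss far above the training loss suggests overfitting.
example_loss = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```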

Hardware-in-the-Loop Simulation (HILS)
HILS is a real-time technique used to develop and test complex embedded systems. This paper uses HILS to verify the wind turbine model with a controller, in which the trained model is stored, and a real-time simulator. A commercial wind turbine was simulated on the RTDS using RSCAD. The proposed model, which replaces the conventional detailed model, is implemented in the controller of the HILS configuration. Thus, the wind turbine in RSCAD was composed of a model that reflects simple characteristics responding to command values rather than a detailed model. In this paper, an equivalent VSC average model was implemented in the RTDS so that the output of the developed wind turbine model from the controller can be represented in the HILS system. The detailed HILS configuration is depicted in Fig. 6. An interface between the simulator and external hardware is essential when configuring HILS. Therefore, a simulation environment was established using the Modbus TCP/IP protocol to enable communication between the wind turbine inside the RTDS and the controller. The Raspberry Pi-based controller receives the wind condition data every second, generates the active power command value of the wind turbine, and sends it to the RTDS.
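To make the controller-to-RTDS exchange concrete, the sketch below builds the bytes of a Modbus TCP "Write Single Register" request (function code 0x06) that could carry an active power command. The frame layout follows the Modbus TCP specification, but everything else is an assumption for illustration: the register address, the unit identifier, the scaling of 1 kW per register count, and the function names are hypothetical, and the paper does not describe its register map.

```python
import struct

RATED_POWER_KW = 2000   # 2 MW turbine rating from the wind farm description
POWER_REGISTER = 0      # hypothetical holding-register address


def power_to_register(power_kw):
    """Clamp the command to [0, rated] and encode it as whole kilowatts
    (assumed scaling of 1 kW per count)."""
    clamped = min(max(power_kw, 0.0), RATED_POWER_KW)
    return int(round(clamped))


def build_write_register_frame(transaction_id, unit_id, address, value):
    """Build a Modbus TCP Write Single Register request.

    PDU : function code (1 byte), register address (2), register value (2)
    MBAP: transaction id (2), protocol id = 0 (2), length (2), unit id (1),
          where length counts the unit id plus the PDU.
    All fields are big-endian.
    """
    pdu = struct.pack(">BHH", 0x06, address, value)
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu


frame = build_write_register_frame(
    transaction_id=1,
    unit_id=1,
    address=POWER_REGISTER,
    value=power_to_register(1234.4),
)
```

A real controller would send such a frame over a TCP socket once per second, after the trained model has produced the active power command from the latest wind condition data.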

Performance Evaluation Indicator
The performance of the proposed wind turbine model is evaluated through evaluation indices. Various indicators exist for evaluating machine learning models. In the four evaluation formulas, $y_i$ represents the actual wind power and $\hat{y}_i$ the wind power derived from the learning model.

Simulation Configuration
Figure 7 illustrates the verification process of the proposed wind turbine model in this paper. The data used for verification is SCADA data from January 29, 2023. The simulation is divided into three cases according to the average rate of change in wind speed per second. The cases have average wind speed fluctuation rates of 1%, 2%, and 3% per second, respectively. The wind speed fluctuation rate was calculated as shown in (6).
Here, $W_s$ is the wind speed and $t$ is the simulation time.
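Because equation (6) is not reproduced in this text, the sketch below uses an assumed definition of the fluctuation rate: the mean absolute relative change in wind speed between consecutive 1-s samples, expressed as a percentage. Both the definition and the function name are assumptions, not the authors' formula.

```python
def wind_speed_fluctuation_rate(wind_speed):
    """Average per-second change in wind speed, in percent.

    Assumed definition: mean over t of |W_s(t) - W_s(t-1)| / W_s(t-1) * 100,
    with one sample per second and strictly positive wind speeds.
    """
    changes = [abs(cur - prev) / prev * 100.0
               for prev, cur in zip(wind_speed, wind_speed[1:])]
    return sum(changes) / len(changes)


steady = wind_speed_fluctuation_rate([8.0, 8.0, 8.0])   # no change at all
gusty = wind_speed_fluctuation_rate([10.0, 10.2])       # one 2% step
```

Under this definition, a 60-s window could be assigned to the 1%, 2%, or 3% case by comparing its average rate with those thresholds.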
To analyze the dynamic characteristics of wind turbine output in the simulation, three models were established as case studies: the proposed wind turbine model, a wind turbine model based on LSTM, and an optimized power curve-based wind turbine model. The optimized power curve-based model generated the active power according to wind speed using the determined power curve. The LSTM-based model learned the long-term dependency of sequential data and generated active power according to wind speed. The proposed deep learning-based model generated the active power according to the input data using the weights of the trained model. The active power generated by each of the three methods is input to the electrical wind model designed in RSCAD. Graphs for all three methods are displayed along with the actual active power. The graphs record the active power of the wind turbine for 60 s. The error rates of the three methods are compared using the evaluation indicators.

Simulation Results
The simulation results validate the wind turbine model using HILS, which allowed us to evaluate the performance of the proposed model more accurately. A graph analysis visually demonstrates the performance of the three wind power modeling methods. As shown in Fig. 8, the active power of the optimized power curve model differs significantly from the actual active power. The LSTM-based model obtained improved results compared with the optimized power curve model. However, its curve deviated from the actual active power curve as the wind speed variability increased. In contrast, the proposed method produced outputs similar to the actual active power in all three wind speed modes. This indicates that the model trained through the CNN-BiLSTM algorithm performs well. Conv1D performs well in feature selection for fluctuating processes when extracting critical information from the given sequence data. In addition, the BiLSTM model performed well in learning the time series sequence and could accurately learn the dynamic relationship between the input data. Table 4 compares the error rates between the power generation derived for each case and mode and the actual power generation. The error rates in the table were measured by averaging the error every second and are calculated using the evaluation indicators. A lower error rate means higher accuracy. In terms of MSE and RMSE, the error rates of the proposed model were lowest in case 2, followed by case 3 and case 1.
Compared with the optimized power curve model, the proposed method reduces the WAPE by 13.06%, 12.98%, and 24.34%, respectively. In addition, compared with the LSTM-based model, the proposed method reduces the WAPE by 0.32%, 1.04%, and 6.32%, respectively. These results demonstrate that the proposed model is more accurate than the other models at all levels of wind speed variability. Based on the above analysis, the proposed method has great potential in the practical modeling of wind turbines.
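The indicators named in this comparison (MSE, RMSE, and WAPE) can be computed as in the sketch below, which uses their standard definitions. The paper applies four indicators, but the fourth is not named in this text and is therefore omitted. The sample values are invented and do not correspond to Table 4.

```python
import math


def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error, in the same unit as the power values."""
    return math.sqrt(mse(actual, predicted))


def wape(actual, predicted):
    """Weighted absolute percentage error: total absolute error divided by
    total absolute actual value, in percent."""
    total_error = sum(abs(a - p) for a, p in zip(actual, predicted))
    return 100.0 * total_error / sum(abs(a) for a in actual)


# Invented active power samples in kW.
actual_kw = [1000.0, 1200.0, 900.0]
model_kw = [950.0, 1250.0, 900.0]

scores = {
    "MSE": mse(actual_kw, model_kw),
    "RMSE": rmse(actual_kw, model_kw),
    "WAPE": wape(actual_kw, model_kw),
}
```

Unlike a per-sample percentage error, WAPE normalizes by the total actual power, so samples with near-zero output do not dominate the score.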

Conclusion
This paper proposed a machine learning method for designing a data-based wind turbine model. To complement the existing fixed power curve, we applied an algorithm that considers the spatiotemporal characteristics of large volumes of data. In the case study, an RTDS was used to verify the effectiveness of the proposed wind turbine model. Each case study compares the actual power generation with that of the proposed model under different wind speed fluctuations. Four indicators are applied to evaluate the performance of the proposed model. Simulation results show that the power values tracked by the proposed wind turbine model are similar to the actual power generation. As the proposed model shows accurate results, the proposed method is expected to be extendable to multiple wind turbines and to accurately reflect the dynamic characteristics of a wind farm. Additionally, applying the proposed wind turbine model to large-scale wind farm simulations is expected to reduce the computational burden and output inaccuracy.
23) funded by the Korean government.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).