Introduction

As the world shifts towards renewable and sustainable energy sources, wind turbines play a crucial role in this global change. Wind energy offers a promising new frontier in meeting the growing need for sustainable energy by utilizing the vast potential of wind resources in diverse environments. On the other hand, the installation and operation of wind farms pose challenges that demand innovative solutions to improve overall performance, reliability, and efficiency. In this context, predictive digital twins have attracted attention as an innovative technology with the potential to fundamentally alter the wind energy market. Digital twins, which are virtual replicas of physical assets or systems, allow real time monitoring, simulation, and predictive analysis. The application of predictive digital twins, especially in wind farms, offers valuable insights into the behavior and performance of systems, enhancing proactive decision-making, energy forecasting, and competence in the energy market.

Motivation and objective

The motivation behind this literature review lies in the fact that, while there are several reviews on digital twins, this study specifically focuses on predictive digital twins within the context of wind energy systems, a perspective that has not been previously explored. Conducting a review of predictive digital twins for wind farms is essential due to the critical necessity of addressing the inherent challenges in this dynamic and volatile environment. Wind farms face diverse and severe conditions, including variable wind patterns, complex environmental interactions, and difficulties in data collection. These factors can impact the structural integrity, energy yield, and overall operational efficiency of wind turbines. Predictive digital twins have the potential to transform the way wind farms are monitored and managed. By integrating advanced data analytics, machine learning algorithms, statistical and probabilistic methods, and real time sensor data, these digital assets can predict potential issues before they escalate. This approach helps optimize performance and contributes to the reliability and cost-effectiveness of wind energy projects.

This review aims to comprehensively explore the current state of predictive digital twins for wind farms. The key objectives include:

  • Surveying existing literature: Providing a thorough overview of existing studies, research, and implementations related to predictive digital twins in the context of wind energy.

  • Assessing current methods: Evaluating the advancements in predictive modeling, data analytics, and machine learning techniques applied to wind farm operations. Investigating how predictive digital twins contribute to enhancing the performance, reliability, and energy yield of wind farms.

  • Identifying challenges and limitations: Identifying and critically analyzing the challenges, limitations, and gaps in current research and applications of predictive digital twins in the wind industry.

  • Proposing future directions: Proposing potential directions for future research, emphasizing areas that necessitate exploration to enhance the capabilities of predictive digital twins in the realm of wind energy.

This review aims to consolidate and synthesize the existing knowledge, providing valuable insights for researchers and stakeholders who are involved in advancing wind energy through the application of predictive digital twin technologies.

Problem definition and contribution

Predictive digital twins within the realm of wind energy systems have attracted significant attention in recent decades. The creation of predictive digital twin platforms has been made possible by real time data, simulation models, and advanced analytical methods. This technology not only allows stakeholders to forecast potential issues but also enhances informed decision-making and performance optimization. Despite the growing investment in this field, there remains a need for a comprehensive understanding of the current state of research and development in predictive digital twin applications specific to wind energy systems. The main challenge lies in the lack of solid knowledge regarding key challenges and advancements, along with a gap in the literature related to predictive digital twins in wind energy systems. As the field is rapidly evolving, there is a risk of inconsistency and limited transferability of findings across different systems and industries. Additionally, the effectiveness of predictive digital twins in enhancing the overall performance and reliability of wind energy systems remains unclear, given the data provided by several independent sources. The integration and analysis of various data still pose prominent issues. To address these challenges, a literature review is necessary to comprehend existing knowledge, identify trends, and provide a foundation for future research.

This paper contributes to the understanding and development of digital twin technology within wind energy systems through a comprehensive literature review. A systematic approach is used in the review, beginning with the formulation of the research question and establishment of the review protocol. Relevant studies are then searched in the selected databases using the defined query strings. By conducting a literature survey from the past five years, this study presents key trends and advancements in predictive digital twin platforms. The analysis identifies current challenges and limitations, while also discussing commonly employed methodologies, with a focus on enhancing digital twin systems. Furthermore, future research opportunities are outlined to lay a foundation for ongoing advancements in this field. This review seeks to offer valuable insights and practical guidance for academics, industry professionals, or technology developers working on digital twin technology in the wind energy sector.

Background information

A digital twin is a representation of a physical system created through digital information. This digital counterpart serves as a duplicate of the information embedded in the physical system and remains interconnected with it throughout the lifecycle. The origins of the Digital Twin concept can be traced back to a 2002 University of Michigan presentation aimed at establishing a Product Lifecycle Management center. Figure 1 provides a visual depiction of the digital twin, highlighting its primary components: real space, virtual space, and the link for data and information flow between real space and virtual space (Grieves 2016).

Fig. 1

Conceptual framework of a digital twin for a wind turbine. The physical asset consists of sensors and IoT devices. The digital twin platform consists of three main fronts: big data and analytics, simulation and property modeling, and visualization. Data flows from the physical asset to the digital twin platform, while information and processes are sent back from the platform to the physical asset

The digital twin concept operates on three main fronts. First, it stores essential component data. In this capacity, the digital twin systematically collects, organizes, and stores critical information pertaining to the physical system’s components. This encompasses a detailed inventory of the structure, dynamics, and configuration of the various elements of the system. This repository not only describes the current state but also lays the groundwork for several processes. The stored data becomes the foundational building block upon which the digital twin can further analyze, simulate, and visualize the behavior of the physical system. In the realm of wind energy, the digital twin may capture detailed information about the turbine’s components, such as the specifications of the rotor blades, turbine output, and the configuration of the generator or gearbox, as well as environmental parameters such as wind speed and wind direction. More specifically, the digital twin might store data on the aerodynamic profiles of the rotor blades, including their material composition and dimensions (Jureczko et al. 2005). It would document the specifications of the gearbox, detailing gear ratios and load-bearing capacities (Moghadam et al. 2021). Wind sensor data, historical wind patterns, and turbine performance metrics, such as power output and efficiency, would also be systematically recorded. This detailed component data can be used as the foundation for subsequent analyses and simulations, for example to simulate the wind turbine’s behavior under various conditions and to optimize the turbine’s performance.

Second, it analyzes and simulates the asset based on that data, where computational models and algorithms are utilized to examine the stored data within the digital twin. The digital twin employs advanced analytical tools and machine learning algorithms to simulate the behavior of the physical system under various conditions. These models are intended to replicate the dynamic interactions between components and the environment. The simulations and models enable us to gain insights into how the asset responds to different inputs, environmental factors, or operational scenarios. These virtual tests can identify potential issues or efficiency losses, enabling us to comprehensively assess the system. A digital twin for a wind turbine leverages stored data to conduct detailed performance analyses and simulations. For instance, the digital twin may employ computational models that consider parameters such as wind speed, blade geometry, and turbine specifications. Analyzing the data could involve simulations to predict power generation output at varying wind speeds. The digital twin can be used to assess the impact of wind direction on the turbine’s yaw mechanism, optimizing its alignment for maximum energy capture (Wu and Wang 2012). Structural simulations may also be employed to evaluate the integrity of turbine components, helping identify potential stress points or areas requiring maintenance (Bazilevs et al. 2015). Forecasting algorithms can also be implemented to estimate the power output over different time horizons (Hanifi et al. 2020). Implementing all these models enhances efficiency by providing a deep understanding of performance and helps minimize downtime under diverse conditions.

Third, it visualizes relevant data and results according to predefined objectives. These presentations provide insights through the digital twin’s simulation processes. In this phase, the digital twin transforms complex data and simulation results into comprehensible visual representations. These visualizations align with predefined objectives, to ensure that the information presented is relevant and serves specific needs. Visualization of a digital twin involves creating graphical representations, dashboards, 3D visualizations, and other illustrative formats that convey key findings (Kandemir et al. 2023). These visual outputs may include performance metrics, trends, and critical insights derived from the analytical simulations. The primary goal is to present the information in a clear and accessible way, facilitating effective communication and decision-making among various stakeholders. In this way, stakeholders can intuitively grasp the complexities of the physical system. In the context of wind turbines, if a predefined objective is to optimize energy production, the digital twin could generate visualizations that display real time power output, efficiency trends, and the impact of different wind conditions on energy generation. These tools could include graphical representations of power curves, efficiency maps, and performance trends of subsystems (Rafiee et al. 2018). These visualizations can be used to quickly assess the impact of wind speed, direction, or turbine settings on energy production. Additionally, the digital twin might generate visual alerts or dashboards highlighting areas of the turbine that require attention or maintenance.
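As an illustration of the second front, simulating power output at varying wind speeds can be sketched with an idealized power curve. The example below is a minimal sketch rather than a model taken from the reviewed studies: the function name `turbine_power_kw`, the rotor diameter, the power coefficient, and the cut-in, rated, and cut-out values are all assumed for illustration.

```python
import math

AIR_DENSITY = 1.225  # kg/m^3, standard sea-level air (assumed value)


def turbine_power_kw(wind_speed, rotor_diameter=100.0, cp=0.45,
                     cut_in=3.0, rated_power_kw=2000.0, cut_out=25.0):
    """Idealized power output in kW for a hub-height wind speed in m/s.

    All default parameters are hypothetical and chosen only for illustration.
    """
    # Outside the operating range the turbine produces no power.
    if wind_speed < cut_in or wind_speed > cut_out:
        return 0.0
    swept_area = math.pi * (rotor_diameter / 2.0) ** 2
    # Kinetic power in the wind times a constant power coefficient.
    aerodynamic_kw = 0.5 * AIR_DENSITY * swept_area * cp * wind_speed ** 3 / 1000.0
    # Above rated wind speed the output is capped at rated power.
    return min(aerodynamic_kw, rated_power_kw)


# Tabulate a coarse power curve, as a digital twin dashboard might display it.
power_curve = {v: round(turbine_power_kw(v), 1) for v in range(0, 31, 5)}
for speed, power in power_curve.items():
    print(f"{speed:>2} m/s -> {power:>7.1f} kW")
```

A real digital twin would replace the constant power coefficient with measured or simulated turbine data; the sketch only shows how stored parameters feed a simple simulation.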

Digital twin applications rely on four key technologies: “Internet of Things”, “Data and Analytics”, “Cloud Computing” and “Accessibility and Interaction” (Wang and Liu 2022). The Internet of Things (IoT) functions as a system where physical devices are embedded with software, utilizing Internet connectivity. Various techniques, such as Bluetooth, Wi-Fi, RFID, and GPRS, can establish connections in IoT, facilitating communication between physical and virtual entities for data transfer. Many companies are actively investing in IoT to foster machine-to-machine communications. The framework is structured into three primary layers: perception, network, and application. In the perception layer, interaction with the environment occurs through sensors and actuators. The network layer manages connections between diverse entities, including “things,” network devices, and servers, and processes data along the way. The final layer provides services to users (Shah et al. 2018; Mouha 2021). Data and Analytics encompass the utilization of various corporate tools like Structured Query Language (SQL) for tasks such as data storage, manipulation, and retrieval. In this process, it is crucial to evaluate data through advanced methods that align with specific objectives. These analytics involve a range of methods, including physics-based models, statistical and predictive analysis, machine learning, and artificial intelligence (Fowdur et al. 2018). Cloud Computing enables people to reach, share, and store information via the Internet. This computing technology utilizes a network of data centers with interconnected computers, allowing the execution of software functions. Users have access to powerful platforms and services over the Internet, making it a versatile collection of network-enabled services. Cloud Computing provides on-demand, flexible, and tailored computing infrastructures to a wide range of stakeholders (Kalapatapu and Sarkar 2012). 
Accessibility and Interaction with Digital Twin involve examining physical systems from a distance. The digital twin stands out by being reachable remotely, facilitating global data transfer with fewer limitations. In scenarios where local access is limited, the need for remote monitoring and control of assets becomes apparent. Moreover, within complex systems, understanding subsystems poses a challenge, but the digital twin simplifies understanding both subsystems and the interaction between systems (Singh et al. 2021). Human interaction emphasizes communication and interaction between humans and machines. Emerging technologies in this area include virtual and augmented reality, 3D visualizations, and recognition algorithms (Ma et al. 2019).

Outline

In "Methodology" Section, the methodology is outlined, detailing the establishment of a review protocol that plays a pivotal role in the investigation of predictive digital twin technology. Specifically, inclusion criteria are outlined, the search strategy is executed, and a systematic approach is employed to explore relevant literature. In "Results" Section, the results of the literature review are provided, aligning with the research questions and presenting key findings on predictive digital twin technology, including current applications, methods, and emerging trends. Section "Discussion" engages in a discussion, analyzing the implications, trends, and methods identified in the literature. This section aims to gain a deeper understanding of the context of predictive digital twins. In "Conclusions and future work" Section, conclusions are drawn, summarizing the state of predictive digital twins based on insights obtained from the literature review.

Methodology

The methodology is inspired by the guidelines proposed by Kitchenham and Charters (2007) for a systematic literature review. Well-formulated research questions are essential, as they guide the search, selection, and analysis of relevant studies, which in turn provides a comprehensive overview of existing research on a specific topic. The predefined search strategy and inclusion/exclusion criteria enhance reliability. Reviews significantly contribute to scientific knowledge by summarizing findings, identifying gaps, and establishing a reliable foundation for future research. The format also promotes transparency and credibility, owing to the well-established protocol. This review is conducted in three main steps, as shown in Fig. 2: planning, execution, and reporting.

Fig. 2

Framework for a literature review. Plan: Develop the research question, establish the review protocol, and create query strings for database searches. Execution: Identify relevant research, filter results based on established quality criteria, and identify proposed methods. Report: Analyze multiple methods found in the literature and document the findings

The planning phase is dedicated to formulating an effective search strategy and establishing criteria to evaluate the quality of the gathered studies. During the execution stage, the focus lies on identifying pertinent studies and extracting the methodologies they employ. The reporting phase synthesizes all the acquired findings and methodologies, facilitating a comprehensive and critical discussion of the outcomes. In essence, this three-step methodology provides a structured framework for conducting a literature review.

Research questions

The research question in a literature review is crucial as it shapes the entire study. Its significant role lies in establishing an unbiased framework essential for maintaining objectivity, reliability, and credibility. A well-formulated research question ensures a thorough analysis of existing literature, contributing to the academic integrity of the review. In this context, four research questions have been formulated: (1) targeting methodologies, (2) addressing the integration of data from various sources, (3) focusing on real time decision-making, and (4) delving into challenges.

  • RQ1: What methodologies are commonly employed in developing predictive digital twin models for wind energy systems?

  • RQ2: How do predictive digital twin applications integrate and analyze data from diverse sources to enhance their predictive capabilities?

  • RQ3: What are the key features and technologies that facilitate real time decision-making in wind energy systems through predictive digital twins?

  • RQ4: What are the challenges commonly encountered in wind energy systems when implementing predictive digital twin solutions?

Search strategy

For the literature review on predictive digital twin in wind energy systems, a comprehensive search strategy was developed. This strategy involved the utilization of academic databases and search engines such as IEEE Xplore, Scopus, ACM Digital Library, SpringerLink, Wiley Online Library, and Taylor & Francis Online. The search string was structured using a combination of keywords and Boolean operators (AND, OR) to refine the search results effectively. The keywords and logical operators are explicitly detailed in Table 1 with the corresponding research questions. The search was conducted across the selected databases between the years 2019 and 2024, aiming to identify relevant studies published within the last 5 years.

Table 1 Search strings used in the digital libraries and databases

Quality criteria and study selection

To ensure transparency and minimize potential bias in the literature review on predictive digital twins for wind energy systems, quality and inclusion criteria were established. These criteria were applied to the selection of studies in accordance with PRISMA guidelines (Page et al. 2021). The selection process involved evaluating studies based on predefined criteria, as outlined in Table 2. The criteria emphasize studies employing research methodologies such as experimental studies, case studies, simulations, and theoretical frameworks relevant to predictive digital twin applications in wind energy. Only peer-reviewed articles published in academic journals, conference proceedings, and scholarly books from recognized publishers are included, with English as the publication language. The review captures the most current advancements and developments in the field over the last 5 years. Qualifying studies passed through multiple stages, including title, abstract, and keyword screening followed by full-text assessment. Figure 3 explicitly outlines each step of the study selection procedure.

Table 2 Systematic literature review inclusion and quality criteria
Fig. 3

Study selection diagram. Step 1: Study search in selected databases, Step 2: Removal of duplicate studies, Step 3: Filtering the studies according to quality criteria, Step 4: Screening the studies based on abstract and keywords, Step 5: Screening the studies based on full text, Step 6: Inclusion of relevant studies

Methodology extraction and synthesis

This section outlines the methodology extraction and synthesis process utilized in the literature review on predictive digital twin technology in wind energy systems. The objective was to gather and integrate information on the methods, models, and technologies employed in the selected studies. Each study was examined for the methods and frameworks implemented or conceptualized in predictive digital twin models for wind energy systems, including validation models. Other important aspects were the sources of data, modeling techniques, simulation tools, validation methods, and reasoning approaches applied. By identifying the methodologies and technologies employed in these studies, the review provides a comprehensive overview of the diverse approaches in predictive digital twin technology for wind energy. The synthesis of methodologies is given in the discussion ("Discussion" Section) and is directly aligned with the research questions posed at the outset of the review, allowing insights into specific aspects of predictive digital twin technology. The initial number of studies from the database search and the selected studies with complete citations are provided at the end of each research question in the results section as Tables 3, 5, 6, and 7.

Results

RQ1: What methodologies are commonly employed in developing predictive digital twin models for wind energy systems?

Developing predictive digital twin models for wind energy systems involves leveraging advanced methodologies to accurately simulate and forecast the performance and behavior of wind turbines. In this context, three main categories of methodologies are identified: physics-based modeling, data-driven approaches, and hybrid models. These categories were selected based on current research and applications within the field of wind energy systems (Vargas et al. 2019; Liu and Chen 2019).

Physics-based modeling

Physics-based modeling constitutes one of the core elements for wind energy systems, crucial for optimizing performance and ensuring reliability. This section elaborates on the key submodels involved: structure, aerodynamics, electric model, and control. These four submodels in wind energy system modeling are essential for design, analysis, and optimization.

The structural model covers the mechanical behavior of wind turbine components. Structural dynamics enable the investigation of wind turbines under various loads. Techniques such as finite element analysis enable the prediction of wind turbine responses. These dynamics involve modeling bending and torsion moments, along with tension, compression, and shear forces (Hernandez-Estrada et al. 2021; Jahani et al. 2022; Rajamohan et al. 2022). Studies also consider periodic loads that may cause fatigue effects (Fu et al. 2020; Njiri et al. 2019). Additionally, due to the high aspect ratio of wind turbines, aeroelastic effects such as flutter are accounted for (Chen et al. 2021; Li et al. 2020; Ma et al. 2019). By assessing all these factors, structural integrity and longer lifespan can be achieved. Material properties play a pivotal role in the structural model. Incorporating characteristics such as elasticity, damping, or strength is crucial for accurately representing the behavior of turbine components (Pradeep et al. 2019; Igwemezie et al. 2019; O’Leary et al. 2019). Given the diverse operating conditions and environmental effects, assessing the lifelong performance also relies on correctly represented material properties. In terms of structural modeling, tower and foundation design are additional aspects that need to be considered. The analysis of interactions with the ground, such as soil properties or seismic loads, is essential for onshore wind farm infrastructures (Ren et al. 2021; Zhao et al. 2019). Moreover, in offshore wind farms, the hydrodynamic effects on the wind turbines have a significant impact, requiring materials that can withstand harsher conditions such as high salinity causing oxidation (Mu et al. 2023). The influence of high-amplitude waves on freestream affects the dynamic pressure. 
Especially for floating offshore wind turbines, precise models are required to investigate complex dynamics involving surface waves and subsurface ocean currents (Zilong and Xiao Wei 2022; Porchetta et al. 2021).

The aerodynamics model predicts the interaction between the wind and the turbine blades, as well as the influence of the wind turbines on one another. Computational fluid dynamics uses numerical methods to analyze and solve fluid flow problems. Several techniques, such as finite volume or finite differences, are employed in the solution method. Differential equations, such as the Navier–Stokes equations, describe the relation between pressure, temperature, velocity, and density of a moving fluid (Qian et al. 2020; Vogel and Willden 2020; Hornshøj-Møller et al. 2021). Additionally, the Blade Element Momentum Theory combines two theories, blade element theory and momentum theory, to calculate aerodynamic forces and moments, considering airfoil characteristics and aerodynamic losses (Ledoux et al. 2021; Zhang and Qu 2021; Tahir et al. 2019). Dynamic inflow describes the delay with which the flow around a wind turbine reaches a new steady state after a change in the existing state, such as a sudden pitch angle variation or tower shadow. Accounting for this effect enhances the capability to capture the time-varying behavior of aerodynamic performance (Branlard et al. 2022; Papi et al. 2024; Ferreira et al. 2022). Although aeroelastic effects are mentioned in the previous paragraph, it should be noted that elastic deformations lead to changes in the aerodynamic characteristics of the wind turbine, causing unpredictable behavior (Kaviani and Nejat 2021). Boundary layer models, for both laminar and turbulent flow on the blade surface, are another consideration due to their effect on aerodynamic performance and noise generation (Sedaghatizadeh et al. 2019; Tian et al. 2019).
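The momentum-theory half of Blade Element Momentum Theory can be illustrated with the classical actuator-disc result, in which the power coefficient depends only on the axial induction factor a. The sketch below is illustrative and not drawn from the cited studies; the function names are assumptions. It recovers the well-known Betz limit of 16/27 by a simple grid search.

```python
def power_coefficient(a):
    """Actuator-disc power coefficient C_P = 4a(1 - a)^2 for axial induction factor a."""
    return 4.0 * a * (1.0 - a) ** 2


def thrust_coefficient(a):
    """Actuator-disc thrust coefficient C_T = 4a(1 - a)."""
    return 4.0 * a * (1.0 - a)


# Search induction factors between 0 and 0.5, the range in which
# simple momentum theory is usually considered valid.
candidates = [i / 1000.0 for i in range(0, 501)]
best_a = max(candidates, key=power_coefficient)

# The analytical optimum is a = 1/3, which gives the Betz limit C_P = 16/27.
betz_limit = power_coefficient(1.0 / 3.0)

print(f"best a on grid: {best_a:.3f}")
print(f"Betz limit:     {betz_limit:.4f}")
```

A full blade element momentum solver additionally iterates over blade sections using airfoil lift and drag data, which is beyond the scope of this sketch.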

The electric model simulates the electrical aspects of wind energy systems, focusing on power conversion and integration with the grid. It involves simulating the electrical properties of a generator, such as synchronous/asynchronous operation (Xiaoyu and Chao 2019), excitation control, and voltage regulation (Huang et al. 2019; Ravanji et al. 2020), to enhance power generation and grid stability. Additionally, the electric model includes models of power electronics such as rectifiers, inverters, and converters to link variable-speed turbines with the grid, ensuring efficient energy conversion. Moreover, the electric model examines grid connection dynamics, ensuring compliance with grid codes and managing reactive power, thereby facilitating the smooth integration of wind turbines into the electrical grid (Basit et al. 2020; Li et al. 2020).

The control model governs the operation of the wind turbine for optimum performance, safety, and reliability. Pitch control algorithms are one of the most popular methods for adjusting the blade pitch angle to optimize energy capture and respond to different wind conditions. By implementing pitch control algorithms, stable operations can be conducted across a wide range of environmental conditions (Navarrete et al. 2019; Sierra-García and Santos 2021; Gambier 2021). Similarly, yaw control is a method used to increase efficiency. With this control strategy, the turbine aligns with the incoming wind direction, capturing maximum energy with minimum structural load (Yang et al. 2021; Saenz-Aguirre et al. 2019; Liu et al. 2021). Another algorithm in the control model is the rotor speed regulation algorithm, which determines the optimum rotational speed through pitch control or generator torque control. This helps minimize mechanical stress while maximizing efficiency (Bashetty et al. 2020; Akbari et al. 2019). Additionally, fault detection and diagnostics algorithms enhance wind energy system operations. Monitoring system health and detecting anomalies can prevent catastrophic consequences. Fault detection and predictive maintenance enable cost-effective operation (Merizalde et al. 2019; Udo and Muhammad 2021; Hsu et al. 2020).
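The pitch-based rotor speed regulation described above can be sketched as a discrete proportional-integral (PI) controller. This is a generic textbook structure under stated assumptions, not a controller from the cited works: the class name `PitchPIController`, the gains, the units (rpm and degrees), and the pitch limits are hypothetical.

```python
class PitchPIController:
    """Discrete PI controller mapping rotor overspeed to a blade pitch command."""

    def __init__(self, kp, ki, dt, pitch_min=0.0, pitch_max=90.0):
        self.kp = kp                # proportional gain (deg per rpm, assumed)
        self.ki = ki                # integral gain (deg per rpm-second, assumed)
        self.dt = dt                # controller time step in seconds
        self.pitch_min = pitch_min  # lower pitch limit in degrees
        self.pitch_max = pitch_max  # upper pitch limit in degrees
        self.integral = 0.0

    def update(self, rotor_speed, rated_speed):
        # Positive error means overspeed, so the blades should pitch to shed power.
        error = rotor_speed - rated_speed
        candidate_integral = self.integral + error * self.dt
        raw = self.kp * error + self.ki * candidate_integral
        pitch = min(max(raw, self.pitch_min), self.pitch_max)
        # Simple anti-windup: only accept the new integral when not saturated.
        if pitch == raw:
            self.integral = candidate_integral
        return pitch


# Hypothetical rotor speed measurements (rpm) around a rated speed of 12 rpm.
controller = PitchPIController(kp=2.0, ki=0.5, dt=0.1)
for speed in [12.0, 12.5, 13.0, 12.2, 11.5]:
    command = controller.update(speed, rated_speed=12.0)
    print(f"rotor speed {speed:4.1f} rpm -> pitch command {command:5.2f} deg")
```

Below rated speed the command saturates at the minimum pitch, which mirrors the usual split between torque control below rated wind speed and pitch control above it.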

Data-driven approaches

In data-driven approaches for wind energy systems, several techniques can be applied depending on the characteristics of a dataset and the required prediction task. These methods are investigated in three main categories: regression models, machine learning algorithms, and statistical methods.

A regression model is a statistical method used to analyze the relationship between a dependent variable and one or multiple independent variables. The aim is to predict the value of the dependent variable based on the values of the independent variables (Fahrmeir et al. 2021). Regression models are commonly used for prediction, forecasting, and understanding the influence of different variables on an outcome (Liu and Chen 2019; Gualtieri 2019). Although there are several types of regression models in wind energy systems, three model types are commonly used: linear regression, polynomial regression, and ridge regression. Linear regression is mainly used to predict the linear relation of turbine power output based on variables such as wind speed, wind direction, or environmental effects (Barhmi et al. 2020; Dupré et al. 2020; López and Arboleya 2022). On the other hand, polynomial regression captures the nonlinear correlation of input variables with turbine performance parameters. This method enables the comprehension of complex nonlinear interactions among independent variables (Wang et al. 2021; Niu et al. 2022; Liu et al. 2021). Ridge regression ensures more stable predictions between input variables and output performance by including a regularization term to prevent overfitting. It is particularly useful when the correlation between independent variables is high, and it finds application in design, optimization, and forecasting (Naik et al. 2019; Zheng et al. 2023; Carneiro et al. 2022).
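The three regression variants can be compared on synthetic data with a closed-form least-squares fit. The sketch below uses only NumPy and is an illustration rather than a method from the cited studies: the synthetic cubic power relation, the noise level, the regularization strength `alpha`, and the helper names are assumptions.

```python
import numpy as np


def fit_ridge(features, target, alpha=0.0):
    """Closed-form ridge fit; alpha = 0 reduces to ordinary least squares."""
    design = np.column_stack([np.ones(len(features)), features])
    penalty = alpha * np.eye(design.shape[1])
    penalty[0, 0] = 0.0  # the intercept is not penalized
    return np.linalg.solve(design.T @ design + penalty, design.T @ target)


def predict(coef, features):
    design = np.column_stack([np.ones(len(features)), features])
    return design @ coef


def rmse(coef, features, target):
    return float(np.sqrt(np.mean((predict(coef, features) - target) ** 2)))


# Synthetic data: power grows with the cube of wind speed plus noise (assumed).
rng = np.random.default_rng(0)
wind_speed = rng.uniform(4.0, 11.0, size=200)
power = 0.3 * wind_speed ** 3 + rng.normal(0.0, 5.0, size=200)

linear_features = wind_speed.reshape(-1, 1)
cubic_features = np.column_stack([wind_speed, wind_speed ** 2, wind_speed ** 3])

linear_coef = fit_ridge(linear_features, power)            # linear regression
cubic_coef = fit_ridge(cubic_features, power)              # polynomial regression
ridge_coef = fit_ridge(cubic_features, power, alpha=10.0)  # ridge regression

print("linear RMSE:    ", rmse(linear_coef, linear_features, power))
print("polynomial RMSE:", rmse(cubic_coef, cubic_features, power))
print("ridge RMSE:     ", rmse(ridge_coef, cubic_features, power))
```

On the training data the unregularized polynomial fit always has the lowest error; the benefit of ridge regression shows up on unseen data, when the correlated polynomial features would otherwise lead to unstable coefficients.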

Machine learning algorithms are popular methods used in wind energy systems. By analyzing various datasets, including weather patterns, turbine operations, and maintenance records, machine learning algorithms can identify patterns to improve the overall efficiency of wind energy production (Elyasichamazkoti and Khajehpoor 2021). The most common algorithms used for this purpose include support vector machines (SVM), artificial neural networks (ANN), recurrent neural networks (RNN), and long short-term memory (LSTM) networks. A support vector machine is a supervised machine learning algorithm used for data classification and regression analysis. It can classify different wind conditions, enabling optimal wind settings (Li et al. 2020; Tuerxun et al. 2021; Lu et al. 2020). Artificial neural networks learn complex patterns in inputs such as wind speed or direction to predict turbine power output accurately with optimum parameters (Barhmi and Fatni 2019; Nielson et al. 2020; Sun et al. 2020). Recurrent neural networks are powerful tools, especially for learning sequential data and predicting sequential outputs. They can capture temporal dependencies and nonlinear dynamics in time-series data, allowing for accurate forecasts (Huang et al. 2021; Kisvari et al. 2021). Long short-term memory networks are specialized versions of recurrent neural networks that enable forecasts over extended time horizons (Banik et al. 2020; Shahid et al. 2021).

Statistical models are another popular method due to their interpretability and ability to capture temporal patterns. Some of the commonly used methods, specifically for wind energy systems, are autoregressive integrated moving average (ARIMA), vector autoregression, and seasonal decomposition. Autoregressive integrated moving average models consist of three main components: autoregression, differencing, and moving average (Shivani et al. 2019; Elsaraiti and Merabet 2021; Sheoran and Pasari 2022). This model can also be extended for non-stationary time series by accounting for seasonality. The method facilitates short term planning for turbine operation (Liu et al. 2021; Tyass et al. 2022). Unlike the previous method, vector autoregression is useful for dealing with multiple time series variables as they interact with each other. For instance, the influence of wind speed, temperature, and pressure on wind power generation, along with their dependencies with each other, can be investigated with this model (Keyantuo et al. 2021; Messner and Pinson 2019; Li and Wu 2020). Although seasonal decomposition is not a forecasting technique, it is an important technique for understanding the underlying components of time series. The main classical decomposition components are trend, seasonal, and residual components. This technique is widely used in wind energy systems (Qian et al. 2019; Simon et al. 2024; Mbuli et al. 2020; Yan et al. 2022).
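
The classical additive decomposition into trend, seasonal, and residual components can be sketched as follows. This is a minimal illustration on a synthetic hourly wind speed series with an assumed 24-sample cycle; the function name `classical_decompose` and the data are hypothetical.

```python
import numpy as np

def classical_decompose(series, period):
    """Additive decomposition: series = trend + seasonal + residual."""
    series = np.asarray(series, dtype=float)
    n = len(series)
    # Centred moving average; an even period needs half weights at both ends.
    if period % 2 == 0:
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    half = len(weights) // 2
    trend = np.full(n, np.nan)              # undefined at the series edges
    trend[half:n - half] = np.convolve(series, weights, mode="valid")
    # Seasonal component: mean detrended value at each phase of the cycle.
    detrended = series - trend
    seasonal_means = np.array(
        [np.nanmean(detrended[k::period]) for k in range(period)]
    )
    seasonal_means -= seasonal_means.mean()  # force the seasonal part to average zero
    seasonal = np.tile(seasonal_means, n // period + 1)[:n]
    residual = series - trend - seasonal
    return trend, seasonal, residual

# Synthetic (hypothetical) series: slow drift plus a daily cycle, ten days of hourly data.
t = np.arange(24 * 10)
wind_speed = 5.0 + 0.01 * t + 2.0 * np.sin(2.0 * np.pi * t / 24.0)
trend, seasonal, residual = classical_decompose(wind_speed, period=24)
```

The trend is left undefined (NaN) for the first and last half period, which is a known limitation of the classical moving-average approach.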

Hybrid modelling

In the evolving field of wind energy, hybrid modeling techniques have attracted significant attention as robust solutions that integrate physics knowledge with data-driven approaches. This section focuses on five main hybrid methodologies, covering forecasting, grid integration, fluid dynamics, structures, and predictive maintenance. Because these methodologies rely on both physical laws and machine learning, they can yield accurate and reliable models for predictive digital twin platforms for wind energy systems.

Hybrid forecasting models integrate machine learning algorithms with numerical weather prediction models for accurate wind speed predictions, which later yield power output forecasts for the wind turbines. Time series analysis employs methods like ARIMA, LSTM, or fuzzy logic with the numerical weather prediction models to forecast wind conditions (Kosovic et al. 2020; Zhang et al. 2020; Du et al. 2019). Ensemble methods are particularly useful for merging different models to quantify uncertainties (Zhang et al. 2019; Wang et al. 2022; Korprasertsak and Leephakpreeda 2019). Also, data assimilation methods like the Kalman filter or its variations are important for combining real-time sensor data with forecast models implemented in digital twin platforms (Aly 2020; Hur 2021). As wind energy production forecasting models enhance the supply side of grid integration, hybrid models for electricity load estimation can be utilized to estimate the demand side. Some commonly deployed hybrid algorithms include artificial neural networks, wavelet neural networks, Kalman filtering, convolutional neural networks (CNN), and LSTM models with physics-based models (Alhussein et al. 2020; Aly 2020; Lv et al. 2022; Mamun et al. 2020).
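
The data assimilation step can be illustrated with a scalar Kalman filter that corrects a forecast wind speed with incoming sensor readings. This is a minimal sketch under a random-walk state model; the function name `kalman_step`, the noise variances, and the readings are assumptions for illustration only.

```python
def kalman_step(x_est, p_est, measurement, q, r):
    """One predict/update cycle of a scalar Kalman filter with a random-walk model."""
    # Predict: the state is assumed to persist; uncertainty grows by process noise q.
    x_prior = x_est
    p_prior = p_est + q
    # Update: blend prior and measurement according to their variances.
    gain = p_prior / (p_prior + r)
    x_post = x_prior + gain * (measurement - x_prior)
    p_post = (1.0 - gain) * p_prior
    return x_post, p_post

# Hypothetical example: a 10 m/s forecast (variance 3) corrected by anemometer readings.
x, p = 10.0, 3.0
for z in [12.0, 11.5, 11.8]:
    x, p = kalman_step(x, p, z, q=1.0, r=4.0)
    print(f"estimate = {x:.2f} m/s, variance = {p:.2f}")
```

The gain weights the measurement more heavily when the forecast variance is large relative to the sensor variance `r`, so the choice of `q` and `r` determines how quickly the estimate follows the sensor.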

Hybrid aerodynamic models combine high-fidelity computational fluid dynamics (CFD) simulations with machine learning models for optimum aerodynamic performance. Computational fluid dynamics simulations are used to generate data for machine learning models, such as Gaussian process regression or support vector regression, to reduce computational costs (Kaya 2019; Morita et al. 2022). Similarly, reinforcement learning algorithms based on real-time data and computational fluid dynamics data are applied for control strategies (Dong et al. 2021, 2022). Additionally, data derived from computational fluid dynamics simulations are corrected with real-time data using Kalman filtering to improve accuracy (Liang et al. 2020; Liu and Liang 2021). A physics-informed neural network incorporates partial differential equations governing fluid dynamics, such as the Navier–Stokes equations, into the neural network architecture, allowing for interpretability (Choi et al. 2022; Cai et al. 2021).

Similar to hybrid aerodynamic models, hybrid structural models integrate finite element analysis with machine learning algorithms such as SVM, ANN, and CNN (Zhilyaev et al. 2022; Li et al. 2024; Cheng and Yao 2022). These analyses are used to optimize design parameters. Moreover, the multiphysics interaction of fluid flow with structures (fluid–structure interaction), integrating mechanical and fluid dynamics and enhanced with machine learning algorithms, enables the prediction of complex interactions in the environment (Kareem 2020; Miyanawala and Jaiman 2019a; Reddy et al. 2019b).

Hybrid predictive maintenance models combine various anomaly detection techniques with data-driven approaches to identify potential failures or estimate the remaining life of wind turbines. These hybrid models can be used to diagnose several components of the wind turbine using different sensor data and in-built predictive models (Wu and Ma 2022; Selvaraj and Selvaraj 2022; Buabeng et al. 2021). In wind turbines, the gearbox, bearings, and other rotating components are the main points of interest. Heydari et al. (2021) focus on hybrid modelling for gearboxes, which are often prone to failure. The proposed framework consists of several different methods: clustering filters, ant bee colony optimization algorithm, variational mode decomposition, multi-verse optimization algorithm, and wavelet transform. Combining these methods enables the detection of anomalies before a failure occurs. Supervisory control and data acquisition system data are primarily utilized for this purpose (Heydari et al. 2021; Beretta et al. 2021; Pandit et al. 2023; Maldonado-Correa et al. 2024). Another important predictive analysis is the estimation of the remaining useful life of the components for proactive maintenance scheduling (Zhang et al. 2022; Liu et al. 2021; Guo and Wang 2021).

Table 3 Primary studies related to research question 1

RQ2: How do predictive digital twin applications integrate and analyze data from diverse sources to enhance their predictive capabilities?

From the reviewed studies, the integration and analysis of data from diverse sources for predictive digital twin platforms primarily focuses on three main challenges: integration, execution, and monitoring. As depicted in Table 5, research on data integration emerged most prominently during the initial database search, highlighting the critical need for effective methods due to the heavy reliance of digital twin platforms on data from multiple sources to build accurate and comprehensive models. The integration of heterogeneous data is essential for enabling a functional platform (Correia et al. 2023). Regarding execution, advancements in computational power have facilitated efficient model analysis. However, the majority of reviewed studies concentrate on methods such as feature selection and dimensionality reduction to manage large datasets (Qi et al. 2021). Real-time monitoring of digital twin platforms enables proactive decision-making and accurate forecasting, which are crucial for real-world applications. As indicated in Table 5, monitoring represents the second most studied aspect in the literature (Correia et al. 2023). The commonly employed techniques and methods are summarized in Table 4.

Data integration

Data is one of the key elements for predictive digital twin platforms. Integrating data from diverse sources into the digital twin platform requires several processes (Zhang and Qu 2021). The first step is identifying relevant data sources, which can include IoT devices, sensors, databases, external application programming interfaces, or historical trends (Minerva et al. 2020; Jacoby and Usländer 2020; Kaur et al. 2020; Platenius-Mohr et al. 2020). Each data source may have its own structure, including structured data from SQL databases, unstructured text files and images, or semi-structured data from various application programming interfaces (Bonney et al. 2022; Xu et al. 2019; Benzon et al. 2022).

These collected data need to go through cleaning and transformation methods to be useful and meaningful for further analysis. Data cleaning techniques address missing values, duplicates, outlier detection, and inconsistencies within the dataset (Alasadi and Bhaya 2017). Transformation methods may include normalization, discretization, and dimensionality reduction (García et al. 2016). After preprocessing the data with cleaning and transformation methods, the data from different schemas and structures need to be aligned to have a unified format (Lv et al. 2020; Liu et al. 2023; Nguyen et al. 2013). Schema matching algorithms and ontology alignment enable the reconciliation of data schemas and types from diverse sources (Mei et al. 2020; Mohamed et al. 2023; Booshehri et al. 2021).

The different sources may provide temporal and spatial data. The alignment of these data is essential for reliable operation. Temporal alignment methods, such as time series alignment or event synchronization, ensure consistency across time-stamped data streams (Sharma and Balachandra 2019; Yue et al. 2024). On the other hand, spatial alignment techniques may include georeferencing or coordinate transformation for integrating geospatial data (Majidi Nezhad et al. 2019; Ma et al. 2024). As the data are aligned and synchronized, data fusion algorithms, such as Kalman filters or ensemble methods, can fuse information from diverse sources while considering uncertainties (Lio et al. 2021; da Silva et al. 2021).
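
Temporal alignment of two time-stamped streams can be sketched as resampling both onto a shared time grid. The example below is a minimal illustration using linear interpolation; the function name `align_streams`, the sampling times, and the values are hypothetical, and linear interpolation is only one of several possible alignment choices.

```python
import numpy as np

def align_streams(t_a, v_a, t_b, v_b, step):
    """Resample two time-stamped streams onto a shared, evenly spaced time grid."""
    t_a, v_a = np.asarray(t_a, dtype=float), np.asarray(v_a, dtype=float)
    t_b, v_b = np.asarray(t_b, dtype=float), np.asarray(v_b, dtype=float)
    # Only the overlapping time window is kept, so no value is extrapolated.
    start = max(t_a[0], t_b[0])
    stop = min(t_a[-1], t_b[-1])
    # The small tolerance keeps the end point when it lies exactly on the grid.
    grid = np.arange(start, stop + 1e-9, step)
    # Linear interpolation assumes each stream varies smoothly between samples.
    return grid, np.interp(grid, t_a, v_a), np.interp(grid, t_b, v_b)

# Hypothetical streams sampled at offset times (seconds).
t_scada = [0.0, 60.0, 120.0, 180.0]
power = [0.0, 6.0, 12.0, 18.0]
t_mast = [30.0, 90.0, 150.0, 210.0]
wind = [10.0, 10.0, 10.0, 10.0]
grid, power_on_grid, wind_on_grid = align_streams(t_scada, power, t_mast, wind, step=30.0)
```

Once both streams share the same time stamps, they can be passed to a fusion step such as a Kalman filter.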

Feature selection and dimensionality reduction

Dimensionality reduction and feature selection enhance the predictive capabilities of digital twin platforms by focusing on the most relevant and informative features while keeping data complexity to a minimum. In digital twin applications, large amounts of data are necessary for reliable operation. These data include many input features, which can cause overfitting, increased model complexity, higher computational demand, and decreased interpretability (Marti-Puig et al. 2019). Feature selection aims to find the most relevant subset of features for predicting the target variables (Qadir et al. 2021). Filter and wrapper approaches are commonly used frameworks. Filter methods assess the relevance of the features independently of the predictive model, whereas wrapper methods evaluate different combinations of features, yielding slower but more precise results (Liu and Chen 2019; de Sá et al. 2020). In wind forecasting algorithms, feature selection methods are deployed for comprehensive results with minimum computational resource demand (Mir et al. 2020).
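
A filter method can be sketched as ranking each candidate feature by a model-independent score. The example below uses the absolute Pearson correlation with the target as that score; this is only one possible criterion, and the function name `rank_features_by_correlation` and the synthetic data are assumptions for illustration.

```python
import numpy as np

def rank_features_by_correlation(X, y, names):
    """Filter-style ranking: score each feature by |Pearson correlation| with the target."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    scores = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    order = np.argsort(scores)[::-1]         # highest score first
    return [(names[j], float(scores[j])) for j in order]

# Synthetic (hypothetical) data: only wind speed drives the power output.
rng = np.random.default_rng(1)
n = 500
wind_speed = rng.uniform(3.0, 12.0, n)
temperature = rng.normal(15.0, 5.0, n)
humidity = rng.uniform(30.0, 90.0, n)
power = 0.5 * wind_speed**3 + rng.normal(0.0, 20.0, n)

X = np.column_stack([wind_speed, temperature, humidity])
ranking = rank_features_by_correlation(
    X, power, ["wind_speed", "temperature", "humidity"]
)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Pearson correlation captures only linear dependence, so a filter based on it can underrate features with a strongly nonlinear effect; a wrapper method would instead retrain the predictive model on candidate subsets.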

Similarly, dimensionality reduction aims to reduce the number of input dimensions while retaining essential information. Principal component analysis (PCA) is a technique that projects high-dimensional data onto a lower-dimensional subspace defined by principal components. These components are then used in data-driven algorithms in wind energy systems (Deng et al. 2021; Wang et al. 2020; Gu et al. 2019; Kong et al. 2015). T-distributed stochastic neighbor embedding is a nonlinear dimensionality reduction technique that preserves the local structure of the data in a lower-dimensional space. In wind energy systems, T-distributed stochastic neighbor embedding is used to reduce the dimensionality of data clusters to identify patterns (Shen et al. 2019; Khan et al. 2019; Kouadri et al. 2020).
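
The projection performed by PCA can be sketched with a singular value decomposition of the centred data matrix. This is a minimal illustration on synthetic sensor channels; the function name `pca` and the data are hypothetical, and features measured on different scales would usually be standardised first.

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its leading principal components via SVD."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                   # centre each feature
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    explained_ratio = (S**2 / np.sum(S**2))[:n_components]
    scores = Xc @ components.T                # coordinates in the reduced space
    return scores, components, explained_ratio

# Synthetic (hypothetical) data: three channels driven by one latent factor plus noise.
rng = np.random.default_rng(2)
latent = rng.normal(size=300)
X = np.column_stack([latent, 2.0 * latent, -latent])
X += 0.01 * rng.normal(size=X.shape)

scores, components, explained_ratio = pca(X, n_components=2)
print("explained variance ratio:", explained_ratio)
```

In this constructed case almost all variance lies along the first component, so a single score per sample would carry nearly the same information as the three original channels.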

Real time monitoring

Efficient utilization of wind power depends on the real-time monitoring and optimization of turbine performance. Critical parameters like rotor speed, power output, and component temperature must be continuously monitored (He et al. 2022; Chakraborty et al. 2023). Supervisory control and data acquisition (SCADA) systems are commonly used technologies that interface with the turbines. The knowledge gained from such tools in real-time monitoring can be further implemented into the digital twin platform to increase predictive capabilities (Maldonado-Correa et al. 2020; Gonzalez et al. 2019). Another important aspect of real-time data analysis techniques is to detect anomalies in turbine performance, which can address potential component failures (Xiang et al. 2022; Morrison et al. 2022). Advancements in these techniques can evolve into predictive maintenance to predict the needs of individual components. By integrating subsystem models and real-time environment and turbine parameters, potential issues can be addressed, allowing proactive maintenance planning (Hsu et al. 2020; Wang et al. 2020; Shin et al. 2021). Several ongoing studies specifically focus on these areas, where methods and enabling technologies can be transferred to wind energy systems (van Dinter et al. 2022; Falekas and Karlis 2021; Zhong et al. 2023).
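
A very simple form of real-time anomaly detection is to compare each new reading with the statistics of a trailing window. The sketch below flags readings that deviate by more than a chosen number of standard deviations; the function name `flag_anomalies`, the window length, the threshold, and the temperature values are assumptions for illustration, and the reviewed studies use more elaborate detectors.

```python
from statistics import mean, stdev

def flag_anomalies(values, window=20, threshold=3.0):
    """Flag samples deviating from the trailing-window mean by more than threshold sigmas."""
    flags = []
    for i, value in enumerate(values):
        history = values[max(0, i - window):i]
        if len(history) < window:
            flags.append(False)               # not enough history yet
            continue
        mu, sigma = mean(history), stdev(history)
        flags.append(sigma > 0 and abs(value - mu) > threshold * sigma)
    return flags

# Hypothetical component temperature (deg C) with one injected spike at index 40.
temperature = [60.0 + 0.5 * ((i % 5) - 2) for i in range(60)]
temperature[40] = 75.0
flags = flag_anomalies(temperature)
anomalous_indices = [i for i, flagged in enumerate(flags) if flagged]
print(anomalous_indices)
```

A spike also inflates the window statistics for the following samples, which temporarily reduces sensitivity; robust statistics or model-based residuals are common ways to limit this effect.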

Real-time monitoring also plays a vital role in assessing wind resources. Continuous monitoring of wind speed, direction, and other meteorological data enables the assessment of available wind resources in real-time (Lio et al. 2021). With the methods mentioned in "Data integration" Section, integrating various types of data with the meteorological models built into the digital twin enables more precise wind forecasts. These forecast data can then be used to adjust turbine settings, such as yaw angles or pitch angles, to maximize energy production (Chen et al. 2019; Moness and Moustafa 2020; Tu et al. 2022). Chen et al. (2019) propose a real-time feedback blade pitch control system for vertical axis wind turbines. To optimize the pitch angle of the blade, the suggested equation relies on real-time flow velocity, azimuth angle of the blade, and tip speed ratio. This real-time feedback pitch angle control system increases overall performance. Predictive digital twin applications can create a feedback loop, comparing predictions with actual outcomes to refine models iteratively. This continuous learning process enhances the predictive capabilities of the digital twin over time, enabling more accurate and reliable predictions (Fernandez-Gauna et al. 2022; Yang et al. 2019). The technologies explained in "RQ3: What are the key features and technologies that facilitate real time wind energy systems through predictive digital twin?" Section enable the remote diagnosis of issues and implementation of control strategies in real-time from a centralized digital twin platform (Zhao et al. 2020; He et al. 2021).

Table 4 Summary of different methodologies for integrating and analyzing data from diverse sources
Table 5 Primary studies related to research question 2

RQ3: What are the key features and technologies that facilitate real time wind energy systems through predictive digital twin?

Real-time wind energy systems are important for optimizing wind farm performance. By integrating advanced technologies, real-time operating platforms facilitate decision-making processes. To identify the key features and technologies that enhance real-time wind energy systems through predictive digital twins, a comprehensive literature review was conducted, primarily focusing on academic journals and conference papers on digital twins and wind energy systems. The technologies were evaluated based on their relevance, impact on real-time monitoring and prediction, as well as overall contribution to system efficiency (Stadtmann et al. 2023; Qi et al. 2021). Figure 4 summarizes the key features and technologies enabling real-time operations.

IoT sensors for data acquisition

In the complex landscape of wind energy systems, Internet of Things (IoT) sensors are important components that facilitate the collection of essential data for optimum performance and informed operation. These sensors are positioned across wind turbines and the operating environment to monitor several vital parameters, providing operators with insights into operational conditions (Li et al. 2023; Wang et al. 2023; Liew et al. 2020).

Advanced multi-sensor platforms are employed in wind energy systems to capture a diverse range of data. These sensors encompass various technologies, such as anemometers for wind speed and direction, thermocouples for temperature monitoring, humidity sensors for atmospheric moisture levels, accelerometers for vibration analysis, and power meters for electrical output measurement (Karad and Thakur 2021). Additionally, emerging technologies like Light Detection and Ranging (LiDAR) and Sonic Detection and Ranging (SODAR) support precise wind profiling and turbulence detection. LiDAR allows for the detection of turbulent wind before it negatively influences turbine performance, thus optimizing energy production (Guo et al. 2022; Dimitrov et al. 2019). SODAR, on the other hand, provides advantages in measuring the wind profile at different altitudes and supporting the anemometers mounted on wind turbines (Yang et al. 2020; Silva et al. 2023).

Communication networks

Communication networks in wind energy systems should be designed to ensure reliable data transfer so that the collected sensor data can be used for comprehensive analysis (Zheng et al. 2019). Advanced standardized communication protocols such as MQTT and OPC UA allow sensor data to be transmitted efficiently and securely. These protocols enable reliable data transmission over various network infrastructures, facilitating access to critical operational insights. Depending on the requirements, centralized control systems or cloud-based platforms are possible solutions (Haghshenas et al. 2023; Sasikala et al. 2021).

Low-latency communication networks are essential for data transmission between operating subsystems and the central control system. Technologies such as 5G (fifth-generation cellular network technology) and standards such as time-sensitive networking prioritize the reduction of latency (Fahim et al. 2022; Isto et al. 2020; Nguyen et al. 2021; Farkas et al. 2018). Isto et al. (2020) focus on 5G networks for digital twin applications in remote machinery control systems. Two application scenarios are demonstrated: video feedback and haptic feedback. Compared to LTE (Long-Term Evolution), lower delay and jitter are observed in both cases. Wind turbines generate large volumes of data, including sensor readings, environmental parameters, and performance metrics. High-bandwidth communication networks, such as fiber-optic cables or high-speed wireless links, are essential for efficiently transmitting this data to predictive digital twin systems for analysis (Wu et al. 2021; Mashaly 2021). Security is another critical aspect of communication networks. Encryption protocols are employed to safeguard data integrity and protect against cyber threats, ensuring the confidentiality and security of sensitive information (Mccarty et al. 2023; Liu et al. 2020).

Edge computing and cloud computing

Edge computing enables data acquisition through sensors and IoT devices, as discussed in "IoT sensors for data acquisition" Section. Although edge devices often have limited computational power, they can still be useful for local processing. These computational sources can be programmed to support the predictive models implemented in the digital twin platform. Through edge computing platforms like NVIDIA Jetson or Intel Movidius, rapid adjustments can be made based on insights from analytics, thereby achieving optimum performance parameters more quickly (Saad et al. 2020; Hungud and Arunachalam 2020; Li et al. 2021).

On the other hand, cloud computing provides scalability through platforms capable of processing large amounts of data and performance parameters. These powerful frameworks support big data analytics for in depth trend analysis, enhancing the performance of predictive models dynamically. The dynamic update of predictive models supports long-term optimization. Integrating edge devices allows centralized management and condition monitoring of wind energy platforms, providing comprehensive insights for stakeholders (Fahim et al. 2022; Olatunji et al. 2021; Zhang et al. 2022).

Human machine interface

The human–machine interface (HMI) is one of the essential features enabling seamless real time operation. This technology focuses on the interaction between human operators and complex systems (Kumar and Lee 2022). The interface should provide intuitive visualization of real time data along with trends and future predictions. These data enable predictive analytics and provide a comprehensive view of system status. Visual elements may vary from simple charts to 3D models (Qin et al. 2020; Evergreen 2020). Some commonly used libraries and applications include WebGL, Plotly, and Unity (Kandemir et al. 2023; Haghshenas et al. 2023). These programs may incorporate interactive control panels where operators can adjust turbine settings or monitor performance. The selection of the required interaction should be planned according to operational conditions. Touchscreens, augmented reality, or other virtual controls allow for intuitive interaction with quick adaptability to a changing environment (Stadtmann et al. 2023; Lalik and Watorek 2021; Kilimann et al. 2019).

The human–machine interface, combined with decision support systems based on predictive insights, provides operators with contextual information and recommendations for decision-making. Another important role of the human–machine interface is to enable operational training as a support tool for new operators. Interactive tutorials, help menus, and troubleshooting guides assist operators in adapting to optimum operating conditions (Erdei et al. 2022; Kaarlela et al. 2020; Bucchiarone 2022).

Fig. 4 Real-time operation facilitating features and technologies

Table 6 Primary studies related to research question 3

RQ4: What are the challenges commonly encountered in wind energy systems when implementing predictive digital twin solutions?

The integration of predictive digital twin solutions in wind energy systems enhances efficiency and reliability through advanced analytics. However, implementing these solutions comes with significant challenges that need to be addressed to realize their potential. Several review papers identify the most common key challenges in this domain, including data quality assurance, model complexity, model order reduction, validation, and calibration. These challenges are categorized based on their impact on the development, deployment, and execution of predictive digital twin solutions (Rodríguez et al. 2023; Hartmann et al. 2018; Liu et al. 2021).

Data quality assurance

Data quality assurance is a critical aspect of predictive digital twin platforms. High-quality data enables improved predictive accuracy, effective condition monitoring, and higher economic viability (Avanzini and Eriksson 2021; Eriksson and Markussen 2023). However, maintaining high-quality data presents several challenges, including managing complex data from diverse sources, addressing sensor reliability issues, quantifying uncertainty, and resolving data completeness problems.

The acquisition of reliable data from heterogeneous sensors and IoT devices requires continuous sensor calibration. In digital twin platforms, implementing periodic calibration algorithms is necessary to prevent inaccurate data (Ward et al. 2021; Koo and Yoon 2024). The uncertainties in a digital twin platform may originate from various sources, including measurement errors, variations in wind properties, and operational parameters such as rotor speed within the models. Techniques like Monte Carlo simulation and Bayesian inference are commonly used to quantify the magnitude and distribution of these uncertainties (Moghadam and Nejad 2022; Chen et al. 2021; Adedipe et al. 2020; Hirvoas et al. 2021). In the event of network failures or sensor malfunctions, implemented failover mechanisms ensure continuous data availability (Hung et al. 2022).
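
Monte Carlo uncertainty quantification can be sketched by propagating an uncertain wind speed through a power model. The example below uses the idealised relation P = 0.5 ρ A Cp v³ and ignores cut-in, rated, and cut-out limits; the function name `monte_carlo_power` and all parameter values are assumptions for illustration.

```python
import random
from statistics import mean, quantiles

def monte_carlo_power(v_mean, v_std, n_samples=20000, seed=0,
                      rho=1.225, rotor_area=5000.0, cp=0.45):
    """Propagate Gaussian wind speed uncertainty through P = 0.5 * rho * A * Cp * v^3."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        v = max(0.0, rng.gauss(v_mean, v_std))   # wind speed cannot be negative
        samples.append(0.5 * rho * rotor_area * cp * v**3 / 1e6)   # power in MW
    cuts = quantiles(samples, n=20)              # cut points at 5 %, 10 %, ..., 95 %
    return mean(samples), cuts[0], cuts[-1]

# Hypothetical measurement: 8 m/s with a standard deviation of 0.5 m/s.
mean_power, p05, p95 = monte_carlo_power(8.0, 0.5)
print(f"mean = {mean_power:.3f} MW, 5-95 % interval = [{p05:.3f}, {p95:.3f}] MW")
```

Because the power depends on the cube of the wind speed, the mean of the sampled power is slightly higher than the power evaluated at the mean wind speed, which is one reason a single point estimate can understate the expected output.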

Model complexity and model order reduction

In predictive digital twin platforms for wind energy systems, sophisticated models introduce significant computational challenges. High-fidelity models can capture complex interactions and nonlinear behavior within and between wind turbines, but they demand substantial computational resources for large scale simulations. To address this issue, model order reduction techniques can achieve reliable predictive capabilities while reducing the need for extensive computational resources. However, these techniques also require validation with high-fidelity models (Taira et al. 2020).

Modelling types are discussed in "IoT sensors for data acquisition" Section. In physics-based modelling, computational fluid models are used to simulate the airflow around turbine blades, aiming to capture several effects such as fluid flow or wake formation (Siddiqui et al. 2019; Andersen and Murcia Leon 2022). For detailed analysis, these high-resolution models are required with intensive processing power. Similarly, in the structural dynamics of the components, finite element analysis is used to comprehend deformation or failure points under different loading conditions (Liang et al. 2023; Zhao et al. 2023; Gözcü and Dou 2020). Moreover, these two models may require coupled simulation to understand the interaction between aerodynamic forces and structural responses (fluid structure interaction), which becomes more computationally intensive (Sayed et al. 2019; Grinderslev et al. 2021; Liu et al. 2019). On the other hand, stochastic elements are necessary to achieve a realistic environmental simulation. Methods mentioned in "Data quality assurance" Section may address this issue with an additional computational cost. To address these temporal and spatial resolution challenges, some possible solutions are utilizing high-performance computing resources, using efficient data handling systems, or adapting model order reduction techniques (Michalakes 2020; Veers et al. 2023).

Several model order reduction techniques can be adapted depending on the application (Kumar and Ezhilarasi 2023). Proper orthogonal decomposition identifies the most significant modes by decomposing the system into orthogonal modes. This method can be utilized for analyzing wake dynamics, velocity fields, or structural dynamics (Siddiqui et al. 2020; Premaratne et al. 2022; Zhao et al. 2021). Similarly, the balanced truncation approach, used in linear time-invariant systems, seeks to achieve a balance between controllability and observability while reducing the number of states (Lin et al. 2020). This method is useful in the modeling and control design of systems (Bui 2023; Morovati et al. 2021; Al-Iedani and Gajic 2020). Data-driven reduced order models include techniques inherited from deep learning algorithms, which can model nonlinear turbine aerodynamics, wind turbine interactions, and unsteady fluid–structure interactions with reliable predictive capability and lower computational demand. Some commonly used algorithms in these models are CNN, LSTM, and ANN. These data-driven reduced order models can be combined with different methods, enabling hybrid models for enhanced performance and accuracy (Wu et al. 2021; Zhang et al. 2022; Ali and Cal 2020; Siddiqui et al. 2020; Tabib et al. 2022).

Validation and calibration

Continuous calibration and validation of models and sensors on a predictive digital twin platform are crucial to ensure accurate and reliable platforms under real environmental conditions (Pimenta et al. 2020; Jonscher et al. 2022; Lee and Fields 2021; Bergua et al. 2023). However, several challenges need to be considered in this context. Addressing limited historical data, ensuring sensor reliability, and dynamically adapting to varying conditions pose significant challenges.

In newly deployed digital twin platforms, historical data might be scarce, limiting the ability to validate model performance. Collecting data from the system and operating environment takes time. To overcome this challenge, the physics-based models explained in "Physics based modeling" Section can be used to simulate the dynamics in question (Vahidi and Porté-Agel 2022; Wang et al. 2022). The generated synthetic data can then be used for calibration in the initial phase. Bayesian inference techniques can be coupled with data assimilation methods to integrate synthetic data generated from physics-based models. This approach enhances the predictive capabilities of the model with new, unforeseen data (Valikhani et al. 2024; Hirvoas et al. 2022; Sousa and Gorlé 2019; Poterjoy 2022).

The data collected from sensors plays a key role in the predictive capabilities of digital twin platforms. However, these sensors may experience calibration drift over time due to environmental factors and mechanical wear. Therefore, it is necessary to employ advanced calibration techniques, such as periodic sensor recalibration, to ensure data accuracy (Han et al. 2020; Schwegmann et al. 2023). Additionally, model-based fault detection and isolation algorithms, such as observer-based approaches, can be utilized to detect sensor anomalies and correct measurements (Habibi et al. 2019; Liu et al. 2021; Rajpoot et al. 2021). Kalman filters with different variations or deep learning algorithms can further enhance data reliability based on the system dynamics (Hur 2019; Cho et al. 2021).

Dynamic system adaptation methods can be integrated into digital twin platforms to address validation and calibration issues. Implementing advanced control-oriented techniques such as model predictive control or adaptive robust control can calibrate the platform to replicate the dynamic behavior of adaptive control systems (Petrović et al. 2021; Collet et al. 2021; Mahmoud and Oyedeji 2019). Additionally, parameter estimation techniques such as least squares, extended Kalman filter, or sparse identification of nonlinear dynamics (SINDy) can be used to update the model parameters based on real-time sensor feedback under dynamic operating conditions (Ghareveran and Yazdizadeh 2019; Wang et al. 2022; Barhate et al. 2024). In this regard, data-driven approaches like reinforcement learning or neural networks can adaptively calibrate the system based on observed behaviors (Saenz-Aguirre et al. 2020; Xie et al. 2023; Saenz-Aguirre et al. 2019).
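
Parameter updating from streaming sensor feedback can be sketched with recursive least squares, the online counterpart of the least squares method mentioned above. The example tracks a single power coefficient in the idealised model P = Cp · (0.5 ρ A v³); the function name `rls_update`, the forgetting factor, and the noise-free synthetic measurements are assumptions for illustration.

```python
def rls_update(theta, p, phi, y, forgetting=0.99):
    """One scalar recursive least-squares step for the model y = theta * phi."""
    gain = p * phi / (forgetting + phi * p * phi)
    theta_new = theta + gain * (y - theta * phi)   # correct by the prediction error
    p_new = (p - gain * phi * p) / forgetting      # shrink, then inflate by forgetting
    return theta_new, p_new

# Hypothetical use: estimate Cp from power measurements at varying wind speeds.
rho, area, true_cp = 1.225, 5000.0, 0.42
theta, p = 0.0, 1000.0          # initial guess with a deliberately large uncertainty
for v in [6.0, 7.5, 9.0, 8.2, 10.1, 7.7]:
    phi = 0.5 * rho * area * v**3 / 1e6    # available wind power in MW
    y = true_cp * phi                      # noise-free synthetic measurement
    theta, p = rls_update(theta, p, phi, y)
print(f"estimated Cp = {theta:.4f}")
```

A forgetting factor below one discounts older samples, which lets the estimate follow slow changes in the parameter at the cost of higher sensitivity to measurement noise.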

Table 7 Primary studies related to research question 4

Discussion

This section provides a deeper analysis of the most recent trends, methods, and challenges in predictive digital twin platforms for wind energy systems. The current state of this field is examined through four main discussion points, which target common methodologies, integration and analysis of various data sources, key features and technologies, and encountered challenges.

Commonly employed methodologies are handled in three main groups: physics-based modeling, data-driven approaches, and hybrid models. In physics-based modeling, the primary research areas include the mechanical behavior of turbine components, aeroelastic effects, and material properties. This research can extend to offshore wind turbine structures, investigating materials under such conditions and the effects of surface waves and ocean currents on structural dynamics. In aerodynamic models, numerical analysis in fluid flow problems has a solid foundation. Similarly, the aeroelastic effect on aerodynamic characteristics and dynamic inflow can be investigated further for better energy efficiency in wind farms. In terms of electrical models, grid integration is attracting attention due to the implementation of new renewable resources. Control models mainly focus on three aspects: pitch control, yaw control, and predictive maintenance. Predictive maintenance with fault detection algorithms is popular in different fields, and there is potential to adapt these technologies for wind energy systems.

In data-driven approaches, studies implement regression models, machine learning algorithms, and statistical methods. Regression models relate dependent variables to independent variables. Among data-driven approaches, machine learning algorithms represent the most popular research area, offering a variety of algorithm types. These models can capture nonlinear temporal and spatial features quite well; however, many of them lack interpretability. To enhance the capabilities of machine learning algorithms, statistical models are incorporated to obtain reliable patterns.

Hybrid models are among the most popular approaches in predictive digital twin platforms for wind energy systems. For hybrid forecasting models, most research focuses on wind speed forecasting over varying time intervals, which is essential for operational planning, grid stability, and maintenance scheduling. In terms of fluid dynamics, hybrid models are commonly applied to optimize blade shapes using high-fidelity models, to investigate and mitigate real-time wake effects, and to adapt pitch and yaw angle control. Regarding structural models, common research areas include predictive maintenance based on structural health monitoring data, in combination with early warning systems for structural failures. Additionally, fatigue and load-bearing capacity during the design phase are two popular areas for structural optimization. One of the most significant developments in both structural and aerodynamic hybrid models is the physics-informed neural network, which embeds the partial differential equations governing the physical laws into neural networks. This enables the investigation of complex flow patterns or structural responses with different material properties and conservation laws. Hybrid predictive maintenance models are another common implementation in predictive digital twin platforms. Estimating the remaining useful life with hybrid models is popular in various fields, and research is ongoing for wind turbines in particular, through vibration and thermal analysis. With the increasing popularity of renewable energy systems, grid integration has become an important field of research that aims to enhance grid stability by aligning wind power generation with real-time demand forecasts.
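One simple way to picture a hybrid model is residual correction: a physics-based baseline makes the first prediction, and a data-driven component learns what the baseline misses. The sketch below is an illustrative assumption rather than a method from the reviewed studies. It uses the standard rotor power relation P = 0.5 ρ A Cp v³ as the baseline and an ordinary least squares line fitted to the residuals. The rotor diameter, power coefficient, function names, and synthetic measurements are hypothetical, and the baseline is only meaningful below rated wind speed.

```python
import math

RHO_AIR = 1.225  # kg/m^3, standard sea-level air density


def physics_power_kw(wind_speed, rotor_diameter=100.0, cp=0.45):
    """Physics baseline P = 0.5 * rho * A * Cp * v^3, in kW (hypothetical rotor)."""
    area = math.pi * (rotor_diameter / 2.0) ** 2
    return 0.5 * RHO_AIR * area * cp * wind_speed ** 3 / 1000.0


def fit_line(xs, ys):
    """Closed-form ordinary least squares fit of y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def fit_hybrid(wind_speeds, measured_kw):
    """Fit a linear correction to the physics residuals and return a predictor."""
    residuals = [m - physics_power_kw(v) for v, m in zip(wind_speeds, measured_kw)]
    a, b = fit_line(wind_speeds, residuals)

    def predict(v):
        # Hybrid prediction = physics baseline + learned residual correction
        return physics_power_kw(v) + a + b * v

    return predict


# Synthetic "measurements": the baseline minus a made-up linear loss term.
speeds = [4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
measured = [physics_power_kw(v) - (20.0 + 5.0 * v) for v in speeds]
hybrid = fit_hybrid(speeds, measured)
print(hybrid(7.5), physics_power_kw(7.5))
```

In practice the linear correction would be replaced by a richer learner, but the structure stays the same: the physics term keeps the prediction interpretable, and the data-driven term absorbs site-specific effects.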

Integrating data from diverse sources for analysis and improving predictive capabilities requires significant attention. Current studies show that integrating data from different sources and structuring these data in a comprehensible way is a trending area. Data preprocessing techniques are well established; however, there is still a need for research on unsupervised algorithms for processing data. Because many sensors stream data at the same time, reliable frameworks and protocols with alignment methods are necessary. Continuous real-time monitoring is crucial for reliable predictive models, as it enables continuous learning that iteratively increases model accuracy. However, real-time monitoring requires models that retain only the essential information. Therefore, feature selection methods are used to capture the necessary inputs, while dimensionality reduction reduces the number of input dimensions.
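To make the alignment step concrete, the following minimal sketch resamples sensor streams recorded at different rates onto a shared set of timestamps using linear interpolation. This is only one possible alignment method, chosen here for illustration. The stream names, timestamps, and values are hypothetical.

```python
from bisect import bisect_left


def interpolate_at(times, values, t):
    """Linearly interpolate a time-sorted series at time t, clamping at the ends."""
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    i = bisect_left(times, t)  # first index with times[i] >= t
    t0, t1 = times[i - 1], times[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def align_streams(reference_times, streams):
    """Resample every (times, values) stream onto reference_times.

    Returns one dict per reference timestamp, keyed by stream name.
    """
    return [
        {
            name: interpolate_at(times, values, t)
            for name, (times, values) in streams.items()
        }
        for t in reference_times
    ]


# Hypothetical streams: wind speed every 2 s, vibration every 1 s.
streams = {
    "wind_speed": ([0.0, 2.0, 4.0], [6.0, 8.0, 10.0]),
    "vibration": ([0.0, 1.0, 2.0, 3.0, 4.0], [0.1, 0.2, 0.1, 0.3, 0.2]),
}
rows = align_streams([1.0, 3.0], streams)
print(rows)
```

Linear interpolation suits slowly varying signals. For fast or event-like signals, nearest-sample matching or windowed aggregation may be a better fit.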

Communication protocols such as MQTT and OPC UA are commonly employed for efficient and secure data transmission. Many studies focus on reducing latency in communication networks because predictive digital twin platforms need time-sensitive networking. These technologies enable cloud computing with large amounts of data. Transferring confidential data through them requires strong encryption protocols to ensure secure operation. Human–machine interface modules can be developed on top of these data. As part of a digital twin platform, the ability to interact with systems through different means is essential for operator awareness and for adapting to optimal conditions.

Data quality assurance is one of the challenges in digital twin platforms because heterogeneous data from different sensors and databases carry uncertainties. Attention is being paid to continuous sensor calibration to eliminate errors. In addition, probabilistic simulations and ensemble methods can quantify the uncertainties inherent in the models. Another challenge is the model complexity that comes with high-fidelity modeling, which is ill-suited to real-time operation. Recent studies focus on model order reduction techniques to overcome this complexity, and using hybrid models within reduced-order models is a trending approach. A further challenge is model validation and calibration. To overcome this issue, a combination of historical data and collected sensor data is mainly used with different types of algorithms.
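As a small illustration of how an ensemble can expose model uncertainty, the sketch below evaluates several candidate models at the same input and reports the mean prediction together with the spread across members. The three toy members and their coefficients are hypothetical. A real ensemble would consist of separately trained or differently parameterized models.

```python
import statistics


def ensemble_predict(models, x):
    """Return (mean, sample standard deviation) of the member predictions at x."""
    predictions = [model(x) for model in models]
    return statistics.mean(predictions), statistics.stdev(predictions)


# Hypothetical ensemble: three cubic power models with different coefficients.
members = [
    lambda v: 0.28 * v ** 3,
    lambda v: 0.30 * v ** 3,
    lambda v: 0.32 * v ** 3,
]
mean_kw, spread_kw = ensemble_predict(members, 10.0)
print(mean_kw, spread_kw)
```

A wide spread signals that the members disagree at that operating point, which can be used to flag predictions that deserve less trust.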

Despite the analysis provided in this study, several limitations need to be addressed to ensure an objective approach to predictive digital twin platforms for wind energy systems.

Physics-based, data-driven, and hybrid models introduce inherent biases associated with each methodology. For instance, physics-based models may provide robust and repeatable results in simulating mechanical behaviors and aerodynamic characteristics; however, they often rely on idealized assumptions, lacking real-world complexities. Data-driven approaches, particularly deep learning algorithms, can identify complex patterns, but as mentioned earlier in the text, they suffer from a lack of interpretability. Moreover, data-driven approaches are highly sensitive to the quality of training data. Hybrid models attempt to combine the strengths of these two methods, but they often inherit the limitations of both methodologies, leading to potential overfitting and computational inefficiencies.

The integration of heterogeneous data sources is a critical challenge that impacts the reliability of predictive models. Despite advances in data processing and alignment methods, the quality of data from diverse sources remains a significant concern. Sensor calibration errors, data transmission latency, and inconsistencies in data formats can introduce significant issues such as noise or biases. Furthermore, this study emphasizes the need for real-time monitoring and continuous learning, which necessitates robust data quality assurance mechanisms. However, the implementation of such mechanisms brings several challenges, such as handling missing or corrupted data points. High-fidelity models often result in high computational demands, making real-time application challenging. Model order reduction techniques may lead to the loss of critical details necessary for precise predictions. In the context of offshore wind turbines, these limitations introduce additional layers of complexity.

Quantifying uncertainties within predictive models remains a critical challenge. Probabilistic simulations and ensemble methods offer potential solutions, but they also introduce computational complexity and demand high-quality data. The scalability of the different models across wind energy systems and geographic locations is another limitation. Most studies focus on specific case studies under controlled environments, which may not be generalizable to other settings. For example, the performance of predictive maintenance algorithms developed for onshore turbines may differ when applied to offshore turbines due to different operational conditions. This study falls short of providing comprehensive strategies for managing uncertainties and dealing with the scalability of the methods.

This study’s meta-analysis shows that the current trend in predictive digital twin platforms for wind energy systems is to address inherent challenges with hybrid models. Additionally, model development primarily consists of combining various models, incorporating both physics-based models and machine learning algorithms for better accuracy and interpretability.

Conclusions and future work

In conclusion, the literature review on predictive digital twins for wind energy systems highlights their significant potential in the renewable energy sector. Key findings from the literature indicate that predictive digital twins can draw on various modeling types, including methods inherited from physics and machine learning algorithms. This capability allows for identification of potential failures, enhanced predictive capabilities, and informed decision-making processes. However, the successful implementation of predictive digital twins in wind energy systems requires overcoming several challenges. These include the need for:

  • High-fidelity data acquisition to ensure that data are collected precisely and accurately for comprehensive analysis. These data enable the training of models to support reliable decision making.

  • Standardized, reliable communication networks to align with industry standards, facilitating secure data exchange and interoperability.

  • Integration of diverse data sources, such as sensors, IoT devices, historical databases, and external APIs, into a unified system to create a comprehensive view. This process requires methods such as data normalization and synchronization.

  • Cybersecurity measures to protect the integrity and confidentiality of the data involved.

  • Improved human–machine interfaces to ensure that the insights generated by predictive digital twins are effectively perceived by operators and decision-makers.

Future research should focus on enhancing the precision and reliability of predictive models by exploring hybrid approaches that combine physical and data-driven techniques. For instance, integrating finite element analysis with deep neural networks could significantly strengthen model capabilities. Developing methodologies to quantify and reduce uncertainties is essential for reliable operations, and techniques such as Bayesian inference and Monte Carlo simulations can facilitate robust predictive analysis. Incorporating diverse data sources, including historical trends and real-time environmental inputs, alongside raw sensor data, will improve model capabilities. Additionally, the scalability and adaptability of predictive digital twin models across various systems and industries are crucial. This involves reviewing data compatibility, modularity, and interoperability. Common Data Models (CDM) and data lakes can help address compatibility and integration challenges. Moreover, focusing on APIs and middleware will enable better data exchange.
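As a minimal sketch of Monte Carlo uncertainty propagation, the example below draws wind speeds from an assumed normal distribution, pushes each sample through a model, and summarizes the resulting output distribution. The cubic toy model, the mean of 8 m/s, the standard deviation of 1 m/s, the sample count, and the seed are illustrative assumptions, not values from the reviewed literature.

```python
import random


def monte_carlo(model, mean, std, n_samples=5000, seed=0):
    """Propagate a normally distributed input through model by random sampling.

    Returns the output mean and the 5th and 95th nearest-rank percentiles.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    outputs = []
    for _ in range(n_samples):
        v = max(0.0, rng.gauss(mean, std))  # wind speed cannot be negative
        outputs.append(model(v))
    outputs.sort()
    out_mean = sum(outputs) / n_samples
    p05 = outputs[int(0.05 * (n_samples - 1))]
    p95 = outputs[int(0.95 * (n_samples - 1))]
    return out_mean, p05, p95


# Toy model: output proportional to the cube of wind speed.
out_mean, p05, p95 = monte_carlo(lambda v: v ** 3, mean=8.0, std=1.0)
print(out_mean, p05, p95)
```

Because the cubic response is convex, the mean output exceeds the output at the mean wind speed (8³ = 512), which is one reason point forecasts alone can mislead.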

Overall, predictive digital twins stand as a promising technology in the wind energy sector, which facilitates a shift towards greater sustainability. Continued innovation in this area will support the goal of achieving global renewable energy targets aligned with the United Nations Sustainable Development Goals.