Learning phase in a LIVE Digital Twin for predictive maintenance

Digital Twins are essential in establishing intelligent asset management for an asset or machine. They can be described as the bidirectional communication between a cyber representation and a physical asset. Predictive Maintenance is dependent on the existence of three data sets: Fault History, Maintenance/Repair History, and Machine Conditions. Current Digital Twin solutions can fail to simulate the behaviour of a faulty asset. These solutions also prove to be difficult to implement when an asset’s fault history is incomplete. This paper presents a novel methodology, LIVE Digital Twin, for developing Digital Twins with a focus on Predictive Maintenance. The four phases, Learn, Identify, Verify, and Extend, are discussed. A case study analyzes the relationship between component stiffness and vibration in detecting the health of various components. The Learning phase is implemented to demonstrate the process of locating a preliminary sensor network and developing the fault history of a Sand Removal Skid assembly. Future studies will consider fewer simplifying assumptions and expand on the results to implement the subsequent phases.


Introduction
The developing field of Digital Twins (DT) and the Industry 4.0 paradigm are expanding the foundation of digitalization to achieve intelligent prognostics and diagnostics in industry [1,2]. The Digital Twin concept was introduced in NASA's Apollo Project and can be described as a seamless integration of information and data between the cyber and physical spaces [3,4]. Juarez et al. [5] suggested that the precursor to the DT is a Digital Model, a visual portrayal of a physical device without the automated exchange of information. Designing a DT with self-adjustability [6], self-calibration [7] and self-correction [8] allows for services such as Predictive Maintenance (PdM) and Structural Health Monitoring (SHM). PdM is a proactive maintenance strategy specializing in defect inspection and failure prevention that performs operational behaviour-based analytics on equipment [9,10]. This strategy relies on the availability of three data sets: 1) Fault History, the data containing the asset's normal operating and failure scheme; 2) Maintenance/Repair History, the chronological information of the tasks performed, components replaced, etc.; and 3) Machine Conditions, data which, under the assumption that health is time varying, should reveal patterns in the asset's health [11]. SHM is the assessment and evaluation of an asset's structural integrity by means of data processing and sensor communication to develop intelligent maintenance decisions [12]. However, there are four main challenges in establishing SHM: 1) robust technology to sense the structural response, 2) communication between sensors, 3) feature recognition to represent damage in the asset, and 4) algorithms to process and detect the magnitude of damage [13]. DTs are capable of PdM and SHM while being independent of reconfiguration at a supervisory level [14][15][16].
To promote efficient PdM and SHM, an asset is to be equipped with a sensor network which maximizes the probability of damage detection (POD) [13,17]. Furthermore, an intelligent sensor network enables optimal process control, condition monitoring and asset management [18]. The variety of sensors installed on the physical entity, such as pressure sensors, accelerometers, and gyroscopes, is contingent on the application. Strategic sensor placement, especially in inaccessible locations, is valuable because of the cost involved [19]. An interesting technique is to embed sensors into a component using Additive Manufacturing [18]. The sensor data is processed and analyzed to predict the Remaining Useful Life (RUL) of an asset. RUL predictions can be categorized as 1) physical model-based, 2) knowledge-based, 3) statistics model-based, or 4) data-driven methods [20,21]. Algorithmic approaches are proposed for RUL to develop policies for maintenance opportunities and maintenance actions [22]. Li et al. [23] developed a deep convolutional neural network (CNN) to estimate RUL. Without prior expertise in prognostics, raw sensor data can be input to the CNN to estimate the RUL based on learned representations.
Typically, physical model-based and statistics model-based DTs can achieve real-time monitoring to establish a mapping between the DT and the physical asset [24][25][26]. Physical and statistics model-based methods can produce knowledge of a healthy asset, though their solutions fail to simulate the behaviour of faulty assets. Data-driven methods are only applicable when Fault History is readily available to train the models. In most instances, Fault History is difficult and time-consuming to generate [27]. The comprehensive solution, LIVE Digital Twin, is a structured methodology for developing DTs focussed on PdM which compensates for the challenges identified in the literature. The LIVE DT terminology refers to the four principal phases: Learn, Identify, Verify, Extend. The phases are dependent on model-based solutions and multi-physics simulation in a digital environment covering various design, manufacturing, and inspection activities. Traditionally, structured digital design, digital manufacturing, and digital inspection have used model-based solutions and multi-physics simulation. There are numerous examples of methodologies developed for digital design at various scales [26,28], traditional and non-traditional digital manufacturing processes [29,30], and digital inspection [31]. In contrast to previous solutions, LIVE DT establishes a real-time connection and asset management with intelligent sensor allocation.
The four phases of LIVE DT, Learn, Identify, Verify, Extend, employ digital models of various fidelities to simulate the behaviour of the physical asset. High-Fidelity (HF) models are ultra-realistic virtual representations of a physical asset created in sophisticated Computer Aided Engineering (CAE) solvers. Low-Fidelity (LF) models simulate the behaviour of the physical asset with simple geometry, such as beams. Typically, HF models are more commonplace due to their high accuracy. LF models are cost effective and offer an advantageous analytical speed, which is favourable during the conceptual phase [28,32]. However, it is necessary to calibrate the LF response against the HF model to use it reliably in the methodology. Furthermore, the simulations in the methodology are divided into two categories: 1. Forward simulation (FWD): Given an initial external cause, the simulation determines the response. For example, when a cantilever beam that is fixed at one end is subjected to an external force, solving for the resulting displacement of the free end is a forward simulation. 2. Backward simulation (BWD): Given a response, the simulation determines the feasible external cause. For example, when the free end of the cantilever has been displaced in some direction, solving for the associated applied loading is a backward simulation. The four phases of the LIVE Digital Twin methodology are as follows.
Learn: In the first phase, the HF model is designed and implemented in various FWD simulations to develop a fundamental understanding of the physical asset. The generated data sets are considered the expected response of the asset and contribute to its Fault History and Machine Conditions. The preliminary data is used to train and calibrate the LF model for the subsequent phases of LIVE DT. At this stage, it is also possible to determine a preparatory sensor network.
Identify: In the second phase, the LF model is applied in various FWD simulations to discover further insights and the asset's limitations. A crucial aspect of this phase is the development and calibration of the LF model. Simple Structural Beams (SSB) is a method of representing components in an LF model with simple beam elements [32]. An LF model constructed with SSB can be analyzed in a finite element environment. Optimizing the LF model relies on data collected in the previous phase to calibrate the various parameters which influence the response, such as the material properties or the geometry of the beam cross-section. Rather than calibrating the LF model to produce identical results, the focus is to obtain a model with sufficient likeness and minimal discrepancy to validate the findings. The LF model is employed in more rigorous testing and faulty cases and contributes to the continued development of the asset's Machine Conditions data set. The studied limitations and responses are analyzed to recognize the most cost-effective sensor locations. The designed sensor network is then applied to both the HF model and the physical twin.
Verify: In the third phase, the LF model is applied in a BWD simulation to evaluate various faulty cases. In some instances, large data sizes, internet security, and latency can become concerns for data storage in a cloud. Edge computing or other related technologies allow for data cleansing processes which are responsible for evaluating, validating, and correcting data near the data source.
Extend: In the final phase, the DT is applied for the purpose of PdM. A bidirectional communication between the DT and the physical entity is established. The data collected from the asset is the input for the HF model in a BWD simulation. Based on the Fault History and current data, the LIVE DT can detect incipient faults in the asset. The DT will also suggest maintenance and repair plans to maintain the health of the asset and circumvent failure. Figure 1 summarizes the LIVE DT terminology and presents the ideal outcome of each phase.
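The cantilever example used to distinguish forward and backward simulation can be made concrete with a minimal sketch. It assumes Euler-Bernoulli beam theory with a point load at the free end, for which the tip displacement is F L^3 / (3 E I). The function names and all numerical values are illustrative assumptions and are not taken from the case study.

```python
def forward_tip_displacement(force_n, length_m, youngs_modulus_pa, second_moment_m4):
    """FWD: given the applied tip force (cause), return the tip displacement in m."""
    return force_n * length_m**3 / (3.0 * youngs_modulus_pa * second_moment_m4)


def backward_tip_force(displacement_m, length_m, youngs_modulus_pa, second_moment_m4):
    """BWD: given the observed tip displacement (response), return the tip force in N."""
    return 3.0 * youngs_modulus_pa * second_moment_m4 * displacement_m / length_m**3


# Assumed illustrative values: 1 m steel beam with a 20 mm x 20 mm square section.
E = 200e9                      # Young's modulus of steel, Pa
I = 0.02 * 0.02**3 / 12.0      # second moment of area of the square section, m^4
L = 1.0                        # beam length, m

delta = forward_tip_displacement(100.0, L, E, I)   # cause -> response
force = backward_tip_force(delta, L, E, I)         # response -> cause
print(f"tip displacement: {delta * 1e3:.2f} mm, recovered force: {force:.1f} N")
```

In this closed-form case the backward problem has a unique answer. In a full assembly several causes may explain the same response, which is why the text refers to a feasible cause.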
The objective of this analysis is to study the physical phenomenon of vibration and its relationship to health to develop a sensor network aimed to monitor a physical asset. To achieve the research goal, the following section will discuss the methodology and set up for an HF model. Methods of introducing fault and simulating modal analysis in a CAE software will be presented. This is followed by a case study of the Learning phase with a subassembly of a Sand Removal Skid. The results will be presented and discussed to realize a sensor network to monitor the asset.

Structure modelling
The vibration of a structural asset is an important characteristic which reveals insight into the general health of its components. Even slightly underperforming components can cause deviations in the natural vibration, which is undesirable for any structural design [32]. This motivates the design of a sensor network which is extremely efficient in data collection. A crucial aspect of this research is the ability to predict and simulate fault in a digital model of a Sand Removal Skid (Fig. 2). The physical asset is constructed with four flange connections which are typically fastened with screws. Rather than representing the flange connections, F1-F4, in the digital model with screws, the alternative is to remodel the components of interest with spring connections (Fig. 2).
The inclusion of stiffness effects is critical to the analysis; however, the presence of smaller components negatively affects the computation time. Fine details on the model are insignificant to the result, yet they are computationally expensive to mesh. Meshing has a significant role in the CAE process and in ensuring simulation accuracy. In general, finer meshes predict more accurate results while coarser meshes predict more approximate values. Therefore, omitting certain contours or small components to develop a practical finite element model benefits overall productivity and efficiency. This case study presents the relation of the flange connection's degrading stiffness to analytical discrepancies revealing the current health of the model. To study the effects of stiffness, the stiffness of the flange connection, K (N/m), is calculated by
$$K = \frac{n A E}{L} \tag{1}$$
The connection is dependent on the performance of n screws, each with a cross-sectional area A (m²). The length of the screw, L (m), is the distance between the parallel flange surfaces in the HF model. The Young's modulus, E (Pa), is based on the assumed material of steel.
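As a worked illustration of the flange stiffness calculation, the sketch below assumes the connection behaves as n identical screws loaded axially in parallel, so that K = n A E / L with the symbols defined above. The screw count, diameter, and length are hypothetical values chosen for illustration only and are not the dimensions of the Sand Removal Skid.

```python
import math


def flange_stiffness(n_screws, screw_diameter_m, screw_length_m, youngs_modulus_pa):
    """Axial stiffness K = n * A * E / L of n identical screws in parallel (N/m)."""
    area_m2 = math.pi * screw_diameter_m**2 / 4.0   # cross-sectional area A of one screw
    return n_screws * area_m2 * youngs_modulus_pa / screw_length_m


# Assumed illustrative values: eight 12 mm steel screws spanning a 40 mm flange gap.
K = flange_stiffness(n_screws=8, screw_diameter_m=0.012,
                     screw_length_m=0.040, youngs_modulus_pa=200e9)
print(f"K = {K:.3e} N/m")
```

The value returned here would be assigned to the spring connection that replaces the screws in the digital model.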

Simulating fault and determining sensor locations
A modal analysis is selected in this case study to reveal the behaviour of a physical asset under predetermined faulty cases. The stiffness of the flange connection is proportional to the vibration response. Therefore, the objective of this Learning phase is to prescribe sensor locations which effectively detect the deviations in vibration. In CAE software, the vibration can be represented with the natural frequencies (Hz) and amplitudes of vibration (m). The first frequency in a set is considered the fundamental natural frequency [32]. The initial data set becomes a benchmark set against which behavioural deviations are recognized. As the response deviates, it is expected that the fundamental natural frequency will diverge from the benchmark set. Deviations in natural frequency place an asset at risk of resonating with other mechanical components, such as pumps or motors. In extreme cases of resonance, the asset is exposed to catastrophic failure. The calculated flange stiffnesses (Eq. (1)) are considered healthy flange connections which will perform consistently with the benchmark data set. Various faulty cases are achieved by reducing the flange stiffness value, K, by a predetermined percentage. Faulty cases are not limited and can include numerous combinations of compromised flanges. The faulty response can be compared to the benchmark data to develop candidate sensor locations to monitor specific flanges. A healthy case is simulated with a number of pairs, i, of mode shapes and frequencies. Multiple faulty cases are simulated, each with an equal number of pairs. An algorithm is employed to contrast all faulty cases with the single healthy case. In turn, the ith pair of each faulty case is directly compared to the ith pair of the healthy case. A pair that deviates the most across all data sets is considered a faulty case's sensitive pair. A faulty case may contain multiple sensitive pairs.
The algorithm then recognizes the sensitive pair as that which produces the greatest percent deviation in comparison to the other pairs in its own data set. The percent difference can be calculated using the standard equation
$$\%\,\text{difference} = \frac{\left| f_i - f_0 \right|}{f_0} \times 100 \tag{2}$$
where f_i is the faulty frequency and f_0 is the healthy frequency at the equivalent ith position. There are two fundamental definitions which are used when comparing mode shapes: peaks and neutral points.
The peaks are the locations of maximum displacement or magnitude of a mode shape. The neutral points are locations with zero displacement. In the literature, the neutral points are more commonly referred to as nodes. However, the nomenclature is altered here to avoid confusion with the finite element definition of a node. The number of sensors depends on the application and the size of the model. Typically, the computational expense increases exponentially with the number of installed sensors [31]. The case study will design a network where a single sensor monitors an individual flange connection. This selection method maximizes the POD of the network. There are two main criteria for the optimal design of this sensor network: 1) There is a maximum level of resolution by locating the sensors at the peak of the mode shape. 2) There is minimal cross-sensitivity by avoiding interference between the locations of the peaks. The first criterion requires that a candidate sensor be located on the peak of the sensitive mode shape. Sensors which are placed on neutral points suffer from limited data collection over their lifetime due to the lack of vibration. The second criterion aims to avoid interference between the locations of peaks of other unique sensitive mode shapes. Cross-sensitivity occurs when the other sensitive mode shapes experience peaks in similar or identical locations on the digital model. It is preferred that a sensor be on a peak of its own sensitive mode shape while being on a neutral point of the other cases. Thus, the composition of the individual sensors creates an effective sensor network.

Analysis
The necessary simplifying assumptions were developed to streamline the data collection process. Firstly, it was assumed that the tank is empty, or that the mass of fluid is negligible. If the tank were filled with liquid or slurry, one would need to account for the sloshing, pre-stress, and added mass effects in the tank. Relaxing this assumption would drastically affect the natural frequencies of vibration, while the peak locations would be unaffected. Secondly, the ends of the pipes are assumed to be long pipes which continue past the boundaries of the substructure. Therefore, the pipe-ends have a symmetric boundary condition to have in-plane displacement and rotation. Thirdly, it was assumed that the assembly is composed of alloy steel. This decreases the setup time required when manipulating HF models. Lastly, in any physical asset, 100% stiffness can be considered a healthy asset. For this case study, it is assumed that a 50% reduction in stiffness is appropriate to signify fault in the asset. This assumes that the asset can perform to standard until the flange connection stiffness falls to 50%. Accordingly, the sensor position should in theory be the most sensitive when the flanges are at 50% stiffness. However, in application this range may be too large, and maintenance may be required well before the stiffness falls to 50%. For the purposes of data presentation and visualization in this study, 50% is ideal.
The model is further prepared by omitting various finer details in an effort to improve the quality of the mesh and reduce the computational time. This includes removing trivial details such as holes and fillets, and components such as screws or washers. Due to their small size, they have a minuscule influence on the simulation accuracy. In the early phases of developing a DT, rapid analysis of the HF model is required. Basic conditions which reflect realistic component interactions are mimicked, such as the defining boundary conditions for the vertical supports. The data collection process includes the modal analysis of the healthy case and the four faulty cases. Each faulty case features a single compromised flange. The entire analysis considers five data sets for comparison to study the sensitive mode shapes and develop four unique sensor locations. Additional model information includes the total number of elements, 111,986, as well as the total number of node locations, 225,303.
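The structure of the five data sets (one healthy case and four faulty cases, each with a single flange at 50% stiffness) can be illustrated with a deliberately simplified stand-in for the HF model. The sketch below is not the finite element model of the case study: it assumes a fixed-free chain of four lumped masses joined by four springs, and the stiffness and mass values are hypothetical. It only shows how reducing one spring stiffness shifts the natural frequencies.

```python
import numpy as np


def natural_frequencies_hz(spring_stiffness, masses):
    """Natural frequencies (Hz) of a fixed-free chain of masses joined by springs."""
    k = np.asarray(spring_stiffness, dtype=float)
    m = np.asarray(masses, dtype=float)
    n = len(m)
    stiffness = np.zeros((n, n))
    for i in range(n):
        stiffness[i, i] += k[i]              # spring i joins mass i to mass i-1 (or ground)
        if i > 0:
            stiffness[i - 1, i - 1] += k[i]
            stiffness[i - 1, i] -= k[i]
            stiffness[i, i - 1] -= k[i]
    # Mass-normalise so that a symmetric eigenvalue solver can be used.
    inv_sqrt_m = 1.0 / np.sqrt(m)
    dynamic = stiffness * np.outer(inv_sqrt_m, inv_sqrt_m)
    omega_squared = np.linalg.eigvalsh(dynamic)   # ascending eigenvalues, (rad/s)^2
    return np.sqrt(omega_squared) / (2.0 * np.pi)


healthy_k = [4.5e9] * 4        # assumed healthy stiffness of springs F1-F4, N/m
masses = [50.0] * 4            # assumed lumped masses, kg

data_sets = {"healthy": natural_frequencies_hz(healthy_k, masses)}
for flange in range(4):
    faulty_k = list(healthy_k)
    faulty_k[flange] *= 0.5    # a single compromised flange at 50% stiffness
    data_sets[f"F{flange + 1}"] = natural_frequencies_hz(faulty_k, masses)

for name, freqs in data_sets.items():
    print(name, np.round(freqs, 1))
```

In the case study the same five-case layout is produced by the CAE modal analysis, with mode shapes stored alongside each frequency.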

Results
The collected frequencies are first plotted to inspect the trend and observe the response (Fig. 3). This analysis collected the first fifteen mode shapes of the model, all of which resided under 340 Hz. Sensors are not capable of monitoring a broad range of frequencies. Various sensors are only effective within a small range under 500 Hz. Though the four faulty cases follow the same trend as the healthy case, the deviations are studied further. Figure 4 presents the percent differences between the healthy and faulty cases, calculated using Eq. (2). The response of a faulted flange does not deviate consistently from the healthy case. The outstanding points at any ith mode shape are then considered candidate mode shapes and require further study.
An algorithm is then employed to analyze the frequencies of each flange to find the sensitive mode shapes. For example, the pair at the 10th position is the sensitive mode shape of faulty case F2 as it possesses a 14.4% difference, the highest in its data set. Faulty cases F1 and F4 also have candidate mode shapes at the 10th position. However, F2's sensitive mode shape takes precedence due to its larger difference. An optimal sensor is located on the peak of the sensitive mode shape. This may not always be feasible due to factors including inaccessibility or cross-sensitivity. To increase the number of candidate locations, node deflections which measure within 5% of the mode shape's peak are considered. In the event that the peak is not feasible, the algorithm selects the location which satisfies the criteria and minimizes cross-sensitivity. Table 1 tabulates the sensitive mode shapes, the frequency at which they occur, and the peak locations. The four listed peak locations compose the sensor network. Figures 5-8 illustrate the peak locations of the affiliated sensitive mode shapes.
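The comparison of healthy and faulty frequencies can be sketched as follows. This is one possible reading of the selection rule, not the authors' implementation: the percent difference is taken as |f_i - f_0| / f_0 x 100, and precedence is resolved greedily so that when two faulty cases compete for the same mode position, the case with the larger difference keeps it and the other falls back to its next largest deviation. The frequencies in the example are invented for illustration.

```python
import numpy as np


def percent_difference(faulty_hz, healthy_hz):
    """Percent difference between faulty and healthy frequencies, mode by mode."""
    faulty_hz = np.asarray(faulty_hz, dtype=float)
    healthy_hz = np.asarray(healthy_hz, dtype=float)
    return np.abs(faulty_hz - healthy_hz) / healthy_hz * 100.0


def sensitive_modes(healthy_hz, faulty_cases):
    """Assign one mode index (0-based) to each faulty case, largest deviation first."""
    ranked = []
    for case, freqs in faulty_cases.items():
        for mode, diff in enumerate(percent_difference(freqs, healthy_hz)):
            ranked.append((float(diff), case, mode))
    ranked.sort(key=lambda item: item[0], reverse=True)

    assigned, used_modes = {}, set()
    for diff, case, mode in ranked:
        if case not in assigned and mode not in used_modes:
            assigned[case] = (mode, diff)   # the larger deviation takes precedence
            used_modes.add(mode)
    return assigned


# Invented frequencies (Hz) for three modes and two faulty cases.
healthy = [100.0, 200.0, 300.0]
faulty = {"F1": [98.0, 180.0, 290.0], "F2": [99.0, 170.0, 297.0]}

assigned = sensitive_modes(healthy, faulty)
for case, (mode, diff) in sorted(assigned.items()):
    print(f"{case}: mode {mode + 1}, {diff:.1f}% difference")
```

In this example both cases deviate most at the second mode, so F2 keeps it and F1 is assigned the mode with its next largest deviation.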

Discussion
Table 1 Critical mode shapes, frequencies, and peak/peak locations
For the case study, the model is to be equipped with a network composed of four optimal sensor locations. As per Table 1, four unique sensors located on the peaks of their respective sensitive mode shapes are selected. This study simulated faulty cases in which a flange is compromised to 50% health. As the spring connections degrade, the ability to withstand vibration reduces and the asset's response deviates from the expected data. The prescribed sensor network is extremely sensitive to the related deformation at each location. The Sand Removal Skid is to be equipped with a sensor network composed of four unique sensors. The sensor locations can be verified by the criteria outlined in Sect. 2.2. Criterion 1 selects the node which best represents the peak of the mode shape. In this study, the sensitive nodes which exhibit at most 5% deviation from the true peak are selected as candidate sensor locations. As per Table 1, some faulty cases contain a larger population of sensitive nodes. This can be attributed to the varying mesh density associated with the components. As discussed, smaller components can only be finely meshed while other components have the liberty to be coarsely meshed. Certain sensitive mode shapes in this study contained peaks on denser meshes of the assembly and thus produced more sensitive nodes. Omitting or modifying these smaller components could not be justified. Criterion 2 requires that the peak location be a neutral point for the other sensitive mode shapes. This is to avoid cross-interference between functioning sensors. Allocation is important to maximize POD while minimizing interference [8]. In scenarios where peak selection is difficult, the peak is allowed to exist within the bottom 5% of deformation of all opposing mode shapes. If Criterion 2 is not satisfied, statistical error may occur in the data collection process.
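The two placement criteria can be expressed as a short filter over nodal mode-shape magnitudes. The sketch below is an interpretation under stated assumptions, not the procedure used to produce Table 1: "within 5% of the peak" is read as at least 95% of a mode shape's maximum magnitude, "bottom 5% of deformation" is read as at most 5% of an opposing mode shape's maximum, and ties are broken by the lowest cross-response. The function name, tolerances as parameters, and the example magnitudes are hypothetical.

```python
import numpy as np


def select_sensor_nodes(mode_shapes, peak_tol=0.05, neutral_tol=0.05):
    """Pick one node per sensitive mode shape using the two placement criteria.

    mode_shapes: array of shape (n_modes, n_nodes) holding the displacement
    magnitude of each sensitive mode shape at every node.
    Returns a list with one node index per mode, or None if no node qualifies.
    """
    shapes = np.abs(np.asarray(mode_shapes, dtype=float))
    normalised = shapes / shapes.max(axis=1, keepdims=True)   # each peak becomes 1.0
    selected = []
    for own in range(normalised.shape[0]):
        others = np.delete(normalised, own, axis=0)
        cross = others.max(axis=0) if others.size else np.zeros(normalised.shape[1])
        near_peak = normalised[own] >= 1.0 - peak_tol          # first criterion
        near_neutral = cross <= neutral_tol                    # second criterion
        candidates = np.flatnonzero(near_peak & near_neutral)
        if candidates.size == 0:
            selected.append(None)
            continue
        # Among qualifying nodes, prefer the least cross-sensitive one.
        selected.append(int(candidates[np.argmin(cross[candidates])]))
    return selected


# Invented magnitudes for two sensitive mode shapes on a five-node model.
shapes = [[0.00, 0.97, 1.00, 0.02, 0.01],
          [0.03, 0.60, 0.04, 1.00, 0.98]]
nodes = select_sensor_nodes(shapes)
print(nodes)
```

For the second mode shape in this example the true peak is at node index 3, but index 4 lies within the peak tolerance and has a lower response in the first mode shape, so it is preferred.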
Figure 9(a) illustrates the cross-interference between two poorly selected peaks. The neutral points are in blue while the peaks are in red. Multiple sensors equipped on a single vibrating component experience cross-sensitivity and will have difficulty identifying the compromised flange. An ideal arrangement of sensitive mode shapes is shown in Fig. 9(b). The peaks of both sensitive mode shapes exist on opposite sides of the assembly. Therefore, a sensor can be certain about the cause of vibration in the model. When considering the physical model, there are a variety of options for the sensor network regarding position and collection. It is preferred to have fewer sensors in more optimal and sensitive positions with a minimal level of cross-sensitivity. Moreover, it is neither practical nor cost effective to install a surplus of sensors on the structure. Not only does it become costly very quickly, but placing many sensors on the model will return copious amounts of data. These factors restrict SHM from achieving a higher efficiency. Data reduction and condensation are important points when considering the cost of data transfer and data analyses [33]. Jalalpour et al. [13] suggested a method where sensors sequentially relay data to reduce the overall data computation in one instance. Naturally, there are two categories of data to analyze: those which are available directly from the sensors and those which can be estimated or derived from the sensors [34]. Estimated parameters include the asset's RUL, which reveals knowledge of its future condition.
At this juncture, a digital model is equipped with a preliminary sensor network capable of health monitoring. Furthermore, the study introduces the development of the Machine Conditions data set. The conclusion of the Learning phase produces a foundational understanding of the asset as well as various Machine Conditions data. This is a preparatory state in developing a LIVE DT for predictive maintenance. The next phase, Identify, furthers the development of a LIVE DT and realizes an LF model of the Sand Removal Skid assembly. In general, LF models are computationally inexpensive in comparison to HF models. It is preferred to use SSB to develop the LF model, where components or subassemblies can be represented using a single beam. LF accuracy is achieved by calibrating the response to the data sets collected in this case study. The Identify phase will also further develop the Machine Conditions as well as begin to reveal the asset's Fault History, which is critical in establishing the potential for self-calibration, self-adjustability, and self-correction. As data collection progresses and faulty cases are introduced, the preliminary sensor locations can be modified.

Conclusion
This paper presents the novel methodology LIVE Digital Twin for developing Digital Twins with a focus on Predictive Maintenance. A case study presents the development and application of the Learning phase on a physical asset. As a result, the behaviour and Machine Conditions are understood, and a preliminary sensor network is designed. To summarize the methods, a spring connection is used to represent the flange connections of the Sand Removal Skid assembly. The stiffness can be reduced to emulate fault in the asset and recognize the deviating vibration responses. An algorithm is employed to compare the healthy and faulty responses and reveals a sensitive mode shape for each flange connection. Sensor locations are determined in accordance with two criteria to maximize the probability of damage detection in the network. The process is performed based on the frequency responses of connections at 50% health to explain the methodology. However, this does not limit the ability to adapt the provided methodology for more comprehensive learning when the flanges are compromised differently. The case study concludes with an HF digital model with a preliminary sensor network capable of health monitoring. The accuracy in this Learning phase is relative to the assumptions made. In particular, the numerical vibration response may differ depending on the geometric simplification, the subassembly chosen, and the variety of meshing. To further develop the model into a LIVE DT, the realization of the calibrated LF model and the development of the Fault History are necessary to establish real-time communication between the physical asset and the DT. Future work in this field will reduce the effects of the simplifying assumptions, for example by considering a tank with liquid. The added pre-stress and mass effects in the tank will affect the natural frequencies of the system. Future work will also consider a larger assembly, similar to a physical Sand Removal Skid, to reflect a more realistic sensor network and further develop the methodology. Furthermore, a small-scale prototype which experiences similar behaviour to the HF and LF models will be designed to further study the effects of vibration in a physical environment and confirm the latter phases of LIVE DT. The compliance of the LIVE DT with Industry 4.0's unique traits will increase its efficiency when employed for smart maintenance. The systematic approach of LIVE Digital Twin presented here can be adopted for other physical assets.