Uncertainty Guided Ensemble Self-Training for Semi-Supervised Global Field Reconstruction

Recovering a globally accurate complex physics field from limited sensor measurements is critical to measurement and control in aerospace engineering. General reconstruction methods, especially deep learning models with many parameters and strong representational ability, usually require large amounts of labeled data, which is unaffordable. To solve this problem, this paper proposes Uncertainty Guided Ensemble Self-Training (UGE-ST), which exploits plentiful unlabeled data to improve reconstruction performance. We first propose a novel self-training framework with an ensemble teacher and a pre-training student, designed to improve the accuracy of the pseudo-labels and remedy the impact of their noise. In addition, uncertainty-guided learning encourages the model to focus on the high-confidence regions of the pseudo-labels and mitigates the effects of wrong pseudo-labeling during self-training, improving the performance of the reconstruction model. Experiments on the pressure and velocity field reconstruction of an airfoil and the temperature field reconstruction of an aircraft system indicate that UGE-ST can save up to 90% of the data while matching the accuracy of supervised learning.


Introduction
Fast and accurate acquisition of the global physics field in aerospace engineering is of great significance for the stable operation of monitoring systems and the smooth control of processes. Since aircraft systems are required to operate in severe environments, direct measurement of the global physics field is extremely difficult. Therefore, reconstructing the global physics field from sensor observations is essential for measurement and control in aerospace engineering [1,2]. Traditional reconstruction methods, including principal component regression, partial least squares (PLS) regression, support vector machines (SVM), and artificial neural networks (ANN) [3,4,5,6,7], are usually data-driven. However, these are typical shallow methods with limited representational ability, which can no longer be adequate for reconstructing global fields from a limited number of sensors in high-dimensional nonlinear complex physics, such as fluid dynamics [8], thermodynamics [9], electromagnetism [10], and solid mechanics [11].
Deep learning with multiple layers has shown its potential on strongly nonlinear, high-dimensional problems.
It can automatically learn abstract and subtle features from large amounts of data, and has already demonstrated good performance when applied to physics field reconstruction [1]. Generally, this good performance depends highly on plentiful labeled data. However, obtaining plentiful labeled training samples for the current task is often impossible. Although massive amounts of data are accumulated during industrial processes, these data are mostly unlabeled and cannot be used directly under supervised paradigms. Therefore, it is significant for industrial processes to efficiently utilize this abundant unlabeled data to improve reconstruction performance.
Semi-supervised learning (SSL) can train a model with only limited labeled data and abundant unlabeled data, and has been successfully applied in computer vision [12], natural language processing [13], and other fields. SSL methods can be loosely classified into consistency-based SSL and pseudo-label-based SSL [14,15,16]. The former is mainly based on the smoothness assumption, which is not reasonable for regression problems, and therefore cannot be applied to such a typical regression problem as global field reconstruction. On the contrary, pseudo-label-based SSL, such as the self-training method [17,18], which employs a teacher model to label the abundant unlabeled data and provides it to a student model for training, is flexible and not constrained by specific assumptions. Since self-training is well adapted to the task at hand, this paper mainly focuses on self-training for global physics field reconstruction.
Despite the good performance of the self-training method, it still suffers mainly from overfitting and noise interference. Especially in the reconstruction problem, the noise in the pseudo-labels seriously erodes the prediction. Therefore, the unsolved question is how to avoid the effect of inaccurate pseudo-labeling on model performance.
To address this problem, this paper first improves the accuracy of the pseudo-labels so as to reduce the damage of pseudo-label noise to the performance of the student model. Specifically, an ensemble of teachers jointly guides the training of the student model. The errors in pseudo-labels created by a single teacher model can be mitigated by the "collective voting" of multiple teacher models, resulting in high-quality pseudo-labels. Secondly, the student model is guided to concentrate on the ground truth or on areas without noise, thereby eliminating noise interference and improving the performance of the student model.
During training, the uncertainty in the pseudo-labels is quantified based on the ensemble teacher and further used as guiding information about the noise in the pseudo-labels. Based on uncertainty-guided learning, the student model ignores the noisy regions of the pseudo-labels during training, thus avoiding the propagation and accumulation of noise between the teacher and student models. Then, to further reduce the interference of noise with the student model, the pre-training student separates the pseudo-labels from the labeled data during training, so that the noise in the pseudo-labels can be "forgotten" and the student model is forced to focus on the ground truth to obtain higher performance.
Based on the above strategy, Uncertainty Guided Ensemble Self-Training (UGE-ST) is proposed in this paper, including the ensemble teacher, uncertainty-guided learning, and the pre-training student. The innovations of this paper are summarized as follows:
1. This paper proposes a novel self-training framework that exploits unlabeled training data with the ensemble teacher and pre-training student, reducing the damage of pseudo-label noise to the student model.
2. This paper proposes uncertainty-guided learning for self-training to improve the performance of the model, which uses uncertainty as guiding information for the student and supervises the learning process by reducing the effects of noisy pseudo-labels.
3. Two physics field reconstruction problems verify the effectiveness of the proposed method. Experiments show that the proposed UGE-ST can substantially improve model performance with limited labeled data, and holds an advanced position compared with supervised and other semi-supervised methods.
The layout of this paper is as follows. In Section II, the problem definition and the self-training framework are briefly reviewed. Then, the proposed UGE-ST is illustrated in detail, including the ensemble teacher, uncertainty-guided learning, and the pre-training student. After that, the effectiveness and feasibility of the proposed approach are demonstrated on two cases in Section III. Finally, the conclusion is drawn.

Problem Definition
Consider a two-dimensional discretized physical field Γ described by the governing partial differential equations

∂w_x / ∂t = f(w_x; θ), x ∈ Γ,

where w_x is the state vector at point x of the physical field Γ that depends on parameters θ and time t, and f represents the nonlinear function that governs the physical field Γ.
In practice, due to the complexity of the physical system, the state w of the whole system is usually unavailable, but the state of the system at a limited set of points can be observed by sensors. We denote a(t; θ) as the observed state at time t. The purpose of physical field reconstruction is to recover the complete state w(t; θ) of the physical field from the limited observed state a(t; θ) as w(t; θ) = F(a(t; θ)), where a(t; θ) = {a_y, y ∈ Λ ⊆ Γ}, Λ is the set of observed points, and F is the required reconstruction model, which is a deep neural network in this paper. It is worth mentioning that, although the physical system is time-dependent, the reconstruction model relies only on the observed state at the current step to predict the system states.
In order to construct the deep reconstruction model, we take the limited observed state a(t; θ) as the input of the model and the complete state w(t; θ) as the output.

Self-training Framework
Generally, the performance of a deep learning model is significantly influenced by the amount of manually annotated labeled data. However, collecting large amounts of labeled data is labor-intensive and time-consuming. Semi-supervised learning is considered a promising technology for mitigating the demand for labeled data, thereby reducing the cost of applying deep learning models in practical engineering.
Semi-supervised learning aims to generalize from the combination of a limited labeled set D_l = {(x_i, y_l^i)}_{i=1}^{M} and an abundant unlabeled set D_ul = {x_j}_{j=1}^{N}, where M ≪ N. Self-training is a classic semi-supervised learning method based on the idea of pseudo-labels. It consists of two main steps, as shown in Fig. 1:
1. Firstly, the small amount of labeled data D_l is used to train the teacher model M_T with the L1 loss, which is formulated as

L_sup = (1 / (M · W · H)) Σ_{i=1}^{M} Σ_{w=1}^{W} Σ_{h=1}^{H} | y_l^i(w, h) − ŷ_T^i(w, h) |,   (3)

where y_l^i and ŷ_T^i represent the ground truth and the prediction of the teacher model, and W and H represent the width and height of the physical field.

2. Secondly, the student model M_S is trained through the teacher model M_T. The teacher model M_T is used to predict the unlabeled data D_ul, and its predictions are adopted as the pseudo-labels y_p of the unlabeled data. At the same time, the labeled data are combined with the pseudo-labels to train the student model. The constraint can be formulated as

L_st = (1 / (M · W · H)) Σ_{i=1}^{M} Σ_{w,h} | y_l^i(w, h) − ŷ_S^i(w, h) | + (1 / (N · W · H)) Σ_{j=1}^{N} Σ_{w,h} | y_p^j(w, h) − ŷ_S^j(w, h) |,

where ŷ_S^i is the prediction of the student model.
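The two steps above can be sketched in PyTorch as follows. This is a minimal sketch, not the paper's code: the model and loader names are illustrative, and each loader is assumed to yield batches of field tensors.

```python
import torch

def l1_field_loss(pred, target):
    # Mean absolute error over the W x H field, averaged over the batch (L1 loss).
    return (pred - target).abs().mean()

def self_train(teacher, student, labeled_loader, unlabeled_loader, epochs=1, lr=1e-3):
    """Minimal two-step self-training sketch.

    Step 1: train the teacher on the small labeled set.
    Step 2: let the teacher pseudo-label the unlabeled set and train the
    student on labeled data plus pseudo-labels.
    """
    opt_t = torch.optim.AdamW(teacher.parameters(), lr=lr)
    for _ in range(epochs):                              # step 1
        for x, y in labeled_loader:
            opt_t.zero_grad()
            l1_field_loss(teacher(x), y).backward()
            opt_t.step()

    opt_s = torch.optim.AdamW(student.parameters(), lr=lr)
    teacher.eval()
    for _ in range(epochs):                              # step 2
        for (x_l, y_l), x_u in zip(labeled_loader, unlabeled_loader):
            with torch.no_grad():
                y_p = teacher(x_u)                       # pseudo-label
            loss = l1_field_loss(student(x_l), y_l) + l1_field_loss(student(x_u), y_p)
            opt_s.zero_grad()
            loss.backward()
            opt_s.step()
    return student
```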
Discussion about self-training. Unlike consistency regularization methods, the self-training method does not depend on the smoothness assumption and is simple and versatile. However, the student model is trained on the pseudo-labels predicted by the teacher model; when there is large noise in the pseudo-labels, the performance of the student model suffers.
To avoid the influence of pseudo-label noise on the student model, we improve the self-training method in two aspects. The direct idea is to make the pseudo-labels as accurate as possible, so that their noise, and hence its impact on the performance of the student model, is as small as possible. The other idea is to prevent, as much as possible, the remaining noise in the pseudo-labels from affecting the student model once the accuracy of the pseudo-labels is fixed.

Uncertainty Guided Ensemble Self-Training
This paper proposes UGE-ST, including the ensemble teacher, uncertainty-guided learning, and the pre-training student, to improve reconstruction performance. The pre-training student and uncertainty-guided learning improve the performance of the student model by avoiding the influence of noise in the pseudo-labels as much as possible, while the ensemble teacher directly improves the accuracy of the pseudo-labels, further enhancing the performance of the student model. The framework of the proposed method is shown in Fig. 2. Compared with the basic self-training method, our method consists of three steps: 1. Firstly, a small amount of labeled data D_l is used to train the ensemble teacher model M_eT with the L1 loss formulated in Eq. 3.
2. Secondly, similar to basic self-training, the student model M_S is trained by the ensemble teacher M_eT. The predictions of the ensemble teacher M_eT on the unlabeled data D_ul are adopted as the pseudo-labels. Then, the uncertainty of the pseudo-labels is quantified and normalized into weights that multiply the pseudo-label supervision constraint, yielding the uncertainty-guided learning constraint L_un.
3. Thirdly, the small amount of labeled data D_l is employed to re-train the student model M_S with the L1 loss.
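The three steps can be condensed into one training routine. The following is a sketch under stated assumptions: `train_l1` and `uge_st` are illustrative names, the loaders yield field tensors, and min-max normalization of the variance is used for the uncertainty weights as described later in the text.

```python
import torch

def train_l1(model, loader, epochs=1, lr=1e-3):
    # Supervised L1 training on (input, target) pairs (Eq. 3).
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            (model(x) - y).abs().mean().backward()
            opt.step()

def uge_st(teachers, student, labeled, unlabeled, epochs=1):
    """Three-step UGE-ST sketch.

    1. Train each teacher of the ensemble on the labeled set.
    2. Pre-train the student on uncertainty-weighted pseudo-labels.
    3. Re-train the student on the labeled set alone.
    """
    for t in teachers:                                   # step 1
        train_l1(t, labeled, epochs)
    opt = torch.optim.AdamW(student.parameters(), lr=1e-3)
    for _ in range(epochs):                              # step 2
        for x_u in unlabeled:
            with torch.no_grad():
                preds = torch.stack([t(x_u) for t in teachers])
            y_p = preds.mean(dim=0)                      # ensemble pseudo-label
            u = preds.var(dim=0)                         # per-point uncertainty
            u = (u - u.min()) / (u.max() - u.min() + 1e-8)
            w = 1.0 - u                                  # low uncertainty -> high weight
            opt.zero_grad()
            (w * (student(x_u) - y_p).abs()).mean().backward()
            opt.step()
    train_l1(student, labeled, epochs)                   # step 3
    return student
```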

Ensemble teacher
Ensemble learning is a classic idea in machine learning, which combines multiple weak models to obtain a better and more comprehensive strong model. The underlying idea is that even if one weak model makes a wrong prediction, the other models can correct the error. Generally speaking, the model obtained by ensemble learning performs better than any single model.
Due to the scarcity of labeled data for teacher model training, the performance of a single teacher model is poor, resulting in large pseudo-label noise. Here we combine ensemble learning with self-training, training and combining multiple teacher models to obtain more accurate pseudo-labels.
Specifically, we first initialize multiple teacher models M_T^1, M_T^2, …, M_T^n and train each of them on the labeled data D_l with the supervised loss in Eq. 3, obtaining the ensemble teacher M_eT whose members predict ŷ_eT^i = M_T^i(D_ul). Then, the multiple predictions are averaged to obtain the pseudo-label y_p for the unlabeled data:

y_p = (1 / n) Σ_{i=1}^{n} ŷ_eT^i.

Then, this study normalizes the uncertainty U_p into the range (0, 1) and weights each pixel of the pseudo-label according to its uncertainty: regions with larger uncertainty are given smaller weights, and regions with smaller uncertainty are given larger weights. The large weights force the model to learn more from the areas with smaller noise, while the small weights let it ignore the regions with larger noise, thus avoiding the influence of noise on the model during learning. This paper defines the uncertainty weights as

w_un = 1 − Û_p,

where Û_p is the normalized uncertainty. In the end, the uncertainty weights and the pseudo-label loss are combined to obtain the uncertainty-guided learning loss:

L_un = (1 / (N · W · H)) Σ_{j=1}^{N} Σ_{w,h} w_un(w, h) · | y_p^j(w, h) − ŷ_S^j(w, h) |.
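Numerically, the averaging and uncertainty weighting above behave as follows (a toy example with illustrative values; the variance is used as the uncertainty, and the weights are min-max normalized):

```python
import numpy as np

# Toy 2x2 "field" predicted by n = 3 ensemble teachers (illustrative values).
preds = np.array([
    [[1.0, 2.0], [3.0, 4.0]],
    [[1.2, 2.0], [3.0, 8.0]],
    [[0.8, 2.0], [3.0, 6.0]],
])

y_p = preds.mean(axis=0)      # pseudo-label: element-wise ensemble average
u_p = preds.var(axis=0)       # uncertainty U_p: element-wise variance
u_hat = (u_p - u_p.min()) / (u_p.max() - u_p.min())  # normalize into [0, 1]
w_un = 1.0 - u_hat            # large uncertainty -> small weight

# Uncertainty-guided loss against a (here all-zero) student prediction:
y_s = np.zeros_like(y_p)
l_un = (w_un * np.abs(y_p - y_s)).mean()
```

The pixel where the teachers disagree most (bottom right) receives weight 0 and is effectively ignored, while the pixels where they agree keep full weight.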

Pre-training student
The training of neural networks relies on empirical risk minimization. The more data used for training, the more consistent the training distribution is with the distribution of all data, and the closer the empirical risk is to the expected risk. When the amount of data is small, empirical risk minimization tends to cause overfitting. Therefore, it is necessary to expand the dataset so that the training data distribution is as consistent as possible with the full data distribution. Although self-training introduces a large amount of pseudo-labeled data, the noise in the pseudo-labels distorts the distribution relative to that of the real data, which eventually causes empirical risk minimization to fail.
To avoid the interference of pseudo-labels, this paper proposes a two-stage approach to training the student model, namely the pre-training student. The idea is to exploit the catastrophic forgetting phenomenon in neural networks. First, the student model M_S is pre-trained using the pseudo-labels y_p, with the loss function shown in Eq. 9. The student model M_S is then retrained using the small amount of labeled data D_l; the constraint can be formulated as

L_re = (1 / (M · W · H)) Σ_{i=1}^{M} Σ_{w,h} | y_l^i(w, h) − ŷ_S^i(w, h) |.
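The two-stage schedule can be written compactly; the key is only the order of the stages (pseudo-labels first, ground truth last). A sketch, with illustrative loader names and hyper-parameters:

```python
import torch

def two_stage_student(student, pseudo_loader, labeled_loader, epochs=1, lr=1e-3):
    """Pre-training student sketch: fit the pseudo-labels first, then re-train
    on the small labeled set so that residual pseudo-label noise is 'forgotten'
    and the final weights are shaped by the ground truth."""
    opt = torch.optim.AdamW(student.parameters(), lr=lr)
    for loader in (pseudo_loader, labeled_loader):   # pseudo first, labels last
        for _ in range(epochs):
            for x, y in loader:
                opt.zero_grad()
                (student(x) - y).abs().mean().backward()  # L1 field loss
                opt.step()
    return student
```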

Experiments
In this section, the effectiveness of the proposed UGE-ST method is verified through two study cases. The first case is airfoil velocity and pressure field reconstruction, and the second is electronic-device temperature field reconstruction. To demonstrate the superiority of the proposed UGE-ST, fully supervised and semi-supervised deep learning methods are implemented for comparison, where the semi-supervised methods include Mean Teacher [19], co-training [20], and vanilla self-training [18]; all other settings remain unified.

Background and experimental setting
Reconstructing the pressure and velocity fields of an airfoil from finite sensors is significant for airfoil design. This section adopts airfoil data [21] to verify the validity of the proposed method. A convolutional neural network (CNN) is employed to implement the proposed UGE-ST method. CNNs are popular deep learning models widely used in computer vision [22,23]. Compared with the multilayer perceptron (MLP), a CNN has fewer parameters and the ability to process spatial information, making it suitable for regular physical field data. Here we adopt U-Net [24] as the backbone of the model. U-Net is an effective CNN structure for image-to-image regression: it captures both the overall and detailed features of the image, and has the advantages of multi-scale fusion and the ability to handle large images.
To use a CNN framework, the sparse observation data need to be projected into an image in an appropriate manner. Similar to [1], this paper maps local measurements to the spatial domain via Voronoi tessellation [25], as shown on the left of Fig. 3. In this case, a total of 1200 labeled and 800 unlabeled samples are generated using finite element simulation. To fully verify the applicability of the proposed method, we divide the labeled data into partition protocols of different scales, including 25, 50, 100, 200, 400, and 800 samples. In addition, 400 labeled samples are set aside as test data.
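The Voronoi projection amounts to assigning every pixel the value of its nearest sensor. A minimal pure-NumPy sketch (the function name and the unit-spaced pixel grid are assumptions for illustration):

```python
import numpy as np

def voronoi_image(sensor_xy, sensor_values, height, width):
    """Project sparse sensor readings onto a dense image: each pixel takes the
    value of its nearest sensor, i.e. a Voronoi tessellation of the domain."""
    yy, xx = np.mgrid[0:height, 0:width]
    pixels = np.stack([xx.ravel(), yy.ravel()], axis=1).astype(float)
    # Distance from every pixel to every sensor, then nearest-sensor index.
    dists = np.linalg.norm(pixels[:, None, :] - sensor_xy[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)
    return np.asarray(sensor_values)[nearest].reshape(height, width)
```

The resulting image preserves the spatial layout of the sensors, which is what allows a CNN to exploit them.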
To evaluate the performance of the methods, we employ the Mean Absolute Error (MAE) as the evaluation metric:

MAE = (1 / N) Σ_{i=1}^{N} | y_i − ŷ_i |,

where y_i is the ground truth, ŷ_i is the prediction, and N is the number of samples.
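The metric is a one-liner in NumPy:

```python
import numpy as np

def mae(y_true, y_pred):
    # Mean absolute error over all samples and field points (the metric above).
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.abs(y_true - y_pred).mean()
```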
All experiments are implemented in the PyTorch framework. Model training is completed on a high-performance server with an Intel(R) Xeon(R) Gold 6242 CPU @ 2.80 GHz, an Nvidia GTX 3090 GPU with 24 GB of VRAM, and 500 GB of RAM. We initialize the weights of the whole network randomly and train the models with the AdamW optimizer. To ensure fairness, the optimizer parameters are consistent across experiments. The initial learning rate is η = 0.001.
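The reported training configuration can be sketched as follows; the restart period `T_0` is not stated in the text and is an illustrative assumption.

```python
import torch

def make_optimizer(model):
    """AdamW with the stated initial learning rate (0.001) plus the Cosine
    Annealing Warm Restarts learning-rate policy; T_0 (epochs until the first
    restart) is an illustrative choice."""
    opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
    sched = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(opt, T_0=10)
    return opt, sched

# Typical use inside the epoch loop:
# for epoch in range(100):
#     train_one_epoch(...)
#     sched.step()
```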
Besides, the Cosine Annealing Warm Restarts scheduler is selected as our learning rate policy. In all experiments, the number of epochs is fixed at 100, and the batch size is set to 8 for both labeled and unlabeled samples. The prediction results in terms of MAE are tabulated in Tab. 1, the main purpose being to compare the prediction performance of the different methods. The results show that the proposed UGE-ST significantly exceeds the supervised baseline and the other semi-supervised methods regardless of the number of labels. Taking pressure field prediction as an example, UGE-ST surpasses the supervised baseline by 30%, 29%, 33%, 31%, and 25% under 25, 100, 200, 400, and 800 labeled samples, respectively. Compared with self-training, UGE-ST acquires improvements of 29%, 20%, 23%, 20%, and 17%, respectively. For the velocity in the x and y directions, our method also achieves better performance. Besides, the results show that the number of labeled samples affects prediction accuracy, and the performance of all models decays significantly as the number of labels decreases.

Prediction Results and Analysis
It is worth mentioning that other semi-supervised methods, such as Mean Teacher and co-training, achieve poor prediction accuracy, even worse than the supervised baseline. The reason is that such end-to-end methods perform poorly in the early stage of training, which further spoils the pseudo-labels with large noise.
As the iterative learning process proceeds, the noise in the pseudo-labels accumulates, disrupting the learning process. The predictions and errors of supervision, self-training, and UGE-ST for the pressure field are visualized in Fig. 4. As can be seen, all three methods predict the trend of the pressure field, while our approach achieves lower error. As shown in row six of Fig. 4, the predictions of the self-training and supervised methods have large errors around the airfoil, whereas our approach suppresses these errors well.

Background and experimental setting
The normal operation of aircraft systems highly depends on a stable ambient temperature, and heat dissipation is essential to guarantee the working environment because of internally generated heat. Thermal management of aircraft systems is an effective way to guarantee a proper working environment. Temperature field reconstruction [26] is a basic task for obtaining the real-time working environment of aircraft systems, and it is adopted here to verify the performance of our method.
This section also uses MAE as the evaluation metric. All experimental settings are consistent with the first case, except that the number of epochs is set to 80.
Besides, the results show that as the number of labeled samples decreases, the gain obtained by our method increases. In other words, our method can also achieve competitive results in the few-shot case.
From the perspective of the number of labeled samples, our method achieves an accuracy of 6.018e-03 with 100 samples, lower than the 6.948e-03 acquired by the supervised method with 1000 samples. This observation shows that our approach can greatly reduce the amount of labeled data required for the same performance: the proposed method saves up to ten times the number of samples in some cases, and at least two times in others.
The predictions and errors of supervision, self-training, and UGE-ST for the temperature field are visualized in Fig. 6. Although supervision and self-training can predict the trend of the temperature field, large errors still exist; in comparison, the proposed method outstandingly reduces the errors in these regions. As shown in row six of Fig. 6, the predictions of the self-training and supervised methods have large errors in the lower left corner, whereas our approach suppresses them well. The results also indicate that as the number of ensemble teachers increases, the accuracy of the pseudo-labels is significantly improved, further increasing the performance of the PT student. Although UGE-ST is trained based on the PT student, its performance improves only slightly. The reason is that this paper uses pseudo-labels and labeled data to train the model successively, which reduces the influence of noise on the student model but also damps the impact of the pseudo-labels on the final performance of the student model. We finally chose an ensemble size of 3 to balance accuracy and training cost.

Conflict of interest statement
On behalf of all authors, the corresponding author states that there is no conflict of interest.

Replication of results
The code of the proposed method is publicly available at https://github.com/meitounao110/UGE-ST

Uncertainty guided learning

Although the ensemble teacher can improve the accuracy of the pseudo-labels, noise still exists and affects the training of the student model. We expect to filter out the noise in the pseudo-labels so that the student model learns more from the areas with low or no noise. Here we propose using uncertainty to guide student model training. Uncertainty can reflect the noise in the model predictions; usually, an area with large noise is also uncertain. Since multiple different predictions ŷ_eT^1, ŷ_eT^2, …, ŷ_eT^n can be obtained for the same sample from the ensemble teacher M_eT, we naturally use their variance to measure the uncertainty of the pseudo-label y_p predicted by the teacher models:

U_p = (1 / n) Σ_{i=1}^{n} (ŷ_eT^i − y_p)².

Figure 3 :
Figure 3: Voronoi tessellation of the sensor observations (left) and the global velocity or pressure field of the airfoil (right). The grey dots indicate the placed sensors. Compared with feeding the sparse measurements directly as input, the Voronoi tessellation retains the spatial information of the sensor measurement points.

Figure 4 :
Figure 4: Visualization of predictions and errors for pressure field.

Figure 5 :
Figure 5: Sensors location and temperature field.

Figure 7 :
Figure 7: MAE changes with the number of ensemble teachers.

Conclusion

In this paper, we propose a semi-supervised method, Uncertainty Guided Ensemble Self-Training (UGE-ST), which aims to improve the reconstruction performance with few labeled data. UGE-ST consists of the ensemble teacher, uncertainty-guided learning, and the pre-training student. The ensemble teacher employs ensemble learning to construct multiple teacher models that jointly guide the training of the student model, and the "collective voting" of the ensemble teacher mitigates the pseudo-label errors generated by individual teacher models, resulting in accurate pseudo-labels. Uncertainty-guided learning builds on the ensemble teacher to quantify the uncertainty in the pseudo-labels, forcing the student to learn the regions with less noise in the pseudo-labels and avoiding the propagation and accumulation of noise in the student. The pre-training student trains the student model separately on pseudo-labeled and labeled data, enabling the student model to forget the noise in the pre-learned pseudo-labels. Experiments show that the proposed uncertainty-guided ensemble self-training method can substantially improve the reconstruction performance of the global physics field with limited labeled data.

Table 1 :
Performance comparison under different number of labeled data for the airfoils velocity and pressure field.

Table 2 :
The structure of MLP.

Table 3 :
Performance comparison under different number of labeled data for the temperature field.

Table 4 :
The influence of the pre-training student.

The influence of uncertainty guided learning

The influence of uncertainty-guided learning is shown in Tab. 5. The PT student and UGE-ST achieve better performance when guided by uncertainty. It is worth noting that the gain of the PT student from uncertainty-guided learning is greater than that of UGE-ST. This phenomenon is also caused by the pre-training: the impact of the pseudo-labels on the final performance of the student model is damped by the successive training on pseudo-labels and labeled data.

Table 5 :
The influence of uncertainty guided learning under different numbers of ensemble teachers.