Introduction

Obtaining the global distribution of physical quantities over a system or spatial field is crucial in complex engineering sciences, such as thermodynamics [1], fluid dynamics [2], electromagnetism [3], and solid mechanics [4]. For instance, the effective design, monitoring, and control of an airfoil require fast and accurate estimation of the flow information. However, due to space, cost, and energy constraints in practice, direct measurement of the global physics field is challenging, and sparse observations are often the only option. Therefore, reconstructing the global physics field from sensor observations is essential for engineering design, monitoring, and control, and has received wide attention [5,6,7,8].

Essentially, global field reconstruction is a special and challenging inverse problem. Traditional machine learning methods, including principal component regression, partial least-squares regression, and support vector machines [9,10,11], are typical shallow methods whose limited representational ability is inadequate for reconstructing global fields from incomplete, sparse measurements in high-dimensional, nonlinear, complex physical systems. Deep learning has grown rapidly in recent years and shows strong potential for high-dimensional nonlinear problems [12]. It can automatically learn abstract and subtle features from large amounts of data and has already demonstrated good performance when applied to physics field reconstruction [13, 14]. The success of deep learning, however, depends on a large amount of labeled data, and obtaining plentiful labeled data for the reconstruction model requires considerable labor and time [15, 16]. Therefore, it is important to construct a deep learning model for physics field reconstruction that needs only few labeled data, so as to reduce the cost. In addition, in practical engineering, massive amounts of unlabeled data accumulate but cannot be used directly under the supervised paradigm. Efficiently utilizing these abundant unlabeled data helps improve reconstruction performance.

The semi-supervised learning (SSL) method exploits limited labeled data and abundant unlabeled data to train the model [17], and has been successfully applied in computer vision [18], natural language processing [19], fault diagnosis [20], medical image analysis [21], etc. The mainstream SSL methods can be loosely classified into consistency-based SSL and pseudo-label-based SSL. The former is mainly based on the smoothness assumption [22], which is inapplicable to regression problems; consistency-based methods therefore cannot be employed in global field reconstruction tasks. On the contrary, pseudo-label-based SSL, such as the self-training method [23] in which a teacher model labels the abundant unlabeled data and guides the training of a student model, is flexible and not constrained by specific assumptions. Since self-training is well adapted to the task at hand, this paper focuses on self-training for global physics field reconstruction. Despite its good performance, the self-training method still suffers from overfitting and noise interference. Especially in the reconstruction problem, the noise in the pseudo-label seriously erodes the prediction. Therefore, the open question is how to avoid the effect of inaccurate pseudo-labels on model performance.

To address this problem, this paper first improves the accuracy of the pseudo-label so as to reduce the damage that pseudo-label noise causes to the performance of the student model. Specifically, this paper uses ensemble teachers to jointly guide the training of the student model. The errors in a pseudo-label created by a single teacher model can be mitigated by the “collective voting” of multiple teacher models, resulting in high-quality pseudo-labels. Second, the student model is guided to concentrate on the ground truth or on noise-free areas, thereby eliminating noise interference and improving the performance of the student model. During training, the uncertainty of the pseudo-label is quantified based on the ensemble teachers and is further used as guidance about where the pseudo-label is noisy. With uncertainty guided learning, the student model ignores the noisy regions of the pseudo-label during training, thus avoiding the propagation and accumulation of noise between the teacher and student models. Then, to further reduce the interference of noise, the pre-training student strategy separates the pseudo-labels from the labeled data during training, so that the noise in the pseudo-labels can be “forgotten” and the student model is forced to focus on the ground truth to obtain higher performance.

Based on the above strategy, uncertainty guided ensemble self-training (UGE-ST) is proposed in this paper, including ensemble teachers, uncertainty guided learning, and pre-training student. The innovations of this paper are summarized as follows:

  1. This paper proposes a novel self-training framework designed to improve reconstruction accuracy by exploiting unlabeled training data, with ensemble teachers and a pre-training student that reduce the damage of pseudo-label noise to the student model.

  2. This paper proposes uncertainty guided learning for self-training to improve model performance, which uses the uncertainty as guidance for the student and supervises the learning process by forcing the model to focus on the confident areas of the pseudo-label.

  3. Two physics field reconstruction problems verify the effectiveness of the proposed method. Experiments show that the proposed UGE-ST substantially improves model performance with limited labeled data and outperforms the supervised baseline and other semi-supervised methods.

The structure of this paper is as follows. Related work is presented in the section “Related work”. In the section “Method”, the problem definition and the self-training framework are briefly reviewed, and the proposed UGE-ST is then illustrated in detail, including ensemble teachers, uncertainty guided learning, and pre-training student. After that, the effectiveness and feasibility of the proposed approach are demonstrated on two cases in the section “Experiments”. Finally, the conclusion is presented in the section “Conclusion”.

Related work

Semi-supervised learning

Semi-supervised learning is a learning paradigm in machine learning that aims to construct models by combining supervised and unsupervised learning. It has been widely studied in computer vision [18], natural language processing [19], fault diagnosis [20], medical image analysis [21], etc. Semi-supervised learning exploits limited labeled data and abundant unlabeled data to improve the model’s generalization and further enhance its ability under few-shot scenarios. The mainstream semi-supervised learning methods are consistency regularization methods and pseudo-labeling methods [24]. Consistency regularization relies on the assumption that the prediction should remain relatively stable under a realistic perturbation of an unlabeled data point. Consistency regularization methods, such as the \(\Pi \)-model [25], Mean Teacher [26], ICT [27], and UDA [28], aim to reduce the prediction discrepancy between perturbed samples created by data augmentation. Pseudo-labeling methods, such as self-training [23], MPL [29], and S4L [30], utilize a model trained on the labeled dataset to generate additional training samples by labeling unlabeled data based on some heuristic, and then use them to train a more general model. The above works are mainly developed for semi-supervised classification. Because of the differences between classification and regression, the assumptions made in semi-supervised classification cannot be naturally applied to regression settings [31], which makes consistency regularization methods inapplicable to regression tasks. In contrast, the pseudo-labeling approach can be readily applied to semi-supervised regression settings [31]. In this paper, we propose a novel self-training method to solve field reconstruction, which is a dense regression problem.

Global field reconstruction

Fast and accurate prediction of the global physics field from sparse observations is of great significance in the complex engineering sciences for stable monitoring and smooth process control [5]. In recent years, using deep learning to solve the physics field reconstruction problem has attracted extensive attention. Chen et al. [32] create a new benchmark dataset for temperature field reconstruction of heat source systems and propose machine learning modeling methods that advance the state of the art in temperature field reconstruction. Fukami et al. [5] present a data-driven technique for spatial field recovery based on a structured grid-based deep learning approach that works for any number and placement of sensors. Li et al. [33] develop a data-driven sensor placement framework for thermal field reconstruction. Sun et al. [34] propose a physics-constrained Bayesian deep learning approach to reconstruct flow fields from sparse, noisy velocity data. Erichson et al. [35] use a neural network to reconstruct a geophysical flow and a forced isotropic turbulence field from limited sensors. All the above methods are based on the supervised paradigm and require a large amount of labeled data to complete model training. In contrast, our method adopts the semi-supervised learning paradigm and reduces the amount of labeled data required.

Method

Problem definition

Consider a two-dimensional discretized physical field \(\Gamma \) described by the governing partial differential equations

$$\begin{aligned} \dot{\varvec{w}}_x=f\left( \varvec{w}_x,t;\varvec{\theta } \right) , x\in \Gamma , \end{aligned}$$
(1)

where \(\varvec{w}_x\) is the state vector in point x of physical field \(\Gamma \) that depends on parameters \(\varvec{\theta }\) and time t; f represents the nonlinear function that governs the physical field \(\Gamma \).

In practice, due to the complexity of the physical system, the state \(\varvec{w}\) of the whole system is usually unavailable. However, the state of the system at a limited number of points can be observed by sensors. We denote by \(\varvec{a}(t;\varvec{\theta })\) the observed state at time t. The purpose of physical field reconstruction is to reconstruct the complete state \(\varvec{w}(t;\varvec{\theta })\) of the physical field from the limited observed state \(\varvec{a}(t;\varvec{\theta })\)

$$\begin{aligned} \varvec{w}(t;\varvec{\theta })=F\left( \varvec{a}(t;\varvec{\theta }) \right) , \end{aligned}$$
(2)

where \(\varvec{a}(t;\varvec{\theta })=\{\varvec{a}_y, y\in \Lambda \subseteq \Gamma \}\), \(\Lambda \) is the set of observed points, and F is the required reconstruction model which is deep neural network in this paper. It is worth mentioning that, although the physical system is time-dependent, the reconstruction model only relies on the observed state at the current step for predicting the system states.

To construct the deep reconstruction model, we take the limited observed state \(\varvec{a}(t;\varvec{\theta })\) as the input of the model and the complete state \(\varvec{w}(t;\varvec{\theta })\) as its output.

Fig. 1 Self-training method

Self-training framework

Generally, the performance of a deep learning model is significantly influenced by the amount of labeled data that is manually annotated. However, collecting plenty of labeled data is labor-intensive and time-consuming. Semi-supervised learning is considered a promising technology to mitigate the demand for labeled data, thereby reducing the cost of applying deep learning models in practical engineering.

Semi-supervised learning aims to generalize from a combination of limited labeled data \(D^l = \{(x^l_i, y^l_i)\}^N_{i=1}\) and abundant unlabeled data \(D^{ul} = \{x^{ul}_i\}^M_{i=1}\), where \(M \gg N\). Self-training is a classic semi-supervised learning method based on the idea of pseudo-labeling, and it consists mainly of two steps, as shown in Fig. 1.

  1. First, a small amount of labeled data \(D_l\) is used to train the teacher model \(M^T\) with the \(L_1\) loss, which is formulated as

    $$\begin{aligned} {L}_\textrm{sup }=\frac{1}{\left| {{D}_{l}} \right| } \sum \limits _{{{x}_{l}}\in {{D}_{l}}}{\frac{1}{HW}} \sum \limits _{i=0}^{HW}{\left| y_{i}^{l}-\hat{y}_{i}^{T} \right| }, \end{aligned}$$
    (3)

    where \(y_{i}^{l}\) and \(\hat{y}_{i}^{T}\) represent the ground truth and the prediction of the teacher model, and W and H represent the width and height of the physics field.

  2. Second, the student model \(M^S\) is trained with the help of the teacher model \(M^T\). The teacher model \(M^T\) predicts the unlabeled data \(D_{ul}\), and its prediction is adopted as the pseudo-label \(y_p\) of the unlabeled data. Then, the labeled data are combined with the unlabeled data to train the student model. The constraint can be formulated as

    $$\begin{aligned} {{L}_{\textrm{semi}}}&=\frac{1}{\left| {{D}_{l}} \right| } \sum \limits _{{{x}_{l}}\in {{D}_{l}}}{\frac{1}{HW}} \sum \limits _{i=0}^{HW}{\left| y_{i}^{l}-\hat{y}_{i}^{S} \right| }\nonumber \\&\quad +\frac{1}{\left| {{D}_{ul}} \right| }\sum \limits _{{{x}_{ul}} \in {{D}_{ul}}}{\frac{1}{HW}}\sum \limits _{i=0}^{HW} {\left| y_{i}^{p}-\hat{y}_{i}^{S} \right| }, \end{aligned}$$
    (4)

    where \(\hat{y}_{i}^{S}\) is the prediction of the student model. A minimal code sketch of this two-step procedure is given below.
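The two training steps above can be condensed into the following sketch. This is a minimal illustration in PyTorch assuming standard data loaders; the helper names (e.g., train_supervised, self_training) are ours and only illustrative, not part of the original method description.

```python
import torch
import torch.nn.functional as F

def train_supervised(model, labeled_loader, epochs=100, lr=1e-3):
    # Step 1: fit a model on the small labeled set with the L1 loss of Eq. (3).
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in labeled_loader:
            loss = F.l1_loss(model(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model

def self_training(teacher, student, labeled_loader, unlabeled_loader, epochs=100, lr=1e-3):
    # Step 2: the frozen teacher pseudo-labels the unlabeled inputs, and the student
    # is trained on labeled and pseudo-labeled data jointly (Eq. (4)).
    teacher.eval()
    opt = torch.optim.AdamW(student.parameters(), lr=lr)
    for _ in range(epochs):
        for (x_l, y_l), x_u in zip(labeled_loader, unlabeled_loader):
            with torch.no_grad():
                y_p = teacher(x_u)                      # pseudo-label
            loss = F.l1_loss(student(x_l), y_l) + F.l1_loss(student(x_u), y_p)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student
```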

Discussion about self-training. Unlike the consistency regularization method, the self-training method is simple and versatile and does not depend on the smoothness assumption. However, the student model is trained based on the pseudo-label predicted by the teacher model; when there is large noise in the pseudo-label, the performance of the student model suffers.

To prevent the student model from being affected by pseudo-label noise, we improve the self-training method in two respects. One direct idea is to increase the accuracy of the pseudo-label as much as possible to minimize noise, thus avoiding performance degradation of the student model. The other idea is to improve the learning process so that, for a given pseudo-label accuracy, the noise does not influence the student model.

Fig. 2 Uncertainty guided ensemble self-training method

Uncertainty guided ensemble self-training

This paper proposes UGE-ST, including ensemble teachers, uncertainty guided learning, and pre-training student, to improve reconstruction performance. The pre-training student and the uncertainty guided learning improve the performance of the student model by avoiding, as much as possible, the influence of noise in the pseudo-label, while the ensemble teachers directly improve the accuracy of the pseudo-label and thus further enhance the performance of the student model. The framework of the proposed method is shown in Fig. 2. Compared with the basic self-training method, our method consists of three steps:

  1. First, train the ensemble teachers. A small amount of labeled data \(D_l\) is adopted to train the ensemble teachers \(M^{eT}\) with the \(L_1\) loss formulated in Eq. 3.

  2. Second, pre-train the student. Different from basic self-training, the student model \(M^S\) is trained by the ensemble teachers \(M^{eT}\) using uncertainty guided learning. The prediction of the ensemble teachers \(M^{eT}\) on the unlabeled data \(D_{ul}\) is adopted as the pseudo-label \(y_p\). Then, the uncertainty \(U_p\) of the pseudo-label is estimated, normalized, and used as weights in the pseudo-label supervision constraint, giving the uncertainty guided learning constraint \({L}_{{ugl}}\) formulated in Eq. 8.

  3. Third, re-train the student. A small amount of labeled data \(D_l\) is employed to re-train the student model \(M^S\) with the \(L_1\) loss.

Ensemble teachers

Ensemble learning is a classic idea in machine learning that combines multiple weakly supervised models to obtain a more robust and stronger supervised model. The underlying idea is that even if one weak model makes a wrong prediction, the other models can correct the error. Generally speaking, the performance of a model obtained by ensemble learning is better than that of a single model. Because labeled data for teacher model training are scarce, the performance of a single teacher model is poor, resulting in large pseudo-label noise. Here, we combine ensemble learning with self-training, using ensemble learning to train and combine multiple teacher models to obtain a more accurate pseudo-label.

Specifically, we first initialize multiple teacher models \(\left\{ M_{1}^{T},M_{2}^{T},\ldots ,M_{n}^{T} \right\} \). The teacher models are trained on the labeled data \(D_l\) following the supervised loss in Eq. 3, yielding the ensemble teachers \({{M}^{eT}}=\left\{ M_{1}^{T},M_{2}^{T},\ldots ,M_{n}^{T} \right\} \). The unlabeled data \(D_{ul}\) are predicted by the ensemble teachers \({M}^{eT}\), giving the results \(\left\{ \hat{y}_{1}^{eT}, \hat{y}_{2}^{eT},\ldots ,\hat{y}_{n}^{eT} \right\} \), where \(\hat{y}_{i}^{eT}={M_i}^{T}(D_{ul})\). Then, the multiple predictions are averaged to obtain the pseudo-label \({y}_{p}\) for the unlabeled data as follows:

$$\begin{aligned} {{y}_{p}}=mean\left( \hat{y}_{i}^{eT} \right) ,i=1,2,...,n. \end{aligned}$$
(5)
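A minimal sketch of the ensemble-teacher step follows, assuming the train_supervised helper from the earlier sketch; the function names and tensor shapes are illustrative, not prescribed by the paper.

```python
import torch

def build_ensemble(model_fn, labeled_loader, n_teachers=3):
    # Train n independently initialized teachers on the same labeled data (Eq. (3)).
    return [train_supervised(model_fn(), labeled_loader) for _ in range(n_teachers)]

@torch.no_grad()
def ensemble_pseudo_label(teachers, x_unlabeled):
    # Stack the n teacher predictions and average them pixel-wise (Eq. (5)).
    preds = torch.stack([t(x_unlabeled) for t in teachers], dim=0)  # (n, batch, H, W)
    y_p = preds.mean(dim=0)   # pseudo-label
    u_p = preds.var(dim=0)    # per-pixel variance, reused as the uncertainty of Eq. (6)
    return y_p, u_p
```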

Uncertainty guided learning

Although using ensemble teachers improves the accuracy of the pseudo-label, noise still exists and affects the training of the student model. We want to filter out the noise in the pseudo-label so that the student model learns mostly from areas with low or no noise. Here, we propose using uncertainty to guide the training of the student model. Uncertainty can reflect the noise in the model predictions; usually, an area with large noise is also uncertain. A popular way to quantify uncertainty in deep learning is to obtain the distribution of neural network predictions using classical Bayesian or ensemble methods [36]. In this paper, we adopt the ensemble approach to uncertainty estimation. Using the ensemble teachers \({M}^{eT}\), we can generate diverse predictions \(\left\{ \hat{y}_{1}^{eT},\hat{y}_{2}^{eT},\ldots ,\hat{y}_{n}^{eT} \right\} \) for a single sample and thereby construct a distribution over the network predictions. The variance of this distribution is employed to measure the uncertainty of the pseudo-label \({y}_{p}\) generated by the ensemble teachers as follows:

$$\begin{aligned} {{U}_{p}}=var\left( \hat{y}_{i}^{eT} \right) ,i=1,2,...,n. \end{aligned}$$
(6)

Then, this paper normalizes the uncertainty \(U_p\) into the range (0, 1) and weights each pixel of the pseudo-label according to its uncertainty. Regions with larger uncertainty are given smaller weights, and regions with smaller uncertainty are given larger weights. The large weights force the model to learn the areas with little noise, while the small weights guide the model to ignore areas with large noise, thus avoiding the influence of noise on the model during the learning process. This paper defines the uncertainty weights as follows:

$$\begin{aligned} w=1-Norm\left( U_p\right) . \end{aligned}$$
(7)

In the end, combining uncertainty weights w and pseudo-label loss to obtain uncertainty guided learning loss is as follows:

$$\begin{aligned} {{L}_{{ugl}}}=\frac{1}{\left| {{D}_{ul}} \right| } \sum \limits _{{{x}_{ul}}\in {{D}_{ul}}}{\frac{1}{HW}} \sum \limits _{i=0}^{HW}{w(i)\left| y_{i}^{p}-\hat{y}_{i}^{S} \right| }. \end{aligned}$$
(8)
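Eqs. (6)–(8) reduce to a per-pixel weighted L1 loss. A minimal sketch follows; min–max scaling is one plausible choice for the Norm operator, which is not specified further here.

```python
import torch

def uncertainty_guided_l1(student_pred, pseudo_label, uncertainty, eps=1e-8):
    # Eq. (7): normalize the per-pixel uncertainty to (0, 1) and invert it,
    # so that confident (low-variance) pixels receive large weights.
    u_norm = (uncertainty - uncertainty.min()) / (uncertainty.max() - uncertainty.min() + eps)
    w = 1.0 - u_norm
    # Eq. (8): weighted L1 distance between the student prediction and the pseudo-label.
    return (w * (student_pred - pseudo_label).abs()).mean()
```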

Pre-training student

The training of neural networks relies on empirical risk minimization. As the amount of training data increases, the distribution of the training dataset becomes more consistent with that of all data, bringing the empirical risk closer to the expected risk. When the amount of data is small, empirical risk minimization tends to overfit. Therefore, it is necessary to expand the dataset so that the training data distribution is as consistent as possible with the full data distribution. Although self-training introduces a large amount of pseudo-labeled data, the noise in the pseudo-labels makes their distribution deviate from that of the real data, which eventually causes empirical risk minimization to fail.

To avoid the interference of pseudo-labels, this paper proposes a two-stage approach to train the student model, namely the pre-training student. The idea is to exploit the catastrophic forgetting phenomenon in neural networks. First, the student model \(M^S\) is pre-trained using the pseudo-label \(y_p\), with the loss function shown in Eq. 8. Through pre-training, the student can learn the rough features contained in the pseudo-labels, which potentially improves its generalization. Then, a small amount of labeled data \(D_l\) is adopted to re-train the student model \(M^S\), and the constraint can be formulated as

$$\begin{aligned} {{L}_{\textrm{sup}}}=\frac{1}{\left| {{D}_{l}} \right| }\sum \limits _{{{x}_{l}} \in {{D}_{l}}}{\frac{1}{HW}}\sum \limits _{i=0}^{HW}{\left| y_{i}^{l} -\hat{y}_{i}^{S} \right| }. \end{aligned}$$
(9)

During the re-training process, the student model will forget the noise in the previous learning of pseudo-labels and focus on the ground truth to avoid the influence of the distribution difference caused by the noise in pseudo-labels.
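Putting the three steps together, the full UGE-ST procedure could be organized as in the sketch below, which reuses the helpers from the previous snippets; the epoch count and optimizer settings are placeholders rather than the exact configuration used in the experiments.

```python
import torch

def uge_st(model_fn, labeled_loader, unlabeled_loader, n_teachers=3, pretrain_epochs=100):
    # Step 1: ensemble teachers trained on the labeled data only.
    teachers = build_ensemble(model_fn, labeled_loader, n_teachers)

    # Step 2: pre-train the student on pseudo-labels with uncertainty guided learning.
    student = model_fn()
    opt = torch.optim.AdamW(student.parameters(), lr=1e-3)
    for _ in range(pretrain_epochs):
        for x_u in unlabeled_loader:
            y_p, u_p = ensemble_pseudo_label(teachers, x_u)
            loss = uncertainty_guided_l1(student(x_u), y_p, u_p)
            opt.zero_grad()
            loss.backward()
            opt.step()

    # Step 3: re-train the student on the labeled data alone (Eq. (9)), so it
    # "forgets" pseudo-label noise and focuses on the ground truth.
    return train_supervised(student, labeled_loader)
```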

Experiments

Setup

Network structure and datasets

The effectiveness of the proposed UGE-ST is verified through its application to two study cases. The first case is the airfoil velocity and pressure field reconstruction, and the second is the electronic devices’ temperature field reconstruction.

Airfoil velocity and pressure field reconstruction. Reconstructing the pressure and velocity fields of an airfoil from a finite number of sensors is significant for airfoil design. We adopt airfoil flow data [37] to verify the validity of the proposed method. A convolutional neural network (CNN) is employed to implement the proposed UGE-ST method. The CNN is a popular deep learning model that is widely used in computer vision [38, 39]. Compared with the multilayer perceptron (MLP), a CNN with fewer parameters can process spatial information and is thus suitable for regular physical field data. Here, we adopt U-net [40] as the backbone of the model. U-net is an effective CNN structure for image-to-image regression: it captures both the overall and the detailed features of an image and has the advantages of multi-scale fusion and handling large images. To use a CNN framework, the sparse observation data need to be projected into an image in an appropriate manner. Similar to [5], this paper maps the local measurements to the spatial domain via a Voronoi tessellation [41], as shown on the left of Fig. 3. The grey dots in the figure indicate the placed sensors. Compared with feeding the sparse measurements directly as input, the Voronoi tessellation preserves the spatial information of the sensor locations; a minimal sketch of this mapping is given below. The output of the deep learning model is the global velocity or pressure field of the airfoil, as shown on the right of Fig. 3.
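Mapping sparse sensor readings onto a regular grid via a Voronoi tessellation amounts to assigning every grid point the value of its nearest sensor. A minimal NumPy/SciPy sketch, with placeholder sensor coordinates and grid size:

```python
import numpy as np
from scipy.spatial import cKDTree

def voronoi_input(sensor_xy, sensor_values, height, width):
    # Build a (height, width) image in which each pixel carries the value of
    # its nearest sensor, i.e., the Voronoi cell it falls into.
    ys, xs = np.mgrid[0:height, 0:width]
    grid = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
    _, nearest = cKDTree(sensor_xy).query(grid)   # index of the closest sensor per pixel
    return sensor_values[nearest].reshape(height, width)

# Illustrative example: 8 random sensors on a 128 x 128 grid.
rng = np.random.default_rng(0)
sensor_xy = rng.uniform(0, 128, size=(8, 2))      # (x, y) sensor coordinates
sensor_values = rng.normal(size=8)                # measured values at the sensors
image = voronoi_input(sensor_xy, sensor_values, 128, 128)
```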

In this case, a total of 1200 labeled samples and 800 unlabeled samples are generated using finite-element simulation. To fully verify the applicability of the proposed method, we divide the labeled data into partition protocols of different scales: 25, 50, 100, 200, 400, and 800 samples. In addition, 400 labeled samples are set aside as test data.

Fig. 3 Voronoi tessellation and global pressure field

Electronic components’ temperature field reconstruction. The normal operation of electronic components highly depends on a stable ambient temperature, and heat dissipation is essential to guarantee this working environment because of the internally generated heat; thermal management of electronic components is therefore an effective way to guarantee a proper working environment. Temperature field reconstruction [42] is a basic task for obtaining the real-time working environment of electronic devices and is adopted here to verify the performance of our method. To verify the generality of the proposed method, in this case we employ another classical neural network, namely the MLP, to implement UGE-ST. The MLP directly accepts sparse observations of the temperature field as model inputs and outputs the temperature values of all points in the whole field. The temperature field data size used in this paper is \(200 \times 200\), the number of placed sensors is 20, and we construct an MLP with 5 layers whose structure is shown in Table 1. The temperature field data are shown in Fig. 4; the grey dots in the right figure indicate the placed sensors.
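A minimal PyTorch sketch of such a 5-layer MLP, mapping 20 sensor readings to a \(200 \times 200\) field; the hidden widths below are placeholders, not the values listed in Table 1.

```python
import torch.nn as nn

class ReconstructionMLP(nn.Module):
    # Five fully connected layers: 20 sensor readings in, 200*200 temperature values out.
    def __init__(self, n_sensors=20, field_size=200, hidden=512):
        super().__init__()
        self.field_size = field_size
        self.net = nn.Sequential(
            nn.Linear(n_sensors, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, field_size * field_size),
        )

    def forward(self, x):
        # x: (batch, n_sensors) -> (batch, field_size, field_size)
        return self.net(x).view(-1, self.field_size, self.field_size)
```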

Fig. 4 Electronic components layout and temperature field

Table 1 The structure of MLP

In this case, a total of 1500 labeled samples and 4000 unlabeled samples are generated using finite-element simulation. Similar to the first study case, we divide the labeled data into partition protocols of different scales: 25, 50, 100, 200, 500, and 1000 samples. Besides, 500 labeled samples are set aside as test data.

Evaluation

To evaluate the performance of the methods, we select the Mean Absolute Error (MAE) and the Root-Mean-Square Error (RMSE) as evaluation metrics, focusing on prediction accuracy.

MAE is an important metric to evaluate the predictive ability of the regression model, which is expressed as

$$\begin{aligned} \text {MAE}=\frac{1}{N}\sum _{i=1}^N \left| \hat{y}_{i}-y_{i} \right| , \end{aligned}$$
(10)

where \(y_{i}\) is the ground truth, \(\hat{y}_{i}\) is the prediction, and N is the number of samples.

RMSE measures the difference between the prediction and the ground truth, and it is sensitive to outliers and scale-dependent. RMSE is frequently used as a metric in machine learning and can be calculated as below

$$\begin{aligned} \text {RMSE}=\sqrt{\frac{1}{N}\sum _{i=1}^N\left( \hat{y}_{i}-y_{i}\right) ^2}, \end{aligned}$$
(11)

where \(y_{i}\) is the ground truth, \(\hat{y}_{i}\) is the prediction, and N is the number of samples.
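For reference, both metrics are straightforward to compute; a minimal NumPy sketch:

```python
import numpy as np

def mae(y_pred, y_true):
    # Eq. (10): mean absolute error.
    return np.mean(np.abs(y_pred - y_true))

def rmse(y_pred, y_true):
    # Eq. (11): root-mean-square error.
    return np.sqrt(np.mean((y_pred - y_true) ** 2))
```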

Implementation details

All experiments are implemented with the PyTorch framework. Model training is performed on a high-performance server with an Intel(R) Xeon(R) Gold 6242 CPU @ 2.80 GHz, an Nvidia RTX 3090 GPU with 24 GB of VRAM, and 500 GB of RAM. We initialize the weights of the whole network randomly and train the models with the AdamW optimizer. To ensure fairness, the optimizer parameters used in the experiments are kept consistent. The initial learning rate is \(\eta = 0.001\), and the Cosine Annealing Warm Restarts scheduler is selected as our learning-rate policy. In the flow field reconstruction experiments, the number of epochs is fixed at 100, and the batch size is set to 8 for both labeled and unlabeled samples. The temperature field reconstruction experiments are trained for 40 epochs with the batch size set to 8.
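The optimizer and learning-rate policy described above correspond to the following PyTorch setup; `model` and `loader` stand for any of the models and data loaders from the earlier sketches, and the restart period T_0 is a placeholder since it is not reported here.

```python
import torch
import torch.nn.functional as F

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=10)

for epoch in range(100):                    # 100 epochs for the flow-field case
    for x, y in loader:                     # batch size 8
        loss = F.l1_loss(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()                        # advance the cosine-annealing schedule each epoch
```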

To demonstrate the superiority of the proposed UGE-ST, a fully supervised baseline and several deep learning-based semi-supervised methods are implemented for comparison; the semi-supervised methods include Mean Teacher [26], co-training [43], and vanilla self-training [23]. We implement the baseline and the other semi-supervised methods with the same experimental setup as our UGE-ST.

Results

Airfoil velocity and pressure field reconstruction

The prediction results in terms of MAE and RMSE are tabulated in Table 2, with the main purpose of comparing the prediction performance of the different methods. p is the pressure field, and u and v represent the velocity components along the x- and y-axes. The results show that the proposed UGE-ST significantly exceeds the supervised baseline and the other semi-supervised methods regardless of the number of labels. Taking the MAE of the pressure field p as an example, UGE-ST surpasses the supervised baseline by 30%, 29%, 33%, 31%, and 25% under 25, 100, 200, 400, and 800 labeled samples, respectively. Compared with self-training, UGE-ST achieves improvements of 29%, 20%, 23%, 20%, and 17%, respectively. For the MAE of the velocity components along the x- and y-axes, our method also achieves better performance. The results also show that the number of labeled samples affects prediction accuracy: the performance of all models degrades significantly as the number of labels decreases.

Table 2 Performance comparison under different number of labeled data for the airfoils’ velocity and pressure field

It is worth mentioning that the other semi-supervised methods, such as Mean Teacher and co-training, achieve poor prediction accuracy, even worse than the supervised baseline. The reason is that such end-to-end methods perform poorly in the early stage of training, which yields pseudo-labels contaminated by large noise. As iterative learning proceeds, the noise in the pseudo-labels accumulates, disrupting the learning process and eventually degrading the model’s performance. Self-training is a two-stage training approach in which the teacher model is first trained with labeled data and then guides the student model. Because the trained teacher contains less noise than a model in the early training stage, the performance degradation caused by error accumulation is mitigated. In other words, although noise still affects the student model, this two-stage approach ensures that the student’s performance is, in theory, no lower than that of the teacher. Our method further improves the performance of the teacher through ensemble learning on top of self-training, while the uncertainty guided learning and the pre-training student prevent the noise in the teacher from affecting the learning of the student.

Fig. 5 Visualization of predictions and errors for pressure field

The visualizations of the predictions and errors of supervision, self-training, and UGE-ST under 200 labeled samples for different pressure fields are shown in Fig. 5. The images in rows 1, 3, 5, and 7 show the predicted pressure fields of the airfoil under different freestream conditions, and the corresponding errors are shown in rows 2, 4, 6, and 8. All three methods can predict the trend of the pressure field, but our approach achieves a smaller error. As shown in row 6 of Fig. 5, there are large errors around the airfoil in the predictions of the self-training and supervised methods, while our approach suppresses these errors well.

Table 3 Performance comparison under different number of labeled data for the temperature field

Electronic components’ temperature field reconstruction

The prediction results in terms of MAE and RMSE are tabulated in Table 3. The results show that the proposed UGE-ST significantly exceeds the other methods in all cases. UGE-ST surpasses the supervised baseline by 33%, 35%, 36%, 30%, 25%, and 24% under 25, 50, 100, 200, 500, and 1000 labeled samples, respectively. Compared with self-training, UGE-ST achieves improvements of 31%, 29%, 25%, 17%, 20%, and 14%, respectively. The results also show that the gain obtained by our method grows as the number of labeled samples decreases; in other words, our method also achieves competitive results in few-shot cases.

From the perspective of the number of labeled samples, our method achieves an error of 6.018e–03 with 100 samples, lower than the 6.948e–03 obtained by the supervised method with 1000 samples. This observation shows that our approach can greatly reduce the amount of labeled data required for the same performance: the proposed method reduces the number of required samples by a factor of ten in some cases and by at least a factor of two.

Fig. 6 Visualization of predictions and errors for temperature field

The visualizations of the predictions and errors of supervision, self-training, and UGE-ST under 25 labeled samples for the temperature field are shown in Fig. 6. The images in rows 1, 3, and 5 are the predicted temperature fields of the electronic components under different power conditions, and the corresponding errors are shown in rows 2, 4, and 6. Although supervision and self-training can predict the trend of the temperature field, large errors still exist. In comparison, the proposed method markedly reduces the error in these regions. As shown in row 6 of Fig. 6, the predictions of the self-training and supervised methods have large errors in the lower left corner, while our approach suppresses these errors well.

Table 4 Ablation study of different components with 200 labeled data on temperature field reconstruction

Ablation studies

The proposed UGE-ST comprises three essential components: ensemble teachers, uncertainty guided learning, and pre-training student. In this section, we examine the actual effectiveness of these three parts in detail. The sensitivity to model size is also reported. We perform the ablation studies with the MLP on the temperature field reconstruction task. Unless otherwise specified, 200 labeled samples and 4000 unlabeled samples are employed for training in all experiments.

Uncertainty guided ensemble self-training

To demonstrate the effectiveness of our novel self-training approach UGE-ST, we adopt basic self-training as the baseline for the ablation studies. The contributions of ensemble teachers, uncertainty guided learning, and pre-training student are shown in Table 4. Adding ensemble teachers and pre-training student independently reduces the prediction error by 2.64e–04 and 8.61e–04, respectively, while combining them further reduces the prediction error by 1.086e–03. The results indicate that improving the pseudo-label’s accuracy through ensemble teachers and decoupling the training through the pre-training student effectively mitigate the influence of noise during training, thus reducing the prediction error. Uncertainty guided learning must be carried out on the basis of ensemble teachers. The result shows that adding uncertainty guided learning reduces the prediction error by 3.62e–04 compared with the baseline, indicating that it can filter out the noise and guide the model to learn the correct pixels in the pseudo-label, further improving the prediction accuracy of the model. When all three components are employed together, we achieve the best performance of 5.529e–03.

The influence of ensemble teachers number

We show the influence of the number of ensemble teachers on the prediction results in Fig. 7. The number of ensemble teachers is set to 1 (i.e., no ensemble), 2, 3, and 5. In the UGE-ST framework, the training progresses in three steps. To thoroughly investigate the impact of ensemble teachers, we present the performance of the pseudo-label, the pre-trained student, and the re-trained student, which are produced by the first, second, and third training steps, respectively. The pseudo-label is provided by the ensemble teachers, the pre-trained student is supervised by the pseudo-label on the unlabeled dataset, and the re-trained student is trained using only labeled data starting from the pre-trained student. These results offer insights into the influence of ensemble teachers throughout the training process. It is worth mentioning that we cannot quantify the uncertainty when the ensemble number is 1; therefore, uncertainty guided learning is not employed in this comparison to keep it fair.

Fig. 7 The prediction error changes with the ensemble teachers’ number

The results indicate that, as the number of ensemble teachers increases, the accuracy of the pseudo-label improves significantly, which in turn increases the performance of the pre-trained student supervised by the pseudo-label. When the student model is re-trained with the labeled data, prediction accuracy increases considerably. The performance of the re-trained student also improves with the number of ensemble teachers, but the rate of improvement is small. The reason is that UGE-ST re-trains the student to reduce the influence of noise on the model, which also reduces the impact of the pseudo-label on the final performance of the student model. Moreover, the performance improvement slows down as the ensemble number increases. We report the computational complexity through the FLOPs and parameter counts of the model in Table 5. The training cost increases linearly with the ensemble number. When the ensemble number is 5, the prediction accuracy is not significantly improved, while the training cost nearly doubles. In practice, the ensemble number must be chosen carefully to balance the training cost against the model’s accuracy. In our experiments, we chose an ensemble number of 3 to obtain considerable performance at an acceptable training cost.

Table 5 The computational complexity using UGE-ST with MLP under different ensemble teachers’ number
Table 6 The influence of uncertainty guided learning under different ensemble teachers’ number
Table 7 Performance comparison under different MLP sizes

The influence of uncertainty guided learning

The influence of uncertainty guided learning is shown in Table 6. Uncertainty guided learning is used to train the student model with the pseudo-label in the second step of our method. To thoroughly investigate the impact of uncertainty guided learning, we present the performance of the pre-trained and re-trained students generated from the second and third training steps, respectively. The pre-trained and re-trained students achieve better performance when guided by uncertainty. When the ensemble number is 2, 3, and 5, the accuracy improvements obtained by the pre-trained student from uncertainty guided learning are 5.25e–04, 3.04e–04, and 1.22e–04, respectively. We note that the re-trained student gains less from uncertainty guided learning than the pre-trained student. This is because uncertainty guided learning forces the model to learn the confident region in the pseudo-label, and the performance is improved by eliminating the noise of the pseudo-label. At the same time, the impact of the pseudo-label on the final student model is damped due to the successive training of unlabeled and labeled data. The re-training process reduces the benefit of uncertainty guided learning while mitigating the impact of pseudo-label noise.

The sensitivity to the model size

We employ MLPs with different numbers of neurons to demonstrate the sensitivity of our approach to model size. The structures of the adopted MLPs and the results are shown in Table 7. We doubled and halved the number of neurons in the hidden layers relative to MLP-O. It can be seen that a bigger model yields more accurate predictions. From the perspective of the method, our UGE-ST achieves stable improvements across different model sizes. With MLP-D, UGE-ST surpasses supervision and self-training by 26% and 18%, respectively. With MLP-H, UGE-ST still achieves improvements of 29% and 9% over supervision and self-training.

Conclusion

In this paper, we propose a novel semi-supervised method, uncertainty guided ensemble self-training (UGE-ST), which aims to improve reconstruction performance with limited labeled data. UGE-ST consists of ensemble teachers, uncertainty guided learning, and pre-training student. The ensemble teachers use ensemble learning to construct multiple teacher models that jointly guide the training of the student model; their “collective voting” mitigates the pseudo-label errors produced by any individual teacher model, resulting in accurate pseudo-labels. Uncertainty guided learning builds on the ensemble teachers to quantify the uncertainty in the pseudo-label, forcing the student to learn the regions with less noise and avoiding the propagation and accumulation of noise in the student. The pre-training student trains the student model separately on pseudo-labeled and labeled data, enabling the student model to forget the noise in the previously learned pseudo-labels. Experiments show that the proposed uncertainty guided ensemble self-training method can substantially improve the reconstruction performance of the global physics field with limited labeled data.