In recent years, data-driven methods such as deep learning have achieved remarkable results in many fields [1]. However, it is worth noting that, beyond images and natural language processing, which enjoy abundant Internet resources, relatively few research fields can truly collect large quantities of data [2]. Especially in high-end manufacturing, such as aerospace, the cost of collecting health condition data is very high. Therefore, it is valuable to realize few-shot health condition estimation for such complex systems [3]. Many machine learning methods require massive quantities of labeled data for training to improve their performance [4]. Therefore, in the field of health condition estimation, methods such as neuroadaptive networks [5], support vector machines [6], Gaussian process regression [7], and hidden Markov models [8], which are all data-driven in a broad sense, are difficult to apply directly to the few-shot settings that exist widely in daily conditions [9]. To the best of our knowledge, there are currently no studies specifically aimed at few-shot health condition estimation, although scholars have applied few-shot learning theory in fields such as fault diagnosis [10], image classification [11], and target detection [12]. Therefore, studying few-shot health condition estimation has both theoretical and practical significance.

Few-shot learning theory is a new learning methodology proposed to address the insufficient information of few-shot datasets, which are also called small samples. It studies how to train an intelligent and effective machine recognition model with a small number of training samples (usually dozens of samples, a single sample, or even zero samples) [13]. It is generally believed that few-shot learning methods can be divided into the following categories. (1) Data enhancement methods improve the performance of few-shot learning by expanding the training samples, but this kind of method does not fundamentally solve the problem of few-shot learning [14]. (2) Metric learning methods model the distance distribution between samples in an embedding space, making samples of the same class close to each other and samples of different classes far from each other. However, this kind of method can easily lead to overfitting [15]. (3) Initialization methods train the model in the source domain and then fine-tune it on the target domain to achieve fast iteration and good generalization ability; they include transfer learning, in which the source and target domains are similar [16], and meta-learning, which learns to learn. Among them, meta-learning methods have developed rapidly. For example, Wang et al. proposed a metric-based meta-learning model for small-sample fault diagnosis [17]. Ding et al. proposed meta-deep learning for implementing small-sample rotating machinery health prognostics [18]. The essence of these methods is to achieve efficient few-shot learning by acquiring meta-knowledge. However, the current understanding of meta-knowledge is not deep enough, so the generalization ability of related meta-learning methods needs to be improved [19]. Moreover, in few-shot learning research, another very effective method is often overlooked: knowledge reasoning.
Compared with abstract meta-knowledge, expert knowledge is very specific and vivid. Since expert knowledge is integrated with past learning experience, it can effectively improve the learning ability of small samples [20].

To further understand few-shot learning, we find that data enhancement methods start from the data level and improve few-shot learning ability by mining data features and expanding the amount of training data [21]. Metric learning methods start from the sample level and improve the learning ability by understanding the overall distribution of the samples [22]. Initialization methods start from the knowledge level by learning a source domain that is related to the target domain and obtaining basic knowledge to improve the few-shot learning ability in the target domain [23]. These methods attempt to solve the few-shot learning problem from three different levels, and each single kind of method has certain drawbacks. Therefore, it is worthwhile to merge the above methods.

Among them, although the data enhancement method cannot fundamentally solve the problem of few-shot learning, it has been proven to be beneficial to deep machine learning and is very effective for few-shot learning [24]. Therefore, it can be used as an auxiliary method for other few-shot learning methods. Early data enhancement methods applied basic transformations, such as translation, rotation, and shearing, to existing data to obtain a richer variety of generated data, thereby avoiding overfitting.

In recent years, some new data enhancement methods have emerged, such as generative adversarial networks (GANs) [25], disturbance compensation [26], and feature space enhancement methods [27]. Among them, Goodfellow et al. proposed a dual network structure that optimizes the generation model through the adversarial process, which has good training efficiency and generation effect. After training, the GAN can fully mine data features and achieve good data enhancement.

In past research on device health condition estimation, the methods used share a precondition: that the training data and test data satisfy the same distribution [28], which is one of the foundations of such research. However, this precondition almost never holds in daily conditions, just as there are no two leaves in the world that are exactly the same. Even the working environments of the same kind of device cannot be completely similar, which makes the decline in their health conditions different [29]. The assumption facilitates research, but we should not completely ignore the existence of differences between the training data and test data distributions. Especially for few-shot health condition estimation, because few-shot data contain less information, this may result in a poor model training effect [30]. Therefore, the initialization methods mentioned above, especially transfer learning, can fully learn from a sufficient quantity of source domain data that are similar to the target domain and can achieve accurate estimation of the target domain without satisfying the assumption of independent and identical distribution [31].

The difficulty of few-shot learning is the lack of prior knowledge caused by insufficient data. Therefore, how to effectively improve the prior knowledge level of learning methods is the key to solving few-shot learning problems [32]. Notably, Tang et al. demonstrated that knowledge reasoning methods such as the belief rule base (BRB) have better few-shot learning capabilities than data-driven methods such as neural networks [20].

Therefore, a novel idea is to combine the data enhancement ability of GAN and the few-shot learning ability of BRB to achieve a more accurate few-shot health condition estimation. This paper consists of the following parts. A literature review on GANs and BRBs is presented in Sect. “Literature review”. In Sect. “Batch monotonic GAN”, we introduce the basic idea of the generative model and GAN and propose a batch monotonic GAN for few-shot data generation. In Sect. “Generative transfer-belief rule base”, we propose a generative transfer-belief rule base (GT-BRB) model and describe the implementation process of the GT-BRB. In Sect. “Case study”, a few-shot dataset is simulated using NASA lithium battery data to validate the GT-BRB with and without auxiliary training data separately. Section 6 presents the conclusions and future directions.

Literature review

Few-shot data generation is one of the latest research areas of GAN, among which local-fusion GAN (LoFGAN) is proposed to fuse local representations for few-shot image generation [33]. Few-shot GAN (FSGAN) uses component analysis techniques for adapting GANs in few-shot settings (fewer than 100 images) [34]. Matching-based GAN (matching GAN) is proposed for few-shot image generation, which includes a matching generator and a matching discriminator [35]. It can be seen that the current generation methods of few-shot data based on GAN are concentrated in the image field. Relevant studies have shown that device health condition data generally present an overall monotonic characteristic [36], which is quite different from the characteristic of the differential distribution of image data. Therefore, the methods for few-shot data generation in the image field are not suitable for such data as the few-shot health condition data of devices. Therefore, this paper designs a novel batch monotonic GAN (BM-GAN) for the few-shot data generation of the device health condition.

Yang et al. proposed the belief rule base (BRB) method based on rule reasoning and data-driven thinking, which realizes an effective combination of expert knowledge and data training and has good nonlinear relationship fitting ability [37]. In recent years, related research on BRBs has mainly focused on methods for automatically generating the initial parameters of large-scale BRBs [38], rule reduction and training methods for extended BRBs [39], and hybrid BRBs for safety assessment with data and knowledge under uncertainty [40]. It is regrettable that there are few studies on the application of the BRB method under few-shot conditions, especially on the combination of the BRB method and transfer learning. Therefore, this paper first uses the few-shot real data to generate a larger quantity of simulated training data through GAN; second, combines the simulated training data and expert knowledge to train a generalized BRB model; and third, uses the real data to fine-tune the generalized model into a dedicated BRB model to improve the accuracy of few-shot health condition estimation.

The work of this paper is based on the following foundations. (1) GAN has excellent data generation capabilities, but the existing research on few-shot data generation focuses on the image field, and it is difficult to adapt to the overall monotonic characteristics of the device health condition. (2) The BRB method has good few-shot learning capabilities, but there are few publications about the effective transfer learning architecture of knowledge inference methods such as BRB. (3) Few-shot data are the normal state of the actual device operation process, but there are few publications about the few-shot health condition estimation.

The main contributions of this paper are as follows:

  • We propose a batch monotonic GAN model, which solves the problem that traditional GANs can only generate simulation data with the same dimensions as the input data.

  • We provide a transfer learning architecture of a knowledge-based method such as BRB and introduce its implementation process with or without auxiliary data.

  • We analyze and prove that the combination of data enhancement and expert knowledge can effectively solve the problem of few-shot health condition estimation.

Batch monotonic GAN

In this section, the basic idea of the generative model and GAN is introduced. Then, by analyzing the implementation process of GAN, it is concluded that traditional GAN cannot generate data of different dimensions. Therefore, a batch monotonic method is proposed as the loss function of GAN, which can generate simulation data with different dimensions from real data.

The basic idea of the generative model and GAN

The difficulty of few-shot learning is the lack of sample quantity and quality. It is difficult to learn the complete distribution of data through limited data. The most direct method for solving the lack of data is to generate simulated data by learning the data distribution and prior knowledge, which is the basic idea of generative models.

In the context of statistics, a generative model refers to a mathematical model of the joint probability distribution of the simulated data and the annotated result sequence, from which a simulated data sequence can be randomly generated under a specific implicit mapping relationship. In the artificial intelligence field, generative models are used to sample data based on the probability distribution and to generate expanded datasets by learning existing data structures. This process can be defined as follows:

$$ x^{\prime} = G(L(sample(x))), $$

where x is the real data, x' is the generated data, sample represents the sampling operation, L represents the learning function of the probability distribution, and G represents the data generation based on the learned probability distribution. The generative model can be divided into two types according to its function. The first obtains the exact distribution function of the dataset by learning the given data. The second generates new data under the premise of a fuzzy data distribution function. The generative adversarial network used in this paper is the second type of model.
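The sample–learn–generate pipeline above can be sketched in a few lines; here the learning function L is simply a Gaussian fit and G samples from the fitted distribution (an illustrative choice of ours, not the GAN introduced later):

```python
import numpy as np

def learn_distribution(samples):
    # L: learn a (here: Gaussian) probability distribution from sampled data
    return samples.mean(), samples.std()

def generate(params, n, rng):
    # G: generate n new points from the learned distribution
    mu, sigma = params
    return rng.normal(mu, sigma, size=n)

rng = np.random.default_rng(0)
x = rng.normal(2.0, 0.5, size=200)                    # real data (sampled)
x_prime = generate(learn_distribution(x), 1000, rng)  # x' = G(L(sample(x)))
```

A model of the first type would return the fitted distribution itself; a model of the second type, like GAN, only returns new samples without an explicit distribution function.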

The generative model has been proven to effectively simulate and deal with high-dimensional distribution problems and has produced much interesting research in combination with reinforcement learning, semi-supervised learning, etc. Practical applications such as image enhancement and artistic creation are very valuable research directions. Early elementary numerical transformation methods for data generation essentially reorganize the original data in disorder, are unable to effectively capture sample characteristics, and may even produce very unbelievable results. Later, learning-style generation methods such as autoencoders were applied. These methods produce new data that are overly similar to the original data distribution, which means that deeper data features have not been learned, and overfitting often occurs. GAN brings a new idea to generative models through adversarial learning and has also achieved effective applications in data generation and other fields. The generative adversarial process of GAN is shown in the following equation:

$$ \arg \mathop {\min }\limits_{G} \mathop {\max }\limits_{D} V(\theta^{(D)} ,\theta^{{({\text{G}})}} ). $$

GAN draws on the idea of a zero-sum game and constructs a discriminator (D) and a generator (G) simultaneously, where \(\theta\) is the corresponding network parameter set. The cost function of the discriminator is denoted as \(J^{(D)}\), and its calculation method is

$$ J^{(D)} (\theta^{(D)} ,\theta^{{({\text{G}})}} ) = - \frac{1}{2}E_{{x{{\sim }}P_{data} }} \log D(x) - \frac{1}{2}E_{{z{{\sim }}P_{z} }} \log (1 - D(G(z))). $$

Since the generator and the discriminator are in a zero-sum game adversarial state, the relationship of their cost function is

$$ J^{(G)} = - J^{(D)} . $$

Combine the cost functions of the generator and the discriminator into a unified value function

$$ V(\theta^{(D)} ,\theta^{{({\text{G}})}} ) = E_{{x{{\sim }}P_{data} }} \log D(x) + E_{{z{{\sim }}P_{z} }} \log (1 - D(G(z))) $$
$$ \left\{ {\begin{array}{*{20}l} {J^{(G)} = \frac{1}{2}V(\theta^{(D)} ,\theta^{{({\text{G}})}} )} \\ {J^{(D)} = - \frac{1}{2}V(\theta^{(D)} ,\theta^{{({\text{G}})}} )} \\ \end{array} } \right.. $$

In the adversarial training process, the generator hopes \(V(\theta^{(D)} ,\theta^{{({\text{G}})}} )\) to be as small as possible, while the discriminator is the opposite, thus establishing a game process.

In recent years, GANs have been widely used in the fields of data generation, image processing, and style transfer. The typical structure of a GAN is shown in Fig. 1.

Fig. 1
figure 1

Typical structure of GAN

The computational structure of GAN

This section analyses the mathematical principles of generator training.

First, assume that the generator G is fixed. Set \(G(z) = x\)

$$ V = E_{{x{{\sim }}P_{data} }} \log D(x) + E_{{x{{\sim }}P_{g} }} \log (1 - D(x)) $$
$$ V = \int {P_{data} } (x)\log D(x){\text{d}}x + \int {P_{g} } (x)\log (1 - D(x)){\text{d}}x $$
$$ V = \int {P_{data} } (x)\log D(x) + P_{g} (x)\log (1 - D(x)){\text{d}}x. $$

The current problem is transformed into finding a D that maximizes V. Using the fact that the derivative at an extreme point is zero, the optimal solution of \(D(x)\) is obtained as

$$ D^{*} (x) = \frac{{P_{data} (x)}}{{P_{data} (x) + P_{g} (x)}}. $$

Obviously, the value of \(D^{*} (x)\) is between 0 and 1. When real data are input, the judgement value of the discriminator should be as close to 1 as possible, and when the simulated data are input, the judgement value of the discriminator should be as close to 0 as possible. When the distribution of the simulated data is very close to the real data, the mean judgement value should tend to be \(\frac{1}{2}\).
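These properties of \(D^{*}(x)\) can be verified numerically; the two Gaussian densities below are an arbitrary illustrative choice of ours:

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    # density of a normal distribution, evaluated on a grid
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

x = np.linspace(-5.0, 5.0, 1001)
p_data = gauss_pdf(x, 0.0, 1.0)
p_g = gauss_pdf(x, 1.0, 1.0)            # generator distribution differs from the data

d_star = p_data / (p_data + p_g)        # optimal discriminator
d_star_eq = p_data / (p_data + p_data)  # case P_g = P_data
```

Here `d_star` stays strictly between 0 and 1, while `d_star_eq` equals exactly 1/2 everywhere, matching the ideal-generator case described above.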

Use the conclusion about \(D^{*} (x)\) to analyze the calculation method of the ideal generator \(G^{*}\)

$$ \mathop {\max }\limits_{D} V(G,D) = V(G,D^{*} ) $$
$$ V(G,D^{*} ) = \int {P_{data} } (x)\log D^{*} (x){\text{d}}x + \int {P_{g} } (x)\log (1 - D^{*} (x)){\text{d}}x $$
$$ V(G,D^{*} ) = \int {P_{data} } (x)\log \frac{{P_{data} (x)}}{{P_{data} (x) + P_{g} (x)}}{\text{d}}x + \int {P_{g} } (x)\log \frac{{P_{g} (x)}}{{P_{data} (x) + P_{g} (x)}}{\text{d}}x. $$

Then, we transform it into the form of the Jensen–Shannon divergence based on the Kullback–Leibler divergence, and the result after sorting is as follows:

$$\mathop {\max }\limits_D V(G,D) = - \log (4) + KL\left( {\left. {P_{data} } \right\|\frac{{P_{data} + P_g }}{2}} \right) + KL\left( {\left. {P_g } \right\|\frac{{P_{data} + P_g }}{2}} \right)$$
$$ \mathop {\max }\limits_{D} V(G,D) = - \log (4) + 2 \times {\text{JSD}}\left( {\left. {P_{data} } \right\|P_{g} } \right). $$

Because \({\text{JSD}}(P_{data} \| P_{g}) \ge 0\), with equality if and only if \(P_{data} = P_{g}\), that is, when the ideal generator state is reached, the global minimum \(\mathop {\max }\nolimits_{D} V(G,D) = - \log(4)\) is obtained.
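The identity between \(\max_D V(G,D)\) and the Jensen–Shannon divergence can be checked by numerical integration on a grid; this is our own sanity check using two illustrative Gaussian densities:

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def kl(p, q, dx):
    # discretized Kullback-Leibler divergence on a common grid
    return float(np.sum(p * np.log(p / q)) * dx)

x = np.linspace(-10.0, 10.0, 20001)
dx = float(x[1] - x[0])
p = gauss_pdf(x, 0.0, 1.0)          # P_data
q = gauss_pdf(x, 1.0, 1.0)          # P_g

m = (p + q) / 2
jsd = 0.5 * kl(p, m, dx) + 0.5 * kl(q, m, dx)

d_star = p / (p + q)                # optimal discriminator from the previous step
v_max = float(np.sum(p * np.log(d_star) + q * np.log(1 - d_star)) * dx)
```

Here `v_max` agrees with `-log(4) + 2*JSD` and is therefore bounded below by `-log(4)`, with the bound attained only when \(P_{data} = P_{g}\).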

The pseudocode of the basic GAN is given below, where k is the number of iterations of the discriminator.

figure a

Few-shot data generation based on BM-GAN

In health condition estimation research, health condition data are often a set of small-sample time-series data. Due to the different working conditions of each device, even the same kind of device has a diversity of health condition data. However, if a traditional GAN is directly used to generate health condition data, it is prone to gradient instability and mode collapse. The main reason is the insufficient representation of small-sample data, which easily causes overfitting of complex generative networks. Therefore, it is necessary to improve the traditional GAN according to the characteristics of few-shot health condition data generation.

Relevant studies have proven that the characteristics of the device health condition exhibit overall monotonicity at the data distribution level. This paper proposes a few-shot overall monotonicity function, and the calculation method is as follows:

$$ \left\{ {\begin{array}{*{20}c} {f\left( {x^{k} } \right) = \left\| {\frac{{\sum\limits_{i = 1}^{T - 1} g \left( {x_{i + 1}^{k} - x_{i}^{k} } \right)}}{T - 1}} \right\|} \\ {g(x) = \left\{ {\begin{array}{*{20}c} {1,\quad x > 0} \\ {0,\quad x = 0} \\ { - 1,\quad x < 0} \\ \end{array} } \right.} \\ \end{array} } \right.. $$
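A minimal sketch of this overall monotonicity function, under our reading of the formula as the absolute mean sign of successive differences:

```python
import numpy as np

def overall_monotonicity(x):
    """f(x): absolute value of the average sign of successive differences.

    Equals 1 for a strictly monotone series and tends to 0 for a series
    with no overall trend.
    """
    diffs = np.diff(np.asarray(x, dtype=float))
    return float(abs(np.sign(diffs).mean()))

print(overall_monotonicity([5, 4, 3, 2, 1]))   # strictly decreasing -> 1.0
print(overall_monotonicity([1, 2, 1, 2, 1]))   # oscillating -> 0.0
```

A degrading health indicator scores close to 1 even with local noise, which is what lets the function discriminate health-like series from trendless ones.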

For few-shot data generation, a key issue that needs to be solved is how to measure the similarity between few-shot real data and large-batch generated data. For traditional GAN, the basis for calculating the similarity between the generated data and the real data is cross-entropy, and the calculation method is as follows:

$$ H(x,y) = E_{x} [ - \log y] = - \sum\limits_{i = 1}^{n} {x_{i} } \log y_{i} . $$

From the cross-entropy calculation method, it can be known that the traditional GAN requires the generated data to have the same dimensions as the real data. In device few-shot health condition estimation research, the generated data need to have a higher dimension and richer distribution characteristics than the real data. Therefore, this paper designs an average cross-entropy to realize the similarity calculation between the few-shot real data x and the large-batch generated data G(z)

$$ \left\{ {\begin{array}{l} {loss\_BMGAN = \left[ {ave({{\min }_{{G^i}}}{{\max }_D}V(D,{G^i}))} \right]*\lambda }\\ {ave({{\min }_{{G^i}}}{{\max }_D}V(D,{G^i})) = \left[ {\frac{1}{N}\sum\limits_{i = 1}^N {{{\mathbb{E}}_{{\mathbf{x}}\sim {p_{{\rm{data }}}}({\mathbf{x}})}}(\log D({\mathbf{x}})) + {{\mathbb{E}}_{{{\mathbf{z}}^i}\sim {p_{{{\mathbf{z}}^i}}}({{\mathbf{z}}^i})}}\left( {\log (1 - D(G({{\mathbf{z}}^i})))} \right)} } \right]}\\ {\lambda = \frac{{\min (f(x),f(G(z)))}}{{\max (f(x),f(G(z)))}}} \end{array}} \right. $$

where N represents the magnification factor, generally a positive integer, and \(z^{i}\) and \(G(z^{i})\) represent the random vector and the generated data of the ith batch. The average cross-entropy solves the dimensional imbalance between the few-shot real data and the large-batch generated data, so that the large-batch generated data can have a data distribution similar to that of the few-shot real data.
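The scaling factor λ and the batch-averaged loss can be sketched as follows; the per-batch adversarial losses are assumed to be computed elsewhere, and the helper names (and the convention of returning λ = 1 when both monotonicity values are zero) are ours:

```python
import numpy as np

def overall_monotonicity(x):
    # |average sign of successive differences|, as defined above
    diffs = np.diff(np.asarray(x, dtype=float))
    return float(abs(np.sign(diffs).mean()))

def monotonicity_ratio(x_real, x_gen):
    """lambda = min(f(x), f(G(z))) / max(f(x), f(G(z))).

    Close to 1 when the generated data reproduce the overall monotonicity
    of the few-shot real data; we return 1.0 when both values are zero.
    """
    f_r, f_g = overall_monotonicity(x_real), overall_monotonicity(x_gen)
    hi = max(f_r, f_g)
    return min(f_r, f_g) / hi if hi > 0 else 1.0

def bmgan_loss(batch_losses, x_real, x_gen):
    """loss_BMGAN: average of the N per-batch adversarial losses, scaled by lambda."""
    return float(np.mean(batch_losses)) * monotonicity_ratio(x_real, x_gen)
```

With perfectly monotone real and generated series, λ = 1 and the loss reduces to the plain batch average; a trendless generated batch drives λ (and hence the loss weight) toward 0.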

In this paper, this method is called batch monotonic GAN (BM-GAN), and its implementation process is as follows.

figure b

In this section, aiming at the distribution characteristic of the device health condition, an overall monotonicity function is designed to calculate the monotonicity distribution of real data and generated data. To solve the data dimension imbalance between the few-shot real data and the large-batch generated data, average cross-entropy is designed to realize the similarity calculation between data of different dimensions. The two methods are combined flexibly to realize few-shot data generation based on BM-GAN.

Generative transfer-belief rule base

In this section, the basic BRB and its inference method are introduced, then the GT-BRB method based on transfer learning is designed, and finally, the implementation process of the GT-BRB method with or without auxiliary data is detailed.

The basic BRB model

The BRB method integrates theories such as D-S evidence reasoning, fuzzy sets, and the IF–THEN rule base. With the support of certain expert knowledge, it can effectively deal with incomplete or inaccurate information. It is very suitable for few-shot health condition estimation.

The foundation of the BRB method is the IF–THEN rule base, with the addition of rule weights, antecedent attribute weights, and confidence inference methods. The expression method of the kth rule of BRB is

$$ \begin{gathered} R_{k} : \, \hfill \\ {\text{IF }}x_{1} {\text{is }}A_{1}^{k} \wedge x_{2} {\text{is }}A_{2}^{k} \wedge \cdots \wedge x_{{T_{k} }} {\text{is }}A_{{T_{k} }}^{k} \hfill \\ {\text{THEN }}\left\{ {\left( {D_{1} ,\beta_{1,k} } \right),\left( {D_{2} ,\beta_{2,k} } \right), \cdots ,\left( {D_{N} ,\beta_{N,k} } \right)} \right\} \hfill \\ {\text{With a rule weight }}\theta_{k} {\text{ and attribute weight }}\delta_{1} ,\delta_{2} , \cdots ,\delta_{k} . \hfill \\ \end{gathered} $$

Among them, \(x_{i}\) is the ith input of the BRB system, \(i = 1,2, \cdots ,T_{k}\). \(A_{i}^{k}\) is the reference value of the ith antecedent attribute in the kth rule, \(k = 1,2, \cdots ,L\). \(\beta_{j,k}\) is the confidence of the jth evaluation level in the kth rule, \(j = 1,2, \cdots ,N\). \(\theta_{k} \, \) is the rule weight corresponding to rule k, and \(\delta_{k}\) is the weight value of the kth antecedent attribute.

Due to the variety of data types, the BRB system will normalize before processing the data and combine the membership function to calculate the conversion value of each input. The calculation method of this conversion technology is as follows:

$$ \alpha_{i,j}^{k} = \frac{{\gamma_{i,j + 1} - x_{i} }}{{\gamma_{i,j + 1} - \gamma_{i,j} }},\quad \gamma_{i,j} \le x_{i} \le \gamma_{i,j + 1} ,j = 1,2, \cdots ,J_{i} - 1 $$
$$ \alpha_{i,j + 1}^{k} = 1 - \alpha_{i,j}^{k} ,\quad \gamma_{i,j} \le x_{i} \le \gamma_{i,j + 1} ,\quad j = 1,2, \cdots ,J_{i} - 1 $$
$$ \alpha_{i,q}^{k} = 0,\quad q = 1,2, \cdots J_{i} ,\quad q \ne j,j + 1. $$
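A sketch of this input transformation for sorted reference values \(\gamma_{i,1} < \cdots < \gamma_{i,J_i}\) (the function name is ours):

```python
import numpy as np

def membership_degrees(x_i, gamma):
    """Convert input x_i into belief degrees over the reference values gamma.

    Only the two reference values bracketing x_i receive nonzero degrees,
    and those degrees sum to 1; all other alpha_{i,q} stay 0.
    """
    gamma = np.asarray(gamma, dtype=float)
    alpha = np.zeros(len(gamma))
    j = int(np.searchsorted(gamma, x_i, side="right")) - 1
    j = min(max(j, 0), len(gamma) - 2)      # clamp x_i into the covered range
    alpha[j] = (gamma[j + 1] - x_i) / (gamma[j + 1] - gamma[j])
    alpha[j + 1] = 1.0 - alpha[j]
    return alpha
```

For an input halfway between two references, both receive degree 0.5, which is the normalization the membership function is meant to provide.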

Inference method of the BRB

As a typical expert knowledge system, the reliable operation of BRBs relies on accurate knowledge expression and reasonable rule inference. The inference process of the BRB consists of the following two parts:

(1) Calculation of the activation weight of the rule.

The activation weight \(w_{k}\) changes dynamically according to the input data. The calculation method of the activation weight \(w_{k}\) corresponding to the kth rule is

$$ w_{k} = \frac{{\theta_{k} \prod\nolimits_{i = 1}^{M} {\alpha_{i}^{k} } }}{{\sum\nolimits_{l = 1}^{L} {\theta_{l} } \prod\nolimits_{i = 1}^{M} {\alpha_{i}^{l} } }}, $$

where \(w_{k} \in [0,1],\quad k = 1,2, \cdots ,L\).

(2) Data fusion and rule inference methods.

First, calculate the confidence \(\hat{\beta }_{j}\) of the evaluation result \(D_{j}\), according to the activation weight \(w_{k}\) of rule k and the confidence \(\beta_{j,k}\) of the jth evaluation level

$$ \hat{\beta }_{j} = \frac{{u \times \left[ {\prod\limits_{k = 1}^{L} {\left( {w_{k} \beta_{j,k} + 1 - w_{k} \sum\limits_{i = 1}^{N} {\beta_{i,k} } } \right)} - \prod\limits_{k = 1}^{L} {\left( {1 - w_{k} \sum\limits_{i = 1}^{N} {\beta_{i,k} } } \right)} } \right]}}{{1 - u \times \left[ {\prod\limits_{k = 1}^{L} {\left( {1 - w_{k} } \right)} } \right]}}. $$

After obtaining the confidence level \(\hat{\beta }_{j}\) of each \(D_{j}\), the output result of the BRB is formed

$$ S(X) = \left\{ {\left( {D_{j} ,\hat{\beta }_{j} } \right),i = 1,2, \cdots ,M,j = 1,2, \cdots ,N} \right\}. $$

However, for health condition estimation, it is also necessary to form a comprehensive estimation result, so a final value must be calculated from the \(D_{j}\). First, the normalization factor u used above is calculated as follows:

$$ \begin{aligned} u & = \left[ \sum\limits_{j = 1}^{N} {\prod\limits_{k = 1}^{L} {\left( {w_{k} \beta_{j,k} + 1 - w_{k} \sum\limits_{i = 1}^{N} {\beta_{i,k} } } \right)} } \right. \\ &\quad \left. - (N - 1)\prod\limits_{k = 1}^{L} {\left( {1 - w_{k} \sum\limits_{i = 1}^{N} {\beta_{i,k} } } \right)} \right]^{ - 1} .\end{aligned} $$

Then, the expected utility of the output of the entire BRB is

$$ \mu (S(X)) = \sum\limits_{j = 1}^{N} \mu \left( {D_{j} } \right)\hat{\beta }_{j} . $$

Therefore, the final result \(\hat{y}\) of the health condition estimation is

$$ \hat{y} = \mu (S(X)). $$
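The whole inference chain (activation weights, analytic evidential-reasoning fusion, expected utility) can be sketched end to end; the helper names are ours, and a useful sanity check is that when a single rule is fully activated, the fused beliefs equal that rule's beliefs:

```python
import numpy as np

def activation_weights(theta, alpha):
    """w_k = theta_k * prod_i alpha_i^k, normalised over the L rules."""
    raw = np.asarray(theta, float) * np.prod(np.asarray(alpha, float), axis=1)
    return raw / raw.sum()

def er_fusion(w, beta):
    """Analytic evidential-reasoning fusion: beta has shape (L rules, N grades)."""
    w = np.asarray(w, float)
    beta = np.asarray(beta, float)
    total = beta.sum(axis=1)                          # sum_i beta_{i,k} per rule
    a = w[:, None] * beta + (1 - w * total)[:, None]  # per-grade product terms
    prod_a = np.prod(a, axis=0)                       # product over the L rules
    b = np.prod(1 - w * total)
    n_grades = beta.shape[1]
    u = 1.0 / (prod_a.sum() - (n_grades - 1) * b)
    return u * (prod_a - b) / (1 - u * np.prod(1 - w))

def estimate(w, beta, utilities):
    """Final health estimate: expected utility of the fused belief distribution."""
    return float(np.dot(er_fusion(w, beta), utilities))
```

The grade utilities \(\mu(D_j)\) are supplied by the expert; the example values used to exercise this sketch are hypothetical, not taken from the paper's case study.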

Transfer learning of GT-BRB

The basic idea of transfer learning is to first train a generalized model on an existing dataset, which is the same or similar to the training data, such as data from the same device in different periods, or the data of the same kind of device under different working conditions. Second, it establishes the transfer relationship between the existing dataset and the training dataset, usually mining the functional relationship that can be used as the training data feature in the generalization model. Finally, through the determined transfer relationship, the training data are input to fine-tune the generalization model to obtain a dedicated model to realize transfer learning.

The advantages of transfer learning are as follows. (1) Transfer learning effectively improves the efficiency of few-shot learning. The initial generalization model is obtained through training with a large quantity of existing data, which can effectively reduce the cost of few-shot learning model training. (2) Transfer learning solves the difficulty of feature extraction in small samples to a certain extent. By training the generalization model, an effective reference feature set can be obtained for the few-shot dataset. (3) Transfer learning effectively improves the generalization learning ability of few-shot learning. Learning the relatively rich features of existing datasets effectively avoids the overfitting phenomenon, which easily occurs in few-shot learning.

Currently, transfer learning is widely used in data-driven methods such as neural networks, but transfer learning for knowledge reasoning methods has rarely been studied. This paper draws on the idea of transfer learning and proposes a generative knowledge-based transfer learning architecture. Specifically, the data generation capability of GAN is used, taking the generated data as the source domain and the real data as the target domain. Then, combined with the prior knowledge of BRBs, we attempt to solve the problem of few-shot health condition estimation. The generative transfer BRB (GT-BRB) method proposed in this paper has two main application scenarios. The main difference between them is whether auxiliary training data, which differ from the test data, are available.

The first scenario is without auxiliary training data. The training set and the test set belong to the same kind of device, which is called the same domain data in this paper. The following process is used to carry out the few-shot health condition estimation: (1) the few-shot training data are expanded by GAN to generate a large quantity of simulated training data, (2) the BRB model is trained using simulated training data combined with expert knowledge to obtain the generalized BRB model, and (3) few-shot training data are used to fine-tune the generalized BRB model to obtain a dedicated BRB model.

The second scenario is that there are some auxiliary training data, such as relatively enough data of the same or similar devices under different working conditions, and the current device has fewer available data, which is called foreign domain data in this paper. The following process is used to carry out the few-shot health condition estimation. (1) Use GAN to expand the few-shot training data of the current device and generate a certain quantity of simulated training data based on the magnitude of the auxiliary training data. (2) Simulated training data, auxiliary training data and expert knowledge are used to train the BRB model into a generalized BRB model. (3) Use the few-shot training data of the current device to fine-tune the generalized BRB model to obtain a dedicated BRB model. The implementation process is summarized in Fig. 2.

Fig. 2
figure 2

The GT-BRB implementation process
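Both workflows can be summarised in one orchestration sketch; the three callables stand in for BM-GAN generation, BRB training, and fine-tuning, and all names (including the assumed 10x expansion when no auxiliary data exist) are illustrative placeholders, not the paper's implementation:

```python
def gt_brb(few_shot_data, generate, train_brb, fine_tune, auxiliary_data=None):
    """GT-BRB workflow: (1) generate simulated data, (2) train a generalized
    BRB on simulated (+ auxiliary) data, (3) fine-tune on the few-shot data.

    Without auxiliary data, the generated quantity is an assumed 10x expansion;
    with auxiliary data, it matches the auxiliary magnitude.
    """
    n = len(auxiliary_data) if auxiliary_data else 10 * len(few_shot_data)
    simulated = generate(few_shot_data, n)             # step (1): BM-GAN
    training = simulated + (auxiliary_data or [])      # step (2) input
    generalized = train_brb(training)                  # generalized BRB model
    return fine_tune(generalized, few_shot_data)       # step (3): dedicated BRB

# Stub components, just to show the data flow.
gen = lambda data, n: data * (n // len(data))
train = lambda t: {"trained_on": len(t)}
tune = lambda model, d: {**model, "fine_tuned_on": len(d)}

model = gt_brb([0.9, 0.8], gen, train, tune)
```

The same orchestration covers both scenarios: passing `auxiliary_data` switches it from the same-domain workflow to the foreign-domain one.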

Case study

In this section, the experimental background and data sources are introduced, the few-shot data generation ability of BM-GAN is verified, and finally, the few-shot health condition estimation ability of GT-BRB with or without auxiliary data is verified.

Background formulation

Traditional health condition estimation methods generally require that the training data and test data meet the premise of independent and identical distribution. However, in engineering practice, due to different working conditions and many other factors, this premise is actually difficult to meet. A typical example is lithium batteries, which are widely used in daily life, such as in mobile phones, computers, electric vehicles, and even aerospace devices. Therefore, studying the health condition estimation of lithium batteries has very important practical significance. However, due to the long ageing cycle of lithium batteries, there are few degradation experimental data with high reliability, which makes this a typical few-shot health condition estimation problem. When the health condition of lithium batteries, whose use environment is very complicated, is studied through experimental data from a small number of cycles or a single working condition, overfitting easily occurs, resulting in insufficient generalization ability of the estimation model. Therefore, using the knowledge transfer learning method based on generated data to estimate the few-shot health condition of lithium batteries has both theoretical and practical value.

The nature of the few-shot problem is that the information contained in a small sample is insufficient, which manifests as sparse or unbalanced sampling, resulting in an insufficient quantity of data or an uneven distribution. To prevent sampling randomness from affecting the experimental results, this paper adopts an average (uniform) sampling method to generate the few-shot datasets.
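A minimal sketch of the average (uniform) sampling described above, assuming the full degradation series is available as an array:

```python
import numpy as np

def average_sample(series, n_samples):
    """Uniformly subsample a full degradation series to build a few-shot
    dataset, avoiding the randomness of random sampling."""
    series = np.asarray(series)
    # Evenly spaced indices covering the whole series, endpoints included.
    idx = np.linspace(0, len(series) - 1, n_samples).round().astype(int)
    return idx, series[idx]

# Example: a 160-cycle series reduced to a 32-point few-shot set.
full = np.arange(160, dtype=float)
idx, few_shot = average_sample(full, 32)
```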

The accumulated relative error (ARE) indicates the cumulative relative error between the generated data and the true values in this experiment. The mean relative error (MRE) is the average of the absolute ratios of the deviation between the predicted value and the true value to the true value; by normalizing each error value to the interval [0, 1], the MRE more intuitively reflects the deviation between the predicted values and the true values. They are defined as follows:

$$ \left\{ {\begin{array}{*{20}c} {ARE = \sum\limits_{i = 1}^{n} {\left| {\frac{{y_{i} - \hat{y}_{i} }}{{\hat{y}_{i} }}} \right|} } \\ {MRE = \frac{1}{n}ARE} \\ \end{array} } \right.. $$
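Both measures follow directly from their definitions; here `y` holds the generated (predicted) values \(y_i\) and `y_hat` the true values \(\hat{y}_i\):

```python
import numpy as np

def are_mre(y, y_hat):
    """Accumulated relative error (ARE) and mean relative error (MRE)
    between generated values y and true values y_hat."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    are = np.sum(np.abs((y - y_hat) / y_hat))
    mre = are / len(y)
    return are, mre

# Example: relative errors 0.1, 0.1, 0.0 -> ARE ≈ 0.2, MRE ≈ 0.2/3.
are, mre = are_mre([1.1, 0.9, 2.0], [1.0, 1.0, 2.0])
```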

Data enhancement based on BM-GAN

This paper uses NASA’s lithium battery ageing dataset. In the same domain experiment, only battery #0007 (B7) is used. In the foreign domain experiment, battery #0005 (B5) and battery #0006 (B6) are selected as the auxiliary training data for B7.

Related research has proven that the time interval of an equal discharging voltage difference (TIEDVD) [41] (that is, the time it takes the battery voltage to drop by a fixed amount during discharge) and the mean temperature (MT) [20] over the TIEDVD are both related to the capacity of lithium batteries. The definitions of TIEDVD and MT are shown in Fig. 3.

Fig. 3 The definitions of TIEDVD and MT

The mathematical definition of TIEDVD and MT is

$$ {\text{TIEDVD = }}t_{{V_{H} }} - t_{{V_{L} }} $$
$$ {\text{MT = }}\frac{{\int_{{t_{{V_{L} }} }}^{{t_{{V_{H} }} }} {T\,{\text{d}}t} }}{{t_{{V_{H} }} - t_{{V_{L} }} }}. $$
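A minimal sketch of extracting TIEDVD and MT from a sampled discharge curve, with the time integral of the temperature evaluated by the trapezoidal rule. The synthetic linear discharge curve, voltage thresholds, and sampling grid below are illustrative assumptions, and the interval is taken as a positive duration:

```python
import numpy as np

def tiedvd_mt(t, v, temp, v_high, v_low):
    """TIEDVD: time for the discharge voltage to fall from v_high to v_low.
    MT: time-averaged temperature over that interval (trapezoidal rule)."""
    t, v, temp = map(np.asarray, (t, v, temp))
    t_vh = t[np.argmax(v <= v_high)]   # first sample at/below v_high
    t_vl = t[np.argmax(v <= v_low)]    # first sample at/below v_low
    tiedvd = t_vl - t_vh
    m = (t >= t_vh) & (t <= t_vl)
    ti, Ti = t[m], temp[m]
    integral = np.sum(0.5 * (Ti[1:] + Ti[:-1]) * np.diff(ti))  # ∫ T dt
    return tiedvd, integral / tiedvd

# Synthetic discharge: voltage falls linearly 4.2 V -> 2.5 V over 3000 s
# at a constant 34 °C, so MT should recover 34.
t = np.linspace(0.0, 3000.0, 301)
v = 4.2 - (4.2 - 2.5) * t / 3000.0
temp = np.full_like(t, 34.0)
tiedvd, mt = tiedvd_mt(t, v, temp, v_high=4.0, v_low=3.0)
```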

In addition, B7’s TIEDVD, MT, and capacity changes over time are shown in Fig. 4.

Fig. 4 The ageing conditions of TIEDVD, MT, and capacity

In Fig. 4, it can be intuitively recognized that TIEDVD and MT are related to the health condition, i.e., the capacity, of lithium batteries. Therefore, TIEDVD and MT are used as the BM-GAN simulation objects in the GT-BRB method. Combining the few-shot conditions with the existing research results presented in the previous section, both the generator and the discriminator of BM-GAN use deep neural networks. The first 160 sets of data of the NASA batteries are taken with a sampling interval of 5, giving 32 samples. The generator and the discriminator of BM-GAN each have two hidden layers. The generator has 32 input nodes, 160 output nodes, and hidden layers of 600 and 1,000 nodes. The discriminator has 160 input nodes, 1 output node, and hidden layers of 800 and 300 nodes; every layer is fully connected. The discriminator outputs 0 or 1, where 0 means the input is judged to be data generated by the generator and 1 means it is judged to be real data. The network is trained for 1,200 rounds with a learning rate of 0.002.
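The quoted layer widths can be checked with a toy forward pass. The ReLU hidden activations, tanh/sigmoid outputs, and weight initialisation below are assumptions, since the text specifies only the layer sizes:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_forward(x, sizes, out_act):
    """Fully connected forward pass with randomly initialised weights,
    used here only to check the layer dimensions quoted in the text."""
    for i, (n_in, n_out) in enumerate(zip(sizes[:-1], sizes[1:])):
        w = rng.normal(0.0, 0.02, (n_in, n_out))
        x = x @ w
        last = i == len(sizes) - 2
        x = out_act(x) if last else np.maximum(x, 0.0)  # ReLU on hidden layers
    return x

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Generator: 32-point few-shot input -> 160-point simulated sequence.
z = rng.normal(size=(1, 32))
fake = mlp_forward(z, [32, 600, 1000, 160], np.tanh)

# Discriminator: 160-point sequence -> real/fake score in (0, 1).
score = mlp_forward(fake, [160, 800, 300, 1], sigmoid)
```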

B7's TIEDVD and MT are uniformly sampled with an initial sampling number of 32, and the result is shown in Fig. 5.

Fig. 5 The B7 TIEDVD and MT with an initial sampling number of 30

When the training rounds are 200, 600, and 1,000, the distribution of simulated data generated by BM-GAN during a training process is shown in Fig. 6.

Fig. 6 The distribution of simulated data generated by GAN with training rounds of 200, 600, and 1,000

The above experimental results show that as the number of training rounds increases, BM-GAN gradually learns the distribution of the small sample while retaining a certain difference from the original samples, thus ensuring the diversity of the simulated training data. Since the BM-GAN generation results have a certain degree of randomness, averaging multiple experimental results would make the generated data too smooth, which is not conducive to data diversity; thus, we only show the results of a single training process. The BM-GAN generation results across different training runs are very close, with only slight fluctuations.

Next, we verify the few-shot data generation ability of BM-GAN and select a GAN, linear regression (LR), and uniform interpolation (UI) for comparison experiments. Among them, the objective function of the GAN method uses average cross-entropy, 160 sample points are uniformly sampled after linear regression, and the interpolation method evenly inserts four generated data points between every two real data points. The accumulated relative error (ARE) between the generated data and the real data of TIEDVD is calculated as shown in Table 1.

Table 1 The ARE between the generated data and the real data of TIEDVD for different samples

When the number of samples is 10, BM-GAN shows a very large performance improvement over the GAN and LR methods and is close to the UI method. In this case, the few-shot real data generally show a downwards trend, and λ plays a leading role, so the data generated by BM-GAN also generally show a downwards trend. When the number of samples is 20 or 30, BM-GAN maintains the best generation accuracy. When the number of samples is 50, which is equivalent to collecting one point for roughly every three real data points, health condition data such as TIEDVD show certain fluctuations, and the sampled data faithfully reflect them. At this point, the randomness of the BM-GAN generation plays a leading role, making the generated data follow an overall distribution similar to that of the few-shot real data while retaining some random fluctuations, so the generation accuracy is higher than that of linear methods such as LR and UI.

Through the above experimental results, it can be confirmed that BM-GAN has good data generation ability under few-shot conditions.

GT-BRB of the same domain

The previous section analyzed the role of the GAN in the GT-BRB model, and this section introduces the role of the BRB in the GT-BRB model.

When using the BRB method, it is important to combine expert knowledge to set appropriate initial parameters. Based on related research and the particularities of few-shot data, this paper adopts the following initial parameters.

TIEDVD is set as attribute 1 and divided into short (S), medium (M), and long (L) parts, and MT is set as attribute 2 and divided into low (L) and high (H) parts. The capacity is set as the output result and divided into four parts: failed (F), poor (P), medium (M), and good (G):

$$ A_{1} = \{ S = 1300s,M = 1700s,L = 2100s\} $$
$$ A_{2} = \left\{ {L = 32^{^\circ } {\text{C}},H = 36^{^\circ } {\text{C}}} \right\} $$
$$ D = \{ {\text{F}} = 1.1{\text{ Ah, }}P = 1.4{\text{ Ah}}, \, M = 1.7{\text{ Ah}}, \, G = 2{\text{ Ah}}\} . $$
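A minimal sketch of BRB inference with these referential values. The triangular matching degrees and the belief matrix `beta0` below are illustrative assumptions, and the full evidential-reasoning combination is simplified to a weighted average of the rules' belief distributions:

```python
import numpy as np

# Referential values from the text: A1 (TIEDVD, s), A2 (MT, °C),
# and the output capacity grades D (Ah).
A1 = np.array([1300.0, 1700.0, 2100.0])   # S, M, L
A2 = np.array([32.0, 36.0])               # L, H
D  = np.array([1.1, 1.4, 1.7, 2.0])       # F, P, M, G

def matching(x, refs):
    """Triangular matching degrees of input x over referential values."""
    m = np.zeros(len(refs))
    x = np.clip(x, refs[0], refs[-1])
    j = np.searchsorted(refs, x)
    if j == 0:
        m[0] = 1.0
    else:
        lo, hi = refs[j - 1], refs[j]
        m[j - 1] = (hi - x) / (hi - lo)
        m[j] = 1.0 - m[j - 1]
    return m

def brb_estimate(tiedvd, mt, beta, theta=None, delta=(1.0, 1.0)):
    """Simplified BRB inference: rule activation weights from attribute-
    weighted matching degrees, then a weighted average of the rules'
    belief distributions (a stand-in for full evidential reasoning)."""
    m1, m2 = matching(tiedvd, A1), matching(mt, A2)
    theta = np.ones(6) if theta is None else theta    # one rule per (A1, A2) pair
    alpha = np.array([(a ** delta[0]) * (b ** delta[1]) for a in m1 for b in m2])
    w = theta * alpha
    w = w / w.sum()
    belief = w @ beta          # combined belief over the grades in D
    return belief @ D          # expected capacity (Ah)

# Hypothetical initial belief matrix (6 rules x 4 grades): short TIEDVD /
# high MT -> degraded, long TIEDVD / low MT -> healthy.
beta0 = np.array([
    [0.8, 0.2, 0.0, 0.0],   # S, L
    [1.0, 0.0, 0.0, 0.0],   # S, H
    [0.1, 0.6, 0.3, 0.0],   # M, L
    [0.2, 0.6, 0.2, 0.0],   # M, H
    [0.0, 0.0, 0.3, 0.7],   # L, L
    [0.0, 0.1, 0.4, 0.5],   # L, H
])
cap = brb_estimate(1900.0, 33.0, beta0)   # capacity estimate in [1.1, 2.0] Ah
```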

To effectively compare the estimation capability of the GT-BRB method, we select Gaussian process regression (GPR), which has good learning ability on small samples; a backpropagation neural network (BPNN), as a typical data-driven method; the initial BRB method; and the generalized BRB method proposed in this paper for comparison. TIEDVD and MT with a sample number of 30 are used to estimate the remaining battery capacity.

Among them, the BPNN has a four-layer network structure: the first layer is an input layer containing two neuron nodes, representing TIEDVD and MT; the middle two layers are hidden layers with 16 nodes each; and the last layer is the output layer. The training target error of the network is 0.002, the learning rate is 0.3, and the number of training rounds is set to 5000. The parameter settings of the initial BRB are consistent with Table 2. The optimization of each BRB model uses the fmincon function in MATLAB, with the maximum number of iterations set to 1000 and the termination error set to \(10^{ - 6}\).

Table 2 Initial parameters of the BRB

Among the parameters in Table 2, all \(\theta_{k}\) and \(\delta_{k}\) are 1, indicating that the weights of the rules and of the premise attributes (TIEDVD and MT) are all equal under the initial conditions. The value of \(\beta_{j,k}\) denotes the credibility of the corresponding rating under the current rule. After training with the same-domain simulated data, the parameter values of the generalized BRB model are as shown in Table 3. The parameter changes of the dedicated BRB model, obtained by fine-tuning the generalized model with real data, are shown in the last row of Table 3.
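The fmincon-style training and fine-tuning of the BRB parameters can be sketched with SciPy's `minimize` as a stand-in (bounded L-BFGS-B, keeping the paper's 1000-iteration and 1e-6 termination settings); the quadratic loss below is a toy placeholder for the BRB estimation error on the training data:

```python
import numpy as np
from scipy.optimize import minimize

def fit_brb(loss, p0, max_iter=1000, tol=1e-6):
    """Bounded BRB parameter optimisation, mirroring the paper's MATLAB
    fmincon setup. Parameters stay in [0, 1], as rule weights and
    belief degrees do."""
    bounds = [(0.0, 1.0)] * len(p0)
    res = minimize(loss, p0, method="L-BFGS-B", bounds=bounds,
                   options={"maxiter": max_iter, "ftol": tol})
    return res.x

# Toy quadratic stand-in for the BRB estimation error.
target = np.array([0.2, 0.7, 0.1])
loss = lambda p: np.sum((p - target) ** 2)

p_generalized = fit_brb(loss, np.full(3, 1.0 / 3))   # trained on simulated data
p_dedicated = fit_brb(loss, p_generalized)           # fine-tuned from that point
```

The design point is that fine-tuning starts from the generalized parameters rather than from the expert-set initial values, so the real few-shot data only needs to make a small correction.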

Table 3 Parameters of the generalized BRB model and changes in the dedicated BRB model

Table 3 shows that the parameters of the BRB model can be adjusted by inputting the corresponding data, thereby achieving a balance between expert knowledge and data-driven learning. Taking the change in the \(\delta_{k}\) values as an example: since TIEDVD decreases almost monotonically while MT fluctuates significantly, MT plays a more critical role in multirule fusion. After training, \(\delta_{1}\) is significantly smaller than \(\delta_{2}\), which realizes the optimization and adjustment of the parameters.

When the data size of sampling is 30, the experimental results of each method are shown in Fig. 7.

Fig. 7 The remaining capacity estimated by each method

First, we analyze the differences among the various types of methods. When the data size is 30, because the data are uniformly sampled and the sample distribution is close to the overall distribution of the battery, the traditional machine learning models can learn the ageing trend of the battery. However, due to the sparse data, these models underfit, and their estimation accuracy on the test data is poor; among them, the BPNN method performs better than the GPR method. In this small-sample case, the GT-BRB based on transfer learning proposed in this paper shows better estimation accuracy: the information of the small sample is enriched through expert knowledge, the generation model is then used to increase the diversity of the samples, training yields a generalized BRB model, and finally the real data are used to fine-tune the generalized BRB model into a dedicated BRB model.

Second, we analyze the differences of the BRB method across application scenarios. The lower part of Fig. 7 shows that the estimation accuracy of the initial BRB is poor: although expert knowledge is used, it may carry a certain deviation, and the few-shot dataset contains insufficient information. The generalized BRB model effectively improves the estimation accuracy of the initial BRB method owing to the increased training data size, but because of the diversity of the generated data, its estimation results are relatively unstable. The GT-BRB method combines data enhancement with the idea of transfer learning and uses real data to fine-tune the generalized BRB model, thereby achieving a more accurate health condition estimation.

The experimental results show that when the uniform sampling data size is 30, the GT-BRB improves the performance of the BRB method; its performance is also better than that of typical methods such as GPR and BPNN, achieving a more accurate health condition estimation. To further analyze the improvement of the GT-BRB over the BRB estimation method, the BRB method is set up with the same initial parameters as the GT-BRB, and its parameters are directly optimized using the real data.

While changing the size of the sampling data, the above experiment was repeated ten times; the average results are shown in Table 4.

Table 4 The MRE of different conditions

The above experimental results show that when the data size is very small, e.g., when only 10 real data points are used, the MRE of every method exceeds 10%, and accurate health condition estimation cannot be achieved; even so, the MRE of the GT-BRB method, 12.5%, is lower than that of the other methods, indicating a certain few-shot health condition estimation ability. As the quantity of data increases, the estimation performance of every method improves; in particular, the BPNN method improves the most, which reflects the sensitivity of neural network methods to the size of the training data. When the quantity of data is 10, 20, or 30, the BRB methods hold certain advantages over the other health condition estimation methods, which reflects the effectiveness of expert knowledge under small-sample conditions. However, when the data size increases to 50, the estimation accuracy of the initial BRB method falls below that of the GPR and BPNN methods, while the GT-BRB method remains better than GPR and BPNN through the improvements of data enhancement and transfer learning. These results reflect the effectiveness of the GT-BRB method proposed in this paper and its improvement over the initial BRB method: under the four experimental conditions, the estimation accuracy of the GT-BRB is on average 17.3% higher, in relative terms, than that of the BRB method. Therefore, the GT-BRB method proposed in this paper effectively improves the few-shot health condition estimation ability of the BRB method.

GT-BRB of the foreign domain

To further address the low accuracy of few-shot health condition estimation, this paper considers introducing auxiliary training data. Following the experiments in the sections above, we select B5 and B6, which are the same type as B7 but operate under different working conditions. The distribution of B5 is similar to that of B7, while the difference between B6 and B7 is relatively large. The ageing distributions of the three batteries are shown in Fig. 8.

Fig. 8 The ageing distribution of B5, B6, and B7

To study the influence of different auxiliary training data on the few-shot learning ability, two sets of auxiliary training data (B5 and B6) combined with few-shot test data (B7) of different data sizes are used in the following experiments. To simplify the research process, this section only studies the case in which the auxiliary training data and the simulated training data have the same size (set by the size of the auxiliary training data). The MRE improvement d_MRE is used as the evaluation index: it is the MRE obtained in the experiments of Sect. “Transfer learning of GT-BRB” minus the MRE obtained on the same test data when the auxiliary training data are introduced into the training process of this section. The average results are shown in Fig. 9.

Fig. 9 The d_MRE of different conditions

The above experimental results show that the B5 group outperforms the B6 group because the distribution of B5 is closer to that of the test data B7, while the difference between B6 and B7 is relatively large. When the test data size is 10, that is, the sample size is extremely small, both B5 and B6 greatly improve the estimation accuracy of the original methods. When the quantity of data is 20, the performance of the BPNN method improves significantly, which is affected by the randomness of the sampled data. When the quantity of data is 30, the positive impact of the auxiliary training data becomes insufficient, and B6 even has a negative impact on the GPR method. When the quantity of data is 50, only B5 has a positive impact on the GPR method, and all remaining groups have a negative impact. This shows that when the training data are relatively sufficient, introducing foreign domain auxiliary training data is likely to reduce the accuracy of the original few-shot health condition estimation methods.

Finally, the time complexity of the above methods is analyzed. The essence of BPNN feedforward calculation and error backpropagation is matrix multiplication; with the input and output layers fixed, the cost depends only on the number of neurons in the hidden layers. Taking a three-layer neural network as an example, its time complexity is O(N). The time complexity of the GPR method is O(N²) because a triangular linear system must be solved in the solution process. As a rule inference method, the BRB has a time complexity of O(1), which means that the GT-BRB method proposed in this paper has extremely high computational efficiency.

Conclusion

As few-shot cases are a widespread phenomenon in daily conditions, this paper focuses on few-shot health condition estimation. The proposed GT-BRB method is a novel generative knowledge-based transfer learning architecture that effectively combines data augmentation, knowledge reasoning, and transfer learning. The experiments show that the GT-BRB is feasible for few-shot health condition estimation and improves the estimation accuracy of the BRB method by approximately 17.3%. In addition, the GT-BRB, as a knowledge reasoning method, has a time complexity of only O(1), so its computational efficiency is significantly better than that of other data-driven methods. However, the proposed method still has some shortcomings. On the one hand, in the GT-BRB method, data enhancement and condition estimation are performed sequentially; this one-way information transfer may cause errors to accumulate, and once the generated data deviate greatly from the real data, the subsequent few-shot health condition estimation accuracy is affected. On the other hand, although the BRB, as a knowledge reasoning method, can use real data to adjust the parameters preset by experts, it still cannot effectively balance the influence of expert knowledge and data-driven learning on the results of few-shot health condition estimation. Therefore, further research can focus on the dynamic interaction between data enhancement and condition estimation, feeding the generated data step by step, and moderately guiding the data generation with the estimation results. In addition, fusing knowledge reasoning methods with few-shot data-driven methods such as meta-learning is also an interesting direction.