A high-voltage circuit breaker (HVCB) is crucial for ensuring the safety and stability of electric power systems. Due to mechanical, electrical, and environmental impacts, HVCBs suffer from functional deterioration or even failure. According to statistical data, more than half of major failures are caused by mechanical issues (e.g., joint clearance faults caused by wear and corrosion) [1, 2]. Traditional maintenance is generally based on manual schemes, which are costly and time-consuming. Furthermore, because manual maintenance involves disassembly, HVCBs might suffer from secondary damage. There is therefore a clear need for intelligent mechanical fault diagnosis [3, 4].

With the rapid development of industrial informatization and machine learning technology, machine learning-based methods for fault diagnosis of HVCBs have received attention from the industry. Feature extraction and data classification are the main tasks in fault diagnosis. More specifically, in these methods, characteristic data of different mechanical states are collected and then used to train and test the classification model. However, most of these methods are typically based on a single classifier and sensor, whose performance is sensitive to network parameters. In other words, the performance quality needs to be ensured by the expert’s prior knowledge of optimal model configurations.

Although various machine learning techniques have been applied in the mechanical fault diagnosis of HVCBs, there are still two open issues. First, as prior knowledge of model structures and parameter selection is indispensable in recent approaches, insufficient expert knowledge may reduce the diagnostic accuracy of these methods. Second, the input data in most recent fault diagnosis models are collected from a single sensor, and the diagnostic decisions are based on an individual classifier. Thus, the diagnosis models have poor robustness, and prediction errors are more likely to occur. Integrated fault diagnosis models with fusion analysis are rarely reported. Therefore, it is meaningful to explore effective information fusion techniques and further build an ensemble diagnosis model for enhancing the fault diagnosis performance of HVCBs.

To address the above problems, we report an improved Dempster–Shafer evidence theory-fused echo state neural network (IDS–ESN), a novel ensemble model that combines different intelligent classifiers through the improved DS evidence theory. More specifically, energy distributions in vibration intrinsic modal function (IMF) components by variational mode decomposition (VMD) are extracted as the feature vector for diagnostic model training and testing. Then, multiple ESN modules trained by features of different sensors are adopted as sub-classifiers. Moreover, considering the paradox issue, an improved DS evidence fusion algorithm is proposed by evaluating the deviation degree among multiple pieces of evidence. Finally, through evidence fusion, the ensemble IDS–ESN is obtained for the mechanical fault diagnosis of HVCBs.

In summary, the main contributions of this study are threefold.

  1. To the best of our knowledge, the capability of ESN is investigated for the first time in the mechanical fault diagnosis of HVCBs. The comparative results demonstrate that this approach achieves promising performance improvement.

  2. Typical DS is improved to fuse multisource information from multiple sensors and ESN sub-classifiers. The proposed IDS–ESN model can effectively enhance the accuracy and make the diagnostic model more robust than individual ESN models.

  3. For IDS–ESN, a new training mechanism is investigated. In multiple scenarios, IDS–ESN is more robust and can achieve higher diagnostic accuracy than the traditional DS method.

The rest of our paper is structured as follows. “Related works” outlines the main related works. “Platform description and setup” describes the studied HVCB and the experimental setup. “Ensemble echo state network with evidence fusion” describes the theoretical background and our improved DS evidence theory. “Results and discussion” analyses the feature extraction of vibrations by the VMD method and the effectiveness of the improved DS evidence theory. Conclusions are given in “Conclusions”.

Related works

In the past decade, artificial intelligence techniques have been widely applied in fault diagnosis, online monitoring, and intelligent decision-making in the energy field [5, 6]. Feature extraction and machine learning algorithms are investigated in the fault diagnosis literature. For feature extraction, sound [7], contact travel curves [8, 9], electromagnet coil currents [10, 11], and vibrations [12,13,14,15] are typical signals used for fault diagnosis. Since abundant structural state-related information is contained in vibrations, most fault diagnosis studies are based on vibration analysis. A series of time–frequency methods have been proposed for signal processing and the subsequent feature extraction of vibration data, including empirical mode decomposition (EMD), local mean decomposition (LMD), wavelet packet decomposition (WPD), the empirical wavelet transform (EWT), and VMD [16,17,18,19,20]. Furthermore, different types of amplitude and frequency vibration features, such as time–frequency entropy, permutation entropy, singular entropy, and energy entropy, were extracted as the input for fault diagnosis models. For multiclass issues, the extracted feature vector is generally high-dimensional and contains redundant components. Thus, feature selection can be used to reduce the dimensionality of a raw feature vector by discarding redundant and irrelevant components, which can further enhance the model performance [21, 22]. These feature extraction methods have achieved promising performance with respect to both nonadaptive and adaptive analysis, which contributes to developing intelligent fault diagnosis methods for HVCBs.

For fault diagnosis, a robust and high-accuracy classifier is required to distinguish the type of fault. For example, Tao et al. [23] proposed a feature metric-based fault diagnosis approach under limited data conditions. In the study, a parametric optimization-based meta-learning network and a metric learning network were combined to extract optimization information to adapt between different domains and metric information for similarity discrimination, respectively. For the diagnosis of spring fatigue and oil damper leakage, Ma et al. [24] reported a fault diagnosis model based on a random forest (RF) model to reduce nonessential features. Zhang et al. [25] designed an asynchronous interval type-2 fuzzy approach to address the fault detection problem of a quarter-car suspension system. In particular, a fault diagnosis model based on a support vector machine (SVM) that relied less on the sample size was reported [26]. By optimizing kernel and penalty parameters, advanced SVM methods have been reported and have achieved performance improvements. Biclass and multiclass SVM methods have been investigated for mechanical fault diagnosis of HVCBs, in which the looseness of a base screw, electromagnet immobility, and overtravel have been diagnosed successfully [27, 28]. Compared with other neural networks, echo state networks (ESN) have strong self-adaptability in storing nonlinear input–output mapping relationships. Only the output weights of the ESN need to be trained by a linear regression algorithm, which avoids gradient disappearance and high computational complexity. Therefore, ESN models have been widely applied in control [29, 30], pattern recognition [31, 32], and nonlinear time series prediction [33, 34]. As a novel research field, machine learning and swarm intelligence approaches have been successfully combined, and outstanding results have been obtained in different areas.

Dempster–Shafer (DS) evidence theory can be utilized as a fusion approach for fault diagnosis, target recognition, and condition monitoring [35]. However, when different pieces of evidence conflict, the traditional DS evidence fusion method can produce a contradictory result. Considering that there might be deviations in sensor data, Murphy [36] collected the evidence multiple times and calculated the average to weaken the conflict caused by these deviations. On this basis, Deng et al. [37] further introduced the Euclidean distance function to calculate the degree to which a specific piece of evidence is supported by the other evidence, which improved the robustness of the original method. A new evidence fusion rule was reported for bearing and gear fault diagnosis by Li et al. and Zhang et al. [38, 39], in which evidence credibility was obtained by substituting the evidence distance matrix into a modified Gini index function. To prevent conflicting evidence fusion in the conventional DS approach, this fusion rule is achieved directly by evaluating the correlation between different evidence data. The DS evidence theory has been successfully applied in the mechanical fault diagnosis of rotating parts, such as bearings, gearboxes, and rotors, providing valuable guidance for fault diagnosis of HVCBs.

Platform description and setup

High voltage circuit breaker

A spring-driven HVCB uses stored elastic potential energy to realize the operations of opening and closing and is now widely used in energy power systems. Figure 1 describes the mechanical system of the studied ZN12 HVCB. Its fast operation is generally controlled within tens of milliseconds. Thus, the structural dynamic characteristics are quite sensitive to clearance joints. Severe collisions between moving parts would be strengthened by clearance joints, thus deteriorating the operation quality. Under the influence of wear, corrosion, etc., the joint clearance size will increasingly deviate from the original design value, which leads to greater impact stress during mechanical operation and induces mechanical faults. Therefore, this paper focuses on diagnosing mechanical faults caused by clearance joints.

Fig. 1 Mechanical system of the ZN12 HVCB

The vibration signal consists of a series of complex high-frequency vibration waves generated by various components (e.g., electromagnet and motor components, as shown in Fig. 1). Its amplitude, frequency, energy distribution, and other signal characteristics could contain large amounts of structural state-related information. In this paper, combining the energy distribution of vibration signals in different IMF components of VMD, an ensemble classifier is proposed to diagnose HVCB faults. Multiple ESN modules are adopted as sub-classifiers and fused via an improved DS evidence fusion theory. The overall technical flowchart is given in Fig. 2, and it illustrates the main steps of fault diagnosis in the paper.

Fig. 2 Flowchart of the IDS–ESN-based fault diagnosis model

Step 1 Repeat the operation experiments of a ZN12 HVCB in different mechanical states. In addition to the normal condition, clearance joint faults at three locations are constructed. Furthermore, by changing the joint clearance size, we can obtain two fault conditions for each faulty joint. Each type of operation experiment under different fault conditions is carried out 100 times. Thus, a total of 400 groups of vibration signals from two measuring points under different operation conditions are collected.

Step 2 Decompose the obtained vibration signal by VMD method. Then, the energy distribution in different IMFs is extracted as the mechanical state-related feature of the HVCB. The most common way to evaluate a fault diagnosis model is to use training data to train the model and test data to test the model’s performance. Therefore, the feature data obtained from the operation experiment are divided into four data sets, as listed in Table 1, for model training and testing.

Table 1 Data set for model training and testing

Step 3 Perform model training of multiple ESN modules, in which random uniform sampling of the original accelerometer data is applied. Taking data set A in Table 1 as an example, 25 training samples (50 data points for each fault condition) are randomly selected for each load condition to generate a training set. The remaining 25 samples form the test set. It is important to note that, to guarantee an objective result, the testing data must not appear in the training data set. Otherwise, the diagnostic result could be over-optimistic.

Step 4 Employ the trained ESN modules with different network parameters (spectral radius and number of reservoir neurons) as the sub-classifiers of the ensemble model. Their model training tasks are implemented on the above data sets in Table 1. In addition, raw diagnostic results for the different mechanical states mentioned in step 1 can then be obtained.

Step 5 Produce the fused fault diagnosis result. In the fusion process of the ensemble model, the output of each ESN module is considered as raw evidence and fused by our improved DS evidence fusion algorithm to give the final fault diagnosis result. It is worth mentioning that for objective fault diagnosis evaluation, the established models often need to be tested many times for average accuracy, and multiple sampling steps are necessary. Taking data set A in Table 1 as an example, for each training cycle, the 25 data points in the training set need to be randomly sampled from data set A, and the original testing set is replaced with the remaining 25 data points.
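The repeated random sampling described in Steps 3 and 5 can be sketched as follows. This is a minimal illustration rather than the procedure actually used in the experiments: it assumes 50 samples per operating condition split 25/25, uses 100 repetitions, and relies on a hypothetical `evaluate` callback that stands in for training and scoring a classifier.

```python
import numpy as np


def random_split(n_samples, n_train, rng):
    """Return disjoint (train, test) index arrays for one operating condition."""
    order = rng.permutation(n_samples)
    return order[:n_train], order[n_train:]


def repeated_evaluation(evaluate, n_samples=50, n_train=25, n_repeats=100, seed=0):
    """Average a score over repeated random splits.

    `evaluate(train_idx, test_idx)` is a hypothetical callback that trains a
    model on the training indices and returns its accuracy on the test indices.
    """
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_repeats):
        train_idx, test_idx = random_split(n_samples, n_train, rng)
        # Guard against leakage: no test sample may appear in the training set.
        assert np.intersect1d(train_idx, test_idx).size == 0
        scores.append(evaluate(train_idx, test_idx))
    return float(np.mean(scores)), float(np.std(scores))


# Dummy callback standing in for "train an ESN, then score it".
mean_acc, std_acc = repeated_evaluation(lambda tr, te: len(te) / 25.0)
```

Drawing a fresh permutation in every cycle keeps the training and test indices disjoint by construction, which is the property the text requires to avoid over-optimistic results.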

Testing platform

The HVCB is generally in a long-term static state in its service period, resulting in a shortage of sample data in recent fault diagnosis research. Limited by insufficient fault data, clearance joint fault diagnosis has rarely been reported in previous studies. In this paper, an experimental setup based on a real ZN12 HVCB is built. As shown in Fig. 3, in addition to the normal operating condition, joint fault conditions at three positions are simulated by changing the joint clearance size (fault sizes of 0.25 mm and 0.75 mm). For the normal condition, its joint clearance size is controlled within 0.04 mm, which is negligible.

Fig. 3 Experimental setup

Two sets of monoaxial CCLD/IEPE shock accelerometers (Brüel & Kjær, type 8339) are screwed into the HVCB to record vibrations during the closing operation. The measurement range, sensitivity, and upper cutoff frequency of the accelerometers are ± 10,000 m/s², 0.25 mV/g, and 20 kHz, respectively. In addition, a Brüel & Kjær type 3053-B-120 signal acquisition card is adopted for vibration collection. It is worth mentioning that to prevent mutual influence between two adjacent mechanical operations of the HVCB, an interval of more than 3 min between successive opening and closing operation tests is needed. Furthermore, our multiple comparison tests have shown that the installation location of the accelerometer is significant for effective vibration measurement. To ensure that the extracted vibration signal contains sufficient state-related information, the accelerometers should be installed near the failure point, but the two measuring points should be as far apart as possible. Ideally, the accelerometer axis lines are perpendicular to each other.

During the test, the sampling frequency of the acquisition card is set to 65,536 Hz with a sampling time of 1 s. For the normal case and for each of the three types of joint clearance faults, there are 100 groups of vibration signals from the two accelerometers, and each signal contains 65,536 data points. Table 2 lists the three joint fault locations and the corresponding FFT spectra of vibration signals from sensor #1 under the four operating conditions.

Table 2 Fault position, raw vibration signal, and its FFT spectra

Table 2 shows that the FFT spectrum of the obtained vibration signals covers the whole frequency range (0–32,768 Hz), which ensures that the measured data contain all state-related information of the HVCB. On the whole, the vibration signal has the characteristics of a narrow time domain, wide frequency band, and high amplitude. In addition, its vibration FFT spectrum characteristics under different working conditions, including the evolution trend and the amplitude peak point versus frequency, have significant similarities, which makes fault differentiation difficult. Next, the ESN-based technique is applied to the aforementioned fault diagnosis.
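The frequency range quoted above follows from the acquisition settings: a 65,536 Hz sampling rate gives a Nyquist limit of 32,768 Hz, and a 1 s record gives a 1 Hz frequency resolution. The sketch below checks this with NumPy; the random signal is only a hypothetical stand-in for a measured vibration record.

```python
import numpy as np

fs = 65_536        # sampling frequency in Hz (from the text)
duration = 1.0     # sampling time in s (from the text)
n = int(fs * duration)

# Hypothetical stand-in for one measured vibration record.
rng = np.random.default_rng(0)
signal = rng.standard_normal(n)

# One-sided amplitude spectrum and its frequency axis.
spectrum = np.abs(np.fft.rfft(signal)) / n
freqs = np.fft.rfftfreq(n, d=1.0 / fs)

nyquist = float(freqs[-1])      # highest frequency that can be represented
resolution = float(freqs[1])    # spacing between adjacent frequency bins
```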

Ensemble echo state network with evidence fusion

Echo state network

ESN is a simple and efficient recurrent neural network that is gaining popularity in time series forecasting. As shown in Fig. 4, the hidden layer in the ESN is replaced by a dynamic reservoir, which consists of a large number of sparsely connected neurons.

Fig. 4 Regular network architecture of ESN

For illustration, taking an ESN module composed of K input neurons, N reservoir neurons, and L output neurons as an example, the basic equations can be described as follows:

$$ x(n + 1) = \tanh (W^{{{\text{in}}}} u(n + 1) + W^{{{\text{res}}}} x(n) + W^{{{\text{back}}}} y(n)), $$
$$ y(n + 1) = W^{{{\text{out}}}} x(n + 1), $$

where \(u(n) = (u_{1} (n), \ldots ,u_{K} (n))^{T}\), \(x(n) = (x_{1} (n), \ldots ,x_{N} (n))^{T}\), and \(y(n) = (y_{1} (n), \ldots ,y_{L} (n))^{T}\) represent the input vector, the reservoir state, and the output vector, respectively. \(W^{\text{in}}\), \(W^{\text{res}}\), and \(W^{\text{out}}\) denote the input-to-hidden, hidden-to-hidden, and hidden-to-output connection weight matrices, respectively, and \(W^{\text{back}}\) is the output-to-hidden feedback weight matrix. In the ESN, \(W^{\text{out}}\) is the only parameter that needs to be trained; the other weights are randomly determined and do not change during the training process. In the traditional method, the parameter \(W^{\text{out}}\) is determined by minimizing the training error as

$$ W^{{{\text{out}}}} = \mathop {\arg \min }\limits_{W} \left\| {XW^{T} - D} \right\|_{2}^{2} , $$

where X and D are the \(L_{tr} \times N\) reservoir state matrix and the \(L_{tr} \times L\) signal label matrix during the training stage, respectively, and \(L_{tr}\) is the training length. X and D can be calculated as follows:

$$ X = \left[ {\begin{array}{*{20}c} {x_{1} (1)} & {x_{2} (1)} & \ldots & {x_{N} (1)} \\ {x_{1} (2)} & {x_{2} (2)} & \ldots & {x_{N} (2)} \\ \vdots & \vdots & \vdots & \vdots \\ {x_{1} (L_{tr} )} & {x_{2} (L_{tr} )} & \ldots & {x_{N} (L_{tr} )} \\ \end{array} } \right], $$
$$ D = \left[ {\begin{array}{*{20}c} {d_{1} (1)} & {d_{2} (1)} & \ldots & {d_{L} (1)} \\ {d_{1} (2)} & {d_{2} (2)} & \ldots & {d_{L} (2)} \\ \vdots & \vdots & \vdots & \vdots \\ {d_{1} (L_{tr} )} & {d_{2} (L_{tr} )} & \ldots & {d_{L} (L_{tr} )} \\ \end{array} } \right], $$

where \(d(t) = [d_{1} (t), \ldots ,d_{L} (t)]\) denotes the tth teacher signal. In general, Eq. (3) can be directly solved by the pseudoinverse method. However, for high-dimensional internal states, the pseudoinverse solution can be ill-posed and prone to overfitting. In that case, a two-norm penalty term with regularization parameter ρ is added to the cost function:

$$ W^{{{\text{out}}}} = \mathop {\arg \min }\limits_{W} \left\| {XW^{T} - D} \right\|_{2}^{2} + \rho \left\| W \right\|_{2}^{2} . $$

Then, the solution of Eq. (6) is given by

$$ (W^{{{\text{out}}}} )^{T} = (X^{T} X + \rho I)^{ - 1} X^{T} D, $$

where I is the identity matrix. In the current study, the output weights are trained by the ridge regression mechanism. The training process of the traditional ESN module is outlined in Algorithm 1 [34].

Algorithm 1 Training process of the traditional ESN module

It is worth mentioning that whether the reservoir has the echo state property is very important for the performance of the ESN. The meaning of the echo state property is that the influence of previous input on the future state should gradually tend to disappear. During training, the reservoir gradually removes the historical information to reach an asymptotically stable state. Overall, the main features of the ESN module can be summarized as follows: (1) the core structure is a randomly generated and unchanged sparsely connected reservoir. (2) The output weight is the only part that needs to be trained and adjusted. (3) Its training task can be completed by simple linear regression.
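The three features listed above can be illustrated with a compact NumPy sketch. It is not the authors' Algorithm 1: it assumes no output feedback (\(W^{\text{back}} = 0\)), uniform random weights, a 10% connection density, and a toy one-step-ahead sine prediction task. All function names and parameter values are illustrative choices.

```python
import numpy as np


def make_reservoir(n_in, n_res, spectral_radius=0.9, density=0.1, seed=0):
    """Random, fixed input and reservoir weights (never trained)."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-1.0, 1.0, size=(n_res, n_in))
    w_res = rng.uniform(-1.0, 1.0, size=(n_res, n_res))
    w_res *= rng.random((n_res, n_res)) < density    # sparse connectivity
    radius = np.max(np.abs(np.linalg.eigvals(w_res)))
    if radius > 0:
        w_res *= spectral_radius / radius            # rescale the spectral radius
    return w_in, w_res


def run_reservoir(w_in, w_res, inputs):
    """Collect reservoir states via Eq. (1), assuming W_back = 0."""
    x = np.zeros(w_res.shape[0])
    states = []
    for u in inputs:
        x = np.tanh(w_in @ u + w_res @ x)
        states.append(x)
    return np.array(states)                          # shape (L_tr, N)


def train_readout(states, targets, rho=1e-6):
    """Ridge regression of Eq. (7): (W_out)^T = (X^T X + rho I)^-1 X^T D."""
    n_res = states.shape[1]
    gram = states.T @ states + rho * np.eye(n_res)
    w_out_t = np.linalg.solve(gram, states.T @ targets)
    return w_out_t.T                                 # shape (L, N)


# Toy task (an assumption for illustration): predict the next sine sample.
w_in, w_res = make_reservoir(n_in=1, n_res=50)
t = np.arange(400)
inputs = np.sin(0.2 * t)[:, None]                    # u(n)
targets = np.sin(0.2 * (t + 1))[:, None]             # teacher signal d(n)
states = run_reservoir(w_in, w_res, inputs)
washout = 50                                         # drop the initial transient
w_out = train_readout(states[washout:], targets[washout:])
predictions = states[washout:] @ w_out.T
mse = float(np.mean((predictions - targets[washout:]) ** 2))
```

Discarding the first states as a washout reflects the echo state property: the influence of the arbitrary initial state fades, so only the later states depend on the input history alone.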

Dempster–Shafer evidence theory

DS evidence theory can be employed to combine information from multiple sensors or evidence sources to produce a synthesized decision. Suppose that a set Θ comprises N mutually independent and exclusive elements that describe a certain system; this set is the so-called discernment frame:

$$ \Theta = \left\{ {A_{1} ,A_{2} , \ldots ,A_{i} , \ldots ,A_{N} } \right\}. $$

In fault diagnosis of HVCBs, elements A1 ~ AN in the discernment frame generally represent the possible working conditions, including the normal and fault states. Then, m(A) is defined as the basic probability assignment (BPA) function, which reflects the confidence degree of a specific subset of Θ. This function obeys the following constraints:

$$ \left\{ {\begin{array}{*{20}l} {m(\emptyset ) = 0} \\ {0 \le m(A) \le 1,\quad \forall A \subseteq \Theta } \\ {\sum\nolimits_{A \subseteq \Theta } {m(A) = 1} } \\ \end{array} } \right.. $$

Assume that there is a finite number of BPA functions m1, m2, …, mn; then, the evidence fusion law is defined as follows:

$$ \left( {m_{1} \oplus m_{2} \cdots \oplus m_{n} } \right)(A) = \frac{1}{{{1} - k}}\sum\limits_{{A_{1} \cap A_{2} \ldots \cap A_{{\text{n}}} { = }A}} {m_{1} (A_{1} )} *m_{2} (A_{2} ) \ldots *m_{n} (A_{n} ), $$

where k is called the conflict coefficient, which is applied to evaluate the degree of evidence conflict and can be expressed as follows.

$$ k = \sum\limits_{{A_{1} \cap A_{2} \ldots \cap A_{n} { = }\emptyset }} {m_{1} (A_{1} )} * \ldots m_{n} (A_{n} ) = 1 - \sum\limits_{{A_{1} \cap A_{2} \ldots \cap A_{n} \ne \emptyset }} {m_{1} (A_{1} )* \ldots m_{n} (A_{n} )} . $$

DS evidence fusion theory provides a good method to synthesize information from different sources, but it also has some disadvantages. As k [refer to Eq. (11)] increases, the conflict between the different pieces of evidence becomes more serious, and the result obtained by the fusion method can contradict intuitive judgment. As shown in Table 3, the common evidence fusion paradox issues, including the complete conflict paradox, 0 trust paradox, 1 trust paradox, and high conflict paradox, limit the applicability of DS theory [38]. For fault diagnosis issues, A ~ E in Table 3 can be treated as the fault types of concern, and m1 ~ m4 as the fault prediction results produced by each classifier module. For instance, module m1 in the complete conflict paradox line diagnoses the input data as fault type A with complete certainty. However, module m2 diagnoses it as fault type B with 100% certainty. Therefore, the diagnostic results obtained by m1 and m2 are mutually contradictory, which leads to the complete conflict paradox issue. Modules m3 and m4 produce probability distributions over the three fault types.

Table 3 Paradox issues in conventional DS fusion
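To make the paradox behaviour concrete, the following sketch implements the fusion law of Eqs. (10) and (11) for the special case in which every BPA assigns mass only to single propositions. The numerical values are Zadeh's classic two-source example, used here as a hypothetical illustration; they are not the entries of Table 3.

```python
import numpy as np


def dempster_combine(bpas):
    """Dempster's rule for BPAs whose focal elements are all singletons.

    bpas: array of shape (n_evidence, n_propositions); each row sums to 1.
    Returns the fused BPA and the conflict coefficient k.
    """
    bpas = np.asarray(bpas, dtype=float)
    # Only combinations in which every source names the same proposition
    # have a non-empty intersection.
    agreement = np.prod(bpas, axis=0)
    total = agreement.sum()
    k = 1.0 - total
    if total <= 1e-12:
        raise ValueError("complete conflict (k = 1): the fusion rule is undefined")
    return agreement / total, k


# Two sources that almost completely disagree about propositions A, B, C.
m1 = [0.99, 0.01, 0.00]
m2 = [0.00, 0.01, 0.99]
fused, k = dempster_combine([m1, m2])
```

In this example all fused mass lands on B, which both sources consider very unlikely, while k is close to 1. With `m1 = [1, 0, 0]` and `m2 = [0, 1, 0]` the function raises an error, which corresponds to the complete conflict case.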

Improved Dempster–Shafer evidence theory

In the conventional DS evidence fusion algorithm, inaccurate fusion results could be caused by conflicts from different evidence sources. In this section, the evidence deviation degree is first calculated by the Euclidean distance and then substituted into a segmented circle function to evaluate the evidence’s credibility. Evidence credibility is further employed as the weight to amend the original evidence and decrease evidence conflict.

First, the BPA matrix \(B_{M \times N}\) is defined, where M and N represent the numbers of pieces of evidence and of propositions in the frame of discernment Θ, respectively. To obtain the deviation degree of a particular piece of evidence, \(p_{i} = (m_{i} (A_{1} ),m_{i} (A_{2} ), \ldots ,m_{i} (A_{N} ))\) is defined as the ith row of the BPA matrix. Then, the Euclidean distances between pi and the other evidence points are added to obtain the deviation degree of the ith evidence point:

$$ d_{i} = \sum\limits_{j = 1}^{M} {\left\| {p_{i} - p_{j} } \right\|} = \sum\limits_{j = 1}^{M} {\sum\limits_{k = 1}^{N} {\left| {m_{i} (A_{k} ) - m_{j} (A_{k} )} \right|} } , $$

where di is the so-called absolute deviation degree coefficient of the ith evidence point. In Eq. (12), the range of mi(Aj) is [0,1], and the sum of mi(Aj) meets the criteria of \(\sum\nolimits_{j = 1}^{N} {m_{i} } (A_{j} ) = 1\). Furthermore, the relative deviation degree coefficient is obtained as follows:

$$ \varepsilon_{i} = \frac{{d_{i} }}{2M}\quad \varepsilon_{i} \in [0,1],M \ge {3,} $$

where M is the number of pieces of evidence, as in Eq. (12). Parameter εi reflects the similarity between the ith evidence point and the other evidence points. In other words, a smaller εi value (closer to 0) means that evidence mi has greater consistency with the other evidence and is more objective; thus, it should be given higher credibility, and vice versa. Therefore, the evidence credibility should be a decreasing function of εi. For that purpose, we propose a segmented circle function as follows:

$$ \varepsilon_{i}^{ * } = \left\{ {\begin{array}{*{20}l} {0.5 + \sqrt {0.25 - \varepsilon_{i}^{2} } } & \quad {(0 \le \varepsilon_{i} < 0.5)} \\ {0.5 - \sqrt {0.25 - (\varepsilon_{i} - 1)^{2} } } & \quad {(0.5 \le \varepsilon_{i} \le 1)} \\ \end{array} } \right., $$

where εi* represents the final deviation degree coefficient. It can be seen that within the argument range [0, 1], Eq. (14) is a monotonically decreasing function that varies from 1 to 0. In addition, the slope is smaller at the start and end of the curve (refer to Fig. 5), which retains higher credibility for evidence with a low deviation degree and reduces the influence of abnormal evidence on the final fusion result.

Fig. 5 Segmented circle function curve

Furthermore, the credibility of the obtained evidence is employed as a weight coefficient to amend the original evidence:

$$ \left\{ {\begin{array}{*{20}l} {m_{i}^{ * } (A_{j} ) = m_{i} (A_{j} )\varepsilon_{i}^{ * } \quad (j = 1,2 \ldots N)} \\ {m_{i}^{ * } (\Phi ) = 1 - \sum\limits_{j = 1}^{N} {m_{i}^{ * } (A_{j} )} } \\ \end{array} } \right., $$

where Ф represents the uncertain proposition and \(m_{i}^{ * } (\Phi )\) describes the uncertainty degree of the ith piece of evidence. It should be noted that in the improved DS evidence theory, we calculate the summation of the Euclidean distances between the selected evidence and the others and then obtain the evidence credibility by the segmented circle function. This combination gives higher confidence to the evidence that is more similar to the rest. Furthermore, it weakens the negative effects of conflicting evidence, which improves the accuracy of the fusion results.
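The credibility weighting of Eqs. (12) to (15) can be sketched as follows. Several points are assumptions made for illustration: the deviation degree is computed as the sum of absolute differences written on the right-hand side of Eq. (12), the uncertain proposition Ф is treated as the whole discernment frame when the amended evidence is fused with Dempster's rule, and the four BPAs are hypothetical values rather than data from this study.

```python
import numpy as np


def segmented_circle(eps):
    """Eq. (14): map relative deviation in [0, 1] to a credibility in [0, 1]."""
    eps = np.asarray(eps, dtype=float)
    upper = 0.5 + np.sqrt(np.clip(0.25 - eps ** 2, 0.0, None))
    lower = 0.5 - np.sqrt(np.clip(0.25 - (eps - 1.0) ** 2, 0.0, None))
    return np.where(eps < 0.5, upper, lower)


def amend_evidence(bpas):
    """Eqs. (12), (13), (15): weight each BPA row by its credibility."""
    bpas = np.asarray(bpas, dtype=float)
    n_evidence = bpas.shape[0]
    # d_i: summed absolute differences between row i and every other row.
    d = np.abs(bpas[:, None, :] - bpas[None, :, :]).sum(axis=(1, 2))
    eps = d / (2.0 * n_evidence)
    credibility = segmented_circle(eps)
    singles = bpas * credibility[:, None]      # amended masses m_i*(A_j)
    uncertain = 1.0 - singles.sum(axis=1)      # m_i*(Phi)
    return singles, uncertain, credibility


def fuse(singles, uncertain):
    """Dempster's rule with singleton focal elements plus the whole frame."""
    s, t = singles[0].copy(), uncertain[0]
    for s2, t2 in zip(singles[1:], uncertain[1:]):
        new_s = s * s2 + s * t2 + t * s2       # intersections equal to A_j
        new_t = t * t2                         # frame intersected with frame
        norm = new_s.sum() + new_t             # equals 1 - k
        s, t = new_s / norm, new_t / norm
    return s, t


# Hypothetical evidence over propositions A, B, C; the second row is an outlier.
evidence = [
    [0.99, 0.01, 0.00],
    [0.00, 0.01, 0.99],
    [0.90, 0.10, 0.00],
    [0.90, 0.10, 0.00],
]
singles, uncertain, credibility = amend_evidence(evidence)
fused, fused_uncertain = fuse(singles, uncertain)
```

Because the outlier receives a low credibility, most of its mass moves to the uncertain proposition before fusion, so it can no longer veto proposition A in the way a zero entry does under the conventional rule.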

Results and discussion

While using VMD, the final IMF number (I) needs to be predetermined and is vital for the effectiveness of this signal analysis method, since it directly affects the performance of VMD [40]. In this section, the centre frequency observation result is first analyzed to determine the optimal parameter I. Then, we perform two sets of experimental result analyses. In the first set, we evaluate how well our IDS evidence fusion algorithm solves the paradoxes by comparing it with several existing fusion approaches. In the second set, we analyze the performance of IDS–ESN for mechanical fault diagnosis of HVCBs and compare its performance with other widely applied classifiers and DS–ESN.

State-related feature extraction

In general, the centre frequency interval of adjacent IMFs gradually shrinks with increasing I and eventually leads to the over-decomposition problem. On the other hand, too small a predetermined I might bring about the modal aliasing issue, which causes the failure of state-related information separation. At present, there is no standard method for the selection of this parameter, and it is selected mainly based on expert experience. Although there are some reported techniques, such as introducing the cuckoo search algorithm into VMD to determine the mode number [5], they suffer from high computational complexity and are time-consuming. The centre frequency observation method is directly applied to determine the optimal mode number in this paper. Therefore, multiple VMD attempts are conducted to determine I. For the convenience of centre frequency observation, the different FFT spectra of IMFs (I varies in a range of 4–7) are given in Fig. 6.

Fig. 6 FFT spectra with varying I values

The original vibration is well decomposed into 5 IMFs with relatively uniform centre frequency intervals, and no obvious centre frequency increase occurs in the last IMF when I is raised further. For instance, when I is set to 7, the highest centre frequency is 26,697 Hz, which is only 125 Hz higher than in the case of I = 5. Moreover, as I increases (I > 6), the centre frequency difference between adjacent IMFs decreases, and the modal aliasing phenomenon occurs. In summary, the optimal mode number I of VMD in this paper is 5. As the feature value of the HVCB's health state, the energy distribution of the vibration signal over the 5 IMFs is calculated as follows:

$$ P_{i} = \frac{{E_{i} }}{E}, $$
$$ \left\{ {\begin{array}{*{20}l} {E_{i} = \displaystyle\int\limits_{{t_{0} }}^{{t_{i} }} {\left| {A(t)} \right|^{2} {\text{d}}t} } \hfill \\ {E = \sum\limits_{j = 1}^{5} {E_{j} } } \hfill \\ \end{array} } \right., $$

where i is the serial number of the IMFs. t0 and ti represent the start and end times of the collected vibration signal, respectively. A(t) and Ei denote the amplitude at different timepoints and the signal energy of each IMF, respectively.
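A minimal sketch of the energy-distribution feature of Eqs. (16) and (17) is given next. It assumes that the IMFs have already been obtained from a separate VMD routine and are stored as the rows of an array; the five sinusoidal test components are hypothetical placeholders, and the integral is approximated by a rectangle-rule sum.

```python
import numpy as np


def imf_energy_distribution(imfs, dt):
    """Return P_i = E_i / E for IMFs stored as rows of an (I, T) array."""
    imfs = np.asarray(imfs, dtype=float)
    # Rectangle-rule approximation of the integral of |A(t)|^2 over the record.
    energies = np.sum(np.abs(imfs) ** 2, axis=1) * dt
    return energies / energies.sum()


# Placeholder "IMFs": sinusoids with amplitudes 1..5 sampled for 1 s.
fs = 65_536
t = np.arange(fs) / fs
centre_freqs = [100, 500, 1000, 5000, 20000]   # illustrative values in Hz
imfs = np.array([(i + 1) * np.sin(2 * np.pi * f * t)
                 for i, f in enumerate(centre_freqs)])
features = imf_energy_distribution(imfs, dt=1.0 / fs)
```

Since the energy of a sinusoid scales with the square of its amplitude, the resulting feature vector is proportional to 1, 4, 9, 16, 25, and its entries sum to 1.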

Discussion of the improved Dempster–Shafer model

To demonstrate the superiority of the improved DS (IDS) algorithm, we compare the paradox evidence fusion ability of our IDS with that of the traditional and modified DS algorithms from Murphy [36], Deng [37], and Li [38]. Taking the data in Table 3 as an example, the comparative fusion results are listed in Table 4.

Table 4 Comparisons of the fusion results

The complete conflict paradox (k = 1) makes the denominator of Eq. (10) zero and eventually leads to fusion failure of the conventional DS fusion method. In the 0 trust paradox, since evidence m3 is equal to zero in column A, the final BPA of proposition A is always zero regardless of how strongly the other evidence supports A. For the 1 trust paradox, the final BPA results are m(A) = m(C) = 0 and m(B) = 1. This is contrary to the actual situation, as there is very little supporting evidence for proposition B. Regarding the high conflict paradox, the final fusion results are m(B) = 0.3571, m(C) = 0.4286, and m(E) = 0.2143. Due to the high conflicts among evidence, proposition C has a larger BPA value than proposition B, which is also contrary to the correct situation. Table 4 shows that all four modified methods can deal with paradoxical evidence fusion problems but with different BPA values. In the complete conflict paradox, 0 trust paradox, and 1 trust paradox, the maximum BPA values in IDS and the algorithm from Li are similar and obviously higher than those of the other methods. In the high conflict paradox, IDS has a maximum BPA value of 0.9870. Intuitively, the closer the maximum BPA value is to 1, the more confident the fused decision. Therefore, IDS is superior to the other algorithms in high-conflict paradox evidence fusion cases. In addition, it is important to note that, compared with other faults, such as spring fatigue and overtravel faults, the state characteristic of clearance joint faults in this paper is weaker. Thus, a high conflict paradox among different sensors and classifiers is more likely to appear. In summary, our proposed IDS model can achieve the best overall performance while addressing paradoxical evidence for fault diagnosis of HVCBs.

Fault diagnosis result

In this paper, the average accuracy and stability over multiple runs are selected as the performance evaluation indicators. First, multiple comparative experiments with several other state-of-the-art classifiers, namely, the back propagation (BP) neural network, radial basis function (RBF) neural network, SVM model, extreme learning machine (ELM), and RF model, are conducted to evaluate the performances of individual ESN modules. To verify the adaptability of the fault diagnosis model, these classifiers are trained and tested with the same input, corresponding to the different fault joint clearance sizes. Since the data in data sets A ~ D (refer to Table 1) can come from single or multiple sensors, there are three combinations of training and test sets in this paper. To prevent random sampling errors, the same model training and testing process is repeated 100 times, and the average accuracy is taken as the final result, as shown in Fig. 7.

Fig. 7 Fault diagnosis results of different individual classifiers on data sets A ~ D

In Fig. 7 and in the subsequent sections, 1 and 2 represent the experimental data of the 0.25 mm and 0.75 mm fault joint clearance sizes, respectively. 1 → 2 means training the model on the input of the 0.25 mm fault clearance size and then testing it on that of the 0.75 mm fault clearance size. The following observations can be made from Fig. 7. First, in Fig. 7a–c, the ESN has the highest diagnostic accuracy. Moreover, according to our literature review, this is the first time that the ESN has been used for mechanical fault diagnosis of HVCBs, which points to a new direction for this field. Second, the mechanical fault diagnosis results of the different classifiers are better with multiple sensors as input than with a single sensor. This indicates that the state-related information of the vibrations at different positions has complementary features, so combining the information of different sensors provides more useful information for fault diagnosis. In addition, the diagnostic accuracy on single-sensor data can be negatively affected by the sensor quality, the signal processing method, environmental effects, and so on. Third, for the ESN with double-sensor input, the diagnostic accuracy values for 1 → 1 and 2 → 2 reach 95.3% and 95.69%, respectively. However, in the cases of 1 → 2 and 2 → 1, the accuracy values decrease to 86.6% and 79.1%, respectively.

Notably, the diagnostic accuracy of each classifier in this paper is lower than that reported in previous literature. This can be explained by the fact that previous works tend to investigate the diagnosis of totally different types of mechanical faults, whose features are therefore more distinct. In this paper, the studied mechanical faults are in different positions, but all of them are essentially caused by abnormal joint clearance; these faults may be closely correlated, which increases the difficulty of diagnosis. Moreover, to reflect the diagnostic ability of the ESN, the span of the fault joint clearance size is enlarged (from 0.25 to 0.75 mm), so the characteristics of fault data of different clearance sizes at the same position have greater dissimilarity.

In actual circumstances, the joint clearance size changes with the service time of the HVCB. Therefore, it is desirable that the obtained model can adapt to different fault clearance size conditions. From Fig. 7d, we can see that in the cases of 1 → 2 and 2 → 1 over the 100 test cycles, although some accuracy values are greater than 90%, the accuracy is not stable, and the fluctuation amplitude is more than 20%, which deteriorates the model performance. This phenomenon can be explained by the characteristics of the ESN algorithm itself. In the ESN, the sparse internal connection structure of the neurons in the reservoir, the input-to-reservoir connection weight matrix, and the reservoir-to-reservoir connection weight matrix are randomly initialized and are not adjusted in training. Therefore, the network structure and parameters of the ESN module obtained from different training processes are different, resulting in unstable diagnosis results. To address this problem, we initialize multiple ESN modules with different spectral radii and reservoir neuron numbers. Then, their outputs are fused by the aforementioned IDS algorithm. The average accuracy over 100 test cycles for the normal state and the three mechanical fault types is listed in Table 5. In addition, the variation of the total average accuracies of the individual ESN modules and of the ESN modules ensembled by IDS, with input from multiple sensors, is shown in Fig. 8.
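The source of this instability can be illustrated with a minimal ESN classifier sketch. It makes several assumptions that are not taken from this paper: the class name `EchoStateNetwork` and all default parameter values are hypothetical, each sample is a short sequence of feature vectors with shape (n_steps, n_inputs), the last reservoir state is used as the representation, and the readout is fitted by ridge-regularised least squares as one common form of linear regression.

```python
import numpy as np


class EchoStateNetwork:
    """Minimal ESN classifier: fixed random reservoir, trained linear readout."""

    def __init__(self, n_inputs, n_reservoir=100, spectral_radius=0.9,
                 connectivity=0.1, ridge=1e-6, seed=None):
        rng = np.random.default_rng(seed)
        # Input-to-reservoir weights: random and never trained.
        self.W_in = rng.uniform(-1.0, 1.0, size=(n_reservoir, n_inputs))
        # Sparse reservoir-to-reservoir weights: random and never trained.
        W = rng.uniform(-1.0, 1.0, size=(n_reservoir, n_reservoir))
        W[rng.random(W.shape) > connectivity] = 0.0
        radius = np.max(np.abs(np.linalg.eigvals(W)))
        if radius == 0.0:
            raise ValueError("degenerate reservoir; increase connectivity")
        # Rescale so that the largest absolute eigenvalue equals spectral_radius.
        self.W = W * (spectral_radius / radius)
        self.ridge = ridge
        self.W_out = None

    def _final_states(self, X):
        # X has shape (n_samples, n_steps, n_inputs).
        states = np.zeros((X.shape[0], self.W.shape[0]))
        for t in range(X.shape[1]):
            states = np.tanh(X[:, t, :] @ self.W_in.T + states @ self.W.T)
        return states

    def fit(self, X, y, n_classes):
        S = self._final_states(X)
        Y = np.eye(n_classes)[y]  # one-hot targets from integer labels
        # Ridge-regularised least squares: the readout is the only trained part.
        A = S.T @ S + self.ridge * np.eye(S.shape[1])
        self.W_out = np.linalg.solve(A, S.T @ Y)
        return self

    def predict(self, X):
        return np.argmax(self._final_states(X) @ self.W_out, axis=1)
```

Because `W_in` and `W` depend only on the random seed, two modules built with different seeds, spectral radii, or reservoir sizes map the same input to different reservoir states, which is why their accuracies differ from run to run and why fusing several modules is attractive.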

Table 5 Fault diagnosis results of individual and ensemble ESN modules

Fig. 8 Accuracy fluctuation of individual and ensemble ESN models

Some interesting observations can be made from Table 5. In the case of 1 → 2, the total average accuracy of the individual ESN module is 86.56%. Through fusion analysis, the DS–ESN and IDS–ESN modules improve the accuracy by 2.16% and 4.27%, respectively. However, in the case of 2 → 1, the accuracy values of the DS–ESN and IDS–ESN modules are almost unchanged (only a 1.6% increase for the IDS–ESN). In all cases, the diagnostic accuracy of Fault III is consistently the lowest by a significant margin, especially in the 2 → 1 case, which eventually results in unsatisfactory final diagnostic results. This might be because the position of Fault III is far from the sensors in our experimental setup. Fault-related features are attenuated before they reach the measuring points, making it more difficult to extract valid feature values from the measured signal. This phenomenon indicates that the choice of measuring points is crucial to the validity of the fault diagnosis results, and that even the ensemble model may still perform poorly on a certain type of mechanical fault. In our future research, we will adopt other diagnostic methods to improve the diagnostic accuracy of this fault.

In Fig. 8, the red and black lines represent the average accuracy of the fusion models consisting of 25 and 50 ESN modules fused by IDS, respectively. We find that in IDS–ESN, the fluctuation amplitude of the average accuracy decreases significantly, and the degree of decline is positively correlated with the number of fused ESN modules. Therefore, the main configuration parameter of our model is the number of ESN modules. More specifically, we find that when the number of adopted ESN modules increases to 50, the final output of IDS–ESN is sufficiently stable (the largest accuracy difference over 100 statistical tests is within 4%). In IDS–ESN, only the output weights of each ESN module need to be trained by a linear regression algorithm, which avoids vanishing gradients and is highly efficient. Overall, the instability of the diagnostic accuracy is resolved. If all modules can be trained properly, the ensemble model is expected to better reflect the nonlinear mapping between the input and the labelled fault types. In this paper, for performance evaluation of unbalanced input, we train the IDS–ESN model with mixed input, in which the training data contain a certain percentage of data from the other clearance size. The obtained results are shown in Fig. 9. More specifically, the diagnostic results with a 10% mixed training sample are shown in Fig. 10.
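One way to build such a mixed training set is sketched below. The exact mixing procedure is not specified here, so this sketch assumes that the total training set size is kept fixed and that the given fraction of it is drawn at random from the other clearance size. The function name `mix_training_set` and the sample labels are hypothetical.

```python
import random


def mix_training_set(primary, secondary, mix_fraction, seed=0):
    """Return len(primary) samples, a share mix_fraction of which is drawn
    from `secondary` (data of the other clearance size)."""
    if not 0.0 <= mix_fraction <= 1.0:
        raise ValueError("mix_fraction must lie in [0, 1]")
    rng = random.Random(seed)
    n_secondary = round(mix_fraction * len(primary))
    if n_secondary > len(secondary):
        raise ValueError("not enough secondary samples")
    # Keep a random subset of the primary data and fill up with secondary data.
    mixed = rng.sample(primary, len(primary) - n_secondary)
    mixed += rng.sample(secondary, n_secondary)
    rng.shuffle(mixed)
    return mixed


# Placeholder samples: (clearance size label, sample index).
data_1 = [("0.25 mm", i) for i in range(100)]
data_2 = [("0.75 mm", i) for i in range(100)]
mixed_10 = mix_training_set(data_1, data_2, mix_fraction=0.10)
```

With `mix_fraction=0.10`, the returned set contains 90 samples of the 0.25 mm case and 10 samples of the 0.75 mm case, which corresponds to the 10% mixed training sample setting.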

Fig. 9 Average accuracy with different percentages of mixed input

Fig. 10 Diagnosis statistics of the fusion model with 10% mixed input

Figure 9 shows that the average accuracy of IDS–ESN is higher than that of DS–ESN. This can be attributed to paradoxical evidence, which the traditional DS method fails to fuse, whereas our improved IDS fusion algorithm can handle it. In Fig. 10, three statistical parameters (max, min, and average) are computed from the 100 statistical tests. After fusing the output results of multiple ESN modules via our IDS approach, the final average accuracy values of fault diagnosis based on multiple sensors and mixed training data reach 96.92% and 92.66% in the 1 → 2 and 2 → 1 cases, respectively. Compared with the individual ESN results in Fig. 6, increases of 9.64% and 13.55% are observed.

In summary, the proposed IDS–ESN improves the accuracy of mechanical fault diagnosis of HVCBs and has further advantages. On the one hand, the IDS–ESN model is more flexible and can deal with more complicated fault diagnosis issues, as extra ESN modules can be added directly without affecting the other ESN modules. On the other hand, the IDS–ESN model is more robust. Even if several ESN modules suffer from strong interference or disturbance and produce wrong outputs, the correct result can still be obtained with a certain probability through the fusion in IDS–ESN. Accurate mechanical fault diagnosis of HVCBs is crucial for the safety of electric power systems, and it requires a diagnosis model with high reliability and robustness under all environmental conditions. The proposed IDS–ESN model makes up for the weakness of single classifiers. Therefore, it can improve mechanical diagnosis accuracy, reduce accidents, and eventually bring significant economic benefits to electric power systems.

Last but not least, the main limitations of our model come from two aspects. One is state-related feature extraction. A signal processing approach, VMD with centre frequency observation, is adopted to extract the energy distribution of the vibration as the state-related feature. This is a frequency-domain feature extraction method. Some recent works consider hybrid features from the time domain and the frequency domain, combined with a feature selection algorithm, for fault diagnosis, which improves diagnostic accuracy. The other limitation is the issue of unbalanced input, which is not analyzed in the present work. Therefore, in future work, hybrid state-related features and the model performance in the case of unbalanced input will be evaluated with the proposed methodology.


IDS–ESN is newly proposed in this study, and it consists of an ensemble classifier with multiple ESN modules and an improved DS evidence theory approach. First, comparison with other existing machine learning methods indicates that individual ESNs can achieve promising mechanical fault diagnosis of HVCBs. Then, an effective information fusion approach is explored. Multiple ESN modules are employed as sub-classifiers, whose fault classification outputs are fused to stabilize the classifier’s diagnostic accuracy. The raw evidence in IDS–ESN is rectified by evidence credibility, which helps to address the paradoxical evidence fusion issue of the conventional DS technique. Comparative results show that our IDS–ESN model can fuse complementary evidence from different ESN modules and sensors for fault diagnosis. In addition, when the number of adopted ESN modules increases to 50, the final output of IDS–ESN is stable, and the largest accuracy difference over 100 statistical tests is within 4%. Therefore, the model output of IDS–ESN is not sensitive to the network parameters of a single ESN module. In other words, the proposed fault diagnosis approach has good robustness and does not rely heavily on expert prior knowledge for parameter setting. In future work, reservoir optimization of the ESN modules in our IDS–ESN model will be explored to further enhance the feature mapping ability for more complex fault diagnosis tasks.